Re: AMBER: Using XML for Amber files

From: Feng X Zhou <feng.zhou.abbott.com>
Date: Fri, 7 Apr 2006 13:54:59 -0500

Xuebin,

You are probably right - these are mostly cosmetic changes that are likely not material enough to spend too much effort on.

Regarding FORTRAN, I am not very familiar with whether XML libraries can be used. C/C++ and most interpreted languages should be fine. Also, getting the initial sets of XML files ready is harder... But once the ball gets rolling, one might expect most XML files to be generated by the program itself (directly from the program's data structures, again using an XML library). In a way, XML can be thought of as an extension of the data structures (hashes, arrays, vectors, etc.) into a file format, so the interfacing code is reduced, although it is a bit more verbose to set up (though not too bad in readability).
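To make the "XML as an extension of the program's data structures" idea concrete, here is a minimal sketch in Python using the standard-library ElementTree module. The mapping is a hypothetical convention (dict keys become child tags, list entries become <item> elements, scalars become text), not an Amber format; it also shows the recursive tree building that writing XML requires.

```python
import xml.etree.ElementTree as ET

def to_xml(tag, value):
    """Recursively turn nested dicts/lists/scalars into an Element tree.
    Hypothetical convention: dict keys -> child tags (must be valid XML names),
    list entries -> <item> children, scalars -> element text."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))
    elif isinstance(value, list):
        for item in value:
            elem.append(to_xml("item", item))
    else:
        elem.text = str(value)
    return elem

def from_xml(elem):
    """Inverse of to_xml: <item>-only children -> list, other children -> dict,
    leaves -> strings (numbers come back as text; empty containers become "")."""
    children = list(elem)
    if not children:
        return elem.text or ""
    if all(child.tag == "item" for child in children):
        return [from_xml(child) for child in children]
    return {child.tag: from_xml(child) for child in children}

# Illustrative record; the field names are made up for this example.
residue = {"name": "ALA", "atoms": ["N", "CA", "C", "O"], "charge": "0.0"}
xml_text = ET.tostring(to_xml("residue", residue), encoding="unicode")
print(xml_text)
print(from_xml(ET.fromstring(xml_text)) == residue)
```

The round trip only works because both directions agree on the same convention; a real file format would pin that convention down in a schema or DTD.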

For large trajectories, it probably uses more space, as you pointed out. The main saving is that it is conceptually easier to manage in code, but I am not sure it is worth the wasted space (however, disks keep getting larger...). It is most likely going to be slower in terms of I/O processing too. Plus, there is no compression either!
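A rough way to see the space cost is to encode the same coordinates both ways and compare byte counts. The sketch below is an assumption-laden illustration: the plain-text layout uses fixed-width 8.3f fields, ten per line (meant to resemble an ASCII trajectory frame), the XML layout (<frame>/<atom x= y= z=>) is invented for the example, and the coordinates are synthetic. XML itself defines no compression; the last value printed shows what a general-purpose compressor (gzip) does to the XML text.

```python
import gzip

def plain_frame(coords):
    # Fixed-width 8.3f fields, 10 values per line (assumed ASCII-trajectory-like layout)
    values = [v for xyz in coords for v in xyz]
    lines = ["".join(f"{v:8.3f}" for v in values[i:i + 10])
             for i in range(0, len(values), 10)]
    return "\n".join(lines) + "\n"

def xml_frame(coords):
    # Hypothetical XML layout: one <atom> element per atom with x/y/z attributes
    atoms = "".join(
        f'<atom id="{i + 1}" x="{x:.3f}" y="{y:.3f}" z="{z:.3f}"/>'
        for i, (x, y, z) in enumerate(coords)
    )
    return f"<frame>{atoms}</frame>\n"

# Synthetic coordinates for 1000 atoms (not real simulation data)
coords = [(0.1 * i, 0.2 * i - 5.0, 10.0 - 0.05 * i) for i in range(1000)]
plain = plain_frame(coords).encode()
xml_bytes = xml_frame(coords).encode()
compressed = gzip.compress(xml_bytes)
print(len(plain), len(xml_bytes), len(compressed))
```

The exact ratio depends entirely on the chosen tag and attribute names; verbose element names push it higher.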

I see the main advantage of XML as making the data more or less independent of the file format. By doing so, open standards can be used for essentially any kind of data, including parameters, or perhaps even parts of the potential functions. This way programs may be able to share force fields with each other, making information sharing easier. Then again, this might be more useful in some applications than in others... I just want to voice this as a possibility to consider for the future. Obviously, whatever data format one uses is not going to change the quality of the modelling or make a protein fold faster; it just means fewer headaches in dealing with arrays of arrays or hashes of hashes in the data...
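As an illustration of force-field parameters kept in a program-neutral XML file, the sketch below parses bond parameters and evaluates a harmonic bond term in the Amber form E = k (r - r0)^2. The element and attribute names are hypothetical, not an existing standard, and the two parameter rows are illustrative numbers that should be checked against a real parameter file before any use.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; element/attribute names and values are illustrative only.
FF_XML = """
<ForceField name="example">
  <BondTypes units="kcal/mol/A^2, A">
    <Bond type1="CT" type2="CT" k="310.0" r0="1.526"/>
    <Bond type1="CT" type2="HC" k="340.0" r0="1.090"/>
  </BondTypes>
</ForceField>
"""

def load_bond_params(xml_text):
    """Return {(type_a, type_b): (k, r0)} with the atom-type pair sorted,
    so lookups do not depend on the order of the two types."""
    root = ET.fromstring(xml_text.strip())
    params = {}
    for bond in root.iter("Bond"):
        key = tuple(sorted((bond.get("type1"), bond.get("type2"))))
        params[key] = (float(bond.get("k")), float(bond.get("r0")))
    return params

def bond_energy(params, t1, t2, r):
    # Harmonic bond term in the Amber convention: E = k (r - r0)^2, no factor of 1/2
    k, r0 = params[tuple(sorted((t1, t2)))]
    return k * (r - r0) ** 2

params = load_bond_params(FF_XML)
print(bond_energy(params, "HC", "CT", 1.10))
```

Any program with an XML parser could read the same file, which is the sharing argument in miniature; agreeing on the element names and units is the part a standard would have to settle.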


-Feng


Friday, April 07, 2006 12:54 PM
To: amber.scripps.edu
cc:
From: "Xuebin Qiao" <xbqiao.gmail.com>
Subject: Re: AMBER: Using XML for Amber files


Dear Feng:

I only partially agree with your point: XML can be a good way to describe *read-only* parameter files, such as force field parameter files. Another big advantage of adopting XML is that we get a better way to validate parameter files, via a DTD.

However, from past experience, I think the following issues should be addressed:

1. Due to the nature of XML, the coding effort for reading and writing is not well balanced. Though reading is easy, preparing XML is relatively tedious work because we have to recursively build the DOM tree. In contrast, plain text files are quite balanced in read/write operations. So I think XML is more suitable here for handling read-only parameter files.

2. The open source Fortran XML libraries are not as robust and portable as their C/C++ counterparts right now. Unfortunately, most Amber programs were written in Fortran.

3. For large files, e.g. trajectories, the size can be 2 or 3 times larger than that of a plain text file. Your coding example would then be very inefficient. If you insist on XML, you have to switch your design from DOM to SAX, which is a totally different coding pattern. For programmers with only occasional XML coding needs, it apparently would not reduce coding effort much.
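The DOM-versus-SAX difference in point 3 can be sketched with Python's standard xml.sax module. Instead of loading the whole document into a tree, a SAX handler receives start/end events and keeps only a running per-frame sum, so memory use stays bounded by one frame. The <trajectory>/<frame>/<atom> layout is a hypothetical example, not an Amber format, and computing frame centroids is just a stand-in for real analysis.

```python
import xml.sax

class FrameCentroids(xml.sax.ContentHandler):
    """SAX handler: accumulates one <frame> at a time via callbacks
    instead of holding the whole document as a DOM tree."""

    def __init__(self):
        super().__init__()
        self.centroids = []
        self._sum = [0.0, 0.0, 0.0]
        self._n = 0

    def startElement(self, name, attrs):
        if name == "frame":
            # Reset the running sums at the start of each frame
            self._sum = [0.0, 0.0, 0.0]
            self._n = 0
        elif name == "atom":
            for i, axis in enumerate("xyz"):
                self._sum[i] += float(attrs[axis])
            self._n += 1

    def endElement(self, name):
        # Emit the frame's centroid once its closing tag is seen
        if name == "frame" and self._n:
            self.centroids.append(tuple(s / self._n for s in self._sum))

# Tiny in-memory document; for a file on disk use xml.sax.parse(path, handler)
doc = (
    b"<trajectory>"
    b'<frame><atom x="0.0" y="0.0" z="0.0"/><atom x="2.0" y="4.0" z="6.0"/></frame>'
    b'<frame><atom x="1.0" y="1.0" z="1.0"/><atom x="3.0" y="5.0" z="7.0"/></frame>'
    b"</trajectory>"
)
handler = FrameCentroids()
xml.sax.parseString(doc, handler)
print(handler.centroids)
```

The state that a DOM program would read straight out of the tree (which frame am I in, how many atoms so far) has to be tracked by hand in the handler, which is the change in coding pattern point 3 refers to.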

best regards

qxb
-- 
... there have been two really clean, 
consistent models of programming so far: 
the C model and the Lisp model. 
These two seem points of high ground, 
with swampy lowlands between them.
                                      --Paul Graham 
-----------------------------------------------------------------------
The AMBER Mail Reflector
Received on Sat Apr 08 2006 - 18:29:15 PDT