Suite101

Chemical Information Management with CML


© Adam Hughes

Anyone who has used at least two simulation codes, or performed at least two types of simulation, can tell you that file formatting issues can drive you crazy! Nearly every software vendor supplies its own unique file formats for reading input, writing output, and storing database-type information. And of course, competition often keeps other software products from supporting these formats. The result is often a messy pile of output files of varying types, which require a lot of painful manipulation on the part of the user to glean any useful information. It is not at all uncommon for a computational scientist to develop and maintain dozens of programs for the sole purpose of converting data to and from the various formats required by the programs they use in their research.

Of course, while such a glut of data types can be murderous to the progress of a project, these file formats are usually created with something far less malevolent in mind. In fact, it's probably safe to say that most of these formats were developed with an eye toward becoming "the" standard to which all other applications would adapt. Although some of them have become fairly widely used, for one reason or another none has become anything resembling a discipline-wide standard.

The lack of standardization of file formats can probably be traced back to the fact that each one is generally developed by a single research entity, and so the needs of the wider community aren't really factored into the design decisions. With the growth of computer power over the last few years and the concomitant explosion of raw data produced, however, it has become more important than ever to try to restore some sort of order to data collection and manipulation. One of the major efforts in this area has been made in computational chemistry with the development of the Chemical Markup Language, or CML.

Chemical Markup Language is based on XML, the Extensible Markup Language. Described as "HTML with molecules", CML strives to bring coherence to the challenging endeavor of chemical information management. It has been designed to be human-readable and fairly easy to learn. Because of its XML base, CML is also flexible and provides meaningful categorization of chemical information. While these characteristics are clearly of potential benefit to computational chemists, CML also has powerful implications for scientists at the information technology end of the marketplace.
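
To give a feel for what "human-readable" and "meaningful categorization" can look like in practice, here is a minimal sketch rather than a definitive example. The water fragment is illustrative and not taken from any particular program's output. Its element and attribute names (molecule, atomArray, atom, elementType, bondArray, bond, atomRefs2, order) follow common CML conventions, and the CML namespace declaration is left out for brevity. The summarize helper is a hypothetical name introduced only for this sketch. Because CML is XML, an ordinary XML parser, in this case the one in the Python standard library, is enough to read it.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative CML-style fragment for water (an assumption for this sketch).
# The CML namespace declaration is omitted to keep the example short.
WATER_CML = """
<molecule id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""


def summarize(cml_text):
    """Return (element counts, bonded atom-id pairs) for a single molecule.

    Hypothetical helper: it relies only on the tag and attribute names
    used in the fragment above.
    """
    root = ET.fromstring(cml_text.strip())
    # Count atoms by the value of their elementType attribute.
    counts = Counter(atom.get("elementType") for atom in root.iter("atom"))
    # Each bond names the two atoms it joins as a space-separated pair of ids.
    bonds = [tuple(bond.get("atomRefs2").split()) for bond in root.iter("bond")]
    return counts, bonds


counts, bonds = summarize(WATER_CML)
print(dict(counts))
print(bonds)
```

Note that nothing in the sketch is specific to one vendor's layout: the tags themselves say which pieces of the file are atoms and which are bonds, so no format-specific converter is needed to pull them out.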

In order to understand what CML is and where it is going, it is probably a good idea to look at some of the principles of its chief ancestor, XML. This will be the focus of the



The copyright of the article Chemical Information Management with CML in Scientific Computing is owned by Adam Hughes. Permission to republish Chemical Information Management with CML in print or online must be granted by the author in writing.
