Provenance

The IHM dictionary is designed to capture all aspects of integrative modeling, from the original deposited experimental data to the final validated models. This allows for maximum reproducibility and reusability. However, many modeling packages are concerned only with converting their own inputs into output models. For example, a model of a complex may be generated by docking comparative models, guided by experimental data on the entire complex. If only this last step of the procedure is captured in the output mmCIF file (here, without any information on how the comparative models themselves were obtained), the chain of provenance is broken and the outputs cannot be reproduced.

One solution to this problem is to ensure diligently that every input to the modeling has been deposited in an appropriate database, and to always refer to inputs using ihm.location.DatabaseLocation. Where this is not possible, the library provides some metadata parsers in the ihm.metadata module. These make a best effort to extract metadata from files on the local hard drive to better describe their provenance. For example, if a file contains headers or other information showing that it is merely a copy of a file deposited in an official database, the metadata parsers will return a suitable DatabaseLocation for the dataset. Other information, such as the software used to generate the file, may also be available in the metadata.
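To make the idea of a "best effort" check concrete, here is a minimal, standard-library-only sketch for PDB-format files. It is not the ihm.metadata.PDBParser implementation: the `DatabaseRef` class and the `guess_pdb_provenance` function are hypothetical, and the rule used (trust the four-character ID code in columns 63-66 of a HEADER record) is a simplifying assumption. When no such header is found, as is typical for files written by modeling software, the sketch returns None so the caller can fall back to describing the file as a local file.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DatabaseRef:
    """Hypothetical stand-in for a database location (not the ihm class)."""
    db_name: str
    db_code: str


def guess_pdb_provenance(lines: Iterable[str]) -> Optional[DatabaseRef]:
    """Best-effort guess at whether PDB-format text is a copy of a PDB entry.

    Assumption: only the HEADER record is inspected; a valid-looking ID code
    there is taken as evidence that the file came from the PDB.
    """
    for line in lines:
        if line.startswith("HEADER"):
            # In the PDB format, the ID code occupies columns 63-66.
            code = line[62:66].strip()
            if len(code) == 4 and code[0].isdigit():
                return DatabaseRef("PDB", code)
            return None
        if line.startswith(("ATOM", "HETATM")):
            # Coordinates began before any HEADER record: give up.
            return None
    return None
```

Taking an iterable of lines rather than a path keeps the check easy to test. With a real file, pass the open file object, for example `with open("input.pdb") as fh: ref = guess_pdb_provenance(fh)` (the filename is hypothetical).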

For more details, see ihm.metadata.MRCParser for electron microscopy density maps (MRC files) or ihm.metadata.PDBParser for coordinate files in PDB format.
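Density maps can be checked in a similar spirit. The sketch below reads the text labels from the fixed 1024-byte MRC2014 header (the label count is the 32-bit integer at byte offset 220, followed by ten 80-byte labels) and searches them for an EMDB accession. The function name `find_emdb_id`, the little-endian assumption, and the label pattern are assumptions made for illustration; in practice, use ihm.metadata.MRCParser.

```python
import re
import struct
from typing import Optional

# Assumed label pattern: maps distributed by EMDB are often stamped with a
# label such as "::::EMDATABANK.org::::EMD-1234::::". This is an assumption
# about typical files, not a guarantee.
EMDB_LABEL = re.compile(rb"EMDATABANK\.org.*?(EMD-\d+)")

NLABL_OFFSET = 220  # word 56 of the MRC2014 header: number of labels in use
LABEL_OFFSET = 224  # ten 80-byte text labels follow
LABEL_SIZE = 80
MAX_LABELS = 10


def find_emdb_id(header: bytes) -> Optional[str]:
    """Return an EMDB accession found in the MRC header labels, or None.

    `header` should hold at least the first 1024 bytes of the file. This
    sketch assumes a little-endian file and ignores the MACHST byte-order
    stamp.
    """
    if len(header) < LABEL_OFFSET + MAX_LABELS * LABEL_SIZE:
        return None
    (num_labels,) = struct.unpack_from("<i", header, NLABL_OFFSET)
    # Clamp to the valid range so a corrupt count cannot overrun the header.
    num_labels = max(0, min(num_labels, MAX_LABELS))
    for i in range(num_labels):
        start = LABEL_OFFSET + i * LABEL_SIZE
        match = EMDB_LABEL.search(header[start:start + LABEL_SIZE])
        if match:
            return match.group(1).decode("ascii")
    return None
```

To use it on a file, read just the header, for example `with open("map.mrc", "rb") as fh: emdb_id = find_emdb_id(fh.read(1024))` (the filename is hypothetical). Only the labels counted in the header are searched, so unused label slots are ignored.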