Design principles

Lightweight

The classes in this package are designed to be lightweight, taking up as little memory as possible. For example, individual atoms are not stored in Python classes, and are only requested when needed. This is because the library is designed to work with an existing modeling package, which likely already stores data on the system in its own files or data structures, such that duplicating this information would be very inefficient.

Mutable

All classes are designed to be mutable; that is, their contents can be changed after creation. For example, protein chains can be added to or removed from an existing ihm.Assembly object, or the amino acid sequence of an ihm.Entity can be extended. This because some of the modeling packages which use these classes build up their own data model in a similar way.

Types rather than enums

Where the underlying IHM mmCIF dictionary uses an enumeration, generally this corresponds to separate sibling classes in this package. For example, two datasets which differ only in their data_type in the dictionary (such as a electron microscopy density map and small angle scattering data) are represented with two classes in this package: ihm.dataset.EMDensityDataset and ihm.dataset.SASDataset. This cleanly enforces the allowed types in the most Pythonic manner.

Hierarchy of classes

The underlying IHM mmCIF dictionary is essentially structured as a set of rows in database tables, with IDs acting as keys or pointers into other tables. This is naturally represented in Python as a hierarchy of classes, with members pointing to other objects as appropriate. IDs are not used to look up other objects, and are only used internally to populate the tables. For example, to group multiple models together, the dictionary assigns all of the models the same model_group id while in the Python package the ihm.model.Model objects are placed into a ihm.model.ModelGroup object, which acts like a simple Python list.

The table-based representation of the dictionary does allow for objects to exist that are not referenced by other objects, unlike the Python-based hierarchy. Such ‘orphan’ objects can be referenced from orphan lists in the top-level ihm.System if necessary.

Equal versus identical objects

Since the Python objects are mutable, can be constructed iteratively by a modeling package, and live in a hierarchy, it can sometimes turn out that two Python objects while not identical (they point to different locations in memory) are equal (their contents are the same). For example, the two ihm.Assembly objects, one of proteins A, B, and C, and the other of A, C, and B, are not identical (they are different objects) but are equal (the order of the proteins does not matter). The library will attempt to detect such objects and consolidate them on output, describing both of them in the mmCIF file with the same ID, to avoid meaningless duplication of rows in the output tables. This removes some of the burden from the author of the modeling package, which may not care about such a distinction.

mmCIF backend

The classes in this package roughly correspond to categories in the underlying IHM mmCIF dictionary. This allows for simple output of mmCIF formatted files, but also allows for the potential future support for other file formats that support the dictionary or a subset of it, such as MMTF.