Usage of the library for output consists of first creating a hierarchy of Python objects that together describe the system, and then dumping that hierarchy to an mmCIF file.
For a complete worked example, see the simple docking example.
The top level of the hierarchy in IHM is the
ihm.System. All other
objects are referenced from a System object.
Any data used anywhere in the modeling (including in validation) can be
referenced with an
ihm.dataset.Dataset. For example,
electron microscopy data is referenced with
ihm.dataset.EMDensityDataset and small angle scattering data with
ihm.dataset.SASDataset.
A dataset uses an
ihm.location.Location object to describe where it is stored.
Typically this is an
ihm.location.DatabaseLocation for something
that’s deposited in an experiment-specific database such as PDB, EMDB, PRIDE,
or EMPIAR, or
ihm.location.InputFileLocation for something that’s
stored as a simple file, either on the local disk or at a location described
with a DOI such as Zenodo or a publication’s
supplementary information. See the
locations example for more examples.
The architecture of the system is described with a number of classes:
- ihm.Entity describes each unique sequence.
- ihm.AsymUnit describes each asymmetric unit (chain) in the system. For example, a homodimer would consist of two asymmetric units, both pointing to the same entity, while a heterodimer contains two entities. It is also possible for an entity to exist with no asymmetric units pointing to it - this typically corresponds to something seen in an experiment (such as a cross-linking study) which was not modeled. Note that the IHM extension currently contains no support for symmetry, so two chains that are symmetrically related should each be represented as an “asymmetric” unit.
- ihm.Assembly groups asymmetric units and/or entities, or parts of them. Assemblies are used to describe which parts of the system correspond to each input source of data, or that were modeled.
- ihm.representation.Representation describes how each part of the system was represented in the modeling, for example as coarse-grained spheres.
Restraints and sampling
Restraints, which score or otherwise fit the computational model against
the input data, can be created as
ihm.restraint.Restraint objects. These generally take as input a
Dataset pointing to
the input data, and an
Assembly describing which part of the
model the data corresponds to. For example, there are restraints for
3D EM and
small angle scattering.
ihm.protocol.Protocol objects describe how models were generated
from the input data. A protocol can consist of
multiple steps, such as molecular dynamics or
Monte Carlo, followed by one or more analyses, such as clustering, filtering,
rescoring, or validation, described by
ihm.analysis.Analysis objects. These objects generally take an
Assembly to indicate what part
of the system was considered and a
group of datasets to show which data
guided the modeling or analysis.
ihm.model.Model objects give the actual coordinates of the final
generated models. These point to the
Assembly of what was
modeled, the
Protocol describing how the modeling
was done, and the
Representation showing how
the model was represented.
Models can be grouped together for any purpose using the
ihm.model.ModelGroup class. If a given group describes an ensemble
of models, the
ihm.model.Ensemble class allows for additional
information on the ensemble to be provided, such as
localization densities of parts of
the system and precision. Due to size, generally only representative models
of an ensemble are deposited in mmCIF, but the
Ensemble
class allows the full ensemble to be referred to, for example in a more
compact binary format (e.g. DCD) deposited at a given DOI. Groups of models
can also be shown as corresponding to different states of the system using
ihm.model.State.
Metadata can also be added to the system, such as
- ihm.Citation: publication(s) that describe this modeling or the methods used in it.
- ihm.Software: software packages used to process the experimental data, generate intermediate inputs, do the modeling itself, and/or process the output.
- ihm.Grant: funding support for the modeling.
- ihm.reference.UniProtSequence: information on a sequence used in modeling, in UniProt.
The library keeps track of several numbering schemes to reflect the reality of the data used in modeling:
- Internal numbering. Residues are always numbered sequentially starting at 1 in an
Entity. All references to residues or residue ranges in the library use this numbering.
- Author-provided numbering. If a different numbering scheme is used by the authors, for example to correspond to the numbering of the original sequence that is modeled, this can be given as an author-provided numbering for one or more asymmetric units. See the auth_seq_id_map parameter to
AsymUnit. (The mapping between author-provided and internal numbering is given in the
pdbx_poly_seq_scheme table in the mmCIF file.)
- Starting model numbering. If the initial state of the modeling is given by one or more PDB files, the numbering of residues in those files may not line up with the internal numbering. In this case an offset from starting model numbering to internal numbering can be provided - see the offset parameter to ihm.startmodel.StartingModel.
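The arithmetic relating the three numbering schemes can be illustrated in plain Python. The two helper functions below are hypothetical and not part of the library. They assume that an integer author mapping is added to the internal number, that a dict mapping is looked up by internal number, and that a starting model offset is added to the starting model number to give the internal number.

```python
def internal_to_author(seq_id, auth_seq_id_map=0):
    """Convert an internal 1-based residue number to author numbering.

    auth_seq_id_map is either an integer offset or a mapping keyed by
    the internal residue number.
    """
    if isinstance(auth_seq_id_map, int):
        return seq_id + auth_seq_id_map
    return auth_seq_id_map[seq_id]


def starting_model_to_internal(residue_number, offset=0):
    """Convert a starting model residue number to internal numbering."""
    return residue_number + offset


# Authors number the modeled fragment from residue 21 of the full sequence.
print(internal_to_author(1, 20))                    # 21

# Irregular author numbering is given residue by residue.
print(internal_to_author(2, {1: 100, 2: 101}))      # 101

# A PDB file whose first residue is numbered 55 lines up with internal
# residue 1 when the offset is -54.
print(starting_model_to_internal(55, -54))          # 1
```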
Once the hierarchy of classes is complete, it can be freely inspected or
modified. All the classes are simple lightweight Python objects, generally
with the relevant data available as member variables. For example, modeling
packages such as IMP will typically
generate an IHM hierarchy from their own internal data models, but in many
cases some information relevant to IHM (such as the
associated publication) cannot be determined
automatically and can be filled in by adding more objects to the hierarchy.
The complete hierarchy can be written out to an mmCIF or BinaryCIF file using
ihm.dumper.write().