Usage
Using the library for output involves first creating a hierarchy of Python objects that together describe the system, and then dumping that hierarchy to an mmCIF file.
For a complete worked example, see the simple docking example.
The top level of the hierarchy in IHM is the ihm.System. All other
objects are referenced from a System object.
Datasets
Any data used anywhere in the modeling (including in validation) can be
referenced with an ihm.dataset.Dataset. For example,
electron microscopy data is referenced with
ihm.dataset.EMDensityDataset and small angle scattering data with
ihm.dataset.SASDataset.
A dataset uses an
ihm.location.Location object to describe where it is stored.
Typically this is an ihm.location.DatabaseLocation for something
that’s deposited in an experiment-specific database such as PDB, EMDB, PRIDE,
or EMPIAR, or ihm.location.InputFileLocation for something that’s
stored as a simple file, either on the local disk or at a location described
with a DOI, such as Zenodo or a publication’s
supplementary information. See the
locations example
for more examples.
System architecture
The architecture of the system is described with a number of classes:
ihm.Entity describes each unique sequence.
ihm.AsymUnit describes each asymmetric unit (chain) in the system. For example, a homodimer would consist of two asymmetric units, both pointing to the same entity, while a heterodimer contains two entities. It is also possible for an entity to exist with no asymmetric units pointing to it - this typically corresponds to something seen in an experiment (such as a cross-linking study) which was not modeled. Note that the IHM extension currently contains no support for symmetry, so two chains that are symmetrically related should each be represented as an “asymmetric” unit.
ihm.Assembly groups asymmetric units and/or entities, or parts of them. Assemblies are used to describe which parts of the system were modeled, or which parts each source of input data corresponds to.
ihm.representation.Representation describes how each part of the system was represented in the modeling, for example as atoms or as coarse-grained spheres.
Restraints and sampling
Restraints, which score or otherwise fit the computational model against
the input data, can be created as ihm.restraint.Restraint objects.
These generally take as input a Dataset pointing to
the input data, and an Assembly describing which part of the
model the data corresponds to. For example, there are restraints for
3D EM and
small angle scattering.
ihm.protocol.Protocol objects describe how models were generated
from the input data. A protocol can consist of
multiple steps, such as molecular dynamics or
Monte Carlo, followed by one or more analyses, such as clustering, filtering,
rescoring, or validation, described by ihm.analysis.Analysis objects.
These objects generally take an Assembly to indicate what part
of the system was considered and a
group of datasets to show which data
guided the modeling or analysis.
Model coordinates
ihm.model.Model objects give the actual coordinates of the final
generated models. These point to the Assembly of what was
modeled, the Protocol describing how the modeling
was done, and the Representation showing how
the model was represented.
Models can be grouped together for any purpose using the
ihm.model.ModelGroup class. If a given group describes an ensemble
of models, the ihm.model.Ensemble class allows for additional
information on the ensemble to be provided, such as
localization densities of parts of
the system and precision. Because ensembles can be very large, generally only representative models
of an ensemble are deposited in mmCIF, but the Ensemble
class allows the full ensemble to be referred to, for example in a more
compact binary format (e.g. DCD) deposited at a given DOI. Groups of models
can also be shown as corresponding to different states of the system using
the ihm.model.State class.
Metadata
Metadata can also be added to the system, such as
ihm.Citation: publication(s) that describe this modeling or the methods used in it.
ihm.Software: software packages used to process the experimental data, generate intermediate inputs, do the modeling itself, and/or process the output.
ihm.Grant: funding support for the modeling.
ihm.reference.UniProtSequence: information on a sequence used in modeling, in UniProt.
Residue numbering
The library keeps track of several numbering schemes to reflect the reality of the data used in modeling:
Internal numbering. Residues are always numbered sequentially starting at 1 in an Entity. All references to residues or residue ranges in the library use this numbering. For polymers, this internal numbering matches the seq_id used in the mmCIF dictionary, while for branched entities, it matches num in the dictionary. (For other types of entities, such as non-polymers and waters, seq_id is not used in mmCIF, but the residues are still numbered sequentially from 1 in this library.)
Author-provided numbering. If a different numbering scheme is used by the authors, for example to correspond to the numbering of the original sequence that is modeled, this can be given as an author-provided numbering for one or more asymmetric units. See the auth_seq_id_map and orig_auth_seq_id_map parameters to AsymUnit. (The mapping between author-provided and internal numbering is given in tables such as pdbx_poly_seq_scheme in the mmCIF file.) Two maps are provided because PDB allows for two distinct author-provided schemes: the “original” author-provided numbering orig_auth_seq_id_map is entirely unrestricted but is only used internally, while auth_seq_id_map must follow certain PDB rules (and generally matches the residue numbers used in legacy PDB files). In most cases, only auth_seq_id_map is used.
Starting model numbering. If the initial state of the modeling is given by one or more PDB files, the numbering of residues in those files may not line up with the internal numbering. In this case an offset from starting model numbering to internal numbering can be provided; see the offset parameter to StartingModel.
Reference sequence numbering. The modeled sequence may differ from that in a database such as UniProt, which is itself numbered sequentially from 1 (for example, the modeled sequence may be a subset of the UniProt sequence, such that the first modeled residue is not the first residue in UniProt). The correspondence between the internal and reference sequences is given with ihm.reference.Alignment objects.
Output
Once the hierarchy of classes is complete, it can be freely inspected or
modified. All the classes are simple lightweight Python objects, generally
with the relevant data available as member variables. For example, modeling
packages such as IMP will typically
generate an IHM hierarchy from their own internal data models, but in many
cases some information relevant to IHM (such as
the associated publication) cannot be determined
automatically and can be filled in by adding more objects to the hierarchy.
The complete hierarchy can be written out to an mmCIF or BinaryCIF file using
the ihm.dumper.write() function.
Input
Hierarchies of IHM classes can also be read from mmCIF or BinaryCIF files.
This is done using the ihm.reader.read() function, which returns a list of
ihm.System objects.