Usage¶
Usage of the library for output consists of first creating a hierarchy of Python objects that together describe the system, and then dumping that hierarchy to an mmCIF file.
For a complete worked example, see the simple docking example.
The top level of the hierarchy in IHM is the ihm.System. All other objects are referenced from a System object.
Datasets¶
Any data used anywhere in the modeling (including in validation) can be referenced with an ihm.dataset.Dataset. For example, electron microscopy data is referenced with ihm.dataset.EMDensityDataset and small angle scattering data with ihm.dataset.SASDataset.
A dataset uses an ihm.location.Location object to describe where it is stored. Typically this is an ihm.location.DatabaseLocation for something that’s deposited in an experiment-specific database such as PDB, EMDB, PRIDE, or EMPIAR, or ihm.location.InputFileLocation for something that’s stored as a simple file, either on the local disk or at a location described with a DOI, such as Zenodo or a publication’s supplementary information. See the locations example for more examples.
System architecture¶
The architecture of the system is described with a number of classes:
ihm.Entity describes each unique sequence.
ihm.AsymUnit describes each asymmetric unit (chain) in the system. For example, a homodimer would consist of two asymmetric units, both pointing to the same entity, while a heterodimer contains two entities. It is also possible for an entity to exist with no asymmetric units pointing to it - this typically corresponds to something seen in an experiment (such as a cross-linking study) which was not modeled. Note that the IHM extension currently contains no support for symmetry, so two chains that are symmetrically related should each be represented as an “asymmetric” unit.
ihm.Assembly groups asymmetric units and/or entities, or parts of them. Assemblies are used to describe which parts of the system correspond to each input source of data, or which parts were modeled.
ihm.representation.Representation describes how each part of the system was represented in the modeling, for example as atoms or as coarse-grained spheres.
Restraints and sampling¶
Restraints, which score or otherwise fit the computational model against the input data, can be created as ihm.restraint.Restraint objects. These generally take as input a Dataset pointing to the input data, and an Assembly describing which part of the model the data corresponds to. For example, there are restraints for 3D EM and small angle scattering.
ihm.protocol.Protocol objects describe how models were generated from the input data. A protocol can consist of multiple steps, such as molecular dynamics or Monte Carlo, followed by one or more analyses, such as clustering, filtering, rescoring, or validation, described by ihm.analysis.Analysis objects. These objects generally take an Assembly to indicate what part of the system was considered and a group of datasets to show which data guided the modeling or analysis.
Model coordinates¶
ihm.model.Model objects give the actual coordinates of the final generated models. These point to the Assembly of what was modeled, the Protocol describing how the modeling was done, and the Representation showing how the model was represented.
Models can be grouped together for any purpose using the ihm.model.ModelGroup class. If a given group describes an ensemble of models, the ihm.model.Ensemble class allows for additional information on the ensemble to be provided, such as localization densities of parts of the system and precision. Because of their size, generally only representative models of an ensemble are deposited in mmCIF, but the Ensemble class allows the full ensemble to be referred to, for example in a more compact binary format (e.g. DCD) deposited at a given DOI. Groups of models can also be shown as corresponding to different states of the system using the ihm.model.State class.
Metadata¶
Metadata can also be added to the system, such as
ihm.Citation: publication(s) that describe this modeling or the methods used in it.
ihm.Software: software packages used to process the experimental data, generate intermediate inputs, do the modeling itself, and/or process the output.
ihm.Grant: funding support for the modeling.
ihm.reference.UniProtSequence: information in UniProt on a sequence used in modeling.
Residue numbering¶
The library keeps track of several numbering schemes to reflect the reality of the data used in modeling:
Internal numbering. Residues are always numbered sequentially starting at 1 in an Entity. All references to residues or residue ranges in the library use this numbering. For polymers, this internal numbering matches the seq_id used in the mmCIF dictionary, while for branched entities, this matches num in the dictionary. (For other types of entities (non-polymers, waters), seq_id is not used in mmCIF, but the residues are still numbered sequentially from 1 in this library.)
Author-provided numbering. If a different numbering scheme is used by the authors, for example to correspond to the numbering of the original sequence that is modeled, this can be given as an author-provided numbering for one or more asymmetric units. See the auth_seq_id_map and orig_auth_seq_id_map parameters to AsymUnit. (The mapping between author-provided and internal numbering is given in tables such as pdbx_poly_seq_scheme in the mmCIF file.) Two maps are provided because PDB allows for two distinct author-provided schemes; the “original” author-provided numbering orig_auth_seq_id_map is entirely unrestricted but is only used internally, while auth_seq_id_map must follow certain PDB rules (and generally matches the residue numbers used in legacy PDB files). In most cases, only auth_seq_id_map is used.
Starting model numbering. If the initial state of the modeling is given by one or more PDB files, the numbering of residues in those files may not line up with the internal numbering. In this case an offset from starting model numbering to internal numbering can be provided - see the offset parameter to StartingModel.
Reference sequence numbering. The modeled sequence may differ from that in a database such as UniProt, which is itself numbered sequentially from 1 (for example, the modeled sequence may be a subset of the UniProt sequence, such that the first modeled residue is not the first residue in UniProt). The correspondence between the internal and reference sequences is given with ihm.reference.Alignment objects.
Output¶
Once the hierarchy of classes is complete, it can be freely inspected or modified. All the classes are simple lightweight Python objects, generally with the relevant data available as member variables. For example, modeling packages such as IMP will typically generate an IHM hierarchy from their own internal data models, but in many cases some information relevant to IHM (such as the associated publication) cannot be determined automatically and can be filled in by adding more objects to the hierarchy.
The complete hierarchy can be written out to an mmCIF or BinaryCIF file using the ihm.dumper.write() function.
Input¶
Hierarchies of IHM classes can also be read from mmCIF or BinaryCIF files. This is done using the ihm.reader.read() function, which returns a list of ihm.System objects.