The ihm Python module

Representation of an IHM mmCIF file as a set of Python classes.

Generally class names correspond to mmCIF table names and class attributes to mmCIF attributes (with prefixes like pdbx_ stripped). For example, the data item _entity.details is found in the Entity class, as the details member.

Ordinals and IDs are generally not used in this representation (instead, pointers to objects are used).

ihm.unknown = ?

A value that isn’t known. Note that this is distinct from a value that is deliberately omitted, which is represented by Python None.
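In mmCIF files an unknown value is conventionally written as ? and an omitted (inapplicable) value as a single period. The following is a minimal sketch of why the two cases need distinct Python representations; `_Unknown`, `unknown`, and `cif_repr` are hypothetical stand-ins for illustration and are not the library's implementation.

```python
class _Unknown:
    """Stand-in sentinel for a value that is not known (hypothetical)."""

    def __repr__(self):
        return "?"


# A single shared instance, so identity checks ("is unknown") work.
unknown = _Unknown()


def cif_repr(value):
    """Return the mmCIF token for a Python value (illustrative only)."""
    if value is None:
        return "."   # deliberately omitted
    if value is unknown:
        return "?"   # a value exists, but is not known
    return str(value)


tokens = [cif_repr(v) for v in ("model", None, unknown)]
print(tokens)
```

With real objects the same distinction would be made by assigning either ihm.unknown or None to an attribute.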

class ihm.System(title=None, id='model', model_details=None)[source]

Top-level class representing a complete modeled system.

Parameters:
  • title (str) – Title (longer text description) of the system.
  • id (str) – Unique identifier for this system in the mmCIF file.
  • model_details (str) – Detailed description of the system, like an abstract.
asym_units = None

All asymmetric units used in the system. See AsymUnit.

authors = None

List of all authors of this system, as a list of strings (last name followed by initials, e.g. “Smith, A.J.”). When writing out a file, if this list is empty, the set of all citation authors (see Citation.authors) is used instead.
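The fallback described above can be sketched as follows. This is an illustration only: `effective_authors` is a hypothetical helper, citations are reduced to plain lists of author strings, and keeping first-seen order while dropping repeats is an assumption about how the "set of all citation authors" is built.

```python
def effective_authors(system_authors, citations):
    """Return the author list to write out (illustrative sketch).

    `citations` is a list of author lists, one per citation.
    """
    if system_authors:
        # An explicit author list always wins.
        return list(system_authors)
    seen = set()
    merged = []
    for citation_authors in citations:
        for name in citation_authors:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


citation_authors = [["Smith, A.J.", "Jones, B."], ["Jones, B.", "Lee, C."]]
explicit = effective_authors(["Doe, J."], citation_authors)
fallback = effective_authors([], citation_authors)
```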

citations = None

List of all citations. See Citation.

collections = None

Collections (if any) to which this entry belongs. These are used to group depositions of related entries. See Collection.

comments = None

List of plain text comments. These will be added to the top of the mmCIF file.

complete_assembly = None

The assembly of the entire system. By convention this is always the first assembly in the mmCIF file (assembly_id=1). Note that currently this isn’t filled in on output until dumper.write() is called. See Assembly.

ensembles = None

All ensembles. See Ensemble.

entities = None

All entities used in the system. See Entity.

flr_data = None

Contains the fluorescence (FLR) part. See FLRData.

grants = None

List of all grants that supported this work. See Grant.

locations = None

Locations of all extra resources. See Location.

ordered_processes = None

All ordered processes. See OrderedProcess.

orphan_assemblies = None

All orphaned assemblies in the system. See Assembly. This can be used to keep track of all assemblies that are not otherwise used - normally one is assigned to a Model, ihm.protocol.Step, or Restraint.

orphan_chem_descriptors = None

All orphaned chemical descriptors in the system. See ChemDescriptor. This can be used to track descriptors that are not otherwise used - normally one is assigned to an ihm.restraint.CrossLinkRestraint.

orphan_dataset_groups = None

All orphaned groups of datasets. This can be used to keep track of all dataset groups that are not otherwise used - normally a group is assigned to a Protocol. See DatasetGroup.

orphan_datasets = None

All orphaned datasets. This can be used to keep track of all datasets that are not otherwise used - normally a dataset is assigned to a DatasetGroup, StartingModel, Restraint, Template, or as the parent of another Dataset. See Dataset.

orphan_features = None

All orphaned features. This can be used to keep track of all features that are not otherwise used - normally a feature is assigned to a GeometricRestraint. See Feature.

orphan_geometric_objects = None

All orphaned geometric objects. This can be used to keep track of all objects that are not otherwise used - normally an object is assigned to a GeometricRestraint. See GeometricObject.

orphan_protocols = None

All orphaned modeling protocols. This can be used to keep track of all protocols that are not otherwise used - normally a protocol is assigned to a Model. See Protocol.

orphan_pseudo_sites = None

All orphaned pseudo sites. This can be used to keep track of all pseudo sites that are not otherwise used - normally a site is used in a PseudoSiteFeature or a CrossLinkPseudoSite.

orphan_representations = None

All orphaned representations of the system. This can be used to keep track of all representations that are not otherwise used - normally one is assigned to a Model. See Representation.

orphan_starting_models = None

All orphaned starting models for the system. This can be used to keep track of all starting models that are not otherwise used - normally one is assigned to an ihm.representation.Segment. See StartingModel.

report(fh=sys.stdout)[source]

Print a summary report of this system. This can be used to more easily spot errors or inconsistencies. It will also warn about missing data that may not be technically required for a compliant mmCIF file, but is usually expected to be present.

Parameters:
  • fh (file) – The file handle to print the report to (standard output by default).

restraint_groups = None

All restraint groups. See RestraintGroup.

restraints = None

All restraints on the system. See Restraint.

software = None

List of all software used in the modeling. See Software.

state_groups = None

All state groups (collections of models). See StateGroup.

update_locations_in_repositories(repos)[source]

Update all Location objects in the system that lie within a checked-out Repository to point to that repository.

This is intended for the use case where the current working directory is a checkout of a repository which is archived somewhere with a DOI. Locations can then be simply constructed pointing to local files, and retroactively updated with this method to point to the DOI if appropriate.

For each Location, if it points to a local file that is below the root of one of the repos, update it to point to that repository. If it is under multiple roots, pick the one that gives the shortest path. For example, if run in a subdirectory foo of a repository archived as repo.zip, the local path simple.pdb will be updated to be repo-top/foo/simple.pdb in repo.zip:

l = ihm.location.InputFileLocation("simple.pdb")
system.locations.append(l)

r = ihm.location.Repository(doi='1.2.3.4',
          url='https://example.com/repo.zip',
          top_directory="repo-top", root="..")
system.update_locations_in_repositories([r])
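
The path selection rule described above can be sketched with the standard library alone. This is not the library's code: `repository_path` is a hypothetical helper, repositories are reduced to (root, top_directory) pairs, and measuring "shortest" by string length is an assumption.

```python
import posixpath


def repository_path(local_path, cwd, repos):
    """Pick the repository giving the shortest in-repository path.

    `repos` is a list of (root, top_directory) pairs; `root` may be
    relative to `cwd`. Returns (index, path_in_repo), or None if the
    file is not below any root.
    """
    absolute = posixpath.normpath(posixpath.join(cwd, local_path))
    best = None
    for index, (root, top_directory) in enumerate(repos):
        abs_root = posixpath.normpath(posixpath.join(cwd, root))
        relative = posixpath.relpath(absolute, abs_root)
        if relative == ".." or relative.startswith("../"):
            continue  # the file lies outside this repository
        if best is None or len(relative) < len(best[1]):
            best = (index, relative, top_directory)
    if best is None:
        return None
    index, relative, top_directory = best
    if top_directory:
        # Paths inside the archive start at its top-level directory.
        relative = posixpath.join(top_directory, relative)
    return index, relative


result = repository_path("simple.pdb", "/work/repo/foo",
                         [("..", "repo-top")])
```

Here result reproduces the documented case: a file simple.pdb in subdirectory foo maps to repo-top/foo/simple.pdb.
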
class ihm.Software(name, classification, description, location, type='program', version=None, citation=None)[source]

Software used as part of the modeling protocol.

Parameters:
  • name (str) – The name of the software.
  • classification (str) – The major function of the software, for example ‘model building’, ‘sample preparation’, ‘data collection’.
  • description (str) – A longer text description of the software.
  • location (str) – Place where the software can be found (e.g. URL).
  • type (str) – Type of software (program/package/library/other).
  • version (str) – The version used.
  • citation (Citation) – Publication describing the software.

Generally these objects are added to System.software or passed to ihm.startmodel.StartingModel, ihm.protocol.Step, ihm.analysis.Step, or ihm.restraint.PredictedContactRestraint objects.

class ihm.Citation(pmid, title, journal, volume, page_range, year, authors, doi, is_primary=False)[source]

A publication that describes the modeling.

Generally citations are added to System.citations or passed to ihm.Software or ihm.restraint.EM3DRestraint objects.

Parameters:
  • pmid (str) – The PubMed ID.
  • title (str) – Full title of the publication.
  • journal (str) – Abbreviated journal name.
  • volume – Journal volume as int for a plain number or str for journals adding a label to the number (e.g. “46(W1)” for a web server issue).
  • page_range – The page (int) or page range (as a 2-element int tuple). Using str also works for labelled page numbers.
  • year (int) – Year of publication.
  • authors – All authors in order, as a list of strings (last name followed by initials, e.g. “Smith, A.J.”).
  • doi (str) – Digital Object Identifier of the publication.
  • is_primary (bool) – Denotes the most pertinent publication for the modeling itself (as opposed to a method or piece of software used in the protocol). Only one such publication is allowed, and it is assigned the ID “primary” in the mmCIF file.
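
The accepted forms of page_range (a single int, a 2-element tuple, or a labelled str) can be reduced to a first and last page as sketched below. `split_page_range` is a hypothetical helper, and treating a single page as having no distinct last page is an assumption, not documented behavior.

```python
def split_page_range(page_range):
    """Return (first, last) page strings for the accepted input forms."""
    if isinstance(page_range, (tuple, list)):
        first, last = page_range
        return str(first), str(last)
    # A single int, or a labelled str page: no distinct last page.
    return str(page_range), None


two_pages = split_page_range((100, 112))
one_page = split_page_range(7)
labelled = split_page_range("W389")
```
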
classmethod from_pubmed_id(pubmed_id)[source]

Create a Citation from just a PubMed ID. This is done by querying NCBI’s web API, so requires network access.

Parameters:
  • pubmed_id (int) – The PubMed identifier.
Returns: A new Citation for the given identifier.
Return type: Citation

class ihm.Grant(funding_organization, country, grant_number)[source]

Information on funding support for the modeling. See System.grants.

Parameters:
  • funding_organization (str) – The name of the organization providing the funding, e.g. “National Institutes of Health”.
  • country (str) – The country that hosts the funding organization, e.g. “United States”.
  • grant_number (str) – Identifying information for the grant, e.g. “1R01GM072999-01”.
class ihm.ChemComp(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]

A chemical component from which Entity objects are constructed. Usually these are amino acids (see LPeptideChemComp) or nucleic acids (see DNAChemComp and RNAChemComp), but non-polymers such as ligands or water (see NonPolymerChemComp and WaterChemComp) and saccharides (see SaccharideChemComp) are also supported.

For standard amino and nucleic acids, it is generally easier to use an Alphabet and refer to the components with their one-letter (amino acids, RNA) or two-letter (DNA) codes.

Parameters:
  • id (str) – A globally unique identifier for this component (usually three letters).
  • code (str) – A shorter identifier (usually one letter) that only needs to be unique in the entity.
  • code_canonical (str) – Canonical version of code (which need not be unique).
  • name (str) – A longer human-readable name for the component.
  • formula (str) – The chemical formula. This is a space-separated list of the element symbols in the component, each followed by an optional count (if omitted, 1 is assumed). The formula is terminated with the formal charge (if not zero). The element list should be sorted alphabetically, unless carbon is present, in which case C and H precede the rest of the elements. For example, water would be “H2 O” and arginine (with +1 formal charge) “C6 H15 N4 O2 1”.
  • ccd (str) – The chemical component dictionary (CCD) where this component is defined. Can be “core” for the wwPDB CCD (https://www.wwpdb.org/data/ccd), “ma” for the ModelArchive CCD, or “local” for a novel component that is defined in the mmCIF file itself. If unspecified, defaults to “core” unless descriptors is given in which case it defaults to “local”. This information is essentially ignored by python-ihm (since the IHM dictionary has no support for custom CCDs) but is used by python-modelcif.
  • descriptors (list) – When ccd is “local”, this can be one or more descriptor objects that describe the chemistry. python-ihm does not define any, but python-modelcif does.

For example, glycine would have id='GLY', code='G', code_canonical='G' while selenomethionine would use id='MSE', code='MSE', code_canonical='M', guanosine (RNA) id='G', code='G', code_canonical='G', and deoxyguanosine (DNA) id='DG', code='DG', code_canonical='G'.

formula_weight

Formula weight (dalton). This is calculated automatically from the chemical formula and known atomic masses.
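As a rough illustration of how a weight can be derived from the formula syntax described for the formula parameter, the sketch below parses the space-separated element list and the trailing formal charge. `parse_formula`, `formula_weight`, and the small `ATOMIC_MASS` table are hypothetical; the library's own mass table and parsing may differ.

```python
import re

# Standard atomic masses (dalton) for a few common elements only.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "P": 30.974, "S": 32.06}


def parse_formula(formula):
    """Split a formula into ({element: count}, formal_charge)."""
    counts = {}
    charge = 0
    for token in formula.split():
        match = re.fullmatch(r"([A-Z][a-z]?)(\d*)", token)
        if match:
            element, count = match.groups()
            # An omitted count means one atom of that element.
            counts[element] = counts.get(element, 0) + int(count or 1)
        else:
            # Anything that is not an element token is the charge.
            charge = int(token)
    return counts, charge


def formula_weight(formula):
    """Sum atomic masses over the formula; the charge is ignored."""
    counts, _charge = parse_formula(formula)
    return sum(ATOMIC_MASS[element] * n for element, n in counts.items())


water = formula_weight("H2 O")
arginine_counts, arginine_charge = parse_formula("C6 H15 N4 O2 1")
```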

class ihm.PeptideChemComp(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]

A single peptide component. Usually LPeptideChemComp is used instead (except for glycine) to specify chirality. See ChemComp for a description of the parameters.

class ihm.LPeptideChemComp(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]

A single peptide component with (normal) L- chirality. See ChemComp for a description of the parameters.

class ihm.DPeptideChemComp(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]

A single peptide component with (unusual) D- chirality. See ChemComp for a description of the parameters.

class ihm.RNAChemComp(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]

A single RNA component. See ChemComp for a description of the parameters.

class ihm.DNAChemComp(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]

A single DNA component. See ChemComp for a description of the parameters.

class ihm.SaccharideChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A saccharide chemical component. Usually a subclass that specifies the chirality and linkage (e.g. LSaccharideBetaChemComp) is used.

Parameters:
  • id (str) – A globally unique identifier for this component.
  • name (str) – A longer human-readable name for the component.
  • formula (str) – The chemical formula. See ChemComp for more details.
  • ccd (str) – The chemical component dictionary (CCD) where this component is defined. See ChemComp for more details.
  • descriptors (list) – Information on the component’s chemistry. See ChemComp for more details.
class ihm.LSaccharideChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A single saccharide component with L-chirality and unspecified linkage. See SaccharideChemComp for a description of the parameters.

class ihm.LSaccharideAlphaChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A single saccharide component with L-chirality and alpha linkage. See SaccharideChemComp for a description of the parameters.

class ihm.LSaccharideBetaChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A single saccharide component with L-chirality and beta linkage. See SaccharideChemComp for a description of the parameters.

class ihm.DSaccharideChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A single saccharide component with D-chirality and unspecified linkage. See SaccharideChemComp for a description of the parameters.

class ihm.DSaccharideAlphaChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A single saccharide component with D-chirality and alpha linkage. See SaccharideChemComp for a description of the parameters.

class ihm.DSaccharideBetaChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A single saccharide component with D-chirality and beta linkage. See SaccharideChemComp for a description of the parameters.

class ihm.NonPolymerChemComp(id, code_canonical='X', name=None, formula=None, ccd=None, descriptors=None)[source]

A non-polymer chemical component, such as a ligand or a non-standard residue (for crystal waters, use WaterChemComp).

Parameters:
  • id (str) – A globally unique identifier for this component.
  • code_canonical (str) – Canonical one-letter identifier. This is used for non-standard residues and should be the one-letter code of the closest standard residue (or by default, ‘X’).
  • name (str) – A longer human-readable name for the component.
  • formula (str) – The chemical formula. See ChemComp for more details.
  • ccd (str) – The chemical component dictionary (CCD) where this component is defined. See ChemComp for more details.
  • descriptors (list) – Information on the component’s chemistry. See ChemComp for more details.
class ihm.WaterChemComp[source]

The chemical component for crystal water.

class ihm.Alphabet[source]

A mapping from codes (usually one-letter, or two-letter for DNA) to chemical components. These classes can be used to construct sequences of components when creating an Entity. They can also be used like a Python dict to get standard components, e.g.:

a = ihm.LPeptideAlphabet()
met = a['M']
gly = a['G']

See LPeptideAlphabet, RNAAlphabet, DNAAlphabet.

class ihm.LPeptideAlphabet[source]

A mapping from one-letter amino acid codes (e.g. H, M) to L-amino acids (as LPeptideChemComp objects, except for achiral glycine which maps to PeptideChemComp). Some other common modified residues are also included (e.g. MSE). For these, the full component name (e.g. MSE) is used rather than a one-letter code.

class ihm.DPeptideAlphabet[source]

A mapping from D-amino acid codes (e.g. DHI, MED) to D-amino acids (as DPeptideChemComp objects, except for achiral glycine which maps to PeptideChemComp). See LPeptideAlphabet for more details.

class ihm.RNAAlphabet[source]

A mapping from one-letter nucleic acid codes (e.g. A) to RNA (as RNAChemComp objects).

class ihm.DNAAlphabet[source]

A mapping from two-letter nucleic acid codes (e.g. DA) to DNA (as DNAChemComp objects).

class ihm.Entity(sequence, alphabet=ihm.LPeptideAlphabet, description=None, details=None, source=None, references=[])[source]

Represent a CIF entity (with a unique sequence)

Parameters:
  • sequence (sequence) – The primary sequence, as a sequence of ChemComp objects, and/or codes looked up in alphabet.
  • alphabet (Alphabet) – The mapping from code to chemical components to use (it is not necessary to instantiate this class).
  • description (str) – A short text name for the sequence.
  • details (str) – Longer text describing the sequence.
  • source (ihm.source.Source) – The method by which the sample for this entity was produced.
  • references (sequence of ihm.reference.Reference objects) – Information about this entity stored in external databases (for example the sequence in UniProt)

The sequence for an entity can be specified explicitly as a list of chemical components, or (more usually) as a list or string of codes, or a mixture of both. For example:

# Construct with a string of one-letter amino acid codes
protein = ihm.Entity('AHMD')
# Some less common amino acids (e.g. MSE) have three-letter codes
protein_with_mse = ihm.Entity(['A', 'H', 'MSE', 'D'])

# Can use a non-default alphabet to make DNA or RNA sequences
dna = ihm.Entity(('DA', 'DC'), alphabet=ihm.DNAAlphabet)
rna = ihm.Entity('AC', alphabet=ihm.RNAAlphabet)

# Can pass explicit ChemComp objects by looking them up in Alphabets
dna_al = ihm.DNAAlphabet()
rna_al = ihm.RNAAlphabet()
dna_rna_hybrid = ihm.Entity((dna_al['DG'], rna_al['C']))

# For unusual components (e.g. modified residues or ligands),
# new ChemComp objects can be constructed
psu = ihm.RNAChemComp(id='PSU', code='PSU', code_canonical='U',
                      name="PSEUDOURIDINE-5'-MONOPHOSPHATE",
                      formula='C9 H13 N2 O9 P')
rna_with_psu = ihm.Entity(('A', 'C', psu), alphabet=ihm.RNAAlphabet)

For more examples, see the ligands and water example.

All entities should be stored in the top-level System object; see System.entities.
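
To make the "mixture of both" rule concrete, the following sketch resolves a sequence of codes and component objects against an alphabet. It does not use the library: `Comp`, `PEPTIDE`, and `resolve_sequence` are hypothetical stand-ins, and the alphabet is reduced to a plain dict.

```python
from collections import namedtuple

# Hypothetical stand-in for a chemical component.
Comp = namedtuple("Comp", ["id", "code", "code_canonical"])

# Tiny stand-in alphabet: code -> component.
PEPTIDE = {
    "A": Comp("ALA", "A", "A"),
    "H": Comp("HIS", "H", "H"),
    "D": Comp("ASP", "D", "D"),
    "MSE": Comp("MSE", "MSE", "M"),
}


def resolve_sequence(sequence, alphabet):
    """Turn a mix of codes and Comp objects into a tuple of Comps."""
    resolved = []
    for item in sequence:
        if isinstance(item, Comp):
            resolved.append(item)            # already a component
        else:
            resolved.append(alphabet[item])  # look the code up
    return tuple(resolved)


# A string is iterated one character (one-letter code) at a time.
from_string = resolve_sequence("AHD", PEPTIDE)
ligand = Comp("HEM", "HEM", "X")
mixed = resolve_sequence(["A", "MSE", ligand], PEPTIDE)
canonical = "".join(comp.code_canonical for comp in mixed)
```

Multi-letter codes such as MSE only work in a list or tuple, since iterating a string would split them into single characters.
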

branch_descriptors = None

String descriptors of branched chemical structure. These generally only make sense for oligosaccharide entities, and should be a list of BranchDescriptor objects.

branch_links = None

Any links between components in a branched entity. This is a list of BranchLink objects.

formula_weight

Formula weight (dalton). This is calculated automatically from that of the chemical components.

is_branched()[source]

Return True iff this entity is branched (generally an oligosaccharide)

is_polymeric()[source]

Return True iff this entity represents a polymer, such as an amino acid sequence or DNA/RNA chain (and not a ligand or water)

residue(seq_id)[source]

Get a Residue at the given sequence position

seq_id_range

Sequence range

class ihm.EntityRange(entity, seq_id_begin, seq_id_end)[source]

Part of an entity. Usually these objects are created from an Entity, e.g. to get a range covering residues 4 through 7 in entity use:

entity = ihm.Entity(sequence=...)
rng = entity(4,7)
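
The call syntax above relies on the object being callable. A minimal sketch of that pattern, using hypothetical `SketchEntity` and `SketchRange` classes rather than the real ones, might look like this:

```python
class SketchEntity:
    """Hypothetical stand-in for an entity with a 1-based sequence."""

    def __init__(self, sequence):
        self.sequence = tuple(sequence)

    def __call__(self, seq_id_begin, seq_id_end):
        # Calling the object yields a range over part of it.
        return SketchRange(self, seq_id_begin, seq_id_end)


class SketchRange:
    """Hypothetical stand-in for a range; both ends are inclusive."""

    def __init__(self, entity, seq_id_begin, seq_id_end):
        self.entity = entity
        self.seq_id_range = (seq_id_begin, seq_id_end)

    def residues(self):
        begin, end = self.seq_id_range
        # seq_id is 1-based; Python slices are 0-based, end-exclusive.
        return self.entity.sequence[begin - 1:end]


entity = SketchEntity("ACDEFGHIK")
rng = entity(4, 7)
```
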
class ihm.AsymUnit(entity, details=None, auth_seq_id_map=0, id=None, strand_id=None)[source]

An asymmetric unit, i.e. a unique instance of an Entity that was modeled.

Note that this class should not be used to describe crystal waters; for that, see WaterAsymUnit.

Parameters:
  • entity (Entity) – The unique sequence of this asymmetric unit.
  • details (str) – Longer text description of this unit.
  • auth_seq_id_map – Mapping from internal 1-based consecutive residue numbering (seq_id) to “author-provided” numbering (auth_seq_id plus an optional ins_code). This can be either an int offset, in which case auth_seq_id = seq_id + auth_seq_id_map with no insertion codes, or a mapping type (dict, list, tuple), in which case auth_seq_id = auth_seq_id_map[seq_id] with no insertion codes, or auth_seq_id, ins_code = auth_seq_id_map[seq_id] - i.e. the output of the mapping is either the author-provided number, or a 2-element tuple containing that number and an insertion code. (Note that if a list or tuple is used for the mapping, the first element in the list or tuple does not correspond to the first residue and will never be used - since seq_id can never be zero.) The default if not specified, or not in the mapping, is for auth_seq_id == seq_id and for no insertion codes to be used.
  • id (str) – User-specified ID (usually a string of one or more upper-case letters, e.g. A, B, C, AA). If not specified, IDs are automatically assigned alphabetically.
  • strand_id (str) – PDB or “author-provided” strand/chain ID. If not specified, it will be the same as the regular ID.
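
The auth_seq_id_map rules can be summarized in a few lines of plain Python. `author_numbering` is a hypothetical helper written from the description above, not the library's implementation.

```python
def author_numbering(seq_id, auth_seq_id_map=0):
    """Return (auth_seq_id, ins_code) for a 1-based seq_id."""
    if isinstance(auth_seq_id_map, int):
        # Plain offset, never an insertion code.
        return seq_id + auth_seq_id_map, None
    try:
        mapped = auth_seq_id_map[seq_id]
    except (KeyError, IndexError):
        # Not in the mapping: fall back to auth_seq_id == seq_id.
        return seq_id, None
    if isinstance(mapped, tuple):
        return mapped  # (auth_seq_id, ins_code)
    return mapped, None


default = author_numbering(5)
offset = author_numbering(5, 10)
with_ins_code = author_numbering(2, {2: (7, "A")})
# In a list, index 0 is a placeholder because seq_id starts at 1.
from_list = author_numbering(1, [None, 42])
```
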

See System.asym_units.
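
For the automatically assigned IDs, one plausible scheme is sketched below: single letters first, then two-letter combinations. The ordering beyond Z is an assumption based on the "A, B, C, AA" example for the id parameter, and `asym_ids` is a hypothetical generator.

```python
import itertools
import string


def asym_ids():
    """Yield A..Z, then AA, AB, ... (assumed ordering)."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_uppercase,
                                         repeat=length):
            yield "".join(letters)


first_28 = list(itertools.islice(asym_ids(), 28))
```
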

residue(seq_id)[source]

Get a Residue at the given sequence position

segment(gapped_sequence, seq_id_begin, seq_id_end)[source]

Get an object representing the alignment of part of this sequence.

Parameters:
  • gapped_sequence (str) – Sequence of the segment, including gaps.
  • seq_id_begin (int) – Start of the segment.
  • seq_id_end (int) – End of the segment.
seq_id_range

Sequence range

sequence

Primary sequence

strand_id

PDB or author-provided strand/chain ID

class ihm.AsymUnitRange(asym, seq_id_begin, seq_id_end)[source]

Part of an asymmetric unit. Usually these objects are created from an AsymUnit, e.g. to get a range covering residues 4 through 7 in asym use:

asym = ihm.AsymUnit(entity)
rng = asym(4,7)
class ihm.WaterAsymUnit(entity, number, details=None, auth_seq_id_map=0, id=None, strand_id=None)[source]

A collection of crystal waters, all with the same “chain” ID.

Parameters:
  • number (int) – The number of water molecules in this unit.

For more information on this class and the rest of the parameters, see AsymUnit.

number_of_molecules

Number of molecules

seq_id_range

Sequence range

sequence

Primary sequence

class ihm.Atom(residue, id)[source]

A single atom in an entity or asymmetric unit. Usually these objects are created by calling Residue.atom().

Note that this class does not store atomic coordinates of a given atom in a given model; for that, see ihm.model.Atom.

class ihm.Residue(seq_id, entity=None, asym=None)[source]

A single residue in an entity or asymmetric unit. Usually these objects are created by calling Entity.residue() or AsymUnit.residue().

atom(atom_id)[source]

Get an Atom in this residue with the given name.

auth_seq_id

Author-provided seq_id; only makes sense for asymmetric units

comp

Chemical component (residue type)

ins_code

Insertion code; only makes sense for asymmetric units

class ihm.Assembly(elements=(), name=None, description=None)[source]

A collection of parts of the system that were modeled or probed together.

Parameters:
  • elements (sequence) – Initial set of parts of the system.
  • name (str) – Short text name of this assembly.
  • description (str) – Longer text that describes this assembly.

This is implemented as a simple list of asymmetric units (or parts of them), i.e. a list of AsymUnit and/or AsymUnitRange objects. An Assembly is typically assigned to one or more objects that use it, such as a Model, ihm.protocol.Step, or Restraint.

See also System.complete_assembly and System.orphan_assemblies.

Note that any duplicate assemblies will be pruned on output.
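
Pruning of duplicates can be pictured as below. This is only a sketch: `prune_duplicates` is hypothetical, assembly elements are reduced to plain strings and tuples, and treating two assemblies as duplicates when they hold equal elements in the same order is an assumption.

```python
def prune_duplicates(assemblies):
    """Keep the first of each group of assemblies with equal contents.

    Each assembly is a list of hashable elements, e.g. an asym ID string
    or an (asym_id, seq_id_begin, seq_id_end) tuple for a range.
    """
    seen = set()
    unique = []
    for assembly in assemblies:
        key = tuple(assembly)
        if key not in seen:
            seen.add(key)
            unique.append(assembly)
    return unique


complete = ["A", "B"]
part = ["A", ("B", 4, 7)]
pruned = prune_duplicates([complete, part, ["A", "B"]])
```
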

parent = None

Assembly that is the immediate parent in a hierarchy, or None

class ihm.ChemDescriptor(auth_name, chem_comp_id=None, chemical_name=None, common_name=None, smiles=None, smiles_canonical=None, inchi=None, inchi_key=None)[source]

Description of a non-polymeric chemical component used in the experiment. For example, this might be a fluorescent probe or cross-linking agent. This class describes the chemical structure of the component, for example with a SMILES or INCHI descriptor, so that it is uniquely defined. A descriptor is typically assigned to an ihm.restraint.CrossLinkRestraint.

See ihm.cross_linkers for chemical descriptors of some commonly-used cross-linking agents.

Parameters:
  • auth_name (str) – Author-provided name
  • chem_comp_id (str) – If this chemical is listed in the Chemical Component Dictionary, its three-letter identifier
  • chemical_name (str) – The systematic (IUPAC) chemical name
  • common_name (str) – Common name for the component
  • smiles (str) – SMILES string
  • smiles_canonical (str) – Canonical SMILES string
  • inchi (str) – IUPAC INCHI descriptor
  • inchi_key (str) – Hashed INCHI key

See also System.orphan_chem_descriptors.

class ihm.Collection(id, name=None, details=None)[source]

A collection of entries belonging to a single deposition or group. These are used by the archive to group multiple related entries, e.g. all entries deposited as part of a given study, or all models for a genome. An entry (System) can belong to multiple collections.

Parameters:
  • id (str) – Unique identifier (assigned by the archive).
  • name (str) – Short name for the collection.
  • details (str) – Longer description of the collection.

See also System.collections.

class ihm.BranchDescriptor(text, type, program=None, program_version=None)[source]

String descriptor of branched chemical structure. These generally only make sense for oligosaccharide entities. See Entity.branch_descriptors.

Parameters:
  • text (str) – The value of this descriptor.
  • type (str) – The type of the descriptor; one of “Glycam Condensed Core Sequence”, “Glycam Condensed Sequence”, “LINUCS”, or “WURCS”.
  • program (str) – The name of the program or library used to compute the descriptor.
  • program_version (str) – The version of the program or library used to compute the descriptor.

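class ihm.BranchLink(num1, atom_id1, leaving_atom_id1, num2, atom_id2, leaving_atom_id2, order=None, details=None)

A link between components in a branched entity. These generally only make sense for oligosaccharide entities. See Entity.branch_links.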

Parameters:
  • num1 (int) – 1-based index of the first component.
  • atom_id1 (str) – Name of the first atom in the linkage.
  • leaving_atom_id1 (str) – Name of the first leaving atom.
  • num2 (int) – 1-based index of the second component.
  • atom_id2 (str) – Name of the second atom in the linkage.
  • leaving_atom_id2 (str) – Name of the second leaving atom.
  • order (str) – Bond order (e.g. sing, doub, trip).
  • details (str) – More information about this link.