The ihm Python module

Representation of an IHM mmCIF file as a set of Python classes.

Generally class names correspond to mmCIF table names and class attributes to mmCIF attributes (with prefixes like pdbx_ stripped). For example, the data item _entity.details is found in the Entity class, as the details member.

Ordinals and IDs are generally not used in this representation (instead, pointers to objects are used).

ihm.unknown = ?

A value that isn’t known. Note that this is distinct from a value that is deliberately omitted, which is represented by Python None.
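The distinction matters most when values are written out. The sketch below is library-independent and only illustrates the idea: `UNKNOWN` and `cif_token` are hypothetical names, not part of ihm. It relies on the usual mmCIF convention that `?` marks an unknown value and `.` marks one that is omitted or inapplicable.

```python
class _Unknown:
    """Hypothetical stand-in for a 'value is not known' sentinel."""
    def __repr__(self):
        return "?"


UNKNOWN = _Unknown()


def cif_token(value):
    """Return the text that would stand for `value` in an mmCIF file."""
    if value is None:
        return "."       # deliberately omitted
    if value is UNKNOWN:
        return "?"       # a value exists, but it is not known
    return str(value)


# One known, one omitted and one unknown value
print([cif_token(v) for v in ("model", None, UNKNOWN)])
```

Comparing with `is` rather than `==` keeps the sentinel from being confused with an ordinary string that happens to be "?".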

class ihm.System(title=None, id='model', model_details=None)[source]

Top-level class representing a complete modeled system.

Parameters:
  • title (str) – Title (longer text description) of the system.

  • id (str) – Unique identifier for this system in the mmCIF file.

  • model_details (str) – Detailed description of the system, like an abstract.

asym_units

All asymmetric units used in the system. See AsymUnit.

authors

List of all authors of this system, as a list of strings (last name followed by initials, e.g. “Smith, A.J.”). When writing out a file, if this list is empty, the set of all citation authors (see Citation) is used instead.

citations

List of all citations. See Citation.

collections

Collections (if any) to which this entry belongs. These are used to group depositions of related entries. See Collection.

comments

List of plain text comments. These will be added to the top of the mmCIF file.

complete_assembly

The assembly of the entire system. By convention this is always the first assembly in the mmCIF file (assembly_id=1). Note that currently this isn’t filled in on output until dumper.write() is called. See Assembly.

ensembles

All ensembles. See Ensemble.

entities

All entities used in the system. See Entity.

flr_data

Contains the fluorescence (FLR) part. See FLRData.

grants

List of all grants that supported this work. See Grant.

locations

Locations of all extra resources. See Location.

multi_state_schemes

All multi-state schemes. See MultiStateScheme.

ordered_processes

All ordered processes. See OrderedProcess.

orphan_assemblies

All orphaned assemblies in the system. See Assembly. This can be used to keep track of all assemblies that are not otherwise used - normally one is assigned to a Model, ihm.protocol.Step, or Restraint.

orphan_chem_descriptors

All orphaned chemical descriptors in the system. See ChemDescriptor. This can be used to track descriptors that are not otherwise used - normally one is assigned to an ihm.restraint.CrossLinkRestraint.

orphan_dataset_groups

All orphaned groups of datasets. This can be used to keep track of all dataset groups that are not otherwise used - normally a group is assigned to a Protocol. See DatasetGroup.

orphan_datasets

All orphaned datasets. This can be used to keep track of all datasets that are not otherwise used - normally a dataset is assigned to a DatasetGroup, StartingModel, Restraint, Template, or as the parent of another Dataset. See Dataset.

orphan_features

All orphaned features. This can be used to keep track of all features that are not otherwise used - normally a feature is assigned to a GeometricRestraint. See Feature.

orphan_geometric_objects

All orphaned geometric objects. This can be used to keep track of all objects that are not otherwise used - normally an object is assigned to a GeometricRestraint. See GeometricObject.

orphan_protocols

All orphaned modeling protocols. This can be used to keep track of all protocols that are not otherwise used - normally a protocol is assigned to a Model. See Protocol.

orphan_pseudo_sites

All orphaned pseudo sites. This can be used to keep track of all pseudo sites that are not otherwise used - normally a site is used in a PseudoSiteFeature or a CrossLinkPseudoSite.

orphan_representations

All orphaned representations of the system. This can be used to keep track of all representations that are not otherwise used - normally one is assigned to a Model. See Representation.

orphan_starting_models

All orphaned starting models for the system. This can be used to keep track of all starting models that are not otherwise used - normally one is assigned to an ihm.representation.Segment. See StartingModel.

report(fh=sys.stdout)[source]

Print a summary report of this system. This can be used to more easily spot errors or inconsistencies. It will also warn about missing data that may not be technically required for a compliant mmCIF file, but is usually expected to be present.

Parameters:

fh (file) – The file handle to print the report to, if not standard output.

restraint_groups

All restraint groups. See RestraintGroup.

restraints

All restraints on the system. See Restraint.

software

List of all software used in the modeling. See Software.

state_groups

All state groups (collections of models). See StateGroup.

update_locations_in_repositories(repos)[source]

Update all Location objects in the system that lie within a checked-out Repository to point to that repository.

This is intended for the use case where the current working directory is a checkout of a repository which is archived somewhere with a DOI. Locations can then be simply constructed pointing to local files, and retroactively updated with this method to point to the DOI if appropriate.

For each Location, if it points to a local file that is below the root of one of the repos, update it to point to that repository. If it is under multiple roots, pick the one that gives the shortest path. For example, if run in a subdirectory foo of a repository archived as repo.zip, the local path simple.pdb will be updated to be repo-top/foo/simple.pdb in repo.zip:

l = ihm.location.InputFileLocation("simple.pdb")
system.locations.append(l)

r = ihm.location.Repository(doi='1.2.3.4',
          url='https://example.com/repo.zip',
          top_directory="repo-top", root="..")
system.update_locations_in_repositories([r])

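The "shortest path" rule described above can be sketched with the standard library alone. This is an illustration of the selection logic under stated assumptions, not the library's implementation: `pick_repository` is a hypothetical helper, repositories are reduced to their root directories, and "shortest" is taken to mean the shortest relative path string.

```python
import os


def pick_repository(local_path, repo_roots):
    """Return (root, relative_path) for the enclosing root that yields the
    shortest relative path, or None if no root contains the file."""
    path = os.path.abspath(local_path)
    best = None
    for root in repo_roots:
        rel = os.path.relpath(path, os.path.abspath(root))
        # A relative path that climbs upward means the file lies outside root
        if rel == os.pardir or rel.startswith(os.pardir + os.sep):
            continue
        if best is None or len(rel) < len(best[1]):
            best = (root, rel)
    return best


# A file nested under two candidate roots: the deeper root wins because
# it gives the shorter relative path.
print(pick_repository("/work/repo/foo/simple.pdb",
                      ["/work", "/work/repo", "/other"]))
```

In the documented example the repository's top_directory ("repo-top") is then prefixed to the relative path, giving repo-top/foo/simple.pdb.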
class ihm.Software(name, classification, description, location, type='program', version=None, citation=None)[source]

Software used as part of the modeling protocol.

Parameters:
  • name (str) – The name of the software.

  • classification (str) – The major function of the software, for example ‘model building’, ‘sample preparation’, ‘data collection’.

  • description (str) – A longer text description of the software.

  • location (str) – Place where the software can be found (e.g. URL).

  • type (str) – Type of software (program/package/library/other).

  • version (str) – The version used.

  • citation (Citation) – Publication describing the software.

Generally these objects are added to System.software or passed to ihm.startmodel.StartingModel, ihm.protocol.Step, ihm.analysis.Step, or ihm.restraint.PredictedContactRestraint objects.

class ihm.Citation(pmid, title, journal, volume, page_range, year, authors, doi, is_primary=False)[source]

A publication that describes the modeling.

Generally citations are added to System.citations or passed to ihm.Software or ihm.restraint.EM3DRestraint objects.

Parameters:
  • pmid (str) – The PubMed ID.

  • title (str) – Full title of the publication.

  • journal (str) – Abbreviated journal name.

  • volume – Journal volume as int for a plain number or str for journals adding a label to the number (e.g. “46(W1)” for a web server issue).

  • page_range – The page (int) or page range (as a 2-element int tuple). Using str also works for labelled page numbers.

  • year (int) – Year of publication.

  • authors – All authors in order, as a list of strings (last name followed by initials, e.g. “Smith, A.J.”).

  • doi (str) – Digital Object Identifier of the publication.

  • is_primary (bool) – Denotes the most pertinent publication for the modeling itself (as opposed to a method or piece of software used in the protocol). Only one such publication is allowed, and it is assigned the ID “primary” in the mmCIF file.
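Because page_range accepts several shapes, code that consumes it has to normalize them. The sketch below shows one way to do that, assuming the goal is a first page and last page pair like the one held in the mmCIF citation category. `split_page_range` is a hypothetical helper and not part of ihm.

```python
def split_page_range(page_range):
    """Return (first_page, last_page); last_page is None for a single page."""
    if isinstance(page_range, (tuple, list)):
        first, last = page_range      # 2-element range, e.g. (101, 110)
        return first, last
    # A single page given as an int, or a labelled page given as a str
    return page_range, None


for value in (42, (101, 110), "e1005"):
    print(value, "->", split_page_range(value))
```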

classmethod from_pubmed_id(pubmed_id)[source]

Create a Citation from just a PubMed ID. This is done by querying NCBI’s web API, so requires network access.

Parameters:

pubmed_id (int) – The PubMed identifier.

Returns:

A new Citation for the given identifier.

Return type:

Citation

class ihm.Grant(funding_organization, country, grant_number)[source]

Information on funding support for the modeling. See System.grants.

Parameters:
  • funding_organization (str) – The name of the organization providing the funding, e.g. “National Institutes of Health”.

  • country (str) – The country that hosts the funding organization, e.g. “United States”.

  • grant_number (str) – Identifying information for the grant, e.g. “1R01GM072999-01”.

class ihm.ChemComp(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]

A chemical component from which Entity objects are constructed. Usually these are amino acids (see LPeptideChemComp) or nucleic acids (see DNAChemComp and RNAChemComp), but non-polymers such as ligands or water (see NonPolymerChemComp and WaterChemComp) and saccharides (see SaccharideChemComp) are also supported.

For standard amino and nucleic acids, it is generally easier to use an Alphabet and refer to the components with their one-letter (amino acids, RNA) or two-letter (DNA) codes.

Parameters:
  • id (str) – A globally unique identifier for this component (usually three letters).

  • code (str) – A shorter identifier (usually one letter) that only needs to be unique in the entity.

  • code_canonical (str) – Canonical version of code (which need not be unique).

  • name (str) – A longer human-readable name for the component.

  • formula (str) – The chemical formula. This is a space-separated list of the element symbols in the component, each followed by an optional count (if omitted, 1 is assumed). The formula is terminated with the formal charge (if not zero). The element list should be sorted alphabetically, unless carbon is present, in which case C and H precede the rest of the elements. For example, water would be “H2 O” and arginine (with +1 formal charge) “C6 H15 N4 O2 1”.

  • ccd (str) – The chemical component dictionary (CCD) where this component is defined. Can be “core” for the wwPDB CCD (https://www.wwpdb.org/data/ccd), “ma” for the ModelArchive CCD, or “local” for a novel component that is defined in the mmCIF file itself. If unspecified, defaults to “core” unless descriptors is given in which case it defaults to “local”. This information is essentially ignored by python-ihm (since the IHM dictionary has no support for custom CCDs) but is used by python-modelcif.

  • descriptors (list) – When ccd is “local”, this can be one or more descriptor objects that describe the chemistry. python-ihm does not define any, but python-modelcif does.

For example, glycine would have id='GLY', code='G', code_canonical='G' while selenomethionine would use id='MSE', code='MSE', code_canonical='M', guanosine (RNA) id='G', code='G', code_canonical='G', and deoxyguanosine (DNA) id='DG', code='DG', code_canonical='G'.

property formula_weight

Formula weight (dalton). This is calculated automatically from the chemical formula and known atomic masses.
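To make the formula format concrete, here is a minimal sketch of how a weight could be derived from such a string. It is not the library's own code: `formula_weight` and `ATOMIC_MASS` are illustrative names, the mass table covers only a few elements, and an element missing from the table raises KeyError.

```python
# Standard atomic masses (dalton) for a handful of common elements only
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "P": 30.974, "S": 32.06}


def formula_weight(formula):
    """Sum atomic masses for a space-separated formula such as 'H2 O'."""
    total = 0.0
    for token in formula.split():
        symbol = token.rstrip("0123456789")   # element symbol, if any
        count = token[len(symbol):]           # trailing digits, maybe empty
        if not symbol.isalpha():
            continue   # trailing formal charge such as "1" or "-1"
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


print(round(formula_weight("H2 O"), 3))            # 18.015
print(round(formula_weight("C6 H15 N4 O2 1"), 3))  # 175.212
```

The trailing "1" in the arginine formula is the formal charge, so it contributes nothing to the weight.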

class ihm.PeptideChemComp(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]

A single peptide component. Usually LPeptideChemComp is used instead (except for glycine) to specify chirality. See ChemComp for a description of the parameters.

class ihm.LPeptideChemComp(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]

A single peptide component with (normal) L- chirality. See ChemComp for a description of the parameters.

class ihm.DPeptideChemComp(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]

A single peptide component with (unusual) D- chirality. See ChemComp for a description of the parameters.

class ihm.RNAChemComp(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]

A single RNA component. See ChemComp for a description of the parameters.

class ihm.DNAChemComp(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]

A single DNA component. See ChemComp for a description of the parameters.

class ihm.SaccharideChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A saccharide chemical component. Usually a subclass that specifies the chirality and linkage (e.g. LSaccharideBetaChemComp) is used.

Parameters:
  • id (str) – A globally unique identifier for this component.

  • name (str) – A longer human-readable name for the component.

  • formula (str) – The chemical formula. See ChemComp for more details.

  • ccd (str) – The chemical component dictionary (CCD) where this component is defined. See ChemComp for more details.

  • descriptors (list) – Information on the component’s chemistry. See ChemComp for more details.

class ihm.LSaccharideChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A single saccharide component with L-chirality and unspecified linkage. See SaccharideChemComp for a description of the parameters.

class ihm.LSaccharideAlphaChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A single saccharide component with L-chirality and alpha linkage. See SaccharideChemComp for a description of the parameters.

class ihm.LSaccharideBetaChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A single saccharide component with L-chirality and beta linkage. See SaccharideChemComp for a description of the parameters.

class ihm.DSaccharideChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A single saccharide component with D-chirality and unspecified linkage. See SaccharideChemComp for a description of the parameters.

class ihm.DSaccharideAlphaChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A single saccharide component with D-chirality and alpha linkage. See SaccharideChemComp for a description of the parameters.

class ihm.DSaccharideBetaChemComp(id, name=None, formula=None, ccd=None, descriptors=None)[source]

A single saccharide component with D-chirality and beta linkage. See SaccharideChemComp for a description of the parameters.

class ihm.NonPolymerChemComp(id, code_canonical='X', name=None, formula=None, ccd=None, descriptors=None)[source]

A non-polymer chemical component, such as a ligand or a non-standard residue (for crystal waters, use WaterChemComp).

Parameters:
  • id (str) – A globally unique identifier for this component.

  • code_canonical (str) – Canonical one-letter identifier. This is used for non-standard residues and should be the one-letter code of the closest standard residue (or by default, ‘X’).

  • name (str) – A longer human-readable name for the component.

  • formula (str) – The chemical formula. See ChemComp for more details.

  • ccd (str) – The chemical component dictionary (CCD) where this component is defined. See ChemComp for more details.

  • descriptors (list) – Information on the component’s chemistry. See ChemComp for more details.

class ihm.WaterChemComp[source]

The chemical component for crystal water.

class ihm.Alphabet[source]

A mapping from codes (usually one-letter, or two-letter for DNA) to chemical components. These classes can be used to construct sequences of components when creating an Entity. They can also be used like a Python dict to get standard components, e.g.:

a = ihm.LPeptideAlphabet()
met = a['M']
gly = a['G']

See LPeptideAlphabet, RNAAlphabet, DNAAlphabet.

class ihm.LPeptideAlphabet[source]

A mapping from one-letter amino acid codes (e.g. H, M) to L-amino acids (as LPeptideChemComp objects, except for achiral glycine which maps to PeptideChemComp). Some other common modified residues are also included (e.g. MSE). For these their full name rather than a one-letter code is used.

class ihm.DPeptideAlphabet[source]

A mapping from D-amino acid codes (e.g. DHI, MED) to D-amino acids (as DPeptideChemComp objects, except for achiral glycine which maps to PeptideChemComp). See LPeptideAlphabet for more details.

class ihm.RNAAlphabet[source]

A mapping from one-letter nucleic acid codes (e.g. A) to RNA (as RNAChemComp objects).

class ihm.DNAAlphabet[source]

A mapping from two-letter nucleic acid codes (e.g. DA) to DNA (as DNAChemComp objects).

class ihm.Entity(sequence, alphabet=ihm.LPeptideAlphabet, description=None, details=None, source=None, references=[])[source]

Represent a CIF entity (with a unique sequence)

Parameters:
  • sequence (sequence) – The primary sequence, as a sequence of ChemComp objects, and/or codes looked up in alphabet.

  • alphabet (Alphabet) – The mapping from code to chemical components to use (it is not necessary to instantiate this class).

  • description (str) – A short text name for the sequence.

  • details (str) – Longer text describing the sequence.

  • source (ihm.source.Source) – The method by which the sample for this entity was produced.

  • references (sequence of ihm.reference.Reference objects) – Information about this entity stored in external databases (for example the sequence in UniProt)

The sequence for an entity can be specified explicitly as a list of chemical components, or (more usually) as a list or string of codes, or a mixture of both. For example:

# Construct with a string of one-letter amino acid codes
protein = ihm.Entity('AHMD')
# Some less common amino acids (e.g. MSE) have three-letter codes
protein_with_mse = ihm.Entity(['A', 'H', 'MSE', 'D'])

# Can use a non-default alphabet to make DNA or RNA sequences
dna = ihm.Entity(('DA', 'DC'), alphabet=ihm.DNAAlphabet)
rna = ihm.Entity('AC', alphabet=ihm.RNAAlphabet)

# Can pass explicit ChemComp objects by looking them up in Alphabets
dna_al = ihm.DNAAlphabet()
rna_al = ihm.RNAAlphabet()
dna_rna_hybrid = ihm.Entity((dna_al['DG'], rna_al['C']))

# For unusual components (e.g. modified residues or ligands),
# new ChemComp objects can be constructed
psu = ihm.RNAChemComp(id='PSU', code='PSU', code_canonical='U',
                      name="PSEUDOURIDINE-5'-MONOPHOSPHATE",
                      formula='C9 H13 N2 O9 P')
rna_with_psu = ihm.Entity(('A', 'C', psu), alphabet=ihm.RNAAlphabet)

For more examples, see the ligands and water example.

All entities should be stored in the top-level System object; see System.entities.

branch_descriptors

String descriptors of branched chemical structure. These generally only make sense for oligosaccharide entities, and should be a list of BranchDescriptor objects.

branch_links

Any links between components in a branched entity. This is a list of BranchLink objects.

property formula_weight

Formula weight (dalton). This is calculated automatically from that of the chemical components.

is_branched()[source]

Return True iff this entity is branched (generally an oligosaccharide)

is_polymeric()[source]

Return True iff this entity represents a polymer, such as an amino acid sequence or DNA/RNA chain (and not a ligand or water)

residue(seq_id)[source]

Get a Residue at the given sequence position

property seq_id_range

Sequence range

class ihm.EntityRange(entity, seq_id_begin, seq_id_end)[source]

Part of an entity. Usually these objects are created from an Entity, e.g. to get a range covering residues 4 through 7 in entity use:

entity = ihm.Entity(sequence=...)
rng = entity(4, 7)

class ihm.AsymUnit(entity, details=None, auth_seq_id_map=0, id=None, strand_id=None, orig_auth_seq_id_map=None)[source]

An asymmetric unit, i.e. a unique instance of an Entity that was modeled.

Note that this class should not be used to describe crystal waters; for that, see WaterAsymUnit.

Parameters:
  • entity (Entity) – The unique sequence of this asymmetric unit.

  • details (str) – Longer text description of this unit.

  • auth_seq_id_map – Mapping from internal 1-based consecutive residue numbering (seq_id) to PDB “author-provided” numbering (auth_seq_id plus an optional ins_code). This can either be an int offset, in which case auth_seq_id = seq_id + auth_seq_id_map with no insertion codes, or a mapping type (dict, list, tuple) in which case auth_seq_id = auth_seq_id_map[seq_id] with no insertion codes, or auth_seq_id, ins_code = auth_seq_id_map[seq_id] - i.e. the output of the mapping is either the author-provided number, or a 2-element tuple containing that number and an insertion code. (Note that if a list or tuple is used for the mapping, the first element in the list or tuple does not correspond to the first residue and will never be used - since seq_id can never be zero.) The default if not specified, or not in the mapping, is for auth_seq_id == seq_id and for no insertion codes to be used.

  • id (str) – User-specified ID (usually a string of one or more upper-case letters, e.g. A, B, C, AA). If not specified, IDs are automatically assigned alphabetically.

  • strand_id (str) – PDB or “author-provided” strand/chain ID. If not specified, it will be the same as the regular ID.

  • orig_auth_seq_id_map – Mapping from internal 1-based consecutive residue numbering (seq_id) to original “author-provided” numbering. This differs from auth_seq_id_map as the original numbering need not follow any defined scheme, while auth_seq_id_map must follow certain PDB-defined rules. This can be any mapping type (dict, list, tuple) in which case orig_auth_seq_id = orig_auth_seq_id_map[seq_id]. If the mapping is None (the default), or a given seq_id cannot be found in the mapping, orig_auth_seq_id = auth_seq_id. This mapping is only used in the various scheme tables, such as pdbx_poly_seq_scheme.

See System.asym_units.
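The auth_seq_id_map rules above can be restated as a short standalone function. This sketch follows the parameter description only. `resolve_auth_seq_id` is a hypothetical helper, not an ihm function, and it returns None where no insertion code applies.

```python
def resolve_auth_seq_id(seq_id, auth_seq_id_map=0):
    """Return (auth_seq_id, ins_code) for a 1-based seq_id."""
    if isinstance(auth_seq_id_map, int):
        # Plain offset: no insertion codes
        return seq_id + auth_seq_id_map, None
    try:
        value = auth_seq_id_map[seq_id]
    except (KeyError, IndexError):
        # seq_id is not in the mapping: fall back to auth_seq_id == seq_id
        return seq_id, None
    if isinstance(value, tuple):
        return value                  # already (auth_seq_id, ins_code)
    return value, None


print(resolve_auth_seq_id(5, 10))                        # (15, None)
print(resolve_auth_seq_id(2, {1: 100, 2: (100, "A")}))   # (100, 'A')
print(resolve_auth_seq_id(3, {1: 100}))                  # (3, None)
```

With a list or tuple, index 0 is never consulted, which is why the parameter description notes that the first element goes unused.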

num_map

For branched entities read from files, mapping from provisional to final internal numbering (seq_id), or None if no mapping is necessary. See ihm.model.Model.add_atom().

residue(seq_id)[source]

Get a Residue at the given sequence position

segment(gapped_sequence, seq_id_begin, seq_id_end)[source]

Get an object representing the alignment of part of this sequence.

Parameters:
  • gapped_sequence (str) – Sequence of the segment, including gaps.

  • seq_id_begin (int) – Start of the segment.

  • seq_id_end (int) – End of the segment.

property seq_id_range

Sequence range

property sequence

Primary sequence

property strand_id

PDB or author-provided strand/chain ID

class ihm.AsymUnitRange(asym, seq_id_begin, seq_id_end)[source]

Part of an asymmetric unit. Usually these objects are created from an AsymUnit, e.g. to get a range covering residues 4 through 7 in asym use:

asym = ihm.AsymUnit(entity)
rng = asym(4, 7)

class ihm.WaterAsymUnit(entity, number, details=None, auth_seq_id_map=0, id=None, strand_id=None, orig_auth_seq_id_map=None)[source]

A collection of crystal waters, all with the same “chain” ID.

Parameters:

number (int) – The number of water molecules in this unit.

For more information on this class and the rest of the parameters, see AsymUnit.

property number_of_molecules

Number of molecules

property seq_id_range

Sequence range

property sequence

Primary sequence

class ihm.Atom(residue, id)[source]

A single atom in an entity or asymmetric unit. Usually these objects are created by calling Residue.atom().

Note that this class does not store atomic coordinates of a given atom in a given model; for that, see ihm.model.Atom.

class ihm.Residue(seq_id, entity=None, asym=None)[source]

A single residue in an entity or asymmetric unit. Usually these objects are created by calling Entity.residue() or AsymUnit.residue().

atom(atom_id)[source]

Get an Atom in this residue with the given name.

property auth_seq_id

Author-provided seq_id; only makes sense for asymmetric units

property comp

Chemical component (residue type)

property ins_code

Insertion code; only makes sense for asymmetric units

class ihm.Assembly(elements=(), name=None, description=None)[source]

A collection of parts of the system that were modeled or probed together.

Parameters:
  • elements (sequence) – Initial set of parts of the system.

  • name (str) – Short text name of this assembly.

  • description (str) – Longer text that describes this assembly.

This is implemented as a simple list of asymmetric units (or parts of them), i.e. a list of AsymUnit and/or AsymUnitRange objects. An Assembly is typically assigned to one or more Model, ihm.protocol.Step, or Restraint objects.

See also System.complete_assembly and System.orphan_assemblies.

Note that any duplicate assemblies will be pruned on output.

parent = None

Assembly that is the immediate parent in a hierarchy, or None

class ihm.ChemDescriptor(auth_name, chem_comp_id=None, chemical_name=None, common_name=None, smiles=None, smiles_canonical=None, inchi=None, inchi_key=None)[source]

Description of a non-polymeric chemical component used in the experiment. For example, this might be a fluorescent probe or cross-linking agent. This class describes the chemical structure of the component, for example with a SMILES or INCHI descriptor, so that it is uniquely defined. A descriptor is typically assigned to an ihm.restraint.CrossLinkRestraint.

See ihm.cross_linkers for chemical descriptors of some commonly-used cross-linking agents.

Parameters:
  • auth_name (str) – Author-provided name

  • chem_comp_id (str) – If this chemical is listed in the Chemical Component Dictionary, its three-letter identifier

  • chemical_name (str) – The systematic (IUPAC) chemical name

  • common_name (str) – Common name for the component

  • smiles (str) – SMILES string

  • smiles_canonical (str) – Canonical SMILES string

  • inchi (str) – IUPAC INCHI descriptor

  • inchi_key (str) – Hashed INCHI key

See also System.orphan_chem_descriptors.

class ihm.Collection(id, name=None, details=None)[source]

A collection of entries belonging to a single deposition or group. These are used by the archive to group multiple related entries, e.g. all entries deposited as part of a given study, or all models for a genome. An entry (System) can belong to multiple collections.

Parameters:
  • id (str) – Unique identifier (assigned by the archive).

  • name (str) – Short name for the collection.

  • details (str) – Longer description of the collection.

See also System.collections.

class ihm.BranchDescriptor(text, type, program=None, program_version=None)[source]

String descriptor of branched chemical structure. These generally only make sense for oligosaccharide entities. See Entity.branch_descriptors.

Parameters:
  • text (str) – The value of this descriptor.

  • type (str) – The type of the descriptor; one of “Glycam Condensed Core Sequence”, “Glycam Condensed Sequence”, “LINUCS”, or “WURCS”.

  • program (str) – The name of the program or library used to compute the descriptor.

  • program_version (str) – The version of the program or library used to compute the descriptor.

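class ihm.BranchLink(num1, atom_id1, leaving_atom_id1, num2, atom_id2, leaving_atom_id2, order=None, details=None)

A link between components in a branched entity. These generally only make sense for oligosaccharide entities. See Entity.branch_links.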

Parameters:
  • num1 (int) – 1-based index of the first component.

  • atom_id1 (str) – Name of the first atom in the linkage.

  • leaving_atom_id1 (str) – Name of the first leaving atom.

  • num2 (int) – 1-based index of the second component.

  • atom_id2 (str) – Name of the second atom in the linkage.

  • leaving_atom_id2 (str) – Name of the second leaving atom.

  • order (str) – Bond order (e.g. sing, doub, trip).

  • details (str) – More information about this link.