The ihm Python module

Representation of an IHM mmCIF file as a set of Python classes.

Generally class names correspond to mmCIF table names and class attributes to mmCIF attributes (with prefixes like pdbx_ stripped). For example, the data item _entity.details is found in the Entity class, as the details member.

Ordinals and IDs are generally not used in this representation (instead, pointers to objects are used).

ihm.unknown = ?

A value that isn’t known. Note that this is distinct from a value that is deliberately omitted, which is represented by Python None.
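
As a rough illustration of the distinction, the sketch below uses a stand-in sentinel rather than importing ihm. It assumes the usual mmCIF convention that ? marks an unknown value and . marks an omitted one. The names UNKNOWN and cif_token are hypothetical and are not part of the ihm API.

```python
class _Unknown:
    """Stand-in sentinel for a value that exists but is not known."""

    def __repr__(self):
        return '?'


UNKNOWN = _Unknown()  # hypothetical analogue of ihm.unknown


def cif_token(value):
    """Return the text a writer might emit for a single data value."""
    if value is UNKNOWN:
        return '?'  # value is not known
    if value is None:
        return '.'  # value was deliberately omitted
    return str(value)


print(cif_token(UNKNOWN), cif_token(None), cif_token(42))
```

Comparing with "is" rather than "==" keeps the sentinel distinct from any real value, including the literal string '?'.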

class ihm.System(title=None, id='model')

Top-level class representing a complete modeled system.

Parameters:
  • title (str) – Title (longer text description) of the system.
  • id (str) – Unique identifier for this system in the mmCIF file.

asym_units = None

All asymmetric units used in the system. See AsymUnit.

authors = None

List of all authors of this system, as a list of strings (last name followed by initials, e.g. “Smith AJ”). When writing out a file, if this list is empty, the set of all citation authors (see Citation.authors) is used instead.
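
The fallback described above could be sketched as follows. This is only an illustration in plain Python: effective_authors is a hypothetical helper, each citation is represented just by its list of authors, and keeping first-seen order when merging is an assumption (the text only says that the set of citation authors is used).

```python
def effective_authors(system_authors, citation_author_lists):
    """Return the author list that would be written out.

    system_authors: list of strings such as "Smith AJ".
    citation_author_lists: one list of author strings per citation.
    """
    if system_authors:
        return list(system_authors)
    seen = set()
    merged = []
    for authors in citation_author_lists:
        for author in authors:
            if author not in seen:  # keep each author only once
                seen.add(author)
                merged.append(author)
    return merged


print(effective_authors([], [["Smith AJ", "Jones B"], ["Jones B", "Lee C"]]))
```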

citations = None

List of all citations. See Citation.

comments = None

List of plain text comments. These will be added to the top of the mmCIF file.

complete_assembly = None

The assembly of the entire system. By convention this is always the first assembly in the mmCIF file (assembly_id=1). Note that currently this isn’t filled in on output until dumper.write() is called. See Assembly.

ensembles = None

All ensembles. See Ensemble.

entities = None

All entities used in the system. See Entity.

flr_data = None

Contains the fluorescence (FLR) part. See FLRData.

grants = None

List of all grants that supported this work. See Grant.

locations = None

Locations of all extra resources. See Location.

ordered_processes = None

All ordered processes. See OrderedProcess.

orphan_assemblies = None

All orphaned assemblies in the system. See Assembly. This can be used to keep track of all assemblies that are not otherwise used - normally one is assigned to a Model, ihm.protocol.Step, or Restraint.

orphan_chem_descriptors = None

All orphaned chemical descriptors in the system. See ChemDescriptor. This can be used to track descriptors that are not otherwise used - normally one is assigned to an ihm.restraint.CrossLinkRestraint.

orphan_dataset_groups = None

All orphaned groups of datasets. This can be used to keep track of all dataset groups that are not otherwise used - normally a group is assigned to a Protocol. See DatasetGroup.

orphan_datasets = None

All orphaned datasets. This can be used to keep track of all datasets that are not otherwise used - normally a dataset is assigned to a DatasetGroup, StartingModel, Restraint, Template, or as the parent of another Dataset. See Dataset.

orphan_features = None

All orphaned features. This can be used to keep track of all features that are not otherwise used - normally a feature is assigned to a GeometricRestraint. See Feature.

orphan_geometric_objects = None

All orphaned geometric objects. This can be used to keep track of all objects that are not otherwise used - normally an object is assigned to a GeometricRestraint. See GeometricObject.

orphan_protocols = None

All orphaned modeling protocols. This can be used to keep track of all protocols that are not otherwise used - normally a protocol is assigned to a Model. See Protocol.

orphan_pseudo_sites = None

All orphaned pseudo sites. This can be used to keep track of all pseudo sites that are not otherwise used - normally a site is used in a PseudoSiteFeature or a CrossLinkPseudoSite.

orphan_representations = None

All orphaned representations of the system. This can be used to keep track of all representations that are not otherwise used - normally one is assigned to a Model. See Representation.

orphan_starting_models = None

All orphaned starting models for the system. This can be used to keep track of all starting models that are not otherwise used - normally one is assigned to an ihm.representation.Segment. See StartingModel.

restraint_groups = None

All restraint groups. See RestraintGroup.

restraints = None

All restraints on the system. See Restraint.

software = None

List of all software used in the modeling. See Software.

state_groups = None

All state groups (collections of models). See StateGroup.

update_locations_in_repositories(repos)

Update all Location objects in the system that lie within a checked-out Repository to point to that repository.

This is intended for the use case where the current working directory is a checkout of a repository which is archived somewhere with a DOI. Locations can then be simply constructed pointing to local files, and retroactively updated with this method to point to the DOI if appropriate.

For each Location, if it points to a local file that is below the root of one of the repos, update it to point to that repository. If it is under multiple roots, pick the one that gives the shortest path. For example, if run in a subdirectory foo of a repository archived as repo.zip, the local path simple.pdb will be updated to be repo-top/foo/simple.pdb in repo.zip:

l = ihm.location.InputFileLocation("simple.pdb")
system.locations.append(l)

r = ihm.location.Repository(doi='1.2.3.4',
          url='https://example.com/repo.zip',
          top_directory="repo-top", root="..")
system.update_locations_in_repositories([r])
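
The root-selection rule can be sketched without ihm at all. The helper below is a hypothetical stand-in, not the library's implementation: repositories are reduced to (root, top_directory) pairs, POSIX-style paths are assumed, and "shortest path" is taken to mean the shortest relative path string.

```python
import posixpath


def path_in_repository(local_path, cwd, repos):
    """Map a local file to a path inside the best-matching repository.

    repos is a list of (root, top_directory) pairs, with root given
    relative to cwd. Returns (repo_index, path_in_archive), or None if
    the file is not below any root.
    """
    abs_path = posixpath.normpath(posixpath.join(cwd, local_path))
    best = None
    for index, (root, top_directory) in enumerate(repos):
        abs_root = posixpath.normpath(posixpath.join(cwd, root))
        rel = posixpath.relpath(abs_path, abs_root)
        if rel == '..' or rel.startswith('../'):
            continue  # the file is not below this root
        if best is None or len(rel) < len(best[1]):
            best = (index, rel, top_directory)
    if best is None:
        return None
    index, rel, top_directory = best
    if top_directory:
        rel = posixpath.join(top_directory, rel)
    return index, rel


# Run in subdirectory foo of a checkout whose archive unpacks to repo-top
print(path_in_repository('simple.pdb', '/work/repo/foo',
                         [('..', 'repo-top')]))
```

With a second repository rooted at the current directory, the same file would resolve to that repository instead, since its relative path (simple.pdb) is shorter.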

class ihm.Software(name, classification, description, location, type='program', version=None)

Software used as part of the modeling protocol.

Parameters:
  • name (str) – The name of the software.
  • classification (str) – The major function of the software, for example ‘model building’, ‘sample preparation’, ‘data collection’.
  • description (str) – A longer text description of the software.
  • location (str) – Place where the software can be found (e.g. URL).
  • type (str) – Type of software (program/package/library/other).
  • version (str) – The version used.

Generally these objects are added to System.software or passed to ihm.startmodel.StartingModel, ihm.protocol.Step, ihm.analysis.Step, or ihm.restraint.PredictedContactRestraint objects.

class ihm.Citation(pmid, title, journal, volume, page_range, year, authors, doi)

A publication that describes the modeling.

Generally citations are added to System.citations or passed to ihm.restraint.EM3DRestraint objects.

Parameters:
  • pmid (str) – The PubMed ID.
  • title (str) – Full title of the publication.
  • journal (str) – Abbreviated journal name.
  • volume (int) – Journal volume number.
  • page_range – The page (int) or page range (as a 2-element int tuple).
  • year (int) – Year of publication.
  • authors – All authors in order, as a list of strings (last name followed by initials, e.g. “Smith AJ”).
  • doi (str) – Digital Object Identifier of the publication.

classmethod from_pubmed_id(pubmed_id)

Create a Citation from just a PubMed ID. This is done by querying NCBI’s web API, so requires network access.

Parameters:
  • pubmed_id (int) – The PubMed identifier.
Returns: A new Citation for the given identifier.
Return type: Citation

class ihm.Grant(funding_organization, country, grant_number)

Information on funding support for the modeling. See System.grants.

Parameters:
  • funding_organization (str) – The name of the organization providing the funding, e.g. “National Institutes of Health”.
  • country (str) – The country that hosts the funding organization, e.g. “United States”.
  • grant_number (str) – Identifying information for the grant, e.g. “1R01GM072999-01”.

class ihm.ChemComp(id, code, code_canonical, name=None, formula=None)

A chemical component from which Entity objects are constructed. Usually these are amino acids (see LPeptideChemComp) or nucleic acids (see DNAChemComp and RNAChemComp).

For standard amino and nucleic acids, it is generally easier to use an Alphabet and refer to the components with their one-letter (amino acids, RNA) or two-letter (DNA) codes.

Parameters:
  • id (str) – A globally unique identifier for this component (usually three letters).
  • code (str) – A shorter identifier (usually one letter) that only needs to be unique in the entity.
  • code_canonical (str) – Canonical version of code (which need not be unique).
  • name (str) – A longer human-readable name for the component.
  • formula (str) – The chemical formula. This is a space-separated list of the element symbols in the component, each followed by an optional count (if omitted, 1 is assumed). The formula is terminated with the formal charge (if not zero). The element list should be sorted alphabetically, unless carbon is present, in which case C and H precede the rest of the elements. For example, water would be “H2 O” and arginine (with +1 formal charge) “C6 H15 N4 O2 1”.

For example, glycine would have id='GLY', code='G', code_canonical='G' while selenomethionine would use id='MSE', code='MSE', code_canonical='M', guanosine (RNA) id='G', code='G', code_canonical='G', and deoxyguanosine (DNA) id='DG', code='DG', code_canonical='G'.

formula_weight

Formula weight (dalton). This is calculated automatically from the chemical formula and known atomic masses.
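
To make the formula convention concrete, here is a minimal sketch of such a calculation. It is not the library's code: the formula_weight function and the ATOMIC_MASS table are illustrative, the masses are approximate and cover only a few elements, and any token that is not an element symbol with an optional count is assumed to be the trailing formal charge and is ignored.

```python
import re

# Approximate atomic masses (dalton) for a handful of common elements
ATOMIC_MASS = {'H': 1.008, 'C': 12.011, 'N': 14.007, 'O': 15.999,
               'P': 30.974, 'S': 32.06}

# An element symbol followed by an optional count, e.g. "C6", "O", "H15"
_ELEMENT = re.compile(r'([A-Z][a-z]?)(\d*)')


def formula_weight(formula):
    """Sum atomic masses over a space-separated chemical formula."""
    total = 0.0
    for token in formula.split():
        match = _ELEMENT.fullmatch(token)
        if match is None:
            continue  # assumed to be the formal charge; no mass
        symbol, count = match.groups()
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


print(round(formula_weight('H2 O'), 3))            # water
print(round(formula_weight('C6 H15 N4 O2 1'), 3))  # arginine, +1 charge
```

An element missing from the table raises KeyError in this sketch, which is preferable to silently returning an underestimate.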

class ihm.PeptideChemComp(id, code, code_canonical, name=None, formula=None)

A single peptide component. Usually LPeptideChemComp is used instead (except for glycine) to specify chirality. See ChemComp for a description of the parameters.

class ihm.LPeptideChemComp(id, code, code_canonical, name=None, formula=None)

A single peptide component with (normal) L- chirality. See ChemComp for a description of the parameters.

class ihm.DPeptideChemComp(id, code, code_canonical, name=None, formula=None)

A single peptide component with (unusual) D- chirality. See ChemComp for a description of the parameters.

class ihm.RNAChemComp(id, code, code_canonical, name=None, formula=None)

A single RNA component. See ChemComp for a description of the parameters.

class ihm.DNAChemComp(id, code, code_canonical, name=None, formula=None)

A single DNA component. See ChemComp for a description of the parameters.

class ihm.NonPolymerChemComp(id, name=None, formula=None)

A non-polymer chemical component, such as a ligand (for crystal waters, use WaterChemComp).

Parameters:
  • id (str) – A globally unique identifier for this component.
  • name (str) – A longer human-readable name for the component.
  • formula (str) – The chemical formula. See ChemComp for more details.

class ihm.WaterChemComp

The chemical component for crystal water.

class ihm.Alphabet

A mapping from codes (usually one-letter, or two-letter for DNA) to chemical components. These classes can be used to construct sequences of components when creating an Entity. They can also be used like a Python dict to get standard components, e.g.:

a = ihm.LPeptideAlphabet()
met = a['M']
gly = a['G']

See LPeptideAlphabet, RNAAlphabet, DNAAlphabet.

class ihm.LPeptideAlphabet

A mapping from one-letter amino acid codes (e.g. H, M) to L-amino acids (as LPeptideChemComp objects, except for achiral glycine which maps to PeptideChemComp). Some other common modified residues are also included (e.g. MSE); these are looked up by their full name rather than by a one-letter code.

class ihm.DPeptideAlphabet

A mapping from D-amino acid codes (e.g. DHI, MED) to D-amino acids (as DPeptideChemComp objects, except for achiral glycine which maps to PeptideChemComp). See LPeptideAlphabet for more details.

class ihm.RNAAlphabet

A mapping from one-letter nucleic acid codes (e.g. A) to RNA (as RNAChemComp objects).

class ihm.DNAAlphabet

A mapping from two-letter nucleic acid codes (e.g. DA) to DNA (as DNAChemComp objects).

class ihm.Entity(sequence, alphabet=<class 'ihm.LPeptideAlphabet'>, description=None, details=None, source=None, references=[])

Represent a CIF entity (with a unique sequence)

Parameters:
  • sequence (sequence) – The primary sequence, as a sequence of ChemComp objects, and/or codes looked up in alphabet.
  • alphabet (Alphabet) – The mapping from code to chemical components to use (it is not necessary to instantiate this class).
  • description (str) – A short text name for the sequence.
  • details (str) – Longer text describing the sequence.
  • source (ihm.source.Source) – The method by which the sample for this entity was produced.
  • references (sequence of ihm.reference.Reference objects) – Information about this entity stored in external databases (for example the sequence in UniProt)

The sequence for an entity can be specified explicitly as a list of chemical components, or (more usually) as a list or string of codes, or a mixture of both. For example:

# Construct with a string of one-letter amino acid codes
protein = ihm.Entity('AHMD')
# Some less common amino acids (e.g. MSE) have three-letter codes
protein_with_mse = ihm.Entity(['A', 'H', 'MSE', 'D'])

# Can use a non-default alphabet to make DNA or RNA sequences
dna = ihm.Entity(('DA', 'DC'), alphabet=ihm.DNAAlphabet)
rna = ihm.Entity('AC', alphabet=ihm.RNAAlphabet)

# Can pass explicit ChemComp objects by looking them up in Alphabets
dna_al = ihm.DNAAlphabet()
rna_al = ihm.RNAAlphabet()
dna_rna_hybrid = ihm.Entity((dna_al['DG'], rna_al['C']))

# For unusual components (e.g. modified residues or ligands),
# new ChemComp objects can be constructed
psu = ihm.RNAChemComp(id='PSU', code='PSU', code_canonical='U',
                      name="PSEUDOURIDINE-5'-MONOPHOSPHATE",
                      formula='C9 H13 N2 O9 P')
rna_with_psu = ihm.Entity(('A', 'C', psu), alphabet=ihm.RNAAlphabet)

For more examples, see the ligands and water example.

All entities should be stored in the top-level System object; see System.entities.
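
The way a mixed sequence of codes and component objects might be resolved can be sketched in plain Python. Everything here is a stand-in: Comp is a hypothetical substitute for ChemComp, PEPTIDE is a tiny dict playing the role of an alphabet, and resolve_sequence is not an ihm function.

```python
from collections import namedtuple

# Hypothetical stand-in for a chemical component
Comp = namedtuple('Comp', ['id', 'code'])

# Tiny stand-in alphabet; MSE is keyed by its full three-letter name
PEPTIDE = {'A': Comp('ALA', 'A'), 'H': Comp('HIS', 'H'),
           'M': Comp('MET', 'M'), 'D': Comp('ASP', 'D'),
           'MSE': Comp('MSE', 'MSE')}


def resolve_sequence(sequence, alphabet):
    """Look up string codes in alphabet; pass other objects through."""
    return tuple(alphabet[item] if isinstance(item, str) else item
                 for item in sequence)


# Iterating over a string yields one character at a time, so a plain
# string only works when every code is a single letter
print([c.id for c in resolve_sequence('AHMD', PEPTIDE)])

# Multi-letter codes and explicit components need a list or tuple
psu = Comp('PSU', 'PSU')
print([c.id for c in resolve_sequence(['A', 'MSE', psu], PEPTIDE)])
```

This also suggests why DNA sequences, whose codes are two letters long, are given as a tuple or list rather than as a single string.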

formula_weight

Formula weight (dalton). This is calculated automatically from that of the chemical components.

is_polymeric()

Return True iff this entity represents a polymer, such as an amino acid sequence or DNA/RNA chain (and not a ligand or water)

residue(seq_id)

Get a Residue at the given sequence position

seq_id_range

Sequence range

class ihm.EntityRange(entity, seq_id_begin, seq_id_end)

Part of an entity. Usually these objects are created from an Entity, e.g. to get a range covering residues 4 through 7 in entity use:

entity = ihm.Entity(sequence=...)
rng = entity(4,7)
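
The call syntax above relies on the object being callable. A minimal sketch of that pattern, using hypothetical Chain and ChainRange classes rather than the real Entity and EntityRange, and assuming 1-based seq_ids with an inclusive end (as "residues 4 through 7" suggests):

```python
class ChainRange:
    """Part of a chain, from seq_id_begin to seq_id_end inclusive."""

    def __init__(self, chain, seq_id_begin, seq_id_end):
        self.chain = chain
        self.seq_id_begin = seq_id_begin
        self.seq_id_end = seq_id_end

    @property
    def residues(self):
        # seq_ids are 1-based, so shift the start by one for slicing
        return self.chain.sequence[self.seq_id_begin - 1:self.seq_id_end]


class Chain:
    """Stand-in for an object holding a primary sequence."""

    def __init__(self, sequence):
        self.sequence = tuple(sequence)

    def __call__(self, seq_id_begin, seq_id_end):
        # Calling the chain creates a range that refers back to it
        return ChainRange(self, seq_id_begin, seq_id_end)


chain = Chain('ACDEFGHIK')
rng = chain(4, 7)
print(rng.residues)
```

The range keeps a reference to its parent rather than a copy of the sequence, in keeping with the module's general preference for object references over IDs.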

class ihm.AsymUnit(entity, details=None, auth_seq_id_map=0, id=None)

An asymmetric unit, i.e. a unique instance of an Entity that was modeled.

Parameters:
  • entity (Entity) – The unique sequence of this asymmetric unit.
  • details (str) – Longer text description of this unit.
  • auth_seq_id_map – Mapping from internal 1-based consecutive residue numbering (seq_id) to “author-provided” numbering (auth_seq_id). This can be either an int offset, in which case auth_seq_id = seq_id + auth_seq_id_map, or a mapping type (dict, list, tuple), in which case auth_seq_id = auth_seq_id_map[seq_id]. (Note that if a list or tuple is used, its first element does not correspond to the first residue and will never be used, since seq_id can never be zero.) If the map is not specified, or a given seq_id is not in the mapping, the default is auth_seq_id == seq_id.
  • id (str) – User-specified ID (usually a string of one or more upper-case letters, e.g. A, B, C, AA). If not specified, IDs are automatically assigned alphabetically.
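
The auth_seq_id_map rules above can be written out as a small stand-alone function. This is a sketch of the documented behavior only, not the library's implementation, and the function name is chosen for illustration.

```python
def auth_seq_id(seq_id, auth_seq_id_map=0):
    """Map a 1-based seq_id to author-provided numbering."""
    if isinstance(auth_seq_id_map, int):
        # Simple integer offset
        return seq_id + auth_seq_id_map
    try:
        # dict, list or tuple lookup
        return auth_seq_id_map[seq_id]
    except (KeyError, IndexError):
        # Not in the mapping: fall back to the internal numbering
        return seq_id


print(auth_seq_id(5, 10))
print(auth_seq_id(1, {1: 100}), auth_seq_id(2, {1: 100}))
# Element 0 of a list is a placeholder, since seq_id starts at 1
print(auth_seq_id(1, [None, 7, 8]), auth_seq_id(3, [None, 7, 8]))
```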

See System.asym_units.

residue(seq_id)

Get a Residue at the given sequence position

seq_id_range

Sequence range

class ihm.AsymUnitRange(asym, seq_id_begin, seq_id_end)

Part of an asymmetric unit. Usually these objects are created from an AsymUnit, e.g. to get a range covering residues 4 through 7 in asym use:

asym = ihm.AsymUnit(entity)
rng = asym(4,7)

class ihm.Atom(residue, id)

A single atom in an entity or asymmetric unit. Usually these objects are created by calling Residue.atom().

class ihm.Residue(seq_id, entity=None, asym=None)

A single residue in an entity or asymmetric unit. Usually these objects are created by calling Entity.residue() or AsymUnit.residue().

atom(atom_id)

Get an Atom in this residue with the given name.

auth_seq_id

Author-provided seq_id; only makes sense for asymmetric units

class ihm.Assembly(elements=(), name=None, description=None)

A collection of parts of the system that were modeled or probed together.

Parameters:
  • elements (sequence) – Initial set of parts of the system.
  • name (str) – Short text name of this assembly.
  • description (str) – Longer text that describes this assembly.

This is implemented as a simple list of asymmetric units (or parts of them), i.e. a list of AsymUnit and/or AsymUnitRange objects. An Assembly is typically assigned to one or more of a Model, an ihm.protocol.Step, or a Restraint.

See also System.complete_assembly and System.orphan_assemblies.

Note that any duplicate assemblies will be pruned on output.
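
One way such pruning could work is sketched below. This is an assumption-laden illustration, not the library's algorithm: assemblies are represented as plain lists of hypothetical (asym_id, seq_id_begin, seq_id_end) tuples, and two assemblies count as duplicates only if they contain the same elements in the same order.

```python
def prune_duplicates(assemblies):
    """Keep the first occurrence of each distinct assembly."""
    seen = set()
    unique = []
    for assembly in assemblies:
        key = tuple(assembly)  # lists are unhashable; tuples are not
        if key not in seen:
            seen.add(key)
            unique.append(assembly)
    return unique


whole = [('A', 1, 100), ('B', 1, 50)]
part = [('A', 1, 100)]
same_as_whole = [('A', 1, 100), ('B', 1, 50)]
print(len(prune_duplicates([whole, part, same_as_whole])))
```

Keeping the first occurrence preserves the position of the earliest assembly, which matters if the first one is expected to be the complete assembly.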

parent = None

Assembly that is the immediate parent in a hierarchy, or None

class ihm.ChemDescriptor(auth_name, chem_comp_id=None, chemical_name=None, common_name=None, smiles=None, smiles_canonical=None, inchi=None, inchi_key=None)

Description of a non-polymeric chemical component used in the experiment. For example, this might be a fluorescent probe or cross-linking agent. This class describes the chemical structure of the component, for example with a SMILES or INCHI descriptor, so that it is uniquely defined. A descriptor is typically assigned to an ihm.restraint.CrossLinkRestraint.

See ihm.cross_linkers for chemical descriptors of some commonly-used cross-linking agents.

Parameters:
  • auth_name (str) – Author-provided name
  • chem_comp_id (str) – If this chemical is listed in the Chemical Component Dictionary, its three-letter identifier
  • chemical_name (str) – The systematic (IUPAC) chemical name
  • common_name (str) – Common name for the component
  • smiles (str) – SMILES string
  • smiles_canonical (str) – Canonical SMILES string
  • inchi (str) – IUPAC INCHI descriptor
  • inchi_key (str) – Hashed INCHI key

See also System.orphan_chem_descriptors.