The ihm Python module
Representation of an IHM mmCIF file as a set of Python classes.
Generally class names correspond to mmCIF table names and class
attributes to mmCIF attributes (with prefixes like pdbx_ stripped).
For example, the data item _entity.details is found in the
Entity
class, as the details member.
Ordinals and IDs are generally not used in this representation (instead, pointers to objects are used).
-
ihm.
unknown
= ?¶ A value that isn’t known. Note that this is distinct from a value that is deliberately omitted, which is represented by Python None.
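The distinction between an unknown value and an omitted one can be sketched as follows. This is a minimal illustration, not python-ihm's code: the `_Unknown` class and `cif_token` helper are hypothetical names, and the sketch only assumes the usual mmCIF convention that `?` marks an unknown value and `.` an omitted one.

```python
class _Unknown:
    """Hypothetical stand-in for a sentinel such as ihm.unknown."""

    def __repr__(self):
        return "?"


unknown = _Unknown()


def cif_token(value):
    """Render a Python value as an mmCIF token (illustrative only)."""
    if value is None:
        return "."  # deliberately omitted
    if value is unknown:
        return "?"  # a value exists, but it is not known
    return str(value)


tokens = [cif_token(v) for v in ("GLY", None, unknown)]
print(tokens)
```

Using a dedicated sentinel object rather than a string keeps the test `value is unknown` unambiguous, even if a real data value happens to be the text "?".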
-
class
ihm.
System
(title=None, id='model', model_details=None)[source]¶ Top-level class representing a complete modeled system.
Parameters: - title (str) – Title (longer text description) of the system.
- id (str) – Unique identifier for this system in the mmCIF file.
- model_details (str) – Detailed description of the system, like an abstract.
List of all authors of this system, as a list of strings (last name followed by initials, e.g. “Smith, A.J.”). When writing out a file, if this list is empty, the set of all citation authors (see Citation.authors) is used instead.
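The author fallback described above might look roughly like this. It is a sketch under assumptions: `effective_authors` is a hypothetical helper, citations are modeled as plain lists of author strings, and the ordering of the merged list (first appearance wins) is a choice made here, not something the documentation specifies.

```python
def effective_authors(system_authors, citation_author_lists):
    """Return the explicit author list, or fall back to citation authors.

    Hypothetical helper: citation_author_lists is a list of lists of
    strings, one inner list per citation.
    """
    if system_authors:
        return list(system_authors)
    seen = set()
    merged = []
    for authors in citation_author_lists:
        for name in authors:
            if name not in seen:  # keep each author only once
                seen.add(name)
                merged.append(name)
    return merged


fallback = effective_authors([], [["Smith, A.J.", "Jones, B."],
                                  ["Jones, B.", "Lee, C."]])
explicit = effective_authors(["Doe, J."], [["Smith, A.J."]])
print(fallback, explicit)
```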
-
collections
= None¶ Collections (if any) to which this entry belongs. These are used to group depositions of related entries. See
Collection
.
-
comments
= None¶ List of plain text comments. These will be added to the top of the mmCIF file.
-
complete_assembly
= None¶ The assembly of the entire system. By convention this is always the first assembly in the mmCIF file (assembly_id=1). Note that currently this isn’t filled in on output until dumper.write() is called. See
Assembly
.
-
ordered_processes
= None¶ All ordered processes. See
OrderedProcess
.
-
orphan_assemblies
= None All orphaned assemblies in the system. See Assembly. This can be used to keep track of all assemblies that are not otherwise used - normally one is assigned to a Model, ihm.protocol.Step, or Restraint.
-
orphan_chem_descriptors
= None All orphaned chemical descriptors in the system. See ChemDescriptor. This can be used to track descriptors that are not otherwise used - normally one is assigned to an ihm.restraint.CrossLinkRestraint.
-
orphan_dataset_groups
= None All orphaned groups of datasets. This can be used to keep track of all dataset groups that are not otherwise used - normally a group is assigned to a Protocol. See DatasetGroup.
-
orphan_datasets
= None All orphaned datasets. This can be used to keep track of all datasets that are not otherwise used - normally a dataset is assigned to a DatasetGroup, StartingModel, Restraint, Template, or as the parent of another Dataset. See Dataset.
-
orphan_features
= None All orphaned features. This can be used to keep track of all features that are not otherwise used - normally a feature is assigned to a GeometricRestraint. See Feature.
-
orphan_geometric_objects
= None All orphaned geometric objects. This can be used to keep track of all objects that are not otherwise used - normally an object is assigned to a GeometricRestraint. See GeometricObject.
-
orphan_protocols
= None All orphaned modeling protocols. This can be used to keep track of all protocols that are not otherwise used - normally a protocol is assigned to a Model. See Protocol.
-
orphan_pseudo_sites
= None All orphaned pseudo sites. This can be used to keep track of all pseudo sites that are not otherwise used - normally a site is used in a PseudoSiteFeature or a CrossLinkPseudoSite.
-
orphan_representations
= None All orphaned representations of the system. This can be used to keep track of all representations that are not otherwise used - normally one is assigned to a Model. See Representation.
-
orphan_starting_models
= None All orphaned starting models for the system. This can be used to keep track of all starting models that are not otherwise used - normally one is assigned to an ihm.representation.Segment. See StartingModel.
-
report
(fh=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶ Print a summary report of this system. This can be used to more easily spot errors or inconsistencies. It will also warn about missing data that may not be technically required for a compliant mmCIF file, but is usually expected to be present.
Parameters: fh (file) – The file handle to print the report to, if not standard output.
-
restraint_groups
= None¶ All restraint groups. See
RestraintGroup
.
-
state_groups
= None¶ All state groups (collections of models). See
StateGroup
.
-
update_locations_in_repositories
(repos) Update all Location objects in the system that lie within a checked-out Repository to point to that repository.
This is intended for the use case where the current working directory is a checkout of a repository which is archived somewhere with a DOI. Locations can then be simply constructed pointing to local files, and retroactively updated with this method to point to the DOI if appropriate.
For each Location, if it points to a local file that is below the root of one of the repos, update it to point to that repository. If it is under multiple roots, pick the one that gives the shortest path. For example, if run in a subdirectory foo of a repository archived as repo.zip, the local path simple.pdb will be updated to be repo-top/foo/simple.pdb in repo.zip:
    import ihm.location

    # system is an existing ihm.System object
    l = ihm.location.InputFileLocation("simple.pdb")
    system.locations.append(l)
    r = ihm.location.Repository(doi='1.2.3.4',
                                url='https://example.com/repo.zip',
                                top_directory="repo-top", root="..")
    system.update_locations_in_repositories([r])
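The "shortest path wins" rule can be illustrated with a standalone sketch. This is not the library's implementation: `relocate` is a hypothetical helper, repositories are modeled as `(root, top_directory)` tuples, the working directory `/work/repo/foo` is invented for the example, and "shortest" is taken here to mean the fewest characters in the relative path.

```python
import posixpath


def relocate(local_path, repos, cwd):
    """Map a local path to a path inside the best-matching repository.

    repos is a list of (root, top_directory) tuples (hypothetical model).
    Returns None if the file is not below any repository root.
    """
    abs_path = posixpath.normpath(posixpath.join(cwd, local_path))
    best = None
    for root, top_directory in repos:
        abs_root = posixpath.normpath(posixpath.join(cwd, root))
        rel = posixpath.relpath(abs_path, abs_root)
        if rel == ".." or rel.startswith("../"):
            continue  # file is not below this root
        if best is None or len(rel) < len(best[0]):
            best = (rel, top_directory)
    if best is None:
        return None
    rel, top_directory = best
    return posixpath.join(top_directory, rel) if top_directory else rel


cwd = "/work/repo/foo"
single = relocate("simple.pdb", [("..", "repo-top")], cwd)
nested = relocate("simple.pdb", [("..", "repo-top"), (".", "inner")], cwd)
outside = relocate("/tmp/other.pdb", [("..", "repo-top")], cwd)
print(single, nested, outside)
```

With a single repository rooted one level up, the result matches the repo-top/foo/simple.pdb path from the example above; when a second, more deeply nested root also contains the file, the shorter relative path is preferred.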
-
class
ihm.
Software
(name, classification, description, location, type='program', version=None, citation=None)[source]¶ Software used as part of the modeling protocol.
Parameters: - name (str) – The name of the software.
- classification (str) – The major function of the software, for example ‘model building’, ‘sample preparation’, ‘data collection’.
- description (str) – A longer text description of the software.
- location (str) – Place where the software can be found (e.g. URL).
- type (str) – Type of software (program/package/library/other).
- version (str) – The version used.
- citation (
Citation
) – Publication describing the software.
Generally these objects are added to System.software or passed to ihm.startmodel.StartingModel, ihm.protocol.Step, ihm.analysis.Step, or ihm.restraint.PredictedContactRestraint objects.
-
class
ihm.
Citation
(pmid, title, journal, volume, page_range, year, authors, doi, is_primary=False)[source]¶ A publication that describes the modeling.
Generally citations are added to System.citations or passed to ihm.Software or ihm.restraint.EM3DRestraint objects.
Parameters: - pmid (str) – The PubMed ID.
- title (str) – Full title of the publication.
- journal (str) – Abbreviated journal name.
- volume – Journal volume as int for a plain number or str for journals adding a label to the number (e.g. “46(W1)” for a web server issue).
- page_range – The page (int) or page range (as a 2-element int tuple). Using str also works for labelled page numbers.
- year (int) – Year of publication.
- authors – All authors in order, as a list of strings (last name followed by initials, e.g. “Smith, A.J.”).
- doi (str) – Digital Object Identifier of the publication.
- is_primary (bool) – Denotes the most pertinent publication for the modeling itself (as opposed to a method or piece of software used in the protocol). Only one such publication is allowed, and it is assigned the ID “primary” in the mmCIF file.
-
class
ihm.
Grant
(funding_organization, country, grant_number) Information on funding support for the modeling. See System.grants.
Parameters: - funding_organization (str) – The name of the organization providing the funding, e.g. “National Institutes of Health”.
- country (str) – The country that hosts the funding organization, e.g. “United States”.
- grant_number (str) – Identifying information for the grant, e.g. “1R01GM072999-01”.
-
class
ihm.
ChemComp
(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None) A chemical component from which Entity objects are constructed. Usually these are amino acids (see LPeptideChemComp) or nucleic acids (see DNAChemComp and RNAChemComp), but non-polymers such as ligands or water (see NonPolymerChemComp and WaterChemComp) and saccharides (see SaccharideChemComp) are also supported.
For standard amino and nucleic acids, it is generally easier to use an Alphabet and refer to the components with their one-letter (amino acids, RNA) or two-letter (DNA) codes.
Parameters: - id (str) – A globally unique identifier for this component (usually three letters).
- code (str) – A shorter identifier (usually one letter) that only needs to be unique in the entity.
- code_canonical (str) – Canonical version of code (which need not be unique).
- name (str) – A longer human-readable name for the component.
- formula (str) – The chemical formula. This is a space-separated list of the element symbols in the component, each followed by an optional count (if omitted, 1 is assumed). The formula is terminated with the formal charge (if not zero). The element list should be sorted alphabetically, unless carbon is present, in which case C and H precede the rest of the elements. For example, water would be “H2 O” and arginine (with +1 formal charge) “C6 H15 N4 O2 1”.
- ccd (str) – The chemical component dictionary (CCD) where this component is defined. Can be “core” for the wwPDB CCD (https://www.wwpdb.org/data/ccd), “ma” for the ModelArchive CCD, or “local” for a novel component that is defined in the mmCIF file itself. If unspecified, defaults to “core” unless descriptors is given, in which case it defaults to “local”. This information is essentially ignored by python-ihm (since the IHM dictionary has no support for custom CCDs) but is used by python-modelcif.
- descriptors (list) – When ccd is “local”, this can be one or more descriptor objects that describe the chemistry. python-ihm does not define any, but python-modelcif does.
For example, glycine would have id='GLY', code='G', code_canonical='G' while selenomethionine would use id='MSE', code='MSE', code_canonical='M', guanosine (RNA) id='G', code='G', code_canonical='G', and deoxyguanosine (DNA) id='DG', code='DG', code_canonical='G'.
-
formula_weight
¶ Formula weight (dalton). This is calculated automatically from the chemical formula and known atomic masses.
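To make the formula format concrete, here is a minimal sketch of how such a string could be parsed and turned into a weight. It is not the library's code: `parse_formula`, `formula_weight` and the small `ATOMIC_MASS` table (approximate standard atomic weights for a handful of elements) are assumptions made for illustration.

```python
import re

# Approximate standard atomic weights (illustrative subset only)
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007,
               "O": 15.999, "P": 30.974, "S": 32.06}

# An element symbol followed by an optional count, e.g. "H2" or "O"
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)$")


def parse_formula(formula):
    """Split a formula such as 'H2 O' into element counts and a charge."""
    counts = {}
    charge = 0
    for token in formula.split():
        match = _ELEMENT.match(token)
        if match:
            element, count = match.groups()
            counts[element] = counts.get(element, 0) + int(count or 1)
        else:
            charge = int(token)  # trailing formal charge, if present
    return counts, charge


def formula_weight(formula):
    """Sum the atomic masses of all atoms in the formula."""
    counts, _ = parse_formula(formula)
    return sum(ATOMIC_MASS[el] * n for el, n in counts.items())


water_weight = formula_weight("H2 O")
arg_counts, arg_charge = parse_formula("C6 H15 N4 O2 1")
print(round(water_weight, 3), arg_counts, arg_charge)
```

The sketch raises KeyError for elements missing from the table; a fuller version would need a complete mass table and a policy for unknown elements.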
-
class
ihm.
PeptideChemComp
(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None) A single peptide component. Usually LPeptideChemComp is used instead (except for glycine) to specify chirality. See ChemComp for a description of the parameters.
-
class
ihm.
LPeptideChemComp
(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]¶ A single peptide component with (normal) L- chirality. See
ChemComp
for a description of the parameters.
-
class
ihm.
DPeptideChemComp
(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]¶ A single peptide component with (unusual) D- chirality. See
ChemComp
for a description of the parameters.
-
class
ihm.
RNAChemComp
(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]¶ A single RNA component. See
ChemComp
for a description of the parameters.
-
class
ihm.
DNAChemComp
(id, code, code_canonical, name=None, formula=None, ccd=None, descriptors=None)[source]¶ A single DNA component. See
ChemComp
for a description of the parameters.
-
class
ihm.
SaccharideChemComp
(id, name=None, formula=None, ccd=None, descriptors=None) A saccharide chemical component. Usually a subclass that specifies the chirality and linkage (e.g. LSaccharideBetaChemComp) is used.
Parameters: - id (str) – A globally unique identifier for this component.
- name (str) – A longer human-readable name for the component.
- formula (str) – The chemical formula. See ChemComp for more details.
- ccd (str) – The chemical component dictionary (CCD) where this component is defined. See ChemComp for more details.
- descriptors (list) – Information on the component’s chemistry. See ChemComp for more details.
-
class
ihm.
LSaccharideChemComp
(id, name=None, formula=None, ccd=None, descriptors=None)[source]¶ A single saccharide component with L-chirality and unspecified linkage. See
SaccharideChemComp
for a description of the parameters.
-
class
ihm.
LSaccharideAlphaChemComp
(id, name=None, formula=None, ccd=None, descriptors=None)[source]¶ A single saccharide component with L-chirality and alpha linkage. See
SaccharideChemComp
for a description of the parameters.
-
class
ihm.
LSaccharideBetaChemComp
(id, name=None, formula=None, ccd=None, descriptors=None)[source]¶ A single saccharide component with L-chirality and beta linkage. See
SaccharideChemComp
for a description of the parameters.
-
class
ihm.
DSaccharideChemComp
(id, name=None, formula=None, ccd=None, descriptors=None)[source]¶ A single saccharide component with D-chirality and unspecified linkage. See
SaccharideChemComp
for a description of the parameters.
-
class
ihm.
DSaccharideAlphaChemComp
(id, name=None, formula=None, ccd=None, descriptors=None)[source]¶ A single saccharide component with D-chirality and alpha linkage. See
SaccharideChemComp
for a description of the parameters.
-
class
ihm.
DSaccharideBetaChemComp
(id, name=None, formula=None, ccd=None, descriptors=None)[source]¶ A single saccharide component with D-chirality and beta linkage. See
SaccharideChemComp
for a description of the parameters.
-
class
ihm.
NonPolymerChemComp
(id, code_canonical='X', name=None, formula=None, ccd=None, descriptors=None) A non-polymer chemical component, such as a ligand or a non-standard residue (for crystal waters, use WaterChemComp).
Parameters: - id (str) – A globally unique identifier for this component.
- code_canonical (str) – Canonical one-letter identifier. This is used for non-standard residues and should be the one-letter code of the closest standard residue (or by default, ‘X’).
- name (str) – A longer human-readable name for the component.
- formula (str) – The chemical formula. See ChemComp for more details.
- ccd (str) – The chemical component dictionary (CCD) where this component is defined. See ChemComp for more details.
- descriptors (list) – Information on the component’s chemistry. See ChemComp for more details.
-
class
ihm.
Alphabet
A mapping from codes (usually one-letter, or two-letter for DNA) to chemical components. These classes can be used to construct sequences of components when creating an Entity. They can also be used like a Python dict to get standard components, e.g.:
    a = ihm.LPeptideAlphabet()
    met = a['M']
    gly = a['G']
-
class
ihm.
LPeptideAlphabet
A mapping from one-letter amino acid codes (e.g. H, M) to L-amino acids (as LPeptideChemComp objects, except for achiral glycine which maps to PeptideChemComp). Some other common modified residues are also included (e.g. MSE). For these, the full name rather than a one-letter code is used.
-
class
ihm.
DPeptideAlphabet
A mapping from D-amino acid codes (e.g. DHI, MED) to D-amino acids (as DPeptideChemComp objects, except for achiral glycine which maps to PeptideChemComp). See LPeptideAlphabet for more details.
-
class
ihm.
RNAAlphabet
[source]¶ A mapping from one-letter nucleic acid codes (e.g. A) to RNA (as
RNAChemComp
objects).
-
class
ihm.
DNAAlphabet
[source]¶ A mapping from two-letter nucleic acid codes (e.g. DA) to DNA (as
DNAChemComp
objects).
-
class
ihm.
Entity
(sequence, alphabet=<class 'ihm.LPeptideAlphabet'>, description=None, details=None, source=None, references=[]) Represent a CIF entity (with a unique sequence).
Parameters: - sequence (sequence) – The primary sequence, as a sequence of ChemComp objects, and/or codes looked up in alphabet.
- alphabet (Alphabet) – The mapping from code to chemical components to use (it is not necessary to instantiate this class).
- description (str) – A short text name for the sequence.
- details (str) – Longer text describing the sequence.
- source (ihm.source.Source) – The method by which the sample for this entity was produced.
- references (sequence of ihm.reference.Reference objects) – Information about this entity stored in external databases (for example the sequence in UniProt).
The sequence for an entity can be specified explicitly as a list of chemical components, or (more usually) as a list or string of codes, or a mixture of both. For example:
    import ihm

    # Construct with a string of one-letter amino acid codes
    protein = ihm.Entity('AHMD')
    # Some less common amino acids (e.g. MSE) have three-letter codes
    protein_with_mse = ihm.Entity(['A', 'H', 'MSE', 'D'])
    # Can use a non-default alphabet to make DNA or RNA sequences
    dna = ihm.Entity(('DA', 'DC'), alphabet=ihm.DNAAlphabet)
    rna = ihm.Entity('AC', alphabet=ihm.RNAAlphabet)
    # Can pass explicit ChemComp objects by looking them up in Alphabets
    dna_al = ihm.DNAAlphabet()
    rna_al = ihm.RNAAlphabet()
    dna_rna_hybrid = ihm.Entity((dna_al['DG'], rna_al['C']))
    # For unusual components (e.g. modified residues or ligands),
    # new ChemComp objects can be constructed
    psu = ihm.RNAChemComp(id='PSU', code='PSU', code_canonical='U',
                          name="PSEUDOURIDINE-5'-MONOPHOSPHATE",
                          formula='C9 H13 N2 O9 P')
    rna_with_psu = ihm.Entity(('A', 'C', psu), alphabet=ihm.RNAAlphabet)
For more examples, see the ligands and water example.
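How a mixed sequence of codes and component objects could be resolved through an alphabet is sketched below. This is a toy model rather than python-ihm itself: `Comp`, `PEPTIDES` and `resolve_sequence` are hypothetical names, and the alphabet is a plain dict holding only a few entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comp:
    """Toy chemical component (hypothetical; not ihm.ChemComp)."""
    id: str
    code: str


# Toy alphabet: codes mapped to components, as a plain dict
PEPTIDES = {
    "A": Comp("ALA", "A"),
    "H": Comp("HIS", "H"),
    "M": Comp("MET", "M"),
    "D": Comp("ASP", "D"),
    "MSE": Comp("MSE", "MSE"),
}


def resolve_sequence(sequence, alphabet):
    """Turn a string or list of codes and/or Comp objects into Comps."""
    return tuple(item if isinstance(item, Comp) else alphabet[item]
                 for item in sequence)


# A string is iterated one character at a time, so it suits 1-letter codes
from_string = [c.id for c in resolve_sequence("AHMD", PEPTIDES)]
# A list can mix multi-letter codes with explicit component objects
custom = Comp("PSU", "PSU")
mixed = [c.id for c in resolve_sequence(["A", custom, "MSE"], PEPTIDES)]
print(from_string, mixed)
```

The sketch also shows why multi-letter codes such as MSE or DA have to be passed in a list or tuple: inside a plain string they would be split into single characters.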
All entities should be stored in the top-level System object; see System.entities.
-
branch_descriptors
= None¶ String descriptors of branched chemical structure. These generally only make sense for oligosaccharide entities, and should be a list of
BranchDescriptor
objects.
-
branch_links
= None¶ Any links between components in a branched entity. This is a list of
BranchLink
objects.
-
formula_weight
¶ Formula weight (dalton). This is calculated automatically from that of the chemical components.
-
is_polymeric
()[source]¶ Return True iff this entity represents a polymer, such as an amino acid sequence or DNA/RNA chain (and not a ligand or water)
-
seq_id_range
¶ Sequence range
-
class
ihm.
EntityRange
(entity, seq_id_begin, seq_id_end) Part of an entity. Usually these objects are created from an Entity, e.g. to get a range covering residues 4 through 7 in entity use:
    entity = ihm.Entity(sequence=...)
    rng = entity(4,7)
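The call syntax for creating a range can be mimicked with a small standalone sketch. `ToyEntity` and `ToyRange` are hypothetical classes written for this illustration; the sketch assumes that seq_ids are 1-based and that "4 through 7" is inclusive at both ends.

```python
class ToyRange:
    """Hypothetical stand-in for an EntityRange-like object."""

    def __init__(self, parent, seq_id_begin, seq_id_end):
        self.parent = parent
        self.seq_id_range = (seq_id_begin, seq_id_end)

    @property
    def codes(self):
        begin, end = self.seq_id_range
        # seq_ids are 1-based and the range includes both ends
        return self.parent.sequence[begin - 1:end]


class ToyEntity:
    """Hypothetical entity whose instances are callable to make ranges."""

    def __init__(self, sequence):
        self.sequence = tuple(sequence)

    def __call__(self, seq_id_begin, seq_id_end):
        return ToyRange(self, seq_id_begin, seq_id_end)


entity = ToyEntity("ACDEFGHIK")
rng = entity(4, 7)
print(rng.seq_id_range, rng.codes)
```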
-
class
ihm.
AsymUnit
(entity, details=None, auth_seq_id_map=0, id=None, strand_id=None)[source]¶ An asymmetric unit, i.e. a unique instance of an Entity that was modeled.
Note that this class should not be used to describe crystal waters; for that, see WaterAsymUnit.
Parameters: - entity (Entity) – The unique sequence of this asymmetric unit.
- details (str) – Longer text description of this unit.
- auth_seq_id_map – Mapping from internal 1-based consecutive residue numbering (seq_id) to “author-provided” numbering (auth_seq_id plus an optional ins_code). This can be either an int offset, in which case auth_seq_id = seq_id + auth_seq_id_map with no insertion codes, or a mapping type (dict, list, tuple), in which case either auth_seq_id = auth_seq_id_map[seq_id] with no insertion codes, or auth_seq_id, ins_code = auth_seq_id_map[seq_id]; i.e. the output of the mapping is either the author-provided number, or a 2-element tuple containing that number and an insertion code. (Note that if a list or tuple is used for the mapping, the first element in the list or tuple does not correspond to the first residue and will never be used, since seq_id can never be zero.) The default if not specified, or not in the mapping, is for auth_seq_id == seq_id and for no insertion codes to be used.
- id (str) – User-specified ID (usually a string of one or more upper-case letters, e.g. A, B, C, AA). If not specified, IDs are automatically assigned alphabetically.
- strand_id (str) – PDB or “author-provided” strand/chain ID. If not specified, it will be the same as the regular ID.
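The auth_seq_id_map rules above can be summarized in a short sketch. `author_numbering` is a hypothetical helper written for illustration, not a function of the library; it returns an `(auth_seq_id, ins_code)` pair, with `None` standing for "no insertion code".

```python
def author_numbering(seq_id, auth_seq_id_map=0):
    """Return (auth_seq_id, ins_code) for a 1-based seq_id (illustrative)."""
    if isinstance(auth_seq_id_map, int):
        # Plain integer offset, never with an insertion code
        return seq_id + auth_seq_id_map, None
    try:
        value = auth_seq_id_map[seq_id]
    except (KeyError, IndexError):
        # Not in the mapping: author numbering equals seq_id
        return seq_id, None
    if isinstance(value, int):
        return value, None
    auth_seq_id, ins_code = value  # 2-element tuple
    return auth_seq_id, ins_code


by_offset = author_numbering(5, 10)
by_dict = author_numbering(2, {2: (7, "A")})
missing = author_numbering(3, {2: 7})
# With a list, index 0 is never used because seq_id starts at 1
by_list = author_numbering(1, [None, 100, 101])
print(by_offset, by_dict, missing, by_list)
```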
See System.asym_units.
-
segment
(gapped_sequence, seq_id_begin, seq_id_end)[source]¶ Get an object representing the alignment of part of this sequence.
Parameters: - gapped_sequence (str) – Sequence of the segment, including gaps.
- seq_id_begin (int) – Start of the segment.
- seq_id_end (int) – End of the segment.
-
seq_id_range
¶ Sequence range
-
sequence
¶ Primary sequence
-
strand_id
¶ PDB or author-provided strand/chain ID
-
class
ihm.
AsymUnitRange
(asym, seq_id_begin, seq_id_end) Part of an asymmetric unit. Usually these objects are created from an AsymUnit, e.g. to get a range covering residues 4 through 7 in asym use:
    asym = ihm.AsymUnit(entity)
    rng = asym(4,7)
-
class
ihm.
WaterAsymUnit
(entity, number, details=None, auth_seq_id_map=0, id=None, strand_id=None)[source]¶ A collection of crystal waters, all with the same “chain” ID.
Parameters: number (int) – The number of water molecules in this unit. For more information on this class and the rest of the parameters, see AsymUnit.
-
number_of_molecules
¶ Number of molecules
-
seq_id_range
¶ Sequence range
-
sequence
¶ Primary sequence
-
-
class
ihm.
Atom
(residue, id) A single atom in an entity or asymmetric unit. Usually these objects are created by calling Residue.atom().
Note that this class does not store atomic coordinates of a given atom in a given model; for that, see ihm.model.Atom.
-
class
ihm.
Residue
(seq_id, entity=None, asym=None) A single residue in an entity or asymmetric unit. Usually these objects are created by calling Entity.residue() or AsymUnit.residue().
-
auth_seq_id
¶ Author-provided seq_id; only makes sense for asymmetric units
-
comp
¶ Chemical component (residue type)
-
ins_code
¶ Insertion code; only makes sense for asymmetric units
-
-
class
ihm.
Assembly
(elements=(), name=None, description=None)[source]¶ A collection of parts of the system that were modeled or probed together.
Parameters: - elements (sequence) – Initial set of parts of the system.
- name (str) – Short text name of this assembly.
- description (str) – Longer text that describes this assembly.
This is implemented as a simple list of asymmetric units (or parts of them), i.e. a list of AsymUnit and/or AsymUnitRange objects. An Assembly is typically assigned to one or more of Model, ihm.protocol.Step, or Restraint.
See also System.complete_assembly and System.orphan_assemblies.
Note that any duplicate assemblies will be pruned on output.
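Pruning of duplicates can be pictured with the following sketch. It rests on an assumption the documentation does not confirm: two assemblies count as duplicates here only if they list the same elements in the same order. `prune_duplicates` is a hypothetical helper, and assemblies are modeled as plain lists of chain labels.

```python
def prune_duplicates(assemblies):
    """Keep the first occurrence of each distinct assembly (illustrative)."""
    seen = set()
    unique = []
    for assembly in assemblies:
        key = tuple(assembly)  # lists are unhashable, tuples are not
        if key not in seen:
            seen.add(key)
            unique.append(assembly)
    return unique


complete = ["A", "B", "C"]
subset = ["A", "B"]
pruned = prune_duplicates([complete, subset, ["A", "B", "C"]])
print(pruned)
```

Keeping the first occurrence preserves the convention that the complete assembly stays first in the output.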
-
class
ihm.
ChemDescriptor
(auth_name, chem_comp_id=None, chemical_name=None, common_name=None, smiles=None, smiles_canonical=None, inchi=None, inchi_key=None) Description of a non-polymeric chemical component used in the experiment. For example, this might be a fluorescent probe or cross-linking agent. This class describes the chemical structure of the component, for example with a SMILES or INCHI descriptor, so that it is uniquely defined. A descriptor is typically assigned to an ihm.restraint.CrossLinkRestraint.
See ihm.cross_linkers for chemical descriptors of some commonly-used cross-linking agents.
Parameters: - auth_name (str) – Author-provided name
- chem_comp_id (str) – If this chemical is listed in the Chemical Component Dictionary, its three-letter identifier
- chemical_name (str) – The systematic (IUPAC) chemical name
- common_name (str) – Common name for the component
- smiles (str) – SMILES string
- smiles_canonical (str) – Canonical SMILES string
- inchi (str) – IUPAC INCHI descriptor
- inchi_key (str) – Hashed INCHI key
See also
System.orphan_chem_descriptors
.
-
class
ihm.
Collection
(id, name=None, details=None) A collection of entries belonging to a single deposition or group. These are used by the archive to group multiple related entries, e.g. all entries deposited as part of a given study, or all models for a genome. An entry (System) can belong to multiple collections.
Parameters: - id (str) – Unique identifier (assigned by the archive).
- name (str) – Short name for the collection.
- details (str) – Longer description of the collection.
See also
System.collections
.
-
class
ihm.
BranchDescriptor
(text, type, program=None, program_version=None) String descriptor of branched chemical structure. These generally only make sense for oligosaccharide entities. See Entity.branch_descriptors.
Parameters: - text (str) – The value of this descriptor.
- type (str) – The type of the descriptor; one of “Glycam Condensed Core Sequence”, “Glycam Condensed Sequence”, “LINUCS”, or “WURCS”.
- program (str) – The name of the program or library used to compute the descriptor.
- program_version (str) – The version of the program or library used to compute the descriptor.
-
class
ihm.
BranchLink
(num1, atom_id1, leaving_atom_id1, num2, atom_id2, leaving_atom_id2, order=None, details=None) A link between components in a branched entity. These generally only make sense for oligosaccharide entities. See Entity.branch_links.
Parameters: - num1 (int) – 1-based index of the first component.
- atom_id1 (str) – Name of the first atom in the linkage.
- leaving_atom_id1 (str) – Name of the first leaving atom.
- num2 (int) – 1-based index of the second component.
- atom_id2 (str) – Name of the second atom in the linkage.
- leaving_atom_id2 (str) – Name of the second leaving atom.
- order (str) – Bond order (e.g. sing, doub, trip).
- details (str) – More information about this link.