The ihm.metadata Python module
Classes to extract metadata from various input files.
Often input files contain metadata that would be useful to include in the mmCIF file, but the metadata is stored in a different way for each domain-specific file type. For example, MRC files used for electron microscopy maps may contain an EMDB identifier, which the mmCIF file can point to in preference to the local file.
This module provides classes for each file type to extract suitable metadata where available.
- class ihm.metadata.MRCParser
Extract metadata from an EM density map (MRC file).
- parse_file(filename)
Extract metadata. See Parser.parse_file() for details.
- Returns:
a dict with key ‘dataset’ pointing to the density map, as an EMDB entry if the file contains EMDB headers, otherwise to the file itself.
If the file turns out to be an EMDB entry, this will also query the EMDB web API (if available) to extract version information and details for the dataset.
- class ihm.metadata.PDBParser
Extract metadata (e.g. PDB ID, comparative modeling templates) from a PDB file. This handles PDB headers added by the PDB database itself, comparative modeling packages such as MODELLER and Phyre2, and also some custom headers that can be used to indicate that a file has been locally modified in some way.
See also CIFParser for coordinate files in mmCIF format, or BinaryCIFParser for BinaryCIF format.
- parse_file(filename)
Extract metadata. See Parser.parse_file() for details.
- Parameters:
filename (str) – the file to extract metadata from.
- Returns:
a dict with key ‘dataset’ pointing to the PDB dataset; ‘templates’ pointing to a dict whose keys are the asym (chain) IDs in the PDB file and whose values are lists of the comparative model templates used to model that chain (as ihm.startmodel.Template objects); ‘entity_source’ pointing to a dict whose keys are the asym IDs and whose values are ihm.source.Source objects; ‘software’ pointing to a list of software used to generate the file (as ihm.Software objects); ‘script’ pointing to the script used to generate the file, if any (as ihm.location.WorkflowFileLocation objects); and ‘metadata’ pointing to a list of PDB metadata records.
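A minimal usage sketch of the return value described above. The PDBParser().parse_file() call is taken from this page; the helper names summarize_pdb_metadata and parse_and_summarize are hypothetical, and the import is deferred so that the summary helper can be tried on a plain dict without the ihm package installed.

```python
def summarize_pdb_metadata(result):
    """Return a short list of summary strings for a parse_file() result.

    `result` is assumed to be a dict with the keys documented above;
    missing keys are treated as empty.
    """
    lines = ["dataset: %r" % (result.get('dataset'),)]
    # 'templates' maps asym (chain) ID -> list of template objects
    for asym_id, templates in sorted(result.get('templates', {}).items()):
        lines.append("chain %s: %d template(s)" % (asym_id, len(templates)))
    lines.append("software entries: %d" % len(result.get('software', [])))
    lines.append("script: %r" % (result.get('script'),))
    return lines


def parse_and_summarize(filename):
    """Parse a PDB file with ihm.metadata and summarize the result."""
    # Deferred import: the ihm package is only needed when this is called.
    import ihm.metadata
    result = ihm.metadata.PDBParser().parse_file(filename)
    return summarize_pdb_metadata(result)
```

Calling parse_and_summarize() on a real PDB file requires the ihm package; the summary helper itself only inspects the dict keys.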
This parser looks at PDB headers. Standard PDB database headers are recognized, plus some added by common comparative modeling packages such as MODELLER and Phyre2, as well as some custom headers that can be used to denote that a PDB file is a locally-modified version of some other resource. Additional details will be extracted from other PDB headers if available, such as TITLE records.
If the first line of the file starts with HEADER and it also contains a PDB ID, then the file is assumed to live in the PDB database. For example, the following will be interpreted as PDB entry 2HBJ:
    HEADER HYDROLASE, GENE REGULATION 14-JUN-06 2HBJ
If the first line starts with EXPDTA DERIVED FROM then the file is assumed to derive from a given PDB ID or a comparative or integrative model available at a given DOI. TITLE records are expected to describe the nature of the transformation:
    EXPDTA DERIVED FROM PDB:1YKH
    EXPDTA DERIVED FROM COMPARATIVE MODEL, DOI:10.1093/nar/gkt704
    EXPDTA DERIVED FROM INTEGRATIVE MODEL, DOI:10.1016/j.str.2017.01.006
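The three header forms above can be recognized with a single regular expression. This is only an illustrative sketch, not the parser's own implementation: the function name parse_derived_from, the returned tuple layout, and the assumption that a PDB ID is exactly four word characters are all hypothetical.

```python
import re

# Pattern for the custom 'EXPDTA DERIVED FROM' header forms shown above.
_DERIVED_RE = re.compile(
    r'^EXPDTA\s+DERIVED FROM\s+'
    r'(?:PDB:(?P<pdb>\w{4})'
    r'|(?P<kind>COMPARATIVE|INTEGRATIVE) MODEL,\s*DOI:(?P<doi>\S+))\s*$')


def parse_derived_from(line):
    """Return (kind, identifier) for a DERIVED FROM header, else None."""
    m = _DERIVED_RE.match(line.rstrip())
    if m is None:
        return None
    if m.group('pdb'):
        return ('PDB', m.group('pdb').upper())
    # e.g. 'COMPARATIVE' -> 'Comparative model'
    return (m.group('kind').title() + ' model', m.group('doi'))
```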
A file whose first line starts with REMARK 99 Chain ID : is assumed to be a model generated by Phyre2. Template information can be added using Modeller-style headers, as below, if desired.
A file whose first line starts with EXPDTA THEORETICAL MODEL, MODELLER is assumed to be a model generated by Modeller. Headers generated by modern versions of Modeller are parsed to extract information about the comparative modeling script, plus the templates used and their alignment. Templates named 1abcX or 1abcX_N are assumed to be structures deposited in PDB (in this case, chain X in structure 1ABC). A custom TEMPLATE PATH header can be used to point to templates that are not deposited in the PDB database. For example, the model below is assumed to be constructed using templates from PDB codes 3JRO and 3F3F, plus another template in my_custom_pdb_file.pdb, and the given alignment:
    EXPDTA THEORETICAL MODEL, MODELLER 9.18 2017/02/10 22:21:34
    REMARK 6 ALIGNMENT: modeller_model.ali
    REMARK 6 SCRIPT: model-default.py
    REMARK 6 TEMPLATE PATH custom1 ../inputs/my_custom_pdb_file.pdb
    REMARK 6 TEMPLATE: 3jroC 33:C - 424:C MODELS 33:A - 424:A AT 100.0%
    REMARK 6 TEMPLATE: 3f3fG 482:G - 551:G MODELS 429:A - 488:A AT 10.0%
    REMARK 6 TEMPLATE: custom1 9:A - 352:A MODELS 80:A - 414:A AT 32.0%
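To make the template naming rules concrete, here is a rough sketch of how the REMARK 6 TEMPLATE records in the example could be read. It is not the parser's actual code: the function name parse_modeller_templates, the regular expressions, and the output dict layout are assumptions made for illustration, and whitespace between fields is matched loosely.

```python
import re

_PATH_RE = re.compile(r'^REMARK\s+6\s+TEMPLATE PATH\s+(\S+)\s+(\S+)')
_TEMPLATE_RE = re.compile(
    r'^REMARK\s+6\s+TEMPLATE:\s+(\S+)\s+'
    r'([^\s:]+):(\S*)\s+-\s+([^\s:]+):\S*\s+MODELS\s+'
    r'([^\s:]+):(\S*)\s+-\s+([^\s:]+):\S*\s+AT\s+([\d.]+)%')
# Names such as 1abcX or 1abcX_N: four-character PDB code plus chain ID
_PDB_NAME_RE = re.compile(r'^(\d[A-Za-z0-9]{3})([A-Za-z0-9])(?:_.+)?$')


def parse_modeller_templates(lines):
    """Return one dict per TEMPLATE record, in file order."""
    paths = {}
    records = []
    for line in lines:
        m = _PATH_RE.match(line)
        if m:
            paths[m.group(1)] = m.group(2)   # custom name -> file path
            continue
        m = _TEMPLATE_RE.match(line)
        if m:
            records.append(m.groups())
    templates = []
    for (name, t_start, t_chain, t_end,
         m_start, m_chain, m_end, identity) in records:
        entry = {'name': name,
                 'template_range': (t_chain, t_start, t_end),
                 'model_range': (m_chain, m_start, m_end),
                 'identity': float(identity)}
        pdb = _PDB_NAME_RE.match(name)
        if name in paths:
            # A TEMPLATE PATH header takes precedence over the name
            entry['source'] = ('file', paths[name])
        elif pdb:
            entry['source'] = ('PDB', pdb.group(1).upper(), pdb.group(2))
        else:
            entry['source'] = ('unknown', name)
        templates.append(entry)
    return templates
```

TEMPLATE PATH names are resolved after all lines are read, so the sketch does not depend on the order of the headers.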
A file whose first line starts with TITLE SWISS-MODEL SERVER is assumed to be a model generated by SWISS-MODEL, and information about the template(s) is extracted from REMARK 3 records.
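The first-line rules described for this parser can be summarized as a small classifier. This is a hedged sketch only: the function name classify_first_line and the returned labels are hypothetical, runs of whitespace are collapsed before comparison (real files may use fixed columns), and the PDB ID is assumed to be the last token of the HEADER line, four alphanumeric characters starting with a digit.

```python
import re

# Assumption: a PDB ID is four alphanumerics, the first being a digit.
_PDB_ID_RE = re.compile(r'^\d[A-Za-z0-9]{3}$')


def classify_first_line(line):
    """Guess the origin of a PDB file from its first line.

    Returns a (label, pdb_id) tuple; pdb_id is None unless the line is
    a HEADER record that ends with something that looks like a PDB ID.
    """
    text = ' '.join(line.split())   # collapse runs of whitespace
    if text.startswith('HEADER'):
        tokens = text.split()
        if len(tokens) > 1 and _PDB_ID_RE.match(tokens[-1]):
            return ('PDB entry', tokens[-1].upper())
        return ('unknown', None)
    if text.startswith('EXPDTA DERIVED FROM'):
        return ('derived file', None)
    if text.startswith('REMARK 99 Chain ID :'):
        return ('Phyre2 model', None)
    if text.startswith('EXPDTA THEORETICAL MODEL, MODELLER'):
        return ('Modeller model', None)
    if text.startswith('TITLE SWISS-MODEL SERVER'):
        return ('SWISS-MODEL model', None)
    return ('unknown', None)
```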
- class ihm.metadata.CIFParser
Extract metadata (e.g. PDB ID, comparative modeling templates) from an mmCIF file. This currently handles mmCIF files from the PDB database itself, models compliant with the ModelCIF dictionary, plus files from Model Archive or the outputs from the MODELLER comparative modeling package.
See also PDBParser for coordinate files in legacy PDB format, or BinaryCIFParser for BinaryCIF format.
- parse_file(filename)
Extract metadata. See Parser.parse_file() for details.
- Parameters:
filename (str) – the file to extract metadata from.
- Returns:
a dict with key ‘dataset’ pointing to the coordinate file, as an entry in the PDB or Model Archive databases if the file contains appropriate headers, otherwise to the file itself; ‘templates’ pointing to a dict whose keys are the asym (chain) IDs in the PDB file and whose values are lists of the comparative model templates used to model that chain (as ihm.startmodel.Template objects); ‘software’ pointing to a list of software used to generate the file (as ihm.Software objects); and ‘script’ pointing to the script used to generate the file, if any (as ihm.location.WorkflowFileLocation objects).
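Since each parser class on this page exposes the same parse_file(filename) method, one way to use them together is to pick a parser from the file extension. The sketch below is an assumption-laden illustration: the class names come from this page, but the extension mapping and the helper names parser_name_for and extract_metadata are hypothetical, and the ihm import is deferred until a file is actually parsed.

```python
import os

# Hypothetical mapping from file extension to parser class name.
# The class names are those documented on this page; the extensions
# are assumptions for illustration.
_PARSER_BY_EXT = {
    '.mrc': 'MRCParser',
    '.pdb': 'PDBParser',
    '.cif': 'CIFParser',
    '.bcif': 'BinaryCIFParser',
}


def parser_name_for(filename):
    """Return the parser class name for a file, or None if unknown."""
    ext = os.path.splitext(filename)[1].lower()
    return _PARSER_BY_EXT.get(ext)


def extract_metadata(filename):
    """Parse `filename` with the matching ihm.metadata parser."""
    name = parser_name_for(filename)
    if name is None:
        raise ValueError("no metadata parser known for %r" % filename)
    # Deferred import: the ihm package is only needed at this point.
    import ihm.metadata
    parser = getattr(ihm.metadata, name)()
    return parser.parse_file(filename)
```

Choosing by extension is a convenience only; a file's contents, not its name, determine what metadata the parser can actually extract.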