The ihm.metadata Python module
Classes to extract metadata from various input files.
Often input files contain metadata that would be useful to include in the mmCIF file, but the metadata is stored in a different way for each domain-specific file type. For example, MRC files used for electron microscopy maps may contain an EMDB identifier, which the mmCIF file can point to in preference to the local file.
This module provides classes for each file type to extract suitable metadata where available.
- class ihm.metadata.MRCParser
Extract metadata from an EM density map (MRC file).
- parse_file(filename)
Extract metadata. See Parser.parse_file() for details.
- Returns:
a dict with key ‘dataset’ pointing to the density map: an EMDB entry if the file contains EMDB headers, otherwise the file itself.
If the file turns out to be an EMDB entry, this will also query the EMDB web API (if available) to extract version information and details for the dataset.
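To illustrate the kind of header inspection involved, the sketch below scans the text labels of a 1024-byte MRC header (label count NLABL at bytes 220-223, then up to ten 80-character labels starting at byte 224) for an EMDB accession. It assumes, for illustration only, that the accession appears in a label as EMD-<digits>. find_emdb_id is a hypothetical helper, not part of ihm, and it does not query the EMDB web API.

```python
import re
import struct

# Assumption for illustration: an EMDB accession appears as "EMD-<digits>"
# in one of the 80-character text labels. Hypothetical helper, not ihm's API.
_EMDB_RE = re.compile(rb"EMD-(\d+)")


def find_emdb_id(header):
    """Return e.g. 'EMD-1883' if the MRC header labels mention one, else None."""
    if len(header) < 1024:
        raise ValueError("an MRC header is 1024 bytes long")
    raw = header[220:224]  # NLABL (word 56): number of labels in use
    # Byte order is not known up front; NLABL is at most 10, so take the
    # little- or big-endian reading with the smaller magnitude
    nlabl = min(struct.unpack("<i", raw)[0], struct.unpack(">i", raw)[0], key=abs)
    nlabl = max(0, min(nlabl, 10))
    for i in range(nlabl):
        label = header[224 + 80 * i:224 + 80 * (i + 1)]
        m = _EMDB_RE.search(label)
        if m:
            return "EMD-" + m.group(1).decode("ascii")
    return None
```

For a map on disk, open the file in binary mode and pass the first 1024 bytes it returns.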
- class ihm.metadata.PDBParser
Extract metadata (e.g. PDB ID, comparative modeling templates) from a PDB file. This handles PDB headers added by the PDB database itself, comparative modeling packages such as MODELLER and Phyre2, and also some custom headers that can be used to indicate that a file has been locally modified in some way.
See also
CIFParser for coordinate files in mmCIF format, or BinaryCIFParser for BinaryCIF format.
- parse_file(filename)
Extract metadata. See Parser.parse_file() for details.
- Parameters:
filename (str) – the file to extract metadata from.
- Returns:
a dict with key ‘dataset’ pointing to the PDB dataset; ‘templates’ pointing to a dict with keys the asym (chain) IDs in the PDB file and values the list of comparative model templates used to model that chain, as ihm.startmodel.Template objects; ‘entity_source’ pointing to a dict with keys the asym IDs and values ihm.source.Source objects; ‘software’ pointing to a list of software used to generate the file (as ihm.Software objects); ‘script’ pointing to the script used to generate the file, if any (as an ihm.location.WorkflowFileLocation object); ‘metadata’ pointing to a list of PDB metadata records.
This parser looks at PDB headers. Standard PDB database headers are recognized, plus some added by common comparative modeling packages such as MODELLER and Phyre2, as well as some custom headers that can be used to denote that a PDB file is a locally-modified version of some other resource. Additional details will be extracted from other PDB headers if available, such as TITLE records.
If the first line of the file starts with HEADER and it also contains a PDB ID, then the file is assumed to live in the PDB database. For example, the following will be interpreted as PDB entry 2HBJ:
    HEADER HYDROLASE, GENE REGULATION 14-JUN-06 2HBJ
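A minimal sketch of the HEADER check, assuming a simple regular-expression heuristic: the PDB ID is taken to be the last token on the line, a four-character code whose first character is a digit. pdb_id_from_header is a hypothetical helper, not ihm's own implementation.

```python
import re

# Heuristic (an assumption, not ihm's code): the PDB ID is the last
# whitespace-separated token of the HEADER line, four characters long,
# starting with a digit (e.g. 2HBJ).
_HEADER_RE = re.compile(r"HEADER\s.*\s(\d[0-9A-Za-z]{3})\s*$")


def pdb_id_from_header(first_line):
    """Return the upper-cased PDB ID from a HEADER line, or None."""
    m = _HEADER_RE.match(first_line)
    return m.group(1).upper() if m else None
```

A HEADER line that ends with the deposition date rather than an ID yields None, so such a file would not be treated as a PDB entry.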
If the first line starts with EXPDTA DERIVED FROM, then the file is assumed to derive from a given PDB ID or from a comparative or integrative model available at a given DOI. TITLE records are expected to describe the nature of the transformation. For example:
    EXPDTA DERIVED FROM PDB:1YKH
    EXPDTA DERIVED FROM COMPARATIVE MODEL, DOI:10.1093/nar/gkt704
    EXPDTA DERIVED FROM INTEGRATIVE MODEL, DOI:10.1016/j.str.2017.01.006
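The three EXPDTA forms above can be told apart with one regular expression. This is a sketch only: parse_derived_from and its (kind, identifier) return convention are hypothetical, chosen for illustration rather than taken from ihm.

```python
import re

# Hypothetical helper (not ihm's API): parse an "EXPDTA DERIVED FROM" line
# into (kind, identifier). \s+ tolerates the column padding of real files.
_DERIVED_RE = re.compile(
    r"EXPDTA\s+DERIVED FROM\s+"
    r"(?:PDB:\s*(?P<pdb>\w+)"
    r"|(?P<kind>COMPARATIVE|INTEGRATIVE) MODEL,\s*DOI:\s*(?P<doi>\S+))"
)


def parse_derived_from(line):
    """Return ('PDB', id), ('comparative model', doi),
    ('integrative model', doi), or None if the line does not match."""
    m = _DERIVED_RE.match(line.strip())
    if m is None:
        return None
    if m.group("pdb"):
        return ("PDB", m.group("pdb").upper())
    return (m.group("kind").lower() + " model", m.group("doi"))
```

Keeping the PDB and DOI cases as named alternatives in one pattern means any other text after DERIVED FROM simply fails to match instead of being misread.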
A first line starting with REMARK 99 Chain ID : is assumed to be a model generated by Phyre2. Template information can be added using Modeller-style headers, as described below, if desired.
A first line starting with EXPDTA THEORETICAL MODEL, MODELLER is assumed to be a model generated by Modeller. Headers generated by modern versions of Modeller are parsed to extract information about the comparative modeling script, plus the templates used and their alignment. Templates named 1abcX or 1abcX_N are assumed to be structures deposited in PDB (in this case, chain X in structure 1ABC). A custom TEMPLATE PATH header can be used to point to templates that are not deposited in the PDB database. For example, the model below is assumed to be constructed using templates from PDB codes 3JRO and 3F3F, plus another template in my_custom_pdb_file.pdb, and the given alignment:
    EXPDTA THEORETICAL MODEL, MODELLER 9.18 2017/02/10 22:21:34
    REMARK 6 ALIGNMENT: modeller_model.ali
    REMARK 6 SCRIPT: model-default.py
    REMARK 6 TEMPLATE PATH custom1 ../inputs/my_custom_pdb_file.pdb
    REMARK 6 TEMPLATE: 3jroC 33:C - 424:C MODELS 33:A - 424:A AT 100.0%
    REMARK 6 TEMPLATE: 3f3fG 482:G - 551:G MODELS 429:A - 488:A AT 10.0%
    REMARK 6 TEMPLATE: custom1 9:A - 352:A MODELS 80:A - 414:A AT 32.0%
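The sketch below shows how the TEMPLATE and TEMPLATE PATH headers in that example could be combined, applying the 1abcX / 1abcX_N naming rule to recover PDB codes and chains. parse_modeller_templates and the keys of the dicts it returns are hypothetical; ihm itself returns ihm.startmodel.Template objects instead.

```python
import re

# Hypothetical sketch, not ihm's parser: extract template records from
# MODELLER "REMARK 6" headers like those shown above.
_PATH_RE = re.compile(r"REMARK\s+6 TEMPLATE PATH\s+(\S+)\s+(\S+)")
_TEMPLATE_RE = re.compile(
    r"REMARK\s+6 TEMPLATE:\s+(?P<name>\S+)\s+"
    r"(?P<tbeg>[^:\s]+):\S+\s+-\s+(?P<tend>[^:\s]+):\S+\s+"
    r"MODELS\s+(?P<mbeg>[^:\s]+):(?P<mchain>\S+)\s+-\s+(?P<mend>[^:\s]+):\S+\s+"
    r"AT\s+(?P<seqid>[\d.]+)%")
# Naming rule: 1abcX or 1abcX_N means chain X of PDB entry 1ABC
_PDB_NAME_RE = re.compile(r"(\d[0-9A-Za-z]{3})([0-9A-Za-z])(?:_\d+)?$")


def parse_modeller_templates(lines):
    lines = list(lines)
    # First pass: map custom template names to local file paths
    paths = {}
    for line in lines:
        m = _PATH_RE.match(line)
        if m:
            paths[m.group(1)] = m.group(2)
    # Second pass: one record per "TEMPLATE:" line
    templates = []
    for line in lines:
        m = _TEMPLATE_RE.match(line)
        if not m:
            continue
        name = m.group("name")
        pdb = _PDB_NAME_RE.match(name)
        templates.append({
            "name": name,
            "pdb_id": pdb.group(1).upper() if pdb else None,
            "pdb_chain": pdb.group(2) if pdb else None,
            "path": paths.get(name),  # only set for custom templates
            "template_range": (m.group("tbeg"), m.group("tend")),
            "model_chain": m.group("mchain"),
            "model_range": (m.group("mbeg"), m.group("mend")),
            "seq_id": float(m.group("seqid")),
        })
    return templates
```

Collecting the TEMPLATE PATH entries in a separate first pass keeps the result independent of the order in which the two kinds of header appear.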
A first line starting with TITLE SWISS-MODEL SERVER is assumed to be a model generated by SWISS-MODEL, and information about the template(s) is extracted from REMARK 3 records.
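The first-line rules described for PDBParser amount to a prefix dispatch. The sketch below captures that dispatch only; classify_first_line and its return labels are hypothetical names, and it does not extract any of the further details (templates, DOIs, software) that the real parser reads from later records.

```python
import re

# Hypothetical helper, not ihm's API: classify a PDB file by its first line,
# following the rules described above.
_PREFIXES = [
    ("EXPDTA DERIVED FROM", "derived"),
    ("EXPDTA THEORETICAL MODEL, MODELLER", "modeller"),
    ("REMARK 99 Chain ID :", "phyre2"),
    ("TITLE SWISS-MODEL SERVER", "swiss-model"),
]
# A HEADER line only counts if it ends in something that looks like a PDB ID
_PDB_ID_AT_END = re.compile(r"\s\d[0-9A-Za-z]{3}$")


def classify_first_line(line):
    # Collapse runs of whitespace so that fixed-column padding in real files
    # (e.g. "EXPDTA    DERIVED FROM") still matches the prefixes
    norm = " ".join(line.split())
    if norm.startswith("HEADER") and _PDB_ID_AT_END.search(norm):
        return "pdb"
    for prefix, kind in _PREFIXES:
        if norm.startswith(prefix):
            return kind
    return "unknown"
```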
- class ihm.metadata.CIFParser
Extract metadata (e.g. PDB ID, comparative modeling templates) from an mmCIF file. This currently handles mmCIF files from the PDB database itself, models compliant with the ModelCIF dictionary, files from Model Archive, and the output of the MODELLER comparative modeling package.
See also
PDBParser for coordinate files in legacy PDB format, or BinaryCIFParser for BinaryCIF format.
- parse_file(filename)
Extract metadata. See Parser.parse_file() for details.
- Parameters:
filename (str) – the file to extract metadata from.
- Returns:
a dict with key ‘dataset’ pointing to the coordinate file: an entry in the PDB or Model Archive databases if the file contains appropriate headers, otherwise the file itself; ‘templates’ pointing to a dict with keys the asym (chain) IDs in the file and values the list of comparative model templates used to model that chain, as ihm.startmodel.Template objects; ‘entity_source’ pointing to a dict with keys the asym IDs and values ihm.source.Source objects; ‘software’ pointing to a list of software used to generate the file (as ihm.Software objects); ‘script’ pointing to the script used to generate the file, if any (as an ihm.location.WorkflowFileLocation object).