The ihm.format Python module
Utility classes to handle CIF format.
This module provides classes to read in and write out mmCIF files. It is
only concerned with handling syntactically correct CIF - it does not know
the set of tables or the mapping to ihm objects. For that,
see ihm.dumper for writing and ihm.reader for reading.
See also the stream parser example and the token reader example.
- class ihm.format.CifWriter(fh)
Write information to a CIF file. The constructor takes a single argument, a Python file-like object to write to, and the resulting object provides methods to write Python objects to that file. Most simple Python types are supported (string, float, bool, int). The Python bool type is mapped to the CIF strings ‘NO’ and ‘YES’. Floats are always written with 3 decimal places (or in scientific notation with 3 digits of precision if smaller than 1e-3); if a different precision is desired, convert the float to a string first.
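The float-to-string conversion mentioned above can be sketched as follows. This is a minimal illustration, not part of ihm.format: `format_float` and `write_cell_length` are hypothetical helper names, and the `_cell` category with its `length_a` key is used only as an example.

```python
def format_float(value, digits):
    """Hypothetical helper: render a float with a fixed number of decimals."""
    return "%.*f" % (digits, value)


def write_cell_length(fh, length):
    """Write one value with 5 decimal places rather than the default 3."""
    # Imported here so that format_float() can be used without python-ihm
    import ihm.format
    writer = ihm.format.CifWriter(fh)
    with writer.category("_cell") as l:
        # A str is written as given, so the writer's own float
        # formatting is not applied to it
        l.write(length_a=format_float(length, 5))
```

Calling `write_cell_length(fh, 61.123456)` with any writable file-like object `fh` should then emit the value as formatted by `format_float` rather than rounded to 3 places.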
- category(category)
Return a context manager to write a CIF category. A CIF category is a simple list of key:value pairs.
- Parameters:
category (str) – the name of the category (e.g. “_struct_conf_type”).
- Returns:
an object with a single method write which takes keyword arguments.
For example:
with writer.category("_struct_conf_type") as l:
    l.write(id='HELX_P', criteria=writer.unknown)
- loop(category, keys)
Return a context manager to write a CIF loop.
- Parameters:
category (str) – the name of the category (e.g. “_struct_conf”)
keys (list) – the field keys in that category
- Returns:
an object with a single method write which takes keyword arguments; this can be called any number of times to add entries to the loop. Any field keys in keys that are not provided as arguments to write, or values that are the Python value None, will get the CIF omitted value (‘.’), while arguments to write that are not present in keys will be ignored.
For example:
with writer.loop("_struct_conf", ["id", "conf_type_id"]) as l:
    for i in range(5):
        l.write(id='HELX_P1%d' % i, conf_type_id='HELX_P')
- class ihm.format.CifReader(fh, category_handler, unknown_category_handler=None, unknown_keyword_handler=None)
Class to read an mmCIF file and extract some or all of its data.
Use read_file() to actually read the file.
See also CifTokenReader for a class that operates on the lower-level structure of an mmCIF file, preserving data such as comments and whitespace.
- Parameters:
fh (file) – Open handle to the mmCIF file
category_handler (dict) – A dict to handle data extracted from the file. Keys are category names (e.g. “_entry”) and values are objects that have a __call__ method and not_in_file, omitted, and unknown attributes. The names of the arguments to this __call__ method are mmCIF keywords that are extracted from the file (for the keywords tr_vector[N] and rot_matrix[N][M] simply omit the [ and ] characters, since these are not valid for Python identifiers). The object will be called with the data from the file as a set of strings, or not_in_file, omitted or unknown for any keyword that is not present in the file, the mmCIF omitted value (.), or mmCIF unknown value (?) respectively. (mmCIF keywords are case insensitive, so this class always treats them as lowercase regardless of the file contents.)
unknown_category_handler – A callable (or None) that is called for each category in the file that isn’t handled; it is given two arguments: the name of the category, and the line in the file at which the category was encountered (if known, otherwise None).
unknown_keyword_handler – A callable (or None) that is called for each keyword in the file that isn’t handled (within a category that is handled); it is given three arguments: the names of the category and keyword, and the line in the file at which the keyword was encountered (if known, otherwise None).
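A category handler as described above might look like the following sketch. `EntityHandler`, `report_unknown_category` and `read_entities` are hypothetical names chosen for illustration, and the choice of `None` and `"?"` as placeholder values is an assumption, since any object can be used.

```python
class EntityHandler:
    """Collect the id and type of every row in the _entity category."""

    # Values passed in place of a string when a keyword is absent from
    # the file, set to '.', or set to '?', respectively
    not_in_file = None
    omitted = None
    unknown = "?"

    def __init__(self):
        self.entities = {}

    # The argument names are the (lowercase) mmCIF keywords to extract
    def __call__(self, id, type):
        self.entities[id] = type


def report_unknown_category(category, line):
    """Called with the name and line number of each unhandled category."""
    print("Unhandled category %s at line %s" % (category, line))


def read_entities(fh):
    """Read one data block from fh and return {entity id: entity type}."""
    # Imported here so that the handler above can be used without python-ihm
    import ihm.format
    handler = EntityHandler()
    reader = ihm.format.CifReader(
        fh, category_handler={"_entity": handler},
        unknown_category_handler=report_unknown_category)
    reader.read_file()
    return handler.entities
```

Because the handler is an ordinary callable, it can also be exercised directly, for example `EntityHandler()(id="1", type="polymer")`, without reading a file.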
- read_file()
Read the file and extract data. Category handlers will be called as data becomes available: for loop_ constructs, this will be once for each row in the loop; for categories (e.g. _entry.id model), this will be once at the very end of the file.
If the C-accelerated _format module is available, then it is used instead of the (much slower) Python tokenizer.
CifParserError will be raised if the file cannot be parsed.
- Returns:
True if and only if more data blocks are available to be read.
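Since read_file() reports whether further data blocks remain, a file containing several blocks can be consumed by calling it repeatedly. The sketch below assumes only that behavior; `read_all_blocks` is a hypothetical helper name, and `reader` is expected to be a CifReader (or any object with a compatible read_file method).

```python
def read_all_blocks(reader):
    """Call reader.read_file() until no data blocks remain.

    Returns the number of read_file() calls that were made.
    """
    calls = 1
    while reader.read_file():
        # read_file() returned True, so at least one more block follows
        calls += 1
    return calls
```

The category handlers given to the reader are called for the data in each block in turn, so a handler that accumulates state will see rows from every block.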
- class ihm.format.CifTokenReader(fh)
Read an mmCIF file and break it into tokens.
Unlike CifReader, which extracts selected data from an mmCIF file, this class operates on the file at a lower level, splitting it into tokens and preserving data such as comments and whitespace. This can be used for various housekeeping tasks directly on an mmCIF file, such as changing chain IDs or renaming categories or data items.
Use read_file() to actually read the file.
- Parameters:
fh (file) – Open handle to the mmCIF file
- read_file(filters=None)
Read the file and yield tokens and/or token groups. The exact type of the tokens is subject to change and is not currently documented; however, each token or group object has an as_mmcif method which returns the corresponding text in mmCIF format. Thus, the file can be reconstructed by concatenating the result of as_mmcif for all tokens.
CifParserError will be raised if the file cannot be parsed.
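The reconstruction described above can be sketched as a token-by-token copy. `copy_tokens` and `copy_mmcif` are hypothetical helper names; the sketch relies only on each token having an as_mmcif method, as stated above.

```python
def copy_tokens(tokens, fh_out):
    """Write the mmCIF text of each token to fh_out, in order."""
    for token in tokens:
        fh_out.write(token.as_mmcif())


def copy_mmcif(fh_in, fh_out, filters=None):
    """Copy an mmCIF file token by token, optionally applying filters."""
    # Imported here so that copy_tokens() can be used without python-ihm
    import ihm.format
    reader = ihm.format.CifTokenReader(fh_in)
    copy_tokens(reader.read_file(filters), fh_out)
```

With no filters, the output is expected to match the input, since comments and whitespace are preserved as tokens.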
- class ihm.format.Filter(target)
Base class for filters used by CifTokenReader.read_file().
Typically, a subclass such as ChangeValueFilter is used when reading an mmCIF file.
- Parameters:
target (str) – the mmCIF data item this filter should act on. It can be the full name of the data item (including category) such as _entity.type; or just the attribute or keyword name such as .type_symbol which would match any category (e.g. _atom_site.type_symbol).
- filter_category(tok)
Filter the given category token.
- Returns:
the original token (which may have been modified), a replacement token, or None if the token should be deleted.
- filter_loop_header(tok)
Filter the given loop header token.
- Returns:
the original token (which must not have been modified), a replacement token, or None if the token should be deleted. If the header token is replaced or deleted, all of the original loop rows will also be deleted.
- get_loop_filter(tok)
Given a loop header token, potentially return a handler for each loop row token. This function is also permitted to alter the header in place (but not replace or remove it). Keywords should not be removed from the header (as that may confuse other filters) but can be replaced with null tokens.
- Returns:
a callable which will be called for each loop row token (and acts like filter_category()), or None if no filtering is needed for this loop.
- class ihm.format.ChangeValueFilter(target, old, new)
Change any token that sets a data item to old to be new.
For example, this could be used to rename certain chains, or change all residues of a certain type.
- Parameters:
old (str) – The existing value of the data item.
new (str) – The new value of the data item.
See Filter for a description of the target parameter.
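As an illustration of renaming chains, the sketch below applies a ChangeValueFilter to mmCIF text held in a string. `rename_chain` is a hypothetical helper name, and the choice of the auth_asym_id keyword is an assumption about which data item holds the chain ID in the file at hand.

```python
import io


def rename_chain(cif_text, old, new):
    """Return cif_text with the auth_asym_id value `old` changed to `new`."""
    # Imported here so that defining this helper does not require python-ihm
    import ihm.format
    # The leading '.' matches auth_asym_id in any category; a full name
    # such as '_atom_site.auth_asym_id' would restrict the change to
    # that one category
    filters = [ihm.format.ChangeValueFilter(".auth_asym_id",
                                            old=old, new=new)]
    reader = ihm.format.CifTokenReader(io.StringIO(cif_text))
    return "".join(tok.as_mmcif() for tok in reader.read_file(filters))
```

Only values equal to `old` are affected; other rows and all surrounding formatting should pass through unchanged.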
- class ihm.format.ChangeFuncValueFilter(target, func)
Change any token that sets a data item to x to be f(x).
For example, this could be used to perform a search and replace on a string, or match against a regex.
- Parameters:
func (callable) – A function that is given the existing value of the data item, the category name (e.g. _atom_site), and the keyword name (e.g. auth_seq_id), and should return the new value of the data item (perhaps unchanged).
See Filter for a description of the target parameter.
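A function suitable for the func parameter might look like the following sketch, which performs a regex-based search and replace. `strip_chain_prefix` and `strip_prefixes` are hypothetical names, and the "chain_" prefix is an invented example of something to remove.

```python
import io
import re

_PREFIX = re.compile(r"^chain_")


def strip_chain_prefix(value, category, keyword):
    """Return value without a leading 'chain_'; other values are unchanged."""
    return _PREFIX.sub("", value)


def strip_prefixes(cif_text):
    """Apply strip_chain_prefix to every auth_asym_id value in cif_text."""
    # Imported here so that strip_chain_prefix() can be used without python-ihm
    import ihm.format
    filters = [ihm.format.ChangeFuncValueFilter(".auth_asym_id",
                                                strip_chain_prefix)]
    reader = ihm.format.CifTokenReader(io.StringIO(cif_text))
    return "".join(tok.as_mmcif() for tok in reader.read_file(filters))
```

The category and keyword arguments are ignored here, but they allow one function to behave differently depending on where the value was found.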
- class ihm.format.RemoveItemFilter(target)
Remove any token from the file that sets the given data item.
See Filter for a description of the target parameter.
- class ihm.format.ChangeKeywordFilter(target, new)
Change the keyword in any applicable token to be new.
- Parameters:
new (str) – The new keyword.
See Filter for a description of the target parameter.
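Several filters can be passed to CifTokenReader.read_file() together. The sketch below combines RemoveItemFilter and ChangeKeywordFilter; `tidy_mmcif` is a hypothetical helper name, and the data items `label_alt_id` and `_struct.foo` are chosen purely for illustration.

```python
import io


def tidy_mmcif(cif_text):
    """Drop label_alt_id everywhere and rename _struct.foo to _struct.bar."""
    # Imported here so that defining this helper does not require python-ihm
    import ihm.format
    filters = [
        # No category is given, so this applies to any category
        ihm.format.RemoveItemFilter(".label_alt_id"),
        # Only the keyword changes; the category stays _struct
        ihm.format.ChangeKeywordFilter("_struct.foo", "bar"),
    ]
    reader = ihm.format.CifTokenReader(io.StringIO(cif_text))
    return "".join(tok.as_mmcif() for tok in reader.read_file(filters))
```

Because the tokenizer works below the level of the mmCIF dictionary, it does not check that the renamed keyword is valid for its category, so the result should be reviewed before use.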
- class ihm.format.ReplaceCategoryFilter(target, raw_cif=None, dumper=None, system=None)
Replace any token from the file that sets the given category.
This can also be used to completely remove a category if no replacement is given.
- Parameters:
target (str) – the mmCIF category name this filter should act on, such as _entity.
raw_cif (str) – if given, text in mmCIF format which should replace the first instance of the category.
dumper (ihm.dumper.Dumper) – if given, a dumper object that should generate mmCIF output to replace the first instance of the category.
system (ihm.System) – the System that the given dumper will work on.
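Replacing or removing a category with raw mmCIF text might be done as in the following sketch. `replace_category` is a hypothetical helper name, and the exact form expected for `raw_cif` (here, complete lines of mmCIF text) is an assumption based on the parameter description above.

```python
import io


def replace_category(cif_text, category, raw_cif=None):
    """Replace `category` in cif_text with raw_cif, or remove it if None."""
    # Imported here so that defining this helper does not require python-ihm
    import ihm.format
    filters = [ihm.format.ReplaceCategoryFilter(category, raw_cif=raw_cif)]
    reader = ihm.format.CifTokenReader(io.StringIO(cif_text))
    return "".join(tok.as_mmcif() for tok in reader.read_file(filters))
```

For example, `replace_category(text, "_struct")` would be expected to drop the `_struct` category entirely, while passing `raw_cif="_struct.title 'New title'\n"` would substitute that text for its first instance.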