The ihm.format Python module

Utility classes to handle CIF format.

This module provides classes to read in and write out mmCIF files. It is only concerned with handling syntactically correct CIF - it does not know the set of tables or the mapping to ihm objects. For that, see ihm.dumper for writing and ihm.reader for reading.

See also the stream parser example and the token reader example.

class ihm.format.CifWriter(fh)[source]

Write information to a CIF file. The constructor takes a single argument - a Python file-like object to write to - and provides methods to write Python objects to that file. Most simple Python types are supported (string, float, bool, int). The Python bool type is mapped to CIF strings ‘NO’ and ‘YES’. Floats are always represented with 3 decimal places (or in scientific notation with 3 digits of precision if smaller than 1e-3); if a different amount of precision is desired, convert the float to a string first.
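
The precision note above amounts to formatting the float yourself and handing the resulting string to write. A minimal sketch in plain Python follows; format_float is a hypothetical helper for illustration and is not part of ihm.format.

```python
def format_float(value, digits=6):
    """Return value as fixed-point text with `digits` decimal places.

    Hypothetical helper, not part of ihm.format.
    """
    # "%.*f" takes the precision from the argument tuple.
    return "%.*f" % (digits, value)


coordinate = 12.3456789
as_text = format_float(coordinate, 5)
# as_text (a str) can now be passed to write() in place of the float,
# so the writer's default 3-decimal float formatting is not applied.
```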

category(category)[source]

Return a context manager to write a CIF category. A CIF category is a simple list of key:value pairs.

Parameters:

category (str) – the name of the category (e.g. “_struct_conf_type”).

Returns:

an object with a single method write which takes keyword arguments.

For example:

with writer.category("_struct_conf_type") as l:
    l.write(id='HELX_P', criteria=writer.unknown)

loop(category, keys)[source]

Return a context manager to write a CIF loop.

Parameters:
  • category (str) – the name of the category (e.g. “_struct_conf”)

  • keys (list) – the field keys in that category

Returns:

an object with a single method write which takes keyword arguments; this can be called any number of times to add entries to the loop. Any field keys in keys that are not provided as arguments to write, or values that are the Python value None, will get the CIF omitted value (‘.’), while arguments to write that are not present in keys will be ignored.

For example:

with writer.loop("_struct_conf", ["id", "conf_type_id"]) as l:
    for i in range(5):
        l.write(id='HELX_P1%d' % i, conf_type_id='HELX_P')
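
The rules for missing, None, and extra arguments described above can be pictured with a small stand-alone sketch. This is not the library's code: loop_row and OMITTED are hypothetical names that only mimic the documented behavior for a single row.

```python
OMITTED = "."  # the CIF omitted value, as described above


def loop_row(keys, **fields):
    """Return the values for one loop row, in the order given by keys.

    Keys that are missing from fields, or whose value is None, become the
    omitted value; fields that are not listed in keys are ignored.
    """
    return [OMITTED if fields.get(key) is None else fields[key]
            for key in keys]


row = loop_row(["id", "conf_type_id", "details"],
               id="HELX_P10",        # kept as given
               conf_type_id=None,    # None becomes "."
               beg_label="unused")   # not in keys, so ignored
```

Here "details" is never supplied, so it is also filled with the omitted value.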

start_block(name)[source]

Start a new data block in the file with the given name.

write_comment(comment)[source]

Write a simple comment to the CIF file. The comment will be wrapped if necessary for readability. See _set_line_wrap().

class ihm.format.CifReader(fh, category_handler, unknown_category_handler=None, unknown_keyword_handler=None)[source]

Class to read an mmCIF file and extract some or all of its data.

Use read_file() to actually read the file.

See also CifTokenReader for a class that operates on the lower-level structure of an mmCIF file, preserving data such as comments and whitespace.

Parameters:
  • fh (file) – Open handle to the mmCIF file

  • category_handler (dict) – A dict to handle data extracted from the file. Keys are category names (e.g. “_entry”) and values are objects that have a __call__ method and not_in_file, omitted, and unknown attributes. The names of the arguments to this __call__ method are mmCIF keywords that are extracted from the file (for the keywords tr_vector[N] and rot_matrix[N][M] simply omit the [ and ] characters, since these are not valid for Python identifiers). The object will be called with the data from the file as a set of strings, or not_in_file, omitted or unknown for any keyword that is not present in the file, the mmCIF omitted value (.), or mmCIF unknown value (?) respectively. (mmCIF keywords are case insensitive, so this class always treats them as lowercase regardless of the file contents.)

  • unknown_category_handler – A callable (or None) that is called for each category in the file that isn’t handled; it is given two arguments: the name of the category, and the line in the file at which the category was encountered (if known, otherwise None).

  • unknown_keyword_handler – A callable (or None) that is called for each keyword in the file that isn’t handled (within a category that is handled); it is given three arguments: the names of the category and keyword, and the line in the file at which the keyword was encountered (if known, otherwise None).
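
The handler protocol described in the parameters above can be sketched without reading a file. The class and function names below (EntryHandler, note_unknown_category) are hypothetical, and the placeholder values chosen for not_in_file, omitted, and unknown are assumptions; the handler is called by hand here only to show the shape of the data it would receive.

```python
class EntryHandler:
    """Collect values of the id keyword in the _entry category."""

    # Placeholders handed to __call__ in place of real data
    # (the particular values are an assumption for this sketch).
    not_in_file = None   # keyword absent from the file
    omitted = None       # mmCIF omitted value (.)
    unknown = "?"        # mmCIF unknown value (?)

    def __init__(self):
        self.ids = []

    # The argument name is the lowercase mmCIF keyword to extract.
    def __call__(self, id):
        self.ids.append(id)


unknown_categories = []


def note_unknown_category(category, line):
    """Two-argument callable suitable as an unknown_category_handler."""
    unknown_categories.append((category, line))


handler = EntryHandler()
category_handler = {"_entry": handler}

# Called by hand to imitate the reader:
category_handler["_entry"](id="model")          # value present in the file
category_handler["_entry"](id=handler.unknown)  # value given as ?
note_unknown_category("_struct", None)          # line number not known
```

A dict like category_handler and a callable like note_unknown_category are what would be passed to the CifReader constructor.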

read_file()[source]

Read the file and extract data. Category handlers will be called as data becomes available - for loop_ constructs, this will be once for each row in the loop; for categories (e.g. _entry.id model), this will be once at the very end of the file.

If the C-accelerated _format module is available, then it is used instead of the (much slower) Python tokenizer.

CifParserError will be raised if the file cannot be parsed.

Returns:

True iff more data blocks are available to be read.
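
Because of this return value, a file with several data blocks can be read by calling read_file() repeatedly. A minimal sketch, assuming each call consumes one data block; read_all_blocks is a hypothetical helper, and reader is any object that behaves like CifReader.

```python
def read_all_blocks(reader):
    """Call reader.read_file() until no data blocks remain.

    Returns the number of read_file() calls made. Category handlers
    attached to the reader are invoked as a side effect of each call.
    """
    calls = 0
    more_data = True
    while more_data:
        more_data = reader.read_file()
        calls += 1
    return calls
```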

class ihm.format.CifTokenReader(fh)[source]

Read an mmCIF file and break it into tokens.

Unlike CifReader which extracts selected data from an mmCIF file, this class operates on the file at a lower level, splitting it into tokens, and preserving data such as comments and whitespace. This can be used for various housekeeping tasks directly on an mmCIF file, such as changing chain IDs or renaming categories or data items.

Use read_file() to actually read the file.

Parameters:

fh (file) – Open handle to the mmCIF file

read_file(filters=None)[source]

Read the file and yield tokens and/or token groups. The exact type of the tokens is subject to change and is not currently documented; however, each token or group object has an as_mmcif method which returns the corresponding text in mmCIF format. Thus, the file can be reconstructed by concatenating the result of as_mmcif for all tokens.

CifParserError will be raised if the file cannot be parsed.

Parameters:

filters (sequence of Filter) – if a list of Filter objects is provided, the read tokens will be modified or removed by each of these filters before being returned.

Returns:

tokens and/or token groups.
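
The reconstruction described above can be sketched as a short copy loop. copy_mmcif is a hypothetical helper; it relies only on the documented read_file(filters) generator and the as_mmcif method of each token.

```python
def copy_mmcif(token_reader, out_fh, filters=None):
    """Write every token yielded by token_reader to out_fh.

    token_reader is expected to behave like CifTokenReader. Returns the
    number of tokens (or token groups) written.
    """
    count = 0
    for token in token_reader.read_file(filters):
        # Each token carries its own mmCIF text, including whitespace
        # and comments, so plain concatenation rebuilds the file.
        out_fh.write(token.as_mmcif())
        count += 1
    return count
```

With a real file, the reader would be an ihm.format.CifTokenReader wrapped around an open handle, and filters could be a list such as [ihm.format.ChangeValueFilter('.auth_asym_id', old='A', new='B')]; that target and those chain IDs are illustrative only.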

class ihm.format.Filter(target)[source]

Base class for filters used by CifTokenReader.read_file().

Typically, a subclass such as ChangeValueFilter is used when reading an mmCIF file.

Parameters:

target (str) – the mmCIF data item this filter should act on. It can be the full name of the data item (including category) such as _entity.type; or just the attribute or keyword name such as .type_symbol which would match any category (e.g. _atom_site.type_symbol).
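
The two forms of target can be illustrated with a stand-alone sketch. This is not the library's implementation: split_target and target_matches are hypothetical names, and lowercasing is an assumption based on mmCIF keywords being case insensitive.

```python
def split_target(target):
    """Split a filter target into (category, keyword).

    category is None when the target starts with ".", meaning the
    keyword should match in any category.
    """
    category, _, keyword = target.lower().rpartition(".")
    return (category or None), keyword


def target_matches(target, category, keyword):
    """Return True if the data item category.keyword matches target."""
    want_category, want_keyword = split_target(target)
    if want_category is not None and want_category != category.lower():
        return False
    return want_keyword == keyword.lower()


full = split_target("_entity.type")     # category and keyword
partial = split_target(".type_symbol")  # keyword only, any category
```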

filter_category(tok)[source]

Filter the given category token.

Returns:

the original token (which may have been modified), a replacement token, or None if the token should be deleted.

get_loop_filter(tok)[source]

Given a loop header token, potentially return a handler for each loop row token. This function is also permitted to alter the header in place (but not replace or remove it). Keywords should not be removed from the header (as that may confuse other filters) but can be replaced with null tokens.

Returns:

a callable which will be called for each loop row token (and acts like filter_category()), or None if no filtering is needed for this loop.

match_token_category(tok)[source]

Return true iff the given token matches the target’s category.

match_token_keyword(tok)[source]

Return true iff the given token matches the target’s category and keyword.

class ihm.format.ChangeValueFilter(target, old, new)[source]

Change any token that sets a data item to old to be new.

For example, this could be used to rename certain chains, or change all residues of a certain type.

Parameters:
  • old (str) – The existing value of the data item.

  • new (str) – The new value of the data item.

See Filter for a description of the target parameter.

class ihm.format.RemoveItemFilter(target)[source]

Remove any token from the file that sets the given data item.

See Filter for a description of the target parameter.

class ihm.format.ChangeKeywordFilter(target, new)[source]

Change the keyword in any applicable token to be new.

Parameters:

new (str) – The new keyword.

See Filter for a description of the target parameter.

exception ihm.format.CifParserError[source]

Exception raised for mmCIF files with invalid formatting.