Utilities

ocdskit.util.grouper(iterable, n, fillvalue=None)
class ocdskit.util.SerializableGenerator(iterable)
class ocdskit.util.JSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
default(obj)

Implement this method in a subclass so that it returns a serializable object for obj, or calls the base implementation (which raises a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, obj):
    try:
        iterable = iter(obj)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(obj)
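
As a runnable illustration of the pattern above, the following sketch puts such a default method into a subclass of the standard library's json.JSONEncoder. It does not use ocdskit.util.JSONEncoder, and the class name IterableEncoder is hypothetical.

```python
import json


class IterableEncoder(json.JSONEncoder):  # hypothetical name, for illustration only
    def default(self, obj):
        try:
            iterable = iter(obj)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Not iterable: let the base class raise the TypeError
        return super().default(obj)


# A generator is not serializable out of the box, so default() is consulted.
encoded = json.dumps({"ids": (n for n in range(3))}, cls=IterableEncoder)
print(encoded)
```

Passing the encoder through the cls argument keeps the change local to one call instead of altering how JSON is encoded everywhere.
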
ocdskit.util.iterencode(data, *, ensure_ascii=False, **kwargs)

Return a generator that yields each string representation as available.
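
The idea of yielding the encoded document piece by piece can be seen with the standard library's json.JSONEncoder.iterencode. The sketch below assumes that ocdskit.util.iterencode behaves in a similar way; it does not call it.

```python
import json

data = {"buyer": {"name": "Café"}, "ids": [1, 2]}

# iterencode() returns a generator of string fragments instead of one large string.
chunks = json.JSONEncoder(ensure_ascii=False).iterencode(data)

# Fragments can be written out as they arrive; joined, they form the full document.
text = "".join(chunks)
print(text)
```
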

ocdskit.util.json_dump(data, io, *, ensure_ascii=False, **kwargs)

Dump JSON to a file-like object.

ocdskit.util.json_dumps(data, *, ensure_ascii=False, indent=None, sort_keys=False, **kwargs)

Dump JSON to a string, and return it.
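
Both signatures default to ensure_ascii=False, unlike the standard library, which defaults to True. The effect of that setting is shown here with the standard library's json.dump and json.dumps rather than the ocdskit functions themselves.

```python
import io
import json

data = {"title": "Señalización vial"}

# With ensure_ascii=False, non-ASCII characters are written as-is.
readable = json.dumps(data, ensure_ascii=False)

# The standard library's own default escapes them instead.
escaped = json.dumps(data)

# Dumping to a file-like object follows the same rule.
buffer = io.StringIO()
json.dump(data, buffer, ensure_ascii=False)

print(readable)
print(escaped)
```
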

ocdskit.util.get_definitions_keyword(schema)

Return the schema’s definitions keyword, defaulting to $defs.

Parameters:

schema (dict) – a JSON schema

Returns:

"$defs" or "definitions"

Return type:

str
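
One way to read "defaulting to $defs" is sketched below. It assumes that only the top level of the schema is inspected; the function name definitions_keyword_sketch is hypothetical and this is not the library's code.

```python
def definitions_keyword_sketch(schema):
    """Return "definitions" if the schema uses that keyword, otherwise "$defs"."""
    return "definitions" if "definitions" in schema else "$defs"


older_style = {"definitions": {"Organization": {"type": "object"}}}
newer_style = {"$defs": {"Organization": {"type": "object"}}}

print(definitions_keyword_sketch(older_style))
print(definitions_keyword_sketch(newer_style))
```
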

ocdskit.util.get_ocds_minor_version(data)

Return the OCDS minor version of the release package, record package, release or record.

ocdskit.util.get_ocds_patch_tag(version)

Return the OCDS patch version as a git tag (like 1__1__4) for a given minor version (like 1.1).

Raises:

UnknownVersionError – if the OCDS version is not recognized
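
The mapping from a minor version to a tag can be pictured as a prefix match over known tags. In this sketch the tag list is hypothetical, the exception class is a local stand-in, and choosing the highest patch number is an assumption rather than documented behaviour.

```python
class UnknownVersionError(Exception):
    """Local stand-in for the exception named above."""


# Hypothetical tag list, for illustration only.
KNOWN_TAGS = ["1__0__3", "1__1__3", "1__1__4"]


def patch_tag_sketch(version, tags=KNOWN_TAGS):
    # "1.1" becomes the prefix "1__1__", which matches tags such as "1__1__4".
    prefix = version.replace(".", "__") + "__"
    matches = [tag for tag in tags if tag.startswith(prefix)]
    if not matches:
        raise UnknownVersionError(f"OCDS {version} is not recognized")
    # Assumption: the highest patch number is the one wanted.
    return max(matches, key=lambda tag: int(tag.rsplit("__", 1)[1]))


print(patch_tag_sketch("1.1"))
```
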

ocdskit.util.is_package(data)

Return whether the data is a release package or record package.

ocdskit.util.is_record_package(data)

Return whether the data is a record package.

A record package has a required records field. Its other required fields are shared with release packages.

ocdskit.util.is_record(data)

Return whether the data is a record.

A record has required releases and ocid fields.

ocdskit.util.is_release_package(data)

Return whether the data is a release package.

A release package has a required releases field. Its other required fields are shared with record packages. To distinguish a release package from a record, we test for the absence of the ocid field.
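
The field tests described for packages and records can be written as simple key checks. This is a minimal sketch of those rules with hypothetical function names, not the library's implementation.

```python
def looks_like_record_package(data):
    # A record package is identified by its required "records" field.
    return "records" in data


def looks_like_record(data):
    # A record has both "releases" and "ocid".
    return "releases" in data and "ocid" in data


def looks_like_release_package(data):
    # "releases" without "ocid" rules out a record.
    return "releases" in data and "ocid" not in data


def looks_like_package(data):
    return looks_like_record_package(data) or looks_like_release_package(data)


release_package = {"uri": "https://example.com/rp.json", "releases": []}
record_package = {"uri": "https://example.com/recp.json", "records": []}
record = {"ocid": "ocds-213czf-1", "releases": []}

print(looks_like_release_package(release_package))
print(looks_like_release_package(record))
```
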

ocdskit.util.is_release(data)

Return whether the data is a release (embedded or linked, individual or compiled).

ocdskit.util.is_compiled_release(data)

Return whether the data is a compiled release (embedded or linked).

ocdskit.util.is_linked_release(data, maximum_properties=3)

Return whether the data is a linked release.

A linked release has required url and date fields and an optional tag field. An embedded release has required date and tag fields (among others), and it can have a url field as an additional field.

To distinguish a linked release from an embedded release, we test for the presence of the required url field and test whether the number of fields is at most maximum_properties (three by default).
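
The heuristic can be sketched as follows. The sketch assumes the field count is compared inclusively against maximum_properties, so that a linked release carrying url, date and tag still qualifies; the function name is hypothetical.

```python
def looks_like_linked_release(data, maximum_properties=3):
    # Assumption: "url" must be present and the object must stay small.
    return "url" in data and len(data) <= maximum_properties


linked = {
    "url": "https://example.com/releases/1.json",
    "date": "2020-01-01T00:00:00Z",
    "tag": ["tender"],
}
embedded = {
    "ocid": "ocds-213czf-1",
    "id": "1",
    "date": "2020-01-01T00:00:00Z",
    "tag": ["tender"],
    "url": "https://example.com/releases/1.json",
}

print(looks_like_linked_release(linked))
print(looks_like_linked_release(embedded))
```
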

ocdskit.util.detect_format(path, root_path='', reader=open, additional_prefixes=())

Return the format of OCDS data, and whether the OCDS data is concatenated or in an array.

If the OCDS data is concatenated or in an array, we assume that all items have the same format as the first item.

Parameters:
  • path (str) – the path to a file

  • root_path (str) – the path to the OCDS data within the file

  • additional_prefixes (tuple) – additional prefixes to consider as part of an empty package

Returns:

the format, whether data is concatenated, and whether data is in an array

Return type:

tuple

Raises:

UnknownFormatError – if the format cannot be detected
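
The difference between concatenated data and data in an array can be illustrated with the standard library alone. This sketch works on a string rather than a file path, reports only the two layout flags (not the format), and is not how the library itself detects them; the function name sniff_layout is hypothetical.

```python
import json


def sniff_layout(text):
    """Return (is_concatenated, is_array) for a string of JSON text."""
    text = text.strip()
    # raw_decode() parses one JSON value and reports the index where it ended.
    value, end = json.JSONDecoder().raw_decode(text)
    is_array = isinstance(value, list)
    # Anything left after the first value means several values were concatenated.
    is_concatenated = end < len(text)
    return is_concatenated, is_array


single = '{"ocid": "a"}'
array = '[{"ocid": "a"}, {"ocid": "b"}]'
concatenated = '{"ocid": "a"}\n{"ocid": "b"}'

print(sniff_layout(single))
print(sniff_layout(array))
print(sniff_layout(concatenated))
```
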

class ocdskit.util.Format(*values)
    compiled_release = 'compiled release'
    empty_package = 'empty package'
    record = 'record'
    record_package = 'record package'
    release = 'release'
    release_package = 'release package'
    versioned_release = 'versioned release'
ocdskit.util.longest_common_subsequence(x, y)

Return the longest common subsequence of two word lists.
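
For readers unfamiliar with the term, a longest common subsequence keeps the words that appear in both lists in the same relative order, allowing gaps. Below is a sketch of the textbook dynamic-programming approach; it is not necessarily how the library computes it, and the function name lcs_sketch is hypothetical.

```python
def lcs_sketch(x, y):
    # lengths[i][j] holds the LCS length of x[:i] and y[:j].
    lengths = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, a in enumerate(x, 1):
        for j, b in enumerate(y, 1):
            if a == b:
                lengths[i][j] = lengths[i - 1][j - 1] + 1
            else:
                lengths[i][j] = max(lengths[i - 1][j], lengths[i][j - 1])

    # Walk back from the bottom-right corner to recover the words.
    result = []
    i, j = len(x), len(y)
    while i and j:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif lengths[i - 1][j] >= lengths[i][j - 1]:
            i -= 1
        else:
            j -= 1
    result.reverse()
    return result


common = lcs_sketch("the quick brown fox".split(), "the brown lazy fox".split())
print(common)
```
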