API Reference ============= .. toctree:: :maxdepth: 2 api/combine api/upgrade api/mapping_sheet api/packager api/schema api/util api/cli api/exceptions Working with streams -------------------- A naive program buffers all inputs into memory before writing any outputs. OCDS files can be very large, and loading them into memory can exhaust all available memory. The :doc:`command-line interface` therefore reads inputs and writes outputs progressively or one-at-a-time (that is, it "streams"), as much as possible. Streaming writes outputs faster and requires less memory than buffering. Output ~~~~~~ Several library methods return dictionaries with generators as values, which can't be serialized using the ``json`` module without extra work. Use the :func:`ocdskit.util.json_dumps`, :func:`ocdskit.util.json_dump` and :func:`ocdskit.util.iterencode` methods instead. Input ~~~~~ The command-line interface uses `ijson `__ to iteratively parse the JSON inputs with a read buffer of 64 kB. To start, this uses the same amount of memory as ``json.load(f)``: .. code-block:: python import ijson with open(filename) as f: for item in ijson.items(f, ''): # do stuff If you are working with release packages or record packages and only need the releases or records, set the ``prefix`` argument (:code:`''` above) as described in `ijson's documentation `__. Instead of loading the entire package into memory, this instead loads each release or record one-at-a-time. For example: .. code-block:: python for item in ijson.items(f, 'releases.item'): .. code-block:: python for item in ijson.items(f, 'records.item'): The ``prefix`` argument is also relevant if you are working with files that :ref:`embed OCDS data`. For example: .. code-block:: python for item in ijson.items(f, 'results.item'): If you are parsing `concatenated JSON `__, add :code:`multiple_values=True`. For example: .. code-block:: python for item in ijson.items(f, '', multiple_values=True):