OCDS Commands¶
Optional arguments for all commands are:
--encoding ENCODING
the file encoding
--ascii
print escape sequences instead of UTF-8 characters
--pretty
pretty print output
--root-path ROOT_PATH
the path to the items to process within each input
The inputs can be concatenated JSON or JSON arrays.
Note
An error is raised if the JSON is malformed or if the --encoding is incorrect.
Handling edge cases¶
Large packages¶
If you are working with individual packages that are too large to hold in memory, use the echo command to reduce their size.
Embedded data¶
If you are working with files that embed OCDS data, use the --root-path ROOT_PATH option to indicate the “root” path to the items to process within each input. For example, if release packages are in an array under a results key, like so:
{
  "results": [
    {
      "uri": "placeholder:",
      "publisher": {"name": ""},
      "publishedDate": "9999-01-01T00:00:00Z",
      "version": "1.1",
      "releases": []
    }
  ]
}
You can run ocdskit <command> --root-path results to process the release packages. The root path, in this case, is simply the results key. OCDS Kit will read the entire results array into memory, and process each array entry.
If the results array is very large, you should run ocdskit <command> --root-path results.item instead. The root path, in this case, is the results key joined to the item literal by a period (the item literal indicates that the items to process are in an array). OCDS Kit will read each array entry into memory, instead of the entire results array.
For this next example, you can run ocdskit <command> --root-path results.item.ocdsReleasePackage:
{
  "results": [
    {
      "ocdsReleasePackage": {
        "uri": "placeholder:",
        "publisher": {"name": ""},
        "publishedDate": "9999-01-01T00:00:00Z",
        "version": "1.1",
        "releases": []
      }
    }
  ]
}
The root path, in this case, is the results key joined to the item literal, joined to the ocdsReleasePackage key.
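As a rough illustration of how a root path selects items, the sketch below walks a parsed JSON document. The function name iter_root_path is hypothetical and is not part of OCDS Kit. Unlike OCDS Kit, which can read array entries one at a time, this sketch parses the whole input first, so it only shows the path semantics and not the memory behaviour.

```python
import json


def iter_root_path(data, root_path):
    """Yield the values found at a period-separated root path.

    The "item" literal means "each entry of the array at this position".
    Hypothetical helper for illustration only.
    """
    if not root_path:
        yield data
        return
    head, _, rest = root_path.partition(".")
    if head == "item":
        # Descend into every entry of the array.
        for entry in data:
            yield from iter_root_path(entry, rest)
    else:
        # Descend into the named key.
        yield from iter_root_path(data[head], rest)


document = json.loads(
    '{"results": [{"ocdsReleasePackage": {"version": "1.1", "releases": []}}]}'
)

# Each yielded value is one embedded release package.
packages = list(iter_root_path(document, "results.item.ocdsReleasePackage"))

# Without the "item" literal, the whole array is yielded as a single value.
whole_array = list(iter_root_path(document, "results"))
```

Here, packages holds the embedded release package objects, while whole_array holds one value: the entire results array.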
detect-format¶
Reads OCDS files, and reports whether each is:
- a release package
- a record package
- a release
- a record
- a compiled release
- a versioned release
- a JSON array of one of the above
- concatenated JSON of one of the above
Mandatory positional arguments:
file
OCDS files
ocdskit detect-format tests/fixtures/realdata/release-package-1.json tests/fixtures/realdata/record-package-1.json
compile¶
Reads release packages and individual releases from standard input, merges the releases by OCID, and prints the compiled releases.
Optional arguments:
--schema SCHEMA
the URL or path of the release schema to use
--package
wrap the compiled releases in a record package
--linked-releases
if --package is set, use linked releases instead of full releases, if the input is a release package
--versioned
if --package is set, include versioned releases in the record package; otherwise, print versioned releases instead of compiled releases
--uri URI
if --package is set, set the record package’s uri to this value
--published-date PUBLISHED_DATE
if --package is set, set the record package’s publishedDate to this value
--version VERSION
if --package is set, set the record package’s version to this value
--publisher-name PUBLISHER_NAME
if --package is set, set the record package’s publisher’s name to this value
--publisher-uri PUBLISHER_URI
if --package is set, set the record package’s publisher’s uri to this value
--publisher-scheme PUBLISHER_SCHEME
if --package is set, set the record package’s publisher’s scheme to this value
--publisher-uid PUBLISHER_UID
if --package is set, set the record package’s publisher’s uid to this value
--fake
if --package is set, set the record package’s required metadata to dummy values
If --package is set, and if the --publisher-* options aren’t used, the output package will have the same publisher as the last input package.
cat tests/fixtures/realdata/release-package-1.json | ocdskit compile > out.json
For the Python API, see ocdskit.combine.merge().
Note
An error is raised if a release is missing an ocid field, or if the values of the release packages’ version fields are inconsistent.
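The grouping step and the two error conditions described above can be sketched as follows. This is only an illustration under stated assumptions: the function name group_releases_by_ocid and the exception types are hypothetical, and the actual merging of each group into a compiled release (done by ocdskit.combine.merge()) is not shown.

```python
from collections import defaultdict


def group_releases_by_ocid(packages):
    """Group the releases of several release packages by their ocid.

    Hypothetical helper: it mirrors the documented error conditions,
    but does not perform the merge itself.
    """
    versions = {package.get("version") for package in packages}
    if len(versions) > 1:
        raise ValueError(
            f"inconsistent package versions: {sorted(map(str, versions))}"
        )

    groups = defaultdict(list)
    for package in packages:
        for release in package.get("releases", []):
            if "ocid" not in release:
                raise KeyError("release is missing an ocid field")
            groups[release["ocid"]].append(release)
    return dict(groups)


packages = [
    {"version": "1.1", "releases": [{"ocid": "ocds-213czf-1", "id": "1"}]},
    {
        "version": "1.1",
        "releases": [
            {"ocid": "ocds-213czf-1", "id": "2"},
            {"ocid": "ocds-213czf-2", "id": "1"},
        ],
    },
]
grouped = group_releases_by_ocid(packages)
```

Each key of grouped is an OCID, and each value is the list of releases that would be merged into one compiled release.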
upgrade¶
Upgrades packages, records and releases from an old version of OCDS to a new version. Any data not in the old version is passed through. Note: Versioned releases within a record package are not upgraded.
OCDS 1.0 describes an organization’s name, identifier, address and contactPoint as relevant to identifying it. OCDS 1.1 moves organization data into a parties array. To upgrade from OCDS 1.0 to 1.1, we create an id for each organization, based on those identifying fields. This can result in duplicates in the parties array, if the same organization has different or missing values for identifying fields in different contexts. This can also lead to data loss if the same organization has different values for non-identifying fields in different contexts; the command prints warnings in such cases.
Note: OCDS 1.0 uses the whole-list merge strategy on the suppliers array to prepare the compiled release and versioned release, whereas OCDS 1.1 uses the identifier merge strategy. This means that you should merge first and then upgrade.
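To see why deriving an id from the identifying fields can produce duplicates, consider the sketch below. It assumes the id is a hash of the four identifying fields; the function name organization_id and the choice of SHA-256 are assumptions made for illustration, and the upgrade command may compute the id differently.

```python
import hashlib
import json

# The fields that OCDS 1.0 treats as identifying an organization.
IDENTIFYING_FIELDS = ("name", "identifier", "address", "contactPoint")


def organization_id(organization):
    """Derive a stable id from an organization's identifying fields.

    Hypothetical helper: the hashing scheme is an assumption.
    """
    identifying = {field: organization.get(field) for field in IDENTIFYING_FIELDS}
    serialized = json.dumps(identifying, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


buyer = {"name": "Acme", "identifier": {"scheme": "GB-COH", "id": "123"}}
# The same organization, but with an address filled in elsewhere in the data.
supplier = {**buyer, "address": {"region": "London"}}
# The same organization, differing only in a non-identifying field.
tenderer = {**buyer, "details": {"scale": "sme"}}

buyer_id = organization_id(buyer)
supplier_id = organization_id(supplier)
tenderer_id = organization_id(tenderer)
```

In this sketch, buyer and supplier get different ids because an identifying field differs, so they would appear twice in the parties array. buyer and tenderer get the same id, so only one set of non-identifying fields can be kept.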
Mandatory positional arguments:
versions
the colon-separated old and new versions
cat tests/fixtures/realdata/release-package-1.json | ocdskit upgrade 1.0:1.1 > out.json
For the Python API, see Upgrade.
If a release package is too large, you can upgrade its individual releases using --root-path releases.item.
Note
An error is raised if upgrading between the specified versions is not implemented.
package-records¶
Reads records from standard input, and prints one record package.
Optional positional arguments:
extension
add this extension to the package
Optional arguments:
--uri URL
set the record package’s uri to this value
--published-date PUBLISHED_DATE
set the record package’s publishedDate to this value
--version VERSION
set the record package’s version to this value
--publisher-name PUBLISHER_NAME
set the record package’s publisher’s name to this value
--publisher-uri PUBLISHER_URI
set the record package’s publisher’s uri to this value
--publisher-scheme PUBLISHER_SCHEME
set the record package’s publisher’s scheme to this value
--publisher-uid PUBLISHER_UID
set the record package’s publisher’s uid to this value
--fake
set the record package’s required metadata to dummy values
cat tests/fixtures/record_*.json | ocdskit package-records > out.json
To combine the records of several record packages into one record package, you can use the --root-path option:
cat tests/fixtures/realdata/record-package* | ocdskit package-records --root-path records.item
If --uri and --published-date are not set, the output package will be invalid. Use --fake to set placeholder values.
For the Python API, see ocdskit.combine.package_records().
package-releases¶
Reads releases from standard input, and prints one release package.
Optional positional arguments:
extension
add this extension to the package
Optional arguments:
--uri URL
set the release package’s uri to this value
--published-date PUBLISHED_DATE
set the release package’s publishedDate to this value
--version VERSION
set the release package’s version to this value
--publisher-name PUBLISHER_NAME
set the release package’s publisher’s name to this value
--publisher-uri PUBLISHER_URI
set the release package’s publisher’s uri to this value
--publisher-scheme PUBLISHER_SCHEME
set the release package’s publisher’s scheme to this value
--publisher-uid PUBLISHER_UID
set the release package’s publisher’s uid to this value
--fake
set the release package’s required metadata to dummy values
cat tests/fixtures/release_*.json | ocdskit package-releases > out.json
To convert record packages to a release package, you can use the --root-path option:
cat tests/fixtures/realdata/record-package* | ocdskit package-releases --root-path records.item.releases.item
If --uri and --published-date are not set, the output package will be invalid. Use --fake to set placeholder values.
For the Python API, see ocdskit.combine.package_releases().
combine-record-packages¶
Reads record packages from standard input, collects packages and records, and prints one record package.
If the --publisher-* options aren’t used, the output package will have the same publisher as the last input package.
Optional arguments:
--uri URL
set the record package’s uri to this value
--published-date PUBLISHED_DATE
set the record package’s publishedDate to this value
--version VERSION
set the record package’s version to this value
--publisher-name PUBLISHER_NAME
set the record package’s publisher’s name to this value
--publisher-uri PUBLISHER_URI
set the record package’s publisher’s uri to this value
--publisher-scheme PUBLISHER_SCHEME
set the record package’s publisher’s scheme to this value
--publisher-uid PUBLISHER_UID
set the record package’s publisher’s uid to this value
--fake
set the record package’s required metadata to dummy values
cat tests/fixtures/record-package_*.json | ocdskit combine-record-packages > out.json
If you need to create a single package that is too large to hold in your system’s memory, please comment on this issue.
For the Python API, see ocdskit.combine.combine_record_packages().
Note
A warning is issued if a package’s "records" field isn’t set.
combine-release-packages¶
Reads release packages from standard input, collects releases, and prints one release package.
If the --publisher-* options aren’t used, the output package will have the same publisher as the last input package.
Optional arguments:
--uri URL
set the release package’s uri to this value
--published-date PUBLISHED_DATE
set the release package’s publishedDate to this value
--version VERSION
set the release package’s version to this value
--publisher-name PUBLISHER_NAME
set the release package’s publisher’s name to this value
--publisher-uri PUBLISHER_URI
set the release package’s publisher’s uri to this value
--publisher-scheme PUBLISHER_SCHEME
set the release package’s publisher’s scheme to this value
--publisher-uid PUBLISHER_UID
set the release package’s publisher’s uid to this value
--fake
set the release package’s required metadata to dummy values
cat tests/fixtures/release-package_*.json | ocdskit combine-release-packages > out.json
If you need to create a single package that is too large to hold in your system’s memory, please comment on this issue.
For the Python API, see ocdskit.combine.combine_release_packages().
Note
A warning is issued if a package’s "releases" field isn’t set.
split-record-packages¶
Reads record packages from standard input, and prints smaller record packages for each.
Mandatory positional arguments:
size
the number of records per package
cat tests/fixtures/realdata/record-package-1-2.json | ocdskit split-record-packages 2 | split -l 1 -a 4
The split command will write files named xaaaa, xaaab, xaaac, etc. Don’t combine the OCDS Kit --pretty option with the split command.
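The splitting step, and the reason --pretty must not be combined with split -l 1, can be sketched as follows. The function name split_package is hypothetical, and the sketch assumes that each smaller package is printed as one compact JSON document per line, which is what split -l 1 relies on.

```python
import json


def split_package(package, size, key="records"):
    """Yield copies of a package, each with at most `size` entries under `key`.

    Hypothetical helper: the package metadata is copied to every smaller
    package, and only the array under `key` is divided.
    """
    items = package.get(key, [])
    for start in range(0, len(items), size):
        smaller = dict(package)
        smaller[key] = items[start:start + size]
        yield smaller


package = {
    "uri": "placeholder:",
    "version": "1.1",
    "records": [{"ocid": "a"}, {"ocid": "b"}, {"ocid": "c"}],
}

# One compact JSON document per line, so a line-based splitter can
# write each package to its own file.
lines = [json.dumps(smaller) for smaller in split_package(package, 2)]
```

Pretty-printed output would spread each package over many lines, so a line-based splitter would cut packages into fragments that are not valid JSON.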
split-release-packages¶
Reads release packages from standard input, and prints smaller release packages for each.
Mandatory positional arguments:
size
the number of releases per package
cat tests/fixtures/realdata/release-package-1-2.json | ocdskit split-release-packages 2 | split -l 1 -a 4
The split command will write files named xaaaa, xaaab, xaaac, etc. Don’t combine the OCDS Kit --pretty option with the split command.
tabulate¶
Reads packages, records or releases from standard input and stores releases in a relational database.
Mandatory positional arguments:
database_url
a SQLAlchemy database URL
Optional arguments:
--drop
drop all tables before loading
--schema SCHEMA
the release-schema.json to use
cat release_package.json | ocdskit tabulate sqlite:///data.db
For the format of database_url, see the SQLAlchemy documentation.
Database structure¶
The database structure follows the specified schema. By default, the latest OCDS release schema is used.
The primary table is the releases table. Secondary tables are created for each array in the schema: for example, there is an awards table and an awards_suppliers table.
Naming conventions¶
Table names are based on the JSON paths to the arrays, separated by underscores. For example, the data in the /contracts/*/implementation/documents array is stored in the contracts_implementation_documents table.
Column names are based on the JSON paths to the fields, relative to their containing arrays, separated by underscores. For example, the data in the /parties/*/address/region field is stored in the address_region column of the parties table.
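The naming conventions can be expressed as a small sketch. The function names table_name and table_and_column are hypothetical. Placing fields that are not inside any array in the releases table is an assumption based on releases being the primary table.

```python
def table_name(array_path):
    """Return the table name for the JSON path to an array."""
    parts = [part for part in array_path.split("/") if part and part != "*"]
    return "_".join(parts)


def table_and_column(field_path):
    """Return the (table, column) pair for the JSON path to a field.

    The column name is relative to the innermost containing array.
    """
    parts = field_path.strip("/").split("/")
    if "*" in parts:
        # Index of the last "*", which marks the innermost array.
        last = len(parts) - 1 - parts[::-1].index("*")
        table = "_".join(part for part in parts[:last] if part != "*")
        column = "_".join(parts[last + 1:])
    else:
        # Assumption: fields outside any array belong to the primary table.
        table = "releases"
        column = "_".join(parts)
    return table, column


documents_table = table_name("/contracts/*/implementation/documents")
region = table_and_column("/parties/*/address/region")
supplier_name = table_and_column("/awards/*/suppliers/*/name")
```

With these inputs, documents_table is contracts_implementation_documents, region is the address_region column of the parties table, and supplier_name is the name column of the awards_suppliers table.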
Foreign keys¶
Foreign keys are used to JOIN rows that relate to the same object across tables.
Every table has an ocid column to identify the contracting process, and a release_id column to identify the release.
Secondary tables have additional foreign keys to identify a specific object in a given array. The column names follow the pattern {singular}_id, where singular is all but the last character of the table name. For example:
- The awards table has an award_id column to identify the award object by its /awards/*/id value.
- The awards_suppliers table has an award_id column to identify the award object by its /awards/*/id value, and a supplier_id column to identify the supplier by its /awards/*/suppliers/*/id value.
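As an illustration of joining on these keys, the sketch below builds two small tables by hand in an in-memory SQLite database and joins them. The key columns follow the description above; the title and name columns and all sample values are assumptions, and a database produced by the tabulate command will have more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hand-built stand-ins for two of the tables described above.
    CREATE TABLE awards (
        ocid TEXT, release_id TEXT, award_id TEXT, title TEXT
    );
    CREATE TABLE awards_suppliers (
        ocid TEXT, release_id TEXT, award_id TEXT, supplier_id TEXT, name TEXT
    );
    INSERT INTO awards VALUES ('ocds-213czf-1', 'r1', 'a1', 'Road works');
    INSERT INTO awards_suppliers VALUES ('ocds-213czf-1', 'r1', 'a1', 's1', 'Acme');
    INSERT INTO awards_suppliers VALUES ('ocds-213czf-1', 'r1', 'a1', 's2', 'Globex');
    """
)

# Join each supplier to its award within the same release of the same process.
rows = conn.execute(
    """
    SELECT a.title, s.name
    FROM awards AS a
    JOIN awards_suppliers AS s
      ON s.ocid = a.ocid
     AND s.release_id = a.release_id
     AND s.award_id = a.award_id
    ORDER BY s.supplier_id
    """
).fetchall()
conn.close()
```

Joining on ocid, release_id and award_id together keeps suppliers from being matched to an award with the same id in a different release or contracting process.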
Additional fields¶
Fields in the JSON data that aren’t described by the provided schema are treated as follows:
- If the field is an array, it is ignored and a warning is reported, for example: table tender_participationFees does not exist.
- Otherwise, it is stored in a JSON object in an extras column. For example, an /awards/*/exchangeRate value is stored in the extras column of the awards table as a JSON object like {"exchangeRate": 1.23}.
To limit the number of fields that are stored in the extras column, extend the release schema with all relevant extensions, and then use the --schema option.
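The treatment of additional fields can be sketched for a single flat object. The function name split_known_and_extra and the known_columns argument are hypothetical; the sketch assumes the set of columns described by the schema is already known, and it does not handle nested objects.

```python
import json


def split_known_and_extra(obj, known_columns):
    """Split an object into a table row and a list of skipped array fields.

    Fields described by the schema become columns, additional arrays are
    skipped, and other additional fields are collected in `extras`.
    Hypothetical helper for illustration only.
    """
    row, extras, skipped = {}, {}, []
    for key, value in obj.items():
        if key in known_columns:
            row[key] = value
        elif isinstance(value, list):
            # Additional arrays have no table, so they are ignored.
            skipped.append(key)
        else:
            extras[key] = value
    row["extras"] = json.dumps(extras)
    return row, skipped


award = {"id": "a1", "exchangeRate": 1.23, "participationFees": [{"id": "1"}]}
row, skipped = split_known_and_extra(award, known_columns={"id"})
```

Adding exchangeRate to known_columns, which corresponds to extending the schema, would move it out of extras and into its own column.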
Alternative approaches¶
Kingfisher Process stores OCDS releases as JSON blobs in a single column.
Flatten Tool flattens JSON data into CSV and Excel files and supports additional fields, additional arrays and many other ways to customize the output.
validate¶
Reads JSON data from standard input, validates it against the schema, and prints errors.
Optional arguments:
--schema SCHEMA
the URL or path of the schema to validate against
--check-urls
check the HTTP status code if “format”: “uri”
--timeout TIMEOUT
timeout (seconds) to GET a URL
--verbose
print items without validation errors
cat tests/fixtures/* | ocdskit validate
Using a remote schema file:
cat tests/fixtures/* | ocdskit validate --schema https://standard.open-contracting.org/latest/en/release-package-schema.json
Using a local schema file:
cat tests/fixtures/* | ocdskit validate --schema file://path/to/schema.json
echo¶
Repeats the input, applying --encoding, --ascii, --pretty and --root-path, and using the UTF-8 encoding.
You can use this command to reformat data:
Use UTF-8 encoding:
cat iso-8859-1.json | ocdskit --encoding iso-8859-1 echo > utf-8.json
Use ASCII characters only:
cat unicode.json | ocdskit --ascii echo > ascii.json
Use UTF-8 characters where possible:
cat ascii.json | ocdskit echo > unicode.json
Pretty print:
cat compact.json | ocdskit --pretty echo > pretty.json
Make compact:
cat pretty.json | ocdskit echo > compact.json
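For readers more familiar with Python, the effect of these options resembles the following calls to the standard json module. This is an analogy only: it is not taken from OCDS Kit's implementation, and the exact separators used for compact output are an assumption.

```python
import json

data = {"title": "Adquisición"}

# Similar in effect to --ascii: non-ASCII characters become escape sequences.
ascii_output = json.dumps(data, ensure_ascii=True)

# Similar in effect to the default: UTF-8 characters are kept as-is.
utf8_output = json.dumps(data, ensure_ascii=False)

# Similar in effect to --pretty: indented, one member per line.
pretty_output = json.dumps(data, ensure_ascii=False, indent=2)

# Compact output, with no whitespace after separators (an assumption).
compact_output = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
```

All four strings decode to the same data; only the byte-level representation differs.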
You can also use this command to extract releases from release packages, and records from record packages. This is especially useful if a single package is too large to hold in memory.
Split a large record package into smaller packages of 100 records each:
cat large-record-package.json | ocdskit echo --root-path records.item | ocdskit package-records --size 100
Split a large release package into smaller packages of 1,000 releases each:
cat large-release-package.json | ocdskit echo --root-path releases.item | ocdskit package-releases --size 1000
Note that the package metadata from the large package won’t be retained in the smaller packages; you can use the optional arguments of the package-records and package-releases commands to set the package metadata.
If the single package is small enough to hold in memory, you can use the split-record-packages and split-release-packages commands instead, which retain the package metadata.
convert-to-oc4ids¶
Reads individual releases or release packages from standard input, and prints a single project conforming to the Open Contracting for Infrastructure Data Standards (OC4IDS). It assumes all inputs belong to the same project.
The logic for the mappings between OCDS and OC4IDS fields is documented here.
Optional arguments:
--project-id PROJECT_ID
set the project’s id to this value
--all-transforms
run all optional transforms
--transforms OPTIONS
comma-separated list of optional transforms to run
--package
wrap the project in a project package
--uri URI
if --package is set, set the project package’s uri to this value
--published-date PUBLISHED_DATE
if --package is set, set the project package’s publishedDate to this value
--version VERSION
if --package is set, set the project package’s version to this value
--publisher-name PUBLISHER_NAME
if --package is set, set the project package’s publisher’s name to this value
--publisher-uri PUBLISHER_URI
if --package is set, set the project package’s publisher’s uri to this value
--publisher-scheme PUBLISHER_SCHEME
if --package is set, set the project package’s publisher’s scheme to this value
--publisher-uid PUBLISHER_UID
if --package is set, set the project package’s publisher’s uid to this value
--fake
if --package is set, set the project package’s required metadata to dummy values
cat releases.json | ocdskit convert-to-oc4ids > out.json
Transforms¶
The transforms that are run are described here.
- additional_classifications, description, sector, title: populate top-level fields with their equivalents from planning.project
- administrative_entity, public_authority_role, procuring_entity, suppliers: populate the parties field according to the party role
- budget: populates budget.amount with its equivalent
- budget_approval, environmental_impact, land_and_settlement_impact and project_scope: populate the documents field from planning.documents according to the documentType
- contracting_process_setup: sets up the contractingProcesses array of objects with id, summary, releases and embeddedReleases. Some of the other transforms depend on this, so it is run first
- contract_period: populates the summary.contractPeriod field with appropriate values from awards or tender
- contract_price: populates the summary.contractValue field with the sum of all awards.value fields where the currency is the same
- cost_estimate: populates the summary.tender.costEstimate field with the appropriate tender.value
- contract_process_description: populates the summary.description field from appropriate values in contracts, awards or tender
- contract_status: populates the summary.status field using the contractingProcessStatus codelist
- contract_title: populates summary.title from the title field in awards, contracts or tender
- final_audit: populates the documents field from contracts.implementation.documents according to the documentType
- funding_sources: updates parties with organizations having funder in their roles or from planning.budgetBreakdown.sourceParty
- location: populates the locations field with an array of location objects from planning.project.locations
- procurement_process: populates the summary.tender.procurementMethod and summary.tender.procurementMethodDetails fields with their equivalents from tender
- purpose: populates the purpose field from planning.rationale
Optional transforms¶
Some transforms are not run automatically, but only if set. The following transforms are run if they are listed using the --transforms argument (as part of a comma-separated list) or if --all-transforms is passed.
- buyer_role: updates the parties field with parties that have buyer in their roles
- description_tender: populates the description field from tender.description if no other is available
- location_from_items: populates the locations field from deliveryLocation or deliveryAddress in tender.items if no other is available
- project_scope_summary: updates summary.tender with items and milestones from tender
- purpose_needs_assessment: populates the documents field from planning.documents according to the documentType needsAssessment
- title_from_tender: populates the title field from tender.title if no other is available
Transformation Notes¶
Most transforms follow the logic in the mapping documentation. However, there is some room for interpretation in some of the mappings, so here are some notes about these interpretations.
Differing text across multiple contracting processes¶
planning/project/title, planning/project/description (planning and budget extension):
If the contracting processes contradict each other (for example, one states a different title from another), a warning is raised and the field is ignored. If all contracting processes that have the field agree, the value is used.
tender/title, tender/description, planning/rationale:
If there are multiple contracting processes, we concatenate the strings and put each ocid in angle brackets before its string, like:
<someocid> a tender description <anotherocid> another description
If there is only one contracting process, the ocid part is omitted.
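The concatenation rule can be sketched as follows. The function name combine_text is hypothetical, as is the choice to skip contracting processes whose value is empty.

```python
def combine_text(values_by_ocid):
    """Combine one text field from several contracting processes.

    `values_by_ocid` maps each ocid to that process's value for the field.
    Hypothetical helper for illustration only.
    """
    # Assumption: processes without a value for the field are skipped.
    values = {ocid: text for ocid, text in values_by_ocid.items() if text}
    if len(values) == 1:
        # A single contracting process: the ocid part is omitted.
        return next(iter(values.values()))
    return " ".join(f"<{ocid}> {text}" for ocid, text in values.items())


combined = combine_text(
    {"someocid": "a tender description", "anotherocid": "another description"}
)
single = combine_text({"someocid": "a tender description"})
```

Here, combined reproduces the example string above, and single is the description on its own.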
Parties ID across multiple contracting processes¶
When parties/id values from different contracting processes conflict, or when the same party appears in multiple contracting processes, we need to identify which parties are in fact the same.
The transforms use the following logic to match parties:
- If all parties/id values are unique across contracting processes, do nothing and add all parties to the project.
- If there are conflicting parties/id values, look at the identifier field. If it has a scheme and an id, make an id of the form somescheme-someid and use it to match parties across processes. If the matched parties have different roles, add the roles to the same party. Use the other fields from the first party found with this id.
- If there is no identifier, generate a new auto-incremented number and use it as the id. This means the original IDs are replaced and are lost in the mapping.
- If there is no identifier and all fields apart from roles and id are the same across parties, treat them as a single party, combine the roles and use a single generated id.
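The matching rules for the case of conflicting IDs can be sketched as follows. The function name match_parties is hypothetical, the generated ids are assumed to be strings counting from 1, and the sketch assumes it is only called once a conflict has been detected.

```python
import json


def match_parties(parties):
    """Match parties across contracting processes when their ids conflict.

    Hypothetical helper: a simplified reading of the rules above.
    """
    matched = {}
    counter = 0
    for party in parties:
        identifier = party.get("identifier") or {}
        if identifier.get("scheme") and identifier.get("id"):
            # Match on the identifier, in the form somescheme-someid.
            new_id = f"{identifier['scheme']}-{identifier['id']}"
            key = ("identifier", new_id)
        else:
            # No identifier: match on all fields apart from id and roles.
            other = {k: v for k, v in party.items() if k not in ("id", "roles")}
            key = ("fields", json.dumps(other, sort_keys=True))
            new_id = None

        if key in matched:
            # Same party seen before: keep its fields and add any new roles.
            roles = matched[key]["roles"]
            for role in party.get("roles", []):
                if role not in roles:
                    roles.append(role)
            continue

        if new_id is None:
            # Assumption: generated ids are auto-incremented strings.
            counter += 1
            new_id = str(counter)
        matched[key] = {**party, "id": new_id, "roles": list(party.get("roles", []))}
    return list(matched.values())


parties = [
    {"id": "1", "name": "Acme", "identifier": {"scheme": "GB-COH", "id": "123"}, "roles": ["buyer"]},
    {"id": "1", "name": "Acme Ltd", "identifier": {"scheme": "GB-COH", "id": "123"}, "roles": ["funder"]},
    {"id": "2", "name": "Globex", "roles": ["supplier"]},
    {"id": "1", "name": "Globex", "roles": ["tenderer"]},
    {"id": "3", "name": "Initech", "roles": ["supplier"]},
]
project_parties = match_parties(parties)
```

In this example, the two Acme entries collapse into one party with both roles and the name of the first entry, the two Globex entries collapse into one party with a generated id, and Initech gets the next generated id.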
Document ID across multiple contracting processes¶
If all project/documents/id values are unique, keep the ids the same. Otherwise, create a new auto-incremented id for all documents. This means the original documents/id values are lost.
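A minimal sketch of this rule follows. The function name renumber_documents is hypothetical, and the generated ids are assumed to be strings counting from 1.

```python
def renumber_documents(documents):
    """Keep document ids if they are all unique, else renumber every document.

    Hypothetical helper for illustration only.
    """
    ids = [document.get("id") for document in documents]
    if len(set(ids)) == len(ids):
        return documents
    # Assumption: new ids are auto-incremented strings starting at "1".
    return [
        {**document, "id": str(number)}
        for number, document in enumerate(documents, start=1)
    ]


unique = renumber_documents([{"id": "a"}, {"id": "b"}])
renumbered = renumber_documents([{"id": "a", "title": "Plan"}, {"id": "a", "title": "Audit"}])
```

Note that a single duplicate causes every document to be renumbered, not only the conflicting ones.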
Project Sector¶
Sectors are gathered from planning/project/sector. The transform collects all unique scheme and id pairs, formats them as <scheme>-<id>, and adds them to the sector array. This could mean that the sectors generated are not in the Project Sector Codelist.
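The gathering of sectors can be sketched as follows. The function name gather_sectors is hypothetical, and each input is assumed to be an object with scheme and id members.

```python
def gather_sectors(sectors):
    """Return the unique <scheme>-<id> codes, in order of first appearance.

    Hypothetical helper for illustration only.
    """
    codes = []
    for sector in sectors:
        code = f"{sector['scheme']}-{sector['id']}"
        if code not in codes:
            codes.append(code)
    return codes


codes = gather_sectors(
    [
        {"scheme": "COFOG", "id": "04.5"},
        {"scheme": "COFOG", "id": "04.5"},
        {"scheme": "local", "id": "roads"},
    ]
)
```

Because the codes are built from whatever scheme and id the publisher used, nothing guarantees that they appear in the Project Sector Codelist.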
Release Links¶
contractingProcesses/releases within OC4IDS links to releases via a URL. This URL is generated if OCDS release packages are supplied and a uri is in the package data. If this is not the case, the transform adds an additional field, contractingProcesses/embeddedReleases, which contains all supplied releases in full.
Project Scope Summary¶
If --all-transforms is set or if project_scope_summary is included in --transforms, the transform copies all tender/items and tender/milestones to contractingProcess/tender. This gives the output enough information to infer project scope.