Config repository automated data tests
Every push to the config repository triggers .github/workflows/test.yml, which runs make test. This is the only workflow that runs on every push; everything else in .github/workflows is manually triggered (workflow_dispatch), scheduled (cron), or dispatched from another system.
make test runs three pytest suites (test-unit, test-integration, test-acceptance). This page covers test-acceptance — pytest tests/acceptance/test_config_dataset.py — as these are the tests that validate the config data itself (the CSV files under pipeline/ and collection/). The unit and integration suites test the add-data and batch-assign Python scripts rather than config data, and aren’t covered here.
If a test fails, the PR/branch will show a failing check in GitHub. The assertion message lists exactly which rows/entities failed and why, with clickable links back to the offending line in the CSV (when run in CI on a branch).
NOTE!
These tests only check the shape and internal consistency of the config data (datatypes, uniqueness, ranges, cross-file consistency). They do not check whether the data is semantically correct against a live upstream source — see Configure and run expectations for that.
How the tests work
Each test loads a set of rules and runs them through CsvCheckpoint, from digital_land.expectations.checkpoints.csv. Each rule names an operation (a function in digital_land.expectations.operations.csv) plus the parameters for that operation. If any rule fails, the whole test fails and the assertion message includes every failing rule’s details.
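The rule-driven pattern can be illustrated with a minimal, self-contained sketch. It does not use the real CsvCheckpoint API: the rule dictionary layout, the `OPERATIONS` registry, the two operations, and the sample rules and data are all hypothetical, chosen only to show how a rule names an operation plus its parameters and how every failure is collected before the test asserts.

```python
import csv
import io


def check_unique(rows, field):
    """Hypothetical operation: fail if any value of `field` repeats."""
    seen, duplicates = set(), []
    for row in rows:
        if row[field] in seen:
            duplicates.append(row[field])
        seen.add(row[field])
    return not duplicates, {"duplicates": duplicates}


def check_not_blank(rows, field):
    """Hypothetical operation: fail if `field` is empty on any row."""
    # Line numbers start at 2 because line 1 is the header.
    blank_lines = [n for n, row in enumerate(rows, start=2) if not row[field]]
    return not blank_lines, {"blank_lines": blank_lines}


# Hypothetical registry mapping operation names to functions.
OPERATIONS = {"check_unique": check_unique, "check_not_blank": check_not_blank}


def run_rules(csv_text, rules):
    """Run every rule and return the details of each rule that failed."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    failures = []
    for rule in rules:
        operation = OPERATIONS[rule["operation"]]
        passed, details = operation(rows, **rule["parameters"])
        if not passed:
            failures.append({"name": rule["name"], **details})
    return failures


# Illustrative data and rules, not taken from the real test suite.
SAMPLE = "endpoint,endpoint-url\nabc,https://example.com/a\nabc,\n"
RULES = [
    {
        "name": "endpoint values are unique",
        "operation": "check_unique",
        "parameters": {"field": "endpoint"},
    },
    {
        "name": "endpoint-url is not blank",
        "operation": "check_not_blank",
        "parameters": {"field": "endpoint-url"},
    },
]

failures = run_rules(SAMPLE, RULES)
# A test built on this would finish with: assert not failures, failures
```

Collecting all failures before asserting, rather than stopping at the first one, is what lets a single failing check report every offending rule at once.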
General checks applied to (almost) every config CSV
Most of the tests below call a shared helper, _build_all_csv_rules, before running their own specific rules. This applies to every column.csv, combine.csv, concat.csv, default.csv, default-value.csv, endpoint.csv, expect.csv, filter.csv, old-entity.csv, old-resource.csv, patch.csv, skip.csv, source.csv, transform.csv, lookup.csv, and entity-organisation.csv file across every dataset:
- No blank rows — a row can’t be entirely empty.
- Every column must be registered in the specification — if a CSV has a column name that isn’t a recognised field in the specification, the test fails immediately (this usually means a typo in a header, or a new field that hasn’t been added to the specification yet).
- Column values must match their declared datatype: each field in the specification has a datatype (`integer`, `decimal`, `flag`, `latitude`, `longitude`, `curie`, `curie-list`, `json`, `date`, `datetime`, `pattern`, `multipolygon`, `point`, `url`), and every value in that column is checked against it (e.g. a `latitude` column must contain valid latitude values, a `url` column must match a URL pattern).
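As a rough illustration of the three general checks, the sketch below applies them to CSV text using only the standard library. It is not the shared helper itself: `SPEC_FIELDS` is a hypothetical stand-in for the specification, the validators are deliberately simplified, and treating blank cells as exempt from type checks is an assumption.

```python
import csv
import io
import re

# Hypothetical stand-in for the specification: field name -> datatype.
SPEC_FIELDS = {"entity": "integer", "latitude": "latitude", "documentation-url": "url"}


def is_valid(value, datatype):
    """Simplified validators; the real datatype checks are likely stricter."""
    if value == "":
        return True  # assumption: blank cells are not type-checked
    if datatype == "integer":
        return re.fullmatch(r"-?\d+", value) is not None
    if datatype == "latitude":
        try:
            return -90 <= float(value) <= 90
        except ValueError:
            return False
    if datatype == "url":
        return re.fullmatch(r"https?://\S+", value) is not None
    return True


def general_checks(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]

    # An unregistered column fails immediately, before any row is inspected.
    unknown = [name for name in header if name not in SPEC_FIELDS]
    if unknown:
        return [f"unregistered column: {name}" for name in unknown]

    problems = []
    for line_number, row in enumerate(body, start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"line {line_number}: blank row")
            continue
        for name, value in zip(header, row):
            datatype = SPEC_FIELDS[name]
            if not is_valid(value, datatype):
                problems.append(f"line {line_number}: {value!r} is not a valid {datatype}")
    return problems


problems = general_checks("entity,latitude\n12,51.5\n,\nabc,95\n")
```

Here `problems` reports the blank row on line 3 and both invalid values on line 4.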
The tests
test_lookup
Runs against every lookup.csv. Checks:
- “lookup entities are within organisation ranges”: for each row, the `entity` must fall within the `entity-minimum`/`entity-maximum` range recorded for the same `organisation` in that dataset’s `entity-organisation.csv`.
  - Rows are skipped (not checked) if `organisation` is blank, is `government-organisation:D1342`, or belongs to an organisation with an `end_date` set (fetched live from datasette).
  - For `conservation-area` specifically, `local-authority:GLA`, `government-organisation:D1342` and `government-organisation:PB1164` (Historic England) are also excluded: HE is deliberately recorded against conservation-area entities alongside the owning local authority, so it isn’t expected to match a single organisation’s range.
- The general CSV checks described above.
NOTE!
A large proportion of historic `lookup.csv` rows (particularly for `listed-building`) have a blank `organisation` field, which means this check silently skips them (see config issue #2673 for background). Newly added data via the Manage Service always sets `organisation`, so this mostly affects older/legacy entries.
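The range check and its skip behaviour can be sketched as follows. This is an illustration rather than the operation the test actually calls: the function name, the in-memory `ranges` mapping, and the `local-authority:ABC` organisation are hypothetical, the live `end_date` lookup is replaced by a plain set of organisations to skip, and treating an organisation with no recorded range as a failure is an assumption.

```python
def out_of_range_rows(lookup_rows, ranges, skip_organisations=frozenset()):
    """Return lookup rows whose entity is outside every range for its organisation.

    `ranges` maps an organisation to a list of inclusive (minimum, maximum) pairs.
    """
    failures = []
    for row in lookup_rows:
        organisation = row["organisation"]
        # Blank or explicitly excluded organisations are skipped, not checked.
        if organisation == "" or organisation in skip_organisations:
            continue
        entity = int(row["entity"])
        org_ranges = ranges.get(organisation, [])
        if not any(low <= entity <= high for low, high in org_ranges):
            failures.append(row)
    return failures


# Illustrative data only.
ranges = {"local-authority:ABC": [(100, 199)]}
lookup_rows = [
    {"entity": "150", "organisation": "local-authority:ABC"},
    {"entity": "250", "organisation": "local-authority:ABC"},
    {"entity": "999", "organisation": ""},
    {"entity": "999", "organisation": "government-organisation:D1342"},
]

failures = out_of_range_rows(
    lookup_rows, ranges, skip_organisations={"government-organisation:D1342"}
)
```

Only the row with entity 250 is reported. The row with a blank organisation is never examined, which is the silent skip described in the note above.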
test_entity_belongs_to_single_organisation
Runs against every lookup.csv. Checks that no single entity is recorded against more than one distinct organisation within the same file (ignoring rows with a blank organisation).
- `conservation-area` rows are excluded, since it’s normal for a conservation-area entity to be recorded twice: once against the owning local authority and once against Historic England (`government-organisation:PB1164`).
- The failure message names the specific `prefix`(es) involved for each conflicting entity, since one `lookup.csv` can contain several prefixes (e.g. `local-plan`’s lookup.csv also contains `plan-timetable`, `minerals-plan`, `waste-plan`, etc.). This makes it clear which underlying collection actually has the conflict, rather than just the top-level dataset name.
- This does not attempt to determine which organisation is correct; it only flags that more than one is recorded. Working out the correct organisation typically requires checking which endpoint/source the entity’s data actually came from, which isn’t information this test has access to.
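A minimal sketch of this check, under stated assumptions: the function name and sample organisations are hypothetical, and excluding conservation-area rows by their `prefix` value is a guess at how the exclusion is applied.

```python
from collections import defaultdict


def entities_with_multiple_organisations(
    lookup_rows, excluded_prefixes=frozenset({"conservation-area"})
):
    """Map each conflicting entity to its organisations and the prefixes involved."""
    organisations = defaultdict(set)
    prefixes = defaultdict(set)
    for row in lookup_rows:
        # Rows with a blank organisation or an excluded prefix are ignored.
        if row["organisation"] == "" or row["prefix"] in excluded_prefixes:
            continue
        organisations[row["entity"]].add(row["organisation"])
        prefixes[row["entity"]].add(row["prefix"])
    return {
        entity: {
            "organisations": sorted(orgs),
            "prefixes": sorted(prefixes[entity]),
        }
        for entity, orgs in organisations.items()
        if len(orgs) > 1
    }


# Illustrative rows only.
lookup_rows = [
    {"entity": "1", "prefix": "plan-timetable", "organisation": "local-authority:AAA"},
    {"entity": "1", "prefix": "plan-timetable", "organisation": "local-authority:BBB"},
    {"entity": "2", "prefix": "local-plan", "organisation": "local-authority:AAA"},
    {"entity": "2", "prefix": "local-plan", "organisation": ""},
    {"entity": "3", "prefix": "conservation-area", "organisation": "local-authority:AAA"},
    {"entity": "3", "prefix": "conservation-area", "organisation": "government-organisation:PB1164"},
]

conflicts = entities_with_multiple_organisations(lookup_rows)
```

Only entity 1 is reported, together with the `plan-timetable` prefix, which shows how naming the prefix points at the collection that holds the conflict.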
Generic CSV structure tests
The following tests are all identical in behaviour — each just runs the general CSV checks against its named file, for every dataset that has one:
| Test | File checked |
|---|---|
| test_column_csv | column.csv |
| test_combine_csv | combine.csv |
| test_concat_csv | concat.csv |
| test_default_csv | default.csv |
| test_default_value_csv | default-value.csv |
| test_endpoint_csv | endpoint.csv |
| test_expect_csv | expect.csv |
| test_filter_csv | filter.csv |
| test_old_entity_csv | old-entity.csv |
| test_old_resource_csv | old-resource.csv |
| test_patch_csv | patch.csv |
| test_skip_csv | skip.csv |
| test_source_csv | source.csv |
| test_transform_csv | transform.csv |
test_old_entity
Runs against every old-entity.csv, in addition to the general CSV checks. Checks:
- “old-entity values are unique”: the same `old-entity` value can’t appear twice in the same file.
- “old-entity statuses only contains 301 or 410”: the `status` column must only ever be `301` (permanent redirect, i.e. merged into another entity) or `410` (gone, i.e. retired).
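Both rules amount to simple set and count operations, as in this sketch. The function name and sample rows are hypothetical, and values are compared as strings on the assumption that they are read straight from the CSV.

```python
from collections import Counter

ALLOWED_STATUSES = {"301", "410"}


def old_entity_problems(rows):
    """Return duplicated old-entity values and any status outside 301/410."""
    counts = Counter(row["old-entity"] for row in rows)
    duplicates = sorted(value for value, count in counts.items() if count > 1)
    bad_statuses = sorted({row["status"] for row in rows} - ALLOWED_STATUSES)
    return {"duplicates": duplicates, "bad_statuses": bad_statuses}


# Illustrative rows only.
rows = [
    {"old-entity": "1001", "status": "301", "entity": "2001"},
    {"old-entity": "1001", "status": "410", "entity": ""},
    {"old-entity": "1002", "status": "302", "entity": "2002"},
]

result = old_entity_problems(rows)
```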
test_entity_organisation
Runs against every entity-organisation.csv, in addition to the general CSV checks. Checks:
- “entity-minimum and entity-maximum ranges do not overlap” — no two rows for the same dataset can declare overlapping entity ID ranges, since each range should map unambiguously to one organisation.
NOTE!
This test normalises line endings (CRLF → LF) and strips trailing empty columns before running, so it isn’t affected by how the file happens to be saved.
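One common way to detect overlapping ranges is to sort them by their lower bound and compare each range with the furthest-reaching one seen so far. The sketch below takes that approach. It is not necessarily how the test implements the rule: the function name and sample organisations are hypothetical, ranges are assumed to be inclusive, rows are assumed to carry a `dataset` column, and the line-ending normalisation step is left out.

```python
def overlapping_ranges(rows):
    """Return (organisation, organisation) pairs whose ranges overlap within a dataset."""
    by_dataset = {}
    for row in rows:
        by_dataset.setdefault(row["dataset"], []).append(row)

    overlaps = []
    for dataset_rows in by_dataset.values():
        ordered = sorted(dataset_rows, key=lambda r: int(r["entity-minimum"]))
        # Track the earlier row whose range reaches furthest.
        furthest = ordered[0]
        for current in ordered[1:]:
            if int(current["entity-minimum"]) <= int(furthest["entity-maximum"]):
                overlaps.append((furthest["organisation"], current["organisation"]))
            if int(current["entity-maximum"]) > int(furthest["entity-maximum"]):
                furthest = current
    return overlaps


# Illustrative rows only.
rows = [
    {"dataset": "conservation-area", "entity-minimum": "100", "entity-maximum": "199", "organisation": "local-authority:AAA"},
    {"dataset": "conservation-area", "entity-minimum": "150", "entity-maximum": "250", "organisation": "local-authority:BBB"},
    {"dataset": "conservation-area", "entity-minimum": "251", "entity-maximum": "300", "organisation": "local-authority:CCC"},
    {"dataset": "listed-building", "entity-minimum": "100", "entity-maximum": "199", "organisation": "local-authority:DDD"},
]

overlaps = overlapping_ranges(rows)
```

The first two conservation-area ranges overlap and are reported as a pair. The listed-building range reuses the same numbers but belongs to a different dataset, so it is not flagged.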
Running the tests locally
From the config repo root:
```
pip install -r requirements.txt
pytest tests/acceptance/test_config_dataset.py
```
To run a single test, or a single dataset’s parametrised case:
```
pytest tests/acceptance/test_config_dataset.py -k test_lookup
pytest "tests/acceptance/test_config_dataset.py::test_lookup[pipeline/listed-building]"
```