Tasks
What is a task?
A task is a data quality problem that requires action from a data provider, typically a local planning authority (LPA). Tasks are surfaced to providers through the Submit service so they can identify and fix problems with their data.
Tasks are distinct from issues: issues are granular, row-level records of individual data problems produced during pipeline processing. A task is a higher-level, actionable summary — one task may represent many individual issues of the same type on the same resource.
There are two types of task, each with a different source:
Log tasks
A log task is created when the pipeline cannot reach or download data from an endpoint. This happens when the endpoint returns a non-200 HTTP status code (e.g. 404 Not Found, 500 Internal Server Error). The task tells the LPA that their endpoint is inaccessible and needs to be checked.
The details field for a log task contains the HTTP status code and any exception message:
{"status": 404, "exception": "Not Found"}
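As a rough illustration of the rule above, the sketch below builds the details string for a log task from a fetch result. The function name log_task_details and its parameters are hypothetical and not taken from the pipeline code; it assumes the status code is available as an integer.

```python
import json
from typing import Optional


def log_task_details(status: int, exception: str = "") -> Optional[str]:
    """Return the details JSON string for a log task.

    Hypothetical helper: returns None when the endpoint responded with
    200, because no log task is needed in that case.
    """
    if status == 200:
        return None
    # Same key order as the example above: status first, then exception.
    return json.dumps({"status": status, "exception": exception})


print(log_task_details(404, "Not Found"))
# prints {"status": 404, "exception": "Not Found"}
```

The details value is kept as a JSON string rather than a nested object, which matches how the details column is described for the task table.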
Issue tasks
An issue task is created when data was successfully downloaded but contains errors that the pipeline cannot automatically fix. Only issues with severity error and responsibility external become tasks — these are problems that must be fixed by the data provider, not by the data management team.
Issue tasks are grouped: multiple rows with the same issue type, field, resource, and dataset are collapsed into a single task with a count. The details field records what the problem is:
{"issue_type": "invalid-geometry", "count": 3, "field": "geometry"}
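The filtering and grouping described above can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the function name build_issue_tasks and the shape of the input records (dictionaries with dataset, resource, issue_type, field, severity and responsibility keys) are assumptions.

```python
import json
from collections import Counter


def build_issue_tasks(issues):
    """Collapse row-level issues into one task per group.

    Only issues with severity "error" and responsibility "external"
    are kept; the rest are ignored.
    """
    counts = Counter(
        (i["dataset"], i["resource"], i["issue_type"], i["field"])
        for i in issues
        if i["severity"] == "error" and i["responsibility"] == "external"
    )
    return [
        {
            "dataset": dataset,
            "resource": resource,
            "task_source": "issue",
            "details": json.dumps(
                {"issue_type": issue_type, "count": count, "field": field}
            ),
        }
        for (dataset, resource, issue_type, field), count in counts.items()
    ]


# Three qualifying rows and one warning that should be dropped.
row = {
    "dataset": "conservation-area",
    "resource": "abc123",  # placeholder resource hash
    "issue_type": "invalid-geometry",
    "field": "geometry",
    "severity": "error",
    "responsibility": "external",
}
issues = [row, row, row, {**row, "severity": "warning"}]

tasks = build_issue_tasks(issues)
print(len(tasks))           # prints 1
print(tasks[0]["details"])  # prints {"issue_type": "invalid-geometry", "count": 3, "field": "geometry"}
```

Counting on a tuple key keeps the grouping to a single pass over the issues, and the count ends up in details so a provider can see how many rows are affected without reading each issue.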
Task columns
| Column | Description |
|---|---|
| dataset | The dataset the task relates to (e.g. conservation-area) |
| organisation | The organisation responsible for fixing it |
| endpoint | The endpoint hash the task relates to |
| resource | The resource hash (empty for log tasks where no resource was downloaded) |
| details | JSON string; structure varies by task_source, see above |
| severity | Always error |
| responsibility | Always external (provider must fix) |
| task_source | Either log or issue |
| entry_date | Date the task was generated |
| reference | 16-character unique identifier, derived as a truncated SHA-256 hash of key task fields |
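To illustrate how a reference of this kind could be derived, the sketch below hashes a set of task fields with SHA-256 and keeps the first 16 hexadecimal characters. The choice of fields, the "|" separator, the hexadecimal encoding and the function name task_reference are all assumptions; the source only states that the reference is a truncated SHA-256 hash of key task fields.

```python
import hashlib


def task_reference(*key_fields: str) -> str:
    """Return a 16-character identifier for a task.

    Hypothetical derivation: join the fields with "|" (an assumed
    separator), hash with SHA-256, and truncate the hex digest.
    """
    joined = "|".join(key_fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


# Assumed key fields: dataset, organisation and task_source.
# The organisation value is a placeholder, not a real identifier.
ref = task_reference("conservation-area", "example-organisation", "issue")
print(len(ref))  # prints 16
```

Because the identifier is computed only from the task's own fields, the same task receives the same reference each time the table is rebuilt, provided those fields do not change.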
How tasks are generated
Tasks are generated nightly as part of the build-digital-land-builder pipeline. See the generate-tasks process documentation for technical details.
The full task table is regenerated from scratch each night and stored as a Delta Lake table, making it queryable via the Submit service.