digital_land.expectations.operations package

Submodules

digital_land.expectations.operations.csv module

digital_land.expectations.operations.csv.check_allowed_values(conn, file_path: Path, field: str, allowed_values: list)

Checks that a field contains only values from an allowed set.

Parameters:
  • conn -- duckdb connection

  • file_path -- path to the CSV file

  • field -- the column name to validate

  • allowed_values -- allowed values for the field
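The core logic of this check can be sketched with the standard-library csv module. This is an illustrative sketch, not the operation's actual implementation or return shape; the real check runs via a duckdb connection:

```python
import csv
import io


def allowed_values_violations(csv_text: str, field: str, allowed_values: list) -> list:
    """Return the values in `field` that fall outside the allowed set."""
    reader = csv.DictReader(io.StringIO(csv_text))
    allowed = set(allowed_values)
    return [row[field] for row in reader if row[field] not in allowed]


data = "status\nactive\nretired\nunknown\n"
print(allowed_values_violations(data, "status", ["active", "retired"]))  # ['unknown']
```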

digital_land.expectations.operations.csv.check_field_is_within_range_by_dataset_org(conn, file_path: Path, field: str, external_file: Path, min_field: str, max_field: str, lookup_dataset_field: str, range_dataset_field: str, rules: dict | None = None, dataset_aliases: dict | None = None)

Check field values are within ranges matched by dataset field and organisation.

Matching is fixed to two keys:

  1. lookup_dataset_field -> range_dataset_field

  2. organisation -> organisation

Parameters:
  • conn -- duckdb connection

  • file_path -- path to the CSV file containing fields to validate

  • field -- single column name to validate (for example: "entity").

  • external_file -- path to the CSV file containing valid ranges

  • min_field -- the column name for the range minimum

  • max_field -- the column name for the range maximum

  • lookup_dataset_field -- dataset column name in file_path

  • range_dataset_field -- dataset column name in external_file

  • rules -- optional dict controlling subset selection on lookup rows. Supported keys:
      - lookup_rules: dict or list[dict] of structured conditions.
        Fields in one dict are AND'ed; multiple dicts are OR'ed.
        Examples:
          {"lookup_rules": {"prefix": "conservationarea"}}
          {"lookup_rules": {"organisation": {"op": "in", "value": ["orgA", "orgB"]}}}
        Use operators like != and not in when you want to exclude rows.

  • dataset_aliases -- optional mapping of lookup dataset values to allowed range dataset values. Example: {"statistical-geography": ["ward", "region"]}

digital_land.expectations.operations.csv.check_fields_are_within_range(conn, file_path: Path, field: str, external_file: Path, min_field: str, max_field: str, rules: dict | None = None)

Check that one or more lookup fields are within ranges from an external file.

Parameters:
  • conn -- duckdb connection

  • file_path -- path to the CSV file containing fields to validate

  • field -- column name(s) to validate. You can pass a single name ("entity") or a comma-separated list ("entity, end-entity"). All specified fields must be within range.

  • external_file -- path to the CSV file containing valid ranges

  • min_field -- the column name for the range minimum

  • max_field -- the column name for the range maximum

  • rules -- optional dict controlling subset selection on lookup rows. Supported keys:
      - lookup_rules: dict or list[dict] of structured conditions.
        Fields in one dict are AND'ed; multiple dicts are OR'ed.
        Examples:
          {"lookup_rules": {"prefix": "conservationarea"}}
          {"lookup_rules": {"organisation": {"op": "in", "value": ["orgA", "orgB"]}}}
        Use operators like != and not in when you want to exclude rows.
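The AND/OR semantics of lookup_rules described above can be sketched in plain Python. This is an illustrative model of the documented behaviour (the field names and operator spellings follow the examples in this reference), not the library's actual evaluator:

```python
def matches_condition(row: dict, field: str, cond) -> bool:
    """Evaluate one condition; a plain value means equality, a dict carries an op."""
    if isinstance(cond, dict):
        op, value = cond["op"], cond["value"]
        if op == "in":
            return row[field] in value
        if op == "not in":
            return row[field] not in value
        if op == "!=":
            return row[field] != value
        raise ValueError(f"unsupported op: {op}")
    return row[field] == cond


def row_selected(row: dict, lookup_rules) -> bool:
    """Fields within one dict are AND'ed; dicts in a list are OR'ed."""
    rules = lookup_rules if isinstance(lookup_rules, list) else [lookup_rules]
    return any(
        all(matches_condition(row, field, cond) for field, cond in rule.items())
        for rule in rules
    )


row = {"prefix": "conservationarea", "organisation": "orgA"}
print(row_selected(row, {"prefix": "conservationarea"}))                       # True
print(row_selected(row, {"organisation": {"op": "in", "value": ["orgB"]}}))    # False
```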

digital_land.expectations.operations.csv.check_no_blank_rows(conn, file_path: Path)

Checks that the CSV does not contain fully blank rows.

A row is considered blank when every column is empty after trimming whitespace.

Parameters:
  • conn -- duckdb connection

  • file_path -- path to the CSV file
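The blank-row definition above (every column empty after trimming whitespace) can be sketched with the standard-library csv module. The function name and return value are illustrative, not the operation's actual API:

```python
import csv
import io


def blank_row_indices(csv_text: str) -> list:
    """Return 1-based data-row indices where every cell is empty after trimming."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return [
        i for i, row in enumerate(reader, start=1)
        if all(cell.strip() == "" for cell in row)
    ]


data = "a,b\n1,2\n , \n3,4\n"
print(blank_row_indices(data))  # [2]
```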

digital_land.expectations.operations.csv.check_no_overlapping_ranges(conn, file_path: Path, min_field: str, max_field: str)

Checks that no ranges overlap between rows.

Two ranges [a_min, a_max] and [b_min, b_max] overlap if: a_min <= b_max AND a_max >= b_min

Parameters:
  • conn -- duckdb connection

  • file_path -- path to the CSV file

  • min_field -- the column name for the range minimum

  • max_field -- the column name for the range maximum
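The overlap condition stated above (a_min <= b_max AND a_max >= b_min) can be applied pairwise to find offending rows. A minimal sketch, with illustrative names and in-memory ranges rather than the CSV/duckdb machinery:

```python
def ranges_overlap(a_min: int, a_max: int, b_min: int, b_max: int) -> bool:
    """Two closed ranges [a_min, a_max] and [b_min, b_max] overlap iff
    a_min <= b_max and a_max >= b_min."""
    return a_min <= b_max and a_max >= b_min


def overlapping_pairs(ranges: list) -> list:
    """Return index pairs of rows whose (min, max) ranges overlap."""
    return [
        (i, j)
        for i in range(len(ranges))
        for j in range(i + 1, len(ranges))
        if ranges_overlap(*ranges[i], *ranges[j])
    ]


print(overlapping_pairs([(1, 10), (11, 20), (5, 12)]))  # [(0, 2), (1, 2)]
```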

digital_land.expectations.operations.csv.check_no_shared_values(conn, file_path: Path, field_1: str, field_2: str)

Checks that no value appears in both field_1 and field_2.

Parameters:
  • conn -- duckdb connection

  • file_path -- path to the CSV file

  • field_1 -- the first column name

  • field_2 -- the second column name
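The check reduces to a set intersection of the two columns. A hedged sketch using the standard-library csv module (the function name, empty-cell handling, and return shape are illustrative assumptions):

```python
import csv
import io


def shared_values(csv_text: str, field_1: str, field_2: str) -> list:
    """Return values that appear in both columns (empty cells ignored)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    first = {row[field_1] for row in rows if row[field_1]}
    second = {row[field_2] for row in rows if row[field_2]}
    return sorted(first & second)


data = "entity,end-entity\n100,200\n150,100\n"
print(shared_values(data, "entity", "end-entity"))  # ['100']
```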

digital_land.expectations.operations.csv.check_unique(conn, file_path: Path, field: str)

Checks that all values in a given field are unique.

Parameters:
  • conn -- duckdb connection

  • file_path -- path to the CSV file

  • field -- the column name to check for uniqueness

digital_land.expectations.operations.csv.count_rows(conn, file_path: Path, expected: int, comparison_rule: str = 'greater_than')

Counts the number of rows in the CSV and compares against an expected value.

Parameters:
  • conn -- duckdb connection

  • file_path -- path to the CSV file

  • expected -- the expected row count

  • comparison_rule -- how to compare actual vs expected (for example 'greater_than' or 'equals_to')

digital_land.expectations.operations.csv.expect_column_to_be_curie(conn, file_path: Path, field: str)
digital_land.expectations.operations.csv.expect_column_to_be_curie_list(conn, file_path: Path, field: str)
digital_land.expectations.operations.csv.expect_column_to_be_date(conn, file_path: Path, field: str)
digital_land.expectations.operations.csv.expect_column_to_be_datetime(conn, file_path: Path, field: str)
digital_land.expectations.operations.csv.expect_column_to_be_decimal(conn, file_path: Path, field: str)
digital_land.expectations.operations.csv.expect_column_to_be_flag(conn, file_path: Path, field: str)
digital_land.expectations.operations.csv.expect_column_to_be_hash(conn, file_path: Path, field: str)
digital_land.expectations.operations.csv.expect_column_to_be_integer(conn, file_path: Path, field: str)
digital_land.expectations.operations.csv.expect_column_to_be_json(conn, file_path: Path, field: str)
digital_land.expectations.operations.csv.expect_column_to_be_latitude(conn, file_path: Path, field: str)
digital_land.expectations.operations.csv.expect_column_to_be_longitude(conn, file_path: Path, field: str)
digital_land.expectations.operations.csv.expect_column_to_be_multipolygon(conn, file_path: Path, field: str)

Validate that non-empty values in a column are valid polygonal geometries. This expectation relies on DuckDB spatial functions, so the provided connection should have the spatial extension loaded.

Parameters:
  • conn -- duckdb connection used to run the query; the spatial extension should already be loaded

  • file_path -- path to the CSV file being validated

  • field -- the geometry column to validate

digital_land.expectations.operations.csv.expect_column_to_be_pattern(conn, file_path: Path, field: str)

Validate that non-empty values in a column are valid regex patterns.

digital_land.expectations.operations.csv.expect_column_to_be_point(conn, file_path: Path, field: str)

Validate that non-empty values in a column are valid WKT POINT geometries. This expectation relies on DuckDB spatial functions, so the provided connection should have the spatial extension loaded.

Parameters:
  • conn -- duckdb connection used to run the query; the spatial extension should already be loaded

  • file_path -- path to the CSV file being validated

  • field -- the point column to validate

digital_land.expectations.operations.csv.expect_column_to_be_url(conn, file_path: Path, field: str)
digital_land.expectations.operations.csv.expect_column_to_match_pattern(conn, file_path: Path, field: str, pattern: str)

Validate that non-empty values in a column match a provided regex pattern.

digital_land.expectations.operations.dataset module

digital_land.expectations.operations.dataset.check_columns(conn, expected: dict)
digital_land.expectations.operations.dataset.count_deleted_entities(conn, expected: int, organisation_entity: int | None = None, resources_cache: dict | None = None)
digital_land.expectations.operations.dataset.count_lpa_boundary(conn, lpa: str, expected: int, organisation_entity: int | None = None, comparison_rule: str = 'equals_to', geometric_relation: str = 'within')

Specific version of a count which, given a local planning authority and a dataset, checks for entities related to the LPA boundary. The geometric relation defaults to "within" but can be changed. This should only be used on geographic datasets.

Parameters:
  • conn -- sqlite connection used to connect to the db, will be created by the checkpoint class

  • lpa -- the reference to the local planning authority (geography dataset) boundary to use

  • expected -- the expected count, must be a non-negative integer

  • organisation_entity -- optional additional filter to filter by organisation_entity as well as boundary

  • comparison_rule -- how to compare actual vs expected

  • geometric_relation -- how to decide if the data is related to the lpa boundary

digital_land.expectations.operations.dataset.duplicate_geometry_check(conn, spatial_field: str)

Compares all the geometries or points of entities in a dataset to find duplicates. Geometries are classed as duplicates if they have more than 95% intersection; points are classed as duplicates if they are an exact match.

Parameters:
  • conn -- spatialite connection used to connect to the db, will be created by the checkpoint class

  • spatial_field -- the field to be used for comparison, either 'point' or 'geometry'

digital_land.expectations.operations.dataset.fetch_active_resources_for_dataset(dataset_name)

Module contents