digital_land.expectations.operations package
Submodules
digital_land.expectations.operations.csv module
- digital_land.expectations.operations.csv.check_allowed_values(conn, file_path: Path, field: str, allowed_values: list)
Checks that a field contains only values from an allowed set.
- Parameters:
conn -- duckdb connection
file_path -- path to the CSV file
field -- the column name to validate
allowed_values -- allowed values for the field
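The real helper runs its query through the supplied DuckDB connection; the semantics can be sketched in pure Python with only the stdlib csv module (the actual implementation, null handling, and error reporting may differ):

```python
import csv
from pathlib import Path

def allowed_values_violations(file_path: Path, field: str, allowed_values: list) -> list:
    # Sketch of check_allowed_values' semantics: return every value in
    # `field` that is not in the allowed set. The library helper runs an
    # equivalent query via DuckDB and records issues instead of returning.
    allowed = set(allowed_values)
    with open(file_path, newline="") as f:
        return [row[field] for row in csv.DictReader(f) if row[field] not in allowed]
```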
- digital_land.expectations.operations.csv.check_field_is_within_range_by_dataset_org(conn, file_path: Path, field: str, external_file: Path, min_field: str, max_field: str, lookup_dataset_field: str, range_dataset_field: str, rules: dict | None = None, dataset_aliases: dict | None = None)
Check field values are within ranges matched by dataset field and organisation.
Matching is fixed to two keys:
1. lookup_dataset_field -> range_dataset_field
2. organisation -> organisation
- Parameters:
conn -- duckdb connection
file_path -- path to the CSV file containing fields to validate
field -- single column name to validate (for example: "entity").
external_file -- path to the CSV file containing valid ranges
min_field -- the column name for the range minimum
max_field -- the column name for the range maximum
lookup_dataset_field -- dataset column name in file_path
range_dataset_field -- dataset column name in external_file
rules -- optional dict controlling subset selection on lookup rows. Supported key: lookup_rules, a dict or list of dicts of structured conditions. Fields within one dict are AND'ed; multiple dicts are OR'ed. Examples: {"lookup_rules": {"prefix": "conservationarea"}} and {"lookup_rules": {"organisation": {"op": "in", "value": ["orgA", "orgB"]}}}. Use operators such as != and not in to exclude rows.
dataset_aliases -- optional mapping of lookup dataset values to allowed range dataset values. Example: {"statistical-geography": ["ward", "region"]}
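The two-key matching plus dataset_aliases can be sketched as follows. The range-row keys ('dataset', 'organisation', 'min', 'max') are illustrative names, not the library's actual column handling:

```python
def value_in_matched_range(value, lookup_dataset, organisation, ranges,
                           dataset_aliases=None):
    # Sketch: a lookup row's dataset and organisation select candidate
    # range rows; the value must fall inside at least one [min, max].
    # dataset_aliases maps a lookup dataset to the range dataset values
    # it is allowed to match, e.g. {"statistical-geography": ["ward", "region"]}.
    aliases = dataset_aliases or {}
    allowed_datasets = set(aliases.get(lookup_dataset, [lookup_dataset]))
    return any(
        r["dataset"] in allowed_datasets
        and r["organisation"] == organisation
        and int(r["min"]) <= int(value) <= int(r["max"])
        for r in ranges
    )
```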
- digital_land.expectations.operations.csv.check_fields_are_within_range(conn, file_path: Path, field: str, external_file: Path, min_field: str, max_field: str, rules: dict | None = None)
Check that one or more lookup fields are within ranges from an external file.
- Parameters:
conn -- duckdb connection
file_path -- path to the CSV file containing fields to validate
field -- column name(s) to validate. You can pass a single name ("entity") or a comma-separated list ("entity, end-entity"). All specified fields must be within range.
external_file -- path to the CSV file containing valid ranges
min_field -- the column name for the range minimum
max_field -- the column name for the range maximum
rules -- optional dict controlling subset selection on lookup rows. Supported key: lookup_rules, a dict or list of dicts of structured conditions. Fields within one dict are AND'ed; multiple dicts are OR'ed. Examples: {"lookup_rules": {"prefix": "conservationarea"}} and {"lookup_rules": {"organisation": {"op": "in", "value": ["orgA", "orgB"]}}}. Use operators such as != and not in to exclude rows.
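The lookup_rules semantics described above (AND within a dict, OR across a list, literal equality or an {"op": ..., "value": ...} condition) can be sketched like this. Only the operators named in this section are implemented; anything else the library supports is not shown:

```python
def row_matches_rules(row: dict, lookup_rules) -> bool:
    # Sketch of lookup_rules evaluation: fields within one dict are AND'ed,
    # multiple dicts in a list are OR'ed. A condition is either a literal
    # (compared for equality) or a dict like {"op": "in", "value": [...]}.
    rule_dicts = lookup_rules if isinstance(lookup_rules, list) else [lookup_rules]

    def condition_holds(actual, cond):
        if isinstance(cond, dict):
            op, value = cond["op"], cond["value"]
            if op == "in":
                return actual in value
            if op == "not in":
                return actual not in value
            if op == "!=":
                return actual != value
            raise ValueError(f"unsupported op: {op}")
        return actual == cond

    return any(
        all(condition_holds(row.get(field), cond) for field, cond in rule.items())
        for rule in rule_dicts
    )
```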
- digital_land.expectations.operations.csv.check_no_blank_rows(conn, file_path: Path)
Checks that the CSV does not contain fully blank rows.
A row is considered blank when every column is empty after trimming whitespace.
- Parameters:
conn -- duckdb connection
file_path -- path to the CSV file
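The blank-row rule (every column empty after trimming whitespace) can be sketched with the stdlib csv module; the real helper runs the equivalent test in DuckDB:

```python
import csv
from pathlib import Path

def blank_row_numbers(file_path: Path) -> list:
    # Return 1-based data-row numbers that are fully blank, i.e. every
    # cell is empty after stripping whitespace. Sketch of
    # check_no_blank_rows' semantics, not the library implementation.
    with open(file_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return [
            i for i, row in enumerate(reader, start=1)
            if all(cell.strip() == "" for cell in row)
        ]
```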
- digital_land.expectations.operations.csv.check_no_overlapping_ranges(conn, file_path: Path, min_field: str, max_field: str)
Checks that no ranges overlap between rows.
Two ranges [a_min, a_max] and [b_min, b_max] overlap if: a_min <= b_max AND a_max >= b_min
- Parameters:
conn -- duckdb connection
file_path -- path to the CSV file
min_field -- the column name for the range minimum
max_field -- the column name for the range maximum
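The overlap predicate quoted above translates directly to code:

```python
def ranges_overlap(a_min: int, a_max: int, b_min: int, b_max: int) -> bool:
    # Two inclusive ranges [a_min, a_max] and [b_min, b_max] overlap
    # iff a_min <= b_max AND a_max >= b_min.
    return a_min <= b_max and a_max >= b_min
```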
Checks that no value appears in both field_1 and field_2.
- Parameters:
conn -- duckdb connection
file_path -- path to the CSV file
field_1 -- the first column name
field_2 -- the second column name
- digital_land.expectations.operations.csv.check_unique(conn, file_path: Path, field: str)
Checks that all values in a given field are unique.
- Parameters:
conn -- duckdb connection
file_path -- path to the CSV file
field -- the column name to check for uniqueness
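What check_unique flags can be sketched as a duplicate scan over the column's values (the real helper does this via a DuckDB aggregation):

```python
from collections import Counter

def duplicate_values(values) -> list:
    # Values appearing more than once, in first-seen order.
    return [v for v, n in Counter(values).items() if n > 1]
```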
- digital_land.expectations.operations.csv.count_rows(conn, file_path: Path, expected: int, comparison_rule: str = 'greater_than')
Counts the number of rows in the CSV and compares against an expected value.
- Parameters:
conn -- duckdb connection
file_path -- path to the CSV file
expected -- the expected row count
comparison_rule -- how to compare the actual count against the expected value (defaults to 'greater_than')
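The comparison step can be sketched as below. Only 'greater_than' (the default documented here) and 'equals_to' (borrowed from count_lpa_boundary's default) are shown; any other rule names the library accepts are not documented in this section:

```python
def count_satisfies(actual: int, expected: int,
                    comparison_rule: str = "greater_than") -> bool:
    # Sketch of count_rows' comparison of actual vs expected.
    if comparison_rule == "greater_than":
        return actual > expected
    if comparison_rule == "equals_to":
        return actual == expected
    raise ValueError(f"unknown comparison_rule: {comparison_rule}")
```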
- digital_land.expectations.operations.csv.expect_column_to_be_curie(conn, file_path: Path, field: str)
- digital_land.expectations.operations.csv.expect_column_to_be_curie_list(conn, file_path: Path, field: str)
- digital_land.expectations.operations.csv.expect_column_to_be_date(conn, file_path: Path, field: str)
- digital_land.expectations.operations.csv.expect_column_to_be_datetime(conn, file_path: Path, field: str)
- digital_land.expectations.operations.csv.expect_column_to_be_decimal(conn, file_path: Path, field: str)
- digital_land.expectations.operations.csv.expect_column_to_be_flag(conn, file_path: Path, field: str)
- digital_land.expectations.operations.csv.expect_column_to_be_hash(conn, file_path: Path, field: str)
- digital_land.expectations.operations.csv.expect_column_to_be_integer(conn, file_path: Path, field: str)
- digital_land.expectations.operations.csv.expect_column_to_be_json(conn, file_path: Path, field: str)
- digital_land.expectations.operations.csv.expect_column_to_be_latitude(conn, file_path: Path, field: str)
- digital_land.expectations.operations.csv.expect_column_to_be_longitude(conn, file_path: Path, field: str)
- digital_land.expectations.operations.csv.expect_column_to_be_multipolygon(conn, file_path: Path, field: str)
Validate that non-empty values in a column are valid polygonal geometries. This expectation relies on DuckDB spatial functions, so the provided connection should have the spatial extension loaded.
- Parameters:
conn -- duckdb connection used to run the query, spatial extension should already be loaded
file_path -- path to the CSV file being validated
field -- the geometry column to validate
- digital_land.expectations.operations.csv.expect_column_to_be_pattern(conn, file_path: Path, field: str)
Validate that non-empty values in a column are valid regex patterns.
- digital_land.expectations.operations.csv.expect_column_to_be_point(conn, file_path: Path, field: str)
Validate that non-empty values in a column are valid WKT POINT geometries. This expectation relies on DuckDB spatial functions, so the provided connection should have the spatial extension loaded.
- Parameters:
conn -- duckdb connection used to run the query, spatial extension should already be loaded
file_path -- path to the CSV file being validated
field -- the point column to validate
- digital_land.expectations.operations.csv.expect_column_to_be_url(conn, file_path: Path, field: str)
- digital_land.expectations.operations.csv.expect_column_to_match_pattern(conn, file_path: Path, field: str, pattern: str)
Validate that non-empty values in a column match a provided regex pattern.
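The pattern check's semantics can be sketched with the stdlib re module. Whether the real check anchors the pattern is not specified here; this sketch uses fullmatch, and empty values are skipped to mirror the "non-empty values" wording:

```python
import re

def pattern_violations(values, pattern: str) -> list:
    # Non-empty values that do not match the regex - what
    # expect_column_to_match_pattern would flag.
    compiled = re.compile(pattern)
    return [v for v in values if v != "" and not compiled.fullmatch(v)]
```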
digital_land.expectations.operations.dataset module
- digital_land.expectations.operations.dataset.check_columns(conn, expected: dict)
- digital_land.expectations.operations.dataset.count_deleted_entities(conn, expected: int, organisation_entity: int | None = None, resources_cache: dict | None = None)
- digital_land.expectations.operations.dataset.count_lpa_boundary(conn, lpa: str, expected: int, organisation_entity: int | None = None, comparison_rule: str = 'equals_to', geometric_relation: str = 'within')
A specialised count which, given a local planning authority and a dataset, checks for any entities relating to the LPA boundary. The geometric relation defaults to "within" but can be changed. This should only be used on geographic datasets.
- Parameters:
conn -- sqlite connection used to connect to the db, will be created by the checkpoint class
lpa -- the reference to the local planning authority (geography dataset) boundary to use
expected -- the expected count, must be a non-negative integer
organisation_entity -- optional additional filter by organisation entity as well as boundary
comparison_rule -- how to compare the actual count against the expected value (defaults to 'equals_to')
geometric_relation -- how to decide if the data is related to the lpa boundary (defaults to 'within')
- digital_land.expectations.operations.dataset.duplicate_geometry_check(conn, spatial_field: str)
Compares all the geometries or points of entities in a dataset to find duplicates. Geometries are classed as duplicates if they have more than 95% intersection; points are classed as duplicates if they are an exact match.
- Parameters:
conn -- spatialite connection used to connect to the db, will be created by the checkpoint class
spatial_field -- the field to be used for comparison, either 'point' or 'geometry'
- digital_land.expectations.operations.dataset.fetch_active_resources_for_dataset(dataset_name)