digital_land.expectations package

Subpackages

Submodules

digital_land.expectations.commands module

digital_land.expectations.commands.run_dataset_checkpoint(dataset, file_path, output_dir, config: Config, organisations: Organisation, act_on_critical_error=False)

Runs the expectation rules for a given dataset. The rules are read from the configuration, so the configuration must be passed in.

digital_land.expectations.exception module

exception digital_land.expectations.exception.DataQualityException(message)

Bases: Exception

Exception raised for failed expectations with severity RaiseError. Attributes: response

digital_land.expectations.log module

class digital_land.expectations.log.ExpectationLog(dataset)

Bases: object

a class to create and store the log output from running expectations

add(entry: dict)

Adds an individual log entry, represented as a dictionary.

save(path=None, f=None)
save_parquet(output_dir, partition=None)

Saves the output into a parquet file that is a partition for this particular dataset. Unlike saving to a CSV, this uses a directory to add the partition.

digital_land.expectations.operation module

digital_land.expectations.operation.check_columns(conn, expected: dict)
digital_land.expectations.operation.count_deleted_entities(conn, expected: int, organisation_entity: int | None = None)
digital_land.expectations.operation.count_lpa_boundary(conn, lpa: str, expected: int, organisation_entity: int | None = None, comparison_rule: str = 'equals_to', geometric_relation: str = 'within')

Specific version of a count which, given a local authority and a dataset, checks for any entities relating to the LPA boundary. The geometric relation defaults to within but can be changed. This should only be used on geographic datasets.

:param conn: sqlite connection used to connect to the db, will be created by the checkpoint class
:param lpa: the reference to the local planning authority (geography dataset) boundary to use
:param expected: the expected count, must be a non-negative integer
:param organisation_entity: optional additional filter to filter by organisation_entity as well as boundary
:param geometric_relation: how to decide if the data is related to the LPA boundary

digital_land.expectations.operation.duplicate_geometry_check(conn, spatial_field: str)

Compares all the geometries or points of entities in a dataset to find duplicates. Geometries are classed as duplicates if they have > 95% intersection; points are classed as duplicates if they are an exact match.

:param conn: spatialite connection used to connect to the db, will be created by the checkpoint class
:param spatial_field: the field to be used for comparison, either 'point' or 'geometry'
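The > 95% intersection rule can be illustrated with a small sketch. This is not the library's implementation, which works on real geometries through spatialite. It assumes axis-aligned, non-degenerate boxes and assumes that the overlap must exceed the threshold relative to both shapes; `Box`, `is_duplicate` and `find_duplicates` are hypothetical names used only for illustration.

```python
from itertools import combinations
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical axis-aligned rectangle standing in for a geometry."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @property
    def area(self) -> float:
        return (self.max_x - self.min_x) * (self.max_y - self.min_y)


def intersection_area(a: Box, b: Box) -> float:
    # Overlap along each axis; a negative value means no overlap on that axis.
    width = min(a.max_x, b.max_x) - max(a.min_x, b.min_x)
    height = min(a.max_y, b.max_y) - max(a.min_y, b.min_y)
    return max(width, 0.0) * max(height, 0.0)


def is_duplicate(a: Box, b: Box, threshold: float = 0.95) -> bool:
    # Assumption: the overlap must cover more than `threshold` of BOTH boxes.
    # Boxes are assumed to have a non-zero area.
    overlap = intersection_area(a, b)
    return overlap / a.area > threshold and overlap / b.area > threshold


def find_duplicates(boxes: dict, threshold: float = 0.95) -> list:
    # Compare every unordered pair of entities exactly once.
    return [
        (key_a, key_b)
        for (key_a, box_a), (key_b, box_b) in combinations(boxes.items(), 2)
        if is_duplicate(box_a, box_b, threshold)
    ]


entities = {
    1: Box(0, 0, 10, 10),
    2: Box(0, 0, 10, 9.8),   # overlaps entity 1 almost completely
    3: Box(5, 5, 15, 15),    # overlaps entity 1 by only a quarter
}
duplicates = find_duplicates(entities)
print(duplicates)
```

Comparing every pair is quadratic in the number of entities, which is acceptable for a sketch; a database-backed check would normally rely on a spatial index instead.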

digital_land.expectations.result module

Contains data classes representing the result of an expectation being run from expectation functions

class digital_land.expectations.result.ExpectationResult(expectation_result: str, checkpoint: str, passed: bool, severity: SeverityEnum, message: str, issues: list, data_name: str)

Bases: object

Class to keep inputs and results of expectations

act_on_failure()

Returns 1 if the severity is critical, or 0 if it is not. Raises a warning for failed tests. Could be moved to the expectation suite class.
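A minimal sketch of how a return value of 1 or 0 might be used to count critical failures across several results. `Severity`, `Result` and `count_critical_failures` are hypothetical stand-ins, not the library's classes, and the sketch assumes that a passed expectation never counts as a failure, which the description above does not state explicitly.

```python
import warnings
from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    """Trimmed, hypothetical stand-in for SeverityEnum."""
    critical = "critical"
    error = "error"
    warning = "warning"


@dataclass
class Result:
    """Hypothetical stand-in for ExpectationResult with only three fields."""
    passed: bool
    severity: Severity
    message: str

    def act_on_failure(self) -> int:
        # Assumption: a passed expectation returns 0 and emits no warning.
        if self.passed:
            return 0
        warnings.warn(self.message)
        return 1 if self.severity == Severity.critical else 0


def count_critical_failures(results) -> int:
    # Summing the 1/0 return values gives the number of critical failures.
    return sum(result.act_on_failure() for result in results)


results = [
    Result(True, Severity.critical, "columns present"),
    Result(False, Severity.warning, "2 duplicate geometries"),
    Result(False, Severity.critical, "entity count mismatch"),
]

# Record the warnings instead of printing them so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    critical_failures = count_critical_failures(results)

print(critical_failures, len(caught))
```

Returning an integer rather than a boolean makes the results easy to sum, so a caller can decide what to do once the total number of critical failures is known.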

checkpoint: str
data_name: str
dict_for_export()
expectation_result: str
classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) -> A
classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) -> A
issues: list
message: str
passed: bool
save_to_file(dir_path: str)

Prepares a naming convention and saves the response to a provided path

classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) -> SchemaF[A]
severity: SeverityEnum
to_dict(encode_json=False) -> Dict[str, dict | list | str | int | float | bool | None]
to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) -> str
class digital_land.expectations.result.SeverityEnum(value)

Bases: str, Enum

Enumeration of the severity levels in the specification's severity CSV. May need to be replaced in the future with a different mechanism that reads from the CSV.

critical = 'critical'
debug = 'debug'
error = 'error'
info = 'info'
notice = 'notice'
warning = 'warning'
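Because the enumeration derives from both str and Enum, its members behave like plain strings. The sketch below redefines an enum with the same members locally, purely for illustration, to show three consequences of that design in standard Python: lookup by value, direct comparison with strings, and JSON serialisation without a custom encoder.

```python
import json
from enum import Enum


class SeverityEnum(str, Enum):
    """Local copy of the members listed above, defined here for illustration."""
    critical = "critical"
    debug = "debug"
    error = "error"
    info = "info"
    notice = "notice"
    warning = "warning"


# Look a member up from the raw string, e.g. a value read from a CSV row.
parsed = SeverityEnum("warning")

# Members compare equal to their string values because they are str instances.
same_as_string = parsed == "warning"

# The str mixin lets the standard json module serialise members directly.
encoded = json.dumps({"severity": SeverityEnum.error})

# Unknown values are rejected with a ValueError.
try:
    SeverityEnum("fatal")
    rejected = False
except ValueError:
    rejected = True

print(parsed, same_as_string, encoded, rejected)
```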

digital_land.expectations.utils module

Utility functions to support functions in the expectation module

Notes:
- Might want to remove QueryRunner at a future date, as it loads spatialite, which may not be useful for everything.

class digital_land.expectations.utils.QueryRunner(tested_dataset_path: str)

Bases: object

Class to run queries using spatialite

inform_dataset_path()
run_query(sql_query: str, return_only_first_col_as_set: bool = False)

Receives a sql query and returns the results either in a pandas dataframe or just the first column as a set (this is useful to test presence or absence of items like tables, columns, etc).

Note: the connection is opened and closed at each query. For use cases like the present one, keeping the connection open would not offer big benefits and would mean having to develop thread-local connection pools. For more info see: https://stackoverflow.com/a/14520670
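The connection-per-query pattern described in the note can be sketched with the standard library alone. This is a simplified illustration, not the QueryRunner implementation: `SimpleQueryRunner` and the `first_col_as_set` flag are hypothetical names, the sketch does not load spatialite, and it returns plain row tuples instead of a pandas dataframe.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


class SimpleQueryRunner:
    """Hypothetical, simplified runner: no spatialite, no pandas."""

    def __init__(self, db_path: str):
        self.db_path = db_path

    def run_query(self, sql_query: str, first_col_as_set: bool = False):
        # Open a fresh connection for this query and close it afterwards,
        # so no connection is ever shared between calls or threads.
        with closing(sqlite3.connect(self.db_path)) as conn:
            rows = conn.execute(sql_query).fetchall()
        if first_col_as_set:
            # Handy for presence checks on tables, columns and similar items.
            return {row[0] for row in rows}
        return rows


with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "example.sqlite3")

    # Create a small database file to query.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE entity (entity INTEGER, name TEXT)")
        conn.execute("INSERT INTO entity VALUES (1, 'first'), (2, 'second')")
        conn.commit()

    runner = SimpleQueryRunner(db_path)
    tables = runner.run_query(
        "SELECT name FROM sqlite_master WHERE type = 'table'",
        first_col_as_set=True,
    )
    rows = runner.run_query("SELECT entity, name FROM entity ORDER BY entity")

print(tables, rows)
```

A file-backed database is used here because each new connection to an in-memory SQLite database would start empty, which would hide the effect of reconnecting on every query.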

digital_land.expectations.utils.config_parser(filepath: str)

Will parse a config file

digital_land.expectations.utils.transform_df_first_column_into_set(dataframe: DataFrame) -> set

Given a pd dataframe returns the first column as a python set

Module contents