digital_land.expectations package
Subpackages
Submodules
digital_land.expectations.commands module
- digital_land.expectations.commands.run_dataset_checkpoint(dataset, file_path, output_dir, config: Config, organisations: Organisation, act_on_critical_error=False)
Runs the expectation rules for a given dataset, using the rules stored in the configuration. The configuration must be passed in.
digital_land.expectations.exception module
- exception digital_land.expectations.exception.DataQualityException(message)
Bases: Exception
Exception raised for failed expectations with severity RaiseError. Attributes: response
digital_land.expectations.log module
- class digital_land.expectations.log.ExpectationLog(dataset)
Bases: object
a class to create and store the log output from running expectations
- add(entry: dict)
Adds an individual log entry, represented as a dictionary.
- save(path=None, f=None)
- save_parquet(output_dir, partition=None)
Save the output into a parquet file that is a partition for this particular dataset. Unlike saving to a CSV, this uses a directory to add the partition.
digital_land.expectations.operation module
- digital_land.expectations.operation.check_columns(conn, expected: dict)
- digital_land.expectations.operation.count_deleted_entities(conn, expected: int, organisation_entity: int | None = None)
- digital_land.expectations.operation.count_lpa_boundary(conn, lpa: str, expected: int, organisation_entity: int | None = None, comparison_rule: str = 'equals_to', geometric_relation: str = 'within')
Specific version of a count which, given a local authority and a dataset, checks for any entities relating to the lpa boundary. geometric_relation defaults to 'within' but can be changed. This should only be used on geographic datasets.
:param conn: sqlite connection used to connect to the db, will be created by the checkpoint class
:param lpa: the reference to the local planning authority (geography dataset) boundary to use
:param expected: the expected count, must be a non-negative integer
:param organisation_entity: optional additional filter to filter by organisation_entity as well as boundary
:param geometric_relation: how to decide if the data is related to the lpa boundary
- digital_land.expectations.operation.duplicate_geometry_check(conn, spatial_field: str)
Compares all the geometries or points of entities in a dataset to find duplicates. Geometries are classed as duplicates if they have > 95% intersection; points are classed as duplicates if they are an exact match.
:param conn: spatialite connection used to connect to the db, will be created by the checkpoint class
:param spatial_field: the field to be used for comparison, either 'point' or 'geometry'
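The duplicate rule described above can be illustrated without spatialite. The following is a minimal sketch, not the library's implementation: geometries are replaced by axis-aligned boxes, the names find_duplicate_boxes and find_duplicate_points are hypothetical, and it assumes the 95% threshold must hold relative to the area of both geometries (the source does not say which area the intersection is measured against).

```python
from itertools import combinations

# Hypothetical stand-in for a geometry: (min_x, min_y, max_x, max_y).
Box = tuple[float, float, float, float]


def area(box: Box) -> float:
    min_x, min_y, max_x, max_y = box
    return max(0.0, max_x - min_x) * max(0.0, max_y - min_y)


def intersection_area(a: Box, b: Box) -> float:
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0  # the boxes do not overlap
    return width * height


def find_duplicate_boxes(boxes: dict[str, Box], threshold: float = 0.95):
    """Return pairs of references whose boxes overlap by more than threshold."""
    duplicates = []
    for (ref_a, box_a), (ref_b, box_b) in combinations(boxes.items(), 2):
        overlap = intersection_area(box_a, box_b)
        # Assumption: the overlap must exceed the threshold for BOTH boxes.
        if overlap > threshold * area(box_a) and overlap > threshold * area(box_b):
            duplicates.append((ref_a, ref_b))
    return duplicates


def find_duplicate_points(points: dict[str, tuple[float, float]]):
    """Return pairs of references whose coordinates match exactly."""
    first_seen = {}
    duplicates = []
    for ref, point in points.items():
        if point in first_seen:
            duplicates.append((first_seen[point], ref))
        else:
            first_seen[point] = ref
    return duplicates


boxes = {
    "a": (0, 0, 10, 10),
    "b": (0, 0, 10, 9.8),   # almost identical to "a"
    "c": (5, 5, 15, 15),    # overlaps "a" by only a quarter
}
points = {"p1": (1.0, 2.0), "p2": (1.0, 2.0), "p3": (1.0, 2.1)}

duplicate_boxes = find_duplicate_boxes(boxes)
duplicate_points = find_duplicate_points(points)
```

Here only "a" and "b" are reported as duplicate geometries, and only "p1" and "p2" as duplicate points. The pairwise loop is quadratic in the number of entities, which is acceptable for a sketch but may not reflect how the real check queries the database.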
digital_land.expectations.result module
Contains data classes representing the result of an expectation being run from expectation functions
- class digital_land.expectations.result.ExpectationResult(expectation_result: str, checkpoint: str, passed: bool, severity: SeverityEnum, message: str, issues: list, data_name: str)
Bases: object
Class to keep inputs and results of expectations
- act_on_failure()
Returns 1 if severity is critical, or 0 if severity is not critical. Raises a warning for failed tests. Could be moved to the expectation suite class.
- checkpoint: str
- data_name: str
- dict_for_export()
- expectation_result: str
- classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A
- classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A
- issues: list
- message: str
- passed: bool
- save_to_file(dir_path: str)
Prepares a naming convention and saves the response to a provided path
- classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]
- severity: SeverityEnum
- to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]
- to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str
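The behaviour documented for act_on_failure() above can be sketched as follows. This is an illustrative stand-in rather than the real ExpectationResult: the class ResultSketch and the two-member Severity enum are hypothetical, and it assumes that a passing result returns 0 without emitting a warning, which the source does not state.

```python
import warnings
from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    # Hypothetical subset of the severities, for illustration only.
    critical = "critical"
    warning = "warning"


@dataclass
class ResultSketch:
    passed: bool
    severity: Severity
    message: str

    def act_on_failure(self) -> int:
        # Assumption: a passing result needs no action.
        if self.passed:
            return 0
        # A failed expectation raises a warning whatever its severity.
        warnings.warn(self.message)
        # Only critical failures are counted.
        return 1 if self.severity == Severity.critical else 0


results = [
    ResultSketch(True, Severity.warning, "row count matched"),
    ResultSketch(False, Severity.warning, "unexpected column found"),
    ResultSketch(False, Severity.critical, "entity table is missing"),
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    critical_failures = sum(result.act_on_failure() for result in results)
```

Returning an integer lets a caller sum the critical failures across a whole run and decide once, at the end, whether to stop; in this sketch the sum is 1 and two warnings are recorded.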
- class digital_land.expectations.result.SeverityEnum(value)
Bases: str, Enum
Enumeration for the severity csv in the specification. May need to be replaced in the future with a different mechanism to read from csv.
- critical = 'critical'
- debug = 'debug'
- error = 'error'
- info = 'info'
- notice = 'notice'
- warning = 'warning'
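Because SeverityEnum derives from both str and Enum, its members behave like plain strings when compared or serialised. The sketch below re-declares the enum locally from the values listed above so that it runs without the package installed; it shows standard Python enum behaviour rather than anything specific to digital_land.

```python
import json
from enum import Enum


class SeverityEnum(str, Enum):
    # Local copy of the documented members, for illustration only.
    critical = "critical"
    debug = "debug"
    error = "error"
    info = "info"
    notice = "notice"
    warning = "warning"


# Look a member up by value, e.g. a string read from a csv row.
from_csv = SeverityEnum("warning")
is_same_member = from_csv is SeverityEnum.warning

# The str mixin makes members compare equal to their plain string value.
equals_plain_string = SeverityEnum.critical == "critical"

# It also lets the json module serialise members without a custom encoder.
as_json = json.dumps({"severity": SeverityEnum.error})

# Values outside the enumeration are rejected with ValueError.
try:
    SeverityEnum("fatal")
    unknown_value_rejected = False
except ValueError:
    unknown_value_rejected = True
```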
digital_land.expectations.utils module
Utility functions to support functions in the expectation module
notes: - might want to remove QueryRunner at a future date as it loads spatialite which may not be useful for everything
- class digital_land.expectations.utils.QueryRunner(tested_dataset_path: str)
Bases: object
Class to run queries using spatialite
- inform_dataset_path()
- run_query(sql_query: str, return_only_first_col_as_set: bool = False)
Receives a sql query and returns the results either in a pandas dataframe or just the first column as a set (this is useful to test presence or absence of items like tables, columns, etc).
Note: the connection is opened and closed at each query. For use cases like the present one, keeping it open would not offer big benefits and would mean having to develop thread-local connection pools. For more info see: https://stackoverflow.com/a/14520670
- digital_land.expectations.utils.config_parser(filepath: str)
Will parse a config file
- digital_land.expectations.utils.transform_df_first_column_into_set(dataframe: DataFrame) → set
Given a pd dataframe returns the first column as a python set