digital_land.expectations package

Subpackages

Submodules

digital_land.expectations.commands module

digital_land.expectations.commands.run_csv_checkpoint(dataset, file_path, output_dir, rules)

Run expectation rules against a CSV file using duckdb.

digital_land.expectations.commands.run_dataset_checkpoint(dataset, file_path, output_dir, config: Config, organisations: Organisation, act_on_critical_error=False)

Run expectation rules for a given dataset using the rules stored in the configuration. The configuration must be passed in.

digital_land.expectations.exception module

exception digital_land.expectations.exception.DataQualityException(message)

Bases: Exception

Exception raised for failed expectations with severity RaiseError. Attributes: response
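
As a rough illustration of how a caller might raise and handle this exception, the sketch below defines a local stand-in class with the documented signature rather than importing the library. How `response` is actually populated is not stated above, so setting it from `message` is an assumption, and `check_row_count` is a hypothetical expectation invented for the example.

```python
class DataQualityException(Exception):
    """Local stand-in mirroring the documented signature; not the library's code."""

    def __init__(self, message):
        super().__init__(message)
        # Assumption: `response` holds the failure detail passed to the constructor.
        self.response = message


def check_row_count(actual, expected_minimum):
    """Hypothetical expectation that raises when too few rows are present."""
    if actual < expected_minimum:
        raise DataQualityException(
            f"expected at least {expected_minimum} rows, found {actual}"
        )
    return True


try:
    check_row_count(actual=3, expected_minimum=10)
except DataQualityException as error:
    # The caller can inspect the failure detail instead of crashing.
    caught = error.response
```

Catching the exception at the checkpoint level lets a run record the failure before deciding whether to stop.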

digital_land.expectations.log module

class digital_land.expectations.log.ExpectationLog(dataset)

Bases: object

A class to create and store the log output from running expectations.

add(entry: dict)

Add an individual log entry represented as a dictionary.

save(path=None, f=None)

save_parquet(output_dir, partition=None)

Save the output into a parquet file that is a partition for this particular dataset. Unlike saving to a CSV, this uses a directory to add the partition.
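
The add-then-save pattern described above can be sketched with the standard library alone. `ExpectationLogSketch` is a hypothetical stand-in, not the library's class. The entry keys, the dataset name, and the rule that `save` prefers an open file object `f` over `path` are all assumptions made for illustration. Parquet output is left out because it needs a third-party dependency.

```python
import csv
import io


class ExpectationLogSketch:
    """Hypothetical stand-in for ExpectationLog; collects dict entries."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.entries = []

    def add(self, entry: dict):
        # Each entry is one log record represented as a dictionary.
        self.entries.append(entry)

    def save(self, path=None, f=None):
        # Assumption: write to the open file object if given, else to `path`.
        if f is None:
            with open(path, "w", newline="") as handle:
                self._write(handle)
        else:
            self._write(f)

    def _write(self, handle):
        # Use the union of keys so entries with different fields still fit.
        fieldnames = []
        for entry in self.entries:
            for key in entry:
                if key not in fieldnames:
                    fieldnames.append(key)
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(self.entries)


log = ExpectationLogSketch(dataset="conservation-area")  # example dataset name
log.add({"name": "row count", "passed": True})
log.add({"name": "unique reference", "passed": False, "message": "2 duplicates"})

buffer = io.StringIO()
log.save(f=buffer)
csv_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the example free of side effects; passing `path` instead would write the same rows to disk.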

digital_land.expectations.result module

Contains data classes representing the result of running an expectation from the expectation functions.

class digital_land.expectations.result.ExpectationResult(expectation_result: str, checkpoint: str, passed: bool, severity: SeverityEnum, message: str, issues: list, data_name: str)

Bases: object

Class to keep inputs and results of expectations

act_on_failure()

Returns 1 if the severity is critical, or 0 if it is not. Raises a warning for failed tests. Could be moved to the expectation suite class.

checkpoint: str
data_name: str
dict_for_export()
expectation_result: str
classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) -> A
classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) -> A
issues: list
message: str
passed: bool
save_to_file(dir_path: str)

Prepares a naming convention and saves the response to a provided path

classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) -> SchemaF[A]
severity: SeverityEnum
to_dict(encode_json=False) -> Dict[str, dict | list | str | int | float | bool | None]
to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: int | str | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) -> str
class digital_land.expectations.result.SeverityEnum(value)

Bases: str, Enum

Enumeration for the severity CSV in the specification. May need to be replaced in the future with a different mechanism to read from the CSV.

critical = 'critical'
debug = 'debug'
error = 'error'
info = 'info'
notice = 'notice'
warning = 'warning'
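
To make the relationship between severity and `act_on_failure` concrete, here is a minimal sketch built from the standard library. The enum members are copied from the list above; `ResultSketch` is a hypothetical, reduced stand-in for `ExpectationResult`, and two behaviours are assumptions rather than documented facts: a passing result returns 0 without warning, and the warning text is the result's `message`.

```python
import json
import warnings
from dataclasses import asdict, dataclass, field
from enum import Enum


class SeverityEnum(str, Enum):
    # Same members as documented above. The str mixin lets members
    # compare equal to plain strings and serialise to JSON directly.
    critical = "critical"
    debug = "debug"
    error = "error"
    info = "info"
    notice = "notice"
    warning = "warning"


@dataclass
class ResultSketch:
    """Hypothetical, reduced stand-in for ExpectationResult."""

    passed: bool
    severity: SeverityEnum
    message: str
    issues: list = field(default_factory=list)

    def act_on_failure(self):
        # Assumption: a passing result needs no action.
        if self.passed:
            return 0
        # Failed tests raise a warning; only critical failures return 1.
        warnings.warn(self.message)
        return 1 if self.severity == SeverityEnum.critical else 0


failed = ResultSketch(
    passed=False, severity=SeverityEnum("critical"), message="missing table"
)
with warnings.catch_warnings(record=True) as caught_warnings:
    warnings.simplefilter("always")
    exit_code = failed.act_on_failure()

# The str mixin means the severity is written as a plain JSON string.
as_json = json.dumps(asdict(failed))
```

Returning an integer rather than raising lets a caller sum the results of many expectations and decide once, at the end, whether the run as a whole should fail.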

digital_land.expectations.utils module

Utility functions to support functions in the expectation module

Notes: might want to remove QueryRunner at a future date, as it loads spatialite, which may not be useful for everything.

class digital_land.expectations.utils.QueryRunner(tested_dataset_path: str)

Bases: object

Class to run queries using spatialite

inform_dataset_path()
run_query(sql_query: str, return_only_first_col_as_set: bool = False)

Receives a sql query and returns the results either in a pandas dataframe or just the first column as a set (this is useful to test presence or absence of items like tables, columns, etc).

Note: the connection is opened and closed for each query. For use cases like the present one, keeping a connection open would not offer big benefits and would mean having to develop thread-local connection pools. For more info see: https://stackoverflow.com/a/14520670

digital_land.expectations.utils.config_parser(filepath: str)

Parses a config file.

digital_land.expectations.utils.transform_df_first_column_into_set(dataframe: DataFrame) -> set

Given a pandas DataFrame, returns the first column as a Python set.

Module contents