digital_land.utils package

Submodules

digital_land.utils.add_data_utils module

digital_land.utils.add_data_utils.clear_log(collection_dir, endpoint)
digital_land.utils.add_data_utils.download_dataset(dataset, specification, cache_dir)
digital_land.utils.add_data_utils.get_column_field_summary(dataset, endpoint_resource_info, column_field_dir, converted_dir, specification_dir, pipeline_dir)
digital_land.utils.add_data_utils.get_entity_summary(endpoint_resource_info, output_path, pipeline, issue_dir, pipeline_dir)
digital_land.utils.add_data_utils.get_existing_endpoints_summary(endpoint_resource_info, collection, dataset)
digital_land.utils.add_data_utils.get_issue_summary(endpoint_resource_info, issue_dir, new_entities=None)
digital_land.utils.add_data_utils.get_provision_entities_from_duckdb(lookup_path, pipeline, endpoint_resource_info)
digital_land.utils.add_data_utils.get_transformed_entities(dataset_path, transformed_path)

Returns a DataFrame of entities from the dataset at dataset_path, limited to those entities that have facts in the transformed file at transformed_path.
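The selection rule described above can be sketched in plain Python. This is an illustrative stand-in, not the library's implementation (which returns a pandas DataFrame); the function and column names here are assumptions.

```python
import csv
import io


def entities_with_facts(dataset_rows, transformed_csv):
    """Sketch: keep only dataset entities that have at least one
    fact row in the transformed CSV (hypothetical column names)."""
    reader = csv.DictReader(io.StringIO(transformed_csv))
    entities_seen = {row["entity"] for row in reader if row.get("fact")}
    return [row for row in dataset_rows if row["entity"] in entities_seen]
```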

digital_land.utils.add_data_utils.get_updated_entities_summary(original_entity_df, updated_entity_df)

Returns a summary of the differences between two DataFrames of the same entities.
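The kind of diff described above can be illustrated with plain dicts rather than DataFrames. A minimal sketch, assuming each input maps an entity to its field values; this is not the library's actual summary format.

```python
def summarise_entity_changes(original, updated):
    """Sketch: for each entity, report fields whose value differs
    between the original and updated mappings, as (old, new) pairs."""
    summary = {}
    for entity, old_fields in original.items():
        new_fields = updated.get(entity, {})
        changes = {
            field: (old_fields[field], new_fields.get(field))
            for field in old_fields
            if old_fields[field] != new_fields.get(field)
        }
        if changes:
            summary[entity] = changes
    return summary
```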

digital_land.utils.add_data_utils.get_user_response(message)
digital_land.utils.add_data_utils.is_date_valid(date, date_type)
digital_land.utils.add_data_utils.is_url_valid(url, url_type)
digital_land.utils.add_data_utils.normalise_json(val)

Returns a sorted, stringified JSON representation of the value.
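One plausible reading of "sorted stringified JSON" can be sketched with the standard library: dump with keys sorted and compact separators so that equal objects normalise to equal strings. This is an assumption about the behaviour, not the library's code.

```python
import json


def normalise_json_sketch(val):
    """Sketch: parse val if it is a JSON string, then serialise it
    with sorted keys and no superfluous whitespace, so semantically
    equal JSON values compare equal as strings."""
    if isinstance(val, str):
        val = json.loads(val)
    return json.dumps(val, sort_keys=True, separators=(",", ":"))
```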

digital_land.utils.add_endpoints_utils module

digital_land.utils.add_endpoints_utils.task_preprocess(ctx)

Preparatory steps to tidy up previous runs and populate the context ctx.

digital_land.utils.dataset_resource_utils module

digital_land.utils.dataset_resource_utils.read_dataset_resource_log(dataset_resource_dir, dataset, resource)

Read an existing DatasetResourceLog CSV for a given dataset/resource.

Returns a dict with code-version, config-hash, and specification-hash, or None if the file doesn't exist or can't be read.

Expected path: {dataset_resource_dir}/{dataset}/{resource}.csv
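The documented behaviour can be sketched with the standard library; this is an illustrative approximation (the real function reads a DatasetResourceLog CSV and may differ in detail).

```python
import csv
from pathlib import Path


def read_log_sketch(dataset_resource_dir, dataset, resource):
    """Sketch: read {dataset_resource_dir}/{dataset}/{resource}.csv and
    return the three tracked values, or None if the file is missing,
    unreadable, or empty."""
    path = Path(dataset_resource_dir) / dataset / f"{resource}.csv"
    try:
        with open(path, newline="") as f:
            row = next(csv.DictReader(f))
    except (OSError, StopIteration):
        return None
    keys = ("code-version", "config-hash", "specification-hash")
    return {key: row.get(key) for key in keys}
```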

digital_land.utils.dataset_resource_utils.resource_needs_processing(dataset_resource_dir, dataset, resource, current_code_version, current_config_hash, current_specification_hash)

Check whether a resource needs processing by comparing its log against current state.

Returns True if there is no existing log or if any of the three values differ.
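The decision rule above reduces to a simple comparison, sketched here against a log dict like the one returned by read_dataset_resource_log (the helper name is illustrative):

```python
def needs_processing_sketch(log, code_version, config_hash, specification_hash):
    """Sketch of the documented rule: process when no log exists, or
    when any of the three recorded values differs from the current
    code version, config hash, or specification hash."""
    if log is None:
        return True
    return (
        log.get("code-version") != code_version
        or log.get("config-hash") != config_hash
        or log.get("specification-hash") != specification_hash
    )
```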

digital_land.utils.gdal_utils module

digital_land.utils.gdal_utils.get_gdal_version()

digital_land.utils.hash_utils module

digital_land.utils.hash_utils.hash_directory(dir, exclude=[])

Returns a SHA1 hex digest of the contents of a directory.

Files are sorted before hashing so the result is stable regardless of filesystem ordering. File names are included in the hash so renames are detected.

Parameters:
  • dir -- Path to the directory to hash.

  • exclude -- List of path prefixes (relative to dir) to exclude.

Raises:

RuntimeError -- If dir does not exist or is not a directory.
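The properties described above (stable ordering, rename detection, prefix exclusion, RuntimeError on a missing directory) can be sketched with the standard library. This is an approximation of the documented behaviour, not the package's implementation.

```python
import hashlib
from pathlib import Path


def hash_directory_sketch(directory, exclude=()):
    """Sketch: SHA1 over sorted relative paths and file contents, so
    renames change the digest and filesystem ordering does not.
    Paths starting with a prefix in exclude are skipped."""
    root = Path(directory)
    if not root.is_dir():
        raise RuntimeError(f"{directory} does not exist or is not a directory")
    digest = hashlib.sha1()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if any(rel.startswith(prefix) for prefix in exclude):
            continue
        digest.update(rel.encode())  # include the name: renames are detected
        digest.update(path.read_bytes())
    return digest.hexdigest()
```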

Module contents