digital_land.package package

Submodules

digital_land.package.csv module

class digital_land.package.csv.CsvPackage(*args, **kwargs)

Bases: Package

write(fieldnames, rows)

digital_land.package.dataset module

class digital_land.package.dataset.DatasetPackage(dataset, organisation, **kwargs)

Bases: SqlitePackage

add_counts()

count the number of entities by resource
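
A hedged sketch of the kind of query such a count could run, assuming fact and fact_resource tables with the columns shown; this is illustrative, not the package's actual SQL:

    import sqlite3

    # Illustrative only: counts distinct entities per resource, assuming
    # hypothetical fact and fact_resource tables joined on a "fact" column.
    def count_entities_by_resource(conn: sqlite3.Connection):
        sql = """
            SELECT fr.resource, COUNT(DISTINCT f.entity) AS entity_count
            FROM fact_resource fr
            JOIN fact f ON f.fact = fr.fact
            GROUP BY fr.resource
        """
        return conn.execute(sql).fetchall()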

entity_row(facts)

Build a single entity row from a nested list of facts.

Parameters:

facts (list[list[str]]) -- Nested list of facts pertaining to an entity, in the form [[entity, field, value]] (and resource)
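
For illustration, a facts argument of this shape matches the description above; the values are invented:

    # Invented example of the nested facts structure: [entity, field, value].
    facts = [
        ["42", "name", "Example Place"],
        ["42", "entry-date", "2024-01-01"],
    ]
    # hypothetical usage: row = package.entity_row(facts)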

entry_date_upsert(table, fields, row, conflict_fields, update_fields)

Dataset-specific upsert function that only replaces values for more recent entry_dates. Rows are inserted where no conflict is found; where there is a conflict, entry dates are compared and the remaining fields are only updated if the incoming entry date is more recent.
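
A minimal sketch of that behaviour using SQLite's upsert syntax (3.24+); the column handling here is an assumption, not the package's implementation:

    # Sketch only: assumes an entry_date column and that fields are plain
    # column names; conn is an open sqlite3 connection.
    def entry_date_upsert(conn, table, fields, row, conflict_fields, update_fields):
        columns = ", ".join(fields)
        placeholders = ", ".join("?" for _ in fields)
        conflict = ", ".join(conflict_fields)
        updates = ", ".join(f"{f} = excluded.{f}" for f in update_fields)
        conn.execute(
            f"INSERT INTO {table} ({columns}) VALUES ({placeholders}) "
            f"ON CONFLICT ({conflict}) DO UPDATE SET {updates} "
            # only replace values when the incoming row is more recent
            f"WHERE excluded.entry_date > {table}.entry_date",
            [row.get(field, "") for field in fields],
        )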

insert_entity(facts)
load()
load_column_fields(path)
load_dataset_resource(path)
load_entities()

load the entity table from the fact table

load_facts(path)
load_issues(path)
load_old_entities(path)

load the old-entity table

load_transformed(path)
migrate_entity(row)

digital_land.package.dataset_parquet module

class digital_land.package.dataset_parquet.DatasetParquetPackage(dataset, path, duckdb_path=None, transformed_parquet_dir=None, **kwargs)

Bases: Package

analyze_parquet_dir(transformed_parquet_dir)

Get details about the transformed_parquet_dir to decide which strategy to use for creating the fact and fact_resource tables

choose_strategy(parquet_dir_details)

What strategy should we use to create the fact, fact_resource and entity tables? Returns one of:

- "direct" - analyse all parquet files at once
- "batch" - group the parquet files into batch files of approx. 256MB

The following were also considered as potential strategies, but 'batch' and 'direct' appear to suffice, since batching everything into one file is equivalent to the 'single_file' option:

- "single_file" - put all parquet files into a single parquet file
- "consolidate_then_bucket" - put all parquet files into several larger files
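
A sketch of that decision; the 256MB threshold comes from the description above, while the keys of parquet_dir_details are assumptions:

    # Assumed shape of parquet_dir_details: {"file_count": int, "total_size_mb": float}.
    def choose_strategy(parquet_dir_details):
        if parquet_dir_details["total_size_mb"] <= 256:
            return "direct"  # small enough to analyse all files at once
        return "batch"  # group files into ~256MB batches first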

close_conn()
combine_parquet_files(input_path, output_path)

This method combines multiple parquet files into a single parquet file
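
A hedged sketch with duckdb (a fair assumption here, given the class takes a duckdb_path); paths are illustrative:

    import duckdb

    # Sketch: read every parquet file under input_path, write one combined file.
    def combine_parquet_files(input_path, output_path):
        con = duckdb.connect()
        con.execute(
            f"COPY (SELECT * FROM read_parquet('{input_path}/*.parquet')) "
            f"TO '{output_path}' (FORMAT PARQUET)"
        )
        con.close()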

get_schema()
group_parquet_files(transformed_parquet_dir, target_mb=256, delete_originals=False)

Group parquet files into batches, each aiming for approximately 'target_mb' megabytes in size.
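
A sketch of size-based grouping consistent with that description; not the package's code:

    from pathlib import Path

    # Greedily fill each batch until adding the next file would pass target_mb.
    def group_parquet_files(parquet_dir, target_mb=256):
        target_bytes = target_mb * 1024 * 1024
        batches, current, current_size = [], [], 0
        for f in sorted(Path(parquet_dir).glob("*.parquet")):
            size = f.stat().st_size
            if current and current_size + size > target_bytes:
                batches.append(current)
                current, current_size = [], 0
            current.append(f)
            current_size += size
        if current:
            batches.append(current)
        return batches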

load()
load_entities(transformed_parquet_dir, resource_path, organisation_path)
load_entities_range(transformed_parquet_dir, resource_path, organisation_path, output_path, entity_range=None)
load_fact_resource(transformed_parquet_dir)
load_facts(transformed_parquet_dir)

This method loads facts into a fact table from a directory containing all transformed files as parquet files
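
A hedged sketch of such a load with duckdb; the fact columns are assumed from the [entity, field, value] structure described earlier, not taken from the package's schema:

    import duckdb

    # Sketch only: deduplicate facts across all transformed parquet files.
    def load_facts(transformed_parquet_dir, output_path):
        con = duckdb.connect()
        con.execute(
            "COPY ("
            "SELECT DISTINCT fact, entity, field, value "
            f"FROM read_parquet('{transformed_parquet_dir}/*.parquet')"
            f") TO '{output_path}' (FORMAT PARQUET)"
        )
        con.close()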

load_to_sqlite(sqlite_path)

Convert parquet files to sqlite3 tables; assumes the sqlite tables already exist. There is an argument that we should improve the loading functionality of the sqlite package instead.
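
One way to do such a conversion is duckdb's sqlite extension; a sketch, assuming (as the docstring says) that the target table already exists:

    import duckdb

    # Sketch: append rows from parquet files into an existing sqlite table.
    def load_parquet_to_sqlite(parquet_dir, sqlite_path, table):
        con = duckdb.connect()
        con.execute("INSTALL sqlite; LOAD sqlite;")
        con.execute(f"ATTACH '{sqlite_path}' AS db (TYPE sqlite)")
        con.execute(
            f"INSERT INTO db.{table} "
            f"SELECT * FROM read_parquet('{parquet_dir}/*.parquet')"
        )
        con.close()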

digital_land.package.organisation module

class digital_land.package.organisation.OrganisationPackage(**kwargs)

Bases: CsvPackage

check(lpa_path, output_path)
create()
create_from_dataset()
create_from_flattened()
fetch_dataset()
digital_land.package.organisation.issue(severity, row, issue, field='', value='')
digital_land.package.organisation.load_lpas(path)
digital_land.package.organisation.load_organisations(path)
digital_land.package.organisation.save_issues(issues, path)

digital_land.package.package module

class digital_land.package.package.Package(datapackage, path=None, tables=[], indexes={}, specification=None, specification_dir=None)

Bases: object

create(path=None)
class digital_land.package.package.Specification(specification_dir=None, schema=None, field=None)

Bases: object

load()

digital_land.package.sqlite module

class digital_land.package.sqlite.SqlitePackage(*args, **kwargs)

Bases: Package

colvalue(row, field)
commit()
connect()
create()
create_cursor()
create_database()
create_index(table, fields, name=None)
create_indexes(tables=None)
create_table(table, fields, key_field=None, unique=None)
create_tables()
disconnect()
drop_index(table, fields, name=None)
drop_indexes()
execute(cmd)
field_coltype(field)
get_table_fields(tables=None)

Gets table fields and join-table information for a dictionary of tables

insert(table, fields, row, upsert=False)
load(tables=None)
load_from_s3(bucket_name, object_key, table_name)
load_join_table(table, fields, split_field=None, field=None, path=None)
load_table(table, fields, path=None)
set_up_connection()
spatialite(path=None)
digital_land.package.sqlite.colname(field)
digital_land.package.sqlite.coltype(datatype)

Module contents