digital_land.package package

Submodules

digital_land.package.csv module

class digital_land.package.csv.CsvPackage(*args, **kwargs)

Bases: Package

write(fieldnames, rows)

digital_land.package.dataset module

class digital_land.package.dataset.DatasetPackage(dataset, organisation, **kwargs)

Bases: SqlitePackage

add_counts()

count the number of entities by resource
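
A hedged sketch of the kind of query such a count could run, assuming fact and fact_resource tables with the columns shown; this is illustrative, not the package's actual SQL:

    import sqlite3

    # Illustrative only: counts distinct entities per resource, assuming
    # hypothetical fact and fact_resource tables joined on a "fact" column.
    def count_entities_by_resource(conn: sqlite3.Connection):
        sql = """
            SELECT fr.resource, COUNT(DISTINCT f.entity) AS entity_count
            FROM fact_resource fr
            JOIN fact f ON f.fact = fr.fact
            GROUP BY fr.resource
        """
        return conn.execute(sql).fetchall()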

entity_row(facts)

Build a single entity row from a nested list of facts.

Parameters:

facts (list[list[str]]) -- Nested list of facts pertaining to an entity, in the form [[entity, field, value]] (and resource)
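
For illustration, a facts argument of this shape matches the description above; the values are invented:

    # Invented example of the nested facts structure: [entity, field, value].
    facts = [
        ["42", "name", "Example Place"],
        ["42", "entry-date", "2024-01-01"],
    ]
    # hypothetical usage: row = package.entity_row(facts)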

entry_date_upsert(table, fields, row, conflict_fields, update_fields)

Dataset-specific upsert function that only replaces values for more recent entry_dates. Rows are inserted where no conflict is found; where there is a conflict, entry dates are compared and the remaining fields are only updated if the incoming entry date is more recent.
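
A minimal sketch of that behaviour using SQLite's upsert syntax (3.24+); the column handling here is an assumption, not the package's implementation:

    # Sketch only: assumes an entry_date column and that fields are plain
    # column names; conn is an open sqlite3 connection.
    def entry_date_upsert(conn, table, fields, row, conflict_fields, update_fields):
        columns = ", ".join(fields)
        placeholders = ", ".join("?" for _ in fields)
        conflict = ", ".join(conflict_fields)
        updates = ", ".join(f"{f} = excluded.{f}" for f in update_fields)
        conn.execute(
            f"INSERT INTO {table} ({columns}) VALUES ({placeholders}) "
            f"ON CONFLICT ({conflict}) DO UPDATE SET {updates} "
            # only replace values when the incoming row is more recent
            f"WHERE excluded.entry_date > {table}.entry_date",
            [row.get(field, "") for field in fields],
        )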

insert_entity(facts)
load()
load_column_fields(path)
load_dataset_resource(path)
load_entities()

load the entity table from the fact table

load_facts(path)
load_issues(path)
load_old_entities(path)

load the old-entity table

load_transformed(path)
migrate_entity(row)

digital_land.package.dataset_parquet module

class digital_land.package.dataset_parquet.DatasetParquetPackage(dataset, path, duckdb_path=None, transformed_parquet_dir=None, **kwargs)

Bases: Package

analyze_parquet_dir(transformed_parquet_dir)

Get details about the transformed_parquet_dir to decide which strategy to use for creating the fact and fact_resource tables

choose_strategy(parquet_dir_details)

What strategy should we use to create the fact, fact_resource and entity tables? Returns one of:

- "direct" - analyse all parquet files at once
- "batch" - group the parquet files into batch files of approx. 256MB

The following were also considered as potential strategies, but 'batch' and 'direct' appear to suffice, since batching everything into one file is equivalent to the 'single_file' option:

- "single_file" - put all parquet files into a single parquet file
- "consolidate_then_bucket" - put all parquet files into several larger files
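
A sketch of that decision; the 256MB threshold comes from the description above, while the keys of parquet_dir_details are assumptions:

    # Assumed shape of parquet_dir_details: {"file_count": int, "total_size_mb": float}.
    def choose_strategy(parquet_dir_details):
        if parquet_dir_details["total_size_mb"] <= 256:
            return "direct"  # small enough to analyse all files at once
        return "batch"  # group files into ~256MB batches first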

close_conn()
combine_parquet_files(input_path, output_path)

This method combines multiple parquet files into a single parquet file
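
A hedged sketch with duckdb (a fair assumption here, given the class takes a duckdb_path); paths are illustrative:

    import duckdb

    # Sketch: read every parquet file under input_path, write one combined file.
    def combine_parquet_files(input_path, output_path):
        con = duckdb.connect()
        con.execute(
            f"COPY (SELECT * FROM read_parquet('{input_path}/*.parquet')) "
            f"TO '{output_path}' (FORMAT PARQUET)"
        )
        con.close()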

get_schema()
group_parquet_files(transformed_parquet_dir, target_mb=256, delete_originals=False)

Group parquet files into batches, each aiming for approximately 'target_mb' megabytes in size.
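
A sketch of size-based grouping consistent with that description; not the package's code:

    from pathlib import Path

    # Greedily fill each batch until adding the next file would pass target_mb.
    def group_parquet_files(parquet_dir, target_mb=256):
        target_bytes = target_mb * 1024 * 1024
        batches, current, current_size = [], [], 0
        for f in sorted(Path(parquet_dir).glob("*.parquet")):
            size = f.stat().st_size
            if current and current_size + size > target_bytes:
                batches.append(current)
                current, current_size = [], 0
            current.append(f)
            current_size += size
        if current:
            batches.append(current)
        return batches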

load()
load_entities(transformed_parquet_dir, resource_path, organisation_path)
load_entities_range(transformed_parquet_dir, resource_path, organisation_path, output_path, entity_range=None)
load_fact_resource(transformed_parquet_dir)
load_facts(transformed_parquet_dir)

This method loads facts into a fact table from a directory containing all transformed files as parquet files
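
A hedged sketch of such a load with duckdb; the fact columns are assumed from the [entity, field, value] structure described earlier, not taken from the package's schema:

    import duckdb

    # Sketch only: deduplicate facts across all transformed parquet files.
    def load_facts(transformed_parquet_dir, output_path):
        con = duckdb.connect()
        con.execute(
            "COPY ("
            "SELECT DISTINCT fact, entity, field, value "
            f"FROM read_parquet('{transformed_parquet_dir}/*.parquet')"
            f") TO '{output_path}' (FORMAT PARQUET)"
        )
        con.close()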

load_to_sqlite(sqlite_path)

Convert parquet files to sqlite3 tables; assumes the sqlite tables already exist. There is an argument that we should improve the loading functionality of the sqlite package instead.
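
One way to do such a conversion is duckdb's sqlite extension; a sketch, assuming (as the docstring says) that the target table already exists:

    import duckdb

    # Sketch: append rows from parquet files into an existing sqlite table.
    def load_parquet_to_sqlite(parquet_dir, sqlite_path, table):
        con = duckdb.connect()
        con.execute("INSTALL sqlite; LOAD sqlite;")
        con.execute(f"ATTACH '{sqlite_path}' AS db (TYPE sqlite)")
        con.execute(
            f"INSERT INTO db.{table} "
            f"SELECT * FROM read_parquet('{parquet_dir}/*.parquet')"
        )
        con.close()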

digital_land.package.organisation module

class digital_land.package.organisation.OrganisationPackage(**kwargs)

Bases: CsvPackage

check(lpa_path, output_path)
create()
create_from_dataset()
create_from_flattened()
fetch_dataset()
digital_land.package.organisation.issue(severity, row, issue, field='', value='')
digital_land.package.organisation.load_lpas(path)
digital_land.package.organisation.load_organisations(path)
digital_land.package.organisation.save_issues(issues, path)

digital_land.package.package module

class digital_land.package.package.Package(datapackage, path=None, tables=[], indexes={}, specification=None, specification_dir=None)

Bases: object

create(path=None)
class digital_land.package.package.Specification(specification_dir=None, schema=None, field=None)

Bases: object

load()

digital_land.package.sqlite module

class digital_land.package.sqlite.SqlitePackage(*args, **kwargs)

Bases: Package

colvalue(row, field)
commit()
connect()
create()
create_cursor()
create_database()
create_index(table, fields, name=None)
create_indexes(tables=None)
create_table(table, fields, key_field=None, unique=None)
create_tables()
disconnect()
drop_index(table, fields, name=None)
drop_indexes()
execute(cmd)
field_coltype(field)
get_table_fields(tables=None)

Gets table fields and join-table information for a dictionary of tables

insert(table, fields, row, upsert=False)
load(tables=None)
load_from_s3(bucket_name, object_key, table_name)
load_join_table(table, fields, split_field=None, field=None, path=None)
load_table(table, fields, path=None)
set_up_connection()
spatialite(path=None)
digital_land.package.sqlite.colname(field)
digital_land.package.sqlite.coltype(datatype)

Module contents