digital_land.package package
Submodules
digital_land.package.csv module
digital_land.package.dataset module
- class digital_land.package.dataset.DatasetPackage(dataset, organisation, **kwargs)
Bases: SqlitePackage
- add_counts()
Count the number of entities by resource.
- entity_row(facts)
Build an entity row from a list of facts.
- Parameters:
facts (list[list[str]]) -- Nested list of facts pertaining to a single entity, e.g. [[entity, field, value]] (and resource)
- entry_date_upsert(table, fields, row, conflict_fields, update_fields)
Dataset-specific upsert function that only replaces values for more recent entry dates. Rows are inserted where no conflict is found; where there is a conflict, the entry dates are compared and the other fields are only updated if the incoming row is newer (see the sketch after this class listing).
- insert_entity(facts)
- load()
- load_column_fields(path)
- load_dataset_resource(path)
- load_entities()
Load the entity table from the fact table.
- load_facts(path)
- load_issues(path)
- load_old_entities(path)
Load the old-entity table.
- load_transformed(path)
- migrate_entity(row)
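To make the entry-date comparison in entry_date_upsert concrete, here is a minimal sketch of such an upsert for SQLite, assuming a plain entry_date column and SQLite 3.24+ (for ON CONFLICT ... DO UPDATE). The function name and arguments mirror the signature above, but the SQL is an illustration of the described behaviour, not the package's actual implementation.

```python
import sqlite3

def entry_date_upsert_sketch(conn, table, fields, row, conflict_fields, update_fields):
    # Upsert a row, only overwriting fields when the incoming entry_date is newer.
    cols = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    conflict = ", ".join(conflict_fields)
    # "excluded" refers to the incoming row that caused the conflict
    updates = ", ".join(f"{f} = excluded.{f}" for f in update_fields)
    conn.execute(
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict}) DO UPDATE SET {updates} "
        f"WHERE excluded.entry_date > {table}.entry_date",  # skip stale rows
        [row.get(field, "") for field in fields],
    )

conn = sqlite3.connect("dataset.sqlite3")
conn.execute("CREATE TABLE entity (entity TEXT PRIMARY KEY, name TEXT, entry_date TEXT)")
entry_date_upsert_sketch(
    conn,
    "entity",
    ["entity", "name", "entry_date"],
    {"entity": "1", "name": "Example", "entry_date": "2024-01-01"},
    conflict_fields=["entity"],
    update_fields=["name", "entry_date"],
)
```

ISO-8601 entry dates compare correctly as strings, which is why a plain text comparison suffices in the WHERE clause.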
digital_land.package.dataset_parquet module
- class digital_land.package.dataset_parquet.DatasetParquetPackage(dataset, path, duckdb_path=None, transformed_parquet_dir=None, **kwargs)
Bases: Package
- analyze_parquet_dir(transformed_parquet_dir)
Get details about the transformed_parquet_dir to decide which strategy to use for creating the fact and fact_resource tables.
- choose_strategy(parquet_dir_details)
What strategy should we use to create the fact, fact_resource and entity tables? Returns one of:
- "direct" -- analyse all parquet files at once
- "batch" -- group the parquet files into batch files of approx. 256MB
The following were also considered as potential strategies, but it appears "batch" and "direct" suffice, since batching everything into one file is the equivalent of the "single_file" option:
- "single_file" -- put all parquet files into a single parquet file
- "consolidate_then_bucket" -- put all parquet files into several larger files
- close_conn()
- combine_parquet_files(input_path, output_path)
This method combines multiple parquet files into a single parquet file.
- get_schema()
- group_parquet_files(transformed_parquet_dir, target_mb=256, delete_originals=False)
Group parquet files into batches, each aiming for approximately target_mb megabytes in size (see the batching sketch after this class listing).
- load()
- load_entities(transformed_parquet_dir, resource_path, organisation_path)
- load_entities_range(transformed_parquet_dir, resource_path, organisation_path, output_path, entity_range=None)
- load_fact_resource(transformed_parquet_dir)
- load_facts(transformed_parquet_dir)
This method loads facts into a fact table from a directory containing all transformed files as parquet files.
- load_to_sqlite(sqlite_path)
Convert parquet files to sqlite3 tables; assumes the sqlite tables already exist. There is an argument for improving the loading functionality of the sqlite package instead (see the conversion sketch below).
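As described for choose_strategy and group_parquet_files above, the "batch" strategy is essentially greedy packing of parquet files by size. Below is a minimal sketch of that grouping, assuming batches are simply lists of file paths; the real method may instead rewrite batch files and optionally delete the originals.

```python
from pathlib import Path

def group_by_size_sketch(transformed_parquet_dir, target_mb=256):
    # Greedily pack parquet files into batches of approximately target_mb.
    target_bytes = target_mb * 1024 * 1024
    batches, current, current_size = [], [], 0
    for path in sorted(Path(transformed_parquet_dir).glob("*.parquet")):
        size = path.stat().st_size
        # start a new batch once adding this file would overshoot the target
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches
```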
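Given the duckdb_path parameter on the constructor, one plausible reading of load_to_sqlite is that DuckDB's sqlite extension attaches the SQLite database and inserts straight from the parquet files. The sketch below shows that approach under those assumptions; the table and glob arguments are hypothetical and the package's actual SQL may differ.

```python
import duckdb

def parquet_to_sqlite_sketch(parquet_glob, sqlite_path, table):
    # Copy rows from parquet files into an existing sqlite3 table via DuckDB.
    con = duckdb.connect()
    con.execute("INSTALL sqlite")
    con.execute("LOAD sqlite")
    # attach the sqlite database so its tables are writable from DuckDB
    con.execute(f"ATTACH '{sqlite_path}' AS sq (TYPE sqlite)")
    # assumes the target table already exists with a compatible schema
    con.execute(f"INSERT INTO sq.{table} SELECT * FROM read_parquet('{parquet_glob}')")
    con.close()

parquet_to_sqlite_sketch("transformed/*.parquet", "dataset.sqlite3", "fact")
```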
digital_land.package.organisation module
- class digital_land.package.organisation.OrganisationPackage(**kwargs)
Bases: CsvPackage
- check(lpa_path, output_path)
- create()
- create_from_dataset()
- create_from_flattened()
- fetch_dataset()
- digital_land.package.organisation.issue(severity, row, issue, field='', value='')
- digital_land.package.organisation.load_lpas(path)
- digital_land.package.organisation.load_organisations(path)
- digital_land.package.organisation.save_issues(issues, path)
digital_land.package.package module
digital_land.package.sqlite module
- class digital_land.package.sqlite.SqlitePackage(*args, **kwargs)
Bases: Package
- colvalue(row, field)
- commit()
- connect()
- create()
- create_cursor()
- create_database()
- create_index(table, fields, name=None)
- create_indexes(tables=None)
- create_table(table, fields, key_field=None, unique=None)
- create_tables()
- disconnect()
- drop_index(table, fields, name=None)
- drop_indexes()
- execute(cmd)
- field_coltype(field)
- get_table_fields(tables=None)
Get table fields and join-table information for a dictionary of tables.
- insert(table, fields, row, upsert=False)
- load(tables=None)
- load_from_s3(bucket_name, object_key, table_name)
- load_join_table(table, fields, split_field=None, field=None, path=None)
- load_table(table, fields, path=None)
- set_up_connection()
- spatialite(path=None)
- digital_land.package.sqlite.colname(field)
- digital_land.package.sqlite.coltype(datatype)
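colname and coltype are undocumented above; since digital-land field names contain hyphens, which are awkward in SQLite column names, a plausible sketch of both mappings follows. Both the hyphen-to-underscore rule and the datatype table are assumptions, not the confirmed behaviour.

```python
def colname_sketch(field):
    # SQLite column names are simpler without hyphens, so map them to underscores
    return field.replace("-", "_")

def coltype_sketch(datatype):
    # map dataset datatypes onto SQLite storage classes (assumed mapping)
    return {"integer": "INTEGER", "decimal": "REAL"}.get(datatype, "TEXT")
```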