Performance Database Table Construction Documentation
This is documentation that explains the construction of the tables in the performance database, including details on base tables, sources of columns.
The Performance Database is designed to store and analyze performance-related metrics extracted from the source database digital_land. The tables of this DB are:
endpoint_dataset_issue_type_summary
endpoint_dataset_resource_summary
endpoint_dataset_summary
provision_summary
reporting_historic_endpoints
reporting_latest_endpoints
1. Table: endpoint_dataset_issue_type_summary
Purpose: To summarize issues associated with each dataset and its associated endpoint for various organisation.
Base Tables:
- issue
: Contains records of issues related to resources sourced from the digital_land.
- resource
: Holds information about the resources linked to the endpoints sourced from the digital_land.
- issue_type
: Defines types of issues sourced from the digital_land.
Columns:
organisation
: Extracted from theprovision
table.organisation_name
: Extracted from theprovision
table.cohort
: Extracted from theprovision
table.dataset
: Extracted from theprovision
table.collection
: Extracted from thereporting_historic_endpoints
table.pipeline
: Extracted from thereporting_historic_endpoints
table.endpoint
: Extracted from thereporting_historic_endpoints
table.endpoint_url
: Extracted from thereporting_historic_endpoints
table.resource
: Extracted from thereporting_historic_endpoints
table.resource_start_date
: Extracted from thereporting_historic_endpoints
table.resource_end_date
: Extracted from thereporting_historic_endpoints
table.latest_log_entry_date
: Extracted from thereporting_historic_endpoints
table.count_issues
: Count of issues from theissue
table.date
: Current date when the query is executed.issue_type
: Type of issues from theissue_type
table.severity
: Severity of issues from theissue_type
table.responsibility
: Responsibility assigned to the issues from theissue_type
table.fields
: Concatenated fields fromissue
table.
2. Table: endpoint_dataset_resource_summary
Purpose: To summarize resources associated with endpoints, including mapping and non-mapping fields. Mapping fields represent those that require conversion to align with the internal system’s accepted field names (e.g., converting ID to reference), while non-mapping fields already match the required names and can be accepted directly without modification.
Base Tables:
- provision
: Contains information about datasets provisioned for a particular Organisation which include organisations, cohorts, and datasets sourced from the digital_land.
- reporting_historic_endpoints
: To store historical data on endpoints sourced from the digital_land.
- column_field
: This is a mapping table that connects specific columns from an endpoint to their corresponding fields sourced according to the dataset’s specification e.g UID -> reference, sourced from the digital_land.
Columns:
organisation
: Extracted from theprovision
table.organisation_name
: Extracted from theprovision
table.cohort
: Extracted from theprovision
table.dataset
: Extracted from theprovision
table.collection
: Extracted from thereporting_historic_endpoints
table.pipeline
: Extracted from thereporting_historic_endpoints
table.endpoint
: Extracted from thereporting_historic_endpoints
table.endpoint_url
: Extracted from thereporting_historic_endpoints
table.resource
: Extracted from thereporting_historic_endpoints
table.resource_start_date
: Extracted from thereporting_historic_endpoints
table.resource_end_date
: Extracted from thereporting_historic_endpoints
table.latest_log_entry_date
: Extracted from thereporting_historic_endpoints
table.mapping_field
: Generated using conditional aggregation fromcolumn_field
to identify fields that map correctly to columns.non_mapping_field
: Generated using conditional aggregation fromcolumn_field
for fields that do not map correctly.
3. Table: endpoint_dataset_summary
Purpose: To summarize endpoint information, including the latest statuses and exceptions.
Base Tables:
- endpoint
: Contains information about endpoints sourced from the digital_land.
- source
: Provides centralized metadata for datasets across different organisations, sourced from the digital_land.
- log
: Contains logs related to endpoint performance sourced from the digital_land.
Columns:
endpoint
: Extracted from theendpoint
table.endpoint_url
: Extracted from theendpoint
table.organisation
: Extracted from thesource
table.dataset
: Extracted from the resource-dataset mapping.latest_status
: Extracted from thelog
table.latest_exception
: Extracted from thelog
table.entry_date
: Entry date of the endpoint.end_date
: End date of the endpoint.latest_resource_start_date
: Start date of the most recent resource.
4. Table: provision_summary
Purpose: This table provides an overview of the provision metrics for each dataset associated with an organization, by tracking the number of active and error-prone endpoints and counting various types of issues(error, warning & notice) and endpoints.
Base Tables:
- provision
: Contains information about datasets provisioned for a particular Organisation which include organisations, cohorts, and datasets sourced from the digital_land.
- organisation
: Contains the names and details of organizations sourced from the digital_land.
- issue
: Contains records of issues related to resources sourced from the digital_land.
- resource
: Holds information about the resources linked to the endpoints sourced from the digital_land.
- issue_type
: Defines types of issues sourced from the digital_land.
- column_field
: This is a mapping table that connects specific columns from an endpoint to their corresponding fields sourced according to the dataset’s specification e.g UID -> reference, sourced from the digital_land.
- endpoint
: Contains information about endpoints sourced from the digital_land.
- source
: Provides centralized metadata for datasets across different organisations, sourced from the digital_land.
- log
: Contains logs related to endpoint, sourced from the digital_land.
- reporting_historic_endpoints
: To store historical data on endpoints sourced from the digital_land.
Columns:
organisation
: Extracted from theprovision
table.organisation_name
: Extracted from theorganisation
table.dataset
: Extracted from theprovision
table.active_endpoint_count
: Represents the count of unique active endpoints for a dataset that do not have an endpoint_end_date (either NaN or blank). This calculation excludes endpoints that have been marked as retired/ended.error_endpoint_count
: Represents the count of unique errored endpoints for a dataset. This count includes endpoints where latest_status is not 200 and where endpoint_end_date is either NaN or blank, ignoring any historical errored endpoints.count_issue_error_internal
: Count of internal error issues calculated fromissue
andissue_type
.count_issue_error_external
: Count of external error issues calculated fromissue
andissue_type
.count_issue_warning_internal
: Count of internal warning issues calculated fromissue
andissue_type
.count_issue_warning_external
: Count of external warning issues calculated fromissue
andissue_type
.count_issue_notice_internal
: Count of internal notice issues calculated fromissue
andissue_type
.count_issue_notice_external
: Count of external notice issues calculated fromissue
andissue_type
.
5. Table: reporting_historic_endpoints
Purpose: To store historical data on endpoints, including their organization, dataset, and status.
Base Tables:
- endpoint
: Contains information about endpoints sourced from the digital_land.
- source
: Provides centralized metadata for datasets across different organisations, sourced from the digital_land.
- log
: Contains logs related to endpoint, sourced from the digital_land.
- organisation
: Contains the names and details of organizations sourced from the digital_land.
- source_pipeline
: Contains information on the pipeline associated with each source sourced from the digital_land.
- resource
: Holds information about the resources linked to the endpoints sourced from the digital_land.
Columns:
organisation
: Extracted fromsource
table.name
: Extracted from theorganisation
table.organisation_name
: Extracted from theorganisation
table.dataset
: Extracted from thesource_pipeline
table.collection
: Extracted from thesource
table.pipeline
: Extracted from thesource_pipeline
table.endpoint
: Extracted from thelog
table.endpoint_url
: Extracted from theendpoint
table.licence
: Extracted from thesource
data.latest_status
: Extracted from thelog
table.latest_exception
: Extracted from thelog
table.resource
: Extracted from thelog
table.latest_log_entry_date
: Extracted using the max function on the entry_date from thelog
table.endpoint_entry_date
: Extracted from theendpoint
table.endpoint_end_date
: Extracted from theendpoint
table.resource_start_date
: Extracted from theresource
table.resource_end_date
: Extracted from theresource
table.
6. Table: reporting_latest_endpoints
Purpose: To store the most recent data on endpoints and provides the latest active endpoint data per organization and pipeline.
Base Tables:
- endpoint
: Contains information about endpoints sourced from the digital_land.
- source
: Provides centralized metadata for datasets across different organisations, sourced from the digital_land.
- log
: Contains logs related to endpoint, sourced from the digital_land.
- organisation
: Contains the names and details of organizations sourced from the digital_land.
- source_pipeline
: Contains information on the pipeline associated with each source sourced from the digital_land.
- resource
: Holds information about the resources linked to the endpoints sourced from the digital_land.
Columns:
organisation
: Extracted from source table.name
: Extracted from the organization table.organisation_name
: Extracted from the organization table.dataset
: Extracted from the dataset table.collection
: Extracted from the source table.pipeline
: Extracted from the source_pipeline table.endpoint
: Extracted from the log table.endpoint_url
: Extracted from the endpoint table.licence
: Extracted from the source data.latest_status
: Extracted from the log table.days_since_200
: Calculated from log as days since last “200 OK” status, from subquery t2.latest_exception
: Extracted from the log table.resource
: Extracted from the log table.latest_log_entry_date
: Extracted using the max function on the entry_date from the log table.endpoint_entry_date
: Extracted from the endpoint table.endpoint_end_date
: Extracted from the endpoint table.resource_start_date
: Extracted from the resource table.resource_end_date
: Extracted from the resource table.rn
: Row number for identifying unique records.