Onboarding
NOTE: This is currently a work in progress.
The programme and what it’s doing
- Planning data handbook (and the online version, which is not yet fully ported)
The platform and different services
Planning data site
https://www.planning.data.gov.uk
- Explore the planning data site, can you see what datasets we currently publish?
- Can you use the search functionality to search for particular data in a particular area?
- Read the API docs and see how they reflect the way the searches work (there’s an example after this list)
- Click the facts button on an entity page to see how data provenance is recorded
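As a quick way into the API docs, you can run the same searches the site makes directly against the API. A minimal sketch, assuming the /entity.json endpoint with the dataset, limit, longitude and latitude parameters described in the API docs:

```
# First few conservation-area entities, as JSON
curl "https://www.planning.data.gov.uk/entity.json?dataset=conservation-area&limit=10"

# The same dataset filtered to a point; check the API docs for the exact parameter names
curl "https://www.planning.data.gov.uk/entity.json?dataset=conservation-area&longitude=-0.1276&latitude=51.5072"
```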
Data design
https://design.planning.data.gov.uk/
- Read the documentation of the design process, does the way the data design team work make sense?
- Explore the planning consideration backlog and look at some considerations in detail, particularly following through to the GitHub discussion and the links to the dataset on the platform
- Take note of the different stages considerations are at, can you work out which stage means we should have data on the platform?
Submit service: Guidance
https://www.planning.data.gov.uk/guidance/
- Does the guidance structure help make sense of the journey that data providers go on?
- Read some of the data specification guidance (e.g. article-4-direction), does it match the data you saw live on the platform? (see the sketch after this list)
- Have a look at the technical specification too, does the data model diagram help make sense of how datasets are connected?
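One way into those questions is to pull a live record and set its fields against the guidance. A sketch, reusing the entity API from earlier:

```
# A single live article-4-direction record to compare with the specification guidance
curl "https://www.planning.data.gov.uk/entity.json?dataset=article-4-direction&limit=1"
```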
Submit service: LPA dashboard
https://submit.planning.data.gov.uk/organisations
- Explore the dashboard pages for some different organisations, can you find one with a lot of issues?
- Explore some of the issues that are displayed to data providers, do they make sense, and can you see how a provider would resolve them?
- Follow the journey/tasks for a dataset that hasn’t been added yet, can you see how it signposts to different parts of the service?
Submit service: Check
- Can you run some data through the Check service to see how results are displayed to data providers?
The data model and pipeline
Pipeline
- Read the pipeline processes and pipeline configuration pages.
- Read the blog post on the data model
- Check out the config repo - can you find the endpoint URL for a dataset you’re interested in and download the unprocessed data from a provider? (see the sketch after this list)
- Take a look at some of the recent commits to see the sorts of changes that happen.
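A sketch of the last two bullets, assuming the config repo keeps endpoints in collection/&lt;collection&gt;/endpoint.csv with an endpoint-url column (browse the repo to confirm; the layout may have moved on):

```
# Clone the config repo
git clone https://github.com/digital-land/config.git

# See the endpoints registered for a collection you're interested in
cat config/collection/conservation-area/endpoint.csv

# Download the unprocessed data from one provider,
# pasting an endpoint-url value from the CSV above
curl -L -o raw-resource "<endpoint-url from the CSV>"
```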
Specification
- Read the specification guidance page.
- Take a look around the specification repo, particularly individual dataset files (e.g. article-4-direction.md) - can you work out what sort of processes some of these fields might control?
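If you want to read a dataset file without cloning the whole repo, something like this should work (the content/dataset/ path is an assumption, so check the repo’s layout first):

```
# Fetch the article-4-direction specification straight from GitHub
curl -L "https://raw.githubusercontent.com/digital-land/specification/main/content/dataset/article-4-direction.md"
```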
datasette
- The Following the data flow tutorial uses a new endpoint to demonstrate some of the different datasette tables; run through it.
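Alongside the web interface, datasette serves every table as JSON, which is handy while following the tutorial. A sketch, assuming the instance at datasette.planning.data.gov.uk and its digital-land database:

```
# A few rows of the endpoint table as JSON (_size is a standard datasette parameter)
curl "https://datasette.planning.data.gov.uk/digital-land/endpoint.json?_size=5"
```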
digital-land-python
- Take a look at the digital-land-python repo; it contains the core pipeline code that the processes above run on.
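A sketch for getting it running locally, assuming the package installs a digital-land command (if it doesn’t, check the repo’s README for the entry point):

```
# Clone and set up the core pipeline package
git clone https://github.com/digital-land/digital-land-python.git
cd digital-land-python
make init

# See what commands the CLI offers
digital-land --help
```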
The data management team and our processes
Processes
Adding data
- Read the operational procedures guide; it should help you start to make sense of the data lifecycle.
Data quality
- Read the following pages:
Tools and setup
Meetings, access and admin
- Follow the key tasks of the onboarding Trello card [note - is this still live?? Looks fairly up to date… but not sure about access (ironically…)]
Environment
- You’ll want to run any code in a standalone environment. You should be able to get an environment set up by cloning the repo you need and running
make init
which will install any packages needed in the environment. Before doing that you may need to install some other software on your machine; see the development how-to guides for how to do this. You may need WSL if you’re on a Windows laptop, and you’ll want to use venv or a similar environment management tool to create new environments.
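A minimal sketch of that setup, using Python’s built-in venv and the make init target the repos provide:

```
# Create and activate a fresh virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Clone the repo you need and install its dependencies
git clone https://github.com/digital-land/<repo>.git
cd <repo>
make init
```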
git
- You’ll need to be given access to the digital-land organisation.
- If you’re new to git, follow GitHub’s “Start your journey” guide in their docs, particularly the “Hello World” section.
- Once you’ve got access to digital-land, you may want to configure SSH - this will mean you don’t need to authenticate on every push.
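The SSH setup is the standard GitHub one rather than anything digital-land specific. A sketch of the usual steps from GitHub’s docs:

```
# Generate a key pair
ssh-keygen -t ed25519 -C "your-email@example.com"

# Print the public key, then add it on GitHub under Settings > SSH and GPG keys
cat ~/.ssh/id_ed25519.pub

# Check the connection works
ssh -T git@github.com
```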
config
- Clone the config repo and get a virtual environment running
- Run make makerules, then make init
jupyter-analysis
- Clone the jupyter-analysis repo and run make init in a virtual environment
- Try running a notebook (for example, map_conservation_area_duplicates)
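Putting that together, a sketch of the full jupyter-analysis setup; it assumes make init installs Jupyter (check the repo’s README if the notebook server isn’t available afterwards):

```
git clone https://github.com/digital-land/jupyter-analysis.git
cd jupyter-analysis
python3 -m venv .venv && source .venv/bin/activate
make init

# Start Jupyter and open the example notebook from there
jupyter notebook
```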