Running A Data Collection Pipeline

TODO: Needs Completing

For data engineers, and often for others in our team, this is a key process: it generates the files that are later loaded into the platform.

Once you understand how to run it for one collection, you can apply the same steps to any other, for example to debug errors that happened overnight. It is worth reading the key concepts in the data operations manual first for clarity on the terms we use. This tutorial describes the practical application of those concepts.

Anatomy of collection config

First, it helps to understand the anatomy of a collection's configuration. The config repo is the home of configuration for all collections.

Note: These files may not exist yet, as they are generated when initialising and running the collection.

Inputs

The inputs to

Outputs