Workflow#
Download and update RAW dataset#
To run the pipeline some input datasets are required:
To download, extract and copy a current set of raw data into store/raw
, type
snakemake -j<NUMBER_OF_CPU_CORES> update_raw
A zip file from a prespecified
URL
is downloaded and unzipped to store/temp/
. The raw data files are copied to
the corresponding folders in store/raw/
.
A prompt asks if an already existing file should be updated. Confirm with "y"
or type "n" to skip.
The following additional files must be downloaded manually:
- OpenStreetMap
→ place in
store/raw/osm/data/
Run#
To run the pipeline, go to Digipipe's root digipipe/
or to
digipipe/workflow/
and type
snakemake -j<NUMBER_OF_CPU_CORES>
while NUMBER_OF_CPU_CORES
is the number of CPU cores to be used for the
pipeline execution. You can also make a dry-run (see what snakemake would do
but without actually really doing anything) by typing
snakemake -n
To clean all produced data, use
snakemake -j1 clean
This involves preprocessed data in directories: preprocessed, datasets and appdata.
Pipeline visualization / DAG#
The entire pipeline can be visualized as a directed acyclic graph (DAG). The following command creates the DAG as an svg file in the current directory:
snakemake --dag | dot -Tsvg > dag_rules_full.svg
As the full graph is too packed with information and therefore hardly to grasp,
consider to show only certain parts by disabling some target files in the all
rule. Also, a simple rule graph (the one shown above) can be created and saved
in the current directory using
snakemake --rulegraph | dot -Tsvg > dag_rules_simple.svg
To create a graph in the current directory showing the file dependencies, type
snakemake --filegraph | dot -Tsvg > dag_files.svg
The graphs also provide information on the completed (solid lines) and pending (dashed lines) processing steps. For further details see Snakemake CLI docs.
Snakefiles and config#
- The global workflow is defined in the main Snakefile.
- It includes the module Snakefiles from the data store located at
- store/preprocessed/module.smk and
- store/datasets/module.smk
- In each of these modules, the rules as well as the config from the contained datasets are imported.