Workflow Tools - snakemake & targets
Introduction
Workflow tools let you automate the running and rerunning of code with multiple steps. For example, we use them to manage the image processing workflow for our Everglades research.
Python - snakemake
Getting Started
Handling complex inputs with input functions
In our workflows we deal with complex input-output structures, like having early phases of the pipeline work on one flight (file) at a time and later phases work on all of the files from a given site and year as a group.
This can be accomplished by defining custom input functions, which take a rule's wildcards as an argument and return the input files for that rule.
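A minimal sketch of such an input function follows. The list contents, the `processed/` and `combined/` paths, the function name `site_year_flights`, and the rule name `combine_site_year` are all hypothetical and chosen only for illustration; they are not taken from the actual Everglades workflow. The rule itself is shown in comments so the block runs as plain Python.

```python
from types import SimpleNamespace

# Hypothetical parallel lists, one entry per flight file (illustrative values only).
YEARS = ["2021", "2022", "2022"]
SITES = ["SiteA", "SiteA", "SiteA"]
FLIGHTS = ["flight_01", "flight_02", "flight_03"]


def site_year_flights(wildcards):
    """Return every per-flight file belonging to the requested site and year."""
    return [
        f"processed/{year}/{site}/{flight}.csv"
        for year, site, flight in zip(YEARS, SITES, FLIGHTS)
        if year == wildcards.year and site == wildcards.site
    ]


# In a Snakefile the function is passed, uncalled, as the rule's input:
#
# rule combine_site_year:
#     input:
#         site_year_flights
#     output:
#         "combined/{year}/{site}.csv"

# Outside Snakemake, a simple namespace can stand in for the wildcards object.
example = SimpleNamespace(year="2022", site="SiteA")
print(site_year_flights(example))
```

Because Snakemake calls the function once per output file, each site-year output only depends on the flights that belong to it.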
Testing snakemake with partial wildcards
When testing a big workflow, it is often useful to run it on a subset of the data. For example, our Everglades workflow runs on all years, sites, and flights at once, but we might want to test a single year-site combination when making a change. To prepare for this, unpack the Wildcards object returned by glob_wildcards into its component lists and use those lists in the main workflow. E.g.,
ORTHOMOSAICS = glob_wildcards("/{year}/{site}/{flight}.tif")
FLIGHTS = ORTHOMOSAICS.flight
SITES = ORTHOMOSAICS.site
YEARS = ORTHOMOSAICS.year
The components are just lists, so you can replace them with whatever pieces of the full workflow you want to test. E.g.,
ORTHOMOSAICS = glob_wildcards("/{year}/{site}/{flight}.tif")
TEST = glob_wildcards("/blue/ewhite/everglades/orthomosaics/2022/StartMel/{flight}.tif")
FLIGHTS = TEST.flight  # full run: ORTHOMOSAICS.flight
SITES = ["StartMel"] * len(FLIGHTS)  # full run: ORTHOMOSAICS.site
YEARS = ["2022"] * len(FLIGHTS)  # full run: ORTHOMOSAICS.year
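The site and year are repeated `len(FLIGHTS)` times so that the three lists stay the same length and can be combined element by element. The sketch below shows this in plain Python, mirroring what Snakemake's `expand(..., zip, ...)` would produce; the flight names and the `predictions/` output pattern are hypothetical stand-ins, not paths from the actual workflow.

```python
# Hypothetical flight names standing in for TEST.flight.
FLIGHTS = ["flight_01", "flight_02"]
SITES = ["StartMel"] * len(FLIGHTS)
YEARS = ["2022"] * len(FLIGHTS)

# Plain-Python equivalent of the Snakefile call
#   expand("predictions/{year}/{site}/{flight}.csv",
#          zip, year=YEARS, site=SITES, flight=FLIGHTS)
# which pairs the lists element by element instead of taking every combination.
targets = [
    f"predictions/{year}/{site}/{flight}.csv"
    for year, site, flight in zip(YEARS, SITES, FLIGHTS)
]
print(targets)
```

Switching back to the full workflow then only requires restoring the three `ORTHOMOSAICS` assignments; the rules that consume the lists do not change.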