Tracking pipeline processing status¶
Nipoppy trackers search for expected file paths or patterns in pipeline output files. They are specific to pipeline steps, and can be configured to include custom paths.
Summary¶
Prerequisites¶
A Nipoppy dataset with a valid global configuration file and an accurate manifest
See the Quickstart guide for instructions on how to set up a new dataset
Processed imaging data in the <DATASET_ROOT>/derivatives directory
See Running processing pipelines for the expected subdirectory structure
Data directories and files¶
| Directory or file | Content description |
|---|---|
| `<DATASET_ROOT>/derivatives` | Input – Derivative files produced by processing pipelines |
| Imaging bagel file | Output – Tabular file containing processing status for each participant/session and pipeline |
Commands¶
Command-line interface:
nipoppy track
Python API:
nipoppy.workflows.PipelineTracker
Workflow¶
Nipoppy will loop over all participants/sessions that have BIDS data according to the doughnut file
For each participant-session pair:
Template strings in the pipeline’s tracker configuration paths are replaced by the appropriate participant/session values
Each path in the list is checked, a status is assigned, and the bagel file is updated accordingly
Configuring a pipeline tracker¶
The global configuration file should include paths to tracker configuration files, which are JSON files containing lists of dictionaries.
Here is an example tracker configuration file (the default for MRIQC 23.1.0):
{
"PATHS": [
"[[NIPOPPY_BIDS_PARTICIPANT_ID]]/[[NIPOPPY_BIDS_SESSION_ID]]/anat/[[NIPOPPY_BIDS_PARTICIPANT_ID]]_[[NIPOPPY_BIDS_SESSION_ID]]*_T1w.json",
"[[NIPOPPY_BIDS_PARTICIPANT_ID]]_[[NIPOPPY_BIDS_SESSION_ID]]*_T1w.html"
]
}
Importantly, pipeline completion status is not inferred from exit codes, since trackers are run independently of the pipeline runners. Moreover, the default tracker configuration files are somewhat minimal and do not check all possible output files generated by these pipelines.
Tip
The paths are expected to be relative to the <DATASET_ROOT>/derivatives/<PIPELINE_NAME>/<PIPELINE_VERSION>/output directory. “Glob” expressions (i.e., paths that include *) are allowed. If at least one file matches the expression, then the path is considered found.
Note
The template strings [[NIPOPPY_<ATTRIBUTE_NAME>]] work the same way as the ones in the global configuration file and the pipeline invocation files: they are replaced at runtime by the appropriate values.
Given a dataset with the following content in <DATASET_ROOT>/derivatives:
Running the tracker with the above configuration will result in the imaging bagel file showing:
| participant_id | bids_participant_id | session_id | bids_session_id | pipeline_name | pipeline_version | pipeline_step | status |
|---|---|---|---|---|---|---|---|
| 3000 | sub-3000 | BL | ses-BL | mriqc | 23.1.0 | default | SUCCESS |
Note
If there is an existing bagel, the rows relevant to the specific pipeline, participants, and sessions will be updated. Other rows will be left as-is.
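This update behavior can be sketched with pandas. The following is an illustrative sketch, not Nipoppy's actual implementation, and the key columns are an assumption based on the table above: rows matching the tracked pipeline/participant/session keys are replaced, and all other rows are kept as-is.

```python
import pandas as pd

# Assumed key columns identifying a bagel row (based on the example table above)
KEY_COLS = ["participant_id", "session_id", "pipeline_name",
            "pipeline_version", "pipeline_step"]

def update_bagel(existing: pd.DataFrame, new_rows: pd.DataFrame) -> pd.DataFrame:
    """Replace rows whose keys appear in new_rows; leave other rows untouched."""
    new_keys = set(new_rows[KEY_COLS].itertuples(index=False, name=None))
    keep = ~existing[KEY_COLS].apply(tuple, axis=1).isin(new_keys)
    return pd.concat([existing[keep], new_rows], ignore_index=True)
```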
The status
column can have the following values:
SUCCESS
: all specified paths have been found
FAIL
: at least one of the paths has not been found
Running a pipeline tracker¶
Using the command-line interface¶
To track all available participants and sessions, run:
$ nipoppy track \
<DATASET_ROOT> \
--pipeline <PIPELINE_NAME>
where <PIPELINE_NAME>
corresponds to the pipeline name as specified in the global configuration file.
Note
If there are multiple versions or steps for the same pipeline in the global configuration file, use --pipeline-version
and --pipeline-step
to specify the desired version and step respectively. By default, the first version and step listed for the pipeline will be used.
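For example, to track a specific version and step (replace the placeholders with values from your global configuration file):

$ nipoppy track \
    <DATASET_ROOT> \
    --pipeline <PIPELINE_NAME> \
    --pipeline-version <PIPELINE_VERSION> \
    --pipeline-step <PIPELINE_STEP>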
The tracker can also be run on a single participant and/or session at a time:
$ nipoppy track \
<DATASET_ROOT> \
--pipeline <PIPELINE_NAME> \
--participant-id <PARTICIPANT_ID> \
--session-id <SESSION_ID>
See the CLI reference page for more information on additional optional arguments.
Note
Log files for this command will be written to <DATASET_ROOT>/logs/track
Using the Python API¶
from nipoppy.workflows import PipelineTracker
# replace by appropriate values
dpath_root = "<DATASET_ROOT>"
pipeline_name = "<PIPELINE_NAME>"
workflow = PipelineTracker(
dpath_root=dpath_root,
pipeline_name=pipeline_name,
)
workflow.run()
See the API reference for nipoppy.workflows.PipelineTracker
for more information on optional arguments (they correspond to the ones for the CLI).
Next steps¶
If some participants/sessions have failed processing or have not been run yet, the pipeline can be (re)run for them.
Once the dataset has been processed with a pipeline, Nipoppy extractors can be used to obtain analysis-ready imaging-derived phenotypes (IDPs).