Tracking pipeline processing status

Nipoppy trackers search for expected file paths or patterns in pipeline output files. They are specific to pipelines and versions, and can be configured to include custom paths for various levels of granularity.

Summary

Prerequisites

A Nipoppy dataset with a valid global configuration file and an accurate manifest
- See the Quickstart guide for instructions on how to set up a new dataset
Processed imaging data in the <DATASET_ROOT>/derivatives directory
- See Running processing pipelines for expected subdirectory structure

Data directories and files

Directory or file	Content description
`<DATASET_ROOT>/derivatives`	Input – Derivative files produced by processing pipelines
`<DATASET_ROOT>/derivatives/bagel.csv`	Output – Tabular file containing processing status for each participant/session and pipeline

Commands

Command-line interface: nipoppy track
Python API: nipoppy.workflows.PipelineTracker

Workflow

Nipoppy loops over all participants/sessions that have BIDS data according to the doughnut file.
For each participant-session pair:
1. Paths in the pipeline’s tracker configuration will be processed such that template strings related to the participant/session are replaced by the appropriate values
2. Each path in the list is checked, then a status is assigned, and the bagel file is updated accordingly
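The template replacement in step 1 can be pictured with a small sketch. This is not Nipoppy's actual implementation: `fill_templates` is a hypothetical helper, and it assumes the participant and session values are passed in already prefixed (`sub-001`, `ses-1`), as they appear in the bagel example on this page.

```python
def fill_templates(path_template, bids_participant, bids_session):
    """Replace participant/session template strings in a tracker path.

    Hypothetical helper for illustration only; Nipoppy's own
    substitution logic may differ.
    """
    replacements = {
        "[[NIPOPPY_BIDS_PARTICIPANT]]": bids_participant,
        "[[NIPOPPY_BIDS_SESSION]]": bids_session,
    }
    for template, value in replacements.items():
        path_template = path_template.replace(template, value)
    return path_template


resolved = fill_templates(
    "[[NIPOPPY_BIDS_PARTICIPANT]]/[[NIPOPPY_BIDS_SESSION]]/anat",
    bids_participant="sub-001",
    bids_session="ses-1",
)
print(resolved)  # sub-001/ses-1/anat
```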

Configuring a pipeline tracker

The global configuration file should include paths to tracker configuration files, which are JSON files containing lists of dictionaries.

Here is an example of a tracker configuration file (default for MRIQC 23.1.0):

[
    {
        "NAME": "pipeline_complete",
        "PATHS": [
            "[[NIPOPPY_BIDS_PARTICIPANT]]/[[NIPOPPY_BIDS_SESSION]]/anat/[[NIPOPPY_BIDS_PARTICIPANT]]_[[NIPOPPY_BIDS_SESSION]]_*_T1w.json",
            "[[NIPOPPY_BIDS_PARTICIPANT]]_[[NIPOPPY_BIDS_SESSION]]_*_T1w.html"
        ]
    }
]

Importantly, pipeline completion status is not inferred from exit codes, because trackers are run independently of the pipeline runners. Moreover, the default tracker configuration files are somewhat minimal and do not check all possible output files generated by these pipelines.

Tip

The paths are expected to be relative to the <DATASET_ROOT>/derivatives/<PIPELINE_NAME>/<PIPELINE_VERSION>/output directory.
“Glob” expressions (i.e., that include *) are allowed in paths. If at least one file matches the expression, then the file will be considered found for that expression.

Note

The template strings [[NIPOPPY_<ATTRIBUTE_NAME>]] work the same way as the ones in the global configuration file and the pipeline invocation files – they are replaced at runtime by appropriate values.

Attention

Currently, only the tracker configuration with pipeline_complete in the NAME field will be used, but we are planning to extend trackers to allow multiple statuses per pipeline. Stay tuned!
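As a rough illustration of how a tracker configuration could be read, the sketch below parses a list of dictionaries and keeps only the entry named `pipeline_complete`. The helper name `get_pipeline_complete_paths` and the inline JSON string are assumptions made for this example; they are not part of the Nipoppy API.

```python
import json

# Shortened copy of a tracker configuration, inlined for illustration.
TRACKER_CONFIG_JSON = """
[
    {
        "NAME": "pipeline_complete",
        "PATHS": ["[[NIPOPPY_BIDS_PARTICIPANT]]_[[NIPOPPY_BIDS_SESSION]]_*_T1w.html"]
    }
]
"""


def get_pipeline_complete_paths(config_text):
    """Return the PATHS list of the entry whose NAME is 'pipeline_complete'."""
    for entry in json.loads(config_text):
        if entry["NAME"] == "pipeline_complete":
            return entry["PATHS"]
    raise ValueError("No tracker configuration named 'pipeline_complete'")


paths = get_pipeline_complete_paths(TRACKER_CONFIG_JSON)
print(paths)
```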

Given a dataset with the following content in <DATASET_ROOT>/derivatives:

└── derivatives
    └── mriqc
        └── 23.1.0
            └── output
                ├── sub-001
                │   ├── figures
                │   │   ├── sub-001_ses-1_run-01_desc-background_T1w.svg
                │   │   └── sub-001_ses-1_run-01_desc-zoomed_T1w.svg
                │   └── ses-1
                │       └── anat
                │           ├── sub-001_ses-1_run-01_T1w.json
                │           └── sub-001_ses-1_run-02_T1w.json
                └── sub-001_ses-1_run-01_T1w.html

Running the tracker with the above configuration will result in the imaging bagel file showing:

participant_id	bids_id	session	pipeline_name	pipeline_version	pipeline_complete
001	sub-001	ses-1	mriqc	23.1.0	SUCCESS

Note

If there is an existing bagel, the rows relevant to the specific pipeline, participants, and sessions will be updated. Other rows will be left as-is.
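The update behaviour can be sketched with plain dictionaries. This is only an approximation: it assumes a row is identified by participant, session, pipeline name, and pipeline version, and `update_bagel` is a hypothetical helper rather than a Nipoppy function.

```python
# Assumed set of columns that identify a bagel row (not confirmed by the source).
KEY_COLUMNS = ("participant_id", "session", "pipeline_name", "pipeline_version")


def update_bagel(existing_rows, new_rows):
    """Overwrite rows that share a key with a new row; keep all other rows."""
    merged = {tuple(row[col] for col in KEY_COLUMNS): row for row in existing_rows}
    for row in new_rows:
        merged[tuple(row[col] for col in KEY_COLUMNS)] = row
    return list(merged.values())


# Illustrative rows only; the fmriprep entry is made up for this example.
existing = [
    {"participant_id": "001", "session": "ses-1", "pipeline_name": "mriqc",
     "pipeline_version": "23.1.0", "pipeline_complete": "FAIL"},
    {"participant_id": "001", "session": "ses-1", "pipeline_name": "fmriprep",
     "pipeline_version": "23.1.3", "pipeline_complete": "SUCCESS"},
]
new = [
    {"participant_id": "001", "session": "ses-1", "pipeline_name": "mriqc",
     "pipeline_version": "23.1.0", "pipeline_complete": "SUCCESS"},
]
updated = update_bagel(existing, new)
print([row["pipeline_complete"] for row in updated])
```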

The pipeline_complete column can have the following values:

SUCCESS: all specified paths have been found
FAIL: at least one of the paths has not been found
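The rule for these two values, combined with the glob behaviour described earlier, can be sketched as follows. The function `check_status` is a hypothetical stand-in for the tracker's logic, and the temporary directory recreates only part of the example tree shown above.

```python
import tempfile
from pathlib import Path


def check_status(dpath_output, patterns):
    """Return SUCCESS if every pattern matches at least one file, else FAIL."""
    for pattern in patterns:
        if not any(dpath_output.glob(pattern)):
            return "FAIL"
    return "SUCCESS"


with tempfile.TemporaryDirectory() as tmp:
    dpath_output = Path(tmp)

    # Recreate two of the MRIQC output files from the example tree.
    fpath_json = dpath_output / "sub-001" / "ses-1" / "anat" / "sub-001_ses-1_run-01_T1w.json"
    fpath_json.parent.mkdir(parents=True)
    fpath_json.touch()
    (dpath_output / "sub-001_ses-1_run-01_T1w.html").touch()

    # Paths after template strings have been replaced.
    patterns = [
        "sub-001/ses-1/anat/sub-001_ses-1_*_T1w.json",
        "sub-001_ses-1_*_T1w.html",
    ]
    status_found = check_status(dpath_output, patterns)
    # Adding a pattern with no matching file turns the status into FAIL.
    status_missing = check_status(dpath_output, patterns + ["sub-001_ses-2_*_T1w.html"])

print(status_found, status_missing)  # SUCCESS FAIL
```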

Running a pipeline tracker

Using the command-line interface

To track all available participants and sessions, run:

$ nipoppy track \
    --dataset-root <DATASET_ROOT> \
    --pipeline <PIPELINE_NAME>

where <PIPELINE_NAME> corresponds to the pipeline name as specified in the global configuration file.

Note

If there are multiple versions for the same pipeline in the global configuration file, use --pipeline-version to specify the desired version. By default, the first version listed for the pipeline will be used.

The tracker can also be run on a single participant and/or session at a time:

$ nipoppy track \
    --dataset-root <DATASET_ROOT> \
    --pipeline <PIPELINE_NAME> \
    --participant-id <PARTICIPANT_ID> \
    --session-id <SESSION_ID>

See the CLI reference page for more information on additional optional arguments.
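When the tracker has to be launched from a script, the command line can be assembled programmatically. The sketch below uses only the flags mentioned on this page; `build_track_command` is a hypothetical helper, and the `subprocess` call is left commented out because it requires Nipoppy to be installed.

```python
def build_track_command(dataset_root, pipeline_name, pipeline_version=None,
                        participant_id=None, session_id=None):
    """Assemble a `nipoppy track` command as a list of arguments."""
    command = [
        "nipoppy", "track",
        "--dataset-root", str(dataset_root),
        "--pipeline", pipeline_name,
    ]
    optional_flags = {
        "--pipeline-version": pipeline_version,
        "--participant-id": participant_id,
        "--session-id": session_id,
    }
    # Only add the optional flags that were actually provided.
    for flag, value in optional_flags.items():
        if value is not None:
            command.extend([flag, str(value)])
    return command


command = build_track_command(
    "<DATASET_ROOT>", "mriqc", pipeline_version="23.1.0", participant_id="001"
)
print(" ".join(command))

# To actually run the tracker:
# import subprocess
# subprocess.run(command, check=True)
```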

Note

Log files for this command will be written to <DATASET_ROOT>/scratch/logs/track.

Using the Python API

from nipoppy.workflows import PipelineTracker

# replace by appropriate values
dpath_root = "<DATASET_ROOT>"
pipeline_name = "<PIPELINE_NAME>"

workflow = PipelineTracker(
    dpath_root=dpath_root,
    pipeline_name=pipeline_name,
)
workflow.run()

See the API reference for nipoppy.workflows.PipelineTracker for more information on optional arguments (they correspond to the ones for the CLI).

Next steps

If some participants/sessions have failed processing or have not been run yet, they should be run again.
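One way to find which participants/sessions need to be rerun is to filter the bagel file for rows that are not marked `SUCCESS`. The sketch below assumes the bagel is comma-separated with the columns shown earlier; `find_incomplete` is a hypothetical helper, and the `sub-002` row is made-up illustrative data.

```python
import csv
import io

# Inline stand-in for <DATASET_ROOT>/derivatives/bagel.csv (illustrative data).
BAGEL_TEXT = """participant_id,bids_id,session,pipeline_name,pipeline_version,pipeline_complete
001,sub-001,ses-1,mriqc,23.1.0,SUCCESS
002,sub-002,ses-1,mriqc,23.1.0,FAIL
"""


def find_incomplete(bagel_file, pipeline_name, pipeline_version):
    """List (participant_id, session) pairs not marked SUCCESS for a pipeline."""
    reader = csv.DictReader(bagel_file)
    return [
        (row["participant_id"], row["session"])
        for row in reader
        if row["pipeline_name"] == pipeline_name
        and row["pipeline_version"] == pipeline_version
        and row["pipeline_complete"] != "SUCCESS"
    ]


to_rerun = find_incomplete(io.StringIO(BAGEL_TEXT), "mriqc", "23.1.0")
print(to_rerun)  # [('002', 'ses-1')]
```

With a real dataset, the `io.StringIO` object would be replaced by an open file handle on the bagel file.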

Once the dataset has been processed with a pipeline, Nipoppy extractors can be used to obtain analysis-ready imaging-derived phenotypes (IDPs).