Tracking pipeline processing status

Nipoppy trackers search for expected file paths or patterns in pipeline output files. They are specific to pipelines and versions, and can be configured to include custom paths for various levels of granularity.

Summary

Prerequisites

  • A Nipoppy dataset with a valid global configuration file and an accurate manifest

  • Processed imaging data in the <DATASET_ROOT>/derivatives directory

Data directories and files

Directory or file                     Content description
<DATASET_ROOT>/derivatives            Input – Derivative files produced by processing pipelines
<DATASET_ROOT>/derivatives/bagel.csv  Output – Tabular file containing processing status for each participant/session and pipeline

Commands

  • Command-line interface: nipoppy track

  • Python API: nipoppy.workflows.PipelineTracker

Workflow

  1. Nipoppy will loop over all participants/sessions that have BIDS data according to the doughnut file

  2. For each participant-session pair:

    1. Paths in the pipeline’s tracker configuration will be processed such that template strings related to the participant/session are replaced by the appropriate values

    2. Each path in the list is checked, then a status is assigned, and the bagel file is updated accordingly

Configuring a pipeline tracker

The global configuration file should include paths to tracker configuration files, which are JSON files containing lists of dictionaries.

Here is an example of a tracker configuration file (the default for MRIQC 23.1.0):

[
    {
        "NAME": "pipeline_complete",
        "PATHS": [
            "[[NIPOPPY_BIDS_PARTICIPANT]]/[[NIPOPPY_BIDS_SESSION]]/anat/[[NIPOPPY_BIDS_PARTICIPANT]]_[[NIPOPPY_BIDS_SESSION]]_*_T1w.json",
            "[[NIPOPPY_BIDS_PARTICIPANT]]_[[NIPOPPY_BIDS_SESSION]]_*_T1w.html"
        ]
    }
]

Importantly, pipeline completion status is not inferred from exit codes, because trackers are run independently of the pipeline runners. Moreover, the default tracker configuration files are minimal and do not check all possible output files generated by these pipelines.

Tip

  • The paths are expected to be relative to the <DATASET_ROOT>/derivatives/<PIPELINE_NAME>/<PIPELINE_VERSION>/output directory.

  • “Glob” expressions (i.e., paths that include *) are allowed. If at least one file matches an expression, that path is considered found.

Note

The template strings [[NIPOPPY_<ATTRIBUTE_NAME>]] work the same way as the ones in the global configuration file and the pipeline invocation files – they are replaced at runtime by appropriate values.
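As a rough illustration of the replacement step, the sketch below substitutes the two template strings used in the MRIQC example with concrete values. The function name resolve_templates is hypothetical, and this is not Nipoppy's own implementation; the values "sub-001" and "ses-1" are taken from the example on this page.

```python
def resolve_templates(path_template, bids_participant, bids_session):
    """Replace participant/session template strings with concrete values.

    Assumption: only the two template strings from the MRIQC example are
    handled here; Nipoppy itself may support more.
    """
    replacements = {
        "[[NIPOPPY_BIDS_PARTICIPANT]]": bids_participant,
        "[[NIPOPPY_BIDS_SESSION]]": bids_session,
    }
    for template_string, value in replacements.items():
        path_template = path_template.replace(template_string, value)
    return path_template


template = (
    "[[NIPOPPY_BIDS_PARTICIPANT]]/[[NIPOPPY_BIDS_SESSION]]/anat/"
    "[[NIPOPPY_BIDS_PARTICIPANT]]_[[NIPOPPY_BIDS_SESSION]]_*_T1w.json"
)
resolved = resolve_templates(template, "sub-001", "ses-1")
print(resolved)  # sub-001/ses-1/anat/sub-001_ses-1_*_T1w.json
```

Note that the glob character * is left untouched; it is only interpreted later, when the path is checked against the files on disk.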

Attention

Currently, only the tracker configuration with pipeline_complete in the NAME field will be used, but we are planning to extend trackers to allow multiple statuses per pipeline. Stay tuned!
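To make the structure of the file concrete, here is a minimal sketch of how the pipeline_complete entry could be picked out of a tracker configuration. The helper get_pipeline_complete_paths is hypothetical and only mirrors the behaviour described above; Nipoppy's actual loader may differ.

```python
import json

# A shortened tracker configuration: a JSON list of dictionaries,
# each with a NAME and a list of PATHS.
tracker_config_text = """
[
    {
        "NAME": "pipeline_complete",
        "PATHS": [
            "[[NIPOPPY_BIDS_PARTICIPANT]]_[[NIPOPPY_BIDS_SESSION]]_*_T1w.html"
        ]
    }
]
"""


def get_pipeline_complete_paths(config_text):
    """Return the PATHS list of the entry named 'pipeline_complete'."""
    entries = json.loads(config_text)
    if not isinstance(entries, list):
        raise ValueError("Tracker configuration must be a list of dictionaries")
    for entry in entries:
        if entry.get("NAME") == "pipeline_complete":
            return entry["PATHS"]
    raise ValueError("No entry with NAME 'pipeline_complete' was found")


paths = get_pipeline_complete_paths(tracker_config_text)
print(paths)
```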

Given a dataset with the following content in <DATASET_ROOT>/derivatives:

└── derivatives
    └── mriqc
        └── 23.1.0
            └── output
                ├── sub-001
                │   ├── figures
                │   │   ├── sub-001_ses-1_run-01_desc-background_T1w.svg
                │   │   └── sub-001_ses-1_run-01_desc-zoomed_T1w.svg
                │   └── ses-1
                │       └── anat
                │           ├── sub-001_ses-1_run-01_T1w.json
                │           └── sub-001_ses-1_run-02_T1w.json
                └── sub-001_ses-1_run-01_T1w.html

Running the tracker with the above configuration will result in the imaging bagel file showing:

participant_id  bids_id  session  pipeline_name  pipeline_version  pipeline_complete
001             sub-001  ses-1    mriqc          23.1.0            SUCCESS

Note

If there is an existing bagel, the rows relevant to the specific pipeline, participants, and sessions will be updated. Other rows will be left as-is.
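The update behaviour can be pictured with the sketch below, which merges new tracker results into existing rows. It assumes that a row is identified by its participant_id, session, pipeline_name, and pipeline_version columns; that key, the function update_bagel, and participant 002 are illustrative assumptions rather than Nipoppy's actual code or data.

```python
# Assumption: these columns together identify one bagel row.
KEY_COLUMNS = ("participant_id", "session", "pipeline_name", "pipeline_version")


def update_bagel(existing_rows, new_rows):
    """Overwrite rows that share a key with a new row; keep all others as-is."""
    merged = {tuple(row[col] for col in KEY_COLUMNS): row for row in existing_rows}
    for row in new_rows:
        # Existing keys keep their position; unseen keys are appended.
        merged[tuple(row[col] for col in KEY_COLUMNS)] = row
    return list(merged.values())


existing = [
    {"participant_id": "001", "session": "ses-1", "pipeline_name": "mriqc",
     "pipeline_version": "23.1.0", "pipeline_complete": "FAIL"},
    {"participant_id": "002", "session": "ses-1", "pipeline_name": "mriqc",
     "pipeline_version": "23.1.0", "pipeline_complete": "FAIL"},
]
# The tracker is re-run for participant 001 only.
new = [
    {"participant_id": "001", "session": "ses-1", "pipeline_name": "mriqc",
     "pipeline_version": "23.1.0", "pipeline_complete": "SUCCESS"},
]

updated = update_bagel(existing, new)
for row in updated:
    print(row["participant_id"], row["pipeline_complete"])
```

Here the row for participant 001 is replaced by the new result, while the row for participant 002 is carried over unchanged.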

The pipeline_complete column can have the following values:

  • SUCCESS: all specified paths have been found

  • FAIL: at least one of the paths has not been found
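The two statuses can be reproduced on the MRIQC example with a short sketch. It recreates the relevant files in a temporary directory and checks the tracker paths after their template strings have been replaced. The function get_status is hypothetical and uses pathlib globbing as a stand-in for whatever matching Nipoppy performs internally.

```python
import tempfile
from pathlib import Path


def get_status(output_dir, patterns):
    """Return 'SUCCESS' if every pattern matches at least one file, else 'FAIL'."""
    for pattern in patterns:
        # One match is enough for a pattern to count as found.
        if not any(output_dir.glob(pattern)):
            return "FAIL"
    return "SUCCESS"


# Tracker paths with the template strings already replaced for sub-001/ses-1.
patterns = [
    "sub-001/ses-1/anat/sub-001_ses-1_*_T1w.json",
    "sub-001_ses-1_*_T1w.html",
]

with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp) / "mriqc" / "23.1.0" / "output"
    for relative_path in [
        "sub-001/ses-1/anat/sub-001_ses-1_run-01_T1w.json",
        "sub-001/ses-1/anat/sub-001_ses-1_run-02_T1w.json",
        "sub-001_ses-1_run-01_T1w.html",
    ]:
        fpath = output_dir / relative_path
        fpath.parent.mkdir(parents=True, exist_ok=True)
        fpath.touch()

    status_all_present = get_status(output_dir, patterns)

    # Removing the only HTML report leaves the second pattern without a match.
    (output_dir / "sub-001_ses-1_run-01_T1w.html").unlink()
    status_missing_html = get_status(output_dir, patterns)

print(status_all_present)   # SUCCESS
print(status_missing_html)  # FAIL
```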

Running a pipeline tracker

Using the command-line interface

To track all available participants and sessions, run:

$ nipoppy track \
    --dataset-root <DATASET_ROOT> \
    --pipeline <PIPELINE_NAME>

where <PIPELINE_NAME> corresponds to the pipeline name as specified in the global configuration file.

Note

If there are multiple versions for the same pipeline in the global configuration file, use --pipeline-version to specify the desired version. By default, the first version listed for the pipeline will be used.

The tracker can also be run on a single participant and/or session at a time:

$ nipoppy track \
    --dataset-root <DATASET_ROOT> \
    --pipeline <PIPELINE_NAME> \
    --participant-id <PARTICIPANT_ID> \
    --session-id <SESSION_ID>

See the CLI reference page for more information on additional optional arguments.

Note

Log files for this command will be written to <DATASET_ROOT>/scratch/logs/track.

Using the Python API

from nipoppy.workflows import PipelineTracker

# replace by appropriate values
dpath_root = "<DATASET_ROOT>"
pipeline_name = "<PIPELINE_NAME>"

workflow = PipelineTracker(
    dpath_root=dpath_root,
    pipeline_name=pipeline_name,
)
workflow.run()

See the API reference for nipoppy.workflows.PipelineTracker for more information on optional arguments (they correspond to the ones for the CLI).

Next steps

If some participants/sessions have failed processing or have not been run yet, they should be run again.
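One way to find candidates for re-running is to read the bagel file and list the rows whose status is not SUCCESS. The sketch below assumes the file is comma-separated and has the columns shown in the example table; the function find_incomplete and participant 002 are made up for illustration.

```python
import csv
import io

# Stand-in for the content of <DATASET_ROOT>/derivatives/bagel.csv
# (assumption: comma-separated, columns as in the example table).
bagel_text = """participant_id,bids_id,session,pipeline_name,pipeline_version,pipeline_complete
001,sub-001,ses-1,mriqc,23.1.0,SUCCESS
002,sub-002,ses-1,mriqc,23.1.0,FAIL
"""


def find_incomplete(bagel_file, pipeline_name, pipeline_version):
    """List (participant_id, session) pairs not marked SUCCESS for a pipeline."""
    reader = csv.DictReader(bagel_file)
    return [
        (row["participant_id"], row["session"])
        for row in reader
        if row["pipeline_name"] == pipeline_name
        and row["pipeline_version"] == pipeline_version
        and row["pipeline_complete"] != "SUCCESS"
    ]


to_rerun = find_incomplete(io.StringIO(bagel_text), "mriqc", "23.1.0")
print(to_rerun)  # [('002', 'ses-1')]
```

With a real dataset, the io.StringIO object would be replaced by the opened bagel file. Only rows present in the bagel are considered, so the result depends on the tracker having been run first.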

Once the dataset has been processed with a pipeline, Nipoppy extractors can be used to obtain analysis-ready imaging-derived phenotypes (IDPs).