Tracking pipeline processing status

Nipoppy trackers check for expected file paths (or path patterns) among a pipeline's output files. They are specific to pipeline steps and can be configured to include custom paths.

Summary

Prerequisites

  • A Nipoppy dataset with a valid global configuration file and an accurate manifest

  • Processed imaging data in the <DATASET_ROOT>/derivatives directory

Data directories and files

  • <DATASET_ROOT>/derivatives – Input: derivative files produced by processing pipelines

  • <DATASET_ROOT>/derivatives/imaging_bagel.tsv – Output: tabular file containing processing status for each participant/session and pipeline

Commands

  • Command-line interface: nipoppy track

  • Python API: nipoppy.workflows.PipelineTracker

Workflow

  1. Nipoppy will loop over all participants/sessions that have BIDS data according to the doughnut file

  2. For each participant-session pair:

    1. Paths in the pipeline’s tracker configuration will be processed such that template strings related to the participant/session are replaced by the appropriate values

    2. Each path in the list is checked, then a status is assigned, and the bagel file is updated accordingly (see the sketch after this list)
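
The following is a minimal, illustrative sketch of this logic (not Nipoppy's actual implementation): template strings are substituted, each resulting path is globbed relative to the pipeline output directory, and the status is SUCCESS only if every path matches at least one file.

from pathlib import Path

def assign_status(output_dir: Path, tracker_paths: list[str],
                  bids_participant_id: str, bids_session_id: str) -> str:
    # Substitute the template strings for this participant/session
    resolved = [
        p.replace("[[NIPOPPY_BIDS_PARTICIPANT_ID]]", bids_participant_id)
         .replace("[[NIPOPPY_BIDS_SESSION_ID]]", bids_session_id)
        for p in tracker_paths
    ]
    # A path counts as found if at least one file matches its glob expression
    found = [any(output_dir.glob(p)) for p in resolved]
    return "SUCCESS" if all(found) else "FAIL"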

Configuring a pipeline tracker

The global configuration file should include paths to tracker configuration files, which are JSON files specifying the output file paths to check.

Here is an example of a tracker configuration file (the default for MRIQC 23.1.0):

{
    "PATHS": [
        "[[NIPOPPY_BIDS_PARTICIPANT_ID]]/[[NIPOPPY_BIDS_SESSION_ID]]/anat/[[NIPOPPY_BIDS_PARTICIPANT_ID]]_[[NIPOPPY_BIDS_SESSION_ID]]*_T1w.json",
        "[[NIPOPPY_BIDS_PARTICIPANT_ID]]_[[NIPOPPY_BIDS_SESSION_ID]]*_T1w.html"
    ]
}

Importantly, pipeline completion status is not inferred from exit codes, as trackers are run independently of the pipeline runners. Moreover, the default tracker configuration files are somewhat minimal and do not check all possible output files generated by these pipelines.
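
If outputs beyond the defaults need to be tracked, custom entries can be added to the PATHS list. Below is a hypothetical extension of the MRIQC configuration above that also checks for a functional report JSON; adjust the added path to the outputs your pipeline actually produces.

{
    "PATHS": [
        "[[NIPOPPY_BIDS_PARTICIPANT_ID]]/[[NIPOPPY_BIDS_SESSION_ID]]/anat/[[NIPOPPY_BIDS_PARTICIPANT_ID]]_[[NIPOPPY_BIDS_SESSION_ID]]*_T1w.json",
        "[[NIPOPPY_BIDS_PARTICIPANT_ID]]_[[NIPOPPY_BIDS_SESSION_ID]]*_T1w.html",
        "[[NIPOPPY_BIDS_PARTICIPANT_ID]]/[[NIPOPPY_BIDS_SESSION_ID]]/func/[[NIPOPPY_BIDS_PARTICIPANT_ID]]_[[NIPOPPY_BIDS_SESSION_ID]]*_bold.json"
    ]
}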

Tip

  • The paths are expected to be relative to the <DATASET_ROOT>/derivatives/<PIPELINE_NAME>/<PIPELINE_VERSION>/output directory.

  • “Glob” expressions (i.e., those that include *) are allowed in paths. If at least one file matches the expression, then the path is considered found.

Note

The template strings [[NIPOPPY_<ATTRIBUTE_NAME>]] work the same way as the ones in the global configuration file and the pipeline invocation files – they are replaced at runtime by appropriate values.
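
For example, for a participant whose BIDS participant ID is sub-001 and BIDS session ID is ses-1, the two paths in the MRIQC configuration above resolve to:

sub-001/ses-1/anat/sub-001_ses-1*_T1w.json
sub-001_ses-1*_T1w.html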

Given a dataset with the following content in <DATASET_ROOT>/derivatives:

└── derivatives
    └── mriqc
        └── 23.1.0
            └── output
                ├── sub-001
                │   ├── figures
                │   │   ├── sub-001_ses-1_run-01_desc-background_T1w.svg
                │   │   └── sub-001_ses-1_run-01_desc-zoomed_T1w.svg
                │   └── ses-1
                │       └── anat
                │           ├── sub-001_ses-1_run-01_T1w.json
                │           └── sub-001_ses-1_run-02_T1w.json
                └── sub-001_ses-1_run-01_T1w.html

Running the tracker with the above configuration will result in the imaging bagel file showing:

participant_id  bids_participant_id  session_id  bids_session_id  pipeline_name  pipeline_version  pipeline_step  status
001             sub-001              1           ses-1            mriqc          23.1.0            default        SUCCESS

Note

If there is an existing bagel, the rows relevant to the specific pipeline, participants, and sessions will be updated. Other rows will be left as-is.

The status column can have the following values (a filtering example follows this list):

  • SUCCESS: all specified paths have been found

  • FAIL: at least one of the paths has not been found
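
The bagel file can be queried directly to find participants/sessions that still need (re)processing. Below is a minimal sketch using pandas (pandas is an assumption of this example, not a Nipoppy requirement); column names follow the table above.

import pandas as pd

# Load the imaging bagel (replace <DATASET_ROOT> by the appropriate value)
bagel = pd.read_csv("<DATASET_ROOT>/derivatives/imaging_bagel.tsv", sep="\t")

# Keep only the rows where the MRIQC 23.1.0 tracker reported a failure
failed = bagel[
    (bagel["pipeline_name"] == "mriqc")
    & (bagel["pipeline_version"] == "23.1.0")
    & (bagel["status"] == "FAIL")
]
print(failed[["participant_id", "session_id"]])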

Running a pipeline tracker

Using the command-line interface

To track all available participants and sessions, run:

$ nipoppy track \
    <DATASET_ROOT> \
    --pipeline <PIPELINE_NAME>

where <PIPELINE_NAME> corresponds to the pipeline name as specified in the global configuration file.

Note

If there are multiple versions or steps for the same pipeline in the global configuration file, use --pipeline-version and --pipeline-step to specify the desired version and step respectively. By default, the first version and step listed for the pipeline will be used.
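
For example, assuming the MRIQC pipeline from the tracker configuration above is listed in the global configuration file, the command could look like this (the version and step flags can be omitted if only one of each is configured):

$ nipoppy track \
    <DATASET_ROOT> \
    --pipeline mriqc \
    --pipeline-version 23.1.0 \
    --pipeline-step default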

The tracker can also be run on a single participant and/or session at a time:

$ nipoppy track \
    <DATASET_ROOT> \
    --pipeline <PIPELINE_NAME> \
    --participant-id <PARTICIPANT_ID> \
    --session-id <SESSION_ID>

See the CLI reference page for more information on additional optional arguments.

Note

Log files for this command will be written to <DATASET_ROOT>/logs/track

Using the Python API

from nipoppy.workflows import PipelineTracker

# replace by appropriate values
dpath_root = "<DATASET_ROOT>"
pipeline_name = "<PIPELINE_NAME>"

workflow = PipelineTracker(
    dpath_root=dpath_root,
    pipeline_name=pipeline_name,
)
workflow.run()

See the API reference for nipoppy.workflows.PipelineTracker for more information on optional arguments (they correspond to the ones for the CLI).
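
For instance, to track a single participant and session with a specific pipeline version, optional arguments can be passed as well. This is a sketch that assumes the keyword arguments mirror the CLI flags shown above; check the API reference for the exact parameter names.

from nipoppy.workflows import PipelineTracker

# replace by appropriate values
workflow = PipelineTracker(
    dpath_root="<DATASET_ROOT>",
    pipeline_name="mriqc",
    pipeline_version="23.1.0",
    participant_id="001",
    session_id="1",
)
workflow.run()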

Next steps

If some participants/sessions have failed processing or have not yet been run, they should be (re)run with the pipeline runner.

Once the dataset has been processed with a pipeline, Nipoppy extractors can be used to obtain analysis-ready imaging-derived phenotypes (IDPs).