Extracting IDPs from pipeline derivatives

Extraction pipelines ingest a subset of processing pipeline outputs and convert them into analysis-ready participant- and/or group-level imaging-derived phenotypes (IDPs) for a particular downstream analysis (e.g., cortical thickness/subcortical volume tables, connectivity matrices).

A processing pipeline can have many downstream extraction pipelines (e.g., volumetric vs. vertex-wise or surface measure extractors). Typically, an extractor depends on a single processing pipeline, but Nipoppy also supports multiple processing pipeline dependencies (e.g., network extractors that use both diffusion and functional outputs).

Just like with the BIDS conversion and processing pipelines, Nipoppy uses the Boutiques framework to run extraction pipelines.

Summary

Prerequisites

  • A Nipoppy dataset with a valid global configuration file and an accurate manifest

  • Processed imaging data in <DATASET_ROOT>/derivatives/<PIPELINE_NAME>/<PIPELINE_VERSION>/output for the relevant processing pipeline(s) that the extractor depends on

  • A processing status file with completion statuses for the processing pipeline(s) associated with the extraction pipeline

Data directories

Directory                                                              Content description
<DATASET_ROOT>/derivatives/<PIPELINE_NAME>/<PIPELINE_VERSION>/output   Input – Derivative files produced by processing pipelines
<DATASET_ROOT>/derivatives/<PIPELINE_NAME>/<PIPELINE_VERSION>/idp      Output – Imaging-derived phenotypes (IDPs) produced by extraction pipelines
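As a minimal sketch, the input and output directories listed above can be built with pathlib. The helper name pipeline_dirs and the example dataset root, pipeline name, and version are hypothetical and only serve as illustration.

```python
from pathlib import Path


def pipeline_dirs(dataset_root, pipeline_name, pipeline_version):
    """Return the (input, output) directories for one pipeline and version."""
    base = Path(dataset_root) / "derivatives" / pipeline_name / pipeline_version
    # "output" holds processing derivatives, "idp" holds the extracted IDPs
    return base / "output", base / "idp"


# Hypothetical values, for illustration only
dpath_input, dpath_output = pipeline_dirs("/data/my_study", "my_pipeline", "1.0.0")
print(dpath_input.as_posix())
print(dpath_output.as_posix())
```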

Commands

  • Command-line interface: nipoppy extract

  • Python API: nipoppy.workflows.ExtractionRunner

Workflow

  1. Nipoppy will check the processing status file and loop over all participants/sessions that have completed processing for all the pipelines listed in the PROC_DEPENDENCIES field.

    • See Tracking pipeline processing status for more information on how to generate the processing status file

    • Note: an extraction pipeline may be associated with more than one processing pipeline, and the same processing pipeline can have more than one downstream extraction pipeline

  2. For each participant-session pair:

    1. The pipeline’s invocation will be processed such that template strings related to the participant/session and dataset paths (e.g., [[NIPOPPY_PARTICIPANT_ID]]) are replaced by the appropriate values

    2. The pipeline is launched using Boutiques, which will combine the processed invocation with the pipeline’s descriptor file to produce and run a command-line expression
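The selection logic in step 1 can be illustrated with a short sketch. This is not Nipoppy's implementation: the row structure, the column names, the "SUCCESS"/"FAIL" status values, and the pipeline names are all assumptions made for the example. The point is only that a participant-session pair is kept when every pipeline in PROC_DEPENDENCIES has completed for it.

```python
from collections import defaultdict

# Hypothetical rows standing in for a processing status file.
# Column names and status values are assumptions for this sketch.
status_rows = [
    {"participant_id": "01", "session_id": "A", "pipeline": "pipeline_a", "status": "SUCCESS"},
    {"participant_id": "01", "session_id": "A", "pipeline": "pipeline_b", "status": "SUCCESS"},
    {"participant_id": "02", "session_id": "A", "pipeline": "pipeline_a", "status": "SUCCESS"},
    {"participant_id": "02", "session_id": "A", "pipeline": "pipeline_b", "status": "FAIL"},
    {"participant_id": "03", "session_id": "A", "pipeline": "pipeline_a", "status": "SUCCESS"},
]


def completed_pairs(rows, dependencies, done="SUCCESS"):
    """Return sorted (participant, session) pairs that completed every dependency."""
    finished = defaultdict(set)
    for row in rows:
        if row["status"] == done:
            finished[(row["participant_id"], row["session_id"])].add(row["pipeline"])
    required = set(dependencies)
    # Keep a pair only if its completed pipelines cover all required ones
    return sorted(pair for pair, pipelines in finished.items() if required <= pipelines)


# An extractor depending on both pipelines only runs for participant 01
print(completed_pairs(status_rows, ["pipeline_a", "pipeline_b"]))
```

With a single dependency (for example only pipeline_a), all three pairs in this toy table would be selected.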

Configuring extraction pipelines

Just like with BIDS conversion and processing pipelines, pipeline and pipeline step configurations are set in the global configuration file (see the documentation on the global configuration file for a more complete guide to its fields).

Pipeline step configurations point to files that can be further modified to customize pipeline runs:

  • INVOCATION_FILE: a JSON file containing key-value pairs specifying runtime parameters. The keys correspond to entries in the pipeline’s descriptor file.

Note

By default, pipeline files are stored in <DATASET_ROOT>/pipelines/<PIPELINE_NAME>-<PIPELINE_VERSION>.

Warning

Pipeline step configurations also have a DESCRIPTOR_FILE field, which points to the Boutiques descriptor of a pipeline. Although descriptor files can be modified, this is rarely necessary, and we recommend that less advanced users keep the default.

Customizing pipeline invocations

The default pipeline invocation files (in <DATASET_ROOT>/pipelines/<PIPELINE_NAME>-<PIPELINE_VERSION>) can be modified by changing existing values or adding new key-value pairs.
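Since an invocation file is plain JSON, it can be edited by hand or programmatically. The sketch below shows the second option with the standard json module. The helper update_invocation, the file name invocation.json, and the keys participant_label and n_threads are hypothetical; in a real invocation the keys must correspond to entries in the pipeline’s descriptor file.

```python
import json
import tempfile
from pathlib import Path


def update_invocation(fpath, updates):
    """Merge key-value pairs into a JSON invocation file and return the result."""
    fpath = Path(fpath)
    invocation = json.loads(fpath.read_text())
    invocation.update(updates)  # changes existing values or adds new keys
    fpath.write_text(json.dumps(invocation, indent=4) + "\n")
    return invocation


# Demonstration on a throwaway file with hypothetical keys
with tempfile.TemporaryDirectory() as tmp:
    fpath_invocation = Path(tmp) / "invocation.json"
    fpath_invocation.write_text(
        json.dumps({"participant_label": "[[NIPOPPY_PARTICIPANT_ID]]"})
    )
    updated = update_invocation(fpath_invocation, {"n_threads": 4})

print(updated)
```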

Tip

Run the pipeline on a single participant and session with the --simulate flag to check/debug custom invocation files.

Note

Because invocations often need to differ between participants and sessions (among other things), Nipoppy invocations are actually templates that are processed at runtime to replace template strings with actual values. Recognized template strings include:

  • [[NIPOPPY_PARTICIPANT_ID]]: the participant ID without the sub- prefix

  • [[NIPOPPY_SESSION_ID]]: the session ID without the ses- prefix

  • [[NIPOPPY_BIDS_PARTICIPANT_ID]]: the participant ID with the sub- prefix

  • [[NIPOPPY_BIDS_SESSION_ID]]: the session ID with the ses- prefix

  • [[NIPOPPY_<LAYOUT_PROPERTY>]]: any path defined in the Nipoppy dataset layout, where <LAYOUT_PROPERTY> is the name (in all uppercase) of a property in the Nipoppy dataset layout configuration file
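The substitution described in this note can be approximated with plain string replacement. The following is a rough sketch, not Nipoppy's actual code: the function fill_templates, the invocation keys, and the layout property dpath_example are hypothetical, while the four participant/session template strings are the ones listed above.

```python
import json


def fill_templates(invocation_text, participant_id, session_id, layout_paths=None):
    """Replace Nipoppy-style template strings in the text of an invocation."""
    replacements = {
        "[[NIPOPPY_PARTICIPANT_ID]]": participant_id,
        "[[NIPOPPY_SESSION_ID]]": session_id,
        "[[NIPOPPY_BIDS_PARTICIPANT_ID]]": f"sub-{participant_id}",
        "[[NIPOPPY_BIDS_SESSION_ID]]": f"ses-{session_id}",
    }
    # Layout properties are exposed in uppercase, e.g. [[NIPOPPY_DPATH_EXAMPLE]]
    for name, path in (layout_paths or {}).items():
        replacements[f"[[NIPOPPY_{name.upper()}]]"] = str(path)
    for template, value in replacements.items():
        invocation_text = invocation_text.replace(template, value)
    return invocation_text


# Hypothetical invocation template and layout property
template = json.dumps(
    {
        "subject": "[[NIPOPPY_BIDS_PARTICIPANT_ID]]",
        "session": "[[NIPOPPY_SESSION_ID]]",
        "output_dir": "[[NIPOPPY_DPATH_EXAMPLE]]/stats",
    }
)
filled = json.loads(
    fill_templates(template, "01", "A", {"dpath_example": "/data/my_study/example"})
)
print(filled)
```

Running the pipeline with --simulate remains the reliable way to see what Nipoppy itself substitutes for a given participant and session.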

Running an extraction pipeline

Using the command-line interface

To process all participants and sessions in a dataset (sequentially), run:

$ nipoppy extract \
    <DATASET_ROOT> \
    --pipeline <PIPELINE_NAME>

where <PIPELINE_NAME> corresponds to the pipeline name as specified in the global configuration file.

Note

If there are multiple versions for the same pipeline in the global configuration file, use --pipeline-version to specify the desired version. By default, the first version listed for the pipeline will be used.

Similarly, if --pipeline-step is not specified, the first step defined in the global configuration file will be used.

The pipeline can also be run on a single participant and/or session (useful for batching on clusters and testing pipelines/configurations):

$ nipoppy extract \
    <DATASET_ROOT> \
    --pipeline <PIPELINE_NAME> \
    --participant-id <PARTICIPANT_ID> \
    --session-id <SESSION_ID>

Hint

The --simulate argument will make Nipoppy print out the command to be executed with Boutiques (instead of actually executing it). It can be useful for checking runtime parameters or debugging the invocation file.

See the CLI reference page for more information on additional optional arguments.
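For batching, one command per participant-session pair can be generated from a list of pairs. The sketch below only builds and prints the commands, using the flags shown on this page; the helper build_extract_command, the dataset path, the pipeline name, and the participant/session IDs are hypothetical.

```python
import shlex


def build_extract_command(
    dataset_root,
    pipeline_name,
    participant_id=None,
    session_id=None,
    pipeline_version=None,
    pipeline_step=None,
    simulate=False,
):
    """Build the argument list for one `nipoppy extract` call."""
    cmd = ["nipoppy", "extract", str(dataset_root), "--pipeline", pipeline_name]
    optional = [
        ("--pipeline-version", pipeline_version),
        ("--pipeline-step", pipeline_step),
        ("--participant-id", participant_id),
        ("--session-id", session_id),
    ]
    for flag, value in optional:
        if value is not None:  # only pass the options that were requested
            cmd += [flag, str(value)]
    if simulate:
        cmd.append("--simulate")
    return cmd


# Hypothetical dataset, pipeline, and participant-session pairs
pairs = [("01", "A"), ("02", "A")]
commands = [
    build_extract_command("/data/my_study", "my_extractor", p, s, simulate=True)
    for p, s in pairs
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each argument list could then be written into a cluster job script or passed to subprocess.run; dropping simulate=True would produce commands that actually run the pipeline.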

Note

Log files for this command will be written to <DATASET_ROOT>/logs/extract.

Using the Python API

from nipoppy.workflows import ExtractionRunner

# replace by appropriate values
dpath_root = "<DATASET_ROOT>"
pipeline_name = "<PIPELINE_NAME>"

workflow = ExtractionRunner(
    dpath_root=dpath_root,
    pipeline_name=pipeline_name,
)
workflow.run()

See the API reference for nipoppy.workflows.ExtractionRunner for more information on optional arguments (they correspond to the ones for the CLI).

Next steps

Extracted IDPs are the end goal of the current Nipoppy framework, so there are no further Nipoppy steps after extraction. We nonetheless encourage the use of similar best practices to ensure the reproducibility of any downstream analysis step.