Extracting IDPs from pipeline derivatives¶
Extraction pipelines ingest a subset of processing pipeline outputs and turn them into analysis-ready participant- and/or group-level imaging-derived phenotypes (IDPs) useful for particular downstream analyses (e.g., cortical thickness/subcortical volume tables, connectivity matrices).
A processing pipeline can have many downstream extraction pipelines (e.g. volumetric vs vertex-wise or surface measure extractors). Typically, an extractor will depend only on a single processing pipeline, but Nipoppy can support multiple processing pipeline dependencies as well (e.g., in the case of network extractors utilizing both diffusion and functional outputs).
Just like with the BIDS conversion and processing pipelines, Nipoppy uses the Boutiques framework to run extraction pipelines.
Summary¶
Prerequisites¶
- A Nipoppy dataset with a valid global configuration file and an accurate manifest
  - See the Quickstart guide for instructions on how to set up a new dataset
- Processed imaging data in <DATASET_ROOT>/derivatives/<PIPELINE_NAME>/<PIPELINE_VERSION>/output for the relevant processing pipeline(s) that the extractor depends on
- A processing status file with completion statuses for the processing pipeline(s) associated with the extraction pipeline
  - This is obtained by running nipoppy track (see Tracking pipeline processing status and the example command below)
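A minimal sketch of that tracking command (the exact arguments may differ for your setup; <PROC_PIPELINE_NAME> is a placeholder for the processing pipeline's name in the global configuration file):

$ nipoppy track \
    <DATASET_ROOT> \
    --pipeline <PROC_PIPELINE_NAME>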
Data directories¶
| Directory | Content description |
|---|---|
| <DATASET_ROOT>/derivatives/<PIPELINE_NAME>/<PIPELINE_VERSION>/output | Input – Derivative files produced by processing pipelines |
| <DATASET_ROOT>/derivatives/<PIPELINE_NAME>/<PIPELINE_VERSION>/idp | Output – Imaging-derived phenotypes (IDPs) produced by extraction pipelines |
Commands¶
- Command-line interface: nipoppy extract
- Python API: nipoppy.workflows.ExtractionRunner
Workflow¶
1. Nipoppy will check the processing status file and loop over all participants/sessions that have completed processing for all the pipelines listed in the PROC_DEPENDENCIES field.
   - See Tracking pipeline processing status for more information on how to generate the processing status file.
   - Note: an extraction pipeline may be associated with more than one processing pipeline, and the same processing pipeline can have more than one downstream extraction pipeline.
2. For each participant-session pair:
   1. The pipeline’s invocation will be processed such that template strings related to the participant/session and dataset paths (e.g., [[NIPOPPY_PARTICIPANT_ID]]) are replaced by the appropriate values.
   2. The pipeline is launched using Boutiques, which will combine the processed invocation with the pipeline’s descriptor file to produce and run a command-line expression.
Configuring extraction pipelines¶
Just like with BIDS pipelines and processing pipelines, pipeline and pipeline step configurations are set in the global configuration file (see here for a more complete guide on the fields in this file).
There are several files in pipeline step configurations that can be further modified to customize pipeline runs:
- INVOCATION_FILE: a JSON file containing key-value pairs specifying runtime parameters. The keys correspond to entries in the pipeline’s descriptor file.
Note
By default, pipeline files are stored in <DATASET_ROOT>/pipelines/<PIPELINE_NAME>-<PIPELINE_VERSION>.
Warning
Pipeline step configurations also have a DESCRIPTOR_FILE field, which points to the Boutiques descriptor of a pipeline. Although descriptor files can be modified, in most cases this is not necessary, and we recommend that less advanced users keep the default.
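For illustration (the actual file names vary from pipeline to pipeline), you can see which invocation and descriptor files a pipeline ships with by listing its directory:

$ ls <DATASET_ROOT>/pipelines/<PIPELINE_NAME>-<PIPELINE_VERSION>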
Customizing pipeline invocations¶
Understanding Boutiques descriptors and invocations
Boutiques descriptors have an inputs field listing all available parameters for the tool being described. As a simple example, let’s use the following descriptor for a dummy “pipeline”:
{
"name": "example",
"description": "An example tool",
"tool-version": "0.1.0",
"schema-version": "0.5",
"command-line": "echo [PARAM1] [PARAM2] [FLAG1]",
"inputs": [
{
"name": "The first parameter",
"id": "basic_param1",
"type": "File",
"optional": true,
"value-key": "[PARAM1]"
},
{
"name": "The second parameter",
"id": "basic_param2",
"type": "String",
"optional": false,
"value-key": "[PARAM2]",
"value-choices": [
"choice1",
"choice2"
]
},
{
"name": "The first flag",
"id": "basic_flag1",
"type": "Flag",
"optional": true,
"command-line-flag": "-f",
"value-key": "[FLAG1]"
}
]
}
Each key in the invocation file should match the id field in an input described in the descriptor file. The descriptor contains information about the input, such as its type (e.g., file, string, flag), whether it is required or not, etc.
Here is a valid invocation file for the above descriptor:
{
"basic_param1": ".",
"basic_param2": "choice1",
"basic_flag1": true
}
If we pass these two files to Boutiques (or rather, bosh, the Boutiques CLI tool), it will combine them into the following command (and run it):
echo . choice1 -f
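To try this outside of Nipoppy, assuming the descriptor and invocation above are saved as descriptor.json and invocation.json (hypothetical file names), the equivalent bosh calls would look roughly like this:

$ bosh exec simulate descriptor.json -i invocation.json  # print the command without running it
$ bosh exec launch descriptor.json invocation.json       # build and run the command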
Hence, Boutiques allows Nipoppy to abstract away pipeline-specific parameters into JSON text files, giving it the flexibility to run many different kinds of pipelines!
See also
See the Boutiques tutorial for a much more comprehensive overview of Boutiques.
The default pipeline invocation files (in <DATASET_ROOT>/pipelines/<PIPELINE_NAME>-<PIPELINE_VERSION>) can be modified by changing existing values or adding new key-value pairs.
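After editing an invocation file, one way to sanity-check it (a sketch; adjust the paths for your pipeline, since the actual descriptor and invocation file names vary) is to validate it against the pipeline’s descriptor with bosh:

$ bosh invocation \
    <DATASET_ROOT>/pipelines/<PIPELINE_NAME>-<PIPELINE_VERSION>/<DESCRIPTOR_FILE> \
    -i <DATASET_ROOT>/pipelines/<PIPELINE_NAME>-<PIPELINE_VERSION>/<INVOCATION_FILE>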
Tip
Run the pipeline on a single participant and session with the --simulate flag to check/debug custom invocation files.
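For example (these options are the same ones described under Running an extraction pipeline below):

$ nipoppy extract \
    <DATASET_ROOT> \
    --pipeline <PIPELINE_NAME> \
    --participant-id <PARTICIPANT_ID> \
    --session-id <SESSION_ID> \
    --simulate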
Note
To account for invocations needing to be different for different participants and sessions (amongst other things), Nipoppy invocations are actually templates that are processed at runtime to replace template strings with actual values. Recognized template strings include:
- [[NIPOPPY_PARTICIPANT_ID]]: the participant ID without the sub- prefix
- [[NIPOPPY_SESSION_ID]]: the session ID without the ses- prefix
- [[NIPOPPY_BIDS_PARTICIPANT_ID]]: the participant ID with the sub- prefix
- [[NIPOPPY_BIDS_SESSION_ID]]: the session ID with the ses- prefix
- [[NIPOPPY_<LAYOUT_PROPERTY>]], where <LAYOUT_PROPERTY> is a property in the Nipoppy dataset layout configuration file (all uppercase): any path defined in the Nipoppy dataset layout

For example, for participant 01 and session 1, [[NIPOPPY_PARTICIPANT_ID]] would be replaced by 01 and [[NIPOPPY_BIDS_SESSION_ID]] by ses-1.
Running an extraction pipeline¶
Using the command-line interface¶
To process all participants and sessions in a dataset (sequentially), run:
$ nipoppy extract \
<DATASET_ROOT> \
--pipeline <PIPELINE_NAME>
where <PIPELINE_NAME> corresponds to the pipeline name as specified in the global configuration file.
Note
If there are multiple versions for the same pipeline in the global configuration file, use --pipeline-version to specify the desired version. By default, the first version listed for the pipeline will be used. Similarly, if --pipeline-step is not specified, the first step defined in the global configuration file will be used.
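For instance, to run a specific version and step of a pipeline (placeholder values shown):

$ nipoppy extract \
    <DATASET_ROOT> \
    --pipeline <PIPELINE_NAME> \
    --pipeline-version <PIPELINE_VERSION> \
    --pipeline-step <PIPELINE_STEP>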
The pipeline can also be run on a single participant and/or session (useful for batching on clusters and testing pipelines/configurations):
$ nipoppy extract \
<DATASET_ROOT> \
--pipeline <PIPELINE_NAME> \
--participant-id <PARTICIPANT_ID> \
--session-id <SESSION_ID>
Hint
The --simulate argument will make Nipoppy print out the command to be executed with Boutiques (instead of actually executing it). It can be useful for checking runtime parameters or debugging the invocation file.
See the CLI reference page for more information on additional optional arguments.
Note
Log files for this command will be written to <DATASET_ROOT>/logs/extract
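For example, to list the most recent log files (plain shell; assumes the pipeline has been run at least once):

$ ls -lt <DATASET_ROOT>/logs/extract | head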
Using the Python API¶
from nipoppy.workflows import ExtractionRunner

# replace by appropriate values
dpath_root = "<DATASET_ROOT>"
pipeline_name = "<PIPELINE_NAME>"

workflow = ExtractionRunner(
    dpath_root=dpath_root,
    pipeline_name=pipeline_name,
)
workflow.run()
See the API reference for nipoppy.workflows.ExtractionRunner for more information on optional arguments (they correspond to the ones for the CLI).
Next steps¶
Extracted IDPs are the end-goal of the current Nipoppy framework. There are no next steps after that, though we encourage the use of similar best practices to ensure the reproducibility of any downstream analysis step.