Organizing raw imaging data¶
To use Nipoppy to convert imaging data to the BIDS standard, the data first needs to be organized in a way that Nipoppy can understand and pass to underlying BIDS converters (see Converting a dataset to BIDS). Since different studies typically follow their own methods for raw imaging data organization, this step may require the creation of a custom mapping file or the overriding of some existing methods in the Python API.
Summary¶
Prerequisites¶
A Nipoppy dataset with a valid global configuration file and an accurate manifest
See the Quickstart guide for instructions on how to set up a new dataset
Raw imaging data in <DATASET_ROOT>/sourcedata/imaging/pre_reorg
Data directories¶
Directory | Content description |
---|---|
<DATASET_ROOT>/sourcedata/imaging/pre_reorg | Input – Arbitrarily organized raw imaging data (DICOMs or NIfTIs) |
<DATASET_ROOT>/sourcedata/imaging/post_reorg | Output – Raw imaging data (DICOMs or NIfTIs) organized in a way that facilitates BIDS conversion |
Commands¶
Command-line interface: nipoppy reorg
Python API: nipoppy.workflows.DicomReorgWorkflow
Workflow¶
Nipoppy will loop over all participants/sessions that have data in <DATASET_ROOT>/sourcedata/imaging/pre_reorg but do not have data in <DATASET_ROOT>/sourcedata/imaging/post_reorg, according to the curation status file
If the curation status file does not exist, it will be automatically generated
If there is an existing curation status file but it does not have all the rows in the manifest, new entries will be automatically added to the curation status file
The curation status file can also be completely regenerated with nipoppy track-curation --regenerate
For each participant-session pair:
Files from the <DATASET_ROOT>/sourcedata/imaging/pre_reorg directory will be “copied” (the default is to create symlinks) to the <DATASET_ROOT>/sourcedata/imaging/post_reorg directory into a flat list
The curation status file is updated to indicate that this participant-session pair now has data in <DATASET_ROOT>/sourcedata/imaging/post_reorg
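The selection step of this workflow can be sketched in plain Python. This is only an illustration of the logic, not Nipoppy's implementation: the helper name pairs_to_reorg and the boolean column names in_pre_reorg and in_post_reorg are assumptions made for the example.

```python
def pairs_to_reorg(curation_status):
    """Return (participant_id, session_id) pairs that still need reorganization.

    `curation_status` is a list of dict rows. The boolean column names
    `in_pre_reorg` and `in_post_reorg` are assumed for this sketch.
    """
    return [
        (row["participant_id"], row["session_id"])
        for row in curation_status
        if row["in_pre_reorg"] and not row["in_post_reorg"]
    ]


# Toy curation status table (hypothetical values)
rows = [
    {"participant_id": "01", "session_id": "1", "in_pre_reorg": True, "in_post_reorg": True},
    {"participant_id": "01", "session_id": "2", "in_pre_reorg": True, "in_post_reorg": False},
    {"participant_id": "02", "session_id": "1", "in_pre_reorg": False, "in_post_reorg": False},
]
todo = pairs_to_reorg(rows)
print(todo)
```

Only participant 01, session 2 is selected here: it has data before reorganization but none after.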
Configuring the reorganization¶
By default, Nipoppy expects “participant-first” organization, like the following:
<DATASET_ROOT>
└── sourcedata/
└── imaging/
└── pre_reorg/
└── 01/ # participant subdirectory
├── 1/ # session subdirectory
│ ├── protocol1/ # arbitrary DICOM subtree
│ │ ├── 100.dcm
│ │ ├── 101.dcm
│ │ └── 102.dcm
│ └── protocol2/
│ ├── 200.dcm
│ ├── 201.dcm
│ └── 202.dcm
└── 2/
├── protocol3/
│ ├── 300.dcm
│ ├── 301.dcm
│ └── 302.dcm
└── protocol4/
├── 400.dcm
├── 401.dcm
└── 402.dcm
All files in participant-session subdirectories (and sub-subdirectories, if applicable) will be reorganized under <DATASET_ROOT>/sourcedata/imaging/post_reorg/sub-<PARTICIPANT_ID>/ses-<SESSION_ID> (note the addition of BIDS prefixes), creating a flat list of files, like this:
<DATASET_ROOT>
└── sourcedata/
└── imaging/
└── post_reorg/
└── sub-01/
├── ses-1/
│ ├── 100.dcm # flat list of DICOM files
│ ├── 101.dcm
│ ├── 102.dcm
│ ├── 200.dcm
│ ├── 201.dcm
│ └── 202.dcm
└── ses-2/
├── 300.dcm
├── 301.dcm
├── 302.dcm
├── 400.dcm
├── 401.dcm
└── 402.dcm
By default, the output files will be relative symbolic links (“symlinks”) to avoid duplication of files.
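As a rough illustration of what flattening a subtree into relative symlinks involves, here is a minimal standard-library sketch. It is not Nipoppy's code: the function name flatten_with_symlinks and the choice to raise an error on duplicate filenames are assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path


def flatten_with_symlinks(dpath_src: Path, dpath_dest: Path) -> list[Path]:
    """Link every file found under dpath_src into dpath_dest as a flat list."""
    dpath_dest.mkdir(parents=True, exist_ok=True)
    created = []
    for fpath in sorted(dpath_src.rglob("*")):
        if not fpath.is_file():
            continue
        fpath_dest = dpath_dest / fpath.name
        if fpath_dest.exists() or fpath_dest.is_symlink():
            # A flat list cannot hold two files with the same name
            raise FileExistsError(f"Duplicate filename: {fpath.name}")
        # Use a relative target so the link survives moving the dataset root
        fpath_dest.symlink_to(os.path.relpath(fpath, start=dpath_dest))
        created.append(fpath_dest)
    return created


# Demo on a throwaway directory that mimics the participant-first layout
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "pre_reorg" / "01" / "1"
    for rel in ("protocol1/100.dcm", "protocol2/200.dcm"):
        (src / rel).parent.mkdir(parents=True, exist_ok=True)
        (src / rel).write_bytes(b"")
    dest = Path(tmp) / "post_reorg" / "sub-01" / "ses-1"
    linked_names = [p.name for p in flatten_with_symlinks(src, dest)]
    links_are_relative = all(
        not os.path.isabs(os.readlink(dest / name)) for name in linked_names
    )
    links_resolve = all((dest / name).exists() for name in linked_names)
```

Because the output is flat, files with the same name in different subdirectories would collide. The sketch raises an error in that case; how Nipoppy itself handles this situation is not covered here.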
If "DICOM_DIR_PARTICIPANT_FIRST" is set to "false" in the global configuration file, then Nipoppy will instead expect session-level directories with nested participant-level directories (e.g., <DATASET_ROOT>/sourcedata/imaging/pre_reorg/1/01 for the above example).
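The two default layouts differ only in the order of the participant and session directory levels. The sketch below shows that difference. The function expected_pre_reorg_dir and its participant_first argument are hypothetical names used for illustration and are not part of the Nipoppy API.

```python
from pathlib import Path


def expected_pre_reorg_dir(
    dpath_pre_reorg, participant_id, session_id, participant_first=True
):
    """Build the input directory expected for one participant-session pair."""
    dpath_pre_reorg = Path(dpath_pre_reorg)
    if participant_first:
        # Participant-first layout: <pre_reorg>/<participant>/<session>
        return dpath_pre_reorg / participant_id / session_id
    # Session-first layout: <pre_reorg>/<session>/<participant>
    return dpath_pre_reorg / session_id / participant_id


root = Path("sourcedata/imaging/pre_reorg")
print(expected_pre_reorg_dir(root, "01", "1").as_posix())
print(expected_pre_reorg_dir(root, "01", "1", participant_first=False).as_posix())
```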
If the raw imaging data are not organized in either of these two structures, a custom tab-separated file can be created to map each unique participant-session pair to a directory path (relative to <DATASET_ROOT>/sourcedata/imaging/pre_reorg). The path to this mapping file must be specified in the "DICOM_DIR_MAP_FILE" field of the global configuration file. See the schema reference for more information.
Here is an example file for a dataset that already uses the ses- prefix for sessions:
participant_id | session_id | participant_dicom_dir |
---|---|---|
01 | 1 | 01/ses-1 |
01 | 2 | 01/ses-2 |
02 | 1 | 02/ses-1 |
Raw content of the example DICOM directory mapping file
participant_id	session_id	participant_dicom_dir
01	1	01/ses-1
01	2	01/ses-2
02	1	02/ses-1
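To check a mapping file before running the reorganization, it can be read with Python's standard csv module. The sketch below parses the example content above into a lookup table keyed by participant and session. The helper name load_dicom_dir_map is an assumption for illustration, and this is not how Nipoppy itself loads the file.

```python
import csv
import io

# Same content as the example mapping file, with tab-separated columns
MAPPING_TSV = (
    "participant_id\tsession_id\tparticipant_dicom_dir\n"
    "01\t1\t01/ses-1\n"
    "01\t2\t01/ses-2\n"
    "02\t1\t02/ses-1\n"
)


def load_dicom_dir_map(fobj):
    """Map (participant_id, session_id) to a directory relative to pre_reorg."""
    reader = csv.DictReader(fobj, delimiter="\t")
    return {
        (row["participant_id"], row["session_id"]): row["participant_dicom_dir"]
        for row in reader
    }


dicom_dir_map = load_dicom_dir_map(io.StringIO(MAPPING_TSV))
print(dicom_dir_map[("01", "2")])
```

The csv module reads every value as a string, so IDs with leading zeros such as 01 are preserved. For a file on disk, pass an open file handle instead of the io.StringIO object.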
Note
More granular customization of both the input file paths and the output file names is also possible; see Customizing input and output file paths.
Running the reorganization¶
Using the command-line interface¶
$ nipoppy reorg --dataset <DATASET_ROOT>
See the CLI reference page for more information on optional arguments (e.g., reading DICOM headers to check the image type, and copying files instead of creating symlinks).
Note
Log files for this command will be written to <DATASET_ROOT>/logs/dicom_reorg
Using the Python API¶
from nipoppy.workflows import DicomReorgWorkflow
dpath_root = "." # replace by path to dataset root directory
workflow = DicomReorgWorkflow(dpath_root=dpath_root)
workflow.run()
See the API reference for nipoppy.workflows.DicomReorgWorkflow for more information on optional arguments (they correspond to the ones for the CLI).
Customizing input and output file paths¶
There may be datasets where the raw imaging files are not organized in participant-session directories. An example of this would be a dataset whose raw DICOM files are in archives, like so:
In this case, using a DICOM directory mapping file as described above is not enough, since files from different imaging sessions are in the same directory.
The nipoppy.workflows.DicomReorgWorkflow class exposes two methods for finer control of input paths and output filenames:
nipoppy.workflows.DicomReorgWorkflow.get_fpaths_to_reorg() can be overridden to map a participant ID and session ID to a list of absolute filepaths to be reorganized
nipoppy.workflows.DicomReorgWorkflow.apply_fname_mapping() can be overridden to rename output files
Note: output files will still be in the <DATASET_ROOT>/sourcedata/imaging/post_reorg/sub-<PARTICIPANT_ID>/ses-<SESSION_ID> directory
Here is an example of a custom imaging data reorganization script:
"""Example script for custom DICOM reorganization."""
import argparse
from pathlib import Path

from nipoppy.logger import add_logfile
from nipoppy.workflows import DicomReorgWorkflow


class CustomDicomReorgWorkflow(DicomReorgWorkflow):
    """Custom workflow class that overrides two methods from DicomReorgWorkflow."""

    def get_fpaths_to_reorg(self, participant_id: str, session_id: str) -> list[Path]:
        """
        Get full file paths to reorganize for a single participant and session.

        Here we return a list with only a single path, but more than one path
        can be specified.
        """
        # self.layout.dpath_pre_reorg will dynamically generate the path to the
        # dataset's (unorganized) raw imaging data
        return [self.layout.dpath_pre_reorg / participant_id / f"{session_id}.tar.gz"]

    def apply_fname_mapping(
        self, fname_source: str, participant_id: str, session_id: str
    ) -> str:
        """
        Name the files differently in the sourcedata directory.

        Here we ignore fname_source (the original filename) and return a string
        that will be used as the new filename.

        Note: this only controls the name of the file. Its parent directories will
        have fixed names that cannot be changed.
        """
        return f"{participant_id}-{session_id}.tar.gz"


if __name__ == "__main__":
    # use a command-line parser
    parser = argparse.ArgumentParser(
        description="Run the custom DICOM reorganization workflow."
    )
    parser.add_argument(
        "dataset_root",
        type=Path,
        help="Root directory of Nipoppy dataset",
    )
    args = parser.parse_args()

    # initialize workflow
    workflow = CustomDicomReorgWorkflow(dpath_root=args.dataset_root)

    # set up logging to a file
    logger = workflow.logger
    add_logfile(logger, workflow.generate_fpath_log())

    # run the workflow
    try:
        workflow.run()
    except Exception:
        logger.exception(
            "An error occurred with the custom DICOM reorganization script"
        )
Running this script on the data shown above will create the following organized files (by default symlinks):
Next steps¶
Now that the raw imaging data has been organized in a standardized participant-session structure, it is ready for BIDS conversion!