Organizing raw imaging data

To use Nipoppy to convert imaging data to the BIDS standard, the data first needs to be organized in a way that Nipoppy can understand and pass to underlying BIDS converters (see Converting a dataset to BIDS). Since different studies typically follow their own methods for raw imaging data organization, this step may require the creation of a custom mapping file or the overriding of some existing methods in the Python API.

Summary

Prerequisites

  • A Nipoppy dataset with a valid global configuration file and an accurate manifest

  • Raw imaging data in <DATASET_ROOT>/sourcedata/imaging/pre_reorg

Data directories

| Directory | Content description |
|---|---|
| <DATASET_ROOT>/sourcedata/imaging/pre_reorg | Input: arbitrarily organized raw imaging data (DICOMs or NIfTIs) |
| <DATASET_ROOT>/sourcedata/imaging/post_reorg | Output: raw imaging data (DICOMs or NIfTIs) organized in a way that facilitates BIDS conversion |

Commands

  • Command-line interface: nipoppy reorg

  • Python API: nipoppy.workflows.DicomReorgWorkflow

Workflow

  1. Nipoppy will loop over all participants/sessions that have data in <DATASET_ROOT>/sourcedata/imaging/pre_reorg but do not have data in <DATASET_ROOT>/sourcedata/imaging/post_reorg according to the curation status file

    • If the curation status file does not exist, it will be automatically generated

    • If there is an existing curation status file but it does not have all the rows in the manifest, new entries will be automatically added to the curation status file

    • The curation status file can also be completely regenerated with nipoppy track-curation --regenerate

  2. For each participant-session pair:

    1. Files from the <DATASET_ROOT>/sourcedata/imaging/pre_reorg directory will be “copied” (the default is to create symlinks) to the <DATASET_ROOT>/sourcedata/imaging/post_reorg directory as a flat list

    2. The curation status file is updated to indicate that this participant-session pair now has data in <DATASET_ROOT>/sourcedata/imaging/post_reorg
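The per-pair step above can be pictured with a small standard-library sketch. This is not Nipoppy's implementation: the function name flatten_into and its copy_files parameter are hypothetical, and raising an error on duplicate filenames is an assumption made here for safety.

```python
import os
import shutil
from pathlib import Path


def flatten_into(
    dpath_source: Path, dpath_dest: Path, copy_files: bool = False
) -> list[Path]:
    """Place every file found under dpath_source directly inside dpath_dest.

    Hypothetical helper for illustration only.
    """
    dpath_dest.mkdir(parents=True, exist_ok=True)
    created = []
    # walk the arbitrary subtree and keep regular files only
    for fpath in sorted(p for p in dpath_source.rglob("*") if p.is_file()):
        fpath_dest = dpath_dest / fpath.name
        # a flat list cannot hold two files with the same name (assumed policy)
        if fpath_dest.exists() or fpath_dest.is_symlink():
            raise FileExistsError(f"Filename collision in flat list: {fpath_dest}")
        if copy_files:
            shutil.copy2(fpath, fpath_dest)
        else:
            # relative symlink, expressed from the destination directory
            fpath_dest.symlink_to(os.path.relpath(fpath, start=dpath_dest))
        created.append(fpath_dest)
    return created
```

Calling flatten_into(pre_reorg / "01" / "1", post_reorg / "sub-01" / "ses-1") on a layout like the one described in this guide would produce one link per source file, regardless of how deeply the source files are nested.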

Configuring the reorganization

By default, Nipoppy expects “participant-first” organization, like the following:

<DATASET_ROOT>
└── sourcedata/
    └── imaging/
        └── pre_reorg/
            └── 01/                  # participant subdirectory
                ├── 1/               # session subdirectory
                │   ├── protocol1/   # arbitrary DICOM subtree
                │   │   ├── 100.dcm
                │   │   ├── 101.dcm
                │   │   └── 102.dcm
                │   └── protocol2/
                │       ├── 200.dcm
                │       ├── 201.dcm
                │       └── 202.dcm
                └── 2/
                    ├── protocol3/
                    │   ├── 300.dcm
                    │   ├── 301.dcm
                    │   └── 302.dcm
                    └── protocol4/
                        ├── 400.dcm
                        ├── 401.dcm
                        └── 402.dcm

All files in participant-session subdirectories (and sub-subdirectories, if applicable) will be reorganized under <DATASET_ROOT>/sourcedata/imaging/post_reorg/sub-<PARTICIPANT_ID>/ses-<SESSION_ID> (note the addition of BIDS prefixes), creating a flat list of files, like this:

<DATASET_ROOT>
└── sourcedata/
    └── imaging/
        └── post_reorg/
            └── sub-01/
                ├── ses-1/
                │   ├── 100.dcm  # flat list of DICOM files
                │   ├── 101.dcm
                │   ├── 102.dcm
                │   ├── 200.dcm
                │   ├── 201.dcm
                │   └── 202.dcm
                └── ses-2/
                    ├── 300.dcm
                    ├── 301.dcm
                    ├── 302.dcm
                    ├── 400.dcm
                    ├── 401.dcm
                    └── 402.dcm

By default, the output files will be relative symbolic links (“symlinks”) to avoid duplication of files.
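Because symlinks only point at the original files, moving or deleting data in pre_reorg afterwards leaves dangling links in post_reorg. The following is a minimal sketch for spotting them using only the standard library; find_broken_symlinks is a hypothetical helper and not part of Nipoppy.

```python
import os
from pathlib import Path


def find_broken_symlinks(dpath_post_reorg: Path) -> list[Path]:
    """Return symlinks under dpath_post_reorg whose targets no longer exist.

    Hypothetical helper for illustration only.
    """
    broken = []
    for dpath, _, fnames in os.walk(dpath_post_reorg):
        for fname in fnames:
            fpath = Path(dpath) / fname
            # Path.exists() follows the link, so it is False for a dangling link
            if fpath.is_symlink() and not fpath.exists():
                broken.append(fpath)
    return sorted(broken)
```

An empty list means every link under the given directory still resolves to a file.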

If "DICOM_DIR_PARTICIPANT_FIRST" is set to "false" in the global configuration file, then Nipoppy will instead expect session-level directories with nested participant-level directories (e.g., <DATASET_ROOT>/sourcedata/imaging/pre_reorg/1/01 for the above example).
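To make the two supported layouts concrete, here is a short sketch of where the input directory for a participant-session pair would be expected in each case. The function expected_pre_reorg_dir and its participant_first parameter are hypothetical names that only mirror the configuration field described above.

```python
from pathlib import Path


def expected_pre_reorg_dir(
    dpath_pre_reorg: Path,
    participant_id: str,
    session_id: str,
    participant_first: bool = True,
) -> Path:
    """Return the expected input directory for one participant-session pair.

    Hypothetical helper for illustration only.
    """
    if participant_first:
        # default layout: pre_reorg/<participant>/<session>
        return dpath_pre_reorg / participant_id / session_id
    # alternative layout: pre_reorg/<session>/<participant>
    return dpath_pre_reorg / session_id / participant_id
```

For participant 01 and session 1, the first branch gives pre_reorg/01/1 and the second gives pre_reorg/1/01.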

If the raw imaging data are not organized in either of these two structures, a custom tab-separated file can be created to map each unique participant-session pair to a directory path (relative to <DATASET_ROOT>/sourcedata/imaging/pre_reorg). The path to this mapping file must be specified in the "DICOM_DIR_MAP_FILE" field of the global configuration file. See the schema reference for more information.

Here is an example file for a dataset that already uses the ses- prefix for sessions:

| participant_id | session_id | participant_dicom_dir |
|---|---|---|
| 01 | 1 | 01/ses-1 |
| 01 | 2 | 01/ses-2 |
| 02 | 1 | 02/ses-1 |
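For larger datasets, a mapping file like the one above could be generated rather than typed by hand. The sketch below assumes the raw data follow a <participant>/ses-<session> layout, as in the example; build_dicom_dir_map and write_dicom_dir_map are hypothetical helpers, and only the three column names come from the example file.

```python
import csv
from pathlib import Path

COLUMNS = ["participant_id", "session_id", "participant_dicom_dir"]


def build_dicom_dir_map(dpath_pre_reorg: Path) -> list[dict]:
    """Collect one row per <participant>/ses-<session> directory (assumed layout)."""
    rows = []
    for dpath_session in sorted(dpath_pre_reorg.glob("*/ses-*")):
        if not dpath_session.is_dir():
            continue
        rows.append(
            {
                "participant_id": dpath_session.parent.name,
                # the session ID is stored without the "ses-" prefix
                "session_id": dpath_session.name.removeprefix("ses-"),
                # path relative to the pre_reorg directory
                "participant_dicom_dir": dpath_session.relative_to(
                    dpath_pre_reorg
                ).as_posix(),
            }
        )
    return rows


def write_dicom_dir_map(rows: list[dict], fpath_map: Path) -> None:
    """Write the rows as a tab-separated file with a header line."""
    with fpath_map.open("w", newline="") as file_map:
        writer = csv.DictWriter(
            file_map, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
```

The resulting file still needs to be referenced from the global configuration file; the sketch only covers producing its contents.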

Note

More granular customization is also possible for both the input file paths and the output file names; see Customizing input and output file paths.

Running the reorganization

Using the command-line interface

$ nipoppy reorg --dataset <DATASET_ROOT>

See the CLI reference page for more information on optional arguments (e.g., reading DICOM headers to check the image type, and copying files instead of creating symlinks).

Note

Log files for this command will be written to <DATASET_ROOT>/logs/dicom_reorg

Using the Python API

from nipoppy.workflows import DicomReorgWorkflow

dpath_root = "."  # replace with the path to the dataset root directory
workflow = DicomReorgWorkflow(dpath_root=dpath_root)
workflow.run()

See the API reference for nipoppy.workflows.DicomReorgWorkflow for more information on optional arguments (they correspond to the ones for the CLI).

Customizing input and output file paths

There may be datasets where the raw imaging files are not organized in participant-session directories. An example of this would be a dataset whose raw DICOM files are in archives, like so:

<DATASET_ROOT>
└── sourcedata/
    └── imaging/
        └── pre_reorg/
            ├── 01/           # participant subdirectory
            │   ├── 1.tar.gz  # archive for session 1
            │   └── 2.tar.gz  # archive for session 2
            └── 02/
                ├── 1.tar.gz
                └── 2.tar.gz

In this case, using a DICOM directory mapping file as described above is not enough, since files from different imaging sessions are in the same directory.

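The nipoppy.workflows.DicomReorgWorkflow class exposes two methods for finer control of input paths and output filenames:

  • get_fpaths_to_reorg() returns the full paths of the files to reorganize for a single participant and session

  • apply_fname_mapping() returns the output filename to use for a given source filename, participant and session

Here is an example of a custom imaging data reorganization script: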

"""Example script for custom DICOM reorganization."""

import argparse
from pathlib import Path

from nipoppy.logger import add_logfile
from nipoppy.workflows import DicomReorgWorkflow


class CustomDicomReorgWorkflow(DicomReorgWorkflow):
    """Custom workflow class that overrides two methods from DicomReorgWorkflow."""

    def get_fpaths_to_reorg(self, participant_id: str, session_id: str) -> list[Path]:
        """
        Get full file paths to reorganize for a single participant and session.

        Here we return a list with only a single path, but more than one path
        can be specified.
        """
        # self.layout.dpath_pre_reorg will dynamically generate the path to the
        # dataset's (unorganized) raw imaging data
        return [self.layout.dpath_pre_reorg / participant_id / f"{session_id}.tar.gz"]

    def apply_fname_mapping(
        self, fname_source: str, participant_id: str, session_id: str
    ) -> str:
        """
        Name the files differently in the sourcedata directory.

        Here we ignore fname_source (the original filename) and return a string
        that will be used as the new filename.

        Note: this only controls the name of the file. Its parent directories will
        have fixed names that cannot be changed.
        """
        return f"{participant_id}-{session_id}.tar.gz"


if __name__ == "__main__":
    # use a command-line parser
    parser = argparse.ArgumentParser(
        description="Run the custom DICOM reorganization workflow."
    )
    parser.add_argument(
        "dataset_root",
        type=Path,
        help="Root directory of Nipoppy dataset",
    )
    args = parser.parse_args()

    # initialize workflow
    workflow = CustomDicomReorgWorkflow(dpath_root=args.dataset_root)

    # set up logging to a file
    logger = workflow.logger
    add_logfile(logger, workflow.generate_fpath_log())

    # run the workflow
    try:
        workflow.run()
    except Exception:
        logger.exception(
            "An error occurred with the custom DICOM reorganization script"
        )

Running this script on the data shown above will create the following organized files (by default symlinks):

<DATASET_ROOT>
└── sourcedata/
    └── imaging/
        └── post_reorg/
            ├── sub-01/
            │   ├── ses-1/
            │   │   └── 01-1.tar.gz
            │   └── ses-2/
            │       └── 01-2.tar.gz
            └── sub-02/
                ├── ses-1/
                │   └── 02-1.tar.gz
                └── ses-2/
                    └── 02-2.tar.gz
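The output tree above combines the fixed sub-/ses- parent directories with the filename returned by the custom apply_fname_mapping method. As a rough sketch of that combination (planned_destination is a hypothetical helper, not a Nipoppy function, and it simply repeats the naming rule used in the script):

```python
from pathlib import Path


def planned_destination(
    dpath_post_reorg: Path, participant_id: str, session_id: str
) -> Path:
    """Return where the archive for one participant-session pair would land.

    Hypothetical helper for illustration only.
    """
    # same naming rule as apply_fname_mapping in the example script
    fname = f"{participant_id}-{session_id}.tar.gz"
    # parent directories carry the BIDS prefixes
    return dpath_post_reorg / f"sub-{participant_id}" / f"ses-{session_id}" / fname
```

For participant 02 and session 2, this gives post_reorg/sub-02/ses-2/02-2.tar.gz, matching the last entry in the tree.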

Next steps

Now that the raw imaging data has been organized in a standardized participant-session structure, it is ready for BIDS conversion!