Populating an empty dataset

Once an empty Nipoppy dataset has been created, the next step is to manually populate it with the raw data available in the study.

In general, all data pertaining to the study should be stored together (i.e., under <DATASET_ROOT>), as that makes it easier to maintain the dataset, link data between modalities, and keep track of the available data.

Note

Depending on the study you are working with, there might not be any data to put in some of the directories described below – that is not an issue. On the other hand, if your study has data that does not seem to fit anywhere, you should still try to store it inside the Nipoppy dataset. You can create additional directories for non-imaging and non-tabular data under <DATASET_ROOT> or <DATASET_ROOT>/scratch (in moderation).

Summary

Prerequisites

Data directories

Directory                             Content description

<DATASET_ROOT>/downloads              Data archives, web downloads, etc. (imaging and non-imaging data)

<DATASET_ROOT>/scratch/raw_imaging    Arbitrarily organized raw imaging data (DICOMs or NIfTIs)

Data archives and web downloads

The <DATASET_ROOT>/downloads directory is for storing data archives (e.g., .zip, .tar, or .tar.gz files), or any file downloaded/moved from another location (e.g., spreadsheets for raw tabular data). An example of this would be file dumps downloaded from web portals (e.g., LONI).

There is no specification for the internal organization inside this directory, though it should be internally consistent. If downloads are made at multiple points in time, files should be labelled with a timestamp (and not overwritten).
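As an illustration of the timestamp recommendation, here is a minimal Python sketch that copies a file into the downloads directory under a timestamp-prefixed name and refuses to overwrite an existing file. The function name `store_download` and the `YYYYMMDD-HHMMSS_` naming scheme are assumptions made for this example; they are not part of Nipoppy, and any consistent scheme would work equally well.

```python
import shutil
from datetime import datetime
from pathlib import Path


def store_download(src, downloads_dir, when=None):
    """Copy `src` into `downloads_dir` with a timestamp prefix.

    Hypothetical helper (not a Nipoppy function). Raises FileExistsError
    instead of overwriting a previously stored file.
    """
    src = Path(src)
    downloads_dir = Path(downloads_dir)
    when = when or datetime.now()

    # Assumed naming scheme: <YYYYMMDD-HHMMSS>_<original file name>
    stamp = when.strftime("%Y%m%d-%H%M%S")
    dest = downloads_dir / f"{stamp}_{src.name}"

    if dest.exists():
        raise FileExistsError(f"Refusing to overwrite {dest}")

    downloads_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copy2 also preserves file modification times
    return dest
```

For example, `store_download("portal_dump.zip", "<DATASET_ROOT>/downloads")` (with `<DATASET_ROOT>` replaced by the actual path) would keep each download as a separate, dated file.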

Attention

If you have imaging data that does not need to be uncompressed/extracted (for example, if the BIDS conversion pipeline you plan to use can handle data archives), then it should not go in the <DATASET_ROOT>/downloads directory. Instead, those files should go directly to the appropriate imaging data directory.
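Conversely, an imaging archive that does need to be extracted can be kept in the downloads directory and unpacked into the raw imaging directory. The sketch below uses the standard-library `shutil.unpack_archive`; the helper name `extract_download` and the choice of one subdirectory per archive are assumptions made for illustration, not a Nipoppy convention. Only extract archives from sources you trust.

```python
import shutil
from pathlib import Path

# Archive suffixes handled by this sketch (longest first so ".tar.gz" wins over ".gz")
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar", ".zip")


def extract_download(archive, raw_imaging_dir):
    """Unpack `archive` into its own subdirectory of `raw_imaging_dir`.

    Hypothetical helper (not a Nipoppy function). The original archive
    is left untouched in the downloads directory.
    """
    archive = Path(archive)

    # Derive the subdirectory name by stripping the archive suffix
    name = archive.name
    for suffix in ARCHIVE_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break

    target = Path(raw_imaging_dir) / name
    if target.exists():
        raise FileExistsError(f"{target} already exists; not extracting again")

    target.mkdir(parents=True)
    shutil.unpack_archive(str(archive), str(target))
    return target
```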

Raw imaging data

The <DATASET_ROOT>/scratch/raw_imaging directory is for storing raw imaging data as they are, before any organization/processing is done. It is okay (and expected) for the data in this directory to be messy or to follow an arbitrary organization (e.g., many subfolder levels).

Data in this directory will typically consist of DICOM files from scanners, though some analyses might start with files in the NIfTI format instead (e.g., if DICOM-to-NIfTI conversion has already been done and the original DICOMs are no longer available).
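Because this directory can be arbitrarily organized, it may help to take a quick inventory of what it contains. The following sketch walks the directory tree and counts files by apparent format. It rests on two assumptions: NIfTI files are recognized only by their `.nii` or `.nii.gz` extension, and DICOM files are recognized by the `DICM` marker that follows the 128-byte preamble in standard DICOM files (older DICOM files without that preamble would be counted as "other"). The function names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def guess_format(path):
    """Return 'nifti', 'dicom', or 'other' for a single file (heuristic)."""
    name = path.name.lower()
    if name.endswith((".nii", ".nii.gz")):
        return "nifti"

    # Standard DICOM files have a 128-byte preamble followed by b"DICM"
    with path.open("rb") as f:
        f.seek(128)
        if f.read(4) == b"DICM":
            return "dicom"
    return "other"


def inventory(raw_imaging_dir):
    """Count files under `raw_imaging_dir` (recursively) by guessed format."""
    files = (p for p in Path(raw_imaging_dir).rglob("*") if p.is_file())
    return Counter(guess_format(p) for p in files)
```

Calling `inventory` on the raw imaging directory returns a `Counter` keyed by `"dicom"`, `"nifti"`, and `"other"`, which gives a rough indication of whether DICOMs are available for BIDS conversion.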

Attention

If both DICOMs and NIfTIs are available, we recommend starting over with the DICOMs since they contain more information than NIfTIs for BIDS conversion. If that is not feasible, then <DATASET_ROOT>/scratch/raw_imaging should contain the NIfTIs, and the raw DICOMs can be archived and stored somewhere else (e.g., <DATASET_ROOT>/downloads).

Next steps

For imaging data, the next step is to reorganize the data in a way that prepares it for BIDS conversion.

If you have tabular non-imaging data (e.g., demographic or assessment data), see the guidelines for wrangling and linking tabular data.