Setting Up A Data Configuration File (Participant/Subject List)

Overview

C-PAC’s Configuration YAML Files

C-PAC requires at least one pipeline configuration file and one data configuration file (also known as the participant list or subject list) in order to run an analysis. These configuration files use the YAML file format, which stores content as key: value pairs, much like a dictionary. This section focuses on setting up the data configuration file.

The Data Configuration (Participant List)

The data configuration file is essentially a list of file paths to anatomical and functional MRI scans, keyed by their unique IDs and listed with any additional information as necessary. This file can be generated either through the GUI or from the terminal; both methods are explained below.

UPDATE (May 15, 2018): C-PAC v1.1.0 and Later

The data configuration file layout has changed significantly from older versions. Functional scan file paths and scan parameter information (or scan parameter file paths) are now nested beneath the “func:” key. See the “Data Configuration File YAML Fields” section below for examples.

If you have existing data configuration files and wish to update them for the new version, you can use your existing data settings preset files to regenerate the data configuration files with your new or upgraded install of C-PAC v1.1.0. If you have any questions, please feel free to reach us at our support forum.

Using the GUI

The C-PAC data configuration builder window accepts information about where to find your data, and allows you to customize what gets included in the final list. Once you’re done, it generates a data settings YAML file, which saves your preferences, so that you can easily edit and re-generate your data configuration YAML file later on.

_images/subject_list_gui.png
  1. Data format - [BIDS, Custom]: Whether or not the data is organized in accordance with the BIDS specification. More details below.
  2. BIDS Base Directory - [path]: The base directory of the BIDS-organized data, if you are using BIDS.
  3. Anatomical File Path Template - [text]: If the data is NOT in BIDS format, you can provide a file path template describing the anatomical scans here. More details below.
  4. Functional File Path Template - [text]: If the data is NOT in BIDS format, you can provide a file path template describing the functional scans here. More details below.
  5. Save Config Files Here - [path]: The directory where you want the data configuration builder to save both the data settings file (these configured options) and the data configuration file (the list of input data to be provided to C-PAC).
  6. Participant List Name - [text]: The name/label for your data configuration and data settings files.
  7. (Optional) Which Anatomical Scan? - [text]: Sometimes, there are multiple anatomical scans per participant in a dataset. To make life easier, you can tell the data configuration builder which anatomical scan to select for each participant by entering a sub-string here that can be found in the name or label of the anatomical scan you’d like to use for the run. Also, if you are using the Custom Anatomical File Path Template, you can enter a wildcard (*) in the path template in the anatomical scan file name, and the sub-string you enter here will determine which of the files returned by that wildcard is written into the data configuration.
  8. (Optional) AWS Credentials File - [path]: Required if downloading data from a non-public S3 bucket on Amazon Web Services (AWS). This usually takes the form of a CSV file.
  9. (Optional) Scan Parameters File - [path]: Path to a CSV file specifying the slice time acquisition parameters for scans. If set to ‘None’, these parameters will either be defined by the NIfTI headers or by an explicit slice order specified in the pipeline configuration builder. Instructions for creating this CSV file can be found here. Note: If your data is in BIDS format, the data configuration builder will read the scan parameters described in the data’s affiliated JSON file(s), if they exist, and a scan parameters CSV file is not required.
  10. (Optional) Field Map Phase File Path Template - [text]: If you are running field map-based distortion correction, AND your data is not in BIDS format, provide the file path template to your phase files here. If your data is in BIDS format, the data configuration builder will find these files automatically.
  11. (Optional) Field Map Magnitude File Path Template - [text]: If you are running field map-based distortion correction, AND your data is not in BIDS format, provide the file path template to your magnitude difference files here. If your data is in BIDS format, the data configuration builder will find these files automatically.
  12. (Optional) Include: Subjects - [text/path]: The participant IDs to include; only these participants will appear in the list. Either enter them here (ex. “1001, 1002, 1007, ..”), or enter the file path of a text file containing each participant ID on its own line.
  13. (Optional) Exclude: Subjects - [text/path]: The same as above, except that the participants you list here are excluded. Useful when you only need to drop a few participants from a long list.
  14. (Optional) Include: Sites - [text/path]: Which sites to include - can be a list or a text file, as described above.
  15. (Optional) Exclude: Sites - [text/path]: Which sites to exclude - can be a list or a text file, as described above.
  16. (Optional) Include: Sessions - [text/path]: Which sessions to include - can be a list or a text file, as described above.
  17. (Optional) Exclude: Sessions - [text/path]: Which sessions to exclude - can be a list or a text file, as described above.
  18. (Optional) Include: Series - [text/path]: Which series to include - can be a list or a text file, as described above.
  19. (Optional) Exclude: Series - [text/path]: Which series to exclude - can be a list or a text file, as described above.
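
The include and exclude fields (items 12 to 19) accept either a comma-separated list or the path to a text file with one ID per line. A minimal Python sketch of how such a value might be interpreted is shown here. The helpers `parse_id_field` and `apply_filters` are hypothetical and are not part of C-PAC, and applying the exclusion after the inclusion is an assumption rather than documented behavior.

```python
import os


def parse_id_field(value):
    """Return the IDs named by an include/exclude field (hypothetical helper).

    `value` is either a comma-separated string such as "1001, 1002, 1007"
    or the path to a text file holding one ID per line.
    """
    if value is None or not value.strip():
        return []
    if os.path.isfile(value):
        with open(value) as handle:
            return [line.strip() for line in handle if line.strip()]
    return [item.strip() for item in value.split(",") if item.strip()]


def apply_filters(all_ids, include=None, exclude=None):
    """Keep the IDs named in `include` (if any), then drop those in `exclude`."""
    included = parse_id_field(include)
    excluded = set(parse_id_field(exclude))
    kept = [i for i in all_ids if not included or i in included]
    return [i for i in kept if i not in excluded]


participants = ["1001", "1002", "1003", "1007"]
print(apply_filters(participants, include="1001, 1002, 1007", exclude="1002"))
# prints ['1001', '1007']
```

Leaving both fields empty keeps every ID, which matches the fields being optional.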

Continue below for some example use cases.

From Terminal

You can configure the settings explained above in the data settings YAML file, then use the C-PAC command-line interface to generate your data configuration file.

If you don’t already have a data settings YAML file, you can generate one in your current directory by running:

cpac utils data_config new_template

You can then configure the file as needed. Once your data settings file is ready, you can generate your data configuration file by running:

cpac utils data_config build /path/to/data_settings.yml

Continue below for some example use cases.

Data: BIDS Format

A full description of the BIDS data organization specification can be found at bids.neuroimaging.io.

This is the simplest option. As the data is in BIDS format, the C-PAC data configuration builder will know where to find all of the input files, the scan parameters (if available), site information, and field map files (if applicable). The inclusion and exclusion options for the different data levels (participant, site, etc.) work as usual.

Using the GUI

Select “BIDS” as your Data Format, and specify where to save the configuration files and the participant list name. Then, provide the BIDS Base Directory, which is the top-most directory level within which your BIDS-organized data set is stored. Click Generate Data Config in the bottom-right corner to generate the data configuration, and to also save this setup into a data settings file. If you only want to save the settings to generate the data configuration for later, click Save Preset.

Using the C-PAC command-line interface (CLI)

In the data settings file, populate these fields:

dataFormat:                  ['BIDS']
bidsBaseDir:                 /path/to/BIDS/directory
outputSubjectListLocation:   /save/configs/here
subjectListName:             data_config_name

You can also fill in the AWS credentials file field, and the inclusion and exclusion fields, as needed.

Once your data settings file is ready, generate your data configuration file by running:

cpac utils data_config build /path/to/data_settings.yml

Data: Custom Layout

The C-PAC data configuration builder can handle a wide range of directory layouts, but it can only do so seamlessly if all of your data follows a single layout. If your input files are arranged in different ways, generate a separate data configuration file for each layout, then manually append one to the end of the other in a text editor.
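
Appending one data configuration file to another can also be scripted. The following is a minimal Python sketch, not a C-PAC utility: `append_data_configs` and the file names in the usage comment are hypothetical. It assumes both files are plain lists of participant entries, each starting with a dash at the same indentation, so that joining the text yields one longer list.

```python
from pathlib import Path


def append_data_configs(first, second, combined):
    """Write the entries of `first` followed by those of `second` to `combined`."""
    parts = []
    for path in (first, second):
        text = Path(path).read_text()
        # Ensure each file ends with a newline so entries do not run together.
        parts.append(text if text.endswith("\n") else text + "\n")
    Path(combined).write_text("".join(parts))
    return combined


# Hypothetical usage:
# append_data_configs("data_config_layoutA.yml",
#                     "data_config_layoutB.yml",
#                     "data_config_combined.yml")
```

It is worth opening the combined file afterwards to confirm that no participant appears twice.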

Using the GUI

Set your Data Format to “Custom”, and leave the BIDS Base Directory blank.

In the Anatomical File Path Template field, enter the full file path to any of your anatomical/structural input files (including the file extension .nii/.nii.gz), and then replace the appropriate directory levels with these tags:

{participant}
{site}           (if applicable)
{session}        (if applicable)

Note: C-PAC does not currently support multiple anatomical series/scans/runs, but will do so in the following release!

Your template paths should look something like this, for the corresponding directory layouts:

Actual file:     /home/data/site-01/sub1003/session-A1/anat/mprage.nii.gz
Template path:   /home/data/{site}/{participant}/{session}/anat/mprage.nii.gz

Actual file:     /home/data/site-03/sub-1005_session-B1/anat/anat.nii
Template path:   /home/data/{site}/{participant}_{session}/anat/anat.nii
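
To make the relationship between a template and an actual file path concrete, here is a minimal Python sketch that turns a template into a regular expression and reads the tag values back out of a path. It only illustrates the idea and is not how C-PAC itself is implemented; `template_to_regex` and `extract_tags` are hypothetical names, and the sketch assumes that a tag value never contains a slash.

```python
import re

TAGS = ("participant", "site", "session", "series")


def template_to_regex(template):
    """Compile a path template into a regex with one named group per tag."""
    pattern = re.escape(template)
    for tag in TAGS:
        token = re.escape("{" + tag + "}")
        # The first occurrence captures the value; repeats must match it again.
        pattern = pattern.replace(token, "(?P<%s>[^/]+)" % tag, 1)
        pattern = pattern.replace(token, "(?P=%s)" % tag)
    return re.compile("^" + pattern + "$")


def extract_tags(template, path):
    """Return the tag values found in `path`, or None if it does not fit."""
    match = template_to_regex(template).match(path)
    return match.groupdict() if match else None


print(extract_tags(
    "/home/data/{site}/{participant}/{session}/anat/mprage.nii.gz",
    "/home/data/site-01/sub1003/session-A1/anat/mprage.nii.gz",
))
```

For the first example above, this yields site-01 for the site, sub1003 for the participant, and session-A1 for the session. When two tags share a directory level, as in {participant}_{session}, the split becomes ambiguous if the IDs themselves contain underscores.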

For the Functional File Path Template field, repeat the process with your functional input files. In addition, different levels of series/scans can be denoted with the following tag:

{series}

If you have field map files, and wish to perform field map-based distortion correction, the same process can be repeated for your field map phase and magnitude files, via the Field Map Phase File Path Template and Field Map Magnitude File Path Template fields in the GUI.

For the file path templates, only the {participant} tag is required. Defaults will be assigned to the other levels if they are not present in the template.

Then, fill in the fields for where you want the data configuration builder to save your files, and the name/label to use to name the files.

Once your settings are complete, click Generate Data Config in the bottom-right corner to generate the data configuration, and to also save this setup into a data settings file. If you only want to save the settings to generate the data configuration for later, click Save Preset.

Using the C-PAC command-line interface (CLI)

Following the instructions for formatting your path templates given above, populate these fields in your data settings file:

dataFormat:                  ['Custom']
anatomicalTemplate:          /path/to/{site}/{participant}/{session}/anat/mprage.nii.gz
functionalTemplate:          /path/to/{site}/{participant}/{session}/func/{series}/bold.nii.gz
outputSubjectListLocation:   /save/configs/here
subjectListName:             data_config_name

You can also fill in the AWS credentials file field, and the inclusion and exclusion fields, as needed.

Once your data settings file is ready, generate your data configuration file by running:

cpac utils data_config build /path/to/data_settings.yml
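
Before running the build command, it can help to confirm that the fields this page asks you to populate are actually filled in. The Python sketch below does this with a deliberately simple reader for top-level key: value lines. It is an illustration only: `EXPECTED_FIELDS`, `read_flat_settings`, and `missing_fields` are hypothetical names, the reader is not a full YAML parser, and treating exactly these fields as mandatory is an assumption based on the lists in this section.

```python
# Fields this section asks you to populate, per data format (assumed mandatory).
EXPECTED_FIELDS = {
    "BIDS": ["bidsBaseDir", "outputSubjectListLocation", "subjectListName"],
    "Custom": ["anatomicalTemplate", "functionalTemplate",
               "outputSubjectListLocation", "subjectListName"],
}


def read_flat_settings(text):
    """Collect top-level `key: value` lines, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].rstrip()
        if not line or line[0].isspace() or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_fields(text):
    """Return the expected fields that are empty or set to None."""
    settings = read_flat_settings(text)
    data_format = "BIDS" if "BIDS" in settings.get("dataFormat", "") else "Custom"
    return [name for name in EXPECTED_FIELDS[data_format]
            if settings.get(name, "") in ("", "None")]


example = """\
dataFormat:                  ['Custom']
anatomicalTemplate:          /path/to/{site}/{participant}/anat/mprage.nii.gz
functionalTemplate:
outputSubjectListLocation:   /save/configs/here
subjectListName:             data_config_name
"""
print(missing_fields(example))  # prints ['functionalTemplate']
```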

Custom Path Templates

Here are the file path templates used for the 1000 Functional Connectomes data release, as well as an illustration of the directory structure used for the release:

Anatomical Template:  /path/to/data/{site}/{participant}/anat/mprage_anonymized.nii.gz
Functional Template:  /path/to/data/{site}/{participant}/func/rest.nii.gz
_images/fcon_structure.png

Another example is the file structure used by the ABIDE and ADHD-200 releases:

Anatomical Template:  /path/to/data/{site}/{participant}/{session}/anat_*/mprage.nii.gz
Functional Template:  /path/to/data/{site}/{participant}/{session}/rest_*/rest.nii.gz
_images/abide_adhd_structure.png

A final example is the file structure used by the Enhanced Nathan Kline Institute-Rockland Sample:

Anatomical Template:  /path/to/data/{site}/{participant}/anat/mprage.nii.gz
Functional Template:  /path/to/data/{site}/{participant}/{session}/RfMRI_*/rest.nii.gz
_images/nki-rs_template.png
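
Two of the templates above contain a wildcard (*), which can match more than one directory or file per participant. The Python sketch below shows one way such a pattern, combined with the optional Which Anatomical Scan? sub-string, could narrow the candidates down. It uses shell-style matching from the standard `fnmatch` module, in which * can also match a slash; whether C-PAC matches wildcards in exactly this way is an assumption, and `select_scans` and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase


def select_scans(candidates, pattern, which_scan=None):
    """Return the paths matching `pattern`, narrowed by an optional sub-string."""
    matches = sorted(p for p in candidates if fnmatchcase(p, pattern))
    if which_scan:
        matches = [p for p in matches if which_scan in p]
    return matches


# Hypothetical files for one participant, with the tags already filled in.
files = [
    "/path/to/data/site_1/sub01/ses01/anat_1/mprage.nii.gz",
    "/path/to/data/site_1/sub01/ses01/anat_2/mprage.nii.gz",
    "/path/to/data/site_1/sub01/ses01/rest_1/rest.nii.gz",
]
pattern = "/path/to/data/site_1/sub01/ses01/anat_*/mprage.nii.gz"

print(select_scans(files, pattern))                       # both anat_* files
print(select_scans(files, pattern, which_scan="anat_2"))  # only the anat_2 file
```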

Users experiencing difficulties defining file path templates may want to re-organize their data to match one of the examples above. If you manually define a file path template and encounter an error when attempting to generate participant lists, please contact us and we will be happy to help.

Data Configuration File YAML Fields

The data configuration builder GUI or the C-PAC command-line interface will produce a YAML file listing every participant along with the properties associated with each one, such as its ID, its session, and the locations of its functional (resting-state) and anatomical scans. Each participant definition is preceded by a single line containing a dash, which marks the start of that participant’s property definitions. The properties are indented under this dash. To illustrate, see the sample participant definitions below:

# example of data stored locally
-
    subject_id: sub01
    unique_id: ses01
    anat: /path/to/site01/sub01/ses01/anatomical.nii.gz
    creds_path: None
    func:
      scan_1:
        scan: /path/to/site01/sub01/ses01/scan_1_func.nii.gz
        scan_parameters:
          acquisition: seq+z
          firsttr (start volume index): ''
          lasttr (final volume index): ''
          reference: 27
          tr: 3.0
    site: site01
-
    subject_id: sub02
    unique_id: ses02
    anat: /path/to/site01/sub02/ses02/anatomical.nii.gz
    creds_path: None
    func:
      scan_1:
        scan: /path/to/site01/sub02/ses02/scan_1_func.nii.gz
        scan_parameters: None
    site: site01

# example of data stored on an AWS S3 bucket
-
    subject_id: sub200
    unique_id: ses-1
    anat: s3://s3_bucket/path/to/site_A/sub200/anatomical.nii.gz
    creds_path: /path/to/AWS_credentials.csv  # or None if the S3 bucket is public
    func:
      scan_name_REST:
        scan: s3://s3_bucket/path/to/site_A/sub200/scan_name_REST_func.nii.gz
        scan_parameters: s3://s3_bucket/path/to/site_A/scan_name_REST_func.json
    site: site_A

# with field map files for distortion correction
-
    subject_id: sub01
    unique_id: ses01
    anat: /path/to/site01/sub01/ses01/anatomical.nii.gz
    creds_path: None
    func:
      scan_1:
        scan: /path/to/site01/sub01/ses01/scan_1_func.nii.gz
        fmap_phase: /path/to/site01/sub01/ses01/scan_1_phase-diff.nii.gz
        fmap_mag: /path/to/site01/sub01/ses01/scan_1_magnitude.nii.gz
    site: site01

Note that more than one functional scan can be defined under the func key (i.e., multiple series), and that scan parameters can be defined for individual scans to override the settings used in the C-PAC pipeline configuration GUI.
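
The nesting of functional scans under the func key can be pictured as a dictionary of dictionaries. The Python sketch below uses a hand-written entry that mirrors the first sample above and adds a hypothetical second series (scan_2) to show several scans under one participant. The helper `functional_scans` is also hypothetical and is not part of C-PAC.

```python
def functional_scans(entries):
    """Yield (subject_id, unique_id, scan_name, scan_path) for every scan."""
    for entry in entries:
        for scan_name, scan in sorted(entry["func"].items()):
            yield entry["subject_id"], entry["unique_id"], scan_name, scan["scan"]


entries = [
    {
        "subject_id": "sub01",
        "unique_id": "ses01",
        "anat": "/path/to/site01/sub01/ses01/anatomical.nii.gz",
        "creds_path": None,
        "func": {
            "scan_1": {
                "scan": "/path/to/site01/sub01/ses01/scan_1_func.nii.gz",
                # Per-scan parameters, as in the first sample entry.
                "scan_parameters": {"acquisition": "seq+z", "reference": 27, "tr": 3.0},
            },
            # Hypothetical second series, added for illustration only.
            "scan_2": {
                "scan": "/path/to/site01/sub01/ses01/scan_2_func.nii.gz",
                "scan_parameters": None,
            },
        },
        "site": "site01",
    },
]

for row in functional_scans(entries):
    print(row)
```

Each printed row pairs a participant and session with one functional series, which is the granularity at which scan parameters can be overridden.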