Setting Up Group Analysis

Overview

C-PAC uses FSL/FEAT to compare findings across groups. You can construct models using a subject list and a phenotype file, select derivatives to be predicted by the model, and define contrasts between conditions using either the GUI or a custom csv file. Then FSL/FEAT will run a second-level General Linear Model (GLM) for you.

The following links provide an introduction to how groups are compared using FSL, as well as how to define contrasts:

The following presentation also gives a good overview of the group analysis user interface:

The example files used in the presentation above are also available below for your perusal:

As with subject list and pipeline configuration, there are two ways to set up FSL group analysis for C-PAC:

  • Using a text editor (useful for remote servers where using the GUI is not possible or impractical)
  • Using the pipeline configuration interface in the GUI

Using a Text Editor

Similar to the pipeline and data configuration YAMLs for the pipeline and subject list specification steps, you can generate a group analysis configuration as a YAML. An example of such a file can be found here. A list of possible keys and values for this YAML are listed below.

Key Description Potential Values
participant_list A list of subjects to be included in the model. This should be a text file with one subject per line. A path (e.g., ‘/data/my_analysis/participant_list.txt’).
pheno_file A csv file containing EV information for each subject. A path (e.g., ‘/data/my_analysis/ev.csv’).
participant_id_label Name of the subjects column in your EV file. A string.
ev_selections Specify which EVs from your phenotype are categorical or numerical. Of those which are numerical, specify which are to be demeaned. A dictionary with two keys, ‘demean’ and ‘categorical’. Each of these keys has a list as its value, with the names of the EVs that are categorical and which are to be demeaned. For example: {‘demean’: [‘age’], ‘categorical’: [‘sex’, ‘diagnosis’]}
design_formula Specify the formula to describe your model design. Essentially, including EVs in this formula inserts them into the model. The most basic format to include each EV you select would be ‘EV + EV + EV + ..’, etc. You can also select to include MeanFD, Measure_Mean, and Custom_ROI_Mean here. See the GUI instructions for more details. A formula (e.g., ‘sex + diagnosis + age + MeanFD_Jenkinson’).
mean_mask Choose whether to use a group mask or individual-specific mask when calculating the output means to be used as a regressor. This only takes effect if you include the ‘Measure_Mean’ regressor in your Design Matrix Formula. A string within a list- can be [‘Group Mask’] or [‘Individual Mask’].
custom_roi_mask Full path to a NIFTI file containing one or more ROI masks. The means of the masked regions will then be computed for each subject’s output and will be included in the model as regressors (one for each ROI in the mask file) if you include ‘Custom_ROI_Mean’ in the Design Matrix Formula. A path (e.g., ‘/data/my_analysis/ROI.nii.gz’) or None.
derivative_list Choose the derivatives to run the group model on. These must be written out as a list, and must be one of the options listed below this table. A list (e.g., [‘alff_to_standard_smooth_zstd’, ‘sca_roi_files_to_standard_smooth_fisher_zstd’] ).
coding_scheme Choose the coding scheme to use when generating your model. ‘Treatment’ encoding is generally considered the typical scheme. Either ‘Treatment’ or ‘Sum’.
group_sep Specify whether FSL should model the variance for each group separately. If this option is enabled, you must specify a grouping variable with the key ‘group_var’. On,Off (no quotes)
grouping_var The name of the EV that should be used to group subjects when modeling variances. If you do not wish to model group variances separately, set this value to None. An EV name without quotes (e.g., ‘sex’) or None.
z_threshold Only voxels with a Z-score higher than this value will be considered significant. A list containing a decimal value as a string (e.g., [‘2.3’]).
p_threshold Significance threshold (P-value) to use when doing cluster correction for multiple comparisons. A list containing a decimal value as a string (e.g., [‘0.05’]).
sessions_list For repeated measures only. This is a list of session names that you wish to include in a single model to run repeated measures or within-subject analysis. Repeated measures will run automatically if this is non-empty. A list containing a session names as strings (e.g., [‘session_1’,’session_2’]).
series_list For repeated measures only. This is a list of series/scan names that you wish to include in a single model to run repeated measures or within-subject analysis. Repeated measures will run automatically if this is non-empty. A list containing a scan names as strings (e.g., [‘scan_1’,’scan_2’]).
contrasts The contrasts to be run as part of the group analysis. See the GUI instructions for more details. A list of contrast descriptions (e.g., [‘C(diagnosis)[T.ADHD] - C(diagnosis)[T.Typical] = 0’, ‘age = 0’]).
f_tests An optional list of f-test strings containing contrasts. If you do not wish to run f-tests, leave this blank. A list of strings containing all the contrasts to be included in the f-test separated by commas (e.g., [‘C(diagnosis)[T.ADHD] - C(diagnosis)[T.Typical] = 0, age = 0’]).
custom_contrasts An optional path to a CSV file which specifies the contrasts you wish to run in group analysis. See ‘Custom Contrasts’ section below. A path (e.g., ‘/data/my_analysis/custom_contrasts.csv’).
model_name A name for the new model. A string.
output_dir Full path to the directory where CPAC should place model files. A path (e.g.,’/data/my_analysis/output’).

The possible values for the items in the derivatives key are as follows:

  • For z-scored analyses:
‘alff_to_standard_zstd’, ‘alff_to_standard_smooth_zstd’, ‘falff_to_standard_zstd’, ‘falff_to_standard_smooth_zstd’, ‘reho_to_standard_zstd’, ‘reho_to_standard_smooth_zstd’, ‘sca_roi_files_to_standard_fisher_zstd’, ‘sca_roi_files_to_standard_smooth_fisher_zstd’, ‘vmhc_fisher_zstd_zstat_map’, ‘dr_tempreg_maps_zstat_files_to_standard’, ‘dr_tempreg_maps_zstat_files_to_standard_smooth’, ‘sca_tempreg_maps_zstat_files’, ‘sca_tempreg_maps_zstat_files_smooth’, ‘centrality_outputs_zstd’, ‘centrality_outputs_smoothed_zstd’

Using the GUI

To configure group-level analysis from the main screen of the GUI, select a pipeline for which you have run preprocessing and individual-level analysis, then click Edit. Then, navigate to Group Analysis Settings and click on the + symbol. Here you may select a model configuration YAML file to use by clicking on the folder to the right to select a file or by typing in a path. Clicking Create or Load FSL Model will allow you to specify models via the C-PAC interface, or load a YAML configuration file containing model settings that can then be adjusted interactively.

_images/ga_main.png
  1. Number of Models to Run Simultaneously - [integer]: This number depends on computing resources. Choose how many models to run at the same time (parallelization).
  2. Models to Run - [checkboxes]: Use the + to add FSL Models to be run (or to create new models).

Specifying Models to Run

_images/ga_model_setup.png
  1. Participant List - [path]: Full path to a list of subjects to be included in the model. This should be a text file with one subject per line. A list in this format containing all subjects run through CPAC was generated along with the main CPAC subject list (see the subject list in Overview). Another easy way to manually create this file is to copy the subjects column from your Regressor/EV spreadsheet.
  2. Phenotype/EV File -[path]: Full path to a .csv file containing EV information for each subject. A file in this format (containing a single column listing all subjects run through CPAC) was generated along with the main CPAC subject list (see the phenotype file in Overview). Levels for categorical variables in this file can be expressed as words (‘ADHD’/’TD’) or numerical values (0/1) depending on your preferences.
  3. Participant Column Name [text]: Name of the subjects column in your EV file.
  4. Model Setup - [checkboxes]: A list of EVs from your phenotype file will populate in this window. From here, you can select whether the EVs should be treated as categorical or if they should be demeaned (continuous/non-categorical EVs only). ‘MeanFD’ and ‘Measure Mean’ will also appear in this window automatically as options to be used as regressors that can be included in your model design. Note that the MeanFD and mean of measure values are automatically calculated and supplied by C-PAC via individual-level analysis. Measure mean is calculated using the mean signal from raw data rather than z-score data, which is then demeaned. Also, MeanFD and mean of measure values are automatically demeaned prior to being inserted into the group analysis model.
  5. Design Matrix Formula - [Patsy formula]: Specify the formula to describe your model design. Essentially, including EVs in this formula inserts them into the model. The most basic format to include each EV you select would be ‘EV + EV + EV + ..’, etc. You can also select to include MeanFD, Measure_Mean, or an intercept here (by adding ‘ + Measure_Mean’, ‘ + MeanFD_<Power/Jenkinson>’ or ‘ + Intercept ‘ respectively). Note that when you add an intercept to your formula categorical variables will automatically be demeaned since this is a requirement for FLAME to run properly. This design formula is pre-generated for the user depending on the EVs in the phenotype file, but can be edited at any time. C-PAC uses the Python library Patsy to generate the design matrices, so more information on how to format your design formula for specific designs can be found here- Patsy formula documentation. If you have used R in the past, Patsy’s formula syntax should be familiar.
  6. Custom ROI Mean Mask (optional) - [path]: Use a binarized mask with one or more ROIs to add averages for those ROIs to the model as EVs. Mask file must be in NifTI format. These averages will be calculated using the raw data, rather than z-scored data, and will then be demeaned afterwards.
  7. Select Derivatives - [checkboxes]: Select which derivatives you would like to include when running group analysis. When including Dual Regression, make sure to correct your P-value for the number of maps you are comparing. When including Multiple Regression SCA, you must have more degrees of freedom (subjects) than there were time series.
  8. Mask for Means Calculation - [Group Mask, Individual Mask]: C-PAC can add the average voxel intensity for a derivative as an EV in the model. If this average voxel intensity is present in the model, this menu allows you to select either a group-level or individual-level mask. Otherwise, this menu can be ignored.
  9. Use z-score Standardized Derivatives - [True, False]: Run model on a z-score standardized version of individual-level outputs or the raw versions.
  10. Z Threshold - [decimal]: Only voxels with a Z-score higher than this value will be considered significant.
  11. Cluster Significance Threshold - [decimal]: Significance threshold (P-value) to use when doing cluster correction for multiple comparisons.
  12. Coding Scheme - [Treatment, Sum]: Select the encoding for your design matrix. For more details, see Patsy’s pages on Treatment and Sum coding.
  13. Model Group Variances Separately - [Off, On]: Specify whether FSL should model the variance for each group separately. If this option is enabled, you must specify a grouping variable below.
  14. Grouping Variable - [text]: The name of the EV that should be used to group subjects when modeling variances. If you do not wish to model group variances separately, set this value to None.
  15. Sessions (Repeated Measures Only) - [dialogue: list of session names]: Enter the session names in your dataset that you wish to include within the same model (this is for repeated measures/ within-subject designs). These will be the names listed as “unique_id” in the original individual-level participant list, or the labels in the original data directories you marked as {session} while creating the C-PAC participant list. Do not adjust your phenotype CSV- C-PAC will re-formulate it internally before passing it to FLAME.
  16. Series/Scans (Repeated Measures Only) - [dialogue: list of scan names]: Enter the series names in your dataset that you wish to include within the same model (this is for repeated measure / within-subject designs). These will be the labels listed under “rest:” in the original individual-level participant list, or the labels in the original data directories you marked as {series} while creating the C-PAC participant list. Do not adjust your phenotype CSV- C-PAC will re-formulate it internally before passing it to FLAME.

Upon populating these fields and clicking Load Phenotype File, your model builder will look something like this:

_images/ga_model_setup_populated.png

Upon making your selections and clicking Next, you will be able to define contrasts.

Specifying Contrasts

_images/ga_contrasts.png
  1. Contrasts - [checkboxes]: Specify your contrasts in this box. When the model builder builds the design matrix, it will process the categorical variables appropriately and provide the names of the different levels available as contrast labels (printed in Patsy syntax) , listed under ‘Available Contrasts’. Contrasts are specified as formulas (e.g., C(diagnosis)[ADHD] + C(diagnosis)[TD]) = 0 for an ADHD > TD contrast or age = 0 for an age > 0 contrast). Note that the way in which you define your model will determine which contrasts are capable of being run. When you are done specifying contrasts check the contrasts you wish to run.
  2. f-Tests - [checkboxes]: Define an f-test by selecting two or more contrasts to include. When you are done, select the f-tests that you wish to run.
  3. Custom Contrasts Matrix - [path]: Define contrasts using a custom CSV file. Instructions for constructing a csv may be found below. Note that if you choose to use a custom CSV, any of the options specified in the ‘Contrasts’ and ‘f-Tests’ boxes will be ignored.
  4. Model Name - [text]: Specify a name for the new model.
  5. Output Directory - [path]: Full path to the directory where CPAC should place the model files (.mat, .con, .grp, .csv) and the outputs of group analysis. The CSV file is a human-readable version of the .mat file used by FLAME that you can use to examine the exact model that C-PAC generated.

When you are done, the contrast screen should look like this:

_images/ga_contrasts_populated.png

Click Save Settings and place your model specification within an appropriate directory. You are now able to reload it for future use.

Creating a Custom CSV File

_images/ga_contrast_csv.png

A custom contrasts csv can be used to define contrasts manually rather than using the graphical model builder. When you create a custom contrasts csv, fill the first cell of the first row with the label ‘Contrasts’, followed by labels for each of the EVs you wish to use. The first column should be filled with labels for the contrasts that you can define - these do not have to follow any particular convention, and can be whatever works best for your experiment. The remainder of the cells can be populated with contrast weights according to your needs.

If you would like to add f-tests, add each f-test as a column to the csv and the assign weights to each contrast to be included in the f-test.

_images/ga_contrast_ftest.png