FSL-FEAT Group Analysis

Overview

C-PAC uses the FSL/FEAT flameo tool to compare findings across groups. You can construct models using a participant list and a phenotype file, select derivatives to be predicted by the model, and define contrasts between conditions using either the GUI or a custom csv file. Then FSL/FEAT will run a second-level General Linear Model (GLM) for you.

There are two ways to set up FSL-FEAT group-level analysis for C-PAC:

  • The FLAME Model Presets (NEW! C-PAC v1.1.0 and later): this allows you to generate a pre-configured analysis model.
  • The Group Analysis Model Builder: this allows you to specify a model from scratch, or to modify any of the generated presets mentioned above.

The following links provide an introduction to how groups are compared using FSL, as well as how to define contrasts:

The following presentation also gives a good overview of the group analysis user interface:

Group Analysis FLAME Presets (NEW! May 15, 2018)

C-PAC has a selection of model presets designed to run commonly-used group analysis designs for FSL-FEAT/FLAME. These correspond to examples provided on FSL’s user guide for FEAT/FLAME. The preset generator will create a group analysis YAML configuration file that you can plug directly into C-PAC and run.

The presets that are generated:

  • are meant to get you up and running quickly
  • are in the form of a group analysis configuration YAML file that can be plugged directly into C-PAC and run
  • include a complete design matrix as an input, which you can view before running
  • include already-configured contrasts in a custom contrasts .CSV file, which can be edited before running
  • can be modified to your liking either by using the standard Group Analysis Model Builder, or by hand via text editor
  • are all in one place in the output directory you specify in the Preset Generator

To configure group-level analysis from the main screen of the GUI, select a pipeline for which you have run preprocessing and individual-level analysis, then click Edit. Then, navigate to Group Analysis Settings and click on the + symbol. Here you may select a model configuration YAML file to use by clicking on the folder to the right to select a file or by typing in a path. Clicking Choose FLAME Model Preset will allow you to generate a pre-set model.

Preset options: Currently, there are 5 presets to choose from. More are on their way over the next few releases this summer (June-August 2018). If you have any commonly-used or useful group model designs you’d like to see as a preset, please let us know!

The available presets are:

From Terminal

You can generate any of these presets using the C-PAC command-line interface (CLI):

cpac group feat load_preset <preset type>

Enter any of the following in place of <preset type> for the type of analysis you want to run:

  • single_grp_avg - Single Group Average (One-Sample T-Test)
  • single_grp_cov - Single Group Average with Additional Covariate
  • unpaired_two - Unpaired Two-Group Difference (Two-Sample Unpaired T-Test)
  • paired_two - Paired Two-Group Difference (Two-Sample Paired T-Test)
  • tripled_two - Tripled Two-Group Difference (‘Tripled’ T-Test)

You can get more information about the required inputs for each preset with the --help flag. For example, to check the parameters for a two-sample unpaired t-test:

cpac group feat load_preset unpaired_two --help

This will produce:

Usage: cpac group feat load_preset unpaired_two [OPTIONS] GROUP_PARTICIPANTS
                                                Z_THRESH P_THRESH PHENO_FILE
                                                PHENO_SUB COVARIATE MODEL_NAME
Options:
  --output_dir TEXT
  --help             Show this message and exit.

Following this, you could generate a ready-to-run two-sample unpaired t-test by running the following, assuming the phenotype CSV has a column of participant IDs named “subject_id” and a column named “diagnosis”, which is the covariate you wish to test:

cpac group feat load_preset unpaired_two /path/to/group_participant_list.txt 2.3 0.05
        /path/to/phenotypic_file.csv subject_id diagnosis grp_analysis1
        --output_dir /path/to/output_dir

You will receive a message like this shortly after:

Group-level analysis participant list written:
/path/to/output_dir/group_analysis_participants.txt

CSV file written:
/path/to/output_dir/cpac_group_analysis/grp_analysis1/design_matrix_grp_analysis1.csv

CSV file written:
/path/to/output_dir/cpac_group_analysis/grp_analysis1/contrasts_matrix_grp_analysis1.csv

Group-level analysis configuration YAML file written:
/path/to/output_dir/cpac_group_analysis/grp_analysis1/gpa_fsl_config_grp_analysis1.yml

The group-level analysis configuration YAML file can then be fed into either the C-PAC group analysis model builder (info below) for editing or extending your analysis model, or listed directly in the C-PAC pipeline configuration to run it immediately. Once this group analysis configuration file is listed in the modelConfigs key of your pipeline configuration, you can run it either from the GUI, or from terminal by running:

cpac group feat /path/to/pipeline_config.yaml

Using the GUI

_images/group_presets_one.png
  1. Choose Preset: Select the type of preset you’d like to generate. The preset generator will prompt you for more information relevant to the type of preset you selected on the next window (except for the Single Group Average (One-Sample T-Test), which needs no additional information).
  2. Participant List - [path]: Full path to a list of subjects to be included in the model. This should be a text file with one subject per line. A list in this format containing all subjects run through CPAC was generated along with the main CPAC subject list (see the subject list in Overview). Another easy way to manually create this file is to copy the subjects column from your Regressor/EV spreadsheet.
  3. Select Derivatives - [checkboxes]: Select which derivatives you would like to include when running group analysis. When including Dual Regression, make sure to correct your P-value for the number of maps you are comparing. When including Multiple Regression SCA, you must have more degrees of freedom (subjects) than there were time series.
  4. Z Threshold - [decimal]: Only voxels with a Z-score higher than this value will be considered significant.
  5. Cluster Significance Threshold - [decimal]: Significance threshold (P-value) to use when doing cluster correction for multiple comparisons.
  6. Model Name - [text]: Specify a name for the new model.
  7. Output Directory - [path]: Full path to the directory where CPAC should place the model files (.mat, .con, .grp, .csv) and the outputs of group analysis. The input files and the group analysis configuration YAML file generated by the preset generator will also be written here.
_images/group_presets_two.png

This window opens if you have selected any of the presets that require phenotypic information [Single Group Average with Additional Covariate, Unpaired Two-Group Difference (Two-Sample Unpaired T-Test)].

  1. Phenotype/EV File -[path]: Full path to a .csv file containing EV information for each subject. A file in this format (containing a single column listing all subjects run through CPAC) was generated along with the main CPAC subject list (see the phenotype file in Overview). Levels for categorical variables in this file can be expressed as words (‘ADHD’/’TD’) or numerical values (0/1) depending on your preferences.
  2. Participant Column Name [text]: Name/label of the subjects column in your phenotype file.
  3. Two groups from pheno to compare [text]: Enter the names/labels of the two columns from your phenotype file, separated by a comma, that specify the two groups (categorical EVs) you wish to compare, OR, if the two groups are encoded in one column only (ex. Sex: M,F,F,M,.. or Diagnosis: 1,1,0,0,..), simply enter the name of that one column.
_images/group_presets_three.png

This window opens if you have selected any of the presets that compare paired/linked groups or conditions [Paired Two-Group Difference (Two-Sample Paired T-Test), Tripled Two-Group Difference (‘Tripled’ T-Test)].

  1. Conditions: Sessions or Series/Scans?: Choose whether the linked groups or conditions you wish to compare are separated in the data as Sessions (multiple visits into the scanner) or Series/Scans (multiple acquisitions during a single session).
  2. Session or Series/Scan IDs - [dialogue: list of names]: Enter the session or series/scan ID names you wish to compare. If they are sessions, they can be found under ‘unique_id’ in the individual-level data configuration YAML file for the data in question, or in the ‘participant_session’ IDs that you would see in the individual-level analysis output directory (ex. ‘3005_1’ would be participant 3005, session 1). If they are series/scans, they will be the functional scan names, which can be found in the data configuration YAML file nested under ‘func’.

C-PAC Group Analysis Model Builder

Use the model builder to create an FSL FEAT model from scratch, or to modify any of the generated presets.

To configure group-level analysis from the main screen of the GUI, select a pipeline for which you have run preprocessing and individual-level analysis, then click Edit. Then, navigate to Group Analysis Settings and click on the + symbol. Here you may select a model configuration YAML file to use by clicking on the folder to the right to select a file or by typing in a path. Clicking FLAME Model Builder/Editor will allow you to specify models via the C-PAC interface, or load a YAML configuration file containing model settings that can then be adjusted interactively.

_images/ga_main.png
  1. Number of Models to Run Simultaneously - [integer]: This number depends on computing resources. Choose how many models to run at the same time (parallelization).
  2. Models to Run - [checkboxes]: Use the + to add FSL Models to be run (or to create new models).

Specifying Models to Run

_images/ga_model_setup.png
  1. Participant List - [path]: Full path to a list of subjects to be included in the model. This should be a text file with one subject per line. A list in this format containing all subjects run through CPAC was generated along with the main CPAC subject list (see the subject list in Overview). Another easy way to manually create this file is to copy the subjects column from your Regressor/EV spreadsheet.
  2. Phenotype/EV File -[path]: Full path to a .csv file containing EV information for each subject. A file in this format (containing a single column listing all subjects run through CPAC) was generated along with the main CPAC subject list (see the phenotype file in Overview). Levels for categorical variables in this file can be expressed as words (‘ADHD’/’TD’) or numerical values (0/1) depending on your preferences.
  3. Participant Column Name [text]: Name of the subjects column in your EV file.
  4. Model Setup - [checkboxes]: A list of EVs from your phenotype file will populate in this window. From here, you can select whether the EVs should be treated as categorical or if they should be demeaned (continuous/non-categorical EVs only). ‘MeanFD’ and ‘Measure Mean’ will also appear in this window automatically as options to be used as regressors that can be included in your model design. Note that the MeanFD and mean of measure values are automatically calculated and supplied by C-PAC via individual-level analysis. Measure mean is calculated using the mean signal from raw data rather than z-score data, which is then demeaned. Also, MeanFD and mean of measure values are automatically demeaned prior to being inserted into the group analysis model.
  5. Design Matrix Formula - [Patsy formula]: Specify the formula to describe your model design. Essentially, including EVs in this formula inserts them into the model. The most basic format to include each EV you select would be ‘EV + EV + EV + ..’, etc. You can also select to include MeanFD, Measure_Mean, or an intercept here (by adding ‘ + Measure_Mean’, ‘ + MeanFD_<Power/Jenkinson>’ or ‘ + Intercept ‘ respectively). Note that when you add an intercept to your formula categorical variables will automatically be demeaned since this is a requirement for FLAME to run properly. This design formula is pre-generated for the user depending on the EVs in the phenotype file, but can be edited at any time. C-PAC uses the Python library Patsy to generate the design matrices, so more information on how to format your design formula for specific designs can be found here- Patsy formula documentation. If you have used R in the past, Patsy’s formula syntax should be familiar.
  6. Custom ROI Mean Mask (optional) - [path]: Use a binarized mask with one or more ROIs to add averages for those ROIs to the model as EVs. Mask file must be in NifTI format. These averages will be calculated using the raw data, rather than z-scored data, and will then be demeaned afterwards.
  7. Select Derivatives - [checkboxes]: Select which derivatives you would like to include when running group analysis. When including Dual Regression, make sure to correct your P-value for the number of maps you are comparing. When including Multiple Regression SCA, you must have more degrees of freedom (subjects) than there were time series.
  8. Mask for Means Calculation - [Group Mask, Individual Mask]: C-PAC can add the average voxel intensity for a derivative as an EV in the model. If this average voxel intensity is present in the model, this menu allows you to select either a group-level or individual-level mask. Otherwise, this menu can be ignored.
  9. Use z-score Standardized Derivatives - [True, False]: Run model on a z-score standardized version of individual-level outputs or the raw versions.
  10. Z Threshold - [decimal]: Only voxels with a Z-score higher than this value will be considered significant.
  11. Cluster Significance Threshold - [decimal]: Significance threshold (P-value) to use when doing cluster correction for multiple comparisons.
  12. Coding Scheme - [Treatment, Sum]: Select the encoding for your design matrix. For more details, see Patsy’s pages on Treatment and Sum coding.
  13. Model Group Variances Separately - [Off, On]: Specify whether FSL should model the variance for each group separately. If this option is enabled, you must specify a grouping variable below.
  14. Grouping Variable - [text]: The name of the EV that should be used to group subjects when modeling variances. If you do not wish to model group variances separately, set this value to None.
  15. Sessions (Repeated Measures Only) - [dialogue: list of session names]: Enter the session names in your dataset that you wish to include within the same model (this is for repeated measures/ within-subject designs). These will be the names listed as “unique_id” in the original individual-level participant list, or the labels in the original data directories you marked as {session} while creating the C-PAC participant list. Do not adjust your phenotype CSV- C-PAC will re-formulate it internally before passing it to FLAME.
  16. Series/Scans (Repeated Measures Only) - [dialogue: list of scan names]: Enter the series names in your dataset that you wish to include within the same model (this is for repeated measure / within-subject designs). These will be the labels listed under “rest:” in the original individual-level participant list, or the labels in the original data directories you marked as {series} while creating the C-PAC participant list. Do not adjust your phenotype CSV- C-PAC will re-formulate it internally before passing it to FLAME.

Upon populating these fields and clicking Load Phenotype File, your model builder will look something like this:

_images/ga_model_setup_populated.png

Upon making your selections and clicking Next, you will be able to define contrasts.

Specifying Contrasts

_images/ga_contrasts.png
  1. Contrasts - [checkboxes]: Specify your contrasts in this box. When the model builder builds the design matrix, it will process the categorical variables appropriately and provide the names of the different levels available as contrast labels (printed in Patsy syntax) , listed under ‘Available Contrasts’. Contrasts are specified as formulas (e.g., C(diagnosis)[ADHD] + C(diagnosis)[TD]) = 0 for an ADHD > TD contrast or age = 0 for an age > 0 contrast). Note that the way in which you define your model will determine which contrasts are capable of being run. When you are done specifying contrasts check the contrasts you wish to run.
  2. f-Tests - [checkboxes]: Define an f-test by selecting two or more contrasts to include. When you are done, select the f-tests that you wish to run.
  3. Custom Contrasts Matrix - [path]: Define contrasts using a custom CSV file. Instructions for constructing a csv may be found below. Note that if you choose to use a custom CSV, any of the options specified in the ‘Contrasts’ and ‘f-Tests’ boxes will be ignored.
  4. Model Name - [text]: Specify a name for the new model.
  5. Output Directory - [path]: Full path to the directory where CPAC should place the model files (.mat, .con, .grp, .csv) and the outputs of group analysis. The CSV file is a human-readable version of the .mat file used by FLAME that you can use to examine the exact model that C-PAC generated.

When you are done, the contrast screen should look like this:

_images/ga_contrasts_populated.png

Click Save Settings and place your model specification within an appropriate directory. You are now able to reload it for future use.

Creating a Custom CSV File

_images/ga_contrast_csv.png

A custom contrasts csv can be used to define contrasts manually rather than using the graphical model builder. When you create a custom contrasts csv, fill the first cell of the first row with the label ‘Contrasts’, followed by labels for each of the EVs you wish to use. The first column should be filled with labels for the contrasts that you can define - these do not have to follow any particular convention, and can be whatever works best for your experiment. The remainder of the cells can be populated with contrast weights according to your needs.

If you would like to add f-tests, add each f-test as a column to the csv and the assign weights to each contrast to be included in the f-test.

_images/ga_contrast_ftest.png

Using a Text Editor

Similar to the pipeline and data configuration YAMLs for the pipeline and subject list specification steps, you can generate a group analysis configuration as a YAML. An example of such a file can be found here. A list of possible keys and values for this YAML are listed below.

Key Description Potential Values
participant_list A list of subjects to be included in the model. This should be a text file with one subject per line. A path (e.g., ‘/data/my_analysis/participant_list.txt’).
pheno_file A csv file containing EV information for each subject. A path (e.g., ‘/data/my_analysis/ev.csv’).
participant_id_label Name of the subjects column in your EV file. A string.
ev_selections Specify which EVs from your phenotype are categorical or numerical. Of those which are numerical, specify which are to be demeaned. A dictionary with two keys, ‘demean’ and ‘categorical’. Each of these keys has a list as its value, with the names of the EVs that are categorical and which are to be demeaned. For example: {‘demean’: [‘age’], ‘categorical’: [‘sex’, ‘diagnosis’]}
design_formula Specify the formula to describe your model design. Essentially, including EVs in this formula inserts them into the model. The most basic format to include each EV you select would be ‘EV + EV + EV + ..’, etc. You can also select to include MeanFD, Measure_Mean, and Custom_ROI_Mean here. See the GUI instructions for more details. A formula (e.g., ‘sex + diagnosis + age + MeanFD_Jenkinson’).
mean_mask Choose whether to use a group mask or individual-specific mask when calculating the output means to be used as a regressor. This only takes effect if you include the ‘Measure_Mean’ regressor in your Design Matrix Formula. A string within a list- can be [‘Group Mask’] or [‘Individual Mask’].
custom_roi_mask Full path to a NIFTI file containing one or more ROI masks. The means of the masked regions will then be computed for each subject’s output and will be included in the model as regressors (one for each ROI in the mask file) if you include ‘Custom_ROI_Mean’ in the Design Matrix Formula. A path (e.g., ‘/data/my_analysis/ROI.nii.gz’) or None.
derivative_list Choose the derivatives to run the group model on. These must be written out as a list, and must be one of the options listed below this table. A list (e.g., [‘alff_to_standard_smooth_zstd’, ‘sca_roi_files_to_standard_smooth_fisher_zstd’] ).
coding_scheme Choose the coding scheme to use when generating your model. ‘Treatment’ encoding is generally considered the typical scheme. Either ‘Treatment’ or ‘Sum’.
group_sep Specify whether FSL should model the variance for each group separately. If this option is enabled, you must specify a grouping variable with the key ‘group_var’. On,Off (no quotes)
grouping_var The name of the EV that should be used to group subjects when modeling variances. If you do not wish to model group variances separately, set this value to None. An EV name without quotes (e.g., ‘sex’) or None.
z_threshold Only voxels with a Z-score higher than this value will be considered significant. A list containing a decimal value as a string (e.g., [‘2.3’]).
p_threshold Significance threshold (P-value) to use when doing cluster correction for multiple comparisons. A list containing a decimal value as a string (e.g., [‘0.05’]).
sessions_list For repeated measures only. This is a list of session names that you wish to include in a single model to run repeated measures or within-subject analysis. Repeated measures will run automatically if this is non-empty. A list containing a session names as strings (e.g., [‘session_1’,’session_2’]).
series_list For repeated measures only. This is a list of series/scan names that you wish to include in a single model to run repeated measures or within-subject analysis. Repeated measures will run automatically if this is non-empty. A list containing a scan names as strings (e.g., [‘scan_1’,’scan_2’]).
contrasts The contrasts to be run as part of the group analysis. See the GUI instructions for more details. A list of contrast descriptions (e.g., [‘C(diagnosis)[T.ADHD] - C(diagnosis)[T.Typical] = 0’, ‘age = 0’]).
f_tests An optional list of f-test strings containing contrasts. If you do not wish to run f-tests, leave this blank. A list of strings containing all the contrasts to be included in the f-test separated by commas (e.g., [‘C(diagnosis)[T.ADHD] - C(diagnosis)[T.Typical] = 0, age = 0’]).
custom_contrasts An optional path to a CSV file which specifies the contrasts you wish to run in group analysis. See ‘Custom Contrasts’ section below. A path (e.g., ‘/data/my_analysis/custom_contrasts.csv’).
model_name A name for the new model. A string.
output_dir Full path to the directory where CPAC should place model files. A path (e.g.,’/data/my_analysis/output’).

The possible values for the items in the derivatives key are as follows:

  • For z-scored analyses:
‘alff_to_standard_zstd’, ‘alff_to_standard_smooth_zstd’, ‘falff_to_standard_zstd’, ‘falff_to_standard_smooth_zstd’, ‘reho_to_standard_zstd’, ‘reho_to_standard_smooth_zstd’, ‘sca_roi_files_to_standard_fisher_zstd’, ‘sca_roi_files_to_standard_smooth_fisher_zstd’, ‘vmhc_fisher_zstd_zstat_map’, ‘dr_tempreg_maps_zstat_files_to_standard’, ‘dr_tempreg_maps_zstat_files_to_standard_smooth’, ‘sca_tempreg_maps_zstat_files’, ‘sca_tempreg_maps_zstat_files_smooth’, ‘centrality_outputs_zstd’, ‘centrality_outputs_smoothed_zstd’