Running C-PAC


As with configuring the subject list, pipeline configuration, and group analysis files, there are two ways of executing a C-PAC run:

  • Using C-PAC’s command line interface
  • Using the main dialog in the C-PAC GUI

In addition to running C-PAC traditionally on your own local computer or on a server, there are three other avenues through which you can run C-PAC without going through the install process:

  • With a Docker container
  • On the Amazon AWS Cloud
  • Through OpenNeuro

More details of these options are available below.

With Docker

A C-PAC Docker image is available so that you can easily get an analysis running without needing to install C-PAC.

The Docker image is designed following the specification established by the BIDS-Apps project. The BIDS Apps project is an initiative to create a collection of reproducible neuroimaging workflows that can be executed as these self-contained environments using Docker containers. These workflows take as input any dataset that is organized according to the Brain Imaging Data Structure (BIDS) standard and generating first-level outputs for this dataset. However, you can provide the C-PAC Docker image with a custom non-BIDS dataset by entering your own data configuration file. More details below.

In addition, we have created a Docker default pipeline configuration as part of this initiative that allows you to run the C-PAC pipeline on your data in an environment that is fully provisioned with all of C-PAC’s dependencies. If you wish to run your own pipeline configuration, you can also provide this to the Docker image at run-time.

To start, first pull the image from Docker Hub:

docker pull fcpindi/c-pac:latest

Once this is complete, you can use the fcpindi/c-pac image tag to invoke runs. The full C-PAC Docker image usage options are shown here, with specific use cases further below.

usage: [-h] [--pipeline_file PIPELINE_FILE]
              [--data_config_file DATA_CONFIG_FILE]
              [--aws_input_creds AWS_INPUT_CREDS]
              [--aws_output_creds AWS_OUTPUT_CREDS] [--n_cpus N_CPUS]
              [--mem_mb MEM_MB] [--mem_gb MEM_GB] [--save_working_dir]
              [--participant_label PARTICIPANT_LABEL [PARTICIPANT_LABEL ...]]
              [--participant_ndx PARTICIPANT_NDX]
              bids_dir output_dir {participant,group,test_config,GUI}

C-PAC Pipeline Runner

positional arguments:
  bids_dir              The directory with the input dataset formatted
                        according to the BIDS standard. Use the format
                        s3://bucket/path/to/bidsdir to read data directly from
                        an S3 bucket. This may require AWS S3 credentials
                        specificied via the --aws_input_creds option.
  output_dir            The directory where the output files should be stored.
                        If you are running group level analysis this folder
                        should be prepopulated with the results of the
                        participant level analysis. Us the format
                        s3://bucket/path/to/bidsdir to write data directly to
                        an S3 bucket. This may require AWS S3 credentials
                        specificied via the --aws_output_creds option.
                        Level of the analysis that will be performed. Multiple
                        participant level analyses can be run independently
                        (in parallel) using the same output_dir. GUI will open
                        the CPAC gui (currently only works with singularity)
                        and test_config will run through the entire
                        configuration process but will not execute the

optional arguments:
  -h, --help            show this help message and exit
  --pipeline_file PIPELINE_FILE
                        Name for the pipeline configuration file to use
  --data_config_file DATA_CONFIG_FILE
                        Yaml file containing the location of the data that is
                        to be processed. Can be generated from the CPAC gui.
                        This file is not necessary if the data in bids_dir is
                        organized according to the BIDS format. This enables
                        support for legacy data organization and cloud based
                        storage. A bids_dir must still be specified when using
                        this option, but its value will be ignored.
  --aws_input_creds AWS_INPUT_CREDS
                        Credentials for reading from S3. If not provided and
                        s3 paths are specified in the data config we will try
                        to access the bucket anonymously
  --aws_output_creds AWS_OUTPUT_CREDS
                        Credentials for writing to S3. If not provided and s3
                        paths are specified in the output directory we will
                        try to access the bucket anonymously
  --anat_only           Only run anatomical preprocessing.
  --n_cpus N_CPUS       Number of execution resources available for the
  --mem_mb MEM_MB       Amount of RAM available to the pipeline in megabytes.
                        Included for compatibility with BIDS-Apps standard,
                        but mem_gb is preferred
  --mem_gb MEM_GB       Amount of RAM available to the pipeline in gigabytes.
                        if this is specified along with mem_mb, this flag will
                        take precedence.
  --save_working_dir    Save the contents of the working directory.
                        The label of the participant that should be analyzed.
                        The label corresponds to sub-<participant_label> from
                        the BIDS spec (so it does not include "sub-"). If this
                        parameter is not provided all subjects should be
                        analyzed. Multiple participants can be specified with
                        a space separated list. To work correctly this should
                        come at the end of the command line
  --participant_ndx PARTICIPANT_NDX
                        The index of the participant that should be analyzed.
                        This corresponds to the index of the participant in
                        the subject list file. This was added to make it
                        easier to accomodate SGE array jobs. Only a single
                        participant will be analyzed. Can be used with
                        participant label, in which case it is the index into
                        the list that follows the particpant_label flag.

Note that any of the optional arguments above will over-ride any pipeline settings in the default pipeline or in the pipeline configuration file you provide via the --pipeline_file parameter.

As an example, in order to run the C-PAC Docker container in participant mode, for one participant, using a BIDS dataset stored on your machine or server, and using the Docker image’s default pipeline configuration (broken into multiple lines for convenience):

docker run -i --rm \
        -v /Users/You/local_bids_data:/bids_dataset \
        -v /Users/You/some_folder:/outputs \
        -v /tmp:/scratch \
        fcpindi/c-pac:latest /bids_dataset /outputs participant

Note, the -v flags map your local filesystem locations to a “location” within the Docker image. (For example, the /bids_dataset and /outputs directories in the command above are arbitrary names). If you provided /Users/You/local_bids_data to the bids_dir input parameter, Docker would not be able to access or see that directory, so it needs to be mapped first. In this example, the local machine’s /tmp directory has been mapped to the /scratch name because the C-PAC Docker image’s default pipeline sets the working directory to /scratch. If you wish to keep your working directory somewhere more permanent, you can simply map this like so: -v /Users/You/working_dir:/scratch.

You can also provide a link to an AWS S3 bucket containing a BIDS directory as the data source:

docker run -i --rm \
        -v /Users/You/some_folder:/outputs \
        -v /tmp:/scratch \
        fcpindi/c-pac:latest s3://fcp-indi/data/Projects/ADHD200/RawDataBIDS /outputs participant

To run the C-PAC Docker container with a pipeline configuration file other than the container’s default pipeline, assuming the configuration file is in the /Users/You/Documents directory:

docker run -i --rm \
        -v /Users/You/local_bids_data:/bids_dataset \
        -v /Users/You/some_folder:/outputs \
        -v /tmp:/scratch \
        -v /Users/You/Documents:/configs \
        -v /Users/You/resources:/resources \
        fcpindi/c-pac:latest /bids_dataset /outputs participant --pipeline_file /configs/pipeline_config.yml

In this case, we need to map the directory containing the pipeline configuration file /Users/You/Documents to a Docker image virtual directory /configs. Note we are using this /configs directory in the --pipeline_file input flag. In addition, if there are any ROIs, masks, or input files listed in your pipeline configuration file, the directory these are in must be mapped as well- assuming /Users/You/resources is your directory of ROI and/or mask files, we map it with -v /Users/You/resources:/resources. In the pipeline configuration file you are providing, these ROI and mask files must be listed as /resources/ROI.nii.gz (etc.) because we have mapped /Users/You/resources to /resources.

Finally, to run the Docker container with a specific data configuration file (instead of providing a BIDS data directory):

docker run -i --rm \
        -v /Users/You/any_directory:/bids_dataset \
        -v /Users/You/some_folder:/outputs \
        -v /tmp:/scratch \
        -v /Users/You/Documents:/configs \
        fcpindi/c-pac:latest /bids_dataset /outputs participant --data_config_file /configs/data_config.yml

Note: we are still providing /bids_dataset to the bids_dir input parameter. However, we have mapped this to any directory on your machine, as C-PAC will not look for data in this directory when you provide a data configuration YAML with the --data_config_file flag. In addition, if the dataset in your data configuration file is not in BIDS format, just make sure to add the --skip_bids_validator flag at the end of your command to bypass the BIDS validation process.

Further usage notes:

  • You can run only anatomical preprocessing easily, without modifying your data or pipeline configuration files, by providing the --anat_only flag.
  • A GUI can be invoked to assist in pipeline custimization by specifying the GUI command line argument, as opposed to participant (this currently only works for Singularity containers).
  • As stated, the default behavior is to read in data that is organized in the BIDS format. This includes data that is in Amazon AWS S3 by using the format s3://<bucket_name>/<bids_dir> for the bids_dir command line argument. Outputs can be written to S3 using the same format for the output_dir. Credentials for accessing these buckets can be specified on the command line (using --aws_input_creds or --aws_output_creds).
  • When the app is run, a data configuration file is written to the working directory. This file can be passed into subsequent runs, which avoids the overhead of re-parsing the BIDS input directory on each run (i.e. for cluster or cloud runs). These files can be generated without executing the C-PAC pipeline using the test_run command line argument.
  • The participant_label and participant_ndx arguments allow the user to specify which of the many datasets should be processed, which is useful when parallelizing the run of multiple participants.

Running On Singularity

You can convert a Docker container into a Singularity container :

docker run --privileged -ti --rm  \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /Users/You/singularity_images:/output \
    filo/docker2singularity fcpindi/c-pac:latest

This will create a Singularity container image named something like fcpindi_c-pac_latest-{date}-{hash value}.img.

Running a Singularity image is similar to running a Docker image, except you use -B for mappings instead of -v:

singularity run \
        -B /Users/You/local_bids_data:/bids_dataset \
        -B /Users/You/some_folder:/outputs \
        -B /tmp:/scratch \
        fcpindi_c-pac_latest-{date}-{hash value}.img /bids_dataset /outputs participant

Again, you can also provide an AWS S3 link for the data:

singularity run \
        -B /Users/You/some_folder:/outputs \
        -B /tmp:/scratch \
        fcpindi_c-pac_latest-{date}-{hash value}.img s3://fcp-indi/data/Projects/ADHD200/RawDataBIDS /outputs participant

Using the GUI

This is for configuring data and pipelines or running C-PAC: on your machine or server after C-PAC installation, from a Docker container once the GUI is enabled, or on an X2Go session of an AWS instance.

You can invoke the C-PAC user interface by running:

cpac gui

On C-PAC’s main screen, you can open the data configuration generator and the pipeline configuration builder by clicking New next to Subject Lists and Pipelines, respectively. Once you’ve configured your data and pipelines, or if you are using an already existing set of configurations, you can load them into their respective panes.

If you have loaded multiple subject lists or pipeline configurations, click the checkboxes to the left of the subject list names in the Subject Lists pane to select which sets of subjects you would like to run. Similarly, select the pipelines you’d like to run in the adjacent Pipelines pane. When you are ready to run C-PAC, click the Run Individual Level Analysis button at the bottom.


From Terminal

This is for runs on your machine or server after C-PAC installation, or for runs on an AWS instance whether you have SSHed in or are using an X2Go session to view the instance.

To start a C-PAC run from the command line you can use C-PAC’s command-line interface (CLI). To start an analysis with a specific pipeline configuration and a specific data configuration (subject list) YAML file, run:

cpac run --pipe_config /path/to/pipeline_config.yml /path/to/data_config.yml

You can also omit the --pipe_config parameter to run C-PAC with a pre-configured “default” pipeline if you want to get started quickly:

cpac run /path/to/data_config.yml

In addition, with the release 1.2.0, a few data configuration files are bundled with the C-PAC install, and can be invoked at the command line easily. To quickly run the default pipeline (or any pipeline configuration) with the ADHD200, ABIDE, and NKI-Rockland Sample data releases:

cpac run ADHD200
cpac run ABIDE
cpac run NKI-RS

Of course, you can also run these pre-configured datasets with any pipeline configuration file you choose like so:

cpac run --pipe_config /path/to/pipeline_config.yml ADHD200

Note, these bundled data configurations pull the data from the FCP-INDI AWS-S3 bucket.

You can also quickly modify the amount of cores to dedicate to the run, without supplying or modifying your pipeline configuration file with the –num_cores flag:

cpac run /path/to/data_config.yml --num_cores 4

On the AWS Cloud

The C-PAC team has released an Amazon Marketplace AMI, making it easier for researchers to use C-PAC in the cloud. You can use the AMI to either launch a single machine for basic runs or create a high performance computing (HPC) cluster using Starcluster. Clusters can be dynamically scaled up as your computational needs increase. Detailed explanations of cloud computing and HPC are beyond the scope of this documentation, but we will define a few key terms before we start. If these terms are familiar, you may skip them and proceed to later sections.

  • Amazon Machine Instance (AMI) - A disk image of an operating system and any additional installed software that can be used to create a virtual machine.
  • Instance - A single running virtual machine whose initial state is based on the AMI that it is launched from. Instances can be classified as spot instances or on-demand instances. On-demand instances are reliably created the moment they are requested for a fixed rate. Spot instances are created based on whether or not a bid that you set is accepted by Amazon. They can be significantly cheaper than on-demand instances, but are only created when Amazon accepts your bid.
  • Instance Type - The hardware specification for a given instance. A list of the instance types made available by Amazon may be found here.
  • Terminated Instance - An instance is considered terminated when its resources have been completely freed up for use by others in the Amazon cloud. Any data on a terminated instance that is not relocated to persistent storage such as EBS (see below) will be completely discarded. Instance termination is the virtual equivalent of throwing out a physical server. When you have terminated an instance, you are no longer paying for it. Note that by default, instances do not have persistent storage attached to them- you will need to configure persistent storage when you set up the instance.
  • Stopped Instance - An instance is considered stopped when it is not active, but its resources are still available for future use whenever you choose to reactivate it. Stopping an instance is the virtual equivalent of turning a computer off or putting it in hibernate mode. When you stop an instance, you continue to pay for the storage associated with it (i.e., the main and other volumes attached to it), but not for the instance itself. You should stop an instance when the analyses you are working on are not fully done and you would like to preserve the current state of a running instance.
  • Simple Storage Service (S3) - A form of storage offered by Amazon. S3 is not intended to be directly attached to instances since it lacks a filesystem, but it can be used to archive large datasets. Amazon provides tools for uploading data to S3 ‘buckets’ where it can be stored. It is less costly than EBS.
  • Elastic Block Storage (EBS) - A form of persistent storage offered by Amazon for use with instances. When you have terminated an instance, items stored in an EBS volume can be accessed by any future instances that you start up.
  • EC2 Instance Store - A form of temporary storage that comes included with some instance types. Instance store volumes must be added manually before launching an instance, and all files stored on them will be lost when the instance is terminated. The instance store is typically mounted at /mnt.

Lastly, it would be important to review any terms related to the Sun Grid Engine job scheduler.

Creating AWS Access and Network Keys

Before you can create a single C-PAC machine or a C-PAC HPC cluster, you must first generate credentials that will allow you to log into any AWS instance that you create. The following steps will walk you through the process of creating all the necessary credentials and encryption keys that you will need.

  1. Go to
  2. Click the Sign in to the AWS Console button
  3. Enter your e-mail address and password. If you do not already have an account, enter your e-mail address, select I am a new user. and click the Sign in button. Provide Amazon with the information (e-mail address, payment method) needed to create your account.
  4. Amazon has different regions that it hosts its web services from (e.g. Oregon, Northern Virginia, Tokyo). In the upper right-hand corner there will be a region that you are logged into next to your user name. Change this to your preferred region. The Marketplace AMI is available in all regions, although public AMIs (non-Marketplace AMIs shared from personal accounts) may not be.
  5. Click on your name in the upper right corner and navigate to Security Credentials. Accept the disclaimer that appears on the page.
  6. Click on Access Keys and click on the blue Create New Access Key button. Click Download Key File and move the resulting csv file to a safe and memorable location on your hard drive.
  7. Click on the box in the upper left corner of AWS. Click on EC2. Click on Key Pairs in the left-hand column.
  8. Click on the blue Create Key Pair button. Give your key an appropriate name and click on the blue Create button. A .pem file will now save to disk. Move this file to a safe and memorable location on your hard drive.
  9. On your local drive, open a terminal and run the following command: chmod 400 /path/to/pem/file

Starting a Single C-PAC Instance via the AWS Console

Now that you have generated the access keys and a pem file, you may launch a single instance via Amazon’s web interface by following the steps below. If you are planning on processing many subjects or obtaining computationally-intensive derivatives (such as network centrality), you should use Starcluster instead.

  1. In the left-hand column under the INSTANCES header in the AWS console, click Instances. This is a dashboard of all instances you currently have running in the AWS cloud. Click the blue Launch Instance button.
  2. On the left-hand side of the new page, click on the Amazon Marketplace tab and search c-pac in the search text box.
  3. Click the blue Select button next to the C-PAC AMI. Click the blue Continue button on the next screen.
  4. Now choose the instance type that you would like to use. Note that C-PAC requires at least 8 GB of RAM- the m3.xlarge instance type has 15 GB of RAM and 4 CPUs and functions well with C-PAC for small runs and experimentation. This instance type is equivalent to a standard desktop machine in terms of processing power. To select this type, click on the General purpose tab and click the box next to m3.xlarge. Then, click the Next: Configure Instance Details button. Note that for most larger runs you will want to choose a more powerful instance type, such as c3.4xlarge or c3.8xlarge.
  5. The details page can be used to request spot instances, as well as other functionality (including VPN, VPC options). For a basic run you do not need to change anything, although you can tailor it according to your future needs. Hovering over the ‘i’ icons on this page will give you more insight into the options available. When done, click Next: Add Storage.
  6. On the storage page, you can allocate space for the workstation, such as user and system directories. This is where you can attach instance store volumes if your instance type comes with them. To do this, click the Add New Volume button and select the instance store via the dropdown menu in the Type column. You may need to do this multiple times if your instance comes with multiple instance stores. If you want the files stored on the root volume to be kept after the instance is terminated, uncheck the box below the Delete on Termination column. Note that persistent storage for the datasets can be allocated and attached as described in a later section. Click Next: Tag Instance.
  7. On this page you can tag the instance with metadata (e.g., details related to the specific purpose for the instance). Tags are key-value pairs, so any contextual data that can be encapsulated in this format can be saved. Click Next: Configure Security Group.
  8. On this page, you can modify who has access to the instance. The AMI defaults allow remote access from anywhere. If you would like to customize security to allow only a certain set of IP addresses and users access to the instance, you can do so here. If you find that custom settings, such as using the My IP setting or specifying a range of IP addresses, do not work, consult with your institution’s network administrator to make sure that you are entering settings correctly. Click Review and Launch when you are done.
  9. This final page summarizes the instance details you are about to launch. You might receive some warnings as a result of security or the instance type not being in the free tier. These warnings can be ignored.
  10. Click the Launch button. A dialogue box will ask you to choose a key pair for the instance. Every instance requires a key pair in order for you to securely log in and use it. Change the top drop down menu bar to Choose an existing key pair and select the key pair you created in the Creating AWS Access and Network Keys section in the other drop down menu. Check the acknowledgement check box and click the blue Launch Instances button.
  11. You can click the View Instances blue button on the lower right of the page after to watch your new instance start up in the instance console.
  12. When the Instance State column reads running and the Status Checks column reads 2/2, the instance should be active. Click on the row for the new instance. In the bottom pane, take note of the values for the Instance ID, Public DNS, and Availability zone fields under the Description tab.

Attaching Persistent EBS Storage to an Instance

  1. Once your instance is up and running, you can create a persistent storage volume for your data and results. In the left-hand column under the ELASTIC BLOCK STORE header in the AWS console, click Volumes. This is a dashboard of all volumes that you currently have stored in EBS. Click the blue Create Volume button.
  2. Change the size field in the proceeding dialogue to have enough space to encompass the amount of data you expect to store. A single volume can be as small as 1 GB or as large as 16 TB. Change the availability zone to match the zone from your instance’s Description tab.
  3. Click the checkbox next to the newly-created volume. Click Actions followed by Attach Volumes. Enter the Instance ID from the instance’s Description tab in the Instance field. The Device field should fill itself automatically and should be of the form /dev/sdb or similar. Note the letter used after the sd. Click the blue Attach button.
  4. Execute the following command from the terminal to make it so that your instance can see the volume (replace the letter b at the end of /dev/xvdb with the letter from the previous step).
ssh -i /path/to/pem/file ubuntu@<Public Domain Name> 'sudo mkfs -t ext4 /dev/xvdb && sudo mount /dev/xvdb /media/ebs

To use this volume with future instances, you may attach it to the instance using the AWS console and then use this command:

ssh -i /path/to/pem/file ubuntu@<Public Domain Name> 'sudo mount /dev/xvdb /media/ebs'

Note that the creation of a persistent volume is heavily automated in Starcluster, so if you will be creating many different persistent volumes you should use Starcluster instead.

Accessing Your Instance

There are now two different means of accessing the instance. Either through X2Go (a desktop GUI-based session) or through ssh (a command line session).


  1. Open a terminal and type ssh -i /path/to/pem/file ubuntu@<Public Domain Name>.
  2. Type yes when asked if you trust the source.


  1. Install the X2Go client using the instructions here.
  2. Open X2go and create a new session.
  3. For Host:, enter the Public DNS from earlier.
  4. For Login: enter ubuntu.
  5. SSH port: should be 22.
  6. For Use RSA/DSA key for ssh connection:, select the key you generated for the instance.
  7. Select LXDE for Session and click OK.

When you are done, your session configuration should look similar to the following:


Note: If X2Go does not work on your computer, you can also access the C-PAC GUI by adding the -X flag to the ssh command to enable X11 port forwarding (i.e., the ssh command would be ssh -X -i /path/to/pem/file ubuntu@<Public Domain Name>). X11 port forwarding is very slow compared to X2Go, however, so it is recommended that you troubleshoot X2Go further before turning to this option.

Uploading Data to Your Instance

To upload data to your newly-created AWS instance, you can run the following command on the computer containing your data:

scp -r -i /path/to/pem/key /path/to/data ubuntu@<Public Domain Name>:/path/to/server/directory

If you have configured persistent storage, you will want to ensure that /path/to/server/directory is pointing to the mount point for the persistent storage. If you followed the instructions above or the instructions in the Starcluster section below, the mount point should be /media/ebs.

Starting a C-PAC HPC Cluster via Starcluster

Starcluster is suggested for more sophisticated C-PAC runs. Using Starcluster, you can parallelize your analyses by distributing subjects across multiple nodes in an HPC cluster. The following section describes how to install and configure Starcluster to work with C-PAC, dynamically add nodes to your cluster and leverage C-PAC’s grid functionality.

Installing Starcluster

If you have pip installed, Starcluster can be installed via:

pip install starcluster

Note that if you are using a *nix-based OS and you are not using an environment such as Miniconda, you will need to run the above command with sudo.

If you do not have pip installed, see the Official Starcluster Installation Instructions for alternative installation methods.

Installing the C-PAC Starcluster Plug-ins

The C-PAC Starcluster plug-ins configure the SGE environment that C-PAC uses and ensure that storage space is writable. From the terminal, download the C-PAC Starcluster plug-ins and install them by running the following commands:

cd /tmp
git clone
cd CPAC_CLOUD/sc_plugins
mv *.py ~/.starcluster/plugins

Creating and Editing Your Configuration File

Now you will need to create a Starcluster configuration file so that Starcluster can use your keys and know which instance types you would like to use. To begin, type starcluster help and select option 2.

Fill in the AWS access keys from the CVS file that you created in the Creating AWS Access and Network Keys section:

[aws info]
AWS_ACCESS_KEY_ID = <Your Acces Key>

You do not need to define the AWS_USER_ID field unless you want to create custom AMIs based off the C-PAC AMI. The public C-PAC AMI is available in us-east-1, so you should not change the value of AWS_REGION_NAME.

Point your key definition to the pem file you generated in the Creating AWS Access and Network Keys section:

[key cpac_key]

Find the image ID for the C-PAC AMI by logging into the AWS Console using your favorite web browser. Make sure that you are in the N. Virginia region. Navigate to the EC2 service click Images -> AMIs. Then click Owned by Me in the upper left corner and switch it to Public images. Search for ‘CPAC’. Select the version of C-PAC that you wish to use and look in the lower pane for the AMI ID field.

Add the following cluster definition to your configuration file:

[cluster cpac_cluster]
KEYNAME = cpac_key
PLUGINS = cpac_sge, mnt_config

You can customize this to have additional nodes or use different instance types as per your needs. Note that you can always add nodes later using Starcluster from the command line. If you wish to use spot instances rather than on-demand instances, then add the following line to the cluster definition:

SPOT = <bidding_price>

Also add the following two plug-in definitions for the C-PAC Starcluster plug-ins:

[plugin cpac_sge]
setup_class = cpac_sge.PEInstaller
pe_url =

[plugin mnt_config]
setup_class = mnt_perm.MntPermissions

Attaching Persistent Storage to Your Cluster

By default, the cluster will have an EBS-backed root volume and, if available, an instance store volume mounted at /mnt. Neither of these volumes are persistent and they will be destroyed when the cluster terminates. A shared directory mounted at /home on the head node can be used across nodes. If you need more storage than what is available on the head node or if you want to keep your data after the cluster is terminated, you will need to create a new volume that can be attached to all nodes in the cluster. To do so, begin by creating an EBS-backed volume:

starcluster createvolume --shutdown-volume-host <volume_size_in_gigabytes> <region> -I t2.micro -i <Image ID>

Type starcluster listvolumes and get the volume-id for the volume that you just created. Open up your Starcluster configuration file and add the following volume definition:

[volume cpac_volume]
VOLUME_ID = <Volume ID>
MOUNT_PATH = /media/ebs

Append the following line to your cpac_cluster definition:

VOLUMES = cpac_volume

The Starcluster documentation explains how to perform other operations such as resizing and removing volumes.

Starting the C-PAC Head Node

To start up the head node on your C-PAC HPC cluster, use the following Starcluster command (with substitutions where necessary):

starcluster start -c cpac_cluster <cluster_name>

Adding Additional Nodes

To add additional nodes to your C-PAC HPC cluster, use the following Starcluster command (with substitutions where necessary):

starcluster addnode -n <number_of_nodes_to_add> <cluster_name>

Accessing the Head Node

If you wish to use the C-PAC GUI while accessing the head node, type the following command:

starcluster sshmaster -X -u ubuntu <cluster_name>

If you only wish to access the command line interface, you may omit the -X flag:

starcluster sshmaster -u ubuntu <cluster_name>

You may also use the instructions for X2Go from the Starting a Single C-PAC Instance via the AWS Console section to access the head node via a graphical shell. To do so, obtain the public DNS for the head node by typing starcluster listclusters. The public DNS will be in the last column of the row labeled master.

Using C-PAC to Submit an SGE Job

C-PAC performs the heavy lifting of creating an SGE job submission script and submitting it to the SGE job scheduler seamlessly. There are two ways to accomplish this- either through C-PAC’s GUI or from the command line.

Via the C-PAC GUI:

  1. Type cpac_gui while in the shell on the head node.
  2. From the main C-PAC window, load your pipeline configuration file.
  3. Under Computer Settings in the left pane, change Run C-PAC on Grid to True. Change SGE Parallel Environment to mpi_smp.

When you are done, your window should look like this:


Save the pipeline configuration file and run an analysis as you would normally.

Via the shell:

  1. Open your pipeline configuration YAML file in your preferred text editor.
  2. Change the runOnGrid field to a value of True.
  3. Make sure that the resourceManager field is set to SGE.
  4. Set the parallelEnvironment field to mpi_smp.
  5. Execute the following command to run your pipeline. /path/to/pipeline_config.yml /path/to/CPAC_subject_list.yml

Checking on SGE Jobs

Once you are done submitting the job, you can check its status by typing qstat. This command will produce output that looks similar to the following:

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
      1 0.55500 submit_201 ubuntu       r     06/05/2015 20:42:13 all.q@master                       1 1
      1 0.55500 submit_201 ubuntu       r     06/05/2015 20:42:13 all.q@node001                      1 2
      2 0.55500 submit_201 ubuntu       r     06/05/2015 20:42:58 all.q@node002                      1 1
      2 0.00000 submit_201 ubuntu       qw    06/05/2015 20:42:47                                    1 2

The job-ID is a number assigned to your job when it is submitted to the scheduler. The state of the job can be represented by one of several values: r means that the job is running, qw means that the job is queued and waiting, and E means that an error has occurred. The queue column indicates on which nodes of your cluster the C-PAC job is being executed.

If an error has occurred on any of the nodes while your pipeline executes, you should check the cluster_temp_files directory that was created in the directory from which you ran C-PAC. This will contain copies of the job submission scripts that C-PAC generated to start your job. It will also contain files containing the standard out and error messages for a given job. You should check these first to determine what may have caused the error. If these files do not help you determine what may have caused the error, feel free to ask for help on the C-PAC forum.

Terminating a Starcluster Instance

When you are done and have exited from your cluster, the following command will terminate the cluster:

starcluster terminate <cluster_name>

If you receive an error from Starcluster while trying to terminate the instance, the following command will force Starcluster to terminate your cluster:

starcluster terminate -f <cluster_name>

Warning: If you are not using persistent storage (see Attaching Persistent Storage to Your Cluster) then all of your data will be lost upon termination of the cluster. You will need to copy your data to another drive if you wish to keep it.

With OpenNeuro

The OpenNeuro project is an initiative to provide easy access to public neuroimaging datasets and the ability to quickly run analysis pipelines on these datasets directly through a web interface. C-PAC is available as an app on OpenNeuro, and more information on running apps on the platform is available here.