The C-PAC team has released an Amazon Marketplace AMI, making it easier for researchers to use C-PAC in the cloud. You can use the AMI to either launch a single machine for basic runs or create a high performance computing (HPC) cluster using Starcluster. Clusters can be dynamically scaled up as your computational needs increase. Detailed explanations of cloud computing and HPC are beyond the scope of this documentation, but we will define a few key terms before we start. If these terms are familiar, you may skip them and proceed to later sections.
Lastly, it is also worth reviewing terms related to the Sun Grid Engine (SGE) job scheduler, which C-PAC uses to distribute jobs across cluster nodes.
Before you can create a single C-PAC machine or a C-PAC HPC cluster, you must first generate credentials that will allow you to log into any AWS instance that you create. The following steps will walk you through the process of creating all the necessary credentials and encryption keys that you will need.
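Once you have downloaded your pem file, note that SSH will typically refuse to use a private key whose permissions are too open. If you see a "permissions are too open" error when connecting, restricting the key file's permissions usually resolves it:

chmod 400 /path/to/pem/file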
Now that you have generated the access keys and a pem file, you may launch a single instance via Amazon’s web interface by following the steps below. If you are planning on processing many subjects or obtaining computationally-intensive derivatives (such as network centrality), you should use Starcluster instead.
ssh -i /path/to/pem/file ubuntu@<Public Domain Name> 'sudo mkfs -t ext4 /dev/xvdb && sudo mount /dev/xvdb /media/ebs'
To use this volume with future instances, you may attach it to the instance using the AWS console and then use this command:
ssh -i /path/to/pem/file ubuntu@<Public Domain Name> 'sudo mount /dev/xvdb /media/ebs'
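If you want to confirm that the volume was formatted and mounted correctly, you can check the mount point with a standard disk usage query (the reported size will depend on the volume size you chose):

ssh -i /path/to/pem/file ubuntu@<Public Domain Name> 'df -h /media/ebs'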
Note that the creation of a persistent volume is heavily automated in Starcluster, so if you will be creating many different persistent volumes you should use Starcluster instead.
There are two different means of accessing the instance: through X2Go (a desktop GUI-based session) or through SSH (a command line session).
When you are done, your session configuration should look similar to the following:
Note: If X2Go does not work on your computer, you can also access the C-PAC GUI by adding the -X flag to the ssh command to enable X11 port forwarding (i.e., the ssh command would be ssh -X -i /path/to/pem/file ubuntu@<Public Domain Name>). X11 port forwarding is very slow compared to X2Go, however, so it is recommended that you troubleshoot X2Go further before turning to this option.
To upload data to your newly-created AWS instance, you can run the following command on the computer containing your data:
scp -r -i /path/to/pem/key /path/to/data ubuntu@<Public Domain Name>:/path/to/server/directory
If you have configured persistent storage, you will want to ensure that /path/to/server/directory is pointing to the mount point for the persistent storage. If you followed the instructions above or the instructions in the Starcluster section below, the mount point should be /media/ebs.
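For example, assuming a key named cpac_key.pem in your home directory, a local dataset in ~/data/site_1, and a persistent volume mounted at /media/ebs, the upload command might look like the following (the host name below is a placeholder; substitute your instance's actual public domain name):

scp -r -i ~/cpac_key.pem ~/data/site_1 ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com:/media/ebs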
Starcluster is suggested for more sophisticated C-PAC runs. Using Starcluster, you can parallelize your analyses by distributing subjects across multiple nodes in an HPC cluster. The following section describes how to install and configure Starcluster to work with C-PAC, dynamically add nodes to your cluster and leverage C-PAC’s grid functionality.
If you have pip installed, Starcluster can be installed via:
pip install starcluster
Note that if you are using a *nix-based OS and you are not using an environment such as Miniconda, you will need to run the above command with sudo.
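For example, one way to avoid needing sudo is to install Starcluster into a Miniconda environment (the environment name below is arbitrary):

conda create -n starcluster pip
source activate starcluster
pip install starcluster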
If you do not have pip installed, see the Official Starcluster Installation Instructions for alternative installation methods.
The C-PAC Starcluster plug-ins configure the SGE environment that C-PAC uses and ensure that storage space is writable. From the terminal, download the C-PAC Starcluster plug-ins and install them by running the following commands:
cd /tmp
git clone https://github.com/FCP-INDI/CPAC_CLOUD
cd CPAC_CLOUD/sc_plugins
mv *.py ~/.starcluster/plugins
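Note that if the ~/.starcluster/plugins directory does not already exist (for example, if you have never run Starcluster on this machine before), you may need to create it before the final mv command will succeed:

mkdir -p ~/.starcluster/plugins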
Now you will need to create a Starcluster configuration file so that Starcluster can use your keys and know which instance types you would like to use. To begin, type starcluster help and select option 2.
Fill in the AWS access keys from the CSV file that you created in the Creating AWS Access and Network Keys section:
[aws info]
AWS_ACCESS_KEY_ID = <Your Access Key>
AWS_SECRET_ACCESS_KEY = <Your Secret Key>
You do not need to define the AWS_USER_ID field unless you want to create custom AMIs based off the C-PAC AMI. The public C-PAC AMI is available in us-east-1, so you should not change the value of AWS_REGION_NAME.
Point your key definition to the pem file you generated in the Creating AWS Access and Network Keys section:
[key cpac_key]
KEY_LOCATION = /path/to/pem/file
Find the image ID for the C-PAC AMI by logging into the AWS Console in your web browser. Make sure that you are in the N. Virginia region. Navigate to the EC2 service and click Images -> AMIs. Then click the Owned by Me drop-down in the upper left corner and switch it to Public images. Search for 'CPAC'. Select the version of C-PAC that you wish to use and look in the lower pane for the AMI ID field.
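If you prefer the command line and already have the AWS CLI installed and configured, a query along the following lines should also list the public C-PAC AMIs and their IDs (this is an optional alternative to the console, not a required step):

aws ec2 describe-images --region us-east-1 --filters "Name=name,Values=*CPAC*" --query "Images[].[ImageId,Name]" --output table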
Add the following cluster definition to your configuration file:
[cluster cpac_cluster]
KEYNAME = cpac_key
PLUGINS = cpac_sge, mnt_config
CLUSTER_SIZE = 1
CLUSTER_SHELL = bash
NODE_IMAGE_ID = <Image ID>
MASTER_INSTANCE_TYPE = t2.medium
NODE_INSTANCE_TYPE = c3.8xlarge
You can customize this definition to include additional nodes or different instance types as needed. Note that you can always add nodes later using Starcluster from the command line. If you wish to use spot instances rather than on-demand instances, add the following line to the cluster definition:
SPOT = <bidding_price>
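If you are unsure what to bid, Starcluster can display recent spot pricing for an instance type, which can help you pick a reasonable value; for example, for the node instance type used above:

starcluster spothistory c3.8xlarge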
Also add the following two plug-in definitions for the C-PAC Starcluster plug-ins:
[plugin cpac_sge]
setup_class = cpac_sge.PEInstaller
pe_url = https://raw.githubusercontent.com/FCP-INDI/CPAC_CLOUD/master/mpi_smp.conf

[plugin mnt_config]
setup_class = mnt_perm.MntPermissions
By default, the cluster will have an EBS-backed root volume and, if available, an instance store volume mounted at /mnt. Neither of these volumes is persistent, and both will be destroyed when the cluster terminates. A shared directory mounted at /home on the head node can be used across nodes. If you need more storage than is available on the head node, or if you want to keep your data after the cluster is terminated, you will need to create a new volume that can be attached to all nodes in the cluster. To do so, begin by creating an EBS-backed volume:
starcluster createvolume --shutdown-volume-host <volume_size_in_gigabytes> <region> -I t2.micro -i <Image ID>
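As an illustration only, creating a 200 GB volume in us-east-1a (the same region as the public C-PAC AMI) with the same options might look like the following; substitute a size and zone appropriate for your data:

starcluster createvolume --shutdown-volume-host 200 us-east-1a -I t2.micro -i <Image ID>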
Type starcluster listvolumes and get the volume-id for the volume that you just created. Open up your Starcluster configuration file and add the following volume definition:
[volume cpac_volume]
VOLUME_ID = <Volume ID>
MOUNT_PATH = /media/ebs
Append the following line to your cpac_cluster definition:
VOLUMES = cpac_volume
The Starcluster documentation explains how to perform other operations such as resizing and removing volumes.
To start up the head node on your C-PAC HPC cluster, use the following Starcluster command (with substitutions where necessary):
starcluster start -c cpac_cluster <cluster_name>
To add additional nodes to your C-PAC HPC cluster, use the following Starcluster command (with substitutions where necessary):
starcluster addnode -n <number_of_nodes_to_add> <cluster_name>
If you wish to use the C-PAC GUI while accessing the head node, type the following command:
starcluster sshmaster -X -u ubuntu <cluster_name>
If you only wish to access the command line interface, you may omit the -X flag:
starcluster sshmaster -u ubuntu <cluster_name>
You may also use the instructions for X2Go from the Starting a Single C-PAC Instance via the AWS Console section to access the head node via a graphical shell. To do so, obtain the public DNS for the head node by typing starcluster listclusters. The public DNS will be in the last column of the row labeled master.
C-PAC seamlessly performs the heavy lifting of creating an SGE job submission script and submitting it to the SGE job scheduler. There are two ways to accomplish this: through C-PAC's GUI or from the command line.
Via the C-PAC GUI:
When you are done, your window should look like this:
Save the pipeline configuration file and run an analysis as you would normally.
Via the shell:
cpac_run.py /path/to/pipeline_config.yml /path/to/CPAC_subject_list.yml
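Because the command line submission runs inside your SSH session, you may wish to guard it against a dropped connection. One option (a general shell technique, not a C-PAC requirement) is to run the command under nohup and capture its output to a log file:

nohup cpac_run.py /path/to/pipeline_config.yml /path/to/CPAC_subject_list.yml > cpac_submit.log 2>&1 &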
Once you are done submitting the job, you can check its status by typing qstat. This command will produce output that looks similar to the following:
job-ID  prior    name        user    state  submit/start at      queue          slots  ja-task-ID
--------------------------------------------------------------------------------------------------
     1  0.55500  submit_201  ubuntu  r      06/05/2015 20:42:13  all.q@master       1  1
     1  0.55500  submit_201  ubuntu  r      06/05/2015 20:42:13  all.q@node001      1  2
     2  0.55500  submit_201  ubuntu  r      06/05/2015 20:42:58  all.q@node002      1  1
     2  0.00000  submit_201  ubuntu  qw     06/05/2015 20:42:47                     1  2
The job-ID is a number assigned to your job when it is submitted to the scheduler. The state of the job can be represented by one of several values: r means that the job is running, qw means that the job is queued and waiting, and E means that an error has occurred. The queue column indicates on which nodes of your cluster the C-PAC job is being executed.
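Two other standard SGE commands can be useful here (they are part of SGE itself rather than C-PAC): qstat -j prints detailed information about a single job, and qdel removes a job from the queue.

qstat -j <job-ID>
qdel <job-ID>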
If an error occurs on any of the nodes while your pipeline executes, check the cluster_temp_files directory created in the directory from which you ran C-PAC. It contains copies of the job submission scripts that C-PAC generated to start your job, as well as files containing the standard output and error messages for each job. Check these first to determine what may have caused the error. If these files do not help you determine the cause, feel free to ask for help on the C-PAC forum.
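For instance, inspecting these files might look like the following (the error-file name below is a placeholder; actual file names vary between runs):

ls cluster_temp_files/
tail -n 50 cluster_temp_files/<name_of_error_file>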
When you are done and have exited from your cluster, the following command will terminate the cluster:
starcluster terminate <cluster_name>
If you receive an error from Starcluster while trying to terminate the cluster, the following command will force Starcluster to terminate it:
starcluster terminate -f <cluster_name>
Warning: If you are not using persistent storage (see Attaching Persistent Storage to Your Cluster) then all of your data will be lost upon termination of the cluster. You will need to copy your data to another drive if you wish to keep it.