Computer Settings

[Screenshot: the Computer Settings pane of the C-PAC GUI]

Pipeline Name - [text]: A name for this pipeline configuration - useful for identification. Note that including an individual participant’s ID code in this field will currently cause C-PAC to crash.

  1. Maximum Memory Per Participant (GB) - [number]: The maximum amount of memory each participant’s workflow can allocate. Use this to place an upper bound on memory usage. Warning: ‘Memory Per Participant’ multiplied by ‘Number of Participants to Run Simultaneously’ must not exceed the total amount of RAM. Conversely, allocating too little RAM can slow down a pipeline run. It is recommended that you set this to a value that, when multiplied by ‘Number of Participants to Run Simultaneously’, is as much RAM as you can safely allocate (see the worked example after this list).

  2. Maximum Cores Per Participant - [integer]: The number of cores (on a single machine) or slots on a node (on a cluster/grid) to allocate per participant; slots are simply cores on a cluster/grid node. ‘Number of Cores Per Participant’ multiplied by ‘Number of Participants to Run Simultaneously’ must not be greater than the total number of cores. Dedicating more than one core/CPU per participant directs C-PAC to parallelize the motion correction and time-series registration transform application steps, yielding a speed increase.

  3. Number of Participants to Run Simultaneously - [integer]: The number of participant workflows to run at the same time. This number depends on your available computing resources.

  4. Number of Cores for Anatomical Registration (ANTS) - [integer]: The number of cores to dedicate to ANTS-based anatomical registration per participant. This number depends on your available computing resources and cannot be greater than the number of cores per participant.

  5. FSL directory - [path]: Full path to the FSL version to be used by CPAC. If you have specified an FSL path in your .bashrc file, this path will be set automatically.

  6. Run CPAC on a Cluster/Grid - [False, True]: Select False if you intend to run CPAC on a single machine. If set to True, CPAC will attempt to submit jobs through the job scheduler / resource manager selected below.

  7. Resource Manager - [SGE, PBS, SLURM]: Sun Grid Engine (SGE), Portable Batch System (PBS), or Simple Linux Utility for Resource Management (SLURM). Only applies if you are running on a grid or compute cluster. See the section below entitled Setting up SGE for more information on how to set up SGE.

  8. SGE Parallel Environment - [text]: SGE Parallel Environment to use when running CPAC. Only applies when you are running on a grid or compute cluster using SGE. See the section below entitled Setting up SGE for more information on how to set up SGE.

  9. SGE Queue - [text]: SGE Queue to use when running CPAC. Only applies when you are running on a grid or compute cluster using SGE. See the section below entitled Setting up SGE for more information on how to set up SGE.
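
As a worked example of the arithmetic above, suppose a single machine with 64 GB of RAM and 8 cores (these totals are assumptions chosen only for illustration). Using the YAML keys described in the next section, one reasonable configuration keeps both products within the machine's limits:

pipeline_setup:
  system_config:

    # 4 participants x 15 GB = 60 GB, safely under the 64 GB of total RAM
    maximum_memory_per_participant: 15

    # 4 participants x 2 cores = 8 cores, the machine's total
    max_cores_per_participant: 2
    num_participants_at_once: 4

    # must not exceed max_cores_per_participant
    num_ants_threads: 2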

Configuration Without the GUI

The following nested key/value pairs will be set to these defaults if not defined in your pipeline configuration YAML.

pipeline_setup:
  system_config:

    # Random seed used to fix the state of execution.
    # If unset, each process uses its own default.
    # If set, a `random.log` file will be generated logging the random seed and each node to which that seed was applied.
    # If set to a positive integer (up to 2147483647), that integer will be used to seed each process that accepts a random seed.
    # If set to 'random', a random positive integer (up to 2147483647) will be generated and that seed will be used to seed each process that accepts a random seed.
    random_seed:

    # Select Off if you intend to run CPAC on a single machine.
    # If set to On, CPAC will attempt to submit jobs through the job scheduler / resource manager selected below.
    on_grid:

      run: Off

      # Sun Grid Engine (SGE), Portable Batch System (PBS), or Simple Linux Utility for Resource Management (SLURM).
      # Only applies if you are running on a grid or compute cluster.
      resource_manager: SGE

      SGE:
        # SGE Parallel Environment to use when running CPAC.
        # Only applies when you are running on a grid or compute cluster using SGE.
        parallel_environment:  mpi_smp

        # SGE Queue to use when running CPAC.
        # Only applies when you are running on a grid or compute cluster using SGE.
        queue:  all.q

    # The maximum amount of memory each participant's workflow can allocate.
    # Use this to place an upper bound of memory usage.
    # - Warning: 'Memory Per Participant' multiplied by 'Number of Participants to Run Simultaneously'
    #   must not be more than the total amount of RAM.
    # - Conversely, using too little RAM can impede the speed of a pipeline run.
    # - It is recommended that you set this to a value that when multiplied by
    #   'Number of Participants to Run Simultaneously' is as much RAM you can safely allocate.
    maximum_memory_per_participant: 1

    # The maximum amount of cores (on a single machine) or slots on a node (on a cluster/grid)
    # to allocate per participant.
    # - Setting this above 1 will parallelize each participant's workflow where possible.
    #   If you wish to dedicate multiple cores to ANTS-based anatomical registration (below),
    #   this value must be equal or higher than the amount of cores provided to ANTS.
    # - The maximum number of cores your run can possibly employ will be this setting multiplied
    #   by the number of participants set to run in parallel (the 'Number of Participants to Run
    #   Simultaneously' setting).
    max_cores_per_participant: 1

    # The number of cores to allocate to ANTS-based anatomical registration per participant.
    # - Multiple cores can greatly speed up this preprocessing step.
    # - This number cannot be greater than the number of cores per participant.
    num_ants_threads: 1

    # The number of cores to allocate to processes that use OpenMP.
    num_OMP_threads: 1

    # The number of participant workflows to run at the same time.
    # - The maximum number of cores your run can possibly employ will be this setting
    #   multiplied by the number of cores dedicated to each participant (the 'Maximum Number of Cores Per Participant' setting).
    num_participants_at_once: 1

    # Full path to the FSL version to be used by CPAC.
    # If you have specified an FSL path in your .bashrc file, this path will be set automatically.
    FSLDIR:  /usr/share/fsl/5.0
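
If you are unsure which FSL installation C-PAC will pick up, you can check the FSLDIR environment variable from a shell before editing this path (a quick sanity check; on most FSL installations the version is also recorded in $FSLDIR/etc/fslversion):

echo $FSLDIR
cat $FSLDIR/etc/fslversion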

Setting up SGE

Preliminaries

Before you configure Sun Grid Engine so that it works with C-PAC, you should understand the following concepts:

  • Job Scheduler - A program that can allocate computational resources in an HPC cluster to jobs based on availability and distribute jobs across nodes. C-PAC can use Sun Grid Engine (SGE) as its job scheduler (and SGE comes pre-configured with C-PAC’s cloud image).

  • Parallel Environment - A specification for how SGE parallelizes work. Parallel environments can have limits on the number of CPUs used, whitelists and blacklists that dictate who can use resources, and specific methods for balancing server load during distributed tasks.

  • The Job Queue - A grouping of jobs that run at the same time. The queue can be frozen, in which case all jobs that it contains will cease.

  • Head Node - The primary node of an HPC cluster, to which all other nodes are connected. The head node will run a job scheduler (such as Sun Grid Engine) to allocate jobs to the other nodes.

  • Worker Node - A node in an HPC cluster to which tasks are delegated by the head node via a job scheduler.

  • Job Submission Script - A shell script with a series of commands to be executed as part of the job. Submission scripts may also include flags that activate functionality specific to the scheduler.
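
For readers new to SGE, the sketch below shows what a generic job submission script might look like; the job name, output files, and slot count are placeholders chosen for illustration, and the queue and parallel environment names (all.q and mpi_smp) are the ones configured later in this section. Note that C-PAC constructs and submits its own jobs when run in grid mode, so this is only to illustrate the concept:

#!/bin/bash
# Lines beginning with "#$" are flags interpreted by the SGE scheduler.
#$ -N example_job        # job name
#$ -q all.q              # queue to submit the job to
#$ -pe mpi_smp 4         # parallel environment and number of slots
#$ -cwd                  # run from the current working directory
#$ -o example_job.out    # standard output file
#$ -e example_job.err    # standard error file

echo "Running on $(hostname) with $NSLOTS slots"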

Configuring A Parallel Environment

The broader specifics of configuring a parallel environment in SGE are beyond the scope of this guide (see Oracle’s blog for a good primer). Nevertheless, we will discuss how to configure an environment that is compatible with C-PAC. To do this, we will first create a file named mpi_smp.conf with the following contents:

pe_name            mpi_smp
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

This configuration ensures that:

  • All of the cores will be used (assuming your system has fewer than 999 cores; if you are lucky enough to have more than this, the maximum value for this field is 9999999).

  • No users are whitelisted or blacklisted and no special hooks or cleanup tasks occur before or after a job.

  • All job slots that a C-PAC job submission requests are allocated on the same machine (this ensures that each subject’s computations are handled by the same node and that the cores allocated for one of C-PAC’s steps are not distributed across different machines).

  • SGE has full control over the jobs submitted (in terms of resource scheduling).

  • The C-PAC run is not part of a parallel job that would require an awareness of which task was performed first (the subjects can be assigned to nodes in any order).

  • An accounting record is written concerning how the job used resources.

To activate this parallel environment and tie it to a job queue named all.q, execute the following commands on your cluster’s head node:

qconf -Ap /path/to/mpi_smp.conf
qconf -mattr queue pe_list "mpi_smp" all.q
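
To confirm that the parallel environment and queue were registered as expected, you can inspect them with qconf (standard SGE commands; the names match the example above):

qconf -sp mpi_smp
qconf -sq all.q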

You would then set the SGE Parallel Environment to mpi_smp and the SGE queue to all.q in your pipeline configuration file before starting your C-PAC run.
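
In the pipeline configuration YAML described earlier on this page, that corresponds to a block like the following (a sketch using the same keys shown in the defaults above, with the example names mpi_smp and all.q):

pipeline_setup:
  system_config:
    on_grid:
      run: On
      resource_manager: SGE
      SGE:
        parallel_environment: mpi_smp
        queue: all.q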