Main Page


Advanced Computing Facility Documentation
This site is under construction! If you would like to ask questions or suggest topics for the cluster documentation, please email clusterhelp@acf.ku.edu



Frequently Asked Questions

Answers to frequently asked questions and solutions to common problems can be found on our FAQ page.

Access

Who can access the ACF cluster?

Anyone with an ITTC/ACF user account can access the cluster. Email clusterhelp@acf.ku.edu to find out how to sign up for an account.

How to access the cluster in Nichols Hall

When you are using a machine connected to the ITTC network, you can access the cluster through SSH to either login1.acf.ku.edu or login2.acf.ku.edu.
SSH can provide command-line access or GUI application access with X11 forwarding.
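For example, from a machine on the ITTC network (replace "username" with your ITTC/ACF account name; the "-Y" flag enables X11 forwarding):

ssh username@login1.acf.ku.edu
ssh -Y username@login2.acf.ku.edu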

Remote access

If you are using a University computer, you can ssh directly into the cluster login servers:

login1.acf.ku.edu or login2.acf.ku.edu

To access the cluster from outside the University of Kansas, you can use the KU Anywhere VPN. For more details visit: http://technology.ku.edu/software/ku-anywhere-0


Storage

Filesystems

The cluster filesystems can accommodate researchers working with small to moderate data sizes (10GB to 1TB), with different filesystems suited to specific needs. Please email clusterhelp@acf.ku.edu to request additional storage space. Be specific about the amount of space required, how long the data will be stored, and whether the data needs to be backed up.
Cluster filesystems and descriptions

  • /users
The /users filesystem is the most heavily used on the cluster and throughout ITTC. It is extremely important to make sure this filesystem is lightly loaded and always responsive. When running cluster jobs, you may use /users for your compiled programs and cluster job organization, but it is important to store and access data on other filesystems.
  • /data
The /data filesystem is best suited for storing large data sets. The intended usage case for /data is for files that are written once, and read multiple times.
  • /work
The /work filesystem is best suited for recording output from cluster jobs. If a researcher has a batch of cluster jobs that will generate large amounts of output, space will be assigned in /work.
  • /projects
The /projects filesystem is typically used for organizing group collaborations.
  • /scratch
The /scratch filesystem is the only cluster filesystem that is not backed up. This space is used for storing data temporarily during processing on the cluster. Exceptionally large data sets or large volumes of cluster job output may pose difficulty for the storage backup system and are stored in /scratch during processing.
  • /library
The /library directory contains read-only space for researchers who need copies of data on each node of the cluster. Email clusterhelp@acf.ku.edu to ask for data sets to be copied to /library.
  • /tmp
Each node has a local storage space that is freely accessible in /tmp. It is often useful to write output from cluster jobs to the local disk, archive the results, and copy the archive to another cluster filesystem.
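A job script might follow this pattern (a minimal sketch; the program name and the destination directory are placeholders):

# Write output to node-local /tmp, archive it, and copy the archive off the node
OUTDIR=$(mktemp -d /tmp/myjob.XXXXXX)
./my_program > $OUTDIR/output.txt
tar -czf /tmp/myjob_results.tar.gz -C $OUTDIR .
cp /tmp/myjob_results.tar.gz /work/my_space/
rm -rf $OUTDIR /tmp/myjob_results.tar.gz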

File transfer

The system transfer.acf.ku.edu is set up to handle file transfer requests. It is only accessible on the KU campus or when connected to the KU Anywhere VPN. You can transfer data to/from any cluster filesystem using scp (one popular Windows client is WinSCP) or by mounting the storage using SMB.
Example using Linux/OS X:
scp -r my_folder username@transfer.acf.ku.edu:/scratch/space
Example using SMB mounts under Windows:
\\transfer.acf.ku.edu\username
\\transfer.acf.ku.edu\data
Example using SMB mounts under OS X:
smb://transfer.acf.ku.edu/work

Please email clusterhelp@acf.ku.edu for assistance.

Job Submission

A complete guide to submitting cluster jobs is here:
Cluster Jobs Submission Guide

Simplest cluster job example (run this on login1 or login2):
echo "echo Hello World!" | qsub
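A job can also be described in a script file with "#PBS" directives and submitted by name (a sketch; the script name and resource values are only examples):

#!/bin/bash
#PBS -l nodes=1:ppn=1,mem=2000m,walltime=01:00:00
# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR
echo "Hello World!"

Submit it with:
qsub myjob.sh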

Queues

The cluster has 5 queues:

  • default
The default queue is used when you do not specify a queue. Jobs in this queue have a default walltime of 3 days unless otherwise specified and a maximum walltime of 1 week.
  • long
The long queue is for jobs with a walltime of at least 1 week. There is no maximum walltime; it is up to users to request an acceptable maximum duration for their jobs to complete.
  • gpu
The gpu queue is used to access cluster nodes that have NVIDIA GPUs. CUDA or OpenCL application support is required to make use of GPU computing resources.
  • bigm
The bigm queue is used to access the cluster "large memory" nodes. These nodes are a special resource to be used only when applications require large amounts of memory on a single node.
  • interactive
The interactive queue is used for graphical interfaces and remote desktop sessions on the cluster, and for testing, debugging, and profiling cluster jobs. The "-I" flag (capital "i") must also be passed to qsub to start an interactive job. Jobs in the interactive queue have a walltime limit of 1 day.
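Queues other than the default are selected with the "-q" flag to qsub, for example (the script name is a placeholder):

qsub -q long myjob.sh
qsub -q bigm myjob.sh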

Resources

Resources are requested with the "-l" flag (lower case "L") to qsub. Common resources are nodes, ppn (processors per node), mem (memory), and walltime (maximum time required for the job).
For example, this command requests an interactive session with 8 cores on a single node, 12000MB total memory, and at most 10 hours to complete:
qsub -q interactive -I -l nodes=1:ppn=8,mem=12000m,walltime=10:00:00
A simple method for requesting an interactive cluster job is provided with the "qlogin" command. By default, "qlogin" requests 1 node, 1 core, and 1993MB of memory.
qlogin [number_of_nodes] [processors_per_node],[mem=total_memory]
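For example, following the syntax above, the following should request 1 node with 4 cores and 8000MB of memory (the values are only illustrative):

qlogin 1 4,mem=8000m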

Memory

It is important to request enough memory for the cluster jobs you submit. Requesting more memory than a job needs is safe, but it leads to poor utilization of the cluster. The cluster scheduler will kill any job that exceeds its requested memory.

The recommended practice is to run a single job interactively first and follow the job profiling guidelines in the Profiling section to measure the amount of memory and the number of threads used. After measuring the memory and cpu usage, the correct amount of system resources can be requested for future jobs.

Walltime

The default queue has a default walltime of 3 days (72 hours) and a maximum walltime of 1 week (168 hours). For jobs longer than 1 week, use the "long" queue. The scheduler will kill jobs that exceed the requested amount of walltime.

Walltime is requested using the "-l" flag to qsub in Hours, Minutes, and Seconds, using the following format:
walltime=HH:MM:SS
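For example, to request 96 hours in the default queue, or two weeks (336 hours) in the long queue (the script name is a placeholder):

qsub -l walltime=96:00:00 myjob.sh
qsub -q long -l walltime=336:00:00 myjob.sh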

Email job notification

Email notification for jobs is provided through the "-m" and "-M" qsub flags. Specify an email address with the "-M" flag, and the cluster will send notifications to that address.

The behavior of email notifications is set by combinations of the letters "a", "b", "e", and "n" provided to the "-m" qsub flag. The options for email notifications are:

  • "a" Abort. Email is sent when job errors are encountered.
  • "b" Begin. Email is sent when jobs begin.
  • "e" End. Email is sent when jobs end.
  • "n" Never. No email is sent.

For large numbers of jobs, the recommended setting is "-m a" which will provide notification only if there is an error.
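For example, to be notified only on errors (the email address and script name are placeholders):

qsub -m a -M username@ku.edu myjob.sh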

Output files

The cluster scheduler collects text from STDOUT and STDERR for each job. The text is held locally on the node until the job finishes and is then copied back to the directory from which the job was submitted. You can control the behavior of the output files with the "-o", "-e", and "-j" qsub flags.

If the output file name is omitted, the job will write out files with the same name as the cluster job. For more information, read the Cluster Jobs Submission Guide.
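For example, to name the output files explicitly, or to join STDERR into the STDOUT file (the file and script names are placeholders):

qsub -o myjob.out -e myjob.err myjob.sh
qsub -j oe -o myjob.log myjob.sh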

GPU jobs

Jobs that make use of GPUs on the nodes may be requested using the gpu queue and an additional resource flag. On the qsub command line, a number of GPUs per node can be requested as:

qsub -q gpu -l nodes=1:ppn=1:gpus={number}

These arguments can be used inside a script file as:

#PBS -q gpu
#PBS -l nodes=1:ppn=1:gpus={number}

The "gpus={number}" argument can be used with varying number of nodes and cores and can be used together with other arguments to the "-l" flag as "memory" and "walltime".

When a job starts with the above request, the scheduler assigns {number} GPUs to the job and exports two environment variables, PBS_GPUFILE and CUDA_VISIBLE_DEVICES, which identify the GPUs the job's processes have been assigned.

PBS_GPUFILE contains the name of a file that holds the set of assigned GPUs. The file includes a line for each assigned GPU with the syntax {hostname}-gpu{number}. For example, if a job requests 2 GPUs and is assigned GPUs numbered 1 and 3 on the system, the contents of the file would be:

{hostname}-gpu1
{hostname}-gpu3

CUDA_VISIBLE_DEVICES contains a comma-separated list of the assigned GPU device numbers. This variable has important effects on CUDA processes: it directs the CUDA runtime libraries to automatically run on the numbered GPUs, and other CUDA devices on the system will not be detected by the process.
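For example, a job script submitted to the gpu queue can inspect both variables before launching its CUDA program (a sketch; the program name and GPU count are placeholders):

#PBS -q gpu
#PBS -l nodes=1:ppn=1:gpus=2
cat $PBS_GPUFILE              # lists lines such as {hostname}-gpu1
echo $CUDA_VISIBLE_DEVICES    # e.g. 1,3
./my_cuda_program             # the CUDA runtime sees only the assigned GPUs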

Application Support

Software environment

The cluster consists of nodes running Red Hat Enterprise Linux version 6 (RHEL6). By default, gcc 4.4.6 is used, but other versions of gcc are available. All installed applications are found in /tools/cluster/6.2.

Application usage

Many cluster applications require a set of environment variables to be correctly set, most notably PATH and LD_LIBRARY_PATH. To manage software-specific environment variables, two methods are provided.

env-selector and mpi-selector

To simplify application usage on the cluster, we provide a program called "env-selector-menu" that manages each user's environment variables for the selected software. Run:

env-selector-menu

This creates a file in the user's home directory called .env-selector containing the selections. You may remove this file to clear the selections chosen.

Env-selector is based on software called mpi-selector (copyright Cisco 2007), which is also installed. To compile and run cluster applications that use MPI, first choose the correct MPI version by running:

mpi-selector-menu

Environment modules

A list of available modules is provided by running:

module avail

An environment module can be loaded in the current shell by running:

module load {module_name}

Tab-completion will correctly show the names of modules after typing in the "module load" command. To make settings persistent, add each "module load {module_name}" command to your ~/.bash_profile script if you are using the bash shell or to your ~/.cshrc file if you are running csh or tcsh.

To see which modules are currently loaded, run:

module list

GUI and remote desktop support

X11 forwarding

Access to a GUI running on the cluster may be accomplished with X11 forwarding. Data from the remote application is sent over ssh to an X server running locally. Each ssh connection between the local machine and the cluster must be started with X11 forwarding enabled. The "-X" flag to qsub tells the cluster scheduler to forward X11 from the job to the login node. The following steps assume that the local machine has an X server running.

1. Login via ssh to login1 or login2. Make sure your local ssh client has X11 forwarding enabled. If you are using ssh on the command line, add the "-Y" flag to your ssh command.
2. Start an interactive session with X11 forwarding. Be sure to request the number of cores, amount of memory, and walltime to complete your job. Syntax:

qsub -X -I -q interactive -l nodes={m}:ppn={n},mem={memory},walltime={time} 

VNC remote desktop

Access to desktop sessions over VNC is provided using a websocket-connected VNC server. The following steps start a VNC server on a node in an interactive session and provide a URL for accessing the VNC session from an HTML5-capable browser. The URL is accessible from the KU campus or when connected to the KU Anywhere VPN.

1. Run "env-selector-menu". From the menu, choose "TurboVNC-1.1" and "novnc-0.4-11"
2. Start an interactive session. Be sure to request the number of cores, amount of memory, and walltime to complete your job. Syntax:

qsub -I -q interactive -l nodes={m}:ppn={n},mem={memory},walltime={time} 

3. Once you have started an interactive session, run "glogin". The first time you run "glogin", you will be prompted to create a new password to access your VNC session. The password must be less than 8 characters.
4. "glogin" will display a URL. Copy the URL and paste it into a web browser on your system. Enter your username and VNC password to login.
5. When you have completed your job, return to your interactive session to stop the VNC server by typing "ctrl-C".

ACF Portal Website

portal.acf.ku.edu

The ACF Portal can be used for up to 10 simultaneous interactive cluster jobs with a graphical desktop. Access to the ACF Portal is available on the KU campus or when connected to the campus network via the KU Anywhere VPN.

At this time, interactive cluster jobs through the ACF Portal are limited to a single core, 2GB of memory, and 72 hrs of walltime. Future developments may allow users to change the resources requested on the interactive job.

Linux desktop sessions, when started via the portal, can be connected to with the TurboVNC viewer, available for download at the TurboVNC Sourceforge page. Description of the TurboVNC project is online at www.turbovnc.org.

portal.acf.ku.edu also provides a browser-embedded VNC client using Java. To make use of the Java-based VNC client, users may need to adjust Java security settings. The settings can be changed using the Java Control Panel (JCP); see the documentation of the JCP at the official Java website. To access the JCP under Linux, run "jcontrol" from the command line.

Matlab support

Support for running many MATLAB jobs on the cluster is provided using the MATLAB compiler, mcc. There are several important considerations for using MATLAB:

  1. MATLAB automatically runs as many threads in each program instance as there are cores on the system. Use the MATLAB function "maxNumCompThreads(max_threads)" to set the correct number of threads for the application and request the same number of cores on a single node from the scheduler.
  2. By default, the MATLAB compiler cache unpacks an archive into the user's ~/.mcrCache directory and locks it. Other simultaneous programs will wait until the lock is removed before starting to run. Set MCR_CACHE_ROOT to a different directory for each cluster job, typically a directory in /tmp provided by the command "mktemp -d" (see the sketch after this list).
  3. To pass arguments to a compiled MATLAB program, the program must be written as a MATLAB function. All arguments are passed as strings; numerical arguments must be converted from strings to numbers with the "str2num(var)" function.
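A minimal sketch of points 2 and 3 inside a cluster job script (the compiled program name and its argument are hypothetical):

# Give each job its own MCR cache directory in node-local /tmp
export MCR_CACHE_ROOT=$(mktemp -d /tmp/mcr.XXXXXX)
./my_compiled_prog 42    # the argument arrives in the program as the string "42"
rm -rf $MCR_CACHE_ROOT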

To provide a simple method of compiling and testing MATLAB programs on the cluster, we provide the "makematlab" script, which automates the tasks required to convert a MATLAB script into a compiled MATLAB program. Read more about makematlab.

Software compilation

The cluster provides gcc 4.4.6 by default, and other versions of gcc in the 4.x branch are available through "env-selector-menu". Other libraries such as BLAS, LAPACK, ATLAS, and FFTW3 are available in the /tools/cluster/6.2 directory. When compiling code, request an interactive cluster session and avoid using the login nodes for compilation. If you encounter a missing dependency, please email clusterhelp@acf.ku.edu for assistance.

CUDA support

The latest revision of each CUDA version is available using "env-selector-menu". The tools are provided for researchers needing access to a specific compiler version. Only the newest version of CUDA can be used for running compiled programs on the GPU nodes in the cluster.
Currently, access to GPU nodes is provided through the "gpu" queue. Any jobs submitted to "gpu" will only run on GPU nodes. The method of requesting GPU nodes may change to accommodate greater numbers of GPU nodes and different hardware types.

Hadoop support

The ACF cluster supports Hadoop in conjunction with the Hadoop-On-Demand (HOD) scheduler. A Hadoop cluster with a variable number of nodes can be started from any node in the cluster except the login nodes login1/login2. This means you need to run HOD from within another cluster job, which controls the Hadoop instance.

HOD has a set of options that control how the Hadoop cluster is set up. The essential options are wrapped up in the provided scripts "hodstart" and "hodstop". Required environment variables are declared in the env.sh/env.csh scripts in the working directory provided to "hodstart". Note that the variable $HDFS stands for the HDFS filesystem.

hodstart -d work_dir [-v version] [-q queue] [-n number_nodes] [-m memory] [-w walltime]
hodstop -d work_dir

An example hadoop job demonstrating file transfer is shown below.

# Setup basic test
mkdir ~/test
hodstart -d ~/test -n 16 -m 80g -w 00:10:00
source ~/test/env.sh
hadoop dfs -mkdir $HDFS/test
# Create test file to transfer
dd if=/dev/zero of=~/test/txfile bs=1048576 count=1024
# Transfer file
hadoop dfs -put ~/test/txfile $HDFS/test
# Remove original file
rm ~/test/txfile
# Verify file transferred correctly
hadoop dfs -ls $HDFS/test
# Cleanup
hadoop dfs -rmr $HDFS/test
hodstop -d ~/test

LAMMPS Support

Several versions of LAMMPS are available from the env-selector menu; however, long simulations and large jobs may require special features and/or be very sensitive to specific performance optimizations.

Pre-optimized version

A new optimized build of LAMMPS has been provided to improve runtimes on all hardware types on the ACF cluster; this is listed in env-selector with a "-impi" suffix ("lammps-20140201-impi" at the time of this writing) since it is built against Intel MPI and the MKL libraries. Since this is a departure from previous builds which used GCC, MPICH2, and KISS FFT, a few modifications are required:

1. Before running this optimized LAMMPS in a cluster job, execute "module load lammps-impi/20140201" to override the default environment variables.

2. Call lmp_acf in place of lmp_g++.
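Putting both steps together, a job might run the optimized build as follows (a sketch; the input file name and process count are only examples):

module load lammps-impi/20140201
mpirun -np 16 lmp_acf -in in.my_simulation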

The first requirement may be removed in the future; for now, the older versions are being given priority for reasons of backward compatibility and reproducibility of research. After further testing, validation, and review, environment module configurations such as this will be integrated into the standard offerings; for now, please forward any feedback about this environment to clusterhelp@acf.ku.edu

Manual Optimization and Benchmarking

The ACF has provided several guides to help identify strategies for tuning LAMMPS simulations: a basic guide covering methods for collecting repeatable and uniform measurements, an intermediate guide discussing built-in LAMMPS optimizations, launch strategies, and library choices, and an advanced guide with examples of processor-specific compiler options and custom MPI wrappers for tuning the entire software stack.

The following is a brief list of topics contained in these guides:

  • Basic concepts
    • Running several small jobs vs. one large job
    • Impact of NUMA topology on performance
    • Variations in performance under varying system loads
    • Scripting reproducible builds
  • Intermediate methods
    • Try the OPT package to trim unnecessary calculations
    • Use the OMP package to reduce communication overhead
    • Use the fix balance command to distribute computation efficiently
    • Try linking against optimized math libraries such as FFTW
  • Advanced strategies
    • Compile different components with different optimizing compilers according to their relative strengths
    • Use architecture-specific compiler optimizations depending on the machine where the code is to be executed
    • Pin processes to avoid inefficiency caused by process migration
    • Use custom MPI wrappers based on research-class optimizing compilers
    • Fine-tune multithreading at multiple execution levels

Profiling

The ACF has a license for Intel's Cluster Studio XE, which includes a variety of profiling and tuning tools, notably VTune. VTune Amplifier XE is used to collect and visualize code efficiency, and the Intel Trace Analyzer and Collector (ITAC) is used to collect and visualize message-passing efficiency. Both require the intel_compiler environment from env-selector.

Amplifier XE is easiest to run with X forwarding (though the display may be slow off campus):

ssh -Y login1.acf.ku.edu
qsub -I -X
source /tools/cluster/6.2/intel/vtune_amplifier_xe_2013/vtune_amplifier_xe/amplxe-vars.sh
amplxe-gui

ITAC data is collected using mpirun -trace:

# On a compute node using qsub:
source /tools/cluster/6.2/intel/itac/8.1.0.024/bin/itacvars.sh ""
mpirun -trace <program>

This will produce a programname.stf file which can be analyzed in the GUI from a login node:

ssh -Y login1.acf.ku.edu
source /tools/cluster/6.2/intel/itac/8.1.0.024/bin/itacvars.sh ""
traceanalyzer programname.stf

Screenshots, highlights, and simple workflows can be found at the following pages:

To do full profiling on code generated with GCC, use the 'mpigcc' and 'mpigxx' wrappers included with Intel MPI; to create custom wrappers for other compilers, see also the guide on the Intel MPI Binding Kit.

Please forward any corrections or requests to the help team.

Debugging

Hardware Resources

The state of the ACF Cluster computing resources.


Current Cluster Status:


Helpful Commands:

  • Queue related commands: rqs, qstat, showstart, showq
  • Job related commands: checkjob, qstat
  • Node related commands: freecores, freenodes, pbsnodes, myres
  • Storage related commands: myquota