This is a short tutorial to help set up a computing account on the University of Helsinki's computing cluster. The guide is only relevant for UH students and staff.

As a very first step, you need to ask your group leader to add you to the cluster user group. After that is done, you can follow these steps to log in and install runko.

Preliminary steps

First, you need to log in to the turso cluster at least once to initialize your home directory structure:

ssh -YA username@turso.cs.helsinki.fi

where username needs to be replaced by your university account name. The password is your standard university password. The connection only works from within the university's eduroam network, i.e., you have to be physically on campus. If you are greeted with the turso terminal, you can continue to the next step.

If you want to connect from outside the university, you need to first jump through a gateway host, e.g., melkinpaasi.cs.helsinki.fi. Tips for how to automate this are given below.
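For a one-off connection, the jump can also be done manually with ssh's -J (ProxyJump) option, for example

ssh -J username@melkinpaasi.cs.helsinki.fi username@turso.cs.helsinki.fi

again replacing username with your own account name.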

SSH connection

SSH keys for easier login

Next, you need to authorize your machine by adding your SSH public key to the gateway hosts. This step needs to be done only once per machine that you will use to log in (or jump to other machines).

First, generate an SSH key pair (if you do not already have one; a quick check is shown after the list below). In your own machine's home directory (i.e., ~/), run

mkdir -p .ssh
ssh-keygen -t rsa

and press enter to accept the default suggested directory and to leave the passphrase empty.

The command generates

  • .ssh/id_rsa: your private SSH key
  • .ssh/id_rsa.pub: your public SSH key (for sharing)
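If you are unsure whether a key pair already exists, you can list the public keys on your machine with

ls ~/.ssh/*.pub

and skip the key generation if one is already there.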

In order to whitelist your computer, you need to copy the id_rsa.pub key to the host machine's list of authorized keys and add the key to your SSH agent. In practice,

ssh-add
ssh-copy-id USER@melkinpaasi.cs.helsinki.fi
ssh-copy-id USER@turso.cs.helsinki.fi
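You can check which keys the agent currently holds with

ssh-add -l

and, once the keys have been copied, logging in to the hosts above should no longer ask for a password.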

SSH shortcut to your .ssh/config

One final touch is to configure your own SSH connections to include hile as a known host. The following step needs to be done only once per machine that you will use to log in to hile.

Append to your own machine's ~/.ssh/config (or create the directory and file if they do not exist):

Host turso
    HostName turso.cs.helsinki.fi
    User username
    IdentityFile ~/.ssh/id_rsa
    ProxyJump username@melkinpaasi.cs.helsinki.fi
Host hile
    HostName hile01.it.helsinki.fi
    User username
    IdentityFile ~/.ssh/id_rsa
    ProxyJump username@melkki.cs.helsinki.fi

and replace username with your university account name (note that it appears in 4 places here). Note that the indentation is made with tabs (not spaces).

After this, you should be able to connect to hile from your own machine with

ssh hile
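The same shortcut also works for file transfers; for example, to copy a script (here myscript.py is just a placeholder name) from your own machine to your kappa work directory:

scp myscript.py hile:/wrk-kappa/users/username/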

Runko installation

Modules

Next, we will automate the loading of the necessary HPC modules on hile. SSH to hile and then add to your ~/.bashrc:

## hile modules
module purge

# Cray
module load PrgEnv-cray
module load craype-x86-milan # or rome
module load cce
module load craype

# OFI
module load cray-mpich
module load craype-network-ofi
module load libfabric

# shared memory support
module load cray-pmi
module load cray-dsmml
module load cray-openshmemx

# other tools
module load cray-hdf5
module load cray-python
module load cray-libsci
module load perftools-base

export CXX=CC
export CC=cc

# activate the runko python environment (created in the Virtual Python environment section below)
source /wrk-kappa/users/$USER/venvs/runko-cray/bin/activate

#--------------------------------------------------
export RUNKODIR=/wrk-kappa/users/$USER/runko

PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}$RUNKODIR/"
PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}$RUNKODIR/lib"
PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}$RUNKODIR/external/corgi/lib"
export PYTHONPATH

This loads the correct Cray HPC modules, activates the Python virtual environment (created in the next step), and sets the runko paths every time you log in to the cluster.
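After your next login, you can verify that the setup was picked up by listing the loaded modules and checking the runko path:

module list
echo $RUNKODIR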

Virtual Python environment

Next, we need to initialize our own Python virtual environment. You need to have the correct modules (defined above) loaded, so if you have not done so yet, log out and log back in to load the Cray HPC development environment.

First, we need to create a directory for the virtual Python environment (the same path that the .bashrc above sources) with

cd /wrk-kappa/users/$USER
mkdir venvs
cd venvs
python3 -m venv runko-cray

Then, activate the environment with

source /wrk-kappa/users/$USER/venvs/runko-cray/bin/activate

after which you should see the terminal prompt change to

(runko-cray) username@hile:~$ 

or similar. Note that our .bashrc already has the activation command in it, so in the future, when you log back in to hile, the correct Python environment should be loaded automatically.
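You can also double-check that the active python3 comes from the virtual environment with

which python3

which should print a path inside venvs/runko-cray.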

Then, we can install the Python requirements (stored in the venv and available automatically whenever it is activated) with

pip3 install mpi4py --force-reinstall --no-binary mpi4py 
pip3 install h5py scipy matplotlib numpy

The mpi4py package needs to be installed separately from source (--no-binary) because the version shipped with cray-python is configured incorrectly (its modules are compiled with GCC, not the Cray CC wrapper).
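As a quick sanity check (not part of the original build steps), you can confirm which MPI library mpi4py was linked against with

python3 -c "from mpi4py import MPI; print(MPI.Get_library_version())"

the output should mention the Cray MPICH library.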

The computing environment is now ready for compilation and regular use.

Compiling runko

All simulation files that need to be accessed by the compute nodes have to reside in the kappa disk space. Therefore, we will also install runko and keep all of your scripts there. First, move to the kappa work disk space and clone the runko repository:

cd /wrk-kappa/users/$USER
git clone --recursive https://github.com/natj/runko.git
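If the --recursive flag was forgotten, the submodules can still be fetched afterwards with

git -C runko submodule update --init --recursive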

It is also recommended to modify runko/CMakeLists.txt and activate hile-specific compiler flags by adding (around line 30):

set(CMAKE_CXX_FLAGS_RELEASE "-Ofast -flto -ffp=4 -march=znver3 -mtune=znver3 -fopenmp -fsave-loopmark") # cray compiler flags

We can now compile the code with

cd runko
mkdir build
cd build
CC=cc CXX=CC cmake -DPython_EXECUTABLE=$(which python3) -DCMAKE_BUILD_TYPE=Release ..
make -j4

After this, you should see the compilation take place and the tests being run. Note that CMake will not find the correct Cray compilers unless they are provided via the CC=cc CXX=CC prefix before the cmake call.

Runko and SLURM usage

Submitting an example job

The code can be run by, e.g., submitting an example SLURM job in the shock project directory

cd $RUNKODIR
cd projects/pic-shocks
cd jobs

and submitting the example job

sbatch 1dsig3.hile

with the content of 1dsig3.hile being something like

#!/bin/bash
#SBATCH -J 1ds3
#SBATCH -p cpu
#SBATCH --output=%J.out
#SBATCH --error=%J.err
#SBATCH -c 1                   # cores per task
#SBATCH --ntasks-per-node=32  # 128 for amd epyc romes
#SBATCH -t 00-03:00:00         # max run time
#SBATCH --nodes=1              # nodes reserved
#SBATCH --mem-per-cpu=7G       # max 7G/128 cores
#SBATCH --distribution=block:block

# SBATCH --exclude=  # exclude some nodes
# SBATCH --nodelist= # white list some nodes

# HILE-C node list
# x3000c0s14b1n0,x3000c0s14b2n0,x3000c0s14b3n0,x3000c0s14b4n0,x3000c0s16b1n0,x3000c0s16b2n0,x3000c0s16b3n0,x3000c0s16b4n0,x3000c0s18b1n0,x3000c0s18b2n0,x3000c0s18b3n0,x3000c0s18b4n0

# specific environment variable settings
export OMP_NUM_THREADS=1
export PYTHONDONTWRITEBYTECODE=true
export HDF5_USE_FILE_LOCKING=FALSE

# Cray optimizations
export MPICH_OFI_STARTUP_CONNECT=1  # create mpi rank connections in the beginning, not on the fly
export FI_CXI_DEFAULT_TX_SIZE=16384 # 4096 # increase max MPI msgs per rank
# export FI_CXI_RDZV_THRESHOLD=16384 # same but for slingshot <2.1
export FI_CXI_RX_MATCH_MODE=hybrid # in case the hardware storage overflows, fall back to software memory

# export FI_OFI_RXM_SAR_LIMIT=524288 # mpi small/eager msg limit in bytes
# export FI_OFI_RXM_BUFFER_SIZE=131072 # mpi msg buffer of 128KiB

# go to working directory
cd $RUNKODIR/projects/pic-shocks/

srun --mpi=cray_shasta python3 pic.py --conf 1dsig3.ini   # Cray

This runs a job on hile in the cpu queue (-p cpu) on one node (--nodes=1) with 32 tasks (--ntasks-per-node=32), each using one core (-c 1).
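Because the script writes output to %J.out (the job ID), you can follow a running job with, e.g.,

tail -f 123456.out

where 123456 stands for the job ID printed by sbatch.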

Basic SLURM commands

You can check the status of the SLURM queue with

squeue
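or list only your own queued and running jobs with

squeue -u $USER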

and the accounting information of your own jobs with

sacct

Sometimes you might also need information about the available partitions, which can be accessed with

sinfo -M all
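Finally, a queued or running job can be cancelled with

scancel JOBID

where JOBID is the job ID shown by squeue.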
