This is a short tutorial to help you set up a computing account on the University of Helsinki's computing cluster. The guide is only relevant for UH students and staff.

As a very first step, you need to ask your group leader to add you to the cluster user group. After that is done, you can follow these steps to log in and install runko.

Preliminary steps

First, you need to log in to the turso cluster at least once to initialize your home directory structure:

ssh -YA username@turso.cs.helsinki.fi

where username needs to be replaced by your university account name. The password is your standard university password. The connection only works from within the university's eduroam network, i.e., you have to be physically on campus. If you are greeted with the turso terminal, you can continue to the next step.

If you want to connect from outside the university, you first need to jump through a gateway host, e.g., melkinpaasi.cs.helsinki.fi. Tips for how to automate this are given below.
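
For example, a one-off connection through the gateway can be made with ssh's -J (jump host) option:

# jump through melkinpaasi to reach turso in a single command
ssh -J username@melkinpaasi.cs.helsinki.fi username@turso.cs.helsinki.fi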

SSH connection

SSH keys for easier login

Next, you need to register your machine by adding its SSH public key to the gateway hosts. This step needs to be done only once per machine that you will use to log in (or to jump to other machines).

First, generate a public SSH key (if you don't already have one) in your own machine's home directory (i.e., ~/) with

mkdir -p ~/.ssh
ssh-keygen -t rsa

and press Enter to accept the default file location, and again to leave the passphrase empty.

The command generates

  • .ssh/id_rsa — the private SSH key
  • .ssh/id_rsa.pub — the public SSH key (for sharing)
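
You can check that the key pair exists and inspect its fingerprint with

# print the public key that will be shared with the gateway hosts
cat ~/.ssh/id_rsa.pub

# show the key fingerprint
ssh-keygen -lf ~/.ssh/id_rsa.pub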

In order to whitelist your computer, you need to copy the id_rsa.pub key to the host machines' authorized keys and add the key to your SSH agent. In practice, run

ssh-add
ssh-copy-id USER@melkinpaasi.cs.helsinki.fi
ssh-copy-id USER@turso.cs.helsinki.fi
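
If the keys were copied successfully, logging in to the gateway should no longer prompt for a password. You can test this with, e.g.,

# should open (and immediately close) a session without asking for a password
ssh USER@melkinpaasi.cs.helsinki.fi exit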

SSH shortcut to your .ssh/config

One final touch is to configure your own SSH connections to include hile as a known host. The following step needs to be done only once per machine that you will use to log in to hile.

Append the following to your own machine's ~/.ssh/config (or create the directory and file if they do not exist):

Host turso
    HostName turso.cs.helsinki.fi
    User username
    IdentityFile ~/.ssh/id_rsa
    ProxyJump username@melkinpaasi.cs.helsinki.fi
Host hile
    HostName hile01.it.helsinki.fi
    User username
    IdentityFile ~/.ssh/id_rsa
    ProxyJump username@melkki.cs.helsinki.fi

and replace username with your university account name (note that it appears in four places here). The indentation is only for readability; both tabs and spaces work.

After this, you should be able to connect to hile from your own machine with

ssh hile
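
The same alias also works for file transfers. For example (the file names are illustrative; replace username with your account name):

# copy a single file to your vakka work directory on hile
scp input.ini hile:/wrk-vakka/users/username/

# or synchronize a whole directory
rsync -av myproject/ hile:/wrk-vakka/users/username/myproject/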

Runko installation

First, we need to clone the runko repository to your local hile storage space. All simulation files that need to be accessed by the compute nodes have to reside on the vakka disk space, so we will install the code and all of your scripts there as well.

Move to the vakka work disk space and clone the runko repository:

cd /wrk-vakka/users/$USER
git clone --recursive git@github.com:hel-astro-lab/runko.git
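
Note that the SSH URL above assumes you have a GitHub SSH key set up on hile; if you do not, cloning over HTTPS should work as well:

git clone --recursive https://github.com/hel-astro-lab/runko.git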

Modules

Next, we will automate the loading of the necessary HPC modules on hile. The runko repository provides a ready-made module setup file that we can link to the home directory.

ln -s /wrk-vakka/users/$USER/runko/archs/modules ~/modules

After this, we can load the required modules with

module use ~/modules
module --ignore_cache avail
module load runko

The module also sets all the compiler and Python paths needed by the code.
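
You can verify the setup by listing the loaded modules and checking the environment (assuming the module exports RUNKODIR, which is used later in this guide):

# list the currently loaded modules
module list

# should point to your runko checkout under /wrk-vakka
echo $RUNKODIR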

Virtual Python environment

Next, we need to initialize our own python virtual environment. You need to have the correct modules (defined above) loaded.

First, we need to create a directory for the virtual environment with

cd /wrk-vakka/users/$USER
mkdir venvs
cd venvs
python3 -m venv runko-cray

Then, activate the environment with

source runko-cray/bin/activate

after which you should see the terminal prompt change to

(runko-cray) username@hile:~$ 

Note that the runko module file activates this Python environment automatically the next time you load it.
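
You can confirm that the virtual environment is active by checking which Python interpreter is in use:

# should point to the interpreter inside the venv
which python3
# e.g. /wrk-vakka/users/username/venvs/runko-cray/bin/python3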

Then, we can install the Python requirements (they are stored in the venv and become available automatically whenever we log in and activate it) with

MPICC="cc -shared" pip3 install --no-cache-dir --no-binary=mpi4py mpi4py
pip3 install h5py scipy matplotlib numpy

mpi4py needs to be installed separately because the MPI support in cray-python is configured incorrectly (its modules are compiled with GCC, not the Cray CC compiler wrapper).
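
A quick way to verify that mpi4py was built against the Cray MPI is to import it and print the MPI library version:

python3 -c "from mpi4py import MPI; print(MPI.Get_library_version())"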

The computing environment is now ready for compilation and regular use. It can be loaded at every login by issuing module load runko.
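
If you prefer to have this done automatically, one option is to add the module commands to your shell startup file on hile (a sketch, assuming a bash login shell):

# append to ~/.bashrc so the runko environment is loaded at every login
module use ~/modules
module load runko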

Compiling runko

Next, we can compile runko using our new modules. First, move to the runko directory on the vakka work disk space

cd $RUNKODIR

It is also recommended to modify runko/CMakeLists.txt and activate hile-specific compiler flags by adding (around line 30):

set(CMAKE_CXX_FLAGS_RELEASE "-Ofast -flto -ffp=4 -march=znver3 -mtune=znver3 -fopenmp -fsave-loopmark") 

Runko installation is now possible. We can compile the code with

mkdir build
cd build
CC=cc CXX=CC cmake -DPython_EXECUTABLE=$(which python3) -DCMAKE_BUILD_TYPE=Release ..
make -j4

After this, you should see the compilation take place and the tests being run. Note that CMake will not find the correct Cray compilers unless they are provided via the CC=cc CXX=CC prefix before the cmake call.
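
As a quick smoke test, you can try importing the compiled Python bindings (here the module name pyrunko is an assumption; check the repository documentation if the import fails):

# smoke test: pyrunko is assumed to be the name of the compiled bindings
python3 -c "import pyrunko; print('runko bindings found')"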

Runko and SLURM usage

Submitting an example job

The code can be run by, e.g., submitting an example SLURM job from the shock project directory

cd $RUNKODIR
cd projects/pic-shocks
cd jobs

and submitting the example job

sbatch 1dsig3.hile

with the content of 1dsig3.hile being something like

#!/bin/bash
#SBATCH -J 1ds3
#SBATCH -C c
#SBATCH --output=%J.out
#SBATCH --error=%J.err
#SBATCH -c 1                   # cores per task
#SBATCH --ntasks-per-node=32  # 128 for AMD EPYC Rome nodes
#SBATCH -t 00-03:00:00         # max run time
#SBATCH --nodes=1              # nodes reserved
#SBATCH --mem-per-cpu=7G       # max 7G/128 cores
#SBATCH --distribution=block:block

# SBATCH --exclude=  # exclude some nodes
# SBATCH --nodelist= # whitelist some nodes

module load libfabric

# HILE-C node list
# x3000c0s14b1n0,x3000c0s14b2n0,x3000c0s14b3n0,x3000c0s14b4n0,x3000c0s16b1n0,x3000c0s16b2n0,x3000c0s16b3n0,x3000c0s16b4n0,x3000c0s18b1n0,x3000c0s18b2n0,x3000c0s18b3n0,x3000c0s18b4n0

# specific environment variable settings
export OMP_NUM_THREADS=1
export PYTHONDONTWRITEBYTECODE=true
export HDF5_USE_FILE_LOCKING=FALSE

# Cray optimizations
export MPICH_OFI_STARTUP_CONNECT=1  # create mpi rank connections in the beginning, not on the fly
export FI_CXI_DEFAULT_TX_SIZE=16384 # 4096 # increase max MPI msgs per rank
# export FI_CXI_RDZV_THRESHOLD=16384 # same but for slingshot <2.1
export FI_CXI_RX_MATCH_MODE=hybrid # if the hardware storage overflows, fall back to software memory

# export FI_OFI_RXM_SAR_LIMIT=524288 # mpi small/eager msg limit in bytes
# export FI_OFI_RXM_BUFFER_SIZE=131072 # mpi msg buffer of 128KiB

# go to working directory
cd $RUNKODIR/projects/pic-shocks/

srun --mpi=cray_shasta python3 pic.py --conf 1dsig3.ini   # Cray

This runs a job on hile in the CPU queue (-C c) on one node (--nodes=1) with 32 MPI tasks (--ntasks-per-node=32), each using one core (-c 1).

Note that the job script needs to include module load libfabric, which replaces the default libfabric module with the newer version available on the compute nodes.
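
Since the script writes its output to %J.out and %J.err, you can look up your job ID and follow the output while the job runs, e.g.,

# list your own jobs and their IDs
squeue -u $USER

# follow the standard output of a job (123456 is a placeholder job ID)
tail -f 123456.out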

Basic SLURM commands

You can check the status of the SLURM queue with

squeue

and status of your own jobs with

sacct
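
By default sacct prints a compact summary; a more informative listing can be requested with standard format fields, e.g.,

# show ID, name, state, elapsed time and exit code of your recent jobs
sacct -u $USER --format=JobID,JobName%20,State,Elapsed,ExitCode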

Sometimes you might also need information about the available partitions, which can be accessed with

sinfo -M all
