You will learn how to migrate from FitzRoy to Kupe.

Important information about use of the filesystems on Kupe.

Your files on FitzRoy in /home, /data, and /working are being synchronized with the new filesystems on Kupe, mapped as follows:

Path on FitzRoy    Path on Kupe
/hpcf/home/        /nesi/home/
/hpcf/working/     /nesi/nobackup/
/hpcf/data/        /nesi/nearline/ (**no direct access!**)

On Kupe, there will also be a new filesystem, /nesi/projects.

This synchronization runs on a regular basis so that the contents of the directories on Kupe stay identical to those on FitzRoy.

  • This means that if you modify or create a file in any of the Kupe filesystems noted above, your changes will be overwritten or removed at the next synchronization.

Accordingly, we have created a new filesystem (/nesi/transit) on Kupe that is not synchronized with FitzRoy. You can safely work in that filesystem without risk of loss of data.

  • So, after logging into Kupe you should cd /nesi/transit/<user name> and work there.

On the date (TBD) that Kupe is deemed to be the primary HPC, synchronization will cease. You may then resume working from your /home (/project) and /nobackup directories, and move any critical information from /nesi/transit to its final location.

For operational users, the filesystem synchronization (when initiated) will map paths as follows. Additional symbolic links may be provided to make these paths more direct.

Path on FitzRoy      Path on Kupe
/oper                /niwa/oper
/test/ecoconnect     /niwa/devoper/test
/devel/ecoconnect    /niwa/devoper/devel
/oper/archive        /niwa/devoper/archive

Translating your LoadLeveler job scheduler scripts to SLURM

As Kupe will use SLURM to manage submitted jobs, you will need to rewrite your scripts to make them compatible with the new job scheduler. SLURM syntax is a bit more restrictive than LoadLeveler syntax:

  • For the SLURM directives there must not be any space between # and SBATCH
  • SLURM directives must appear at the beginning of a submission script, otherwise they will be ignored
  • There must not be white space around equal signs (e.g., --job-name = myjob will not work)
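The three rules above can be checked mechanically before a script is submitted. The following is only a rough sketch: check_sbatch_syntax is a hypothetical helper written for this guide, not a Slurm tool, and its pattern matching is deliberately simple (for example, it would also flag an equals sign with spaces inside a quoted value).

```shell
#!/bin/bash
# check_sbatch_syntax FILE: hypothetical helper (not part of Slurm) that flags
# the three pitfalls listed above. Returns 0 if none are found, 1 otherwise.
check_sbatch_syntax() {
    local script="$1" status=0

    # Rule 1: no space is allowed between '#' and 'SBATCH'.
    if grep -nE '^#[[:space:]]+SBATCH' "$script"; then
        echo "error: space between # and SBATCH" >&2
        status=1
    fi

    # Rule 2: directives that follow the first command are ignored.
    if ! awk '/^#SBATCH/ && seen { print NR ": " $0; bad = 1 }
              !/^[[:space:]]*#/ && NF { seen = 1 }
              END { exit bad + 0 }' "$script"; then
        echo "error: #SBATCH directive after the first command" >&2
        status=1
    fi

    # Rule 3: no white space is allowed around '='.
    if grep -nE '^#SBATCH.*([[:space:]]=|=[[:space:]])' "$script"; then
        echo "error: white space around = in a directive" >&2
        status=1
    fi

    return "$status"
}
```

Calling check_sbatch_syntax myjob.sl (myjob.sl being a placeholder file name) prints each offending line with its line number and returns a non-zero status if any rule is violated.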

Important: Kupe uses a default stacksize of 8192 kB on the XC50 compute nodes. Increase the stacksize in your submission script with the command ulimit -s <stack size>; otherwise your program may crash with segmentation faults. To request the maximum stacksize, use ulimit -s unlimited.
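As an illustration, a submission script might raise the limit right after the directives and before the program starts. This is a minimal sketch: the job name, time limit, task count, and the executable my_program are placeholders, not values taken from Kupe.

```shell
#!/bin/bash
# Job name, time limit, and task count below are placeholder values.
#SBATCH --job-name=stacktest
#SBATCH --time=00:05:00
#SBATCH --ntasks=1

# Raise the soft stack limit for this shell and the processes it starts.
# The call fails if the hard limit is lower, so report that case.
if ! ulimit -s unlimited 2>/dev/null; then
    echo "warning: could not raise stack size, still $(ulimit -s)" >&2
fi
echo "Stack size limit: $(ulimit -s)"

# Launch the program afterwards, for example (hypothetical executable):
# srun ./my_program
```

Printing the limit into the job output makes it easy to confirm afterwards which value was actually in effect.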

Commands

LoadLeveler    Slurm
llsubmit       sbatch
llcancel       scancel
llq -u         squeue -u

Script directives

LoadLeveler                                     Slurm
#@ job_name =                                   #SBATCH --job-name=
#@ account_no =                                 #SBATCH --account=
#@ wall_clock_limit =                           #SBATCH --time=
#@ output =                                     #SBATCH --output=
#@ error =                                      #SBATCH --error=
#@ class =                                      #SBATCH --partition=
#@ resources = ConsumableMemory(<mem>gb)        #SBATCH --mem-per-cpu=<mem>G
#@ nodes =                                      #SBATCH --nodes=
#@ tasks_per_node =                             #SBATCH --ntasks-per-node=
#@ parallel_threads =                           #SBATCH --cpus-per-task=
#@ node_usage = not_shared                      #SBATCH --exclusive
#@ requirements = (Feature=="build_node_name")  #SBATCH --constraint=build_node_name
#@ network.MPI =                                #SBATCH --network=
#@ job_type = parallel                          NA
#@ queue                                        NA
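For scripts that use only the simple one-to-one directives in this table, a first pass of the translation can be automated. The sketch below is an assumption-laden starting point, not a supported tool: ll2slurm is a hypothetical name, it covers only a subset of the rows above, and it copies values unchanged, so class names, LoadLeveler variables inside file names, and memory or feature requirements still need manual editing.

```shell
#!/bin/bash
# ll2slurm: hypothetical helper that reads a LoadLeveler script on stdin and
# writes it to stdout with simple directives rewritten for Slurm.
ll2slurm() {
    local p='^#[[:space:]]*@[[:space:]]*'   # the "#@" prefix
    local eq='[[:space:]]*=[[:space:]]*'    # "=" with optional white space
    sed -E \
        -e "s/${p}job_name${eq}/#SBATCH --job-name=/" \
        -e "s/${p}account_no${eq}/#SBATCH --account=/" \
        -e "s/${p}wall_clock_limit${eq}/#SBATCH --time=/" \
        -e "s/${p}output${eq}/#SBATCH --output=/" \
        -e "s/${p}error${eq}/#SBATCH --error=/" \
        -e "s/${p}class${eq}/#SBATCH --partition=/" \
        -e "s/${p}nodes${eq}/#SBATCH --nodes=/" \
        -e "s/${p}tasks_per_node${eq}/#SBATCH --ntasks-per-node=/" \
        -e "s/${p}parallel_threads${eq}/#SBATCH --cpus-per-task=/" \
        -e "s/${p}node_usage${eq}not_shared[[:space:]]*\$/#SBATCH --exclusive/" \
        -e "/${p}job_type${eq}parallel[[:space:]]*\$/d" \
        -e "/${p}queue[[:space:]]*\$/d"
}
```

A typical call would be ll2slurm < myjob.ll > myjob.sl, where both file names are placeholders. Lines that are not recognised pass through untouched, so any remaining #@ lines in the output show what is left to translate by hand.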

Environment variables

LoadLeveler              Slurm
$LOADL_STEP_INITDIR      $SLURM_SUBMIT_DIR
$LOADL_PROCESSOR_LIST    $SLURM_JOB_NODELIST
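A short sketch of how these variables might be used inside a job script follows. The fallback values are an addition for illustration only, so that the snippet also runs in an ordinary shell where Slurm has not set the variables.

```shell
#!/bin/bash
# Inside a Slurm job both variables are set by the scheduler; the fallbacks
# only exist so that this snippet also runs outside of a job.
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"
node_list="${SLURM_JOB_NODELIST:-none}"

# Change to the directory from which the job was submitted.
cd "$submit_dir" || exit 1
echo "Working in: $submit_dir"
echo "Allocated nodes: $node_list"
```

Note that $SLURM_JOB_NODELIST holds the node names in Slurm's compressed form; within a job, scontrol show hostnames "$SLURM_JOB_NODELIST" expands it to one host name per line.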

mpiexec command

FitzRoy    Slurm
poe        srun

Setting up Cylc tasks for SLURM

Here is an example of a Cylc task family that can be used for submitting tasks to the SLURM scheduler.

    [[XC50_SLURM]]
        pre-script = "ulimit -s unlimited"
        [[[job]]]
            batch system = slurm
        [[[directives]]]
            --partition = general
            --job-name = mytestjob
            --time = 02:00:00
            --mem-per-cpu = 4G
            --nodes = 4
            --ntasks = 80
            --cpus-per-task = 2
        [[[environment]]]
            OMP_NUM_THREADS = 2      # Needs to be set in addition to --cpus-per-task
            OMP_STACKSIZE = 1g

Note that Cylc does not accept directives without further parameters, such as --exclusive. These can be set using SLURM’s environment variables as shown in the example.