FitzRoy to Kupe migration
You will learn how to migrate from FitzRoy to Kupe. This guide covers:
- Important information about using the filesystems on Kupe
- Translating your LoadLeveler job scheduler scripts to SLURM
- Setting up Cylc tasks for SLURM
Important information about using the filesystems on Kupe
Your files on FitzRoy in /home, /data and /working are being synchronized with the new filesystems on Kupe as follows:
Path on FitzRoy | Path on Kupe |
---|---|
/hpcf/home/ | /nesi/home/ |
/hpcf/working/ | /nesi/nobackup/ |
/hpcf/data/ | /nesi/nearline/ |
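Scripts that contain hard-coded FitzRoy paths will need updating for Kupe. The function below is a minimal sketch of the mapping in this table; the name fitzroy_to_kupe is hypothetical, and it assumes paths are written in full starting with /hpcf/.

```shell
# Hypothetical helper: print the Kupe location of a FitzRoy path, following
# the FitzRoy-to-Kupe mapping table. Paths outside the mapped trees are
# printed unchanged.
fitzroy_to_kupe() {
  local path=$1
  case "$path" in
    /hpcf/home/*)    printf '%s\n' "/nesi/home/${path#/hpcf/home/}" ;;
    /hpcf/working/*) printf '%s\n' "/nesi/nobackup/${path#/hpcf/working/}" ;;
    /hpcf/data/*)    printf '%s\n' "/nesi/nearline/${path#/hpcf/data/}" ;;
    *)               printf '%s\n' "$path" ;;
  esac
}
```

For example, fitzroy_to_kupe /hpcf/working/<user name>/run1 prints /nesi/nobackup/<user name>/run1.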
On Kupe, there will also be a new filesystem, /nesi/projects.
This synchronization runs on a regular basis, so that the contents of these directories on Kupe remain identical to those on FitzRoy.
- This means that any file you create or modify in the Kupe filesystems listed above will be removed or overwritten at the next synchronization.
Accordingly, we have created a new filesystem on Kupe, /nesi/transit, that is not synchronized with FitzRoy. You can work safely in this filesystem without risk of losing data.
- So, after logging in to Kupe, you should run cd /nesi/transit/<user name> and work there.
On the date (to be determined) that Kupe becomes the primary HPC, synchronization will cease. You can then resume working from your /home, /project and /nobackup directories, and should move any critical data from /nesi/transit to its final location.
For operational users, the filesystem synchronization (once initiated) will map paths as follows; additional symbolic links may be provided to make these paths more direct.
Path on FitzRoy | Path on Kupe |
---|---|
/oper | /niwa/oper |
/test/ecoconnect | /niwa/devoper/test |
/devel/ecoconnect | /niwa/devoper/devel |
/oper/archive | /niwa/devoper/archive |
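To update operational scripts that refer to these paths, a sed filter along the following lines could be used. This is a sketch only: map_oper_paths is a hypothetical name, and the rules assume paths appear in full as in the table. The order matters, because /oper/archive maps to a different location and must be rewritten before the shorter /oper prefix.

```shell
# Hypothetical filter: rewrite FitzRoy operational paths read on stdin into
# their Kupe equivalents on stdout. A path is rewritten only when it is
# followed by "/", whitespace, a double quote or the end of the line; the
# text before the path is not checked, so review the output.
# Not idempotent: run it once, on the original FitzRoy scripts.
map_oper_paths() {
  sed -E \
    -e 's#/oper/archive(/|[[:space:]"]|$)#/niwa/devoper/archive\1#g' \
    -e 's#/test/ecoconnect(/|[[:space:]"]|$)#/niwa/devoper/test\1#g' \
    -e 's#/devel/ecoconnect(/|[[:space:]"]|$)#/niwa/devoper/devel\1#g' \
    -e 's#/oper(/|[[:space:]"]|$)#/niwa/oper\1#g'
}
```

For example, map_oper_paths < old_script > new_script writes a converted copy of old_script (both file names are placeholders).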
Translating your LoadLeveler job scheduler scripts to SLURM
As Kupe will use SLURM to manage submitted jobs, you will need to rewrite your scripts to make them compatible with the new job scheduler. SLURM syntax is a bit more restrictive than LoadLeveler syntax:
- There must be no space between # and SBATCH in SLURM directives (#SBATCH, not # SBATCH)
- SLURM directives must appear at the top of the submission script, before any executable command; directives after the first command are ignored
- There must be no whitespace around equals signs (e.g., --job-name = myjob will not work; write --job-name=myjob)
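These rules are easy to break when converting scripts by hand. The awk-based check below is a hypothetical helper, not a SLURM tool, that flags the three problems in a submission script. It is a heuristic sketch: a quoted value containing " = " would also be reported, and a deliberately disabled "# SBATCH" line is flagged too.

```shell
# Hypothetical checker: report SBATCH directives that SLURM would misread or
# ignore, with line numbers. Returns non-zero if any problem is found.
check_sbatch() {
  awk '
    # Skip the "#!" line and blank lines.
    NR == 1 && /^#!/ { next }
    /^[ \t]*$/ { next }
    # Comment lines: check those that look like SBATCH directives.
    /^#/ {
      if ($0 ~ /^#[ \t]+SBATCH/) {
        printf "line %d: space between # and SBATCH\n", NR; bad = 1
      } else if ($0 ~ /^#SBATCH/) {
        if (seen_cmd) {
          printf "line %d: #SBATCH after first command (ignored)\n", NR; bad = 1
        }
        if ($0 ~ /[ \t]=|=[ \t]/) {
          printf "line %d: whitespace around =\n", NR; bad = 1
        }
      }
      next
    }
    # Any other line is a command; directives after it are ignored.
    { seen_cmd = 1 }
    END { exit bad }
  ' "$1"
}
```

For example, check_sbatch myjob.sl (a placeholder file name) prints nothing and succeeds when the script follows all three rules.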
Important: the default stack size on Kupe's XC50 compute nodes is 8192 kB. Increase it in your submission script with the command ulimit -s <stack size>; otherwise your program may crash with segmentation faults. The maximum stack size can be requested with ulimit -s unlimited.
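In a submission script this could look like the sketch below, placed after the #SBATCH directives. Whether unlimited is accepted depends on the hard limit configured on the compute nodes, so the snippet assumes it may be refused and falls back to the hard limit in that case.

```shell
# Raise the stack size limit before launching the program.
# Try "unlimited" first; if the hard limit does not allow it, fall back to
# the largest permitted value (the hard limit).
if ! ulimit -s unlimited 2>/dev/null; then
  ulimit -s "$(ulimit -H -s)"
fi
echo "Stack size limit: $(ulimit -s)"
```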
Commands
LoadLeveler | Slurm |
---|---|
llsubmit | sbatch |
llcancel | scancel |
llq -u | squeue -u |
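When a workflow needs to cancel or monitor a specific job, it helps to capture the job ID at submission time. The wrapper below is a hypothetical helper built on sbatch's --parsable option, which makes sbatch print only the job ID (followed by ";cluster" on multi-cluster systems).

```shell
# Hypothetical wrapper around sbatch (the SLURM replacement for llsubmit).
# Prints only the job ID; any ";cluster" suffix from --parsable is removed.
submit_job() {
  local out
  out=$(sbatch --parsable "$@") || return 1
  printf '%s\n' "${out%%;*}"
}
```

For example, jobid=$(submit_job myjob.sl) submits the placeholder script myjob.sl; squeue -u $USER then lists your jobs, and scancel "$jobid" cancels this one.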
Script directives
LoadLeveler | Slurm |
---|---|
#@ job_name = | #SBATCH --job-name= |
#@ account_no = | #SBATCH --account= |
#@ wall_clock_limit = | #SBATCH --time= |
#@ output = | #SBATCH --output= |
#@ error = | #SBATCH --error= |
#@ class = | #SBATCH --partition= |
#@ resources = ConsumableMemory(<mem>gb) | #SBATCH --mem-per-cpu=<mem>G |
#@ nodes = | #SBATCH --nodes= |
#@ tasks_per_node = | #SBATCH --ntasks-per-node= |
#@ parallel_threads = | #SBATCH --cpus-per-task= |
#@ node_usage = not_shared | #SBATCH --exclusive |
#@ requirements = (Feature=="build_node_name") | #SBATCH --constraint=build_node_name |
#@ network.MPI = | #SBATCH --network= |
#@ job_type = parallel | NA |
#@ queue | NA |
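For scripts with many directives, the one-to-one rows of this table can be applied mechanically. The function below is a hypothetical sed-based sketch, not an official conversion tool. It converts the simple key = value directives, drops job_type and queue, and leaves everything else (including resources, requirements and network.MPI, whose values need rewriting) untouched for manual editing. Values are copied verbatim, so LoadLeveler variables inside values (for example in output file names) are not translated, and class names may need to be changed to partitions that exist on Kupe.

```shell
# Hypothetical converter for the one-to-one rows of the directives table.
# Reads a LoadLeveler script on stdin and writes it to stdout with those
# "#@ key = value" directives rewritten as "#SBATCH --option=value".
# "#@ job_type" and "#@ queue" lines are deleted (no SLURM equivalent);
# every other line, including other "#@" directives, is passed through.
ll2slurm() {
  # Prefix "#@" (also "# @") and the "=" separator, with optional spaces.
  local P='^#[[:space:]]*@[[:space:]]*'
  local E='[[:space:]]*=[[:space:]]*'
  sed -E \
    -e "s/${P}job_name${E}/#SBATCH --job-name=/" \
    -e "s/${P}account_no${E}/#SBATCH --account=/" \
    -e "s/${P}wall_clock_limit${E}/#SBATCH --time=/" \
    -e "s/${P}output${E}/#SBATCH --output=/" \
    -e "s/${P}error${E}/#SBATCH --error=/" \
    -e "s/${P}class${E}/#SBATCH --partition=/" \
    -e "s/${P}nodes${E}/#SBATCH --nodes=/" \
    -e "s/${P}tasks_per_node${E}/#SBATCH --ntasks-per-node=/" \
    -e "s/${P}parallel_threads${E}/#SBATCH --cpus-per-task=/" \
    -e "s/${P}node_usage${E}not_shared[[:space:]]*$/#SBATCH --exclusive/" \
    -e "/${P}(job_type|queue)([[:space:]=]|$)/d"
}
```

For example, ll2slurm < myjob.ll > myjob.sl (placeholder file names) writes a partly converted script; the remaining #@ lines then need to be translated by hand.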
Environment variables
LoadLeveler | Slurm |
---|---|
$LOADL_STEP_INITDIR | $SLURM_SUBMIT_DIR |
$LOADL_PROCESSOR_LIST | $SLURM_JOB_NODELIST |
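$SLURM_JOB_NODELIST holds the allocated nodes in SLURM's compressed hostlist notation (for example node[01-04]), so scripts that parse the node list word by word may need to expand it first. A minimal sketch using scontrol show hostnames; job_hosts is a hypothetical name:

```shell
# Hypothetical helper: print one hostname per line for the nodes allocated
# to the current job, expanding SLURM's compressed hostlist notation.
job_hosts() {
  if [ -z "${SLURM_JOB_NODELIST:-}" ]; then
    echo "SLURM_JOB_NODELIST is not set (not inside a SLURM job?)" >&2
    return 1
  fi
  scontrol show hostnames "$SLURM_JOB_NODELIST"
}
```

Where a LoadLeveler script changed into $LOADL_STEP_INITDIR, the SLURM equivalent is cd "$SLURM_SUBMIT_DIR"; note that sbatch already starts a job in the directory it was submitted from by default.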
mpiexec command
FitzRoy | Slurm |
---|---|
poe | srun |
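Putting these tables together, a translated submission script might look like the sketch below. The account code, resource requests and the program ./my_mpi_program are placeholders rather than recommended values; the partition general is the one used in this guide's Cylc example.

```shell
#!/bin/bash
# Minimal sketch of a SLURM submission script for Kupe. All #SBATCH lines
# come before the first command, with no space after "#" and no spaces
# around "=". Values and the program name are placeholders.
#SBATCH --job-name=myjob
#SBATCH --account=<project code>
#SBATCH --partition=general
#SBATCH --time=02:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=20
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=2G
#SBATCH --output=myjob.%j.out
#SBATCH --error=myjob.%j.err

# Raise the stack size limit (the compute-node default is 8192 kB).
ulimit -s unlimited

# One OpenMP thread per CPU allocated to each task; SLURM sets
# SLURM_CPUS_PER_TASK when --cpus-per-task is given.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

# srun replaces poe as the parallel launcher.
srun ./my_mpi_program
```

Here %j in the output and error file names is replaced by the job ID.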
Setting up Cylc tasks for SLURM
Here is an example of a Cylc task family that can be used for submitting tasks to the SLURM scheduler.
```
[[XC50_SLURM]]
    pre-script = "ulimit -s unlimited"
    [[[job]]]
        batch system = slurm
    [[[directives]]]
        --partition = general
        --job-name = mytestjob
        --time = 02:00:00
        --mem-per-cpu = 4G
        --nodes = 4
        --ntasks = 80
        --cpus-per-task = 2
    [[[environment]]]
        OMP_NUM_THREADS = 2  # Needs to be set in addition to --cpus-per-task
        OMP_STACKSIZE = 1g
```
Note that Cylc does not accept directives that take no value, such as --exclusive. These can be set using SLURM's environment variables instead.