TSD is connected to a large computing cluster running a SLURM grid. Information about submitting jobs to the cluster can be found on the UiO TSD webpages. Here we will outline some suggestions and recommendation we have on setting up your Rscripts for running on the grid, based on the experiences we have made.
When you submit a job to the grid, alot of the setup you have in your terminal will not be inherted by the job. This is good, because then you can make sure that your script is reproducible. At the core, you will need two files to submit a job:
When loading libraries in an RScript that will be used on the
cluster, we recommend explicitly static the folder where the
installed package can be found. For most users, this will be in
your home ($HOME
) folder, where the cluster actually cannot
access it. The cluster nodes need libraries and software (if not using
modules) to be stored on the /cluster
disk. We recommend
that projects create a central repository for project members to use and
access when running analyses, or if specific package versions need to be
used, to make a system where software and libraries can be stored.
# Create folders to place software and libraries
mkdir /cluster/projects/pXXX/tools/
# Create folder for r libraries
mkdir -p /cluster/projects/pXXX/tools/r_libs/common
In the common
folder place installed packages that users
regularly use and share. Let users create specific own folder within
r_libs
if there are specific versions of packages they need
to use.
Installing packages into specific folders can be done by using the
lib
argument to `install.packages()``
By doing this, library calls within the Rscript you want to submit to
the cluster should be called using the lib.loc
argument,
giving the folder of the installed package.
Make sure your analysis outputs files with your analysis results, as
tables (write.table
) or plain text files
(writeLines
) or any other output you want to work with
later.
We recommend having your Rscript contain alot of either
cat()
or print()
messages, so that you can
catch these for debugging if (when) things fails. We also recommend
adding a cat(Sys.date())
command at the beginning and end
(and maybe even intermediate places in the script), so your output log
can check the amount of time different operations take. Some times
things stall, or get stuck for some reason, and a timestamp might help
you figure out where that is happening.
We will not cover the settings of the slurm script, these really depend on the analyses you are doing, and as such you should follow the USIT guide for settings.
Once you have some base settings for your SLURM script in place, you will need to load R as a module before calling your RScript. We recommend always being explicit i which R-version you want to use (modules load a default version, but for reproducibility you should always specify a version, even if it is the default one).
#!/bin/bash
#SBATCH --account=pXXX
#SBATCH --time=1:00:00
#SBATCH --mem-per-cpu=1G
# Set up job environment:
module purge # clear any inherited modules
module load R/4.0.2
# Run Rscript in a clean R instance, output a logfile
Rscript --vanilla --verbose /cluster/projects/pxxx/path/to/myScript.R > slurm-${SLURM_JOBID}.Rout 2>&1
# append logfile to this scripts logfile
cat slurm-${SLURM_JOBID}.Rout >> slurm-${SLURM_JOBID}.out
# remove Rout log
rm slurm-${SLURM_JOBID}.Rout
This will run your Rscript in a vanilla instance of R, meaning it sets up no environment, profile or the like, it opens R completely clean. This is recommended to keep your script reproducible. It is set to be “verbose” as this means R will be extra vocal when running commands, and we capture that information in a logfile. This makes it easier to understand why your scripts fail when they do.
Once you have your slurm script ready, and your Rscript ready, you can submit your script to the cluster.
Then start debugging what is going wrong. Getting scripts running correctly on TSD can some times require some extra work, but read the slurm logs, and keep fixing the issues as they occur. Submitting a new script rarely works on the first try, we are human after all.
Good luck!