Getting started with supercomputing at CSC

You have signed up for your user account and first CSC project, and are now ready to scale up your computing! This page helps you get started with our HPC resources.

It is recommended that new users complete the CSC Computing Environment course, which provides an in-depth introduction to CSC services. Upcoming live sessions are listed in the CSC training calendar, and a self-learning version is also available. The course materials can be accessed without signing up and are very useful on their own.

For a more general introduction to HPC, we recommend the Elements of Supercomputing online course.

Need support?

Do not hesitate to contact the CSC Service Desk if you have any questions about using CSC services. We are happy to help!

Which system should I use?

Puhti

New users are recommended to start working on the Puhti supercomputer. Compared to Mahti, it has much more pre-installed software, more GPU nodes, and typically more available memory per CPU core. Additionally, GPU nodes and some CPU nodes on Puhti have fast local NVMe storage.

Mahti

If you know that your computations are highly parallelizable, you should consider running them on the Mahti supercomputer. Compared to Puhti, Mahti has many more CPU nodes and cores per node. Mahti is intended for computations that are able to effectively utilize at least an entire CPU node.

Additionally, while Mahti has fewer GPU nodes than Puhti, the A100 GPUs on Mahti are considerably more powerful than the V100 GPUs on Puhti, which also makes Mahti well suited for demanding machine learning applications. In contrast to Puhti, only the GPU nodes on Mahti have fast local NVMe storage available.

LUMI

The LUMI supercomputer is one of the fastest in the world. It is intended primarily for running computations that benefit from the large number of high-performance GPUs in its LUMI-G hardware partition. Whereas the GPUs on Puhti and Mahti are manufactured by Nvidia, the LUMI GPUs are made by AMD, so make sure that your GPU applications can run on AMD GPUs. LUMI has its own documentation pages.

On supercomputing

CSC supercomputers offer resources that, when properly used, go well beyond what even the most sophisticated consumer devices are capable of. However, you are not the only one using them. On your personal workstation, you have, in principle, immediate access to resources. On a supercomputer, which is a shared system, you typically have to queue for them, since demand tends to exceed supply. See our usage policy for more information.

It is also worth keeping in mind that running your computations on a supercomputer only improves performance if you play to its strengths. Supercomputers are powerful because they allow for parallel computing. If your code is not written to take advantage of multiple CPU cores, or one or more GPUs, there might be no benefit over running it on your own workstation. However, high memory and/or storage requirements, as well as the availability of pre-installed software and licenses, are other factors that may make using CSC supercomputers attractive for you.

How to access CSC supercomputers?

Web interface

Puhti, Mahti and LUMI each have their own web interface, which lets you interact with the supercomputer through a web browser. The web interface is a good choice for interactive computing, such as analyzing, exploring and visualizing data. For this purpose, the web interface features multiple interactive applications, like Visual Studio Code, Jupyter and RStudio. In addition, it provides a desktop environment featuring software with graphical user interfaces (GUIs), as well as an accelerated visualization app for GPU-accelerated visualization and rendering. For demanding computations, like running full-scale simulations or training neural networks, you should use the command-line interface, since it gives you access to more resources and lets you schedule your jobs.

Command-line interface

While many of the interactive applications in the web interface, such as Jupyter and RStudio, are easy to use and thus a good starting point for using CSC supercomputers, their computing capacity is limited to relatively low-resource interactive usage. If you need access to more resources (e.g. multiple CPU nodes or GPUs) or if your work calls for efficiency rather than interactivity, it is a good idea to switch to the text-based command-line interface (CLI) and interact directly with the supercomputer's Linux operating system. While this way of working may seem archaic, it is truly powerful once you get used to it.

The CLI allows you to submit your computations as batch jobs to the SLURM job scheduler, which runs them as soon as the requested resources become available. Importantly, the batch job system ensures that your jobs run on the compute nodes as opposed to the login nodes, which are not intended for heavy computing. Another benefit of batch jobs is that you do not need to stay at your workstation while your computations run. While setting up this automation can require some more planning on your part, in the long run it makes your work more efficient as well as more reproducible, both for yourself and for others, such as reviewers and collaborators.
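As a minimal sketch of what a batch job can look like, the script below requests resources on Puhti and runs a Python script on a compute node. The partition name small, the module python-data and the script my_analysis.py are illustrative assumptions, and <project> is a placeholder for your project name. The application pages give tested, software-specific scripts.

```bash
#!/bin/bash
# Resource request, read by SLURM before the job starts: job name, billing
# project (placeholder), partition, wall-clock limit, and one task with
# 4 CPU cores and 2 GB of memory per core.
#SBATCH --job-name=example
#SBATCH --account=<project>
#SBATCH --partition=small
#SBATCH --time=00:15:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G

# Everything below runs on a compute node, not on the login node.
# Load the software environment the job needs (module name is an example).
module load python-data

# The script itself must make use of the 4 requested cores (for example via
# multiprocessing). SLURM_CPUS_PER_TASK tells it how many cores it received.
srun python3 my_analysis.py
```

Save the script as, for example, job.sh and submit it with sbatch job.sh. You can follow its progress with squeue -u $USER and cancel it with scancel followed by the job ID that sbatch printed.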

You can access the command-line interface either by using the shell applications featured in the web interfaces or by using an SSH client on your own workstation.

Connecting with SSH

Please note that connecting to CSC supercomputers from the command line using an SSH client requires that you first set up SSH keys and add your public key to the MyCSC customer portal. Authenticating with SSH keys whose public keys are managed through MyCSC is much more secure than using traditional passwords or manually managed SSH keys.

Read the detailed instructions on setting up and using SSH keys.
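The following sketch shows roughly how these steps look with OpenSSH on Linux or macOS. It assumes an Ed25519 key in the default location, and CSC_USER is a placeholder for your own user name. The detailed instructions remain authoritative, for example regarding supported key types.

```bash
# Placeholder: replace with your CSC user name
CSC_USER="your_csc_username"

# 1. On your own workstation, generate a key pair.
#    You will be asked for a passphrase; choose a strong one.
ssh-keygen -t ed25519

# 2. Print the public key (never the private one) and add it in MyCSC.
cat ~/.ssh/id_ed25519.pub

# 3. Once the key has been added in MyCSC, connect to a supercomputer.
ssh "$CSC_USER@puhti.csc.fi"
```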

How to work with software and data?

Software

A variety of useful scientific computing software is available on CSC supercomputers. Puhti stands out in this regard, with over a hundred pre-installed programs. Our application pages include batch job script examples and guidelines for running the software efficiently on CSC supercomputers. We highly recommend using them as a starting point!

CSC supercomputers use environment modules for managing software environments. These modules cover everything from compilers and programming languages to workflow utilities like Nextflow and Snakemake. Running most of the installed software efficiently requires using the command-line interface, so it is extremely useful to have a working knowledge of the basics of the Linux operating system.
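A few basic module commands cover most day-to-day use. The module name python-data is only an example; use the one named on the relevant application page.

```bash
module list              # show the modules currently loaded
module avail             # show modules that can be loaded right now
module load python-data  # load a module (example name)
module list              # confirm that it is now loaded
module purge             # unload loaded modules (sticky system modules may remain)
```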

While the pre-installed software covers a wide variety of use cases, you can also install your own applications on CSC supercomputers. The process often differs from installing software on your own computer, so make sure to familiarize yourself with our installation instructions. For compiling HPC applications, various compilers, high-performance libraries and other utilities are available. Note that some installations, such as complex Python environments, benefit from containerization.
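As a rough sketch of a custom build, the commands below compile a small MPI program and place the binary in the persistent /projappl area. The module names gcc and openmpi are assumptions, since available compilers and MPI libraries differ between systems, so check module avail first. The file hello_mpi.c is hypothetical, and <project> is a placeholder.

```bash
PROJECT="<project>"        # placeholder: your CSC project name

# Load a compiler and an MPI library (assumed module names)
module load gcc openmpi

# Keep custom builds in the persistent project area
mkdir -p "/projappl/$PROJECT/bin"

# Compile a hypothetical MPI source file with the MPI compiler wrapper
mpicc -O2 -o "/projappl/$PROJECT/bin/hello_mpi" hello_mpi.c
```

The resulting program should then be run through a batch job rather than directly on a login node.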

You may also wish to develop your own scripts and programs instead of using existing software. It is most efficient to start writing and testing your code on your own device, since working on a shared system like a supercomputer inevitably introduces some overhead, for example queuing for resources. Only start running your scripts on a supercomputer once you are ready to test them on a larger scale or need specific resources like GPUs.

Checking availability

If you have a piece of scientific software in mind, it is quite probable that we have it installed on Puhti. Besides browsing Docs CSC, you can search for software on the command-line using the command module spider <search-pattern>. Most often the name of the software module is simply the name of the software itself, and even if your search pattern does not match the module name exactly, the search is case-insensitive and supports partial matches.
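For example, a search could look like the commands below. The software name is only an illustration, and the matching behaviour is the one described above.

```bash
# List modules matching "gromacs", together with their available versions
module spider gromacs

# A partial name also works
module spider groma

# spider also prints how to load a specific version it found, e.g.
#   module load <module>/<version>
```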

Data storage

CSC supercomputers provide distinct disk areas for different data storage purposes. The project-based shared storage can be found under /scratch/<project>. This folder is shared by all users in a project and has a default quota of 1 TB.

The scratch disk is not meant for long-term data storage. On Puhti, files that have not been used for 180 days (projects with a scratch quota below 5 TiB) or 90 days (scratch quota of 5 TiB or more) are automatically removed. We recommend the Allas object storage service for storing research data that is not actively used on the supercomputers. See guidelines for managing data on Puhti and Mahti scratch disks for more information. Also note that sensitive data must not be processed or stored on CSC supercomputers; separate sensitive data services are available for this purpose.
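To see which of your files might be affected, you can approximate "not used" by the last access time. The sketch below is not CSC's actual cleanup tool, whose exact criteria may differ. It encodes the Puhti thresholds above, assumes the quota is given as a whole number of TiB, and uses the hypothetical helper names cleanup_threshold_days and list_stale_files.

```bash
#!/usr/bin/env bash
# Sketch only: approximates "not used" by access time (atime), which may
# differ from the criteria used by the real cleanup on Puhti.

# Print the cleanup threshold in days for a scratch quota given in whole TiB
cleanup_threshold_days() {
  local quota_tib="$1"
  if [ "$quota_tib" -lt 5 ]; then
    echo 180
  else
    echo 90
  fi
}

# List regular files under DIR that were last accessed more than DAYS days ago
list_stale_files() {
  local dir="$1" days="$2"
  find "$dir" -type f -atime +"$days" -print
}
```

For a project with the default 1 TB quota, list_stale_files "/scratch/$PROJECT" "$(cleanup_threshold_days 1)" would list candidates. Copy anything you still need to Allas before it is removed.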

CSC supercomputers also have persistent project-based storage with a default quota of 50 GB. It is located under /projappl/<project> and is recommended, for example, for custom software installations. Additionally, each user can store up to 10 GB of data in their personal home directory ($HOME).
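A quick way to orient yourself among these disk areas is sketched below. It assumes the csc-workspaces helper command available on Puhti and Mahti, and <project> is a placeholder.

```bash
PROJECT="<project>"        # placeholder: your CSC project name

# Show the disk areas available to you and their quotas (CSC helper command)
csc-workspaces

cd "/scratch/$PROJECT"     # shared work area, default quota 1 TB, not for long-term storage
cd "/projappl/$PROJECT"    # persistent project area, default quota 50 GB
cd "$HOME"                 # personal home directory, up to 10 GB
```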

Moving data between a supercomputer and a local workstation is easy using the web interface file browser or command-line file transfer tools like scp and rsync. You can also use the Linux wget utility to download data to a supercomputer directly from a website or FTP server.
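Typical transfers might look like the following. Run the scp and rsync commands on your own workstation and the wget command on the supercomputer. The user name, project name, file names and the example.org URL are all placeholders.

```bash
CSC_USER="your_csc_username"   # placeholder
PROJECT="<project>"            # placeholder

# Workstation -> Puhti: copy a single file to the project's scratch area
scp results.csv "$CSC_USER@puhti.csc.fi:/scratch/$PROJECT/"

# Workstation -> Puhti: synchronize a directory; rsync transfers only
# files that have changed, so repeated runs are cheap
rsync -av --progress my_data/ "$CSC_USER@puhti.csc.fi:/scratch/$PROJECT/my_data/"

# On the supercomputer: download a file directly from a web server
wget https://example.org/dataset.tar.gz
```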

CSC does not back up your data!

None of the disk areas are automatically backed up by CSC. This means that data accidentally deleted by the user cannot be recovered in any way. To avoid unintended data loss, make sure to regularly back up your data, for example to Allas or your own organization's storage systems.
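One simple backup routine is to pack a directory into a dated archive, record a checksum, and copy both files to your backup location. The sketch below does the packing step; make_backup_archive is a hypothetical helper, and it assumes GNU tar and sha256sum as found on Linux systems.

```bash
#!/usr/bin/env bash
# Pack SRC into DEST_DIR/<name>-YYYYMMDD.tar.gz and write a SHA-256 checksum
# file next to it, so the copy can be verified after it has been moved.
# Prints the path of the created archive.
make_backup_archive() {
  local src="$1" dest_dir="$2"
  local base name
  base="$(basename "$src")"
  name="${base}-$(date +%Y%m%d).tar.gz"
  # -C: store the directory under its own name, not its full path
  tar -czf "$dest_dir/$name" -C "$(dirname "$src")" "$base" || return 1
  # The checksum file refers to the archive by its relative name
  ( cd "$dest_dir" && sha256sum "$name" > "$name.sha256" ) || return 1
  echo "$dest_dir/$name"
}
```

After copying the archive and its .sha256 file to Allas or your organization's storage, retrieving them later and running sha256sum -c on the checksum file in the same directory confirms that the archive is intact.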

Useful links

You can use the navigation sidebar or the search function to find more information about using CSC HPC services. Here we have included links to pages that we think are particularly useful when getting started with supercomputing at CSC.