Disk areas
CSC supercomputers have three main disk areas: home, projappl and scratch. In addition to these disk areas, which are visible to all compute and login nodes, each node has a local temporary disk area that is visible only to that node during a batch job or shell session. Please familiarize yourself with the areas and their specific purposes. The disk areas of the different supercomputers are separate, i.e. home, projappl and scratch in Puhti cannot be directly accessed from Mahti. A more technical description of the Lustre filesystem used for these directories is also available.
Note
None of the disk areas are automatically backed up by CSC! This means that data accidentally deleted by the user cannot be recovered in any way. To avoid unintended data loss, make sure to perform regular backups to, for example, Allas. See also the allas-backup tool.
| | Owner | Environment variable | Path | Cleaning | Automatic backup? |
|---|---|---|---|---|---|
| home | Personal | ${HOME} | /users/<user-name> | No | No |
| projappl | Project | Not available | /projappl/<project> | No | No |
| scratch | Project | Not available | /scratch/<project> | Not yet - will be 90 days | No |
These disk areas have quotas for both the amount of data and total number of files:
| | Capacity | Number of files |
|---|---|---|
| home | 10 GiB | 100 000 files |
| projappl | 50 GiB | 100 000 files |
| scratch | 1 TiB | 1 000 000 files |
See Increasing Quotas for instructions on how to apply for increased quota.
Home directory
Each user has a home directory ($HOME) that can contain up to 10 GiB of data.
The home directory is the default directory where you begin after logging in to a supercomputer. However, you should typically change to your project's scratch directory when working, because the home directory is not intended for data analysis or computing. Its purpose is to store configuration files and other minor personal data. A home directory that exceeds its capacity causes various account problems.
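If the home directory is filling up, one way to find out what is taking the space is, for example:

```bash
# Show the size of each item in the home directory, largest last
du -h --max-depth=1 "$HOME" | sort -h
```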
The home directory is the only user-specific directory in supercomputers. All other directories are project-specific. If you are a member of several projects, you also have access to several scratch or projappl directories, but still have only one home directory.
Scratch directory
Each project has by default 1 TiB of scratch disk space in the directory /scratch/<project>.
This fast parallel scratch space is intended as temporary storage space for the data that is used in supercomputers. The scratch directory is not intended for long-term data storage. In the future, any files that have not been used for 90 days will be automatically removed, but this is not yet enabled.
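To get an idea of which data such a cleaning would affect, you can, for example, list files that have not been modified in the last 90 days (the project name here is just an example):

```bash
# List files under the project's scratch directory not modified in 90 days
find /scratch/project_2012345 -type f -mtime +90
```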
ProjAppl directory
Each project also has 50 GiB of project application disk space in the directory /projappl/<project>.
It is intended for storing applications you have compiled yourself, libraries and other files that you are sharing within the project. It is not a personal storage space; it is shared with all members of the project team.
It is not intended for running applications, so please run them in scratch instead.
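As a rough sketch of typical use, a self-compiled application could be installed under projappl like this (the package name, project name and build commands are only illustrative examples; the details depend on the software):

```bash
# Unpack and build the software somewhere else, e.g. in scratch or $TMPDIR,
# and install only the final result under projappl
tar xf myapp-1.0.tar.gz
cd myapp-1.0
./configure --prefix=/projappl/project_2012345/myapp
make
make install
```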
Using Scratch and ProjAppl directories
An overview of your directories on the supercomputer you are currently logged in to can be displayed with:
```bash
csc-workspaces
```
For example, if you are a member of two projects with the Unix groups project_2012345 and project_3587167, you have access to two scratch and two projappl directories:
```
[kkayttaj@puhti ~]$ csc-workspaces

Disk area                  Capacity(used/max)  Files(used/max)  Project description
----------------------------------------------------------------------------------
Personal home folder
----------------------------------------------------------------------------------
/users/kkayttaj            2.05G/10G           23.24k/100k

Project applications
----------------------------------------------------------------------------------
/projappl/project_2012345  3.056G/50G          23.99k/100k      Ortotopology modeling
/projappl/project_3587167  10.34G/50G          2.45/100k        Metaphysics methods

Project scratch
----------------------------------------------------------------------------------
/scratch/project_2012345   56G/1T              150.53k/1000k    Ortotopology modeling
/scratch/project_3587167   324G/1T              5.53k/1000k     Metaphysics methods
```
Moving to the scratch directory of project_2012345:
```bash
cd /scratch/project_2012345
```
The scratch and projappl directories are shared by all members of the project. All new files and directories are also fully accessible to the other group members (including read, write and execute permissions). If you want to restrict access by your group members, you can reset the permissions with the chmod command.
Setting read-only permissions for your group members for the directory my_directory:
```bash
chmod -R g-w my_directory
```
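You can check the resulting permissions, for example, with:

```bash
ls -ld my_directory
```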
As mentioned earlier, the scratch directory is only intended for processing data. Any data that should be preserved for a longer time should be copied to the Allas storage server. Instructions for backing up files from CSC supercomputers to Allas can be found in the Allas guide.
Moving data between supercomputers
Data can be moved between supercomputers via Allas by first uploading the data in one supercomputer and then downloading in another supercomputer. This is the recommended approach if the data should also be preserved for a longer time.
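As a minimal sketch, assuming the Allas tools (module load allas, allas-conf, a-put, a-get, a-list) are used as described in the Allas guide, the transfer could look roughly like this (the project name is only an example, and the bucket and object names created by a-put should be checked with a-list):

```bash
# On Puhti: open the Allas connection and upload the data
module load allas
allas-conf project_2012345
a-put my_results

# On Mahti: open the Allas connection, locate the uploaded object and download it
module load allas
allas-conf project_2012345
a-list                              # find the bucket and object created by a-put
a-get <bucket-name>/<object-name>   # placeholders; use the names shown by a-list
```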
Data can also be moved directly between the supercomputers with the rsync command. For example, to copy my_results (which can be either a file or a directory) from Puhti to the directory /scratch/project_2002291 in Mahti, issue the following command in Puhti:
```bash
rsync -azP my_results yourcscusername@mahti.csc.fi:/scratch/project_2002291
```
Increasing Quotas
You can use the MyCSC portal to manage the quotas of the scratch and projappl directories.
Remember that even after the quota is increased, the planned automatic cleaning process will continue removing idle files from the scratch directory. Data that is not under active computing should be stored in the Allas storage service.
Remember also that you can increase these values only to some extent. Especially in the case of the number of files, you should reconsider your data workflow if it requires storing tens of millions of files in the scratch area.
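For example, a directory containing a very large number of small files can often be packed into a single archive before it is stored:

```bash
# Pack many small files into one compressed archive
tar czf my_dataset.tar.gz my_dataset/

# Unpack the archive again when the files are needed
tar xzf my_dataset.tar.gz
```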
Temporary local disk areas
Which disk area is suitable for temporary files that are only visible within the login or compute node depends on the type of the node. If an application relies on temporary files, the suitability of the filesystem can have a large effect on the performance of the application; see the section "Mind your I/O - it can make a big difference" in the Performance Checklist. Please note that some applications use temporary files "behind the scenes". Usually these applications read an environment variable, such as $TMPDIR, that points to a suitable disk area.
Login nodes
Each of the login nodes has 2900 GiB of fast local storage. The storage is located under $TMPDIR and is separate for each login node.
The local storage is good for compiling applications and for pre- and post-processing tasks that require heavy I/O operations, for example packing and unpacking archive files.
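For example, a large archive could be unpacked on the fast local disk and only the needed files copied to the shared scratch area afterwards (the file and directory names are just examples):

```bash
# Unpack an archive on the login node's fast local disk
cd $TMPDIR
tar xf /scratch/project_2012345/big_archive.tar

# Copy only the needed results back to the shared scratch area
cp -r needed_results /scratch/project_2012345/
```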
Note
The local storage is meant for temporary storage and is cleaned frequently. Remember to move your data to a shared disk area after completing your task.
Compute nodes with local SSD (nvme) disks
Interactive batch jobs, as well as jobs running on the IO and GPU nodes in Puhti and the GPU nodes in Mahti, have fast local storage available. In interactive batch jobs this local disk area is defined by the environment variable $TMPDIR, and in normal batch jobs by $LOCAL_SCRATCH. The size of this storage space is defined in the batch job resource request (max. 3600 GB).
These local disk areas are designed to support I/O-intensive computing tasks and cases where you need to process large numbers (over 100 000) of small files. These directories are cleaned once the batch job finishes, so at the end of a batch job you must copy all the data that you want to preserve from these temporary disk areas to the scratch directory or to Allas.
For more information see: creating job scripts.
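As a minimal sketch of how the local disk might be used in a Puhti batch job (the project, partition, sizes and file names are only examples; see the batch job documentation for the exact resource request syntax):

```bash
#!/bin/bash
#SBATCH --account=project_2012345
#SBATCH --partition=small
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --mem=8G
#SBATCH --gres=nvme:100        # request 100 GB of local nvme disk

# Copy the input data to the fast local disk and work there
cp /scratch/project_2012345/input.tar $LOCAL_SCRATCH/
cd $LOCAL_SCRATCH
tar xf input.tar

# ... run the I/O-intensive part of the job here ...

# Copy the results back before the job ends; the local disk is cleaned afterwards
cp -r results /scratch/project_2012345/
```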
Compute nodes without local SSD (nvme) disks
In Puhti, we simply recommend using the compute nodes with nvme disks ($LOCAL_SCRATCH) for applications that require temporary local storage.
In Mahti, where most compute nodes do not have local nvme disks, it is possible to store a relatively small number of temporary files in memory. In practice, applications can use the directory /dev/shm for this, for example by setting export TMPDIR=/dev/shm. Please note that the use of /dev/shm consumes memory, so less is left available for the applications. This may lead to applications running out of memory sooner than expected and failing on the compute node, but this usually does no other harm. The plus side is that if it works, it should be fast. In Puhti, however, where applications from multiple users can share the same node, running out of memory by filling up /dev/shm will crash other users' applications, too. It is therefore recommended not to use /dev/shm in Puhti at all.
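A minimal sketch of this in a Mahti job, assuming the application reads $TMPDIR (the application name is only an example):

```bash
# Point applications that honour $TMPDIR to the memory-backed filesystem.
# Note: everything written under /dev/shm consumes the compute node's memory.
export TMPDIR=/dev/shm

./my_application
```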