Python
The Python programming language on CSC's supercomputers Puhti and Mahti.
Available
- Puhti: 3.x versions
- Mahti: 3.x versions
System Python is available by default on both Puhti and Mahti without loading any
module. Python 3 (version 3.6.8) is available as python3. The default system Python
does not include any optional Python packages. However, you can install simple
packages for yourself with the methods explained below.
In Puhti there are several Python modules available that include different sets of scientific libraries:
- python-data - for data analytics and machine learning
- PyTorch - PyTorch deep learning framework
- RAPIDS - for data analytics and machine learning on GPUs
- TensorFlow - TensorFlow deep learning framework
- JAX - Autograd and XLA for high-performance machine learning
- BioPython - biopython and other bioinformatics related Python libraries
- geoconda - for spatial data analysis
- and several other modules may include Python...
In Mahti:
- python-data - for data analytics and machine learning
To use any of the above-mentioned modules, load the appropriate module, for example:
module load python-data
Typically, after activating the module, you can continue using the commands python
and/or python3, but these will now point to different versions of Python with a
wider set of Python packages available. For more details, check the corresponding
application documentation (when available).
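To confirm which interpreter and version a loaded module actually provides, a quick check along the following lines may help. This is a minimal sketch using only the standard library; the helper name describe_interpreter is made up for illustration.

```python
import sys


def describe_interpreter():
    """Return the path and version of the running Python interpreter."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return {"executable": sys.executable, "version": version}


if __name__ == "__main__":
    info = describe_interpreter()
    print("Interpreter:", info["executable"])
    print("Version:", info["version"])
```

Running the script once with python3 before and once after module load should show whether the command now points to a different installation.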
Installing Python packages to existing modules
If you find that some package is missing from an existing module, you can often
install it yourself with:
pip install <newPythonPackageName> --user
The packages are by default installed to your home directory under
.local/lib/pythonx.y/site-packages (where x.y is the version of Python being
used). If you would like to change the installation folder, for example to make
a project-wide installation instead of a personal one, you need to define the
PYTHONUSERBASE environment variable with the new installation location. For
example, to add the package whatshap to the python-data module:
module load python-data
export PYTHONUSERBASE=/projappl/<your_project>/my-python-env
pip install --user whatshap
In the example, the package is now installed inside the my-python-env directory
in the project's projappl directory. Run unset PYTHONUSERBASE if you wish to
later install into your home directory again.
When you later use those libraries, remember to add the site-packages path to
PYTHONPATH (or use the same PYTHONUSERBASE definition as above). Naturally, this
also applies to Slurm job scripts. For example:
module load python-data
export PYTHONPATH=/projappl/<your_project>/my-python-env/lib/python3.9/site-packages/
python3 -c "import whatshap" # this should now work!
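The Python version in the site-packages path (python3.9 in the example) depends on the loaded module, so a hard-coded path may stop matching if the module changes. As a rough sketch, the path can be derived from the running interpreter instead. The helper name user_site_for is hypothetical, and the lib/pythonx.y/site-packages layout is assumed to be the one described above for --user installations.

```python
import os
import sys


def user_site_for(user_base, version_info=sys.version_info):
    """Build the site-packages path used by --user installs under user_base."""
    version_dir = "python{}.{}".format(version_info[0], version_info[1])
    return os.path.join(user_base, "lib", version_dir, "site-packages")


if __name__ == "__main__":
    # Assumption: PYTHONUSERBASE is set as in the installation example;
    # otherwise fall back to the default location in the home directory.
    base = os.environ.get("PYTHONUSERBASE", os.path.expanduser("~/.local"))
    print(user_site_for(base))
```

For the current interpreter, the standard library function site.getusersitepackages() reports the same kind of path, so it can serve as a cross-check.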
Note that if the package you installed also contains executable files, these may not work, because they refer to the Python path internal to the container (most of our Python modules are installed with containers):
whatshap --help
whatshap: /CSC_CONTAINER/miniconda/envs/env1/bin/python3.9: bad interpreter: No such file or directory
You can fix this either by editing the first line of the executable to point to
the real Python interpreter (check with which python3) or by running it via
the Python interpreter, for example:
python3 -m whatshap --help
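The first fix, editing the first line of the executable, could be scripted roughly as follows. This is a sketch only: the helper name fix_shebang is invented for illustration, and it assumes the executable is a plain-text script whose first line starts with #!. The demo works on a throwaway file and uses the running interpreter's path as a stand-in for the output of which python3.

```python
import os
import sys
import tempfile


def fix_shebang(script_path, interpreter):
    """Point the #! line of script_path to interpreter.

    Returns True if the file was changed, False if it has no #! line.
    """
    with open(script_path, "r") as handle:
        lines = handle.readlines()
    if not lines or not lines[0].startswith("#!"):
        return False
    lines[0] = "#!" + interpreter + "\n"
    # Writing in place keeps the file's existing permissions.
    with open(script_path, "w") as handle:
        handle.writelines(lines)
    return True


if __name__ == "__main__":
    # Demonstrate on a temporary file rather than a real installed script.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo-tool")
        with open(demo, "w") as handle:
            handle.write("#!/CSC_CONTAINER/miniconda/envs/env1/bin/python3.9\n")
            handle.write("print('hello')\n")
        fix_shebang(demo, sys.executable)
        with open(demo) as handle:
            print(handle.readline().strip())
```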
Alternatively, you can create a separate virtual environment with venv. However, this approach does not work with modules installed with Apptainer, which is now the default approach at CSC. Note that Singularity has been known as Apptainer since the beginning of 2022.
If you think that some important package should be included in a module provided by CSC, you can send an email to Service Desk.
Creating your own Python environments
It is also possible to create your own Python environments.
Tykky
The easiest option is to use Tykky for Conda or pip installations.
Custom Apptainer container
In some cases, for example if you know of a suitable ready-made Apptainer or Docker container, using a custom Apptainer container is also an option.
Please see our Apptainer documentation:
- Running Apptainer containers
- Creating Apptainer containers, including how to convert a Docker container to an Apptainer container.
Conda
Conda is easy to use and flexible, but it usually creates a huge number of files, which is inefficient on shared file systems. This can cause very slow library imports and, in the worst case, slowdowns in the whole file system. Therefore, CSC has deprecated the direct use of Conda installations on CSC supercomputers. You can, however, still use Conda environments provided that they are containerized. To easily containerize your Conda (or pip) environments, please see the Tykky container wrapper tool.
- CSC Conda tutorial describes in more detail what Conda is and how to use it. Some parts of this tutorial may be helpful also for Tykky installations.
Python development environments
Python code can be edited with a console-based text editor directly on the supercomputer. Code can also be edited on your local machine and copied to the supercomputer with scp or graphical file transfer tools. You can also edit Python scripts in Puhti from your local PC with some code editors, such as Visual Studio Code.
Finally, several graphical programming environments can be used directly on the supercomputer, such as Jupyter Notebooks, Spyder and Visual Studio Code, through the Puhti web interface.
Jupyter Notebooks
Jupyter Notebooks allow you to run Python code via a web browser running on a local PC. The notebooks can combine code, equations, visualizations and narrative text in a single document. Many of our modules, including python-data, the deep learning modules and geoconda, include the Jupyter notebook package. See the tutorial on how to set up and connect to a Jupyter Notebook for using Jupyter in the CSC environment.
Spyder
Spyder is a scientific Python development environment. The python-data and geoconda modules include Spyder. The best option for using it is through the Puhti web interface remote desktop.
Python parallel jobs
Python has several different packages for parallel processing:
- multiprocessing
- joblib
- dask
- mpi4py - Python interface to MPI
The multiprocessing package is likely the easiest to use, and as it is part of the
Python standard library it is included in all Python installations. joblib provides
some more flexibility. multiprocessing and joblib are suitable for one node (max 40
cores). dask is the most versatile and has several options for parallelization.
Please see CSC's Dask tutorial, which includes both single-node (max 40 cores) and
multi-node examples.
See our GitHub repository for some examples of using the different parallelization options with Puhti.
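As a rough single-node illustration of the multiprocessing package, the sketch below spreads a toy function over a pool of worker processes. The function names are made up for this example, and reading the core count from the SLURM_CPUS_PER_TASK environment variable assumes the job was submitted with the --cpus-per-task option; outside such a job the code falls back to the cores visible to the process.

```python
import multiprocessing
import os


def available_workers():
    """Guess how many worker processes the current job may use."""
    # Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested.
    slurm_cpus = os.environ.get("SLURM_CPUS_PER_TASK")
    if slurm_cpus:
        return int(slurm_cpus)
    # Otherwise use the cores this process is allowed to run on (Linux).
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def square(x):
    """Toy workload standing in for a real, CPU-heavy function."""
    return x * x


def parallel_squares(values, workers=None):
    """Apply square() to each value using a pool of worker processes."""
    with multiprocessing.Pool(processes=workers or available_workers()) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Two workers keep the demo light; drop the argument to use all cores.
    print(parallel_squares(range(10), workers=2))
```

Sizing the pool from the job's allocation rather than the total core count of the node avoids starting more processes than the job has cores reserved.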
mpi4py is not included in the current Python environments on CSC supercomputers.
However, for multi-node jobs with non-trivial parallelization it is generally the
most efficient option. For a short tutorial on mpi4py, along with other approaches
to improving the performance of Python programs, see the free online course Python
in High Performance Computing.
License
Python packages are usually licensed under various free and open source software (FOSS) licenses. Python itself is licensed under the PSF License, which is also open source.