Tykky
Intro
Tykky is a set of tools which make software installations on HPC systems easier and more efficient using Apptainer containers.
Tykky use cases:
- Conda installations, based on a Conda environment.yml file.
  - This includes installations from the Bioconda channel, see this tutorial for an example.
- Pip installations, based on a pip requirements.txt file.
- Container installations, based on existing Docker or Apptainer/Singularity images.
Tykky wraps installations inside an Apptainer/Singularity container to improve startup
times, reduce I/O load, and lessen the number of files on large parallel file systems.
Additionally, Tykky will generate wrappers so that installed software can be used
(almost) as if it were not containerized. Depending on tool selection and settings,
either the whole host file system or a limited subset is visible during execution
and installation. This means that it is possible to wrap installations that use e.g.
mpi4py and rely on the host-provided MPI installation.
This documentation covers a subset of the functionality and focuses on Conda and Python. The more advanced use cases are not yet covered here.
Warning
As Tykky is still under development, some of the more advanced features might change with respect to exact usage and API.
Tykky module
To access Tykky tools:
1) Usually it is best to first unload all other modules:
module purge
2) Load the Tykky module:
module load tykky
Conda-based installation
First, make sure that you have read and understood the license terms for Miniconda and any used channels before using the command.
- Miniconda end-user license agreement.
- Anaconda terms of service.
- A blog entry on Anaconda commercial edition.
1) Create a Conda environment file env.yml:
- Create a new file manually, or
- Create the file from an existing Conda installation. For example:
conda env export -n <target_env_name> > env.yml
  - If the existing environment is on a Windows or macOS machine, the --from-history flag might be required to get a .yml file suitable for Linux.
  - If the existing environment is on a Linux machine with x86 CPU architecture, it is also possible to use the --explicit flag.
An example of a suitable env.yml file would be:
channels:
- conda-forge
dependencies:
- python=3.8.8
- scipy
- nglview
Info
The channels field lists which channels the packages should be pulled from
to this environment, whereas the dependencies field lists the actual Conda
packages that will be installed into the environment. Note that Conda uses a
channel priority for determining where to install packages from, i.e. it tries
to first install packages from the first listed channel. If no package versions
are specified, Conda always installs the latest versions.
2) Create a new directory <install_dir> for the installation. A location under /projappl/<your_project>/... is recommended.
3) Create the installation:
conda-containerize new --prefix <install_dir> env.yml
4) Add the <install_dir>/bin directory to your $PATH:
export PATH="<install_dir>/bin:$PATH"
5) Now you can call python and any other executables Conda has installed in the same
way as if you had activated the environment.
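The environment file from step 1 can also be written and inspected from the shell before building. The following is a minimal sketch: the scratch directory and the awk summary are illustrative additions, not part of Tykky, and the file content mirrors the example above. It counts how many dependencies are pinned to a version, since unpinned ones resolve to the latest available version at build time.

```shell
# Scratch location for the sketch (hypothetical; use your own project directory in practice).
work_dir="$(mktemp -d)"

# Write the example environment file.
cat > "$work_dir/env.yml" <<'EOF'
channels:
- conda-forge
dependencies:
- python=3.8.8
- scipy
- nglview
EOF

# Count pinned vs. unpinned entries listed under "dependencies:".
summary="$(awk '
    /^dependencies:/   { in_deps = 1; next }
    /^[^ -]/           { in_deps = 0 }
    in_deps && /^ *- / { if ($0 ~ /=/) pinned++; else unpinned++ }
    END { printf "%d pinned, %d unpinned", pinned, unpinned }
' "$work_dir/env.yml")"
echo "$summary"
```

For the example file this reports one pinned entry (python) and two unpinned ones (scipy and nglview).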
Using Jupyter with a Tykky installation
To use a Tykky installation with Jupyter, include the correct Conda package in your Conda environment file: jupyter for Jupyter Notebooks or jupyterlab for JupyterLab. Additional JupyterLab extensions can also be installed, for example jupyterlab-git or dask-labextension.
The best way to use Jupyter on Puhti is through the Puhti web interface. See the Jupyter application page for details on how to use your own Tykky installation with Jupyter in the Puhti web interface.
Pip with Conda
To install some additional pip packages, add the -r <req_file> argument, e.g.:
conda-containerize new -r req.txt --prefix <install_dir> env.yml
Mamba
The tool also supports using Mamba for installing
packages. Mamba often finds suitable packages much faster than Conda, so it is a good
option when the required package list is long. Enable this feature by adding the --mamba flag.
conda-containerize new --mamba --prefix <install_dir> env.yml
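The -r and --mamba options from the two sections above can be assembled conditionally in a small script. This is a sketch only: the variable names are hypothetical, combining both options in one call is an assumption that is not shown above, and the command is printed rather than executed so that it can be reviewed first.

```shell
# Hypothetical settings for the sketch.
install_dir="MyEnv"
req_file="req.txt"     # leave empty to skip extra pip packages
use_mamba="yes"        # set to "no" to resolve packages with plain Conda

# Build the argument list step by step in the positional parameters.
set -- conda-containerize new
if [ "$use_mamba" = "yes" ]; then
    set -- "$@" --mamba
fi
if [ -n "$req_file" ]; then
    set -- "$@" -r "$req_file"
fi
set -- "$@" --prefix "$install_dir" env.yml

# Dry run: print the command. Replace the echo with "$@" to execute it.
planned_cmd="$*"
echo "$planned_cmd"
```

Keeping the arguments in the positional parameters, rather than in one long string, means paths containing spaces survive intact when the command is eventually run.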
End-to-end example
Create a new Conda-based installation using the previous env.yml file.
mkdir MyEnv
conda-containerize new --prefix MyEnv env.yml
After the installation finishes, add the installation directory to your PATH
and use it as normal.
$ export PATH="$PWD/MyEnv/bin:$PATH"
$ python --version
Python 3.8.8
$ python3
Python 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy
>>> import nglview
>>>
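If python still resolves to a system interpreter after the export, one possible cause is that the bin directory was appended to PATH rather than prepended, or that the shell has cached the old location. The sketch below reproduces the lookup with a throwaway directory and a stub executable standing in for MyEnv/bin; both are hypothetical stand-ins so that it runs without Tykky.

```shell
# Stand-in for <install_dir>/bin: a temporary directory holding a stub "python" wrapper.
demo_install="$(mktemp -d)"
mkdir -p "$demo_install/bin"
printf '#!/bin/sh\necho "stub wrapper"\n' > "$demo_install/bin/python"
chmod +x "$demo_install/bin/python"

# Prepending (not appending) lets the wrappers take precedence over any system python.
export PATH="$demo_install/bin:$PATH"
hash -r   # drop the shell's cached command locations

# command -v prints the path that would actually be executed.
resolved="$(command -v python)"
echo "$resolved"
```

With a real installation, the printed path should point into <install_dir>/bin.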
Modifying a Conda installation
Tykky installations reside in a container, so they cannot be directly modified.
Small Python packages can be added normally using pip, but then the Python packages
will be sitting on the parallel file system, which is not recommended for any larger
installations.
To actually modify the installation, we can use the update keyword together with
the --post-install <file> option, which specifies a bash script with commands to
run to update the installation. The commands are executed with the Conda environment
activated.
conda-containerize update <existing installation> --post-install <file>
Where <file> could e.g. contain:
conda install -y numpy
conda remove -y nglview
pip install requests
In this mode the whole host system is available including all software and modules.
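Since the post-install file is a bash script, it can be created and syntax-checked before starting an update. A minimal sketch, assuming a temporary location and the arbitrary file name post.sh; the commands are the ones listed above.

```shell
# Write the update commands to a script (the file name post.sh is arbitrary).
post_script="$(mktemp -d)/post.sh"
cat > "$post_script" <<'EOF'
conda install -y numpy
conda remove -y nglview
pip install requests
EOF

# bash -n parses the script without executing any of its commands.
if bash -n "$post_script"; then
    echo "syntax OK: $post_script"
fi

# With the tykky module loaded, the update would then be run as:
#   conda-containerize update <existing installation> --post-install "$post_script"
```

The syntax check only catches shell errors; it does not verify that the listed packages exist.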
Pip-based installations
Sometimes you don't need a full-blown Conda environment or you might prefer pip to manage Python installations. In this case we can use:
pip-containerize new --prefix <install_dir> req.txt
where req.txt is a standard pip requirements file. The notes and options for
modifying a Conda installation apply here as well.
Note that the Python version used by pip-containerize is the first Python executable
found in the path, so it is affected by loaded modules.
Important: This Python cannot itself be container-based, as nesting is not possible!
An additional --slim flag exists, which will instead use a pre-built minimal Python
container with a much newer version of Python as a base. Without the --slim flag,
the whole host system is available, whereas with the flag the system installations (i.e.
/usr, /lib64, ...) are no longer taken from the host, but instead come from
within the container.
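Because the base interpreter depends on the current PATH, it can be worth checking what would be picked up before running pip-containerize without --slim. A minimal sketch; whether Tykky looks for python or python3 first is not stated above, so both names are checked here as an assumption.

```shell
# Report what each candidate name resolves to on the current PATH.
found=0
for name in python python3; do
    if path="$(command -v "$name")"; then
        echo "$name -> $path ($("$path" --version 2>&1))"
        found=$((found + 1))
    else
        echo "$name -> not found"
    fi
done
```

If the reported path points into another Tykky installation's bin directory, that interpreter is container-based and, per the note above, cannot be used as the base.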
Container-based installations
Tykky also provides an option to:
- Generate wrappers for tools in existing Apptainer/Singularity containers so that
they can be used transparently (no need to prepend apptainer exec ...
or modify scripts if switching between containerized versions and "normal" installations).
- Install tools available in Docker images, including generating wrappers.
wrap-container -w /path/inside/container <container> --prefix <install_dir>
- <container> can be a local file path or any URL accepted by Apptainer/Singularity (e.g. docker:// or oras://).
- -w needs to be an absolute path (or comma-separated list of paths) inside the container. Wrappers will then be automatically created for the executables in the target directories / for the target path. If you do not know the path of the executables in the container, open a shell inside the container and use the which command. To open a shell:
  - In case of an existing local Apptainer/Singularity file: singularity shell image.sif
  - In case of a Docker or non-local Apptainer/Singularity file, first create the installation with some path and then start the created _debug_shell.
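Once a shell is open inside the container, the directory to pass to -w is the parent directory of the executable. A sketch of that lookup, using ls as a stand-in for the tool of interest and command -v as a portable equivalent of which (an assumption made here because which is not installed in every image):

```shell
# Stand-in tool name; replace with the program you want wrapped.
tool="ls"

# Resolve the executable and take its parent directory.
tool_path="$(command -v "$tool")"
bin_dir="$(dirname "$tool_path")"
echo "use: -w $bin_dir"
```

Since wrappers are created for the executables in the target directory, pointing -w at a general-purpose directory would presumably wrap everything in it; a more specific path keeps the generated bin directory small.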
Memory errors
With very large installations the resources available on the login node might
not be enough, resulting in Tykky failing with a MemoryError. In this case, the
installation needs to be done on a compute node, for example using an interactive
session:
# Start interactive session, here with 12 GB memory and 15 GB local disk (increase if needed)
sinteractive --account <project> --time 1:00:00 --mem 12000 --tmp 15
# Load Tykky
module purge
module load tykky
# Run the Tykky commands as described above, e.g.
conda-containerize new --prefix <install_dir> env.yml
Moving and deleting a Tykky installation
To delete a Tykky installation, remove the <install_dir> folder.
Tykky installations can also be moved:
- Inside the same supercomputer, from folder to folder: move the <install_dir> folder with mv to the new location.
- Between Puhti and Mahti: use rsync. To copy to Mahti, log in to Mahti and change to the folder where you want to place the Tykky installation, then use:
rsync -al <username>@puhti.csc.fi:<install_dir> .
How it works
See the README in the source code repository. The source code can be found in the
GitHub repository.