Whisper

Available

Faster-Whisper-XXL r245.4 is available in Puhti.

License

Faster-Whisper-XXL is licensed under the MIT license.

Whisper

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

Usage

CSC users can easily install Whisper in their own Python virtual environments in Puhti and Mahti. In addition, Puhti has a pre-installed Faster-Whisper-XXL version of Whisper. This Whisper environment can be activated in Puhti with the command:

module load whisper
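
For the other option mentioned above, a personal Python virtual environment, the following is a minimal sketch rather than an official installation recipe. The function name setup_whisper_venv is hypothetical, the python-data module name is an assumption about the module environment, and openai-whisper is the PyPI package of the original Whisper implementation, not Faster-Whisper-XXL.

```shell
#!/bin/bash
# Sketch: create a personal virtual environment and install Whisper into it.
# setup_whisper_venv is a hypothetical helper name; the python-data module
# name is an assumption and may need adjusting to the system in use.
setup_whisper_venv() {
    local venv_dir="${1:-}"
    if [ -z "$venv_dir" ]; then
        echo "usage: setup_whisper_venv <venv_dir>" >&2
        return 1
    fi
    # Load a Python module first (assumed module name)
    module load python-data || return 1
    # Create and activate the virtual environment
    python3 -m venv "$venv_dir" || return 1
    source "$venv_dir/bin/activate" || return 1
    # Install the openai-whisper package from PyPI
    pip install --upgrade pip
    pip install openai-whisper
}
```

The function would be called with a target directory, for example a folder in the project's own disk area. The diarization options shown later on this page belong to Faster-Whisper-XXL and are not expected to work with an openai-whisper installation.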

Sample command:

whisper audio.mp3 --model medium 

Sample command with diarization enabled:

whisper interview.mp4 --model large --language French --threads 4 --diarize pyannote_v3.0 --diarize_threads 4 --num_speakers 2 -o interview_results
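
When several recordings need the same treatment, the first sample command can be wrapped in a loop. The sketch below only reuses the --model and -o options shown above. The function name transcribe_all and the DRY_RUN variable are hypothetical helpers for illustration, not part of Whisper.

```shell
#!/bin/bash
# Sketch: run the sample whisper command for every .mp3 file in a directory.
# transcribe_all and DRY_RUN are hypothetical names, not part of Whisper.
transcribe_all() {
    local indir="$1"
    local outdir="$2"
    local f
    for f in "$indir"/*.mp3; do
        # Skip the unexpanded pattern when no file matches
        [ -e "$f" ] || continue
        if [ -n "${DRY_RUN:-}" ]; then
            # Only print the command that would be run
            echo whisper "$f" --model medium -o "$outdir"
        else
            whisper "$f" --model medium -o "$outdir"
        fi
    done
}
```

Calling DRY_RUN=1 transcribe_all recordings results prints the commands without running them, which is a cheap way to check the file list before starting the actual transcription.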

Example batch script

Whisper can utilize GPU computing effectively. The example batch script below reserves one GPU for a Whisper job.

#!/bin/bash
#SBATCH --account=<project>
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=1:00:00
#SBATCH --gres=gpu:v100:1

module load whisper
srun whisper interview.mp4 --model large --language French --threads 4 --diarize pyannote_v3.0 --diarize_threads 4 --num_speakers 2 -o interview_results
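
Once the batch script above is saved to a file, it can be submitted with the standard Slurm commands. The following is a minimal sketch: the function name submit_whisper_job and the file name whisper_job.sh are hypothetical, while sbatch --parsable and squeue -j are regular Slurm options.

```shell
#!/bin/bash
# Sketch: submit a saved batch script and show its state in the queue.
# submit_whisper_job and whisper_job.sh are hypothetical names.
submit_whisper_job() {
    local script="${1:-whisper_job.sh}"
    local job_id
    # --parsable makes sbatch print only the job ID (optionally "id;cluster")
    job_id=$(sbatch --parsable "$script") || return 1
    # Drop the optional ";cluster" suffix
    job_id="${job_id%%;*}"
    echo "Submitted job $job_id"
    # Show the queue entry for this job
    squeue -j "$job_id"
}
```

After the job has finished, sacct -j with the printed job ID shows its accounting record, and the Whisper results are expected in the directory given with -o.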

More information