The course will be held online only, in Italian, and is limited to 25 students.
There will be no in-person lessons.
The course is free of charge.
At the end of the course, a certificate of attendance will be issued to all eligible participants.
Course Overview
We live in a big-data era, and simple serial bioinformatics pipelines cannot efficiently handle huge datasets. High Performance Computing (HPC) offers an effective solution for researchers who need to address new biological questions using large-scale data.
This course is both theoretical and practical, aimed at bioinformaticians who want to scale up their analyses on cluster machines. It focuses on the development and execution of automated, reproducible pipelines.
Dedicated hands-on sessions will be held every day.
Day 1 – Introduction to HPC and Cluster Basics
Topics:
Cluster architecture: hardware, storage, and software environment
Module system and software navigation
Submitting jobs via the SLURM scheduler
Running single-step batch scripts
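As an illustration of the Day 1 topics, a single-step batch script could be composed as in the minimal Python sketch below. The helper `make_batch_script`, the job name, the resource values, and the `fastqc` command are assumptions chosen for the example and are not taken from the course material. The `#SBATCH` directives shown are standard SLURM options, but sensible values depend on the cluster.

```python
def make_batch_script(job_name, command, cpus=4, mem_gb=8, time_limit="01:00:00"):
    """Return the text of a single-step SLURM batch script.

    All default resource values are illustrative assumptions.
    """
    header = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",
        # %j is replaced by SLURM with the job ID at run time
        f"#SBATCH --output={job_name}_%j.log",
    ]
    # A blank line separates the directives from the command to run
    return "\n".join(header + ["", command, ""])


# Hypothetical example: quality control on one FASTQ file
script = make_batch_script("fastqc_sample1", "fastqc sample1.fastq.gz")
print(script)
```

Saved to a file, a script like this would typically be submitted with `sbatch` and monitored with `squeue`.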
Day 2 – NGS Pipelines, Singularity, and GPU
Topics:
Introduction to Next-Generation Sequencing (NGS), with a focus on RNA-seq
Building automated pipelines for large datasets
Job concatenation and HPC resource optimization
Singularity containers: packaging bioinformatics tools for reproducibility
GPU usage: running accelerated bioinformatics tasks
Day 3 – Workflow Management with Snakemake & Cloud Scaling
Topics:
Introduction to workflow management concepts
Snakemake basics: rules, dependencies, and reproducibility
Scaling pipelines on cluster and cloud environments without modifying workflow definitions
Tips for portable and scalable pipeline design
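The idea behind rules and dependencies can be sketched in plain Python: each rule declares its input and output files, and the execution order follows from which rule produces the files another rule needs. This is only a conceptual sketch and does not use Snakemake's own syntax or API. The rule names and file names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical RNA-seq rules: each declares the files it reads and writes.
rules = {
    "count": {"inputs": ["sample.bam"], "outputs": ["sample.counts.tsv"]},
    "align": {"inputs": ["sample.trimmed.fastq"], "outputs": ["sample.bam"]},
    "trim": {"inputs": ["sample.fastq"], "outputs": ["sample.trimmed.fastq"]},
}


def execution_order(rules):
    """Order rules so that every rule runs after the rules producing its inputs."""
    # Map each output file to the rule that produces it
    producer = {out: name for name, r in rules.items() for out in r["outputs"]}
    # For each rule, collect the rules it depends on; inputs that no rule
    # produces (such as raw data) are treated as already present
    graph = {
        name: {producer[f] for f in r["inputs"] if f in producer}
        for name, r in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(rules)
print(order)  # ['trim', 'align', 'count']
```

Because the order is derived from file dependencies rather than written by hand, the same rule definitions can in principle be run on a laptop, a cluster, or a cloud backend without being modified.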
Skills Acquired
By the end of the course, students should be able to:
Navigate HPC resources and the software environment
Submit and monitor jobs using SLURM
Use Singularity containers for reproducible pipelines
Run GPU-accelerated jobs
Build an automated pipeline handling large datasets
Apply Snakemake to create scalable, portable, and reproducible workflows
Target Audience
Biologists, bioinformaticians, and computer scientists interested in large-scale NGS data analysis.
Prerequisites
Good knowledge of Python and the shell command line
Basic knowledge of R and biology is recommended but not required
