High Performance Bioinformatics

We live in the big-data era, and simple serial bioinformatics pipelines cannot efficiently handle huge input datasets. High Performance Computing (HPC) is therefore a good solution for researchers who need to analyze their data and address new biological questions.

This course is both theoretical and practical and is aimed at bioinformaticians who want to scale up their analyses on a cluster. It focuses mainly on the development and execution of automated data analysis pipelines.

On the first day, students will become familiar with a cluster (e.g. hardware, software, module environment, data storage) and will learn how to submit a single batch script via the SLURM scheduler.
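As a rough illustration of what a single batch script looks like, the sketch below assembles one in Python. The job name, resource values and the `fastqc` command line are illustrative assumptions, not course material. Site-specific options such as the account and partition are deliberately left out, since they depend on the cluster.

```python
def render_batch_script(job_name, command, cpus=4, mem_gb=8, time_limit="01:00:00"):
    """Return the text of a minimal SLURM batch script.

    All default values are illustrative assumptions; real jobs usually also
    need site-specific options (e.g. account and partition).
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --output={job_name}_%j.out",  # %j expands to the job ID
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical single-step job: quality control on one FASTQ file.
script = render_batch_script("fastqc_sample1", "fastqc sample1.fastq.gz")
print(script)
```

Saving the printed text to a file and passing that file to `sbatch` would submit it as a single job.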

On the second day, participants will be introduced to the world of Next Generation Sequencing (NGS) and will learn how to build a fully automated RNA-seq pipeline able to handle large input datasets, focusing on job chaining and on optimizing HPC resource requests.
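One common way to chain pipeline steps on SLURM is to make each job wait for the previous one with `--dependency=afterok:<jobid>`. The sketch below assumes this is what job concatenation refers to. The step scripts (`trim.sh`, `align.sh`, `count.sh`) and sample names are hypothetical, and submission is simulated by a dry-run function so the example runs without a cluster.

```python
import itertools


def build_sbatch_command(script, sample, depends_on=None):
    """Build an sbatch command line; --parsable makes sbatch print only the job ID."""
    cmd = ["sbatch", "--parsable"]
    if depends_on is not None:
        # Start this job only after the previous one finished successfully.
        cmd.append(f"--dependency=afterok:{depends_on}")
    cmd += [script, sample]  # the sample name reaches the script as $1
    return cmd


def submit_chain(sample, step_scripts, submit):
    """Submit the steps for one sample in order, each depending on the previous."""
    job_id = None
    job_ids = []
    for script in step_scripts:
        job_id = submit(build_sbatch_command(script, sample, depends_on=job_id))
        job_ids.append(job_id)
    return job_ids


def make_dry_run_submitter(start=1000):
    """Return a fake submit function that logs commands and invents job IDs."""
    counter = itertools.count(start)
    log = []

    def submit(cmd):
        log.append(" ".join(cmd))
        return str(next(counter))

    return submit, log


submit, log = make_dry_run_submitter()
steps = ["trim.sh", "align.sh", "count.sh"]   # hypothetical RNA-seq steps
for sample in ["sampleA", "sampleB"]:         # hypothetical sample names
    submit_chain(sample, steps, submit)

for line in log:
    print(line)
```

On a real cluster, the dry-run function could be swapped for one that runs the command with `subprocess.run(cmd, capture_output=True, text=True, check=True)` and returns the job ID from its standard output. Chains for different samples are independent of each other, so the scheduler is free to run them in parallel.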

On the last day, students will be introduced to cloud computing and to Snakemake, a Python-based workflow management system for creating reproducible and scalable data analyses, with particular attention to scaling workflows on clusters and in the cloud without modifying the workflow definition.

Ad-hoc hands-on sessions, aimed at applying the concepts explained during the course, will be held every afternoon.

Skills:

By the end of the course each student should be able to:

- Know all the conventions and opportunities offered by CINECA for accessing HPC resources;

- Download datasets from public repositories and/or transfer input files from the user’s local computer to the CINECA clusters;

- Navigate through the software environment set up by CINECA;

- Run single-step jobs on a supercomputer via the SLURM scheduler;

- Combine several bioinformatics applications into a fully automated pipeline able to run on a supercomputer;

- Iterate through samples in order to manage huge input datasets;

- Have an overview of how to take advantage of Snakemake to build a portable, scalable and fully automated pipeline.

Target audience:

Biologists, bioinformaticians and computer scientists interested in approaching large-scale NGS data analysis.

Course prerequisites:

Good knowledge of Python and the shell command line.

A very basic knowledge of R and biology is recommended but not strictly required.

Intended for:

Companies

Health

Research Institutions

Area:

Science

Provided as:

Webinar

Next courses

No editions of this course are currently scheduled.

Any questions?

For HPC and computer graphics courses, write to corsi.hpc@cineca.it

About CINECA

Cineca is a non-profit consortium made up of 102 Italian national institutions: universities, Italian research institutions and the Italian Ministries of Universities and Education.

Today it is the largest Italian computing centre and one of the most important worldwide. With more than seven hundred employees, it operates in the technology transfer sector through high performance scientific computing, the management and development of networks and web-based services, and the development of complex information systems for handling large amounts of data.

It develops advanced Information Technology applications and services, acting as a link between the academic world, the sphere of pure research, and the world of industry and public administration.
