Next-generation sequencing has made it possible to sequence not only a single genome but also the genomes of an entire community of microorganisms in a biome.

We currently know only a small fraction of viral diversity. Exploiting metagenomic data and identifying emerging viruses are major bioinformatics challenges. First, viruses evolve much faster than prokaryotes and eukaryotes, so their sequences diverge more and are harder to detect with conventional pairwise alignment methods. Second, relatively few viral genomes are available in public databases compared with, for instance, archaea and bacteria. This scarcity also makes viral sequence detection and classification much more challenging.

In this course, we will cover innovative, recently developed methods that can increase the sensitivity of detection of evolutionarily remote viruses. One of these approaches is the construction and application of profile HMMs (hidden Markov models). We will teach the conceptual aspects of profile HMM construction, especially for taxonomically specific groups of viruses.
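As a rough illustration of why position-specific models can be more sensitive than pairwise comparison, the Python sketch below builds a toy position-specific scoring matrix from a small alignment. It rests on several assumptions: the alignment and query sequences are made up and are not real viral data, the background distribution is uniform, and the model has no insert or delete states or transition probabilities, so it is a simplification of a profile HMM rather than the method taught in the course. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy alignment (not real viral data): three conserved
# columns followed by two variable columns.
ALIGNMENT = ["GDDKL", "GDDRV", "GDDKI", "GDDQL"]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict per column with log2-odds scores for each residue.

    Assumes a uniform background and ungapped sequences of equal length.
    """
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def profile_score(profile, seq):
    """Sum the column scores of an ungapped sequence of matching length."""
    return sum(column[aa] for column, aa in zip(profile, seq))


def best_identity(alignment, seq):
    """Highest fraction of identical positions against any single sequence."""
    return max(
        sum(a == b for a, b in zip(ref, seq)) / len(seq) for ref in alignment
    )


if __name__ == "__main__":
    profile = build_profile(ALIGNMENT)
    # "GDDSM" keeps the conserved columns; "ADKKL" matches only variable ones.
    for query in ("GDDSM", "ADKKL"):
        print(query,
              "best identity:", best_identity(ALIGNMENT, query),
              "profile score:", round(profile_score(profile, query), 2))
```

Both queries have the same best pairwise identity to the alignment members, but the profile scores the one that preserves the conserved columns higher. Weighting conserved positions more heavily is the intuition behind profile-based detection of divergent sequences.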

We will also offer practical sessions in which students build and apply profile HMMs to metagenomic data for viral detection and discovery. In addition, we will cover the fundamentals of different machine learning approaches and, in practical sessions, present methods applied to viral detection, classification, and virus-host interactions, among other topics.

Official course website

Full course information can be downloaded here


Prof. Manja Marz

Friedrich Schiller University Jena, Germany

Prof. Arthur Gruber

Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, Brazil

Schedule and location:

Oct 21 to 25, 2019

8:30 to 12:15 and 13:30 to 18:00

Institute of Biomedical Sciences, Biomédicas II Building, Samuel Pessoa room – theoretical classes

Institute of Biomedical Sciences, Biomédicas IV Building, room 5 – practical classes


Class activities: 26 h

Home activities: 4 h


This course is intended for graduate students, postdocs and young researchers working in the fields of metagenomics and viral discovery.


Practical sessions will be taught using Linux servers. Previous knowledge of the Linux command line is required for the practical classes. Knowledge of the fundamentals of molecular virology is also recommended.


Please send your CV and a short letter of intent to Prof. Arthur Gruber. Total number of seats: 25.


The course will be taught in English.


Using the course as a discipline for Brazilian graduate students:

Graduate students may enroll in the course and use it as an official discipline (IBI5071 - 2 credits). In this case, an official enrollment is required (instructions are available at the end of this document) and a written exam will be carried out at the end of the course.



1. Metagenomics - challenges for viral detection and discovery

2. Profile HMM construction for viral detection and discovery

3. Screening metagenomic data with profile HMMs

4. Targeted progressive assembly using profile HMMs as seeds

5. Finding proviruses in bacterial genomes using profile HMMs

6. Introduction to machine learning methods

7. SVM to detect viral miRNAs

8. Random Forest to detect viral miRNAs

9. PCA for viral host classification

10. CNN for viral host classification

11. Introduction to the RNA world

12. Folding algorithm: McCaskill and partition functions

13. RNAfold to determine the secondary structures of RNA viruses

14. Vienna RNA Package for the study of virus-host interactions

15. LRIscan/Circos (long-range interactions of segmented viruses)

16. Covariance models

17. Infernal to detect viral elements from (meta-)genomic samples