Motivation: Completing the genome sequence of an organism is an important task in comparative, functional and structural genomics. However, this remains a challenging issue from both a computational and an experimental viewpoint. Genome scaffolding (i.e. the process of ordering and orientating contigs) of de novo assemblies usually represents the first step in most genome finishing pipelines. Results: In this paper, we present MeDuSa (Multi-Draft based Scaffolder), an algorithm for genome scaffolding. MeDuSa exploits information obtained from a set of (draft or closed) genomes from related organisms to determine the correct order and orientation of the contigs. MeDuSa formalises the scaffolding problem by means of a combinatorial optimisation formulation on graphs and implements an efficient constant factor approximation algorithm to solve it. In contrast to currently used scaffolders, it does not require either prior knowledge on the microrganisms dataset under analysis (e.g. their phylogenetic relationships) or the availability of paired end read libraries. This makes usability and running time two additional important features of our method. Moreover, benchmarks and tests on real bacterial datasets showed that MeDuSa is highly accurate and, in most cases, outperforms traditional scaffolders. The possibility to use MeDuSa on eukaryotic datasets has also been evaluated, leading to interesting results. Availability: MeDuSa web server: http://combo.dbe.unifi.it/medusa. A stand-alone version of the software can be downloaded from https://github.com/combogenomics/medusa/releases. All results presented in this work have been obtained with MeDuSa v. 1.3.

MeDuSa: a multi-draft based scaffolder / E. Bosi; B. Donati; M. Galardini; S. Brunetti; P. Liò; P. Crescenzi; M.F. Sagot; R. Fani; M. Fondi. - In: BIOINFORMATICS. - ISSN 1367-4803. - STAMPA. - (2015), pp. 2443-2451. [10.1093/bioinformatics/btv171]

MeDuSa: a multi-draft based scaffolder

B. Donati;P. Crescenzi;R. Fani;M. Fondi
2015

Abstract

Motivation: Completing the genome sequence of an organism is an important task in comparative, functional and structural genomics. However, this remains a challenging issue from both a computational and an experimental viewpoint. Genome scaffolding (i.e. the process of ordering and orientating contigs) of de novo assemblies usually represents the first step in most genome finishing pipelines. Results: In this paper, we present MeDuSa (Multi-Draft based Scaffolder), an algorithm for genome scaffolding. MeDuSa exploits information obtained from a set of (draft or closed) genomes from related organisms to determine the correct order and orientation of the contigs. MeDuSa formalises the scaffolding problem by means of a combinatorial optimisation formulation on graphs and implements an efficient constant factor approximation algorithm to solve it. In contrast to currently used scaffolders, it does not require either prior knowledge on the microrganisms dataset under analysis (e.g. their phylogenetic relationships) or the availability of paired end read libraries. This makes usability and running time two additional important features of our method. Moreover, benchmarks and tests on real bacterial datasets showed that MeDuSa is highly accurate and, in most cases, outperforms traditional scaffolders. The possibility to use MeDuSa on eukaryotic datasets has also been evaluated, leading to interesting results. Availability: MeDuSa web server: http://combo.dbe.unifi.it/medusa. A stand-alone version of the software can be downloaded from https://github.com/combogenomics/medusa/releases. All results presented in this work have been obtained with MeDuSa v. 1.3.
2015
2443
2451
E. Bosi; B. Donati; M. Galardini; S. Brunetti; P. Liò; P. Crescenzi; M.F. Sagot; R. Fani; M. Fondi
File in questo prodotto:
File Dimensione Formato  
Bosi et al - 2015 Bioinformatics.pdf

Accesso chiuso

Tipologia: Pdf editoriale (Version of record)
Licenza: Tutti i diritti riservati
Dimensione 608.98 kB
Formato Adobe PDF
608.98 kB Adobe PDF   Richiedi una copia
Bosi et al - Manuscript.pdf

Open Access dal 26/03/2016

Tipologia: Versione finale referata (Postprint, Accepted manuscript)
Licenza: Tutti i diritti riservati
Dimensione 554.95 kB
Formato Adobe PDF
554.95 kB Adobe PDF

I documenti in FLORE sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificatore per citare o creare un link a questa risorsa: https://hdl.handle.net/2158/987633
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 301
  • ???jsp.display-item.citation.isi??? 297
social impact