BOL: Related items

Structural variants PPT

Jit — Wed, 07 Sep 2016 03:16:09 -0500

1000 Genomes data tutorial at ASHG

Structural variants presentation by

Jan Korbel

European Molecular Biology Laboratory (EMBL) Heidelberg Genome Biology Research Unit

Reference:

https://www.genome.gov/pages/research/der/1000genomesprojecttutorials/structuralvariants-jankorbel.pdf

FERMI

Jit — Fri, 09 Sep 2016 05:37:13 -0500

Fermi is a de novo assembler with a particular focus on assembling Illumina short sequence reads from a mammal-sized genome. In addition to the role of a typical assembler, fermi also aims to preserve heterozygotes which are often collapsed by other assemblers. Its ultimate goal is to find a minimal set of
unitigs to represent all the information in raw reads.

Fermi follows the overlap-layout-consensus paradigm and uses the FM-DNA-index (FMD-index) as the key data structure. It is inspired by the string graph assembler (Simpson and Durbin, 2010 and 2012) and has a similar workflow.

As a typical de novo assembler, fermi tends to produce contigs with slightly longer N50. However, the major weakness of fermi is the high misassembly rate. Although fermi provides a tool to fix misassemblies by using paired-end reads to achieve an accuracy comparable to other assemblers, this is not a favorable solution.

Fermi is designed to be used on a multi-core Linux machine with large shared memory. The easiest way to run fermi is to use the run-fermi.pl script. It generates a Makefile. The actual assembly is done by invoking make. Premature assembly processes can be resumed. Here is an example:

run-fermi.pl -dAPe ./fermi -p NA12878 -t16 -f18 reads*.fq.gz > NA12878.mak
make -f NA12878.mak -j16

Address of the bookmark: https://github.com/lh3/fermi

GenomeScope: open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate, and repeat content from unprocessed short reads

Jit — Fri, 21 Oct 2016 05:46:43 -0500

Summary: GenomeScope is an open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate, and repeat content from unprocessed short reads. These features are essential for studying genome evolution, and help to choose parameters for downstream analysis. We demonstrate its accuracy on 324 simulated and 16 real datasets with a wide range in genome sizes, heterozygosity levels, and error rates. Availability and Implementation: http://qb.cshl.edu/genomescope/, https://github.com/schatzlab/genomescope.git

Address of the bookmark: http://qb.cshl.edu/genomescope/

RECORD

Bulbul — Fri, 25 Nov 2016 08:23:36 -0600

Background. Next-generation sequencing technologies are now producing multiple times the genome size in total reads from a single experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate new, modified reference sequence that is closer to the actual sequenced genome and has a full coverage. In this paper, we describe our approach and test its implementation called RECORD. We evaluate RECORD on both simulated and real data. We made our software publicly available on sourceforge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.

More at https://sourceforge.net/projects/record-genome-assembler/files/

Address of the bookmark: https://www.ncbi.nlm.nih.gov/pubmed/26558255

ScaffMatch

Jit — Tue, 13 Dec 2016 10:23:56 -0600

caffMatch is a novel scaffolding tool based on Maximum-Weight Matching able to produce high-quality scaffolds from NGS data (reads and contigs). The tool is written in Python 2.7. It also includes a bash script wrapper that calls aligner in case one needs to first map reads to contigs (instead of providing .sam files).

The arguments accepted by ScaffMatch are:

-w) Working directory -- this is the directory where ScaffMatch files are stored. These are .sam files produced after mapping reads to contigs and the resulting scaffolds file `scaffolds.fa` fasta file;

-c) Contig fasta file;

-m) Command line argument with no options. It is used when .sam files are used instead of reads .fastq files. Do not use this option if you provide reads files;

-1) (Comma separated list of) either .fastq or .sam file(s) corresponding to the first read of the read pair;

-2) (Comma separated list of) either .fastq or .sam file(s) corresponding to the second read of the read pair;

-i) (Comma separated list of) insert size(s) of the library(-ies);

-s) (Comma separated list of) library(-ies) standard deviation(s) of insert size(s);

-t) Bundle threshold. Pairs of contigs supported by number of read pairs less than the value of this argument are discarded. Optional argument, by default it is equal to 5;

-g) Matching heuristics: use `max_weight` for Maximum Weight Matching heuristics with the Insertion step, use `backbone` for Maximum Weight Matching heuristics without the Insertion step, use `greedy` for Greedy Matching heuristics;

-l) Log file - where to store the logs. Optional argument. By default, stdout is used.

Address of the bookmark: http://alan.cs.gsu.edu/NGS/?q=content/scaffmatch

pyScaf

Bulbul — Mon, 19 Dec 2016 14:20:33 -0600

pyScaf orders contigs from genome assemblies utilising several types of information:

paired-end (PE) and/or mate-pair libraries (NGS-based mode)
long reads (NGS-based mode)
synteny to the genome of some related species (reference-based mode)

Scaffolding

In reference-based mode, pyScaf uses synteny to the genome of closely related species in order to order contigs and estimate distances between adjacent contigs.

Contigs are aligned globally (end-to-end) onto reference chromosomes, ignoring:

matches not satisfying cut-offs (--identity and --overlap)
suboptimal matches (only best match of each query to reference is kept)
and removing overlapping matches on reference.

In preliminary tests, pyScaf performed superbly on simulated heterozygous genomes based on C. parapsilosis (13 Mb; CANPA) and A. thaliana (119 Mb; ARATH) chromosomes, reconstructing correctly all chromosomes always for CANPA and nearly always for ARATH (Figures in dropbox, CANPA table, ARATH table).
Runs took ~0.5 min for CANPA on 4 CPUs and ~2 min for ARATH on 16 CPUs.

Important remarks:

Reduce your assembly before (fasta2homozygous.py) as any redundancy will likely break the synteny.
pyScaf works better with contigs than scaffolds, as scaffolds are often affected by mis-assemblies (no de novo assembler / scaffolder is perfect...), which breaks synteny.
pyScaf works very well if divergence between reference genome and assembled contigs is below 20% at nucleotide level.
pyScaf deals with large rearrangements ie. deletions, insertion, inversions, translocations. Note however, this is experimental implementation!
Consider closing gaps after scaffolding.

Address of the bookmark: https://github.com/lpryszcz/pyScaf

MCscan

Bulbul — Thu, 22 Dec 2016 03:53:58 -0600

MCscan is a computer program that can simultaneously scan multiple genomes to identify homologous chromosomal regions and subsequently align these regions using genes as anchors. This is the toolset for generating the synteny correspondences in Plant Genome Duplication Database. It is intended as an easy-to-use and quick way to identify conserved gene arrays both within the same genome and across different genomes.

More at http://chibba.agtec.uga.edu/duplication/mcscan/

Address of the bookmark: http://chibba.agtec.uga.edu/duplication/mcscan/

GenomeComp

Jit — Fri, 17 Feb 2017 08:38:32 -0600

GenomeComp is a tool for summarizing, parsing and visualizing the genome wide sequence comparison results derived from voluminous BLAST textual output, so as to locate the rearrangements, insertions or deletions of genome segments between species or strains.

It can be easily used to compare, parsing and visualize large genomic sequences, especially closely related genomes such as inter-species or inter-strains. In addition, it can also show other sequence features like repeat sequence distributions in one whole-genome DNA sequence by comparing the genome to itself.

It is a stand-alone graphical user interface (GUI) program which runs on Linux, Unix, Mac OS X (tested on version 10.2.4 only) and Microsoft Windows platforms and is written in Perl/Tk.

Address of the bookmark: http://www.mgc.ac.cn/GenomeComp/

Krona

Jit — Wed, 22 Mar 2017 04:47:35 -0500

Krona allows hierarchical data to be explored with zooming, multi-layered pie charts. Krona charts can be created using an Excel template or KronaTools, which includes support for several bioinformatics tools and raw data formats. The interactive charts are self-contained and can be viewed with any modern web browser (see Browser support).

Address of the bookmark: https://github.com/marbl/Krona/wiki

Researcher in computer science/biology

Mon, 15 Jul 2013 18:38:40 -0500

Researcher in Computer Science at the Computational Biology Unit - temporary employment

The Department of Informatics is a vacant position as a researcher in computer science, related to Computational Biology Unit (CBU), for 3 years.

The position is part of CBU Service Group and will focus on bioinformatic analysis project and especially the analysis of high-throughput data, including NGS (sequencing), and proteomics data.

The successful candidate will be part of the Norwegian bioinformatics platform's national helpdesk within the project ELIXIR.NO

Applicants must hold a PhD in a relevant subject such as computer science, mathematics, molecular biology and also possess expertise and experience in bioinformatics statistics and analysis of data from high-throughput molecular experiment.

Basic programming or scripting skills are required. Experience in Python, R, Perl, Linux-based operating systems and moreover knowledge of databases and web programming will be a strength for applicants.

We expect enthusiasm and independence and moreover the ability to work in an interdisciplinary team environment.

Good knowledge of English is required.

Salaries start at level 57 (code 1109/LR 24.1) by appointment. Further promotion occurs after
service seniority in the position (at grade 57-65). Of particularly highly qualified applicants may be considered a higher salary.

Further information about the position is available from the chair of the CBU,
Professor Inge Jonassen, e-mail: Inge.Jonassen @ ii.uib.no

The successful applicant must comply with the guidelines that apply at any given time the position.

State employment shall as far as possible reflect the diversity of the population. It is therefore an objective to achieve a balanced age and sex composition and the recruitment of persons with immigrant backgrounds. Persons with immigrant background are requested to apply for the position.

Women are particularly encouraged to apply. If the experts find that several applicants have approximately equivalent qualifications, the rules on equal in the Personnel Regulations for Academic Positions will be applied.

University of Bergen applies the principles of public openness when recruiting staff to scientific positions.

Information about the applicant may be made public even though the applicant has requested not to be named in the list of applicants. If the request does not host admitted to the result, the applicant shall be notified of this.

Send application, CV, certificates, diplomas, undergraduate work and a list of publications (list of publications) online by clicking on https://www.jobbnorge.no/jobbsoknet/login.aspx?returnurl=/jobbsoknet/jobapplication.aspx?jobid=95196

You need to upload certified translations into English or a Scandinavian language of appendices, such as diplomas and transcripts.

Applications sent by email to individuals at the institute will not be considered.

Deadline: 9 August 2013