BOL: Related items

PRISM

Jit — Sat, 10 Dec 2016 15:19:40 -0600

PRISM is software for mapping split reads (reads that span a structural variant, SV) and for calling SVs from the mapping result. PRISM can detect small insertions and arbitrary-size deletions, inversions and tandem duplications, using the direction of discordant read pairs. PRISM_CTX is a tool for detecting inter-chromosomal translocation events.

PRISM and PRISM_CTX were originally designed and written by Michael Brudno and Yue Jiang. The original PRISM publication can be found here.

The authors may be contacted via e-mail at: prism at cs.toronto.edu.

Additional information is available in the PRISM README file and PRISM_CTX README file.

Address of the bookmark: http://compbio.cs.toronto.edu/prism/

ScaffMatch

Jit — Tue, 13 Dec 2016 10:23:56 -0600

ScaffMatch is a novel scaffolding tool based on Maximum-Weight Matching that produces high-quality scaffolds from NGS data (reads and contigs). The tool is written in Python 2.7. It also includes a bash wrapper script that calls an aligner in case one first needs to map reads to contigs (instead of providing .sam files).

The arguments accepted by ScaffMatch are:

-w) Working directory -- the directory where ScaffMatch files are stored: the .sam files produced by mapping reads to contigs and the resulting scaffolds FASTA file `scaffolds.fa`;

-c) Contig fasta file;

-m) Flag that takes no value. Use it when .sam files are supplied instead of .fastq read files. Do not use this option if you provide read files;

-1) (Comma separated list of) either .fastq or .sam file(s) corresponding to the first read of the read pair;

-2) (Comma separated list of) either .fastq or .sam file(s) corresponding to the second read of the read pair;

-i) (Comma separated list of) insert size(s) of the library(-ies);

-s) (Comma separated list of) standard deviation(s) of the insert size(s) of the library(-ies);

-t) Bundle threshold. Pairs of contigs supported by fewer read pairs than the value of this argument are discarded. Optional argument; the default is 5;

-g) Matching heuristics: use `max_weight` for Maximum Weight Matching heuristics with the Insertion step, use `backbone` for Maximum Weight Matching heuristics without the Insertion step, use `greedy` for Greedy Matching heuristics;

-l) Log file - where to store the logs. Optional argument. By default, stdout is used.
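The effect of the -t threshold and of the `greedy` heuristic can be illustrated with a minimal Python sketch. This is not ScaffMatch's own code: it assumes the read-pair links have already been reduced to (contig, contig) tuples, it treats each contig as a single node, and the function names `bundle_links` and `greedy_matching` are hypothetical.

```python
from collections import Counter

def bundle_links(links, threshold=5):
    """Count read pairs per contig pair and drop pairs below the threshold."""
    counts = Counter(tuple(sorted(pair)) for pair in links)
    return {pair: n for pair, n in counts.items() if n >= threshold}

def greedy_matching(bundles):
    """Pick contig pairs by descending support, using each contig at most once."""
    used, matching = set(), []
    ranked = sorted(bundles.items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), weight in ranked:
        if a not in used and b not in used:
            matching.append((a, b, weight))
            used.update((a, b))
    return matching

# Toy input: each tuple is one read pair linking two contigs (made-up data).
links = ([("c1", "c2")] * 7 + [("c2", "c3")] * 6 +
         [("c3", "c4")] * 5 + [("c1", "c4")] * 2)
bundles = bundle_links(links)      # the c1-c4 pair (2 read pairs) is discarded
print(greedy_matching(bundles))    # [('c1', 'c2', 7), ('c3', 'c4', 5)]
```

A greedy choice like this is fast but can miss the maximum-weight solution, which is presumably why the `max_weight` and `backbone` heuristics are offered as alternatives.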

Address of the bookmark: http://alan.cs.gsu.edu/NGS/?q=content/scaffmatch

pyScaf

Bulbul — Mon, 19 Dec 2016 14:20:33 -0600

pyScaf orders contigs from genome assemblies utilising several types of information:

paired-end (PE) and/or mate-pair libraries (NGS-based mode)
long reads (NGS-based mode)
synteny to the genome of some related species (reference-based mode)

Scaffolding

In reference-based mode, pyScaf uses synteny to the genome of a closely related species to order contigs and estimate distances between adjacent contigs.

Contigs are aligned globally (end-to-end) onto reference chromosomes, ignoring:

matches not satisfying cut-offs (--identity and --overlap)
suboptimal matches (only best match of each query to reference is kept)
and removing overlapping matches on reference.
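The three filters listed above can be sketched in a few lines of Python. This is an illustration only, not pyScaf's implementation: the `Match` record, the `filter_matches` name, the default cut-off values and the rule of keeping the higher-scoring of two overlapping matches are all assumptions made for the example.

```python
from typing import NamedTuple

class Match(NamedTuple):
    query: str        # contig name
    ref: str          # reference chromosome
    ref_start: int    # start on the reference (0-based)
    ref_end: int      # end on the reference (exclusive)
    identity: float   # fraction of identical aligned bases
    overlap: float    # fraction of the contig covered by the match
    score: float      # alignment score used for ranking

def filter_matches(matches, identity=0.5, overlap=0.5):
    # 1. Drop matches that fail the identity or overlap cut-off.
    passed = [m for m in matches
              if m.identity >= identity and m.overlap >= overlap]
    # 2. Keep only the best-scoring match of each contig.
    best = {}
    for m in passed:
        if m.query not in best or m.score > best[m.query].score:
            best[m.query] = m
    # 3. Walk matches from best to worst and skip any that overlap
    #    an already accepted match on the same reference chromosome.
    kept = []
    for m in sorted(best.values(), key=lambda m: -m.score):
        clash = any(k.ref == m.ref and m.ref_start < k.ref_end
                    and k.ref_start < m.ref_end for k in kept)
        if not clash:
            kept.append(m)
    # Report contigs in reference order, i.e. the proposed scaffold order.
    return sorted(kept, key=lambda m: (m.ref, m.ref_start))

matches = [
    Match("ctgA", "chr1", 0, 1000, 0.95, 0.9, 900),
    Match("ctgA", "chr2", 50, 400, 0.80, 0.6, 300),     # suboptimal for ctgA
    Match("ctgB", "chr1", 800, 1500, 0.90, 0.8, 600),   # overlaps ctgA on chr1
    Match("ctgC", "chr1", 2000, 2600, 0.40, 0.9, 500),  # fails identity cut-off
    Match("ctgD", "chr1", 1500, 1900, 0.85, 0.7, 350),
]
print([m.query for m in filter_matches(matches)])  # ['ctgA', 'ctgD']
```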

In preliminary tests, pyScaf performed superbly on simulated heterozygous genomes based on C. parapsilosis (13 Mb; CANPA) and A. thaliana (119 Mb; ARATH) chromosomes, always reconstructing all chromosomes correctly for CANPA and nearly always for ARATH (Figures in dropbox, CANPA table, ARATH table).
Runs took ~0.5 min for CANPA on 4 CPUs and ~2 min for ARATH on 16 CPUs.

Important remarks:

Reduce your assembly beforehand (fasta2homozygous.py), as any redundancy will likely break the synteny.
pyScaf works better with contigs than scaffolds, as scaffolds are often affected by mis-assemblies (no de novo assembler / scaffolder is perfect...), which breaks synteny.
pyScaf works very well if divergence between reference genome and assembled contigs is below 20% at nucleotide level.
pyScaf deals with large rearrangements, i.e. deletions, insertions, inversions and translocations. Note, however, that this is an experimental implementation!
Consider closing gaps after scaffolding.

Address of the bookmark: https://github.com/lpryszcz/pyScaf

MCscan

Bulbul — Thu, 22 Dec 2016 03:53:58 -0600

MCscan is a computer program that can simultaneously scan multiple genomes to identify homologous chromosomal regions and subsequently align these regions using genes as anchors. This is the toolset used to generate the synteny correspondences in the Plant Genome Duplication Database. It is intended as an easy-to-use and quick way to identify conserved gene arrays both within the same genome and across different genomes.

More at http://chibba.agtec.uga.edu/duplication/mcscan/

Address of the bookmark: http://chibba.agtec.uga.edu/duplication/mcscan/

YAHA

Jit — Fri, 20 Jan 2017 05:38:05 -0600

YAHA is a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query and is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Unlike other aligners that report all, or one, alignment per query, or that use simple heuristics to select alignments, YAHA uses a directed acyclic graph to find the optimal set of alignments that cover a query using a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. We show that YAHA detects more breakpoints in less time than BWA-SW across all SV classes, and especially excels at complex SVs comprising multiple breakpoints.
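The idea of choosing a set of alignments over a directed acyclic graph can be illustrated with a generic best-path computation. The sketch below is not YAHA's algorithm: it assumes each alignment is a (query_start, query_end, score) tuple with positive length, it only links alignments that do not overlap on the query, and it charges one flat, made-up `breakpoint_penalty` per junction. The function name is hypothetical.

```python
def best_alignment_chain(alignments, breakpoint_penalty=5):
    """Return (total score, chain) for the best-scoring chain of alignments.

    alignments: list of (query_start, query_end, score), end exclusive.
    An edge i -> j exists when alignment j starts at or after the end of i.
    """
    # Sorting by (start, end) gives a topological order of this DAG.
    order = sorted(range(len(alignments)), key=lambda i: alignments[i][:2])
    best = {}  # index -> (best chain score ending here, previous index)
    for pos, j in enumerate(order):
        start_j, _, score_j = alignments[j]
        best[j] = (score_j, None)
        for i in order[:pos]:
            if alignments[i][1] <= start_j:
                candidate = best[i][0] + score_j - breakpoint_penalty
                if candidate > best[j][0]:
                    best[j] = (candidate, i)
    if not best:
        return 0, []
    # Trace back from the highest-scoring chain end.
    node = max(best, key=lambda j: best[j][0])
    total, chain = best[node][0], []
    while node is not None:
        chain.append(alignments[node])
        node = best[node][1]
    return total, chain[::-1]

# Toy query of length 100 with four candidate alignments (made-up scores).
alignments = [(0, 50, 50), (50, 100, 45), (40, 100, 52), (0, 100, 60)]
print(best_alignment_chain(alignments))
# (90, [(0, 50, 50), (50, 100, 45)])
print(best_alignment_chain(alignments, breakpoint_penalty=40))
# (60, [(0, 100, 60)])
```

With a small penalty the split explanation (two alignments, one breakpoint) wins; with a large penalty the single lower-quality alignment is preferred. That is the trade-off a breakpoint penalty controls.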

Availability: YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA.

Contact:

http://genome.wustl.edu/people/groups/detail/hall-lab/

Address of the bookmark: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3463118/

FSA: Fast Statistical Alignment

Jit — Mon, 06 Feb 2017 04:26:01 -0600

FSA is a probabilistic multiple sequence alignment algorithm which uses a "distance-based" approach to aligning homologous protein, RNA or DNA sequences. Much as distance-based phylogenetic reconstruction methods like Neighbor-Joining build a phylogeny using only pairwise divergence estimates, FSA builds a multiple alignment using only pairwise estimations of homology. This is made possible by the sequence annealing technique for constructing a multiple alignment from pairwise comparisons, developed by Ariel Schwartz in "Posterior Decoding Methods for Optimization and Control of Multiple Alignments."

FSA brings the high accuracies previously available only for small-scale analyses of proteins or RNAs to large-scale problems such as aligning thousands of sequences or megabase-long sequences. FSA introduces several novel methods for constructing better alignments:

FSA uses machine-learning techniques to estimate gap and substitution parameters on the fly for each set of input sequences. This "query-specific learning" alignment method makes FSA very robust: it can produce superior alignments of sets of homologous sequences which are subject to very different evolutionary constraints.
FSA is capable of aligning hundreds or even thousands of sequences using a randomized inference algorithm to reduce the computational cost of multiple alignment. This randomized inference can be over ten times faster than a direct approach with little loss of accuracy.
FSA can quickly align very long sequences using the "anchor annealing" technique for resolving anchors and projecting them with transitive anchoring. It then stitches together the alignment between the anchors using the methods described above.
The included GUI, MAD (Multiple Alignment Display), can display the intermediate alignments produced by FSA, where each character is colored according to the probability that it is correctly aligned (see the picture and movie at the top of the page).

You can see more information on the FAQ.

Address of the bookmark: http://fsa.sourceforge.net/

GenomeComp

Jit — Fri, 17 Feb 2017 08:38:32 -0600

GenomeComp is a tool for summarizing, parsing and visualizing the genome wide sequence comparison results derived from voluminous BLAST textual output, so as to locate the rearrangements, insertions or deletions of genome segments between species or strains.

It can easily be used to compare, parse and visualize large genomic sequences, especially those of closely related genomes such as different species or strains. In addition, it can show other sequence features, such as repeat sequence distributions in a whole-genome DNA sequence, by comparing the genome to itself.

It is a stand-alone graphical user interface (GUI) program which runs on Linux, Unix, Mac OS X (tested on version 10.2.4 only) and Microsoft Windows platforms and is written in Perl/Tk.

Address of the bookmark: http://www.mgc.ac.cn/GenomeComp/

Krona

Jit — Wed, 22 Mar 2017 04:47:35 -0500

Krona allows hierarchical data to be explored with zooming, multi-layered pie charts. Krona charts can be created using an Excel template or KronaTools, which includes support for several bioinformatics tools and raw data formats. The interactive charts are self-contained and can be viewed with any modern web browser (see Browser support).

Address of the bookmark: https://github.com/marbl/Krona/wiki

Researcher in computer science/biology

Mon, 15 Jul 2013 18:38:40 -0500

Researcher in Computer Science at the Computational Biology Unit - temporary employment

The Department of Informatics has a vacant position as a researcher in computer science, affiliated with the Computational Biology Unit (CBU), for 3 years.

The position is part of the CBU Service Group and will focus on bioinformatic analysis projects, especially the analysis of high-throughput data, including NGS (sequencing) and proteomics data.

The successful candidate will be part of the Norwegian bioinformatics platform's national helpdesk within the project ELIXIR.NO

Applicants must hold a PhD in a relevant subject such as computer science, mathematics or molecular biology, and must also possess expertise and experience in bioinformatics, statistics and the analysis of data from high-throughput molecular experiments.

Basic programming or scripting skills are required. Experience with Python, R, Perl and Linux-based operating systems, as well as knowledge of databases and web programming, will be an advantage.

We expect enthusiasm and independence, as well as the ability to work in an interdisciplinary team environment.

Good knowledge of English is required.

The salary starts at pay grade 57 (code 1109/LR 24.1) on appointment. Further promotion follows service seniority in the position (pay grades 57-65). A higher salary may be considered for particularly highly qualified applicants.

Further information about the position is available from the chair of the CBU, Professor Inge Jonassen, e-mail: Inge.Jonassen @ ii.uib.no

The successful applicant must comply with the guidelines that apply to the position at any given time.

State employment shall as far as possible reflect the diversity of the population. It is therefore an objective to achieve a balanced age and sex composition and to recruit persons with immigrant backgrounds. Persons with an immigrant background are encouraged to apply for the position.

Women are particularly encouraged to apply. If the experts find that several applicants have approximately equivalent qualifications, the rules on equality in the Personnel Regulations for Academic Positions will be applied.

University of Bergen applies the principles of public openness when recruiting staff to scientific positions.

Information about the applicant may be made public even though the applicant has requested not to be named in the list of applicants. If the request is not granted, the applicant shall be notified of this.

Send your application, CV, certificates, diplomas, undergraduate work and a list of publications online by clicking on https://www.jobbnorge.no/jobbsoknet/login.aspx?returnurl=/jobbsoknet/jobapplication.aspx?jobid=95196

You need to upload certified translations into English or a Scandinavian language of appendices, such as diplomas and transcripts.

Applications sent by email to individuals at the institute will not be considered.

Deadline: 9 August 2013

Fastq format

Jit — Wed, 03 May 2017 04:23:32 -0500

FASTQ format is a text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores. The sequence letter and the quality score are each encoded with a single ASCII character for brevity.

It was originally developed at the Wellcome Trust Sanger Institute to bundle a FASTA sequence and its quality data, but has recently become the de facto standard for storing the output of high-throughput sequencing instruments such as the Illumina Genome Analyzer.
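As an illustration of the one-character-per-score encoding, here is a minimal Python reader. It is a sketch under two assumptions: every record occupies exactly four lines (no wrapped sequences), and qualities use the Sanger convention of ASCII code minus 33 (some older Illumina files use an offset of 64 instead). The function name `read_fastq` is hypothetical.

```python
import io

def read_fastq(handle, offset=33):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:               # end of input
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()            # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: %r" % header)
        # Each quality character encodes one integer score.
        yield header[1:], seq, [ord(c) - offset for c in qual]

example = io.StringIO("@read1\nGATTACA\n+\nII5!#II\n")
for name, seq, quals in read_fastq(example):
    print(name, seq, quals)
# read1 GATTACA [40, 40, 20, 0, 2, 40, 40]
```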

Address of the bookmark: https://en.wikipedia.org/wiki/FASTQ_format