BOL: Related items

HALC: High throughput algorithm for long read error correction

Jit — Fri, 08 Jun 2018 10:47:41 -0500

HALC is a high-throughput algorithm for long-read error correction. HALC aligns the long reads to short-read contigs from the same species with a relatively low identity requirement, so that a long-read region can be aligned to at least one contig region, including its true genome region’s repeats in the contigs that are sufficiently similar to it (a similar-repeat-based alignment approach). HALC was able to obtain 6.7-41.1% higher throughput than the existing algorithms while maintaining comparable accuracy. The HALC-corrected long reads can thus yield 11.4-60.7% longer assembled contigs than reads corrected by the existing algorithms.
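
To make the "relatively low identity requirement" concrete, here is a toy Python sketch. It assumes the alignments were already computed elsewhere and arrive as hypothetical (region, contig, identity) records; the 0.7 cutoff and the record layout are illustrative assumptions, not HALC's actual parameters or data structures. Unlike a best-hit filter, it keeps every contig hit above the cutoff, so a read region stays linked to similar repeat copies as well as to its true locus.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical alignment record: a long-read region aligned to a contig."""
    region: str      # long-read region, e.g. "r1:0-500"
    contig: str      # short-read contig identifier
    identity: float  # fraction of matching bases in the alignment (0..1)


def group_similar_hits(hits, min_identity=0.7):
    """Keep ALL hits at or above a low identity cutoff, grouped by read region.

    A strict best-hit filter would keep one contig per region; keeping every
    sufficiently similar contig retains repeat copies as candidate regions.
    """
    groups = defaultdict(list)
    for hit in hits:
        if hit.identity >= min_identity:
            groups[hit.region].append(hit.contig)
    return dict(groups)


hits = [
    Hit("r1:0-500", "ctgA", 0.92),
    Hit("r1:0-500", "ctgB", 0.78),   # similar repeat copy: kept
    Hit("r1:0-500", "ctgC", 0.55),   # too dissimilar: dropped
    Hit("r1:500-900", "ctgD", 0.81),
]
print(group_similar_hits(hits))
# → {'r1:0-500': ['ctgA', 'ctgB'], 'r1:500-900': ['ctgD']}
```

Raising `min_identity` towards 1.0 turns this back into a near-best-hit filter and loses the repeat copies.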

Address of the bookmark: https://github.com/lanl001/halc

NanoPack: visualizing and processing long-read sequencing data

Jit — Fri, 10 Aug 2018 18:41:34 -0500

The NanoPack tools are written in Python 3 and released under the GNU GPL 3.0 license. The source code is available at https://github.com/wdecoster/nanopack, together with links to the separate scripts and their documentation. The scripts run on Linux, macOS and the Windows Subsystem for Linux on Windows 10, and are available as a graphical user interface, as a web service at http://nanoplot.bioinf.be and as command-line tools.

https://academic.oup.com/bioinformatics/article/34/15/2666/4934939

Address of the bookmark: https://github.com/wdecoster/nanoQC

Nieduszynski Group

Fri, 26 Sep 2014 19:35:06 -0500

Complete, accurate replication of the genome is essential for life. All chromosomes in eukaryotic cells must be duplicated and then segregated to daughter cells to ensure genetic integrity and produce the large number of cells that make up a multicellular organism. We are using genetic, genomic and computational methods to understand how chromosome replication is regulated to ensure genome stability. By focusing on the basic biology that underpins cell growth and division we aim to provide new insights that may help our understanding of diseases such as cancer and congenital disorders.

More: http://www.nieduszynski.org/index.php
http://www.path.ox.ac.uk/research/cell-biology-and-pathology/conrad-nieduszynski-group

shovill: Assemble bacterial isolate genomes from Illumina paired-end reads

BioStar — Sat, 02 Jan 2021 07:05:36 -0600

Shovill is a pipeline that uses SPAdes at its core but alters the steps before and after the primary assembly step to get similar results in less time. Shovill also supports other assemblers, such as SKESA, Velvet and Megahit, so you can take advantage of the pre- and post-processing that Shovill provides with those too.

Address of the bookmark: https://github.com/tseemann/shovill

COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly

Jit — Wed, 06 Dec 2017 02:08:14 -0600

COPE (Connecting Overlapped Pair-End reads) is an efficient tool that connects overlapping pair-end reads using k-mer frequencies. We evaluated our tool on 30× simulated pair-end reads from Arabidopsis thaliana with a 1% base error rate. COPE connected over 99% of reads with 98.8% accuracy, which is, respectively, 10 and 2% higher than the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs have fewer errors and show a 14-fold improvement in N50 compared with contigs produced from unconnected reads.
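
What "connecting" an overlapping pair means can be sketched with a plain mismatch-tolerant overlap search in Python. This is NOT COPE's k-mer-frequency method; it is a simpler overlap merge, and the parameter names and defaults (`min_overlap`, `max_mismatch_rate`) are hypothetical. Read 2 comes from the opposite strand, so it is reverse-complemented first, then the longest suffix of read 1 that matches a prefix of that reverse complement is used to join the two reads into one fragment.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_rate: float = 0.1):
    """Join a read pair whose ends overlap; return None if no overlap qualifies."""
    rc2 = revcomp(r2)
    max_len = min(len(r1), len(rc2))
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for olen in range(max_len, min_overlap - 1, -1):
        tail, head = r1[-olen:], rc2[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * olen:
            # Keep r1's bases in the overlap (a real tool would weigh qualities).
            return r1 + rc2[olen:]
    return None


# Toy 24 bp fragment read from both ends with 16 bp reads (8 bp overlap).
r1 = "ACGTACGGATCCTTAG"
r2 = "ATCGATGCCTAAGGAT"
print(merge_pair(r1, r2, min_overlap=8))  # → ACGTACGGATCCTTAGGCATCGAT
```

Pairs with no acceptable overlap return `None` and would stay as unconnected pairs.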

Address of the bookmark: ftp://ftp.genomics.org.cn/pub/cope

ALPACA: A hybrid strategy for assembly of genomic DNA shotgun sequencing reads.

Seema Singh — Mon, 30 Apr 2018 04:38:40 -0500

ALPACA requires Celera Assembler 8.3 or later. It is recommended to build Celera Assembler from source. (Why? The pre-built binaries CA_8.3rc1 and CA8.3rc2 will work for any large data set.

Details are in the paper at https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-3927-8

Address of the bookmark: https://github.com/VicugnaPacos/ALPACA

Frequently used command lines for mapping paired-end reads (PE 2x100)

Jit — Tue, 15 May 2018 08:59:29 -0500

bowtie2 -x hs37m -X 650 -q -1 r1.fq -2 r2.fq -S r12.bowtie2.sam

bwa aln hs37m.fa r1.fq > r1.sai && bwa aln hs37m.fa r2.fq > r2.sai \
&& bwa sampe hs37m.fa r1.sai r2.sai r1.fq r2.fq > r12.bwa.sam

bwa bwasw ../index/bwa/hs37m.fa r12.fq > r12.bwasw.sam

gsnap -A sam -d hs37m r1.fq r2.fq > r12.gsnap.sam

novoalign -r Random -o SAM -f r1.fq r2.fq -i 500 50 -d hs37m-k14s3.novo > r12.novo.sam

smalt map -f samsoft -i 650 -o r12.smalt-k20s13.sam hs37m-k20s13 r1.fq r2.fq

stampy.py -g hs37m -h hs37m -o r12.stampy.sam -M r1.fq,r2.fq

soap -D hs37m.fa.index -a r1.fq -b r2.fq -l 32 -g 3 -u dummy -2 dummy -o r12.soap

pbalign: maps PacBio reads to reference sequences and saves alignments to a BAM file

Jit — Thu, 24 May 2018 10:06:52 -0500

pbalign aligns PacBio reads to reference sequences, filters aligned reads according to user-specified filtering criteria, and converts the output to either the SAM format or the PacBio Compare HDF5 (e.g., .cmp.h5) format. The output Compare HDF5 file will be compatible with Quiver if the --forQuiver option is specified.
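
The "filters aligned reads according to user-specified filtering criteria" step can be pictured with a small Python sketch. The record type, the accuracy definition (matches divided by aligned length) and the default thresholds are hypothetical simplifications, not pbalign's data model or option names.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified alignment record (not pbalign's data model)."""
    read_id: str
    aligned_length: int  # read bases covered by the alignment
    matches: int         # matching bases within the alignment

    @property
    def accuracy(self) -> float:
        # Simplified definition: matches / aligned length.
        return self.matches / self.aligned_length if self.aligned_length else 0.0


def filter_alignments(alns, min_accuracy=0.75, min_length=50):
    """Keep alignments that pass both user-chosen thresholds."""
    return [a for a in alns
            if a.accuracy >= min_accuracy and a.aligned_length >= min_length]


alns = [
    Alignment("m1/1/0_900", 900, 780),  # accuracy ~0.87: kept
    Alignment("m1/2/0_40", 40, 39),     # too short: dropped
    Alignment("m1/3/0_600", 600, 400),  # accuracy ~0.67: dropped
]
print([a.read_id for a in filter_alignments(alns)])  # → ['m1/1/0_900']
```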

Address of the bookmark: https://github.com/PacificBiosciences/pbalign

Breakpointer: using local mapping artifacts to support sequence breakpoint discovery from single-end reads

Jit — Tue, 12 Jun 2018 12:41:10 -0500

Breakpointer is a fast tool for locating sequence breakpoints from the alignments of single-end (SE) reads produced by next-generation sequencing (NGS). It uses a heuristic method to search for local mapping signatures created by insertions/deletions (indels) or more complex structural variants (SVs).
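
As a rough illustration of a "local mapping signature", the Python sketch below counts soft-clip junctions in single-end alignments and reports positions where several reads are clipped at the same reference coordinate. This is a generic soft-clip clustering heuristic chosen for illustration, not Breakpointer's own algorithm; the input is assumed to be (POS, CIGAR) pairs as found in SAM records, and `min_support` is a hypothetical parameter.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_positions(pos: int, cigar: str):
    """Yield the reference base just right of each soft-clip junction.

    pos is the 1-based leftmost aligned position (SAM POS). Hard clips are
    ignored because their bases are absent from the record.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return
    if ops[0][1] == "S":                       # leading clip: junction at pos
        yield pos
    if len(ops) > 1 and ops[-1][1] == "S":     # trailing clip: after last base
        yield pos + sum(n for n, op in ops if op in REF_CONSUMING)


def candidate_breakpoints(alignments, min_support=3):
    """Return junction positions supported by at least min_support clipped reads."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(clip_positions(pos, cigar))
    return sorted(p for p, c in counts.items() if c >= min_support)


reads = [
    (1001, "50M50S"),  # trailing clip, junction at 1051
    (1011, "40M60S"),  # junction at 1051
    (1021, "30M70S"),  # junction at 1051
    (1051, "20S80M"),  # leading clip, junction at 1051
    (2000, "100M"),    # fully aligned: no signal
    (3001, "10S90M"),  # lone clip: below the support threshold
]
print(candidate_breakpoints(reads))  # → [1051]
```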

Address of the bookmark: https://github.com/ruping/Breakpointer

LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly

Rahul Nayak — Thu, 14 May 2020 15:09:52 -0500

LR_Gapcloser is a gap-closing tool that uses long reads from the studied species. The long reads can be downloaded from a public read archive (for instance, the NCBI SRA database) or come from your own data. The reads are fragmented and aligned to the scaffolds using the BWA-MEM algorithm from the BWA package. A compiled bwa is provided in the package, so users do not need to install bwa. LR_Gapcloser uses the alignments to find the bridging reads that cross each gap, and then fills the gaps with the original long-read sequence.
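
The fragmentation step can be sketched in Python as below. The fragment length (300 bp), the minimum length for the last piece and the `name_offset` naming scheme are hypothetical choices for illustration, not LR_Gapcloser's actual settings. Encoding each fragment's offset in its name lets the fragment alignments be placed back in order along the original long read afterwards.

```python
def fragment_read(name: str, seq: str, frag_len: int = 300, min_len: int = 100):
    """Split a long read into consecutive, non-overlapping fragments.

    Each fragment is named "<read>_<offset>"; a final piece shorter than
    min_len is dropped.
    """
    frags = []
    for start in range(0, len(seq), frag_len):
        piece = seq[start:start + frag_len]
        if len(piece) >= min_len:
            frags.append((f"{name}_{start}", piece))
    return frags


def to_fasta(records) -> str:
    """Render (name, sequence) pairs as FASTA text."""
    return "".join(f">{n}\n{s}\n" for n, s in records)


read = "ACGT" * 200  # 800 bp toy read
frags = fragment_read("lr1", read)
print([(n, len(s)) for n, s in frags])
# → [('lr1_0', 300), ('lr1_300', 300), ('lr1_600', 200)]
```

The FASTA text could then be written to a file (hypothetically `fragments.fa`) and aligned with a command such as `bwa mem scaffolds.fa fragments.fa > fragments.sam`.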

Address of the bookmark: https://github.com/CAFS-bioinformatics/LR_Gapcloser