BOL: Related items

BioScripts

Rahul Nayak — Sun, 28 Jun 2015 07:46:14 -0500

Please bookmark this collection of bioinformatics tools, scripts and code that can be pieced together easily and flexibly to perform both simple and complex bioinformatics tasks.

Next-generation sequencing includes whole-genome sequencing (WGS), transcriptome sequencing (whole cDNA sequencing, RNA-seq), digital gene expression sequencing (Tag-Seq), ChIP-Seq and so on. As is well known, many sequencing platforms generate these sequences: Sanger/ABI (the first generation), Solexa/Illumina, SOLiD/ABI and 454/Roche. However, their sequence formats differ, and each has its own error profile. High-quality data is essential for further analysis or data mining. There are many pipelines for raw sequence quality analysis and control, with steps for reporting read-quality statistics, trimming, filtering and error correction. Please bookmark them for the benefit of the bioinformatics community.
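As an illustration of the trimming and filtering steps such pipelines perform, here is a minimal Python sketch. It assumes Phred+33 quality encoding (used by current Illumina FASTQ; older Solexa/Illumina pipelines used a +64 offset), trims low-quality bases from the 3' end only, and then drops reads that became too short. The thresholds (min_q=20, min_len=30) and the function names are hypothetical choices for illustration, not the behaviour of any tool linked here.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from plain 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Remove bases from the 3' end while their Phred score is below min_q.
    Assumes Phred+33 encoding unless another offset is given."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def quality_filter(lines: Iterable[str], min_q: int = 20, min_len: int = 30) -> List[Record]:
    """Trim every read, then keep only reads that are still at least min_len long."""
    kept = []
    for header, seq, qual in read_fastq(lines):
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            kept.append((header, seq, qual))
    return kept


if __name__ == "__main__":
    # 'I' is Phred 40 and '#' is Phred 2 under the +33 offset.
    fastq = "@r1\nACGTACGTAC\n+\nIIIIIIII##\n@r2\nACGT\n+\n####\n".splitlines()
    print(quality_filter(fastq, min_len=5))  # [('@r1', 'ACGTACGT', 'IIIIIIII')]
```

Real QC tools usually add sliding-window trimming, adapter removal and paired-read handling; this sketch covers only a fixed 3' quality cutoff followed by a length filter.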

https://code.google.com/p/biowiki/

https://code.google.com/p/ngs-pipeline/source/browse/#svn%2Ftrunk

NGS and Perl scripts https://code.google.com/hosting/search?q=NGS+perl&projectsearch=Search+projects

NGS and Python scripts https://code.google.com/hosting/search?q=NGS+Python&projectsearch=Search+projects

Address of the bookmark: https://code.google.com/hosting/search?q=bioinformatics&sa=Search

Bioinformatics in Africa: Part 2 - Kenya

BioStar — Sat, 06 Feb 2021 13:23:54 -0600

International Livestock Research Institute (ILRI):

Under a NEPAD initiative, the Biosciences Eastern and Central Africa (BECA) (www.biosciencesafrica.org) was established at ILRI. BECA consists of a hub, regional nodes, and other affiliated laboratories and partner institutes. The state-of-the-art joint Bioinformatics Platform (www.becabioinfo.org) aims to provide a coherent and powerful bioinformatics infrastructure for use by all scientists in East and Central Africa. Meeting this goal requires both physical and intellectual developments that together give researchers access to diverse infrastructure over a wide-area network, thereby addressing four important aspects of bioinformatics:

1) Science: bioinformatics tools for data integration and visualization, standardization of data formats and data analysis strategies, and distribution of analysis tasks over local and wide-area networks are in development;

2) Bioinformatics Support Facility: provides assistance and custom programming to projects and those unable to establish a bioinformatics support function intrinsic to their project due to shortage of qualified personnel or lack of funding;

3) Hardware Platform: provides a powerful high-performance computing platform capable of handling the largest analysis needs of projects;

4) Bioinformatics Training for East and Central African scientists: while many Web-based tools are available to the wet-lab researcher, the Web is not well suited to tasks beyond single-sequence annotation. Researchers need to become productive in a server-based Unix environment with its wealth of scripting and automation tools. Even at entry level, this can be an intimidating task if proper guidance is not available.

International Centre of Insect Physiology and Ecology (ICIPE): ICIPE's research focuses on insect biology, with the aim of improving the wellbeing of the peoples of the tropics through insect science. There is a commitment to using contemporary science to limit the impact of disease vectors and agricultural pests, and understanding the mechanisms behind behaviour (e.g. attraction and repellency) is crucial. ICIPE seeks to enhance its bioinformatics capacity to support data from various EST projects designed to give insights into insect ecology and plant-pathogen interactions through studies of the metabolic pathways associated with the production of allelochemicals.

Long-term training activities:

Kenyatta University: An introductory course in Bioinformatics is offered to MSc Biotechnology students. It comprises 35 hours of lectures and practicals.

University of Nairobi: A centre for Biotechnology and Bioinformatics (CEBIB), which will offer postgraduate training (diplomas, MSc and PhD) in areas of biotechnology and bioinformatics has recently been launched. Other universities in Kenya, including Egerton, Maseno and the Jomo Kenyatta University of Agriculture and Technology offer introductory courses to undergraduates in biomedical sciences. In addition, under the BECA platform MSc and PhD fellowships are being made available for Bioinformatics students. ILRI is forging links with Universities in South Africa and the United Kingdom to provide access to courses and training material.

Research Interest and Activities:

The following are the present areas of research interest:
1. EST clustering
2. Genome sequencing and annotation
3. Functional genomics and proteomics (including key tropical pathogens)
4. Structural bioinformatics
5. Development of Bioinformatics Data Management Systems
6. Gene Mining
7. High Throughput Genotyping
8. Microarray data management and analysis
9. Metagenomics
10. Immunoinformatics
11. Host-pathogen interaction
12. High performance computing and grid development
13. Parasite transfection technologies
14. Cell cycle regulation
15. Population genetics
16. Vector genomics
17. Drug, vaccine and diagnostic target discovery

EWAS: epigenome-wide association study software 2.0

Jit — Wed, 21 Mar 2018 18:14:00 -0500

EWAS2.0 can analyze EWAS data and identify associations between epigenetic variation and disease/phenotype. Building on EWAS1.0, we have added several distinctive new features. The EWAS2.0 software was developed based on our “population epigenetic framework” and can perform: (1) epigenome-wide single-marker association study; (2) epigenome-wide methylation haplotype (meplotype) association study; and (3) epigenome-wide association meta-analysis.
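A single-marker association study tests each CpG site on its own. The Python sketch below is a deliberately simple stand-in, not EWAS2.0's method: it compares mean methylation (beta values) between cases and controls at each CpG using Welch's t statistic and ranks sites by |t|. The input layout, CpG names and function names are hypothetical; a real analysis would also adjust for covariates and correct for multiple testing.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Tuple


def welch_t(cases: List[float], controls: List[float]) -> float:
    """Welch's t statistic for a difference in mean methylation between two groups.
    Needs at least two samples per group and non-zero pooled variance."""
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se


def single_marker_scan(beta: Dict[str, List[float]], is_case: List[bool]) -> List[Tuple[str, float]]:
    """Test each CpG separately; return (cpg_id, t) sorted by |t|, largest first."""
    results = []
    for cpg, values in beta.items():
        cases = [v for v, c in zip(values, is_case) if c]
        controls = [v for v, c in zip(values, is_case) if not c]
        results.append((cpg, welch_t(cases, controls)))
    return sorted(results, key=lambda r: abs(r[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical beta values: 3 cases followed by 3 controls.
    beta = {
        "cg_hypothetical_1": [0.80, 0.82, 0.79, 0.40, 0.42, 0.41],
        "cg_hypothetical_2": [0.50, 0.52, 0.48, 0.51, 0.49, 0.50],
    }
    is_case = [True, True, True, False, False, False]
    for cpg, t in single_marker_scan(beta, is_case):
        print(f"{cpg}\tt={t:.2f}")
```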

Address of the bookmark: http://www.bioapp.org/ewas/

Flu Attack! How A Virus Invades Your Body

Thu, 22 Aug 2013 08:09:51 -0500

When you get the flu, viruses turn your cells into tiny factories that help spread the disease. In this animation, NPR's Robert Krulwich and medical animator David Bolinsky explain how a flu virus can trick a single cell into making a million more viruses. See and hear the rest of the story on NPR.org: http://www.npr.org/templates/story/story.php?storyId=114075029 Credit: Robert Krulwich, David Bolinsky, Jason Orfanon

Submit your SARS-CoV-2 sequence data to GenBank

Neel — Thu, 09 Apr 2020 18:28:25 -0500

Submit your SARS-CoV-2 sequence data to GenBank and SRA with our new submission landing page. Submission is simple and streamlined *and* there’s a rapid turnaround. https://submit.ncbi.nlm.nih.gov/sarscov2/

Quickly and easily add your SARS-CoV-2 sequence data to the growing public archive with new, special features and support from NCBI. The new SARS-CoV-2 sequence submission landing page will help you get started. GenBank submissions are accessioned and released in approximately 1-2 working days, and Sequence Read Archive (SRA) submissions are typically processed and released within hours. Submission is simple!

More information is available on NCBI Insights. https://ncbiinsights.ncbi.nlm.nih.gov/2020/04/09/sars-cov2-data-streamlined-submission-rapid-turnaround/

Sequencing Solutions to World Health

Rahul Agarwal — Thu, 29 Aug 2013 15:05:35 -0500

"New technology that quickly, easily and economically reveals the genomes of viruses and pathogens transforms public health and medicine."

Source: Life Technologies

Address of the bookmark: http://www.lifetechnologies.com/global/en/home/communities-social/blog/blogs/sequencing-solutions-to-world-health.html?cid=social_blogseries_20130829_11098264

Omega2: metagenome assembly pipeline

Jit — Mon, 10 Jul 2017 05:56:07 -0500

Omega found overlaps between reads using a prefix/suffix hash table. The overlap graph of reads was simplified by removing transitive edges and trimming short branches. Unitigs were generated based on minimum cost flow analysis of the overlap graph and then merged to contigs and scaffolds using mate-pair information. In comparison with three de Bruijn graph assemblers (SOAPdenovo, IDBA-UD and MetaVelvet), Omega provided comparable overall performance on a HiSeq 100-bp dataset and superior performance on a MiSeq 300-bp dataset. In comparison with Celera on the MiSeq dataset, Omega provided more continuous assemblies overall using a fraction of the computing time of existing overlap-layout-consensus assemblers. This indicates Omega can more efficiently assemble longer Illumina reads, and at deeper coverage, for metagenomic datasets.
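The overlap-finding step with a prefix/suffix hash table can be sketched in Python. This is a simplified illustration under stated assumptions, not Omega's implementation: it indexes every read prefix of at least min_overlap bases, looks up each proper suffix of each read, and reports only exact, forward-strand overlaps (no sequencing errors, no reverse complements, no transitive-edge removal). The function name is hypothetical, and memory grows with the square of read length, so it suits toy inputs only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_overlaps(reads: Dict[str, str], min_overlap: int) -> List[Tuple[str, str, int]]:
    """Return (a, b, length) whenever a proper suffix of read a equals a prefix
    of read b and length >= min_overlap. Exact, forward-strand matches only."""
    # Hash table: every prefix of length >= min_overlap -> reads starting with it.
    prefix_table: Dict[str, List[str]] = defaultdict(list)
    for name, seq in reads.items():
        for length in range(min_overlap, len(seq) + 1):
            prefix_table[seq[:length]].append(name)

    overlaps = []
    for name, seq in reads.items():
        # Proper suffixes only (start >= 1), from longest to shortest.
        for start in range(1, len(seq) - min_overlap + 1):
            for other in prefix_table.get(seq[start:], []):
                if other != name:
                    overlaps.append((name, other, len(seq) - start))
    return overlaps


if __name__ == "__main__":
    reads = {"r1": "ACGTTGCA", "r2": "TGCATTAG", "r3": "TTAGCCGA"}
    print(find_overlaps(reads, 4))  # [('r1', 'r2', 4), ('r2', 'r3', 4)]
```

Each tuple is a directed edge of the overlap graph; Omega's subsequent graph simplification, minimum-cost-flow unitig construction and scaffolding are not attempted here.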

Address of the bookmark: http://omega.omicsbio.org/

miniasm: very fast OLC-based de novo assembler for noisy long reads

Jit — Mon, 27 Nov 2017 07:58:49 -0600

Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to the raw input reads.
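Because there is no consensus step, a miniasm unitig is essentially a concatenation of read pieces. The toy Python sketch below assumes a linear path of reads with known exact overlap lengths (a simplification of miniasm's actual layout) and appends only the non-overlapping part of each read. A base error in any read is copied straight into the unitig, which is why the per-base error rate matches that of the raw reads. The function name and reads are hypothetical.

```python
from typing import List, Tuple


def concatenate_path(path: List[Tuple[str, int]]) -> str:
    """Build a unitig from (read_sequence, overlap_with_previous_read) pairs.
    The first read is taken whole; each later read contributes only the bases
    after its overlap. No consensus is computed, so read errors pass through."""
    unitig = path[0][0]
    for seq, overlap in path[1:]:
        unitig += seq[overlap:]
    return unitig


if __name__ == "__main__":
    clean = [("ACGTTGCA", 0), ("TGCATTAG", 4), ("TTAGCCGA", 4)]
    print(concatenate_path(clean))  # ACGTTGCATTAGCCGA
    # Same path, but read 2 carries a sequencing error (T -> A at index 5).
    noisy = [("ACGTTGCA", 0), ("TGCATAAG", 4), ("TTAGCCGA", 4)]
    print(concatenate_path(noisy))  # ACGTTGCATAAGCCGA
```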

So far miniasm is at an early stage of development. It has only been tested on a dozen PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default settings, miniasm assembles 9 out of 12 PacBio data sets and 3 out of 4 ONT data sets into a single contig. The 12 PacBio data sets are PacBio E. coli sample, ERS473430, ERS544009, ERS554120, ERS605484, ERS617393, ERS646601, ERS659581, ERS670327, ERS685285, ERS743109 and a deprecated PacBio E. coli data set. ONT data were acquired from the Loman Lab.

For a C. elegans PacBio data set (only 40X coverage was used, not the whole data set), miniasm finishes the assembly, including read overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105Mb and the N50 is 1.94Mb. In comparison, HGAP3 produces a 104Mb assembly with an N50 of 1.61Mb. A dotter plot of the miniasm assembly (X axis) against the HGAP3 assembly (Y axis) gives a global view; the two are broadly comparable. Of course, the HGAP3 consensus sequences are much more accurate. In addition, on the whole data set (assembled in ~30 min), the miniasm N50 drops to 1.79Mb. Miniasm still needs improvements.
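N50, used above to compare the miniasm and HGAP3 assemblies, is the contig length L such that contigs of length L or longer together contain at least half of the total assembly size. A short Python function (the name n50 is illustrative) makes the definition concrete:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the total size."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Total is 280; 100 + 80 = 180 is the first running sum reaching 140.
    print(n50([100, 80, 50, 30, 20]))  # 80
```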

Miniasm confirms that, at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that minimap can be used as a read overlapper, even though it is probably not as sensitive as more sophisticated overlappers such as MHAP and DALIGNER. Coupled with long-read error correctors and consensus tools, miniasm may also be useful for producing high-quality assemblies.

Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they do not have a correction or consensus step; the resulting assemblies are therefore contiguous (i.e. long) but very noisy (i.e. full of errors).

We start with an all-against-all comparison:

minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz
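The overlaps are written in PAF, a tab-separated format whose first 12 columns are query name, query length, query start, query end, relative strand, target name, target length, target start, target end, number of residue matches, alignment block length and mapping quality. A minimal Python sketch for reading the gzipped file and counting overlaps might look like this; the helper names are hypothetical, and any optional tags after column 12 are simply ignored.

```python
import gzip
from typing import Dict, Iterator, Tuple, Union

PAF_COLUMNS = ("qname", "qlen", "qstart", "qend", "strand", "tname",
               "tlen", "tstart", "tend", "nmatch", "alen", "mapq")
INT_COLUMNS = {"qlen", "qstart", "qend", "tlen", "tstart", "tend", "nmatch", "alen", "mapq"}


def parse_paf(path: str) -> Iterator[Dict[str, Union[str, int]]]:
    """Yield the 12 mandatory PAF columns of each line as a dict (gzip or plain text)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 12:
                continue  # skip blank or malformed lines
            yield {name: int(val) if name in INT_COLUMNS else val
                   for name, val in zip(PAF_COLUMNS, cols)}


def overlap_summary(path: str) -> Tuple[int, int]:
    """Return (number of overlaps, number of distinct reads involved)."""
    n_hits, reads = 0, set()
    for rec in parse_paf(path):
        if rec["qname"] == rec["tname"]:
            continue  # ignore self-hits, if any are present
        n_hits += 1
        reads.update((rec["qname"], rec["tname"]))
    return n_hits, len(reads)
```

With the file from the command above, overlap_summary("reads.paf.gz") would report how many read pairs minimap linked and how many reads take part in at least one overlap.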

Then we can assemble:

miniasm -f reads.fq reads.paf.gz > reads.gfa

Convert GFA to FASTA:

awk '/^S/{print ">"$2"\n"$3}' reads.gfa | fold > reads.fa

And then count how many contigs:

grep ">" reads.fa | wc -l
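The awk and grep steps above can also be combined into one pass in Python, which can be convenient inside a larger script. This sketch assumes GFA 1 segment lines of the form S, name, sequence separated by tabs (as the awk command expects from miniasm -f output) and, like fold, wraps sequences at 80 columns; the function name is illustrative.

```python
def gfa_to_fasta(gfa_path: str, fasta_path: str, width: int = 80) -> int:
    """Write every GFA 'S' line as a FASTA record wrapped at `width` columns
    (similar to awk ... | fold) and return the number of records written."""
    count = 0
    with open(gfa_path) as gfa, open(fasta_path, "w") as out:
        for line in gfa:
            cols = line.rstrip("\n").split("\t")
            if cols[0] != "S" or len(cols) < 3:
                continue  # links (L), headers (H) and other records are skipped
            name, seq = cols[1], cols[2]
            out.write(f">{name}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
            count += 1
    return count
```

For example, gfa_to_fasta("reads.gfa", "reads.fa") should write the same FASTA as the awk command and return the contig count that grep and wc report.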

# Download sample PacBio data from the PBcR website
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
# Install minimap and miniasm (requiring gcc and zlib)
git clone https://github.com/lh3/minimap && (cd minimap && make)
git clone https://github.com/lh3/miniasm && (cd miniasm && make)
# Overlap
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz
# Layout
miniasm/miniasm -f reads.fq reads.paf.gz > reads.gfa

Address of the bookmark: https://github.com/lh3/miniasm

MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)

Jit — Tue, 12 Dec 2017 17:23:31 -0600

MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a k-mer based Jaccard similarity using a combination of Winnowing and MinHash. This is then converted to an estimate of sequence identity using the Mash distance. An appropriate k-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.
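The identity estimate can be illustrated with a small Python sketch. It uses plain bottom-s MinHash over all k-mers, without MashMap's winnowing or its automatic choice of sampling rate, and converts the Jaccard estimate j to the Mash distance D = -(1/k) ln(2j / (1 + j)), taking identity as roughly 1 - D. The hash (BLAKE2b truncated to 64 bits), k=16, s=200 and the function names are assumptions for illustration only; for clarity it hashes every k-mer, whereas a real implementation keeps only the sketch.

```python
import hashlib
import math
import random
from typing import Set


def kmer_hashes(seq: str, k: int) -> Set[int]:
    """64-bit hashes of every k-mer in seq (forward strand only)."""
    return {
        int.from_bytes(hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big")
        for i in range(len(seq) - k + 1)
    }


def minhash_jaccard(a: str, b: str, k: int = 16, s: int = 200) -> float:
    """Bottom-s MinHash estimate of the Jaccard index of the k-mer sets of a and b."""
    ha, hb = kmer_hashes(a, k), kmer_hashes(b, k)
    sketch = sorted(ha | hb)[:s]  # the s smallest hashes of the union
    shared = sum(1 for h in sketch if h in ha and h in hb)
    return shared / len(sketch)


def mash_identity(j: float, k: int = 16) -> float:
    """Identity ~ 1 - D, with Mash distance D = -(1/k) * ln(2j / (1 + j))."""
    if j <= 0:
        return 0.0
    distance = -math.log(2 * j / (1 + j)) / k
    return max(0.0, 1.0 - distance)


if __name__ == "__main__":
    rng = random.Random(1)
    ref = "".join(rng.choice("ACGT") for _ in range(2000))
    # Hypothetical query: substitute every 100th base (about 1% divergence).
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    qry = "".join(swap[c] if i % 100 == 0 else c for i, c in enumerate(ref))
    j = minhash_jaccard(ref, qry)
    print(f"Jaccard ~ {j:.3f}, identity ~ {mash_identity(j):.3f}")
```

Because each substitution destroys up to k shared k-mers, the Jaccard index falls much faster than identity; the Mash distance formula undoes that effect under a simple random-mutation model.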

Address of the bookmark: https://github.com/marbl/MashMap

RGFA: powerful and convenient handling of assembly graphs

Rahul Nayak — Thu, 25 Jan 2018 05:47:53 -0600

RGFA is an implementation of the proposed GFA specification in Ruby. It allows the user to conveniently parse, edit and write GFA files. Complex operations, such as separating the implicit instances of repeats and merging linear paths, can be performed. A typical application of RGFA is editing a graph to finish the assembly of a sequence, using information not available to the assembler. We illustrate a use case in which the assembly of a repetitive metagenomic fosmid insert was completed using a script based on RGFA.
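RGFA itself is a Ruby library; the Python sketch below is not its API but a simplified illustration of one operation it offers, merging linear paths. Assuming GFA 1 input with only forward-strand (+/+) links and overlaps written as a plain NM CIGAR (or *), it joins a segment to its successor whenever that link is the only way out of the first segment and the only way into the second, trimming the shared overlap from the appended sequence. Function names and segment names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str, int]  # (from_segment, to_segment, overlap_length)


def parse_gfa(lines: List[str]) -> Tuple[Dict[str, str], List[Link]]:
    """Collect S lines and +/+ L lines from GFA 1 text; other records are ignored."""
    segments: Dict[str, str] = {}
    links: List[Link] = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] == "S":
            segments[cols[1]] = cols[2]
        elif cols[0] == "L" and cols[2] == "+" and cols[4] == "+":
            m = re.fullmatch(r"(\d+)M", cols[5])  # e.g. "3M"; "*" counts as 0
            links.append((cols[1], cols[3], int(m.group(1)) if m else 0))
    return segments, links


def merge_linear_paths(segments: Dict[str, str], links: List[Link]) -> Dict[str, str]:
    """Merge chains a -> b where the link is a's only exit and b's only entry."""
    out: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    inc: Dict[str, List[str]] = defaultdict(list)
    for a, b, overlap in links:
        out[a].append((b, overlap))
        inc[b].append(a)

    def unique_next(name: str) -> Optional[Tuple[str, int]]:
        """(successor, overlap) if name -> successor is unbranched, else None."""
        if len(out[name]) == 1:
            nxt, overlap = out[name][0]
            if nxt != name and len(inc[nxt]) == 1:
                return nxt, overlap
        return None

    merged: Dict[str, str] = {}
    used = set()
    for name in segments:
        preds = inc[name]
        if len(preds) == 1 and unique_next(preds[0]) is not None:
            continue  # interior of a chain; handled from the chain's first segment
        chain, seq = [name], segments[name]
        step = unique_next(name)
        while step is not None and step[0] not in chain:
            nxt, overlap = step
            chain.append(nxt)
            seq += segments[nxt][overlap:]  # drop bases shared with the previous segment
            step = unique_next(nxt)
        used.update(chain)
        merged["_".join(chain)] = seq
    for name in segments:  # segments on isolated cycles never start a chain
        if name not in used:
            merged[name] = segments[name]
    return merged
```

For a graph with links a to b, b to c and b to d, only a and b are merged into a_b; the branch at b keeps c and d as separate segments.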

https://github.com/ggonnella/rgfa

Address of the bookmark: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5103826/