BOL: Related items

PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.

BioStar — Fri, 21 Sep 2018 10:19:52 -0500

Development packages for zlib and libbz2 are needed, as well as a standard compiler environment. On Ubuntu, this can be installed via:

sudo apt-get install build-essential libtool automake zlib1g-dev libbz2-dev pkg-config

On macOS, the Apple Developer Tools and Fink (or MacPorts or Homebrew) must be installed, then:

sudo fink install bzip2-dev pkgconfig

Address of the bookmark: https://github.com/neufeld/pandaseq

The 8,000-year-old Tibetan gene mutation!

Neel — Wed, 20 Aug 2014 21:57:44 -0500

A new study has provided insight into how a gene mutation around 8,000 years ago helped Tibetans survive in the thin air of the Tibetan Plateau, where the average elevation is 14,800 feet.

A study led by University of Utah scientists is the first to find a genetic cause for the adaptation, a single DNA base pair change that dates back 8,000 years, and to demonstrate how it contributes to the Tibetans' ability to live in low-oxygen conditions.

About 8,000 years ago, the gene EGLN1 changed by a single DNA base pair. Today, a relatively short time later on the scale of human history, 88 percent of Tibetans have the genetic variation, while it is virtually absent from closely related lowland Asians. The findings indicate that the genetic variation endows its carriers with an advantage.

In those without the adaptation, low oxygen caused their blood to become thick with oxygen-carrying red blood cells, an attempt to feed starved tissues, which could cause long-term complications such as heart failure. The researchers found that the newly identified genetic variation protected Tibetans by decreasing the over-response to low oxygen.

Reference: http://www.nature.com/nature/journal/v512/n7513/abs/nature13408.html

ARCS: scaffolding genome drafts with linked reads

Jit — Mon, 17 Dec 2018 17:40:28 -0600

ARCS requires two input files:

Draft assembly FASTA file
Interleaved linked reads file (barcode sequence expected in the BX tag of the read header or in the form "@readname_barcode"; run Long Ranger basic on raw Chromium reads to produce this interleaved file)
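
The two barcode conventions mentioned above can be illustrated with a minimal Python sketch. It is not part of ARCS; the function name `extract_barcode` is hypothetical, and the sketch assumes the BX tag is written SAM-style as `BX:Z:<barcode>` in the header comment and that, in the `@readname_barcode` form, the barcode is the text after the last underscore of the read name.

```python
import re

# Assumption: the BX tag appears SAM-style as "BX:Z:<barcode>" in the header.
_BX_TAG = re.compile(r"BX:Z:(\S+)")


def extract_barcode(header):
    """Return the barcode found in a FASTQ header line, or None if absent."""
    match = _BX_TAG.search(header)
    if match:
        return match.group(1)
    # Fall back to the "@readname_barcode" form: take the text after the
    # last underscore of the read name (assumption, see lead-in).
    fields = header.lstrip("@").split()
    if fields and "_" in fields[0]:
        return fields[0].rsplit("_", 1)[1] or None
    return None
```

For example, `extract_barcode("@read1 BX:Z:ACGTACGT-1")` and `extract_barcode("@read2_TTGCA")` each return the barcode portion, while a header with neither form yields `None`.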

Address of the bookmark: https://github.com/bcgsc/ARCS/

Assistant Professor - Medical Bioinformatics

Wed, 23 Jul 2014 05:00:38 -0500

Advt. No.: ME-I/A-IV/03/14

No. of Posts: 01 (SC)

Pay Scale:

Pay Band of Rs. 15600-39100 + Rs. 6000/- GP + NPA @ 25% of Basic Pay + Learning Resource Allowance @ Rs. 20,000/- P.A. + Conveyance Allowance @ Rs. 1650/- P.M. + Academic Allowance @ Rs. 2500/- P.M. and other admissible allowances.

Qualifications:

Area of Specialization:

Bioinformatics/Computational Biology/Genomics/Proteomics/Structural Biology

1. Postgraduate qualification, e.g. Master’s Degree in Biotechnology/Bioinformatics/Biophysics.

2. A Doctorate Degree from a recognized University/Institute in a basic or allied Medical Science subject, e.g. Medical Biotechnology/Biophysics/Bioinformatics/X-ray Crystallography/Immunology/Structural Biology etc.

Experience:

1. Minimum of three years of teaching and/or research experience in a recognized medical/research institution in an allied medical subject after obtaining the doctorate degree, preferably in Medical Molecular Biology/Biophysics/Structural Biology/Genomics and Clinical Proteomics/Computational Biology.

2. Minimum of two publications, with at least one in an international journal and at least one as first author.

Desirable:-

Consistently excellent scholastic/academic record, demonstrated ability to write grant proposal(s) successfully, and postdoctoral training in a frontier area of medical bioinformatics research of direct relevance to clinical diagnosis or patient care (preferably from a recognized top-ranking medical institution abroad).

Send your applications to O/O, Deputy Registrar, Recruitment & Establishment Cell, University of Health Sciences, Rohtak by 08.7.2014

For more details, please visit the website: http://pgimsrohtak.nic.in/2014%20AP%20Advt.pdf

Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads

BioStar — Tue, 04 Feb 2020 23:23:16 -0600

Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.

Usage: perl run_rcorrector.pl [OPTIONS]
OPTIONS:
	Required
	-s seq_files: comma separated files for single-end data sets
	-1 seq_files_left: comma separated files for the first mate in the paired-end data sets
	-2 seq_files_right: comma separated files for the second mate in the paired-end data sets
	-i seq_files_interleaved: comma separated files for interleaved paired-end data sets
	Optional
	-k INT: kmer_length (<=32, default: 23)
	-od STRING: output_file_directory (default: ./)
	-t INT: number of threads to use (default: 1)
	-trim : allow trimming (default: false)
	-maxcorK INT: the maximum number of correction within k-bp window (default: 4)
	-wk FLOAT: the proportion of kmers that are used to estimate weak kmer count threshold, lower for more divergent genome (default: 0.95)
	-ek INT: expected number of kmers; does not affect the correctness of program but affects the memory usage (default: 100000000)
	-stdout: output the corrected reads to stdout (default: not used)
	-verbose: output some correction information to stdout (default: not used)
	-stage INT: start from which stage (default: 0)
		0-start from beginning (storing kmers in bloom filter);
		1-start from counting kmers that showed up in bloom filter;
		2-start from dumping kmer counts into a jf_dump file;
		3-start from error correction.
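
As an illustration of how the options above combine for paired-end data, here is a minimal Python sketch that assembles the argument list. The helper name `build_rcorrector_command` is hypothetical and not part of Rcorrector. The sketch assumes that `run_rcorrector.pl` sits in the current directory and that the left and right file lists pair up one-to-one. Only flags listed in the usage text are used.

```python
def build_rcorrector_command(left_files, right_files, kmer_length=23,
                             threads=1, out_dir="./"):
    """Build an argument list for a paired-end run_rcorrector.pl call."""
    # Assumption: every left-mate file has a matching right-mate file.
    if not left_files or len(left_files) != len(right_files):
        raise ValueError("left and right file lists must be non-empty and equal in length")
    # The usage text states that the kmer length must be <= 32.
    if not 1 <= kmer_length <= 32:
        raise ValueError("kmer_length must be between 1 and 32")
    return [
        "perl", "run_rcorrector.pl",      # assumed to be in the current directory
        "-1", ",".join(left_files),       # first mates, comma separated
        "-2", ",".join(right_files),      # second mates, comma separated
        "-k", str(kmer_length),
        "-t", str(threads),
        "-od", out_dir,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell quoting problems with unusual file names.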

Address of the bookmark: https://github.com/mourisl/Rcorrector/

The 5 reasons for mistakes in bioinformatics work!

Jit — Thu, 24 Jul 2014 02:51:41 -0500

When you're just starting out with biological programming, it's easy to run into complex problems that make you wonder how anyone has ever managed to write a program. Some problems trip up nearly every bioinformatician, from understanding the biological questions to dealing with program design. Some mistakes are so common that even experienced biological programmers make them. Here are a few random observations from my 8 years in bioinformatics, most of them snarky. These problems will always make things take longer than expected and compel you to postpone your project deadline.

1. Stupid for biologists: Biology is so complex that it will make a bioinformatician feel stupid. There are no universal fixed rules; it can surprise you at any time. So be nice to the biologists who ask questions and resolve your biological puzzles. Sometimes you will have no idea what the hell you were doing either.

2. Puzzling why: Do not hesitate to ask questions. Especially at the beginning of a project, you will have to ask a lot of them. Instead of puzzling it out at the end, check and clear up your doubts, even about a single error. It can lead to a wrong conclusion.

3. Running a marathon: The documentation of most biological software is incomplete. In other words, it is never more than 95 percent complete. Sometimes a single problem can halt your entire project for months. Compiling and running the pipelines is tedious because almost all of them are interdependent and need proper configuration. I face the same kind of problem with Evolver :( …

4. Folders missing: The pipelines generate lots of data, and we keep it in several folders for future use. But sometimes we delete them by mistake and move on to recovery…

5. Digging deeper: Digging deeper is fruitful, but sometimes it can be catastrophic. You may get frustrated or directionless. So keep a biologist with you for rescue, and sometimes an expert computer programmer to handle your server. Remember, the server will always go down when you need it the most.

The most common frustrating line: Why do we do this again?

SHAMAN: a user-friendly website for metataxonomic analysis from raw reads to statistical analysis

BioStar — Mon, 17 Aug 2020 05:21:09 -0500

SHAMAN is a shiny application for differential analysis of metagenomic data (16S, 18S, 23S, 28S, ITS and WGS) including bioinformatics treatment of raw reads for targeted metagenomics, statistical analysis and results visualization with a large variety of plots (barplot, boxplot, heatmap, …).
The bioinformatics treatment is based on Vsearch [Rognes 2016], which was shown to be both accurate and fast [Wescott 2015]. The statistical analysis is based on the DESeq2 R package [Anders and Huber 2010], which robustly identifies the differentially abundant features as suggested in [McMurdie and Holmes 2014] and [Jonsson 2016]. SHAMAN robustly identifies the differentially abundant genera with the Generalized Linear Model implemented in DESeq2 [Love 2014].
SHAMAN is compatible with standard formats for metagenomic analysis (.csv, .tsv, .biom) and figures can be downloaded in several formats. A presentation about SHAMAN is available here and a poster here.

More at https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03666-4

Address of the bookmark: https://github.com/aghozlane/shaman

Protein function annotation and machine learning - UPMC - Paris, France

Sat, 02 Aug 2014 01:22:52 -0500

Job Description: We are looking for an excellent postdoc with interests in protein functional annotation, machine learning and computer grids. The position is open for 3.5 years at the Université Pierre et Marie Curie, in the heart of Paris.

Research topic: Protein function annotation, multiple probabilistic models, domain architecture, machine learning, combinatorial optimization, computer grid.

Title: A novel integrative platform for large scale protein annotation that exploits a multitude of diversified probabilistic models in several protein signature databases.

We propose a novel integrated approach for large scale protein annotation that will exploit an unprecedented amount of genomic data as well as sophisticated machine learning techniques and combinatorial optimization approaches, taking advantage of High Performance Computing (HPC) environments. The idea is to uncover as much as possible of the evolutionary processes of protein sequences that took place throughout the whole tree of life and that affected the evolution of a protein family. We have already demonstrated in previous work that the problem of functional annotation is inherent to the ability to uncover such paths. Now, we shall extend this approach to large scale genome annotation by considering 11 different protein databases, constituted by about 10^9 protein sequences, and by producing a large pool of diversified probabilistic models coding for about 10^7 evolutionary protein pathways. Such models will be used to search for specific domains in genomes to be annotated. Our previous methodology needs to be fundamentally improved to deal with this large amount of biological data. In this project, we shall work on the algorithms to reduce the space of models and the search complexity, and we shall implement some important algorithmic changes towards the realization of a powerful integrated annotation tool.

Where: This project is run at the Laboratoire de Biologie Computationnelle et Quantitative UMR7238 CNRS-UPMC, in the Analytical Genomics team headed by A. Carbone. It is co-advised with Pierre-Henri Wuillemin, Laboratoire d’Informatique de Paris 6, Equipe DECISION.

Start date: September 1st, 2014
Contact Person: Alessandra Carbone
Contact: alessandra.carbone@lip6.fr

FastProNGS: fast preprocessing of next-generation sequencing reads

Rahul Nayak — Sat, 26 Dec 2020 08:35:21 -0600

FastProNGS was developed to integrate the quality control process with automatic adapter removal. Parallel processing was implemented to speed up the process by allocating multiple threads. Compared with similar up-to-date preprocessing tools, FastProNGS is by far the fastest.

Address of the bookmark: https://github.com/Megagenomics/FastProNGS

Swabs to Genomes: A Comprehensive Workflow

Rahul Nayak — Sun, 10 Aug 2014 03:01:21 -0500

The sequencing, assembly, and basic analysis of microbial genomes, once a painstaking and expensive undertaking, have become almost trivial for research labs with access to standard molecular biology and computational tools. However, there is a wide variety of options available for DNA library preparation and sequencing, and inexperience with bioinformatics can pose a significant barrier to entry for many who may be interested in microbial genomics. The objective of the present study was to design, test, troubleshoot, and publish a simple, comprehensive workflow from the collection of an environmental sample (a swab) to a published microbial genome, empowering even a lab or classroom with limited resources and bioinformatics experience to perform it.

Address of the bookmark: https://peerj.com/preprints/453.pdf