BOL: Related items

FQC Dashboard: Integrates FastQC results into a web-based, interactive, and extensible FASTQ quality control tool

Shruti Paniwala — Tue, 10 Nov 2020 01:30:22 -0600

FQC is software that facilitates quality control of FASTQ files by carrying out a QC protocol using FastQC, parsing results, and aggregating quality metrics into an interactive dashboard designed to richly summarize individual sequencing runs. The dashboard groups samples in dropdowns for navigation among the data sets, utilizes human-readable configuration files to manipulate the pages and tabs, and is extensible with CSV data.

Address of the bookmark: https://github.com/pnnl/fqc

Memories Can Be Passed Down Through DNA

Sat, 10 May 2014 21:24:10 -0500

The premise of Assassin's Creed is the reliving of other people's memories stored inside DNA. Well, scientists have found that in mice, it actually happens! Anthony is joined by special guest and our friend Tara Long from Hard Science to explain how this process works, and whether it might apply to humans as well.

Read More:

Parental olfactory experience influences behavior and neural structure in subsequent generations
http://www.nature.com/neuro/journal/vaop/ncurrent/abs/nn.3594.html
"Using olfactory molecular specificity, we examined the inheritance of parental traumatic exposure, a phenomenon that has been frequently observed, but not understood."

What Is Epigenetics?
http://www.sciencemag.org/content/330/6004/611
"The cells in a multicellular organism have nominally identical DNA sequences (and therefore the same genetic instruction sets), yet maintain different terminal phenotypes. This nongenetic cellular memory, which records developmental and environmental cues (and alternative cell states in unicellular organisms), is the basis of epi-(above)-genetics."

Epigenetics
http://en.wikipedia.org/wiki/Epigenetics

Watch More:

How to Change Your Genes
https://www.youtube.com/watch?v=B5DU9lgbsSE

TestTube Wild Card
http://testtube.com/dnews/dnews-231-how-too-many-screens-affect-our-brain?utm_source=YT&utm_medium=DNews&utm_campaign=DNWC

Is Sexiness Hereditary?
https://www.youtube.com/watch?v=z6STRCncvM8

Converting FASTQ to FASTA

Neel — Fri, 12 Jan 2018 03:49:09 -0600

There are several ways to convert FASTQ files to FASTA. Some methods are listed below.

Using SED

sed can be used to selectively print the desired lines from a file, so if you print the 1st and 2nd line of every 4 lines, you get the sequence header and sequence needed for FASTA format. The first~step address form used here is a GNU sed extension.

sed -n '1~4s/^@/>/p;2~4p' INFILE.fastq > OUTFILE.fasta
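The sed command above, like the paste and awk approaches, assumes that every record occupies exactly four lines (header, sequence, "+" separator, quality). A minimal sketch of a pre-flight check for that assumption follows; the function name check_fastq_4line is hypothetical and not part of any toolkit.

```shell
# Hypothetical helper (not part of any toolkit): succeeds only if the file
# looks like strict 4-line FASTQ, which the line-based one-liners assume.
check_fastq_4line() {
    awk 'NR % 4 == 1 && !/^@/  { bad = 1 }   # header lines must start with @
         NR % 4 == 3 && !/^\+/ { bad = 1 }   # separator lines must start with +
         END { if (bad || NR % 4 != 0) exit 1 }' "$1"
}
```

It could be used as: check_fastq_4line INFILE.fastq && sed -n '1~4s/^@/>/p;2~4p' INFILE.fastq > OUTFILE.fasta. Files with wrapped (multi-line) sequences fail the check and are better handled by a format-aware tool.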

Using PASTE

You can linearize every 4 lines into a tab-separated row with paste, keep the first and second fields (header and sequence), and then split them back onto separate lines:

paste - - - - < INFILE.fastq | cut -f 1,2 | sed 's/^@/>/' | tr "\t" "\n" > OUTFILE.fasta

EMBOSS:seqret

seqret is a general-purpose EMBOSS program for reading and writing sequences. One of its uses is FASTQ-to-FASTA conversion:

seqret -sequence reads.fastq -outseq reads.fasta

Using AWK

awk can be used for conversion as follows:

cat infile.fq | awk '{if(NR%4==1) {printf(">%s\n",substr($0,2));} else if(NR%4==2) print;}' > file.fa
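As a sanity check after any of the line-based conversions above, the number of FASTA headers should equal the number of FASTQ lines divided by four. The sketch below assumes strict 4-line FASTQ input; the function name fasta_matches_fastq is illustrative only.

```shell
# Hypothetical check (names are illustrative): succeeds when the FASTA holds
# exactly one ">" header per 4 lines of the FASTQ it was converted from.
fasta_matches_fastq() {
    fq_records=$(( $(wc -l < "$1") / 4 ))       # FASTQ records = lines / 4
    fa_records=$(grep -c '^>' "$2" || true)     # FASTA records = header lines
    [ "$fq_records" -eq "$fa_records" ]
}
```

For example: fasta_matches_fastq infile.fq file.fa || echo "record counts differ" >&2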

FASTX-toolkit

fastq_to_fasta, available in the FASTX-toolkit, scales well to huge datasets.

# Remember to use -Q33 for Illumina reads!
fastq_to_fasta -h
usage: fastq_to_fasta [-h] [-r] [-n] [-v] [-z] [-i INFILE] [-o OUTFILE]
version 0.0.6
       [-h]         = This helpful help screen.
       [-r]         = Rename sequence identifiers to numbers.
       [-n]         = keep sequences with unknown (N) nucleotides.
                   Default is to discard such sequences.
       [-v]         = Verbose - report number of sequences.
                   If [-o] is specified,  report will be printed to STDOUT.
                   If [-o] is not specified (and output goes to STDOUT),
                   report will be printed to STDERR.
       [-z]         = Compress output with GZIP.
       [-i INFILE]  = FASTA/Q input file. default is STDIN.
       [-o OUTFILE] = FASTA output file. default is STDOUT.

Bioawk

Another option is to convert FASTQ to FASTA format using bioawk:

bioawk -c fastx '{print ">"$name"\n"$seq}' input.fastq > output.fasta

Seqtk

From the same developer as bioawk, there is another option: a tool called seqtk.

seqtk seq -a input.fastq > output.fasta

Note that seqtk accepts either compressed or uncompressed input files.
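The line-based sed, paste and awk methods do not read compressed files directly, but a gzip-compressed FASTQ can be streamed into them. A minimal sketch, assuming gzip is installed and the input uses strict 4-line records; the function name fastq_gz_to_fasta is hypothetical:

```shell
# Hypothetical wrapper: convert a gzip-compressed FASTQ to FASTA on stdout.
# Assumes strict 4-line FASTQ records.
fastq_gz_to_fasta() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: drop @, add >
                         NR % 4 == 2 { print }'                    # sequence line as-is
}
```

For example: fastq_gz_to_fasta input.fastq.gz > output.fasta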

Bioinformatics JRF/SRF position at NATIONAL RESEARCH CENTRE ON PLANT BIOTECHNOLOGY

Sun, 11 May 2014 22:29:12 -0500

NATIONAL RESEARCH CENTRE ON PLANT BIOTECHNOLOGY
LBS, CENTRE, PUSA CAMPUS, IARI NEW DELHI
NEW DELHI – 110 012

WALK-IN INTERVIEWS

Eligible candidates may appear in Walk-in-Interview on May 23, 2014 at 10 AM for the posts of Research Associates & Senior Research Fellows (SRF) in the following DST/DBT/ICAR funded projects.

1 NPTC Project on Bioinformatics and Comparative Genomics

Research Associate (One)

Rs. 24000/- + 30% HRA for master's degree holders with more than 4 years' experience

Essential: Ph D in Plant Molecular Biology & Biotechnology/Genetics, or candidates who have already submitted their Ph D thesis in the above subjects

Desirable: Research experience in genomics, molecular biology, microarray analysis, gene cloning, transgenic techniques, and computational analysis.

Senior Research Fellow (UGC-CSIR/DBT/ICAR NET qualified only): (One)

Rs. 16000/- + 30% HRA, and Rs. 18000 + 30% HRA from 3rd year onwards

Essential:

1. ICAR/UGC-CSIR/DBT NET qualified only

2. M.Sc. (with thesis) in Biotechnology, Life Sciences, Biosciences/Bioinformatics, Genetics/Plant Pathology with experience in molecular biology.

Or M.Sc. with more than 3 years of research experience

3. B.Sc. Agriculture or Biology

Desirable:
1. M. Sc. with thesis
2. Experience in molecular biology, plant tissue culture
3. Bioinformatics knowledge is important

2 DST JC Bose National Fellowship

Research Associate (Bioinformatics) : One

Rs. 22000/- + 30% HRA for 1st & 2nd yr., Rs. 23000 + 30% HRA for 3rd yr. and Rs. 24000 + 30% HRA for 4th & 5th yr.

Essential: M Ph D in Plant Molecular Biology & Biotechnology/Genetics

Desirable: Research experience in genomics, molecular biology, microarray analysis, gene cloning, transgenic techniques, and computational analysis.

Age limit: Max. 35 years (age relaxation of 5 years for SC/ST & women and 3 years for OBC)

The posts are purely temporary in nature and are co-terminous with the project. Initially the offer will be made for one year only and may be extended based on the performance of the candidate. The interview will be held on May 23, 2014 at 10:00 AM at NRCPB, LBS Building, Pusa Campus, IARI, New Delhi - 110012. Candidates must bring four copies of their biodata (in the prescribed proforma), original certificates, attested photocopies of each of the certificates and an attested copy of a recent passport-size photograph. No TA/DA will be given for appearing in the interview. Only candidates having the essential qualifications will be entertained for the interviews. Candidates will be short-listed based on academic merit and experience if there is a large number of applicants.

Advertisement: http://www.nrcpb.org/sites/default/files/Advertisement%20for%20RA%20and%20SRF%20Position.pdf

MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)

Jit — Tue, 12 Dec 2017 17:23:31 -0600

MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a k-mer based Jaccard similarity using a combination of Winnowing and MinHash. This is then converted to an estimate of sequence identity using the Mash distance. An appropriate k-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.
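To illustrate the last conversion step, the Mash distance for a Jaccard estimate j and k-mer size k is D = -(1/k) * ln(2j / (1 + j)), and 1 - D can be read as an approximate identity. The sketch below only evaluates that formula; the function name mash_identity, the clamping of D to 1, and the 1 - D reading are assumptions made for illustration, and MashMap's own implementation may differ.

```shell
# Hypothetical helper: identity estimate from a Jaccard value j and k-mer size k
# using the Mash distance formula D = -(1/k) * ln(2j / (1 + j)).
mash_identity() {
    awk -v j="$1" -v k="$2" 'BEGIN {
        if (j <= 0) { printf "%.4f\n", 0; exit }   # no shared k-mers: distance taken as 1
        d = -1 / k * log(2 * j / (1 + j))          # Mash distance
        if (d > 1) d = 1                           # assumption: clamp distance to 1
        printf "%.4f\n", 1 - d                     # approximate identity
    }'
}
```

For example, mash_identity 0.5 16 evaluates the formula for a Jaccard estimate of 0.5 with 16-mers.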

Address of the bookmark: https://github.com/marbl/MashMap

A History of Bioinformatics (in the Year 2039)

Wed, 23 Jul 2014 06:37:51 -0500

C. Titus Brown http://video.open-bio.org/video/1/a-history-of-bioinformatics-in-the-year-2039

The MARVEL assembler

Jit — Fri, 04 May 2018 19:18:41 -0500

MARVEL consists of a set of tools that facilitate the overlapping, patching, correction and assembly of noisy (as well as not-so-noisy) long reads.

The assembly process can be summarized as follows:

overlap
patch reads
overlap (again)
scrubbing
assembly graph construction and touring
optional read correction
fasta file creation

Address of the bookmark: https://github.com/schloi/MARVEL

Scientists map 17,294 proteins produced in human body

Jit — Thu, 29 May 2014 01:57:55 -0500

Indian scientists missed the genomic profiling bus, but they have more than made up for it by helping create the first human proteome map, which is an extension of the genomic study. Until now, there has been no direct equivalent for the human proteome. Recently, however, two groups presented mass spectrometry-based analyses of human tissues, body fluids and cells that map the large majority of the human proteome.

The Indian scientists working in Bangalore, along with their American counterparts, have mapped more than 17,000 proteins in 30 organs of the human body. Just like the human genome was sequenced around the turn of the millennium, this is an equivalent mapping of the human proteome.

The researchers estimate that there are around 20,500 proteins in the human body. The scientists have profiled 17,294 of them, which accounts for around 84% of the total. In addition, the team traced around 2,500 of the 3,000 proteins that had been categorised as "missing proteins".

The work, done by a group of Indian scientists together with Johns Hopkins University, was published in the renowned journal Nature ( http://www.nature.com/nature/journal/v509/n7502/full/nature13302.html ). Of the 72 people who worked on the project, 46 are Indians.

Reference:

http://www.nature.com/nature/journal/v509/n7502/full/nature13302.html

http://www.proteinatlas.org/ -The antibody-based Human Protein Atlas programme

http://www.humanproteomemap.org/ -Proteogenomic analysis by identifying translated proteins from annotated pseudogenes, non-coding RNAs and untranslated regions.

https://www.proteomicsdb.org/ -Assembled protein evidence for 18,097 genes in ProteomicsDB

Hercules: a profile HMM-based hybrid error correction algorithm for long reads

Jit — Mon, 20 Aug 2018 14:14:11 -0500

Choosing whether to use second or third generation sequencing platforms can lead to trade-offs between accuracy and read length. Several studies require long and accurate reads including de novo assembly, fusion and structural variation detection. In such cases researchers often combine both technologies and the more erroneous long reads are corrected using the short reads. Current approaches rely on various graph based alignment techniques and do not take the error profile of the underlying technology into account. Memory- and time-efficient machine learning algorithms that address these shortcomings have the potential to achieve better and more accurate integration of these two technologies.

Results: We designed and developed Hercules, the first machine learning-based long read error correction algorithm. The algorithm models every long read as a profile Hidden Markov Model with respect to the underlying platform's error profile. The algorithm learns a posterior transition/emission probability distribution for each long read and uses this to correct errors in these reads. Using datasets from two DNA-seq BAC clones (CH17-157L1 and CH17-227A2), and human brain cerebellum polyA RNA-seq, we show that Hercules-corrected reads have the highest mapping rate among all competing algorithms and highest accuracy when most of the basepairs of a long read are covered with short reads.

Availability: Hercules source code is available at https://github.com/BilkentCompGen/Hercules

Address of the bookmark: https://github.com/BilkentCompGen/Hercules

How to sequence the human genome - Mark J. Kiel

Fri, 30 May 2014 13:24:11 -0500

View full lesson: http://ed.ted.com/lessons/how-to-sequence-the-human-genome-mark-j-kiel Your genome, every human's genome, consists of a unique DNA sequence of A's, T's, C's and G's that tell your cells how to operate. Thanks to technological advances, scientists are now able to know the sequence of letters that makes up an individual genome relatively quickly and inexpensively. Mark J. Kiel takes an in-depth look at the science behind the sequence. Lesson by Mark J. Kiel, animation by Marc Christoforidis.