BOL: Related items

maftools

Surabhi Chaudhary — Fri, 17 Dec 2021 03:18:28 -0600

With advances in cancer genomics, the Mutation Annotation Format (MAF) is being widely accepted and used to store detected somatic variants. The Cancer Genome Atlas Project has sequenced over 30 different cancers, with a sample size of over 200 for each cancer type. The resulting somatic variant data are stored in MAF. This package attempts to summarize, analyze, annotate and visualize MAF files efficiently, from either TCGA sources or in-house studies, as long as the data is in MAF format.

https://www.bioconductor.org/packages/devel/bioc/vignettes/maftools/inst/doc/maftools.html

Address of the bookmark: https://github.com/PoisonAlien/maftools

Pandoc: a universal document converter

Surabhi Chaudhary — Thu, 24 Jun 2021 01:33:47 -0500

If you need to convert files from one markup format into another, pandoc is your Swiss-army knife. Pandoc can convert between almost all markup formats.

https://pandoc.org/index.html

Address of the bookmark: https://pandoc.org/

maftools : Summarize, Analyze and Visualize MAF Files

Neel — Wed, 23 Dec 2020 05:29:33 -0600

Address of the bookmark: https://www.bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html

MafTools

Jit — Thu, 16 Feb 2017 11:16:01 -0600

maftools - An R package to summarize, analyze and visualize MAF files. Introduction.

With advances in cancer genomics, the Mutation Annotation Format (MAF) is being widely accepted and used to store detected variants. The Cancer Genome Atlas Project has sequenced over 30 different cancers, with a sample size of over 200 for each cancer type. The resulting genetic variant data are stored in MAF. This package attempts to summarize, analyze, annotate and visualize MAF files efficiently, from either TCGA sources or in-house studies, as long as the data is in MAF format. Maftools can also handle the ICGC Simple Somatic Mutation format.

maftools is on bioRxiv

Please cite the following if you find this tool useful.

Mayakonda, A. and H.P. Koeffler, Maftools: Efficient analysis, visualization and summarization of MAF files from large-scale cohort based cancer studies. bioRxiv, 2016. doi: http://dx.doi.org/10.1101/052662

Address of the bookmark: https://github.com/PoisonAlien/maftools

maf2synteny

Abhimanyu Singh — Thu, 18 May 2017 05:31:30 -0500

A tool for recovering synteny blocks from multiple alignments (in MAF format).

This tool is a standalone version of the Ragout module [http://fenderglass.github./Ragout]

Address of the bookmark: https://github.com/fenderglass/maf2synteny

mafTools

Radha Agarkar — Sat, 21 May 2016 22:40:21 -0500

Bioinformatics tools for dealing with Multiple Alignment Format (MAF) files.

Address of the bookmark: https://github.com/dentearl/mafTools

Converting BLAST output into CSV

Poonam Mahapatra — Mon, 11 Dec 2017 04:17:58 -0600

Suppose we wanted to do something with all this BLAST output. Generally, that’s the case - you want to retrieve all matches, or do a reciprocal BLAST, or something.

As with most programs that run on UNIX, the text output is in some specific format. If the program is popular enough, there will be one or more parsers written for that format – these are just utilities written to help you retrieve whatever information you are interested in from the output.

Let’s conclude this tutorial by converting the BLAST output in out.txt into a spreadsheet format, using a Python script.

First, we need to get the script. We’ll do that using the ‘git’ program:

git clone https://github.com/ngs-docs/ngs-scripts.git /root/ngs-scripts

We’ll discuss ‘git’ more later; for now, just think of it as a way to get ahold of a particular set of files. In this case, we’ve placed the files in /root/ngs-scripts/, and you’re looking to run the script blast/blast-to-csv.py using Python:

python /root/ngs-scripts/blast/blast-to-csv.py out.txt

This outputs a spreadsheet-like list of names and e-values. To save this to a file, run:

python /root/ngs-scripts/blast/blast-to-csv.py out.txt > ~/out.csv

If you have Excel installed, try double clicking on it.
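
As a rough illustration of what such a conversion involves, here is a minimal Python sketch. It is not the blast-to-csv.py script from ngs-scripts. It assumes, hypothetically, that the hits are already available as 12-column tab-separated lines (query name in column 1, subject name in column 2, e-value in column 11) rather than as the default text report in out.txt. The function name and the sample hit are made up for the example.

```python
import csv
import io


def blast_tabular_to_csv(tabular_text):
    """Convert tab-separated BLAST hits into CSV text with three columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["query", "subject", "evalue"])
    for line in tabular_text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumed layout: columns 1, 2 and 11 hold query, subject and e-value.
        writer.writerow([fields[0], fields[1], fields[10]])
    return out.getvalue()


# Hypothetical single hit, for illustration only.
example = "gene1\tchr1\t98.50\t200\t3\t0\t1\t200\t1001\t1200\t1e-100\t350\n"
print(blast_tabular_to_csv(example), end="")
```

The csv module takes care of quoting, so names containing commas would still open correctly in a spreadsheet.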

JSON

Abhimanyu Singh — Tue, 04 Apr 2017 08:02:39 -0500

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.

JSON is built on two structures:

A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

These are universal data structures. Virtually all modern programming languages support them in one form or another. It makes sense that a data format that is interchangeable with programming languages also be based on these structures.
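
To make the two structures concrete, here is a small sketch using Python's standard json module. The sample document and its field names are invented for the example; the point is only that a JSON object becomes a dict and a JSON array becomes a list.

```python
import json

# A JSON object (name/value pairs) that contains a JSON array (ordered values).
text = '{"name": "sample1", "reads": 1200, "tags": ["tumor", "normal"]}'

data = json.loads(text)              # object -> dict, array -> list
print(type(data).__name__)           # the object as a Python type
print(type(data["tags"]).__name__)   # the array as a Python type
print(data["tags"][0])               # arrays keep their order

# Serializing back to text gives an equivalent JSON document.
round_trip = json.dumps(data, sort_keys=True)
print(round_trip)
```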

Address of the bookmark: http://json.org/

Understanding BLASTn output format 6!

Rahul Nayak — Wed, 27 Jun 2018 18:38:21 -0500

BLASTn output format 6

BLASTn maps DNA against DNA, for example gene sequences against a reference genome:

blastn -query genes.ffn -subject genome.fna -outfmt 6

BLASTn tabular output format 6

Column headers:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore

1.	qseqid	query (e.g., gene) sequence id
2.	sseqid	subject (e.g., reference genome) sequence id
3.	pident	percentage of identical matches
4.	length	alignment length
5.	mismatch	number of mismatches
6.	gapopen	number of gap openings
7.	qstart	start of alignment in query
8.	qend	end of alignment in query
9.	sstart	start of alignment in subject
10.	send	end of alignment in subject
11.	evalue	expect value
12.	bitscore	bit score
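
The twelve columns above can be read into named, typed fields with a few lines of Python. This is a minimal sketch, assuming one tab-separated hit per line in the default column order; the function name and the sample hit are hypothetical.

```python
# Default column order of tabular output format 6.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_hit(line):
    """Turn one tab-separated hit into a dict with numeric fields converted."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    hit = {}
    for name, value in zip(COLUMNS, values):
        if name in INT_COLUMNS:
            hit[name] = int(value)
        elif name in FLOAT_COLUMNS:
            hit[name] = float(value)
        else:
            hit[name] = value
    return hit


# Hypothetical hit, for illustration only.
line = "gene1\tchr1\t98.50\t200\t3\t0\t1\t200\t1001\t1200\t1e-100\t350"
hit = parse_hit(line)
print(hit["qseqid"], hit["sseqid"], hit["pident"], hit["evalue"])
```

Converting the numeric columns up front makes later filtering straightforward, for example keeping only hits whose e-value is below a chosen threshold.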

Define your own output format

Add the -outfmt option followed by the format specifiers you want, for example:

-outfmt "6 qseqid sseqid pident qlen length mismatch gapopen evalue bitscore"

supported format specifiers are:
qseqid      Query Seq-id
qgi         Query GI
qacc        Query accession
qaccver     Query accession.version
qlen        Query sequence length
sseqid      Subject Seq-id
sallseqid   All subject Seq-id(s), separated by a ';'
sgi         Subject GI
sallgi      All subject GIs
sacc        Subject accession
saccver     Subject accession.version
sallacc     All subject accessions
slen        Subject sequence length
qstart      Start of alignment in query
qend        End of alignment in query
sstart      Start of alignment in subject
send        End of alignment in subject
qseq        Aligned part of query sequence
sseq        Aligned part of subject sequence
evalue      Expect value
bitscore    Bit score
score       Raw score
length      Alignment length
pident      Percentage of identical matches
nident      Number of identical matches
mismatch    Number of mismatches
positive    Number of positive-scoring matches
gapopen     Number of gap openings
gaps        Total number of gaps
ppos        Percentage of positive-scoring matches
frames      Query and subject frames separated by a '/'
qframe      Query frame
sframe      Subject frame
btop        Blast traceback operations (BTOP)
staxids     Subject Taxonomy ID(s), separated by a ';'
sscinames   Subject Scientific Name(s), separated by a ';'
scomnames   Subject Common Name(s), separated by a ';'
sblastnames Subject Blast Name(s), separated by a ';' (in alphabetical order)
sskingdoms  Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
stitle      Subject Title
salltitles  All Subject Title(s), separated by a '<>'
sstrand     Subject Strand
qcovs       Query Coverage Per Subject
qcovhsp     Query Coverage Per HSP

default values are:
-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"
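
Because a custom -outfmt string also fixes the column order of the output, the same string can be reused to label the columns when reading the results. The sketch below assumes tab-separated output in format 6; the helper names and the sample row are hypothetical, and all values are left as strings.

```python
import csv
import io

DEFAULT_COLUMNS = ("qseqid sseqid pident length mismatch gapopen "
                   "qstart qend sstart send evalue bitscore").split()


def columns_from_outfmt(outfmt):
    """Return the column names implied by an -outfmt string for format 6."""
    parts = outfmt.split()
    if not parts or parts[0] != "6":
        raise ValueError("this sketch only handles tabular format 6")
    # A bare "6" means the default column set.
    return parts[1:] or DEFAULT_COLUMNS


def read_hits(handle, outfmt):
    """Read tab-separated hits into dictionaries keyed by specifier name."""
    columns = columns_from_outfmt(outfmt)
    reader = csv.DictReader(handle, fieldnames=columns, delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)


custom = "6 qseqid sseqid pident qlen length mismatch gapopen evalue bitscore"
# Hypothetical row matching the nine custom columns above.
sample = io.StringIO("gene1\tchr1\t98.50\t210\t200\t3\t0\t1e-100\t350\n")
hits = read_hits(sample, custom)
print(hits[0]["qlen"], hits[0]["evalue"])
```

Keeping the specifier string in one place avoids the column labels drifting out of sync with the command that produced the file.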

Convert VCF to tab-delimited table

Seema Singh — Tue, 15 May 2018 07:39:08 -0500

Performed with GATK:

java -Xmx8g -jar GenomeAnalysisTK.jar \
-T VariantsToTable \
-R reference.fa \
-V reference_genomes_GT.vcf \
-F CHROM -F POS -F REF -F ALT -GF GT \
-o reference_genomes_GT.table
multiple_sample.vcf should also be converted to multiple_sample_GT.table using this approach.
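
For readers without GATK at hand, the idea behind the table can be sketched in plain Python. This is not GATK and its output may differ from what VariantsToTable writes: the sketch copies each GT string exactly as it appears in the VCF, and the "SAMPLE.GT" column naming is an assumption. It also assumes an uncompressed VCF with a FORMAT column and at least one sample. The function name and the sample VCF are hypothetical.

```python
def vcf_to_gt_table(vcf_text):
    """Build a tab-delimited table of CHROM, POS, REF, ALT and per-sample GT."""
    rows = []
    for line in vcf_text.splitlines():
        # Skip meta-information lines and blank lines.
        if line.startswith("##") or not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at the tenth column of the header line.
            samples = fields[9:]
            rows.append(["CHROM", "POS", "REF", "ALT"]
                        + [name + ".GT" for name in samples])
            continue
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        # The FORMAT column says where GT sits inside each sample field.
        keys = fields[8].split(":")
        gt_index = keys.index("GT")
        genotypes = [sample.split(":")[gt_index] for sample in fields[9:]]
        rows.append([chrom, pos, ref, alt] + genotypes)
    return "\n".join("\t".join(row) for row in rows) + "\n"


# Hypothetical two-sample VCF, for illustration only.
vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t1/1:9\n"
)
print(vcf_to_gt_table(vcf), end="")
```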