BOL: Related items

Reverse Complement Problem Solved with Perl

Jit — Tue, 09 Jun 2015 23:37:23 -0500

Question at http://rosalind.info/problems/1b/

#Find the reverse complement of a DNA string.
#Given: A DNA string Pattern.
#Return: the reverse complement of Pattern.

use strict;
use warnings;

my $string = "AAAACCCGGT";
my $finalString = "";

# Complement of each DNA base.
my %complement = (
   "A" => "T",
   "C" => "G",
   "G" => "C",
   "T" => "A",
);

# Walk the string left to right and prepend each complemented base,
# so the result comes out reversed.
for my $i (0 .. length($string) - 1) {
   my $char = substr $string, $i, 1;
   $finalString = $complement{$char} . $finalString;
}

print "$finalString\n";

Clump Finding Problem Solved with Perl

Jit — Wed, 10 Jun 2015 00:17:17 -0500

The question at http://rosalind.info/problems/1d/

The script has moved to http://bioinformaticsonline.com/snippets/view/34633/clump-finding-problem-solved-with-perl

Finding Patterns in Biological Sequences

Jit — Thu, 22 Dec 2016 10:30:49 -0600

In this report we provide an overview of known techniques for discovering patterns in biological sequences (DNA and proteins). We also provide the biological motivation and methods for biological verification of such patterns. Finally, we list publicly available tools and databases for pattern discovery. An online supplement is available at http://genetics.uwaterloo.ca/~tvinar/cs798g/motif.

Address of the bookmark: http://engr.case.edu/li_jing/papers/00798gpattern.pdf

MinION_GC: An R script to do some QC on MinION data

Radha Agarkar — Sun, 03 Dec 2017 15:19:18 -0600

Other tools focus on getting data out of the fastq or fast5 files, which is slow and computationally intensive. The benefit of this approach is that it works on a single, small, .txt summary file. So it's a lot quicker than most other things out there: it takes about a minute to analyse a 4GB flowcell on my laptop.

https://github.com/roblanf/minion_qc

Address of the bookmark: https://github.com/roblanf/minion_qc

Perl one-liner for beginners !

BioStar — Fri, 24 Jul 2020 05:58:28 -0500

I often use the following arguments to perl:

-e Executes the given line of code instead of a script file
-n Wraps your code in a loop over the input lines, read via the diamond operator (files named on the command line, or stdin). Nothing is printed unless your code prints it
-p Like -n, but also prints $_ at the end of each iteration

This counts the number of quotation marks in each line and prints it

perl -ne '$cnt = tr/"//;print "$cnt\n"' inputFileName.txt

Prepend a string to each line, followed by a tab

perl -pe 's/(.*)/string\t$1/' inFile > outFile

Append a newline to each line

perl -pe 's/$/\n/' all.sent.classOnly > all.sent.classOnly.sep

Replace all occurrences of pattern1 (e.g. [0-9]) with pattern2

perl -p -i.bak -w -e 's/pattern1/pattern2/g' inputFile

Go through file and only print words that do not have any uppercase letters.

perl -ne 'print unless m/[A-Z]/' allWords.txt > allWordsOnlyLowercase.txt

Go through file, split line at each space and print words one per line.

perl -lne 'print join("\n", split(/ /, $_))' someText.txt > wordsPerLine.txt

Delete every character that is not a letter, whitespace or line end (replace with nothing)

perl -pe 's/[^a-zA-Z\s]+//g' text_withSpecial.txt > text_lettersOnly.txt

Convert uppercase letters to lowercase

perl -pe 'tr/A-Z/a-z/' textWithUpperCase.txt > textwithoutuppercase.txt

Print only the second column of the data, using tab as the separator

perl -lne '@F = split("\t", $_); print $F[1]' columnFileWithTabs.txt > justSecondColumn.txt

One-Liner: Sort lines by their length

perl -e 'print sort {length $a <=> length $b} <>' textFile

One-Liner: Print second column, unless it contains a number

perl -lane 'print $F[1] unless $F[1] =~ m/[0-9]/' wordCounts.txt

A simple tutorial for a complex ComplexHeatmap

Neel — Fri, 02 Apr 2021 06:18:32 -0500

ComplexHeatmap (Gu, Eils, and Schlesner (2016)) is an R Programming Language (R Core Team (2020)) package that is currently listed in the Bioconductor package repository.

Install and load required packages

  require(RColorBrewer)
  require(ComplexHeatmap)
  require(circlize)
  require(digest)
  require(cluster)

If all load successfully, proceed to Part 3. Otherwise, go through the following code chunks in order to ensure that each package is installed and loaded properly.

BiocManager (Morgan (2019))

Address of the bookmark: https://github.com/kevinblighe/E-MTAB-6141

SViper: Swipe your Structural Variants called on long (ONT/PacBio) reads with short exact (Illumina) reads.

Neel — Sun, 22 Dec 2019 03:48:28 -0600

Call sviper

~$ ./sviper -s short-reads.bam -l long-reads.bam -r ref.fa -c variants.vcf -o polished_variants

This will output a polished_variants.vcf file that contains all the refined variants.

Sometimes it is helpful to look at the polished sequence, e.g. with the IGV browser. In that case you want SViper to output the polished and aligned sequences in a bam file via the option --output-polished-bam:

~$ ./sviper -s short-reads.bam -l long-reads.bam -r ref.fa -c variants.vcf -o polished_variants --output-polished-bam

Address of the bookmark: https://github.com/smehringer/SViper

Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications.

Jit — Mon, 28 May 2018 09:41:39 -0500

Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. Manta discovers, assembles and scores large-scale SVs, medium-sized indels and large insertions within a single efficient workflow.

Address of the bookmark: https://github.com/Illumina/manta

vt: a variant tool set that discovers short variants from Next Generation Sequencing data.

Jit — Tue, 28 Jan 2020 03:44:43 -0600

vt is a variant tool set that discovers short variants from Next Generation Sequencing data.

https://genome.sph.umich.edu/wiki/Vt

https://github.com/atks/vt

Address of the bookmark: https://genome.sph.umich.edu/wiki/Vt

karyoploteR: plot whole genomes with arbitrary data

Abhimanyu Singh — Fri, 02 Feb 2018 03:24:28 -0600

karyoploteR is an R package to create karyoplots, that is, representations of whole genomes with arbitrary data plotted on them. It is inspired by the R base graphics system and does not depend on other graphics packages. The aim of karyoploteR is to offer the user an easy way to plot data along the genome, giving a broad genome-wide view that facilitates the identification of genome-wide relations and distributions.

Address of the bookmark: https://bernatgel.github.io/karyoploter_tutorial/