My most common one-liner in bioinformatics

One-liner to compute the sequence length distribution of a FASTQ file:

cat reads.fastq | awk '{if(NR%4==2) print length($1)}' | sort -n | uniq -c > read_length.txt
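The same counts can also be gathered in a single awk pass, so that only the distinct lengths need sorting rather than one line per read. This is a minimal sketch, assuming an uncompressed FASTQ with exactly four lines per record; the function name read_length_hist is made up for illustration and is not part of the original one-liner.

```shell
# Hypothetical helper (not from the original post): prints "count length"
# pairs, one per distinct read length, sorted by length.
read_length_hist() {
    # Line 2 of every 4-line FASTQ record holds the sequence.
    awk 'NR % 4 == 2 { count[length($0)]++ }
         END { for (len in count) print count[len], len }' "$1" |
        sort -k2,2n   # awk array order is unspecified, so sort by length
}
```

Usage would be read_length_hist reads.fastq > read_length.txt. The column order (count, then length) matches the output of uniq -c, only without the leading spaces.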

Plot synteny

/home/urbe/Tools/Sibelia-3.0.6-Linux/bin/Sibelia -s far --gff -q -o palCh101 ch101_read33155_template_pass_FAH31515.fa


/home/urbe/Tools/SATSUMA/satsuma-code-0/SatsumaSynteny -t ../allPalindromeSimulated.fa -q MIRAsimulation_out.padded_CORRECTED.fasta -n 2 -m 8 -o output_directory

MicroSyntenyPlot: change into the output folder of SatsumaSynteny and run this:

/home/urbe/Tools/SATSUMA/satsuma-code-0/MicroSyntenyPlot -s 8000 -i xcorr_aligns.seeds.out -o seeds.ps

And to quickly plot the read length distribution (read_length.txt) in R:

reads<-read.csv(file="read_length.txt", sep="", header=FALSE)
plot(reads$V2,reads$V1,type="l",xlab="read length",ylab="occurrences",col="blue")
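For a quick numeric check next to the plot, the total read count and mean read length can be derived from read_length.txt directly. This is a rough sketch, assuming the file holds "count length" pairs as written by uniq -c; the function name read_length_summary is hypothetical.

```shell
# Hypothetical helper (not from the original post): summarises a
# "count length" table such as read_length.txt.
read_length_summary() {
    # $1 of each row is the number of reads, $2 is the read length.
    awk '{ n += $1; bases += $1 * $2 }
         END { if (n > 0) printf "reads=%d mean_length=%.1f\n", n, bases / n }' "$1"
}
```

Usage would be read_length_summary read_length.txt. An empty input file prints nothing.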

Bwise command
Bwise.py -x GC047405.170925_ARC.fa -c 1 -p 5 -K 249 -t 40 -o assembly


BLAST commands
makeblastdb -in vaga.fa -parse_seqids -dbtype nucl -out vagaDB

blastn -task megablast -query av1.fa -db vagaDB -evalue 1e-5 -num_threads 10 -max_target_seqs 1 -outfmt '6 qseqid qstart qend sseqid sstart send evalue length frames qcovs bitscore' -out seeAV15.megablast
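With the -outfmt string used above, the tab-separated output has eleven columns and qcovs (query coverage per subject) is the tenth. A possible follow-up is to keep only hits above a coverage threshold. This is a sketch under that column-order assumption; the function name filter_by_qcovs and the threshold of 80 are illustrative choices, not part of the original commands.

```shell
# Hypothetical helper (not from the original post): keep rows whose qcovs
# (column 10 for the -outfmt string used above) is at least the given minimum.
filter_by_qcovs() {
    # $1 = BLAST tabular file, $2 = minimum query coverage in percent
    awk -F'\t' -v min="$2" '$10 + 0 >= min + 0' "$1"
}
```

Usage would be filter_by_qcovs seeAV15.megablast 80 > seeAV15.qcov80.megablast. If the -outfmt column list changes, the column index has to change with it.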