1459 days ago
Sequence ID conversion files !
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
Name Size Date Modified
ARCHIVE/ 02/01/2020, 05:30:00
ASN_BINARY/ 0...01:32:00
770 kB 03/07/2020, 14:38:00
special_requests/ 18/04/2020, 00:...m.nih.gov/gene/DATA/gene2go.gz
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2...
1459 days ago
One-liner to split a multi-FASTA file into single-FASTA files !
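#Split a multi-FASTA file into one FASTA file per record.
#Each output file is named after the header text between ">" and the first ":".
#The three-argument match() is a gawk extension, so this needs gawk.
awk '/^>/ { match($1, /^>([^:]+)/, id); filename = id[1] } { print >> (filename ".fa") }' sequence.fasta
1457 days ago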
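For systems where awk is not gawk, the same split can be sketched without the three-argument match(). This is a minimal sketch, not the original author's method: `split_fasta` is a hypothetical helper name, the toy file and its contents are made up, and the naming rule (header text up to the first colon) is assumed to match the one-liner above.

```shell
# split_fasta FILE OUTDIR: write one FASTA file per record into OUTDIR.
# (split_fasta is a hypothetical helper name used for illustration.)
split_fasta() {
    mkdir -p "$2"
    awk -v dir="$2" '
        /^>/ {
            if (out != "") close(out)   # keep the number of open files small
            id = substr($1, 2)          # first header word without the leading ">"
            sub(/:.*/, "", id)          # keep only the part before the first colon
            out = dir "/" id ".fa"
        }
        out != "" { print >> out }
    ' "$1"
}

# Demo on a made-up two-record file
tmp=$(mktemp -d)
printf '>seq1:chr1 desc\nACGT\nGGCC\n>seq2\nTTAA\n' > "$tmp/toy.fasta"
split_fasta "$tmp/toy.fasta" "$tmp/out"
ls "$tmp/out"
```

Because the files are opened in append mode, rerunning the split into the same directory adds to existing files, so an empty output directory is the safer starting point.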
Get GC content across the entire CDS !
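#Look at GC content across the entire CDS of each transcript.
#genome.fa and annotation.gff are placeholders for your genome FASTA and annotation file.
#Output columns: sequence name, 0, length, GC fraction (seqtk comp columns 4 and 5 are the C and G counts).
gffread -x - -g genome.fa annotation.gff | \
seqtk comp - | \
awk -v OFS="\t" '{ print $1, "0", $2, ($4 + $5) / $2 }'
1450 days ago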
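If seqtk is not available, the per-record GC fraction can be approximated with awk alone once the CDS sequences are in a FASTA file. A rough sketch under stated assumptions: `gc_per_record` is a hypothetical helper name, the toy sequences are made up, and this version counts G and C in either case while every other character (including N) only adds to the length, which may differ from how seqtk treats ambiguous bases.

```shell
# gc_per_record FILE: print "<id> 0 <length> <GC fraction>" (tab-separated) per FASTA record.
# (gc_per_record is a hypothetical helper name used for illustration.)
gc_per_record() {
    awk -v OFS="\t" '
        function emit() { if (id != "") print id, "0", len, (len ? gc / len : 0) }
        /^>/ { emit(); id = substr($1, 2); len = 0; gc = 0; next }
        {
            len += length($0)
            gc  += gsub(/[GCgc]/, "")   # gsub returns how many G/C characters it replaced
        }
        END { emit() }
    ' "$1"
}

# Demo on made-up sequences
tmp=$(mktemp -d)
printf '>tx1\nGGCC\nAATT\n>tx2\nACGT\n' > "$tmp/cds.fa"
gc_per_record "$tmp/cds.fa"
```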
Reformat a multi-FASTA file to a fixed line width !
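#awk one-liner to join each wrapped sequence onto a single line, then rewrap at 100 characters per line.
#The END block adds the newline after the last sequence; note that fold also wraps headers longer than 100 characters.
awk '!/^>/ {printf "%s", $0; n = "\n"} /^>/ {print n $0; n = ""} END {printf "%s", n}' file.fasta | fold -w 100
1449 days ago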
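A quick sanity check after rewrapping is to confirm that every record still has the same sequence length. The sketch below assumes a hypothetical helper `fasta_lengths` and a made-up toy file, and uses a width of 4 instead of 100 so the wrapping is visible on short sequences.

```shell
# fasta_lengths FILE: print "<record id><TAB><sequence length>" for each record.
# (fasta_lengths is a hypothetical helper name used for illustration.)
fasta_lengths() {
    awk '
        /^>/ { if (id != "") print id "\t" len; id = substr($1, 2); len = 0; next }
        { len += length($0) }
        END { if (id != "") print id "\t" len }
    ' "$1"
}

# Demo: rewrap a made-up file at width 4 and compare lengths before and after
tmp=$(mktemp -d)
printf '>a\nACGTAC\nGT\n>b\nTTT\n' > "$tmp/in.fa"
awk '!/^>/ {printf "%s", $0; n = "\n"} /^>/ {print n $0; n = ""} END {printf "%s", n}' "$tmp/in.fa" |
    fold -w 4 > "$tmp/wrapped.fa"
fasta_lengths "$tmp/in.fa"      > "$tmp/before.tsv"
fasta_lengths "$tmp/wrapped.fa" > "$tmp/after.tsv"
cmp -s "$tmp/before.tsv" "$tmp/after.tsv" && echo "lengths unchanged"
```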
Command to sort the bed file !
#Command to sort the bed file by chromosome, then by start position (-V is GNU version sort)
sort -V -k1,1 -k2,2 test.bed
1446 days ago
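Which chromosome order you want depends on the downstream tool: version sort puts chr2 before chr10, while a plain lexicographic sort puts chr10 first, and tools that expect pre-sorted input generally need all files to agree. The sketch below contrasts the two on made-up coordinates; it assumes GNU sort (for the `V` key modifier) and pins the locale so the byte order is predictable.

```shell
# Made-up three-line BED file
tmp=$(mktemp -d)
printf 'chr10\t5\t9\nchr2\t100\t200\nchr2\t20\t30\n' > "$tmp/test.bed"

# Natural chromosome order (chr2 before chr10), numeric start; V is a GNU sort extension
LC_ALL=C sort -k1,1V -k2,2n "$tmp/test.bed" > "$tmp/natural.bed"

# Lexicographic chromosome order (chr10 before chr2), numeric start
LC_ALL=C sort -k1,1 -k2,2n "$tmp/test.bed" > "$tmp/lexical.bed"

cat "$tmp/natural.bed"
cat "$tmp/lexical.bed"
```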
Perl one-liners to print only lines without uppercase letters
#Go through the file and only print lines that do not contain any uppercase letters.
perl -ne 'print unless m/[A-Z]/' dna.fa > dnaOnlyLowercase.fa
#To lowercase everything (tr takes plain ranges, so no square brackets are needed)
perl -pe 'tr/A-Z/a-z/' dnaUpperCase.fa > dnawithoutuppercase.fa
1440 days ago
Commands to get the cluster OS and CPU details !
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic
$ cat /proc/cpuinfo | grep -i 'model name' | head -n 1
model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
1435 days ago
Bash script to get exon fragments from genome files !
#Exons are already defined in the GTF file, so we simply need to print the lines that are marked as exons.
#$4-1 converts the 1-based GTF start to a 0-based BED start; overlapping exons are then merged.
gunzip -c genome_file.gtf.gz | awk 'BEGIN{OFS="\t";} $3=="exon" {print $1,$4-1,$5}' | bedtools sort | bedtools merge -i - | gzip > my_exon.bed.gz
1422 days ago
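To see what the exon pipeline does to the coordinates, here is a toy run that swaps `bedtools sort` and `bedtools merge` for plain `sort` and a short awk merge, so it runs without bedtools installed. This is only an illustration: the GTF lines are made up, and the awk merge is a stand-in that joins overlapping or directly adjacent intervals, not a replacement for bedtools.

```shell
tmp=$(mktemp -d)
# Made-up GTF: one gene, two overlapping exons and one separate exon on chr1
printf 'chr1\tsrc\tgene\t11\t100\t.\t+\t.\tgene_id "g1";\n' >  "$tmp/toy.gtf"
printf 'chr1\tsrc\texon\t11\t30\t.\t+\t.\tgene_id "g1";\n'  >> "$tmp/toy.gtf"
printf 'chr1\tsrc\texon\t21\t50\t.\t+\t.\tgene_id "g1";\n'  >> "$tmp/toy.gtf"
printf 'chr1\tsrc\texon\t81\t100\t.\t+\t.\tgene_id "g1";\n' >> "$tmp/toy.gtf"

# Keep exon lines, convert to 0-based BED starts, sort, then merge overlaps
awk 'BEGIN { FS = OFS = "\t" } $3 == "exon" { print $1, $4 - 1, $5 }' "$tmp/toy.gtf" |
    LC_ALL=C sort -k1,1 -k2,2n |
    awk 'BEGIN { FS = OFS = "\t" }
         # start a new interval when the chromosome changes or there is a gap
         NR == 1 || $1 != chr || $2 + 0 > cur_end {
             if (NR > 1) print chr, cur_start, cur_end
             chr = $1; cur_start = $2 + 0; cur_end = $3 + 0; next
         }
         $3 + 0 > cur_end { cur_end = $3 + 0 }   # extend the current interval
         END { if (NR > 0) print chr, cur_start, cur_end }' > "$tmp/exons.bed"
cat "$tmp/exons.bed"
```

The two overlapping exons (11-30 and 21-50 in GTF coordinates) collapse into a single BED interval starting at 10, and the third exon stays separate.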
Bash script to extract intronic fragments !
...obtain introns, we simply need the gene and exonic coordinates;
#by subtracting the exonic regions from the genic region, we have the intronic...
...$1,$4-1,$5}' | bedtools sort | bedtools subtract -a stdin -b my_exon.bed.gz |...
1422 days ago