Sequence IDs conversion files
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
Name            Size    Date Modified
ARCHIVE/                02/01/2020, 05:30:00
ASN_BINARY/             03/07/2020, 07:49:00
GENE_INFO/              03/07/2020, 07:48:00
0...
1415 days ago
Reformat the multifasta to a fixed sequence line length
#awk one-liner to reformat the multifasta sequences
awk '!/^>/ {printf "%s", $0; n = "\n"} /^>/ {print n $0; n = ""}' file.fasta | fold -w 100
1405 days ago
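The same unwrap-then-rewrap idea can be sketched in Python. This is a minimal illustration, not the post's own code: the function name `rewrap_fasta` and the `width` parameter are hypothetical. Unlike piping through `fold`, this version never wraps header lines, and it skips blank lines.

```python
def rewrap_fasta(lines, width=100):
    """Yield FASTA lines with each record's sequence re-wrapped to `width` columns.

    `lines` is any iterable of strings (for example an open file handle).
    Header lines (starting with '>') are passed through unchanged.
    """
    chunks = []  # sequence pieces collected for the current record

    def flush():
        # Join the collected pieces and emit them in fixed-width slices.
        seq = "".join(chunks)
        for start in range(0, len(seq), width):
            yield seq[start:start + width]

    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield from flush()   # finish the previous record, if any
            chunks.clear()
            yield line
        elif line:
            chunks.append(line)
    yield from flush()           # finish the last record
```

A possible use, assuming a file named file.fasta: open it, iterate over `rewrap_fasta(handle, width=100)`, and print each yielded line.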
Bash script to handle Multifasta files
#Convert all lowercase residues to uppercase in a FASTA sequence file
$ awk 'BEGIN{FS=" "}{if(!/>/){print toupper($0)}else{print $1}}' input.fasta > output.fasta
#Rearrange FASTA sequences...
1370 days ago
Python script to read FASTA and FASTQ files
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from pysam import FastxFile
def read_fasta_...stxFile(fasta_q_file) as fh:
    for entry in fh:
        sequence_id = entry.name
        sequence...
990 days ago
Remove duplicates in multifasta file
#Using seqkit for duplicate sequence removal
seqkit rmdup -n seqs.fa -o seqs_without_duplicate.fa
#Awk for duplicate sequence removal
awk '/^>/ { f = !a[$0]++ } f' seqs.fa
989 days ago
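For readers who prefer Python, the awk filter above can be mirrored roughly as follows. This is only a sketch: the function name `remove_duplicate_records` is hypothetical, and, like the awk version, it treats two records as duplicates when their full header lines are identical, keeps the first occurrence, and does not compare the sequences themselves.

```python
def remove_duplicate_records(lines):
    """Yield FASTA lines, dropping every record whose header line was seen before."""
    seen = set()   # header lines encountered so far
    keep = False   # whether the current record should be emitted
    for line in lines:
        if line.startswith(">"):
            keep = line not in seen
            seen.add(line)
        if keep:
            yield line
```

Because the header is compared as a whole line, two records with the same ID but different descriptions are both kept.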
One-liner to convert lowercase sequence to sequence masked with Ns
perl -pe '/^[^>]/ and $_=~ s/[a-z]/N/g' genomic.fna > genomic.N-masked.fna
awk '{if(/^[^>]/)gsub(/[a-z]/,"N");print $0}' genomic.fna > genomic.N-masked.fna
963 days ago
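An equivalent in Python might look like the sketch below. The function name `mask_lowercase` is hypothetical. As in the one-liners above, header lines are left untouched and every lowercase ASCII letter in a sequence line becomes N.

```python
import re

_LOWERCASE = re.compile(r"[a-z]")

def mask_lowercase(lines):
    """Yield FASTA lines with lowercase residues in sequence lines replaced by 'N'."""
    for line in lines:
        if line.startswith(">"):
            yield line                        # header: leave as-is
        else:
            yield _LOWERCASE.sub("N", line)   # sequence: mask lowercase letters
```

For example, the sequence line `ACgtNNac` becomes `ACNNNNNN`, while a header such as `>chr1 test` is unchanged.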
Perl script for Smith-Waterman Algorithm
# Smith-Waterman Algorithm
# usage statement
die "usage: $0 <sequence 1> <sequence 2>\n" unless @ARGV == 2;
# get sequences from command line...
963 days ago
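Since the Perl preview is cut off, here is a compact Python sketch of Smith-Waterman local alignment for orientation. It is not a port of the script above. The scoring values (match +1, mismatch -1, linear gap -1) and the function name `smith_waterman` are assumptions chosen for illustration, and when several alignments tie for the best score only one is returned.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Return (best_score, aligned_seq1, aligned_seq2) for one best local alignment."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; every cell is floored at 0, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align seq1[i-1] with seq2[j-1]
                score[i - 1][j] + gap,      # gap in seq2
                score[i][j - 1] + gap,      # gap in seq1
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    aligned1, aligned2 = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            aligned1.append(seq1[i - 1])
            aligned2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(aligned1)), "".join(reversed(aligned2))
```

With these assumed scores, `smith_waterman("TTACGTT", "GGACGGG")` finds the shared `ACG` segment. Real analyses usually use a substitution matrix and affine gap penalties instead of these toy values.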