Sequence ID conversion files
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
Name Size Date Modified
ARCHIVE/ 02/01/2020, 05:30:00
ASN_BINARY/ 03/07/2020, 07:49:00
GENE_INFO/ 03/07/2020, 07:48:00
0...
1407 days ago
Reformat the multifasta for sequence length
# awk one-liner to reformat the multifasta sequences
awk '!/^>/ {printf "%s", $0; n = "\n"} /^>/ {print n $0; n = ""}' file.fasta | fold -w 100
1397 days ago
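For readers who prefer Python, the same idea (join each record's sequence into one string, then wrap it at a fixed width) can be sketched as follows. This is an illustration only: the function names `rewrap_fasta` and `_emit` are hypothetical, and unlike `fold` this sketch never wraps long header lines and drops any text that appears before the first header.

```python
from typing import Iterable, List, Optional


def _emit(header: str, chunks: List[str], width: int) -> List[str]:
    """Return the header followed by the joined sequence cut into `width`-column lines."""
    seq = "".join(chunks)
    out = [header]
    out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return out


def rewrap_fasta(lines: Iterable[str], width: int = 100) -> List[str]:
    """Linearize every record's sequence, then re-wrap it at `width` columns."""
    out: List[str] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                out.extend(_emit(header, chunks, width))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        out.extend(_emit(header, chunks, width))
    return out


if __name__ == "__main__":
    demo = [">a", "ACGT", "AC", ">b", "GG"]
    print("\n".join(rewrap_fasta(demo, width=4)))
```

Passing an open file handle as `lines` works as well, since the function only iterates over it.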
Bash script to handle Multifasta files
# Convert all lowercase residues to uppercase in a FASTA sequence file
$ awk 'BEGIN{FS=" "}{if(!/>/){print...er($0)}else{print $1}}' input.fasta > output.fasta
# Rearrange FASTA sequences according to their length...
1362 days ago
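The excerpt above is truncated, so the following Python sketch only illustrates the two tasks its comments describe: upper-casing residues and ordering records by length. It assumes the elided awk call upper-cases each sequence line, as the comment says, and it mirrors `print $1` by keeping only the first whitespace-delimited token of each header. The sort direction is not visible in the excerpt, so `longest_first` is a hypothetical parameter, as are all function names here.

```python
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, str]  # (header line including ">", sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA lines into (header, sequence) pairs, joining wrapped sequence lines."""
    records: List[Record] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def uppercase_records(records: List[Record]) -> List[Record]:
    """Upper-case each sequence; trim each header to its first token (like awk's $1)."""
    return [(header.split()[0], seq.upper()) for header, seq in records]


def sort_by_length(records: List[Record], longest_first: bool = False) -> List[Record]:
    """Order records by sequence length; the direction is an assumption, not from the source."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=longest_first)


if __name__ == "__main__":
    recs = read_fasta([">s1 desc", "acgt", "ac", ">s2", "gg"])
    for header, seq in sort_by_length(uppercase_records(recs)):
        print(header)
        print(seq)
```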
1345 days ago
1208 days ago
Python script to read FASTA and FASTQ files
    ..._file (str): Path to FASTA/Q file.
    """
    with FastxFile(fasta_q_file) as fh:
        for entry in fh:
            sequence_id = entry.name
            sequence = entry.sequence
982 days ago
Remove duplicates in multifasta file
# Using seqkit for duplicate sequence removal
seqkit rmdup -n seqs.fa -o seqs_without_duplicate.fa
# Awk for duplicate sequence removal
awk '/^>/ { f = !a[$0]++ } f' seqs.fa
981 days ago
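The awk one-liner keeps a record only the first time its full header line is seen. A minimal Python sketch of that logic is shown below; the function name `dedupe_by_header` is hypothetical. Like the awk version, it compares header lines only, so identical sequences under different names are both kept, and anything before the first header is dropped.

```python
from typing import Iterable, List, Set


def dedupe_by_header(lines: Iterable[str]) -> List[str]:
    """Keep each record only the first time its full header line appears."""
    seen: Set[str] = set()
    keep = False  # lines before the first header are dropped, as in the awk version
    out: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Decide once per record; the flag then applies to its sequence lines.
            keep = line not in seen
            seen.add(line)
        if keep:
            out.append(line)
    return out


if __name__ == "__main__":
    demo = [">a", "AC", ">b", "GG", ">a", "TT"]
    print("\n".join(dedupe_by_header(demo)))
```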
One-liner to convert lower-case sequence to N-masked sequence
perl -pe '/^[^>]/ and $_=~ s/[a-z]/N/g' genomic.fna > genomic.N-masked.fna
awk '{if(/^[^>]/)gsub(/[a-z]/,"N");print $0}' genomic.fna > genomic.N-masked.fna
955 days ago
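Both one-liners replace every lower-case letter on non-header lines with `N` and leave header lines untouched. A rough Python equivalent might look like this; the name `mask_lowercase` is hypothetical, and the sketch returns a list of lines rather than writing a file.

```python
import re
from typing import Iterable, List

_LOWER = re.compile(r"[a-z]")


def mask_lowercase(lines: Iterable[str]) -> List[str]:
    """Replace lower-case letters with N on sequence lines; headers pass through unchanged."""
    out: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            out.append(line)
        else:
            out.append(_LOWER.sub("N", line))
    return out


if __name__ == "__main__":
    demo = [">chr1 desc", "ACgtnNAC"]
    print("\n".join(mask_lowercase(demo)))
```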
Perl script for Smith-Waterman Algorithm
# Smith-Waterman Algorithm
# usage statement
die "usage: $0 \n" unless @ARGV == 2;
# get sequences from command line
my ($seq1, $seq2) = @ARGV;
# scoring scheme
my $MATCH...
955 days ago
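Since the Perl excerpt stops at the scoring scheme, here is a small Python sketch of Smith-Waterman local alignment for comparison. It is not the script above: the scores (match +1, mismatch -1, linear gap -1) are assumptions, because the original values are cut off, and the function name `smith_waterman` is hypothetical. When several cells share the best score, the sketch traces back from the first one found in row-major order.

```python
from typing import List, Tuple


def smith_waterman(seq1: str, seq2: str, match: int = 1,
                   mismatch: int = -1, gap: int = -1) -> Tuple[int, str, str]:
    """Return (best score, aligned seq1, aligned seq2) for one best local alignment."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; local alignment floors every cell at zero.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,  # align the two residues
                              score[i - 1][j] + gap,      # gap in seq2
                              score[i][j - 1] + gap)      # gap in seq1
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    a1: List[str] = []
    a2: List[str] = []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            a1.append(seq1[i - 1])
            a2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            a1.append(seq1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


if __name__ == "__main__":
    print(smith_waterman("TTACGTT", "GGACGGG"))
```

The matrix uses O(len(seq1) × len(seq2)) memory, which is fine for short sequences but not for whole chromosomes.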
955 days ago