1400 days ago
Reformat the multifasta for sequence length !
# awk one-liner to reformat multi-FASTA sequences (join wrapped lines, then re-wrap at 100 columns)
awk '!/^>/ {printf "%s", $0; n = "\n"} /^>/ {print n $0; n = ""} END {printf "%s", n}' file.fasta | fold -w 100
1390 days ago
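The same idea can be sketched in Python, for cases where awk is not handy. This is a minimal sketch, not the original method: the names `read_fasta` and `rewrap_fasta` are made up for illustration, and unlike `fold` it wraps only sequence lines, never headers.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def rewrap_fasta(lines: Iterable[str], width: int = 100) -> List[str]:
    """Return FASTA lines with every sequence re-wrapped to `width` columns."""
    out: List[str] = []
    for header, seq in read_fasta(lines):
        out.append(header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return out
```

Usage: `print("\n".join(rewrap_fasta(open("file.fasta"))))`, where `file.fasta` is the input name used in the one-liner above.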
Bash script to handle Multifasta files
...case in a FASTA sequence file
$ awk 'BEGIN{FS=" "}{if(!/>/){print toupper($0)}else{print $1}}' input.fasta > output.fasta
# Rearrange FASTA sequences according to their length...
1355 days ago
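For reference, the per-line logic of that awk command can be written out in Python. This is a rough equivalent under the same rule as the awk test (any line containing `>` is treated as a header); the function name `clean_fasta_line` is hypothetical.

```python
def clean_fasta_line(line: str) -> str:
    """Upper-case sequence lines; cut header lines down to their first field."""
    line = line.rstrip("\r\n")
    if ">" in line:
        # Same as awk's $1 with whitespace splitting: keep ">id", drop the description.
        fields = line.split()
        return fields[0] if fields else ""
    return line.upper()
```

Usage: apply it to every line of `input.fasta` and write the results to `output.fasta`, one per line.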
1338 days ago
Corona variant calling steps !
...-o results/bam/$basename.sorted.bam results/bam/$basename.aligned.bam
bcftools mpileup -O b -o results/bcf/$basename.bcf -f data/ref_genome/sequences.fasta results/bam/$basename.s...
1043 days ago
Tadpole is 250x faster than SPAdes assembler !
...6, 2018
Description: Uses kmer counts to assemble contigs, extend sequences, or error-correct reads. T...
contig: Make contigs from kmers.
extend: Extend sequences to be longer, and optionally...
974 days ago
Perl script for Smith-Waterman Algorithm
# Smith-Waterman Algorithm
# usage statement
die "usage: $0 \n" unless @ARGV == 2;
# get sequences from command line
my ($seq1, $seq2) = @ARGV;
# scoring scheme
my $MATCH...
948 days ago
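Since the excerpt cuts off at the scoring scheme, here is a compact Python sketch of Smith-Waterman local alignment to show the overall shape. It is not a port of the Perl script: the scores (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration, and the function name is made up.

```python
from typing import Tuple


def smith_waterman(seq1: str, seq2: str, match: int = 1,
                   mismatch: int = -1, gap: int = -1) -> Tuple[int, str, str]:
    """Return (best score, aligned seq1, aligned seq2) for the best local alignment."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; local alignment never lets a cell drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,  # match / mismatch
                              score[i - 1][j] + gap,      # gap in seq2
                              score[i][j - 1] + gap)      # gap in seq1
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    a1, a2 = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            a1.append(seq1[i - 1]); a2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            a1.append(seq1[i - 1]); a2.append("-")
            i -= 1
        else:
            a1.append("-"); a2.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))
```

Usage: `smith_waterman("TTACGTAA", "GGACGTCC")` aligns the shared `ACGT` core. Swap in other score values to see how the alignment changes.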
948 days ago
Command line to download blast database / protein
...//ftp.ncbi.nlm.nih.gov/blast/db/ https://ftp.ncbi.nlm.nih.gov/blast/db/
# Database detail / description
nr.*tar.gz | Non-redundant protein sequences from GenPept, Swissprot, PIR,...
941 days ago
Extract fasta sequences with ids in another file !
# IDs are in test.txt, one ID per line
# Sequences are in test.fa
grep -w -A 2 -f test.txt test.fa --no-group-separator
# seqtk
seqtk subseq test.fa test.txt
# faSomeRecords
faSomeRecords in.fa listFile out.fa
# seqkit
seqkit grep -n -f list.txt sequences.fas > newfile2.fas
887 days ago
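If none of those tools is installed, the same filtering can be sketched in plain Python. Assumptions here: an ID is the first whitespace-delimited token after `>`, and the function name `extract_by_id` is made up. Unlike `grep -A 2`, this keeps every sequence line of a matching record, however many lines it spans.

```python
from typing import Iterable, List


def extract_by_id(fasta_lines: Iterable[str], wanted_ids: Iterable[str]) -> List[str]:
    """Return the FASTA lines of records whose ID appears in wanted_ids."""
    wanted = {w.strip() for w in wanted_ids if w.strip()}
    out: List[str] = []
    keep = False
    for raw in fasta_lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Decide once per record, based on the first token of the header.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep and line:
            out.append(line)
    return out
```

Usage: `extract_by_id(open("test.fa"), open("test.txt"))` returns the matching records as a list of lines, ready to write to a new file.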