Extract FASTA sequences with IDs listed in another file
# IDs are in test.txt, one ID per line
# Sequences are in test.fa
grep -w -A 2 -f test.txt test.fa --no-group-separator
# seqtk
seqtk subseq test.fa test.txt
# faSomeRecords
faSomeRecords in.fa listFile out.fa
# seqkit
seqkit grep -n -f list.txt sequences.fas > newfile2.fas
903 days ago
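When none of the tools above is installed, the same selection can be approximated with plain awk. The following is a minimal sketch, not a replacement for seqtk or seqkit: the function name `extract_by_id` and the sample data are made up for illustration, and it assumes the record ID is the first word after `>`. Unlike the `grep -A` approach it keeps every sequence line of a matching record, however many there are.

```shell
# Hypothetical helper: print the FASTA records whose ID (first word after ">")
# appears in an ID list with one ID per line.
extract_by_id() {
    # $1 = ID list, $2 = FASTA file
    awk 'NR == FNR { if ($1 != "") want[$1] = 1; next }        # first file: remember the IDs
         /^>/      { id = substr($1, 2); keep = (id in want) } # header: decide whether to keep the record
         keep' "$1" "$2"
}

# Tiny demo on made-up data.
tmp=$(mktemp -d)
printf 'seq1\nseq3\n' > "$tmp/test.txt"
printf '>seq1 first\nACGT\nACGT\n>seq2\nGGGG\n>seq3\nTTTT\n' > "$tmp/test.fa"
extract_by_id "$tmp/test.txt" "$tmp/test.fa"
```

The demo prints the records for seq1 (both sequence lines) and seq3, and skips seq2. The sketch assumes a non-empty ID list.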
Extract FASTA headers with IDs
# Extract all the FASTA header names with certain IDs
kraken --db ../../../../DATABASE/minikraken_20171019_8GB.tgz out.fa
more out.fa_class.txt | grep "227859" | awk '{print $2}' >...
891 days ago
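A plain `grep "227859"` matches that number anywhere on the line, including inside a longer number. As a hedged alternative, the sketch below compares one field exactly. It assumes a tab-separated classification file with the sequence ID in column 2 and the taxonomy ID in column 3. That layout is an assumption about `out.fa_class.txt`, the function name `ids_for_taxid` is hypothetical, and the sample rows are invented.

```shell
# Hypothetical helper: print column 2 of every row whose column 3 equals the given taxonomy ID.
ids_for_taxid() {
    # $1 = taxonomy ID, $2 = tab-separated classification file (assumed layout)
    awk -F'\t' -v taxid="$1" '$3 == taxid { print $2 }' "$2"
}

# Made-up sample rows: status, sequence ID, taxonomy ID, length, mapping.
tmp=$(mktemp -d)
printf 'C\tread1\t227859\t150\t0:120\n'  > "$tmp/class.txt"
printf 'C\tread2\t9606\t150\t0:120\n'   >> "$tmp/class.txt"
printf 'U\tread3\t0\t150\t0:120\n'      >> "$tmp/class.txt"
printf 'C\tread4\t227859\t150\t0:120\n' >> "$tmp/class.txt"
ids_for_taxid 227859 "$tmp/class.txt"
```

On the sample rows this prints read1 and read4.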
Bash script to extract sequences by ID
Use a Perl one-liner, grep and seqtk subseq to extract the desired FASTA sequences:
# Create test...ids_gene_ids.tsv
# Select IDs that correspond to the desired gene IDs:
grep -f gene_ids.txt ids_gene_ids....
838 days ago
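The selection step can also be written as an exact lookup instead of a pattern match. This is only a sketch under stated assumptions: the mapping file is taken to be a two-column, tab-separated table with the sequence ID first and the gene ID second, which the truncated snippet does not confirm. The function name `select_ids` and the sample rows are invented.

```shell
# Hypothetical helper: print the sequence IDs whose gene ID appears in the gene ID list.
select_ids() {
    # $1 = gene ID list (one per line), $2 = TSV assumed to hold: sequence ID <TAB> gene ID
    awk -F'\t' 'NR == FNR { want[$1] = 1; next }   # first file: remember the gene IDs
                ($2 in want) { print $1 }          # second file: exact match on column 2
               ' "$1" "$2"
}

# Made-up demo data.
tmp=$(mktemp -d)
printf 'geneB\n' > "$tmp/gene_ids.txt"
printf 'seq1\tgeneA\nseq2\tgeneB\nseq3\tgeneBB\n' > "$tmp/ids_gene_ids.tsv"
select_ids "$tmp/gene_ids.txt" "$tmp/ids_gene_ids.tsv"
```

Here only seq2 is printed. A substring match such as `grep -f` without `-w` would also pick up the geneBB row. The resulting ID list can then be passed to a tool like `seqtk subseq`.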
Bash script to split a multi-FASTA file
...GIN {n=0;} /^>/ {if(n%500==0){file=sprintf("chunk%d.fa",n);} print >> file; n++; next;} { print >> file; }' < multi.fa
# OR
awk -v chunksize=$(grep ">" multi.fasta -c) 'BEGIN{n=...
837 days ago
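Since both commands above are truncated, here is a self-contained sketch of the same idea: start a new output file every N headers. It is not the original script. The function name `split_fasta`, its parameters and the demo data are assumptions made for illustration, and the chunks are numbered 0, 1, 2, ... rather than by record count.

```shell
# Hypothetical helper: write a multi-FASTA file into chunks of a fixed number of records.
split_fasta() {
    # $1 = multi-FASTA file, $2 = records per chunk, $3 = output prefix
    awk -v size="$2" -v prefix="$3" '
        /^>/ {
            if (n % size == 0) {                  # time to start a new chunk
                if (file != "") close(file)       # avoid running out of open files
                file = sprintf("%s%d.fa", prefix, n / size)
            }
            n++
        }
        file != "" { print > file }               # headers and sequence lines go to the current chunk
    ' "$1"
}

# Demo: five made-up records, two per chunk.
tmp=$(mktemp -d)
for i in 1 2 3 4 5; do printf '>seq%s\nACGT\n' "$i"; done > "$tmp/multi.fa"
split_fasta "$tmp/multi.fa" 2 "$tmp/chunk"
grep -c '^>' "$tmp"/chunk*.fa
```

In the demo, chunk0.fa and chunk1.fa receive two records each and chunk2.fa receives the remaining one. Existing chunk files with the same names are overwritten.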
Bash script to find differences between two files
# Lines that exist only in file2:
grep -Fxvf file1 file2 > file3
# Lines that exist only in file1:
grep -Fxvf file2 file1 > file3
# Lines that exist in both files:
grep -Fxf file1 file2 > file3
787 days ago
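A small worked example may make the three commands easier to check. The file contents are made up, and the sketch writes each result to its own file (names chosen here for illustration) so that one command does not overwrite the output of another.

```shell
# Made-up input: file1 holds a, b, c and file2 holds b, c, d.
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/file1"
printf 'b\nc\nd\n' > "$tmp/file2"

# -F: fixed strings, -x: whole-line match, -v: invert, -f: read patterns from a file.
grep -Fxvf "$tmp/file1" "$tmp/file2" > "$tmp/only_in_file2"   # contains: d
grep -Fxvf "$tmp/file2" "$tmp/file1" > "$tmp/only_in_file1"   # contains: a
grep -Fxf  "$tmp/file1" "$tmp/file2" > "$tmp/in_both"         # contains: b and c

cat "$tmp/only_in_file2" "$tmp/only_in_file1" "$tmp/in_both"
```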
Bash command to explore the GenBank assembly summary
wget https://ftp.ncbi.nlm.nih.gov/genomes/genbank/assembly_summary_genbank.txt
pip3 install csvkit
csvcut -t -K 1 -c 'excluded_from_refseq' assembly_summary_genbank.txt \
  | tail -n +2 | tr ";" "\n" \
  | sed -e 's/^ //' -e 's/ $//' | grep -v '""' \
  | sort | uniq -c | sort -nr
782 days ago
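For readers without csvkit, the counting step can be sketched with awk alone. This is an approximation under assumptions, not an exact equivalent of the pipeline above: it looks for a header row containing a field named exactly `excluded_from_refseq`, skips everything before it, and splits that column on `;`. The function name `count_reasons` and the sample rows (including the accessions) are invented for the demo.

```shell
# Hypothetical helper: count the ";"-separated values in the excluded_from_refseq column.
count_reasons() {
    # $1 = tab-separated file with a header row naming an excluded_from_refseq column
    awk -F'\t' '
        !col {                                    # still looking for the header row
            for (i = 1; i <= NF; i++) if ($i == "excluded_from_refseq") col = i
            next
        }
        {
            n = split($col, parts, ";")
            for (j = 1; j <= n; j++) {
                gsub(/^ +| +$/, "", parts[j])     # trim surrounding spaces
                if (parts[j] != "") print parts[j]
            }
        }
    ' "$1" | sort | uniq -c | sort -nr
}

# Made-up sample: one comment line, a header row, three data rows.
tmp=$(mktemp -d)
{
    printf '# comment line\n'
    printf 'assembly_accession\texcluded_from_refseq\n'
    printf 'GCA_1\tfragmented assembly\n'
    printf 'GCA_2\tderived from metagenome; fragmented assembly\n'
    printf 'GCA_3\t\n'
} > "$tmp/summary.txt"
count_reasons "$tmp/summary.txt"
```

On the sample, "fragmented assembly" is counted twice and "derived from metagenome" once. The row with an empty value contributes nothing.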