2727 days ago
Extract a FASTA sequence range in Perl
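use strict;
use warnings;
use Bio::DB::Fasta;

# Open (and index) the FASTA file once, outside the loop
my $db = Bio::DB::Fasta->new('allGenomeContacted.fa');

# positions.txt holds one "seqName begin end" record per line
open(my $positions, '<', 'positions.txt') or die "Cannot open positions.txt: $!";
while (<$positions>) {
    chomp;
    my ($seqName, $begin, $end) = split(/\s+/);
    my $seq = $db->seq($seqName, $begin => $end);
    print "$seq\n";
}
close($positions);
2502 days ago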
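For comparison, the same lookup can be sketched in plain awk when BioPerl is not available. This is only a minimal sketch: the demo files and the extract_ranges function name are made up for illustration, coordinates are assumed to be 1-based and inclusive, and every sequence is held in memory, whereas Bio::DB::Fasta works from an on-disk index.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo inputs (not from the original snippet)
workdir=$(mktemp -d)
printf '>chr1\nACGTACGTAC\nGGGTTTAAAC\n>chr2\nTTTTCCCCGG\n' > "$workdir/demo.fa"
printf 'chr1\t3\t6\nchr2\t5\t8\nchr1\t9\t12\n' > "$workdir/positions.txt"

extract_ranges() {
    # $1 = FASTA file, $2 = positions file (name, begin, end; 1-based, inclusive)
    awk '
        NR == FNR {                                  # first file: the FASTA
            if (/^>/) { name = substr($1, 2) }       # id = header text up to the first space
            else      { seq[name] = seq[name] $0 }   # join wrapped sequence lines
            next
        }
        ($1 in seq) {                                # second file: the positions
            print substr(seq[$1], $2, $3 - $2 + 1)
        }
    ' "$1" "$2"
}

result=$(extract_ranges "$workdir/demo.fa" "$workdir/positions.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Positions that name a sequence missing from the FASTA file are skipped silently in this sketch.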
2244 days ago
Bash one-liner to extract sequences by id from a multi-FASTA file
# List of ids, one per line, in allIds.txt
$ awk 'BEGIN{while((getline<"allIds.txt")>0)l[">"$1]=1}/^>/{f=!l[$1]}f' seq.fa
# As written, f=!l[$1] drops the listed ids; use f=l[$1] to extract them instead
1572 days ago
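A small worked example may make the keep/drop toggle easier to see. The sketch below wraps the same awk idea in a hypothetical filter_fasta function with an explicit mode argument; the demo files and all names are assumptions for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo inputs
workdir=$(mktemp -d)
printf '>seq1 desc\nAAAA\nCCCC\n>seq2\nGGGG\n>seq3\nTTTT\n' > "$workdir/seq.fa"
printf 'seq1\nseq3\n' > "$workdir/allIds.txt"

filter_fasta() {
    # $1 = "keep" or "drop", $2 = id list (one id per line), $3 = FASTA file
    awk -v mode="$1" -v ids="$2" '
        BEGIN {
            # Load the ids, stored with a leading ">" so they match header field 1
            while ((getline line < ids) > 0) {
                split(line, a, " ")
                listed[">" a[1]] = 1
            }
        }
        # On each header, decide whether the record that follows is printed
        /^>/ { f = (mode == "keep") ? ($1 in listed) : !($1 in listed) }
        f    # print the line while the flag is set
    ' "$3"
}

kept=$(filter_fasta keep "$workdir/allIds.txt" "$workdir/seq.fa")
dropped=$(filter_fasta drop "$workdir/allIds.txt" "$workdir/seq.fa")
printf 'kept:\n%s\ndropped:\n%s\n' "$kept" "$dropped"
rm -rf "$workdir"
```

Only the first whitespace-delimited word of each header is compared, so ">seq1 desc" matches the id "seq1".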
Bash script to get exon fragments from genome files !
# Exons are already defined in the GTF file, so we simply need to print lines that are marked exonic.
gunzip -c genome_file.gtf.gz | awk 'BEGIN{OFS="\t";} $3=="exon" {print $1,$4-1,$5}' | bedtools sort | bedtools merge -i - | gzip > my_exon.bed.gz
1399 days ago
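The $4-1 in the awk step converts the 1-based, inclusive GTF start into the 0-based, half-open start that BED uses. The sketch below walks a tiny made-up GTF through the same steps. To stay runnable without bedtools it swaps in plain sort plus a simplified awk merge of overlapping or touching intervals; that merge is a stand-in written for this example, not bedtools itself, and the gtf_exons_to_bed name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical GTF: one gene and three exons, two of which overlap
workdir=$(mktemp -d)
printf 'chr1\tdemo\t%s\t%s\t%s\t.\t+\t.\tgene_id "g1";\n' \
    gene 100 500  exon 400 500  exon 100 200  exon 150 250 > "$workdir/demo.gtf"

gtf_exons_to_bed() {
    # Keep exon lines and shift the start to 0-based BED coordinates
    awk 'BEGIN{OFS="\t"} $3=="exon" {print $1,$4-1,$5}' "$1" |
    # Sort by chromosome, then numerically by start
    LC_ALL=C sort -k1,1 -k2,2n |
    # Simplified stand-in for "bedtools merge": fuse overlapping or touching intervals
    awk 'BEGIN{OFS="\t"}
         NR>1 && $1==cur_chr && $2<=cur_end { if ($3>cur_end) cur_end=$3; next }
         NR>1 { print cur_chr, cur_start, cur_end }
         { cur_chr=$1; cur_start=$2; cur_end=$3 }
         END { if (NR>0) print cur_chr, cur_start, cur_end }'
}

bed=$(gtf_exons_to_bed "$workdir/demo.gtf")
printf '%s\n' "$bed"
rm -rf "$workdir"
```

The two overlapping exons (100-200 and 150-250) collapse into a single BED interval, and the gene line is ignored.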
Bash script to extract intronic fragments !
...onic coordinates;
# By subtracting the exonic regions from the genic region, we have the intronic region.
gunzip -c genome_file.gtf.gz | awk 'BEGIN{OFS="\t";} $3=="gene" {print...
1399 days ago
Bash script to get intergenic region from genome files !
...ttp://xxx.chrom.sizes
cat xxx.chrom.sizes | sed 's/^chr//' | sed 's/Cp/Pt/' > tmp
mv tmp xxx.chrom.sizes
gunzip -c genome_file.gtf.gz | awk 'BEGIN{OFS="\t";} $3=="gene" {print...
1399 days ago
Bash script to handle Multifasta files
...all lowercase residues to uppercase in a FASTA sequence file
$ awk 'BEGIN{FS=" "}{if(!/>/){print touppe...sort -k1,1n | cut -f 2- | tr "\t" "\n" > output.fasta
# Add ‘>’ at the beginning of headers in a FASTA fil...
1391 days ago
Perl script to check Perl modules and download NCBI, BUSCO, Taxonomy, Silva databases !
...ve been installed in the system and download the mandatory database # BEGIN { my @import_modules = (...else { print "\n Module $_ OK!\n";} } # end 'for' } # end 'BEGIN' block #Bash script else h...
1194 days ago
1192 days ago