<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Owner]]></title>
	<link>https://bioinformaticsonline.com/snippets/owner/lege?offset=10</link>
	<atom:link href="https://bioinformaticsonline.com/snippets/owner/lege?offset=10" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44524/python-script-to-finds-extact-similar-sequence-between-two-multi-fasta-files</guid>
	<pubDate>Thu, 02 May 2024 02:54:56 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44524/python-script-to-finds-extact-similar-sequence-between-two-multi-fasta-files</link>
	<title><![CDATA[Python script to finds extact similar sequence between two multi fasta files !]]></title>
	<description><![CDATA[<code>from Bio.Blast.Applications import NcbiblastnCommandline
import os
import sys

def perform_local_blast(query_file, subject_file, output_file):
    # Set up the BLAST command with format 6 (tab-delimited)
    blastn_cline = NcbiblastnCommandline(query=query_file, subject=subject_file, out=output_file, outfmt=6,
                                          word_size=16, perc_identity=100)
    
    # Run BLAST
    stdout, stderr = blastn_cline()
    
    # Check for errors
    if stderr:
        print(&quot;Error running BLAST:&quot;)
        print(stderr)

def parse_blast_results(output_file):
    # Parse BLAST results
    with open(output_file, &quot;r&quot;) as result_handle:
        for line in result_handle:
            fields = line.strip().split(&#039;\t&#039;)
            qseq = fields[0]  # Extract the aligned query sequence (qseq)
            #print(&quot;Aligned Query Sequence:&quot;, qseq)
            # Print other relevant information if needed

def main():
    if len(sys.argv) != 4:
        print(&quot;Usage: python script.py query.fasta subject.fasta output.txt&quot;)
        sys.exit(1)
    
    query_file = sys.argv[1]
    subject_file = sys.argv[2]
    output_file = sys.argv[3]
    
    # Perform local BLAST
    perform_local_blast(query_file, subject_file, output_file)
    
    # Parse and print BLAST results
    #parse_blast_results(output_file)

if __name__ == &quot;__main__&quot;:
    main()</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44494/python-script-to-parse-gff-file</guid>
	<pubDate>Wed, 27 Mar 2024 20:42:11 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44494/python-script-to-parse-gff-file</link>
	<title><![CDATA[Python script to parse GFF file]]></title>
	<description><![CDATA[<code>def parse_gff(gff_file):
    features = []
    with open(gff_file, &#039;r&#039;) as f:
        for line in f:
            if not line.startswith(&#039;#&#039;):  # Ignore comment lines
                fields = line.strip().split(&#039;\t&#039;)
                feature = {
                    &#039;seqid&#039;: fields[0],
                    &#039;source&#039;: fields[1],
                    &#039;type&#039;: fields[2],
                    &#039;start&#039;: int(fields[3]),
                    &#039;end&#039;: int(fields[4]),
                    &#039;score&#039;: fields[5],
                    &#039;strand&#039;: fields[6],
                    &#039;phase&#039;: fields[7],
                    &#039;attributes&#039;: dict(item.split(&#039;=&#039;) for item in fields[8].split(&#039;;&#039;))
                }
                features.append(feature)
    return features

# Usage example
gff_file = &#039;example.gff&#039;
parsed_features = parse_gff(gff_file)
for feature in parsed_features:
    print(feature)</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44493/python-script-to-convert-fastq-to-fasta</guid>
	<pubDate>Wed, 27 Mar 2024 20:30:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44493/python-script-to-convert-fastq-to-fasta</link>
	<title><![CDATA[Python script to convert fastq to fasta]]></title>
	<description><![CDATA[<code>def fastq_to_fasta(fastq_file, fasta_file):
    with open(fastq_file, &#039;r&#039;) as fq:
        with open(fasta_file, &#039;w&#039;) as fa:
            while True:
                # Read four lines from the FASTQ file
                header = fq.readline().strip()
                sequence = fq.readline().strip()
                fq.readline()  # Skip the &#039;+&#039; line
                quality = fq.readline().strip()
                
                # Check for EOF
                if not header:
                    break
                
                # Write to the FASTA file
                fa.write(&#039;&gt;&#039; + header[1:] + &#039;\n&#039;)
                fa.write(sequence + &#039;\n&#039;)

# Usage example
fastq_to_fasta(&#039;input.fastq&#039;, &#039;output.fasta&#039;)</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44485/fasta-to-fastq-conversion</guid>
	<pubDate>Mon, 18 Mar 2024 02:41:42 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44485/fasta-to-fastq-conversion</link>
	<title><![CDATA[Fasta to Fastq conversion !]]></title>
	<description><![CDATA[<code>seqtk seq -F &#039;#&#039; in.fa &gt; out.fq

# &quot;#&quot; is fake score.</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44460/python-script-for-basic-stats-of-the-assembled-genome</guid>
	<pubDate>Thu, 01 Feb 2024 02:20:54 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44460/python-script-for-basic-stats-of-the-assembled-genome</link>
	<title><![CDATA[Python script for basic stats of the assembled genome !]]></title>
	<description><![CDATA[<code>from Bio import SeqIO
import statistics

# Input file containing the genome assembly in FASTA format
input_file = &#039;genome_assembly.fasta&#039;

# Variables for computing statistics
total_length = 0
num_contigs = 0
contig_lengths = []

# Iterate through each sequence in the assembly
for record in SeqIO.parse(input_file, &#039;fasta&#039;):
    length = len(record.seq)
    total_length += length
    num_contigs += 1
    contig_lengths.append(length)

# Sort contig lengths in descending order
contig_lengths.sort(reverse=True)

# Calculate additional statistics
min_contig_length = min(contig_lengths)
max_contig_length = max(contig_lengths)
avg_contig_length = statistics.mean(contig_lengths)
median_contig_length = statistics.median(contig_lengths)

# Calculate N50
def calculate_n50(lengths):
    total_size = sum(lengths)
    half_size = total_size / 2
    cumulative_size = 0
    for length in lengths:
        cumulative_size += length
        if cumulative_size &gt;= half_size:
            return length

# Calculate GC content
def calculate_gc_content(file):
    gc_count = 0
    total_bases = 0

    with open(file, &#039;r&#039;) as fh:
        for line in fh:
            if line.startswith(&#039;&gt;&#039;):
                continue  # Skip header lines
            line = line.strip()
            gc_count += line.count(&#039;G&#039;) + line.count(&#039;C&#039;)
            total_bases += len(line)

    gc_content_percentage = (gc_count / total_bases) * 100
    return round(gc_content_percentage, 2)

# Print the computed statistics and information
print(&quot;Genome Assembly Statistics:&quot;)
print(&quot;---------------------------&quot;)
print(f&quot;Total Length: {total_length}&quot;)
print(f&quot;Number of Contigs: {num_contigs}&quot;)
print(f&quot;Minimum Contig Length: {min_contig_length}&quot;)
print(f&quot;Maximum Contig Length: {max_contig_length}&quot;)
print(f&quot;Average Contig Length: {avg_contig_length}&quot;)
print(f&quot;Median Contig Length: {median_contig_length}&quot;)
print(f&quot;N50: {calculate_n50(contig_lengths)}&quot;)
print(f&quot;GC Content: {calculate_gc_content(input_file)}%&quot;)
print(&quot;\nContig Length Distribution:&quot;)
print(&quot;---------------------------&quot;)

# Print contig length distribution
for length in contig_lengths:
    print(length)</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44459/perl-script-to-calculate-the-basic-stats-of-the-assembled-genome</guid>
	<pubDate>Thu, 01 Feb 2024 02:19:05 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44459/perl-script-to-calculate-the-basic-stats-of-the-assembled-genome</link>
	<title><![CDATA[Perl script to calculate the basic stats of the assembled genome !]]></title>
	<description><![CDATA[<code>#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Input file containing the genome assembly in FASTA format
my $input_file = &#039;genome_assembly.fasta&#039;;

# Create Bio::SeqIO object to read the FASTA file
my $seqio = Bio::SeqIO-&gt;new(-file =&gt; $input_file, -format =&gt; &#039;fasta&#039;);

# Variables for computing statistics
my $total_length = 0;
my $num_contigs = 0;
my @contig_lengths;

# Iterate through each sequence in the assembly
while (my $seq = $seqio-&gt;next_seq) {
    my $length = $seq-&gt;length;
    $total_length += $length;
    $num_contigs++;
    push @contig_lengths, $length;
}

# Sort contig lengths in descending order
@contig_lengths = sort { $b &lt;=&gt; $a } @contig_lengths;

# Calculate additional statistics
my $min_contig_length = $contig_lengths[-1];
my $max_contig_length = $contig_lengths[0];
my $avg_contig_length = $total_length / $num_contigs;
my $median_contig_length = calculate_median(\@contig_lengths);

# Calculate N50
my $n50 = calculate_n50(\@contig_lengths);

# Calculate GC content
my $gc_content = calculate_gc_content($input_file);

# Print the computed statistics and information
print &quot;Genome Assembly Statistics:\n&quot;;
print &quot;---------------------------\n&quot;;
print &quot;Total Length: $total_length\n&quot;;
print &quot;Number of Contigs: $num_contigs\n&quot;;
print &quot;Minimum Contig Length: $min_contig_length\n&quot;;
print &quot;Maximum Contig Length: $max_contig_length\n&quot;;
print &quot;Average Contig Length: $avg_contig_length\n&quot;;
print &quot;Median Contig Length: $median_contig_length\n&quot;;
print &quot;N50: $n50\n&quot;;
print &quot;GC Content: $gc_content%\n&quot;;
print &quot;\nContig Length Distribution:\n&quot;;
print &quot;---------------------------\n&quot;;

# Print contig length distribution
foreach my $length (@contig_lengths) {
    print &quot;$length\n&quot;;
}

# Subroutine to calculate N50
sub calculate_n50 {
    my ($lengths_ref) = @_;
    my $total_size = 0;
    foreach my $length (@$lengths_ref) {
        $total_size += $length;
    }
    my $half_size = $total_size / 2;
    my $cumulative_size = 0;
    for my $length (@$lengths_ref) {
        $cumulative_size += $length;
        if ($cumulative_size &gt;= $half_size) {
            return $length;
        }
    }
    return 0; # Should not reach here
}

# Subroutine to calculate GC content
sub calculate_gc_content {
    my ($file) = @_;
    my $gc_count = 0;
    my $total_bases = 0;

    open my $fh, &#039;&lt;&#039;, $file or die &quot;Cannot open file: $!&quot;;
    while (&lt;$fh&gt;) {
        next if /^&gt;/; # Skip header lines
        chomp;
        $gc_count += tr/GCgc//;
        $total_bases += length($_);
    }
    close $fh;

    my $gc_content_percentage = ($gc_count / $total_bases) * 100;
    return sprintf(&quot;%.2f&quot;, $gc_content_percentage);
}

# Subroutine to calculate median
sub calculate_median {
    my ($array_ref) = @_;
    my $count = scalar @$array_ref;
    return $array_ref-&gt;[$count / 2];
}</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44458/perl-script-to-parse-blast-results-and-plot-basic-stats</guid>
	<pubDate>Thu, 01 Feb 2024 02:11:23 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44458/perl-script-to-parse-blast-results-and-plot-basic-stats</link>
	<title><![CDATA[Perl script to parse blast results and plot basic stats !]]></title>
	<description><![CDATA[<code>#!/usr/bin/perl

use strict;
use warnings;
use List::Util qw(sum);

# Usage: ./parse_blast.pl blast_result.txt

die &quot;Usage: ./parse_blast.pl blast_result.txt\n&quot; unless @ARGV;

my $blast_file = shift @ARGV;
my @blast_entries = parse_blast($blast_file);

print &quot;Total entries: &quot;, scalar(@blast_entries), &quot;\n&quot;;
print &quot;---------------------------\n&quot;;

# Print detailed information for each entry
for my $entry (@blast_entries) {
    print &quot;Query:             &quot;, $entry-&gt;{QUERY},         &quot;\n&quot;;
    print &quot;Subject:           &quot;, $entry-&gt;{SUBJECT},       &quot;\n&quot;;
    print &quot;Percent Identity:  &quot;, $entry-&gt;{PERCENT_IDENTITY}, &quot;\n&quot;;
    print &quot;Alignment Length:  &quot;, $entry-&gt;{ALIGNMENT_LENGTH}, &quot;\n&quot;;
    print &quot;E-value:           &quot;, $entry-&gt;{EVALUE},        &quot;\n&quot;;
    print &quot;Bit Score:         &quot;, $entry-&gt;{BITSCORE},      &quot;\n&quot;;
    print &quot;---------------------------\n&quot;;
}

# Calculate additional statistics
my $avg_percent_identity = calculate_average(\@blast_entries, &#039;PERCENT_IDENTITY&#039;);
my $avg_alignment_length = calculate_average(\@blast_entries, &#039;ALIGNMENT_LENGTH&#039;);
my ($min_evalue, $max_evalue, $avg_evalue) = calculate_summary_stats(\@blast_entries, &#039;EVALUE&#039;);
my ($min_bitscore, $max_bitscore, $avg_bitscore) = calculate_summary_stats(\@blast_entries, &#039;BITSCORE&#039;);

# Print summary statistics
print &quot;Average Percent Identity:  $avg_percent_identity\n&quot;;
print &quot;Average Alignment Length:  $avg_alignment_length\n&quot;;
print &quot;E-value Range:             $min_evalue - $max_evalue\n&quot;;
print &quot;Average E-value:           $avg_evalue\n&quot;;
print &quot;Bit Score Range:           $min_bitscore - $max_bitscore\n&quot;;
print &quot;Average Bit Score:         $avg_bitscore\n&quot;;

sub parse_blast {
    my ($file) = @_;

    open my $fh, &#039;&lt;&#039;, $file or die &quot;Cannot open file $file: $!\n&quot;;

    my @entries;

    while (my $line = &lt;$fh&gt;) {
        next if $line =~ /^\s*$/;  # skip empty lines

        chomp $line;
        my @fields = split /\t/, $line;

        my %entry;
        @entry{qw/QUERY SUBJECT PERCENT_IDENTITY ALIGNMENT_LENGTH EVALUE BITSCORE/} = @fields;

        push @entries, \%entry;
    }

    close $fh;

    return @entries;
}

sub calculate_average {
    my ($entries, $field) = @_;
    my @values = map { $_-&gt;{$field} } @$entries;
    return @values ? sum(@values) / @values : 0;
}

sub calculate_summary_stats {
    my ($entries, $field) = @_;
    my @values = map { $_-&gt;{$field} } @$entries;

    my $min = @values ? (sort { $a &lt;=&gt; $b } @values)[0] : 0;
    my $max = @values ? (sort { $b &lt;=&gt; $a } @values)[0] : 0;
    my $avg = @values ? sum(@values) / @values : 0;

    return ($min, $max, $avg);
}</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44457/perl-script-to-parse-vcf-file</guid>
	<pubDate>Thu, 01 Feb 2024 02:08:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44457/perl-script-to-parse-vcf-file</link>
	<title><![CDATA[Perl script to parse VCF file !]]></title>
	<description><![CDATA[<code>#!/usr/bin/perl

use strict;
use warnings;

# Usage: ./parse_vcf.pl input.vcf

die &quot;Usage: ./parse_vcf.pl input.vcf\n&quot; unless @ARGV;

my $vcf_file = shift @ARGV;
my @vcf_entries = parse_vcf($vcf_file);

print &quot;Total entries: &quot;, scalar(@vcf_entries), &quot;\n&quot;;
print &quot;---------------------------\n&quot;;

my %chromosome_counts;
for my $entry (@vcf_entries) {
    $chromosome_counts{$entry-&gt;{CHROM}}++;
}

print &quot;Chromosome counts:\n&quot;;
for my $chromosome (sort keys %chromosome_counts) {
    print &quot;  $chromosome: $chromosome_counts{$chromosome}\n&quot;;
}

sub parse_vcf {
    my ($file) = @_;

    open my $fh, &#039;&lt;&#039;, $file or die &quot;Cannot open file $file: $!\n&quot;;

    my @entries;

    while (my $line = &lt;$fh&gt;) {
        next if $line =~ /^\s*$/;  # skip empty lines
        next if $line =~ /^\s*#/; # skip comments

        chomp $line;
        my @fields = split /\t/, $line;

        my %entry;
        @entry{qw/CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLES/} = @fields;

        push @entries, \%entry;
    }

    close $fh;

    return @entries;
}</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44456/perl-script-to-find-overlaps-between-two-bed-files</guid>
	<pubDate>Thu, 01 Feb 2024 02:04:04 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44456/perl-script-to-find-overlaps-between-two-bed-files</link>
	<title><![CDATA[Perl script to find overlaps between two bed files !]]></title>
	<description><![CDATA[<code>#!/usr/bin/perl

use strict;
use warnings;

# Check if the correct number of arguments are provided
if (@ARGV != 2) {
    die &quot;Usage: $0 file1.bed file2.bed\n&quot;;
}

# Read the contents of the two BED files
my $file1 = shift @ARGV;
my $file2 = shift @ARGV;

open my $fh1, &#039;&lt;&#039;, $file1 or die &quot;Error opening $file1: $!&quot;;
open my $fh2, &#039;&lt;&#039;, $file2 or die &quot;Error opening $file2: $!&quot;;

# Iterate over each interval in the first BED file
while (my $line1 = &lt;$fh1&gt;) {
    chomp $line1;
    my @fields1 = split /\t/, $line1;
    my $chr1 = $fields1[0];
    my $start1 = $fields1[1];
    my $end1 = $fields1[2];

    # Check for overlaps with intervals in the second BED file
    while (my $line2 = &lt;$fh2&gt;) {
        chomp $line2;
        my @fields2 = split /\t/, $line2;
        my $chr2 = $fields2[0];
        my $start2 = $fields2[1];
        my $end2 = $fields2[2];

        # Check for chromosome match and overlap
        if ($chr1 eq $chr2 &amp;&amp; $start1 &lt; $end2 &amp;&amp; $end1 &gt; $start2) {
            print &quot;Overlap found:\n&quot;;
            print &quot;File 1: $line1\n&quot;;
            print &quot;File 2: $line2\n\n&quot;;
        }
    }

    # Rewind file2 to the beginning for the next iteration
    seek $fh2, 0, 0;
}

close $fh1;
close $fh2;

print &quot;Comparison completed.\n&quot;;</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44455/raku-script-to-find-overlaps-between-two-bed-files</guid>
	<pubDate>Thu, 01 Feb 2024 02:02:46 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44455/raku-script-to-find-overlaps-between-two-bed-files</link>
	<title><![CDATA[Raku script to find overlaps between two bed files !]]></title>
	<description><![CDATA[<code>#!/usr/bin/env raku

# Check if the correct number of arguments are provided
if @*ARGS.elems != 2 {
    say &quot;Usage: ./compare_bed_files.raku file1.bed file2.bed&quot;;
    exit 1;
}

# Read the contents of the two BED files
my @bed1 = slurp(@*ARGS[0]).lines;
my @bed2 = slurp(@*ARGS[1]).lines;

# Iterate over each interval in the first BED file
for my $line1 (@bed1) {
    my @fields1 = $line1.split(&quot;\t&quot;);
    my $chr1 = @fields1[0];
    my $start1 = @fields1[1];
    my $end1 = @fields1[2];

    # Check for overlaps with intervals in the second BED file
    for my $line2 (@bed2) {
        my @fields2 = $line2.split(&quot;\t&quot;);
        my $chr2 = @fields2[0];
        my $start2 = @fields2[1];
        my $end2 = @fields2[2];

        # Check for chromosome match and overlap
        if $chr1 eq $chr2 &amp;&amp; $start1 &lt; $end2 &amp;&amp; $end1 &gt; $start2 {
            say &quot;Overlap found:&quot;;
            say &quot;File 1: $line1&quot;;
            say &quot;File 2: $line2&quot;;
            say &quot;&quot;;
        }
    }
}

say &quot;Comparison completed.&quot;;</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>

</channel>
</rss>