<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: All]]></title>
	<link>https://bioinformaticsonline.com/snippets?</link>
	<atom:link href="https://bioinformaticsonline.com/snippets?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/45197/onliner-to-convert-minimap2-to-paf</guid>
	<pubDate>Wed, 03 Jun 2026 07:03:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/45197/onliner-to-convert-minimap2-to-paf</link>
	<title><![CDATA[One-liner to convert minimap2 PAF output to a synteny plotter TSV]]></title>
	<description><![CDATA[<code>awk -v OFS=&quot;\t&quot; -v ref_species=&quot;species1&quot; -v query_species=&quot;species2&quot; &#039;{ print $6, $8, $9, $1, $3, $4, $5, ref_species, query_species }&#039; mypaf.paf &gt; synPlotter.tsv</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/45196/oneliner-to-convert-minimap2-to-paf</guid>
	<pubDate>Wed, 03 Jun 2026 07:02:19 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/45196/oneliner-to-convert-minimap2-to-paf</link>
	<title><![CDATA[One-liner to convert minimap2 PAF output to a synteny plotter TSV]]></title>
	<description><![CDATA[<code>awk -v OFS=&quot;\t&quot; -v ref_species=&quot;species1&quot; -v query_species=&quot;species2&quot; &#039;{ print $6, $8, $9, $1, $3, $4, $5, ref_species, query_species }&#039; mypaf.paf &gt; synPlotter.tsv</code>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44753/python-script-to-split-a-dna-sequence-into-words-of-varying-lengths</guid>
	<pubDate>Thu, 02 Jan 2025 11:31:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44753/python-script-to-split-a-dna-sequence-into-words-of-varying-lengths</link>
	<title><![CDATA[Python script to split a DNA sequence into words of varying lengths]]></title>
	<description><![CDATA[<code># Script to split a DNA sequence into words of varying lengths
def split_dna_into_words(dna_sequence, min_length, max_length):
    &quot;&quot;&quot;
    Splits a DNA sequence into words of lengths ranging from min_length to max_length.

    Parameters:
        dna_sequence (str): The DNA sequence to split (e.g., &quot;ATGCGTAC&quot;).
        min_length (int): The minimum length of each word.
        max_length (int): The maximum length of each word.

    Returns:
        dict: A dictionary where keys are word lengths and values are lists of DNA words of that length.
    &quot;&quot;&quot;
    if not dna_sequence:
        raise ValueError(&quot;The DNA sequence cannot be empty.&quot;)

    if min_length &lt;= 0 or max_length &lt;= 0:
        raise ValueError(&quot;Word lengths must be positive integers.&quot;)

    if min_length &gt; max_length:
        raise ValueError(&quot;Minimum length cannot be greater than maximum length.&quot;)

    # Ensure the DNA sequence contains valid nucleotides
    for nucleotide in dna_sequence:
        if nucleotide.upper() not in &quot;ATCG&quot;:
            raise ValueError(f&quot;Invalid character &#039;{nucleotide}&#039; found in DNA sequence.&quot;)

    # Generate words of varying lengths
    words_by_length = {}
    for length in range(min_length, max_length + 1):
        words_by_length[length] = [dna_sequence[i:i+length] for i in range(0, len(dna_sequence) - length + 1)]

    return words_by_length

# Example usage
def main():
    dna_sequence = &quot;ATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTA&quot;
    min_length = 3
    max_length = 99

    try:
        words_by_length = split_dna_into_words(dna_sequence, min_length, max_length)
        for length, words in words_by_length.items():
            print(f&quot;Words of length {length}:&quot;, words)
    except ValueError as e:
        print(&quot;Error:&quot;, e)

if __name__ == &quot;__main__&quot;:
    main()</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44740/python-script-to-find-all-possible-repeats-in-a-dna-string</guid>
	<pubDate>Mon, 16 Dec 2024 07:54:38 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44740/python-script-to-find-all-possible-repeats-in-a-dna-string</link>
	<title><![CDATA[Python script to find all possible repeats in a DNA string !]]></title>
	<description><![CDATA[<code>from collections import defaultdict

def find_repeats_in_genome(genome, min_length=2, max_length=None):
    &quot;&quot;&quot;
    Finds all repeating sequences in a genome within a specified length range.

    Parameters:
        genome (str): The genome sequence.
        min_length (int): Minimum length of repeats to scan for (default: 2).
        max_length (int): Maximum length of repeats to scan for (default: None, meaning entire genome).

    Returns:
        dict: A dictionary where keys are repeating sequences and values are lists of starting positions.
    &quot;&quot;&quot;
    if max_length is None:
        max_length = len(genome)

    repeats = defaultdict(list)

    # Iterate over all possible lengths of substrings
    for length in range(min_length, max_length + 1):
        seen = defaultdict(list)  # Tracks occurrences of substrings of the current length

        # Sliding window approach
        for i in range(len(genome) - length + 1):
            substring = genome[i:i + length]
            seen[substring].append(i)

        # Filter substrings that appear more than once
        for substring, positions in seen.items():
            if len(positions) &gt; 1:
                repeats[substring].extend(positions)

    return repeats

# Example usage
def main():
    genome = &quot;ATCGATCGAATTCGATCG&quot;  # Example genome sequence
    min_length = 2
    max_length = 5

    repeats = find_repeats_in_genome(genome, min_length, max_length)

    print(&quot;Repeating sequences:&quot;)
    for seq, positions in repeats.items():
        print(f&quot;Sequence: {seq}, Positions: {positions}&quot;)

if __name__ == &quot;__main__&quot;:
    main()</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44738/python-script-for-treemap-using-pythons-plotly-library</guid>
	<pubDate>Sat, 14 Dec 2024 12:45:15 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44738/python-script-for-treemap-using-pythons-plotly-library</link>
	<title><![CDATA[Python script for treemap using Python's Plotly library]]></title>
	<description><![CDATA[<code>import plotly.express as px
import pandas as pd

# Sample dataset: Representing biological pathways and their associated counts
data = {
    &quot;Category&quot;: [&quot;Metabolism&quot;, &quot;Metabolism&quot;, &quot;Metabolism&quot;, 
                 &quot;Cellular Processes&quot;, &quot;Cellular Processes&quot;, &quot;Cellular Processes&quot;, 
                 &quot;Information Storage&quot;, &quot;Information Storage&quot;],
    &quot;Subcategory&quot;: [&quot;Carbohydrate metabolism&quot;, &quot;Lipid metabolism&quot;, &quot;Amino acid metabolism&quot;, 
                    &quot;Signal transduction&quot;, &quot;Cell cycle&quot;, &quot;Transport&quot;, 
                    &quot;DNA replication&quot;, &quot;RNA processing&quot;],
    &quot;Count&quot;: [150, 120, 90, 100, 85, 70, 110, 95]
}

# Convert data to a DataFrame
df = pd.DataFrame(data)

# Create the treemap
fig = px.treemap(
    df,
    path=[&quot;Category&quot;, &quot;Subcategory&quot;],  # Hierarchical levels
    values=&quot;Count&quot;,                   # Size of the treemap blocks
    color=&quot;Count&quot;,                    # Color based on the count values
    color_continuous_scale=&quot;Viridis&quot;  # Color scale
)

# Add a title
fig.update_layout(title=&quot;Treemap: Hierarchical Data Representation in Bioinformatics&quot;)

# Show the plot
fig.show()</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44733/bacterial-comparative-genomics-pipeline-bash-script</guid>
	<pubDate>Sat, 14 Dec 2024 12:34:57 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44733/bacterial-comparative-genomics-pipeline-bash-script</link>
	<title><![CDATA[Bacterial Comparative Genomics Pipeline Bash Script]]></title>
	<description><![CDATA[<code>#!/bin/bash

# Bacterial Comparative Genomics Pipeline Script
# This script automates key steps in bacterial comparative genomics using popular bioinformatics tools.

# Ensure the script stops on error
set -e

# Define paths
WORKDIR=&quot;./bacterial_genomics_pipeline&quot;
INPUT_FASTA_DIR=&quot;./input_genomes&quot;
OUTPUT_DIR=&quot;./output&quot;
CORE_PAN_DIR=&quot;$OUTPUT_DIR/core_pan_analysis&quot;
PHYLOGENY_DIR=&quot;$OUTPUT_DIR/phylogeny&quot;
ALIGNMENT_DIR=&quot;$OUTPUT_DIR/genome_alignment&quot;
RESISTANCE_DIR=&quot;$OUTPUT_DIR/antibiotic_resistance&quot;
SYNTENY_DIR=&quot;$OUTPUT_DIR/synteny_analysis&quot;

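# Create directories if they do not exist (quoted so paths with spaces survive)
mkdir -p &quot;$WORKDIR&quot; &quot;$OUTPUT_DIR&quot; &quot;$CORE_PAN_DIR&quot; &quot;$PHYLOGENY_DIR&quot; &quot;$ALIGNMENT_DIR&quot; &quot;$RESISTANCE_DIR&quot; &quot;$SYNTENY_DIR&quot;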

# Tools required
PROKKA=&quot;prokka&quot;
ROARY=&quot;roary&quot;
MAUVE=&quot;progressiveMauve&quot;
IQTREE=&quot;iqtree&quot;
ABRICATE=&quot;abricate&quot;
MCSCANX=&quot;MCScanX&quot;

# Step 1: Genome Annotation using Prokka
annotate_genomes() {
  # printf is used because plain echo in bash prints \n literally
  printf &#039;\n=== Annotating Genomes with Prokka ===\n&#039;
  for fasta in &quot;$INPUT_FASTA_DIR&quot;/*.fasta; do
    sample=$(basename &quot;$fasta&quot; .fasta)
    output_path=&quot;$OUTPUT_DIR/annotation_$sample&quot;
    echo &quot;Annotating $sample...&quot;
    &quot;$PROKKA&quot; --outdir &quot;$output_path&quot; --prefix &quot;$sample&quot; &quot;$fasta&quot;
  done
}

# Step 2: Core and Pan-genome Analysis using Roary
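core_pan_analysis() {
  printf &#039;\n=== Performing Core and Pan-genome Analysis with Roary ===\n&#039;
  roary_output=&quot;$CORE_PAN_DIR/pan_genome_analysis&quot;
  # Collect the Prokka GFF files into an array so each path stays one argument
  mapfile -t gff_files &lt; &lt;(find &quot;$OUTPUT_DIR&quot; -name &quot;*.gff&quot;)
  # Roary takes its output directory with -f (-o only names the clusters file).
  # The directory is not pre-created: Roary creates it, and it may pick a
  # different name if the directory already exists from an earlier run.
  &quot;$ROARY&quot; -e -n -v -p 8 -f &quot;$roary_output&quot; &quot;${gff_files[@]}&quot;
}

# Step 3: Whole Genome Alignment using Mauve
align_genomes() {
  printf &#039;\n=== Aligning Genomes with Mauve ===\n&#039;
  alignment_output=&quot;$ALIGNMENT_DIR/aligned_genomes.xmfa&quot;
  echo &quot;Running Mauve on input genomes...&quot;
  &quot;$MAUVE&quot; --output=&quot;$alignment_output&quot; &quot;$INPUT_FASTA_DIR&quot;/*.fasta
  echo &quot;Alignment saved to $alignment_output&quot;
}

# Step 4: Phylogenetic Tree Construction using IQ-TREE
construct_phylogeny() {
  printf &#039;\n=== Constructing Phylogenetic Tree with IQ-TREE ===\n&#039;
  # IQ-TREE reads FASTA, PHYLIP, NEXUS or CLUSTAL alignments, not Mauve&#039;s XMFA,
  # so the tree is built from the core gene alignment that Roary writes with -e -n.
  alignment=&quot;$CORE_PAN_DIR/pan_genome_analysis/core_gene_alignment.aln&quot;
  phylo_output=&quot;$PHYLOGENY_DIR/phylogeny_tree&quot;
  iqtree_output=&quot;$phylo_output.treefile&quot;

  echo &quot;Running IQ-TREE on the core gene alignment...&quot;
  &quot;$IQTREE&quot; -s &quot;$alignment&quot; -m GTR+G -nt AUTO -pre &quot;$phylo_output&quot;
  echo &quot;Phylogenetic tree saved to $iqtree_output&quot;
}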

# Step 5: Antibiotic Resistance Gene Identification using ABRicate
identify_resistance_genes() {
  printf &#039;\n=== Identifying Antibiotic Resistance Genes with ABRicate ===\n&#039;
  for fasta in &quot;$INPUT_FASTA_DIR&quot;/*.fasta; do
    sample=$(basename &quot;$fasta&quot; .fasta)
    output_path=&quot;$RESISTANCE_DIR/${sample}_resistance.txt&quot;
    echo &quot;Analyzing $sample for resistance genes...&quot;
    &quot;$ABRICATE&quot; &quot;$fasta&quot; &gt; &quot;$output_path&quot;
  done
}

# Step 6: Synteny Analysis using MCScanX
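synteny_analysis() {
  printf &#039;\n=== Performing Synteny Analysis with MCScanX ===\n&#039;
  synteny_output=&quot;$SYNTENY_DIR/synteny_results&quot;
  mkdir -p &quot;$synteny_output&quot;
  # MCScanX takes a path prefix, not a directory: it reads &lt;prefix&gt;.gff and
  # &lt;prefix&gt;.blast, which must be prepared beforehand (gene positions plus an
  # all-vs-all BLASTP table). &quot;genomes&quot; is only a placeholder prefix.
  synteny_prefix=&quot;$synteny_output/genomes&quot;
  echo &quot;Running MCScanX on $synteny_prefix.gff and $synteny_prefix.blast...&quot;
  MCScanX &quot;$synteny_prefix&quot; &gt; &quot;$synteny_output/results.txt&quot;
  echo &quot;Synteny analysis results saved to $synteny_output&quot;
}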

# Main workflow
annotate_genomes
core_pan_analysis
align_genomes
construct_phylogeny
identify_resistance_genes
synteny_analysis

printf &#039;\n=== Bacterial Comparative Genomics Pipeline Complete ===\n&#039;
echo &quot;Results saved in $OUTPUT_DIR&quot;</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44725/bash-script-to-discover-pirna-in-transcriptome-data</guid>
	<pubDate>Fri, 13 Dec 2024 11:47:00 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44725/bash-script-to-discover-pirna-in-transcriptome-data</link>
	<title><![CDATA[Bash script to discover piRNA in transcriptome data !]]></title>
	<description><![CDATA[<code>#!/bin/bash

# Variables (modify these as per your setup)
INPUT_FASTQ=&quot;input_reads.fastq&quot;
ADAPTER_SEQ=&quot;TGGAATTCTCGGGTGCCAAGG&quot;
REFERENCE_GENOME=&quot;reference_genome.fa&quot;
BOWTIE_INDEX=&quot;reference_index&quot;
OUTPUT_DIR=&quot;piRNA_analysis&quot;
THREADS=4

# Create output directory
mkdir -p $OUTPUT_DIR

# Step 1: Quality Control
echo &quot;Running FastQC for quality control...&quot;
fastqc $INPUT_FASTQ -o $OUTPUT_DIR

# Step 2: Adapter Trimming
echo &quot;Trimming adapters with Cutadapt...&quot;
cutadapt -a $ADAPTER_SEQ -o $OUTPUT_DIR/trimmed_reads.fastq $INPUT_FASTQ

# Step 3: Mapping Reads to Reference Genome
echo &quot;Mapping reads to reference genome using Bowtie...&quot;
bowtie -v 1 -k 1 --best -p $THREADS $BOWTIE_INDEX $OUTPUT_DIR/trimmed_reads.fastq -S $OUTPUT_DIR/aligned_reads.sam

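# Step 4: Convert SAM to BAM, keep mapped reads only, and sort
echo &quot;Converting SAM to BAM (mapped reads only) and sorting...&quot;
# -F 4 drops unmapped reads, so only genome-matching reads reach the size filter in Step 5
samtools view -b -F 4 $OUTPUT_DIR/aligned_reads.sam | samtools sort -o $OUTPUT_DIR/sorted_reads.bam -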

# Step 5: Extract Reads of piRNA Size (24-32 nt)
echo &quot;Filtering reads of size 24-32 nt...&quot;
bedtools bamtofastq -i $OUTPUT_DIR/sorted_reads.bam -fq $OUTPUT_DIR/all_reads.fastq
seqkit seq -m 24 -M 32 $OUTPUT_DIR/all_reads.fastq &gt; $OUTPUT_DIR/piRNA_size_reads.fastq

# Step 6: Detect Sequence Bias (Optional)
echo &quot;Checking sequence bias using WebLogo-compatible data...&quot;
seqkit fx2tab $OUTPUT_DIR/piRNA_size_reads.fastq | cut -f2 | awk &#039;{print &quot;&gt;seq&quot;NR&quot;\n&quot;$0}&#039; &gt; $OUTPUT_DIR/piRNA_sequences.fa

# Step 7: Identify piRNA Clusters
# This step requires a tool like proTRAC or PIRANHA. Example placeholder:
echo &quot;Identifying piRNA clusters (requires proTRAC or PIRANHA)...&quot;
# Example with proTRAC:
# proTRAC.pl -s $OUTPUT_DIR/sorted_reads.bam -g $REFERENCE_GENOME -o $OUTPUT_DIR/clusters

# Step 8: Annotate Clusters
# Annotation depends on your genome&#039;s annotation file
# bedtools intersect example placeholder:
# bedtools intersect -a clusters.bed -b genome_annotation.gtf &gt; annotated_clusters.bed

# Step 9: Clean up intermediate files (optional)
echo &quot;Cleaning up intermediate files...&quot;
rm $OUTPUT_DIR/aligned_reads.sam $OUTPUT_DIR/all_reads.fastq

# Done
echo &quot;piRNA discovery pipeline completed! Results are in $OUTPUT_DIR.&quot;</code>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44715/python-script-to-split-a-genome-sequence-into-overlapping-windows-of-100-base-pairs</guid>
	<pubDate>Wed, 11 Dec 2024 23:32:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44715/python-script-to-split-a-genome-sequence-into-overlapping-windows-of-100-base-pairs</link>
	<title><![CDATA[Python script to split a genome sequence into overlapping windows of 100 base pairs]]></title>
	<description><![CDATA[<code>def split_genome(sequence, window_size=100, step=1):
    &quot;&quot;&quot;
    Splits a genome sequence into overlapping windows.

    Args:
        sequence (str): The genome sequence.
        window_size (int): Size of each window (default: 100).
        step (int): Step size for overlapping (default: 1).

    Returns:
        list: A list of genome windows.
    &quot;&quot;&quot;
    windows = []
    for i in range(0, len(sequence) - window_size + 1, step):
        windows.append(sequence[i:i + window_size])
    return windows

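# Example usage:
if __name__ == &quot;__main__&quot;:
    # The demo sequence is shorter than 100 bp, so it is repeated twice here;
    # a sequence shorter than window_size yields no windows at all.
    genome_sequence = &quot;ATGCGTACGTTAGCTACGATCGTACGATCGTACGATCGATCGTAGCATCGATCGTACG&quot; * 2
    window_size = 100
    step_size = 1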

    # Get overlapping windows
    genome_windows = split_genome(genome_sequence, window_size, step_size)

    # Print results
    for idx, window in enumerate(genome_windows):
        print(f&quot;Window {idx + 1}: {window}&quot;)</code>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44701/methods-to-upgrade-the-ubuntu</guid>
	<pubDate>Fri, 06 Dec 2024 23:36:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44701/methods-to-upgrade-the-ubuntu</link>
	<title><![CDATA[How to upgrade Ubuntu to a new release]]></title>
	<description><![CDATA[<code>#Install ubuntu-release-upgrader-core if it is not already installed:

sudo apt-get install ubuntu-release-upgrader-core
#Edit /etc/update-manager/release-upgrades and set Prompt=normal

#Launch the upgrade tool:

sudo do-release-upgrade
#Follow the on-screen instructions.</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44671/install-edirect</guid>
	<pubDate>Thu, 03 Oct 2024 01:52:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44671/install-edirect</link>
	<title><![CDATA[Install EDirect]]></title>
	<description><![CDATA[<code>sh -c &quot;$(curl -fsSL https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh)&quot;</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>

</channel>
</rss>