<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Python script to split a DNA sequence into words of varying lengths]]></title>
	<link>https://bioinformaticsonline.com/snippets/view/44753/python-script-to-split-a-dna-sequence-into-words-of-varying-lengths?</link>
	<atom:link href="https://bioinformaticsonline.com/snippets/view/44753/python-script-to-split-a-dna-sequence-into-words-of-varying-lengths?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44753/python-script-to-split-a-dna-sequence-into-words-of-varying-lengths</guid>
	<pubDate>Thu, 02 Jan 2025 11:31:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44753/python-script-to-split-a-dna-sequence-into-words-of-varying-lengths</link>
	<title><![CDATA[Python script to split a DNA sequence into words of varying lengths]]></title>
	<description><![CDATA[<code># Script to split a DNA sequence into words of varying lengths
def split_dna_into_words(dna_sequence, min_length, max_length):
    &quot;&quot;&quot;
    Extracts every overlapping word (k-mer) of each length from min_length to max_length,
    using a sliding window that advances one base at a time.

    Parameters:
        dna_sequence (str): The DNA sequence to split (e.g., &quot;ATGCGTAC&quot;).
        min_length (int): The minimum length of each word.
        max_length (int): The maximum length of each word.

    Returns:
        dict: A dictionary mapping each word length to the list of words of that length,
            in order of position. Lengths greater than the sequence length map to an empty list.
    &quot;&quot;&quot;
    if not dna_sequence:
        raise ValueError(&quot;The DNA sequence cannot be empty.&quot;)

    if min_length &lt;= 0 or max_length &lt;= 0:
        raise ValueError(&quot;Word lengths must be positive integers.&quot;)

    if min_length &gt; max_length:
        raise ValueError(&quot;Minimum length cannot be greater than maximum length.&quot;)

    # Normalize to upper case so validation and the returned words agree on case,
    # then ensure the sequence contains only valid nucleotides
    dna_sequence = dna_sequence.upper()
    for nucleotide in dna_sequence:
        if nucleotide not in &quot;ATCG&quot;:
            raise ValueError(f&quot;Invalid character &#039;{nucleotide}&#039; found in DNA sequence.&quot;)

    # Slide a window of each length along the sequence, one base at a time
    words_by_length = {}
    for length in range(min_length, max_length + 1):
        window_count = len(dna_sequence) - length + 1  # zero or negative yields no words
        words_by_length[length] = [dna_sequence[i:i + length] for i in range(window_count)]

    return words_by_length
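
# A possible follow-up, sketched under assumptions: count_words_by_length is a
# hypothetical helper (not part of the original snippet). Because the sliding
# window repeats words on repetitive sequences, it may help to tally how often
# each word occurs at each length instead of printing every occurrence.
from collections import Counter

def count_words_by_length(words_by_length):
    # Map each word length to a Counter of word frequencies at that length.
    return {length: Counter(words) for length, words in words_by_length.items()}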

# Example usage
def main():
    dna_sequence = &quot;ATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTAATGCGTACGCTA&quot;
    min_length = 3
    max_length = 99

    try:
        words_by_length = split_dna_into_words(dna_sequence, min_length, max_length)
        for length, words in words_by_length.items():
            print(f&quot;Words of length {length}:&quot;, words)
    except ValueError as e:
        print(&quot;Error:&quot;, e)

if __name__ == &quot;__main__&quot;:
    main()</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>

</channel>
</rss>