<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Python script to find all possible repeats in a DNA string!]]></title>
	<link>https://bioinformaticsonline.com/snippets/view/44740/python-script-to-find-all-possible-repeats-in-a-dna-string?</link>
	<atom:link href="https://bioinformaticsonline.com/snippets/view/44740/python-script-to-find-all-possible-repeats-in-a-dna-string?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/44740/python-script-to-find-all-possible-repeats-in-a-dna-string</guid>
	<pubDate>Mon, 16 Dec 2024 07:54:38 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/44740/python-script-to-find-all-possible-repeats-in-a-dna-string</link>
	<title><![CDATA[Python script to find all possible repeats in a DNA string!]]></title>
	<description><![CDATA[<code>from collections import defaultdict

def find_repeats_in_genome(genome, min_length=2, max_length=None):
    &quot;&quot;&quot;
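    Finds all exact repeats in a genome within a specified length range.

    Every substring whose length is between min_length and max_length and that
    occurs at least twice is reported. Overlapping occurrences count (e.g. AA
    occurs at positions 0 and 1 in AAA), and shorter repeats nested inside
    longer ones are reported separately. Running time grows roughly cubically
    with the genome length when max_length is None, so keep max_length small
    for long sequences.

    Parameters:
        genome (str): The genome sequence.
        min_length (int): Minimum length of repeats to scan for (default: 2; must be at least 1).
        max_length (int): Maximum length of repeats to scan for (default: None, meaning
            the longest possible repeat, len(genome) - 1).

    Returns:
        dict: A dictionary mapping each repeating sequence to a sorted list of its
            0-based starting positions.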
    &quot;&quot;&quot;
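    if min_length &lt; 1:
        raise ValueError(&quot;min_length must be at least 1&quot;)

    # A repeat must occur at least twice, so it is at most len(genome) - 1 long
    if max_length is None or max_length &gt; len(genome) - 1:
        max_length = len(genome) - 1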

    # Plain dict: looking up a non-repeat raises KeyError instead of silently adding it
    repeats = {}

    # Iterate over all possible lengths of substrings
    for length in range(min_length, max_length + 1):
        seen = defaultdict(list)  # Tracks occurrences of substrings of the current length

        # Sliding window: record the start position of every substring of this length
        for i in range(len(genome) - length + 1):
            substring = genome[i:i + length]
            seen[substring].append(i)

        # Keep only substrings that appear more than once. Each substring has a
        # single length, so it is stored exactly once and its positions stay sorted.
        for substring, positions in seen.items():
            if len(positions) &gt; 1:
                repeats[substring] = positions

    return repeats

# Example usage
def main():
    genome = &quot;ATCGATCGAATTCGATCG&quot;  # Example genome sequence
    min_length = 2
    max_length = 5

    repeats = find_repeats_in_genome(genome, min_length, max_length)

    print(&quot;Repeating sequences:&quot;)
    for seq, positions in repeats.items():
        print(f&quot;Sequence: {seq}, Positions: {positions}&quot;)

if __name__ == &quot;__main__&quot;:
    main()</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>

</channel>
</rss>