Unique k-mers and distinct k-mers are related concepts in bioinformatics, specifically in the context of analyzing DNA or RNA sequences.
Unique k-mers: These are k-mers (short sequences of DNA or RNA of length k) that occur only once in a given set of sequences. Unique k-mers can be useful for genome assembly, error correction, and identifying novel sequences.
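Distinct k-mers: These are the different k-mers present in a set of sequences, each counted once no matter how many times it occurs. Every unique k-mer is therefore also a distinct k-mer, but a distinct k-mer that occurs two or more times is not unique. Counting how often each distinct k-mer occurs can be useful for quantifying the abundance or expression of different sequences within a sample.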
To give an example, suppose we have a set of three DNA sequences:
Seq1: ATGCATGCGCAT
Seq2: CGCGTACATCGT
Seq3: ATGCGCAGCGCG
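If we consider k-mers of length 4 (4-mers), each 12-base sequence yields 9 overlapping 4-mers, for 27 occurrences in total. Counting each different 4-mer once gives 19 distinct 4-mers; 13 of those occur exactly once across the set and are therefore unique.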
Note that the k-mer "TGCA" is a unique 4-mer because it appears only once in the set of sequences (in Seq1), while the k-mer "CGCG" appears twice (once in Seq2 and once in Seq3), so it is a distinct 4-mer but not a unique one.
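In short, every different k-mer is counted once as a distinct k-mer, even if it appears multiple times, whereas unique k-mers are the subset that appear exactly once. The number of unique k-mers can therefore never exceed the number of distinct k-mers.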
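The example above can be checked with a short script. This is a minimal sketch, assuming forward-strand k-mers only (a k-mer is not merged with its reverse complement, which some k-mer counting tools can optionally do) and counting across all three sequences together. The helper names kmers and count_kmers are made up for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(seqs, k):
    """Count k-mer occurrences across all sequences (forward strand only)."""
    counts = Counter()
    for seq in seqs:
        counts.update(kmers(seq, k))
    return counts


seqs = ["ATGCATGCGCAT", "CGCGTACATCGT", "ATGCGCAGCGCG"]
counts = count_kmers(seqs, 4)

# Distinct: every different k-mer, counted once regardless of multiplicity.
distinct = set(counts)
# Unique: only the k-mers whose total count across the set is exactly 1.
unique = {kmer for kmer, n in counts.items() if n == 1}

print(len(distinct), len(unique))          # 19 13
print(counts["CGCG"], "CGCG" in unique)    # 2 False
print("TGCA" in unique)                    # True
```

Because unique is built by filtering the same counts that define distinct, it is always a subset of distinct. Whether k-mers are pooled across sequences or counted per sequence, and whether reverse complements are merged, changes both totals, so those choices should be stated whenever the numbers are reported.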