BioJoker2172 days ago
I am working on a new genome and am wondering about methods for estimating its heterozygosity. Any tool suggestions and help are welcome.
You can try many programs, but the quickest one is BBTools/kmercountexact.sh
To use approximate counts:
khist.sh in=reads.fq khist=khist.txt peaks=peaks.txt
To use exact counts (and thus potentially more memory):
kmercountexact.sh in=reads.fq khist=khist.txt peaks=peaks.txt
The peaks file header contains estimates of genome size and heterozygosity. You can also add the flag "ploidy=2" for diploid organisms, so that the tool does not need to autodetect the ploidy (and thus potentially make a mistake).
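If you want to pull the estimates out of the peaks file in a script, something like the following might work. It is only a sketch: it assumes the header consists of lines of the form "#name<TAB>value", and the field names in the example text (genome_size, het_rate and so on) are placeholders for illustration, so check them against your own peaks.txt. The function name parse_peaks_header is likewise made up for this example.

```python
def parse_peaks_header(text):
    """Collect '#name value' header lines into a dict of strings."""
    header = {}
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        fields = line[1:].split()
        # Keep only simple name/value pairs; a column-title line with
        # more than two fields is skipped.
        if len(fields) == 2:
            header[fields[0]] = fields[1]
    return header


# Hypothetical header text, for illustration only; real field names
# and values in your peaks file may differ.
example = (
    "#k\t31\n"
    "#genome_size\t4500000\n"
    "#ploidy\t2\n"
    "#het_rate\t0.0021\n"
    "#start\tcenter\tstop\n"
    "100\t120\t140\n"
)

info = parse_peaks_header(example)
print(info.get("genome_size"), info.get("het_rate"))
```

For a real run you would read the file instead, e.g. parse_peaks_header(open("peaks.txt").read()). Values are kept as strings so that nothing is lost if a field turns out not to be numeric.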
I just came across this paper on arXiv: "Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects"
https://arxiv.org/abs/1308.2012
The software is available at ftp://ftp.genomics.org.cn/pub/gce/
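To give a feel for what "k-mer frequency" means here, below is a toy sketch of a k-mer histogram in Python. It is not the method from the paper or from any of the tools mentioned, just the basic counting idea under simplifying assumptions: reads are plain strings, reverse complements are not merged, and everything fits in memory. The function name kmer_histogram is made up for this example. On real diploid data, heterozygous sites typically show up as a second peak at roughly half the depth of the main peak, which is what the estimation tools try to model.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Return {multiplicity: number of distinct k-mers seen that many times}."""
    counts = Counter()
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Histogram over the counts: how many distinct k-mers occur once, twice, ...
    return Counter(counts.values())


# Two tiny overlapping "reads", for illustration only.
reads = ["ACGTAC", "CGTACG"]
hist = kmer_histogram(reads, k=4)
for multiplicity in sorted(hist):
    print(multiplicity, hist[multiplicity])
```

For real read sets a dedicated k-mer counter is the practical choice; a plain Python dictionary like this will run out of memory quickly.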