BOL: Related items

MGSE: Mapping-based Genome Size Estimation

Shruti Paniwala — Fri, 17 Jan 2020 02:11:43 -0600

MGSE estimates genome size from files that genome sequencing projects already generate. It requires a FASTA file containing a high-continuity assembly and a BAM file with all available reads mapped to this assembly. The script construct_cov_file.py (https://doi.org/10.1186/s12864-018-5360-z) generates a COV file from the (sorted) BAM file; MGSE can also do this directly. MGSE then uses the COV file to calculate the coverage in the provided reference regions and the total number of mapped bases, and derives the genome size estimate from these two values. Providing accurate reference regions is crucial for this estimate.
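The calculation described above comes down to dividing the total number of mapped bases by the typical coverage in the reference regions. The following is a minimal sketch of that arithmetic, not MGSE's own code: the function name, the in-memory coverage layout (standing in for a COV file), and the choice of the median as the "typical" coverage are all assumptions made for illustration.

```python
from statistics import median


def estimate_genome_size(coverage, reference_regions):
    """Estimate genome size as total mapped bases / typical reference coverage.

    coverage: dict mapping sequence name -> list of per-position read depths
              (hypothetical in-memory stand-in for a COV file).
    reference_regions: list of (sequence name, start, end) tuples, 0-based and
              end-exclusive, assumed to be single-copy regions.
    """
    # The total number of mapped bases is the depth summed over every position.
    total_mapped_bases = sum(sum(depths) for depths in coverage.values())

    # Collect the depth at every position inside the reference regions.
    reference_depths = []
    for name, start, end in reference_regions:
        reference_depths.extend(coverage[name][start:end])
    if not reference_depths:
        raise ValueError("reference regions cover no positions")

    typical_coverage = median(reference_depths)
    if typical_coverage <= 0:
        raise ValueError("reference regions have no coverage")

    return total_mapped_bases / typical_coverage


# Toy example: two positions with doubled depth (a collapsed repeat) add extra
# mapped bases, so the estimate exceeds the 12 bp "assembly" length.
toy_coverage = {"chr1": [30] * 10 + [60] * 2}
toy_regions = [("chr1", 0, 10)]
estimated_size = estimate_genome_size(toy_coverage, toy_regions)
```

In the toy example the assembly is 12 positions long, but the doubled depth at two positions suggests collapsed sequence, which is why a coverage-based estimate can be larger than the assembly itself.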

Address of the bookmark: https://github.com/bpucker/MGSE

Synteny and Rearrangement Identifier (SyRI)

Jit — Tue, 05 May 2020 10:37:10 -0500

SyRI is a comprehensive tool for predicting genomic differences between related genomes using whole-genome assemblies (WGA). The assemblies are aligned using whole-genome alignment tools, and these alignments are then used as input to SyRI. SyRI identifies the syntenic path (the longest set of co-linear regions), structural rearrangements (inversions, translocations, and duplications), local variations (SNPs, indels, CNVs, etc.) within syntenic and structurally rearranged regions, and unaligned regions.
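To make the idea of a "longest set of co-linear regions" concrete, here is a toy sketch that picks the highest-scoring chain of alignment blocks whose coordinates increase in both genomes. It is an illustration only and is not SyRI's algorithm; the function name, the block tuple layout, and the scoring by aligned reference length are assumptions.

```python
def longest_colinear_chain(blocks):
    """Return the highest-scoring chain of co-linear alignment blocks.

    blocks: list of (ref_start, ref_end, qry_start, qry_end) tuples for
            forward-strand alignments (hypothetical layout). A chain is scored
            by its total aligned reference length.
    """
    if not blocks:
        return []

    ordered = sorted(blocks)  # sort by reference start
    best_score = [r_end - r_start for r_start, r_end, _, _ in ordered]
    previous = [None] * len(ordered)

    for i, (r_start, r_end, q_start, _) in enumerate(ordered):
        for j in range(i):
            _, prev_r_end, _, prev_q_end = ordered[j]
            # Block j may precede block i only if it ends before block i
            # starts in both the reference and the query genome.
            if prev_r_end <= r_start and prev_q_end <= q_start:
                candidate = best_score[j] + (r_end - r_start)
                if candidate > best_score[i]:
                    best_score[i] = candidate
                    previous[i] = j

    # Trace back from the best-scoring end block.
    i = max(range(len(ordered)), key=best_score.__getitem__)
    chain = []
    while i is not None:
        chain.append(ordered[i])
        i = previous[i]
    return chain[::-1]


# The second block maps far away in the query (a translocation-like jump),
# so it is left out of the co-linear chain.
example_blocks = [
    (0, 100, 0, 100),
    (100, 200, 500, 600),
    (200, 300, 100, 200),
    (300, 400, 200, 300),
]
syntenic_chain = longest_colinear_chain(example_blocks)
```

Blocks that fall outside the chain are the candidates for rearrangements; a real tool also has to handle inverted alignments, overlaps, and duplications, which this sketch ignores.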

Address of the bookmark: https://schneebergerlab.github.io/syri/

Reference Sequence Resource!

LEGE — Wed, 15 Sep 2021 21:15:22 -0500

The ENCODE project uses Reference Genomes from NCBI or UCSC to provide a consistent framework for mapping high-throughput sequencing data. In general, ENCODE data are mapped consistently to 2 human (GRCh38, hg19) and 2 mouse (mm9, mm10) genomes for historical comparability. Drosophila melanogaster experiments are mapped to either dm3 or dm6, and Caenorhabditis elegans experiments are mapped to ce10 or ce11.

Address of the bookmark: https://www.encodeproject.org/data-standards/reference-sequences/

Vicoso group

Wed, 02 Feb 2022 02:51:27 -0600

The Vicoso group investigates how sex chromosomes evolve over time, and what biological forces are driving their patterns of differentiation.

The Vicoso group is interested in understanding several aspects of the biology of sex chromosomes, and the evolutionary processes that shape their peculiar features. By combining the use of next-generation sequencing technologies with studies in several model and non-model organisms, they can address a variety of standing questions, such as: Why do some Y chromosomes degenerate while others remain homomorphic, and how does this relate to the extent of sexual dimorphism of the species? What forces drive some species to acquire global dosage compensation of the X, while others only compensate specific genes? What are the frequency and molecular dynamics of sex-chromosome turnover?

More at https://ist.ac.at/en/research/vicoso-group/
http://pub.ist.ac.at/~bvicoso/

HIV genome database!

Rahul Nayak — Fri, 21 Jan 2022 05:40:15 -0600

HIV resources

https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html

Address of the bookmark: https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html

Human Complete Genome

Shruti Paniwala — Wed, 06 Jul 2022 06:42:55 -0500

Telomere-to-telomere consortium

We have sequenced the CHM13hTERT human cell line with a number of technologies. Human genomic DNA was extracted from the cultured cell line. As the DNA is native, modified bases will be preserved. The data includes 30x PacBio HiFi, 120x coverage of Oxford Nanopore, 70x PacBio CLR, 50x 10X Genomics, as well as BioNano DLS and Arima Genomics HiC. Most raw data is available from this site, with the exception of the PacBio data which was generated by the University of Washington/PacBio and is available from NCBI SRA.

A UCSC browser is available for v2.0 (as well as legacy v1.0 and v1.1 versions). An interactive dotplot visualization of all genomic repeats is also available from resgen.io. Known issues identified in the assembly are tracked at CHM13 issues.

MORE at https://github.com/marbl/CHM13

Address of the bookmark: https://www.science.org/doi/10.1126/science.abj6987

Steps to find all the repeats in the genome!

Neel — Thu, 31 Aug 2023 02:43:28 -0500

To find repeats of length 2 to 9 in a genome, you can run RepeatMasker and then parse its output with a Perl script. Here's a step-by-step guide:

Install RepeatMasker: First, install RepeatMasker on your system. You can download it from the RepeatMasker website.

Prepare the genome sequence: Make sure you have the genome sequence in a FASTA file format. Let's assume the file is named "genome.fasta".

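Run RepeatMasker: Use the following command:

./RepeatMasker -pa <threads> -nolow -norna -no_is -div <divergence> -lib RepeatMaskerLib.embl -gff -xsmall -small -poly -species <species> -dir <output_dir> -length <min_length>-<max_length> genome.fasta

Replace the following placeholders with appropriate values:

<threads>: The number of processors/threads you want to use for parallel processing.
<divergence>: The divergence value for the species you are analyzing. You can find divergence values for different species in the RepeatMasker documentation.
<species>: The name of the species you are analyzing.
<output_dir>: The directory where you want the output files to be saved.
<min_length> and <max_length>: The minimum and maximum lengths of the repeats you want to find (in this case, 2 and 9).

Check the options against "RepeatMasker -help" for your installed version before running the command: -length does not appear among RepeatMasker's documented options and may need to be dropped. Also note that -nolow skips low-complexity DNA and simple repeats, so omit it if short tandem repeats are what you are looking for.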

Analyze the output: RepeatMasker will generate several output files, including a .out file. You can parse this file to extract the information you need. A Perl tool called "one_code_to_find_them_all.pl" can help you parse RepeatMasker output files. You can download it from the source provided.

Use the provided Perl script: Once you have the "one_code_to_find_them_all.pl" script, you can run it to conveniently parse the RepeatMasker output files. Here's an example of how to use it:

perl one_code_to_find_them_all.pl --rm <rm_out_file> --length <length_file>

Replace <rm_out_file> with the path to your RepeatMasker .out file, and <length_file> with the path to a file containing the lengths of the reference elements.

This script will generate several output files, including .log.txt and .copynumber.csv, which contain quantitative information about the identified repeat elements.

Remember to adjust the parameters and options according to your specific needs and the characteristics of your genome.
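As a tool-independent cross-check, short tandem repeats with unit lengths of 2 to 9 can also be located directly with a regular expression. The sketch below is a simple illustration under stated assumptions, not part of RepeatMasker or one_code_to_find_them_all.pl: the function name, the default of at least three copies, and the decision to skip single-base (homopolymer) units are all hypothetical choices.

```python
import re


def find_tandem_repeats(sequence, min_unit=2, max_unit=9, min_copies=3):
    """Find perfect tandem repeats whose unit is min_unit..max_unit bases long.

    Returns a list of (start, end, unit, copies) tuples with 0-based,
    end-exclusive coordinates. Only exact, uninterrupted repeats are found.
    """
    # Lazy quantifier: the shortest unit that repeats is tried first.
    # \1{n,} then requires at least (min_copies - 1) further copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    repeats = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # Skip homopolymer runs such as AAAAAA reported with unit "AA".
            continue
        copies = len(match.group(0)) // len(unit)
        repeats.append((match.start(), match.end(), unit, copies))
    return repeats


example = "TTACACACACGGCATCATCATTT"
hits = find_tandem_repeats(example)
```

For a whole genome, the same function can be applied to each FASTA record in turn; imperfect repeats with mismatches or small indels need a dedicated tool.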

UnCoVar: Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction and Lineage Assignment

BioStar — Mon, 05 Aug 2024 23:01:29 -0500

Uses state-of-the-art tools and is easily extended to other viruses
Tool and database updates for critical components via Conda
Built using modern design patterns with Conda and Snakemake
Extensible and easy to customize
Submission Ready Genomes
Customizable reporting with comprehensive visualization

https://ikim-essen.github.io/uncovar/

GitHub: https://github.com/IKIM-Essen/uncovar

Address of the bookmark: https://ikim-essen.github.io/uncovar/

Genome Simulation with SLiM and msprime

BioStar — Fri, 31 Jan 2025 12:47:43 -0600

Genome simulation is an essential tool in population genetics, enabling researchers to model evolutionary processes and study genetic variation. Two widely used simulation tools in this field are SLiM and msprime. While both serve different purposes, they can be used together with the slendr framework to compare simulation outputs effectively.

Overview of SLiM and msprime

SLiM: Forward Genetic Simulator

SLiM is a free, open-source tool designed for forward genetic simulations. It allows researchers to model complex evolutionary scenarios, including selection, recombination, and demographic events, making it particularly useful for studying adaptation and selection in populations.

Key Features of SLiM:

Simulates population evolution forward in time
Supports custom evolutionary models using an embedded scripting language
Allows modeling of spatial and ecological dynamics
Provides high flexibility and extensibility for user-defined scenarios
Available on GitHub as an open-source project
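To illustrate what "forward in time" means, here is a minimal Wright-Fisher sketch in plain Python that tracks one allele under selection and drift, generation by generation. It is not SLiM code and does not use SLiM's scripting language; the function name, parameters, and the simple haploid selection model are assumptions chosen for brevity.

```python
import random


def wright_fisher(pop_size, start_freq, selection, generations, seed=None):
    """Simulate an allele frequency trajectory forward in time.

    pop_size: number of diploid individuals (2 * pop_size gene copies).
    start_freq: initial frequency of the focal allele.
    selection: selection coefficient s of the focal allele (0 = neutral).
    Returns a list of frequencies of length generations + 1.
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Selection: shift the expected frequency, p' = p(1+s) / (1 + s*p).
        expected = freq * (1 + selection) / (1 + selection * freq)
        # Drift: binomial sampling of the next generation's gene copies.
        copies = sum(rng.random() < expected for _ in range(n_copies))
        freq = copies / n_copies
        trajectory.append(freq)
    return trajectory


trajectory = wright_fisher(pop_size=100, start_freq=0.1, selection=0.05,
                           generations=50, seed=1)
```

Every generation and every gene copy is simulated explicitly, which is what makes forward simulation flexible but also comparatively expensive for large populations.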

msprime: Ancestry and Mutation Simulator

msprime is an efficient, open-source tool that simulates ancestry and mutations using a coalescent framework. It is known for its high-speed performance and low memory requirements, making it a popular choice for large-scale genomic simulations.

Key Features of msprime:

Implements coalescent simulations for ancestry modeling
Efficiently simulates large population histories
Supports the addition of mutations to genealogies
Developed using an open-source community model
Often faster and more memory-efficient than alternative simulators
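The coalescent idea can be sketched in a few lines: instead of simulating every individual forward, one follows only the sampled lineages backward in time until they merge into a common ancestor. The sketch below draws coalescence times under the standard Kingman coalescent using only the Python standard library. It does not use msprime's API; the function name and parameters are assumptions.

```python
import random


def simulate_coalescent_times(sample_size, pop_size, seed=None):
    """Draw coalescence times (in generations ago) for a sample of lineages.

    sample_size: number of sampled gene copies.
    pop_size: diploid population size N (2N gene copies), assumed constant.
    Returns sample_size - 1 times; the last one is the time to the most
    recent common ancestor of the whole sample.
    """
    rng = random.Random(seed)
    lineages = sample_size
    time = 0.0
    times = []
    while lineages > 1:
        # With k lineages there are k*(k-1)/2 pairs, and each pair
        # coalesces at rate 1 / (2N) per generation.
        pairs = lineages * (lineages - 1) / 2
        time += rng.expovariate(pairs / (2 * pop_size))
        times.append(time)
        lineages -= 1
    return times


coalescence_times = simulate_coalescent_times(sample_size=10, pop_size=1000,
                                              seed=42)
```

Only sample_size - 1 events are generated regardless of the population size, which gives some intuition for why coalescent simulators tend to be fast and memory-efficient.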

Using SLiM and msprime with slendr

Both SLiM and msprime can be integrated with slendr, a framework that facilitates structured population genetic simulations. This integration allows for seamless comparison of simulation outputs.

How They Work Together:

SLiM and msprime simulations can be analyzed within slendr.
The ts_read() function in slendr enables loading and comparing tree sequence outputs from both simulators.
This integration allows researchers to validate simulation results and gain deeper insights into evolutionary processes.

Performance Considerations

While SLiM offers powerful forward simulations with extensive customization, msprime is often preferred for its speed and memory efficiency when simulating ancestry and mutations. The choice between the two depends on the research goals:

For detailed evolutionary modeling with selection and recombination: Use SLiM.
For large-scale coalescent simulations with mutations: Use msprime.
For comparing different simulation models and their outputs: Use slendr to integrate SLiM and msprime results.

Conclusion

SLiM and msprime are valuable tools for genome simulation, each serving distinct but complementary purposes in population genetics research. By leveraging the strengths of both simulators with slendr, researchers can conduct robust and efficient evolutionary simulations, enhancing our understanding of genetic diversity and adaptation.

For more information, check out the official GitHub repositories for SLiM and msprime, and explore the slendr framework for streamlined simulation workflows.

Oldest Hominin DNA Sequenced

Surajeet — Fri, 27 Dec 2013 19:58:31 -0600

Matthias Meyer and his team from the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, have developed new techniques for retrieving and sequencing highly degraded ancient DNA. They then joined forces with Juan-Luis Arsuaga and applied the new techniques to a cave bear from the Sima de los Huesos site. After this success, the researchers sampled two grams of bone powder from a hominin thigh bone from the cave. They extracted its DNA and sequenced the mitochondrial genome (mtDNA), a small part of the genome that is passed down along the maternal line and occurs in many copies per cell. The researchers then compared this ancient mitochondrial DNA with that of Neandertals, Denisovans, present-day humans, and apes.

From the missing mutations in the old DNA sequences the researchers calculated that the Sima hominin lived about 400,000 years ago. They also found that it shared a common ancestor with the Denisovans, an extinct archaic group from Asia related to the Neandertals, about 700,000 years ago. "The fact that the mtDNA of the Sima de los Huesos hominin shares a common ancestor with Denisovan rather than Neandertal mtDNAs is unexpected since its skeletal remains carry Neandertal-derived features," says Matthias Meyer. Considering their age and Neandertal-like features, the Sima hominins were likely related to the population ancestral to both Neandertals and Denisovans. Another possibility is that gene flow from yet another group of hominins brought the Denisova-like mtDNA into the Sima hominins or their ancestors.

Reference

http://www.sciencedaily.com/releases/2013/12/131204132018.htm