BOL: Related items

Vicoso group

Wed, 02 Feb 2022 02:51:27 -0600

The Vicoso group investigates how sex chromosomes evolve over time, and what biological forces are driving their patterns of differentiation.

The Vicoso group is interested in understanding several aspects of the biology of sex chromosomes, and the evolutionary processes that shape their peculiar features. By combining the use of next-generation sequencing technologies with studies in several model and non-model organisms, they can address a variety of standing questions, such as: Why do some Y chromosomes degenerate while others remain homomorphic, and how does this relate to the extent of sexual dimorphism of the species? What forces drive some species to acquire global dosage compensation of the X, while others only compensate specific genes? What are the frequency and molecular dynamics of sex-chromosome turnover?

More at https://ist.ac.at/en/research/vicoso-group/
http://pub.ist.ac.at/~bvicoso/

UCSC SARS-CoV-2 Genome Browser

Jit — Thu, 06 Jan 2022 06:48:40 -0600

The UCSC SARS-CoV-2 Genome Browser (https://genome.ucsc.edu/covid19.html) is an adaptation of our popular genome-browser visualization tool for this virus, containing many annotation tracks and new features, including conservation with similar viruses, immune epitopes, RT–PCR and sequencing primers and CRISPR guides. We invite all investigators to contribute to this resource to accelerate research and development activities globally.

Address of the bookmark: https://www.nature.com/articles/s41588-020-0700-8

MUM&Co is a simple bash script that uses Whole Genome Alignment information provided by MUMmer (v4) to detect variants.

Rahul Nayak — Wed, 27 Apr 2022 04:34:12 -0500

MUM&Co is able to detect:
Deletions, insertions, tandem duplications and tandem contractions (>=50bp & <=150kb)
Inversions (>=1kb) and translocations (>=10kb)

Address of the bookmark: https://github.com/SAMtoBAM/MUMandCo

The ATCC Genome Portal

Abhi — Wed, 15 May 2024 14:24:16 -0500

The ATCC Genome Portal (AGP, https://genomes.atcc.org/) is a database of authenticated genomes for bacteria, fungi, protists, and viruses held in ATCC’s biorepository. It now includes 3,938 assemblies (253% increase) produced under ISO 9000 by ATCC. Here, we present new features and content added to the AGP for the research community.

Address of the bookmark: https://genomes.atcc.org/

Proksee

LEGE — Wed, 27 Mar 2024 11:11:54 -0500

Proksee is an expert system for genome assembly, annotation and visualization. To begin using Proksee, provide a complete genome sequence, sequencing reads or a CGView/Proksee map JSON file.

Please Cite the Following

Grant JR, Enns E, Marinier E, Mandal A, Herman EK, Chen C, Graham M, Van Domselaar G, and Stothard P

Proksee: in-depth characterization and visualization of bacterial genomes

Nucleic Acids Research, 2023, gkad326, https://doi.org/10.1093/nar/gkad326

Address of the bookmark: https://proksee.ca/

UnCoVar: Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction and Lineage Assignment

BioStar — Mon, 05 Aug 2024 23:01:29 -0500

Using state-of-the-art tools, easily extended for other viruses
Tool and database updates for critical components via Conda
Built using modern design patterns with Conda and Snakemake
Extensible and easy to customize
Submission-ready genomes
Customizable reporting with comprehensive visualization

https://ikim-essen.github.io/uncovar/

GitHub: https://github.com/IKIM-Essen/uncovar

Address of the bookmark: https://ikim-essen.github.io/uncovar/

Early Genome Screening: The New Health Horoscope!

LEGE — Thu, 02 Jan 2025 19:44:36 -0600

In an era where precision medicine is reshaping healthcare, genome screening is emerging as the modern equivalent of a health horoscope. It offers insights into our biological "stars," unraveling predispositions to various conditions and empowering individuals with knowledge to navigate their health journeys proactively. But how reliable is this "horoscope," and how does it impact our lives?

Understanding Genome Screening

Genome screening involves analyzing an individual's DNA to identify genetic variations that may influence health and disease susceptibility. This can range from simple single-gene tests to comprehensive whole-genome sequencing. By peering into our genetic blueprint, we can uncover risks for conditions like cancer, diabetes, cardiovascular diseases, and even rare genetic disorders.

The process is straightforward: a saliva or blood sample is collected, and advanced sequencing technologies decipher the genetic code. The results provide a personalized health map, guiding lifestyle modifications, preventive measures, or medical interventions.

A Shift from Reactive to Proactive Healthcare

Traditional healthcare often focuses on treating diseases after they manifest. Genome screening flips this model on its head, enabling a shift toward prevention and early intervention. For instance:

Cancer Risk Management: Individuals with BRCA1 or BRCA2 gene mutations can opt for enhanced screening programs or preventive surgeries to mitigate their risk of breast and ovarian cancers.
Cardiovascular Health: Genetic predispositions to conditions like familial hypercholesterolemia can prompt early cholesterol monitoring and lifestyle adjustments.
Rare Diseases: Identifying carriers of genetic disorders can aid in family planning and reduce the incidence of inherited conditions.

The Ethical and Practical Concerns

While genome screening offers incredible promise, it is not without challenges:

Accuracy and Interpretation: Genetic predisposition does not guarantee disease. Misinterpretation of results can lead to unnecessary anxiety or unwarranted medical interventions.
Privacy and Data Security: Genetic data is highly sensitive. Ensuring robust data protection measures is crucial to prevent misuse.
Accessibility and Equity: High costs and limited availability may restrict access to genome screening, exacerbating health disparities.

Balancing Science and Pseudoscience

The comparison of genome screening to horoscopes isn’t entirely unfounded. Both offer predictive insights, but the scientific foundation of genome screening distinguishes it from astrology. Unlike the alignment of celestial bodies, genetic predictions are based on rigorous data and evidence. However, the probabilistic nature of genetic predispositions underscores the importance of interpreting results in conjunction with clinical and lifestyle factors.

The Road Ahead

As genome screening becomes more affordable and integrated into routine healthcare, its potential to transform lives is immense. Policymakers, healthcare providers, and genetic counselors must collaborate to ensure ethical implementation, public awareness, and equitable access.

Imagine a future where your genetic "horoscope" is a trusted guide, not just a prediction. Early genome screening could help chart a healthier path for generations, making it a cornerstone of personalized medicine. After all, our genes might just hold the key to unlocking a future of better health and well-being.

HiTE: a fast and accurate dynamic boundary adjustment approach for full-length Transposable Elements detection and annotation in Genome Assemblies

LEGE — Sat, 20 Sep 2025 09:34:04 -0500

HiTE is Python software that uses a dynamic boundary adjustment approach to detect and annotate full-length Transposable Elements in Genome Assemblies. Compared with other tools, HiTE detects a greater number of full-length TEs.

panHiTE

We have developed panHiTE, a comprehensive and accurate pipeline for TE detection in large-scale population genomes. It has been successfully applied to hundreds of plant population genomes, demonstrating its effectiveness and scalability.

For detailed instructions, please refer to the panHiTE tutorial.

Address of the bookmark: https://github.com/CSU-KangHu/HiTE

BFC: a standalone high-performance tool for correcting sequencing errors from Illumina sequencing data

Jit — Thu, 31 May 2018 09:35:23 -0500

BFC is a standalone high-performance tool for correcting sequencing errors from Illumina sequencing data. It is specifically designed for high-coverage whole-genome human data, though also performs well for small genomes. The BFC algorithm is a variant of the classical spectrum alignment algorithm introduced by Pevzner et al (2001). It uses an exhaustive search to find a k-mer path through a read that minimizes a heuristic objective function jointly considering penalties on correction, quality and k-mer support. This algorithm was first implemented in my fermi assembler and then refined a few times in fermi, fermi2 and now in BFC. In the k-mer counting phase, BFC uses a blocked bloom filter to filter out most singleton k-mers and keeps the rest in a hash table (Melsted and Pritchard, 2011). The use of bloom filter is how BFC is named, though other correctors such as Lighter and Bless actually rely more on bloom filter than BFC. https://github.com/lh3/bfc
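
The k-mer counting idea described above can be illustrated with a toy sketch. This is not BFC's code: it uses a plain (not blocked) Bloom filter, and the k-mer length, filter size, MD5-based hashing and the helper names bit_positions and seen_before are all assumptions made for illustration. A k-mer only enters the hash table once the Bloom filter reports that it has probably been seen before, so most singletons never reach the table.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

my $k        = 4;          # toy k-mer length (assumption)
my $num_bits = 1 << 16;    # Bloom filter size in bits (assumption)
my $bloom    = "\0" x ($num_bits / 8);

# Three bit positions per k-mer, derived from one MD5 digest.
sub bit_positions {
    my ($kmer) = @_;
    my @words = unpack('N3', md5($kmer));
    return map { $_ % $num_bits } @words;
}

# Returns true if the k-mer was (probably) seen before, and marks it as seen.
sub seen_before {
    my ($kmer) = @_;
    my $seen = 1;
    for my $pos (bit_positions($kmer)) {
        $seen = 0 unless vec($bloom, $pos, 1);
        vec($bloom, $pos, 1) = 1;
    }
    return $seen;
}

my %counts;    # holds only k-mers the filter has already seen once
for my $read ('ACGTAC', 'ACGTTT') {
    for my $i (0 .. length($read) - $k) {
        my $kmer = substr($read, $i, $k);
        $counts{$kmer}++ if seen_before($kmer);
    }
}

# The hash count misses the first occurrence, so add 1 when reporting.
print "$_\t", $counts{$_} + 1, "\n" for sort keys %counts;
```

In this toy input ACGT is the only k-mer that occurs twice, so it should be the only key in the hash table. A Bloom filter can give false positives, so an occasional singleton may still slip through in real data.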

Address of the bookmark: https://github.com/lh3/bfc

Parallel Processing with Perl!

Rahul Nayak — Sat, 25 Aug 2018 11:32:40 -0500

Here is a small tutorial on how to make the best use of multiple processors for bioinformatics analysis. One good way is to use Perl threads and forks. It is important to understand how threads and forks work before implementing them, so reading up on them before following this tutorial would be really useful.

Many times in bioinformatics we need to deal with huge datasets that are more than 100 GB in size. The traditional way to analyze a file is with a while loop:

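open(my $fh, '<', 'your_file.fasta') or die "Can't open file: $!";
while (my $line = <$fh>) {
    # do something with $line
}
close($fh);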

This is very slow (since we are using only one processor), and if we have 500 million lines in the dataset it can take more than a day to iterate through the whole dataset. So how do we make the best use of all our processors and get the work done quickly?

Here is a very simple and efficient technique with Perl which I have been using. I am more inclined towards using Perl fork than Perl threads.

One of the oldest ways to fork is:

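my @children;
my $fork = fork();
die "Couldn't fork: $!" unless defined $fork;    # fork returns undef on failure
if ($fork) {
    # parent: remember the child's process ID
    push(@children, $fork);
}
else {
    # child ($fork == 0): your code here
    exit(0);
}
## wait for the child processes to finish
foreach my $child (@children) {
    waitpid($child, 0);
}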

What a fork does is create a child process that gets its own copy of the variables and code and runs separately (detached from the parent process), so a separate process is created (which usually runs on a separate processor). That's it!! One big disadvantage of forking is that it is very difficult to share variables among the different processes. I will show you how to do it easily, but it still has its own drawbacks.
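
To make the variable-sharing problem concrete, here is a minimal sketch using only core Perl. The variable names and the value 42 are made up for illustration. The child changes its own copy of $counter, the parent never sees that change, and the only thing that reaches the parent is what the child writes into a pipe.

```perl
use strict;
use warnings;

my $counter = 0;
pipe(my $reader, my $writer) or die "Couldn't create pipe: $!";

my $pid = fork();
die "Couldn't fork: $!" unless defined $pid;

if ($pid == 0) {
    # child: works on its own copy of $counter
    close $reader;
    $counter = 42;
    print {$writer} "$counter\n";    # send the result back through the pipe
    close $writer;
    exit(0);
}

# parent: read what the child sent, then wait for it to exit
close $writer;
my $from_child = <$reader>;
chomp $from_child;
close $reader;
waitpid($pid, 0);

print "counter in parent: $counter\n";       # unchanged by the child
print "value sent by child: $from_child\n";
```

The drawback is that everything has to be turned into text (or some other serialized form) to travel through the pipe, which gets tedious for large or nested data.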

Okay, now if you really do not want to call fork yourself in your code, that's okay too. There are many useful modules which do it for you very efficiently. One really useful module is Parallel::ForkManager. You can use Parallel::ForkManager to manage the number of forks you want to generate (the number of processors you want to use).
Simple usage:
use Parallel::ForkManager;
my $max_processors = 8;
my $fork = Parallel::ForkManager->new($max_processors);
foreach my $seq (@dna) {
    $fork->start and next;    # do the fork; the parent moves on to the next element
    # your code here (runs in the child, working on $seq)
    $fork->finish;            # do the exit in the child process
}
$fork->wait_all_children;

So you will be running up to 8 child processes at a time, each doing the same thing for one element of your array. When one child finishes, Parallel::ForkManager starts a new one, and thus you will be using all your processors to analyze the data. Now, suppose you have generated 8 child processes and want to write the data to one file. You need to lock the file to do this, because otherwise the buffered output of the different children can get mixed up. You can lock the file using the flock command.

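use Fcntl qw(:flock);    # provides LOCK_EX and LOCK_UN
open(my $QUAL, '>>', 'myfile.txt') or die "Can't open file: $!";    # open for appending
flock($QUAL, LOCK_EX) or die "Can't lock file: $!";
print $QUAL $output;
flock($QUAL, LOCK_UN) or die "Can't unlock file: $!";
close $QUAL;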

I would not suggest using flock when dealing with multiple processes, because it decreases processing efficiency (each child process must wait for the lock to be released by the other child processes). Instead, I would suggest having each fork write to a separate file and simply concatenating the files after the processing.
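
Here is a minimal sketch of the one-file-per-child idea, using plain fork and core modules only so that it runs without installing anything. The toy @chunks data and the part_N.txt file names are assumptions for illustration. Each child writes its own file, no locking is needed, and the parent stitches the parts together in order once every child has finished.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);       # scratch directory for the part files
my @chunks = ('ACGT', 'GGCC', 'TTAA');    # toy input, one chunk per child
my @pids;

for my $i (0 .. $#chunks) {
    my $pid = fork();
    die "Couldn't fork: $!" unless defined $pid;
    if ($pid == 0) {
        # child: write the result to its own file, so no lock is needed
        open(my $out, '>', "$dir/part_$i.txt") or die "Can't write part $i: $!";
        print {$out} length($chunks[$i]), "\t$chunks[$i]\n";
        close $out;
        exit(0);
    }
    push @pids, $pid;    # parent: remember the child
}
waitpid($_, 0) for @pids;

# parent: concatenate the part files in their original order
my $merged = '';
for my $i (0 .. $#chunks) {
    open(my $in, '<', "$dir/part_$i.txt") or die "Can't read part $i: $!";
    local $/;            # slurp the whole file at once
    $merged .= <$in>;
    close $in;
}
print $merged;
```

For real datasets you would not hold the merged output in memory; appending each part to the final file, or a plain cat of the part files, does the same job.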

Putting it all together, if you have 100 GB of data you can do this:

Step 1: split the dataset equally according to the number of processors you have. This may take a few hours (about 2-3 hrs for a 100 GB file).
You can use the Unix "split" command for this.
For example:
# split -l counts lines, so keep $number_split a multiple of the lines per record
my $number_split = int($number_of_lines_in_your_dataset / $max_processors);
system('split', '-l', $number_split, 'your_file.fasta', 'file_name') == 0 or die "split failed: $?";
Step 2: open your directory containing your split files and start Parallel::ForkManager.
For example:
opendir(my $dir, $split_files_directory) or die "Can't open directory: $!";    ### open the directory
my $super_fork = Parallel::ForkManager->new($max_processors);
while (defined(my $file = readdir($dir))) {    ### read the directory
    next if $file =~ /^\./;    # skip ".", ".." and hidden files
    print $file, "\n";
    ########## Start fork ##########
    my $pid = $super_fork->start and next;
    # whatever you want to do with the split file goes here,
    # e.g. analyze "$split_files_directory/$file"
    ######### end fork ###############
    $super_fork->finish;
}
closedir($dir);
$super_fork->wait_all_children;

So basically each processor will be busy with its own piece of data (split file), and thus you have created 8 processes at one time which run without interfering with each other. Again, I would not suggest writing the output from each child process to one file (for the reasons above). Write the output from each fork to a separate file and finally concatenate them. That's it, you have just increased your program speed by up to 8 times!! Isn't it easy?
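
One detail in step 1 is worth a small worked example: split -l counts lines, not records, so the chunk size should be a whole number of records and should be rounded up so that you do not end up with more files than processors. The helper below is hypothetical (lines_per_chunk is not part of the tutorial or of any module) and it assumes every record has the same number of lines, for example 2 for a FASTA file with single-line sequences.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: lines per split file such that at most $workers
# files are produced and no record is cut in half.
sub lines_per_chunk {
    my ($total_lines, $workers, $lines_per_record) = @_;
    my $records           = ceil($total_lines / $lines_per_record);
    my $records_per_chunk = ceil($records / $workers);
    return $records_per_chunk * $lines_per_record;
}

# 10 lines = 5 two-line records, shared among 3 workers:
# 2 records per file, so 4 lines per file (files of 4, 4 and 2 lines).
my $n = lines_per_chunk(10, 3, 2);
print "split -l $n your_file.fasta file_name\n";
```

If the records in your file do not all have the same number of lines (multi-line FASTA, for instance), this arithmetic does not apply and you would need a record-aware splitter instead.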

Note:
You may worry about concatenating the output each child generates, since it does take some time (remember, 100 GB). I think you can use the MySQL LOAD DATA LOCAL INFILE command to load all the files into a single table (should take about 3 hrs for a 100 GB dataset) and then export the whole table into one file. This should be faster than just concatenating them using the "cat" command (correct me if I am wrong).

Or a much simpler way is to use pipes:

cat output_dir/* | my_pipe or my_pipe <(file1) final_file;

That's it, guys!! Enjoy programming and please do comment. I am not a computer scientist, so forgive me for any mistakes, and if you find any please report them. Thank you.