BOL: Related items

Steps to find all the repeats in the genome !

Neel — Thu, 31 Aug 2023 02:43:28 -0500

To find repeats in a genome from 2 to 9 length using a Perl script, you can use the RepeatMasker tool with the "--length" option[0]. Here's a step-by-step guide:

Install RepeatMasker: First, you need to install RepeatMasker on your system. You can download it from the RepeatMasker website[0].

Prepare the genome sequence: Make sure you have the genome sequence in a FASTA file format. Let's assume the file is named "genome.fasta".

./RepeatMasker -pa -nolow -norna -no_is -div -lib RepeatMaskerLib.embl -gff -xsmall -small -poly -species -dir -length - genome.fasta

Replace the following placeholders with appropriate values:

: The number of processors/threads you want to use for parallel processing.
: The divergence value for the species you are analyzing. You can find divergence values for different species in the RepeatMasker documentation[0].
: The name of the species you are analyzing.
: The directory where you want the output files to be saved.
and : The minimum and maximum lengths of the repeats you want to find (in this case, 2 and 9).

Analyze the output: RepeatMasker will generate several output files, including a .out file. You can parse this file to extract the information you need. There is a Perl tool called "one_code_to_find_them_all.pl" that can help you parse RepeatMasker output files[0]. You can download it from the source provided.

Use the provided Perl script: Once you have the "one_code_to_find_them_all.pl" script, you can run it to conveniently parse the RepeatMasker output files. Here's an example of how to use it:

perl one_code_to_find_them_all.pl --rm --length

Replace with the path to your RepeatMasker .out file, and with the path to a file containing the lengths of the reference elements.

This script will generate several output files, including .log.txt and .copynumber.csv, which contain quantitative information about the identified repeat elements.

Remember to adjust the parameters and options according to your specific needs and the characteristics of your genome.

UnCoVar: Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction and Lineage Assignment

BioStar — Mon, 05 Aug 2024 23:01:29 -0500

UnCoVar: Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction and Lineage Assignment

Using state of the art tools, easily extended for other viruses
Tool and database updates for critical components via Conda
Built using modern design patterns with Conda and Snakemake
Extensible and easy to customize
Submission Ready Genomes
Customizable reporting with comprehensive visualization

https://ikim-essen.github.io/uncovar/

Github https://github.com/IKIM-Essen/uncovar

Address of the bookmark: https://ikim-essen.github.io/uncovar/

Genome Simulation with SLiM and msprime

BioStar — Fri, 31 Jan 2025 12:47:43 -0600

Genome simulation is an essential tool in population genetics, enabling researchers to model evolutionary processes and study genetic variation. Two widely used simulation tools in this field are SLiM and msprime. While both serve different purposes, they can be used together with the slendr framework to compare simulation outputs effectively.

Overview of SLiM and msprime

SLiM: Forward Genetic Simulator

SLiM is a free, open-source tool designed for forward genetic simulations. It allows researchers to model complex evolutionary scenarios, including selection, recombination, and demographic events, making it particularly useful for studying adaptation and selection in populations.

Key Features of SLiM:

Simulates population evolution forward in time
Supports custom evolutionary models using an embedded scripting language
Allows modeling of spatial and ecological dynamics
Provides high flexibility and extensibility for user-defined scenarios
Available on GitHub as an open-source project

msprime: Ancestry and Mutation Simulator

msprime is an efficient, open-source tool that simulates ancestry and mutations using a coalescent framework. It is known for its high-speed performance and low memory requirements, making it a popular choice for large-scale genomic simulations.

Key Features of msprime:

Implements coalescent simulations for ancestry modeling
Efficiently simulates large population histories
Supports the addition of mutations to genealogies
Developed using an open-source community model
Often faster and more memory-efficient than alternative simulators

Using SLiM and msprime with slendr

Both SLiM and msprime can be integrated with slendr, a framework that facilitates structured population genetic simulations. This integration allows for seamless comparison of simulation outputs.

How They Work Together:

SLiM and msprime simulations can be analyzed within slendr.
The ts_read() function in slendr enables loading and comparing tree sequence outputs from both simulators.
This integration allows researchers to validate simulation results and gain deeper insights into evolutionary processes.

Performance Considerations

While SLiM offers powerful forward simulations with extensive customization, msprime is often preferred for its speed and memory efficiency when simulating ancestry and mutations. The choice between the two depends on the research goals:

For detailed evolutionary modeling with selection and recombination: Use SLiM.
For large-scale coalescent simulations with mutations: Use msprime.
For comparing different simulation models and their outputs: Use slendr to integrate SLiM and msprime results.

Conclusion

SLiM and msprime are valuable tools for genome simulation, each serving distinct but complementary purposes in population genetics research. By leveraging the strengths of both simulators with slendr, researchers can conduct robust and efficient evolutionary simulations, enhancing our understanding of genetic diversity and adaptation.

For more information, check out the official GitHub repositories for SLiM and msprime, and explore the slendr framework for streamlined simulation workflow

Graph Genome Suite

Jit — Fri, 28 Oct 2016 07:59:54 -0500

Seven Bridges is the biomedical data analysis company accelerating breakthroughs in genomics research for cancer, drug development and precision medicine. We build self-improving systems to analyze millions of genomes, including the Graph Genome Suite — the most advanced population genomics tools in the world.

Address of the bookmark: https://www.sbgenomics.com/graph/

PhyloGrapher - Graph Visualization Tool

Jit — Wed, 07 Mar 2018 18:11:25 -0600

PhyloGrapher is a program designed to visualize and study evolutionary relationships within families of homologous genes or proteins (elements). PhyloGrapher is a drawing tool that generates custom graphs for a given set of elements. In general, it is possible to use PhyloGrapher to visualize any type of relations between elements.

https://www.youtube.com/watch?v=WgufqYMHCvM

Address of the bookmark: http://www.atgc.org/PhyloGrapher/PhyloGrapher_Welcome.html

pggb: the pangenome graph builder

biogeek — Sun, 13 Sep 2020 20:54:20 -0500

This pangenome graph construction pipeline renders a collection of sequences into a pangenome graph (in the variation graph model). Its goal is to build a graph that is locally directed and acyclic while preserving large-scale variation. Maintaining local linearity is important for the interpretation, visualization, and reuse of pangenome variation graphs.

Address of the bookmark: https://github.com/pangenome/pggb

GraphPath: A graph attention model for molecular stratification with interpretability based on the pathway-pathway interaction network

LEGE — Wed, 27 Mar 2024 20:51:21 -0500

Achieving accurate and interpretable clinical predictions requires paramount attention to thoroughly characterizing patients at both the molecular and biological pathway levels. In this paper, we present GraphPath, a biological knowledge-driven graph neural network with multi-head self-attention mechanism that implements the pathway-pathway interaction network. We train GraphPath to classify the cancer status of patients with prostate cancer based on their multi-omics profiling.

Address of the bookmark: https://github.com/amazingma/GraphPath

Five points for bioinformatics software/tools

Jitendra Narayan — Mon, 05 Aug 2013 04:12:32 -0500

In the bioinformatics sector we mostly spend time on computational analysis of huge amounts of data and try to make sense of it, biologically. But, most of the newbie bioinformaticians are faced with dilemma when they receive biological sequence data for the first time. They mostly found confusing over open source, user friendly GUI, and commercial bioinformatics software. Don’t be surprise this is true and also not an easy task to decide, because analytical step is the most crucial part and believe to be the biggest bottleneck in publishing paper in high impact journals. Through this blog I would like to address the pros and cons of both kind of software/tools and try to assist (Hmmm not really, It looks convince) you to make decision on your software selections.

The most common newbie questions are:

Should I try to use these free open source programs? Why are we not trying GUI software for computational analysis? Should I use commercial bioinformatics programs/software?”

1. Let’s be open

We generally think free and cheap are useless. But this concept is not applicable when we discuss open source software. Mostly, the bioinformatics software is developed by highly competitive biological programmers who believe in open sharing of knowledge. They come under Open Bioinformatics Foundation or O|B|F which is a non-profit, volunteer run organization focused on supporting open source programming in bioinformatics. The best part about open source tools/software is that they’re free to download the source code and read exactly what the program does. If you are so inclined, you can view all of the parts of the program and see the logical flow of the pipeline. In addition, open source makes an excellent learning tool for any beginning bioinformatician. Moreover, you can modify existing open source programs to deal with cutting-edge problems or to customize your pipeline. Apart from your computational and analysis work, most of the reviewer also prefers the open source based results so that they can validate the results if validation required.

2. Code headache

As a bioinformatician you are supposed to know the basics of programming languages, and if you are not good at it, then please learn it as soon as possible because you are not a bio-analyst but biological programmers. The open source programs usually lack dedicated service and support teams (often because they were the product of an overworked doc/postdoc!) so you are responsible for troubleshooting your own errors most of the time. We commonly receive the HELP email to support and assist to setup the pipeline; you can also find this kind of request on any QA forum. I personally believe this coding horror brings the biggest downside of open-source programs; where you need some programming skills in order to implement the program in your pipeline. But, if you are not able to fix the pipeline and modify the open source code according to your requirements them you should re-think on your bioinformatician name tag!!!

3. Dive into the codes

Some of the biologist turn bioinformatician says “if you can do the same thing with commercial software then why to get migraine with weird codes”, well this statement looks to me that guys are keen to learn swimming but still don’t like to get wet. If you are still using paid software and doing your work by customer support and clicking some of the well-designed GUI button then perhaps you are not interested in learning and trying new and challenging bioinformatics works. You are missing the basic flavour of bioinformatics. Let’s dive into the coding world, I am sure your will enjoy it. I recommend your to swim freely in code’s sea, and enjoy the journey; do not merely watch it from the outside.

4. Paid does not mean better

The bioinformatics company which are specializes in bioinformatics solutions develop well designed/packed, user friendly software by using a large number of specialised scientist, programmers and support staff. They also provide good services to accomplice your biological analysis work. This means that if you hit a ‘snag’ with your data, help is likely only a phone call away! These companies price their products competitively against the cost of a dedicated bioinformatician. You may be able to afford the program, but not the additional staff! Additionally, most of the functionality that you need in your analysis is already coded into the program. Need to plot a graph? Just click this button right here. It is that easy. But, as a bioinformatician this is not generally well encouraged approach in biological analysis work, because the software is not available to everyone and your data can’t be validated. Moreover, there is very less chances that anyone will repeat your work or love to do similar kind of research (because not all the labs in the world are rich like yours).

5. Take a caution

In biological analysis work, in which you deal GB/TB of data are having maximum chances of getting errors, so please be careful and always cross check your data before coming to any conclusion. Even an error in two line code can alter your entire analysis and display weird results. Some of the scientist blindly believes on commercial software, which is entirely wrong. Using proprietary tools does not absolve you of the need to actually read and research the type of analysis that you are doing. This is particularly true in the case of genome assembly and annotation.

At the end, I would like to tell only one think that open source solutions allows you to do more cutting edge analysis than the commercial tools. So let’s go for it.

Disclaimer:

This is my personal view. I have nothing to do with any company or open source community. The views expressed on these pages are mine alone and not those of my current/past employers. I do reserve the right to remove comments left by spammers or off-topic comments.

Funny software engineer

Ram Yash Pal — Fri, 09 May 2014 06:57:04 -0500

GKNO

Neel — Fri, 20 May 2016 18:56:37 -0500

gkno opens the world of complex bioinformatic analysis to people of all level of computational expertise. This site contains documentation, tutorials and information on all the tools that comprise gkno.

http://gkno.me/how-to/install.html

http://gkno.me/software.html

Address of the bookmark: http://gkno.me/