BOL: Related items

Gap filling or contig extension tools !

Rahul Nayak — Fri, 01 Jun 2018 08:07:32 -0500

There are many tools for gap filling with Illumina short reads, for example "GapFiller: a de novo assembly approach to fill the gap within paired reads" or "Toward almost closed genomes with GapFiller". There are also tools such as GAPresolution that can help perform local re-assemblies using 454 reads. We used GAPresolution, but it is not very good software; it is useful only in some specific situations.

Take a look at the PRICE software from the DeRisi lab. It's meant to do something very similar. http://derisilab.ucsf.edu/index.php?page=software

You could also look at SSPACE (http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/sspacev12/), ATLAS tools (http://www.hgsc.bcm.tmc.edu/content/bcm-hgsc-software), and SCARPA (http://compbio.cs.toronto.edu/hapsembler/scarpa.html).

See the PAGIT protocol: http://www.sanger.ac.uk/resources/software/pagit/

In particular, take a look at the IMAGE tool: http://genomebiology.com/2010/11/4/R41

SOAPdenovo also has a function for scaffolding. I'm not sure about ABySS.

Here is a useful explanation of several tools:

https://bioinformaticsonline.com/search?q=scaffolding&entity_type=object&entity_subtype=bookmarks&offset=0&search_type=entities

I could be wrong, but the above answers to your hypothetical scenario appear to miss the point that you aren't interested in assembling the full genome, just the 100 kb part you're interested in. I suggest the following algorithm:

1. Start with the initial assembly C0 of the contigs you have identified as overlapping your region of interest, and the set S of reads those contigs contain. Let C = C0.

2. Repeat:
a. Identify paired-end reads (not in C) for which one or both ends align within, or extending, contigs in C.
b. Identify unpaired reads that align extending these new paired-end reads.
c. Construct a new assembly C' from C and the new reads identified in (a) and (b).
d. Trim C' so it does not extend more than 100 kb to either end of C0. Set C = C'.
e. Let S' denote the reads that contribute to C'. If S' does not contain any reads not present in S, stop. Otherwise, set S = S'.

3. If you don't have a complete assembly of the region of interest, generate an STS for each end of each contig, probe a library for clones including these STSes, subclone these clones into a paired-end sequencing vector, and generate paired-end reads for this library; then try steps (1) and (2) again, adding these new sequencing reads to what you had before.

4. If your average sequencing depth for the region of interest exceeds 25 or so without filling all gaps, it is likely that the remaining gaps represent sequences that are not getting cloned in your sequencing vectors. Try different sequencing vectors.
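The loop in step 2 can be sketched as a fixed-point iteration. The sketch below is a toy under loud assumptions: reads are modeled as intervals on a made-up coordinate axis, "alignment" is plain interval overlap, and "assembly" is interval merging. The names `merge`, `overlaps` and `extend_region` are hypothetical and stand in for a real aligner and assembler. Unpaired reads that extend a newly placed mate are picked up on the following pass rather than in the same one.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a toy coordinate axis


def merge(intervals: List[Interval]) -> List[Interval]:
    """Toy 'assembly': merge overlapping or touching intervals into contigs."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps(a: Interval, b: Interval) -> bool:
    """Toy 'alignment': two intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def extend_region(
    c0: List[Interval], reads: Dict[str, List[Interval]], window: int
) -> Tuple[List[Interval], Set[str]]:
    """Recruit reads until no new read touches the current contigs.

    `reads` maps a read name to its ends: one interval for an unpaired
    read, two for a paired-end read.
    """
    lo = min(s for s, _ in c0) - window  # trimming limits around C0 (step d)
    hi = max(e for _, e in c0) + window
    contigs = merge(c0)
    used: Set[str] = set()
    while True:
        # Steps (a)/(b): unused reads with at least one end touching a contig.
        new = {
            name
            for name, ends in reads.items()
            if name not in used
            and any(overlaps(end, c) for end in ends for c in contigs)
        }
        if not new:  # Step (e): nothing new was recruited, so stop.
            return contigs, used
        used |= new
        # Step (c): rebuild the assembly from old contigs plus all new ends.
        pieces = contigs + [end for name in new for end in reads[name]]
        # Step (d): drop or clip anything beyond the window around C0.
        trimmed = [(max(s, lo), min(e, hi)) for s, e in pieces if s < hi and e > lo]
        contigs = merge(trimmed)


# Hypothetical data: one paired-end read bridges a gap, two unpaired reads extend.
demo_c0 = [(100, 200)]
demo_reads = {
    "pe1": [(180, 230), (400, 450)],
    "se1": [(440, 500)],
    "se2": [(225, 300)],
    "far": [(5000, 5050)],
}
demo_contigs, demo_used = extend_region(demo_c0, demo_reads, window=1000)
```

In the demo, the mate of `pe1` seeds a second contig on the far side of a gap, which is the situation where steps 3 and 4 (more library data, or different vectors) become relevant.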

PhD opportunity at Université de Liège - Belgium

Mon, 01 Sep 2014 17:16:22 -0500

The Bioinformatics and Systems Biology Unit of Université de Liège (Belgium) is looking for a highly motivated master's student with programming skills for a PhD thesis project (4 years, fully funded) with the goal of designing computational tools that use literature, genomic and structural data to infer regulatory and metabolic networks.

Applicants are invited to send their resume and a recommendation letter to Prof. Patrick Meyer (more details at www.biosys.ulg.ac.be).

molinspiration: broad range of cheminformatics software tools supporting molecule manipulation

BioJoker — Sun, 20 Jan 2019 05:32:40 -0600

Molinspiration offers a broad range of cheminformatics software tools supporting molecule manipulation and processing, including SMILES and SDfile conversion, normalization of molecules, generation of tautomers, molecule fragmentation, calculation of various molecular properties needed in QSAR, molecular modelling and drug design, high quality molecule depiction, and molecular database tools supporting substructure and similarity searches. Our products also support fragment-based virtual screening, bioactivity prediction and data visualization. Molinspiration tools are written in Java and can therefore be used on practically any computer platform.

Address of the bookmark: https://www.molinspiration.com/

Internship in Computational Biology

Thu, 04 Sep 2014 04:19:40 -0500

We are looking for a motivated and autonomous intern to study gene expression in hybrid organisms. The student will work on natural hybrids of two or three different species of fungal endosymbionts of grasses. The purpose of this project is to build software allowing us to identify the genomic origin of expressed genes. To do that, the intern will have to analyze expression data (from RNA-seq) to find SNPs in the sequenced mRNAs that identify which parental genome each expressed gene comes from. The data will have to be saved in a database using the standard BioSQL schema.

This job will allow the intern to become more familiar with new biological and bioinformatics tools like next generation sequencing, RNA-Seq data analysis and comparative genomics.

To apply for this position, send the following documents (in PDF format) to Dr Pierre-Yves Dupont (email p.y.dupont@massey.ac.nz):

1. A short cover letter.
2. A curriculum vitae, with transcript details.
3. The names and contact details of two referees willing to provide a confidential letter of recommendation upon request.

Informal enquiries are welcome. Formal applications are due by Sunday 2nd December 2012.
Requirements:

This position requires a good understanding of genetic problems, a good command of at least one scripting language (Perl, Python...), and a basic knowledge of MySQL or any other relational database management system. Knowledge of biological programming libraries (BioPython, BioPerl, BioRuby...), Java, C++ or any compiled language is an asset but not required. An undergraduate or master's degree is required.
Contact Information:

Dr. Pierre-Yves Dupont
Institute of Molecular BioSciences
Massey University
Private Bag 11 222
Palmerston North 4442
NEW ZEALAND

http://massey.genomicus.com/
p.y.dupont@massey.ac.nz

Information about the Institute of Molecular BioSciences (http://imbs.massey.ac.nz/) and the Computational Biology Research Group (http://massey.genomicus.com/) is available online. For more information about the position, you can contact Dr Pierre-Yves Dupont (email p.y.dupont@massey.ac.nz).

Dahak: benchmarking and containerization of tools for analysis of complex non-clinical metagenomes.

BioStar — Thu, 09 Apr 2020 04:56:09 -0500

Dahak is a software suite that integrates state-of-the-art open source tools for metagenomic analyses. Tools in the dahak software suite will perform various steps in metagenomic analysis workflows including data pre-processing, metagenome assembly, taxonomic and functional classification, genome binning, and gene assignment. We aim to deliver the analytical framework as a robust and reliable containerized workflow system, which will be free from dependency, installation, and execution problems typically associated with other open-source bioinformatics solutions. This will maximize the transparency, data provenance (i.e., the process of tracing the origins of data and its movement through the workflow), and reproducibility.

More at https://dahak-metagenomics.github.io/dahak/

Address of the bookmark: https://github.com/dahak-metagenomics/dahak

Bioinformatics position at IRCCS Casa Sollievo della Sofferenza

Wed, 10 Sep 2014 14:25:34 -0500

The bioinformatics unit at IRCCS Casa Sollievo della Sofferenza - Mendel laboratory in Rome is looking for one young bioinformatician with specific experience and/or interest in the analysis of genomics and transcriptomic data.

The candidate will be mainly in charge of developing research on Gene Expression/SNP Arrays data, NGS whole-exome and whole-transcriptome datasets and biological networks in the contexts of genetic diseases, innovative therapies and regenerative medicine. Main activities will be: (i) data analysis (short-read mapping, discovery and annotation of genomic aberrations, variant pathogenicity detection); (ii) functional/pathway enrichment analysis; (iii) biological network analysis (artificial knockout, redundancy and lethality analysis, gene set essentiality); (iv) development of ad-hoc software solutions/routines on clusters of CPUs and GPUs.

The right background (training in Biology / Computer Science / Statistics or a mix of the three) and a strong interest in working in high-throughput data analysis will be valued as highly as specific experience in the above-mentioned fields.

Knowledge of molecular modeling and simulation and willingness to learn one or more of these languages: Python, Perl, R, Java, C++, C# is a strong plus. Good knowledge of scientific English will be positively evaluated for this position, together with good presentation and teamwork skills.

Candidates should send:
• a cover letter explaining the role they would like to undertake within the Center, even if it is not listed in this job advertisement, stating clearly why they would be a good fit for the proposed role, and what they would bring to the Center in terms of expertise, ideas, talent;
• a CV including a list of publications;
• a list of referees.

A CV with one professional reference, details on educational background and of the biological and/or bioinformatic and/or data analysis skills and experience should be sent by email for a preliminary selection to: Tommaso Mazza t.mazza@css-mendel.it

Ancient whole genome duplication (WGD) detection tools !

Rahul Nayak — Sun, 07 Mar 2021 00:32:44 -0600

There are two main methods for detecting ancient WGD: one is collinearity analysis, and the other is based on the Ks distribution. Ks is defined as the average number of synonymous substitutions per synonymous site. Its counterpart, Ka, is the average number of non-synonymous substitutions per non-synonymous site.
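To make the two definitions concrete, here is a minimal sketch that turns site and difference counts into Ks and Ka. It assumes the counts are already available (the numbers below are invented) and applies the standard Jukes-Cantor correction for multiple substitutions, as in simple counting methods. The function names are hypothetical, and this is not how wgd computes its estimates; wgd lists PAML among its dependencies.

```python
import math
from typing import Tuple


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75) for the correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(
    syn_diffs: float, syn_sites: float, nonsyn_diffs: float, nonsyn_sites: float
) -> Tuple[float, float]:
    """Return (Ka, Ks) from counts of differences and sites for one gene pair."""
    ks = jukes_cantor(syn_diffs / syn_sites)
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    return ka, ks


# Invented counts for one pair of paralogs: 30 of 150 synonymous sites differ,
# 10 of 450 non-synonymous sites differ.
demo_ka, demo_ks = ka_ks(30, 150, 10, 450)
demo_omega = demo_ka / demo_ks  # Ka/Ks ratio
```

Paralogs created by the same duplication event have had roughly the same time to accumulate synonymous changes, which is why a WGD shows up as a peak in the distribution of Ks values across many gene pairs.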

Several WGD analysis pipelines have already been published. I searched for the keyword "wgd pipeline" and found the following:

GenoDup: https://github.com/MaoYafei/GenoDup-Pipeline
https://peerj.com/articles/6303/
WGDdetector: https://github.com/yongzhiyang2012/WGDdetector
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2670-3
wgd: https://github.com/arzwa/wgd
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2#Sec1
https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-017-0399-x
GeNoGAP: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2
https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-017-0399-x
https://github.com/dfguan/purge_dups
https://www.biorxiv.org/content/10.1101/2020.01.24.917997v1

This article introduces the usage of wgd.

wgd cannot currently be installed directly with bioconda, and it is a little troublesome to install because it depends on a lot of other software:

BLAST
MCL
MUSCLE/MAFFT/PRANK
PAML
PhyML/FastTree
i-ADHoRe

The good news is that most of these dependencies can be installed with bioconda:

conda create -n wgd python=3.5 blast mcl muscle mafft prank paml fasttree cmake libpng mpi=1.0=mpich
conda activate wgd

Here mpi=1.0=mpich is selected because i-ADHoRe depends on MPICH. If OpenMPI is installed instead, the following error appears: error while loading shared libraries: libmpi_cxx.so.40: cannot open shared object file: No such file or directory

After that, installing wgd itself is much simpler:

git clone https://github.com/arzwa/wgd.git
cd wgd
pip install .
# Alternative to the three commands above: install directly from GitHub
pip install git+https://github.com/arzwa/wgd.git
For i-ADHoRe, you need to register at http://bioinformatics.psb.ugent.be/webtools/i-adhore/licensing/ and agree to the license to download i-ADHoRe-3.0.

Since my miniconda3 is installed in ~/opt/, the installation prefix is ~/opt/miniconda3/envs/wgd/

tar -zxvf i-adhore-3.0.01.tar.gz
cd i-adhore-3.0.01
mkdir -p build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=~/opt/miniconda3/envs/wgd/
make -j 4
make install

Take the sugarcane genome (Saccharum spontaneum L.) as an example. The genome is 8-ploid with 32 chromosomes (2n = 4 × 8 = 32).

Download the CDS and GFF annotation files for the tutorial:

mkdir -p wgd_tutorial && cd wgd_tutorial
wget http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.v20190103.cds.fasta.gz
wget http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.v20190103.gff3.gz
gunzip *.gz

First run conda activate wgd to start our analysis environment, and then begin the analysis.

Step 1: Use wgd mcl to identify homologous genes in the genome

wgd mcl -n 20 --cds --mcl -s Sspon.v20190103.cds.fasta -o Sspon_cds.out

Step 2: Use wgd ksd to build the Ks distribution

wgd ksd --n_threads 80 Sspon_cds.out/Sspon.v20190103.cds.fasta.blast.tsv.mcl Sspon.v20190103.cds.fasta
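Once a table of per-pair Ks values exists, a WGD signal is looked for as a peak in its histogram. The sketch below bins Ks values from a tab-separated table using only the standard library. It assumes the table has a header row with a column named `Ks`; that column name, the `pair` column, the filter bounds and the function name `ks_histogram` are assumptions for illustration, so check them against the header of the actual file written by wgd ksd.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def ks_histogram(
    handle: TextIO,
    column: str = "Ks",  # assumed column name; verify against the real file
    lo: float = 0.05,
    hi: float = 3.0,
    bin_width: float = 0.1,
) -> Dict[float, int]:
    """Count Ks values per bin, keyed by the lower edge of each bin."""
    counts: Counter = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            ks = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or non-numeric value
        if lo <= ks < hi:  # also discards NaN, which fails every comparison
            counts[int(ks / bin_width)] += 1
    return {round(i * bin_width, 10): n for i, n in sorted(counts.items())}


# Invented table standing in for a Ks result file.
demo_tsv = (
    "pair\tKs\n"
    "g1__g2\t0.31\n"
    "g3__g4\t0.35\n"
    "g5__g6\t0.38\n"
    "g7__g8\t1.22\n"
    "g9__g10\t0.01\n"
    "g11__g12\tnan\n"
    "g13__g14\t7.5\n"
)
demo_hist = ks_histogram(io.StringIO(demo_tsv))
demo_peak = max(demo_hist, key=demo_hist.get)  # lower edge of the fullest bin
```

Very small Ks values (recent tandem or allelic copies) and very large ones (saturated estimates) are commonly filtered out before looking for peaks, which is what the `lo` and `hi` bounds do here.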

Step 3: If the quality of the genome is good, wgd syn can be used for collinearity analysis. It helps us find the collinear blocks in the genome and the corresponding anchor points

wgd syn --feature gene --gene_attribute ID \
-ks wgd_ksd/Sspon.v20190103.cds.fasta.ks.tsv \
Sspon.v20190103.gff3 Sspon_cds.out/Sspon.v20190103.cds.fasta.blast.tsv.mcl

Further reading: wgd has 9 sub-modules

kde: KDE fitting to the Ks distribution
ksd: Ks distribution construction
mcl: all-vs-all BLASTP comparison + MCL clustering
mix: mixture modeling of the Ks distribution
pre: preprocess the CDS file
syn: Call I-ADHoRe 3.0 to use GFF files for collinearity analysis
viz: draw histogram and density plot
wf1: standard Ks analysis workflow for the whole-genome paranome; calls mcl, ksd and syn
wf2: standard Ks analysis workflow for one-vs-one orthologs; calls mcl and ksd

URDIP Bioinformatics RA/JRF Vacancies

Sat, 20 Sep 2014 20:52:56 -0500

CSIR - UNIT FOR RESEARCH AND DEVELOPMENT OF INFORMATION PRODUCTS (CSIR- URDIP)

Adv. No. URDIP/ 6/2014

Opportunity for young Bioinformatics Professionals to make a career in the area of Intellectual Property

CSIR has set up a Unit for Research and Development of Information Products (CSIR-URDIP) at Pune to work in the area of scientific informatics. One of the major focus areas of research work at CSIR-URDIP is PATENT INFORMATICS. With the increasing applications of Bioinformatics in life sciences industries such as Agriculture and Health Care (Diagnostics and Drugs), the output of research in these areas is being protected by different forms of Intellectual Property rights. Realizing the importance of IP in the Bioinformatics field, the Department of Biotechnology (DBT) has sanctioned a project on “Development, Facilitation and Harvesting of Bioinformatics related Intellectual Property” at CSIR-URDIP.

The project will involve application of Patent Informatics tools and techniques to Bioinformatics (including creation of patent landscapes, preparation of techno-legal reports of patentability, freedom to operate studies) to help protect IPRs and develop and conduct training programmes on IPRs related to Bioinformatics.

CSIR-URDIP invites applications from young Bioinformatics professionals to work on this emerging area which offers challenging opportunities and attractive career possibilities in future.

Position I: Research Associate

No of Positions: One

Consolidated amount Payable: Rs. 22,000/- per month + 20% HRA= Rs.26,400

Qualification: PhD in Bioinformatics. In exceptional cases, candidature of M. Tech. candidates with First class in Bioinformatics with three years of relevant work experience will also be considered.

Age Limit: 35 years. The age should not exceed the limit indicated as on the closing date for receipt of completed application forms.

Upper age limit is relaxable for 5 years for SC/ST, OBC, Physically handicapped and female candidates as per CSIR/Government of India rules.

Position II: Junior Research Fellow

No of Positions: one

Consolidated amount Payable: Rs. 16,000/- + 20% HRA = 19,200

Qualification: M.Sc / BE or equivalent in Bioinformatics with minimum of 55% marks in aggregate

Job requirement: Scientific literature and patent search, analysis and report writing

Preference: Preference will be given to candidates with knowledge of patents and/or 1-2 years of experience + Knowledge of Computers (MS Excel + Word Processing)

Age Limit: 28 years. The age should not exceed the limit indicated as on the closing date for receipt of completed application forms.

For further details, please visit our website (www.urdip.res.in/careers) and apply online by 30th September, 2014.

Advertisement: http://www.urdip.res.in/download/Advt6_2014.pdf

Comparative genomics visualisation tools !

Neel — Thu, 17 Feb 2022 05:37:55 -0600

Address of the bookmark: https://cmdcolin.github.io/awesome-genome-visualization/?latest=true&selected=%23BRIG&tag=Comparative

Arvados

Martin Jones — Sat, 20 Sep 2014 16:54:21 -0500

Arvados is a free and open source bioinformatics platform for genomic and biomedical data. Users can store, organize, compute on, and share data for free.

Address of the bookmark: https://arvados.org/