BOL: Related items

The 5 reasons for mistakes in bioinformatics work!

Jit — Thu, 24 Jul 2014 02:51:41 -0500

When you're just starting out with biological programming, it's easy to run into complex problems that make you wonder how anyone has ever managed to write a program. Some problems trip up nearly every bioinformatician, everything from understanding the biological question to dealing with program design. Some mistakes are so common that even experienced biological programmers make them. After 8 years in bioinformatics, here are a few of my random observations, most of them snarky. These are the reasons your work will always take longer than expected and compel you to postpone your project deadline.

1. Stupid for biologists: Biology is so complex that it will make a bioinformatician feel stupid. There are no universal fixed rules; it can surprise you at any time. So be nice to the biologists who ask questions and help resolve your biological puzzles. Sometimes you will have no idea what the hell you were doing either.

2. Puzzling why: Do not hesitate to ask questions. Especially at the beginning of a project, you will have to ask a lot of them. Instead of puzzling it out at the end, check and clear up your doubts early, even about a single error. One unresolved error can lead to a wrong conclusion.

3. Running marathon: The documentation of most biological software is incomplete. In other words, it is never more than 95 percent complete. Sometimes a single problem can halt your entire project for months. Compiling and running the pipelines is tedious because almost all of them are interdependent and need proper configuration. I face the same kind of problem with Evolver :( …

4. Folders missing: The pipelines generate lots of data, and we keep it in several folders for future use. But sometimes we delete those folders by mistake and have to move on to recovery…

5. Digging deeper: Digging deeper is fruitful, but sometimes it can be catastrophic. You may get frustrated or directionless. So keep a biologist with you for rescue, and sometimes an expert computer programmer to handle your server. Remember, the server will always go down when you need it the most.

The most common frustrated line: Why do we do this again?

Roary: the Pan Genome Pipeline

Jit — Tue, 22 Jan 2019 05:52:07 -0600

Roary is a high speed stand alone pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka (Seemann, 2014)) and calculates the pan genome. Using a standard desktop PC, it can analyse datasets with thousands of samples, something which is computationally infeasible with existing methods, without compromising the quality of the results. 128 samples can be analysed in under 1 hour using 1 GB of RAM and a single processor. To perform this analysis using existing methods would take weeks and hundreds of GB of RAM. Roary is not intended for meta-genomics or for comparing extremely diverse sets of genomes.

Address of the bookmark: https://sanger-pathogens.github.io/Roary/

Protein function annotation and machine learning - UPMC - Paris, France

Sat, 02 Aug 2014 01:22:52 -0500

Job Description: We are interested in finding an excellent postdoc with interests in protein functional annotation, machine learning and computer grids. The position is open for 3.5 years at the Université Pierre et Marie Curie, in the heart of Paris.

Research topic: Protein function annotation, multiple probabilistic models, domain architecture, machine learning, combinatorial optimization, computer grid.

Title: A novel integrative platform for large scale protein annotation that exploits a multitude of diversified probabilistic models in several protein signature databases.

We propose a novel integrated approach for large scale protein annotation that will exploit an unprecedented amount of genomic data as well as sophisticated machine learning techniques and combinatorial optimization approaches, taking advantage of High Performance Computing (HPC) environments. The idea is to uncover, as far as possible, the evolutionary processes of protein sequences that took place throughout the whole tree of life and that affected the evolution of a protein family. We have already demonstrated in previous work that the problem of functional annotation is inherent to the ability to uncover such paths. Now, we shall extend this approach to large scale genome annotation by considering 11 different protein databases, comprising about 10^9 protein sequences, and by producing a large pool of diversified probabilistic models coding for about 10^7 evolutionary protein pathways. Such models will be used to search for specific domains in genomes to be annotated. Our previous methodology needs to be fundamentally improved to deal with this large amount of biological data. In this project, we shall work on the algorithms to reduce the space of models and the search complexity, and we shall implement some important algorithmic changes towards the realization of a powerful integrated annotation tool.

Where: This project is run at the Laboratoire de Biologie Computationnelle et Quantitative UMR7238 CNRS-UPMC – Analytical Genomics team, headed by A. Carbone. It is co-advised with Pierre-Henri Wuillemin, Laboratoire d’Informatique de Paris 6 – Equipe DECISION.

Start date: September 1st, 2014
Contact Person: Alessandra Carbone
Contact: alessandra.carbone@lip6.fr

RaGOO: Fast Reference-Guided Scaffolding of Genome Assembly Contigs

BioJoker — Wed, 17 Apr 2019 19:45:22 -0500

Alonge M, Soyk S, Ramakrishnan S, Wang X, Goodwin S, Sedlazeck FJ, Lippman ZB, Schatz MC: Fast and accurate reference-guided scaffolding of draft genomes. bioRxiv 2019.

RaGOO is a tool for coalescing genome assembly contigs into pseudochromosomes via minimap2 alignments to a closely related reference genome. The tool focuses on practicality and therefore has the following features:

Good performance. On a MacBook Pro using Arabidopsis data, pseudochromosome construction takes less than a minute and the whole pipeline with SV calling takes ~2 minutes.
Intact ordering and orienting of contigs.
Chimeric contig correction
GFF lift-over
Structural variant calling with an integrated version of Assemblytics
Confidence scores associated with the grouping, localization, and orientation for each contig.

Address of the bookmark: https://github.com/malonge/RaGOO

MGSE: Mapping-based Genome Size Estimation

Shruti Paniwala — Fri, 17 Jan 2020 02:11:43 -0600

MGSE can harness the power of files generated in genome sequencing projects to predict the genome size. It requires a FASTA file containing a high continuity assembly and a BAM file with all available reads mapped to this assembly. The script construct_cov_file.py (https://doi.org/10.1186/s12864-018-5360-z) allows the generation of a COV file based on the (sorted) BAM file (also possible via MGSE directly). Next, this COV file can be used by MGSE to calculate the coverage in provided reference regions and to calculate the total number of mapped bases. Both values are then used for the genome size estimation. Providing accurate reference regions is crucial for this genome size estimation.
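The estimation described above can be illustrated with a small Python sketch. It is not MGSE's actual code: the function name, the in-memory coverage dictionary, the 0-based half-open region format and the use of the mean as the reference coverage are all assumptions made for illustration. The underlying idea is that the total number of mapped bases divided by the coverage of reliable single-copy reference regions approximates the genome size.

```python
from statistics import mean


def estimate_genome_size(coverage, reference_regions):
    """Estimate genome size from per-base coverage (hypothetical helper).

    coverage: dict mapping sequence name -> list of per-base read depths
    reference_regions: list of (sequence name, start, end), 0-based, end exclusive
    """
    # Total number of mapped bases = sum of the depth at every assembly position.
    total_mapped_bases = sum(sum(depths) for depths in coverage.values())

    # Collect the depths that fall inside the trusted reference regions.
    reference_depths = []
    for name, start, end in reference_regions:
        reference_depths.extend(coverage[name][start:end])
    if not reference_depths:
        raise ValueError("reference regions cover no positions")

    reference_coverage = mean(reference_depths)
    if reference_coverage == 0:
        raise ValueError("reference regions have zero coverage")

    return total_mapped_bases / reference_coverage


# Toy data: contig2 has double depth, as a collapsed two-copy repeat would.
toy_coverage = {"contig1": [10] * 100, "contig2": [20] * 50}
toy_regions = [("contig1", 0, 100)]
print(estimate_genome_size(toy_coverage, toy_regions))
```

In the toy data the assembly is 150 bp long, but the doubled depth on contig2 raises the estimate above the assembly length, which is why the choice of reference regions matters so much.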

Address of the bookmark: https://github.com/bpucker/MGSE

Assistant Professor in Bioinformatics at Indian Institute of Technology Delhi

Fri, 15 Aug 2014 06:16:06 -0500

Indian Institute of Technology Delhi, Hauz Khas, New Delhi – 110016

ROLLING ADVERTISEMENT NO. 01/2014(E-1)
ADVERTISEMENT FOR THE POSITIONS OF ASSISTANT PROFESSOR. CANDIDATES CAN APPLY ANY TIME DURING THE YEAR.

IIT Delhi invites applications from qualified Indian Nationals, Persons of Indian Origin (PIOs) and Overseas Citizens of India (OCIs) for the following positions in the various Departments/Centres/Schools (in the fields mentioned along with them):
Post: Assistant Professor and Assistant Professor (on Contract)
Pay Band: Rs. 15600-39100 (PB-3) (minimum pay of Rs. 30000/-) + AGP Rs. 8000/-

The following norms will be followed for fixing the basic pay + AGP for Assistant Professors appointed on contract with a Ph.D. but experience of 3 years or less:
Type Qualification & Experience on the date of joining
Assistant Professor (Contract) PB3 (Rs. 15,600-39,100).

MINIMUM QUALIFICATIONS AND EXPERIENCE:
Ph.D. with First class at the preceding degree or equivalent in the appropriate branch, with a very good academic record throughout. A minimum of three years of industrial/research/teaching experience, excluding the experience gained while pursuing the Ph.D. Candidates should preferably be below 35 years of age for male and 38 years for female (to be relaxed by 5 years in case of persons with physical disability and SC/ST, and by 3 years in case of OBC-NCL).

Qualified persons include:
(a) Indian Nationals,
(b) Foreign Nationals who are “Persons of Indian Origin” (PIO) or Overseas Citizens of India (OCI), in whose case, if selected, permission will be sought from the Govt. of India before he/she can join IIT Delhi, or
(c) Other Foreign Nationals, in whose case, if selected, appointment will be on a contract basis for up to 5 (five) years subject to permission from the Govt. of India before he/she can join IIT Delhi.
(d) The Institute specifically encourages applicants from the SC/ST/OBC categories as well as persons with disability to apply for these positions.

AMAR NATH & SHASHI KHOSLA SCHOOL OF INFORMATION TECHNOLOGY:
Computational Neuroscience, Medical Applications of Information Technologies, Computational & Systems Biology, Machine to Machine (M2M) Technologies, Embedded Systems & Sensors, Computer Security.
KUSUMA SCHOOL OF BIOLOGICAL SCIENCES:
In-silico Biology Applications, Systems Biology, Infection Biology, Neurodegeneration.

More at http://www.iitd.ac.in/sites/default/files/jobs/faculty/spl-areas-rolling-advt.pdf

http://www.iitd.ac.in/content/faculty-positions

Lecturer/Senior Lecturer (Level B/C) in Bioinformatics

Fri, 22 Aug 2014 12:45:52 -0500

Lecturer/Senior Lecturer (Level B/C) in Synthetic Biology, Research Fellow (Level B) in Synthetic Biology & Lecturer/Senior Lecturer (Level B/C) in Bioinformatics

Job no: 494553
Work type: Continuing full time
Vacancy type: External Vacancy, Internal Vacancy
Categories: Academic - Teaching and Research

The Faculty of Science is launching a new and innovative branch of biological science at Macquarie University – Synthetic Biology. Synthetic biology combines engineering principles with molecular biological approaches to design and construct biological devices and systems. Recent highlights in this field include the design and synthesis of a functional bacterial genome and a yeast chromosome, and generation of synthetic bacterial cells. The rational synthesis of "designer" organisms yields important insights into how organisms work and has the potential to revolutionise biotechnological applications in areas such as bioenergy and biomanufacturing.

Find more at http://jobs.mq.edu.au/cw/en/job/494553/lecturersenior-lecturer-level-bc-in-synthetic-biology-research-fellow-level-b-in-synthetic-biology-lecturersenior-lecturer-level-bc-in-bioinformatics

Protocol for De novo Genome Assembly using Illumina Reads

BioStar — Sat, 16 Jan 2021 21:42:11 -0600

In this protocol, we address and describe the de novo assembly method for small to medium-sized genomes.

What is de novo genome assembly?
Genome assembly is the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which the DNA originated. De novo genome assembly assumes no prior knowledge of the length, layout or composition of the source DNA sequence. In a genome sequencing experiment, the DNA of the target organism is broken up into millions of small pieces and read on a sequencing machine. Depending on the sequencing technology used, these "reads" range from 20 to 1000 nucleotide base pairs (bp) in length. Illumina-style short-read sequencing usually produces reads of 36 - 150 bp. These reads can be either “single ended” as described above or “paired end.”

Why genome assembly?
Determining the DNA sequence of an organism is useful in basic research into why and how organisms live, as well as in applied subjects. Because of the importance of DNA to living things, knowledge of a DNA sequence may be useful in practically any biological research. For example, in medicine it can be used to identify, diagnose and potentially develop treatments for genetic diseases. Similarly, research into pathogens may lead to treatments for infectious diseases.

Raw NGS data
Reads can be stored as plain text in a FASTA file or, together with their quality scores, in a FASTQ file. FASTQ is the most common read file format since this is what the Illumina sequencing pipeline produces. The rest of this protocol therefore focuses on FASTQ files.
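To make the read file format concrete, here is a minimal Python sketch of a FASTQ reader. It assumes the common four-line record layout (header, sequence, separator, quality string) with no wrapped sequence lines, and Phred+33 quality encoding. The function names are hypothetical and not part of any tool named in this protocol.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header.strip())
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ in: " + header.strip())
        yield header[1:].strip(), sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of one quality string (Phred+33 encoding assumed)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


# Two toy reads held in memory; a real file would be opened with open() or gzip.open().
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n"
records = list(read_fastq(io.StringIO(example)))
for name, sequence, quality in records:
    print(name, sequence, mean_phred(quality))
```

For compressed files such as illumina_R1.fastq.gz, the same generator can be fed a handle from `gzip.open(path, "rt")`.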

The protocol in a nutshell:
Obtain the sequence read file(s) from the sequencing machine(s).
Look at the reads - get an idea of what you have and what the quality is like.
Perform raw data cleanup/quality trimming if required.
Choose an adequate parameter set for assembly.
Assemble the data into scaffolds/contigs.
Examine the assembly output and assess the quality of the assembly.

Read Quality Control:
Check the quality with FastQC.
Script
https://bioinformaticsonline.com/snippets/view/42540/install-fastqc-using-conda

Quality trimming/cleanup of read files.
This step trims adapters, barcodes and other contaminants from the reads.
Script
https://bioinformaticsonline.com/snippets/view/42542/trimmomatic-command
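As a rough illustration of what quality trimming does to a single read, the Python sketch below removes low-quality bases from the 3' end. It is not the Trimmomatic algorithm and does not handle adapters or barcodes. The function name, the default threshold of 20 and the Phred+33 encoding are assumptions.

```python
def trim_trailing(sequence, quality, min_quality=20, offset=33):
    """Drop bases from the 3' end while their Phred score is below min_quality."""
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < min_quality:
        end -= 1
    # Sequence and quality string are cut at the same position so they stay in sync.
    return sequence[:end], quality[:end]


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
print(trim_trailing("ACGTAC", "IIII##"))
```

A read consisting only of low-quality bases comes back empty, so a real pipeline would also discard reads that fall below a minimum length after trimming.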

Genome Assembly:
The aim of this section of the protocol is to describe the process of assembling the quality-trimmed reads into draft contigs.

spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o result_of_spades_assembly_all_illumina
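When the assembly is launched from a script rather than typed by hand, the command above can be built as an argument list. The sketch below only uses the options already shown in the command; the helper name is hypothetical, and the actual run is left commented out because it requires SPAdes to be installed and on the PATH.

```python
import shlex
import subprocess


def build_spades_command(reads_1, reads_2, output_dir):
    """Return the SPAdes paired-end command from this protocol as an argument list."""
    return [
        "spades.py",
        "-1", reads_1,            # forward reads
        "-2", reads_2,            # reverse reads
        "--careful",
        "--cov-cutoff", "auto",
        "-o", output_dir,         # output directory
    ]


command = build_spades_command(
    "illumina_R1.fastq.gz",
    "illumina_R2.fastq.gz",
    "result_of_spades_assembly_all_illumina",
)
print(shlex.join(command))

# Uncomment to run the assembly (assumes spades.py is installed):
# subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when file names contain spaces.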

A wide range of short-read assemblers is available, each with its own strengths and weaknesses.
Some of the assemblers available include:
Velvet
SOAP-denovo
MIRA
ALLPATHS

The next step is to assess the quality of the draft set of contigs and decide what to do with it for the rest of the analysis. A few things to note about the contigs you have just produced: they are draft contigs, and they may contain mis-assemblies.

Mis-assembly checking and assembly metric tools:
QUAST - Quality assessment tool for genome assembly http://bioinf.spbau.ru/quast
Mauve assembly metrics - http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve
InGAP-SV - https://sites.google.com/site/nextgengenomics/ingap and http://ingap.sourceforge.net/
inGAP is also useful for finding structural variants between genomes from read mappings.
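One widely reported assembly metric is N50: the length of the contig at which the contigs of that length or longer contain at least half of the total assembled bases. The protocol itself does not name this metric, so the Python sketch below is an added illustration with a hypothetical function name. It shows how the number is derived from a list of contig lengths.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running_total = 0
    # Walk from the longest contig down until half of all bases are covered.
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length


toy_lengths = [100, 80, 50, 30, 20, 20]
print(n50(toy_lengths))
```

In the toy example the total is 300 bp, and the two longest contigs together hold 180 bp, so the second contig is the one that crosses the halfway mark.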

Genome finishing tools:
Semi-automated gap fillers:
Gap filler - http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/gapfiller/

IMAGE (V2) - http://sourceforge.net/apps/mediawiki/image2/index.php?title=Main_Page

Genome visualisers and editors:
Artemis - http://www.sanger.ac.uk/resources/software/artemis/
IGV - http://www.broadinstitute.org/igv/

Automated and semi automated annotation tools:
Prokka - https://github.com/tseemann/prokka
RAST - http://www.nmpdr.org/FIG/wiki/view.cgi/FIG/RapidAnnotationServer
JCVI Annotation Service - http://www.jcvi.org/cms/research/projects/annotation-service/

Frequently used commands for the analysis are at:

https://bioinformaticsonline.com/blog/view/38765/list-of-tools-frequently-used-while-genome-assembly
https://bioinformaticsonline.com/pages/view/42275/frequent-parameters-for-bioinformatics-tools

PhD opportunity at Université de Liège - Belgium

Mon, 01 Sep 2014 17:16:22 -0500

The Bioinformatics and Systems Biology Unit of Université de Liège (Belgium) is looking for a highly motivated master's student with programming skills for a PhD thesis project (4 years, fully funded) with the goal of designing computational tools that use literature, genomic and structural data in order to infer regulatory and metabolic networks.

Applicants are invited to send their resume and a recommendation letter to Prof. Patrick Meyer (more details at www.biosys.ulg.ac.be).

Genome Assembly Workshop 2020

Jit — Wed, 25 Aug 2021 04:30:32 -0500

Our team offers custom bioinformatics services to academic and private organizations. We have a strong academic background with a focus on cutting edge, open source software. We replicate standard analysis pipelines (best practices) when appropriate, and/or develop novel applications and pipelines when needed; however, we always emphasize biological interpretation of the data.

More at https://ucdavis-bioinformatics-training.github.io/

Address of the bookmark: https://ucdavis-bioinformatics-training.github.io/2020-Genome_Assembly_Workshop/snakemake/snakemake_intro