BOL: Related items

Step-by-Step Guide to Running Genome Assembly

Abhi — Fri, 13 Dec 2024 11:35:55 -0600

Genome assembly is a critical process in bioinformatics, enabling the reconstruction of an organism's genome from short DNA sequence reads. Whether you’re working on a new microbial genome or a complex eukaryotic organism, this guide will walk you through the steps of genome assembly using state-of-the-art tools and best practices.

What is Genome Assembly?

Genome assembly involves piecing together DNA sequence reads generated by sequencing platforms (e.g., Illumina, PacBio, Oxford Nanopore) into longer, contiguous sequences called contigs. This can be performed as:

De Novo Assembly: Without a reference genome.
Reference-Guided Assembly: Using a reference genome to guide the assembly process.

Step 1: Preparing Your Data

Before starting the assembly, ensure that your raw sequencing data is high quality.

Input Data
- Short Reads: Illumina sequencing generates short, accurate reads well suited to base-accurate contig assembly and to polishing.
- Long Reads: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.
Quality Control (QC)
Use tools like FastQC or MultiQC to assess the quality of your reads:

fastqc reads.fastq
multiqc .

Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.
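Before opening the FastQC report, a quick sanity check on the raw file can catch truncated or empty inputs. The sketch below is not part of FastQC; it assumes the standard four-lines-per-record FASTQ layout, and `fastq_stats` is a hypothetical helper name. The demo file is made up so the snippet runs on its own.

```shell
# Summarise a FASTQ file: read count, total bases, mean read length.
# Assumes 4 lines per record (sequences are not line-wrapped).
fastq_stats() {
  awk 'NR % 4 == 2 { reads++; bases += length($0) }
       END {
         mean = (reads > 0) ? bases / reads : 0
         printf "reads=%d bases=%d mean_length=%.1f\n", reads, bases, mean
       }' "$1"
}

# Tiny made-up example file (two reads of 8 and 4 bases).
demo=$(mktemp)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n' > "$demo"
fastq_stats "$demo"
rm -f "$demo"
```

On your own data, call it as `fastq_stats reads.fastq`; a read count that differs between the R1 and R2 files usually means one of them is incomplete.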
Read Trimming and Filtering
Trim low-quality bases and adapters using Trimmomatic or Cutadapt:

trimmomatic PE reads_R1.fastq reads_R2.fastq \
  trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq \
  ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36

Step 2: Choosing an Assembly Strategy

Select an assembly strategy based on your data type:

Short-Read Assemblers:
- SPAdes: Popular for microbial genomes.
- Velvet: Fast for smaller genomes.
Long-Read Assemblers:
- Canu: Ideal for long-read datasets.
- Flye: Versatile for small and large genomes.
Hybrid Assemblers:
- MaSuRCA: Combines short and long reads.
- Unicycler: Optimized for bacterial genomes.
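If you script your pipeline, the choice above can be written down as a simple lookup. This is only a sketch that restates the list in this guide; `suggest_assembler` is a hypothetical helper, and it does not benchmark or rank the tools.

```shell
# Map a data type to the assemblers listed in this guide (a lookup, not a benchmark).
suggest_assembler() {
  case "$1" in
    short)  echo "SPAdes Velvet" ;;
    long)   echo "Canu Flye" ;;
    hybrid) echo "MaSuRCA Unicycler" ;;
    *)      echo "unknown data type: $1 (expected short, long, or hybrid)" >&2
            return 1 ;;
  esac
}

suggest_assembler hybrid
```

Returning a non-zero status for an unrecognised data type lets a calling script stop early instead of launching the wrong assembler.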

Step 3: Running the Assembly

3.1. SPAdes (Short-Read Assembly)

SPAdes is an excellent choice for small genomes, such as bacteria.

spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output

The output includes assembled contigs (contigs.fasta) and scaffolds (scaffolds.fasta).

3.2. Canu (Long-Read Assembly)

Canu is designed for high-error long reads from PacBio or Nanopore.

canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq

The output will be in canu_output/genome.contigs.fasta.

3.3. Hybrid Assembly with Unicycler

Unicycler combines short and long reads for improved assemblies.

unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output

Step 4: Assessing Assembly Quality

After assembly, evaluate its quality using the following tools:

QUAST
QUAST generates assembly statistics, such as N50, genome size, and GC content:

quast contigs.fasta -o quast_output
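To make the N50 statistic concrete: sort the contigs from longest to shortest and add up their lengths until the running total reaches half of the assembly size; the length of the contig that crosses that mark is the N50. The sketch below computes it with awk and sort. It is an illustration under that definition, not how QUAST is implemented, and `contig_lengths` and `n50` are hypothetical helper names.

```shell
# Print contig lengths, one per line, from a (possibly line-wrapped) FASTA file.
contig_lengths() {
  awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
       { len += length($0) }
       END { if (seen) print len }' "$1"
}

# N50: walk contigs from longest to shortest until half the total length is covered.
n50() {
  contig_lengths "$1" | sort -rn | awk '
    { lens[NR] = $1; total += $1 }
    END {
      half = total / 2
      for (i = 1; i <= NR; i++) {
        running += lens[i]
        if (running >= half) { print lens[i]; exit }
      }
    }'
}

# Made-up assembly with contigs of 10, 8, 4 and 2 bases (total 24, half 12).
demo=$(mktemp)
printf '>c1\nAAAAAAAAAA\n>c2\nAAAAAA\nAA\n>c3\nAAAA\n>c4\nAA\n' > "$demo"
n50 "$demo"
rm -f "$demo"
```

For the demo file the running total is 10 after the first contig and 18 after the second, so the N50 is 8.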
BUSCO
BUSCO checks genome completeness by identifying conserved genes. Choose the lineage dataset (-l) that matches your organism; the example here uses the fungal set:

busco -i contigs.fasta -o busco_output -l fungi_odb10 -m genome
Assembly Graph Visualization
Visualize assembly graphs with Bandage:

Bandage load assembly_graph.gfa

Step 5: Post-Assembly Steps

Polishing
Improve assembly accuracy using tools like Pilon (for short reads) or Racon (for long reads).

racon long_reads.fasta mapped_reads.sam contigs.fasta > polished_contigs.fasta
Scaffolding
Link contigs into scaffolds using tools like SSPACE or OPERA-LG if required.
Annotation
Annotate the assembled genome using Prokka for prokaryotes or MAKER for eukaryotes.

prokka --outdir annotation_output --prefix genome contigs.fasta

Step 6: Sharing and Archiving

Submit to Public Repositories
Share your assembly in databases like NCBI GenBank, ENA, or DDBJ.
Metadata Preparation
Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.
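The coverage figure for the submission can be estimated as total sequenced bases divided by genome size. The sketch below assumes a four-lines-per-record FASTQ file and a genome size given in bases; `estimate_coverage` is a hypothetical helper, and the demo values are made up.

```shell
# Estimated coverage = total sequenced bases / genome size (in bases).
# $1: FASTQ file (4 lines per record), $2: genome size in bases.
estimate_coverage() {
  awk -v g="$2" '
    NR % 4 == 2 { bases += length($0) }
    END { if (g > 0) printf "%.1f\n", bases / g }' "$1"
}

# Made-up example: 12 sequenced bases over a 4-base "genome".
demo=$(mktemp)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n' > "$demo"
estimate_coverage "$demo" 4
rm -f "$demo"
```

For paired-end data, run it on both read files and add the two results.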

Best Practices

Always perform quality checks at each stage to ensure data integrity.
Use multiple tools to cross-validate results when working with complex genomes.
Document parameters and software versions for reproducibility.
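One lightweight way to document parameters and versions is to append them to a plain-text log next to the results. The sketch below is only one possible approach; `log_command` and `log_version` are hypothetical helpers, and the version flag is supplied by the caller because it differs between tools.

```shell
# Append a UTC-timestamped record of a command line to a log file.
# This only records the text of the command; it does not run it.
log_command() {
  log_file=$1; shift
  printf '[%s] command: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$log_file"
}

# Append the first line of a tool's version output, or note that it is missing.
# $1: log file, $2: tool name, remaining arguments: that tool's version flag.
log_version() {
  log_file=$1; tool=$2; shift 2
  if command -v "$tool" >/dev/null 2>&1; then
    printf '%s version: %s\n' "$tool" "$("$tool" "$@" 2>&1 | head -n 1)" >> "$log_file"
  else
    printf '%s: not found on PATH\n' "$tool" >> "$log_file"
  fi
}

log=$(mktemp)
log_command "$log" spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output
log_version "$log" spades.py --version
cat "$log"
rm -f "$log"
```

Keeping the log in the output directory means the parameters travel with the assembly when it is archived or shared.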

Conclusion

Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism’s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you’re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.

IITM-Tokyo Tech Joint Symposium

Jit — Thu, 24 Oct 2019 10:30:25 -0500

The IITM-Tokyo Tech Joint Symposium is a biannual international symposium held at the Indian Institute of Technology Madras (IITM), India, in collaboration with the Tokyo Institute of Technology (Tokyo Tech), Japan. During the symposium, experts in various domains of bioinformatics from India and Japan gather under one roof to discuss and present their work. This gives researchers and students a unique opportunity to learn about the frontiers of the field and to interact with eminent scientists in bioinformatics. The 5th IITM-Tokyo Tech Joint Symposium, titled "Current trends in Bioinformatics: Big data analysis, machine learning and drug design", will be held on 6th-7th March 2020 at IITM, Chennai, India.

The symposium will focus on the following topics:

- Algorithms for biomolecular sequences / structures
- Bioinformatics databases and tools
- Protein function
- Structure based drug design
- Machine learning
- Deep learning
- Large scale data analysis
- Big Data
- NGS Analysis
- Protein interactions/network
- Molecular modelling/docking/screening
- Biomolecular structure and function

Info: https://web.iitm.ac.in/bioinfo2/symposium2020/home

Useful Publications and Websites for Deep Sequencing Data Analysis

Rahul Nayak — Sun, 29 Dec 2013 22:30:45 -0600

Global overview papers

Next generation quantitative genetics in plants. Jiménez-Gómez, Frontiers in Plant Science 2:77, 2011 Full Text [equally relevant to animal and microbial systems]

Sense from sequence reads: methods for alignment and assembly. Flicek & Birney, Nat Methods 6(11 Suppl):S6-S12, 2009. Full Text

Library construction and experimental design

Statistical design and analysis of RNA sequencing data. Auer & Doerge, Genetics 185(2):405-16, 2010. PubMedCentral

Biases in Illumina transcriptome sequencing caused by random hexamer priming. Hansen et al., Nucleic Acids Res. 38(12): e131, 2010. PubMedCentral

Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Aird et al, Genome Biology 12:R18, 2011 Full Text

Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of GC-biased genomes. Kozarewa et al, Nature Methods 6(4):291-5, 2009 PubMedCentral

Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Rohland & Reich, Genome Research 22(5):939–946, 2012. PubMedCentral

Data formats, data management, and alignment software tools

The Sequence Alignment/Map format and SAMtools. Li et al, Bioinformatics 25(16):2078-9, 2009 PubMedCentral

SAM format specification file

Efficient storage of high throughput sequencing data using reference-based compression. Fritz et al, Genome Res 21(5):734-40, 2011. Full Text

Compression of DNA sequence reads in FASTQ format. Deorowicz & Grabowski, Bioinformatics 27(6):860-2, 2011. PubMed

Fast and accurate short read alignment with Burrows-Wheeler transform. Li & Durbin, Bioinformatics 25(14):1754-60, 2009. PubMedCentral

Improving SNP discovery by base alignment quality. Li H, Bioinformatics 27(8):1157-8, 2011. PubMed

BEDTools: a flexible suite of utilities for comparing genomic features. Quinlan and Hall, Bioinformatics 26:841-842, 2010. Publisher Website

Data quality assessment, filtering, and correction

SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. Cox et al, BMC Bioinformatics 11:485, 2010. PubMedCentral

TileQC: a system for tile-based quality control of Solexa data. Dolan & Denver, BMC Bioinformatics 9:250, 2008 PubMedCentral [requires a reference sequence]

Quake: quality-aware detection and correction of sequencing errors. Kelley et al, Genome Biol 11(11):R116, 2010. PubMed

FastQC: a quality control tool for high-throughput sequence data. Home Page

FASTX-toolkit: FASTQ/A short-reads pre-processing tools Home Page

Reference-free validation of short read data. Schröder et al, PLoS One 5(9):e12681, 2010. PubMedCentral

Correction of sequencing errors in a mixed set of reads. Salmela, Bioinformatics 26(10):1284, 2010. Full Text [includes error correction of SOLiD reads in colorspace]

Repeat-aware modeling and correction of short read errors. Yang et al, BMC Bioinformatics 12(Supp1):S52, 2011 PubMedCentral [requires a reference sequence]

HiTEC: accurate error correction in high-throughput sequencing data. Ilie et al, Bioinformatics 27(3):295, 2011 Full Text

Error correction of high-throughput sequencing datasets with non-uniform coverage. Medvedev et al., Bioinformatics 27(13):i137-41, 2011. PubMedCentral

De novo assembly

Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Zerbino & Birney, Genome Res 18(5):821-9, 2008. PubMedCentral

Assembly of large genomes using second-generation sequencing. Schatz et al, Genome Res 20(9):1165-73, 2010. PubMedCentral

High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Gnerre et al, PNAS 108(4): 1513-18, 2011 PubMedCentral

Genome assembly has a major impact on gene content: a comparison of annotation in two Bos taurus assemblies. Florea et al., PLoS One 6(6):e21400, 2011. PubMedCentral

Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Carver et al, Bioinformatics 28(4):464 - 469, 2012 PubMedCentral

Efficient de novo assembly of large genomes using compressed data structures. Simpson & Durbin, Genome Research 22:549-556, 2012 Full Text [Describes the String Graph Assembler (SGA), which assembled a human genome in less than 6 days using 54 Gb of RAM and a 123-processor compute cluster for calculation of an FM-index of the 1.2 billion reads]

Readjoiner: a fast and memory efficient string graph-based sequence assembler. Gonnella & Kurtz, BMC Bioinformatics 13: 82, 2012 PubMedCentral

Assemblathon 1: A competitive assessment of de novo short read assembly methods. Earl et al, Genome Research 21:2224-2241, 2011 Full Text

Chromatin immunoprecipitation analysis: ChIP-seq

ChIP-seq: advantages and challenges of a maturing technology. Park, Nat Rev Genet. 10:669-80, 2009 PubMed

ChIP-seq and Beyond: new and improved methodologies to detect and characterize protein-DNA interactions. Furey, Nat Rev Genet 13: 840–852, 2012 Publisher Web Site

MuMoD: a Bayesian approach to detect multiple modes of protein–DNA binding from genome-wide ChIP data. Narlikar, Nucleic Acids Res 41:21–32, 2013 PubMed

Transcriptome analysis

Assembly and comparison to genome

Full-length transcriptome assembly from RNA-Seq data without a reference genome. Grabherr et al, Nature Biotechnology 29:644 - 652, 2011. PubMed [The software is called Trinity, and is available on Sourceforge.]

Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome. Peng et al, Nature Biotechnology 30:253 - 260, 2012. PubMed [Several comments on this paper question whether the reported differences are in fact evidence of editing or are simply sequencing errors - the authors stand by their conclusions, but the controversy demonstrates the importance of robust data analysis methods.]

Optimization of de novo transcriptome assembly from next-generation sequencing data. Surget-Groba & Montoya-Burgos, Genome Res 20(10):1432-40, 2010. Full Text

Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads. Martin et al, BMC Genomics 11:663, 2010 Full Text

De novo assembly and analysis of RNA-seq data. Robertson et al, Nature Methods 7:909-912, 2010 Full Text [describes Trans-ABySS, a pipeline to use the ABySS parallel assembler for de novo transcriptome analysis]

Differential expression analysis

R-SAP: a multi-threading computational pipeline for the characterization of high-throughput RNA-sequencing data. Mittal & McDonald, Nucleic Acids Res, 2012 Full Text

Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Mercer et al, Nature Biotechnology 30:99 - 104, 2012 Publisher Website

Differential gene and transcript expression analysis of RNA-Seq experiments with TopHat and Cufflinks. Trapnell et al, Nature Protocols 7:562 - 578, 2012 Publisher Website

Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Łabaj et al, Bioinformatics 27:i383 - i391, 2011 Full Text

Improving RNA-Seq expression estimates by correcting for fragment bias. Roberts et al, Genome Biol 12:R22, 2011 PubMed Central

Cloud-scale RNA-sequencing differential expression analysis with Myrna. Langmead et al, Genome Biol 11:R83, 2010 Full Text

From RNA-seq reads to differential expression results. Oshlack et al, Genome Biol 11(12):220, 2010 Full Text

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Wang et al., Bioinformatics. 26(1):136-8. 2010 PubMed

DEseq: Differential expression analysis for sequence count data. Anders and Huber, Genome Biology 11:R106, 2010 Full Text

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Robinson et al., Bioinformatics 26(1):139-40 2010 PubMedCentral

Two-stage Poisson model for testing RNA-seq data. Auer and Doerge, SAGMB 10(1), article 26 Full Text

Experimental design, preprocessing, normalization and differential expression analysis of small RNA sequencing experiments. McCormick et al., Silence 2(1):2, 2011 PubMedCentral

RNA-Seq gene expression estimation with read mapping uncertainty. Li et al, Bioinformatics 26:493-500, 2010 PubMedCentral [describes the RSEM software package]

Comparing genomes and assemblies; variant detection

Versatile and open software for comparing large genomes. Kurtz et al, Genome Biol 5(2):R12, 2004. PubMedCentral [describes the MUMmer software for full-genome alignment & comparisons]

Searching for SNPs with cloud computing. Langmead et al, Genome Biol 10(11):R134, 2009 Full Text

Calling SNPs without a reference sequence. Ratan et al, BMC Bioinformatics 11:130, 2010 PubMedCentral

Microindel detection in short-read sequence data. Krawitz et al, Bioinformatics 26(6):722-9, 2010. Full Text

vipR: variant identification in pooled DNA using R. Altmann et al., Bioinformatics 27: i77-i84, 2011. PubMedCentral

Geoseq: a tool for dissecting deep-sequencing datasets. Gurtowski et al, BMC Bioinformatics 11:506, 2010. PubMedCentral [Geoseq is a web service that allows searching deep sequencing datasets with a reference sequence of a gene of interest]

Detecting and annotating genetic variations using the HugeSeq pipeline. Lam et al, Nature Biotechnology 30:226 - 229, 2012 Publisher Website, Home Page

Genome-wide LORE1 retrotransposon mutagenesis and high-throughput insertion detection in Lotus japonicus. Urbański et al, Plant J 64:731-741, 2012. Publisher Website [This paper describes a 2-dimensional pooling strategy with barcoding to allow use of Illumina sequencing to screen for retrotransposon insertion mutations, and includes a software package called FSTpoolit for analysis of the resulting sequence reads.]

Genotyping by sequencing

Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Davey et al., Nat Rev Genet 12(7):499-510, 2011 PubMed [A review of methods available at the time]

A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. Elshire et al., PLoS One 6(5):e19379, 2011. Full Text

Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. Poland et al., PLoS One 7(2): e32253, 2012. Full Text

Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. Peterson et al, PLoS One 7(5):e37135, 2012. Full Text

Imputation of unordered markers and the impact on genomic selection accuracy. Rutkowski et al, G3 3(3):427-39, 2013. Full Text

Diversity Arrays Technology (DArT) and next-generation sequencing combined: genome-wide, high-throughput, highly informative genotyping for molecular breeding of Eucalyptus. Sansaloni et al., BMC Proceedings 5(Suppl 7):P54, 2011 Full Text

High-throughput genotyping by whole-genome resequencing. Huang et al., Genome Res 19(6):1068-76, 2009. Full Text

Multiplexed shotgun genotyping for rapid and efficient genetic mapping. Andolfatto et al. Genome Res 21(4):610-7, 2011. Full Text

Restriction-site Associated DNA (RAD) markers

Rapid SNP discovery and genetic mapping using sequenced RAD markers. Baird et al, PLoS One 3(10):e3376, 2008 Full Text

Linkage mapping and comparative genomics using next-generation RAD sequencing of a non-model organism. Baxter et al., PLoS One 6(4):e19315, 2011. Full Text

Genome evolution and meiotic maps by massively parallel DNA sequencing: spotted gar, an outgroup for the teleost genome duplication. Amores et al, Genetics 188(4):799-808, 2011. PubMed

Construction and application for QTL analysis of a Restriction-site Associated DNA (RAD) linkage map in barley. Chutimanitsakun et al, BMC Genomics 4; 12:4, 2011. Full Text

RAD tag sequencing as a source of SNP markers in Cynara cardunculus L. Scaglione et al., BMC Genomics 13:3, 2012. Full Text

Paired-end RAD-seq for de novo assembly and marker design without available reference. Willing et al., Bioinformatics 27(16):2187-93, 2011. Publisher Website

Local de novo assembly of RAD paired-end contigs using short sequencing reads. Etter et al., PLOS ONE 6(4): e18561, 2011. Full Text

Stacks: building and genotyping loci de novo from short-read sequences. Catchen et al., G3: Genes, Genomes, Genetics, 1:171-182, 2011. Full Text, Home Page

Rainbow: an integrated tool for efficient clustering and assembling RAD-seq reads. Chong et al, Bioinformatics 28(21):2732-7, 2012. Publisher Website

UK RAD Sequencing Wiki page, with bibliography and RADTools software download Home Page

Workspace environments

Papers

Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Goecks et al, Genome Biol 11(8):R86, 2010 PubMedCentral

Galaxy Cloudman: delivering cloud compute clusters. Afgan et al, BMC Bioinformatics 11(Suppl 12):S4, 2010 Full Text

The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. McKenna et al, Genome Res 20(9):1297-303, 2010. PubMedCentral

A framework for variation discovery and genotyping using next-generation DNA sequencing data. DePristo et al., Nat Genet 43(5):491-8, 2011. PubMed

Online resources

The R statistical computing environment includes Bioconductor, a specialized set of tools for analysis of microarray and high-throughput sequencing data. Introductory materials from online courses and short workshops are widely available; examples are Evomics2012 Bioconductor-tutorial.pdf, and Intro to Bioconductor. Materials from an advanced course on high-throughput genetic data analysis are at Seattle 2012 materials. Thomas Girke of UC-Riverside has written a very complete set of manuals describing the use of R and Bioconductor for analysis of genomic datasets, available at R and Bioconductor Manuals.
Manuals and contributed documentation for R are available at the R-project.org website, and video tutorials are also available on Youtube; those posted by Tutorlol are brief, clear, and to the point.
Materials from a series of mini-courses in R taught in 2010 at UCLA are available:

A Little Book of R for Bioinformatics is an on-line resource with information and exercises to provide practice in bioinformatics analysis of DNA sequences and other biological data in R.
Many books on specific topics in R programming are also available through Amazon or other vendors.

Cloud computing resources

The case for cloud computing in genome informatics. Lincoln Stein, Genome Biol. 11(5):207, 2010 PubMed

Galaxy Cloudman: delivering cloud compute clusters. Afgan et al, BMC Bioinformatics 11(Suppl 12):S4, 2010 Full Text

CloudBioLinux is an open-source project that provides a bioinformatics Linux system for cloud computing, pre-configured with a variety of software tools installed and ready to use.

A tutorial on getting started with CloudBioLinux on the Amazon Web Services Elastic Compute Cloud (EC2)

Deploying Galaxy on the Cloud slides from a presentation by Enis Afgan (Emory University) at the Bioinformatics Open Source Conference in Boston, July 2010

A screencast that provides a step-by-step guide to starting a Galaxy cluster in the EC2 environment

A webpage that has the same information in text form, and is the basis for the screencast

The iPlant Collaborative, an NSF-funded project to create computational resources for plant biology research, provides access to cloud computing resources through Atmosphere

SeqWare Query Engine: storing and searching sequence data in the cloud. O'Connor et al, BMC Bioinformatics 11(Suppl 12):S2, 2010 Full Text

An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. Taylor, BMC Bioinformatics 11(Suppl 12):S1, 2010 Full Text

Links to Linux command-line tutorials and resources

Tutorials for AWK, a powerful tool for handling data tables

A set of awk notes from Boston University
Bruce Barnett's awk tutorial
Greg Goebel's awk tutorial
Executing an awk command from R to simplify data exploratory analysis, from Lex Nederbragt

Tutorials for bash shell scripting

A tutorial at linuxconfig.org
A Getting Started With Bash tutorial at hypexr.org
Mendel Cooper's Advanced Bash Shell-Scripting Guide

Tutorials for sed, the command-line stream editor

A tutorial at Rutgers
Peteris Krumins claims to have the World's Best Introduction to Sed; take a look and judge for yourself.
Bruce Barnett's sed tutorial.

Links to other useful sites

The SEQanswers online community has forums on several topics related to sequencing; the bioinformatics forum is the most active.

The SEQanswers Software Wiki is a list of software for analysis of sequencing data

Biostar is another online community for questions and answers on bioinformatics and computational genomics.

Information on file formats used by the University of California - Santa Cruz Genome Browser is on the FAQ list

A manual for the Integrated Genome Browser visualization tool is here

Course materials for a short course entitled Introduction to R and Bioconductor, held in Seattle in Dec 2010

Genomic Regions Enrichment of Annotations Tool - A web service to test for over-representation of specific ontology categories among genes near ChIP-seq peaks

Next-gen-seq software - a list of software packages, both commercial and open-source, related to analysis of deep sequencing datasets

Software from the Center for Bioinformatics and Computational Biology, University of Maryland - many useful programs, all open-source

PLAZA: a comparative genomics resource to study gene and genome evolution in plants; described by Proost et al, Plant Cell 21:3718, 2010 Full Text

The European Bioinformatics Institute provides tools ArrayExpressHTS and R-Cloud for analysis of transcriptome data

Genome U-Plot: a whole genome visualization

Rahul Nayak — Fri, 13 Jul 2018 19:50:41 -0500

Genome U-Plot is a tool for producing clear and intuitive graphs that allow researchers to generate novel insights and hypotheses by visualizing structural variants (SVs) such as deletions, amplifications, and chromoanagenesis events. Its main features are its layered layout, its high spatial resolution, and its improved aesthetic qualities.

Address of the bookmark: https://github.com/gaitat/GenomeUPlot

GRSR: a tool for deriving genome rearrangement scenarios from multiple unichromosomal genome sequences

Jit — Fri, 28 Sep 2018 09:35:10 -0500

GRSR is a tool for deriving genome rearrangement scenarios for multiple unichromosomal genomes. It performs the following steps:

Step 1. Run Mugsy to get multiple sequence alignment results.
Steps 2 & 3. Extract the coordinates of core blocks, construct synteny blocks, and generate signed permutations.
Step 4. Generate pairwise genome rearrangement scenarios and find repeats at the breakpoints of each rearrangement event.

Address of the bookmark: https://github.com/DanwangJessica/GRSR

Cogent: a tool for reconstructing the coding genome using high-quality full-length transcriptome sequences.

Jit — Tue, 18 Jun 2019 05:33:04 -0500

Cogent is a tool that identifies gene families and reconstructs the coding genome using high-quality transcriptome data without a reference genome, and can be used to check assemblies for the presence of these known coding sequences.

Cogent is designed for Iso-Seq data and for cases where there is no reference genome or the reference genome is highly incomplete.

See a recent presentation on Cogent being applied to the Cuttlefish Iso-Seq data.

Cogent preliminary draft paper (updated 2016Dec version), Supplementary

Please see wiki for details on usage.

Address of the bookmark: https://github.com/Magdoll/Cogent

Genomicus: genome browser that enables users to navigate in genomes in several dimensions

Jit — Mon, 28 Feb 2022 23:27:37 -0600

Genomicus is a genome browser that enables users to navigate genomes in several dimensions: linearly along chromosome axes, transversally across different species, and chronologically along evolutionary time.

Once a query gene has been entered, it is displayed in its genomic context in parallel to the genomic context of all its orthologous and paralogous copies in all the other sequenced metazoan genomes. Moreover, Genomicus stores and displays the predicted ancestral genome structure in all the ancestral species within the phylogenetic range of interest.

All the data on extant species displayed in this browser are from Ensembl.

Summary statistics of Genomicus version 105.01: (view species tree in pdf or newick)


Number of extant species	200
Number of extant genes	4303993
Number of ancestral species	196
Number of ancestral genes	4624213
Number of ancestral synteny blocks	83342

Address of the bookmark: https://www.genomicus.bio.ens.psl.eu/genomicus-105.01/cgi-bin/search.pl

Post-doctoral Research Assistant in Genetics

Thu, 05 Jun 2014 16:01:39 -0500

Post-doctoral Research Assistant in Genetics
Camden, North London
£31.1K per annum inclusive of London Weighting

This is a fixed term post for 36 months.

We wish to recruit a highly motivated postdoctoral scientist to carry out a BBSRC-funded project in the laboratory of Dr. Denis Larkin. The project is focused on developing and applying new algorithms to study genome and chromosome evolution in birds, mammals and other vertebrate species using whole-genome sequences and existing algorithms. The post holder will use cutting-edge computational and laboratory approaches to generate chromosomal assemblies for sequenced genomes and to study chromosomal structures and genome differences between bird and other vertebrate species, in an attempt to identify species- and clade-specific genome signatures.

Applicants must have a Ph.D. and a track record of success, as indicated by first-author publications in international journals. They must possess excellent organisation skills and be capable of individual initiative and of interacting as part of a team. Applicants with extensive practical experience in bioinformatics or computer science, programming, visualization, handling of large data sets, or high-performance computing are encouraged to apply. The post will involve collaboration with a wide range of academic partners within the UK, the EU and worldwide. In addition to leading their own project, the post holder will have opportunities to contribute to multiple international genome initiatives.

Experience in programming, bioinformatics and comparative genome analysis is essential. Applicants should have a minimum of a degree and preferably a higher degree in a relevant subject.

The Royal Veterinary College has the largest range of veterinary, para-veterinary and animal science undergraduate and postgraduate courses of any veterinary school in the world and is one of the largest veterinary schools in Europe.

Prospective applicants are encouraged to contact Dr. Denis Larkin, Comparative Biomedical Sciences Department on +442071211906 or email: dlarkin@rvc.ac.uk

We offer a generous reward package.

For further information and to apply on-line please visit our website: www.rvc.ac.uk
Job reference CBS-0025-14A

Closing date: 4 July 2014
Interviews are likely to be held in July 2014

We promote equality of opportunity and diversity within the workplace and welcome applications from all sections of the community.

You can't hide from Genome Hackers

Neel — Sat, 13 Oct 2018 14:17:28 -0500

A young computational biologist named Yaniv Erlich shocked the research world by showing that it was possible to unmask the identities of people listed in anonymous genetic databases using only an Internet connection.

Paper: http://science.sciencemag.org/content/early/2018/10/10/science.aau4832

More at https://www.wired.com/story/genome-hackers-show-no-ones-dna-is-anonymous-anymore/

Roth Lab

Tue, 11 Mar 2014 17:43:45 -0500

The Roth Lab seeks insight into biological systems through genome- and proteome-scale experimentation and analysis.

Current computational interests:

Systematic analysis of genetic epistasis to identify redundant or compensatory systems and to reveal order of action in genetic pathways.
Using knockout, knockdown, overexpression, or other perturbation experiments in combinations of genes in S. cerevisiae, C. elegans or mouse.
Using genome-scale genotyping of natural polymorphisms in S. cerevisiae and human populations.
Alternative splicing and its relationship to protein interaction networks.
Integrating large-scale studies including phenotype, genetic epistasis, protein-protein and transcription-regulatory interactions and sequence patterns to quantitatively assign function to genes and guide experimentation.

More at http://llama.mshri.on.ca/index.html