BOL: Related items

CAR: Reconstructing Contiguous Regions of an Ancestral Genome

Abhimanyu Singh — Thu, 18 May 2017 05:24:01 -0500

We describe a new method for predicting the ancestral order and orientation of conserved genomic intervals from their observed adjacencies in modern species. We combine the results from this method with data from chromosome painting experiments to produce a map of an early mammalian genome that accounts for 96.8% of the available human genome sequence data. The precision is further increased by mapping inversions as small as 31 bp. Analysis of the predicted evolutionary breakpoints in the human lineage confirms certain published observations but disagrees with others. Although only a few mammalian genomes are currently sequenced to high precision, our theoretical analyses and computer simulations indicate that our results are reasonably accurate and that they will become highly accurate in the foreseeable future. Our methods were developed as part of a project to reconstruct the genome sequence of the last common ancestor of humans, dogs, and most other placental mammals.
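To make "predicting ancestral order from observed adjacencies" concrete, here is a minimal sketch that represents each modern genome as a list of signed conserved-block IDs, collects block-end adjacencies, and greedily keeps the most widely shared ones without reusing a block end. This is a simple majority heuristic that ignores the phylogeny and only counts species; it is not CAR's algorithm, and the function names and input format are hypothetical.

```python
from collections import Counter

def adjacencies(genome):
    """Adjacencies of one genome given as signed block IDs (+k forward, -k reversed).

    Block k has a tail end (k, 't') and a head end (k, 'h'); a forward block runs
    tail -> head. An adjacency is the unordered pair of touching ends, so it is
    identical whichever strand the segment is read from."""
    ends = [((abs(b), 't'), (abs(b), 'h')) if b > 0 else ((abs(b), 'h'), (abs(b), 't'))
            for b in genome]
    return {frozenset((right, left)) for (_, right), (left, _) in zip(ends, ends[1:])}

def ancestral_adjacencies(genomes, min_support=2):
    """Hypothetical greedy heuristic: keep adjacencies seen in >= min_support genomes,
    most-supported first, never using the same block end twice."""
    counts = Counter()
    for genome in genomes:
        counts.update(adjacencies(genome))
    used, kept = set(), []
    # Sort by descending support; ties broken deterministically by the sorted ends.
    for adj, n in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        if n < min_support:
            break
        if used.isdisjoint(adj):  # each block end can join at most one neighbour
            kept.append(adj)
            used |= adj
    return kept
```

For example, with the three toy genomes `[1, 2, 3]`, `[1, 2, 3]` and `[1, -3, -2]`, the sketch keeps the head-of-2/tail-of-3 adjacency (present in all three, because inverting the 2..3 segment preserves it) and the head-of-1/tail-of-2 adjacency (present in two), and drops the adjacency created by the inversion breakpoint.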

Address of the bookmark: http://www.bx.psu.edu/miller_lab/car/

List of journals publishing genome announcements, notes and reports

Jit — Wed, 26 Jul 2017 08:01:38 -0500

Faced with an increasing number of articles describing DNA data and a need for more appropriate venues to present these data, some publishers and journals have responded by changing the structure and format of genome papers. Specifically, certain journals have started accepting very short manuscripts (500–1500 words) that present a new chromosome sequence, its GenBank accession number and little else. These pint-sized articles go by various names, such as genome reports, genome announcements, genome notes or genome letters, but will be referred to here broadly as genome reports. Their short length and minimal number (or complete absence) of figures, tables and article subheadings are a significant departure from long-form genome papers, which typically span 8–10 journal pages, contain many supporting items and have formal introduction, methods, results and discussion sections.

The following journals publish these short genome reports:

1. Genome Announcements, American Society for Microbiology, Genome announcement, Impact factor 1.3, A 500-word report stating that the genome of a particular organism (prokaryote, eukaryote or virus) has been sequenced and providing a citable record of the corresponding GenBank submission. Must include an abstract; no text headings can be used except ‘Acknowledgments’ and ‘References’. Cannot include figures, tables or supplemental material to present data or analysis.

Link: https://mra.asm.org/

2. Genome Biology and Evolution, Oxford University Press, Genome report, Impact factor 4.2, Focused 1500-word papers (up to six tables or figures) that publish the main evolutionary message of new genome sequences as they are submitted to GenBank. May also contain specifically focused comparative analyses of previously published genomes that offer a substantial and novel insight of broad evolutionary significance.

Link: https://academic.oup.com/gbe

3. Journal of Biotechnology, Elsevier, Genome announcement, Impact factor 2.9, A 500-word report announcing the availability of the completely annotated genome sequence of a biotechnologically relevant organism in the corresponding database (for eukaryotes, advanced draft genomes will also be considered). Articles can contain an Abstract, a brief report on the organism and its biotechnological relevance, a table summarizing the genome features, References and an Acknowledgement. Figures are generally not allowed.

Link: https://www.journals.elsevier.com/journal-of-biotechnology

4. Journal of Genomics, Ivyspring, Genome note, Impact factor N/A, A 1000-word report (10 reference limit; conclusions not permitted) describing novel data sets from high-throughput analysis of genotypes, phenotypes, gene expression, metabolomes, proteomes or genome assemblies. Standard metrics for data quality and the experimental design must be clearly reported.

Link: http://www.jgenomics.com/

5. Memórias do Instituto Oswaldo Cruz, Oswaldo Cruz Foundation, Genome announcement and highlight, Impact factor 1.6, Dedicated to publishing new genome information from eukaryotic parasites, viruses, bacteria and their respective vectors, as well as re-sequencing or comparative genome analyses. Should occupy no more than three printed pages including figures and/or tables.

Link: http://memorias.ioc.fiocruz.br/

6. Molecular Ecology Resources, Wiley, Genomic resources note, Impact factor 3.7, Short notes on newly assembled and annotated transcriptomes, genome fractions or whole genomes, and/or a library of SNP/SSR markers. Authors submit a short manuscript describing how the resource was developed and where the data can be accessed. Notes do not appear in the journal as individual papers but are instead published as part of a summary article.

Link: https://onlinelibrary.wiley.com/journal/17550998

7. Standards in Genomic Sciences, BioMed Central (Springer), Short genome report, Impact factor 3.2, Short (∼500-word) article on a newly sequenced genome. Article format must follow the guidelines and template (available from the journal Web site) put forward by SGS. Manuscripts that do not use the template or that are missing key figures, tables and/or references (as per the guidelines) will be returned to authors. The rationale of the content model is to provide information that is consistently and uniformly presented for rapid and easy consumption by both human and machine readers.

Link: https://standardsingenomics.biomedcentral.com/

8. 3 Biotech, Springer, Short genome report, Impact factor 1.3, Short (∼500-word) article on a newly sequenced genome. Article format must follow the guidelines (available from the journal Web site). The article states that the genome of a particular organism (prokaryote, eukaryote or virus) has been sequenced and provides a citable record of the corresponding GenBank submission.

Link: https://link.springer.com/journal/13205

Eugene V. Koonin Lab

Tue, 09 Jan 2018 05:01:15 -0600

We are interested in understanding the evolution of life. To obtain glimpses of such understanding, we employ existing and new methods of computational biology to perform research in several major areas.

https://www.ncbi.nlm.nih.gov/research/groups/koonin/

REGEN: Ancestral Genome Reconstruction for Bacteria

Rahul Nayak — Tue, 06 Mar 2018 05:02:36 -0600

REGEN infers evolutionary events, including gene creation and deletion and replicon fission and fusion. The reconstruction can be performed by either a maximum parsimony or a maximum likelihood method. Gene content reconstruction is based on the concept of neighboring gene pairs. REGEN was designed to be used with any set of genomes that are sufficiently related, which will usually be the case for bacteria within the same taxonomic order.
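As an illustration of maximum parsimony over neighboring gene pairs, the sketch below treats the presence or absence of each adjacent gene pair as a binary character and runs the bottom-up pass of Fitch parsimony on a rooted binary species tree. It is a generic textbook technique, not REGEN's implementation; the tree encoding (nested tuples of species names) and the function names are hypothetical.

```python
def neighbor_pairs(genes, circular=False):
    """Unordered pairs of adjacent genes on one replicon (optionally circular)."""
    pairs = {frozenset(p) for p in zip(genes, genes[1:])}
    if circular and len(genes) > 2:
        pairs.add(frozenset((genes[-1], genes[0])))
    return pairs

def fitch(tree, state):
    """Bottom-up Fitch pass for one binary character.

    tree: a species name (leaf) or a (left, right) tuple; state: species -> 0/1.
    Returns the candidate state set at this node and the number of changes."""
    if isinstance(tree, str):
        return {state[tree]}, 0
    s1, c1 = fitch(tree[0], state)
    s2, c2 = fitch(tree[1], state)
    common = s1 & s2
    if common:
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # children disagree: one change is required

def ancestral_pairs(tree, genomes):
    """Neighboring gene pairs inferred present at the root, plus ambiguous ones."""
    leaf_pairs = {sp: neighbor_pairs(g) for sp, g in genomes.items()}
    all_pairs = set().union(*leaf_pairs.values())
    present, ambiguous = set(), set()
    for pair in all_pairs:
        state = {sp: int(pair in ps) for sp, ps in leaf_pairs.items()}
        root, _ = fitch(tree, state)
        if root == {1}:
            present.add(pair)
        elif root == {0, 1}:
            ambiguous.add(pair)
    return present, ambiguous
```

With the tree `(("A", "B"), "C")`, a pair found in A and C but not B is reconstructed as present at the root, because the disagreement between A and B is resolved by the outgroup C.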

Address of the bookmark: http://www.mdpi.com/2073-4425/3/3/423

The MARVEL assembler

Jit — Fri, 04 May 2018 19:18:41 -0500

MARVEL consists of a set of tools that facilitate the overlapping, patching, correction and assembly of noisy (and less noisy) long reads.

The assembly process can be summarized as follows:

overlap
patch reads
overlap (again)
scrubbing
assembly graph construction and touring
optional read correction
fasta file creation
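The "assembly graph construction and touring" step can be pictured with a generic overlap graph: reads are nodes, an edge u -> v means a suffix of u overlaps a prefix of v, and contigs are read off as maximal non-branching paths, stopping at branch points that often signal repeats. The sketch below shows only this textbook idea under those assumptions; MARVEL's own graph construction and touring rules are more involved, and the function name is hypothetical.

```python
from collections import defaultdict

def non_branching_paths(edges):
    """Maximal non-branching paths in a directed overlap graph.

    edges: (u, v) pairs meaning a suffix of read u overlaps a prefix of read v.
    A path is extended while its current node has exactly one incoming and one
    outgoing edge. Isolated cycles made only of such nodes are not reported."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))

    def simple(n):
        return indeg[n] == 1 and len(succ[n]) == 1

    paths = []
    for n in sorted(nodes):
        if simple(n):
            continue  # interior node: reached from some path start
        for v in succ[n]:
            path = [n, v]
            while simple(v):
                v = succ[v][0]
                path.append(v)
            paths.append(path)
    return paths
```

Here a read with two successors (such as one ending in a repeat) closes the current path and starts one new path per outgoing edge, so the ambiguity is left for later resolution instead of being guessed.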

Address of the bookmark: https://github.com/schloi/MARVEL

GAPPadder: A Sensitive Approach for Closing Gaps on Draft Genomes with Short Sequence Reads

Jit — Mon, 14 May 2018 05:25:48 -0500

This software is provided “as is” without warranty of any kind. In no event shall the author be held responsible for any damage resulting from the use of this software. The program package, including source codes, executables, and this documentation, is distributed free of charge. If you use this program in a publication, please cite the following reference:
Chong Chu, Xin Li, and Yufeng Wu. "GAPPadder: A Sensitive Approach for Closing Gaps on Draft Genomes with Short Sequence Reads." bioRxiv (2017): 125534.

Address of the bookmark: https://github.com/Reedwarbler/GAPPadder

SALSA: A tool to scaffold long read assemblies with Hi-C

Jit — Fri, 15 Jun 2018 04:01:15 -0500

This code is used to scaffold assemblies using Hi-C data. This version implements some improvements to the original SALSA algorithm; the old version can be found in the old_salsa branch. To use the latest version, first run the following commands:

cd SALSA
make

To run the code, you will need Python 2.7, the BOOST libraries and NetworkX (version lower than 1.2). If you use this tool, please cite our publications, which describe the methods used for scaffolding:

Ghurye, J., Pop, M., Koren, S., Bickhart, D., & Chin, C. S. (2017). Scaffolding of long read assemblies using long range contact information. BMC Genomics, 18(1), 527.
Ghurye, J., Rhie, A., Walenz, B. P., Schmitt, A., Selvaraj, S., Pop, M., Phillippy, A. M., & Koren, S. (2018). Integrating Hi-C links with assembly graphs for chromosome-scale assembly. bioRxiv, 261149.

For any queries, please either ask on the GitHub issue page or send an email to Jay Ghurye (jayg@cs.umd.edu).
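The core intuition behind Hi-C scaffolding is that contig ends which are truly adjacent share many Hi-C read pairs. The sketch below counts read pairs linking ends of different contigs and joins two ends only when each is the other's strongest partner (a "mutual best" rule). It is a simplified illustration, not SALSA's algorithm: real Hi-C scaffolders typically normalize link counts and check for misassemblies, and this sketch does neither. The input format (one `((contig, end), (contig, end))` tuple per read pair, with end `'L'` or `'R'`) is hypothetical.

```python
from collections import Counter

def mutual_best_joins(links):
    """Join contig ends whose strongest Hi-C partner is each other.

    links: one ((contig, end), (contig, end)) entry per Hi-C read pair.
    Pairs within a single contig are ignored. Returns sorted (end, end) joins."""
    counts = Counter(frozenset((a, b)) for a, b in links if a[0] != b[0])
    best = {}  # contig end -> (pair, count) of its strongest link; first seen wins ties
    for pair, n in counts.items():
        for end in pair:
            if n > best.get(end, (None, 0))[1]:
                best[end] = (pair, n)
    joins = [tuple(sorted(pair)) for pair in counts
             if all(best[end][0] == pair for end in pair)]
    return sorted(joins)
```

Requiring the preference to be mutual is a conservative choice: a weak, one-sided link (for example from a repeat shared by two contigs) is left unjoined rather than forced into the scaffold.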

Address of the bookmark: https://github.com/machinegun/SALSA

Phased Human Genome Assembly

Rahul Nayak — Mon, 08 Oct 2018 09:10:54 -0500

The new publicly available assembly (PacBio HG00733) has the fewest gaps of any human genome assembly, with more than half of the genome contained in gapless sequence at least 27 Mb long. The primary contig assembly is 2.89 Gb long and consists of 865 contigs that were assembled with PacBio data generated with the company’s Sequel® System. Using the FALCON-Unzip assembler, maternal and paternal haplotypes were resolved over more than 80% of the genome. Maternal and paternal haplotype blocks were then further phased using Hi-C technology and the FALCON-Phase method developed in collaboration with Phase Genomics. The genome was then de novo scaffolded using Phase Genomics’ Proximo Hi-C platform, resulting in the first chromosome-scale diploid assembly of a single individual accomplished with only two technologies. More specific details about the assembly are included on the PacBio blog.
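The claim that more than half of the genome lies in gapless sequence at least 27 Mb long is a statement about contig N50: the largest length L such that contigs of length L or longer cover at least half of the assembly. A minimal sketch of that standard calculation (the function name and the toy lengths are illustrative, not taken from the HG00733 data):

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
```

For toy contig lengths 2, 3, ..., 10 (total 54), the three longest contigs (10 + 9 + 8 = 27) reach half the total, so the N50 is 8.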

The data are available under NCBI accession IDs: BioProject PRJNA483067, assembly RBJD00000000 and sequence data SRP155659.

Additional Resources

Interactive map showcasing global initiatives underway to generate reference-quality human genome assemblies for diverse populations
BioReport Podcast on the value of ethnic-specific reference genomes
Nature Reviews Genetics paper from NHGRI: Prioritizing diversity in human genomics research
Article in The Journal of Precision Medicine: “Minority Report – Ethnic Diversity and the Real Promise for Precision Medicine”
Article in Bio-IT World: “Genomic Data Standards Are a Necessity”
NHGRI Project Award: High Quality Human and Non-Human Primate Genome Assemblies

More details are available on the PacBio website:

Blog post: Data Release: Highest-Quality, Most Contiguous Individual Human Genome Assembly to Date
Blog post: For Reference-Grade Human Genome Assemblies, SMRT Sequencing Yields Optimal Results
Webinar: Assembling High-Quality Human Reference Genomes for Global Populations
FALCON-Phase press release and article preprint
PacBio research focus webpage about Human Population Genetics

Ref: https://stockguru.com/2018/10/08/pacific-biosciences-releases-highest-quality-most-contiguous-individual-human-genome-assembly-to-date/

Referee: Genome assembly quality scores

Jit — Sun, 04 Nov 2018 16:44:30 -0600

Modern genome sequencing technologies provide a succinct measure of quality at each position in every read; however, all of this information is lost in the assembly process. Referee summarizes the quality information from the reads that map to a site in an assembled genome to calculate a quality score for each position in the genome assembly.

We accomplish this by first calculating genotype likelihoods for every site. For a given site in a diploid genome, there are 10 possible genotypes (AA, AC, AG, AT, CC, CG, CT, GG, GT, TT). Referee takes as input the genotype likelihoods calculated for all 10 genotypes given the called reference base at each position.
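To show what "genotype likelihoods for all 10 genotypes" means, the sketch below computes them from the bases and Phred qualities of reads covering one site, using a common simple error model (each allele of a diploid genotype sampled with probability 1/2; a sequencing error turns a base into each of the other three with equal probability). The final per-site score, a Phred-scaled log-likelihood ratio of the homozygous reference genotype against the best alternative, is a hypothetical choice for illustration and is not necessarily Referee's formula; the function names are also hypothetical.

```python
import math
from itertools import combinations_with_replacement

# The 10 unordered diploid genotypes: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT
GENOTYPES = ["".join(g) for g in combinations_with_replacement("ACGT", 2)]

def genotype_log10_likelihoods(reads):
    """log10 P(observed bases | genotype) for each of the 10 genotypes.

    reads: (base, phred_quality) pairs from the reads covering one site."""
    result = {}
    for g in GENOTYPES:
        ll = 0.0
        for base, q in reads:
            e = 10 ** (-q / 10)  # error probability implied by the Phred score
            # Average over the two alleles: match with 1 - e, mismatch with e / 3.
            p = sum(1 - e if base == allele else e / 3 for allele in g) / 2
            ll += math.log10(p)
        result[g] = ll
    return result

def site_score(reads, ref_base):
    """Hypothetical score: 10 * log10 ratio of the homozygous reference genotype
    likelihood to the best alternative genotype likelihood."""
    ll = genotype_log10_likelihoods(reads)
    ref_gt = ref_base * 2
    best_alt = max(v for g, v in ll.items() if g != ref_gt)
    return 10 * (ll[ref_gt] - best_alt)
```

With ten Q30 reads all showing the reference base, the score is strongly positive (the closest alternative is a heterozygote carrying the reference allele); when the reads consistently show a different base, the score turns negative, flagging the assembled base as doubtful.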

Referee is a program to calculate a quality score for every position in a genome assembly. This allows for easy filtering of low quality sites for any downstream analysis.

https://github.com/gwct/referee

Address of the bookmark: https://gwct.github.io/referee/#

Genome sequence-based (sub-)species delineation

Abhimanyu Singh — Wed, 12 Dec 2018 08:31:14 -0600

The GGDC web service reports digital DNA–DNA hybridization (dDDH) values for a universal and accurate delineation of prokaryotic (sub-)species without inheriting the pitfalls of classic wet-lab DDH, and also calculates differences in genomic G+C content.
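The G+C content difference mentioned above is simple arithmetic on two genome sequences. A minimal sketch, assuming ambiguous bases such as N are excluded from the denominator (how GGDC itself treats them is not stated here); the function names are hypothetical:

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def gc_difference(seq1, seq2):
    """Absolute difference in G+C content between two genomes, in percentage points."""
    return abs(gc_content(seq1) - gc_content(seq2))
```

In practice each argument would be a whole assembly, for example all contigs of one genome concatenated, rather than a single short sequence.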

http://ggdc.dsmz.de/ggdc_background.php#

Genome-to-Genome Distance Calculator 2.1

http://ggdc.dsmz.de/ggdc.php

Address of the bookmark: http://ggdc.dsmz.de/