BOL: Related items

Synteny and Rearrangement Identifier (SyRI)

Jit — Tue, 05 May 2020 10:37:10 -0500

SyRI is a comprehensive tool for predicting genomic differences between related genomes using whole-genome assemblies (WGA). The assemblies are aligned using whole-genome alignment tools, and these alignments are then used as input to SyRI. SyRI identifies syntenic path (longest set of co-linear regions), structural rearrangements (inversions, translocations, and duplications), local variations (SNPs, indels, CNVs etc) within syntenic and structural rearrangements, and un-aligned regions.

Address of the bookmark: https://schneebergerlab.github.io/syri/

Genome assembly training tutorial at Galaxy !

Jit — Sun, 27 Dec 2020 05:25:45 -0600

In this tutorial we assemble and annotate the genome of E. coli strain C-1. This strain is routinely used in experimental evolution studies involving bacteriophages. For instance, now classic works by Holly Wichman and Jim Bull (Bull 1997, Bull 1998, Wichman 1999) have been performed using this strain and bacteriophage phiX174.

Address of the bookmark: https://training.galaxyproject.org/training-material/topics/assembly/tutorials/unicycler-assembly/tutorial.html

QuasiModo - Quasispecies Metric Determination on Omics

Neel — Sat, 26 Jun 2021 15:22:56 -0500

This repository contains the scripts and pipeline that reproduces the results of the HCMV benchmarking study. In this study we evaluated genome assemblers and variant callers on 10 in vitro generated, mixed strain HCMV sequence samples, each consisting of two lab strains in different abundance ratios. This tool can also be used to evaluate assemblies and variant calling results on other similar datasets.

https://academic.oup.com/bib/article/22/3/bbaa123/5868070

Address of the bookmark: https://github.com/hzi-bifo/Quasimodo

OGDRAW - Draw Organelle Genome Maps

Abhi — Tue, 05 Oct 2021 03:34:35 -0500

OrganellarGenomeDRAW converts annotations in the GenBank or EMBL/ENA format into graphical maps. The input has to be a GenBank or EMBL/ENA flat file wherase the output can vary among several types of files. The application is optimized to create detailed high-quality maps of organellar genomes (plastid and mitochondria). Nevertheless, you can upload most database entries.

Please take a look at our FAQ section and do not hesitate to report bugs or suggestions for improvements by email.

Address of the bookmark: https://chlorobox.mpimp-golm.mpg.de/OGDraw.html

UniqueKmer: Generate unique KMERs for every contig in a FASTA file

Abhi — Fri, 17 Dec 2021 00:08:15 -0600

Generate unique k-mers for every contig in a FASTA file.

Unique k-mer is consisted of k-mer keys (i.e. ATCGATCCTTAAGG) that are only presented in one contig, but not presented in any other contigs (for both forward and reverse strands).

This tool accepts the input of a FASTA file consisting of many contigs, and extract unique k-mers for each contig.

The output unique k-mer file and Genome file can be used for fastv: https://github.com/OpenGene/fastv, which is an ultra-fast tool to identify and visualize microbial sequences from sequencing data.

https://github.com/OpenGene/UniqueKMER

Address of the bookmark: https://github.com/OpenGene/UniqueKMER

Comparative Genomics Workshops !

Rahul Nayak — Tue, 25 Jan 2022 20:39:58 -0600

This meeting's objective was to obtain a big picture look at the current state of the field of comparative genomics with a focus on commonalities across genomic investigations into humans, model organisms (both traditional and non-traditional), agricultural species, wildlife species and microbes.

https://www.genome.gov/event-calendar/perspectives-in-comparative-genomics-and-evolution

Address of the bookmark: https://www.genome.gov/event-calendar/perspectives-in-comparative-genomics-and-evolution

Environmental Genomics Group SciLifeLab/KTH Stockholm

BioStar — Thu, 01 Dec 2022 01:12:43 -0600

Useful Metagenomics resources

Address of the bookmark: https://github.com/envgen

NCBI Datasets pages

BioStar — Wed, 12 Jul 2023 06:29:31 -0500

Update! Assembly and Genome record pages now redirect to new NCBI Datasets pages. NCBI Datasets is a new resource that makes it easier to find and download genome data. Learn more: https://ncbiinsights.ncbi.nlm.nih.gov/2023/07/11/ncbi-datasets-genome-assembly-pages/ #NCBICGR

Effective July 10, 2023, NCBI’s Assembly and Genome record pages now redirect to new NCBI Datasets pages. As previously announced, these updates are part of our ongoing effort to modernize and improve your user experience. NCBI Datasets is a new resource that makes it easier to find and download genome data.  

The following pages have been updated:

The NCBI Assembly record pages now redirect to the new NCBI Datasets Genome record pages that describe assembled genomes and provide links to related NCBI tools such as Genome Data Viewer and BLAST. 
The NCBI Genome record pages now redirect to the NCBI Datasets Taxonomy record pages that provide a taxonomy-focused portal to genes, genomes, and additional NCBI resources.

During this transition, you will have the option to return to the legacy Genome and Assembly record pages. We will remove the legacy pages in early 2024. 

Tools to access the quality of your assembled genome !

LEGE — Thu, 08 Aug 2024 23:31:18 -0500

FASTA VALIDATOR + SEQKIT RMDUP: FASTA validation
GENOMETOOLS GT GFF3VALIDATOR: GFF3 validation
ASSEMBLATHON STATS: Assembly statistics
GENOMETOOLS GT STAT: Annotation statistics
NCBI FCS ADAPTOR: Adaptor contamination pass/fail
NCBI FCS GX: Foreign organism contamination pass/fail
BUSCO: Gene-space completeness estimation
TIDK: Telomere repeat identification
LAI: Continuity of repetitive sequences
KRAKEN2: Taxonomy classification
HIC CONTACT MAP: Alignment and visualisation of HiC data
MUMMER → CIRCOS + DOTPLOT & MINIMAP2 → PLOTSR: Synteny analysis
MERQURY: K-mer completeness, consensus quality and phasing assessment

Step-by-Step Guide to Running Genome Assembly

Abhi — Fri, 13 Dec 2024 11:35:55 -0600

Genome assembly is a critical process in bioinformatics, enabling the reconstruction of an organism's genome from short DNA sequence reads. Whether you’re working on a new microbial genome or a complex eukaryotic organism, this guide will walk you through the steps of genome assembly using state-of-the-art tools and best practices.

What is Genome Assembly?

Genome assembly involves piecing together short DNA sequence reads generated by sequencing platforms (e.g., Illumina, PacBio, Oxford Nanopore) into longer, contiguous sequences called contigs. This can be performed as:

De Novo Assembly: Without a reference genome.
Reference-Guided Assembly: Using a reference genome to guide the assembly process.

Step 1: Preparing Your Data

Before starting the assembly, ensure that your raw sequencing data is high quality.

Input Data
- Short Reads: Illumina sequencing generates short, accurate reads ideal for scaffolding.
- Long Reads: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.
Quality Control (QC)
Use tools like FastQC or MultiQC to assess the quality of your reads:

fastqc reads.fastq multiqc .

Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.
Read Trimming and Filtering
Trim low-quality bases and adapters using Trimmomatic or Cutadapt:

trimmomatic PE reads_R1.fastq reads_R2.fastq trimmed_R1.fastq trimmed_R2.fastq \ ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36

Step 2: Choosing an Assembly Strategy

Select an assembly strategy based on your data type:

Short-Read Assemblers:
- SPAdes: Popular for microbial genomes.
- Velvet: Fast for smaller genomes.
Long-Read Assemblers:
- Canu: Ideal for long-read datasets.
- Flye: Versatile for small and large genomes.
Hybrid Assemblers:
- MaSuRCA: Combines short and long reads.
- Unicycler: Optimized for bacterial genomes.

Step 3: Running the Assembly

3.1. SPAdes (Short-Read Assembly)

SPAdes is an excellent choice for small genomes, such as bacteria.

spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output

The output includes assembled contigs (contigs.fasta) and scaffolds (scaffolds.fasta).

3.2. Canu (Long-Read Assembly)

Canu is designed for high-error long reads from PacBio or Nanopore.

canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq

The output will be in canu_output/genome.contigs.fasta.

3.3. Hybrid Assembly with Unicycler

Unicycler combines short and long reads for improved assemblies.

unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output

Step 4: Assessing Assembly Quality

After assembly, evaluate its quality using the following tools:

QUAST
QUAST generates assembly statistics, such as N50, genome size, and GC content:

quast contigs.fasta -o quast_output
BUSCO
BUSCO checks genome completeness by identifying conserved genes:

busco -i contigs.fasta -o busco_output -l fungi_odb10 -m genome
Assembly Graph Visualization
Visualize assembly graphs with Bandage:

Bandage load assembly_graph.gfa

Step 5: Post-Assembly Steps

Polishing
Improve assembly accuracy using tools like Pilon (for short reads) or Racon (for long reads).

racon long_reads.fasta mapped_reads.sam contigs.fasta > polished_contigs.fasta
Scaffolding
Link contigs into scaffolds using tools like SSPACE or Opera-LG if required.
Annotation
Annotate the assembled genome using Prokka for prokaryotes or Maker for eukaryotes.

prokka --outdir annotation_output --prefix genome contigs.fasta

Step 6: Sharing and Archiving

Submit to Public Repositories
Share your assembly in databases like NCBI GenBank, ENA, or DDBJ.
Metadata Preparation
Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.

Best Practices

Always perform quality checks at each stage to ensure data integrity.
Use multiple tools to cross-validate results when working with complex genomes.
Document parameters and software versions for reproducibility.

Conclusion

Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism’s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you’re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.