BOL: Related items

3d-dna: 3D de novo assembly (3D DNA) pipeline

Jit — Thu, 28 Dec 2017 10:09:37 -0600

This code is designed to enable anyone to reproduce the Hs2-HiC and the AaegL4 genomes reported in: Dudchenko et al., De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science, 2017.

Unless otherwise noted, all terminology below is consistent with this paper, and all references to figures and tables in this readme refer to this paper. Specifically, some of the terminology used below is outlined in Figure S2. The assembly procedure is described in detail in the Supporting Online Materials, specifically in the section labelled “Pipeline description”.

In addition, the pipeline uses tools and methods from Juicer (Durand & Shamim et al., Cell Systems, 2016) and Juicebox (Durand & Robinson et al., Cell Systems, 2016), as well as additional dependencies noted below.

Feel free to post your questions and comments at: http://www.aidenlab.org/forum.html

http://aidenlab.org/documentation.html

Address of the bookmark: https://github.com/theaidenlab/3d-dna

MCAT: Motif Combining and Association Tool

Neel — Sun, 13 Jan 2019 06:27:28 -0600

This is a pipeline for finding motifs in fasta files.
It can be run from the command line as follows:

usage: orange_pipeline_refine.py [-h] [-w W] [--nmotifs NMOTIFS] [--iter ITER] [-c C]
                                 [-s S] [-d] [-ff] [-v V]
                                 positive_seq negative_seq

positional arguments:
  positive_seq  the fasta file for the positive sequences
  negative_seq  the fasta file for the negative sequences
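
The usage text above looks like standard argparse help output. As a rough illustration only, a parser that accepts the same arguments could be declared as below. This is a sketch, not MCAT's actual source: the excerpt does not say what the options mean or what types they take, so value-taking options are left as plain strings and the options shown without a value (-d, -ff) are assumed to be on/off flags.

```python
import argparse


def build_parser():
    """Sketch of a parser matching the usage line shown in the excerpt."""
    parser = argparse.ArgumentParser(prog="orange_pipeline_refine.py")
    # Options that take a value. Their meaning and type are not documented
    # in the excerpt, so they are kept as untyped strings (assumption).
    parser.add_argument("-w")
    parser.add_argument("--nmotifs")
    parser.add_argument("--iter")
    parser.add_argument("-c")
    parser.add_argument("-s")
    # Options printed without a value are assumed to be boolean flags.
    parser.add_argument("-d", action="store_true")
    parser.add_argument("-ff", action="store_true")
    parser.add_argument("-v")
    # The two positional arguments, with the help text from the excerpt.
    parser.add_argument("positive_seq",
                        help="the fasta file for the positive sequences")
    parser.add_argument("negative_seq",
                        help="the fasta file for the negative sequences")
    return parser


if __name__ == "__main__":
    # pos.fa and neg.fa are placeholder file names.
    args = build_parser().parse_args(["-w", "8", "-d", "pos.fa", "neg.fa"])
    print(args.positive_seq, args.negative_seq, args.w, args.d)
```

The two FASTA files are the only required arguments; everything in square brackets in the usage line is optional.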

Address of the bookmark: https://github.com/yanshen43/MCAT

TRITEX sequence assembly pipeline for Triticeae genomes

Jit — Tue, 20 Aug 2019 09:47:14 -0500

The pipeline is open-source and hosted in a public Bitbucket repository.

TRITEX has been run on highly inbred genotypes of barley (Hordeum vulgare), tetraploid wheat (Triticum turgidum) and hexaploid wheat (T. aestivum) with reasonable results: super-scaffold N50 values in the range of dozens of Mb and pseudomolecules with better gene space representation than a BAC-by-BAC assembly. It has never been tested and is not expected to work on heterozygous or autopolyploid genomes.
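
For readers unfamiliar with the N50 statistic mentioned above: it is usually defined as the length L such that sequences of length L or longer contain at least half of all assembled bases. The sketch below computes it from a list of lengths. The function name and the example lengths are made up for illustration and are not taken from TRITEX.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk from the longest sequence down, accumulating length until
    # at least half of the total assembly size is covered.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical super-scaffold lengths in Mb (illustrative values only).
    scaffold_lengths_mb = [40, 30, 20, 5, 3, 2]
    print(n50(scaffold_lengths_mb))  # 40 + 30 = 70 >= 50, so N50 is 30
```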

A protocol for generating chromosome-conformation capture sequencing (Hi-C) data suitable for use with the pipeline is described in Himmelbach et al. 2018. Refer to the technical notes of 10X Genomics on how to generate Chromium data.

Address of the bookmark: https://tritexassembly.bitbucket.io/

3D de novo assembly (3D DNA) pipeline

Jit — Sun, 02 Feb 2020 13:41:55 -0600

For a detailed description of the pipeline and how it integrates with other tools designed by the Aiden Lab, see the Genome Assembly Cookbook at http://aidenlab.org/assembly.

For the original version of the pipeline and to reproduce the Hs2-HiC and the AaegL4 genomes reported in (Dudchenko et al., Science, 2017) see the original commit.

For the detailed description of the merge section see https://github.com/theaidenlab/AGWG-merge.

Address of the bookmark: https://github.com/theaidenlab/3d-dna

WGDdetector: a pipeline for detecting whole genome duplication events using the genome or transcriptome annotations

LEGE — Thu, 23 Jul 2020 05:52:56 -0500

WGDdetector is a pipeline that integrates all analyses, including gene family construction and dS estimation and phasing, and outputs the dS values of each paralog pair with a single command. We further chose four species (Arabidopsis thaliana, Juglans regia, Populus trichocarpa and Xenopus laevis), representing herbs, woody plants and animals, to test its practicability. Our final results agreed closely with previous studies using both genome and transcriptome data.

More at https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2670-3

Address of the bookmark: https://github.com/yongzhiyang2012/wgddetector

CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes

Jit — Wed, 10 Mar 2021 06:13:49 -0600

The pipeline can use information from scaffolded assemblies (for example from Hi-C or 10X Genomics), or even from diverged (~65-100 Mya) reference genomes, to order the contigs and thus support the assembly process. This typically results in improved contig N50 compared with current state-of-the-art methods.

For smaller vertebrate genomes (~1 Gbp), chromosome-scale assemblies can be achieved within 12 h on high-end desktop computers (Intel i7, 12 CPU threads, 128 GB RAM). Larger mammalian genomes (~3 Gbp) can be processed within 15-18 h on server equipment (Xeon, 96 CPU threads, 1 TB RAM).

Address of the bookmark: https://github.com/HMPNK/CSA2.6

Illumina-based assembly pipeline steps

Surabhi Chaudhary — Fri, 10 Dec 2021 06:22:54 -0600

Illumina

1. Merge re-sequenced FastQ files (cat)
2. Read QC (FastQC)
3. Adapter trimming (fastp)
4. Removal of host reads (Kraken 2; optional)
5. Variant calling
   1. Read alignment (Bowtie 2)
   2. Sort and index alignments (SAMtools)
   3. Primer sequence removal (iVar; amplicon data only)
   4. Duplicate read marking (picard; optional)
   5. Alignment-level QC (picard, SAMtools)
   6. Genome-wide and amplicon coverage QC plots (mosdepth)
   7. Choice of multiple variant calling and consensus sequence generation routes (iVar variants and consensus; default for amplicon data || BCFTools, BEDTools; default for metagenomics data)
      - Variant annotation (SnpEff, SnpSift)
      - Consensus assessment report (QUAST)
      - Lineage analysis (Pangolin)
      - Clade assignment, mutation calling and sequence quality checks (Nextclade)
      - Individual variant screenshots with annotation tracks (ASCIIGenome)
   8. Intersect variants across callers (BCFTools)
6. De novo assembly
   1. Primer trimming (Cutadapt; amplicon data only)
   2. Choice of multiple assembly tools (SPAdes || Unicycler || minia)
      - Blast to reference genome (blastn)
      - Contiguate assembly (ABACAS)
      - Assembly report (PlasmidID)
      - Assembly assessment report (QUAST)
7. Present QC and visualisation for raw read, alignment, assembly and variant calling results (MultiQC)
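
The first step in the list above, merging re-sequenced FastQ files with cat, amounts to byte-for-byte concatenation. A minimal Python sketch of the same idea follows. The function name and file names are hypothetical, and this is not the pipeline's own code. Because concatenated gzip files form a valid multi-member gzip stream, the same approach should also work for .fastq.gz inputs.

```python
import shutil
from pathlib import Path


def merge_fastq(inputs, output):
    """Concatenate FastQ files byte for byte, like `cat a b > out`."""
    with open(output, "wb") as merged:
        for path in inputs:
            # Binary mode keeps the bytes untouched, so plain-text and
            # gzip-compressed inputs are both copied verbatim.
            with open(path, "rb") as part:
                shutil.copyfileobj(part, merged)
    return Path(output)


if __name__ == "__main__":
    # Placeholder file names for two sequencing runs of the same sample.
    merge_fastq(["sample_run1.fastq", "sample_run2.fastq"], "sample_merged.fastq")
```

All inputs passed to one call should be of the same kind (all plain text or all gzip), since the bytes are copied without inspection.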

Bactopia: a flexible pipeline for complete analysis of bacterial genomes

Abhi — Sat, 08 Jun 2024 16:25:08 -0500

Bactopia is a flexible pipeline for complete analysis of bacterial genomes. The goal of Bactopia is to process your data with a broad set of tools, so that you can get to the fun part of analyses quicker!

Bactopia was inspired by Staphopia, a workflow we (Tim Read and myself) released that is targeted towards Staphylococcus aureus genomes. Using what we learned from Staphopia and user feedback, Bactopia was developed from scratch with usability, portability, and speed in mind from the start.

Bactopia uses Nextflow to manage the workflow, allowing for support of many types of environments (e.g. cluster or cloud). Bactopia allows for the usage of many public datasets as well as your own datasets to further enhance the analysis of your sequencing. Bactopia only uses software packages available from Bioconda and Conda-Forge to make installation as simple as possible for all users.

To highlight the use of Bactopia and Bactopia Tools, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. The results from this analysis are published in mSystems under the title: Bactopia: a flexible pipeline for complete analysis of bacterial genomes

Address of the bookmark: https://bactopia.github.io/latest/

INTERNSHIP @ NIPGR

Sat, 13 Sep 2014 16:02:35 -0500

Applications are invited from suitable candidates for a six-month 'Training Fellowship' at the National Institute of Plant Genome Research (NIPGR).

About National Institute Of Plant Genome Research (NIPGR) http://www.nipgr.res.in/

The National Institute of Plant Genome Research is an autonomous institution supported by the Department of Biotechnology, Government of India. It aims to become a premier institution for plant genomic research in the country. It was established as part of the national effort to meet the challenges posed by the fast pace of international genomic research and to seize opportunities on a long-term basis.

About the Internship:

The selected intern(s) will work in the area of Bioinformatics under the BTISNET program of DBT in the Distributed Information Sub center (DISC) facility at NIPGR, New Delhi, under the supervision of Dr. Gitanjali Yadav, Scientist, NIPGR.

Who can apply:

Students currently pursuing the final year of a Master's degree (or equivalent) in Bioinformatics/Biotechnology, with a strong interest in Computational Biology and a first class/division throughout their academic career, may apply.

Largest Genome Sequenced

Rahul Agarwal — Fri, 21 Mar 2014 13:57:19 -0500

The loblolly pine genome is enormous: it has 22 billion base pairs, compared with only 3 billion in the human genome. In other words, it is about seven times larger than the human genome, and it is also the largest and the most complete conifer genome ever sequenced.

Related Paper:

http://genomebiology.com/2014/15/3/R59/abstract

Address of the bookmark: http://www.news.ucdavis.edu/search/news_detail.lasso?id=10859