Research Interests:
Bioinformatics
High-throughput and high-dimensional data analysis
Microbiome data analysis (Main focus)
Next-generation and third-generation sequencing data analysis for genomics
Gene expression data...
http://rast.nmpdr.org/ - The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes...
github.com - TULIP currently consists of two Perl scripts, tulipseed.perl and tulipbulb.perl. These are very much intended as prototypes, and additional components and/or implementations are likely to follow.
Tulipseed takes as input alignment files of long...
github.com - YAMP is constructed on Nextflow, a framework based on the dataflow programming model, which allows writing workflows that are highly parallel, easily portable (including on distributed systems), and very flexible and customisable,...
sourceforge.net - Modern genome sequencing strategies are highly sensitive to contamination, making the detection of foreign DNA sequences an important part of analysis pipelines. Here we use Taxoblast, a simple pipeline with a graphical user interface, for the...
https://pgapx.ybzhao.com/ - PGAP-X is a microbial comparative genomic analysis platform with a graphical interface. A series of algorithms and methodologies has been developed and integrated to analyze and visualize genome structure variation, gene distribution with different...
github.com - Run a pipeline processing fast5s to a consensus in a single command.
Recommended fixed "standard" and "fast" pipelines.
Interchange basecaller, assembler, and consensus components of the pipelines simply by changing the target filepath.
Seamless...
github.com - An increasing number of phased (i.e. with resolved haplotypes) reference genomes are available. However, most genetic variant calling tools do not explicitly account for haplotype structure. Here, we present HaploTypo, a pipeline tailored to resolve...
github.com - This is a scaffold assembler designed for stLFR reads[1]. It uses the linked-read information from stLFR reads to assemble contigs into scaffolds.
Here is an illustration of this pipeline:
github.com - PyParanoid is a pipeline for rapid identification of homologous gene families in a set of genomes - a central task of any comparative genomics analysis. The "gold standard" for identifying homologs is to use reciprocal best hits (RBHs) which depends...
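For readers unfamiliar with reciprocal best hits, the idea can be sketched in a few lines of Python. This is only an illustration of the general RBH concept, not PyParanoid's implementation: the function names, gene identifiers, and bit scores below are all hypothetical, and the scores stand in for whatever a pairwise sequence comparison would report.

```python
def best_hits(scores):
    """Map each query gene to its single highest-scoring subject gene.

    `scores` is a dict keyed by (query, subject) with a similarity score
    as the value (hypothetical bit scores in this sketch).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores for genes of genome A searched against genome B, and back.
a_vs_b = {
    ("a1", "b1"): 250.0,
    ("a1", "b2"): 90.0,
    ("a2", "b2"): 180.0,
    ("a3", "b2"): 200.0,
}
b_vs_a = {
    ("b1", "a1"): 245.0,
    ("b2", "a3"): 198.0,
    ("b2", "a2"): 175.0,
}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('a1', 'b1'), ('a3', 'b2')]
```

In this toy data, a2's best hit is b2, but b2's best hit is a3, so the pair (a2, b2) is not reciprocal and is dropped.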