SEX-DETector is a probabilistic method that relies on RNA-seq data from a cross (parents and progeny of each sex) to infer autosomal and sex-linked genes (genes located on the non-recombining part of sex chromosomes).
How does SEX-DETector...
Here’s the quick and dirty of what was done:
1. Run minimap:
This uses a pre-built set of defaults (the ava-pb in the code below) for analyzing PacBio data. Minimap only accepts two FASTQ files and you need to map your FASTQ file...
The k-mer Weighted Inner Product.
This software implements a de novo, alignment-free measure of genetic dissimilarity between samples that operates directly on raw sequencing reads. It is able to calculate the genetic dissimilarity between samples without any...
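To make the idea of an alignment-free, k-mer based dissimilarity concrete, here is a toy sketch in Python. It is not the tool's implementation: the function names are made up for illustration, and the choice to weight each k-mer by the Shannon entropy of its presence across samples is an assumption about what "weighted" means in a small setting like this.

```python
from collections import Counter
from math import log2, sqrt


def kmer_counts(reads, k):
    """Count every k-mer in a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def entropy_weights(samples):
    """Assumed weighting: Shannon entropy of k-mer presence across samples.

    K-mers seen in every sample (or in none) get weight 0 because they
    carry no information about differences between samples.
    """
    n = len(samples)
    presence = Counter()
    for counts in samples:
        presence.update(counts.keys())
    weights = {}
    for kmer, seen in presence.items():
        p = seen / n
        if 0 < p < 1:
            weights[kmer] = -(p * log2(p) + (1 - p) * log2(1 - p))
        else:
            weights[kmer] = 0.0
    return weights


def weighted_inner_product(a, b, weights):
    """Inner product of two k-mer count vectors, scaled by per-k-mer weights."""
    shared = a.keys() & b.keys()
    return sum(weights.get(kmer, 0.0) * a[kmer] * b[kmer] for kmer in shared)


def dissimilarity(a, b, weights):
    """Turn the normalised inner product into a distance in [0, sqrt(2)]."""
    kaa = weighted_inner_product(a, a, weights)
    kbb = weighted_inner_product(b, b, weights)
    if kaa == 0 or kbb == 0:
        raise ValueError("a sample has no informative k-mers")
    similarity = weighted_inner_product(a, b, weights) / sqrt(kaa * kbb)
    return sqrt(max(0.0, 2.0 - 2.0 * similarity))


# Three tiny made-up "samples": the first two are identical, the third is unrelated.
samples = [
    kmer_counts(["ACGTACGT"], 4),
    kmer_counts(["ACGTACGT"], 4),
    kmer_counts(["TTTTGGGG"], 4),
]
weights = entropy_weights(samples)
d_same = dissimilarity(samples[0], samples[1], weights)
d_diff = dissimilarity(samples[0], samples[2], weights)
```

Identical samples should come out at (or extremely close to) zero, and samples that share no k-mers at the maximum distance. Real data would need canonical k-mers, much larger k, and a memory-efficient counter, none of which this sketch attempts.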
Porechop is a tool for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop...
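The trim-the-ends, split-in-the-middle behaviour described above can be illustrated with a very small Python sketch. This is only an illustration of the idea: it uses exact string matching and a hypothetical adapter sequence, whereas a real adapter finder has to tolerate sequencing errors, so do not read it as how Porechop works internally.

```python
def trim_and_split(read, adapter, min_len=1):
    """Trim an adapter from the read ends and split on adapters in the middle.

    Exact matching only (an assumption made for brevity). Returns the list of
    resulting read pieces, dropping any piece shorter than min_len.
    """
    if not adapter:
        raise ValueError("adapter must be a non-empty string")

    # Trim an adapter sitting at the very start or very end of the read.
    if read.startswith(adapter):
        read = read[len(adapter):]
    if read.endswith(adapter):
        read = read[:-len(adapter)]

    # Any adapter left over is in the middle: treat the read as chimeric
    # and chop it into separate reads.
    pieces = read.split(adapter)
    return [piece for piece in pieces if len(piece) >= min_len]


ADAPTER = "AAGG"  # hypothetical adapter, not a real Nanopore sequence

end_trimmed = trim_and_split(ADAPTER + "CCCTTT" + ADAPTER, ADAPTER)
chimera_split = trim_and_split("CCCTTT" + ADAPTER + "GTGTGT", ADAPTER)
```

Here `end_trimmed` holds a single cleaned read, while `chimera_split` holds two reads, one from each side of the internal adapter.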
WhatsHap is software for phasing genomic variants using DNA sequencing reads, also called read-based phasing or haplotype assembly. It is especially suitable for long reads, but also works well with short reads.
Features
Very accurate...
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. Manta discovers, assembles...
GenomeMapper is a short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads with either ungapped or gapped alignments. It can be used to align against multiple genomes simultaneously or against a single...
pbalign aligns PacBio reads to reference sequences, filters aligned reads according to user-specified filtering criteria, and converts the output to either the SAM format or PacBio Compare HDF5 (e.g., .cmp.h5) format. The output Compare HDF5 file...
Minialign is a little bit fast and moderately accurate nucleotide sequence alignment tool designed for PacBio and Nanopore long reads. It is built on three key algorithms, minimizer-based index of the minimap overlapper, array-based seed chaining,...
%MM calculates whether a given gene sequence encodes amino acids using the most common codons possible, the least common codons possible, or (most typically) some combination of these extremes. See our PLoS ONE paper for more details on how the...
SOAPaligner/soap2 is a member of SOAP (Short Oligonucleotide Analysis Package). It is an updated version of the SOAP software for short oligonucleotide alignment. The new program features very fast and accurate alignment for huge amounts of...
BLASR (Basic Local Alignment with Successive Refinement) maps Single Molecule Sequencing (SMS) reads that are thousands to tens of thousands of bases long, with divergence between the read and genome dominated by insertion and deletion...
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. It provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous...
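As a rough illustration of how completeness and contamination can be estimated from sets of marker genes, here is a small Python sketch. The scoring rule is an assumption made for this example (each marker set contributes equally; a marker seen more than once counts as contamination), and the gene names and counts are invented. It is not CheckM's code or API.

```python
def completeness_contamination(marker_sets, hit_counts):
    """Estimate completeness and contamination as percentages.

    marker_sets: list of sets of marker gene names (each set is a group of
                 genes expected to occur together, once per genome).
    hit_counts:  dict mapping marker gene name -> copies found in the genome.
    Assumed scoring: every marker set carries equal weight.
    """
    if not marker_sets:
        raise ValueError("at least one marker set is required")

    completeness = 0.0
    contamination = 0.0
    for markers in marker_sets:
        # Markers found at least once count towards completeness.
        present = sum(1 for gene in markers if hit_counts.get(gene, 0) >= 1)
        # Every copy beyond the first counts towards contamination.
        extra = sum(
            hit_counts.get(gene, 0) - 1
            for gene in markers
            if hit_counts.get(gene, 0) > 1
        )
        completeness += present / len(markers)
        contamination += extra / len(markers)

    n_sets = len(marker_sets)
    return 100.0 * completeness / n_sets, 100.0 * contamination / n_sets


# Invented example: gene B is missing, gene C was found twice.
example_sets = [{"A", "B"}, {"C"}]
example_hits = {"A": 1, "C": 2}
comp, cont = completeness_contamination(example_sets, example_hits)
```

Grouping markers into sets before averaging means that losing one stretch of the genome, which tends to take several neighbouring markers with it, is not over-counted compared with scoring each gene on its own.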
We now need to install the poRe dependencies in R, which is very easy:
R
source("http://www.bioconductor.org/biocLite.R")
biocLite("rhdf5")
install.packages(c("shiny","bit64","data.table","svDialogs"))
q()
R may ask if you want...
Use SPAdes to assemble the data. SPAdes is a Swiss Army knife of genome assembly tools, and by default includes read correction. This takes up lots of RAM, so we are going to skip it. We will also only use 3 k-mer sizes to save...
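A minimal sketch of how such a run could be assembled from Python is shown here. The file names, output directory and the three k-mer sizes are placeholders chosen for illustration (the text above does not say which values were used), and paired-end input is assumed. The `--only-assembler` and `-k` options are the SPAdes flags for skipping read correction and setting k-mer sizes.

```python
def spades_command(reads_1, reads_2, out_dir, kmers=(21, 33, 55),
                   skip_correction=True):
    """Build the argument list for a SPAdes run without executing it.

    kmers defaults to three placeholder values; substitute your own.
    """
    cmd = ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]
    if skip_correction:
        # Skip the RAM-hungry read correction step and only assemble.
        cmd.append("--only-assembler")
    # SPAdes expects k-mer sizes as a comma-separated list of odd numbers.
    cmd += ["-k", ",".join(str(k) for k in kmers)]
    return cmd


# Hypothetical input and output names.
cmd = spades_command("reads_R1.fastq.gz", "reads_R2.fastq.gz", "spades_out")
print(" ".join(cmd))

# To actually run it (requires SPAdes to be installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.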