rHAT is a seed-and-extension-based alignment tool for noisy long reads. It is suitable for aligning third-generation sequencing reads, which have long read lengths and relatively high error rates, especially PacBio's Single Molecule Real-Time (SMRT)...
From a general interaction point of view, 3C probes for an interaction between two specific regions that you identify a priori. In 4C, one fragment of interest is known, and you probe for all the potential fragments interacting with your...
Multi-line FASTA to one-line FASTA converter:
awk '/^>/ {printf("%s%s\n", (NR>1 ? "\n" : ""), $0); next} {printf("%s", $0)} END {printf("\n")}' file.fa
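For readers who prefer Python, the same conversion can be sketched roughly as follows. This is a minimal illustration, not part of the original one-liner: the function name `unwrap_fasta` is hypothetical, and it assumes plain, uncompressed FASTA text where every header line starts with `>`.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto a single line.

    `lines` is any iterable of text lines (for example an open file handle).
    Hypothetical helper for illustration only.
    """
    seq_parts = []  # sequence chunks of the record currently being read
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if it had any sequence.
            if seq_parts:
                yield "".join(seq_parts)
                seq_parts = []
            yield line
        elif line:
            # Skip blank lines, collect everything else as sequence.
            seq_parts.append(line)
    # Flush the sequence of the last record.
    if seq_parts:
        yield "".join(seq_parts)
```

To use it on a file, open `file.fa` in text mode and print each line that `unwrap_fasta` yields.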
Development packages for zlib and libbz2 are needed, as well as a standard compiler environment. On Ubuntu, these can be installed via:
sudo apt-get install build-essential libtool automake zlib1g-dev libbz2-dev pkg-config
On MacOS, the Apple...
Performs alignment-free k-tuple frequency comparisons between sequences. Input can be two files (e.g. a reference and a query) or a single file, in which case pairwise comparisons are made.
d2Tools is a toolbox for counting k-tuple frequencies in sequencing datasets and then calculating the pairwise dissimilarity matrix between samples with the d2-style measures (d2/d2*/d2S, representing d2/d2Star/d2shepp, respectively)...
Funannotate is a genome prediction, annotation, and comparison software package. It was originally written to annotate fungal genomes (small eukaryotes ~ 30 Mb genomes), but has evolved over time to accommodate larger genomes. The impetus for this...
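As a rough illustration of the k-tuple counting and pairwise dissimilarity idea, here is a minimal sketch. It is not the d2Tools implementation: the function names are hypothetical, and the dissimilarity uses one common normalisation of the plain d2 measure, 0.5 * (1 - D2 / (|X| |Y|)), where D2 is the inner product of the two count vectors. The d2* and d2S variants additionally need a background sequence model and are not shown.

```python
from collections import Counter
from math import sqrt


def count_ktuples(seq, k):
    """Count every overlapping k-tuple (k-mer) in a sequence string."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_dissimilarity(x, y):
    """d2-style dissimilarity between two k-tuple count tables.

    Assumed normalisation: 0.5 * (1 - D2 / (|X| * |Y|)), giving 0 for
    proportional count vectors and 0.5 for vectors sharing no k-tuple.
    """
    dot = sum(c * y[w] for w, c in x.items())  # Counter returns 0 if w is absent
    norm_x = sqrt(sum(c * c for c in x.values()))
    norm_y = sqrt(sum(c * c for c in y.values()))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("each sequence must contain at least one k-tuple")
    return 0.5 * (1 - dot / (norm_x * norm_y))


def pairwise_matrix(seqs, k):
    """Return the symmetric dissimilarity matrix for a list of sequences."""
    counts = [count_ktuples(s, k) for s in seqs]
    n = len(counts)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = d2_dissimilarity(counts[i], counts[j])
    return matrix
```

Passing a list of sequences and a value of k to `pairwise_matrix` gives a square matrix that could be fed to a clustering or ordination step.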
Rebaler is a program for conducting reference-based assemblies using long reads. It relies mainly on minimap2 for alignment and Racon for making consensus sequences.
I made Rebaler for bacterial genomes (specifically for the...
A Hilbert curve is a type of space-filling curve that folds a one-dimensional axis into a two-dimensional space while still preserving locality. It has advantages for visualizing data with a long axis in the following two aspects:
greatly improve...
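To make the folding concrete, the sketch below maps a position on the one-dimensional axis to a cell of a square grid using the widely used iterative Hilbert index-to-coordinate conversion. The function name `hilbert_d2xy` and its parameters are chosen here for illustration and do not come from any particular package; the grid side is assumed to be a power of two.

```python
def hilbert_d2xy(side, d):
    """Map index d (0 <= d < side * side) on a Hilbert curve to grid (x, y).

    `side` is the grid width and must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        # Move into the quadrant selected by (rx, ry).
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

Consecutive indices always land in neighbouring cells, which is the locality property: values that are close on the long axis stay close in the two-dimensional picture.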