BOL: Related items

AfterQC: Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

Jit — Fri, 29 Jun 2018 03:26:03 -0500

AfterQC goes through all FASTQ files in a folder and writes three output folders (good, bad and QC), which contain the good reads, the bad reads and the QC results for each FASTQ file or pair. It currently supports data from HiSeq 2000/2500/3000/4000, NextSeq 500/550, MiniSeq and other Illumina 1.8 or newer formats. The author has reimplemented this tool in C++ with multithreading support to make it much faster. The new tool is called fastp and can be found at https://github.com/OpenGene/fastp. If you prefer a C++ based tool, please use fastp instead.
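To make the good/bad split concrete, here is a minimal Python sketch. It is not AfterQC's actual algorithm: the mean-quality rule, the threshold `min_mean_q` and the function names are assumptions chosen for illustration. The only grounded detail is that Illumina 1.8+ FASTQ files encode quality as Phred+33.

```python
# Illustrative sketch only: this is NOT AfterQC's actual filtering logic.
# Assumption: a read is "good" when its mean Phred quality is >= min_mean_q.

def phred33_scores(quality_line):
    """Decode a Phred+33 quality string (Illumina 1.8+) into integer scores."""
    return [ord(ch) - 33 for ch in quality_line]


def split_reads(fastq_lines, min_mean_q=20):
    """Split 4-line FASTQ records into (good, bad) lists by mean quality."""
    good, bad = [], []
    for i in range(0, len(fastq_lines) - 3, 4):
        record = fastq_lines[i:i + 4]
        scores = phred33_scores(record[3])
        mean_q = sum(scores) / len(scores) if scores else 0
        (good if mean_q >= min_mean_q else bad).append(record)
    return good, bad


# Two made-up reads: 'I' encodes Q40, '#' encodes Q2.
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "####",
]
good, bad = split_reads(example)
print(len(good), len(bad))  # 1 1
```

A real tool would also trim adapters and low-quality ends and handle paired files; the sketch only shows how reads could be routed to a "good" or "bad" bin.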

Address of the bookmark: https://github.com/OpenGene/AfterQC

getopts.pl file

Jit — Fri, 15 Jun 2018 04:43:03 -0500

SSPACE-LongRead complains about a missing getopts.pl file.

To resolve this, download getopts.pl and place it in the SSPACE-LongRead folder.

Cheers :)

SimLoRD: A read simulator for third generation sequencing reads

Aaryan Lokwani — Wed, 22 Aug 2018 10:40:27 -0500

SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model.

Reads are simulated from both strands of a provided or randomly generated reference sequence.

The reference can be read from a FASTA file or randomly generated with a given GC content. It can consist of several chromosomes, whose structure is respected when drawing reads. (Simulation of genome rearrangements may be incorporated at a later stage.)
The read lengths can be determined in four ways: drawing from a log-normal distribution (typical for genomic DNA), sampling from an existing FASTQ file (typical for RNA), sampling from a text file with integers (RNA), or using a fixed length.
Quality values and number of passes depend on fragment length.
Provided subread error probabilities are modified according to the number of passes.
Reads are output in FASTQ format and alignments in SAM format.
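The read-length options listed above can be sketched in a few lines of Python. This is not SimLoRD's code: the function names and the parameter values (`mu`, `sigma`, `min_length`) are hypothetical, and sampling from a FASTQ or integer file is reduced to sampling from a list of lengths already read into memory.

```python
import random

# Illustrative sketch only: not SimLoRD's implementation.
# mu, sigma and min_length are hypothetical values chosen for the demo.

def lognormal_length(rng, mu=8.0, sigma=0.5, min_length=50):
    """Draw one read length from a log-normal distribution."""
    return max(min_length, int(rng.lognormvariate(mu, sigma)))


def sampled_length(rng, observed_lengths):
    """Draw one read length from lengths taken from an existing file."""
    return rng.choice(observed_lengths)


def fixed_length(length):
    """Return the same read length every time."""
    return length


rng = random.Random(42)  # seeded so the demo is repeatable
lengths = [lognormal_length(rng) for _ in range(5)]
print(lengths)
print(sampled_length(rng, [1200, 3400, 560]))
print(fixed_length(1000))
```

Passing an explicit `random.Random` instance keeps the draws reproducible, which helps when simulated reads are used to compare tools.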

Address of the bookmark: https://bitbucket.org/genomeinformatics/simlord/

A Brief Bioinformatics Tutorial

Jit — Wed, 21 May 2014 12:50:09 -0500

This is about how to use a computer to find what is known about a gene of interest and also how to get new insights about it.

The tutorial is divided into three main parts:

In the Sequence part, you will see how to look efficiently for a particular protein sequence, how to blast it against the database of your choice to find homologues, how to perform a multiple alignment of the homologues you've selected and how to edit this alignment.
The Structure part is about molecular visualization, homology modeling and structural domain prediction.
In the Function part, you will be introduced to three useful servers for investigating the function of a protein, e.g. finding interactors and co-expressed genes, viewing a phylogenetic profile and easily accessing papers that cite your gene.

Throughout all three parts, we will use the S. cerevisiae VPS36 protein as an example.

Address of the bookmark: http://www.mrc-lmb.cam.ac.uk/rlw/text/bioinfo_tuto/introduction.html

Machine Learning !!!

Gudiya Pal — Fri, 01 Jul 2016 12:57:12 -0500

In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. These techniques can be used to make highly accurate predictions.

Using a data set about homes, we will create a machine learning model to distinguish homes in New York from homes in San Francisco.

Address of the bookmark: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

An Introduction to Applied Bioinformatics

Jit — Fri, 02 Mar 2018 04:26:38 -0600

IAB is primarily being developed by Greg Caporaso (GitHub/Twitter: @gregcaporaso) in the Caporaso Lab at Northern Arizona University. Information on the courses he teaches is on his teaching website, and information on his research and lab is on his lab website.

Address of the bookmark: http://readiab.org/

Bioinformatics in Africa: Part3 - Mali

BioStar — Sat, 06 Feb 2021 13:28:44 -0600

International Center for Excellence in Research (ICER):

The ICER is a research center composed of the following three programs: 1. The Malaria Research and Training Center Parasitology Group; 2. The Malaria Research and Training Center Entomology Group; 3. The SEREFO Group.

The first two programs conduct biomedical research on malaria, filariasis and leishmaniasis. The third program conducts biomedical research on tuberculosis and HIV.

Bioinformatics was introduced recently to the ICER and is constantly growing. The ICER has one team, headed by Sidy SOUMARE, which supports the three programs in all their informatics and bioinformatics needs. This team can benefit from some computational facilities (4 BLAST servers, 15 other servers and around 200 terminals), but the ICER staff need some training in order to be able to administer these facilities.

Research Interest and Activities: The following are the present areas of research interest: 1. Functional genomics 2. Analysis of microarray data 3. Interaction between the vector and the parasite.

Bioinformatics in Africa: Part7 - Tunisia

BioStar — Sat, 06 Feb 2021 21:25:09 -0600

Institut Pasteur de Tunis (IPT):
The IPT is a research institution founded in 1893. It is under the supervision of the Ministry of Health and is part of the Université El Manar of Tunis (Ministry of Higher Education). The missions of the institute are public health laboratory (PHL) activities, research on infectious diseases, and R&D on vaccines. Research programs are mainly oriented towards local health problems such as leishmaniasis, viral hepatitis and scorpion venoms. The Bioinformatics and Modelling group of the IPT is hosted by the Laboratoire d’Immunopathologie Vaccinologie et Génétique Moléculaire (LIVGM) and has existed since the beginning of 2005. Its present research activities include genome annotation, EST clustering and modelling of the host/parasite response to Leishmania infection. It consists of two senior scientists, two PhD students and one MSc student.

Centre de Biotechnologie de Sfax (CBS):
Bioinformatics activity started at CBS in 2001 with the setting up of a bioinformatics research and service unit. This unit currently includes one senior researcher, one engineer and four PhD students. Activities include sequence annotation (service) and three research programs: ab initio prediction of short eukaryote genes, statistical modelling of signal transduction pathways using a Bayesian network approach, and statistical analysis of human sequence variation data (haplotype reconstruction and linkage disequilibrium). The activities of the Bioinformatics unit can be found at the website http://www.cbs.rnrt.tn/ and the research activity report is available on request from Bioinformatics@cbs.rnrt.tn. Although the computing facilities are good, there is still a need for trained human resources to strengthen bioinformatics capacities at CBS, particularly in structural bioinformatics.

Web site and links: http://www.cbs.rnrt.tn

Basic Structure of Snakemake Pipeline Run !

Abhi — Thu, 14 Oct 2021 07:01:38 -0500

/user/snakemake-demo$ ls

config.json data envs scripts slurm-240702.out Snakefile

data = mock data for the Snakefile to use
Snakefile = name of the Snakemake “formula” file
- Note: By default, snakemake looks for a file named Snakefile in the current working directory. To use a different file, specify it with the -s option:
  - snakemake -s snakefile.py
envs = directory for storing the conda environments that the workflow will use
scripts = directory for storing Python scripts called by the Snakemake formula
config.json = JSON file with extra parameters for the Snakefile to use
cluster.json = JSON file with the specification for running on the HPC
samples.txt = file used later in connection with config.json

Run the snakemake file as a dry run (the example workflow shown above).

This will build a DAG of the jobs to be run without actually executing them.
snakemake --dry-run

You can also dry-run only the rules of interest by naming them as targets:

snakemake --dry-run all
snakemake --dry-run call
snakemake --dry-run bwa
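Conceptually, a dry run resolves which jobs a target needs and in what order, then reports them without executing anything. The Python sketch below illustrates that idea only; it is not how Snakemake is implemented. The rule names are taken from the commands above, but the dependency edges (all needs call, call needs bwa) and the function names are assumptions. It requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical rule dependencies: each rule maps to the rules it needs first.
# The rule names come from the commands above; the edges are assumed.
RULES = {
    "bwa": set(),
    "call": {"bwa"},
    "all": {"call"},
}


def needed_rules(target, rules):
    """Collect the target rule plus everything it depends on."""
    needed, stack = set(), [target]
    while stack:
        rule = stack.pop()
        if rule not in needed:
            needed.add(rule)
            stack.extend(rules[rule])
    return needed


def dry_run(target, rules=RULES):
    """Return the order in which rules would run, without running anything."""
    needed = needed_rules(target, rules)
    subgraph = {rule: rules[rule] & needed for rule in needed}
    return list(TopologicalSorter(subgraph).static_order())


print(dry_run("all"))   # ['bwa', 'call', 'all']
print(dry_run("bwa"))   # ['bwa']
```

Naming a rule as the target shrinks the plan to that rule and its prerequisites, which is why a dry run of bwa lists fewer jobs than a dry run of all under these assumed dependencies.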

Run the snakemake file in order to produce an image of the DAG of jobs to be run.

snakemake --dag | dot -Tsvg > dag.svg

Run the Snakemake workflow (this time not as a dry run).

snakemake --use-conda

Seeing Theory and Learn

LEGE — Tue, 04 Jun 2024 00:31:54 -0500

Seeing Theory was created by Daniel Kunin while an undergraduate at Brown University. The goal of this website is to make statistics more accessible through interactive visualizations (designed using Mike Bostock’s JavaScript library D3.js).

Address of the bookmark: https://seeing-theory.brown.edu/