BOL: Related items

SLURM

Jit — Wed, 04 May 2016 05:13:21 -0500

SLURM is a free, open-source workload manager designed specifically to satisfy the demanding needs of high performance computing.

This page is a HOWTO guide for setting up a SLURM installation, currently focused on a CentOS 7 Linux OS. Please send feedback to Ole.H.Nielsen /at/ fysik.dtu.dk.

See the SLURM homepage (also https://computing.llnl.gov/linux/slurm/).

Address of the bookmark: https://wiki.fysik.dtu.dk/niflheim/SLURM

Stay Connected and Productive: Unlock the Power of Screen, Tmux, and Mosh for Bioinformatics

BioStar — Wed, 22 Jan 2025 00:29:52 -0600

If you are a bioinformatician, chances are you have spent hours running long, complex analyses on remote servers only to lose your session because of an unstable connection. Frustrating, isn't it? Fear not! With tools like screen, tmux, and mosh, you can safeguard your workflow and stay productive, no matter where you are.

Why Remote Session Management is a Must-Have

In bioinformatics, tasks like genome assembly, RNA-seq analyses, and phylogenetic computations often take hours or days. A dropped SSH connection can result in:

Lost Progress: Restarting a job from scratch wastes valuable time.
Workflow Interruptions: Disruptions can derail your focus and productivity.
Corrupted Data: Interrupted processes may lead to incomplete or corrupted outputs.

By integrating screen, tmux, or mosh into your workflow, you can avoid these setbacks and ensure a seamless experience.

Screen: The Classic Workhorse

Screen is a terminal multiplexer that comes pre-installed on most Linux systems. It allows you to manage multiple terminal sessions and reconnect to them even after being disconnected.

Getting Started with Screen:

Start a Session:

screen
Detach from a Session:
Press Ctrl+A, then D.
Reattach to a Session:

screen -r

Pro Tip: Enhance your screen experience with a customized .screenrc configuration file.
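
For long jobs it is often handier to start the command directly in a detached, named session than to open screen first. The following is a minimal sketch, assuming screen is installed; start_in_screen and the DRY_RUN switch are hypothetical names introduced here for illustration, not part of screen.

```shell
#!/bin/sh
# Hypothetical helper (not part of screen itself): run a command inside a
# detached, named screen session so it keeps running after you log out.
# With DRY_RUN=1 the screen command is printed instead of executed.
start_in_screen() {
    session=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "screen -dmS $session $*"
    else
        # -d -m starts the session detached; -S gives it a name
        screen -dmS "$session" "$@"
    fi
}
```

For example, start_in_screen assembly ./run_assembly.sh (the script name is a placeholder) launches the job, screen -ls lists the running sessions, and screen -r assembly reattaches to it.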

Tmux: A Modern Alternative

Tmux takes everything great about screen and adds modern features, including better key bindings and intuitive session management. It's perfect for bioinformaticians who want more control over their workflow.

Getting Started with Tmux:

Start a Session:

tmux
Detach from a Session:
Press Ctrl+B, then D.
Reattach to a Session:

tmux attach

Customize Your Tmux Experience:
Use a .tmux.conf file to personalize your setup.
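
As a starting point, a small configuration might look like the sketch below. The three settings are ordinary tmux options chosen here as an assumption about what helps with long-running jobs; write_tmux_conf is a hypothetical helper that refuses to overwrite an existing file.

```shell
#!/bin/sh
# Hypothetical helper: write a small starter tmux configuration to the
# given path, without overwriting a file that is already there.
write_tmux_conf() {
    target=$1
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    cat > "$target" <<'EOF'
# Keep more scrollback for long-running job output
set -g history-limit 50000
# Number windows from 1 instead of 0
set -g base-index 1
# Enable mouse support (tmux 2.1 or newer)
set -g mouse on
EOF
}
```

Calling write_tmux_conf "$HOME/.tmux.conf" creates the file; a running tmux server picks it up after tmux source-file ~/.tmux.conf.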

Mosh: The Mobile Shell for Unreliable Connections

SSH works well for stable networks, but it struggles in areas with spotty connectivity. Enter Mosh, the Mobile Shell. Designed for intermittent networks, Mosh keeps your session alive even when the connection drops temporarily.

Why Mosh is a Game-Changer:

Responsive typing over high-latency networks, thanks to local echo.
Automatically reconnects when the network is restored.
Ideal for working on the go, from cafes to trains.

Getting Started with Mosh:

Install Mosh:

sudo apt install mosh # For Debian/Ubuntu
Connect to a Server:

mosh username@server

Learn more at mosh.org.
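
Mosh keeps the connection alive, but it does not keep a job running if the server side ends, so it pairs well with tmux. The sketch below is one way to combine them, assuming tmux is installed on the server and mosh on both ends; remote_session_cmd is a hypothetical helper that only prints the command to run.

```shell
#!/bin/sh
# Hypothetical helper: print the command that opens (or reattaches to) a
# named tmux session on a remote host, using mosh when it is installed
# locally and falling back to ssh otherwise.
remote_session_cmd() {
    host=$1
    session=${2:-main}
    if command -v mosh >/dev/null 2>&1; then
        # Everything after "--" is run on the server by mosh
        echo "mosh $host -- tmux new-session -A -s $session"
    else
        # -t allocates a terminal, which tmux needs
        echo "ssh -t $host tmux new-session -A -s $session"
    fi
}

remote_session_cmd username@server analysis
```

The -A flag makes tmux attach to the session if it already exists and create it otherwise, so the same command works for the first connection and for every reconnect.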

Why This Matters for Bioinformatics

Every bioinformatician knows the value of time and data integrity. Tools like screen, tmux, and mosh provide a lifeline when running long analyses, enabling you to:

Safeguard your work against disconnections.
Easily manage multiple workflows in parallel.
Stay productive, even in challenging environments.

Quickstart Cheat Sheet

Screen:

screen # Start a session
Ctrl+A, D # Detach
screen -r # Reattach

Tmux:

tmux # Start a session
Ctrl+B, D # Detach
tmux attach # Reattach

Mosh:

mosh username@server

Final Thoughts

As a bioinformatician, your time is too valuable to spend restarting analyses due to technical hiccups. With screen, tmux, and mosh in your toolkit, you can work smarter, protect your progress, and stay productive no matter where you are. Start using these tools today and transform the way you work with remote systems.

Let me know how these tools work for you, and don't forget to follow for more bioinformatics tips!

clusterProfiler

Jit — Thu, 16 Jun 2016 18:57:03 -0500

Statistical analysis and visualization of functional profiles for genes and gene clusters

Bioconductor version: Release (3.3)

This package implements methods to analyze and visualize functional profiles (GO and KEGG) of genes and gene clusters.

Author: Guangchuang Yu with contributions from Li-Gen Wang and Giovanni Dall'Olio.

Maintainer: Guangchuang Yu

Citation (from within R, enter citation("clusterProfiler")):

Yu G, Wang L, Han Y and He Q (2012). “clusterProfiler: an R package for comparing biological themes among gene clusters.” OMICS: A Journal of Integrative Biology, 16(5), pp. 284-287.

Installation

To install this package, start R and enter:

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("clusterProfiler")

Address of the bookmark: https://www.bioconductor.org/packages/devel/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html

Bioinformatics opening at ICGEB NEW DELHI

Thu, 02 Mar 2017 04:16:36 -0600

ICGEB NEW DELHI

Applications are invited for:

A Junior Research Fellow position, in a DBT-funded project, is available in the Translational Health Group, ICGEB, New Delhi

Qualifications:

Education: M.Sc. (preferably in Biotechnology, Life Sciences, Zoology, Chemistry, or Bioinformatics). Candidates with hands-on experience in GC-MS data acquisition and analysis will be given preference. Bioinformatics expertise is required.

Fellowship: As per DBT guidelines.

Tenure: The position is purely on a temporary basis with an initial tenure of six months and, based on satisfactory performance, may continue until the completion of the project.

Closing date for applications: 04/03/2017

Please send a "TWO PAGE" CV by email to: th.icgeb@gmail.com on or before the last date.

A Research Associate position, in a DBT-funded project, is available in the Translational Health Group, ICGEB, New Delhi

Qualifications:

Education: Ph.D. (in Biology, Biotechnology, Chemistry, or Bioinformatics). Candidates with hands-on experience in GC-MS data acquisition and analysis will be given preference.

Fellowship: As per DBT guidelines.

Tenure: The position is purely on a temporary basis with an initial tenure of six months and, based on satisfactory performance, may continue until the completion of the project.

Closing date for applications: 04/03/2017

Please send a "TWO PAGE" CV by email to: th.icgeb@gmail.com on or before the last date.

A Brief Bioinformatics Tutorial

Jit — Wed, 21 May 2014 12:50:09 -0500

This is about how to use a computer to find what is known about a gene of interest and also how to get new insights about it.

The tutorial is divided into three main parts:

In the Sequence part, you will see how to look efficiently for a particular protein sequence, how to blast it against the database of your choice to find homologues, how to perform a multiple alignment of the homologues you've selected and how to edit this alignment.
The Structure part is about molecular visualization, homology modeling and structural domain prediction.
In the Function part, you will be introduced to 3 useful servers for investigating the function of a protein, e.g. finding interactors and co-expressed genes, viewing a phylogenetic profile, and easily accessing papers citing your gene.

Throughout all three parts, we will use the S. cerevisiae VPS36 protein as an example.

Address of the bookmark: http://www.mrc-lmb.cam.ac.uk/rlw/text/bioinfo_tuto/introduction.html

A guide for complete R beginners :- Getting data into R

Archana Malhotra — Tue, 24 Feb 2015 20:15:08 -0600

For a beginner this can be the hardest part; it is also the most important to get right.

It is possible to create a vector by typing data directly into R using the combine function ‘c’

x <- c(1, 2, 3, 4, 5)

same as

x <- 1:5

creates the vector x with the numbers from 1 to 5.

You can see what is in an object at any time by typing its name;

x

will produce the output ‘[1] 1 2 3 4 5’

Note that character strings, such as names, need to be quoted

daysofweek <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday')

Usually, however, you want to input from a file. We have touched on the ‘read.table’ function already.

mydata <- read.table('filename')  # 'filename' is a placeholder for the path to your file

Now mydata is a data frame with multiple vectors

Each vector can be identified by the default syntax

# if any of these are typed it will print to screen

mydata$V1
mydata$V2
mydata$V3

By default the function assumes certain things from the file

The file is a plain text file (there are functions to read Excel files; they are not covered here)
columns are separated by any number of tabs or spaces
there is the same number of data points in each column
there is no header row (labels for the columns)
there is no column with names for the rows (explained below).

If any of these are false, we need to tell that to the function

If it has a header row

mydata <- read.table('filename', header=TRUE)  # header=T also works

Note that there is a comma between the different parts of the function's arguments

If there is one less column in the header row, then R assumes that the first column of data after the header contains the row names

Now the vectors (columns) are identified by their name

# if any of these are typed it will print to screen

mydata$A
mydata$B
mydata$C

# Summary about the whole data frame

summary(mydata)

# Summary information of column A

summary(mydata$A)

We can shortcut having to type the data frame each time by attaching it

attach(mydata)

# summary of column B as ‘mydata’ is attached

summary(B)

Two other important options for read.table

If it is separated only by tabs and has a header

mydata <- read.table('filename', header=TRUE, sep='\t')

Really useful if you have spaces in the contents of some columns, so R does not mess up reading the columns. However, if the columns are of uneven length, R will tell you.

If you know that the file has uneven columns

mydata <- read.table('filename', header=TRUE, sep='\t', fill=TRUE)

This causes R to fill empty spaces in a column with ‘NA’.

The last two examples will still work with our file and give the same result as with only header=T

Graphs

To get an idea of what R is capable of, type

demo(graphics)

This steps through the examples, and the code is printed to the screen.

We will work with simpler examples that have immediate use to biologists.

Remember, to get more information about the options to a function, type ‘?function’

Histogram of A

hist(mydata$A)

If there was more data we could increase the number of vertical columns with the option, breaks=50 (or another relevant number).

boxplot(mydata)

We can get rid of the need to type the data frame each time by using the attach function

# if not already done so

attach(mydata)
boxplot(mydata$A, mydata$B, names=c("Value A", "Value B"), ylab="Count of Something")

same as

boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Scatter plot

# if not already done so

attach(mydata)
plot(A,B) # or plot(mydata$A, mydata$B)

SAVING an image

Windows users (Rgui): right-click on the image and select the format you want.

The following instructions work for everyone.

You need to create a new device of the type of file you need, then send the data to that device.

To save as a png file (easy to load into the likes of PowerPoint, and also great for web applications):

png('filename')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

or to save as a pdf

pdf('filename')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Note

Nothing will appear on screen; the output is going to the file.
Also, it may not be saved immediately, but it will be once the device is closed or you quit R.

To quit R type

q() # If you save your session, next time you start R, you will have your data preloaded.

Or if you want to remain in R

dev.off() # turns off the png (or pdf etc.) device, thus forcing the data to be saved

Bioinformatician at 23andMe

Sat, 06 May 2017 17:57:39 -0500

23andMe’s mission is to help people access, understand, and benefit
from the human genome. We are a group of passionate individuals excited
to push the boundaries of what’s possible to help turn genetic insight
into better health and personal understanding.

Our Research Team prides itself on driving cutting edge, industrial-scale
science to make an impact that belies the team’s size, in an environment
and culture that fosters creativity, innovation, collaboration, and fun.

More than 80% of our customers consent to participate in research, and as
a result of their participation, we have one of the largest recontactable,
genotyped, and phenotyped research cohorts in the world. The scope and
breadth of our vision means that most of the methods and tools necessary
to unlock the potential of this unique resource for discovery have yet
to be developed.

Our science has garnered the respect of many members of the
broader scientific community. For a list of our publications, see
www.23andme.com/publications/for-scientists/.

Join us! Visit our Careers page (www.23andMe.com/careers) to learn more
about these open positions:

• Scientist, Research Communications
• Bioinformaticist
• Computational Biologist, Ancestry R&D
• Scientist/Senior Scientist, Statistical Genetics
• Scientist/Senior Scientist, Survey Methodology
• Scientist/Senior Scientist, Health R&D
• Senior Computational Biologist
• Biostatistician

pfontanillas@23andme.com

ICGEB Bioinformatics Job

Sat, 23 Jan 2021 21:01:55 -0600

The following vacancies are available in various ongoing bioinformatics projects at the Translational Bioinformatics Group (https://www.icgeb.org/dinesh-gupta/), ICGEB, New Delhi, India. Shortlisted candidates will be invited for an online interview at ICGEB. Only the chosen applicants will be informed individually. Preference will be given to applicants with experience in bioinformatics and computational work.

Interested applicants must submit their complete, updated Curriculum Vitae, mentioning details of two references as well as various other details at – http://14.139.62.220/survey/index.php/2021/01/21/icgeb-dbt-project-vacancy/

The last date of submission of applications is January 31st, 2021.

Research Associate: Ph.D. degree in Computational Biology/Bioinformatics.

Consolidated Salary: 58280/- pm (including HRA).

More at https://www.icgeb.org/project-positions-translational-bioinformatics-group/ and https://www.icgeb.org/category/vacancies/

Bioinformatics Scientist, Production Bioinformatics @ South San Francisco, CA

Thu, 19 Aug 2021 08:45:24 -0500

Twist is looking for a Bioinformatics Scientist to join our Production Bioinformatics Team. You will work alongside research scientists, software engineers and data scientists to further deliver on our mission to expand access to best-in-class synthetic biology and next-generation sequencing applications. You will be developing and engineering tools to better evaluate and build hardened, production-quality pipelines, optimize data quality, and automate lab and bioinformatics processes. Our ideal candidate is an organized problem solver with a background in developing and building novel production-quality bioinformatics tools and packages. Equally, excellent communication skills and a proven ability to work independently are required.

More at https://boards.greenhouse.io/twistbioscience/jobs/3135495?gh_src=9ecc0b941us

A Beginner's Guide to Using Kraken for Taxonomic Classification

Neel — Fri, 13 Dec 2024 11:29:03 -0600

Kraken is a popular bioinformatics tool designed for fast and accurate taxonomic classification of metagenomic sequences. Its efficiency and precision make it a go-to resource for analyzing microbial communities, including bacteria, viruses, archaea, and fungi. Whether you're new to bioinformatics or experienced in the field, Kraken is an indispensable tool for taxonomic analysis.

In this blog, we’ll walk through the basics of Kraken, from installation to running an analysis, and highlight its key features and applications.

What is Kraken?

Kraken is a sequence classification tool that assigns taxonomic labels to DNA sequences using exact k-mer matching. It uses a reference database of genomes, dividing sequences into k-mers and identifying matches in a computationally efficient way.

Key Features of Kraken

Speed: Kraken processes data much faster than alignment-based methods.
Accuracy: It uses a precise k-mer matching algorithm for high-resolution taxonomic assignments.
Scalability: It can handle large metagenomic datasets.
Custom Databases: You can build and use custom databases tailored to your research needs.

Installing Kraken

System Requirements
- A Unix-based operating system (Linux/macOS).
- Sufficient computational resources for database building (RAM and disk space).
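
Installation Steps
- Clone the Kraken repository from GitHub:
  
  git clone https://github.com/DerrickWood/kraken.git
  cd kraken
- Build and install the Kraken programs with the bundled install script, giving it the directory where they should be installed:
  
  ./install_kraken.sh /path/to/kraken
- Add Kraken to your PATH for easy access:
  
  export PATH=$PATH:/path/to/kraken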
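
If the export line goes into a startup file such as ~/.bashrc, PATH gains a duplicate entry each time the file is sourced. The sketch below is one way to avoid that; add_to_path is a hypothetical helper name, and /path/to/kraken is the same placeholder directory used above.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PATH only if it is not
# already listed, so repeated sourcing does not grow PATH.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: nothing to do
        *) PATH="$PATH:$1" ;;     # otherwise append it
    esac
    export PATH
}

add_to_path /path/to/kraken   # placeholder install directory

# Report whether the kraken program is now reachable
command -v kraken >/dev/null 2>&1 || echo "kraken not found on PATH yet"
```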

Preparing a Database

Kraken requires a database of reference genomes. You can use a pre-built database or create a custom one.

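Downloading a Pre-built Database
Kraken offers pre-built databases, such as the MiniKraken database, which is lightweight and suitable for smaller datasets. MiniKraken is distributed as a compressed archive on the Kraken website rather than through kraken-build, whose --download-library option fetches reference genome libraries such as bacteria or viruses. Download the archive, extract it, and point --db at the extracted directory:

tar -xzf minikraken.tgz   # archive name is a placeholder; use the file you downloaded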

Building a Custom Database
To include specific genomes, first download the NCBI taxonomy, then download a genome library, and finally build the database:

kraken-build --download-taxonomy --db my_database
kraken-build --download-library bacteria --db my_database
kraken-build --build --threads 4 --db my_database

This process may take considerable time and resources, depending on the size of the database.
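
Because a build can run for a long time, it can help to confirm that it finished before classifying anything. The following is a minimal sketch, assuming the Kraken 1 database layout in which a completed database directory holds database.kdb, database.idx, taxonomy/nodes.dmp, and taxonomy/names.dmp; check_kraken_db is a hypothetical helper name, not a Kraken command.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm that a Kraken 1 database
# directory contains the four files a finished build leaves behind.
check_kraken_db() {
    db=$1
    missing=0
    for f in database.kdb database.idx taxonomy/nodes.dmp taxonomy/names.dmp; do
        # -s is true only for files that exist and are not empty
        if [ ! -s "$db/$f" ]; then
            echo "missing or empty: $db/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A typical use is check_kraken_db my_database && echo "database looks complete", run before starting a classification job.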

Running Kraken

Once the database is ready, you can classify sequences.

Basic Usage
Use the following command to classify sequences:

kraken --db my_database --threads 4 --fastq-input input_sequences.fastq --output kraken_output.txt

Key options:
- --db: Specifies the database.
- --threads: Number of threads for parallel processing.
- --fastq-input: Indicates that the input is in FASTQ format (FASTA is the default).
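
Interpreting Results
Kraken writes one tab-delimited line per sequence with five fields: a C or U flag (classified or unclassified), the sequence ID, the assigned taxonomy ID (0 if unclassified), the sequence length in bp, and a space-delimited list of taxon:count pairs showing the LCA mapping of each k-mer. The standard output does not include a confidence score.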
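
A quick first look at a result file is the fraction of reads that were classified. The sketch below assumes the standard Kraken 1 tab-delimited output, where the first column is C for classified and U for unclassified; kraken_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count classified (C) and unclassified (U) reads
# in a Kraken output file, using the flag in the first column.
kraken_summary() {
    awk -F '\t' '
        $1 == "C" { c++ }
        $1 == "U" { u++ }
        END {
            total = c + u
            printf "classified\t%d\n", c
            printf "unclassified\t%d\n", u
            # Skip the percentage for an empty file to avoid dividing by zero
            if (total > 0) printf "percent_classified\t%.1f\n", 100 * c / total
        }
    ' "$1"
}
```

Running kraken_summary kraken_output.txt prints three tab-separated lines: the two counts and the percentage of classified reads.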

Visualizing Kraken Results

Kraken results can be visualized using tools like Krona or converted to human-readable reports using kraken-report.

Generate a Report

kraken-report --db my_database kraken_output.txt > kraken_report.txt

Krona Visualization
Install Krona, extract the sequence ID and taxonomy ID columns from the Kraken output, and import them:

cut -f2,3 kraken_output.txt > kraken_krona.txt
ktImportTaxonomy kraken_krona.txt -o krona_output.html

Open the HTML file in your browser to interactively explore the taxonomic classifications.

Advanced Usage

Confidence Thresholds
In Kraken 1, confidence thresholds are applied after classification with kraken-filter and its --threshold option (the --confidence option on the classifier belongs to Kraken 2). Higher values reduce false positives but may miss some true positives:

kraken-filter --db my_database --threshold 0.1 kraken_output.txt > kraken_filtered.txt

Paired-End Reads
For paired-end sequencing data in FASTQ format, use:

kraken --db my_database --fastq-input --paired reads_1.fastq reads_2.fastq

Customizing K-mers
Kraken allows you to set a custom k-mer length during database building with the --kmer-len option of kraken-build (the default is 31).

Applications of Kraken

Microbial Ecology: Characterizing microbial communities in soil, water, and the human microbiome.
Pathogen Detection: Identifying pathogens in clinical samples.
Fungal Research: Analyzing fungal diversity in metagenomic datasets.
Environmental Monitoring: Tracking microbial populations in diverse habitats.

Conclusion

Kraken is a versatile and efficient tool for taxonomic classification in metagenomics. Its speed, accuracy, and flexibility make it a favorite among bioinformaticians. By following this guide, you can set up and use Kraken to unlock insights into microbial and fungal communities, paving the way for discoveries in ecology, medicine, and biotechnology.