BOL: Related items

UniqueKmer: Generate unique KMERs for every contig in a FASTA file

Abhi — Fri, 17 Dec 2021 00:08:15 -0600

Generate unique k-mers for every contig in a FASTA file.

A unique k-mer is a k-mer key (e.g. ATCGATCCTTAAGG) that is present in only one contig and absent from every other contig, on both the forward and reverse strands.

This tool takes a FASTA file containing many contigs as input and extracts the unique k-mers for each contig.
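The definition above can be illustrated with a minimal Python sketch. This is not UniqueKMER's own implementation; the function names are hypothetical, contigs are passed in as a plain dictionary rather than parsed from FASTA, and strand handling is approximated by reducing every k-mer to a canonical form (the lexicographically smaller of the k-mer and its reverse complement).

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Represent a k-mer and its reverse complement by a single key."""
    return min(kmer, reverse_complement(kmer))


def unique_kmers(contigs, k):
    """Map each contig name to the canonical k-mers found in no other contig.

    `contigs` is a dict of {name: sequence}. K-mers containing characters
    other than A, C, G, T (for example N) are skipped.
    """
    owners = defaultdict(set)  # canonical k-mer -> names of contigs containing it
    for name, seq in contigs.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                owners[canonical(kmer)].add(name)

    result = {name: set() for name in contigs}
    for kmer, names in owners.items():
        if len(names) == 1:  # seen in exactly one contig, on either strand
            result[next(iter(names))].add(kmer)
    return result


if __name__ == "__main__":
    demo = {"a": "AAAC", "b": "GTTT", "c": "ACGG"}
    for name, kmers in unique_kmers(demo, 3).items():
        print(name, sorted(kmers))
```

In the demo, contig b is the reverse complement of contig a, so neither has any unique 3-mers, while both 3-mers of contig c are unique. Because keys are stored in canonical form, a reported k-mer may be the reverse complement of the string that literally appears in the contig.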

The resulting unique k-mer file and genome file can be used with fastv (https://github.com/OpenGene/fastv), an ultra-fast tool to identify and visualize microbial sequences from sequencing data.

https://github.com/OpenGene/UniqueKMER

Address of the bookmark: https://github.com/OpenGene/UniqueKMER

R for Microsoft Excel

Jitendra Narayan — Wed, 18 Feb 2015 00:43:27 -0600

If you currently use a spreadsheet like Microsoft Excel for data analysis, you might be interested in taking a look at this tutorial on how to transition from Excel to R by Tony Ojeda. The tutorial explains how to use R functions in place of Excel formulas, including tools like =AVERAGE and =VLOOKUP. For the most part, it uses modern R packages to keep the R code clear and concise.

You'll likely still be using Excel as a data source, though, so you'll also want to check out this guide to importing data from Excel to R from MilanoR.

Reference http://www.r-bloggers.com/an-r-tutorial-for-microsoft-excel-users/

Human Complete Genome

Shruti Paniwala — Wed, 06 Jul 2022 06:42:55 -0500

Telomere-to-telomere consortium

We have sequenced the CHM13hTERT human cell line with a number of technologies. Human genomic DNA was extracted from the cultured cell line. As the DNA is native, modified bases will be preserved. The data includes 30x PacBio HiFi, 120x coverage of Oxford Nanopore, 70x PacBio CLR, 50x 10X Genomics, as well as BioNano DLS and Arima Genomics HiC. Most raw data is available from this site, with the exception of the PacBio data which was generated by the University of Washington/PacBio and is available from NCBI SRA.

A UCSC browser is available for v2.0 (as well as legacy v1.0 and v1.1 versions). An interactive dotplot visualization of all genomic repeats is also available from resgen.io. Known issues identified in the assembly are tracked at CHM13 issues.

MORE at https://github.com/marbl/CHM13

Address of the bookmark: https://www.science.org/doi/10.1126/science.abj6987

A guide for complete R beginners :- R Syntax

Archana Malhotra — Fri, 20 Feb 2015 23:41:03 -0600

R is a function-based language: the inputs to a function, including options, go in brackets. Note that all data and options are separated by commas.

Function(data, options)

Even quit is a function

So is help

help(read.table)

Provides the help page for the FUNCTION ‘read.table’

help.search("t test")

Searches for help pages that might relate to the phrase ‘t test’

NOTE: quotes are needed for search strings; they are not needed when referring to data objects or function names.

There is a shortcut for help:

? shows the help page on a function name, same as help(function)

?read.table

?? searches for help pages on functions, same as help.search("phrase")

??"t test"

Information is usually returned from a function; by default it is printed to the screen.

read.table('data.tsv')

The result can always be stored; we call what it is stored in an 'object'.

mydata <- read.table('data.tsv')

Here mydata is an object of type data frame.

Reminder:

Vector: a list of numbers, equivalent to a column in a table
Data Frame: a collection of vectors, equivalent to a table

Hint:

Up/Down arrow keys can be used to cycle through previous commands

Genome Context Viewer (GCV)

LEGE — Sun, 21 May 2023 19:33:43 -0500

The Genome Context Viewer (GCV) is a web-app that visualizes genomic context data provided by third party services. Specifically, it uses functional annotations as a unit of search and comparison. By adopting a common set of annotations, data-store operators can deploy federated instances of GCV, allowing users to compare genomes from different providers in a single interface.

Address of the bookmark: https://github.com/legumeinfo/gcv

Asst. Professor at Central University of Jharkhand (CUJ)

Sun, 01 Mar 2015 01:17:52 -0600

Central University of Jharkhand (CUJ) has issued a recruitment notification for Assistant Professor posts: Central University of Jharkhand (CUJ) Recruitment 2015, Advt. No.: CUJ/Advt./14-15/15, dated 26th Feb. 2015. Candidates who have completed an M.Sc. or Ph.D. can apply.

Central University of Jharkhand has been granted funds by the Department of Biotechnology (DBT), Govt. of India to establish “DBT-Boost to CUJ Interdisciplinary Life Sciences Departments for Education and Research”. Applications are invited for the post of Assistant Professor on a purely temporary basis. Appointments will initially be for a period of one year, renewable every year subject to satisfactory performance, until the end of the project.

Position: ASSISTANT PROFESSOR (Total 03)
Salary: 45,000/- (fixed) per month
Essential Qualifications:
i. Good academic record with at least 55% marks (or an equivalent grade in a point scale wherever grading system is followed) at the master’s degree level with specialization in Biodiversity and Systematic/ Systems Biology/ Biophysics/ Bioinformatics from an Indian University, or an equivalent degree from an accredited foreign university.
ii. Besides fulfilling the above qualifications, the candidates must have cleared the National Eligibility Test (NET) conducted by the UGC, CSIR or similar test accredited by the UGC like SLET/SET.
iii. Notwithstanding anything contained in i. and ii., candidates who are or have been awarded a Ph.D. degree in accordance with the University Grants Commission (Minimum Standards and Procedure for Award of Ph.D. Degree) Regulation, 2009, shall be exempted from the requirement of the minimum eligibility condition of NET/SLET/SET for recruitment and appointment of Assistant Professor.
iv. NET/SLET/SET shall also not be required for such disciplines for which NET/SLET/SET is not conducted.
Desirable: Preference will be given to candidates having Ph.D in any of the above mentioned areas with NET

IMPORTANT DATES TO REMEMBER :

Last date to apply for this job: 24/3/2015

REFERENCE:

Central University of Jharkhand (CUJ) Recruitment 2015 – Advt. No.: CUJ/Advt./14-15/15 Date: 26th Feb. 2015.

More at http://cuj.ac.in/careers.php

CGView.js is a Circular Genome Viewing tool

LEGE — Wed, 27 Mar 2024 11:16:24 -0500

CGView.js is a Circular Genome Viewing tool for visualizing and interacting with small genomes. This software is an adaptation of the Java program CGView.

CGView.js is the genome viewer of Proksee, an expert system for genome assembly, annotation and visualization.

Features

Circular and linear views of genomes
Capable of drawing genomes up to 10 Mbp with thousands of features and hundreds of contigs
Smooth zooming down to the sequence level
Easily generate features and plots directly from the sequence (e.g. ORFs, GC-content and GC-Skew)
Save high resolution PNG maps up to 8000x8000px
Fully documented API for interacting with CGView.js maps

Address of the bookmark: https://js.cgview.ca/

RA Bioinformatics at Ch. Charan Singh University, Meerut

Wed, 11 Mar 2015 09:07:07 -0500

Ch. Charan Singh University, Meerut

http://molbiolabccsumrt.webs.com/

Applications are invited for one post of RA in a DBT funded research project “Creation of Bioinformatics Infrastructure Facility (BIF) for the promotion of Biology Teaching through Bioinformatics (BTBI) Scheme of BTISnet”.

Candidates should have a Ph.D. degree in Bioinformatics/Biotechnology/Genetics and Plant Breeding with adequate experience in the area of Bioinformatics. If a suitable candidate for the post of RA is not available, a JRF/SRF may be appointed.

Candidates for the post of JRF/SRF should have a Master’s degree in a relevant subject with adequate experience in the area of Bioinformatics and should be NET/DBT-BINC qualified.

Interested candidates may send their bio-data to Prof. H. S. Balyan (hsbalyan@gmail.com) (in exceptional case, bio-data may also be submitted at the time of interview) and attend the interview on Monday, March 30, 2015 at 11:00 AM in the Department of Genetics & Plant Breeding, Ch. Charan Singh University, Meerut. Candidates shall bring their original documents at the time of interview for verification. No interview letters will be issued and no TA/DA will be paid.

Step-by-Step Guide to Running Genome Assembly

Abhi — Fri, 13 Dec 2024 11:35:55 -0600

Genome assembly is a critical process in bioinformatics, enabling the reconstruction of an organism's genome from short DNA sequence reads. Whether you’re working on a new microbial genome or a complex eukaryotic organism, this guide will walk you through the steps of genome assembly using state-of-the-art tools and best practices.

What is Genome Assembly?

Genome assembly involves piecing together short DNA sequence reads generated by sequencing platforms (e.g., Illumina, PacBio, Oxford Nanopore) into longer, contiguous sequences called contigs. This can be performed as:

De Novo Assembly: Without a reference genome.
Reference-Guided Assembly: Using a reference genome to guide the assembly process.

Step 1: Preparing Your Data

Before starting the assembly, ensure that your raw sequencing data is high quality.

Input Data
- Short Reads: Illumina sequencing generates short, accurate reads ideal for scaffolding.
- Long Reads: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.

Quality Control (QC)
Use tools like FastQC or MultiQC to assess the quality of your reads:

fastqc reads.fastq
multiqc .

Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.

Read Trimming and Filtering
Trim low-quality bases and adapters using Trimmomatic or Cutadapt. In paired-end mode Trimmomatic writes four output files, a paired and an unpaired file for each input:

trimmomatic PE reads_R1.fastq reads_R2.fastq \
  trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq \
  ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36
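To make the SLIDINGWINDOW:4:20 and MINLEN:36 settings more concrete, here is a simplified Python sketch of sliding-window quality trimming. It is an illustration under stated assumptions, not Trimmomatic's actual code: the function names are hypothetical, qualities are assumed to be Phred+33 encoded, and the read is simply cut at the start of the first window whose mean quality falls below the threshold.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def sliding_window_cut(scores, window=4, min_mean=20):
    """Return the number of leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    mean quality is below `min_mean`. Reads shorter than the window are
    returned unchanged in this simplified version.
    """
    for start in range(len(scores) - window + 1):
        chunk = scores[start:start + window]
        if sum(chunk) / window < min_mean:
            return start
    return len(scores)


def trim_read(sequence, quality_string, window=4, min_mean=20, min_len=36):
    """Trim one read; return None if the result is shorter than `min_len`."""
    keep = sliding_window_cut(phred_scores(quality_string), window, min_mean)
    if keep < min_len:
        return None  # the read would be dropped, as with MINLEN
    return sequence[:keep], quality_string[:keep]


if __name__ == "__main__":
    # Five bases at Q30 ('?') followed by three at Q10 ('+').
    print(trim_read("ACGTACGT", "?????+++", min_len=3))
```

With the default min_len of 36 the same short demo read would be discarded, which mirrors how a minimum-length filter removes reads that become too short after trimming.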

Step 2: Choosing an Assembly Strategy

Select an assembly strategy based on your data type:

Short-Read Assemblers:
- SPAdes: Popular for microbial genomes.
- Velvet: Fast for smaller genomes.
Long-Read Assemblers:
- Canu: Ideal for long-read datasets.
- Flye: Versatile for small and large genomes.
Hybrid Assemblers:
- MaSuRCA: Combines short and long reads.
- Unicycler: Optimized for bacterial genomes.

Step 3: Running the Assembly

3.1. SPAdes (Short-Read Assembly)

SPAdes is an excellent choice for small genomes, such as bacteria.

spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output

The output includes assembled contigs (contigs.fasta) and scaffolds (scaffolds.fasta).

3.2. Canu (Long-Read Assembly)

Canu is designed for high-error long reads from PacBio or Nanopore.

canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq

The output will be in canu_output/genome.contigs.fasta.

3.3. Hybrid Assembly with Unicycler

Unicycler combines short and long reads for improved assemblies.

unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output

Step 4: Assessing Assembly Quality

After assembly, evaluate its quality using the following tools:

QUAST
QUAST generates assembly statistics, such as N50, genome size, and GC content:

quast contigs.fasta -o quast_output

BUSCO
BUSCO checks genome completeness by identifying conserved genes. Choose the lineage dataset (-l) that matches your organism, for example fungi_odb10 for fungi:

busco -i contigs.fasta -o busco_output -l fungi_odb10 -m genome

Assembly Graph Visualization
Visualize assembly graphs with Bandage:

Bandage load assembly_graph.gfa
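Two of the statistics QUAST reports, N50 and GC content, are easy to compute by hand, which helps when interpreting the report. The sketch below is a minimal Python illustration with hypothetical function names; it works on contig lengths and sequences already loaded in memory and is not how QUAST itself is implemented.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length of the contig at which the running total, taken over
    contigs sorted from longest to shortest, first reaches half of the
    total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def gc_content(sequences):
    """Return the percentage of G and C bases across all sequences."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in sequences)
    return 100.0 * gc / total


if __name__ == "__main__":
    print(n50([100, 80, 50, 30, 20]))    # total 280, half 140 -> 80
    print(gc_content(["GGCC", "AATT"]))  # 4 of 8 bases are G or C -> 50.0
```

A higher N50 means that more of the assembly sits in long contigs, but it says nothing about correctness, which is why it is usually read alongside completeness checks such as BUSCO.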

Step 5: Post-Assembly Steps

Polishing
Improve assembly accuracy using tools like Pilon (for short reads) or Racon (for long reads). Racon takes the reads, their alignments to the contigs (for example a SAM file produced by a read mapper such as minimap2), and the contigs themselves:

racon long_reads.fasta mapped_reads.sam contigs.fasta > polished_contigs.fasta

Scaffolding
Link contigs into scaffolds using tools like SSPACE or OPERA-LG if required.

Annotation
Annotate the assembled genome using Prokka for prokaryotes or MAKER for eukaryotes.

prokka --outdir annotation_output --prefix genome contigs.fasta

Step 6: Sharing and Archiving

Submit to Public Repositories
Share your assembly in databases like NCBI GenBank, ENA, or DDBJ.
Metadata Preparation
Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.

Best Practices

Always perform quality checks at each stage to ensure data integrity.
Use multiple tools to cross-validate results when working with complex genomes.
Document parameters and software versions for reproducibility.
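
One possible way to follow the last practice is to write a small run record next to each assembly. The Python sketch below is only an illustration: the function names and the record layout are hypothetical, and it assumes each tool prints its version when called with the command you supply (check each tool's own documentation for the right flag).

```python
import json
import platform
import shutil
import subprocess


def tool_version(command):
    """Return the first line printed by a version command, or None.

    `command` is a list such as ["sometool", "--version"]; the flag itself
    is an assumption the caller must verify. Returns None when the tool is
    not on PATH or prints nothing.
    """
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


def build_run_record(parameters, version_commands):
    """Collect run parameters and tool versions into one dictionary."""
    return {
        "python": platform.python_version(),
        "parameters": parameters,
        "tool_versions": {
            name: tool_version(cmd) for name, cmd in version_commands.items()
        },
    }


if __name__ == "__main__":
    record = build_run_record(
        {"assembler": "spades", "threads": 8},   # example parameters only
        {"spades": ["spades.py", "--version"]},  # assumed version command
    )
    print(json.dumps(record, indent=2))
```

Saving the resulting JSON alongside the output directory keeps the parameters and versions together with the results they produced.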

Conclusion

Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism’s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you’re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.

Coding Ground

Jitendra Narayan — Tue, 17 Mar 2015 00:47:20 -0500

Online coding ground for most programming languages.

Code in almost all popular languages using Coding Ground. Edit, compile, execute and share your projects, 100% cloud.

http://www.tutorialspoint.com/codingground.htm

Address of the bookmark: http://www.tutorialspoint.com/codingground.htm