BOL: Related items

GPS DNA tracking - University of Sheffield

Sat, 10 May 2014 04:33:28 -0500

University of Sheffield geneticist and bioinformatics expert Dr Eran Elhaik demonstrates the power of his new DNA research, which allows people to discover their genetic homeland from 1000 years ago. Find out more about our biological research here http://www.sheffield.ac.uk/aps

Pathway Analysis

Rahul Agarwal — Fri, 03 Oct 2014 08:51:13 -0500

Pathway Analysis is usually performed with aim to enrich the genes with their functional information and reveal the underlying biological mechanisms pursue by genes. Pathway Analysis is not only limited to what biological pathways a particular set of expressed genes follow but also to disclose the relationships between these genes. With availability of more genomics, transcriptomics and proteomics data, interactions between genes involve in multiple pathways become more clear and also relationships between the genes, their transcripts, and their gene products. However, existing tools and dbs mainly based on knowledge driven approach in which pathways will be identified by finding the correlation between the information in one of the pathway knowledge databases (KEGG,Reactome,Panther,BioCarta, Panther,GO,NCI,WikiPathways,etc) and gene expression result for a specific conditions for instance tumor, obesity , cold resistant crops/plants, etc.

Introductory Articles/ppt/sources:

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002375

http://bioinformatics.mdanderson.org/MicroarrayCourse/Lectures09/Pathway%20Analysis.pdf

http://gettinggeneticsdone.blogspot.de/2012/03/pathway-analysis-for-high-throughput.html

http://davetang.org/muse/tag/pathway/

https://www.biostars.org/p/42219/

http://bioinformatics.ca//files/public/Pathways_2014_Module4_v2.pdf

http://bioinformatics.ca//files/public/Pathways_2014_Module2.pdf

Impotant Database and Tools:

GeneMANIA, Cytoscape, IPA and Metacore (Commerical ), Pathway Commons, Reactome ,Panther, BioCyc, WikiPathways, Pathvisio, KEGG, NCI, Stringdb, Amigo, WebGestalt ,ConsensusPathDB ,GSEA,Blast2go

Popular R based tools:

Reactome.db, ReactomePA, ClusterProfiler, Gage, SPIA, topGO, Pathview,DOSE,GOStat

More:

http://www.bioconductor.org/help/search/index.html?q=Enrichment+analysis+

Managing and Analyzing Next-Generation Sequence Data

Rahul Agarwal — Sat, 10 May 2014 06:28:06 -0500

Centralized Bioinformatics Core Facilities provide shared resources for the computational and IT requirements of the investigators in their department or institution. As such, they must be able to effectively react to new types of experimental technology. Recently faced with an unprecedented flood of data generated by the next generation of DNA sequencers, these groups found it necessary to respond quickly and efficiently to the informatics and infrastructure demands. Centralized Facilities newly facing this challenge need to anticipate time and design considerations of necessary components, including infrastructure upgrades, staffing, and tools for data analyses and management ...

More at http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000369

Address of the bookmark: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000369

P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads

BioStar — Fri, 07 Sep 2018 05:19:06 -0500

P_RNA_scaffolder is a novel scaffolding tool using Pair-end RNA-seq to scaffold genome fragments. The method is suitable for most genomes. The program could utilize Illumina Paired-end RNA-sequencing reads from target speciesies. Our method provides another practical alternative to existing mate-pair_based approaches or other Protein-based approaches (for instance, PEP_scaffolder ) for scaffolding genome sequences. The most important feature of this method is to improve the completeness of gene regions and long-coding gene regions (for instance, circRNA).

Address of the bookmark: http://www.fishbrowser.org/software/P_RNA_scaffolder/#

Visiting Scientist - Computational Genomics (two positions)

Mon, 07 Jul 2014 22:53:41 -0500

Scientific/Managerial & International Recruitment

ICRISAT seeks applications from Indian nationals Visiting Scientist-Computational Genomics (2 positions), to be part of a team of Centre of Excellence in Genomics (CEG), (www.icrisat.org/ceg) to work on legume genomics projects. The positions will be based at ICRISAT’s Headquarters in Patancheru, Hyderabad, India.

ICRISAT is a non-profit, non-political organization that conducts agricultural research for development in Asia and sub-Saharan Africa with a wide array of partners throughout the world. Covering 6.5 million square kilometers of land in 55 countries, the semi-arid tropics is home to over 2 billion people, with 650 million of these are the poorest of the poor. ICRISAT and its partners help empower those living in the semi-arid tropics, especially smallholder farmers, to overcome poverty, hunger, malnutrition and a degraded environment through more efficient and profitable agriculture. ICRISAT is headquartered in Greater Hyderabad, Andhra Pradesh, India and belongs to the Consortium of Centers supported by the Consultative Group on International Agricultural Research (CGIAR).

The Job: Responsibilities for these positions include:

Analyzing and handling large-scale next generation sequencing DNA and RNA data
Data mining and development of pipelines and troubleshooting
Genome diversity analysis such as SNPs, Indels, Structural Variations, population structure
Genome wide association study (GWAS) related analysis- LD analysis, hapmap and trait mapping
Expression analysis based on RNA-Seq data, annotation, gene ontology and metabolic pathway analysis
Epigenome analysis, small RNA identification
Gene family analysis, sequence level protein analysis, orthology/paralogy and molecular modelling
Compiling and analysis of results, writing reports and research papers

The Person: Ph.D. or MSc/MTech/PGDCA with two years research experience in Biotechnology, Computational biology, Agricultural/ Plant Biotechnology, Genetics, Molecular Biology or related discipline. Good knowledge of programming/scripting in at least two of following languages: Perl, C, C++, R, Shell Scripting and Python is plus.

How to apply: Please apply latest by 20 July 2014. The application should include the name of the position applied for, a letter of motivation, a full Curriculum Vita (CV), and the names and contact information of three references that are knowledgeable of the candidate’s professional qualifications and work experience. Technical details and more information about these positions can be obtained from R.K.VARSHNEY@CGIAR.ORG. All applications will be acknowledged, however only short listed candidates will be contacted.

Apply here https://recruit.zoho.com/ats/Portal.na?digest=T642sgLYWZOStExJ77cPrcM*sIMGZETWw4yPxngbmHA-

HNADOCK: a nucleic acid docking server for modeling RNA/DNA–RNA/DNA 3D complex structures

Poonam Mahapatra — Thu, 04 Jun 2020 23:19:07 -0500

The HNADOCK server is to predict the binding complex structure between two nucleic acid molecules through a hierarchical docking algorihtm of an FFT-based global search strategy and an intrinsic scoring function for nucleic acid interactions. Users are required to provide the three-dimensional (3D) structures of the two molecules to be docked.

Address of the bookmark: http://huanglab.phys.hust.edu.cn/hnadock/

R programming and Jobs website

Pragati Singh — Sun, 25 May 2014 14:43:57 -0500

Welcome to the R Jobs section of ProgrammingR.com. If your organization has an R employment opportunity that you would like to have posted here, submit it via the contact page. Prospective employees: use the contact information provided in the position listing to apply or contact the hiring organization.

Address of the bookmark: http://www.programmingr.com/category/stype/r-job-listings/

Tools for RNA classification

Abhi — Tue, 08 Nov 2022 03:39:11 -0600

barrnap - https://github.com/tseemann/barrnap

CPAT - https://github.com/liguowang/cpat, http://lilab.research.bcm.edu/ (web server)

CPC2 - https://github.com/gao-lab/CPC2_standalone, http://cpc2.gao-lab.org/ (web server)

Infernal - http://eddylab.org/infernal/, https://github.com/EddyRivasLab/infernal

NCBI RefSeq - https://www.ncbi.nlm.nih.gov/refseq/

Rfam - http://rfam.xfam.org/, https://docs.rfam.org/en/latest/index.html

SILVA - https://www.arb-silva.de/

RNAmmer - http://www.cbs.dtu.dk/services/RNAmmer/ (web server, standalone download link)

Perl one-liner for bioinformatician !!!

Abhimanyu Singh — Fri, 30 May 2014 05:49:07 -0500

With the emergence of NGS technologies, and sequencing data most of the bioinformaticians mung and wrangle around massive amounts of genomics text. There are several "standardized" file formats (FASTQ, SAM, VCF, etc.) and some tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times where knowing a little bit of Perl onliner is extremely helpful.

Perl one-liners are small and awesome Perl programs that fit in a single line of code and they do one thing really well. These things include changing line spacing, numbering lines, doing calculations, converting and substituting text, deleting and printing certain lines, parsing logs, editing files in-place, doing statistics, carrying out system administration tasks, updating a bunch of files at once, and many more. Perl one-liners will make you the shell warrior. Anything that took you minutes to solve, will now take you seconds!

perl -pe '$\="\n"'
#double space a file

perl -pe '$_ .= "\n" unless /^$/'
#double space a file except blank lines

perl -pe '$_.="\n"x7'
#7 space in a line.

perl -ne 'print unless /^$/'
#remove all blank lines

perl -lne 'print if length($_) < 20'
#print all lines with length less than 20.

perl -00 -pe ''
#If there are multiple spaces, delete all leaving one(make the file a single spaced file).

perl -00 -pe '$_.="\n"x4'
#Expand single blank lines into 4 consecutive blank lines

perl -pe '$_ = "$. $_"'
#Number all lines in a file

perl -pe '$_ = ++$a." $_" if /./'
#Number only non-empty lines in a file

perl -ne 'print ++$a." $_" if /./'
#Number and print only non-empty lines in a file

perl -pe '$_ = ++$a." $_" if /regex/'
#Number only lines that match a pattern

perl -ne 'print ++$a." $_" if /regex/'
#Number and print only lines that match a pattern

perl -ne 'printf "%-5d %s", $., $_ if /regex/'
#Left align lines with 5 white spaces if matches a pattern (perl -ne 'printf "%-5d %s", $., $_' : for all the lines)

perl -le 'print scalar(grep{/./}<>)'
#prints the total number of non-empty lines in a file

perl -lne '$a++ if /regex/; END {print $a+0}'
#print the total number of lines that matches the pattern

perl -alne 'print scalar @F'
#print the total number fields(words) in each line.

perl -alne '$t += @F; END { print $t}'
#Find total number of words in the file

perl -alne 'map { /regex/ && $t++ } @F; END { print $t }'
#find total number of fields that match the pattern

perl -lne '/regex/ && $t++; END { print $t }'
#Find total number of lines that match a pattern

perl -le '$n = 20; $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $m'
#will calculate the GCD of two numbers.

perl -le '$a = $n = 20; $b = $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $a*$b/$m'
#will calculate lcd of 20 and 35.

perl -le '$n=10; $min=5; $max=15; $, = " "; print map { int(rand($max-$min))+$min } 1..$n'
#Generates 10 random numbers between 5 and 15.

perl -le 'print map { ("a".."z",”0”..”9”)[rand 36] } 1..8'
#Generates a 8 character password from a to z and number 0 – 9.

perl -le 'print map { ("a",”t”,”g”,”c”)[rand 4] } 1..20'
#Generates a 20 nucleotide long random residue.

perl -le 'print "a"x50'
#generate a string of ‘x’ 50 character long

perl -le 'print join ", ", map { ord } split //, "hello world"'
#Will print the ascii value of the string hello world.

perl -le '@ascii = (99, 111, 100, 105, 110, 103); print pack("C*", @ascii)'
#converts ascii values into character strings.

perl -le '@odd = grep {$_ % 2 == 1} 1..100; print "@odd"'
#Generates an array of odd numbers.

perl -le '@even = grep {$_ % 2 == 0} 1..100; print "@even"'
#Generate an array of even numbers

perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file
#Convert the entire file into 13 characters offset(ROT13)

perl -nle 'print uc'
#Convert all text to uppercase:

perl -nle 'print lc'
#Convert text to lowercase:

perl -nle 'print ucfirst lc'
#Convert only first letter of first word to uppercas

perl -ple 'y/A-Za-z/a-zA-Z/'
#Convert upper case to lower case and vice versa

perl -ple 's/(\w+)/\u$1/g'
#Camel Casing

perl -pe 's|\n|\r\n|'
#Convert unix new lines into DOS new lines:

perl -pe 's|\r\n|\n|'
#Convert DOS newlines into unix new line

perl -pe 's|\n|\r|'
#Convert unix newlines into MAC newlines:

perl -pe '/regexp/ && s/foo/bar/'
#Substitute a foo with a bar in a line with a regexp.

Reference/Sources:

http://genomics-array.blogspot.in/2010/11/some-unixperl-oneliners-for.html

http://genomespot.blogspot.com/2013/08/a-selection-of-useful-bash-one-liners.html

http://biowize.wordpress.com/2012/06/15/command-line-magic-for-your-gene-annotations/

http://genomics-array.blogspot.com/2010/11/some-unixperl-oneliners-for.html

http://bioexpressblog.wordpress.com/2013/04/05/split-multi-fasta-sequence-file/

Step-by-Step Guide to Detect piRNAs Using Bioinformatics

Abhi — Fri, 13 Dec 2024 11:41:46 -0600

Piwi-interacting RNAs (piRNAs) are a class of small non-coding RNAs that play crucial roles in silencing transposable elements and regulating gene expression, particularly in germline cells. Detecting piRNAs involves identifying their unique characteristics, such as size, sequence motifs, and association with Piwi proteins, from high-throughput RNA sequencing data.

This blog provides a comprehensive step-by-step guide to detect piRNAs using bioinformatics tools and workflows.

Step 1: Prepare Your Data

Obtain RNA Sequencing Data
Acquire raw small RNA-seq data in FASTQ format. Datasets can be sourced from repositories like NCBI SRA, EMBL-EBI, or specific small RNA sequencing projects.
Quality Control (QC)
Use FastQC to assess the quality of raw reads:

fastqc reads.fastq

Evaluate the per-base quality, adapter content, and overrepresented sequences.
Trimming and Adapter Removal
Use tools like Cutadapt or Trim Galore! to remove adapters and low-quality bases:

cutadapt -a TGGAATTCTCGGGTGCCAAGG -o trimmed_reads.fastq reads.fastq

Ensure the remaining reads are of high quality for downstream analysis.

Step 2: Map Reads to the Genome

Mapping reads to the reference genome is crucial for identifying piRNA loci.

Reference Genome Preparation
Download the genome assembly of your organism from databases like Ensembl, UCSC Genome Browser, or NCBI.
Align Reads
Use Bowtie or STAR for small RNA alignment:

bowtie -v 1 -k 1 --best genome_index trimmed_reads.fastq -S aligned_reads.sam
- -v 1: Allows one mismatch.
- -k 1: Reports the best alignment.
Convert SAM to BAM
Convert and sort alignments using SAMtools:

samtools view -Sb aligned_reads.sam | samtools sort -o sorted_reads.bam

Step 3: Identify Small RNAs

piRNAs are characterized by their size (24–32 nt) and strand bias.

Extract Reads by Size
Use tools like BEDtools or custom scripts to filter reads between 24 and 32 nt:

bedtools bamtofastq -i sorted_reads.bam -fq all_reads.fastq seqkit seq -m 24 -M 32 all_reads.fastq > piRNA_size_reads.fastq
Check for Sequence Bias
piRNAs often have a strong bias for a uridine at the 5’ end (1U bias). Use tools like WebLogo to visualize sequence motifs.

Step 4: Detect Ping-Pong Signature

The ping-pong amplification loop is a hallmark of piRNA biogenesis, characterized by a 10 nt overlap between piRNAs on opposite strands.

Generate Overlap Statistics
Use the piPipes tool or custom scripts to calculate overlap:

python ping_pong_overlap.py sorted_reads.bam
Visualize Overlap Distribution
Plot the distribution of overlaps to confirm the presence of the 10 nt ping-pong signature.

Step 5: Annotate piRNA Clusters

piRNAs are often generated from genomic clusters.

Cluster Identification
Use tools like proTRAC or PIRANHA to identify piRNA-producing clusters:

proTRAC.pl -s sorted_reads.bam -g genome.fa -o clusters
Annotate Genomic Regions
Annotate the identified clusters using gene annotation files (GTF/GFF). Tools like BEDtools intersect can help associate piRNA clusters with genes or transposable elements:

bedtools intersect -a clusters.bed -b genome_annotation.gtf > annotated_clusters.bed

Step 6: Functional Analysis

Functional analysis of piRNAs can uncover their targets and regulatory roles.

Predict piRNA Targets
Use tools like IntaRNA or RNAhybrid to predict interactions between piRNAs and potential target mRNAs:

RNAhybrid -t target_transcripts.fa -q piRNAs.fa > piRNA_targets.txt
Enrichment Analysis
Perform GO or KEGG enrichment analysis of target genes using tools like g:Profiler or DAVID.

Step 7: Validation and Visualization

Validate piRNA Candidates
Cross-check the identified piRNAs against known piRNA databases, such as piRBase or piRNAdb.
Visualize Results
- Use IGV (Integrative Genomics Viewer) to visualize piRNA alignment and clusters on the genome.
- Generate heatmaps or circos plots to present piRNA distributions.

Step 8: Share and Publish Findings

Archive Data
Submit sequencing data to public repositories like SRA or GEO with metadata specifying piRNA-related experiments.
Publish Results
Share findings in journals or conferences, emphasizing novel piRNA candidates, target genes, or regulatory mechanisms.

Conclusion

Detecting piRNAs involves a combination of computational and analytical methods to identify these unique small RNAs and their roles in gene regulation and transposable element suppression. By following this step-by-step guide, you can confidently navigate the complexities of piRNA detection and contribute to the growing understanding of their biological significance.