BOL: Related items

Find certain files/documents in Linux OS

Rahul Nayak — Sun, 06 Apr 2014 23:56:18 -0500

As a bioinformatician, I know that we routinely handle large datasets and can easily get lost among huge numbers of files and folders. Tracking down a missing file requires a powerful search command. The Linux find command is one of the most important and most frequently used commands on Linux systems. It searches for and lists the files and directories that match the conditions you specify as arguments. find can search by permissions, users, groups, file type, date, size and other criteria.

In this article we share our day-to-day experience with the Linux find command in the form of examples, showing the 35 most used find commands. We have divided the article into five parts, from basic to advanced usage of the find command.

Part I – Basic Find Commands for Finding Files with Names
1. Find Files Using Name in Current Directory

Find all the files whose name is gene.txt in the current working directory.

# find . -name gene.txt

./gene.txt

2. Find Files Under Home Directory

Find all the files under /home directory with name gene.txt.

# find /home -name gene.txt

/home/gene.txt

3. Find Files Using Name and Ignoring Case

Find all the files named gene.txt in the /home directory, ignoring the case of the letters.

# find /home -iname gene.txt

/home/gene.txt
/home/Gene.txt

4. Find Directories Using Name

Find all directories whose name is Gene in / directory.

# find / -type d -name Gene

/Gene

5. Find fasta Files Using Name

Find all regular files whose name is gene.fasta in the current working directory.

# find . -type f -name gene.fasta

./gene.fasta

6. Find all fasta Files in Directory

Find all fasta files in a directory.

# find . -type f -name "*.fasta"

./gene.fasta
./cancer.fasta
./allgene.fasta
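
The name-based searches in Part I can be tried safely in a scratch directory. The sketch below is only an illustration: the layout under $demo is invented for the example, and it assumes GNU find on a case-sensitive file system.

```shell
# Scratch tree (hypothetical layout) so that no real files are touched.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b" "$demo/Gene"
touch "$demo/a/gene.txt" "$demo/b/Gene.txt" "$demo/a/gene.fasta" "$demo/b/cancer.fasta"

# -name is case-sensitive, so only a/gene.txt matches.
exact=$(find "$demo" -type f -name gene.txt | wc -l)

# -iname ignores case, so a/gene.txt and b/Gene.txt both match.
anycase=$(find "$demo" -type f -iname gene.txt | wc -l)

# Quoting the pattern stops the shell from expanding *.fasta before find sees it.
fasta=$(find "$demo" -type f -name "*.fasta" | wc -l)

# -type d restricts the match to directories.
dirs=$(find "$demo" -type d -name Gene | wc -l)

echo "exact=$((exact)) anycase=$((anycase)) fasta=$((fasta)) dirs=$((dirs))"
rm -rf "$demo"
```

Under those assumptions the counts should come out as 1, 2, 2 and 1, which mirrors the difference between examples 1, 3, 6 and 4.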

Part II – Find Files Based on their Permissions
7. Find Files With 777 Permissions

Find all the files whose permissions are 777.

# find . -type f -perm 0777 -print

8. Find Files Without 777 Permissions

Find all the files without permission 777.

# find / -type f ! -perm 777

9. Find SGID Files with 644 Permissions

Find all the SGID bit files whose permissions are set to 644.

# find / -perm 2644

10. Find Sticky Bit Files with 551 Permissions

Find all the files with the sticky bit set whose permissions are 551.

# find / -perm 1551

11. Find SUID Files

Find all SUID set files.

# find / -perm /u=s

12. Find SGID Files

Find all SGID set files.

# find / -perm /g+s

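13. Find Read Only Files

Find all read-only files. On its own, -perm /u=r matches every file its owner can read, including writable ones, so the write bits are excluded as well.

# find / -perm /u=r ! -perm /a=w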

14. Find Executable Files

Find all Executable files.

# find / -perm /a=x

15. Find Files with 777 Permissions and Chmod to 644

Find all 777 permission files and use chmod command to set permissions to 644.

# find / -type f -perm 0777 -print -exec chmod 644 {} \;

16. Find Directories with 777 Permissions and Chmod to 755

Find all 777 permission directories and use chmod command to set permissions to 755.

# find / -type d -perm 777 -print -exec chmod 755 {} \;
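
The three forms of -perm used above (a bare mode, a mode prefixed with /, and a mode prefixed with -) behave differently, which is easy to check on throwaway files. This is a minimal sketch assuming GNU find; the file names are made up for the demonstration.

```shell
# Throwaway files with known modes (names are hypothetical).
demo=$(mktemp -d)
touch "$demo/open.txt" "$demo/normal.txt" "$demo/script.sh"
chmod 777 "$demo/open.txt"
chmod 644 "$demo/normal.txt"
chmod 755 "$demo/script.sh"

# Bare mode: the permissions must be exactly 0777 (open.txt only).
exact=$(find "$demo" -type f -perm 0777 | wc -l)

# "/" prefix: at least one of the listed bits is set (open.txt and script.sh).
anyexec=$(find "$demo" -type f -perm /a=x | wc -l)

# "-" prefix: all of the listed bits are set, here group and other write (open.txt only).
allwrite=$(find "$demo" -type f -perm -022 | wc -l)

# Negation: everything that is not exactly 0777 (normal.txt and script.sh).
not777=$(find "$demo" -type f ! -perm 0777 | wc -l)

# Reset the 0777 files to 0644, then confirm that none remain.
find "$demo" -type f -perm 0777 -exec chmod 644 {} \;
left=$(find "$demo" -type f -perm 0777 | wc -l)

echo "exact=$((exact)) anyexec=$((anyexec)) allwrite=$((allwrite)) not777=$((not777)) left=$((left))"
rm -rf "$demo"
```

Running the search once without -exec, as in the first four commands, is a cheap way to see what a chmod pass would touch before it changes anything.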

17. Find and remove single File

To find a single file called gene.txt and remove it.

# find . -type f -name "gene.txt" -exec rm -f {} \;

18. Find and remove Multiple File

To find and remove multiple files, such as all .fa or .gb files, use:

# find . -type f -name "*.fa" -exec rm -f {} \;

OR

# find . -type f -name "*.gb" -exec rm -f {} \;

19. Find all Empty Files

To find all empty files under a certain path.

# find /tmp -type f -empty

20. Find all Empty Directories

To find all empty directories under a certain path.

# find /tmp -type d -empty

21. Find all Hidden Files

To find all hidden files, use the command below.

# find /tmp -type f -name ".*"
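
The delete, empty and hidden-file searches can be combined and previewed before anything is removed. The sketch below assumes GNU find and uses an invented scratch directory; the parenthesised -o expression is one way to handle .fa and .gb in a single pass rather than two.

```shell
# Scratch directory with a mix of files (names are hypothetical).
demo=$(mktemp -d)
mkdir "$demo/emptydir"
touch "$demo/a.fa" "$demo/b.gb" "$demo/.hidden"
echo "data" > "$demo/keep.txt"

# Empty regular files: a.fa, b.gb and .hidden (keep.txt has content).
emptyfiles=$(find "$demo" -type f -empty | wc -l)

# Empty directories: emptydir only.
emptydirs=$(find "$demo" -type d -empty | wc -l)

# Hidden files: names that start with a dot.
hidden=$(find "$demo" -type f -name ".*" | wc -l)

# Preview what the combined pattern would remove...
find "$demo" -type f \( -name "*.fa" -o -name "*.gb" \) -print

# ...then remove both kinds of file in one pass.
find "$demo" -type f \( -name "*.fa" -o -name "*.gb" \) -exec rm -f {} +

# Only keep.txt should be left among the non-hidden files.
remaining=$(find "$demo" -type f ! -name ".*" | wc -l)

echo "emptyfiles=$((emptyfiles)) emptydirs=$((emptydirs)) hidden=$((hidden)) remaining=$((remaining))"
rm -rf "$demo"
```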

Part III – Search Files Based On Owners and Groups
22. Find Single File Based on User

To find a file called gene.txt that is owned by user root, anywhere under the / (root) directory.

# find / -user root -name gene.txt

23. Find all Files Based on User

To find all files that belong to user rahul under the /home directory.

# find /home -user rahul

24. Find all Files Based on Group

To find all files that belong to group developer under the /home directory.

# find /home -group developer

25. Find Particular Files of User

To find all .txt files of user rahul under the /home directory, ignoring the case of the extension.

# find /home -user rahul -iname "*.txt"
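
Ownership tests can be tried without a second account by searching for files owned by the current user. A small sketch, assuming GNU find (whose -user test accepts a numeric ID as well as a login name); the file names are invented.

```shell
# Files created here are owned by whoever runs the script.
demo=$(mktemp -d)
touch "$demo/gene.txt" "$demo/NOTES.TXT" "$demo/reads.fasta"

# Numeric ID of the current user; a login name such as rahul works the same way.
me=$(id -u)

# Every file in the scratch directory belongs to the current user.
owned=$(find "$demo" -type f -user "$me" | wc -l)

# Combine the owner test with a case-insensitive name pattern.
txt=$(find "$demo" -type f -user "$me" -iname "*.txt" | wc -l)

echo "owned=$((owned)) txt=$((txt))"
rm -rf "$demo"
```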

Part IV – Find Files and Directories Based on Date and Time
26. Find Files Modified 50 Days Ago

To find all the files that were last modified 50 days ago. find counts age in whole 24-hour periods, so this matches files between 50 and 51 days old.

# find / -mtime 50

27. Find Files Accessed 50 Days Ago

To find all the files that were last accessed 50 days ago.

# find / -atime 50

28. Find Files Modified 50-100 Days Ago

To find all the files that were modified more than 50 days ago and less than 100 days ago.

# find / -mtime +50 -mtime -100

29. Find Changed Files in Last 1 Hour

To find all the files whose status (inode data such as permissions or ownership) was changed in the last hour.

# find / -cmin -60

30. Find Modified Files in Last 1 Hour

To find all the files which were modified in the last hour.

# find / -mmin -60

31. Find Accessed Files in Last 1 Hour

To find all the files which were accessed in the last hour.

# find / -amin -60
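
The time-based tests are easier to follow with files of known age. The sketch below back-dates two files with touch -d, which assumes GNU coreutils (relative dates such as "60 days ago" are a GNU extension), and assumes GNU find; the file names are invented.

```shell
# Three files with different modification times.
demo=$(mktemp -d)
touch "$demo/recent.txt"
touch -d "60 days ago" "$demo/mid.txt"
touch -d "120 days ago" "$demo/old.txt"

# Modified more than 50 and less than 100 days ago: mid.txt only.
window=$(find "$demo" -type f -mtime +50 -mtime -100)

# Modified within the last 60 minutes: recent.txt only.
lasthour=$(find "$demo" -type f -mmin -60)

# Modified more than 50 days ago, with no upper limit: mid.txt and old.txt.
older=$(find "$demo" -type f -mtime +50 | wc -l)

echo "window=$(basename "$window") lasthour=$(basename "$lasthour") older=$((older))"
rm -rf "$demo"
```

A plus sign means "more than", a minus sign means "less than", and a bare number means "exactly that many whole units", which is why example 28 needs two -mtime tests to describe a range.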

Part V – Find Files and Directories Based on Size
32. Find 50MB Files

To find all files whose size, rounded up to whole megabytes, is 50MB, use:

# find / -size 50M

33. Find Size between 50MB – 100MB

To find all the files which are greater than 50MB and less than 100MB.

# find / -size +50M -size -100M

34. Find and Delete Files Larger than 100MB

To find all files larger than 100MB and delete them using one single command.

# find / -type f -size +100M -exec rm -f {} \;

35. Find Specific Files and Delete

Find all .gb files larger than 10MB and delete them using one single command.

# find / -type f -name "*.gb" -size +10M -exec rm {} \;
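
Size-based deletion is worth rehearsing on dummy files first. This sketch creates sparse files with truncate (assumed here to be the GNU coreutils version, so they take almost no disk space) and assumes GNU find; the names and sizes are invented for the example.

```shell
# Sparse dummy files of 1MB, 60MB and 120MB.
demo=$(mktemp -d)
truncate -s 1M "$demo/small.gb"
truncate -s 60M "$demo/mid.gb"
truncate -s 120M "$demo/big.gb"

# Larger than 50MB and smaller than 100MB: mid.gb only.
between=$(find "$demo" -type f -size +50M -size -100M)

# Preview the files larger than 100MB before deleting them.
find "$demo" -type f -name "*.gb" -size +100M -print

# Delete them; -type f and the quoted pattern keep the match narrow.
find "$demo" -type f -name "*.gb" -size +100M -exec rm -f {} +

# small.gb and mid.gb should survive.
left=$(find "$demo" -type f | wc -l)

echo "between=$(basename "$between") left=$((left))"
rm -rf "$demo"
```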

Fully funded position as PhD Research Fellow in genomics/bioinformatics

Wed, 03 Feb 2021 04:18:57 -0600

A fully funded position as PhD Research Fellow in genomics/bioinformatics is available at the Section for Genetics and Evolutionary Biology (EVOGENE) at the Department of Biosciences, University of Oslo.

The fellowship will be for a period of 3 years, or for a period of 4 years, with 25 % compulsory work (e.g. teaching responsibilities at the department) contingent on the qualifications of the candidate and the teaching needs of the department.

Starting date no later than October 1, 2021.

More at https://www.jobbnorge.no/en/available-jobs/job/199984/phd-research-fellow-in-genomics-and-bioinformatics

Assistant Professor at SARDAR PATEL UNIVERSITY

Mon, 21 Apr 2014 21:03:55 -0500

SARDAR PATEL UNIVERSITY
Centre for Interdisciplinary Studies in Science and Technology

No.: SPU/CISST/Advt./2014-15/519

ADVERTISEMENT for Teaching Positions (Contractual)

Applications for the following Contractual Teaching Position are invited for Centre for Interdisciplinary Studies in Science and Technology (CISST), Sardar Patel University:

2. Assistant Professor (ONE) (Contractual)

For the subject of Bioinformatics

Qualifications:

(I) Good academic record as defined by the concerned university with at least 55 % marks (or an equivalent grade in a point scale wherever grading system is followed) at the Master’s level

(II) Ph.D. degree in the concerned subject or in a relevant interdisciplinary subject from an Indian University or NET/SLET clearance

Contractual appointment carries total Fixed Emoluments of Rs. 30,000/- p.m. without any assurance of permanent Positions and related benefits.

An Application Form in the prescribed Proforma, available on the University Website: www.spuvvn.edu, should be filled in completely in Twelve Copies with self-attested copies of certificates of qualifications and experience. Only one copy of each mark sheet should be attached with the first copy of the application form. All 12 (Twelve) Application forms should be sent to the Registrar’s office along with a Demand Draft for the Application form fee of Rs. 250/- (Non-refundable) in favour of “REGISTRAR, SARDAR PATEL UNIVERSITY, VALLABH VIDYANAGAR”. The S.C. and S.T. category candidates need not pay the Application fee.

Applicants who are in service should apply through their present employers. Candidates called for interview shall be required to attend at their own cost.

In absence of suitable candidate, the University may relax the eligibility criteria, for conditional appointment.

The last date of receipt of application by the University is 30th April, 2014

Advertisement: www.spuvvn.edu/careers/CISST%20Advt.%20April%202014.pdf

Assistant Professor (Bio-Informatics) at Health and Family Welfare Department (Medical Education) in Raipur

Wed, 07 May 2014 00:08:38 -0500

Advertisement No.05/2014/ Exam/Dated 17/04/2014

No of vacancies: 01

Pay scale: Rs. 15600 – 39100 + 6600/-

Essential Academic Qualifications / Experience : Good academic record as defined by the concerned university with at least 55% marks (or an equivalent grade in a point scale wherever grading system is followed) at the Master's Degree level in a relevant subject from an Indian University, or an equivalent degree from an accredited foreign university.

Besides fulfilling the above qualifications, the candidate must have cleared the National Eligibility Test (NET) conducted by the UGC, CSIR or similar test accredited by the UGC like SLET/ SET.

Notwithstanding anything contained in sub-clauses (a) and (b) to this Clause, candidates, who are, or have been awarded a Ph.D. Degree in accordance with the University Grants Commission (Minimum Standards and Procedure for Award of Ph.D. Degree) Regulations, 2009, shall be exempted from the requirement of the minimum eligibility condition of NET/SLET/SET for recruitment and appointment of Assistant Professor or equivalent positions in Universities/Colleges/Institutions.

NET/SLET/SET shall also not be required for such Masters Programmes in disciplines for which NET/SLET/SET is not conducted.

Apply online: http://www.psc.cg.gov.in/htm/OA_ME2014.html

Last Date for Online Registration: 22/05/2014

For more details: http://www.psc.cg.gov.in/pdf/Advertisement/ADV_ME2014.pdf

GPS DNA tracking - University of Sheffield

Sat, 10 May 2014 04:33:28 -0500

University of Sheffield geneticist and bioinformatics expert Dr Eran Elhaik demonstrates the power of his new DNA research, which allows people to discover their genetic homeland from 1000 years ago. Find out more about our biological research here http://www.sheffield.ac.uk/aps

Managing and Analyzing Next-Generation Sequence Data

Rahul Agarwal — Sat, 10 May 2014 06:28:06 -0500

Centralized Bioinformatics Core Facilities provide shared resources for the computational and IT requirements of the investigators in their department or institution. As such, they must be able to effectively react to new types of experimental technology. Recently faced with an unprecedented flood of data generated by the next generation of DNA sequencers, these groups found it necessary to respond quickly and efficiently to the informatics and infrastructure demands. Centralized Facilities newly facing this challenge need to anticipate time and design considerations of necessary components, including infrastructure upgrades, staffing, and tools for data analyses and management ...

More at http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000369


Visiting Scientist - Computational Genomics (two positions)

Mon, 07 Jul 2014 22:53:41 -0500

Scientific/Managerial & International Recruitment

ICRISAT seeks applications from Indian nationals for the position of Visiting Scientist - Computational Genomics (2 positions), to be part of the team of the Centre of Excellence in Genomics (CEG) (www.icrisat.org/ceg) and work on legume genomics projects. The positions will be based at ICRISAT’s Headquarters in Patancheru, Hyderabad, India.

ICRISAT is a non-profit, non-political organization that conducts agricultural research for development in Asia and sub-Saharan Africa with a wide array of partners throughout the world. Covering 6.5 million square kilometers of land in 55 countries, the semi-arid tropics are home to over 2 billion people, of whom 650 million are the poorest of the poor. ICRISAT and its partners help empower those living in the semi-arid tropics, especially smallholder farmers, to overcome poverty, hunger, malnutrition and a degraded environment through more efficient and profitable agriculture. ICRISAT is headquartered in Greater Hyderabad, Andhra Pradesh, India and belongs to the Consortium of Centers supported by the Consultative Group on International Agricultural Research (CGIAR).

The Job: Responsibilities for these positions include:

Analyzing and handling large-scale next generation sequencing DNA and RNA data
Data mining and development of pipelines and troubleshooting
Genome diversity analysis such as SNPs, Indels, Structural Variations, population structure
Genome wide association study (GWAS) related analysis- LD analysis, hapmap and trait mapping
Expression analysis based on RNA-Seq data, annotation, gene ontology and metabolic pathway analysis
Epigenome analysis, small RNA identification
Gene family analysis, sequence level protein analysis, orthology/paralogy and molecular modelling
Compiling and analysis of results, writing reports and research papers

The Person: Ph.D. or MSc/MTech/PGDCA with two years of research experience in Biotechnology, Computational Biology, Agricultural/Plant Biotechnology, Genetics, Molecular Biology or a related discipline. Good knowledge of programming/scripting in at least two of the following languages is a plus: Perl, C, C++, R, Shell Scripting and Python.

How to apply: Please apply by 20 July 2014 at the latest. The application should include the name of the position applied for, a letter of motivation, a full Curriculum Vitae (CV), and the names and contact information of three references who are knowledgeable about the candidate’s professional qualifications and work experience. Technical details and more information about these positions can be obtained from R.K.VARSHNEY@CGIAR.ORG. All applications will be acknowledged; however, only shortlisted candidates will be contacted.

Apply here https://recruit.zoho.com/ats/Portal.na?digest=T642sgLYWZOStExJ77cPrcM*sIMGZETWw4yPxngbmHA-

Perl one-liner for bioinformatician !!!

Abhimanyu Singh — Fri, 30 May 2014 05:49:07 -0500

With the emergence of NGS technologies and sequencing data, most bioinformaticians munge and wrangle massive amounts of genomic text. Although there are several "standardized" file formats (FASTQ, SAM, VCF, etc.) and some tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times when knowing a few Perl one-liners is extremely helpful.

Perl one-liners are small and awesome Perl programs that fit in a single line of code and do one thing really well. These things include changing line spacing, numbering lines, doing calculations, converting and substituting text, deleting and printing certain lines, parsing logs, editing files in-place, doing statistics, carrying out system administration tasks, updating a bunch of files at once, and many more. Perl one-liners will make you a shell warrior. Anything that took you minutes to solve will now take you seconds!

perl -pe '$\="\n"'
#double space a file

perl -pe '$_ .= "\n" unless /^$/'
#double space a file except blank lines

perl -pe '$_.="\n"x7'
#insert 7 blank lines after each line.

perl -ne 'print unless /^$/'
#remove all blank lines

perl -lne 'print if length($_) < 20'
#print all lines with length less than 20.

perl -00 -pe ''
#If there are multiple consecutive blank lines, delete all but one (make the file a single spaced file).

perl -00 -pe '$_.="\n"x4'
#Expand single blank lines into 4 consecutive blank lines

perl -pe '$_ = "$. $_"'
#Number all lines in a file

perl -pe '$_ = ++$a." $_" if /./'
#Number only non-empty lines in a file

perl -ne 'print ++$a." $_" if /./'
#Number and print only non-empty lines in a file

perl -pe '$_ = ++$a." $_" if /regex/'
#Number only lines that match a pattern

perl -ne 'print ++$a." $_" if /regex/'
#Number and print only lines that match a pattern

perl -ne 'printf "%-5d %s", $., $_ if /regex/'
#Print lines that match a pattern, each prefixed by its line number left-aligned in a 5-character field (perl -ne 'printf "%-5d %s", $., $_' : for all the lines)

perl -le 'print scalar(grep{/./}<>)'
#prints the total number of non-empty lines in a file

perl -lne '$a++ if /regex/; END {print $a+0}'
#print the total number of lines that matches the pattern

perl -alne 'print scalar @F'
#print the total number of fields (words) in each line.

perl -alne '$t += @F; END { print $t}'
#Find total number of words in the file

perl -alne 'map { /regex/ && $t++ } @F; END { print $t }'
#find total number of fields that match the pattern

perl -lne '/regex/ && $t++; END { print $t }'
#Find total number of lines that match a pattern

perl -le '$n = 20; $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $m'
#will calculate the GCD of two numbers.

perl -le '$a = $n = 20; $b = $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $a*$b/$m'
#will calculate the LCM (least common multiple) of 20 and 35.

perl -le '$n=10; $min=5; $max=15; $, = " "; print map { int(rand($max-$min))+$min } 1..$n'
#Generates 10 random integers from 5 to 14 (at least 5 and less than 15).

perl -le 'print map { ("a".."z","0".."9")[rand 36] } 1..8'
#Generates an 8 character password from the letters a to z and the digits 0 to 9.

perl -le 'print map { ("a","t","g","c")[rand 4] } 1..20'
#Generates a random sequence 20 nucleotides long.

perl -le 'print "a"x50'
#generate a string 50 characters long made up of the letter a

perl -le 'print join ", ", map { ord } split //, "hello world"'
#Will print the ascii value of the string hello world.

perl -le '@ascii = (99, 111, 100, 105, 110, 103); print pack("C*", @ascii)'
#converts ascii values into character strings.

perl -le '@odd = grep {$_ % 2 == 1} 1..100; print "@odd"'
#Generates an array of odd numbers.

perl -le '@even = grep {$_ % 2 == 0} 1..100; print "@even"'
#Generate an array of even numbers

perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file
#ROT13-encode the entire file (shift every letter by 13 places)

perl -nle 'print uc'
#Convert all text to uppercase:

perl -nle 'print lc'
#Convert text to lowercase:

perl -nle 'print ucfirst lc'
#Lowercase each line, then convert its first letter to uppercase

perl -ple 'y/A-Za-z/a-zA-Z/'
#Convert upper case to lower case and vice versa

perl -ple 's/(\w+)/\u$1/g'
#Capitalize the first letter of every word (camel casing)

perl -pe 's|\n|\r\n|'
#Convert unix new lines into DOS new lines:

perl -pe 's|\r\n|\n|'
#Convert DOS newlines into unix new line

perl -pe 's|\n|\r|'
#Convert unix newlines into MAC newlines:

perl -pe '/regexp/ && s/foo/bar/'
#Substitute foo with bar on lines that match regexp.

Reference/Sources:

http://genomics-array.blogspot.in/2010/11/some-unixperl-oneliners-for.html

http://genomespot.blogspot.com/2013/08/a-selection-of-useful-bash-one-liners.html

http://biowize.wordpress.com/2012/06/15/command-line-magic-for-your-gene-annotations/


http://bioexpressblog.wordpress.com/2013/04/05/split-multi-fasta-sequence-file/

Next generation sequencing in R or bioconductor environment

John Parker — Mon, 02 Jun 2014 18:03:09 -0500

There are many R and Bioconductor packages for NGS data analysis; some of them are as follows:

Biostrings

The Biostrings package from Bioconductor provides an advanced environment for efficient sequence management and analysis in R. It contains many speed and memory effective string containers, string matching algorithms, and other utilities, for fast manipulation of large sets of biological sequences. The objects and functions provided by Biostrings form the basis for many other sequence analysis packages. Documentation

IRanges Overview

IRanges provides the low-level infrastructure and containers for handling sets of integer ranges within Bioconductor's BioC-Seq domain. Its classes and methods provide support for many more high-level packages like GenomicRanges, ShortRead, Rsamtools, etc. Documentation

GenomicRanges Overview

The GenomicRanges package serves as the foundation for representing genomic locations within the Bioconductor project. It is built upon the IRanges infrastructure and defines three major data containers - GRanges, GRangesList and GappedAlignments - which support other important BioC-Seq packages including ShortRead, Rsamtools, rtracklayer, GenomicFeatures and BSgenome. Compared to the IRanges container, the GRanges/GRangesList classes are more flexible and extensible, storing additional information about sequence ranges such as chromosome identifiers (sequence space), strand information and annotation data. Documentation

Motif Discovery

cosmo

The cosmo package searches a set of unaligned DNA sequences for a shared motif that may function as a transcription factor binding site. The algorithm extends the popular motif discovery tool MEME (Bailey and Elkan, 1995) in that it allows the search to be supervised by specifying a set of constraints that the motif to be discovered must satisfy. Documentation

BCRANK

BCRANK is a method that takes a ranked list of genomic regions as input and outputs short DNA sequences that are overrepresented in some part of the list. The algorithm was developed for detecting transcription factor (TF) binding sites in a large number of enriched regions from high-throughput ChIP-chip or ChIP-seq experiments, but it can be applied to any ranked list of DNA sequences. Documentation

rGADEM: Documentation

MotIV: Documentation

ShortRead

The ShortRead package provides input, quality control, filtering, parsing, and manipulation functionality for short read sequences produced by high throughput sequencing technologies. While support is provided for many sequencing technologies, this package is primarily focused on Solexa/Illumina reads. Documentation

Rsamtools

Rsamtools provides functions for parsing and inspecting samtools BAM formatted binary alignment data. SAM/BAM is quickly becoming a universal standard alignment format, and is now supported by a wide variety of alignment tools. Documentation

Samtools Website
BWA (Burrows-Wheeler Alignment) Website

Additional tools for SNP analysis:

snpMatrix

BSgenome

BSgenome provides an object oriented infrastructure for interacting with a Biostring based genome sequence. BSgenome packages exist for many common genomes, and can be created to represent custom genomes. See the "How to forge a BSgenome data package" Vignette for instructions to create a new BSgenome package if a prebuilt package does not exist for your organism. Documentation

rtracklayer

rtracklayer provides an interface for exporting annotation feature data to various genome browsers and file formats (such as GFF). See the Small RNA Profiling exercise for an example of using rtracklayer to visualize alignment coverage. Documentation

biomaRt

The biomaRt package provides an interface to a growing collection of databases implementing the BioMart software suite (http://www.biomart.org). The package enables online retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas. This data is retrieved automatically via the Internet, so it's recommended that you cache the data locally, or check versions if your code will be adversely affected by updates to these data. Documentation

ChIP-Seq Analysis Packages

Bioconductor provides various packages for analyzing and visualizing ChIP-Seq data. Only a small selection of these packages is introduced here. Additional useful introductions to this topic are: BioC ChIP-seq Case Study and BioC ChIP-Seq.

chipseq

The chipseq package combines a variety of HT-Seq packages to a pipeline for ChIP-Seq data analysis. Documentation

BayesPeak

BayesPeak is a peak calling package for identifying DNA binding sites of proteins in ChIP-Seq experiments. Its algorithm uses hidden Markov models (HMM) and Bayesian statistical methods. The following sample code introduces the identification of peaks with the BayesPeak package as well as the incorporation of read coverage information obtained by the chipseq package. Documentation [ Publication ]

PICS

The PICS package applies probabilistic inference to aligned-read ChIP-Seq data in order to identify regions bound by transcription factors. PICS identifies enriched regions by modeling local concentrations of directional reads, and uses DNA fragment length prior information to discriminate closely adjacent binding events via a Bayesian hierarchical t-mixture model. The following sample code uses the test data set from the above BayesPeak package in order to compare the results from both methods by identifying their consensus peak set. Documentation [ Publication ]

ChIPpeakAnno

The ChIPpeakAnno package provides batch annotation of the peaks identified from either ChIP-seq or ChIP-chip experiments. It includes functions to retrieve the sequences around peaks, obtain enriched Gene Ontology (GO) terms, and find the nearest gene, exon, miRNA or custom features such as most conserved elements and other transcription factor binding sites supplied by users. The package leverages the biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest and stat packages. Documentation

Additional ChIP-Seq Packages

DiffBind: Documentation

MOSAICS: Documentation

iSeq: Documentation

ChIPseqR: Documentation

ChIPsim: Documentation

CSAR: Documentation

ChIP-Seq Pipeline: PICS, rGADEM and MotIV (developer web site)

SPP: ChIP-seq processing pipeline

SPP Tutorial

MACS

SIPeS

RNA-Seq Analysis

Counting Reads that Overlap with Annotation Ranges

The GenomicRanges package provides support for importing into R short read alignment data in BAM format (via Rsamtools) and associating them with genomic feature ranges, such as exons or genes. This way one can quantify the number of reads aligning to annotated genomic regions. The package defines general purpose containers for storing genomic intervals as well as more specialized containers for storing alignments against a reference genome. The two main functions for read counting provided by this infrastructure are countOverlaps and summarizeOverlaps. For their proper usage, it is important to read the corresponding PDF manual. Documentation

Differential Gene Expression Analysis with DESeq

The DESeq package contains functions to call differentially expressed genes (DEGs) in count tables based on a model using the negative binomial distribution. It expects as input a data frame with the raw read counts per region/gene of interest (rows) for each test sample (columns). Such a count table can be imported into R or generated from BAM alignment files using the countOverlaps function as introduced above. Documentation

Differential Gene Expression Analysis with edgeR

The edgeR package uses empirical Bayes estimation and exact tests based on the negative binomial distribution to call differentially expressed genes (DEGs) in count data.

Documentation

A variety of additional R packages are available for normalizing RNA-Seq read count data and identifying differentially expressed genes (DEG):

easyRNASeq (simplifies read counting per genome feature)

DEXSeq (Inference of differential exon usage); parathyroidSE explains how to generate exon read counts in R

DEGseq

baySeq (also see: segmentSeq)

Genominator (Bullard et al. 2010)

Detection of Alternative Splice Junctions

Another utility of RNA-Seq experiments is the analysis of splice junctions. The following software suggestions provide this utility:

ERANGE
TopHat

SpliceMap

SplitSeek

DNA-Methylation Data Analysis

methylPipe
bsseq
BiSeq
Much more under BiocViews

HT-Seq Data Visualization

ggbio: ggplot2 extension for genomics data (online manual)

Gviz: Plotting data and annotation information along genomic coordinates

HilbertVis: Hilbert genome plots

GenomeGraphs: Plotting genomic information from Ensembl

TileQC: Flow Cell Quality Visualization

rtracklayer: R interface to genome browsers

genoPlotR: Plotting maps of genes and genomes

Genominator: Tools for storing, accessing, analyzing and visualizing genomic data.

To install all packages

# biocLite() was the installer when this post was written; Bioconductor 3.8 and later use BiocManager::install() instead.
source("http://bioconductor.org/biocLite.R")
biocLite()
biocLite(c("ShortRead", "Biostrings", "IRanges", "BSgenome", "rtracklayer", "biomaRt", "chipseq", "ChIPpeakAnno", "Rsamtools", "BayesPeak", "PICS", "GenomicRanges", "DESeq", "edgeR", "leeBamViews", "GenomicFeatures", "BSgenome.Celegans.UCSC.ce2"))

Ten recommendations for creating usable bioinformatics command line software

RAJESH DETROJA — Sun, 08 Jun 2014 10:06:26 -0500

Bioinformatics software varies greatly in quality. In terms of usability, the command line interface is the first experience a user will have of a tool. Unfortunately, this is often also the last time a tool will be used. Here I present ten recommendations for authors of command line software to follow, which I believe would greatly improve the uptake and usability of their products, waste less of users' time, and improve the quality of scientific analyses.

Address of the bookmark: http://www.gigasciencejournal.com/content/2/1/15?utm_content=buffer25ee0&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer