Latest activity

  • Anjana is now a friend with Sanjay 2385 days ago
  • JBrowse is a fast, embeddable genome browser built completely with JavaScript and HTML5, with optional run-once data formatting tools written in Perl. Headline Features: Fast, smooth scrolling and zooming. Explore your genome with unparalleled...
  • Jit created a new bio-script Bash script to run Busco2 ! 2388 days ago
  • Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data. AfterQC can simply go through all fastq files in a folder and then output three folders (good, bad and QC), which contain good reads, bad reads and the QC...
    Comments
    • Jit 2388 days ago
      • AfterQC: Automatic filtering, trimming, error removing and quality control for FASTQ data.
      • dupRadar: An R package which provides functions for plotting and analyzing the duplication rates dependent on the expression levels.
      • FastQC: A quality control tool for high-throughput sequence data (Babraham Institute), developed in Java. Data can be imported from FASTQ files or from BAM or SAM format. The tool gives an overview that flags problematic areas, with summary graphs and tables for rapid assessment of the data. Results are presented as permanent HTML reports. FastQC can be run as a stand-alone application or integrated into a larger pipeline.
      • fastqp: Simple FASTQ quality assessment using Python.
      • Kraken: A set of tools for quality control and analysis of high-throughput sequence data.
      • HTSeq: The Python script htseq-qa takes a file with sequencing reads (either raw or aligned) and produces a PDF file with useful plots to assess the technical quality of a run.
      • mRIN: Assessing mRNA integrity directly from RNA-Seq data.
      • MultiQC: Aggregate and visualise results from numerous tools (FastQC, HTSeq, RSeQC, Tophat, STAR, others) across all samples into a single report.
      • NGSQC: Cross-platform quality analysis pipeline for deep sequencing data.
      • NGS QC Toolkit: A toolkit for the quality control (QC) of next generation sequencing (NGS) data. It comprises user-friendly stand-alone tools for quality control of sequence data generated on Illumina and Roche 454 platforms, with detailed results in the form of tables and graphs, and for filtering high-quality sequence data. It also includes a few other tools that are helpful in NGS data quality control and analysis.
      • PRINSEQ: A tool that generates summary statistics of sequence and quality data and that is used to filter, reformat and trim next-generation sequence data. It is designed particularly for 454/Roche data, but can also be used for other types of sequence.
      • QC-Chain: A package of quality control tools for next generation sequencing (NGS) data, consisting of both raw reads quality evaluation and de novo contamination screening, which could identify all possible contamination sequences.
      • QC3: A quality control tool designed for DNA sequencing data, covering raw data, alignment, and variant calling.
      • qrqc: Quickly scans reads and gathers statistics on base and quality frequencies, read length, and frequent sequences. Produces graphical output of statistics for use in quality control pipelines, and an optional HTML quality report. S4 SequenceSummary objects allow specific tests and functionality to be written around the data collected.
      • RNA-SeQC: A tool with applications in experiment design, process optimization and quality control before computational analysis. It provides three types of quality control: read counts (such as duplicate reads, mapped reads and mapped unique reads, rRNA reads, transcript-annotated reads, strand specificity), coverage (such as mean coverage, mean coefficient of variation, 5’/3’ coverage, gaps in coverage, GC bias) and expression correlation (the tool provides RPKM-based estimation of expression levels). RNA-SeQC is implemented in Java and does not require installation; it can also be run through the GenePattern web interface. The input can be one or more BAM files. HTML reports are generated as output.
      • RSeQC: Analyzes diverse aspects of RNA-Seq experiments: sequence quality, sequencing depth, strand specificity, GC bias, read distribution over the genome structure and coverage uniformity. The input can be SAM, BAM, FASTA or BED files, or a chromosome size file (two-column, plain text). Results can be visualized in genome browsers such as UCSC, IGB and IGV, or with R scripts.
      • SAMStat: Identifies problems and reports several statistics at different phases of the process. This tool evaluates unmapped, poorly and accurately mapped sequences independently to infer possible causes of poor mapping.
      • SolexaQA: Calculates sequence quality statistics and creates visual representations of data quality for second-generation sequencing data. Originally developed for the Illumina system (historically known as “Solexa”), SolexaQA now also supports Ion Torrent and 454 data.
      • Trim Galore!: Trim_galore is a wrapper script to automate quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions for RRBS sequence files (for directional, non-directional or paired-end sequencing).
  • Rahul Nayak posted to the wire 2389 days ago
    Get unique ids from column 1 in Linux. $ cat output_file | cut -f 1 | sort | uniq > allUniqOUT #Unique #Linux #Cut
  • BLASTn output format 6. BLASTn maps DNA against DNA, for example gene sequences against a reference genome: blastn -query genes.ffn -subject genome.fna -outfmt 6. BLASTn tabular output format 6 column headers: qseqid sseqid pident...
  • Jit posted to the wire 2390 days ago
    #tBLASTn #command line $ tblastn -query Avaga.pep.out.45.fa -db myDB -outfmt 6 -out output_file -evalue 1e-5 -max_target_seqs 1 #Blast #Protein
  • Jit posted to the wire 2390 days ago
    perl -e '$/="\n>"; while (<>) { s/>//g; my ($id, $seq) = split /\n/; print ">$_" if length $seq; }' < all_p_ctg.fasta > all_p_ctg_corrected.fasta #Convert #Fasta #Zero
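The Perl one-liner above checks only the first line after each header when deciding whether a record is empty. A minimal awk sketch of the same filter, assuming sequences may wrap over several lines and that blank lines carry no sequence; the function name `drop_empty_records` is hypothetical and not part of any tool:

```shell
# drop_empty_records: print only FASTA records that have at least one
# non-blank sequence line. Reads the files given as arguments, or stdin.
drop_empty_records() {
    awk '
        # A new header: flush the previous record if it had any sequence.
        /^>/ {
            if (hdr != "" && seq != "") printf "%s\n%s", hdr, seq
            hdr = $0; seq = ""
            next
        }
        # A non-blank sequence line: append it to the current record.
        NF { seq = seq $0 "\n" }
        # End of input: flush the last record.
        END { if (hdr != "" && seq != "") printf "%s\n%s", hdr, seq }
    ' "$@"
}
```

Usage would mirror the original post, for example `drop_empty_records all_p_ctg.fasta > all_p_ctg_corrected.fasta`.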
  • Jit created a new bio-script Perl script to convert GFF 2 FASTA ! 2390 days ago
  • Jit created a new bio-script Perl script to run SATSUMA in loop ! 2391 days ago
  • An R package for Interactive visualization and mapping of human chromosomes
    Comments
  • Rahul Nayak posted to the wire 2392 days ago
  • Jit posted to the wire 2393 days ago
    perl -pe 's/[^AGTC\n]/N/gi unless m/>/;' /media/urbe/TOSHIBALAB/falconUnzip/SSPACED_P2/scaffolds.fasta > /media/urbe/TOSHIBALAB/falconUnzip/SSPACED_P2/scaffolds_corrected.fasta #Remove #Non-ATCG
  • Jit posted to the wire 2395 days ago
    #Zero coverage regions ~/Tools/bedtools2/bin/bedtools genomecov -ibam PvsH_aln_sorted.bam -bga | awk '$4==0' > ZERO_coverage_regions.txt
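The zero-coverage intervals written by the command above can be summed per chromosome. This is a rough sketch, assuming the four-column BedGraph layout that `genomecov -bga` writes (chromosome, 0-based start, end, depth), so each interval spans end minus start bases; the function name `zero_cov_total` is made up for illustration:

```shell
# zero_cov_total: for each chromosome, sum the lengths of intervals whose
# depth (column 4) is 0. Input: tab-separated BedGraph file(s) or stdin.
zero_cov_total() {
    awk -F'\t' '
        $4 == 0 { len[$1] += $3 - $2 }   # BedGraph ends are exclusive
        END { for (c in len) printf "%s\t%d\n", c, len[c] }
    ' "$@" | sort -k1,1
}
```

For example, `zero_cov_total ZERO_coverage_regions.txt` would print one line per chromosome with the number of uncovered bases.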
  • Aaryan Lokwani posted to the wire 2395 days ago
    Replace illegal characters in a FASTA file with sed. $ sed '/^[^>]/ s/[^AGTC]/N/gi' < seq.fa #fasta #otherthanATGC #replace #sed #Linux #illegal
    Comments
    • Jit 1814 days ago

      If you want to count all 'N' in a multi-FASTA file:

      $ grep -Ho N * | uniq -c
           12 draft_summary.info:N
      1035948 scaffolds.fasta:N
           12 scaffolds_summary.info:N
          900 updatedGenome.fa:N
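Note that the grep in the comment above counts every uppercase N in every file in the directory, header lines and non-FASTA files included. A possible variant that counts only N or n on sequence lines of the files you name; `count_n` is a hypothetical helper, not an existing command:

```shell
# count_n: for each file given, print "<count> <file>", where count is the
# number of N/n characters on lines that do not start with '>'.
count_n() {
    for f in "$@"; do
        awk -v name="$f" '
            !/^>/ { n += gsub(/[Nn]/, "") }   # gsub returns the number of matches
            END   { printf "%d %s\n", n, name }
        ' "$f"
    done
}
```

It could be called as `count_n scaffolds.fasta updatedGenome.fa`.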