<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/35144?offset=10</link>
	<atom:link href="https://bioinformaticsonline.com/related/35144?offset=10" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34292/automatic-filtering-trimming-error-removing-and-quality-control-for-fastq-data</guid>
	<pubDate>Mon, 13 Nov 2017 05:10:23 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34292/automatic-filtering-trimming-error-removing-and-quality-control-for-fastq-data</link>
	<title><![CDATA[Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data]]></title>
	<description><![CDATA[<p><span>Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data</span><br><code>AfterQC</code><span>&nbsp;goes through all FASTQ files in a folder and writes three output folders:&nbsp;</span><span>good</span><span>,&nbsp;</span><span>bad</span><span>&nbsp;and&nbsp;</span><span>QC</span><span>, which contain the good reads, the bad reads and the QC results of each FASTQ file or pair.</span><br><span>Currently it supports processing data from HiSeq 2000/2500/3000/4000, NextSeq 500/550, MiniSeq, and other&nbsp;</span><a href="http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm">Illumina 1.8 or newer formats</a></p><p>Address of the bookmark: <a href="https://github.com/OpenGene/AfterQC" rel="nofollow">https://github.com/OpenGene/AfterQC</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/2573/most-commonly-used-awk-by-bioinformatician</guid>
	<pubDate>Mon, 19 Aug 2013 01:12:38 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/2573/most-commonly-used-awk-by-bioinformatician</link>
	<title><![CDATA[Most Commonly Used Awk One-Liners for Bioinformaticians]]></title>
	<description><![CDATA[<p style="text-align: center;">&nbsp;</p><p>Awk is a programming language designed for quickly manipulating whitespace-delimited data. Although you can achieve all its functionality with Perl, awk is simpler in many practical cases.</p><p>Why awk? You can replace a pipeline of 'stuff | grep | sed | cut...' with a single call to awk. For a simple script, most of the delay comes from loading these programs into memory, so it is much faster to do everything with one. This is ideal for something like an openbox pipe menu where you want to generate something on the fly. You can use awk to make a neat one-liner for some quick job in the terminal, or build an awk section into a shell script. You can find a lot of online tutorials, but here I will only show a few examples which cover most of a bioinformatician's daily uses of awk.</p><p>choose rows where column 3 is larger than column 5:</p><p>awk '$3&gt;$5' input.txt &gt; output.txt</p><p>extract columns 2, 4 and 5 (the second form writes tab-separated output):</p><p>awk '{print $2,$4,$5}' input.txt &gt; output.txt</p><p>awk 'BEGIN{OFS="\t"}{print $2,$4,$5}' input.txt</p><p>show rows 20 through 80:</p><p>awk 'NR&gt;=20&amp;&amp;NR&lt;=80' input.txt &gt; output.txt</p><p>calculate the average of column 2:</p><p>awk '{x+=$2}END{print x/NR}' input.txt</p><p>regex (egrep):</p><p>awk '/^test[0-9]+/' input.txt</p><p>calculate the sum of columns 2 and 3 and either append it to the row or replace the first column with it:</p><p>awk '{print $0,$2+$3}' input.txt</p><p>awk '{$1=$2+$3;print}' input.txt</p><p>join two files on column 1:</p><p>awk 'BEGIN{while((getline&lt;"file1.txt")&gt;0)l[$1]=$0}$1 in l{print $0"\t"l[$1]}' file2.txt &gt; output.txt</p><p>count the occurrences of each value in column 2 (like sort | uniq -c):</p><p>awk '{l[$2]++}END{for (x in l) print x,l[x]}' input.txt</p><p>apply "uniq" on column 2, only printing the first occurrence (uniq):</p><p>awk '!($2 in l){print;l[$2]=1}' input.txt</p><p>count the occurrences of each word:</p><p>awk '{for(i=1;i&lt;=NF;++i)c[$i]++}END{for (x in c) print x,c[x]}' input.txt</p><p>deal with simple CSV:</p><p>awk -F, '{print $1,$2}'</p><p>substitution (sed is simpler in this case):</p><p>awk '{sub(/test/, "no", $0);print}' input.txt</p><p>&nbsp;</p><p>For fuller explanations, see the resources below.</p><p>Two thorough tutorials:</p><p>http://www.gnu.org/software/gawk/manual/gawk.html</p><p>http://www.grymoire.com/Unix/Awk.html</p><p>A famous list of useful one-liners - though they're short, many are quite tricky:</p><p>http://www.pement.org/awk/awk1line.txt</p><p>And some nice explanations of those one-liners. After reading this you'll have a pretty good grasp!</p><p>http://www.catonmat.net/blog/awk-one-li &hellip; -part-one/</p><p>http://www.catonmat.net/blog/ten-awk-ti &hellip; -pitfalls/</p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36624/convert-vcf-to-tab-deilimited-table</guid>
	<pubDate>Tue, 15 May 2018 07:39:08 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36624/convert-vcf-to-tab-deilimited-table</link>
	<title><![CDATA[Convert VCF to tab-delimited table]]></title>
	<description><![CDATA[
<p>This conversion can be performed with the GATK VariantsToTable tool:</p>

<p>java -Xmx8g -jar GenomeAnalysisTK.jar \<br /> -T VariantsToTable \<br /> -R reference.fa \<br /> -V reference_genomes_GT.vcf \<br /> -F CHROM -F POS -F REF -F ALT -GF GT \<br /> -o reference_genomes_GT.table</p>
<p>multiple_sample.vcf should also be converted to multiple_sample_GT.table using the same approach.</p>
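<p>For a quick check without GATK, the same five columns can be approximated with plain awk. This is only a rough sketch under stated assumptions: the two-record VCF is made up for illustration, it holds a single sample in column 10, GT is assumed to be the first FORMAT subfield, and the genotype is printed as the raw code (for example 0/1), which may differ from how VariantsToTable renders it.</p>

```shell
# Tiny, made-up single-sample VCF (tab-separated), kept in a shell variable.
toy_vcf=$(
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\n'
  printf 'chr1\t250\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1/1:30\n'
)

# Print CHROM, POS, REF, ALT and the genotype of the first sample as a table.
vcf_table=$(printf '%s\n' "$toy_vcf" | awk -F'\t' '
  BEGIN { OFS = "\t"; print "CHROM", "POS", "REF", "ALT", "GT" }
  /^#/  { next }                  # skip meta-information and header lines
  {
    split($10, sample, ":")       # assumption: GT is the first FORMAT subfield
    print $1, $2, $4, $5, sample[1]
  }')

printf '%s\n' "$vcf_table"
```

<p>For real multi-sample files the GATK command remains the safer route, since this sketch ignores every sample column after the first.</p>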
]]></description>
	<dc:creator>Seema Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44320/tools-for-id-conversion</guid>
	<pubDate>Sat, 20 May 2023 21:53:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44320/tools-for-id-conversion</link>
	<title><![CDATA[Tools for id conversion]]></title>
	<description><![CDATA[<p><strong>g:Convert</strong><span>&nbsp;converts between various gene, protein, microarray probe and numerous other types of namespaces. It provides at least 40 types of IDs for more than 60 species. The 98 different namespaces supported for human include Ensembl, RefSeq, Illumina, Entrezgene and UniProt identifiers. All namespaces are matched using Ensembl gene identifiers as the reference.</span></p><p>Address of the bookmark: <a href="https://biit.cs.ut.ee/gprofiler/convert" rel="nofollow">https://biit.cs.ut.ee/gprofiler/convert</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38666/mcat-motif-combining-and-association-tool</guid>
	<pubDate>Sun, 13 Jan 2019 06:27:28 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38666/mcat-motif-combining-and-association-tool</link>
	<title><![CDATA[MCAT: Motif Combining and Association Tool]]></title>
	<description><![CDATA[<p>This is a pipeline for finding motifs in fasta files.<br>It can be run from the command line as follows:</p>
<p>usage: orange_pipeline_refine.py [-h] [-w W] [--nmotifs NMOTIFS] [--iter ITER] [-c C]<br>[-s S] [-d] [-ff] [-v V]<br>positive_seq negative_seq</p>
<p>positional arguments:<br>positive_seq the fasta file for the positive sequences<br>negative_seq the fasta file for the negative sequences</p>
<p>&nbsp;</p><p>Address of the bookmark: <a href="https://github.com/yanshen43/MCAT" rel="nofollow">https://github.com/yanshen43/MCAT</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27076/ale-a-generic-assembly-likelihood-evaluation-framework-for-assessing-the-accuracy-of-genome-and-metagenome-assemblies</guid>
	<pubDate>Tue, 26 Apr 2016 03:38:43 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27076/ale-a-generic-assembly-likelihood-evaluation-framework-for-assessing-the-accuracy-of-genome-and-metagenome-assemblies</link>
	<title><![CDATA[ALE: a Generic Assembly Likelihood Evaluation Framework for Assessing the Accuracy of Genome and Metagenome Assemblies]]></title>
	<description><![CDATA[<p>The Assembly Likelihood Evaluation (ALE) framework systematically evaluates the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. The framework is comprehensive and integrates read quality, mate pair orientation and insert length (for paired-end reads), sequencing coverage, read alignment and k-mer frequency. ALE pinpoints synthetic errors in both single and metagenomic assemblies, including single-base errors, insertions/deletions, genome rearrangements and chimeric assemblies present in metagenomes. At the genome level with real-world data, ALE identifies three large misassemblies from the Spirochaeta smaragdinae finished genome, which were all independently validated by Pacific Biosciences sequencing. At the single-base level with Illumina data, ALE recovers 215 of 222 (97%) single nucleotide variants in a training set from a GC-rich Rhodobacter sphaeroides genome. Using real Pacific Biosciences data, ALE identifies 12 of 12 synthetic errors in a Lambda Phage genome, surpassing even Pacific Biosciences' own variant caller, EviCons. In summary, the ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process.</p>
<p>More at&nbsp;http://www.ncbi.nlm.nih.gov/pubmed/23303509</p><p>Address of the bookmark: <a href="http://sc932.github.io/ALE/about.html" rel="nofollow">http://sc932.github.io/ALE/about.html</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42303/fqc-dashboard-integrates-fastqc-results-into-a-web-based-interactive-and-extensible-fastq-quality-control-tool</guid>
	<pubDate>Tue, 10 Nov 2020 01:30:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42303/fqc-dashboard-integrates-fastqc-results-into-a-web-based-interactive-and-extensible-fastq-quality-control-tool</link>
	<title><![CDATA[FQC Dashboard: Integrates FastQC results into a web-based, interactive, and extensible FASTQ quality control tool]]></title>
	<description><![CDATA[<p>FQC is software that facilitates quality control of FASTQ files by carrying out a QC protocol using FastQC, parsing results, and aggregating quality metrics into an interactive dashboard designed to richly summarize individual sequencing runs. The dashboard groups samples in dropdowns for navigation among the data sets, utilizes human-readable configuration files to manipulate the pages and tabs, and is extensible with CSV data.</p><p>Address of the bookmark: <a href="https://github.com/pnnl/fqc" rel="nofollow">https://github.com/pnnl/fqc</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32131/wgs-celera-assembler-version-83rc2</guid>
	<pubDate>Mon, 10 Apr 2017 04:45:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32131/wgs-celera-assembler-version-83rc2</link>
	<title><![CDATA[WGS Celera Assembler version 8.3rc2]]></title>
	<description><![CDATA[<p>These are release notes for Celera Assembler version 8.3rc2, which was released on May 24, 2015.<br><br>This distribution package provides a stable, tested, documented version of the software.&nbsp; The distribution is usable on most Unix-like platforms, and some platforms have pre-compiled binary distributions ready for installation.<br><br>The source code package includes full source code (revision 4627), Makefiles, and scripts.&nbsp; A subset of the kmer package (http://kmer.sourceforge.net/, version r1994), used by some modules of Celera Assembler, is included.&nbsp; This distribution includes SAMtools (http://samtools.sourceforge.net/), Jellyfish 2.0 (http://www.cbcb.umd.edu/software/jellyfish/), PBUTGCNS (https://github.com/pbjd/pbutgcns), PBDAGCON (https://github.com/PacificBiosciences/pbdagcon), BLASR (https://github.com/PacificBiosciences/BLASR), and parts of the Falcon assembler (https://github.com/PacificBiosciences/FALCON/tree/v0.1.3).<br><br>Full documentation can be found online at http://wgs-assembler.sourceforge.net/.</p>
<p>Programs and scripts found in its bin directory:</p>
<p>urbe@urbo214b[bin] ls&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; []<br>-rwxrwxr-x 1 urbe urbe&nbsp; 11K Apr 10 11:41 addCNSToStore<br>-rwxrwxr-x 1 urbe urbe 575K Apr 10 11:41 addReadsToUnitigs<br>-rwxrwxr-x 1 urbe urbe 128K Apr 10 11:41 analyzeBest<br>-rwxrwxr-x 1 urbe urbe 257K Apr 10 11:41 analyzePosMap<br>-rwxrwxr-x 1 urbe urbe 1,5M Apr 10 11:41 analyzeScaffolds<br>-rwxrwxr-x 1 urbe urbe 224K Apr 10 11:41 asmOutputFasta<br>-rwxrwxr-x 1 urbe urbe 448K Apr 10 11:41 asmOutputStatistics<br>-rwxrwxr-x 1 urbe urbe 2,4K Apr 10 11:41 asmToAGP.pl<br>-rwxrwxr-x 1 urbe urbe 7,6M Apr 10 11:41 blasr<br>-rwxrwxr-x 1 urbe urbe 1,6M Apr 10 11:41 bogart<br>-rwxrwxr-x 1 urbe urbe 183K Apr 10 11:41 bogus<br>-rwxrwxr-x 1 urbe urbe 272K Apr 10 11:41 bogusness<br>-rwxrwxr-x 1 urbe urbe 247K Apr 10 11:41 buildPosMap<br>-rwxrwxr-x 1 urbe urbe 213K Apr 10 11:41 buildRefContigs<br>-rwxrwxr-x 1 urbe urbe 990K Apr 10 11:41 buildUnitigs<br>-rwxrwxr-x 1 urbe urbe&nbsp; 18K Apr 10 11:41 ca2ace.pl<br>-rwxrwxr-x 1 urbe urbe&nbsp; 12K Apr 10 11:41 caqc_help.ini<br>-rwxrwxr-x 1 urbe urbe&nbsp; 61K Apr 10 11:41 caqc.pl<br>-rwxrwxr-x 1 urbe urbe&nbsp; 23K Apr 10 11:41 cat-corrects<br>-rwxrwxr-x 1 urbe urbe&nbsp; 24K Apr 10 11:41 cat-erates<br>-rwxrwxr-x 1 urbe urbe 1,9M Apr 10 11:41 cgw<br>-rwxrwxr-x 1 urbe urbe 1,4M Apr 10 11:41 cgwDump<br>-rwxrwxr-x 1 urbe urbe 204K Apr 10 11:41 chimChe<br>-rwxrwxr-x 1 urbe urbe 201K Apr 10 11:40 chimera<br>-rwxrwxr-x 1 urbe urbe 220K Apr 10 11:41 classifyMates<br>-rwxrwxr-x 1 urbe urbe 201K Apr 10 11:41 classifyMatesApply<br>-rwxrwxr-x 1 urbe urbe 215K Apr 10 11:41 classifyMatesPairwise<br>-rwxrwxr-x 1 urbe urbe 366K Apr 10 11:41 computeCoverageStat<br>-rwxrwxr-x 1 urbe urbe 
9,8K Apr 10 11:41 convert-fasta-to-v2.pl<br>-rwxrwxr-x 1 urbe urbe&nbsp; 48K Apr 10 11:41 convertOverlap<br>-rwxrwxr-x 1 urbe urbe 119K Apr 10 11:41 convertSamToCA<br>-rwxrwxr-x 1 urbe urbe&nbsp; 20K Apr 10 11:41 convertToPBCNS<br>-rwxrwxr-x 1 urbe urbe 197K Apr 10 11:41 correct-frags<br>-rwxrwxr-x 1 urbe urbe 259K Apr 10 11:41 correct-olaps<br>-rwxrwxr-x 1 urbe urbe 520K Apr 10 11:41 correctPacBio<br>-rwxrwxr-x 1 urbe urbe 540K Apr 10 11:41 ctgcns<br>-rwxrwxr-x 1 urbe urbe 162K Apr 10 11:40 deduplicate<br>-rwxrwxr-x 1 urbe urbe&nbsp; 37K Apr 10 11:41 demotePosMap<br>-rwxrwxr-x 1 urbe urbe 1,5M Apr 10 11:41 dumpCloneMiddles<br>-rwxrwxr-x 1 urbe urbe 124K Apr 10 11:41 dumpPBRLayoutStore<br>-rwxrwxr-x 1 urbe urbe 1,3M Apr 10 11:41 dumpSingletons<br>-rwxrwxr-x 1 urbe urbe 171K Apr 10 11:41 erate-estimate<br>-rwxrwxr-x 1 urbe urbe 221K Apr 10 11:40 estimate-mer-threshold<br>-rwxrwxr-x 1 urbe urbe 1,5M Apr 10 11:41 extendClearRanges<br>-rwxrwxr-x 1 urbe urbe 1,3M Apr 10 11:41 extendClearRangesPartition<br>-rwxrwxr-x 1 urbe urbe 205K Apr 10 11:40 extractmessages<br>-rwxrwxr-x 1 urbe urbe 7,2M Apr 10 11:41 falcon_sense<br>-rwxrwxr-x 1 urbe urbe 9,8K Apr 10 11:41 fastaToCA<br>-rwxrwxr-x 1 urbe urbe 124K Apr 10 11:40 fastqAnalyze<br>-rwxrwxr-x 1 urbe urbe 137K Apr 10 11:40 fastqSample<br>-rwxrwxr-x 1 urbe urbe&nbsp; 62K Apr 10 11:40 fastqSimulate<br>-rwxrwxr-x 1 urbe urbe 121K Apr 10 11:40 fastqSimulate-sort<br>-rwxrwxr-x 1 urbe urbe 246K Apr 10 11:40 fastqToCA<br>-rwxrwxr-x 1 urbe urbe 140K Apr 10 11:41 filterOverlap<br>-rwxrwxr-x 1 urbe urbe 341K Apr 10 11:40 finalTrim<br>-rwxrwxr-x 1 urbe urbe 228K Apr 10 11:41 fixUnitigs<br>-rwxrwxr-x 1 urbe urbe 147K Apr 10 11:40 fragmentDepth<br>-rwxrwxr-x 1 urbe urbe&nbsp; 29K Apr 10 11:41 fragsInVars<br>-rwxrwxr-x 1 urbe urbe 545K Apr 10 11:41 frgs2clones<br>-rwxrwxr-x 1 urbe urbe 398K Apr 10 11:40 gatekeeper<br>-rwxrwxr-x 1 urbe urbe 139K Apr 10 11:40 gatekeeperbench<br>-rwxrwxr-x 1 urbe urbe 167K Apr 10 11:40 
gkpStoreCreate<br>-rwxrwxr-x 1 urbe urbe 147K Apr 10 11:40 gkpStoreDumpFASTQ<br>-rwxrwxr-x 1 urbe urbe 184K Apr 10 11:41 greedyFragmentTiling<br>-rwxrwxr-x 1 urbe urbe 1,6K Apr 10 11:41 greedy_layout_to_IUM<br>-rwxrwxr-x 1 urbe urbe 142K Apr 10 11:40 initialTrim<br>-rwxrwxr-x 1 urbe urbe 967K Apr 10 11:41 jellyfish<br>-rwxrwxr-x 1 urbe urbe 219K Apr 10 11:41 markRepeatUnique<br>-rwxrwxr-x 1 urbe urbe 273K Apr 10 11:40 markUniqueUnique<br>-rwxrwxr-x 1 urbe urbe 114K Apr 10 11:40 mercy<br>-rwxrwxr-x 1 urbe urbe 3,8K Apr 10 11:41 mergeqc.pl<br>-rwxrwxr-x 1 urbe urbe 422K Apr 10 11:40 merTrim<br>-rwxrwxr-x 1 urbe urbe 125K Apr 10 11:40 merTrimApply<br>-rwxrwxr-x 1 urbe urbe 376K Apr 10 11:40 meryl<br>-rwxrwxr-x 1 urbe urbe 176K Apr 10 11:41 metagenomics_ovl_analyses<br>-rwxrwxr-x 1 urbe urbe 297K Apr 10 11:41 olap-from-seeds<br>-rwxrwxr-x 1 urbe urbe 275K Apr 10 11:41 outputLayout<br>-rwxrwxr-x 1 urbe urbe 229K Apr 10 11:41 overlapInCore<br>-rwxrwxr-x 1 urbe urbe 144K Apr 10 11:40 overlap_partition<br>-rwxrwxr-x 1 urbe urbe 179K Apr 10 11:41 overlapStats<br>-rwxrwxr-x 1 urbe urbe 179K Apr 10 11:41 overlapStore<br>-rwxrwxr-x 1 urbe urbe 153K Apr 10 11:41 overlapStoreBucketizer<br>-rwxrwxr-x 1 urbe urbe 175K Apr 10 11:41 overlapStoreBuild<br>-rwxrwxr-x 1 urbe urbe&nbsp; 33K Apr 10 11:41 overlapStoreIndexer<br>-rwxrwxr-x 1 urbe urbe&nbsp; 48K Apr 10 11:41 overlapStoreSorter<br>-rwxrwxr-x 1 urbe urbe 604K Apr 10 11:40 overmerry<br>lrwxrwxrwx 1 urbe urbe&nbsp;&nbsp;&nbsp; 4 Apr 10 11:41 pacBioToCA -&gt; PBcR<br>-rwxrwxr-x 1 urbe urbe 131K Apr 10 11:41 PBcR<br>-rwxrwxr-x 1 urbe urbe 2,9M Apr 10 11:41 pbdagcon<br>-rwxrwxr-x 1 urbe urbe 1,9M Apr 10 11:41 pbutgcns<br>-rwxrwxr-x 1 urbe urbe 201K Apr 10 11:40 remove_fragment<br>-rwxrwxr-x 1 urbe urbe 153K Apr 10 11:40 removeMateOverlap<br>-rwxrwxr-x 1 urbe urbe 2,5K Apr 10 11:41 replaceUIDwithName-fastq<br>-rwxrwxr-x 1 urbe urbe 1,2K Apr 10 11:41 replaceUIDwithName-posmap<br>-rwxrwxr-x 1 urbe urbe 1,3M Apr 10 11:41 
resolveSurrogates<br>-rwxrwxr-x 1 urbe urbe 139K Apr 10 11:41 rewriteCache<br>-rwxrwxr-x 1 urbe urbe 232K Apr 10 11:41 runCA<br>-rwxrwxr-x 1 urbe urbe&nbsp; 88K Apr 10 11:41 runCA-dedupe<br>-rwxrwxr-x 1 urbe urbe&nbsp; 14K Apr 10 11:41 runCA-overlapStoreBuild<br>-rwxrwxr-x 1 urbe urbe 3,6K Apr 10 11:41 run_greedy.csh<br>-rwxrwxr-x 1 urbe urbe 297K Apr 10 11:40 sffToCA<br>-rwxrwxr-x 1 urbe urbe&nbsp; 13K Apr 10 11:40 show-corrects<br>-rwxrwxr-x 1 urbe urbe 557K Apr 10 11:41 splitUnitigs<br>-rwxrwxr-x 1 urbe urbe 1,4M Apr 10 11:41 terminator<br>drwxrwxr-x 2 urbe urbe 4,0K Apr 10 11:41 TIGR<br>-rwxrwxr-x 1 urbe urbe 526K Apr 10 11:41 tigStore<br>-rwxrwxr-x 1 urbe urbe&nbsp; 35K Apr 10 11:41 tracearchiveToCA<br>-rwxrwxr-x 1 urbe urbe&nbsp; 35K Apr 10 11:41 tracedb-to-frg.pl<br>-rwxrwxr-x 1 urbe urbe&nbsp; 44K Apr 10 11:41 trimFastqByQVWindow<br>-rwxrwxr-x 1 urbe urbe&nbsp; 18K Apr 10 11:40 uidclient<br>-rwxrwxr-x 1 urbe urbe 589K Apr 10 11:41 unitigger<br>-rwxrwxr-x 1 urbe urbe&nbsp; 42K Apr 10 11:40 upgrade-v8-to-v9<br>-rwxrwxr-x 1 urbe urbe&nbsp; 42K Apr 10 11:40 upgrade-v9-to-v10<br>-rwxrwxr-x 1 urbe urbe&nbsp; 854 Apr 10 11:41 utg2fasta<br>-rwxrwxr-x 1 urbe urbe 731K Apr 10 11:41 utgcns<br>-rwxrwxr-x 1 urbe urbe 561K Apr 10 11:41 utgcnsfix<br><br><br></p><p>Address of the bookmark: <a href="http://wgs-assembler.sourceforge.net/wiki/index.php/Main_Page" rel="nofollow">http://wgs-assembler.sourceforge.net/wiki/index.php/Main_Page</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37205/afterqc-automatic-filtering-trimming-error-removing-and-quality-control-for-fastq-data</guid>
	<pubDate>Fri, 29 Jun 2018 03:26:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37205/afterqc-automatic-filtering-trimming-error-removing-and-quality-control-for-fastq-data</link>
	<title><![CDATA[AfterQC: Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data]]></title>
	<description><![CDATA[Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data
AfterQC goes through all FASTQ files in a folder and writes three output folders, good, bad and QC, which contain the good reads, the bad reads and the QC results of each FASTQ file or pair.
Currently it supports processing data from HiSeq 2000/2500/3000/4000, NextSeq 500/550, MiniSeq, and other Illumina 1.8 or newer formats
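
As a toy illustration of the good/bad split described above (this is not AfterQC's actual algorithm), the sketch below labels reads by their mean Phred+33 quality using plain awk. The two reads, the cutoff of 20 and the variable names are assumptions made for the example; AfterQC's own filters are more involved.

```shell
# Two made-up FASTQ records (4 lines each) with Phred+33 qualities:
# the character I encodes Phred 40 and the character # encodes Phred 2.
toy_fastq=$(
  printf '@read1\nACGT\n+\nIIII\n'
  printf '@read2\nACGT\n+\n####\n'
)

# Label each read good or bad by comparing its mean quality with the cutoff.
fastq_split=$(printf '%s\n' "$toy_fastq" | awk -v cutoff=20 '
  BEGIN { for (i = 126; i >= 33; i--) ord[sprintf("%c", i)] = i }  # ASCII code lookup
  NR % 4 == 1 { name = substr($0, 2) }                             # header line: drop the leading "@"
  NR % 4 == 0 {                                                    # quality line
    n = length($0)
    if (n == 0) next                                               # skip empty quality strings
    total = 0
    for (j = n; j >= 1; j--) total += ord[substr($0, j, 1)] - 33   # Phred+33 decoding
    label = (total / n >= cutoff) ? "good" : "bad"
    print name, label, total / n
  }')

printf '%s\n' "$fastq_split"
```

Each output line gives the read name, its label and its mean quality, so the labels could then drive which folder a read is written to.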

The author has reimplemented this tool in C++ with multithreading support to make it much faster. The new tool is called fastp and can be found at: https://github.com/OpenGene/fastp . If you prefer a C++ based tool, please use fastp instead.

<p>Address of the bookmark: <a href="https://github.com/OpenGene/AfterQC" rel="nofollow">https://github.com/OpenGene/AfterQC</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35534/awk-for-bioinformatician-and-computational-biologist</guid>
	<pubDate>Tue, 06 Feb 2018 14:54:35 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35534/awk-for-bioinformatician-and-computational-biologist</link>
	<title><![CDATA[Awk for Bioinformaticians and Computational Biologists]]></title>
	<description><![CDATA[<p>Awk is a programming language that allows easy manipulation of structured data and is mostly used for pattern scanning and processing. It searches one or more files for lines that match the specified patterns and then performs the associated actions. The basic syntax is:</p><blockquote><p><br />awk '/pattern1/ {Actions}<br /> /pattern2/ {Actions}' file</p></blockquote><p><br />Awk works as follows:<br />Awk reads the input files one line at a time.<br />For each line, it tests the patterns in the given order and, for every pattern that matches, performs the corresponding action.<br />If no pattern matches, no action is performed.<br />In the above syntax, either the search pattern or the action is optional, but not both.<br />If the search pattern is not given, Awk performs the given actions for each line of the input.<br />If the action is not given, Awk prints every line that matches the given pattern, which is the default action.<br />Empty braces without any action do nothing; they do not trigger the default printing.<br />Statements within Actions should be separated by semicolons.<br />Say you have data/test.tsv with the following contents:</p><p><br />$ cat data/test.tsv<br />contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2 ACTTTATATATT<br />contig3 ACTTATATATATATA<br />contig4 ACTTATATATATATA<br />contig5 ACTTTATATATT <br />With no pattern given, the print action runs on every line of the file.</p><p><br />$ awk '{print;}' data/test.tsv<br />contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2 ACTTTATATATT<br />contig3 ACTTATATATATATA<br />contig4 ACTTATATATATATA<br />contig5 ACTTTATATATT <br />To print only the line that matches the pattern contig3:</p><p><br />$ awk '/contig3/' data/test.tsv<br />contig3 ACTTATATATATATA<br />Awk has a number of built-in variables. For each record (that is, each line), it splits the record on whitespace by default and stores the fields in the $n variables.
If the line has 5 words, they are stored in $1, $2, $3, $4 and $5. $0 represents the whole line. NF is a built-in variable that holds the total number of fields in a record.</p><p><br />$ awk '{print $1","$2;}' data/test.tsv<br />contig1,ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2,ACTTTATATATT<br />contig3,ACTTATATATATATA<br />contig4,ACTTATATATATATA<br />contig5,ACTTTATATATT</p><p>$ awk '{print $1","$NF;}' data/test.tsv<br />contig1,ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2,ACTTTATATATT<br />contig3,ACTTATATATATATA<br />contig4,ACTTATATATATATA<br />contig5,ACTTTATATATT</p><p><br />Awk has two special patterns, specified by the keywords BEGIN and END. The syntax is as follows:</p><blockquote><p>BEGIN { Actions before reading the file}<br />{Actions for every line in the file} <br />END { Actions after reading the file }</p></blockquote><p><br />For example,<br />$ awk 'BEGIN{print "Header,Sequence"}{print $1","$2;}END{print "-------"}' data/test.tsv<br />Header,Sequence<br />contig1,ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2,ACTTTATATATT<br />contig3,ACTTATATATATATA<br />contig4,ACTTATATATATATA<br />contig5,ACTTTATATATT<br />------- <br />We can also use the conditional operator in a print statement, in the form print CONDITION ? PRINT_IF_TRUE_TEXT : PRINT_IF_FALSE_TEXT. For example, in the code below, we identify sequences with lengths &gt; 14:</p><p>$ awk '{print (length($2)&gt;14) ?
$0"&gt;14" : $0"&lt;=14";}' data/test.tsv<br />contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG&gt;14<br />contig2 ACTTTATATATT&lt;=14<br />contig3 ACTTATATATATATA&gt;14<br />contig4 ACTTATATATATATA&gt;14<br />contig5 ACTTTATATATT&lt;=14<br />We can also put 1 after the last block {} to print everything (1 is shorthand for {print $0}, which is the same as {print} because print without arguments prints $0). Within a block we can also change $0; for example, to assign the first field to $0 on the third line (NR==3), we can use:</p><p>$ awk 'NR==3{$0=$1}1' data/test.tsv<br />contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2 ACTTTATATATT<br />contig3<br />contig4 ACTTATATATATATA<br />contig5 ACTTTATATATT<br />You can have as many blocks as you want, and they are executed on each line in the order they appear. For example, to print $1 three times (using printf instead of print because printf does not append an end-of-line character):</p><p>$ awk '{printf "%s\t", $1}{printf "%s\t", $1}{print $1}' data/test.tsv<br />contig1 contig1 contig1<br />contig2 contig2 contig2<br />contig3 contig3 contig3<br />contig4 contig4 contig4<br />contig5 contig5 contig5 <br />We can also skip the remaining blocks for a given line with the next keyword:</p><p>$ awk '{printf "%s\t", $1}NR==3{print "";next}{print $1}' data/test.tsv<br />contig1 contig1<br />contig2 contig2<br />contig3 <br />contig4 contig4<br />contig5 contig5</p><p>$ awk 'NR==3{print "";next}{printf "%s\t", $1}{print $1}' data/test.tsv<br />contig1 contig1<br />contig2 contig2</p><p>contig4 contig4<br />contig5 contig5<br />You can also use getline to load the contents of another file in addition to the one you are reading. For example, in the statement below, the while loop loads each line from test.tsv into k until there are no more lines to read:</p><p>$ awk 'BEGIN{while((getline k &lt;"data/test.tsv")&gt;0) print "BEGIN:"k}{print}' data/test.tsv<br />BEGIN:contig1
ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />BEGIN:contig2 ACTTTATATATT<br />BEGIN:contig3 ACTTATATATATATA<br />BEGIN:contig4 ACTTATATATATATA<br />BEGIN:contig5 ACTTTATATATT<br />contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG<br />contig2 ACTTTATATATT<br />contig3 ACTTATATATATATA<br />contig4 ACTTATATATATATA<br />contig5 ACTTTATATATT <br />You can also store data in memory with the syntax VARIABLE_NAME[KEY]=VALUE and later iterate over it with for (INDEX in VARIABLE_NAME). Note that awk does not guarantee the order in which the keys are visited:</p><p>$ awk '{i[$1]=1}END{for (j in i) print j"&lt;="i[j]}' data/test.tsv<br />contig1&lt;=1<br />contig2&lt;=1<br />contig3&lt;=1<br />contig4&lt;=1<br />contig5&lt;=1</p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>

</channel>
</rss>