<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/27318?offset=690</link>
	<atom:link href="https://bioinformaticsonline.com/related/27318?offset=690" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/1295/five-points-for-bioinformatics-softwaretools</guid>
	<pubDate>Mon, 05 Aug 2013 04:12:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/1295/five-points-for-bioinformatics-softwaretools</link>
	<title><![CDATA[Five points for bioinformatics software/tools]]></title>
	<description><![CDATA[<p><span>In bioinformatics we spend most of our time on computational analysis of huge amounts of data, trying to make biological sense of it. But most newbie bioinformaticians face a dilemma when they receive biological sequence data for the first time: they are confused about whether to choose open source tools, user-friendly GUI programs, or commercial bioinformatics software. Don&rsquo;t be surprised; the decision is genuinely hard, because the analysis step is the most crucial part of a project and is widely believed to be the biggest bottleneck in publishing papers in high-impact journals. In this blog I would like to address the pros and cons of both kinds of software/tools and try to assist you (hmm, not really; it is more like convincing you) in making your software selection.</span></p><p><span><img src="http://bioinformaticsonline.com/mod/photo/five.jpg" alt="image" style="border: 0px;"></span></p><p><span>The most common newbie questions are:</span></p><p><span>Should I try to use these free open source programs? &nbsp;Why are we not trying GUI software for computational analysis? Should I use commercial bioinformatics programs/software?</span><span><br /></span><span><br />1. Let&rsquo;s be open</span></p><p><span>We generally think that free or cheap means useless. That idea does not apply to open source software. Most bioinformatics software is developed by highly competitive biological programmers who believe in the open sharing of knowledge. They come under the Open Bioinformatics Foundation (O|B|F), a non-profit, volunteer-run organization focused on supporting open source programming in bioinformatics. The best part about open source tools/software is that you are free to download the source code and read exactly what the program does. If you are so inclined, you can view all of the parts of the program and see the logical flow of the pipeline. 
In addition, open source makes an excellent learning tool for any beginning bioinformatician. Moreover, you can modify existing open source programs to deal with cutting-edge problems or to customize your pipeline.</span><span>&nbsp;</span><span>Beyond your own computational analysis, most reviewers also prefer results produced with open source tools, so that they can validate the results if validation is required.</span></p><p><span>2. Code headache</span></p><p><span>As a bioinformatician you are supposed to know the basics of programming, and if you are not good at it, then please learn it as soon as possible, because you are not a bio-analyst but a biological programmer. Open source programs usually lack dedicated service and support teams (often because they are the product of an overworked PhD student or postdoc!), so you are responsible for troubleshooting your own errors most of the time.<span>&nbsp;</span>We commonly receive HELP emails asking for support in setting up a pipeline; you can also find this kind of request on any Q&amp;A forum. I personally believe this coding horror is the biggest downside of open source programs: you need some programming skills in order to integrate a program into your pipeline. But if you are not able to fix the pipeline and modify the open source code according to your requirements, then you should rethink your bioinformatician name tag!</span></p><p><span>3. Dive into the code</span></p><p><span>Some biologists-turned-bioinformaticians say, &ldquo;If you can do the same thing with commercial software, then why get a migraine from weird code?&rdquo; To me this sounds like people who are keen to learn swimming but still don&rsquo;t want to get wet. 
If you are still using paid software, relying on customer support, and clicking well-designed GUI buttons, then perhaps you are not interested in learning and trying new and challenging bioinformatics work. You are missing the basic flavour of bioinformatics. Let&rsquo;s dive into the coding world; I am sure you will enjoy it. I recommend that you swim freely in the sea of code and enjoy the journey; do not merely watch it from the outside.</span></p><p><span>4. Paid does not mean better</span></p><p><span>Companies that specialize in bioinformatics solutions develop well-designed, well-packaged, user-friendly software by employing a large number of specialised scientists, programmers and support staff. They also provide good services to help you accomplish your biological analysis work. This means that if you hit a &lsquo;snag&rsquo; with your data, help is likely only a phone call away! These companies price their products competitively against the cost of a dedicated bioinformatician. You may be able to afford the program, but not the additional staff! Additionally, most of the functionality that you need in your analysis is already coded into the program. Need to plot a graph? Just click this button right here. It is that easy.</span><span>&nbsp;</span><span>But for a bioinformatician this is generally not an encouraged approach to biological analysis work, because the software is not available to everyone and so your results can&rsquo;t be validated by others. Moreover, there is very little chance that anyone will repeat your work or take up similar research (because not all the labs in the world are as rich as yours).</span></p><p><span>5. Take caution<br /><br />Biological analysis work, in which you deal with GB/TB of data, carries a high risk of errors, so please be careful and always cross-check your data before coming to any conclusion. 
Even an error in a two-line piece of code can alter your entire analysis and produce weird results. Some scientists blindly trust commercial software, which is entirely wrong. Using proprietary tools does not absolve you of the need to actually read and research the type of analysis that you are doing. This is particularly true in the case of genome assembly and annotation.</span></p><p><span><br />In the end, I would like to say only one thing: open source solutions allow you to do more cutting-edge analysis than commercial tools. So let&rsquo;s go for it.</span></p><p>Disclaimer:</p><p>This is my personal view. I have nothing to do with any company or open source community.&nbsp;The views expressed on these pages are mine alone and not those of my current/past employers. I do reserve the right to remove comments left by spammers or off-topic comments.</p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35525/linux-commands-cheat-sheet-for-bioinformatics-and-computational-biology-professionals</guid>
	<pubDate>Mon, 05 Feb 2018 18:50:41 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35525/linux-commands-cheat-sheet-for-bioinformatics-and-computational-biology-professionals</link>
	<title><![CDATA[Linux Commands Cheat Sheet for Bioinformatics and Computational Biology Professionals]]></title>
	<description><![CDATA[<p><span>The purpose of this cheat sheet is to introduce biologists and bioinformaticians to the Linux commands frequently used in NGS analysis, and to give them practice in writing one-liners.</span></p><ul>
<li><span></span><span><strong>File System</strong></span><span><strong><br /> </strong></span><span>ls</span><span>&nbsp;&mdash; list items in current directory</span><span><br /> </span><span>ls -l</span><span>&nbsp;&mdash; list items in current directory and show in long format to see permissions, size, and modification date</span><span><br /> </span><span>ls -a</span><span>&nbsp;&mdash; list all items in current directory, including hidden files</span><span><br /> </span><span>ls -F</span><span>&nbsp;&mdash; list all items in current directory and show directories with a slash and executables with a star</span><span><br /> </span><span>ls dir</span><span>&nbsp;&mdash; list all items in directory dir</span><span><br /> </span><span>cd dir</span><span>&nbsp;&mdash; change directory to dir</span><span><br /> </span><span>cd ..</span><span>&nbsp;&mdash; go up one directory</span><span><br /> </span><span>cd /</span><span>&nbsp;&mdash; go to the root directory</span><span><br /> </span><span>cd ~</span><span>&nbsp;&mdash; go to your home directory</span><span><br /> </span><span>cd -</span><span>&nbsp;&mdash; go to the last directory you were just in</span><span><br /> </span><span>pwd</span><span>&nbsp;&mdash; show present working directory</span><span><br /> </span><span>mkdir dir</span><span>&nbsp;&mdash; make directory dir</span><span><br /> </span><span>rm file</span><span>&nbsp;&mdash; remove file</span><span><br /> </span><span>rm -r dir</span><span>&nbsp;&mdash; remove directory dir recursively</span><span><br /> </span><span>cp file1 file2</span><span>&nbsp;&mdash; copy file1 to file2</span><span><br /> </span><span>cp -r dir1 dir2</span><span>&nbsp;&mdash; copy directory dir1 to dir2 recursively</span><span><br /> </span><span>mv file1 file2</span><span>&nbsp;&mdash; move (rename) file1 to file2</span><span><br /> </span><span>ln -s file link</span><span>&nbsp;&mdash; create symbolic link to file</span><span><br /> </span><span>touch 
file</span><span>&nbsp;&mdash; create or update file</span><span><br /> </span><span>cat file</span><span>&nbsp;&mdash; output the contents of file</span><span><br /> </span><span>less file</span><span>&nbsp;&mdash; view file with page navigation</span><span><br /> </span><span>head file</span><span>&nbsp;&mdash; output the first 10 lines of file</span><span><br /> </span><span>tail file</span><span>&nbsp;&mdash; output the last 10 lines of file</span><span><br /> </span><span>tail -f file</span><span>&nbsp;&mdash; output the contents of file as it grows, starting with the last 10 lines</span><span><br /> </span><span>vim file</span><span>&nbsp;&mdash; edit file</span><span><br /> </span><span>alias name='command'</span><span>&nbsp;&mdash; create an alias for a command (bash syntax)</span><span><br /> </span></li>
<li><span></span><span><strong>System</strong></span><span><strong><br /> </strong></span><span>shutdown</span><span>&nbsp;&mdash; shut down machine</span><span><br /> </span><span>reboot</span><span>&nbsp;&mdash; restart machine</span><span><br /> </span><span>date</span><span>&nbsp;&mdash; show the current date and time</span><span><br /> </span><span>whoami</span><span>&nbsp;&mdash; who you are logged in as</span><span><br /> </span><span>finger user</span><span>&nbsp;&mdash; display information about user</span><span><br /> </span><span>man command</span><span>&nbsp;&mdash; show the manual for command</span><span><br /> </span><span>df</span><span>&nbsp;&mdash; show disk usage</span><span><br /> </span><span>du</span><span>&nbsp;&mdash; show directory space usage</span><span><br /> </span><span>free</span><span>&nbsp;&mdash; show memory and swap usage</span><span><br /> </span><span>whereis app</span><span>&nbsp;&mdash; show possible locations of app</span><span><br /> </span><span>which app</span><span>&nbsp;&mdash; show which app will be run by default</span><span><br /> </span></li>
<li><span></span><span><strong>Process Management</strong></span><span><strong><br /> </strong></span><span>ps</span><span>&nbsp;&mdash; display your currently active processes</span><span><br /> </span><span>top</span><span>&nbsp;&mdash; display all running processes</span><span><br /> </span><span>kill pid</span><span>&nbsp;&mdash; kill process id pid</span><span><br /> </span><span>kill -9 pid</span><span>&nbsp;&mdash; force kill process id pid</span><span><br /> </span></li>
<li><span></span><span><strong>Permissions</strong></span><span><strong><br /> </strong></span><span>ls -l</span><span>&nbsp;&mdash; list items in current directory and show permissions</span><span><br /> </span><span>chmod ugo file</span><span>&nbsp;&mdash; change permissions of file to ugo - u is the user's permissions, g is the group's permissions, and o is everyone else's permissions. The values of u, g, and o can be any number between 0 and 7.</span><span><br /> </span><span>7</span><span>&nbsp;&mdash; full permissions</span><span><br /> </span><span>6</span><span>&nbsp;&mdash; read and write only</span><span><br /> </span><span>5</span><span>&nbsp;&mdash; read and execute only</span><span><br /> </span><span>4</span><span>&nbsp;&mdash; read only</span><span><br /> </span><span>3</span><span>&nbsp;&mdash; write and execute only</span><span><br /> </span><span>2</span><span>&nbsp;&mdash; write only</span><span><br /> </span><span>1</span><span>&nbsp;&mdash; execute only</span><span><br /> </span><span>0</span><span>&nbsp;&mdash; no permissions</span><span><br /> </span><span>chmod 600 file</span><span>&nbsp;&mdash; you can read and write - good for files</span><span><br /> </span><span>chmod 700 file</span><span>&nbsp;&mdash; you can read, write, and execute - good for scripts</span><span><br /> </span><span>chmod 644 file</span><span>&nbsp;&mdash; you can read and write, and everyone else can only read - good for web pages</span><span><br /> </span><span>chmod 755 file</span><span>&nbsp;&mdash; you can read, write, and execute, and everyone else can read and execute - good for programs that you want to share</span><span><br /> </span></li>
<li><span></span><span><strong>Networking</strong></span><span><strong><br /> </strong></span><span>wget url</span><span>&nbsp;&mdash; download the file at url</span><span><br /> </span><span>curl -O url</span><span>&nbsp;&mdash; download the file at url and save it under its remote name (without -O, curl writes to standard output)</span><span><br /> </span><span>scp user@host:file dir</span><span>&nbsp;&mdash; secure copy a file from remote server to the dir directory on your machine</span><span><br /> </span><span>scp file user@host:dir</span><span>&nbsp;&mdash; secure copy a file from your machine to the dir directory on a remote server</span><span><br /> </span><span>scp -r user@host:dir dir</span><span>&nbsp;&mdash; secure copy the directory dir from remote server to the directory dir on your machine</span><span><br /> </span><span>ssh user@host</span><span>&nbsp;&mdash; connect to host as user</span><span><br /> </span><span>ssh -p port user@host</span><span>&nbsp;&mdash; connect to host on port as user</span><span><br /> </span><span>ssh-copy-id user@host</span><span>&nbsp;&mdash; add your key to host for user to enable a keyed or passwordless login</span><span><br /> </span><span>ping host</span><span>&nbsp;&mdash; ping host and output results</span><span><br /> </span><span>whois domain</span><span>&nbsp;&mdash; get information for domain</span><span><br /> </span><span>dig domain</span><span>&nbsp;&mdash; get DNS information for domain</span><span><br /> </span><span>dig -x host</span><span>&nbsp;&mdash; reverse lookup host</span><span><br /> </span><span>lsof -i tcp:1337</span><span>&nbsp;&mdash; list all processes running on port 1337</span><span><br /> </span></li>
<li><span></span><span><strong>Searching</strong></span><span><strong><br /> </strong></span><span>grep pattern files</span><span>&nbsp;&mdash; search for pattern in files</span><span><br /> </span><span>grep -r pattern dir</span><span>&nbsp;&mdash; search recursively for pattern in dir</span><span><br /> </span><span>grep -rn pattern dir</span><span>&nbsp;&mdash; search recursively for pattern in dir and show the line number found</span><span><br /> </span><span>grep -r pattern dir --include='*.ext'</span><span>&nbsp;&mdash; search recursively for pattern in dir and only search in files with .ext extension</span><span><br /> </span><span>command | grep pattern</span><span>&nbsp;&mdash; search for pattern in the output of command</span><span><br /> </span><span>find / -name file</span><span>&nbsp;&mdash; find all instances of file by walking the real file system from the root directory</span><span><br /> </span><span>locate file</span><span>&nbsp;&mdash; find all instances of file using the indexed database built by the updatedb command. Much faster than find, but only as current as the last index update</span><span><br /> </span><span>sed -i 's/day/night/g' file</span><span>&nbsp;&mdash; find all occurrences of day in a file and replace them with night - s means substitute and g means global - sed also supports regular expressions</span><span><br /> </span></li>
<li><span></span><span><strong>Compression</strong></span><span><strong><br /> </strong></span><span>tar cf file.tar files</span><span>&nbsp;&mdash; create a tar named file.tar containing files</span><span><br /> </span><span>tar xf file.tar</span><span>&nbsp;&mdash; extract the files from file.tar</span><span><br /> </span><span>tar czf file.tar.gz files</span><span>&nbsp;&mdash; create a tar with Gzip compression</span><span><br /> </span><span>tar xzf file.tar.gz</span><span>&nbsp;&mdash; extract a tar using Gzip</span><span><br /> </span><span>gzip file</span><span>&nbsp;&mdash; compresses file and renames it to file.gz</span><span><br /> </span><span>gzip -d file.gz</span><span>&nbsp;&mdash; decompresses file.gz back to file</span><span><br /> </span></li>
<li><span></span><span><strong>Shortcuts</strong></span><span><strong><br /> </strong></span><span>ctrl+a</span><span>&nbsp;&mdash; move cursor to beginning of line</span><span><br /> </span><span>ctrl+e</span><span>&nbsp;&mdash; move cursor to end of line</span><span><br /> </span><span>alt+f</span><span>&nbsp;&mdash; move cursor forward 1 word</span><span><br /> </span><span>alt+b</span><span>&nbsp;&mdash; move cursor backward 1 word</span><span><br /> </span></li>
</ul>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/1469/prime-minister%E2%80%99s-100k-genome-project</guid>
	<pubDate>Thu, 08 Aug 2013 09:40:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/1469/prime-minister%E2%80%99s-100k-genome-project</link>
	<title><![CDATA[Prime Minister’s 100k Genome Project]]></title>
	<description><![CDATA[<p>Genomics England is set to sequence the genomes of 100,000 patients in England over the next five years.&nbsp; It is a landmark project by the British government.</p><p>Genomics England will play a key role in building on the UK&rsquo;s long track record as a leader in medical science advances to push the boundaries by unlocking the power of DNA data. The UK will become the first ever country to introduce this technology in its mainstream health system &ndash; leading the global race for better tests, better drugs and above all better, more personalised care.</p><p>http://www.genomicsengland.co.uk/100k-genome-project/</p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35983/some-useful-bioinformatics-links</guid>
	<pubDate>Fri, 16 Mar 2018 20:50:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35983/some-useful-bioinformatics-links</link>
	<title><![CDATA[Some useful Bioinformatics links]]></title>
	<description><![CDATA[<p><br /> Reference-free prediction of rearrangement breakpoint reads | Bioinformatics | Oxford Academic</p><p>https://academic.oup.com/bioinformatics/article/30/18/2559/2475628<br /> Reference-free SNP detection: dealing with the data deluge</p><p>https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083407/<br /> GATB/DiscoSnp: DiscoSnp is designed for discovering all kinds of SNPs (not only isolated ones), as well as insertions and deletions, from raw set(s) of reads.</p><p>https://github.com/GATB/DiscoSnp<br /> De novo assembly | Oxford Nanopore Technologies</p><p>https://nanoporetech.com/taxonomy/term/131<br /> De novo long-read assembly of a complex animal genome | bioRxiv</p><p>https://www.biorxiv.org/content/early/2017/09/10/187054<br /> Rapid de novo assembly of the European eel genome from nanopore sequencing reads | Scientific Reports</p><p>https://www.nature.com/articles/s41598-017-07650-6.epdf?author_access_token=dktG7e98wyRJnaEEMTcPqtRgN0jAjWel9jnR3ZoTv0P7E7t-wVGo30iojNO7dICajNY_7PE5xVPv6OoLe7hn9TeUjcZ5umREOzNoPMWkfYH58RS6uxm3vm4e4BG2AA_WKW84i6egKK271NwMq-NfzA%3D%3D<br /> nanoporetech/ont-assembly-polish: ONT assembly and Illumina polishing pipeline</p><p>https://github.com/nanoporetech/ont-assembly-polish<br /> Generade-nl/TULIP: TULIP - The Uncorrected Long read Itegration Pipeline</p><p>https://github.com/Generade-nl/TULIP<br /> www.nature.com</p><p>https://www.nature.com/articles/s41598-017-03996-z<br /> Example gallery of NanoPlot &ndash; Gigabase or gigabyte</p><p>https://gigabaseorgigabyte.wordpress.com/2017/06/01/example-gallery-of-nanoplot/<br /> Tool documentation</p><p>https://broadinstitute.github.io/picard/command-line-overview.html<br /> Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. 
- PubMed - NCBI</p><p>https://www.ncbi.nlm.nih.gov/pubmed/24185095<br /> MAFFT ver.7 - a multiple sequence alignment program</p><p>https://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html<br /> Measuring the distance between multiple sequence alignments | Bioinformatics | Oxford Academic</p><p>https://academic.oup.com/bioinformatics/article/28/4/495/212883<br /> The MUMmer 3 examples</p><p>http://mummer.sourceforge.net/examples/<br /> MAFFT ver.7 - a multiple sequence alignment program</p><p>https://mafft.cbrc.jp/alignment/software/tips.html<br /> Omega | Overlap-graph de novo Assembler for Metagenomics</p><p>https://omega.omicsbio.org/<br /> abiswas-odu/Disco: Multi-threaded Distributed Memory Overlap-Layout-Consensus (OLC) Metagenome Assembler</p><p>https://github.com/abiswas-odu/Disco<br /> SAGE: String-overlap Assembly of GEnomes | BMC Bioinformatics | Full Text</p><p>https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-302</p><p>Fast and sensitive mapping of nanopore sequencing reads with GraphMap | Nature Communications</p><p>https://www.nature.com/articles/ncomms11307<br /> lumpy-sv/extractSplitReads_BwaMem at master &middot; arq5x/lumpy-sv</p><p>https://github.com/arq5x/lumpy-sv/blob/master/scripts/extractSplitReads_BwaMem<br /> jts/nanocorrect: Experimental pipeline for correcting nanopore reads</p><p>https://github.com/jts/nanocorrect</p><p>video - how to install flash plugin on ubuntu 14.04 LTS 64-bit version - Ask Ubuntu</p><p>https://askubuntu.com/questions/469553/how-to-install-flash-plugin-on-ubuntu-14-04-lts-64-bit-version<br /> lh3/fermi: A WGS de novo assembler based on the FMD-index for large genomes</p><p>https://github.com/lh3/fermi<br /> Multi-metagenome</p><p>http://madsalbertsen.github.io/multi-metagenome/docs/step9.html<br /> Bandage by rrwick</p><p>https://rrwick.github.io/Bandage/<br /> Codon Optimization OnLine (COOL): a web-based multi-objective optimization platform for synthetic gene design | 
Bioinformatics | Oxford Academic</p><p>https://academic.oup.com/bioinformatics/article/30/15/2210/2391162<br /> Genome Architecture and Evolution of a Unichromosomal Asexual Nematode - ScienceDirect</p><p>https://www.sciencedirect.com/science/article/pii/S096098221731076X?via%3Dihub#fig4<br /> How to determine chimeras in my de novo assembly? - SEQanswers</p><p>http://seqanswers.com/forums/showthread.php?t=26721<br /> samtools(1) manual page</p><p>http://www.htslib.org/doc/samtools.html<br /> How To Filter Mapped Reads With Samtools</p><p>https://www.biostars.org/p/56246/<br /> The MUMmer 3 manual</p><p>http://mummer.sourceforge.net/manual/#nucmer<br /> assembly_olc.pdf</p><p>http://www.cs.jhu.edu/~langmea/resources/lecture_notes/assembly_olc.pdf<br /> SAM and BAM filtering oneliners</p><p>https://gist.github.com/davfre/8596159<br /> Inroduction to dot-plots</p><p>http://www.code10.info/index.php%3Foption%3Dcom_content%26view%3Darticle%26id%3D64:inroduction-to-dot-plots%26catid%3D52:cat_coding_algorithms_dot-plots%26Itemid%3D76<br /> RepeatFinder Home Page</p><p>http://www.cbcb.umd.edu/software/RepeatFinder/<br /> RepeatFinderReprint.pdf</p><p>http://www.cbcb.umd.edu/software/RepeatFinder/RepeatFinderReprint.pdf<br /> https://bernatgel.github.io/karyoploter_tutorial//Tutorial/CreateIdeogram/CreateIdeogram.html</p><p>https://bernatgel.github.io/karyoploter_tutorial//Tutorial/CreateIdeogram/CreateIdeogram.html<br /> Circular Visualization in R</p><p>http://zuguang.de/circlize_book/book/introduction.html#a-qiuck-glance<br /> Creating a coverage plot using BEDTools and R</p><p>https://davetang.org/muse/2015/08/05/creating-a-coverage-plot-using-bedtools-and-r/<br /> Eval: A software package for analysis of genome annotations | BMC Bioinformatics | Full Text</p><p>https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-4-50<br /> eval-documentation.pdf</p><p>http://mblab.wustl.edu/media/software/eval-documentation.pdf<br /> OmicCircos: A Simple-to-Use R 
Package for the Circular Visualization of Multidimensional Omics Data</p><p>https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3921174/<br /> sequence - download.tardigrades.org &gt; v1 &gt; sequence</p><p>http://download.tardigrades.org/v1/sequence/<br /> ksahlin/BESST: BESST - scaffolder for genomic assemblies</p><p>https://github.com/ksahlin/BESST<br /> reubwn/scripts: Useful scripts for various things</p><p>https://github.com/reubwn/scripts<br /> ICEberg</p><p>http://db-mml.sjtu.edu.cn/ICEberg/index.php<br /> Satsuma - Evolution and Genomics</p><p>http://evomics.org/learning/genomics/satsuma/<br /> A complete bacterial genome assembled de novo using only nanopore sequencing data | Nature Methods</p><p>https://www.nature.com/articles/nmeth.3444<br /> vezzi/FRC_align: Computes FRC from SAM/BAM file and not from afg files</p><p>https://mail.google.com/mail/u/0/#inbox<br /> Read GTF file into R - Dave Tang's blog</p><p>https://davetang.org/muse/2017/08/04/read-gtf-file-r/</p><p>https://bernatgel.github.io/karyoploter_tutorial//Tutorial/CustomGenomes/CustomGenomes.html</p><p>https://bernatgel.github.io/karyoploter_tutorial//Tutorial/CustomGenomes/CustomGenomes.html<br /> Dot: Interactive dot plot for genome-genome alignments</p><p>https://dnanexus.github.io/dot/<br /> lh3/minimap2: A versatile pairwise aligner for genomic and spliced nucleotide sequences</p><p>https://github.com/lh3/minimap2<br /> SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information | BMC Bioinformatics | Full Text</p><p>https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-211<br /> Palindromic gene amplification &mdash; an evolutionarily conserved role for DNA inverted repeats in the genome | Nature Reviews Cancer</p><p>https://www.nature.com/articles/nrc2591<br /> bioinformatics - BLAST DNA Sequences 
Reversed - Biology Stack Exchange</p><p>https://biology.stackexchange.com/questions/8160/blast-dna-sequences-reversed<br /> LASTZ</p><p>http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html<br /> Tetra-Nucleotide Analysis (TNA) | BIOiPLUG Help center</p><p>http://help.bioiplug.com/tetra-nucleotide-analysis-tna/</p><p>Clustering metagenomic contigs on tetranucleotide frequency &mdash; CGAT documentation</p><p>http://cgat.readthedocs.io/en/latest/recipes/metagenome_contigs_kmers.html</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36384/binding-site-prediction-in-protein</guid>
	<pubDate>Wed, 25 Apr 2018 04:35:57 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36384/binding-site-prediction-in-protein</link>
	<title><![CDATA[Binding Site Prediction in Protein !]]></title>
	<description><![CDATA[<p><span>The interaction between proteins and other molecules is fundamental to all biological functions. In this section we include tools that can assist in the prediction of interaction sites on the protein surface and tools for predicting the structure of the intermolecular complex formed between two or more molecules (docking).</span></p><h4>Pocket Identification</h4><p><a href="http://sts.bioengr.uic.edu/castp/" target="_blank">CASTp</a></p><div style="text-align: justify;">Automatic identification of pockets and cavities in protein structures, and quantitation of their volumes using Delaunay triangulation. Also available as a PyMOL plugin.</div><p><a href="http://www.bioinformatics.leeds.ac.uk/pocketfinder/" target="_blank">Pocket-Finder</a></p><div style="text-align: justify;">Automatic identification of pockets and cavities in protein structures, and quantitation of their volumes.</div><p><a href="http://gecco.org.chemie.uni-frankfurt.de/pocketpicker/index.html" target="_blank">PocketPicker</a></p><div style="text-align: justify;">Grid-based technique for the analysis of protein pockets. PocketPicker is available as a plugin for&nbsp;<a href="https://bip.weizmann.ac.il/toolbox/structure/pymol.htm">PyMOL</a>.</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><h4>Binding Site Prediction</h4>
<p><a href="http://consurf.tau.ac.il/" target="_blank">ConSurf</a></p>
</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">Identification of functional regions in proteins by surface-mapping of phylogenetic information.</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><a href="http://www-cryst.bioc.cam.ac.uk/~crescendo/crescendo.php" target="_blank">CRESCENDO</a></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">Identification of protein interaction sites. It uses sequence conservation patterns in homologous proteins to distinguish residues that are conserved due to structural restraints from those conserved due to functional restraints.&nbsp;&nbsp;</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><strong>Ligand Binding Sites</strong></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><a href="http://www.sbg.bio.ic.ac.uk/~3dligandsite/" target="_blank">3DLigandSite</a></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">The server utilizes protein-structure prediction to provide structural models of the binding site. Ligands bound to structures are superimposed onto the model and used to predict the binding site.</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><a href="http://cssb.biology.gatech.edu/skolnick/files/FINDSITE/" target="_blank">FINDSITE</a></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">A threading-based method for ligand-binding site prediction and functional annotation based on binding-site similarity across superimposed groups of threading templates.</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">
<p><a href="http://scoppi.biotec.tu-dresden.de/pocket/" target="_blank">LIGSITE<sup>csc</sup></a></p>
<div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">Prediction of binding site by pocket identification using the Connolly surface and degree of conservation</div>
</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><a href="http://metapocket.eml.org/" target="_blank">metaPocket</a>: a meta server for ligand-binding site prediction. metaPocket uses&nbsp;<a href="https://bip.weizmann.ac.il/toolbox/structure/binding.htm#ligsite">LIGSITE<sup>csc</sup></a>,&nbsp;<a href="https://bip.weizmann.ac.il/toolbox/structure/binding.htm#pass">PASS</a>,&nbsp;<a href="https://bip.weizmann.ac.il/toolbox/structure/binding.htm#qsite">Q-SiteFinder</a>&nbsp;and&nbsp;<a href="http://www.biochem.ucl.ac.uk/~roman/surfnet/surfnet.html" target="_blank">SURFNET</a>.</div>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/1720/postdoctoral-associate-bioinformatics-at-duke-university-medical-center</guid>
  <pubDate>Sat, 10 Aug 2013 18:38:38 -0500</pubDate>
  <link></link>
  <title><![CDATA[Postdoctoral Associate - Bioinformatics  at Duke University Medical Center]]></title>
  <description><![CDATA[
<p>The Department of Biostatistics and Bioinformatics at Duke University Medical Center is seeking a Postdoctoral Associate for a one year appointment to work on several high-dimensional research projects. The specific goals of the project are to identify genes or molecular markers that are predictive of clinical outcomes in renal and prostate cancer.</p>

<p>Candidates must have a PhD degree in statistics, biostatistics or bioinformatics, and extensive experience in analyzing high-dimensional data (microarray, SNP, CNVs) and in validation approaches. In addition, candidates need experience in penalized regression methods and database manipulation, and strong programming skills (R) in order to conduct Monte Carlo studies and applications. Candidates must also have excellent communication skills (verbal, written and presentation) and strong proficiency in Linux systems.</p>

<p>This position is available immediately and will be filled as soon as possible. Appointment could be extended beyond the first year based on additional funding.</p>

<p>For more information about the Department of Biostatistics and Bioinformatics, please visit our website: http://www.biostat.duke.edu.</p>

<p>For more info: http://biostat.duke.edu/sites/biostat.duke.edu/files/Halabi%20-%20Postdoc%20Job%20Posting%202013%20updated.pdf</p>

<p>Duke University is an Equal Opportunity/Affirmative Action Employer.</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</guid>
	<pubDate>Sat, 25 Aug 2018 11:32:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</link>
	<title><![CDATA[Parallel Processing with Perl !]]></title>
	<description><![CDATA[<p>Here is a short tutorial on how to make the best use of multiple processors for bioinformatics analysis. One good way is to use Perl threads and forks. It is important to understand how threads and forks work before implementing them, so some familiarity with both will be useful before reading this tutorial.</p><p>In bioinformatics we often need to deal with huge datasets of more than 100GB. The traditional way to analyze a file is with a while loop:</p><p>while (my $line = &lt;FILE&gt;) {</p><p>do something with $line;</p><p>}</p><p>This is very slow (since we are using only one processor), and if we have 500 million lines in the dataset it takes more than a day to iterate through the whole dataset. So how do we make the best use of all our processors and get the work done quickly?</p><p>Here is a very simple and efficient technique with Perl which I have been using. I am more inclined towards using Perl fork than Perl threads.</p><p>One of the oldest ways to fork is:</p><blockquote><p>my @childs;<br />my $fork = fork();<br />if (!defined $fork) { die "Couldn't fork: $!"; } # fork returns undef on failure<br />elsif ($fork) {<br />push(@childs, $fork); # parent: remember the child's PID<br />}<br />else { # child: fork returned 0<br /><strong>your code here;</strong><br />exit(0);<br />}</p><p>## wait for the child processes to finish<br />foreach (@childs) {<br />my $tmp = waitpid($_, 0);<br />}</p></blockquote><p>A fork creates a child process that gets its own copy of the parent's variables and code and runs independently of the parent, so the operating system can schedule it on a separate processor. That's it! One big disadvantage of forking is that it is very difficult to share variables among the different processes, because each child works on its own copy. I will show a simple way to work around this, but it still has its own drawbacks.</p><blockquote><p>Okay, if you really do not want to call fork directly in your code, that&rsquo;s fine too. There are many useful modules which do it for you very efficiently. 
One really useful module is Parallel::ForkManager. You can use Parallel::ForkManager to manage the number of forks you want to run at once (the number of processors you want to use).</p><p><strong>Simple usage:</strong><br />use Parallel::ForkManager;<br />my $max_processors = 8;<br />my $fork = Parallel::ForkManager-&gt;new($max_processors);<br />foreach (@dna) {<br />$fork-&gt;start and next; # do the fork; the parent moves on to the next element<br /><strong>your code here;</strong><br />$fork-&gt;finish; # do the exit in the child process<br />}<br />$fork-&gt;wait_all_children;</p></blockquote><p>So at most 8 child processes run at any one time, each doing the same work on one element of the array. When one child finishes, Parallel::ForkManager starts a new one, and thus you will be using all your processors to analyze the data. Now, suppose you have 8 child processes running and want to write their data to one file. You need to lock the file to do this, because otherwise the buffered output of the different children can get mixed up. You can lock the file using the flock command.</p><blockquote><p>use Fcntl qw(:flock); # provides LOCK_EX and LOCK_UN<br />open(my $QUAL, '&gt;&gt;', "myfile.txt") or die "can't open file: $!"; # open for appending<br />flock($QUAL, LOCK_EX) or die "can't lock file: $!";<br />print $QUAL "$output";<br />flock($QUAL, LOCK_UN) or die "$!";<br />close $QUAL;</p></blockquote><p>I would not suggest using flock when dealing with multiple processes, because it decreases the processing efficiency (each child process must wait for the lock to be released by the other child processes). Instead, I would suggest having each fork write to a separate file and simply concatenating the files after the processing.</p><p><strong>Putting it all together, if you have 100GB of data you can do this:</strong></p><blockquote><p><strong>step 1</strong>&nbsp;: split the dataset equally according to the number of processors you have. 
This may take a few hours (about 2-3 hrs for a 100GB file).<br />You can use the unix "split" command for this.<br />For example:<br />my $number_split = int($number_of_entries_in_your_dataset / $max_processors);<br />my $split_Files = `split -l $number_split "your_file.fasta" "file_name"`; # note: split -l counts lines, not records</p><p><strong>step 2</strong>: open the directory containing your split files and start Parallel::ForkManager.<br /><strong>For example:</strong><br />opendir(DIRECTORY, $split_files_directory) or die $!; ### open the directory<br />my $super_fork = Parallel::ForkManager-&gt;new($max_processors);<br />while (my $file = readdir(DIRECTORY)) { ### read the directory<br />if ($file =~ /^\./) { next; } ### skip hidden entries, "." and ".."<br />print $file, "\n";<br />########## Start fork ##########<br />my $pid = $super_fork-&gt;start and next;<br /><strong>Whatever you want to do with the split file ;</strong><br /><strong>analyze my piece of $file;</strong><br />######### end fork ###############<br />$super_fork-&gt;finish;<br />}<br />$super_fork-&gt;wait_all_children;</p></blockquote><p>So basically each processor will be busy with its own piece of data (split file), and you have created 8 processes that run at the same time without interfering with each other. Again, I do not suggest writing the output from each child process to one file (for the reasons above). Write the output from each fork to a separate file and finally concatenate them. That's it, you have just increased your program speed by up to 8 times! Isn't it easy?</p><p><strong>Note:</strong><br />You may worry about concatenating the output each child generates, since it does take some time (remember, 100GB). I think you can use the MySQL LOAD DATA LOCAL INFILE command to load all the files into a single table (this should take about 3 hrs for a 100GB dataset) and then export the whole table into one file. 
This should be faster than simply concatenating them with the "cat" command (correct me if I am wrong).</p><p>Or a much simpler way is to use pipes:</p><p>cat output_dir/* | my_pipe or my_pipe &lt;(file1) final_file;</p><p>That's it, guys! Enjoy programming and please do comment. I am not a computer scientist, so forgive me for any mistakes, and if you find any please report them. Thank you.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/view/2021</guid>
	<pubDate>Mon, 12 Aug 2013 09:27:57 -0500</pubDate>
	<link>https://bioinformaticsonline.com/view/2021</link>
	<title><![CDATA[What is the difference between BioRuby and BioGem?]]></title>
	<description><![CDATA[<p>I came across two different but similar-sounding terms, BioRuby and BioGem. What is the difference between these two terms? If both use the same Ruby language for development, then why were two different biological packages developed?</p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/2425/phd-fellowship-computational-biologybioinformatics-cork-ireland-cork-ireland</guid>
  <pubDate>Thu, 15 Aug 2013 14:09:00 -0500</pubDate>
  <link></link>
  <title><![CDATA[Ph.D. Fellowship (Computational Biology/Bioinformatics) : Cork, Ireland : Cork, Ireland]]></title>
  <description><![CDATA[
<p>A Ph.D. Fellowship (18,000 euro/pa, plus tuition fees at the EU student rate) is available for four years to work on the development of bioinformatics resources for the analysis and visualization of ribosome profiling data. Ribosome profiling (ribo-seq) is a technology that allows mapping the positions of ribosomes at the whole-transcriptome level with nucleotide precision. The technology makes it possible to obtain high-resolution digital snapshots of gene expression in cells. The position is available starting on the 1st of October, 2013.</p>

<p>Candidate:<br />The candidate is expected to have a B.S. or M.S. degree in a discipline such as Computer Science, Statistics, Applied Mathematics, Physics or Electrical Engineering. Candidates with backgrounds in Life Science disciplines such as Bioinformatics, Computational or Quantitative Biology will also be considered.</p>

<p>Location:<br />The position is available at LAPTI (http://lapti.ucc.ie), which is located in the Western Gate Building (http://www.stwarchitects.com/project-information.php?c=1&amp;p=09993) at University College Cork. The Western Gate Building Research Complex hosts several UCC departments and provides an ideal environment for interdisciplinary research. Cork (sometimes referred to as the “Venice of Ireland”) is the second most populous city in the Republic. It has a friendly cosmopolitan atmosphere and a vibrant culture. A number of American industrial giants such as Apple, EMC and Pfizer have chosen Cork as a home for their European headquarters.</p>

<p>Application process:<br />The details of the application process are given at http://lapti.ucc.ie/jobs.html. To ensure prompt processing of your application, use the subject line: ‘Ph.D. computational’. All applications received prior to August the 1st are guaranteed equal consideration. However, applications received at later dates will also be considered until the position is filled.</p>
]]></description>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/39603/tenure-track-position-in-bioinformatics-at-institute-of-neurobiology-unam-queretaro-mexico</guid>
  <pubDate>Mon, 10 Jun 2019 00:48:54 -0500</pubDate>
  <link></link>
  <title><![CDATA[Tenure Track position in Bioinformatics at Institute of Neurobiology, UNAM, Querétaro, México]]></title>
  <description><![CDATA[
<p>The Institute of Neurobiology UNAM (www.inb.unam.mx) offers a tenure-track position at the level of Assistant Professor (Investigador Asociado C) to develop an original research program in Bioinformatics with applications to neuroscience and to establish multidisciplinary collaboration with other members of the Institute. Applicants are expected to have a doctorate degree, postdoctoral experience related to bioinformatics or genome biology, and a strong track record of peer-reviewed publications. No previous experience in neuroscience is required.</p>

<p>Interested applicants must submit CV and addresses of three references to ataulfo@unam.mx.</p>

<p>Tenure Track position in Genomic Sciences  </p>

<p>Laboratorio Internacional de Investigación sobre el Genoma Humano, UNAM Juriquilla, Querétaro, México </p>

<p>The International Laboratory for Human Genome Research, LIIGH-UNAM (www.liigh.unam.mx) offers a tenure-track position at the level of Assistant Professor (Investigador Asociado C) to perform research, teaching and formation of human resources in the area of: “Genomics of Mendelian Diseases” </p>

<p>Applicants are expected to have a doctorate degree, postdoctoral experience related to the above mentioned area and a strong track record of peer-reviewed publications. Interested applicants must submit CV, email addresses of three references, and a three-page project to Dr. Rafael Palacios, Coordinator of LIIGH-UNAM (palacios@liigh.unam.mx) before June 21, 2019.</p>

<p>Tenure Track position in Genomic Sciences </p>

<p>Laboratorio Internacional de Investigación sobre el Genoma Humano, UNAM Juriquilla, Querétaro, México </p>

<p>The International Laboratory for Human Genome Research, LIIGH-UNAM (www.liigh.unam.mx) offers a tenure-track position at the level of Assistant Professor (Investigador Asociado C) to perform research, teaching and formation of human resources in the area of: “Statistic Population Genomics and its Impact in Complex Diseases” </p>

<p>Applicants are expected to have a doctorate degree, postdoctoral experience related to the above mentioned area and a strong track record of peer-reviewed publications. Interested applicants must submit CV, email addresses of three references, and a three-page statement of research interests to Dr. Rafael Palacios, Coordinator of LIIGH-UNAM (palacios@liigh.unam.mx) before June 21, 2019</p>
]]></description>
</item>

</channel>
</rss>