<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44234?offset=50</link>
	<atom:link href="https://bioinformaticsonline.com/related/44234?offset=50" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44479/doubletrouble-identify-duplicated-genes-from-whole-genome-protein-sequences-and-classify</guid>
	<pubDate>Tue, 05 Mar 2024 00:23:49 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44479/doubletrouble-identify-duplicated-genes-from-whole-genome-protein-sequences-and-classify</link>
	<title><![CDATA[doubletrouble: identify duplicated genes from whole-genome protein sequences and classify]]></title>
	<description><![CDATA[<p><span>doubletrouble aims to identify duplicated genes from whole-genome protein sequences and classify them based on their modes of duplication. The duplication modes are: i. segmental duplication (SD); ii. tandem duplication (TD); iii. proximal duplication (PD); iv. transposed duplication (TRD); and v. dispersed duplication (DD). Transposed duplicates (TRD) can be further subdivided into rTRD (retrotransposon-derived duplication) and dTRD (DNA transposon-derived duplication). If users want a simpler classification scheme, duplicates can also be classified into SD- and SSD-derived (small-scale duplication) gene pairs. Besides classifying gene pairs, users can also classify genes, so that each gene is assigned a unique mode of duplication. Users can also calculate substitution rates per site (Ka for nonsynonymous and Ks for synonymous substitutions) from duplicate pairs, find peaks in Ks distributions with Gaussian Mixture Models (GMMs), and classify gene pairs into age groups based on Ks peaks.</span></p><p>Address of the bookmark: <a href="https://bioconductor.org/packages/release/bioc/html/doubletrouble.html" rel="nofollow">https://bioconductor.org/packages/release/bioc/html/doubletrouble.html</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44896/jaeger-an-accurate-and-fast-deep-learning-tool-to-detect-bacteriophage-sequences</guid>
	<pubDate>Sun, 31 Aug 2025 06:30:16 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44896/jaeger-an-accurate-and-fast-deep-learning-tool-to-detect-bacteriophage-sequences</link>
	<title><![CDATA[Jaeger: an accurate and fast deep-learning tool to detect bacteriophage sequences]]></title>
	<description><![CDATA[<p><span>Jaeger is a tool that utilizes homology-free machine learning to identify phage genome sequences that are hidden within metagenomes. It is capable of detecting both phages and prophages within metagenomic assemblies.</span></p><p>Address of the bookmark: <a href="https://github.com/MGXlab/Jaeger" rel="nofollow">https://github.com/MGXlab/Jaeger</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36711/ancestral-sequence-reconstruction-steps</guid>
	<pubDate>Fri, 18 May 2018 08:28:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36711/ancestral-sequence-reconstruction-steps</link>
	<title><![CDATA[Ancestral sequence reconstruction steps !]]></title>
	<description><![CDATA[<div><strong>Ancestral sequence reconstruction</strong>&nbsp;(<strong>ASR</strong>) &ndash; also known as&nbsp;<strong>ancestral gene</strong>/<strong>sequence reconstruction</strong>/<strong>resurrection</strong>&nbsp;&ndash; is a technique used in the study of&nbsp;molecular evolution. The method consists of the synthesis of an ancestral&nbsp;gene&nbsp;and expression of the corresponding ancestral&nbsp;protein. The idea of protein 'resurrection' was suggested in 1963 by Pauling and Zuckerkandl. Some early efforts were made in the 1980s and 1990s, led by the laboratory of&nbsp;Steven A. Benner, showing the potential of this technique &ndash; one that only started to be fulfilled in the post-genomic era. Thanks to improved algorithms and better sequencing and synthesis techniques, the method was developed further in the early 2000s to allow the resurrection of a greater variety of genes, including much more ancient ones. Over the last decade, ancestral protein resurrection has developed as a strategy to reveal the mechanisms and dynamics of protein evolution.</div><div>&nbsp;</div>
<div>BEAST is the best way to predict the ancestral structure, but I suggest the following steps:</div><div>&nbsp;</div>
<div>1- Align the sequences with MAFFT "<a href="https://www.researchgate.net/deref/http%3A%2F%2Fmafft.cbrc.jp%2Falignment%2Fsoftware%2Fsource.html" target="_blank">http://mafft.cbrc.jp/alignment/software/source.html</a>":</div><div>mafft --maxiterate 1000 --reorder --thread 24 --genafpair Dataset.fasta &gt; Dataset_Alig.fasta</div><div>&nbsp;</div>
<div>2- Check that your dataset has a good phylogenetic signal; this can be done with Tree-Puzzle "<a href="https://www.researchgate.net/deref/http%3A%2F%2Fwww.tree-puzzle.de" target="_blank">http://www.tree-puzzle.de</a>".</div><div>&nbsp;</div>
<div id="yui_3_14_1_1_1526649596608_1443">3- Check the saturation index of the dataset; I do this with DAMBE "<a href="https://www.researchgate.net/deref/http%3A%2F%2Fdambe.bio.uottawa.ca%2Fdambe.asp" target="_blank">http://dambe.bio.uottawa.ca/dambe.asp</a>".</div><div>&nbsp;</div>
<div>4- Evaluate whether there is evidence of recombination in your dataset, because recombination may influence the grouping of clades. I check for recombination with:</div><div>&nbsp;</div>
<div>4.1- The Phi test, implemented in SplitsTree4 "<a href="https://www.researchgate.net/deref/http%3A%2F%2Fwww.splitstree.org" target="_blank">http://www.splitstree.org</a>" (input: .nex file)</div><div>&nbsp;</div>
<div>4.2- GARD, available on the Datamonkey web server "<a href="https://www.researchgate.net/deref/http%3A%2F%2Fwww.datamonkey.org%2F" target="_blank">http://www.datamonkey.org/</a>" (to convert to amino acids in SeaView: view proteins -&gt; save as ...). Ideally, base the groups on a tree.</div><div>&nbsp;</div>
<div>4.3- RDP4, for download and installation on Windows, at "<a href="https://www.researchgate.net/deref/http%3A%2F%2Fweb.cbio.uct.ac.za%2F~darren%2Frdp.html" target="_blank">http://web.cbio.uct.ac.za/~darren/rdp.html</a>"</div><div>&nbsp;</div>
<div>4.4- HyPhy (Mac, Windows, Linux) at "<a href="https://www.researchgate.net/deref/http%3A%2F%2Fhyphy.org%2Fw%2Findex.php%2FDownload" target="_blank">http://hyphy.org/w/index.php/Download</a>"</div><div>&nbsp;</div>
<div>4.5- Path-O-Gen (temporal structure of a tree; input file -&gt; arquivo.tre)</div><div>I call the steps above pre-processing for phylogenetic inference.</div><div>&nbsp;</div>
<div>5- Infer the phylogenetic tree. I use Bayesian inference with a molecular clock, but clock testing is necessary:</div><div>&nbsp;</div>
<div>- This step is performed with BEAST (BEAUti, BEAST and TreeAnnotator), plus Tracer v1.5 and FigTree for inspection.</div><div>&nbsp;</div>
<div>- Tutorials:&nbsp;<a href="https://www.researchgate.net/deref/http%3A%2F%2Fbeast.bio.ed.ac.uk%2Ftutorials" target="_blank">http://beast.bio.ed.ac.uk/tutorials</a></div><div>- Downloads:&nbsp;<a href="https://www.researchgate.net/deref/http%3A%2F%2Fbeast.bio.ed.ac.uk%2Fdownloads" target="_blank">http://beast.bio.ed.ac.uk/downloads</a></div>]]></description>
	<dc:creator>Surabhi Chaudhary</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44002/interesting-bioinformatics-resources</guid>
	<pubDate>Fri, 11 Nov 2022 06:30:46 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44002/interesting-bioinformatics-resources</link>
	<title><![CDATA[Interesting Bioinformatics Resources !]]></title>
	<description><![CDATA[<p>1. A reproducible workflow&nbsp;<a href="https://www.youtube.com/watch?v=s3JldKoA0zw">https://www.youtube.com/watch?v=s3JldKoA0zw</a>&nbsp;This two-minute video will change your mind on reproducible research.</p>
<p>2. Parallel sequencing lives, or what makes large sequencing projects successful&nbsp;<a href="https://academic.oup.com/gigascience/article/6/11/gix100/4557140?login=false">https://academic.oup.com/gigascience/article/6/11/gix100/4557140?login=false</a></p>
<p>3. Common-sense approaches to sharing tabular data alongside publication&nbsp;<a href="https://www.sciencedirect.com/science/article/pii/S2666389921002300">https://www.sciencedirect.com/science/article/pii/S2666389921002300</a></p>
<p>4. A Reproducible Data Analysis Workflow with R Markdown, Git, Make, and Docker&nbsp;<a href="https://psyarxiv.com/8xzqy/">https://psyarxiv.com/8xzqy/</a></p>
<p>5. Practical Computational Reproducibility in the Life Sciences&nbsp;<a href="https://www.cell.com/cell-systems/fulltext/S2405-4712(18)30140-6">https://www.cell.com/cell-systems/fulltext/S2405-4712(18)30140-6</a></p>
<p>6. A video by Dr. Keith A. Baggerly from MD Anderson, The Importance of Reproducible Research in High-Throughput Biology&nbsp;<a href="https://www.youtube.com/watch?v=7gYIs7uYbMo">https://www.youtube.com/watch?v=7gYIs7uYbMo</a>&nbsp;(highly recommended).</p>
<p>7. Ten Simple Rules for Reproducible Computational Research&nbsp;<a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285">http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285</a></p>
<p>8. Good Enough Practices in Scientific Computing&nbsp;<a href="http://arxiv.org/abs/1609.00037">http://arxiv.org/abs/1609.00037</a></p>
<p>9. Best Practices for Scientific Computing&nbsp;<a href="https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745">https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745</a></p>
<p>10. A Quick Guide to Organizing Computational Biology Projects&nbsp;<a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.100042">http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.100042</a>&nbsp;A must-read for computational biologists!</p>
<p>11. Reproducibility of computational workflows is automated using continuous analysis&nbsp;<a href="https://www.nature.com/articles/nbt.3780">https://www.nature.com/articles/nbt.3780</a></p>
<p>12. Five selfish reasons to work reproducibly&nbsp;<a href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0850-7">https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0850-7</a></p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33461/graphmap-a-highly-sensitive-and-accurate-mapper-for-long-error-prone-reads</guid>
	<pubDate>Wed, 07 Jun 2017 04:18:16 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33461/graphmap-a-highly-sensitive-and-accurate-mapper-for-long-error-prone-reads</link>
	<title><![CDATA[GraphMap - A highly sensitive and accurate mapper for long, error-prone reads]]></title>
	<description><![CDATA[<p>GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html<br><br><strong>Features</strong><br><br>&nbsp;&nbsp;&nbsp; Mapping position agnostic to alignment parameters.<br>&nbsp;&nbsp;&nbsp; Consistently very high sensitivity and precision across different error profiles, rates and sequencing technologies even with default parameters.<br>&nbsp;&nbsp;&nbsp; Circular genome handling to resolve coverage drops near ends of the genome.<br>&nbsp;&nbsp;&nbsp; E-value.<br>&nbsp;&nbsp;&nbsp; Meaningful mapping quality.<br>&nbsp;&nbsp;&nbsp; Various alignment strategies (semiglobal bit-vector and Gotoh, anchored).<br>&nbsp;&nbsp;&nbsp; Overlapping of reads for de novo assembly.<br>&nbsp;&nbsp;&nbsp; Transcriptome mapping through internal construction of a transcriptome from a given genomic reference and a GTF file.<br>&nbsp;&nbsp;&nbsp; ...and much more.<br><br>GraphMap is also used as an overlapper in a new de novo genome assembly project called Ra (https://github.com/mariokostelac/ra-integrate).<br>Ra attempts to create de novo assemblies from raw nanopore and PacBio reads without requiring error correction, for which a highly sensitive overlapper is required.<br><br>Currently, development of a new spliced-alignment mode for mapping RNA-seq reads is under way.<br>Description of the current effort as well as how to reach the experimental implementation can be found here: doc/rnaseq.md.</p><p>Address of the bookmark: <a href="https://github.com/isovic/graphmap" rel="nofollow">https://github.com/isovic/graphmap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</guid>
	<pubDate>Tue, 12 Dec 2017 17:23:31 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</link>
	<title><![CDATA[MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)]]></title>
	<description><![CDATA[<p><span>MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a&nbsp;</span><em>k</em><span>-mer based&nbsp;</span><a href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard similarity</a><span>&nbsp;using a combination of&nbsp;</span><a href="http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p76-schleimer.pdf">Winnowing</a><span>&nbsp;and&nbsp;</span><a href="https://en.wikipedia.org/wiki/MinHash">MinHash</a><span>. This is then converted to an estimate of sequence identity using the&nbsp;</span><a href="http://mash.readthedocs.org/">Mash</a><span>&nbsp;distance. An appropriate&nbsp;</span><em>k</em><span>-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.</span></p><p>Address of the bookmark: <a href="https://github.com/marbl/MashMap" rel="nofollow">https://github.com/marbl/MashMap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36456/alpaca-a-hybrid-strategy-for-assembly-of-genomic-dna-shotgun-sequencing-reads</guid>
	<pubDate>Mon, 30 Apr 2018 04:38:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36456/alpaca-a-hybrid-strategy-for-assembly-of-genomic-dna-shotgun-sequencing-reads</link>
	<title><![CDATA[ALPACA: A hybrid strategy for assembly of genomic DNA shotgun sequencing reads.]]></title>
	<description><![CDATA[<p><span>ALPACA requires Celera Assembler 8.3 or later. It is recommended to build Celera Assembler from source. (Why? The pre-built binaries CA_8.3rc1 and CA8.3rc2 will work for any large data set.&nbsp;</span></p>
<p><span>Detailed paper:&nbsp;https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-3927-8</span></p><p>Address of the bookmark: <a href="https://github.com/VicugnaPacos/ALPACA" rel="nofollow">https://github.com/VicugnaPacos/ALPACA</a></p>]]></description>
	<dc:creator>Seema Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36607/tarean-a-computational-tool-for-identification-and-characterization-of-satellite-dna-from-unassembled-short-reads</guid>
	<pubDate>Tue, 15 May 2018 02:53:11 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36607/tarean-a-computational-tool-for-identification-and-characterization-of-satellite-dna-from-unassembled-short-reads</link>
	<title><![CDATA[TAREAN: A computational tool for identification and characterization of satellite DNA from unassembled short reads]]></title>
	<description><![CDATA[<p><strong>TA</strong>ndem&nbsp;<strong>RE</strong>peat&nbsp;<strong>AN</strong>alyzer (TAREAN) is a computational pipeline for&nbsp;<strong>unsupervised identification of satellite repeats</strong>&nbsp;from unassembled sequence reads. The pipeline uses low-pass whole genome sequence reads and performs their graph-based clustering. Resulting clusters, representing all types of repeats, are then examined for the presence of circular structures, and putative satellite repeats are reported.</p>
<p><em><strong>How to use TAREAN</strong></em>:</p>
<ul>
<li>Install a local instance of the pipeline using its source code available from&nbsp;<a href="https://bitbucket.org/petrnovak/repex_tarean" target="_blank" title="TAREAN source code">bitbucket repository</a>.</li>
<li>Use the public Galaxy-based server at&nbsp;<a href="https://repeatexplorer-elixir.cerit-sc.cz/" target="_blank">https://repeatexplorer-elixir.cerit-sc.cz/</a>. The server is provided within the framework of the&nbsp;<a href="https://www.elixir-czech.cz/" target="_blank">Elixir CZ project</a>&nbsp;and is maintained by&nbsp;<a href="https://www.cesnet.cz/" target="_blank">CESNET</a>&nbsp;and&nbsp;<a href="https://www.cerit-sc.cz/en/index.html" target="_blank">CERIT-SC</a>. Simple registration is required to use this service.</li>
</ul>
<p>Development of TAREAN was supported by the&nbsp;<a href="https://www.elixir-czech.cz/" target="_blank" title="ELIXIR-CZ">ELIXIR CZ</a>&nbsp;research infrastructure project (MEYS Grant No: LM2015047).</p>
<p><strong><em>References</em></strong></p>
<p>Novak, P., Avila Robledillo, L., Koblizkova, A., Vrbova, I., Neumann, P., Macas, J. (2017) &ndash;&nbsp;<a href="https://academic.oup.com/nar/article/3574061/" target="_blank">TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads</a>.&nbsp;<em>Nucleic Acids Res.</em>, doi:10.1093/nar/gkx257</p><p>Address of the bookmark: <a href="https://bitbucket.org/petrnovak/repex_tarean" rel="nofollow">https://bitbucket.org/petrnovak/repex_tarean</a></p>]]></description>
	<dc:creator>Surabhi Chaudhary</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36739/blasr-mapping-single-molecule-sequencing-reads-using-basic-local-alignment-with-successive-refinement-blasr-theory-and-application</guid>
	<pubDate>Wed, 23 May 2018 06:54:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36739/blasr-mapping-single-molecule-sequencing-reads-using-basic-local-alignment-with-successive-refinement-blasr-theory-and-application</link>
	<title><![CDATA[BLASR - Mapping single molecule sequencing reads using Basic Local Alignment with Successive Refinement (BLASR): Theory and Application]]></title>
	<description><![CDATA[<p><span>BLASR (Basic Local Alignment with Successive Refinement) maps Single Molecule Sequencing (SMS) reads that are thousands to tens of thousands of bases long, where divergence between the read and the genome is dominated by insertion and deletion errors.</span></p>
<p>Here is how I use blasr to align PacBio reads to the contigs (target.fasta). The file &ldquo;target.fasta.sa&rdquo; is the suffix array of &ldquo;target.fasta&rdquo;, generated by sawriter.</p>
<blockquote>
<p>blasr query.fa ./target.fasta -sa ./target.fasta.sa -bestn 40 -maxScore -500 -m 4 -nproc 24 -out target.m4 -maxLCPLength 15</p>
</blockquote>
<p>The output format option &ldquo;-m 4&rdquo; generates the alignment coordinates. It is not fully documented, but I can explain it to you.&nbsp;</p>
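<p>For readers who want to post-process target.m4, here is a minimal Python sketch that turns one line of &ldquo;-m 4&rdquo; output into named, typed fields. The column order used here (query name, target name, score, percent similarity, then strand, start, end and length for the query and for the target, then mapping quality) is an assumption and is not stated in this post, so verify it against your own output before relying on it. The names M4_COLUMNS and parse_m4_line and the example line are hypothetical.</p>
<pre>
```python
# Assumed column order for the first 13 fields of a blasr "-m 4" line.
# This layout is an assumption, not taken from the post; verify it on your data.
M4_COLUMNS = [
    ("q_name", str), ("t_name", str), ("score", int),
    ("pct_similarity", float), ("q_strand", int), ("q_start", int),
    ("q_end", int), ("q_length", int), ("t_strand", int),
    ("t_start", int), ("t_end", int), ("t_length", int), ("map_qv", int),
]


def parse_m4_line(line: str) -> dict:
    """Convert one whitespace-separated m4 line into a dict of typed fields."""
    # Keep only the columns named above; any extra trailing columns are ignored.
    fields = line.split()[:len(M4_COLUMNS)]
    if len(fields) != len(M4_COLUMNS):
        raise ValueError("expected at least 13 whitespace-separated columns")
    return {name: cast(value) for (name, cast), value in zip(M4_COLUMNS, fields)}


# Hypothetical example line, made up for illustration (not real blasr output).
example = "read1 contig7 -21000 87.5 0 100 4900 5000 0 2000 6800 9000 254"
hit = parse_m4_line(example)
print(hit["t_name"], hit["t_start"], hit["t_end"], hit["pct_similarity"])
```
</pre>
<p>With the fields named, filtering hits by similarity or by aligned length becomes a one-line condition on the dict.</p>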
<p>I use a server with 24 cores and 48 GB of RAM for the alignment. It took about 2 to 3 hours to align 3G of PacBio reads to 10^6 short-read contigs with a mean length of 3.5 kbp.</p><p>Address of the bookmark: <a href="http://bix.ucsd.edu/projects/blasr/" rel="nofollow">http://bix.ucsd.edu/projects/blasr/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36812/porechop-tool-for-finding-and-removing-adapters-from-oxford-nanopore-reads</guid>
	<pubDate>Tue, 29 May 2018 07:33:44 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36812/porechop-tool-for-finding-and-removing-adapters-from-oxford-nanopore-reads</link>
	<title><![CDATA[Porechop: a tool for finding and removing adapters from Oxford Nanopore reads]]></title>
	<description><![CDATA[<p>Porechop is a tool for finding and removing adapters from <a href="https://nanoporetech.com/">Oxford Nanopore</a> reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity.</p>
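<p>The trimming and splitting behaviour described above can be illustrated with a toy Python sketch. This is not Porechop's code: it assumes exact string matching and a single adapter, whereas Porechop aligns reads against known adapters and tolerates low sequence identity. The function name trim_and_split, the end_window parameter and the adapter sequence are made up for illustration.</p>
<pre>
```python
def trim_and_split(read: str, adapter: str, end_window: int = 20) -> list[str]:
    """Toy model: trim adapters near the read ends, split on internal adapters."""
    # Trim an adapter that starts within end_window bases of the 5' end.
    start = read.find(adapter, 0, end_window + len(adapter))
    if start != -1:
        read = read[start + len(adapter):]
    # Trim an adapter that ends within end_window bases of the 3' end.
    tail_from = max(0, len(read) - end_window - len(adapter))
    end = read.rfind(adapter, tail_from)
    if end != -1:
        read = read[:end]
    # Any adapter still present is internal: treat the read as chimeric
    # and chop it into separate reads, dropping empty pieces.
    return [piece for piece in read.split(adapter) if piece]


# Hypothetical adapter and read, made up for illustration.
adapter = "ACGTACGT"
read = adapter + "T" * 30 + adapter + "G" * 30 + adapter
for piece in trim_and_split(read, adapter):
    print(len(piece), piece[:10])
```
</pre>
<p>A read without any adapter comes back unchanged as a single-element list, so the function can be applied to every read in a file.</p>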
<p>Porechop also supports demultiplexing of Nanopore reads that were barcoded with the <a href="https://store.nanoporetech.com/native-barcoding-kit-1d.html">Native Barcoding Kit</a>, <a href="https://store.nanoporetech.com/pcr-barcoding-kit-96.html">PCR Barcoding Kit</a> or <a href="https://store.nanoporetech.com/rapid-barcoding-sequencing-kit.html">Rapid Barcoding Kit</a>.</p><p>Address of the bookmark: <a href="https://github.com/rrwick/Porechop" rel="nofollow">https://github.com/rrwick/Porechop</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>