<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44775?offset=70</link>
	<atom:link href="https://bioinformaticsonline.com/related/44775?offset=70" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43799/kast</guid>
	<pubDate>Wed, 23 Feb 2022 08:28:36 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43799/kast</link>
	<title><![CDATA[KAST]]></title>
	<description><![CDATA[<p><span>Perform alignment-free k-tuple frequency comparisons between sequences. Input can be two files (e.g. a reference and a query) or a single file whose sequences are compared pairwise.</span></p><p>Address of the bookmark: <a href="https://github.com/martinjvickers/KAST" rel="nofollow">https://github.com/martinjvickers/KAST</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44223/ale-assembly-likelihood-estimator</guid>
	<pubDate>Wed, 08 Mar 2023 01:39:33 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44223/ale-assembly-likelihood-estimator</link>
	<title><![CDATA[ALE: Assembly Likelihood Estimator]]></title>
	<description><![CDATA[<p>To view ALE results in IGV, import the assembly, the BAM file, and the ALE scores. You can convert the .ale file to a set of .wig files with ale2wiggle.py, which IGV can read directly.&nbsp; Depending on your genome size, you may want to convert the .wig files to BigWig format.</p><p>Address of the bookmark: <a href="https://github.com/sc932/ALE" rel="nofollow">https://github.com/sc932/ALE</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/44371/steps-to-find-all-the-repeats-in-the-genome</guid>
	<pubDate>Thu, 31 Aug 2023 02:43:28 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/44371/steps-to-find-all-the-repeats-in-the-genome</link>
	<title><![CDATA[Steps to find all the repeats in the genome !]]></title>
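	<description><![CDATA[<div><p>To find repeats 2 to 9 bp long in a genome, you can run RepeatMasker and then parse its output with a Perl script<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>. The command in this guide restricts repeat length with a <code>-length</code> option; check <code>RepeatMasker -help</code> to confirm that your installed version accepts it. Here's a step-by-step guide:</p></div><div><ol>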
<li>Install RepeatMasker: First, you need to install RepeatMasker on your system. You can download it from the RepeatMasker website<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>.</li>
</ol></div><div><ol>
<li>Prepare the genome sequence: Make sure you have the genome sequence in a FASTA file format. Let's assume the file is named "genome.fasta".</li>
</ol><p>Then run RepeatMasker on the FASTA file:</p><blockquote><p>./RepeatMasker -pa &lt;number_of_processors&gt; -nolow -norna -no_is -div &lt;divergence_value&gt; -lib RepeatMaskerLib.embl -gff -xsmall -small -poly -species &lt;species_name&gt; -dir &lt;output_directory&gt; -length &lt;min_length&gt;-&lt;max_length&gt; genome.fasta</p></blockquote><div><p>Replace the following placeholders with appropriate values:</p><ul>
<li><code>&lt;number_of_processors&gt;</code>: The number of processors/threads you want to use for parallel processing.</li>
<li><code>&lt;divergence_value&gt;</code>: The divergence value for the species you are analyzing. You can find divergence values for different species in the RepeatMasker documentation<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>.</li>
<li><code>&lt;species_name&gt;</code>: The name of the species you are analyzing.</li>
<li><code>&lt;output_directory&gt;</code>: The directory where you want the output files to be saved.</li>
<li><code>&lt;min_length&gt;</code>&nbsp;and&nbsp;<code>&lt;max_length&gt;</code>: The minimum and maximum lengths of the repeats you want to find (in this case, 2 and 9).</li>
</ul></div><div><ol>
<li>Analyze the output: RepeatMasker will generate several output files, including a .out file. You can parse this file to extract the information you need. There is a Perl tool called "one_code_to_find_them_all.pl" that can help you parse RepeatMasker output files<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>. You can download it from the source provided.</li>
</ol></div><div><ol>
<li>Use the provided Perl script: Once you have the "one_code_to_find_them_all.pl" script, you can run it to conveniently parse the RepeatMasker output files. Here's an example of how to use it:</li>
</ol><blockquote><p>perl one_code_to_find_them_all.pl --rm &lt;RepeatMasker_out_file&gt; --length &lt;length_file&gt;</p></blockquote></div><p>&nbsp;</p></div><div><div><p>Replace&nbsp;<code>&lt;RepeatMasker_out_file&gt;</code>&nbsp;with the path to your RepeatMasker .out file, and&nbsp;<code>&lt;length_file&gt;</code>&nbsp;with the path to a file containing the lengths of the reference elements.</p></div><div><p>This script will generate several output files, including .log.txt and .copynumber.csv, which contain quantitative information about the identified repeat elements.</p></div><div><p>Remember to adjust the parameters and options according to your specific needs and the characteristics of your genome.</p></div></div>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/44637/tools-to-access-the-quality-of-your-assembled-genome</guid>
	<pubDate>Thu, 08 Aug 2024 23:31:18 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/44637/tools-to-access-the-quality-of-your-assembled-genome</link>
	<title><![CDATA[Tools to assess the quality of your assembled genome!]]></title>
	<description><![CDATA[<ul dir="auto">
<li><a href="https://github.com/linsalrob/fasta_validator">FASTA VALIDATOR</a>&nbsp;+&nbsp;<a href="https://github.com/shenwei356/seqkit">SEQKIT RMDUP</a>: FASTA validation</li>
<li><a href="https://genometools.org/tools/gt_gff3validator.html">GENOMETOOLS GT GFF3VALIDATOR</a>: GFF3 validation</li>
<li><a href="https://github.com/PlantandFoodResearch/assemblathon2-analysis/blob/a93cba25d847434f7eadc04e63b58c567c46a56d/assemblathon_stats.pl">ASSEMBLATHON STATS</a>: Assembly statistics</li>
<li><a href="https://genometools.org/tools/gt_stat.html">GENOMETOOLS GT STAT</a>: Annotation statistics</li>
<li><a href="https://github.com/ncbi/fcs">NCBI FCS ADAPTOR</a>: Adaptor contamination pass/fail</li>
<li><a href="https://github.com/ncbi/fcs">NCBI FCS GX</a>: Foreign organism contamination pass/fail</li>
<li><a href="https://gitlab.com/ezlab/busco">BUSCO</a>: Gene-space completeness estimation</li>
<li><a href="https://github.com/tolkit/telomeric-identifier">TIDK</a>: Telomere repeat identification</li>
<li><a href="https://github.com/oushujun/LTR_retriever/blob/master/LAI">LAI</a>: Continuity of repetitive sequences</li>
<li><a href="https://github.com/DerrickWood/kraken2">KRAKEN2</a>: Taxonomy classification</li>
<li><a href="https://github.com/igvteam/juicebox.js">HIC CONTACT MAP</a>: Alignment and visualisation of HiC data</li>
<li><a href="https://github.com/mummer4/mummer">MUMMER</a>&nbsp;&rarr;&nbsp;<a href="http://circos.ca/documentation/">CIRCOS</a>&nbsp;+&nbsp;<a href="https://plotly.com/">DOTPLOT</a>&nbsp;&amp;&nbsp;<a href="https://github.com/lh3/minimap2">MINIMAP2</a>&nbsp;&rarr;&nbsp;<a href="https://github.com/schneebergerlab/plotsr">PLOTSR</a>: Synteny analysis</li>
<li><a href="https://github.com/marbl/merqury">MERQURY</a>: K-mer completeness, consensus quality and phasing assessment</li>
</ul>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</guid>
	<pubDate>Fri, 13 Dec 2024 11:35:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</link>
	<title><![CDATA[Step-by-Step Guide to Running Genome Assembly]]></title>
	<description><![CDATA[<p>Genome assembly is a critical process in bioinformatics, enabling the reconstruction of an organism's genome from DNA sequence reads. Whether you&rsquo;re working on a new microbial genome or a complex eukaryotic organism, this guide walks you through genome assembly with widely used tools and best practices.</p><h4><strong>What is Genome Assembly?</strong></h4><p>Genome assembly involves piecing together DNA sequence reads generated by sequencing platforms (e.g., Illumina, PacBio, Oxford Nanopore) into longer, contiguous sequences called contigs. This can be performed as:</p><ul>
<li><strong>De Novo Assembly</strong>: Without a reference genome.</li>
<li><strong>Reference-Guided Assembly</strong>: Using a reference genome to guide the assembly process.</li>
</ul><h4><strong>Step 1: Preparing Your Data</strong></h4><p>Before starting the assembly, ensure that your raw sequencing data is high quality.</p><ol>
<li>
<p><strong>Input Data</strong></p>
<ul>
<li><strong>Short Reads</strong>: Illumina sequencing generates short, accurate reads, well suited to assembling small genomes and to polishing long-read assemblies.</li>
<li><strong>Long Reads</strong>: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.</li>
</ul>
</li>
<li>
<p><strong>Quality Control (QC)</strong><br />Use <strong>FastQC</strong> to assess the quality of your reads and <strong>MultiQC</strong> to aggregate the reports:</p>
<div>
<div dir="ltr"><code>fastqc reads.fastq<br />multiqc .</code></div>
</div>
<p>Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.</p>
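<p>As a rough illustration of what "low-quality bases" means, the Python sketch below decodes a FASTQ quality string into Phred scores. It assumes the common Phred+33 encoding, and the function names are made up for this example; they are not part of FastQC or MultiQC.</p>
<pre>
```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line, offset=33):
    """Return the mean Phred score of a read (0.0 for an empty string)."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores) if scores else 0.0


# 'I' encodes Q40, '5' encodes Q20, and '#' encodes Q2 under Phred+33.
print(phred_scores("II5#"))   # [40, 40, 20, 2]
print(mean_quality("II5#"))   # 25.5
```
</pre>
<p>A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means roughly one wrong base call in 100.</p>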
</li>
<li>
<p><strong>Read Trimming and Filtering</strong><br />Trim low-quality bases and adapters using <strong>Trimmomatic</strong> or <strong>Cutadapt</strong>:</p>
<div>
<div dir="ltr"><code>trimmomatic PE reads_R1.fastq reads_R2.fastq \<br />trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq \<br />ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36</code></div>
</div>
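<p>To show what a setting such as <code>SLIDINGWINDOW:4:20</code> is asking for, here is a simplified Python sketch of sliding-window quality trimming. It is an illustration under stated assumptions, not Trimmomatic's implementation: it takes a list of Phred scores, scans from the 5' end, and cuts the read at the first window whose mean quality drops below the threshold. The function name is hypothetical.</p>
<pre>
```python
def sliding_window_keep(scores, window=4, threshold=20):
    """Return how many leading bases to keep after sliding-window trimming.

    Simplified sketch: the read is cut at the start of the first window
    whose mean Phred score is below `threshold`. Reads shorter than the
    window are kept whole.
    """
    for start in range(0, len(scores) - window + 1):
        current = scores[start:start + window]
        if sum(current) / window < threshold:
            return start
    return len(scores)


read_scores = [30, 30, 30, 30, 30, 10, 10, 10]
print(sliding_window_keep(read_scores))  # 4
```
</pre>
<p>In this example the window starting at position 4 has a mean of 15, so only the first four bases are kept. A real trimmer would then discard the read if it fell under the <code>MINLEN</code> cutoff.</p>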
</li>
</ol><h4><strong>Step 2: Choosing an Assembly Strategy</strong></h4><p>Select an assembly strategy based on your data type:</p><ul>
<li>
<p><strong>Short-Read Assemblers</strong>:</p>
<ul>
<li>SPAdes: Popular for microbial genomes.</li>
<li>Velvet: Fast for smaller genomes.</li>
</ul>
</li>
<li>
<p><strong>Long-Read Assemblers</strong>:</p>
<ul>
<li>Canu: Built for noisy long reads; corrects and trims reads before assembling them.</li>
<li>Flye: Versatile for small and large genomes.</li>
</ul>
</li>
<li>
<p><strong>Hybrid Assemblers</strong>:</p>
<ul>
<li>MaSuRCA: Combines short and long reads.</li>
<li>Unicycler: Optimized for bacterial genomes.</li>
</ul>
</li>
</ul><h4><strong>Step 3: Running the Assembly</strong></h4><h5><strong>3.1. SPAdes (Short-Read Assembly)</strong></h5><p>SPAdes is an excellent choice for small genomes, such as bacteria.</p><div><div dir="ltr"><code>spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output </code></div></div><p>The output includes assembled contigs (<code>contigs.fasta</code>) and scaffolds (<code>scaffolds.fasta</code>).</p><h5><strong>3.2. Canu (Long-Read Assembly)</strong></h5><p>Canu is designed for high-error long reads from PacBio or Nanopore.</p><div><div dir="ltr"><code>canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq </code></div></div><p>The output will be in <code>canu_output/genome.contigs.fasta</code>.</p><h5><strong>3.3. Hybrid Assembly with Unicycler</strong></h5><p>Unicycler combines short and long reads for improved assemblies.</p><div><div dir="ltr"><code>unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output </code></div></div><h4><strong>Step 4: Assessing Assembly Quality</strong></h4><p>After assembly, evaluate its quality using the following tools:</p><ol>
<li>
<p><strong>QUAST</strong><br />QUAST generates assembly statistics, such as N50, genome size, and GC content:</p>
<div>
<div dir="ltr"><code>quast contigs.fasta -o quast_output </code></div>
</div>
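<p>To make the N50 statistic concrete, here is a minimal Python sketch of how it can be computed from a list of contig lengths. The function name and the example lengths are illustrative assumptions; this is not QUAST's code.</p>
<pre>
```python
from bisect import bisect_left
from itertools import accumulate


def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list).

    N50 is the length of the contig at which the running total, taken
    from the longest contig downwards, first reaches half of the
    assembly size.
    """
    if not lengths:
        return 0
    ordered = sorted(lengths, reverse=True)
    running = list(accumulate(ordered))
    half = (running[-1] + 1) // 2  # half the total, rounded up
    return ordered[bisect_left(running, half)]


print(n50([80, 70, 50, 40, 30, 20]))  # 70
```
</pre>
<p>Here the total is 290, and the two longest contigs (80 + 70 = 150) are the first to cover at least half of it, so the N50 is 70.</p>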
</li>
<li>
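<p><strong>BUSCO</strong><br />BUSCO estimates genome completeness by searching for conserved single-copy orthologs. Set <code>-l</code> to the lineage dataset that matches your organism; <code>fungi_odb10</code> here is only an example:</p>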
<div>
<div dir="ltr"><code>busco -i contigs.fasta -o busco_output -l fungi_odb10 -m genome </code></div>
</div>
</li>
<li>
<p><strong>Assembly Graph Visualization</strong><br />Visualize assembly graphs with <strong>Bandage</strong>:</p>
<div>
<div dir="ltr"><code>Bandage load assembly_graph.gfa </code></div>
</div>
</li>
</ol><hr><h4><strong>Step 5: Post-Assembly Steps</strong></h4><ol>
<li>
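<p><strong>Polishing</strong><br />Improve assembly accuracy using tools like <strong>Pilon</strong> (for short reads) or <strong>Racon</strong> (for long reads). Racon takes the reads, their alignments to the contigs (for example, a SAM file produced with minimap2), and the contigs themselves:</p>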
<div>
<div dir="ltr"><code>racon long_reads.fasta mapped_reads.sam contigs.fasta &gt; polished_contigs.fasta </code></div>
</div>
</li>
<li>
<p><strong>Scaffolding</strong><br />Link contigs into scaffolds using tools like <strong>SSPACE</strong> or <strong>Opera-LG</strong> if required.</p>
</li>
<li>
<p><strong>Annotation</strong><br />Annotate the assembled genome using <strong>Prokka</strong> for prokaryotes or <strong>Maker</strong> for eukaryotes.</p>
<div>
<div dir="ltr"><code>prokka --outdir annotation_output --prefix genome contigs.fasta </code></div>
</div>
</li>
</ol><h4><strong>Step 6: Sharing and Archiving</strong></h4><ol>
<li>
<p><strong>Submit to Public Repositories</strong><br />Share your assembly in databases like <strong>NCBI GenBank</strong>, <strong>ENA</strong>, or <strong>DDBJ</strong>.</p>
</li>
<li>
<p><strong>Metadata Preparation</strong><br />Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.</p>
</li>
</ol><h4><strong>Best Practices</strong></h4><ul>
<li>Always perform quality checks at each stage to ensure data integrity.</li>
<li>Use multiple tools to cross-validate results when working with complex genomes.</li>
<li>Document parameters and software versions for reproducibility.</li>
</ul><h4><strong>Conclusion</strong></h4><p>Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism&rsquo;s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you&rsquo;re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44902/hite-a-fast-and-accurate-dynamic-boundary-adjustment-approach-for-full-length-transposable-elements-detection-and-annotation-in-genome-assemblies</guid>
	<pubDate>Sat, 20 Sep 2025 09:34:04 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44902/hite-a-fast-and-accurate-dynamic-boundary-adjustment-approach-for-full-length-transposable-elements-detection-and-annotation-in-genome-assemblies</link>
	<title><![CDATA[HiTE: a fast and accurate dynamic boundary adjustment approach for full-length Transposable Elements detection and annotation in Genome Assemblies]]></title>
	<description><![CDATA[<p dir="auto"><code>HiTE</code>&nbsp;is a Python software that uses a dynamic boundary adjustment approach to detect and annotate full-length Transposable Elements in Genome Assemblies. In comparison to other tools, HiTE demonstrates superior performance in detecting a greater number of full-length TEs.</p>
<div dir="auto">
<h2 dir="auto">panHiTE</h2>
</div>
<p dir="auto">We have developed panHiTE, a comprehensive and accurate pipeline for TE detection in large-scale population genomes. It has been successfully applied to hundreds of plant population genomes, demonstrating its effectiveness and scalability.</p>
<p dir="auto">For detailed instructions, please refer to the&nbsp;<a href="https://github.com/CSU-KangHu/HiTE/wiki/panHiTE-tutorial">panHiTE tutorial</a>.</p><p>Address of the bookmark: <a href="https://github.com/CSU-KangHu/HiTE" rel="nofollow">https://github.com/CSU-KangHu/HiTE</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/researchlabs/view/1972/page-lab-at-whitehead-institute-mit</guid>
  <pubDate>Sun, 11 Aug 2013 17:24:05 -0500</pubDate>
  <link></link>
  <title><![CDATA[Page Lab at Whitehead Institute, MIT]]></title>
  <description><![CDATA[
<p>They study the foundations of mammalian reproduction, with particular focus on sex chromosome biology and evolution, the fetal origins of gametes, and infertility.  </p>

<p>PI webpage : http://pagelab.wi.mit.edu/david_page.html</p>

<p>TED presentation: http://www.youtube.com/watch?v=nQcgD5DpVlQ</p>

<p>Lab webpage: http://pagelab.wi.mit.edu/index.html</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/12883/breaking-chromosomes-to-study-cancer</guid>
	<pubDate>Fri, 18 Jul 2014 05:42:09 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/12883/breaking-chromosomes-to-study-cancer</link>
	<title><![CDATA[Breaking chromosomes to study cancer !!!]]></title>
	<description><![CDATA[<p>Chromosomes are present in every cell of our body and they contain the information the body needs to develop and function properly. This information is carried in genes that are arranged along the chromosomes. There are usually 46 chromosomes in every cell. These chromosomes come in pairs, one from our mother and one from our father. The chromosomes can be sorted into 23 pairs by looking at them under a microscope.</p><p>Most people who have a balanced translocation have the right amount of chromosome material but it has been rearranged in some way. This may happen if two chromosomes swap pieces (a reciprocal translocation). In other cases two whole chromosomes may become stuck together (a Robertsonian translocation). This article focuses on reciprocal translocations. <br /><br />Reciprocal chromosomal translocations occur following double-strand breaks (DSBs) in DNA when a section of one chromosome is exchanged with that of another, non-homologous chromosome. These exchanges may produce a dysfunctional fusion gene that disrupts cell growth and survival pathways, such as the translocations seen in leukemia and childhood sarcomas. <br /><br />Chromosomal translocations associated with two types of cancer, acute myeloid leukemia and Ewing's sarcoma, have been well studied in cancer cell lines, but determining how they contribute to cancer development is complicated by additional mutations and altered gene expression profiles in these cultured cells. Now, Juan Carlos Ramirez, head of the Viral Vector Facility at the Fundacion Centro Nacional de Investigaciones Cardiovasculares (CNIC), and his colleagues Raul Torres at CNIC and Sandra Rodriguez-Perales at the Spanish National Cancer Center (CNIO) in Madrid, Spain, have used a new genome editing tool, CRISPR-Cas9, to induce chromosomal translocations for the first time in a human cell line and in primary cells.
The study's authors conclude that this technology will help clarify how and why chromosomal translocations occur, which should in turn open the way to new anti-cancer therapeutic strategies.</p><p>Using RNA-Guided Endonuclease (RGEN) technology, also known as CRISPR/Cas9 genome engineering, CNIO and CNIC researchers have shown that it is possible to obtain such chromosomal translocations. The CRISPR-Cas9 system makes it simple to introduce a cut at the desired locus, and it is easier to design and cheaper than many other systems. Using the CRISPR-Cas9 system, Ramirez and his colleagues reproduced the translocations observed in Ewing&rsquo;s Sarcoma (ES) and Acute Myeloid Leukemia (AML) patient cell lines in HEK293 cells and also generated the ES translocation in human mesenchymal stem cells and the AML translocation in umbilical cord blood cells.</p><p>By focusing on chromosomal translocation without the confounding characteristics of established cell lines, these new cell lines should help answer the fundamental question of what causes a cell to become cancerous. Ramirez and his team now look forward to modeling other chromosome translocations in a variety of cell types.</p><p>References:</p><p>http://en.wikipedia.org/wiki/Chromosomal_translocation</p><p>http://www.nature.com/ncomms/2014/140603/ncomms4964/abs/ncomms4964.html</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/researchlabs/view/19648/mit-computational-biology-group</guid>
  <pubDate>Thu, 18 Dec 2014 14:47:01 -0600</pubDate>
  <link></link>
  <title><![CDATA[MIT Computational Biology Group]]></title>
  <description><![CDATA[
<p>My research group consists primarily of computer science graduate students and postdocs with expertise in algorithms, statistical inferences and machine learning, and sharing a passion for understanding fundamental biological problems.</p>

<p>We work in a highly interdisciplinary environment at the interface of Computer Science and Biology. Since its inception, our lab has eagerly engaged in collaborative research partnerships with biological and experimental collaborators, facilitated by our affiliation with the Broad Institute and the Computational and Systems Biology initiative (CSBi) at MIT, our participation in the Epigenome Roadmap, ENCODE, and modENCODE consortia, and by several other ongoing collaborations at MIT, Harvard, and the Harvard Medical School affiliated hospitals.</p>

<p>http://compbio.mit.edu/</p>
]]></description>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/researchlabs/view/23149/raphael-lab</guid>
  <pubDate>Sat, 04 Jul 2015 19:05:29 -0500</pubDate>
  <link></link>
  <title><![CDATA[Raphael Lab]]></title>
  <description><![CDATA[
<p>Raphael Lab research is focused on Bioinformatics and Computational Biology.</p>

<p>Current research interests include next-generation DNA sequencing, structural variation, genome rearrangements in cancer and evolution, and network analysis of somatic mutations in cancer. Earlier research included topics in comparative genomics, multiple sequence alignment, and motif finding.</p>

<p>More at http://compbio.cs.brown.edu/</p>
]]></description>
</item>

</channel>
</rss>