<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/40711?offset=110</link>
	<atom:link href="https://bioinformaticsonline.com/related/40711?offset=110" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37223/chopstitch-exon-annotation-and-splice-graph-construction-using-transcriptome-assembly-and-whole-genome-sequencing-data</guid>
	<pubDate>Tue, 03 Jul 2018 04:14:52 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37223/chopstitch-exon-annotation-and-splice-graph-construction-using-transcriptome-assembly-and-whole-genome-sequencing-data</link>
	<title><![CDATA[ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data]]></title>
	<description><![CDATA[ChopStitch is a new method for finding putative exons and constructing splice graphs using an assembled transcriptome and whole genome shotgun sequencing (WGSS) data. ChopStitch identifies exon-exon boundaries in de novo assembled RNA-seq data with the help of a Bloom filter that represents the k-mer spectrum of WGSS reads. The algorithm also detects base substitutions in transcript sequences corresponding to sequencing or assembly errors, haplotype variations, or putative RNA editing events. The primary output of our tool is a FASTA file containing putative exons. Further, exon edges are interrogated for alternative exon-exon boundaries to detect transcript isoforms, which are reported as splice graphs in dot output format.<p>Address of the bookmark: <a href="https://github.com/bcgsc/ChopStitch" rel="nofollow">https://github.com/bcgsc/ChopStitch</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42806/graphunzip-phases-an-assembly-graph-using-hi-c-data-andor-long-reads</guid>
	<pubDate>Fri, 05 Feb 2021 21:22:24 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42806/graphunzip-phases-an-assembly-graph-using-hi-c-data-andor-long-reads</link>
	<title><![CDATA[GraphUnzip: Phases an assembly graph using Hi-C data and/or long reads.]]></title>
	<description><![CDATA[<p>GraphUnzip is a fast, memory-efficient and accurate tool for unzipping assembly graphs into their constituent haplotypes using long reads and/or Hi-C data. As GraphUnzip only connects sequences in the assembly graph that already had a potential link based on overlaps, it yields high-quality gap-less supercontigs. To demonstrate the efficiency of GraphUnzip, we tested it on a simulated diploid Escherichia coli genome, and on two real datasets for the genomes of the rotifer Adineta vaga and the potato Solanum tuberosum. In all cases, GraphUnzip yielded highly continuous phased assemblies.</p>
<p>https://www.biorxiv.org/content/biorxiv/early/2021/02/01/2021.01.29.428779.full.pdf</p><p>Address of the bookmark: <a href="https://github.com/nadegeguiglielmoni/GraphUnzip" rel="nofollow">https://github.com/nadegeguiglielmoni/GraphUnzip</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44497/graphpath-a-graph-attention-model-for-molecular-stratification-with-interpretability-based-on-the-pathway-pathway-interaction-network</guid>
	<pubDate>Wed, 27 Mar 2024 20:51:21 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44497/graphpath-a-graph-attention-model-for-molecular-stratification-with-interpretability-based-on-the-pathway-pathway-interaction-network</link>
	<title><![CDATA[GraphPath: A graph attention model for molecular stratification with interpretability based on the pathway-pathway interaction network]]></title>
	<description><![CDATA[<p><span>Achieving accurate and interpretable clinical predictions requires paramount attention to thoroughly characterizing patients at both the molecular and biological pathway levels. In this paper, we present GraphPath, a biological knowledge-driven graph neural network with multi-head self-attention mechanism that implements the pathway-pathway interaction network. We train GraphPath to classify the cancer status of patients with prostate cancer based on their multi-omics profiling.</span></p>
<p><span><img src="https://github.com/amazingma/GraphPath/raw/main/Figures/GraphPath.png" alt="image" style="border: 0px;"></span></p><p>Address of the bookmark: <a href="https://github.com/amazingma/GraphPath" rel="nofollow">https://github.com/amazingma/GraphPath</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</guid>
	<pubDate>Fri, 13 Dec 2024 11:35:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</link>
	<title><![CDATA[Step-by-Step Guide to Running Genome Assembly]]></title>
	<description><![CDATA[<p>Genome assembly is a critical process in bioinformatics, enabling the reconstruction of an organism's genome from short DNA sequence reads. Whether you&rsquo;re working on a new microbial genome or a complex eukaryotic organism, this guide will walk you through the steps of genome assembly using state-of-the-art tools and best practices.</p><h4><strong>What is Genome Assembly?</strong></h4><p>Genome assembly involves piecing together short DNA sequence reads generated by sequencing platforms (e.g., Illumina, PacBio, Oxford Nanopore) into longer, contiguous sequences called contigs. This can be performed as:</p><ul>
<li><strong>De Novo Assembly</strong>: Without a reference genome.</li>
<li><strong>Reference-Guided Assembly</strong>: Using a reference genome to guide the assembly process.</li>
</ul><h4><strong>Step 1: Preparing Your Data</strong></h4><p>Before starting the assembly, ensure that your raw sequencing data is high quality.</p><ol>
<li>
<p><strong>Input Data</strong></p>
<ul>
<li><strong>Short Reads</strong>: Illumina sequencing generates short, accurate reads well suited to assembling small genomes and polishing long-read assemblies.</li>
<li><strong>Long Reads</strong>: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.</li>
</ul>
</li>
<li>
<p><strong>Quality Control (QC)</strong><br />Use tools like <strong>FastQC</strong> or <strong>MultiQC</strong> to assess the quality of your reads:</p>
<div>
<div dir="ltr"><code>fastqc reads.fastq<br />multiqc . </code></div>
</div>
<p>Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.</p>
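<p>To make "low-quality bases" concrete, here is a minimal Python sketch of how a FASTQ quality string decodes into Phred scores. It assumes the common Phred+33 encoding; the function names and the two example reads are hypothetical, and this is not how FastQC itself is implemented.</p>

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line):
    """Return the arithmetic mean Phred score of one read."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores)


def parse_fastq(lines):
    """Yield (name, sequence, quality) from FASTQ lines, four lines per record."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


# Hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
example = ["@read1", "ACGT", "+", "IIII", "@read2", "ACGT", "+", "II##"]
for name, seq, qual in parse_fastq(example):
    print(name, mean_quality(qual))
```

<p>A read whose mean score is far below its neighbours, such as the second example read, is the kind of record that trimming or filtering is meant to catch.</p>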
</li>
<li>
<p><strong>Read Trimming and Filtering</strong><br />Trim low-quality bases and adapters using <strong>Trimmomatic</strong> or <strong>Cutadapt</strong>:</p>
<div>
<div dir="ltr"><code>trimmomatic PE reads_R1.fastq reads_R2.fastq trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq \<br />ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36 </code></div>
</div>
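<p>The idea behind the <code>SLIDINGWINDOW:4:20</code> and <code>MINLEN:36</code> settings can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions (the read is cut at the start of the first window whose mean quality falls below the threshold), not Trimmomatic's exact implementation; the function names and example values are hypothetical.</p>

```python
def sliding_window_trim(seq, quals, window=4, min_avg=20):
    """Cut the read at the start of the first window whose mean quality is below min_avg."""
    for start in range(0, len(quals) - window + 1):
        chunk = quals[start:start + window]
        if sum(chunk) / window < min_avg:
            return seq[:start], quals[:start]
    return seq, quals


def passes_minlen(seq, min_len=36):
    """Keep a read only if it is still at least min_len bases long after trimming."""
    return len(seq) >= min_len


# Hypothetical read whose quality decays toward the 3' end.
seq = "ACGTACGTAC"
quals = [38, 38, 37, 36, 30, 25, 12, 10, 8, 5]
trimmed_seq, trimmed_quals = sliding_window_trim(seq, quals)
print(trimmed_seq, trimmed_quals, passes_minlen(trimmed_seq))
```

<p>In this example the window starting at the fifth base is the first to average below 20, so only the first four bases survive, and the read would then be dropped by the minimum-length filter.</p>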
</li>
</ol><h4><strong>Step 2: Choosing an Assembly Strategy</strong></h4><p>Select an assembly strategy based on your data type:</p><ul>
<li>
<p><strong>Short-Read Assemblers</strong>:</p>
<ul>
<li>SPAdes: Popular for microbial genomes.</li>
<li>Velvet: Fast for smaller genomes.</li>
</ul>
</li>
<li>
<p><strong>Long-Read Assemblers</strong>:</p>
<ul>
<li>Canu: Ideal for long-read datasets.</li>
<li>Flye: Versatile for small and large genomes.</li>
</ul>
</li>
<li>
<p><strong>Hybrid Assemblers</strong>:</p>
<ul>
<li>MaSuRCA: Combines short and long reads.</li>
<li>Unicycler: Optimized for bacterial genomes.</li>
</ul>
</li>
</ul><h4><strong>Step 3: Running the Assembly</strong></h4><h5><strong>3.1. SPAdes (Short-Read Assembly)</strong></h5><p>SPAdes is an excellent choice for small genomes, such as bacteria.</p><div><div dir="ltr"><code>spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output </code></div></div><p>The output includes assembled contigs (<code>contigs.fasta</code>) and scaffolds (<code>scaffolds.fasta</code>).</p><h5><strong>3.2. Canu (Long-Read Assembly)</strong></h5><p>Canu is designed for high-error long reads from PacBio or Nanopore.</p><div><div dir="ltr"><code>canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq </code></div></div><p>The output will be in <code>canu_output/genome.contigs.fasta</code>.</p><h5><strong>3.3. Hybrid Assembly with Unicycler</strong></h5><p>Unicycler combines short and long reads for improved assemblies.</p><div><div dir="ltr"><code>unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output </code></div></div><h4><strong>Step 4: Assessing Assembly Quality</strong></h4><p>After assembly, evaluate its quality using the following tools:</p><ol>
<li>
<p><strong>QUAST</strong><br />QUAST generates assembly statistics, such as N50, genome size, and GC content:</p>
<div>
<div dir="ltr"><code>quast contigs.fasta -o quast_output </code></div>
</div>
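<p>For readers unfamiliar with the statistics QUAST reports, the following Python sketch shows how N50 and GC content are defined. It is an illustration of the definitions only, with hypothetical function names and contig lengths; it does not reproduce QUAST's code.</p>

```python
def n50(lengths):
    """Return the contig length at which the sorted cumulative sum reaches half the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def gc_content(seq):
    """Return the fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical contig lengths for a toy assembly of 300 bp.
contig_lengths = [100, 80, 60, 40, 20]
print("N50:", n50(contig_lengths))
print("GC:", gc_content("ACGTGGCC"))
```

<p>Here the two longest contigs together cover at least half of the 300 bp total, so the N50 is the length of the second one.</p>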
</li>
<li>
<p><strong>BUSCO</strong><br />BUSCO checks genome completeness by identifying conserved genes:</p>
<div>
<div dir="ltr"><code>busco -i contigs.fasta -o busco_output -l fungi_odb10 -m genome </code></div>
</div>
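<p>BUSCO condenses its result into a one-line summary of complete (C), single-copy (S), duplicated (D), fragmented (F) and missing (M) genes. The Python sketch below parses such a line so the values can be compared across assemblies. The exact layout can differ between BUSCO versions, so the pattern is an assumption, and the example line contains illustrative numbers rather than real results.</p>

```python
import re

# Assumed layout of the BUSCO one-line summary; adjust if your version differs.
BUSCO_PATTERN = re.compile(
    r"C:([\d.]+)%\[S:([\d.]+)%,D:([\d.]+)%\],F:([\d.]+)%,M:([\d.]+)%,n:(\d+)"
)


def parse_busco_summary(line):
    """Extract the percentages and the number of searched genes from a summary line."""
    match = BUSCO_PATTERN.search(line)
    if match is None:
        raise ValueError("no BUSCO summary found in line")
    complete, single, duplicated, fragmented, missing, total = match.groups()
    return {
        "complete": float(complete),
        "single": float(single),
        "duplicated": float(duplicated),
        "fragmented": float(fragmented),
        "missing": float(missing),
        "total": int(total),
    }


# Illustrative values only, not output from a real run.
summary = parse_busco_summary("C:95.0%[S:93.0%,D:2.0%],F:3.0%,M:2.0%,n:758")
print(summary["complete"], summary["missing"])
```

<p>A high complete percentage with few duplicated genes generally suggests a complete, non-redundant assembly.</p>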
</li>
<li>
<p><strong>Assembly Graph Visualization</strong><br />Visualize assembly graphs with <strong>Bandage</strong>:</p>
<div>
<div dir="ltr"><code>Bandage load assembly_graph.gfa </code></div>
</div>
</li>
</ol><hr><h4><strong>Step 5: Post-Assembly Steps</strong></h4><ol>
<li>
<p><strong>Polishing</strong><br />Improve assembly accuracy using tools like <strong>Pilon</strong> (for short reads) or <strong>Racon</strong> (for long reads).</p>
<div>
<div dir="ltr"><code>racon long_reads.fasta mapped_reads.sam contigs.fasta &gt; polished_contigs.fasta </code></div>
</div>
</li>
<li>
<p><strong>Scaffolding</strong><br />Link contigs into scaffolds using tools like <strong>SSPACE</strong> or <strong>OPERA-LG</strong> if required.</p>
</li>
<li>
<p><strong>Annotation</strong><br />Annotate the assembled genome using <strong>Prokka</strong> for prokaryotes or <strong>MAKER</strong> for eukaryotes.</p>
<div>
<div dir="ltr"><code>prokka --outdir annotation_output --prefix genome contigs.fasta </code></div>
</div>
</li>
</ol><h4><strong>Step 6: Sharing and Archiving</strong></h4><ol>
<li>
<p><strong>Submit to Public Repositories</strong><br />Share your assembly in databases like <strong>NCBI GenBank</strong>, <strong>ENA</strong>, or <strong>DDBJ</strong>.</p>
</li>
<li>
<p><strong>Metadata Preparation</strong><br />Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.</p>
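<p>Coverage is usually reported as the total number of sequenced bases divided by the genome size. A minimal Python sketch of that arithmetic follows; the read count and read length are hypothetical, and the 4.7 Mb genome size simply mirrors the value used in the Canu example above.</p>

```python
def estimate_coverage(read_count, mean_read_length, genome_size):
    """Return the expected average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return read_count * mean_read_length / genome_size


# Hypothetical run: one million 150 bp reads over a 4.7 Mb genome.
coverage = estimate_coverage(1_000_000, 150, 4_700_000)
print(f"Estimated coverage: {coverage:.1f}x")
```

<p>Recording this figure with the submission lets others judge how well supported the assembly is.</p>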
</li>
</ol><h4><strong>Best Practices</strong></h4><ul>
<li>Always perform quality checks at each stage to ensure data integrity.</li>
<li>Use multiple tools to cross-validate results when working with complex genomes.</li>
<li>Document parameters and software versions for reproducibility.</li>
</ul><h4><strong>Conclusion</strong></h4><p>Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism&rsquo;s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you&rsquo;re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/file/view/43732/spades-tutorial-pdf</guid>
	<pubDate>Tue, 01 Feb 2022 04:56:43 -0600</pubDate>
	<link>https://bioinformaticsonline.com/file/view/43732/spades-tutorial-pdf</link>
	<title><![CDATA[Spades tutorial PDF]]></title>
	<description><![CDATA[<p>SPAdes&mdash;St. Petersburg genome Assembler&mdash;was originally developed for de novo assembly of genome sequencing data produced for cultivated microbial isolates and for single-cell genomic DNA sequencing. With time, the functionality of SPAdes was extended to enable assembly of IonTorrent data, as well as hybrid assembly from short and long reads (PacBio and Oxford Nanopore). In this article we present protocols for five different assembly pipelines that comprise the SPAdes package and that are used for assembly of metagenomes and transcriptomes as well as assembly of putative plasmids and biosynthetic gene clusters from whole-genome sequencing and metagenomic datasets.&nbsp;</p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
	<enclosure url="https://bioinformaticsonline.com/file/download/43732" length="268093" type="application/pdf" />
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/file/view/989/bioinformatics-approach-to-boar-taint</guid>
	<pubDate>Wed, 17 Jul 2013 15:50:37 -0500</pubDate>
	<link>https://bioinformaticsonline.com/file/view/989/bioinformatics-approach-to-boar-taint</link>
	<title><![CDATA[Bioinformatics approach to Boar Taint]]></title>
	<description><![CDATA[<p><span>Meat from intact male pigs often has an offensive smell or odour known as boar taint, a complex genetic trait. Androstenone and skatole&nbsp;in the fat are the primary causes. Androstenone and the sex steroids share a common metabolic pathway, which makes removing boar taint very challenging. Castration is the traditional solution, but it also lowers meat quality because of reduced steroid levels, which many consumers find objectionable. Functional variants underlying the boar taint compounds can be used as genetic markers to select male pigs with reduced boar taint levels. A total of 47 samples from Norwegian Landrace (NL) and Duroc (D) pigs with varied boar taint levels were resequenced on an Illumina HiSeq 2000 to &gt;10X average coverage. Short reads from these samples were mapped to the&nbsp;<em>Sus scrofa</em>&nbsp;version 10.2 reference assembly using Bowtie2. The alignment files were then used to call SNPs and InDels inside previously identified QTL regions on SSC5, SSC13, and SSC7 with FreeBayes, a variant caller. A final list of SNPs was prepared by filtering on SNP quality, allele coverage, functional and structural annotation, repeats, and similar criteria. Selected SNPs will be genotyped in the sample population for validation and then used to construct SNP haplotypes in close linkage disequilibrium with the QTLs and to fine-map the QTLs through association mapping of the genotyped SNPs.</span></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
	<enclosure url="https://bioinformaticsonline.com/file/download/989" length="19688" type="image/jpeg" />
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/6302/a-allele-of-slc24a5-gene-is-found-to-be-responsible-for-variation-in-skin-color-of-south-east-asians-and-europeans</guid>
	<pubDate>Tue, 12 Nov 2013 21:02:27 -0600</pubDate>
	<link>https://bioinformaticsonline.com/news/view/6302/a-allele-of-slc24a5-gene-is-found-to-be-responsible-for-variation-in-skin-color-of-south-east-asians-and-europeans</link>
	<title><![CDATA[A-allele of SLC24A5 gene is found to be responsible for variation in skin color of South-East Asians and Europeans]]></title>
	<description><![CDATA[<p><strong>Key finding</strong>:</p><ol>
<li><span>The rs1426654 SNP of the <em>SLC24A5</em>&nbsp;gene is a key determinant of skin pigmentation variation in South Asia</span></li>
<li><span><span>The rs1426654-A allele is widespread throughout the Indian subcontinent&nbsp;</span></span></li>
<li><span>Skin pigmentation variation is also accounted for by a combination of processes such as selection and the demographic history of populations, influenced by their language and origin</span></li>
<li><span><span>Signs of positive selection are seen in Europeans, the Middle East, Pakistan, Central Asia and North India, but not in South India</span></span></li>
<li><span><span>In Europeans, the A allele has almost reached fixation</span></span></li>
</ol><p><span><span><strong>Paper</strong>:</span></span></p><p><span><span><a href="http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1003912">http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1003912</a></span></span></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31976/snpgenie</guid>
	<pubDate>Thu, 30 Mar 2017 17:38:02 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31976/snpgenie</link>
	<title><![CDATA[SNPGenie]]></title>
	<description><![CDATA[<p>SNPGenie is a Perl script for estimating evolutionary parameters, mainly from pooled next-generation sequencing (NGS) single-nucleotide polymorphism (SNP) variant data. SNP reports (acceptable in a variety of formats) must each correspond to a single population, with variants called relative to a single reference sequence (one sequence in one FASTA file). Just run the main script, <strong>snpgenie.pl</strong>, in a directory containing the necessary <a href="https://github.com/hugheslab/snpgenie#snpgenie-input">input files</a>, and we take care of the rest! For the earlier version, see <a href="http://ww2.biol.sc.edu/~austin/">Hughes Lab Bioinformatics Resource</a>.</p><p>Address of the bookmark: <a href="https://github.com/hugheslab/snpgenie" rel="nofollow">https://github.com/hugheslab/snpgenie</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/13226/you-and-your-friend-have-similar-dna</guid>
	<pubDate>Sun, 27 Jul 2014 20:44:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/13226/you-and-your-friend-have-similar-dna</link>
	<title><![CDATA[You and your friend have similar DNA !!!]]></title>
	<description><![CDATA[<p>New research out of Massachusetts claims that people often choose friends who are genetically similar to themselves, and that the match is closer than you might suppose. A study published in PNAS&nbsp;(http://www.pnas.org/content/111/Supplement_3/10796.full) found that people are apt to pick friends who are genetically similar to themselves, so much so that friends tend to be as alike at the genetic level as a person's fourth cousin.</p><div style="text-align: center;"><img src="http://i.kinja-img.com/gawker-media/image/upload/s--CwLwHa43--/18fbmlokxcmqcjpg.jpg" alt="image" width="300" height="271" style="border: 0px;"></div><p>Scientists working with the long-running Framingham Heart Study looked at 1,932 people (examining about 1.5 million markers of genetic variation), comparing unrelated friends to unrelated strangers. They found that friends shared about 1% of their genes, a percentage much higher than that shared with strangers. These findings make it clear that people have more DNA in common with those they select as friends than with strangers in the same population.&nbsp;</p><p>The genes that lined up the most were olfactory genes, which deal with smell. The ones that lined up the least were immune system genes. The researchers weren't sure why that happened. For the olfactory genes there might be a straightforward explanation: people who like the same smells tend to be drawn to similar environments, where they meet others with the same tendencies.</p><p>Reference:</p><p>http://www.pnas.org/content/111/Supplement_3/10796.full</p><p>Image : http://i.kinja-img.com</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38004/vcfr-a-package-to-manipulate-and-visualize-vcf-data-in-r</guid>
	<pubDate>Thu, 25 Oct 2018 09:05:59 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38004/vcfr-a-package-to-manipulate-and-visualize-vcf-data-in-r</link>
	<title><![CDATA[vcfR:  a package to manipulate and visualize VCF data in R]]></title>
	<description><![CDATA[<p><span>VcfR is an R package intended to allow easy manipulation and visualization of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R, a parser function extracts matrices from the VCF data for use with typical R functions. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete, data may be written to a VCF file or converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and the R environment, connecting familiar software with genomic data.</span></p><p>Address of the bookmark: <a href="https://github.com/knausb/vcfR" rel="nofollow">https://github.com/knausb/vcfR</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>