BOL: Related items

PLAR: Pipeline for lncRNA annotation from RNA-seq data

Abhi — Fri, 07 Jan 2022 06:18:01 -0600

In response to several requests, we are releasing an assignment of orthologs, determined using the same methods as in Hezroni et al. (BLAST, whole-genome alignment (WGA), and synteny). One set compares human GENCODE genes (GENCODE v30) to lncRNAs from other species identified by PLAR. Available here.

Species	Assembly	Code	Transcriptome	lncRNAs	Protein-coding
Human	hg19	hg19	Download	Download	Download
Rhesus	rheMac3	rm3	Download	Download	Download
Marmoset	calJac3	cj3	Download	Download	Download
Mouse	mm9	mm9	Download	Download	Download
Rabbit	oryCun2	oc2	Download	Download	Download
Dog	canFam3	cf3	Download	Download	Download
Ferret	musFur1	oa3	Download	Download	Download
Opossum	monDom5	md5	Download	Download	Download
Chicken	galGal4	gg4	Download	Download	Download
Lizard	anoCar2	ac2	Download	Download	Download
Coelacanth	latCha1	lc1	Download	Download	Download
Zebrafish	danRer7	dr7	Download	Download	Download
Stickleback	gasAcu1	ga1	Download	Download	Download
Nile tilapia	oreNil2	ot2	Download	Download	Download
Spotted gar	lepOcu1	lo1	Download	Download	Download
Elephant shark	calMil1	cm1	Download	Download	Download
Sea urchin	strPur4	sp4	Download	Download	Download

Address of the bookmark: http://www.weizmann.ac.il/Biological_Regulation/IgorUlitsky/PLAR

Monkeypox virus isolate MPXV_USA_2022_MA001, complete genome

Jit — Tue, 26 Jul 2022 06:21:07 -0500

LOCUS       ON563414              197205 bp    DNA     linear   VRL 30-MAY-2022
DEFINITION  Monkeypox virus isolate MPXV_USA_2022_MA001, complete genome.
ACCESSION   ON563414
VERSION     ON563414.3
KEYWORDS    .
SOURCE      Monkeypox virus (monkeypox)
  ORGANISM  Monkeypox virus
            Viruses; Varidnaviria; Bamfordvirae; Nucleocytoviricota;
            Pokkesviricetes; Chitovirales; Poxviridae; Chordopoxvirinae;
            Orthopoxvirus.

Address of the bookmark: https://www.ncbi.nlm.nih.gov/nuccore/ON563414

MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads

Abhi — Tue, 05 Sep 2023 07:31:35 -0500

MitoHiFi v3.2 is a Python pipeline distributed under the MIT License.

MitoHiFi was first developed to assemble mitogenomes for a wide range of species in the Darwin Tree of Life Project (DToL).

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05385-y

Address of the bookmark: https://github.com/marcelauliano/MitoHiFi

quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification.

Abhi — Sat, 08 Jun 2024 15:54:36 -0500

quarTeT is a collection of tools for T2T genome assembly and basic analysis in an automated workflow.

Tasks include:

AssemblyMapper : reference-guided genome assembly
GapFiller : long-read-based gap filling
TeloExplorer : telomere identification
CentroMiner : centromere candidate prediction

https://academic.oup.com/hr/article/10/8/uhad127/7197191?login=false

Address of the bookmark: http://www.atcgn.com:8080/quarTeT/home.html

Is reference genome necessary for gene expression study in transcriptome sequencing or for variant discovery in genome sequencing?

Rahul Agarwal — Wed, 17 Jul 2013 15:25:09 -0500

Consider plant genomes, which are often too complex and too large for a complete de novo assembly with current sequencing technology. What would be an alternative solution? Can we live in a reference-free world?

320,000 viruses in mammals yet to be sequenced!

Rahul Agarwal — Tue, 03 Sep 2013 08:35:30 -0500

Thanks to recent improvements in biological techniques, it is finally possible to look at millions of unknown viruses at the genomic level and understand their mechanisms. According to available data, close to 70 per cent of emerging viral diseases, such as HIV/AIDS, West Nile, Ebola, SARS, and influenza, are zoonoses: infections of animals that cross into humans.

To address the challenges of describing and estimating virodiversity, a team of investigators from the Center for Infection and Immunity (CII) and EcoHealth Alliance began in the jungles of Bangladesh, home to the flying fox.

Reference:

http://economictimes.indiatimes.com/news/news-by-industry/et-cetera/mammals-harbour-at-least-320000-new-viruses/articleshow/22253268.cms

http://www.bbc.co.uk/news/science-environment-23932400

Tools to detect synteny block regions among multiple genomes

Jitendra Narayan — Mon, 16 Sep 2013 17:12:02 -0500

A synteny block (synteny etymologically means “on the same ribbon”) is a collection of contiguous genes located on the same chromosome. These blocks have mostly been preserved through genome rearrangements, so synteny blocks from two related species (e.g., humans and mice) will be roughly similar but may be flipped around on the respective genomes. Ovcharenko et al. consider any conserved sequence block to be a synteny block, regardless of whether it encompasses multiple genes, a single gene, or a region devoid of known genes, as long as there is conservation at the sequence level. Today, however, biologists usually refer to synteny as the conservation of blocks of order within two sets of chromosomes that are being compared with each other. This concept can also be referred to as shared synteny. The NHBLI/NCBI Glossary defines synteny as follows: “Two genes which occur on the same chromosome are syntenic; however, syntenic genes may or may not be ‘linked.’”

Nowadays, geneticists have developed a language of their own. They are pouring a lot of money and energy into reading the entire genomic text and understanding God's own code, ATGC. It is fascinating, not only for geneticists but also for non-biologists, that several blocks in the genome have remained conserved over hundreds of millions of years. Several studies have examined conserved blocks and non-conserved regions to understand the mechanisms behind them and their importance (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675965/). The findings indicate that the conservation and rearrangement of certain evolutionarily important genes play an important role in evolution and adaptive change (http://www.nature.com/nature/journal/v491/n7424/abs/nature11622.html , https://academic.oup.com/gbe/article/8/8/2442/2198198/Novel-Insights-into-Chromosome-Evolution-in-Birds , http://science.sciencemag.org/content/346/6215/1311).

But the puzzle remains open: how to correctly define synteny (the presence of two or more genes on the same chromosome) and conserved synteny (the presence of two or more genes on the same chromosome in each of two species) across several genomes.
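The two definitions can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the method of any tool mentioned in this post: it assumes orthologous gene pairs are already known and that each gene is described by its rank (1st, 2nd, 3rd gene, ...) along its chromosome. The names Anchor, synteny_blocks, max_gap and min_anchors are hypothetical. Anchors are chained into one block while they stay on the same chromosome pair, lie within max_gap genes of each other in both genomes, and keep a single orientation.

```python
from typing import List, NamedTuple, Tuple


class Anchor(NamedTuple):
    """One orthologous gene pair; ranks are gene positions along each chromosome."""
    chrom_a: str
    rank_a: int
    chrom_b: str
    rank_b: int


def synteny_blocks(
    anchors: List[Anchor], max_gap: int = 2, min_anchors: int = 3
) -> List[Tuple[int, List[Anchor]]]:
    """Chain collinear anchors into blocks.

    Returns (orientation, anchors) pairs, where orientation is +1 for the
    same gene order in both genomes and -1 for an inverted block.
    """
    blocks: List[Tuple[int, List[Anchor]]] = []
    current: List[Anchor] = []
    orientation = 0  # 0 means "not determined yet"

    # Sorting NamedTuples orders anchors by chromosome and rank in genome A.
    for anchor in sorted(anchors):
        if current:
            prev = current[-1]
            step_a = anchor.rank_a - prev.rank_a
            step_b = anchor.rank_b - prev.rank_b
            direction = 1 if step_b > 0 else -1
            extends = (
                anchor.chrom_a == prev.chrom_a
                and anchor.chrom_b == prev.chrom_b
                and 0 < step_a <= max_gap
                and 0 < abs(step_b) <= max_gap
                and orientation in (0, direction)
            )
            if extends:
                current.append(anchor)
                orientation = direction
                continue
            # The chain is broken: keep it only if it is long enough.
            if len(current) >= min_anchors:
                blocks.append((orientation, current))
        current, orientation = [anchor], 0

    if len(current) >= min_anchors:
        blocks.append((orientation, current))
    return blocks


# Made-up example data: one forward block, one inverted block, one lone gene pair.
example_anchors = [
    Anchor("chr1", 1, "chr4", 10),
    Anchor("chr1", 2, "chr4", 11),
    Anchor("chr1", 3, "chr4", 12),
    Anchor("chr1", 4, "chr7", 30),
    Anchor("chr1", 5, "chr7", 29),
    Anchor("chr1", 6, "chr7", 28),
    Anchor("chr2", 1, "chr9", 5),
]
blocks = synteny_blocks(example_anchors)
for block_orientation, members in blocks:
    print(block_orientation, [(m.rank_a, m.rank_b) for m in members])
```

Real tools differ mainly in how they choose the anchors (genes, markers or raw sequence alignments) and in how tolerant they are of gaps and small rearrangements, which is what max_gap and min_anchors stand in for here.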

Figure: Image generated with Evolution Highway (EH) tool http://eh-demo.ncsa.illinois.edu/

With this approach to defining conserved synteny in mind, various algorithms have been developed to identify conserved homologous synteny blocks (HSBs) among species. Some of the tools commonly used for synteny detection are:

SyntenyTracker (http://www-app.igb.uiuc.edu/labs/lewin/donthu/Synteny_assign/html/)

SyntenyTracker was shown to be an efficient and accurate automated tool for defining HSBs using datasets that may contain minor errors resulting from limitations in map construction methodologies.

CoGe (http://genomevolution.org/CoGe/SynFind.pl)

Satsuma (http://evomics.org/learning/genomics/satsuma/)

Cinteny (http://cinteny.cchmc.org/)

The Cinteny server can be used for finding syntenic regions across multiple genomes and for measuring the extent of genome rearrangement using reversal distance.

OrthoCluster (http://krono.act.uji.es/noticias/orthocluster-a-new-tool-for-mining-syntenic-blocks)

OrthoCluster is a tool for mining syntenic blocks in comparative genomics.

SynMap (http://genomevolution.org/wiki/index.php/SynMap, http://genomevolution.org/CoGe/SynMap.pl)

SyMAP (http://www.symapdb.org/)

SyMAP (Synteny Mapping and Analysis Program) v4.0 is an automated system for identifying and displaying genome synteny alignments. The genomes may be represented by sequenced chromosomes (pseudomolecules), by draft sequence contigs, or by FPC physical maps (with BAC-end or marker sequence).

RegionMiner (http://www.genomatix.de/online_help/help_regionminer/orthologous.html)

SyntenyMiner (http://syntenyminer.sourceforge.net/)

SyntenyMiner is being developed as an application to visualize and interrogate comparisons among multiple complete genome sequences.

AutoGRAPH (http://autograph.genouest.org/)

AutoGRAPH is an integrated web server for multi-species comparative genomic analysis. It is designed for constructing and visualizing synteny maps between two or three species, determination and display of macrosynteny and microsynteny relationships among species, and for highlighting evolutionary breakpoints.

SynChro (http://www.lgm.upmc.fr/CHROnicle/SynChro.html)

SynChro is a tool designed to define conserved synteny blocks. It reconstructs synteny blocks from pairwise comparisons of multiple genomes. The reconstructed synteny blocks may overlap each other, be included in one another, or be duplicated due to micro-rearrangements.

SyntenyView (http://www.cbs.dtu.dk/dtucourse/cookbooks/nikob/exercises/gf1_output_5.html)

Ensembl 'SyntenyView' shows conservation of large-scale gene order between species pairs. A brief summary of the calculation method appears at the bottom of this help page. The left of a 'SyntenyView' page displays a diagram of chromosomes with blocks of conserved synteny. The right of a page shows homology matches between individual genes within syntenic blocks.

SynBrowse (http://www.synbrowse.org/)

SynBrowse (Synteny Browser) is a generic sequence comparison tool for visualizing genome alignments both within and between species. It is intended to help scientists study and analyze synteny, homologous genes and other conserved elements between sequences. This software is useful in studying genome duplication and evolution. It can also aid in identifying uncharacterized genes, putative regulatory elements and novel structural features of study species by comparing to a well annotated reference sequence, thus enabling genome curators to refine and edit annotations of species that have incomplete genome annotations.

Sibelia (http://arxiv.org/abs/1307.7941)

Sibelia is a comparative genomics tool. It assists biologists in analysing the genomic variations that correlate with pathogens, or the genomic changes that help microorganisms adapt to different environments. Sibelia will also be helpful for evolutionary and genome rearrangement studies of multiple strains of microorganisms.

GSV (http://cas-bioinfo.cas.unt.edu/gsv/homepage.php)

Genome Synteny Viewer allows users to upload files which contain synteny regions between two or more genomes and interactively visualize the synteny between them. GSV also allows users to upload annotation files to visualize annotated regions in addition to synteny regions.

MicroSyn (http://www.lgm.upmc.fr/CHROnicle/SynChro.html)

MicroSyn is software for detecting microsynteny in adjacent genomic regions surrounding genes in gene families. It searches for conserved, flanking collinear homologous gene pairs between two genomic fragments to determine the relationship between two members of a gene family.

SynOrth (http://synorth.genereg.net/)

Synorth [s n ôrth], named as a combination of "synteny" and "ortholog", is designed for the study of evolutionary changes of genomic regulatory blocks (GRBs) in vertebrate genomes, and especially the changes following the whole-genome duplication in teleost fish, by tracing the gain and loss of orthologous genes in ancient synteny blocks.

SyDiG (http://www.ncbi.nlm.nih.gov/pubmed/21441096)

Uncovering Synteny in Distant Genomes.

MapSynteny (http://www.automatizacionysistemas.com/download.html)

MapSynteny is a macro for MS Excel® that creates images showing the relationship between genetic maps and large sequences (scaffolds, chromosomes, BACs, etc.). Based on tab-delimited BLAST results and some formulas, a suitable image of syntenic relationships or physical mapping can be obtained. http://www.automatizacionysistemas.com/Poster_MapSynteny.pdf

One of the best synteny tutorials for beginners: http://www.nature.com/scitable/topicpage/synteny-inferring-ancestral-genomes-44022
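Cinteny, listed above, measures rearrangement with reversal distance. Computing that distance exactly takes more machinery than fits in a blog post, so the sketch below shows a simpler related quantity instead: the number of breakpoints in a signed gene order, and the lower bound it gives on the number of reversals (a single reversal can repair at most two breakpoints). This is an illustration under stated assumptions, not Cinteny's algorithm. It assumes that one genome is written as 1..n and the other as a signed permutation of it, and the function names are hypothetical.

```python
from math import ceil
from typing import Sequence


def count_breakpoints(signed_order: Sequence[int]) -> int:
    """Count breakpoints of a signed gene order relative to the order 1..n.

    The order is framed with 0 and n+1; every neighbouring pair that is not
    of the form (x, x+1) counts as one breakpoint.
    """
    n = len(signed_order)
    framed = [0, *signed_order, n + 1]
    return sum(1 for left, right in zip(framed, framed[1:]) if right - left != 1)


def reversal_lower_bound(signed_order: Sequence[int]) -> int:
    """Lower bound on the reversal distance: at least breakpoints / 2 reversals."""
    return ceil(count_breakpoints(signed_order) / 2)


# Made-up example: genes 2 and 3 sit in an inverted segment.
example_order = [1, -3, -2, 4]
print(count_breakpoints(example_order), reversal_lower_bound(example_order))
```

In the example, one reversal of the inverted segment restores the order 1 2 3 4, so the bound happens to be exact. In general it is only a lower bound.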

Reference:

http://www.nature.com/scitable/topicpage/synteny-inferring-ancestral-genomes-44022

http://www.nature.com/nature/journal/v491/n7424/full/nature11622.html

http://en.wikipedia.org/wiki/Synteny

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675965/

MedGenome is looking for Genome Analysts

Fri, 01 Jan 2021 11:06:23 -0600

MedGenome is looking for Genome Analysts (5-6 positions): ambitious and energetic people who will work both independently and as part of a collaborative team to generate data from various genomics-oriented workflows and assist in the optimization and validation of new technologies and procedures.
• Master’s in Science, 0 – 4 years of relevant experience
• Interpretation of variants/mutations causing genetic disorders using standard guidelines.
• Support in data analysis of projects

Reach out to careers@medgenome.com with your detailed profile.

Perl one-liners for bioinformaticians!

Abhimanyu Singh — Fri, 30 May 2014 05:49:07 -0500

With the emergence of NGS technologies and the sequencing data they produce, most bioinformaticians munge and wrangle massive amounts of genomic text. Although there are several "standardized" file formats (FASTQ, SAM, VCF, etc.) and tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times when knowing a few Perl one-liners is extremely helpful.

Perl one-liners are small and awesome Perl programs that fit in a single line of code and do one thing really well. These things include changing line spacing, numbering lines, doing calculations, converting and substituting text, deleting and printing certain lines, parsing logs, editing files in-place, doing statistics, carrying out system administration tasks, updating a bunch of files at once, and many more. Perl one-liners will make you a shell warrior. Anything that took you minutes to solve will now take you seconds!

perl -pe '$\="\n"'
#double space a file

perl -pe '$_ .= "\n" unless /^$/'
#double space a file except blank lines

perl -pe '$_.="\n"x7'
#add 7 blank lines after every line

perl -ne 'print unless /^$/'
#remove all blank lines

perl -lne 'print if length($_) < 20'
#print all lines with length less than 20.

perl -00 -pe ''
#Collapse every run of consecutive blank lines into a single blank line (-00 reads the file paragraph by paragraph)

perl -00 -pe '$_.="\n"x4'
#Expand the blank lines between paragraphs: 4 extra newlines are appended to every paragraph

perl -pe '$_ = "$. $_"'
#Number all lines in a file

perl -pe '$_ = ++$a." $_" if /./'
#Number only non-empty lines in a file

perl -ne 'print ++$a." $_" if /./'
#Number and print only non-empty lines in a file

perl -pe '$_ = ++$a." $_" if /regex/'
#Number only lines that match a pattern

perl -ne 'print ++$a." $_" if /regex/'
#Number and print only lines that match a pattern

perl -ne 'printf "%-5d %s", $., $_ if /regex/'
#Print lines that match a pattern, prefixed with the line number left-aligned in a 5-character field (perl -ne 'printf "%-5d %s", $., $_' does the same for all lines)

perl -le 'print scalar(grep{/./}<>)'
#prints the total number of non-empty lines in a file

perl -lne '$a++ if /regex/; END {print $a+0}'
#print the total number of lines that matches the pattern

perl -alne 'print scalar @F'
#print the number of fields (words) in each line

perl -alne '$t += @F; END { print $t}'
#Find total number of words in the file

perl -alne 'map { /regex/ && $t++ } @F; END { print $t }'
#find total number of fields that match the pattern

perl -lne '/regex/ && $t++; END { print $t }'
#Find total number of lines that match a pattern

perl -le '$n = 20; $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $m'
#will calculate the GCD of two numbers.

perl -le '$a = $n = 20; $b = $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $a*$b/$m'
#will calculate the LCM (least common multiple) of 20 and 35.

perl -le '$n=10; $min=5; $max=15; $, = " "; print map { int(rand($max-$min))+$min } 1..$n'
#Generates 10 random integers from 5 to 14 (use rand($max-$min+1) to include 15).

perl -le 'print map { ("a".."z","0".."9")[rand 36] } 1..8'
#Generates an 8 character password from the letters a to z and the digits 0 to 9.

perl -le 'print map { ("a","t","g","c")[rand 4] } 1..20'
#Generates a random sequence 20 nucleotides long.

perl -le 'print "a"x50'
#generate a string of 50 'a' characters

perl -le 'print join ", ", map { ord } split //, "hello world"'
#Will print the ascii value of the string hello world.

perl -le '@ascii = (99, 111, 100, 105, 110, 103); print pack("C*", @ascii)'
#converts ascii values into character strings.

perl -le '@odd = grep {$_ % 2 == 1} 1..100; print "@odd"'
#Generates an array of odd numbers.

perl -le '@even = grep {$_ % 2 == 0} 1..100; print "@even"'
#Generate an array of even numbers

perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file
#ROT13-encode the entire file (shift every letter by 13 positions)

perl -nle 'print uc'
#Convert all text to uppercase:

perl -nle 'print lc'
#Convert text to lowercase:

perl -nle 'print ucfirst lc'
#Lowercase each line, then convert only its first letter to uppercase

perl -ple 'y/A-Za-z/a-zA-Z/'
#Convert upper case to lower case and vice versa

perl -ple 's/(\w+)/\u$1/g'
#Title case: capitalize the first letter of every word

perl -pe 's|\n|\r\n|'
#Convert unix new lines into DOS new lines:

perl -pe 's|\r\n|\n|'
#Convert DOS newlines into unix new line

perl -pe 's|\n|\r|'
#Convert unix newlines into MAC newlines:

perl -pe '/regexp/ && s/foo/bar/'
#Replace the first foo with bar, but only on lines that match regexp.

Reference/Sources:

http://genomics-array.blogspot.in/2010/11/some-unixperl-oneliners-for.html

http://genomespot.blogspot.com/2013/08/a-selection-of-useful-bash-one-liners.html

http://biowize.wordpress.com/2012/06/15/command-line-magic-for-your-gene-annotations/

http://genomics-array.blogspot.com/2010/11/some-unixperl-oneliners-for.html

http://bioexpressblog.wordpress.com/2013/04/05/split-multi-fasta-sequence-file/

Swabs to Genomes: A Comprehensive Workflow

Rahul Nayak — Sun, 10 Aug 2014 03:01:21 -0500

The sequencing, assembly, and basic analysis of microbial genomes, once a painstaking and expensive undertaking, has become almost trivial for research labs with access to standard molecular biology and computational tools. However, there are a wide variety of options available for DNA library preparation and sequencing, and inexperience with bioinformatics can pose a significant barrier to entry for many who may be interested in microbial genomics. The objective of the present study was to design, test, troubleshoot, and publish a simple, comprehensive workflow from the collection of an environmental sample (a swab) to a published microbial genome; empowering even a lab or classroom with limited resources and bioinformatics experience to perform it.

Address of the bookmark: https://peerj.com/preprints/453.pdf