<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/31087?offset=1280</link>
	<atom:link href="https://bioinformaticsonline.com/related/31087?offset=1280" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	
<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/researchlabs/view/855/bahlo-lab</guid>
  <pubDate>Sun, 14 Jul 2013 12:17:38 -0500</pubDate>
  <link></link>
  <title><![CDATA[Bahlo Lab]]></title>
  <description><![CDATA[
<p>Melanie Bahlo is an applied statistician working in statistical genetics, bioinformatics and population genetics. Her main research area is linkage mapping in humans and mice.</p>

<p>Research Area:<br />Mapping loci in ENU mutants in mice in complex pedigrees<br />Investigation of DNA sharing in distantly related individuals<br />CNV analysis in pedigrees and connections to linkage studies<br />Statistical Genetics</p>

<p>Link @ http://www.wehi.edu.au/faculty_members/dr_melanie_bahlo</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/34552/edit-distance-application-in-bioinformatics</guid>
	<pubDate>Thu, 07 Dec 2017 08:46:51 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/34552/edit-distance-application-in-bioinformatics</link>
	<title><![CDATA[Edit distance application in bioinformatics!]]></title>
	<description><![CDATA[<p>Besides the Levenshtein distance, which allows insertion, deletion and substitution, there are other popular measures of&nbsp;<a href="https://en.wikipedia.org/wiki/Edit_distance" title="Edit distance">edit distance</a>, each calculated using a different set of allowable edit operations. For instance:</p><ul>
<li>the&nbsp;<a href="https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance" title="Damerau&ndash;Levenshtein distance">Damerau&ndash;Levenshtein distance</a>&nbsp;allows insertion, deletion, substitution, and the&nbsp;<a href="https://en.wikipedia.org/wiki/Transposition_(mathematics)" title="Transposition (mathematics)">transposition</a>&nbsp;of two adjacent characters;</li>
<li>the&nbsp;<a href="https://en.wikipedia.org/wiki/Longest_common_subsequence_problem" title="Longest common subsequence problem">longest common subsequence</a>&nbsp;(LCS) distance allows only insertion and deletion, not substitution;</li>
<li>the&nbsp;<a href="https://en.wikipedia.org/wiki/Hamming_distance" title="Hamming distance">Hamming distance</a>&nbsp;allows only substitution, so it applies only to strings of the same length;</li>
<li>the&nbsp;<a href="https://en.wikipedia.org/wiki/Jaro_distance" title="Jaro distance">Jaro distance</a>&nbsp;allows only&nbsp;<a href="https://en.wikipedia.org/wiki/Transposition_(mathematics)" title="Transposition (mathematics)">transposition</a>.</li>
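</ul><p>The differences between these measures are easiest to see side by side. The Python sketch below is a minimal illustration, assuming unit cost for every allowed edit; the function names levenshtein, lcs_distance and hamming are hypothetical and are not the API of any of the Perl modules discussed in this post. The first two fill the usual dynamic-programming table one row at a time.</p>

```python
def levenshtein(a: str, b: str) -> int:
    """Insertion, deletion and substitution, each with cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def lcs_distance(a: str, b: str) -> int:
    """Insertion and deletion only: len(a) + len(b) - 2 * (length of the LCS)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return len(a) + len(b) - 2 * prev[-1]


def hamming(a: str, b: str) -> int:
    """Substitution only, so both strings must have the same length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined only for strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


print(levenshtein("foo", "four"))                               # 2
print([levenshtein("foo", w) for w in ("four", "foo", "bar")])  # [2, 0, 3]
print(lcs_distance("foo", "four"))                              # 3
print(hamming("GATTACA", "GACTATA"))                            # 2
```

<p>The Levenshtein results agree with the Text::Levenshtein output shown below. The LCS distance between "foo" and "four" is larger (3) because, without substitution, turning the second o into u costs a deletion plus an insertion.</p><ul>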
</ul><pre>use strict;
use warnings;
use Text::Levenshtein qw(distance);

# Edit distance between two strings
print distance("foo", "four"), "\n";
# prints "2"

# Distances from one string to each string in a list
my @words     = qw/ four foo bar /;
my @distances = distance("foo", @words);

print "@distances\n";
# prints "2 0 3"</pre><pre>use strict;
use warnings;
use Algorithm::LCSS qw( LCSS CSS CSS_Sorted );

# Hypothetical example inputs: the same two sequences as arrays and as strings
my @SEQ1 = qw( A C G T A C G T );
my @SEQ2 = qw( T A C G A A C G );
my ( $STR1, $STR2 ) = ( join( '', @SEQ1 ), join( '', @SEQ2 ) );

my $lcss_ary_ref   = LCSS( \@SEQ1, \@SEQ2 );        # ref to array
my $lcss_string    = LCSS( $STR1, $STR2 );          # string
my $css_ary_ref    = CSS( \@SEQ1, \@SEQ2 );         # ref to array of arrays
my $css_str_ref    = CSS( $STR1, $STR2 );           # ref to array of strings
my $sorted_ary_ref = CSS_Sorted( \@SEQ1, \@SEQ2 );  # ref to array of arrays
my $sorted_str_ref = CSS_Sorted( $STR1, $STR2 );    # ref to array of strings</pre><p>There are many different modules on CPAN for calculating the edit distance between two strings. Here is a selection.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3ALevenshteinXS">Text::LevenshteinXS</a>&nbsp;and&nbsp;<a href="http://search.cpan.org/perldoc?Text%3A%3ALevenshtein%3A%3AXS">Text::Levenshtein::XS</a>&nbsp;are both implementations of the Levenshtein algorithm that require a C compiler, but they are much faster than the pure-Perl Text::Levenshtein.</p><p>The Damerau-Levenshtein edit distance is like the Levenshtein distance, but in addition to insertion, deletion and substitution, it also counts the transposition of two adjacent characters as a single edit. The module&nbsp;<a href="http://search.cpan.org/perldoc?Text%3A%3ALevenshtein%3A%3ADamerau">Text::Levenshtein::Damerau</a>&nbsp;uses a pure-Perl implementation by default, but it is much faster if&nbsp;<a href="http://search.cpan.org/perldoc?Text%3A%3ALevenshtein%3A%3ADamerau%3A%3AXS">Text::Levenshtein::Damerau::XS</a>&nbsp;is installed.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3AWagnerFischer">Text::WagnerFischer</a>&nbsp;implements the Wagner-Fischer edit distance, which is similar to the Levenshtein distance but applies a different weight to each type of edit.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3ABrew">Text::Brew</a>&nbsp;implements the Brew edit distance, another algorithm based on edit weights.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3AFuzzy">Text::Fuzzy</a>&nbsp;provides a number of operations for partial or fuzzy matching of text based on edit distance.&nbsp;<a href="http://search.cpan.org/perldoc?Text%3A%3AFuzzy%3A%3APP">Text::Fuzzy::PP</a>&nbsp;is a pure-Perl implementation of the same interface.</p><p><a href="http://search.cpan.org/perldoc?String%3A%3ASimilarity">String::Similarity</a>&nbsp;takes two strings and returns a value between 0 (entirely different) and 1 (identical); it is reportedly based on edit distance.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3ADice">Text::Dice</a>&nbsp;calculates&nbsp;<a href="https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient">Dice's coefficient</a>&nbsp;for two strings. This formula was originally developed to measure the similarity of two different populations in ecological research.</p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/researchlabs/view/867/bc-cancer-agency-genome-sciences-centre</guid>
  <pubDate>Sun, 14 Jul 2013 13:21:27 -0500</pubDate>
  <link></link>
  <title><![CDATA[BC Cancer Agency Genome Sciences Centre]]></title>
  <description><![CDATA[
<p>Research Area</p>

<p>Genome analysis, genome visualization, mutation detection, molecular docking, comparative genomics, cancer informatics</p>

<p>Link @ http://www.bcgsc.ca</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35534/awk-for-bioinformatician-and-computational-biologist</guid>
	<pubDate>Tue, 06 Feb 2018 14:54:35 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35534/awk-for-bioinformatician-and-computational-biologist</link>
	<title><![CDATA[Awk for bioinformaticians and computational biologists]]></title>
	<description><![CDATA[<p>Awk is a programming language that allows easy manipulation of structured data and is mostly used for pattern scanning and processing. It searches one or more files for lines that match the specified patterns and then performs the associated actions. The basic syntax is:</p><blockquote><pre>awk '/pattern1/ {Actions}
     /pattern2/ {Actions}' file</pre></blockquote><p>Awk works as follows:</p><ul>
<li>Awk reads the input files one line at a time.</li>
<li>Each line is tested against the given patterns in the order they appear; for every pattern that matches, the corresponding action is performed.</li>
<li>If no pattern matches, no action is performed.</li>
<li>Either the search pattern or the action may be omitted, but not both.</li>
<li>If the search pattern is omitted, awk performs the action for every input line.</li>
<li>If the action is omitted, awk performs the default action, which is to print every line that matches the pattern.</li>
<li>Empty braces without any action do nothing; they do not perform the default printing.</li>
<li>Statements within an action are separated by semicolons (or newlines).</li>
</ul><p>Say you have a file data/test.tsv with the following contents:</p><pre>$ cat data/test.tsv
contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2 ACTTTATATATT
contig3 ACTTATATATATATA
contig4 ACTTATATATATATA
contig5 ACTTTATATATT</pre><p>A print statement without arguments prints the whole line, so the following command prints every line of the file:</p><pre>$ awk '{print;}' data/test.tsv
contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2 ACTTTATATATT
contig3 ACTTATATATATATA
contig4 ACTTATATATATATA
contig5 ACTTTATATATT</pre><p>To print only the lines that match the pattern contig3:</p><pre>$ awk '/contig3/' data/test.tsv
contig3 ACTTATATATATATA</pre><p>Awk has a number of built-in variables. For each record (i.e. line), awk splits the record into fields, delimited by whitespace by default, and stores them in the $n variables: if a line has 5 fields, they are stored in $1, $2, $3, $4 and $5, while $0 represents the whole line. NF is a built-in variable holding the total number of fields in the current record, so $NF is the last field.</p><pre>$ awk '{print $1","$2;}' data/test.tsv
contig1,ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2,ACTTTATATATT
contig3,ACTTATATATATATA
contig4,ACTTATATATATATA
contig5,ACTTTATATATT

$ awk '{print $1","$NF;}' data/test.tsv
contig1,ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2,ACTTTATATATT
contig3,ACTTATATATATATA
contig4,ACTTATATATATATA
contig5,ACTTTATATATT</pre><p>Awk has two important special patterns, BEGIN and END. The syntax is as follows:</p><blockquote><pre>BEGIN { Actions before reading the file }
{ Actions for every line in the file }
END { Actions after reading the file }</pre></blockquote><p>For example:</p><pre>$ awk 'BEGIN{print "Header,Sequence"}{print $1","$2;}END{print "-------"}' data/test.tsv
Header,Sequence
contig1,ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2,ACTTTATATATT
contig3,ACTTATATATATATA
contig4,ACTTATATATATATA
contig5,ACTTTATATATT
-------</pre><p>We can also use the conditional operator in a print statement, in the form print CONDITION ? PRINT_IF_TRUE_TEXT : PRINT_IF_FALSE_TEXT. For example, the code below labels each sequence according to whether its length is greater than 14:</p><pre>$ awk '{print (length($2)&gt;14) ? $0"&gt;14" : $0"&lt;=14";}' data/test.tsv
contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG&gt;14
contig2 ACTTTATATATT&lt;=14
contig3 ACTTATATATATATA&gt;14
contig4 ACTTATATATATATA&gt;14
contig5 ACTTTATATATT&lt;=14</pre><p>A pattern of 1 (which is always true) with no action prints every line, because the default action is {print $0}, or simply {print}, since print without an argument prints $0. Placed after an earlier block that modifies $0, it prints the modified lines. For example, to replace the third line (where NR, the built-in record number, equals 3) with its first field:</p><pre>$ awk 'NR==3{$0=$1}1' data/test.tsv
contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2 ACTTTATATATT
contig3
contig4 ACTTATATATATATA
contig5 ACTTTATATATT</pre><p>You can have as many blocks as you want, and they are executed for each line in the order in which they appear. For example, to print $1 three times (printf is used for the first two copies because, unlike print, it does not add an end-of-line character; passing the field through a %s format keeps any % characters in the data from being read as format directives):</p><pre>$ awk '{printf "%s\t", $1}{printf "%s\t", $1}{print $1}' data/test.tsv
contig1 contig1 contig1
contig2 contig2 contig2
contig3 contig3 contig3
contig4 contig4 contig4
contig5 contig5 contig5</pre><p>You can also skip the remaining blocks for a given line with the next statement:</p><pre>$ awk '{printf "%s\t", $1}NR==3{print "";next}{print $1}' data/test.tsv
contig1 contig1
contig2 contig2
contig3
contig4 contig4
contig5 contig5</pre><pre>$ awk 'NR==3{print "";next}{printf "%s\t", $1}{print $1}' data/test.tsv
contig1 contig1
contig2 contig2

contig4 contig4
contig5 contig5</pre><p>You can also use getline to read lines from another file in addition to the one being processed. In the example below, the while loop in the BEGIN block reads each line of data/test.tsv into the variable k until no more lines remain (getline returns 1 when it reads a line, 0 at end of file and -1 on error):</p><pre>$ awk 'BEGIN{while((getline k &lt;"data/test.tsv")&gt;0) print "BEGIN:"k}{print}' data/test.tsv
BEGIN:contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
BEGIN:contig2 ACTTTATATATT
BEGIN:contig3 ACTTATATATATATA
BEGIN:contig4 ACTTATATATATATA
BEGIN:contig5 ACTTTATATATT
contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2 ACTTTATATATT
contig3 ACTTATATATATATA
contig4 ACTTATATATATATA
contig5 ACTTTATATATT</pre><p>You can also store data in memory in an associative array with the syntax VARIABLE_NAME[KEY]=VALUE, and later loop over its keys with for (INDEX in VARIABLE_NAME). The order in which for...in visits the keys is unspecified, so the lines may come out in a different order:</p><pre>$ awk '{i[$1]=1}END{for (j in i) print j"&lt;="i[j]}' data/test.tsv
contig1&lt;=1
contig2&lt;=1
contig3&lt;=1
contig4&lt;=1
contig5&lt;=1</pre>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/901/bioinformatics-definitions</guid>
	<pubDate>Mon, 15 Jul 2013 03:01:07 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/901/bioinformatics-definitions</link>
	<title><![CDATA[Bioinformatics Definitions]]></title>
	<description><![CDATA[<p>"Bioinformatics is a science of biological predictions and analysis" --&nbsp;Jitendra Narayan</p><p>"The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information."</p><p>"The collection, organization and analysis of large amounts of biological data, using networks of computers and databases." - from the glossary for ABC Science Online's feature: The State of the Genome 2001.</p><p>"It is defined here as an interdisciplinary research area that applies computer and information science to solve biological problems. However, this is not the only definition. The field is being defined (and redefined) at present, and there are probably as many definitions as there are bioinformaticians (bioinformaticists?).</p><p>The following references are a snapshot of the moving target named bioinformatics. ... " - from the University of Minnesota Graduate Program in Bioinformatics' page: What is Bioinformatics.<br /><br />"The application of computer technology to the management of biological information. Bioinformatics uses computers to solve problems in the life sciences, such as determination of DNA and protein sequences, investigation of protein functions, development of pharmaceuticals. It involves the creation of extensive electronic databases on genomes and protein sequences, and techniques such as the three-dimensional modeling of biomolecules and biologic systems. ..." - from the Bioinformatics Glossary edited by Charles E. Kahn, Jr., Medical College of Wisconsin.<br /><br />"Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned." - from the National Center for Biotechnology Information's Bioinformatics Factsheet.<br /><br />"Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data." - NIH Bioinformatics Web site.<br /><br />"The use of computers, laboratory robots and software to create, manage and interpret massive sets of complex biological data." - from the glossary for the University of Michigan Health System's Symphony of Life: Genetics &amp; Medicine Web site.<br /><br />"The field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics: (1) the development of new algorithms and statistics with which to assess relationships among members of large data sets; (2) the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and (3) the development and implementation of tools that enable efficient access and management of different types of information." - U.S. Environmental Protection Agency's Computational Toxicology Research Glossary.<br /><br />What is Bioinformatics? "One idea for a definition: (Molecular) Bio - informatics = is conceptualizing biology in terms of molecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large-scale." - by Mark Gerstein, Gerstein Group - Yale Bioinformatics.<br /><br /><strong>Bioinformatics</strong></p><p><strong>Definition:</strong></p><p>Bioinformatics derives knowledge from computer analysis of biological data. 
These data include the information stored in the genetic code, but also experimental results from various sources, patient statistics, and the scientific literature. Research in bioinformatics includes method development for the storage, retrieval, and analysis of these data. Bioinformatics is a rapidly developing branch of biology and is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics. It has many practical applications in different areas of biology and medicine.</p><p><strong>Description:</strong></p><p>The history of computing in biology goes back to the 1920s, when scientists were already thinking of establishing biological laws solely from data analysis by induction (e.g. A.J. Lotka, Elements of Physical Biology, 1925). However, only the development of powerful computers and the availability of experimental data that can be readily treated by computation (for example, DNA or amino acid sequences and three-dimensional structures of proteins) launched bioinformatics as an independent field. Today, practical applications of bioinformatics are readily available through the world wide web and are widely used in biological and medical research. As the field is rapidly evolving, the very definition of bioinformatics is still a matter of some debate.</p><p>The relationship between computer science and biology is a natural one for several reasons. First, the phenomenal rate at which biological data are being produced poses challenges: massive amounts of data have to be stored, analysed, and made accessible. Second, the nature of the data is often such that a statistical method, and hence computation, is necessary. This applies in particular to the information encoded by the DNA on the building plans of proteins and on the temporal and spatial organisation of their expression in the cell. Third, there is a strong analogy between the DNA sequence and a computer program (it has even been argued that DNA can be viewed as a Turing machine).</p><p>Analyses in bioinformatics focus on three types of datasets: genome sequences, macromolecular structures, and functional genomics experiments (e.g. expression data, yeast two-hybrid screens). But bioinformatic analysis is also applied to various other data, e.g. taxonomy trees, relationship data from metabolic pathways, the text of scientific papers, and patient statistics. A large range of techniques is used, including primary sequence alignment, protein 3D structure alignment, phylogenetic tree construction, prediction and classification of protein structure, prediction of RNA structure, prediction of protein function, and expression data clustering. Algorithmic development is an important part of bioinformatics, and techniques and algorithms have been developed specifically for the analysis of biological data (e.g., the dynamic programming algorithm for sequence alignment).</p><p>Bioinformatics has a large impact on biological research. Giant research projects such as the Human Genome Project [4] would be meaningless without the bioinformatics component. The goal of sequencing projects, for example, is not to corroborate or refute a hypothesis, but to provide raw data for later analysis. Once the raw data are available, hypotheses may be formulated and tested in silico. In this manner, computer experiments may answer biological questions that cannot be tackled by traditional approaches. This has led to the founding of dedicated bioinformatics research groups as well as to a different work practice in the average bioscience laboratory, where the computer has become an essential research tool.</p><p>Three key areas are the organisation of knowledge in databases, sequence analysis, and structural bioinformatics.</p><p><strong>Organizing biological knowledge in databases:</strong></p><p>Biological raw data are stored in public databanks (such as GenBank or EMBL for primary DNA sequences). The data can be submitted and accessed via the world wide web. Protein sequence databanks like TrEMBL provide the most likely translation of all coding sequences in the EMBL databank. Sequence data are prominent, but other data are also stored, e.g. yeast two-hybrid screens, expression arrays, systematic gene knock-out experiments, and metabolic pathways.</p><p>The stored data need to be accessed in a meaningful way, and often the contents of several databanks or databases have to be accessed simultaneously and correlated with each other. Special retrieval systems have been developed to facilitate this task (such as the Sequence Retrieval System (SRS) and the Entrez system). An unsolved problem is the optimal design of interoperating database systems. Databases provide additional functionality such as access to sequence homology searches and links to other databases and analysis results. For example, SWISS-PROT [1] contains verified protein sequences with additional annotation describing the function of each protein. Protein 3D structures are stored in specific databases (for example, the Protein Data Bank [2], now primarily curated and developed by the Research Collaboratory for Structural Bioinformatics). Organism-specific databases have been developed (such as ACEDB, A C. elegans DataBase, for the C. elegans genome, and FlyBase for D. melanogaster). A major problem is the presence of errors in databanks and databases (mostly errors in annotation), particularly because errors propagate easily through links.</p><p>Databases of scientific literature (such as PubMed and MEDLINE) also provide additional functionality, e.g. they can search for similar articles based on word-usage analysis. Text-mining systems are being developed that automatically extract knowledge about protein function, notably on protein&ndash;protein interactions, from the abstracts of scientific articles.</p><p><strong>Analysing sequence data:</strong></p><p>The primary data of sequencing projects are DNA sequences. These only become truly valuable once they are annotated. Several layers of analysis with bioinformatics tools are necessary to get from a raw DNA sequence to annotated protein sequences:</p><ul>
<li>establish the correct order of sequence contigs to obtain one continuous sequence;</li>
<li>find the translation and transcription initiation sites, find promoter sites, define open reading frames (ORFs);</li>
<li>find splice sites, introns, exons;</li>
<li>translate the DNA sequence into a protein sequence, searching all six frames;</li>
<li>compare the DNA sequence to known protein sequences in order to verify exons and other features against homologous sequences.</li>
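</ul><p>The six-frame translation step in the list above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code (NCBI translation table 1) and DNA written with A, C, G and T; the function names are hypothetical, codons containing any other character are translated as X, stop codons appear as *, and labelling the reverse-strand frames -1 to -3 is a common convention rather than a universal one.</p>

```python
from itertools import product

# Standard genetic code (NCBI table 1), with codons enumerated in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ..., GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase DNA string (non-ACGT characters are kept)."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons starting at position 0; '*' = stop, 'X' = unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Translations of all six reading frames, keyed '+1'..'+3' and '-1'..'-3'."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])  # forward strand
        frames[f"-{offset + 1}"] = translate(rev[offset:])  # reverse strand
    return frames


frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"], frames["-1"])  # MA* LGH
```

<p>Incomplete codons at the end of a frame are simply dropped; a real annotation pipeline would then scan these translations for open reading frames between start and stop codons.</p><ul>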
</ul><p>Some completely automated annotation systems have been developed (e.g., GENEQUIZ), which use a multitude of different programs and methods.</p><p>The protein sequences are further analysed to predict function. The function can often be inferred if a sequence of a homologous protein with known function can be found. Homology searches are the predominant bioinformatics application, and very efficient search methods have been developed [3]. The often difficult distinction between orthologous sequences and paralogous sequences facilitates the functional annotation in the comparison of whole genomes. Several methods detect glycolysation, myristylation and other sites, and the prediction of signal peptides in the amino acid sequence give valuable information about the subcellular location of a protein.</p><p>The ultimate goal of sequence annotation is to arrive at a complete functional description of all genes of an organism. However, function is an ill&ndash;defined concept. Thus, the simplified idea of &ldquo;one gene &ndash; one protein &ndash; one structure &ndash; one function&rdquo; cannot take into account proteins that have multiple functions depending on context (e.g., subcellar location and the presence of cofactors). Well-known cases of &ldquo;moonlighting&rdquo; proteins are lens crystalline and phosphoglucose isomerase. Currently, work on ontologies is under way to explicitly define a vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing.</p><p>Families of similar sequences contain information on sequence evolution in the form of specific conservation patters at all sequence positions. Multiple sequence alignments are useful for</p><ul>
<li>building sequence profiles or Hidden Markov Models to perform more sensitive homology searches. A sequence profile contains information about the variability of every sequence position. improving structure prediction methods (secondary structure prediction). Sequence profile searches have become readily available through the introduction of PsiBLAST [3];</li>
<li>studying evolutionary aspects, by the construction of phylogenetic trees from the pairwise differences between sequences: for example, the classification with 70S, 30S RNAs established the separate kingdom of archeae;</li>
<li>determining active site residues, and residues specifc for subfamilies;</li>
<li>predicting protein&ndash;protein interactions;</li>
<li>analysing single nucleotide polymorphisms to hunt for genetic sources of deseases.</li>
<li>Many complete genomes of microorganisms and a few of eukaryotes are available [4]. By analysis of entire genome sequences a wealth of additional information can be obtained. The complete genomic sequence contains not only all protein sequences but also sequences regulating gene expression. A comparison of the genomes of genetically close organisms reveals genes responsible for specific properties of the organisms (e.g., infectivity). Protein interactions can be predicted from conservation of gene order or operon organisation in different genomes. Also the detection of gene fusion and gene fission (i.e, one protein is split into two in another genome) events helps to deduce protein interactions.</li>
</ul><p><strong>Structural bioinformatics:</strong></p><p>This branch of bioinformatics is concerned with computational approaches to predict and analyse the spatial structure of proteins and nucleic acids. Whereas in many cases the primary sequence uniquely specifies the three&ndash;dimensional (3D) structure, the specific rules are not well understood, and the protein folding problem remains largely unsolved. Some aspects of protein structure can already be predicted from amino acid content. Secondary structure can be deduced from the primary sequence with statistics or neural networks. When using a multiple sequence alignment, secondary structure can be predicted with an accuracy above 70 %.</p><p>3D models can be obtained most easily if the 3D structure of a homologous protein is known (homology modelling, comparative modelling). A homology model can only be as good as the sequence alignment: whereas protein relationships can be detected at the 20% identity level and below, a correct sequence alignment becomes very difficult, and the homology model will be doubtful. From 40 to 50% identity the models are usually mostly correct; however, it is possible to have 50% identity between two carefully designed protein sequences with different topology (the so &ndash;called JANUS protein). Remote relationships that are undetectable by sequence comparisons may be detected by sequence&ndash;to&ndash;structure&ndash;fitness (or threading) approaches: the search sequence is systematically compared to all known protein structures. Ab initio predictions of protein 3D structure remains the major challenge; some progress has been made recently by combining statistical with force&ndash;field based approaches.</p><p>Membrane proteins are interesting drug targets. It is estimated that membrane receptors form 50 % of all drug targets in pharmacological research. However, membrane proteins are underrepresented in the PDB structure database. 
Since membrane proteins are usually excluded from structural genomics initiatives due to technical problems, the prediction of transmembrane helices and solvent accessibility is very important. Modern methods can predict transmembrane helices with a reliability greater than 70 %.</p><p>Understanding the 3D structure of a macromolecule is crucial for understanding its function. Many properties of the 3D structure cannot be deduced directly from the primary sequence. Obtaining better understanding of protein function is the driving force behind structural genomics efforts, which can be thus understood as part of functional genomics. Similar structure can imply similar function. General structure&ndash;to&ndash;function relationships can be obtained by statistical approaches, for example, by relating secondary structure to known protein function or surface properties to cell location.</p><p>The increased speed of structure determination necessary for the structural genomics projects make an independent validation of the structures (by comparison to expected properties) particularly important. Structure validation helps to correct obvious errors (e.g., in the covalent structure) and leads to a more standardized representation of structural data, e.g., by agreeing on a common atom name nomenclature. The knowledge of the structure quality is a prerequisite for further use of the structure, e.g in molecular modelling or drug design.</p><p>In order to make as much data on the structure and its determination available in the databases, approaches for automated data harvesting are being developed. Structure classification schemes, as implemented for example in the SCOP, CATH, and FSSP databases, elucidate the relationship between protein folds and function and shed light on the evolution of protein domains.</p><p>Combined analysis of structural and genomic data will certainly get more important in the near future. Protein folds can be analysed for whole genomes. 
Protein-protein interactions predicted at the sequence level can be studied in more detail at the structural level. Single nucleotide polymorphisms (SNPs) can be mapped onto the 3D structures of proteins to elucidate specific structural causes of disease.</p><p>Force-field-based approaches can reveal further details of protein function. Although protein function depends on protein dynamics, no experimental technique can observe these motions directly at atomic resolution, so they have to be simulated with molecular dynamics (MD). Free energy differences (for example, between the binding energies of different ligands of a protein) can also be characterized by MD simulations. Molecular mechanics and molecular dynamics approaches are also needed for homology modelling and for structure refinement in X-ray crystallography and NMR structure determination.</p><p>Structure-based drug design exploits knowledge of the 3D structure of the binding site (or of the structure of its complex with a ligand) to construct potential drugs, for example inhibitors of viral proteins or RNA. In addition to the 3D structure, a force field is needed to evaluate the interaction between the protein and a ligand and to predict binding energies. In virtual screening, a library of molecules is tested computationally for their ability to bind to the macromolecule.</p><p><strong>Pharmacological Relevance:</strong></p><p>Many aspects of bioinformatics are relevant to pharmacology. Drug targets in infectious organisms can be revealed by whole-genome comparisons of infectious and non-infectious organisms. The analysis of single nucleotide polymorphisms reveals genes potentially responsible for genetic diseases. Prediction and analysis of protein 3D structures are used to develop drugs and to understand drug resistance.</p><p>Patient databases with genetic profiles (e.g., for cardiovascular disease, diabetes or cancer) 
may play an important role in future individualized health care by integrating a patient&rsquo;s genetic profile into diagnosis, despite obvious ethical problems. The goal is to analyse a patient&rsquo;s individual genetic profile and compare it with a collection of reference profiles and other related information. This may improve individual diagnosis, prophylaxis and therapy.</p><p><strong>References:</strong></p><p>Bairoch A, Apweiler R (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28:45&ndash;48<br />Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res. 28:235&ndash;242<br />Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389&ndash;3402<br />Pearson WR (2000) Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 132:185&ndash;219<br />International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409:860&ndash;921<br />Venter JC et al. (2001) The sequence of the human genome. Science 291:1304&ndash;1351<br />Fleischmann RD et al. (1995) Whole-genome random sequencing and assembly of <em>Haemophilus influenzae</em>. Science 269:496&ndash;51</p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/36211/project-based-approach-to-improve-bioinformatics-education-with-skilled-and-meaningful-access-to-omics-data</guid>
	<pubDate>Wed, 11 Apr 2018 13:31:42 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/36211/project-based-approach-to-improve-bioinformatics-education-with-skilled-and-meaningful-access-to-omics-data</link>
	<title><![CDATA[Project-based approach to improve bioinformatics education with skilled and meaningful access to omics data]]></title>
	<description><![CDATA[<p>Pine Biotech has been collaborating with Loyola University of New Orleans to pilot a new approach to bioinformatics education using T-BioInfo, an intuitive, logic-driven bioinformatics platform.</p><p>https://edu.t-bio.info/collaborative-model-bioinformatics-education-combining-biologically-inspired-bioinformatics-project-based-learning/</p>]]></description>
	<dc:creator>eliabrodsky</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/915/researcher-in-computer-sciencebiology</guid>
  <pubDate>Mon, 15 Jul 2013 18:38:40 -0500</pubDate>
  <link></link>
  <title><![CDATA[Researcher in computer science/biology]]></title>
  <description><![CDATA[
<p>Researcher in Computer Science at the Computational Biology Unit - temporary employment</p>

<p>The Department of Informatics has a vacant position as a researcher in computer science, associated with the Computational Biology Unit (CBU), for a period of 3 years.<br /> <br />The position is part of the CBU Service Group and will focus on bioinformatics analysis projects, especially the analysis of high-throughput data, including NGS (sequencing) and proteomics data.<br /> <br />The successful candidate will be part of the national helpdesk of the Norwegian bioinformatics platform within the ELIXIR.NO project.<br /> <br />Applicants must hold a PhD in a relevant subject such as computer science, mathematics or molecular biology, and must also have expertise and experience in bioinformatics, statistics and the analysis of data from high-throughput molecular experiments.<br /> <br />Basic programming or scripting skills are required. Experience with Python, R, Perl and Linux-based operating systems, as well as knowledge of databases and web programming, will be an advantage.<br /> <br />We expect enthusiasm, independence and the ability to work in an interdisciplinary team.<br /> <br />Good knowledge of English is required.<br /> <br />The salary starts at level 57 (code 1109/LR 24.1) on appointment. Further promotion follows according to length of service in the position (grades 57-65). A higher salary may be considered for particularly highly qualified applicants.<br /> <br />Further information about the position is available from the chair of the CBU, Professor Inge Jonassen, e-mail: Inge.Jonassen @ ii.uib.no<br /> <br />The successful applicant must comply with the guidelines that apply to the position at any given time.<br /> <br />State employment shall, as far as possible, reflect the diversity of the population. It is therefore an objective to achieve a balanced age and gender composition and to recruit persons with immigrant backgrounds. Persons with an immigrant background are encouraged to apply for the position.<br /> <br />Women are particularly encouraged to apply. If the experts find that several applicants have approximately equivalent qualifications, the rules on gender equality in the Personnel Regulations for Academic Positions will be applied.<br /> <br />The University of Bergen applies the principle of public openness when recruiting staff to scientific positions.<br /> <br />Information about the applicant may be made public even if the applicant has requested not to be named in the list of applicants. If the request is not granted, the applicant will be notified.<br /> <br />Send your application, CV, certificates, diplomas, undergraduate work and a list of publications online via https://www.jobbnorge.no/jobbsoknet/login.aspx?returnurl=/jobbsoknet/jobapplication.aspx?jobid=95196<br /> <br />Certified translations into English or a Scandinavian language must be uploaded for appendices such as diplomas and transcripts.<br /> <br />Applications sent by email to individuals at the institute will not be considered.<br /> <br />Deadline: 9 August 2013</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/file/view/38029/biologist-versus-computational-biologist</guid>
	<pubDate>Mon, 29 Oct 2018 04:23:24 -0500</pubDate>
	<link>https://bioinformaticsonline.com/file/view/38029/biologist-versus-computational-biologist</link>
	<title><![CDATA[Biologist versus computational biologist !]]></title>
	<description><![CDATA[<p>This is how it works :)</p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
	<enclosure url="https://bioinformaticsonline.com/file/download/38029" length="69305" type="image/png" />
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/1149/system-biologist-at-millennium-software-productions-india-private-limited</guid>
  <pubDate>Fri, 19 Jul 2013 09:43:53 -0500</pubDate>
  <link></link>
  <title><![CDATA[System Biologist at Millennium Software productions India Private Limited]]></title>
  <description><![CDATA[
<p>Millennium Software productions India Private Limited</p>

<p>www.cytosolve.com</p>

<p>Post - System Biologist</p>

<p>Job Description: The role of the systems biologist is to design quantitative models of biomolecular networks and to study the interactions between the components of biological systems (enzymes, metabolites and pathways), and how these interactions give rise to the function and behavior of the system.</p>

<p>Qualification : B.Tech or M.Sc in Bioinformatics</p>

<p>Required Skills:</p>

<p>1) Basic knowledge of cell signaling pathways, chemical/enzyme kinetics, and differential-equation-based modeling approaches.<br />2) Previous laboratory experience would be an advantage.<br />3) Good communication skills.</p>

<p>Contact: santhiya.ram@mproductions.com, 044-42946555.</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/39471/bioinformatics-for-precision-oncology-online-training-program-summer-2019</guid>
	<pubDate>Wed, 05 Jun 2019 15:04:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/39471/bioinformatics-for-precision-oncology-online-training-program-summer-2019</link>
	<title><![CDATA[Bioinformatics for Precision Oncology - Online Training Program, Summer 2019]]></title>
	<description><![CDATA[<p><img src="https://edu.t-bio.info/wp-content/uploads/2019/05/OncologyBioinformatics.jpeg" width="600" height="337.5" alt="image" style="border: 0px;"></p><p>The Bioinformatics for Precision Oncology online course provides an opportunity to learn about bioinformatics methods used in precision oncology research and practice. As a subset of precision medicine, precision oncology deals with the molecular factors involved in the biological processes that lead to cancer, and can help diagnose, treat or prevent the disease. Oncology is data-driven, often relying on data generated by next-generation sequencing (NGS), which helps us study genomic and transcriptomic processes at the sub-cellular level. Learn more and register:&nbsp;https://edu.t-bio.info/bioinformatics-training-precision-oncology/</p>]]></description>
	<dc:creator>eliabrodsky</dc:creator>
</item>

</channel>
</rss>