BOL: Bioinformatics Algorithms

Last updated 3938 days ago by Jitendra Narayan

An algorithm is a computable set of steps to achieve a desired result.

We use algorithms every day. For example, a recipe for baking a cake is an algorithm. Most programs, with the exception of some artificial intelligence applications, consist of algorithms. Inventing elegant algorithms, that is, algorithms that are simple and require the fewest steps possible, is one of the principal challenges in programming. An algorithm is a description of a procedure that terminates with a result. In other words, an algorithm is a set of instructions, sometimes called a procedure or a function, that is used to perform a certain task. This can be a simple process, such as adding two numbers together, or a complex one, such as adding effects to an image. For example, to sharpen a digital photo, the algorithm processes each pixel in the image and determines which ones to change, and by how much, to make the image look sharper.

In mathematics, computer science, and related subjects, an algorithm is an effective method for solving a problem using a finite sequence of instructions. Algorithms are used for calculation, data processing, and many other fields.
Each algorithm is a list of well-defined instructions for completing a task. Starting from an initial state, the instructions describe a computation that proceeds through a well-defined series of successive states, eventually terminating in a final ending state. The transition from one state to the next is not necessarily deterministic; some algorithms, known as randomized algorithms, incorporate randomness.

History

The idea of an algorithm dates back to antiquity. The concept became more precise with the use of variables in mathematics. Algorithms in the sense now used by computers appeared as soon as the first mechanical engines were invented.
The word algorithm comes from the name of the 9th century Persian Muslim mathematician Abu Abdullah Muhammad ibn Musa Al-Khwarizmi. The word algorism originally referred only to the rules of performing arithmetic using Hindu-Arabic numerals but evolved via European Latin translation of Al-Khwarizmi's name into algorithm by the 18th century. The use of the word evolved to include all definite procedures for solving problems or performing tasks.
The algorithm of Archimedes gives an approximation of the number pi.
Eratosthenes defined an algorithm for finding prime numbers.
Averroes (1126-1198) used algorithmic methods for calculations.
Adelard of Bath (12th century) introduced the term algorismus, from Al-Khwarizmi.
From the 1800s to the mid-1900s:

- George Boole (1847) invented binary algebra, the basis of computers. In effect, he unified logic and calculation in a common symbolism.

- Gottlob Frege (1879) devised a formula language, that is, a lingua characterica: a language written with special symbols, "for pure thought", that is free from rhetorical embellishments... constructed from specific symbols that are manipulated according to definite rules.

- Giuseppe Peano (1888): his The principles of arithmetic, presented by a new method was the first attempt at an axiomatization of mathematics in a symbolic language.

- Alfred North Whitehead and Bertrand Russell, in their Principia Mathematica (1910-1913), further simplified and amplified the work of Frege.

- Kurt Gödel (1931) cited the paradox of the liar in a proof that completely reduces rules of recursion to numbers.

The concept of algorithm was formalized in 1936 through Alan Turing's Turing machines and Alonzo Church's lambda calculus, which in turn formed the foundation of computer science.
Stephen C. Kleene (1943) defined his now-famous thesis known as the "Church-Turing Thesis". In this context:

"Algorithmic theories... In setting up a complete algorithmic theory, what we do is to describe a procedure, performable for each set of values of the independent variables, which procedure necessarily terminates and in such manner that from the outcome we can read a definite answer, "yes" or "no," to the question, "is the predicate value true?""

Classification

Classification by purpose

Each algorithm has a goal; for example, the purpose of the Quick Sort algorithm is to sort data in ascending or descending order. Because the number of possible goals is unbounded, algorithms are grouped by kind of purpose.

Classification by implementation

An algorithm may be implemented according to different basic principles.

Recursive or iterative

A recursive algorithm is one that calls itself repeatedly until a certain condition is met. It is a method common to functional programming.
Iterative algorithms use repetitive constructs such as loops.
Some problems are better suited to one implementation or the other. For example, the Towers of Hanoi problem is most easily understood in its recursive implementation. Every recursive version has an iterative equivalent, and vice versa.
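
As an illustration of that equivalence, here is a minimal sketch of Towers of Hanoi written both ways. The function names and the peg labels "A", "B", "C" are assumptions made for this example; the iterative version simply replaces the call stack with an explicit stack of pending tasks.

```python
def hanoi_recursive(n, source="A", target="C", spare="B"):
    """Return the list of (from_peg, to_peg) moves that transfers n disks."""
    if n == 0:
        return []
    return (
        hanoi_recursive(n - 1, source, spare, target)   # clear the way
        + [(source, target)]                            # move the largest disk
        + hanoi_recursive(n - 1, spare, target, source) # restack on top of it
    )


def hanoi_iterative(n, source="A", target="C", spare="B"):
    """Produce the same moves using an explicit stack instead of recursion."""
    moves = []
    stack = [("solve", n, source, target, spare)]
    while stack:
        kind, k, src, dst, tmp = stack.pop()
        if kind == "move":
            moves.append((src, dst))
        elif k > 0:
            # Push in reverse order so tasks pop in the recursive order.
            stack.append(("solve", k - 1, tmp, dst, src))
            stack.append(("move", k, src, dst, tmp))
            stack.append(("solve", k - 1, src, tmp, dst))
    return moves


if __name__ == "__main__":
    for move in hanoi_recursive(3):
        print(move)
```

Both functions return the same sequence of moves; for n disks that sequence has 2**n - 1 entries.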

Logical or procedural

An algorithm may be viewed as controlled logical deduction.
A logic component expresses the axioms which may be used in the computation and a control component determines the way in which deduction is applied to the axioms.
This is the basis of logic programming. In pure logic programming languages the control component is fixed and algorithms are specified by supplying only the logic component.

Serial or parallel

Algorithms are usually discussed with the assumption that computers execute one instruction of an algorithm at a time. This is a serial algorithm, as opposed to parallel algorithms, which take advantage of computer architectures that can process several instructions at once. Parallel algorithms divide the problem into sub-problems and pass them to several processors. Iterative algorithms are often parallelizable. Sorting algorithms can be parallelized efficiently.
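
The "divide into sub-problems, hand them to several workers" idea can be sketched as follows. This is only a structural illustration: the function name parallel_sort and the default of 4 workers are assumptions, and in standard CPython a thread pool will not speed up CPU-bound sorting. Swapping in concurrent.futures.ProcessPoolExecutor is one possible way to get real parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def parallel_sort(data, workers=4):
    """Sort chunks concurrently, then merge the sorted chunks."""
    if not data:
        return []
    size = -(-len(data) // workers)  # ceiling division: items per chunk
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is an independent sub-problem handled by one worker.
        sorted_chunks = list(pool.map(sorted, chunks))
    # Combine the partial results into one fully sorted list.
    return list(heapq.merge(*sorted_chunks))


if __name__ == "__main__":
    print(parallel_sort([5, 3, 8, 1, 9, 2, 7]))
```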

Deterministic or non-deterministic

Deterministic algorithms solve a problem with a predefined process, whereas non-deterministic algorithms must guess the best solution at each step through the use of heuristics.

Classification by design paradigm

A design paradigm is a domain in research or class of problems that requires a dedicated kind of algorithm:

Divide and conquer

A divide and conquer algorithm repeatedly reduces an instance of a problem to one or more smaller instances of the same problem (usually recursively), until the instances are small enough to solve easily. One example is merge sort: the data is divided into segments, each segment is sorted, and the sorted segments are merged in the conquer phase to sort the entire data set.
The binary search algorithm is an example of a variant of divide and conquer called decrease and conquer, which solves a single smaller instance of the same problem and uses the solution of this subproblem to solve the bigger problem.
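
The two examples just mentioned can be sketched in a few lines. This is a minimal illustration, not a tuned implementation; the function names are chosen for this example, and binary_search returning -1 for a missing item is an assumed convention.

```python
def merge_sort(items):
    """Divide and conquer: split, sort each half, merge the results."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    # Conquer phase: merge two sorted halves into one sorted list.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def binary_search(sorted_items, target):
    """Decrease and conquer: discard half of the range at each step."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # assumed convention for "not found"


if __name__ == "__main__":
    data = merge_sort([8, 3, 5, 1, 9, 2])
    print(data, binary_search(data, 5))
```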

Dynamic programming

The shortest path in a weighted graph can be found by using the shortest path to the goal from all adjacent vertices.
When the optimal solution to a problem can be constructed from optimal solutions to subproblems, dynamic programming avoids recomputing solutions that have already been computed.
- The main difference from the "divide and conquer" approach is that subproblems are independent in divide and conquer, whereas subproblems overlap in dynamic programming.
- Dynamic programming and memoization go together. The difference from straightforward recursion is the caching, or memoization, of recursive calls. Where subproblems are independent, caching brings no benefit. By using memoization or maintaining a table of subproblems already solved, dynamic programming reduces the exponential nature of many problems to polynomial complexity.
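
To make the effect of memoization concrete, the sketch below computes Fibonacci numbers twice: once by plain recursion and once with a cache of solved subproblems. Fibonacci is used here only as a small stand-in for a problem with overlapping subproblems; the function names and the one-element counter list are assumptions of this example.

```python
def fib_naive(n, counter):
    """Plain recursion: the same subproblems are solved again and again."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


def fib_memo(n, counter):
    """Same recurrence, but each subproblem is computed only once."""
    cache = {}

    def solve(k):
        if k in cache:
            return cache[k]
        counter[0] += 1  # counts only subproblems actually computed
        result = k if k < 2 else solve(k - 1) + solve(k - 2)
        cache[k] = result
        return result

    return solve(n)


if __name__ == "__main__":
    naive_calls, memo_calls = [0], [0]
    print(fib_naive(20, naive_calls), "naive calls:", naive_calls[0])
    print(fib_memo(20, memo_calls), "memoized computations:", memo_calls[0])
```

The naive call count grows exponentially with n, while the memoized version computes each of the n + 1 subproblems once.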

The greedy method

A greedy algorithm is similar to a dynamic programming algorithm, but the difference is that solutions to the subproblems do not have to be known at each stage. Instead, a "greedy" choice is made of what looks like the best solution at the moment.
A well-known greedy algorithm is Kruskal's algorithm for finding a minimum spanning tree.
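
A minimal sketch of Kruskal's algorithm follows, assuming vertices are numbered 0 to num_vertices - 1 and edges are given as (weight, u, v) tuples; that input format and the small union-find helper are choices made for this example. The greedy step is visible in the loop: always take the cheapest remaining edge that does not close a cycle.

```python
def kruskal(num_vertices, edges):
    """Return (total_weight, chosen_edges) of a minimum spanning forest.

    edges is an iterable of (weight, u, v) tuples.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent links to the component root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, chosen = 0, []
    for weight, u, v in sorted(edges):  # cheapest edge first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # edge joins two components: keep it
            parent[root_u] = root_v
            total += weight
            chosen.append((u, v, weight))
    return total, chosen


if __name__ == "__main__":
    example = [(1, 0, 1), (3, 1, 2), (2, 0, 2), (4, 2, 3), (5, 1, 3)]
    print(kruskal(4, example))
```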

Linear programming

The problem is expressed as a set of linear inequalities, and an attempt is then made to maximize or minimize a linear function of the inputs. This can solve many problems, such as the maximum flow for directed graphs, notably by using the simplex algorithm.
A more complex variant of linear programming is integer programming, where the solution space is restricted to the integers.

Reduction, also called transform and conquer

Solve a problem by transforming it into another problem. A simple example: finding the median of an unsorted list can be done by first transforming the problem into a sorting problem and then taking the middle element of the sorted list. The main goal of reduction is to find the simplest transformation possible.
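
The median example can be written as a short sketch. The function name is hypothetical, and averaging the two middle elements for an even-length list is an assumption added here; the text above only speaks of "the middle element".

```python
def median_by_sorting(values):
    """Reduce median finding to sorting, then read off the middle."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)          # the transformed (sorting) problem
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Assumption: for an even count, average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


if __name__ == "__main__":
    print(median_by_sorting([7, 1, 5]))
    print(median_by_sorting([4, 1, 3, 2]))
```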

Using graphs

Many problems, such as playing chess, can be modeled as problems on graphs. Graph exploration algorithms are then used to solve them.
This category also includes the search algorithms and backtracking.
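
As one example of graph exploration, here is a minimal breadth-first search sketch. It assumes the graph is stored as a dictionary mapping each node to a list of neighbours, which is a representation chosen for this example, and it returns a path with the fewest edges, or None when the goal cannot be reached.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return a fewest-edges path from start to goal, or None."""
    previous = {start: None}   # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
    print(bfs_path(graph, "A", "E"))
```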

The probabilistic and heuristic paradigm

Probabilistic

Probabilistic algorithms make some choices randomly.

Genetic

Genetic algorithms attempt to find solutions to problems by mimicking biological evolutionary processes, with a cycle of random mutations yielding successive generations of "solutions". Thus, they emulate reproduction and "survival of the fittest".
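
The cycle of selection, reproduction and mutation can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the function name evolve, the "OneMax" goal (maximize the number of 1 bits in a list), the parameter defaults, and the choice to keep the fitter half of each generation unchanged.

```python
import random


def evolve(length=20, population_size=30, generations=50,
           mutation_rate=0.05, seed=0):
    """Toy genetic algorithm: evolve a bit list with as many 1s as possible."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    fitness = sum              # fitness = number of 1 bits

    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Survival of the fittest: keep the better half unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[:population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            # Reproduction: one-point crossover of two surviving parents.
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)
            child = mother[:cut] + father[cut:]
            # Random mutation: flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, sum(best))
```

Because the surviving half is carried over unchanged, the best fitness in the population never decreases from one generation to the next.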

Heuristic

Heuristic algorithms do not aim to find an optimal solution, but an approximate one, for cases where the time or resources needed to find a perfect solution are not practical.

Classification by complexity

Some algorithms complete in linear time, some complete in an exponential amount of time, and some never complete.
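
The gap between linear and exponential time can be made visible by counting steps. In this sketch (function names and the step counters are assumptions for illustration), a membership scan takes at most n steps for n values, while a brute-force subset-sum search examines up to 2**n subsets.

```python
def contains(values, target):
    """Linear time: at most len(values) comparisons."""
    steps = 0
    for value in values:
        steps += 1
        if value == target:
            return True, steps
    return False, steps


def subset_sum(values, target):
    """Exponential time: examines up to 2**len(values) subsets."""
    steps = 0
    n = len(values)
    for mask in range(1 << n):  # each bit pattern selects one subset
        steps += 1
        total = sum(values[i] for i in range(n) if (mask >> i) & 1)
        if total == target:
            return True, steps
    return False, steps


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(contains(data, 9))      # scans all 5 values
    print(subset_sum(data, 100))  # tries all 32 subsets
```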

Algorithm resources on the net

Graph Algorithms in Bioinformatics

Bioinformatics Algorithms Description

Bioinformatics Algorithms Course Page

Bioinformatics Algorithm Demonstrations

Introduction to Bioinformatics Algorithms Lectures 1-2 by Dr. Max Alekseyev USC, 2009

Online Lectures on Bioinformatics

Sequence Alignment Algorithms

Algorithm for sequence alignment: dynamic programming

Network Protocol Analysis using Bioinformatics Algorithms