Introduction to Classification Techniques in Bioinformatics

Bioinformatics is the application of computer science techniques to the field of biology. The aims of bioinformatics are to assist life scientists in organizing biological data and developing the necessary computer tools for the discovery of new scientific hypotheses. Classification techniques, also known as clustering techniques, are important in bioinformatics as they allow the separating of various biological data with similar attributes into distinct sets.

  1. History

    • The size of biological data has been growing exponentially, with the doubling of information observed every 15 months. As a result, computer science and informatics techniques are used intensively in the processing and management of biological data. The most fundamental concept in bioinformatics is that most biological data share similar characteristics and can be separated into clusters. For instance, the genes of an organism can be classified into their functional groups or metabolic pathways. Proteins can also be classified based on the genes that are expressed. Classification or clustering techniques are necessary in the management of huge databases of genetic and biological data. There are two primary types of classification techniques in bioinformatics: the hierarchical and the k-Means classification techniques.

    Hierarchical Classification

    • The hierarchical classification technique organizes biological data into a tree data structure. Genes are expressed as nodes in the tree, while each sub-tree of nodes represents a cluster or grouping of genes. The tree could be either rooted or unrooted. A rooted tree is defined as a tree with just a single node at the top. In contrast, an unrooted tree has multiple topmost nodes.

    k-Means Classification

    • A more complicated classification technique is the k-Means classification, which attempts to find a set of centers that minimize the square error distortion among the data sets in multidimensional space. A cluster is classified by grouping related points to their nearest center. The Lloyd algorithm is often used in the k-Means classification technique. In this algorithm, data points are randomly arranged into separate clusters, which are subsequently optimized to produce the minimal local square error distortions.

    Significance

    • After related proteins have been classified into similar groups, life scientists can use that information to predict the properties of certain less-studied proteins. This is also applicable to other aspects of the structure of proteins. Another use of classification techniques is to solve the problem of determining the evolutionary tree of certain organisms based on their genetic sequences. The evolutionary tree is constructed from the DNA sequence of the organism using either hierarchical or k-Means classification techniques.

    Considerations

    • Hierarchical classification technique is a relatively simple and effective way of clustering biological data. In contrast, no efficient algorithm exists at the time of writing that is able to perform the k-Means classification technique effectively as the size of the biological data increases. This suggests that a large computational power is often required to perform k-Means classification, which is an important factor to consider when selecting the classification technique to use in bioinformatics applications.

Related Searches:

References

Comments

You May Also Like

  • Neural Networks & Machine Learning

    Neural networks and machine learning are two important parts of the interdisciplinary field of artificial intelligence. Machine learning includes algorithms machines ...

  • How to Make a House of Mirrors

    No funhouse is complete without a house of mirrors. Whether you are hosting a neighborhood carnival or are turning your home into...

  • Introduction to CSS

    Cascading Style Sheets (CSS) were created by the World Wide Web Consortium, known as W3C. The W3C have set the international standards...

  • Types of Decision Trees

    A common tool that is used in a variety of different problem solving techniques is a decision tree. To accomplish this you...

  • Types of Concrete Finishes

    Types of Concrete Finishes. Concrete is becoming a cost-effective material for a luxurious--but reasonably priced--look inside the home. The finishing techniques used...

  • An Introduction to Tibshirani

    Robert Tibshirani, born in 1956 in Canada, is an accomplished professor of health research and policy and statistics at Stanford University, as...

Related Ads

Featured