<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3299081579068555714</id><updated>2011-11-27T15:35:46.707-08:00</updated><title type='text'>bioworld</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>37</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-1776283550981828769</id><published>2008-02-24T19:49:00.000-08:00</published><updated>2008-02-24T19:50:26.623-08:00</updated><title type='text'>GENERATION OF ANTIBODY DIVERSITY</title><content type='html'>&lt;p&gt;The immune system has the capacity to recognize and respond to about 10&lt;sup&gt;7&lt;/sup&gt; different antigens. This extreme diversity can be generated in at least three possible ways: &lt;/p&gt;&lt;ol&gt;&lt;li&gt;Multiple genes in the germ line DNA. 
&lt;/li&gt;&lt;li&gt;Variable recombination during the differentiation of germ line cells into B-cells. &lt;/li&gt;&lt;li&gt;Mutation during the differentiation of germ line cells into B-cells. &lt;/li&gt;&lt;/ol&gt; &lt;p&gt;It is known that all three of these possibilities take place to produce antibody diversity. The following figures illustrate these possibilities: &lt;/p&gt;&lt;p&gt; &lt;img class="flt" src="http://www.cehs.siu.edu/fix/medmicro/pix/igdiv1.gif" alt="Antibody Diversity: Multiple Genes" height="115" width="295" /&gt; &lt;/p&gt;&lt;ol&gt;&lt;li&gt;The figure shows the genetic makeup of a germ line cell and a mature B-cell at the loci controlling heavy chain production. Germ line DNA has many (up to 200) different variable (V) region genes, in addition to 12 diversity (D) region genes and four joining (J) region genes. During differentiation of this cell into the B-cell, rearrangement of the DNA occurs. This rearrangement aligns one of the many V genes with one of the D genes and one of the J genes, producing a functional VDJ recombinant gene. Since any of the genes may recombine with any others, this rearrangement has the potential to generate 200 x 12 x 4 = 9600 different possible combinations. The same type of event occurs in the genes encoding the immunoglobulin light chains, where about 200 different V regions may recombine with about 5 different J regions, giving rise to 200 x 5 = 1000 possible light chains. Since in any particular B-cell any light chain combination can occur along with any heavy chain combination, the total number of possible immunoglobulin combinations approaches 10&lt;sup&gt;7&lt;/sup&gt; (9600 x 1000). &lt;p&gt; &lt;img class="flt" src="http://www.cehs.siu.edu/fix/medmicro/pix/igdiv2.gif" alt="Antibody Diversity: Variable Recombination" height="147" width="399" /&gt; &lt;/p&gt;&lt;/li&gt;&lt;li&gt;A second way that diversity can result is through a process of variable or "inaccurate" recombination. 
The figure illustrates three possible recombination events between the variable (V) and joining (J) regions of an immunoglobulin light chain. In the first event, a proline-tryptophan dipeptide sequence is produced in the resulting protein. However, in the second and third events, differential recombination places proline-arginine or proline-proline sequences into the resulting immunoglobulin. These types of events may also occur between the V and D regions and the D and J regions of the heavy chain DNA sequence.&lt;p&gt; &lt;img class="flt" src="http://www.cehs.siu.edu/fix/medmicro/pix/igdiv3.gif" alt="Antibody Diversity: Somatic Mutation" height="132" width="202" /&gt; &lt;/p&gt;&lt;/li&gt;&lt;li&gt;A third way that diversity can result is through a process of mutation. This process simply involves changes in DNA sequence that occur during differentiation of the B-cell. The figure illustrates how an A:T to G:C transition mutation could change a serine residue into a glycine residue in the resulting immunoglobulin. This process may, in part, explain the diversity observed in hypervariable (CDR) regions.&lt;p&gt; &lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;  &lt;p&gt; &lt;/p&gt;&lt;hr /&gt; &lt;div&gt;&lt;b&gt;IMMUNOGLOBULIN PRODUCTION&lt;/b&gt; &lt;p&gt;The production of immunoglobulins by B-cells or plasma cells occurs in different stages. During differentiation of the B-cells from precursor stem cells, rearrangement, recombination and mutation of the immunoglobulin V, D, and J regions occurs to produce functional VJ (light chain) and VDJ (heavy chain) genes. At this point, the antigen specificity of the mature B-cell has been determined. Each cell can make only one heavy chain and one light chain, although the isotype of the heavy chain may change. Initially, a mature B-cell will produce primarily IgD (and some membrane IgM) that will migrate to the cell surface to act as the antigen receptor. 
Upon stimulation by antigen, the B-cell will differentiate into a plasma cell expressing large amounts of secreted IgM. Some cells will undergo a "class switch" during which a rearrangement of the DNA will occur, placing the VDJ gene next to the genes encoding the IgG, IgE or IgA constant regions. Upon secondary induction (i.e. the secondary response), these B-cells will differentiate into plasma cells expressing the new isotype. Most commonly, this results in a switch from IgM (primary response) to IgG (secondary response). The factors that lead to production of IgE or IgA instead of IgG are not well understood. &lt;/p&gt;&lt;/div&gt; &lt;p&gt; &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-1776283550981828769?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/1776283550981828769/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=1776283550981828769' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/1776283550981828769'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/1776283550981828769'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2008/02/generation-of-antibody-diversity.html' title='GENERATION OF ANTIBODY DIVERSITY'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-6938270949196303386</id><published>2008-02-24T19:48:00.000-08:00</published><updated>2008-02-24T19:49:30.221-08:00</updated><title type='text'>Immunoglobulins</title><content type='html'>&lt;p&gt;Immunoglobulins generally assume one of two roles: they may act i) as plasma membrane-bound antigen receptors on the surface of a B-cell or ii) as antibodies free in cellular fluids, functioning to intercept and eliminate antigenic determinants. In either role, antibody function is intimately related to structure. This page introduces immunoglobulins (antibodies) and relates their structure to their function in host defense. &lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;hr /&gt; &lt;div&gt;&lt;b&gt;BASIC IMMUNOGLOBULIN STRUCTURE&lt;/b&gt;&lt;/div&gt; &lt;div class="flt"&gt;&lt;img src="http://www.cehs.siu.edu/fix/medmicro/pix/ig.gif" alt="Basic Immunoglobulin Structure" height="190" width="367" /&gt;&lt;/div&gt;  &lt;p&gt;Immunoglobulins are composed of four polypeptide chains: two "light" chains (lambda or kappa), and two "heavy" chains (alpha, delta, gamma, epsilon or mu). The type of heavy chain determines the immunoglobulin isotype (IgA, IgD, IgG, IgE, IgM, respectively). Light chains are composed of 220 amino acid residues while heavy chains are composed of 440-550 amino acids. Each chain has "constant" and "variable" regions as shown in the figure. Variable regions are contained within the amino (NH&lt;sub&gt;2&lt;/sub&gt;) terminal end of the polypeptide chain (amino acids 1-110). When comparing one antibody to another, these amino acid sequences are quite distinct. Constant regions, comprising amino acids 111-220 in light chains (or 111 to 440-550 in heavy chains), are comparatively uniform from one antibody to another within the same isotype. 
"Hypervariable" regions, or "Complementarity Determining Regions" (CDRs) are found within the variable regions of both the heavy and light chains. These regions serve to recognize and bind specifically to antigen. The four polypeptide chains are held together by covalent disulfide (-S-S-) bonds. &lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;div class="ctr"&gt;&lt;img src="http://www.cehs.siu.edu/fix/medmicro/pix/igg2.gif" alt="IgG2 3-Dimensional Structure, side view" height="300" width="384" /&gt;&lt;img src="http://www.cehs.siu.edu/fix/medmicro/pix/igg_ag2.gif" alt="IgG2 3-Dimensional Structure, Ag binding" height="300" width="384" /&gt;&lt;/div&gt; &lt;p&gt; &lt;/p&gt;&lt;div class="ctr"&gt;&lt;a href="http://www.cehs.siu.edu/fix/medmicro/3d_reps.htm" onmouseover="self.status='Use RASMOL to visualize 3-D Structures'; return true" onmouseout="self.status=''; return true"&gt;Click here to visualize these 3D structures in real time!&lt;/a&gt;&lt;/div&gt; &lt;p&gt;Structural differences between immunoglobulins are used for their classification. As stated above, the type of heavy chain an immunoglobulin possesses determines the immunoglobulin &lt;span class="rd"&gt;"isotype"&lt;/span&gt;. More specifically, an isotype is determined by the primary sequence of amino acids in the constant region of the heavy chain, which in turn determines the three-dimensional structure of the molecule. Since immunoglobulins are proteins, they can act as an antigen, eliciting an immune response that generates anti-immunoglobulin antibodies. However, the structural (three-dimensional) features that define isotypes are not immunogenic in an animal of the same species, since they are not seen as "foreign". For example, the five human isotypes, IgA, IgD, IgG, IgE and IgM are found in all humans and a result, injection of human IgG into another human would not generate antibodies directed against the structural features (determinants) that define the IgG isotype. 
However, injection of human IgG into a rabbit &lt;i&gt;would&lt;/i&gt; generate antibodies directed against those same structural features. &lt;/p&gt;&lt;p&gt; Another means of classifying immunoglobulins is defined by the term &lt;span class="rd"&gt;"allotype"&lt;/span&gt;. Like isotypes, allotypes are determined by the amino acid sequence and corresponding three-dimensional structure of the constant region of the immunoglobulin molecule. Unlike isotypes, allotypes reflect genetic differences between members of the same species. This means that not all members of the species will possess any particular allotype. Therefore, injection of any specific human allotype into another human could possibly generate antibodies directed against the structural features that define that particular allotypic variation. &lt;/p&gt;&lt;p&gt; A third means of classifying immunoglobulins is defined by the term &lt;span class="rd"&gt;"idiotype"&lt;/span&gt;. Unlike isotypes and allotypes, idiotypes are determined by the amino acid sequence and corresponding three-dimensional structure of the variable region of the immunoglobulin molecule. In this regard, idiotypes reflect the antigen binding specificity of any particular antibody molecule. Idiotypes are so unique that an individual person is probably capable of generating antibodies directed against their own idiotypic determinants. This probability forms the basis of the Idiotypic Network Hypothesis to be described later. &lt;/p&gt;&lt;p&gt;   &lt;/p&gt;&lt;hr /&gt; &lt;div&gt;&lt;b&gt;BASIC IMMUNOGLOBULIN FUNCTION&lt;/b&gt;&lt;/div&gt; &lt;p&gt;Antibodies function in a variety of ways designed to eliminate the antigen that elicited their production. Some of these functions are independent of the particular class (isotype) of immunoglobulin. These functions reflect the antigen binding capacity of the molecule as defined by the variable and hypervariable (idiotypic) regions. 
For example, an antibody might bind to a toxin and prevent that toxin from entering host cells where its biological effects would be activated. Similarly, a different antibody might bind to the surface of a virus and prevent that virus from entering its host cell. In contrast, other antibody functions are dependent upon the immunoglobulin class (isotype). These functions are contained within the constant regions of the molecule. For example, only IgG and IgM antibodies have the ability to interact with and initiate the complement cascade. Likewise, only IgG molecules can bind to the surface of macrophages via Fc receptors to promote and enhance phagocytosis. The following table summarizes some immunoglobulin properties.&lt;/p&gt;&lt;p&gt; &lt;/p&gt; &lt;table&gt;&lt;tbody&gt;&lt;tr&gt; &lt;th class="b"&gt;Isotype&lt;/th&gt; &lt;th class="b"&gt;Structure&lt;/th&gt; &lt;th class="b"&gt;Placental transfer&lt;/th&gt; &lt;th class="b"&gt;Binds mast cell surfaces&lt;/th&gt; &lt;th class="b"&gt;Binds phagocytic cell surfaces&lt;/th&gt; &lt;th class="b"&gt;Activates complement&lt;/th&gt; &lt;th class="b"&gt;Additional features&lt;/th&gt; &lt;/tr&gt;  &lt;tr&gt; &lt;th&gt;IgM&lt;/th&gt; &lt;th&gt;&lt;img src="http://www.cehs.siu.edu/fix/medmicro/pix/igm.gif" alt="Structure of IgM" height="83" width="86" /&gt;&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;th&gt;+&lt;/th&gt; &lt;td&gt;First Ab in development and response.&lt;/td&gt; &lt;/tr&gt;  &lt;tr&gt; &lt;th&gt;IgD&lt;/th&gt; &lt;th&gt;&lt;img src="http://www.cehs.siu.edu/fix/medmicro/pix/igd.gif" alt="Structure of IgD" height="55" width="89" /&gt;&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;td&gt;B-cell receptor.&lt;/td&gt; &lt;/tr&gt;  &lt;tr&gt; &lt;th&gt;IgG&lt;/th&gt; &lt;th&gt;&lt;img src="http://www.cehs.siu.edu/fix/medmicro/pix/iga.gif" alt="Structure of IgG" height="37" width="38" /&gt;&lt;/th&gt; 
&lt;th&gt;+&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;th&gt;+&lt;/th&gt; &lt;th&gt;+&lt;/th&gt; &lt;td&gt;Involved in opsonization and ADCC. Four subclasses; IgG1, IgG2, IgG3, IgG4.&lt;/td&gt; &lt;/tr&gt;  &lt;tr&gt; &lt;th&gt;IgE&lt;/th&gt; &lt;th&gt;&lt;img src="http://www.cehs.siu.edu/fix/medmicro/pix/ige.gif" alt="Structure of IgE" height="63" width="82" /&gt;&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;th&gt;+&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;td&gt;Involved in allergic responses.&lt;/td&gt; &lt;/tr&gt;  &lt;tr&gt; &lt;th&gt;IgA&lt;/th&gt; &lt;th&gt;&lt;img src="http://www.cehs.siu.edu/fix/medmicro/pix/iga.gif" alt="Structure of IgA" height="37" width="38" /&gt;&lt;br /&gt;&lt;img src="http://www.cehs.siu.edu/fix/medmicro/pix/siga.gif" alt="Structure of sIgA" height="34" width="80" /&gt;&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;th&gt;-&lt;/th&gt; &lt;td&gt;Two subclasses; IgA1, IgA2. Also found as dimer (sIgA) in secretions.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-6938270949196303386?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/6938270949196303386/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=6938270949196303386' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/6938270949196303386'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/6938270949196303386'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2008/02/immunoglobulins.html' 
title='Immunoglobulins'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-8905220033944350737</id><published>2007-07-27T06:41:00.000-07:00</published><updated>2007-07-27T06:43:54.189-07:00</updated><title type='text'>Transposon</title><content type='html'>Transposons are sequences of &lt;a href="http://en.wikipedia.org/wiki/DNA" title="DNA"&gt;DNA&lt;/a&gt; that can move around to different positions within the &lt;a href="http://en.wikipedia.org/wiki/Genome" title="Genome"&gt;genome&lt;/a&gt; of a single &lt;a href="http://en.wikipedia.org/wiki/Cell_%28biology%29" title="Cell (biology)"&gt;cell&lt;/a&gt;, a process called &lt;b&gt;transposition&lt;/b&gt;. In the process, they can cause &lt;a href="http://en.wikipedia.org/wiki/Mutation" title="Mutation"&gt;mutations&lt;/a&gt; and change the amount of DNA in the genome. Transposons are also called "jumping genes", and are examples of &lt;a href="http://en.wikipedia.org/wiki/Mobile_genetic_elements" title="Mobile genetic elements"&gt;mobile genetic elements&lt;/a&gt;. They were discovered by &lt;a href="http://en.wikipedia.org/wiki/Barbara_McClintock" title="Barbara McClintock"&gt;Barbara McClintock&lt;/a&gt; early in her career&lt;sup id="_ref-0" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Transposon#_note-0" title=""&gt;[1]&lt;/a&gt;&lt;/sup&gt;, and the work went on to earn her a &lt;a href="http://en.wikipedia.org/wiki/Nobel_Prize" title="Nobel Prize"&gt;Nobel Prize&lt;/a&gt; in &lt;a href="http://en.wikipedia.org/wiki/1983" title="1983"&gt;1983&lt;/a&gt;. There are a variety of mobile genetic elements, and they can be grouped based on their mechanism of transposition. 
Class I mobile genetic elements, or &lt;a href="http://en.wikipedia.org/wiki/Retrotransposon" title="Retrotransposon"&gt;retrotransposons&lt;/a&gt;, move in the genome by being &lt;a href="http://en.wikipedia.org/wiki/Transcription_%28genetics%29" title="Transcription (genetics)"&gt;transcribed&lt;/a&gt; to &lt;a href="http://en.wikipedia.org/wiki/RNA" title="RNA"&gt;RNA&lt;/a&gt; and then back to DNA by &lt;a href="http://en.wikipedia.org/wiki/Reverse_transcriptase" title="Reverse transcriptase"&gt;reverse transcriptase&lt;/a&gt;, while class II mobile genetic elements move directly from one position to another within the genome using a &lt;a href="http://en.wikipedia.org/wiki/Transposase" title="Transposase"&gt;transposase&lt;/a&gt; to "cut and paste" them within the genome. Transposons are very useful to researchers as a means to alter DNA inside of a living organism. Transposons make up a large fraction of &lt;a href="http://en.wikipedia.org/wiki/Genome_size" title="Genome size"&gt;genome sizes&lt;/a&gt; which is evident through the &lt;a href="http://en.wikipedia.org/wiki/C-value" title="C-value"&gt;C-values&lt;/a&gt; of &lt;a href="http://en.wikipedia.org/wiki/Eukaryote" title="Eukaryote"&gt;eukaryotic&lt;/a&gt; species. 
As an example, about 48% of the &lt;a href="http://en.wikipedia.org/wiki/Human_genome" title="Human genome"&gt;human genome&lt;/a&gt; is composed of transposons and their defunct remnants.&lt;br /&gt;&lt;h2&gt;&lt;span class="mw-headline"&gt;Types of transposons&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Transposons are grouped into two classes based on their mechanism of transposition.&lt;/p&gt; &lt;p&gt;&lt;a name="Class_I:_Retrotransposons" id="Class_I:_Retrotransposons"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Class I: Retrotransposons&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Retrotransposons work by copying themselves and pasting copies back into the genome in multiple places. Retrotransposons are first copied to &lt;a href="http://en.wikipedia.org/wiki/RNA" title="RNA"&gt;RNA&lt;/a&gt; (&lt;a href="http://en.wikipedia.org/wiki/Transcription_%28genetics%29" title="Transcription (genetics)"&gt;transcription&lt;/a&gt;); the RNA is then copied back into &lt;a href="http://en.wikipedia.org/wiki/DNA" title="DNA"&gt;DNA&lt;/a&gt; by a &lt;a href="http://en.wikipedia.org/wiki/Reverse_transcriptase" title="Reverse transcriptase"&gt;reverse transcriptase&lt;/a&gt; (often encoded by the transposon itself) and inserted back into the genome.&lt;/p&gt; &lt;p&gt;Retrotransposons behave very similarly to retroviruses, such as &lt;a href="http://en.wikipedia.org/wiki/HIV" title="HIV"&gt;HIV&lt;/a&gt;, giving a clue to the &lt;a href="http://en.wikipedia.org/wiki/Evolution" title="Evolution"&gt;evolutionary&lt;/a&gt; origins of such viruses.&lt;/p&gt; &lt;p&gt;There are three main classes of retrotransposons:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;Viral: encode reverse transcriptase (to 
reverse transcribe RNA into DNA), have long terminal repeats (LTRs), similar to retroviruses&lt;/li&gt;&lt;li&gt;LINEs: encode reverse transcriptase, lack LTRs, transcribed by &lt;a href="http://en.wikipedia.org/wiki/RNA_polymerase_II" title="RNA polymerase II"&gt;RNA polymerase II&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Nonviral superfamily: do not code for reverse transcriptase, transcribed by &lt;a href="http://en.wikipedia.org/wiki/RNA_polymerase_III" title="RNA polymerase III"&gt;RNA polymerase III&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;&lt;a name="Class_II:_DNA_transposons" id="Class_II:_DNA_transposons"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Class II: DNA transposons&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The major difference of Class II transposons from retrotransposons is that their transposition mechanism does not involve an RNA intermediate. Class II transposons usually move by &lt;a href="http://en.wikipedia.org/wiki/Cut_and_paste" title="Cut and paste"&gt;cut and paste&lt;/a&gt;, rather than copy and paste, using the &lt;a href="http://en.wikipedia.org/wiki/Transposase" title="Transposase"&gt;transposase&lt;/a&gt; enzyme. Different types of transposase work in different ways. Some can bind to any part of the DNA molecule, and the target site can therefore be anywhere, while others bind to specific sequences. Transposase makes a staggered cut at the target site producing sticky ends, cuts out the transposon and ligates it into the target site. A &lt;a href="http://en.wikipedia.org/wiki/DNA_polymerase" title="DNA polymerase"&gt;DNA polymerase&lt;/a&gt; fills in the resulting gaps from the sticky ends and &lt;a href="http://en.wikipedia.org/wiki/DNA_ligase" title="DNA ligase"&gt;DNA ligase&lt;/a&gt; closes the sugar-phosphate backbone. 
This results in target site duplication, and the insertion sites of DNA transposons may be identified by short direct repeats (from the staggered cut in the target DNA, filled in by DNA polymerase) followed by inverted repeats (which are important for excision of the transposon by transposase).&lt;/p&gt; &lt;p&gt;Not all DNA transposons transpose through a cut and paste mechanism. In some cases, &lt;a href="http://en.wikipedia.org/wiki/Replicative_transposition" title="Replicative transposition"&gt;replicative transposition&lt;/a&gt; is observed, in which the transposon replicates itself to a new target site.&lt;/p&gt; &lt;p&gt;Both classes of transposon may lose their ability to synthesise reverse transcriptase or transposase through mutation, yet continue to jump through the genome because other transposons are still producing the necessary enzyme.&lt;/p&gt; &lt;p&gt;&lt;a name="Examples" id="Examples"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Examples&lt;/span&gt;&lt;/h2&gt; &lt;ul&gt;&lt;li&gt;The first transposons were discovered in &lt;a href="http://en.wikipedia.org/wiki/Maize" title="Maize"&gt;maize&lt;/a&gt; (&lt;i&gt;Zea mays&lt;/i&gt;, corn) by &lt;a href="http://en.wikipedia.org/wiki/Barbara_McClintock" title="Barbara McClintock"&gt;Barbara McClintock&lt;/a&gt; in &lt;a href="http://en.wikipedia.org/wiki/1948" title="1948"&gt;1948&lt;/a&gt;, for which she was awarded a &lt;a href="http://en.wikipedia.org/wiki/Nobel_Prize" title="Nobel Prize"&gt;Nobel Prize&lt;/a&gt; in &lt;a href="http://en.wikipedia.org/wiki/1983" title="1983"&gt;1983&lt;/a&gt;. 
She noticed &lt;a href="http://en.wikipedia.org/wiki/Genetic_insertion" title="Genetic insertion"&gt;insertions&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Genetic_deletion" title="Genetic deletion"&gt;deletions&lt;/a&gt;, and &lt;a href="http://en.wikipedia.org/wiki/Chromosomal_translocation" title="Chromosomal translocation"&gt;translocations&lt;/a&gt; caused by these transposons. These changes in the genome could, for example, lead to a change in the color of corn kernels. About 50% of the total genome of maize consists of transposons. The elements of the Ac/Ds system McClintock described are class II transposons.&lt;/li&gt;&lt;li&gt;One family of transposons in the fruit fly &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Drosophila_melanogaster" title="Drosophila melanogaster"&gt;Drosophila melanogaster&lt;/a&gt;&lt;/i&gt; is called &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/P_element" title="P element"&gt;P elements&lt;/a&gt;&lt;/i&gt;. They seem to have first appeared in the &lt;a href="http://en.wikipedia.org/wiki/Species" title="Species"&gt;species&lt;/a&gt; only in the middle of the twentieth century. Within 50 years, they have spread through every &lt;a href="http://en.wikipedia.org/wiki/Population" title="Population"&gt;population&lt;/a&gt; of the species. Artificial P elements can be used to insert genes into &lt;i&gt;Drosophila&lt;/i&gt; by injecting the &lt;a href="http://en.wikipedia.org/wiki/Embryo" title="Embryo"&gt;embryo&lt;/a&gt;. 
For the use of P elements as a genetic tool see: "&lt;a href="http://en.wikipedia.org/w/index.php?title=Transposons_as_a_genetic_tool&amp;action=edit" class="new" title="Transposons as a genetic tool"&gt;transposons as a genetic tool&lt;/a&gt;".&lt;/li&gt;&lt;li&gt;Transposons in &lt;a href="http://en.wikipedia.org/wiki/Bacterium" title="Bacterium"&gt;bacteria&lt;/a&gt; usually carry an additional gene for a function other than transposition, often for &lt;a href="http://en.wikipedia.org/wiki/Antibiotic_resistance" title="Antibiotic resistance"&gt;antibiotic resistance&lt;/a&gt;. In bacteria, transposons can jump from &lt;a href="http://en.wikipedia.org/wiki/Chromosome" title="Chromosome"&gt;chromosomal&lt;/a&gt; DNA to &lt;a href="http://en.wikipedia.org/wiki/Plasmid" title="Plasmid"&gt;plasmid&lt;/a&gt; DNA and back, allowing for the transfer and permanent addition of genes such as those encoding antibiotic resistance (&lt;a href="http://en.wikipedia.org/wiki/Multidrug_resistance" title="Multidrug resistance"&gt;multi-antibiotic resistant&lt;/a&gt; bacterial strains can be generated in this way). Bacterial transposons of this type belong to the Tn family. When the transposable elements lack additional genes, they are known as &lt;a href="http://en.wikipedia.org/wiki/Insertion_sequence" title="Insertion sequence"&gt;insertion sequences&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;The most common form of transposon in &lt;a href="http://en.wikipedia.org/wiki/Human" title="Human"&gt;humans&lt;/a&gt; is the &lt;a href="http://en.wikipedia.org/wiki/Alu_sequence" title="Alu sequence"&gt;Alu sequence&lt;/a&gt;. 
The &lt;a href="http://en.wikipedia.org/wiki/Alu_sequence" title="Alu sequence"&gt;Alu sequence&lt;/a&gt; is approximately 300 bases long and can be found between 300,000 and a million times in the human &lt;a href="http://en.wikipedia.org/wiki/Genome" title="Genome"&gt;genome&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Mu phage transposition is the best known example of &lt;a href="http://en.wikipedia.org/wiki/Replicative_transposition" title="Replicative transposition"&gt;replicative transposition&lt;/a&gt;. Its transposition mechanism is somewhat similar to a &lt;a href="http://en.wikipedia.org/wiki/Homologous_recombination" title="Homologous recombination"&gt;homologous recombination&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;&lt;a name="Transposons_causing_diseases" id="Transposons_causing_diseases"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Transposons causing diseases&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Transposons are &lt;a href="http://en.wikipedia.org/wiki/Mutagen" title="Mutagen"&gt;mutagens&lt;/a&gt;. 
They can damage the genome of their host cell in different ways:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;A transposon or a retroposon that inserts itself into a functional gene will most likely disable that gene.&lt;/li&gt;&lt;li&gt;After a transposon leaves a gene, the resulting gap will probably not be repaired correctly.&lt;/li&gt;&lt;li&gt;Multiple copies of the same sequence, such as &lt;a href="http://en.wikipedia.org/wiki/Alu_sequence" title="Alu sequence"&gt;Alu sequences&lt;/a&gt; can hinder precise &lt;a href="http://en.wikipedia.org/wiki/Chromosome" title="Chromosome"&gt;chromosomal&lt;/a&gt; pairing during &lt;a href="http://en.wikipedia.org/wiki/Mitosis" title="Mitosis"&gt;mitosis&lt;/a&gt;, resulting in unequal &lt;a href="http://en.wikipedia.org/wiki/Chromosomal_crossover" title="Chromosomal crossover"&gt;crossovers&lt;/a&gt;, one of the main reasons for chromosome duplication.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Diseases that are often caused by transposons include &lt;a href="http://en.wikipedia.org/wiki/Hemophilia" title="Hemophilia"&gt;hemophilia&lt;/a&gt; A and B, &lt;a href="http://en.wikipedia.org/wiki/Severe_combined_immunodeficiency" title="Severe combined immunodeficiency"&gt;severe combined immunodeficiency&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Porphyria" title="Porphyria"&gt;porphyria&lt;/a&gt;, predisposition to &lt;a href="http://en.wikipedia.org/wiki/Cancer" title="Cancer"&gt;cancer&lt;/a&gt;, and &lt;a href="http://en.wikipedia.org/wiki/Duchenne_muscular_dystrophy" title="Duchenne muscular dystrophy"&gt;Duchenne muscular dystrophy&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Additionally, many transposons contain promoters which drive &lt;a href="http://en.wikipedia.org/wiki/Transcription_%28genetics%29" title="Transcription (genetics)"&gt;transcription&lt;/a&gt; of their own &lt;a href="http://en.wikipedia.org/wiki/Transposase" title="Transposase"&gt;transposase&lt;/a&gt;. 
These promoters can cause aberrant expression of linked genes, causing disease or &lt;a href="http://en.wikipedia.org/wiki/Mutant" title="Mutant"&gt;mutant&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Phenotypes" title="Phenotypes"&gt;phenotypes&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="Evolution_of_transposons" id="Evolution_of_transposons"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Evolution of transposons&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;The evolution of transposons and their effect on genome evolution is currently a dynamic field of study.&lt;/p&gt; &lt;p&gt;Transposons are found in all major branches of life. They may or may not have originated in the &lt;a href="http://en.wikipedia.org/wiki/Common_descent" title="Common descent"&gt;last universal common ancestor&lt;/a&gt;, or arisen independently multiple times, or perhaps arisen once and then spread to other kingdoms by &lt;a href="http://en.wikipedia.org/wiki/Horizontal_gene_transfer" title="Horizontal gene transfer"&gt;horizontal gene transfer&lt;/a&gt;. While transposons may confer some benefits on their hosts, they are generally considered to be &lt;a href="http://en.wikipedia.org/wiki/Selfish_DNA" title="Selfish DNA"&gt;selfish DNA&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Parasite" title="Parasite"&gt;parasites&lt;/a&gt; that live within the genome of cellular organisms. In this way, they are similar to &lt;a href="http://en.wikipedia.org/wiki/Virus" title="Virus"&gt;viruses&lt;/a&gt;. Viruses and transposons also share features in their genome structure and biochemical abilities, leading to speculation that they share a common ancestor.&lt;/p&gt; &lt;p&gt;Since excessive transposon activity can destroy a genome, many organisms seem to have developed mechanisms to reduce transposition to a manageable level. 
Bacteria may undergo high rates of gene deletion as part of a mechanism to remove transposons and viruses from their genomes, while &lt;a href="http://en.wikipedia.org/wiki/Eukaryote" title="Eukaryote"&gt;eukaryotic&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Organism" title="Organism"&gt;organisms&lt;/a&gt; may have developed the &lt;a href="http://en.wikipedia.org/wiki/RNA_interference" title="RNA interference"&gt;RNA interference&lt;/a&gt; (RNAi) mechanism as a way of reducing transposon activity. In the nematode &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Caenorhabditis_elegans" title="Caenorhabditis elegans"&gt;Caenorhabditis elegans&lt;/a&gt;&lt;/i&gt;, some genes required for RNAi also reduce transposon activity.&lt;/p&gt; &lt;p&gt;Transposons may have been co-opted by the &lt;a href="http://en.wikipedia.org/wiki/Adaptive_immune_system" title="Adaptive immune system"&gt;vertebrate immune system&lt;/a&gt; as a means of producing antibody diversity. The &lt;a href="http://en.wikipedia.org/wiki/V%28D%29J_recombination" title="V(D)J recombination"&gt;V(D)J recombination&lt;/a&gt; system operates by a mechanism similar to that of transposons.&lt;/p&gt; &lt;p&gt;Evidence exists that transposable elements may act as mutators in bacteria.&lt;/p&gt; &lt;p&gt;&lt;a name="Applications" id="Applications"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Applications&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;The first transposon was discovered in the plant &lt;a href="http://en.wikipedia.org/wiki/Maize" title="Maize"&gt;maize&lt;/a&gt; (&lt;i&gt;Zea mays&lt;/i&gt;, corn species) and is named &lt;a href="http://en.wikipedia.org/w/index.php?title=Dissociator&amp;amp;action=edit" class="new" title="Dissociator"&gt;dissociator&lt;/a&gt; (Ds). 
Likewise, the first transposon to be molecularly isolated was from a plant (&lt;a href="http://en.wikipedia.org/wiki/Snapdragon" title="Snapdragon"&gt;Snapdragon&lt;/a&gt;). Appropriately, transposons have been an especially useful tool in plant molecular biology. Researchers use transposons as a means of mutagenesis. In this context, a transposon jumps into a gene and produces a mutation. The presence of the transposon provides a more straightforward means of identifying the mutant allele than chemical mutagenesis methods do.&lt;/p&gt; &lt;p&gt;Sometimes the insertion of a transposon into a gene can disrupt that gene's function in a reversible manner; transposase-mediated excision of the transposon restores gene function. This produces plants in which neighboring cells have different &lt;a href="http://en.wikipedia.org/wiki/Genotype" title="Genotype"&gt;genotypes&lt;/a&gt;. This feature allows researchers to distinguish between genes that must be present inside a cell in order to function (cell-autonomous) and genes that produce observable effects in cells other than those where the gene is expressed.&lt;/p&gt; &lt;p&gt;Transposons are also widely used for mutagenesis in &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Drosophila_melanogaster" title="Drosophila melanogaster"&gt;Drosophila melanogaster&lt;/a&gt;&lt;/i&gt; and in a wide variety of &lt;a href="http://en.wikipedia.org/wiki/Bacterium" title="Bacterium"&gt;bacteria&lt;/a&gt; to study gene function.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-8905220033944350737?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/8905220033944350737/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=8905220033944350737' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/8905220033944350737'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/8905220033944350737'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/transposon.html' title='Transposon'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-5786020109748507694</id><published>2007-07-27T06:40:00.001-07:00</published><updated>2007-07-27T06:40:50.531-07:00</updated><title type='text'>Gene silencing</title><content type='html'>&lt;p&gt;&lt;b&gt;Gene silencing&lt;/b&gt; is a general term describing &lt;a href="http://en.wikipedia.org/wiki/Epigenetic" title="Epigenetic"&gt;epigenetic&lt;/a&gt; processes of &lt;a href="http://en.wikipedia.org/wiki/Gene_regulation" title="Gene regulation"&gt;gene regulation&lt;/a&gt;. The term gene silencing is generally used to describe the "switching off" of a gene by a mechanism other than &lt;a href="http://en.wikipedia.org/wiki/Genetic_modification" title="Genetic modification"&gt;genetic modification&lt;/a&gt;. 
That is, a gene which would be expressed (turned on) under normal circumstances is switched off by machinery in the cell.&lt;/p&gt; &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Gene" title="Gene"&gt;Genes&lt;/a&gt; are regulated at either the &lt;a href="http://en.wikipedia.org/wiki/Transcription_%28genetics%29" title="Transcription (genetics)"&gt;transcriptional&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Translation_%28biology%29" title="Translation (biology)"&gt;post-transcriptional&lt;/a&gt; level.&lt;/p&gt; &lt;p&gt;Transcriptional gene silencing is the result of &lt;a href="http://en.wikipedia.org/wiki/Histone" title="Histone"&gt;histone&lt;/a&gt; modifications, creating an environment of &lt;a href="http://en.wikipedia.org/wiki/Heterochromatin" title="Heterochromatin"&gt;heterochromatin&lt;/a&gt; around a gene that makes it inaccessible to transcriptional machinery (&lt;a href="http://en.wikipedia.org/wiki/RNA_polymerase" title="RNA polymerase"&gt;RNA polymerase&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Transcription_factors" title="Transcription factors"&gt;transcription factors&lt;/a&gt;, etc.).&lt;/p&gt; &lt;p&gt;Post-transcriptional gene silencing is the result of &lt;a href="http://en.wikipedia.org/wiki/MRNA" title="MRNA"&gt;mRNA&lt;/a&gt; of a particular gene being destroyed. The destruction of the mRNA prevents &lt;a href="http://en.wikipedia.org/wiki/Translation" title="Translation"&gt;translation&lt;/a&gt; to form an active gene product (in most cases, a &lt;a href="http://en.wikipedia.org/wiki/Protein" title="Protein"&gt;protein&lt;/a&gt;). A common mechanism of post-transcriptional gene silencing is &lt;a href="http://en.wikipedia.org/wiki/RNAi" title="RNAi"&gt;RNAi&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Both transcriptional and post-transcriptional gene silencing are used to regulate endogenous genes. 
Mechanisms of gene silencing also protect the organism's genome from &lt;a href="http://en.wikipedia.org/wiki/Transposon" title="Transposon"&gt;transposons&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Virus" title="Virus"&gt;viruses&lt;/a&gt;. Gene silencing thus may be part of an ancient immune system protecting from such infectious &lt;a href="http://en.wikipedia.org/wiki/DNA" title="DNA"&gt;DNA&lt;/a&gt; elements.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-5786020109748507694?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/5786020109748507694/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=5786020109748507694' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/5786020109748507694'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/5786020109748507694'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/gene-silencing.html' title='Gene silencing'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-4758952754351341049</id><published>2007-07-27T06:33:00.000-07:00</published><updated>2007-07-27T06:35:35.577-07:00</updated><title type='text'>What is RNAi</title><content type='html'>&lt;p&gt;RNA interference (RNAi) is a highly evolutionally conserved process 
of post-transcriptional gene silencing (PTGS) by which double-stranded RNA (dsRNA), when introduced into a cell, causes sequence-specific degradation of homologous mRNA sequences. It was first discovered in 1998 by Andrew Fire and Craig Mello in the nematode worm &lt;i&gt;Caenorhabditis elegans&lt;/i&gt; and later found in a wide variety of organisms, including mammals.  &lt;/p&gt; &lt;p&gt;&lt;b&gt;Mechanism of RNA interference&lt;/b&gt;&lt;/p&gt; &lt;center&gt; &lt;img src="http://www.rnaiweb.com/images/RNAi/RNAi.jpg" /&gt;&lt;/center&gt; &lt;p&gt;A. On entering the cell, long dsRNAs act as a trigger of the RNAi process.&lt;/p&gt; &lt;p&gt;B. The dsRNA is first processed by the RNase III enzyme Dicer in an ATP-dependent reaction.&lt;/p&gt; &lt;p&gt;C. Dicer processes dsRNAs into 21-23 nt short interfering RNAs (siRNAs) with 2-nt 3' overhangs. siRNA can also be synthesized outside the cell and then introduced into it. &lt;/p&gt; &lt;p&gt;D. The siRNAs are incorporated into the RNA-induced silencing complex (RISC), which has an Argonaute (Ago) protein as one of its main components. Ago cleaves and discards the passenger (sense) strand of the siRNA duplex, leading to activation of the RISC.&lt;/p&gt; &lt;p&gt;E and F. 
The remaining guide (antisense) strand of the siRNA guides RISC to its homologous mRNA, resulting in the endonucleolytic cleavage of the target mRNA  &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-4758952754351341049?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/4758952754351341049/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=4758952754351341049' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/4758952754351341049'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/4758952754351341049'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/what-is-rnai.html' title='What is RNAi'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-2680173636979702510</id><published>2007-07-26T10:10:00.001-07:00</published><updated>2007-07-26T10:10:32.095-07:00</updated><title type='text'>cath</title><content type='html'>&lt;p&gt;&lt;b&gt;CATH&lt;/b&gt; is a hierarchical classification of protein domain structures, which clusters proteins at four major levels, &lt;a href="http://cathwww.biochem.ucl.ac.uk/cgi-bin/cath/GotoCath.pl?link=cath_info.html#C_Level"&gt;Class(C)&lt;/a&gt;, &lt;a 
href="http://cathwww.biochem.ucl.ac.uk/cgi-bin/cath/GotoCath.pl?link=cath_info.html#A_Level"&gt;Architecture(A)&lt;/a&gt;, &lt;a href="http://cathwww.biochem.ucl.ac.uk/cgi-bin/cath/GotoCath.pl?link=cath_info.html#T_Level"&gt;Topology(T)&lt;/a&gt; and &lt;a href="http://cathwww.biochem.ucl.ac.uk/cgi-bin/cath/GotoCath.pl?link=cath_info.html#H_Level"&gt;Homologous superfamily (H)&lt;/a&gt;.  &lt;/p&gt;  &lt;p&gt;Class, derived from secondary structure content, is assigned for more than 90% of protein structures automatically. Architecture, which describes the gross orientation of secondary structures, independent of connectivities, is currently assigned manually. The topology level clusters structures into fold groups according to their topological connections and numbers of secondary structures. The homologous superfamilies cluster proteins with highly similar structures and functions. The assignments of structures to fold groups and homologous superfamilies are made by sequence and structure comparisons.&lt;/p&gt;  &lt;p&gt;The boundaries and assignments for each protein domain are determined using a combination of automated and manual procedures. 
These include computational techniques, empirical and statistical evidence, literature review and expert analysis.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-2680173636979702510?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/2680173636979702510/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=2680173636979702510' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/2680173636979702510'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/2680173636979702510'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/cath.html' title='cath'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-5773021820051214783</id><published>2007-07-26T09:59:00.000-07:00</published><updated>2007-07-26T10:04:22.955-07:00</updated><title type='text'>dna databases</title><content type='html'>DDBJ (DNA Data Bank of Japan) began DNA data bank activities in      earnest in 1986 at the National Institute of Genetics (NIG).&lt;br /&gt;    DDBJ has been functioning as the international nucleotide sequence      database in collaboration with EBI/EMBL and NCBI/GenBank.&lt;br /&gt;    DNA sequence records the organismic evolution more directly than other      
biological materials and, thus, is invaluable not only for research in life sciences, but also for human welfare in general. The databases are, so to speak, a common treasure of human beings. With this in mind, we make the databases accessible online to anyone in the world.&lt;br /&gt;The EMBL Nucleotide Sequence Database (also known as EMBL-Bank) constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are &lt;a href="http://www.ebi.ac.uk/embl/Submission/index.html"&gt;direct submissions&lt;/a&gt; from individual researchers, genome sequencing projects and patent applications.&lt;br /&gt; &lt;br /&gt; The database is produced in an international &lt;a href="http://www.ebi.ac.uk/embl/Contact/collaboration.html"&gt;collaboration&lt;/a&gt; with GenBank (USA) and the DNA Data Bank of Japan (DDBJ). Each of the three groups collects a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged between the groups on a daily basis. The &lt;a href="ftp://ftp.ebi.ac.uk/pub/databases/embl/release/"&gt;current database release&lt;/a&gt; (Release 91, June 2007), the corresponding &lt;a href="http://www.ebi.ac.uk/embl/Documentation/Release_notes/current/relnotes.html"&gt;Release notes&lt;/a&gt; and the &lt;a href="http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html"&gt;user manual&lt;/a&gt; are available from the EBI servers.&lt;br /&gt;GenBank&lt;sup&gt;®&lt;/sup&gt; is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. 
There are approximately 65,369,091,950 bases in 61,132,599 sequence records in the    traditional GenBank divisions and 80,369,977,826 bases in 17,960,667 sequence records   in the WGS division as of August 2006.&lt;br /&gt;  &lt;br /&gt;         The complete &lt;a href="ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt"&gt;release          notes&lt;/a&gt; for the current version of GenBank are available on the NCBI ftp site.    A new release is made every two months. GenBank is part of the &lt;a href="http://www.ncbi.nlm.nih.gov/projects/collab/"&gt;International          Nucleotide Sequence Database Collaboration&lt;/a&gt;, which comprises         the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory          (EMBL), and GenBank at NCBI. These three organizations exchange data on          a daily basis.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-5773021820051214783?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/5773021820051214783/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=5773021820051214783' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/5773021820051214783'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/5773021820051214783'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/dna-databases.html' title='dna databases'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-2809970591514412450</id><published>2007-07-26T09:56:00.001-07:00</published><updated>2007-07-26T09:56:55.763-07:00</updated><title type='text'>Chromosome jumping</title><content type='html'>&lt;p&gt;&lt;b&gt;Chromosome jumping&lt;/b&gt; is a technique of &lt;a href="http://en.wikipedia.org/wiki/Molecular_biology" title="Molecular biology"&gt;molecular biology&lt;/a&gt; that is used as a tool in the physical mapping of &lt;a href="http://en.wikipedia.org/wiki/Genome" title="Genome"&gt;genomes&lt;/a&gt;. It is related to several other tools used for the same purpose, including &lt;a href="http://en.wikipedia.org/wiki/Chromosome_walking" title="Chromosome walking"&gt;chromosome walking&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Chromosome jumping is used to bypass regions difficult to &lt;a href="http://en.wikipedia.org/wiki/Cloning" title="Cloning"&gt;clone&lt;/a&gt;, such as those containing &lt;a href="http://en.wikipedia.org/wiki/Repetitive_DNA" title="Repetitive DNA"&gt;repetitive DNA&lt;/a&gt;, that cannot be easily mapped by chromosome walking, and is useful in moving along a chromosome rapidly in search of a particular &lt;a href="http://en.wikipedia.org/wiki/Gene" title="Gene"&gt;gene&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;In &lt;a href="http://en.wikipedia.org/wiki/Chromosome" title="Chromosome"&gt;chromosome&lt;/a&gt; jumping, the &lt;a href="http://en.wikipedia.org/wiki/DNA" title="DNA"&gt;DNA&lt;/a&gt; of interest is identified, cut into fragments with &lt;a href="http://en.wikipedia.org/wiki/Restriction_enzyme" title="Restriction enzyme"&gt;restriction enzymes&lt;/a&gt;, and circularised (the beginning and end of each fragment is joined together to form a circular loop). 
From a known &lt;a href="http://en.wikipedia.org/wiki/DNA_sequence" title="DNA sequence"&gt;sequence&lt;/a&gt; a &lt;a href="http://en.wikipedia.org/wiki/Primer_%28molecular_biology%29" title="Primer (molecular biology)"&gt;primer&lt;/a&gt; is designed to sequence across the circularised junction. This primer is used to jump 100 &lt;a href="http://en.wikipedia.org/wiki/Base_pair" title="Base pair"&gt;kb&lt;/a&gt;-300 kb intervals: a sequence 100 kb away would have come near the known sequence on circularisation. Thus, sequences not reachable by chromosome walking can be sequenced. Chromosome walking can be used from the new jump position (in either direction) to look for gene-like sequences, or additional jumps can be used to progress further along the chromosome.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-2809970591514412450?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/2809970591514412450/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=2809970591514412450' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/2809970591514412450'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/2809970591514412450'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/chromosome-jumping.html' title='Chromosome jumping'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-2625734955344362775</id><published>2007-07-26T09:53:00.002-07:00</published><updated>2007-07-26T09:55:25.258-07:00</updated><title type='text'>Gene Chips</title><content type='html'>&lt;span style="font-family:Arial, Helvetica, sans-serif;"&gt;&lt;span style="font-size:85%;"&gt;The &lt;a href="http://www.iscid.org/encyclopedia/Microarray"&gt;microarray&lt;/a&gt;, or gene chip, has made a big impact on &lt;a href="http://www.iscid.org/encyclopedia/DNA"&gt;DNA&lt;/a&gt; probe technology by helping detect tens of thousands of sequences almost simultaneously. A gene chip is a device in which a large number of different probes are placed at specific locations on a glass slide (known as a spotted array) or attached at specific positions on some other surface.&lt;br /&gt;&lt;br /&gt;The use of gene chips involves labeling the sample instead of the probe, spreading thousands of copies of the labeled sample across the chip and then washing away any copies of the sample that do not remain attached to some probe. Because the probes are attached to specified positions on the chip, if a labeled sample is detected at any position on the chip, it is easy to determine which probe hybridized to its complement.&lt;br /&gt;&lt;br /&gt;Gene chips are most commonly used to &lt;a href="http://www.iscid.org/encyclopedia/Measure"&gt;measure&lt;/a&gt; the expression level of various &lt;a href="http://www.iscid.org/encyclopedia/Genes"&gt;genes&lt;/a&gt; in an organism. Each expression level gives a picture of the rate at which a specific protein is being produced in an organism’s cells at any given time. 
It should also be noted that more novel uses for gene chips are being continually developed and this is what makes this particular field very exciting.&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-2625734955344362775?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/2625734955344362775/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=2625734955344362775' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/2625734955344362775'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/2625734955344362775'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/gene-chips.html' title='Gene Chips'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-1964127496733823441</id><published>2007-07-26T09:53:00.001-07:00</published><updated>2007-07-26T09:53:46.390-07:00</updated><title type='text'>Shotgun Cloning</title><content type='html'>&lt;span style="font-family:Arial, Helvetica, sans-serif;"&gt;&lt;span style="font-size:85%;"&gt;Shotgun &lt;a href="http://www.iscid.org/encyclopedia/Cloning"&gt;cloning&lt;/a&gt; is the practice of clipping at random a large &lt;a href="http://www.iscid.org/encyclopedia/DNA"&gt;DNA&lt;/a&gt; fragment to reduce it into various smaller pieces 
that can then be cloned.&lt;br /&gt;&lt;br /&gt;The DNA can be cut into smaller pieces either with a restriction &lt;a href="http://www.iscid.org/encyclopedia/Enzyme"&gt;enzyme&lt;/a&gt; or by physical methods that break it apart. The resulting fragments are then gathered and cloned into a vector. The original DNA can be either genomic DNA (the process is then called &lt;a href="http://www.iscid.org/encyclopedia/Genome"&gt;genome&lt;/a&gt; shotgun cloning) or a clone such as a YAC (yeast artificial &lt;a href="http://www.iscid.org/encyclopedia/Chromosome"&gt;chromosome&lt;/a&gt;) that carries a large piece of genomic DNA that needs to be split into fragments.&lt;br /&gt;&lt;br /&gt;If the DNA must be in a certain cloning vector, but the vector can carry only small amounts of DNA, then the shotgun method can be employed. The method is usually used to generate small fragments of DNA for sequencing.&lt;br /&gt;&lt;br /&gt;For example, if a geneticist is studying a 50 kb gene, it could be difficult to work out the restriction map. 
Breaking the DNA into smaller fragments and then mapping these allows a master restriction map to be deduced.&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-1964127496733823441?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/1964127496733823441/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=1964127496733823441' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/1964127496733823441'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/1964127496733823441'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/shotgun-cloning.html' title='Shotgun Cloning'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-6702871005717226199</id><published>2007-07-26T09:51:00.000-07:00</published><updated>2007-07-26T09:52:10.212-07:00</updated><title type='text'>chromosome walking</title><content type='html'>&lt;p align="justify"&gt;&lt;span style="font-family:Arial, Helvetica, sans-serif;"&gt;&lt;span style="font-size:85%;"&gt;Chromosome walking is a technique for &lt;a href="http://www.iscid.org/encyclopedia/Cloning"&gt;cloning&lt;/a&gt; everything in the &lt;a href="http://www.iscid.org/encyclopedia/Genome"&gt;genome&lt;/a&gt; around a known piece of &lt;a 
href="http://www.iscid.org/encyclopedia/DNA"&gt;DNA&lt;/a&gt; (the starting probe). You screen a &lt;a href="http://www.iscid.org/encyclopedia/Genomic_Library"&gt;genomic library&lt;/a&gt; for all clones hybridizing with the probe, and then work out which one extends furthest into the surrounding DNA. The most distal piece of this most distal clone is then used as a probe, so that ever more distal regions can be cloned. The technique has been used to move as much as 200 kb away from a given starting point (an immense undertaking). It is typically used to "walk" from a starting point towards some nearby gene in &lt;a href="http://www.iscid.org/encyclopedia/Order"&gt;order&lt;/a&gt; to clone that gene. It is also used to obtain the remainder of a gene when only a part of it has been isolated.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-6702871005717226199?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/6702871005717226199/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=6702871005717226199' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/6702871005717226199'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/6702871005717226199'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/chromosome-walking.html' title='chromosome walking'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-1865236331033861583</id><published>2007-07-22T00:36:00.000-07:00</published><updated>2007-07-22T00:38:57.324-07:00</updated><title type='text'>FASTA</title><content type='html'>&lt;p&gt;&lt;b&gt;FASTA&lt;/b&gt; is a &lt;a href="http://en.wikipedia.org/wiki/DNA" title="DNA"&gt;DNA&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Protein" title="Protein"&gt;Protein&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment" title="Sequence alignment"&gt;sequence alignment&lt;/a&gt; software package first described (as FASTP) by &lt;a href="http://en.wikipedia.org/wiki/David_J._Lipman" title="David J. Lipman"&gt;David J. Lipman&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/w/index.php?title=William_R._Pearson&amp;action=edit" class="new" title="William R. Pearson"&gt;William R. Pearson&lt;/a&gt; in 1985 in the article &lt;a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;amp;db=pubmed&amp;dopt=Abstract&amp;amp;list_uids=2983426" class="external text" title="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;amp;dopt=Abstract&amp;list_uids=2983426" rel="nofollow"&gt;Rapid and sensitive protein similarity searches&lt;/a&gt;. The original FASTP program was designed for protein sequence similarity searching. 
FASTA, described in 1988 (&lt;a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;amp;db=pubmed&amp;dopt=Abstract&amp;amp;list_uids=3162770" class="external text" title="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;amp;dopt=Abstract&amp;list_uids=3162770" rel="nofollow"&gt;Improved Programs for Biological Sequence Comparison&lt;/a&gt;) added the ability to do DNA:DNA searches, translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of &lt;a href="http://en.wikipedia.org/wiki/Protein" title="Protein"&gt;protein&lt;/a&gt; sequences and DNA sequences. FASTA is pronounced "FAST-Aye", and stands for "FAST-All", because it works with any alphabet, an extension of "FAST-P" (protein) and "FAST-N" (nucleotide) alignment.&lt;/p&gt; &lt;p&gt;The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.&lt;/p&gt; &lt;p&gt;In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal &lt;a href="http://en.wikipedia.org/wiki/Smith-Waterman_algorithm" title="Smith-Waterman algorithm"&gt;Smith-Waterman algorithm&lt;/a&gt;. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance, or whether it can be used to infer &lt;a href="http://en.wikipedia.org/wiki/Homology_%28biology%29" title="Homology (biology)"&gt;homology&lt;/a&gt;. 
&lt;br /&gt;&lt;/p&gt; &lt;p&gt;A web interface for submitting sequences to search the &lt;a href="http://en.wikipedia.org/wiki/European_Bioinformatics_Institute" title="European Bioinformatics Institute"&gt;European Bioinformatics Institute (EBI)'s&lt;/a&gt; online databases is also available, called &lt;a href="http://www.ebi.ac.uk/fasta33" class="external text" title="http://www.ebi.ac.uk/fasta33" rel="nofollow"&gt;fasta&lt;/a&gt;.&lt;/p&gt; The &lt;a href="http://en.wikipedia.org/wiki/FASTA_format" title="FASTA format"&gt;FASTA file format&lt;/a&gt; used as input for this software is now widely used by other sequence database search tools (such as &lt;a href="http://en.wikipedia.org/wiki/BLAST" title="BLAST"&gt;BLAST&lt;/a&gt;) and sequence alignment programs.&lt;br /&gt;&lt;h2&gt;&lt;span class="mw-headline"&gt;Search method&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;FASTA takes a given nucleotide or amino-acid sequence and searches a corresponding sequence database by using &lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment#Local_alignment" title="Sequence alignment"&gt;local sequence alignment&lt;/a&gt; to find matches of similar database sequences.&lt;/p&gt; &lt;p&gt;The FASTA program follows a largely heuristic method, which contributes to the high speed of its execution. It initially observes the pattern of word hits (word-to-word matches of a given length) and marks potential matches before performing a more time-consuming optimized search using a &lt;a href="http://en.wikipedia.org/wiki/Smith-Waterman_algorithm" title="Smith-Waterman algorithm"&gt;Smith-Waterman&lt;/a&gt; type of algorithm. The size taken for a word, given by the parameter ktup, controls the sensitivity and speed of the program. Increasing the ktup value decreases the number of background hits that are found. From the word hits that are returned, the program looks for segments that contain a cluster of nearby hits. 
It then investigates these segments for a possible match.&lt;/p&gt; &lt;div class="thumb tright"&gt; &lt;div class="thumbinner" style="width: 474px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Document_html_47f1ed1b.gif" class="internal" title="Diagram from Book Protein Structure prediction - a practical approach from chapter Protein Sequence Alignment and Database Scanning"&gt;&lt;img alt="Diagram from Book Protein Structure prediction - a practical approach from chapter Protein Sequence Alignment and Database Scanning" longdesc="/wiki/Image:Document_html_47f1ed1b.gif" class="thumbimage" src="http://upload.wikimedia.org/wikipedia/en/c/cd/Document_html_47f1ed1b.gif" height="655" width="472" /&gt;&lt;/a&gt; &lt;div class="thumbcaption"&gt;Diagram from the book &lt;a href="http://search.barnesandnoble.com/bookSearch/isbnInquiry.asp?r=1&amp;isbn=0199634963" class="external text" title="http://search.barnesandnoble.com/bookSearch/isbnInquiry.asp?r=1&amp;amp;isbn=0199634963" rel="nofollow"&gt;&lt;i&gt;Protein Structure prediction - a practical approach&lt;/i&gt;&lt;/a&gt;, from the chapter &lt;a href="http://www.compbio.dundee.ac.uk/ftp/preprints/review93/review93.pdf" class="external text" title="http://www.compbio.dundee.ac.uk/ftp/preprints/review93/review93.pdf" rel="nofollow"&gt;&lt;i&gt;Protein Sequence Alignment and Database Scanning&lt;/i&gt;&lt;/a&gt;&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;p&gt;There are some differences between fastn and fastp relating to the type of sequences used, but both use four steps and calculate three scores to describe and format the sequence similarity results. The steps are:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;Identify regions of highest density in each sequence comparison, taking ktup equal to 1 or 2.&lt;/li&gt;&lt;/ul&gt; &lt;dl&gt;&lt;dd&gt;In this step, all or a group of the identities between two sequences are found using a lookup table. The ktup value determines how many consecutive identities are required for a match to be declared. 
Thus, the lower the ktup value, the more sensitive the search. ktup = 2 is frequently used for protein sequences and ktup = 4 or 6 for nucleotide sequences. Short oligonucleotides are usually run with ktup = 1. The program then finds all similar &lt;b&gt;local regions&lt;/b&gt;, represented as diagonals of a certain length in a dot plot, between the two sequences by counting ktup matches and penalizing for intervening mismatches. This way, &lt;b&gt;local regions&lt;/b&gt; with the highest density of matches in a diagonal are isolated from background hits. For protein sequences, &lt;a href="http://en.wikipedia.org/wiki/Substitution_matrix#BLOSUM" title="Substitution matrix"&gt;BLOSUM50&lt;/a&gt; values are used for scoring ktup matches. This ensures that groups of identities with high similarity scores contribute more to the local diagonal score than identities with low similarity scores. Nucleotide sequences use the &lt;a href="http://en.wikipedia.org/wiki/Substitution_matrix#Identity_matrix" title="Substitution matrix"&gt;identity matrix&lt;/a&gt; for the same purpose. The best 10 local regions selected from all the diagonals put together are then saved.&lt;/dd&gt;&lt;/dl&gt; &lt;ul&gt;&lt;li&gt;Rescan the regions taken using the scoring matrices, trimming the ends of each region to include only the residues contributing to the highest score.&lt;/li&gt;&lt;/ul&gt; &lt;dl&gt;&lt;dd&gt;Rescan the 10 regions taken. This time, use the relevant scoring matrix while rescoring to allow runs of identities shorter than the ktup value. Conservative replacements that contribute to the similarity score are also counted while rescoring. Though protein sequences use the &lt;a href="http://en.wikipedia.org/wiki/Substitution_matrix#BLOSUM" title="Substitution matrix"&gt;BLOSUM50&lt;/a&gt; matrix, scoring matrices based on the minimum number of base changes required for a specific replacement, on identities alone, or on an alternative measure of similarity, can also be used with the program. 
For each of the diagonal regions rescanned this way, a subregion with the maximum score is identified. The initial scores found in step 1 are used to rank the library sequences. The highest score is referred to as the &lt;i&gt;init1&lt;/i&gt; score.&lt;/dd&gt;&lt;/dl&gt; &lt;ul&gt;&lt;li&gt;In an alignment, if several initial regions with scores greater than a CUTOFF value are found, check whether the trimmed initial regions can be joined to form an approximate alignment with gaps. Calculate a similarity score that is the sum of the scores of the joined regions, with a penalty of 20 points for each gap. This initial similarity score (&lt;i&gt;initn&lt;/i&gt;) is used to rank the library sequences. The score of the single best initial region found in step 2 is reported (&lt;i&gt;init1&lt;/i&gt;).&lt;/li&gt;&lt;/ul&gt; &lt;dl&gt;&lt;dd&gt;Here the program calculates an optimal alignment of initial regions as a combination of compatible regions with maximal score. This optimal alignment of initial regions can be rapidly calculated using a dynamic programming algorithm. The resulting score initn is used to rank the library sequences. This joining process increases sensitivity but decreases selectivity. A carefully calculated cut-off value is thus used to control where this step is implemented, a value that is approximately one &lt;a href="http://en.wikipedia.org/wiki/Standard_deviation" title="Standard deviation"&gt;standard deviation&lt;/a&gt; above the average score expected from unrelated sequences in the library. 
For a 200-residue query sequence with ktup = 2, the cut-off value is 28.&lt;/dd&gt;&lt;/dl&gt; &lt;ul&gt;&lt;li&gt;Use a banded &lt;a href="http://en.wikipedia.org/wiki/Smith_Waterman_algorithm" title="Smith Waterman algorithm"&gt;Smith-Waterman&lt;/a&gt; algorithm to calculate an optimal score for alignment.&lt;/li&gt;&lt;/ul&gt; &lt;dl&gt;&lt;dd&gt;This step uses a banded &lt;a href="http://en.wikipedia.org/wiki/Smith_Waterman_algorithm" title="Smith Waterman algorithm"&gt;Smith-Waterman&lt;/a&gt; algorithm to create an optimised score (&lt;i&gt;opt&lt;/i&gt;) for each alignment of the query sequence to a database (library) sequence. It takes a band of 32 residues centered on the &lt;i&gt;init1&lt;/i&gt; region of step 2 for calculating the optimal alignment. After all sequences are searched, the program plots the initial scores of each database sequence in a &lt;a href="http://en.wikipedia.org/wiki/Histogram" title="Histogram"&gt;histogram&lt;/a&gt;, and calculates the statistical significance of the "opt" score. For protein sequences, the final alignment is produced using a full &lt;a href="http://en.wikipedia.org/wiki/Smith_Waterman_algorithm" title="Smith Waterman algorithm"&gt;Smith-Waterman&lt;/a&gt; alignment. 
For DNA sequences, a banded alignment is provided.&lt;/dd&gt;&lt;/dl&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-1865236331033861583?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/1865236331033861583/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=1865236331033861583' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/1865236331033861583'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/1865236331033861583'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/fasta.html' title='FASTA'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-4074903673324525302</id><published>2007-07-22T00:19:00.000-07:00</published><updated>2007-07-22T00:29:19.603-07:00</updated><title type='text'>Mass spectrometry</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_0fSb-1TJAx0/RqMFeiNp_DI/AAAAAAAAACE/ogKgripSVT4/s1600-h/200px-Ms_block_schematic.gif"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp0.blogger.com/_0fSb-1TJAx0/RqMFeiNp_DI/AAAAAAAAACE/ogKgripSVT4/s400/200px-Ms_block_schematic.gif" alt="" id="BLOGGER_PHOTO_ID_5089918025997089842" border="0" 
/&gt;&lt;/a&gt;&lt;br /&gt;&lt;p&gt;&lt;b&gt;Mass spectrometry&lt;/b&gt; (also known as &lt;b&gt;mass spectroscopy&lt;/b&gt; (&lt;a href="http://en.wiktionary.org/wiki/deprecated" class="extiw" title="wikt:deprecated"&gt;deprecated&lt;/a&gt;) or, informally, &lt;b&gt;"mass-spec"&lt;/b&gt; and &lt;b&gt;MS&lt;/b&gt;) is an analytical technique used to measure the &lt;a href="http://en.wikipedia.org/wiki/Mass-to-charge_ratio" title="Mass-to-charge ratio"&gt;mass-to-charge ratio&lt;/a&gt; of &lt;a href="http://en.wikipedia.org/wiki/Ion" title="Ion"&gt;ions&lt;/a&gt;. It is most generally used to find the composition of a physical sample by generating a &lt;a href="http://en.wikipedia.org/wiki/Mass_spectrum" title="Mass spectrum"&gt;mass spectrum&lt;/a&gt; representing the masses of sample components. The mass spectrum is measured by a &lt;b&gt;mass spectrometer&lt;/b&gt;.&lt;/p&gt; &lt;p&gt;All mass spectrometers consist of three basic parts: an &lt;b&gt;&lt;a href="http://en.wikipedia.org/wiki/Ion_source" title="Ion source"&gt;ion source&lt;/a&gt;&lt;/b&gt;, a &lt;b&gt;mass analyzer&lt;/b&gt;, and a &lt;b&gt;detector system&lt;/b&gt;. 
The stages within the mass spectrometer are:&lt;/p&gt; &lt;ol&gt;&lt;li&gt;Producing ions from the sample&lt;/li&gt;&lt;li&gt;Separating ions of differing masses&lt;/li&gt;&lt;li&gt;Detecting the number of ions of each mass produced&lt;/li&gt;&lt;li&gt;Collating the data and generating the mass spectrum&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;The technique has several applications, including:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;identifying unknown &lt;a href="http://en.wikipedia.org/wiki/Chemical_compound" title="Chemical compound"&gt;compounds&lt;/a&gt; by the mass of the compound molecules or their fragments&lt;/li&gt;&lt;li&gt;determining the &lt;a href="http://en.wikipedia.org/wiki/Isotope" title="Isotope"&gt;isotopic&lt;/a&gt; composition of elements in a compound&lt;/li&gt;&lt;li&gt;determining the &lt;a href="http://en.wikipedia.org/wiki/Structure" title="Structure"&gt;structure&lt;/a&gt; of a compound by observing its fragmentation&lt;/li&gt;&lt;li&gt;quantifying the amount of a compound in a sample using carefully designed methods (mass spectrometry is not inherently quantitative)&lt;/li&gt;&lt;li&gt;studying the fundamentals of &lt;a href="http://en.wikipedia.org/wiki/Gas_phase_ion_chemistry" title="Gas phase ion chemistry"&gt;gas phase ion chemistry&lt;/a&gt; (the chemistry of ions and neutrals in vacuum)&lt;/li&gt;&lt;li&gt;determining other physical, chemical, or even biological properties of compounds with a variety of other approaches&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h2&gt;&lt;span class="mw-headline"&gt;Instrumentation&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;&lt;a name="Ion_source" id="Ion_source"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Ion source&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The ion source is the part of the mass spectrometer that ionizes the material under analysis (the analyte). 
The ions are then transported by &lt;a href="http://en.wikipedia.org/wiki/Magnetic_field" title="Magnetic field"&gt;magnetic&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Electric_field" title="Electric field"&gt;electric fields&lt;/a&gt; to the mass analyzer.&lt;/p&gt; &lt;p&gt;Techniques for &lt;a href="http://en.wikipedia.org/wiki/Ion" title="Ion"&gt;ionization&lt;/a&gt; have been key to determining what types of samples can be analyzed by mass spectrometry. &lt;a href="http://en.wikipedia.org/wiki/Electron_ionization" title="Electron ionization"&gt;Electron ionization&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Chemical_ionization" title="Chemical ionization"&gt;chemical ionization&lt;/a&gt; are used for &lt;a href="http://en.wikipedia.org/wiki/Gas" title="Gas"&gt;gases&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Vapor" title="Vapor"&gt;vapors&lt;/a&gt;. In &lt;a href="http://en.wikipedia.org/wiki/Chemical_ionization" title="Chemical ionization"&gt;chemical ionization&lt;/a&gt; sources, the analyte is ionized by chemical ion-molecule reactions during collisions in the source. Two techniques often used with &lt;a href="http://en.wikipedia.org/wiki/Liquid" title="Liquid"&gt;liquid&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Solid" title="Solid"&gt;solid&lt;/a&gt; biological samples include &lt;a href="http://en.wikipedia.org/wiki/Electrospray_ionization" title="Electrospray ionization"&gt;electrospray ionization&lt;/a&gt; (due to &lt;a href="http://en.wikipedia.org/wiki/John_Fenn" title="John Fenn"&gt;John Fenn&lt;/a&gt;) and &lt;a href="http://en.wikipedia.org/wiki/Matrix-assisted_laser_desorption/ionization" title="Matrix-assisted laser desorption/ionization"&gt;matrix-assisted laser desorption/ionization&lt;/a&gt; (MALDI, due to K. Tanaka and separately, M. Karas and F. Hillenkamp). 
&lt;a href="http://en.wikipedia.org/wiki/ICP-MS" title="ICP-MS"&gt;Inductively coupled plasma&lt;/a&gt; sources are used primarily for metal analysis on a wide array of sample types. Others include &lt;a href="http://en.wikipedia.org/wiki/Glow_discharge" title="Glow discharge"&gt;glow discharge&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Fast_atom_bombardment" title="Fast atom bombardment"&gt;fast atom bombardment&lt;/a&gt; (FAB), &lt;a href="http://en.wikipedia.org/wiki/Thermospray" title="Thermospray"&gt;thermospray&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/w/index.php?title=Desorption/ionization_on_silicon&amp;action=edit" class="new" title="Desorption/ionization on silicon"&gt;desorption/ionization on silicon&lt;/a&gt; (DIOS), &lt;a href="http://en.wikipedia.org/wiki/DART_ion_source" title="DART ion source"&gt;Direct Analysis in Real Time (DART)&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Atmospheric_pressure_chemical_ionization" title="Atmospheric pressure chemical ionization"&gt;atmospheric pressure chemical ionization&lt;/a&gt; (APCI), &lt;a href="http://en.wikipedia.org/wiki/Secondary_ion_mass_spectrometry" title="Secondary ion mass spectrometry"&gt;secondary ion mass spectrometry&lt;/a&gt; (SIMS), &lt;a href="http://en.wikipedia.org/wiki/Spark_ionization" title="Spark ionization"&gt;spark ionization&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Thermal_ionisation" title="Thermal ionisation"&gt;thermal ionisation&lt;/a&gt;. &lt;/p&gt; &lt;p&gt;&lt;a name="Mass_analyzer" id="Mass_analyzer"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Mass analyzer&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Mass analyzers separate the ions according to their &lt;a href="http://en.wikipedia.org/wiki/Mass-to-charge_ratio" title="Mass-to-charge ratio"&gt;mass-to-charge ratio&lt;/a&gt;. 
All mass spectrometers are based on dynamics of charged particles in electric and magnetic fields in vacuum where the following two laws apply:&lt;/p&gt; &lt;dl&gt;&lt;dd&gt;&lt;img class="tex" alt="\mathbf{F} = q (\mathbf{E} + \mathbf{v} \times \mathbf{B})" src="http://upload.wikimedia.org/math/3/0/e/30e07241f7dce068047cbe7fb1ca21b2.png" /&gt; (&lt;a href="http://en.wikipedia.org/wiki/Lorentz_force_law" title="Lorentz force law"&gt;Lorentz force law&lt;/a&gt;)&lt;/dd&gt;&lt;/dl&gt; &lt;dl&gt;&lt;dd&gt;&lt;img class="tex" alt="\mathbf{F}=m\mathbf{a}" src="http://upload.wikimedia.org/math/1/b/4/1b40dff432be7e95bcd84429486bfedd.png" /&gt; (&lt;a href="http://en.wikipedia.org/wiki/Newton%27s_second_law" title="Newton's second law"&gt;Newton's second law&lt;/a&gt; of motion)&lt;/dd&gt;&lt;/dl&gt; &lt;p&gt;where &lt;b&gt;F&lt;/b&gt; is the force applied to the ion, &lt;i&gt;m&lt;/i&gt; is the mass of the ion, &lt;b&gt;a&lt;/b&gt; is the acceleration, &lt;i&gt;q&lt;/i&gt; is the ionic charge, &lt;b&gt;E&lt;/b&gt; is the electric field, and &lt;b&gt;v&lt;/b&gt; x &lt;b&gt;B&lt;/b&gt; is the &lt;a href="http://en.wikipedia.org/wiki/Vector_cross_product" title="Vector cross product"&gt;vector cross product&lt;/a&gt; of the ion velocity and the magnetic field&lt;/p&gt; &lt;p&gt;Equating the above expressions for the force applied to the ion yields:&lt;/p&gt; &lt;dl&gt;&lt;dd&gt;&lt;img class="tex" alt="(m/q)\mathbf{a} = \mathbf{E}+ \mathbf{v} \times \mathbf{B}" src="http://upload.wikimedia.org/math/7/f/6/7f651b70578bfb18d0be45efbcbb2a30.png" /&gt;&lt;/dd&gt;&lt;/dl&gt; &lt;p&gt;This differential equation is the classic equation of motion of charged particles. Together with the particle's initial conditions it completely determines the particle's motion in space and time and therefore is the basis of every mass spectrometer. 
It immediately reveals that two particles with the same physical quantity &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/M/q" title="M/q"&gt;m/q&lt;/a&gt;&lt;/i&gt; behave exactly the same. Thus all mass spectrometers actually measure &lt;i&gt;m/q&lt;/i&gt; and strictly speaking should be called mass-to-charge spectrometers. When presenting data, it is common to use the (officially) dimensionless &lt;i&gt;m/z&lt;/i&gt; (called &lt;a href="http://en.wikipedia.org/wiki/Mass-to-charge_ratio" title="Mass-to-charge ratio"&gt;mass-to-charge ratio&lt;/a&gt;, although (more accurately) it represents the ratio of the mass number and the charge number), where z is the number of &lt;a href="http://en.wikipedia.org/wiki/Elementary_charge" title="Elementary charge"&gt;elementary charges&lt;/a&gt; (&lt;i&gt;e&lt;/i&gt;) on the ion (z=q/e).&lt;/p&gt; &lt;p&gt;There are many types of mass analyzers, using either static or dynamic fields, and magnetic or electric fields, but all operate according to this same law. Each analyzer type has its strengths and weaknesses. Many mass spectrometers use two or more mass analyzers for tandem mass spectrometry (MS/MS). In addition to the more common mass analyzers listed below, there are other less common ones designed for special situations.&lt;/p&gt; &lt;p&gt;&lt;a name="Sector" id="Sector"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h4&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Sector&lt;/span&gt;&lt;/h4&gt;  &lt;p&gt;A &lt;b&gt;sector field mass analyzer&lt;/b&gt; uses an electric and/or magnetic field to affect the path and/or &lt;a href="http://en.wikipedia.org/wiki/Velocity" title="Velocity"&gt;velocity&lt;/a&gt; of the &lt;a href="http://en.wikipedia.org/wiki/Electric_charge" title="Electric charge"&gt;charged&lt;/a&gt; particles in some way. 
As shown above, &lt;a href="http://en.wikipedia.org/wiki/Sector_instrument" title="Sector instrument"&gt;sector instruments&lt;/a&gt; change the direction of ions that are accelerated through the mass analyzer. The ions enter a magnetic or electric field which bends the ion paths depending on their mass-to-charge ratios, deflecting the more charged and faster-moving, lighter ions more. The ions eventually reach the detector and their relative abundances are measured. The analyzer can be used to select a narrow range of m/q or to scan through a range of m/q to catalog the ions present.&lt;/p&gt; &lt;p&gt;&lt;a name="Time-of-flight" id="Time-of-flight"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h4&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Time-of-flight&lt;/span&gt;&lt;/h4&gt;  &lt;p&gt;Perhaps the easiest to understand is the &lt;a href="http://en.wikipedia.org/wiki/Time-of-flight" title="Time-of-flight"&gt;Time-of-flight&lt;/a&gt; (TOF) analyzer. It uses an &lt;a href="http://en.wikipedia.org/wiki/Electric_field" title="Electric field"&gt;electric field&lt;/a&gt; to accelerate the ions through the same &lt;a href="http://en.wikipedia.org/wiki/Voltage" title="Voltage"&gt;potential&lt;/a&gt;, and then measures the time they take to reach the detector. If the particles all have the same &lt;a href="http://en.wikipedia.org/wiki/Electrical_charge" title="Electrical charge"&gt;charge&lt;/a&gt;, then their &lt;a href="http://en.wikipedia.org/wiki/Kinetic_energy" title="Kinetic energy"&gt;kinetic energies&lt;/a&gt; will be identical, and their &lt;a href="http://en.wikipedia.org/wiki/Velocity" title="Velocity"&gt;velocities&lt;/a&gt; will depend only on their &lt;a href="http://en.wikipedia.org/wiki/Mass" title="Mass"&gt;masses&lt;/a&gt;. 
Lighter ions will reach the detector first.&lt;/p&gt; &lt;p&gt;&lt;a name="Quadrupole" id="Quadrupole"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h4&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Quadrupole&lt;/span&gt;&lt;/h4&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Quadrupole_mass_analyzer" title="Quadrupole mass analyzer"&gt;Quadrupole mass analyzers&lt;/a&gt; use oscillating electrical fields to selectively stabilize or destabilize ions passing through a &lt;a href="http://en.wikipedia.org/wiki/Radio_frequency" title="Radio frequency"&gt;radio frequency&lt;/a&gt; (RF) &lt;a href="http://en.wikipedia.org/wiki/Quadrupole" title="Quadrupole"&gt;quadrupole&lt;/a&gt; field. A quadrupole mass analyzer acts as a mass-selective filter and is closely related to the &lt;a href="http://en.wikipedia.org/wiki/Quadrupole_ion_trap" title="Quadrupole ion trap"&gt;quadrupole ion trap&lt;/a&gt;, particularly the linear quadrupole ion trap, except that it operates without trapping the ions. A common variation of the quadrupole is the triple quadrupole.&lt;/p&gt; &lt;p&gt;&lt;a name="Quadrupole_ion_trap" id="Quadrupole_ion_trap"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h4&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Quadrupole ion trap&lt;/span&gt;&lt;/h4&gt;  &lt;p&gt;The &lt;a href="http://en.wikipedia.org/wiki/Quadrupole_ion_trap" title="Quadrupole ion trap"&gt;quadrupole ion trap&lt;/a&gt; works on the same physical principles as the quadrupole mass analyzer, but the ions are trapped and sequentially ejected. 
Ions are created and trapped in a mainly quadrupole RF potential and separated by m/q, non-destructively or destructively.&lt;/p&gt; &lt;p&gt;There are many mass/charge separation and isolation methods, but the most commonly used is the mass instability mode, in which the RF potential is ramped so that the orbits of ions with a mass &lt;span class="texhtml"&gt;&lt;i&gt;a&lt;/i&gt; &gt; &lt;i&gt;b&lt;/i&gt;&lt;/span&gt; are stable while ions with mass &lt;span class="texhtml"&gt;&lt;i&gt;b&lt;/i&gt;&lt;/span&gt; become unstable and are ejected along the z-axis onto a detector.&lt;/p&gt; &lt;p&gt;Ions may also be ejected by the resonance excitation method, whereby a supplemental oscillatory excitation voltage is applied to the endcap electrodes, and the trapping voltage amplitude and/or excitation voltage frequency is varied to bring ions into a resonance condition in order of their mass/charge ratio.&lt;/p&gt; &lt;p&gt;The &lt;a href="http://en.wikipedia.org/w/index.php?title=Cylindrical_ion_trap_mass_spectrometer&amp;action=edit" class="new" title="Cylindrical ion trap mass spectrometer"&gt;cylindrical ion trap mass spectrometer&lt;/a&gt; is a derivative of the quadrupole ion trap mass spectrometer.&lt;/p&gt; &lt;p&gt;&lt;a name="Linear_quadrupole_ion_trap" id="Linear_quadrupole_ion_trap"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h4&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Linear quadrupole ion trap&lt;/span&gt;&lt;/h4&gt; &lt;p&gt;A &lt;a href="http://en.wikipedia.org/w/index.php?title=Linear_quadrupole_ion_trap&amp;amp;action=edit" class="new" title="Linear quadrupole ion trap"&gt;linear quadrupole ion trap&lt;/a&gt; (LTQ) is similar to a quadrupole ion trap (QIT), but traps ions in a 2D quadrupole field instead of the 3D quadrupole field of a QIT. 
Ions can be stored along the entire length of the LTQ, which results in a higher ion capacity.&lt;/p&gt; &lt;p&gt;&lt;a name="Fourier_transform_ion_cyclotron_resonance" id="Fourier_transform_ion_cyclotron_resonance"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h4&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Fourier transform ion cyclotron resonance&lt;/span&gt;&lt;/h4&gt;  &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Fourier_transform_mass_spectrometry" title="Fourier transform mass spectrometry"&gt;Fourier transform mass spectrometry&lt;/a&gt;, or more precisely &lt;a href="http://en.wikipedia.org/wiki/Fourier_transform_ion_cyclotron_resonance" title="Fourier transform ion cyclotron resonance"&gt;Fourier transform ion cyclotron resonance&lt;/a&gt; MS, measures mass by detecting the image current produced by ions &lt;a href="http://en.wikipedia.org/wiki/Cyclotron" title="Cyclotron"&gt;cyclotroning&lt;/a&gt; in the presence of a magnetic field. Instead of measuring the deflection of ions with a detector such as an &lt;a href="http://en.wikipedia.org/wiki/Electron_multiplier" title="Electron multiplier"&gt;electron multiplier&lt;/a&gt;, the ions are injected into a &lt;a href="http://en.wikipedia.org/wiki/Penning_trap" title="Penning trap"&gt;Penning trap&lt;/a&gt; (a static electric/magnetic &lt;a href="http://en.wikipedia.org/wiki/Ion_trap" title="Ion trap"&gt;ion trap&lt;/a&gt;) where they effectively form part of a circuit. Detectors at fixed positions in space measure the electrical signal of ions which pass near them over time, producing a cyclical signal. Since the frequency of an ion's cycling is determined by its mass-to-charge ratio, the signal can be deconvoluted by performing a &lt;a href="http://en.wikipedia.org/wiki/Fourier_transform" title="Fourier transform"&gt;Fourier transform&lt;/a&gt; on it. 
&lt;a href="http://en.wikipedia.org/wiki/FTMS" title="FTMS"&gt;FTMS&lt;/a&gt; has the advantage of high sensitivity (since each ion is 'counted' more than once) and much higher resolution and thus precision.&lt;sup id="_ref-7" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Mass_spectrometry#_note-7" title=""&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;sup id="_ref-8" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Mass_spectrometry#_note-8" title=""&gt;[9]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt; &lt;p&gt;Ion cyclotron resonance is an older mass analysis technique similar to FTMS except that ions are detected with a traditional detector. Ions trapped in a &lt;a href="http://en.wikipedia.org/wiki/Penning_trap" title="Penning trap"&gt;Penning trap&lt;/a&gt; are excited by an RF electric field until they impact the wall of the trap, where the detector is located; ions of different mass are resolved in time.&lt;/p&gt; &lt;p&gt;&lt;a name="Orbitrap" id="Orbitrap"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h4&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Orbitrap&lt;/span&gt;&lt;/h4&gt;  &lt;p&gt;The &lt;a href="http://en.wikipedia.org/wiki/Orbitrap" title="Orbitrap"&gt;Orbitrap&lt;/a&gt; is the most recently introduced mass analyser (commercially available since 2005 from Thermo Electron). In the Orbitrap, ions are &lt;a href="http://en.wikipedia.org/wiki/Electrostatic" title="Electrostatic"&gt;electrostatically&lt;/a&gt; trapped in an orbit around a central, spindle-shaped electrode. The electrode confines the ions so that they both orbit around the central electrode and oscillate back and forth along the central electrode's long axis. This oscillation generates an image current in the detector plates which is recorded by the instrument. 
The frequencies of these image currents depend on the mass to charge ratios of the ions in the Orbitrap. Mass spectra are obtained by &lt;a href="http://en.wikipedia.org/wiki/Fourier_transformation" title="Fourier transformation"&gt;Fourier transformation&lt;/a&gt; of the recorded image currents.&lt;/p&gt; &lt;p&gt;Similar to &lt;a href="http://en.wikipedia.org/wiki/Fourier_transform_ion_cyclotron_resonance" title="Fourier transform ion cyclotron resonance"&gt;Fourier transform ion cyclotron resonance&lt;/a&gt; mass spectrometers, Orbitraps have a high mass accuracy, high sensitivity and a good dynamic range.&lt;/p&gt; &lt;p&gt;&lt;a name="Detector" id="Detector"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Detector&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The final element of the mass spectrometer is the detector. The detector records the charge induced or current produced when an ion passes by or hits a surface. In a scanning instrument the signal produced in the detector during the course of the scan versus where the instrument is in the scan (at what m/q) will produce a &lt;a href="http://en.wikipedia.org/wiki/Mass_spectrum" title="Mass spectrum"&gt;mass spectrum&lt;/a&gt;, a record of ions as a function of &lt;i&gt;m/q&lt;/i&gt;.&lt;/p&gt; &lt;p&gt;Typically, some type of &lt;a href="http://en.wikipedia.org/wiki/Electron_multiplier" title="Electron multiplier"&gt;electron multiplier&lt;/a&gt; is used, though other detectors including &lt;a href="http://en.wikipedia.org/wiki/Faraday_cup" title="Faraday cup"&gt;Faraday cups&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/w/index.php?title=Ion-to-photon_detector&amp;amp;action=edit" class="new" title="Ion-to-photon detector"&gt;ion-to-photon detectors&lt;/a&gt; are also used. Because the number of ions leaving the mass analyzer at a particular instant is typically quite small, significant amplification is often necessary to get a signal. 
&lt;a href="http://en.wikipedia.org/wiki/Microchannel_Plate_Detector" title="Microchannel Plate Detector"&gt;Microchannel Plate Detectors&lt;/a&gt; are commonly used in modern commercial instruments.&lt;sup id="_ref-10" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Mass_spectrometry#_note-10" title=""&gt;[11]&lt;/a&gt;&lt;/sup&gt; In &lt;a href="http://en.wikipedia.org/wiki/FTMS" title="FTMS"&gt;FTMS&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Orbitrap" title="Orbitrap"&gt;Orbitraps&lt;/a&gt;, the detector consists of a pair of metal surfaces within the mass analyzer/ion trap region which the ions only pass near as they oscillate. No DC current is produced; only a weak AC image current flows in a circuit between the electrodes. Other inductive detectors have also been used. &lt;/p&gt; &lt;p&gt;&lt;a name="Tandem_MS_.28MS.2FMS.29" id="Tandem_MS_.28MS.2FMS.29"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Tandem MS (MS/MS)&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Tandem mass spectrometry involves multiple steps of mass selection or analysis, usually separated by some form of fragmentation. A tandem mass spectrometer is one capable of multiple rounds of mass spectrometry. For example, one mass analyzer can isolate one &lt;a href="http://en.wikipedia.org/wiki/Peptide" title="Peptide"&gt;peptide&lt;/a&gt; from many entering a mass spectrometer. A second mass analyzer then stabilizes the peptide ions while they collide with a gas, causing them to fragment by &lt;a href="http://en.wikipedia.org/wiki/Collision-induced_dissociation" title="Collision-induced dissociation"&gt;collision-induced dissociation&lt;/a&gt; (CID). A third mass analyzer then catalogs the fragments produced from the peptides. 
Tandem MS can also be done in a single mass analyzer over time as in a &lt;a href="http://en.wikipedia.org/wiki/Quadrupole_ion_trap" title="Quadrupole ion trap"&gt;quadrupole ion trap&lt;/a&gt;. There are various methods for fragmenting molecules for tandem MS, including &lt;a href="http://en.wikipedia.org/wiki/Collision-induced_dissociation" title="Collision-induced dissociation"&gt;collision-induced dissociation&lt;/a&gt; (CID), &lt;a href="http://en.wikipedia.org/wiki/Electron_capture_dissociation" title="Electron capture dissociation"&gt;electron capture dissociation&lt;/a&gt; (ECD), &lt;a href="http://en.wikipedia.org/wiki/Electron_transfer_dissociation" title="Electron transfer dissociation"&gt;electron transfer dissociation&lt;/a&gt; (ETD), &lt;a href="http://en.wikipedia.org/wiki/Infrared_multiphoton_dissociation" title="Infrared multiphoton dissociation"&gt;infrared multiphoton dissociation&lt;/a&gt; (IRMPD) and &lt;a href="http://en.wikipedia.org/wiki/Blackbody_infrared_radiative_dissociation" title="Blackbody infrared radiative dissociation"&gt;blackbody infrared radiative dissociation&lt;/a&gt; (BIRD). An important application of tandem mass spectrometry is &lt;a href="http://en.wikipedia.org/wiki/Mass_spectrometry#Protein_identification" title="Mass spectrometry"&gt;protein identification&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Tandem mass spectrometry enables a variety of experiments. Although it allows for many uniquely designed experiments, some types of experiments are commonly used and built into many commercial mass spectrometers. Examples of these include selected reaction monitoring (SRM), multiple reaction monitoring (MRM) and precursor ion scan. In selected reaction monitoring, the first analyzer allows only a single mass through and the second analyzer monitors for a specifically defined fragment ion. MRM is nearly identical except that the second analyzer monitors multiple user-defined fragment ions. 
These monikers are most often used with scanning instruments where the second mass analysis event is duty cycle limited. These experiments are used to increase the specificity of detection of known molecules, such as in pharmacokinetic studies. Precursor ion scan refers to monitoring for a specific loss from the precursor ion. The first and second mass analyzers scan across the spectrum separated by a user-defined m/z value. This experiment is used to detect specific motifs within unknown molecules.&lt;/p&gt; &lt;p&gt;&lt;a name="Common_Mass_Spectrometer_Configurations_.26_Techniques" id="Common_Mass_Spectrometer_Configurations_.26_Techniques"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Common Mass Spectrometer Configurations &amp; Techniques&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;When all of the elements (source, analyzer and detector) of a mass spectrometer are combined to form a complete instrument and the specific configuration becomes common, a new name, often an abbreviation of one or more of the internal components, becomes attached to that configuration and can become, within certain circles, better known than the internal components themselves. The most ubiquitous example of this is &lt;a href="http://en.wikipedia.org/wiki/MALDI-TOF" title="MALDI-TOF"&gt;MALDI-TOF&lt;/a&gt;, which simply refers to combining a &lt;a href="http://en.wikipedia.org/wiki/Matrix-assisted_laser_desorption/ionization" title="Matrix-assisted laser desorption/ionization"&gt;Matrix-assisted laser desorption/ionization&lt;/a&gt; source with a &lt;a href="http://en.wikipedia.org/wiki/Time-of-flight" title="Time-of-flight"&gt;Time-of-flight&lt;/a&gt; mass analyzer. The MALDI-TOF moniker is, however, often more widely recognized by scientists who are not mass spectrometrists than MALDI or TOF individually, as if the two were inseparable. 
Other examples include &lt;a href="http://en.wikipedia.org/wiki/ICP-MS" title="ICP-MS"&gt;inductively coupled plasma-mass spectrometry (ICP-MS)&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Accelerator_mass_spectrometry" title="Accelerator mass spectrometry"&gt;accelerator mass spectrometry (AMS)&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Thermal_ionisation" title="Thermal ionisation"&gt;Thermal ionization-mass spectrometry (TIMS)&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Spark_ionization" title="Spark ionization"&gt;spark source mass spectrometry (SSMS)&lt;/a&gt;. Sometimes the use of the generic "MS" actually implies a very specific mass analyzer and detection system as with AMS, which is always sector based. In other cases there are common configurations that may be implied but not necessarily.&lt;/p&gt; &lt;p&gt;Certain applications of mass spectrometry have developed monikers that although technically referring to a broad application also tend to indicate a specific or a limited number of instrument configurations. An example of this is &lt;a href="http://en.wikipedia.org/wiki/Isotope_ratio_mass_spectrometry" title="Isotope ratio mass spectrometry"&gt;isotope ratio mass spectrometry (IRMS)&lt;/a&gt;. 
Despite only specifically indicating an application, the use of a limited number of sector based mass analyzers is implied and the name is used to refer to both the application and the instrument used for the application.&lt;/p&gt; &lt;p&gt;&lt;a name="Other_Separation_Techniques_Combined_with_Mass_spectrometry" id="Other_Separation_Techniques_Combined_with_Mass_spectrometry"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Other Separation Techniques Combined with Mass spectrometry&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;An important enhancement to the mass resolving and determining capacity of mass spectrometry is its combination with analysis techniques that resolve mixtures of compounds in a sample based on other characteristics before introduction into the mass spectrometer.&lt;/p&gt; &lt;p&gt;&lt;a name="Gas_chromatography.2FMS" id="Gas_chromatography.2FMS"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Gas chromatography/MS&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;A common form of mass spectrometry is &lt;a href="http://en.wikipedia.org/wiki/Gas" title="Gas"&gt;gas&lt;/a&gt; chromatography-mass spectrometry (GC/MS or GC-MS). In this technique, a &lt;a href="http://en.wikipedia.org/wiki/Gas_chromatograph" title="Gas chromatograph"&gt;gas chromatograph&lt;/a&gt; is used to separate different compounds. 
This stream of separated compounds is fed on-line into the &lt;a href="http://en.wikipedia.org/wiki/Ion" title="Ion"&gt;ion&lt;/a&gt; source, a &lt;a href="http://en.wikipedia.org/wiki/Metal" title="Metal"&gt;metallic&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Filament" title="Filament"&gt;filament&lt;/a&gt; to which &lt;a href="http://en.wikipedia.org/wiki/Voltage" title="Voltage"&gt;voltage&lt;/a&gt; is applied. This filament emits electrons which ionize the compounds. The ions can then further fragment, yielding predictable patterns. Intact ions and fragments pass into the mass spectrometer's analyser and are eventually detected.&lt;/p&gt; &lt;p&gt;&lt;a name="Liquid_chromatography.2FMS" id="Liquid_chromatography.2FMS"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Liquid chromatography/MS&lt;/span&gt;&lt;/h3&gt;  &lt;p&gt;Similar to gas chromatography MS (GC/MS), liquid chromatography mass spectrometry (LC/MS or LC-MS) separates compounds chromatographically before they are introduced to the ion source and mass spectrometer. It differs from GC/MS in that the mobile phase is liquid, usually a combination of &lt;a href="http://en.wikipedia.org/wiki/Water" title="Water"&gt;water&lt;/a&gt; and organic &lt;a href="http://en.wikipedia.org/wiki/Solvent" title="Solvent"&gt;solvents&lt;/a&gt;, instead of gas. 
Most commonly, an &lt;a href="http://en.wikipedia.org/wiki/Electrospray_ionization" title="Electrospray ionization"&gt;electrospray ionization&lt;/a&gt; source is used in LC/MS.&lt;/p&gt; &lt;p&gt;&lt;a name="IMS.2FMS" id="IMS.2FMS"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;IMS/MS&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Ion_mobility_spectrometry" title="Ion mobility spectrometry"&gt;Ion mobility spectrometry&lt;/a&gt;/mass spectrometry is a technique in which ions are first separated by drift time through a neutral gas at some pressure under an electrical potential gradient before being introduced into a mass spectrometer.&lt;/p&gt; &lt;p&gt;The drift time is a measure of the radius relative to the charge of the ion. The duty cycle of IMS (the time over which the experiment takes place) is longer than that of most mass spectrometers, so the mass spectrometer can sample along the course of the IMS separation. 
This produces data about the IMS separation and the mass-to-charge ratio of the ions in a manner similar to &lt;a href="http://en.wikipedia.org/wiki/Liquid_chromatography-mass_spectrometry" title="Liquid chromatography-mass spectrometry"&gt;LC/MS&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;The duty cycle of IMS is short relative to liquid chromatography or gas chromatography separations and can thus be coupled to such techniques producing triply hyphenated techniques such as LC/IMS/MS.&lt;/p&gt; &lt;p&gt;&lt;a name="Data_and_analysis" id="Data_and_analysis"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Data and analysis&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;&lt;a name="Data_representations" id="Data_representations"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Data representations&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Mass spectrometry produces various types of data. The most ubiquitous data representation is the &lt;a href="http://en.wikipedia.org/wiki/Mass_spectrum" title="Mass spectrum"&gt;mass spectrum&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Certain types of mass spectrometry data are best represented as a &lt;a href="http://en.wikipedia.org/wiki/Mass_chromatogram" title="Mass chromatogram"&gt;mass chromatogram&lt;/a&gt;. 
Types of chromatograms include selected ion monitoring (SIM), total ion current (TIC), and selected reaction monitoring chromatogram (SRM), among many others.&lt;/p&gt; &lt;p&gt;Other types of mass spectrometry data are well represented as a &lt;a href="http://en.wikipedia.org/wiki/Contour_map" title="Contour map"&gt;contour map&lt;/a&gt; of mass-to-charge on one axis, intensity on another and an additional experimental parameter (often time) on the third axis, thus producing a three dimensional surface.&lt;/p&gt; &lt;p&gt;&lt;a name="Data_analysis" id="Data_analysis"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Data analysis&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;&lt;b&gt;Basics&lt;/b&gt;&lt;/p&gt; &lt;p&gt;Mass spectrometry data analysis is a complicated subject matter that is very specific to the type of experiment producing the data. There are several general subdivisions of data that are fundamental to beginning to understand any data.&lt;/p&gt; &lt;p&gt;Many mass spectrometers work in either &lt;i&gt;negative ion mode&lt;/i&gt; or &lt;i&gt;positive ion mode&lt;/i&gt;. It is very important to know whether the observed ions are negatively or positively charged. This is often important in determining the neutral mass but it also indicates something about the nature of the molecules.&lt;/p&gt; &lt;p&gt;There are many different types of ion sources that behave very differently from each other. A source such as an electron ionization source produces many fragments and mostly odd electron species with one charge, whereas a source such as an electrospray source usually produces quasimolecular even electron species that may be multiply charged.&lt;/p&gt; &lt;p&gt;Tandem mass spectrometry purposely produces fragment ions post-source and can drastically change the sort of data achieved by an experiment.&lt;/p&gt; &lt;p&gt;By understanding the origin of a sample certain expectations can be assumed. 
For example, if the sample comes from a synthesis or manufacturing process, impurities related to the major component are likely to be present. If the sample is a relatively crude preparation of a biological sample, it likely contains a certain amount of salt that may form &lt;a href="http://en.wikipedia.org/wiki/Adduct" title="Adduct"&gt;adducts&lt;/a&gt; with the analyte molecules in certain analyses.&lt;/p&gt; &lt;p&gt;Results can also depend heavily on how the sample was prepared and how it was run or introduced. An important example is which matrix was used for MALDI spotting, since much of the energetics of the desorption/ionization event is controlled by the matrix rather than the laser power. Sometimes samples are spiked with sodium or another ion-carrying species to produce adducts rather than a protonated species.&lt;/p&gt; &lt;p&gt;The basic question most commonly overlooked by non-mass spectrometrists trying to use mass spectrometry or interact with a mass spectrometrist is what the overarching goal of the project is. To interpret data one must know the desired outcome (and have collected the right data in the first place). There are many bits of information that can be gleaned from mass spectrometry data, such as the masses of the molecules, the purity of the sample, and the structure of the molecules. Each of these questions requires a different approach. 
Simply asking for a "mass-spec" will most likely not answer the real question at hand.&lt;/p&gt; &lt;p&gt;&lt;a name="Applications" id="Applications"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Applications&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;&lt;a name="Isotope_ratio_MS:_isotope_dating_and_tracking" id="Isotope_ratio_MS:_isotope_dating_and_tracking"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Isotope ratio MS: isotope dating and tracking&lt;/span&gt;&lt;/h3&gt; &lt;div class="thumb tright"&gt; &lt;div class="thumbinner" style="width: 182px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Mass-spectrometer_awi_hg.jpg" class="internal" title="Mass spectrometer to determine the 16O/18O and 12C/13C isotope ratio on biogenous carbonate"&gt;&lt;img alt="Mass spectrometer to determine the 16O/18O and 12C/13C isotope ratio on biogenous carbonate" longdesc="/wiki/Image:Mass-spectrometer_awi_hg.jpg" class="thumbimage" src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/da/Mass-spectrometer_awi_hg.jpg/180px-Mass-spectrometer_awi_hg.jpg" height="116" width="180" /&gt;&lt;/a&gt; &lt;div class="thumbcaption"&gt; &lt;div class="magnify" style="float: right;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Mass-spectrometer_awi_hg.jpg" class="internal" title="Enlarge"&gt;&lt;img src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" alt="" height="11" width="15" /&gt;&lt;/a&gt;&lt;/div&gt; Mass spectrometer to determine the &lt;sup&gt;16&lt;/sup&gt;O/&lt;sup&gt;18&lt;/sup&gt;O and &lt;sup&gt;12&lt;/sup&gt;C/&lt;sup&gt;13&lt;/sup&gt;C isotope ratio on biogenous carbonate&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;dl&gt;&lt;dd&gt; &lt;div class="noprint"&gt;&lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Isotope_ratio_mass_spectrometry" title="Isotope ratio mass spectrometry"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/i&gt;&lt;/div&gt; 
&lt;/dd&gt;&lt;/dl&gt; &lt;p&gt;Mass spectrometry is also used to determine the &lt;a href="http://en.wikipedia.org/wiki/Isotope" title="Isotope"&gt;isotopic&lt;/a&gt; composition of elements within a sample. Differences in mass among isotopes of an element are very small, and the less abundant isotopes of an element are typically very rare, so a very sensitive instrument is required. These instruments, sometimes referred to as isotope ratio mass spectrometers (IR-MS), usually use a single magnet to bend a beam of ionized particles towards a series of &lt;a href="http://en.wikipedia.org/wiki/Faraday_cup" title="Faraday cup"&gt;Faraday cups&lt;/a&gt; which convert particle impacts to &lt;a href="http://en.wikipedia.org/wiki/Electric_current" title="Electric current"&gt;electric current&lt;/a&gt;. Fast on-line analysis of the &lt;a href="http://en.wikipedia.org/wiki/Deuterium" title="Deuterium"&gt;deuterium&lt;/a&gt; content of water can be done using &lt;a href="http://en.wikipedia.org/wiki/Flowing_afterglow_mass_spectrometry" title="Flowing afterglow mass spectrometry"&gt;flowing afterglow mass spectrometry&lt;/a&gt; (FA-MS). Probably the most sensitive and accurate mass spectrometer for this purpose is the &lt;a href="http://en.wikipedia.org/wiki/Accelerator_mass_spectrometry" title="Accelerator mass spectrometry"&gt;accelerator mass spectrometer&lt;/a&gt; (AMS). Isotope ratios are important markers of a variety of processes. Some isotope ratios are used to determine the age of materials, for example in &lt;a href="http://en.wikipedia.org/wiki/Carbon_dating" title="Carbon dating"&gt;carbon dating&lt;/a&gt;. Labelling with stable isotopes is also used for protein quantification. 
(see &lt;a href="http://en.wikipedia.org/wiki/Mass_spectrometry#Protein_quantitation" title="Mass spectrometry"&gt;Protein quantitation&lt;/a&gt; below)&lt;/p&gt; &lt;p&gt;&lt;a name="Trace_gas_analysis" id="Trace_gas_analysis"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Trace gas analysis&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Several techniques use ions created in a dedicated ion source and injected into a flow tube or a drift tube. &lt;a href="http://en.wikipedia.org/wiki/SIFT-MS_selected_ion_flow_tube_mass_spectrometry" title="SIFT-MS selected ion flow tube mass spectrometry"&gt;Selected ion flow tube&lt;/a&gt; (SIFT-MS) and proton transfer reaction (PTR-MS) mass spectrometry are variants of &lt;a href="http://en.wikipedia.org/wiki/Chemical_ionization" title="Chemical ionization"&gt;chemical ionization&lt;/a&gt; dedicated to trace gas analysis of air, breath or liquid headspace. They use a well-defined reaction time, which allows analyte concentrations to be calculated from the known reaction kinetics without the need for an internal standard or calibration.&lt;/p&gt; &lt;p&gt;&lt;a name="Atom_Probe" id="Atom_Probe"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Atom Probe&lt;/span&gt;&lt;/h3&gt; &lt;dl&gt;&lt;dd&gt; &lt;div class="noprint"&gt;&lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Atom_probe" title="Atom probe"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/i&gt;&lt;/div&gt; &lt;/dd&gt;&lt;/dl&gt; &lt;p&gt;An &lt;a href="http://en.wikipedia.org/wiki/Atom_probe" title="Atom probe"&gt;atom probe&lt;/a&gt; is an instrument that combines &lt;a href="http://en.wikipedia.org/wiki/Time-of-flight" title="Time-of-flight"&gt;time-of-flight&lt;/a&gt; mass spectrometry and &lt;a href="http://en.wikipedia.org/wiki/Field_ion_microscope" title="Field ion microscope"&gt;field ion microscopy&lt;/a&gt; (FIM) to map the location of individual atoms.&lt;/p&gt; &lt;p&gt;&lt;a 
name="Pharmacokinetics" id="Pharmacokinetics"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Pharmacokinetics&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Pharmacokinetics is often studied using mass spectrometry because of the complex nature of the matrix (often blood or urine) and the need for high sensitivity to observe low dose and long time point data. The most common instrumentation used in this application is &lt;a href="http://en.wikipedia.org/wiki/Liquid_chromatography-mass_spectrometry" title="Liquid chromatography-mass spectrometry"&gt;LC-MS&lt;/a&gt; with a &lt;a href="http://en.wikipedia.org/wiki/Quadrupole_mass_analyzer" title="Quadrupole mass analyzer"&gt;triple quadrupole mass spectrometer&lt;/a&gt;. Tandem mass spectrometry is usually employed for added specificity. Standard curves and internal standards are used for quantitation, usually of a single pharmaceutical in the samples. The samples represent different time points as a pharmaceutical is administered and then metabolized or cleared from the body. Blank or t=0 samples taken before administration are important in determining background and ensuring data integrity with such complex sample matrices. 
Much attention is paid to the linearity of the standard curve; however it is not uncommon to use curve fitting with more complex functions such as quadratics since the response of most mass spectrometers is less than linear across large concentration ranges.&lt;/p&gt; &lt;p&gt;There is currently considerable interest in the use of very high sensitivity mass spectrometry for &lt;a href="http://en.wikipedia.org/wiki/Microdosing" title="Microdosing"&gt;microdosing&lt;/a&gt; studies, which are seen as a promising alternative to &lt;a href="http://en.wikipedia.org/wiki/Animal_experimentation" title="Animal experimentation"&gt;animal experimentation&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="Mass_spectrometry_of_proteins" id="Mass_spectrometry_of_proteins"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Mass spectrometry of proteins&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Mass spectrometry is an important emerging method for the characterization of proteins. The two primary methods for ionization of whole proteins are &lt;a href="http://en.wikipedia.org/wiki/Electrospray_ionization" title="Electrospray ionization"&gt;electrospray ionization&lt;/a&gt; (ESI) and &lt;a href="http://en.wikipedia.org/wiki/Matrix-assisted_laser_desorption/ionization" title="Matrix-assisted laser desorption/ionization"&gt;matrix-assisted laser desorption/ionization&lt;/a&gt; (MALDI). In keeping with the performance and mass range of available mass spectrometers, two approaches are used for characterizing proteins. In the first, intact proteins are ionized by either of the two techniques described above, and then introduced to a mass analyser. 
In the second, proteins are enzymatically digested into smaller &lt;a href="http://en.wikipedia.org/wiki/Peptides" title="Peptides"&gt;peptides&lt;/a&gt; using an agent such as &lt;a href="http://en.wikipedia.org/wiki/Trypsin" title="Trypsin"&gt;trypsin&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Pepsin" title="Pepsin"&gt;pepsin&lt;/a&gt;. Other proteolytic digest agents are also used. The collection of peptide products is then introduced to the mass analyser. This is often referred to as the "bottom-up" approach of protein analysis.&lt;/p&gt; &lt;p&gt;Whole protein mass analysis is primarily conducted using either &lt;a href="http://en.wikipedia.org/wiki/Time-of-flight" title="Time-of-flight"&gt;time-of-flight&lt;/a&gt; (TOF) MS, or &lt;a href="http://en.wikipedia.org/wiki/Fourier_transform_ion_cyclotron_resonance" title="Fourier transform ion cyclotron resonance"&gt;Fourier transform ion cyclotron resonance&lt;/a&gt; (FT-ICR). These two types of instrument are preferable here because of their wide mass range, and in the case of FT-ICR, its high mass accuracy. Mass analysis of proteolytic peptides is a much more popular method of protein characterization, as cheaper instrument designs can be used. Additionally, sample preparation is easier once whole proteins have been digested into smaller peptide fragments. The most widely used instrument for peptide mass analysis is the &lt;a href="http://en.wikipedia.org/wiki/Quadrupole_ion_trap" title="Quadrupole ion trap"&gt;quadrupole ion trap&lt;/a&gt;. 
Multiple stage quadrupole-time-of-flight and MALDI &lt;a href="http://en.wikipedia.org/wiki/Time-of-flight" title="Time-of-flight"&gt;time-of-flight&lt;/a&gt; instruments also find use in this application.&lt;/p&gt; &lt;p&gt;&lt;a name="Protein_and_peptide_fractionation_coupled_with_mass_spectrometry" id="Protein_and_peptide_fractionation_coupled_with_mass_spectrometry"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h4&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Protein and peptide fractionation coupled with mass spectrometry&lt;/span&gt;&lt;/h4&gt; &lt;p&gt;Proteins of interest to biological researchers are usually part of a very complex mixture of other proteins and molecules that co-exist in the biological medium. This presents two significant problems. First, the two ionization techniques used for large molecules only work well when the mixture contains roughly equal amounts of constituents, while in biological samples, different proteins tend to be present in widely differing amounts. If such a mixture is ionized using electrospray or MALDI, the more abundant species have a tendency to "drown" signals from less abundant ones. The second problem is that the mass spectrum from a complex mixture is very difficult to interpret because of the overwhelming number of mixture components. This is exacerbated by the fact that enzymatic digestion of a protein gives rise to a large number of peptide products.&lt;/p&gt; &lt;p&gt;To contend with these problems, two methods are widely used to fractionate proteins, or their peptide products from an enzymatic digestion. The first method fractionates whole proteins and is called &lt;a href="http://en.wikipedia.org/wiki/Two-dimensional_gel_electrophoresis" title="Two-dimensional gel electrophoresis"&gt;two-dimensional gel electrophoresis&lt;/a&gt;. 
The second method, &lt;a href="http://en.wikipedia.org/wiki/High_performance_liquid_chromatography" title="High performance liquid chromatography"&gt;high performance liquid chromatography&lt;/a&gt;, is used to fractionate peptides after enzymatic digestion. In some situations, it may be necessary to combine both of these techniques.&lt;/p&gt; &lt;p&gt;A gel spot identified on a 2D gel is usually attributable to a single protein. If the identity of the protein is desired, the gel spot can be excised and digested proteolytically. The peptide masses resulting from the digestion can be determined by mass spectrometry using &lt;a href="http://en.wikipedia.org/wiki/Peptide_mass_fingerprinting" title="Peptide mass fingerprinting"&gt;peptide mass fingerprinting&lt;/a&gt;. If this information does not allow unequivocal identification of the protein, its peptides can be subjected to &lt;a href="http://en.wikipedia.org/wiki/Tandem_mass_spectrometry" title="Tandem mass spectrometry"&gt;tandem mass spectrometry&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Characterization of protein mixtures using HPLC/MS is also called &lt;i&gt;shotgun proteomics&lt;/i&gt; or &lt;i&gt;MudPIT&lt;/i&gt; (multidimensional protein identification technology). A peptide mixture that results from digestion of a protein mixture is fractionated by one or two steps of liquid chromatography. The eluent from the chromatography stage can be either directly introduced to the mass spectrometer through electrospray ionization, or laid down on a series of small spots for later mass analysis using MALDI.&lt;/p&gt; &lt;p&gt;&lt;a name="Protein_identification" id="Protein_identification"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h4&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Protein identification&lt;/span&gt;&lt;/h4&gt; &lt;p&gt;There are two main ways MS is used to identify proteins. 
&lt;a href="http://en.wikipedia.org/wiki/Peptide_mass_fingerprinting" title="Peptide mass fingerprinting"&gt;Peptide mass fingerprinting&lt;/a&gt; (mentioned in the previous section) uses the masses of proteolytic peptides as input to a search of a database of predicted masses that would arise from digestion of a list of known proteins. If a protein sequence in the reference list gives rise to a significant number of predicted masses that match the experimental values, there is some evidence that this protein was present in the original sample.&lt;/p&gt; &lt;div class="thumb tright"&gt; &lt;div class="thumbinner" style="width: 302px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:PeptideMSMS.jpg" class="internal" title="Full MS and MS2 spectra of a peptide."&gt;&lt;img alt="Full MS and MS2 spectra of a peptide." longdesc="/wiki/Image:PeptideMSMS.jpg" class="thumbimage" src="http://upload.wikimedia.org/wikipedia/commons/thumb/7/76/PeptideMSMS.jpg/300px-PeptideMSMS.jpg" height="206" width="300" /&gt;&lt;/a&gt; &lt;div class="thumbcaption"&gt; &lt;div class="magnify" style="float: right;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:PeptideMSMS.jpg" class="internal" title="Enlarge"&gt;&lt;img src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" alt="" height="11" width="15" /&gt;&lt;/a&gt;&lt;/div&gt; Full MS and MS&lt;sup&gt;2&lt;/sup&gt; spectra of a peptide.&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;p&gt;Tandem MS is becoming a more popular experimental method for identifying proteins. Collision-induced dissociation is used in mainstream applications to generate a set of fragments from a specific peptide ion. The fragmentation process primarily gives rise to cleavage products that break along peptide bonds. Because of this simplicity in fragmentation, it is possible to use the observed fragment masses to match with a database of predicted masses for one of many given peptide sequences. 
Tandem MS of whole protein ions has been investigated recently using &lt;a href="http://en.wikipedia.org/wiki/Electron_capture_dissociation" title="Electron capture dissociation"&gt;electron capture dissociation&lt;/a&gt; and has been shown in principle to yield extensive sequence information, but it is not yet common practice. This is sometimes referred to as the "top-down" approach in that it involves starting with the whole protein and then pulling it apart rather than starting with pieces (proteolytic fragments) and piecing the protein back together using &lt;a href="http://en.wikipedia.org/wiki/De_novo_repeat_detection" title="De novo repeat detection"&gt;De novo repeat detection&lt;/a&gt; (bottom-up).&lt;/p&gt; &lt;p&gt;A number of different algorithmic approaches have been described to identify peptides and proteins from tandem mass spectrometry (MS/MS) data, including peptide fragment fingerprinting, peptide de novo sequencing and sequence tag based searching.&lt;/p&gt; &lt;p&gt;&lt;br /&gt;A popular option that combines a comprehensive range of data analysis features is &lt;a href="http://en.wikipedia.org/wiki/PEAKS%28software%29" title="PEAKS(software)"&gt;PEAKS&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Other existing mass spectrometry analysis software includes: Peptide fragment fingerprinting (&lt;a href="http://en.wikipedia.org/wiki/SEQUEST" title="SEQUEST"&gt;SEQUEST&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Mascot" title="Mascot"&gt;Mascot&lt;/a&gt;, &lt;a href="http://pubchem.ncbi.nlm.nih.gov/omssa/" class="external text" title="http://pubchem.ncbi.nlm.nih.gov/omssa/" rel="nofollow"&gt;OMSSA&lt;/a&gt; and &lt;a href="http://www.thegpm.org/TANDEM/" class="external text" title="http://www.thegpm.org/TANDEM/" rel="nofollow"&gt;X!Tandem&lt;/a&gt;). 
Peptide de novo sequencing (&lt;a href="http://www.hairyfatguy.com/lutefisk/" class="external text" title="http://www.hairyfatguy.com/lutefisk/" rel="nofollow"&gt;LuteFisk&lt;/a&gt;, &lt;a href="http://peptide.ucsd.edu/" class="external text" title="http://peptide.ucsd.edu" rel="nofollow"&gt;PepNovo&lt;/a&gt;, and Sherenga). &lt;a href="http://en.wikipedia.org/wiki/Peptide_sequence_tag" title="Peptide sequence tag"&gt;Peptide sequence tag&lt;/a&gt; based searching (&lt;a href="http://bif.csd.uwo.ca/spider/" class="external text" title="http://bif.csd.uwo.ca/spider/" rel="nofollow"&gt;SPIDER&lt;/a&gt;, &lt;a href="http://peptide.ucsd.edu/" class="external text" title="http://peptide.ucsd.edu/" rel="nofollow"&gt;InsPecT&lt;/a&gt;, and GutenTAG).&lt;/p&gt; &lt;p&gt;&lt;sup id="_ref-18" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Mass_spectrometry#_note-18" title=""&gt;&lt;br /&gt;&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt; &lt;p&gt;&lt;a name="Protein_quantitation" id="Protein_quantitation"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h4&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Protein quantitation&lt;/span&gt;&lt;/h4&gt; &lt;p&gt;Several recent methods allow for the quantitation of proteins by mass spectrometry. Typically, stable (i.e. non-radioactive) heavier &lt;a href="http://en.wikipedia.org/wiki/Isotope" title="Isotope"&gt;isotopes&lt;/a&gt; of &lt;a href="http://en.wikipedia.org/wiki/Carbon" title="Carbon"&gt;carbon&lt;/a&gt; (&lt;sup&gt;13&lt;/sup&gt;C) or &lt;a href="http://en.wikipedia.org/wiki/Nitrogen" title="Nitrogen"&gt;nitrogen&lt;/a&gt; (&lt;sup&gt;15&lt;/sup&gt;N) are incorporated into one sample while the other is labelled with the corresponding light isotopes (e.g. &lt;sup&gt;12&lt;/sup&gt;C and &lt;sup&gt;14&lt;/sup&gt;N). The two samples are mixed before the analysis. Peptides derived from the different samples can be distinguished by their mass difference. The ratio of their peak intensities corresponds to the relative abundance ratio of the peptides (and proteins). 
The most popular methods for isotope labelling are &lt;a href="http://en.wikipedia.org/wiki/SILAC" title="SILAC"&gt;SILAC&lt;/a&gt; (stable isotope labelling with amino acids in cell culture), trypsin-catalyzed &lt;sup&gt;18&lt;/sup&gt;O labelling, ICAT (isotope-coded affinity tagging), and iTRAQ (isobaric tags for relative and absolute quantitation).&lt;sup id="_ref-19" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Mass_spectrometry#_note-19" title=""&gt;&lt;/a&gt;&lt;/sup&gt; “Semi-quantitative” mass spectrometry can be performed without labelling of samples. Typically, this is done with MALDI analysis (in linear mode). The peak intensity, or the peak area, from individual molecules (typically proteins) is correlated with the amount of protein in the sample. However, the individual signal depends on the primary structure of the protein, on the complexity of the sample, and on the settings of the instrument.&lt;/p&gt; &lt;p&gt;&lt;a name="Protein_structure" id="Protein_structure"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h4&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Protein structure&lt;/span&gt;&lt;/h4&gt; &lt;p&gt;Characteristics indicative of the &lt;a href="http://en.wikipedia.org/wiki/Protein_structure" title="Protein structure"&gt;three-dimensional structure&lt;/a&gt; of proteins can be probed with mass spectrometry in various ways. By using chemical crosslinking to couple parts of the protein that are close in space, but far apart in sequence, information about the overall structure can be inferred. By following the &lt;a href="http://en.wikipedia.org/wiki/Hydrogen-deuterium_exchange" title="Hydrogen-deuterium exchange"&gt;exchange of amide protons&lt;/a&gt; with &lt;a href="http://en.wikipedia.org/wiki/Deuterium" title="Deuterium"&gt;deuterium&lt;/a&gt; from the solvent, it is possible to probe the solvent accessibility of various parts of the protein. 
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-4074903673324525302?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/4074903673324525302/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=4074903673324525302' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/4074903673324525302'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/4074903673324525302'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/mass-spectrometry.html' title='Mass spectrometry'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp0.blogger.com/_0fSb-1TJAx0/RqMFeiNp_DI/AAAAAAAAACE/ogKgripSVT4/s72-c/200px-Ms_block_schematic.gif' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-6322814180542583984</id><published>2007-07-22T00:16:00.000-07:00</published><updated>2007-07-22T00:18:17.519-07:00</updated><title type='text'>Protein microarray</title><content type='html'>A &lt;b&gt;protein microarray&lt;/b&gt; is a piece of glass on which different molecules of protein have been affixed at separate locations in an ordered manner thus forming a microscopic array. 
These are used to identify protein-protein interactions, to identify the substrates of protein kinases, or to identify the targets of biologically active small molecules. The most common protein microarray is the &lt;a href="http://en.wikipedia.org/wiki/Antibody_microarray" title="Antibody microarray"&gt;antibody microarray&lt;/a&gt;, where antibodies are spotted onto the protein chip and are used as capture molecules to detect proteins from cell lysate solutions.&lt;h2&gt;&lt;span class="mw-headline"&gt;Applications&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;&lt;b&gt;Protein microarrays&lt;/b&gt; (also &lt;b&gt;biochip&lt;/b&gt;, &lt;b&gt;proteinchip&lt;/b&gt;) are measurement devices used in biomedical applications to determine the presence and/or amount (referred to as quantitation) of &lt;a href="http://en.wikipedia.org/wiki/Protein" title="Protein"&gt;proteins&lt;/a&gt; in biological samples, e.g. &lt;a href="http://en.wikipedia.org/wiki/Blood" title="Blood"&gt;blood&lt;/a&gt;. They have the potential to be an important tool for &lt;a href="http://en.wikipedia.org/wiki/Proteomics" title="Proteomics"&gt;proteomics&lt;/a&gt; research. Usually a multitude of different capture agents, most frequently &lt;a href="http://en.wikipedia.org/wiki/Monoclonal_antibodies" title="Monoclonal antibodies"&gt;monoclonal antibodies&lt;/a&gt;, are deposited on a chip surface (glass or silicon) in a miniature array. 
This format is often also referred to as a &lt;b&gt;microarray&lt;/b&gt; (a more general term for chip-based biological measurement devices).&lt;/p&gt; &lt;p&gt;&lt;a name="Types_of_Protein_Chips" id="Types_of_Protein_Chips"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Types of Protein Chips&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;There are several types of protein chips; the most common are glass slide chips and nano-well arrays.&lt;/p&gt; &lt;p&gt;&lt;a name="Production_of_Protein_Arrays" id="Production_of_Protein_Arrays"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Production of Protein Arrays&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;The proteins can be externally synthesised, purified and attached to the array. Alternatively, they can be synthesised in situ and directly attached to the array.&lt;/p&gt; &lt;p&gt;The proteins can be synthesised through &lt;a href="http://en.wikipedia.org/wiki/Protein_biosynthesis" title="Protein biosynthesis"&gt;biosynthesis&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/w/index.php?title=Cell-free_DNA_expression&amp;action=edit" class="new" title="Cell-free DNA expression"&gt;cell-free DNA expression&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Peptide_synthesis" title="Peptide synthesis"&gt;chemical synthesis&lt;/a&gt;. In-situ synthesis is possible with the latter two. With cell-free DNA expression, proteins are attached to the support right after their production. Peptides chemically produced by solid phase peptide synthesis are already attached to the support. 
Selective deprotection is carried out through lithographic methods or by so-called SPOT synthesis.&lt;/p&gt; &lt;p&gt;&lt;a name="Types_of_Capture_Molecules" id="Types_of_Capture_Molecules"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Types of Capture Molecules&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Capture molecules used are most commonly &lt;a href="http://en.wikipedia.org/wiki/Antibodies" title="Antibodies"&gt;antibodies&lt;/a&gt;; however, more recently there has been a push towards other types of capture molecules that are more similar to one another in their nature, such as peptides or aptamers. Antibodies have several drawbacks: antibodies are not available for most proteins, and some commercial antibody preparations have problems with specificity. Nevertheless, antibodies still represent the best characterized and most effective protein capture agent for microarrays. Recently, nucleic acids, receptors, enzymes, and proteins have been spotted onto chips and used as capture molecules. This will allow a vast variety of experiments to be conducted on protein-protein interactions, and all other protein binding substrates.&lt;/p&gt; &lt;p&gt;&lt;a name="Detection_methods" id="Detection_methods"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Detection methods&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Although protein microarrays may use detection methods similar to those of DNA microarrays, a problem is that protein concentrations in a biological sample may differ by many orders of magnitude from those of mRNAs. Therefore, protein chip detection methods must have a much larger range of detection.&lt;/p&gt; &lt;p&gt;The preferred method of detection currently is fluorescence detection. Fluorescence detection is safe, sensitive, and can have a high resolution. 
The fluorescent detection method is compatible with standard microarray scanners, however some minor alterations to software may need to be done.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-6322814180542583984?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/6322814180542583984/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=6322814180542583984' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/6322814180542583984'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/6322814180542583984'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/protein-microarray.html' title='Protein microarray'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-5016714548413509198</id><published>2007-07-22T00:14:00.000-07:00</published><updated>2007-07-22T00:16:11.520-07:00</updated><title type='text'>Antibody microarray</title><content type='html'>&lt;h1 class="firstHeading"&gt;Antibody microarray&lt;/h1&gt;       &lt;h3 id="siteSub"&gt;&lt;br /&gt;&lt;/h3&gt;              &lt;div id="jump-to-nav"&gt;&lt;a href="http://en.wikipedia.org/wiki/Antibody_microarray#searchInput"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/div&gt;   &lt;!-- start content --&gt;    &lt;div class="thumb tright"&gt; &lt;div 
class="thumbinner" style="width: 182px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Antibody_microarray.jpg" class="internal" title="Samples of antibody microarray creations and detections."&gt;&lt;img alt="Samples of antibody microarray creations and detections." longdesc="/wiki/Image:Antibody_microarray.jpg" class="thumbimage" src="http://upload.wikimedia.org/wikipedia/en/thumb/9/99/Antibody_microarray.jpg/180px-Antibody_microarray.jpg" height="173" width="180" /&gt;&lt;/a&gt; &lt;div class="thumbcaption"&gt; &lt;div class="magnify" style="float: right;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Antibody_microarray.jpg" class="internal" title="Enlarge"&gt;&lt;img src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" alt="" height="11" width="15" /&gt;&lt;/a&gt;&lt;/div&gt; Samples of antibody microarray creations and detections.&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;p&gt;An &lt;b&gt;antibody microarray&lt;/b&gt; is a specific form of &lt;a href="http://en.wikipedia.org/wiki/Protein_microarray" title="Protein microarray"&gt;protein microarray&lt;/a&gt; in which a collection of capture &lt;a href="http://en.wikipedia.org/wiki/Antibodies" title="Antibodies"&gt;antibodies&lt;/a&gt; is spotted and fixed on a solid surface, such as glass, plastic or a silicon chip, for the purpose of detecting antigens. 
Antibody microarray is often used for detecting &lt;a href="http://en.wikipedia.org/wiki/Protein_expression" title="Protein expression"&gt;protein expressions&lt;/a&gt; from &lt;a href="http://en.wikipedia.org/wiki/Cell_lysate" title="Cell lysate"&gt;cell lysates&lt;/a&gt; in general research and special &lt;a href="http://en.wikipedia.org/wiki/Biomarker" title="Biomarker"&gt;biomarkers&lt;/a&gt; from &lt;a href="http://en.wikipedia.org/wiki/Blood_plasma" title="Blood plasma"&gt;serum&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Urine" title="Urine"&gt;urine&lt;/a&gt; for diagnostic applications.&lt;/p&gt; &lt;p&gt;Related microarray technologies also include &lt;a href="http://en.wikipedia.org/wiki/Protein_microarray" title="Protein microarray"&gt;Protein microarrays&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/DNA_microarray" title="DNA microarray"&gt;DNA microarrays&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Tissue_microarray" title="Tissue microarray"&gt;Tissue microarrays&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Chemical_Compound_Microarray" title="Chemical Compound Microarray"&gt;Chemical Compound Microarrays&lt;/a&gt;.&lt;/p&gt;    &lt;table style="background-color: transparent;" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;br /&gt;&lt;/td&gt; &lt;td&gt;&lt;i&gt; &lt;/i&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-5016714548413509198?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/5016714548413509198/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=5016714548413509198' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/3299081579068555714/posts/default/5016714548413509198'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/5016714548413509198'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/antibody-microarray.html' title='Antibody microarray'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-30051431989342892</id><published>2007-07-22T00:08:00.000-07:00</published><updated>2007-07-22T00:12:37.056-07:00</updated><title type='text'>DNA microarray</title><content type='html'>&lt;h3 id="siteSub"&gt;&lt;br /&gt;&lt;/h3&gt;              &lt;div id="jump-to-nav"&gt;&lt;a href="http://en.wikipedia.org/wiki/DNA_microarray#searchInput"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/div&gt;   &lt;!-- start content --&gt;    &lt;div class="thumb tright"&gt; &lt;div class="thumbinner" style="width: 352px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Microarray2.gif" class="internal" title="Example of an approximately 40,000 probe spotted oligo microarray with enlarged inset to show detail."&gt;&lt;img alt="Example of an approximately 40,000 probe spotted oligo microarray with enlarged inset to show detail." 
longdesc="/wiki/Image:Microarray2.gif" class="thumbimage" src="http://upload.wikimedia.org/wikipedia/en/thumb/0/0e/Microarray2.gif/350px-Microarray2.gif" height="213" width="350" /&gt;&lt;/a&gt; &lt;div class="thumbcaption"&gt; &lt;div class="magnify" style="float: right;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Microarray2.gif" class="internal" title="Enlarge"&gt;&lt;img src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" alt="" height="11" width="15" /&gt;&lt;/a&gt;&lt;/div&gt; Example of an approximately 40,000 probe spotted oligo microarray with enlarged inset to show detail.&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; A &lt;b&gt;DNA microarray&lt;/b&gt; (also commonly known as &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Gene" title="Gene"&gt;gene&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Genome" title="Genome"&gt;genome&lt;/a&gt; chip&lt;/i&gt;, &lt;i&gt;DNA chip&lt;/i&gt;, or &lt;i&gt;gene array&lt;/i&gt;) is a collection of microscopic &lt;a href="http://en.wikipedia.org/wiki/DNA" title="DNA"&gt;DNA&lt;/a&gt; spots, commonly representing single &lt;a href="http://en.wikipedia.org/wiki/Gene" title="Gene"&gt;genes&lt;/a&gt;, arrayed on a solid surface by &lt;a href="http://en.wikipedia.org/wiki/Covalent" title="Covalent"&gt;covalent&lt;/a&gt; attachment to chemically suitable matrices. DNA arrays differ from other types of &lt;i&gt;microarray&lt;/i&gt; only in that they either measure DNA or use DNA as part of their detection system. Qualitative or quantitative measurements with DNA microarrays utilize the selective nature of DNA-DNA or DNA-RNA &lt;a href="http://en.wikipedia.org/wiki/DNA_hybridization" title="DNA hybridization"&gt;hybridization&lt;/a&gt; under high-stringency conditions and &lt;a href="http://en.wikipedia.org/wiki/Fluorophore" title="Fluorophore"&gt;fluorophore&lt;/a&gt;-based detection. 
DNA arrays are commonly used for &lt;a href="http://en.wikipedia.org/wiki/Expression_profiling" title="Expression profiling"&gt;expression profiling&lt;/a&gt;, i.e., monitoring &lt;a href="http://en.wikipedia.org/wiki/Gene_expression" title="Gene expression"&gt;expression&lt;/a&gt; levels of thousands of genes simultaneously, or for comparative genomic hybridization.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;&lt;span class="mw-headline"&gt;Introduction&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Arrays of DNA can either be spatially arranged, as in the commonly known &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Gene" title="Gene"&gt;gene&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Genome" title="Genome"&gt;genome&lt;/a&gt; chip&lt;/i&gt;, &lt;i&gt;DNA chip&lt;/i&gt;, or &lt;i&gt;gene array&lt;/i&gt;, or can be specific DNA sequences tagged or labelled such that they can be independently identified in solution. The traditional solid-phase array is a collection of microscopic &lt;a href="http://en.wikipedia.org/wiki/DNA" title="DNA"&gt;DNA&lt;/a&gt; spots attached to a solid surface, such as a &lt;a href="http://en.wikipedia.org/wiki/Glass" title="Glass"&gt;glass&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Plastic" title="Plastic"&gt;plastic&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Silicon" title="Silicon"&gt;silicon&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Biochip" title="Biochip"&gt;chip&lt;/a&gt;. The affixed DNA segments are known as &lt;i&gt;probes&lt;/i&gt; (although some sources use different nomenclature such as &lt;i&gt;reporters&lt;/i&gt;), thousands of which can be placed in known locations on a single DNA microarray. 
Microarray technology evolved from &lt;a href="http://en.wikipedia.org/wiki/Southern_blotting" title="Southern blotting"&gt;Southern blotting&lt;/a&gt;, whereby fragmented DNA is attached to a &lt;a href="http://en.wikipedia.org/wiki/Substrate_%28biochemistry%29" title="Substrate (biochemistry)"&gt;substrate&lt;/a&gt; and then probed with a known gene or fragment.&lt;/p&gt; &lt;p&gt;Applications of these arrays include:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;a href="http://en.wikipedia.org/wiki/MRNA" title="MRNA"&gt;mRNA&lt;/a&gt; or gene &lt;a href="http://en.wikipedia.org/wiki/Expression_profiling" title="Expression profiling"&gt;expression profiling&lt;/a&gt; - Monitoring &lt;a href="http://en.wikipedia.org/wiki/Gene_expression" title="Gene expression"&gt;expression&lt;/a&gt; levels for thousands of genes simultaneously is relevant to many areas of &lt;a href="http://en.wikipedia.org/wiki/Biology" title="Biology"&gt;biology&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Medicine" title="Medicine"&gt;medicine&lt;/a&gt;, such as studying treatments, &lt;a href="http://en.wikipedia.org/wiki/Disease" title="Disease"&gt;disease&lt;/a&gt;, and developmental stages. 
For example, microarrays can be used to identify disease genes by comparing gene expression in diseased and normal cells.&lt;/li&gt;&lt;li&gt;&lt;a href="http://en.wikipedia.org/wiki/Comparative_genomic_hybridization" title="Comparative genomic hybridization"&gt;comparative genomic hybridization&lt;/a&gt; (Array CGH) - Assessing large genomic rearrangements.&lt;/li&gt;&lt;li&gt;&lt;a href="http://en.wikipedia.org/wiki/SNP_array" title="SNP array"&gt;SNP detection arrays&lt;/a&gt; - Looking for &lt;a href="http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism" title="Single nucleotide polymorphism"&gt;single nucleotide polymorphisms&lt;/a&gt; in the genomes of populations.&lt;/li&gt;&lt;li&gt;&lt;a href="http://en.wikipedia.org/wiki/Chromatin_immunoprecipitation" title="Chromatin immunoprecipitation"&gt;Chromatin immunoprecipitation&lt;/a&gt; (ChIP) studies - Determining protein binding site occupancy throughout the genome, employing &lt;a href="http://en.wikipedia.org/wiki/ChIP-on-chip" title="ChIP-on-chip"&gt;ChIP-on-chip&lt;/a&gt; technology.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;&lt;a name="Fabrication" id="Fabrication"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Fabrication&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Microarrays can be fabricated using a variety of technologies, including printing with fine-pointed pins onto glass slides, &lt;a href="http://en.wikipedia.org/wiki/Photolithography" title="Photolithography"&gt;photolithography&lt;/a&gt; using pre-made masks, photolithography using dynamic micromirror devices, ink-jet printing&lt;span style="text-decoration: underline;"&gt;,&lt;/span&gt; or &lt;a href="http://en.wikipedia.org/wiki/Electrochemistry" title="Electrochemistry"&gt;electrochemistry&lt;/a&gt; on microelectrode arrays.&lt;/p&gt;&lt;div class="medialist videolist"&gt;&lt;ul&gt;&lt;li&gt;&lt;big&gt;&lt;span class="plainlinks"&gt;&lt;a 
href="http://tools.wikimedia.de/%7Egmaxwell/jorbis/JOrbisPlayer.php?path=Microarray+printing.ogg" class="external text" title="http://tools.wikimedia.de/~gmaxwell/jorbis/JOrbisPlayer.php?path=Microarray+printing.ogg" rel="nofollow"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/big&gt; &lt;ul&gt;&lt;li&gt;A DNA microarray being printed by a &lt;a href="http://en.wikipedia.org/wiki/Robot" title="Robot"&gt;robot&lt;/a&gt; at the &lt;a href="http://en.wikipedia.org/wiki/University_of_Delaware" title="University of Delaware"&gt;University of Delaware&lt;/a&gt;. &lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt; &lt;/li&gt;&lt;/ul&gt; &lt;/div&gt; &lt;p&gt;DNA microarrays can be used to detect RNAs that may or may not be translated into active proteins. Scientists refer to this kind of analysis as &lt;a href="http://en.wikipedia.org/wiki/Gene_expression" title="Gene expression"&gt;"expression analysis"&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Expression_profiling" title="Expression profiling"&gt;expression profiling&lt;/a&gt;. Since there can be tens of thousands of distinct probes on an array, each microarray experiment can accomplish the equivalent number of genetic tests in parallel. Arrays have therefore dramatically accelerated many types of investigations.&lt;/p&gt; &lt;p&gt;&lt;br /&gt;&lt;/p&gt; &lt;p&gt;&lt;a name="Spotted_microarrays" id="Spotted_microarrays"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Spotted microarrays&lt;/span&gt;&lt;/h3&gt; &lt;div class="thumb tleft"&gt; &lt;div class="thumbinner" style="width: 182px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Microarray-schema.gif" class="internal" title="Diagram of typical dual-colour microarray experiment."&gt;&lt;img alt="Diagram of typical dual-colour microarray experiment." 
longdesc="/wiki/Image:Microarray-schema.gif" class="thumbimage" src="http://upload.wikimedia.org/wikipedia/en/thumb/3/32/Microarray-schema.gif/180px-Microarray-schema.gif" height="263" width="180" /&gt;&lt;/a&gt; &lt;div class="thumbcaption"&gt; &lt;div class="magnify" style="float: right;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Microarray-schema.gif" class="internal" title="Enlarge"&gt;&lt;img src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" alt="" height="11" width="15" /&gt;&lt;/a&gt;&lt;/div&gt; Diagram of typical dual-colour microarray experiment.&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;p&gt;In &lt;b&gt;spotted microarrays&lt;/b&gt; (or &lt;b&gt;two-channel or two-colour microarrays&lt;/b&gt;), the probes are &lt;a href="http://en.wikipedia.org/wiki/Oligonucleotide" title="Oligonucleotide"&gt;oligonucleotides&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/CDNA" title="CDNA"&gt;cDNA&lt;/a&gt; or small fragments of &lt;a href="http://en.wikipedia.org/wiki/PCR" title="PCR"&gt;PCR&lt;/a&gt; products that correspond to &lt;a href="http://en.wikipedia.org/wiki/MRNA" title="MRNA"&gt;mRNAs&lt;/a&gt; and are spotted onto the microarray surface. This type of array is typically &lt;a href="http://en.wikipedia.org/wiki/DNA_hybridization" title="DNA hybridization"&gt;hybridized&lt;/a&gt; with cDNA from two samples to be compared (e.g. diseased tissue versus healthy tissue) that are labeled with two different &lt;a href="http://en.wikipedia.org/wiki/Fluorophore" title="Fluorophore"&gt;fluorophores&lt;/a&gt; (e.g. &lt;a href="http://en.wikipedia.org/wiki/Rhodamine" title="Rhodamine"&gt;Rhodamine&lt;/a&gt; (Cyanine 5, red) and &lt;a href="http://en.wikipedia.org/wiki/Fluorescein" title="Fluorescein"&gt;Fluorescein&lt;/a&gt; (Cyanine 3, green)). 
The two samples are mixed and hybridized to a single microarray that is then scanned in a microarray scanner to visualize &lt;a href="http://en.wikipedia.org/wiki/Fluorescence" title="Fluorescence"&gt;fluorescence&lt;/a&gt; of the two fluorophores. Relative intensities of each fluorophore are then used to identify up-regulated and down-regulated genes in ratio-based analysis. Absolute levels of gene expression cannot be determined in the two-colour array, but relative differences in expression among different spots (i.e. genes) can be estimated with some oligonucleotide arrays. Examples of providers for such microarrays include &lt;a href="http://en.wikipedia.org/wiki/Agilent" title="Agilent"&gt;Agilent&lt;/a&gt; with their Dual-Mode platform, &lt;a href="http://en.wikipedia.org/wiki/Eppendorf_%28company%29" title="Eppendorf (company)"&gt;Eppendorf&lt;/a&gt; with their DualChip platform, and ArrayIt.&lt;/p&gt; &lt;p&gt;&lt;a name="Oligonucleotide_microarrays" id="Oligonucleotide_microarrays"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Oligonucleotide microarrays&lt;/span&gt;&lt;/h3&gt; &lt;div class="thumb tright"&gt; &lt;div class="thumbinner" style="width: 152px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Affymetrix-microarray.jpg" class="internal" title="Two Affymetrix chips"&gt;&lt;img alt="Two Affymetrix chips" longdesc="/wiki/Image:Affymetrix-microarray.jpg" class="thumbimage" src="http://upload.wikimedia.org/wikipedia/commons/thumb/2/22/Affymetrix-microarray.jpg/150px-Affymetrix-microarray.jpg" height="136" width="150" /&gt;&lt;/a&gt; &lt;div class="thumbcaption"&gt; &lt;div class="magnify" style="float: right;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Affymetrix-microarray.jpg" class="internal" title="Enlarge"&gt;&lt;img src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" alt="" height="11" width="15" /&gt;&lt;/a&gt;&lt;/div&gt; Two Affymetrix 
chips&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;p&gt;In &lt;b&gt;oligonucleotide microarrays&lt;/b&gt; (or &lt;b&gt;single-channel microarrays&lt;/b&gt;), the probes are designed to match parts of the sequence of known or predicted mRNAs. There are commercially available designs that cover complete genomes from companies such as &lt;a href="http://en.wikipedia.org/wiki/GE_Healthcare" title="GE Healthcare"&gt;GE Healthcare&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Affymetrix" title="Affymetrix"&gt;Affymetrix&lt;/a&gt;, &lt;a href="http://www.ocimumbio.com/" class="external text" title="http://www.ocimumbio.com/" rel="nofollow"&gt;Ocimum Biosolutions&lt;/a&gt;, or &lt;a href="http://en.wikipedia.org/wiki/Agilent" title="Agilent"&gt;Agilent&lt;/a&gt;. These microarrays estimate absolute levels of gene expression, so comparing two conditions requires two separate microarrays.&lt;/p&gt; &lt;p&gt;Oligonucleotide arrays can be produced either by piezoelectric deposition of full-length oligonucleotides or by in-situ synthesis.&lt;/p&gt; &lt;p&gt;Long oligonucleotide arrays are composed of 50-mers or 60-mers and are produced by ink-jet printing on a silica substrate. Short oligonucleotide arrays are composed of 25-mers or 30-mers and are produced by photolithographic synthesis (Affymetrix) on a silica substrate or by piezoelectric deposition (GE Healthcare) on an acrylamide matrix. More recently, Maskless Array Synthesis from NimbleGen Systems has combined flexibility with large numbers of probes. Arrays can contain up to 390,000 spots from a custom array design. New array formats are being developed to study specific pathways or disease states for a systems biology approach.&lt;/p&gt; &lt;p&gt;Oligonucleotide microarrays often contain control probes designed to hybridize with &lt;a href="http://en.wikipedia.org/wiki/RNA_spike-in" title="RNA spike-in"&gt;RNA spike-ins&lt;/a&gt;. 
The degree of hybridization between the spike-ins and the control probes is used to normalize the hybridization measurements for the target probes.&lt;/p&gt; &lt;p&gt;&lt;a name="Genotyping_microarrays" id="Genotyping_microarrays"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Genotyping microarrays&lt;/span&gt;&lt;/h2&gt; &lt;dl&gt;&lt;dd&gt; &lt;div class="noprint"&gt;&lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/SNP_array" title="SNP array"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/i&gt;&lt;/div&gt; &lt;/dd&gt;&lt;/dl&gt; &lt;p&gt;DNA microarrays can also be used to &lt;i&gt;read&lt;/i&gt; the sequence of a genome at particular positions.&lt;/p&gt; &lt;p&gt;&lt;b&gt;SNP microarrays&lt;/b&gt; are a particular type of DNA microarray used to identify genetic variation in individuals and across populations. Short oligonucleotide arrays can be used to identify the &lt;a href="http://en.wikipedia.org/wiki/Single_nucleotide_polymorphisms" title="Single nucleotide polymorphisms"&gt;single nucleotide polymorphisms&lt;/a&gt; (SNPs) that are thought to be responsible for genetic variation and the source of susceptibility to genetically caused diseases. 
Generally termed &lt;a href="http://en.wikipedia.org/wiki/Genotyping" title="Genotyping"&gt;genotyping&lt;/a&gt; applications, DNA microarrays may be used in this fashion for forensic applications, rapidly discovering or measuring genetic predisposition to disease, or identifying DNA-based drug candidates.&lt;/p&gt; &lt;p&gt;These SNP microarrays are also being used to profile &lt;a href="http://en.wikipedia.org/wiki/Somatic" title="Somatic"&gt;somatic&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Mutations" title="Mutations"&gt;mutations&lt;/a&gt; in &lt;a href="http://en.wikipedia.org/wiki/Cancer" title="Cancer"&gt;cancer&lt;/a&gt;, specifically &lt;a href="http://en.wikipedia.org/wiki/Loss_of_heterozygosity" title="Loss of heterozygosity"&gt;loss of heterozygosity&lt;/a&gt; events and amplifications and deletions of regions of DNA. Amplifications and deletions can also be detected using &lt;a href="http://en.wikipedia.org/wiki/Comparative_genomic_hybridization" title="Comparative genomic hybridization"&gt;comparative genomic hybridization&lt;/a&gt; in conjunction with microarrays (array CGH, or aCGH), although the detection of novel copy number polymorphisms (CNPs) may be limited by probe coverage.&lt;/p&gt; &lt;p&gt;Resequencing arrays have also been developed to sequence portions of the &lt;a href="http://en.wikipedia.org/wiki/Genome" title="Genome"&gt;genome&lt;/a&gt; in individuals. These arrays may be used to evaluate &lt;a href="http://en.wikipedia.org/wiki/Germline" title="Germline"&gt;germline&lt;/a&gt; mutations in individuals, or somatic mutations in cancers.&lt;/p&gt; &lt;p&gt;Genome tiling arrays include overlapping oligonucleotides designed to blanket an entire genomic region of interest. 
Many companies have successfully designed tiling arrays that cover whole human chromosomes.&lt;/p&gt; &lt;p&gt;&lt;a name="Microarrays_and_bioinformatics" id="Microarrays_and_bioinformatics"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Microarrays and bioinformatics&lt;/span&gt;&lt;/h2&gt; &lt;div class="thumb tright"&gt; &lt;div class="thumbinner" style="width: 162px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Heatmap.png" class="internal" title="Gene expression values from microarray experiments can be represented as heat maps to visualize the result of data analysis."&gt;&lt;img alt="Gene expression values from microarray experiments can be represented as heat maps to visualize the result of data analysis." longdesc="/wiki/Image:Heatmap.png" class="thumbimage" src="http://upload.wikimedia.org/wikipedia/en/thumb/4/48/Heatmap.png/160px-Heatmap.png" height="160" width="160" /&gt;&lt;/a&gt; &lt;div class="thumbcaption"&gt; &lt;div class="magnify" style="float: right;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Heatmap.png" class="internal" title="Enlarge"&gt;&lt;img src="http://en.wikipedia.org/skins-1.5/common/images/magnify-clip.png" alt="" height="11" width="15" /&gt;&lt;/a&gt;&lt;/div&gt; Gene expression values from microarray experiments can be represented as &lt;a href="http://en.wikipedia.org/wiki/Heat_map" title="Heat map"&gt;heat maps&lt;/a&gt; to visualize the result of data analysis.&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;p&gt;&lt;a name="Experimental_Design" id="Experimental_Design"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Experimental Design&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Due to the biological complexity of gene expression, the considerations of experimental design that are discussed in the &lt;a href="http://en.wikipedia.org/wiki/Expression_profiling" title="Expression profiling"&gt;expression 
profiling&lt;/a&gt; article are of critical importance if statistically and biologically valid conclusions are to be drawn from the data.&lt;/p&gt; &lt;ul&gt;&lt;li&gt;There are three main elements to consider when designing a microarray experiment. First, replication of the biological samples is essential for drawing conclusions from the experiment. Second, technical replicates (two RNA samples obtained from each experimental unit) help to ensure precision and allow for testing differences within treatment groups. The technical replicates may be two independent RNA extractions or two aliquots of the same extraction. Third, each cDNA clone or oligonucleotide should be spotted at least in duplicate on the microarray slide, to provide a measure of technical precision in each hybridization. It is critical that sample preparation and handling be reported, both to help identify the independent units in the experiment and to avoid inflated estimates of significance.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;&lt;a name="Standardization" id="Standardization"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Standardization&lt;/span&gt;&lt;/h3&gt;  &lt;p&gt;The lack of standardization in arrays presents an &lt;a href="http://en.wikipedia.org/wiki/Interoperability" title="Interoperability"&gt;interoperability&lt;/a&gt; problem in &lt;a href="http://en.wikipedia.org/wiki/Bioinformatics" title="Bioinformatics"&gt;bioinformatics&lt;/a&gt;, which hinders the exchange of array data. 
Various grass-roots &lt;a href="http://en.wikipedia.org/wiki/Open_source" title="Open source"&gt;open-source&lt;/a&gt; projects are attempting to facilitate the exchange and analysis of data produced with non-proprietary chips.&lt;/p&gt; &lt;ul&gt;&lt;li&gt;The "Minimum Information About a Microarray Experiment" (&lt;a href="http://en.wikipedia.org/wiki/MIAME" title="MIAME"&gt;MIAME&lt;/a&gt;) checklist helps define the level of detail that should exist and is being adopted by many &lt;a href="http://en.wikipedia.org/wiki/Scientific_journal" title="Scientific journal"&gt;journals&lt;/a&gt; as a requirement for the submission of papers incorporating microarray results. MIAME describes the minimum information required of a compliant experiment, but not the format in which it is recorded. Thus, as of 2007, whilst many formats can support the MIAME requirements, there is no format which permits verification of complete semantic compliance.&lt;/li&gt;&lt;/ul&gt; &lt;ul&gt;&lt;li&gt;The "MicroArray Quality Control (MAQC) Project" is being conducted by the FDA to develop standards and quality control metrics which will eventually allow the use of microarray data in drug discovery, clinical practice and regulatory decision-making. 
&lt;/li&gt;&lt;/ul&gt; &lt;ul&gt;&lt;li&gt;The &lt;a href="http://en.wikipedia.org/wiki/MicroArray_and_Gene_Expression" title="MicroArray and Gene Expression"&gt;MicroArray and Gene Expression&lt;/a&gt; (MAGE) group is working on the standardization of the representation of gene expression data and relevant annotations.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;&lt;a name="Statistical_analysis" id="Statistical_analysis"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Statistical analysis&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The analysis of DNA microarrays poses a large number of &lt;a href="http://en.wikipedia.org/wiki/Statistics" title="Statistics"&gt;statistical&lt;/a&gt; problems, including the &lt;a href="http://en.wikipedia.org/wiki/Normalization_%28statistics%29" title="Normalization (statistics)"&gt;normalization&lt;/a&gt; of the data. There are dozens of proposed normalization methods in the published literature; as in many other cases where authorities disagree, a sound conservative approach is to try a number of popular normalization methods and check how sensitive the main conclusions are to the method chosen.&lt;/p&gt; &lt;p&gt;From a hypothesis-testing perspective, the large number of genes present on a single array means that the experimenter must take into account a &lt;a href="http://en.wikipedia.org/wiki/Multiple_testing" title="Multiple testing"&gt;multiple testing&lt;/a&gt; problem: even if the statistical &lt;a href="http://en.wikipedia.org/wiki/P-value" title="P-value"&gt;P-value&lt;/a&gt; assigned to a given gene indicates that it is extremely unlikely that differential expression of this gene was due to random rather than treatment effects, the very high number of genes on an array makes it likely that the result for some genes represents &lt;a href="http://en.wikipedia.org/wiki/Type_I_and_type_II_errors" title="Type I and type II errors"&gt;false 
positives&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Type_I_and_type_II_errors" title="Type I and type II errors"&gt;false negatives&lt;/a&gt;. Statistical methods tailored to microarray analyses have recently become available that assess statistical power based on the variation present in the data and the number of experimental replicates, and can help minimize type I and type II errors in the analyses.&lt;/p&gt; &lt;p&gt;A basic difference between microarray data analysis and much of traditional biomedical research is the dimensionality of the data. A large clinical study might collect 100 data items per patient for thousands of patients. A medium-size microarray study will obtain many thousands of numbers per sample for perhaps a hundred samples. Many analysis techniques treat each sample as a single point in a space with thousands of dimensions, then attempt by various techniques to reduce the dimensionality of the data to something humans can visualize.&lt;/p&gt; &lt;p&gt;&lt;a name="Relation_between_probe_and_gene" id="Relation_between_probe_and_gene"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Relation between probe and gene&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The relation between a probe and the mRNA that it is expected to detect is problematic. On the one hand, some mRNAs may cross-hybridize to probes in the array that are supposed to detect another mRNA. 
On the other hand, probes that are designed to detect the mRNA of a particular gene may be relying &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-30051431989342892?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/30051431989342892/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=30051431989342892' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/30051431989342892'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/30051431989342892'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/dna-microarray.html' title='DNA microarray'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-3039365379340166664</id><published>2007-07-21T23:56:00.000-07:00</published><updated>2007-07-22T00:04:47.116-07:00</updated><title type='text'>Computational phylogenetics-construction of tree</title><content type='html'>&lt;p&gt;&lt;b&gt;Computational phylogenetics&lt;/b&gt; is the application of computational &lt;a href="http://en.wikipedia.org/wiki/Algorithm" title="Algorithm"&gt;algorithms&lt;/a&gt;, methods and programs to &lt;a href="http://en.wikipedia.org/wiki/Phylogenetic" title="Phylogenetic"&gt;phylogenetic&lt;/a&gt; analyses. 
The goal is to assemble a &lt;a href="http://en.wikipedia.org/wiki/Phylogenetic_tree" title="Phylogenetic tree"&gt;phylogenetic tree&lt;/a&gt; representing a hypothesis about the evolutionary ancestry of a set of &lt;a href="http://en.wikipedia.org/wiki/Gene" title="Gene"&gt;genes&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Species" title="Species"&gt;species&lt;/a&gt;, or other &lt;a href="http://en.wikipedia.org/wiki/Taxa" title="Taxa"&gt;taxa&lt;/a&gt;. For example, these techniques have been used to explore the family tree of &lt;a href="http://en.wikipedia.org/wiki/Hominid" title="Hominid"&gt;hominid&lt;/a&gt; species and the relationships between specific genes shared by many types of organisms. Traditional phylogenetics relies on &lt;a href="http://en.wikipedia.org/wiki/Morphology_%28biology%29" title="Morphology (biology)"&gt;morphological&lt;/a&gt; data obtained by measuring and quantifying the &lt;a href="http://en.wikipedia.org/wiki/Phenotype" title="Phenotype"&gt;phenotypic&lt;/a&gt; properties of representative organisms, while the more recent field of molecular phylogenetics uses the &lt;a href="http://en.wikipedia.org/wiki/Nucleotide" title="Nucleotide"&gt;nucleotide&lt;/a&gt; sequences of genes or the &lt;a href="http://en.wikipedia.org/wiki/Amino_acid" title="Amino acid"&gt;amino acid&lt;/a&gt; sequences of &lt;a href="http://en.wikipedia.org/wiki/Protein" title="Protein"&gt;proteins&lt;/a&gt; as the basis for classification. 
Many forms of molecular phylogenetics are closely related to and make extensive use of &lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment" title="Sequence alignment"&gt;sequence alignment&lt;/a&gt; in constructing and refining phylogenetic trees, which are used to classify the evolutionary relationships between homologous &lt;a href="http://en.wikipedia.org/wiki/Gene" title="Gene"&gt;genes&lt;/a&gt; represented in the &lt;a href="http://en.wikipedia.org/wiki/Genome" title="Genome"&gt;genomes&lt;/a&gt; of divergent species. The phylogenetic trees constructed by computational methods are unlikely to perfectly reproduce the &lt;a href="http://en.wikipedia.org/wiki/Evolutionary_tree" title="Evolutionary tree"&gt;evolutionary tree&lt;/a&gt; that represents the historical relationships between the species being analyzed. The historical species tree may also differ from the historical tree of an individual homologous gene shared by those species.&lt;/p&gt; &lt;p&gt;Producing a phylogenetic tree requires a measure of &lt;a href="http://en.wikipedia.org/wiki/Homology_%28biology%29" title="Homology (biology)"&gt;homology&lt;/a&gt; among the characteristics shared by the taxa being compared. In morphological studies, this requires explicit decisions about which physical characteristics to measure and how to use them to encode distinct states corresponding to the input taxa. In molecular studies, a primary problem is in producing a &lt;a href="http://en.wikipedia.org/wiki/Multiple_sequence_alignment" title="Multiple sequence alignment"&gt;multiple sequence alignment&lt;/a&gt; (MSA) between the genes or amino acid sequences of interest. Progressive sequence alignment methods produce a phylogenetic tree by necessity because they incorporate new sequences into the calculated alignment in order of &lt;a href="http://en.wikipedia.org/wiki/Genetic_distance" title="Genetic distance"&gt;genetic distance&lt;/a&gt;. 
Although a phylogenetic tree can always be constructed from an MSA, phylogenetics methods such as &lt;a href="http://en.wikipedia.org/wiki/Maximum_parsimony" title="Maximum parsimony"&gt;maximum parsimony&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Maximum_likelihood" title="Maximum likelihood"&gt;maximum likelihood&lt;/a&gt; do not require the production of an initial or concurrent MSA.&lt;/p&gt;&lt;h2&gt;&lt;span class="mw-headline"&gt;Types of phylogenetic trees&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Phylogenetic_tree" title="Phylogenetic tree"&gt;Phylogenetic trees&lt;/a&gt; generated by computational phylogenetics can be either &lt;i&gt;rooted&lt;/i&gt; or &lt;i&gt;unrooted&lt;/i&gt; depending on the input data and the algorithm used. A rooted tree is a &lt;a href="http://en.wikipedia.org/wiki/Directed_graph" title="Directed graph"&gt;directed graph&lt;/a&gt; that explicitly identifies a &lt;a href="http://en.wikipedia.org/wiki/Most_recent_common_ancestor" title="Most recent common ancestor"&gt;most recent common ancestor&lt;/a&gt; (MRCA), usually an imputed sequence that is not represented in the input. Genetic distance measures can be used to plot a tree with the input sequences as &lt;a href="http://en.wikipedia.org/wiki/Leaf_node" title="Leaf node"&gt;leaf nodes&lt;/a&gt; and their distances from the root proportional to their &lt;a href="http://en.wikipedia.org/wiki/Genetic_distance" title="Genetic distance"&gt;genetic distance&lt;/a&gt; from the hypothesized MRCA. Identification of a root usually requires the inclusion in the input data of at least one "outgroup" known to be only distantly related to the sequences of interest.&lt;/p&gt; &lt;p&gt;By contrast, unrooted trees plot the distances and relationships between input sequences without making assumptions regarding their descent. 
An unrooted tree can always be produced from a rooted tree, but a root cannot usually be placed on an unrooted tree without additional data on divergence rates, such as the assumption of the &lt;a href="http://en.wikipedia.org/wiki/Molecular_clock" title="Molecular clock"&gt;molecular clock&lt;/a&gt; hypothesis.&lt;/p&gt; &lt;p&gt;The set of all possible phylogenetic trees for a given group of input sequences can be conceptualized as a discretely defined multidimensional "tree space" through which search paths can be traced by &lt;a href="http://en.wikipedia.org/wiki/Optimization_%28mathematics%29" title="Optimization (mathematics)"&gt;optimization&lt;/a&gt; algorithms. Although counting the total number of trees for a nontrivial number of input sequences can be complicated by variations in the definition of a tree topology, it is always true that there are more rooted than unrooted trees for a given number of inputs and choice of parameters.&lt;/p&gt; &lt;p&gt;&lt;a name="Coding_characters_and_defining_homology" id="Coding_characters_and_defining_homology"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Coding characters and defining homology&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;&lt;a name="Morphological_analysis" id="Morphological_analysis"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Morphological analysis&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The basic problem in morphological phylogenetics is the assembly of a &lt;a href="http://en.wikipedia.org/wiki/Matrix_%28mathematics%29" title="Matrix (mathematics)"&gt;matrix&lt;/a&gt; representing a mapping from each of the taxa being compared to representative measurements for each of the phenotypic characteristics being used as a classifier. 
The types of phenotypic data used to construct this matrix depend on the taxa being compared; for individual species, they may involve measurements of average body size, lengths or sizes of particular bones or other physical features, or even behavioral manifestations. Of course, since not every possible phenotypic characteristic could be measured and encoded for analysis, the selection of which features to measure is a major inherent obstacle to the method. The decision of which traits to use as a basis for the matrix necessarily represents a hypothesis about which traits of a species or higher taxon are evolutionarily relevant. Morphological studies can be confounded by examples of &lt;a href="http://en.wikipedia.org/wiki/Convergent_evolution" title="Convergent evolution"&gt;convergent evolution&lt;/a&gt; of phenotypes. A major challenge in constructing useful classes is the high likelihood of inter-taxon overlap in the distribution of the phenotype's variation. The inclusion of extinct taxa in morphological analysis is often difficult due to absence of or incomplete &lt;a href="http://en.wikipedia.org/wiki/Fossil" title="Fossil"&gt;fossil&lt;/a&gt; records, but has been shown to have a significant effect on the trees produced; in one study only the inclusion of extinct species of &lt;a href="http://en.wikipedia.org/wiki/Ape" title="Ape"&gt;apes&lt;/a&gt; produced a morphologically derived tree that was consistent with that produced from molecular data.&lt;/p&gt; &lt;p&gt;Some phenotypic classifications, particularly those used when analyzing very diverse groups of taxa, are discrete and unambiguous; classifying organisms as possessing or lacking a tail, for example, is straightforward in the majority of cases, as is counting features such as eyes or vertebrae. However, the most appropriate representation of continuously varying phenotypic measurements is a controversial problem without a general solution. 
A common method is simply to sort the measurements of interest into two or more classes, rendering continuous observed variation as discretely classifiable (e.g., all examples with humerus bones longer than a given cutoff are scored as members of one state, and all members whose humerus bones are shorter than the cutoff are scored as members of a second state). This results in an easily manipulated &lt;a href="http://en.wikipedia.org/wiki/Data_set" title="Data set"&gt;data set&lt;/a&gt; but has been criticized for poor reporting of the basis for the class definitions and for sacrificing information compared to methods that use a continuous weighted distribution of measurements.&lt;/p&gt; &lt;p&gt;Because morphological data is extremely labor-intensive to collect, whether from literature sources or from field observations, reuse of previously compiled data matrices is not uncommon, although this may propagate flaws in the original matrix into multiple derivative analyses.&lt;/p&gt; &lt;p&gt;&lt;a name="Molecular_analysis" id="Molecular_analysis"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Molecular analysis&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The problem of character coding is very different in molecular analyses, as the characters in biological sequence data are immediate and discretely defined - distinct &lt;a href="http://en.wikipedia.org/wiki/Nucleotide" title="Nucleotide"&gt;nucleotides&lt;/a&gt; in &lt;a href="http://en.wikipedia.org/wiki/DNA" title="DNA"&gt;DNA&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/RNA" title="RNA"&gt;RNA&lt;/a&gt; sequences and distinct &lt;a href="http://en.wikipedia.org/wiki/Amino_acid" title="Amino acid"&gt;amino acids&lt;/a&gt; in &lt;a href="http://en.wikipedia.org/wiki/Protein" title="Protein"&gt;protein&lt;/a&gt; sequences. 
However, defining homology can be challenging due to the inherent difficulties of &lt;a href="http://en.wikipedia.org/wiki/Multiple_sequence_alignment" title="Multiple sequence alignment"&gt;multiple sequence alignment&lt;/a&gt;. For a given gapped MSA, several rooted phylogenetic trees can be constructed that vary in their interpretations of which changes are "&lt;a href="http://en.wikipedia.org/wiki/Mutation" title="Mutation"&gt;mutations&lt;/a&gt;" versus ancestral characters, and which events are &lt;a href="http://en.wikipedia.org/wiki/Insertion_%28genetics%29" title="Insertion (genetics)"&gt;insertion mutations&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Deletion_%28genetics%29" title="Deletion (genetics)"&gt;deletion mutations&lt;/a&gt;. For example, given only a pairwise alignment with a gap region, it is impossible to determine whether one sequence bears an insertion mutation or the other carries a deletion. The problem is magnified in MSAs with unaligned and nonoverlapping gaps. In practice, sizable regions of a calculated alignment may be discounted in phylogenetic tree construction to avoid integrating noisy data into the tree calculation.&lt;/p&gt; &lt;p&gt;&lt;a name="Distance-matrix_methods" id="Distance-matrix_methods"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Distance-matrix methods&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Distance-matrix methods of phylogenetic analysis explicitly rely on a measure of "genetic distance" between the sequences being classified, and therefore they require an MSA as an input. 
Distance is often defined as the fraction of mismatches at aligned positions, with gaps either ignored or counted as mismatches. Distance methods attempt to construct an all-to-all matrix from the sequence query set describing the distance between each sequence pair. From this matrix, a phylogenetic tree is constructed that places closely related sequences under the same &lt;a href="http://en.wikipedia.org/wiki/Interior_node" title="Interior node"&gt;interior node&lt;/a&gt; and whose branch lengths closely reproduce the observed distances between sequences. Distance-matrix methods may produce either rooted or unrooted trees, depending on the algorithm used to calculate them. They are frequently used as the basis for progressive and iterative types of &lt;a href="http://en.wikipedia.org/wiki/Multiple_sequence_alignment" title="Multiple sequence alignment"&gt;multiple sequence alignment&lt;/a&gt;. The main disadvantage of distance-matrix methods is their inability to efficiently use information about local high-variation regions that appear across multiple subtrees.&lt;/p&gt; &lt;p&gt;&lt;a name="Neighbor-joining" id="Neighbor-joining"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Neighbor-joining&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Neighbor-joining methods apply general &lt;a href="http://en.wikipedia.org/wiki/Data_clustering" title="Data clustering"&gt;data clustering&lt;/a&gt; techniques to sequence analysis using genetic distance as a clustering metric. 
The simple &lt;a href="http://en.wikipedia.org/wiki/Neighbor-joining" title="Neighbor-joining"&gt;neighbor-joining&lt;/a&gt; method produces unrooted trees and does not assume a constant rate of evolution (i.e., a &lt;a href="http://en.wikipedia.org/wiki/Molecular_clock" title="Molecular clock"&gt;molecular clock&lt;/a&gt;) across lineages. Its relative, &lt;a href="http://en.wikipedia.org/wiki/UPGMA" title="UPGMA"&gt;UPGMA&lt;/a&gt; (Unweighted Pair Group Method with Arithmetic mean), produces rooted trees and requires a constant-rate assumption - that is, it assumes an &lt;a href="http://en.wikipedia.org/wiki/Ultrametric" title="Ultrametric"&gt;ultrametric&lt;/a&gt; tree in which the distances from the root to every branch tip are equal.&lt;/p&gt; &lt;p&gt;&lt;a name="Fitch-Margoliash_method" id="Fitch-Margoliash_method"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Fitch-Margoliash method&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The Fitch-Margoliash method uses a weighted &lt;a href="http://en.wikipedia.org/wiki/Least_squares" title="Least squares"&gt;least squares&lt;/a&gt; method for clustering based on genetic distance. Closely related sequences are given more weight in the tree construction process to correct for the increased inaccuracy in measuring distances between distantly related sequences. The distances used as input to the algorithm must be normalized to prevent large artifacts in computing relationships between closely related and distantly related groups. 
The distances calculated by this method must be &lt;a href="http://en.wikipedia.org/wiki/Linear" title="Linear"&gt;linear&lt;/a&gt;; the linearity criterion for distances requires that the &lt;a href="http://en.wikipedia.org/wiki/Expected_value" title="Expected value"&gt;expected values&lt;/a&gt; of the branch lengths for two individual branches must equal the expected value of the sum of the two branch distances - a property that applies to biological sequences only when they have been corrected for the possibility of &lt;a href="http://en.wikipedia.org/wiki/Back_mutation" title="Back mutation"&gt;back mutations&lt;/a&gt; at individual sites. This correction is done through the use of a &lt;a href="http://en.wikipedia.org/wiki/Substitution_matrix" title="Substitution matrix"&gt;substitution matrix&lt;/a&gt; such as that derived from the &lt;a href="http://en.wikipedia.org/wiki/Jukes-Cantor_model" title="Jukes-Cantor model"&gt;Jukes-Cantor model&lt;/a&gt; of DNA evolution. The distance correction is only necessary in practice when the evolution rates differ among branches.&lt;/p&gt; &lt;p&gt;The least-squares criterion applied to these distances is more accurate but less efficient than the neighbor-joining methods. An additional improvement that corrects for correlations between distances that arise from many closely related sequences in the data set can also be applied at increased computational cost. 
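&lt;/p&gt; &lt;p&gt;The weighted least-squares criterion can be written as a short scoring function. The sketch below is an illustration only: it assumes the commonly used Fitch-Margoliash weighting of 1/D&lt;sup&gt;2&lt;/sup&gt; for an observed distance D, takes the tree-implied (path-length) distances as given rather than computing them from a tree, and uses hypothetical names and toy numbers.&lt;/p&gt;
&lt;pre&gt;
```python
def weighted_least_squares(observed, tree_distance, power=2):
    """Sum over sequence pairs of (D - d)**2 / D**power.

    observed      -- dict mapping a pair of names to the observed distance D
    tree_distance -- dict mapping the same pairs to the path length d on a
                     candidate tree
    power         -- 2 gives the assumed Fitch-Margoliash weighting,
                     0 gives ordinary (unweighted) least squares
    Observed distances must be non-zero when power is non-zero.
    """
    total = 0.0
    for pair, d_obs in observed.items():
        d_tree = tree_distance[pair]
        total += (d_obs - d_tree) ** 2 / d_obs ** power
    return total


# Toy numbers (hypothetical): one candidate tree scored against the data.
observed = {("A", "B"): 0.2, ("A", "C"): 0.5, ("B", "C"): 0.6}
candidate = {("A", "B"): 0.2, ("A", "C"): 0.55, ("B", "C"): 0.55}

print("weighted score:  ", round(weighted_least_squares(observed, candidate), 4))
print("unweighted score:", round(weighted_least_squares(observed, candidate, power=0), 4))
```
&lt;/pre&gt;
&lt;p&gt;A tree search would evaluate this score for many candidate trees and keep the one with the lowest value.&lt;/p&gt; &lt;p&gt;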
Finding the optimal least-squares tree with any correction factor is &lt;a href="http://en.wikipedia.org/wiki/NP-complete" title="NP-complete"&gt;NP-complete&lt;/a&gt;, so &lt;a href="http://en.wikipedia.org/wiki/Heuristic" title="Heuristic"&gt;heuristic&lt;/a&gt; search methods like those used in maximum-parsimony analysis are applied to the search through tree space.&lt;/p&gt; &lt;p&gt;&lt;a name="Using_outgroups" id="Using_outgroups"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Using outgroups&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Independent information about the relationship between sequences or groups can be used to help reduce the tree search space and root unrooted trees. Standard usage of distance-matrix methods involves the inclusion of at least one &lt;a href="http://en.wikipedia.org/wiki/Outgroup" title="Outgroup"&gt;outgroup&lt;/a&gt; sequence known to be only distantly related to the sequences of interest in the query set. This usage can be seen as a type of &lt;a href="http://en.wikipedia.org/wiki/Experimental_control" title="Experimental control"&gt;experimental control&lt;/a&gt;. If the outgroup has been appropriately chosen, it will have a much greater &lt;a href="http://en.wikipedia.org/wiki/Genetic_distance" title="Genetic distance"&gt;genetic distance&lt;/a&gt; and thus a longer branch length than any other sequence, and it will appear near the root of a rooted tree. Choosing an appropriate outgroup requires the selection of a sequence that is moderately related to the sequences of interest; too close a relationship defeats the purpose of the outgroup and too distant a relationship adds &lt;a href="http://en.wikipedia.org/wiki/Signal_noise" title="Signal noise"&gt;noise&lt;/a&gt; to the analysis. 
Care should also be taken to avoid situations in which the species from which the sequences were taken are distantly related, but the gene encoded by the sequences is highly &lt;a href="http://en.wikipedia.org/wiki/Conservation_%28genetics%29" title="Conservation (genetics)"&gt;conserved&lt;/a&gt; across lineages. &lt;a href="http://en.wikipedia.org/wiki/Horizontal_gene_transfer" title="Horizontal gene transfer"&gt;Horizontal gene transfer&lt;/a&gt;, especially between otherwise divergent &lt;a href="http://en.wikipedia.org/wiki/Bacteria" title="Bacteria"&gt;bacteria&lt;/a&gt;, can also confound outgroup usage.&lt;/p&gt; &lt;p&gt;&lt;a name="Maximum_parsimony" id="Maximum_parsimony"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Maximum parsimony&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Maximum_parsimony" title="Maximum parsimony"&gt;Maximum parsimony&lt;/a&gt; (MP) is a method of identifying the potential phylogenetic tree that requires the smallest total number of &lt;a href="http://en.wikipedia.org/wiki/Evolution" title="Evolution"&gt;evolutionary&lt;/a&gt; events to explain the observed sequence data. Some ways of scoring trees also include a "cost" associated with particular types of evolutionary events and attempt to locate the tree with the smallest total cost. This is a useful approach in cases where not every possible type of event is equally likely - for example, when particular &lt;a href="http://en.wikipedia.org/wiki/Nucleotide" title="Nucleotide"&gt;nucleotides&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Amino_acid" title="Amino acid"&gt;amino acids&lt;/a&gt; are known to be more mutable than others.&lt;/p&gt; &lt;p&gt;The most naive way of identifying the most parsimonious tree is simple enumeration - considering each possible tree in succession and searching for the tree with the smallest score. 
However, this is only possible for a relatively small number of sequences or species because the problem of identifying the most parsimonious tree is known to be &lt;a href="http://en.wikipedia.org/wiki/NP-hard" title="NP-hard"&gt;NP-hard&lt;/a&gt;; consequently a number of &lt;a href="http://en.wikipedia.org/wiki/Heuristic" title="Heuristic"&gt;heuristic&lt;/a&gt; search methods for &lt;a href="http://en.wikipedia.org/wiki/Optimization_%28mathematics%29" title="Optimization (mathematics)"&gt;optimization&lt;/a&gt; have been developed to locate a highly parsimonious tree, if not the optimal one in the set. Most such methods involve a &lt;a href="http://en.wikipedia.org/wiki/Steepest_descent" title="Steepest descent"&gt;steepest descent&lt;/a&gt;-style minimization mechanism operating on a &lt;a href="http://en.wikipedia.org/wiki/Tree_rearrangement" title="Tree rearrangement"&gt;tree rearrangement&lt;/a&gt; criterion.&lt;/p&gt; &lt;p&gt;&lt;a name="Branch_and_bound" id="Branch_and_bound"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Branch and bound&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The &lt;a href="http://en.wikipedia.org/wiki/Branch_and_bound" title="Branch and bound"&gt;branch and bound&lt;/a&gt; algorithm is a general method used to increase the efficiency of searches for near-optimal solutions of &lt;a href="http://en.wikipedia.org/wiki/NP-hard" title="NP-hard"&gt;NP-hard&lt;/a&gt; problems, first applied to phylogenetics in the early 1980s. Branch and bound is particularly well suited to phylogenetic tree construction because it inherently requires dividing a problem into a &lt;a href="http://en.wikipedia.org/wiki/Tree_structure" title="Tree structure"&gt;tree structure&lt;/a&gt; as it subdivides the problem space into smaller regions. 
As its name implies, it requires as input both a branching rule (in the case of phylogenetics, the addition of the next species or sequence to the tree) and a bound (a rule that excludes certain regions of the search space from consideration, thereby assuming that the optimal solution cannot occupy that region). Identifying a good bound is the most challenging aspect of the algorithm's application to phylogenetics. A simple way of defining the bound is a maximum number of assumed evolutionary changes allowed per tree. A set of criteria known as Zharkikh's rules severely limit the search space by defining characteristics shared by all candidate "most parsimonious" trees. The two most basic rules require the elimination of all but one redundant sequence (for cases where multiple observations have produced identical data) and the elimination of character sites at which two or more states do not occur in at least two species. Under ideal conditions these rules and their associated algorithm would completely define a tree.&lt;/p&gt; &lt;p&gt;&lt;a name="Sankoff-Morel-Cedergren_algorithm" id="Sankoff-Morel-Cedergren_algorithm"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Sankoff-Morel-Cedergren algorithm&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The Sankoff-Morel-Cedergren algorithm was among the first published methods to simultaneously produce an MSA and a phylogenetic tree for nucleotide sequences. The method uses a &lt;a href="http://en.wikipedia.org/wiki/Maximum_parsimony" title="Maximum parsimony"&gt;maximum parsimony&lt;/a&gt; calculation in conjunction with a scoring function that penalizes gaps and mismatches, thereby favoring the tree that introduces a minimal number of such events. The imputed sequences at the &lt;a href="http://en.wikipedia.org/wiki/Interior_node" title="Interior node"&gt;interior nodes&lt;/a&gt; of the tree are scored and summed over all the nodes in each possible tree. 
The lowest-scoring tree sum provides both an optimal tree and an optimal MSA given the scoring function. Because the method is highly computationally intensive, an approximate method can be used in which initial guesses for the interior alignments are refined one node at a time. Both the full and the approximate version are in practice calculated by dynamic programming.&lt;/p&gt; &lt;p&gt;&lt;a name="MALIGN_and_POY" id="MALIGN_and_POY"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;MALIGN and POY&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;More recent phylogenetic tree/MSA methods use heuristics to isolate high-scoring, but not necessarily optimal, trees. The MALIGN method uses a maximum-parsimony technique to compute a multiple alignment by maximizing a &lt;a href="http://en.wikipedia.org/wiki/Cladogram" title="Cladogram"&gt;cladogram&lt;/a&gt; score, and its companion POY uses an iterative method that couples the optimization of the phylogenetic tree with improvements in the corresponding MSA. However, the use of these methods in constructing evolutionary hypotheses has been criticized as biased due to the deliberate construction of trees reflecting minimal evolutionary events. 
Both programs are available from the &lt;a href="http://research.amnh.org/scicomp/projects.html" class="external text" title="http://research.amnh.org/scicomp/projects.html" rel="nofollow"&gt;American Museum of Natural History&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="Maximum_likelihood" id="Maximum_likelihood"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Maximum likelihood&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;The &lt;a href="http://en.wikipedia.org/wiki/Maximum_likelihood" title="Maximum likelihood"&gt;maximum likelihood&lt;/a&gt; method uses standard statistical techniques for inferring &lt;a href="http://en.wikipedia.org/wiki/Probability_distribution" title="Probability distribution"&gt;probability distributions&lt;/a&gt; to assign probabilities to particular possible phylogenetic trees. The method requires a &lt;a href="http://en.wikipedia.org/wiki/Substitution_model" title="Substitution model"&gt;substitution model&lt;/a&gt; to assess the probability of particular &lt;a href="http://en.wikipedia.org/wiki/Mutation" title="Mutation"&gt;mutations&lt;/a&gt;; roughly, a tree that requires more mutations at interior nodes to explain the observed phylogeny will be assessed as having a lower probability. This is broadly similar to the maximum-parsimony method, but maximum likelihood allows additional statistical flexibility by permitting varying rates of evolution across both lineages and sites. In fact, the method requires that evolution at different sites and along different lineages must be &lt;a href="http://en.wikipedia.org/wiki/Statistically_independent" title="Statistically independent"&gt;statistically independent&lt;/a&gt;. 
Maximum likelihood is thus well suited to the analysis of distantly related sequences, but because it formally requires search of all possible combinations of tree topology and branch length, it is computationally expensive to perform on more than a few sequences.&lt;/p&gt; &lt;p&gt;The "pruning" algorithm, a variant of &lt;a href="http://en.wikipedia.org/wiki/Dynamic_programming" title="Dynamic programming"&gt;dynamic programming&lt;/a&gt;, is often used to reduce the search space by efficiently calculating the likelihood of subtrees. The method calculates the likelihood for each site in a "linear" manner, starting at a node whose only descendants are leaves (that is, the tips of the tree) and working backwards toward the "bottom" node in nested sets. However, the trees produced by the method are only rooted if the substitution model is irreversible, which is not generally true of biological systems. The search for the maximum-likelihood tree also includes a branch length optimization component that is difficult to improve upon algorithmically; general &lt;a href="http://en.wikipedia.org/wiki/Global_optimization" title="Global optimization"&gt;global optimization&lt;/a&gt; tools such as the &lt;a href="http://en.wikipedia.org/wiki/Newton-Raphson" title="Newton-Raphson"&gt;Newton-Raphson&lt;/a&gt; method are often used. 
Searching tree topologies defined by likelihood has not been shown to be NP-complete, but remains extremely challenging because branch-and-bound search is not yet effective for trees represented in this way.&lt;/p&gt; &lt;p&gt;&lt;a name="Bayesian_inference" id="Bayesian_inference"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Bayesian inference&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Bayesian_inference" title="Bayesian inference"&gt;Bayesian inference&lt;/a&gt; can be used to produce phylogenetic trees in a manner closely related to the maximum likelihood methods. Bayesian methods assume a prior &lt;a href="http://en.wikipedia.org/wiki/Probability_distribution" title="Probability distribution"&gt;probability distribution&lt;/a&gt; of the possible trees, which may simply be the probability of any one tree among all the possible trees that could be generated from the data, or may be a more sophisticated estimate derived from the assumption that divergence events such as &lt;a href="http://en.wikipedia.org/wiki/Speciation" title="Speciation"&gt;speciation&lt;/a&gt; occur as &lt;a href="http://en.wikipedia.org/wiki/Stochastic_process" title="Stochastic process"&gt;stochastic processes&lt;/a&gt;. 
The choice of prior distribution is a point of contention among users of Bayesian-inference phylogenetics methods.&lt;/p&gt; &lt;p&gt;Implementations of Bayesian methods generally use &lt;a href="http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo" title="Markov chain Monte Carlo"&gt;Markov chain Monte Carlo&lt;/a&gt; sampling algorithms, although the choice of move set varies; selections used in Bayesian phylogenetics include circularly permuting leaf nodes of a proposed tree at each step and swapping descendant subtrees of a random &lt;a href="http://en.wikipedia.org/wiki/Internal_node" title="Internal node"&gt;internal node&lt;/a&gt; between two related trees. The use of Bayesian methods in phylogenetics has been controversial, largely due to incomplete specification of the choice of move set, acceptance criterion, and prior distribution in published work.&lt;/p&gt; &lt;p&gt;&lt;a name="Model_selection" id="Model_selection"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Model selection&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Molecular phylogenetics methods rely on a defined &lt;a href="http://en.wikipedia.org/wiki/Substitution_model" title="Substitution model"&gt;substitution model&lt;/a&gt; that encodes a hypothesis about the relative rates of &lt;a href="http://en.wikipedia.org/wiki/Mutation" title="Mutation"&gt;mutation&lt;/a&gt; at various sites along the gene or amino acid sequences being studied. At their simplest, substitution models aim to correct for differences in the rates of &lt;a href="http://en.wikipedia.org/wiki/Transition_%28genetics%29" title="Transition (genetics)"&gt;transitions&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Transversion" title="Transversion"&gt;transversions&lt;/a&gt; in nucleotide sequences. 
The use of substitution models is necessitated by the fact that the &lt;a href="http://en.wikipedia.org/wiki/Genetic_distance" title="Genetic distance"&gt;genetic distance&lt;/a&gt; between two sequences increases linearly only for a short time after the two sequences diverge from each other (alternatively, the distance is linear only shortly before &lt;a href="http://en.wikipedia.org/wiki/Coalescent_theory" title="Coalescent theory"&gt;coalescence&lt;/a&gt;). The longer the amount of time after divergence, the more likely it becomes that two mutations occur at the same nucleotide site. Simple genetic distance calculations will thus undercount the number of mutation events that have occurred in evolutionary history. The extent of this undercount increases with increasing time since divergence, which can lead to the phenomenon of &lt;a href="http://en.wikipedia.org/wiki/Long_branch_attraction" title="Long branch attraction"&gt;long branch attraction&lt;/a&gt;, or the misassignment of two distantly related but &lt;a href="http://en.wikipedia.org/wiki/Convergent_evolution" title="Convergent evolution"&gt;convergently evolving&lt;/a&gt; sequences as closely related. The maximum parsimony method is particularly susceptible to this problem due to its explicit search for a tree representing a minimum number of distinct evolutionary events.&lt;/p&gt; &lt;p&gt;&lt;a name="Types_of_models" id="Types_of_models"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Types of models&lt;/span&gt;&lt;/h3&gt; &lt;dl&gt;&lt;dd&gt; &lt;div class="noprint"&gt;&lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Substitution_model" title="Substitution model"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/i&gt;&lt;/div&gt; &lt;/dd&gt;&lt;/dl&gt; &lt;p&gt;All substitution models assign a set of weights to each possible change of state represented in the sequence. 
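&lt;/p&gt; &lt;p&gt;The undercounting described above is what distance corrections try to undo. As a sketch, assuming the Jukes-Cantor model (the simplest model, described below), an observed mismatch fraction p can be converted to an estimated number of substitutions per site with d = -(3/4) ln(1 - 4p/3). The function name and the sample values of p are hypothetical and chosen only for illustration.&lt;/p&gt;
&lt;pre&gt;
```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site for an observed mismatch fraction p.

    Assumes the Jukes-Cantor model. The correction is undefined once p
    reaches 0.75 (the mismatch level expected for unrelated sequences);
    math.log raises ValueError in that case.
    """
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The corrected distance grows faster than the raw mismatch fraction,
# reflecting multiple substitutions at the same site.
for p in (0.05, 0.3, 0.6):
    print(p, round(jukes_cantor_distance(p), 4))
```
&lt;/pre&gt;
&lt;p&gt;For small p the corrected and observed values nearly coincide, which matches the statement that distance increases linearly only shortly after divergence.&lt;/p&gt; &lt;p&gt;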
The most common model types are implicitly reversible because they assign the same weight to, for example, a G&gt;C nucleotide mutation as to a C&gt;G mutation. The simplest possible model, the &lt;a href="http://en.wikipedia.org/wiki/Jukes-Cantor_model" title="Jukes-Cantor model"&gt;Jukes-Cantor model&lt;/a&gt;, assigns an equal probability to every possible change of state for a given nucleotide base. The rate of change between any two distinct nucleotides will be one-third of the overall substitution rate. More advanced models distinguish between &lt;a href="http://en.wikipedia.org/wiki/Transition_%28genetics%29" title="Transition (genetics)"&gt;transitions&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Transversion" title="Transversion"&gt;transversions&lt;/a&gt;. The most general possible time-reversible model, called the GTR model, contains six mutation rate parameters. An even more generalized model known as the general 12-parameter model breaks time-reversibility, at the cost of much additional complexity in calculating genetic distances that are consistent among multiple lineages. One possible variation on this theme adjusts the rates so that overall GC content - an important measure of DNA double helix stability - varies over time.&lt;/p&gt; &lt;p&gt;Models may also allow for the variation of rates with positions in the input sequence. The most obvious example of such variation follows from the arrangement of nucleotides in protein-coding genes into three-base &lt;a href="http://en.wikipedia.org/wiki/Codon" title="Codon"&gt;codons&lt;/a&gt;. 
If the location of the &lt;a href="http://en.wikipedia.org/wiki/Open_reading_frame" title="Open reading frame"&gt;open reading frame&lt;/a&gt; (ORF) is known, rates of mutation can be adjusted for position of a given site within a codon, since it is known that &lt;a href="http://en.wikipedia.org/wiki/Wobble_base_pair" title="Wobble base pair"&gt;wobble base pairing&lt;/a&gt; can allow for higher mutation rates in the third nucleotide of a given codon without affecting the codon's meaning in the &lt;a href="http://en.wikipedia.org/wiki/Genetic_code" title="Genetic code"&gt;genetic code&lt;/a&gt;. A less hypothesis-driven example that does not rely on ORF identification simply assigns to each site a rate randomly drawn from a predetermined distribution, often the &lt;a href="http://en.wikipedia.org/wiki/Gamma_distribution" title="Gamma distribution"&gt;gamma distribution&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Log-normal_distribution" title="Log-normal distribution"&gt;log-normal distribution&lt;/a&gt;. 
Finally, a more conservative estimate of rate variations known as the &lt;a href="http://en.wikipedia.org/wiki/Covarion" title="Covarion"&gt;covarion&lt;/a&gt; method allows &lt;a href="http://en.wikipedia.org/wiki/Autocorrelation" title="Autocorrelation"&gt;autocorrelated&lt;/a&gt; variations in rates, so that the mutation rate of a given site is correlated across sites and lineages.&lt;/p&gt; &lt;p&gt;&lt;a name="Choosing_the_best_model" id="Choosing_the_best_model"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Choosing the best model&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The selection of an appropriate model is critical for the production of good phylogenetic analyses, both because underparameterized or overly restrictive models may produce aberrant behavior when their underlying assumptions are violated, and because overly complex or overparameterized models are computationally expensive and the parameters may be overfit. The most common method of model selection is the &lt;a href="http://en.wikipedia.org/wiki/Likelihood_ratio_test" title="Likelihood ratio test"&gt;likelihood ratio test&lt;/a&gt; (LRT), which produces a likelihood estimate that can be interpreted as a measure of "&lt;a href="http://en.wikipedia.org/wiki/Goodness_of_fit" title="Goodness of fit"&gt;goodness of fit&lt;/a&gt;" between the model and the input data. However, care must be taken in using these results, since a more complex model with more parameters will always have a higher likelihood than a simplified version of the same model, which can lead to the naive selection of models that are overly complex. For this reason model selection computer programs will choose the simplest model that is not significantly worse than more complex substitution models. 
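&lt;/p&gt; &lt;p&gt;A single likelihood ratio comparison can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of any particular model selection program: the two models are nested and differ by exactly one free parameter (for example, a single-rate model against one that adds a separate transition/transversion rate), the test statistic is compared with a chi-square distribution with one degree of freedom, and the log-likelihood values are made up.&lt;/p&gt;
&lt;pre&gt;
```python
import math


def likelihood_ratio_test(lnl_simple, lnl_complex):
    """Return (statistic, p_value) for two nested models.

    Assumes the models differ by ONE free parameter, so the statistic
    2 * (lnL_complex - lnL_simple) is compared with a chi-square
    distribution with one degree of freedom, whose upper tail equals
    erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (lnl_complex - lnl_simple)
    statistic = max(statistic, 0.0)  # guard against tiny negative rounding
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical maximized log-likelihoods for the same alignment.
statistic, p_value = likelihood_ratio_test(-1234.5, -1230.2)
print("LRT statistic:", round(statistic, 2))
print("p-value:", round(p_value, 4))
```
&lt;/pre&gt;
&lt;p&gt;A small p-value indicates that the extra parameter improves the fit by more than chance alone would explain; otherwise the simpler model is retained.&lt;/p&gt; &lt;p&gt;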
A significant disadvantage of the LRT is the necessity of making a series of pairwise comparisons between models; it has been shown that the order in which the models are compared has a major effect on the one that is eventually selected.&lt;/p&gt; &lt;p&gt;An alternative model selection method is the &lt;a href="http://en.wikipedia.org/wiki/Akaike_information_criterion" title="Akaike information criterion"&gt;Akaike information criterion&lt;/a&gt; (AIC), formally an estimate of the &lt;a href="http://en.wikipedia.org/wiki/Kullback-Leibler_divergence" title="Kullback-Leibler divergence"&gt;Kullback-Leibler divergence&lt;/a&gt; between the true model and the model being tested. It can be interpreted as a likelihood estimate with a correction factor to penalize overparameterized models. The AIC is calculated on an individual model rather than a pair, so it is independent of the order in which models are assessed. A related alternative, the &lt;a href="http://en.wikipedia.org/wiki/Bayesian_information_criterion" title="Bayesian information criterion"&gt;Bayesian information criterion&lt;/a&gt; (BIC), has a similar basic interpretation but penalizes complex models more heavily.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-3039365379340166664?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/3039365379340166664/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=3039365379340166664' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/3039365379340166664'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/3299081579068555714/posts/default/3039365379340166664'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/computational-phylogenetics.html' title='Computational phylogenetics-construction of tree'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-7596630656169546593</id><published>2007-07-21T23:45:00.000-07:00</published><updated>2007-07-21T23:52:14.882-07:00</updated><title type='text'>sequence alignment</title><content type='html'>&lt;!-- start content --&gt;    &lt;p&gt;In &lt;a href="http://en.wikipedia.org/wiki/Bioinformatics" title="Bioinformatics"&gt;bioinformatics&lt;/a&gt;, a &lt;b&gt;sequence alignment&lt;/b&gt; is a way of arranging the &lt;a href="http://en.wikipedia.org/wiki/Primary_sequence" title="Primary sequence"&gt;primary sequences&lt;/a&gt; of &lt;a href="http://en.wikipedia.org/wiki/DNA" title="DNA"&gt;DNA&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/RNA" title="RNA"&gt;RNA&lt;/a&gt;, or &lt;a href="http://en.wikipedia.org/wiki/Protein" title="Protein"&gt;protein&lt;/a&gt; to identify regions of similarity that may be a consequence of functional, &lt;a href="http://en.wikipedia.org/wiki/Structural_biology" title="Structural biology"&gt;structural&lt;/a&gt;, or &lt;a href="http://en.wikipedia.org/wiki/Evolution" title="Evolution"&gt;evolutionary&lt;/a&gt; relationships between the sequences. 
Aligned sequences of &lt;a href="http://en.wikipedia.org/wiki/Nucleotide" title="Nucleotide"&gt;nucleotide&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Amino_acid" title="Amino acid"&gt;amino acid&lt;/a&gt; residues are typically represented as rows within a &lt;a href="http://en.wikipedia.org/wiki/Matrix_%28mathematics%29" title="Matrix (mathematics)"&gt;matrix&lt;/a&gt;. Gaps are inserted between the residues so that residues with identical or similar characters are aligned in successive columns.&lt;/p&gt; &lt;div class="center"&gt; &lt;div class="thumb tnone"&gt; &lt;div class="thumbinner" style="width: 721px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Zinc-finger-seq-alignment2.png" class="internal" title="A sequence alignment, produced by ClustalW between two human zinc finger proteins identified by GenBank accession number. (Key)"&gt;&lt;br /&gt;&lt;/a&gt; &lt;div class="thumbcaption"&gt;A sequence alignment, produced by &lt;a href="http://en.wikipedia.org/wiki/ClustalW" title="ClustalW"&gt;ClustalW&lt;/a&gt; between two &lt;a href="http://en.wikipedia.org/wiki/Human" title="Human"&gt;human&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Zinc_finger" title="Zinc finger"&gt;zinc finger&lt;/a&gt; proteins identified by &lt;a href="http://en.wikipedia.org/wiki/GenBank" title="GenBank"&gt;GenBank&lt;/a&gt; accession number. (&lt;a href="http://en.wikipedia.org/wiki/Image:Zinc-finger-seq-alignment2.png" title="Image:Zinc-finger-seq-alignment2.png"&gt;Key&lt;/a&gt;)&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;p&gt;If two sequences in an alignment share a common ancestor, mismatches can be interpreted as &lt;a href="http://en.wikipedia.org/wiki/Point_mutation" title="Point mutation"&gt;point mutations&lt;/a&gt; and gaps as &lt;a href="http://en.wikipedia.org/wiki/Indel" title="Indel"&gt;indels&lt;/a&gt; (that is, insertion or deletion mutations) introduced in one or both lineages in the time since they diverged from one another. 
In protein sequence alignment, the degree of similarity between &lt;a href="http://en.wikipedia.org/wiki/Amino_acid" title="Amino acid"&gt;amino acids&lt;/a&gt; occupying a particular position in the sequence can be interpreted as a rough measure of how &lt;a href="http://en.wikipedia.org/wiki/Conservation_%28genetics%29" title="Conservation (genetics)"&gt;conserved&lt;/a&gt; a particular region or &lt;a href="http://en.wikipedia.org/wiki/Sequence_motif" title="Sequence motif"&gt;sequence motif&lt;/a&gt; is among lineages. The absence of substitutions, or the presence of only very conservative substitutions (that is, the substitution of amino acids whose &lt;a href="http://en.wikipedia.org/wiki/Side_chain" title="Side chain"&gt;side chains&lt;/a&gt; have similar biochemical properties) in a particular region of the sequence, suggests that this region has structural or functional importance. Although DNA and RNA &lt;a href="http://en.wikipedia.org/wiki/Nucleotide" title="Nucleotide"&gt;nucleotide&lt;/a&gt; bases are more similar to each other than are amino acids, the conservation of &lt;a href="http://en.wikipedia.org/wiki/Base_pair" title="Base pair"&gt;base pairing&lt;/a&gt; can indicate a similar functional or structural role. Sequence alignment can be used for non-biological sequences, such as those present in &lt;a href="http://en.wikipedia.org/wiki/Natural_language" title="Natural language"&gt;natural language&lt;/a&gt; or in financial data.&lt;/p&gt; &lt;p&gt;Very short or very similar sequences can be aligned by hand; however, most interesting problems require the alignment of lengthy, highly variable or extremely numerous sequences that cannot be aligned solely by human effort. Instead, human knowledge is primarily applied in constructing algorithms to produce high-quality sequence alignments, and occasionally in adjusting the final results to reflect patterns that are difficult to represent algorithmically (especially in the case of nucleotide sequences). 
Computational approaches to sequence alignment generally fall into two categories: &lt;i&gt;global alignments&lt;/i&gt; and &lt;i&gt;local alignments&lt;/i&gt;. Calculating a global alignment is a form of &lt;a href="http://en.wikipedia.org/wiki/Global_optimization" title="Global optimization"&gt;global optimization&lt;/a&gt; that "forces" the alignment to span the entire length of all query sequences. By contrast, local alignments identify regions of similarity within long sequences that are often widely divergent overall. Local alignments are often preferable, but can be more difficult to calculate because of the additional challenge of identifying the regions of similarity. A variety of computational algorithms have been applied to the sequence alignment problem, including slow but formally optimizing methods like &lt;a href="http://en.wikipedia.org/wiki/Dynamic_programming" title="Dynamic programming"&gt;dynamic programming&lt;/a&gt; and efficient &lt;a href="http://en.wikipedia.org/wiki/Heuristic" title="Heuristic"&gt;heuristic&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Probability" title="Probability"&gt;probabilistic&lt;/a&gt; methods designed for large-scale database search.&lt;/p&gt; &lt;h2&gt;&lt;span class="mw-headline"&gt;Representations&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Alignments are commonly represented both graphically and in text format. In almost all sequence alignment representations, sequences are written in rows arranged so that aligned residues appear in successive columns. In text formats, aligned columns containing identical or similar characters are indicated with a system of conservation symbols. As in the image above, an asterisk or pipe symbol is used to show identity between two columns; other less common symbols include a colon for conservative substitutions and a period for semiconservative substitutions. 
Many sequence visualization programs also use color to display information about the properties of the individual sequence elements; in DNA and RNA sequences, this equates to assigning each nucleotide its own color. In protein alignments, such as the one in the image above, color is often used to indicate amino acid properties to aid in judging the &lt;a href="http://en.wikipedia.org/wiki/Conservation_%28genetics%29" title="Conservation (genetics)"&gt;conservation&lt;/a&gt; of a given amino acid substitution. For multiple sequences the last row in each column is often the &lt;a href="http://en.wikipedia.org/wiki/Consensus_sequence" title="Consensus sequence"&gt;consensus sequence&lt;/a&gt; determined by the alignment; the consensus sequence is also often represented in graphical format with a &lt;a href="http://en.wikipedia.org/wiki/Sequence_logo" title="Sequence logo"&gt;sequence logo&lt;/a&gt; in which the size of each nucleotide or amino acid letter corresponds to its degree of conservation.&lt;/p&gt; &lt;p&gt;Sequence alignments can be stored in a wide variety of text-based file formats, many of which were originally developed in conjunction with a specific alignment program or implementation. Most web-based tools allow a number of input and output formats, such as &lt;a href="http://en.wikipedia.org/wiki/FASTA_format" title="FASTA format"&gt;FASTA format&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/GenBank" title="GenBank"&gt;GenBank&lt;/a&gt; format; however, the use of specific tools authored by individual research laboratories can be complicated by limited file format compatibility. 
A general conversion program is available at &lt;a href="http://bioweb.pasteur.fr/seqanal/interfaces/readseq.html" class="external text" title="http://bioweb.pasteur.fr/seqanal/interfaces/readseq.html" rel="nofollow"&gt;READSEQ&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="Global_and_local_alignments" id="Global_and_local_alignments"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Global and local alignments&lt;/span&gt;&lt;/h2&gt; &lt;div class="thumb tright"&gt; &lt;div class="thumbinner" style="width: 245px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Global-local-alignment.png" class="internal" title="Illustration of global and local alignments demonstrating the 'gappy' quality of global alignments that can occur if sequences are insufficiently similar"&gt;&lt;img alt="Illustration of global and local alignments demonstrating the 'gappy' quality of global alignments that can occur if sequences are insufficiently similar" longdesc="/wiki/Image:Global-local-alignment.png" class="thumbimage" src="http://upload.wikimedia.org/wikipedia/commons/4/4b/Global-local-alignment.png" height="109" width="243" /&gt;&lt;/a&gt; &lt;div class="thumbcaption"&gt;Illustration of global and local alignments demonstrating the 'gappy' quality of global alignments that can occur if sequences are insufficiently similar&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;p&gt;Global alignments, which attempt to align every residue in every sequence, are most useful when the sequences in the query set are similar and of roughly equal size. (This does not mean global alignments cannot end in gaps.) A general global alignment technique is called the &lt;a href="http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm" title="Needleman-Wunsch algorithm"&gt;Needleman-Wunsch algorithm&lt;/a&gt; and is based on dynamic programming. 
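&lt;/p&gt; &lt;p&gt;As a rough illustration of global alignment by dynamic programming, the sketch below follows the general Needleman-Wunsch scheme with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for the example, not values taken from any particular tool.&lt;/p&gt;
&lt;pre&gt;
```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        diagonal = i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch)
        if diagonal:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("GATTACA", "GATACA"))
```
&lt;/pre&gt;
&lt;p&gt;Because every residue of both sequences must appear in the result, the shorter sequence is padded with gaps, which is the "forced" end-to-end behavior of a global alignment.&lt;/p&gt; &lt;p&gt;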
Local alignments are more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context. The &lt;a href="http://en.wikipedia.org/wiki/Smith-Waterman_algorithm" title="Smith-Waterman algorithm"&gt;Smith-Waterman algorithm&lt;/a&gt; is a general local alignment method also based on dynamic programming. With sufficiently similar sequences, there is no difference between local and global alignments.&lt;/p&gt; &lt;p&gt;Hybrid methods, known as semiglobal or "glocal" methods, attempt to find the best possible alignment that includes the start and end of one or the other sequence. This can be especially useful when the downstream part of one sequence overlaps with the upstream part of the other sequence. In this case, neither global nor local alignment is entirely appropriate: a global alignment would attempt to force the alignment to extend beyond the region of overlap, while a local alignment might not fully cover the region of overlap.&lt;/p&gt; &lt;p&gt;&lt;a name="Pairwise_alignment" id="Pairwise_alignment"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Pairwise alignment&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Pairwise sequence alignment methods are used to find the best-matching piecewise (local) or global alignments of two query sequences. Pairwise alignments can only be used between two sequences at a time, but they are efficient to calculate and are often used for methods that do not require extreme precision (such as searching a database for sequences with high &lt;a href="http://en.wikipedia.org/wiki/Homology_%28biology%29" title="Homology (biology)"&gt;homology&lt;/a&gt; to a query). 
The three primary methods of producing pairwise alignments are dot-matrix methods, dynamic programming, and word methods; however, multiple sequence alignment techniques can also align pairs of sequences. Although each method has its individual strengths and weaknesses, all three pairwise methods have difficulty with highly repetitive sequences of low &lt;a href="http://en.wikipedia.org/wiki/Information_content" title="Information content"&gt;information content&lt;/a&gt;, especially where the number of repetitions differs in the two sequences to be aligned.&lt;/p&gt; &lt;p&gt;&lt;a name="Dot-matrix_methods" id="Dot-matrix_methods"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Dot-matrix methods&lt;/span&gt;&lt;/h3&gt; &lt;div class="infobox sisterproject"&gt;&lt;div style="margin-left: 60px;"&gt; &lt;div style="margin-left: 10px;"&gt;&lt;i&gt;&lt;b&gt;&lt;a href="http://en.wikiversity.org/wiki/Dot-matrix_methods" class="extiw" title="v:Dot-matrix_methods"&gt;Dot-matrix methods&lt;/a&gt;&lt;/b&gt;&lt;/i&gt;&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;div class="thumb tright"&gt; &lt;div class="thumbinner" style="width: 202px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Zinc-finger-dot-plot.png" class="internal" title="A DNA dot plot of a human zinc finger transcription factor (GenBank ID NM_002383), showing regional self-similarity. The main diagonal represents the sequence's alignment with itself; lines off the main diagonal represent similar or repetitive patterns within the sequence. This is a typical example of a recurrence plot."&gt;&lt;img alt="A DNA dot plot of a human zinc finger transcription factor (GenBank ID NM_002383), showing regional self-similarity. The main diagonal represents the sequence's alignment with itself; lines off the main diagonal represent similar or repetitive patterns within the sequence. This is a typical example of a recurrence plot." 
longdesc="/wiki/Image:Zinc-finger-dot-plot.png" class="thumbimage" src="http://upload.wikimedia.org/wikipedia/commons/thumb/3/33/Zinc-finger-dot-plot.png/200px-Zinc-finger-dot-plot.png" height="200" width="200" /&gt;&lt;/a&gt; &lt;div class="thumbcaption"&gt; A DNA dot plot of a &lt;a href="http://en.wikipedia.org/wiki/Human" title="Human"&gt;human&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Zinc_finger" title="Zinc finger"&gt;zinc finger&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Transcription_factor" title="Transcription factor"&gt;transcription factor&lt;/a&gt; (GenBank ID NM_002383), showing regional &lt;a href="http://en.wikipedia.org/wiki/Self-similarity" title="Self-similarity"&gt;self-similarity&lt;/a&gt;. The main diagonal represents the sequence's alignment with itself; lines off the main diagonal represent similar or repetitive patterns within the sequence. This is a typical example of a &lt;a href="http://en.wikipedia.org/wiki/Recurrence_plot" title="Recurrence plot"&gt;recurrence plot&lt;/a&gt;.&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;p&gt;The dot-matrix approach, which implicitly produces a family of alignments for individual sequence regions, is qualitative and simple, though time-consuming to analyze on a large scale. Certain sequence features, such as insertions, deletions, repeats, or &lt;a href="http://en.wikipedia.org/wiki/Inverted_repeat" title="Inverted repeat"&gt;inverted repeats&lt;/a&gt;, are easy to identify visually from a dot-matrix plot. 
To construct a dot-matrix plot, the two sequences are written along the top row and leftmost column of a two-dimensional &lt;a href="http://en.wikipedia.org/wiki/Matrix_%28mathematics%29" title="Matrix (mathematics)"&gt;matrix&lt;/a&gt; and a dot is placed at any point where the characters in the appropriate columns match—this is a typical &lt;a href="http://en.wikipedia.org/wiki/Recurrence_plot" title="Recurrence plot"&gt;recurrence plot&lt;/a&gt;. Some implementations vary the size or intensity of the dot depending on the degree of similarity of the two characters, to accommodate conservative substitutions. The dot plots of very closely related sequences will appear as a single line along the matrix's &lt;a href="http://en.wikipedia.org/wiki/Main_diagonal" title="Main diagonal"&gt;main diagonal&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Dot plots can also be used to assess repetitiveness in a single sequence. A sequence can be plotted against itself and regions that share significant similarities will appear as lines off the main diagonal. This effect can occur when a protein consists of multiple similar &lt;a href="http://en.wikipedia.org/wiki/Structural_domain" title="Structural domain"&gt;structural domains&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="Dynamic_programming" id="Dynamic_programming"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Dynamic programming&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The technique of &lt;a href="http://en.wikipedia.org/wiki/Dynamic_programming" title="Dynamic programming"&gt;dynamic programming&lt;/a&gt; can be applied to produce global alignments via the &lt;a href="http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm" title="Needleman-Wunsch algorithm"&gt;Needleman-Wunsch algorithm&lt;/a&gt;, and local alignments via the &lt;a href="http://en.wikipedia.org/wiki/Smith-Waterman_algorithm" title="Smith-Waterman algorithm"&gt;Smith-Waterman algorithm&lt;/a&gt;. 
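&lt;/p&gt; &lt;p&gt;To make the local case concrete, the following sketch computes only the best local alignment score in the style of the Smith-Waterman algorithm. It assumes a simple scheme (match +2, mismatch -1, linear gap -1) rather than a substitution matrix, and the function name is hypothetical. The one difference from the global recurrence is that a cell is never allowed to fall below zero, so an alignment can start and end anywhere.&lt;/p&gt;
&lt;pre&gt;
```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,                  # restart the alignment here
                          prev[j - 1] + sub,  # align two residues
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


# Two otherwise unrelated sequences sharing the region GATTACA.
print(smith_waterman_score("AAAGATTACATTT", "CCCGATTACAGGG"))
```
&lt;/pre&gt;
&lt;p&gt;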
In typical usage, protein alignments use a &lt;a href="http://en.wikipedia.org/wiki/Substitution_matrix" title="Substitution matrix"&gt;substitution matrix&lt;/a&gt; to assign scores to amino-acid matches or mismatches, and a &lt;a href="http://en.wikipedia.org/wiki/Gap_penalty" title="Gap penalty"&gt;gap penalty&lt;/a&gt; for matching an amino acid in one sequence to a gap in the other. DNA and RNA alignments may use a scoring matrix, but in practice often simply assign a positive match score, a negative mismatch score, and a negative gap penalty. (In standard dynamic programming, the score of each amino acid position is independent of the identity of its neighbors, and therefore &lt;a href="http://en.wikipedia.org/wiki/Base_stacking" title="Base stacking"&gt;base stacking&lt;/a&gt; effects are not taken into account. However, it is possible to account for such effects by modifying the algorithm.)&lt;/p&gt; &lt;p&gt;Dynamic programming can be useful in aligning nucleotide to protein sequences, a task complicated by the need to take into account &lt;a href="http://en.wikipedia.org/wiki/Frameshift" title="Frameshift"&gt;frameshift&lt;/a&gt; mutations (usually insertions or deletions). The framesearch method produces a series of global or local pairwise alignments between a query nucleotide sequence and a search set of protein sequences, or vice versa. Although the method is very slow, its ability to evaluate frameshifts offset by an arbitrary number of nucleotides makes the method useful for sequences containing large numbers of indels, which can be very difficult to align with more efficient heuristic methods. In practice, the method requires large amounts of computing power or a system whose architecture is specialized for dynamic programming. 
The &lt;a href="http://en.wikipedia.org/wiki/BLAST" title="BLAST"&gt;BLAST&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/EMBOSS" title="EMBOSS"&gt;EMBOSS&lt;/a&gt; suites provide basic tools for creating translated alignments (though some of these approaches take advantage of side-effects of sequence searching capabilities of the tools). More general methods are available from both commercial sources, such as &lt;i&gt;FrameSearch&lt;/i&gt;, distributed as part of the &lt;a href="http://www.accelrys.com/products/gcg/" class="external text" title="http://www.accelrys.com/products/gcg/" rel="nofollow"&gt;Accelrys GCG package&lt;/a&gt;, and &lt;a href="http://en.wikipedia.org/wiki/Open_Source" title="Open Source"&gt;Open Source&lt;/a&gt; software such as &lt;a href="http://www.ebi.ac.uk/Wise2" class="external text" title="http://www.ebi.ac.uk/Wise2" rel="nofollow"&gt;Genewise&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;The dynamic programming method is guaranteed to find an optimal alignment given a particular scoring function; however, identifying a good scoring function is often an empirical rather than a theoretical matter. Although dynamic programming is extensible to more than two sequences, it is prohibitively slow for large numbers of sequences or for extremely long sequences.&lt;/p&gt; &lt;p&gt;&lt;a name="Word_methods" id="Word_methods"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Word methods&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Word methods, also known as &lt;i&gt;k&lt;/i&gt;-tuple methods, are &lt;a href="http://en.wikipedia.org/wiki/Heuristic" title="Heuristic"&gt;heuristic&lt;/a&gt; methods that are not guaranteed to find an optimal alignment solution, but are significantly more efficient than dynamic programming. 
These methods are especially useful in large-scale database searches where it is understood that a large proportion of the candidate sequences will have essentially no significant match with the query sequence. Word methods are best known for their implementation in the database search tools &lt;a href="http://en.wikipedia.org/wiki/FASTA" title="FASTA"&gt;FASTA&lt;/a&gt; and the &lt;a href="http://en.wikipedia.org/wiki/BLAST" title="BLAST"&gt;BLAST&lt;/a&gt; family. Word methods identify a series of short, nonoverlapping subsequences ("words") in the query sequence that are then matched to candidate database sequences. The relative positions of the word in the two sequences being compared are subtracted to obtain an offset; this will indicate a region of alignment if multiple distinct words produce the same offset. Only if this region is detected do these methods apply more sensitive alignment criteria; thus, many unnecessary comparisons with sequences of no appreciable similarity are eliminated.&lt;/p&gt; &lt;p&gt;In the FASTA method, the user defines a value &lt;i&gt;k&lt;/i&gt; to use as the word length with which to search the database. The method is slower but more sensitive at lower values of &lt;i&gt;k&lt;/i&gt;, which are also preferred for searches involving a very short query sequence. The BLAST family of search methods provides a number of algorithms optimized for particular types of queries, such as searching for distantly related sequence matches. BLAST was developed to provide a faster alternative to FASTA without sacrificing much accuracy; like FASTA, BLAST uses a word search of length &lt;i&gt;k&lt;/i&gt;, but evaluates only the most significant word matches, rather than every word match as does FASTA. Most BLAST implementations use a fixed default word length that is optimized for the query and database type, and that is changed only under special circumstances, such as when searching with repetitive or very short query sequences. 
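&lt;/p&gt; &lt;p&gt;The offset idea behind word methods can be sketched in a few lines. This is only an illustration of the principle described above, not the FASTA or BLAST implementation: the function name, the word length k=3, and the example sequences are assumptions. Non-overlapping words from the query are looked up in an index of the target, and the difference between their positions is tallied; an offset supported by several distinct words suggests a candidate region of alignment.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import Counter


def word_offsets(query, target, k=3):
    """Count how many query words of length k support each offset in the target."""
    # Index every k-letter word of the target by its start position.
    positions = {}
    for j in range(len(target) - k + 1):
        positions.setdefault(target[j:j + k], []).append(j)
    offsets = Counter()
    # Step through the query k letters at a time so the words do not overlap.
    for i in range(0, len(query) - k + 1, k):
        for j in positions.get(query[i:i + k], []):
            offsets[j - i] += 1
    return offsets


hits = word_offsets("GATTACAGG", "CCGATTACAGGTT")
print(hits.most_common(1))
```
&lt;/pre&gt;
&lt;p&gt;A real search tool would apply a more sensitive alignment only around the offsets that collect enough hits, which is what lets it skip most unrelated database sequences cheaply.&lt;/p&gt; &lt;p&gt;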
Implementations can be found via a number of web portals, such as &lt;a href="http://www.ebi.ac.uk/fasta33/" class="external text" title="http://www.ebi.ac.uk/fasta33/" rel="nofollow"&gt;EMBL FASTA&lt;/a&gt; and &lt;a href="http://www.ncbi.nlm.nih.gov/BLAST/" class="external text" title="http://www.ncbi.nlm.nih.gov/BLAST/" rel="nofollow"&gt;NCBI BLAST&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="Multiple_sequence_alignment" id="Multiple_sequence_alignment"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Multiple sequence alignment&lt;/span&gt;&lt;/h2&gt; &lt;dl&gt;&lt;dd&gt; &lt;div class="noprint"&gt;&lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Multiple_sequence_alignment" title="Multiple sequence alignment"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/i&gt;&lt;/div&gt; &lt;/dd&gt;&lt;/dl&gt; &lt;div class="thumb tright"&gt; &lt;div class="thumbinner" style="width: 302px;"&gt;&lt;a href="http://en.wikipedia.org/wiki/Image:Hemagglutinin-alignments.png" class="internal" title="Alignment of 27 avian influenza hemagglutinin protein sequences colored by residue conservation (top) and residue properties (bottom)"&gt;&lt;img alt="Alignment of 27 avian influenza hemagglutinin protein sequences colored by residue conservation (top) and residue properties (bottom)" longdesc="/wiki/Image:Hemagglutinin-alignments.png" class="thumbimage" src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/b4/Hemagglutinin-alignments.png/300px-Hemagglutinin-alignments.png" height="322" width="300" /&gt;&lt;/a&gt; &lt;div class="thumbcaption"&gt; Alignment of 27 &lt;a href="http://en.wikipedia.org/wiki/Avian_influenza" title="Avian 
influenza"&gt;avian influenza&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Hemagglutinin" title="Hemagglutinin"&gt;hemagglutinin&lt;/a&gt; protein sequences colored by residue conservation (top) and residue properties (bottom)&lt;/div&gt; &lt;/div&gt; &lt;/div&gt; &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Multiple_sequence_alignment" title="Multiple sequence alignment"&gt;Multiple sequence alignment&lt;/a&gt; (MSA) is an extension of pairwise alignment to incorporate more than two sequences at a time. Multiple alignment methods try to align all of the sequences in a given query set. Multiple alignments are often used in identifying &lt;a href="http://en.wikipedia.org/wiki/Conservation_%28genetics%29" title="Conservation (genetics)"&gt;conserved&lt;/a&gt; sequence regions across a group of sequences hypothesized to be evolutionarily related. Such conserved sequence motifs can be used in conjunction with structural and &lt;a href="http://en.wikipedia.org/wiki/Reaction_mechanism" title="Reaction mechanism"&gt;mechanistic&lt;/a&gt; information to locate the catalytic &lt;a href="http://en.wikipedia.org/wiki/Active_site" title="Active site"&gt;active sites&lt;/a&gt; of &lt;a href="http://en.wikipedia.org/wiki/Enzyme" title="Enzyme"&gt;enzymes&lt;/a&gt;. Alignments are also used to aid in establishing evolutionary relationships by constructing &lt;a href="http://en.wikipedia.org/wiki/Phylogenetic_tree" title="Phylogenetic tree"&gt;phylogenetic trees&lt;/a&gt;. 
MSAs are computationally difficult to produce and most formulations of the problem lead to &lt;a href="http://en.wikipedia.org/wiki/NP-complete" title="NP-complete"&gt;NP-complete&lt;/a&gt; combinatorial optimization problems. Nevertheless, the utility of these alignments in bioinformatics has led to the development of a variety of methods suitable for aligning three or more sequences.&lt;/p&gt; &lt;p&gt;&lt;a name="Dynamic_programming_2" id="Dynamic_programming_2"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Dynamic programming&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The technique of dynamic programming is theoretically applicable to any number of sequences; however, because it is computationally expensive in both time and &lt;a href="http://en.wikipedia.org/wiki/Computer_memory" title="Computer memory"&gt;memory&lt;/a&gt;, it is rarely used for more than three or four sequences in its most basic form. This method requires constructing the &lt;i&gt;n&lt;/i&gt;-dimensional equivalent of the sequence matrix formed from two sequences, where &lt;i&gt;n&lt;/i&gt; is the number of sequences in the query. Standard dynamic programming is first used on all pairs of query sequences and then the "alignment space" is filled in by considering possible matches or gaps at intermediate positions, eventually constructing an alignment essentially between each two-sequence alignment. Although this technique is computationally expensive, its guarantee of a global optimum solution is useful in cases where only a few sequences need to be aligned accurately. 
One method for reducing the computational demands of dynamic programming, which relies on the "sum of pairs" &lt;a href="http://en.wikipedia.org/wiki/Objective_function" title="Objective function"&gt;objective function&lt;/a&gt;, has been implemented in the &lt;a href="http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/msa.html" class="external text" title="http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/msa.html" rel="nofollow"&gt;MSA&lt;/a&gt; software package.&lt;/p&gt; &lt;p&gt;&lt;a name="Progressive_methods" id="Progressive_methods"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Progressive methods&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Progressive, hierarchical, or tree methods generate an MSA by first aligning the most similar sequences and then adding successively less related sequences or groups to the alignment until the entire query set has been incorporated into the solution. The initial tree describing the sequence relatedness is based on pairwise comparisons that may include heuristic pairwise alignment methods similar to &lt;a href="http://en.wikipedia.org/wiki/FASTA" title="FASTA"&gt;FASTA&lt;/a&gt;. Progressive alignment results are dependent on the choice of "most related" sequences and thus can be sensitive to inaccuracies in the initial pairwise alignments. 
Most progressive MSA methods additionally weight the sequences in the query set according to their relatedness, which reduces the likelihood of making a poor choice of initial sequences and thus improves alignment accuracy.&lt;/p&gt; &lt;p&gt;Many variations of the &lt;a href="http://en.wikipedia.org/wiki/Clustal" title="Clustal"&gt;Clustal&lt;/a&gt; progressive implementation&lt;sup id="_ref-higgins_0" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment#_note-higgins" title=""&gt;[6]&lt;/a&gt;&lt;/sup&gt;&lt;sup id="_ref-thompson_0" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment#_note-thompson" title=""&gt;[7]&lt;/a&gt;&lt;/sup&gt;&lt;sup id="_ref-chenna_0" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment#_note-chenna" title=""&gt;[8]&lt;/a&gt;&lt;/sup&gt; are used for multiple sequence alignment, phylogenetic tree construction, and as input for &lt;a href="http://en.wikipedia.org/wiki/Protein_structure_prediction" title="Protein structure prediction"&gt;protein structure prediction&lt;/a&gt;. 
A slower but more accurate variant of the progressive method is known as &lt;a href="http://en.wikipedia.org/wiki/T-Coffee" title="T-Coffee"&gt;T-Coffee&lt;/a&gt;&lt;sup id="_ref-notredame_0" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment#_note-notredame" title=""&gt;[9]&lt;/a&gt;&lt;/sup&gt;; implementations can be found at &lt;a href="http://align.genome.jp/" class="external text" title="http://align.genome.jp/" rel="nofollow"&gt;ClustalW&lt;/a&gt; and &lt;a href="http://www.tcoffee.org/" class="external text" title="http://www.tcoffee.org" rel="nofollow"&gt;T-Coffee&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="Iterative_methods" id="Iterative_methods"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Iterative methods&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Iterative methods attempt to improve on the weak point of the progressive methods, the heavy dependence on the accuracy of the initial pairwise alignments. Iterative methods optimize an &lt;a href="http://en.wikipedia.org/wiki/Objective_function" title="Objective function"&gt;objective function&lt;/a&gt; based on a selected alignment scoring method by assigning an initial global alignment and then realigning sequence subsets. The realigned subsets are then themselves aligned to produce the next iteration's MSA. Various ways of selecting the sequence subgroups and objective function have been reviewed in the literature.&lt;/p&gt; &lt;p&gt;&lt;a name="Motif_finding" id="Motif_finding"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Motif finding&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Motif finding, also known as profile analysis, constructs global MSAs that attempt to align short conserved &lt;a href="http://en.wikipedia.org/wiki/Sequence_motif" title="Sequence motif"&gt;sequence motifs&lt;/a&gt; among the sequences in the query set. 
This is usually done by first constructing a general global MSA, after which the highly &lt;a href="http://en.wikipedia.org/wiki/Conservation_%28genetics%29" title="Conservation (genetics)"&gt;conserved&lt;/a&gt; regions are isolated and used to construct a set of profile matrices. The profile matrix for each conserved region is arranged like a scoring matrix but its frequency counts for each amino acid or nucleotide at each position are derived from the conserved region's character distribution rather than from a more general empirical distribution. The profile matrices are then used to search other sequences for occurrences of the motif they characterize. In cases where the original &lt;a href="http://en.wikipedia.org/wiki/Data_set" title="Data set"&gt;data set&lt;/a&gt; contained a small number of sequences, or only highly related sequences, &lt;a href="http://en.wikipedia.org/wiki/Pseudocount" title="Pseudocount"&gt;pseudocounts&lt;/a&gt; are added to normalize the character distributions represented in the motif.&lt;/p&gt; &lt;p&gt;&lt;a name="Techniques_inspired_by_computer_science" id="Techniques_inspired_by_computer_science"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Techniques inspired by computer science&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;A variety of general &lt;a href="http://en.wikipedia.org/wiki/Optimization_%28mathematics%29" title="Optimization (mathematics)"&gt;optimization&lt;/a&gt; algorithms commonly used in computer science have also been applied to the multiple sequence alignment problem. 
&lt;a href="http://en.wikipedia.org/wiki/Hidden_Markov_model" title="Hidden Markov model"&gt;Hidden Markov models&lt;/a&gt; have been used to produce probability scores for a family of possible MSAs for a given query set; although early HMM-based methods produced underwhelming performance, later applications have found them especially effective in detecting remotely related sequences because they are less susceptible to noise created by conservative or semiconservative substitutions.&lt;sup id="_ref-karplus_0" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment#_note-karplus" title=""&gt;[11]&lt;/a&gt;&lt;/sup&gt; &lt;a href="http://en.wikipedia.org/wiki/Genetic_algorithm" title="Genetic algorithm"&gt;Genetic algorithms&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Simulated_annealing" title="Simulated annealing"&gt;simulated annealing&lt;/a&gt; have also been used in optimizing MSA scores as judged by a scoring function like the sum-of-pairs method. 
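&lt;/p&gt; &lt;p&gt;For reference, a sum-of-pairs score of the kind mentioned above can be sketched as follows. The scoring values (match +1, mismatch -1, residue against gap -2, gap against gap ignored) and the function name are assumptions made for the example; real packages typically use a substitution matrix and more elaborate gap costs.&lt;/p&gt;
&lt;pre&gt;
```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score an MSA by summing the pairwise scores within every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of the alignment must have equal length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # two gaps facing each other are not scored
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


# Hypothetical three-row alignment.
print(sum_of_pairs(["ACG-", "ACGT", "A-GT"]))
```
&lt;/pre&gt;
&lt;p&gt;An optimizer such as a genetic algorithm or simulated annealing would propose changes to the gap placement and keep those that raise this score.&lt;/p&gt; &lt;p&gt;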
More complete details and software packages can be found in the main article &lt;a href="http://en.wikipedia.org/wiki/Multiple_sequence_alignment" title="Multiple sequence alignment"&gt;multiple sequence alignment&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="Structural_alignment" id="Structural_alignment"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Structural alignment&lt;/span&gt;&lt;/h2&gt; &lt;dl&gt;&lt;dd&gt; &lt;div class="noprint"&gt;&lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Structural_alignment" title="Structural alignment"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/i&gt;&lt;/div&gt; &lt;/dd&gt;&lt;/dl&gt; &lt;p&gt;Structural alignments, which are usually specific to protein and sometimes RNA sequences, use information about the &lt;a href="http://en.wikipedia.org/wiki/Secondary_structure" title="Secondary structure"&gt;secondary&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Tertiary_structure" title="Tertiary structure"&gt;tertiary structure&lt;/a&gt; of the protein or RNA molecule to aid in aligning the sequences. These methods can be used for two or more sequences and typically produce local alignments; however, because they depend on the availability of structural information, they can only be used for sequences whose corresponding structures are known (usually through &lt;a href="http://en.wikipedia.org/wiki/X-ray_crystallography" title="X-ray crystallography"&gt;X-ray crystallography&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/NMR_spectroscopy" title="NMR spectroscopy"&gt;NMR spectroscopy&lt;/a&gt;). 
Because protein and RNA structure is more evolutionarily conserved than sequence,&lt;sup id="_ref-chothia_0" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment#_note-chothia" title=""&gt;[12]&lt;/a&gt;&lt;/sup&gt; structural alignments can be more reliable between sequences that are very distantly related and that have diverged so extensively that sequence comparison cannot reliably detect their similarity.&lt;/p&gt; &lt;p&gt;Structural alignments are used as the "gold standard" in evaluating alignments for homology-based &lt;a href="http://en.wikipedia.org/wiki/Protein_structure_prediction" title="Protein structure prediction"&gt;protein structure prediction&lt;/a&gt; because they explicitly align regions of the protein sequence that are structurally similar rather than relying exclusively on sequence information. However, structural alignments clearly cannot be used in structure prediction because at least one sequence in the query set is the target to be modeled, for which the structure is not known. It has been shown that, given the structural alignment between a target and a template sequence, highly accurate models of the target protein sequence can be produced; a major stumbling block in homology-based structure prediction is the production of structurally accurate alignments given only sequence information.&lt;/p&gt; &lt;p&gt;&lt;a name="DALI" id="DALI"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;DALI&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The DALI method, or distance matrix alignment, is a fragment-based method for constructing structural alignments based on contact similarity patterns between successive hexapeptides in the query sequences. 
It can generate pairwise or multiple alignments and identify a query sequence's structural neighbors in the &lt;a href="http://en.wikipedia.org/wiki/Protein_Data_Bank" title="Protein Data Bank"&gt;Protein Data Bank&lt;/a&gt; (PDB). It has been used to construct the &lt;a href="http://en.wikipedia.org/wiki/Families_of_structurally_similar_proteins" title="Families of structurally similar proteins"&gt;FSSP&lt;/a&gt; structural alignment database (Fold classification based on Structure-Structure alignment of Proteins, or Families of Structurally Similar Proteins). A DALI webserver can be accessed at &lt;a href="http://www.ebi.ac.uk/dali/" class="external text" title="http://www.ebi.ac.uk/dali/" rel="nofollow"&gt;EBI DALI&lt;/a&gt; and the FSSP is located at &lt;a href="http://ekhidna.biocenter.helsinki.fi/dali/start" class="external text" title="http://ekhidna.biocenter.helsinki.fi/dali/start" rel="nofollow"&gt;The Dali Database&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="SSAP" id="SSAP"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;SSAP&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;SSAP (sequential structure alignment program) is a dynamic programming-based method of structural alignment that uses atom-to-atom vectors in structure space as comparison points. 
It has been extended since its original description to include multiple as well as pairwise alignments, and has been used in the construction of the &lt;a href="http://en.wikipedia.org/wiki/CATH" title="CATH"&gt;CATH&lt;/a&gt; (Class, Architecture, Topology, Homology) hierarchical database classification of protein folds. The CATH database can be accessed at &lt;a href="http://www.cathdb.info/latest/index.html" class="external text" title="http://www.cathdb.info/latest/index.html" rel="nofollow"&gt;CATH Protein Structure Classification&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="Combinatorial_extension" id="Combinatorial_extension"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Combinatorial extension&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The combinatorial extension (CE) method of structural alignment generates a pairwise structural alignment by using local geometry to align short fragments of the two proteins being analyzed and then assembles these fragments into a larger alignment.&lt;sup id="_ref-shindyalov_0" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment#_note-shindyalov" title=""&gt;[17]&lt;/a&gt;&lt;/sup&gt; Based on measures such as rigid-body &lt;a href="http://en.wikipedia.org/wiki/Root_mean_square_deviation_%28bioinformatics%29" title="Root mean square deviation (bioinformatics)"&gt;root mean square deviation&lt;/a&gt;, residue distances, local secondary structure, and surrounding environmental features such as residue neighbor &lt;a href="http://en.wikipedia.org/wiki/Hydrophobic" title="Hydrophobic"&gt;hydrophobicity&lt;/a&gt;, local alignments called "aligned fragment pairs" (AFPs) are generated and used to build a similarity matrix representing all possible structural alignments within predefined cutoff criteria. A path from one protein structure state to the other is then traced through the matrix by extending the growing alignment one fragment at a time. 
The optimal such path defines the CE alignment. A web-based server implementing the method and providing a database of pairwise alignments of structures in the PDB is located at the &lt;a href="http://cl.sdsc.edu/" class="external text" title="http://cl.sdsc.edu/" rel="nofollow"&gt;Combinatorial Extension&lt;/a&gt; website.&lt;/p&gt; &lt;p&gt;&lt;a name="Phylogenetic_analysis" id="Phylogenetic_analysis"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Phylogenetic analysis&lt;/span&gt;&lt;/h2&gt; &lt;dl&gt;&lt;dd&gt; &lt;div class="noprint"&gt;&lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Computational_phylogenetics" title="Computational phylogenetics"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/i&gt;&lt;/div&gt; &lt;/dd&gt;&lt;/dl&gt; &lt;p&gt;Phylogenetics and sequence alignment are closely related fields due to the shared necessity of evaluating sequence relatedness. The field of &lt;a href="http://en.wikipedia.org/wiki/Phylogenetics" title="Phylogenetics"&gt;phylogenetics&lt;/a&gt; makes extensive use of sequence alignments in the construction and interpretation of &lt;a href="http://en.wikipedia.org/wiki/Phylogenetic_tree" title="Phylogenetic tree"&gt;phylogenetic trees&lt;/a&gt;, which are used to classify the evolutionary relationships between homologous &lt;a href="http://en.wikipedia.org/wiki/Gene" title="Gene"&gt;genes&lt;/a&gt; represented in the &lt;a href="http://en.wikipedia.org/wiki/Genome" title="Genome"&gt;genomes&lt;/a&gt; of divergent species. The degree to which sequences in a query set differ is qualitatively related to the sequences' evolutionary distance from one another. Roughly speaking, high sequence identity suggests that the sequences in question have a comparatively young &lt;a href="http://en.wikipedia.org/wiki/Most_recent_common_ancestor" title="Most recent common ancestor"&gt;most recent common ancestor&lt;/a&gt;, while low identity suggests that the divergence is more ancient. 
This approximation, which reflects the "&lt;a href="http://en.wikipedia.org/wiki/Molecular_clock" title="Molecular clock"&gt;molecular clock&lt;/a&gt;" hypothesis that a roughly constant rate of evolutionary change can be used to extrapolate the elapsed time since two genes first diverged (that is, the &lt;a href="http://en.wikipedia.org/wiki/Coalescence_%28genetics%29" title="Coalescence (genetics)"&gt;coalescence&lt;/a&gt; time), assumes that the effects of mutation and &lt;a href="http://en.wikipedia.org/wiki/Natural_selection" title="Natural selection"&gt;selection&lt;/a&gt; are constant across sequence lineages. It therefore does not account for possible differences among organisms or species in the rates of &lt;a href="http://en.wikipedia.org/wiki/DNA_repair" title="DNA repair"&gt;DNA repair&lt;/a&gt; or the possible functional conservation of specific regions in a sequence. (In the case of nucleotide sequences, the molecular clock hypothesis in its most basic form also discounts the difference in acceptance rates between &lt;a href="http://en.wikipedia.org/wiki/Silent_mutation" title="Silent mutation"&gt;silent mutations&lt;/a&gt; that do not alter the meaning of a given &lt;a href="http://en.wikipedia.org/wiki/Codon" title="Codon"&gt;codon&lt;/a&gt; and other mutations that result in a different &lt;a href="http://en.wikipedia.org/wiki/Amino_acid" title="Amino acid"&gt;amino acid&lt;/a&gt; being incorporated into the protein.) More statistically accurate methods allow the evolutionary rate on each branch of the phylogenetic tree to vary, thus producing better estimates of coalescence times for genes.&lt;/p&gt; &lt;p&gt;Progressive multiple alignment techniques produce a phylogenetic tree by necessity because they incorporate sequences into the growing alignment in order of relatedness. Other techniques that assemble MSAs and phylogenetic trees score and sort trees first and calculate an MSA from the highest-scoring tree. 
Commonly used methods of phylogenetic tree construction are mainly &lt;a href="http://en.wikipedia.org/wiki/Heuristic" title="Heuristic"&gt;heuristic&lt;/a&gt; because the problem of selecting the optimal tree, like the problem of selecting the optimal MSA, is &lt;a href="http://en.wikipedia.org/wiki/NP-hard" title="NP-hard"&gt;NP-hard&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="Assessment_of_significance" id="Assessment_of_significance"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Assessment of significance&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Sequence alignments are useful in bioinformatics for identifying sequence similarity, producing phylogenetic trees, and developing homology models of protein structures. However, the biological relevance of sequence alignments is not always clear. Alignments are often assumed to reflect a degree of evolutionary change between sequences descended from a common ancestor; however, it is formally possible that &lt;a href="http://en.wikipedia.org/wiki/Convergent_evolution" title="Convergent evolution"&gt;convergent evolution&lt;/a&gt; can occur to produce apparent similarity between proteins that are evolutionarily unrelated but perform similar functions and have similar structures.&lt;/p&gt; &lt;p&gt;In database searches such as BLAST, statistical methods can determine the likelihood of a particular alignment between sequences or sequence regions arising by chance given the size and composition of the database being searched. These values can vary significantly depending on the search space. In particular, the likelihood of finding a given alignment by chance increases if the database consists only of sequences from the same organism as the query sequence. 
Repetitive sequences in the database or query can also distort both the search results and the assessment of statistical significance; BLAST automatically filters such repetitive sequences in the query to avoid apparent hits that are statistical artifacts.&lt;/p&gt; &lt;p&gt;&lt;a name="Scoring_functions" id="Scoring_functions"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Scoring functions&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The choice of a scoring function that reflects biological or statistical observations about known sequences is important to producing good alignments. Protein sequences are frequently aligned using &lt;a href="http://en.wikipedia.org/wiki/Substitution_matrix" title="Substitution matrix"&gt;substitution matrices&lt;/a&gt; that reflect the probabilities of given character-to-character substitutions. A series of matrices called &lt;a href="http://en.wikipedia.org/wiki/Point_accepted_mutation" title="Point accepted mutation"&gt;PAM matrices&lt;/a&gt; (Point Accepted Mutation matrices, originally defined by &lt;a href="http://en.wikipedia.org/wiki/Margaret_Dayhoff" title="Margaret Dayhoff"&gt;Margaret Dayhoff&lt;/a&gt; and sometimes referred to as "Dayhoff matrices") explicitly encode evolutionary approximations regarding the rates and probabilities of particular amino acid mutations. Another common series of scoring matrices, known as &lt;a href="http://en.wikipedia.org/wiki/BLOSUM" title="BLOSUM"&gt;BLOSUM&lt;/a&gt; (Blocks Substitution Matrix), encodes empirically derived substitution probabilities. Variants of both types of matrices are used to detect sequences with differing levels of divergence, thus allowing users of BLAST or FASTA to restrict searches to more closely related matches or expand to detect more divergent sequences. 
&lt;a href="http://en.wikipedia.org/wiki/Gap_penalty" title="Gap penalty"&gt;Gap penalties&lt;/a&gt; account for the introduction of a gap - on the evolutionary model, an insertion or deletion mutation - in both nucleotide and protein sequences, and therefore the penalty values should be proportional to the expected rate of such mutations. The quality of the alignments produced therefore depends on the quality of the scoring function.&lt;/p&gt; &lt;p&gt;It can be very useful and instructive to try the same alignment several times with different choices for scoring matrix and/or gap penalty values and compare the results. Regions where the solution is weak or non-unique can often be identified by observing which regions of the alignment are robust to variations in alignment parameters.&lt;/p&gt; &lt;p&gt;&lt;a name="Non-biological_uses" id="Non-biological_uses"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Non-biological uses&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;The methods used for biological sequence alignment have also found applications in other fields, most notably in &lt;a href="http://en.wikipedia.org/wiki/Natural_language_processing" title="Natural language processing"&gt;natural language processing&lt;/a&gt;. 
Techniques that generate the set of elements from which words will be selected in natural-language generation algorithms have borrowed multiple sequence alignment techniques from bioinformatics to produce linguistic versions of computer-generated mathematical proofs. In the field of historical and comparative &lt;a href="http://en.wikipedia.org/wiki/Linguistics" title="Linguistics"&gt;linguistics&lt;/a&gt;, sequence alignment has been used to partially automate the &lt;a href="http://en.wikipedia.org/wiki/Comparative_method" title="Comparative method"&gt;comparative method&lt;/a&gt; by which linguists traditionally reconstruct languages. Business and marketing research has also applied MSA techniques in analyzing series of purchases over time.&lt;/p&gt; &lt;p&gt;&lt;a name="Software" id="Software"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Software&lt;/span&gt;&lt;/h2&gt; &lt;dl&gt;&lt;dd&gt; &lt;div class="noprint"&gt;&lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment_software" title="Sequence alignment software"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/i&gt;&lt;/div&gt; &lt;/dd&gt;&lt;/dl&gt; &lt;p&gt;Common software tools used for general sequence alignment tasks include &lt;a href="http://www2.ebi.ac.uk/clustalw/" class="external text" title="http://www2.ebi.ac.uk/clustalw/" rel="nofollow"&gt;ClustalW&lt;/a&gt; and &lt;a href="http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi" class="external text" title="http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi" rel="nofollow"&gt;T-Coffee&lt;/a&gt; for alignment, and &lt;a href="http://ncbi.nih.gov/BLAST/" class="external text" title="http://ncbi.nih.gov/BLAST/" rel="nofollow"&gt;BLAST&lt;/a&gt; for database searching. 
A more complete list of available software categorized by algorithm and alignment type is available at &lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment_software" title="Sequence alignment software"&gt;sequence alignment software&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Alignment algorithms and software can be directly compared to one another using a standardized set of &lt;a href="http://en.wikipedia.org/wiki/Benchmark" title="Benchmark"&gt;benchmark&lt;/a&gt; reference multiple sequence alignments known as BAliBASE. The data set consists of structural alignments, which can be considered a standard against which purely sequence-based methods are compared. The relative performance of many common alignment methods on frequently encountered alignment problems has been tabulated and selected results published online at &lt;a href="http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE/prog_scores.html" class="external text" title="http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE/prog_scores.html" rel="nofollow"&gt;BAliBASE&lt;/a&gt;. 
A comprehensive list of BAliBASE scores for many (currently 12) different alignment tools can be computed within the protein workbench [&lt;a href="http://3d-alignment.eu/" class="external text" title="http://3d-alignment.eu/" rel="nofollow"&gt;STRAP&lt;/a&gt;].&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-7596630656169546593?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/7596630656169546593/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=7596630656169546593' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/7596630656169546593'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/7596630656169546593'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/sequence-alignment.html' title='sequence alignment'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-117276202397324186</id><published>2007-07-21T23:38:00.000-07:00</published><updated>2007-07-21T23:42:09.158-07:00</updated><title type='text'>Protein structure prediction</title><content type='html'>&lt;b&gt;&lt;a href="http://en.wikipedia.org/wiki/Protein_structure" title="Protein structure"&gt;Protein structure&lt;/a&gt; prediction&lt;/b&gt; is one of the most significant technologies pursued by &lt;a 
href="http://en.wikipedia.org/wiki/Bioinformatics" title="Bioinformatics"&gt;computational structural biology&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Theoretical_chemistry" title="Theoretical chemistry"&gt;theoretical chemistry&lt;/a&gt;. It aims to determine the three-dimensional structure of &lt;a href="http://en.wikipedia.org/wiki/Protein" title="Protein"&gt;proteins&lt;/a&gt; from their &lt;a href="http://en.wikipedia.org/wiki/Amino_acid" title="Amino acid"&gt;amino acid&lt;/a&gt; sequences (an example of &lt;a href="http://en.wikipedia.org/wiki/Emergence" title="Emergence"&gt;emergence&lt;/a&gt;). In more formal terms, this is expressed as the prediction of protein &lt;a href="http://en.wikipedia.org/wiki/Tertiary_structure" title="Tertiary structure"&gt;tertiary structure&lt;/a&gt; from &lt;a href="http://en.wikipedia.org/wiki/Primary_structure" title="Primary structure"&gt;primary structure&lt;/a&gt;. Given the usefulness of known protein structures in such valuable tasks as &lt;a href="http://en.wikipedia.org/wiki/Rational_drug_design" title="Rational drug design"&gt;rational drug design&lt;/a&gt;, this is a highly active field of research.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;&lt;span class="mw-headline"&gt;Overview&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;The practical role of protein structure prediction is now more important than ever. Massive amounts of protein sequence data may be derived from modern large-scale &lt;a href="http://en.wikipedia.org/wiki/DNA" title="DNA"&gt;DNA&lt;/a&gt; sequencing efforts such as the &lt;a href="http://en.wikipedia.org/wiki/Human_Genome_Project" title="Human Genome Project"&gt;Human Genome Project&lt;/a&gt;. 
Despite community-wide efforts in &lt;a href="http://en.wikipedia.org/wiki/Structural_genomics" title="Structural genomics"&gt;structural genomics&lt;/a&gt;, the output of experimentally determined protein structures (typically obtained by time-consuming and relatively expensive &lt;a href="http://en.wikipedia.org/wiki/X-ray_crystallography" title="X-ray crystallography"&gt;X-ray crystallography&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Protein_NMR" title="Protein NMR"&gt;NMR spectroscopy&lt;/a&gt;) is lagging far behind the output of protein sequences.&lt;/p&gt; &lt;p&gt;Several factors make protein structure prediction a very difficult task, including:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;The number of possible structures that proteins may possess is extremely large.&lt;/li&gt;&lt;li&gt;The physical basis of protein structural stability is not fully understood.&lt;/li&gt;&lt;li&gt;The tertiary structure of a native protein may not be readily formed without the aid of &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Trans" title="Trans"&gt;trans&lt;/a&gt;&lt;/i&gt;-acting factors. 
For example, proteins known as &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Chaperone" title="Chaperone"&gt;chaperones&lt;/a&gt;&lt;/i&gt; are required for some proteins to properly fold; other proteins cannot fold properly without modifications such as &lt;a href="http://en.wikipedia.org/wiki/Glycosylation" title="Glycosylation"&gt;glycosylation&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;A particular sequence may be able to assume multiple conformations depending on its environment, and the biologically active conformation may not be the most &lt;a href="http://en.wikipedia.org/wiki/Thermodynamics" title="Thermodynamics"&gt;thermodynamically&lt;/a&gt; favorable.&lt;/li&gt;&lt;li&gt;Direct simulation of &lt;a href="http://en.wikipedia.org/wiki/Protein_folding" title="Protein folding"&gt;protein folding&lt;/a&gt; via methods such as &lt;a href="http://en.wikipedia.org/wiki/Molecular_dynamics" title="Molecular dynamics"&gt;molecular dynamics&lt;/a&gt; is not tractable for both practical and theoretical reasons except in very small proteins, despite the efforts of distributed computing projects such as &lt;a href="http://en.wikipedia.org/wiki/Folding%40home" title="Folding@home"&gt;Folding@home&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Due to exponentially improving computer power, and new algorithms, much progress is being made to overcome these factors by the many research groups that are interested in the task. Prediction of structures for small proteins is now a perfectly realistic goal. A wide range of approaches are routinely applied for such predictions. 
These approaches fall into two broad classes: &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/Ab_initio" title="Ab initio"&gt;ab initio&lt;/a&gt;&lt;/i&gt; modelling and comparative modelling.&lt;/p&gt; &lt;p&gt;&lt;a name="Ab_initio_protein_modelling" id="Ab_initio_protein_modelling"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;&lt;i&gt;Ab initio&lt;/i&gt; protein modelling&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;&lt;i&gt;Ab initio&lt;/i&gt; or &lt;i&gt;de novo&lt;/i&gt; protein modelling methods seek to build three-dimensional protein models "from scratch", i.e., based on physical principles rather than (directly) on previously solved structures. There are many possible procedures that either attempt to mimic &lt;a href="http://en.wikipedia.org/wiki/Protein_folding" title="Protein folding"&gt;protein folding&lt;/a&gt; or apply some &lt;a href="http://en.wikipedia.org/wiki/Stochastic" title="Stochastic"&gt;stochastic&lt;/a&gt; method to search possible solutions (i.e., &lt;a href="http://en.wikipedia.org/wiki/Global_optimization" title="Global optimization"&gt;global optimization&lt;/a&gt; of a suitable energy function). These procedures tend to require vast computational resources, and have thus only been carried out for tiny proteins. 
To predict protein structure &lt;i&gt;de novo&lt;/i&gt; for larger proteins will require better algorithms and larger computational resources like those afforded by either powerful supercomputers (such as &lt;a href="http://en.wikipedia.org/wiki/Blue_Gene" title="Blue Gene"&gt;Blue Gene&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/MDGRAPE-3" title="MDGRAPE-3"&gt;MDGRAPE-3&lt;/a&gt;) or distributed computing (such as &lt;a href="http://en.wikipedia.org/wiki/Folding%40home" title="Folding@home"&gt;Folding@home&lt;/a&gt;, the &lt;a href="http://en.wikipedia.org/wiki/Human_Proteome_Folding_Project" title="Human Proteome Folding Project"&gt;Human Proteome Folding Project&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Rosetta%40Home" title="Rosetta@Home"&gt;Rosetta@Home&lt;/a&gt;). Although these computational barriers are vast, the potential benefits of structural genomics (by predicted or experimental methods) make &lt;i&gt;ab initio&lt;/i&gt; structure prediction an active research field.&lt;/p&gt; &lt;p&gt;&lt;a name="Comparative_protein_modelling" id="Comparative_protein_modelling"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Comparative protein modelling&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Comparative protein modelling uses previously solved structures as starting points, or templates. This is effective because it appears that although the number of actual proteins is vast, there is a limited set of &lt;a href="http://en.wikipedia.org/wiki/Tertiary_structure" title="Tertiary structure"&gt;tertiary&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Structural_motif" title="Structural motif"&gt;structural motifs&lt;/a&gt; to which most proteins belong. 
It has been suggested that there are only around 2000 distinct protein folds in nature, though there are many millions of different proteins.&lt;/p&gt; &lt;p&gt;These methods may also be split into two groups:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;b&gt;&lt;a href="http://en.wikipedia.org/wiki/Homology_modelling" title="Homology modelling"&gt;Homology modelling&lt;/a&gt;&lt;/b&gt; is based on the reasonable assumption that two &lt;a href="http://en.wikipedia.org/wiki/Homology_%28biology%29#Homology_of_sequences_in_genetics" title="Homology (biology)"&gt;homologous&lt;/a&gt; proteins will share very similar structures. Because a protein's fold is more evolutionarily conserved than its amino acid sequence, a target sequence can be modeled with reasonable accuracy on a very distantly related template, provided that the relationship between target and template can be discerned through &lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment" title="Sequence alignment"&gt;sequence alignment&lt;/a&gt;. It has been suggested that the primary bottleneck in comparative modelling arises from difficulties in alignment rather than from errors in structure prediction given a known-good alignment. Unsurprisingly, homology modelling is most accurate when the target and template have similar sequences.&lt;/li&gt;&lt;/ul&gt; &lt;ul&gt;&lt;li&gt;&lt;a href="http://en.wikipedia.org/wiki/Protein_threading" title="Protein threading"&gt;&lt;b&gt;Protein threading&lt;/b&gt;&lt;/a&gt; scans the amino acid sequence of an unknown structure against a database of solved structures. In each case, a scoring function is used to assess the compatibility of the sequence to the structure, thus yielding possible three-dimensional models. This type of method is also known as &lt;b&gt;3D-1D fold recognition&lt;/b&gt; due to its compatibility analysis between three-dimensional structures and linear protein sequences. 
This method has also given rise to methods performing an &lt;b&gt;inverse folding search&lt;/b&gt; by evaluating the compatibility of a given structure with a large database of sequences, thus predicting which sequences have the potential to produce a given fold.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;&lt;a name="Side_chain_geometry_prediction" id="Side_chain_geometry_prediction"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Side chain geometry prediction&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Even structure prediction methods that are reasonably accurate for the peptide backbone often get the orientation and packing of the amino acid &lt;a href="http://en.wikipedia.org/wiki/Side_chain" title="Side chain"&gt;side chains&lt;/a&gt; wrong. Methods that specifically address the problem of predicting side chain geometry include &lt;a href="http://en.wikipedia.org/wiki/Dead-end_elimination" title="Dead-end elimination"&gt;dead-end elimination&lt;/a&gt; and the &lt;a href="http://en.wikipedia.org/wiki/Self-consistent_mean_field_%28biology%29" title="Self-consistent mean field (biology)"&gt;self-consistent mean field&lt;/a&gt; method. Both discretize the continuously varying &lt;a href="http://en.wikipedia.org/wiki/Dihedral_angle" title="Dihedral angle"&gt;dihedral angles&lt;/a&gt; that determine a side chain's orientation relative to the backbone into a set of &lt;a href="http://en.wikipedia.org/wiki/Rotamer" title="Rotamer"&gt;rotamers&lt;/a&gt; with fixed dihedral angles. The methods then attempt to identify the set of rotamers that minimize the model's overall energy. Rotamers are the side chain conformations with low energy. 
Such methods are most useful for analyzing the protein's &lt;a href="http://en.wikipedia.org/wiki/Hydrophobic" title="Hydrophobic"&gt;hydrophobic&lt;/a&gt; core, where side chains are more closely packed; they have more difficulty addressing the looser constraints and higher flexibility of surface residues.&lt;/p&gt; &lt;p&gt;&lt;a name="Software" id="Software"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Software&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/MODELLER" title="MODELLER"&gt;MODELLER&lt;/a&gt; is a popular software tool for producing homology models using methodology derived from &lt;a href="http://en.wikipedia.org/wiki/Protein_NMR" title="Protein NMR"&gt;NMR spectroscopy&lt;/a&gt; data processing. &lt;a href="http://swissmodel.expasy.org//SWISS-MODEL.html" class="external text" title="http://swissmodel.expasy.org//SWISS-MODEL.html" rel="nofollow"&gt;SwissModel&lt;/a&gt; provides an automated web server for basic homology modeling. A common software tool for protein threading is &lt;a href="http://www.sbg.bio.ic.ac.uk/%7E3dpssm/" class="external text" title="http://www.sbg.bio.ic.ac.uk/~3dpssm/" rel="nofollow"&gt;3D-PSSM&lt;/a&gt;. The basic algorithm for threading is described in and is fairly straightforward to implement.&lt;/p&gt; &lt;p&gt;&lt;a href="http://www.eidogen-sertanty.com/products_tip_content.html" class="external text" title="http://www.eidogen-sertanty.com/products_tip_content.html" rel="nofollow"&gt;TIP&lt;/a&gt; is a knowledgebase of STRUCTFAST&lt;sup id="_ref-debe2006_0" class="reference"&gt;&lt;a href="http://en.wikipedia.org/wiki/Protein_structure_prediction#_note-debe2006" title=""&gt;&lt;/a&gt;&lt;/sup&gt; models and precomputed similarity relationships between sequences, structures, and binding sites.&lt;/p&gt; &lt;p&gt;A very recent review of currently popular software for structure prediction can be found at. 
A partial list of web servers and available tools is maintained .&lt;/p&gt; &lt;p&gt;Several &lt;a href="http://en.wikipedia.org/wiki/Distributed_computing" title="Distributed computing"&gt;distributed computing&lt;/a&gt; projects concerning protein structure prediction have also been implemented, such as the &lt;a href="http://en.wikipedia.org/wiki/Folding%40home" title="Folding@home"&gt;Folding@home&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Rosetta%40home" title="Rosetta@home"&gt;Rosetta@home&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Human_Proteome_Folding_Project" title="Human Proteome Folding Project"&gt;Human Proteome Folding Project&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Predictor%40home" title="Predictor@home"&gt;Predictor@home&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/TANPAKU" title="TANPAKU"&gt;TANPAKU&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="Protein-protein_complexes" id="Protein-protein_complexes"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Protein-protein complexes&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;In the case of &lt;a href="http://en.wikipedia.org/wiki/Protein_complex" title="Protein complex"&gt;complexes of two or more proteins&lt;/a&gt;, where the structures of the proteins are known or can be predicted with high accuracy, &lt;a href="http://en.wikipedia.org/wiki/Protein-protein_docking" title="Protein-protein docking"&gt;protein-protein docking&lt;/a&gt; methods can be used to predict the structure of the complex. 
Information on the effect of mutations at specific sites on the affinity of the complex helps in understanding the complex structure and in guiding docking methods.&lt;/p&gt; Protein structure prediction is one of the most significant technologies pursued by &lt;a href="http://en.wikipedia.org/wiki/Bioinformatics" title="Bioinformatics"&gt;computational structural biology&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Theoretical_chemistry" title="Theoretical chemistry"&gt;theoretical chemistry&lt;/a&gt;. It aims to determine the three-dimensional structure of &lt;a href="http://en.wikipedia.org/wiki/Protein" title="Protein"&gt;proteins&lt;/a&gt; from their &lt;a href="http://en.wikipedia.org/wiki/Amino_acid" title="Amino acid"&gt;amino acid&lt;/a&gt; sequences (an example of &lt;a href="http://en.wikipedia.org/wiki/Emergence" title="Emergence"&gt;emergence&lt;/a&gt;). In more formal terms, this is expressed as the prediction of protein &lt;a href="http://en.wikipedia.org/wiki/Tertiary_structure" title="Tertiary structure"&gt;tertiary structure&lt;/a&gt; from &lt;a href="http://en.wikipedia.org/wiki/Primary_structure" title="Primary structure"&gt;primary structure&lt;/a&gt;. 
Given the usefulness of known protein structures in such valuable tasks as &lt;a href="http://en.wikipedia.org/wiki/Rational_drug_design" title="Rational drug design"&gt;rational drug design&lt;/a&gt; this is a highly active field of research.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-117276202397324186?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/117276202397324186/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=117276202397324186' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/117276202397324186'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/117276202397324186'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/protein-structure-prediction.html' title='Protein structure prediction'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-3225446551765492741</id><published>2007-07-21T23:34:00.000-07:00</published><updated>2007-07-21T23:36:57.567-07:00</updated><title type='text'>Protein-protein docking</title><content type='html'>&lt;p&gt;&lt;b&gt;Protein-protein docking&lt;/b&gt; is the determination of the molecular structure of &lt;a href="http://en.wikipedia.org/wiki/Complex_%28chemistry%29" title="Complex (chemistry)"&gt;complexes&lt;/a&gt; formed by two or more 
&lt;a href="http://en.wikipedia.org/wiki/Proteins" title="Proteins"&gt;proteins&lt;/a&gt; without the need for &lt;a href="http://en.wikipedia.org/wiki/Experiment" title="Experiment"&gt;experimental&lt;/a&gt; measurement. The study of protein-protein docking was boosted by the rapid increase in available protein structures of the &lt;a href="http://en.wikipedia.org/wiki/1990s" title="1990s"&gt;1990s&lt;/a&gt;, and it has now been under intensive research for over a decade. Many proteins which remain relatively rigid upon complexation can now be successfully docked. Methods are under development to handle cases where the internal conformation of one or more of the partners changes substantially.&lt;/p&gt; &lt;p&gt;Protein-protein docking generally does not refer to describing the path taken by the components during complexation; the only object of docking is the final complexed state. Since the natural use of "docking" suggests guidance along a path, "protein-protein docking" may be regarded as a misnomer.&lt;/p&gt;&lt;br /&gt;&lt;h2&gt;&lt;span class="mw-headline"&gt;Introduction&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;For most of the proteins known to science, their &lt;a href="http://en.wikipedia.org/wiki/Biology" title="Biology"&gt;biological&lt;/a&gt; role, as characterized by which other proteins they interact with, is incompletely understood. Even proteins which participate in a well-understood biological process (&lt;i&gt;e. g.&lt;/i&gt; the &lt;a href="http://en.wikipedia.org/wiki/Krebs_cycle" title="Krebs cycle"&gt;Krebs cycle&lt;/a&gt;) may have interaction partners or functions which are unrelated to that process. 
Moreover, vast numbers of "hypothetical" proteins were discovered in the &lt;a href="http://en.wikipedia.org/wiki/Genome" title="Genome"&gt;genomic&lt;/a&gt; revolution of the late 1990s; nothing is known about them apart from their amino acid sequences.&lt;/p&gt; &lt;p&gt;In cases of known protein-protein interactions, other questions arise. &lt;a href="http://en.wikipedia.org/wiki/Genetic_disease" title="Genetic disease"&gt;Genetic diseases&lt;/a&gt; are known to be caused by misfolded or &lt;a href="http://en.wikipedia.org/wiki/Mutation" title="Mutation"&gt;mutated&lt;/a&gt; proteins (&lt;i&gt;e.g.&lt;/i&gt; &lt;a href="http://en.wikipedia.org/wiki/Cystic_fibrosis" title="Cystic fibrosis"&gt;cystic fibrosis&lt;/a&gt;), and there is a desire to understand what, if any, anomalous protein-protein interactions a given mutation can cause. In the distant future, proteins may be designed to perform biological functions, and a determination of the potential interactions of such proteins will be essential.&lt;/p&gt; &lt;p&gt;For any given set of proteins, the following questions may arise:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;Do the proteins bind &lt;i&gt;&lt;a href="http://en.wikipedia.org/wiki/In_vivo" title="In vivo"&gt;in vivo&lt;/a&gt;&lt;/i&gt;?&lt;/li&gt;&lt;/ul&gt; &lt;dl&gt;&lt;dd&gt;If they bind, &lt;ul&gt;&lt;li&gt;What is the spatial configuration which they adopt in their &lt;a href="http://en.wikipedia.org/wiki/Bound_state" title="Bound state"&gt;bound state&lt;/a&gt;?&lt;/li&gt;&lt;li&gt;How strong or weak is their &lt;a href="http://en.wikipedia.org/wiki/Interaction" title="Interaction"&gt;interaction&lt;/a&gt;?&lt;/li&gt;&lt;/ul&gt; &lt;/dd&gt;&lt;dd&gt;If they do not bind, can they be made to bind by inducing a mutation?&lt;/dd&gt;&lt;/dl&gt; &lt;p&gt;Protein-protein docking is proposed to have the ultimate potential to address all these issues comprehensively. 
Furthermore, since docking methods can be based on purely &lt;a href="http://en.wikipedia.org/wiki/Physics" title="Physics"&gt;physical&lt;/a&gt; principles, even proteins of unknown function (or which have been studied relatively little) may be docked. The only prerequisite is that their &lt;a href="http://en.wikipedia.org/wiki/Molecular_structure" title="Molecular structure"&gt;molecular structure&lt;/a&gt; has been either determined experimentally, or can be estimated by some theoretical technique (see &lt;a href="http://en.wikipedia.org/wiki/Protein_structure_prediction" title="Protein structure prediction"&gt;protein structure prediction&lt;/a&gt;).&lt;/p&gt; &lt;p&gt;&lt;a name="Rigid-body_docking_vs._flexible_docking" id="Rigid-body_docking_vs._flexible_docking"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Rigid-body docking &lt;i&gt;vs&lt;/i&gt;. flexible docking&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;If the &lt;a href="http://en.wikipedia.org/wiki/Molecular_geometry" title="Molecular geometry"&gt;bond angles, bond lengths and torsion angles&lt;/a&gt; of the components are not modified at any stage of complex generation, it is known as &lt;i&gt;rigid body docking&lt;/i&gt;. A subject of speculation is whether or not rigid-body docking is sufficiently good for most docking. When substantial conformational change occurs within the components at the time of complex formation, rigid-body docking is inadequate. However, scoring all possible conformational changes is prohibitively expensive in computer time. 
Docking procedures which permit conformational change, or &lt;i&gt;flexible docking&lt;/i&gt; procedures, must intelligently select a small subset of possible conformational changes for consideration.&lt;/p&gt; &lt;p&gt;&lt;a name="Methods" id="Methods"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Methods&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Successful docking requires meeting two criteria:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;Generating a set of configurations which reliably includes at least one nearly correct one.&lt;/li&gt;&lt;li&gt;Reliably distinguishing nearly correct configurations from the others.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;For many interactions, the binding site is known on one or more of the proteins to be docked. This is the case for &lt;a href="http://en.wikipedia.org/wiki/Antibody" title="Antibody"&gt;antibodies&lt;/a&gt; and for &lt;a href="http://en.wikipedia.org/wiki/Competitive_inhibitor" title="Competitive inhibitor"&gt;competitive inhibitors&lt;/a&gt;. In other cases, a binding site may be strongly suggested by &lt;a href="http://en.wikipedia.org/wiki/Mutagenesis" title="Mutagenesis"&gt;mutagenic&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Phylogeny" title="Phylogeny"&gt;phylogenetic&lt;/a&gt; evidence. Configurations where the proteins interpenetrate severely may also be ruled out &lt;i&gt;a priori&lt;/i&gt;.&lt;/p&gt; &lt;p&gt;After making exclusions based on prior knowledge or &lt;a href="http://en.wikipedia.org/wiki/Stereochemistry" title="Stereochemistry"&gt;stereochemical&lt;/a&gt; clash, the remaining space of possible complexed structures must be sampled exhaustively, evenly and with sufficient coverage to guarantee a near hit. Each configuration must be scored with a measure that is capable of ranking a nearly correct structure above at least 100,000 alternatives. 
This is a computationally intensive task, and a variety of strategies have been developed.&lt;/p&gt; &lt;p&gt;&lt;a name="Reciprocal_space_methods" id="Reciprocal_space_methods"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Reciprocal space methods&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Each of the proteins may be represented as a simple cubic lattice. Then, for the class of scores which are discrete &lt;a href="http://en.wikipedia.org/wiki/Convolution" title="Convolution"&gt;convolutions&lt;/a&gt;, configurations related to each other by translation of one protein by an exact lattice vector can all be scored almost simultaneously by applying the &lt;a href="http://en.wikipedia.org/wiki/Convolution_theorem" title="Convolution theorem"&gt;convolution theorem&lt;/a&gt;&lt;span class="reference plainlinksneverexpand" id="ref_Katzir"&gt;&lt;sup&gt;&lt;a href="http://en.wikipedia.org/wiki/Protein-protein_docking#endnote_Katzir" class="external autonumber" title="http://en.wikipedia.org/wiki/Protein-protein_docking#endnote_Katzir" rel="nofollow"&gt;[1]&lt;/a&gt;&lt;/sup&gt;&lt;/span&gt;. It is possible to construct reasonable, if approximate, convolution-like scoring functions representing both stereochemical and electrostatic fitness.&lt;/p&gt; &lt;p&gt;Reciprocal space methods have been used extensively for their ability to evaluate enormous numbers of configurations. They lose their speed advantage if torsional changes are introduced. Another drawback is that it is impossible to make efficient use of prior knowledge. 
The question also remains whether convolutions are too limited a class of scoring function to identify the best complex reliably.&lt;/p&gt; &lt;p&gt;&lt;a name="Monte_Carlo_methods" id="Monte_Carlo_methods"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Monte Carlo methods&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;In &lt;a href="http://en.wikipedia.org/wiki/Monte_Carlo_method" title="Monte Carlo method"&gt;Monte Carlo&lt;/a&gt;, an initial configuration is refined by taking random steps which are accepted or rejected based on their induced improvement in score (see the &lt;a href="http://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm" title="Metropolis-Hastings algorithm"&gt;Metropolis criterion&lt;/a&gt;), until a certain number of steps have been tried. The assumption is that convergence to the best structure should occur from a large class of initial configurations, only one of which needs to be considered. Initial configurations may be sampled coarsely, and much computation time can be saved. Because of the difficulty of finding a scoring function which is both highly discriminating for the correct configuration and also converges to the correct configuration from a distance, the use of two levels of refinement, with different scoring functions, has been proposed &lt;span class="reference plainlinksneverexpand" id="ref_Gray"&gt;&lt;sup&gt;&lt;a href="http://en.wikipedia.org/wiki/Protein-protein_docking#endnote_Gray" class="external autonumber" title="http://en.wikipedia.org/wiki/Protein-protein_docking#endnote_Gray" rel="nofollow"&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/span&gt;. Torsion can be introduced naturally to Monte Carlo as an additional property of each random move.&lt;/p&gt; &lt;p&gt;Monte Carlo methods are not guaranteed to search exhaustively, so that the best configuration may be missed even using a scoring function which would in theory identify it. 
How severe a problem this is for docking has not been firmly established.&lt;/p&gt; &lt;p&gt;&lt;a name="Selecting_the_docked_complex_structure" id="Selecting_the_docked_complex_structure"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Selecting the docked complex structure&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;To find a score which forms a consistent basis for selecting the best configuration, studies are carried out on a standard benchmark &lt;a href="http://en.wikipedia.org/wiki/Protein-protein_docking#Benchmark" title="Protein-protein docking"&gt;(see below)&lt;/a&gt; of protein-protein interaction cases. Scoring functions are assessed on the rank they assign to the best structure (ideally the best structure should be ranked 1), and on their coverage (the proportion of the benchmark cases for which they achieve an acceptable result). Types of scores studied include:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;a href="http://en.wikipedia.org/wiki/Heuristic" title="Heuristic"&gt;Heuristic&lt;/a&gt; scores based on &lt;a href="http://en.wikipedia.org/wiki/Amino-acid" title="Amino-acid"&gt;residue&lt;/a&gt; contacts.&lt;/li&gt;&lt;li&gt;Shape complementarity of &lt;a href="http://en.wikipedia.org/wiki/Molecular_surface" title="Molecular surface"&gt;molecular surfaces&lt;/a&gt; ("stereochemistry").&lt;/li&gt;&lt;li&gt;Free energies, estimated using parameters from &lt;a href="http://en.wikipedia.org/wiki/Molecular_mechanics" title="Molecular mechanics"&gt;molecular mechanics&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Force_field_%28chemistry%29" title="Force field (chemistry)"&gt;force fields&lt;/a&gt; such as &lt;a href="http://en.wikipedia.org/wiki/CHARMM" title="CHARMM"&gt;CHARMM&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/AMBER" title="AMBER"&gt;AMBER&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Phylogenetic desirability of the interacting regions.&lt;/li&gt;&lt;li&gt;Clustering 
coefficients.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;It is usual to create hybrid scores by combining one or more categories above in a weighted sum whose weights are optimized on cases from the benchmark. To avoid bias, the benchmark cases used to optimize the weights must not overlap with the cases used to make the final test of the score.&lt;/p&gt; &lt;p&gt;&lt;a name="Benchmark" id="Benchmark"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Benchmark&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;A benchmark of 84 protein-protein interactions with known complexed structures has been developed for testing docking methods&lt;span class="reference plainlinksneverexpand" id="ref_Mintseris"&gt;&lt;sup&gt;&lt;a href="http://en.wikipedia.org/wiki/Protein-protein_docking#endnote_Mintseris" class="external autonumber" title="http://en.wikipedia.org/wiki/Protein-protein_docking#endnote_Mintseris" rel="nofollow"&gt;[3]&lt;/a&gt;&lt;/sup&gt;&lt;/span&gt;. The set is chosen to cover a wide range of interaction types, and to avoid repeated features, such as the profile of interactors' structural families according to the &lt;a href="http://en.wikipedia.org/wiki/Structural_Classification_of_Proteins" title="Structural Classification of Proteins"&gt;SCOP&lt;/a&gt; database. Benchmark elements are classified into three levels of difficulty (the most difficult containing the largest change in backbone conformation). 
The protein-protein docking benchmark contains examples of enzyme-inhibitor, antigen-antibody and homomultimeric complexes.&lt;/p&gt; &lt;p&gt;&lt;a name="The_CAPRI_assessment" id="The_CAPRI_assessment"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;The CAPRI assessment&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;The Critical Assessment of Predicted Interactions (&lt;a href="http://capri.ebi.ac.uk/" class="external text" title="http://capri.ebi.ac.uk" rel="nofollow"&gt;CAPRI&lt;/a&gt;)&lt;span class="reference plainlinksneverexpand" id="ref_Proteins"&gt;&lt;sup&gt;&lt;a href="http://en.wikipedia.org/wiki/Protein-protein_docking#endnote_Proteins" class="external autonumber" title="http://en.wikipedia.org/wiki/Protein-protein_docking#endnote_Proteins" rel="nofollow"&gt;[4]&lt;/a&gt;&lt;/sup&gt;&lt;/span&gt; is an ongoing series of events in which researchers throughout the community try to dock the same proteins, as provided by the assessors. Rounds take place approximately every 6 months. Each round contains between one and six target protein-protein complexes whose structures have been recently determined experimentally. The coordinates are held privately by the assessors, with the cooperation of the &lt;a href="http://en.wikipedia.org/wiki/Structural_biology" title="Structural biology"&gt;structural biologists&lt;/a&gt; who determined them. The assessment of submissions is &lt;a href="http://en.wikipedia.org/wiki/Double_blind" title="Double blind"&gt;double blind&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;CAPRI attracts a high level of participation (37 groups participated worldwide in round seven) and a high level of interest from the biological community in general. Although CAPRI results are of little statistical significance owing to the small number of targets in each round, the role of CAPRI in stimulating discourse is significant. 
(The &lt;a href="http://en.wikipedia.org/wiki/CASP" title="CASP"&gt;CASP&lt;/a&gt; assessment is a similar exercise in the field of protein structure prediction.)&lt;/p&gt; &lt;p&gt;&lt;a name="Deciding_whether_a_complex_actually_occurs_in_nature_and_measuring_its_affinity" id="Deciding_whether_a_complex_actually_occurs_in_nature_and_measuring_its_affinity"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Deciding whether a complex actually occurs in nature and measuring its affinity&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;A reliable method for affinity prediction has the potential to transform biochemistry and cell biology. Though a distant prospect, affinity prediction may be considered the ultimate achievement in protein-protein docking.&lt;/p&gt; &lt;p&gt;&lt;a name="Protein-protein_docking_and_molecular_docking" id="Protein-protein_docking_and_molecular_docking"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Protein-protein docking and molecular docking&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;The field of protein-protein docking is highly computationally oriented, and it shares approaches with &lt;a href="http://en.wikipedia.org/wiki/Molecular_docking" title="Molecular docking"&gt;molecular docking&lt;/a&gt;. Molecular docking is sometimes referred to as &lt;i&gt;small-molecule docking&lt;/i&gt;, to distinguish it from protein-protein docking. 
Proteins complexed with &lt;a href="http://en.wikipedia.org/wiki/Nucleotide" title="Nucleotide"&gt;polynucleotide&lt;/a&gt; molecules are widely studied using similar or identical approaches to protein-protein docking, although if the &lt;a href="http://en.wikipedia.org/wiki/Nucleotide" title="Nucleotide"&gt;nucleotide&lt;/a&gt; molecule is small enough, the case may be framed as a &lt;a href="http://en.wikipedia.org/wiki/Molecular_docking" title="Molecular docking"&gt;molecular docking&lt;/a&gt; problem.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-3225446551765492741?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/3225446551765492741/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=3225446551765492741' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/3225446551765492741'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/3225446551765492741'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/protein-protein-docking.html' title='Protein-protein docking'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-679014162621425670</id><published>2007-07-21T23:20:00.000-07:00</published><updated>2007-07-21T23:32:51.930-07:00</updated><title type='text'>BLAST</title><content type='html'>In &lt;a href="http://en.wikipedia.org/wiki/Bioinformatics" title="Bioinformatics"&gt;bioinformatics&lt;/a&gt;, &lt;b&gt;B&lt;/b&gt;asic &lt;b&gt;L&lt;/b&gt;ocal &lt;b&gt;A&lt;/b&gt;lignment &lt;b&gt;S&lt;/b&gt;earch &lt;b&gt;T&lt;/b&gt;ool, or &lt;b&gt;BLAST&lt;/b&gt;, is an &lt;a href="http://en.wikipedia.org/wiki/Algorithm" title="Algorithm"&gt;algorithm&lt;/a&gt; for comparing &lt;a href="http://en.wikipedia.org/wiki/Primary_structure" title="Primary structure"&gt;primary&lt;/a&gt; biological sequence information, such as the &lt;a href="http://en.wikipedia.org/wiki/Amino_acid" title="Amino acid"&gt;amino-acid&lt;/a&gt; sequences of different &lt;a href="http://en.wikipedia.org/wiki/Protein" title="Protein"&gt;proteins&lt;/a&gt; or the &lt;a 
href="http://en.wikipedia.org/wiki/Nucleotide" title="Nucleotide"&gt;nucleotides&lt;/a&gt; of &lt;a href="http://en.wikipedia.org/wiki/DNA_sequence" title="DNA sequence"&gt;DNA sequences&lt;/a&gt;. A &lt;i&gt;BLAST search&lt;/i&gt; enables a researcher to compare a query sequence with a library or &lt;a href="http://en.wikipedia.org/wiki/Database" title="Database"&gt;database&lt;/a&gt; of sequences, and identify library sequences that resemble the query sequence above a certain threshold. For example, following the discovery of a previously unknown gene in the &lt;a href="http://en.wikipedia.org/wiki/Mus_musculus" title="Mus musculus"&gt;mouse&lt;/a&gt;, a scientist will typically perform a BLAST search of the &lt;a href="http://en.wikipedia.org/wiki/Human_genome" title="Human genome"&gt;human genome&lt;/a&gt; to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;BLAST is one of the most widely used bioinformatics programs&lt;sup class="noprint Template-Fact"&gt;&lt;a href="http://en.wikipedia.org/wiki/Wikipedia:Citing_sources" title="Wikipedia:Citing sources"&gt;&lt;span title="This claim needs references to reliable sources since April 2007" style="white-space: nowrap;"&gt;[&lt;i&gt;citation needed&lt;/i&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/sup&gt;, probably because it addresses a fundamental problem and the algorithm emphasizes speed over sensitivity. 
This emphasis on speed is vital to making the algorithm practical on the huge genome databases currently available, although subsequent algorithms can be even faster.&lt;/p&gt; &lt;p&gt;Examples of other questions that researchers use BLAST to answer are&lt;/p&gt; &lt;ul&gt;&lt;li&gt;Which &lt;a href="http://en.wikipedia.org/wiki/Bacterium" title="Bacterium"&gt;bacterial&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Species" title="Species"&gt;species&lt;/a&gt; have a protein that is related in lineage to a certain protein whose &lt;a href="http://en.wikipedia.org/wiki/Primary_structure" title="Primary structure"&gt;amino-acid sequence&lt;/a&gt; I know?&lt;/li&gt;&lt;li&gt;Where does the DNA that I've just sequenced come from?&lt;/li&gt;&lt;li&gt;What other genes encode proteins that exhibit structures or &lt;a href="http://en.wikipedia.org/wiki/Structural_motif" title="Structural motif"&gt;motifs&lt;/a&gt; such as the one I've just determined?&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;BLAST is also often used as part of other algorithms that require approximate sequence matching.&lt;/p&gt; &lt;p&gt;The BLAST algorithm and the &lt;a href="http://en.wikipedia.org/wiki/Computer_program" title="Computer program"&gt;computer program&lt;/a&gt; that implements it were developed by &lt;a href="http://en.wikipedia.org/wiki/Stephen_Altschul" title="Stephen Altschul"&gt;Stephen Altschul&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/w/index.php?title=Warren_Gish&amp;action=edit" class="new" title="Warren Gish"&gt;Warren Gish&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/David_Lipman" title="David Lipman"&gt;David Lipman&lt;/a&gt; at the U.S. 
&lt;a href="http://en.wikipedia.org/wiki/NCBI" title="NCBI"&gt;National Center for Biotechnology Information&lt;/a&gt; (NCBI), &lt;a href="http://en.wikipedia.org/wiki/Webb_Miller" title="Webb Miller"&gt;Webb Miller&lt;/a&gt; at &lt;a href="http://en.wikipedia.org/wiki/The_Pennsylvania_State_University" title="The Pennsylvania State University"&gt;The Pennsylvania State University&lt;/a&gt;, and &lt;a href="http://en.wikipedia.org/wiki/Gene_Myers" title="Gene Myers"&gt;Gene Myers&lt;/a&gt; at the &lt;a href="http://en.wikipedia.org/wiki/University_of_Arizona" title="University of Arizona"&gt;University of Arizona&lt;/a&gt;. It is available on the web. Alternative implementations include WU-BLAST and FSA-BLAST.&lt;/p&gt; &lt;p&gt;The original paper, "Altschul, SF, W Gish, W Miller, EW Myers, and DJ Lipman. Basic local alignment search tool. J Mol Biol 215(3):403-10, 1990.", was reportedly the most highly cited paper published in the 1990s.&lt;/p&gt; &lt;p&gt;&lt;a name="Input.2FOutput" id="Input.2FOutput"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Input/Output&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;Input and output comply with the &lt;a href="http://en.wikipedia.org/wiki/FASTA_format" title="FASTA format"&gt;FASTA format&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;a name="Algorithm" id="Algorithm"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt;&lt;span class="mw-headline"&gt;Algorithm&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;To run, BLAST requires two inputs: a query sequence (also called the target sequence) and a sequence database. 
BLAST will find subsequences in the query that are similar to subsequences in the database. In typical usage, the query sequence is much smaller than the database, e.g., the query may be one thousand nucleotides while the database is several billion nucleotides.&lt;/p&gt; &lt;p&gt;BLAST searches for high scoring &lt;a href="http://en.wikipedia.org/wiki/Sequence_alignment" title="Sequence alignment"&gt;sequence alignments&lt;/a&gt; between the query sequence and sequences in the database using a heuristic approach that approximates the &lt;a href="http://en.wikipedia.org/wiki/Smith-Waterman_algorithm" title="Smith-Waterman algorithm"&gt;Smith-Waterman algorithm&lt;/a&gt;. The exhaustive Smith-Waterman approach is too slow for searching large genomic databases such as GenBank. Therefore, the BLAST algorithm uses a &lt;a href="http://en.wikipedia.org/wiki/Heuristic" title="Heuristic"&gt;heuristic&lt;/a&gt; approach that is slightly less accurate than Smith-Waterman but over 50 times faster. The speed and relatively good accuracy of BLAST are the key technical innovation of the BLAST programs and arguably why the tool is the most popular bioinformatics search tool.&lt;/p&gt; &lt;p&gt;The BLAST algorithm can be conceptually divided into three stages.&lt;/p&gt; &lt;ul&gt;&lt;li&gt;In the first stage, BLAST searches for exact matches of a small fixed length W between the query and sequences in the database. For example, given the sequences AGTTAC and ACTTAG and a word length W = 3, BLAST would identify the matching substring TTA that is common to both sequences. By default, W = 11 for nucleic seeds.&lt;/li&gt;&lt;li&gt;In the second stage, BLAST tries to extend the match in both directions, starting at the seed. The ungapped alignment process extends the initial seed match of length W in each direction in an attempt to boost the alignment score. Insertions and deletions are not considered during this stage. 
For our example, the ungapped alignment between the sequences AGTTAC and ACTTAG centered around the common word TTA would be:&lt;/li&gt;&lt;/ul&gt; &lt;pre&gt;..AGTTAC..&lt;br /&gt;  | |||&lt;br /&gt;..ACTTAG..&lt;br /&gt;&lt;/pre&gt; &lt;p&gt;If a high-scoring ungapped alignment is found, the database sequence is passed on to the third stage.&lt;/p&gt; &lt;ul&gt;&lt;li&gt;In the third stage, BLAST performs a gapped alignment between the query sequence and the database sequence using a variation of the &lt;a href="http://en.wikipedia.org/wiki/Smith-Waterman_algorithm" title="Smith-Waterman algorithm"&gt;Smith-Waterman algorithm&lt;/a&gt;. &lt;a href="http://en.wikipedia.org/wiki/Statistically_significant" title="Statistically significant"&gt;Statistically significant&lt;/a&gt; alignments are then displayed to the user.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;An extremely fast but considerably less sensitive alternative to BLAST that compares nucleotide sequences to the genome is &lt;a href="http://en.wikipedia.org/wiki/BLAT" title="BLAT"&gt;BLAT&lt;/a&gt; (&lt;b&gt;B&lt;/b&gt;last &lt;b&gt;L&lt;/b&gt;ike &lt;b&gt;A&lt;/b&gt;lignment &lt;b&gt;T&lt;/b&gt;ool). A version designed for comparing multiple large genomes or chromosomes is &lt;a href="http://en.wikipedia.org/w/index.php?title=BLASTZ&amp;amp;action=edit" class="new" title="BLASTZ"&gt;BLASTZ&lt;/a&gt;. 
Another well-known program, &lt;a href="http://www.bioinformaticssolutions.com/products/ph/index.php" class="external text" title="http://www.bioinformaticssolutions.com/products/ph/index.php" rel="nofollow"&gt;PatternHunter&lt;/a&gt;, offers significantly better sensitivity than BLAST at the same speed, or very similar sensitivity at a much higher speed.&lt;/p&gt; &lt;p&gt;&lt;a name="Parallel_BLAST" id="Parallel_BLAST"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Parallel BLAST&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Parallel BLAST versions are implemented using &lt;a href="http://en.wikipedia.org/wiki/Message_Passing_Interface" title="Message Passing Interface"&gt;MPI&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Pthreads" title="Pthreads"&gt;Pthreads&lt;/a&gt; and have been ported to various platforms including &lt;a href="http://en.wikipedia.org/wiki/Microsoft_Windows" title="Microsoft Windows"&gt;Windows&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Linux" title="Linux"&gt;Linux&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Solaris_Operating_Environment" title="Solaris Operating Environment"&gt;Solaris&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Mac_OS_X" title="Mac OS X"&gt;OSX&lt;/a&gt;, and &lt;a href="http://en.wikipedia.org/wiki/AIX_operating_system" title="AIX operating system"&gt;AIX&lt;/a&gt;. Popular approaches to parallelizing BLAST include query distribution, hash table segmentation, computation parallelization, and database segmentation (partition).&lt;/p&gt; &lt;p&gt;&lt;br /&gt;&lt;/p&gt; &lt;p&gt;&lt;a name="Program" id="Program"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Program&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;The BLAST program can either be downloaded and run as a command-line utility "blastall" or accessed for free over the web. 
The BLAST web server, hosted by the &lt;a href="http://en.wikipedia.org/wiki/NCBI" title="NCBI"&gt;NCBI&lt;/a&gt;, allows anyone with a web browser to perform similarity searches against constantly updated databases of proteins and DNA that include most of the newly sequenced organisms.&lt;/p&gt; &lt;p&gt;BLAST is actually a family of programs (all included in the blastall executable). The following are some of the programs, ranked mostly in order of importance:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;b&gt;Nucleotide-nucleotide BLAST (blastn)&lt;/b&gt;: This program, given a DNA query, returns the most similar DNA sequences from the DNA database that the user specifies.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Protein-protein BLAST (blastp)&lt;/b&gt;: This program, given a protein query, returns the most similar protein sequences from the protein database that the user specifies.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Position-Specific Iterative BLAST (PSI-BLAST)&lt;/b&gt;: One of the more recent BLAST programs, this program is used for finding distant relatives of a protein. First, a list of all closely related proteins is created. Then these proteins are combined into a "profile" that is a sort of average sequence. A query against the protein database is then run using this profile, and a larger group of proteins found. 
This larger group is used to construct another profile, and the process is repeated.&lt;br /&gt;By including related &lt;a href="http://en.wikipedia.org/wiki/Protein" title="Protein"&gt;proteins&lt;/a&gt; in the search, PSI-BLAST is much more sensitive in picking up distant &lt;a href="http://en.wikipedia.org/wiki/Phylogenetic_tree" title="Phylogenetic tree"&gt;evolutionary relationships&lt;/a&gt; than the standard protein-protein BLAST.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Nucleotide 6-frame translation-protein (blastx)&lt;/b&gt;: This program compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx)&lt;/b&gt;: This program is the slowest of the BLAST family. It translates the query nucleotide sequence in all six possible frames and compares it against the six-frame translations of a nucleotide sequence database. The purpose of tblastx is to find very distant relationships between nucleotide sequences.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Protein-nucleotide 6-frame translation (tblastn)&lt;/b&gt;: This program compares a protein query against the six-frame translations of a nucleotide sequence database.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Large numbers of query sequences (megablast)&lt;/b&gt;: When comparing large numbers of input sequences via the command-line BLAST, "megablast" is much faster than running BLAST multiple times. 
It basically concatenates many in&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-679014162621425670?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/679014162621425670/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=679014162621425670' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/679014162621425670'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/679014162621425670'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/blast.html' title='BLAST'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-6585003669768666236</id><published>2007-07-21T05:21:00.000-07:00</published><updated>2007-07-21T05:25:21.821-07:00</updated><title type='text'>DNA SEQUENCING</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_0fSb-1TJAx0/RqH66iNp_CI/AAAAAAAAAB8/HtbhJ9zpzv8/s1600-h/reactions.jpg"&gt;&lt;br /&gt;&lt;/a&gt;&lt;table border="1"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="center"&gt;&lt;table border="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;    &lt;br /&gt;&lt;/td&gt;&lt;td&gt;    &lt;br /&gt;&lt;/td&gt;&lt;td&gt;          &lt;b&gt;DNA sequencing reactions&lt;/b&gt; are just like the PCR 
reactions for replicating DNA           (refer to the previous page &lt;a href="http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/pg2.html"&gt;DNA Denaturation, Annealing and Replication&lt;/a&gt;).          The reaction mix includes the template DNA, free nucleotides,          an enzyme (usually a variant of Taq polymerase) and a 'primer' - a small piece          of single-stranded DNA about 20-30 nt long that can hybridize to one strand          of the template DNA.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_0fSb-1TJAx0/RqH66iNp_CI/AAAAAAAAAB8/HtbhJ9zpzv8/s1600-h/reactions.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp2.blogger.com/_0fSb-1TJAx0/RqH66iNp_CI/AAAAAAAAAB8/HtbhJ9zpzv8/s400/reactions.jpg" alt="" id="BLOGGER_PHOTO_ID_5089624937428810786" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;         &lt;p&gt;          The reaction is initiated by heating until the two strands of DNA separate, then          the primer sticks to its intended location and DNA polymerase starts elongating          the primer. If allowed to go to completion, a new strand of DNA would be the          result. If we start with a billion identical pieces of template DNA, we'll get          a billion new copies of one of its strands.       &lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;  &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center"&gt;     &lt;table border="0"&gt;       &lt;tbody&gt;&lt;tr&gt;&lt;td&gt;          &lt;img src="http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/dideoxy.jpg" /&gt;       &lt;/td&gt;&lt;td&gt;    &lt;br /&gt;&lt;/td&gt;&lt;td&gt;          &lt;spacer type="vertical" size="20"&gt;          &lt;b&gt;Dideoxynucleotides:&lt;/b&gt; We run the reactions, however, in the presence of a dideoxyribonucleotide. 
This is just like a regular nucleotide, except it has no 3' hydroxyl group - once it's added to the end of a DNA strand, there's no way to continue elongating it. &lt;p&gt; Now the key to this is that MOST of the nucleotides are regular ones, and just a fraction of them are dideoxy nucleotides... &lt;spacer type="vertical" size="20"&gt;    &lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;  &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center"&gt;     &lt;table border="0"&gt;       &lt;tbody&gt;&lt;tr&gt;&lt;td&gt;          &lt;img src="http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/ddextend.jpg" /&gt;       &lt;/td&gt;&lt;td&gt;    &lt;br /&gt;&lt;/td&gt;&lt;td&gt;          &lt;b&gt;Replicating a DNA strand in the presence of dideoxy-T&lt;/b&gt;          &lt;p&gt; MOST of the time when a 'T' is required to make the new strand, the enzyme will get a good one and there's no problem. MOST of the time after adding a T, the enzyme will go ahead and add more nucleotides. However, 5% of the time, the enzyme will get a dideoxy-T, and that strand can never again be elongated. It eventually breaks away from the enzyme, a dead end product. &lt;/p&gt;&lt;p&gt; Sooner or later ALL of the copies will get terminated by a T, but each time the enzyme makes a new strand, the place it gets stopped will be random. In millions of starts, there will be strands stopping at every possible T along the way. &lt;/p&gt;&lt;p&gt; ALL of the strands we make started at one exact position. ALL of them end with a T. There are billions of them ... many millions at each possible T position. To find out where all the T's are in our newly synthesized strand, all we have to do is find out the sizes of all the terminated products!    
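&lt;/p&gt;&lt;p&gt; The reasoning above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: copying is perfect, and fragment lengths are counted from the first base added after the primer. The function name termination_lengths and the example strand are hypothetical, made up for this sketch. &lt;/p&gt;
&lt;pre&gt;
```python
def termination_lengths(new_strand, dideoxy_base):
    """Return the lengths of all fragments that end in dideoxy_base.

    new_strand is the newly synthesized sequence (primer omitted), read
    5' to 3'. A fragment of length n exists whenever base number n of
    the strand is the dideoxy base, because the chain can stop there.
    """
    return [position
            for position, base in enumerate(new_strand, start=1)
            if base == dideoxy_base]


# Hypothetical strand, used only for illustration.
strand = "GATTCATGT"
print(termination_lengths(strand, "T"))  # [3, 4, 7, 9]
```
&lt;/pre&gt;
&lt;p&gt; Each number is the size of one family of terminated fragments, so knowing the sizes is the same as knowing where the T's are.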
&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;  &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center"&gt;     &lt;table border="0"&gt;       &lt;tbody&gt;&lt;tr&gt;&lt;td&gt;          &lt;img src="http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/lane.jpg" /&gt;       &lt;/td&gt;&lt;td&gt;    &lt;br /&gt;&lt;/td&gt;&lt;td&gt;          &lt;b&gt;Here's how we find out those fragment sizes.&lt;/b&gt;          &lt;p&gt; Gel electrophoresis can be used to separate the fragments by size and measure them. In the cartoon at left, we depict the results of a sequencing reaction run in the presence of dideoxy-Cytidine (ddC). &lt;/p&gt;&lt;p&gt; First, let's add one fact: the dideoxy nucleotides in my lab have been chemically modified to fluoresce under UV light. The dideoxy-C, for example, glows blue. Now put the reaction products onto an 'electrophoresis gel' (you may need to refer to 'Gel Electrophoresis' in the &lt;a href="http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/mbglossary/mbgloss.html"&gt;Molecular Biology Glossary&lt;/a&gt;), and you'll see something like what is depicted at left. The smallest fragments are at the bottom, the largest at the top. The positions and spacing show the relative sizes. At the bottom is the smallest fragment that's been terminated by ddC; that's probably the C closest to the end of the primer (which is omitted from the sequence shown). Simply by scanning up the gel, we can see that we skip two, and then there are two more C's in a row. Skip another, and there's yet another C. And so on, all the way up. We can see where all the C's are.    
&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;  &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center"&gt;  &lt;table border="0"&gt;    &lt;tbody&gt;&lt;tr&gt;&lt;td&gt;  &lt;img src="http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/seqgels.jpg" /&gt; &lt;/td&gt;&lt;td&gt; &lt;b&gt;Putting all four dideoxynucleotides into the picture:&lt;/b&gt; &lt;p&gt; Well, OK, it's not so easy reading just C's, as you perhaps saw in the last figure. The spacing between the bands isn't all that easy to figure out. Imagine, though, that we ran the reaction with *all four* of the dideoxy nucleotides (A, G, C and T) present, and with *different* fluorescent colors on each. NOW look at the gel we'd get (at left). The sequence of the DNA is rather obvious if you know the color codes ... just read the colors from bottom to top: TGCGTCCA-(etc). &lt;/p&gt;&lt;p&gt; (Forgive me for using black - it shows up better than yellow).  &lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;  &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;  &lt;table border="0"&gt;    &lt;tbody&gt;&lt;tr&gt;&lt;td&gt;  &lt;img src="http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/gelfrag.jpg" /&gt; &lt;/td&gt;&lt;td&gt;&lt;br /&gt;&lt;/td&gt;&lt;td&gt; &lt;b&gt;An Automated sequencing gel:&lt;/b&gt; &lt;p&gt; That's exactly what we do to sequence DNA, then - we run DNA replication reactions in a test tube, but in the presence of trace amounts of &lt;i&gt;all four&lt;/i&gt; of the dideoxy terminator nucleotides. Electrophoresis is used to separate the resulting fragments by size and we can 'read' the sequence from it, as the colors march past in order. &lt;/p&gt;&lt;p&gt; In a large-scale sequencing lab, we use a machine to run the electrophoresis step and to monitor the different colors as they come out. 
Since about 2001, these machines -  not surprisingly called automated DNA sequencers - have used 'capillary electrophoresis', where the fragments are piped through a tiny glass-fiber capillary during the electrophoresis step, and they come out the far end in size-order. There's an ultraviolet laser built into the machine that shoots through the liquid emerging from the end of the capillaries, checking for pulses of fluorescent colors to emerge. There might be as many as 96 samples moving through as many capillaries ('lanes') in the most common type of sequencer. &lt;/p&gt;&lt;p&gt; At left is a screen shot of a real fragment of sequencing gel (this one from an older model of sequencer, but the concepts are identical). The four colors red, green, blue and yellow each represent one of the four nucleotides. &lt;/p&gt;&lt;p&gt; The actual gel image, if you could get a monitor large enough to see it all at this magnification, would be perhaps 3 or 4 meters long and 30 or 40 cm wide.  &lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;  &lt;/td&gt;&lt;/tr&gt;   &lt;tr&gt;&lt;td&gt; &lt;spacer type="vertical" size="20"&gt; &lt;b&gt;A 'Scan' of one gel lane:&lt;/b&gt; &lt;p&gt; We don't even have to 'read' the sequence from the gel - the computer does that for us! Below is an example of what the sequencer's computer shows us for one sample. This is a plot of the colors detected in one 'lane' of a gel (one sample), scanned from smallest fragments to largest. The computer even interprets the colors by printing the nucleotide sequence across the top of the plot. &lt;b&gt;This is just a fragment of the entire file, which would span around 900 or so nucleotides of accurate sequence.&lt;/b&gt; &lt;/p&gt;&lt;p&gt; The sequencer also gives the operator a text file containing just the nucleotide sequence, without the color traces. 
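&lt;/p&gt;&lt;p&gt; The reading step itself is simple enough to sketch in Python. This is a toy illustration, not the sequencer's actual software: it assumes every peak has already been reduced to a (fragment size, color) pair, and the color table is an assumption made for the example (only blue for C is stated above). The name call_bases and the peak values are hypothetical. &lt;/p&gt;
&lt;pre&gt;
```python
# Assumed dye colors: only "blue = C" comes from the text above;
# the other three assignments are placeholders for illustration.
COLOR_TO_BASE = {"blue": "C", "green": "A", "yellow": "G", "red": "T"}


def call_bases(peaks):
    """Read a sequence from (fragment_size, color) peaks.

    The shortest fragment passes the detector first, so sorting by
    size and translating each color gives the sequence in order.
    """
    ordered = sorted(peaks, key=lambda peak: peak[0])
    return "".join(COLOR_TO_BASE[color] for _size, color in ordered)


# Made-up peaks, deliberately listed out of order.
peaks = [(23, "blue"), (21, "red"), (24, "yellow"), (22, "yellow")]
print(call_bases(peaks))  # TGCG
```
&lt;/pre&gt;
&lt;p&gt; A real base caller also has to cope with overlapping peaks and uneven spacing, which this sketch leaves out.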
&lt;/p&gt;&lt;p align="center"&gt; &lt;spacer type="vertical" size="20"&gt; &lt;img src="http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/good.GIF" /&gt;  &lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-6585003669768666236?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/6585003669768666236/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=6585003669768666236' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/6585003669768666236'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/6585003669768666236'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/dna-sequencing.html' title='DNA SEQUENCING'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp2.blogger.com/_0fSb-1TJAx0/RqH66iNp_CI/AAAAAAAAAB8/HtbhJ9zpzv8/s72-c/reactions.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-1231784891506391120</id><published>2007-07-21T05:14:00.000-07:00</published><updated>2007-07-21T05:18:51.430-07:00</updated><title type='text'>Primer (molecular biology)</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://bp3.blogger.com/_0fSb-1TJAx0/RqH5lyNp_BI/AAAAAAAAAB0/KOzeN5IxG74/s1600-h/imageDFM.JPG"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp3.blogger.com/_0fSb-1TJAx0/RqH5lyNp_BI/AAAAAAAAAB0/KOzeN5IxG74/s400/imageDFM.JPG" alt="" id="BLOGGER_PHOTO_ID_5089623481434897426" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;p&gt;A &lt;b&gt;primer&lt;/b&gt; is a &lt;a href="http://en.wikipedia.org/wiki/Nucleic_acid" title="Nucleic acid"&gt;nucleic acid&lt;/a&gt; strand, or a related &lt;a href="http://en.wikipedia.org/wiki/Molecule" title="Molecule"&gt;molecule&lt;/a&gt; that serves as a starting point for &lt;a href="http://en.wikipedia.org/wiki/DNA_replication" title="DNA replication"&gt;DNA replication&lt;/a&gt;. A primer is required because most &lt;a href="http://en.wikipedia.org/wiki/DNA_polymerase" title="DNA polymerase"&gt;DNA polymerases&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Enzyme" title="Enzyme"&gt;enzymes&lt;/a&gt; that catalyze the replication of &lt;a href="http://en.wikipedia.org/wiki/DNA" title="DNA"&gt;DNA&lt;/a&gt;, cannot begin synthesizing a new DNA strand from scratch, but can only add to an existing strand of &lt;a href="http://en.wikipedia.org/wiki/Nucleotide" title="Nucleotide"&gt;nucleotides&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;In most natural &lt;a href="http://en.wikipedia.org/wiki/DNA_replication" title="DNA replication"&gt;DNA replication&lt;/a&gt;, the ultimate primer for DNA synthesis is a short strand of &lt;a href="http://en.wikipedia.org/wiki/RNA" title="RNA"&gt;RNA&lt;/a&gt;. 
This RNA is produced by &lt;a href="http://en.wikipedia.org/wiki/Primase" title="Primase"&gt;primase&lt;/a&gt;, and is later removed and replaced with DNA by a DNA polymerase.&lt;/p&gt; &lt;p&gt;Many laboratory techniques of &lt;a href="http://en.wikipedia.org/wiki/Biochemistry" title="Biochemistry"&gt;biochemistry&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Molecular_biology" title="Molecular biology"&gt;molecular biology&lt;/a&gt; that involve DNA polymerases, such as &lt;a href="http://en.wikipedia.org/wiki/DNA_sequencing" title="DNA sequencing"&gt;DNA sequencing&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Polymerase_chain_reaction" title="Polymerase chain reaction"&gt;polymerase chain reaction&lt;/a&gt; (PCR), require primers. The primers used for these techniques are usually short, chemically synthesized DNA molecules with a length of about twenty bases.&lt;/p&gt; &lt;p&gt;&lt;a name="Uses_of_synthetic_primers" id="Uses_of_synthetic_primers"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Uses of synthetic primers&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/DNA_sequencing" title="DNA sequencing"&gt;DNA sequencing&lt;/a&gt; is used to determine the nucleotides in a DNA strand; the chain termination method (dideoxy sequencing or Sanger method) uses a primer as a start marker for the chain reaction.&lt;/p&gt; &lt;p&gt;In &lt;a href="http://en.wikipedia.org/wiki/Polymerase_chain_reaction" title="Polymerase chain reaction"&gt;polymerase chain reaction&lt;/a&gt;, primers are used to determine the DNA fragment to be amplified by 
the PCR process. The length of primers is usually not more than 30 nucleotides, and they match exactly the beginning and the end of the DNA fragment to be amplified. They &lt;a href="http://en.wikipedia.org/wiki/Annealing_%28biology%29" title="Annealing (biology)"&gt;anneal&lt;/a&gt; (adhere) to the DNA template at these starting and ending points, where DNA polymerase binds and begins the synthesis of the new DNA strand.&lt;/p&gt; &lt;p&gt;It is worth noting that primers are not used only for DNA synthesis; they can also be used by viral polymerases, e.g. that of influenza, for RNA synthesis.&lt;/p&gt; &lt;p&gt;&lt;a name="PCR_primer_design" id="PCR_primer_design"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;PCR primer design&lt;/span&gt;&lt;/h2&gt; &lt;p&gt;The &lt;a href="http://en.wikipedia.org/wiki/DNA_melting" title="DNA melting"&gt;melting temperature&lt;/a&gt; of a primer is defined as the temperature at which 50% of that same DNA molecule species form a stable double helix and the other 50% have been separated to single strand molecules. The melting temperature required increases with the length of the primer. Primers that are too short would anneal at several positions on a long DNA template, which would result in non-specific copies. On the other hand, the length of a primer is limited by the temperature required to melt it. Melting temperatures that are too high, i.e., above 80 °C, can also cause problems since the DNA polymerases used for PCR are less active at such temperatures. The optimum length of a primer is generally from 20 to 30 nucleotides with a melting temperature between about 55 °C and 65 °C.&lt;/p&gt; &lt;p&gt;Pairs of primers should have similar melting temperatures, as annealing in a PCR reaction occurs for both simultaneously. 
A primer with a &lt;i&gt;T&lt;/i&gt;&lt;sub&gt;m&lt;/sub&gt; significantly higher than the reaction's annealing temperature may mishybridize and extend at an incorrect location along the DNA sequence, while one with a &lt;i&gt;T&lt;/i&gt;&lt;sub&gt;m&lt;/sub&gt; significantly lower than the annealing temperature may fail to anneal and extend at all.&lt;/p&gt; &lt;p&gt;Primer sequences need to be chosen to uniquely select for a region of DNA, avoiding the possibility of mishybridization to a similar sequence nearby. Mononucleotide repeats should be avoided, as loop formation can occur and contribute to mishybridization. Primers should not easily anneal with other primers in the mixture (either other copies of the same primer or the reverse direction primer); this phenomenon can lead to the production of &lt;a href="http://en.wikipedia.org/w/index.php?title=Primer_dimer&amp;action=edit" class="new" title="Primer dimer"&gt;primer dimer&lt;/a&gt; products contaminating the mixture. Primers should also not anneal strongly to themselves, as internal hairpins and loops could hinder the annealing with the template DNA.&lt;/p&gt; &lt;p&gt;&lt;a name="Degenerate_primers" id="Degenerate_primers"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h3&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Degenerate primers&lt;/span&gt;&lt;/h3&gt; &lt;p&gt;Sometimes &lt;i&gt;degenerate primers&lt;/i&gt; are used. These are actually mixtures of similar, but not identical, primers. They may be convenient if the same &lt;a href="http://en.wikipedia.org/wiki/Gene" title="Gene"&gt;gene&lt;/a&gt; is to be amplified from different &lt;a href="http://en.wikipedia.org/wiki/Organism" title="Organism"&gt;organisms&lt;/a&gt;, as the genes themselves are probably similar but not identical. The other use for degenerate primers is when primer design is based on &lt;a href="http://en.wikipedia.org/wiki/Protein_sequence" title="Protein sequence"&gt;protein sequence&lt;/a&gt;. 
As several different &lt;a href="http://en.wikipedia.org/wiki/Codon" title="Codon"&gt;codons&lt;/a&gt; can code for one &lt;a href="http://en.wikipedia.org/wiki/Amino_acid" title="Amino acid"&gt;amino acid&lt;/a&gt;, it is often difficult to deduce which codon is used in a particular case. Therefore the primer sequence corresponding to the &lt;a href="http://en.wikipedia.org/wiki/Amino_acid" title="Amino acid"&gt;amino acid&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Isoleucine" title="Isoleucine"&gt;isoleucine&lt;/a&gt; might be "ATH", where A stands for &lt;a href="http://en.wikipedia.org/wiki/Adenine" title="Adenine"&gt;adenine&lt;/a&gt;, T for &lt;a href="http://en.wikipedia.org/wiki/Thymine" title="Thymine"&gt;thymine&lt;/a&gt;, and H for &lt;a href="http://en.wikipedia.org/wiki/Adenine" title="Adenine"&gt;adenine&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Thymine" title="Thymine"&gt;thymine&lt;/a&gt;, or &lt;a href="http://en.wikipedia.org/wiki/Cytosine" title="Cytosine"&gt;cytosine&lt;/a&gt;, according to the &lt;a href="http://en.wikipedia.org/wiki/Genetic_code" title="Genetic code"&gt;genetic code&lt;/a&gt; for each &lt;a href="http://en.wikipedia.org/wiki/Codon" title="Codon"&gt;codon&lt;/a&gt;. Use of degenerate primers can greatly reduce the specificity of the PCR amplification. The problem can be partly solved by using &lt;a href="http://en.wikipedia.org/wiki/Touchdown_PCR" title="Touchdown PCR"&gt;touchdown PCR&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;&lt;i&gt;Degenerate primers&lt;/i&gt; are widely used and extremely useful in the field of microbial ecology. They allow for the amplification of genes from thus far uncultivated microorganisms or allow the recovery of genes from organisms where genomic information is not available. Usually, degenerate primers are designed by aligning gene sequences found in &lt;a href="http://en.wikipedia.org/wiki/GenBank" title="GenBank"&gt;GenBank&lt;/a&gt;. 
Differences among sequences are accounted for by using IUPAC degeneracies for individual bases. PCR primers are then synthesized as a mixture of primers corresponding to all permutations.&lt;/p&gt; &lt;p&gt;&lt;a name="Oligonucleotide_synthesis" id="Oligonucleotide_synthesis"&gt;&lt;/a&gt;&lt;/p&gt; &lt;h2&gt;&lt;span class="editsection"&gt;&lt;/span&gt; &lt;span class="mw-headline"&gt;Oligonucleotide synthesis&lt;/span&gt;&lt;/h2&gt; The actual construction of such primers starts with 3'-hydroxyl nucleosides (&lt;a href="http://en.wikipedia.org/wiki/Phosphoramidite" title="Phosphoramidite"&gt;phosphoramidite&lt;/a&gt;) attached to a &lt;a href="http://en.wikipedia.org/w/index.php?title=Controlled-pore_glass&amp;amp;action=edit" class="new" title="Controlled-pore glass"&gt;controlled-pore glass&lt;/a&gt; (CPG). The 5'-hydroxyl of the nucleosides is covered with &lt;a href="http://en.wikipedia.org/w/index.php?title=Dimethoxytrityl&amp;action=edit" class="new" title="Dimethoxytrityl"&gt;dimethoxytrityl&lt;/a&gt; (DMT), which prevents the building of a nucleotide chain. To add a nucleotide, DMT is chemically removed, and the nucleotide is added. The 5'-hydroxyl of the new nucleotide is blocked by DMT, preventing the addition of more than one nucleotide to each chain. After that, the cycle is repeated for each nucleotide in the primer. This is a simplified description; the actual process is quite complicated. 
For that reason, most laboratories do not make primers themselves, but order them by specialized companies&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-1231784891506391120?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/1231784891506391120/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=1231784891506391120' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/1231784891506391120'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/1231784891506391120'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/primer-molecular-biology.html' title='Primer (molecular biology)'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_0fSb-1TJAx0/RqH5lyNp_BI/AAAAAAAAAB0/KOzeN5IxG74/s72-c/imageDFM.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-432345559214889039</id><published>2007-07-21T05:10:00.000-07:00</published><updated>2007-07-21T05:11:32.910-07:00</updated><title type='text'>RAPD</title><content type='html'>&lt;h1&gt;Random Amplified&lt;/h1&gt;&lt;h1&gt;Polymorphic DNA (RAPD)&lt;/h1&gt; &lt;h2&gt;Introduction&lt;/h2&gt; &lt;p&gt;&lt;span class="emphasis"&gt;Random Amplified Polymorphic DNA (RAPD)&lt;/span&gt; markers are 
DNA fragments from &lt;a href="http://www.ncbi.nlm.nih.gov/projects/genome/probe/doc/TechPCR.shtml" title="Overview of PCR technology"&gt;PCR&lt;/a&gt; amplification of random segments of genomic DNA with a single primer of arbitrary nucleotide sequence. &lt;/p&gt;     &lt;p class="BoxTitle"&gt;How It Works&lt;/p&gt;  &lt;p&gt;Unlike traditional PCR analysis, RAPD (pronounced "rapid") does not require any specific knowledge of the DNA sequence of the target organism: the identical 10-mer primers will or will not amplify a segment of DNA, depending on positions that are complementary to the primers' sequence. For example, no fragment is produced if the primers anneal too far apart or if the 3' ends of the primers are not facing each other. Therefore, if a mutation has occurred in the template DNA at the site that was previously complementary to the primer, a PCR product will not be produced, resulting in a different pattern of amplified DNA segments on the gel. &lt;/p&gt;  &lt;p class="BoxTitle"&gt;Example&lt;/p&gt;  &lt;p&gt;RAPD is an inexpensive yet powerful typing method for many bacterial species. &lt;/p&gt; &lt;p&gt;   &lt;img src="http://www.ncbi.nlm.nih.gov/projects/genome/probe/IMG/zjm0010661110001.jpg" alt="RAPD profiles, example" align="left" /&gt;&lt;br /&gt;  Silver-stained polyacrylamide gel showing three distinct RAPD profiles generated by primer OPE15 for &lt;i&gt;Haemophilus ducreyi&lt;/i&gt; isolates from Tanzania, Senegal, Thailand, Europe, and North America. &lt;/p&gt; &lt;p&gt;Selecting the right sequence for the primer is very important because different sequences will produce different band patterns and possibly allow for a more specific recognition of individual strains. 
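&lt;/p&gt;&lt;p&gt;The rule described under "How It Works" can be illustrated with a short Python sketch. It is a simplification under stated assumptions: only perfect matches count, the two primer sites may not overlap, and the size cutoff max_size is an arbitrary illustrative value. The function names, the 10-mer primer, and the template are all hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def find_all(text, word):
    """Return every start index of word in text, overlaps included."""
    hits = []
    start = text.find(word)
    while start != -1:
        hits.append(start)
        start = text.find(word, start + 1)
    return hits


def rapd_products(template, primer, max_size=2000):
    """List the sizes of fragments a single primer could amplify.

    A product needs the primer sequence on the top strand (a site that
    primes rightward) followed downstream by its reverse complement
    (a site on the other strand that primes leftward), close enough
    together to be amplified. max_size is an assumed cutoff.
    """
    forward_sites = find_all(template, primer)
    reverse_sites = find_all(template, reverse_complement(primer))
    sizes = []
    for left in forward_sites:
        for right in reverse_sites:
            size = right + len(primer) - left
            # The two sites must face each other without overlapping.
            if right >= left + len(primer) and max_size >= size:
                sizes.append(size)
    return sorted(sizes)


# Hypothetical 10-mer primer and template, for illustration only.
primer = "AGGTCACTGA"
template = "TT" + primer + "ACGT" * 5 + reverse_complement(primer) + "CC"
print(rapd_products(template, primer))  # [40]

# A single-base mutation in one primer site removes the band.
mutated = template.replace(primer, "AGGTCACTGT", 1)
print(rapd_products(mutated, primer))  # []
```
&lt;/pre&gt;
&lt;p&gt;The second call shows why a point mutation at a primer site changes the band pattern: the site no longer matches, so the fragment is not produced.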
&lt;/p&gt;  &lt;p class="BoxTitle"&gt;Limitations of RAPD &lt;/p&gt;  &lt;p&gt;   &lt;/p&gt;&lt;ul&gt;&lt;li&gt;Nearly all RAPD markers are dominant, i.e., it is not possible to distinguish whether a DNA segment is amplified from a locus that is heterozygous (1 copy) or homozygous (2 copies). Co-dominant RAPD markers, observed as different-sized DNA segments amplified from the same locus, are detected only rarely.&lt;/li&gt;&lt;li&gt;PCR is an enzymatic reaction; therefore, the quality and concentration of template DNA, the concentrations of PCR components, and the PCR cycling conditions may greatly influence the outcome. Thus, the RAPD technique is notoriously laboratory dependent and needs carefully developed laboratory protocols to be reproducible.&lt;/li&gt;&lt;li&gt;Mismatches between the primer and the template may result in the total absence of PCR product as well as in a merely decreased amount of the product. Thus, the RAPD results can be difficult to interpret.&lt;/li&gt;&lt;/ul&gt;       &lt;p class="BoxTitle"&gt;Developing Locus-specific, Co-Dominant Markers from RAPDs&lt;/p&gt;  &lt;p&gt;     &lt;/p&gt;&lt;ul&gt;&lt;li&gt;The polymorphic RAPD marker band is isolated from the gel.&lt;/li&gt;&lt;li&gt;It is amplified by PCR.&lt;/li&gt;&lt;li&gt;The PCR product is cloned and sequenced.&lt;/li&gt;&lt;li&gt;New, longer, and more specific primers are designed from the DNA sequence; the resulting marker is called a &lt;span class="emphasis"&gt;Sequence Characterized Amplified Region (SCAR) marker.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-432345559214889039?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/432345559214889039/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=432345559214889039' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/432345559214889039'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/432345559214889039'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/rapd.html' title='RAPD'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-7306509499274847147</id><published>2007-07-21T05:01:00.001-07:00</published><updated>2007-07-21T05:07:20.427-07:00</updated><title type='text'>RFLP</title><content type='html'>&lt;h1 align="center"&gt;Restriction&lt;br /&gt;&lt;/h1&gt;&lt;h1 align="center"&gt;Fragment Length&lt;br /&gt;&lt;/h1&gt;&lt;h1 align="center"&gt;Polymorphisms (RFLPs)&lt;/h1&gt;  &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/R/RestrictionEnzymes.html"&gt;Restriction enzymes&lt;/a&gt; cut DNA at precise points producing &lt;ul&gt;&lt;li&gt;a collection of DNA fragments of precisely defined length. &lt;/li&gt;&lt;li&gt;These can be separated by &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/R/RecombinantDNA.html#electrophoresis"&gt;electrophoresis&lt;/a&gt;, with the smaller fragments migrating farther than the larger fragments. 
&lt;/li&gt;&lt;li&gt;One or more of the fragments can be visualized with a "probe" — a molecule of single-stranded DNA that is &lt;ul&gt;&lt;li&gt;complementary to a run of nucleotides in one or more of the restriction fragments and is  &lt;/li&gt;&lt;li&gt;radioactive (or fluorescent).&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;  &lt;p&gt;If probes encounter a complementary sequence of nucleotides in a test sample of DNA, they bind to it by Watson-Crick &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/B/BasePairing.html"&gt;base pairing&lt;/a&gt; and thus identify it. &lt;/p&gt; &lt;p&gt;&lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/P/Polymorphisms.html"&gt;Polymorphisms&lt;/a&gt; are inherited differences found among the individuals in a population. &lt;/p&gt;  RFLPs have provided valuable information in many areas of biology, including: &lt;ul&gt;&lt;li&gt;screening human DNA for the presence of potentially deleterious genes ("Case 1"); &lt;/li&gt;&lt;li&gt;providing evidence to establish the innocence of, or a probability of the guilt of, a crime suspect by DNA "fingerprinting" (&lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/R/RFLPs.html#fingerprinting"&gt;"Case 3"&lt;/a&gt;).&lt;/li&gt;&lt;/ul&gt;  &lt;h2&gt;&lt;a name="case_1"&gt;Case 1&lt;/a&gt;: Screening for the sickle-cell gene &lt;/h2&gt;  &lt;img alt="" src="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/S/SickleMutation.gif" align="right" height="129" width="393" /&gt;  Sickle cell disease is a genetic disorder in which both genes in the patient encode the amino acid &lt;b&gt;&lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/A/Ala_val.gif"&gt;valine&lt;/a&gt;&lt;/b&gt; (Val) in the sixth position of the beta chain (beta&lt;sup&gt;&lt;big&gt;S&lt;/big&gt;&lt;/sup&gt;) of the &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/B/Blood.html#oxygen"&gt;hemoglobin&lt;/a&gt; molecule. 
"Normal" beta chains (beta&lt;big&gt;&lt;sup&gt;A&lt;/sup&gt;&lt;/big&gt;) have &lt;b&gt;&lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/A/Asp_glu.gif"&gt;glutamic acid&lt;/a&gt;&lt;/b&gt; at this position.  &lt;p&gt; The only difference between the two genes is the substitution of a T for an A in the middle position of &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/C/Codons.html"&gt;codon&lt;/a&gt; 6. &lt;/p&gt;&lt;br /&gt;&lt;br /&gt;This &lt;ul&gt;&lt;li&gt; converts a GAG codon (for Glu) to a GTG codon for Val and &lt;/li&gt;&lt;li&gt;abolishes a sequence (CTGAGG, which spans codons 5, 6, and 7) recognized and cut by one of the restriction enzymes. &lt;/li&gt;&lt;/ul&gt;  &lt;img alt="" src="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/S/SicklePedigree2.gif" height="195" width="440" /&gt;  &lt;p&gt;When the &lt;b&gt;normal&lt;/b&gt; gene (beta&lt;big&gt;&lt;sup&gt;A&lt;/sup&gt;&lt;/big&gt;) is digested with the enzyme and the fragments separated by electrophoresis, the probe binds to a &lt;b&gt;short&lt;/b&gt; fragment (between the red arrows).&lt;/p&gt; &lt;p&gt;However, the enzyme cannot cut the &lt;b&gt;sickle-cell gene&lt;/b&gt; at this site, so the probe attaches to a much larger fragment (between the blue arrows).&lt;/p&gt; &lt;p&gt;The figure (from data provided by S. E. Antonarakis) shows the pedigree of a family whose only son has sickle-cell disease. Both his father and mother were &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/H/H.html#heterozygous"&gt;heterozygous&lt;/a&gt; (semifilled box and circle respectively) as they had to be to produce an afflicted child (solid box). The electrophoresis patterns for each member of the family are placed directly beneath them. Note that the two homozygous children (1 and 3) have only a single band, but these are more intense because there is twice as much DNA in them. 
&lt;/p&gt; &lt;p&gt;In this example, a change of a single nucleotide produced the RFLP. This is a very common cause of RFLPs and now such polymorphisms are often referred to as &lt;b&gt;single nucleotide polymorphisms&lt;/b&gt; or &lt;b&gt;SNPs&lt;/b&gt;. (However, not all RFLPs arise from SNPs. &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/P/Phenylketonuria.html#PKUallele"&gt;Link to an example of one that didn't.&lt;/a&gt;)&lt;/p&gt; &lt;p&gt; How can these tools be used? &lt;/p&gt; &lt;p&gt;By testing the DNA of prospective parents, their genotype can be determined and their odds of producing an afflicted child can be determined. In the case of sickle-cell disease, if &lt;b&gt;both&lt;/b&gt; parents are heterozygous for the genes, there is a 1 in 4 chance that they will produce a child with the disease.  &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/A/A.html#amniocentesis"&gt;Amniocentesis&lt;/a&gt; and &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/C/C.html#chorionic_villus_sampling_%28CVS%29"&gt;chorionic villus sampling&lt;/a&gt; make it possible to apply the same techniques to the DNA of a &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/F/F.html#fetus"&gt;fetus&lt;/a&gt; early in pregnancy. The parents can learn whether the unborn child will be free of the disease or not. They may choose to have an abortion rather than bring an afflicted child into the world. &lt;/p&gt;  Three problems: &lt;ul&gt;&lt;li&gt;The mutations that cause most human genetic diseases are more varied than the single mutation associated with sickle-cell disease. Over a thousand different mutations in the &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/M/Mutations.html#cystic_fibrosis"&gt;cystic fibrosis gene&lt;/a&gt; can cause the disease. A probe for one will probably fail to identify a second. A mixture of probes, one for each of the more common mutations, can be used. 
But there remains the problem of "false negatives": people who are falsely told they do not carry a mutant gene. &lt;/li&gt;&lt;li&gt;There are many diseases which result from several mutant genes working together to produce the disease phenotype.  &lt;/li&gt;&lt;li&gt;There are still genetic diseases for which no gene has yet been discovered. Until the gene can be located, cloned, and sequenced, no probe can be made to detect it directly. However, it is sometimes possible to find a genetic "marker" that can serve as a surrogate for the gene itself. Let's see how.&lt;/li&gt;&lt;/ul&gt;  &lt;h2&gt;&lt;a name="RFLPs"&gt;Case 2&lt;/a&gt;: Screening for a RFLP "marker"&lt;/h2&gt; &lt;p&gt; If a particular RFLP is usually associated with a particular genetic disease, then the presence or absence of that RFLP can be used to counsel people about their risk of developing or transmitting the disease. The assumption is that the gene they are really interested in is located so close to the RFLP that the presence of the RFLP can serve as a surrogate for the disease gene itself. But people wanting to be tested cannot simply walk in off the street. Because of &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/M/Meiosis.html#crossing_over"&gt;crossing over&lt;/a&gt;, a particular RFLP might be associated with the mutant gene in some people, with its healthy allele in others. Thus it is essential to examine not only the patient but as many members of the patient's family as possible. &lt;/p&gt; &lt;p&gt;The most useful probes for such analysis bind to a unique sequence of DNA; that is, a sequence occurring at only one place in the genome. Often this DNA is of unknown, if any, function. This can actually be helpful as this DNA has been freer to mutate without harm to the owner. The probe will hybridize (bind to) different lengths of digested DNA in different people depending on where the enzyme cutting sites are that each person has inherited. 
Thus a large variety of &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/A/A.html#allele"&gt;alleles&lt;/a&gt; (polymorphisms) may be present in the population. Some people will be &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/H/H.html#homozygous"&gt;homozygous&lt;/a&gt; and reveal a single band; others (e.g., all the family members shown below) will be &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/H/H.html#heterozygous"&gt;heterozygous&lt;/a&gt; with each allele producing its band.&lt;/p&gt;  &lt;img alt="" src="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/R/RFLPs_8.gif" height="328" width="473" /&gt;   &lt;p&gt;The pedigree shows the inheritance of a RFLP marker through three generations in a single family. A total of 8 alleles (numbered to the left of the blots) are present in the family. The RFLPs of each member of the family are placed directly below his (squares) or her (circles) symbol and RFLP numbers. &lt;/p&gt;  &lt;p&gt;If, for example, everyone who inherited RFLP 2 also has a certain inherited disorder, and no one lacking RFLP 2 has the disorder, we deduce that the gene for the disease is closely linked to this RFLP. If the parents decide to have another child, prenatal testing could reveal whether that child was apt to come down with the disease. &lt;/p&gt;  &lt;p&gt;But note, that crossing over during gamete formation could have moved the RFLP to the healthy allele. 
So the greater the distance between the RFLP and the gene locus, the lower the probability of an accurate diagnosis.&lt;/p&gt; &lt;table border="1"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/L/Linkage.html"&gt;Link to discussion of genetic linkage.&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;  &lt;h2&gt;Case 3: DNA "&lt;a name="fingerprinting"&gt;typing&lt;/a&gt;"&lt;/h2&gt;  &lt;img alt="" src="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/L/Lifecodes2.gif" align="right" height="674" width="246" /&gt;  Each human cell contains 6 x 10&lt;sup&gt;9&lt;/sup&gt; base pairs of DNA. Some of this represents structural genes (e.g., for the beta chain of hemoglobin) that are identical in a large proportion of people. But long stretches of DNA do not encode for anything and are free to mutate extensively. It seems certain that if we could read the entire sequence of DNA in each human, we would never find two that were identical (unless the samples were from identical siblings; i.e., derived from a single zygote). &lt;p&gt; So each person's DNA is as unique as a fingerprint. &lt;/p&gt;  This truth has not escaped the law enforcement and legal professions. Analysis of DNA, called &lt;b&gt;DNA typing&lt;/b&gt;, is widely used to  &lt;ul&gt;&lt;li&gt;identify rapists and other criminals; &lt;/li&gt;&lt;li&gt;determine paternity; that is, who the father of the child really is; &lt;/li&gt;&lt;li&gt;determine whether a hopeful immigrant is, as he or she claims, really a close relative of already established residents.&lt;/li&gt;&lt;/ul&gt;  &lt;p&gt;The image (courtesy of Lifecodes Corporation) shows the test results in a rape case. Two probes were used: one revealing the bands at the top, the other those at the bottom. 
&lt;/p&gt;  DNA was tested from &lt;ul&gt;&lt;li&gt;semen removed from the vagina of the rape victim (EVIDENCE #2); &lt;/li&gt;&lt;li&gt;a semen stain left on the victim's clothing (EVIDENCE #1); &lt;/li&gt;&lt;li&gt;the DNA of the victim herself (VICTIM) to be sure that the DNA didn't come from her cells; &lt;/li&gt;&lt;li&gt;DNA from two suspects (SUSPECT #1, SUSPECT #2); &lt;/li&gt;&lt;li&gt;a set of DNA fragments of known and decreasing length (MARKER). They provide a built-in ruler for measuring the exact distance that each fragment travels. &lt;/li&gt;&lt;li&gt;the cells of a previously-tested person to be sure the probes are performing properly (CONTROL).&lt;/li&gt;&lt;/ul&gt;  &lt;p&gt;On the basis of this test, &lt;b&gt;suspect #2&lt;/b&gt; can clearly be &lt;b&gt;ruled out&lt;/b&gt;. None of his bands matches the bands found in the semen.  &lt;/p&gt;  Is &lt;b&gt;suspect #1&lt;/b&gt; guilty?  &lt;p&gt;We can never be certain. The best we can do is to estimate the probability that another person, picked at random, could provide the same DNA fingerprint. &lt;/p&gt; &lt;p&gt;As a conservative estimate, a given allele (band) might be found in 25% of the people tested. The probability of a random match of two alleles is (0.25)&lt;sup&gt;2&lt;/sup&gt; or 1 in 16. The probability that 6 alleles match, as in this case, is (0.25)&lt;sup&gt;6&lt;/sup&gt; or 1 in 4096.  &lt;/p&gt; &lt;p&gt;But the suspect was not picked at random, so you may feel that the evidence of guilt is strong. &lt;/p&gt; &lt;p&gt;The more probes you use, the more confident you can be that you have gotten the right man. If, for example, a set of probes revealed 14 bands in a suspect's DNA identical to those in the semen sample, the probability of a random match drops to less than 1 in 268 million: (0.25)&lt;sup&gt;14&lt;/sup&gt; = 1/268,435,456. That denominator is more than the entire population, males and females, in the United States.  
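&lt;/p&gt; &lt;p&gt;The arithmetic above is easy to reproduce. The following Python sketch assumes, as the text does, that every band occurs in 25% of people and that the bands are independent of one another; the function name is chosen for the example only.&lt;/p&gt;
&lt;pre&gt;
```python
from fractions import Fraction


def random_match_probability(allele_frequency, matching_bands):
    """Chance that an unrelated person shares every band.

    Assumes the bands are independent and that each one occurs at the same
    frequency in the population.
    """
    return Fraction(allele_frequency) ** matching_bands


# 2, 6 and 14 matching bands, each assumed to occur in 1 person out of 4.
for bands in (2, 6, 14):
    p = random_match_probability(Fraction(1, 4), bands)
    print(f"{bands} bands: 1 in {p.denominator:,}")
```
&lt;/pre&gt;
&lt;p&gt;In practice the frequency differs from allele to allele, so the flat 25% used here is illustrative only. 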
&lt;/p&gt; &lt;a name="STRs"&gt;&lt;/a&gt; &lt;p&gt; Starting in 1999, law enforcement agencies in both Great Britain and the United States began switching to a new version of RFLP analysis using shorter sequences called &lt;b&gt;STRs&lt;/b&gt; ("&lt;b&gt;Short Tandem Repeats&lt;/b&gt;").  &lt;/p&gt;  STRs are repeated sequences of a few (usually four) nucleotides, e.g., TCATTCATTCATTCAT. They often occur in the untranslated parts of known genes (whose sequence can be used for the &lt;a href="http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/P/PCR.html"&gt;PCR primers&lt;/a&gt;). The exact number of repeats (6, 7, 8, 9, etc.) varies in different people (and, often, in the gene on each chromosome; that is, people are often heterozygous for the marker)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-7306509499274847147?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/7306509499274847147/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=7306509499274847147' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/7306509499274847147'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/7306509499274847147'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/rflp.html' title='RFLP'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-5863840003887399896</id><published>2007-07-20T21:53:00.000-07:00</published><updated>2007-07-20T22:17:46.846-07:00</updated><title type='text'>ELISA</title><content type='html'>&lt;div style="text-align: justify;"&gt;&lt;span style="font-size:130%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;h1 style="text-align: justify;"&gt;&lt;span style=";font-size:130%;color:red;"  &gt;E&lt;/span&gt;&lt;span style="font-size:130%;"&gt;nzyme-&lt;/span&gt;&lt;span style=";font-size:130%;color:red;"  &gt;L&lt;/span&gt;&lt;span style="font-size:130%;"&gt;inked&lt;/span&gt;&lt;/h1&gt;&lt;h1 style="text-align: justify;"&gt;&lt;span style=";font-size:130%;color:red;"  &gt;I&lt;/span&gt;&lt;span style="font-size:130%;"&gt;mmuno&lt;/span&gt;&lt;span style=";font-size:130%;color:red;"  &gt;s&lt;/span&gt;&lt;span style="font-size:130%;"&gt;orbent &lt;/span&gt;&lt;span style=";font-size:130%;color:red;"  &gt;A&lt;/span&gt;&lt;span style="font-size:130%;"&gt;ssay&lt;br /&gt;&lt;/span&gt;&lt;/h1&gt;&lt;h1 style="text-align: justify;"&gt;&lt;span style="font-size:130%;"&gt;(ELISA)&lt;/span&gt;&lt;/h1&gt;&lt;div style="text-align: justify;"&gt; &lt;/div&gt;&lt;p style="text-align: justify;"&gt;ELISA is a widely-used method for measuring the concentration of a particular  molecule (e.g., a hormone or drug) in a fluid such as &lt;a href="http://www.blogger.com/B/Blood.html#serum"&gt;serum&lt;/a&gt; or urine. It is also known as &lt;span style="color:red;"&gt;e&lt;/span&gt;nzyme &lt;span style="color:red;"&gt;i&lt;/span&gt;mmuno&lt;span style="color:red;"&gt;a&lt;/span&gt;ssay or &lt;b&gt;EIA&lt;/b&gt;. 
&lt;/p&gt;&lt;div style="text-align: justify;"&gt; &lt;/div&gt;&lt;p style="text-align: justify;"&gt;The molecule is detected by antibodies that have been made against it; that  is, for which it is the &lt;b&gt;&lt;a href="http://www.blogger.com/A/AntigenPresentation.html"&gt;antigen&lt;/a&gt;&lt;/b&gt;. &lt;a href="http://www.blogger.com/M/Monoclonals.html"&gt;Monoclonal antibodies&lt;/a&gt; are often used. &lt;/p&gt;&lt;div style="text-align: justify;"&gt; The test requires:  &lt;/div&gt;&lt;ul style="text-align: justify;"&gt;&lt;li&gt;the antibodies fixed to a solid surface, such as the inner surface of a test  tube;  &lt;/li&gt;&lt;li&gt;a preparation of the same antibodies coupled to an enzyme. This is one  (e.g., &lt;a href="http://www.blogger.com/B/BigBlue.html"&gt;β-galactosidase&lt;/a&gt;) that produces a colored  product from a colorless substrate.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_0fSb-1TJAx0/RqGRyyNp-1I/AAAAAAAAAAU/gqf9RqcHLk0/s1600-h/ELISA.gif"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp3.blogger.com/_0fSb-1TJAx0/RqGRyyNp-1I/AAAAAAAAAAU/gqf9RqcHLk0/s400/ELISA.gif" alt="" id="BLOGGER_PHOTO_ID_5089509355563907922" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: justify;"&gt;Performing the Test  &lt;/div&gt;&lt;ol style="text-align: justify;"&gt;&lt;li&gt;The tubes are filled with the antigen solution (e.g., urine) to be assayed.  Any antigen molecules present bind to the immobilized antibody molecules.  &lt;/li&gt;&lt;li&gt;The antibody-enzyme conjugate is added to the reaction mixture. The antibody  part of the conjugate binds to any antigen molecules that were bound previously,  creating an antibody-antigen-antibody "sandwich".  
&lt;/li&gt;&lt;li&gt;After washing away any unbound conjugate, the substrate solution is added.  &lt;/li&gt;&lt;li&gt;After a set interval, the reaction is stopped (e.g., by adding 1 N NaOH) and  the concentration of colored product formed is measured in a spectrophotometer.  The intensity of color is proportional to the concentration of bound  &lt;b&gt;antigen&lt;/b&gt;. &lt;/li&gt;&lt;/ol&gt;&lt;div style="text-align: justify;"&gt;ELISA can also be adapted to measure the concentration  of &lt;b&gt;antibodies&lt;/b&gt;. In this case,  &lt;/div&gt;&lt;ol style="text-align: justify;"&gt;&lt;li&gt;The wells are coated with the appropriate &lt;b&gt;antigen&lt;/b&gt;.  &lt;/li&gt;&lt;li&gt;The solution (e.g., serum) containing antibodies is added.  &lt;/li&gt;&lt;li&gt;After they have had time to bind to the immobilized &lt;b&gt;antigen&lt;/b&gt;,  &lt;/li&gt;&lt;li&gt;an enzyme-conjugated &lt;b&gt;anti-immunoglobulin&lt;/b&gt; is added, consisting of  &lt;ul&gt;&lt;li&gt;an antibody against the antibodies being tested for. For example, if human  anti-&lt;a href="http://www.blogger.com/A/AIDS.html#Progression"&gt;HIV&lt;/a&gt; antibodies are being assayed,  then antibodies (raised in a goat or rabbit against human immunoglobulins) are  conjugated to  &lt;/li&gt;&lt;li&gt;the enzyme.&lt;/li&gt;&lt;/ul&gt; &lt;/li&gt;&lt;li&gt;After washing away unreacted reagent, the substrate is added.  &lt;/li&gt;&lt;li&gt;The intensity of the color produced is proportional to the amount of  enzyme-labeled antibodies bound (and thus to the concentration of the antibodies  being assayed). 
&lt;/li&gt;&lt;/ol&gt;&lt;div style="text-align: justify;"&gt;Literally hundreds of ELISA kits are manufactured for  &lt;/div&gt;&lt;ul style="text-align: justify;"&gt;&lt;li&gt;research  &lt;/li&gt;&lt;li&gt;human and veterinary diagnosis &lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: justify;"&gt;Some examples:  &lt;/div&gt;&lt;ul style="text-align: justify;"&gt;&lt;li&gt;screening donated blood for evidence of viral contamination by  &lt;ul&gt;&lt;li&gt;&lt;a href="http://www.blogger.com/A/AIDS.html"&gt;HIV-1 and HIV-2&lt;/a&gt; (presence of anti-HIV  antibodies)  &lt;/li&gt;&lt;li&gt;hepatitis C (presence of antibodies)  &lt;/li&gt;&lt;li&gt;&lt;a href="http://www.blogger.com/V/Viruses.html#HepatitisB"&gt;hepatitis B&lt;/a&gt; (testing for both  antibodies and a viral antigen)  &lt;/li&gt;&lt;li&gt;&lt;a href="http://www.blogger.com/C/CellCycle.html#MAD"&gt;HTLV-1&lt;/a&gt; and -2 (presence of antibodies)  &lt;/li&gt;&lt;/ul&gt; &lt;/li&gt;&lt;li&gt;measuring hormone levels  &lt;ul&gt;&lt;li&gt;&lt;a href="http://www.blogger.com/S/SexHormones.html#HCG"&gt;HCG&lt;/a&gt; (as a test for pregnancy)  &lt;/li&gt;&lt;li&gt;&lt;a href="http://www.blogger.com/P/Pituitary.html#LH_in_females"&gt;LH&lt;/a&gt; (determining the time of  ovulation)  &lt;/li&gt;&lt;li&gt;&lt;a href="http://www.blogger.com/P/Pituitary.html#TSH"&gt;TSH&lt;/a&gt;, &lt;a href="http://www.blogger.com/T/Thyroid.html#T4_and_T3"&gt;T3 and T4&lt;/a&gt; (for thyroid function)  &lt;/li&gt;&lt;li&gt;hormones (e.g., &lt;a href="http://www.blogger.com/S/SexHormones.html#anabolic"&gt;anabolic  steroids&lt;/a&gt;, &lt;a href="http://www.blogger.com/P/Pituitary.html#GH"&gt;HGH&lt;/a&gt;) that may have been used  illicitly by athletes&lt;/li&gt;&lt;/ul&gt; &lt;/li&gt;&lt;li&gt;detecting infections  &lt;ul&gt;&lt;li&gt;sexually-transmitted agents like &lt;a href="http://www.blogger.com/A/AIDS.html"&gt;HIV&lt;/a&gt;, &lt;a href="http://www.blogger.com/Eubacteria.html#Spirochetes"&gt;syphilis&lt;/a&gt;, and 
&lt;a href="http://www.blogger.com/Eubacteria.html#Chlamydiae"&gt;chlamydia&lt;/a&gt;  &lt;/li&gt;&lt;li&gt;hepatitis B and C  &lt;/li&gt;&lt;li&gt;&lt;a href="http://www.blogger.com/P/Protists.html#Sporozoans"&gt;Toxoplasma gondii&lt;/a&gt; &lt;/li&gt;&lt;/ul&gt; &lt;/li&gt;&lt;li&gt;detecting &lt;a href="http://www.blogger.com/A/Allergies.html#immediate"&gt;allergens&lt;/a&gt; in food and  house dust  &lt;/li&gt;&lt;li&gt;measuring "rheumatoid factors" and other autoantibodies in autoimmune  diseases like &lt;a href="http://www.blogger.com/A/Allergies.html#SLE"&gt;lupus erythematosus&lt;/a&gt;  &lt;/li&gt;&lt;li&gt;measuring toxins in contaminated food  &lt;/li&gt;&lt;li&gt;detecting illicit drugs, e.g.,  &lt;ul&gt;&lt;li&gt;&lt;a href="http://www.blogger.com/D/Drugs.html#Cocaine"&gt;cocaine&lt;/a&gt;  &lt;/li&gt;&lt;li&gt;&lt;a href="http://www.blogger.com/D/Drugs.html#opiates"&gt;opiates&lt;/a&gt;  &lt;/li&gt;&lt;li&gt;Δ-9-tetrahydrocannabinol, the active ingredient in &lt;a href="http://www.blogger.com/D/Drugs.html#marijuana"&gt;marijuana&lt;/a&gt; &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-5863840003887399896?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/5863840003887399896/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=5863840003887399896' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/5863840003887399896'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/5863840003887399896'/><link rel='alternate' type='text/html' 
href='http://sciencemasala.blogspot.com/2007/07/elisa.html' title='ELISA'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_0fSb-1TJAx0/RqGRyyNp-1I/AAAAAAAAAAU/gqf9RqcHLk0/s72-c/ELISA.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-8136275938925922749</id><published>2007-07-20T21:46:00.000-07:00</published><updated>2007-07-20T21:47:26.614-07:00</updated><title type='text'>What is DNA Fingerprinting</title><content type='html'>&lt;p&gt;The chemical structure of everyone's DNA is the same. The only difference  between people (or any animal) is the order of the &lt;a href="basepair.html"&gt;base  pairs&lt;/a&gt;. There are so many millions of base pairs in each person's DNA that  every person has a different sequence.  &lt;/p&gt;&lt;p&gt;Using these sequences, every person could be identified solely by the  sequence of their base pairs. However, because there are so many millions of  base pairs, the task would be very time-consuming. Instead, scientists are able  to use a shorter method, because of repeating patterns in DNA.  &lt;/p&gt;&lt;p&gt;These patterns do not, however, give an individual "fingerprint," but they  are able to determine whether two DNA samples are from the same person, related  people, or non-related people. Scientists use a small number of sequences of DNA  that are known to vary among individuals a great deal, and analyze those to get  a certain probability of a match. 
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-8136275938925922749?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/8136275938925922749/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=8136275938925922749' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/8136275938925922749'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/8136275938925922749'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/what-is-dna-fingerprinting.html' title='What is DNA Fingerprinting'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-3549845803575613580</id><published>2007-07-20T21:45:00.001-07:00</published><updated>2007-07-20T21:45:47.613-07:00</updated><title type='text'>How is DNA Fingerprinting Done?</title><content type='html'>&lt;ul&gt;&lt;img alt="*" src="button.gif" /&gt;&lt;a href="blot.html"&gt;1. Performing a Southern  Blot&lt;/a&gt;&lt;br /&gt;&lt;img alt="*" src="button.gif" /&gt;&lt;a href="radi.html"&gt;2. Making a  Radioactive Probe&lt;/a&gt;&lt;br /&gt;&lt;img alt="*" src="button.gif" /&gt;&lt;a href="hybrid.html"&gt;3.  Creating a Hybridization Reaction&lt;/a&gt;&lt;br /&gt;&lt;img alt="*" src="button.gif" /&gt;&lt;a href="vntrs.html"&gt;4. 
VNTRs&lt;/a&gt;&lt;br /&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-3549845803575613580?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/3549845803575613580/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=3549845803575613580' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/3549845803575613580'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/3549845803575613580'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/how-is-dna-fingerprinting-done.html' title='How is DNA Fingerprinting Done?'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-3839058901064728325</id><published>2007-07-20T21:44:00.001-07:00</published><updated>2007-07-20T22:08:00.382-07:00</updated><title type='text'>southern blotting</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_0fSb-1TJAx0/RqGUoCNp-3I/AAAAAAAAAAk/bguUAOY2PLk/s1600-h/blot.gif"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp0.blogger.com/_0fSb-1TJAx0/RqGUoCNp-3I/AAAAAAAAAAk/bguUAOY2PLk/s400/blot.gif" alt="" id="BLOGGER_PHOTO_ID_5089512469415197554" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;i&gt;The 
Southern Blot is one way to analyze the genetic patterns that appear in a person's DNA. Performing a Southern Blot involves: &lt;/i&gt; &lt;p&gt;1. Isolating the DNA in question from the rest of the cellular material in the nucleus. This can be done either chemically, by using a detergent to wash the extra material from the DNA, or mechanically, by applying a large amount of pressure in order to "squeeze out" the DNA. &lt;/p&gt;&lt;p&gt;2. Cutting the DNA into several pieces of different sizes. This is done using one or more &lt;a href="http://www.blogger.com/restriction.html"&gt;restriction enzymes.&lt;/a&gt; &lt;/p&gt;&lt;p&gt;3. Sorting the DNA pieces by size. This size separation, or "size fractionation," is done by gel electrophoresis. The DNA is loaded into a gel, such as agarose, and an electric field is applied across the gel, with the positive electrode at the bottom and the negative electrode at the top. Because DNA is negatively charged, the pieces of DNA are drawn towards the bottom of the gel; the smaller pieces, however, move more quickly and thus further towards the bottom than the larger pieces. The DNA is therefore separated by size, with the smaller pieces towards the bottom and the larger pieces towards the top. &lt;/p&gt;&lt;p&gt;4. Denaturing the DNA, so that all of the DNA is rendered single-stranded. This can be done either by heating or chemically treating the DNA in the gel. &lt;/p&gt;&lt;p&gt;5. Blotting the DNA. The gel with the size-fractionated DNA is applied to a sheet of nitrocellulose paper, and the sheet is then baked to permanently attach the DNA to it. The Southern Blot is now ready to be analyzed.  
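&lt;/p&gt;&lt;p&gt;Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not a laboratory protocol. The recognition site GAATTC (the site of the restriction enzyme EcoRI, which cuts after the G), the example sequence, and the function names digest and run_gel are assumptions chosen for the example and do not come from the steps above.&lt;/p&gt;&lt;pre&gt;
```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut the sequence at every occurrence of site, cut_offset bases into the site.

    The default site and offset are assumed values for illustration.
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments


def run_gel(fragments):
    """Return fragment lengths ordered from the top of the gel (largest) to the bottom (smallest)."""
    return sorted((len(fragment) for fragment in fragments), reverse=True)


# Hypothetical example sequence containing two GAATTC sites.
dna = "AAGAATTCTTTTTGAATTCCC"
pieces = digest(dna)
print(pieces)
print(run_gel(pieces))
```
&lt;/pre&gt;&lt;p&gt;The sorted list of lengths reads like the gel from top to bottom: the largest fragments stay near the top, and the smallest travel farthest.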
&lt;/p&gt;&lt;p&gt;In order to analyze a Southern Blot, a radioactive genetic &lt;a href="http://www.blogger.com/probe.html"&gt;probe&lt;/a&gt; is used in a &lt;a href="http://www.blogger.com/hybridgloss.html"&gt;hybridization reaction&lt;/a&gt; with the DNA in question  &lt;i&gt;(see next topics for more information)&lt;/i&gt;. If an X-ray is taken of the  Southern Blot after a radioactive probe has been allowed to bond with the  denatured DNA on the paper, only the areas where the radioactive probe binds  will show up on the film. This allows researchers to identify, in a  particular person's DNA, the occurrence and frequency of the particular genetic  pattern contained in the probe.  &lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;center&gt;&lt;a href="http://www.blogger.com/blot.gif"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/center&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-3839058901064728325?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/3839058901064728325/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=3839058901064728325' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/3839058901064728325'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/3839058901064728325'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/southern-blot-is-one-way-to-analyze.html' title='southern blotting'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp0.blogger.com/_0fSb-1TJAx0/RqGUoCNp-3I/AAAAAAAAAAk/bguUAOY2PLk/s72-c/blot.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-4102438520795286032</id><published>2007-07-20T21:41:00.000-07:00</published><updated>2007-07-20T22:24:50.542-07:00</updated><title type='text'>HYBRIDIZATION</title><content type='html'>&lt;p&gt;1. Hybridization is the coming together, or binding, of two complementary genetic sequences. The binding occurs because of the hydrogen bonds between base pairs. Between an A base and a T base, there are two hydrogen bonds; between a C base and a G base, there are three hydrogen bonds.&lt;/p&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_0fSb-1TJAx0/RqGXyyNp-9I/AAAAAAAAABU/9Rzk9VG30ns/s1600-h/hybrid01.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_0fSb-1TJAx0/RqGXyyNp-9I/AAAAAAAAABU/9Rzk9VG30ns/s400/hybrid01.gif" alt="" id="BLOGGER_PHOTO_ID_5089515952633674706" border="0" /&gt;&lt;/a&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;center&gt;&lt;a href="http://www.blogger.com/hybrid01.gif"&gt;&lt;br /&gt;&lt;/a&gt; &lt;/center&gt; &lt;p&gt;2. When making use of hybridization in the laboratory, DNA must first be denatured, usually by using heat or chemicals. 
Denaturing is a process by which  the hydrogen bonds of the original double-stranded DNA are broken, leaving a  single strand of DNA whose bases are available for hydrogen bonding.&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_0fSb-1TJAx0/RqGX9CNp--I/AAAAAAAAABc/ESbvsC-7uNI/s1600-h/hybrid02.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_0fSb-1TJAx0/RqGX9CNp--I/AAAAAAAAABc/ESbvsC-7uNI/s400/hybrid02.gif" alt="" id="BLOGGER_PHOTO_ID_5089516128727333858" border="0" /&gt;&lt;/a&gt;&lt;p&gt; &lt;/p&gt;&lt;center&gt;&lt;a href="http://www.blogger.com/hybrid02.gif"&gt;&lt;br /&gt;&lt;/a&gt;  &lt;/center&gt; &lt;p&gt;3. Once the DNA has been denatured, a single-stranded radioactive probe  can be used to see if the denatured DNA contains a sequence similar  to that on the probe. The denatured DNA is put into a plastic bag along with the  probe and some saline liquid; the bag is then shaken to allow sloshing. If the  probe finds a fit, it will bind to the DNA.&lt;/p&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_0fSb-1TJAx0/RqGYIyNp-_I/AAAAAAAAABk/Q40R_Tp8SKg/s1600-h/hybrid03.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_0fSb-1TJAx0/RqGYIyNp-_I/AAAAAAAAABk/Q40R_Tp8SKg/s400/hybrid03.gif" alt="" id="BLOGGER_PHOTO_ID_5089516330590796786" border="0" /&gt;&lt;/a&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;center&gt;&lt;a href="http://www.blogger.com/hybrid03.gif"&gt;&lt;br /&gt;&lt;/a&gt;  &lt;/center&gt; &lt;p&gt;4. The fit of the probe to the DNA does not have to be exact. 
Sequences of  varying &lt;a href="http://www.blogger.com/homology.html"&gt;homology&lt;/a&gt; can stick to the DNA even if the  fit is poor; the poorer the fit, the fewer the hydrogen bonds between the probe and the denatured DNA. The ability of low-homology probes to still  bind to DNA can be manipulated through varying the temperature of the  hybridization reaction environment, or by varying the amount of salt in the  sloshing mixture.&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_0fSb-1TJAx0/RqGYIyNp-_I/AAAAAAAAABk/Q40R_Tp8SKg/s1600-h/hybrid03.gif"&gt;&lt;br /&gt;&lt;/a&gt;&lt;p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_0fSb-1TJAx0/RqGYTSNp_AI/AAAAAAAAABs/BctaSLwHCSo/s1600-h/hybrid04.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp1.blogger.com/_0fSb-1TJAx0/RqGYTSNp_AI/AAAAAAAAABs/BctaSLwHCSo/s400/hybrid04.gif" alt="" id="BLOGGER_PHOTO_ID_5089516510979423234" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;center&gt;&lt;a href="http://www.blogger.com/hybrid04.gif"&gt;&lt;br /&gt;&lt;/a&gt;  &lt;/center&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-4102438520795286032?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/4102438520795286032/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=4102438520795286032' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/4102438520795286032'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/3299081579068555714/posts/default/4102438520795286032'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/1_20.html' title='HYBRIDIZATION'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_0fSb-1TJAx0/RqGXyyNp-9I/AAAAAAAAABU/9Rzk9VG30ns/s72-c/hybrid01.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-8395208442800953712</id><published>2007-07-20T21:40:00.000-07:00</published><updated>2007-07-20T22:16:22.229-07:00</updated><title type='text'>RADIOACTIVE LABELLING</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_0fSb-1TJAx0/RqGVeiNp-4I/AAAAAAAAAAs/i7iCEM2mYIY/s1600-h/radio01.gif"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp2.blogger.com/_0fSb-1TJAx0/RqGVeiNp-4I/AAAAAAAAAAs/i7iCEM2mYIY/s400/radio01.gif" alt="" id="BLOGGER_PHOTO_ID_5089513405718068098" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;p&gt;1. Obtain some &lt;a href="http://www.blogger.com/dnapol.html"&gt;DNA polymerase&lt;/a&gt; . Put the DNA  to be made radioactive (radiolabeled) into a tube.  &lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;center&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_0fSb-1TJAx0/RqGVliNp-5I/AAAAAAAAAA0/XOvBTz3LnmI/s1600-h/radio02.gif"&gt;&lt;br /&gt;&lt;/a&gt; &lt;/center&gt; &lt;p&gt;2. Introduce nicks, or horizontal breaks along a strand, into the DNA you  want to radiolabel. 
At the same time, add individual nucleotides to the nicked  DNA, one of which, *C , is radioactive.  &lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;center&gt;&lt;center&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_0fSb-1TJAx0/RqGVliNp-5I/AAAAAAAAAA0/XOvBTz3LnmI/s1600-h/radio02.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp2.blogger.com/_0fSb-1TJAx0/RqGVliNp-5I/AAAAAAAAAA0/XOvBTz3LnmI/s400/radio02.gif" alt="" id="BLOGGER_PHOTO_ID_5089513525977152402" border="0" /&gt;&lt;/a&gt; &lt;/center&gt; &lt;/center&gt; &lt;p&gt;3. Add the DNA polymerase [pink] to the tube with the nicked DNA and the  individual nucleotides. The DNA polymerase will become immediately attracted to  the nicks in the DNA and attempt to repair the DNA, starting from the 5' end and  moving toward the 3' end.&lt;/p&gt;&lt;br /&gt;&lt;center&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_0fSb-1TJAx0/RqGVuiNp-6I/AAAAAAAAAA8/NlxxAmdXCME/s1600-h/radio03.gif"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp2.blogger.com/_0fSb-1TJAx0/RqGVuiNp-6I/AAAAAAAAAA8/NlxxAmdXCME/s400/radio03.gif" alt="" id="BLOGGER_PHOTO_ID_5089513680595975074" border="0" /&gt;&lt;/a&gt; &lt;/center&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt; &lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;center&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_0fSb-1TJAx0/RqGVuiNp-6I/AAAAAAAAAA8/NlxxAmdXCME/s1600-h/radio03.gif"&gt;&lt;br /&gt;&lt;/a&gt; &lt;/center&gt; &lt;p&gt;4. The DNA polymerase  begins repairing the nicked DNA. It destroys all  the existing bonds in front of it and places the new nucleotides, gathered from  the individual nucleotides mixed in the tube, behind it. 
Whenever a G base is  read in the lower strand, a radioactive *C  base is placed in the  new strand. In this fashion, the nicked strand, as it is repaired by the DNA  polymerase, is made radioactive by the inclusion of radioactive *C bases.&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_0fSb-1TJAx0/RqGV0yNp-7I/AAAAAAAAABE/8eQmrzUE2C8/s1600-h/radio04.gif"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp3.blogger.com/_0fSb-1TJAx0/RqGV0yNp-7I/AAAAAAAAABE/8eQmrzUE2C8/s400/radio04.gif" alt="" id="BLOGGER_PHOTO_ID_5089513787970157490" border="0" /&gt;&lt;/a&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_0fSb-1TJAx0/RqGV0yNp-7I/AAAAAAAAABE/8eQmrzUE2C8/s1600-h/radio04.gif"&gt;&lt;br /&gt;&lt;/a&gt;&lt;p&gt; &lt;/p&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_0fSb-1TJAx0/RqGV0yNp-7I/AAAAAAAAABE/8eQmrzUE2C8/s1600-h/radio04.gif"&gt;&lt;br /&gt;&lt;/a&gt;&lt;p&gt;5. The nicked DNA is then heated, splitting the two strands of DNA apart.  This creates single-stranded radioactive and non-radioactive pieces. The  radioactive DNA, now called a probe, is ready for use.  
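&lt;/p&gt;&lt;p&gt;The labelling described in step 4 can be sketched in Python. This is a simplified illustration only: the template sequence and the names COMPLEMENT, label_new_strand and count_labels are assumptions made for the example, and strand orientation is ignored, so each template base is simply paired with its partner in the same position.&lt;/p&gt;&lt;pre&gt;
```python
# Pairing rules used by the polymerase in this sketch.
# Every G in the template is paired with a radioactive cytosine, written "*C".
COMPLEMENT = {"A": "T", "T": "A", "G": "*C", "C": "G"}


def label_new_strand(template):
    """Build the repaired strand base by base from the template strand."""
    return [COMPLEMENT[base] for base in template]


def count_labels(new_strand):
    """Count how many radioactive *C bases ended up in the new strand."""
    return new_strand.count("*C")


# Hypothetical template (lower strand) with three G bases.
template = "GATTGCG"
new_strand = label_new_strand(template)
print(" ".join(new_strand))
print(count_labels(new_strand))
```
&lt;/pre&gt;&lt;p&gt;The more G bases the template contains, the more *C bases the new strand carries, and so the more strongly the finished probe is labelled.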
&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;center&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_0fSb-1TJAx0/RqGV7yNp-8I/AAAAAAAAABM/IopwRiWW4wY/s1600-h/radio05.gif"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://bp3.blogger.com/_0fSb-1TJAx0/RqGV7yNp-8I/AAAAAAAAABM/IopwRiWW4wY/s400/radio05.gif" alt="" id="BLOGGER_PHOTO_ID_5089513908229241794" border="0" /&gt;&lt;/a&gt; &lt;/center&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-8395208442800953712?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/8395208442800953712/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=8395208442800953712' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/8395208442800953712'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/8395208442800953712'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/1.html' title='RADIOACTIVE LABELLING'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp2.blogger.com/_0fSb-1TJAx0/RqGVeiNp-4I/AAAAAAAAAAs/i7iCEM2mYIY/s72-c/radio01.gif' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-1595590883318465363</id><published>2007-07-20T21:33:00.000-07:00</published><updated>2007-07-20T22:04:29.682-07:00</updated><title type='text'>VNTR'S</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_0fSb-1TJAx0/RqGS2CNp-2I/AAAAAAAAAAc/Xk_sw17P0kk/s1600-h/vntr02.gif"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp0.blogger.com/_0fSb-1TJAx0/RqGS2CNp-2I/AAAAAAAAAAc/Xk_sw17P0kk/s400/vntr02.gif" alt="" id="BLOGGER_PHOTO_ID_5089510510910110562" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;p&gt;Every strand of DNA has pieces that contain genetic information which informs  an organism's development (exons) and pieces that, apparently, supply no  relevant genetic information at all (introns). Although the introns may seem  useless, it has been found that they contain repeated sequences of base pairs.  These sequences, called Variable Number Tandem Repeats (VNTRs), can contain  anywhere from twenty to one hundred base pairs.  &lt;/p&gt;&lt;p&gt;Every human being has some VNTRs. To determine if a person has a particular  VNTR, a &lt;a href="http://www.blogger.com/blot.html"&gt;Southern Blot&lt;/a&gt; is performed, and then the  Southern Blot is &lt;a href="http://www.blogger.com/radi.html"&gt;probed&lt;/a&gt;, through a &lt;a href="http://www.blogger.com/hybrid.html"&gt;hybridization reaction&lt;/a&gt;, with a radioactive version of the  VNTR in question. The pattern which results from this process is what is often  referred to as a &lt;a href="http://www.blogger.com/whatis.html"&gt;DNA fingerprint&lt;/a&gt;.  
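&lt;/p&gt;&lt;p&gt;The idea of a tandem repeat can be illustrated with a short Python sketch. The function name count_tandem_repeats, the 4-base repeat unit GATA, and the two example sequences are assumptions chosen for brevity; the VNTR units described above are twenty to one hundred base pairs long.&lt;/p&gt;&lt;pre&gt;
```python
def count_tandem_repeats(sequence, unit):
    """Return the longest run of back-to-back copies of unit found in sequence."""
    if not unit:
        raise ValueError("the repeat unit must not be empty")
    best = 0
    for start in range(len(sequence)):
        copies = 0
        position = start
        # Walk forward one unit at a time while the copies continue unbroken.
        while sequence.startswith(unit, position):
            copies += 1
            position += len(unit)
        best = max(best, copies)
    return best


# Two hypothetical people carrying different numbers of copies of the same repeat.
person_a = "TTGATAGATAGATACC"
person_b = "TTGATAGATACC"
print(count_tandem_repeats(person_a, "GATA"))
print(count_tandem_repeats(person_b, "GATA"))
```
&lt;/pre&gt;&lt;p&gt;The number of copies is the part that varies from person to person, which is where the name Variable Number Tandem Repeat comes from.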
&lt;/p&gt;&lt;p&gt;A given person's VNTRs come from the genetic information donated by his or her parents; he or she could have VNTRs inherited from his or her mother or father, or a combination, but never a VNTR that neither parent has. The accompanying figure shows the VNTR patterns for Mrs. Nguyen [blue], Mr. Nguyen [yellow], and their four children: D1 (the Nguyens' biological daughter), D2 (Mr. Nguyen's step-daughter, child of Mrs. Nguyen and her former husband [red]), S1 (the Nguyens' biological son), and S2 (the Nguyens' adopted son, not biologically related [his parents are light and dark green]).&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;p&gt;Because VNTR patterns are inherited genetically, a given person's VNTR pattern is more or less unique. The more VNTR probes used to analyze a person's VNTR pattern, the more distinctive and individualized that pattern, or DNA fingerprint, will be. &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-1595590883318465363?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/1595590883318465363/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=1595590883318465363' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/1595590883318465363'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/1595590883318465363'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/every-strand-of-dna-has-pieces-that.html' 
title='VNTR&apos;S'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp0.blogger.com/_0fSb-1TJAx0/RqGS2CNp-2I/AAAAAAAAAAc/Xk_sw17P0kk/s72-c/vntr02.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-3261350571665138788</id><published>2007-07-20T21:28:00.000-07:00</published><updated>2007-07-20T21:32:15.551-07:00</updated><title type='text'>Problems With DNA Fingerprinting</title><content type='html'>&lt;img style="width: 175px; height: 18px;" alt="*" src="button.gif" /&gt;&lt;p&gt;As with nearly everything else in science, nothing about DNA fingerprinting is 100% certain. The term DNA fingerprint is, in one sense, a misnomer: it implies that, like a fingerprint, the VNTR pattern for a given person is utterly and completely unique to that person. Actually, all that a VNTR pattern can do is present a probability that the person in question is indeed the person to whom the VNTR pattern (of the child, the criminal evidence, or whatever else) belongs. Granted, that probability might be 1 in 20 billion, which would indicate that the person can be reasonably matched with the DNA fingerprint; then again, that probability might only be 1 in 20, leaving a large amount of doubt regarding the specific identity of the VNTR pattern's owner. &lt;/p&gt;&lt;p&gt;&lt;b&gt;1. Generating a High Probability&lt;/b&gt;&lt;br /&gt;The probability of a DNA fingerprint belonging to a specific person needs to be reasonably high, especially in criminal cases, where the association helps establish a suspect's guilt or innocence. 
Using certain rare VNTRs or combinations of VNTRs to create the VNTR pattern increases the probability that the two DNA samples do indeed match (as opposed to looking alike but not actually coming from the same person) or correlate (in the case of parents and children). &lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;b&gt;2. Problems with Determining Probability&lt;/b&gt; &lt;/p&gt;&lt;p&gt;&lt;i&gt;A. Population Genetics&lt;/i&gt;&lt;br /&gt;VNTRs, because they are the result of genetic inheritance, are not distributed evenly across the human population. A given VNTR cannot, therefore, have a single fixed probability of occurrence; the probability varies depending on an individual's genetic background. The difference in probabilities is particularly visible across racial lines. Some VNTRs that occur very frequently among Hispanics will occur very rarely among Caucasians or African-Americans. Currently, not enough is known about the VNTR frequency distributions among ethnic groups to determine accurate probabilities for individuals within those groups; the heterogeneous genetic composition of interracial individuals, who are growing in number, presents an entirely new set of questions. Further experimentation in this area, known as population genetics, has been surrounded and hindered by controversy, because the idea of identifying people through genetic anomalies along racial lines comes alarmingly close to the eugenics and ethnic purification movements of the recent past, and, some argue, could provide a scientific basis for racial discrimination. &lt;/p&gt;&lt;p&gt;&lt;i&gt;B. Technical Difficulties&lt;/i&gt;&lt;br /&gt;Errors in the hybridization and probing process must also be figured into the probability, and often the idea of error is simply not acceptable. 
Most people will agree that an innocent person should not be sent to jail, a guilty person allowed to walk free, or a biological mother denied her legal right to custody of her children, simply because a lab technician did not conduct an experiment accurately. When the DNA sample available is minuscule, this is an important consideration, because there is not much room for error. That is especially true if the analysis involves amplification of the sample (creating a much larger sample of genetically identical DNA from what little material is available): if the wrong DNA is amplified (e.g., a skin cell from the lab technician), the consequences can be profoundly detrimental. Until recently, the standards for determining DNA fingerprinting matches, and for the laboratory security and accuracy that would minimize error, were neither stringent nor universally codified, causing a great deal of public outcry. &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-3261350571665138788?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/3261350571665138788/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=3261350571665138788' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/3261350571665138788'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/3261350571665138788'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/problems-with-dna-fingerprinting.html' title='Problems With DNA 
Fingerprinting'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-2133176093010446267</id><published>2007-07-20T06:00:00.001-07:00</published><updated>2007-07-20T06:00:52.999-07:00</updated><title type='text'>cycle sequencing</title><content type='html'>&lt;span style="font-family:verdana;font-size:85%;color:#333366;"&gt;&lt;i&gt;Cycle                Sequencing&lt;/i&gt;&lt;br /&gt;              The sequencing method developed by Fred Sanger forms the basis of                automated "cycle" sequencing reactions today. Fluorescent                dyes are added to the reactions, and a laser within an automated                DNA sequencing machine is used to analyze the DNA fragments produced.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-2133176093010446267?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/2133176093010446267/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=2133176093010446267' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/2133176093010446267'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/2133176093010446267'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/cycle-sequencing.html' title='cycle 
sequencing'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-6452415120198156397</id><published>2007-07-19T06:17:00.000-07:00</published><updated>2007-07-19T06:28:07.558-07:00</updated><title type='text'>rt pcr</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_0fSb-1TJAx0/Rp9mUMWo5XI/AAAAAAAAAAM/2dmSoNLl1Fw/s1600-h/7300_Real_Time_big.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp1.blogger.com/_0fSb-1TJAx0/Rp9mUMWo5XI/AAAAAAAAAAM/2dmSoNLl1Fw/s400/7300_Real_Time_big.jpg" alt="" id="BLOGGER_PHOTO_ID_5088898601051546994" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;p align="left"&gt;&lt;span style="color: rgb(204, 102, 0);font-size:180%;" &gt;Introduction:&lt;/span&gt;&lt;/p&gt; &lt;blockquote&gt;    &lt;p align="left"&gt;&lt;span style="font-size:130%;"&gt;The invention of the Polymerase Chain Reaction (PCR) by Kary B. Mullis in the mid-1980s revolutionized molecular biology as we know it. PCR is a fairly standard procedure now, and its use is extremely wide-ranging. In its most basic application, PCR can amplify a small amount of template DNA (or RNA) into large quantities in a few hours. This is performed by mixing the DNA with primers on either side of the DNA (forward and reverse), &lt;em&gt;Taq&lt;/em&gt; polymerase (from the species &lt;em&gt;Thermus aquaticus&lt;/em&gt;, a thermophile whose polymerase is able to withstand extremely high temperatures), free nucleotides (dNTPs for DNA, NTPs for RNA), and buffer. 
The temperature is then alternated between high and lower settings to denature and reanneal the DNA, with the polymerase adding new complementary strands each cycle. This &lt;a href="http://www.bio.davidson.edu/misc/movies/pcr2.mov"&gt;movie&lt;/a&gt; shows PCR in action. Beyond this basic use, specially designed primers can be made to join two different pieces of DNA together or to add a restriction site, among many other creative uses. Clearly, PCR is an integral part of the molecular biologist’s toolbox, and the method has been continually improved over the years. (Purves et al. 2001)&lt;br /&gt;  &lt;/span&gt;&lt;/p&gt;   &lt;p align="left"&gt;&lt;span style="font-size:130%;"&gt;Fairly recently, a new method of PCR quantification has been invented. This is called “real-time PCR” because it allows the scientist to actually view the increase in the amount of DNA as it is amplified. Several different types of real-time PCR are being marketed to the scientific community at this time, each with its own advantages. This web site will explore one of these types, TaqMan® real-time PCR, as well as give an overview of the other two types of real-time PCR, molecular beacon and SYBR® Green.&lt;/span&gt;&lt;span style="font-size:130%;"&gt; &lt;/span&gt;&lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt; &lt;p align="left"&gt;&lt;span style="color: rgb(204, 102, 0);font-size:180%;" &gt;How TaqMan&lt;/span&gt;&lt;span style="color: rgb(204, 102, 0);font-size:130%;" &gt;®&lt;/span&gt;&lt;span style="color: rgb(204, 102, 0);font-size:180%;" &gt;    works:&lt;/span&gt;&lt;/p&gt; &lt;blockquote&gt;    &lt;p align="left"&gt;&lt;span style="font-size:130%;"&gt;TaqMan® utilizes a system that is fairly easy to grasp conceptually. 
First, we must take a look at the TaqMan® probe.&lt;br /&gt;  &lt;/span&gt;&lt;/p&gt;   &lt;p align="center"&gt;&lt;span style="font-size:130%;"&gt;&lt;img src="http://www.bio.davidson.edu/Courses/Molbio/MolStudents/spring2003/Pierce/Taqmanprobe2.jpg" height="154" width="362" /&gt;      &lt;/span&gt;&lt;/p&gt;   &lt;p align="center"&gt;Figure 1. The TaqMan probe. The red circle represents the quenching dye that disrupts the observable signal from the reporter dye (green circle) when it is within a short distance. Image created by Dan Pierce.&lt;/p&gt;   &lt;p align="left"&gt;&lt;span style="font-size:130%;"&gt;The probe consists of two types of fluorophores, the light-emitting parts of fluorescent molecules (Green Fluorescent Protein (GFP) contains an often-used fluorophore). Whether or not the probe is attached to the template DNA, as long as the polymerase has not yet acted, the &lt;strong&gt;quencher (Q)&lt;/strong&gt; fluorophore (usually a long-wavelength colored dye, such as red) reduces the fluorescence from the &lt;strong&gt;reporter (R)&lt;/strong&gt; fluorophore (usually a short-wavelength colored dye, such as green). It does this by Fluorescence (or Förster) Resonance Energy Transfer (FRET), in which energy passes from one dye to the other without emission of a photon. The reporter dye is found on the 5’ end of the probe and the quencher at the 3’ end.&lt;br /&gt;  &lt;/span&gt;&lt;/p&gt;   &lt;p align="center"&gt;&lt;img src="http://www.bio.davidson.edu/Courses/Molbio/MolStudents/spring2003/Pierce/TaqmanprobeonDNA.jpg" height="194" width="499" /&gt;&lt;/p&gt;   &lt;p align="center"&gt; &lt;/p&gt;   &lt;p align="center"&gt;Figure 2. The TaqMan&lt;span style="font-size:130%;"&gt;®&lt;/span&gt; probe binds to the target DNA, and the primer binds as well. Because the primer is bound, &lt;em&gt;Taq&lt;/em&gt; polymerase can now create a complementary strand. 
Image created by Dan Pierce.&lt;/p&gt;   &lt;p align="center"&gt; &lt;/p&gt;   &lt;p align="center"&gt;&lt;span style="font-size:130%;"&gt;Once the TaqMan® probe has bound to its specific piece of the template DNA after denaturation (high temperature) and the reaction cools, the primers anneal to the DNA. &lt;em&gt;Taq&lt;/em&gt; polymerase then adds nucleotides and removes the TaqMan® probe from the template DNA. This separates the quencher from the reporter and allows the reporter to emit its energy. This is then quantified using a computer. The more times the denaturing and annealing take place, the more opportunities there are for the TaqMan® probe to bind and, in turn, the more emitted light is detected.&lt;/span&gt; &lt;span style="font-size:130%;"&gt;&lt;/span&gt;&lt;/p&gt;   &lt;p align="center"&gt;&lt;br /&gt;&lt;br /&gt;  &lt;img src="http://www.bio.davidson.edu/Courses/Molbio/MolStudents/spring2003/Pierce/TaqmanwithTaq2.jpg" height="328" width="496" /&gt; &lt;/p&gt;   &lt;p align="center"&gt; &lt;/p&gt;   &lt;p align="center"&gt;Figure 3. The reporter dye is released from the extending double-stranded DNA created by the Taq polymerase. Away from the quenching dye, the light emitted from the reporter dye in an excited state can now be observed. Image created by Dan Pierce.&lt;/p&gt;   &lt;p align="left"&gt;  &lt;/p&gt; &lt;/blockquote&gt; &lt;p align="left"&gt;&lt;span style="color: rgb(204, 102, 0);font-size:180%;" &gt;Quantification:&lt;/span&gt;&lt;/p&gt; &lt;blockquote&gt;    &lt;p align="left"&gt;&lt;span style="font-size:130%;"&gt;The specifics of quantifying the light emitted during real-time PCR are fairly involved. 
The math involved is beyond the scope of this website, but the following page explains some specifics not covered here: &lt;a href="http://www.wzw.tum.de/gene-quantification/relative.html"&gt;http://www.wzw.tum.de/gene-quantification/relative.html&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;   &lt;p&gt;&lt;span style="font-size:130%;"&gt;The light emitted from the dye in the excited state is recorded by a computer and displayed on a graph such as the one below, with PCR cycles on the X-axis and a logarithmic measure of intensity on the Y-axis.&lt;br /&gt;  &lt;/span&gt; &lt;/p&gt;   &lt;p align="center"&gt;&lt;span style="font-size:130%;"&gt;&lt;img src="http://www.bio.davidson.edu/Courses/Molbio/MolStudents/spring2003/Pierce/Taqmangraph.gif" height="264" width="455" /&gt;&lt;/span&gt;&lt;/p&gt;   &lt;p align="center"&gt;Figure 4. A graph printout of actual data obtained using the TaqMan&lt;span style="font-size:130%;"&gt;®&lt;/span&gt; probe. Courtesy &lt;a href="http://www.biotech.uiuc.edu/taqman.htm"&gt;www.biotech.uiuc.edu&lt;/a&gt;&lt;/p&gt;       &lt;/blockquote&gt; &lt;p align="left"&gt; &lt;/p&gt; &lt;p align="left"&gt;&lt;span style="color: rgb(204, 102, 0);font-size:180%;" &gt;Other Images of TaqMan&lt;/span&gt;&lt;span style="color: rgb(204, 102, 0);font-size:130%;" &gt;®&lt;/span&gt;&lt;span style="color: rgb(204, 102, 0);font-size:180%;" &gt;    in Action:&lt;/span&gt;&lt;/p&gt; &lt;p align="center"&gt; &lt;/p&gt; &lt;p align="center"&gt;&lt;img src="http://www.bio.davidson.edu/Courses/Molbio/MolStudents/spring2003/Pierce/threestep.gif" height="412" width="482" /&gt;&lt;/p&gt; &lt;p align="center"&gt; &lt;/p&gt; &lt;p align="center"&gt;Figure 6. Another three-step view of the TaqMan&lt;span style="font-size:130%;"&gt;®&lt;/span&gt; probe working: before the probe meets the Taq polymerase, energy is transferred from a short-wavelength fluorophore (green) to a long-wavelength fluorophore (red). 
When the polymerase adds nucleotides to the template strand, it releases the short-wavelength fluorophore, making it detectable and the long-wavelength fluorophore undetectable. Figure courtesy &lt;a href="http://www.probes.com/handbook/boxes/0422.html"&gt;www.probes.com&lt;/a&gt;&lt;/p&gt; &lt;p align="left"&gt; &lt;/p&gt; &lt;p align="center"&gt;&lt;img src="http://www.bio.davidson.edu/Courses/Molbio/MolStudents/spring2003/Pierce/germanmethods.gif" height="368" width="717" /&gt;&lt;/p&gt; &lt;p align="center"&gt;Figure 7. Another view of TaqMan&lt;span style="font-size:130%;"&gt;®&lt;/span&gt; in action. The release from the Quencher dye (red Q) in step 2 eventually causes the Reporter dye (blue R) to be seen in step 4. Figure courtesy &lt;a href="http://www.ruhr-uni-bochum.de/homeexpneu/projekte/irmgards-projekt.html"&gt;www.ruhr-uni-bochum.de&lt;/a&gt; pending.&lt;/p&gt; &lt;p align="left"&gt; &lt;/p&gt; &lt;p align="left"&gt;&lt;span style="color: rgb(204, 102, 0);font-size:180%;" &gt;Other Real-Time PCR Methods:&lt;/span&gt;&lt;/p&gt; &lt;blockquote&gt;    &lt;p align="left"&gt;&lt;span style="font-size:130%;"&gt;There are two other types of real-time PCR methods, the molecular beacon method and the SYBR® Green method. The molecular beacon method uses a reporter probe that is wrapped around into a hairpin. It also has a quencher dye that must be in close contact with the reporter to work. An important difference between the molecular beacon method and the TaqMan® method is that the probe remains intact throughout the PCR and rebinds to the target at every cycle. See this web page on the &lt;a href="http://www.bio.davidson.edu/courses/genomics/method/realtimepcr.html"&gt;molecular beacon&lt;/a&gt; method of PCR, another type of real-time PCR used in molecular biology. The SYBR® Green probe was the first to be used in real-time PCR. 
It binds to double-stranded DNA and emits light when excited. Unfortunately,      it binds to any double-stranded DNA which could result in inaccurate data,      especially compared with the specificity found in the other two methods.&lt;br /&gt;&lt;/span&gt;&lt;/p&gt; &lt;/blockquote&gt; &lt;p&gt; &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-6452415120198156397?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/6452415120198156397/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=6452415120198156397' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/6452415120198156397'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/6452415120198156397'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/rt-pcr.html' title='rt pcr'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp1.blogger.com/_0fSb-1TJAx0/Rp9mUMWo5XI/AAAAAAAAAAM/2dmSoNLl1Fw/s72-c/7300_Real_Time_big.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-1561676995378112181</id><published>2007-07-19T06:05:00.000-07:00</published><updated>2007-07-19T06:06:23.957-07:00</updated><title type='text'></title><content type='html'>The purpose of a PCR 
(&lt;u&gt;P&lt;/u&gt;olymerase &lt;u&gt;C&lt;/u&gt;hain &lt;u&gt;R&lt;/u&gt;eaction) is to make a huge number of copies of a gene. This is necessary to have enough starting template for sequencing. &lt;ol&gt;&lt;li&gt;&lt;b&gt;The cycling reactions :&lt;/b&gt;&lt;/li&gt; There are three major steps in a PCR, which are repeated for 30 or 40 cycles. This is done on an automated cycler, which can heat and cool the tubes with the reaction mixture in a very short time.&lt;ol&gt;&lt;li&gt;&lt;b&gt;Denaturation&lt;/b&gt; at 94°C :&lt;/li&gt;&lt;br /&gt;During the denaturation, the double strand melts open to single stranded DNA, and all enzymatic reactions stop (for example, the extension from a previous cycle). &lt;li&gt;&lt;b&gt;Annealing&lt;/b&gt; at 54°C :&lt;/li&gt;&lt;br /&gt;The primers jiggle around because of Brownian motion. Hydrogen bonds are constantly formed and broken between the single stranded primer and the single stranded template. The more stable bonds (primers that fit exactly) last a little bit longer, and on that little piece of double stranded DNA (template and primer) the polymerase can attach and start copying the template. Once a few bases are built in, the bonding between the template and the primer is so strong that it does not break anymore. &lt;li&gt;&lt;b&gt;Extension&lt;/b&gt; at 72°C :&lt;/li&gt;&lt;br /&gt;This is the ideal working temperature for the polymerase. The primers that already have a few bases built in are held to the template more strongly than the forces breaking these bonds. 
Primers at positions with no exact match come loose again (because of the higher temperature) and don't give an extension of the fragment.&lt;br /&gt;The bases (complementary to the template) are coupled to the primer on the 3' side (the polymerase adds dNTPs from 5' to 3', reading the template from its 3' to its 5' side). &lt;/ol&gt;&lt;img src="http://users.ugent.be/%7Eavierstr/principles/pcrsteps.gif" alt="PCR steps" /&gt;&lt;br /&gt;&lt;br /&gt;Figure 3 : The different steps in PCR. &lt;a href="http://users.ugent.be/%7Eavierstr/pdf/PCR.pdf"&gt;(pdf file of this picture)&lt;/a&gt; &lt;a href="http://users.ugent.be/%7Eavierstr/principles/pcrani.html"&gt;Animated picture of PCR&lt;/a&gt;  Because both strands are copied during PCR, there is an &lt;b&gt;exponential&lt;/b&gt; increase of the number of copies of the gene. Suppose there is only one copy of the wanted gene before the cycling starts. After one cycle there will be 2 copies, after two cycles there will be 4 copies, three cycles will result in 8 copies, and so on.&lt;img src="http://users.ugent.be/%7Eavierstr/principles/pcrcopies.gif" alt="PCR copies" /&gt;&lt;br /&gt;&lt;br /&gt;Figure 4 : The exponential amplification of the gene in PCR. &lt;li&gt;&lt;b&gt;Is there a gene copied during PCR and is it the right size ?&lt;/b&gt;&lt;/li&gt;  Before the PCR product is used in further applications, it has to be checked whether : &lt;ol&gt;&lt;li&gt;There is a product formed. &lt;br /&gt;Though biochemistry is an exact science, not every PCR is successful. It is possible, for example, that the quality of the DNA is poor, that one of the primers doesn't fit, or that there is too much starting template.&lt;/li&gt;&lt;li&gt;The product is of the right size. &lt;br /&gt;It is possible that there is a product, for example a band of 500 bases, but the expected gene should be 1800 bases long. 
In that case, one of the primers probably fits on a part of the gene closer to the other primer. It is also possible that both primers fit on a totally different gene. &lt;/li&gt;&lt;li&gt;Only one band is formed. &lt;br /&gt;As in the description above, it is possible that the primers fit on the desired locations and also on other locations. In that case, you can have different bands in one lane on a gel. &lt;/li&gt;&lt;/ol&gt;&lt;img src="http://users.ugent.be/%7Eavierstr/principles/pcrgel.gif" alt="PCR gel" /&gt;&lt;br /&gt;&lt;br /&gt;Figure 5 : Verification of the PCR product on gel. The ladder is a mixture of fragments of known size to compare with the PCR fragments. Notice that the distance between the different fragments of the ladder is logarithmic. Lane 1 : the PCR fragment is approximately 1850 bases long. Lanes 2 and 4 : the fragments are approximately 800 bases long. Lane 3 : no product is formed, so the PCR failed. Lane 5 : multiple bands are formed because one of the primers fits in several places. 
&lt;/ol&gt;   &lt;hr /&gt; &lt;a href="http://users.ugent.be/%7Eavierstr/index.html"&gt;back to homepage&lt;/a&gt;&lt;br /&gt;&lt;a href="http://users.ugent.be/%7Eavierstr/principles/seq.html"&gt;Next : Sequencing&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-1561676995378112181?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://sciencemasala.blogspot.com/feeds/1561676995378112181/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=1561676995378112181' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/1561676995378112181'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/1561676995378112181'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/purpose-of-pcr-p-olymerase-c-hain-r.html' title=''/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3299081579068555714.post-7750493328849356787</id><published>2007-07-19T05:38:00.000-07:00</published><updated>2007-07-19T05:39:11.564-07:00</updated><title type='text'>pcr</title><content type='html'>polymerase chain reaction&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3299081579068555714-7750493328849356787?l=sciencemasala.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://sciencemasala.blogspot.com/feeds/7750493328849356787/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3299081579068555714&amp;postID=7750493328849356787' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/7750493328849356787'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3299081579068555714/posts/default/7750493328849356787'/><link rel='alternate' type='text/html' href='http://sciencemasala.blogspot.com/2007/07/pcr.html' title='pcr'/><author><name>chandu</name><uri>http://www.blogger.com/profile/00723479850668875546</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
