How to Do Multiple Sequence Alignment
A multiple sequence alignment (MSA) is a bioinformatic method that lines up three or more DNA, RNA or protein sequences, often from many different species (e.g., mammals, amphibians, plants), so that corresponding positions fall into the same column. Biologists and computational biologists use MSAs to compare large amounts of sequence data and to judge whether the sequences are related to each other. Scientists use the results to study evolutionary relationships, to find regions that are conserved across species, and to identify genes and proteins that may be related. Doing MSAs well requires advanced education in genetics and molecular biology, a thorough understanding of bioinformatics and experience with searching scientific databases.
Instructions
1
Determine the sequences. You must already know the species of your sequence of interest and which species you want to compare it with. For example, if you have a human sequence, you might align it to the corresponding sequences from mouse, dog, frog, yeast or bacteria. To obtain the other sequences, search online sequence databases such as GenBank, EMBL or UniGene (NCBI has since retired UniGene, so GenBank and EMBL are the more reliable starting points).
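Sequences downloaded from databases such as GenBank are commonly saved in FASTA format: a header line starting with `>` followed by one or more lines of sequence. As a minimal sketch, assuming your sequences are stored that way, the Python function below reads them into a dictionary keyed by identifier. The function name `parse_fasta` and the example records are hypothetical; for real work, a maintained library such as Biopython is a better choice.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {identifier: sequence} dict.

    The identifier is the first whitespace-delimited word after '>'.
    Sequence lines are joined together and upper-cased.
    Lines that appear before the first header are ignored.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Save the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks).upper()
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    # Save the final record.
    if current_id is not None:
        records[current_id] = "".join(chunks).upper()
    return records


# Hypothetical example records, not real database entries.
example = """>human_seq hypothetical example
ATGGCC
TTA
>mouse_seq
ATGGCTTTA
"""
print(parse_fasta(example))  # → {'human_seq': 'ATGGCCTTA', 'mouse_seq': 'ATGGCTTTA'}
```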
2
Verify the sequences. Using standard sequence analysis software (such as Vector NTI, MacVector, AlignX, Mauve, Clustal X or Clustal W), open each sequence and look through it to identify the regions you will need for the alignment. At this point you can align short segments of the sequences by hand to get a rough idea of how they match up, which may help the automated alignment go more smoothly later.
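Part of verifying sequences can be automated. The sketch below assumes DNA sequences written with IUPAC nucleotide codes; it flags records that contain unexpected characters (for example, gap symbols or stray letters left over from copying) and reports how much of a sequence is ambiguous, meaning anything other than A, C, G or T. The names `find_invalid_characters` and `ambiguous_fraction` are illustrative and are not part of any of the programs listed above. For protein sequences you would swap in the amino-acid alphabet.

```python
from typing import Dict, Set

# Assumption: DNA sequences written with IUPAC nucleotide codes.
# Replace this set with the amino-acid letters if you work with proteins.
IUPAC_DNA: Set[str] = set("ACGTRYSWKMBDHVN")


def find_invalid_characters(
    records: Dict[str, str], alphabet: Set[str] = IUPAC_DNA
) -> Dict[str, Set[str]]:
    """Map each record name to the characters that are not in `alphabet`.

    Records with no unexpected characters are left out of the result.
    """
    problems: Dict[str, Set[str]] = {}
    for name, seq in records.items():
        unexpected = set(seq.upper()) - alphabet
        if unexpected:
            problems[name] = unexpected
    return problems


def ambiguous_fraction(seq: str) -> float:
    """Fraction of positions that are not A, C, G or T (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    return ambiguous / len(seq)


# Hypothetical records: the second one contains a gap symbol and an 'X'.
records = {"seq_a": "ACGTNACGT", "seq_b": "ACG-TX"}
print(find_invalid_characters(records))
print(ambiguous_fraction(records["seq_a"]))
```

A high ambiguous fraction or any flagged character is a cue to go back to the database entry before aligning, rather than a verdict that the sequence is unusable.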
3
Align the sequences. Import the hand-aligned sequences into an alignment program such as Clustal W. Do not attempt to align more than eight sequences at first. Start the alignment, and when it finishes, check the aligned regions for mistakes. No automated aligner produces a completely correct alignment every time, so look for errors such as misaligned residues or gaps inserted into the middle of otherwise conserved stretches. Fix these by adjusting the alignment by hand or removing sequences that interfere with it, then rerun the alignment. Repeat this as many times as necessary until you have an acceptable alignment.
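Clustal W first scores every pair of sequences, builds a guide tree from those scores, and then aligns the sequences progressively along the tree. To show what happens at the pairwise level, here is a simplified Needleman-Wunsch global alignment of two sequences. It is a teaching sketch, not Clustal W's implementation: it assumes a toy scoring scheme (match +1, mismatch -1, gap -1) in place of the substitution matrices and affine gap penalties that real aligners use, and the function name `needleman_wunsch` is illustrative.

```python
from typing import List, Tuple


def needleman_wunsch(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1
) -> Tuple[str, str, int]:
    """Globally align `a` and `b`; return (aligned_a, aligned_b, score).

    Uses a linear gap penalty and a simple match/mismatch score
    (an assumption for illustration only).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ACGT", "AGT"))  # → ('ACGT', 'A-GT', 2)
```

When several alignments tie for the best score, this traceback prefers a diagonal step, then a gap in the second sequence; other tools may break ties differently, which is one reason automated output still needs the manual checking described above.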
4
Analyze the alignment. Alignment viewers typically color-code the aligned residues so that positions conserved across species or genes stand out. As you find new sequences, add them and rerun the alignment. Check whether the conserved, color-coded regions contain biologically or functionally important sequences that you can use in subsequent experiments.
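Conserved columns can also be picked out programmatically. Clustal output marks columns in which every sequence has the same residue with an asterisk (`*`). The sketch below reproduces only that asterisk line, not Clustal's `:` and `.` similarity marks, and then lists runs of fully conserved columns as candidate regions for follow-up. The function names, the `min_length` threshold and the three aligned example sequences are hypothetical.

```python
from typing import List, Tuple


def conservation_line(aligned: List[str]) -> str:
    """Return '*' under each column where all sequences share one residue
    (and none has a gap), and ' ' everywhere else."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for col in range(length):
        residues = {seq[col].upper() for seq in aligned}
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


def conserved_blocks(marks: str, min_length: int = 3) -> List[Tuple[int, int]]:
    """List runs of '*' at least `min_length` long as 0-based (start, end) pairs,
    where `end` is exclusive."""
    blocks: List[Tuple[int, int]] = []
    start = None
    for i, ch in enumerate(marks + " "):  # trailing space closes a final run
        if ch == "*" and start is None:
            start = i
        elif ch != "*" and start is not None:
            if i - start >= min_length:
                blocks.append((start, i))
            start = None
    return blocks


# Hypothetical aligned sequences (same length, '-' marks a gap).
aligned = ["ATG-CA", "ATGACA", "ATGTCA"]
marks = conservation_line(aligned)
for seq in aligned:
    print(seq)
print(marks)                    # → "*** **"
print(conserved_blocks(marks))  # → [(0, 3)]
```

Exact identity is a strict criterion; for proteins in particular, columns with chemically similar residues can matter just as much, which is what Clustal's extra symbols and a viewer's color scheme are meant to show.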
Tips & Warnings
Multiple sequence alignment is almost always very difficult for an inexperienced biologist, so ask a highly skilled bioinformatician or molecular biologist for guidance. Be aware that there are many methods for producing an alignment, and the more complex the sequences are, the more computational skill and genetic knowledge you will need.