How to Create Phylogenetic Trees
Phylogenetic trees are a complicated, highly technical bioinformatics tool that biologists use to determine the evolutionary relationships between different species. Phylogenetics compares the differences and similarities in genes between species and aims to establish whether those species originated from a common ancestor. The "tree" is a pictorial representation of these evolutionary relationships. Phylogenetic trees are difficult to construct and should not be attempted by beginners, non-scientists or students without proper guidance from a qualified scientist or expert. Advanced (college-level) knowledge of genetics and computational biology is required.
Instructions
1
Put together a dataset of genes. This requires access to sequence repositories such as GenBank, EMBL or DDBJ, from which nucleotide sequences can be downloaded, as well as knowledge of how to search them. Use keyword or sequence-similarity searches (for example, BLAST) to find a group of related sequences. Download each selected sequence in FASTA format (a plain-text format) and compile them into a single file.
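If you would rather combine the downloaded FASTA files with a script than by hand, the Python sketch below shows one way to do it using only the standard library. It is a minimal illustration, not a tool provided by GenBank, EMBL or DDBJ: the helper names read_fasta, to_fasta and merge_fasta_texts are hypothetical, and it assumes the first word of each header line is a unique sequence identifier.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    Assumes the first word after '>' is a unique identifier.
    """
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # store the previous record
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        else:
            chunks.append(line.upper())  # normalise the case of sequence lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Write {identifier: sequence} back out as FASTA, wrapping at `width`."""
    lines: list[str] = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""


def merge_fasta_texts(texts: list[str]) -> str:
    """Combine several FASTA downloads into one multi-FASTA text."""
    merged: dict[str, str] = {}
    for text in texts:
        for name, seq in read_fasta(text).items():
            if name in merged:
                raise ValueError(f"duplicate identifier: {name}")
            merged[name] = seq
    return to_fasta(merged)
```

The merged text can then be saved as the single dataset file, for example with `Path("dataset.fasta").write_text(...)` from `pathlib`.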
2
Align the sequences in the dataset. Carry out a multiple sequence alignment using the computational biology tools available at your facility, such as VectorNTI, ClustalX, BioEdit, CLCBio Workbench and so on. Note that some of these tools are complicated to operate and expensive (site licenses can cost several thousand dollars or more), so prior training may be mandatory. Perform a progressive alignment, in which sequences are added one by one and matching nucleotides are lined up (i.e. aligned) until the full alignment is established. The alignment must be accurate from the start, because mistakes here will be amplified in the final phylogenetic tree.
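A progressive aligner builds its multiple alignment out of pairwise alignments. To illustrate only that pairwise step, here is a minimal Needleman-Wunsch global alignment in Python. The scoring values (match +1, mismatch -1, linear gap penalty -2) are illustrative assumptions. Tools such as ClustalX use substitution matrices, affine gap penalties and a guide tree, so treat this as a teaching sketch, not a replacement for them.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align two sequences; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    i, j = n, m
    out_a: list[str] = []
    out_b: list[str] = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

For example, `needleman_wunsch("ACGT", "AGT")` inserts one gap so that the remaining three nucleotides line up.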
3
Check the alignment. Not every part of the alignment needs to be used to construct the phylogenetic tree; for example, delete the positions containing gaps that were inserted while building the alignment. Also remove any unclear, misaligned or ambiguous regions. If the gene being studied codes for a protein, consider whether it would be more appropriate to convert the nucleotide alignment into a protein (amino acid) alignment. This will depend on how much information the tree needs to capture (amino acid sequences will be more informative). Note that the more similar the aligned sequences are to each other, the more closely related the species are (i.e. the shorter the "evolutionary distance" between them).
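Deleting gap positions can be sketched as dropping every alignment column in which any sequence has a gap. The Python function below is a hypothetical helper that assumes the alignment is held as a dictionary of equal-length aligned sequences, with "-" marking gaps and "?" marking unknown characters. For DNA you might also add "N" to bad_chars, but not for proteins, where N is asparagine.

```python
def strip_gap_columns(alignment: dict[str, str], bad_chars: str = "-?") -> dict[str, str]:
    """Drop every column in which any sequence has a character from bad_chars."""
    if not alignment:
        return {}
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    # Indices of columns that are clean in every sequence
    keep = [i for i in range(length) if not any(s[i] in bad_chars for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}
```

Dropping whole columns, rather than individual gap characters, keeps the remaining positions aligned across all sequences.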
4
Decide whether the tree is to be built by a "distance-matrix" or a "discrete data" (character-based) method. The distance-matrix method is simpler, requiring only the pairwise distances between the aligned sequences to construct the best-fitting tree. The discrete data method breaks the alignment down into individual positions (characters) and tries to fit each of these into the tree. Because of this, it provides much more information for subsequent analysis; however, discrete data trees take much longer to compute than distance-matrix trees.
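To see what a distance-matrix method starts from, the sketch below computes pairwise distances from a gap-free alignment. It counts the proportion of differing sites (the p-distance) and optionally applies the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), one of the simplest ways to correct for multiple substitutions at the same site. The function names are hypothetical, and real phylogeny packages offer more realistic substitution models.

```python
import math


def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned sites at which two gap-free sequences differ."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and the same length")
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; infinite when p >= 0.75 (saturation)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment: dict[str, str], correct: bool = True):
    """Return (names, symmetric matrix) of pairwise distances."""
    names = list(alignment)
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_distance(alignment[names[i]], alignment[names[j]])
            dist[i][j] = dist[j][i] = jukes_cantor(p) if correct else p
    return names, dist
```

The correction always gives a distance at least as large as the raw p-distance, because some sites will have changed more than once.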
5
Load the aligned sequences into a phylogeny package. Several inexpensive or open-source software packages are available and widely used as industry standards, such as PHYLIP, MEGA and PAUP. ClustalX can be used for simpler trees. Using ClustalX as an example, exclude the gap positions from the alignment and then correct for multiple substitutions. Check that the correct output format is selected (bootstrap values on "nodes," not "branches"). Set the software to "Bootstrap N-J Tree," which will carry out the complicated calculations required to assemble a neighbor-joining (NJ) tree.
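The "NJ" in "bootstrap NJ tree" refers to neighbor joining, a distance-matrix method. The Python sketch below is a plain textbook version of neighbor joining that returns an unrooted tree in Newick format, together with a helper that builds one bootstrap pseudo-replicate by resampling alignment columns with replacement. It assumes a symmetric distance matrix with at least three taxa. Unlike real packages, it neither clamps negative branch lengths nor tallies bootstrap support. All names are hypothetical.

```python
import random


def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Textbook neighbor joining; returns an unrooted tree as a Newick string."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)
    d = [row[:] for row in dist]  # work on a copy
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node
        # Find the pair (i, j) that minimises the Q criterion
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        parent = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new parent node to every remaining node
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]
    # Join the last three nodes at one central (unrooted) node
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement to make one pseudo-replicate."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}
```

In a bootstrap analysis, each replicate alignment would be turned into its own distance matrix and tree. The support value shown at a node is the percentage of replicate trees that contain the same group of sequences.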
6
Present the calculated data as a tree. The phylogeny package will have calculated distances that represent how similar (close) or different (far) the sequences from different species are. These distances are used to draw the tree: branch lengths represent the distances, and the nodes at the ends of the branches denote the species or genes. Draw the tree to scale to give a clear visual representation of the evolutionary distances between the species.
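Phylogeny packages commonly export trees in Newick format, which most tree viewers can read. To show what "drawn to scale" means, the sketch below parses a simple Newick string and prints a text phylogram. In this output, each label's horizontal position is proportional to its summed branch length from the starting node. It assumes unquoted labels without spaces and no comments. Because neighbor-joining trees are unrooted, the point the drawing starts from is arbitrary. The helper names are hypothetical.

```python
def parse_newick(text: str):
    """Parse simple Newick into nested (name, branch_length, children) tuples."""
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip the closing ")"
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return (name, length, children)

    return node()


def tip_depths(node, depth: float = 0.0) -> dict[str, float]:
    """Map each tip name to its summed branch length from the starting node."""
    name, length, children = node
    depth += length
    if not children:
        return {name: depth}
    out: dict[str, float] = {}
    for child in children:
        out.update(tip_depths(child, depth))
    return out


def draw_ascii(node, scale: float = 4.0, indent: int = 0, lines=None) -> list[str]:
    """Text phylogram: each branch is drawn as dashes proportional to its length."""
    if lines is None:
        lines = []
    name, length, children = node
    width = round(length * scale)  # number of dashes for this branch
    lines.append(" " * indent + "-" * width + ("+" if children else " " + name))
    for child in children:
        draw_ascii(child, scale, indent + width, lines)
    return lines
```

For instance, `print("\n".join(draw_ascii(parse_newick(newick_string))))` prints the tree, with internal nodes marked "+" and each tip label placed further right the more evolutionary change separates it from the starting node.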
Tips & Warnings
Phylogenetic tree construction is a very complicated and computationally intensive procedure. It is best to seek guidance or hands-on training from a trained bioinformatician. Mistakes in tree construction are extremely common and cannot be resolved manually; fixing them requires advanced computational biology expertise.