FASTA Format Description
A FASTA-format file contains one or more nucleotide or peptide sequences. The format originated with the FASTA software package for sequence alignment and has since become a standard way to represent DNA, RNA and protein sequences in bioinformatics. FASTA is a simple text format, which makes sequences easy to parse with scripting languages such as Perl and Python.
-
Overview
-
Each record in a file begins with a line starting with the ">" character, followed by text identifying the origin of the sequence. The lines following this header line contain a series of characters representing nucleotides in DNA or amino acid residues in a peptide sequence, and the sequence continues until the next header line or the end of the file. All lines, including the header, are typically kept shorter than 80 characters.
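The record layout described above can be read with a few lines of Python. This is a minimal sketch, not a reference implementation: the function name parse_fasta and the two example records are made up for illustration, and the sketch assumes that blank lines can be ignored and that any text before the first ">" line is not part of a record.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines; join them later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record (multi-FASTA) input, invented for this example.
example = """>seq1 hypothetical example record
ACGTACGTAC
GTTAGC
>seq2 another made-up record
MKTAYIAKQR*
"""

records = list(parse_fasta(example.splitlines()))
for header, sequence in records:
    print(header, len(sequence))
```

Because the function accepts any iterable of lines, an open file object can be passed in place of example.splitlines().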
Allowed DNA Characters
-
Only meaningful characters are allowed as part of a FASTA sequence. Sequences can consist of A, C, T, G or U, corresponding to the bases adenine, cytosine, thymine, guanine and uracil respectively. However, sequencing cannot always determine the exact identity of a nucleotide, so FASTA also accepts codes for positions where uncertainty is present, such as R for a purine (A or G) and Y for a pyrimidine (C, T or U). The code N is used when no determination can be made and X when the nucleotide is masked. The "-" code is used to represent a gap of indeterminate length.
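A simple check for unexpected characters in a nucleotide sequence might look like the sketch below. The function name invalid_nucleotide_chars and the allow_iupac switch are hypothetical. The set of IUPAC ambiguity letters (R, Y, K, M, S, W, B, D, H, V) is included as an assumption, since individual tools differ in which codes they accept.

```python
# Unambiguous bases, plus N (undetermined), X (masked) and "-" (gap).
BASES = set("ACGTU")
SPECIAL = set("NX-")
# IUPAC ambiguity codes; assumed acceptable here, but not every tool allows them.
IUPAC_AMBIGUITY = set("RYKMSWBDHV")


def invalid_nucleotide_chars(sequence, allow_iupac=True):
    """Return a sorted list of characters that are not permitted nucleotide codes."""
    allowed = BASES | SPECIAL
    if allow_iupac:
        allowed |= IUPAC_AMBIGUITY
    # Comparison is case-insensitive: lower-case letters are upper-cased first.
    return sorted(set(sequence.upper()) - allowed)


print(invalid_nucleotide_chars("ACGTNNRY-acgu"))  # nothing unexpected
print(invalid_nucleotide_chars("ACGT1Z"))         # reports the digit and the Z
```

An empty result means every character in the sequence belongs to the assumed alphabet.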
-
Allowed Peptide Characters
-
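A single-letter alphabetic code is used to represent the amino acids present in a peptide sequence: one letter for each of the 20 standard amino acids, plus a few additional letters for rarer or ambiguous residues, such as U for selenocysteine, B for aspartate or asparagine and Z for glutamate or glutamine. If a residue cannot be determined, the code X is used, similarly to a DNA sequence. An "*" is used to indicate the terminus or translation stop of a peptide. A "-" is also used to represent a gap of indeterminate length in peptides.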
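The special peptide codes can be tallied with a short helper. This is only a sketch: the name summarize_peptide, the shape of the returned dictionary and the example sequence are all invented for illustration, and the alphabet is deliberately limited to the 20 standard amino acid letters plus X, "*" and "-", so any other letter is reported as unexpected.

```python
# One-letter codes for the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# X (undetermined residue), * (translation stop) and - (gap).
EXTRA_CODES = set("X*-")


def summarize_peptide(sequence):
    """Count the special codes in a peptide and list any unexpected characters."""
    seq = sequence.upper()
    return {
        "length": len(seq),
        "unknown": seq.count("X"),
        "gaps": seq.count("-"),
        "ends_with_stop": seq.endswith("*"),
        "unexpected": sorted(set(seq) - STANDARD_AMINO_ACIDS - EXTRA_CODES),
    }


# Hypothetical peptide, made up for this example.
summary = summarize_peptide("MKTAYIAKQRX-LG*")
print(summary)
```

Extending STANDARD_AMINO_ACIDS with further letters is a design choice that depends on which codes the downstream tool accepts.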
Other Information
-
The NCBI defines a standard sequence ID, or SeqID, for use in FASTA header lines, though there is no definitive standard for what a FASTA header line must contain. A FASTA file containing multiple sequences is known as a multi-FASTA file. FASTA files may have the file extension ".fasta", ".fna", ".ffn", ".faa", ".frn" or ".fas".