Software for Principal Component Analysis
Principal component analysis reduces a set of correlated variables to a smaller set of uncorrelated variables, simplifying a complex data set for further examination. This complex statistical procedure can be performed by many data analysis software programs, or by add-on programs that expand the abilities of some existing software packages.
-
Simplification of data sets
-
Principal component analysis is a data exploration and reduction technique that analysts use to extract the most important information from large, confusing data sets. Analysts turn to it when a large number of observed variables makes a data set unwieldy. Often, many of the variables are correlated, so much of the data is redundant. Principal component analysis simplifies the data by expressing these variables in terms of a smaller number of principal components: uncorrelated linear combinations of the original variables that account for most of the variance in the measures.
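The reduction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method any particular package uses: it assumes the NumPy library is installed, it works on the correlation matrix (one common choice), and the function name `pca` and the simulated data are made up for the example.

```python
import numpy as np

def pca(data):
    """Run a basic PCA on a table whose rows are observations and columns are variables."""
    X = np.asarray(data, dtype=float)
    # Standardize each variable so that the analysis uses the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(X, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Component scores: each observation expressed in terms of the components.
    scores = Z @ eigenvectors
    return eigenvalues, eigenvectors, scores

# Simulated data (hypothetical): two strongly correlated variables plus one unrelated variable.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
data = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),
    base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

eigenvalues, loadings, scores = pca(data)
explained = eigenvalues / eigenvalues.sum()  # share of total variance per component
print(explained)
```

Because two of the three simulated variables are nearly redundant, the first component should account for well over half of the total variance, which is the kind of simplification the technique is meant to deliver.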
Software Used
-
The complexity of principal component analysis requires use of a software program. A variety of statistical software programs exist, and most of them are capable of conducting principal component analysis. The most widely used software packages for statistical analysis include SAS, Stata and SPSS. Universities, research centers, consulting organizations and other research professionals use these specialized software programs. All three programs can conduct principal component analysis on a set of data entered in a spreadsheet in which the rows represent individual observations and the columns represent separate variables.
-
Features
-
Most software programs for principal component analysis, including SAS, Stata and SPSS, display the results in a table that includes the eigenvalues, which measure how much of the total variance each component explains. Many programs also display the results visually in a scree plot, which charts the eigenvalues against the component number.
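To show what such a table and plot contain, here is a rough sketch in plain Python. The eigenvalues are hypothetical numbers chosen for illustration, the helper names `eigenvalue_table` and `text_scree_plot` are invented for this example, and the text bars only approximate the line chart that statistical packages normally draw.

```python
def eigenvalue_table(eigenvalues):
    """Return rows of (component, eigenvalue, proportion, cumulative proportion)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rows = []
    cumulative = 0.0
    for number, value in enumerate(ordered, start=1):
        proportion = value / total
        cumulative += proportion
        rows.append((number, value, proportion, cumulative))
    return rows

def text_scree_plot(eigenvalues, width=40):
    """Draw one bar per component, scaled so the largest eigenvalue fills the width."""
    ordered = sorted(eigenvalues, reverse=True)
    largest = ordered[0]
    lines = []
    for number, value in enumerate(ordered, start=1):
        bar = "#" * round(width * value / largest)
        lines.append(f"PC{number} {value:6.2f} {bar}")
    return "\n".join(lines)

# Hypothetical eigenvalues for four standardized variables (they sum to 4).
example = [2.5, 1.0, 0.3, 0.2]

rows = eigenvalue_table(example)
print("Component  Eigenvalue  Proportion  Cumulative")
for number, value, proportion, cumulative in rows:
    print(f"{number:9d}  {value:10.2f}  {proportion:10.3f}  {cumulative:10.3f}")
print(text_scree_plot(example))
```

Analysts typically read a scree plot by looking for the point where the bars (or the line) level off; components after that point add little explained variance.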
Considerations
-
Principal component analysis is sometimes confused with factor analysis, another data reduction technique that explains correlated observations in terms of underlying factors. The two are distinct procedures, although principal component analysis is often used as the extraction step in a factor analysis, and many software packages combine the two in a single procedure.
Another important consideration is that specialized software packages such as SAS, SPSS and Stata are costly to purchase. These programs may therefore not be worth the expense for people who do not plan to conduct frequent statistical analyses.
Excel
-
People who need to conduct some statistical analyses, but would prefer not to purchase a specialized program, might ask whether widely available spreadsheet programs such as Excel are capable of conducting principal component analysis. The answer is yes and no. Although Excel has some data analysis capabilities, which a user can access by installing the Analysis ToolPak add-in, the program was not primarily intended for statistical analysis. Principal component analysis and factor analysis are not among the functions the add-in provides.
Prevention/Solution
-
Users can download and install an add-on program to enhance Excel's capabilities as a data analysis tool. Addinsoft, a software company that specializes in analytic programs, offers XLStat, which enables Excel to conduct principal component analysis and other procedures. The program keeps Excel's user-friendly conventions, so users can select the data to analyze simply by clicking a cell and dragging across fields. Users can purchase and download XLStat from the Addinsoft website. A free trial version is also available for users who want to try the program before deciding to purchase it.
-