How to Calculate Correlation Between Variables
Researchers often want to know whether two variables are related and how strongly. When two variables are correlated, a change in one tends to be accompanied by a change in the other. For example, socioeconomic status might be related to the level of education a person receives. A statement like this suggests a possible correlation between two variables; the next question is how strong that correlation is. You can calculate it in just a few steps.
Instructions
1
Choose the two variables whose correlation you want to examine. For instance, suppose you want to calculate the correlation between the number of hours students spent studying and their test scores. Enter the data for both variables into a spreadsheet program, such as Microsoft Excel. Put each student's studying hours in one column and that student's test score in the next column, so that each row holds one student's data.
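If you would rather do the calculation outside the spreadsheet, one option is to export the sheet as a CSV file and read the two columns with a short program. The sketch below is a minimal Python example, assuming a header row with the hypothetical column names hours and score; the sample values are made up for illustration.

```python
import csv
import io


def load_columns(csv_file, x_col="hours", y_col="score"):
    """Read two numeric columns from a CSV file exported from a spreadsheet.

    The column names "hours" and "score" are hypothetical; change them
    to match the header row of your own sheet.
    """
    hours, scores = [], []
    for row in csv.DictReader(csv_file):
        # Each row is one student: studying hours and that student's score.
        hours.append(float(row[x_col]))
        scores.append(float(row[y_col]))
    return hours, scores


# Made-up data for four students, laid out one student per row.
sample = io.StringIO("hours,score\n2,65\n4,72\n6,80\n8,88\n")
hours, scores = load_columns(sample)
print(hours)   # [2.0, 4.0, 6.0, 8.0]
print(scores)  # [65.0, 72.0, 80.0, 88.0]
```

For a real export, you would open the file with something like `open("scores.csv", newline="")` (a hypothetical file name) and pass the file object to `load_columns`.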
2
Calculate the mean of each variable. First find the mean number of studying hours by adding up all the students' studying hours and dividing by the number of students. Then find the mean test score the same way: add up all the test scores and divide by the number of students.
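The mean calculation in this step can be sketched in Python as follows, using made-up data for four students. Python's standard library also provides `statistics.mean`, which gives the same result.

```python
def mean(values):
    """Add up all the values and divide by how many there are."""
    return sum(values) / len(values)


# Made-up example data: one entry per student, in the same order.
hours = [2, 4, 6, 8]
scores = [65, 72, 80, 88]

print(mean(hours))   # 5.0
print(mean(scores))  # 76.25
```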
3
Create a scatterplot of your data. Most spreadsheet programs, including Excel, can display data in this kind of graph. In older versions of Excel, click the "Insert" menu, select "Chart Wizard," choose the scatter chart type and then select your data range. In newer versions, select your data first and then choose Scatter from the Charts group on the Insert tab. To select a range, click the first cell you want to include and drag the mouse to highlight all the cells you want in the graph. Because each student occupies one row, select both columns: studying hours and test scores. The program then plots one point per student. Excel typically uses the left-hand column for the x-axis, so with studying hours in the first column, x = number of studying hours and y = test scores.
4
Calculate the correlation coefficient (Pearson's r). For each student, subtract the mean of x from that student's x value and divide the result by the standard deviation of x; do the same for y, using the mean and standard deviation of y. Multiply the two results together for each student, add up these products for all students, and divide the total by the number in the sample minus 1. In symbols, r = Σ[(x − x̄)/s_x × (y − ȳ)/s_y] / (n − 1), where x̄ and ȳ are the means of x and y, s_x and s_y are their sample standard deviations, and n is the number of students. For example, if you had 30 students in the sample, you would divide by 30 − 1 = 29.
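The following Python sketch follows this step literally: it standardizes each x and y value, multiplies the pairs, sums the products and divides by n − 1. It assumes sample standard deviations (dividing by n − 1), and the four data points are made-up values for illustration, not real results.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation: divides the squared deviations by n - 1."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(xs, ys):
    """Sum of products of standardized x and y values, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    # Standardize each pair, multiply, and add up the products.
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Made-up data: studying hours (x) and test scores (y) for four students.
hours = [2, 4, 6, 8]
scores = [65, 72, 80, 88]
print(round(pearson_r(hours, scores), 4))
```

Excel's built-in CORREL function returns the same coefficient for two cell ranges, and Python 3.10 and later include `statistics.correlation`, so in practice you may not need to code the formula yourself.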
5
Interpret the results by looking at the correlation coefficient. The correlation coefficient always falls between -1 and +1. A value of +1 indicates a perfect positive linear relationship, and a value of -1 indicates a perfect negative linear relationship. A value of zero means there is no linear correlation. With the study hours and test scores example, the coefficient is likely to fall somewhere in between. A positive value suggests that students who put in more studying hours tend to earn higher test scores, and the closer the value is to +1 or -1, the stronger the correlation. A negative value would suggest that students who study more tend to score lower. Keep in mind that a correlation shows only that two variables move together; it does not by itself show that one causes the other.