How to Use the Chi-Square Test of Independence

You can use statistical tests like the chi-square test of independence to test for correlations.

The chi-square test of independence (TOI) is a statistical test that helps you determine whether two categorical variables are correlated. You'll use the chi-square TOI when you want to compare counts across different categories. Let's say, for example, that you want to compare the number of students who get As among five different schools in wealthy and poor districts (five categories). Or perhaps you want to compare the number of flowers that are red and the number that are pink in four geographic regions. The chi-square test will help you determine whether there is a relationship between these variables.

Instructions

    • 1

      Define your null hypothesis. The null hypothesis assumes that the two variables are independent of each other and that there is no correlation. Let's say, for example, that you are looking at the number of students who got scores in one of three ranges (excellent, average and poor) on a standardized test, and you are comparing the results from five different school districts. The null hypothesis would be that the relative proportions in the excellent, average and poor categories are the same in all five districts. In other words, the percentages of students in each category should be the same across all five districts if the null hypothesis is true.

    • 2

      Set up a table of your data. You can use a spreadsheet program if you have one to make this step even easier. The columns will be one set of categories and the rows will be another. For example, if you are working with the students from the five school districts, the columns would be Excellent, Average, and Poor (for the number of students with scores in each range), while the rows would be the names of the five school districts. Each cell of the table, then, would contain the number of students who scored in a particular range for a particular district.

    • 3

      Add up the numbers in each column and divide that sum by the total number of individuals in the table to find the Expected percentage for that column. In the school example, for instance, let's say 700 students in all five school districts took the standardized test, and 300 scored in the Excellent range, 200 in the Average range, and 200 in the Poor range. If you divide 300 / 700, you have 0.429 or 42.9 percent, which means that 42.9 percent of ALL students were in the Excellent category. If the null hypothesis is true, there is no difference between the districts, so you would expect 42.9 percent of students in each district to have scored in the Excellent category. Therefore, 42.9 percent is your Expected percentage for the Excellent column. Likewise, 200 / 700 = 0.286, so the Expected percentage for both the Average and Poor columns is 28.6 percent.

    • 4

      Multiply the total number in each row by the Expected percentage for each column to find the Expected value for each cell. In the school example, for instance, let's say 200 students in the Lamont school district took the test. Since the Expected percentage is 42.9, you would expect that 42.9 percent or 86 of these students achieved a score in the Excellent range. This is your Expected value for the number of excellent scores at Lamont.

    • 5

      Take the Observed number (the actual number in the table) for each cell and subtract the Expected value (the number you calculated in the last step). Continuing with the school example, let's say that for Lamont school district, the Observed values in the excellent, average, and poor columns were 100, 50, and 50. With 200 Lamont students and column totals of 300, 200 and 200 out of 700, the Expected values are 86, 57 and 57 (200 x 0.429 = 86 and 200 x 0.286 = 57). If you subtract the Expected from the Observed for each column, you have 14, -7 and -7.

    • 6

      Square each result from the last step, then divide it by the Expected value for that cell. Continuing with the school example, if you square 14 you have 196; when you divide this by 86 (the Expected value for that cell) you now have 2.28. If you do the same for the other two columns for Lamont district, you have 0.86 and 0.86 (49 divided by 57 in each case).

    • 7

      Add together all the results from step 6 for every cell in your table. This sum is your chi-square value. Continuing the schools example, let's say the chi-square value for the whole table turns out to be 100.

    • 8

      Subtract 1 from the number of rows in your table, then subtract 1 from the number of columns in your table. Finally, multiply these two results together. Continuing the schools example, your table had 3 columns and 5 rows. 5 - 1 = 4 and 3 - 1 = 2, so 2 x 4 = 8. This number is your degrees of freedom.

    • 9

      Look up a table of chi-square critical values (most statistics textbooks include one). Find the row corresponding to your degrees of freedom from the last step in the column labelled "df", then compare your result from step 7 with the number in the p = 0.05 column. If your result from step 7 is GREATER than the number in the p = 0.05 column, you can reject the null hypothesis at the 0.05 significance level; your results suggest there is indeed a correlation between these variables. If, on the other hand, your result is LESS than the number shown, you cannot reject the null hypothesis; you do not have sufficient evidence to show a correlation.

      Continuing with the schools example, you had 8 degrees of freedom and a chi-square value of 100. The value in the p = 0.05 column for 8 degrees of freedom is 15.51. Since 100 is greater than 15.51, you can reject the null hypothesis.
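
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Lamont row (100, 50, 50) and the column totals (300, 200 and 200 out of 700) come from the example, but the other four districts and their counts are invented so that the totals match, and the function name chi_square_independence is a made-up helper, not part of any library. Because the table is invented, its chi-square value will not be the 100 used in the example.

```python
# Hypothetical counts: only the Lamont row and the column totals
# (300 / 200 / 200 out of 700) come from the worked example; the other
# four districts are invented so that the totals match.
observed = {
    "Lamont":     [100, 50, 50],   # Excellent, Average, Poor
    "District B": [80, 40, 30],
    "District C": [40, 45, 40],
    "District D": [50, 35, 40],
    "District E": [30, 30, 40],
}


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom, expected) for a table of counts."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)

    # Steps 3-4: Expected value = row total x (column total / grand total)
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

    # Steps 5-7: sum (Observed - Expected)^2 / Expected over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(rows, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Step 8: (rows - 1) x (columns - 1)
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


statistic, dof, expected = chi_square_independence(observed)

# Step 9: critical value for 8 degrees of freedom at p = 0.05
CRITICAL_VALUE_DF8_P05 = 15.51
reject_null = statistic > CRITICAL_VALUE_DF8_P05

print(f"chi-square = {statistic:.2f}, df = {dof}, reject null: {reject_null}")
```

The sketch keeps the Expected values unrounded, so its cell-by-cell results differ slightly from hand calculations that round each Expected value to a whole number first (for the Lamont Excellent cell, about 2.38 rather than 2.28).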

Tips & Warnings

  • Note that you can only use the chi-square test for numbers of individuals, NOT percentages or proportions.

  • A spreadsheet program can do all these calculations for you and make your task much simpler. In fact, you can probably set up a formula that will automate the whole process. Consult the user's manual for your spreadsheet program to figure out how to do this.

  • Remember that correlation does not equal causation. With the school districts, for example, you have evidence to suggest a correlation between district and student performance. That does NOT necessarily mean, however, that teaching quality in each district determines student performance -- there could very well be other factors involved that explain the correlation.
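
To see why the first tip above insists on counts rather than percentages, here is a small Python sketch. The two tables are hypothetical and the helper chi_square_statistic is made up for this illustration. Both tables have identical percentages (60 percent versus 40 percent), but one has ten times as many individuals, and the chi-square value grows with the number of individuals.

```python
def chi_square_statistic(rows):
    """Chi-square value for a table of counts given as a list of rows."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    return sum(
        (o - rt * ct / grand_total) ** 2 / (rt * ct / grand_total)
        for r, rt in zip(rows, row_totals)
        for o, ct in zip(r, col_totals)
    )


# Hypothetical 2 x 2 tables with the same percentages in every cell.
small = [[6, 4], [4, 6]]        # 10 individuals per row
large = [[60, 40], [40, 60]]    # 100 individuals per row

small_value = chi_square_statistic(small)
large_value = chi_square_statistic(large)

print(f"small table: {small_value:.2f}")
print(f"large table: {large_value:.2f}")
```

With 1 degree of freedom the p = 0.05 critical value is 3.84, so the small table falls short of it while the large table exceeds it, even though the percentages are the same. Percentages alone discard the sample size that the test depends on.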
