A “hypothesis” is a prediction or an explanation for a certain phenomenon. In science, hypotheses are tested in studies where data is collected and then analyzed to see whether it supports or refutes the hypothesis. However, because data collection and analysis are never perfect, there is always some probability of getting a positive result even when the hypothesis is actually incorrect. Significance testing tries to work out what this probability is; the lower the probability, the more significant the findings.
In statistical analysis, “significance” has a specific technical meaning. In general usage, “significant” can mean that something has meaning or is important. However, when scientists and other data analysts say that a result was significant, they do not simply mean that the finding was large or noteworthy. They mean that the results obtained in the study have met certain statistical conditions.
Significance is reported using a “p value.” This value expresses a probability on a scale from 0, meaning a 0 percent chance, to 1, meaning a 100 percent chance. More precisely, it is the probability of obtaining results at least as extreme as those observed if there were really no effect. The closer the figure is to zero, the less likely it is that results like those obtained in the analysis would arise by chance alone, and therefore the more confidence researchers can have in the findings. The p value is calculated with statistical procedures, usually in specialized software.
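As a rough illustration of what a p value measures, consider a coin-flip study. The sketch below is not the procedure any particular software package uses; the function name `binomial_p_value` and the example of 100 flips are assumptions made for this illustration. It computes the probability of seeing at least a given number of heads if the coin is actually fair.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, null_prob: float = 0.5) -> float:
    """One-sided p value: the chance of at least `successes` heads in
    `trials` flips, assuming the coin really lands heads with `null_prob`."""
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical results: 52 heads versus 60 heads out of 100 flips.
p_52 = binomial_p_value(52, 100)
p_60 = binomial_p_value(60, 100)

print(f"52 heads out of 100: p = {p_52:.4f}")
print(f"60 heads out of 100: p = {p_60:.4f}")
```

The more lopsided result is harder to obtain with a fair coin, so it produces the smaller p value.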
An important question for analysts to ask is: “What p value is acceptable?” This “acceptable level” is called the alpha, and it is the cut-off point at or below which the results are considered statistically significant. In many fields, including psychology, sociology and economics, alpha is conventionally set to 0.05. This means that, if the probability of obtaining the results by chance is 5 percent or lower, they are considered statistically significant.
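The decision rule described here amounts to a single comparison. A minimal sketch, in which the function name `is_significant` and the sample p values are made up for illustration:

```python
def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Return True when the p value is at or below the chosen alpha."""
    return p_value <= alpha


# Hypothetical p values from three different studies.
for p in (0.03, 0.05, 0.20):
    verdict = "significant" if is_significant(p) else "not significant"
    print(f"p = {p:.2f} -> {verdict} at alpha = 0.05")
```

Changing the `alpha` argument changes the verdict for the same p value, which is why the choice of alpha matters.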
Type I and Type II errors
The alpha level has important implications. When it is set too high, for example at 0.2, false positives slip through the net and analysts conclude there is an effect when there isn't one. This is a Type I error. When it is set too low, for example at 0.0001, false negatives become more likely and researchers may conclude there is no effect when there is one. This is a Type II error. There is no single scientifically correct value for alpha; the choice trades one kind of error against the other, and the commonly used 0.05 convention is essentially arbitrary.
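The trade-off between the two error types can be seen in a small simulation. This is a sketch under stated assumptions, not a standard procedure: it uses a simple z-test with a known standard deviation of 1, samples of 30 points, a hypothetical true effect of 0.5, and helper names (`z_test_p_value`, `rejection_rate`) invented for this example.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, null_mean=0.0, sd=1.0):
    """Two-sided p value for a z-test with a known standard deviation."""
    z = (fmean(sample) - null_mean) / (sd / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mean, alpha, n=30, runs=2000, seed=0):
    """Fraction of simulated studies that come out significant at `alpha`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(runs):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample) <= alpha:
            rejections += 1
    return rejections / runs


results = {}
for alpha in (0.2, 0.05, 0.0001):
    # No real effect: every significant result is a false positive (Type I).
    type_i = rejection_rate(true_mean=0.0, alpha=alpha)
    # Real effect of 0.5: every non-significant result is a false negative (Type II).
    type_ii = 1 - rejection_rate(true_mean=0.5, alpha=alpha)
    results[alpha] = (type_i, type_ii)
    print(f"alpha = {alpha}: Type I rate = {type_i:.3f}, Type II rate = {type_ii:.3f}")
```

Under these assumptions, lowering alpha reduces the Type I rate but raises the Type II rate, and raising alpha does the opposite.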
A major limitation of null hypothesis significance testing (NHST) is that the p value is highly influenced by the number of data points in the analysis. If there are thousands of data points, even very small effects may be statistically significant. So, a significant effect in a study might not represent something that makes a meaningful difference in the real world. To get around this, significance is usually reported alongside other statistics, such as the “effect size,” which estimates the magnitude of the difference.
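To see how sample size drives the p value while the effect size stays put, the sketch below holds a small difference between two groups fixed and varies only the number of data points. The numbers (a difference of 0.5 with a standard deviation of 10), the z-test approximation, and the function names are assumptions chosen for illustration; the effect size shown is Cohen's d, one common measure.

```python
from statistics import NormalDist


def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p value for a difference between two equal-sized groups,
    using a z-test with a known, shared standard deviation."""
    standard_error = sd * (2 / n_per_group) ** 0.5
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def cohens_d(mean_diff, sd):
    """Effect size: the difference expressed in standard deviations."""
    return mean_diff / sd


mean_diff, sd = 0.5, 10.0  # hypothetical, deliberately tiny effect
d = cohens_d(mean_diff, sd)

p_values = {}
for n in (100, 1_000, 10_000, 100_000):
    p_values[n] = two_sample_z_p_value(mean_diff, sd, n)
    print(f"n per group = {n:>7}: p = {p_values[n]:.4f}, Cohen's d = {d:.2f}")
```

The effect size is identical in every row, yet the same tiny difference moves from clearly non-significant to highly significant as the groups grow.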