How to Calculate Outliers

An outlier is a value in a data set that lies far from the other values. Outliers can be caused by experimental or measurement errors, or by a long-tailed population. In the former case, it can be desirable to identify outliers and remove them from the data before performing a statistical analysis, because they can skew the results so that they no longer accurately represent the population being sampled. The simplest way to identify outliers is the quartile method.

    Sort the data in ascending order. For example, take the data set {4, 5, 2, 3, 15, 3, 3, 5}. Sorted, the example data set is {2, 3, 3, 3, 4, 5, 5, 15}.

    Find the median. This is the value for which half the data points are larger and half are smaller. If there is an even number of data points, average the middle two. For the example data set, the middle points are 3 and 4, so the median is (3 + 4) / 2 = 3.5.

    Find the upper quartile, Q3; this is the value above which 25 percent of the data lie. It is the median of the upper half of the sorted data. If that half contains an even number of points, average its two middle points. For the example data set, the upper half is {4, 5, 5, 15}, so Q3 = (5 + 5) / 2 = 5.

    Find the lower quartile, Q1; this is the value below which 25 percent of the data lie. It is the median of the lower half of the sorted data. If that half contains an even number of points, average its two middle points. For the example data set, the lower half is {2, 3, 3, 3}, so Q1 = (3 + 3) / 2 = 3.

    Subtract the lower quartile from the upper quartile to get the interquartile range, IQR. For the example data set, IQR = Q3 – Q1 = 5 – 3 = 2.

    Multiply the interquartile range by 1.5. Add this to the upper quartile and subtract it from the lower quartile. Any data point outside the resulting range is at least a mild outlier. For the example set, 1.5 × 2 = 3; thus 3 – 3 = 0 and 5 + 3 = 8. So any value less than 0 or greater than 8 is at least a mild outlier. This means that 15 qualifies as a mild outlier.

    Multiply the interquartile range by 3. Add this to the upper quartile and subtract it from the lower quartile. Any data point outside the resulting range is an extreme outlier. For the example set, 3 × 2 = 6; thus 3 – 6 = –3 and 5 + 6 = 11. So any value less than –3 or greater than 11 is an extreme outlier. This means that 15 also qualifies as an extreme outlier.
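
The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names quartile_fences and find_outliers are made up for this example, and the handling of an odd number of data points (the middle value is left out of both halves) is an assumption, since the steps only cover the even case. Other quartile conventions exist and can give slightly different results.

```python
from statistics import median


def quartile_fences(data):
    """Return Q1, Q3, the interquartile range, and the mild/extreme limits.

    Quartiles are taken as the medians of the lower and upper halves
    of the sorted data. Needs at least two data points.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]           # lower half of the sorted data
    upper = values[half + n % 2:]   # upper half (assumption: skip the middle value if n is odd)
    q1 = median(lower)
    q3 = median(upper)
    iqr = q3 - q1
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "mild": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "extreme": (q1 - 3 * iqr, q3 + 3 * iqr),
    }


def find_outliers(data):
    """Return (values outside the mild limits, values outside the extreme limits)."""
    fences = quartile_fences(data)
    mild_low, mild_high = fences["mild"]
    extreme_low, extreme_high = fences["extreme"]
    mild = [x for x in data if x < mild_low or x > mild_high]
    extreme = [x for x in data if x < extreme_low or x > extreme_high]
    return mild, extreme


example = [4, 5, 2, 3, 15, 3, 3, 5]
print(quartile_fences(example))
print(find_outliers(example))
```

For the example data set, this gives Q1 = 3, Q3 = 5, an interquartile range of 2, mild limits of 0 and 8, and extreme limits of –3 and 11, with 15 falling outside both ranges, matching the hand calculation.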

    Tips

    • An extreme outlier is more indicative of a bad data point than a mild outlier is.
