Understanding Outliers
Today's post is inspired by a question one of my colleagues asked about outliers:
Are there different types?
And to that question, the simple answer is yes!
As data professionals, we often encounter data points that stand out like a sore thumb during analysis, leaving us wondering what to do with them.
In most cases, deletion seems like the obvious choice.
However, I believe we need to pause and take a closer look at these points before removing them.
You see, during Exploratory Data Analysis (EDA for short), asking the right questions about the dataset is crucial to uncovering its full story.
Deleting an outlier just because it stands out can create a narrow, myopic view of the data, limiting the depth and value of the insights you can gain.
Rather than rushing to remove outliers, consider what they might reveal. Could they signal an emerging trend, an unusual but important event, or even a data collection issue?
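To make "pause before deleting" concrete, here is a minimal Python sketch that flags candidate outliers for review instead of dropping them. It assumes Tukey's fences (anything more than 1.5 times the interquartile range below the first quartile or above the third) as the rule of thumb, which is one common choice rather than the only one. The function name `flag_outliers_iqr` and the view counts are hypothetical, used only for illustration.

```python
from statistics import quantiles


def flag_outliers_iqr(values, k=1.5):
    """Return (value, is_candidate_outlier) pairs using Tukey's fences.

    Values outside [Q1 - k*IQR, Q3 + k*IQR] are flagged, not removed,
    so a person can review them before deciding what to do.
    """
    q1, _, q3 = quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(v, v < low or v > high) for v in values]


# Hypothetical hourly view counts for one creator's videos (made-up data).
views = [480, 510, 495, 530, 470, 505, 10_000_000]

for value, flagged in flag_outliers_iqr(views):
    if flagged:
        print(f"review before deciding: {value}")
```

Only the 10-million-view video is flagged here. Whether it is a logging error or your breakout hit is still a question the data alone cannot answer.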
With that said, here are some common types of outliers, with an example of each:
Type | Explanation | Example |
---|---|---|
Global Outlier | A data point that is extremely different from all others in the dataset. | A TikTok video getting 10 million views in an hour when your other videos average 500 views. |
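Contextual Outlier | A data point that seems unusual only when considering the specific context (such as time, place, or season), even though the same value would be normal elsewhere. | Someone buying 2 TVs in an ordinary week in March stands out, while the same purchase during Black Friday sales is unremarkable. |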
Collective Outlier | A group of related data points that deviate significantly when considered together, even if no single point is unusual on its own. | Thousands of people suddenly migrating to the middle of the Nevada desert to build a temporary city for Burning Man. |
Natural Outlier | A valid data point that represents natural variation within the population. | A cat with heterochromia (one blue eye and one green eye)—rare, but perfectly natural and stunning! |
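The difference between a global and a contextual outlier can be sketched in a few lines of Python. This is an illustration under assumptions: the TV purchase counts are made up, the helper `zscore_flags` is hypothetical, and the cutoff of 2 standard deviations is an arbitrary choice. The sketch scores the same purchases twice, once against all the data pooled together and once only against other purchases from the same shopping context.

```python
from statistics import mean, stdev


def zscore_flags(values, threshold=2.0):
    """Flag values whose z-score magnitude exceeds the threshold.

    Assumes the values are not all identical (stdev would be 0).
    """
    mu, sigma = mean(values), stdev(values)
    return [abs(v - mu) / sigma > threshold for v in values]


# Hypothetical TVs-per-shopper counts, grouped by context (made-up data).
purchases = {
    "regular week": [1, 1, 1, 1, 1, 1, 1, 1, 2],
    "Black Friday": [2, 1, 2, 2, 2, 1, 2, 2, 2],
}

# Global view: score every purchase against the whole pooled dataset.
pooled = [v for values in purchases.values() for v in values]
global_hits = [v for v, f in zip(pooled, zscore_flags(pooled)) if f]

# Contextual view: score each purchase only against its own context.
contextual_hits = [
    (context, v)
    for context, values in purchases.items()
    for v, f in zip(values, zscore_flags(values))
    if f
]

print("global:", global_hits)
print("contextual:", contextual_hits)
```

With this data, nothing is flagged globally, but the 2-TV purchase in a regular week stands out against its own context, while the same purchase on Black Friday does not. With groups this small, a single extreme value also inflates the standard deviation, so more robust measures such as the median absolute deviation are often used in practice.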
Outliers are much more than just anomalies in data—they are opportunities to ask better questions!
Whether it’s a global outlier that stands alone, a contextual outlier with a surprising twist, a collective outlier representing group behavior, or a natural outlier that simply reflects diversity, each has something valuable to reveal.