Algorithms: Linear Regression
y = mx + c: looks familiar, right?
In my Data Analytics journey, I've chosen to revisit and explore algorithms commonly used in this field.
Today, our focus is on linear regression: understanding its significance and limitations.
Let's begin by unraveling the concept of 'Linear Regression' through a series of "dumb" questions:
What is a line?
A line consists of an infinite number of points that extend indefinitely in two opposite directions.
What does it mean to be linear?
In mathematics, linearity often refers to a relationship between variables that can be graphically represented as a straight line.
What is regression?
Regression refers to a statistical method used to analyze how one variable changes in relation to one or more other variables.
Considering these explanations, we can define Linear Regression as the analysis of the relationship between two variables, modeled as a straight line: y = mx + c, where m is the slope of the line and c is the point where it crosses the y-axis.
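To make y = mx + c concrete, here is a minimal sketch of how m and c could be chosen from data. It assumes the common least-squares approach (pick the line that minimizes the squared vertical distances to the points), which this post does not spell out; the function name `fit_line` and the toy points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line always passes through (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


# Toy points that lie exactly on y = 2x + 1 (illustrative values only).
m, c = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, c)
```

With real data the points will not sit exactly on a line, so m and c describe the best straight-line compromise rather than an exact rule.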
Suppose we want to understand how traffic volume changes on a particular highway as Christmas approaches.
We collect daily counts of the vehicles passing through a specific checkpoint on the highway, from the beginning of December until Christmas Day.
Hypothetical Dataset Snippet
Day (December) | Number of Vehicles |
---|---|
1 | 2000 |
2 | 2200 |
20 | 3800 |
21 | 4000 |
24 (Christmas Eve) | 5000 |
25 (Christmas Day) | 4800 |
Here are some pros and cons of using linear regression to analyze this dataset:
Pros
Simplicity
It clearly shows how traffic changes as Christmas approaches (assuming that the relationship is roughly linear).
Prediction
The model can forecast the expected traffic volume for the remaining days leading up to Christmas, based on the established trend.
Cons
The Relationship Must Be Linear
If the actual relationship between the day of December and traffic volume is not truly linear but follows a more complex pattern, a straight line will misrepresent it and we would need to use another model.
Overfitting
A line fitted to one small dataset can be too sensitive to the quirks of that dataset and might not accurately predict traffic for new days or future years.
Underfitting
By being too simplistic, a straight line can fail to capture the nuances and fluctuations in traffic patterns, such as the drop from Christmas Eve to Christmas Day in the table above.