Algorithms: Linear Regression

 


y = mx + c: looks familiar, right?


In my Data Analytics journey, I've chosen to revisit and explore algorithms commonly used in this field.

Today, our focus is on linear regression: understanding its significance and limitations.

Let's begin by unraveling the concept of 'Linear Regression' through a series of "dumb" questions:

What is a line?

A line is a straight path made up of infinitely many points, extending indefinitely in two opposite directions.


What does it mean to be linear?

In mathematics, linearity often refers to a relationship between variables that can be graphically represented as a straight line. 


What is regression?

Regression is a statistical method for estimating the relationship between a dependent variable and one or more independent variables.


Considering these explanations, we can define Linear Regression as the analysis of the relationship between two variables, modelled as a straight line: y = mx + c, where m is the slope and c is the intercept.
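To make "finding the line" concrete, here is a minimal sketch of the usual least-squares calculation for m and c. The function name fit_line and the toy points are my own illustration, not part of any dataset in this post.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope
    c = mean_y - m * mean_x    # intercept
    return m, c


# Toy points chosen to lie exactly on y = 2x + 1 (illustrative only)
m, c = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, c)
```

Because the toy points sit exactly on a line, the fit recovers that line's slope and intercept. Real data will scatter around the line instead.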


Suppose we want to understand how traffic volume changes on a particular highway as Christmas approaches. 

We collect data on the number of vehicles passing through a specific checkpoint on the highway daily, starting from the beginning of December until Christmas day.


Hypothetical Dataset Snippet

Day (December)        Number of Vehicles
1                     2000
2                     2200
...                   ...
20                    3800
21                    4000
...                   ...
24 (Christmas Eve)    5000
25 (Christmas Day)    4800


Here are some pros and cons of using this approach to analyze this dataset:

Pros

Simplicity

It clearly shows how traffic changes as Christmas approaches (assuming that the relationship is roughly linear).

Prediction

The model can forecast expected traffic volume for the days leading up to Christmas, based on the established trend.


Cons

The Relationship Must Be Linear

If the actual relationship between the day of December and traffic volume is not truly linear but follows a more complex pattern, we would need to use a different model.

Overfitting

Because the line is fitted to one December's data, it can be too sensitive to the specifics of that dataset, such as unusual days, and might not accurately predict traffic for new days or future years.

Underfitting

By being too simplistic, a straight line can fail to capture the nuances and fluctuations in traffic patterns, such as the dip from Christmas Eve (5000) to Christmas Day (4800) in the snippet above.



