Data Lineage : Where Did Your Data Come From?


Heredity is the story written across generations.


Today's post is inspired by a conversation with a friend who is on a quest to build her family tree. 

This got me thinking about why we have such a strong desire to uncover our origins, and as a biology enthusiast, I couldn't help but draw parallels between genetics and data.

Why is it important to trace the origins of things? What do we gain from understanding where we come from? These are the questions I'll explore today.

In biology, traits are passed down through generations. For instance, if your ancestors were from Ireland, you might inherit the genes associated with red hair or a predisposition to certain conditions like celiac disease.

Similarly, in the world of data, knowing the lineage of information is crucial. Imagine a retail company’s quarterly sales report being skewed because a high-priced item was mistakenly listed at a bargain price, leading to inflated sales and misleading projections. 

The first question would be, "Where did the prices come from, and when were they last updated?"

Understanding the origin of this data point is essential for correcting it quickly.

Just like in genetics, where understanding your ancestry can help you manage inherited traits, knowing the story behind each data asset allows you to address any "mutations" effectively.

With that in mind, here is how you can track the origins of your data points:


Identify key data points

This could be sales data, customer information, product details, or any other relevant data.


Create a data dictionary

Define the meaning and context of each data element, including its units of measurement, data types, and any constraints.
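To make this concrete, here is a minimal Python sketch of a data dictionary, assuming you keep it in code rather than in a spreadsheet. The `DataElement` class, its field names, and the two sample entries are hypothetical, chosen to fit the retail pricing example. Constraints are stored as small predicate functions so the dictionary can validate values, not just describe them.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class DataElement:
    """One entry in a data dictionary (field names are illustrative)."""
    name: str
    description: str
    data_type: type
    unit: Optional[str] = None
    # Each constraint is a (label, predicate) pair applied to a value.
    constraints: list[tuple[str, Callable[[Any], bool]]] = field(default_factory=list)

    def check(self, value: Any) -> list[str]:
        """Return the labels of every rule the value violates (empty list = valid)."""
        if not isinstance(value, self.data_type):
            return [f"expected {self.data_type.__name__}"]
        return [label for label, rule in self.constraints if not rule(value)]


# A hypothetical dictionary for a retail pricing table.
DATA_DICTIONARY = {
    "unit_price": DataElement(
        name="unit_price",
        description="Shelf price of one unit of a product",
        data_type=float,
        unit="USD",
        constraints=[("must be positive", lambda v: v > 0)],
    ),
    "sku": DataElement(
        name="sku",
        description="Stock keeping unit identifier",
        data_type=str,
        constraints=[("must be 8 characters", lambda v: len(v) == 8)],
    ),
}
```

For example, `DATA_DICTIONARY["unit_price"].check(-5.0)` returns `["must be positive"]`, while a valid price such as `19.99` returns an empty list.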


Classify how it was created

Data might come from internal systems, external sources, or manual input.
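A lightweight way to classify origins is to attach a tag to each data point. The sketch below assumes the three categories from this step. `Origin`, `SourceTag`, `tag`, and the system names are hypothetical labels, not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Origin(Enum):
    """The three origin categories described above."""
    INTERNAL_SYSTEM = "internal system"
    EXTERNAL_SOURCE = "external source"
    MANUAL_INPUT = "manual input"


@dataclass(frozen=True)
class SourceTag:
    """Records where a data point came from and when it was captured."""
    origin: Origin
    system: str  # hypothetical system or file name
    captured_at: datetime


def tag(origin: Origin, system: str) -> SourceTag:
    """Stamp a source tag with the current UTC time."""
    return SourceTag(origin, system, datetime.now(timezone.utc))


# Hypothetical example: one price typed in by hand, one loaded from a vendor feed.
price_tags = {
    "SKU-0001": tag(Origin.MANUAL_INPUT, "pricing spreadsheet"),
    "SKU-0002": tag(Origin.EXTERNAL_SOURCE, "vendor price feed"),
}

# Which prices were entered manually?
manual = [sku for sku, t in price_tags.items() if t.origin is Origin.MANUAL_INPUT]
```

With tags like these, the mispriced-item question ("Where did the prices come from, and when were they last updated?") becomes a simple lookup: `manual` is `["SKU-0001"]`, and each tag carries its capture time.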


Document transformations

Note any changes that occur to the data as it moves from one system to another.
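One simple way to document transformations is an append-only log per dataset, where each entry records the source system, the destination system, and what changed. This is a sketch under assumed names: `LineageLog`, `TransformationStep`, and the system names are illustrative. Real pipelines would usually store these records in a database or a dedicated lineage tool rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TransformationStep:
    """One hop in a dataset's journey between systems."""
    from_system: str
    to_system: str
    description: str
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class LineageLog:
    """Append-only history of the changes applied to one dataset."""
    dataset: str
    steps: list[TransformationStep] = field(default_factory=list)

    def record(self, from_system: str, to_system: str, description: str) -> None:
        """Add a new step; existing steps are never edited or removed."""
        self.steps.append(TransformationStep(from_system, to_system, description))

    def trace(self) -> list[str]:
        """Human-readable path, oldest step first."""
        return [f"{s.from_system} -> {s.to_system}: {s.description}" for s in self.steps]


# Hypothetical journey of a price list (system names are illustrative).
log = LineageLog("product_prices")
log.record("vendor_feed.csv", "staging_db", "parsed CSV and cast price to a number")
log.record("staging_db", "warehouse", "converted EUR to USD")
log.record("warehouse", "sales_report", "joined prices to quarterly orders")
```

Calling `log.trace()` lists the hops oldest first, so the second entry reads `"staging_db -> warehouse: converted EUR to USD"`. A suspicious value can then be followed back system by system.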


Identify dependencies 

For example, a customer's order might depend on their shipping address and product availability.
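Dependencies form a directed graph, so tracing them is a graph traversal. The sketch below stores a hypothetical dependency map as a plain dictionary and uses breadth-first search to find everything a data point depends on (`upstream`) and everything it feeds into (`downstream`). The node names extend the order example above and are assumptions.

```python
from collections import deque

# Hypothetical dependency map: each data point lists what it is derived from.
DEPENDS_ON: dict[str, list[str]] = {
    "customer_order": ["shipping_address", "product_availability", "unit_price"],
    "product_availability": ["warehouse_stock"],
    "quarterly_sales_report": ["customer_order", "unit_price"],
    "unit_price": ["pricing_spreadsheet"],
    "shipping_address": [],
    "warehouse_stock": [],
    "pricing_spreadsheet": [],
}


def upstream(node: str, graph: dict[str, list[str]]) -> set[str]:
    """Every data point `node` depends on, directly or indirectly (breadth-first)."""
    seen: set[str] = set()
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in seen:  # skip already-visited nodes, which also guards against cycles
            seen.add(current)
            queue.extend(graph.get(current, []))
    return seen


def downstream(node: str, graph: dict[str, list[str]]) -> set[str]:
    """Every data point that would be affected if `node` changed."""
    # Invert the edges, then reuse the same traversal.
    reverse: dict[str, list[str]] = {}
    for child, parents in graph.items():
        for parent in parents:
            reverse.setdefault(parent, []).append(child)
    return upstream(node, reverse)
```

Here `downstream("pricing_spreadsheet", DEPENDS_ON)` returns `{"unit_price", "customer_order", "quarterly_sales_report"}`: the assets to recheck if a price was entered incorrectly.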


Leverage automation

Consider using specialized data lineage tools that can automate the process of mapping data flow and tracking changes.
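Lineage tools automate this capture so nobody has to log each step by hand. The decorator below is a toy illustration of the idea only, not the API of any real lineage product. It records the declared inputs, output, and run time every time a pipeline step executes successfully. `tracked`, `LINEAGE`, and `compute_revenue` are hypothetical names, and the store is an in-memory list for simplicity.

```python
import functools
from datetime import datetime, timezone
from typing import Any, Callable

# Shared in-memory lineage store; a real tool would persist this elsewhere.
LINEAGE: list[dict[str, Any]] = []


def tracked(inputs: list[str], output: str) -> Callable:
    """Decorator that logs inputs, output, and run time each time the step succeeds."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = func(*args, **kwargs)  # if this raises, nothing is recorded
            LINEAGE.append({
                "step": func.__name__,
                "inputs": inputs,
                "output": output,
                "ran_at": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


@tracked(inputs=["unit_price", "units_sold"], output="revenue")
def compute_revenue(prices: dict[str, float], units: dict[str, int]) -> dict[str, float]:
    """Hypothetical pipeline step: revenue per SKU."""
    return {sku: prices[sku] * units.get(sku, 0) for sku in prices}


revenue = compute_revenue({"A": 10.0, "B": 2.5}, {"A": 3, "B": 4})
```

After the call, `revenue` is `{"A": 30.0, "B": 10.0}` and `LINEAGE` holds one record for `compute_revenue`. Declaring inputs and outputs next to the code keeps lineage close to where data is produced; dedicated tools typically go further by capturing this metadata without manual declarations.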



