Data Profiling with Python's Great Expectations

 

pip install great_expectations

As a Data Quality Analyst, I constantly seek tools to enhance my daily workflow, and Great Expectations is a recent discovery.

Before getting into what this Python library has to offer, let's address a fundamental question:

What is Data Profiling?

In order to effectively tackle any problem, one must first understand it. 

Data Profiling is the art of uncovering and investigating data quality issues, such as duplication, missing values, and inconsistency. 

It is essentially determining the baseline level of quality of a dataset in terms of the six data quality dimensions : Completeness, Uniqueness, Validity, Timeliness, Accuracy and Consistency. 


Great Expectations Data Profiling Example

Now, picture yourself as a pastry chef whose mission is to ensure that every currants roll emerging from your bakery is nothing short of delightful.

 Let's explore how data profiling in Great Expectations guarantees they meet your standards.

Step 1: Defining a Delicious Currants Roll

Similar to the standards for the perfect currants roll – a flaky, golden-brown crust and a filling of sweet currants – in data profiling, you define your expectations or rules to be followed for data quality.

Step 2: Finding the Ingredients 

It's crucial to know the precise locations of key ingredients such as butter, sugar, flour, and currants in your kitchen. Similarly, when working with Great Expectations for data profiling, you must specify where the data is located.

Step 3: Evaluating the Currants Roll

After your currants roll is baked, you taste-test it to confirm it meets your standards. With Great Expectations, you evaluate the results of your data analysis to ensure data quality aligns with your expectations.

Resources




Comments

Popular posts from this blog

Prompt Engineering : An Introduction

Women In STEM : Challenges and Advantages

5 Authentication Methods

Inductive and Deductive Reasoning

Don't Be Bland : Spice Up Your Personal Brand

3 Common Diseases Associated With Sitting All Day

Coding Best Practices : Error Messages Are Friends, Not Foes.

Upskilling: Certificates vs. Certifications

There Has Been a Data Breach: Now What?

Scheduling Algorithms