Data Modelling: Star x Snowflake Schemas

 


The formation of snowflakes is a complex and delicate process 


This week, I added more tables to a data schema I created a few months ago. 

To ensure optimal query performance when checking data quality, I took a deeper look at the differences between the two most popular schemas: star and snowflake.

Now, if you are new to this, your first question may be: What is a data schema?

To answer that question, let's start with the origin of the word. 

"Schema" comes from the Greek word σχῆμα (skhēma), which means shape or plan.

A data schema is the blueprint that defines the structure of the database, including tables, columns, data types, constraints, and entity relationships.

Naturally, your next question may be: What is the difference between a star and a snowflake?

Well, that's a good question! I hope we can both agree that a star may look like a snowflake, but it's not a snowflake.

What makes a snowflake special is its tendency to branch out at the end. 

Your final question would be: What is the difference between a star schema and a snowflake schema?

Before I get into the characteristics of these two schemas, I must first introduce you to two key pieces of jargon.

Denormalization: Think of a pot of soup, you can find everything in the same place but, sometimes it might get a little messy. 

Normalization: Similar to putting Barbie dolls in one box and Hot Wheels cars in another, normalization is putting data into specific tables so that you know exactly where to find them.

Great! Now you are well-equipped to know what the differences are between them. 

Imagine a star! 🌟: Denormalization

  1. In a Star schema, you have a bright center (the fact table) surrounded by arms (the dimension tables).
  2. The fact table holds important numbers or data you want to analyze, like how many cookies you eat each day.
  3. The dimension tables provide extra details about the data, like the type of cookies, who made them, or where they came from.
  4. It's easy to understand and it makes queries faster because the data is organized in a straightforward way.

Now, think of a snowflake! ❄️: Normalization

  1. In a Snowflake schema, you still have a central part (the fact table) but the arms (dimension tables) have more branches, like a snowflake spreading out.
  2. The dimension tables are more detailed, broken down into smaller sub-tables, to avoid repeating information.
  3. It's like a fancy snowflake with more intricate patterns, but it might take a bit more time for queries because of the additional connections between the sub-tables.
In summary, a Star schema is simple like a star and great for faster queries, while a Snowflake schema is more complex like a snowflake with detailed patterns, but it optimizes storage and data consistency. Depending on your needs, you can choose the one that fits your data requirements!


Resources






Comments

Popular posts from this blog

Missing Data : What to Do?

Prompt Engineering : An Introduction

Upskilling: Certificates vs. Certifications

Women In STEM : Challenges and Advantages

SQL Server Reporting Services vs. Power BI

5 Authentication Methods

There Has Been a Data Breach: Now What?

Inductive and Deductive Reasoning

Improving SQL Query Performance : Indexes

Don't Be Bland : Spice Up Your Personal Brand