Data Modelling: Star x Snowflake Schemas

- July 29, 2023

The formation of snowflakes is a complex and delicate process

This week, I added more tables to a data schema I created a few months ago.

To ensure optimal query performance when checking data quality, I took a deeper look at the differences between the two most popular schemas: star and snowflake.

Now, if you are new to this, your first question may be: What is a data schema?

To answer that question, let's start with the origin of the word.

"Schema" comes from the Greek word σχῆμα (skhēma), which means shape or plan.

A data schema is the blueprint that defines the structure of the database, including tables, columns, data types, constraints, and entity relationships.

Naturally, your next question may be: What is the difference between a star and a snowflake?

Well, that's a good question! I hope we can both agree that a star may look like a snowflake, but it's not a snowflake.

What makes a snowflake special is its tendency to branch out at the end.

Your final question would be: What is the difference between a star schema and a snowflake schema?

Before I get into the characteristics of these two schemas, I must first introduce you to two key pieces of jargon.

Denormalization: Think of a pot of soup, you can find everything in the same place but, sometimes it might get a little messy.

Normalization: Similar to putting Barbie dolls in one box and Hot Wheels cars in another, normalization is putting data into specific tables so that you know exactly where to find them.

Great! Now you are well-equipped to know what the differences are between them.

Imagine a star! 🌟: Denormalization

In a Star schema, you have a bright center (the fact table) surrounded by arms (the dimension tables).
The fact table holds important numbers or data you want to analyze, like how many cookies you eat each day.
The dimension tables provide extra details about the data, like the type of cookies, who made them, or where they came from.
It's easy to understand and it makes queries faster because the data is organized in a straightforward way.

Now, think of a snowflake! ❄️: Normalization

In a Snowflake schema, you still have a central part (the fact table) but the arms (dimension tables) have more branches, like a snowflake spreading out.
The dimension tables are more detailed, broken down into smaller sub-tables, to avoid repeating information.
It's like a fancy snowflake with more intricate patterns, but it might take a bit more time for queries because of the additional connections between the sub-tables.

In summary, a Star schema is simple like a star and great for faster queries, while a Snowflake schema is more complex like a snowflake with detailed patterns, but it optimizes storage and data consistency. Depending on your needs, you can choose the one that fits your data requirements!

Resources

Draw Entity-Relationship Diagrams, Painlessly

Video : Snowflake and Star Schemas

Course : Data Modelling in Power BI

HackerRank : SQL Practice Questions

Normalization and Denormalization

Search This Blog

Tech Talk with Toni