Posts

Showing posts from September, 2023

Choosing the Right Project for Your Portfolio

   Etymologically, the word "project" essentially means "to throw or cast forward." As students, we were all assigned projects to complete within a stipulated time period, whether we liked them or not, from papier-mâché in art class to solar system models in science. We completed these projects for good grades at the end of the term, which may have led to an even greater reward: that puppy you wanted for your birthday. Moving into our professional lives as technology experts, we often face new projects, but now there's a paycheck waiting for us at the end. This is great because we all have bills to pay. But if we focus only on the money, we might not be so willing to put in extra effort without getting paid for it. Recently, I began watching One Piece, and a common thread that stands out among the main characters is their dedication to something they genuinely love. Take Sanji, for example, who is so passionate about cooking that he

Web Scraping: An Introduction

  Image: Wolverine was here

  Web scraping has been used to compile extensive datasets on everything from UFO sightings to product reviews on the internet! If you have ever gone to a website, copied data from it, and pasted it into an Excel file, then congratulations, you have done web scraping manually. However, what if you had a bigger problem to solve and required more data from a website? For example, the wristband for your cheap Casio watch broke, and you decided it's time to upgrade to a more premium model. However, before spending your money, you wanted to compare the prices, reviews and materials of thousands of watches to ensure that you are making the best decision when it comes to durability and your budget. I hope we can both agree that this task is not humanly possible within a reasonable amount of time, and that's where libraries like Scrapy and Beautiful Soup come into play. With great power comes great responsibility! Here are some pros and cons of web scrapi

Validating Email Address Format with Python's Great Expectations

  Image: Your email address is valid!

  This week, I had the task of gathering a client contact list, but I ran into some issues with certain email addresses. Specifically, I noticed that some email addresses were missing the '@' symbol, while others were lacking the domain name. To ensure the integrity of our client contact list, it's crucial that all email addresses follow the expected format: username@domain.com. This way, when we use this data for its intended purpose (sending emails to customers regarding promotions, updates, or other important communications), we can minimize the number of undeliverable emails. I've already taken steps to address this issue by removing rows that don't adhere to this format using Alteryx. Today, I'm excited to show you how we can use Great Expectations on Google Colaboratory to identify erroneous emails. I want to give a special shout-out to our Product Manager, Ikechi Griffith, for challenging me to dive deeper into Great Expect
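
  To make the format check concrete, here is a minimal sketch that uses Python's built-in `re` module rather than Great Expectations itself. The pattern, the function name `split_by_email_format`, and the sample addresses are all assumptions made for illustration; they are not taken from the original post. The pattern only checks the rough username@domain.com shape and is not a full email standard validator.

```python
import re

# Assumed pattern for the "username@domain.com" shape: some text, one '@',
# a domain name, a dot, and a top-level domain of at least two letters.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def split_by_email_format(emails):
    """Split addresses into those that match the pattern and those that don't."""
    valid, invalid = [], []
    for email in emails:
        if EMAIL_PATTERN.match(email):
            valid.append(email)
        else:
            invalid.append(email)
    return valid, invalid


# Hypothetical sample: one well-formed address, one missing '@',
# and one missing the domain name.
sample = ["toni@example.com", "toni.example.com", "toni@"]
valid, invalid = split_by_email_format(sample)
print("valid:", valid)
print("invalid:", invalid)
```

  Great Expectations has a regex-based expectation, `expect_column_values_to_match_regex`, and a pattern like the one above could plausibly be passed to it, though the exact setup depends on the library version in use.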

Data Profiling with Python's Great Expectations

  Image: pip install great_expectations

  As a Data Quality Analyst, I constantly seek tools to enhance my daily workflow, and Great Expectations is a recent discovery. Before getting into what this Python library has to offer, let's address a fundamental question: What is Data Profiling? In order to effectively tackle any problem, one must first understand it. Data Profiling is the art of uncovering and investigating data quality issues, such as duplication, missing values, and inconsistency. It is essentially determining the baseline level of quality of a dataset in terms of the six data quality dimensions: Completeness, Uniqueness, Validity, Timeliness, Accuracy and Consistency.

  Great Expectations Data Profiling Example

  Now, picture yourself as a pastry chef whose mission is to ensure that every currants roll emerging from your bakery is nothing short of delightful. Let's explore how data profiling in Great Expectations guarantees they meet your standards.

  Step 1: Defining a Del
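
  As a rough illustration of two of the dimensions named above, Completeness and Uniqueness, here is a small plain-Python sketch. It does not use Great Expectations, and the function name `profile_column` and the sample batch IDs are hypothetical, chosen only to show how a baseline score might be computed.

```python
def profile_column(values):
    """Return simple completeness and uniqueness ratios for one column."""
    total = len(values)
    # Treat None and empty strings as missing values (an assumption).
    present = [v for v in values if v not in (None, "")]
    completeness = len(present) / total if total else 0.0
    # Share of non-missing values that are distinct.
    uniqueness = len(set(present)) / len(present) if present else 0.0
    return {"completeness": completeness, "uniqueness": uniqueness}


# Hypothetical batch IDs: one duplicate and one missing value.
rolls = ["R001", "R002", "R002", None]
report = profile_column(rolls)
print(report)
```

  A score below 1.0 on either ratio points to missing or duplicated entries worth investigating before the data is used.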

Documentation : Boring But Necessary Habit

  "The palest ink is better than the best memory." (Chinese proverb)

  Writing things down is the best way to ensure that knowledge is accurately passed down from one generation to the next, rather than relying solely on our memories. For instance, let's consider our approach to cooking. To achieve the best results, you'll often search for the sponge cake recipe your mom uses, even though you've made cake many times. Similarly, when I discuss the work I completed last week with my coworkers and start with "Do you remember when...," they often respond with, "Toni, just assume I forgot everything." Both of these scenarios confirm our lack of confidence in our ability to recall information. As a result, I've developed the habit of writing down anything worth remembering, from my friends' birthdays to my Pennywise list, and I believe that this is a habit that everyone should adopt. I would have briefly touched on this in Blog 16 : Com