Posts

Showing posts from April, 2023

Numbers vs. Words: Quantitative and Qualitative Data Explained

Russian nesting dolls make for a great analogy for ordinal data: each subsequent doll is larger or smaller than the previous one, representing a ranking or order.

As someone with a background in the natural sciences, I have a solid understanding of the difference between quantitative and qualitative data. Quantitative data is anything that can be measured; for example, atmospheric pressure can be measured with a barometer. It's the kind of data that can answer life's most pressing questions, such as 'How many more hours until Friday?' and 'How much caffeine from Starbucks is too much?' Qualitative data, on the other hand, provides a description of something; for instance, you may have brown hair and be of Asian descent. These two categories of data are like Russian nesting dolls, however: each can be divided into sub-categories that unpack the characteristics of the data even more. Here are the respective sub-categories of quantitative and qual…
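To make the nesting-doll idea concrete, here is a minimal sketch of the four sub-categories that usually come up (nominal and ordinal under qualitative; discrete and continuous under quantitative). The variable names and example values are my own, invented for illustration, not taken from the post:

```python
# Qualitative data: nominal (no inherent order) and ordinal (ranked categories).
# Quantitative data: discrete (countable) and continuous (measured on a scale).

hair_color = "brown"                       # nominal: a category with no order
doll_sizes = ["small", "medium", "large"]  # ordinal: categories with a rank
rank = {name: i for i, name in enumerate(doll_sizes)}

cups_of_coffee = 3       # discrete: a countable whole number
pressure_hpa = 1013.25   # continuous: a barometer reading on a scale

# Ordinal data supports comparisons through its ranking, even though the
# labels themselves are words rather than numbers.
assert rank["small"] < rank["large"]
```

The nesting-doll ordering only exists once you assign ranks; the nominal `hair_color` value has no such ordering, which is exactly what separates the two qualitative sub-categories.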

Data Wrangling: Best Practices for Working with Big Datasets

A matrix of dots that you did not bother to count.

Let's face it: working with ever-expanding datasets can be both exciting and overwhelming. Imagine dealing with a dataset of 22 million rows; that's a lot of information to process! The question is, can your ETL process handle it? This is a problem I faced this week, when I had to improve the performance of a process that updates a dashboard, because it was taking a decade to update. The first question that crossed my mind was, "What is the actual size of this dataset?" At first, I mistakenly assumed that the dataset contained between 1 and 6 million rows. A quick COUNT(*) query made it clear that my estimate was way off, and it also brought some clarity to the problem: that dataset was a behemoth; it probably had its own gravitational pull! The sad truth was that the process was not scalable, and it was clear that immediate improvements were necessary. Here are three tips that I…
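As a sketch of that first sizing step, here is what the COUNT(*) check looks like in Python, with the standard-library sqlite3 module standing in for the real database. The `events` table and its row count are made up for the demo, not the 22-million-row table from the post:

```python
import sqlite3

def table_row_count(conn: sqlite3.Connection, table: str) -> int:
    """Return an exact row count so you can choose a processing strategy."""
    # A table name cannot be a bound parameter, so validate it against the
    # catalog first instead of interpolating untrusted input.
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if exists is None:
        raise ValueError(f"unknown table: {table}")
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return count

# Demo with an in-memory database standing in for the real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(1000)])
print(table_row_count(conn, "events"))  # → 1000
```

Measuring before optimizing is the point: a guess of "1 to 6 million rows" and a measured 22 million lead to very different scaling decisions.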

The ETL Process: Pros and Cons of Low-Code Software

IBM 350 Disk Storage: a huge stack of metal plates that could store up to 5,000,000 characters.

In school, you learn the ropes of programming and might even find yourself tackling odd assignments, like building a cricket simulator in C++ in just two weeks. In the professional world, ad hoc report requests can cause anxiety; gone are the days of set deadlines. Fear not, as you can now use low-code software to easily create programs by dragging and dropping tools for a wide range of tasks. Today, our focus will be on the ETL process. In case you're not familiar with the acronym, ETL stands for Extract, Transform, and Load, the essential steps in the data pipeline. The process involves extracting data from multiple sources, transforming it into a desired format or structure, and loading it into a target system for analysis or reporting. However, this repetitive process can be time-consuming, as it often involves cleaning, joining, filtering, and aggregating data. As a result…
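The three ETL steps described above can be sketched in a few lines of ordinary code, which is what the low-code tools wrap behind drag-and-drop blocks. This is an illustrative toy using only the Python standard library; the CSV source, the `sales` schema, and the cleaning rules are invented for the example:

```python
import csv
import io
import sqlite3

def extract(raw_csv: str) -> list:
    """Extract: read rows from a source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list) -> list:
    """Transform: clean, filter, and reshape rows into the target schema."""
    return [
        (row["name"].strip().title(), int(row["amount"]))
        for row in rows
        if row["amount"].isdigit()  # drop malformed rows during cleaning
    ]

def load(conn: sqlite3.Connection, records: list) -> None:
    """Load: write the transformed records into the target system."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)

raw = "name,amount\n alice ,10\nbob,not_a_number\ncarol,25\n"
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(raw)))
print(conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall())
# → [('Alice', 10), ('Carol', 25)]
```

Even in this toy, the transform step carries most of the work (trimming, type conversion, filtering bad rows), which is exactly why the post calls the process repetitive and time-consuming.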

Coding Best Practices: Error Messages Are Friends, Not Foes

This error's name originated from Room 404.

At first, I saw error messages as coding villains, ready to ruin my day with red underlines. As soon as one appeared, I would panic and do everything humanly possible to get rid of it. Now, though, I view error messages as superheroes, here to help me debug my code. The inspiration for this week's post comes from a problem I encountered on the job.

The Problem

Updates I made introduced errors into the SQL queries needed to generate a crucial report. However, I didn't know exactly where to start the debugging process, as all the scripts were being executed dynamically. Whenever one of these queries failed, the entire process would throw a generic error message that did not specify where the error occurred. The brute-force approach of combing through dozens of scripts to find a handful of syntax errors would have been stressful and time-consuming.

The Solution

Instead of manually looking for the problem, I want…
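One way to avoid that generic error is to wrap each dynamically executed script so that a failure is re-raised with the offending script's name attached. This is a hedged sketch in Python with sqlite3, not the actual system from the post; the script names and the deliberate typo are invented for illustration:

```python
import sqlite3

def run_scripts(conn: sqlite3.Connection, scripts: dict) -> None:
    """Execute each SQL script, re-raising errors tagged with the script name."""
    for name, sql in scripts.items():
        try:
            conn.executescript(sql)
        except sqlite3.Error as exc:
            # Surface WHICH script failed instead of a generic error message.
            raise RuntimeError(f"script {name!r} failed: {exc}") from exc

conn = sqlite3.connect(":memory:")
scripts = {
    "create.sql": "CREATE TABLE t (x INTEGER);",
    "broken.sql": "INSERT INTO t VALUSE (1);",  # deliberate typo: VALUSE
}
try:
    run_scripts(conn, scripts)
except RuntimeError as err:
    print(err)  # e.g. script 'broken.sql' failed: near "VALUSE": syntax error
```

With the script name in the exception, debugging starts at the one failing file instead of a comb through dozens of scripts, which is the whole point of treating error messages as friends.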