How to minimize errors and maximize results in your hypothesis tests

Photo by Thomas Kelley on Unsplash

Introduction

Knowing how to set up and conduct a hypothesis test is a critical skill for any aspiring data scientist. At first, it can be confusing to make sense of alpha, beta, power, and Type I and Type II errors. My goal in this article is to help you build intuition and provide some visual references.
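To make the relationship between these terms concrete, here is a minimal sketch (not taken from the article) for one specific case: a one-sided, one-sample z-test where the population standard deviation is assumed to be known. The helper name `one_sided_z_test_power` and the example numbers are hypothetical. Alpha sets the rejection cutoff, beta is the chance of missing a real effect, and power is 1 - beta.

```python
from statistics import NormalDist


def one_sided_z_test_power(effect_size, sigma, n, alpha=0.05):
    """Hypothetical helper: power of a one-sided, one-sample z-test (H1: mean > null mean).

    Assumes the true effect (true mean minus null mean) and sigma are known.
    """
    std_normal = NormalDist()
    # Critical value: reject H0 when the z statistic exceeds it.
    # alpha is the Type I error rate, P(reject H0 | H0 true).
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under H1 the z statistic is normal with mean effect_size * sqrt(n) / sigma.
    shift = effect_size * n ** 0.5 / sigma
    # beta is the Type II error rate, P(fail to reject H0 | H1 true).
    beta = std_normal.cdf(z_crit - shift)
    return 1 - beta  # power = 1 - beta


for n in (10, 30, 100):
    power = one_sided_z_test_power(effect_size=0.5, sigma=1.0, n=n, alpha=0.05)
    print(f"n={n:>3}  power={power:.3f}  beta={1 - power:.3f}")
```

Raising the sample size or alpha increases power; the trade-off with a larger alpha is more Type I errors.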


Office Hours

A guide for increasing your business acumen and acing the interview questions

Photo by Hunters Race on Unsplash

Why are business case interview questions so important?

It’s not enough to be good at statistical tests, machine learning, or coding. These technical skills are, of course, essential to being good at data science, but it’s possible to know all the technical material and still be considered a terrible data scientist. You also need the soft skills and business knowledge to work effectively with cross-functional teams, communicate results, and truly understand the problems you are trying to solve. Some business acumen will make you a much more effective data scientist.


Shopping comparisons of men’s and women’s products on Amazon

Photo by Rochelle Brown on Unsplash

In honor of Women’s History Month, I wanted to do an exploratory data analysis (EDA) project related to gender equality. With Equal Pay Day upon us in the United States, I immediately thought about the pink tax. After all, nothing boils the blood quite like getting paid less while also having your dollar not go as far, all simply for being a woman.

What is pink tax?

Pink tax is the tendency for products marketed to women to be more expensive than equivalent products for men. Have you ever noticed that the pink razors cost an extra few cents? Or that women’s…


Getting Started

Getting better at SQL will save you time and frustration

Photo by Markus Spiske on Unsplash

SQL is an important skill for many data scientists. SQL (Structured Query Language) is a flexible language that reads a lot like plain English, and it gives easy access to even the most complex table structures in a database. After all, what good is data if you can’t access it? Many jobs on the market call for SQL knowledge, so it’s a smart idea to learn at least the basics. …
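As a small illustration of how SQL reads, here is a sketch using Python’s built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their rows are made up for the example, and other database engines differ slightly in syntax.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'NY'), (2, 'Ben', 'CA'), (3, 'Cy', 'NY');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.5), (3, 2, 12.0), (4, 3, 8.0);
""")

# Reads almost like a sentence: "select each New York customer's name and
# total spend, joining orders to customers, largest total first."
query = """
    SELECT c.name, SUM(o.amount) AS total_spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.state = 'NY'
    GROUP BY c.name
    ORDER BY total_spent DESC;
"""
rows = cur.execute(query).fetchall()  # list of (name, total_spent) tuples
for name, total in rows:
    print(name, total)

conn.close()
```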


A beginner’s guide for expanding your clustering knowledge beyond K-Means.

Photo by Nareeta Martin on Unsplash

Introduction

Recently I was using K-Means in a project and decided to see what other clustering algorithms were out there. I always enjoy sinking my teeth into expanding my data science skill set, so I wrote this article to share what I discovered while broadening my clustering knowledge to include Gaussian Mixture Models.

An Overview of Gaussian Mixture Models

When you hear the name of this technique, you may think of the Gaussian distribution (also called the normal distribution). That’s exactly what this clustering technique is based on: it models the data as a mixture of several Gaussian distributions, one per cluster. …
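For a quick feel for the technique, here is a minimal sketch using scikit-learn’s `GaussianMixture` on synthetic data, assuming scikit-learn and NumPy are installed. The two blobs and all parameter values are hypothetical choices, not the article’s own example. Unlike K-Means, which gives each point a single hard label, a Gaussian mixture also returns the probability that each point belongs to each cluster.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 2-D data: two blobs drawn from different normal distributions.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
blob_b = rng.normal(loc=[8.0, 8.0], scale=1.5, size=(100, 2))
X = np.vstack([blob_a, blob_b])

# Fit a mixture of two Gaussians; each component gets its own mean and covariance.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

hard_labels = gmm.predict(X)        # most likely component for each point
soft_labels = gmm.predict_proba(X)  # probability of membership in each component

print("Component means:\n", gmm.means_.round(2))
print("First point's membership probabilities:", soft_labels[0].round(3))
```

Here the number of components is known in advance because the data is synthetic; on real data, one option is to compare fits with the model’s `bic` method.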


Say hello to a library called Yellowbrick

Photo by Kristen Beever on Unsplash

What is an AUC ROC plot?

An AUC ROC (Area Under the Curve of the Receiver Operating Characteristic) plot can be used to visualize the trade-off between a model’s sensitivity and specificity. Sensitivity refers to the ability to correctly identify entries that fall into the positive class. Specificity refers to the ability to correctly identify entries that fall into the negative class. The ROC curve plots sensitivity (the true positive rate) against 1 - specificity (the false positive rate) across classification thresholds, and the AUC summarizes that curve as a single number. Put another way, an AUC ROC plot can help you identify how well your model is able to distinguish between classes.
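To see what the plot is built from, here is a minimal pure-Python sketch that computes sensitivity and specificity at a few thresholds, plus the AUC using its probabilistic interpretation (the chance that a randomly chosen positive scores higher than a randomly chosen negative). The labels, scores, and function names are hypothetical. Yellowbrick’s `ROCAUC` visualizer draws the full curve for a fitted model; this sketch does not use it.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Sensitivity (true positive rate) and specificity (true negative rate) at one threshold."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, y_score):
    """AUC as P(random positive outscores random negative); ties count as half."""
    positives = [s for label, s in zip(y_true, y_score) if label == 1]
    negatives = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical true labels and model scores (predicted probability of class 1).
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

# Each threshold yields one (1 - specificity, sensitivity) point on the ROC curve.
for t in (0.3, 0.5, 0.7):
    sens, spec = sensitivity_specificity(y_true, y_score, threshold=t)
    print(f"threshold={t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
print("AUC:", roc_auc(y_true, y_score))
```

Lowering the threshold raises sensitivity at the cost of specificity, which is exactly the trade-off the ROC curve traces out.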


How should you define success?

Photo by Nguyen Dang Hoang Nhu on Unsplash

When I start a new classification project, I always take some time to sit down with the data and my business case to ask an important question: what does it mean to have a “successful” model? In this article, I try to help you think through several scoring metrics and which might be right for your modeling project, though this is by no means an exhaustive list.

Accuracy

Accuracy is often thought of as the most basic, standard scoring metric. It is the fraction of all predictions that the model got correct…
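A minimal sketch of accuracy, with hypothetical function names and data, also shows why it may not be enough on its own: on an imbalanced dataset, a model that never predicts the positive class can still score high accuracy while finding none of the positives (zero recall).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model identified."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_positives = sum(1 for t in y_true if t == positive)
    return tp / actual_positives


# Hypothetical imbalanced data: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100  # a "model" that never predicts the positive class

print("accuracy:", accuracy(y_true, always_negative))
print("recall:", recall(y_true, always_negative))
```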


Understanding some background basics can be a game changer

Photo by Jeet Dhanoa on Unsplash

Understanding some basics of how Python works under the hood can help you code with more confidence. I want to share three things about Python that you may find useful, especially if you are new to the language.

1. Variables are references to an address in memory

You may have heard people say that “everything in Python is an object.” If you are new to Python, this can feel really confusing. Everyone says it, but what does it really mean? …
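Here is a short sketch of what “variables are references” looks like in practice, using only built-ins. One caveat: `id()` returns an identifier that is unique for an object’s lifetime, and in CPython it happens to be the object’s memory address; that is an implementation detail, not a language guarantee.

```python
# Two names bound to the same list object.
a = [1, 2, 3]
b = a                  # b refers to the same object as a; no copy is made
b.append(4)            # mutating through b is visible through a
print(a)               # [1, 2, 3, 4]
print(a is b)          # True: same object
print(id(a) == id(b))  # True: same identity

# Rebinding a name points it at a different object; the old object is untouched.
b = [1, 2, 3, 4]
print(a == b)          # True: equal contents
print(a is b)          # False: different objects

# Functions and integers are objects too, each with a type.
print(type(len), type(42))
```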


With Examples Using Seaborn and Plotly Express.

Photo by Benedikt Geyer on Unsplash

When I first started with data science, I was amazed at all the beautiful plots that could be made so easily with packages like Seaborn or Plotly Express. But while working on one project, I realized the perfect EDA plot would show the percentage of entries in each target class, split out by a categorical feature. After some scouring through documentation, galleries, and Stack Overflow pages, I realized that there was no canned plot that did what I wanted. In this article, I’m going to…
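The article’s own approach is cut off here, so the following is only one possible sketch, assuming pandas: compute the per-category percentages with `pd.crosstab(..., normalize="index")`, then reshape them to long format for plotting. The `contract` and `churned` columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical EDA data: a categorical feature and a target class.
df = pd.DataFrame({
    "contract": ["monthly", "monthly", "monthly", "yearly",
                 "yearly", "yearly", "yearly", "monthly"],
    "churned": ["yes", "no", "yes", "no", "no", "yes", "no", "no"],
})

# normalize="index" divides each row by its total, so each category's shares sum to 1;
# multiplying by 100 turns them into percentages.
pct = pd.crosstab(df["contract"], df["churned"], normalize="index") * 100

# Long format (one row per category/class pair) suits plotting functions
# that take a hue or color argument.
pct_long = pct.reset_index().melt(
    id_vars="contract", var_name="churned", value_name="percent"
)
print(pct_long)
```

The resulting `pct_long` could then be passed to, for example, `sns.barplot(data=pct_long, x="contract", y="percent", hue="churned")` or `px.bar(pct_long, x="contract", y="percent", color="churned")`; those plotting calls are not run here.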

Vivienne DiFrancesco

Data scientist with a background in biology and health tech, interested in using data for projects that improve lives. GitHub @HeyThatsViv
