A beginner’s guide to expanding your clustering knowledge beyond just K-Means.

Photo by Nareeta Martin on Unsplash


Recently, I was using K-Means in a project and decided to see what other options were out there for clustering algorithms. I always find it enjoyable to sink my teeth into expanding my data science skillset. I wrote this article to share what I discovered on my quest to broaden my clustering knowledge to include Gaussian Mixture Models.

An Overview of Gaussian Mixture Models

When you hear of this technique, you may think of the Gaussian distribution (also called the normal distribution). That’s exactly what this clustering technique is based on. …
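To make the link to the Gaussian distribution a little more concrete, here is a minimal sketch of how a point can be scored against two Gaussian components. The component means, standard deviations, and weights are made-up numbers, and `soft_assign` is a hypothetical helper for illustration only. A real Gaussian Mixture Model would learn these values from the data.

```python
from statistics import NormalDist

# Hypothetical one-dimensional mixture with two components.
# All parameters here are invented for illustration, not fitted to data.
components = [
    {"weight": 0.5, "dist": NormalDist(mu=0.0, sigma=1.0)},
    {"weight": 0.5, "dist": NormalDist(mu=5.0, sigma=1.0)},
]


def soft_assign(x):
    """Return the probability that x belongs to each component."""
    # Weight each component's Gaussian density at x, then normalize
    # so the values sum to 1.
    weighted = [c["weight"] * c["dist"].pdf(x) for c in components]
    total = sum(weighted)
    return [w / total for w in weighted]


probs = soft_assign(1.0)
print(probs)
```

A point near 0 gets most of its probability from the first component, while a point halfway between the two means is split evenly.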

Say hello to a library called Yellowbrick

Photo by Kristen Beever on Unsplash

What is an AUC ROC plot?

An AUC ROC (Area Under the Curve, Receiver Operating Characteristic) plot can be used to visualize the trade-off between a model’s sensitivity and specificity. Sensitivity refers to the ability to correctly identify entries that fall into the positive class. Specificity refers to the ability to correctly identify entries that fall into the negative class. Put another way, an AUC ROC plot can help you see how well your model is able to distinguish between classes.
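As a rough sketch of the two definitions above, sensitivity and specificity can be computed by hand from true and predicted labels. The labels below are invented for illustration, and `sensitivity_specificity` is a hypothetical helper, not a function from Yellowbrick or any other library.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)  # share of actual positives caught
    specificity = tn / (tn + fp)  # share of actual negatives caught
    return sensitivity, specificity


# Made-up labels: 4 actual positives and 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

Here the model catches 3 of the 4 positives and 4 of the 6 negatives.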

In real-world problems, there is often overlap between classes, which means catching all true negatives and true positives can be a trade-off. …

How should you define success?

Photo by Nguyen Dang Hoang Nhu on Unsplash

When I start a new classification project, I always take some time to sit down with the data and my business case to ask an important question: what does it mean to have a “successful” model? In this article, I attempt to help you think through some different scoring metrics and which might be right for your modeling project, but this is by no means a complete or exhaustive list.


Often, the accuracy of the model is thought of as the most basic or standard scoring metric. Accuracy is based on how many predictions the model got correct…
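As a small illustration of accuracy in the usual sense (correct predictions divided by total predictions), here is a sketch with invented labels. The `accuracy` function is a hypothetical helper written for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels: the model gets 4 of the 5 predictions right.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

score = accuracy(y_true, y_pred)
print(score)
```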

Understanding some background basics can be a game changer

Photo by Jeet Dhanoa on Unsplash

Understanding some basics of how Python works under the hood can help you code with more confidence. I want to share three things about Python that you may find useful, especially if you are new to the language.

1. Variables are references to an address in memory

You may have heard people say, “everything in Python is an object”. If you are new to Python, this can feel really confusing. Everyone says it, but what does it really mean? …
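One way to see the idea that variables are references is to assign the same list to two names and compare them. This is a minimal sketch with made-up variable names. It uses only the built-in `is` operator and `id()` function.

```python
a = [1, 2, 3]
b = a          # b now refers to the same list object as a
c = list(a)    # c refers to a new list with the same contents

b.append(4)    # changing the list through b also shows up through a

print(a)            # a sees the appended value
print(c)            # c is a separate object and is unchanged
print(a is b)       # same object
print(a is c)       # different objects
print(id(a) == id(b))
```

Because `a` and `b` point at one object, a change made through either name is visible through both, while the copy `c` stays as it was.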

With Examples Using Seaborn and Plotly Express.

Photo by Benedikt Geyer on Unsplash

When I first started with data science, I was amazed at all the beautiful plots that could be made so easily with packages like Seaborn or Plotly Express. But there came a point when I was working on a project and realized the perfect EDA plot would show the percentage of entries in each target class, split out by a categorical feature. After some scouring through documentation, galleries, and Stack Overflow pages, I realized that there was no canned plot that did what I wanted. In this article, I’m going to…
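To give a feel for the numbers such a plot would need, here is a rough sketch that computes the percentage of each target class within each category using only the standard library. The data and the `class_percentages` helper are invented for illustration. This is not the approach from the article, which is cut off above, and it stops short of any Seaborn or Plotly Express plotting.

```python
from collections import Counter, defaultdict

# Made-up rows of (categorical feature, target class).
rows = [
    ("A", "yes"), ("A", "no"), ("A", "yes"), ("A", "yes"),
    ("B", "no"), ("B", "no"), ("B", "yes"), ("B", "no"),
]


def class_percentages(rows):
    """Percentage of each target class within each category."""
    counts = defaultdict(Counter)
    for category, target in rows:
        counts[category][target] += 1
    return {
        category: {
            target: 100 * n / sum(targets.values())
            for target, n in targets.items()
        }
        for category, targets in counts.items()
    }


pct = class_percentages(rows)
print(pct)
```

The resulting nested dictionary has one entry per category, and the percentages inside each category sum to 100.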

Vivienne DiFrancesco

Data scientist with a background in biology and health tech interested in using data for projects that improve lives. GitHub @HeyThatsViv
