FRANKENMILLER's Data Bootcamp Final Project

This is Bryan Allen's submission for IDB10's Final Portfolio, showcasing skills in Python, Jupyter, Pandas and GitHub. It is based on a popular dataset of nutrition data for McD's menu items in India! There are three questions I am curious about and will attempt to answer using data manipulation:

1.) Which items have the most sugar and/or the HIGHEST ratio of sugar to serving size?
2.) What is the most unhealthy item on the menu (i.e., which items contain the most fat)?
3.) Which items have the most sodium and/or the HIGHEST ratio of sodium to serving size?

I hope that some of these insights can help our programmer friends on the Indian subcontinent make healthy choices for themselves and their families, so that they can responsibly enjoy these unique Indian-American treats. Who knows, maybe some of them will choose to work with McD's in the future.

Table of Contents

Libraries and Data

A NOTE ON NAMING CONVENTIONS: My project limits the use of aliases, since I am the owner of this notebook. I'm aware that on teams the standard naming convention is to create abbreviated aliases, so when I'm on those teams I'll defer to their conventions, but when the project is mine I will name the modules whatever looks the coolest.

Some programmers have asked why I do this; my only response is that aliases make everything kinda look the same and force me to strain when reading them, the same way I strain when I look at red on a green background, so maybe it's a disconnect between the brain and the eye (I do indeed have a touch of that type of color blindness). But the simple answer is that Python code is fun to type, and it's beautiful to look at, for me anyway. Why would I want to cover that up?

Bryan Allen, apprentice Data Engineer
Lusaka, Zambia, August 2022

Previewing Data


Taking a look at the head and tail of our dataset quickly gives us an idea of how long and wide the data we're working with is. We'll also get a rough idea of how much cleaning it will require.
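A minimal sketch of that preview, assuming the menu has been loaded into a DataFrame. The path to the CSV is not given here, so the example builds a tiny stand-in table; the column names and values are made up. Following the notebook's no-alias convention, it imports `pandas` rather than `pd`.

```python
import pandas


def preview(df: pandas.DataFrame, n: int = 5) -> tuple[pandas.DataFrame, pandas.DataFrame, tuple[int, int]]:
    """Return the first n rows, the last n rows, and the (rows, columns) shape."""
    return df.head(n), df.tail(n), df.shape


# Hypothetical stand-in; the notebook would use pandas.read_csv(<path to the menu CSV>).
menu = pandas.DataFrame({
    "Menu Items": ["Item A", "Item B", "Item C"],
    "Per Serve Size": ["168 g", "500 ml", "120 g"],
})

first, last, shape = preview(menu, n=2)
print(first)
print(last)
print(f"{shape[0]} rows x {shape[1]} columns")  # prints: 3 rows x 2 columns
```

`shape` answers the "how long and wide" question directly, while `head` and `tail` show what the raw cell values look like before cleaning.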

Pretty ugly if you ask me! This data is going to take a LOT of work!


Let's put those columns into a list.
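One way to do this, sketched with hypothetical column headers (the real ones may differ): wrapping the DataFrame's column `Index` in `list()` gives a plain Python list that can be sliced or looped over.

```python
import pandas

# Hypothetical headers standing in for the real menu columns.
menu = pandas.DataFrame(columns=["Menu Items", "Per Serve Size", "Total Sugars (g)", "Sodium (mg)"])

# list() turns the column Index into an ordinary Python list.
columns = list(menu.columns)
print(columns)

# Slicing the list is handy for grabbing just the nutrient columns (assumes they come last).
nutrient_columns = columns[2:]
print(nutrient_columns)
```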

Even though this histogram tells us little, it was fun practice to create.

Return to Top: Table of Contents

Data Cleaning


Cleaning requires the skill of a detective and the creativity of a seasoned programmer. One learns all kinds of neat tricks and builds out his/her utility belt of tools. So let's not be afraid to get our hands dirty, this part is fun!

Some of the cell values have special characters that need to be stripped out in order to convert them from object (text) data to numeric data.
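A minimal sketch of that stripping step, not the notebook's exact code: it assumes that every character other than a digit or a decimal point is junk, removes it with a regular expression, and parses what is left. The sample values are made up.

```python
import pandas


def strip_to_number(series: pandas.Series) -> pandas.Series:
    """Drop every character that is not a digit or '.', then parse the result as a number.

    Cells with nothing numeric left (or more than one '.') become NaN instead of raising.
    """
    cleaned = series.astype(str).str.replace(r"[^0-9.]", "", regex=True)
    return pandas.to_numeric(cleaned, errors="coerce")


# Made-up raw cells with the kind of stray characters described above.
raw = pandas.Series(["12.5 g", "*8", "1,040", "n/a"])
print(strip_to_number(raw).tolist())
```

The rule is blunt: a cell holding two numbers (say "2 x 100") would be glued into one, so odd rows are worth checking by hand.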

When we split the serving-size values, we can tell whether a menu item is a drink or a food by observing whether it is measured in milliliters or grams. This distinction will be critical in our observations later.
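A sketch of the split, assuming serving sizes are strings like "168 g" or "500 ml". It uses `str.extract` with a regular expression rather than a plain split, so "120g" with no space also works; the column labels `amount`, `unit` and `kind` are hypothetical.

```python
import pandas


def split_serving(series: pandas.Series) -> pandas.DataFrame:
    """Split strings like '168 g' or '500 ml' into a numeric amount, a unit and a food/drink label."""
    parts = series.str.strip().str.extract(r"^(?P<amount>[\d.]+)\s*(?P<unit>[A-Za-z]+)$")
    parts["amount"] = pandas.to_numeric(parts["amount"], errors="coerce")
    parts["unit"] = parts["unit"].str.lower()
    # Milliliters mark a drink, grams mark a food; any other unit is left as NaN.
    parts["kind"] = parts["unit"].map({"ml": "drink", "g": "food"})
    return parts


serving = pandas.Series(["168 g", "500 ml", "120g"])  # made-up values
print(split_serving(serving))
```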

Check out this conversion machine! It takes objects as input and spits them out as the desired float data type.
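Here is one possible shape for such a machine, as a sketch rather than the notebook's own function: it loops over a hypothetical list of column names and casts each one to `float64` with `pandas.to_numeric`, returning a copy so the original table is untouched. The data is made up.

```python
import pandas


def convert_columns_to_float(df: pandas.DataFrame, columns: list[str]) -> pandas.DataFrame:
    """Return a copy of df in which each listed column is cast to float64 (unparseable cells become NaN)."""
    converted = df.copy()
    for column in columns:
        converted[column] = pandas.to_numeric(converted[column], errors="coerce").astype("float64")
    return converted


# Made-up numbers still stored as text, as they are after stripping special characters.
menu = pandas.DataFrame({
    "Menu Items": ["Item A", "Item B"],
    "Total Sugars (g)": ["7.6", "40"],
    "Sodium (mg)": ["1000", "35.5"],
})
numeric_columns = ["Total Sugars (g)", "Sodium (mg)"]

print(menu.dtypes)
converted = convert_columns_to_float(menu, numeric_columns)
print(converted.dtypes)
```

Comparing `dtypes` before and after is a quick way to confirm the conversion actually happened.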

Return to Top: Table of Contents

Question #1 Danger of Sugar


Which item has the HIGHEST ratio of sugar to serving size? To try to answer this question I will use a scatterplot, with serving size on the x-axis and sugars on the y-axis. Later we will do some arithmetic to calculate the respective ratios, but first let's visualize the data! We'll need to create the necessary dataframes.
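The ratio arithmetic itself is a single column division. A minimal sketch, assuming the drinks have already been separated out and their serving sizes converted to a numeric `serving_ml` column (a hypothetical name); the values are made up.

```python
import pandas

# Made-up drink rows; serving sizes already converted to numeric milliliters.
drinks = pandas.DataFrame({
    "Menu Items": ["Drink A", "Drink B", "Drink C"],
    "serving_ml": [500.0, 250.0, 300.0],
    "Total Sugars (g)": [50.0, 30.0, 15.0],
})

# Ratio of sugar to serving size: grams of sugar in each milliliter served.
drinks["sugar_g_per_ml"] = drinks["Total Sugars (g)"] / drinks["serving_ml"]
print(drinks)
```

In this toy table Drink A has the most sugar in total, but Drink B has the highest ratio, which is exactly why the two measures are worth looking at separately.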

Now that we've created our dataframes let's create a scatterplot. Perhaps a few items will pop out at us.

Now that we've visualized them, let's create tables to see the worst offenders side by side with their names for easy identification. In the first table pay attention to the Total Sugars column; in the second, look closely at the grams of sugar per ml column.
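Both tables can come from one small helper, sketched here with made-up rows and hypothetical column names: `nlargest` picks the top rows by a chosen column, and the item names are kept next to the values so the offenders are easy to identify.

```python
import pandas


def worst_offenders(df: pandas.DataFrame, column: str, n: int = 5) -> pandas.DataFrame:
    """Return the n rows with the largest values in `column`, with item names alongside."""
    return df.nlargest(n, column)[["Menu Items", column]].reset_index(drop=True)


# Made-up values for illustration.
drinks = pandas.DataFrame({
    "Menu Items": ["Drink A", "Drink B", "Drink C"],
    "Total Sugars (g)": [50.0, 30.0, 15.0],
    "sugar_g_per_ml": [0.10, 0.12, 0.05],
})

print(worst_offenders(drinks, "Total Sugars (g)", n=2))  # first table: totals
print(worst_offenders(drinks, "sugar_g_per_ml", n=2))    # second table: ratio
```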

So even though the drinks that contain the most sugar are the usual offenders, like the large Fanta and Coca-Cola, we should advise our friends to also be selective with some of the McCafe hot drinks, for they seem to be the menu items most saturated with sweeteners.

Return to Top: Table of Contents

Question #2 Danger of Fat


As living standards on the Indian subcontinent rise, so too do people's waistlines. Whereas our first set of questions could arm consumers against the threat of diabetes, this second set will help our friends make informed choices to protect themselves against the rising threat of obesity.

Above and below we can see the fattiest foods on the menu, both in aggregate and as a percentage of serving size. One surprising find above: the veggie version of the "Maharaja Mac" actually contains more grams of fat than the chicken version, who knew? This also lets us encourage consumers to look carefully at what's actually contained in their food.
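The "percentage of serving size" view can be sketched as grams of fat divided by serving weight in grams, times 100 (fat per 100 g of food). The column names and numbers below are hypothetical, and the sketch assumes drinks measured in ml have already been filtered out so the units match.

```python
import pandas

# Made-up food rows; serving sizes already converted to grams.
foods = pandas.DataFrame({
    "Menu Items": ["Food A", "Food B", "Food C"],
    "serving_g": [300.0, 150.0, 200.0],
    "Total fat (g)": [45.0, 30.0, 20.0],
})

# Fat as a percentage of the serving's weight (grams of fat per 100 g).
foods["fat_pct_of_serving"] = foods["Total fat (g)"] / foods["serving_g"] * 100

by_total = foods.sort_values("Total fat (g)", ascending=False)
by_share = foods.sort_values("fat_pct_of_serving", ascending=False)
print(by_total[["Menu Items", "Total fat (g)"]])
print(by_share[["Menu Items", "fat_pct_of_serving"]])
```

The two rankings can disagree: a large serving can top the aggregate table while a smaller, denser item tops the percentage table.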

Return to Top: Table of Contents

Question #3 Danger of Sodium


Finally, we'll take a look at an aspect of our food that could be uniquely dangerous: sodium. Eating too much salty food has been linked to a variety of illnesses like hypertension and heart problems, especially when combined with a stressful lifestyle.

Let's find the top five dishes with the highest sodium totals.
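A sketch of that top-five table under the same assumptions as before (made-up rows, hypothetical column names, serving sizes already numeric). It also attaches a milligrams-per-gram column, since Question #3 asks about the sodium-to-serving ratio as well as the totals.

```python
import pandas

# Made-up rows; serving sizes already converted to grams.
menu = pandas.DataFrame({
    "Menu Items": ["Food A", "Food B", "Food C", "Food D", "Food E", "Food F"],
    "serving_g": [250.0, 200.0, 180.0, 120.0, 90.0, 60.0],
    "Sodium (mg)": [1200.0, 900.0, 1100.0, 400.0, 650.0, 300.0],
})

top_five_sodium = (
    menu.assign(sodium_mg_per_g=menu["Sodium (mg)"] / menu["serving_g"])  # ratio column for Question #3
        .sort_values("Sodium (mg)", ascending=False)
        .head(5)
        .reset_index(drop=True)
)
print(top_five_sodium)
```

Here Food E is only fourth by total sodium yet has the highest mg per gram, so the ratio column is worth a glance even in a "totals" table.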

Take a look above at the "Maharaja Macs" I was pointing out earlier and we can see the chicken version contains 25% more sodium than the veggie version. Also note the Ghee Rice dish: even though it sounds tasty, and its serving size is only eight percent larger than the chicken Maharaja Mac's, it contains a whopping thirty percent more sodium.
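Comparisons like "25% more" always depend on which item is the baseline. A small helper makes that explicit; the numbers in the example are made up, not taken from the dataset.

```python
def percent_more(value: float, baseline: float) -> float:
    """Return how many percent larger `value` is than `baseline`: (value - baseline) / baseline * 100."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100


# Made-up example: 1250 mg measured against a 1000 mg baseline.
print(percent_more(1250, 1000))  # prints: 25.0
```

Swapping the arguments gives a different answer (1000 is 20% less than 1250), so the baseline here would be the veggie Maharaja Mac when comparing it with the chicken one.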

In fact, this might be a good moment to take a quick side trip away from the tables and look at the serving sizes. Keeping an eye on serving sizes could help consumers stay trim and fit. Let's create a visual.

This chart of the largest serving sizes almost perfectly overlaps with the tables of menu items containing the greatest Total Fat and Sodium that we saw earlier. Let's look at one last table, from which we can draw one last insight.
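"Almost perfectly overlaps" can be checked with set intersection instead of by eye. A minimal sketch with made-up rows and hypothetical column names: take the names of the top-n items by serving size, do the same for fat and sodium, and count how many names they share.

```python
import pandas


def top_items(df: pandas.DataFrame, column: str, n: int) -> set[str]:
    """Names of the n items with the largest values in `column`."""
    return set(df.nlargest(n, column)["Menu Items"])


# Made-up rows for illustration only.
menu = pandas.DataFrame({
    "Menu Items": ["Food A", "Food B", "Food C", "Food D", "Food E"],
    "serving_g": [300.0, 250.0, 200.0, 120.0, 90.0],
    "Total fat (g)": [40.0, 35.0, 12.0, 20.0, 5.0],
    "Sodium (mg)": [1300.0, 1000.0, 700.0, 900.0, 200.0],
})

n = 3
largest = top_items(menu, "serving_g", n)
for nutrient in ["Total fat (g)", "Sodium (mg)"]:
    shared = largest & top_items(menu, nutrient, n)
    print(f"{nutrient}: {len(shared)} of {n} shared -> {sorted(shared)}")
```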

So even though these chicken strips sound mighty tasty, Indian consumers should definitely consider limiting their intake of them, for chicken products are indeed the saltiest items on the entire menu.
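One way to test a claim like "chicken products are the saltiest" across the whole menu, sketched with made-up rows: flag items whose name mentions chicken (case-insensitive) and compare average sodium for the two groups. This is a name-based heuristic and an assumption on my part; it would miss chicken items whose names don't say "chicken".

```python
import pandas

# Made-up rows; only the naming pattern matters here.
menu = pandas.DataFrame({
    "Menu Items": ["Chicken Item A", "Veg Item B", "Chicken Item C", "Veg Item D"],
    "Sodium (mg)": [1000.0, 600.0, 900.0, 300.0],
})

# Boolean flag: does the item name contain "chicken" in any letter case?
is_chicken = menu["Menu Items"].str.contains("chicken", case=False)

chicken_mean = menu.loc[is_chicken, "Sodium (mg)"].mean()
other_mean = menu.loc[~is_chicken, "Sodium (mg)"].mean()
print(f"chicken items: {chicken_mean:.1f} mg, other items: {other_mean:.1f} mg")
```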

Return to Top: Table of Contents