This project is a case study for the final stage of the Google Data Analytics course on Coursera. The objective is to analyze data from smart device users to help the high-tech company Bellabeat unlock new growth opportunities. Bellabeat has invested in traditional advertising media such as radio, TV, print, and out-of-home billboards, as well as digital channels like Google Search, Instagram, Facebook, and Twitter.
Founded in 2013, Bellabeat is a high-tech company that develops wellness tracking devices specifically for women. By 2016, Bellabeat had launched multiple products and expanded its business globally. These products became available on its own e-commerce platform, as well as through various online retailers. Bellabeat places a strong emphasis on digital marketing, utilizing Google Search, video advertisements, and consumer engagement on social media platforms.
Bellabeat has introduced five products:
The company aims to gain insights into how people are using their smart devices. Using this information, I will provide high-level recommendations to inform Bellabeat's marketing strategy.
1.1 What is the problem we are trying to solve?
We want to understand how users interact with their smart devices and identify patterns and trends that can help Bellabeat optimize its marketing strategy and unlock new growth opportunities.
1.2 How can we drive business decisions?
By analyzing smart device usage data, we can identify key insights and trends that inform Bellabeat about user behavior. These insights can guide decisions on targeted advertising, product development, and marketing campaigns across traditional channels such as radio, TV, print, and billboards, and digital channels such as Google Search, Instagram, Facebook, and Twitter.
2.1 Data:
Urška Sršen, Bellabeat’s cofounder and Chief Creative Officer, encourages the use of public data that explores smart device users' habits. She points to a specific dataset: FitBit Fitness Tracker Data, which is made available through Mobius on Kaggle and updated annually.
2.2 Loading Packages:
# Install once per machine; the library() calls below are needed in every session
install.packages(c("tidyverse", "here", "skimr", "janitor", "lubridate", "readr", "ggpubr"))
library(tidyverse)
library(here)
library(skimr)
library(janitor)
library(lubridate)
library(readr)
library(ggpubr)
2.3 Importing data and assigning names:
dailyActivity <- read.csv(".../dailyActivity_merged.csv")
heartrate <- read.csv(".../heartrate_seconds_merged.csv")
...
weight <- read.csv(".../weightLogInfo_merged.csv")
2.4 Previewing the data and its structure:
head(dailyActivity)
str(dailyActivity)
...
2.5 Let's check the number of participants in each file:
n_unique(dailyActivity$Id)
Output: [1] 33
All datasets have 33 participants each, except for the heartrate and weight datasets. These two were dropped from the analysis because their sample sizes are too small.
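The same participant check can be run over every table in one pass rather than one call per file. The following is a minimal base R sketch, assuming each table carries an `Id` column as `dailyActivity` does; the toy data frames and their values are made up for illustration and are not taken from the Fitbit files:

```r
# Toy stand-ins for the imported tables (values are illustrative only)
tables <- list(
  dailyActivity = data.frame(Id = c(1, 1, 2, 3)),
  sleepDay      = data.frame(Id = c(1, 2, 2)),
  weight        = data.frame(Id = c(3, 3))
)

# Count the distinct participant Ids in each table
participants <- sapply(tables, function(df) length(unique(df$Id)))
print(participants)
```

Putting the real data frames into the list in place of the toy ones would give one participant count per file, which makes undersized tables easy to spot.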
2.6 Duplicates:
sum(duplicated(dailyActivity))
Output: [1] 0
Drop duplicated rows and rows with missing values:
dailyActivity <- dailyActivity %>% distinct() %>% drop_na()
...
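Because `drop_na()` removes every row that has a missing value in any column, it can be worth counting what will be lost before cleaning. The sketch below uses base R equivalents on a made-up table; the column names mirror `dailyActivity`, but the values are invented for illustration:

```r
# Toy table with one duplicated row and one row containing NA (made-up values)
df <- data.frame(
  Id         = c(1, 1, 2, 3),
  TotalSteps = c(5000, 5000, NA, 7200)
)

n_duplicates <- sum(duplicated(df))       # rows that repeat an earlier row
n_incomplete <- sum(!complete.cases(df))  # rows with at least one NA

# Base R equivalent of distinct() followed by drop_na()
cleaned <- df[!duplicated(df) & complete.cases(df), ]
print(cleaned)
```

Comparing `nrow(df)` with `nrow(cleaned)` shows how many observations the cleaning step removed.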
3.1 Converting Dates to a Consistent Format:
dailyActivity <- dailyActivity %>%
  rename(date = ActivityDate) %>%
  mutate(date = as_date(date, format = "%m/%d/%Y"))
...
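A format string only parses values that actually match it; anything else silently becomes `NA`. The sketch below illustrates this with base R's `as.Date()`, which accepts the same `"%m/%d/%Y"` format string used above. The sample date strings are assumptions chosen for illustration:

```r
# Date strings in month/day/year order, as the format "%m/%d/%Y" expects
raw_dates <- c("4/12/2016", "5/1/2016")
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# A string in a different layout does not match the format and becomes NA
mismatched <- as.Date("2016-04-12", format = "%m/%d/%Y")

# Counting NAs after conversion is a quick way to catch format mismatches
n_failed <- sum(is.na(parsed))
print(parsed)
```

Checking `n_failed` after each conversion helps confirm that every table was parsed with the right format before merging on `date`.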
3.2 Merging Data:
daily_activity_sleep <- merge(dailyActivity, sleepDay, by= c("Id", "date"))
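By default, `merge()` performs an inner join, so a day with activity data but no sleep record is dropped from the result. The sketch below shows the difference on toy data; the values are made up, and the `all.x = TRUE` variant is shown only as an alternative, not as the approach used in this analysis:

```r
# Toy tables: user 1 has activity on two days but sleep on only one (made-up values)
activity <- data.frame(
  Id         = c(1, 1, 2),
  date       = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(8000, 4000, 12000)
)
sleep <- data.frame(
  Id                 = c(1, 2),
  date               = as.Date(c("2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(420, 380)
)

# Inner join (the default): only Id/date pairs present in both tables
inner <- merge(activity, sleep, by = c("Id", "date"))

# Left join: keep every activity row, with NA where sleep is missing
left <- merge(activity, sleep, by = c("Id", "date"), all.x = TRUE)

print(nrow(inner))
print(nrow(left))
```

The inner join means that every statistic computed from `daily_activity_sleep` describes only the days on which a user also logged sleep.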
3.3 Calculate Time in Bed without Sleeping:
daily_activity_sleep$InBedWithoutSleeping <- daily_activity_sleep$TotalTimeInBed - daily_activity_sleep$TotalMinutesAsleep
3.4 Average Daily Data:
daily_average <- daily_activity_sleep %>%
  group_by(Id) %>%
  summarise(mean_daily_steps = mean(TotalSteps), ...)
4.1 User Types Distribution:
classify_activity <- function(steps) {
  if (steps < 5000) return("Sedentary")
  else if (steps < 7500) return("Lightly_Active")
  ...
}
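Since `if` evaluates a single value at a time, a function written this way has to be applied element by element (for example with `sapply()`). A vectorized alternative can be sketched with base R's `cut()`. Only the 5000 and 7500 cutoffs come from the code above; the remaining categories are elided there, so the `"Above_7500"` label and the function name `classify_activity_vec` are placeholders introduced for this illustration:

```r
# Hypothetical vectorized variant; "Above_7500" is a placeholder label,
# not one of the categories used in the original analysis
classify_activity_vec <- function(steps,
                                  breaks = c(-Inf, 5000, 7500, Inf),
                                  labels = c("Sedentary", "Lightly_Active", "Above_7500")) {
  # right = FALSE makes the intervals [lower, upper), matching the `<` comparisons
  as.character(cut(steps, breaks = breaks, labels = labels, right = FALSE))
}

result <- classify_activity_vec(c(3200, 5000, 7499, 9100))
print(result)
```

Passing `breaks` and `labels` as arguments keeps the cutoffs in one place, so further categories can be added without touching the function body.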
Pie chart visualization:
4.2 Average Calories per Hour:
AverageCaloriesPerHour <- hourlyCalories %>%
  mutate(hour = format(date_time, "%H")) %>%
  group_by(hour) %>%
  summarise(avg_calories = mean(Calories, na.rm = TRUE))
4.3 Daily Average Sleep Time and Average Steps:
weekday_steps_sleep <- daily_activity_sleep %>%
  mutate(weekday = weekdays(date)) %>%
  ...
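`weekdays()` returns plain character strings, which sort alphabetically in tables and plots (Friday first) and depend on the system locale. One way around both issues, sketched here in base R with made-up dates, is to build an ordered factor from the ISO weekday number; the helper vector `day_names` is an assumption introduced for this example:

```r
# Toy dates (made up): a Tuesday, a Sunday, and a Monday
dates <- as.Date(c("2016-04-12", "2016-04-17", "2016-04-18"))

day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

# "%u" gives the ISO weekday number, 1 (Monday) to 7 (Sunday), regardless of locale
weekday <- factor(day_names[as.integer(format(dates, "%u"))], levels = day_names)

print(sort(weekday))
```

With the factor levels fixed, grouped summaries and plot axes follow the calendar order from Monday to Sunday.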
4.4 Smart Devices Usage
To determine how often users used their smart devices over the 31-day period, we calculated the number of days each user had activity data and sleep data. This helps us understand user engagement levels with the devices.
device_usage_days <- daily_activity_sleep %>%
  group_by(Id) %>%
  summarise(activity_days = n_distinct(date),
            avg_daily_steps = mean(TotalSteps),
            avg_sleep_min = mean(TotalMinutesAsleep))
I also plotted a histogram of the number of active days per user to visualize how frequently users engaged with their devices.
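A histogram like the one described could be drawn along the following lines. This is only a sketch: the toy `usage` table stands in for `device_usage_days`, its values are made up, and the bin width, colors, and labels are assumptions rather than the settings used for the original figure:

```r
library(ggplot2)

# Toy per-user summary shaped like device_usage_days (values are made up)
usage <- data.frame(
  Id            = 1:6,
  activity_days = c(31, 28, 15, 31, 4, 22)
)

# One bar per 5-day band of usage; the height is the number of users in that band
p <- ggplot(usage, aes(x = activity_days)) +
  geom_histogram(binwidth = 5, boundary = 0, fill = "skyblue", color = "black") +
  labs(title = "Days of device use per user",
       x = "Days with recorded data",
       y = "Number of users") +
  theme_minimal()

print(p)
```

A distribution concentrated on the right would point to consistent daily use, while a long left tail would point to users who wear the device only occasionally.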
4.5 Correlation
Now, let's examine the correlation between daily steps and calories, as well as between daily steps and sleep.
ggplot(data = subset(daily_activity_sleep, !is.na(TotalMinutesAsleep)),
       aes(TotalSteps, TotalMinutesAsleep)) +
  geom_rug(position = "jitter", linewidth = .08) +
  geom_jitter(alpha = 0.5) +
  geom_smooth(color = "blue", linewidth = .6) +
  stat_cor(method = "pearson", label.x = 15000, label.y = 650) +
  labs(title = "Daily Steps vs. Sleep", x = "Daily Steps", y = "Minutes Asleep") +
  theme_minimal()
ggplot(daily_activity_sleep, aes(TotalSteps, Calories)) +
  geom_jitter(alpha = .5) +
  geom_rug(position = "jitter", linewidth = .08) +
  geom_smooth(linewidth = .6) +
  stat_cor(method = "pearson", label.x = 20000, label.y = 2300) +
  labs(title = "Daily Steps vs. Calories", x = "Daily Steps", y = "Calories") +
  theme_minimal()
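The coefficient and p-value that `stat_cor()` annotates on a plot can also be obtained as numbers with base R's `cor.test()`, which is convenient when the values need to be quoted in the text. The sketch below uses made-up vectors standing in for `TotalSteps` and `Calories`, so the resulting numbers say nothing about the real data:

```r
# Toy data (made up) standing in for TotalSteps and Calories
steps    <- c(2000, 4500, 7000, 9500, 12000, 15000)
calories <- c(1700, 1950, 2100, 2400, 2600, 3000)

# Pearson correlation with a significance test
result  <- cor.test(steps, calories, method = "pearson")
r       <- unname(result$estimate)  # correlation coefficient
p_value <- result$p.value           # p-value for the null hypothesis r = 0

print(r)
print(p_value)
```

Note that a strong correlation between steps and calories does not by itself show that walking more causes the higher calorie burn; it only shows that the two move together in the sample.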
4.6 User Type vs. Time in Bed Without Sleeping
I want to see the average time in bed without sleeping (InBedWithoutSleeping) for each user type:
# Assumes the rows of user_type are in the same order as the rows of daily_average.
# Naming the grouping column here makes a separate colnames() call unnecessary.
avg_timeInBed_UserType <- daily_average %>%
  group_by(user_type = user_type$user_type) %>%
  summarise(mean_time_in_Bed_per_UserType = mean(mean_time_inBed_without_sleep, na.rm = TRUE))
head(avg_timeInBed_UserType)
ggplot(avg_timeInBed_UserType, aes(x = user_type, y = mean_time_in_Bed_per_UserType)) +
  geom_bar(stat = "identity", fill = "skyblue", color = "black") +
  labs(title = "Average Time in Bed Without Sleeping per User Type",
       x = "User Type",
       y = "Mean Time in Bed Without Sleeping (min)") +
  theme_minimal()
I used visualizations such as bar graphs and pie charts to make the data easier to understand and accessible to key stakeholders. The analysis was shared through an HTML web report and presentations.
Visual Summary:
Based on our analysis, here are our recommendations:
Future Work:
This case study was completed as part of the Google Data Analytics Capstone Project using R and tidyverse packages. All visualizations and code were created by me as part of the learning journey.