My_Study_Projects/How Can a Wellness Technology Company Play It Smart?/fit_data_analysis.Rmd

---
title: "Fit Data Analysis"
author: "Sachie Tran"
date: "9/23/2021"
output:
  html_document
---

<hr>

## Senario 

<font size="4">
You are a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Urska Srsen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat's products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights you discover will then help guide marketing strategy for the company. You will present your analysis to the Bellabeat executive team along with your high-level recommendations for Bellabeat's marketing strategy.</font>

<hr>

## Set up my environment 

### Install packages and libraries
```{r message=FALSE, error=FALSE, warning=FALSE}
#install.packages("tidyverse")
library(tidyverse)
#install.packages("dplyr")
library(dplyr)
```

<br/><br/>

<hr>

### Import three dataframes

```{r message=FALSE, error=FALSE, warning=FALSE}
daily_activity <- read_csv("dailyActivity_merged.csv")

sleep_day <- read_csv("sleepDay_merged.csv")

weight_log <- read_csv("weightLogInfo_merged.csv")

```

```{r message=FALSE}
head(daily_activity)

head(sleep_day)

head(weight_log)
```


<hr>


## The goal of this project is to answer these questions;
<font size="4">
- What are the desirable features for Bellabeat considering the current wearable device trend?<br/>
- What feature can we add for better self-management?
</font>

<br/><br/>

<hr>

##  Clean/organize dataframes 

### 1. Change date format
```{r}
daily_activity$ActivityDate <- as.Date(daily_activity$ActivityDate, format = "%m/%d/%Y")
head(daily_activity)
```

```{r}
sleep_day$ActivityDate <- as.Date(sleep_day$SleepDay, format = "%m/%d/%Y")
sleep_day <- sleep_day[,c(1,6,3,4,5)]  # change the order and remove unnecessary columns
head(sleep_day)
```

```{r}
weight_log$ActivityDate <- as.Date(weight_log$Date, format = "%m/%d/%Y")
weight_log <- weight_log[,c(1,9,3,4,5,6,7,8)]  # Change the order and remove unnecessary columns.
head(weight_log)
```

<br/><br/>

### 2. Data merging

```{r}
# Create a primary-key combined "Id" and "ActivityDate" in each dataframe

daily_activity$key <- paste(daily_activity$Id,daily_activity$ActivityDate)

sleep_day$key <- paste(sleep_day$Id,sleep_day$ActivityDate)

weight_log$key <- paste(weight_log$Id,weight_log$ActivityDate)

```

```{r}
all_data <- merge(daily_activity, sleep_day, by=c("key"), all=T)

dim(all_data)   # Check if any row is missing

```

```{r}
all_data <- merge(all_data, weight_log, by=c("key"), all=T)

dim(all_data)  # Check if any row is missing

head(all_data)

```

```{r}
# Reorganize and remove unnecessary columns

fit_data <- all_data[,c(2,3,4,5,7,8,9,10,11,12,13,14,15,16,20,21,23,24,25,26,27)]

```

### Insights

<font size="4">
- There is no data about communication(texting/calling)<br/>
- Most people report their weights by manual. What can I find from the weight report?<br/>
- How do sleep time and workout results relate each other?

</font>

<br/><br/>

<hr>

## Analysis

<br/><br/>

### 1. See the correlation between body-weight scale frequency and exercise results

<br/>

```{r}
# Add body-weight frequency column

fit_data$WeightPounds[is.na(fit_data$WeightPounds)] <-0  # replace NA with zero

fit_data <- fit_data %>%
  mutate(weight_record = case_when(
  WeightPounds > 0 ~ "Weighted",
  WeightPounds == 0 ~ "not_Weighted"
  ))

fit_data <- mutate(fit_data, count = case_when(weight_record == 'not_Weighted'~ 0,weight_record == 'Weighted'~ 1))

head(fit_data)

```

```{r}
weight_track <- fit_data %>%
  select(Id.x,weight_record) %>%
  mutate(fit_data, count = case_when(weight_record == 'not_Weighted'~ 0,weight_record == 'Weighted'~ 1)) %>%
  group_by(Id.x) %>%
  summarize(weighted=sum(count))

# Data validation in weight_data with fit_data

tail(weight_track, n=5)

table(fit_data$weight_record[fit_data$Id.x ==8877689391])
```

```{r}
# Add aggregate "TotalSteps" data by Id

step_data <- fit_data %>%
  group_by(Id.x) %>%
  summarise(steps=sum(TotalSteps))
step_data
```

```{r}
# Add aggregate "Calories" data by Id

calory_data <- fit_data %>%
  group_by(Id.x) %>%
  summarise(total_calories=sum(Calories))
calory_data
```

```{r}
# Merge two dataframes into one

weight_track <- merge(weight_track,step_data,by="Id.x")
weight_track <- merge(weight_track,calory_data,by="Id.x")
```

<br/>

### Display a scatter plot of weight scale vs total steps

```{r error=FALSE, message=FALSE, warning=FALSE}

library(scales)

ggplot(data=weight_track)+geom_point(mapping=aes(x=weighted, y=steps))+
  geom_smooth(mapping=aes(x=weighted, y=steps))+
  labs(title = "Weight Scale Frequency vs Total Steps",x = "Weight Scale (times)", y = "Steps (thousand)")+
    theme_minimal() + 
  scale_y_continuous(labels = label_number(scale = 1e-3, accuracy = 1))

```

<br/><br/>

### Display a propotion table of weight scale and total steps

```{r}
step_table <- table(weight_track$weighted, cut(weight_track$steps, breaks=seq.int(from=0,to=500000,by=100000)))

step_table_prop <- prop.table(step_table)
round(step_table_prop,2)

```

### Insights

<font size="4">
- People who track their weight seem to walk more than who don't.
</font>

<br/><br/>

<hr>

### Display a scatter plot of weight scale vs total calories

```{r error=FALSE, message=FALSE, warning=FALSE}

ggplot(data=weight_track)+geom_point(mapping=aes(x=weighted, y=total_calories))+
  labs(title = "Weight Scale Frequency vs Total Calories",x = "Weight Scale (times)", y = "Total Calories (thousand)")+
    theme_minimal() + 
  scale_y_continuous(labels = label_number(scale = 1e-3, accuracy = 1))
```

### See the relationship between weight scale and total calories in a propotion of table
```{r}
cal_table <- table(weight_track$weighted, cut(weight_track$total_calories, breaks=seq.int(from=0,to=115000,by=20000)))

cal_table_prop <- prop.table(cal_table)
cal_table_prop
round(cal_table_prop,2)

```

### Insights

<font size="4">
- People who track their weight tend to burn more calories than who don't.
</font>

<br/><br/>

<hr>


## 2. See the relationship between workout and sleep

### Add cumulative calories, steps and sleep time columns for each Id. 

```{r}
fit_data$accum_calories <-
  ave(fit_data$Calories,fit_data$Id.x, FUN = cumsum)

# Add cumulative Very active distance column for each Ids 
fit_data$accum_VeryActive <-
  ave(fit_data$VeryActiveDistance,fit_data$Id.x, FUN = cumsum)

# Add cumulative moderate active distance column for each Ids 
fit_data$accum_ModeratedActive <-
  ave(fit_data$ModeratelyActiveDistance ,fit_data$Id.x, FUN = cumsum)

# Add cumulative light active distance column for each Ids 
fit_data$accum_LightlyActive <-
  ave(fit_data$LightActiveDistance,fit_data$Id.x, FUN = cumsum)
```

```{r}
# Add cumulative step column for each Ids 
fit_data$accum_steps <-
  ave(fit_data$TotalSteps,fit_data$Id.x, FUN = cumsum)

# Add cumulative sleeping time column for each Ids 
fit_data$accum_sleep <-
  ave(fit_data$TotalMinutesAsleep,fit_data$Id.x, FUN = cumsum)

n_distinct(fit_data$Id.x)

```


### See the relationship between active distance/minutes and time in asleep

```{r warning=FALSE, message=FALSE}
# create scatter plot of very active distance vs minutes asleep
ggplot(fit_data,aes(x=VeryActiveDistance, y=TotalMinutesAsleep)) + geom_point()+
  geom_smooth(mapping=aes(x=VeryActiveDistance, y=TotalMinutesAsleep))+
  labs(title = "Very Active Distance vs Minutes Asleep")

```

```{r warning=FALSE, message=FALSE}
# create a scatter plot of moderate active distance vs minutes asleep
ggplot(data=fit_data)+geom_point(mapping=aes(x=ModeratelyActiveDistance, y=TotalMinutesAsleep))+
  geom_smooth(mapping=aes(x=ModeratelyActiveDistance, y=TotalMinutesAsleep))+
  labs(title = "Moderately Active Distance vs Minutes Asleep")

```

```{r warning=FALSE, message=FALSE}
# create a scatter plot of light active distance vs minutes asleep
ggplot(data=fit_data)+geom_point(mapping=aes(x=LightActiveDistance, y=TotalMinutesAsleep))+
  geom_smooth(mapping=aes(x=LightActiveDistance, y=TotalMinutesAsleep))+
  labs(title = "Light Active Distance vs Minutes Asleep")
```

```{r warning=FALSE, message=FALSE}
# create a scatter plot of very active minutes and minutes asleep
ggplot(data=fit_data)+geom_point(aes(x=VeryActiveMinutes, y=TotalMinutesAsleep))+
  geom_smooth(mapping=aes(x=VeryActiveMinutes, y=TotalMinutesAsleep))+
  labs(title = "Very Active Minutes vs Minutes Asleep")

```

```{r warning=FALSE, message=FALSE}
ggplot(data=fit_data)+geom_point(mapping=aes(x=FairlyActiveMinutes, y=TotalMinutesAsleep))+
  geom_smooth(mapping=aes(x=FairlyActiveMinutes, y=TotalMinutesAsleep))+
  labs(title = "Fairly Active Minutes vs Minutes Asleep")
```

```{r warning=FALSE, message=FALSE}
ggplot(data=fit_data)+geom_point(mapping=aes(x=LightlyActiveMinutes, y=TotalMinutesAsleep))+
  geom_smooth(mapping=aes(x=LightlyActiveMinutes, y=TotalMinutesAsleep))+
  labs(title = "Light Active Minutes vs Minutes Asleep")
```

### Insights

<font size="4">
- There is no obvious relationship between active distance/minutes and time in asleep.
</font>

<br/><br/>

<hr>



## 3. See the relationship between burned calories and time in asleep
```{r warning=FALSE, message=FALSE}
ggplot(data=fit_data)+geom_point(mapping=aes(x=Calories, y=TotalMinutesAsleep))+
  geom_smooth(mapping=aes(x=Calories, y=TotalMinutesAsleep))+
  labs(title = "Calories vs Minutes Asleep")
```

### Insights

<font size="4">
- There is no obvious relationship between calories and time in asleep.
</font>

<br/><br/>

<hr>

## 4. See the relationship between steps and time in asleep

```{r message=FALSE, error=FALSE, warning=FALSE}
scat1 <- ggplot(data=fit_data)+geom_point(mapping=aes(x=TotalSteps, y=TotalMinutesAsleep, color = weight_record))+
  geom_smooth(mapping=aes(x=TotalSteps, y=TotalMinutesAsleep))+
  labs(title = "Steps vs Sleeping minutes")
plot(scat1)

```

```{r message=FALSE, error=FALSE, warning=FALSE}
scat2 <- scat1 + xlim(0,20000) + ylim(0,800)
plot(scat2)
```


### Insights

<font size="4">
- There is no obvious relationship between steps and time in asleep.
</font>

<br/><br/>

<hr>

## Solutions

<font size="4">
- The most obvious tendency is the people who scale their body weights do more workout, so there are two possible ways to encourage Bellabeat users.(1) Add the feature of weight-scale notification to encourage the users to track their weight records.(2)Add a feature that Bellabeat syncs with a body weight scale to avoid manual input.<br/>
- Add communication features (e.g.text&call notification) (This feature requires huge investment, but most of wearable devices have similar features)

</font>

<br/><br/>

<hr>


## Conclusion

<font size="4">

- The best recommendation for Bellabeat is to enable easy-track on users' body weight. To improve the accuracy of the insight that the relationship between body-weight track and workout result, adding more samples is desirable.<br/>
- Majority of people run around 1-5 miles per day. Generally speaking, good exercise leads good sleep, but it doesn't prove in this analysis. That is because these data don't cover a variety of key factors that lead to good sleep. More accessibility for the user data is expected.

</font>

<br/><br/>

<hr>


## Next steps

<font size="4">
Overall, I recommend Bellabeat marketing & tech team to improve weight-scale tracker feature by adding notification and auto-sync with a body weight scale to the device, and do the tests for the new features. These new features will lead users to focus more on their health. Do A/B test by asking their users to be a part of the survey and review the result.

</font>

<br/><br/>

<hr>