Medical-Data-Visualizer

A Python certification project from freeCodeCamp.

This project uses Python to explore the relationship between cardiac disease, body measurements, blood markers, and lifestyle choices.
It visualizes and makes calculations from medical examination data using matplotlib, seaborn, and pandas. The dataset values were collected during medical examinations.

import libraries

libraries

   import pandas as pd
   import seaborn as sns
   import matplotlib.pyplot as plt
   import numpy as np

import data

read data

   df = pd.read_csv("medical_examination.csv")

add 'overweight' column

overweight

   # BMI = weight in kg divided by the square of height in metres
   bmi = df["weight"] / (df["height"] / 100) ** 2
   # 1 if BMI is above 25 (overweight), otherwise 0
   df["overweight"] = (bmi > 25).astype(int)
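
As a worked example of the BMI rule, the sketch below applies it to three made-up patients. The `sample` DataFrame and its values are hypothetical and only illustrate the arithmetic; the real values come from `medical_examination.csv`.

```python
import pandas as pd

# Hypothetical sample rows (not taken from the dataset).
sample = pd.DataFrame({
    "height": [168, 156, 180],     # centimetres
    "weight": [62.0, 85.0, 90.0],  # kilograms
})

# BMI = weight (kg) / height (m) squared
bmi = sample["weight"] / (sample["height"] / 100) ** 2

# Flag rows whose BMI is above 25 as overweight (1), everything else as 0.
sample["overweight"] = (bmi > 25).astype(int)

print(sample)
```

The first patient has a BMI of roughly 22 and is flagged 0; the other two are above 25 and are flagged 1.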

Normalize the data by making 0 always good and 1 always bad: if the value of 'cholesterol' or 'gluc' is 1, set it to 0; if the value is more than 1, set it to 1.

normalize
```
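   # Assign the result back to the column; replace(..., inplace=True) on a
   # selected column may not update df in newer pandas versions.
   df["cholesterol"] = df["cholesterol"].replace({1: 0, 2: 1, 3: 1})
   df["gluc"] = df["gluc"].replace({1: 0, 2: 1, 3: 1})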
```
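
The same rule can also be written as a comparison instead of a lookup table. This is a minimal sketch on a hypothetical `sample` DataFrame, assuming (as the mapping above suggests) that both columns only contain the values 1, 2, and 3.

```python
import pandas as pd

# Hypothetical sample; values are made up for illustration.
sample = pd.DataFrame({"cholesterol": [1, 2, 3, 1], "gluc": [3, 1, 1, 2]})

for col in ["cholesterol", "gluc"]:
    # 1 (normal) becomes 0, anything above 1 becomes 1.
    sample[col] = (sample[col] > 1).astype(int)

print(sample)
```

The comparison form does not need to list every possible value, whereas the explicit mapping leaves unexpected values unchanged, which can make data problems easier to spot.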

create categorical plot

draw_cat_plot

   def draw_cat_plot():
       # Create a DataFrame for the cat plot with `pd.melt`, using just the values
       # from 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'.
       df_cat = pd.melt(df, id_vars="cardio", value_vars=["active", "alco", "cholesterol", "gluc", "overweight", "smoke"])

       # Group and reformat the data to split it by 'cardio', counting each feature value.
       df_cat = df_cat.groupby(["cardio", "variable", "value"]).agg(total=("value", "count"))
       df_cat = df_cat.reset_index()

draw plot

       # (still inside draw_cat_plot)
       fig = sns.catplot(
           data=df_cat,
           x="variable",
           y="total",
           hue="value",
           col="cardio",
           kind="bar",
       ).fig

       fig.savefig('catplot.png')
       return fig

Cat Plot
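
To make the reshaping step behind the cat plot easier to follow, here is a small sketch of `pd.melt` followed by a grouped count. The four-patient `sample` DataFrame is hypothetical and uses only two of the six features; `groupby(...).size()` is used here as an equivalent way to count rows.

```python
import pandas as pd

# Hypothetical four-patient sample with two of the six features.
sample = pd.DataFrame({
    "cardio": [0, 0, 1, 1],
    "smoke":  [0, 1, 1, 1],
    "active": [1, 1, 0, 1],
})

# Wide to long: one row per (patient, feature) pair.
long_form = pd.melt(sample, id_vars="cardio", value_vars=["active", "smoke"])

# Count how often each value occurs per feature within each cardio group.
counts = (
    long_form.groupby(["cardio", "variable", "value"])
    .size()
    .reset_index(name="total")
)

print(counts)
```

Each row of `counts` corresponds to one bar in the plot: `variable` is the x position, `value` the hue, `cardio` the panel, and `total` the bar height.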

clean the data by filtering out the following patient segments that represent incorrect data:

criteria

- diastolic pressure is higher than systolic (keep the correct data with `df['ap_lo'] <= df['ap_hi']`)
- height is less than the 2.5th percentile (keep the correct data with `df['height'] >= df['height'].quantile(0.025)`)
- height is more than the 97.5th percentile
- weight is less than the 2.5th percentile
- weight is more than the 97.5th percentile
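
The percentile criteria can be sketched on a tiny example. The helper `within_percentiles` and the `sample` values below are hypothetical and not part of the project; the helper relies on `Series.between`, which includes both bounds by default.

```python
import pandas as pd

def within_percentiles(series, lower=0.025, upper=0.975):
    """Return a boolean mask that is True where the value lies
    between the given lower and upper quantiles (inclusive)."""
    return series.between(series.quantile(lower), series.quantile(upper))

# Hypothetical sample: row 2 has diastolic > systolic,
# rows 0 and 4 hold the extreme heights.
sample = pd.DataFrame({
    "ap_hi":  [120, 110, 130, 90, 140],
    "ap_lo":  [80, 70, 150, 60, 90],
    "height": [150, 165, 170, 175, 200],
})

keep = (sample["ap_lo"] <= sample["ap_hi"]) & within_percentiles(sample["height"])
cleaned = sample[keep]

print(cleaned)
```

With only five rows, the 2.5th and 97.5th percentiles fall just inside the smallest and largest heights, so both extremes are dropped along with the row whose blood pressure readings are inconsistent.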

draw_heat_map

  def draw_heat_map():
    # Keep only the rows that pass all five checks listed above
    df_heat = df[
      (df['ap_lo'] <= df['ap_hi']) &
      (df['height'] >= df['height'].quantile(0.025)) &
      (df['height'] <= df['height'].quantile(0.975)) &
      (df['weight'] >= df['weight'].quantile(0.025)) &
      (df['weight'] <= df['weight'].quantile(0.975))
    ]

correlation matrix

    # (still inside draw_heat_map)
    corr = df_heat.corr(method="pearson")

    # Boolean mask for the upper triangle (including the diagonal) of the heat map
    mask = np.triu(np.ones_like(corr, dtype=bool))

draw heat map

      
    # (still inside draw_heat_map)
    fig, ax = plt.subplots(figsize=(12, 12))
    sns.heatmap(
        data=corr,
        mask=mask,
        annot=True,
        cmap="cubehelix",
        fmt=".1f",
        annot_kws={"fontsize": 8},
        linewidths=1,
        ax=ax,
    )
    fig.savefig('heatmap.png')
    return fig

Heat Map
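
The triangular mask is easiest to see on a small matrix. The 3x3 `corr` array below is made up for illustration; in a mask, `True` marks the cells that are hidden.

```python
import numpy as np

# Hypothetical 3x3 correlation matrix (symmetric, ones on the diagonal).
corr = np.array([
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.5],
    [0.0, 0.5, 1.0],
])

# Explicit boolean mask: True on the diagonal and everything above it.
mask = np.triu(np.ones_like(corr, dtype=bool))
print(mask)

# For comparison: taking the upper triangle of the values themselves and
# reading it as booleans leaves the zero-correlation cell at [0, 2] as False.
value_based = np.triu(corr).astype(bool)
print(value_based)
```

Building the mask from `np.ones_like(..., dtype=bool)` avoids depending on the correlation values being non-zero, so an upper-triangle cell with a correlation of exactly 0 is still hidden.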

Put both functions together in the medical_data_visualizer.py file
and call them via main.py.
