mas-tono/Medical-Data-Visualizer


Medical-Data-Visualizer

Certification Python project from freeCodeCamp.

  1. using Python to explore the relationship between cardiac disease, body measurements, blood markers, and lifestyle choices.

  2. visualize and make calculations from medical examination data using matplotlib, seaborn, and pandas. The dataset values were collected during medical examinations.

  3. import libraries

    libraries
       import pandas as pd
       import seaborn as sns
       import matplotlib.pyplot as plt
       import numpy as np
       
  4. import data

    read data
       df = pd.read_csv("medical_examination.csv")
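It can help to fail early if the CSV is missing a column that a later step needs. The following is only a sketch. `load_and_check` and `REQUIRED_COLUMNS` are hypothetical names, the column set is simply the columns this README references, and an in-memory CSV with made-up rows stands in for `medical_examination.csv`.

```python
import io

import pandas as pd

# Hypothetical: the columns read by the steps in this README. The real CSV
# may contain more columns; extra columns are ignored by the check.
REQUIRED_COLUMNS = {
    "height", "weight", "ap_hi", "ap_lo", "cholesterol",
    "gluc", "smoke", "alco", "active", "cardio",
}


def load_and_check(source) -> pd.DataFrame:
    """Read a CSV (path or file-like object) and raise if a needed column is missing."""
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


# Made-up two-row stand-in for medical_examination.csv
sample = io.StringIO(
    "height,weight,ap_hi,ap_lo,cholesterol,gluc,smoke,alco,active,cardio\n"
    "168,62.0,110,80,1,1,0,0,1,0\n"
    "156,85.0,140,90,3,1,0,0,1,1\n"
)
df = load_and_check(sample)
print(df.shape)  # (2, 10)
```

With the real file you would call `load_and_check("medical_examination.csv")` instead.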
       
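  5. add an 'overweight' column: compute BMI as weight in kilograms divided by the square of height in meters (height is stored in centimeters, hence the division by 100), then store 1 if BMI is above 25 and 0 otherwise.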

    overweight
       # BMI = weight (kg) / height (m)^2; height is stored in cm
       bmi = df["weight"] / (df["height"] / 100) ** 2
       # 1 = overweight (BMI > 25), 0 = not overweight
       df["overweight"] = (bmi > 25).astype(int)
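Here is a quick worked example of the overweight rule on made-up rows. Height is in centimeters and weight in kilograms, as the division by 100 above implies. For instance, 80 kg at 170 cm gives a BMI of about 27.7, so that patient is flagged as 1.

```python
import pandas as pd

# Made-up rows: height in cm, weight in kg
df = pd.DataFrame({"height": [170, 160, 180], "weight": [80.0, 70.0, 75.0]})

# Convert height to meters before squaring
bmi = df["weight"] / (df["height"] / 100) ** 2
df["overweight"] = (bmi > 25).astype(int)

print(bmi.round(1).tolist())      # [27.7, 27.3, 23.1]
print(df["overweight"].tolist())  # [1, 1, 0]
```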
       
  6. normalize data by making 0 always good and 1 always bad. If the value of 'cholesterol' or 'gluc' is 1, make the value 0. If the value is more than 1, make the value 1.

    normalize
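       # 1 (normal) -> 0; anything above 1 -> 1.
       # Assign back instead of df["col"].replace(..., inplace=True): that chained
       # in-place call may not update df under pandas Copy-on-Write.
       df["cholesterol"] = (df["cholesterol"] > 1).astype(int)
       df["gluc"] = (df["gluc"] > 1).astype(int)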
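Below is a minimal sketch of the normalization on made-up codes. The comparison `> 1` matches the rule stated above: 1 becomes 0, and anything above 1 becomes 1. Unlike a `replace({1: 0, 2: 1, 3: 1})` mapping, it does not need every possible code listed.

```python
import pandas as pd

# Made-up cholesterol/gluc codes: 1 = normal, 2 and 3 = above normal
df = pd.DataFrame({"cholesterol": [1, 2, 3, 1], "gluc": [1, 1, 3, 2]})

# Map 1 -> 0 and any value above 1 -> 1, assigning the result back to df
for col in ["cholesterol", "gluc"]:
    df[col] = (df[col] > 1).astype(int)

print(df["cholesterol"].tolist())  # [0, 1, 1, 0]
print(df["gluc"].tolist())         # [0, 0, 1, 1]
```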
       
  7. create categorical plot

    draw_cat_plot
       def draw_cat_plot():
           # Create a long-format DataFrame with pd.melt, keeping only the values of
           # 'cholesterol', 'gluc', 'smoke', 'alco', 'active', and 'overweight'
           df_cat = pd.melt(df, id_vars=["cardio"],
                            value_vars=["active", "alco", "cholesterol", "gluc", "overweight", "smoke"])

           # Group and reformat the data to split it by 'cardio',
           # counting how often each feature takes each value
           df_cat = (df_cat.groupby(["cardio", "variable", "value"])
                           .size()
                           .reset_index(name="total"))

           # Draw one bar chart per 'cardio' value
           fig = sns.catplot(data=df_cat,
                             x="variable",
                             y="total",
                             hue="value",
                             col="cardio",
                             kind="bar").fig

           fig.savefig('catplot.png')
           return fig

    Figure: Cat Plot
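The next example shows what `pd.melt` and the group-and-count step produce, using three made-up patients and two of the features. In the long table, each patient contributes one row per feature. The `total` column then counts identical (cardio, variable, value) combinations, and these counts are what become bar heights in a cat plot. The counting here uses `groupby(...).size()`.

```python
import pandas as pd

# Three made-up patients and two of the binary features
df = pd.DataFrame({
    "cardio": [0, 0, 1],
    "smoke": [0, 1, 1],
    "overweight": [1, 1, 0],
})

# Long format: one row per (patient, feature), with columns cardio, variable, value
long_df = pd.melt(df, id_vars=["cardio"], value_vars=["overweight", "smoke"])

# Count identical (cardio, variable, value) rows; groupby sorts the keys
counts = (long_df.groupby(["cardio", "variable", "value"])
                 .size()
                 .reset_index(name="total"))
print(counts)
```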
  8. clean the data by filtering out the following patient segments that represent incorrect data:

    criteria
    • diastolic pressure is higher than systolic (Keep the correct data with (df['ap_lo'] <= df['ap_hi']))
    • height is less than the 2.5th percentile (Keep the correct data with (df['height'] >= df['height'].quantile(0.025)))
    • height is more than the 97.5th percentile
    • weight is less than the 2.5th percentile
    • weight is more than the 97.5th percentile
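The criteria above can also be written as a standalone function and checked on six made-up rows, each of which breaks a different rule. `clean` is a hypothetical helper name. The quantiles use pandas' default linear interpolation, so with this few rows the cutoffs fall just inside the smallest and largest values, and those extremes get dropped.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows that pass every criterion (hypothetical helper name)."""
    return df[
        (df["ap_lo"] <= df["ap_hi"])
        & (df["height"] >= df["height"].quantile(0.025))
        & (df["height"] <= df["height"].quantile(0.975))
        & (df["weight"] >= df["weight"].quantile(0.025))
        & (df["weight"] <= df["weight"].quantile(0.975))
    ]


# Made-up rows: 0 too short, 1 too light, 2 has ap_lo > ap_hi,
# 4 too tall (and heaviest); rows 3 and 5 should survive
df = pd.DataFrame({
    "height": [150, 160, 165, 170, 200, 168],
    "weight": [70, 45, 72, 75, 80, 73],
    "ap_hi": [120, 120, 110, 130, 140, 125],
    "ap_lo": [80, 80, 130, 85, 90, 82],
})

print(clean(df).index.tolist())  # [3, 5]
```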
    draw_heat_map
       def draw_heat_map():
           # Keep only rows that pass every criterion above
           df_heat = df[
               (df['ap_lo'] <= df['ap_hi']) &
               (df['height'] >= df['height'].quantile(0.025)) &
               (df['height'] <= df['height'].quantile(0.975)) &
               (df['weight'] >= df['weight'].quantile(0.025)) &
               (df['weight'] <= df['weight'].quantile(0.975))
           ]

           # Correlation matrix
           corr = df_heat.corr(method="pearson")

           # Boolean mask that hides the upper triangle (diagonal included)
           mask = np.triu(np.ones_like(corr, dtype=bool))

           # Draw the heat map on the axes created here
           fig, ax = plt.subplots(figsize=(12, 12))
           sns.heatmap(data=corr,
                       mask=mask,
                       annot=True,
                       cmap="cubehelix",
                       fmt=".1f",
                       annot_kws={"fontsize": 8},
                       linewidths=1,
                       ax=ax)

           fig.savefig('heatmap.png')
           return fig

    Figure: Heat Map
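A boolean upper-triangle mask is safer than `np.triu(corr)`. The reason is that `np.triu(corr)` returns the correlation values themselves. If that float array is read as a mask (assuming it is cast to booleans, where nonzero means hidden), an upper-triangle correlation of exactly 0 would stay visible. The made-up 3x3 matrix below shows the difference.

```python
import numpy as np
import pandas as pd

# Made-up correlation matrix with an exact zero above the diagonal at (0, 1)
corr = pd.DataFrame(
    [[1.0, 0.0, 0.4],
     [0.0, 1.0, -0.2],
     [0.4, -0.2, 1.0]],
    columns=["a", "b", "c"], index=["a", "b", "c"],
)

value_mask = np.triu(corr)                           # upper-triangle values, zeros below
bool_mask = np.triu(np.ones_like(corr, dtype=bool))  # True on and above the diagonal

# Cast to booleans, the value-based mask leaves (0, 1) unhidden because its value is 0
print(value_mask.astype(bool))
print(bool_mask)
```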
  9. put everything together in the medical_data_visualizer.py file

  10. call the plotting functions from main.py
