MaiqTheHonest/toolkit-data-bread

A toolkit for quickly getting insights from pandas DataFrames via common regression methods and their visualizations. It effectively combines R's glm and ggplot functionality in a single Python function call, and adds one-liner clustering methods.

Currently supports:

  1. (multiple) linear regression, with 2D and 3D plots
  2. (multiple) logistic regression, robust for both binary and proportional (i.e. 0 < y < 1) regressands, with 2D and 3D plots
  3. k-means clustering, with inertia plots to determine the optimal number of clusters; clustering works on any number of variables, and plots are available in 1D, 2D, and 3D

Example usage:

Linear regression:

import pandas as pd
from explore_toolkit import lm

df = pd.read_csv("titanic.csv")
df = df.fillna({'Age': df['Age'].median()})  # fill missing ages with the median age

lm(df, 'Fare ~ Age', plot=True)
lm(df, 'Fare ~ Age + SibSp', plot=True)
[Figures: 2D and 3D linear regression plots]
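
For readers unfamiliar with R-style formulas, 'Fare ~ Age' requests a least-squares fit of Fare on Age with an intercept. The toolkit's internals are not shown in this README, so the following is only a dependency-free sketch of the closed-form single-predictor fit. The helper name fit_simple_ols and the numbers are made up for illustration and are not part of explore_toolkit.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Closed-form least-squares fit of y = intercept + slope * x.

    Hypothetical helper for illustration; not part of explore_toolkit.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data lying exactly on the line y = 5 + 2x
age = [1.0, 2.0, 3.0, 4.0]
fare = [7.0, 9.0, 11.0, 13.0]
intercept, slope = fit_simple_ols(age, fare)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

With more than one predictor (as in 'Fare ~ Age + SibSp') the same idea generalises to solving the normal equations, which is usually delegated to a linear algebra library.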

Logistic regression (for binary and proportional responses):

from explore_toolkit import logit
td = pd.read_csv('ReedfrogPred.csv')  # propsurv lies between 0 and 1; a binary response also works

logit(td, 'propsurv ~ surv', plot=True)
logit(td, 'propsurv ~ density + surv', plot=True)
[Figures: 2D and 3D logistic regression plots]
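
One common way to handle a response that is a proportion rather than a 0/1 outcome is to maximise the same Bernoulli log-likelihood with fractional y (often called a fractional logit). Whether the toolkit does exactly this is an assumption, since its internals are not shown here. The sketch below uses plain gradient ascent on made-up data; fit_logit, the learning rate, and the iteration count are illustrative choices only.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(x, y, lr=0.5, n_iter=5000):
    """Fit p = sigmoid(b0 + b1 * x) by gradient ascent on the mean
    Bernoulli log-likelihood. y may be binary or a proportion in [0, 1].

    Hypothetical helper for illustration; not part of explore_toolkit.
    """
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            resid = yi - sigmoid(b0 + b1 * xi)  # observed minus fitted
            g0 += resid        # gradient with respect to the intercept
            g1 += resid * xi   # gradient with respect to the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Made-up proportions generated from sigmoid(-1 + 2x), so the fit
# should recover an intercept near -1 and a slope near 2
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [sigmoid(-1.0 + 2.0 * xi) for xi in x]
b0, b1 = fit_logit(x, y)
print(f"b0={b0:.3f}, b1={b1:.3f}")
```

The update rule is identical for binary and proportional y, which is why a single fitting routine can serve both cases.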

k-means clustering and the "elbow" rule:

from explore_toolkit import kmeansclusters, elbow
df = pd.read_csv('Iris.csv')

Here I am using the very popular iris dataset

elbow(df, ['SepalLengthCm', 'SepalWidthCm', 'PetalWidthCm'])
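
To make the "elbow" rule concrete: inertia is the sum of squared distances from each point to its nearest centroid, and the elbow is the cluster count after which adding clusters stops reducing inertia by much. The sketch below is not the toolkit's code. It is a small pure-Python version of Lloyd's algorithm on made-up 2D points, with a deterministic farthest-point initialisation chosen here only to keep the example reproducible (libraries such as scikit-learn default to k-means++ instead).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def init_farthest(points, k):
    """Deterministic initialisation (an illustrative simplification):
    start at the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans_inertia(points, k, n_iter=100):
    """Run Lloyd's algorithm and return the final inertia."""
    centroids = init_farthest(points, k)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid)
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Made-up data: three tight, well-separated groups of four points each
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),
    (0.0, 10.0), (0.0, 11.0), (1.0, 10.0), (1.0, 11.0),
]
inertias = [kmeans_inertia(points, k) for k in range(1, 5)]
for k, value in enumerate(inertias, start=1):
    print(k, round(value, 2))
```

On this toy data the inertia falls sharply up to three clusters and barely changes afterwards, which is the bend an inertia plot is meant to reveal.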


Now cluster with the optimal n_clusters = 3:

kmeansclusters(df, ['SepalLengthCm', 'SepalWidthCm'], n_clusters=3, plot=True, append=True, spit=False)
kmeansclusters(df, ['SepalLengthCm', 'SepalWidthCm', 'PetalWidthCm'], n_clusters=3, plot=True, append=True)
[Figures: 2D and 3D k-means cluster plots]

Use spit=True to return the cluster numbers as a standalone column, and append=True to insert that column into the analysed DataFrame at position 1.
