GitHub - MaiqTheHonest/toolkit-data-bread: see README

A toolkit for quickly getting insights from pandas DataFrames via common regression methods and their visualizations. It effectively combines R's glm and ggplot functionality in a single Python call, and adds one-liner clustering methods.

Currently supports:

(multiple) linear regression, with 2D and 3D plots
(multiple) logistic regression for both binary and proportional (i.e. 0 < y < 1) regressands, with 2D and 3D plots
k-means clustering on any number of variables, with inertia plots to determine the optimal number of clusters; cluster plots in 1D, 2D and 3D

Example usage:

Linear regression:

import pandas as pd
from explore_toolkit import lm

df = pd.read_csv("titanic.csv")
df = df.fillna({'Age': df['Age'].median()})  # fill missing ages with the median age

lm(df, 'Fare ~ Age', plot=True)
lm(df, 'Fare ~ Age + SibSp', plot=True)

[Figures: 2D and 3D plots of the fitted linear regressions]
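
To make the formula notation concrete: a formula such as 'Fare ~ Age + SibSp' conventionally means "regress Fare on Age and SibSp, plus an intercept". The sketch below shows that least-squares fit using NumPy only. It is an illustration under assumptions, not the toolkit's implementation: ols_fit is a hypothetical helper, and the data are synthetic stand-ins rather than the Titanic file.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns coefficients ordered as [intercept, slope_1, ..., slope_k].
    (Hypothetical helper for illustration; not part of explore_toolkit.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single regressor as one column
    # Prepend a column of ones so the first coefficient is the intercept
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic stand-in for 'Fare ~ Age + SibSp' (made-up numbers)
rng = np.random.default_rng(0)
age = rng.uniform(1, 80, size=200)
sibsp = rng.integers(0, 5, size=200)
fare = 10.0 + 0.5 * age + 3.0 * sibsp  # noiseless, so the fit recovers these values

coef = ols_fit(np.column_stack([age, sibsp]), fare)
print(coef)  # [intercept, coefficient on age, coefficient on sibsp]
```

With one regressor the fit is a line (hence a 2D plot); with two it is a plane (hence a 3D plot).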

Logistic regression (for binary and proportional regressands):

import pandas as pd
from explore_toolkit import logit

td = pd.read_csv('ReedfrogPred.csv')  # propsurv lies between 0 and 1; a binary regressand also works

logit(td, 'propsurv ~ surv', plot=True)
logit(td, 'propsurv ~ density + surv', plot=True)

[Figures: 2D and 3D plots of the fitted logistic regressions]
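
One common way to let a single logistic model handle both binary and proportional regressands is to maximise the Bernoulli log-likelihood with y allowed to take any value in [0, 1] (often called a fractional logit). The README does not say how logit does this internally, so the following is only a sketch of that general idea: fit_logit is a hypothetical helper, and the data are synthetic.

```python
import numpy as np

def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of a logistic model with an intercept.

    y may be binary (0/1) or a proportion in [0, 1]; the same score
    equations X'(y - p) = 0 are solved in both cases.
    (Hypothetical helper for illustration; not part of explore_toolkit.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))   # fitted probabilities
        grad = design.T @ (y - p)                   # score vector
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Proportional regressand: exact logistic proportions with known coefficients
x = np.linspace(-2.0, 2.0, 41)
true_beta = np.array([-1.0, 2.0])
y_prop = 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x)))
beta_prop = fit_logit(x, y_prop)

# Binary regressand: the same function, unchanged (classes overlap, so the fit is finite)
x_bin = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_bin = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
beta_bin = fit_logit(x_bin, y_bin)

print(beta_prop, beta_bin)
```

Plain Newton steps can fail when binary classes are perfectly separated, so a production implementation would normally add safeguards that this sketch omits.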

k-means clustering and the "elbow" rule:

import pandas as pd
from explore_toolkit import kmeansclusters, elbow

df = pd.read_csv('Iris.csv')

Here I am using the very popular Iris dataset:

elbow(df, ['SepalLengthCm', 'SepalWidthCm', 'PetalWidthCm'])

And now clustering with the optimal n_clusters = 3:

kmeansclusters(df, ['SepalLengthCm', 'SepalWidthCm'], n_clusters=3, plot=True, append=True, spit=False)
kmeansclusters(df, ['SepalLengthCm', 'SepalWidthCm', 'PetalWidthCm'], n_clusters=3, plot=True, append=True)

[Figures: 2D and 3D plots of the k-means clusters]

Use spit=True to return the cluster numbers as a standalone column, and append=True to insert that column into the analysed dataframe (at position 1).
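
For readers unfamiliar with the "elbow" rule: inertia is the sum of squared distances from each point to its assigned cluster center, and the elbow is the number of clusters beyond which inertia stops dropping sharply. The sketch below computes inertia for several cluster counts with a plain NumPy k-means. It is an illustration under assumptions, not the toolkit's code: the kmeans function, its farthest-point initialisation, and the three synthetic blobs are all made up for this example.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation.

    Returns (labels, centers, inertia), where inertia is the sum of squared
    distances from each point to its assigned center.
    (Hypothetical helper for illustration; not part of explore_toolkit.)
    """
    X = np.asarray(X, dtype=float)
    # Start from the first row, then repeatedly add the point farthest
    # from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign every point to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its points (keep it if it has none)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return labels, centers, inertia

# Three well-separated synthetic blobs of 50 points each
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers])

# Inertia for k = 1..6; the sharp drop should end at k = 3
inertias = {k: kmeans(X, k)[2] for k in range(1, 7)}
labels3, centers3, _ = kmeans(X, 3)
for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting these inertia values against k gives the kind of curve an elbow plot shows; the returned labels correspond to the column of cluster numbers described above.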
