Commit d8b82b1 (1 parent: ad24076)

updating requirements, adding pre-commit, and formatting code with black

28 files changed: +209 -170 lines

.pre-commit-config.yaml (+28 -17)

@@ -1,20 +1,31 @@
----
 repos:
 
--
-    repo: https://github.com/ambv/black
-    rev: 20.8b1
-    hooks:
-    -
-        id: black
-        language_version: python3
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v2.3.0
+    hooks:
+      - id: check-yaml
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
+      - id: check-added-large-files
+      - id: debug-statements
+        language_version: python3
 
-- repo: local
-  hooks:
-  - id: python-tests
-    name: pytests
-    entry: pytest src/tests
-    language: python
-    additional_dependencies: [pre-commit, pytest, pandas, sklearn, matplotlib]
-    always_run: true
-    pass_filenames: false
+  - repo: https://github.com/psf/black
+    rev: 22.10.0
+    hooks:
+      - id: black
+        args: [--safe]
+
+  - repo: local
+    hooks:
+      - id: pylint
+        name: pylint
+        files: .
+        entry: pylint
+        language: system
+        types: [python3]
+        args: [
+          "-rn", # Only display messages
+          "-sn", # Don't display the score
+          "--rcfile=.pylintrc", # Link to your config file
+        ]
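
For context, the hooks configured above run against staged files once they are installed into the local Git repository with `pre-commit install`, and `pre-commit run --all-files` checks the whole tree. A minimal sketch of driving the same check from a Python helper script (purely illustrative; the usual entry point is the pre-commit CLI itself):

```python
# Illustrative only: shells out to the pre-commit CLI, which reads
# .pre-commit-config.yaml and runs check-yaml, black, pylint, etc.
import subprocess
import sys

result = subprocess.run(
    ["pre-commit", "run", "--all-files"],
    capture_output=True,
    text=True,
    check=False,  # non-zero exit means a hook failed or reformatted files
)
print(result.stdout)
sys.exit(result.returncode)
```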

data/README.md (+1 -1)

@@ -12,4 +12,4 @@ Finally, you can download the dataset using the following command:
 bash download_data.sh
 ```
 
-The dataset will be temporarily saved locally (inside the `data` folder) and transferred to your AWS S3 bucket. After that, the dataset will be deleted. If you choose to not use an AWS S3 Bucket, then the dataset will be stored into the `data` folder.
+The dataset will be temporarily saved locally (inside the `data` folder) and transferred to your AWS S3 bucket. After that, the dataset will be deleted. If you choose to not use an AWS S3 Bucket, then the dataset will be stored into the `data` folder.

data/download_data.sh (-1)

@@ -39,4 +39,3 @@ if [[ "$CONFIG_S3" != "YOUR_S3_BUCKET_URL" ]]; then
 
 # deleting the create folder
 rm Original_ObesityDataSet.csv
-

notebooks/README.md (+2 -2)

@@ -4,7 +4,7 @@ Here go the notebooks used for research and development. The main idea is to try
 
 ## Setup Credentials
 
-If you haven't your credentials yet, please check the `docs` folder first before following along.
+If you haven't your credentials yet, please check the `docs` folder first before following along.
 
 1. Set your `AWS Credentials` and `Kaggle API Credentials` (used to download the dataset) in the `credentials.yaml` file.
 
@@ -44,4 +44,4 @@ sudo docker log <CONTAINER_ID>
 - Run the `EDA` notebook.
 - Run the `Data Processing` notebook.
 - Run the `Experimentations` notebook (will test different Machine Learning models, different hyperparameters for each model, and do some feature engineering and selection).
-- Register the best models to the MLflow model registry using the `Experimentations` notebook (last cell) or the MLflow's user interface.
+- Register the best models to the MLflow model registry using the `Experimentations` notebook (last cell) or the MLflow's user interface.

notebooks/VERSION (+1 -1)

@@ -1 +1 @@
-1.1.0
+1.3.0

notebooks/dev_Dockerfile (+1 -1)

@@ -17,4 +17,4 @@ WORKDIR /e2e-project
 RUN pip install --no-cache-dir -U pip
 
 # installing requirements
-RUN pip install -r notebooks/requirements_dev.txt
+RUN pip install -r notebooks/requirements_dev.txt

notebooks/docs/SETUP_AWS.md (+3 -3)

@@ -196,7 +196,7 @@ aws ec2 authorize-security-group-ingress \
     --group-id "sg-0613261580cd87115" \
     --protocol tcp \
     --port 5000 \
-    --cidr "0.0.0.0/0"
+    --cidr "0.0.0.0/0"
 ```
 
 The output should look like this:
@@ -224,7 +224,7 @@ aws ec2 authorize-security-group-ingress \
     --group-id "sg-0613261580cd87115" \
     --protocol tcp \
     --port 22 \
-    --cidr "18.206.107.24/29"
+    --cidr "18.206.107.24/29"
 ```
 
 The output should look like this:
@@ -579,4 +579,4 @@ pipenv install mlflow boto3 psycopg2-binary awscli
 pipenv shell
 
 aws configure
-```
+```

notebooks/docs/SETUP_KAGGLE.md (+1 -1)

@@ -1,3 +1,3 @@
 # Setting up Kaggle's Account
 
-To use the Kaggle API, sign up for a Kaggle account at https://www.kaggle.com. Then go to the 'Account' tab of your user profile (https://www.kaggle.com/<username>/account) and select 'Create API Token'. This will trigger the download of kaggle.json, a file containing your API credentials. Set your `Kaggle API Credentials` (used to download the dataset) in the `credentials.yaml` file.
+To use the Kaggle API, sign up for a Kaggle account at https://www.kaggle.com. Then go to the 'Account' tab of your user profile (https://www.kaggle.com/<username>/account) and select 'Create API Token'. This will trigger the download of kaggle.json, a file containing your API credentials. Set your `Kaggle API Credentials` (used to download the dataset) in the `credentials.yaml` file.

notebooks/requirements_dev.txt (+1 -1)

@@ -12,4 +12,4 @@ optuna==3.6.1
 pandas==1.5.2
 scikit_learn==1.3.2
 seaborn==0.13.2
-xgboost==2.1.1
+xgboost==2.1.1

requirements.txt (+12 -11)

@@ -1,11 +1,12 @@
-scikit-learn>=0.23
-pandas
-seaborn
-matplotlib
-joblib
-numpy
-ibm_watson_machine_learning
-pyyaml
-pytest
-pytest-dependency
-pre-commit
+boto3==1.35.6
+fastapi==0.115.5
+joblib==1.3.2
+loguru==0.7.2
+mlflow==2.17.2
+numpy==2.1.3
+pandas==1.5.2
+pydantic==2.9.2
+pytest==8.3.3
+PyYAML==6.0.2
+scikit_learn==1.3.2
+xgboost==2.1.2

src/README.md (+1 -1)

@@ -1,3 +1,3 @@
 # Scripts
 
-Here goes Scripts and Pipelines
+Here goes Scripts and Pipelines

src/api.py (+7 -5)

@@ -12,20 +12,24 @@
 app = FastAPI()
 
 if aws_credentials.EC2 != "YOUR_EC2_INSTANCE_URL":
-    mlflow.set_tracking_uri(f"http://{aws_credentials.EC2}:5000")
+    mlflow.set_tracking_uri(f"http://{aws_credentials.EC2}:5000")
 else:
     mlflow.set_tracking_uri(f"http://127.0.0.1:5000")
 
+
 @app.get("/version")
 def check_versions():
-    with open(f"{general_settings.RESEARCH_ENVIRONMENT_PATH}/VERSION", "r", encoding="utf-8") as f:
+    with open(
+        f"{general_settings.RESEARCH_ENVIRONMENT_PATH}/VERSION", "r", encoding="utf-8"
+    ) as f:
         code_version = f.readline().strip()
 
     return {
         "code_version": code_version,
         "model_version": model_settings.VERSION,
     }
 
+
 @app.get("/predict")
 async def prediction(person: Person):
     loaded_model = ModelServe(
@@ -38,6 +42,4 @@ async def prediction(person: Person):
     data = pd.DataFrame.from_dict([person.model_dump()])
     X = data_processing_inference(data)
 
-    return {
-        "predictions": loaded_model.predict(X).tolist()
-    }
+    return {"predictions": loaded_model.predict(X).tolist()}

src/config/aws.py (+2)

@@ -11,12 +11,14 @@ class AWSCredentials(BaseModel):
     Args:
         BaseModel (pydantic.BaseModel): Pydantic base model instance.
     """
+
     EC2: str
     S3: str
     POSTGRESQL: str
     AWS_ACCESS_KEY: str
     AWS_SECRET_KEY: str
 
+
 aws_credentials = AWSCredentials(
     **read_yaml_credentials_file(
         file_path=Path.joinpath(

src/config/kaggle.py (+1)

@@ -10,6 +10,7 @@ class KaggleCredentials(BaseModel):
     Args:
         BaseModel (pydantic.BaseModel): Pydantic base model instance.
     """
+
     KAGGLE_USERNAME: str
     KAGGLE_KEY: str

src/config/model.py (+2)

@@ -12,13 +12,15 @@ class ModelSettings(BaseModel):
     Args:
         BaseModel (pydantic.BaseModel): Pydantic base model instance.
     """
+
     MODEL_NAME: str
     VERSION: str
     MODEL_FLAVOR: str
     EXPERIMENT_ID: str
     RUN_ID: str
     FEATURES: List[str]
 
+
 model_settings = ModelSettings(
     **read_yaml_credentials_file(
         file_path=Path.joinpath(

src/config/settings.py (+3 -1)

@@ -13,6 +13,7 @@ class GeneralSettings(BaseModel):
     Args:
         BaseModel (pydantic.BaseModel): Pydantic base model instance.
     """
+
     DATA_PATH: DirectoryPath
     RAW_FILE_NAME: str
     ARTIFACTS_PATH: DirectoryPath
@@ -22,6 +23,7 @@ class GeneralSettings(BaseModel):
     LOG_PATH: DirectoryPath
     RESEARCH_ENVIRONMENT_PATH: DirectoryPath
 
+
 general_settings = GeneralSettings(
     **read_yaml_credentials_file(
         file_path=Path.joinpath(
@@ -38,5 +40,5 @@ class GeneralSettings(BaseModel):
     Path.joinpath(general_settings.LOG_PATH, "logs", "app.log"),
     rotation="1 day",
     retention="7 days",
-    compression="zip"
+    compression="zip",
 )

src/config/utils.py (+7 -9)

@@ -6,6 +6,7 @@
 from pydantic import BaseModel, create_model
 from pydantic.fields import FieldInfo
 
+
 def partial_model(model: Type[BaseModel]):
     """Workaround for setting all Pydantic's fields as optional.
     All credits goes to the author:
@@ -14,30 +15,27 @@ def partial_model(model: Type[BaseModel]):
     Args:
         model (Type[BaseModel]): Pydantic base model instance.
     """
+
     def make_field_optional(
-        field: FieldInfo,
-        default: Any = None
+        field: FieldInfo, default: Any = None
     ) -> Tuple[Any, FieldInfo]:
         new = deepcopy(field)
         new.default = default
         new.annotation = Optional[field.annotation]  # type: ignore
         return new.annotation, new
 
     return create_model(
-        f'Partial{model.__name__}',
+        f"Partial{model.__name__}",
         __base__=model,
         __module__=model.__module__,
         **{
             field_name: make_field_optional(field_info)
             for field_name, field_info in model.model_fields.items()
-        }
+        },
     )
 
 
-def read_yaml_credentials_file(
-    file_path: Path,
-    file_name: str
-) -> Dict:
+def read_yaml_credentials_file(file_path: Path, file_name: str) -> Dict:
     """Reads a YAML file.
 
     Args:
@@ -56,7 +54,7 @@ def read_yaml_credentials_file(
         file_name,
     )
 
-    with open(path, 'r', encoding='utf-8') as f:
+    with open(path, "r", encoding="utf-8") as f:
         try:
             context = yaml.safe_load(f)
         except yaml.YAMLError as e:
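
As a usage note, `partial_model` reformatted above clones a Pydantic model and marks every field optional with a `None` default. A minimal sketch of how it might be applied; the `ExampleSettings` model and the import path are hypothetical, not part of the repository:

```python
# Hypothetical usage of partial_model; ExampleSettings is not in the repo.
from pydantic import BaseModel

from src.config.utils import partial_model  # assumed import path


class ExampleSettings(BaseModel):
    DATA_PATH: str
    RAW_FILE_NAME: str


# Build a copy of the model where every field is Optional with default None,
# so partially-filled payloads still validate.
PartialExampleSettings = partial_model(ExampleSettings)

settings = PartialExampleSettings(DATA_PATH="data/")
print(settings.RAW_FILE_NAME)  # -> None
```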
