Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT CLI conversion #249

Merged
merged 48 commits into from
Jan 23, 2023
Merged
Show file tree
Hide file tree
Changes from 32 commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
22c4c71
First draft with test outline
E-Aho Dec 6, 2022
eacedf8
Add in tests for both main and method
E-Aho Dec 9, 2022
453728d
Add entrypoint to setup.py
E-Aho Dec 9, 2022
da6da30
Get entrypoint working
E-Aho Dec 9, 2022
a5abe50
Merge branch 'main' into FEAT-CLI-conversion
E-Aho Dec 9, 2022
5a9c0fc
Undo change in _persist
E-Aho Dec 9, 2022
b3a561e
Add output dir argument
E-Aho Dec 11, 2022
1d01ff5
fix tests
E-Aho Dec 11, 2022
91d1b0f
adjust format of log message
E-Aho Dec 11, 2022
7af9c26
Update main entrypoint and parent structure
E-Aho Dec 11, 2022
8ad10cd
Add test that covers main entrypoint
E-Aho Dec 12, 2022
864d8ca
Update skops/io/_cli.py
E-Aho Dec 15, 2022
7589a06
Merge branch 'main' into FEAT-CLI-conversion
E-Aho Dec 17, 2022
b39ec14
WIP for merging from main
E-Aho Dec 17, 2022
9e76ee1
Merge branch 'FEAT-CLI-conversion' of github.com:E-Aho/skops into FEA…
E-Aho Dec 17, 2022
aa24050
Address PR comments 1
E-Aho Dec 17, 2022
02b3cf8
Add test to check for multiple output file types
E-Aho Dec 17, 2022
245197f
Update entrypoint name in setup.py
E-Aho Dec 17, 2022
2b681ce
PR comments before reworking to support only single files
E-Aho Jan 3, 2023
25b539c
Pre-parser rework]
E-Aho Jan 4, 2023
50bf189
Remove empty test file
E-Aho Jan 4, 2023
6ecb8cf
Merge branch 'main' into FEAT-CLI-conversion
E-Aho Jan 5, 2023
227f23c
Pre-test reintroduction
E-Aho Jan 9, 2023
426d663
Pre-test reintroduction
E-Aho Jan 9, 2023
c3b001c
Merge branch 'FEAT-CLI-conversion' of github.com:E-Aho/skops into FEA…
E-Aho Jan 9, 2023
37d3a87
Change verbosity to work as discussed in PR
E-Aho Jan 10, 2023
9b9f342
Add needed __init__ file
E-Aho Jan 10, 2023
6f5b71e
Update docstrings
E-Aho Jan 10, 2023
aa7c3ba
Make names of files uniform
E-Aho Jan 10, 2023
1c0352d
Update help docstrings
E-Aho Jan 10, 2023
8d9b3c5
Add documentation for CLI
E-Aho Jan 10, 2023
452fbdb
Merge branch 'main' into FEAT-CLI-conversion
E-Aho Jan 11, 2023
8150f9a
Update docs/persistence.rst
E-Aho Jan 17, 2023
88e7be0
Update docs/persistence.rst
E-Aho Jan 17, 2023
c38cb73
Update docs/persistence.rst
E-Aho Jan 17, 2023
5233230
Apply suggestions from code review
E-Aho Jan 17, 2023
95a5ff3
Local changes
E-Aho Jan 17, 2023
a17b14b
Merge branch 'FEAT-CLI-conversion' of github.com:E-Aho/skops into FEA…
E-Aho Jan 17, 2023
7e63178
Add test coverage for very verbose logging
E-Aho Jan 17, 2023
6e35770
Add comments to make entrypoint more clear
E-Aho Jan 17, 2023
02c95ad
Adjust test to be more clear
E-Aho Jan 17, 2023
c3491ef
Add example script to convert all pkl files in working dir
E-Aho Jan 17, 2023
7e072da
Update skops/cli/_convert.py to be more clear in warning message
E-Aho Jan 18, 2023
799f91b
Update log message
E-Aho Jan 18, 2023
1864f1c
Merge branch 'FEAT-CLI-conversion' of github.com:E-Aho/skops into FEA…
E-Aho Jan 18, 2023
8ddc6b9
Adjust warning message to not use Trusted=True
E-Aho Jan 18, 2023
bdfbf58
Merge branch 'main' into FEAT-CLI-conversion
E-Aho Jan 19, 2023
0e6b4cb
Merge branch 'main' into FEAT-CLI-conversion
E-Aho Jan 20, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion docs/changes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,12 @@ skops Changelog
:depth: 1
:local:

v0.5
----
- Added CLI entrypoint support (:func:`.cli.entrypoint.main_cli`)
and a command line function to convert Pickle files
to Skops files (:func:`.cli._convert.main`). :pr:`249` by `Erin Aho`_

v0.4
----
- :func:`.io.dump` and :func:`.io.load` now work with file like objects,
Expand Down Expand Up @@ -83,4 +89,5 @@ Contributors

:user:`Adrin Jalali <adrinjalali>`, :user:`Merve Noyan <merveenoyan>`,
:user:`Benjamin Bossan <BenjaminBossan>`, :user:`Ayyuce Demirbas
<ayyucedemirbas>`, :user:`Prajjwal Mishra <p-mishra1>`
<ayyucedemirbas>`, :user:`Prajjwal Mishra <p-mishra1>`,
:user:`Erin Aho <E-Aho>`,
16 changes: 16 additions & 0 deletions docs/persistence.rst
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,22 @@ you have custom functions (say, a custom function to be used with
most ``numpy`` and ``scipy`` functions should work. Therefore, you can save
objects having references to functions such as ``numpy.sqrt``.

Command Line Interface
########
Skops has a command line interface to convert SciKit-Learn models persisted with
``Pickle`` to ``Skops`` files.

To convert a file from the command line, use the ``skops convert`` entrypoint.

Below is an example call to convert a file ``my_model.pkl`` to ``new_model.skops``:

.. code:: console
skops convert my_model.pkl

Further help for the different supported options can be found by calling
``skops convert --help`` in a terminal.


Supported libraries
-------------------

Expand Down
10 changes: 9 additions & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,14 @@


def setup_package():
package_data = dict(
entry_points={
"console_scripts": [
"skops = skops.cli.entrypoint:main_cli",
],
}
)

metadata = dict(
name=DISTNAME,
maintainer=MAINTAINER,
Expand Down Expand Up @@ -72,7 +80,7 @@ def setup_package():
include_package_data=True,
)

setup(**metadata)
setup(**package_data, **metadata)


if __name__ == "__main__":
Expand Down
Empty file added skops/cli/__init__.py
Empty file.
109 changes: 109 additions & 0 deletions skops/cli/_convert.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
from __future__ import annotations

import argparse
import logging
import os
import pathlib
import pickle
from typing import Optional

from skops.cli._utils import get_log_level
from skops.io import dumps, get_untrusted_types


def _convert_file(input_file: os.PathLike, output_file: os.PathLike):
"""
Function that is called by ``skops convert`` entrypoint.

Loads a pickle model from the input path, converts to skops format, and saves to
output file.

Parameters
----------
input_file : os.PathLike
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more general question: I wonder if we should still document this, and similar cases, as "str or pathlib.Path", as users may not be familiar with os.PathLike. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly think os.PathLike is pretty self-documenting, but I could also see a user getting confused by not seeing the type they're expecting in the docstring.

I'm happy with either way

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to say if a user who doesn't know the type would guess that str is a path-like. Anyway, I won't object to leaving it as is, especially since it means that the type annotation and docstring are identical, I just wanted to bring up the possible friction here.

Path of input .pkl model to load.

output_file : os.PathLike
Path to save .skops model to.

"""
model_name = pathlib.Path(input_file).stem

logging.debug(f"Converting {model_name}")

with open(input_file, "rb") as f:
obj = pickle.load(f)
skops_dump = dumps(obj)

untrusted_types = get_untrusted_types(data=skops_dump)

if not untrusted_types:
logging.info(f"No unknown types found in {model_name}.")
else:
untrusted_str = ", ".join(untrusted_types)

logging.warning(
"Unknown Types Detected! "
f"While converting {model_name}, "
"the following unknown types were found: "
f"{untrusted_str}. "
f"When loading {output_file}, add 'trusted=True' to the skops.load call. "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of asking users to use trusted=True, we could also tell them the exact types to trust. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think from a users' perspective, it might be nicer to give them the easiest fix possible.

When they're converting this file, and want to not deal, the log line above would've told them all of the untrusted types anyway, so if they wanted to go that route, they could just add those.

We could add a line along the lines of:
..., or set each of the unknown types listed previously to "trusted".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this touches on a more general point of how we should deal with trusted=True. I posted on discord about it, which is a better place to discuss than here :)

)

with open(output_file, "wb") as out_file:
logging.debug(f"Writing to {output_file}")
out_file.write(skops_dump)


def format_parser(
parser: Optional[argparse.ArgumentParser] = None,
) -> argparse.ArgumentParser:
"""Adds arguments and help to parent CLI parser for the convert method."""

if not parser: # used in tests
parser = argparse.ArgumentParser()

parser_subgroup = parser.add_argument_group("convert")
parser_subgroup.add_argument("input", help="Path to an input file to convert. ")

parser_subgroup.add_argument(
"-o",
"--output-file",
help=(
"Specify the output file name for the converted skops file. "
"If not provided, will default to using the same name as the input file, "
"and saving to the current working directory with the suffix '.skops'."
),
default=None,
)
parser_subgroup.add_argument(
"-v",
"--verbose",
help=(
"Increases verbosity of logging. Can be used multiple times to increase "
"verbosity further."
),
action="count",
dest="loglevel",
default=0,
)
return parser


def main(
parsed_args: argparse.Namespace,
):
output_file = parsed_args.output_file
input_file = parsed_args.input

logging.basicConfig(format="%(message)s", level=get_log_level(parsed_args.loglevel))

if not output_file:
# No filename provided, defaulting to base file path
file_name = pathlib.Path(input_file).stem
output_file = pathlib.Path.cwd() / f"{file_name}.skops"

_convert_file(
input_file=input_file,
output_file=output_file,
)
15 changes: 15 additions & 0 deletions skops/cli/_utils.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
import logging


def get_log_level(level: int = 0):
"""Takes in verbosity from a CLI entrypoint (number of times -v specified),
and sets the logger to the required log level"""

all_levels = [logging.WARNING, logging.INFO, logging.DEBUG]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, warning level logs would always be shown, even if I don't add -v. IMO, this should not be the case. So I would prefer that without -v, the log level should be ERROR or CRITICAL.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the python docs:

The default level is WARNING, which means that only events of this level and above will be tracked, unless the logging package is configured to do otherwise.

I followed this because it's the default for python packages.

We can adjust the default level for our CLI, or we could also add a -q, --quiet option to reduce the log level

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is true for logging in general, but for CLIs, there is the convention to not print anything if there is no issue. Ideally, I would like to avoid having a --quiet option, because it increases complexity and because it's unclear how it interacts with -v.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point.

I would raise that it could be considered an issue if untrusted types were present.

If a user had a file with untrusted types in it, and we converted it silently, they wouldn't know they needed to specify anything as trusted when using it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this is more of an issue for the person loading the file, and they will know (unless they use trusted=True). Personally, I would not report it by default, but I can live with the current way.


if level > len(all_levels):
level = len(all_levels)
elif level < 0:
level = 0

return all_levels[level]
44 changes: 44 additions & 0 deletions skops/cli/entrypoint.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
import argparse

import skops.cli._convert


def main_cli(command_line_args=None):
"""
Main command line interface entrypoint for all command line Skops methods.

To add a new entrypoint:
1. Create a new method to call that accepts a namespace
2. Create a new subparser formatter to define the expected CL arguments
3. Add those to the function map.
"""
entry_parser = argparse.ArgumentParser(
prog="Skops",
description="Main entrypoint for all command line Skops methods.",
add_help=True,
)

subparsers = entry_parser.add_subparsers(
title="Commands",
description="Skops command to call",
dest="cmd",
help="Sub-commands help",
)

# function_map should map a command to
# method: the command to call
# format_parser: the function used to create a subparser for that command
function_map = {
"convert": {
"method": skops.cli._convert.main,
"format_parser": skops.cli._convert.format_parser,
},
}

for func_name, values in function_map.items():
subparser = subparsers.add_parser(func_name)
subparser.set_defaults(func=values["method"])
values["format_parser"](subparser)

args = entry_parser.parse_args(command_line_args)
args.func(args)
138 changes: 138 additions & 0 deletions skops/cli/tests/test_convert.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
import logging
import pathlib
import pickle as pkl
from unittest import mock

import numpy as np
import pytest

from skops.cli import _convert
from skops.io import load


class MockUnsafeType:
def __init__(self):
pass


class TestConvert:
model_name = "some_model_name"

@pytest.fixture
def safe_obj(self):
return np.ndarray([1, 2, 3, 4])

@pytest.fixture
def unsafe_obj(self):
return MockUnsafeType()

@pytest.fixture
def pkl_path(self, tmp_path):
return tmp_path / f"{self.model_name}.pkl"

@pytest.fixture
def skops_path(self, tmp_path):
return tmp_path / f"{self.model_name}.skops"

@pytest.fixture
def write_safe_file(self, pkl_path, safe_obj):
with open(pkl_path, "wb") as f:
pkl.dump(safe_obj, f)

@pytest.fixture
def write_unsafe_file(self, pkl_path, unsafe_obj):
with open(pkl_path, "wb") as f:
pkl.dump(unsafe_obj, f)

def test_base_case_works_as_expected(
self, pkl_path, tmp_path, skops_path, write_safe_file, safe_obj, caplog
):
_convert._convert_file(pkl_path, skops_path)
persisted_obj = load(skops_path)
assert np.array_equal(persisted_obj, safe_obj)
assert MockUnsafeType.__name__ not in caplog.text

def test_unsafe_case_works_as_expected(
self, pkl_path, tmp_path, skops_path, write_unsafe_file, caplog
):
caplog.set_level(logging.WARNING)
_convert._convert_file(pkl_path, skops_path)
persisted_obj = load(skops_path, trusted=True)

assert isinstance(persisted_obj, MockUnsafeType)

# check logging has warned that an unsafe type was found
assert MockUnsafeType.__name__ in caplog.text


class TestMain:
@staticmethod
def assert_called_correctly(
mock_convert: mock.MagicMock,
path,
output_file=None,
):
if not output_file:
output_file = pathlib.Path.cwd() / f"{pathlib.Path(path).stem}.skops"
mock_convert.assert_called_once_with(input_file=path, output_file=output_file)

@mock.patch("skops.cli._convert._convert_file")
def test_base_works_as_expected(self, mock_convert: mock.MagicMock):
path = "123.pkl"
namespace, _ = _convert.format_parser().parse_known_args([path])

_convert.main(namespace)
self.assert_called_correctly(mock_convert, path)

@mock.patch("skops.cli._convert._convert_file")
@pytest.mark.parametrize(
"input_path, output_file, expected_path",
[
("abc.123", "a/b/c", "a/b/c"),
("abc.123", None, pathlib.Path.cwd() / "abc.skops"),
],
ids=["Given an output path", "No output path"],
)
def test_with_output_dir_works_as_expected(
self, mock_convert: mock.MagicMock, input_path, output_file, expected_path
):
if output_file is not None:
args = [input_path, "--output", output_file]
else:
args = [input_path]

namespace, _ = _convert.format_parser().parse_known_args(args)

_convert.main(namespace)
self.assert_called_correctly(
mock_convert, path=input_path, output_file=expected_path
)

@mock.patch("skops.cli._convert._convert_file")
@pytest.mark.parametrize(
"verbosity, expected_level",
[
("", logging.WARNING),
("-v", logging.INFO),
("--verbose", logging.INFO),
("-vv", logging.DEBUG),
("-v -v", logging.DEBUG),
("-vvv", logging.DEBUG),
("--verbose --verbose", logging.DEBUG),
],
)
def test_given_log_levels_works_as_expected(
self, mock_convert: mock.MagicMock, verbosity, expected_level, caplog
):
input_path = "abc.def"
output_path = "bde.skops"
args = [input_path, "--output", output_path, verbosity.split()]

namespace, _ = _convert.format_parser().parse_known_args(args)

_convert.main(namespace)
self.assert_called_correctly(
mock_convert, path=input_path, output_file=output_path
)

assert caplog.at_level(expected_level)
Loading