Enable MLEM and DVC extensions #169

Closed
wants to merge 7 commits into from
@aguschin aguschin commented Jun 2, 2022

Updating extensions from #52. Please check out examples there as well.

What this is and why we need it

When we release Studio MR, we can provide a complementary CLI experience. This will be useful because many of our users work in the CLI, and a good CLI counterpart will help people adopt our Model Registry and ecosystem.

From the user's perspective, the integration should provide three features: describe, discover, and a way to materialize the artifact.

describe

This complements the Model Details Card page. While working in the CLI, you may want to learn which enrichments exist for a specific artifact and get their details:

$ gto describe --repo . --rev main mymodel
GTO: Type "model". Path "some/path". Description: "Awesome one"
DVC: Size 100mb. Stored in remote "myremote" ("s3://mybucket/path").
MLEM: Sklearn model. Methods: "predict", "predict_proba". Input data: dataframe. Output: list.

discover

The discover mechanics help you find potential artifacts that can be annotated and then registered/promoted. (We'll also show those in the Model Tab in MR at some point, suggesting that users annotate them.)

$ gto discover --repo . --rev main
- models/nn.pkl (MLEM model, DVC PL output)
- data/features.csv (MLEM dataset, DVC PL input)
- data/raw.csv (DVC tracked)
- models/model-committed-to-repo.onnx (MLEM model)

Both examples show the intended human-readable output; with --json the output would be structured differently. This is how it should look eventually. Currently the output is not that exciting.
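To illustrate the two output modes, here is a minimal sketch of how one set of enrichment records could be rendered either as human-readable lines like the ones above or as JSON. `EnrichmentRecord` and `render` are hypothetical names used only for illustration, not GTO's actual API, and the JSON shape is an assumption.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class EnrichmentRecord:
    # Hypothetical container for illustration; not part of GTO.
    source: str    # tool that produced the record, e.g. "GTO", "DVC", "MLEM"
    summary: str   # one-line human-readable description
    details: dict  # structured payload that a --json mode could emit


def render(records: List[EnrichmentRecord], as_json: bool = False) -> str:
    """Render the same records as plain text lines or as a JSON document."""
    if as_json:
        # Assumed shape: one key per source, holding its structured details.
        return json.dumps({r.source: r.details for r in records}, indent=2)
    return "\n".join(f"{r.source}: {r.summary}" for r in records)


records = [
    EnrichmentRecord(
        "GTO",
        'Type "model". Path "some/path".',
        {"type": "model", "path": "some/path"},
    ),
    EnrichmentRecord("DVC", "Size 100mb.", {"size": "100mb"}),
]

print(render(records))
print(render(records, as_json=True))
```

Keeping the summary and the structured details side by side lets both modes be produced from one source of truth.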

materialize

To materialize the object means to copy it, e.g.:

$ gto get -r $REPO myartifact --version v1 --output myartifact-v1
$ gto get -r $REPO myartifact --env prod --output myartifact-prod

For that to work, GTO needs some way to ask DVC/MLEM to download the binaries they track.

Check that this PR works

  1. Set up a venv with this PR: clone the repo, check out the branch, and pip install gto from it.
  2. Clone this example repo: https://github.com/aguschin/fixture-model-registry
  3. Call $ gto describe gto-mlem-in-mlem-dir and $ gto describe gto-mlem-in-mlem-dir --json. You can also try calling describe on other models that are shown by gto show.

Current implementation

This implementation relies on entry points magic: https://github.com/iterative/gto/blob/feature/extensions/setup.py#L61
@mike0sv can help understand it better.

Currently only describe is implemented. We should start with it and draft some implementation first.

The idea is to move gto/ext_mlem.py to the mlem.gto module and gto/ext_dvc.py to the dvc.gto module once everything is developed (it's just easier to develop things initially in a single PR). MLEM and DVC don't have to depend on GTO; they only need a copy of the abstract classes implemented inside them.
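The mechanism described above can be sketched roughly as follows: an abstract interface that each extension implements, plus discovery through Python entry points. This is a sketch under stated assumptions, not the code in this PR. The group name `gto.enrichment`, the `Enrichment` interface, its `describe` signature, and `FakeDvcEnrichment` are all hypothetical; check setup.py on the branch for the real registration.

```python
from abc import ABC, abstractmethod
from importlib.metadata import entry_points
from typing import Dict, Optional

# Assumed entry-point group name, used here only for illustration.
ENTRYPOINT_GROUP = "gto.enrichment"


class Enrichment(ABC):
    """Hypothetical interface; a copy could live in mlem.gto or dvc.gto."""

    @abstractmethod
    def describe(self, repo: str, obj: str, rev: Optional[str]) -> Optional[dict]:
        """Return details about obj, or None if this tool does not know it."""


def load_enrichments(group: str = ENTRYPOINT_GROUP) -> Dict[str, Enrichment]:
    """Instantiate every class registered under the given entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+ selectable interface
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a plain dict of group -> entry points
        selected = eps.get(group, [])
    return {ep.name: ep.load()() for ep in selected}


def describe_all(enrichments: Dict[str, Enrichment], repo: str, obj: str,
                 rev: Optional[str] = None) -> Dict[str, dict]:
    """Ask each extension about obj and keep only the answers that exist."""
    found = {}
    for name, enrichment in enrichments.items():
        info = enrichment.describe(repo, obj, rev)
        if info is not None:
            found[name] = info
    return found


class FakeDvcEnrichment(Enrichment):
    """Stand-in extension with canned data, so the sketch runs without DVC."""

    def describe(self, repo, obj, rev):
        return {"size": "100mb"} if obj == "models/nn.pkl" else None


print(describe_all({"dvc": FakeDvcEnrichment()}, ".", "models/nn.pkl"))
```

Because discovery goes through entry points, GTO never imports MLEM or DVC directly; an extension becomes visible as soon as the package that registers it is installed.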

There is also a CLI implementation, for the case where DVC is installed as a binary and GTO needs to communicate with it via the CLI. It's commented out for now since it doesn't work, but you can check out #52 for a working example.

closes #46

@aguschin aguschin self-assigned this Jun 2, 2022
@aguschin aguschin changed the title from "Enable MLEM and GTO extensions" to "Enable MLEM and DVC extensions" Jun 3, 2022
@aguschin aguschin closed this May 19, 2023
@skshetry skshetry deleted the feature/extensions branch March 26, 2024 09:52
Development

Successfully merging this pull request may close these issues.

Implement interface for enrichments