Updating extensions from #52. Please check out examples there as well.
What is this and why do we need it
When we release Studio MR, we can provide a complementary CLI experience. This will be useful since many of our users are CLI users, and having a good CLI counterpart will help people adopt our Model Registry and ecosystem.
From the user perspective, the integration should give three features: `describe`, `discover`, and a way to `materialize` the artifact.
describe
This is a complement to the Model Details Card page. While working in the CLI, you may want to learn which enrichments exist for a specific artifact and get their details:
discover
The discover mechanics help you find potential artifacts that can be annotated and then registered/promoted. (We'll also show those in the Model Tab in MR at some point, suggesting to annotate them.)
$ gto discover --repo . --rev main
- models/nn.pkl (MLEM model, DVC PL output)
- data/features.csv (MLEM dataset, DVC PL input)
- data/raw.csv (DVC tracked)
- models/model-committed-to-repo.onnx (MLEM model)
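A listing like the one above could be assembled by merging what each extension reports for a path. The following is only a minimal sketch: `merge_findings`, `render_human`, and `render_json` are hypothetical names, and the `(path, label)` pair format is an assumption, not something GTO defines. The sample data is copied from the example output.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merge_findings(findings: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (path, label) pairs by path, keeping first-seen order."""
    merged: Dict[str, List[str]] = defaultdict(list)
    for path, label in findings:
        merged[path].append(label)
    return dict(merged)


def render_human(merged: Dict[str, List[str]]) -> str:
    """One '- path (label, label)' line per discovered artifact."""
    return "\n".join(
        f"- {path} ({', '.join(labels)})" for path, labels in merged.items()
    )


def render_json(merged: Dict[str, List[str]]) -> str:
    """Machine-readable variant; the real --json schema may differ."""
    return json.dumps(merged, indent=2)


# Pairs as two hypothetical extensions (MLEM and DVC) might report them.
findings = [
    ("models/nn.pkl", "MLEM model"),
    ("models/nn.pkl", "DVC PL output"),
    ("data/features.csv", "MLEM dataset"),
    ("data/features.csv", "DVC PL input"),
    ("data/raw.csv", "DVC tracked"),
    ("models/model-committed-to-repo.onnx", "MLEM model"),
]
merged = merge_findings(findings)
print(render_human(merged))
```

Keeping the merged mapping separate from the rendering makes it straightforward to offer both the human-readable and the `--json` form from the same data.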
Both examples are human-readable output. For `--json` it would be different. This is how it should look; currently the output is not that exciting.
materialize
To `materialize` the object means to copy it, e.g.:
For that to work, GTO needs some way to ask DVC/MLEM to download the binaries they track.
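For the simplest case, a file committed directly to the repo, materializing can be a plain copy. The sketch below covers only that case; `materialize_local` is a hypothetical helper, and anything tracked by DVC or MLEM would still need those tools to fetch the binaries first.

```python
import shutil
from pathlib import Path
from typing import Union


def materialize_local(src: Union[str, Path], dst: Union[str, Path]) -> Path:
    """Copy a file that already exists in the working tree to dst.

    Hypothetical fallback: it does not know how to download
    DVC- or MLEM-tracked binaries.
    """
    src, dst = Path(src), Path(dst)
    if not src.is_file():
        raise FileNotFoundError(f"{src} is not available locally")
    # Create the target directory if needed, then copy with metadata.
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst
```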
Check this PR works
`pip install gto` from it, then run `$ gto describe gto-mlem-in-mlem-dir` and `$ gto describe gto-mlem-in-mlem-dir --json`. You can try to call `describe` on other models that are shown by `gto show`.
Current implementation
This implementation utilizes entrypoints magic: https://github.com/iterative/gto/blob/feature/extensions/setup.py#L61
@mike0sv can help understand it better.
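For readers unfamiliar with entry points, here is a rough sketch of how a host package can find plugins that other installed packages advertise. It uses only the standard library's `importlib.metadata`; the group name `gto.enrichment` and the function `load_enrichments` are assumptions for illustration, so check the linked `setup.py` for the actual group.

```python
from importlib.metadata import entry_points
from typing import Any, Dict

# Hypothetical group name, used here only for illustration.
ENRICHMENT_GROUP = "gto.enrichment"


def load_enrichments(group: str = ENRICHMENT_GROUP) -> Dict[str, Any]:
    """Return {entry point name: loaded object} for every plugin in the group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports filtering by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a mapping from group name to entry points.
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


# With no plugin packages installed, this is simply an empty dict.
print(load_enrichments())
```

A package such as MLEM or DVC would register its class under that group in its own packaging metadata, and the host would pick it up at runtime without importing the package by name.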
Currently only `describe` is implemented. We should start with it and draft some implementation first.
The idea is to move `gto/ext_mlem.py` to the `mlem.gto` module, and `gto/ext_dvc.py` to the `dvc.gto` module once everything is developed (it's just easier to develop things initially in a single PR). MLEM and DVC don't have to depend on GTO; they only need to have the abstract classes' implementation copied inside them.
There is also a CLI implementation, for the case where DVC was installed as a binary and GTO needs to communicate via the CLI. It's commented out now since it doesn't work, but you can check out #52 for a working example.
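To make the "copied abstract classes" idea concrete, here is a minimal sketch of what such an interface could look like. All names (`Enrichment`, `EnrichmentInfo`, `SuffixEnrichment`) and the `describe` signature are assumptions for illustration and may not match the code in this PR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class EnrichmentInfo:
    """What an extension knows about one artifact (hypothetical shape)."""
    source: str
    payload: Dict[str, Any] = field(default_factory=dict)


class Enrichment(ABC):
    """Interface a tool could copy in, so it need not depend on GTO."""

    @abstractmethod
    def describe(
        self, repo: str, obj: str, rev: Optional[str] = None
    ) -> Optional[EnrichmentInfo]:
        """Return details for obj, or None if this tool does not know it."""


class SuffixEnrichment(Enrichment):
    """Toy implementation: recognises artifacts by file suffix only."""

    def describe(self, repo, obj, rev=None):
        if obj.endswith(".onnx"):
            return EnrichmentInfo(source="toy", payload={"format": "onnx"})
        return None


info = SuffixEnrichment().describe(".", "models/model-committed-to-repo.onnx")
print(info)
```

Returning `None` for unknown artifacts lets the caller query every registered extension and keep only the ones that recognise the path.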
closes #46