gto show --repo accepts remote repositories #269

francesco086 · 2022-09-16T13:40:40Z

Contributes to #25

…2. a path to a local repo 3. a url to remote repo

…f_repo_is_remote

…lity

Enable show to work on a remote repo

codecov-commenter · 2022-09-16T14:05:44Z

Codecov Report

Base: 81.88% // Head: 82.72% // Increases project coverage by +0.83% 🎉

Coverage data is based on head (0293e67) compared to base (fb35551).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #269      +/-   ##
==========================================
+ Coverage   81.88%   82.72%   +0.83%     
==========================================
  Files          16       17       +1     
  Lines        1905     1945      +40     
==========================================
+ Hits         1560     1609      +49     
+ Misses        345      336       -9

Impacted Files	Coverage Δ
gto/api.py	`91.12% <100.00%> (+0.14%)`	⬆️
gto/cli.py	`70.43% <100.00%> (ø)`
gto/constants.py	`100.00% <100.00%> (ø)`
gto/git_utils.py	`100.00% <100.00%> (ø)`
gto/base.py	`87.25% <0.00%> (+1.41%)`	⬆️
gto/tag.py	`86.39% <0.00%> (+2.72%)`	⬆️


francesco086 · 2022-09-16T14:35:12Z

There are issues with windows not managing temporary files / directories properly (see e.g. this).
However, with more recent python versions, the issue seems to have been solved.

I found this discussion.

I tried to find a workaround but was not able to, also because I have no windows and for each attempt I need to push and wait for CI results.

What's our next best action? Any developer with windows who can help? Or shall we just exclude windows people with python < 3.9 ?

aguschin · 2022-09-19T11:46:18Z

Good stuff @francesco086 ! Confirm it works on my Mac!

Re windows tests, let's not implement it. If someone on windows tries running it and it fails, they can bring this to issues and we will consider implementing this. About tests, let's skip the test execution. Something like this should work:

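# conftest.py
import os
from typing import Sequence, Tuple

import pytest


def skip_matrix(skip: Sequence[Tuple[str, str]]):
    # Both variables are expected to be exported by the CI workflow.
    current_os = os.environ.get("GITHUB_MATRIX_OS")
    current_python = os.environ.get("GITHUB_MATRIX_PYTHON")
    if (current_os, current_python) in skip:
        return pytest.mark.skip(
            reason=f"This test is skipped for {current_os} and python:{current_python}"
        )
    # Outside the listed matrix entries, leave the test untouched.
    return lambda f: f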

# test_file_with_failing_test.py

skip_remote_tempdir_matrix = skip_matrix(
    [("windows-latest", "3.7"), ("windows-latest", "3.8")]
)

@skip_remote_tempdir_matrix
def failing_test():
    ...

If you need a working example, please see MLEM (but note that in MLEM we wanted to run a test in a single CI job - py3.7 on ubuntu, and thus the "skip wrapper" is different).

Could we also make this work? It works if you add .git in the end, but would be cool if it worked even without it :)

gto show -r https://github.com/iterative/example-gto

aguschin · 2022-09-19T16:31:55Z

Cool! I see you almost fixed everything. I will review the code tomorrow (right now it's the late evening for me). Meanwhile, do you want to add support for other read-only CLI commands in this PR? Like for check-ref, history and stages?

francesco086 · 2022-09-19T16:49:26Z

Hi @aguschin !

I went crazy with the windows thingy... I wanted to add a simple message in the error: "Are you using windows with python < 3.9? This may be the reason of this error: https://bugs.python.org/issue42796. Consider upgrading python."
To do this, long story short, I was driven crazy. In windows with python 3.8, whereas an error was indeed raised, the pytest assert raise did not recognize it. No idea why! The fact that each time I had to push to github and wait for CI made this trivial thing really lengthy. Eventually I gave up, and now test that this extra message is sent only with windows and python 3.7.

Regarding the skip, I preferred the pytest.mark.skipif over your custom skip_matrix.
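
For reference, a minimal sketch of the kind of predicate a `skipif` marker could rely on. The names `is_os_windows_and_py_lt_3_9` and `skip_for_windows_py_lt_3_9` appear in this PR's tests/skip_presets.py, but the body below is an assumption, and the `env` parameter is a hypothetical addition that makes the check testable without touching the real environment:

```python
import os
from typing import Mapping


def is_os_windows_and_py_lt_3_9(env: Mapping[str, str] = os.environ) -> bool:
    """Return True for the CI matrix entries affected by the tempdir issue.

    Assumes the workflow exports GITHUB_MATRIX_OS and GITHUB_MATRIX_PYTHON;
    both default to "" so the check is simply False outside CI.
    """
    current_os = env.get("GITHUB_MATRIX_OS", "")
    current_python = env.get("GITHUB_MATRIX_PYTHON", "")
    return current_os == "windows-latest" and current_python in ("3.7", "3.8")


# In a test module the predicate would feed pytest's built-in marker, e.g.:
#   skip_for_windows_py_lt_3_9 = pytest.mark.skipif(
#       is_os_windows_and_py_lt_3_9(),
#       reason="temporary directory cleanup fails on windows with python < 3.9",
#   )
```

Compared with a custom wrapper, `pytest.mark.skipif` keeps the condition and the reason in one place and needs no helper decorator.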

Besides that, I added the support for url without the .git at the end (had only to change the regex).
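
The actual regex is not shown in this thread, so the following is only an illustrative sketch of how a trailing `.git` can be made optional. The pattern and the constant name `REMOTE_GIT_REPO_REGEX` are hypothetical; `is_url_to_remote_repo` is the helper name used in the PR, with an assumed body:

```python
import re

# Hypothetical pattern, not the one shipped in gto: an http(s) or git@ remote,
# followed by a path, with ".git" and a trailing slash both optional.
REMOTE_GIT_REPO_REGEX = re.compile(r"(https?://|git@)[\w.@:/~-]+?(\.git)?/?")


def is_url_to_remote_repo(repo: str) -> bool:
    # fullmatch() requires the whole string to fit the pattern,
    # so local paths such as "." or "/home/user/repo" are rejected.
    return REMOTE_GIT_REPO_REGEX.fullmatch(repo) is not None
```

With a pattern of this shape, `https://github.com/iterative/example-gto` and `https://github.com/iterative/example-gto.git` are both treated as remote, while plain filesystem paths are not.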

Waiting for your code review. I took a lot of liberty in how I wrote and structured the code. Probably too much, it looks too inconsistent with the pre-existing code. Well... for tomorrow!

I would suggest to first merge this one, and then extend to other commands? If you don't mind, I find it easier this way. Let's first get this chunk done.

gto/git_utils.py

aguschin · 2022-09-20T07:44:21Z

gto/git_utils.py

+def git_clone_if_repo_is_remote(f: Callable):
+    @wraps(f)
+    def wrapped_f(*args, **kwargs):
+        kwargs = _turn_args_into_kwargs(args, kwargs)
+
+        # noinspection PyTypeChecker
+        if isinstance(kwargs["repo"], str) and is_url_to_remote_repo(
+            repo=kwargs["repo"]
+        ):
+            try:
+                with TemporaryDirectory() as tmp_dir:
+                    logging.debug("create temporary directory %s", tmp_dir)
+                    # noinspection PyTypeChecker
+                    git_clone(repo=kwargs["repo"], dir=tmp_dir)
+                    kwargs["repo"] = tmp_dir
+                    result = f(**kwargs)
+            except (NotADirectoryError, PermissionError) as e:
+                raise e.__class__(
+                    "Are you using windows with python < 3.9? "
+                    "This may be the reason of this error: https://bugs.python.org/issue42796. "
+                    "Consider upgrading python."
+                ) from e
+            logging.debug("temporary directory %s has been deleted", tmp_dir)
+        else:
+            result = f(**kwargs)
+
+        return result
+
+    def _turn_args_into_kwargs(
+        args: tuple, kwargs: Dict[str, object]
+    ) -> Dict[str, object]:
+        kwargs_complement = {
+            k: args[i]
+            for i, k in enumerate(inspect.getfullargspec(f).args)
+            if k not in kwargs.keys() and i < len(args)
+        }
+        kwargs.update(kwargs_complement)
+        return kwargs
+
+    return wrapped_f


This should work since repo is always the first arg to all CLI functions, and is more concise:

Suggested change

def git_clone_if_repo_is_remote(f: Callable):
    @wraps(f)
    def wrapped_f(repo, *args, **kwargs):
        if isinstance(repo, str) and is_url_to_remote_repo(repo=repo):
            try:
                with TemporaryDirectory() as tmp_dir:
                    logging.debug("create temporary directory %s", tmp_dir)
                    git_clone(repo=repo, dir=tmp_dir)
                    return f(tmp_dir, *args, **kwargs)
            except (NotADirectoryError, PermissionError) as e:
                raise e.__class__(
                    "Are you using windows with python < 3.9? "
                    "This may be the reason of this error: https://bugs.python.org/issue42796. "
                    "Consider upgrading python."
                ) from e
            logging.debug("temporary directory %s has been deleted", tmp_dir)
        else:
            return f(tmp_dir, args, **kwargs)

    return wrapped_f

Mmm... I wrote the tests to make sure that the decorator works even if repo is not the first argument. I find it very much prone to problems if you create a decorator with such assumption.

So, currently your code suggestion breaks the tests even after fixing line 30 to return f(repo, *args, **kwargs).

Are you sure you want to make the decorator less general? How would you suggest to make clear that the repo argument must be the first argument in the decorated function?

Ok, I see. Let's keep it as is then!
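
As a side note on the position-independent approach discussed in this thread, here is a hedged sketch of one way to locate `repo` by name with the standard library's `inspect.signature(...).bind`, instead of a hand-written args-to-kwargs helper. It is not the code merged in this PR; `replace_repo_argument`, `fake_clone` and `show` are hypothetical names used only for illustration, and `fake_clone` stands in for the real clone-to-temporary-directory step:

```python
import inspect
from functools import wraps
from typing import Callable


def replace_repo_argument(transform: Callable[[str], str]):
    """Hypothetical decorator factory: rewrite the `repo` argument by name."""

    def decorator(f: Callable):
        signature = inspect.signature(f)

        @wraps(f)
        def wrapped_f(*args, **kwargs):
            # bind() maps positional and keyword arguments to parameter names,
            # so `repo` is found wherever it sits in the signature.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            repo = bound.arguments["repo"]
            if isinstance(repo, str):
                bound.arguments["repo"] = transform(repo)
            return f(*bound.args, **bound.kwargs)

        return wrapped_f

    return decorator


def fake_clone(repo: str) -> str:
    # Stand-in for cloning: only "remote" looking values are rewritten.
    return "cloned:" + repo if repo.startswith("https://") else repo


@replace_repo_argument(fake_clone)
def show(name: str, repo: str = "."):
    # `repo` is deliberately not the first parameter here.
    return name, repo
```

Calling `show("model", "https://example.invalid/r")` and `show(repo="https://example.invalid/r", name="model")` both reach the wrapped function with the rewritten `repo`, which is the property the tests in this PR were written to protect.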

aguschin

Looks good overall! Please see some comments and suggestions I had @francesco086

gto/git_utils.py

tests/api/data/__init__.py

tests/git_utils/data/__init__.py

aguschin · 2022-09-20T07:59:48Z

tests/git_utils/data/__init__.py

@@ -0,0 +1,6 @@
+def get_example_http_remote_repo() -> str:
+    return "https://github.com/iterative/example-gto.git"


I wonder why some put these in conftest.py https://docs.pytest.org/en/7.1.x/reference/fixtures.html#conftest-py-sharing-fixtures-across-multiple-files . @mike0sv, I've seen you put constants in conftest.py rather than in __init__.py. Is there any difference?

tests/git_utils/test_git_clone_if_repo_is_remote.py

tests/git_utils/test_is_url_to_remote_repo.py

aguschin · 2022-09-20T08:07:02Z

tests/skip_presets.py

+    is_os_windows_and_py_lt_3_9,
+)
+
+skip_for_windows_py_lt_3_9 = pytest.mark.skipif(


The same q as above, does it make sense to keep these in conftest.py? Except for introducing one more file. @mike0sv ?

tests/utils.py

aguschin · 2022-09-20T11:36:58Z

gto/git_utils.py

+@contextmanager
+def cloned_git_repo(repo: str):
+    tmp_dir = TemporaryDirectory()
+    logging.debug("create temporary directory %s", tmp_dir)
+    git_clone(repo=repo, dir=tmp_dir.name)
+    yield tmp_dir.name
+    logging.debug("delete temporary directory %s", tmp_dir)
+    tmp_dir.cleanup()
+
+
+def git_clone(repo: str, dir: str) -> None:
+    logging.debug("clone %s in directory %s", repo, dir)
+    Repo.clone_from(url=repo, to_path=dir)


Let's merge these functions and test a single one only? Having git_clone as a separate function now seems redundant. You can easily test cloned_git_repo without additionally generating a temporary directory in tests.

Can gladly do it for the tests (actually it connects with something I wanted to discuss about public/private functions -> will add it below), but for the functions I would suggest to leave them as they are, again to leave the door open for a different engine. It doesn't do any harm and makes the separation clear: this is the only thing you need to change if you want to use a different git engine.

Thanks, I checked out the second PR you created.

My opinion:

We should keep tests. We can remove them anytime later. Keeping the functions public is also OK for me, this is in the fashion of the rest of GTO codebase I think.

We can keep these functions separate, doesn't matter for now I guess.

I think the PR is ready, no need to overthink this. We can fix all of that when it is needed. Let's merge this and move along. WDYT?

fine with me! :) let's do that
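
A hedged sketch of the separation argued for in this thread: the context manager owns the temporary directory, while the clone step is a swappable function. This is not the merged code. The `clone` parameter and `fake_clone` are hypothetical, and the `try`/`finally` is an addition so that cleanup also runs when the body of the `with` block raises:

```python
import logging
import os
from contextlib import contextmanager
from tempfile import TemporaryDirectory
from typing import Callable, Iterator


@contextmanager
def cloned_git_repo(repo: str, clone: Callable[[str, str], None]) -> Iterator[str]:
    # `clone` is a hypothetical injection point standing in for git_clone.
    tmp_dir = TemporaryDirectory()
    logging.debug("create temporary directory %s", tmp_dir.name)
    try:
        clone(repo, tmp_dir.name)
        yield tmp_dir.name
    finally:
        # Runs even if cloning or the caller's code fails.
        logging.debug("delete temporary directory %s", tmp_dir.name)
        tmp_dir.cleanup()


def fake_clone(repo: str, directory: str) -> None:
    # Stand-in engine for the example: writes a marker file instead of cloning.
    with open(os.path.join(directory, "ORIGIN"), "w") as marker:
        marker.write(repo)
```

Swapping the git engine then only means passing a different `clone` callable; the temporary directory handling stays untouched.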

francesco086 · 2022-09-20T13:25:09Z

@aguschin One thing I would like to discuss is that in the module git_utils there are now some functions that should probably be private. Currently the only public one (meaning, used outside the module itself) is git_clone_remote_repo (the decorator). All others can be turned into private. What do you think?
Maybe one could also argue that cloned_git_repo could be public too (perhaps needed in future?).

If you agree, then tests for private functions are probably to be removed to avoid having too rigid tests that could potentially make further development hard.

Function I would like to turn into private and remove the tests of: git_clone and is_url_of_remote_repo.

EDIT: Here the PR that shows the changes I would like to do: francesco086#2

Let me know :)

aguschin

Two last code suggestions)

gto/git_utils.py

Co-authored-by: Alexander Guschin <1aguschin@gmail.com>

francesco086 · 2022-09-21T13:20:19Z

@aguschin that should be it... :)

aguschin · 2022-09-21T14:55:20Z

Awesome! Thanks for the huge work done & useful feature implemented!

francesco086 · 2022-09-21T15:00:08Z

Awesome!! Thank you @aguschin for the patience... aligning on code style is definitely time and energy consuming. But now next features will come very quickly!

Francesco Calcavecchia and others added 11 commits September 13, 2022 19:48

git_utils module that allows to get a git.Repo object from 1. a Repo …

d984781

…2. a path to a local repo 3. a url to remote repo

delete function convert_to_git_repo in favor of decorator git_clone_i…

ddfe434

…f_repo_is_remote

improve detection of remote git url and enable cloning via ssh in tests

e533d2d

modify show to handle remote repos

feea54e

reformat sample remote repo expected registry json for better readibi…

2d79391

…lity

refactor test structure

b5e9810

blackify code

b188057

update git show command --repo help text

2297ac5

Merge pull request #1 from francesco086/show-on-remote-repo

4dd89ed

Enable show to work on a remote repo

add empty line at the end of json file

76aebe2

remove tests requiring ssh key

4686711

francesco086 force-pushed the main branch from 2009087 to 4686711 Compare September 16, 2022 14:30

francesco086 force-pushed the main branch from f026114 to af489ef Compare September 19, 2022 13:03

introduce and use @skip_for_windows_py_lt_3_9

74e0493

francesco086 force-pushed the main branch from af489ef to 74e0493 Compare September 19, 2022 13:23

Francesco Calcavecchia added 3 commits September 19, 2022 16:20

add info for windows users in case gto show fails

7af727a

add info for windows users in case gto show fails

3d2d970

support url without .git at the end

fcb55d0

francesco086 force-pushed the main branch 3 times, most recently from bc52e54 to c795876 Compare September 19, 2022 16:16

fix win test

6415782

francesco086 force-pushed the main branch from c795876 to 6415782 Compare September 19, 2022 16:21

fix win test

8dfa922

francesco086 force-pushed the main branch from ca1630f to 8dfa922 Compare September 19, 2022 16:37

aguschin reviewed Sep 20, 2022

aguschin reviewed Sep 20, 2022

aguschin suggested changes Sep 20, 2022

Francesco Calcavecchia added 3 commits September 20, 2022 10:25

move remote git regex in the constants module

c1dbcaa

remove todo

ec0c7eb

add more path examples to test in test_if_local_url_then_true

0293e67

aguschin reviewed Sep 20, 2022

Francesco Calcavecchia added 4 commits September 20, 2022 11:23

prefer constants over get functions in data module in tests

3f9cd51

reduce complexity of _turn_args_into_kwargs

fe29fa0

default os env reading of GITHUB_MATRIX_PYTHON to "" instead of "2"

b2475e7

introduce cloned_git_repo

f548a3c

aguschin reviewed Sep 20, 2022

aguschin reviewed Sep 20, 2022

refactored tests

8464ec0

aguschin approved these changes Sep 21, 2022

Apply suggestions from code review

35eeb93

Co-authored-by: Alexander Guschin <1aguschin@gmail.com>

aguschin merged commit 61bf0ea into iterative:main Sep 21, 2022

aguschin assigned francesco086 Sep 21, 2022
