Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gto show --repo accepts remote repositories #269

Merged
merged 26 commits into from
Sep 21, 2022
Merged

Conversation

francesco086
Copy link
Contributor

Contributes to #25

@codecov-commenter
Copy link

codecov-commenter commented Sep 16, 2022

Codecov Report

Base: 81.88% // Head: 82.72% // Increases project coverage by +0.83% 🎉

Coverage data is based on head (0293e67) compared to base (fb35551).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #269      +/-   ##
==========================================
+ Coverage   81.88%   82.72%   +0.83%     
==========================================
  Files          16       17       +1     
  Lines        1905     1945      +40     
==========================================
+ Hits         1560     1609      +49     
+ Misses        345      336       -9     
Impacted Files Coverage Δ
gto/api.py 91.12% <100.00%> (+0.14%) ⬆️
gto/cli.py 70.43% <100.00%> (ø)
gto/constants.py 100.00% <100.00%> (ø)
gto/git_utils.py 100.00% <100.00%> (ø)
gto/base.py 87.25% <0.00%> (+1.41%) ⬆️
gto/tag.py 86.39% <0.00%> (+2.72%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@francesco086
Copy link
Contributor Author

There are issues with windows not managing temporary files / directories properly (see e.g. this.
However, with more recent python versions, the issue seems to have been solved.

I found this discussion.

I tried to find a workaround but was not able to, also because I have no windows and for each attempt I need to push and wait for CI results.

What's our next best action? Any developer with windows who can help? Or shall we just exclude windows people with python < 3.9 ?

@aguschin
Copy link
Contributor

Good stuff @francesco086 ! Confirm it works on my Mac!

Re windows tests, let's not implement it. If someone on windows will try running it and it will fail, he will bring this to issues and we will consider implementing this. About tests, let's skip the test execution. Something like this should work:

# conftest.py

def skip_matrix(skip: Sequence[Tuple[str, str]]):
    current_os = os.environ.get("GITHUB_MATRIX_OS")
    current_python = os.environ.get("GITHUB_MATRIX_PYTHON")
    if (current_os, current_python) in skip:
        return pytest.mark.skip(
            reason=f"This test is only for {os_} and python:{python}"
        )
    return lambda f: f

# test_file_with_failing_test.py

skip_remote_tempdir_matrix = skip_matrix(
    [("windows-latest", "3.7"), ("windows-latest", "3.8")]
)

@skip_remote_tempdir_matrix
def failing_test():
    ...

If you need a working example, please see MLEM (but note, that in MLEM we wanted to run a test in a single CI job - py3.7 on ubuntu, and thus it's the "skip wrapper" is different).

Could we also make this work? It works if you add .git in the end, but would be cool if it worked even without it :)

gto show -r https://github.com/iterative/example-gto

@francesco086 francesco086 force-pushed the main branch 3 times, most recently from bc52e54 to c795876 Compare September 19, 2022 16:16
@aguschin
Copy link
Contributor

Cool! I see you almost fixed everything. I will review the code tomorrow (right now it's the late evening for me). Meanwhile, do you want to add support for other read-only CLI commands in this PR? Like for check-ref, history and stages?

@francesco086
Copy link
Contributor Author

Hi @aguschin !

I went crazy with the windows thingy... I wanted to add a simple message in the error: "Are you using windows with python < 3.9? This may be the reason of this error: https://bugs.python.org/issue42796. Consider upgrading python."
To do this, long story short, I was driven crazy. In windows with python 3.8, whereas an error was indeed raised, the pytest assert raise did not recognize it. No idea why! The fact that each time I had to push to github and wait for CI made this trivial thing really lengthy. Eventually I resorted to give up, and test that this extra message is sent only with windows and python 3.7.

Regarding the skip, I preferred the pytest.mark.skipif over your custom skip_matrix.

Besides that, I added the support for url without the .git at the end (had only to change the regex).

Waiting for your code review. I took a lot of liberty in how I wrote and structured the code. Probably too much, it looks too inconsistent with the pre-existing code. Well... for tomorrow!

I would suggest to first merge this one, and then extend to other commands? If you don't mind, I find it easier this way. Let's first get this chunk done.

gto/git_utils.py Outdated
Comment on lines 13 to 52
def git_clone_if_repo_is_remote(f: Callable):
@wraps(f)
def wrapped_f(*args, **kwargs):
kwargs = _turn_args_into_kwargs(args, kwargs)

# noinspection PyTypeChecker
if isinstance(kwargs["repo"], str) and is_url_to_remote_repo(
repo=kwargs["repo"]
):
try:
with TemporaryDirectory() as tmp_dir:
logging.debug("create temporary directory %s", tmp_dir)
# noinspection PyTypeChecker
git_clone(repo=kwargs["repo"], dir=tmp_dir)
kwargs["repo"] = tmp_dir
result = f(**kwargs)
except (NotADirectoryError, PermissionError) as e:
raise e.__class__(
"Are you using windows with python < 3.9? "
"This may be the reason of this error: https://bugs.python.org/issue42796. "
"Consider upgrading python."
) from e
logging.debug("temporary directory %s has been deleted", tmp_dir)
else:
result = f(**kwargs)

return result

def _turn_args_into_kwargs(
args: tuple, kwargs: Dict[str, object]
) -> Dict[str, object]:
kwargs_complement = {
k: args[i]
for i, k in enumerate(inspect.getfullargspec(f).args)
if k not in kwargs.keys() and i < len(args)
}
kwargs.update(kwargs_complement)
return kwargs

return wrapped_f
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work since repo is always the first arg to all CLI functions, and is more concise:

Suggested change
def git_clone_if_repo_is_remote(f: Callable):
@wraps(f)
def wrapped_f(*args, **kwargs):
kwargs = _turn_args_into_kwargs(args, kwargs)
# noinspection PyTypeChecker
if isinstance(kwargs["repo"], str) and is_url_to_remote_repo(
repo=kwargs["repo"]
):
try:
with TemporaryDirectory() as tmp_dir:
logging.debug("create temporary directory %s", tmp_dir)
# noinspection PyTypeChecker
git_clone(repo=kwargs["repo"], dir=tmp_dir)
kwargs["repo"] = tmp_dir
result = f(**kwargs)
except (NotADirectoryError, PermissionError) as e:
raise e.__class__(
"Are you using windows with python < 3.9? "
"This may be the reason of this error: https://bugs.python.org/issue42796. "
"Consider upgrading python."
) from e
logging.debug("temporary directory %s has been deleted", tmp_dir)
else:
result = f(**kwargs)
return result
def _turn_args_into_kwargs(
args: tuple, kwargs: Dict[str, object]
) -> Dict[str, object]:
kwargs_complement = {
k: args[i]
for i, k in enumerate(inspect.getfullargspec(f).args)
if k not in kwargs.keys() and i < len(args)
}
kwargs.update(kwargs_complement)
return kwargs
return wrapped_f
def git_clone_if_repo_is_remote(f: Callable):
@wraps(f)
def wrapped_f(repo, *args, **kwargs):
if isinstance(repo, str) and is_url_to_remote_repo(repo=repo):
try:
with TemporaryDirectory() as tmp_dir:
logging.debug("create temporary directory %s", tmp_dir)
git_clone(repo=repo, dir=tmp_dir)
return f(tmp_dir, *args, **kwargs)
except (NotADirectoryError, PermissionError) as e:
raise e.__class__(
"Are you using windows with python < 3.9? "
"This may be the reason of this error: https://bugs.python.org/issue42796. "
"Consider upgrading python."
) from e
logging.debug("temporary directory %s has been deleted", tmp_dir)
else:
return f(tmp_dir, args, **kwargs)
return wrapped_f

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm... I wrote the tests to make sure that the decorator works even if repo is not the first argument. I find it very much prone to problems if you create a decorator with such assumption.

So, currently your code suggestion breaks the tests even after fixing line 30 to return f(repo, *args, **kwargs).

Are you sure you want to make the decorator less general? How would you suggest to make clear that the repo argument must be the first argument in the decorated function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see. Let's keep it this then!

Copy link
Contributor

@aguschin aguschin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall! Please see some comments and suggestions I had @francesco086

@@ -0,0 +1,6 @@
def get_example_http_remote_repo() -> str:
return "https://github.com/iterative/example-gto.git"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why some put these to conftest.py https://docs.pytest.org/en/7.1.x/reference/fixtures.html#conftest-py-sharing-fixtures-across-multiple-files . @mike0sv, I've seen you put constants in conftest.py rather than in __init__.py. Is there any difference?

is_os_windows_and_py_lt_3_9,
)

skip_for_windows_py_lt_3_9 = pytest.mark.skipif(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same q as above, does it make sense to keep these in conftest.py? Except for introducing one more file. @mike0sv ?

Comment on lines +59 to +71
@contextmanager
def cloned_git_repo(repo: str):
tmp_dir = TemporaryDirectory()
logging.debug("create temporary directory %s", tmp_dir)
git_clone(repo=repo, dir=tmp_dir.name)
yield tmp_dir.name
logging.debug("delete temporary directory %s", tmp_dir)
tmp_dir.cleanup()


def git_clone(repo: str, dir: str) -> None:
logging.debug("clone %s in directory %s", repo, dir)
Repo.clone_from(url=repo, to_path=dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge these functions and test a single one only? Now having git_clone as a separate seems redundant. You can easily test cloned_git_repo without generating a temporary directory additionally in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can gladly do it for the tests (actually it connects with something I wanted to discuss about public/private functions -> will add it below), but for the functions I would suggest to leave them as they are, again to leave the door open for a different engine. It doesn't do any harm and makes the separation clear: this is the only thing you need to change if you want to use a different git engine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I checked out the second PR you created.

My opinion:

  1. We should keep tests. We can remove them anytime later. Keeping the functions public is also OK for me, this is in the fashion of the rest of GTO codebase I think.
  2. We can keep these functions separate, doesn't matter for now I guess.

I think the PR is ready, no need to overthink this. We can get fix all of that when that would be needed. Let's merge this and move along. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine with me! :) let's do that

@francesco086
Copy link
Contributor Author

francesco086 commented Sep 20, 2022

@aguschin One thing I would like to discuss is that in the module git_utils there are now some functions that should probably be private. Currently the only public one (meaning, used outside the module itself) is git_clone_remote_repo (the decorator). All others can be turned into private. What do you think?
Maybe one could also argue that cloned_git_repo could be public too (perhaps needed in future?).

If you agree, then tests for private functions are probably to be removed to avoid having too rigid tests that could potentially make further development hard.

Function I would like to turn into private and remove the tests of: git_clone and is_url_of_remote_repo.

EDIT: Here the PR that shows the changes I would like to do: francesco086#2

Let me know :)

Copy link
Contributor

@aguschin aguschin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two last code suggestions)

Co-authored-by: Alexander Guschin <1aguschin@gmail.com>
@francesco086
Copy link
Contributor Author

francesco086 commented Sep 21, 2022

@aguschin that should be it... :)

@aguschin
Copy link
Contributor

Awesome! Thanks for the huge work done & useful feature implemented!

@aguschin aguschin merged commit 61bf0ea into iterative:main Sep 21, 2022
@francesco086
Copy link
Contributor Author

Awesome!! Thank you @aguschin for the patience... aligning on code style is definitely time and energy consuming. But now next features will come very quickly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants