Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for "git clone --recursive" to pull in submodules #96

Closed
rsdunlapiv opened this issue Apr 11, 2018 · 11 comments · Fixed by #119
Closed

Add support for "git clone --recursive" to pull in submodules #96

rsdunlapiv opened this issue Apr 11, 2018 · 11 comments · Fixed by #119
Assignees

Comments

@rsdunlapiv
Copy link

Summary of Issue:

When cloning a git external, it does not automatically pull in the submodules of the cloned repo. To do this, we'd need to add a "--recursive" option to the git clone command. Good resource on this: http://www.vogella.com/tutorials/GitSubmodules/article.html

Expected behavior and actual behavior:

This is expected so far, but adding the recursive option would reduce number of manual steps when checking out an external that has submodules.

Steps to reproduce the problem (should include model description file(s) or link to public repository):

Add any git repo that has submodules to an Externals.cfg and execute checkout_externals. The submodules will not be present (yet) in the cloned repo.

What is the changeset ID of the code, and the machine you are using:

master as of 4/10/18

have you modified the code? If so, it must be committed and available for testing:

No

Screen output or log file showing the error message and context:

@rsdunlapiv
Copy link
Author

Probably need some discussion about whether this would be default behavior, or if we need a -r flag or similar to checkout_externals. Another option might be to put it in the Externals.cfg for a particular external, e.g.:

[mom]
branch = mybranch
protocol = git
repo_url = https://github.com/mvertens/vin-chaud.git
local_path = components/vin
required = True
recursive = True

Or something similar...

@billsacks
Copy link
Member

Thanks, @rsdunlapiv .

Is there any downside of always adding the --recursive flag?

And is there ever anything else you need to do for submodules to work correctly? e.g., in the git checkout command?

@gold2718
Copy link
Collaborator

I would like to see a bit more of the proposed semantics to make sure that checkout_externals can always handle this situation when a change to Externals.cfg results in a potentially incompatible .gitmodules file (e.g., different or moved or deleted submodules). Is there a clear semantics that checkout_externals can use?

@llpcarson
Copy link

I would also be interested in having this option added, but agree that thinking through the situation when submodules change is important.

@mnlevy1981
Copy link
Contributor

I've been talking to @dabail10 about this (and learning about submodules as I go. It seems like the current approach taken by manage externals is

  1. git clone [repo]
  2. git checkout [ref]

The desired results of pulling down submodules could be achieved by adding

  1. git submodule update --init --recursive

I gave Dave the following source mod for _git_checkout_ref():

$ git diff
diff --git a/manic/repository_git.py b/manic/repository_git.py
index efb775d..c8f4ba9 100644
--- a/manic/repository_git.py
+++ b/manic/repository_git.py
@@ -727,3 +727,9 @@ def _git_checkout_ref(ref, verbosity):
         if verbosity >= VERBOSITY_VERBOSE:
             printlog('    {0}'.format(' '.join(cmd)))
         execute_subprocess(cmd)
+
+        cmd = ['git', 'submodule', 'update', '--init', '--recursive']
+        if verbosity >= VERBOSITY_VERBOSE:
+            printlog('    {0}'.format(' '.join(cmd)))
+        execute_subprocess(cmd)

That seemed to work for his needs (checking out CICE and its submodules). If you checkout one version of CICE with this and then change Externals.cfg to point to a different version with a different version of the same submodules, it will update the submodules accordingly. That said, I don't know if this patch could have other adverse affects relating to the issue @gold2718 raised about the .gitmodules file (or even how to test it).

The one thing I did test is that running git submodule update --init --recursive in repositories that don't use submodules at all is okay -- it doesn't change anything in the repository, nor does it throw an error, so I don't think you need an extra flag in the conf file... just run this in every git external.

@mnlevy1981
Copy link
Contributor

I've talked more with @dabail10 and also looked into the mechanisms of git submodules a little bit. Without getting too technical, I think it should be possible to combine the contents of .gitmodules and the output from running git submodule to basically generate an Externals.cfg file in memory:

[/submodule name/]
commit = /hash from git submodule output/
protocol = git
repo_url = /url from .gitmodules/
local_path = /path from .gitmodules/
required = ???

To address required, I think two new flags should be added to Externals.cfg:

  1. ignore-submodules would be False by default, but if a user sets it to True then no git submodules would be cloned (maybe skip the whole process described above if this is the case)
  2. excluded-submodules would be an empty list by default, but users could change it to a list of submodules that can be ignored. In this case, maybe required=True for all submodules except those listed in excluded-submodules?

Hopefully this allays concerns raised by @gold2718, because it's effectively treating a component that has submodules as though it uses manage_externals itself.

@llpcarson
Copy link

Would this also work for recursively (or could it, with a switch?)

@mnlevy1981
Copy link
Contributor

Proposed behavior from a meeting: if externals is not specified in higher level Externals.cfg, look for submodules instead and process that as an external request. (@llpcarson this should work recursively.)

This is preferable to the code change I mentioned above because changes to submodules will be caught by manage_externals instead of silently overwritten.

We may also need externals=None as a way to avoid checking out any git submodules.

Last item for the wish list: for cases where we will use Externals.cfg to avoid checking out a specific submodule, can there be syntax to allow the hash used in git submodules to be checked out by manage_externals (avoiding the "duplicate information may get out of sync" error). From MOM6's Externals_MOM.cfg, replace:

[geoKdTree]
tag = a4670b9
protocol = git
local_path = pkg/geoKdTree
repo_url = https://github.com/travissluka/geoKdTree.git
required = True

with

[geoKdTree]
protocol = git
from_submodule = True
required = True

@gold2718
Copy link
Collaborator

gold2718 commented Feb 9, 2019

I am still trying to wrap my head around the proposed semantics. First, here is what I think we all agreed should work:

  • When a repository with submodules is encountered, the .gitmodules file should be treated as if the relevant information was encoded into an externals.cfg file and that Externals.cfg file was specified in an externals config item in the config section which triggered the module's checkout.
  • If checkout_externals is processing an externals.cfg file for a repository, that entry supersedes a .gitmodules file, however, entries in the .gitmodules file may be referenced in a config section with the from_submodule = True idiom.
    Example:
  • repository, foo1, has a .gitmodules file that points to repository foo2.
  • repository, foo2 has a .gitmodules file that points to repository foo3.
  • Externals.cfg has a config section to checkout foo1
  • Running checkout_externals should checkout foo2 as a submodule of foo1 and foo3 as a submodule of foo1.

Okay so far?

So what happens if foo2 has an externals.cfg file? Normally, we do not process any sub externals.cfg file unless it is referenced by a parent's externals.cfg file. However, if I am now going to always search for a .gitmodules for checked out repositories which do not have any externals.cfg file referenced in their parent's externals.cfg file, I will ignore an externals.cfg file but still checkout the submodule. Back to a slight modification of the example above:

  • repository, foo1, has a .gitmodules file that points to repository foo2.
  • repository, foo2 has a .gitmodules file that points to repository foo3.
  • repository, foo2 also has an externals.cfg file (e.g., externals_foo2.cfg).
  • Externals.cfg has a config section to checkout foo1
  • Running checkout_externals should checkout foo2 as a submodule of foo1 and foo3 as a submodule of foo1. That is, it will ignore the externals_foo2.cfg file.

Is everyone okay with that? If not, please propose bulletproof semantics for that situation.

@gold2718
Copy link
Collaborator

@mnlevy1981 suggested a modified config entry to use .gitmodules information for a manage_externals checkout. The new form is:

[da_hooks]
protocol = git
from_submodule = True
required = True

In this case, note that the name of the external conflicts with the name in the .gitmodules file:

[submodule "pkg/MOM6_DA_hooks"]
        path = pkg/MOM6_DA_hooks
        url = https://github.com/MJHarrison-GFDL/MOM6_DA_hooks.git

Do we want an exact match (requiring, in this case, modifying Externals_MOM.cfg)? Do we want to match the part of the submodule name after the last slash (/)? Do we want a close-enough match (e.g., 'da_hooks' is in 'mom6_da_hooks')? Thoughts? @billsacks?

@billsacks
Copy link
Member

I haven't been following this closely enough to give any helpful thoughts. I'm happy to go with whatever you guys want here. If you want another opinion, though, I'm willing to give one, but in that case, can we find some time to sit down together to talk this through?

gold2718 pushed a commit that referenced this issue May 15, 2019
New capability to use git submodule info to checkout externals

By default, an external with no specified sub-externals configuration file will
have its submodules checked out according to the information in that repository's
.gitmodules file. This process is recursive.

To prevent submodules from being checked out, the external description for that
repository should be listed as "externals = none".

To use submodule URL, local path, and reference hash information for a
sub-external, replace the 'local_path', 'repo_url', and 'branch' 'hash' or 'tag'
keywords with "from_submodule = True".

User interface changes?: Yes
externals keyword can be "none" to prevent loading submodules.
from_submodule keyword can be used to replace external configuration information
with information from a repository's submodule configuration.

Fixes: #96

Testing:
test removed: none
unit tests: all pass
system tests: all pass, added TestSubrepoCheckout to test new functionality
manual testing: Tests with ESCOMP/MOM_interface (see below)

    Just run checkout_externals: no change
    Remove "externals = ../Externals_MOM.cfg" from Externals.cfg: All four submodules
    of MOM6 checked out
    Set "externals = none" in Externals.cfg: No submodules checked out (empty directories)
    Modify da_hooks external in Externals_MOM.cfg (see below): da_hooks is checked out with
    MOM6 .gitmodules configuration.

[pkg/MOM6_DA_hooks]
protocol = git
from_submodule = True
required = True
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants