Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

many_different_states fails on Windows with GIX_TEST_IGNORE_ARCHIVES=1 #1442

Closed
EliahKagan opened this issue Jul 4, 2024 · 3 comments
Closed
Labels
acknowledged an issue is accepted as shortcoming to be fixed

Comments

@EliahKagan
Copy link
Member

EliahKagan commented Jul 4, 2024

Current behavior 😯

Starting after #1441, which fixed #1440, there is a new failing test on Windows with GIX_TEST_IGNORE_ARCHIVES=1, in addition to those that had failed before (#1358).

The new failing test is:

        FAIL [   9.690s] gix-diff-tests::diff tree::changes::to_obtain_tree::many_different_states

This is one of the tests using the fixture the changes in #1441 applied to.

The recent change seems otherwise good

Only Windows is affected, all tests still pass on Windows when GIX_TEST_IGNORE_ARCHIVES=1 is not used, and no unexpected untracked files are created on any system (at least on Windows, Ubuntu, or macOS).

This is to say that #1441 does appear to have fully fixed #1440, but unfortunately has also produced a test suite regression, on Windows only, for tree::changes::to_obtain_tree::many_different_states in gix-diff-tests::diff, when the use of pre-generated archives is suppressed with GIX_TEST_IGNORE_ARCHIVES=1.

Full run output

Full output for this test run, as well as the full error summary (which otherwise lists the same tests as #1358), can be seen in this new gist.

Details on the failed test

Cleaning ignored files again, and then entering the gix-diff/tests/tree directory and running tests from there with GIX_TEST_IGNORE_ARCHIVES=1 and RUST_BACKTRACE=full, shows some more information about the failing test. The full output can, if wanted, be seen in this other gist. The other gix-diff tests still pass as expected, and the full traceback for the failing test is unremarkable. The interesting part is the assertion message itself:

ek@Glub MINGW64 ~/source/repos/gitoxide/gix-diff/tests/tree (main)
        FAIL [   7.550s] gix-diff-tests::diff tree::changes::to_obtain_tree::many_different_states

--- STDOUT:              gix-diff-tests::diff tree::changes::to_obtain_tree::many_different_states ---

running 1 test
test tree::changes::to_obtain_tree::many_different_states ... FAILED

failures:

failures:
    tree::changes::to_obtain_tree::many_different_states

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 28 filtered out; finished in 7.53s


--- STDERR:              gix-diff-tests::diff tree::changes::to_obtain_tree::many_different_states ---
thread 'tree::changes::to_obtain_tree::many_different_states' panicked at gix-diff\tests\tree\mod.rs:259:13:
assertion `left == right` failed: :100644 120000 13c2aca72ab576cb5f22dc8e7f8ba8ddab553a8a 2e65efe2a145dda7ee51d1741299f848e5bf752e T    f/f
  left: [Modification { previous_entry_mode: EntryMode(16384), previous_oid: Sha1(849bd76db90b65ebbd2e6d3970ca70c96ee5592c), entry_mode: EntryMode(16384), oid: Sha1(3b287f8730c81d0b763c2d294618a5e32b67b4f8), path: "f" }, Modification { previous_entry_mode: EntryMode(33188), previous_oid: Sha1(13c2aca72ab576cb5f22dc8e7f8ba8ddab553a8a), entry_mode: EntryMode(33188), oid: Sha1(e69de29bb2d1d6434b8b29ae775ad8c2e48c5391), path: "f/f" }]
 right: [Modification { previous_entry_mode: EntryMode(16384), previous_oid: Sha1(849bd76db90b65ebbd2e6d3970ca70c96ee5592c), entry_mode: EntryMode(16384), oid: Sha1(7e26dba59b6336f87d1d4ae3505a2da302b91c76), path: "f" }, Modification { previous_entry_mode: EntryMode(33188), previous_oid: Sha1(13c2aca72ab576cb5f22dc8e7f8ba8ddab553a8a), entry_mode: EntryMode(40960), oid: Sha1(2e65efe2a145dda7ee51d1741299f848e5bf752e), path: "f/f" }]

Analysis of failure output

The failing assertion is:

https://github.com/Byron/gitoxide/blob/f87322e185704d9d4368ae88e95892635a976e4a/gix-diff/tests/tree/mod.rs#L259-L278

#1441 included these changes to gix-diff/tree/mod.rs, specifically in the many_different_states test function, that removed Windows-specific expected values of some hexadecimal Git object IDs. It changed what was previously:

https://github.com/Byron/gitoxide/blob/e2e8cf9206058365be50ea899cc4dcc22e923bef/gix-diff/tests/tree/mod.rs#L256-L267

To just:

https://github.com/Byron/gitoxide/blob/f87322e185704d9d4368ae88e95892635a976e4a/gix-diff/tests/tree/mod.rs#L256-L258

These changes are to expected values used in the specific assertion that failed.

Hypothesis

That change was originally not included in #1441, but was added in the force push from 2f69389 to f5a9884. It seems it was added to fix this many_different_states test failure on Windows, and that it was effective at fixing that.

CI does not currently run Windows tests with GIX_TEST_IGNORE_ARCHIVES=1. So the mismatch between the generated repo's object IDs is--as one one would expect given that it doesn't look like there is a problem in the gitoxide code under test--a real mismatch that occurs across operating systems when the test fixture script is run.

It looks like the test suite bug described in #1440 was actually inadvertently protecting against a condition where assertions were dependent on which operating system ran the test fixture script. Because the archive was always regenerated, the OIDs specific to the current operating system were always used. Fixing the bug created a situation where the committed archive would be usable but contain OIDs differing from those on Windows. Then, the test case would either:

  • Assert OS-specific OIDs and fail on Windows when GIX_TEST_IGNORE_ARCHIVES=1 is not used, as was the case prior to the PR force push cited above.
  • Assert non-Windows OIDs on all systems and fail on Windows when GIX_TEST_IGNORE_ARCHIVES=1 is used, as is the case now.

As demonstrated below, when run on Windows, the fixture script looks like it creates and commits a symlink or at least attempts to do so, but on most systems actually does not even attempt to do so, even if doing so is permitted. This is why the repository's objects are different.

Expected behavior 🤔

The test and/or fixture should be changed so that the test:

  • Passes even with GIX_TEST_IGNORE_ARCHIVES=1, as it had before.
  • Fully tests what it is trying to test, which it seems not to have done before on Windows.

Ideally, the disparity between the repositories generated on Windows and non-Windows operating systems should be eliminated in the test fixture script itself. I think this would be best unless it is technically infeasible or would make the test fixture or any tests that use it less robust or much less clear.

In the mean time, #1441 has actually improved the situation overall, because the pre-generated archive that the test passes with on Windows has the objects that the test is supposed to use.

Platform expectations

The OID special-casing for Windows was added in b1b6e00, whose title stated the rationale:

[tree-diff] consider that windows does do symlinks differently

But I don't know of any reason why that would be the case. My impression is that Git produces the same hashes on Unix-like systems and Windows when the directory structure, metadata, and relevant configuration match exactly, even when symbolic links are being staged and committed. Experiments confirm this, at least for simple situations. I did this on Ubuntu:

ek@Kip:~/tmp$ git init --quiet symlink-test
ek@Kip:~/tmp$ cd symlink-test
ek@Kip:~/tmp/symlink-test (main #)$ touch target
ek@Kip:~/tmp/symlink-test (main #%)$ ln -s target symlink
ek@Kip:~/tmp/symlink-test (main #%)$ ls -l
total 0
lrwxrwxrwx 1 ek ek 6 Jul  4 15:51 symlink -> target
-rw-rw-r-- 1 ek ek 0 Jul  4 15:51 target
ek@Kip:~/tmp/symlink-test (main #%)$ git add .
ek@Kip:~/tmp/symlink-test (main +)$ export GIT_AUTHOR_DATE='2000-01-01 00:00:00 +0000'
ek@Kip:~/tmp/symlink-test (main +)$ export GIT_AUTHOR_EMAIL='author@example.com'
ek@Kip:~/tmp/symlink-test (main +)$ export GIT_AUTHOR_NAME='author'
ek@Kip:~/tmp/symlink-test (main +)$ export GIT_COMMITTER_DATE='2000-01-02 00:00:00 +0000'
ek@Kip:~/tmp/symlink-test (main +)$ export GIT_COMMITTER_EMAIL='committer@example.com'
ek@Kip:~/tmp/symlink-test (main +)$ export GIT_COMMITTER_NAME='committer'
ek@Kip:~/tmp/symlink-test (main +)$ git commit -m 'Initial commit'
[main (root-commit) 1d2a185] Initial commit
 Author: author <author@example.com>
 2 files changed, 1 insertion(+)
 create mode 120000 symlink
 create mode 100644 target
ek@Kip:~/tmp/symlink-test (main)$ git log
commit 1d2a185c1faa49105d05af46ebdc49a791f200e1 (HEAD -> main)
Author: author <author@example.com>
Date:   Sat Jan 1 00:00:00 2000 +0000

    Initial commit

And I did this on Windows (in PowerShell):

C:\Users\ek\tmp> git init symlink-test
Initialized empty Git repository in C:/Users/ek/tmp/symlink-test/.git/
C:\Users\ek\tmp> cd symlink-test
C:\Users\ek\tmp\symlink-test [main]> New-Item target >NUL
C:\Users\ek\tmp\symlink-test [main +1 ~0 -0 !]> New-Item -ItemType SymbolicLink -Path symlink -Target target >NUL
C:\Users\ek\tmp\symlink-test [main +2 ~0 -0 !]> Get-ChildItem

    Directory: C:\Users\ek\tmp\symlink-test

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
la---            7/4/2024  3:45 PM              0 symlink -> target
-a---            7/4/2024  3:45 PM              0 target

C:\Users\ek\tmp\symlink-test [main +2 ~0 -0 !]> git add .
C:\Users\ek\tmp\symlink-test [main +2 ~0 -0 ~]> $env:GIT_AUTHOR_DATE = '2000-01-01 00:00:00 +0000'
C:\Users\ek\tmp\symlink-test [main +2 ~0 -0 ~]> $env:GIT_AUTHOR_EMAIL = 'author@example.com'
C:\Users\ek\tmp\symlink-test [main +2 ~0 -0 ~]> $env:GIT_AUTHOR_NAME = 'author'
C:\Users\ek\tmp\symlink-test [main +2 ~0 -0 ~]> $env:GIT_COMMITTER_DATE = '2000-01-02 00:00:00 +0000'
C:\Users\ek\tmp\symlink-test [main +2 ~0 -0 ~]> $env:GIT_COMMITTER_EMAIL = 'committer@example.com'
C:\Users\ek\tmp\symlink-test [main +2 ~0 -0 ~]> $env:GIT_COMMITTER_NAME = 'committer'
C:\Users\ek\tmp\symlink-test [main +2 ~0 -0 ~]> git commit -m 'Initial commit'
[main (root-commit) 1d2a185] Initial commit
 Author: author <author@example.com>
 2 files changed, 1 insertion(+)
 create mode 120000 symlink
 create mode 100644 target
C:\Users\ek\tmp\symlink-test [main]> git log
commit 1d2a185c1faa49105d05af46ebdc49a791f200e1 (HEAD -> main)
Author: author <author@example.com>
Date:   Sat Jan 1 00:00:00 2000 +0000

    Initial commit

The commit hashes (and thus the other object hashes from which they are computed) match exactly.

The Windows system I tested on has core.symlinks set to true in the global scope. But that is actually irrelevant to all this. Even when core.symlinks is set to false (as in the case by default in Git for Windows), Git will recognize and commit symlinks it finds, it just won't create them on checkout.

Why the fixture scripts do not (usually) create symlinks on Windows

The problem is that fixture scripts that run ln -s in Git Bash on Windows usually create copies rather than symbolic links. This, too, is independent of core.symlinks. The Git Bash environment is a fork--or really an alternative distribution--of MSYS2. Its ln executable is accordingly linked with msys-2.0.dll, whose symlink call emulation defaults to just creating copies. See also this answer and comments.

Although MSYS2's default behavior differs from Cygwin--whose default is not to create copies--its MSYS environment variable otherwise corresponds to Cygwin's CYGWIN environment variable:

ek@Glub MINGW64 ~/tmp
$ type ldd ln
ldd is /usr/bin/ldd
ln is /usr/bin/ln

ek@Glub MINGW64 ~/tmp
$ ldd /usr/bin/ln
        ntdll.dll => /c/Windows/SYSTEM32/ntdll.dll (0x7ff99a5f0000)
        KERNEL32.DLL => /c/Windows/System32/KERNEL32.DLL (0x7ff99a4e0000)
        KERNELBASE.dll => /c/Windows/System32/KERNELBASE.dll (0x7ff998140000)
        msys-intl-8.dll => /usr/bin/msys-intl-8.dll (0x430b30000)
        msys-2.0.dll => /usr/bin/msys-2.0.dll (0x210040000)
        msys-iconv-2.dll => /usr/bin/msys-iconv-2.dll (0x5603f0000)

ek@Glub MINGW64 ~/tmp
$ ls -l
total 0

ek@Glub MINGW64 ~/tmp
$ touch c

ek@Glub MINGW64 ~/tmp
$ ln -s c a

ek@Glub MINGW64 ~/tmp
$ MSYS=winsymlinks:nativestrict ln -s c b

ek@Glub MINGW64 ~/tmp
$ ls -l
total 0
-rw-r--r-- 1 ek 197121 0 Jul  4 16:53 a
lrwxrwxrwx 1 ek 197121 1 Jul  4 16:53 b -> c
-rw-r--r-- 1 ek 197121 0 Jul  4 16:53 c

This behavior is preserved without modification in the Git Bash distribution of MSYS2. Changing the default--such as by setting that variable--has been proposed in git-for-windows/git#3122, but so far no such change has been made. Note that this, too, is independent of core.symlinks. The comment discussion there has more details, especially git-for-windows/git#3122 (comment).

Fixture scripts could be changed to make real symlinks

One possibility is that fixture scripts could attempt create real symlinks by setting that environment variable for ln -s commands on Windows, or even by using a different command altogether, such as by calling cmd.exe and using its mklink builtin:

ek@Glub MINGW64 ~/tmp
$ cmd.exe //C mklink d c
symbolic link created for d <<===>> c

ek@Glub MINGW64 ~/tmp
$ ls -l
total 0
-rw-r--r-- 1 ek 197121 0 Jul  4 16:53 a
lrwxrwxrwx 1 ek 197121 1 Jul  4 16:53 b -> c
-rw-r--r-- 1 ek 197121 0 Jul  4 16:53 c
lrwxrwxrwx 1 ek 197121 1 Jul  4 17:17 d -> c

Or stage symlinks without making them

I think making real symlinks, so as to rely on having done so, may not be the best approach.

I said commands like those shown in the preceding subsection "attempt" to create symlinks because, on Windows, if the caller does not have the privilege to create symlinks – see #1310 (comment), #1374 (comment), and git-for-windows/git#3122 (comment) – then the fixtures would still not succeed.

In contrast, some of existing gitoxide test fixture scripts are able to create the desired repository even if running on Windows without the ability to create symlinks. They place entries for them in the index without requiring actual symlinks to exist in the filesystem before being staged, by running git update-index --index-info and passing input that includes entries with octal mode 120000.

Whether the tests that use those fixtures can run without the ability to create symlinks, such as to perform checkouts, is of course another matter. But a test checking that gitoxide correctly obtains OIDs, such as the test failing here, should not need to be able to do that, I don't think.

gix-diff/tests/fixtures/make_diff_repo.sh could be modified to use this or some other technique to always stage a symlink even if it cannot make one. This could be done for all the files in it, if they are not otherwise needed, or just for the symlink.

However, it would run the risk of making things much less clear, and may even make it harder to ensure the fixture script is itself free of bugs. This is because that fixture lays down multiple commits that change the symlink f/f in various ways--including whether it is a symlink or not. If that fixture script is hard to understand, then when tests fail that use it, it will be hard to figure out if the failure is caused by a test bug or a bug in the code under test, and in either case hard to figure out what the failure signifies.

So I am not sure what should be done.

Similar situation for some other tests?

It occurs to me that:

  • Some of the longer-standing tests known to fail on Windows with GIX_TEST_IGNORE_ARCHIVES=1, listed in #1358, might also be failing due to the unsatisfied assumption that ln -s attempts to create actual symbolic links in a Git Bash or other MSYS2 environment on Windows.
  • Some tests related to symlinks that are currently explicitly ignored on Windows may be unable to pass due to this behavior of ln -s.

I will try to look into at least the first of those possibilities soon and update this and/or #1358 accordingly. I've looked into that, and setting MSYS=winsymlinks:nativestrict makes the many_different_states test pass, as expected, but only makes one of the tests pass whose failure was reported in #1358. See #1358 (comment). It also makes nine otherwise passing tests fail, for which I have opened #1443.

Git behavior

Not directly applicable.

As presented above, the behavior of Git, and even more so the behavior of other tools provided as part of a full Git for Windows installation, including its ln command, is relevant. But they are relevant because of how a fixture script uses them, rather than for any compatibility reasons.

Git has a number of shell scripts in its own test suite that make use of ln -s and that are present in the Git for Windows fork. I have not looked into how Git for Windows handles this for development. It has its own SDK that ships more tools than are included in the Git Bash environment--and which gitoxide's test suite would not want to rely on being present--but I don't know how relevant that is.

Steps to reproduce 🕹

To get a full comparison to the prior results given in #1358, I ran:

gix clean -xde
GIX_TEST_IGNORE_ARCHIVES=1 cargo nextest run --all --no-fail-fast

The second command must be run in Git Bash. (Environment variables can be set in PowerShell, but many of the tests require Git Bash or a similar environment to pass on Windows, as reported in #1359. This requirement is not new or recent and is not part of what this issue is reporting.)

Before doing so, I did that without GIX_TEST_IGNORE_ARCHIVES=1, verifying that all tests still passed. Thus the problem is specific to when the fixture script has to be run.

To verify that the problem is also specific to Windows, I ran all the tests, with the above procedure, both with and without GIX_TEST_IGNORE_ARCHIVES=1 and with that full clean in between, on both Ubuntu 22.04 LTS (with Git 2.45.2 installed from the Git stable releases) and macOS 14.5. All tests passed on those systems, both when generated archives were used and when they were not.

As for the more specific test run:

gix clean -xde
cd gix-diff/tests/tree
RUST_BACKTRACE=full GIX_TEST_IGNORE_ARCHIVES=1 cargo nextest run --no-fail-fast
@Byron
Copy link
Member

Byron commented Jul 5, 2024

Thanks a lot for reporting and the intense research - it helped me a lot to understand what's going on.

It's a bit sad to see that 9 tests fail when bluntly applying my preferred solution: setting MSYS=winsymlinks:nativestrict via gix-testtools, but I will read up on your findings there in the hopes this can be made to work after all and in a way that is at least somewhat maintainable.

Lastly, my feeling is that Git requires symlinks to work but might disable support (or otherwise prohibit them) for some Windows related tests. That's how I would do it anyway to be able to use the existing test-suite as is, and it's what I'd wish to accomplish for gitoxide as well. Those running the tests must allow symlinks or expect failure of some tests, something I find reasonable to expect on a developer machine.

@EliahKagan
Copy link
Member Author

Lastly, my feeling is that Git requires symlinks to work but might disable support (or otherwise prohibit them) for some Windows related tests. That's how I would do it anyway to be able to use the existing test-suite as is, and it's what I'd wish to accomplish for gitoxide as well. Those running the tests must allow symlinks or expect failure of some tests, something I find reasonable to expect on a developer machine.

That makes sense, given that the alternatives--at least those I am unaware of--would hamper clarity.

@EliahKagan
Copy link
Member Author

EliahKagan commented Jul 6, 2024

This issue was fixed by #1444.

To confirm this, note the absence of

gix-diff-tests::diff tree::changes::to_obtain_tree::many_different_states

in the updated list of failing tests in #1358.

Although that test was never listed there even before it was updated, that is because I opened this issue for the new failure rather than editing the list there. Since #1444, I have rerun all tests on all three major platforms including Windows. Although some tests continue to fail on Windows when GIX_TEST_IGNORE_ARCHIVES=1 is set, this many_different_states test does not fail on Windows or any other platforms tested, with or without GIX_TEST_IGNORE_ARCHIVES=1.

This is as suspected based on earlier runs in which I set MSYS=winsymlinks:nativestrict manually prior to #1444. Since #1444, that no longer has to be set manually to make many_different_states pass with GIX_TEST_IGNORE_ARCHIVES=1.

(The status of this issue is thus the same as #1443, but unfortunately it doesn't look like I can myself mark this specifically as closed by #1444.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed
Projects
None yet
Development

No branches or pull requests

2 participants