Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add compliation check mechanism #882

Open
PeterJCLaw opened this issue Aug 31, 2019 · 18 comments
Open

Add compliation check mechanism #882

PeterJCLaw opened this issue Aug 31, 2019 · 18 comments
Labels
feature Request for a new feature needs discussion Need some more discussion

Comments

@PeterJCLaw
Copy link

I'd like to be able to run a command in CI to ensure that my requirements.in produces my requirements.txt, mostly as a guard against accidental edits to the latter which get undone when pip-compile is next run.

I'm currently using pip-compile --dry-run with some bodging of the output then run through a diff against my requirements.txt to enable this, however it's (predictably) not exactly reliable (sometimes I get warning text in the output which wouldn't be in the requirements file).

I can see a couple of solutions:

  • make pip-compile output the requirements.txt to STDOUT, rather than STDERR, so that there's a stream which I can use which is sure to be free of other text
  • make an actual check mode

The latter feels like the much better solution, though it seems likely to be more effort. I've no idea what the former would break though. (pip-compile seems not to use STDOUT at all?)

I'm using Python 3.5 and pip-tools 4.1.0.

PeterJCLaw added a commit to PeterJCLaw/srobo-runbook that referenced this issue Aug 31, 2019
Until jazzband/pip-tools#882 is solved,
this is probably as reliable as we're going to get.
@atugushev
Copy link
Member

Hello @PeterJCLaw,

Thanks for bringing this up. I'd prefer to do it in a way of "Single Responsibility Principle":

$ pip-compile --quiet && git diff --exit-code

If check is failed you can see the changes of requirements.txt.

@atugushev atugushev added the feature Request for a new feature label Sep 2, 2019
@PeterJCLaw
Copy link
Author

That's definitely nicer than my previous approach, though it does assume that the requirements file is under version control (I agree this is likely), which feels like an odd restriction, and that no other changes have been made to the working area.

As far as I know in pip-compile can only operate on an output file which is the same as it's reference/lock file (I'm not sure what term you use for that). If there were a way to have a different input file than the output that might help here. (This is kinda where I was going with suggesting the change to --dry-run's output). That said, I'm not actually sure that this is a good idea either (and introducing it just for this case feels a little contrived).

I can see that it would be nice to defer the comparison part to another tool, however one could argue that the --dry-run behaviour is already headed down this path -- why would you need --dry-run when you can can just run pip-compile and then reset the file from version control if you don't want the changes? (There's even an argument that that's preferable to --dry-run because you can easily diff the file)

In any case, git diff --exit-code works for my use-case, so I'm happy to use that for now.

@atugushev
Copy link
Member

atugushev commented Sep 2, 2019

@PeterJCLaw

In any case, git diff --exit-code works for my use-case, so I'm happy to use that for now.

I'm glad to help you. Anyways, i like the idea of the optional check mode, so let's leave this issue open and see whether the community is interested in this feature.

@atugushev atugushev added the needs discussion Need some more discussion label Sep 19, 2019
@Zac-HD
Copy link

Zac-HD commented Oct 4, 2019

I use tox to run my autoformatters (and pip-compile), then linters and tests. In local development that means "fix everything the check it", but in CI it's followed by git diff --exit-code to check that whatever was committed is good.

So count me as a vote for single responsibilities!

@PeterJCLaw
Copy link
Author

In any case, git diff --exit-code works for my use-case, so I'm happy to use that for now.

It seems I spoke too soon. While this works for the case I was looking at when I reported this issue (in an open source repo of a small project), it doesn't in a case I've been looking at at work.

For the build of one of our (fairly large) internal repos we have a multi-stage CI pipeline which caches the source files into the job workspace after the initial git checkout for speed, but doesn't persist the repo as part of the cache. As a result when we come to run this check, there isn't a git repo to use for diffing.

I could probably copy the requirements.txt out the way first, run a non-dry-run and then diff the copy against the new file, but that's definitely starting to feel like I'm bodging things again.

It would be great if there was a way to validate a pair of requirements files without relying on a version control repo.

@atugushev
Copy link
Member

atugushev commented Oct 30, 2019

There's another way — use pre-commit, see #974.

@atugushev
Copy link
Member

atugushev commented Apr 16, 2020

Also related to the PR #1070 (which is currently under discussion) with the pip-check mechanism.

@AndydeCleyre
Copy link
Contributor

While this doesn't help in the case where the needed git data isn't present, I'll point out that you can pass specific files, like

$ git diff --exit-code -- requirements.txt

so the repo doesn't need to be totally pristine.

When you can't count on git, an alternative comparison method is to create a hash beforehand and verify it after:

$ md5sum requirements.txt >requirements.txt.md5
$ pip-compile
$ md5sum -c requirements.txt.md5

@georgipopovhs
Copy link

georgipopovhs commented May 5, 2022

There is a verify command in pip-compile-multi (a tool based on pip-compile) which works similarly to @AndydeCleyre 's suggestion with MD5, however it's based on the hash of the input file.

pip-compile-multi adds an additional top line to the lockfile with the SHA1 of the orignal .in file, and pip-compile-multi verify simply checks that SHA1.

I think adding similar functionality to pip-compile could be backward compatible and useful too.

Edit: I just realized there are complications around handling requirements files that import other requirements files, because the imported files might change, so they will need to taken into consideration as well.

@AndydeCleyre
Copy link
Contributor

AndydeCleyre commented May 5, 2022

Also, if an already pinned version gets revoked from the repository, the same input would generate new output. I'm guessing we'd want the check to "fail" in this case, rather than just confirming that the input is unchanged?

@georgipopovhs
Copy link

Also, if an already pinned version gets revoked from the repository, the same input would generate new output. I'm guessing we'd want the check to "fail" in this case, rather than just confirming that the input is unchanged?

Good point, if the version is revoked the previous output file would be unusable.

@MichaelKim0407
Copy link

For anyone who comes across this issue, I came up with a solution that doesn't have much to do with resolving pip dependencies. I just hash both the input and output files and store the hashes in version control. This way if either of them changes, I will be able to tell in CI by comparing the hashes to the files.

def get_file_hash(file: str) -> str:
    with open(file) as f:
        s = '\n'.join(line.rstrip() for line in f)
    return hashlib.md5(s.encode()).hexdigest()

The rstrip is mostly because of different line endings in different OS's. I want to make sure the hash is the same wherever the repo is cloned.

The full script can be found here.

@lordmauve
Copy link

I'd like the hashing idea to be supported in pip-compile natively. That's because we use pyproject.toml for our dependencies, and I don't want to hash all of pyproject.toml. It would be ideal if pip-compile could extract dependency specifications from the inputs, subject to --extras, constraints, etc, hash ONLY those, and write the hash into the compiled requirements.txt as a comment, near the This file is autogenerated by pip-compile line. This would allow checking, and fast re-runs if nothing has changed.

@Cypher1
Copy link

Cypher1 commented Jun 6, 2024

I believe that an MD5 based approach misses an opportunity to get the user feedback on what has changed, and, while this is pretty reasonable, requires users to use version control to be able to recover changes.

Personally, I am working on the assumption that I can add new files to the repo, but also that Git may not be available (e.g. on build bots).

I therefore use the --output argument to send the output to a secondary file requirements.txt.check.
Then I can diff the two files and, when run locally, the user can mv requirements.txt.check requirements.txt without having to re-run.

EDIT: One wrinkle is that the requirements.txt file then needs to be modified to remove the .check suffix, but this appears acceptable.

Could this behaviour become part of pip-comile?

@pohmelie
Copy link

pohmelie commented Jun 6, 2024

I subscribed for this, so my solution right now is to use code below, but I want this to be implemented in pip-tools

import sys
from contextlib import contextmanager
from difflib import unified_diff
from pathlib import Path
from subprocess import run

PYTHON = sys.executable
REQUIREMENTS_IN = Path("requirements.in")
REQUIREMENTS_TXT = Path("requirements.txt")
REQUIREMENTS_TXT_ORIGINAL = Path("requirements.txt.original")


@contextmanager
def move_original_file():
    REQUIREMENTS_TXT_ORIGINAL.write_text(REQUIREMENTS_TXT.read_text())
    try:
        yield
    finally:
        # intentionally not rolling back the changes
        REQUIREMENTS_TXT_ORIGINAL.unlink()


def main():
    with move_original_file():
        print("Python:", PYTHON)
        arguments = [PYTHON, "-m", "piptools", "compile", REQUIREMENTS_IN, "-o", REQUIREMENTS_TXT]
        run(arguments, check=True, capture_output=True)

        diff = unified_diff(
            REQUIREMENTS_TXT_ORIGINAL.read_text().splitlines(),
            REQUIREMENTS_TXT.read_text().splitlines(),
            fromfile="requirements.txt.original",
            tofile="requirements.txt",
        )
        diff = "\n".join(diff)
        if diff:
            print("Error: requirements.txt is out of sync with requirements.in")
            print(diff)
            sys.exit(1)


if __name__ == "__main__":
    main()

@GianlucaFicarelli
Copy link

GianlucaFicarelli commented Jun 10, 2024

As some other people in the thread, I'd like to have a check option in pip-tools too, but I wanted to mention another possible diff command not needing git, and not modifying existing files, nor writing any tmp file to disk:

diff --ignore-blank-lines requirements.txt \
<(python -m piptools compile --dry-run 2>&1 | grep -vi Dry-run)

or, in case the commented lines do not need to be considered in the comparison:

diff --ignore-blank-lines \
<(grep -vE "^ *\#" requirements.txt) \
<(python -m piptools compile --dry-run 2>&1 | grep -vi Dry-run | grep -vE "^ *\#")

EDIT: use --dry-run 2>&1 instead of -q -o -, so the good-enough already-locked versions are considered valid

@AndydeCleyre
Copy link
Contributor

@GianlucaFicarelli That's good! Just note that since the output is set to stdout, it won't consider good-enough already-locked versions, like what happens when output is an existing lockfile.

@GianlucaFicarelli
Copy link

@GianlucaFicarelli That's good! Just note that since the output is set to stdout, it won't consider good-enough already-locked versions, like what happens when output is an existing lockfile.

That's true @AndydeCleyre , thanks for pointing it out!
I'm updating the previous comment to use the option --dry-run instead of -q -o -, so the good-enough already-locked versions are considered valid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Request for a new feature needs discussion Need some more discussion
Projects
None yet
Development

No branches or pull requests

10 participants