Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add/checkout optional git path #24

Merged
merged 4 commits into from
Mar 13, 2020
Merged

Conversation

vsoch
Copy link
Collaborator

@vsoch vsoch commented Mar 11, 2020

This pull request will allow for easier usage of the url checker on GitHub actions, meaning that the git_path is optional, and if not provided, we use the checkout of the branch that is already done by default. Specifically:

  • git_path is now optional. If not provided, it defaults to using the environment variable GITHUB_WORKFLOW where the repository is cloned, and if that isn't found, it defaults to the present working directory.
  • branch is an added variable in the case that a user does want to clone, the branch can be specified. This defaults to master.
  • cleanup is an added argument, which is needed now that we are interacting with the checked out repository directly. Most users won't want this deleted, so the default is false.
  • I also added a subfolder argument, the reason being that after clone of a git path (or use of the already checked out repository) we might want to limit checks to a particular folder. In my testing case, I only wanted to parse files in docs/
  • I noticed that several URLs in my testing returned 403, and this was because of the user agent string. I added a function to randomly choose a more human like user agent, and these 403s turned to 200.
  • I also updated the del_repo and _clone_repo functions to use the return code. The del_repo was previously returning a boolean regardless of status, which doesn't tell us much.
  • the README.md is updated for input arguments, and also with usage examples for both cases (with and without git_path).
  • Finally, I added an extra strip to clean up urls in the case that there is a trailing newline.

You can see the branch in action here: https://github.com/HPC-buildtest/buildtest-framework/pull/211/checks?check_run_id=501529549

vsoch added 3 commits March 11, 2020 12:22
If the user is running on GitHub actions, it is most likely that
a separate clone is not necessary, but instead we parse content
in the git repository base directory. If a git_path and branch are
provided, then we instead clone these paths and proceed as we did before.
Additionally, since deletion is not always required, we add a cleanup
variable that defaults to false (we dont delete the repository uless
requested by the user

Signed-off-by: vsoch <vsochat@stanford.edu>
Im also adding a subfolder argument so the user can select a subfolder of the
repository to check. In practice, if you parse an entire repository for urls
you find that there are some strings that should not be checked (eg
something like "https://%s" %path

Signed-off-by: vsoch <vsochat@stanford.edu>
Testing has shown that some sites return 403 (Forbidden) if its
not an okay looking (human browser) user agent. Thus, we add a function
to derive a user agent and use it for the requests

Signed-off-by: vsoch <vsochat@stanford.edu>
@vsoch
Copy link
Collaborator Author

vsoch commented Mar 11, 2020

@SuperKogito the tests are timing out and I'm not sure why. When you get a chance, could you take a look and give me a sense of what might be the issue? There is no output whatsoever so it's hard to tell. The action functions fine in my real world test case, so I'm wondering if a test needs to be updated for my changes.

deletion_status = del_repo(base_path)
# delete repo when done, if requested
if cleanup == "true":
del_repo(base_path)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I get this change.
We usually clone the repo, run checks and the ideal final step is to clean up by removing the cloned repo so why would the user would want to keep the clone? can you please explain the logic behind it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most users that use the action will get the repo by way of the github/actions/checkout step run before, in which case the repository isn't cloned, but the base path used is that checked out repo. 95% of users will not want the check to be run and then have their repository deleted. Let me know if this is clear or if you need clarification on any point.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an explicit, real world example. If I were to delete the repository after running the URL check step, none of the testing steps that follow would work.


    steps:
    - uses: actions/checkout@v2
    - name: URLs-checker
      uses: urlstechie/URLs-checker@add/checkout-optional-git-path
      with:
        subfolder: docs
    - name: Run Tests
      run: |
         # This (and anything following) will not work if URLs-checker deletes my repo!
         pytest tests/*.py

@SuperKogito
Copy link
Member

I like some of the new changes but I don't get some. Overall, I think the PR is a good improvement. I mentioned some changes I would like to have. We can discuss them here of course.
I will be testing your changes over the weekend, and hopefully the next release would be at the latest on Sunday's evening.
Concerning the test, I have some ideas for the Travis-ci tests that I will seek to improve as soon as I can. urlstechie/urlchecker-test-repo#3 should also help with the testing once implemented.

@vsoch
Copy link
Collaborator Author

vsoch commented Mar 11, 2020

Sounds good! There is definitely no rush, and post any questions that you have here, I'll find them :)

@SuperKogito SuperKogito merged commit 0330c74 into master Mar 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants