-
Notifications
You must be signed in to change notification settings - Fork 954
Fix: flaky Dex login test by improving authentication flow handling #3082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: flaky Dex login test by improving authentication flow handling #3082
Conversation
Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
@juliusvonkohout looking forward for the feedback |
I see a very large git diff, are you sure that you need to change several hundred lines for that ? |
Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
The key addition is just the 403 handling block after the login POST request, which restarts the OAuth flow. The retry logic is also minimal but effective. |
tests/gh-actions/test_dex_login.py
Outdated
s = requests.Session() | ||
|
||
# GET the endpoint_url, which should redirect to Dex | ||
resp = s.get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lest use clean coding standards, so proper long pronounceable variable names. session instead of s, response instead of resp, url_object instead of url_obj
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay i will be more verbose
tests/gh-actions/test_dex_login.py
Outdated
return "; ".join([f"{c.name}={c.value}" for c in s.cookies]) | ||
|
||
except Exception as e: | ||
if attempt < max_retries - 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you use -1 instead of <= ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used attempt < max_retries - 1 to ensure we retry only up to the second-to-last attempt (e.g., attempts 0 and 1 for max_retries = 3).
On the final attempt (attempt 2), if it fails, we fall into the else block to raise the exception without an extra retry.
Using <= max_retries would imply retrying beyond the intended limit since attempt only goes up to max_retries - 1 in the loop.
attempt <= max_retries would be incorrect since attempt never reaches max_retries in a range(max_retries)=> (0,1,2) and maxRetry=3 loop.
I will update the code, for more readability
…t retry logic Signed-off-by: madmecodes <ayushguptadev1@gmail.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
# Handle 403 specifically - might need to restart oauth flow | ||
if response.status_code == 403: | ||
# Try one more approach - go through the oauth2 flow again | ||
oauth_url = f"{urlsplit(self._endpoint_url).scheme}://{urlsplit(self._endpoint_url).netloc}/oauth2/start" | ||
response = session.get( | ||
oauth_url, | ||
allow_redirects=True, | ||
verify=not self._skip_tls_verify, | ||
) | ||
# Continue with normal flow after restart |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is really hard to read, but here I see a relevant change. I am worried that this could hide the underlying problem. it is more a workaround than a real solution. Why do we need to retry in the first place ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does a bit of load fail the test so easily ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That bit of load did fail the test at that time, but when I checked again the next day, it was able to handle it. So I deliberately triggered a 403 using this code block, and as seen in the output, the old version exited with the error, while the new one worked.
def patched_post(original_post):
def _patched_post(*args, **kwargs):
if random.random() < 0.2:
mock_response = requests.Response()
mock_response.status_code = 403
mock_response._content = b"Simulated 403 error"
mock_response.url = args[0]
return mock_response
return original_post(*args, **kwargs)
return _patched_post
original_post = requests.Session.post
requests.Session.post = patched_post(original_post)

You're right that this is more of a workaround than a solution to the underlying problem. The 403 errors happen when the Dex authentication flow gets disrupted under load. While this fix makes the test more reliable, we should investigate why these errors occur in the first place.
The error occurs randomly and is a bit hard to reproduce. But yes, when it fails, the same error occurs — a 403 against the URL.
Shall I investigate the logs of these pods and run it again to find the root cause?
kubectl logs -n auth $(kubectl get pods -n auth -l app=dex -o name) -f
kubectl logs -n oauth2-proxy $(kubectl get pods -n oauth2-proxy -o name) -f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please investigate, but i will merge it for now
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: juliusvonkohout The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…ubeflow#3082) * Fix: flaky Dex login test by improving authentication flow handling Signed-off-by: madmecodes <ayushguptadev1@gmail.com> * improve Dex login test reliability with retry and 403 handling Signed-off-by: madmecodes <ayushguptadev1@gmail.com> * Update: Improve test readability with descriptive variables and robust retry logic Signed-off-by: madmecodes <ayushguptadev1@gmail.com> * Update dex_oauth2-proxy_test.yaml Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> --------- Signed-off-by: madmecodes <ayushguptadev1@gmail.com> Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com> Co-authored-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
✏️ Summary of Changes
Solution:
403 Handling - When receiving a 403 Forbidden response during login, we attempt to restart the OAuth flow, which helps when sessions expire.
Reproduce the issue:
You will get this:

After implementing 403 handling:
Successfully passed, after main 403 retry. (load test pod still running and if i comment out the 403 mechanism, it again throws the error)
📦 Dependencies
Asked to resolve the issue in PR 3056
🐛 Related Issues
none
✅ Contributor Checklist