Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix timing issues in subscription tests #76

Merged
merged 2 commits into from
Jun 28, 2023

Conversation

rjwills28
Copy link
Collaborator

This PR fixes the issue described in #64.

There is a timing issue relating to when you start a subscription to Coniql with respect to where the PV is in its update cycle (updating at 2Hz). Occasionally this can mean that you don't collect as many results in the time period as you expect, which causes the tests to fail.

Instead, I have changed these type of tests so that they collect 3 samples and check that no updates were missed (i.e. the results increment by 1 each time). I have also removed the need to ensure that the updating PV is reset to 0 for each test, which can also lead to some timing problems.

I've rerun the tests multiple times and no longer see failures but as this was always an intermittent problem we should still be aware of it.

Copy link
Contributor

@AlexanderWells-diamond AlexanderWells-diamond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigating this, good spot that the failure could happen to either protocols. This change is definitely an improvement to the current intermittent failures we see.

I do see one problem: the two modified tests will never terminate if something goes wrong with either the IOC or the subscriptions such that we never receive data. I'd be inclined to add back in the timer, with a large timeout value ( 20 seconds or so), and a pytest.fail() if it ever exceeds the timeout.

Also add timeout on blocking subscriptions.
@codecov
Copy link

codecov bot commented Jun 23, 2023

Codecov Report

Merging #76 (a14e0fa) into main (8f9c7de) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main      #76   +/-   ##
=======================================
  Coverage   93.30%   93.30%           
=======================================
  Files          10       10           
  Lines         807      807           
=======================================
  Hits          753      753           
  Misses         54       54           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@rjwills28
Copy link
Collaborator Author

Adding a timeout for the subscription responses is a good idea. I discovered while doing this that we were previously actually not accounting for this as the awaits for the responses is blocking and so we would sit here forever waiting if no data is sent from the websocket. I have added an await_for to catch this case. However the old ws protocol constantly sends 'keep alive' messages so we do also need to implement a timeout ourselves as suggested. Comments have been added to the code to describe why both are necessary.

@AlexanderWells-diamond
Copy link
Contributor

I was running test_subscribe_pv on my PC and noticed an odd behaviour. I deleted the ioc parameter, so there are no PVs at all, and yet somehow test_subscribe_pv[graphql_transport_ws_protocol] PASSED was reported! The other test did fail as expected: test_subscribe_pv[graphql_ws_protocol] FAILED

Strangely, if I just run this one test in isolation (i.e. pytest tests/test_aiohttp.py::test_subscribe_pv) it correctly fails both, with different errors.

I'm concerned that we have some leakage of test state somewhere - perhaps the IOCs aren't shutting down fast enough, and future tests are catching the IOC meant for the previous test? I note that our PV prefix is currently global across all tests in a given run.

Sorry for finding more issues. If you feel this problem deserves its own PR I can shift this to an Issue.

@rjwills28
Copy link
Collaborator Author

The problem is that we run the ioc as a 'module' fixture, which means that the ioc clean up does not get called into the module has completed. So even though the ioc is not included as part of the test_subscribe_pv, the ioc is still running and so the tests can pass. Interestingly I find that both protocols are passing in that test. The only way way I can think to fix this would be to switch the ioc to 'function' but that means that we would have to start the ioc for each test, which we were previously trying to avoid as it slows down the tests.

@AlexanderWells-diamond
Copy link
Contributor

Ah, I hadn't spotted that. In that case, I don't entirely understand why my tests fail as the IOC should still be running!

Either way, this is better than it was so I think we can merge at this point.

Copy link
Contributor

@AlexanderWells-diamond AlexanderWells-diamond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@rjwills28 rjwills28 merged commit 91bf0c9 into main Jun 28, 2023
@rjwills28 rjwills28 deleted the fix_intermittent_test_failures branch June 28, 2023 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants