add ignore_rpc_timeout option to allow suppressing rpc timeout errors #5137
Problem
Currently we run the watchtower to monitor validators on both mainnet and testnet. We have already configured the instance with a higher unhealthy threshold, told it to ignore bad gateway errors, given it a longer RPC timeout, and made it check status less frequently (e.g., `--unhealthy-threshold 2 --ignore-http-bad-gateway --rpc-timeout 60 --interval 65`). Even so, we still receive many `operation timed out` alerts. These errors mostly reflect the availability of the RPC endpoints, so for now we would like to suppress them.
Summary of Changes
This PR adds a new optional CLI option, `--ignore_rpc_timeout`, that lets users suppress RPC timeout errors. The default value of `ignore_rpc_timeout` is `false`, so merging this PR does not change the default behavior of watchtower. Each user decides whether to ignore RPC timeouts.
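To illustrate the intended behavior, here is a minimal sketch in Python of an opt-in flag that gates alerts. It is not the watchtower's actual code. `AlertConfig`, `is_rpc_timeout`, `should_alert`, and the idea of recognizing a timeout by the `operation timed out` text quoted in the alerts above are all hypothetical. The real implementation may classify RPC errors differently.

```python
from dataclasses import dataclass

# Hypothetical: marker text taken from the alerts quoted in this PR.
# The real watchtower may detect timeouts by error type rather than text.
RPC_TIMEOUT_MARKER = "operation timed out"


@dataclass
class AlertConfig:
    """Hypothetical stand-in for the parsed watchtower CLI options."""
    ignore_rpc_timeout: bool = False  # off by default, as described in this PR


def is_rpc_timeout(error_message: str) -> bool:
    """Assumption: an RPC timeout is recognized by its error message text."""
    return RPC_TIMEOUT_MARKER in error_message


def should_alert(config: AlertConfig, error_message: str) -> bool:
    """Return True if a failed check should raise an alert."""
    if config.ignore_rpc_timeout and is_rpc_timeout(error_message):
        # Endpoint availability problem, and the user opted out: stay quiet.
        return False
    # Any other failure (or a timeout with the flag off) still alerts.
    return True


if __name__ == "__main__":
    timeout = "RPC request failed: operation timed out"
    other_failure = "validator is delinquent"  # hypothetical non-timeout error

    default = AlertConfig()
    quiet = AlertConfig(ignore_rpc_timeout=True)

    print(should_alert(default, timeout))        # True: default behavior unchanged
    print(should_alert(quiet, timeout))          # False: timeout suppressed
    print(should_alert(quiet, other_failure))    # True: other failures still alert
```

Because the flag defaults to `False`, only operators who explicitly opt in lose timeout alerts, and every other kind of failure still raises an alert.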