-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stack error reasons on block retrieving failure [JIRA: RCS-223] #1177
Conversation
Sorry = {error, notfound}, | ||
UUID, BlockNumber, _RcPid, MaxRetries, ErrorReasons) | ||
when is_list(ErrorReasons) andalso length(ErrorReasons) > MaxRetries -> | ||
Sorry = {error, hd(ErrorReasons)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder returning whole reasons or just head of this, which is better ... riak_cs_get_fsm
handles this Reason just for logging. So maybe this should be rather a list of errors than the head of errors.
I also wonder right way to stack all errors, to add source of errors like from local get or remote. |
TL;DR; In my humble opinion, return all stacked errors to caller is It's difficult question: which error is most significant ;) There are three kinds of "get" in block server (excluding legacy
Possible usual error cases and triage of them:
Some other considerations:
Adding classification sounds nice :) e.g.
|
Added three indicator: |
{error, notfound} -> | ||
RetryFun(failure); | ||
RetryFun([{local_quorum, notfound}|ErrorReasons]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch indicates
- get object failed by notfound with default N (usually 3) and
- proxy get is disabled or not-used (because local cluster is origin).
There is no strong reason that retry make the operation succeed after this error.
Original "immediately fail" seems reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was not to change original behaviour. If we change original behaviour, it'd be another pull request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behaivior is changed.
Original will call RetryFun
with the atom failure
and original
do_get_block/11
was implemented as follows:
do_get_block(ReplyPid, _Bucket, _Key, _ClusterID, _UseProxyGet, _ProxyActive,
UUID, BlockNumber, _RcPid, MaxRetries, NumRetries)
when is_atom(NumRetries) orelse NumRetries > MaxRetries ->
Sorry = {error, notfound},
then there is no retry, immediate failure.
Updated. Looks like some riak_tests passing. |
{error, _} -> | ||
RetryFun(NumRetries + 1) | ||
{error, Reason} -> | ||
RetryFun([{remote_quorum, Reason}|ErrorReasons]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{local_quorum, notfound}
is missing in the stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops
b50b8ce
to
51374e9
Compare
"May the force-push be with you." "Use the force-push, Luke!" |
This PR improves visibility inside block object fetch failures 👍 🎢 📇 |
Stack error reasons on block retrieving failure [JIRA: RCS-223] Reviewed-by: shino
@borshop merge |
For release note: On retrieving blocks there is a complex logic to resolve blocks when GET is requested from client. First, CS tries to retrieve a block with _[posted via JIRA by Kota Uenishi]_ |
No description provided.