Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add app level retry logic for flight requests #119

Closed
wants to merge 4 commits into from

Conversation

sgrebnov
Copy link
Contributor

@sgrebnov sgrebnov commented Jan 16, 2024

#118

  1. Add app level retry logic for flight requests. Num default retries: 3.
const spiceClient = new SpiceClient(api_key);
spiceClient.setMaxRetries(5);

@sgrebnov sgrebnov changed the title [WIP] Add app level retry logic for flight requests Add app level retry logic for flight requests Jan 16, 2024

do_get.on('error', (err: any) => {
client.close();
reject(err);
Copy link
Contributor Author

@sgrebnov sgrebnov Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ There is a onData callback that could be passed to when calling query. If we fail and retry during data transmission (do_get.on) then there is a chance that we already passed some data to this callback so we might want to indicate that we restart and might want to call this callback with error instance to indicate this. The callback is NOT publicly documented so I'm not sure if it is used at all. Probably is not a big deal

if (onData) {
   onData(err);
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is documented here: https://docs.spice.ai/sdks/node.js-sdk/streaming

If we have already sent some data, we should not retry and just fail the request.

Copy link
Contributor Author

@sgrebnov sgrebnov Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to summarize - there are two places which can fail

  1. Connection/data transmission initialization

https://github.com/spiceai/spice.js/blob/trunk/src/client.ts#L71C17-L71C32

  private async getResultStream(
    queryText: string,
    getFlightClient: ((client: FlightClient) => void) | undefined = undefined
  ): Promise<EventEmitter> {
    const meta = new grpc.Metadata();
    const client: FlightClient = this.createClient(meta);
    meta.set('authorization', 'Bearer ' + this._apiKey);

    let queryBuff = Buffer.from(queryText, 'utf8');

    let flightTicket = await new Promise<Ticket>((resolve, reject) => {
      // GetFlightInfo returns FlightInfo that have endpoints with ticket to call DoGet with
      client.GetFlightInfo(
        { type: DescriptorType.CMD, cmd: queryBuff },
        (err: any, result: FlightInfo) => {
          if (err) {
            reject(err);
            return;
          }
          resolve(result.endpoint[0].ticket);
        }
      );
    });

    if (getFlightClient) {
      getFlightClient(client);
    }
    // DoGet return a stream of FlightData
    return client.DoGet(flightTicket);
  }
  1. During data transmission

When error event is emitted after connection is initialized -
https://github.com/spiceai/spice.js/pull/119/files#diff-25d66d74617fe2e23d7946bd6e3ba95640ab1b9bc8947445d604fc271c7c1f12R185

 do_get.on('error', ...
  1. From my testing 1 happens much more often than 2 so
    Option 1 - we can retry 1 case only
    Option 2 - make existing retry logic a little bit complicated and not very trivial as we would like to track if callback has been provided and we already sent some data when propagating the error to see if we this needs to be retried or failed (most likely we will need to add some flag to the error when bubbling it up to retry logic)

I propose to implement more robust Option 2 above (I'll just add additional check and flag to the error that it should not be retried in case of callback was provided and data already sent). @phillipleblanc - please let me know if you think otherwise and we should proceed with Option 2 (handle main case only).

PS. I'll also double check that data is really in the middle of transition in case 2 (as it is possible that the error is sent just after connection or after 'status' event / transmission is completed). I just see the error but not 100% sure that it could happen after 'data' event and before 'status' event (I think sever should avoid such cases as much as it can)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added logic to don't retry if some data has already been sent to callback:
c95149e

@sgrebnov
Copy link
Contributor Author

Closing in favor of #120
Created PR from the same repo instead of personal fork

@sgrebnov sgrebnov closed this Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants