launcher stops reporting to the server after an "Unavailable" grpc error #134
Comments
@zwass attached some logs from @FritzX6's Mac this morning, which have debug enabled. The relevant file is It looks like the launcher was failing to connect to the server around 9:35-9:36am this morning, but then successfully connected and got a list of distributed queries, which it executed. Then the launcher stops logging anything until about 9:55, when I killed the process. Edit: this looks suspicious to me given the interval between ~9:37 and 9:55
- Correctly detect when error channel is closed (potential fix for #134). Previously the logic was inverted for whether the channel was closed, so recovery was not initiated. Unit test TestOsqueryDies repros the suspected issue.
- Allow logger to be set properly.
- Add logging around recovery scenarios.
- Check communication with both osquery and extension server in health check (previously only the extension server was checked).
- Add healthcheck on interval that causes recovery on failure (Closes #141).
- Do not set cmd output to ioutil.Discard. Causes a bug with cmd.Wait (see golang/go#20730)
I believe this was fixed with #176. Closing it, but we can reopen if there are further reports.
User reports that once the "Unavailable desc = transport is closing" error happens, the launcher stops communicating with the server. There could be several issues (including server side) but it's important to first eliminate the launcher as a probable cause.
We need to verify whether the error causes the gRPC connection to be closed completely (not just the request failing) and, if so, either re-dial or exit fatally.
log for reference: