Skip to content
This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

fleetd: check that eStream is available before accessing to that #1623

Merged

Conversation

dongsupark
Copy link
Contributor

@dongsupark dongsupark commented Jul 1, 2016

To avoid potential nil-dereferences in agent reconciliation, check that eStream is available before accessing to eStream. Otherwise, just return.

Example error logs:

ERROR server.go:192: Server register machine failed: 105: Key already exists
(/_coreos.com/fleet/machines/d5ca2e9d4039e480807464280c9ab4a2/object) [203153266]
INFO server.go:188: hrt.Register() success
INFO server.go:198: Starting server components
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x4bc024]
goroutine 49 [running]:
panic(0xc06080, 0xc82000a0a0)
        /usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/coreos/fleet/pkg.(*reconciler).Run.func1(0xc8201cef00, 0xc8201b9560, 0xc8201cf2c0)
/opt/fleet/gopath/src/github.com/coreos/fleet/pkg/reconcile.go:65 +0x74
created by github.com/coreos/fleet/pkg.(*reconciler).Run
/opt/fleet/gopath/src/github.com/coreos/fleet/pkg/reconcile.go:69 +0x89
systemd[1]: fleet.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: fleet.service: Unit entered failed state.
systemd[1]: fleet.service: Failed with result 'exit-code'.

Fixes: #1622

Print "Server register machine failed" as warning instead of error,
to avoid getting users confused by the message that it would look
like a critical error.
To avoid potential nil-dereferences in agent reconciliation, check
that eStream is available before accessing to eStream. Otherwise,
just return.

Example error logs:
====
ERROR server.go:192: Server register machine failed: 105: Key already exists
(/_coreos.com/fleet/machines/d5ca2e9d4039e480807464280c9ab4a2/object) [203153266]
INFO server.go:188: hrt.Register() success
INFO server.go:198: Starting server components
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x4bc024]
goroutine 49 [running]:
panic(0xc06080, 0xc82000a0a0)
        /usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/coreos/fleet/pkg.(*reconciler).Run.func1(0xc8201cef00, 0xc8201b9560, 0xc8201cf2c0)
/opt/fleet/gopath/src/github.com/coreos/fleet/pkg/reconcile.go:65 +0x74
created by github.com/coreos/fleet/pkg.(*reconciler).Run
/opt/fleet/gopath/src/github.com/coreos/fleet/pkg/reconcile.go:69 +0x89
systemd[1]: fleet.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: fleet.service: Unit entered failed state.
systemd[1]: fleet.service: Failed with result 'exit-code'.
====

Fixes: coreos#1622
@dongsupark dongsupark force-pushed the dongsu/fleetd-reconc-nilderef-fix branch from aa6b7df to f591070 Compare July 1, 2016 14:36
@jonboulle
Copy link
Contributor

when would this ever be nil? Isn't that a bug in other code?

@dongsupark
Copy link
Contributor Author

@jonboulle
To be honest, I don't know.
AFAICT the only case of eStream being nil is when disable_watches is turned on. (See https://github.com/coreos/fleet/blob/master/server/server.go#L112) disable_watches defaults to false. If disable_watches is set to true by user, rStream (eStream) could be nil.
The thing is, I cannot reproduce the crash. Even if the commit could fix the case, I'm not sure if that's the same one as what @ewoutp has seen. So if anyone has a better idea, please let me know.

@ewoutp
Copy link
Contributor

ewoutp commented Jul 4, 2016

I do have disabled_watches turned on.

@ewoutp
Copy link
Contributor

ewoutp commented Jul 4, 2016

This is the config of one of my machines (only METADATA differs per machines)

[Service]
Environment=FLEET_METADATA=region=fr-1,odd=true,core=true,lb=true,gogs-sync=true,registry-sync=true
Environment=FLEET_PUBLIC_IP=192.168.35.3
Environment=FLEET_AGENT_TTL=30s
Environment=FLEET_DISABLE_ENGINE=false
Environment=FLEET_DISABLE_WATCHES=true
Environment=FLEET_ENGINE_RECONCILE_INTERVAL=10
Environment=FLEET_TOKEN_LIMIT=50

@dongsupark
Copy link
Contributor Author

I do have disabled_watches turned on.

@ewoutp Aha. Then it makes sense. Hopefully this PR could fix the bug in your case. Thanks!

@dongsupark dongsupark merged commit f14967a into coreos:master Jul 7, 2016
@dongsupark
Copy link
Contributor Author

Merged. Thanks.

@dongsupark dongsupark deleted the dongsu/fleetd-reconc-nilderef-fix branch July 7, 2016 13:15
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants