fix(agent): Clear UnitState before unscheduling on shutdown #463

bcwaldon · 2014-05-15T01:10:36Z

On graceful shutdown, an Agent must clear each individual Job's UnitState
before unscheduling it. If an Agent takes these two actions in the
opposite order, it runs the risk of deleting legitimate state from the
Registry that was published by a different Agent.
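A minimal sketch of why the order matters, using a hypothetical in-memory `Registry` rather than fleet's real one. The `onUnschedule` hook is an assumption made for illustration: it models a second Agent claiming the freed Job and publishing its own UnitState as soon as the Job is unscheduled. Only the name `RemoveUnitState` comes from this discussion; everything else is illustrative.

```go
package main

import "fmt"

// Registry is a hypothetical in-memory stand-in for the shared Registry.
type Registry struct {
	scheduledTo map[string]string // job name -> machine the job is scheduled to
	unitState   map[string]string // job name -> machine that published the UnitState
	// onUnschedule, if set, runs right after a job is unscheduled. It models
	// another Agent claiming the job and publishing fresh state.
	onUnschedule func(job string)
}

// RemoveUnitState deletes whatever state is stored for the job, no matter
// which machine published it.
func (r *Registry) RemoveUnitState(job string) {
	delete(r.unitState, job)
}

// Unschedule frees the job and then fires the hook, if any.
func (r *Registry) Unschedule(job string) {
	delete(r.scheduledTo, job)
	if r.onUnschedule != nil {
		r.onUnschedule(job)
	}
}

// shutdown releases every job in one of the two possible orders.
func shutdown(r *Registry, jobs []string, clearFirst bool) {
	for _, job := range jobs {
		if clearFirst {
			r.RemoveUnitState(job) // only our own state can be present here
			r.Unschedule(job)
		} else {
			r.Unschedule(job)
			r.RemoveUnitState(job) // may delete state published by another Agent
		}
	}
}

// simulate shuts down machine-a while machine-b is waiting to take over, and
// reports who published the surviving UnitState, if any survives.
func simulate(clearFirst bool) (string, bool) {
	const job = "web.service"
	r := &Registry{
		scheduledTo: map[string]string{job: "machine-a"},
		unitState:   map[string]string{job: "machine-a"},
	}
	r.onUnschedule = func(j string) {
		r.scheduledTo[j] = "machine-b"
		r.unitState[j] = "machine-b"
	}
	shutdown(r, []string{job}, clearFirst)
	owner, ok := r.unitState[job]
	return owner, ok
}

func main() {
	for _, clearFirst := range []bool{true, false} {
		owner, ok := simulate(clearFirst)
		fmt.Printf("clearFirst=%v stateSurvives=%v publishedBy=%q\n", clearFirst, ok, owner)
	}
}
```

In this sketch the clear-then-unschedule order leaves the second machine's state in place, while the opposite order deletes it.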

bcwaldon · 2014-05-15T01:11:00Z

This was evident in the indeterminacy of functional tests exercising unit migrations.

jonboulle · 2014-05-15T01:11:56Z

This brings up the point that it might be worthwhile blocking agents from removing anything other than their own state.

bcwaldon · 2014-05-15T01:14:37Z

@jonboulle Absolutely. I started working through a slightly different approach this afternoon, but it did not pan out. The quickest way I see to get there is to track the index of the last published state for each Job in AgentState. We can pass that index as the prevIndex query param when we call RemoveUnitState. Thoughts?

bcwaldon · 2014-05-15T01:15:14Z

I'd like to get this in as a bugfix regardless of the approach we come out of that discussion with, though

bcwaldon · 2014-05-15T01:15:26Z

It sounds suspiciously like #452

jonboulle · 2014-05-15T01:17:58Z

Yeah, for bugfix

jonboulle · 2014-05-15T01:20:29Z

I think the index tracking is OK - but wouldn't a simpler solution just be to base it on the machineid? (related point: should we be purging unitstate during Initialize() for all jobs that get unscheduled?)

bcwaldon · 2014-05-15T01:21:15Z

@jonboulle How would we identify that state is owned by a given machineid?

jonboulle · 2014-05-15T01:24:36Z

UnitState.MachineState.ID, no?

bcwaldon · 2014-05-15T01:26:18Z

@jonboulle So we would have to do a GET on the unit state, compare the ID to the local ID, then issue a compareAndDelete with prevIndex set to the modifiedIndex from the initial GET. That's the same as tracking the indexes internally, but without the actual tracking part (which I like). Let's do that.
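The GET, compare, compareAndDelete sequence could look roughly like the sketch below. It runs against a hypothetical in-memory `Store` that mimics a per-key modified index; `Store`, `RemoveOwnUnitState`, and `ErrCompareFailed` are names invented here and are not fleet or etcd APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stored unit state: the ID of the machine that
// published it, plus the store index at which it was last written.
type node struct {
	machineID     string
	modifiedIndex uint64
}

// Store is a hypothetical in-memory key-value store with a global index
// that increases on every write.
type Store struct {
	index uint64
	nodes map[string]node
}

// ErrCompareFailed is returned when the key is missing or its index has
// moved on since the caller last read it.
var ErrCompareFailed = errors.New("compare failed")

func NewStore() *Store {
	return &Store{nodes: make(map[string]node)}
}

// Set writes the state for key and returns the new modified index.
func (s *Store) Set(key, machineID string) uint64 {
	s.index++
	s.nodes[key] = node{machineID: machineID, modifiedIndex: s.index}
	return s.index
}

// Get returns the current state for key, if any.
func (s *Store) Get(key string) (node, bool) {
	n, ok := s.nodes[key]
	return n, ok
}

// CompareAndDelete deletes key only if its modified index equals prevIndex.
func (s *Store) CompareAndDelete(key string, prevIndex uint64) error {
	n, ok := s.nodes[key]
	if !ok || n.modifiedIndex != prevIndex {
		return ErrCompareFailed
	}
	delete(s.nodes, key)
	return nil
}

// RemoveOwnUnitState removes the state for key only when it was published
// by localID. It reports whether anything was deleted.
func RemoveOwnUnitState(s *Store, key, localID string) (bool, error) {
	n, ok := s.Get(key) // step 1: GET the current state
	if !ok {
		return false, nil // nothing to remove
	}
	if n.machineID != localID { // step 2: compare the owner to the local ID
		return false, nil // another machine's state, leave it alone
	}
	// step 3: delete only if nothing was written since the GET
	if err := s.CompareAndDelete(key, n.modifiedIndex); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	s := NewStore()
	s.Set("web.service", "machine-b")
	removed, err := RemoveOwnUnitState(s, "web.service", "machine-a")
	fmt.Println("removed:", removed, "err:", err)
}
```

If another Agent republishes between the GET and the delete, the index check fails and the newer state survives, which is the property the discussion is after.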

jonboulle · 2014-05-15T01:29:03Z

👍

bcwaldon added a commit that referenced this pull request May 15, 2014

Merge pull request #463 from bcwaldon/agent-shutdown-order

20cd7e4

fix(agent): Clear UnitState before unscheduling on shutdown

bcwaldon merged commit 20cd7e4 into coreos:master May 15, 2014

bcwaldon deleted the agent-shutdown-order branch May 15, 2014 14:48

bcwaldon mentioned this pull request May 15, 2014

Compare existing UnitState before removing from Registry #465

Closed
