Skip to content
This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

fix(agent): Clear UnitState before unscheduling on shutdown #463

Merged
merged 1 commit into from
May 15, 2014

Conversation

bcwaldon
Copy link
Contributor

On graceful shutdown, an Agent must clear each individual Job's UnitState
before unscheduling it. If an Agent takes these two actions in the
opposite order, it runs the risk of deleting legitimate state from the
Registry that was published by a different Agent.

On graceful shutdown, an Agent must clear each individual Job's UnitState
before unscheduling it. If an Agent takes these two actions in the
opposite order, it runs the risk of deleting legitimate state from the
Registry that was published by a different Agent.
@bcwaldon
Copy link
Contributor Author

This was evident in the indeterminacy of functional tests exercising unit migrations.

@jonboulle
Copy link
Contributor

This brings up the point that it might be worthwhile blocking agents from removing anything other than their own state.

@bcwaldon
Copy link
Contributor Author

@jonboulle Absolutely. I started working through a slightly different approach this afternoon, but it did not pan out. The quickest way I see to get there is to track the index of the last published state for each Job in AgentState. We can use that when we call RemoveUnitState in the prevIndex query param. Thoughts?

@bcwaldon
Copy link
Contributor Author

I'd like to get this in as a bugfix regardless of the approach we come out of that discussion with, though

@bcwaldon
Copy link
Contributor Author

It sounds suspiciously like #452

@jonboulle
Copy link
Contributor

Yeah, :shipit: for bugfix

@jonboulle
Copy link
Contributor

I think the index tracking is OK - but wouldn't a simpler solution just be to base it on the machineid? (related point: should we be purging unitstate during Initialize() for all jobs that get unscheduled?)

@bcwaldon
Copy link
Contributor Author

@jonboulle How would we identify that state is owned by a given machineid?

@jonboulle
Copy link
Contributor

UnitState.MachineState.ID, no?

@bcwaldon
Copy link
Contributor Author

@jonboulle So we would have to do a GET on the unit state, compare the ID to the local ID, then issue a compareAndDelete with the prevIndex the modifiedIndex of the initial GET. That's the same as tracking the indexes internally, but without the actual tracking part (which I like). Let's do that.

@jonboulle
Copy link
Contributor

👍

bcwaldon added a commit that referenced this pull request May 15, 2014
fix(agent): Clear UnitState before unscheduling on shutdown
@bcwaldon bcwaldon merged commit 20cd7e4 into coreos:master May 15, 2014
@bcwaldon bcwaldon deleted the agent-shutdown-order branch May 15, 2014 14:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants