Remediate start-up issue and restart docker service on atd-data03 #11593

Closed
frankhereford opened this issue Mar 1, 2023 · 4 comments
Labels
Impact: 1-Severe Severely impacts TPW service delivery Need: 1-Must Have No point in delivering a solution without this Product: Moped A comprehensive mobility project tracking platform for Austin, Texas Product: Vision Zero Crash Data System Centralize the management of ATD's Vision Zero data Service: Dev Infrastructure and engineering Type: Bug Report Something is not right Type: DevOps Continuous integration pipeline operations and infrastructure Workgroup: DTS Data and Technology Services

Comments

@frankhereford
Member

CTM updated the docker service on atd-data03 last night during scheduled upgrades. The docker service on atd-data03 no longer starts, reporting:

Mar  1 07:00:59 atd-data03 dockerd: failed to start daemon: error initializing graphdriver: prior storage driver devicemapper is deprecated and will be removed in a future release; update the the daemon configuration and explicitly choose this storage driver to continue using it; visit https://docs.docker.com/go/storage-driver/ for more information
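Because this failure can come back silently after any unattended upgrade, it may help to have a check that spots it in syslog. Below is a minimal Python sketch, assuming log lines are read one at a time (for example from /var/log/messages). The function name is hypothetical, and the pattern only covers the message wording quoted above, so other Docker releases may need a different pattern.

```python
import re
from typing import Optional

# Matches the wording of the dockerd failure quoted above. Other Docker
# releases may phrase the message differently, so treat this as a heuristic.
_DEPRECATED_DRIVER = re.compile(r"prior storage driver (?P<driver>\S+) is deprecated")


def deprecated_driver_from_log(line: str) -> Optional[str]:
    """Return the storage driver named in a deprecation start-up failure, else None."""
    match = _DEPRECATED_DRIVER.search(line)
    return match.group("driver") if match else None


if __name__ == "__main__":
    sample = (
        "Mar  1 07:00:59 atd-data03 dockerd: failed to start daemon: "
        "error initializing graphdriver: prior storage driver devicemapper "
        "is deprecated and will be removed in a future release"
    )
    print(deprecated_driver_from_log(sample))  # devicemapper
```

A cron job or monitoring agent could run this over recent log lines and raise an alert whenever it returns a driver name.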

Borrowing from my comment in slack:

It looks like the docker version we rolled to has deprecated the devicemapper storage driver, which provides block devices to running docker containers. I think this warrants an issue to document this work, so I'm going to create one. The short story, though, is that I'm going to try to re-enable it with explicit flags to the service that are still available. At some point, however, we're going to be forced to roll to overlay2 or aufs, which will be a good thing, albeit perhaps a bit of a lift. I've never done it, so I'm not sure how much work it will be. More info here, see the devicemapper section.

This issue intends to track the creation of a solution for this problem and its execution.

@frankhereford frankhereford added Workgroup: DTS Data and Technology Services Type: Bug Report Something is not right Impact: 1-Severe Severely impacts TPW service delivery Service: Dev Infrastructure and engineering Need: 1-Must Have No point in delivering a solution without this Product: Moped A comprehensive mobility project tracking platform for Austin, Texas Type: DevOps Continuous integration pipeline operations and infrastructure Product: Vision Zero Crash Data System Centralize the management of ATD's Vision Zero data labels Mar 1, 2023
@frankhereford frankhereford self-assigned this Mar 1, 2023
@frankhereford
Member Author

I've edited /lib/systemd/system/docker.service to add --storage-driver devicemapper to the ExecStart line. This re-enables the deprecated devicemapper driver. The service has been restarted.
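The unit-file edit described above amounts to appending one flag to the ExecStart= line. Below is a minimal Python sketch that does this idempotently, assuming the unit keeps each ExecStart= assignment on a single line. The function name and the sample ExecStart command (the usual docker-ce one) are illustrative assumptions and were not copied from atd-data03.

```python
def add_storage_driver_flag(unit_text: str, flag: str = "--storage-driver devicemapper") -> str:
    """Return unit_text with `flag` appended to every ExecStart= command.

    Assumes each ExecStart= assignment sits on one line (no backslash
    continuations). Lines that already pass --storage-driver are left alone,
    so running this twice gives the same result as running it once.
    """
    out = []
    for line in unit_text.splitlines():
        # A bare "ExecStart=" only resets the command list, so skip it.
        is_exec_start = line.startswith("ExecStart=") and line.strip() != "ExecStart="
        if is_exec_start and "--storage-driver" not in line:
            line = f"{line.rstrip()} {flag}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical excerpt of the packaged unit; the real file on atd-data03 may differ.
    unit = (
        "[Service]\n"
        "ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock\n"
    )
    print(add_storage_driver_flag(unit))
```

After writing the patched text back to the unit file, systemd only picks up the change once `systemctl daemon-reload` has run, followed by `systemctl restart docker`.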

This is a band-aid of a fix. This is a deprecated storage driver, and docker is going to remove it at some point. There's no telling when this will be. Moving to overlay2 is my recommended route, but it's going to be sticky because there's no obvious way to migrate from one storage solution to the next. If we don't mind our docker storage devices getting wiped out, then it's no big deal, so we should be square on atd-data03. The airflow installation may be a whole other ball of wax, as there has to be some degree of state in the airflow container to maintain logs, past run results, etc. @chiaberry, I recommend we spin off an issue to do that migration on our own schedule instead of waiting until we're forced to by a CTM upgrade.
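The overlay2 move recommended above is mostly a configuration change plus a rebuild of local images and containers. Docker's documentation notes that existing images and containers become inaccessible after the storage driver is switched. Anything stateful, such as the airflow logs and run history, therefore probably needs to live in a volume or bind mount, or be exported first. A rough sequence would be:

- stop docker and back up /var/lib/docker;
- set the driver in /etc/docker/daemon.json;
- remove any --storage-driver flag from ExecStart, because dockerd refuses to start when the same option is given both as a flag and in daemon.json;
- start docker, then re-pull images and recreate containers.

Below is a minimal Python sketch of the daemon.json step only. The function name is hypothetical.

```python
import json


def set_storage_driver(daemon_json: str, driver: str = "overlay2") -> str:
    """Return /etc/docker/daemon.json text with "storage-driver" set to `driver`.

    Keys already present in the file are preserved; an empty or missing
    file (passed as "") is treated as an empty configuration.
    """
    config = json.loads(daemon_json) if daemon_json.strip() else {}
    if not isinstance(config, dict):
        raise ValueError("daemon.json must contain a JSON object")
    config["storage-driver"] = driver
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    # Hypothetical existing config; atd-data03 may have no daemon.json at all.
    print(set_storage_driver('{"log-driver": "json-file"}'))
```

overlay2 also needs a backing filesystem that supports d_type. On XFS, the CentOS default, that means the filesystem must have ftype=1, which `xfs_info` reports.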

Why is this a band-aid? It's not the most performant solution. More importantly, it's reasonable to expect that CTM will overwrite the configuration file containing the fix, and that Docker will eventually drop devicemapper support altogether.
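One way to keep the band-aid from being clobbered is to move it out of the package-owned unit file. Drop-in files under /etc/systemd/system/docker.service.d/ belong to the administrator, and package upgrades do not replace them. Below is a minimal Python sketch that builds such a drop-in, assuming the ExecStart command line is copied by hand from the packaged unit. The drop-in file name and the function name are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical drop-in file name; `systemctl edit docker` would create
# override.conf in the same directory instead.
DROPIN_PATH = PurePosixPath("/etc/systemd/system/docker.service.d/storage-driver.conf")


def build_dropin(exec_start: str, flag: str = "--storage-driver devicemapper") -> str:
    """Return drop-in text that replaces the packaged ExecStart= with one that adds `flag`.

    `exec_start` is the command line copied from the packaged unit; it is an
    input here because the exact command on atd-data03 is not known.
    """
    return (
        "[Service]\n"
        # An empty assignment first clears the ExecStart= inherited from the
        # packaged unit; otherwise systemd sees two commands for a non-oneshot
        # service and rejects the unit.
        "ExecStart=\n"
        f"ExecStart={exec_start} {flag}\n"
    )


if __name__ == "__main__":
    print(DROPIN_PATH)
    print(build_dropin("/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock"))
```

After writing the file, run `systemctl daemon-reload` and restart docker. The trade-off is that the drop-in pins a copy of the packaged command line, so a future package change to ExecStart would be masked. Setting "storage-driver" to "devicemapper" in /etc/docker/daemon.json avoids copying the command line, provided no --storage-driver flag remains on the ExecStart line.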

@frankhereford
Member Author

Turns out that atd-data02 is already using overlay2, so the above worry about migrating airflow is moot. 😅

@frankhereford
Member Author

We had to do this again on 4/25/2023. Package updates installed by CTM overwrite the systemd service definition file, and the dockerd service then fails to start. Everything here is vanilla CentOS; we need to start deviating from the way it wants to work so that we stop relying on the devicemapper storage driver.

@frankhereford
Member Author

We went back to the dance for another spin on 7/19/2023. Very interestingly, the worry about airflow is real again because now our primary airflow is on data03!

I'm going to spin off an issue to retire devicemapper on data03 while this airflow installation is still young.
