BACKUP AND RESTORE

File metadata and controls

102 lines (62 loc) · 11.9 KB

KubeDirector includes some support for working in concert with a K8s (Kubernetes) resource backup solution. This document describes what this support entails (and what it does not), along with some specific tips for backing up kdclusters (KubeDirector clusters) using the Velero backup solution.

NON-GOALS

First: KubeDirector is not, itself, a backup solution. It also does not address the complexities of backing up native K8s resources such as Services and StatefulSets. You need to use a dedicated K8s backup solution for such things. The KubeDirector features described in this document revolve around properly managing KubeDirector's custom resource types when they are backed up and restored.

KubeDirector also does not yet provide hooks for automatically quiescing either application activity or KubeDirector activity during a backup. If an application must be quiesced for backup, that currently must be handled outside of KubeDirector's configurations/policies. The "BACKUP PREPARATION" section below discusses considerations for whether KubeDirector activities themselves must be quiesced for backup.

Finally, it's worth mentioning that only the kdcluster resource needs special handling for backup and restore. The (less-complex) kdapp and kdconfig resources currently have no issues that need to be addressed here.

GOALS

A K8s backup solution will, broadly speaking, interact with the K8s API to read a set of resources for backup. On restore, it will use the K8s API to re-create those resources. There are many subtleties and corner cases to handle in this process, for example dealing with PV content and its relationship to PVCs, or handling fields that were originally generated by K8s rather than user-specified (such as a dynamically allocated node port of a NodePort Service). But for the purposes of discussing KubeDirector support for backup-and-restore, the general concept of backup-and-restore as "read a bunch of resources" and then "re-create a bunch of resources" is sufficient. We will take it as given that the backup solution correctly handles the subtleties of that process when it comes to native K8s resources.

Saving and then restoring a kdcluster through this process triggers the first issue that KubeDirector must deal with: when you create a kdcluster, even if you are really intending to "re-create" a previously existing kdcluster, you cannot specify an initial status stanza in the resource. This is problematic for a "re-creation" since the status stanza contains KubeDirector's state for managing that kdcluster -- particularly which native resources make up the kdcluster, and whether there are any pending notifications to be sent to script hooks in member pods.

The first goal is to allow a restore process to restore kdcluster status.

The not-necessarily-ordered nature of the restore process triggers a second issue: when a kdcluster is being reconciled by KubeDirector, how can KubeDirector distinguish, for example, the situation of "a necessary Service is missing" from the situation of "a necessary Service will be restored by the backup solution in a second or two"? If it jumps the gun by inferring the first situation when it is actually in the second, then -- depending on how KubeDirector and/or the backup solution handle things -- we will end up either with a duplicate Service or with an error from the restore process.

The second goal is to pause reconciliation on a kdcluster until it has been restored.

Finally: The fact that objects are actually re-created for a "restore" means that the UID of the object changes. KubeDirector makes use of owner references between objects to tie their lifecycle to the kdcluster, and the changing of UIDs breaks these references.

The third goal is to repair owner references on kdcluster components.

BACKUP CONFIGURATION

The main configuration necessary to support kdcluster backup-and-restore is to set the "backupClusterStatus" property to true in your kd-global-config resource. The default is false.

    backupClusterStatus: true

If your config resource was created while running a previous version of KubeDirector that did not support that property, you can manually add the property.
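For example -- assuming the config CR uses the conventional kd-global-config name, lives in the namespace where KubeDirector is deployed, and keeps this property under its spec (a sketch, not a definitive recipe) -- a patch along these lines would add it:

    # Enable mirroring of kdcluster status into kdstatusbackup CRs.
    # Adjust the namespace to wherever your KubeDirector config resource lives.
    kubectl patch kdconfig kd-global-config -n my-kd-namespace \
        --type=merge -p '{"spec":{"backupClusterStatus":true}}'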

As long as this property is true, the status stanza from each kdcluster will be mirrored in a kdstatusbackup CR of the same name. This CR must be captured by the backup process -- along with, of course, the kdcluster itself, the kdapp used by the kdcluster, and the native K8s resources. When a restore happens, KubeDirector will load the status from this CR and return it to the kdcluster's status stanza. This CR should only be of interest to KubeDirector and the backup solution; no other K8s client will typically need to see it.

This addresses the first of the three goals mentioned above.
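As a hedged illustration of capturing all of those resources together with Velero: a whole-namespace backup will pick up the kdclusters, their kdstatusbackup CRs, the kdapps (if they are in the same namespace), and the component native resources. The backup and namespace names below are placeholders:

    # Back up everything in the namespace containing the kdcluster(s),
    # including the kdstatusbackup CRs that mirror kdcluster status.
    velero backup create kd-namespace-backup --include-namespaces my-kd-namespace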

You may also need some configuration to properly support your backup solution of choice. For example, if using Velero, you want to avoid trying to back up the tmpfs-tmp, tmpfs-run, and tmpfs-run-lock volumes in the pods generated for kdclusters. You can do this with the following lines in your kd-global-config:

    podAnnotations:
      backup.velero.io/backup-volumes-excludes: tmpfs-tmp,tmpfs-run,tmpfs-run-lock

In general, the "podAnnotations", "podLabels", "serviceAnnotations", and "serviceLabels" properties in kd-global-config will come in handy if your backup solution requires particular labels or annotations on the pods or services that are generated for kdcluster members.
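For illustration only -- the label and annotation keys below are hypothetical placeholders, not keys required by any particular backup solution -- such settings would look like this in kd-global-config:

    podLabels:
      # hypothetical key your backup tool might require on generated pods
      backup-solution.example.com/include: "true"
    serviceAnnotations:
      # hypothetical key your backup tool might require on generated services
      backup-solution.example.com/policy: "default"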

Finally, you should choose how to handle resources specified as "connections" for a kdcluster. As with any resource, they are not guaranteed to be in the backup; a connection resource might not even have existed before the backup. And if they do get restored, they might be restored after the kdcluster. In the general case it is OK for a kdcluster to resume reconciliation before its connections reappear; when they reappear, its members will get a "reconnect" notify on their startscripts. However, you may be using apps that were written to assume that connected resources always exist and that their properties of interest are immutable; those apps may not implement a response to "reconnect". The "allowRestoreWithoutConnections" property in kd-global-config lets you decide how to deal with this situation:

    allowRestoreWithoutConnections: false

If this is set to false (the default), then a kdcluster will not automatically resume reconciliation while some of its connected resources are absent -- unless reconciliation is manually forced to resume as described below. If set to true, however, the presence of connections will not be a consideration in the decision to resume reconciliation.

BACKUP PREPARATION

As mentioned above, KubeDirector does not provide automation for quiescing KubeDirector activity during a backup.

If a kdcluster is being created, edited, or deleted while a backup is in progress, a particularly unfortunate interleaving of resource capture with the application of the kdcluster operation could result in a StatefulSet, Service, or even PV that exists in the backup set without being referenced by any captured kdcluster. For StatefulSets and Services, such "orphan" resources can be recognized by a "kubedirector.hpe.com/kdcluster" label that identifies a nonexistent kdcluster, or a kdcluster whose status does not validate them as component resources. For PVs the situation is more complicated; they would have a claimRef to a nonexistent PVC, but that could be an expected temporary condition for a perfectly valid PV in your system.
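One manual way to look for candidate orphans after a restore (a sketch; the namespace name is a placeholder) is to list the labeled resources and compare the label values against the kdclusters that actually exist:

    # Show StatefulSets and Services along with their kdcluster label values...
    kubectl get statefulsets,services -n my-kd-namespace -L kubedirector.hpe.com/kdcluster
    # ...and compare against the kdclusters present in the namespace.
    kubectl get kdclusters -n my-kd-namespace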

Future KubeDirector work will improve reporting on (and possibly cleaning up) such orphan resources, as well as explore support for integrations that could ask KubeDirector to pause its work on a set of kdclusters being backed up. However, for the current release we recommend avoiding kdcluster create/edit/delete in a namespace that is in the process of being backed up.

Note that if a kdcluster has queued-up notifications that are waiting to be sent to a dead member pod (when/if that pod is resurrected), that state of affairs will be properly captured and restored.

AUTOMATIC RESTORE MANAGEMENT

When a kdcluster is restored from backup, KubeDirector will recognize this situation (from annotations on the kdcluster) and initially not do any reconciliation. Reconciliation will resume on the kdcluster when its kdstatusbackup, kdapp, and all component native resources have been restored. If for whatever reason this is not going to happen, you can choose to manually delete the kdcluster, or to manually force it to resume reconciliation.

This addresses the second of the three goals mentioned above.

When a kdcluster is in this "paused" state while being restored, its status stanza will include a "restoreProgress" property, an object that includes three boolean flags and an optional error message string. To begin with it will look like this:

    restoreProgress:
      awaitingApp: true
      awaitingResources: true
      awaitingStatus: true
      error: ""

The "awaitingApp" flag will switch to false if the relevant kdapp is restored. The "awaitingStatus" flag becomes false when the relevant kdstatusbackup is restored. Once status is restored, the "awaitingResources" flag can become false if all resources named in the status are restored. (If "allowRestoreWithoutConnections" is false in the KD config, connected resources also affect this flag.) Once all of these flags are false, KubeDirector will automatically resume reconciliation for this kdcluster.

If/when reconciliation resumes, the kdcluster will be validated, running through the same series of checks that would normally be used when the kdcluster is initially created. If this validation fails then reconciliation cannot resume, and the "error" field of "restoreProgress" will describe the validation failure.

Note that until reconciliation resumes, any attempt to edit the kdcluster spec or to delete the kdcluster will be rejected. If you absolutely need to delete it, see the next section below.

MANUAL RESTORE MANAGEMENT

Ideally the automatic process will be the usual one, but there may be situations where you will want to manually intervene.

If you need to force reconciliation to resume, you can edit the kdcluster to remove the "kubedirector.hpe.com/restoring" label (see the example after these notes). Two things to note about this in particular:

  • If the restore process is still going on, manually forcing the resumption of reconciliation can result in duplicate resources, or errors reported in the restore result.
  • If the kdcluster is not valid, the attempt to remove the label will be rejected. The "error" field of "restoreProgress" will describe the validation failure.
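With those caveats in mind, removing the label is an ordinary label edit; the kdcluster and namespace names below are placeholders:

    # The trailing "-" removes the label, allowing reconciliation to resume.
    kubectl label kdcluster my-cluster -n my-kd-namespace kubedirector.hpe.com/restoring-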

Any attempt to manually delete a kdcluster while it still has a "kubedirector.hpe.com/restoring" label will be rejected, since that could also leave orphan resources from the restore (even if the restore process is complete; see the next section below).

However if you truly do need to delete the kdcluster -- and you either cannot (because of validation failures) or do not wish to remove the "restoring" label -- then you can manually add the "kubedirector.hpe.com/allow-delete-while-restoring" label to the kdcluster. (The label value does not matter.) Once this label is in place, the kdcluster may be deleted even if it is still in "restoring" mode.
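For example (the kdcluster and namespace names are placeholders, and the label value is arbitrary since only the label's presence matters):

    # Mark the kdcluster as deletable despite being in "restoring" mode...
    kubectl label kdcluster my-cluster -n my-kd-namespace kubedirector.hpe.com/allow-delete-while-restoring=true
    # ...and then delete it.
    kubectl delete kdcluster my-cluster -n my-kd-namespace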

AFTER RESTORE

Once KubeDirector resumes reconciliation on a kdcluster, it will repair the owner references as necessary on any of that kdcluster's component resources. This is always an automatic process that does not need any user attention.

This addresses the last of the three goals mentioned above.

The only reason a user might want to be aware of this behavior is that if a kdcluster is forcibly deleted before it resumes reconciliation, its component native K8s resources will not get automatically "cleaned up" since they do not have valid owner references.