Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding logic to Patch ETCD apiserver client crt secret on rollout #56

Merged
merged 1 commit into from
Dec 5, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 12 additions & 1 deletion controllers/certs.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ import (
"sigs.k8s.io/cluster-api/util/certs"
"sigs.k8s.io/cluster-api/util/secret"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
"sigs.k8s.io/etcdadm/certs/pkiutil"
"sigs.k8s.io/etcdadm/constants"
)
Expand Down Expand Up @@ -87,7 +88,17 @@ func (r *EtcdadmClusterReconciler) generateCAandClientCertSecrets(ctx context.Co
Generated: true,
}
s := apiServerClientCertKeyPair.AsSecret(client.ObjectKey{Name: cluster.Name, Namespace: cluster.Namespace}, *metav1.NewControllerRef(etcdCluster, etcdv1.GroupVersion.WithKind("EtcdadmCluster")))
if err := r.Client.Create(ctx, s); err != nil && !apierrors.IsAlreadyExists(err) {
secretToPatch := s.DeepCopy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont quite understand why we need to have a deep copy of the secret

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically its for how CreateOrPatch works. If the object is found, its expected that the changes needed should be redone in a call back func.

So on line 98 s.Data = secretToPatch.Data i am updating the secret data again using the deep copied object.


// CreateOrPatch performs a create operation when the object is not found.
// But if an object is found, the function expects to reconcile the fields we want patched in a callback func.
// CreateOrPatch does a GET call and overwrites the object we pass in with whats on the cluster.
// Hence we keep a copy of the newly generated secret and update the secret Data field in a callback func.
// Ex; https://github.com/kubernetes-sigs/controller-runtime/blob/v0.14.5/pkg/controller/controllerutil/example_test.go
if _, err := controllerutil.CreateOrPatch(ctx, r.Client, s, func() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is another secret create in

if err := r.Client.Create(ctx, s); err != nil && !apierrors.IsAlreadyExists(err) {

what does it do and why we do not need to make similar change like this api-server one?

s.Data = secretToPatch.Data
return nil
}); err != nil {
return fmt.Errorf("failure while saving etcd client key and certificate: %v", err)
}

Expand Down
15 changes: 15 additions & 0 deletions controllers/controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -312,6 +312,8 @@ func (r *EtcdadmClusterReconciler) reconcile(ctx context.Context, etcdCluster *e
}
}
conditions.MarkFalse(ep.EC, etcdv1.EtcdMachinesSpecUpToDateCondition, etcdv1.EtcdRollingUpdateInProgressReason, clusterv1.ConditionSeverityWarning, "Rolling %d replicas with outdated spec (%d replicas up to date)", len(needRollout), len(ep.Machines)-len(needRollout))
conditions.MarkFalse(ep.EC, etcdv1.EtcdCertificatesAvailableCondition, etcdv1.EtcdRollingUpdateInProgressReason, clusterv1.ConditionSeverityWarning, "Rolling %d replicas with outdated spec (%d replicas up to date)", len(needRollout), len(ep.Machines)-len(needRollout))

return r.upgradeEtcdCluster(ctx, cluster, etcdCluster, ep, needRollout)
default:
// make sure last upgrade operation is marked as completed.
Expand All @@ -325,6 +327,19 @@ func (r *EtcdadmClusterReconciler) reconcile(ctx context.Context, etcdCluster *e
delete(etcdCluster.Annotations, etcdv1.UpgradeInProgressAnnotation)
}
}

// If the ETCD nodes have performed a rollout, the etcd client certs on the CP nodes need to be renewed.
// We mark the condition EtcdCertificatesAvailable False in the rollout case and check for its value.
// The default case is hit right after ScaleUp of ETCD nodes is completed and before the first CP comes up.
// If EtcdCertificatesAvailable is False, this means we need to update the certs.
// EtcdCertificatesAvailable is set to True once the certs are updated.
if conditions.IsFalse(ep.EC, etcdv1.EtcdCertificatesAvailableCondition) {
log.Info("Updating Etcd client certs")
if err := r.generateCAandClientCertSecrets(ctx, cluster, etcdCluster); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the actual external etcd CA cert + key pair are not regenerated after the etcd machines are rolled out right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the ETCD certs on the ETCD nodes do get updated but thats with etcdadm, its not related to what we do here.

That was working, the problem was only that the secret on the cluster was not updated and thereby the cert on the CP was not updated.

r.Log.Error(err, "error generating etcd CA certs")
return ctrl.Result{}, err
}
}
}

switch {
Expand Down
56 changes: 56 additions & 0 deletions controllers/controller_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -402,6 +402,62 @@ func TestReconcileDeleteOutdatedMachines(t *testing.T) {
g.Expect(machineList.Items).To(BeEmpty())
}

func TestReconcileNeedsRollOutEtcdCluster(t *testing.T) {
g := NewWithT(t)

cluster := newClusterWithExternalEtcd()
etcdadmCluster := newEtcdadmCluster(cluster)

// CAPI machine controller has set status.Initialized to true, after the first etcd Machine is created, and after creating the Secret containing etcd init address
etcdadmCluster.Status.Initialized = true
etcdadmCluster.Spec.Replicas = pointer.Int32(int32(1))

// etcdadm controller has also registered that the status.Initialized field is true, so it has set InitializedCondition to true
conditions.MarkTrue(etcdadmCluster, etcdv1.InitializedCondition)
machine := newEtcdMachine(etcdadmCluster, cluster)
machine.Spec.Bootstrap = clusterv1.Bootstrap{
ConfigRef: &corev1.ObjectReference{
Name: testClusterName,
Namespace: testNamespace,
},
}

// EtcdadmConfig with a different spec to trigger rollout.
etcdadmConfig := &etcdbootstrapv1.EtcdadmConfig{
ObjectMeta: metav1.ObjectMeta{
Namespace: testNamespace,
Name: testClusterName,
},
Spec: etcdbootstrapv1.EtcdadmConfigSpec{
EtcdadmInstallCommands: []string{"etcdadmInstallCommands is not empty"},
CloudInitConfig: &etcdbootstrapv1.CloudInitConfig{
Version: "v3.4.9",
},
},
}

objects := []client.Object{
cluster,
etcdadmCluster,
infraTemplate.DeepCopy(),
machine,
etcdadmConfig,
}
fakeClient := fake.NewClientBuilder().WithScheme(setupScheme()).WithObjects(objects...).Build()

r := &EtcdadmClusterReconciler{
Client: fakeClient,
uncachedClient: fakeClient,
Log: log.Log,
}
_, err := r.Reconcile(ctx, ctrl.Request{NamespacedName: util.ObjectKey(etcdadmCluster)})
g.Expect(err).NotTo(HaveOccurred())

machineList := &clusterv1.MachineList{}
g.Expect(fakeClient.List(context.Background(), machineList, client.InNamespace("test"))).To(Succeed())
g.Expect(len(machineList.Items)).To(Equal(2))
}

// newClusterWithExternalEtcd return a CAPI cluster object with managed external etcd ref
func newClusterWithExternalEtcd() *clusterv1.Cluster {
return &clusterv1.Cluster{
Expand Down