
[system test] Fix downgrade procedure #11154

Open · wants to merge 1 commit into main

Conversation

see-quick (Member) commented Feb 18, 2025

Type of change

  • Bugfix
  • Enhancement / new feature

Description

This PR refactors our Strimzi upgrade/downgrade procedures. It also changes the downgrade process (motivated by [1]). Previously, we downgraded the CO first, then Kafka. In this PR, I reverse that order.

[1] - #11144.

Checklist

  • Write tests
  • Make sure all tests pass

Signed-off-by: see-quick <maros.orsak159@gmail.com>
see-quick added this to the 0.46.0 milestone Feb 18, 2025
see-quick requested review from a team February 18, 2025 12:25
see-quick self-assigned this Feb 18, 2025
see-quick (Member, Author)

/azp run upgrade


Azure Pipelines successfully started running 1 pipeline(s).

PaulRMellor (Contributor) left a comment


Switch makes sense as reflected in the docs

scholzj (Member) commented Feb 18, 2025

This PR refactors our Strimzi upgrade/downgrade procedures. It also changes the downgrade process (motivated by [1]). Previously, we downgraded the CO first, then Kafka. In this PR, I reverse that order.

If you did this, it would not have worked - at least until #10929 is included in the older CO version - so I'm not sure this PR / the old code was really doing what you say it is doing.

see-quick (Member, Author)

This PR refactors our Strimzi upgrade/downgrade procedures. It also changes the downgrade process (motivated by [1]). Previously, we downgraded the CO first, then Kafka. In this PR, I reverse that order.

If you did this, it would not have worked - at least until #10929 is included in the older CO version - so I'm not sure this PR / the old code was really doing what you say it is doing.

where specifically should it fail?

scholzj (Member) commented Feb 18, 2025

If you downgrade the CO without downgrading Kafka first, then either the Kafka cluster runs an unknown Kafka version and gets stuck, or it already runs a supported Kafka version, in which case you did everything correctly and there is no reason to change anything in the procedure.

see-quick (Member, Author)

If you downgrade the CO without downgrading Kafka first, then either the Kafka cluster runs an unknown Kafka version and gets stuck, or it already runs a supported Kafka version, in which case you did everything correctly and there is no reason to change anything in the procedure.

Hmmm, the way we currently do it, the Kafka version at the start of the downgrade procedure is 3.8.0. So when we downgrade, we stay on the same Kafka version, which is why I think it works for now.

```yaml
metadataVersion: "3.8"
version: 3.8.0
```

Should we start from 3.9.0 and downgrade to 3.8.0 or 3.8.1?
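For context, the two lines quoted above correspond to fields in the `spec.kafka` section of the Kafka custom resource. A minimal sketch of that fragment (the resource name is illustrative, not taken from the test suite):

```yaml
# Hypothetical fragment of the Kafka CR at the start of the downgrade test.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.8.0          # broker version deployed before the CO downgrade
    metadataVersion: "3.8"  # KRaft metadata version; must be supported by the older CO
```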

scholzj (Member) commented Feb 18, 2025

Well, you skip the first step and start with the oldest version because:

  • Downgrade from 3.9.0 to 3.8.0 might not be possible (if, for example, you already start with metadata version 3.9)
  • It is covered in a separate test

But that does not mean you are somehow out of sync with the docs, or that there is anything to change just because the docs changed.

ppatierno (Member) commented Feb 19, 2025

It is covered in a separate test

Wdym? I am lost now. Do we already have a test for the right downgrade order? Assuming you have Kafka 3.9 (with metadata 3.8) and you want to downgrade, we should have a test doing: downgrade Kafka from 3.9 to 3.8 (which is still possible because it is using 3.8 metadata) and then the operator.

The documentation now describes the correct order. Of course, as you said, downgrading Kafka is possible only if the metadata is (already) "older" and supported.

im-konge (Member) commented Feb 19, 2025

There are a few things we do as preparation before running the downgrade test case:

  • we take the deployKafkaVersion from BundleDowngrade.yaml - currently 3.8.0
  • we take all the Kafka versions supported by the latest CO from kafka-versions.yaml, and from there we take the oldest supported Kafka version - currently 3.8.0

In the code we then do this:

```java
String lowerMetadataVersion = downgradeData.getProcedures().getMetadataVersion();

UpgradeKafkaVersion downgradeKafkaVersion = new UpgradeKafkaVersion(downgradeData.getDeployKafkaVersion(), lowerMetadataVersion);
```

where getDeployKafkaVersion() returns the value from BundleDowngrade.yaml and lowerMetadataVersion is the version taken from kafka-versions.yaml. So with the values currently present in both files, it looks like this:

```yaml
version: 3.8.0
metadataVersion: 3.8
```

That's why we have it like that. You can change the deployKafkaVersion to 3.9.0, but you have to be sure that the downgrade path is supported and that you have the 3.8 metadataVersion.

IIRC we had a discussion around this before; the way it is now, we are verifying that the older CO will handle the rolling update of a supported Kafka version once it (the CO) is downgraded. The Kafka version downgrade is handled anyway, as @scholzj mentioned, in different tests - KRaftKafkaUpgradeDowngradeST for example. The KRaftStrimziDowngradeST should really be about downgrading the CO and verifying that, after the CO downgrade, all of the components are rolled correctly, with the right images.

So maybe - it's just my opinion - we can remove the upgrade/downgrade of Kafka from there, because it's handled elsewhere anyway. We can then just pick the oldest version supported by the current (latest) CO and deploy that one.
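The "take the oldest supported Kafka version" step described above can be sketched in plain Java. This is a minimal illustration, not the real test-suite code: the hard-coded list stands in for parsing kafka-versions.yaml, and OldestSupportedVersion is a hypothetical helper class.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class OldestSupportedVersion {

    // Picks the oldest version from the versions supported by the latest CO,
    // comparing numerically segment by segment.
    static String oldest(List<String> supported) {
        return supported.stream()
                .min(Comparator.comparing(OldestSupportedVersion::key, Arrays::compare))
                .orElseThrow();
    }

    // Parses "3.8.0" into {3, 8, 0} so segments compare as numbers.
    static int[] key(String version) {
        return Arrays.stream(version.split("\\."))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        // Illustrative stand-in for the versions listed in kafka-versions.yaml.
        List<String> supported = List.of("3.9.0", "3.8.0", "3.8.1");
        System.out.println(oldest(supported)); // prints 3.8.0
    }
}
```

Comparing parsed integer arrays avoids the lexicographic trap where a plain string comparison would sort "3.10.0" before "3.9.0".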

see-quick (Member, Author)

So maybe - it's just my opinion - we can remove the upgrade/downgrade of Kafka from there, because it's handled elsewhere anyway. We can then just pick the oldest version supported by the current (latest) CO and deploy that one.

@scholzj @ppatierno Any thoughts on this?

scholzj (Member) commented Feb 20, 2025

@see-quick Sorry, I'm not really sure I understand what that means in relation to the test.

see-quick (Member, Author) commented Feb 20, 2025

@see-quick Sorry, I'm not really sure I understand what that means in relation to the test.

We would remove this part of the code from the upgrade and downgrade procedures within the KRaftStrimziDowngradeST and KRaftStrimziUpgradeST classes.

```java
// Upgrade/Downgrade kafka
changeKafkaVersion(testStorage.getNamespaceName(), upgradeDowngradeData);
changeKafkaVersionInKafkaConnect(testStorage.getNamespaceName(), upgradeDowngradeData);
```

and we would simply do an upgrade/downgrade of the Strimzi Operator with the same versions of the components. But as Lukas pointed out, we already do the Kafka version upgrade/downgrade in KRaftKafkaUpgradeDowngradeST.

scholzj (Member) commented Feb 20, 2025

TBH, I guess I'm a bit confused. This PR started with "we do not downgrade Kafka before we downgrade the CO" and now we have moved to "let's remove the Kafka downgrade before we downgrade the CO".

im-konge (Member)

TBH, I guess I'm a bit confused. This PR started with "we do not downgrade Kafka before we downgrade the CO" and now we have moved to "let's remove the Kafka downgrade before we downgrade the CO".

TBH I was a bit confused as well when reading the comments. But as I went through the code and the comments, I realized that we do the upgrade/downgrade of Kafka anyway inside KRaftKafkaUpgradeDowngradeST. And in KRaftStrimziDowngradeST we currently have a Kafka downgrade after we downgrade the CO - which, in the current situation, is not an actual downgrade of the Kafka version, because of how the variables are configured: the "deploy Kafka version" is the same as the "downgrade to Kafka version".

So my idea or opinion was to remove the Kafka downgrade step from KRaftStrimziDowngradeST, as in this suite we care especially about the CO downgrade and whether everything is rolled correctly. I think that downgrading the Kafka cluster after we downgrade the CO doesn't add much value at that point.

The situation would be the same if we changed the order: deploy Kafka, downgrade Kafka, and then downgrade the CO. It's just an added step which is covered by a different set of tests.

That's why it changed from "we do not downgrade Kafka before we downgrade the CO" to "let's remove the Kafka downgrade before we downgrade the CO".

scholzj (Member) commented Feb 20, 2025

@im-konge Ok, that makes sense I guess.
