stabilize e2e test case sandbox-basic #962

Merged: 10 commits into main from VER-95582-debug-daily-build-failure, Oct 29, 2024

Conversation


@LiboYu2 (Collaborator) commented Oct 17, 2024

Increased the timeout for steps that frequently fail.
Added some latency to stabilize the cluster before moving to the unsandboxing step.
I ran the test cases locally 5 times in a row and they all passed.

@LiboYu2 changed the title from "VER-95582 stabilize e2e test case sandbox-basic" to "stabilize e2e test case sandbox-basic" on Oct 17, 2024
@@ -13,6 +13,7 @@

apiVersion: kuttl.dev/v1beta1
kind: TestSuite
kindNodeCache: true
Collaborator Author

This allows downloaded images to be cached on the node, which speeds up test case execution.

@@ -15,3 +15,4 @@ apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
- command: bash -c "../../../scripts/wait-for-verticadb-steady-state.sh -n verticadb-operator -t 360 $NAMESPACE"
- command: sleep 120
Collaborator Author

Added this latency to stabilize the cluster before step 60 starts.

@roypaulin (Collaborator) commented Oct 17, 2024

You don't need to add this sleep; it only makes the test take longer to complete. If this step frequently fails because the timeout is too short, just increase the timeout in the script call above.

Collaborator Author

This is different from the timeout. The timeout is the maximum wait time for the whole test step to finish, while this sleep gives the cluster some time to stabilize its state before the next step starts. It makes the test run longer, but it also makes the test pass.

Collaborator

The script on line 17 waits for the operator to be steady (it gives the cluster time to stabilize its state before the next step), meaning there is no error and nothing is going on. There is no benefit in adding another wait after it. If this step fails and more time is needed, you should increase the time passed as an argument to the script.

Collaborator Author

This is from wait-for-verticadb-steady-state.sh:

timeout $TIMEOUT bash -c -- "while ! $LOG_CMD | \
    grep $WEBHOOK_FILTER | \
    grep $DEPRECATION_FILTER | \
    grep $VDB_FILTER | \
    tail -1 | grep --quiet '\"result\": {\"Requeue\":false,\"RequeueAfter\":0}, \"err\": null'; do sleep 1; done" &
pid=$!
wait $pid

This is from "man timeout":
timeout - run a command with a time limit

If the script runs longer than $TIMEOUT, an error is reported.
If the script finishes within $TIMEOUT, the next step starts right away.
What I want to achieve is to add some latency between the two steps so that the latter step is not impacted by the previous one. Increasing the timeout will not achieve that.

@cchen-vertica (Collaborator)

As discussed, we should use an environment variable for all extended timeout values (900), and try to print all the events on vdb.

@roypaulin (Collaborator)

> As discussed, we should use an environment variable for all extended timeout values (900), and try to print all the events on vdb.

What do you mean by "printing all the events on vdb"?

1. removed 2 min sleep added previously
2. increased low disk space to avoid low disk volume event
3. bumped the spam filter threshold from 25 to 100 to make sure the kuttl framework will receive all the events
@LiboYu2 (Collaborator Author) commented Oct 23, 2024

> As discussed, we should use an environment variable for all extended timeout values (900), and try to print all the events on vdb.

> What do you mean by "printing all the events on vdb"?

Cai's point is to let vdb receive all the events.

@LiboYu2 (Collaborator Author) commented Oct 23, 2024

> As discussed, we should use an environment variable for all extended timeout values (900), and try to print all the events on vdb.

Totally forgot about that :) I just tried it out. It turns out environment variables can only be used in commands: https://kuttl.dev/docs/testing/steps.html#running-commands

We cannot use an environment variable as a configuration parameter. Here is the error I got after trying one:

harness.go:397: loading /home/lyu/test-repos/vertica-kubernetes/tests/e2e-leg-10/sandbox-basic/60-assert.yaml: error converting unstructured object TestAssert:/ (/home/lyu/test-repos/vertica-kubernetes/tests/e2e-leg-10/sandbox-basic/60-assert.yaml): error converting TestAssert:/ from unstructured error: unrecognized type: int

@@ -23,7 +23,7 @@ spec:
initPolicy: CreateSkipPackageInstall
communal: {}
local:
requestSize: 250Mi
requestSize: 270Mi
Collaborator

Will 20Mi make any difference? We shouldn't make this change.

Collaborator Author

I noticed that the vdb received a low disk space event when I ran the test locally. That event is sent when the free disk space is less than 10MB.

Collaborator

Then give it more, like 500Mi.

Collaborator Author

500Mi is fine for the CI server, but when we run the test locally it may be a challenge if the free disk space is not large enough. When parallel is set to 2, there will be up to 20 pods running at the same time. How about 300Mi?

@@ -10,7 +10,10 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: kuttl.dev/v1beta1
Collaborator

The changes in this file should be removed since you already extended the timeout in 60-assert.yaml.

@@ -239,7 +240,7 @@ func main() {
	if opcfg.GetLoggingFilePath() != "" {
		log.Printf("Now logging in file %s", opcfg.GetLoggingFilePath())
	}

	var multibroadcaster = record.NewBroadcasterWithCorrelatorOptions(record.CorrelatorOptions{BurstSize: 100})
Collaborator

Let's just make this configurable through an env var. Look at the examples in opcfg/config.go (DEPLOY_WITH, WEBHOOKS_ENABLED). See how and where they are changed and do the same for the burst size. By default it will be 25 (the k8s default); if a value lower than 25 is passed, we still pick 25.

Collaborator Author

Made the change. Please take a look.

Collaborator

@LiboYu2, I do not see the change.

Collaborator Author

Forgot to push it. My bad. Now you should be able to see it.

# this will set up the threshold used by the spam filter
# default threshold is 25, which is too low to run the test cases
# some test cases rely on event verification
BROADCASTER_BURST_SIZE=100
Collaborator

It is not needed here.

@@ -14,3 +14,4 @@ CONCURRENCY_VERTICARESTOREPOINTSQUERY
CONCURRENCY_VERTICASCRUTINIZE
CONCURRENCY_SANDBOXCONFIGMAP
CONCURRENCY_VERTICAREPLICATOR
BROADCASTER_BURST_SIZE
Collaborator

Sorry, I forgot to tell you: you also need to add this to template-helm-chart.sh, and add a new parameter to values.yaml. Look at WEBHOOKS_ENABLED (line 227 in template-helm-chart.sh) as an example and do the same for BROADCASTER_BURST_SIZE.

Collaborator Author

No worries. I added a new commit to address that; you can take a look. I built locally and deployed to my local kind cluster: the burst size is 100 and no external environment variable was used.

// GetBroadcasterBurstSize returns the customizable burst size for broadcaster.
func GetBroadcasterBurstSize() int {
	burstSize := lookupIntEnvVar("BROADCASTER_BURST_SIZE", envCanNotExist)
	if burstSize < 25 {
@HaoYang0000 (Collaborator) commented Oct 28, 2024

Minor: we can make a constant DEFAULT_BURST_SIZE = 25 to hold that value, and we don't need an else block here; we can do:

if burstSize >= DEFAULT_BURST_SIZE {
	return burstSize
}
return DEFAULT_BURST_SIZE

Collaborator Author

Made the change.
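
Putting the thread together, the finished helper and its call site presumably end up looking something like the self-contained sketch below. This is an approximation, not the repo's actual code: the internal lookupIntEnvVar/envCanNotExist helpers are replaced with os.LookupEnv and strconv, and the constant follows HaoYang0000's DEFAULT_BURST_SIZE suggestion in Go naming style.

package main

import (
	"fmt"
	"os"
	"strconv"

	"k8s.io/client-go/tools/record"
)

// defaultBurstSize mirrors the k8s default burst size of the event spam filter
// (the DEFAULT_BURST_SIZE constant suggested above).
const defaultBurstSize = 25

// getBroadcasterBurstSize approximates the opcfg getter discussed in this thread:
// it reads BROADCASTER_BURST_SIZE from the environment and never returns a value
// below the k8s default.
func getBroadcasterBurstSize() int {
	raw, found := os.LookupEnv("BROADCASTER_BURST_SIZE")
	if !found {
		return defaultBurstSize
	}
	burstSize, err := strconv.Atoi(raw)
	if err != nil || burstSize < defaultBurstSize {
		return defaultBurstSize
	}
	return burstSize
}

func main() {
	// Wire the configurable value into the event broadcaster, replacing the
	// hard-coded 100 from the earlier main.go diff.
	broadcaster := record.NewBroadcasterWithCorrelatorOptions(record.CorrelatorOptions{
		BurstSize: getBroadcasterBurstSize(),
	})
	defer broadcaster.Shutdown()

	fmt.Println("broadcaster burst size:", getBroadcasterBurstSize())
}

Clamping to 25 keeps the operator at client-go's default event-spam behavior when the env var is unset, malformed, or set too low, which is the behavior the reviewer asked for.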

@@ -11,6 +11,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 900
Collaborator

I remember the default timeout for all jobs is 1000; is there any reason we want to reduce the timeout here?

Collaborator

The default is 600s in kuttl-test.yaml.

@@ -86,6 +86,10 @@ webhook:
# can be used to skip that and still deploy the operator.
enable: true

# this will increase the default threshold used by spam filter in controller runtime from 25 to 100
# this is important for running e2e test as some test cases require verification of events
burstSize: 100
Collaborator

Move this under controllers.

Collaborator Author

Moved.

LiboYu2 and others added 3 commits October 28, 2024 11:55
Co-authored-by: Roy Paulin <rnguetsopken@opentext.com>
…ica/vertica-kubernetes into VER-95582-debug-daily-build-failure
@cchen-vertica (Collaborator)

The unit tests failed. Some go-lint errors exist. You need to fix those errors.

@roypaulin roypaulin merged commit 4ace14d into main Oct 29, 2024
39 checks passed
@roypaulin roypaulin deleted the VER-95582-debug-daily-build-failure branch October 29, 2024 17:57