CreateContainerError occurs when trying to use VerticaAutoscaler #908

cyun79 opened this issue Aug 29, 2024 · 6 comments · Fixed by #913

cyun79 commented Aug 29, 2024

I'm trying to implement VerticaAutoscaler, but it doesn't work. Could anyone give me some advice?

Before generating load

[mini@vmhost ~]$ k get all

NAME                           READY   STATUS    RESTARTS   AGE
pod/vertica-eon-k8s-pri-01-0   3/3     Running   0          13m
pod/vertica-eon-k8s-pri-01-1   3/3     Running   0          13m
pod/vertica-eon-k8s-pri-01-2   3/3     Running   0          13m

NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                               AGE
service/kubernetes                        ClusterIP       <none>        443/TCP                               5h27m
service/vertica-eon-k8s                   ClusterIP   None            <none>        5434/TCP,4803/TCP,8443/TCP,5554/TCP   13m
service/vertica-eon-k8s-vdb-connections   ClusterIP   <none>        5433/TCP,8443/TCP                     13m

NAME                                      READY   AGE
statefulset.apps/vertica-eon-k8s-pri-01   3/3     13m

NAME                                         REFERENCE                  TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/vas-01   VerticaAutoscaler/vas-01   cpu: 0%/10%   3         12        3          17s

NAME                                    SUBCLUSTERS   VERSION     READY   AGE
                                        1             v24.2.0-1   3/3     13m

NAME                                   GRANULARITY   CURRENT SIZE   TARGET SIZE   SCALING COUNT   AGE
                                       Pod           3              3             0               21s

[mini@vmhost ~]$ k top pods

NAME                       CPU(cores)   MEMORY(bytes)   
vertica-eon-k8s-pri-01-0   12m          804Mi           
vertica-eon-k8s-pri-01-1   12m          713Mi           
vertica-eon-k8s-pri-01-2   12m          717Mi   

[mini@vmhost ~]$ kd hpa

Name:                                                  vas-01
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Thu, 29 Aug 2024 18:41:16 +0900
Reference:                                             VerticaAutoscaler/vas-01
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  0% (14m) / 10%
Min replicas:                                          3
Max replicas:                                          12
VerticaAutoscaler pods:                                3 current / 3 desired
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:           <none>

After generating load

[mini@vmhost ~]$ k top pods

NAME                       CPU(cores)   MEMORY(bytes)   
vertica-eon-k8s-pri-01-0   983m         807Mi           
vertica-eon-k8s-pri-01-1   17m          709Mi           
vertica-eon-k8s-pri-01-2   18m          713Mi   

[mini@vmhost ~]$ kd hpa

Name:                                                  vas-01
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Thu, 29 Aug 2024 18:41:16 +0900
Reference:                                             VerticaAutoscaler/vas-01
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  16% (340m) / 10%
Min replicas:                                          3
Max replicas:                                          12
VerticaAutoscaler pods:                                3 current / 5 desired
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 5
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  5s    horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target
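
For reference, the jump from 3 to 5 desired pods is consistent with the standard Kubernetes HPA formula, `desired = ceil(current * currentMetric / targetMetric)`. The sketch below is only an approximation of that calculation: the function name and its parameters are illustrative, and the 0.1 tolerance is the Kubernetes default, assumed here because the thread does not show the controller configuration.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA replica count for a utilization metric.

    Illustrative helper, not part of any Kubernetes or Vertica API.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance the HPA keeps the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured min/max replicas.
    return max(min_replicas, min(max_replicas, desired))


# Values taken from the `kd hpa` output above: 16% current, 10% target,
# 3 current pods, min 3, max 12.
print(desired_replicas(3, 16, 10, 3, 12))
```

With the pre-load reading of 0% the same function returns the minimum of 3, which matches the "3 current / 3 desired" line in the first `kd hpa` output.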

[mini@vmhost ~]$ k get pods

NAME                       READY   STATUS              RESTARTS   AGE
vertica-eon-k8s-pri-01-0   0/2     ContainerCreating   0          2s
vertica-eon-k8s-pri-01-1   0/2     ContainerCreating   0          2s
vertica-eon-k8s-pri-01-2   0/2     ContainerCreating   0          2s

[mini@vmhost ~]$ k get pods

vertica-eon-k8s-pri-01-0   1/2     CreateContainerError   0          19s
vertica-eon-k8s-pri-01-1   1/2     CreateContainerError   0          19s
vertica-eon-k8s-pri-01-2   1/2     CreateContainerError   0          19s

The operator shows the error below

{"log":"2024-08-29T09:46:42.606Z\u0009ERROR\u0009Reconciler error\u0009{\"controller\": \"verticadb\", \"controllerGroup\": \"\", \"controllerKind\": \"VerticaDB\", \"VerticaDB\": {\"name\":\"vertica-eon-k8s\",\"namespace\":\"default\"}, \"namespace\": \"default\", \"name\": \"vertica-eon-k8s\", \"reconcileID\": \"e385718a-7945-431d-8b99-90178d645e75\", \"error\": \"failed to copy and execute the gather script: could not execute: unable to upgrade connection: pod does not exist\", \"errorVerbose\": \"could not execute: unable to upgrade connection: pod does not exist\\nfailed to copy and execute the gather script\\*PodFacts).runGather\\n\\t/workspace/pkg/controllers/vdb/podfacts.go:457\\*PodFacts).collectPodByStsIndex\\n\\t/workspace/pkg/controllers/vdb/podfacts.go:420\\*PodFacts).collectSubcluster\\n\\t/workspace/pkg/controllers/vdb/podfacts.go:339\\*PodFacts).Collect\\n\\t/workspace/pkg/controllers/vdb/podfacts.go:282\\*AnnotateAndLabelPodReconciler).Reconcile\\n\\t/workspace/pkg/controllers/vdb/annotateandlabelpod_reconciler.go:56\\*VerticaDBReconciler).Reconcile\\n\\t/workspace/pkg/controllers/vdb/verticadb_controller.go:135\\*Controller).Reconcile\\n\\t/go/pkg/mod/\\*Controller).reconcileHandler\\n\\t/go/pkg/mod/\\*Controller).processNextWorkItem\\n\\t/go/pkg/mod/\\*Controller).Start.func2.2\\n\\t/go/pkg/mod/\\nruntime.goexit\\n\\t/usr/local/go/src/runtime/asm_amd64.s:1695\"}\n","stream":"stdout","time":"2024-08-29T09:46:42.693813185Z"}

My environment

[mini@vmhost ~]$ kubectl version

Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.30.0

[mini@vmhost ~]$ k api-resources | grep -i vertica

eventtriggers                       et                 true         EventTrigger
verticaautoscalers                  vas                true         VerticaAutoscaler
verticadbs                          vdb                     true         VerticaDB
verticareplicators                  vrep                true         VerticaReplicator
verticarestorepointsqueries         vrpq                true         VerticaRestorePointsQuery
verticascrutinizers                 vscr                true         VerticaScrutinize

# cat vertica.yml

kind: VerticaDB
  name: "vertica-eon-k8s"
  annotations: "true" vertica
    - name: vlogger
      image: opentext/vertica-logger:1.0.1
          memory: "100Mi"
          cpu: "100m"
          memory: "100Mi"
          cpu: "100m"
    path: "s3://vertica-data-k8s"
    credentialSecret: s3-creds
    region: "us-east-1"
  image: opentext/vertica-k8s:24.2.0-1-minimal
  imagePullPolicy: Always
  - name: regcreds  
  dbName: eon_k8s
    requestSize: 10Gi
  - name: pri_01
    serviceName: vdb-connections
        cpu: 1
        memory: 2G
        cpu: 1
        memory: 2G
    size: 3
  shardCount: 3
  licenseSecret: vertica-license

# cat vas.yml

kind: VerticaAutoscaler
  name: vas-01
  namespace: default
  scalingGranularity: Pod
  #scalingGranularity: Subcluster
  serviceName: vdb-connections
  verticaDBName: vertica-eon-k8s
Copy link

Can you share the following:

  • k get pod vertica-eon-k8s-pri-01-0 -o yaml when CreateContainerError occurs, after the generated load?
  • k get vdb vertica-eon-k8s -o yaml when CreateContainerError occurs, after the generated load?
  • Redirect the operator logs to a file (just before you generate the load) and upload the file here.

cyun79 commented Aug 30, 2024

Copy link

The issue is that at some point during the autoscaling process the annotation was set to false, when its default value should be true and stay true.
I am going to take a look, but as a temporary fix can you explicitly set the annotation to true before deploying Vertica (in vertica.yaml)? Like this:

annotations: "true"

Let me know how it goes.

cyun79 commented Aug 30, 2024

Thank you for the guidance.
I tried that, and the CreateContainerError no longer occurs. However, the additional nodes never become fully ready.
Both scalingGranularity settings (Pod and Subcluster) produced the same result.

The results below are from when the scalingGranularity was set to "Subcluster".

[mini@vmhost ~]$ k get pods
NAME                         READY   STATUS    RESTARTS        AGE
vertica-eon-k8s-pri-01-0     3/3     Running   0               26m
vertica-eon-k8s-pri-01-1     3/3     Running   0               26m
vertica-eon-k8s-pri-01-2     3/3     Running   0               26m
vertica-eon-k8s-vas-01-0-0   2/3     Running   1 (2m41s ago)   22m
vertica-eon-k8s-vas-01-0-1   2/3     Running   1 (2m30s ago)   22m
vertica-eon-k8s-vas-01-0-2   0/3     Pending   0               22m

As you can see, pods 0 and 1 are stuck after "Starting HTTP listener on address :5554", and pod 2 never started.

[mini@vmhost ~]$ k logs pod/vertica-eon-k8s-vas-01-0-0 -f
Defaulted container "nma" out of: nma, server, vlogger
2024/08/30 15:26:33 New NodeManagementAgent starting
2024/08/30 15:26:33 Checking for existence of directory  /opt/vertica/log
2024/08/30 15:26:33 Moving working directory to  /opt/vertica/log
2024/08/30 15:26:33 Successfully opened file /proc/1/fd/1. Setting log output to that file.
2024/08/30 15:26:33 New log for process  1
2024/08/30 15:26:33 Called with args  [/opt/vertica/bin/node_management_agent]
2024/08/30 15:26:33 Hostname vertica-eon-k8s-vas-01-0-0 User id 5000
2024/08/30 15:26:33 Verbose logging is off
2024/08/30 15:26:33 Checking for existence of directory  /opt/vertica/config
2024/08/30 15:26:33 Creating pid file named  /opt/vertica/config/
2024/08/30 15:26:33 [Info]: Initializing TLS configuration for HTTPS listener.
2024/08/30 15:26:33 [Info]: Secrets retrieval from k8s based secret store
2024/08/30 15:26:33 [Info]: Secret name not set in env. Failback to other cert retieval methods.
2024/08/30 15:26:33 [Info]: Using paths to PEM files from environment variables.
2024/08/30 15:26:33 [Info]: Writing paths to PEM files from environment variables to cache.
2024/08/30 15:26:33 [Warning]: Failed to write cache file /opt/vertica/config/https_certs/tls_path_cache.yaml. Ignoring this error and continuing: error in writing yaml file /opt/vertica/config/https_certs/tls_path_cache.yaml: open /opt/vertica/config/https_certs/tls_path_cache.yaml: no such file or directory
2024/08/30 15:26:33 [Info]: Added CA certificate(s) to trusted pool.
2024/08/30 15:26:33 [Info]: Initializing TLS configuration finished.
2024/08/30 15:26:33 Starting HTTP listener on address :5554
[mini@vmhost ~]$ k logs pod/vertica-eon-k8s-vas-01-0-1 -f
Defaulted container "nma" out of: nma, server, vlogger
2024/08/30 15:26:34 New NodeManagementAgent starting
2024/08/30 15:26:34 Checking for existence of directory  /opt/vertica/log
2024/08/30 15:26:34 Moving working directory to  /opt/vertica/log
2024/08/30 15:26:34 Successfully opened file /proc/1/fd/1. Setting log output to that file.
2024/08/30 15:26:34 New log for process  1
2024/08/30 15:26:34 Called with args  [/opt/vertica/bin/node_management_agent]
2024/08/30 15:26:34 Hostname vertica-eon-k8s-vas-01-0-1 User id 5000
2024/08/30 15:26:34 Verbose logging is off
2024/08/30 15:26:34 Checking for existence of directory  /opt/vertica/config
2024/08/30 15:26:34 Creating pid file named  /opt/vertica/config/
2024/08/30 15:26:34 [Info]: Initializing TLS configuration for HTTPS listener.
2024/08/30 15:26:34 [Info]: Secrets retrieval from k8s based secret store
2024/08/30 15:26:34 [Info]: Secret name not set in env. Failback to other cert retieval methods.
2024/08/30 15:26:34 [Info]: Using paths to PEM files from environment variables.
2024/08/30 15:26:34 [Info]: Writing paths to PEM files from environment variables to cache.
2024/08/30 15:26:34 [Warning]: Failed to write cache file /opt/vertica/config/https_certs/tls_path_cache.yaml. Ignoring this error and continuing: error in writing yaml file /opt/vertica/config/https_certs/tls_path_cache.yaml: open /opt/vertica/config/https_certs/tls_path_cache.yaml: no such file or directory
2024/08/30 15:26:34 [Info]: Added CA certificate(s) to trusted pool.
2024/08/30 15:26:34 [Info]: Initializing TLS configuration finished.
2024/08/30 15:26:34 Starting HTTP listener on address :5554

[mini@vmhost ~]$  k logs pod/vertica-eon-k8s-vas-01-0-2 -f
Defaulted container "nma" out of: nma, server, vlogger

I attached the operator log file.

Copy link

The issue is that the operator waits for all the new pods to be running before adding them to the database, but one of them is stuck in Pending. There are several reasons why a pod can be "Pending": insufficient resources in the k8s cluster (CPU/memory), pod quotas or limits, (k8s cluster) node availability, and so on. It is difficult to tell remotely what the issue might be, since the k8s cluster is yours.
Are you sure your cluster has enough resources?
Share the output of these commands:

  • kubectl describe pod vertica-eon-k8s-vas-01-0-2
  • kubectl describe sts vertica-eon-k8s-vas-01-0
  • kubectl get nodes
  • kubectl get resourcequotas
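
One common reason for a Pending pod is that the sum of its containers' CPU requests exceeds the unreserved CPU on every node, since the scheduler places pods by requests rather than by actual usage. A minimal sketch of that arithmetic follows. The helper names are hypothetical, the node figures are invented for illustration, and the pod requests are the server (`cpu: 1`) and vlogger (`cpu: "100m"`) values from the vertica.yml in this thread; any request for the nma container is not shown in the thread and is left out.

```python
def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity such as "100m" or "1" to millicores."""
    quantity = str(quantity).strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Whole or fractional cores, e.g. "1" or "0.5".
    return int(round(float(quantity) * 1000))


def fits(pod_requests, node_allocatable, node_already_requested):
    """Return (fits, needed_millicores, free_millicores) for one node."""
    needed = sum(parse_cpu(q) for q in pod_requests)
    free = parse_cpu(node_allocatable) - parse_cpu(node_already_requested)
    return needed <= free, needed, free


# Pod requests from vertica.yml (server + vlogger); node numbers are hypothetical.
ok, needed, free = fits(["1", "100m"],
                        node_allocatable="4",
                        node_already_requested="3300m")
print(ok, needed, free)
```

The real allocatable and already-requested figures for each node appear in the "Allocatable" and "Allocated resources" sections of `kubectl describe node`.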

cyun79 commented Aug 31, 2024

I really appreciate your advice. It worked after I adjusted the CPU settings in the CR.

roypaulin added a commit that referenced this issue Sep 4, 2024
When running a VerticaAutoscaler on a VerticaDB where `vcluster-ops` was
not explicitly set to `true` we hit a `CreateContainerError` error. This
happens because the autoscaler was still internally using a v1beta1
VerticaDB which led to the conversion webhook wrongfully setting that
annotation to `false`.
This fixes the issue by using a v1 VerticaDB instead.

Closes #908