3.3. Resizing the Cluster with Elastic Scaling

Elastic scaling lets you increase or decrease the size of your cluster on the fly, without requiring any downtime. You initiate elastic scaling by changing the value of the cluster.clusterSpec.replicas property.

For both increasing and decreasing the size of the cluster, the change in the number of nodes must meet the configuration requirements for the cluster's K factor. Specifically, you must add or remove K+1 nodes at a time.

3.3.1. Increasing the Size of the Cluster

To scale up the cluster you add nodes by upgrading the release, specifying the new number of nodes you want. Of course, the new value must meet the requirements for elastically expanding the cluster, as set out in the discussion of adding nodes to the cluster in the VoltDB Administrator's Guide. So, for example, to increase a five node cluster with a K-safety factor of one by two nodes, you can set the replica count to seven:

$ helm upgrade mydb voltdb/voltdb       \
   --reuse-values                       \
   --set cluster.clusterSpec.replicas=7

3.3.2. Decreasing the Size of the Cluster

To scale down or "shrink" the cluster, you upgrade the release specifying the new number of nodes you need. The new value must meet the requirements for K-safety. Specifically, you can only remove K+1 nodes at a time. So if your cluster has a K-safety factor of one, you must remove two nodes to reduce the size of the cluster. For example, if you have a five node cluster with K=1, you can shrink the cluster by setting the replica count to three:

$ helm upgrade mydb voltdb/voltdb       \
   --reuse-values                       \
   --set cluster.clusterSpec.replicas=3

If you want to reduce the cluster by more than K+1 nodes (for example, going from seven to three nodes in the preceding example), you must iterate the resizing operation in steps of K+1 nodes. In our example that means first reducing the seven node cluster to five nodes. Then, once the first resizing operation is complete, you can perform a second resizing operation to reduce the cluster to three nodes.
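
For example, a sketch of that two-step reduction from seven nodes to three might look like the following, assuming you wait for the first resize operation to complete before issuing the second command:

$ helm upgrade mydb voltdb/voltdb       \
   --reuse-values                       \
   --set cluster.clusterSpec.replicas=5

  [ . . . wait for the first resize operation to complete . . . ]

$ helm upgrade mydb voltdb/voltdb       \
   --reuse-values                       \
   --set cluster.clusterSpec.replicas=3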

3.3.3. Autoscaling the Cluster

In dynamic environments there can be significant variation in the workload, based on anything from time of day to external events that drive usage. For example, traffic monitoring applications see the majority of their activity during rush hour with very little at other times of day. For applications where usage can spike and dip it is useful to be able to scale up and scale down the cluster to meet the needs of the workload. Automatically adjusting the cluster size based on workload or other system performance metrics is called autoscaling.

To set up autoscaling, you need to define thresholds on one or more metrics that are used to trigger the resizing. For example, you might want to resize the cluster based on capacity, that is, the amount of memory in use, scaling up when memory usage exceeds a certain level and scaling down when it drops below a minimum. Alternatively, you could trigger autoscaling based on throughput or CPU usage.

When you select a metric to measure, you also define the thresholds at which the cluster should scale up or down for that metric. For example, if you are using memory consumption as the trigger, you set the thresholds as the resident set size (RSS) in bytes, so that the cluster scales up if too much memory is being used and scales down if too little is used.
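
For example, a minimal sketch of memory-based thresholds might look like the following, assuming the rss metric uses the same scaleUp and scaleDown pattern shown for the other metrics in Section 3.3.3.1; the byte values are purely illustrative:

cluster:
  clusterSpec:
    autoScaling:
      enabled: true
      metrics:
        rss:
          scaleUp: 12884901888     # illustrative only: scale up above roughly 12GB RSS
          scaleDown: 4294967296    # illustrative only: scale down below roughly 4GB RSS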

Once autoscaling is enabled, if a metric ever exceeds its upper threshold, the Volt operator automatically starts an elastic resize operation to increase the size of the cluster by K+1 nodes. Similarly, if the metric drops below its lower threshold, the operator starts an elastic downsizing of the cluster by K+1 nodes.

In addition to defining the trigger metrics and their associated upper and lower thresholds, you can control other aspects of the autoscaling process, such as:

  • The minimum and maximum size for the cluster

  • How frequently autoscaling status is reported in the logs

  • How long the metric must remain beyond the threshold before actually starting to autoscale the cluster

The following sections explain how to configure autoscaling, how to monitor autoscaling while it is in process, and how to troubleshoot potential problems if autoscaling fails.

3.3.3.1. Enabling Autoscaling

You configure autoscaling using the cluster.clusterSpec.autoScaling.* properties. First, you must set the cluster.clusterSpec.autoScaling.enabled property to true. Next, you select one or more metrics to measure as triggers for autoscaling. At the same time, you define the upper and lower thresholds using the .scaleUp and .scaleDown properties. There are currently four metrics you can choose from:

  • CPU Usage (cpu) — measured in percentage of total CPU currently in use

  • Memory Usage (rss) — measured by the resident set size in bytes

  • Throughput (tps) — measured in the average number of transactions per second

  • Idle Time (idletime) — measured in the percentage of time the partitions were idle (that is, not processing any transactions because the queues were empty)

You must define at least one metric as a trigger, but you can define more than one if you like. For example, the following Helm chart enables autoscaling based on both TPS and CPU, using 50K TPS or 75% CPU usage as the upper threshold and 10K TPS or 25% CPU usage as the lower threshold:

cluster:
  clusterSpec:
    autoScaling:
      enabled: true
      metrics:
        tps:
          scaleUp: 50000
          scaleDown: 10000
        cpu:
          scaleUp: 75
          scaleDown: 25
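
To apply these properties to the running database, you could save them in a YAML file and pass it to helm upgrade with the --values option. For example, assuming the properties above are saved in a file named autoscale.yaml (a file name chosen purely for illustration):

$ helm upgrade mydb voltdb/voltdb       \
   --reuse-values                       \
   --values autoscale.yaml
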
3.3.3.1.1. Setting Appropriate Autoscaling Thresholds

Defining appropriate thresholds is critical to effective autoscaling. Which metrics you choose and what limits to set depend on the needs of your specific application. However, you have to be careful that the thresholds are neither so aggressive that your application hits a resource constraint before scaling can take effect, nor so close together that autoscaling ends up bouncing between scaling up and scaling down.

On the one hand, autoscaling does not take effect immediately. The trigger metrics are monitored periodically, so there can be a delay between when a threshold is crossed and when the Volt operator detects the event. Then actually scaling the cluster takes time as well. If a metric is steadily increasing, you do not want it to reach its physical maximum before the additional nodes are operational. So, for example, it would be dangerous to set the cpu.scaleUp threshold at 90%, since your cluster could easily reach 100% before the autoscaling is complete. Be sure to leave enough headroom for further growth until the elastic scaling operation can run to completion.

On the other hand, if you set the limits too close together, there is the danger that scaling up the cluster will reduce the trigger metric to the point where it drops below the scale down threshold. For example, if you set the TPS limits at 20K and 40K, scaling up a 3-node, K=2 cluster to 6 nodes could easily cut the TPS in half, risking it dropping below the scale down threshold of 20K. The result would be a cluster that constantly switches between scaling up and scaling down, seriously impairing its ability to process requests.

The one other unusual situation to consider, when monitoring multiple metrics, is if one metric crosses its threshold in one direction while a second metric crosses its threshold in the opposite direction. For example, using the Helm chart from the preceding section, if the TPS upper threshold of 50K is crossed while at the same time the CPU usage drops below the lower threshold of 25%, the Volt operator will not take any action. Autoscaling is triggered only when all of the thresholds that are crossed point in the same direction, either up or down. This rule also applies when using a stabilization window (as described in the next section); if two thresholds are crossed in opposite directions during the stabilization window, the resize operation is canceled and the stabilization window reset.

3.3.3.1.2. Controlling the Autoscaling Process

Finally, there are additional properties that let you control the behavior of autoscaling, as illustrated in the example following this list. These include:

  • cluster.clusterSpec.autoScaling.minReplicas and cluster.clusterSpec.autoScaling.maxReplicas

    These properties specify the minimum and maximum number of nodes in the cluster. Autoscaling will not resize the cluster beyond these values, even if the threshold of a trigger metric is exceeded. The default minimum is K+1. The default maximum number of nodes is 16.

  • cluster.clusterSpec.autoScaling.stabilizationWindow

    This is actually two properties (.scaleUp and .scaleDown), specified in seconds, that define how long a threshold must remain crossed before the Volt operator actually starts the elastic resizing. If the metric drops back across the threshold during that window, the elastic operation is canceled and the stabilization "clock" reset. Providing a stabilization window allows applications with more dynamic workloads to cross a threshold temporarily without triggering an elastic resize operation; resizing starts only when the threshold remains crossed for an extended period of time. The default stabilization window is 10 minutes (600 seconds).

  • cluster.clusterSpec.autoScaling.maxRetries and cluster.clusterSpec.autoScaling.retryTimeout

    These properties specify how many times an autoscaling resize operation is retried after it fails and how long to wait for such an operation to start before deciding that it has failed. The default is not to retry a failed resize (a value of zero), and the default retry timeout is 60 seconds. See Section 3.3.3.3, “Troubleshooting Autoscaling and Recovering From Errors” for more information about troubleshooting autoscaling failures.

  • cluster.clusterSpec.autoScaling.notificationInterval

    This property specifies (in seconds) how frequently the Volt operator updates the logs and statistics during an autoscaling event. The default notification interval is zero, or no notifications.
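
For example, a sketch combining these control properties might look like the following, assuming the stabilization window is specified with nested .scaleUp and .scaleDown values as described above; all of the specific values are illustrative only:

cluster:
  clusterSpec:
    autoScaling:
      enabled: true
      minReplicas: 3            # never shrink below three nodes
      maxReplicas: 9            # never grow beyond nine nodes
      stabilizationWindow:
        scaleUp: 300            # threshold must stay crossed 5 minutes before scaling up
        scaleDown: 900          # threshold must stay crossed 15 minutes before scaling down
      maxRetries: 2             # retry a failed resize operation up to twice
      retryTimeout: 120         # wait 2 minutes for a resize to start before declaring failure
      notificationInterval: 60  # report autoscaling status every 60 seconds
      metrics:
        tps:
          scaleUp: 50000
          scaleDown: 10000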

3.3.3.2. Monitoring Autoscaling

Once autoscaling begins, it takes time for the cluster to either elastically expand or shrink, which can vary significantly depending on the size of the data, the current workload, and other circumstances. During this period, it is a good idea to monitor the process to ensure the elastic operations are proceeding as expected.

The operator periodically updates the status of the operation in the operator log, the Kubernetes cluster status, and as Kubernetes events. You can see the operator logs using the kubectl logs command. For example, if the Helm release name is mydb:

$  kubectl logs -f deploy/mydb-voltdb-operator    

To see the Kubernetes events associated with the operator, you can use the kubectl events command:

$  kubectl events --for deploy/mydb-voltdb-operator    

Or you can see both the cluster status and Kubernetes events using the kubectl describe command:

$  kubectl describe voltdbcluster mydb-voltdb-cluster
 [ . . . ]
Status:
  Cluster State:
    Auto Scaling:                                
      Desired Replicas:  3                       1
      Metrics:                                   2
        Tps:
      State:  Monitoring auto-scaling metrics    3

You can use whichever method you find most useful. However, the following examples use the cluster status because it is the easiest to read. In the preceding example, where a three node, K=1 cluster has autoscaling enabled, the status display tells you:

  1. The number of replicas the operator expects. Before any autoscaling occurs, this matches the node count for the cluster (3).

  2. Which metrics are being monitored. In this case, only the TPS metric is being used.

  3. The overall state of autoscaling. In this case, autoscaling is enabled and the metrics are being monitored.

If a metric crosses an associated threshold, the status changes to indicate which metric it is and when the threshold was crossed:

Status:
  Cluster State:
    Auto Scaling:
      Desired Replicas:  3                                          1
      Metrics:
        Tps:
          Direction:               ScaleUp                          2
          Last Value:              22981
          Time Threshold Crossed:  2024-02-02T19:27:05.870Z
      State:                       Monitoring auto-scaling metrics  3  

The number of replicas (1) remains the same because the stabilization window is in effect. But the metrics section (2) now includes information on which threshold has been triggered and when. Meanwhile, the status line (3) still reports that the operator is monitoring the metrics to make sure the value stays over the threshold for the duration of the stabilization window.

If the metric stays above (or below) the threshold until the stabilization window expires, the autoscaling event begins and the status changes to reflect the new state:

Status:
  Cluster State:
    Auto Scaling:
      Desired Replicas:  5                                         1
      Direction:         ScaleUp
      Metrics:
        Tps:
          Direction:               ScaleUp                         2
          Last Value:              28617
          Time Threshold Crossed:  2024-02-02T19:38:03.763Z
      State:                       Scaling cluster                 3
      Time Scaling Notified:       2024-02-02T19:38:34.695Z
      Time Scaling Started:        2024-02-02T19:38:34.695Z  

The target number of replicas (1) increases by K+1, to 5. Note this is the desired number of nodes, not the current number. The status also reports the direction the cluster is resizing (in this case, scaling up). The metrics information (2) continues to report the current value and when the threshold was crossed, triggering the autoscaling activity. Finally, the status line (3) now reports that the cluster is elastically resizing to expand the cluster.

The cluster status will continue to report this information throughout the elastic resizing, which can take minutes, or even hours, depending on how much data must be moved and how busy the database is processing transactions. But once the resize is complete, the status returns to its initial state, except that the number of desired replicas now matches the new cluster size of 5:

Status:
  Cluster State:
    Auto Scaling:
      Desired Replicas:  5
      Metrics:
        Tps:
      State:  Monitoring auto-scaling metrics

3.3.3.3. Troubleshooting Autoscaling and Recovering From Errors

The autoScaling.maxRetries and autoScaling.retryTimeout properties give the Volt operator some flexibility in recovering from issues that might arise during an autoscaling event. However, not all failures are transitory. For example, there may be provisioning or configuration issues that are stopping the cluster from expanding. That is why it is important to know what to do if autoscaling does not operate correctly.

The first step is to monitor the autoscaling activity — either proactively or by reviewing the logs and status after the fact. If the maximum number of retries is exceeded or a scaling operation fails to restart, the cluster status is changed to indicate that autoscaling has stopped:

Status:
  Cluster State:
    Auto Scaling:
      Desired Replicas:  3
      Direction:         ScaleDown
      Metrics:
        Tps:
          Direction:               ScaleDown
          Time Threshold Crossed:  2024-02-02T21:24:17.690Z
      State:                       Auto-scaling stopped by failure

At this point autoscaling has stopped. But more importantly, the elastic resizing has not completed. So it requires human intervention to either complete or revert the resizing before autoscaling can be resumed. Again, the first step is to determine what is causing the resize operation to fail and, if at all possible, correct the situation. This is particularly true when resizing to reduce the size of the cluster, because the only reliable way to recover the cluster is to complete the resizing operation.

To do this, you want to shut down the cluster, then restart it specifying the number of replicas in use before the resize operation began. For example, if the cluster was resizing from 5 nodes to 3, you will want to restart the cluster with the number of replicas set to 5:

$ helm upgrade mydb voltdb/voltdb --reuse-values \
  --set cluster.clusterSpec.replicas=5

Finally, there are a few basic rules about what not to do with the cluster while using autoscaling:

  • Do NOT manually initiate a resizing operation while autoscaling is enabled.

  • Do NOT initiate a software upgrade while autoscaling is enabled.

  • Do NOT shut down the cluster while autoscaling is actively resizing the cluster.

  • Do NOT disable autoscaling while autoscaling is actively resizing the cluster.

In all cases, it is safest to wait for all resizing to complete, then disable autoscaling by setting cluster.clusterSpec.autoScaling.enabled to false before performing any of the preceding actions.
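
For example, once any in-progress resizing is complete, you could disable autoscaling with a command such as:

$ helm upgrade mydb voltdb/voltdb       \
   --reuse-values                       \
   --set cluster.clusterSpec.autoScaling.enabled=false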