3.2. Initializing and Starting a VoltDB Database on a Cluster

You initialize and start a cluster the same way you start a single node: with the voltdb init and start commands. The only difference is that when starting the cluster, you must tell the cluster nodes how big the cluster is and which nodes to use as potential hosts for the startup.

You initialize a root directory on each server using the voltdb init command. You can accept the default configuration as shown in the previous section. However, when setting up a cluster you often want to make some configuration adjustments (for example, enabling K-safety). So it is a good idea to get into the habit of specifying a configuration file.

You specify the configuration file with the --config or -C flag when you initialize the root directory. All nodes must use the same configuration file. For example:

$ voltdb init -D ~/mydb --config=myconfig.xml
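For example, a minimal configuration file that enables K-safety might look like the following. The kfactor value of 1 is illustrative; choose the value appropriate for your cluster size and availability requirements:

<?xml version="1.0"?>
<deployment>
   <cluster kfactor="1"/>
</deployment>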

Once the nodes are initialized, you start the cluster by issuing the voltdb start command on all nodes, specifying the following information:

  • Number of nodes in the cluster: When you start the cluster, you specify how many servers will make up the cluster using the --count flag.

  • Host names: You specify the hostnames or IP addresses of one or more servers from the cluster that are potential "hosts" for coordinating the formation of the cluster. You specify the list of hosts with the --host or -H flag. You must specify at least one node as a host.

For each node of the cluster, log in and start the server process using the same voltdb start command. For example, the following command starts a five-node database cluster, specifying voltsvr1 as the host node. Be sure the number of nodes on which you run the command matches the number of nodes specified in the --count argument.

$ voltdb start --count=5 --host=voltsvr1

Or you can use the shortened forms of the argument flags:

$ voltdb start -c 5 -H voltsvr1

Although you only need to specify one potential host, it is a good idea to specify multiple hosts. This way, you can use the exact same command for both starting and rejoining nodes in a highly available cluster. Even if the rejoining node is in the host list, another running node can be chosen to facilitate the rejoin.
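For example, the following command starts the same five-node cluster but names three potential hosts (reusing the hypothetical server names from above), so the identical command can later be used to rejoin any of those nodes:

$ voltdb start --count=5 --host=voltsvr1,voltsvr2,voltsvr3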

To simplify even further, you can specify all of the servers in the --host argument. If you do this, you can skip the --count argument. When --count is missing, VoltDB assumes the --host list is the complete list of servers and sets the cluster size to match. For example, the following command, issued on all three servers, starts a three-node cluster:

$ voltdb start --host=svrA,svrB,svrC

When starting a VoltDB database on a cluster, the VoltDB server process performs the following actions:

  1. If you are starting the database process on the node selected as the host node, it waits for initialization messages from the remaining nodes. The host is selected from the list of hosts on the command line and plays a special role during startup by managing the cluster initiation process. It is important that all nodes in the cluster can resolve the hostnames or IP addresses of the host nodes you specify.

  2. If you are starting the database on a non-host node, it sends an initialization message to the host indicating that it is ready. The database is not operational until the correct number of nodes (as specified on the command line) have connected.

  3. Once all the nodes have sent initialization messages, the host sends out a message to the other nodes that the cluster is complete. Once the startup procedure is complete, the host's role is over and it becomes a peer like every other node in the cluster. It performs no further special functions.

Manually logging on to each node of the cluster every time you want to start the database can be tedious. Instead, you can use secure shell (ssh) to execute shell commands remotely. By creating an ssh script (with the appropriate permissions) you can copy files and/or start the database on each node in the cluster from a single script. Or you can use distributed system management tools such as Chef and Puppet to automate the startup procedures.
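As a sketch of the ssh approach, the following script initializes and starts a five-node cluster from a single machine. The server names, directories, and file locations are all illustrative; the script assumes passwordless ssh access and that the voltdb command is on each server's PATH.

#!/usr/bin/env bash
# Hypothetical server names; adjust to match your cluster.
HOSTS="voltsvr1 voltsvr2 voltsvr3 voltsvr4 voltsvr5"

# Copy the configuration file to each node and initialize its root directory.
for h in $HOSTS; do
    scp myconfig.xml "$h":myconfig.xml
    ssh "$h" "voltdb init -D ~/mydb --config=myconfig.xml"
done

# Start the server process on every node. nohup lets each ssh session
# return immediately while the server keeps running; the cluster forms
# once all five nodes have contacted the host.
for h in $HOSTS; do
    ssh "$h" "nohup voltdb start -D ~/mydb --count=5 --host=voltsvr1,voltsvr2,voltsvr3 > voltdb-console.log 2>&1 &"
done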