Two important aspects of a VoltDB database are the physical layout of the cluster that runs the database and the database features you choose to use. You define the physical cluster layout on the voltdb start command using the --count and --host arguments. You enable and disable specific database features in configuration files when you initialize the database root directory with the voltdb init command.
In the simplest case, when running on a single node with no configuration specified, VoltDB defaults to eight execution sites per host and a K-safety value of zero. You can customize the database by specifying options in one or more YAML configuration files when you initialize the database with the voltdb init command and the --config (or -C) qualifier. You can put all of the configuration properties in a single file, or you can modularize the configuration into separate files for individual topics. For example, the following command customizes the database using separate YAML files for common properties, directory paths, and security:
$ voltdb init --config=common.yaml,paths.yaml,security.yaml
Configuration files are in YAML format, where options are specified as a hierarchy of properties with each element of the hierarchy indented on a separate line and terminated by a colon. The actual property values follow the colon. For example:
deployment:
  cluster:
    sitesperhost: 12
    kfactor: 1
In the preceding example, the child properties of the deployment.cluster element define the layout of the database partitions, including:
sitesperhost: specifies the number of partitions created on each server in the cluster. The sitesperhost value times the number of servers gives you the total number of partitions in the cluster. See Section 3.7.1, “Determining How Many Sites per Host” for more information about partition count.
kfactor: specifies the K-safety value to use for durability when creating the database. The K-safety value controls the duplication of database partitions. See Chapter 10, Availability for more information about K-safety.
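The arithmetic behind these two properties can be sketched in a few lines of Python. This is only an illustration, not part of VoltDB: the function and field names are hypothetical, and the sketch assumes that a K-safety value of K means each partition is stored K + 1 times (see Chapter 10, Availability for the authoritative description). It also refuses layouts where the sites do not divide evenly among the copies rather than guessing how such a layout would be handled.

```python
from typing import NamedTuple


class Layout(NamedTuple):
    """Hypothetical result type for this sketch (not a VoltDB structure)."""
    total_sites: int        # sitesperhost times the number of servers
    unique_partitions: int  # total sites divided by the number of copies


def partition_counts(sitesperhost: int, hosts: int, kfactor: int = 0) -> Layout:
    """Estimate total sites and unique partitions for a cluster layout.

    Assumption: K-safety of K keeps K + 1 copies of every partition.
    """
    if sitesperhost < 1 or hosts < 1 or kfactor < 0:
        raise ValueError("sitesperhost and hosts must be >= 1, kfactor >= 0")
    total = sitesperhost * hosts
    copies = kfactor + 1
    if total % copies != 0:
        # The sketch does not guess how an uneven layout would be handled.
        raise ValueError("total sites do not divide evenly among the copies")
    return Layout(total_sites=total, unique_partitions=total // copies)


if __name__ == "__main__":
    # The example configuration above (sitesperhost: 12, kfactor: 1),
    # on a hypothetical three-server cluster.
    print(partition_counts(sitesperhost=12, hosts=3, kfactor=1))
```

Under these assumptions, three servers with the example configuration provide 36 sites holding 18 unique partitions, each stored twice.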
Configuration files also enable and configure many runtime options related to the database, which are described later in this book. For example, the configuration file can specify:
Whether security is enabled and what users and passwords are needed to authenticate clients at runtime. See Chapter 12, Security for more information.
A schedule for saving automatic snapshots of the database. See Section 13.2, “Scheduling Automated Snapshots”.
Properties for exporting and importing data to other data sources. See Chapter 15, Streaming Data: Import, Export, and Migration.
For the complete list of properties and YAML file syntax, see Appendix E, YAML Configuration Properties.
There is very little penalty for allocating more sites than needed for the partitions the database will use (except for incremental memory usage). Consequently, VoltDB defaults to eight sites per node to provide reasonable performance on most modern system configurations. This default does not normally need to be changed. However, for systems with a large number of available processors (16 or more) or older machines with fewer than 8 processors and limited memory, you may wish to tune the sitesperhost property.
The number of sites needed per node is related to the number of processor cores each system has, the optimal number being approximately 3/4 of the number of CPUs reported by the operating system. For example, if you are using a cluster of servers that each have dual quad-core processors (in other words, 8 cores per node), the optimal number of partitions is likely to be 6 or 7 sites per node.
deployment:
  cluster:
    sitesperhost: 6
For systems that support hyperthreading (where each physical core runs two hardware threads), the operating system reports twice the number of physical cores. In other words, a dual quad-core system would report 16 virtual CPUs. However, each partition is not quite as efficient as on non-hyperthreading systems. So the optimal number of sites is more likely to be between 10 and 12 per node in this situation.
Because there are no hard-and-fast rules, the optimal number of sites per node is best determined by actually benchmarking the application to see what combination of cores and sites produces the best results. However, it is important to remember that all nodes in the cluster will use the same number of sites. So the best performance is achieved by using a cluster in which all nodes have the same physical architecture (that is, the same number of cores).
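As a starting point for such benchmarking, the three-quarters rule of thumb can be written as a small Python helper. The function name is hypothetical and the result is only an initial estimate to tune from, not a value VoltDB computes or enforces.

```python
import os


def suggested_sites_per_host(reported_cpus: int) -> int:
    """Return roughly 3/4 of the CPUs reported by the operating system.

    This is only the rule of thumb from the text; benchmark to find the
    value that actually works best for your application.
    """
    if reported_cpus < 1:
        raise ValueError("reported_cpus must be at least 1")
    # Integer arithmetic rounds down; never suggest fewer than one site.
    return max(1, (3 * reported_cpus) // 4)


if __name__ == "__main__":
    # os.cpu_count() returns the logical CPU count, which includes
    # hyperthreads, or None if the count cannot be determined.
    cpus = os.cpu_count() or 1
    print(f"{cpus} CPUs reported, start benchmarking at "
          f"sitesperhost: {suggested_sites_per_host(cpus)}")
```

For 8 reported CPUs the helper suggests 6 sites. For 16 reported CPUs on a hyperthreaded system it suggests 12, which is the upper end of the 10 to 12 range described above, so on such systems it is worth benchmarking lower values as well.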
An important aspect of some runtime features is that they make use of disk resources for persistent storage across sessions. For example, automatic snapshots need a directory for storing snapshots of the database contents. Similarly, export uses disk storage for writing overflow data if the export connector cannot keep up with the export queue.
You can specify individual paths for each feature in the configuration. If you do not, VoltDB creates subfolders for each feature in the database root directory as needed, which can be useful for testing. However, in production, it is useful to direct certain high-volume features, such as command logging, to separate devices so that their disk I/O does not affect database performance.
You can identify specific path locations for the following features using the paths property:
Command logging (deployment.paths.commandlog)
Command log snapshots (deployment.paths.commandlogsnapshot)
Export overflow (deployment.paths.exportoverflow)
Snapshots (deployment.paths.snapshots)
If you specify a relative rather than an absolute path, it is relative to the database root directory. If you name a specific feature path and it does not exist, VoltDB attempts to create it for you. For example, the export overflow path contains temporary data that can be deleted periodically. The following configuration file specifies /opt/overflow as the directory for export overflow.
deployment:
  paths:
    exportoverflow:
      path: "/opt/overflow"
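The rule for relative and absolute paths can be illustrated with a short Python sketch. It only models the rule as stated above. The function name and the root directory location are hypothetical, and the sketch does not reproduce how VoltDB itself resolves or creates directories.

```python
from pathlib import PurePosixPath


def resolve_feature_path(root_dir: str, configured_path: str) -> PurePosixPath:
    """Model the rule: relative paths are taken from the database root
    directory, while absolute paths are used exactly as written.

    Joining with "/" in pathlib discards the left-hand side when the
    right-hand side is absolute, which matches that rule.
    """
    return PurePosixPath(root_dir) / configured_path


if __name__ == "__main__":
    root = "/home/db/voltdbroot"  # hypothetical root directory location
    print(resolve_feature_path(root, "overflow"))       # relative path
    print(resolve_feature_path(root, "/opt/overflow"))  # absolute path
```

With the hypothetical root shown, a relative value of overflow lands inside the root directory, while /opt/overflow is used unchanged.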
The configuration files and start command options define the desired configuration of your database cluster. However, there are several important aspects of the physical hardware and operating system configuration that you should be aware of before running VoltDB:
VoltDB can operate on heterogeneous clusters. However, best performance is achieved by running the cluster on similar hardware with the same type of processors, number of processors, and amount of memory on each node.
All nodes must be able to resolve the IP addresses and host names of the other nodes in the cluster. That means they must all have valid DNS entries or have the appropriate entries in their local hosts file.
You must run a time synchronization service such as Network Time Protocol (NTP) or chrony on all of the cluster nodes, preferably synchronizing against the same local time server. If the time skew between nodes in the cluster is greater than 200 milliseconds, VoltDB cannot start the database.
It is strongly recommended that you configure your time service to avoid adjusting time backwards. For example, in NTP this is done using the -x argument. If the server time moves backward, VoltDB must pause and wait for time to catch up.
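Before starting a cluster, it can be useful to compare the clock offsets your time service reports for each node against the 200 millisecond limit mentioned above. The following Python sketch shows one way to do that comparison. The function names, the host names, and the offset values are all hypothetical, and the sketch assumes that skew means the spread between the largest and smallest offset measured against a common reference. How VoltDB itself measures skew is not described here.

```python
from typing import Mapping

MAX_SKEW_MS = 200.0  # startup limit stated in the text


def max_skew_ms(offsets_ms: Mapping[str, float]) -> float:
    """Return the spread between the fastest and slowest clock.

    offsets_ms maps a node name to its clock offset, in milliseconds,
    from a common reference (for example, as reported by your time service).
    """
    if not offsets_ms:
        raise ValueError("at least one node offset is required")
    values = list(offsets_ms.values())
    return max(values) - min(values)


def within_startup_limit(offsets_ms: Mapping[str, float],
                         limit_ms: float = MAX_SKEW_MS) -> bool:
    """True if the measured skew does not exceed the limit."""
    return max_skew_ms(offsets_ms) <= limit_ms


if __name__ == "__main__":
    # Hypothetical offsets collected from three nodes.
    offsets = {"node1": -50.0, "node2": 120.0, "node3": 10.0}
    print(max_skew_ms(offsets), within_startup_limit(offsets))
```

Collecting the offsets is left to whatever tooling your time service provides; the sketch covers only the comparison.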