11.4. Monitoring Database Replication

Database replication runs silently in the background. To ensure replication is proceeding effectively, VoltDB provides statistics on the producer and consumer clusters that help you understand the current state of the DR process. Specifically, the statistics can tell you:

  • The amount of DR data waiting to be sent from the producer

  • The timestamp and unique ID of the last transaction received by the consumer

  • Whether any partitions are "falling behind" in processing DR data

This information is available from the @Statistics system procedure using the DRROLE, DRCONSUMER, and DRPRODUCER selectors. All clusters return summary information in response to the DRROLE selector. For one-way (passive) DR, the master database is the "producer" and provides additional information through the DRPRODUCER selector. The replica is the "consumer" and provides additional information through the DRCONSUMER selector. For two-way (cross datacenter) replication, every cluster acts as both producer and consumer and can report statistics for both roles:

  • On all databases, calling @Statistics with the DRROLE selector returns summary information about the database's DR role (master, replica, xdcr, or none), its cluster ID, and the current state of the DR process.

  • On the producer database, calling @Statistics with the DRPRODUCER selector returns columns for the cluster IDs of the current cluster and the consumer, as well as the transaction ID and timestamp of both the last queued transaction and the last transaction acknowledged (ACKed) by the consumer. The difference between these two timestamps indicates the approximate latency between the two databases.

  • On the consumer database, calling @Statistics with the DRCONSUMER selector returns per-partition statistics showing whether each partition is "covered" by an identified "host" server from each producer cluster, in other words, whether some producer server is currently providing it DR logs. The results also include the ID and timestamp of the last transaction received from each producer cluster. If a consumer partition is not covered, it has lost contact with the producer server that was providing its logs (possibly due to a node failure). The partition can recover once the covering server rejoins. Meanwhile, comparing that partition's last received timestamp with those of the other partitions indicates roughly how long the interruption has lasted and how far behind the partition may be.
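Which selectors are worth polling depends on the role that DRROLE reports. The following is a minimal sketch of that decision in Python. The function name `selectors_for_role` is hypothetical, and it assumes the role arrives as one of the strings listed above (master, replica, xdcr, or none) in any letter case. It does not call VoltDB; it only decides which @Statistics selectors a monitoring script would then request.

```python
def selectors_for_role(role: str) -> list[str]:
    """Hypothetical helper: map a DR role reported by @Statistics DRROLE
    to the @Statistics selectors that return useful data for that role."""
    role = role.strip().lower()
    selectors = ["DRROLE"]  # every cluster answers DRROLE
    if role == "master":
        selectors.append("DRPRODUCER")  # one-way DR: master is the producer
    elif role == "replica":
        selectors.append("DRCONSUMER")  # one-way DR: replica is the consumer
    elif role == "xdcr":
        # Two-way replication: each cluster is both producer and consumer.
        selectors += ["DRPRODUCER", "DRCONSUMER"]
    elif role != "none":
        raise ValueError(f"unknown DR role: {role!r}")
    return selectors
```

For example, `selectors_for_role("XDCR")` yields DRROLE, DRPRODUCER, and DRCONSUMER, while a database with DR disabled only needs DRROLE.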
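On the producer, the approximate latency is the gap between the last queued timestamp and the last ACKed timestamp. The sketch below computes that gap per partition. It assumes the DRPRODUCER rows have already been fetched and converted into the hypothetical `ProducerRow` shape, with both timestamps expressed in milliseconds. The field names are illustrative and are not VoltDB's actual column names.

```python
from dataclasses import dataclass


@dataclass
class ProducerRow:
    # Hypothetical shape of one DRPRODUCER row; real column names and units may differ.
    partition_id: int
    last_queued_ts_ms: int  # timestamp of the last queued transaction
    last_acked_ts_ms: int   # timestamp of the last transaction ACKed by the consumer


def approx_latency_ms(rows: list[ProducerRow]) -> dict[int, int]:
    """Per-partition gap between the last queued and last ACKed transaction.
    A gap of 0 means everything queued so far has been acknowledged."""
    return {
        r.partition_id: max(0, r.last_queued_ts_ms - r.last_acked_ts_ms)
        for r in rows
    }


def worst_latency_ms(rows: list[ProducerRow]) -> int:
    """Largest per-partition gap, or 0 if there are no rows."""
    return max(approx_latency_ms(rows).values(), default=0)
```

Reporting the worst partition rather than an average is a design choice: a single slow partition is what a monitoring alert usually needs to catch.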
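On the consumer, an uncovered partition or a partition whose last received timestamp trails the others is worth flagging. The sketch below is one way to do that, assuming DRCONSUMER rows have been converted into the hypothetical `ConsumerRow` shape (timestamps in milliseconds). It measures each partition's lag against the freshest partition from the same producer cluster. That baseline and the `threshold_ms` cutoff are illustrative choices, not VoltDB behavior.

```python
from dataclasses import dataclass


@dataclass
class ConsumerRow:
    # Hypothetical shape of one DRCONSUMER row; real column names and units may differ.
    partition_id: int
    producer_cluster_id: int
    is_covered: bool          # does a producer host currently supply DR logs?
    last_received_ts_ms: int  # timestamp of the last transaction received


def lagging_partitions(rows: list[ConsumerRow], threshold_ms: int) -> dict[tuple[int, int], int]:
    """Map (producer_cluster_id, partition_id) -> lag in ms for every partition that
    is uncovered or trails the freshest partition from the same producer by more
    than threshold_ms."""
    # Freshest last-received timestamp seen for each producer cluster.
    newest: dict[int, int] = {}
    for r in rows:
        prev = newest.get(r.producer_cluster_id, r.last_received_ts_ms)
        newest[r.producer_cluster_id] = max(prev, r.last_received_ts_ms)

    flagged: dict[tuple[int, int], int] = {}
    for r in rows:
        lag = newest[r.producer_cluster_id] - r.last_received_ts_ms
        if not r.is_covered or lag > threshold_ms:
            flagged[(r.producer_cluster_id, r.partition_id)] = lag
    return flagged
```

Uncovered partitions are always reported, even with a small lag, because losing the covering host is itself the condition to watch; the lag value then suggests how long the interruption has lasted.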