Two important aspects of export to keep in mind are:
Export is automatic. When you enable an export target in the configuration file, the database servers take care of starting and stopping the connector on each server when the database starts and stops, including if nodes fail and rejoin the cluster. You can also start and stop export on a running database by updating the configuration file using the voltadmin update command.
Export is asynchronous. The actual delivery of the data to the export target is asynchronous to the transactions that initiate data transfer.
The advantage of an asynchronous approach is that any delays in delivering the exported data to the target system do not interfere with the VoltDB database performance. The disadvantage is that VoltDB must handle queueing export data pending its actual transmission to the target, including ensuring durability in case of system failures. Again, this task is handled automatically by the VoltDB server process. Still, it is useful to understand how export queueing works and what its consequences are.
One consequence of this durability guarantee is that VoltDB sends at least one copy of every export record to the target. However, when recovering command logs or rejoining nodes, it is possible that certain export records are resent. It is up to the downstream target to handle these duplicate records, for example by using unique indexes or by including a unique record ID in the export stream.
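As an illustration of handling resent records downstream, the following minimal Python sketch shows a target that ignores any record whose ID it has already stored. It is not part of VoltDB or of any export connector. The class name DeduplicatingSink, the deliver helper, and the (record_id, payload) record layout are all hypothetical, and the sketch assumes the export stream carries a unique record ID.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (unique record ID, payload).
ExportRecord = Tuple[int, str]


class DeduplicatingSink:
    """Toy downstream target that stores each record ID at most once."""

    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}
        self.duplicates_skipped = 0

    def apply(self, record: ExportRecord) -> bool:
        """Store the record; return False if its ID was already stored."""
        record_id, payload = record
        if record_id in self._rows:
            # A resent copy (at-least-once delivery): ignore it.
            self.duplicates_skipped += 1
            return False
        self._rows[record_id] = payload
        return True

    def rows(self) -> Dict[int, str]:
        """Return a copy of the stored rows, keyed by record ID."""
        return dict(self._rows)


def deliver(sink: DeduplicatingSink, records: Iterable[ExportRecord]) -> int:
    """Apply every record to the sink and return how many were newly stored."""
    return sum(1 for record in records if sink.apply(record))
```

In a real target, the same effect is usually achieved with a unique index or primary key on the record ID column, so that the target itself rejects or overwrites the duplicate.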
For the export process to work, it is important that the connector keep up with the queue of exported information. If too much data gets queued to the connector by the export function without being delivered to the target system, the VoltDB server process consumes increasingly large amounts of memory.
If the export target does not keep up with the connector and the data queue fills up, VoltDB starts writing overflow data in the export buffer to disk. This protects your database in two ways:
If the destination is intermittently unreachable or cannot keep up with the data flow, writing to disk helps VoltDB avoid consuming too much memory while waiting for the destination to catch up.
If the database is stopped, the export data is retained across sessions. When the database restarts, the connector will retrieve the overflow data and reinsert it in the export queue.
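The overflow behavior described above can be pictured with a toy model. The Python sketch below is not how VoltDB implements export overflow. It only illustrates the general idea of a bounded in-memory queue that spills to a file, and of a new process picking up the spilled records from the same path. The class SpillingQueue, the JSON-lines file format, and the demo function are all assumptions made for the illustration.

```python
import json
import os
import tempfile
from collections import deque


class SpillingQueue:
    """Toy FIFO: at most `max_in_memory` records in RAM, the rest in a file."""

    def __init__(self, overflow_path, max_in_memory):
        self._memory = deque()
        self._max = max_in_memory
        self._path = overflow_path
        # Records left on disk by an earlier session are picked up again.
        self._spilled = self._count_lines()

    def _count_lines(self):
        if not os.path.exists(self._path):
            return 0
        with open(self._path, encoding="utf-8") as f:
            return sum(1 for _ in f)

    def put(self, record):
        # Once anything is on disk, keep spilling so FIFO order is preserved.
        if self._spilled == 0 and len(self._memory) < self._max:
            self._memory.append(record)
        else:
            with open(self._path, "a", encoding="utf-8") as f:
                f.write(json.dumps(record) + "\n")
            self._spilled += 1

    def get(self):
        """Return the oldest record; raises IndexError if the queue is empty."""
        if not self._memory and self._spilled:
            self._reload()
        return self._memory.popleft()

    def _reload(self):
        # Move the oldest spilled records back into memory, keep the rest on disk.
        with open(self._path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        head, tail = lines[: self._max], lines[self._max:]
        self._memory.extend(json.loads(line) for line in head)
        with open(self._path, "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in tail)
        self._spilled = len(tail)


def demo():
    """Queue five records with room for two in memory, then 'restart'."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "overflow.jsonl")
        queue = SpillingQueue(path, max_in_memory=2)
        for i in range(5):
            queue.put(i)  # records 2, 3, and 4 spill to disk
        before = [queue.get(), queue.get()]
        restarted = SpillingQueue(path, max_in_memory=2)  # new session, same path
        after = [restarted.get() for _ in range(3)]
        return before, after
```

In this toy model, only the spilled records survive a restart. Anything still held in memory when the process stops is lost.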
You can specify where VoltDB writes the overflow export data using the <exportoverflow> element in the configuration file. For example:
<paths>
    <exportoverflow path="/tmp/export/"/>
</paths>
If you do not specify a path for export overflow, VoltDB creates a subfolder in the database root directory. See Section 3.7.2, “Configuring Paths for Runtime Features” for more information about configuring paths in the configuration file.
Note that VoltDB normally uses disk storage only for overflow data. However, you can force VoltDB to write all queued export data to disk using any of the following methods:
Calling the @Quiesce system procedure
Requesting a blocking snapshot (using voltadmin save --blocking)
Performing an orderly shutdown (using voltadmin shutdown)
This means that if you perform an orderly shutdown with the voltadmin shutdown command, you can recover the database, along with any pending export queue data, by simply restarting the database cluster in the same root directories.
Note that when you initialize or re-initialize a root directory, any subdirectories of the root are purged.[5] So if your configuration did not specify a different location for the export overflow, and you re-initialize the root directories and then restore the database from a snapshot, the database is restored but the export overflow is lost. If both your original and new configurations use the same explicit directory outside the root directory for export overflow, you can start a new database and restore a snapshot without losing the overflow data.
[5] Initializing a root directory deletes any files in the command log and overflow directories. The snapshots directory is archived to a named subdirectory.