8.2. Handling Errors When Restoring a Database

Documentation

VoltDB Home » Documentation » Administrator's Guide

8.2. Handling Errors When Restoring a Database

After determining what caused the problem, the next step is often to get the database up and running again as soon as possible. When using snapshots or command logs, this is done using the voltdb start command described in Section 3.6, “Restarting the Database”. However, in unusual cases, the restart itself may fail.

There are several situations where an attempt to recover a database — either from a snapshot or command logs — may fail. For example, restoring data from a snapshot to a schema where a unique index has been added can result in a constraint violation. In this case, the restore operation continues but any records that caused a constraint violation are saved to a CSV file.

Or when recovering command logs, the log may contain a transaction that originally succeeded but fails and raises an exception during playback. In this situation, VoltDB issues a fatal error and stops the database to avoid corrupting the contents.

Although protecting you from an incomplete recovery is the appropriate default behavior, there may be cases where you want to recover as much data as possible, with full knowledge that the resulting data set does not match the original. VoltDB provides two processes for performing partial recoveries in case of failure:

  • Logging constraint violations during snapshot restore

  • Performing command log recovery in safe mode

The following sections describe these procedures.

Warning

It is critically important to recognize that the techniques described in this section do not produce a complete copy of the original database or resolve the underlying problem that caused the initial recovery to fail. These techniques should never be attempted without careful consideration and full knowledge and acceptance of the risks associated with partial data recovery.

8.2.1. Logging Constraint Violations

There are several situations that can cause a snapshot restore to fail because of constraint violations. Rather than have the operation fail as a whole, VoltDB continues with the restore process and logs the constraint violations to a file instead. This way you can review the tuples that were excluded and decide whether to ignore or replace their content manually after the restore completes.

By default, the constraint violations are logged to one or more files (one per table) in the same directory as the snapshot files. In a cluster, each node logs the violations that occur on that node. If you know there are going to constraint violations and want to save the logged constraints to a different location, you can use a special JSON form of the @SnapshotRestore system procedure. You specify the path of the log files in a JSON attribute, duplicatePaths. For example, the following commands perform a restore of snapshot files in the directory /var/voltdb/snapshots/ with the unique identifier myDB. The restore operation logs constraint violations to the directory /var/voltdb/logs.

$ sqlcmd
1> exec @SnapshotRestore '{ "path":"/var/voltdb/snapshots/", 
                            "nonce":"myDB", 
                            "duplicatesPath":"/var/voltdb/logs/" }';
2> exit

Constraint violations are logged as needed, one file per table, to CSV files with the name {table}-duplicates-{timestamp}.csv.

8.2.2. Safe Mode Recovery

On rare occasions, recovering a database from command logs may fail. This can happen, for example, if a stored procedure introduces non-deterministic content. If a recovery fails, the specific error is known. However, there is no way for VoltDB to know the root cause or how to continue. Therefore, the recovery fails and the database stops.

When this happens, VoltDB logs the last successful transaction before the recovery failed. You can then ask VoltDB to restart up to but not including the failing transaction by performing a recovery in safe mode.

You request safe mode by adding the --safemode switch to the voltdb start command, like so:

$ voltdb start --safemode --dir=~/mydb

When VoltDB recovers from command logs in safe mode it enables two distinct behaviors:

  • Snapshots are restored, logging any constraint violations

  • Command logs are replayed up to the last valid transaction

This means that if you are recovering using an automated snapshot (rather than command logs), you can recover some data even if there are constraint violations during the snapshot restore. Also, when recovering from command logs, VoltDB will ignore constraint violations in the command log snapshot and replay all transactions that succeeded in the previous attempt.

It is important to note that to successfully use safe mode with command logs, you must perform a regular recovery operation first — and have it fail — so that VoltDB can determine the last valid transaction. Also, if the snapshot and the command logs contain both constraint violations and failed transactions, you may need to run recovery in safe mode twice to recover as much data as possible. Once to complete restoration of the snapshot, then a second time to recover the command logs up to a point before the failed transaction.