15.2. Planning your Export Strategy

Documentation

VoltDB Home » Documentation » Using VoltDB

15.2. Planning your Export Strategy

The important point when planning to export data, is deciding:

  • What data to export

  • When to export the data

  • Where to export data to

It is possible to export all of the data in a VoltDB database. You would do this by creating export stream replicas of all tables in the schema and writing to the corresponding stream whenever you insert into the normal table. However, this means the same number of transactions and volume of data that is being processed by VoltDB will be exported through the connector. There is a strong likelihood, given a high transaction volume, that the target database will not be able to keep up with the load VoltDB is handling. As a consequence you will usually want to be more selective about what data is exported when.

If you have an existing target database, the question of what data to export is likely decided for you (that is, you need to export the data matching the target's schema). If you are defining both your VoltDB database and your target at the same time, you will need to think about what information is needed "downstream" and create the appropriate export streams within VoltDB.

The second consideration is when to export the data. For tables that are not updated frequently, inserting the data to a complementary export stream whenever data is inserted into the real table is the easiest and most practical approach. For tables that are updated frequently (hundreds or thousands of times a second) you should consider writing a copy of the data to an export stream at an appropriate milestone.

Using the flight reservation system as an example, one aspect of the workflow not addressed by the application described in Chapter 6, Designing VoltDB Client Applications is the need to archive information about the flights after takeoff. Changes to reservations (additions and cancellations) are important in real time. However, once the flight takes off, all that needs to be recorded (for billing purposes, say) is what reservations were active at the time.

In other words, the archiving database needs information about the customers, the flights, and the final reservations. According to the workload in Table 4.1, “Example Application Workload”, the customer and flight tables change infrequently. So data can be inserted into the export streams at the same time as the "live" flight and reservation tables. (It is a good idea to give the export stream a meaningful name so its purpose is clear. In this example we identify the streams with the export_ prefix or, in the case of the reservation stream which is not an exact copy, the _final suffix.)

The reservation table, on the other hand, is updated frequently. So rather than export all changes to a reservation to the reservation stream in real-time, a separate stored procedure is invoked when a flight takes off. This procedure copies the final reservation data to the export stream and deletes the associated flight and reservation records from the VoltDB database. Figure 15.2, “Flight Schema with Export Streams” shows the modified database schema with the added export streams, EXPORT_FLIGHT, EXPORT_CUSTOMER, and RESERVATION_FINAL.

Figure 15.2. Flight Schema with Export Streams

Flight Schema with Export Streams

This design adds a transaction to the VoltDB application, which is executed approximately once a second (when a flight takes off). However, it reduces the number of reservation transactions being exported from 1200 a second to less than 200 a second. These are the sorts of trade offs you need to consider when adding export functionality to your application.

The third decision is where to export the data to. As described in Section 15.4, “Configuring Export in the Configuration File”, you can export the data through multiple different protocols: files, HTTP. JDBC, etc. Your choice of protocol will depend on the ultimate target destination for your exported data.

You can also export to multiple destinations at once. When you declare a stream, you can assign it to a specific export target. If you want different streams to be exported to different destinations, you declare the streams to belong to different targets. Then in the configuration file you can assign each target to be exported to a different destination.