Chapter 7. Creating Custom Importers, Exporters, and Formatters

VoltDB includes built-in export and import connectors for a number of standard formats, such as CSV files, JDBC, Kafka topics, and so on. If you have a data source or destination not currently covered by connectors provided by VoltDB, you could write a custom application to perform the translation. However, you would then need to manually coordinate the starting and stopping of your application with the starting and stopping of the database.

A better approach is to create a custom import or export connector. Custom connectors run within the VoltDB process and use the standard mechanisms in VoltDB for synchronizing the running of the connector with the database itself. You write custom connectors as Java classes, packaged in a JAR file, which VoltDB can access at runtime. This chapter provides instructions and sample code for writing, installing, and configuring custom export and import connectors. It also describes how to write custom formatters that can be used to interpret the input coming from an import connector.

7.1. Writing a Custom Exporter

An export connector, known internally as an ExportClient, is a Java class that receives blocks of row data when data is inserted into a stream within the database. The export connector is responsible for formatting those rows and passing them to the downstream export target. A sample export client can be found online in the VoltDB GitHub repository at the following location.

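The following sections use a similar example to describe:

- The structure and workflow of the export client
- How to use custom properties to configure the client
- How to compile and install the client
- How to configure the export client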

7.1.1. The Structure and Workflow of the Export Client

VoltDB passes data to the export client in blocks that are roughly 2 MB in size but do not align with transaction boundaries. A block is guaranteed to contain complete rows; that is, no single SQL INSERT to an export stream is split across blocks. The handoff from the internal VoltDB producer to the custom export client follows a simple pattern:

producer -> client.onBlockStart
foreach row in block:
    producer -> client.processRow
producer -> client.onBlockCompletion

Each time the pattern executes, it runs within a single thread. Therefore, you do not need to synchronize access to the data structures used in client.onBlockStart, client.processRow, and client.onBlockCompletion unless other threads also use them.
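To make the handoff pattern concrete, the following is a minimal, self-contained simulation of it. BlockClient, RecordingClient, and deliverBlock are hypothetical stand-ins invented for this sketch; they are not VoltDB's export API. The sketch only shows the call order and why the per-block state can be an unsynchronized list when every callback for a block runs on the same thread.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation of the producer/client handoff. BlockClient is a hypothetical
// stand-in for VoltDB's export client callbacks, not the real interface.
public class BlockHandoffSketch {

    /** Hypothetical client exposing the three callbacks the producer invokes. */
    interface BlockClient {
        void onBlockStart();
        boolean processRow(byte[] rowData);
        void onBlockCompletion();
    }

    /** Records the order of callbacks so the sequence is visible. */
    static class RecordingClient implements BlockClient {
        // Plain ArrayList, no locking: all callbacks for a block run on one thread.
        final List<String> calls = new ArrayList<>();

        public void onBlockStart() { calls.add("start"); }

        public boolean processRow(byte[] rowData) {
            calls.add("row:" + rowData.length);
            return true;
        }

        public void onBlockCompletion() { calls.add("complete"); }
    }

    /** Mimics the producer: start, one processRow call per row, then completion. */
    static void deliverBlock(BlockClient client, List<byte[]> rows) {
        client.onBlockStart();
        for (byte[] row : rows) {
            client.processRow(row);
        }
        client.onBlockCompletion();
    }

    public static void main(String[] args) {
        RecordingClient client = new RecordingClient();
        deliverBlock(client, List.of(new byte[3], new byte[5]));
        System.out.println(client.calls); // [start, row:3, row:5, complete]
    }
}
```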

For each row of data, the processRow() method is called. Within the method you decode the input and then process the resulting column values. For example:

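public boolean processRow(int rowSize, byte[] rowData)
               throws RestartBlockException {
    try {
        // Decode one row from the current block into its column values
        ExportRowData pojo = this.decodeRow(rowData);
        for (int i = 0; i < pojo.values.length; i++) {

            // do actual work . . .

        }
    } catch (IOException e) {
        // Decoding failed: throw so VoltDB does not ACK the block,
        // and request that the block be delivered again
        throw new RestartBlockException(true);
    }
    // Report that the row was processed successfully
    return true;
}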

Note that each row starts with six columns of metadata, including the transaction ID and timestamp. If you do not need this information, you can skip the first six columns.
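If you want to skip the metadata, one option is to start from index 6 or to copy out only the user columns. The sketch below assumes the decoded values arrive as an Object[] (as with the values array in the processRow() example); the class name, the METADATA_COLUMNS constant, and the placeholder metadata values are hypothetical, and the sketch does not model what each metadata column contains.

```java
import java.util.Arrays;

// Sketch: drop the six leading metadata columns from a decoded row.
// Object[] stands in for the decoded column values; metadata values used
// below are placeholders, not real VoltDB metadata.
public class MetadataSkipSketch {

    /** Number of metadata columns that precede the user-defined columns. */
    static final int METADATA_COLUMNS = 6;

    /** Returns only the user-defined column values, without the metadata prefix. */
    static Object[] userColumns(Object[] values) {
        if (values.length < METADATA_COLUMNS) {
            throw new IllegalArgumentException(
                "row has fewer than " + METADATA_COLUMNS + " columns");
        }
        return Arrays.copyOfRange(values, METADATA_COLUMNS, values.length);
    }

    public static void main(String[] args) {
        // Six placeholder metadata values followed by two user columns.
        Object[] row = {"m1", "m2", "m3", "m4", "m5", "m6", 42L, "hello"};
        System.out.println(Arrays.toString(userColumns(row))); // [42, hello]
    }
}
```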

If the client fails in onBlockStart, processRow, or onBlockCompletion, it must throw a RestartBlockException to prevent VoltDB from acknowledging (ACKing) the export data and dropping it from its durability control. This point deserves repeating: if the custom ExportClient completes onBlockStart, processRow, and onBlockCompletion without throwing this exception, VoltDB assumes the data is remotely durable and that the database can discard that export block.
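The acknowledgment contract can be illustrated with a small simulation: the simulated producer treats a block as ACKed only when all callbacks return normally, and keeps the block (retrying it here) when a restart exception is thrown. RestartBlockException, FlakyClient, and deliverUntilAcked below are hypothetical stand-ins written for this sketch; the retry loop models the idea, not VoltDB's actual internal retry logic.

```java
import java.util.List;

// Sketch of the ACK contract. All classes are hypothetical stand-ins,
// not VoltDB's API; the retry loop is a simplified model.
public class AckContractSketch {

    /** Hypothetical stand-in for VoltDB's RestartBlockException. */
    static class RestartBlockException extends Exception {
        RestartBlockException(String msg) { super(msg); }
    }

    /** Hypothetical client whose downstream write fails a fixed number of times. */
    static class FlakyClient {
        private int failuresLeft;

        FlakyClient(int failures) { this.failuresLeft = failures; }

        void onBlockStart() {}

        void processRow(byte[] row) {}

        void onBlockCompletion() throws RestartBlockException {
            if (failuresLeft > 0) {
                failuresLeft--;
                // Throw instead of returning: a normal return would tell the
                // producer the data is durable downstream.
                throw new RestartBlockException("downstream write failed");
            }
        }
    }

    /** Delivers a block until it is ACKed; returns the number of attempts used. */
    static int deliverUntilAcked(FlakyClient client, List<byte[]> rows, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                client.onBlockStart();
                for (byte[] row : rows) {
                    client.processRow(row);
                }
                client.onBlockCompletion();
                return attempt; // no exception: block is ACKed and may be discarded
            } catch (RestartBlockException e) {
                // Not ACKed: the block is retained and delivered again from the start.
            }
        }
        throw new IllegalStateException("block never acknowledged");
    }

    public static void main(String[] args) {
        FlakyClient client = new FlakyClient(2);
        System.out.println(deliverUntilAcked(client, List.of(new byte[4]), 5)); // 3
    }
}
```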

The ExportClient must not return from onBlockCompletion until it has confirmed that the downstream target has acknowledged receipt of the data. See getExecutor() in the sample export client for further commentary on correct thread handling.
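One way to honor this rule when the downstream target is written asynchronously is to collect a future for each send during the block and wait for all of them in onBlockCompletion. The sketch below assumes a hypothetical AsyncSender interface and a local RestartBlockException stand-in, and its 30-second timeout is an arbitrary assumption; it is not how the VoltDB sample client is implemented, only an illustration of blocking until every row is acknowledged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: wait for downstream acknowledgments before onBlockCompletion returns.
// AsyncSender and RestartBlockException are hypothetical stand-ins.
public class WaitForAckSketch {

    static class RestartBlockException extends Exception {
        RestartBlockException(String msg, Throwable cause) { super(msg, cause); }
    }

    /** Hypothetical asynchronous client for the downstream target. */
    interface AsyncSender {
        CompletableFuture<Void> send(byte[] row);
    }

    static class Client {
        private final AsyncSender sender;
        // Futures for the sends issued during the current block.
        private final List<CompletableFuture<Void>> pending = new ArrayList<>();

        Client(AsyncSender sender) { this.sender = sender; }

        void onBlockStart() { pending.clear(); }

        boolean processRow(byte[] row) {
            pending.add(sender.send(row));
            return true;
        }

        /** Returns only after the target has acknowledged every row in the block. */
        void onBlockCompletion() throws RestartBlockException {
            CompletableFuture<Void> all =
                CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]));
            try {
                all.get(30, TimeUnit.SECONDS); // assumed timeout; tune for your target
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RestartBlockException("interrupted waiting for acks", e);
            } catch (ExecutionException | TimeoutException e) {
                // A failed or missing ack means the block is not durable downstream.
                throw new RestartBlockException("downstream did not acknowledge", e);
            } finally {
                pending.clear();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Client ok = new Client(row -> CompletableFuture.completedFuture(null));
        ok.onBlockStart();
        ok.processRow(new byte[2]);
        ok.onBlockCompletion(); // returns normally: every send was acknowledged

        Client failing = new Client(row ->
            CompletableFuture.failedFuture(new RuntimeException("write rejected")));
        failing.onBlockStart();
        failing.processRow(new byte[2]);
        try {
            failing.onBlockCompletion();
        } catch (RestartBlockException e) {
            System.out.println("restart: " + e.getMessage());
        }
    }
}
```

Throwing from onBlockCompletion on any failed or timed-out send keeps the block unacknowledged, which matches the rule that VoltDB must not discard data the target has not confirmed.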

7.1.2. How to Use Custom Properties to Configure the Client

Properties, set in the deployment file as part of the export configuration, let you pass information to the export connector. For example, the user might need to pass the connector a file location or the IP address of the export target. It is up to you, as the author of the export client, to decide which properties are required or supported.

The properties specified in the deployment file are passed to the export client as a Properties object argument to the configure() method every time the connector starts: that is, whenever the database starts with the connector enabled, or whenever the schema or deployment is modified (for example, by a voltadmin update command).

The configure() method can either iterate over the Properties object or it can look for specific entries as needed. For example:

public void configure(Properties config) throws Exception {

    // Check for a specific property value
    if (config.containsKey("filename")) {
        filename = config.getProperty("filename");
    }
}
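A slightly fuller configure() might iterate over all supplied properties, apply defaults, and validate required values. Because configure() runs again whenever the connector restarts, the sketch below assigns every field on each call so that no value from an earlier configuration lingers. The property names "filename" and "delimiter", the default delimiter, and the choice to warn (rather than fail) on unrecognized names are all assumptions made for illustration.

```java
import java.util.Properties;
import java.util.Set;

// Sketch of a configure() method. Property names and defaults are hypothetical.
public class ConfigureSketch {

    // Hypothetical property names recognized by this sketch.
    static final Set<String> KNOWN = Set.of("filename", "delimiter");

    String filename;
    String delimiter;

    public void configure(Properties config) throws Exception {
        // Iterate over every supplied property and flag names we do not recognize.
        for (String name : config.stringPropertyNames()) {
            if (!KNOWN.contains(name)) {
                System.err.println("Ignoring unrecognized property: " + name);
            }
        }
        // Required property: fail fast if it is missing or blank.
        String file = config.getProperty("filename");
        if (file == null || file.trim().isEmpty()) {
            throw new IllegalArgumentException("Property 'filename' is required");
        }
        // Assign every field on each call so nothing from an earlier
        // configuration survives a reconfiguration.
        filename = file.trim();
        delimiter = config.getProperty("delimiter", ",");
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("filename", "myexportfile.txt");
        ConfigureSketch client = new ConfigureSketch();
        client.configure(props);
        System.out.println("filename=" + client.filename + ", delimiter=" + client.delimiter);
    }
}
```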

7.1.3. How to Compile and Install the Client

Once your export client code is complete, you need to compile, package, and install the connector on the appropriate VoltDB servers. You compile the export client like any other Java class. Be sure to include the VoltDB server JAR file in the classpath. For example, if VoltDB is installed in a directory called voltdb in your home directory, the command could be:

$ javac -cp "$HOME/voltdb/voltdb/*:./" -d obj \
   org.voltdb.exportclient/MyExportClient.java

After compiling the source code, you must package the resulting class into a JAR file, like so:

$ jar cvf myexportclient.jar -C obj .

Finally, you must install the JAR file in the lib/extension folder of the VoltDB installation on every server in the cluster that will run the export client. For example, if you are running a single-node cluster on the current node, where VoltDB has been installed as $HOME/voltdb, you can copy the JAR file with the following command:

$ cp myexportclient.jar $HOME/voltdb/lib/extension/

7.1.4. How to Configure the Export Client

Once your custom export client is installed, you can configure and start it. Custom export clients are configured like any other export connector, by adding a <configuration> section to <export> in the deployment file (or by configuring it interactively in the VoltDB Management Center). For custom clients, you declare the connector type as "custom" and add the exportconnectorclass attribute specifying the connector's fully qualified Java class name. For example:

<export>
  <configuration enabled="true" target="myspecial" type="custom"
      exportconnectorclass="org.voltdb.exportclient.MyExportClient">
    <property name="filename">myexportfile.txt</property>
  </configuration>
</export>

Any properties listed in the <configuration> ("filename" in this example) are passed to the custom export client as arguments to the configure() method, as described in Section 7.1.2, “How to Use Custom Properties to Configure the Client”. See the chapter on "Importing and Exporting Live Data" in the Using VoltDB manual for more information on configuring export connectors.