7.2. Writing a Custom Importer

An import connector is a set of Java classes that configure the connector and then iteratively retrieve data from a remote source and pass it into VoltDB by invoking stored procedures. Unlike the export connector, which is responsible for formatting the data between source and target, the VoltDB import architecture allows for the use of a separate formatter to translate the inbound data into a set of Java objects that can be passed as parameters to a stored procedure.

Import connectors are packaged as OSGi (Open Service Gateway Initiative) bundles, so they can be started and stopped easily from within the server process. However, for the purposes of writing a custom importer, VoltDB handles much of the OSGi infrastructure in the abstract classes associated with the import client. As a result, your import connector only needs to provide some of the classes normally required for an OSGi bundle. Specifically, a custom importer must provide the classes and methods described in Table 7.1, “Structure of the Custom Importer”.

Table 7.1. Structure of the Custom Importer

Class                                  Method                           Description
implementation of ImporterConfig       class constructor
                                       getFormatterBuilder()            Returns the FormatterBuilder for the specified format.
                                       getResourceID()                  Returns a unique resource ID for this importer instance.
                                       URI()                            Returns the URI for the current importer instance.
extension of AbstractImporterFactory   create()                         Returns an instance of the AbstractImporter implementation.
                                       createImporterConfigurations()   Returns a map of configuration information.
                                       getTypeName()                    Returns the name of the AbstractImporter class as a string.
                                       isImporterRunEverywhere()        Returns true for a run-everywhere importer, false for a single importer process.
extension of AbstractImporter          getName()                        Returns the name of the AbstractImporter class as a string.
                                       accept()                         Performs the actual data import. Should check to see if stop() has been called.
                                       stop()                           Completes the import process.
                                       URI()                            Returns the URI for the current importer instance.

Having all the right parts in place is extremely important: if the bundle is incomplete or incorrect, the server process will crash when the importer starts. The best way to create a new custom importer is therefore to take an existing example, including the associated Ant build script, and modify it as needed. An example custom importer is available in the VoltDB public GitHub repository.

The following sections describe:

  • Writing the custom importer using Java

  • Compiling, packaging, and installing the importer

  • Configuring and running the importer

7.2.1. Designing and Coding a Custom Importer

One of the most important decisions you must make when planning your custom importer is whether to run a single importer process for the cluster or to design a run-everywhere importer. A single importer process ensures only one instance of the importer is running at any given time. That means on a cluster, only one node will run the import connector process.

The following sections discuss run-everywhere vs. single process and managing the starting and stopping of the import connector.

7.2.1.1. Run-Everywhere vs. Single Process

A run-everywhere import connector starts a separate import process on each node in the cluster. A run-everywhere connector can improve performance since it distributes the work across the cluster. However, it means that the connector must negotiate the distribution of the work to avoid importing duplicate copies of the data.
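
One simple way to avoid duplicate imports is a static assignment in which each node takes a disjoint slice of the source. The sketch below is only an illustration under assumptions that are not part of VoltDB: the source is split into numbered partitions, every node knows its own index and the node count, and the class name PartitionAssignment is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the VoltDB API: statically assigns
// numbered source partitions to nodes so that no partition is imported twice.
final class PartitionAssignment {
    private PartitionAssignment() {}

    // Returns the partitions that the node at nodeIndex (0-based) should import.
    static List<Integer> partitionsFor(int nodeIndex, int nodeCount, int partitionCount) {
        if (nodeCount <= 0 || nodeIndex < 0 || nodeIndex >= nodeCount || partitionCount < 0) {
            throw new IllegalArgumentException("invalid node or partition arguments");
        }
        List<Integer> mine = new ArrayList<>();
        // Each selected partition p satisfies p % nodeCount == nodeIndex.
        for (int p = nodeIndex; p < partitionCount; p += nodeCount) {
            mine.add(p);
        }
        return mine;
    }
}
```

A static scheme like this does not reassign work when a node fails or joins, so a production connector would likely need some additional coordination.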

Run-everywhere connectors are especially useful where the import process uses a "push" model rather than a "pull" model. That is, if the connector opens a port and accepts data sent to the port, then the data source(s) can proactively connect and "push" data to that port, making the data source responsible for the distribution to the multiple servers of the VoltDB cluster.

You specify whether you are creating a single importer process or a run-everywhere connector in the isImporterRunEverywhere() method of the ImporterFactory class. If the method returns true, importer processes are created on every server. If the method returns false, only one importer process is created at any given time.

7.2.1.2. Managing the Starting and Stopping of the Import Process

When the custom importer is enabled, the ImporterFactory create() method is invoked, which in turn creates instances of the ImporterConfig and Importer classes. The VoltDB import infrastructure then calls the Importer accept() method for each importer process that is created.

The accept() method does the bulk of the work of iteratively fetching data from the appropriate sources, calling the formatter to structure the content of each row, and invoking the appropriate stored procedure to insert the data into the database. Two important points to keep in mind:

  • If the accept() method fails for any reason or returns to the caller, the importer will stop until the next time it is initialized. (That is, when the database restarts or is paused and resumed.)

  • On each iteration, the accept() method should check to see if the stop() method has been called, so it can clean up any pending imports and then return to the caller.
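
The loop structure described above can be sketched as follows. This is a minimal, self-contained illustration rather than VoltDB code: SketchImporter is a hypothetical stand-in for an AbstractImporter extension, the queue stands in for the remote data source, and the insertRow callback stands in for the formatter and stored procedure invocation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical stand-in for an AbstractImporter extension; not the VoltDB API.
final class SketchImporter {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final BlockingQueue<String> source;  // stands in for the remote data source
    private final Consumer<String> insertRow;    // stands in for the stored procedure call

    SketchImporter(BlockingQueue<String> source, Consumer<String> insertRow) {
        this.source = source;
        this.insertRow = insertRow;
    }

    // Fetches rows until a stop is requested; returning from here ends the import.
    void accept() {
        try {
            while (!stopRequested.get()) {
                // Poll with a timeout so the stop flag is rechecked regularly.
                String row = source.poll(50, TimeUnit.MILLISECONDS);
                if (row != null) {
                    insertRow.accept(row);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status and exit
        }
        // A real importer would clean up any pending imports here before returning.
    }

    // Asks accept() to finish; safe to call from another thread.
    void stop() {
        stopRequested.set(true);
    }
}
```

Polling with a short timeout, rather than blocking indefinitely on the source, is what lets the loop notice a stop request even when no data is arriving.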

7.2.2. Packaging and Installing a Custom Importer

Once the custom importer code is ready, you need to compile and package it as an OSGi-compliant JAR file. Because a number of OSGi properties must be set in the JAR file manifest, it is easiest to use an Ant build file to compile and package the files. The following is an excerpt from an example build.xml file for a custom importer project:

<!-- Simple build file to build socket stream importer -->
<project name="customimport"      basedir="." default="customimporter">
<property name='base.dir'         location='.' />
<property name='bundles.dir'      location='./bundles' />
<property name='build.dir'        location='./obj' />

   <target name="buildbundle" depends="customimporter"/>

   <resources id="default.imports.resource">
      <string>org.osgi.framework;version=&quot;[1.6,2)&quot;</string>
      <string>org.voltcore.network</string>
      <string>org.voltdb.importer</string>
      <string>org.voltdb.client</string>
      <string>org.voltdb.importer.formatter</string>
      <string>org.apache.log4j</string>
      <string>org.slf4j</string>
      <string>jsr166y</string>
      <string>org.voltcore.utils</string>
      <string>org.voltcore.logging</string>
      <string>com.google_voltpatches.common.base</string>
      <string>com.google_voltpatches.common.collect</string>
      <string>com.google_voltpatches.common.net</string>
      <string>com.google_voltpatches.common.io</string>
      <string>com.google_voltpatches.common.util.concurrent</string>
   </resources>

   <pathconvert property="default.imports" 
                refid="default.imports.resource" pathsep=","/>

   <target name="customimporter">
 
     <!-- Compile source files -->
     [ . . . ]

     <!-- Build OSGi bundle -->
     <antcall target="osgibundle">
         <param name="bundle.name" value="mycustomimporter"/>
         <param name="activator" value="MyCustomImporterFactory"/>
         <param name="bundle.displayname" value="MyCustomImporter"/>
         <param name="include.classpattern" value="mycustomimporter/*.class"/>
      </antcall>
   </target>

   <target name="osgibundle">
      <mkdir dir="${bundles.dir}" />
      <jar destfile="${bundles.dir}/${bundle.name}.jar" basedir="${build.dir}">
         <include name="${include.classpattern}"/>
         <manifest>
            <attribute name="Bundle-Activator" value="${activator}" />
            <attribute name="Bundle-ManifestVersion" value="2" />
            <attribute name="Bundle-Name" 
                       value="${bundle.displayname} OSGi Bundle" />
            <attribute name="Bundle-SymbolicName" 
                       value="${bundle.displayname}" />
            <attribute name="Bundle-Version" value="1.0.0" />
            <attribute name="DynamicImport-Package" value="*" />
            <attribute name="Import-Package" value="${default.imports}" />
         </manifest>
      </jar>
   </target>
</project>

Once you create the OSGi bundle, you install the custom importer by copying it to the bundles folder inside the root directory of the VoltDB installation on every server in the cluster. For example, if VoltDB is installed in /opt/voltdb, copy your custom importer JAR file to /opt/voltdb/bundles/.

7.2.3. Configuring and Running a Custom Importer

Once the custom importer is installed on the VoltDB servers, you can configure and start the importer using the database configuration file. You can configure import either before the database starts or after the database is running using the voltadmin update command.

In the configuration, use the <import> and <configuration> elements to declare your custom importer. Specify the type as "custom" and identify the custom importer bundle by setting the module attribute to the name of the JAR file. For example:

<import>
   <configuration type="custom" module="mycustomimporter.jar">
     [ . . . ]

If the custom importer requires additional information, you can provide it in properties passed to the ImporterConfig class. For example:

<import>
   <configuration type="custom" module="mycustomimporter.jar">
     <property name="datasource">my.data.source</property>
     <property name="timeout">5m</property>
  </configuration>
</import>
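
As a rough sketch of how a configuration class might consume these properties, the example below reads them from a java.util.Properties object. Several things here are assumptions made for illustration: the class name SketchImporterConfig is hypothetical (a real class would implement ImporterConfig), the properties are assumed to arrive as java.util.Properties, the timeout is assumed to be a positive integer followed by s, m, or h, and the one-minute default is invented.

```java
import java.time.Duration;
import java.util.Properties;

// Hypothetical configuration holder; a real one would implement ImporterConfig.
final class SketchImporterConfig {
    final String dataSource;
    final Duration timeout;

    SketchImporterConfig(Properties props) {
        String ds = props.getProperty("datasource");
        if (ds == null || ds.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required property: datasource");
        }
        this.dataSource = ds.trim();
        // Assumed default of one minute when no timeout is configured.
        this.timeout = parseTimeout(props.getProperty("timeout", "1m"));
    }

    // Assumed format: a positive integer followed by s, m, or h (for example "5m").
    static Duration parseTimeout(String text) {
        String t = text.trim().toLowerCase();
        if (t.length() < 2) {
            throw new IllegalArgumentException("Invalid timeout: " + text);
        }
        long n;
        try {
            n = Long.parseLong(t.substring(0, t.length() - 1));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid timeout: " + text, e);
        }
        if (n <= 0) {
            throw new IllegalArgumentException("Timeout must be positive: " + text);
        }
        switch (t.charAt(t.length() - 1)) {
            case 's': return Duration.ofSeconds(n);
            case 'm': return Duration.ofMinutes(n);
            case 'h': return Duration.ofHours(n);
            default: throw new IllegalArgumentException("Invalid timeout unit: " + text);
        }
    }
}
```

Validating the properties in the constructor means a misconfigured importer fails with a clear message at configuration time instead of partway through an import.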

As soon as the configuration is enabled, the import processes are initialized and the VoltDB import infrastructure invokes the custom importer's accept() method.