An import connector is a set of Java classes that configure the connector and then iteratively retrieve data from a remote source and pass it into VoltDB by invoking stored procedures. Unlike the export connector, which is responsible for formatting the data between source and target, the VoltDB import architecture allows for the use of a separate formatter to translate the inbound data into a set of Java objects that can be passed as parameters to a stored procedure.
Import connectors are packaged as OSGi (Open Service Gateway Initiative) bundles, so they can be started and stopped easily from within the server process. However, for the purposes of writing a custom importer, VoltDB handles much of the OSGi infrastructure in the abstract classes associated with the import client. As a result, your import connector only needs to provide some of the classes normally required for an OSGi bundle. Specifically, a custom importer must provide the classes and methods described in Table 9.1, “Structure of the Custom Importer”.
Table 9.1. Structure of the Custom Importer
| Class | Method | Description |
|---|---|---|
| implementation of ImporterConfig | class constructor | |
| | getFormatterBuilder() | Returns the FormatterBuilder method of the specified format. |
| | getResourceID() | Returns a unique resource ID for this importer instance. |
| | URI() | Returns the URI for the current importer instance. |
| extension of AbstractImporterFactory | create() | Returns an instance of the AbstractImporter implementation. |
| | createImporterConfigurations() | Returns a map of configuration information. |
| | getTypeName() | Returns the name of the AbstractImporter class as a string. |
| | isImporterRunEverywhere() | Returns true or false. |
| extension of AbstractImporter | getName() | Returns the name of the AbstractImporter class as a string. |
| | accept() | Performs the actual data import. Should check to see if stop() has been called. |
| | stop() | Completes the import process. |
| | URI() | Returns the URI for the current importer instance. |
Having all the right parts in place is extremely important: if the bundle is incomplete or incorrect, the server process will crash when the importer starts. The best way to create a new custom importer is therefore to take an existing example, including the associated ant build script, and modify it as needed. You can find an example custom importer in the VoltDB public GitHub repository at the following URL:
The following sections describe:
Writing the custom importer using Java
Compiling, packaging, and installing the importer
Configuring and running the importer
One of the most important decisions you must make when planning your custom importer is whether to run a single importer process for the cluster or to design a run-everywhere importer. A single importer process ensures only one instance of the importer is running at any given time. That means on a cluster, only one node will run the import connector process.
The following sections discuss run-everywhere vs. single process and managing the starting and stopping of the import connector.
A run-everywhere import connector starts a separate import process on each node in the cluster. A run-everywhere connector can improve performance since it distributes the work across the cluster. However, it means that the connector must negotiate the distribution of the work to avoid importing duplicate copies of the data.
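One simple way to divide the work, shown here only as a sketch, is to assign each source resource to exactly one node by hashing its key. The class, the method, and the idea that each importer knows its own node index and the node count are all assumptions made for illustration; none of them come from the VoltDB API, and a real connector may need a scheme that also handles nodes joining or leaving.

```java
/** Hypothetical helper for illustration; VoltDB does not provide this class. */
class WorkSplitSketch {
    /**
     * Returns true if the node at nodeIndex (0-based) should import the
     * resource identified by resourceKey. Every key maps to exactly one node,
     * so no two nodes import the same resource.
     */
    static boolean ownsResource(String resourceKey, int nodeIndex, int nodeCount) {
        if (nodeCount <= 0 || nodeIndex < 0 || nodeIndex >= nodeCount) {
            throw new IllegalArgumentException("invalid node index or node count");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(resourceKey.hashCode(), nodeCount) == nodeIndex;
    }
}
```

With three nodes, each importer would call `ownsResource(key, myIndex, 3)` and skip any resource for which the result is false.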
Run-everywhere connectors are especially useful where the import process uses a "push" model rather than a "pull" model. That is, if the connector opens a port and accepts data sent to the port, then the data source(s) can proactively connect and "push" data to that port, making the data source responsible for the distribution to the multiple servers of the VoltDB cluster.
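To make the push model concrete, the following sketch opens a port, waits for one sender to connect, and hands each received line to a handler. It uses only standard Java networking classes. The class name, the line-per-row framing, and the handler callback are assumptions for illustration; in a real importer this logic would live inside the connector and the handler would invoke a stored procedure.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical push-model listener; not part of the VoltDB API. */
class PushListenerSketch {
    /** Reads newline-delimited rows until end of stream; returns the row count. */
    static int readRows(InputStream in, Consumer<String> rowHandler) throws IOException {
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            rowHandler.accept(line);   // stand-in for formatting and inserting the row
            count++;
        }
        return count;
    }

    /** Opens the port, waits for one sender to connect, and reads what it pushes. */
    static int receiveFromOneSender(int port, Consumer<String> rowHandler) throws IOException {
        try (ServerSocket server = new ServerSocket(port);
             Socket sender = server.accept()) {
            return readRows(sender.getInputStream(), rowHandler);
        }
    }
}
```

Because every node listens on its own port, the data source chooses which server receives each row, which is why no coordination between importer instances is needed in this model.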
You specify whether you are creating a single importer process or run-everywhere connector in the isImporterRunEverywhere() method of the ImporterFactory class. If the method returns true, importer processes are created on every server. If the method returns false, only one importer process is created at any given time.
When the custom importer is enabled, the ImporterFactory create() method is invoked, which in turn creates instances of the ImporterConfig and Importer classes. The VoltDB import infrastructure then calls the Importer accept() method for each importer process that is created.
The accept() method does the bulk of the work of iteratively fetching data from the appropriate sources, calling the formatter to structure the content of each row, and invoking the appropriate stored procedure to insert the data into the database. Two important points to keep in mind:
If the accept() method fails for any reason or returns to the caller, the importer will stop until the next time it is initialized. (That is, when the database restarts or is paused and resumed.)
On each iteration, the accept() method should check to see if the stop() method has been called, so it can clean up any pending imports and then return to the caller.
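The shape of such a loop can be sketched in plain Java. The class below is a stand-in, not the VoltDB AbstractImporter: the iterator represents the remote source, the function represents the formatter, and the consumer represents the stored procedure invocation. All three are hypothetical placeholders chosen so the example runs on its own.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.Function;

/** Stand-in for an importer; a real one would extend AbstractImporter. */
class AcceptLoopSketch {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final Iterator<String> source;               // hypothetical data source
    private final Function<String, Object[]> formatter;  // stand-in for the formatter
    private final Consumer<Object[]> procedureCall;      // stand-in for the procedure call

    AcceptLoopSketch(Iterator<String> source,
                     Function<String, Object[]> formatter,
                     Consumer<Object[]> procedureCall) {
        this.source = source;
        this.formatter = formatter;
        this.procedureCall = procedureCall;
    }

    /** Mirrors stop(): asks the loop to finish after the current row. */
    void stop() {
        stopRequested.set(true);
    }

    /** Mirrors accept(): imports rows until stopped or the source is exhausted. */
    int accept() {
        int imported = 0;
        // Check the stop flag on every iteration, before fetching more data.
        while (!stopRequested.get() && source.hasNext()) {
            Object[] params = formatter.apply(source.next());
            procedureCall.accept(params);
            imported++;
        }
        // A real importer would release connections and pending work here.
        return imported;
    }
}
```

The flag is an AtomicBoolean because stop() is typically called from a different thread than the one running the loop.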
Once the custom importer code is ready, you need to compile and package it as an OSGi-compliant JAR file. Several OSGi properties must be set in the JAR file manifest, so it is easiest to use an ant build file to compile and package the files. The following is an excerpt from an example build.xml file for a custom importer project:
<!-- Simple build file to build socket stream importer -->
<project name="customimport" basedir="." default="customimporter">
    <property name='base.dir' location='.' />
    <property name='bundles.dir' location='./bundles' />
    <property name='build.dir' location='./obj' />

    <target name="buildbundle" depends="customimporter"/>

    <resources id="default.imports.resource">
        <string>org.osgi.framework;version="[1.6,2)"</string>
        <string>org.voltcore.network</string>
        <string>org.voltdb.importer</string>
        <string>org.voltdb.client</string>
        <string>org.voltdb.importer.formatter</string>
        <string>org.apache.log4j</string>
        <string>org.slf4j</string>
        <string>jsr166y</string>
        <string>org.voltcore.utils</string>
        <string>org.voltcore.logging</string>
        <string>com.google_voltpatches.common.base</string>
        <string>com.google_voltpatches.common.collect</string>
        <string>com.google_voltpatches.common.net</string>
        <string>com.google_voltpatches.common.io</string>
        <string>com.google_voltpatches.common.util.concurrent</string>
    </resources>
    <pathconvert property="default.imports" refid="default.imports.resource" pathsep=","/>

    <target name="customimporter">
        <!-- Compile source files -->
        [ . . . ]
        <!-- Build OSGi bundle -->
        <antcall target="osgibundle">
            <param name="bundle.name" value="mycustomimporter"/>
            <param name="activator" value="MyCustomImporterFactory"/>
            <param name="bundle.displayname" value="MyCustomImporter"/>
            <param name="include.classpattern" value="mycustomimporter/*.class"/>
        </antcall>
    </target>

    <target name="osgibundle">
        <mkdir dir="${bundles.dir}" />
        <jar destfile="${bundles.dir}//${bundle.name}.jar" basedir="${build.dir}">
            <include name="${include.classpattern}"/>
            <manifest>
                <attribute name="Bundle-Activator" value="${activator}" />
                <attribute name="Bundle-ManifestVersion" value="2" />
                <attribute name="Bundle-Name" value="${bundle.displayname} OSGi Bundle" />
                <attribute name="Bundle-SymbolicName" value="${bundle.displayname}" />
                <attribute name="Bundle-Version" value="1.0.0" />
                <attribute name="DynamicImport-Package" value="*" />
                <attribute name="Import-Package" value="${default.imports}" />
            </manifest>
        </jar>
    </target>
</project>
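Since an incomplete bundle can crash the server, it may be worth checking the manifest before installing the JAR. The following sketch uses the standard java.util.jar classes to report which of the manifest attributes set by the example build file are missing or empty. The class name and the choice of attributes to check are assumptions for illustration; this is not a VoltDB tool and it does not validate the attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical pre-install check; not part of VoltDB. */
class BundleManifestCheck {
    // Attributes set by the example build file (assumed to be the ones worth checking).
    static final String[] ATTRIBUTES_TO_CHECK = {
        "Bundle-Activator",
        "Bundle-ManifestVersion",
        "Bundle-Name",
        "Bundle-SymbolicName",
        "Bundle-Version",
        "Import-Package"
    };

    /** Returns the names of checked attributes that are absent or blank. */
    static List<String> missingAttributes(Manifest manifest) {
        Attributes main = manifest.getMainAttributes();
        List<String> missing = new ArrayList<>();
        for (String name : ATTRIBUTES_TO_CHECK) {
            String value = main.getValue(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

To check a packaged bundle, obtain the manifest with `new java.util.jar.JarFile(path).getManifest()`, which returns null if the JAR has no manifest, and pass it to `missingAttributes()`.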
Once you create the OSGi bundle, you install the custom importer by copying it to the bundles folder inside the root directory of the VoltDB installation on every server in the cluster. For example, if VoltDB is installed in /opt/voltdb, copy your custom importer JAR file to /opt/voltdb/bundles/.
Once the custom importer is installed on the VoltDB servers, you can configure and start the importer using the database configuration file. You can configure import either before the database starts or after the database is running using the voltadmin update command.
In the configuration, use the deployment.import.configuration array to declare your custom importer. Specify the type as "custom" and identify the custom importer bundle in the module property by specifying the name of the JAR file. It is also a good idea to provide a nickname so you can identify the importer directly in the voltadmin get and set commands. For example:
deployment:
  import:
    configuration:
      - nickname: mycustomimporter
        type: custom
        module: mycustomimporter.jar
        [ . . . ]
If the custom importer requires additional information, you can provide it in properties passed to the ImporterConfig class. For example:
deployment:
  import:
    configuration:
      - nickname: mycustomimporter
        type: custom
        module: mycustomimporter.jar
        property:
          - name: datasource
            value: my.data.source
          - name: timeout
            value: 5m
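On the Java side, the configuration class has to turn those property values into typed settings. The sketch below assumes the properties arrive as a java.util.Properties object and that the timeout uses a number followed by s, m, or h. Both points, along with the class and method names, are assumptions for illustration and should be checked against the importer API you build against.

```java
import java.time.Duration;
import java.util.Properties;

/** Hypothetical settings holder; a real one would implement ImporterConfig. */
class CustomImporterSettings {
    final String dataSource;
    final Duration timeout;

    CustomImporterSettings(Properties props) {
        String source = props.getProperty("datasource");
        String timeoutText = props.getProperty("timeout");
        if (source == null || timeoutText == null) {
            throw new IllegalArgumentException("datasource and timeout are required");
        }
        this.dataSource = source;
        this.timeout = parseTimeout(timeoutText);
    }

    /** Parses values such as "30s", "5m", or "1h" (assumed format). */
    static Duration parseTimeout(String text) {
        String t = text.trim();
        if (t.length() < 2) {
            throw new IllegalArgumentException("invalid timeout: " + text);
        }
        long amount = Long.parseLong(t.substring(0, t.length() - 1));
        switch (t.charAt(t.length() - 1)) {
            case 's': return Duration.ofSeconds(amount);
            case 'm': return Duration.ofMinutes(amount);
            case 'h': return Duration.ofHours(amount);
            default: throw new IllegalArgumentException("unknown time unit in: " + text);
        }
    }
}
```

Failing fast in the constructor on a missing or malformed property surfaces configuration mistakes when the importer is initialized rather than midway through an import.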
As soon as the configuration is enabled, the import processes will be initialized and the custom importer accept() method invoked by the VoltDB import infrastructure.