The file connector receives the serialized data from the export streams and writes it out to disk as text files (either comma-separated or tab-delimited). The file connector writes one file per stream, "rolling" over to new files periodically. The filenames of the exported data are constructed from:
A unique prefix (specified with the nonce property)
A unique value identifying the current version of the database schema
The stream name
A timestamp identifying when the file was started
While the file is being written, the file name also contains the prefix "active-". Once the file is complete and a new file started, the "active-" prefix is removed. Therefore, any export files without the prefix are complete and can be copied, moved, deleted, or post-processed as desired.
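As a rough illustration of the rule above, the following Python sketch lists the completed files in an output directory. It assumes that the "active-" marker appears at the very start of the file name and that completed file names begin with the nonce. The function name and the directory layout are hypothetical and are not part of VoltDB.

```python
from pathlib import Path

# Marker the connector adds to files that are still being written.
ACTIVE_PREFIX = "active-"


def completed_export_files(outdir, nonce):
    """Return the export files for `nonce` that are safe to post-process."""
    done = []
    for path in sorted(Path(outdir).iterdir()):
        if not path.is_file():
            continue
        if path.name.startswith(ACTIVE_PREFIX):
            continue  # still being written; leave it alone
        if path.name.startswith(nonce):
            done.append(path)
    return done
```

A post-processing job could call `completed_export_files(outdir, nonce)` on a schedule and move or archive only the files it returns.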
There are two properties that must be set when using the file connector:
The type property lets you choose between comma-separated files (csv) or tab-delimited files (tsv).
The nonce property specifies a unique prefix to identify all files that the connector writes out for this database instance.
Table 15.1, “File Export Properties” describes the supported properties for the file connector.
Table 15.1. File Export Properties
Property | Allowable Values | Description |
---|---|---|
type* | csv, tsv | Specifies whether to create comma-separated (CSV) or tab-delimited (TSV) files. |
nonce* | string | A unique prefix for the output files. |
outdir | directory path | The directory where the files are created. Relative paths are relative to the database root directory on each server. If you do not specify an output path, VoltDB writes the output files into the root directory itself. |
period | Integer | The frequency, in minutes, for "rolling" the output file. The default frequency is 60 minutes. |
binaryencoding | hex, base64 | Specifies whether VARBINARY data is encoded in hexadecimal or BASE64 format. The default is hexadecimal. |
dateformat | format string | The format of the date used when constructing the output file names. You specify the date format as a Java SimpleDateFormat string. The default format is "yyyyMMddHHmmss". |
timezone | string | The time zone to use when formatting the timestamp. Specify the time zone as a Java timezone identifier. The default is GMT. |
delimiters | string | Specifies the delimiter characters for CSV output. The text string specifies four characters in the following order: the separator, the quote character, the escape character, and the end-of-line character. Non-printing characters must be encoded as Java literals. For example, the new line character (ASCII code 10) should be entered as "\n". Alternatively, you can use Java Unicode literals, such as "\u000a". You must also encode any XML special characters, such as the ampersand and left angle bracket, as HTML entities for inclusion in the XML configuration file. For example, encode "<" as "&lt;". The following property definition matches the default delimiters, that is, the comma, the double quotation character twice (as both the quote and escape delimiters), and the new line character: <property name="delimiters">,""\n</property> |
batched | true, false | Specifies whether to store the output files in subfolders that are "rolled" according to the frequency specified by the period property. The subfolders are named according to the nonce and the timestamp, with "active-" prefixed to the subfolder currently being written. |
skipinternals | true, false | Specifies whether to omit the six columns of VoltDB metadata (such as transaction ID and timestamp) from the output. If you specify skipinternals as "true", the output files contain only the exported stream data. |
with-schema | true, false | Specifies whether to write a JSON representation of each stream's schema as part of the export. The JSON schema files can be used to ensure that the appropriate datatype and precision are maintained if and when the output files are imported into another system. |
*Required |
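
The encoding rules for the delimiters property are easy to get wrong by hand. The following Python sketch shows one way to build the property element so that XML special characters are escaped. The helper name is hypothetical, the property name is taken from the table row above, and the end-of-line value is passed as the Java literal text (a backslash followed by "n") rather than as an actual new line character.

```python
from xml.sax.saxutils import escape


def delimiters_property(separator, quote, escape_char, eol):
    """Build a <property> element for the four delimiter characters.

    The arguments are concatenated in the documented order (separator,
    quote, escape, end-of-line) and XML special characters are escaped.
    """
    value = separator + quote + escape_char + eol
    return '<property name="delimiters">' + escape(value) + "</property>"


# Default delimiters: comma, double quote (twice), Java literal \n
print(delimiters_property(",", '"', '"', r"\n"))
# A left angle bracket as the separator must become &lt;
print(delimiters_property("<", '"', '"', r"\n"))
```

The standard library `escape` function handles the ampersand and both angle brackets, which covers the XML special characters mentioned in the table.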
Whatever properties you choose, the order and representation of the content within the output files are the same. The export connector writes a separate line of data for every INSERT it receives, including the following information:
Six columns of metadata generated by the export connector. This information includes a transaction ID, a timestamp, a sequence number, the site and partition IDs, as well as an integer indicating the query type.
The remaining columns are the columns of the database stream, in the same order as they are listed in the database definition (DDL) file.
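The layout described above can be sketched in Python as follows. This is a minimal example that assumes CSV output with the default delimiters (comma separator, double quotation mark as both quote and escape), which matches the default dialect of Python's csv module. The function name and the sample values are made up for illustration.

```python
import csv
import io

# Number of metadata columns written ahead of the stream columns.
METADATA_COLUMNS = 6


def split_export_rows(text, skipinternals=False):
    """Yield (metadata, stream_values) for each line of CSV export text.

    When the files were written with skipinternals set to true, there are
    no metadata columns, so the metadata list is empty.
    """
    offset = 0 if skipinternals else METADATA_COLUMNS
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # ignore blank lines
        yield row[:offset], row[offset:]


# Made-up line: six metadata values followed by two stream columns.
sample = '"1","2","3","4","5","1","alice","say ""hi"""\n'
for metadata, values in split_export_rows(sample):
    print(metadata, values)
```

Splitting by position rather than by column name keeps the sketch independent of the exact order of the metadata columns, which the consumer can then map as needed.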