CREATE TABLE

Documentation

VoltDB Home » Documentation » Using VoltDB

CREATE TABLE

CREATE TABLE — Creates a table in the database.

Synopsis

CREATE TABLE table-name (
column-definition [,...]
[, constraint-definition [,...]]

);

column-definition: column-name datatype [DEFAULT value ] [ NOT NULL ] [index-type]

constraint-definition: [CONSTRAINT constraint-name] { index-definition | limit-definition }

index-definition: {index-type} (column-name [,...])

limit-definition: LIMIT PARTITION ROWS row-count [EXECUTE (delete-statement)]

index-type: PRIMARY KEY | UNIQUE | ASSUMEUNIQUE

Description

The CREATE TABLE statement creates a table and its associated columns in the database. The supported datatypes are described in Table A.1, “Supported SQL Datatypes”.

Table A.1. Supported SQL Datatypes

SQL DatatypeEquivalent Java DatatypeDescription
TINYINTbyte1-byte signed integer, -127 to 127[a]
SMALLINTshort2-byte signed integer, -32,767 to 32,767
INTEGERint4-byte signed integer, -2,147,483,647 to 2,147,483,647
BIGINTlong8-byte signed integer, -9,223,372,036,854,775,807 to 9,223,372,036,854,775,807
FLOATdouble8-byte numeric, -(2-2-52)·21023 to (2-2-52)·21023 (Note that values less than or equal to -1.7E+308 are interpreted as null.)
DECIMALBigDecimal16-byte fixed scale of 12 and precision of 38, -99999999999999999999999999.999999999999 to 99999999999999999999999999.999999999999
GEOGRAPHY or GEOGRAPHY() A geospatial region. The storage requirement for geospatial data varies depending on the geometry. The default maximum size in memory is 32768. However, you can specify a different value by specifying the maximum size (in bytes) in the declaration. For example: GEOGRAPHY(80000). See the section on entering geospatial data in the VoltDB Guide to Performance and Customization for details.
GEOGRAPHY_POINT A geospatial location identified by its latitude and longitude. Requires 16 bytes of storage.
VARCHAR()StringVariable length text string, with a maximum length specified in either characters (the default) or bytes. To specify the length in bytes, use the BYTES keyword after the length value. For example: VARCHAR(28 BYTES).
VARBINARY()byte arrayVariable length binary string (sometimes referred to as a "blob") with a maximum length specified in bytes
TIMESTAMPlong, VoltDB TimestampTypeTime in microseconds

[a] For integer and floating-point datatypes, VoltDB reserves the largest possible negative value to denote a null value. For example -128 is interpreted as null for TINYINT, -32768 for SMALLINT, and so on.


The following limitations are important to note when using the CREATE TABLE statement in VoltDB:

  • CHECK and FOREIGN KEY constraints are not supported.

  • VoltDB does not support AUTO_INCREMENT, the automatic incrementing of column values.

  • Each column has a maximum size of one megabyte and the total declared size of all of the columns in a table cannot exceed two megabytes. For VARCHAR columns where the length is specified in characters, the declared size is calculated as four bytes per character to allow for the longest potential UTF-8 string.

  • If you intend to use a column to partition a table, that column cannot contain null values. You must specify NOT NULL in the definition of the column or VoltDB issues an error when compiling the schema.

  • When you specify an index constraint, by default VoltDB creates a tree index. You can explicitly create a hash index by including the string "hash" as part of the index name. For example, the following declaration creates a hash index, Version_Hash_Idx, of three numeric columns.

    CREATE TABLE Version (
        Major SMALLINT NOT NULL,
        Minor SMALLINT NOT NULL,
        baselevel INTEGER NOT NULL,
        ReleaseDate TIMESTAMP,
        CONSTRAINT Version_Hash_Idx PRIMARY KEY
            (Major, Minor, Baselevel)
    );

    See the description of CREATE INDEX for more information on the difference between hash and tree indexes.

  • To specify an index — either for an individual column or as a table constraint — that is globally unique across the database, use the standard SQL keywords UNIQUE and PRIMARY KEY. However, for partitioned tables, VoltDB can only ensure uniqueness if the index includes the partitioning column. Otherwise, these keywords are not allowed.

    It can be a performance advantage to define indexes or constraints on non-partitioning columns that you, as the developer, know are going to contain unique values. Although VoltDB cannot ensure uniqueness across the entire database, it does allow you to define indexes that are assumed to be unique by using the ASSUMEUNIQUE keyword.

    When you define an index on a partitioned table as ASSUMEUNIQUE, VoltDB verifies uniqueness within the current partition when creating an index entry. However, it is your responsibility as developer or administrator to ensure that the values are actually globally unique. If the database is repartitioned due to adding new nodes or restoring a snapshot to a different cluster configuration, non-unique ASSUMEUNIQUE index entries may collide. When this occurs it results in a constraint violation error and the database will not be able to complete its current action.

    Therefore, ASSUMEUNIQUE should be used with caution. Also, it is not necessary and should not be used with replicated tables or indexes that contain the partitioning column, which can be defined as UNIQUE.

  • VoltDB includes a special constraint, LIMIT PARTITION ROWS, that limits the number of rows of data that can be inserted into any one partition for the table. This constraint is useful for managing memory usage and avoiding accidentally running out of memory due to unbalanced partitions or unexpected data growth.

    Note that the limit, specified as an integer, limits the number of rows per partition, not for the table as a whole. In the case of replicated tables, where each partition contains all rows of the table, the limit applies equally to the table as a whole and each partition. Also, the constraint is applied to INSERT operations. The constraint is not enforced when restoring a snapshot, altering the table declaration, or rebalancing the cluster as part of elastically adding nodes. In these cases, ignoring the limit allows the operation to succeed even if, as a result, a partition ends up containing more rows than specified by the LIMIT PARTITION ROWS constraint. But once the limit has been exceeded, any attempt to INSERT more table rows into that partition will result in an error, until sufficient rows are deleted to reduce the row count below the limit.

    As part of the LIMIT PARTITION ROWS constraint, you can optionally include an EXECUTE clause that specifies a DELETE statement to be executed when an INSERT statement will exceed the partition's row limit. For example, assume the events table has the following constraint as part of the CREATE TABLE statement:

    CREATE TABLE events (
       event_time TIMESTAMP NOT NULL,
       event_code INTEGER NOT NULL,
       event_message VARCHAR(128),
       LIMIT PARTITION ROWS 1000 EXECUTE (
         DELETE FROM events WHERE 
            SINCE_EPOCH(second,NOW) - SINCE_EPOCH(second,event_time) > 24*3600
       )
    );

    At runtime, If an INSERT statement would result in the the current partition having more than 1000 rows, the delete statement will automatically be executed in an attempt to reduce the row count before the INSERT statement is run. In the example, any records with an event_time older than 24 hours will be deleted. Note that it is your responsibility as the query designer to provide a DELETE statement that is both deterministic and likely to remove sufficient rows to allow the query to succeed. Several important points to note about the EXECUTE clause:

    • If the DELETE statement does not delete sufficient rows, the INSERT statement will fail. For example, in the previous example, if you attempt to insert more than 1000 rows into a single partition in a 24 hour period, the DELETE statement will not delete enough records when you attempt to insert the 1001st record.

    • The LIMIT PARTITION ROWS constraint is applied per partition. That is, the DELETE statement is executed as a single-partitioned query in the partition where the INSERT statement triggers the row limit constraint, even if the INSERT statement is part of a multi-partitioned stored procedure.

  • The length of VARCHAR columns can be specified in either characters (the default) or bytes. To specify the length in bytes, include the BYTES keyword after the length value; for example VARCHAR(16 BYTES).

    Specifying the VARCHAR length in characters is recommended because UTF-8 characters can require a variable number of bytes to store. By specifying the length in characters you can be sure the column has sufficient space to store any string of the specified length. Specifying the length in bytes is only recommended when all values contain only single byte (ASCII) characters or when conserving space is required and the strings are less than 64 bytes in length.

  • The VARBINARY datatype provides variable storage for arbitrary strings of binary data and operates similarly to VARCHAR(n BYTES) strings. You assign byte arrays to a VARBINARY column when passing in variables, or you can use a hexidecimal string for assigning literal values in the SQL statement.

  • The VoltDB TIMESTAMP datatype is a long integer representing the number of microseconds since the epoch. Two important points to note about this timestamp:

    • The VoltDB TIMESTAMP is not the same as the Java Timestamp datatype or traditional Linux time measurements, which are measured in milliseconds rather than microseconds. Appropriate conversion is needed when casting values between a VoltDB TIMESTAMP and other timestamp datatypes.

    • The VoltDB TIMESTAMP is interpreted as a Greenwich Meantime (GMT) value. Depending on how time values are created, their value may or may not account for the local machine's default time zone. Mixing timestamps from different time zones (for example, in WHERE clause comparisons) can result in unexpected behavior.

  • For TIMESTAMP columns, you can define a default value using the NOW or CURRENT_TIMESTAMP keywords in place of a specific value. For example:

    CREATE TABLE Event (
        Event_Id INTEGER UNIQUE NOT NULL,
        Event_Timestamp TIMESTAMP DEFAULT NOW,
        Event_Description VARCHAR(128)
    );

    The default value is evaluated at runtime as an approximation, in milliseconds, of when the transaction begins execution.

Example

The following example defines a table with five columns. The first column, Company, is not allowed to be null, which is important since it is used as the partitioning column in the following PARTITION TABLE statement. That column is also contained in the PRIMARY KEY constraint. Again, it is important to include the partitioning column in any fully unique indexes for partitioned tables.

CREATE TABLE Inventory (
    Company VARCHAR(32) NOT NULL,
    ProductID BIGINT NOT NULL, 
    Price DECIMAL,
    Category VARCHAR(32),
    Description VARCHAR(256),
    PRIMARY KEY (Company, ProductID)
);
PARTITION TABLE Inventory ON COLUMN Company;