CREATE TABLE

Documentation

VoltDB Home » Documentation » Using VoltDB

< previous

table of contents

next >

CREATE TABLE

CREATE TABLE — Creates a table in the database.

Synopsis

CREATE TABLE table-name (
column-definition [,...]
[, constraint-definition [,...]]

) [ttl-definition] ;

column-definition: column-name datatype [DEFAULT value ] [ NOT NULL ] [index-type]

constraint-definition: [CONSTRAINT constraint-name] { index-definition | limit-definition }

index-definition: {index-type} (column-name [,...])

limit-definition: LIMIT PARTITION ROWS row-count [EXECUTE (delete-statement)]

index-type: PRIMARY KEY | UNIQUE | ASSUMEUNIQUE

ttl-definition: USING TTL value [time-unit] ON COLUMN column-name
[BATCH_SIZE number-of-rows] [MAX_FREQUENCY value]

time-unit: SECONDS | MINUTES | HOURS | DAYS

Description

The CREATE TABLE statement creates a table and its associated columns in the database. The supported datatypes are described in Table A.1, “Supported SQL Datatypes”.

Table A.1. Supported SQL Datatypes

SQL Datatype	Equivalent Java Datatype	Description
TINYINT	byte	1-byte signed integer, -127 to 127^[a]
SMALLINT	short	2-byte signed integer, -32,767 to 32,767
INTEGER	int	4-byte signed integer, -2,147,483,647 to 2,147,483,647
BIGINT	long	8-byte signed integer, -9,223,372,036,854,775,807 to 9,223,372,036,854,775,807
FLOAT	double	8-byte numeric, -(2-2^-52)·2¹⁰²³ to (2-2^-52)·2¹⁰²³ (Note that values less than or equal to -1.7E+308 are interpreted as null.)
DECIMAL	BigDecimal	16-byte fixed scale of 12 and precision of 38, -99999999999999999999999999.999999999999 to 99999999999999999999999999.999999999999
GEOGRAPHY or GEOGRAPHY()		A geospatial region. The storage requirement for geospatial data varies depending on the geometry. The default maximum size in memory is 32768. However, you can specify a different value by specifying the maximum size (in bytes) in the declaration. For example: GEOGRAPHY(80000). See the section on entering geospatial data in the VoltDB Guide to Performance and Customization for details.
GEOGRAPHY_POINT		A geospatial location identified by its latitude and longitude. Requires 16 bytes of storage.
VARCHAR()	String	Variable length text string, with a maximum length specified in either characters (the default) or bytes. To specify the length in bytes, use the BYTES keyword after the length value. For example: VARCHAR(28 BYTES).
VARBINARY()	byte array	Variable length binary string (sometimes referred to as a "blob") with a maximum length specified in bytes
TIMESTAMP	long, VoltDB TimestampType	Time in microseconds
^[a]For integer and floating-point datatypes, VoltDB reserves the largest possible negative value to denote a null value. For example -128 is interpreted as null for TINYINT, -32768 for SMALLINT, and so on.

The following limitations are important to note when using the CREATE TABLE statement in VoltDB:

CHECK and FOREIGN KEY constraints are not supported.
VoltDB does not support AUTO_INCREMENT, the automatic incrementing of column values.
A table can have up to 1024 columns. Each column has a maximum size of 1 megabyte and the total declared size of all of the columns in a table cannot exceed 2 megabytes. For VARCHAR columns where the length is specified in characters, the declared size is calculated as 4 bytes per character to allow for the longest potential UTF-8 string.
If you intend to use a column to partition a table, that column cannot contain null values. You must specify NOT NULL in the definition of the column or VoltDB issues an error when compiling the schema.
To specify an index — either for an individual column or as a table constraint — that is globally unique across the database, use the standard SQL keywords UNIQUE and PRIMARY KEY. However, for partitioned tables, VoltDB can only ensure uniqueness if the index includes the partitioning column. Otherwise, these keywords are not allowed.
It can be a performance advantage to define indexes or constraints on non-partitioning columns that you, as the developer, know are going to contain unique values. Although VoltDB cannot ensure uniqueness across the entire database, it does allow you to define indexes that are assumed to be unique by using the ASSUMEUNIQUE keyword.
When you define an index on a partitioned table as ASSUMEUNIQUE, VoltDB verifies uniqueness within the current partition when creating an index entry. However, it is your responsibility as developer or administrator to ensure that the values are actually globally unique. If the database is repartitioned due to adding new nodes or restoring a snapshot to a different cluster configuration, non-unique ASSUMEUNIQUE index entries may collide. When this occurs it results in a constraint violation error and the database will not be able to complete its current action.
Therefore, ASSUMEUNIQUE should be used with caution. Also, it is not necessary and should not be used with replicated tables or indexes that contain the partitioning column, which can be defined as UNIQUE.
VoltDB includes a special constraint, LIMIT PARTITION ROWS, that limits the number of rows of data that can be inserted into any one partition for the table. This constraint is useful for managing memory usage and avoiding accidentally running out of memory due to unbalanced partitions or unexpected data growth.
Note that the limit, specified as an integer, limits the number of rows per partition, not for the table as a whole. In the case of replicated tables, where each partition contains all rows of the table, the limit applies equally to the table as a whole and each partition. Also, the constraint is applied to INSERT operations. The constraint is not enforced when restoring a snapshot, altering the table declaration, or rebalancing the cluster as part of elastically adding nodes. In these cases, ignoring the limit allows the operation to succeed even if, as a result, a partition ends up containing more rows than specified by the LIMIT PARTITION ROWS constraint. But once the limit has been exceeded, any attempt to INSERT more table rows into that partition will result in an error, until sufficient rows are deleted to reduce the row count below the limit.
As part of the LIMIT PARTITION ROWS constraint, you can optionally include an EXECUTE clause that specifies a DELETE statement to be executed when an INSERT statement will exceed the partition's row limit. For example, assume the events table has the following constraint as part of the CREATE TABLE statement:
```
CREATE TABLE events (
   event_time TIMESTAMP NOT NULL,
   event_code INTEGER NOT NULL,
   event_message VARCHAR(128),
   LIMIT PARTITION ROWS 1000 EXECUTE (
     DELETE FROM events WHERE 
        SINCE_EPOCH(second,NOW) - SINCE_EPOCH(second,event_time) > 24*3600
   )
);
```
At runtime, If an INSERT statement would result in the the current partition having more than 1000 rows, the delete statement will automatically be executed in an attempt to reduce the row count before the INSERT statement is run. In the example, any records with an event_time older than 24 hours will be deleted. Note that it is your responsibility as the query designer to provide a DELETE statement that is both deterministic and likely to remove sufficient rows to allow the query to succeed. Several important points to note about the EXECUTE clause:
- If the DELETE statement does not delete sufficient rows, the INSERT statement will fail. For example, in the previous example, if you attempt to insert more than 1000 rows into a single partition in a 24 hour period, the DELETE statement will not delete enough records when you attempt to insert the 1001st record.
- The LIMIT PARTITION ROWS constraint is applied per partition. That is, the DELETE statement is executed as a single-partitioned query in the partition where the INSERT statement triggers the row limit constraint, even if the INSERT statement is part of a multi-partitioned stored procedure.
The USING TTL clause specifies a lifetime for each record, based on the difference between the specified "time to live" (TTL) value, the value of the specified column, and the current time (in GMT microseconds). In the simplest case, you can define a time to live based on a TIMESTAMP column defined as DEFAULT NOW, so the record expires the specified amount of time after it is inserted. For example, the records in the following table will be deleted five minutes after they are inserted into the database (assuming the default value is used for the created column):
```
CREATE TABLE current_alerts (
   id BIGINT NOT NULL,
   message VARCHAR(128),
   created TIMESTAMP DEFAULT NOW NOT NULL,
) USING TTL 5 MINUTES ON COLUMN created;
```
You specify the time to live value as an integer number of seconds, minutes, hours, or days. (The default, if you do not specify a time unit, is seconds.) You can identify any INTEGER, BIGINT, or TIMESTAMP column as the TTL column, as long as the column is defined as NOT NULL.
The USING TTL clause lets you automate the deletion of temporal data from the database at a set interval. TTL records are evaluated and deleted by a parallel process within the database. As a result, records are deleted shortly after the specified time to live arrives, rather than at the exact time specified. But the deletion of records is handled as a proper database transaction, guaranteeing consistency with any user-invoked transactions.
One consequence of automating the expiration of database records, is that the evaluation and deletion of records can impact database performance.
- There must be a usable index on the TTL column for the table. VoltDB uses that index to optimize the evaluation of the TTL values. If not, the USING TTL clause is accepted, but no automated deletion will occur at runtime until a usable index is defined.
- The CREATE TABLE... USING TTL statement is not rejected if the index is missing. This way you can define the index in a subsequent DDL statement. However, a warning message is issued if the USING TTL clause has no supporting index available. A similar warning is issued if you delete the last usable index.
- TTL clauses are most effective when used on partitioned tables. Defining TTL for a replicated table, especially a large replicated table, can have a significant impact on database performance because the TTL delete actions must be processed as multi-partition transactions.
- You can also specify the frequency and maximum size of the TTL processing cycle.
  - The BATCH_SIZE argument specifies the maximum number of records that will be deleted during each processing cycle. Specify the batch size as a positive integer. The default is 1000 rows.
  - The MAX_FREQUENCY argument specifies how often the TTL clause is evaluated. You specify the frequency in terms of the maximum number of times it is processed per second. For example a MAX_FREQUENCY of 10 means that the table's TTL value is processed at most 10 times per second. Specify the frequency as a positive integer. The default frequency is once per second (1).
  Under extreme loads or sudden bursts of inserts, it is possible for TTL processing to fall behind. Or if the records are extremely large, attempting to delete too many records at one time can cause the TTL process to exceed the temporary table limit. The BATCH_SIZE and MAX_FREQUENCY clauses let you customize the TTL processing per table to meet the specific requirements of your application. The TTL selector for the @Statistics system procedure can help you evaluate TTL performance against your application workload to determine what settings you need.
- When using database replication (DR), it is possible for the TTL transaction to exceed the 50MB limit on the DR binary log. If this happens, a warning is issued and TTL processing is suspended.
Finally, evaluation of the time to live is made against the current value of the TTL column, not its initial value. So if a subsequent transaction alters the column value (either increasing or decreasing it) that modification will impact the subsequent lifetime of the record.
The length of VARCHAR columns can be specified in either characters (the default) or bytes. To specify the length in bytes, include the BYTES keyword after the length value; for example VARCHAR(16 BYTES).
Specifying the VARCHAR length in characters is recommended because UTF-8 characters can require a variable number of bytes to store. By specifying the length in characters you can be sure the column has sufficient space to store any string of the specified length. Specifying the length in bytes is only recommended when all values contain only single byte (ASCII) characters or when conserving space is required and the strings are less than 64 bytes in length.
The VARBINARY datatype provides variable storage for arbitrary strings of binary data and operates similarly to VARCHAR(n BYTES) strings. You assign byte arrays to a VARBINARY column when passing in variables, or you can use a hexidecimal string for assigning literal values in the SQL statement.
The VoltDB TIMESTAMP datatype is a long integer representing the number of microseconds since the epoch. Two important points to note about this timestamp:
- The VoltDB TIMESTAMP is not the same as the Java Timestamp datatype or traditional Linux time measurements, which are measured in milliseconds rather than microseconds. Appropriate conversion is needed when casting values between a VoltDB TIMESTAMP and other timestamp datatypes.
- The VoltDB TIMESTAMP is interpreted as a Greenwich Meantime (GMT) value. Depending on how time values are created, their value may or may not account for the local machine's default time zone. Mixing timestamps from different time zones (for example, in WHERE clause comparisons) can result in unexpected behavior.
For TIMESTAMP columns, you can define a default value using the NOW or CURRENT_TIMESTAMP keywords in place of a specific value. For example:
```
CREATE TABLE Event (
    Event_Id INTEGER UNIQUE NOT NULL,
    Event_Timestamp TIMESTAMP DEFAULT NOW,
    Event_Description VARCHAR(128)
);
```
The default value is evaluated at runtime as an approximation, in milliseconds, of when the transaction begins execution.

Example

The following example defines a table with five columns. The first column, Company, is not allowed to be null, which is important since it is used as the partitioning column in the following PARTITION TABLE statement. That column is also contained in the PRIMARY KEY constraint. Again, it is important to include the partitioning column in any fully unique indexes for partitioned tables.

CREATE TABLE Inventory (
    Company VARCHAR(32) NOT NULL,
    ProductID BIGINT NOT NULL, 
    Price DECIMAL,
    Category VARCHAR(32),
    Description VARCHAR(256),
    PRIMARY KEY (Company, ProductID)
);
PARTITION TABLE Inventory ON COLUMN Company;

< previous

table of contents

next >