HQL Reference

The Hypertable Query Language (HQL) allows you to create, modify, and query tables and invoke adminstrative commands. HQL is interpreted by the following interfaces:

  • The hypertable command line interface (ht shell)
  • The hql_exec* and hql_query* Thrift API methods
  • The Hypertable::HqlInterpreter C++ class

Table of Contents

ALTER TABLE

EBNF

ALTER TABLE name alter_specification [alter_specification]*
ALTER TABLE name WITH schema_file

alter_specification:
  ADD add_definitions
  | MODIFY modify_definitions
  | drop_specification
  | RENAME COLUMN FAMILY '(' old_name ',' new_name ')'

add_definitions:
  '(' add_definition [',' add_definition]* ')'

add_definition:
  column_family_name  [column_family_option ...]
  | INDEX column_family_name
  | QUALIFIER INDEX column_family_name
  | ACCESS GROUP name '(' [column_family_name, ...] ')'
      [access_group_options ...]

modify_definitions:
  '(' modify_definition [',' modify_definition]* ')'

modify_definition:
  column_family_name  [modifiable_column_family_option ...]
  | INDEX column_family_name
  | QUALIFIER INDEX column_family_name
  | ACCESS GROUP name '(' [column_family_name, ...] ')'
      [access_group_options ...]

drop_specification:
  DROP '(' column_family_list ')'
  | DROP INDEX '(' column_family_list ')'
  | DROP QUALIFIER INDEX '(' column_family_list ')'

column_family_list:
  column_family_name [',' column_family_name]*

column_family_option:
  modifiable_column_family_option
  | COUNTER [true|false]
  | TIME_ORDER asc|desc

modifiable_column_family_option:
  MAX_VERSIONS int
  | TTL duration

duration:
  num MONTHS
  | num WEEKS
  | num DAYS
  | num HOURS
  | num MINUTES
  | num [ SECONDS ]

access_group_option:
  IN_MEMORY [true|false]
  | BLOCKSIZE int
  | REPLICATION int
  | COMPRESSOR compressor_spec
  | BLOOMFILTER bloom_filter_spec

access_group_options:
  column_family_option | access_group_option

compressor_spec:
  bmz [ bmz_options ]
  | lzo
  | quicklz
  | snappy
  | zlib [ zlib_options ]
  | none

bmz_options:
  --fp-len int
  | --offset int

zlib_options:
  -9
  | --best
  | --normal

bloom_filter_spec:
  rows [ bloom_filter_options ]
  | rows+cols [ bloom_filter_options ]
  | none 

bloom_filter_options:
  --false-positive float
  --bits-per-item float
  --num-hashes int
  --max-approx-items int

DESCRIPTION

The ALTER TABLE command provides a way to alter a table by adding column families, indices, and access groups, dropping column families and indices, renaming column families, or modifying column family and access group, definitions.  See CREATE TABLE for a description of the column family and access group options.  The command takes two forms as described below.

1. ALTER TABLE name WITH schema_file

When a WITH clause is supplied, the command will modify the schema for table 'name' with the schema read from file 'schema_file'.  The 'schema_file' must contain a schema in XML format as generated with the DESCRIBE TABLE command supplied with the WITH IDS option, for example:

DESCRIBE TABLE WITH IDS other_table

2. ALTER TABLE name alter_specification [alter_specification]*

In this form, the schema modifications are specified as arguments to the command.  The format of the schema modification arguments (alter_specification) are described below.

Column families that are not explicitly included in an access group specification will automatically be assigned to the "default" access group.  Moving column families from one access group to another is currently not supported.

The MODIFY option to ALTER TABLE allows you to replace column family and access group definitions with new ones.  Column family options that are specified as access group options are used as default option values for column families that are subsequently added to the access group and do not explicitly define those options in the column family definition.  The options COUNTER and TIME_ORDER of column family definitions cannot be modified.

Adding a column index with the MODIFY command does not populate the index, which means that the index will not be consistent with the table after the command is run and may return empty results erroneously when used.  To populate an index after adding it with the MODIFY command, the REBUILD INDICES command must be run.  Likewise, when dropping a column index with the MODIFY command, the space taken up by the index will not be reclaimend until the REBUILD INDICES command is run.

EXAMPLE

The following statements,

CREATE TABLE foo (
  a MAX_VERSIONS 1,
  b TTL 1 DAY,
  c,
  ACCESS GROUP primary (a) BLOCKSIZE 1024,
  ACCESS GROUP secondary (b, c) COMPRESSOR="zlib --best"
);

ALTER TABLE foo
  ADD ( d MAX_VERSIONS 2 )
  ADD ( ACCESS GROUP tertiary (d) BLOOMFILTER "rows --false-positive 0.1")
  DROP ( c ) 
  RENAME COLUMN FAMILY (a, e);

ALTER TABLE foo MODIFY (e MAX_VERSIONS 3, b TTL 172800);

will produce the following output with SHOW CREATE TABLE foo;

CREATE TABLE foo (
  e MAX_VERSIONS 3,
  b TTL 172800,
  d MAX_VERSIONS 2,
  ACCESS GROUP primary (e) BLOCKSIZE 1024,
  ACCESS GROUP secondary (b) COMPRESSOR "zlib --best",
  ACCESS GROUP tertiary (d) BLOOMFILTER "rows --false-positive 0.1",
);

COMPACT

EBNF

COMPACT [TYPE '=' compaction_type] TABLE table_name [row]

COMPACT [TYPE '=' compaction_type] RANGES range_type ['|' range_type ...]

compaction_type:
    MINOR
    | MAJOR
    | MERGING
    | GC

range_type:
  ROOT |
  METADATA |
  SYSTEM |
  USER |
  ALL

DESCRIPTION

The COMPACT command schedules a compaction operation for a specified set of ranges.  The command has two forms.  The first form provides the ability to compact ranges for a specific table.  It takes an optional string literal that represents a row in the table.  If this optional row string is provided, then just the range containing the row will be scheduled for compaction, otherwise all ranges in the table will be scheduled for compaction.  The second form of the command provides the ability to compact all ranges of a given type.  The following table describes the range types that may be specified.  Multiple types may be specified by concatenating the type names together with the '|' character.

Range Type Description
ROOT The root range of the METADATA table
METADATA All ranges of the METADATA table
SYSTEM All ranges in the sys/ namespace (except METADATA)
USER Ranges for all tables not in the sys/ namespace
ALL All ranges in the database

EXAMPLE

hypertable> COMPACT TABLE foo

hypertable> COMPACT TABLE foo "com.bar"

hypertable> COMPACT RANGES METADATA|SYSTEM

hypertable> COMPACT RANGES ALL

CREATE NAMESPACE

EBNF

CREATE NAMESPACE namespace_name;

DESCRIPTION

CREATE NAMESPACE command creates a new namespace. If namespace_name starts with '/' it treats the namespace_name as an absolute path otherwise it considers it to be a sub-namespace relative to the current namespace.

EXAMPLE

hypertable> CREATE NAMESPACE "/test";
hypertable> USE "/test";
hypertable> CREATE NAMESPACE "subtest";

CREATE TABLE

EBNF

CREATE TABLE name '(' [create_definition, ...] ')' [table_option ...]
CREATE TABLE name LIKE other_table
CREATE TABLE name WITH schema_file

create_definition:
  column_family_name [column_family_option ...]
  | INDEX column_family_name
  | QUALIFIER INDEX column_family_name
  | ACCESS GROUP name '(' [column_family_name, ...] ')'
    [access_group_options ...]

column_family_option:
  MAX_VERSIONS int
  | TIME_ORDER ASC|DESC
  | TTL duration
  | COUNTER [true|false)]

duration:
  int MONTHS
  | int WEEKS
  | int DAYS
  | int HOURS
  | int MINUTES
  | int [ SECONDS ]

access_group_option:
  IN_MEMORY
  | BLOCKSIZE int
  | REPLICATION int
  | COMPRESSOR compressor_spec
  | BLOOMFILTER bloom_filter_spec

access_group_options:
  column_family_option | access_group_option

compressor_spec:
  bmz [ bmz_options ]
  | lzo
  | quicklz
  | snappy
  | zlib [ zlib_options ]
  | none

bmz_options:
  --fp-len int
  | --offset int

zlib_options:
  -9
  | --best
  | --normal

bloom_filter_spec:
  rows [ bloom_filter_options ]
  | rows+cols [ bloom_filter_options ]
  | none

bloom_filter_options:
  --false-positive float
  --bits-per-item float
  --num-hashes int
  --max-approx-items int

table_option:
  access_group_option
  | column_family_option
  | GROUP_COMMIT_INTERVAL int

DESCRIPTION

CREATE TABLE creates a table with the given name.  The command takes three forms as described below.

1. CREATE TABLE name LIKE other_table

When a LIKE clause is supplied, the command will create a table 'name' that has the same schema as table 'other_table'.

2. CREATE TABLE name WITH schema_file

When a WITH clause is supplied, the command will create a table 'name' with the schema read from file 'schema_file'.  The 'schema_file' must contain a schema in XML format as generated with the DESCRIBE TABLE command supplied with the WITH IDS option, for example:

DESCRIBE TABLE WITH IDS other_table

3. CREATE TABLE name '(' [create_definition, ...] ')' [table_option ...]

In this form, the schema is specified as arguments to the command.  The schema consists of column family specifications, access group specifications and optional table options which are described below.

Column Families

Column families are somewhat analogous to a traditional database column. The main difference is that a theoretically infinite number of qualified columns can be created within a column family. The qualifier is an optional NUL- terminated string that can be supplied, along with the data, in the insert statement. This is what gives tables in Hypertable their sparse nature. For example, given a column family "tag", the following set of qualified columns may be inserted for a single row.

  • tag:good
  • tag:science
  • tag:authoritative
  • tag:green

The column family is represented internally as a single byte, so there is a limit of 255 column families (the 0 value is reserved) which may be supplied in the CREATE TABLE statement.

Secondary Indices

Tables can also have one or more indices, each indexing a single column family.  Two types of indices exist: a cell value index, which optimizes scans on a single column family that do an excact match or prefix match of the cell value, and a qualifier index, which optimizes scans on a single column family that do an exact match or prefix match of the column qualifier. The use of indices is optional.

The indices are stored in an index table which is created in the same namespace as the primary table and has the same name with one (cell value index) or two (qualifier index) caret signs (`^`) as a prefix.

A column family can have both types of indices (cell value index and qualifier index) at the same time.  The following HQL command creates a table with three column families (a, b and c).  Column family a has a cell value index, column family b has a qualifier index and c has both.

CREATE TABLE foo (
    a,
    b,
    c,
    INDEX a,
    QUALIFIER INDEX b,
    INDEX c,
    QUALIFIER INDEX c,
 );

Indices speed up some queries that match on column families.  Accessing columns which are indexed is nearly as fast as accessing them by their row key. On the downside they require additional disk storage and cause a very small performance impact when inserting data to an indexed column.

Cell value indices are used when selecting cells by value (SELECT a FROM TABLE t WHERE a = "cell-value" ...) or by a value prefix (SELECT a FROM TABLE t WHERE a =^ "cell-prefix" ...).

Qualifier indices are used when selecting cells from a qualified column (SELECT a:foo FROM TABLE t ...) or selecting all cells with a qualifier prefix (SELECT a:^prefix FROM TABLE t ...).

Access Groups

Tables consist of one or more access groups, each containing some number of column families. There is a default access group named "default" which contains all column families that are not explicitly referenced in an ACCESS GROUP clause. For example, the following two statements are equivalent.

CREATE TABLE foo (
   a,
   b,
   c,
   ACCESS GROUP bar ( a, b )
);

CREATE TABLE foo (
   a,
   b,
   c,
   ACCESS GROUP bar (a, b),
   ACCESS GROUP default (c)
);

Access groups provide control over the physical layout of the table data on disk. The data for all column families in the same access group are stored physically together on disk. By carefully defining a set of access groups and choosing which column families go into those access groups, performance can be significantly improved for expected workloads. For example, say you have a table with 100 column families, but two of the column families get access together with much higher frequency than the rest of the 98 column families. By putting the two frequently accessed column families in their own access group, the system does much less disk i/o because only the data for the two column families gets transfered whenever those column families are accessed. A row-oriented database can be emulated by having a single access group. A column-oriented database can be emulated by having each column family within their own access group.

Column Family Options

The following column family options are supported:

  MAX_VERSIONS int
  TIME_ORDER ASC|DESC
  TTL duration
  COUNTER [true|false]

The MAX_VERSIONS option allows you to specify that you only want to keep n versions of each cell.  Cells are identified by a 3-tuple, (row_key, column_family, column_qualifier).  In addition to the 3-tuple that identifies the cell, there is a 64-bit timestamp field that defines a specific version of the cell.  When specifying a MAX_VERSIONS n option, depending on the value of the TIME_ORDER option (see below), either the most recent n versions or the oldest n versions will be kept.  Versions outside the window (of size n) will be lazily garbage collected through the normal compaction process.

The TIME_ORDER option controls the order in which cell versions are stored and returned.  The TIME_ORDER option defines the order, within each cell, in which the versions are stored and returned.  By default the order is reverse-chronological (ASC) so when a SELECT statement is issued with MAX_VERSIONS 1, the most recent version is returned for each cell.  When specifying TIME_ORDER DESC for a column, a SELECT statement that is issued with MAX_VERSIONS 1 will return the oldest version of the cell.

The TTL option allows you to specify that you only want to keep cell versions that fall within some time window in the immediate past. For example, you can specify that you only want to keep cells that were created within the past two weeks. Like the MAX_VERSIONS option, older versions are lazily garbage collected through the normal compaction process.

The COUNTER option makes each instance of this column act as an atomic counter. Counter columns are accessed using the same methods as other columns. However, to modify the counter, the value must be formatted specially, as described in the following table.

Value Format Description
['+'] n Increment the counter by n
'-' n Decrement the counter by n
'=' n Reset the counter to n

For example, consider the following sequence of values written to a counter column:

  +9
  =0
  +3
  +4
  +5
  -2
  +9

After these six values get written to a counter column, a subsequent read of that column would return the ASCII string "10".

Access Group Options

Access group options are supplied after the column list and control physical characteristics of the access group or provide default options for member column families.  The following options are supported:

  column_family_option
  IN_MEMORY
  BLOCKSIZE int
  REPLICATION int
  COMPRESSOR compressor_spec
  BLOOMFILTER bloom_filter_spec 

Any of the column family options may be specified as access group options.
Column family options specified as access group options are taken to be
default values for any member column family that doesn't define the option in its definition.

The IN_MEMORY option indicates that all cell data for the access group should remain memory resident. Queries against column families in IN_MEMORY access groups are efficient because no disk access is required.

The cell data inserted into an access group resides in one of two places. The recently inserted cells are stored in an in-memory data structure called the cell cache and older cells get compacted into on-disk data structures called cell stores. The cell stores are organized as a series of compressed blocks of sorted key/value pairs (cells). At the end of the compressed blocks is a block index which contains, for each block, the key (row,column,timestamp) of the last cell in the block, followed by the block offset. It also contains a Bloom Filter.

The BLOCKSIZE option controls the size of the compressed blocks in the cell stores. A smaller block size minimizes the amount of data that must be read from disk and decompressed for a key lookup at the expense of a larger block index which consumes memory. The default value for the block size is 65K.

The REPLICATION option controls the replication level in the underlying distributed file system for cell store files created for this access group. The default is unspecified, which translates to whatever the default replication level is for the underlying file system.  If the underlying filesystem does not support a replication factor, this option is ignored.

The COMPRESSOR option specifies the compression codec that should be used for cell store blocks within an access group. See the Compressors section below for a description of each compression codec.

NOTE: if the block, after compression, is not significantly reduced in size, then no compression will be performed on the block

An access group can consist of many on-disk cell stores. A query for a single row key can result probing each cell store to see if data is present for that row even when most of the cell stores do not contain any data for that row. To eliminate this inefficiency, each cell store contains an optional Bloom Filter. The Bloom Filter is a probabilistic data structure that can indicate, with high probability, if a key is present and also indicate definitively if a key is not present. By mapping the bloom filters, for each cell store in memory, queries can be made much more efficient because only the cell stores that contain the row are searched.

The bloom filter specification can take one of the following forms. The rows form, which is the default, causes only row keys to be inserted into the bloom filter. The rows+cols form causes the row key concatenated with the column family to be inserted into the bloom filter. none disables the bloom filter.

  rows [ bloom_filter_options ]
  rows+cols [ bloom_filter_options ]
  none

The following table describes the bloom filter options:

Option Default Description
--false-positive arg 0.01 Expected false positive probability. This option is (currently) mutually exclusive with the --bits-per-item and --num-hashes options. If specified it will choose the minimum number of bits per item that can achieve the given false positive probability and will choose the appropriate number of hash functions.
--bits-per-item arg NULL Number of bits to use per item (to compute size of bloom filter). Must be used in conjunction with --num-hashes.
--num-hashes arg NULL Number of hash functions to use. Must be used in conjunction with --bits-per-item.
--max-approx-items arg 1000 Number of cell store items used to guess the number of actual Bloom filter entries
Compressors

The cell store blocks within an access group are compressed using the compression codec that is specified for the access group. The following compression codecs are available:

  • bmz
  • lzo
  • quicklz
  • snappy
  • zlib
  • none

The default code is snappy for cell store blocks. The following tables describe the available options.

bmz codec options
Option Default Description
--fp-len arg 19 Minimum fingerprint length
--offset arg 0 Starting fingerprint offset

zlib codec options
Option Description
-9 [ --best ] Highest compression ratio (at the cost of speed)
--normal Normal compression ratio

Table Options

Table options are supplied at the end of the CREATE TABLE statement and control the runtime characteristics of the table or provide default options for access groups or column families.  The following options are supported:

  access_group_option
  column_family_option
  GROUP_COMMIT_INTERVAL int

Any of the access group options may be specified as table options. Access group options specified as table options are taken to be default values for any member access group that doesn't define the option in its definition.

Any of the column family options may be specified as table options. Column family options specified as table options are taken to be default values for any member column family that doesn't define the option in its definition. Column family defaults specified in an access group take precedence over column family defaults specified at the table level.

These are the same options as the ones in the column family and access group specification except that they act as defaults in the case where no corresponding option is specified in the column family or access group specifier. See the description under Access Group Options for option details.

Group commit is a feature whereby the system will accumulate update requests for a table and commit them together as a group on a regular interval. This improves the performance of systems that receive a large number of concurrent updates by reducing the number of times sync() gets called on the commit log.

The GROUP_COMMIT_INTERVAL option tells the system that updates to this table should be carried out with group commit and also specifies the commit interval in milliseconds. The interval is constrained by the value of the config property Hypertable.RangeServer.CommitInterval, which acts as a lower bound and defaults to 50ms. The value specified for GROUP_COMMIT_INTERVAL will get rounded up to the nearest multiple of this property value.

DELETE

EBNF

DELETE ('*' | column [',' column ...])
  FROM table_name
  WHERE ROW '=' row_key
  [(TIMESTAMP timestamp | VERSION timestamp)]

column:
  column_family [':' column_qualifier]

timestamp:
  YYYY-MM-DD HH:MM:SS[.nanoseconds]

DESCRIPTION

The DELETE command provides a way to delete cells from a row in a table. The command applies to a single row only and can be used to delete, for a given row, all of the cells in a qualified column, all the cells in a column family, or all of the cells in the row. If the TIMESTAMP clause is given, then the delete will only apply to those cells whose internal timestamp field is equal to or less than the given timestamp. An example of each type of delete is shown below. Assume that we are re starting with a table that contains the following:

hypertable> SELECT * FROM crawldb DISPLAY_TIMESTAMPS;
2010-01-01 00:00:02.00000000    org.hypertable.www      status-code     200     
2010-01-01 00:00:01.00000000    org.hypertable.www      status-code     200
2010-01-01 00:00:04.00000000    org.hypertable.www      anchor:http://www.news.com/     Hypertable
2010-01-01 00:00:03.00000000    org.hypertable.www      anchor:http://www.news.com/     Hypertable
2010-01-01 00:00:06.00000000    org.hypertable.www      anchor:http://www.opensource.org/       Hypertable.org
2010-01-01 00:00:05.00000000    org.hypertable.www      anchor:http://www.opensource.org/       Hypertable.org
2010-01-01 00:00:08.00000000    org.hypertable.www      checksum        822828699
2010-01-01 00:00:07.00000000    org.hypertable.www      checksum        2921728

The first example shows how to delete one specific timestamped version of a cell in the column anchor:http://www.opensource.org/ of the row org.hypertable.www.

hypertable> DELETE "anchor:http://www.opensource.org/" FROM crawldb WHERE
ROW='org.hypertable.www' VERSION "2010-01-01 00:00:06";

hypertable> select "anchor" from crawldb DISPLAY_TIMESTAMPS;
2010-01-01 00:00:04.00000000    org.hypertable.www      anchor:http://www.news.com/     Hypertable
2010-01-01 00:00:03.00000000    org.hypertable.www      anchor:http://www.news.com/     Hypertable
2010-01-01 00:00:05.00000000    org.hypertable.www      anchor:http://www.opensource.org/       Hypertable.org

The next example shows how to delete the cells in the column anchor:http://www.opensource.org/ of the row org.hypertable.www.

hypertable> DELETE "anchor:http://www.opensource.org/" FROM crawldb WHERE
ROW='org.hypertable.www';

hypertable> select * from crawldb;
org.hypertable.www      status-code     200
org.hypertable.www      status-code     200
org.hypertable.www      anchor:http://www.news.com/     Hypertable
org.hypertable.www      anchor:http://www.news.com/     Hypertable
org.hypertable.www      checksum        822828699
org.hypertable.www      checksum        2921728

The next example shows how to delete all of the cells in the column family checksum of the row org.hypertable.www.

hypertable> DELETE checksum FROM crawldb WHERE ROW="org.hypertable.www";

hypertable> select * from crawldb;
org.hypertable.www      status-code     200
org.hypertable.www      status-code     200
org.hypertable.www      anchor:http://www.news.com/     Hypertable
org.hypertable.www      anchor:http://www.news.com/     Hypertable

And finally, here's how to delete all of the cells in the row org.hypertable.www.

hypertable> DELETE * FROM crawldb WHERE ROW="org.hypertable.www";

hypertable> SELECT * FROM crawldb;

DESCRIBE TABLE

EBNF

DESCRIBE TABLE [ WITH IDS ] table_name

DESCRPTION

The DESCRIBE TABLE command displays the XML-style schema for a table. The output of the straight DESCRIBE TABLE command can be passed into the Hypertable C++ Client::create_table() API as the schema parameter. If the optional WITH IDS clause is supplied, then the schema "generation" attribute and the column family "id" attributes are included in the XML output. For example, the following table creation statement:

CREATE TABLE foo (
  a MAX_VERSIONS 1,
  b TTL 1 DAY,
  c,
  ACCESS GROUP primary (a, b) BLOCKSIZE 1024,
  ACCESS GROUP secondary (c) compressor "zlib --best"
);

will create a table with the following schema as reported by the CREATE TABLE command:

<Schema>
  <AccessGroupDefaults>
  </AccessGroupDefaults>
  <ColumnFamilyDefaults>
  </ColumnFamilyDefaults>
  <AccessGroup name="primary">
    <Options>
      <BlockSize>1024</BlockSize>
    </Options>
    <ColumnFamilyDefaults>
    </ColumnFamilyDefaults>
    <ColumnFamily>
      <Name>a</Name>
      <AccessGroup>primary</AccessGroup>
      <Deleted>false</Deleted>
      <Options>
        <MaxVersions>1</MaxVersions>
      </Options>
    </ColumnFamily>
    <ColumnFamily>
      <Name>b</Name>
      <AccessGroup>primary</AccessGroup>
      <Deleted>false</Deleted>
      <Options>
        <TTL>86400</TTL>
      </Options>
    </ColumnFamily>
  </AccessGroup>
  <AccessGroup name="secondary">
    <Options>
      <Compressor>zlib --best</Compressor>
    </Options>
    <ColumnFamilyDefaults>
    </ColumnFamilyDefaults>
    <ColumnFamily>
      <Name>c</Name>
      <AccessGroup>secondary</AccessGroup>
      <Deleted>false</Deleted>
      <Options>
      </Options>
    </ColumnFamily>
  </AccessGroup>
</Schema>

and the following output will be generated when the WITH IDS clause is supplied in the CREATE TABLE statement:

<Schema>
  <Generation>1403544333358153000</Generation>
  <AccessGroupDefaults>
  </AccessGroupDefaults>
  <ColumnFamilyDefaults>
  </ColumnFamilyDefaults>
  <AccessGroup name="primary">
    <Generation>1403544333358153000</Generation>
    <Options>
      <BlockSize>1024</BlockSize>
    </Options>
    <ColumnFamilyDefaults>
    </ColumnFamilyDefaults>
    <ColumnFamily id="1">
      <Generation>1403544333358153000</Generation>
      <Name>a</Name>
      <AccessGroup>primary</AccessGroup>
      <Deleted>false</Deleted>
      <Options>
        <MaxVersions>1</MaxVersions>
      </Options>
    </ColumnFamily>
    <ColumnFamily id="2">
      <Generation>1403544333358153000</Generation>
      <Name>b</Name>
      <AccessGroup>primary</AccessGroup>
      <Deleted>false</Deleted>
      <Options>
        <TTL>86400</TTL>
      </Options>
    </ColumnFamily>
  </AccessGroup>
  <AccessGroup name="secondary">
    <Generation>1403544333358153000</Generation>
    <Options>
      <Compressor>zlib --best</Compressor>
    </Options>
    <ColumnFamilyDefaults>
    </ColumnFamilyDefaults>
    <ColumnFamily id="3">
      <Generation>1403544333358153000</Generation>
      <Name>c</Name>
      <AccessGroup>secondary</AccessGroup>
      <Deleted>false</Deleted>
      <Options>
      </Options>
    </ColumnFamily>
  </AccessGroup>
</Schema>

DROP NAMESPACE

EBNF

DROP NAMESPACE [IF EXISTS] namespace_name

DESCRIPTION

The DROP NAMESPACE removes a namespace. If the IF EXISTS clause is specified, the command wont generate an error if the namespace namespace_name does not exist. A namespace can only be dropped if it is empty (ie has contains no tables or sub-namespaces If namespace_name starts with '/' it treats namespace_name as an absolute path, otherwise it considers it to be a sub-namespace relative to the current namespace.

EXAMPLE

hypertable> DROP NAMESPACE "/test/subtest";
hypertable> DROP NAMESPACE "/test";

DROP TABLE

EBNF

DROP TABLE [IF EXISTS] table_name

DESCRIPTION

The DROP TABLE command removes the table table_name from the system. If the IF EXIST clause is supplied, the command won't generate an error if a table by the name of table_name does not exist.

EXAMPLE

hypertable> drop table foo;

DUMP TABLE

EBNF

DUMP TABLE table_name
  [COLUMNS ('*' | (column_predicate [',' column_predicate]*))]
  [where_clause]
  [options_spec]

where_clause:
  WHERE where_predicate [AND where_predicate]*

where_predicate:
  row_predicate
  | timestamp_predicate
  | value_predicate

row_predicate:
  ROW REGEXP 'row_regexp'

timestamp_predicate:
  [timestamp relop] TIMESTAMP relop timestamp

relop: '=' | '<' | '<=' | '>' | '>='

value_predicate:
  VALUE REGEXP 'value_regexp'

options_spec:
  ( MAX_VERSIONS version_count
  | INTO FILE [file_location]filename[.gz]
  | BUCKETS n
  | NO_ESCAPE)*

file_location:
  "fs://" | "file://"

timestamp:
  'YYYY-MM-DD HH:MM:SS[.nanoseconds]'

DESCRIPTION

The DUMP TABLE command provides a way to create efficient table backups which can be loaded with LOAD DATA INFILE. The problem with using SELECT to create table backups is that it outputs table data in order of row key. LOAD DATA INFILE yields worst-case performance when loading data that is sorted by the row key because only one RangeServer at a time will be actively receiving updates. Backup file generated with DUMP TABLE are much more efficient because the data distribution in the backup file causes many (or all) of the RangeServers to actively receive updates during the loading process. The DUMP TABLE command will randomly select n ranges and output cells from those ranges in round-robin fashion. n is the number of buckets (default is 20) and can be specified with the BUCKETS option.

OPTIONS

MAX_VERSIONS version_count

Each cell in a Hypertable table can have multiple timestamped versions. By default all versions of a cell are returned by the DUMP TABLE statement. The MAX_VERSIONS option allows control over the number of cell versions returned. The cell versions are stored in reverse-chronological order, so MAX_VERSIONS 1 will return the most recent version of the cell.

INTO FILE [file://|fs://]filename[.gz]

The result of a DUMP TABLE command is displayed to standard output by default. The INTO FILE option allows the output to get redirected to a file. If the file name starts with the location specifier fs:// then the output file is assumed to reside in the brokered FS. If it starts with file:// then output is sent to a local file. This is also the default location in the absence of any location specifier. If the file name specified ends in a .gz extension, then the output is compressed with gzip before it is written to the file.

BUCKETS n

This option causes the DUMP TABLE command to use n buckets. The default is 20. It is recommended that n is at least as large as the number of nodes in the cluster that the backup with be restored to.

NO_ESCAPE

The output format of a DUMP TABLE command comprises tab delimited lines, one cell per line, which is suitable for input to the LOAD DATA INFILE command. However, if the value portion of the cell contains either newline or tab characters, then it will confuse the LOAD DATA INFILE input parser. To prevent this from happening, newline, tab, and backslash characters are converted into two character escape sequences, described in the following table.

Character Escape Sequence
backslash (\) '\' '\'
newline (\n) '\' \n'
tab (\t) '\' 't'
NUL (\0) '\' '0'

The NO_ESCAPE option turns off this escaping mechanism.

EXAMPLES

DUMP TABLE foo;
DUMP TABLE foo WHERE '2008-07-28 00:00:02' < TIMESTAMP < '2008-07-28 00:00:07';
DUMP TABLE foo INTO FILE 'foo.tsv.gz'
DUMP TABLE foo MAX_VERSIONS 1 BUCKETS 1000;
DUMP TABLE LoadTest COLUMNS user:/^a/ WHERE ROW REGEXP "1.*2" AND VALUE REGEXP "foob";

EXISTS TABLE

EBNF

EXISTS TABLE table_name

DESCRIPTION

The EXISTS TABLE command will print "true" if a table named table_name exists in the current namespace and will print "false" if it does not.

EXAMPLE

hypertable> exists table foo;
true

GET LISTING

EBNF

GET LISTING

DESCRIPTION

The GET LISTING command lists the tables and namespaces in the current namespace.

EXAMPLE

hypertable> GET LISTING;
foo
sys   (namespace)
Test

INSERT

EBNF

INSERT INTO table_name VALUES value_list

value_list:
    value_spec [',' value_spec ...]

value_spec:
    '(' row ',' column ',' value ')'
    '(' timestamp ',' row ',' column ',' value ')'

column:
    family [ ':' qualifier ]

timestamp:
    YYYY-MM-DD HH:MM:SS[.nanoseconds]

DESCRIPTION

The INSERT command inserts data (cells) into a table. The data is supplied as a list of comma separated tuples. Each tuple represents a cell and can take one of two forms:

(row, column, value)
(timestamp, row, column, value)

The first form just supplies the row key, column key, and value as strings and the cell timestamp is auto-assigned. The second form supplies the timestamp in addition to the row key, column key, and value.

EXAMPLE

hypertable> INSERT INTO fruit VALUES ("cantelope", "tag:good", "Had with breakfast"), ("2009-08-02 08:30:00", "cantelope", "description", "A cultivated variety of muskmelon with orange flesh"),("banana", "tag:great", "Had with lunch");

hypertable> SELECT * FROM fruit DISPLAY_TIMESTAMPS;
2009-08-11 05:06:17.246062001  banana     tag:great    Had with lunch
2009-08-11 05:06:17.246062002  cantelope  tag:good     Had with breakfast
2009-08-02 08:30:00.000000000  cantelope  description  A cultivated variety of muskmelon with orange flesh

If timestamps are not supplied, then they will be automatically assigned by the RangeServer for non-indexed column families, and will be automatically assigned by the client library for indexed column families. It is therefore important that the system clock of the application servers be synchronized with the clocks of the Hypertable RangeServer machines.

Hypertable supports the GUID() function call for row key and value. It will generate a globally unique ID:

INSERT INTO test VALUES (GUID(), "username", "bloefeld");
INSERT INTO test VALUES ("harddisk0", "device", GUID());

might yield the following output from the SELECT command:

3a983b8e-b7c7-49ae-b3e4-e221610f33ec username bloefeld
harddisk0 device 6d38d110-8790-4a40-8653-701742343d1e

LOAD DATA INFILE

EBNF

LOAD DATA INFILE [options] [file_location]fname[.gz] INTO TABLE name

LOAD DATA INFILE [options] [file_location]fname[.gz] INTO FILE fname

options:

  (ROW_KEY_COLUMN '=' column_specifier ['+' column_specifier ...]
   | TIMESTAMP_COLUMN '=' name |
   | HEADER_FILE '=' '"' filename '"'
   | ROW_UNIQUIFY_CHARS '=' n
   | FS = '<char>'
   | NO_ESCAPE
   | NO_LOG
   | DUPLICATE_KEY_COLUMNS
   | IGNORE_UNKNOWN_COLUMNS
   | SINGLE_CELL_FORMAT)*

column_specifier =
  [ column_format ] column_name

column_format
  "%0" int
  | "%-"
  | "%"

file_location:
  "fs://" | "file://"

DESCRIPTION

The LOAD DATA INFILE command provides a way to bulk load data from an optionally compressed file or stdin (fname of "-", see Load from STDIN below), into a table. The input is assumed to start with a header line that indicates the format of the lines in the file. The header can optionlly be stored in a separate file and referenced with the HEADER_FILE option. The header is expected to have the following format:

header =
  [ '#' ] single_cell_format
  | [ '#' ] multi_cell_format

single_cell_format =
  "row" '\t' "column" '\t' "value" '\n'
  | "timestamp" '\t' "row" '\t' "column" '\t' "value" '\n'

multi_cell_format =
  column | string ( '\t' ( column | string ) )*

column = column_family [ ':' column_qualifier ]

Two basic tab-delimited formats are supported, a single cell format in which each line contains a single cell, and a multi-cell format in which each line can contain a list of cells. The following example shows the single-cell format:

Example 1

#row     column  value
1127071 query   guardianship
1127071 item:rank       8
1127071 click_url       http://adopting.adoption.com
1246036 query   polish american priests association
1246036 item:rank       6
1246036 click_url       http://www.palichicago.org
12653   query   lowes
12653   item:rank       1
12653   click_url       http://www.lowes.com
1270972 query   head hunters
1270972 item:rank       2
1270972 click_url       http://www.headhunters.com
2648672 query   jamie farr
2648672 item:rank       1
2648672 click_url       http://www.imdb.com

An optional initial timestamp column can be included which represents the cell timestamp, for example:

Example 2

#timestamp       row     column  value
2009-08-12 00:01:08     1127071 query   guardianship
2009-08-12 00:01:08     1127071 item:rank       8
2009-08-12 00:01:08     1127071 click_url       http://adopting.adoption.com
2009-08-12 00:01:18     1246036 query   polish american priests association
2009-08-12 00:01:18     1246036 item:rank       6
2009-08-12 00:01:18     1246036 click_url       http://www.palichicago.org
2009-08-12 00:01:14     12653   query   lowes
2009-08-12 00:01:14     12653   item:rank       1
2009-08-12 00:01:14     12653   click_url       http://www.lowes.com
2009-08-12 00:01:10     1270972 query   head hunters
2009-08-12 00:01:10     1270972 item:rank       2
2009-08-12 00:01:10     1270972 click_url       http://www.headhunters.com
2009-08-12 00:01:17     2648672 query   jamie farr
2009-08-12 00:01:17     2648672 item:rank       1
2009-08-12 00:01:17     2648672 click_url       http://www.imdb.com

Deleting data from Hypertable involves inserting delete tombstones.  A delete tombstone is just a key with a special delete flag and an empty value.  The system will logically remove the data described by the tombstone during query execution.  The deleted data will be physically removed (garbage collected) over time via background maintenance tasks.  The following table describes the supported tombstone types:

Table 1
Flag Marker Description
DELETE_ROW Remove row content that is chronologically less than or equal to the given timestamp
DELETE_COLUMN_FAMILY Remove all cells within the given column family of the given row with a timestamp that is chronologically less than or equal to the given timestamp
DELETE_CELL Remove all versions of a cell whose timestamp is chronologically less than or equal to the given timestamp
DELETE_CELL_VERSION Remove cell version whose timestamp exactly matches the given timestamp

Delete tombstones may be specified in either the three column or four column of the single-cell-per-line .tsv formats by supplying an additional column, after an empty value column, that contains one of the delete flag markers described in the above table.  For the three column format, the timestamp will be auto-assigned to the "current" time.

Example 3

#timestamp      row     column  value
2013-08-27 00:13:46     com.hypertable.www                      DELETE_ROW
2013-08-27 00:13:46     com.hypertable.www      tag             DELETE_COLUMN_FAMILY
2013-08-27 00:13:46     com.hypertable.www      tag:foo         DELETE_CELL
2013-08-27 00:13:46     com.hypertable.www      tag:foo         DELETE_CELL_VERSION

The multi-line format assumes that each tab delimited field represents a cell value and the column header specifies the name of the column. Unless otherwise specified, the first column is assumed to be the rowkey. For example:

Example 4

#anon_id     query       item:rank   click_url
3613173     batman signal images    18  http://www.icomania.com
1127071     guardianship  8     http://adopting.adoption.com
1270972     head hunters  2     http://www.headhunters.com
465778      google    1     http://www.google.com
12653       lowes     1     http://www.lowes.com
48785       address locator     2   http://www.usps.com/ncsc/
48785       address locator     3   http://factfinder.census.gov
2648672     jamie farr          1   http://www.imdb.com
1246036     polish american 6   http://www.palichicago.org
605089      dachshunds for sale 2   http://www.houstonzone.org
760038      stds       1   http://www.ashastd.org

When loaded into a table with a straight LOAD DATA INFILE command, the above file will produce a set of cells equivalent to Example 1 above.

OPTIONS

ROW_KEY_COLUMN = column_specifier [ + column_specifier ... ]

The LOAD DATA INFILE command accepts a number of options. The first is the ROW_KEY_COLUMN option. This is used in conjunction with the multi-cell input file format. It provides a way to select which column in the input file should be used as the row key (default is first column). By separating two or more column names with the '+' character, multiple column values will be concatenated together, separated by a single space character to form the row key. Also, each column specifier can have one of the following prefixes to control field width and justification:

Table 2
Prefix Description
%0 For numeric columns, specifies a field width of and right-justify with '0' padding
%- Specifies a field width of and right-justification with ' ' (space) padding
% Specifies a field width of and left-justification with ' ' (space) padding

For example, assuming the data in Example 3 above is contained in a filed named query-log.tsv, then the following LOAD DATA INFILE command:

LOAD DATA INFILE ROW_KEY_COLUMN="%09anon_id"+query "query-log.tsv" INTO TABLE 'anon-id-query';

will populated the 'anon-id-query' table with the following content:

000012653 lowes      item:rank      1
000012653 lowes      click_url      http://www.lowes.com
000048785 address locator       item:rank   3
000048785 address locator       item:rank   2
000048785 address locator       click_url   http://factfinder.census.gov
000048785 address locator       click_url   http://www.usps.com/ncsc/
000465778 google  item:rank     1
000465778 google  click_url     http://www.google.com
000605089 dachshunds for sale       item:rank   2
000605089 dachshunds for sale       click_url   http://www.houstonzone.org
000760038 stds       item:rank      1
000760038 stds       click_url      http://www.ashastd.org
001127071 guardianship          item:rank   8
001127071 guardianship          click_url   http://adopting.adoption.com
001246036 polish american       item:rank   6
001246036 polish american       click_url   http://www.palichicago.org
001270972 head hunters          item:rank   2
001270972 head hunters          click_url   http://www.headhunters.com
002648672 jamie farr            item:rank   1
002648672 jamie farr            click_url   http://www.imdb.com
003613173 batman signal images      item:rank   18
003613173 batman signal images      click_url   http://www.icomania.com

TIMESTAMP_COLUMN = column_name

The TIMESTAMP_COLUMN option is used in conjunction with the multi-cell input file format to specify which field of the input file should be used as the timestamp. The timestamp extracted from this field will be used for each cell in the row. The timestamp field is assumed to have the format YYYY-MM-DD HH:MM:SS

HEADER_FILE = "filename"

The HEADER_FILE option is used to specify an alternate file that contains the header line for the data file. This is useful in situations where you have log files that roll periodically and/or you want to be able to concatenate them. This option allows the input files to just contain data and the header to be specified in a separate file.

ROW_UNIQUIFY_CHARS = n

The ROW_UNIQUIFY_CHARS option provides a way to append a random string of characters to the end of the row keys to ensure that they are unique. The maximum number of characters you can specify is 21 and each character represents 6 random bits. It is useful in situations where the row key isn't discriminating enough to cause each input line to wind up in its own row. For example, let's say you want to dump a server log into a table, using the timestamp as the row key. However, as in the case of an Apache log, the timestamp usually only has resolution down to the second and there may be many entries that fall within the same second.

FS = '<char>'

Set the field separator to character '<char>'.  By default the field separator character is the tab character.  This option allows you to change the field separator character to something different.  When this option is supplied, the escaping rules change such that tabs are not escaped.  When this option is supplied, care must be taken to ensure that the field separator character is not present in the data.

NO_LOG

The NO_LOG option tells the servers to skip the commit log write for the data being loaded which can significantly improve the loading speed.  WARNING:  Use this option with caution because failure of a RangeServer during the execution of this command may result in data loss.  To avoid data loss after successfully running LOAD DATA INFILE with the NO_LOG option, the table should be compacted with the COMPACT command.

NO_ESCAPE

The NO_ESCAPE option provides a way to disable the escaping mechanism. The newline and tab characters are escaped and unescaped when transferred in and out of the system. The LOAD DATA INFILE command will scan the input for the two character sequences '\' 'n', '\' 't', '\' '0', and '\' '\' and will convert them into a newline, tab, NUL, and backslash, respectively. The NO_ESCAPE option disables this conversion.

DUPLICATE_KEY_COLUMNS

Normally input fields that represent the row key (the first field or the ones designated in the ROW_KEY_COLUMN option) are not also inserted as cell data. This option causes the system to also insert the row key fields as cell data.

IGNORE_UNKNOWN_COLUMNS

Skip input lines that refer to unknown (non-existent) column families.

SINGLE_CELL_FORMAT

The LOAD DATA INFILE command will attempt to detect the format of the input file by parsing the first line if it begins with a '#' character. It assumes that the remainder of the line is a tab-delimited list of column names. However, if the file format is the 3-field single-cell format (i.e. row,column,value) and lacks a header, and the first character of the row key in first line happens to be the '#' character, the parser will get confused and interpret the first cell as the format line. The SINGLE_CELL_FORMAT option provides a way to tell the interpreter that there is no header format line and the format of the load file is one of the single cell formats (e.g. row,column,value or timestamp,row,column,value) and have it determine the format by counting the tabs on the first line.

Loading from STDIN

The LOAD DATA INFILE command has the ability to load data from standard input by specifying a file name of "-". The following is an example of how to use this feature:

$ cat data.tsv | ht shell --batch -e 'load data infile "-" into table foo;'

Load from file in the brokered filesystem

If the data file name starts with the location specifier fs:// then the file is read from the brokered filesystem over the FsBroker. If it begins with the specifier file:// then it is read from the local filesystem (this is the default in the absence of a location specifier).

Compression

If the name of the input file ends with a .gz, the file is assumed to be compressed and will be streamed in through a decompressor (gzip).

REBUILD INDICES

EBNF

REBUILD [QUALIFIER|VALUE] INDICES table_name

DESCRIPTION

This command will rebuild the indices for the table specified by table_name.  It can be used to rebuild the qualifier indices, the value indices, or both.  It does this by first dropping and recreating the index table(s) and then scanning over the primary table, reconstructing the indices by repopulating the index tables.

NOTE: This command is not atomic with respect to queries.  While this command is running, queries against the indices that are being rebuilt may return incorrect results.  Also, the command is not serialized with other REBUILD INDICES commands for the same table.  Multiple concurrent invocations of this command for the same table will interfere with one another, causing one of the commands to fail.

EXAMPLE

REBUILD INDICES mytable;
REBUILD QUALIFIER INDICES mytable;
REBUILD VALUE INDICES mytable;

RENAME TABLE

EBNF

RENAME TABLE table_name TO new_table_name

DESCRIPTION

The RENAME TABLE command renames the existing table_name to the new_table_name.

EXAMPLE

hypertable> rename table foo to bar;

SELECT

EBNF

SELECT [CELLS] ('*' | (column_predicate [',' column_predicate]*))

  FROM table_name
  [where_clause]
  [options_spec]

where_clause:
    WHERE where_predicate [AND where_predicate ...]
where_predicate:
  cell_predicate
  | row_predicate
  | column_value_predicate
  | timestamp_predicate

relop: '=' | '<' | '<=' | '>' | '>=' | '=^'
column_predicate
  = column_family
  | column_family ':' column_qualifer
  | column_family ':' '/' column_qualifier_regexp '/'
  | column_family ':' '^' column_qualifier_prefix

cell_spec: row ',' column
cell_predicate:
  [cell_spec relop] CELL relop cell_spec
  | '(' [cell_spec relop] CELL relop cell_spec
        (OR [cell_spec relop] CELL relop cell_spec)* ')'

row_predicate:
  [row_key relop] ROW relop row_key
  | '(' [row_key relop] ROW relop row_key
          (OR [row_key relop] ROW relop row_key)* ')'
  | ROW REGEXP '"'row_regexp'"'

column_value_predicate:
  column_family '=' value
  | column_family '=' '^' value

timestamp_predicate:
  [timestamp relop] TIMESTAMP relop timestamp

options_spec:
  (MAX_VERSIONS version_count
  | OFFSET row_offset
  | LIMIT row_count
  | CELL_OFFSET cell_offset
  | CELL_LIMIT max_cells
  | CELL_LIMIT_PER_FAMILY max_cells_per_cf
  | INTO FILE [file_location]filename[.gz]
  | DISPLAY_TIMESTAMPS
  | KEYS_ONLY
  | NO_ESCAPE
  | RETURN_DELETES
  | SCAN_AND_FILTER_ROWS)*

timestamp:
  'YYYY-MM-DD HH:MM:SS[.nanoseconds]'

file_location:
  "fs://" | "file://"

DESCRIPTION

The SELECT command is used to issue queries that return a specific set of cells from a table.  Conceptually, the command operates in two steps.  The first step selects a set of candidate cells by applying the row or cell predicate.  The second step filters the candidate cells by applying an optional timestamp predicate and other options.

Row Predicate

This is the part of the select statement that allows you to specify a set of rows or row intervals to be considered during query evaluation.  A row or row interval can be specified using a comparison expression that combines the keyword ROW with comparison operators and row key string constants (e.g. ROW = "com.hypertable.www").  To specify multiple rows or row intervals, multiple comparison expressions can be strung together with the OR keyword.  Row predicates cannot be combined with cell predicates (see below). If a row or cell predicate is not supplied, the query results in a full table scan.

Cell Predicate

This part of the select statement allows you to specify a set of cells or cell intervals
to be considered during query evaluation.  A cell or cell interval can be specified using a comparison expression that combines the keyword CELL with comparison operators and cell key string constants (e.g. "com.hypertabl.www","tag:a" <= CELL < "com.hypertable.www","tag:j").  To specify multiple cells or cell intervals, multiple comparison expressions can be strung together with the OR keyword.  Row predicates cannot be combined with cell predicates.  If a row or cell predicate is not supplied, the query results in a full table scan.

Timestamp Predicate

Every cell of every table contains a 64-bit, high resolution timestamp that uniquely identifies a specific version of the cell.  The timestamp is is either supplied by the application or auto-assigned by the system and is interpreted as nanoseconds since the Epoch (1970-01-01 00:00:00.000000000).  Cells can be selected whos timestamp satisfies a certain criteria by specifying a comparison expression that combines the keyword TIMESTAMP with comparison operators and timestamp constants.  Only one timestamp comparison expression is allowed.  The timestamp predicate is a filter that is applied after the cells have been selected via the row or cell predicates.  If no timestamp predicate is supplied, all cell versions are returned.

Column Value Predicate

When specifying a column value predicate, the column family must be identical to the column family used in the SELECT clause, and exactly one column family must be selected.  The following examples are valid:

SELECT col FROM test WHERE col = "foo";
SELECT col FROM test WHERE col =^ "prefix";

The following examples are not valid because they select more than one column family or because the column family in the select clause is different from the one in the predicate (these limitations will be removed in future versions of Hypertable):

SELECT * FROM test WHERE col = "foo";
SELECT col, col2 FROM test WHERE col =^ "prefix";
SELECT foo FROM test WHERE bar = "value";

Column Qualifier Predicate

The column qualifier predicate comes immediately after the SELECT keyword and can be used to do an exact match or prefix match of a column qualifer.  If a qualifier index exists for the column family referenced, it will be used to efficiently satisfy the predicate.  The following is an example of an exact match column qualifier predicate:

SELECT tags:interesting FROM forum_posts;

The following is an example of a prefix match column qualifier predicate:

SELECT tags:^interest FROM forum_posts;

Comparison Operators

The following table lists the comparison operators that can be used in Row, Cell, or Timestamp predicates.

Operator Description Example
= Equals ROW = "com.hypertable.www/doc"
=^ Starts With ( row, cell, and column predicates only ) ROW =^ "com.hypertable"
< Less Than ROW < "com.hypertable"
<= Less Than or Equal To ROW <= "com.hypertable"
> Greater Than TIMESTAMP > "2012-01-02"
>= Greater Than or Equal To TIMESTAMP > "2012-01-02"

OPTIONS

MAX_VERSIONS version_count

Each cell in a Hypertable table can have multiple timestamped versions. By default all versions of a cell are returned by the SELECT statement. The MAX_VERSIONS option allows control over the number of cell versions returned. The cell revisions are stored in reverse-chronological order, so MAX_VERSIONS 1 will return the most recent version of the cell.

OFFSET row_offset

Skips the first row_offset rows returned by the SELECT statement. This option cannot be combined with CELL_OFFSET and currently applies independently to each row (or cell) interval supplied in the WHERE clause.

LIMIT row_count

Limits the number of rows returned by the SELECT statement to row_count. The limit applies independently to each row (or cell) interval specified in the WHERE clause.

CELL_OFFSET cell_offset

Skips the first cell_offset cells returned by the SELECT statement. This option cannot be combined with OFFSET and currently applies independently to each row (or cell) interval supplied in the WHERE clause.

CELL_LIMIT max_cells

Limits the total number of cells returned by the query to max_cells (applied after CELL_LIMIT_PER_FAMILY). The limit applies independently to each row (or cell) interval specified in the WHERE clause.

CELL_LIMIT_PER_FAMILY max_cells_per_cf

Limits the number of cells returned per row per column family by the SELECT statement to max_cells_per_cf.

INTO FILE [file://|fs://]filename[.gz]

The result of a SELECT command is displayed to standard output by default. The INTO FILE option allows the output to get redirected to a file.
If the file name starts with the location specifier fs:// then the output file is assumed to reside in the brokered FS. If it starts with file:// then output is sent to a local file. This is also the default location in the absence of any location specifier. If the file name specified ends in a .gz extension, then the output is compressed with gzip before it is written to the file. The first line of the output, when using the INTO FILE option, is a header line, which will take one of the two following formats. The second format will be output if the DISPLAY_TIMESTAMPS option is supplied.

#row '\t' column '\t' value

#timestamp '\t' row '\t' column '\t' value

DISPLAY_TIMESTAMPS

The SELECT command displays one cell per line of output. Each line contains three tab delimited fields, row, column, and value. The DISPLAY_TIMESTAMPS option causes the cell timestamp to be included in the output as well. When this option is used, each output line will contain four tab delimited fields in the following order:

timestamp, row, column, value

KEYS_ONLY

The KEYS_ONLY option suppresses the output of the value. It is somewhat efficient because the option is processed by the RangeServers and not by the client. The value data is not transferred back to the client, only the key data.

NO_CACHE

The NO_CACHE option causes the RangeServer query cache to be bypassed for this query. It not only prevents cached results from being returned, but it also prevents the results of the query from being inserted into the query cache.

NO_ESCAPE

The output format of a SELECT command comprises tab delimited lines, one cell per line, which is suitable for input to the LOAD DATA INFILE command. However, if the value portion of the cell contains either newline or tab characters, then it will confuse the LOAD DATA INFILE input parser. To prevent this from happening, newline, tab, and backslash characters are converted into two character escape sequences, described in the following table.

Escape Sequences
Character Escape Sequence
backslash \ '\' '\'
newline \n '\' 'n'
tab \t '\' 't'
NUL '\' '0'

The NO_ESCAPE option turns off this escaping mechanism.

RETURN_DELETES

The RETURN_DELETES option is used internally for debugging. When data is deleted from a table, the data is not actually deleted right away. A delete key will get inserted into the database and the delete will get processed and applied during subsequent scans. The RETURN_DELETES option will return the delete keys in addition to the normal cell keys and values. This option can be useful when used in conjuction with the DISPLAY_TIMESTAMPS option to understand how the delete mechanism works.

SCAN_AND_FILTER_ROWS

The SCAN_AND_FILTER_ROWS option can be used to improve query performance for queries that select a very large number of individual rows. The default algorithm for fetching a set of rows is to fetch each row individually, which involves a network roundtrip to a range server for each row. Supplying the SCAN_AND_FILTER_ROWS option tells the system to scan over the data and filter the requested rows at the range server, which will reduce the number of network roundtrips required when the number of rows requested is very large.

EXAMPLES

SELECT * FROM test WHERE ('a' <= ROW <= 'e') and
                         '2008-07-28 00:00:02' < TIMESTAMP < '2008-07-28 00:00:07';
SELECT * FROM test WHERE ROW =^ 'b';
SELECT * FROM test WHERE (ROW = 'a' or ROW = 'c' or ROW = 'g');
SELECT * FROM test WHERE ('a' < ROW <= 'c' or ROW = 'g' or ROW = 'c');
SELECT * FROM test WHERE (ROW < 'c' or ROW > 'd');
SELECT * FROM test WHERE (ROW < 'b' or ROW =^ 'b');
SELECT * FROM test WHERE "farm","tag:abaca" < CELL <= "had","tag:abacinate";
SELECT * FROM test WHERE "farm","tag:abaca" <= CELL <= "had","tag:abacinate";
SELECT * FROM test WHERE CELL = "foo","tag:adactylism";
SELECT * FROM test WHERE CELL =^ "foo","tag:ac";
SELECT * FROM test WHERE CELL =^ "foo","tag:a";
SELECT * FROM test WHERE CELL > "old","tag:abacate";
SELECT * FROM test WHERE CELL >= "old","tag:abacate";
SELECT * FROM test WHERE "old","tag:foo" < CELL >= "old","tag:abacate";
SELECT * FROM test WHERE ( CELL = "maui","tag:abaisance" OR
                           CELL = "foo","tag:adage" OR
                           CELL = "cow","tag:Ab" OR
                           CELL =^ "foo","tag:acya");
SELECT * FROM test INTO FILE "fs:///tmp/foo";
SELECT col2:"bird" from RegexpTest WHERE ROW REGEXP "http://.*";
SELECT col1:/^w[^a-zA-Z]*$/ from RegexpTest WHERE ROW REGEXP "m.*\s\S";
SELECT CELLS col1:/^w[^a-zA-Z]*$/ from RegexpTest WHERE VALUE REGEXP \"l.*e\";
SELECT CELLS col1:/^w[^a-zA-Z]*$/ from RegexpTest WHERE ROW REGEXP \"^\\D+\"
    AND VALUE REGEXP \"l.*e\";";
SELECT col FROM test WHERE col = "foo";
SELECT col FROM test WHERE col =^ "prefix";
SELECT tags:^prefix FROM test;

SHOW CREATE TABLE

EBNF

SHOW CREATE TABLE table_name

DESCRIPTION

The SHOW CREATE TABLE command shows the CREATE TABLE statement that creates the given table.

EXAMPLE

hypertable> SHOW CREATE TABLE foo;
CREATE TABLE foo (
  a,
  b,
  c,
  ACCESS GROUP bar (a, b),
  ACCESS GROUP default (c)
);

SHOW TABLES

EBNF

SHOW TABLES

DESCRIPTION

The SHOW TABLES command lists only the tables in the current namespace.

EXAMPLE

hypertable> SHOW TABLES;
foo
Test

USE

EBNF

USE namespace_name

DESCRIPTION

The USE command sets the current namespace. If namespace_name starts with '/' it treats the namespace_name as an absolute path, otherwise it considers it to be a sub-namespace relative to the current namespace.

EXAMPLE

hypertable> USE "/";

hypertable> CREATE NAMESPACE test;

hypertable> USE "test";

hypertable> CREATE NAMESPACE subtest;

hypertable> USE "subtest";

hypertable> GET LISTING;

hypertable> USE "/test";

hypertable> GET LISTING;
subtest	(namespace)

hypertable> USE "/";
sys	(namespace)
test	(namespace)