New and Improved Secondary Indexes
07.10.2014 | New Feature
As part of our ongoing effort to make Hypertable useful for a wide range of applications, we've greatly improved our support for secondary indexing. These improvements, introduced in the 0.9.8.0 release, enable much more powerful queries that access rows by means other than the primary key. As part of this effort, we've changed the query language to allow for more expressive queries and have brought it more in-line with SQL.
Like the original implementation, thew new secondary indexes are fully consistent and massively scalable. They can be used to index not only column values, but also column qualifiers. Secondary indexes can be defined in the CREATE TABLE statement with the INDEX and QUALIFIER INDEX clauses, for example:
CREATE TABLE customers (
name, INDEX name, tag, QUALIFIER INDEX tag, info, INDEX info, QUALIFIER INDEX info );
The ALTER TABLE command has been enhanced to allow for creating and dropping indexes of preexisting tables. For example:
ALTER TABLE customers ADD ( QUALIFIER INDEX name ) DROP INDEX ( info );
To populate indexes added with ALTER TABLE or to convert indexes created with older versions of Hypertable into the new format, the indexes need to be rebuilt. This can be accomplished with the new REBUILD INDICES command. For example:
REBUILD INDICES customers;
The examples which follow in this post assume that the following table has been created and populated with products.tsv.
CREATE TABLE products ( title, section, info, category, INDEX section, INDEX info, QUALIFIER INDEX info, QUALIFIER INDEX category );
A set of scripts and data files for running all of the examples in this post can be found in the archive secondary-indexes.tgz.
Queries against the value index
One important difference between secondary index queries and normal table scan queries is that with secondary index queries, the SELECT statement behaves like SQL SELECT in that the WHERE predicate acts as a boolean selector of rows and the columns listed after the SELECT keyword act as a projection of the selected rows. In other words, you can project an arbitrary set of columns that don't necessarily have to be referenced in the WHERE predicate.
Exact match
Select the title column of all rows whose section column contains exactly "books".
SELECT title FROM products WHERE section = "books";
0307743659 title The Shining Mass Market Paperback 0321321928 title C++ Common Knowledge: Essential Intermediate Programming [Paperback] 0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)
Select the title column of all rows that contain an info:actor column containing exactly "Jack Nicholson".
SELECT title FROM products WHERE info:actor = "Jack Nicholson";
B00002VWE0 title Five Easy Pieces (1970) B002VWNIDG title The Shining (1980)
Prefix match
Select the title and info:publisher columns of all rows that contain an info:publisher column whose value starts with "Addison-Wesley".
SELECT title,info:publisher FROM products WHERE info:publisher =^ 'Addison-Wesley'; SELECT title,info:publisher FROM products WHERE info:publisher =~ /^Addison-Wesley/;
0321321928 title C++ Common Knowledge: Essential Intermediate Programming [Paperback] 0321321928 info:publisher Addison-Wesley Professional; 1 edition (March 10, 2005) 0321776402 title C++ Primer Plus (6th Edition) (Developer's Library) 0321776402 info:publisher Addison-Wesley Professional; 6 edition (October 28, 2011)
Regular expression match
Select the title and info:publisher columns of all rows that contain an info:author column whose value starts with "Stephen " followed by either a 'P' or 'K' character.
SELECT title,info:author FROM products WHERE info:author =~ /^Stephen [PK]/; SELECT title,info:author FROM products WHERE Regexp(info:author, '^Stephen [PK]');
0307743659 title The Shining Mass Market Paperback 0307743659 info:author Stephen King 0321776402 title C++ Primer Plus (6th Edition) (Developer's Library) 0321776402 info:author Stephen Prata
Queries against the qualifier index
Like the value index queries, the qualifier index queries have the traditional SQL select/project semantics.
Exact match
Select the title column of all rows that contain an info:studio column.
SELECT title FROM products WHERE Exists(info:studio);
B00002VWE0 title Five Easy Pieces (1970) B000Q66J1M title 2001: A Space Odyssey [Blu-ray] B002VWNIDG title The Shining (1980)
Regular expression match
Select the title column of all rows that contain a category column with qualifier that starts with "/Movies".
SELECT title FROM products WHERE Exists(category:/^\/Movies/);
B00002VWE0 title Five Easy Pieces (1970) B000Q66J1M title 2001: A Space Odyssey [Blu-ray] B002VWNIDG title The Shining (1980)
Select the title column of all rows that contain a category column with qualifier that starts with "/Books" followed at some point later with "Programming".
SELECT title FROM products WHERE Exists(category:/^\/Books.*Programming/);
0321321928 title C++ Common Knowledge: Essential Intermediate Programming [Paperback] 0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)
Boolean operators
The boolean operators AND and OR can be used to combine column predicates. The boolean AND operator can also combine column predicates and ROW intervals.
OR operator
Select the title column of all rows that contain an info:author column that starts with "Stephen P" or contain an info:publisher column that starts with "Anchor".
SELECT title FROM products WHERE info:author =~ /^Stephen P/ OR info:publisher =^ "Anchor"; SELECT title FROM products WHERE info:author =~ /^Stephen P/ OR info:publisher =~ /^Anchor/;
0307743659 title The Shining Mass Market Paperback 0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)
AND operator
Select the title column of all rows that contain an info:author column that starts with "Stephen " followed by either the letter 'P' or 'K' and contain an info:publisher column that starts witht "Anchor"
SELECT title FROM products WHERE info:author =~ /^Stephen [PK]/ AND info:publisher =^ "Anchor"; SELECT title FROM products WHERE info:author =~ /^Stephen [PK]/ AND info:publisher =~ /^Anchor/;
0307743659 title The Shining Mass Market Paperback
Select the title column of all rows whose row key is greater than 'B00002VWE0' and that contain an info:actor column containing exactly "Jack Nicholson".
SELECT title FROM products WHERE ROW > 'B00002VWE0' AND info:actor = 'Jack Nicholson';
B002VWNIDG title The Shining (1980)
Select the title column of all rows whose row key starts with 'B' and that contain an info:actor column containing exactly "Jack Nicholson".
SELECT title FROM products WHERE ROW =^ 'B' AND info:actor = 'Jack Nicholson';
B00002VWE0 title Five Easy Pieces (1970) B002VWNIDG title The Shining (1980)
Programming examples (in all supported languages) of the above queries can be found in the Hypertable Developer Guide. Special thanks goes to Philip Gatt for providing feedback on the original implementation that led to these improvements!
Posted By: Doug Judd
Here's what other people had to say
There are no comments yet... be the first!