Standalone
For applications that need the performance characteristics and data model that Hypertable has to offer but don't require horizontal scalability, Hypertable can be set up to run on a single machine on top of the machine's local filesystem. For better performance, this machine can be configured with an SSD. This document describes how to get Hypertable up and running on a single standalone machine.
Prerequisites
Before you begin the installation, make sure your system satisfies the following general requirements:
- root access - If you plan to install Hypertable in the standard location (/opt/hypertable), you will need root access to create and populate that directory. You can either carry out the steps below while logged in as the root user or run each of the commands with sudo. If you do not have root access, you can install the .tar.bz2 package anywhere you like.
- open file limit - Most operating systems have a limit on the total number of files that a process can have open at any one time. If you plan to load a large amount of data into Hypertable, you will likely need to increase this limit. See Open File Limit for details on how to increase this limit.
- firewall - The Hypertable processes use TCP and UDP to communicate with one another and with client applications. Firewalls can block this traffic and prevent Hypertable from operating properly. Any firewall that blocks traffic to or from specific ports should be disabled, or the appropriate ports should be opened up to allow Hypertable communication. See Hypertable Firewall Requirements for instructions on how to do this.
- Linux Kernel Tuning - The Linux kernel exposes some configuration parameters that can be tuned to optimize the behavior of the kernel for the specific workload which the operating system will be handling. To tune these parameters for optimum performance of the Hypertable server processes, see Linux Kernel Configuration.
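A quick way to see whether the open file limit needs raising is to compare `ulimit -n` against a target value. The bash function below is a minimal sketch; `check_open_files` is a hypothetical helper name, and the default target of 65536 is an assumed example value, not a number from this guide, so substitute whatever the Open File Limit page recommends. Because `ulimit -n` reports the limit of the shell it runs in, run it as the account that will launch Hypertable.

```shell
#!/usr/bin/env bash
# Sketch: report whether the current shell's open file limit (ulimit -n)
# meets a target. The default target of 65536 is an assumed example value.
check_open_files() {
  local want="${1:-65536}"
  local have
  have="$(ulimit -n)"
  # "unlimited" always satisfies the requirement.
  if [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]; then
    echo "ok: open file limit is $have (want >= $want)"
    return 0
  fi
  echo "low: open file limit is $have (want >= $want)" >&2
  return 1
}
# Example: check_open_files 65536
```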
Step 1 - Install Hypertable Package
Hypertable can be installed via binary packages, which can be downloaded as described on the Hypertable Download page. The packages come bundled with nearly all of the dependent shared libraries. The nice thing about this approach is that only a single package is needed for all flavors of Linux for each supported architecture (32-bit and 64-bit). The only requirement is that your system be built with glibc 2.4+ (released on March 6th 2006). Hypertable comes with a program launch script, ht, that sets up LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH) to point to the lib/ directory of the installation so that the dependent libraries can be found by the dynamic linker.
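To confirm the glibc 2.4+ requirement before installing, the version can be read with `getconf GNU_LIBC_VERSION` and compared numerically. This is a minimal sketch; `version_ge` and `check_glibc` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
#!/usr/bin/env bash
# Sketch: confirm the system C library is glibc 2.4 or newer.
# version_ge A B succeeds when version A >= version B (needs sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

check_glibc() {
  local ver
  # getconf prints e.g. "glibc 2.31"; keep only the version number.
  ver="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"
  if [ -z "$ver" ]; then
    echo "could not determine glibc version (non-glibc system?)" >&2
    return 2
  fi
  if version_ge "$ver" 2.4; then
    echo "ok: glibc $ver"
  else
    echo "too old: glibc $ver (need 2.4+)" >&2
    return 1
  fi
}
# Example: check_glibc
```

A plain string comparison would rank 2.10 below 2.4, which is why the sketch relies on version sorting instead.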
To begin the package installation, switch to the directory containing the package file and then issue the command listed below for your operating system.
Redhat, CentOS, or SUSE Installation
$ sudo rpm -ivh --replacepkgs --nomd5 --nodeps --oldpackage package.rpm
Debian or Ubuntu Installation
$ sudo dpkg --install package.deb
Bzipped Archive Installation
$ sudo mkdir /opt/hypertable
$ tar xjvf package.tar.bz2
$ sudo mv hypertable-*/opt/hypertable/* /opt/hypertable/
Mac Installation
Double-click the package.dmg file and follow the instructions.
The Redhat, Debian, and Mac packages install Hypertable under a directory named /opt/hypertable/$VERSION by default. You will need to change the ownership of the installation files and directories to the user account under which you plan to run the services. For example:
$ sudo chown -R john:staff /opt/hypertable/0.9.8.5
Step 2 - FHS-ize Installation
See Filesystem Hierarchy Standard for an introduction to FHS. Create the directories /etc/opt/hypertable and /var/opt/hypertable and change their ownership to the user account under which the binaries will be run. For example:
$ sudo mkdir /etc/opt/hypertable /var/opt/hypertable
$ sudo chown john:staff /etc/opt/hypertable /var/opt/hypertable
Then FHS-ize the installation with the following command:
$ /opt/hypertable/0.9.8.5/bin/ht-fhsize.sh
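If the service later fails to write its configuration or runtime files, a common cause is that one of these directories is still owned by root. The sketch below checks that each directory exists and is owned by the user running the check; `check_owned_dirs` is a hypothetical helper name, and it should be run as the service account (john in the examples above).

```shell
#!/usr/bin/env bash
# Sketch: confirm the FHS directories exist and belong to the account
# running this check (test -O checks ownership by the effective user).
check_owned_dirs() {
  local rc=0 d
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      echo "missing: $d" >&2
      rc=1
    elif [ ! -O "$d" ]; then
      echo "not owned by $(id -un): $d" >&2
      rc=1
    fi
  done
  return "$rc"
}
# Example: check_owned_dirs /etc/opt/hypertable /var/opt/hypertable
```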
Step 3 - Set "current" link
To make the latest version of Hypertable referenceable from a well-known location, we recommend setting a "current" link to point to the latest installation. After installation, make a symlink from /opt/hypertable/current to point to the latest installed version, for example:
$ cd /opt/hypertable
$ ln -s 0.9.8.5 current
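When a newer release is installed later, rerunning `ln -s NEWVERSION current` does not replace the link: because `current` already points at a directory, `ln` creates the new link inside that directory instead. A minimal sketch that repoints the link safely, assuming an `ln` that supports `-f` and `-n` (GNU coreutils and the macOS `ln` both do); `set_current` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: point <prefix>/current at an installed version directory.
# -f replaces an existing link; -n treats an existing "current" symlink as a
# plain file instead of descending into the directory it points to.
set_current() {
  local prefix="$1" version="$2"
  if [ ! -d "$prefix/$version" ]; then
    echo "no such installation: $prefix/$version" >&2
    return 1
  fi
  # Relative target, resolved against $prefix, as in the ln -s example above.
  ln -sfn "$version" "$prefix/current"
}
# Example: set_current /opt/hypertable 0.9.8.5
```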
So that you don't have to specify absolute paths when running Hypertable commands, we recommend that you add the Hypertable bin/ directory to the program search path for your shell. If you're running the bash shell, this can be accomplished by adding the following line to your .bashrc file:
export PATH=$PATH:/opt/hypertable/current/bin
You'll need to log out and log back in to pick up the PATH change. If you're running a shell other than bash, consult the documentation for your shell for instructions on how to modify the program search path. Once the search path is set up, the Hypertable command shell (CLI) can be run as follows:
$ ht shell
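If the setup is scripted and may be run more than once, appending the export line unconditionally leaves duplicate entries in .bashrc. The sketch below adds the line only if it is not already present; `add_path_line` is a hypothetical helper name. In an already open bash session, running `source ~/.bashrc` picks up the change without logging out.

```shell
#!/usr/bin/env bash
# Sketch: append the PATH export to a shell startup file only once,
# so rerunning the setup does not duplicate the line.
add_path_line() {
  local rcfile="$1" dir="$2"
  # \$PATH stays literal so it is expanded when the rc file is read.
  local line="export PATH=\$PATH:$dir"
  touch "$rcfile"
  # -x matches whole lines only; -F treats the pattern as a fixed string.
  grep -qxF "$line" "$rcfile" || printf '%s\n' "$line" >> "$rcfile"
}
# Example: add_path_line ~/.bashrc /opt/hypertable/current/bin
```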
Step 4 - Set up data volume
To configure the Hypertable installation to write to the correct data volume, make sure the volume is formatted and mounted somewhere, for example /data. Then create a directory on that data volume to serve as the top-level directory for the Hypertable data files. For example:
$ sudo mkdir -p /data/hypertable/fs
Then make sure this directory is readable and writeable by the user account from which you will be running the Hypertable service. For example:
$ sudo chown -R john:staff /data/hypertable
Finally, create a symbolic link called "fs" inside the Hypertable installation pointing to the directory you just created on the data volume. For example:
$ cd /opt/hypertable/current
$ rm -f fs
$ ln -s /data/hypertable/fs
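Before starting the servers, it is worth confirming that the fs link resolves to a directory the service account can write to, since a dangling link or a root-owned target only shows up later as a startup failure. A minimal sketch; `check_fs_link` is a hypothetical helper name, and it should be run as the service account.

```shell
#!/usr/bin/env bash
# Sketch: check that <install>/fs is a symlink that resolves to a
# directory the current user can write to.
check_fs_link() {
  local link="$1/fs"
  if [ ! -L "$link" ]; then
    echo "not a symlink: $link" >&2
    return 1
  fi
  # -d and -w follow the symlink, so they test the target directory.
  if [ ! -d "$link" ] || [ ! -w "$link" ]; then
    echo "target missing or not writable: $link -> $(readlink "$link")" >&2
    return 1
  fi
  echo "ok: $link -> $(readlink "$link")"
}
# Example: check_fs_link /opt/hypertable/current
```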
Step 5 - Edit hypertable.cfg
Open the file /opt/hypertable/current/conf/hypertable.cfg in an editor and modify the Hyperspace.Replica.Host, Hypertable.Cluster.Name, and Hypertable.RangeServer.Monitoring.DataDirectories properties for your installation. For example:
Hyperspace.Replica.Host=yourmachine
Hypertable.Cluster.Name="Your Cluster Name"
Hypertable.RangeServer.Monitoring.DataDirectories="/data"
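If you prefer to script this edit rather than use an editor, a small helper can set each property in place. This is a minimal sketch, assuming properties are written as `KEY=value` lines with no spaces around `=`, as in the example above; `set_cfg_property` is a hypothetical helper name, and it replaces every existing `KEY=` line or appends one if the key is absent.

```shell
#!/usr/bin/env bash
# Sketch: set KEY=VALUE in a hypertable.cfg-style file, replacing any
# existing KEY= line or appending one if the key is absent.
# KEY and VAL are passed via the environment so awk does not interpret
# backslash escapes in them.
set_cfg_property() {
  local file="$1" tmp rc=0
  tmp="$(mktemp)" || return 1
  if KEY="$2" VAL="$3" awk '
      index($0, ENVIRON["KEY"] "=") == 1 {
        print ENVIRON["KEY"] "=" ENVIRON["VAL"]; found = 1; next
      }
      { print }
      END { if (!found) print ENVIRON["KEY"] "=" ENVIRON["VAL"] }
    ' "$file" > "$tmp"; then
    # Copy back over the original so its ownership and mode are kept.
    cat "$tmp" > "$file"
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}
# Example:
#   cfg=/opt/hypertable/current/conf/hypertable.cfg
#   set_cfg_property "$cfg" Hyperspace.Replica.Host yourmachine
#   set_cfg_property "$cfg" Hypertable.Cluster.Name '"Your Cluster Name"'
```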
Be sure to use the public DNS name of the machine for the Hyperspace.Replica.Host property. Doing so allows other machines in your network to access this Hypertable instance simply by copying the modified configuration file into the conf/ directory of their own Hypertable installation:
othermachine$ scp yourmachine:/opt/hypertable/current/conf/hypertable.cfg /opt/hypertable/current/conf
othermachine$ ht shell
Step 6 - Install monitoring system dependencies
The Monitoring UI is written in Ruby with the Sinatra web framework and uses RRDTool to create databases for storing metric data. Ruby 1.8.7 or greater is required. The first step in getting the dependencies set up is to install Ruby, RubyGems, and RRDTool:
Redhat (CentOS)
$ yum -y install rrdtool ruby rubygems ruby-devel ruby-rdoc
Debian (Ubuntu)
$ apt-get -y install rrdtool ruby rubygems ruby-dev rdoc
Mac OSX
Ruby is installed by default on OSX, so nothing needs to be done to install it. For RRDTool, you can install it with either MacPorts or Homebrew. To install it with MacPorts, issue the following command:
$ port install rrdtool
To install it with Homebrew, issue the following command:
$ brew install rrdtool
After successfully installing Ruby, RubyGems, and RRDTool, the next step is to verify that you have Ruby 1.8.7 or greater installed:
$ ruby --version
ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux]
If your version of ruby is older than 1.8.7, you'll need to consult your operating system documentation (i.e. the Internet) to figure out how to install a newer version of Ruby and RubyGems (the packages may be called ruby19, rubygems19, ...).
Once you have the correct version of Ruby and RubyGems installed, install the required gems as follows:
$ gem install sinatra rack thin json titleize syck
NOTE: the syck gem is only required for ruby >= 2.0
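The version check and the syck rule above can be combined so that the right gem list is chosen automatically. This is a minimal sketch; `version_ge` and `monitoring_gems_for` are hypothetical helper names, the version string is read from Ruby's `RUBY_VERSION` constant, and the comparison assumes a `sort` that supports `-V`.

```shell
#!/usr/bin/env bash
# Sketch: from a Ruby version string, check the 1.8.7 minimum and add the
# syck gem for Ruby >= 2.0, as described above.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

monitoring_gems_for() {
  local ver="$1"
  if ! version_ge "$ver" 1.8.7; then
    echo "ruby $ver is older than 1.8.7" >&2
    return 1
  fi
  local gems="sinatra rack thin json titleize"
  version_ge "$ver" 2.0 && gems="$gems syck"
  echo "$gems"
}
# Example:
#   gem install $(monitoring_gems_for "$(ruby -e 'print RUBY_VERSION')")
```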
Step 7 - Install Notification Script
The Hypertable Master will invoke a notification script (conf/notification-hook.sh) to inform the Hypertable administrator of certain events, such as machine failure or any problems encountered during machine failure recovery. The script accepts two arguments: a subject string and a message body string. The prefix of the subject string indicates the type of notification: "NOTICE" indicates an abnormal condition, and "ERROR" indicates a hard error that requires intervention. The following is an example notification script (/opt/hypertable/current/conf/notification-hook.sh) that can be used to email notifications to a list of administrators:
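#!/usr/bin/env bash
# Space-separated list of addresses; left unquoted below so each is a separate argument
recipients="root"
subject=$1
message=$2
# Quote $message so its whitespace is preserved
echo -e "$message" | mail -s "$subject" ${recipients}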
Modify the recipients variable to contain the list of recipients to whom notification messages are to be sent. Make sure the script is executable, then verify that it works properly by testing it manually:
$ chmod +x /opt/hypertable/current/conf/notification-hook.sh
$ /opt/hypertable/current/conf/notification-hook.sh "Test Message" "This is a test."
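Because the subject prefix distinguishes abnormal conditions from hard errors, the notification script can branch on it, for example to send hard errors to a wider audience. The sketch below matches only the leading word, since the exact text that follows the prefix is not described here; `notification_severity` is a hypothetical helper name, and the extra recipient address in the comment is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: classify a notification by its subject prefix.
# "NOTICE" marks an abnormal condition; "ERROR" marks a hard error.
notification_severity() {
  case "$1" in
    ERROR*)  echo error ;;
    NOTICE*) echo notice ;;
    *)       echo other ;;
  esac
}
# Example inside notification-hook.sh (address is a placeholder):
#   if [ "$(notification_severity "$1")" = error ]; then
#     recipients="root oncall@example.com"
#   fi
```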
Step 8 - [optional] Edit cluster.def
Hypertable clusters are administered with the ht_cluster task automation tool. The tool requires a configuration file called cluster.def which is located in the conf/ directory of the installation. Configuring cluster.def for your standalone instance allows you to administer Hypertable in the exact same way you would administer a full-blown Hypertable cluster.
The first thing to do is to set up your machine so that you can ssh into localhost without a password. See Password-less SSH Login for details on how to do this.
Hypertable comes with an example cluster configuration file called cluster.def-EXAMPLE. Copy or rename this file to cluster.def, for example:
$ cd /opt/hypertable/0.9.8.5/conf
$ cp cluster.def-EXAMPLE cluster.def
There are some variables at the top of this file that you need to modify for your particular environment. These variables are shown below.
INSTALL_PREFIX=/opt/hypertable
HYPERTABLE_VERSION=0.9.8.5
PACKAGE_FILE=/root/packages/hypertable-0.9.8.5-linux-x86_64.tar.gz
FS=local
ORIGIN_CONFIG_FILE=/root/hypertable.cfg
The main difference between how cluster.def is configured for a standalone instance and how one is configured for a Hadoop cluster is that the FS variable is set to local instead of hadoop and the HADOOP_DISTRO variable is not used.
In addition to setting the variables, the roles must also be configured. For a standalone Hypertable instance, all of the roles are assigned to localhost, for example:
role: source localhost
role: master localhost
role: hyperspace localhost
role: slave localhost
role: thriftbroker
role: spare
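For a standalone instance, a quick sanity check is that no role names a host other than localhost. This minimal sketch assumes each role is declared on a single line of the form `role: NAME host ...`, as in the example above; `non_local_roles` is a hypothetical helper name, and roles with no hosts (thriftbroker and spare above) are accepted.

```shell
#!/usr/bin/env bash
# Sketch: print every "role:" line in cluster.def that names a host
# other than localhost. Roles with no hosts listed are accepted.
non_local_roles() {
  awk '$1 == "role:" {
         for (i = 3; i <= NF; i++)
           if ($i != "localhost") { print; break }
       }' "$1"
}
# Example:
#   [ -z "$(non_local_roles /opt/hypertable/current/conf/cluster.def)" ] && echo "all roles on localhost"
```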
For a complete description of the variables and roles, see the Step 3 - Edit cluster.def section of the Hadoop Installation guide.
Give it a try
The programs and shell scripts for administering Hypertable can be found in the bin/ directory of the Hypertable installation. It is strongly recommended that you add this directory to your PATH environment variable so that you can run the programs without having to specify full paths. For example, to add the Hypertable bin/ directory to the PATH environment variable in Bash, add the following line to your .bashrc file:
export PATH=$PATH:/opt/hypertable/current/bin
For instructions on how to do this with other shells, please refer to their documentation.
Start Hypertable
To start all of the Hypertable processes:
$ ht start-all-servers local
It should generate output that looks something like this:
FsBroker (local) Started
Hyperspace Started
Master Started
RangeServer Started
ThriftBroker Started
Alternatively, if you configured cluster.def as described in Step 8, you can start all of the Hypertable processes with the following command:
$ ht cluster start
Verify that it works
Create a table.
$ echo "CREATE TABLE foo ( c1, c2 ); GET LISTING;" | ht shell --batch
The output of this command should look like:
foo
sys (namespace)
tmp (namespace)
Load some data.
$ echo "INSERT INTO foo VALUES('001', 'c1', 'very'), \
('000', 'c1', 'Hypertable'), ('001', 'c2', 'easy'), ('000', 'c2', 'is');" \
| ht shell --batch
Dump the table.
$ echo "SELECT * FROM foo;" | ht shell --batch
The output of this command should look like:
000 c1 Hypertable
000 c2 is
001 c1 very
001 c2 easy
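To check the dump without eyeballing it, the output can be piped into a comparison against the four expected cells. This is a minimal sketch; `verify_foo_dump` is a hypothetical helper name, and because the exact column separator in the batch output is not specified here, it collapses tabs and runs of spaces to single spaces before comparing.

```shell
#!/usr/bin/env bash
# Sketch: compare the SELECT output for table foo (read from stdin)
# against the expected cells, ignoring differences in column whitespace.
verify_foo_dump() {
  local expected actual
  expected=$'000 c1 Hypertable\n000 c2 is\n001 c1 very\n001 c2 easy'
  # Tabs -> spaces, squeeze repeated spaces, strip trailing spaces.
  actual="$(tr '\t' ' ' | tr -s ' ' | sed -e 's/ *$//')"
  if [ "$actual" = "$expected" ]; then
    echo "ok: foo contains the expected 4 cells"
  else
    printf 'unexpected output:\n%s\n' "$actual" >&2
    return 1
  fi
}
# Example: echo "SELECT * FROM foo;" | ht shell --batch | verify_foo_dump
```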
View Monitoring UI
The Hypertable monitoring UI is accessible on HTTP port 15860. It takes about a minute after Hypertable comes up for the monitoring system to gather information and start populating the metric databases. You can access the monitoring UI in a web browser by visiting port 15860 on the Hypertable machine (e.g. http://your-hypertable-machine:15860). If you get an error message in the browser, wait a minute, then hit refresh and it should come up.
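Instead of refreshing the browser, a script can poll the UI until it answers. This is a minimal sketch using curl; `wait_for_url` is a hypothetical helper name, and the defaults of 12 attempts spaced 10 seconds apart are arbitrary example values chosen to cover the roughly one-minute warm-up mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: poll a URL until it returns a successful HTTP response.
# Defaults (12 tries, 10 s apart) are arbitrary example values.
wait_for_url() {
  local url="$1" tries="${2:-12}" delay="${3:-10}" i
  for ((i = 1; i <= tries; i++)); do
    # -f: fail on HTTP errors; -sS: quiet except for errors.
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "up: $url"
      return 0
    fi
    [ "$i" -lt "$tries" ] && sleep "$delay"
  done
  echo "no response from $url after $tries attempts" >&2
  return 1
}
# Example: wait_for_url http://your-hypertable-machine:15860/
```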
Stop Hypertable
To stop all of the Hypertable processes:
$ ht stop-servers
Alternatively, if you configured cluster.def as described in Step 8, you can stop Hypertable with the following command:
$ ht cluster stop
To wipe Hypertable clean, removing all namespaces and tables:
$ ht destroy-database
or
$ ht cluster destroy
What Next?
Congratulations! Now that you have successfully installed Hypertable, we recommend that you walk through the HQL Tutorial to get familiar with using the system.