Test Setup
This document describes how to set up and run the Hypertable vs. HBase performance evaluation test, which compares the performance of Hypertable 0.9.5.5 with that of HBase 0.90.4. All of the test-specific configuration and run scripts can be found in test2.tar.gz and can be examined via the links provided in the pertinent sections below. We built the test framework and checked it into the Hypertable source tree under examples/java/org/hypertable/examples/PerformanceTest/. The test framework is compiled into the hypertable-0.9.5.5-examples.jar file that ships with the Hypertable binary packages. The following source files contain the code that encapsulates the interaction with the Hypertable and HBase APIs:
Driver.java
DriverCommon.java
DriverHypertable.java
DriverHBase.java
Prerequisites
The following prerequisites must be satisfied to run this test:
- The user account from which the test is being run must have password-less sudo privilege.
- The root account on the machine from which this test will be administered must have password-less ssh access to all machines in the test cluster (a quick verification sketch follows this list).
- Hypertable and HBase must be installed on all machines in the test cluster, including the machine from which the test is being administered.
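The first two prerequisites can be checked quickly before starting. The sketch below is illustrative only; the host names are the ones used in our Capfile and should be replaced with your own cluster's hosts.

# Verify password-less sudo for the test account on the admin machine
sudo -n true && echo "password-less sudo OK"

# Verify password-less root ssh to each machine in the test cluster
# (host names taken from our Capfile; substitute your own)
for host in test01 test02 test03 test04 test05; do
  ssh -o BatchMode=yes root@$host true && echo "$host OK"
done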
Step 1. Setup and Install Hadoop
The first step is to set up and install HDFS. We ran the test using HDFS version 0.20.2 (CDH3u2) with the following configuration files:
hadoop/conf/core-site.xml
hadoop/conf/hadoop-env.sh
hadoop/conf/hdfs-site.xml
hadoop/conf/masters
hadoop/conf/slaves
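With these configuration files in place, HDFS can be formatted and started from the master. The commands below are the standard Hadoop 0.20 scripts; the assumption that Hadoop is installed under $HADOOP_HOME is ours, so adjust paths to match your installation.

# Run on the HDFS master (namenode)
cd $HADOOP_HOME
bin/hadoop namenode -format     # one-time format of the namenode
bin/start-dfs.sh                # starts the namenode and the datanodes listed in conf/slaves
bin/hadoop dfsadmin -report     # sanity check: all datanodes should report as live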
Step 2. Setup and Install HBase
Install and configure HBase version 0.90.4 (CDH3u2). We used the following configuration files in our test:
hbase/conf/regionservers
hbase/conf/hbase-env.sh
hbase/conf/hbase-site.xml [reading]
hbase/conf/hbase-site.xml [writing]
Some notable non-default configuration includes the following variables set in hbase-env.sh:
export HBASE_REGIONSERVER_OPTS="-Xmx14g -Xms14g -Xmn128m -XX:+UseParNewGC \
  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
export HBASE_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64
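Because the reading and writing variants of hbase-site.xml were swapped in between test phases, the active configuration has to be pushed to every region server and HBase restarted. A minimal sketch is shown below; the file paths and the use of the regionservers file as the host list are assumptions based on the configuration above.

# Push the appropriate hbase-site.xml to every region server, then restart HBase
for host in $(cat /usr/lib/hbase/conf/regionservers); do
  scp hbase-site.xml root@$host:/usr/lib/hbase/conf/hbase-site.xml
done
$HBASE_HOME/bin/stop-hbase.sh
$HBASE_HOME/bin/start-hbase.sh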
Step 3. Setup and Install Hypertable
Install and configure Hypertable version 0.9.5.5 (download) using the following configuration files:
hypertable/conf/hypertable.cfg [writing]
hypertable/conf/hypertable.cfg [reading]
We used the same configuration file for the write, scan, and uniform random read tests. The only non-default configuration property we modified was the range split size, which we increased to bring it more in line with the HBase configuration.
Hypertable.RangeServer.Range.SplitSize=1GB
For the Zipfian random read tests, we increased the size of the query cache to two gigabytes with the addition of the following configuration property.
Hypertable.RangeServer.QueryCache.MaxMemory=2G
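Based on the description above, the reading configuration differs from the writing configuration only by this query cache setting, so one can be derived from the other. The file paths in the sketch below are illustrative, not the ones used in our cluster.

# Derive the reading config from the writing config by enlarging the query cache
cp hypertable/conf/hypertable.cfg /tmp/hypertable-reading.cfg
echo "Hypertable.RangeServer.QueryCache.MaxMemory=2G" >> /tmp/hypertable-reading.cfg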
Step 4. Configure Capistrano "Capfile"
We use a tool called Capistrano to manage Hypertable clusters. Capistrano is a simple tool that facilitates remote task execution. It relies on ssh and reads its instructions from a file called "Capfile". We augmented the Capfile to include tasks for starting and stopping the test framework. The following Capfile was used in our test:
This Capfile can be used to run the test on a different set of machines with a different configuration. The only requirement would be to edit the variables and role definitions at the top of the file. The following listing shows the top portion of the Capfile that would need to change to launch Hypertable and the performance evaluation test on a different cluster with a different configuration.
set :source_machine, "test00"
set :install_dir, "/opt/hypertable/doug"
set :hypertable_version, "0.9.5.5"
set :default_pkg, "/tmp/hypertable-0.9.5.5-linux-x86_64.deb"
set :default_dfs, "hadoop"
set :default_config, "/home/doug/benchmark/perftest-hypertable.cfg"
set :default_additional_args, ""
set :hbase_home, "/usr/lib/hbase"
set :default_client_multiplier, 1
set :default_test_driver, "hypertable"
set :default_test_args, ""

role :source, "test00"
role :master, "test01"
role :hyperspace, "test01", "test02", "test03"
role :slave, "test04", "test05", "test06", "test07", "test08", "test09", "test10", "test11", "test12", "test13", "test14", "test15"
role :localhost, "test00"
role :thriftbroker
role :spare
role :test_client, "test00", "test01", "test02", "test03"
role :test_dispatcher, "test00"
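Once the Capfile is in place, the cluster is driven with ordinary Capistrano task invocations from the source machine. `cap -T` is a standard Capistrano command that lists the tasks defined in the Capfile; the task names below (dist, start, stop) are the conventional Hypertable Capfile tasks and may differ if the Capfile has been customized.

$ cap -T        # list all tasks defined in the Capfile
$ cap dist      # distribute the Hypertable package and configuration to all roles
$ cap start     # start Hypertable on the cluster
$ cap stop      # stop Hypertable on the cluster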
Step 5. Run Tests
The test scripts, Capfile, and default Hypertable configuration file can be installed by untarring the test2.tar.gz archive.
$ tar xzvf test2.tar.gz
bin/
bin/run-test-load-sequential.sh
bin/run-test-scan.sh
bin/run-test-load.sh
bin/run-test-read-random.sh
bin/clean-database.sh
bin/test-config.sh
Capfile
perftest-hypertable.cfg
reports/
After you have untarred this archive, modify the Capfile as described in step 4 and modify the following properties in perftest-hypertable.cfg:
HdfsBroker.fs.default.name=hdfs://test01:9000
Hyperspace.Replica.Host=test01
Hyperspace.Replica.Host=test02
Hyperspace.Replica.Host=test03
Hypertable.RangeServer.Range.SplitSize=1GB
Test Scripts
The tests can be run by hand using the following set of scripts. Some of the tests need to have configuration files adjusted prior to running. We created scripts to run each test and, in between tests, adjusted the configuration as needed. Each of these test scripts deposits its results in a summary file in the reports/ subdirectory. The test parameters are encoded in the filename of the summary report, for example:
reports/test2-hbase-random-read-zipfian-4901960784-20-1000-512clients.txt
reports/test2-hbase-sequential-write-490196078-20-1000-48clients.txt
The following section describes each test script.
bin/test-config.sh
This script is included by the other test scripts and contains definitions for three important variables that control the behavior of the tests.
let DATA_SIZE=5000000000000

# The following variable points to the english Wikipedia export file that is sampled
# for value data.  This file must be present on all test client machines
VALUE_DATA=/data/1/test/enwiki-sample.txt

# The following variable points to the file containing the cumulative mass function
# data used to generate the Zipfian distribution.  It can be generated with the
# following command:
#
# /opt/hypertable/current/bin/jrun org.hypertable.Common.DiscreteRandomGeneratorZipf \
#   --generate-cmf-file /data/1/test/cmf.dat 0 100000000
#
CMF=/data/1/test/cmf.dat
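The key-count values embedded in the example report filenames above follow directly from DATA_SIZE and the key and value sizes. For a 20-byte key and 1KB value:

# key-count = DATA_SIZE / (key-size + value-size)
echo $((5000000000000 / (20 + 1000)))   # 4901960784  (5TB dataset)
echo $((500000000000  / (20 + 1000)))   # 490196078   (0.5TB dataset)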
bin/run-test-load.sh
This script is used to perform the random write test. The system is loaded with key-count cells, where key-count is computed as DATA_SIZE / (key-size + value-size), with DATA_SIZE being the variable defined in test-config.sh. The keys are formed as an ASCII string representation of a number chosen at random from the range [0..key-count]*10.
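For reference, the script takes the system name, key size, and value size as arguments; the invocation below mirrors the one used in the write/scan driver script later in this document.

# random write test against Hypertable with 20-byte keys and 10KB values
./bin/run-test-load.sh hypertable 20 10000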
bin/run-test-scan.sh
This script is used to perform the scan test. It scans the table loaded by the run-test-load.sh script (keys in the range [0..key-count]*10); the key space is divided into segments and each segment is fed to a test client for scanning.
bin/run-test-load-sequential.sh
This script is used to load a table in preparation for the random read tests. The system is loaded sequentially with keys in the range [0..key-count), where key-count is computed as DATA_SIZE / (key-size + value-size). After running this script, each key in the range [0..key-count) will contain exactly one cell.
bin/run-test-read-random.sh [--zipfian]
This script is used to perform the random read test. If the --zipfian argument is supplied, the test clients will generate a Zipfian key distribution over the range [0..key-count) loaded by the run-test-load-sequential.sh script. To efficiently generate the Zipfian distribution, the clients load cumulative mass function data from the file specified by the CMF variable in the test-config.sh script. This file must be present on all test client machines. If the --zipfian argument is not supplied, a uniform key distribution will be generated.
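The read script takes the same system, key-size, and value-size arguments, with the optional --zipfian flag placed first; these invocations mirror the random read driver script later in this document.

# uniform random read test
./bin/run-test-read-random.sh hypertable 20 1000

# Zipfian random read test (requires the CMF file on every test client)
./bin/run-test-read-random.sh --zipfian hypertable 20 1000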
bin/clean-database.sh
This script is used to clean the database in preparation for each load test.
Random Write and Scan Tests
The random write and sequential scan tests were run four times. The amount of data written into the table was fixed at 5TB, but the value size varied from 10KB to 10 bytes, with the corresponding cell count going from 500 million to 167 billion. Prior to running these tests, we set the data set size to 5TB by setting the DATA_SIZE variable in the test-config.sh file as follows:
let DATA_SIZE=5000000000000
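The cell counts quoted above follow from this DATA_SIZE and the fixed 20-byte key:

# cell count = DATA_SIZE / (key-size + value-size), with a 20-byte key
echo $((5000000000000 / (20 + 10000)))  # ~499 million cells (10KB values)
echo $((5000000000000 / (20 + 10)))     # ~167 billion cells (10-byte values)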
We also pushed out the hbase-site.xml file and modified the perftest-hypertable.cfg file to contain the configuration properties appropriate for each system under test. The following script illustrates how we ran the tests.
#!/usr/bin/env bash

# Set SYSTEM to either "hbase" or "hypertable" (taken from the first argument)
SYSTEM=$1

let VALUE_SIZE=10000
while [ $VALUE_SIZE -ge 10 ]; do
  ./bin/run-test-load.sh $SYSTEM 20 $VALUE_SIZE
  ./bin/run-test-scan.sh $SYSTEM 20 $VALUE_SIZE
  let VALUE_SIZE=VALUE_SIZE/10
done
Random Read Tests
The random read tests were run twice, once with a 5TB dataset and again with a 0.5TB dataset, to measure the performance of each system under different RAM-to-disk ratios. In addition to varying the dataset size, we ran each test with a uniform as well as a Zipfian key distribution. The Zipfian key distribution was chosen to simulate a realistic workload. All tests were run with a fixed value size of 1KB.
We pushed out the hbase-site.xml file and modified the perftest-hypertable.cfg file to contain the configuration properties appropriate for each system under test. The following script illustrates how we ran the tests.
#!/usr/bin/env bash

# Set SYSTEM to either "hbase" or "hypertable" (taken from the first argument)
SYSTEM=$1

./bin/run-test-load-sequential.sh $SYSTEM 20 1000
./bin/run-test-read-random.sh $SYSTEM 20 1000
./bin/run-test-read-random.sh --zipfian $SYSTEM 20 1000
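For the 0.5TB run, the only change was the dataset size in test-config.sh before re-running the same driver script:

let DATA_SIZE=500000000000   # 0.5TB dataset for the second random read run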