Hypertable Evaluation: Sample Customer

 

January 23, 2012

Evaluator:  Christoph Rupp

Recommendations

 

  1. The Hypertable.RangeServer.Monitoring.DataDirectories property should be set so that the disk use % field of the RangeServer monitoring system is accurate.  For example:

    Hypertable.RangeServer.Monitoring.DataDirectories="/data01,/data02,/data03,/data04,/data05,/data06"

  2. There should be three Hyperspace replicas.  Currently, with a single replica, Hyperspace is a single point of failure.
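
    In hypertable.cfg, the replica list would look something like the following sketch (one Hyperspace.Replica.Host line per replica); the two additional hostnames are placeholders, not hosts from this cluster:

    Hyperspace.Replica.Host=hyperspace01-01
    Hyperspace.Replica.Host=hyperspace01-02
    Hyperspace.Replica.Host=hyperspace01-03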

  3. Secondary NameNodes don't provide much protection when running Hypertable.  They only give you point-in-time checkpoints, which means you're likely to lose data in the event of a NameNode failure.  Rather than running secondary NameNodes, we recommend that you copy the FS Image onto multiple disks on the NameNode machine as well as onto at least one NFS-mounted disk that resides on a different machine.  For example, assuming you have disks mounted on /data/1 and /data/2 and an NFS volume mounted on /mount/nfs, you could achieve this by adding the following lines to hdfs-site.xml:

    <property>
      <name>dfs.name.dir</name>
      <value>/data/1/hadoop/dfs/nn,/data/2/hadoop/dfs/nn,/mount/nfs/data/hadoop/dfs/nn</value>
    </property>

  4. Since you have the default replication count in HDFS set to 2, we recommend that you configure Hypertable to set the replication count of its metadata to 3.  To do this, add the following line to your hypertable.cfg file:

    Hypertable.Metadata.Replication=3

  5. If you don't ever intend to do exact row match queries like this:

    SELECT * FROM Sensor WHERE ROW = "01496115 2455853.241 02092";

    then you can disable the query cache to free up a little more memory, which should help performance slightly.  To disable it, add the following line to the hypertable.cfg file:

    Hypertable.RangeServer.QueryCache.MaxMemory=0

  6. We recommend setting the following properties in the hdfs-site.xml file (remember to push the changes out to all the nodes and restart HDFS):

    <property>
      <name>dfs.namenode.handler.count</name>
      <value>20</value>
    </property>

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>

  7. We recommend removing all non-standard Hypertable configuration (see below) unless you're absolutely sure that it makes an improvement.  Of the settings you have currently, I would probably keep only Hypertable.RangeServer.MemoryLimit and possibly Hypertable.RangeServer.CellStore.DefaultBlockSize, the latter only if you've empirically determined that it improves performance.
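
    With that change, the non-standard section of hypertable.cfg would shrink to something like the following (both values taken from your current config; keep the second line only if your benchmarks justify it):

    Hypertable.RangeServer.MemoryLimit=11G
    Hypertable.RangeServer.CellStore.DefaultBlockSize=32768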

 

Service Setup

 

Hostname(s)

Service

Machine Configuration

master001-01

NameNode
Hyperspace

   CPU: Intel(R) Xeon(R) CPU X5470 @ 3.33GHz

 cores: 8

  arch: x86_64

   RAM: 32 GB

    OS: CentOS 6.1

kernel: Linux 2.6.32-71.el6.x86_64

hyperspace01-01

Master

   CPU: Intel(R) Xeon(R) CPU X5470 @ 3.33GHz

 cores: 8

  arch: x86_64

   RAM: 16 GB

    OS: CentOS 6.1

kernel: Linux 2.6.32-71.el6.x86_64

slave001-01 .. slave001-08,

slave002-01 .. slave002-08

DataNode
RangeServer

   CPU: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz

 cores: 8

  arch: x86_64

   RAM: 24 GB

    OS: CentOS 6.1

kernel: Linux 2.6.32-71.el6.x86_64

  disk: 6 x 1 TB, /data01 … /data06

   use: 7-12%

secondary001-01

Secondary NameNode

   CPU: Intel(R) Xeon(R) CPU E5540 @ 2.53GHz

 cores: 16

  arch: x86_64

   RAM: 38 GB

    OS: CentOS 6.1

kernel: Linux 2.6.32-71.el6.x86_64

secondary001-01

Secondary NameNode

   CPU: Intel(R) Xeon(R) CPU X5470 @ 3.33GHz

 cores: 8

  arch: x86_64

   RAM: 16 GB

    OS: CentOS 6.1

kernel: Linux 2.6.32-71.el6.x86_64

 

Network Topology

 

slave001-* machines are plugged into one switch

slave002-* machines are plugged into a second switch

 

Both switches are wire speed and there is a 10Gbps link between the switches.

 

HDFS

 

hdfs-site.xml:

 

<property>

  <name>dfs.http.address</name>

  <value>master001-01.samplecustomer.com:50070</value>

</property>

 

<property>

  <name>dfs.replication</name>

  <value>2</value>

</property>

 

<property>

  <name>dfs.block.size</name>

  <value>134217728</value>

</property>
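
As a side note, the dfs.block.size value above is exactly 128 MiB, a sensible HDFS block size for this workload; a quick arithmetic check:

```python
# Verify that dfs.block.size (specified in bytes) corresponds to 128 MiB.
block_size_bytes = 134217728          # value from hdfs-site.xml
block_size_mib = block_size_bytes // (1024 * 1024)
print("dfs.block.size =", block_size_mib, "MiB")  # prints "dfs.block.size = 128 MiB"
```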

 

 

Hypertable

 

Non-standard config:

 

Hypertable.RangeServer.Range.SplitSize=268435456

Hypertable.Mutator.ScatterBuffer.FlushLimit.PerServer=10485760

Hypertable.RangeServer.CommitLog.PruneThreshold.Min=1000000000

Hypertable.RangeServer.CommitLog.PruneThreshold.Max=3000000000

Hypertable.RangeServer.CellStore.DefaultBlockSize=32768

Hypertable.RangeServer.MemoryLimit=11G

Hypertable.RangeServer.Workers=200

Hypertable.RangeServer.Reactors=24

Hypertable.RangeServer.Scanner.Ttl=300000

ThriftBroker.API.Logging=1

Hypertable.Logging.Level=warn

 

Application

 

Python Thrift Client interface.