Roadmap to Hypertable 1.0
07.23.2012 | Release Status
With the release of Hypertable version 0.9.6.0 I thought I would take some time to describe where we are in terms of the Hypertable 1.0 release and what work is remaining. We had intended to make the next Hypertable release our beta release. However, it’s been four months since the release of 0.9.5.6 and since the beta release is not quite ready to go, we decided to do one last alpha release and call it 0.9.6.0. In this release we’ve put in a considerable effort to fix a number of stability issues that have affected prior releases.
0.9.6.0 Stability Improvements for HDFS deployments
The biggest source of instability for Hypertable deployments running on top of HDFS has to do with the unclean shutdown of either the Master or RangeServer. Upon restart after this situation has occurred, the RangeServer (or Master) can fail to come up, with an error message similar to the following in its log file:
1342810317 ERROR Hypertable.RangeServer : verify_backup (/root/src/hypertable/src/cc/Hypertable/Lib/MetaLogReader.cc:131): MetaLog file '/hypertable/servers/rs12/log/rsml/0' has length 0 < backup file '/opt/hypertable/0.9.5.6/run/log_backup/rsml/rs12/0' length 11376
This problem was due to a misunderstanding on our part of the HDFS API semantics. Whenever the Master or RangeServer writes data to any of its log files, it calls FSDataOutputStream.sync() to ensure that the data makes it into the filesystem and is persistent. However, after this call, FileStatus.getLen() does not return the correct value: it only returns the correct file length if the file was properly closed. HDFS provides an alternate API, DFSClient.DFSDataInputStream.getVisibleLength(), that returns the actual length of the file regardless of whether or not it was closed properly. We’ve since modified the HDFSBroker length method to use this alternate API, which has solved the problem with negligible performance impact.
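To make the length mismatch concrete, here is a minimal toy model in Python. It is not the Hadoop API and not Hypertable code: the ToyHdfsFile class and the verify_backup function below are hypothetical stand-ins, and the model simplifies by assuming the recorded length stays at zero until close. It only shows why a check against the recorded length fails after an unclean shutdown while a check against the visible length passes.

```python
class ToyHdfsFile:
    """Toy stand-in for an HDFS log file (hypothetical, not the Hadoop API)."""

    def __init__(self):
        self._synced = bytearray()   # bytes made durable by sync()
        self._pending = bytearray()  # bytes written but not yet synced
        self._recorded_len = 0       # length kept in file metadata, set on close
        self.closed = False

    def write(self, data: bytes) -> None:
        if self.closed:
            raise ValueError("file is closed")
        self._pending += data

    def sync(self) -> None:
        """Make pending bytes durable; the recorded length is NOT updated."""
        self._synced += self._pending
        self._pending.clear()

    def close(self) -> None:
        """Clean close: sync, then record the final length in the metadata."""
        self.sync()
        self._recorded_len = len(self._synced)
        self.closed = True

    def status_length(self) -> int:
        """Analogue of FileStatus.getLen(): correct only after a clean close."""
        return self._recorded_len

    def visible_length(self) -> int:
        """Analogue of getVisibleLength(): counts every synced byte."""
        return len(self._synced)


def verify_backup(log_length: int, backup_length: int) -> bool:
    """Pass only if the log file is at least as long as its local backup."""
    return log_length >= backup_length


if __name__ == "__main__":
    rsml = ToyHdfsFile()
    rsml.write(b"x" * 11376)
    rsml.sync()
    # Simulated unclean shutdown: close() is never called.
    print(verify_backup(rsml.status_length(), 11376))   # False
    print(verify_backup(rsml.visible_length(), 11376))  # True
```

In this model the first check reproduces the "length 0 < backup file length 11376" condition from the log message above, and the second check corresponds to the behavior after switching the length method to the visible length.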
A number of other important, though rarer, issues have also been fixed in the 0.9.6.0 release. A brief description of each of these fixes can be found in the Hypertable 0.9.6.0 Release Notes.
0.9.7.0 “beta” Release – Automatic RangeServer Failover
The main feature that has been holding up our beta release is automatic RangeServer failover. Currently, if a machine running a RangeServer dies, the portion of the data set managed by that machine will be unavailable until an operator intervenes and does one of two things:
- Bring the machine back online
- Replace the machine with a new one and assign it the same IP address and/or hostname
The 0.9.7.0 “beta” release will include automatic RangeServer failover in which the Master will detect when a RangeServer dies and will orchestrate the recovery of that RangeServer. Recovery will involve re-assigning the ranges managed by the failed RangeServer to other servers and enlisting the help of other RangeServers in the system to replay the failed server’s commit log in parallel.
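The recovery flow described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not Hypertable's implementation: the plan_recovery function, the round-robin placement policy, and the shape of the commit log entries (range id plus update) are all hypothetical. It reassigns the failed server's ranges to the survivors and splits the failed server's commit log into per-server batches that could then be replayed in parallel.

```python
from collections import defaultdict


def plan_recovery(failed, assignments, commit_log):
    """Plan recovery of one failed server (hypothetical sketch).

    assignments: dict mapping server name -> list of range ids it manages
    commit_log:  list of (range_id, update) entries from the failed server
    Returns (new_owner, replay), where new_owner maps each orphaned range to
    a surviving server and replay maps each surviving server to the log
    entries it has to apply.
    """
    survivors = sorted(s for s in assignments if s != failed)
    if not survivors:
        raise RuntimeError("no surviving RangeServers to recover onto")

    # Assumed policy: spread the orphaned ranges round-robin over survivors.
    new_owner = {}
    for i, range_id in enumerate(assignments[failed]):
        new_owner[range_id] = survivors[i % len(survivors)]

    # Split the failed server's commit log by destination so that each
    # batch can be replayed independently of the others.
    replay = defaultdict(list)
    for range_id, update in commit_log:
        replay[new_owner[range_id]].append((range_id, update))
    return new_owner, dict(replay)


if __name__ == "__main__":
    assignments = {"rs1": ["a"], "rs2": ["b", "c", "d"], "rs3": ["e"]}
    log = [("b", "put k1"), ("c", "put k2"), ("b", "put k3"), ("d", "put k4")]
    owners, batches = plan_recovery("rs2", assignments, log)
    print(owners)
    print(batches)
```

Entries for the same range stay in their original order within a batch, which matters because replaying updates out of order could produce a different final state.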
We have been laying the groundwork for automatic RangeServer failover for five years and have been actively working on it full-time for the past nine months. The difficult part of this work is not so much the basic failover logic, but handling all of the failure scenarios. For example, after a RangeServer fails and recovery is initiated, the system needs to properly handle the case where one or more of the recovery participants fail. We’ve put a tremendous effort into testing for and handling all of the various failure scenarios and have over seventy tests to verify correctness and prevent regressions.
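As a rough illustration of the participant-failure case mentioned above, the following Python sketch re-plans recovery whenever one of the servers taking part in it dies. The recover_ranges function, the round-robin policy, and the fails_during_recovery callback are hypothetical names introduced for this example only; the real logic has to deal with many more scenarios than this loop covers.

```python
def recover_ranges(ranges, servers, fails_during_recovery):
    """Assign every range to a live server, re-planning on participant failure.

    ranges:  list of range ids that need a new owner
    servers: iterable of candidate recovery participants
    fails_during_recovery: callable(server) -> bool, a hypothetical hook that
        reports whether a participant died while recovering its share
    """
    live = sorted(servers)
    pending = list(ranges)
    owner = {}
    while pending:
        if not live:
            raise RuntimeError("no live RangeServers left to recover onto")
        # Tentative round-robin plan for the ranges still without an owner.
        plan = {r: live[i % len(live)] for i, r in enumerate(pending)}
        dead = {s for s in set(plan.values()) if fails_during_recovery(s)}
        # Keep the assignments whose participant survived this round.
        for r, s in plan.items():
            if s not in dead:
                owner[r] = s
        # Ranges handed to a dead participant go back into the pending set.
        pending = [r for r in pending if plan[r] in dead]
        live = [s for s in live if s not in dead]
    return owner


if __name__ == "__main__":
    result = recover_ranges(
        ["a", "b", "c", "d"],
        ["rs1", "rs2", "rs3"],
        lambda server: server == "rs2",  # rs2 dies mid-recovery
    )
    print(result)
```

The loop terminates because every round either finishes all pending ranges or removes at least one dead server from the live set.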
We plan to release Hypertable 0.9.7.0 with automatic RangeServer failover next month.
On to Hypertable 1.0 …
Once the beta release is out, we will have a beta period of approximately two months, during which we will resolve any remaining issues, and then release version 1.0. Thanks to everyone who has helped us get to this point. You can rest assured that we're working very hard to make 1.0 available as soon as possible.
Posted By: Doug Judd, CEO, Hypertable Inc.