Hypertable has Reached a Major Milestone!
02.14.2013 | RangeServer Failover
With the release of Hypertable version 0.9.7.0 comes support for automatic RangeServer failover. Hypertable will now detect when a RangeServer has failed, logically remove it from the system, and automatically re-assign the ranges that it was managing to other RangeServers. This represents a major milestone for Hypertable and allows for very large scale deployments. We have been actively working on this feature, full-time, for 1 1/2 years. To give you an idea of the magnitude of the change, here are the commit statistics:
- 441 changed files
- 17,522 line additions
- 6,384 line deletions
This feature has been a long time in the making because we held it to a very high standard of quality: under no circumstances should a RangeServer failure lead to consistency problems or data loss. We're confident that we've achieved 100% correctness under every conceivable circumstance. The two primary goals for the feature, robustness and application transparency, are described below.
Robustness
We designed the RangeServer failover feature to be extremely robust. RangeServers can fail in any state (mid-split, transferring, etc.) and will be recovered properly. The system can also withstand the loss of any RangeServer, even the ones holding the ROOT or other METADATA ranges. To achieve this level of robustness, we added 63 regression tests that verify the correct handling of RangeServer failures in every conceivable failure scenario. We will follow up later with a blog post describing these tests.
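To make the idea of these failure-scenario regression tests concrete, here is a minimal sketch of the kill-and-verify pattern they describe: write data, kill a server mid-flight, run recovery, and assert nothing was lost. The `ToyCluster` model below is a hypothetical stand-in for illustration, not Hypertable's actual test harness; real recovery replays the commit log of the failed server onto its new owners, which this toy mimics.

```python
# Hedged sketch of a failure-injection test: a toy cluster where each
# write is durably logged before being applied, so a dead server's
# ranges can be rebuilt on the survivors. Illustrative only.

class ToyCluster:
    def __init__(self, num_servers=3):
        # Each "server" owns a share of the key space (a dict of its keys).
        self.servers = [dict() for _ in range(num_servers)]
        self.commit_log = []  # durable log; survives server death

    def _owner(self, key):
        return self.servers[hash(key) % len(self.servers)]

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, then apply
        self._owner(key)[key] = value

    def kill_server(self, idx):
        self.servers[idx] = None  # all in-memory state is lost

    def recover(self):
        # Re-assign ranges: keep the survivors, replay the commit log
        # so every key lands on a live server under the new assignment.
        self.servers = [s for s in self.servers if s is not None]
        for key, value in self.commit_log:
            self._owner(key)[key] = value

    def read(self, key):
        return self._owner(key).get(key)
```

A test in this style writes a batch of keys, kills an arbitrary server, runs recovery, and then verifies every key is still readable with its last written value.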
Application Transparency
Another important aspect of our RangeServer failover implementation is application transparency. Aside from a transient delay in database access, RangeServer failures are completely transparent to applications that are actively reading or writing to Hypertable. This transparency applies to native C++ applications as well as Thrift-based applications (Java, PHP, Python, Perl, Ruby, etc.). We have introduced regression tests to verify this behavior and ensure that it continues to hold true for future releases.
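The transparency described above generally comes down to the client library retrying requests behind the application's back while the failed server's ranges are re-assigned. The sketch below illustrates that pattern with hypothetical names; it is not the actual Hypertable client API, just a model of a write that sees only a transient delay during failover.

```python
# Hedged sketch: a retry wrapper that hides a failover window from the
# application. FlakyRangeServerClient simulates a client whose first few
# requests hit a server that has just died. All names are illustrative.
import time

class TransientServerError(Exception):
    """Raised while a range is being re-assigned after a server failure."""

class FlakyRangeServerClient:
    def __init__(self, failures_before_recovery=2):
        self.remaining_failures = failures_before_recovery
        self.store = {}

    def write(self, key, value):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise TransientServerError("range not yet re-assigned")
        self.store[key] = value

def transparent_write(client, key, value, retries=5, backoff=0.01):
    """Retry across the failover window so the caller never sees it."""
    for attempt in range(retries):
        try:
            client.write(key, value)
            return
        except TransientServerError:
            time.sleep(backoff * (2 ** attempt))  # back off and retry
    raise RuntimeError("failover took longer than the retry budget")
```

From the application's point of view, `transparent_write` behaves exactly like a normal write that happened to take a little longer, which is the behavior the regression tests mentioned above would verify.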
To learn more about RangeServer failover, visit the Machine Failure page of the Hypertable documentation. I would like to thank the entire Hypertable development team for the tremendous effort that has gone into this feature. Together they have created a solid foundation which will allow us to rapidly build powerful new query capabilities going forward.
Posted By: Doug Judd, CEO, Hypertable Inc.
Here's what other people had to say
That’s a great milestone.
Still, I am surprised that the most critical part of the system has not been dealt with yet: I am talking about Hyperspace recovery.
RangeServers (1) are monitored/managed by the Master (2) which relies on the Hyperspace (3) for its environment information, especially the master lock, used to indicate which master got the job.
2 seems to have been done before 1 and 3 seems to come last.
The priority list to have a robust system would have been 3 -> 2 -> 1 -> Go!
As of now, it seems your solution tries to look like an idol but still has feet of clay… :o\
Wow, that's great. I have not set up more than two RangeServers and have not even faced failures, but there are a lot of people experiencing them when setting up large clusters… Congrats to the Hypertable team.