salvage

This tool salvages data from a (potentially corrupt) Hypertable database. It recursively walks the /hypertable/tables directory in the brokered filesystem, locates CellStore files, and extracts their data. The recovered data is written into a tree of .tsv files that can later be reloaded into a clean database. For each unique table directory encountered, the tool also generates a .hql file containing the HQL required to re-create the table.
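The walk described above can be pictured with the following minimal Python sketch. It runs against a local directory tree rather than the brokered filesystem, and it assumes CellStore files are named "cs" followed by digits (for example cs0). The function name find_cellstores and that naming rule are assumptions for illustration, not the tool's actual code.

```python
import os
import re

# Assumption: CellStore files are named "cs" followed by digits, e.g. "cs0".
CELLSTORE_NAME = re.compile(r"cs\d+")


def find_cellstores(root):
    """Recursively walk root and return the sorted paths of CellStore-like files."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in filenames:
            if CELLSTORE_NAME.fullmatch(name):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Calling find_cellstores on a local mirror of /hypertable/tables returns every matching file path, which is the set of inputs the salvage pass would read.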

Prerequisites

This tool requires Hyperspace to be intact and running, and an FS broker to be running on the local host. These services can be started as follows:

$ ht cluster start_hyperspace
$ ht start-fsbroker hadoop

This tool extracts data only from CellStores. To extract data from the commit logs, use the log_player tool.

Basic Usage

The simplest usage is to run it with no options, supplying only the output directory argument:

$ ht salvage output

When the tool completes successfully, the output directory contains the namespace hierarchy along with the .tsv and .hql files. For example:

$ tree output
output
├── alerts
│   ├── create-realtime.hql
│   └── realtime.tsv
├── cache
│   ├── create-image.hql
│   └── image.tsv
└── search
    ├── blog.tsv
    ├── create-blog.hql
    ├── create-image.hql
    ├── create-news.hql
    ├── image.tsv
    └── news.tsv
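One way to reload this output is to pair each .tsv file with its create-<table>.hql file and emit HQL that switches to the namespace, re-creates the table, and loads the data with LOAD DATA INFILE. The Python sketch below does that for a local copy of the output tree. The helper names salvage_pairs and reload_script are hypothetical, and the sketch assumes that each generated .hql file holds only the CREATE TABLE statement and that each namespace already exists in the target database; inspect the generated files before relying on it.

```python
import os


def salvage_pairs(root):
    """Return sorted (namespace, table, hql_path, tsv_path) tuples under root.

    Assumes the layout shown above: table T in a namespace directory has a
    T.tsv data file next to a create-T.hql schema file.
    """
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        namespace = "/" if rel == "." else "/" + rel.replace(os.sep, "/")
        for name in filenames:
            if not name.endswith(".tsv"):
                continue
            table = name[: -len(".tsv")]
            hql_path = os.path.join(dirpath, "create-" + table + ".hql")
            if os.path.exists(hql_path):  # skip .tsv files with no schema file
                pairs.append((namespace, table, hql_path, os.path.join(dirpath, name)))
    return sorted(pairs)


def reload_script(root):
    """Build a hypothetical HQL reload script: USE, CREATE TABLE, LOAD DATA INFILE."""
    lines = []
    for namespace, table, hql_path, tsv_path in salvage_pairs(root):
        lines.append('USE "%s";' % namespace)
        with open(hql_path) as f:
            lines.append(f.read().strip())  # assumed to hold the CREATE TABLE statement
        lines.append('LOAD DATA INFILE "%s" INTO TABLE %s;' % (os.path.abspath(tsv_path), table))
    return "\n".join(lines)
```

The resulting text could be saved to a file and run through the Hypertable shell against the clean database.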

Include and Exclude

To exclude a specific namespace or table, the --exclude option may be used:

$ ht salvage --exclude search output

$ tree output
output
├── alerts
│   ├── create-realtime.hql
│   └── realtime.tsv
└── cache
    ├── create-image.hql
    └── image.tsv

To include a specific namespace or table, the --include option may be used:

$ ht salvage --include search/blog output

$ tree output
output
└── search
    ├── blog.tsv
    └── create-blog.hql
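The two examples above suggest prefix-style matching on namespace/table paths. The following Python sketch models that reading. The function names covers and selected are hypothetical, and two rules in it are inferred rather than documented: an --exclude match overrides an --include match, and a spec matches only at a "/" boundary (so search does not match a namespace called searchx).

```python
def covers(spec, path):
    """True if spec names path itself or a namespace that contains it."""
    spec = spec.strip("/")
    return path == spec or path.startswith(spec + "/")


def selected(path, includes=(), excludes=()):
    """Decide whether a namespace/table path such as "search/blog" is salvaged.

    Assumed semantics: with no includes, every path is a candidate, and any
    exclude match wins over an include match.
    """
    path = path.strip("/")
    if includes and not any(covers(spec, path) for spec in includes):
        return False
    return not any(covers(spec, path) for spec in excludes)
```

Under this model, selected("search/blog", excludes=["search"]) is False, matching the first tree above, and selected("search/news", includes=["search/blog"]) is False, matching the second.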

Restricting Row Space

To restrict the salvaged data to a specific row key range, use the --start-key and --end-key options. For example:

$ ht salvage --start-key "019999999" --end-key "030000000" output
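Hypertable orders row keys as byte strings, so a range like the one above is lexicographic rather than numeric. The Python sketch below applies such a range to salvaged .tsv lines. It assumes the row key is the first tab-separated field, that lines starting with # are headers, and that both bounds are inclusive; the tool's actual bound semantics are not stated here, and the names in_range and filter_tsv are hypothetical.

```python
def in_range(row, start_key=None, end_key=None):
    """Compare row keys byte-wise; both bounds are treated as inclusive (an assumption)."""
    key = row.encode("utf-8")
    if start_key is not None and key < start_key.encode("utf-8"):
        return False
    if end_key is not None and key > end_key.encode("utf-8"):
        return False
    return True


def filter_tsv(lines, start_key=None, end_key=None):
    """Yield .tsv lines whose first field (assumed to be the row key) is in range."""
    for line in lines:
        if line.startswith("#"):  # pass header lines through unchanged
            yield line
            continue
        row = line.split("\t", 1)[0]
        if in_range(row, start_key, end_key):
            yield line
```

For example, in_range("025", "019999999", "030000000") is True even though 25 lies outside the numeric interval, which is why fixed-width, zero-padded keys like those in the command above behave predictably.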

Path Regex

With the --path-regex option, a regular expression can be supplied to specify which directories in the brokered filesystem should be included in the recovery. For example:

$ ht salvage --verbose --path-regex "/hypertable/tables/[2-3]" output
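Because the pattern is a regular expression over filesystem paths, an unanchored character class can select more directories than intended. The Python sketch below assumes the tool selects a directory when the regex matches at the start of its path (Python re.match semantics); that assumption, the function select_dirs, and the directory names are all hypothetical.

```python
import re


def select_dirs(pattern, paths):
    """Keep paths that pattern matches from the start (assumed re.match semantics)."""
    rx = re.compile(pattern)
    return [p for p in paths if rx.match(p)]


# Hypothetical table directories under /hypertable/tables.
dirs = [
    "/hypertable/tables/1/4",
    "/hypertable/tables/2/1",
    "/hypertable/tables/3/7",
    "/hypertable/tables/20/2",  # also selected by [2-3], which does not stop at "/"
]

print(select_dirs("/hypertable/tables/[2-3]", dirs))
print(select_dirs("/hypertable/tables/[2-3]/", dirs))
```

Under that assumption, appending a slash, as in "/hypertable/tables/[2-3]/", keeps /hypertable/tables/20 from being swept in.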