Block compressor that uses the BMZ algorithm.
#include <BlockCompressionCodecBmz.h>
Public Member Functions
BlockCompressionCodecBmz (const Args &args)
Constructor.
virtual ~BlockCompressionCodecBmz ()
Destructor.
virtual void deflate (const DynamicBuffer &input, DynamicBuffer &output, BlockHeader &header, size_t reserve=0)
Compresses a buffer using the BMZ algorithm.
virtual void inflate (const DynamicBuffer &input, DynamicBuffer &output, BlockHeader &header)
Decompresses a buffer compressed with the BMZ algorithm.
virtual void set_args (const Args &args)
Sets arguments to control compression behavior.
virtual int get_type ()
Returns the enum value representing compression type BMZ.
Public Member Functions inherited from Hypertable::BlockCompressionCodec
virtual ~BlockCompressionCodec ()
Destructor.
Private Attributes
DynamicBuffer m_workmem
Working memory buffer used by deflate() and inflate().
size_t m_offset
Starting offset of fingerprints.
size_t m_fp_len
Fingerprint length.
Additional Inherited Members
Public Types inherited from Hypertable::BlockCompressionCodec
enum Type { UNKNOWN = -1, NONE = 0, BMZ = 1, ZLIB = 2, LZO = 3, QUICKLZ = 4, SNAPPY = 5, COMPRESSION_TYPE_LIMIT = 6 }
Enumeration for compression type.
typedef std::vector<String> Args
Compression codec argument vector.
Static Public Member Functions inherited from Hypertable::BlockCompressionCodec
static const char *get_compressor_name (uint16_t algo)
Returns the string mnemonic for a compression type.
Block compressor that uses the BMZ algorithm.
This class compresses and decompresses blocks of data using the BMZ algorithm, a compression algorithm based on the one described in the paper "Data Compression Using Long Common Strings" (Bentley & McIlroy, 1999). The algorithm generally works well for data that contains long repeated strings. The paper "Bigtable: A Distributed Storage System for Structured Data" (Chang et al., 2006) describes it as the compression scheme Google uses for the "content" column of its crawler database. That column stores multiple copies of each crawled page's content, so long repeated strings are common.
Definition at line 49 of file BlockCompressionCodecBmz.h.
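The core idea of the Bentley & McIlroy scheme can be sketched in a few lines: fingerprint fixed-length, block-aligned substrings of the data already seen, look up the fingerprint at every position of the input, and replace verified matches with a (position, length) copy. The sketch below is a simplified, hypothetical illustration, not Hypertable's bmz code. The token format and the names `Token`, `bm_compress`, and `bm_decompress` are invented here; it hashes each window with `std::hash<std::string_view>` instead of a rolling hash, it ignores the `m_offset` setting, and it has no secondary compression pass. Its `fp_len` parameter is assumed to correspond roughly to `m_fp_len`.

```cpp
// Simplified sketch of Bentley & McIlroy long-common-string compression.
// Hypothetical: Token, fingerprint, bm_compress, bm_decompress. Not bmz_pack().
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// A literal byte run, or a copy of `len` bytes from absolute position `pos`
// of the already-decoded output.
struct Token {
  bool is_copy = false;
  std::size_t pos = 0;
  std::size_t len = 0;
  std::string literal;
};

// Fingerprint of in[i, i + fp_len). A real implementation would use a rolling
// hash; hashing a string_view keeps the sketch short.
static std::size_t fingerprint(std::string_view in, std::size_t i, std::size_t fp_len) {
  return std::hash<std::string_view>{}(in.substr(i, fp_len));
}

std::vector<Token> bm_compress(std::string_view in, std::size_t fp_len) {
  if (fp_len == 0) throw std::invalid_argument("fp_len must be positive");
  std::vector<Token> out;
  std::unordered_map<std::size_t, std::size_t> table;  // fingerprint -> block start
  std::string pending;                                  // literal bytes not yet emitted

  // After byte k is consumed, store the fp_len-aligned block that ends at k.
  auto remember = [&](std::size_t k) {
    if ((k + 1) % fp_len == 0) {
      std::size_t start = k + 1 - fp_len;
      table.emplace(fingerprint(in, start, fp_len), start);  // keeps first occurrence
    }
  };
  auto flush = [&] {
    if (!pending.empty()) {
      Token t;
      t.literal = pending;
      out.push_back(t);
      pending.clear();
    }
  };

  std::size_t i = 0;
  while (i < in.size()) {
    if (i + fp_len <= in.size()) {
      auto it = table.find(fingerprint(in, i, fp_len));
      // Compare the bytes: equal fingerprints may be hash collisions.
      if (it != table.end() &&
          in.substr(it->second, fp_len) == in.substr(i, fp_len)) {
        std::size_t p = it->second, len = fp_len;
        // Extend the match forward while the bytes agree (source may overlap i).
        while (i + len < in.size() && in[p + len] == in[i + len]) ++len;
        flush();
        Token t;
        t.is_copy = true;
        t.pos = p;
        t.len = len;
        out.push_back(t);
        for (std::size_t k = i; k < i + len; ++k) remember(k);
        i += len;
        continue;
      }
    }
    pending.push_back(in[i]);
    remember(i);
    ++i;
  }
  flush();
  return out;
}

std::string bm_decompress(const std::vector<Token> &tokens) {
  std::string out;
  for (const Token &t : tokens) {
    if (!t.is_copy) {
      out += t.literal;
      continue;
    }
    // Copy byte by byte so overlapping copies (pos + len > out.size()) work.
    for (std::size_t k = 0; k < t.len; ++k) {
      char c = out[t.pos + k];
      out.push_back(c);
    }
  }
  return out;
}
```

In this sketch a smaller `fp_len` finds shorter repeats but stores more fingerprints, while a larger one keeps the table small; repeats shorter than about 2 * fp_len - 1 bytes can be missed because they are not guaranteed to contain a whole aligned block.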
BlockCompressionCodecBmz::BlockCompressionCodecBmz (const Args &args)
Constructor.
Initializes members as follows: m_workmem(0), m_offset = 0, m_fp_len = 19. It then calls bmz_init() and passes args to set_args().
args | Arguments to control compression behavior |
Definition at line 40 of file BlockCompressionCodecBmz.cc.
BlockCompressionCodecBmz::~BlockCompressionCodecBmz () [virtual]
Destructor.
Definition at line 46 of file BlockCompressionCodecBmz.cc.
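void BlockCompressionCodecBmz::deflate (const DynamicBuffer &input, DynamicBuffer &output, BlockHeader &header, size_t reserve = 0) [virtual]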
Compresses a buffer using the BMZ algorithm.
This method reserves enough space in output to hold the serialized header, followed by the compressed input, followed by reserve bytes. If the compressed data would be larger than the input buffer, the input is instead copied directly into the output buffer and the compression type is set to BlockCompressionCodec::NONE. Before header is serialized, its data_length (uncompressed length), data_zlength (length of the stored, possibly compressed, data), data_checksum, and compression_type fields are set accordingly. The output buffer is formatted as follows:
header | compressed data | reserve |
input | Input buffer |
output | Output buffer |
header | Block header populated by function |
reserve | Additional space to reserve at end of output buffer |
Implements Hypertable::BlockCompressionCodec.
Definition at line 69 of file BlockCompressionCodecBmz.cc.
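The fallback and layout rules above can be illustrated with a small self-contained sketch. Everything in it is hypothetical: `SketchHeader` stands in for `BlockHeader`, the 16-byte little-endian header encoding is invented, Adler-32 is only a placeholder checksum computed over the stored payload (the real codec's checksum algorithm is not stated here), and the BMZ compressor is replaced by a caller-supplied `Compressor` callback. The sketch treats a payload that is not smaller than the input as incompressible, a slightly stricter test than "larger than the input"; that choice is the sketch's own.

```cpp
// Sketch of deflate()'s output layout and incompressible-block fallback.
// Hypothetical: CompressionType, SketchHeader, adler32, kHeaderLen, put32,
// Compressor, deflate_sketch. Not Hypertable's BlockHeader or bmz_pack().
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

enum class CompressionType : uint32_t { NONE = 0, BMZ = 1 };  // same values as Type

struct SketchHeader {
  CompressionType compression_type = CompressionType::NONE;
  uint32_t data_length = 0;    // length of the uncompressed input
  uint32_t data_zlength = 0;   // length of the stored payload
  uint32_t data_checksum = 0;  // checksum of the stored payload (assumption)
};

using Bytes = std::vector<uint8_t>;
using Compressor = std::function<Bytes(const Bytes &)>;

// Placeholder checksum (Adler-32).
uint32_t adler32(const uint8_t *p, std::size_t n) {
  uint32_t a = 1, b = 0;
  for (std::size_t i = 0; i < n; ++i) {
    a = (a + p[i]) % 65521;
    b = (b + a) % 65521;
  }
  return (b << 16) | a;
}

constexpr std::size_t kHeaderLen = 16;  // four little-endian uint32 fields

static void put32(uint8_t *dst, uint32_t v) {
  for (int i = 0; i < 4; ++i) dst[i] = static_cast<uint8_t>(v >> (8 * i));
}

Bytes deflate_sketch(const Bytes &input, SketchHeader &header,
                     const Compressor &compress, std::size_t reserve = 0) {
  Bytes payload = compress(input);
  if (payload.size() >= input.size()) {
    // Compression did not help: store the input verbatim.
    header.compression_type = CompressionType::NONE;
    payload = input;
  } else {
    header.compression_type = CompressionType::BMZ;
  }
  header.data_length = static_cast<uint32_t>(input.size());
  header.data_zlength = static_cast<uint32_t>(payload.size());
  header.data_checksum = adler32(payload.data(), payload.size());

  // Layout: header | payload | reserve (reserve bytes left zeroed for the caller).
  Bytes output(kHeaderLen + payload.size() + reserve, 0);
  put32(&output[0], static_cast<uint32_t>(header.compression_type));
  put32(&output[4], header.data_length);
  put32(&output[8], header.data_zlength);
  put32(&output[12], header.data_checksum);
  std::copy(payload.begin(), payload.end(), output.begin() + kHeaderLen);
  return output;
}
```

Because the header records both lengths and the compression type, a reader can tell from the header alone whether the payload needs decompressing.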
int BlockCompressionCodecBmz::get_type () [inline, virtual]
Returns the enum value representing compression type BMZ.
Returns the enum value BMZ.
Implements Hypertable::BlockCompressionCodec.
Definition at line 116 of file BlockCompressionCodecBmz.h.
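void BlockCompressionCodecBmz::inflate (const DynamicBuffer &input, DynamicBuffer &output, BlockHeader &header) [virtual]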
Decompresses a buffer compressed with the BMZ algorithm.
input | Input buffer |
output | Output buffer |
header | Block header |
Implements Hypertable::BlockCompressionCodec.
Definition at line 104 of file BlockCompressionCodecBmz.cc.
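Because deflate() may store a block uncompressed with compression type NONE, a reader has to consult the header before decompressing. The following sketch shows that dispatch under stated assumptions: `SketchHeader`, `inflate_sketch`, and the `Decompressor` callback are hypothetical stand-ins, not Hypertable's `BlockHeader` or BMZ API; checksum verification is omitted; and the payload is assumed to have already been separated from the serialized header.

```cpp
// Sketch of header-driven decompression. Hypothetical: CompressionType,
// SketchHeader, Decompressor, inflate_sketch. Not bmz_unpack().
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <vector>

enum class CompressionType : uint32_t { NONE = 0, BMZ = 1 };

struct SketchHeader {
  CompressionType compression_type;
  uint32_t data_length;   // expected uncompressed length
  uint32_t data_zlength;  // length of the stored payload
};

using Bytes = std::vector<uint8_t>;
using Decompressor = std::function<Bytes(const Bytes &, std::size_t expected_len)>;

Bytes inflate_sketch(const Bytes &payload, const SketchHeader &header,
                     const Decompressor &decompress) {
  if (payload.size() < header.data_zlength)
    throw std::runtime_error("truncated block");
  Bytes stored(payload.begin(), payload.begin() + header.data_zlength);

  if (header.compression_type == CompressionType::NONE) {
    // The block was stored verbatim, so both lengths must agree.
    if (stored.size() != header.data_length)
      throw std::runtime_error("length mismatch in uncompressed block");
    return stored;
  }

  Bytes out = decompress(stored, header.data_length);
  // Reject output whose size disagrees with the recorded uncompressed length.
  if (out.size() != header.data_length)
    throw std::runtime_error("decompressed length mismatch");
  return out;
}
```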
void BlockCompressionCodecBmz::set_args (const Args &args) [virtual]
Sets arguments to control compression behavior.
The arguments accepted by this method are described in the following table.
Argument | Description |
---|---|
--fp-len n | Fingerprint length |
--offset n | Starting offset of fingerprints |
args | Vector of arguments |
Reimplemented from Hypertable::BlockCompressionCodec.
Definition at line 56 of file BlockCompressionCodecBmz.cc.
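A minimal sketch of how the two arguments in the table might be parsed, assuming each flag and its value arrive as separate elements of the `Args` vector (for example `{"--fp-len", "23"}`). `BmzSettings` and `parse_bmz_args` are hypothetical names; the defaults mirror the constructor (`m_offset` = 0, `m_fp_len` = 19), and rejecting unknown arguments with an exception is a choice of this sketch, not documented behavior of set_args().

```cpp
// Hypothetical parser for the BMZ codec arguments; not Hypertable's set_args().
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct BmzSettings {
  std::size_t offset = 0;   // mirrors the constructor default for m_offset
  std::size_t fp_len = 19;  // mirrors the constructor default for m_fp_len
};

// Applies "--fp-len n" and "--offset n" pairs on top of existing settings.
BmzSettings parse_bmz_args(const std::vector<std::string> &args, BmzSettings s = {}) {
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string &name = args[i];
    if (name != "--fp-len" && name != "--offset")
      throw std::invalid_argument("unknown BMZ argument: " + name);
    if (i + 1 == args.size())
      throw std::invalid_argument(name + " requires a value");
    // std::stoul throws std::invalid_argument for non-numeric values.
    std::size_t value = std::stoul(args[++i]);
    if (name == "--fp-len")
      s.fp_len = value;
    else
      s.offset = value;
  }
  return s;
}
```

With this sketch, `parse_bmz_args({"--fp-len", "23"})` yields `fp_len` 23 and leaves `offset` at its default of 0.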
size_t BlockCompressionCodecBmz::m_fp_len [private]
Fingerprint length.
Definition at line 127 of file BlockCompressionCodecBmz.h.
size_t BlockCompressionCodecBmz::m_offset [private]
Starting offset of fingerprints.
Definition at line 124 of file BlockCompressionCodecBmz.h.
DynamicBuffer BlockCompressionCodecBmz::m_workmem [private]
Working memory buffer used by deflate() and inflate().
Definition at line 121 of file BlockCompressionCodecBmz.h.