Squash
0.7.0
While the Squash API is fairly large, it is conceptually fairly simple.
Most of the functions are really just layers of convenience wrappers which follow common conventions. For example, many functions come in versions which take either a string or a SquashCodec for the codec, and either variadic arguments or a SquashOptions parameter for the options. That means four versions of every such function, but only one concept to learn.
Squash's API follows some pretty standard conventions. For example, a buffer is represented by a size_t followed by a uint8_t*.
The heart of Squash is the codec. A codec is an implementation of an algorithm, and codecs with the same name are assumed to be compatible. A single plugin can, and often does, provide an implementation of several different codecs.
Typically you will not deal with contexts or plugins directly, and you may only deal with strings to represent codecs.
If you have a block of data in memory which you want to compress (or decompress), the easiest way to do so is using the buffer API. For compression, the function to call is squash_compress.
Like many functions in Squash, squash_compress returns a SquashStatus. If the operation was successful, a positive number is returned (generally SQUASH_OK), and a negative number is returned on failure.
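Because success is "generally" SQUASH_OK rather than always SQUASH_OK, testing the sign of the result is the safer check. The following is a minimal sketch of that idea; DemoStatus, its values, and the two helper functions are hypothetical stand-ins invented for illustration, not Squash's actual definitions.

```c
#include <stdio.h>

/* Hypothetical stand-in for a status type: positive values mean success,
 * negative values mean failure. Names and numbers are illustrative only. */
typedef enum {
  DEMO_OK = 1,
  DEMO_FAILED = -1,
  DEMO_BUFFER_FULL = -2
} DemoStatus;

/* Returns 1 when a status signals success, 0 otherwise. Testing the sign
 * instead of comparing against DEMO_OK keeps the check valid for any
 * other positive success code. */
int demo_succeeded(DemoStatus status) {
  return status > 0;
}

/* Reports a failure on stderr and passes the status through unchanged,
 * so a call can be wrapped without altering control flow. */
DemoStatus demo_check(const char* what, DemoStatus status) {
  if (!demo_succeeded(status))
    fprintf(stderr, "%s failed with status %d\n", what, (int) status);
  return status;
}
```

A caller would then write something like `if (!demo_succeeded(demo_check("compress", status))) return 1;`.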
The first argument is the name of the codec. To list available codecs, you can use the squash -L command; the convention is that a codec's name is just the lowercase name of the algorithm. For example, "gzip", "lz4", and "bzip2" are all what you would probably expect.
The compressed and compressed_length arguments represent the buffer which you wish to compress the data into. Notice that compressed_length is a size_t*, not a size_t. When you call the function it should be a pointer to the number of bytes available for writing in the compressed buffer, and the function will alter the value to the true size of the compressed data upon successful compression.
uncompressed and uncompressed_length are the buffer you wish to compress. uncompressed_length is a size_t, not a size_t*, because the function does not need to modify the value.
Finally, this function is variadic—it accepts an arbitrary number of options, which you can use to control things like the compression level. Options are key/value pairs of strings, terminated by a NULL sentinel… More on that later, but for now, just know that passing NULL there will use the defaults, which is generally what you want.
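To make the sentinel convention concrete, here is a minimal sketch of how a variadic function can walk a NULL-terminated list of key/value strings using the standard stdarg.h facilities. demo_find_option is a hypothetical helper written for this guide; it is not part of Squash and says nothing about how Squash parses its options internally.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Squash: scans a NULL-terminated list of
 * key/value string pairs and returns the value stored under `wanted`, or
 * `fallback` if the key is absent. If a key repeats, the last one wins. */
const char* demo_find_option(const char* wanted, const char* fallback, ...) {
  const char* result = fallback;
  va_list ap;

  va_start(ap, fallback);
  for (;;) {
    const char* key = va_arg(ap, const char*);
    const char* value;

    if (key == NULL)              /* sentinel: no more options */
      break;
    value = va_arg(ap, const char*);
    if (value == NULL)            /* malformed list: key without a value */
      break;
    if (strcmp(key, wanted) == 0)
      result = value;
  }
  va_end(ap);

  return result;
}
```

For example, `demo_find_option("level", "6", "level", "9", (const char*) NULL)` yields "9", while passing only the sentinel yields the fallback "6". The cast on the sentinel is deliberate: in a variadic call the compiler cannot convert a bare NULL to a pointer for you, so writing the cast is the portable choice.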
Knowing how big the compressed buffer needs to be generally requires another function:
This function returns the maximum buffer size necessary to hold uncompressed_length bytes of data after compression, even in the worst-case scenario (incompressible data). Typically it is slightly larger than uncompressed_length.
So, if you wanted to compress some data using the deflate algorithm, you end up with something like this:
Decompression is basically the same:
It is worth noting that "Hello, world!" is really too short to compress (it "compresses" from 13 bytes to 15 bytes here), but using a longer string will yield better results. You can find a complete, self-contained example in simple.c.
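The overall flow of the buffer API (ask for the worst-case size, allocate, compress, then decompress) can be sketched without linking against Squash at all. Everything below is a hypothetical stand-in: the demo_ functions are not Squash functions, they omit the codec and options arguments, and the "codec" merely stores the input behind a two-byte length prefix. Only the calling convention is meant to mirror the description above, namely a length followed by a pointer for each buffer, and an in/out size_t* for the output length.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_HEADER_SIZE 2  /* two-byte big-endian length prefix */

/* Worst-case output size for `uncompressed_length` bytes of input. */
size_t demo_max_compressed_size(size_t uncompressed_length) {
  return uncompressed_length + DEMO_HEADER_SIZE;
}

/* Stores the input behind a length prefix. On entry *compressed_length is
 * the capacity of `compressed`; on success it becomes the bytes written.
 * Returns 1 on success and -1 on failure. */
int demo_compress(size_t* compressed_length, uint8_t* compressed,
                  size_t uncompressed_length, const uint8_t* uncompressed) {
  if (uncompressed_length > 0xFFFF ||
      *compressed_length < demo_max_compressed_size(uncompressed_length))
    return -1;
  compressed[0] = (uint8_t) (uncompressed_length >> 8);
  compressed[1] = (uint8_t) (uncompressed_length & 0xFF);
  memcpy(compressed + DEMO_HEADER_SIZE, uncompressed, uncompressed_length);
  *compressed_length = uncompressed_length + DEMO_HEADER_SIZE;
  return 1;
}

/* Reverses demo_compress, using the same in/out length convention. */
int demo_decompress(size_t* decompressed_length, uint8_t* decompressed,
                    size_t compressed_length, const uint8_t* compressed) {
  size_t stored;

  if (compressed_length < DEMO_HEADER_SIZE)
    return -1;
  stored = ((size_t) compressed[0] << 8) | compressed[1];
  if (stored != compressed_length - DEMO_HEADER_SIZE ||
      *decompressed_length < stored)
    return -1;
  memcpy(decompressed, compressed + DEMO_HEADER_SIZE, stored);
  *decompressed_length = stored;
  return 1;
}

/* Round-trips `text` and returns the compressed size, or 0 on failure. */
size_t demo_round_trip(const char* text) {
  size_t text_length = strlen(text);
  size_t compressed_length = demo_max_compressed_size(text_length);
  size_t decompressed_length = text_length;
  uint8_t* compressed = malloc(compressed_length);
  uint8_t* decompressed = malloc(text_length + 1);
  size_t result = 0;

  if (compressed != NULL && decompressed != NULL &&
      demo_compress(&compressed_length, compressed,
                    text_length, (const uint8_t*) text) > 0 &&
      demo_decompress(&decompressed_length, decompressed,
                      compressed_length, compressed) > 0 &&
      decompressed_length == text_length &&
      memcmp(decompressed, text, text_length) == 0)
    result = compressed_length;

  free(compressed);
  free(decompressed);
  return result;
}
```

With this stand-in, `demo_round_trip("Hello, world!")` returns 15: the 13 input bytes plus the two-byte prefix. It is a storage format rather than deflate, but it shows the same effect, which is why the worst-case bound has to be larger than the input.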
While the buffer API is very easy to use, it can be a bit limiting. If you want to compress a lot of data without loading it all into memory, then a streaming API is much more appropriate.
Squash has two streaming APIs; the first is modeled after the API provided by zlib, which has been copied by many other libraries. It is a bit of a pain to use, but extremely powerful. The second API is a higher-level convenience API which wraps the lower-level API and provides an interface similar to the standard C I/O API (i.e. fopen, fclose, fflush, fread, and fwrite).
We will not go into detail about the lower-level API now, but if you're interested you can look at the API documentation for SquashStream for details.
The higher-level API is documented in SquashFile, but some of the prototypes are reproduced here:
Like most of the Squash API, most of these functions return SQUASH_OK on success and a negative error code on failure. The exception to this is squash_file_open which, like fopen, returns NULL on failure. The mode string is passed verbatim to fopen.
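Since the higher-level API is modeled on standard C I/O, it may help to recall the plain stdio pattern it resembles: open, check for NULL, write or read, and close. The sketch below uses only standard library calls and does no compression; demo_write_then_read is a hypothetical name chosen for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Writes `text` to `path`, then reads it back into `out`, which can hold
 * `out_size` bytes. Returns the number of bytes read back, or -1 on any
 * failure. Uses plain stdio only; no compression is involved. */
long demo_write_then_read(const char* path, const char* text,
                          char* out, size_t out_size) {
  size_t length = strlen(text);
  size_t got;
  FILE* file;

  file = fopen(path, "wb");
  if (file == NULL)                /* fopen signals failure with NULL */
    return -1;
  if (fwrite(text, 1, length, file) != length) {
    fclose(file);
    return -1;
  }
  if (fclose(file) != 0)           /* closing flushes any buffered data */
    return -1;

  file = fopen(path, "rb");
  if (file == NULL)
    return -1;
  got = fread(out, 1, out_size, file);
  if (ferror(file)) {
    fclose(file);
    return -1;
  }
  fclose(file);

  return (long) got;
}
```

The points that carry over are the NULL check after opening and the fact that write errors may only surface when the file is flushed or closed, so the return value of the close call is worth checking too.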
Going back to our "Hello, world!" example from above, to write a file with the compressed version of "Hello, world!", all you would have to do is something like:
If you want to simply splice the contents of one file to another (decompressing or compressing in the process), you can use the squash_splice family of functions, which looks like:
Beyond just being convenient, this function has the advantage that, for codecs which don't implement streaming natively, Squash will attempt to memory map the files and pass those addresses directly to the buffer-to-buffer compression function. This eliminates a lot of buffering which, for larger files, can consume significant amounts of memory.
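To picture the buffering that memory mapping avoids, consider what a hand-written splice between two FILE* handles looks like. The following is a minimal sketch using only stdio; demo_splice is a hypothetical stand-in that copies bytes unchanged, and it is not how Squash implements its splice functions.

```c
#include <stdio.h>

/* Hypothetical stand-in, not part of Squash: copies everything from `in`
 * to `out` through a fixed-size intermediate buffer. Returns the number
 * of bytes copied, or -1 if reading or writing fails. */
long demo_splice(FILE* out, FILE* in) {
  unsigned char buffer[4096];
  long total = 0;
  size_t got;

  /* Each pass moves at most sizeof(buffer) bytes through memory. */
  while ((got = fread(buffer, 1, sizeof(buffer), in)) > 0) {
    if (fwrite(buffer, 1, got, out) != got)
      return -1;
    total += (long) got;
  }

  return ferror(in) ? -1 : total;
}
```

A loop like this works when data can be processed piece by piece. A codec that only offers buffer-to-buffer compression needs the whole input available at once, which is where the buffering cost mentioned above comes from and why mapping the files directly is attractive.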