A Look At Zopfli, Google’s Open Source Compression Algorithm

Google recently open-sourced “Zopfli”, a new, optimised implementation of the Deflate compression algorithm. Deflate is not only the default compressor for ZIP files, but the de facto standard when, well, deflating web pages, scripts and other text files for transmission over the internet.

What does Zopfli mean to you? Unless you’re running some kind of massive server farm or archiving data for long-term storage, not much.

You might be wondering why Google would bother optimising the Deflate algorithm at all. Why not use newer approaches such as LZMA, popularised by 7-Zip; prediction by partial matching (PPM), which is designed specifically for text (and would therefore work well with HTML, CSS and JavaScript); or even the Burrows–Wheeler transform-based algorithm in BZip2, which has been around almost as long as gzip?

The problem is that current infrastructure would also need to change to support data compressed via these methods, and that is no small task. The other problem is overhead and speed: Deflate, as implemented by gzip, is one of the fastest general-purpose algorithms and doesn’t require a lot of CPU time, while the others mentioned aren’t as lightweight. And no matter which compressor produced the Deflate stream, unpacking it takes about the same time, as Google shows:

Compression algorithm        Uncompress time for “gzip -d” of enwik8
gzip -9                      934 ms
7-zip -mm=Deflate -mx=9      949 ms
kzip                         937 ms
Zopfli                       926 ms

This means you can play around with the compression settings and, as long as you don’t change the fundamental rules, you won’t break compatibility. So, by getting the most out of Deflate, nothing has to change on the client end. All modern web browsers will support data compressed via Zopfli without alteration, and all that’s required on the server is a module or extension, depending on the software running.
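If you want to see that compatibility point for yourself, here is a minimal Python sketch. It does not use Zopfli at all; it simply leans on the standard library's zlib module (which wraps Deflate) and a made-up sample payload to show that streams produced at different effort levels are unpacked by the same decoder, with identical results.

```python
import zlib

# Sample text standing in for an HTML/CSS/JS payload (made up for illustration).
data = b"<p>Deflate is everywhere on the web.</p>\n" * 2000

fast = zlib.compress(data, 1)  # quickest, weakest setting
best = zlib.compress(data, 9)  # slowest, strongest setting zlib offers

# The same decoder handles both streams and recovers identical bytes.
restored_fast = zlib.decompress(fast)
restored_best = zlib.decompress(best)

print(len(data), len(fast), len(best))
print(restored_fast == data and restored_best == data)
```

The decoder never needs to know how hard the compressor worked, which is the property Zopfli relies on.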

Of course, there is the little problem of compression speed, with Google admitting that Zopfli nets you 3-8 per cent smaller files, at the expense of being 81 times slower.

Compression algorithm        Compression time
gzip -9                      5.60 s
7-zip -mm=Deflate -mx=9      128 s
kzip                         336 s
Zopfli                       454 s
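The "81 times slower" figure falls straight out of those timings. As a quick sanity check, this Python sketch divides each time by the gzip -9 baseline; the dictionary is just the table's numbers typed in by hand.

```python
# Compression times in seconds, copied from Google's figures in the table.
times = {
    "gzip -9": 5.60,
    "7-zip -mm=Deflate -mx=9": 128.0,
    "kzip": 336.0,
    "Zopfli": 454.0,
}

baseline = times["gzip -9"]

# How many times slower each compressor is than gzip -9.
slowdown = {name: seconds / baseline for name, seconds in times.items()}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.0f}x")
```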

Then there’s the actual compression ratios. We’ve already mentioned the percentages, but a real example always helps. On enwik8, a 100MB chunk based on a dump of Wikipedia entries, gzip manages 36,445,248 bytes, while Zopfli gets it down to 34,995,756 bytes. That makes the Zopfli output about 4 per cent smaller than gzip’s, a saving worth roughly 1.4 per cent of the original file size.
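It is worth being clear about which percentage is which, because the saving looks different depending on what you measure it against. The sketch below works through the arithmetic in Python; the 100,000,000-byte size for enwik8 is an assumption based on the "100MB" description.

```python
ORIGINAL = 100_000_000  # enwik8 size in bytes (assumed from the "100MB" figure)
GZIP = 36_445_248       # gzip output in bytes
ZOPFLI = 34_995_756     # Zopfli output in bytes

saved = GZIP - ZOPFLI

# Saving relative to gzip's output: how much smaller the Zopfli file is.
vs_gzip = saved / GZIP * 100

# Saving relative to the uncompressed input.
vs_original = saved / ORIGINAL * 100

print(f"bytes saved: {saved:,}")
print(f"smaller than gzip by {vs_gzip:.1f} per cent")
print(f"saving as a share of the original: {vs_original:.1f} per cent")
```

Measured against gzip's output, the saving sits inside the 3-8 per cent range Google quotes.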

So, it’s not so great compared to gzip, but against other optimised Deflate-based compressors, it does OK.

If you’re firing off a lot of static information to a couple of thousand clients (or more) a second, then you’ll happily take the bandwidth savings and live performance benefits over the time it took to compress the data in the first place. This is where Zopfli shines. You could also argue that the long-term storage of digital information would benefit from Zopfli: you wouldn’t have to rely on an exotic decompression program to access that data, as any old Deflate-compatible implementation will do.

Finally, Deflate works on any sort of data you care to throw at it, while PPM does its best work on text. You could argue that, other than text, a lot of data transmitted over the internet is already compressed. Images are the best example, as they’re processed with algorithms specifically tuned for those formats.
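The "already compressed" point is easy to demonstrate. This rough Python sketch uses the standard zlib module and a made-up word list (neither has anything to do with Zopfli itself): it compresses some text-like data once, then feeds the result back through the compressor. The first pass should shrink the data a lot; the second should find next to nothing left to squeeze.

```python
import random
import zlib

# Build some reproducible, text-like sample data (the word list is made up).
rng = random.Random(42)
words = ["zopfli", "deflate", "gzip", "server", "browser", "page", "script", "style"]
text = " ".join(rng.choice(words) for _ in range(50_000)).encode("ascii")

once = zlib.compress(text, 9)   # first pass: plenty of redundancy to remove
twice = zlib.compress(once, 9)  # second pass: the input is already compressed

print(len(text), len(once), len(twice))
```

That is why general-purpose compression on the web is mostly aimed at text, with images and video left to their own formats.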

PPM, then, wouldn’t be out of place if limited to HTML, CSS, etc. Yet, again, we go back to wide-scale adoption and getting everyone to agree on a specific algorithm. Seeing as how everyone is still sorting out the audio and video codecs for HTML5, that’s a fairly unattractive can of worms.

So, why should you care about Zopfli? To the average net citizen, there’s nothing really to see. But for compression buffs such as myself and the developers behind the likes of Apache, nginx and other servers, it’s a cool accomplishment and a solid effort at making the internet that much more responsive.

Image: 401(K) 2012 / Flickr, licensed under Creative Commons 2.0

zopfli [Google Project Hosting, via Google Open Source Blog]

