A Look At Zopfli, Google's Open Source Compression Algorithm

Google recently open-sourced "Zopfli", a new, optimised implementation of the Deflate compression algorithm. Deflate is not only the default compressor for ZIP files, but the de facto standard when, well, deflating web pages, scripts and other text files for transmission over the internet.
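If you've never poked at Deflate directly, here's a minimal Python sketch of that same machinery at work; the HTML snippet is made up purely for illustration:

import gzip
import zlib

# A stand-in for a web page (made up for illustration).
html = (b"<html><head><title>Zopfli</title></head>"
        b"<body><p>Deflate all the things.</p></body></html>") * 50

# zlib-wrapped Deflate, which is what HTTP's "deflate" content coding specifies.
deflated = zlib.compress(html, 9)

# The same Deflate machinery in a gzip container, as used for
# "Content-Encoding: gzip" and by the gzip command-line tool.
gzipped = gzip.compress(html, compresslevel=9)

print(f"original: {len(html)} bytes")
print(f"deflate:  {len(deflated)} bytes")
print(f"gzip:     {len(gzipped)} bytes")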

What does Zopfli mean to you? Unless you're running some kind of massive server farm or archiving data for long-term storage, not much.

You might be wondering why Google would bother optimising the Deflate algorithm at all. Why not use newer approaches such as LZMA, popularised by 7-Zip; prediction by partial matching (PPM), which is designed specifically for text (and would therefore work well with HTML, CSS and JavaScript); or even the Burrows–Wheeler transform-based algorithm in bzip2, which has been around almost as long as gzip?

The problem is that current infrastructure would also need to change to support data compressed via these methods, and that is no small task. The other problem is overhead and speed: Deflate, as implemented by gzip, is one of the fastest general-purpose algorithms and doesn't require a lot of CPU time, while the others mentioned aren't as lightweight. Crucially, it doesn't matter which compressor produced the Deflate stream; unpacking the information takes about the same time, as Google's figures show:

Compression algorithm            Decompression time ("gzip -d" on enwik8)
gzip -9                          934 ms
7-zip -mm=Deflate -mx=9          949 ms
kzip                             937 ms
Zopfli                           926 ms
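That compatibility is easy to check for yourself. The sketch below (the file names are hypothetical) times decompression with Python's standard gzip module; a Zopfli-made .gz file opens exactly the same way as one made by gzip -9:

import gzip
import time

# Hypothetical files: the same input compressed once with gzip -9 and once
# with Zopfli. Both are ordinary gzip containers, so the standard library
# reads either one with the same code.
for name in ("enwik8.gzip9.gz", "enwik8.zopfli.gz"):
    start = time.perf_counter()
    with gzip.open(name, "rb") as f:
        data = f.read()
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(data)} bytes in {elapsed:.3f} s")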

The upshot is that you can play around with how the compression is performed and, as long as you don't change the fundamental rules of the format, you won't break compatibility. So, by getting the most out of Deflate, nothing has to change on the client end. All modern web browsers will handle data compressed via Zopfli without alteration, and all that's required on the server is a module or extension, depending on the software running.
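In practice, the usual approach is to run Zopfli once at build or deploy time and have the server hand out the precompressed file whenever a client advertises gzip support. The sketch below is just that idea in miniature, with made-up paths and a deliberately naive Accept-Encoding check; it isn't any particular server's implementation:

import os

def pick_static_file(path, accept_encoding):
    """Return (file_to_send, content_encoding) for a static asset.

    Assumes assets were precompressed ahead of time (for example with the
    zopfli command-line tool), so that "app.js.gz" sits next to "app.js".
    """
    precompressed = path + ".gz"
    # Naive check; a real server parses Accept-Encoding properly.
    if "gzip" in accept_encoding and os.path.exists(precompressed):
        return precompressed, "gzip"   # made by Zopfli, but plain gzip to the browser
    return path, None                  # fall back to the uncompressed file

# A client that accepts gzip gets the smaller, precompressed copy.
print(pick_static_file("static/app.js", "gzip, deflate"))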

Of course, there is the little problem of compression speed, with Google admitting that Zopfli nets you 3-8 per cent smaller files, at the expense of being 81 times slower.

Compression algorithm            Compression time
gzip -9                          5.60 s
7-zip -mm=Deflate -mx=9          128 s
kzip                             336 s
Zopfli                           454 s
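The 81-times figure falls straight out of those timings:

# Compression times from the table above, in seconds.
times = {"gzip -9": 5.60, "7-zip": 128.0, "kzip": 336.0, "Zopfli": 454.0}

baseline = times["gzip -9"]
for name, seconds in times.items():
    print(f"{name}: {seconds / baseline:.0f}x the time of gzip -9")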

Then there are the actual compression ratios. We've already mentioned the percentages, but a real example always helps. On enwik8, a 100MB chunk based on a dump of Wikipedia entries, gzip manages 36,445,248 bytes, while Zopfli gets it down to 34,995,756 bytes, which works out to be roughly 4 per cent smaller than gzip's output.
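Working that out from the byte counts:

gzip_bytes = 36_445_248
zopfli_bytes = 34_995_756

saving = (gzip_bytes - zopfli_bytes) / gzip_bytes
print(f"Zopfli output is {saving:.1%} smaller than gzip -9")   # ~4.0%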

So, the gain over gzip isn't huge for the time spent, but against other optimised Deflate-based compressors, Zopfli does OK.

If you're firing off a lot of static information to a couple of thousand clients (or more) a second, then you'll happily take the bandwidth savings and live performance benefits over the time it took to compress the data in the first place. This is where Zopfli shines. You could also argue that the long-term storage of digital information would benefit from Zopfli: you wouldn't have to rely on an exotic decompression program to access that data, since any old Deflate-compatible implementation will do.

Finally, Deflate works on any sort of data you care to throw at it, while PPM does its best work on text. You could argue that, other than text, a lot of the data transmitted over the internet is already compressed, images being the best example: they're processed with algorithms specifically tuned for those formats.

PPM, then, wouldn't be out of place if limited to HTML, CSS and the like. Yet again, we come back to wide-scale adoption and getting everyone to agree on a specific algorithm. Seeing as everyone is still sorting out the audio and video codecs for HTML5, that's a fairly unattractive can of worms.

So, why should you care about Zopfli? To the average net citizen, there's nothing really to see. But for compression buffs such as myself and the developers behind the likes of Apache, nginx and other servers, it's a cool accomplishment and a solid effort at making the internet that much more responsive.

Image: 401(K) 2012 / Flickr, licensed under Creative Commons 2.0

zopfli [Google Project Hosting, via Google Open Source Blog]


Comments

    Let me summarise:
    It could make surfing the net slightly faster, if web servers use it.

    What's the carbon impact of those extra processing cycles for a roughly 4% gain in size? Seems absurd.

    After discovering xz (lzma), I've come to consider deflate a bit of a joke. Perhaps the time spent on this insignificant improvement would have been better spent pushing ppm/lzma for the next generation of browsers.

    Here are my stats based on compressing a 20gb SQL file using maximum compression:

    gzip: 18gb
    xz: 1.2gb

    I guess Zopfli might come in at 16 to 17gb? That's still pitiful.

    Anyone using deflate implementations for archived data is making a huge mistake. Using lzma isn't relying on an exotic algorithm, since decompressors are open source and widely available on every Linux installation I've seen, and on Windows through 7-Zip and such.

    You're right. We should work collectively to make the web more responsive and reduce bandwidth. So why are we bothering with Zopfli?

      I'm sure Google has its reasons, though it's quite possible it's just the result of an employee's side project and will never actually be implemented. Rather than just let this tinkered algorithm rot somewhere on a server, the decision was made to open source it instead. Which is... nice, I guess.

      Google is optimizing content delivery to their userbase: the most popular (if not universal) HTTP/1.1 compression encodings are gzip and Deflate only. So basically they are making changes to what they have control over: making their backend as efficient as possible for the most common encodings accepted by users' clients. Sure, the LZMA algorithm is more efficient, but it is unlikely the world's userbase is going to upgrade their clients (especially on mobile or closed/native platforms) to support additional compression encodings anytime soon.

