Why Modern Video Codecs Like H.264 Are Essentially ‘Magic’

You don’t need a Weissman score to know that today’s video encoders are incredibly good at what they do and will continue to get better as hardware power increases. So how exactly does an encoder, such as H.264, compress gigabytes of video into megabytes? It’s complex, sure, but very explainable.

Developer and former Microsoft software engineer Sid Bala has written a detailed post on the inner workings of H.264, appropriately titled “H.246 is Magic”. While the codec does use some sophisticated algorithms to do its work, most of the space-saving is done by simply discarding information:

In a TV signal, R+G+B color data gets transformed to Y+Cb+Cr. The Y is the luminance (essentially black and white brightness) and the Cb and Cr are the chrominance (color) components … But check out the trick: the Y component gets encoded at full resolution. The C components only at a quarter resolution. Since the eye/brain is terrible at detecting color variations, you can get away with this. By doing this, you reduce total bandwidth by one half, with very little visual difference. Half!
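To check the "half" figure, here is a rough Python sketch of the arithmetic, plus a toy subsampler. The function names, the 1920x1080 frame size and the 2x2 averaging are assumptions made for illustration; they are not taken from Bala's post, and real encoders may filter chroma differently.

```python
def samples_per_frame(width, height, chroma_divisor):
    """Count samples in one frame: a full-resolution Y plane plus two
    chroma planes (Cb and Cr), each kept at 1/chroma_divisor resolution."""
    luma = width * height
    chroma = 2 * (luma // chroma_divisor)
    return luma + chroma


def subsample_2x2(plane):
    """Quarter a chroma plane by averaging each 2x2 block of samples.
    Assumes the plane has even width and height."""
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) // 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]


# Hypothetical 1080p frame: chroma at full vs quarter resolution.
full = samples_per_frame(1920, 1080, 1)
subsampled = samples_per_frame(1920, 1080, 4)
ratio = subsampled / full

# Tiny made-up Cb plane, 2 rows by 4 columns.
cb = [[100, 102, 50, 50],
      [104, 106, 50, 50]]
small_cb = subsample_2x2(cb)
```

One full plane plus two quarter-size planes comes to 1.5 planes' worth of samples instead of 3, which is where the one-half saving comes from.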

Given that you’re storing moving images, it’s possible to extrapolate patterns by analysing each frame and making compression decisions based on this information. As Bala explains:

Imagine you’re watching a tennis match … the court, the net, the crowds all are static. The only thing moving really is the ball. What if you could just have one static image of everything on the background, and then one moving image of just the ball. Wouldn’t that save a lot of space?

The solution is to store only the changes between frames, called the delta. This not only reduces the amount of data, but also makes what remains more favourable to compression.
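As a very simplified sketch of the idea, the Python below stores only the pixels that differ between two tiny frames and then rebuilds the second frame from the first. The helper names and the 4x6 "court" are invented for illustration. H.264 itself works on blocks with motion-compensated prediction rather than a per-pixel dictionary, so treat this as a toy model of the concept only.

```python
def frame_delta(previous, current):
    """Return {(row, col): new_value} for every pixel that changed."""
    return {
        (r, c): cur
        for r, (prev_row, cur_row) in enumerate(zip(previous, current))
        for c, (prev, cur) in enumerate(zip(prev_row, cur_row))
        if prev != cur
    }


def apply_delta(previous, delta):
    """Rebuild the next frame from the previous frame plus its delta."""
    frame = [row[:] for row in previous]  # copy so the input is untouched
    for (r, c), value in delta.items():
        frame[r][c] = value
    return frame


# A made-up 4x6 "court" of zeros with a ball (9) that moves between frames.
frame1 = [[0] * 6 for _ in range(4)]
frame1[1][1] = 9
frame2 = [[0] * 6 for _ in range(4)]
frame2[2][3] = 9

delta = frame_delta(frame1, frame2)
rebuilt = apply_delta(frame1, delta)
```

Here the delta holds two entries (the spot the ball left and the spot it arrived at) instead of all 24 pixels, and the static background is never stored again.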

This is only scratching the surface — hit up Bala’s article below for the full explanation.

H.264 is Magic [Sid Bala]


  • appropriately titled “H.246 is Magic”

    It’s H.264 not H.246.

    it’s possible to extrapolate patterns

    Video encoders also don’t extrapolate, they interpolate. If they extrapolated, you’d end up with jittery video.

    • Don’t tell others they are wrong when you know very little about the subject, go learn first. Motion Vector Extrapolation is one technique used in many video codecs that EXTRAPOLATES what the next frame should look like from motion vector information in the original frame. Video codecs use many different interpolation and extrapolation techniques. In simple terms MVE says “The red ball in the frame is moving in that direction at that speed, extrapolate where it will be next frame based on its original speed and direction”. Go sit in the corner.

        • I am familiar with it, perhaps you should actually read it. Here is just one extract “First motion information is obtained from the first bitstream, and is used to extrapolate second motion information for a second bitstream of compressed image data.” MVE does use interpolation techniques as well, but also heavily relies on extrapolation. You didn’t actually read and understand it did you? Telling others they are wrong when you obviously don’t know – ur doin it again. Maybe you have dyslexia and have misread the word “extrapolate” in its 30 or so references in the patent?

  • Lots of H.265 stuff becoming available now, but it requires a lot more grunt to make it run. Unfortunately for many of us, this means a hardware upgrade is necessary. I only recently bought a PVR that could run H.264, but it won’t run H.265. I’ve been using “Handbrake” to transcode, but it’s a slow process.

  • Well, that explains some bugs like when the background has been static for some time and then everything starts moving and a ghost image of that static part appears on top of everything…
