Video compression standards have evolved rapidly to keep up with the demand for higher quality and more efficient streaming. H.264, also known as AVC, has been a workhorse since its introduction in 2003, powering everything from Blu-ray discs and broadcast TV to online video. Its successor, H.265 or High Efficiency Video Coding (HEVC), arrived in 2013 promising up to 50% better compression efficiency, meaning smaller files or higher quality at the same bitrate.
While both codecs share a similar high-level architecture, their decoding processes differ significantly in ways that impact performance, complexity, and hardware requirements. In this post, we'll explore the key differences in decoding, focusing on how H.265 builds upon and improves H.264's foundations. This is geared toward those familiar with basic video decoding concepts; if you're new, check out my intro to H.264 decoding for background.
Core Architectural Shifts
At the heart of both decoders is a hybrid approach: prediction to guess pixel values, followed by residual coding to handle differences. However, H.265 introduces larger and more flexible structures for better efficiency.
In H.264, decoding revolves around macroblocks—fixed 16x16 pixel units that can be subdivided for motion compensation. H.265 replaces this with Coding Tree Units (CTUs), which can be as large as 64x64 pixels. These CTUs are recursively split into Coding Units (CUs) down to 8x8, Prediction Units (PUs) for prediction, and Transform Units (TUs) for residual processing. This quadtree-based partitioning allows H.265 to adapt better to content complexity, reducing overhead in uniform areas like skies or walls.
The result? H.265 decoders handle fewer but larger blocks, which can improve cache efficiency in hardware but requires more sophisticated parsing logic to navigate the tree structure.
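To make that tree walk concrete, here is a minimal Python sketch of the recursive descent a decoder performs. It is not the real HEVC syntax parser: read_split_flag stands in for the actual bitstream parsing, and the flag values in the usage example are made up.

    # Minimal sketch of HEVC-style quadtree partitioning: a CTU (up to 64x64) is
    # recursively split into CUs down to 8x8 based on split flags from the bitstream.
    MIN_CU_SIZE = 8

    def parse_coding_tree(x, y, size, read_split_flag, cus):
        """Recursively descend the coding tree, collecting leaf CUs as (x, y, size)."""
        if size > MIN_CU_SIZE and read_split_flag():
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    parse_coding_tree(x + dx, y + dy, half, read_split_flag, cus)
        else:
            cus.append((x, y, size))
        return cus

    # Toy usage: split the top-left quadrant one level further, keep the rest coarse.
    flags = iter([1, 1, 0, 0, 0, 0, 0, 0, 0])
    cus = parse_coding_tree(0, 0, 64, lambda: next(flags, 0), [])
    print(cus)  # [(0, 0, 16), (16, 0, 16), (0, 16, 16), (16, 16, 16), (32, 0, 32), ...]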
Prediction Enhancements
Prediction is where much of the compression magic happens, and H.265 amps it up considerably.
For intra prediction (within the same frame), H.264 offers up to 9 modes for 4x4 luma blocks, extrapolating from neighboring pixels. H.265 expands this to 35 modes: 33 angular directions plus DC and a planar mode for smooth gradients. The finer granularity means better predictions for detailed textures, but the decoder must implement every mode and apply whichever one the bitstream signals, which adds logic and computation.
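As an illustration, here is a hedged Python sketch of the planar mode: it blends a horizontal and a vertical extrapolation from neighboring samples. The reference-sample substitution and smoothing the spec performs beforehand are omitted, and the neighbor values in the usage example are invented.

    # Sketch of HEVC planar intra prediction for one NxN block (N a power of two).
    # 'top' holds the N+1 reconstructed neighbors above the block (including the
    # top-right sample); 'left' holds the N+1 neighbors to the left (including the
    # bottom-left sample).
    import numpy as np

    def planar_predict(top, left, n):
        """Blend horizontal and vertical linear extrapolations, as the planar mode does."""
        shift = n.bit_length()  # equals log2(n) + 1 for power-of-two n
        pred = np.empty((n, n), dtype=np.int32)
        for y in range(n):
            for x in range(n):
                horiz = (n - 1 - x) * left[y] + (x + 1) * top[n]   # toward the top-right sample
                vert = (n - 1 - y) * top[x] + (y + 1) * left[n]    # toward the bottom-left sample
                pred[y, x] = (horiz + vert + n) >> shift
        return pred

    # Toy usage: an 8x8 block whose neighbors ramp smoothly from 100 to 140.
    n = 8
    top = np.linspace(100, 140, n + 1).astype(np.int32)
    left = np.linspace(100, 140, n + 1).astype(np.int32)
    print(planar_predict(top, left, n))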
Inter prediction (across frames) sees similar upgrades. H.264 supports quarter-pixel motion vector precision with partitions down to 4x4. H.265 keeps quarter-pixel luma precision but interpolates with longer filters (8-tap for luma and 4-tap for chroma, versus H.264's 6-tap luma and bilinear chroma filters), and it introduces advanced motion vector prediction (AMVP) and merge modes, where motion info is inherited from neighboring blocks to save bits. It also allows asymmetric motion partitioning, splitting a CU into unequal PUs such as 16x4 and 16x12 to follow irregular motion. H.265 decoders thus do more work per interpolated sample and fetch more reference data per prediction, since a single PU can span an entire 64x64 CTU.
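To make the interpolation cost concrete, here is a small sketch comparing half-pel filtering with HEVC's 8-tap luma filter against H.264's 6-tap filter. The sample row is invented, and the edge padding and exact rounding rules a real decoder applies are simplified away.

    # Half-pel luma interpolation: HEVC's 8-tap DCT-based filter versus H.264's 6-tap
    # filter. Horizontal pass only; clipping and bit-depth handling are ignored.
    import numpy as np

    HEVC_HALF = np.array([-1, 4, -11, 40, 40, -11, 4, -1])   # normalized by 64
    H264_HALF = np.array([1, -5, 20, 20, -5, 1])             # normalized by 32

    def half_pel_row(row, taps, norm):
        """Interpolate the samples halfway between integer positions along one row."""
        half = len(taps) // 2
        out = []
        for i in range(half - 1, len(row) - half):
            window = row[i - half + 1 : i + half + 1]
            out.append((int(np.dot(window, taps)) + norm // 2) // norm)
        return out

    # Toy usage: a row with a sharp edge, where the extra taps make a visible difference.
    row = np.array([10, 12, 15, 20, 30, 50, 80, 120, 160, 190, 210, 220])
    print(half_pel_row(row, HEVC_HALF, 64))
    print(half_pel_row(row, H264_HALF, 32))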
Transform and Quantization Differences
Both use transform coding to compact residual data into frequency coefficients, but H.265 scales it up.
H.264 primarily employs an integer approximation of the discrete cosine transform (DCT) at 4x4 or 8x8. H.265 supports transform sizes from 4x4 up to 32x32, allowing larger blocks to capture low-frequency energy more efficiently. It also introduces a discrete sine transform (DST) for 4x4 intra luma residuals, which better matches the statistics of intra-predicted residuals.
Inverse quantization and inverse transform in H.265 are more demanding due to these larger sizes: think separable matrix operations on 32x32 blocks versus an 8x8 maximum in H.264. This contributes to H.265's higher decoding complexity, often 1.5-2x that of H.264 for similar quality.
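A quick back-of-the-envelope sketch makes the scaling visible: a separable NxN inverse transform amounts to two NxN matrix multiplies, so the work per residual sample grows roughly linearly with N. The floating-point DCT below is only a stand-in for the integer transforms the standards actually define.

    # Why larger transforms cost more: a separable NxN inverse transform is two
    # NxN matrix multiplies, i.e. about 2*N multiplications per residual sample.
    import numpy as np

    def dct_matrix(n):
        """Orthonormal DCT-II basis (row k, column i); stand-in for the integer transforms."""
        m = np.cos(np.pi * (2 * np.arange(n)[None, :] + 1) * np.arange(n)[:, None] / (2 * n))
        m[0] *= 1 / np.sqrt(2)
        return m * np.sqrt(2 / n)

    def inverse_transform(coeffs):
        basis = dct_matrix(coeffs.shape[0])
        return basis.T @ coeffs @ basis   # column pass, then row pass

    for n in (4, 8, 16, 32):              # H.264 stops at 8x8; H.265 goes up to 32x32
        coeffs = np.zeros((n, n))
        coeffs[0, 0] = 100.0              # a lone DC coefficient reconstructs a flat block
        block = inverse_transform(coeffs)
        mults = 2 * n ** 3
        print(f"{n}x{n}: flat value {block[0, 0]:.2f}, ~{mults} multiplies ({2 * n} per sample)")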
Entropy Decoding Improvements
Entropy decoding extracts the compressed syntax from the bitstream. H.264 offers a choice of CAVLC (simpler, variable-length codes) or CABAC (more efficient, arithmetic coding). H.265 mandates CABAC in its main profiles, but tunes it for throughput: a larger share of bins are bypass-coded (decoded without a context model) and the context modeling is streamlined.
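For a feel of what a bypass bin is, here is a hedged sketch of the bypass decoding step: no context model is consulted and the range never changes, so the decoder only shifts in a bit and compares. It is simplified from the spec's bypass decoding process, and the starting offset, range, and bit values are invented.

    # One CABAC bypass bin: shift a raw bit into the offset and compare against the
    # current range. Real decoders keep the range within [256, 510] via renormalization
    # of the context-coded bins; that machinery is omitted here.
    def decode_bypass(offset, rng, next_bit):
        """Decode one equiprobable bin; returns (bin_value, new_offset)."""
        offset = (offset << 1) | next_bit()
        if offset >= rng:
            return 1, offset - rng
        return 0, offset

    # Toy usage: pull four bypass bins from a made-up bit source and coder state.
    bits = iter([1, 0, 1, 1])
    offset, rng = 182, 300
    for _ in range(4):
        b, offset = decode_bypass(offset, rng, lambda: next(bits))
        print(b, offset)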
CABAC is inherently serial and can bottleneck decoders in both standards, but H.265's version is tuned for higher throughput and pairs with wavefront parallel processing (WPP). This lets H.265 decoders spread work across multiple CPU cores more effectively than H.264's slice-based parallelism allows.
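The wavefront idea is easy to see in a toy scheduler: each CTU row can be handed to its own thread, but a row may only advance while the row above stays two CTUs ahead, since CABAC contexts are inherited after the second CTU of the previous row and the top-right neighbor must already be decoded. The picture dimensions below are arbitrary.

    # Toy WPP scheduler: report which CTUs could be decoded in parallel at each step,
    # given the two-CTU lag each row must keep behind the row above it.
    def wavefront_schedule(rows, cols):
        done = [0] * rows                     # CTUs finished so far in each row
        step = 0
        while any(d < cols for d in done):
            ready = []
            for r in range(rows):
                c = done[r]
                # Row 0 has no dependency; other rows wait for a two-CTU lead above.
                if c < cols and (r == 0 or done[r - 1] >= min(c + 2, cols)):
                    ready.append((r, c))
            for r, c in ready:
                done[r] += 1
            print(f"step {step}: decode {ready}")
            step += 1

    wavefront_schedule(rows=4, cols=6)        # a tiny 4x6-CTU picture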
In-Loop Filtering Advances
To combat artifacts, both apply filters during decoding for smoother output and better references.
H.264 uses an adaptive deblocking filter on block edges. H.265 retains a similar deblocker but adds Sample Adaptive Offset (SAO), which classifies pixels and applies offsets to reduce banding and ringing. SAO operates post-deblocking and is edge- or band-based, requiring additional passes in the decoder pipeline.
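Band offset is the simpler of the two SAO modes and easy to sketch; the band start and offset values below are invented, and the edge-offset mode (which classifies each pixel by comparing it with its neighbors) is left out.

    # SAO band offset: 8-bit samples fall into 32 bands of width 8 (value >> 3); the
    # bitstream signals a starting band and four offsets the decoder adds to samples
    # in those four consecutive bands, then clips back to the valid range.
    import numpy as np

    def sao_band_offset(samples, band_start, offsets, bit_depth=8):
        shift = bit_depth - 5                          # always 32 bands, any bit depth
        bands = samples >> shift
        out = samples.astype(np.int32)
        for i, off in enumerate(offsets):
            out[bands == (band_start + i) % 32] += off
        return np.clip(out, 0, (1 << bit_depth) - 1)

    # Toy usage: nudge the mid-gray bands of a gradient to counteract visible banding.
    block = np.arange(96, 160, dtype=np.int32).reshape(8, 8)
    print(sao_band_offset(block, band_start=13, offsets=[1, 2, 2, 1]))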
These extra filters enhance subjective quality, especially at low bitrates, but add to the processing overhead—H.265 decoders might need 20-30% more cycles just for filtering.
Parallelism and Hardware Implications
H.264 decoding is relatively lightweight, making it ubiquitous on older devices. H.265, designed for 4K and beyond, emphasizes parallelism with tools like tiles (grid-like divisions) and WPP (row-based wavefronts), allowing independent decoding of frame sections.
This makes H.265 more scalable on modern GPUs and multi-core processors, but it demands hardware acceleration for real-time playback on mobiles or low-power devices. Early adoption was slowed by higher complexity—decoding H.265 can require 4-10x more power than H.264 without dedicated silicon like Apple's A-series or Qualcomm's Snapdragon decoders.
When to Choose Which
H.264 remains ideal for legacy systems, broad compatibility, and scenarios where encoding/decoding speed trumps efficiency. H.265 shines in bandwidth-constrained environments like 4K streaming (e.g., Netflix, YouTube) or storage, but its royalty fees and complexity have led to alternatives like AV1.
In summary, H.265 decoding isn't just an upgrade—it's a reimagining for the high-res era, with deeper prediction, larger blocks, and smarter filters. If you're implementing or optimizing decoders, libraries like FFmpeg support both, letting you experiment with the trade-offs.
What are your thoughts on the shift to newer codecs? Share in the comments!