Parsing H.264 Bitstreams: A Practical Guide with Sample Code


In our previous post, we demystified the structure of H.264 bitstreams, focusing on NAL units and slices as the foundational building blocks. We saw how NAL units encapsulate everything from parameter sets (SPS/PPS) to coded slices, separated by start codes, and how slices divide frames for resilience and performance.

Now, let's get hands-on. This follow-up post walks you through parsing an H.264 bitstream in code. We'll implement a simple parser that:

  • Reads a raw H.264 file (Annex B format).
  • Locates NAL units by finding start codes.
  • Extracts the NAL header.
  • Identifies the NAL unit type.
  • Prints basic information (e.g., type, reference status).

This is an educational example — not production-grade — but it gives you a solid foundation for deeper work, such as extracting SPS/PPS or integrating with decoders like FFmpeg.

Prerequisites

  • Basic Python knowledge.
  • An H.264 test stream (e.g., a .264 or .h264 file in Annex B format). Free samples are available online, for example in the JVT test suite or among FFmpeg's test streams.
  • No third-party packages: the script relies only on Python's standard library, since indexing a bytes object already yields integer byte values.

Key Concepts Recap

  • Annex B Byte Stream Format: NAL units are prefixed with a start code: 0x000001 or 0x00000001.
  • NAL Header: 1 byte: forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits).
  • Types we care about: 1 (non-IDR slice), 5 (IDR slice), 2-4 (slice data partitions), 6 (SEI), 7 (SPS), 8 (PPS), and 9 (access unit delimiter).
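
To make the header layout above concrete, here is a small worked sketch that splits a few typical header bytes into their three fields. The helper name decode_header is only illustrative, and the byte values are common examples rather than output from any particular stream.

```python
def decode_header(byte):
    """Split a NAL header byte into (forbidden_zero_bit, nal_ref_idc, nal_unit_type)."""
    return (byte >> 7) & 0x1, (byte >> 5) & 0x3, byte & 0x1F


# Typical header bytes: SPS, PPS, IDR slice, non-IDR slice, access unit delimiter
for b in (0x67, 0x68, 0x65, 0x41, 0x09):
    forbidden, ref_idc, unit_type = decode_header(b)
    print(f"0x{b:02X} = {b:08b} -> forbidden={forbidden} ref_idc={ref_idc} type={unit_type}")
```

For instance, 0x67 is 01100111 in binary: a forbidden bit of 0, a nal_ref_idc of 11 (3), and a nal_unit_type of 00111 (7), which identifies an SPS.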

Sample Code: Basic H.264 Bitstream Parser

Here's a complete, runnable Python script that parses the bitstream and reports each NAL unit.

import sys

# NAL unit type names for readability
NAL_TYPES = {
    1: "Coded slice of a non-IDR picture",
    2: "Coded slice data partition A",
    3: "Coded slice data partition B",
    4: "Coded slice data partition C",
    5: "Coded slice of an IDR picture",
    6: "Supplemental enhancement information (SEI)",
    7: "Sequence parameter set (SPS)",
    8: "Picture parameter set (PPS)",
    9: "Access unit delimiter",
    10: "End of sequence",
    11: "End of stream",
    12: "Filler data",
    13: "Sequence parameter set extension",
    14: "Prefix NAL unit",
    15: "Subset sequence parameter set",
    19: "Coded slice of an auxiliary coded picture without partitioning",
    20: "Coded slice extension",
}

def find_start_code(data, pos):
    """Return (offset, length) of the next start code at or after pos, or (None, None)."""
    # Every start code ends in 00 00 01, so let bytes.find do the scanning.
    idx = data.find(b'\x00\x00\x01', pos)
    if idx < 0:
        return None, None
    # A zero byte directly in front makes it the 4-byte form 0x00000001.
    if idx > pos and data[idx - 1] == 0:
        return idx - 1, 4
    return idx, 3

def parse_nal_header(byte):
    """Parse the 1-byte NAL header."""
    forbidden_zero = (byte >> 7) & 0x01
    nal_ref_idc = (byte >> 5) & 0x03
    nal_unit_type = byte & 0x1F
    return forbidden_zero, nal_ref_idc, nal_unit_type

def main(filename):
    try:
        with open(filename, 'rb') as f:
            data = f.read()
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
        return

    print(f"Parsing H.264 bitstream: {filename}")
    print("Offset\tStart Code Len\tNAL Type\tRef\tDescription")

    pos = 0
    nal_count = 0

    while pos < len(data):
        start_pos, sc_len = find_start_code(data, pos)
        if start_pos is None:
            break

        # Move past the start code
        pos = start_pos + sc_len

        # Read the NAL header (next byte)
        if pos >= len(data):
            break
        nal_header = data[pos]
        pos += 1

        forbidden, ref_idc, unit_type = parse_nal_header(nal_header)

        if forbidden != 0:
            print(f"Warning: Forbidden zero bit set at offset {start_pos}")

        # Find the next start code to determine NAL length (header + payload bytes)
        next_start, _ = find_start_code(data, pos)
        nal_end = next_start if next_start is not None else len(data)
        nal_length = nal_end - start_pos - sc_len

        description = NAL_TYPES.get(unit_type, f"Reserved/Unknown ({unit_type})")
        ref_str = "Yes" if ref_idc > 0 else "No"

        print(f"{start_pos:08X}\t{sc_len}\t\t{unit_type:02d}\t\t{ref_str}\t{description}")

        nal_count += 1

        # Resume scanning at the next start code, or stop if there is none
        pos = next_start if next_start is not None else len(data)

    print(f"\nTotal NAL units found: {nal_count}")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python h264_parser.py <input.h264>")
        sys.exit(1)
    main(sys.argv[1])

How to Run It

  1. Save the script as h264_parser.py.
  2. Download a sample H.264 file (e.g., test.264).
  3. Run: python h264_parser.py test.264
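
If you don't have a sample file at hand, a hand-built stream is enough to exercise the parser. The sketch below is an assumption-laden stand-in, not real video: the function name build_sample_stream and the payload bytes are made up for illustration, and no decoder would accept the result.

```python
def build_sample_stream():
    """Return a tiny Annex B byte stream with placeholder payloads (not decodable video)."""
    dummy = b"\xAA\xBB\xCC"  # arbitrary bytes that cannot form a start code
    nal_units = [
        (b"\x00\x00\x00\x01", 0x67),  # SPS header byte after a 4-byte start code
        (b"\x00\x00\x00\x01", 0x68),  # PPS header byte after a 4-byte start code
        (b"\x00\x00\x01", 0x65),      # IDR slice header byte after a 3-byte start code
        (b"\x00\x00\x01", 0x41),      # non-IDR reference slice header byte
    ]
    return b"".join(sc + bytes([hdr]) + dummy for sc, hdr in nal_units)


stream = build_sample_stream()
print(stream.hex(" "))
```

Writing these bytes to a file (for example with open("synthetic.264", "wb"), where the filename is arbitrary) and passing that file to the parser should list four NAL units of types 7, 8, 5, and 1.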

Sample Output

Parsing H.264 bitstream: test.264
Offset      Start Code Len  NAL Type    Ref     Description
00000000    4               07          Yes     Sequence parameter set (SPS)
0000001A    4               08          Yes     Picture parameter set (PPS)
00000028    4               09          No      Access unit delimiter
0000002E    4               05          Yes     Coded slice of an IDR picture
...
Total NAL units found: 120

Extending the Parser

This basic version is a great starting point. You can expand it to:

  • Extract SPS/PPS: Parse the payload after the header (using exponential-Golomb coding for syntax elements).
  • Handle RBSP (Raw Byte Sequence Payload): Remove emulation prevention bytes (a 0x03 that the encoder inserts after two zero bytes whenever the next byte would be 0x00, 0x01, 0x02, or 0x03).
  • Detect Access Units: Group NAL units between access unit delimiters or primary coded pictures.
  • Integrate with FFmpeg: Use libavcodec in C for full decoding, or pyav in Python.
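
As a rough illustration of the access-unit idea from the list above, the following sketch splits a sequence of nal_unit_type values wherever an access unit delimiter (type 9) appears. It assumes the stream actually contains delimiters; streams without them need slice-header comparisons that this sketch does not attempt. The function name group_by_aud is hypothetical.

```python
def group_by_aud(nal_types):
    """Split a sequence of nal_unit_type values into groups, starting a new one at each AUD."""
    units, current = [], []
    for t in nal_types:
        if t == 9 and current:
            # An access unit delimiter opens a new access unit
            units.append(current)
            current = []
        current.append(t)
    if current:
        units.append(current)
    return units


print(group_by_aud([9, 7, 8, 5, 9, 1, 9, 1]))
```

Feeding it the unit_type values collected in the main loop would give one list per access unit.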

For example, to remove emulation prevention bytes:

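def remove_emulation_prevention(data):
    """Return the RBSP: drop each 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0  # length of the current run of zero bytes in the input
    for b in data:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # emulation prevention byte: skip it and reset the run
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)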
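
Once emulation prevention bytes are removed, the SPS/PPS fields can be read bit by bit. The sketch below shows one way to decode unsigned Exp-Golomb values, ue(v), and to read the first few SPS fields. The BitReader class and parse_sps_prefix function are illustrative names rather than part of any library, the input is assumed to be the SPS payload with the NAL header byte already stripped, and truncated input simply raises IndexError.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object (assumed to be RBSP, already unescaped)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bit(self):
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self):
        """Unsigned Exp-Golomb: count leading zeros, then read that many more bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


def parse_sps_prefix(rbsp):
    """Read only the first fields of an SPS payload (NAL header byte already stripped)."""
    r = BitReader(rbsp)
    profile_idc = r.read_bits(8)
    constraint_flags = r.read_bits(8)  # constraint_set flags plus reserved bits
    level_idc = r.read_bits(8)
    sps_id = r.read_ue()               # seq_parameter_set_id
    return profile_idc, constraint_flags, level_idc, sps_id


# Hand-made bytes: profile 66 (Baseline), flags 0xC0, level 30, sps_id 0
print(parse_sps_prefix(bytes([0x42, 0xC0, 0x1E, 0x80])))  # -> (66, 192, 30, 0)
```

The fields that follow in a real SPS depend on profile_idc, so a full parser has to branch from here.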

Why This Matters

Parsing the bitstream yourself gives you insight into how H.264 data is organized — invaluable for debugging, custom streaming servers, or building tools like bitstream analyzers.

If you're serious about video coding, tools like Elecard StreamEye, FFmpeg's ffprobe, or Bitstream Analyzer from the H.264 reference software provide deeper inspection. But nothing beats rolling your own parser to truly understand the format.

Try it out with your own streams! Have questions about extending the code or tackling specific NAL types? Let me know in the comments.