Transcoding Without the Generational Curse: A Flatness-First Strategy

in #transcoding, 6 days ago

If you've ever re-encoded a video (maybe converting old H.264 files to AV1 for archiving, uploading to YouTube, or building a personal media library), you've probably noticed something frustrating: each transcode makes the video look a little worse. Edges get softer, dark scenes pick up banding, and flat areas (like black letterbox bars or solid backgrounds) accumulate ugly noise. This is generational loss: the buildup of quantization errors every time you go through a lossy codec.

The good news? You can fight back — and often end up with a cleaner result at a lower bitrate than the source. The secret isn't chasing perfect fidelity to the already-damaged input. It's recognizing that much of the "detail" in flat or near-uniform regions is just inherited quantization noise — worthless entropy that deserves to be thrown away.

This post is a practical, step-by-step guide to a "flatness-first" transcoding strategy: detect blank or low-detail areas, gently discard the garbage noise in them, and let the encoder reallocate those saved bits to the parts that actually matter (faces, text, motion, textures). The result: reduced generational loss, smaller files, and frequently better subjective quality.

Why Flat Regions Are the Key

In most videos, large portions of the frame carry almost no perceptual information:

  • Letterbox/pillarbox bars
  • Solid-color backgrounds (common in anime, presentations, gaming UIs)
  • Dark shadows or night skies
  • Uniform overlays (subtitle backgrounds, logos)

In a clean source (digital animation, screen capture, rendered bars) these areas are perfectly flat. But after one or more lossy compressions, they pick up subtle gradients, banding, or dither-like noise from quantization. Traditional encoders will faithfully preserve this noise because it's part of the input pixels, which wastes bits on garbage.

By treating this noise as discardable entropy and forcing those regions toward true flatness before encoding, we allow the encoder to skip them almost entirely (zero residuals, single DC coefficient). The bits we save go to protecting real detail elsewhere.
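
To make the "single DC coefficient" point concrete, here is a toy sketch in plain Python. It is not how any real encoder is implemented: it skips prediction entirely, uses a plain orthonormal 8x8 DCT, and the quantizer step of 2.0 and the ±3 noise amplitude are made-up values chosen only for illustration.

```python
import math
import random

N = 8  # block size used by this toy example


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of lists)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def nonzero_after_quant(block, step):
    """Count coefficients that survive uniform quantization with the given step."""
    coeffs = dct2(block)
    return sum(1 for row in coeffs for c in row if round(c / step) != 0)


rng = random.Random(0)
# An ideal black bar: every pixel has the same value.
flat = [[16] * N for _ in range(N)]
# The same bar after picking up small inherited noise (assumed amplitude: +/-3).
noisy = [[16 + rng.randint(-3, 3) for _ in range(N)] for _ in range(N)]

flat_count = nonzero_after_quant(flat, step=2.0)
noisy_count = nonzero_after_quant(noisy, step=2.0)
print("flat block :", flat_count, "nonzero coefficient(s)")
print("noisy block:", noisy_count, "nonzero coefficient(s)")
```

The flat block leaves only the DC term, while the noisy one spreads energy over many AC coefficients that all have to be coded. In a real encoder, prediction would usually remove even that DC term from the residual.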

Real-world savings: 10–30% bitrate reduction on typical content, up to 50% on anime or screencasts — often with zero visible difference and sometimes visibly cleaner output.

Step-by-Step Flatness-First Pipeline

We'll use free, open-source tools: ffmpeg + libaom (or SVT-AV1 for speed) for encoding, plus lightweight preprocessing. Everything runs on a standard CPU; no GPU required (though you can add AI denoisers if you want).

Step 1: Analyze Your Source for Flat Regions

First, understand where the opportunities are.

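Quick flatness check with ffmpeg. There is no built-in variance filter, so the siti filter works as a proxy: it reports spatial information (SI), the standard deviation of the Sobel-filtered luma.

ffmpeg -i input.mkv -vf siti=print_summary=1 -f null -

This prints summary SI/TI stats at the end of the run. Low average SI = lots of flat area.

For visual inspection, dump a gradient map of one frame every 10 seconds:

ffmpeg -i input.mkv -vf "fps=1/10,format=gray,sobel" flatness_map_%04d.png

Dark areas in the output = low local gradient (your targets).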

Common patterns to look for:

  • Black bars (variance near zero, but with noise)
  • Solid anime backgrounds
  • Static UI elements in gameplay/screencasts
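
If you'd rather have actual per-block variance numbers, the calculation itself is tiny. The sketch below works on a grayscale frame held as a list of rows; how you get frames into Python is left open, and the 8x8 block size and the threshold of 4.0 are hypothetical starting points, not tuned values.

```python
def block_variance_map(frame, block=8):
    """Return a 2-D list of per-block luma variances for a grayscale frame.

    `frame` is a list of rows of integer luma values. Partial blocks at the
    right and bottom edges are ignored to keep the sketch short.
    """
    height, width = len(frame), len(frame[0])
    rows = []
    for by in range(0, height - block + 1, block):
        row = []
        for bx in range(0, width - block + 1, block):
            pixels = [frame[y][x]
                      for y in range(by, by + block)
                      for x in range(bx, bx + block)]
            mean = sum(pixels) / len(pixels)
            row.append(sum((p - mean) ** 2 for p in pixels) / len(pixels))
        rows.append(row)
    return rows


def flat_fraction(variances, threshold=4.0):
    """Fraction of blocks whose variance is below the (hypothetical) threshold."""
    flat = sum(1 for row in variances for v in row if v < threshold)
    total = sum(len(row) for row in variances)
    return flat / total


# Synthetic 16x16 frame: the top half is a slightly noisy black bar,
# the bottom half is a high-contrast checkerboard standing in for detail.
frame = [[16 + ((x + y) % 2) for x in range(16)] for y in range(8)]
frame += [[235 if (x + y) % 2 else 16 for x in range(16)] for y in range(8)]

variances = block_variance_map(frame)
print("flat fraction:", flat_fraction(variances))
```

A high flat fraction across sampled frames suggests the content is a good candidate for the preprocessing in Step 2.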

Step 2: Preprocess — Discard Noise in Flat Areas

The goal: gently force near-uniform regions to perfect uniformity without touching detailed areas.

Simple approach: Selective averaging / low-strength denoise

Use ffmpeg's avgblur or nlmeans only on low-variance areas via masking.

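Better yet: compare each pixel to its local mean and swap in the mean only where the two are already close. ffmpeg has no variance or thresh filter, but avgblur plus maskedthreshold does the same job.

Here's a solid ffmpeg filter chain:

ffmpeg -i input.mkv \
-filter_complex "[0:v]split[orig][tmp]; \
     [tmp]avgblur=sizeX=4[blur]; \
     [blur][orig]maskedthreshold=threshold=3[out]" \
-map "[out]" -map "0:a?" -c:v ffv1 -c:a copy \
preprocessed.mkv

maskedthreshold picks the blurred pixel wherever it is within threshold of the original and keeps the original pixel everywhere else. Tune threshold for your content: higher values flatten more, lower values are more conservative. The intermediate is written as lossless FFV1 so that preprocessing doesn't add a lossy generation of its own.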

This is basic but effective — it flattens noisy blacks/bars without blurring edges.
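
One way to read "blend toward the local mean" is as a per-pixel rule: replace a pixel with the mean of its neighborhood only when the two are already close. The plain-Python sketch below shows that rule on a tiny synthetic frame. The radius of 2 and the threshold of 3 are assumptions for illustration, and a real pipeline would run this in a filter rather than in Python loops.

```python
def local_mean(frame, y, x, radius):
    """Mean of the (2*radius+1)^2 window around (y, x), clamped at the borders."""
    height, width = len(frame), len(frame[0])
    total = count = 0
    for yy in range(max(0, y - radius), min(height, y + radius + 1)):
        for xx in range(max(0, x - radius), min(width, x + radius + 1)):
            total += frame[yy][xx]
            count += 1
    return total / count


def flatten_flats(frame, radius=2, threshold=3):
    """Replace a pixel with its local mean only when the two differ by <= threshold."""
    out = []
    for y, row in enumerate(frame):
        new_row = []
        for x, value in enumerate(row):
            mean = local_mean(frame, y, x, radius)
            new_row.append(round(mean) if abs(value - mean) <= threshold else value)
        out.append(new_row)
    return out


# Left half: noisy near-black (values 15..17). Right half: clean white.
# The hard edge sits between x = 3 and x = 4.
frame = [[15 + (x + y) % 3 if x < 4 else 235 for x in range(8)] for y in range(6)]
result = flatten_flats(frame)

print("before:", frame[0])
print("after: ", result[0])
```

Pixels well inside the dark area collapse to a single value, while pixels within the window radius of the edge see a very different local mean and are left alone. That is the edge protection the masking approach relies on.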

Advanced option: VapourSynth scripting

For more control, switch to VapourSynth (Python-based, excellent for batch work). Example script to flatten low-variance blocks:

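import vapoursynth as vs
core = vs.core

clip = core.ffms2.Source("input.mkv")  # assumes an 8-bit YUV source

# Local mean via the built-in box blur (CPU only). An NLMeans plugin such as
# KNLMeansCL could be substituted here if it is installed.
smoothed = core.std.BoxBlur(clip, hradius=4, vradius=4)

# Detail mask from lightly blurred luma: Sobel edges, binarized, then grown
# a little so that pixels next to real edges stay protected.
luma = core.std.ShufflePlanes(clip, planes=0, colorfamily=vs.GRAY)
luma = core.std.BoxBlur(luma, hradius=1, vradius=1)
mask = core.std.Sobel(luma)
mask = core.std.Binarize(mask, threshold=12)  # tune threshold
mask = core.std.Maximum(mask).std.Maximum()

# Mask = 0 (flat): take the smoothed pixel. Mask = 255 (detail): keep the source.
clip = core.std.MaskedMerge(smoothed, clip, mask, first_plane=True)
clip.set_output()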

Pipe this to ffmpeg for encoding. VapourSynth shines for anime (preserves lines) and screencasts (keeps text sharp).

Step 3: Encode with Bit Reallocation

Now encode the preprocessed source aggressively in flat areas.

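Recommended: constant-quality AV1 with libaom or SVT-AV1 (add -pass 1 and -pass 2 runs if you want two-pass)

Example ffmpeg command (libaom, high quality):

ffmpeg -i preprocessed.mkv \
-c:v libaom-av1 -crf 28 -b:v 0 \
-cpu-used 4 \
-row-mt 1 -tiles 4x4 \
-aq-mode 1 \
-c:a copy \
output.av1.mkv

What the flags do:

  • -crf 28 -b:v 0: constant-quality mode; tune CRF for your target quality
  • -cpu-used 4: balance speed/quality
  • -row-mt 1 -tiles 4x4: multithreading
  • -aq-mode 1: variance-based AQ (helps flats)
  • -c:a copy: pass audio through untouched so it doesn't pick up a lossy generation of its own
  • For screencasts, add -aom-params tune-content=screen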

To push flat optimization further:

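  • Quantization matrices: add -aom-params enable-qm=1 (the ffmpeg spelling of aomenc's --enable-qm=1). They quantize high-frequency coefficients more coarsely, which suits regions that are already flat
  • SVT-AV1 alternative for speed: -c:v libsvtav1 -preset 4 -crf 30 -svtav1-params tune=0 (tune 0 = VQ)

Psychovisual options (aomenc direct):

aomenc --end-usage=q --cq-level=30 --enable-fwd-kf=1 --lag-in-frames=48 -o output.ivf input.y4m

--tune=psy and --psy-rd are not options in mainline aomenc, so they are left out here. If you use a psychovisual fork, check its --help for the equivalent switches and test them on a short clip first, since their effect on low-detail regions varies by build.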

Step 4: Verify Results

Compare:

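ffmpeg -i output.av1.mkv -i original.mkv -lavfi ssim -f null -

You'll often see similar SSIM/VMAF despite the lower bitrate. Keep in mind that these are full-reference metrics: noise you removed on purpose counts as error against the original, so a small dip concentrated in flat regions is expected and doesn't mean the encode looks worse.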

Visual check: look at black bars and skies. The new encode should be cleaner, with true black instead of noisy gray.
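
One thing to keep in mind when reading the scores: a full-reference metric treats whatever you feed it as ground truth, inherited noise included. The toy sketch below uses PSNR on a synthetic black bar to show the effect. The "clean master" is hypothetical (you usually don't have one), and the ±2 noise amplitude is an assumption.

```python
import math
import random


def psnr(a, b, peak=255):
    """PSNR in dB between two equally sized grayscale frames (lists of rows)."""
    errors = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    mse = sum(errors) / len(errors)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


rng = random.Random(1)
# What the bar looked like in the (hypothetical) clean master.
clean = [[16] * 32 for _ in range(32)]
# The bar as it arrives after earlier lossy generations.
source = [[16 + rng.randint(-2, 2) for _ in range(32)] for _ in range(32)]
# The bar after flatness-first preprocessing.
flattened = [[16] * 32 for _ in range(32)]

vs_source = psnr(flattened, source)
vs_clean = psnr(flattened, clean)
print("vs. noisy source:", round(vs_source, 1), "dB")
print("vs. clean master:", vs_clean, "dB")
```

Measured against the noisy source, the flattened bar shows a finite error even though it matches the clean master exactly. That is why the visual check on bars and skies matters as much as the metric.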

Real-World Results

On a typical 1080p anime episode (large flat cels):

  • Source H.264: 8 Mbps
  • Naive AV1 transcode: ~4 Mbps, but visible inherited banding
  • Flatness-first: ~2.8 Mbps, cleaner flats, sharper lines
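
For a sense of scale, the arithmetic behind those three numbers is sketched below. The bitrates come from the list above; the 24-minute runtime is an assumption, and the sizes are video-only (no audio or container overhead).

```python
def percent_saved(before_mbps, after_mbps):
    """Bitrate reduction as a percentage of the starting bitrate."""
    return 100.0 * (before_mbps - after_mbps) / before_mbps


def size_mb(bitrate_mbps, seconds):
    """Approximate video stream size in megabytes (megabits / 8)."""
    return bitrate_mbps * seconds / 8


EPISODE_SECONDS = 24 * 60  # assumed runtime of one episode

for label, rate in [("source H.264", 8.0), ("naive AV1", 4.0), ("flatness-first", 2.8)]:
    print(f"{label:15s} {rate:4.1f} Mbps  ~{size_mb(rate, EPISODE_SECONDS):.0f} MB")

print(f"saved vs. naive AV1: {percent_saved(4.0, 2.8):.0f}%")
print(f"saved vs. source:    {percent_saved(8.0, 2.8):.0f}%")
```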

On a gameplay recording with UI overlays:

  • 20–40% bitrate saved, text stays crisp, dark areas lose mosquito noise.

Final Thoughts

Generational loss isn't inevitable — it's a choice. By refusing to preserve worthless entropy in flat regions, you break the curse: each transcode can be a refinement rather than degradation.

Start simple with ffmpeg masking, graduate to VapourSynth for precision. Experiment on your own content — the bitrate savings and quality gains are addictive.

Got a specific content type (anime, live-action, screencasts)? Drop a comment — happy to share tuned scripts.

Happy encoding!