Reinforcement learning augmented rate-distortion optimization in HEVC

  • Ahmed Hamza

Student thesis: Doctoral Thesis


Modern hybrid video codecs, such as HEVC (High Efficiency Video Coding)/H.265, are complex, multi-tiered systems. They present a challenging environment for intelligent adaptive agents that seek to optimize real-time control of the encoding process. At the heart of the encoder, the Rate-Distortion Optimization (RDO) process is typically formulated, in both the standard reference and commercial implementations, through heuristics and hand-engineered, theoretically derived relationships between control parameters and other variables. These relationships work well for video sequences in general, but they are not necessarily ideal for every scenario or scene. In this thesis, we explore and implement intelligent control agents for the rate-distortion subsystems of HEVC: agents that learn to improve from many iterations of experience.
We leverage direct source pixel information and recent advances in Reinforcement Learning (RL) with multi-layer neural networks (deep reinforcement learning) to optimize control of the RDO process in HEVC. The function-approximation power of neural networks is combined with gradual learning of policies from experiential reward. Together these produce adaptive agents that can outperform HEVC's hard-coded, source-agnostic implementations. Applications of machine learning to video codec control typically cast the learner in a supervised role. There, the task is to bypass the reference decision process to achieve lower-cost encoding (time savings, complexity reduction). In those settings, performance is bounded by that of the reference. In this thesis, we develop reinforcement-based training systems that learn to surpass the reference. They learn from encoding trials on the frames and coding blocks of single video sequences.
We contribute an adaptation of RL to a novel setting in video coding optimization: one without a simulator, and one in which engineered reference parameters are already present. We further develop methods to handle uncertainty. We also introduce an expanding-horizons algorithm for gradient updates during learning, which accelerated learning from a relatively small number of coded pictures.
These methods are then extended to generalized policy learners that learn to control coding blocks from trials on other (different) video sequences. We find that this generalization is crucial to better performance, because it counters over-fitting and improves sample efficiency in the agent's learning process.
Our results significantly improve on the compression performance of the HEVC reference while conforming to the standard's specification. We also find that reinforcement-trained predictors of Lagrangian modifiers perform better than comparable, analytically derived versions in the literature. They give stable, near-optimal performance across scene types and resolutions without pre-defined models. We examine results both quantitatively and qualitatively (through visual inspection). Along the way, we identify some limitations of deep reinforcement learning algorithms in this context and discuss possible future work.
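To make the Lagrangian terminology concrete, HEVC-style RDO chooses, among candidate coding decisions, the one that minimizes the cost J = D + λ·R. Here D is distortion, R is rate, and λ is derived from the quantization parameter (QP).

The sketch below is a simplified illustration, not the thesis's method, and rests on these assumptions:
- It uses the classic JM/HM-style relation λ = 0.85·2^((QP−12)/3), without the slice-type and hierarchy weights that HM applies.
- It models a "Lagrangian modifier" as a scalar multiplier on λ. This is the kind of quantity an RL agent could predict per frame or block.
- The candidate mode names and their distortion/rate values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical coding decision with its measured cost terms."""
    name: str
    distortion: float  # e.g. sum of squared errors against the source block
    rate: float        # estimated number of bits to code this decision


def base_lambda(qp: int, alpha: float = 0.85) -> float:
    """Simplified lambda-QP relation: alpha * 2^((QP - 12) / 3)."""
    return alpha * 2.0 ** ((qp - 12) / 3.0)


def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return c.distortion + lam * c.rate


def choose_mode(candidates: list[Candidate], qp: int, modifier: float = 1.0) -> Candidate:
    """Pick the minimum-cost candidate, with lambda scaled by a
    (hypothetical) learned modifier; modifier=1.0 reproduces the baseline."""
    lam = modifier * base_lambda(qp)
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Illustrative (made-up) numbers for three candidate modes of one block.
candidates = [
    Candidate("skip", distortion=5000.0, rate=2.0),
    Candidate("intra", distortion=3000.0, rate=30.0),
    Candidate("inter", distortion=3500.0, rate=12.0),
]

if __name__ == "__main__":
    for m in (0.25, 1.0, 3.0):
        print(m, choose_mode(candidates, qp=32, modifier=m).name)
    # 0.25 intra
    # 1.0 inter
    # 3.0 skip
```

With these illustrative numbers at QP 32, the modifier decides the outcome. A modifier of 1.0 selects `inter`, 0.25 selects `intra`, and 3.0 selects `skip`. Scaling λ shifts the balance between distortion and rate, and that balance is the lever a learned modifier would control.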
Date of Award: Jun 2022
Original language: English
Supervisor: Djamel Ait-Boudaoud
