# Temporally Scalable Video Coding

## Basic Concepts

### Introduction to Temporally Scalable Video Coding

Scalable video coding is an extension of existing video coding standards. Two widely used variants are SVC (Scalable Video Coding, an extension of the H.264 standard) and SHVC (Scalable High Efficiency Video Coding, an extension of the H.265 standard).

Scalable video coding organizes a bitstream hierarchically along three dimensions: spatial scalability, temporal scalability, and quality scalability.

Temporally scalable video coding encodes a video sequence into a set of layers that provide increasing temporal resolution. The following figure shows the structure of a bitstream that contains four temporal layers and is constructed based on the reference relationship.

In scenarios where the channel condition is poor, frames can be dropped layer by layer in descending order (L3 -> L2 -> L1) to adapt to changing transmission and decoding capabilities.

The figure below shows the new bitstream structure when the frames at L3 are dropped. The bitstream can still be decoded normally, but the frame rate is reduced by half. Frames at other layers can be dropped in a similar way.

### Structure of a Temporally Scalable Bitstream

A bitstream is organized into one or more Groups of Pictures (GOPs). A GOP is a collection of consecutive pictures that can be independently decoded. Its size is the distance between two I-frames (also called key frames).

A GOP can be further divided into one or more Temporal Groups of Pictures (TGOPs). Each TGOP is composed of a base layer (BL) and one or more associated enhancement layers (ELs). For example, frame 0 to frame 7 in the foregoing four-layer temporally scalable bitstream form a TGOP.

- BL: bottom layer (L0) in the GOP. In temporal scalability, this layer is encoded at the lowest frame rate.

- EL: layers above the BL. There are L1, L2, and L3 in ascending order. In temporal scalability, the lowest EL uses encoding information from the BL to encode frames at a higher frame rate; each higher EL uses the BL or a lower EL to encode frames at a still higher frame rate.

### How to Implement the Structure of a Temporally Scalable Bitstream

The temporally scalable bitstream structure is implemented by specifying reference frames, which are classified into the following types based on how long they stay in the Decoded Picture Buffer (DPB):

- Short-Term Reference (STR): a reference frame that cannot stay in the DPB for a long period of time. STRs are managed in First In First Out (FIFO) order, which means that the oldest STR is removed from the DPB once the DPB is full.

- Long-Term Reference (LTR): a reference frame that can stay in the DPB for a long period of time. It stays in the DPB until it is replaced by another decoded picture with the same ID.

A cross-frame reference structure can be built from multiple STRs, but because an STR stays valid only briefly, the temporal span it can support is limited. An LTR does not have this limitation and also covers the cross-frame scenarios that STRs can handle. Therefore, LTRs are preferred for implementing the structure of a temporally scalable bitstream.

## When to Use

You are advised to use temporal scalability in the following scenarios:

- Real-time encoding and transmission scenarios with little or no buffering on the playback side, for example, video conferencing, live streaming, and collaborative office.

- Video encoding and recording scenarios that require video preview or multi-speed playback.
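
The layer-by-layer frame dropping described in [Introduction to Temporally Scalable Video Coding](#introduction-to-temporally-scalable-video-coding) can be sketched as follows. This is a minimal illustration, not part of the encoder API: the helpers `TemporalLayerOf` and `FramesUpToLayer` are hypothetical, and the code assumes a dyadic hierarchy in which the TGOP size is a power of two (a TGOP of 8 frames gives layers L0 to L3) and the frame count restarts at 0 after each I-frame.

```c++
#include <cstdint>
#include <vector>

// Hypothetical helper: temporal layer of a frame in a dyadic hierarchy.
// Assumption: tgopSize is a power of two (for example, 8 for four layers).
int32_t TemporalLayerOf(uint32_t poc, uint32_t tgopSize)
{
    uint32_t pos = poc % tgopSize;   // Position of the frame inside its TGOP.
    if (pos == 0) {
        return 0;                    // First frame of a TGOP: base layer (L0).
    }
    int32_t layer = 0;
    for (uint32_t size = tgopSize; size > 1; size /= 2) {
        layer++;                     // log2(tgopSize) = index of the top layer.
    }
    while ((pos & 1u) == 0) {        // Each factor of two moves one layer down.
        pos >>= 1;
        layer--;
    }
    return layer;
}

// Hypothetical helper: frame numbers that remain after dropping every layer above maxLayer.
std::vector<uint32_t> FramesUpToLayer(uint32_t frameCount, uint32_t tgopSize, int32_t maxLayer)
{
    std::vector<uint32_t> kept;
    for (uint32_t poc = 0; poc < frameCount; ++poc) {
        if (TemporalLayerOf(poc, tgopSize) <= maxLayer) {
            kept.push_back(poc);
        }
    }
    return kept;
}
```

Under these assumptions, `FramesUpToLayer(8, 8, 2)` keeps frames 0, 2, 4, and 6, which corresponds to dropping L3 and halving the frame rate. The same selection could drive a preview or multi-speed playback path.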

If your development scenario does not involve dynamic adjustment of the temporal reference structure and the hierarchical structure is simple, you are advised to use [global temporal scalability](#global-temporal-scalability). Otherwise, enable [LTR](#ltr).

## Constraints

- The global temporal scalability and LTR features are mutually exclusive.

  The two features cannot both be enabled because they share a unified underlying implementation.

- When using the forcible IDR configuration along with the two features, use the frame channel configuration.

  A reference frame is valid only within its GOP. After an I-frame is refreshed, the DPB is cleared, and so are the reference frames. In other words, the I-frame refresh location has a great impact on the reference relationship.

  When temporal scalability is enabled, to temporarily request an I-frame through **OH_MD_KEY_REQUEST_I_FRAME**, you must use the frame channel configuration, which takes effect at a determined time, to notify the framework of the I-frame refresh location. This avoids disorder of the reference relationship. For details, see the configuration guide of the frame channel. Do not use **OH_VideoEncoder_SetParameter**, because the time at which it takes effect is uncertain.

- The callback using **OH_AVBuffer** is supported, but the callback using **OH_AVMemory** is not.

  Temporal scalability depends on the frame feature. Do not use **OH_AVMemory** to trigger **OH_AVCodecAsyncCallback**. Instead, use **OH_AVBuffer** to trigger **OH_AVCodecCallback**.

- Temporal scalability employs P-pictures, but not B-pictures.

  In general, temporal scalability can be hierarchical-P or hierarchical-B. Currently, this feature supports only hierarchical-P.

- In the case of **UNIFORMLY_SCALED_REFERENCE**, TGOP can only be 2 or 4.

## Global Temporal Scalability

### Available APIs

Global temporal scalability is suitable for encoding frames into a stable and simple temporal structure. Its initial configuration takes effect globally and cannot be dynamically modified. The configuration parameters are as follows:

| Parameter| Description |
| -------- | ---------------------------- |
| OH_MD_KEY_VIDEO_ENCODER_ENABLE_TEMPORAL_SCALABILITY | Enabled status of the global temporal scalability feature.|
| OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_SIZE | TGOP size of the global temporal scalability feature.|
| OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_REFERENCE_MODE | TGOP reference mode of the global temporal scalability feature. |

- **OH_MD_KEY_VIDEO_ENCODER_ENABLE_TEMPORAL_SCALABILITY**: This parameter is set in the configuration phase. The feature can be successfully enabled only when it is supported.

- **OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_SIZE**: This parameter is optional and specifies the TGOP size, that is, the distance between two consecutive base-layer frames. Set it according to your frame extraction requirements. The value range is [2, GopSize). If no value is passed in, the default value is used.

- **OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_REFERENCE_MODE**: This parameter is optional and affects the reference mode of non-I-frames. The value can be **ADJACENT_REFERENCE**, **JUMP_REFERENCE**, or **UNIFORMLY_SCALED_REFERENCE**. **ADJACENT_REFERENCE** provides better compression performance, whereas **JUMP_REFERENCE** is more flexible in dropping frames. **UNIFORMLY_SCALED_REFERENCE** distributes the remaining frames more evenly when frames are dropped. If no value is passed in, the default value is used.

  > **NOTE**
  >
  > In the case of **UNIFORMLY_SCALED_REFERENCE**, TGOP can only be 2 or 4.

Example 1: TGOP=4, ADJACENT_REFERENCE

Example 2: TGOP=4, JUMP_REFERENCE

Example 3: TGOP=4, UNIFORMLY_SCALED_REFERENCE

### How to Develop

This section describes only the steps that are different from the basic encoding process. You can learn the basic encoding process in [Video Encoding](video-encoding.md).

1. When creating an encoder instance, check whether the video encoder supports the global temporal scalability feature.

    ```c++
    // 1.1 Obtain the handle to the capability of the video encoder. The following uses H.264 as an example.
    OH_AVCapability *cap = OH_AVCodec_GetCapability(OH_AVCODEC_MIMETYPE_VIDEO_AVC, true);
    // 1.2 Check whether the global temporal scalability feature is supported.
    bool isSupported = OH_AVCapability_IsFeatureSupported(cap, VIDEO_ENCODER_TEMPORAL_SCALABILITY);
    ```

    If the feature is supported, it can be enabled.

2. In the configuration phase, configure the parameters related to the global temporal scalability feature.

    ```c++
    constexpr int32_t TGOP_SIZE = 3;
    // 2.1 Create a temporary AV format used for configuration.
    OH_AVFormat *format = OH_AVFormat_Create();
    // 2.2 Fill in the key-value pair of the parameter used to enable the feature.
    OH_AVFormat_SetIntValue(format, OH_MD_KEY_VIDEO_ENCODER_ENABLE_TEMPORAL_SCALABILITY, 1);
    // 2.3 (Optional) Fill in the key-value pairs of the parameters that specify the TGOP size and reference mode.
    OH_AVFormat_SetIntValue(format, OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_SIZE, TGOP_SIZE);
    OH_AVFormat_SetIntValue(format, OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_REFERENCE_MODE, ADJACENT_REFERENCE);
    // 2.4 Configure the parameters.
    int32_t ret = OH_VideoEncoder_Configure(videoEnc, format);
    if (ret != AV_ERR_OK) {
        // Exception handling.
    }
    // 2.5 Destroy the temporary AV format after the configuration is complete.
    OH_AVFormat_Destroy(format);
    ```

3. (Optional) During output rotation in the running phase, obtain the temporal layer information corresponding to the bitstream.

    You can count the encoded frames and derive the layer of each frame periodically from the configured TGOP parameters.

    The sample code is as follows:

    ```c++
    uint32_t outPoc = 0;
    // Obtain the relative position in the TGOP based on the number of valid frames in the output callback and determine the layer based on the configuration.
    static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
    {
        // Note: If complex processing is involved, you are advised to create an association.
        struct OH_AVCodecBufferAttr attr;
        (void)OH_AVBuffer_GetBufferAttr(buffer, &attr);
        // Set POC to 0 after the I-frame is refreshed.
        if (attr.flags & AVCODEC_BUFFER_FLAGS_SYNC_FRAME) {
            outPoc = 0;
        }
        // Skip buffers that carry only parameter sets (XPS) and no frame data.
        if (attr.flags != AVCODEC_BUFFER_FLAGS_CODEC_DATA) {
            int32_t tGopInner = outPoc % TGOP_SIZE;
            if (tGopInner == 0) {
                // First frame of a TGOP (base layer): it cannot be dropped in subsequent transmission and decoding processes.
            } else {
                // Other frames in the TGOP can be dropped in subsequent transmission and decoding processes.
            }
            outPoc++;
        }
    }
    ```

4. (Optional) During output rotation in the running phase, use the temporal layer information obtained for adaptive transmission or decoding.

    Based on the temporally scalable bitstream and layer information, select a required layer for transmission, or carry the information to the peer for adaptive decoding.

## LTR

### Available APIs

The LTR feature provides a flexible configuration of the frame-level reference relationship. It is suitable for flexible and complex temporally hierarchical structures.

| Parameter| Description |
| -------- | ---------------------------- |
| OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT | Number of LTR frames.|
| OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR | Whether the current frame is marked as an LTR frame.|
| OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR | Number (POC) of the LTR frame referenced by the current frame. |

- **OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT**: This parameter is set in the configuration phase and must be less than or equal to the maximum number of LTR frames supported.
- **OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR**: This parameter is set per frame. BL frames are marked as LTR frames, and so are EL frames that are referenced across intermediate frames.
- **OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR**: This parameter is set per frame and specifies the number (POC) of the LTR-marked frame that the current frame references.

For example, to implement the four-layer temporally hierarchical structure described in [Introduction to Temporally Scalable Video Coding](#introduction-to-temporally-scalable-video-coding), perform the following steps:

1. In the configuration phase, set **OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT** to **5**.

2. In the input rotation of the running phase, configure the LTR parameters according to the following table, where **\\** means that no configuration is required.

    | Configuration\POC| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
    | -------- |---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|
    | MARK_LTR | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
    | USE_LTR | \ | \ | 0 | \ | 0 | \ | 4 | \ | 0 | \ | 8 | \ | 8 | \ | 12 | \ | 8 |

### How to Develop

This section describes only the steps that are different from the basic encoding process. You can learn the basic encoding process in [Video Encoding](video-encoding.md).

1. When creating an encoder instance, check whether the video encoder supports the LTR feature.

    ```c++
    constexpr int32_t NEEDED_LTR_COUNT = 5;
    bool isSupported = false;
    int32_t supportedLTRCount = 0;
    // 1.1 Obtain the handle to the capability of the encoder. The following uses H.264 as an example.
    OH_AVCapability *cap = OH_AVCodec_GetCapability(OH_AVCODEC_MIMETYPE_VIDEO_AVC, true);
    // 1.2 Check whether the LTR feature is supported.
    isSupported = OH_AVCapability_IsFeatureSupported(cap, VIDEO_ENCODER_LONG_TERM_REFERENCE);
    // 1.3 Determine the number of supported LTR frames.
    if (isSupported) {
        OH_AVFormat *properties = OH_AVCapability_GetFeatureProperties(cap, VIDEO_ENCODER_LONG_TERM_REFERENCE);
        OH_AVFormat_GetIntValue(properties, OH_FEATURE_PROPERTY_KEY_VIDEO_ENCODER_MAX_LTR_FRAME_COUNT, &supportedLTRCount);
        OH_AVFormat_Destroy(properties);
        // 1.4 Check whether the number of supported LTR frames meets the structure requirements.
        isSupported = supportedLTRCount >= NEEDED_LTR_COUNT;
    }
    ```

    If the LTR feature is supported and the number of supported LTR frames meets the requirements, the feature can be enabled.

2. Register the frame channel callback functions.

    The following is an example of the configuration in buffer input mode:

    ```c++
    // 2.1 Implement the OH_AVCodecOnNeedInputBuffer callback function.
    static void OnNeedInputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
    {
        // The index of the input frame buffer is sent to InIndexQueue.
        // The input frame data (specified by buffer) is sent to InBufferQueue.
        // Perform data processing. For details, see:
        // - Write the stream to encode.
        // - Notify the encoder of EOS.
        // - Write the frame parameter.
        // Example values: mark the current frame as an LTR frame and reference the LTR frame whose POC is 4.
        OH_AVFormat *format = OH_AVBuffer_GetParameter(buffer);
        OH_AVFormat_SetIntValue(format, OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR, 1);
        OH_AVFormat_SetIntValue(format, OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR, 4);
        OH_AVBuffer_SetParameter(buffer, format);
        OH_AVFormat_Destroy(format);
        // Notify the encoder that the buffer input is complete.
        OH_VideoEncoder_PushInputBuffer(codec, index);
    }

    // 2.2 Implement the OH_AVCodecOnNewOutputBuffer callback function.
    static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
    {
        // The index of the output frame buffer is sent to outIndexQueue.
        // The encoded frame data (specified by buffer) is sent to outBufferQueue.
        // Perform data processing. For details, see:
        // - Release the encoded frame.
        // - Record POC and the enabled status of LTR.
    }

    // 2.3 Register the callback functions.
    OH_AVCodecCallback cb;
    cb.onNeedInputBuffer = OnNeedInputBuffer;
    cb.onNewOutputBuffer = OnNewOutputBuffer;
    OH_VideoEncoder_RegisterCallback(codec, cb, nullptr);
    ```

    The following is an example of the configuration in surface input mode:

    ```c++
    // 2.1 Implement the OH_VideoEncoder_OnNeedInputParameter callback function.
    static void OnNeedInputParameter(OH_AVCodec *codec, uint32_t index, OH_AVFormat *parameter, void *userData)
    {
        // The index of the input frame buffer is sent to InIndexQueue.
        // The input frame data (specified by parameter) is sent to InFormatQueue.
        // Perform data processing. For details, see:
        // - Write the stream to encode.
        // - Notify the encoder of EOS.
        // - Write the frame parameter.
        // Example values: mark the current frame as an LTR frame and reference the LTR frame whose POC is 4.
        OH_AVFormat_SetIntValue(parameter, OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR, 1);
        OH_AVFormat_SetIntValue(parameter, OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR, 4);
        // Notify the encoder that the frame input is complete.
        OH_VideoEncoder_PushInputParameter(codec, index);
    }

    // 2.2 Implement the OH_AVCodecOnNewOutputBuffer callback function.
    static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
    {
        // The index of the output frame buffer is sent to outIndexQueue.
        // The encoded frame data (specified by buffer) is sent to outBufferQueue.
        // Perform data processing. For details, see:
        // - Release the encoded frame.
        // - Record POC and the enabled status of LTR.
    }

    // 2.3 Register the callback functions.
    OH_AVCodecCallback cb;
    cb.onNewOutputBuffer = OnNewOutputBuffer;
    OH_VideoEncoder_RegisterCallback(codec, cb, nullptr);
    // 2.4 Register the frame channel callback functions.
    OH_VideoEncoder_OnNeedInputParameter inParaCb = OnNeedInputParameter;
    OH_VideoEncoder_RegisterParameterCallback(codec, inParaCb, nullptr);
    ```

3. In the configuration phase, configure the maximum number of LTR frames.

    ```c++
    constexpr int32_t TGOP_SIZE = 3;
    // 3.1 Create a temporary AV format used for configuration.
    OH_AVFormat *format = OH_AVFormat_Create();
    // 3.2 Fill in the key-value pair of the parameter that specifies the number of LTR frames.
    OH_AVFormat_SetIntValue(format, OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT, NEEDED_LTR_COUNT);
    // 3.3 Configure the parameters.
    int32_t ret = OH_VideoEncoder_Configure(videoEnc, format);
    if (ret != AV_ERR_OK) {
        // Exception handling.
    }
    // 3.4 Destroy the temporary AV format after the configuration is complete.
    OH_AVFormat_Destroy(format);
    ```

4. (Optional) During output rotation in the running phase, obtain the temporal layer information corresponding to the bitstream.

    This procedure is the same as that described in the global temporal scalability feature.

    Because the LTR parameters are configured in the input rotation, you can also record them there and look up the parameters of the corresponding input frame in the output rotation.

5. (Optional) During output rotation in the running phase, use the temporal layer information obtained for adaptive transmission or decoding.

    This procedure is the same as that described in the global temporal scalability feature.
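
The per-frame LTR settings for the four-layer structure in [Available APIs](#available-apis-1) follow a regular pattern, so they can be computed rather than hard-coded. The following is a minimal sketch under stated assumptions, not part of the encoder API: the `LtrConfig` structure and the `LtrConfigFor` helper are hypothetical, the TGOP is fixed at 8 frames, and the POC restarts at 0 at each I-frame. In the input callback, the resulting values would be written with **OH_AVFormat_SetIntValue** under **OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR** and **OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR**.

```c++
#include <cstdint>

// Hypothetical container for the per-frame LTR settings of one input frame.
struct LtrConfig {
    bool markLtr;        // Value for MARK_LTR (true = 1, false = 0).
    bool useLtr;         // false means that USE_LTR is not configured for this frame.
    int32_t useLtrPoc;   // POC of the referenced LTR frame; valid only when useLtr is true.
};

// Hypothetical helper for a four-layer dyadic hierarchy (TGOP of 8 frames).
// Assumption: poc is non-negative and restarts at 0 at each I-frame.
LtrConfig LtrConfigFor(int32_t poc)
{
    LtrConfig cfg{false, false, -1};
    if (poc % 2 != 0) {
        return cfg;                  // L3 frames: no LTR configuration is required.
    }
    cfg.markLtr = (poc % 4 == 0);    // L0 and L1 frames are marked as LTR frames.
    if (poc == 0) {
        return cfg;                  // The I-frame references nothing.
    }
    // Distance to the referenced frame: 8 for L0, 4 for L1, and 2 for L2.
    int32_t distance = (poc % 8 == 0) ? 8 : ((poc % 4 == 0) ? 4 : 2);
    cfg.useLtr = true;
    cfg.useLtrPoc = poc - distance;
    return cfg;
}
```

For example, `LtrConfigFor(6)` leaves the frame unmarked and references POC 4, and `LtrConfigFor(12)` marks the frame and references POC 8.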