# Temporally Scalable Video Coding

## Basic Concepts

### Introduction to Temporally Scalable Video Coding

Scalable video coding is an extension to a base video coding standard. Two commonly used extensions are SVC (Scalable Video Coding, an extension of the H.264 standard) and SHVC (Scalable High Efficiency Video Coding, an extension of the H.265 standard).

Scalable video coding organizes the bitstream hierarchically along three dimensions: spatial scalability, temporal scalability, and quality scalability.

Temporally scalable video coding encodes a video sequence into a set of layers that provide increasing temporal resolution. The following figure shows the structure of a bitstream that contains four temporal layers and is constructed based on the reference relationship.

![Temporal scalability 4 layers](figures/temporal-scalability-4layers.png)

When the channel condition is poor, frames can be dropped layer by layer in descending order (L3 -> L2 -> L1) to adapt to the changing transmission and decoding capabilities.

The figure below shows the new bitstream structure when the frames at L3 are dropped. The bitstream can still be decoded normally, but the frame rate is reduced by half. Frames at other layers can be dropped in a similar way.

![Temporal scalability 4 layers L3 dropped](figures/temporal-scalability-4layers-L3-dropped.png)
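
As a rough illustration of the layering described above, the following sketch assigns a temporal layer to each frame and counts how many frames remain when the upper layers are dropped. It assumes a dyadic four-layer structure with 8 frames per period (L0 at multiples of 8, L1 at position 4, L2 at positions 2 and 6, and L3 at odd positions). The names `TemporalLayerOf` and `CountKeptFrames` are hypothetical and are not part of any API.

```c++
#include <cstdint>

// Assumption: dyadic four-layer structure with 8 frames per period.
constexpr uint32_t kFramesPerPeriod = 8;
constexpr uint32_t kTopLayer = 3;

// Hypothetical helper: returns the temporal layer (0 to 3) of a frame.
uint32_t TemporalLayerOf(uint32_t poc)
{
    uint32_t pos = poc % kFramesPerPeriod;
    if (pos == 0) {
        return 0; // Base layer.
    }
    uint32_t layer = kTopLayer;
    // Each factor of 2 in the position moves the frame one layer down.
    while (pos % 2 == 0) {
        pos /= 2;
        --layer;
    }
    return layer;
}

// Hypothetical helper: number of frames among the first frameCount frames
// that remain when every layer above maxLayer is dropped.
uint32_t CountKeptFrames(uint32_t frameCount, uint32_t maxLayer)
{
    uint32_t kept = 0;
    for (uint32_t poc = 0; poc < frameCount; ++poc) {
        if (TemporalLayerOf(poc) <= maxLayer) {
            ++kept;
        }
    }
    return kept;
}
```

Under these assumptions, keeping layers up to L2 retains every second frame, which corresponds to the halved frame rate shown in the figure above.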

### Structure of a Temporally Scalable Bitstream

A bitstream is organized into one or more groups of pictures (GOPs). A GOP is a collection of consecutive pictures that can be independently decoded. The GOP size is the distance between two adjacent I-frames (also called key frames).

A GOP can be further divided into one or more temporal groups of pictures (TGOPs). Each TGOP is composed of a base layer (BL) and one or more associated enhancement layers (ELs). For example, frame 0 to frame 7 in the foregoing four-layer temporally scalable bitstream form a TGOP.

- BL: bottom layer (L0) in the GOP. In temporal scalability, this layer is encoded at the lowest frame rate.

- EL: layers above the BL. There are L1, L2, and L3 in ascending order. In temporal scalability, the lowest EL encodes, based on encoding information obtained from the BL, the frames at a higher frame rate; a higher EL encodes, based on the BL or a lower EL, the frames at a higher frame rate.

### How to Implement the Structure of a Temporally Scalable Bitstream

The temporally scalable bitstream structure is implemented by specifying reference frames, which are classified into the following types based on how long they reside in the Decoded Picture Buffer (DPB):

- Short-Term Reference (STR): a reference frame that cannot reside in the DPB for a long period of time. STRs are managed in First In First Out (FIFO) mode, which means that the oldest STR is removed from the DPB once the DPB is full.

- Long-Term Reference (LTR): a reference frame that can reside in the DPB for a long period of time. It stays in the DPB until it is replaced by another decoded picture with the same ID.

A specific cross-frame reference structure can be implemented when there is more than one STR. However, because an STR remains valid only for a short time, the span that temporal scalability can cover is limited. The LTR does not have this limitation, and it also covers the cross-frame scenario of the STR. Therefore, the LTR is preferred for implementing the structure of a temporally scalable bitstream.
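
The difference in residence time between the two reference types can be illustrated with a small model of the DPB. The following is a simplified sketch, not the codec's actual implementation: the `DpbModel` class, its separate short-term capacity, and its method names are hypothetical and serve only to show the FIFO eviction of STRs and the replace-by-ID behavior of LTRs.

```c++
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>

// Simplified, hypothetical model of a decoded picture buffer (DPB).
class DpbModel {
public:
    explicit DpbModel(size_t strCapacity) : strCapacity_(strCapacity) {}

    // Store a frame as a short-term reference. When the short-term area
    // is full, the oldest short-term reference is evicted (FIFO).
    void AddShortTerm(uint32_t poc)
    {
        if (strCapacity_ == 0) {
            return;
        }
        if (shortTerm_.size() == strCapacity_) {
            shortTerm_.pop_front();
        }
        shortTerm_.push_back(poc);
    }

    // Store a frame as a long-term reference. It stays in the buffer
    // until another frame is stored with the same LTR ID.
    void AddLongTerm(uint32_t ltrId, uint32_t poc)
    {
        longTerm_[ltrId] = poc;
    }

    // Check whether a frame is still available as a reference.
    bool CanReference(uint32_t poc) const
    {
        for (uint32_t p : shortTerm_) {
            if (p == poc) {
                return true;
            }
        }
        for (const auto &entry : longTerm_) {
            if (entry.second == poc) {
                return true;
            }
        }
        return false;
    }

private:
    size_t strCapacity_;
    std::deque<uint32_t> shortTerm_;
    std::map<uint32_t, uint32_t> longTerm_;
};
```

In this model, a frame stored as an LTR remains referenceable no matter how many STRs pass through the buffer, which is why a long reference span is easier to build on LTRs.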

## When to Use

You are advised to use temporal scalability in the following scenarios:

- Real-time encoding and transmission scenarios with little or no cache on the playback side, for example, video conferencing, live streaming, and collaborative office.

- Video encoding and recording scenarios that require video preview or multi-speed playback.

If your development scenario does not involve dynamic adjustment of the temporal reference structure and the hierarchical structure is simple, you are advised to use [global temporal scalability](#global-temporal-scalability). Otherwise, enable [LTR](#ltr).

## Constraints

- The global temporal scalability and LTR features are mutually exclusive.

  The two features cannot be enabled at the same time because they share the same underlying implementation.

- When using the forcible IDR configuration along with the two features, use the frame channel configuration.

  A reference frame is valid only within its GOP. After an I-frame is refreshed, the DPB is cleared, and so are the reference frames. In other words, the I-frame refresh location has a great impact on the reference relationship.

  When temporal scalability is enabled and you need to request an I-frame on demand through **OH_MD_KEY_REQUEST_I_FRAME**, you must use the frame channel, whose effective time is deterministic, to notify the framework of the I-frame refresh location. This avoids disorder of the reference relationship. For details, see the configuration guide of the frame channel. Do not use **OH_VideoEncoder_SetParameter**, whose effective time is uncertain.

- The callback using **OH_AVBuffer** is supported, but the callback using **OH_AVMemory** is not.

  Temporal scalability depends on the frame feature. Do not use **OH_AVMemory** to trigger **OH_AVCodecAsyncCallback**. Instead, use **OH_AVBuffer** to trigger **OH_AVCodecCallback**.

- Temporal scalability employs P-pictures, but not B-pictures.

  In general, temporal scalability can be implemented as hierarchical-P or hierarchical-B coding. Currently, this feature supports only hierarchical-P coding.

- In the case of **UNIFORMLY_SCALED_REFERENCE**, TGOP can only be 2 or 4.

## Global Temporal Scalability

### Available APIs

Global temporal scalability is suitable for encoding frames into a stable and simple temporal structure. Its initial configuration takes effect globally and cannot be dynamically modified. The configuration parameters are as follows:

| Parameter| Description                        |
| -------- | ---------------------------- |
| OH_MD_KEY_VIDEO_ENCODER_ENABLE_TEMPORAL_SCALABILITY  |  Enabled status of the global temporal scalability feature.|
| OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_SIZE  | TGOP size of the global temporal scalability feature.|
| OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_REFERENCE_MODE  | TGOP reference mode of the global temporal scalability feature. |

- **OH_MD_KEY_VIDEO_ENCODER_ENABLE_TEMPORAL_SCALABILITY**: This parameter is set in the configuration phase. The feature can be successfully enabled only when it is supported.

- **OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_SIZE**: This parameter is optional and specifies the distance between two adjacent key frames of the temporal structure, that is, base-layer frames. Customize the density of these frames based on your frame extraction requirements. The value range is [2, GopSize). If no value is passed in, the default value is used.

- **OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_REFERENCE_MODE**: This parameter is optional and affects the reference mode of non-I-frames. The value can be **ADJACENT_REFERENCE**, **JUMP_REFERENCE**, or **UNIFORMLY_SCALED_REFERENCE**. **ADJACENT_REFERENCE** provides better compression performance, whereas **JUMP_REFERENCE** is more flexible in dropping frames. **UNIFORMLY_SCALED_REFERENCE** enables streams to be distributed more evenly in the case of frame loss. If no value is passed in, the default value is used.

    > **NOTE**
    >
    > In the case of **UNIFORMLY_SCALED_REFERENCE**, TGOP can only be 2 or 4.

Example 1: TGOP=4, ADJACENT_REFERENCE

![Temporal gop 4 adjacent reference](figures/temporal-scalability-tgop4-adjacent.png)

Example 2: TGOP=4, JUMP_REFERENCE

![TGOP4 jump reference](figures/temporal-scalability-tgop4-jump.png)

Example 3: TGOP=4, UNIFORMLY_SCALED_REFERENCE

![TGOP4 uniformly scaled reference](figures/temporal-scalability-tgop4-uniformly.png)
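
The value constraints on the TGOP parameters described above can be checked in application code before the parameters are passed to the encoder. The following is a minimal sketch under stated assumptions: the `RefMode` enum and the `IsTemporalConfigValid` function are hypothetical stand-ins used for illustration and are not the framework's own types.

```c++
#include <cstdint>

// Hypothetical stand-in for the TGOP reference modes named above.
enum class RefMode { Adjacent, Jump, UniformlyScaled };

// Returns true if the TGOP size satisfies the documented constraints:
// it must lie in [2, gopSize), and in the uniformly scaled mode
// it can only be 2 or 4.
bool IsTemporalConfigValid(int32_t gopSize, int32_t tgopSize, RefMode mode)
{
    if (tgopSize < 2 || tgopSize >= gopSize) {
        return false;
    }
    if (mode == RefMode::UniformlyScaled) {
        return tgopSize == 2 || tgopSize == 4;
    }
    return true;
}
```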

### How to Develop

This section describes only the steps that are different from the basic encoding process. You can learn the basic encoding process in [Video Encoding](video-encoding.md).

1. When creating an encoder instance, check whether the video encoder supports the global temporal scalability feature.

    ```c++
    // 1.1 Obtain the handle to the capability of the video encoder. The following uses H.264 as an example.
    OH_AVCapability *cap = OH_AVCodec_GetCapability(OH_AVCODEC_MIMETYPE_VIDEO_AVC, true);
    // 1.2 Check whether the global temporal scalability feature is supported.
    bool isSupported = OH_AVCapability_IsFeatureSupported(cap, VIDEO_ENCODER_TEMPORAL_SCALABILITY);
    ```

    If the feature is supported, it can be enabled.

2. In the configuration phase, configure the parameters related to the global temporal scalability feature.

    ```c++
    constexpr int32_t TGOP_SIZE = 3;
    // 2.1 Create a temporary AV format used for configuration.
    OH_AVFormat *format = OH_AVFormat_Create();
    // 2.2 Fill in the key-value pair of the parameter used to enable the feature.
    OH_AVFormat_SetIntValue(format, OH_MD_KEY_VIDEO_ENCODER_ENABLE_TEMPORAL_SCALABILITY, 1);
    // 2.3 (Optional) Fill in the key-value pairs of the parameters that specify the TGOP size and reference mode.
    OH_AVFormat_SetIntValue(format, OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_SIZE, TGOP_SIZE);
    OH_AVFormat_SetIntValue(format, OH_MD_KEY_VIDEO_ENCODER_TEMPORAL_GOP_REFERENCE_MODE, ADJACENT_REFERENCE);
    // 2.4 Configure the parameters.
    int32_t ret = OH_VideoEncoder_Configure(videoEnc, format);
    if (ret != AV_ERR_OK) {
        // Exception handling.
    }
    // 2.5 Destroy the temporary AV format after the configuration is complete.
    OH_AVFormat_Destroy(format);
    ```

3. (Optional) During output rotation in the running phase, obtain the temporal layer information corresponding to the bitstream.

    Based on the configured TGOP parameters, you can derive the temporal layer of each frame periodically from the number of frames that have been output.

    The sample code is as follows:

    ```c++
    uint32_t outPoc = 0;
    // Obtain the relative position in the TGOP based on the number of valid frames in the output callback and determine the layer based on the configuration.
    static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
    {
        // Note: If complex processing is involved, you are advised to create an association.
        OH_AVCodecBufferAttr attr;
        if (OH_AVBuffer_GetBufferAttr(buffer, &attr) != AV_ERR_OK) {
            // Exception handling.
            return;
        }
        // Set POC to 0 after the I-frame is refreshed.
        if (attr.flags & AVCODEC_BUFFER_FLAG_KEY_FRAME) {
            outPoc = 0;
        }
        // Skip buffers that carry only parameter sets (XPS) and no frame data.
        if (!(attr.flags & AVCODEC_BUFFER_FLAG_CODEC_DATA)) {
            uint32_t tGopInner = outPoc % TGOP_SIZE;
            if (tGopInner == 0) {
                // Frames at the start of a TGOP (base layer) cannot be dropped in subsequent transmission and decoding processes.
            } else {
                // Frames at other positions can be dropped in subsequent transmission and decoding processes.
            }
            outPoc++;
        }
    }
    ```

4. (Optional) During output rotation in the running phase, use the temporal layer information obtained for adaptive transmission or decoding.

    Based on the temporally scalable bitstream and layer information, select a required layer for transmission, or carry the information to the peer for adaptive decoding.
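
For step 4, one possible policy is to forward every frame while the channel is healthy and only the base-layer frames when it is congested. The sketch below is illustrative only: `FrameInfo`, `ShouldSend`, `CountSent`, and the `congested` flag are hypothetical, and how congestion is detected is outside the scope of this topic.

```c++
#include <cstdint>
#include <vector>

// Hypothetical per-frame record filled in by the output callback.
struct FrameInfo {
    uint32_t posInTgop; // Relative position in the TGOP (0 = base layer).
};

// Decide whether a frame is sent: base-layer frames are always kept,
// and other frames are sent only when the channel is not congested.
bool ShouldSend(const FrameInfo &frame, bool congested)
{
    return frame.posInTgop == 0 || !congested;
}

// Count how many frames of a sequence would be sent.
uint32_t CountSent(const std::vector<FrameInfo> &frames, bool congested)
{
    uint32_t sent = 0;
    for (const FrameInfo &frame : frames) {
        if (ShouldSend(frame, congested)) {
            ++sent;
        }
    }
    return sent;
}
```

A finer-grained policy that drops only some enhancement-layer frames depends on the configured reference mode and is not covered by this sketch.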

## LTR

### Available APIs

The LTR feature provides flexible, frame-level configuration of the reference relationship. It is suitable for flexible and complex temporally hierarchical structures.

| Parameter| Description                |
| -------- | ---------------------------- |
| OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT  |  Number of LTR frames.|
| OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR  | Whether to mark the current frame as an LTR frame.|
| OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR   | POC number of the LTR frame referenced by the current frame. |

- **OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT**: This parameter is set in the configuration phase and must be less than or equal to the maximum number of LTR frames supported.
- **OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR**: This parameter marks the current frame as an LTR frame. Frames at the BL must be marked as LTR frames, and so must frames at an EL that are referenced across frames.
- **OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR**: This parameter specifies the POC number of the marked LTR frame that the current frame references.

For example, to implement the four-layer temporally hierarchical structure described in [Introduction to Temporally Scalable Video Coding](#introduction-to-temporally-scalable-video-coding), perform the following steps:

1. In the configuration phase, set **OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT** to **5**.

2. In the input rotation of the running phase, configure the LTR parameters according to the following table, where **\\** means that no configuration is required.

    | Configuration\POC| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
    | -------- |---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|
    | MARK_LTR | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0  | 0  | 1  | 0  | 0  | 0  | 1  |
    | USE_LTR  | \ | \ | 0 | \ | 0 | \ | 4 | \ | 0 | \ | 8  | \  | 8  | \  | 12 | \  | 8  |
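
The per-frame settings for such a structure follow a regular pattern, so they can be derived from the POC instead of being hard-coded. The following sketch assumes the dyadic four-layer structure with 8 frames per TGOP, and assumes that odd-numbered frames rely on the short-term reference to the previous frame and therefore need no LTR configuration. `LtrSetting` and `LtrSettingFor` are hypothetical names, and `useLtr == -1` stands for "no configuration required".

```c++
#include <cstdint>

// Hypothetical per-frame LTR settings. useLtr == -1 means that
// USE_LTR does not need to be configured for the frame.
struct LtrSetting {
    int32_t markLtr;
    int32_t useLtr;
};

// Derive the settings for a frame from its POC (poc must be >= 0).
LtrSetting LtrSettingFor(int32_t poc)
{
    constexpr int32_t NO_LTR = -1;
    if (poc % 2 != 0) {
        // L3: references the previous frame through the short-term reference.
        return {0, NO_LTR};
    }
    if (poc % 4 != 0) {
        // L2: references the LTR frame two frames earlier.
        return {0, poc - 2};
    }
    if (poc % 8 != 0) {
        // L1: marked as LTR and references the base-layer frame four frames earlier.
        return {1, poc - 4};
    }
    // L0: marked as LTR. The first frame is an I-frame and has no reference.
    return {1, poc == 0 ? NO_LTR : poc - 8};
}
```

In the input callback, the values returned for the current POC would be written with **OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR** and, when `useLtr` is not -1, with **OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR**.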

### How to Develop

This section describes only the steps that are different from the basic encoding process. You can learn the basic encoding process in [Video Encoding](video-encoding.md).

1. When creating an encoder instance, check whether the video encoder supports the LTR feature.

    ```c++
    constexpr int32_t NEEDED_LTR_COUNT = 5;
    bool isSupported = false;
    int32_t supportedLTRCount = 0;
    // 1.1 Obtain the handle to the capability of the encoder. The following uses H.264 as an example.
    OH_AVCapability *cap = OH_AVCodec_GetCapability(OH_AVCODEC_MIMETYPE_VIDEO_AVC, true);
    // 1.2 Check whether the LTR feature is supported.
    isSupported = OH_AVCapability_IsFeatureSupported(cap, VIDEO_ENCODER_LONG_TERM_REFERENCE);
    // 1.3 Determine the number of supported LTR frames.
    if (isSupported) {
        OH_AVFormat *properties = OH_AVCapability_GetFeatureProperties(cap, VIDEO_ENCODER_LONG_TERM_REFERENCE);
        OH_AVFormat_GetIntValue(properties, OH_FEATURE_PROPERTY_KEY_VIDEO_ENCODER_MAX_LTR_FRAME_COUNT, &supportedLTRCount);
        OH_AVFormat_Destroy(properties);
        // 1.4 Check whether the number of supported LTR frames meets the structure requirements.
        isSupported = supportedLTRCount >= NEEDED_LTR_COUNT;
    }
    ```

    If the LTR feature is supported and the number of supported LTR frames meets the requirements, the feature can be enabled.

2. Register the frame channel callback functions.

    The following is an example of the configuration in buffer input mode:

    ```c++
    // 2.1 Implement the OH_AVCodecOnNeedInputBuffer callback function.
    static void OnNeedInputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
    {
        // The index of the input frame buffer is sent to InIndexQueue.
        // The input frame data (specified by buffer) is sent to InBufferQueue.
        // Perform data processing. For details, see:
        // - Write the stream to encode.
        // - Notify the encoder of EOS.
        // - Write the frame parameter.
        OH_AVFormat *format = OH_AVBuffer_GetParameter(buffer);
        OH_AVFormat_SetIntValue(format, OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR, 1);
        OH_AVFormat_SetIntValue(format, OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR, 4);
        OH_AVBuffer_SetParameter(buffer, format);
        OH_AVFormat_Destroy(format);
        // Notify the encoder that the buffer input is complete.
        OH_VideoEncoder_PushInputBuffer(codec, index);
    }

    // 2.2 Implement the OH_AVCodecOnNewOutputBuffer callback function.
    static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
    {
        // The index of the output frame buffer is sent to outIndexQueue.
        // The encoded frame data (specified by buffer) is sent to outBufferQueue.
        // Perform data processing. For details, see:
        // - Release the encoded frame.
        // - Record POC and the enabled status of LTR.
    }

    // 2.3 Register the callback functions.
    // Zero-initialize the structure so that unused callbacks are null rather than uninitialized.
    // Set onError and onStreamChanged as described in the basic encoding process.
    OH_AVCodecCallback cb = {};
    cb.onNeedInputBuffer = OnNeedInputBuffer;
    cb.onNewOutputBuffer = OnNewOutputBuffer;
    OH_VideoEncoder_RegisterCallback(codec, cb, nullptr);
    ```

    The following is an example of the configuration in surface input mode:

    ```c++
    // 2.1 Implement the OH_VideoEncoder_OnNeedInputParameter callback function.
    static void OnNeedInputParameter(OH_AVCodec *codec, uint32_t index, OH_AVFormat *parameter, void *userData)
    {
        // The index of the input frame buffer is sent to InIndexQueue.
        // The input frame data (specified by parameter) is sent to InFormatQueue.
        // Perform data processing. For details, see:
        // - Write the stream to encode.
        // - Notify the encoder of EOS.
        // - Write the frame parameter.
        OH_AVFormat_SetIntValue(parameter, OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_MARK_LTR, 1);
        OH_AVFormat_SetIntValue(parameter, OH_MD_KEY_VIDEO_ENCODER_PER_FRAME_USE_LTR, 4);
        // Notify the encoder that the frame input is complete.
        OH_VideoEncoder_PushInputParameter(codec, index);
    }

    // 2.2 Implement the OH_AVCodecOnNewOutputBuffer callback function.
    static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData)
    {
        // The index of the output frame buffer is sent to outIndexQueue.
        // The encoded frame data (specified by buffer) is sent to outBufferQueue.
        // Perform data processing. For details, see:
        // - Release the encoded frame.
        // - Record POC and the enabled status of LTR.
    }

    // 2.3 Register the callback functions.
    // Zero-initialize the structure so that unused callbacks are null rather than uninitialized.
    // Set onError and onStreamChanged as described in the basic encoding process.
    OH_AVCodecCallback cb = {};
    cb.onNewOutputBuffer = OnNewOutputBuffer;
    OH_VideoEncoder_RegisterCallback(codec, cb, nullptr);
    // 2.4 Register the frame channel callback functions.
    OH_VideoEncoder_OnNeedInputParameter inParaCb = OnNeedInputParameter;
    OH_VideoEncoder_RegisterParameterCallback(codec, inParaCb, nullptr);
    ```

3. In the configuration phase, configure the maximum number of LTR frames.

    ```c++
    constexpr int32_t TGOP_SIZE = 3;
    // 3.1 Create a temporary AV format used for configuration.
    OH_AVFormat *format = OH_AVFormat_Create();
    // 3.2 Fill in the key-value pair of the parameter that specifies the number of LTR frames.
    OH_AVFormat_SetIntValue(format, OH_MD_KEY_VIDEO_ENCODER_LTR_FRAME_COUNT, NEEDED_LTR_COUNT);
    // 3.3 Configure the parameters.
    int32_t ret = OH_VideoEncoder_Configure(videoEnc, format);
    if (ret != AV_ERR_OK) {
        // Exception handling.
    }
    // 3.4 Destroy the temporary AV format after the configuration is complete.
    OH_AVFormat_Destroy(format);
    ```

4. (Optional) During output rotation in the running phase, obtain the temporal layer information corresponding to the bitstream.

    This procedure is the same as that described in the global temporal scalability feature.

    The LTR parameters are configured in the input rotation. You can also record the LTR parameters in the input rotation and find the corresponding input parameters in the output rotation.

5. (Optional) During output rotation in the running phase, use the temporal layer information obtained for adaptive transmission or decoding.

    This procedure is the same as that described in the global temporal scalability feature.
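
The bookkeeping mentioned in step 4, that is, recording the LTR parameters in the input rotation and looking them up in the output rotation, could be sketched as follows. The sketch assumes that each frame can be identified by a key that is the same on the input and output sides, such as the presentation timestamp. `LtrRecord` and `LtrRecorder` are hypothetical names and are not part of any API.

```c++
#include <cstdint>
#include <map>

// Hypothetical record of the LTR parameters written for one input frame.
struct LtrRecord {
    bool markLtr;
    int32_t useLtr; // -1 if USE_LTR was not configured.
};

// Hypothetical helper that pairs input-side LTR parameters with output frames.
// Note: this sketch is not thread-safe. If the input and output rotations run
// on different threads, access to the recorder must be synchronized.
class LtrRecorder {
public:
    // Called in the input rotation after the frame parameters are written.
    void Record(int64_t pts, const LtrRecord &record)
    {
        records_[pts] = record;
    }

    // Called in the output rotation. On success, copies the record to out,
    // removes it from the recorder, and returns true.
    bool Take(int64_t pts, LtrRecord &out)
    {
        auto it = records_.find(pts);
        if (it == records_.end()) {
            return false;
        }
        out = it->second;
        records_.erase(it);
        return true;
    }

private:
    std::map<int64_t, LtrRecord> records_;
};
```

Removing each record when it is taken keeps the map from growing over the lifetime of the encoder.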