# Introduction to Media Kit

Media Kit is used to develop audio and video playback or recording features. The Media Kit development guide provides comprehensive instructions for developing the various audio and video modules, helping you use the system's audio and video APIs to implement the required functionality. For example, you can use the SoundPool to implement simple prompt tones, so that a short drip sound is played when a new message arrives, or use the AVPlayer to develop a music player that can loop a piece of music.

Media Kit provides the following modules:

- [AVPlayer](#avplayer): plays audio and video clips.
- [SoundPool](#soundpool): plays short sounds.
- [AVRecorder](#avrecorder): records audio and video clips.
- [AVScreenCapture](#avscreencapture): captures the screen.
- [AVMetadataExtractor](#avmetadataextractor): obtains audio and video metadata.
- [AVImageGenerator](#avimagegenerator): obtains video thumbnails.
- [AVTranscoder](#avtranscoder): transcodes video files.

## Highlights

- Lightweight media engine

   Fewer system resources (threads and memory) are required. Audio and video playback and recording, flexible pipeline assembly, and source, demuxer, and codec plugins are supported.

- HDR video

   Native data structures and interfaces are provided to support the capture and playback of HDR Vivid video. Third-party applications can deliver a more immersive experience by leveraging the HDR capability of the system.

- Sound pool

   Short sound effects (such as the camera shutter sound and the system notification sound) are often required during application development. You can call the SoundPool APIs to load a short sound once and then play it multiple times with low latency.
## Development Description

This development guide covers only audio and video playback and recording, which are implemented by Media Kit. UI, image processing, media storage, and other related capabilities are not covered.

Before developing features related to audio and video playback or recording, you are advised to understand the following concepts:

- Playback process: network protocol > container format > audio and video codec > graphics/audio rendering

- Network protocols: HLS, HTTP-FLV, HTTP, HTTPS, and more

- Container formats: mp4, mkv, mpeg-ts, and more

- Encoding formats: H.264, H.265, and more

For details about the streaming media development process, see [Using AVPlayer to Play Streaming Media](streaming-media-playback-development-guide.md).

## AVPlayer

The AVPlayer decodes audio and video media assets (such as MP4, MP3, MKV, and MPEG-TS) into renderable images and audible audio signals, and plays the audio and video through output devices.

The AVPlayer provides an integrated playback capability. This means that your application only needs to provide the streaming media source to implement media playback; it does not need to parse or decode the data itself.

### Audio Playback

The figure below shows the interaction between the AVPlayer and external modules when it is used to develop a music application.

![Audio Playback Interaction Diagram](figures/audio-playback-interaction-diagram.png)

When a music application calls the AVPlayer APIs at the JS interface layer to implement audio playback, the player framework at the framework layer parses the media asset into audio data streams (in PCM format). The audio data streams are decoded by software and output to the audio framework, which in turn outputs them to the audio HDI for rendering. A complete audio playback process requires the cooperation of the application, player framework, audio framework, and audio HDI.

In this figure, the numbers indicate the steps in which data is transferred to external modules.

1. The application transfers the media asset to the AVPlayer instance.

2. The player framework outputs the audio PCM data streams to the audio framework, which then outputs the data streams to the audio HDI.

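To make the flow concrete, here is a minimal ArkTS sketch. It assumes the audio asset is reachable through an already opened file descriptor (`fd`); looping is enabled as in the music player example, and error handling and the remaining player states are reduced to the essentials.

```ts
import { media } from '@kit.MediaKit';
import { BusinessError } from '@kit.BasicServicesKit';

// Minimal audio playback sketch: hand a media asset to an AVPlayer through a
// file descriptor (step 1) and drive the initialized -> prepared -> playing states.
async function playAudio(fd: number): Promise<void> {
  let player: media.AVPlayer = await media.createAVPlayer();
  player.on('stateChange', async (state: string) => {
    if (state === 'initialized') {
      await player.prepare();   // the player framework parses the asset here
    } else if (state === 'prepared') {
      player.loop = true;       // loop the track, as in the music player example
      await player.play();      // PCM data now flows to the audio framework (step 2)
    }
  });
  player.on('error', (err: BusinessError) => {
    console.error(`AVPlayer error: ${err.code}, ${err.message}`);
    player.release();
  });
  player.url = `fd://${fd}`;    // step 1: transfer the media asset to the AVPlayer
}
```

Setting `url` moves the player to the initialized state, so the state-change callback drives the rest of the flow.
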
### Video Playback

The figure below shows the interaction between the AVPlayer and external modules when it is used to develop a video playback application.

![Video playback interaction diagram](figures/video-playback-interaction-diagram.png)

When a video playback application calls the AVPlayer APIs at the JS interface layer to implement audio and video playback, the player framework at the framework layer parses the media asset into separate audio and video data streams. The audio data streams are decoded by software and output to the audio framework, which outputs them to the audio HDI at the hardware interface layer to implement audio playback. The video data streams are decoded by hardware (recommended) or software and output to the graphic framework, which outputs them to the display HDI at the hardware interface layer to implement graphics rendering.

A complete video playback process requires the cooperation of the application, XComponent, player framework, graphic framework, audio framework, display HDI, and audio HDI.

In this figure, the numbers indicate the steps in which data is transferred to external modules.

1. The application obtains a window surface ID from the XComponent. For details about how to obtain the window surface ID, see [XComponent](../../reference/apis-arkui/arkui-ts/ts-basic-components-xcomponent.md).

2. The application transfers the media asset and surface ID to the AVPlayer instance.

3. The player framework outputs the video elementary streams (ESs) to the decoding HDI to obtain video frames (NV12/NV21/RGBA).

4. The player framework outputs the audio PCM data streams to the audio framework, which then outputs the data streams to the audio HDI.

5. The player framework outputs the video frames (NV12/NV21/RGBA) to the graphic framework, which then outputs the video frames to the display HDI.

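The sketch below shows steps 1 and 2 in ArkTS. It assumes the window surface ID has already been obtained from an XComponent (for example, through `XComponentController.getXComponentSurfaceId()` in the component's `onLoad` callback) and that the video file is reachable through an opened file descriptor.

```ts
import { media } from '@kit.MediaKit';

// Video playback sketch covering steps 1 and 2: the surface ID is assumed to
// have been obtained from an XComponent controller beforehand.
async function playVideo(fd: number, surfaceId: string): Promise<void> {
  let player: media.AVPlayer = await media.createAVPlayer();
  player.on('stateChange', async (state: string) => {
    if (state === 'initialized') {
      player.surfaceId = surfaceId; // step 2: bind the rendering surface
      await player.prepare();
    } else if (state === 'prepared') {
      await player.play();          // steps 3-5 run inside the frameworks
    }
  });
  player.url = `fd://${fd}`;        // step 2: transfer the media asset
}
```
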
### Supported Formats and Protocols

Container and codec choices are made by content creators, not by the system. You are advised to use mainstream playback formats rather than custom ones to avoid playback failures, stutters, and artifacts. The system itself is not affected by incompatible formats; if such an issue occurs, the application can simply exit playback.

The table below lists the supported protocols.

| Scenario| Description|
| -------- | -------- |
| Local VOD| The file descriptor is supported, but the file path is not.|
| Network VOD| HTTP, HTTPS, HLS, and DASH are supported.|
| Live webcasting| HLS and HTTP-FLV are supported.|

The table below lists the supported audio playback formats.

| Audio Container Format| Description|
| -------- | -------- |
| m4a | Audio format: AAC|
| aac | Audio format: AAC|
| mp3 | Audio format: MP3|
| ogg | Audio format: VORBIS|
| wav | Audio format: PCM|
| amr | Audio format: AMR|
<!--Del-->
> **NOTE**
>
> The supported video formats are classified into mandatory and optional ones. All vendors must support the mandatory formats and can decide whether to implement the optional ones based on their service requirements. You are advised to add compatibility handling so that all application functions work across platforms.

| Video Format| Mandatory or Not|
| -------- | -------- |
| H265<sup>10+</sup>      | Yes|
| H264      | Yes|
<!--DelEnd-->

The table below lists the supported playback formats and mainstream resolutions.

| Video Container Format| Description| Resolution|
| -------- | -------- | -------- |
| mp4 | Video formats: H.265<sup>10+</sup> and H.264<br>Audio formats: AAC and MP3| Mainstream resolutions, such as 4K, 1080p, 720p, 480p, and 270p|
| mkv | Video formats: H.265<sup>10+</sup> and H.264<br>Audio formats: AAC and MP3| Mainstream resolutions, such as 4K, 1080p, 720p, 480p, and 270p|
| ts | Video formats: H.265<sup>10+</sup> and H.264<br>Audio formats: AAC and MP3| Mainstream resolutions, such as 4K, 1080p, 720p, 480p, and 270p|

The table below lists the supported subtitle formats.

| Subtitle Container Format| Protocol| Loading Mode|
| -------- | -------- | -------- |
| srt | File descriptor (FD) for local video on demand (VOD), and HTTP/HTTPS/HLS/DASH for network VOD| External subtitle|
| vtt | FD for local VOD, and HTTP/HTTPS/HLS/DASH for network VOD| External subtitle|
| webvtt | DASH for network VOD| Built-in subtitle|

> **NOTE**
>
> When DASH streams include built-in subtitles, external subtitles cannot be used.

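For illustration, the following sketch attaches an external subtitle to an AVPlayer instance. It assumes the `addSubtitleFromUrl` interface and the `subtitleUpdate` event, and the subtitle URL shown is hypothetical; the call must be made after the playback source is set.

```ts
import { media } from '@kit.MediaKit';

// Sketch: attach an external .srt subtitle after the playback source is set.
// The subtitle URL below is hypothetical.
async function attachSubtitle(player: media.AVPlayer): Promise<void> {
  await player.addSubtitleFromUrl('https://example.com/subtitle.srt');
  player.on('subtitleUpdate', (info: media.SubtitleInfo) => {
    console.info(`Subtitle text: ${info.text}`); // render in the UI as needed
  });
}
```
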
## SoundPool

The SoundPool decodes audio assets (such as MP3, M4A, and WAV) into audio signals and plays the signals through output devices.

The SoundPool provides the capability of playing short sounds. This means that your application only needs to provide the audio asset source to implement sound playback; it does not need to parse or decode the data itself.

The figure below shows the interaction between the SoundPool and external modules when it is used to develop an audio playback application.

![SoundPool Interaction Diagram](figures/soundpool-interaction-diagram.png)

When an audio playback application calls the SoundPool APIs at the JS interface layer to implement sound playback, the player framework at the framework layer parses the media asset into audio data streams (in PCM format). The audio data streams are decoded by software and output to the audio framework, which in turn outputs them to the audio HDI for rendering. A complete audio playback process requires the cooperation of the application, player framework, audio framework, and audio HDI.

In this figure, the numbers indicate the steps in which data is transferred to external modules.

1. The application transfers the media asset to the SoundPool instance.

2. The player framework outputs the audio PCM data streams to the audio framework, which then outputs the data streams to the audio HDI.

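A minimal ArkTS sketch of this one-time-load, multi-play pattern follows. The file descriptor and length parameters are assumed to describe an already opened short audio asset, and the pool size of 5 is illustrative.

```ts
import { media } from '@kit.MediaKit';
import { audio } from '@kit.AudioKit';

// SoundPool sketch: load a short sound once (step 1), then trigger low-latency
// playback once loading completes.
async function playShortSound(fd: number, length: number): Promise<void> {
  let rendererInfo: audio.AudioRendererInfo = {
    usage: audio.StreamUsage.STREAM_USAGE_MUSIC,
    rendererFlags: 0
  };
  // Allow up to 5 streams to play simultaneously in this pool (illustrative).
  let pool: media.SoundPool = await media.createSoundPool(5, rendererInfo);
  pool.on('loadComplete', (soundId: number) => {
    pool.play(soundId);           // low-latency playback of the cached sound
  });
  await pool.load(fd, 0, length); // step 1: hand the audio asset to the pool
}
```
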
### Supported Formats and Protocols

Container and codec choices are made by content creators, not by the system. You are advised to use mainstream playback formats rather than custom ones to avoid playback failures and stutters. The system itself is not affected by incompatible formats; if such an issue occurs, the application can simply exit playback.

The table below lists the supported protocols.

| Scenario| Description|
| -------- | -------- |
| Local VOD| The file descriptor is supported, but the file path is not.|

The table below lists the supported audio playback formats.

| Audio Container Format| Description|
| -------- | -------- |
| m4a | Audio format: AAC|
| aac | Audio format: AAC|
| mp3 | Audio format: MP3|
| ogg | Audio format: VORBIS|
| wav | Audio format: PCM|

## AVRecorder

The AVRecorder captures audio signals, receives video signals, encodes the audio and video signals, and saves them to files. With the AVRecorder, you can easily implement audio and video recording, including starting, pausing, resuming, and stopping recording, and releasing resources. You can also specify parameters such as the encoding format, container format, and file path for recording.

The following figure shows the interaction between the AVRecorder and external modules when it is used to develop a video recording application.

![Video recording interaction diagram](figures/video-recording-interaction-diagram.png)

- Audio recording: When an application calls the AVRecorder APIs at the JS interface layer to implement audio recording, the player framework at the framework layer invokes the audio framework to capture audio data through the audio HDI. The audio data is then encoded by software and saved into a file.

- Video recording: When an application calls the AVRecorder APIs at the JS interface layer to implement video recording, the camera framework is first invoked to capture image data. The camera framework then sends the data to the player framework at the framework layer through the surface. The player framework encodes the image data through the video HDI and saves the encoded image data into a file.

With the AVRecorder, you can implement pure audio recording, pure video recording, and audio and video recording.

In this figure, the numbers indicate the steps in which data is transferred to external modules.

1. The application obtains a surface ID from the player framework through the AVRecorder instance.

2. The application sets the surface ID for the camera framework, which obtains the surface corresponding to the surface ID. The camera framework captures image data through the video HDI.

3. The camera framework transfers the video data to the player framework through the surface.

4. The player framework encodes the video data through the video HDI.

5. The player framework sets the audio parameters for the audio framework and obtains the audio data from the audio framework.

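As an illustration, the sketch below configures pure audio recording in ArkTS. It assumes the application has been granted the ohos.permission.MICROPHONE permission and holds an opened, writable file descriptor; the bit rate and sample rate values are illustrative.

```ts
import { media } from '@kit.MediaKit';

// Pure audio recording sketch (requires the ohos.permission.MICROPHONE
// permission; fd is assumed to be an opened, writable file).
async function recordAudio(fd: number): Promise<media.AVRecorder> {
  let recorder: media.AVRecorder = await media.createAVRecorder();
  let config: media.AVRecorderConfig = {
    audioSourceType: media.AudioSourceType.AUDIO_SOURCE_TYPE_MIC, // "mic" source
    profile: {
      audioBitrate: 100000,                             // illustrative values
      audioChannels: 2,
      audioCodec: media.CodecMimeType.AUDIO_AAC,        // audio/mp4a-latm
      audioSampleRate: 48000,
      fileFormat: media.ContainerFormatType.CFT_MPEG_4A // m4a container
    },
    url: `fd://${fd}`
  };
  await recorder.prepare(config);
  await recorder.start();
  return recorder; // call stop() and release() when recording is done
}
```
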
### Supported Formats

The table below lists the supported audio sources.

| Type| Description|
| -------- | -------- |
| mic | The system microphone is used as the audio source input.|

The table below lists the supported video sources.

| Type| Description|
| -------- | -------- |
| surface_yuv | The input surface carries raw data.|
| surface_es | The input surface carries ES data.|

The table below lists the supported audio and video encoding formats.

| Encoding Format| Description|
| -------- | -------- |
| audio/mp4a-latm | Audio encoding format MP4A-LATM.|
| video/hevc | Video encoding format HEVC.|
| video/avc | Video encoding format AVC.|
| audio/mpeg | Audio encoding format MPEG.|
| audio/g711mu | Audio encoding format G.711 μ-law.|

The table below lists the supported output file formats.

| Format| Description|
| -------- | -------- |
| mp4 | Video container format MP4.|
| m4a | Audio container format M4A.|
| mp3 | Audio container format MP3.|
| wav | Audio container format WAV.|

## AVScreenCapture

The AVScreenCapture captures audio and video signals and encodes and saves screen data to files, helping you easily implement screen capture. It consists of two sets of APIs: one for storing screen recordings in files and the other for obtaining streams during screen capture. It allows the caller to specify parameters such as the encoding format, container format, and file path for screen capture.

The following figure shows the interaction between the AVScreenCapture and external modules when it is used to develop a screen capture application.

![AVScreenCapture interaction diagram](figures/avscreencapture-interaction-diagram.png)

- Audio capture: When an application calls the AVScreenCapture APIs at the JS or native interface layer to implement audio capture, the player framework at the framework layer invokes the audio framework to capture audio data through the audio HDI. The audio data is then encoded by software and saved into a file.
- Screen capture: When an application calls the AVScreenCapture APIs at the JS or native interface layer to implement screen capture, the player framework at the framework layer invokes the graphic framework to capture screen data. The screen data is then encoded by software and saved into a file.

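The following ArkTS sketch uses the storage-oriented API set. It assumes the AVScreenCaptureRecorder interface, an opened writable file descriptor, and illustrative frame dimensions.

```ts
import { media } from '@kit.MediaKit';

// Screen recording sketch using the storage-oriented API set (assumes the
// AVScreenCaptureRecorder interface; fd is an opened, writable file).
async function captureScreen(fd: number): Promise<media.AVScreenCaptureRecorder> {
  let recorder: media.AVScreenCaptureRecorder =
    await media.createAVScreenCaptureRecorder();
  let config: media.AVScreenCaptureRecordConfig = {
    fd: fd,             // the recording is saved into this file
    frameWidth: 1080,   // illustrative capture resolution
    frameHeight: 1920
  };
  await recorder.init(config);     // configure the encoder and output file
  await recorder.startRecording(); // the user is asked to authorize the capture
  return recorder;                 // call stopRecording() and release() when done
}
```
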
### Supported Formats

The table below lists the supported audio sources.

| Type| Description|
| -------- | -------- |
| MIC | The system microphone is used as the audio source input.|
| ALL_PLAYBACK | Internal recording is used as the audio source input.|

The table below lists the supported video sources.

| Type| Description|
| -------- | -------- |
| SURFACE_RGBA | The output buffer is RGBA data.|

The table below lists the supported audio encoding formats.

| Audio Encoding Format| Description|
| -------- | -------- |
| AAC_LC | AAC_LC.|

The table below lists the supported video encoding formats.

| Video Encoding Format| Description|
| -------- | -------- |
| H264 | H.264.|

The table below lists the supported output file formats.

| Format| Description|
| -------- | -------- |
| mp4 | Video container format MP4.|
| m4a | Audio container format M4A.|

## AVMetadataExtractor

The AVMetadataExtractor is used to obtain audio and video metadata. With the AVMetadataExtractor, you can extract rich metadata from original media assets. For example, for an audio asset, you can obtain details such as its title, artist, album name, and duration. The process for a video asset is similar, except that the step of obtaining the album cover is not required, because video assets have no album cover.

The full process of obtaining the metadata of an audio asset includes creating an AVMetadataExtractor instance, setting the resource, obtaining the metadata, obtaining the album cover (optional), and releasing the instance.

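The sketch below walks through that process in ArkTS, assuming the asset is reachable through an opened file descriptor.

```ts
import { media } from '@kit.MediaKit';
import { image } from '@kit.ImageKit';

// Metadata sketch following the process above: create the instance, set the
// resource, fetch the metadata and (for audio assets) the album cover, release.
async function readMetadata(fd: number): Promise<void> {
  let extractor: media.AVMetadataExtractor = await media.createAVMetadataExtractor();
  extractor.fdSrc = { fd: fd };                       // set the resource
  let metadata: media.AVMetadata = await extractor.fetchMetadata();
  console.info(`title: ${metadata.title}, artist: ${metadata.artist}, ` +
    `album: ${metadata.album}, duration: ${metadata.duration}`);
  let cover: image.PixelMap = await extractor.fetchAlbumCover(); // audio assets only
  let info: image.ImageInfo = await cover.getImageInfo();
  console.info(`album cover: ${info.size.width}x${info.size.height}`);
  await extractor.release();                          // release the instance
}
```
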
### Supported Formats

For details about the supported audio and video sources, see [Demuxing Media Data](../avcodec/audio-video-demuxer.md).

## AVImageGenerator

The AVImageGenerator is used to obtain video thumbnails. With the AVImageGenerator, you can obtain video frames at a specified time from original media assets.

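A minimal ArkTS sketch follows; the timestamp, query option, and output size are illustrative, and the video is assumed to be reachable through an opened file descriptor.

```ts
import { media } from '@kit.MediaKit';
import { image } from '@kit.ImageKit';

// Thumbnail sketch: grab the frame closest to a given time (in microseconds).
async function fetchThumbnail(fd: number, timeUs: number): Promise<image.PixelMap> {
  let generator: media.AVImageGenerator = await media.createAVImageGenerator();
  generator.fdSrc = { fd: fd };
  let params: media.PixelMapParams = { width: 300, height: 300 }; // target size
  let pixelMap: image.PixelMap = await generator.fetchFrameByTime(
    timeUs, media.AVImageQueryOptions.AV_IMAGE_QUERY_NEXT_SYNC, params);
  await generator.release();
  return pixelMap;
}
```
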
### Supported Formats

For details about the supported video sources, see [Video Decoding](../avcodec/video-decoding.md).

## AVTranscoder

The AVTranscoder is used to convert a compressed video file into a video in another format based on specified parameters.

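As a hedged illustration of the basic flow, the ArkTS sketch below assumes opened source and destination file descriptors; the target container, codec, and bit rate values are illustrative.

```ts
import { media } from '@kit.MediaKit';

// Transcoding sketch: re-encode a source video into an MP4 file at a
// specified video bit rate (all parameter values are illustrative).
async function transcode(srcFd: number, dstFd: number): Promise<void> {
  let transcoder: media.AVTranscoder = await media.createAVTranscoder();
  transcoder.fdSrc = { fd: srcFd };   // source file
  transcoder.fdDst = dstFd;           // destination file
  let config: media.AVTranscoderConfig = {
    fileFormat: media.ContainerFormatType.CFT_MPEG_4, // target container: mp4
    videoCodec: media.CodecMimeType.VIDEO_AVC,        // target video codec: H.264
    videoBitrate: 2000000                             // target bit rate, in bps
  };
  transcoder.on('complete', async () => {
    await transcoder.release();       // release once transcoding finishes
  });
  await transcoder.prepare(config);
  await transcoder.start();
}
```
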
### Supported Formats

The AVTranscoder provides the following services:

The encoding parameters (format and bit rate) and the container format of a source video file can be modified. The audio and video encoding and container formats of the source video must be supported by the AVCodec for decoding and demuxing, whereas those of the target video must be supported by the AVCodec for encoding and muxing.

- The following source video formats are supported:
  - [Demuxing formats](../avcodec/audio-video-demuxer.md)
  - [Audio decoding formats](../avcodec/audio-decoding.md)
  - [Video decoding formats](../avcodec/video-decoding.md)
    <!--Del-->
    > **NOTE**
    >
    > Currently, H.265 is not supported.
    <!--DelEnd-->
- The following target video formats are supported:
  - [Container formats](../avcodec/audio-video-muxer.md)
  - [Audio encoding formats](../avcodec/audio-encoding.md)
  - [Video encoding formats](../avcodec/video-encoding.md)
    <!--Del-->
    > **NOTE**
    >
    > Currently, H.265 is not supported.

    <!--DelEnd-->

<!--RP1--><!--RP1End-->