# Connecting NNRt to an AI Inference Framework

## When to Use

As a bridge between the AI inference engine and the acceleration chip, Neural Network Runtime (NNRt) provides simplified native APIs for the AI inference engine to perform end-to-end inference through the acceleration chip.

This topic uses the `Add` single-operator model shown in Figure 1 as an example to describe the NNRt development process. The `Add` operator has two inputs, one parameter, and one output, where the `activation` parameter specifies the type of activation function used in the `Add` operator.

**Figure 1** Add single-operator model<br>
!["Single Add operator model"](figures/neural_network_runtime.png)

## Preparing the Environment

### Environment Requirements

The environment requirements for NNRt are as follows:

- Development environment: Ubuntu 18.04 or later.
- Access device: a standard device whose built-in hardware accelerator driver has been connected to NNRt.

NNRt opens its capabilities to external systems through native APIs. Therefore, you need to use the native development suite to build NNRt applications. You can download the ohos-sdk package of the corresponding version from the daily build in the OpenHarmony community and then decompress the package to obtain the native development suite of the corresponding platform. Take Linux as an example. The package of the native development suite is named `native-linux-{version number}.zip`.

### Environment Setup

1. Start the Ubuntu server.
2. Copy the downloaded package of the native development suite to the root directory of the current user.
3. Decompress the package of the native development suite.
    ```shell
    unzip native-linux-{version number}.zip
    ```

    The directory structure after decompression is as follows. The content in the directory may vary depending on the version. Use the native APIs of the latest version.
    ```text
    native/
    ├── build // Cross-compilation toolchain
    ├── build-tools // Compilation and build tools
    ├── docs
    ├── llvm
    ├── nativeapi_syscap_config.json
    ├── ndk_system_capability.json
    ├── NOTICE.txt
    ├── oh-uni-package.json
    └── sysroot // Native API header files and libraries
    ```

## Available APIs

This section describes the common APIs used in the NNRt development process.

### Structs

| Name| Description|
| --------- | ---- |
| typedef struct OH_NNModel OH_NNModel | Model handle of NNRt. It is used to construct a model.|
| typedef struct OH_NNCompilation OH_NNCompilation | Compiler handle of NNRt. It is used to compile an AI model.|
| typedef struct OH_NNExecutor OH_NNExecutor | Executor handle of NNRt. It is used to perform inference computing on a specified device.|
| typedef struct NN_QuantParam NN_QuantParam | Quantization parameter handle, which is used to specify the quantization parameter of the tensor during model construction.|
| typedef struct NN_TensorDesc NN_TensorDesc | Tensor description handle, which is used to describe tensor attributes, such as the data format, data type, and shape.|
| typedef struct NN_Tensor NN_Tensor | Tensor handle, which is used to set the inference input and output tensors of the executor.|

### Model Construction APIs

| Name| Description|
| ------- | --- |
| OH_NNModel_Construct() | Creates a model instance of the OH_NNModel type.|
| OH_NN_ReturnCode OH_NNModel_AddTensorToModel(OH_NNModel *model, const NN_TensorDesc *tensorDesc) | Adds a tensor to a model instance.|
| OH_NN_ReturnCode OH_NNModel_SetTensorData(OH_NNModel *model, uint32_t index, const void *dataBuffer, size_t length) | Sets the tensor value.|
| OH_NN_ReturnCode OH_NNModel_AddOperation(OH_NNModel *model, OH_NN_OperationType op, const OH_NN_UInt32Array *paramIndices, const OH_NN_UInt32Array *inputIndices, const OH_NN_UInt32Array *outputIndices) | Adds an operator to a model instance.|
| OH_NN_ReturnCode OH_NNModel_SpecifyInputsAndOutputs(OH_NNModel *model, const OH_NN_UInt32Array *inputIndices, const OH_NN_UInt32Array *outputIndices) | Sets an index value for the input and output tensors of a model.|
| OH_NN_ReturnCode OH_NNModel_Finish(OH_NNModel *model) | Completes model composition.|
| void OH_NNModel_Destroy(OH_NNModel **model) | Destroys a model instance.|

### Model Compilation APIs

| Name| Description|
| ------- | --- |
| OH_NNCompilation *OH_NNCompilation_Construct(const OH_NNModel *model) | Creates an **OH_NNCompilation** instance based on the specified model instance.|
| OH_NNCompilation *OH_NNCompilation_ConstructWithOfflineModelFile(const char *modelPath) | Creates an **OH_NNCompilation** instance based on the specified offline model file path.|
| OH_NNCompilation *OH_NNCompilation_ConstructWithOfflineModelBuffer(const void *modelBuffer, size_t modelSize) | Creates an **OH_NNCompilation** instance based on the specified offline model buffer.|
| OH_NNCompilation *OH_NNCompilation_ConstructForCache() | Creates an empty model building instance for later recovery from the model cache.|
| OH_NN_ReturnCode OH_NNCompilation_ExportCacheToBuffer(OH_NNCompilation *compilation, const void *buffer, size_t length, size_t *modelSize) | Writes the model cache to the specified buffer.|
| OH_NN_ReturnCode OH_NNCompilation_ImportCacheFromBuffer(OH_NNCompilation *compilation, const void *buffer, size_t modelSize) | Reads the model cache from the specified buffer.|
| OH_NN_ReturnCode OH_NNCompilation_AddExtensionConfig(OH_NNCompilation *compilation, const char *configName, const void *configValue, const size_t configValueSize) | Adds extended configurations for custom device attributes. For details about the extended attribute names and values, see the documentation that comes with the device.|
| OH_NN_ReturnCode OH_NNCompilation_SetDevice(OH_NNCompilation *compilation, size_t deviceID) | Sets the device for model building and computing. The device ID can be obtained through the device management APIs.|
| OH_NN_ReturnCode OH_NNCompilation_SetCache(OH_NNCompilation *compilation, const char *cachePath, uint32_t version) | Sets the cache directory and version for model building.|
| OH_NN_ReturnCode OH_NNCompilation_SetPerformanceMode(OH_NNCompilation *compilation, OH_NN_PerformanceMode performanceMode) | Sets the performance mode for model computing.|
| OH_NN_ReturnCode OH_NNCompilation_SetPriority(OH_NNCompilation *compilation, OH_NN_Priority priority) | Sets the priority for model computing.|
| OH_NN_ReturnCode OH_NNCompilation_EnableFloat16(OH_NNCompilation *compilation, bool enableFloat16) | Enables float16 for computing.|
| OH_NN_ReturnCode OH_NNCompilation_Build(OH_NNCompilation *compilation) | Performs model building.|
| void OH_NNCompilation_Destroy(OH_NNCompilation **compilation) | Destroys a model building instance.|
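
In addition to caching to a directory via **OH_NNCompilation_SetCache**, the cache-related APIs above can keep the build result in a caller-managed buffer. The following is a minimal sketch of that flow, assuming the compilation passed in has already been built successfully and that a 1 MB buffer is large enough for the exported cache; the helper name `CacheRoundTrip` and the buffer size are illustrative, not part of the API.

```cpp
#include <vector>
#include "neural_network_runtime/neural_network_runtime.h"

// Export the cache of a built compilation into a buffer, then restore it into a new
// compilation instance without rebuilding the model from scratch.
OH_NN_ReturnCode CacheRoundTrip(OH_NNCompilation* builtCompilation, size_t deviceID)
{
    const size_t bufferSize = 1024 * 1024;  // Assumed upper bound; size it to your model.
    std::vector<char> buffer(bufferSize);
    size_t modelSize = 0;

    // Write the model cache of the built compilation into the buffer.
    OH_NN_ReturnCode ret = OH_NNCompilation_ExportCacheToBuffer(builtCompilation, buffer.data(), bufferSize, &modelSize);
    if (ret != OH_NN_SUCCESS) {
        return ret;
    }

    // Create an empty compilation instance and recover it from the buffered cache.
    OH_NNCompilation* restored = OH_NNCompilation_ConstructForCache();
    if (restored == nullptr) {
        return OH_NN_FAILED;
    }
    ret = OH_NNCompilation_ImportCacheFromBuffer(restored, buffer.data(), modelSize);
    if (ret == OH_NN_SUCCESS) {
        ret = OH_NNCompilation_SetDevice(restored, deviceID);
    }
    if (ret == OH_NN_SUCCESS) {
        ret = OH_NNCompilation_Build(restored);
    }

    // In a real application, create an executor from 'restored' before destroying it.
    OH_NNCompilation_Destroy(&restored);
    return ret;
}
```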

### Tensor Description APIs

| Name| Description|
| ------- | --- |
| NN_TensorDesc *OH_NNTensorDesc_Create() | Creates an **NN_TensorDesc** instance for creating an **NN_Tensor** instance at a later time.|
| OH_NN_ReturnCode OH_NNTensorDesc_SetName(NN_TensorDesc *tensorDesc, const char *name) | Sets the name of the **NN_TensorDesc** instance.|
| OH_NN_ReturnCode OH_NNTensorDesc_GetName(const NN_TensorDesc *tensorDesc, const char **name) | Obtains the name of the **NN_TensorDesc** instance.|
| OH_NN_ReturnCode OH_NNTensorDesc_SetDataType(NN_TensorDesc *tensorDesc, OH_NN_DataType dataType) | Sets the data type of the **NN_TensorDesc** instance.|
| OH_NN_ReturnCode OH_NNTensorDesc_GetDataType(const NN_TensorDesc *tensorDesc, OH_NN_DataType *dataType) | Obtains the data type of the **NN_TensorDesc** instance.|
| OH_NN_ReturnCode OH_NNTensorDesc_SetShape(NN_TensorDesc *tensorDesc, const int32_t *shape, size_t shapeLength) | Sets the shape of the **NN_TensorDesc** instance.|
| OH_NN_ReturnCode OH_NNTensorDesc_GetShape(const NN_TensorDesc *tensorDesc, int32_t **shape, size_t *shapeLength) | Obtains the shape of the **NN_TensorDesc** instance.|
| OH_NN_ReturnCode OH_NNTensorDesc_SetFormat(NN_TensorDesc *tensorDesc, OH_NN_Format format) | Sets the data format of the **NN_TensorDesc** instance.|
| OH_NN_ReturnCode OH_NNTensorDesc_GetFormat(const NN_TensorDesc *tensorDesc, OH_NN_Format *format) | Obtains the data format of the **NN_TensorDesc** instance.|
| OH_NN_ReturnCode OH_NNTensorDesc_GetElementCount(const NN_TensorDesc *tensorDesc, size_t *elementCount) | Obtains the number of elements in the **NN_TensorDesc** instance.|
| OH_NN_ReturnCode OH_NNTensorDesc_GetByteSize(const NN_TensorDesc *tensorDesc, size_t *byteSize) | Obtains the number of bytes of tensor data, calculated from the shape and data type of the **NN_TensorDesc** instance.|
| OH_NN_ReturnCode OH_NNTensorDesc_Destroy(NN_TensorDesc **tensorDesc) | Destroys an **NN_TensorDesc** instance.|
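
The following sketch puts these APIs together: it describes the float32 tensor of shape [1, 2, 2, 3] used elsewhere in this topic and queries its element count and byte size. The helper name `DescribeTensor` is illustrative only.

```cpp
#include <cstdio>
#include "neural_network_runtime/neural_network_runtime.h"

// Build a tensor description and query the size information derived from it.
OH_NN_ReturnCode DescribeTensor()
{
    NN_TensorDesc* desc = OH_NNTensorDesc_Create();
    if (desc == nullptr) {
        return OH_NN_FAILED;
    }

    int32_t shape[4] = {1, 2, 2, 3};
    OH_NN_ReturnCode ret = OH_NNTensorDesc_SetShape(desc, shape, 4);
    if (ret == OH_NN_SUCCESS) {
        ret = OH_NNTensorDesc_SetDataType(desc, OH_NN_FLOAT32);
    }
    if (ret == OH_NN_SUCCESS) {
        ret = OH_NNTensorDesc_SetFormat(desc, OH_NN_FORMAT_NONE);
    }

    size_t elementCount = 0;
    size_t byteSize = 0;
    if (ret == OH_NN_SUCCESS) {
        ret = OH_NNTensorDesc_GetElementCount(desc, &elementCount);
    }
    if (ret == OH_NN_SUCCESS) {
        ret = OH_NNTensorDesc_GetByteSize(desc, &byteSize);
    }
    if (ret == OH_NN_SUCCESS) {
        // For this shape and float32 data, 12 elements and 48 bytes are expected.
        printf("elementCount = %zu, byteSize = %zu\n", elementCount, byteSize);
    }

    OH_NNTensorDesc_Destroy(&desc);
    return ret;
}
```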

### Tensor APIs

| Name| Description|
| ------- | --- |
| NN_Tensor* OH_NNTensor_Create(size_t deviceID, NN_TensorDesc *tensorDesc) | Creates an **NN_Tensor** instance based on the specified tensor description. This API requests device shared memory.|
| NN_Tensor* OH_NNTensor_CreateWithSize(size_t deviceID, NN_TensorDesc *tensorDesc, size_t size) | Creates an **NN_Tensor** instance based on the specified memory size and tensor description. This API requests device shared memory.|
| NN_Tensor* OH_NNTensor_CreateWithFd(size_t deviceID, NN_TensorDesc *tensorDesc, int fd, size_t size, size_t offset) | Creates an **NN_Tensor** instance based on the specified file descriptor of the shared memory and the tensor description. This way, the device shared memory of another tensor can be reused.|
| NN_TensorDesc* OH_NNTensor_GetTensorDesc(const NN_Tensor *tensor) | Obtains the pointer to the **NN_TensorDesc** instance in a tensor to read tensor attributes, such as the data type and shape.|
| void* OH_NNTensor_GetDataBuffer(const NN_Tensor *tensor) | Obtains the memory address of tensor data to read or write tensor data.|
| OH_NN_ReturnCode OH_NNTensor_GetFd(const NN_Tensor *tensor, int *fd) | Obtains the file descriptor of the shared memory where the tensor data is located. A file descriptor corresponds to a device shared memory block.|
| OH_NN_ReturnCode OH_NNTensor_GetSize(const NN_Tensor *tensor, size_t *size) | Obtains the size of the shared memory where the tensor data is located.|
| OH_NN_ReturnCode OH_NNTensor_GetOffset(const NN_Tensor *tensor, size_t *offset) | Obtains the offset of the tensor data in the shared memory. The available size of the tensor data is the size of the shared memory minus the offset.|
| OH_NN_ReturnCode OH_NNTensor_Destroy(NN_Tensor **tensor) | Destroys an **NN_Tensor** instance.|
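
As a minimal sketch of memory reuse, the following hypothetical helper creates a new **NN_Tensor** on top of the shared memory already held by an existing tensor, instead of requesting a new memory block. The helper name and error handling are illustrative.

```cpp
#include "neural_network_runtime/neural_network_runtime.h"

// Create a tensor that reuses the device shared memory of an existing tensor.
// 'existing' must already hold device shared memory; 'desc' describes the new tensor.
NN_Tensor* CreateTensorSharingMemory(size_t deviceID, NN_TensorDesc* desc, const NN_Tensor* existing)
{
    int fd = -1;
    size_t size = 0;
    size_t offset = 0;
    if (OH_NNTensor_GetFd(existing, &fd) != OH_NN_SUCCESS ||
        OH_NNTensor_GetSize(existing, &size) != OH_NN_SUCCESS ||
        OH_NNTensor_GetOffset(existing, &offset) != OH_NN_SUCCESS) {
        return nullptr;
    }

    // Reuse the same shared memory block instead of requesting a new one.
    return OH_NNTensor_CreateWithFd(deviceID, desc, fd, size, offset);
}
```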

### Inference APIs

| Name| Description|
| ------- | --- |
| OH_NNExecutor *OH_NNExecutor_Construct(OH_NNCompilation *compilation) | Creates an **OH_NNExecutor** instance.|
| OH_NN_ReturnCode OH_NNExecutor_GetOutputShape(OH_NNExecutor *executor, uint32_t outputIndex, int32_t **shape, uint32_t *shapeLength) | Obtains the dimension information about the output tensor. This API is applicable only when the output tensor has a dynamic shape.|
| OH_NN_ReturnCode OH_NNExecutor_GetInputCount(const OH_NNExecutor *executor, size_t *inputCount) | Obtains the number of input tensors.|
| OH_NN_ReturnCode OH_NNExecutor_GetOutputCount(const OH_NNExecutor *executor, size_t *outputCount) | Obtains the number of output tensors.|
| NN_TensorDesc* OH_NNExecutor_CreateInputTensorDesc(const OH_NNExecutor *executor, size_t index) | Creates an **NN_TensorDesc** instance for an input tensor based on the specified index value. The instance is used to read tensor attributes or create **NN_Tensor** instances.|
| NN_TensorDesc* OH_NNExecutor_CreateOutputTensorDesc(const OH_NNExecutor *executor, size_t index) | Creates an **NN_TensorDesc** instance for an output tensor based on the specified index value. The instance is used to read tensor attributes or create **NN_Tensor** instances.|
| OH_NN_ReturnCode OH_NNExecutor_GetInputDimRange(const OH_NNExecutor *executor, size_t index, size_t **minInputDims, size_t **maxInputDims, size_t *shapeLength) | Obtains the dimension range of all input tensors. If an input tensor has a dynamic shape, the dimension range supported by the tensor may vary by device.|
| OH_NN_ReturnCode OH_NNExecutor_SetOnRunDone(OH_NNExecutor *executor, NN_OnRunDone onRunDone) | Sets the callback function invoked when asynchronous inference ends. For the definition of the callback function, see the *API Reference*.|
| OH_NN_ReturnCode OH_NNExecutor_SetOnServiceDied(OH_NNExecutor *executor, NN_OnServiceDied onServiceDied) | Sets the callback function invoked when the device driver service terminates unexpectedly during asynchronous inference. For the definition of the callback function, see the *API Reference*.|
| OH_NN_ReturnCode OH_NNExecutor_RunSync(OH_NNExecutor *executor, NN_Tensor *inputTensor[], size_t inputCount, NN_Tensor *outputTensor[], size_t outputCount) | Performs synchronous inference.|
| OH_NN_ReturnCode OH_NNExecutor_RunAsync(OH_NNExecutor *executor, NN_Tensor *inputTensor[], size_t inputCount, NN_Tensor *outputTensor[], size_t outputCount, int32_t timeout, void *userData) | Performs asynchronous inference.|
| void OH_NNExecutor_Destroy(OH_NNExecutor **executor) | Destroys an **OH_NNExecutor** instance.|
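
For models with dynamic-shape outputs, the actual output shape can be queried after a run. The following is a minimal sketch, assuming the executor has already completed an inference and that the output at `outputIndex` has a dynamic shape; the helper name `PrintOutputShape` is illustrative, and the returned shape buffer is treated as executor-managed memory here.

```cpp
#include <iostream>
#include "neural_network_runtime/neural_network_runtime.h"

// Query and print the actual shape of a dynamic-shape output tensor after inference.
OH_NN_ReturnCode PrintOutputShape(OH_NNExecutor* executor, uint32_t outputIndex)
{
    int32_t* shape = nullptr;
    uint32_t shapeLength = 0;
    OH_NN_ReturnCode ret = OH_NNExecutor_GetOutputShape(executor, outputIndex, &shape, &shapeLength);
    if (ret != OH_NN_SUCCESS) {
        return ret;
    }

    std::cout << "Output " << outputIndex << " shape: [";
    for (uint32_t i = 0; i < shapeLength; ++i) {
        std::cout << shape[i] << (i + 1 < shapeLength ? ", " : "");
    }
    std::cout << "]" << std::endl;
    // The shape buffer is not freed here; it is assumed to be managed by the executor.
    return OH_NN_SUCCESS;
}
```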

### Device Management APIs

| Name| Description|
| ------- | --- |
| OH_NN_ReturnCode OH_NNDevice_GetAllDevicesID(const size_t **allDevicesID, uint32_t *deviceCount) | Obtains the IDs of all devices connected to NNRt.|
| OH_NN_ReturnCode OH_NNDevice_GetName(size_t deviceID, const char **name) | Obtains the name of the specified device.|
| OH_NN_ReturnCode OH_NNDevice_GetType(size_t deviceID, OH_NN_DeviceType *deviceType) | Obtains the type of the specified device.|
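
A minimal sketch combining these three APIs is shown below: it lists every device connected to NNRt together with its name and type. The helper name `ListDevices` is illustrative, and `OH_NN_OTHERS` is assumed to be the default enumerator of **OH_NN_DeviceType**.

```cpp
#include <iostream>
#include "neural_network_runtime/neural_network_runtime.h"

// Enumerate the devices connected to NNRt and print the name and type of each one.
void ListDevices()
{
    const size_t* deviceIDs = nullptr;
    uint32_t deviceCount = 0;
    if (OH_NNDevice_GetAllDevicesID(&deviceIDs, &deviceCount) != OH_NN_SUCCESS) {
        std::cout << "OH_NNDevice_GetAllDevicesID failed." << std::endl;
        return;
    }

    for (uint32_t i = 0; i < deviceCount; ++i) {
        const char* name = nullptr;
        OH_NN_DeviceType type = OH_NN_OTHERS;
        if (OH_NNDevice_GetName(deviceIDs[i], &name) == OH_NN_SUCCESS &&
            OH_NNDevice_GetType(deviceIDs[i], &type) == OH_NN_SUCCESS) {
            std::cout << "Device " << deviceIDs[i] << ": " << name
                      << ", type: " << static_cast<int>(type) << std::endl;
        }
    }
}
```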


## How to Develop

The development process of NNRt consists of three phases: model construction, model compilation, and inference execution. The following uses the `Add` single-operator model as an example to describe how to call NNRt APIs during application development.

1. Create an application sample file.

    Create the source file of the NNRt application sample. Run the following commands in the project directory to create the `nnrt_example/` directory and create the `nnrt_example.cpp` source file in the directory:

    ```shell
    mkdir ~/nnrt_example && cd ~/nnrt_example
    touch nnrt_example.cpp
    ```

2. Import the NNRt module.

    Add the following code at the beginning of the `nnrt_example.cpp` file to import NNRt:

    ```cpp
    #include <iostream>
    #include <cstdarg>
    #include <vector>
    #include "neural_network_runtime/neural_network_runtime.h"
    ```

3. Define auxiliary functions, such as log printing, input data setting, and data printing.

    ```cpp
    // Macro for checking the return value
    #define CHECKNEQ(realRet, expectRet, retValue, ...) \
        do { \
            if ((realRet) != (expectRet)) { \
                printf(__VA_ARGS__); \
                return (retValue); \
            } \
        } while (0)

    #define CHECKEQ(realRet, expectRet, retValue, ...) \
        do { \
            if ((realRet) == (expectRet)) { \
                printf(__VA_ARGS__); \
                return (retValue); \
            } \
        } while (0)

    // Set the input data for inference.
    OH_NN_ReturnCode SetInputData(NN_Tensor* inputTensor[], size_t inputSize)
    {
        OH_NN_DataType dataType(OH_NN_FLOAT32);
        OH_NN_ReturnCode ret{OH_NN_FAILED};
        size_t elementCount = 0;
        for (size_t i = 0; i < inputSize; ++i) {
            // Obtain the data memory of the tensor.
            auto data = OH_NNTensor_GetDataBuffer(inputTensor[i]);
            CHECKEQ(data, nullptr, OH_NN_FAILED, "Failed to get data buffer.");
            // Obtain the tensor description.
            auto desc = OH_NNTensor_GetTensorDesc(inputTensor[i]);
            CHECKEQ(desc, nullptr, OH_NN_FAILED, "Failed to get desc.");
            // Obtain the data type of the tensor.
            ret = OH_NNTensorDesc_GetDataType(desc, &dataType);
            CHECKNEQ(ret, OH_NN_SUCCESS, OH_NN_FAILED, "Failed to get data type.");
            // Obtain the number of elements in the tensor.
            ret = OH_NNTensorDesc_GetElementCount(desc, &elementCount);
            CHECKNEQ(ret, OH_NN_SUCCESS, OH_NN_FAILED, "Failed to get element count.");
            switch(dataType) {
                case OH_NN_FLOAT32: {
                    float* floatValue = reinterpret_cast<float*>(data);
                    for (size_t j = 0; j < elementCount; ++j) {
                        floatValue[j] = static_cast<float>(j);
                    }
                    break;
                }
                case OH_NN_INT32: {
                    int* intValue = reinterpret_cast<int*>(data);
                    for (size_t j = 0; j < elementCount; ++j) {
                        intValue[j] = static_cast<int>(j);
                    }
                    break;
                }
                default:
                    return OH_NN_FAILED;
            }
        }
        return OH_NN_SUCCESS;
    }

    OH_NN_ReturnCode Print(NN_Tensor* outputTensor[], size_t outputSize)
    {
        OH_NN_DataType dataType(OH_NN_FLOAT32);
        OH_NN_ReturnCode ret{OH_NN_FAILED};
        size_t elementCount = 0;
        for (size_t i = 0; i < outputSize; ++i) {
            auto data = OH_NNTensor_GetDataBuffer(outputTensor[i]);
            CHECKEQ(data, nullptr, OH_NN_FAILED, "Failed to get data buffer.");
            auto desc = OH_NNTensor_GetTensorDesc(outputTensor[i]);
            CHECKEQ(desc, nullptr, OH_NN_FAILED, "Failed to get desc.");
            ret = OH_NNTensorDesc_GetDataType(desc, &dataType);
            CHECKNEQ(ret, OH_NN_SUCCESS, OH_NN_FAILED, "Failed to get data type.");
            ret = OH_NNTensorDesc_GetElementCount(desc, &elementCount);
            CHECKNEQ(ret, OH_NN_SUCCESS, OH_NN_FAILED, "Failed to get element count.");
            switch(dataType) {
                case OH_NN_FLOAT32: {
                    float* floatValue = reinterpret_cast<float*>(data);
                    for (size_t j = 0; j < elementCount; ++j) {
                        std::cout << "Output index: " << j << ", value is: " << floatValue[j] << "." << std::endl;
                    }
                    break;
                }
                case OH_NN_INT32: {
                    int* intValue = reinterpret_cast<int*>(data);
                    for (size_t j = 0; j < elementCount; ++j) {
                        std::cout << "Output index: " << j << ", value is: " << intValue[j] << "." << std::endl;
                    }
                    break;
                }
                default:
                    return OH_NN_FAILED;
            }
        }

        return OH_NN_SUCCESS;
    }
    ```

4. Construct a model.

    Use the model construction APIs to construct a single `Add` operator model.

    ```cpp
    OH_NN_ReturnCode BuildModel(OH_NNModel** pmodel)
    {
        // Create a model instance and construct a model.
        OH_NNModel* model = OH_NNModel_Construct();
        CHECKEQ(model, nullptr, OH_NN_FAILED, "Create model failed.");

        // Add the first input tensor of the float32 type for the Add operator. The tensor shape is [1, 2, 2, 3].
        NN_TensorDesc* tensorDesc = OH_NNTensorDesc_Create();
        CHECKEQ(tensorDesc, nullptr, OH_NN_FAILED, "Create TensorDesc failed.");

        int32_t inputDims[4] = {1, 2, 2, 3};
        auto returnCode = OH_NNTensorDesc_SetShape(tensorDesc, inputDims, 4);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc shape failed.");

        returnCode = OH_NNTensorDesc_SetDataType(tensorDesc, OH_NN_FLOAT32);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc data type failed.");

        returnCode = OH_NNTensorDesc_SetFormat(tensorDesc, OH_NN_FORMAT_NONE);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc format failed.");

        returnCode = OH_NNModel_AddTensorToModel(model, tensorDesc);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Add first TensorDesc to model failed.");

        returnCode = OH_NNModel_SetTensorType(model, 0, OH_NN_TENSOR);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set model tensor type failed.");

        // Add the second input tensor of the float32 type for the Add operator. The tensor shape is [1, 2, 2, 3].
        tensorDesc = OH_NNTensorDesc_Create();
        CHECKEQ(tensorDesc, nullptr, OH_NN_FAILED, "Create TensorDesc failed.");

        returnCode = OH_NNTensorDesc_SetShape(tensorDesc, inputDims, 4);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc shape failed.");

        returnCode = OH_NNTensorDesc_SetDataType(tensorDesc, OH_NN_FLOAT32);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc data type failed.");

        returnCode = OH_NNTensorDesc_SetFormat(tensorDesc, OH_NN_FORMAT_NONE);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc format failed.");

        returnCode = OH_NNModel_AddTensorToModel(model, tensorDesc);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Add second TensorDesc to model failed.");

        returnCode = OH_NNModel_SetTensorType(model, 1, OH_NN_TENSOR);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set model tensor type failed.");

        // Add the parameter tensor of the int8 type for the Add operator. The parameter tensor is used to specify the type of the activation function.
        tensorDesc = OH_NNTensorDesc_Create();
        CHECKEQ(tensorDesc, nullptr, OH_NN_FAILED, "Create TensorDesc failed.");

        int32_t activationDims = 1;
        returnCode = OH_NNTensorDesc_SetShape(tensorDesc, &activationDims, 1);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc shape failed.");

        returnCode = OH_NNTensorDesc_SetDataType(tensorDesc, OH_NN_INT8);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc data type failed.");

        returnCode = OH_NNTensorDesc_SetFormat(tensorDesc, OH_NN_FORMAT_NONE);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc format failed.");

        returnCode = OH_NNModel_AddTensorToModel(model, tensorDesc);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Add third TensorDesc to model failed.");

        returnCode = OH_NNModel_SetTensorType(model, 2, OH_NN_ADD_ACTIVATIONTYPE);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set model tensor type failed.");

        // Set the type of the activation function to OH_NN_FUSED_NONE, indicating that no activation function is added to the operator.
        int8_t activationValue = OH_NN_FUSED_NONE;
        returnCode = OH_NNModel_SetTensorData(model, 2, &activationValue, sizeof(int8_t));
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set model tensor data failed.");

        // Add the output tensor of the float32 type for the Add operator. The tensor shape is [1, 2, 2, 3].
        tensorDesc = OH_NNTensorDesc_Create();
        CHECKEQ(tensorDesc, nullptr, OH_NN_FAILED, "Create TensorDesc failed.");

        returnCode = OH_NNTensorDesc_SetShape(tensorDesc, inputDims, 4);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc shape failed.");

        returnCode = OH_NNTensorDesc_SetDataType(tensorDesc, OH_NN_FLOAT32);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc data type failed.");

        returnCode = OH_NNTensorDesc_SetFormat(tensorDesc, OH_NN_FORMAT_NONE);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc format failed.");

        returnCode = OH_NNModel_AddTensorToModel(model, tensorDesc);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Add fourth TensorDesc to model failed.");

        returnCode = OH_NNModel_SetTensorType(model, 3, OH_NN_TENSOR);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set model tensor type failed.");

        // Specify index values of the input tensors, parameter tensor, and output tensor for the Add operator.
        uint32_t inputIndicesValues[2] = {0, 1};
        uint32_t paramIndicesValues = 2;
        uint32_t outputIndicesValues = 3;
        OH_NN_UInt32Array paramIndices = {&paramIndicesValues, 1};
        OH_NN_UInt32Array inputIndices = {inputIndicesValues, 2};
        OH_NN_UInt32Array outputIndices = {&outputIndicesValues, 1};

        // Add the Add operator to the model instance.
        returnCode = OH_NNModel_AddOperation(model, OH_NN_OPS_ADD, &paramIndices, &inputIndices, &outputIndices);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Add operation to model failed.");

        // Set the index values of the input tensors and output tensor for the model instance.
        returnCode = OH_NNModel_SpecifyInputsAndOutputs(model, &inputIndices, &outputIndices);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Specify model inputs and outputs failed.");

        // Complete the model instance construction.
        returnCode = OH_NNModel_Finish(model);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Build model failed.");

        // Return the model instance.
        *pmodel = model;
        return OH_NN_SUCCESS;
    }
    ```

5. Query the AI acceleration chips connected to NNRt.

    NNRt can connect to multiple AI acceleration chips through HDIs. Before model building, you need to query the AI acceleration chips connected to NNRt on the current device. Each AI acceleration chip has a unique ID. In the compilation phase, you need to specify the chip for model compilation based on the ID.
    ```cpp
    void GetAvailableDevices(std::vector<size_t>& availableDevice)
    {
        availableDevice.clear();

        // Obtain the available hardware IDs.
        const size_t* devices = nullptr;
        uint32_t deviceCount = 0;
        OH_NN_ReturnCode ret = OH_NNDevice_GetAllDevicesID(&devices, &deviceCount);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "GetAllDevicesID failed, get no available device." << std::endl;
            return;
        }

        for (uint32_t i = 0; i < deviceCount; i++) {
            availableDevice.emplace_back(devices[i]);
        }
    }
    ```

6. Compile the model on the specified device.

    NNRt uses an abstract model representation to describe the topology of an AI model. Before inference can run on an AI acceleration chip, the build module provided by NNRt delivers this abstract representation to the chip driver layer, where it is converted into a format that supports inference and computing.
    ```cpp
    OH_NN_ReturnCode CreateCompilation(OH_NNModel* model, const std::vector<size_t>& availableDevice,
                                       OH_NNCompilation** pCompilation)
    {
        // Create an OH_NNCompilation instance and pass the constructed model instance or the MindSpore Lite model instance to it.
        OH_NNCompilation* compilation = OH_NNCompilation_Construct(model);
        CHECKEQ(compilation, nullptr, OH_NN_FAILED, "OH_NNCompilation_Construct failed.");

        // Set compilation options, such as the compilation hardware, cache path, performance mode, computing priority, and whether to enable float16 low-precision computing.
        // Perform model compilation on the first device.
        auto returnCode = OH_NNCompilation_SetDevice(compilation, availableDevice[0]);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNCompilation_SetDevice failed.");

        // Cache the model compilation result in the /data/local/tmp directory, with the version number set to 1.
        returnCode = OH_NNCompilation_SetCache(compilation, "/data/local/tmp", 1);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNCompilation_SetCache failed.");

        // Set the performance mode of the device.
        returnCode = OH_NNCompilation_SetPerformanceMode(compilation, OH_NN_PERFORMANCE_EXTREME);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNCompilation_SetPerformanceMode failed.");

        // Set the inference priority.
        returnCode = OH_NNCompilation_SetPriority(compilation, OH_NN_PRIORITY_HIGH);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNCompilation_SetPriority failed.");

        // Specify whether to enable FP16 computing.
        returnCode = OH_NNCompilation_EnableFloat16(compilation, false);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNCompilation_EnableFloat16 failed.");

        // Perform model building.
        returnCode = OH_NNCompilation_Build(compilation);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNCompilation_Build failed.");

        *pCompilation = compilation;
        return OH_NN_SUCCESS;
    }
    ```

7. Create an executor.

    After the model building is complete, you need to call the NNRt execution module to create an executor. In the inference phase, operations such as setting the model input, obtaining the model output, and triggering inference computing are performed through the executor.
    ```cpp
    OH_NNExecutor* CreateExecutor(OH_NNCompilation* compilation)
    {
        // Create an executor based on the specified OH_NNCompilation instance.
        OH_NNExecutor *executor = OH_NNExecutor_Construct(compilation);
        CHECKEQ(executor, nullptr, nullptr, "OH_NNExecutor_Construct failed.");
        return executor;
    }
    ```

8. Perform inference computing, and print the inference result.

    Pass the input data required for inference to the executor through the APIs provided by the execution module, trigger a round of inference computing, and then obtain and print the result.
    ```cpp
    OH_NN_ReturnCode Run(OH_NNExecutor* executor, const std::vector<size_t>& availableDevice)
    {
        // Obtain information about the input and output tensors from the executor.
        // Obtain the number of input tensors.
        size_t inputCount = 0;
        auto returnCode = OH_NNExecutor_GetInputCount(executor, &inputCount);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNExecutor_GetInputCount failed.");
        std::vector<NN_TensorDesc*> inputTensorDescs;
        NN_TensorDesc* tensorDescTmp = nullptr;
        for (size_t i = 0; i < inputCount; ++i) {
            // Create the description of the input tensor.
            tensorDescTmp = OH_NNExecutor_CreateInputTensorDesc(executor, i);
            CHECKEQ(tensorDescTmp, nullptr, OH_NN_FAILED, "OH_NNExecutor_CreateInputTensorDesc failed.");
            inputTensorDescs.emplace_back(tensorDescTmp);
        }
        // Obtain the number of output tensors.
        size_t outputCount = 0;
        returnCode = OH_NNExecutor_GetOutputCount(executor, &outputCount);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNExecutor_GetOutputCount failed.");
        std::vector<NN_TensorDesc*> outputTensorDescs;
        for (size_t i = 0; i < outputCount; ++i) {
            // Create the description of the output tensor.
            tensorDescTmp = OH_NNExecutor_CreateOutputTensorDesc(executor, i);
            CHECKEQ(tensorDescTmp, nullptr, OH_NN_FAILED, "OH_NNExecutor_CreateOutputTensorDesc failed.");
            outputTensorDescs.emplace_back(tensorDescTmp);
        }

        // Create input and output tensors.
        NN_Tensor* inputTensors[inputCount];
        NN_Tensor* tensor = nullptr;
        for (size_t i = 0; i < inputCount; ++i) {
            tensor = nullptr;
            tensor = OH_NNTensor_Create(availableDevice[0], inputTensorDescs[i]);
            CHECKEQ(tensor, nullptr, OH_NN_FAILED, "OH_NNTensor_Create failed.");
            inputTensors[i] = tensor;
        }
        NN_Tensor* outputTensors[outputCount];
        for (size_t i = 0; i < outputCount; ++i) {
            tensor = nullptr;
            tensor = OH_NNTensor_Create(availableDevice[0], outputTensorDescs[i]);
            CHECKEQ(tensor, nullptr, OH_NN_FAILED, "OH_NNTensor_Create failed.");
            outputTensors[i] = tensor;
        }

        // Set the data of the input tensor.
        returnCode = SetInputData(inputTensors, inputCount);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "SetInputData failed.");

        // Perform inference.
        returnCode = OH_NNExecutor_RunSync(executor, inputTensors, inputCount, outputTensors, outputCount);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNExecutor_RunSync failed.");

        // Print the data of the output tensor.
        Print(outputTensors, outputCount);

        // Clear the input and output tensors and tensor descriptions.
        for (size_t i = 0; i < inputCount; ++i) {
            returnCode = OH_NNTensor_Destroy(&inputTensors[i]);
            CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNTensor_Destroy failed.");
            returnCode = OH_NNTensorDesc_Destroy(&inputTensorDescs[i]);
            CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNTensorDesc_Destroy failed.");
        }
        for (size_t i = 0; i < outputCount; ++i) {
            returnCode = OH_NNTensor_Destroy(&outputTensors[i]);
            CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNTensor_Destroy failed.");
            returnCode = OH_NNTensorDesc_Destroy(&outputTensorDescs[i]);
            CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNTensorDesc_Destroy failed.");
        }

        return OH_NN_SUCCESS;
    }
    ```

9. Build an end-to-end process from model construction to model compilation and execution.

    Steps 4 to 8 implement the model construction, compilation, and execution processes and encapsulate them into separate functions to facilitate modular development. The following sample code shows how to combine these functions into a complete NNRt development process.
    ```cpp
    int main(int argc, char** argv)
    {
        OH_NNModel* model = nullptr;
        OH_NNCompilation* compilation = nullptr;
        OH_NNExecutor* executor = nullptr;
        std::vector<size_t> availableDevices;

        // Construct a model.
        OH_NN_ReturnCode ret = BuildModel(&model);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed." << std::endl;
            OH_NNModel_Destroy(&model);
            return -1;
        }

        // Obtain the available devices.
        GetAvailableDevices(availableDevices);
        if (availableDevices.empty()) {
            std::cout << "No available device." << std::endl;
            OH_NNModel_Destroy(&model);
            return -1;
        }

        // Build the model.
        ret = CreateCompilation(model, availableDevices, &compilation);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed." << std::endl;
            OH_NNModel_Destroy(&model);
            OH_NNCompilation_Destroy(&compilation);
            return -1;
        }

        // Destroy the model instance.
        OH_NNModel_Destroy(&model);

        // Create an inference executor for the model.
        executor = CreateExecutor(compilation);
        if (executor == nullptr) {
            std::cout << "CreateExecutor failed, no executor is created." << std::endl;
            OH_NNCompilation_Destroy(&compilation);
            return -1;
        }

        // Destroy the model building instance.
        OH_NNCompilation_Destroy(&compilation);

        // Use the created executor to perform inference.
        ret = Run(executor, availableDevices);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed." << std::endl;
            OH_NNExecutor_Destroy(&executor);
            return -1;
        }

        // Destroy the executor instance.
        OH_NNExecutor_Destroy(&executor);

        return 0;
    }
    ```

## Verification

1. Prepare the compilation configuration file of the application sample.

    Create a `CMakeLists.txt` file, and add compilation configurations for the application sample file `nnrt_example.cpp`. The following is a simple example of the `CMakeLists.txt` file:
    ```text
    cmake_minimum_required(VERSION 3.16)
    project(nnrt_example C CXX)

    add_executable(nnrt_example
        ./nnrt_example.cpp
    )

    target_link_libraries(nnrt_example
        neural_network_runtime
        neural_network_core
    )
    ```

2. Compile the application sample.

    Create the **build/** directory in the current directory, and compile `nnrt_example.cpp` in the **build/** directory to obtain the binary file `nnrt_example`:
    ```shell
    mkdir build && cd build
    cmake -DCMAKE_TOOLCHAIN_FILE={Path of the cross-compilation toolchain}/build/cmake/ohos.toolchain.cmake -DOHOS_ARCH=arm64-v8a -DOHOS_PLATFORM=OHOS -DOHOS_STL=c++_static ..
    make
    ```

3. Push the application sample to the device for execution.
    ```shell
    # Push the nnrt_example binary obtained through compilation to the device, and execute it.
    hdc_std file send ./nnrt_example /data/local/tmp/.

    # Grant required permissions to the executable file of the test case.
    hdc_std shell "chmod +x /data/local/tmp/nnrt_example"

    # Execute the test case.
    hdc_std shell "/data/local/tmp/nnrt_example"
    ```

    If the execution is normal, information similar to the following is displayed:
    ```text
    Output index: 0, value is: 0.000000.
    Output index: 1, value is: 2.000000.
    Output index: 2, value is: 4.000000.
    Output index: 3, value is: 6.000000.
    Output index: 4, value is: 8.000000.
    Output index: 5, value is: 10.000000.
    Output index: 6, value is: 12.000000.
    Output index: 7, value is: 14.000000.
    Output index: 8, value is: 16.000000.
    Output index: 9, value is: 18.000000.
    Output index: 10, value is: 20.000000.
    Output index: 11, value is: 22.000000.
    ```

4. (Optional) Check the model cache.

    If the HDI service connected to NNRt supports the model cache function, you can find the generated cache files in the `/data/local/tmp` directory after `nnrt_example` is executed successfully.

    > **NOTE**
    >
    > The IR graphs of the model need to be passed to the hardware driver layer, so that the HDI service can compile them into a computing graph dedicated to the hardware. This compilation is time-consuming. NNRt therefore supports a computing graph cache feature: the computing graphs compiled by the HDI service can be cached in the device storage. If the same model is compiled on the same acceleration chip again, you can specify the cache path so that NNRt directly loads the computing graphs from the cache file, reducing the compilation time.

    Check the cached files in the cache directory.
    ```shell
    ls /data/local/tmp
    ```

    The command output is as follows:
    ```text
    0.nncache 1.nncache 2.nncache cache_info.nncache
    ```

    If the cache is no longer used, manually delete the cache files.
    ```shell
    rm /data/local/tmp/*nncache
    ```
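
    On a later run, the cached build result can be loaded back instead of rebuilding the model. The following is a minimal sketch of this flow based on the model compilation APIs listed above, assuming `deviceID` was obtained through OH_NNDevice_GetAllDevicesID as in step 5 of the development process and that the cache directory and version match the ones used when the cache was generated.
    ```cpp
    // Recover the build result from the cache directory; no OH_NNModel is needed.
    OH_NNCompilation* compilation = OH_NNCompilation_ConstructForCache();
    if (compilation != nullptr) {
        // Use the same device, cache directory, and version as the run that produced the cache.
        if (OH_NNCompilation_SetDevice(compilation, deviceID) == OH_NN_SUCCESS &&
            OH_NNCompilation_SetCache(compilation, "/data/local/tmp", 1) == OH_NN_SUCCESS &&
            OH_NNCompilation_Build(compilation) == OH_NN_SUCCESS) {
            // The compilation is ready; create an executor from it as in step 7.
            OH_NNExecutor* executor = OH_NNExecutor_Construct(compilation);
            if (executor != nullptr) {
                // ... run inference as in step 8, then destroy the executor.
                OH_NNExecutor_Destroy(&executor);
            }
        }
        OH_NNCompilation_Destroy(&compilation);
    }
    ```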