# Connecting NNRt to an AI Inference Framework

## When to Use

As a bridge between the AI inference engine and the acceleration chip, Neural Network Runtime (NNRt) provides simplified native APIs that allow the AI inference engine to perform end-to-end inference through the acceleration chip.

This topic uses the `Add` single-operator model shown in Figure 1 as an example to describe the NNRt development process. The `Add` operator involves two inputs, one parameter, and one output. The `activation` parameter specifies the type of the activation function applied in the `Add` operator.

**Figure 1** Add single-operator model<br>


## Preparing the Environment

### Environment Requirements

The environment requirements for NNRt are as follows:

- Development environment: Ubuntu 18.04 or later.
- Access device: a standard device whose built-in hardware accelerator driver has been connected to NNRt.

NNRt is exposed to external systems through native APIs. Therefore, you need to use the native development suite to build NNRt applications. You can download the ohos-sdk package of the required version from the daily build in the OpenHarmony community and then decompress the package to obtain the native development suite of the corresponding platform. Take Linux as an example. The package of the native development suite is named `native-linux-{version number}.zip`.

### Environment Setup

1. Start the Ubuntu server.
2. Copy the downloaded package of the native development suite to the root directory of the current user.
3. Decompress the package of the native development suite.

    ```shell
    unzip native-linux-{version number}.zip
    ```

    The directory structure after decompression is as follows. The content in the directory may vary depending on the version. Use the native APIs of the latest version.

    ```text
    native/
    ├── build                          // Cross-compilation toolchain
    ├── build-tools                    // Compilation and build tools
    ├── docs
    ├── llvm
    ├── nativeapi_syscap_config.json
    ├── ndk_system_capability.json
    ├── NOTICE.txt
    ├── oh-uni-package.json
    └── sysroot                        // Native API header files and libraries
    ```

## Available APIs

This section describes the common APIs used in the NNRt development process.

### Structs

| Name | Description |
| --------- | ---- |
| typedef struct OH_NNModel OH_NNModel | Model handle of NNRt. It is used to construct a model. |
| typedef struct OH_NNCompilation OH_NNCompilation | Compiler handle of NNRt. It is used to compile an AI model. |
| typedef struct OH_NNExecutor OH_NNExecutor | Executor handle of NNRt. It is used to perform inference computing on a specified device. |
| typedef struct NN_QuantParam NN_QuantParam | Quantization parameter handle, which is used to specify the quantization parameters of a tensor during model construction. |
| typedef struct NN_TensorDesc NN_TensorDesc | Tensor description handle, which is used to describe tensor attributes, such as the data format, data type, and shape. |
| typedef struct NN_Tensor NN_Tensor | Tensor handle, which is used to set the inference input and output tensors of the executor. |
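
The structs above map onto the three phases of the development process: a model handle is constructed first, compiled into a compilation handle, and then wrapped by an executor handle that drives inference with tensor handles. The following minimal sketch, with all intermediate steps and error handling omitted, only illustrates how the handles relate to one another; the complete flow is shown in the development walkthrough below.

```cpp
#include "neural_network_runtime/neural_network_runtime.h"

void HandleLifecycleSketch()
{
    // Phase 1: construct the model (add tensors and operators, then finish composition).
    OH_NNModel* model = OH_NNModel_Construct();

    // Phase 2: compile the model (set the device and build options, then build).
    OH_NNCompilation* compilation = OH_NNCompilation_Construct(model);

    // Phase 3: create the executor, prepare NN_Tensor inputs and outputs, and run inference.
    OH_NNExecutor* executor = OH_NNExecutor_Construct(compilation);

    // Destroy the instances once they are no longer needed.
    OH_NNExecutor_Destroy(&executor);
    OH_NNCompilation_Destroy(&compilation);
    OH_NNModel_Destroy(&model);
}
```
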
### Model Construction APIs

| Name | Description |
| ------- | --- |
| OH_NNModel_Construct() | Creates a model instance of the OH_NNModel type. |
| OH_NN_ReturnCode OH_NNModel_AddTensorToModel(OH_NNModel *model, const NN_TensorDesc *tensorDesc) | Adds a tensor to a model instance. |
| OH_NN_ReturnCode OH_NNModel_SetTensorData(OH_NNModel *model, uint32_t index, const void *dataBuffer, size_t length) | Sets the tensor value. |
| OH_NN_ReturnCode OH_NNModel_AddOperation(OH_NNModel *model, OH_NN_OperationType op, const OH_NN_UInt32Array *paramIndices, const OH_NN_UInt32Array *inputIndices, const OH_NN_UInt32Array *outputIndices) | Adds an operator to a model instance. |
| OH_NN_ReturnCode OH_NNModel_SpecifyInputsAndOutputs(OH_NNModel *model, const OH_NN_UInt32Array *inputIndices, const OH_NN_UInt32Array *outputIndices) | Specifies the index values of the input and output tensors of a model. |
| OH_NN_ReturnCode OH_NNModel_Finish(OH_NNModel *model) | Completes model composition. |
| void OH_NNModel_Destroy(OH_NNModel **model) | Destroys a model instance. |

### Model Compilation APIs

| Name | Description |
| ------- | --- |
| OH_NNCompilation *OH_NNCompilation_Construct(const OH_NNModel *model) | Creates an **OH_NNCompilation** instance based on the specified model instance. |
| OH_NNCompilation *OH_NNCompilation_ConstructWithOfflineModelFile(const char *modelPath) | Creates an **OH_NNCompilation** instance based on the specified offline model file path. |
| OH_NNCompilation *OH_NNCompilation_ConstructWithOfflineModelBuffer(const void *modelBuffer, size_t modelSize) | Creates an **OH_NNCompilation** instance based on the specified offline model buffer. |
| OH_NNCompilation *OH_NNCompilation_ConstructForCache() | Creates an empty model building instance for later recovery from the model cache. |
| OH_NN_ReturnCode OH_NNCompilation_ExportCacheToBuffer(OH_NNCompilation *compilation, const void *buffer, size_t length, size_t *modelSize) | Writes the model cache to the specified buffer. |
| OH_NN_ReturnCode OH_NNCompilation_ImportCacheFromBuffer(OH_NNCompilation *compilation, const void *buffer, size_t modelSize) | Reads the model cache from the specified buffer. |
| OH_NN_ReturnCode OH_NNCompilation_AddExtensionConfig(OH_NNCompilation *compilation, const char *configName, const void *configValue, const size_t configValueSize) | Adds extended configurations for custom device attributes. For details about the extended attribute names and values, see the documentation provided with the device. |
| OH_NN_ReturnCode OH_NNCompilation_SetDevice(OH_NNCompilation *compilation, size_t deviceID) | Sets the device used for model building and computing. The device ID can be obtained through the device management APIs. |
| OH_NN_ReturnCode OH_NNCompilation_SetCache(OH_NNCompilation *compilation, const char *cachePath, uint32_t version) | Sets the cache directory and version for model building. |
| OH_NN_ReturnCode OH_NNCompilation_SetPerformanceMode(OH_NNCompilation *compilation, OH_NN_PerformanceMode performanceMode) | Sets the performance mode for model computing. |
| OH_NN_ReturnCode OH_NNCompilation_SetPriority(OH_NNCompilation *compilation, OH_NN_Priority priority) | Sets the priority for model computing. |
| OH_NN_ReturnCode OH_NNCompilation_EnableFloat16(OH_NNCompilation *compilation, bool enableFloat16) | Enables float16 computing. |
| OH_NN_ReturnCode OH_NNCompilation_Build(OH_NNCompilation *compilation) | Performs model building. |
| void OH_NNCompilation_Destroy(OH_NNCompilation **compilation) | Destroys a model building instance. |
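
If the device vendor provides an offline model, an **OH_NNCompilation** instance can also be created directly from the offline model file or buffer instead of from an **OH_NNModel** instance. The sketch below shows the file-based variant; the file path is a hypothetical placeholder, and offline model support depends on the device.

```cpp
// Minimal sketch: build a compilation from an offline model file (the path is a placeholder).
OH_NN_ReturnCode CompileOfflineModel(size_t deviceID, OH_NNCompilation** pCompilation)
{
    OH_NNCompilation* compilation =
        OH_NNCompilation_ConstructWithOfflineModelFile("/data/local/tmp/offline_model.bin");
    if (compilation == nullptr) {
        return OH_NN_FAILED;
    }

    // Bind the compilation to the target device and build it.
    OH_NN_ReturnCode ret = OH_NNCompilation_SetDevice(compilation, deviceID);
    if (ret == OH_NN_SUCCESS) {
        ret = OH_NNCompilation_Build(compilation);
    }
    if (ret != OH_NN_SUCCESS) {
        OH_NNCompilation_Destroy(&compilation);
        return ret;
    }

    *pCompilation = compilation;
    return OH_NN_SUCCESS;
}
```
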
### Tensor Description APIs

| Name | Description |
| ------- | --- |
| NN_TensorDesc *OH_NNTensorDesc_Create() | Creates an **NN_TensorDesc** instance for creating an **NN_Tensor** instance at a later time. |
| OH_NN_ReturnCode OH_NNTensorDesc_SetName(NN_TensorDesc *tensorDesc, const char *name) | Sets the name of the **NN_TensorDesc** instance. |
| OH_NN_ReturnCode OH_NNTensorDesc_GetName(const NN_TensorDesc *tensorDesc, const char **name) | Obtains the name of the **NN_TensorDesc** instance. |
| OH_NN_ReturnCode OH_NNTensorDesc_SetDataType(NN_TensorDesc *tensorDesc, OH_NN_DataType dataType) | Sets the data type of the **NN_TensorDesc** instance. |
| OH_NN_ReturnCode OH_NNTensorDesc_GetDataType(const NN_TensorDesc *tensorDesc, OH_NN_DataType *dataType) | Obtains the data type of the **NN_TensorDesc** instance. |
| OH_NN_ReturnCode OH_NNTensorDesc_SetShape(NN_TensorDesc *tensorDesc, const int32_t *shape, size_t shapeLength) | Sets the shape of the **NN_TensorDesc** instance. |
| OH_NN_ReturnCode OH_NNTensorDesc_GetShape(const NN_TensorDesc *tensorDesc, int32_t **shape, size_t *shapeLength) | Obtains the shape of the **NN_TensorDesc** instance. |
| OH_NN_ReturnCode OH_NNTensorDesc_SetFormat(NN_TensorDesc *tensorDesc, OH_NN_Format format) | Sets the data format of the **NN_TensorDesc** instance. |
| OH_NN_ReturnCode OH_NNTensorDesc_GetFormat(const NN_TensorDesc *tensorDesc, OH_NN_Format *format) | Obtains the data format of the **NN_TensorDesc** instance. |
| OH_NN_ReturnCode OH_NNTensorDesc_GetElementCount(const NN_TensorDesc *tensorDesc, size_t *elementCount) | Obtains the number of elements in the **NN_TensorDesc** instance. |
| OH_NN_ReturnCode OH_NNTensorDesc_GetByteSize(const NN_TensorDesc *tensorDesc, size_t *byteSize) | Calculates the number of bytes occupied by the tensor data, based on the shape and data type of the **NN_TensorDesc** instance. |
| OH_NN_ReturnCode OH_NNTensorDesc_Destroy(NN_TensorDesc **tensorDesc) | Destroys an **NN_TensorDesc** instance. |
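
As a brief illustration of these APIs, the following sketch describes a float32 tensor with shape [1, 2, 2, 3] and queries the buffer size its data would occupy. The attribute values are examples only.

```cpp
// Minimal sketch: create a tensor description and query its byte size.
OH_NN_ReturnCode DescribeTensorSketch()
{
    NN_TensorDesc* desc = OH_NNTensorDesc_Create();
    if (desc == nullptr) {
        return OH_NN_FAILED;
    }

    int32_t shape[4] = {1, 2, 2, 3};
    OH_NN_ReturnCode ret = OH_NNTensorDesc_SetShape(desc, shape, 4);
    if (ret == OH_NN_SUCCESS) {
        ret = OH_NNTensorDesc_SetDataType(desc, OH_NN_FLOAT32);
    }
    if (ret == OH_NN_SUCCESS) {
        ret = OH_NNTensorDesc_SetFormat(desc, OH_NN_FORMAT_NONE);
    }

    // For this shape and data type, byteSize is expected to be 1 * 2 * 2 * 3 * sizeof(float) = 48.
    size_t byteSize = 0;
    if (ret == OH_NN_SUCCESS) {
        ret = OH_NNTensorDesc_GetByteSize(desc, &byteSize);
    }

    OH_NNTensorDesc_Destroy(&desc);
    return ret;
}
```
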
### Tensor APIs

| Name | Description |
| ------- | --- |
| NN_Tensor* OH_NNTensor_Create(size_t deviceID, NN_TensorDesc *tensorDesc) | Creates an **NN_Tensor** instance based on the specified tensor description. This API allocates device shared memory. |
| NN_Tensor* OH_NNTensor_CreateWithSize(size_t deviceID, NN_TensorDesc *tensorDesc, size_t size) | Creates an **NN_Tensor** instance based on the specified memory size and tensor description. This API allocates device shared memory. |
| NN_Tensor* OH_NNTensor_CreateWithFd(size_t deviceID, NN_TensorDesc *tensorDesc, int fd, size_t size, size_t offset) | Creates an **NN_Tensor** instance based on the specified file descriptor of the shared memory and the tensor description. This allows the device shared memory of another tensor to be reused. |
| NN_TensorDesc* OH_NNTensor_GetTensorDesc(const NN_Tensor *tensor) | Obtains the pointer to the **NN_TensorDesc** instance of a tensor to read tensor attributes, such as the data type and shape. |
| void* OH_NNTensor_GetDataBuffer(const NN_Tensor *tensor) | Obtains the memory address of tensor data to read or write tensor data. |
| OH_NN_ReturnCode OH_NNTensor_GetFd(const NN_Tensor *tensor, int *fd) | Obtains the file descriptor of the shared memory where the tensor data is located. A file descriptor corresponds to one device shared memory block. |
| OH_NN_ReturnCode OH_NNTensor_GetSize(const NN_Tensor *tensor, size_t *size) | Obtains the size of the shared memory where the tensor data is located. |
| OH_NN_ReturnCode OH_NNTensor_GetOffset(const NN_Tensor *tensor, size_t *offset) | Obtains the offset of the tensor data in the shared memory. The size available to the tensor data is the size of the shared memory minus the offset. |
| OH_NN_ReturnCode OH_NNTensor_Destroy(NN_Tensor **tensor) | Destroys an **NN_Tensor** instance. |
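
The following sketch shows how these APIs might be combined so that a new tensor reuses the shared memory already allocated for an existing tensor instead of allocating a new block. It illustrates the API surface only; whether two tensors may safely share one memory block depends on the model and device.

```cpp
// Minimal sketch: create a tensor on the shared memory that backs an existing tensor.
// srcTensor and desc are assumed to be valid; deviceID identifies the target device.
NN_Tensor* CreateTensorSharingMemory(size_t deviceID, NN_TensorDesc* desc, NN_Tensor* srcTensor)
{
    int fd = -1;
    size_t size = 0;
    size_t offset = 0;

    // Query the shared-memory block that backs the existing tensor.
    if (OH_NNTensor_GetFd(srcTensor, &fd) != OH_NN_SUCCESS ||
        OH_NNTensor_GetSize(srcTensor, &size) != OH_NN_SUCCESS ||
        OH_NNTensor_GetOffset(srcTensor, &offset) != OH_NN_SUCCESS) {
        return nullptr;
    }

    // Create a new tensor on the same shared memory instead of allocating a new block.
    return OH_NNTensor_CreateWithFd(deviceID, desc, fd, size, offset);
}
```
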
### Inference APIs

| Name | Description |
| ------- | --- |
| OH_NNExecutor *OH_NNExecutor_Construct(OH_NNCompilation *compilation) | Creates an **OH_NNExecutor** instance. |
| OH_NN_ReturnCode OH_NNExecutor_GetOutputShape(OH_NNExecutor *executor, uint32_t outputIndex, int32_t **shape, uint32_t *shapeLength) | Obtains the dimension information about the output tensor. This API is applicable only if the output tensor has a dynamic shape. |
| OH_NN_ReturnCode OH_NNExecutor_GetInputCount(const OH_NNExecutor *executor, size_t *inputCount) | Obtains the number of input tensors. |
| OH_NN_ReturnCode OH_NNExecutor_GetOutputCount(const OH_NNExecutor *executor, size_t *outputCount) | Obtains the number of output tensors. |
| NN_TensorDesc* OH_NNExecutor_CreateInputTensorDesc(const OH_NNExecutor *executor, size_t index) | Creates an **NN_TensorDesc** instance for the input tensor with the specified index. The instance is used to read tensor attributes or create **NN_Tensor** instances. |
| NN_TensorDesc* OH_NNExecutor_CreateOutputTensorDesc(const OH_NNExecutor *executor, size_t index) | Creates an **NN_TensorDesc** instance for the output tensor with the specified index. The instance is used to read tensor attributes or create **NN_Tensor** instances. |
| OH_NN_ReturnCode OH_NNExecutor_GetInputDimRange(const OH_NNExecutor *executor, size_t index, size_t **minInputDims, size_t **maxInputDims, size_t *shapeLength) | Obtains the dimension range of all input tensors. If an input tensor has a dynamic shape, the dimension range supported by the tensor may vary by device. |
| OH_NN_ReturnCode OH_NNExecutor_SetOnRunDone(OH_NNExecutor *executor, NN_OnRunDone onRunDone) | Sets the callback function invoked when asynchronous inference ends. For the definition of the callback function, see the *API Reference*. |
| OH_NN_ReturnCode OH_NNExecutor_SetOnServiceDied(OH_NNExecutor *executor, NN_OnServiceDied onServiceDied) | Sets the callback function invoked when the device driver service terminates unexpectedly during asynchronous inference. For the definition of the callback function, see the *API Reference*. |
| OH_NN_ReturnCode OH_NNExecutor_RunSync(OH_NNExecutor *executor, NN_Tensor *inputTensor[], size_t inputCount, NN_Tensor *outputTensor[], size_t outputCount) | Performs synchronous inference. |
| OH_NN_ReturnCode OH_NNExecutor_RunAsync(OH_NNExecutor *executor, NN_Tensor *inputTensor[], size_t inputCount, NN_Tensor *outputTensor[], size_t outputCount, int32_t timeout, void *userData) | Performs asynchronous inference. |
| void OH_NNExecutor_Destroy(OH_NNExecutor **executor) | Destroys an **OH_NNExecutor** instance. |
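
For models with dynamic shapes, the executor can report the dimension range accepted by each input and, after inference, the actual shape produced for each output. The sketch below illustrates both queries for index 0; it assumes `<iostream>` is included and that the executor has already run when the output shape is queried.

```cpp
// Minimal sketch: query the dimension range of input 0 and the actual shape of output 0.
OH_NN_ReturnCode QueryDynamicShapes(OH_NNExecutor* executor)
{
    // Dimension range supported by input 0 (meaningful for dynamic-shape inputs).
    size_t* minInputDims = nullptr;
    size_t* maxInputDims = nullptr;
    size_t shapeLength = 0;
    OH_NN_ReturnCode ret =
        OH_NNExecutor_GetInputDimRange(executor, 0, &minInputDims, &maxInputDims, &shapeLength);
    if (ret != OH_NN_SUCCESS) {
        return ret;
    }
    for (size_t i = 0; i < shapeLength; ++i) {
        std::cout << "Input 0, dimension " << i << ": [" << minInputDims[i] << ", "
                  << maxInputDims[i] << "]" << std::endl;
    }

    // Actual shape of output 0, available after OH_NNExecutor_RunSync or OH_NNExecutor_RunAsync.
    int32_t* outputShape = nullptr;
    uint32_t outputShapeLength = 0;
    ret = OH_NNExecutor_GetOutputShape(executor, 0, &outputShape, &outputShapeLength);
    if (ret != OH_NN_SUCCESS) {
        return ret;
    }
    for (uint32_t i = 0; i < outputShapeLength; ++i) {
        std::cout << "Output 0, dimension " << i << ": " << outputShape[i] << std::endl;
    }
    return OH_NN_SUCCESS;
}
```
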
### Device Management APIs

| Name | Description |
| ------- | --- |
| OH_NN_ReturnCode OH_NNDevice_GetAllDevicesID(const size_t **allDevicesID, uint32_t *deviceCount) | Obtains the IDs of the devices connected to NNRt. |
| OH_NN_ReturnCode OH_NNDevice_GetName(size_t deviceID, const char **name) | Obtains the name of the specified device. |
| OH_NN_ReturnCode OH_NNDevice_GetType(size_t deviceID, OH_NN_DeviceType *deviceType) | Obtains the type of the specified device. |
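
In addition to obtaining the device IDs, you can query the name and type of each device, for example to select a specific accelerator before compilation. The following sketch assumes `<iostream>` is included and uses the OH_NN_OTHERS enumerator only as an initial value.

```cpp
// Minimal sketch: list the devices connected to NNRt with their names and types.
OH_NN_ReturnCode ListDevices()
{
    const size_t* devices = nullptr;
    uint32_t deviceCount = 0;
    OH_NN_ReturnCode ret = OH_NNDevice_GetAllDevicesID(&devices, &deviceCount);
    if (ret != OH_NN_SUCCESS) {
        return ret;
    }

    for (uint32_t i = 0; i < deviceCount; ++i) {
        const char* name = nullptr;
        OH_NN_DeviceType type = OH_NN_OTHERS;
        if (OH_NNDevice_GetName(devices[i], &name) == OH_NN_SUCCESS &&
            OH_NNDevice_GetType(devices[i], &type) == OH_NN_SUCCESS) {
            std::cout << "Device " << devices[i] << ": name = " << name
                      << ", type = " << type << std::endl;
        }
    }
    return OH_NN_SUCCESS;
}
```
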
## How to Develop

The development process of NNRt consists of three phases: model construction, model compilation, and inference execution. The following uses the `Add` single-operator model as an example to describe how to call NNRt APIs during application development.

1. Create an application sample file.

    Create the source file of the NNRt application sample. Run the following commands in the project directory to create the `nnrt_example/` directory and the `nnrt_example.cpp` source file in that directory:

    ```shell
    mkdir ~/nnrt_example && cd ~/nnrt_example
    touch nnrt_example.cpp
    ```

2. Import the NNRt module.

    Add the following code at the beginning of the `nnrt_example.cpp` file to import NNRt:

    ```cpp
    #include <iostream>
    #include <cstdarg>
    #include <cstdio>
    #include <vector>
    #include "neural_network_runtime/neural_network_runtime.h"
    ```

3. Define auxiliary functions, such as log printing, input data setting, and data printing.

    ```cpp
    // Macros for checking the return value
    #define CHECKNEQ(realRet, expectRet, retValue, ...) \
        do { \
            if ((realRet) != (expectRet)) { \
                printf(__VA_ARGS__); \
                return (retValue); \
            } \
        } while (0)

    #define CHECKEQ(realRet, expectRet, retValue, ...) \
        do { \
            if ((realRet) == (expectRet)) { \
                printf(__VA_ARGS__); \
                return (retValue); \
            } \
        } while (0)

    // Set the input data for inference.
    OH_NN_ReturnCode SetInputData(NN_Tensor* inputTensor[], size_t inputSize)
    {
        OH_NN_DataType dataType(OH_NN_FLOAT32);
        OH_NN_ReturnCode ret{OH_NN_FAILED};
        size_t elementCount = 0;
        for (size_t i = 0; i < inputSize; ++i) {
            // Obtain the data memory of the tensor.
            auto data = OH_NNTensor_GetDataBuffer(inputTensor[i]);
            CHECKEQ(data, nullptr, OH_NN_FAILED, "Failed to get data buffer.");
            // Obtain the tensor description.
            auto desc = OH_NNTensor_GetTensorDesc(inputTensor[i]);
            CHECKEQ(desc, nullptr, OH_NN_FAILED, "Failed to get desc.");
            // Obtain the data type of the tensor.
            ret = OH_NNTensorDesc_GetDataType(desc, &dataType);
            CHECKNEQ(ret, OH_NN_SUCCESS, OH_NN_FAILED, "Failed to get data type.");
            // Obtain the number of elements in the tensor.
            ret = OH_NNTensorDesc_GetElementCount(desc, &elementCount);
            CHECKNEQ(ret, OH_NN_SUCCESS, OH_NN_FAILED, "Failed to get element count.");
            switch (dataType) {
                case OH_NN_FLOAT32: {
                    float* floatValue = reinterpret_cast<float*>(data);
                    for (size_t j = 0; j < elementCount; ++j) {
                        floatValue[j] = static_cast<float>(j);
                    }
                    break;
                }
                case OH_NN_INT32: {
                    int* intValue = reinterpret_cast<int*>(data);
                    for (size_t j = 0; j < elementCount; ++j) {
                        intValue[j] = static_cast<int>(j);
                    }
                    break;
                }
                default:
                    return OH_NN_FAILED;
            }
        }
        return OH_NN_SUCCESS;
    }

    OH_NN_ReturnCode Print(NN_Tensor* outputTensor[], size_t outputSize)
    {
        OH_NN_DataType dataType(OH_NN_FLOAT32);
        OH_NN_ReturnCode ret{OH_NN_FAILED};
        size_t elementCount = 0;
        for (size_t i = 0; i < outputSize; ++i) {
            auto data = OH_NNTensor_GetDataBuffer(outputTensor[i]);
            CHECKEQ(data, nullptr, OH_NN_FAILED, "Failed to get data buffer.");
            auto desc = OH_NNTensor_GetTensorDesc(outputTensor[i]);
            CHECKEQ(desc, nullptr, OH_NN_FAILED, "Failed to get desc.");
            ret = OH_NNTensorDesc_GetDataType(desc, &dataType);
            CHECKNEQ(ret, OH_NN_SUCCESS, OH_NN_FAILED, "Failed to get data type.");
            ret = OH_NNTensorDesc_GetElementCount(desc, &elementCount);
            CHECKNEQ(ret, OH_NN_SUCCESS, OH_NN_FAILED, "Failed to get element count.");
            switch (dataType) {
                case OH_NN_FLOAT32: {
                    float* floatValue = reinterpret_cast<float*>(data);
                    for (size_t j = 0; j < elementCount; ++j) {
                        std::cout << "Output index: " << j << ", value is: " << floatValue[j] << "." << std::endl;
                    }
                    break;
                }
                case OH_NN_INT32: {
                    int* intValue = reinterpret_cast<int*>(data);
                    for (size_t j = 0; j < elementCount; ++j) {
                        std::cout << "Output index: " << j << ", value is: " << intValue[j] << "." << std::endl;
                    }
                    break;
                }
                default:
                    return OH_NN_FAILED;
            }
        }

        return OH_NN_SUCCESS;
    }
    ```

4. Construct a model.

    Use the model construction APIs to construct the single-operator `Add` model.

    ```cpp
    OH_NN_ReturnCode BuildModel(OH_NNModel** pmodel)
    {
        // Create a model instance and construct the model.
        OH_NNModel* model = OH_NNModel_Construct();
        CHECKEQ(model, nullptr, OH_NN_FAILED, "Create model failed.");

        // Add the first input tensor of the float32 type for the Add operator. The tensor shape is [1, 2, 2, 3].
        NN_TensorDesc* tensorDesc = OH_NNTensorDesc_Create();
        CHECKEQ(tensorDesc, nullptr, OH_NN_FAILED, "Create TensorDesc failed.");

        int32_t inputDims[4] = {1, 2, 2, 3};
        auto returnCode = OH_NNTensorDesc_SetShape(tensorDesc, inputDims, 4);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc shape failed.");

        returnCode = OH_NNTensorDesc_SetDataType(tensorDesc, OH_NN_FLOAT32);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc data type failed.");

        returnCode = OH_NNTensorDesc_SetFormat(tensorDesc, OH_NN_FORMAT_NONE);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc format failed.");

        returnCode = OH_NNModel_AddTensorToModel(model, tensorDesc);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Add first TensorDesc to model failed.");

        returnCode = OH_NNModel_SetTensorType(model, 0, OH_NN_TENSOR);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set model tensor type failed.");

        // Add the second input tensor of the float32 type for the Add operator. The tensor shape is [1, 2, 2, 3].
        tensorDesc = OH_NNTensorDesc_Create();
        CHECKEQ(tensorDesc, nullptr, OH_NN_FAILED, "Create TensorDesc failed.");

        returnCode = OH_NNTensorDesc_SetShape(tensorDesc, inputDims, 4);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc shape failed.");

        returnCode = OH_NNTensorDesc_SetDataType(tensorDesc, OH_NN_FLOAT32);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc data type failed.");

        returnCode = OH_NNTensorDesc_SetFormat(tensorDesc, OH_NN_FORMAT_NONE);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc format failed.");

        returnCode = OH_NNModel_AddTensorToModel(model, tensorDesc);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Add second TensorDesc to model failed.");

        returnCode = OH_NNModel_SetTensorType(model, 1, OH_NN_TENSOR);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set model tensor type failed.");

        // Add the parameter tensor of the int8 type for the Add operator. The parameter tensor specifies the type of the activation function.
        tensorDesc = OH_NNTensorDesc_Create();
        CHECKEQ(tensorDesc, nullptr, OH_NN_FAILED, "Create TensorDesc failed.");

        int32_t activationDims = 1;
        returnCode = OH_NNTensorDesc_SetShape(tensorDesc, &activationDims, 1);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc shape failed.");

        returnCode = OH_NNTensorDesc_SetDataType(tensorDesc, OH_NN_INT8);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc data type failed.");

        returnCode = OH_NNTensorDesc_SetFormat(tensorDesc, OH_NN_FORMAT_NONE);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc format failed.");

        returnCode = OH_NNModel_AddTensorToModel(model, tensorDesc);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Add third TensorDesc to model failed.");

        returnCode = OH_NNModel_SetTensorType(model, 2, OH_NN_ADD_ACTIVATIONTYPE);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set model tensor type failed.");

        // Set the type of the activation function to OH_NN_FUSED_NONE, indicating that no activation function is added to the operator.
        int8_t activationValue = OH_NN_FUSED_NONE;
        returnCode = OH_NNModel_SetTensorData(model, 2, &activationValue, sizeof(int8_t));
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set model tensor data failed.");

        // Add the output tensor of the float32 type for the Add operator. The tensor shape is [1, 2, 2, 3].
        tensorDesc = OH_NNTensorDesc_Create();
        CHECKEQ(tensorDesc, nullptr, OH_NN_FAILED, "Create TensorDesc failed.");

        returnCode = OH_NNTensorDesc_SetShape(tensorDesc, inputDims, 4);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc shape failed.");

        returnCode = OH_NNTensorDesc_SetDataType(tensorDesc, OH_NN_FLOAT32);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc data type failed.");

        returnCode = OH_NNTensorDesc_SetFormat(tensorDesc, OH_NN_FORMAT_NONE);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set TensorDesc format failed.");

        returnCode = OH_NNModel_AddTensorToModel(model, tensorDesc);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Add fourth TensorDesc to model failed.");

        returnCode = OH_NNModel_SetTensorType(model, 3, OH_NN_TENSOR);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Set model tensor type failed.");

        // Specify the index values of the input tensors, parameter tensor, and output tensor for the Add operator.
        uint32_t inputIndicesValues[2] = {0, 1};
        uint32_t paramIndicesValues = 2;
        uint32_t outputIndicesValues = 3;
        OH_NN_UInt32Array paramIndices = {&paramIndicesValues, 1};
        OH_NN_UInt32Array inputIndices = {inputIndicesValues, 2};
        OH_NN_UInt32Array outputIndices = {&outputIndicesValues, 1};

        // Add the Add operator to the model instance.
        returnCode = OH_NNModel_AddOperation(model, OH_NN_OPS_ADD, &paramIndices, &inputIndices, &outputIndices);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Add operation to model failed.");

        // Set the index values of the input tensors and output tensor for the model instance.
        returnCode = OH_NNModel_SpecifyInputsAndOutputs(model, &inputIndices, &outputIndices);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Specify model inputs and outputs failed.");

        // Complete the model instance construction.
        returnCode = OH_NNModel_Finish(model);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "Build model failed.");

        // Return the model instance.
        *pmodel = model;
        return OH_NN_SUCCESS;
    }
    ```

5. Query the AI acceleration chips connected to NNRt.

    NNRt can connect to multiple AI acceleration chips through HDIs. Before model building, query the AI acceleration chips connected to NNRt on the current device. Each AI acceleration chip has a unique ID; in the compilation phase, this ID is used to specify the chip on which the model is compiled.

    ```cpp
    void GetAvailableDevices(std::vector<size_t>& availableDevice)
    {
        availableDevice.clear();

        // Obtain the IDs of the available devices.
        const size_t* devices = nullptr;
        uint32_t deviceCount = 0;
        OH_NN_ReturnCode ret = OH_NNDevice_GetAllDevicesID(&devices, &deviceCount);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "GetAllDevicesID failed, get no available device." << std::endl;
            return;
        }

        for (uint32_t i = 0; i < deviceCount; i++) {
            availableDevice.emplace_back(devices[i]);
        }
    }
    ```

6. Compile the model on the specified device.

    NNRt uses an abstract model representation to describe the topology of an AI model. Before inference can be executed on an AI acceleration chip, the build module provided by NNRt delivers this abstract representation to the chip driver layer, where it is converted into a format that supports inference and computing.

    ```cpp
    OH_NN_ReturnCode CreateCompilation(OH_NNModel* model, const std::vector<size_t>& availableDevice,
                                       OH_NNCompilation** pCompilation)
    {
        // Create an OH_NNCompilation instance and pass the constructed model instance or the MindSpore Lite model instance to it.
        OH_NNCompilation* compilation = OH_NNCompilation_Construct(model);
        CHECKEQ(compilation, nullptr, OH_NN_FAILED, "OH_NNCompilation_Construct failed.");

        // Set compilation options, such as the compilation device, cache path, performance mode, computing priority, and whether to enable float16 low-precision computing.
        // Perform model compilation on the first available device.
        auto returnCode = OH_NNCompilation_SetDevice(compilation, availableDevice[0]);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNCompilation_SetDevice failed.");

        // Cache the model compilation result in the /data/local/tmp directory, with the version number set to 1.
        returnCode = OH_NNCompilation_SetCache(compilation, "/data/local/tmp", 1);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNCompilation_SetCache failed.");

        // Set the performance mode of the device.
        returnCode = OH_NNCompilation_SetPerformanceMode(compilation, OH_NN_PERFORMANCE_EXTREME);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNCompilation_SetPerformanceMode failed.");

        // Set the inference priority.
        returnCode = OH_NNCompilation_SetPriority(compilation, OH_NN_PRIORITY_HIGH);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNCompilation_SetPriority failed.");

        // Specify whether to enable float16 computing.
        returnCode = OH_NNCompilation_EnableFloat16(compilation, false);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNCompilation_EnableFloat16 failed.");

        // Perform model building.
        returnCode = OH_NNCompilation_Build(compilation);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNCompilation_Build failed.");

        *pCompilation = compilation;
        return OH_NN_SUCCESS;
    }
    ```

7. Create an executor.

    After model building is complete, call the NNRt execution module to create an executor. In the inference phase, operations such as setting the model input, obtaining the model output, and triggering inference computing are performed through the executor.

    ```cpp
    OH_NNExecutor* CreateExecutor(OH_NNCompilation* compilation)
    {
        // Create an executor based on the specified OH_NNCompilation instance.
        OH_NNExecutor* executor = OH_NNExecutor_Construct(compilation);
        CHECKEQ(executor, nullptr, nullptr, "OH_NNExecutor_Construct failed.");
        return executor;
    }
    ```

8. Perform inference computing and print the inference result.

    The input data required for inference computing is passed to the executor through the APIs provided by the execution module. This triggers the executor to perform one round of inference computing, after which the result is obtained and printed.

    ```cpp
    OH_NN_ReturnCode Run(OH_NNExecutor* executor, const std::vector<size_t>& availableDevice)
    {
        // Obtain information about the input and output tensors from the executor.
        // Obtain the number of input tensors.
        size_t inputCount = 0;
        auto returnCode = OH_NNExecutor_GetInputCount(executor, &inputCount);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNExecutor_GetInputCount failed.");
        std::vector<NN_TensorDesc*> inputTensorDescs;
        NN_TensorDesc* tensorDescTmp = nullptr;
        for (size_t i = 0; i < inputCount; ++i) {
            // Create the description of the input tensor.
            tensorDescTmp = OH_NNExecutor_CreateInputTensorDesc(executor, i);
            CHECKEQ(tensorDescTmp, nullptr, OH_NN_FAILED, "OH_NNExecutor_CreateInputTensorDesc failed.");
            inputTensorDescs.emplace_back(tensorDescTmp);
        }
        // Obtain the number of output tensors.
        size_t outputCount = 0;
        returnCode = OH_NNExecutor_GetOutputCount(executor, &outputCount);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNExecutor_GetOutputCount failed.");
        std::vector<NN_TensorDesc*> outputTensorDescs;
        for (size_t i = 0; i < outputCount; ++i) {
            // Create the description of the output tensor.
            tensorDescTmp = OH_NNExecutor_CreateOutputTensorDesc(executor, i);
            CHECKEQ(tensorDescTmp, nullptr, OH_NN_FAILED, "OH_NNExecutor_CreateOutputTensorDesc failed.");
            outputTensorDescs.emplace_back(tensorDescTmp);
        }

        // Create the input and output tensors.
        NN_Tensor* inputTensors[inputCount];
        NN_Tensor* tensor = nullptr;
        for (size_t i = 0; i < inputCount; ++i) {
            tensor = nullptr;
            tensor = OH_NNTensor_Create(availableDevice[0], inputTensorDescs[i]);
            CHECKEQ(tensor, nullptr, OH_NN_FAILED, "OH_NNTensor_Create failed.");
            inputTensors[i] = tensor;
        }
        NN_Tensor* outputTensors[outputCount];
        for (size_t i = 0; i < outputCount; ++i) {
            tensor = nullptr;
            tensor = OH_NNTensor_Create(availableDevice[0], outputTensorDescs[i]);
            CHECKEQ(tensor, nullptr, OH_NN_FAILED, "OH_NNTensor_Create failed.");
            outputTensors[i] = tensor;
        }

        // Set the data of the input tensors.
        returnCode = SetInputData(inputTensors, inputCount);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "SetInputData failed.");

        // Perform inference.
        returnCode = OH_NNExecutor_RunSync(executor, inputTensors, inputCount, outputTensors, outputCount);
        CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNExecutor_RunSync failed.");

        // Print the data of the output tensors.
        Print(outputTensors, outputCount);

        // Destroy the input and output tensors and tensor descriptions.
        for (size_t i = 0; i < inputCount; ++i) {
            returnCode = OH_NNTensor_Destroy(&inputTensors[i]);
            CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNTensor_Destroy failed.");
            returnCode = OH_NNTensorDesc_Destroy(&inputTensorDescs[i]);
            CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNTensorDesc_Destroy failed.");
        }
        for (size_t i = 0; i < outputCount; ++i) {
            returnCode = OH_NNTensor_Destroy(&outputTensors[i]);
            CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNTensor_Destroy failed.");
            returnCode = OH_NNTensorDesc_Destroy(&outputTensorDescs[i]);
            CHECKNEQ(returnCode, OH_NN_SUCCESS, OH_NN_FAILED, "OH_NNTensorDesc_Destroy failed.");
        }

        return OH_NN_SUCCESS;
    }
    ```

9. Build an end-to-end process from model construction to model compilation and execution.

    Steps 4 to 8 implement the model construction, compilation, and execution processes and encapsulate them into separate functions to facilitate modular development. The following sample code shows how to combine these functions into a complete NNRt development process.

    ```cpp
    int main(int argc, char** argv)
    {
        OH_NNModel* model = nullptr;
        OH_NNCompilation* compilation = nullptr;
        OH_NNExecutor* executor = nullptr;
        std::vector<size_t> availableDevices;

        // Construct a model.
        OH_NN_ReturnCode ret = BuildModel(&model);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "BuildModel failed." << std::endl;
            OH_NNModel_Destroy(&model);
            return -1;
        }

        // Obtain the available devices.
        GetAvailableDevices(availableDevices);
        if (availableDevices.empty()) {
            std::cout << "No available device." << std::endl;
            OH_NNModel_Destroy(&model);
            return -1;
        }

        // Build the model.
        ret = CreateCompilation(model, availableDevices, &compilation);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "CreateCompilation failed." << std::endl;
            OH_NNModel_Destroy(&model);
            OH_NNCompilation_Destroy(&compilation);
            return -1;
        }

        // Destroy the model instance.
        OH_NNModel_Destroy(&model);

        // Create an inference executor for the model.
        executor = CreateExecutor(compilation);
        if (executor == nullptr) {
            std::cout << "CreateExecutor failed, no executor is created." << std::endl;
            OH_NNCompilation_Destroy(&compilation);
            return -1;
        }

        // Destroy the model building instance.
        OH_NNCompilation_Destroy(&compilation);

        // Use the created executor to perform inference.
        ret = Run(executor, availableDevices);
        if (ret != OH_NN_SUCCESS) {
            std::cout << "Run failed." << std::endl;
            OH_NNExecutor_Destroy(&executor);
            return -1;
        }

        // Destroy the executor instance.
        OH_NNExecutor_Destroy(&executor);

        return 0;
    }
    ```

## Verification

1. Prepare the compilation configuration file of the application sample.

    Create a `CMakeLists.txt` file, and add compilation configurations for the application sample file `nnrt_example.cpp`. The following is a simple example of the `CMakeLists.txt` file:

    ```text
    cmake_minimum_required(VERSION 3.16)
    project(nnrt_example C CXX)

    add_executable(nnrt_example
        ./nnrt_example.cpp
    )

    target_link_libraries(nnrt_example
        neural_network_runtime
        neural_network_core
    )
    ```

2. Compile the application sample.

    Create the **build/** directory in the current directory, and compile `nnrt_example.cpp` in the **build/** directory to obtain the binary file `nnrt_example`:

    ```shell
    mkdir build && cd build
    cmake -DCMAKE_TOOLCHAIN_FILE={Path of the cross-compilation toolchain}/build/cmake/ohos.toolchain.cmake -DOHOS_ARCH=arm64-v8a -DOHOS_PLATFORM=OHOS -DOHOS_STL=c++_static ..
    make
    ```

3. Push the application sample to the device and execute it.

    ```shell
    # Push the nnrt_example binary obtained through compilation to the device.
    hdc_std file send ./nnrt_example /data/local/tmp/.

    # Grant the required permissions to the executable file of the test case.
    hdc_std shell "chmod +x /data/local/tmp/nnrt_example"

    # Execute the test case.
    hdc_std shell "/data/local/tmp/nnrt_example"
    ```

    If the execution is normal, information similar to the following is displayed. Both input tensors are filled with the values 0 to 11, so each output element is twice its index.

    ```text
    Output index: 0, value is: 0.000000.
    Output index: 1, value is: 2.000000.
    Output index: 2, value is: 4.000000.
    Output index: 3, value is: 6.000000.
    Output index: 4, value is: 8.000000.
    Output index: 5, value is: 10.000000.
    Output index: 6, value is: 12.000000.
    Output index: 7, value is: 14.000000.
    Output index: 8, value is: 16.000000.
    Output index: 9, value is: 18.000000.
    Output index: 10, value is: 20.000000.
    Output index: 11, value is: 22.000000.
    ```

4. (Optional) Check the model cache.

    If the HDI service connected to NNRt supports the model cache function, you can find the generated cache files in the `/data/local/tmp` directory after `nnrt_example` is executed successfully.

    > **NOTE**
    >
    > The IR graphs of the model need to be passed to the hardware driver layer, where the HDI service compiles them into a computing graph dedicated to the hardware. This compilation process is time-consuming. NNRt therefore supports caching: it can store the computing graphs compiled by the HDI service on the device. If the same model is compiled on the same acceleration chip again, you can specify the cache path so that NNRt directly loads the computing graphs from the cache files, reducing the compilation time.

    Check the cached files in the cache directory.

    ```shell
    ls /data/local/tmp
    ```

    The command output is as follows:

    ```text
    0.nncache 1.nncache 2.nncache cache_info.nncache
    ```

    If the cache is no longer needed, manually delete the cache files.

    ```shell
    rm /data/local/tmp/*nncache
    ```

    A sketch of how a later run can restore the compilation from this cache is provided after this section.
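
The following minimal sketch, based on the compilation APIs listed in Available APIs above, shows how a later run might restore the compilation directly from this cache instead of rebuilding the model from its IR graphs. The cache directory and version must match the values used when the cache was generated.

```cpp
// Minimal sketch: restore a compilation from the cache generated by a previous run.
OH_NN_ReturnCode CreateCompilationFromCache(size_t deviceID, OH_NNCompilation** pCompilation)
{
    // Create an empty compilation instance to be recovered from the model cache.
    OH_NNCompilation* compilation = OH_NNCompilation_ConstructForCache();
    if (compilation == nullptr) {
        return OH_NN_FAILED;
    }

    // Use the same device, cache directory, and version as the run that generated the cache.
    OH_NN_ReturnCode ret = OH_NNCompilation_SetDevice(compilation, deviceID);
    if (ret == OH_NN_SUCCESS) {
        ret = OH_NNCompilation_SetCache(compilation, "/data/local/tmp", 1);
    }

    // Building loads the cached computing graph if it is valid.
    if (ret == OH_NN_SUCCESS) {
        ret = OH_NNCompilation_Build(compilation);
    }

    if (ret != OH_NN_SUCCESS) {
        OH_NNCompilation_Destroy(&compilation);
        return ret;
    }

    *pCompilation = compilation;
    return OH_NN_SUCCESS;
}
```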