# Using MindSpore Lite for Model Inference (C/C++)

## When to Use

MindSpore Lite is an AI engine that provides AI model inference for different hardware devices. It has been used in a wide range of fields, such as image classification, target recognition, facial recognition, and character recognition.

This document describes the general development process for MindSpore Lite model inference.

## Basic Concepts

Before getting started, you need to understand the following basic concepts:

**Tensor**: a special data structure that is similar to arrays and matrices. It is the basic data structure used in MindSpore Lite network operations.

**Float16 inference mode**: an inference mode in half-precision format, where a number is represented with 16 bits.
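
For example, float16 inference is toggled on a device information object when you configure the inference context. The following is a minimal sketch using the APIs listed in the next section; the full context setup appears in the development steps below.

```c
// Sketch: enable float16 inference for the CPU device (where supported by the hardware).
OH_AI_DeviceInfoHandle cpu_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
OH_AI_DeviceInfoSetEnableFP16(cpu_device_info, true);  // true: run supported operators in half precision
```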

## Available APIs

APIs involved in MindSpore Lite model inference are categorized into context APIs, model APIs, and tensor APIs.

### Context APIs

| API       | Description       |
| ------------------ | ----------------- |
|OH_AI_ContextHandle OH_AI_ContextCreate()|Creates a context object. This API must be used together with **OH_AI_ContextDestroy**.|
|void OH_AI_ContextSetThreadNum(OH_AI_ContextHandle context, int32_t thread_num)|Sets the number of runtime threads.|
|void OH_AI_ContextSetThreadAffinityMode(OH_AI_ContextHandle context, int mode)|Sets the affinity mode for binding runtime threads to CPU cores, which are classified into large, medium, and small cores based on the CPU frequency. You only need to bind the large or medium cores, not the small cores.|
|OH_AI_DeviceInfoHandle OH_AI_DeviceInfoCreate(OH_AI_DeviceType device_type)|Creates a runtime device information object.|
|void OH_AI_ContextDestroy(OH_AI_ContextHandle *context)|Destroys a context object.|
|void OH_AI_DeviceInfoSetEnableFP16(OH_AI_DeviceInfoHandle device_info, bool is_fp16)|Sets whether to enable float16 inference. This function is available only for CPU and GPU devices.|
|void OH_AI_ContextAddDeviceInfo(OH_AI_ContextHandle context, OH_AI_DeviceInfoHandle device_info)|Adds a runtime device information object.|

### Model APIs

| API       | Description       |
| ------------------ | ----------------- |
|OH_AI_ModelHandle OH_AI_ModelCreate()|Creates a model object.|
|OH_AI_Status OH_AI_ModelBuildFromFile(OH_AI_ModelHandle model, const char *model_path, OH_AI_ModelType model_type, const OH_AI_ContextHandle model_context)|Loads and builds a MindSpore model from a model file.|
|void OH_AI_ModelDestroy(OH_AI_ModelHandle *model)|Destroys a model object.|

### Tensor APIs

| API       | Description       |
| ------------------ | ----------------- |
|OH_AI_TensorHandleArray OH_AI_ModelGetInputs(const OH_AI_ModelHandle model)|Obtains the input tensor array structure of a model.|
|int64_t OH_AI_TensorGetElementNum(const OH_AI_TensorHandle tensor)|Obtains the number of tensor elements.|
|const char *OH_AI_TensorGetName(const OH_AI_TensorHandle tensor)|Obtains the name of a tensor.|
|OH_AI_DataType OH_AI_TensorGetDataType(const OH_AI_TensorHandle tensor)|Obtains the tensor data type.|
|void *OH_AI_TensorGetMutableData(const OH_AI_TensorHandle tensor)|Obtains the pointer to mutable tensor data.|
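
As a brief illustration, the following sketch uses these tensor APIs to print the name, data type, and element count of each input tensor of an already built model (building a model is covered in the development steps below):

```c
// Sketch: inspect the input tensors of a model that has already been built.
OH_AI_TensorHandleArray inputs = OH_AI_ModelGetInputs(model);
for (size_t i = 0; i < inputs.handle_num; ++i) {
  OH_AI_TensorHandle tensor = inputs.handle_list[i];
  printf("name: %s, data type: %d, elements: %lld\n",
         OH_AI_TensorGetName(tensor),
         (int)OH_AI_TensorGetDataType(tensor),
         (long long)OH_AI_TensorGetElementNum(tensor));
}
```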

## How to Develop

The following figure shows the development process for MindSpore Lite model inference.

**Figure 1** Development process for MindSpore Lite model inference

![how-to-use-mindspore-lite](figures/01.png)

Before moving on to the development process, you need to include the related header files and write a function that generates random input data. The sample code is as follows:

```c
#include <stdlib.h>
#include <stdio.h>
#include "mindspore/model.h"

// Fill every input tensor with random values.
int GenerateInputDataWithRandom(OH_AI_TensorHandleArray inputs) {
  for (size_t i = 0; i < inputs.handle_num; ++i) {
    float *input_data = (float *)OH_AI_TensorGetMutableData(inputs.handle_list[i]);
    if (input_data == NULL) {
      printf("OH_AI_TensorGetMutableData failed.\n");
      return OH_AI_STATUS_LITE_ERROR;
    }
    int64_t num = OH_AI_TensorGetElementNum(inputs.handle_list[i]);
    const int divisor = 10;
    for (int64_t j = 0; j < num; j++) {
      input_data[j] = (float)(rand() % divisor) / divisor;  // random values in [0, 0.9]
    }
  }
  return OH_AI_STATUS_SUCCESS;
}
```

The development process consists of the following main steps:

1. Prepare the required model.

    The required model can be downloaded directly or obtained using the model conversion tool.

     - If the downloaded model is in the `.ms` format, you can use it directly for inference. The following uses the **mobilenetv2.ms** model as an example.
     - If the downloaded model uses a third-party framework, such as TensorFlow, TensorFlow Lite, Caffe, or ONNX, you can use the [model conversion tool](https://www.mindspore.cn/lite/docs/en/master/use/downloads.html#1-8-1) to convert it to the `.ms` format, as sketched after this list.
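
    For reference, converting a TensorFlow Lite model on a Linux host with the downloaded converter tool looks roughly as follows. This is a hedged sketch: **mobilenetv2.tflite** is a placeholder file name, and the converter package must already be downloaded and unpacked.

    ```shell
    # Sketch: convert a third-party model to the .ms format (placeholder file names).
    ./converter_lite --fmk=TFLITE --modelFile=mobilenetv2.tflite --outputFile=mobilenetv2
    # On success, mobilenetv2.ms is generated in the current directory.
    ```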

2. Create a context, and set parameters such as the number of runtime threads and device type.

    The following describes two typical scenarios:

    Scenario 1: Only the CPU inference context is created.

    ```c
    // Create a context, and set the number of runtime threads to 2 and the thread affinity mode to 1 (big cores first).
    OH_AI_ContextHandle context = OH_AI_ContextCreate();
    if (context == NULL) {
      printf("OH_AI_ContextCreate failed.\n");
      return OH_AI_STATUS_LITE_ERROR;
    }
    const int thread_num = 2;
    OH_AI_ContextSetThreadNum(context, thread_num);
    OH_AI_ContextSetThreadAffinityMode(context, 1);
    // Set the device type to CPU, and disable float16 inference.
    OH_AI_DeviceInfoHandle cpu_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
    if (cpu_device_info == NULL) {
      printf("OH_AI_DeviceInfoCreate failed.\n");
      OH_AI_ContextDestroy(&context);
      return OH_AI_STATUS_LITE_ERROR;
    }
    OH_AI_DeviceInfoSetEnableFP16(cpu_device_info, false);
    OH_AI_ContextAddDeviceInfo(context, cpu_device_info);
    ```

    Scenario 2: The neural network runtime (NNRT) and CPU heterogeneous inference contexts are created.

    NNRT is the runtime for cross-chip inference computing in the AI field. Generally, the acceleration hardware connected to NNRT, such as an NPU, has strong inference capabilities but supports only a limited number of operators, whereas the general-purpose CPU has weaker inference capabilities but supports a wide range of operators. MindSpore Lite supports NNRT/CPU heterogeneous inference: model operators are preferentially scheduled to NNRT for inference, and operators that NNRT does not support are scheduled to the CPU instead. The following is the sample code for configuring NNRT/CPU heterogeneous inference:
    <!--Del-->
    > **NOTE**
    >
    > NNRT/CPU heterogeneous inference requires access to NNRT hardware. For details, see [OpenHarmony/ai_neural_network_runtime](https://gitee.com/openharmony/ai_neural_network_runtime).
    <!--DelEnd-->

    ```c
    // Create a context, and set the number of runtime threads to 2 and the thread affinity mode to 1 (big cores first).
    OH_AI_ContextHandle context = OH_AI_ContextCreate();
    if (context == NULL) {
      printf("OH_AI_ContextCreate failed.\n");
      return OH_AI_STATUS_LITE_ERROR;
    }
    // Prefer NNRT inference.
    // Use the first NNRT device of the ACCELERATOR type to create the NNRT device information, and configure the
    // high-performance inference mode for it. You can also use OH_AI_GetAllNNRTDeviceDescs() to obtain the list of
    // NNRT devices in the current environment, search for a specific device by name or type, and use it as the
    // NNRT inference hardware.
    OH_AI_DeviceInfoHandle nnrt_device_info = OH_AI_CreateNNRTDeviceInfoByType(OH_AI_NNRTDEVICE_ACCELERATOR);
    if (nnrt_device_info == NULL) {
      printf("OH_AI_CreateNNRTDeviceInfoByType failed.\n");
      OH_AI_ContextDestroy(&context);
      return OH_AI_STATUS_LITE_ERROR;
    }
    OH_AI_DeviceInfoSetPerformanceMode(nnrt_device_info, OH_AI_PERFORMANCE_HIGH);
    OH_AI_ContextAddDeviceInfo(context, nnrt_device_info);

    // Configure CPU inference as the fallback.
    OH_AI_DeviceInfoHandle cpu_device_info = OH_AI_DeviceInfoCreate(OH_AI_DEVICETYPE_CPU);
    if (cpu_device_info == NULL) {
      printf("OH_AI_DeviceInfoCreate failed.\n");
      OH_AI_ContextDestroy(&context);
      return OH_AI_STATUS_LITE_ERROR;
    }
    OH_AI_ContextAddDeviceInfo(context, cpu_device_info);
    ```

3. Create, load, and build the model.

    Call **OH_AI_ModelBuildFromFile** to load and build the model.

    In this example, the **argv[1]** parameter passed to **OH_AI_ModelBuildFromFile** is the model file path.

    ```c
    // Create a model.
    OH_AI_ModelHandle model = OH_AI_ModelCreate();
    if (model == NULL) {
      printf("OH_AI_ModelCreate failed.\n");
      OH_AI_ContextDestroy(&context);
      return OH_AI_STATUS_LITE_ERROR;
    }

    // Load and build the inference model. The model type is OH_AI_MODELTYPE_MINDIR.
    int ret = OH_AI_ModelBuildFromFile(model, argv[1], OH_AI_MODELTYPE_MINDIR, context);
    if (ret != OH_AI_STATUS_SUCCESS) {
      printf("OH_AI_ModelBuildFromFile failed, ret: %d.\n", ret);
      OH_AI_ModelDestroy(&model);
      OH_AI_ContextDestroy(&context);
      return ret;
    }
    ```

4. Input data.

    Before executing model inference, you need to populate the input tensors with data. In this example, random data is used.

    ```c
    // Obtain the input tensors.
    OH_AI_TensorHandleArray inputs = OH_AI_ModelGetInputs(model);
    if (inputs.handle_list == NULL) {
      printf("OH_AI_ModelGetInputs failed.\n");
      OH_AI_ModelDestroy(&model);
      OH_AI_ContextDestroy(&context);
      return OH_AI_STATUS_LITE_ERROR;
    }
    // Populate the input tensors with random data.
    ret = GenerateInputDataWithRandom(inputs);
    if (ret != OH_AI_STATUS_SUCCESS) {
      printf("GenerateInputDataWithRandom failed, ret: %d.\n", ret);
      OH_AI_ModelDestroy(&model);
      OH_AI_ContextDestroy(&context);
      return ret;
    }
    ```

5. Execute model inference.

    Call **OH_AI_ModelPredict** to perform model inference.

    ```c
    // Execute model inference.
    OH_AI_TensorHandleArray outputs;
    ret = OH_AI_ModelPredict(model, inputs, &outputs, NULL, NULL);
    if (ret != OH_AI_STATUS_SUCCESS) {
      printf("OH_AI_ModelPredict failed, ret: %d.\n", ret);
      OH_AI_ModelDestroy(&model);
      OH_AI_ContextDestroy(&context);
      return ret;
    }
    ```

6. Obtain the output.

    After model inference is complete, you can obtain the inference result through the output tensors.

    ```c
    // Obtain the output tensors and print their information.
    for (size_t i = 0; i < outputs.handle_num; ++i) {
      OH_AI_TensorHandle tensor = outputs.handle_list[i];
      long long element_num = OH_AI_TensorGetElementNum(tensor);
      printf("Tensor name: %s, tensor size is %zu, elements num: %lld.\n", OH_AI_TensorGetName(tensor),
             OH_AI_TensorGetDataSize(tensor), element_num);
      const float *data = (const float *)OH_AI_TensorGetData(tensor);
      printf("output data is:\n");
      const int max_print_num = 50;
      for (int j = 0; j < element_num && j <= max_print_num; ++j) {
        printf("%f ", data[j]);
      }
      printf("\n");
    }
    ```

7. Destroy the model.

    If the MindSpore Lite inference framework is no longer needed, destroy the created model and context.

    ```c
    // Release the model and context.
    OH_AI_ModelDestroy(&model);
    OH_AI_ContextDestroy(&context);
    ```

## Verification

1. Write **CMakeLists.txt**.

    ```cmake
    cmake_minimum_required(VERSION 3.14)
    project(Demo)

    add_executable(demo main.c)

    target_link_libraries(
            demo
            mindspore_lite_ndk
            pthread
            dl
    )
    ```
    - To use ohos-sdk for cross compilation, you need to set the native toolchain path for the CMake tool as follows: `-DCMAKE_TOOLCHAIN_FILE="/xxx/native/build/cmake/ohos.toolchain.cmake"`. An example invocation is sketched after this list.

    - The toolchain builds a 64-bit application by default. To build a 32-bit application, add the following configuration: `-DOHOS_ARCH="armeabi-v7a"`.
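
    The cross-compilation commands might look as follows. This is a hedged sketch: replace `/xxx/native` with the actual path of the native SDK in your environment.

    ```shell
    # Sketch: configure and build with the OpenHarmony native toolchain (placeholder SDK path).
    mkdir build && cd build
    cmake -DCMAKE_TOOLCHAIN_FILE="/xxx/native/build/cmake/ohos.toolchain.cmake" ..
    cmake --build .
    ```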

2. Run the CMake tool.

    - Use hdc_std to connect to the device and push **demo** and **mobilenetv2.ms** to the same directory on the device.
    - Run the hdc_std shell command to access the device, go to the directory where **demo** is located, and run the following command:

    ```shell
    ./demo mobilenetv2.ms
    ```

    The inference is successful if the output is similar to the following:

    ```shell
    # ./demo ./mobilenetv2.ms
    Tensor name: Softmax-65, tensor size is 4004, elements num: 1001.
    output data is:
    0.000018 0.000012 0.000026 0.000194 0.000156 0.001501 0.000240 0.000825 0.000016 0.000006 0.000007 0.000004 0.000004 0.000004 0.000015 0.000099 0.000011 0.000013 0.000005 0.000023 0.000004 0.000008 0.000003 0.000003 0.000008 0.000014 0.000012 0.000006 0.000019 0.000006 0.000018 0.000024 0.000010 0.000002 0.000028 0.000372 0.000010 0.000017 0.000008 0.000004 0.000007 0.000010 0.000007 0.000012 0.000005 0.000015 0.000007 0.000040 0.000004 0.000085 0.000023
    ```