# Analyzing AppFreeze (Application Not Responding)

An application that stops reacting to taps or otherwise stops responding beyond a set time limit is classified as frozen (appfreeze). The system provides a mechanism to detect such freezes and generates appfreeze logs for application developers to analyze.

> **Note:**
> This document applies only to applications based on the Stage model. Analyzing logs with it requires basic knowledge of how JS runs in the system and of C++ stack traces, as well as some familiarity with the subsystems the application uses.

## AppFreeze Detection Capabilities

AppFreeze is currently detected along the following dimensions; understanding how each works greatly helps in locating and analyzing appfreeze faults.

| Fault type | Description |
| -------- | -------- |
| THREAD_BLOCK_6S | Application main thread timeout |
| APP_INPUT_BLOCK | User input response timeout |
| LIFECYCLE_TIMEOUT | Ability lifecycle switch timeout |

### THREAD_BLOCK_6S: Application Main Thread Timeout

This fault indicates that the application's main thread is stuck or overloaded with tasks, hurting smoothness and user experience.

Detection works as follows: the application's watchdog thread periodically posts a liveness probe to the main thread and arms a timeout report in its own thread. If the probe has not executed within 3s, a THREAD_BLOCK_3S warning event is reported; if it still has not executed after 6s, a THREAD_BLOCK_6S main-thread-stuck event is reported. Together the two events form the THREAD_BLOCK appfreeze log.

The detection principle is shown below:

![appfreeze_20230308145163](figures/appfreeze_20230308145163.png)
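The following is a minimal, self-contained sketch of this probe-and-deadline idea, not the actual OpenHarmony watchdog implementation; the queue, task names, and timings are illustrative only.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>

using namespace std::chrono_literals;

std::mutex g_mu;
std::condition_variable g_cv;
std::deque<std::function<void()>> g_queue;  // stand-in for the main thread's event queue
std::atomic<bool> g_alive{false};           // set when the liveness probe actually runs

void PostTask(std::function<void()> task) {
    { std::lock_guard<std::mutex> lk(g_mu); g_queue.push_back(std::move(task)); }
    g_cv.notify_one();
}

void MainLoop() {  // the "main thread" pops and runs queued tasks in order
    for (;;) {
        std::function<void()> task;
        {
            std::unique_lock<std::mutex> lk(g_mu);
            g_cv.wait(lk, [] { return !g_queue.empty(); });
            task = std::move(g_queue.front());
            g_queue.pop_front();
        }
        task();
    }
}

int main() {
    std::thread mainThread(MainLoop);
    PostTask([] { std::this_thread::sleep_for(10s); });  // a task that blocks the main thread

    // Watchdog side: post a liveness probe, then check it at the 3s and 6s deadlines.
    PostTask([] { g_alive = true; });  // the probe: trivially fast once the queue drains
    std::this_thread::sleep_for(3s);
    if (!g_alive) { std::cout << "report THREAD_BLOCK_3S (warning)\n"; }
    std::this_thread::sleep_for(3s);
    if (!g_alive) { std::cout << "report THREAD_BLOCK_6S (block)\n"; }

    mainThread.detach();  // sketch only: exit while the worker is still blocked
    return 0;
}
```

Because the first queued task sleeps for 10s, the probe cannot run before either deadline, which is exactly the condition the two events report.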

### APP_INPUT_BLOCK: User Input Response Timeout

This fault means a user click event has gone unanswered beyond the time limit, severely hurting the user experience.

Detection works as follows: when the user taps a button, the input system sends the click event to the application side; if no response receipt comes back from the application before the timeout, the fault is reported.

The detection principle is shown below:

![appfreeze_20230308145162](figures/appfreeze_20230308145162.png)
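A minimal sketch of this send-and-wait-for-receipt check follows; the 5s limit and all names here are illustrative, not the input subsystem's real API.

```cpp
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>

std::mutex g_mu;
std::condition_variable g_cv;
bool g_acked = false;  // set when the application side confirms the event

// Called by the (hypothetical) application side when it consumes the event.
void OnAppFeedback() {
    std::lock_guard<std::mutex> lk(g_mu);
    g_acked = true;
    g_cv.notify_one();
}

// Dispatcher side: send the event, then wait for the receipt with a deadline.
void DispatchPointerEvent() {
    // ... deliver the click event to the application ...
    std::unique_lock<std::mutex> lk(g_mu);
    if (!g_cv.wait_for(lk, std::chrono::seconds(5), [] { return g_acked; })) {
        std::cout << "report APP_INPUT_BLOCK\n";  // no feedback within the limit
    }
}

int main() {
    DispatchPointerEvent();  // with no OnAppFeedback() call, the fault fires after 5s
    return 0;
}
```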

### Lifecycle Switch Timeout

Lifecycle switch timeouts cover both [UIAbility lifecycle](../application-models/uiability-lifecycle.md#uiability组件生命周期) switches and [PageAbility lifecycle](../application-models/pageability-lifecycle.md#pageability的生命周期) switches.

This fault occurs during a lifecycle switch and affects switching between Abilities within the application or between different PageAbilities.

Detection works as follows: hooks around each lifecycle switch post a timeout task to the watchdog thread when the switch starts and remove it once the switch completes; if the task is not removed within the fixed time limit, the fault is reported (see the sketch after the table below).

A lifecycle switch timeout consists of two events, LIFECYCLE_HALF_TIMEOUT and LIFECYCLE_TIMEOUT. LIFECYCLE_HALF_TIMEOUT serves as the warning event for LIFECYCLE_TIMEOUT and captures binder and other information.

![appfreeze_20230308145164](figures/appfreeze_20230308145164.png)

The timeout limit differs per lifecycle phase:

| Lifecycle phase | Timeout |
| -------- | -------- |
| Load | 10s |
| Terminate | 10s |
| Connect | 3s |
| Disconnect | 0.5s |
| Foreground | 5s |
| Background | 3s |
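
As referenced above, here is a minimal sketch of the arm-at-start, cancel-on-completion pattern; `ArmLifecycleTimeout` and its promise-based cancellation are illustrative stand-ins, not the real watchdog API (the real mechanism also reports LIFECYCLE_HALF_TIMEOUT at half the limit).

```cpp
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

// Arms a timeout for one lifecycle switch. Fulfilling the returned promise
// before the limit "removes" the timeout task; otherwise the fault fires.
std::promise<void> ArmLifecycleTimeout(const char* phase, std::chrono::seconds limit) {
    std::promise<void> done;
    std::thread([fut = done.get_future(), phase, limit] {
        if (fut.wait_for(limit) == std::future_status::timeout) {
            std::cout << "report LIFECYCLE_TIMEOUT: " << phase << " timeout\n";
        }
    }).detach();
    return done;
}

int main() {
    auto guard = ArmLifecycleTimeout("Background", std::chrono::seconds(3));
    // ... perform the Background lifecycle switch here ...
    guard.set_value();  // completed in time: the timeout task is disarmed
    std::this_thread::sleep_for(std::chrono::seconds(4));  // let the watcher thread finish
    return 0;
}
```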

## AppFreeze Log Analysis

An appfreeze fault needs to be analyzed using the appfreeze log together with the hilog flow logs.

The example here illustrates one analysis method only; analyze each concrete problem on its own terms.

An appfreeze log consists mainly of the following blocks of information:

### Log Header

| Field | Description |
| -------- | -------- |
| Reason | Why the application froze, corresponding to [AppFreeze Detection Capabilities](#appfreeze-detection-capabilities) |
| PID | pid at the time of the fault, usable to search the flow logs for the process |
| PACKAGE_NAME | Application bundle name |
```
============================================================
Device info:OpenHarmony 3.2
Build info:OpenHarmony 4.0.5.3
Module name:com.xxx.xxx
Version:1.0.0
Pid:1561
Uid:20010039
Reason:LIFECYCLE_TIMEOUT
sysfreeze: LIFECYCLE_TIMEOUT LIFECYCLE_TIMEOUT at 20230317170653
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
DOMAIN:AAFWK
STRINGID:LIFECYCLE_TIMEOUT
TIMESTAMP:2023/XX/XX/XX-XX:XX:XX:XX
PID:1561
UID:20010039
PACKAGE_NAME:com.xxx.xxx
PROCESS_NAME:com.xxx.xxx
MSG:ability:EntryAbility background timeout
```

### Common Log Body Information

All three log types contain the following blocks; search for the key field name to find the corresponding position in the log:

| Key field | Description |
| -------- | -------- |
| EVENTNAME | The appfreeze reason, or one of the events that together form the freeze detection |
| TIMESTAMP | Time the fault was reported; combined with the timeout values in [AppFreeze Detection Capabilities](#appfreeze-detection-capabilities), it narrows the time window to inspect in the flow logs |
| PID | pid at the time of the fault; together with the fault time and timeout, it helps find the process in the flow logs |
| PACKAGE_NAME | Application bundle name |
| MSG | Dump or descriptive information for the fault, detailed below |
| BinderCatcher | IPC call information between this process and other system processes, showing calls with long waits |
| PeerBinder Stacktrace | Stack dumps of stuck peer processes related to the current process |
| cpuusage | Device-wide CPU usage for the period |
| memory | Memory usage of the current process at the time |

> **Note:**
>
> Under heavy device-wide load, stacks captured with the low-overhead method, or captures that time out, may lose function symbols and build-id information.

The MSG field mainly contains the reason the freeze was reported plus the task backlog of the application main thread's queues.

The main thread queue backlog information includes:

- The currently running task and its start time: if the start time is far from the log's report time, the running task is the main suspect for the freeze.

- Historical task times: used to judge whether many earlier tasks, each taking some time, kept new tasks from responding in time.

- Tasks still queued and not yet executed.

**Example stack of the current process:**

Search for the pid to find the application stack. The stack below shows the window stuck in the IPC phase while sending an event to the system over IPC.

```
OpenStacktraceCatcher -pid==1561 packageName is com.ohos.huawei.myapplication
Result: 0 ( no error )
Timestamp:2017-08-08 17:06:53.000
Pid:1561
Uid:20010039
Process name:com.ohos.huawei.myapplication
Tid:1561,Name:i.myapplication
#00 pc 0017888c /system/lib/libark_jsruntime.so
#01 pc 00025779 /system/lib/platformsdk/libipc_core.z.so(OHOS::BinderConnector::WriteBinder(unsigned long, void*)+56)
#02 pc 000265a5 /system/lib/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::TransactWithDriver(bool)+216)
#03 pc 0002666f /system/lib/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::StartWorkLoop()+18)
#04 pc 000270a9 /system/lib/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::JoinThread(bool)+32)
#05 pc 00023783 /system/lib/platformsdk/libipc_core.z.so(OHOS::IPCWorkThread::ThreadHandler(void*)+290)
#06 pc 00e1c6f7 /system/lib/libace.z.so
#07 pc 0091bbdd /system/lib/libace.z.so
#08 pc 0092fd9d /system/lib/libace.z.so
#09 pc 0092fa5f /system/lib/libace.z.so
#10 pc 0092cd6b /system/lib/libace.z.so
#11 pc 009326a9 /system/lib/libace.z.so
#12 pc 0093054b /system/lib/libace.z.so
#13 pc 009324f3 /system/lib/libace.z.so
#14 pc 003989e1 /system/lib/libace.z.so
#15 pc 0045dd4b /system/lib/libace.z.so
#16 pc 00d24fef /system/lib/libace.z.so
#17 pc 0041e6e9 /system/lib/libace.z.so
#18 pc 0000b4d9 /system/lib/platformsdk/libeventhandler.z.so(OHOS::AppExecFwk::EventHandler::DistributeEvent(std::__h::unique_ptr<O
#19 pc 00012829 /system/lib/platformsdk/libeventhandler.z.so
#20 pc 00011841 /system/lib/platformsdk/libeventhandler.z.so(OHOS::AppExecFwk::EventRunner::Run()+64)
#21 pc 00054a8b /system/lib/libappkit_native.z.so(OHOS::AppExecFwk::MainThread::Start()+278)
#22 pc 00011503 /system/bin/appspawn
#23 pc 0001141f /system/bin/appspawn
#24 pc 0000ee97 /system/bin/appspawn
```

**BinderCatcher information example:**

Search for the pid to find which process the current process is communicating with and how long the synchronous call has been waiting.

The example shows process 1561 requesting communication with process 685 and waiting more than 10s without a response.

```
PeerBinderCatcher -- pid==1561 layer_ == 0


BinderCatcher --
    1561:1561 to 685:0 code 0 wait:10.366245919 s
    1329:1376 to 487:794 code 0 wait:0.12070041 s

pid   context  request  started  max  ready free_async_space
1561   binder    0       3       16     4       520192
544    binder    0       4       16     5       520192
1104   binder    0       1       16     2       520192
1397   binder    0       1       16     3       520192
...
```

**PeerBinder Stacktrace example:**

The example shows the stack of the stuck peer process 685.

```
PeerBinder Stacktrace --

PeerBinderCatcher start catcher stacktrace for pid 685
Result: 0 ( no error )
Timestamp:2017-08-08 17:06:55.000
Pid:685
Uid:1000
Process name:wifi_manager_service
Tid:658,Name:wifi_manager_service
#00 pc 000669f0 /system/lib/ld-musl-arm.so.1
#01 pc 000c60cc /system/lib/ld-musl-arm.so.1
#02 pc 000c5040 /system/lib/ld-musl-arm.so.1
#03 pc 000c6818 /system/lib/ld-musl-arm.so.1(__pthread_cond_timedwait_time64+596)
#04 pc 000bd058 /system/lib/libc++.so
#05 pc 0008592c /system/lib/ld-musl-arm.so.1(ioctl+72)
#06 pc 00025779 /system/lib/platformsdk/libipc_core.z.so(OHOS::BinderConnector::WriteBinder(unsigned long, void*)+56)
#07 pc 000265a5 /system/lib/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::TransactWithDriver(bool)+216)
#08 pc 0002666f /system/lib/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::StartWorkLoop()+18)
#09 pc 000270a9 /system/lib/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::JoinThread(bool)+32)
#10 pc 00023783 /system/lib/platformsdk/libipc_core.z.so(OHOS::IPCWorkThread::ThreadHandler(void*)+290)
#11 pc 0007b4d9 /system/lib/platformsdk/libeventhandler.z.so(OHOS::AppExecFwk::EventHandler::DistributeEvent(std::__h::unique_ptr<OHOS::Ap
#12 pc 00072829 /system/lib/platformsdk/libeventhandler.z.so
#13 pc 00071841 /system/lib/platformsdk/libeventhandler.z.so(OHOS::AppExecFwk::EventRunner::Run()+64)
#14 pc 00094a8b /system/lib/libappkit_native.z.so(OHOS::AppExecFwk::MainThread::Start()+278)

Tid:1563,Name:IPC_0_1563
#00 pc 0009392c /system/lib/ld-musl-arm.so.1(ioctl+72)
#01 pc 00025779 /system/lib/platformsdk/libipc_core.z.so(OHOS::BinderConnector::WriteBinder(unsigned long, void*)+56)
```

**cpuusage example:**

Device-wide CPU information.

```
Load average: 2.87 / 1.45 / 0.58 ; the cpu load average in 1 min, 5 min and 15 min
CPU usage from 2023-03-10 17:06:53 to 2023-03-10 17:06:53
Total: 29%; User Space: 28%; Kernel Space: 1%; iowait: 6%; irq: 0%; idle: 62%
Details of Processes:
    PID     Total Usage     User Space     Kernel Space     Page Fault Minor     Page Fault Major      Name
    1561       23%            23%              0%               9985                  26            i.myapplication
    527        1%             1%               0%               3046                  9             hidumper_servic
    242        1%             1%               0%               69799                 280           hiview
```

**memory example:**

Memory information for the current process.

```
-------------------------------------------[memory]----------------------------------------
                 Pss      Shared   Shared   Private  Private   Swap   SwapPss   Heap  Heap
                 Total    Clean    Dirty    Clean    Dirty     Total  Total     Size  Alloc
                 (kB)     (kB)     (kB)     (kB)      (kB)     (kB)    (kB)     (kB)  (kB)
-------------------------------------------------------------------------------------------
guard             0        0         0       0         0         0      0        0      0
native heap      185       0        180      0        160        0      0        0      0
AnonPage other   17881    12        12376    88       15948      0      0        0      0
stack            292       0        0        0        292        0      0        0      0
.so              5053     63408     4172     1812     2640       0      0        0      0
.ttf             1133     3092      0        4        0          0      0        0      0
dev              10       0         108      8        0          0      0        0      0
FilePage other   121      556       8        0        4          0      0        0      0
------------------------------------------------------------------------------------------
Total            34675    67068     16844    1912     19044      0      0        0      0
```

### Fault-Specific Log Body Information (Main Thread Timeout)

These are logs whose Reason is THREAD_BLOCK_6S. As described in [THREAD_BLOCK_6S: Application Main Thread Timeout](#thread_block_6s-application-main-thread-timeout), THREAD_BLOCK consists of THREAD_BLOCK_3S and THREAD_BLOCK_6S. Comparing the two parts makes it easier to tell a genuine freeze from a main thread overwhelmed by too many tasks.

THREAD_BLOCK_3S appears in the first part of the log, with THREAD_BLOCK_6S written after it. Search for the EVENTNAME field to locate both events in the log.

Both events contain an MSG field which, for this fault type, records the main thread's processing queues, so you can compare the queue state at the two points in time.

The example log shows the task started at 05:06:18.145 still being processed in both the THREAD_BLOCK_3S and THREAD_BLOCK_6S dumps of the Low priority queue. This means the main thread is genuinely stuck rather than overloaded with tasks.

Since THREAD_BLOCK_6S means the main thread is stuck, only the main thread's stack matters (the main thread's tid equals the pid). In this example the main thread stack runs from an ArkUI component into JS execution, so the freeze is in the JS code. The 3S and 6S stacks are at the same position, confirming that JS is stuck and ruling out task overload as the cause.

THREAD_BLOCK_3S:

```
start time:2017/08/08-17:06:24:380
DOMAIN = AAFWK
EVENTNAME = THREAD_BLOCK_3S
TIMESTAMP = 2017/08/08-17:06:24:363
PID = 1561
UID = 20010039
TID = 1566
PACKAGE_NAME = com.ohos.huawei.myapplication
PROCESS_NAME = com.ohos.huawei.myapplication
eventLog_action pb:1 eventLog_interval 10
MSG = App main thread is not response!EventHandler dump begin curTime:2017-08-08 05:06:24.362
  Event runner (Thread name = , Thread ID = 1561) is running
  Current Running: start at 2017-08-08 05:06:18.145, Event { send thread = 1561, send time = 2017-08-08 05:06:18.145, handle time = 2017-08-08 05:
  Immediate priority event queue information:
  Total size of Immediate events : 0
  High priority event queue information:
  No.1 : Event { send thread = 1561, send time = 2017-08-08 05:06:18.039, handle time = 2017-08-08 05:06:21.539, task name = [anr_handler.cpp(Send
  Total size of High events : 1
  Low priority event queue information:
  No.1 : Event { send thread = 1566, send time = 2017-08-08 05:06:21.062, handle time = 2017-08-08 05:06:21.062, id = 1 }
  Total size of Low events : 1
  Idle priority event queue information:
  Total size of Idle events : 0
  Total event size : 2

 Timestamp: 2017-08-08 17:06:24.4142447784
 Pid: 1561
 Uid: 20010039
 Process name: com.ohos.huawei.myapplication
 Tid:1561 Name:i.myapplication
   at anonymous (D:/project/OpenHarmonyOS/MyApplication_test/entry/build/default/intermediates/loader_out/default/ets/pages/Index_.js:0:1)
   #00 pc 0017909c /system/lib/libark_jsruntime.so
   #01 pc 00177ebb /system/lib/libark_jsruntime.so
   #02 pc 0024b4bb /system/lib/libark_jsruntime.so(panda::FunctionRef::Call(panda::ecmascript::EcmaVM const*, panda::Local<panda::JSValueRef>, panda
   #03 pc 00fbed23 /system/lib/libace.z.so
   #04 pc 00d8208f /system/lib/libace.z.so
   ...
```

THREAD_BLOCK_6S:

```
start time: 2017/08/08-17:06:27:299
DOMAIN = AAFWK
EVENTNAME = THREAD_BLOCK_6S
TIMESTAMP = 2017/08/08-17:06:27:292
PID = 1561
UID = 20010039
TID = 1566
PACKAGE_NAME = com.ohos.huawei.myapplication
PROCESS_NAME = com.ohos.huawei.myapplication
eventLog_action cmd:c,cmd:m,tr,k:SysRqFile
eventLog_interval 10
MSG = App main thread is not response!EventHandler dump begin curTime:2017-08-08 05:06:27.291
  Event runner (Thread name = , Thread ID = 1561) is running
  Current Running: start at 2017-08-08 05:06:18.144, Event { send thread = 1561, send time = 2017-08-08 05:06:18.145, handle time = 2017-08-08 05:
  Immediate priority event queue information:
  Total size of Immediate events : 0
  High priority event queue information:
  No.1 : Event { send thread = 1561, send time = 2017-08-08 05:06:18.039, handle time = 2017-08-08 05:06:21.539, task name = [anr_handler.cpp(Se
  Total size of High events : 1
  Low priority event queue information:
  No.1 : Event { send thread = 1566, send time = 2017-08-08 05:06:21.062, handle time = 2017-08-08 05:06:21.062, id = 1 }
  No.2 : Event { send thread = 1566, send time = 2017-08-08 05:06:24.369, handle time = 2017-08-08 05:06:24.369, id = 1 }
  Total size of Low events : 2
  Idle priority event queue information:
  Total size of Idle events : 0
  Total event size : 3

Timestamp:2017-08-08 17:06:27.4142447784
Pid:1561
Uid:20010039
Process name:com.ohos.huawei.myapplication
Tid:1561 Name:i.myapplication
  at anonymous (D:/project/OpenHarmonyOS/MyApplication_test/entry/build/default/intermediates/loader_out/default/ets/pages/Index_.js:0:1)
  #00 pc 00178dcc /system/lib/libark_jsruntime.so
  #01 pc 00177ebb /system/lib/libark_jsruntime.so
  #02 pc 0024b4bb /system/lib/libark_jsruntime.so(panda::FunctionRef::Call(panda::ecmascript::EcmaVM const*, panda::Local<panda::JSValueRef>, par
  #03 pc 00fbed23 /system/lib/libace.z.so
  #04 pc 00d8208f /system/lib/libace.z.so
  #05 pc 00d7af1b /system/lib/libace.z.so
```

Then check the flow logs to see which code the application was executing at the time.

In general, review the [common log body information](#common-log-body-information) above to judge whether a peer IPC call is stuck, whether device-wide CPU load is too high for the application to respond, or whether a memory leak or excessive memory use is preventing tasks from running.

### Fault-Specific Log Body Information (User Input Response Timeout)

A Reason of APP_INPUT_BLOCK means a user click event got no feedback within 5s.

The MSG field describes the event: the user's input got no response.

Analyze APP_INPUT_BLOCK logs per the [common log body information](#common-log-body-information). Note that when user input hangs, the main thread has usually frozen as well, so compare the three stacks and two BinderCatcher sections across the two logs. If there is no main-thread-freeze log, the input event was likely preceded by many small events, none big enough to freeze the main thread, but numerous enough that the input event could not be served in time.

### Fault-Specific Log Body Information (Lifecycle Switch Timeout)

Logs with Reason LIFECYCLE_TIMEOUT contain two events, just like THREAD_BLOCK_6S and THREAD_BLOCK_3S above: LIFECYCLE_HALF_TIMEOUT and LIFECYCLE_TIMEOUT.

MSG states which lifecycle phase timed out.

In the example, LIFECYCLE_TIMEOUT shows the Ability timing out while switching to the background; use the timeout values in [Lifecycle Switch Timeout](#lifecycle-switch-timeout) above to locate the flow logs and other information.

LIFECYCLE_TIMEOUT:

```
DOMAIN:AAFWK
STRINGID:LIFECYCLE_TIMEOUT
TIMESTAMP:2023/03/10-17:06:53:65
PID:1561
UID:20010039
PACKAGE_NAME:com.ohos.huawei.myapplication
PROCESS_NAME:com.ohos.huawei.myapplication
MSG:ability:EntryAbility background timeout
```

Analyze the remaining log content per the [common log body information](#common-log-body-information). Note that when a lifecycle switch times out, the main thread has usually frozen as well; compare the three stacks and two BinderCatcher sections across the two logs.

## Application Exit

When any of the following faults occurs, the application is killed to ensure it can recover.

| Fault type | Description |
| -------- | -------- |
| THREAD_BLOCK_6S | Application main thread timeout |
| APP_INPUT_BLOCK | User input response timeout |
| LIFECYCLE_TIMEOUT | Ability lifecycle switch timeout |

## Locating Steps and Approach

To locate an appfreeze problem, first obtain the relevant logs, then use the basic fault information they record, together with hilog and trace, to pinpoint exactly where the freeze occurred.

### Obtaining Logs

Appfreeze logs are a kind of fault log. Like native process crashes, JS application crashes, and system process exceptions, they are managed by the FaultLog module and can be obtained in the following ways:

- Method 1: via DevEco Studio

    DevEco Studio collects the device's fault logs and archives them under FaultLog. See <!--RP1-->[DevEco Studio User Guide - FaultLog](https://developer.huawei.com/consumer/cn/doc/harmonyos-guides-V5/ide-fault-log-V5)<!--RP1End-->.

- Method 2: via the hiAppEvent API

    hiAppEvent provides interfaces for subscribing to various fault events; see [Introduction to HiAppEvent](hiappevent-intro.md).

<!--Del-->
- Method 3: via the shell

    Appfreeze log file names start with appfreeze- and are generated under the /data/log/faultlog/faultlogger/ path on the device, named "appfreeze-bundle name-application UID-time in seconds".

    ![appfreeze_20230308145160](figures/appfreeze_20230308145160.png)
<!--DelEnd-->

### Confirming Basic Information

#### Obtaining the frozen application's pid, foreground state, and other basics

```
Generated by HiviewDFX@OpenHarmony
============================================================
Device info:HUAWEI Mate 60 Pro
Build info:ALN-AL00 x.x.x.xx(XXXXXXX)
Fingerprint: ef8bd28f8b57b54656d743b546efa73764c77866a65934bd96f2678f886813b7
Module name:com.xxx.xxx
Version:1.2.2.202
VersionCode: 1002002202
PreInstalled:Yes
Foreground:No   --> whether the application is in the foreground
Pid:15440
Uid:20020029
Reason:THREAD_BLOCK_6S
appfreeze: com.xxx.xxx THREAD_BLOCK_6S at 20240410164052
DisplayPowerInfo:powerState: AWAKE
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
```

#### Determining when the fault occurred

Fault report time:

```
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
DOMAIN:AAFWK
STRINGID: THREAD_BLOCK_6S
TIMESTAMP: 2024/04/10-16:40:52:743   --> fault report timestamp
PID:15440
UID:20020029
PACKAGE_NAME:com.xxx.xxx
PROCESS_NAME:com.xxx.xxx
****************************************
```

Summary of detection durations per fault type and scenario:

| THREAD_BLOCK_6S | APP_INPUT_BLOCK | LIFECYCLE_TIMEOUT |
| -------- | -------- | -------- |
| Foreground app: 6s <br> Background app: 3s * 5 + 6s = 21s | 5s | Load: 10s <br> Active: 5s <br> Inactive: 0.5s <br> Terminate: 10s <br> Connect: 3s <br> Disconnect: 0.5s <br> Restart: 5s <br> Foreground: 5s <br> Background: 3s |

**Note:**
1. The detection duration of THREAD_BLOCK_3S / LIFECYCLE_HALF_TIMEOUT is half that of THREAD_BLOCK_6S / LIFECYCLE_TIMEOUT; they are warning level and do not report a log on their own. THREAD_BLOCK_6S / LIFECYCLE_TIMEOUT are error level and report one log combining their own information with that of the half-duration event.
2. For a foreground application, a THREAD_BLOCK_3S immediately arms the follow-up THREAD_BLOCK_6S event.
3. A background application keeps a counter, backgroundReportCount_ = 0, incremented on each THREAD_BLOCK_3S; only when it reaches 5 (five THREAD_BLOCK_3S events, the count never being reset) is THREAD_BLOCK_6S reported. For background applications the THREAD_BLOCK_3S and THREAD_BLOCK_6S detection durations are therefore 18s and 21s; see the sketch below.

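A minimal sketch of that background counting rule (the counter name follows the note above; the surrounding logic is illustrative, not the actual watchdog source):

```cpp
#include <iostream>

int backgroundReportCount_ = 0;  // counter kept for a background application

// Invoked on each THREAD_BLOCK_3S detection (one liveness miss every ~3s).
bool ShouldReportBlock() {
    ++backgroundReportCount_;           // never reset between detections
    return backgroundReportCount_ >= 5; // 5th miss: report THREAD_BLOCK_6S
}

int main() {
    for (int miss = 1; miss <= 5; ++miss) {
        if (ShouldReportBlock()) {
            // 5 misses * ~3s ≈ 18s of warnings, plus the final 3s window ≈ 21s
            std::cout << "report THREAD_BLOCK_6S at miss " << miss << "\n";
        }
    }
    return 0;
}
```
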
Subtracting the detection duration from the fault report time gives the time the freeze actually began.

### Examining the eventHandler Information

Search the log for "mainHandler dump is" to find the eventHandler dump information.

1. dump begin curTime & Current Running

```
mainHandler dump is:
 EventHandler dump begin curTime: 2024-08-08 12:17:43.544      --> time the dump started
 Event runner (Thread name = , Thread ID = 35854) is running   --> information on the running thread
 Current Running: start at 2024-08-08 12:17:16.629, Event { send thread = 35882, send time = 2024-08-08 12:17:16.628,  handle time = 2024-08-08 12:17:16.629, trigger time = 2024-08-08 12:17:16.630, task name = , caller = xx }
 --> trigger time: the time the task started running
```

Current task running time = dump begin curTime - trigger time; here 12:17:43.544 - 12:17:16.630 ≈ 26.9s, so the current task has been running for roughly 27s.
If the running time exceeds the fault detection duration, the currently running task is the one freezing the application and should be investigated.
If the running time is small, the current task is merely one of the tasks the main thread ran within the detection window and is not necessarily the main cost; prefer to inspect the longest-running recent tasks (in the History event queue information). This case usually means the thread is so busy that the watchdog task cannot get scheduled.

2. History event queue information

```
 Current Running: start at 2024-08-08 12:17:16.629, Event { send thread = 35882, send time = 2024-08-08 12:17:16.628, handle time = 2024-08-08 12:17:16.629, trigger time = 2024-08-08 12:17:16.630, task name = , caller = [extension_ability_thread.cpp(ScheduleAbilityTransaction:393)]}
 History event queue information:
 No. 0 : Event { send thread = 35854, send time = 2024-08-08 12:17:15.525, handle time = 2024-08-08 12:17:15.525, trigger time = 2024-08-08 12:17:15.527, completeTime time = 2024-08-08 12:17:15.528, priority = High, id = 1 }
 No. 1 : Event { send thread = 35854, send time = 2024-08-08 12:17:15.525, handle time = 2024-08-08 12:17:15.525, trigger time = 2024-08-08 12:17:15.527, completeTime time = 2024-08-08 12:17:15.527, priority = Low, task name = MainThread:SetRunnerStarted }
 No. 2 : Event { send thread = 35856, send time = 2024-08-08 12:17:15.765, handle time = 2024-08-08 12:17:15.765, trigger time = 2024-08-08 12:17:15.766, completeTime time = 2024-08-08 12:17:15.800, priority = Low, task name = MainThread:LaunchApplication }
 No. 3 : Event { send thread = 35856, send time = 2024-08-08 12:17:15.767, handle time = 2024-08-08 12:17:15.767, trigger time = 2024-08-08 12:17:15.800, completeTime time = 2024-08-08 12:17:16.629, priority = Low, task name = MainThread:LaunchAbility }
 No. 4 : Event { send thread = 35854, send time = 2024-08-08 12:17:15.794, handle time = 2024-08-08 12:17:15.794, trigger time = 2024-08-08 12:17:16.629, completeTime time = 2024-08-08 12:17:16.629, priority = IDEL, task name = IdleTime:PostTask }
 No. 5 : Event { send thread = 35852, send time = 2024-08-08 12:17:16.629, handle time = 2024-08-08 12:17:16.629, trigger time = 2024-08-08 12:17:16.629, completeTime time = , priority = Low, task name =  }
```

Look through the history queue for tasks that took long within the fault window; the entry whose completeTime time is empty is the current task.
Task running cost = completeTime time - trigger time.
Filter out the expensive tasks and investigate how they ran.

3. VIP priority event queue information

```
 VIP priority event queue information:
 No. 1 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.407, handle time = 2024-08-07 04:11:15.407, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)]}
 No. 2 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.407, handle time = 2024-08-07 04:11:15.407, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)]}
 No. 3 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.407, handle time = 2024-08-07 04:11:15.407, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)]}
 No. 4 : Event { send thread = 3961, send time = 2024-08-07 04:11:15.408, handle time = 2024-08-07 04:11:15.408, task name = MMI::OnPointerEvent, caller = [input_manager_impl.cpp (OnPointerEvent:493)]}
 No. 5 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.408, handle time = 2024-08-07 04:11:15.408, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)]}
 No. 6 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.409, handle time = 2024-08-07 04:11:15.409, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)]}
 No. 7 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.409, handle time = 2024-08-07 04:11:15.409, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)]}
 No. 8 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.409, handle time = 2024-08-07 04:11:15.409, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)]}
 No. 9 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.410, handle time = 2024-08-07 04:11:15.410, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)]}
 ...
```

To respond to the user as quickly as possible, tasks on the user-input delivery chain are given this priority. The queue is created by the system and typically records the delivery path user input -> screen -> window -> ArkUI -> application; it does not involve third-party application events, so developers need not pay extra attention to it.

4. High priority event queue information

```
 High priority event queue information:
 No. 1 : Event { send thread = 35862, send time = 2024-08-08 12:17:25.526, handle time = 2024-08-08 12:17:25.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 2 : Event { send thread = 35862, send time = 2024-08-08 12:17:28.526, handle time = 2024-08-08 12:17:28.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 3 : Event { send thread = 35862, send time = 2024-08-08 12:17:31.526, handle time = 2024-08-08 12:17:31.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 4 : Event { send thread = 35862, send time = 2024-08-08 12:17:34.530, handle time = 2024-08-08 12:17:34.530, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 5 : Event { send thread = 35862, send time = 2024-08-08 12:17:37.526, handle time = 2024-08-08 12:17:37.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 6 : Event { send thread = 35862, send time = 2024-08-08 12:17:40.526, handle time = 2024-08-08 12:17:40.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 7 : Event { send thread = 35862, send time = 2024-08-08 12:17:43.544, handle time = 2024-08-08 12:17:43.544, id = 1, caller = [watchdog.cpp(Timer:156)]}
 Total size of High events : 7
```

The watchdog task lives in this priority queue; note that one is sent every 3s.

Compare the warning and block events to see how the watchdog tasks move through the queue.

warning:
```
 High priority event queue information:
 No. 1 : Event { send thread = 35862, send time = 2024-08-08 12:17:25.526, handle time = 2024-08-08 12:17:25.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 2 : Event { send thread = 35862, send time = 2024-08-08 12:17:28.526, handle time = 2024-08-08 12:17:28.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 3 : Event { send thread = 35862, send time = 2024-08-08 12:17:31.526, handle time = 2024-08-08 12:17:31.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 4 : Event { send thread = 35862, send time = 2024-08-08 12:17:34.530, handle time = 2024-08-08 12:17:34.530, id = 1, caller = [watchdog.cpp(Timer:156)]}
 Total size of High events : 4
```

block:
```
 High priority event queue information:
 No. 1 : Event { send thread = 35862, send time = 2024-08-08 12:17:25.526, handle time = 2024-08-08 12:17:25.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 2 : Event { send thread = 35862, send time = 2024-08-08 12:17:28.526, handle time = 2024-08-08 12:17:28.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 3 : Event { send thread = 35862, send time = 2024-08-08 12:17:31.526, handle time = 2024-08-08 12:17:31.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 4 : Event { send thread = 35862, send time = 2024-08-08 12:17:34.530, handle time = 2024-08-08 12:17:34.530, id = 1, caller = [watchdog.cpp(Timer:156)]}
 No. 5 : Event { send thread = 35862, send time = 2024-08-08 12:17:37.526, handle time = 2024-08-08 12:17:37.526, id = 1, caller = [watchdog.cpp(Timer:156)]}
 Total size of High events : 5
```

In the example the block queue has grown relative to the warning queue while the first task is unchanged, which suggests one of two situations:
- The currently running task is stuck, so the other tasks never get scheduled.
- Tasks are piling up in an even higher-priority queue, so the watchdog task in this lower-priority queue never gets scheduled.

### Examining the Stack Information

Use the Pid and Tid obtained above to look at the corresponding stack. Several situations are possible:

1. A clear stuck stack

```
Tid:3025, Name: xxx
# 00 pc 00000000001b4094 /system/lib/ld-musl-aarch64.so.1(__timedwait_cp+188)(b168f10a179cf6050a309242262e6a17)
# 01 pc 00000000001b9fc8 /system/lib/ld-musl-aarch64.so.1(__pthread_mutex_timedlock_inner+592)(b168f10a179cf6050a309242262e6a17)
# 02 pc 00000000000c3e40 /system/lib64/libc++.so(std::__h::mutex::lock()+8)(9cbc937082b3d7412696099dd58f4f78242f9512) --> stuck waiting for a lock
# 03 pc 000000000007ac4c /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteConnectionPool::Container::Release(std::__h::shared_ptr<OHOS::NativeRdb::SqliteConnectionPool::ConnNode>)+60)(5e8443def4695e8c791e5f847035ad9f)
# 04 pc 000000000007aaf4 /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteConnectionPool::ReleaseNode(std::__h::shared_ptr<OHOS::NativeRdb::SqliteConnectionPool::ConnNode>)+276)(5e8443def4695e8c791e5f847035ad9f)
# 05 pc 000000000007a8c0 /system/lib64/platformsdk/libnative_rdb.z.so(5e8443def4695e8c791e5f847035ad9f)
# 06 pc 00000000000b36ec /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteSharedResultSet::Close()+324)(5e8443def4695e8c791e5f847035ad9f)
# 07 pc 000000000006da94 /system/lib64/module/data/librelationalstore.z.so(OHOS::RelationalStoreJsKit::ResultSetProxy::Close(napi_env__*, napi_callback_info__*) (.cfi)+212)(5c7c67512e12e0e53fd23e82ee576a88)
# 08 pc 0000000000034408 /system/lib64/platformsdk/libace_napi.z.so(panda::JSValueRef ArkNativeFunctionCallBack<true>(panda::JsiRuntimeCallInfo*)+220)(f271f536a588ef9d0dc5328c70fce511)
# 09 pc 00000000002d71d0 /system/lib64/module/arkcompiler/stub.an(RTStub_PushCallArgsAndDispatchNative+40)
# 10 at parseResultSet (entry/build/default/cache/default/default@CompileArkTS/esmodule/release/datamanager/datawrapper/src/main/ets/database/RdbManager.ts:266:1)
# 11 at query (entry/build/default/cache/default/default@CompileArkTS/esmodule/release/datamanager/datawrapper/src/main/ets/database/RdbManager.ts:188:1)
```

The .so is clearly stuck waiting for a lock; decompile to find the corresponding code line, then examine the code context to fix the bug.

2. Stuck in an IPC request

```
Tid:53616, Name:xxx
# 00 pc 0000000000171c1c /system/lib/ld-musl-aarch64.so.1(ioctl+176)(b168f10a179cf6050a309242262e6a17)
# 01 pc 0000000000006508 /system/lib64/chipset-pub-sdk/libipc_common.z.so(OHOS::BinderConnector::WriteBinder(unsigned long, void*)+100)(1edec25445c569dd1093635c1da3bc0a) --> stuck in binder
# 02 pc 000000000004d500 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::TransactWithDriver(bool)+296)(6151eca3b47aa2ab3e378e6e558b90f3)
# 03 pc 000000000004c6c0 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::WaitForCompletion(OHOS::MessageParcel*, int*)+128)(6151eca3b47aa2ab3e378e6e558b90f3)
# 04 pc 000000000004c304 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::SendRequest(int, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+348)(6151eca3b47aa2ab3e378e6e558b90f3)
# 05 pc 00000000000319ac /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::SendRequestInner(bool, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+124)(6151eca3b47aa2ab3e378e6e558b90f3)
# 06 pc 0000000000031cfc /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::SendRequest(unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+184)(6151eca3b47aa2ab3e378e6e558b90f3)
# 07 pc 0000000000023c7c /system/lib64/libipc.dylib.so(<ipc::remote::obj::RemoteObj>::send_request+268)(7006cb5520edc22f64d04df86cb90152)
# 08 pc 000000000000b904 /system/lib64/libasset_sdk.dylib.so(<asset_sdk::Manager>::send_request+48)(4073ec22b58b83f79883d5fc8102ce77)
# 09 pc 000000000000b600 /system/lib64/libasset_sdk.dylib.so(<asset_sdk::Manager>::query+156)(4073ec22b58b83f79883d5fc8102ce77)
# 10 pc 0000000000006d94 /system/lib64/libasset_sdk_ffi.z.so(query_asset+116)(9a309896092ba014c878289a54688679)
# 11 pc 0000000000006740 /system/lib64/module/security/libasset_napi.z.so((anonymous namespace)::NapiQuerySync(napi_env__*, napi_callback_info__*) (.cfi)+220)(ef7afe850712e4822f085ed0ac184e8a)
# 12 pc 0000000000034408 /system/lib64/platformsdk/libace_napi.z.so(panda::JSValueRef ArkNativeFunctionCallBack<true>(panda::JsiRuntimeCallInfo*)+220)(f271f536a588ef9d0dc5328c70fce511)
```

3. warning/error stacks agree, with synchronous business code on top

Both the warning and error stacks are:

```
Tid:14727, Name:xxx
# 00 pc 00000000001c4c60 /system/lib/ld-musl-aarch64.so.1(pread+72)(b168f10a179cf6050a309242262e6a17)
# 01 pc 0000000000049154 /system/lib64/platformsdk/libsqlite.z.so(unixRead+180)(48485aa23da681fc87d8dc0b4be3e34c)
# 02 pc 0000000000053e98 /system/lib64/platformsdk/libsqlite.z.so(readDbPage+116)(48485aa23da681fc87d8dc0b4be3e34c)
# 03 pc 0000000000053d48 /system/lib64/platformsdk/libsqlite.z.so(getPageNormal+864)(48485aa23da681fc87d8dc0b4be3e34c)
# 04 pc 00000000000757a0 /system/lib64/platformsdk/libsqlite.z.so(getAndInitPage+216)(48485aa23da681fc87d8dc0b4be3e34c)
# 05 pc 0000000000077658 /system/lib64/platformsdk/libsqlite.z.so(moveToLeftmost+164)(48485aa23da681fc87d8dc0b4be3e34c)
# 06 pc 000000000006aa34 /system/lib64/platformsdk/libsqlite.z.so(sqlite3VdbeExec+34532)(48485aa23da681fc87d8dc0b4be3e34c)
# 07 pc 000000000002e424 /system/lib64/platformsdk/libsqlite.z.so(sqlite3_step+644)(48485aa23da681fc87d8dc0b4be3e34c)
# 08 pc 00000000000b1a70 /system/lib64/platformsdk/libnative_rdb.z.so(FillSharedBlockOpt+408)(5e8443def4695e8c791e5f847035ad9f)
# 09 pc 0000000000082a94 /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteStatement::FillBlockInfo(OHOS::NativeRdb::SharedBlockInfo*) const+76)(5e8443def4695e8c791e5f847035ad9f)
# 10 pc 00000000000b4214 /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteSharedResultSet::ExecuteForSharedBlock(OHOS::AppDataFwk::SharedBlock*, int, int, bool)+236)(5e8443def4695e8c791e5f847035ad9f)
```

Confirm with trace, and check whether the single top-of-stack function being called is running longer than expected.

4. Transient stacks: warning/error stacks differ

warning stack:

```
Tid:3108, Name:xxx
# 00 pc 0000000000146e2c /system/lib/ld-musl-aarch64.so.1(open64+224)(b168f10a179cf6050a309242262e6a17)
# 01 pc 0000000000014600 /system/lib64/chipset-pub-sdk/libextractortool.z.so(OHOS::AbilityBase::ZipFileReader::init()+600)(c4893935af8fc8cb36569be5ccdebfa9)
# 02 pc 0000000000014144 /system/lib64/chipset-pub-sdk/libextractortool.z.so(OHOS::AbilityBase::ZipFileReader::CreateZipFileReader(std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&)+392)(c4893935af8fc8cb36569be5ccdebfa9)
# 03 pc 000000000000f724 /system/lib64/chipset-pub-sdk/libextractortool.z.so(OHOS::AbilityBase::ZipFile::Open()+728)(c4893935af8fc8cb36569be5ccdebfa9)
# 04 pc 000000000000a808 /system/lib64/chipset-pub-sdk/libextractortool.z.so(OHOS::AbilityBase::Extractor::Init()+124)(c4893935af8fc8cb36569be5ccdebfa9)
# 05 pc 000000000000c4a4 /system/lib64/chipset-pub-sdk/libextractortool.z.so(OHOS::AbilityBase::ExtractorUtil::GetExtractor(std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&, bool&, bool)+596)(c4893935af8fc8cb36569be5ccdebfa9)
# 06 pc 00000000000391e4 /system/lib64/platformsdk/libglobal_resmgr.z.so(OHOS::Global::Resource::GetIndexData(char const*, std::__h::unique_ptr<unsigned char [], std::__h::default_delete<unsigned char []>>&, unsigned long&)+284)(5c4263e737507b4a8f2ee7196a152dbd)
# 07 pc 0000000000038590 /system/lib64/platformsdk/libglobal_resmgr.z.so(OHOS::Global::Resource::HapResource::LoadFromHap(char const*, std::__h::shared_ptr<OHOS::Global::Resource::ResConfigImpl>&, bool, bool, unsigned int const&)+80)(5c4263e737507b4a8f2ee7196a152dbd)
# 08 pc 00000000000384e8 /system/lib64/platformsdk/libglobal_resmgr.z.so(OHOS::Global::Resource::HapResource::Load(char const*, std::__h::shared_ptr<OHOS::Global::Resource::ResConfigImpl>&, bool, bool, unsigned int const&)+364)(5c4263e737507b4a8f2ee7196a152dbd)
# 09 pc 000000000002f118 /system/lib64/platformsdk/libglobal_resmgr.z.so(OHOS::Global::Resource::HapManager::AddResourcePath(char const*, unsigned int const&)+280)(5c4263e737507b4a8f2ee7196a152dbd)
# 10 pc 000000000002efdc /system/lib64/platformsdk/libglobal_resmgr.z.so(OHOS::Global::Resource::HapManager::AddResource(char const*, unsigned int const&)+52)(5c4263e737507b4a8f2ee7196a152dbd)
```

error stack:

```
Tid:3108, xxx
# 00 pc 00000000003e13cc /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::JSObject::GetProperty(panda::ecmascript::JSThread*, panda::ecmascript::JSHandle<panda::ecmascript::JSTaggedValue> const&, panda::ecmascript::JSHandle<panda::ecmascript::JSTaggedValue> const&, panda::ecmascript::JSShared::SCheckMode)+164)(13376099388381a01b166c00a8af99fb)
# 01 pc 00000000003d5518 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::JSIterator::IteratorStep(panda::ecmascript::JSThread*, panda::ecmascript::JSHandle<panda::ecmascript::JSTaggedValue> const&)+228)(13376099388381a01b166c00a8af99fb)
# 02 pc 0000000000570fa8 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::RuntimeStubs::StArraySpread(unsigned long, unsigned int, unsigned long)+592)(13376099388381a01b166c00a8af99fb)
# 03 pc 00000000002d53c0 /system/lib64/module/arkcompiler/stub.an(RTStub_CallRuntime+40)
# 04 at doTask (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/staticcommon/launchercommon/src/main/ets/db/RdbTaskPool.ts:1:1)
# 05 at update (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/staticcommon/launchercommon/src/main/ets/db/RdbTaskPool.ts:1:1)
# 06 at updateAppNameByAbilityInfoAndType (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/staticcommon/launchercommon/src/main/ets/db/RdbStoreManager.ts:12:1)
# 07 at anonymous (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/staticcommon/launchercommon/src/main/ets/model/AppModel.ts:0:1)
# 08 pc 0000000000304a94 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::InterpreterAssembly::Execute(panda::ecmascript::EcmaRuntimeCallInfo*)+144)(13376099388381a01b166c00a8af99fb)
# 09 pc 0000000000201d84 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::builtins::BuiltinsPromiseJob::PromiseReactionJob(panda::ecmascript::EcmaRuntimeCallInfo*)+348)(13376099388381a01b166c00a8af99fb)
# 10 pc 00000000002d6e14 /system/lib64/module/arkcompiler/stub.an(RTStub_AsmInterpreterEntry+208)
```

These stacks were captured at arbitrary moments while the thread kept running and show no pattern, so the thread is not stuck but busy; combine trace and hilog to determine what the application is actually doing and optimize that scenario.

### Examining the Binder Information

When binder information is captured: for fault types that have half-duration detection, it is captured after the warning event; for the rest, after the block event.

1. Obtaining the binder call chain

```
PeerBinderCatcher -- pid==35854 layer_ == 1
BinderCatcher --
    35854:35854 to 52462:52462 code 3 wait:27.185154163 s frz_state:3
    ...
    52462:52462 to 1386:0 code 13 wait:24.733640622 s frz_state:3
```

Using this example as a reference: starting from the faulty process's main thread there is a call chain 35854:35854 -> 52462:52462 -> 1386:0; use the peer processes' stacks to investigate why the peer is blocked.

2. Thread number 0

It means the peer is IPC_FULL: all of that application's IPC threads are in use and none is free to serve this request, so the request blocks. Process 1386 in the example above is such a case; analyze it together with its stack:

```
pid     context     request   started    max     ready   free_async_space
35862    binder      0          2        16       2         519984
35854    binder      0          2        16       3         520192
35850    binder      0          2        16       3         520192
13669    binder      0          1        16       3         520192
...
1386     binder      1          15       16       0         517264                 -> binderInfo
1474     binder      0          2        16       4         520192
```

Process 1386 indeed has 0 threads in the ready state, confirming the diagnosis. Its other IPC threads are probably all blocked; investigate why they are not being released. A common scenario: one IPC thread blocks while holding a lock, and the other threads get stuck waiting for that lock.

The other situation is free_async_space running out, so a new IPC request lacks the buffer space to complete. Note that both synchronous and asynchronous requests consume this value; the common scenario is a large burst of asynchronous requests within a short time.

3. waitTime too small

waitTime is the duration of this one IPC call. If it is far below the fault detection duration, this IPC request can confidently be ruled out as the root cause of the freeze.
A typical scenario: the application's main thread makes many IPC requests in a short time, and the accumulated request time causes the fault.

Directions to investigate:

- Whether a single request stays within its expected duration (for example, an interface specified at 20ms reaching 1s in the abnormal case); find out why the interface misses its performance target.
- Whether the application's frequent calls are reasonable for the scenario.

4. No call relationship, but the stack is an IPC stack

Determine whether this is a transient stack, i.e. whether the warning and block stacks agree. A possible scenario: the warning stack is an IPC stack while the block stack is some other transient stack, indicating that the IPC request had already finished by the time the binder information was captured and did not take long.
Note that binder information is not captured at the exact instant of the fault; there is some delay. For fault types with half-duration detection, the binder capture is fairly accurate and almost always lands within the fault window; for other types, reporting delays may yield binder information from after the fact.

Of course, trace shows binder durations more directly.

### Correlating with hilog

#### DFX-related logs

1. Fault reporting (reportEvent)

![appfreeze_2024061401](figures/appfreeze_2024061401.png)

2. Stack capture (signal: 35)

![appfreeze_2024061402](figures/appfreeze_2024061402.png)

3. Background application detection (reported after 5 hits), around 21s

![appfreeze_2024061403](figures/appfreeze_2024061403.png)

4. Recording the kill reason

![appfreeze_2024061404](figures/appfreeze_2024061404.png)

5. APPFREEZE killing the frozen application

![appfreeze_2024061405](figures/appfreeze_2024061405.png)

#### General analysis steps

Determine the report [time](#determining-when-the-fault-occurred) from the fault log, subtract the detection duration for the fault type and scenario to estimate when the freeze began, then read the hilog for that window to work out the state of the application's threads:

- The application stops logging entirely: it is stuck at the API call of the last logged line.

   ![appfreeze_2024061406](figures/appfreeze_2024061406.png)

   **...**

   ![appfreeze_2024061407](figures/appfreeze_2024061407.png)

   In the example above, APP_INPUT_BLOCK was reported at 07:24:08.167 and the application main thread printed nothing after 07:24:01.581; check whether the logic in FormManagerService

   [form_mgr_proxy.cpp(GetFormsInfoByApp:1128)] timed out.

- The application logs heavily: analyze what scenario the output represents and whether it is reasonable.

   ![appfreeze_2024061408](figures/appfreeze_2024061408.png)

   In the example above, the process was producing a large volume of output before being killed by APP_FREEZE; the ImageEffect domain should check whether this logging is normal.

### Correlating with trace

The following situations are possible:

1. Each individual slice of work is short, but the slices run very densely over a long period and saturate the main thread.

![appfreeze_2024061409](figures/appfreeze_2024061409.png)

![appfreeze_2024061410](figures/appfreeze_2024061410.png)

The case above: PriviewArea::updateShotComponent (component update) -> ohos.animator (animation) -> dense animation execution lasting 9.2s.

The thread busily loops over some piece of work; analyze each slice:

- If it does not fit the business scenario (it should not be called this frequently), analyze the business code to find out why it loops.

- If it does fit, analyze whether each slice takes longer than expected and why performance misses the design target.

2. The process times out executing one function.

![appfreeze_2024061411](figures/appfreeze_2024061411.png)

The case above: OHOS::AppExecFwk::FormMgrAdapter::GetFormsInfoByApp took 8s to execute.

## Case Studies

### Typical THREAD_BLOCK Case: Incorrect Lock Usage

#### Background / Principle

xxxservice reported a THREAD_BLOCK_6S appfreeze problem.

#### Faulty code example

The lock is taken on line 4; when the call on line 6 fails, the function returns directly without unlocking, so other threads wait on the lock forever.

```cpp
int xxx()
{
    ...
    mutex_.lock();    // line 4: lock acquired
    AIContext aiContext;
    if (ConvertRpcHandle2AIContext(inputs[0], aiContext) != aicp::SUCCESS) {
        return FAILED;    // early return: the lock is never released
    }
    ...
}
```

#### Impact / Symptom

A background application freezes. The user does not notice directly, but the related functionality becomes unavailable.

#### Locating approach

Extract the key information from the fault log:

```
appfreeze: com.huawei.hmsapp.xxx THREAD_BLOCK_6S at 20240408082432
DisplayPowerInfo:powerState:AWAKE
```

The Foreground field shows the process is in the background, so by the time the real 3s event is reported, the application has already been stuck for about **18s**.

```
Module name:com.xxx.xxx.xxx
Version:1.2.2.202
VersionCode:1002002202
PreInstalled:Yes
Foreground:No  --> background
Pid:43675
Uid:20020029
Reason:THREAD_BLOCK_6S
```

THREAD_BLOCK_3S was reported at 08:24:29:612 and THREAD_BLOCK_6S at 08:24:32:638, 3s apart as expected.

```
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
DOMAIN:AAFWK
STRINGID:THREAD_BLOCK_6S
TIMESTAMP:2024/04/08-08:24:32:638
PID:43675
UID:20020029
PACKAGE_NAME:com.xxx.xxx.xxx
PROCESS_NAME:com.xxx.xxx.xxx
*******************************************
start time: 2024/04/08-08:24:29:627
DOMAIN = AAFWK
EVENTNAME = THREAD_BLOCK_3S
TIMESTAMP = 2024/04/08-08:24:29:612
PID = 43675
UID = 20020029
PACKAGE_NAME = com.xxx.xxx.xxx
PROCESS_NAME = com.xxx.xxx.xxx
```

The 3s report captures the EventHandler information at that moment, 08:24:29.413, as expected. The reported reason is "App main thread is not response!": the main thread is unresponsive, and the task currently running started at 08:24:01.514.

```
MSG = 
Fault time:2024/04/08-08:24:29
App main thread is not response!
mainHandler dump is:
 EventHandler dump begin curTime: 2024-04-08 08:24:29.413
 Event runner (Thread name = , Thread ID = 43675) is running
 Current Running: start at 2024-04-08 08:24:01.514, Event { send thread = 43675, send time = 2024-04-08 08:24:01.514, handle time = 2024-04-08 08:24:01.514, task name = uvLoopTask }
```

The watchdog task sits in the High priority event queue; as shown below, one is posted to the main thread every 3s, as expected.

The THREAD_BLOCK_3S and THREAD_BLOCK_6S queues agree, the 6s one containing one extra event.

The earliest event's send time is **08:24:11.388**, exactly 18s before the 3s report time of 08:24:29:612, as expected.

```
 High priority event queue information:
 No.1 : Event { send thread = 43679, send time = 2024-04-08 08:24:11.388, handle time = 2024-04-08 08:24:11.388, id = 1, caller = [watchdog.cpp(Timer:139)] }
 No.2 : Event { send thread = 43679, send time = 2024-04-08 08:24:14.458, handle time = 2024-04-08 08:24:14.458, id = 1, caller = [watchdog.cpp(Timer:139)] }
 No.3 : Event { send thread = 43679, send time = 2024-04-08 08:24:17.383, handle time = 2024-04-08 08:24:17.383, id = 1, caller = [watchdog.cpp(Timer:139)] }
 No.4 : Event { send thread = 43679, send time = 2024-04-08 08:24:20.363, handle time = 2024-04-08 08:24:20.363, id = 1, caller = [watchdog.cpp(Timer:139)] }
 No.5 : Event { send thread = 43679, send time = 2024-04-08 08:24:23.418, handle time = 2024-04-08 08:24:23.418, id = 1, caller = [watchdog.cpp(Timer:139)] }
 No.6 : Event { send thread = 43679, send time = 2024-04-08 08:24:26.369, handle time = 2024-04-08 08:24:26.369, id = 1, caller = [watchdog.cpp(Timer:139)] }
 No.7 : Event { send thread = 43679, send time = 2024-04-08 08:24:29.412, handle time = 2024-04-08 08:24:29.412, id = 1, caller = [watchdog.cpp(Timer:139)] }
```

To summarize: the application **main thread started the current task at 08:24:01.514, the first 3s check began at 08:24:11.388, and the thread really got stuck around 08:24:11.**

The main thread stack runs xxx_request_client.so -> libsamgr_proxy.z.so -> libipc_core.z.so (OHOS::BinderConnector::WriteBinder).

So: **the main thread issued an IPC request and froze because the peer never returned, as the stack below shows.**

```
Tid:43675, Name:xxx
# 00 pc 0000000000168c44 /system/lib/ld-musl-aarch64.so.1(ioctl+176)(91b804d2409a13f27463debe9e19fb5d)
# 01 pc 0000000000049268 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderConnector::WriteBinder(unsigned long, void*)+112)(e59500a4ea66775388332f6e3cc12fe3)
# 02 pc 0000000000054fd4 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::TransactWithDriver(bool)+296)(e59500a4ea66775388332f6e3cc12fe3)
# 03 pc 00000000000544c8 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::WaitForCompletion(OHOS::MessageParcel*, int*)+304)(e59500a4ea66775388332f6e3cc12fe3)
# 04 pc 0000000000053c84 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::SendRequest(int, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+312)(e59500a4ea66775388332f6e3cc12fe3)
# 05 pc 000000000002d6d8 /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::SendRequestInner(bool, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+128)(e59500a4ea66775388332f6e3cc12fe3)
# 06 pc 0000000000030e00 /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::GetProtoInfo()+396)(e59500a4ea66775388332f6e3cc12fe3)
# 07 pc 000000000002e990 /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::WaitForInit()+292)(e59500a4ea66775388332f6e3cc12fe3)
# 08 pc 0000000000036cd0 /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCProcessSkeleton::FindOrNewObject(int)+116)(e59500a4ea66775388332f6e3cc12fe3)
# 09 pc 00000000000571cc /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::UnflattenObject(OHOS::Parcel&)+272)(e59500a4ea66775388332f6e3cc12fe3)
# 10 pc 00000000000463a4 /system/lib64/platformsdk/libipc_core.z.so(OHOS::MessageParcel::ReadRemoteObject()+116)(e59500a4ea66775388332f6e3cc12fe3)
# 11 pc 000000000001250c /system/lib64/chipset-pub-sdk/libsamgr_proxy.z.so(OHOS::SystemAbilityManagerProxy::CheckSystemAbility(int, bool&)+952)(6f113f37ac6ac882cfa16077ad5b406a)
# 12 pc 0000000000010e7c /system/lib64/chipset-pub-sdk/libsamgr_proxy.z.so(OHOS::SystemAbilityManagerProxy::GetSystemAbilityWrapper(int, std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&)+232)(6f113f37ac6ac882cfa16077ad5b406a)
# 13 pc 00000000000118b8 /system/lib64/chipset-pub-sdk/libsamgr_proxy.z.so(OHOS::SystemAbilityManagerProxy::Recompute(int, int)+132)(6f113f37ac6ac882cfa16077ad5b406a)
# 14 pc 0000000000011170 /system/lib64/chipset-pub-sdk/libsamgr_proxy.z.so(OHOS::DynamicCache<int, OHOS::sptr<OHOS::IRemoteObject>>::QueryResult(int, int)+316)(6f113f37ac6ac882cfa16077ad5b406a)
# 15 pc 0000000000007e0c xxx_request_client.so(xxx::RPCRequestClient::GetService()+540)(557450139184527807025a632613fd76)
# 16 pc 0000000000008824 xxx_request_client.so(xxx::RPCRequestClient::Init()+16)(557450139184527807025a632613fd76)
# 17 pc 0000000000008d60 xxx_request_client.so(CreateRpcRequestByServiceName+104)(557450139184527807025a632613fd76)
# 18 pc 0000000000008f98 xxx_request_client.so(CreateRpcRequest+72)(557450139184527807025a632613fd76)
# 19 pc 0000000000002944 xxx_rpc_client.so(xxx::xxx::RpcRequestClient::RpcRequestClient()+224)(02343ed2fff69759d408b23eaf69fcde)
```

Check the binderCatcher: **the main thread of 43675 is communicating with process 979, and had already been stuck for 27s when the binder information was captured.**

```
PeerBinderCatcher -- pid==43675 layer_ == 1
BinderCatcher --
    43675:43675 to 979:0 code 5f475249 wait:27.104545829 s frz_state:1  --> the binder call has been waiting for 27s
    57187:39147 to 28644:30753 code 0 wait:0.337894271 s frz_state:3
    57187:39151 to 28644:28652 code 7 wait:0.531140625 s frz_state:3
    57187:39150 to 28644:31297 code 0 wait:0.976419270 s frz_state:3
    57187:38979 to 28644:32554 code 0 wait:0.22108334 s frz_state:3
    57187:39149 to 28644:30754 code 0 wait:0.534261979 s frz_state:3
    57187:38949 to 28644:31301 code 0 wait:0.977779166 s frz_state:3
    57187:39172 to 28644:35667 code 0 wait:1.47387500 s frz_state:3
    57187:39173 to 28644:28822 code 0 wait:0.565389063 s frz_state:3
    1477:1676 to 1408:2026 code 17 wait:0.0 s frz_state:3
    628:8136 to 979:0 code 2 wait:13166.722870859 s frz_state:1
```

Check the main thread stack of process 979: xxxserver is waiting for a lock to be released. **This is the classic lock-misuse case: threads frozen waiting on a lock.**

```
PeerBinder Stacktrace --
PeerBinderCatcher start catcher stacktrace for pid : 979
Result: 0 ( no error )
Timestamp:2024-04-08 08:24:29.000
Pid:979
Uid:3094
Process name:xxxserver
Process life time:60410s
Tid:979, Name:xxxserver
# 00 pc 00000000001aafc4 /system/lib/ld-musl-aarch64.so.1(__timedwait_cp+192)(91b804d2409a13f27463debe9e19fb5d)
# 01 pc 00000000001b0d50 /system/lib/ld-musl-aarch64.so.1(__pthread_mutex_timedlock_inner+592)(91b804d2409a13f27463debe9e19fb5d)
# 02 pc 00000000000c38e0 /system/lib64/libc++.so(std::__h::mutex::lock()+8)(0b61ba21a775725a1bd8802a393b133afbc503a5)   --> called lock() and is waiting
# 03 pc 00000000000086dc xxx_server.so(xxx::xxx::InitImpl(int, std::__h::vector<xxx::xxx::RpcHandle, std::__h::allocator<xxx::xxx::RpcHandle>> const&, std::__h::vector<xxx::xxx::RpcHandle, std::__h::allocator<xxx::xxx::RpcHandle>>&)+84)(f4bb275898d797b22eae35fe48db9009)
# 04 pc 000000000000798c xxx_request_server.so(xxx::RPCRequestStub::SyncExecute(OHOS::MessageParcel&, OHOS::MessageParcel&)+164)(70cbb10c758902c1e3e179efc93ce0af)
# 05 pc 0000000000008314 xxx_request_server.so(xxx::RPCRequestStub::OnRemoteRequest(unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+300)(70cbb10c758902c1e3e179efc93ce0af)
# 06 pc 00000000000153e4 /system/lib64/chipset-pub-sdk/libipc_single.z.so(OHOS::IPCObjectStub::SendRequest(unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+604)(25b3d63905ef47289c096ea42ba2d86a)
# 07 pc 000000000002b464 /system/lib64/chipset-pub-sdk/libipc_single.z.so(OHOS::IPC_SINGLE::BinderInvoker::OnTransaction(unsigned char const*)+1220)(25b3d63905ef47289c096ea42ba2d86a)
# 08 pc 000000000002baec /system/lib64/chipset-pub-sdk/libipc_single.z.so(OHOS::IPC_SINGLE::BinderInvoker::HandleCommandsInner(unsigned int)+368)(25b3d63905ef47289c096ea42ba2d86a)
# 09 pc 000000000002a6b0 /system/lib64/chipset-pub-sdk/libipc_single.z.so(OHOS::IPC_SINGLE::BinderInvoker::HandleCommands(unsigned int)+60)(25b3d63905ef47289c096ea42ba2d86a)
# 10 pc 000000000002a4dc /system/lib64/chipset-pub-sdk/libipc_single.z.so(OHOS::IPC_SINGLE::BinderInvoker::StartWorkLoop()+120)(25b3d63905ef47289c096ea42ba2d86a)
# 11 pc 000000000002bc2c /system/lib64/chipset-pub-sdk/libipc_single.z.so(OHOS::IPC_SINGLE::BinderInvoker::JoinThread(bool)+92)(25b3d63905ef47289c096ea42ba2d86a)
# 12 pc 0000000000004bd4 xxxserver(02cff7e5dce05d6d28096601458b6f6d)
# 13 pc 00000000000a3b58 /system/lib/ld-musl-aarch64.so.1(libc_start_main_stage2+64)(91b804d2409a13f27463debe9e19fb5d)
```

Decompiling pinpoints the code line stuck on the lock; review how the lock is used in its context.

#### Fix

```cpp
int xxx()
{
    ...
    mutex_.lock();
    AIContext aiContext;
    if (ConvertRpcHandle2AIContext(inputs[0], aiContext) != aicp::SUCCESS) {
        return FAILED;
    }
    ...
}
```

Change it to:

```cpp
int xxx()
{
    ...
    AIContext aiContext;
    if (ConvertRpcHandle2AIContext(inputs[0], aiContext) != aicp::SUCCESS) {
        return FAILED;    // returns before the lock is taken, so nothing leaks
    }
    mutex_.lock();
    ...
}
```

Adjust the locking sensibly in the context of the surrounding code; an RAII alternative is sketched below.
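
A more robust variant of this fix is to hold the mutex through an RAII guard such as std::lock_guard, which releases the lock on every return path. A minimal self-contained sketch (the DoWork name and return codes are illustrative, not from the original service):

```cpp
#include <mutex>

std::mutex g_mutex;

int DoWork(bool ok)
{
    std::lock_guard<std::mutex> lock(g_mutex);  // released automatically in ~lock_guard
    if (!ok) {
        return -1;  // early return: the lock is still released
    }
    // ... critical section ...
    return 0;
}

int main()
{
    return DoWork(true);
}
```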

#### Recommendations (Summary)

1. Pay close attention to ordering and deadlock risks whenever multiple threads interact.

### Typical APP_INPUT_BLOCK Case: Full Component Refresh

#### Background

The screen froze while the user was switching themes, and an appfreeze was reported for sceneboard.

The freeze was caused by a busy thread.

#### Faulty code example

Component refresh and reuse are controlled by the component's key: when the page updates, a component whose key is unchanged is reused, while a changed key causes the component and its children to be rebuilt.

This function generates the key for a desktop component and includes themeStyle in it, so when the user switches themes repeatedly on the desktop, every component is rebuilt in full over and over, freezing the page.

```ts
private getForeachKey(item: xxx): string {
    ...
    // themeStyle in the key: every theme switch changes every key and
    // forces a full rebuild of the component tree
    return `${item.xxx2}${item.xxx2}...${item.themeStyle}`;
}
```

#### Impact

When the user switches themes on the home screen, the page very likely freezes, stops responding to taps, and then crashes back to the lock screen.

This severely affects the user experience.

#### Locating approach

Extract the key fault information.

```
appfreeze: com.ohos.sceneboard APP_INPUT_BLOCK at 20240319022527
DisplayPowerInfo:powerState:AWAKE
```

The APP_INPUT_BLOCK event was reported at **14:40:59:440**.

```
DOMAIN:AAFWK
STRINGID:APP_INPUT_BLOCK
TIMESTAMP:2024/03/14-14:40:59:440 --> fault report time
PID:2918
UID:20020017
PACKAGE_NAME:com.ohos.sceneboard
PROCESS_NAME:com.ohos.sceneboard
```

The reported reason is "User input does not respond!": a user input event got no response.
The dump shows the main thread (Thread ID == Pid) running a task that started at **14:40:53.499** and still had not finished at the fault time **14:40:58**.

```
MSG = 
Fault time:2024/03/14-14:40:58
User input does not respond!
mainHandler dump is:
 EventHandler dump begin curTime: 2024-03-14 02:40:58.520
 Event runner (Thread name = , Thread ID = 2918) is running
 Current Running: start at 2024-03-14 02:40:53.499, Event { send thread = 2918, send time = 2024-03-14 02:40:53.499, handle time = 2024-03-14 02:40:53.499, task name =  }
```

User input must be served first, so like the watchdog it sits in the High priority event queue.

At this point 200+ input events are already blocked in the queue, unprocessed:

```
 High priority event queue information:
 No.1 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.690, handle time = 2024-03-14 02:40:53.690, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.2 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.699, handle time = 2024-03-14 02:40:53.699, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.3 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.708, handle time = 2024-03-14 02:40:53.708, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.4 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.717, handle time = 2024-03-14 02:40:53.717, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.5 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.726, handle time = 2024-03-14 02:40:53.726, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.6 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.736, handle time = 2024-03-14 02:40:53.736, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.7 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.745, handle time = 2024-03-14 02:40:53.745, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.8 : Event { send thread = 3370, send time = 2024-03-14 02:40:53.754, handle time = 2024-03-14 02:40:53.754, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 ...
 No.190 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.166, handle time = 2024-03-14 02:40:56.166, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.191 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.176, handle time = 2024-03-14 02:40:56.176, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.192 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.186, handle time = 2024-03-14 02:40:56.186, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.193 : Event { send thread = 2923, send time = 2024-03-14 02:40:56.196, handle time = 2024-03-14 02:40:56.196, id = 1, caller = [watchdog.cpp(Timer:140)] }
 No.194 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.196, handle time = 2024-03-14 02:40:56.196, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.195 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.206, handle time = 2024-03-14 02:40:56.206, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.196 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.216, handle time = 2024-03-14 02:40:56.216, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.197 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.226, handle time = 2024-03-14 02:40:56.226, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.198 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.236, handle time = 2024-03-14 02:40:56.236, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.199 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.245, handle time = 2024-03-14 02:40:56.245, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.200 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.254, handle time = 2024-03-14 02:40:56.254, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.201 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.265, handle time = 2024-03-14 02:40:56.265, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.202 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.275, handle time = 2024-03-14 02:40:56.274, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.203 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.284, handle time = 2024-03-14 02:40:56.284, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.204 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.294, handle time = 2024-03-14 02:40:56.294, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
 No.205 : Event { send thread = 3370, send time = 2024-03-14 02:40:56.305, handle time = 2024-03-14 02:40:56.305, task name = , caller = [input_manager_impl.cpp(OnPointerEvent:465)] }
```

Logically, the input event started a task on the application main thread, and that task had still not finished after 6s, so no response was fed back and the ANR timeout was reported.
We therefore only need to determine which task the input event triggered in the application, and why that task timed out.

Main thread stack: this is a snapshot of the running state. The ark_jsruntime GetCurrentThreadId frame at the top is neither blocked holding a lock nor a long-running function, so the captured stack is an instantaneous one and has no diagnostic value here:

```
Tid:2918, Name:ohos.sceneboard
# 00 pc 000000000009f73c /system/lib/ld-musl-aarch64.so.1(8fa55898166cd804dad43d909b5319cc)
# 01 pc 000000000054b7b4 /system/lib64/platformsdk/libark_jsruntime.so(panda::os::thread::GetCurrentThreadId()+12)(7715646e48f750f3dc31e660b056eb43)
# 02 pc 00000000002107a4 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::EcmaVM::CheckThread() const+200)(7715646e48f750f3dc31e660b056eb43)
# 03 pc 0000000000432998 /system/lib64/platformsdk/libark_jsruntime.so(panda::JSNApi::GetHandleAddr(panda::ecmascript::EcmaVM const*, unsigned long)+64)(7715646e48f750f3dc31e660b056eb43)
# 04 pc 000000000003eeb8 /system/lib64/platformsdk/libace_napi.z.so(ArkNativeReference::Get()+32)(c3a760aff0c73a2e76accaf518321fc9)
# 05 pc 0000000000043cb4 /system/lib64/platformsdk/libace_napi.z.so(napi_get_reference_value+48)(c3a760aff0c73a2e76accaf518321fc9)
# 06 pc 0000000000007564 /system/lib64/module/events/libemitter.z.so(OHOS::AppExecFwk::SearchCallbackInfo(napi_env__*, std::__h::variant<unsigned int, std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>>> const&, napi_value__*)+248)(8fe2937855aab3ea839419f952597511)
# 07 pc 0000000000007d8c /system/lib64/module/events/libemitter.z.so(OHOS::AppExecFwk::OnOrOnce(napi_env__*, napi_callback_info__*, bool)+568)(8fe2937855aab3ea839419f952597511)
# 08 pc 00000000000096d8 /system/lib64/module/events/libemitter.z.so(OHOS::AppExecFwk::JS_Once(napi_env__*, napi_callback_info__*) (.cfi)+84)(8fe2937855aab3ea839419f952597511)
# 09 pc 000000000002c8f0 /system/lib64/platformsdk/libace_napi.z.so(ArkNativeFunctionCallBack(panda::JsiRuntimeCallInfo*)+168)(c3a760aff0c73a2e76accaf518321fc9)
# 10 pc 0000000000187b48 /system/lib64/module/arkcompiler/stub.an(RTStub_PushCallArgsAndDispatchNative+40)
# 11 pc 00000000002da5fc /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::InterpreterAssembly::Execute(panda::ecmascript::EcmaRuntimeCallInfo*)+416)(7715646e48f750f3dc31e660b056eb43)
# 12 pc 00000000002da5fc /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::InterpreterAssembly::Execute(panda::ecmascript::EcmaRuntimeCallInfo*)+416)(7715646e48f750f3dc31e660b056eb43)
# 13 pc 00000000003954a0 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::JSStableArray::HandleforEachOfStable(panda::ecmascript::JSThread*, panda::ecmascript::JSHandle<panda::ecmascript::JSObject>, panda::ecmascript::JSHandle<panda::ecmascript::JSTaggedValue>, panda::ecmascript::JSHandle<panda::ecmascript::JSTaggedValue>, unsigned int, unsigned int&)+596)(7715646e48f750f3dc31e660b056eb43)
# 14 pc 000000000018f4b0 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::builtins::BuiltinsArray::ForEach(panda::ecmascript::EcmaRuntimeCallInfo*)+840)(7715646e48f750f3dc31e660b056eb43)
...
```

Next, examine the hilog flow logs.

First locate the time point at which APP_INPUT_BLOCK was reported, at roughly 14:40:59.448. Here we can also see that after the event was reported, DFX killed the frozen scb process.

![appfreeze_2024061412](figures/appfreeze_2024061412.png)

Going back about 6s, at around 14:40:53.498, a click event was dispatched to scb.

![appfreeze_2024061413](figures/appfreeze_2024061413.png)

The 6s in between is filled with a large volume of scb logs, suggesting it was busy with update rendering.

![appfreeze_2024061414](figures/appfreeze_2024061414.png)

Check the trace for the same time window:

![appfreeze_2024061415](figures/appfreeze_2024061415.png)

The scb main thread is fully occupied and clearly very busy. The longest-running task is **CustomNodeUpdate SwiperPage**, so the next step is to find out why this component keeps refreshing.

Investigation with the owning domain team found that on swiperPage, themeStyle had been folded into the component key; when the key changes, the component re-creation flow runs.

In other words, when the user switches the theme or the icon style, every widget on the home screen is re-created from scratch; this keeps the main thread busy and leaves input events unanswered, as sketched below.
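
The launcher code itself is not shown in this document; the following minimal ArkTS sketch (with hypothetical struct and item names) reproduces the anti-pattern: when a theme value is folded into the ForEach key, any theme switch changes every key, forcing a full re-creation of all children instead of an in-place update.

```ts
// Hypothetical sketch of the anti-pattern described above.
@Component
struct SwiperPageSketch {
  @State themeStyle: string = 'default';
  private items: string[] = ['a', 'b', 'c'];

  build() {
    Column() {
      ForEach(this.items, (item: string) => {
        Text(item)
      }, (item: string) => {
        // Anti-pattern: themeStyle in the key generator means a theme
        // switch changes every key, so ArkUI destroys and re-creates
        // all items rather than updating them in place.
        return `${item}_${this.themeStyle}`;
      })
    }
  }
}
```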

#### Fix

Trigger a refresh of the home-screen widgets only when the widget style itself is switched, narrowing the refresh scope:

```ts
+ if (!CheckEmptyUtils.isEmpty(themeStyleInfo.iconResourcePath) &&
+     themeStyleInfo.iconResourcePath !== this.themeStyle.iconResourcePath) {
+     this.isStyleChanged = true;
+     // Associate themeStyle with iconResourcePath here, instead of in the key.
+     this.themeStyle.iconResourcePath = themeStyleInfo.iconResourcePath;
+ }
```

#### Recommendations (problem summary)

1. When a user click triggers a page update, keep the refresh scope under reasonable control, and take scenarios with large numbers of components or frequent refreshes into account.

### Typical LIFECYCLE_TIMEOUT case: loading cloud images

#### Background/Principle

When the user opens a cloud note, the application freezes and then crashes.

#### Faulty code example

Cloud images are fetched synchronously inside a loop:

```ts
// Assumption: fs is imported as `import fs from '@ohos.file.fs';`.
public static xxxFunction(fileUris: string[]): void {
    ...
    for (const fileUri of fileUris) {
        // Each iteration opens the file synchronously on the calling thread.
        let file = fs.openSync(fileUri, fs.OpenMode.READ_ONLY);
        ...
    }
    ...
}
```

#### Impact/Error

The application freezes and then crashes during launch, switching to the foreground, and similar transitions.

#### Locating approach

Taking a notepad LIFECYCLE_TIMEOUT fault as an example, first extract the key fault information:

```
 sysfreeze: LIFECYCLE_TIMEOUT LIFECYCLE_TIMEOUT at 20240201100459
```

Check the MSG information: **foreground timeout**, whose limit is **5s**:

```
MSG = 
Fault time:2024/02/01-10:04:57
ability:MainAbility foreground timeout.
server:
312522; AbilityRecord::ForegroundAbility; the ForegroundAbility lifecycle starts.
client:
312522; AbilityThread::ScheduleAbilityTransaction; the foreground lifecycle.
```

LIFECYCLE_HALF_TIMEOUT was reported at **10:04:57:538** and LIFECYCLE_TIMEOUT at **10:04:59:965**, about 2.5s apart. This matches expectations, since the warning event fires at half of the 5s foreground limit:

```
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
DOMAIN:AAFWK
STRINGID:LIFECYCLE_TIMEOUT
TIMESTAMP:2024/02/01-10:04:59:965
PID:18083
UID:20020041
PACKAGE_NAME:com.huawei.hmos.notepad
PROCESS_NAME:com.huawei.hmos.notepad
*******************************************
start time: 2024/02/01-10:04:57:555
DOMAIN = AAFWK
EVENTNAME = LIFECYCLE_HALF_TIMEOUT
TIMESTAMP = 2024/02/01-10:04:57:538
PID = 18083
UID = 20020041
TID = 17286
PACKAGE_NAME = com.huawei.hmos.notepad
PROCESS_NAME = com.huawei.hmos.notepad
```

The running task started at **10:04:54.798**, about 2.5s before LIFECYCLE_HALF_TIMEOUT was reported, again as expected:

```
mainHandler dump is:
 EventHandler dump begin curTime: 2024-02-01 10:04:57.306
 Event runner (Thread name = , Thread ID = 18083) is running
 Current Running: start at 2024-02-01 10:04:54.798, Event { send thread = 18132, send time = 2024-02-01 10:04:54.778, handle time = 2024-02-01 10:04:54.778, task name = UIAbilityThread:SendResult }
 History event queue information:
 No. 0 : Event { send thread = 18083, send time = 2024-02-01 10:04:46.481, handle time = 2024-02-01 10:04:46.981, trigger time = 2024-02-01 10:04:46.982, completeTime time = 2024-02-01 10:04:46.982, task name =  }
 No. 1 : Event { send thread = 18132, send time = 2024-02-01 10:04:47.149, handle time = 2024-02-01 10:04:47.149, trigger time = 2024-02-01 10:04:47.149, completeTime time = 2024-02-01 10:04:47.197, task name = MainThread:BackgroundApplication }
 No. 2 : Event { send thread = 18083, send time = 2024-02-01 10:04:44.329, handle time = 2024-02-01 10:04:47.329, trigger time = 2024-02-01 10:04:47.329, completeTime time = 2024-02-01 10:04:47.329, task name =  }
 No. 3 : Event { send thread = 18087, send time = 2024-02-01 10:04:48.091, handle time = 2024-02-01 10:04:48.091, trigger time = 2024-02-01 10:04:48.091, completeTime time = 2024-02-01 10:04:48.091, task name =  }
 No. 4 : Event { send thread = 18087, send time = 2024-02-01 10:04:51.047, handle time = 2024-02-01 10:04:51.047, trigger time = 2024-02-01 10:04:51.048, completeTime time = 2024-02-01 10:04:51.048, task name =  }
 No. 5 : Event { send thread = 18087, send time = 2024-02-01 10:04:54.067, handle time = 2024-02-01 10:04:54.067, trigger time = 2024-02-01 10:04:54.067, completeTime time = 2024-02-01 10:04:54.067, task name =  }
 ...
```

Check the corresponding stack: libfs.z.so -> libdatashare_consumer.z.so -> libipc_core.z.so.

```
Tid:18083, Name:ei.hmos.notepad
# 00 pc 00000000001617a4 /system/lib/ld-musl-aarch64.so.1(ioctl+180)(4ca73cff61bea7c4a687eb0f71c9df69)
# 01 pc 000000000003e8a0 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderConnector::WriteBinder(unsigned long, void*)+72)(3248fceb1fa676994734e0437430ce37)
# 02 pc 0000000000049f38 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::TransactWithDriver(bool)+296)(3248fceb1fa676994734e0437430ce37)
# 03 pc 00000000000496f8 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::WaitForCompletion(OHOS::MessageParcel*, int*)+116)(3248fceb1fa676994734e0437430ce37)
# 04 pc 00000000000490bc /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::SendRequest(int, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+312)(3248fceb1fa676994734e0437430ce37)
# 05 pc 0000000000027700 /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::SendRequestInner(bool, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+132)(3248fceb1fa676994734e0437430ce37)
# 06 pc 000000000002799c /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::SendRequest(unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+140)(3248fceb1fa676994734e0437430ce37)
# 07 pc 000000000002640c /system/lib64/platformsdk/libdatashare_consumer.z.so(OHOS::DataShare::DataShareProxy::OpenFile(OHOS::Uri const&, std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&)+440)(e93b5085235269d4b7218ea7de64b69a)
# 08 pc 0000000000014b2c /system/lib64/platformsdk/libdatashare_consumer.z.so(OHOS::DataShare::ExtSpecialController::OpenFile(OHOS::Uri const&, std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&)+160)(e93b5085235269d4b7218ea7de64b69a)
# 09 pc 0000000000022c54 /system/lib64/platformsdk/libdatashare_consumer.z.so(OHOS::DataShare::DataShareHelperImpl::OpenFile(OHOS::Uri&, std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&)+96)(e93b5085235269d4b7218ea7de64b69a)
# 10 pc 0000000000108b34 /system/lib64/module/file/libfs.z.so(OHOS::FileManagement::ModuleFileIO::OpenFileByDatashare(std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&, unsigned int)+468)(152580bf9c379f11f29fdded278541bd)
# 11 pc 0000000000108264 /system/lib64/module/file/libfs.z.so(OHOS::FileManagement::ModuleFileIO::OpenFileByUri(std::__h::basic_string<char, std::__h::char_traits<char>, std::__h::allocator<char>> const&, unsigned int)+1760)(152580bf9c379f11f29fdded278541bd)
# 12 pc 00000000001077fc /system/lib64/module/file/libfs.z.so(OHOS::FileManagement::ModuleFileIO::Open::Sync(napi_env__*, napi_callback_info__*) (.cfi)+1036)(152580bf9c379f11f29fdded278541bd)
# 13 pc 000000000002bbf8 /system/lib64/platformsdk/libace_napi.z.so(ArkNativeFunctionCallBack(panda::JsiRuntimeCallInfo*)+168)(f5b81db475835ee752235c606b1c5e33)
# 14 pc 0000000000132e48 /system/lib64/module/arkcompiler/stub.an
```

BinderCatcher shows the process communicating with process 5235 and waiting for more than 2.5s, consistent with the timeout:

```
PeerBinderCatcher -- pid==18083 layer_ == 1
BinderCatcher --
    18083:18083 to 5235:7437 code 2 wait:2.723147396 s
    3462:3840 to 4956:4958 code 8 wait:261.213830169 s
    3462:3621 to 4956:4981 code 8 wait:273.550283291 s
```

Process 5235 is the media library process; its stack contains no useful information:

```
PeerBinder Stacktrace --
PeerBinderCatcher start catcher stacktrace for pid : 5235
Result: 0 ( no error )
Timestamp:2024-02-01 10:04:57.000
Pid:5235
Uid:20020079
Process name:com.ohos.medialibrary.medialibrarydata
Tid:5235, Name:edialibrarydata
# 00 pc 0000000000142d1c /system/lib/ld-musl-aarch64.so.1(epoll_wait+84)(4ca73cff61bea7c4a687eb0f71c9df69)
# 01 pc 000000000000fb74 /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::EpollIoWaiter::WaitFor(std::__h::unique_lock<std::__h::mutex>&, long)+224)(a4d21072c08fd3ac639d5cf5b8fb8b51)
# 02 pc 0000000000019df8 /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::EventQueue::WaitUntilLocked(std::__h::chrono::time_point<std::__h::chrono::steady_clock, std::__h::chrono::duration<long long, std::__h::ratio<1l, 1000000000l>>> const&, std::__h::unique_lock<std::__h::mutex>&)+180)(a4d21072c08fd3ac639d5cf5b8fb8b51)
# 03 pc 0000000000019c6c /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::EventQueue::GetEvent()+128)(a4d21072c08fd3ac639d5cf5b8fb8b51)
# 04 pc 00000000000202b8 /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::(anonymous namespace)::EventRunnerImpl::Run()+1164)(a4d21072c08fd3ac639d5cf5b8fb8b51)
# 05 pc 0000000000022388 /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::EventRunner::Run()+120)(a4d21072c08fd3ac639d5cf5b8fb8b51)
# 06 pc 000000000007ea08 /system/lib64/platformsdk/libappkit_native.z.so(OHOS::AppExecFwk::MainThread::Start()+772)(183fe2babcfdd3e1ea4bca16a0e26a5d)
# 07 pc 0000000000011ac8 /system/bin/appspawn(RunChildProcessor+236)(7b715884c45cfe57b22df46fdaeeca88)
# 08 pc 0000000000034684 /system/bin/appspawn(AppSpawnChild+264)(7b715884c45cfe57b22df46fdaeeca88)
# 09 pc 00000000000344f4 /system/bin/appspawn(AppSpawnProcessMsg+380)(7b715884c45cfe57b22df46fdaeeca88)
# 10 pc 00000000000305a0 /system/bin/appspawn(OnReceiveRequest+1820)(7b715884c45cfe57b22df46fdaeeca88)
# 11 pc 0000000000017c58 /system/lib64/chipset-pub-sdk/libbegetutil.z.so(HandleRecvMsg_+260)(22f33d1b0218f31bad0dcc75cf348b90)
# 12 pc 00000000000178fc /system/lib64/chipset-pub-sdk/libbegetutil.z.so(HandleStreamEvent_+148)(22f33d1b0218f31bad0dcc75cf348b90)
# 13 pc 0000000000015478 /system/lib64/chipset-pub-sdk/libbegetutil.z.so(ProcessEvent+112)(22f33d1b0218f31bad0dcc75cf348b90)
# 14 pc 0000000000015090 /system/lib64/chipset-pub-sdk/libbegetutil.z.so(RunLoop_+308)(22f33d1b0218f31bad0dcc75cf348b90)
# 15 pc 000000000002eff4 /system/bin/appspawn(AppSpawnRun+116)(7b715884c45cfe57b22df46fdaeeca88)
# 16 pc 000000000001f438 /system/bin/appspawn(main+724)(7b715884c45cfe57b22df46fdaeeca88)
# 17 pc 00000000000a0974 /system/lib/ld-musl-aarch64.so.1(libc_start_main_stage2+64)(4ca73cff61bea7c4a687eb0f71c9df69)
# 18 pc 000000000001106c /system/bin/appspawn(_start_c+76)(7b715884c45cfe57b22df46fdaeeca88)
```

From the above we can conclude: the application loads a file synchronously by uri through the file system's Open::Sync, which calls into datashare to request file data from the media library.

The flow log at the corresponding time confirms this: the process froze after calling datashare to load a cloud image, consistent with the stack.

![appfreeze_2024061416](figures/appfreeze_2024061416.png)

Check the actual code:

fileUri is loaded synchronously inside a loop, which is clearly unreasonable. Under a weak network, or when a large amount of data is loaded at once, a freeze is very likely; the application side needs to be rectified.

#### Fix

Change the synchronous loading to asynchronous loading, use a flag to indicate whether loading has completed, and display a loading effect in the UI.

```ts
public static xxxFunction(fileUris: string[]): void {
    ...
    for (const fileUri of fileUris) {
        let file = fs.openSync(fileUri, fs.OpenMode.READ_ONLY);
        ...
    }
    ...
}
```

Change it to:

```ts
// Assumption: fs is imported as `import fs from '@ohos.file.fs';`.
public static async xxxFunction(fileUris: string[]): Promise<void> {
    ...
    // Flag used to drive the loading effect on the page.
    AppStorage.setOrCreate<boolean>('isLoadingPic', true);
    for (const fileUri of fileUris) {
        let file = await fs.open(fileUri, fs.OpenMode.READ_ONLY);
        ...
    }
    ...
}
```
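
As a follow-up sketch of how the flag can drive the UI (the page below is hypothetical; only the `isLoadingPic` key comes from the fix above), the page can observe AppStorage through @StorageProp and show a loading indicator until the asynchronous load finishes:

```ts
@Entry
@Component
struct NotePicturePage {
  // One-way sync from AppStorage: refreshes whenever xxxFunction updates the flag.
  @StorageProp('isLoadingPic') isLoadingPic: boolean = false;

  build() {
    Column() {
      if (this.isLoadingPic) {
        // Loading effect shown while cloud images are still being fetched.
        LoadingProgress().width(48).height(48)
      } else {
        Text('Pictures loaded')
      }
    }
  }
}
```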

#### Recommendations (problem summary)

1. Requests for cloud-side data must be verified thoroughly under normal-network, weak-network, and no-network conditions.

2. Do not perform time-consuming operations in application lifecycle callbacks; see the sketch below.
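
For recommendation 2, here is a minimal sketch (the ability and helper names are hypothetical) of keeping a lifecycle callback lightweight by deferring slow work:

```ts
import UIAbility from '@ohos.app.ability.UIAbility';

export default class MainAbility extends UIAbility {
  onForeground(): void {
    // Start the slow work asynchronously and return immediately,
    // so the foreground lifecycle completes within its 5s limit.
    this.loadCloudFilesAsync().catch((err: Error) => {
      console.error(`async load failed: ${err.message}`);
    });
  }

  private async loadCloudFilesAsync(): Promise<void> {
    // Hypothetical helper: use asynchronous APIs here (e.g. fs.open),
    // never their *Sync variants, so the main thread is not blocked.
  }
}
```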