CN116450390A

CN116450390A - Watchdog detection method and electronic equipment

Info

Publication number: CN116450390A
Application number: CN202210018496.XA
Authority: CN
Inventors: 余亮; 赵俊民; 孙继龙
Original assignee: Honor Device Co Ltd
Current assignee: Honor Device Co Ltd
Priority date: 2022-01-07
Filing date: 2022-01-07
Publication date: 2023-07-18

Abstract

The application provides a watchdog detection method. In the method, a software watchdog periodically detects the CPU running state of an electronic device, the software watchdog being located in the kernel layer of the electronic device. If the CPU running state meets a preset kernel restart condition, the software watchdog restarts the kernel system and stops feeding the hardware watchdog. If the kernel system restarts successfully, the software watchdog resumes feeding the hardware watchdog. If the hardware watchdog does not receive a feed from the software watchdog within its detection period, the electronic device is restarted. A detection mechanism is thus established between the hardware layer and the kernel layer of the electronic device, achieving hierarchical recovery of the electronic device.

Description

Watchdog detection method and electronic equipment

Technical Field

The application relates to the field of intelligent terminals, in particular to a watchdog detection method and electronic equipment.

Background

A watchdog, also called a watchdog timer, is essentially a timer. Watchdogs can be classified into software watchdogs and hardware watchdogs. A watchdog generally has an input and an output. The input is called feeding the dog (also known as kicking or servicing the dog). The output is generally used to reset the monitored business process or hardware when it becomes abnormal, so that it returns to normal.

Although a software watchdog can recover the business process it monitors, in practice uncontrollable interference may prevent it from successfully resetting that process, and the software watchdog itself may even fail, affecting normal use of the device.

Disclosure of Invention

To solve the above technical problem, the application provides a watchdog detection method and an electronic device. In the watchdog detection method, a detection mechanism is established between the hardware layer and the kernel layer, achieving hierarchical recovery of the electronic device. Because a hardware restart has a coarser granularity than a kernel restart, when the kernel cannot be restarted successfully, a hardware restart can greatly improve the success rate of recovering the kernel system and avoids the situation in which the kernel layer repeatedly fails to restart the kernel system.

In a first aspect, the present application provides a watchdog detection method. The method includes the following steps: a software watchdog periodically detects the CPU running state of the electronic device, the software watchdog being located in the kernel layer of the electronic device; if the CPU running state meets a preset kernel restart condition, the software watchdog restarts the kernel system and stops feeding the hardware watchdog; if the kernel system restarts successfully, the software watchdog resumes feeding the hardware watchdog; and if the hardware watchdog does not receive a feed from the software watchdog within its detection period, the electronic device is restarted. In this way, a detection mechanism is established between the hardware layer and the kernel layer of the electronic device, and hierarchical recovery of the electronic device is achieved through the software watchdog and the hardware watchdog. Because a hardware restart has a coarser granularity than a kernel restart, when the kernel cannot be restarted successfully, a hardware restart can greatly improve the success rate of recovering the kernel system and avoids the situation in which the kernel layer repeatedly fails to restart the kernel system.
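To make the interplay between the two watchdogs concrete, the following is a minimal Python simulation under stated assumptions. Time advances in discrete ticks; `HardwareWatchdog`, `SoftwareWatchdog`, `simulate`, the tick-based detection period, and the instantaneous kernel restart are hypothetical modelling choices, not details taken from the application. The software watchdog stops feeding when the kernel restart condition is met and resumes only if the kernel restart succeeds; the hardware watchdog restarts the device after `period` consecutive ticks without a feed.

```python
from dataclasses import dataclass


@dataclass
class HardwareWatchdog:
    """Hypothetical model: restarts the device after `period` ticks without a feed."""
    period: int
    missed: int = 0
    fed_this_tick: bool = False
    device_restarted: bool = False

    def feed(self) -> None:
        self.fed_this_tick = True

    def tick(self) -> None:
        # End of one tick: clear the miss counter if fed, otherwise count a miss.
        self.missed = 0 if self.fed_this_tick else self.missed + 1
        self.fed_this_tick = False
        if self.missed >= self.period:
            self.device_restarted = True  # coarse-grained recovery: whole device


@dataclass
class SoftwareWatchdog:
    """Hypothetical kernel-layer watchdog that also feeds the hardware watchdog."""
    hw: HardwareWatchdog
    feeding: bool = True
    kernel_restarts: int = 0

    def check(self, restart_condition_met: bool, kernel_restart_ok: bool) -> None:
        if restart_condition_met:
            self.feeding = False         # stop feeding before restarting the kernel
            self.kernel_restarts += 1    # fine-grained recovery: kernel only
            if kernel_restart_ok:
                self.feeding = True      # kernel came back: resume feeding
        if self.feeding:
            self.hw.feed()


def simulate(abnormal_ticks, kernel_restart_ok, ticks=10, period=3):
    """Return (kernel restarts, whether the hardware watchdog restarted the device)."""
    hw = HardwareWatchdog(period)
    sw = SoftwareWatchdog(hw)
    for t in range(ticks):
        sw.check(t in abnormal_ticks, kernel_restart_ok)
        hw.tick()
        if hw.device_restarted:
            break
    return sw.kernel_restarts, hw.device_restarted


print(simulate({2}, kernel_restart_ok=True))   # (1, False)
print(simulate({2}, kernel_restart_ok=False))  # (1, True)
```

With a successful kernel restart the device is never restarted; with a failed one, the hardware watchdog fires after three consecutive missed feeds. In a real system the kernel restart is not instantaneous, so the hardware detection period would presumably need to be longer than a normal kernel restart; the application does not say this, so treat it as an assumption.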

By way of example, the software watchdog may be the CPU core state watchdog in the kernel layer described below.

According to the first aspect, the software watchdog detecting the CPU running state of the electronic device includes: the software watchdog separately detects the running state of each target CPU core, where a target CPU core is a CPU core whose physical state is online.

According to the first aspect or any implementation of the first aspect, the software watchdog detecting the running state of a target CPU core includes: if the software watchdog detects that a target task on the target CPU core can be scheduled, it determines that the running state of the target CPU core is normal; if the software watchdog detects that the target task cannot be scheduled, it determines that the running state of the target CPU core is abnormal. The target task is bound to run on the target CPU core.
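The scheduling-based check can be sketched as follows, assuming that the task bound to each core records the time at which it last ran and that the watchdog compares that timestamp with a staleness threshold. This is similar in spirit to the Linux soft-lockup detector, but `Core`, `last_task_run`, `check_by_scheduling`, and the threshold are hypothetical names and parameters chosen for illustration. Offline cores are skipped because only online cores are target cores.

```python
from dataclasses import dataclass


@dataclass
class Core:
    cpu_id: int
    online: bool          # physical state; only online cores are target cores
    last_task_run: float  # when the task bound to this core last ran (hypothetical)


def check_by_scheduling(cores, now: float, threshold: float) -> dict:
    """Map each target core to 'normal' or 'abnormal' based on task scheduling."""
    result = {}
    for core in cores:
        if not core.online:
            continue  # offline cores are not checked
        scheduled_recently = (now - core.last_task_run) <= threshold
        result[core.cpu_id] = "normal" if scheduled_recently else "abnormal"
    return result


cores = [Core(0, True, 9.5), Core(1, True, 2.0), Core(2, False, 0.0)]
print(check_by_scheduling(cores, now=10.0, threshold=4.0))
# {0: 'normal', 1: 'abnormal'}
```

Core 1's bound task has not run for 8 time units, longer than the 4-unit threshold, so it is reported abnormal; core 2 is offline and is not a target core.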

According to the first aspect or any implementation of the first aspect, the software watchdog detecting the running state of a target CPU core includes: the software watchdog sends a detection message to the target CPU core; if the software watchdog receives a detection feedback message from the target CPU core, that is, the response to the detection message, it determines that the running state of the target CPU core is normal; if the software watchdog does not receive the detection feedback message, it determines that the running state of the target CPU core is abnormal.

According to the first aspect or any implementation of the first aspect, the detection message is a ping message sent in the form of an interrupt.
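A ping-style check can be simulated with threads: each hypothetical `SimulatedCore` services ping "interrupts" from a queue, and a hung core never answers. `ping_target_cores` and the 0.5 s timeout are assumptions for illustration. In a Linux kernel the ping might be delivered as an inter-processor interrupt, for example through `smp_call_function_single()`, but the application does not name any API.

```python
import queue
import threading
import time


class SimulatedCore:
    """Hypothetical stand-in for a CPU core that services ping 'interrupts'."""

    def __init__(self, cpu_id: int, hung: bool = False):
        self.cpu_id = cpu_id
        self.hung = hung                 # a hung core never runs its handler
        self.inbox = queue.Queue()       # pending ping 'interrupts'
        threading.Thread(target=self._irq_handler, daemon=True).start()

    def _irq_handler(self) -> None:
        while True:
            reply_queue = self.inbox.get()    # block until a ping arrives
            if not self.hung:
                reply_queue.put(self.cpu_id)  # detection feedback message


def ping_target_cores(cores, timeout_s: float = 0.5) -> dict:
    """Ping every target core; cores that do not answer in time are abnormal."""
    replies = queue.Queue()
    for core in cores:
        core.inbox.put(replies)               # 'send the interrupt'
    acked = set()
    deadline = time.monotonic() + timeout_s
    while len(acked) < len(cores):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            acked.add(replies.get(timeout=remaining))
        except queue.Empty:
            break
    return {c.cpu_id: ("normal" if c.cpu_id in acked else "abnormal") for c in cores}


cores = [SimulatedCore(0), SimulatedCore(1, hung=True), SimulatedCore(2)]
print(ping_target_cores(cores))  # {0: 'normal', 1: 'abnormal', 2: 'normal'}
```

A core whose handler never replies is reported abnormal once the timeout expires, while responsive cores are reported normal as soon as their feedback arrives.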

According to the first aspect or any implementation of the first aspect, the software watchdog detecting the running state of each target CPU core includes: the software watchdog detects the running state of each target CPU core in a first detection mode to determine a first running state of each target CPU core; the software watchdog detects the running state of each target CPU core in a second detection mode to determine a second running state of each target CPU core; and the software watchdog determines whether a target CPU core is running abnormally according to its first running state and/or its second running state.
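The text leaves open how the two results are combined ("and/or"). A minimal sketch, assuming each detection mode yields a per-core map in which `True` means abnormal, exposes both readings through a hypothetical `Policy` option; `combine` is likewise an illustrative name.

```python
from enum import Enum


class Policy(Enum):
    EITHER = "or"  # abnormal if either detection mode reports abnormal
    BOTH = "and"   # abnormal only if both detection modes report abnormal


def combine(first: dict, second: dict, policy: Policy = Policy.EITHER) -> dict:
    """Merge per-core results of two detection modes (True = abnormal).

    A core missing from one mode's result is treated as normal in that mode.
    """
    combined = {}
    for cpu in sorted(first.keys() | second.keys()):
        a, b = first.get(cpu, False), second.get(cpu, False)
        combined[cpu] = (a or b) if policy is Policy.EITHER else (a and b)
    return combined


first = {0: False, 1: True, 2: True}    # e.g. scheduling-based mode
second = {0: False, 1: False, 2: True}  # e.g. ping-based mode
print(combine(first, second))               # {0: False, 1: True, 2: True}
print(combine(first, second, Policy.BOTH))  # {0: False, 1: False, 2: True}
```

The "or" policy reacts to a fault seen by either mode, whereas the "and" policy flags a core only when both modes agree, trading sensitivity for fewer false positives. Which one an implementation would choose is not specified in the application.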

According to the first aspect or any implementation of the first aspect, the CPU running state meeting the preset kernel restart condition includes: the software watchdog determines the number of abnormal CPU cores according to the running state of each target CPU core; if the number of abnormal CPU cores reaches a preset number threshold, the software watchdog determines that the CPU running state meets the preset kernel restart condition.

According to the first aspect or any implementation of the first aspect, the CPU running state meeting the preset kernel restart condition includes: if the running state of a target CPU core is abnormal and the target CPU core belongs to a preset type, the software watchdog determines that the CPU running state meets the preset kernel restart condition.
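The two kernel restart conditions can be expressed as a single predicate. This sketch combines them with a logical OR, which is an assumption since the application presents them as alternative implementations; `CoreStatus`, `meets_kernel_restart_condition`, the `core_type` labels such as "prime", and the threshold value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreStatus:
    cpu_id: int
    core_type: str  # hypothetical labels, e.g. "little", "big", "prime"
    abnormal: bool


def meets_kernel_restart_condition(statuses, count_threshold: int, preset_types: set) -> bool:
    abnormal = [s for s in statuses if s.abnormal]
    # Condition 1: the number of abnormal target cores reaches the preset threshold.
    if len(abnormal) >= count_threshold:
        return True
    # Condition 2: an abnormal core belongs to a preset type.
    return any(s.core_type in preset_types for s in abnormal)


statuses = [CoreStatus(0, "little", False), CoreStatus(1, "little", True),
            CoreStatus(7, "prime", False)]
print(meets_kernel_restart_condition(statuses, 2, {"prime"}))  # False
```

Here only one core is abnormal and it is not of a preset type, so neither condition holds; a single abnormal "prime" core, or two abnormal cores of any type, would trigger a kernel restart.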

In a second aspect, the present application provides an electronic device. The electronic device includes one or more processors and one or more memories. The one or more memories store one or more programs that, when executed by the one or more processors, cause the electronic device to perform the watchdog detection method of the first aspect or any implementation of the first aspect.

The second aspect and any implementation of the second aspect correspond, respectively, to the first aspect and any implementation of the first aspect. For the technical effects of the second aspect and any implementation of it, refer to the technical effects of the first aspect and the corresponding implementation; they are not repeated here.

In a third aspect, the present application provides a computer-readable storage medium comprising a computer program which, when run on an electronic device, causes the electronic device to perform the watchdog detection method of the first aspect or any implementation of the first aspect.

The third aspect and any implementation of the third aspect correspond, respectively, to the first aspect and any implementation of the first aspect. For the technical effects of the third aspect and any implementation of it, refer to the technical effects of the first aspect and the corresponding implementation; they are not repeated here.

Drawings

Fig. 1 is a schematic diagram of an exemplary hardware structure of an electronic device;

Fig. 2 is a schematic diagram of an exemplary software architecture of an electronic device;

Fig. 3 is a schematic diagram of the system architecture corresponding to the watchdog detection method provided in an embodiment of the present application;

Fig. 4 is a schematic flowchart of a watchdog detection method according to an embodiment of the present application;

Fig. 5 is a schematic diagram of an application scenario to which the watchdog detection method shown in Fig. 4 is applicable;

Fig. 6 is a first schematic flowchart of an exemplary watchdog detection method provided in an embodiment of the present application;

Fig. 7 is a second schematic flowchart of an exemplary watchdog detection method provided in an embodiment of the present application;

Fig. 8 is a schematic diagram of an application scenario to which the watchdog detection method shown in Fig. 7 is applicable;

Fig. 9 is a third schematic flowchart of a watchdog detection method according to an embodiment of the present application;

Fig. 10 is a schematic diagram of an application scenario to which the watchdog detection method shown in Fig. 9 is applicable;

Fig. 11 is another schematic diagram of an application scenario to which the exemplary watchdog detection method is applicable;

Fig. 12 is another schematic diagram of an application scenario to which the exemplary watchdog detection method is applicable;

Fig. 13 is a first exemplary schematic diagram of CPU core state detection;

Fig. 14 is a second exemplary schematic diagram of CPU core state detection;

Fig. 15 is another schematic diagram of an application scenario to which the exemplary watchdog detection method is applicable.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may represent three cases: A exists alone, both A and B exist, and B exists alone.

The terms first and second and the like in the description and in the claims of embodiments of the present application are used for distinguishing between different objects and not necessarily for describing a particular sequential order of objects. For example, the first target object and the second target object, etc., are used to distinguish between different target objects, and are not used to describe a particular order of target objects.

In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.

In the description of the embodiments of the present application, unless otherwise indicated, the meaning of "a plurality" means two or more. For example, the plurality of processing units refers to two or more processing units; the plurality of systems means two or more systems.

Fig. 1 shows a schematic structural diagram of an electronic device 100. Optionally, the electronic device 100 may be a terminal, also referred to as a terminal device, such as a cellular phone or a tablet computer (pad); this is not limited in this application. It should be understood that the electronic device 100 shown in Fig. 1 is only one example of an electronic device, and that the electronic device 100 may have more or fewer components than shown in the figure, may combine two or more components, or may have a different configuration of components. The various components shown in Fig. 1 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application-specific integrated circuits.

The electronic device 100 may include: processor 110, external memory interface 120, internal memory 121, universal serial bus (universal serial bus, USB) interface 130, charge management module 140, power management module 141, battery 142, antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headset interface 170D, sensor module 180, keys 190, motor 191, indicator 192, camera 193, display 194, and subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor, a gyroscope sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.

The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.

The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to instruction operation codes and timing signals to control instruction fetching and instruction execution.

A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the processor 110 may include one or more interfaces, such as an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, or a universal serial bus (USB) interface. The PCM interface may also be used for audio communication, sampling, quantizing, and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication. The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, to transfer data between the electronic device 100 and a peripheral device, or to connect a headset for audio playback. The interface may also be used to connect other electronic devices, such as AR devices.

Furthermore, it should be noted that in some embodiments the processor 110 may have a built-in hardware watchdog; for example, a timer in the processor 110 may serve as the hardware watchdog. During program initialization, the processor 110 writes an initial value into the timer and starts it to monitor the processor 110. If an error occurs in the processor 110 so that the timer is no longer reloaded in time, the timer sends a restart signal to the processor 110.

In other embodiments, a separate watchdog chip may be provided in the electronic device 100 as a hardware watchdog for monitoring the processor 110. The watchdog chip mainly includes a feed pin (typically connected to a GPIO (General Purpose Input/Output) pin of the processor 110) and a RESET pin (connected to the RESET pin of the processor 110). If the processor 110 does not change the level of the feed pin within a certain period of time, the RESET pin changes state to reset the processor 110.
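The behaviour of such a watchdog chip can be modelled in a few lines, assuming the chip samples the feed pin periodically, treats any level change as a feed, and asserts RESET after `timeout_ms` without an edge. `WatchdogChip`, `first_reset_time`, the 100 ms sampling step, the 400 ms toggle interval, and the 1000 ms timeout are illustrative values, not taken from any particular chip.

```python
class WatchdogChip:
    """Hypothetical model of an external watchdog chip.

    The processor toggles a GPIO wired to the chip's feed pin; if the chip
    sees no level change for `timeout_ms`, it changes the state of its RESET pin.
    """

    def __init__(self, timeout_ms: int):
        self.timeout_ms = timeout_ms
        self.last_level = 0
        self.last_edge_ms = 0
        self.reset_asserted = False

    def sample(self, now_ms: int, feed_pin_level: int) -> None:
        if feed_pin_level != self.last_level:  # any edge counts as a feed
            self.last_level = feed_pin_level
            self.last_edge_ms = now_ms
        elif now_ms - self.last_edge_ms >= self.timeout_ms:
            self.reset_asserted = True          # reset the processor


def first_reset_time(stop_feeding_ms, end_ms=3000, step_ms=100,
                     toggle_every_ms=400, timeout_ms=1000):
    """Return the time at which RESET is asserted, or None if it never is."""
    chip = WatchdogChip(timeout_ms)
    level = 0
    for t in range(0, end_ms, step_ms):
        if t < stop_feeding_ms and t % toggle_every_ms == 0:
            level ^= 1  # healthy processor toggles the feed GPIO
        chip.sample(t, level)
        if chip.reset_asserted:
            return t
    return None


print(first_reset_time(1500))    # 2200
print(first_reset_time(10_000))  # None
```

In the first call the simulated processor stops toggling the GPIO after 1500 ms; its last edge is at 1200 ms, so RESET is asserted at 2200 ms. In the second call the processor keeps toggling and no reset occurs.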

The charge management module 140 is configured to receive charging input from a charger. While charging the battery 142, the charge management module 140 may also supply power to the electronic device through the power management module 141. The power management module 141 connects the battery 142 and the charge management module 140 to the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like.

The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 150 may provide wireless communication solutions, including 2G/3G/4G/5G, applied to the electronic device 100. The wireless communication module 160 may provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (e.g., a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies.

The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The display screen 194 is used to display images, videos, and the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.

The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like. The camera 193 is used to capture still images or video. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.

Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.

The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.

The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 runs the instructions stored in the internal memory 121 to execute the various functional applications and data processing of the electronic device 100, so that the electronic device 100 implements the watchdog detection method in the embodiments of the present application. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function). The data storage area may store data created during use of the electronic device 100 (e.g., audio data and a phone book).

The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset interface 170D, the application processor, and the like.

The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.

The speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals. The electronic device 100 can play music or hands-free calls through the speaker 170A.

The receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or plays a voice message, the user can hear the voice by placing the receiver 170B close to the ear.

The microphone 170C, also referred to as a "mic", is used to convert sound signals into electrical signals.

The headset interface 170D is used to connect a wired headset. The headset interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.

The pressure sensor is used for sensing a pressure signal and can convert the pressure signal into an electric signal. In some embodiments, the pressure sensor may be provided on the display screen 194. In some embodiments, touch operations that act on the same touch location, but at different touch operation strengths, may correspond to different operation instructions.

The touch sensor is also referred to as a "touch panel". The touch sensor may be disposed on the display screen 194, and together they form a touchscreen. The touch sensor detects touch operations on or near it and may pass a detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display screen 194.

The keys 190 include a power key, volume keys, and the like. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100.

The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In this embodiment, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.

Fig. 2 is a software configuration block diagram of the electronic device 100 according to the embodiment of the present application.

The layered architecture of the electronic device 100 divides the software into several layers, each with a distinct role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into three layers, an application layer, an application framework layer, and a kernel layer from top to bottom.

The application layer may include a series of application packages.

As shown in Fig. 2, the application packages may include applications such as camera, gallery, calendar, phone, map, navigation, music, video, and messaging.

The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.

As shown in Fig. 2, the application framework layer may include a system server (System Server), a system server watchdog (System Server WatchDog), a layer compositor (SurfaceFlinger), a system critical task timeout detection watchdog (Xcollie), an initialization service (Init), an initialization service watchdog (Init watchdog), and so on.

The System Server provides the basic Android services and is essential for running the Android system. Some system services reside in the System Server, such as WMS (Window Manager Service), AMS (Activity Manager Service), and PMS (Package Manager Service); these services run as threads in the System Server process.

System Server WatchDog detects whether the System Server is deadlocked or unresponsive. When the System Server fails, System Server WatchDog kills the System Server process and performs a soft restart so that the System Server recovers itself. The objects monitored by System Server WatchDog fall mainly into two types: object locks and thread handlers (Handler).

System Server WatchDog is initialized and started in the System Server process: when the System Server starts, it registers and starts various Android services, including initializing and starting System Server WatchDog.

The System Server periodically checks whether key services such as AMS and WMS are running normally. If all of these key services are running normally, the System Server feeds System Server WatchDog; if any of them is not running normally, the System Server does not feed it. When the System Server fails to feed System Server WatchDog for a number of consecutive cycles (for example, three), System Server WatchDog resets the System Server process.
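The feed rule just described can be sketched as a small state machine. `SystemServerWatchdogModel`, the service-health dictionary, and counting missed cycles are assumptions for illustration; this is not the actual Android framework code.

```python
from dataclasses import dataclass


@dataclass
class SystemServerWatchdogModel:
    """Hypothetical model of the feed/reset rule for System Server WatchDog."""
    max_missed_cycles: int = 3
    missed: int = 0
    resets: int = 0

    def on_cycle(self, service_health: dict) -> None:
        # System Server feeds the watchdog only if every key service is healthy.
        if all(service_health.values()):
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.max_missed_cycles:
            self.resets += 1  # stands in for killing and restarting system_server
            self.missed = 0


healthy = {"AMS": True, "WMS": True}
stuck = {"AMS": True, "WMS": False}
wd = SystemServerWatchdogModel()
for health in [healthy, stuck, stuck, healthy, stuck, stuck, stuck]:
    wd.on_cycle(health)
print(wd.resets)  # 1
```

Two missed cycles followed by a healthy one do not trigger a reset because the healthy cycle feeds the watchdog; only the final run of three consecutive missed cycles does.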

SurfaceFlinger is started in the System Server process and is responsible for unified management of the device's frame buffers. During startup, SurfaceFlinger creates two threads: one detects console events and the other renders the system UI. In particular, SurfaceFlinger may be used to manage the display subsystem and to composite 2D and 3D layers for multiple applications.

In some implementations, SurfaceFlinger may instead be placed in the system libraries of the Android system; this is not limited in this application.

The system critical task timeout detection watchdog (Xcollie) detects whether actions executed in critical processes are completed in time. Xcollie may be configured with two threads. The first thread sets the state flag of a critical process to normal when the critical thread performs an action, and sets the flag to abnormal if the process times out while performing the action. The second thread polls the state of each critical process and resets any critical process whose state flag is abnormal.

By way of example, the critical processes may be SurfaceFlinger, Vold (Volume Daemon), AudioFlinger, Face Recognize, and the like.

Vold, the volume daemon, manages hot-plug events of storage devices in Android. AudioFlinger executes the audio system policy and is responsible for managing audio stream devices and for processing and transmitting audio stream data. Face Recognize implements face recognition, face verification, and similar functions.
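The two roles described for Xcollie can be sketched as methods that, in a real system, would run on two separate threads and would then need a lock around the shared state. `XcollieModel`, its method names, the injectable clock, and the 5 s timeout are hypothetical; the process names are the examples given above.

```python
import time


class XcollieModel:
    """Hypothetical sketch of the two roles described for Xcollie."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.started = {}    # process -> start time of its current critical action
        self.flags = {}      # process -> "normal" | "abnormal"
        self.reset_log = []

    # Role 1: track critical actions and mark those that time out.
    def begin_action(self, proc: str) -> None:
        self.started[proc] = self.clock()
        self.flags[proc] = "normal"

    def end_action(self, proc: str) -> None:
        self.started.pop(proc, None)
        self.flags[proc] = "normal"

    def mark_timeouts(self) -> None:
        now = self.clock()
        for proc, t0 in self.started.items():
            if now - t0 > self.timeout_s:
                self.flags[proc] = "abnormal"

    # Role 2: poll the flags and reset abnormal processes.
    def poll_and_reset(self) -> None:
        for proc, flag in list(self.flags.items()):
            if flag == "abnormal":
                self.reset_log.append(proc)  # stands in for restarting the process
                self.started.pop(proc, None)
                self.flags[proc] = "normal"


now = [0.0]
x = XcollieModel(timeout_s=5.0, clock=lambda: now[0])
x.begin_action("SurfaceFlinger")
x.begin_action("AudioFlinger")
now[0] = 2.0
x.end_action("AudioFlinger")  # finished in time
now[0] = 6.0
x.mark_timeouts()
x.poll_and_reset()
print(x.reset_log)  # ['SurfaceFlinger']
```

The fake clock keeps the example deterministic: AudioFlinger completes its action after 2 s, while SurfaceFlinger is still busy after 6 s and is therefore reset.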

Init is the first process in Linux user space, and therefore the first user-space process of the Linux-kernel-based Android system. The Init process is mainly responsible for parsing property files and initializing properties, parsing the init.rc script, triggering Actions and starting Services, providing the system property service, handling the corresponding trigger events, and maintaining system-level Services.

The Init watchdog monitors the Init process flow and resets the Init process when the flow becomes abnormal.

The kernel layer is a layer between hardware and software.

As shown in Fig. 2, the kernel layer contains at least a lockup detection watchdog (Hungdetect watchdog) and a CPU core state watchdog.

The Hungdetect watchdog monitors the kernel system and restarts the kernel when a kernel error (Kernel Panic) occurs.

In this embodiment, the Hungdetect watchdog further monitors System Server WatchDog, Xcollie, and the Init watchdog, and restarts the kernel when any of them fails or when a service monitored by any of them cannot be recovered.

The CPU core state watchdog detects the running state of each CPU core and restarts the kernel when the CPU core states meet the preset kernel restart condition.

In this embodiment, the hardware watchdog may also monitor the Hungdetect watchdog and the CPU core state watchdog, for example, by monitoring their reset actions. When either of them fails, or when they cannot restart the kernel successfully, the hardware watchdog restarts the whole device.
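One way to let the hardware watchdog supervise both kernel watchdogs is to require a feed from each of them in every detection period. The sketch below assumes exactly that; `MultiSourceHardwareWatchdog`, the source names, and the per-period accounting are hypothetical, since the application does not specify how the hardware watchdog distinguishes the two.

```python
class MultiSourceHardwareWatchdog:
    """Hypothetical: expects a feed from every registered kernel watchdog in each
    detection period; any missing feed restarts the whole device."""

    def __init__(self, sources):
        self.sources = set(sources)
        self.fed = set()
        self.device_restarted = False
        self.culprits = set()

    def feed(self, source: str) -> None:
        if source in self.sources:  # ignore feeds from unregistered sources
            self.fed.add(source)

    def end_period(self) -> bool:
        """Close one detection period; return True if every source fed."""
        missing = self.sources - self.fed
        if missing:
            self.device_restarted = True
            self.culprits = missing
        self.fed.clear()
        return not missing


hw = MultiSourceHardwareWatchdog({"hungdetect", "cpu_core_state"})
hw.feed("hungdetect")
hw.feed("cpu_core_state")
print(hw.end_period())  # True
hw.feed("hungdetect")   # the CPU core state watchdog has stopped feeding
print(hw.end_period(), hw.culprits)  # False {'cpu_core_state'}
```

An alternative design would have the kernel merge both results into a single feed; the per-source variant shown here additionally records which watchdog stopped feeding.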

Illustratively, the kernel layer may also include a software watchdog (referred to herein as the hungtask watchdog) that detects whether a process is in the D state (uninterruptible sleep).
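A D-state check can be sketched along the lines of the Linux hung-task detector, which treats a task as hung when it stays in uninterruptible sleep without any context switch for longer than a timeout. `Task`, `HungTaskChecker`, and the 120 s timeout used in the example are illustrative assumptions; a real kernel implementation walks the task list itself rather than receiving it as an argument.

```python
from dataclasses import dataclass


@dataclass
class Task:
    pid: int
    state: str         # "R", "S", "D", ... as shown by ps(1)
    switch_count: int  # voluntary + involuntary context switches


class HungTaskChecker:
    """Hypothetical sketch: flag tasks stuck in D state with no context switch
    for at least `timeout_s` seconds."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.seen = {}  # pid -> (last switch_count, time it last changed)

    def scan(self, tasks, now: float) -> list:
        hung = []
        for t in tasks:
            if t.state != "D":
                self.seen.pop(t.pid, None)  # not in D state: stop tracking
                continue
            last = self.seen.get(t.pid)
            if last is None or last[0] != t.switch_count:
                self.seen[t.pid] = (t.switch_count, now)  # task made progress
            elif now - last[1] >= self.timeout_s:
                hung.append(t.pid)
        return hung


checker = HungTaskChecker(timeout_s=120)
checker.scan([Task(10, "D", 5), Task(11, "D", 7), Task(12, "S", 3)], now=0)
print(checker.scan([Task(10, "D", 5), Task(11, "D", 8), Task(12, "S", 3)], now=130))
# [10]
```

Task 11 also sits in D state but has been scheduled in the meantime, so only task 10 is reported.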

Correspondingly, in practical applications, the hardware watchdog can also monitor the hungtask watchdog. The hardware watchdog's detection of the hungtask watchdog is analogous to its detection of the Hungdetect watchdog and is not described again here.

It will be appreciated that the layers shown in the software structure of fig. 2, and the components they contain, do not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer layers than shown, and each layer may include more or fewer components; this is not limited in the present application.

It will be appreciated that, to implement the watchdog detection method in the present application, the electronic device includes corresponding hardware and/or software modules that perform the respective functions. The algorithm steps of each example described in connection with the embodiments disclosed herein may be implemented in hardware or in a combination of hardware and computer software. Whether a function is implemented by hardware or by hardware driven by computer software depends on the particular application and the design constraints of the solution. Those skilled in the art may use different methods to implement the described functions for each particular application in conjunction with the embodiments, but such implementations should not be considered beyond the scope of this application.

An embodiment of the application provides a watchdog detection method. In this embodiment, the watchdog in each layer of the electronic device may detect whether the service in its own layer is abnormal, and reset that service when it is. The watchdog in a lower layer can also monitor the watchdog in the layer above it. When the upper-layer watchdog fails, or cannot recover the upper-layer service, the lower-layer watchdog resets the service in its own layer, which in turn recovers the upper-layer service. This realizes a watchdog detection method with hierarchical recovery.

It should be noted that, the "layer" in this embodiment may be divided according to software and hardware of the electronic device, or may be divided according to a system architecture of the electronic device, which is not limited in this application.

Fig. 3 is a system architecture diagram provided in an embodiment of the present application. The watchdog detection method provided in this embodiment is explained below using a first layer and a second layer of the electronic device as examples, where the first layer is the layer above the second layer.

As shown in fig. 3, a first watchdog is included in a first layer of the electronic device, and is configured to detect a first service in the first layer, and perform a reset or restart operation on the first service when the first service is abnormal.

Illustratively, when the first service is running normally, it periodically performs a first feeding operation on the first watchdog. When the first service cannot run normally, it stops the first feeding operation on the first watchdog. If the first watchdog does not receive the first feeding operation from the first service within one detection period or within several (e.g., 3) detection periods, it resets or restarts the first service.
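The feeding-based check above amounts to counting consecutive detection periods without a feed. The Python sketch below is one minimal way to express it; the class name, the `reset_action` callback, and the default threshold of 3 are illustrative assumptions rather than part of the described embodiment.

```python
from typing import Callable


class FeedWatchdog:
    """Resets a monitored service after `max_missed` consecutive unfed periods."""

    def __init__(self, reset_action: Callable[[], None], max_missed: int = 3):
        if max_missed < 1:
            raise ValueError("max_missed must be at least 1")
        self.reset_action = reset_action
        self.max_missed = max_missed
        self.fed = False    # set by feed() during the current period
        self.missed = 0     # consecutive periods without a feed

    def feed(self) -> None:
        """Called by the monitored service while it is running normally."""
        self.fed = True

    def on_period_end(self) -> bool:
        """Call once per detection period; returns True if a reset was issued."""
        self.missed = 0 if self.fed else self.missed + 1
        self.fed = False
        if self.missed >= self.max_missed:
            self.missed = 0  # give the restarted service a fresh window
            self.reset_action()
            return True
        return False
```

With `max_missed=1` the sketch corresponds to the single-period variant described in the text.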

Also by way of example, the first watchdog periodically obtains a first status flag of the first service. When the first service is running normally, the first status flag indicates that the service is normal; when the first service cannot run normally, the first status flag indicates that the service is abnormal. For example, a status flag of "ERROR" indicates that the service is abnormal, and a status flag of "OK" indicates that it is normal. If the first status flag obtained by the first watchdog indicates that the first service is abnormal, or the first status flags obtained in several (e.g., 3) detection periods indicate that it is abnormal, the first watchdog resets or restarts the first service.
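The status-flag variant polls instead of waiting for feeds. The sketch below assumes a `read_flag` callback that returns the service's current flag and treats any value other than "OK" as abnormal; both the callback and that rule are assumptions made for illustration.

```python
from typing import Callable


def make_flag_monitor(read_flag: Callable[[], str],
                      reset: Callable[[], None],
                      max_bad: int = 3) -> Callable[[], bool]:
    """Return a poll() function to call once per detection period.

    poll() reads the service's status flag and resets the service once
    `max_bad` consecutive reads are abnormal. Any value other than "OK"
    is treated as abnormal (an assumption of this sketch).
    """
    bad = 0

    def poll() -> bool:
        nonlocal bad
        bad = 0 if read_flag() == "OK" else bad + 1
        if bad >= max_bad:
            bad = 0
            reset()
            return True
        return False

    return poll
```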

As shown in fig. 3, a second watchdog is included in a second layer of the electronic device, and is configured to detect a second service in the second layer, and perform a reset or restart operation on the second service when the second service is abnormal.

Illustratively, when the second service is running normally, it periodically performs a second feeding operation on the second watchdog. When the second service cannot run normally, it stops the second feeding operation on the second watchdog. If the second watchdog does not receive the second feeding operation from the second service within one detection period or within several (e.g., 3) detection periods, it resets or restarts the second service.

Also by way of example, the second watchdog periodically obtains a second status flag of the second service. When the second service is running normally, the second status flag indicates that the service is normal; when the second service cannot run normally, the second status flag indicates that the service is abnormal. If the second status flag obtained by the second watchdog indicates that the second service is abnormal, or the second status flags obtained in several (e.g., 3) detection periods indicate that it is abnormal, the second watchdog resets or restarts the second service.

With continued reference to fig. 3, in addition to the second service in the second layer, the second watchdog is further configured to monitor the first watchdog and to reset or restart the second service when the first watchdog is abnormal. After the second service has been reset or restarted successfully, the first service in the first layer is reloaded and run, and is thereby restored to normal. An exception of the first watchdog may mean that the first watchdog has failed, or that it has failed to recover or restart the first service successfully.

Illustratively, when the first watchdog is running normally and the first service it monitors is running normally, the first watchdog periodically performs a third feeding operation on the second watchdog. When the first watchdog cannot run normally, or cannot recover or restart the first service successfully, it stops the third feeding operation on the second watchdog. If the second watchdog does not receive the third feeding operation from the first watchdog within one detection period or within several (e.g., 3) consecutive detection periods, it resets or restarts the second service.
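The interplay between the first feeding (service to first watchdog) and the third feeding (first watchdog to second watchdog) can be simulated in a few lines. This sketch makes simplifying assumptions: both watchdogs are evaluated on the same period, the second watchdog's longer window is modelled by a larger miss threshold, and the first watchdog withholds the third feeding in every period where its service did not feed it. The function name and thresholds are hypothetical.

```python
from typing import List, Tuple


def simulate_two_layers(first_feeds: List[bool],
                        first_max_missed: int = 1,
                        second_max_missed: int = 2) -> List[Tuple[int, str]]:
    """Simulate the first and second watchdogs over a series of periods.

    first_feeds[i] is True if the first service fed the first watchdog in
    period i. The first watchdog performs the third feeding on the second
    watchdog only in periods where its service fed it.
    """
    events = []
    first_missed = 0   # periods without the first feeding
    second_missed = 0  # periods without the third feeding
    for period, fed in enumerate(first_feeds):
        if fed:
            first_missed = 0
            second_missed = 0  # third feeding delivered
            continue
        first_missed += 1
        second_missed += 1     # third feeding withheld
        if first_missed >= first_max_missed:
            first_missed = 0
            events.append((period, "first watchdog resets first service"))
        if second_missed >= second_max_missed:
            second_missed = 0
            events.append((period, "second watchdog resets second service"))
    return events
```

In the input `[True, False, False, True]`, the first-layer reset at period 1 does not bring the service back, so the second watchdog escalates at period 2; with `[True, False, True]` the first-layer reset is enough and no escalation occurs.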

Also by way of example, the second watchdog periodically obtains a third status flag of the first watchdog. When the first watchdog is running normally and the first service it monitors is running normally, the third status flag indicates normal; when the first watchdog cannot run normally, or cannot recover or restart the first service successfully, the third status flag indicates abnormal. If the third status flag obtained by the second watchdog indicates abnormal, or the third status flags obtained in several (e.g., 3) detection periods indicate abnormal, the second watchdog resets or restarts the second service.

When the first watchdog and the second watchdog are both software watchdogs, the second watchdog's detection period for the first watchdog is longer than the first watchdog's detection period for the first service. Optionally, the former is an integer multiple, such as 2 times, of the latter.

As an alternative embodiment, the first layer is an application layer of the electronic device, and the second layer is an application framework layer of the electronic device. The first service is an application program, and the second service is a system service.

As another alternative embodiment, the first layer is an application framework layer of the electronic device, and the second layer is a kernel layer of the electronic device. The first service is a system service, and the second service is a kernel system.

As yet another alternative embodiment, the first layer is the kernel layer of the electronic device, and the second layer is the hardware layer of the electronic device. The first service is the kernel system, and the second service is a processing chip.

It should be noted that the second watchdog's reset or restart of the second service may be triggered by the second service stopping the second feeding operation or by the second service's status flag indicating an abnormality, or it may be triggered by the first watchdog stopping the third feeding operation or by the first watchdog's status flag indicating an abnormality.

In this way, hierarchical recovery of the electronic device is achieved by providing a detection mechanism between layers. Because the second layer is below the first layer, it recovers or restarts services at a coarser granularity than the first layer, which gives a higher success rate for recovering the first-layer service. When the first watchdog in the first layer cannot successfully recover the first service it monitors, the first service can be recovered through the second layer (that is, the next layer down): after the second service is reset or restarted, the first service is reloaded and run, and is thus restored to normal. If the second service in the second layer cannot be recovered successfully, it can in turn be recovered through the next layer (that is, a third layer). This avoids a situation in which one layer of the electronic device repeatedly performs reset or restart operations without succeeding.
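The escalation rule can be sketched as a loop over layers ordered from top to bottom, where each layer's recovery is tried only after the layer above has failed. The `Layer` tuple and the `recover` callbacks are hypothetical stand-ins for the per-layer reset actions (process restart, kernel restart, whole-device restart).

```python
from typing import Callable, List, Optional, Tuple

# (layer name, recover function returning True if the reset succeeded)
Layer = Tuple[str, Callable[[], bool]]


def hierarchical_recover(layers: List[Layer]) -> Optional[str]:
    """Try recovery layer by layer, from the upper layer downward.

    Each lower layer resets at a coarser granularity, so it is only tried
    after the layer above has failed. Returns the name of the layer whose
    reset succeeded, or None if every layer failed.
    """
    for name, recover in layers:
        if recover():
            return name
    return None
```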

Based on the above described hierarchical recovery scheme of the electronic device, the technical scheme of the application is described in detail below through several specific scenarios to which the hierarchical recovery scheme is applicable.

Scene one

In this scenario, a detection mechanism is provided between the application framework layer and the kernel layer of the electronic device, and the watchdog detection method provided in this application is explained by taking, as an example, the detection of System Server WatchDog (system service watchdog) in the application framework layer by the Hungdetect watchdog (stuck detection watchdog) in the kernel layer.

Fig. 4 is a flow chart illustrating an exemplary method of performing watchdog detection by System Server WatchDog. As shown in fig. 4, the procedure of the watchdog detection method is executed by System Server WatchDog, and specifically includes:

Step 101, the system service watchdog is initialized while the system service process is starting.

As described above, the system service watchdog, System Server WatchDog, monitors the system service process in the application framework layer, that is, the System Server process, for example, detecting whether the System Server process is deadlocked or unresponsive.

Step 102, the system service process checks, at a preset period, whether it is operating normally.

It should be noted that various Android services, such as AMS and WMS, are registered and started when the System Server process starts. Therefore, in step 102, the system service process may, for example, periodically check whether these registered and started critical services running in it are functioning properly.

Correspondingly, when the key services normally run, determining that the system service process is currently normally running, and executing step 103; otherwise, it is determined that the system service process cannot normally run, and step 105 is performed.
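The branch between steps 103 and 105 reduces to "feed only if every critical service is healthy". In this sketch the health-check callbacks for AMS and WMS are hypothetical placeholders; how Android's System Server actually checks its services is not reproduced here.

```python
from typing import Callable, Dict, List


def check_and_feed(key_services: Dict[str, Callable[[], bool]],
                   feed: Callable[[], None]) -> List[str]:
    """One period of steps 102-103: feed only if every key service is healthy.

    Returns the names of unhealthy services; an empty list means the
    first feeding information was sent.
    """
    unhealthy = [name for name, is_healthy in key_services.items()
                 if not is_healthy()]
    if not unhealthy:
        feed()  # step 103: send the first feeding information
    # Otherwise no feed is sent, which is what step 105 detects.
    return unhealthy
```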

Step 103, the system service process sends first feeding dog information to the system service watchdog.

For example, in some implementations, the first feeding dog information sent by the system service process to the system service watchdog may be "key" flag information, or may be other agreed information.

Based on the working principle of the watchdog, if the system service watchdog receives the first feeding information from the system service process within the feeding period, it considers that the system service process is currently running normally and that no reset is needed; that is, after receiving the first feeding information, the system service watchdog takes no action on the system service process.

Step 104, the system service watchdog sends second feeding information to the stuck detection watchdog located at the kernel layer.

Illustratively, in some implementations, the second feeding dog information sent by the system service watchdog to the stuck detection watchdog may be, for example, "key" flag information, or other agreed information.

Based on the working principle of the watchdog, if the stuck detection watchdog receives the second feeding information from the system service watchdog within the feeding period, it considers that the system service process is currently running normally and that no reset is needed; that is, after receiving the second feeding information, the stuck detection watchdog does not restart the kernel system.

The order of steps 103 and 104 is not limited in this embodiment.

Step 105, if the system service watchdog does not receive the first feeding information from the system service process within the feeding period, it stops sending the second feeding information to the stuck detection watchdog and performs a reset operation on the system service process.

Specifically, in some implementations, to avoid frequently resetting (that is, restarting) the system service process and to reduce the impact on the user's use of the electronic device, the system service watchdog may be set to reset the system service process only when it does not receive the first feeding information from the system service process in several (e.g., 3) consecutive feeding periods, in other words, detection periods.
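Steps 104 and 105 together with the consecutive-period rule can be sketched as follows: a period in which the first feeding arrived is forwarded downward as the second feeding, while a period without it withholds the second feeding and counts toward a reset. The class and callback names are hypothetical, and the threshold of 3 follows the example in the text.

```python
from typing import Callable


class SystemServiceWatchdog:
    """Steps 104-105: forward a feed downward, or reset after N missed periods."""

    def __init__(self, reset_process: Callable[[], None],
                 send_second_feeding: Callable[[], None],
                 max_missed: int = 3):
        self.reset_process = reset_process
        self.send_second_feeding = send_second_feeding
        self.max_missed = max_missed
        self.received = False  # first feeding seen in the current period
        self.missed = 0

    def on_first_feeding(self) -> None:
        self.received = True

    def on_period_end(self) -> None:
        if self.received:
            self.missed = 0
            self.send_second_feeding()  # step 104: feed the stuck detection watchdog
        else:
            # Step 105: withhold the second feeding and count the miss.
            self.missed += 1
            if self.missed >= self.max_missed:
                self.missed = 0
                self.reset_process()
        self.received = False
```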

Further, in some implementations, the system service process may have sent the first feeding information, but for some reason, such as interference from external factors, the information does not reach the system service watchdog in time. The system service watchdog would then mistakenly determine that the system service process is abnormal and reset it. To avoid this, the system service watchdog can be set to first trigger a bark when it does not receive the first feeding information from the system service process within n feeding periods, and to bite only when it does not receive the first feeding information within (n+m) feeding periods.

Illustratively, n is an integer greater than 0 and m is an integer greater than 0.

Here, a bite triggers the system service watchdog to reset the system service process, whereas a bark serves to alert operation and maintenance personnel to perform maintenance testing on the system.

Correspondingly, to make it easier for operation and maintenance personnel to test the system and pinpoint the cause of the anomaly, when a bark is triggered, the system service watchdog can capture an exception log through pre-compiled dump logic.
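The bark/bite rule can be sketched as two thresholds on the count of consecutive missed periods. Here `dump_log` stands in for the pre-compiled dump logic and `reset` for the reset of the system service process; both callbacks and the class name are hypothetical.

```python
from typing import Callable


class BarkBiteWatchdog:
    """Bark (capture logs) after n missed periods; bite (reset) after n + m."""

    def __init__(self, n: int, m: int,
                 dump_log: Callable[[], None], reset: Callable[[], None]):
        if n <= 0 or m <= 0:
            raise ValueError("n and m must be integers greater than 0")
        self.n, self.m = n, m
        self.dump_log = dump_log
        self.reset = reset
        self.missed = 0

    def on_period_end(self, fed: bool) -> str:
        if fed:
            self.missed = 0  # a late feed cancels a pending bark
            return "none"
        self.missed += 1
        if self.missed >= self.n + self.m:
            self.missed = 0
            self.reset()     # bite: reset the system service process
            return "bite"
        if self.missed >= self.n:
            if self.missed == self.n:
                self.dump_log()  # bark: capture the exception log once
            return "bark"
        return "none"
```

Because a feed arriving between the bark and the bite clears the counter, a feed that was merely delayed produces a log for analysis but no reset.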

It should also be noted that after the system service watchdog stops sending the second feeding information, if the stuck detection watchdog does not receive the second feeding information within the preset period, it may consider that the system service watchdog has failed to reset the system service process successfully, or that the system service watchdog itself has failed and therefore cannot reset the system service process. In this case, the stuck detection watchdog performs a reset operation, that is, it restarts the kernel system. After the kernel restarts successfully, the System Server process is restarted and returns to normal.

In addition, besides the second feeding information provided by the system service watchdog in the upper layer, the stuck detection watchdog also receives third feeding information from the kernel system that it monitors in its own layer. Therefore, in practical applications, the stuck detection watchdog may be triggered to perform the reset operation either when the second feeding information is not received within its preset period or when the third feeding information is not received within its preset period.

In addition, because the second feeding information and the third feeding information come from different sources, their preset periods may differ and can be set according to actual service requirements, which is not limited in this application.
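A stuck detection watchdog with two feeding sources and separate preset periods can be sketched by tracking the last feed time per source. In this sketch, "second" denotes the second feeding information from System Server WatchDog and "third" the third feeding information from the kernel system; the 60 s and 20 s periods in the test are made-up values, and timestamps are passed in explicitly only to keep the sketch deterministic.

```python
from typing import Callable, Dict, List


class StuckDetectionWatchdog:
    """Tracks several feeding sources, each with its own preset period."""

    def __init__(self, periods_s: Dict[str, float],
                 restart_kernel: Callable[[], None], now: float = 0.0):
        self.periods_s = dict(periods_s)
        self.last_feed = {source: now for source in periods_s}
        self.restart_kernel = restart_kernel

    def feed(self, source: str, now: float) -> None:
        if source not in self.periods_s:
            raise KeyError(f"unknown feeding source: {source}")
        self.last_feed[source] = now

    def check(self, now: float) -> List[str]:
        """Restart the kernel if any source has been silent past its period."""
        expired = [s for s, period in self.periods_s.items()
                   if now - self.last_feed[s] > period]
        if expired:
            self.restart_kernel()
            # After the restart every source starts a fresh window.
            for s in self.last_feed:
                self.last_feed[s] = now
        return expired
```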

In addition, it should be noted that, in practical application, the first feeding dog information may be actively sent to the system service watchdog by the system service process, or may be actively obtained by the system service watchdog from the system service process, which is not limited in this embodiment.

Accordingly, the second feeding dog information may be actively sent to the stuck detection watchdog by the system service watchdog, or may be actively obtained from the system service watchdog by the stuck detection watchdog, which is not limited in this embodiment.

Accordingly, the third feeding information may be sent actively by the kernel system to the stuck detection watchdog, or obtained actively by the stuck detection watchdog from the kernel system, which is not limited in this embodiment.

Therefore, in the watchdog detection method provided by this embodiment, System Server WatchDog, which monitors the System Server, is linked to the Hungdetect watchdog in the kernel layer. When System Server WatchDog becomes abnormal, or cannot restore the monitored process to normal through a reset, the Hungdetect watchdog in the kernel layer performs the reset operation. Based on the principle of hierarchical recovery, the abnormal process can thus be restored to normal, further ensuring normal use of the electronic device.

For a better understanding of how the Hungdetect watchdog monitors System Server WatchDog, a detailed description is provided below in conjunction with fig. 5.

Fig. 5 is a schematic view of an exemplary application scenario. As shown in fig. 5, the application framework layer of the electronic device includes System Server WatchDog, which monitors the System Server process in the application framework layer, for example, detecting whether the System Server process is deadlocked or unresponsive.

The System Server process periodically checks whether the key services running in it, such as AMS and WMS, are running normally. When every key service running in the System Server process is running normally, the System Server process periodically performs the first feeding operation on System Server WatchDog; that is, the System Server process periodically sends the first feeding information to System Server WatchDog, or System Server WatchDog periodically obtains the first feeding information from the System Server process. When any key service running in the System Server process cannot run normally, the System Server process stops the first feeding operation on System Server WatchDog.

If System Server WatchDog does not receive the first feeding operation from the System Server process within one detection period or within several (e.g., 3) detection periods, it restarts the System Server process.

As shown in fig. 5, a Hungdetect watchdog is included in the kernel layer of the electronic device to detect whether the kernel system is operating normally.

When the kernel system is running normally, it periodically performs a second feeding operation on the Hungdetect watchdog; that is, the kernel system periodically sends the third feeding information to the Hungdetect watchdog, or the Hungdetect watchdog periodically obtains the third feeding information from the kernel system. When the kernel system cannot run normally, it stops the second feeding operation on the Hungdetect watchdog.

The Hungdetect watchdog restarts the kernel system if no second feeding operation of the kernel system is received within one detection period or within a consecutive number of (e.g., 3) detection periods.

With continued reference to fig. 5, in addition to the kernel system, the Hungdetect watchdog may also monitor System Server WatchDog and restart the kernel system when System Server WatchDog is abnormal, so that the System Server process is reloaded after the kernel restarts successfully and returns to normal. An exception of System Server WatchDog may mean that System Server WatchDog itself has failed, or that it has failed to restart the System Server process successfully.

When System Server WatchDog is running normally and the System Server process it monitors is running normally, System Server WatchDog periodically performs a third feeding operation on the Hungdetect watchdog; that is, System Server WatchDog periodically sends the second feeding information to the Hungdetect watchdog, or the Hungdetect watchdog periodically obtains the second feeding information from System Server WatchDog. When System Server WatchDog cannot run normally, or fails to restart the System Server process successfully, System Server WatchDog stops the third feeding operation on the Hungdetect watchdog.

The Hungdetect watchdog restarts the kernel system if no third feeding operation of System Server WatchDog is received within one detection period or a consecutive number of (e.g., 3) detection periods. Furthermore, after the kernel is restarted successfully, the System Server process is restarted to restore the normal state.

The Hungdetect watchdog's detection period for System Server WatchDog is longer than System Server WatchDog's detection period for the System Server process. Optionally, the former is an integer multiple of the latter. For example, System Server WatchDog's detection period for the System Server process is 30 seconds, and the Hungdetect watchdog's detection period for System Server WatchDog is 60 seconds.
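The period relationship can be written as a simple check: the lower-layer period must be strictly longer than the upper-layer period and, optionally, an integer multiple of it. One plausible motivation, not stated explicitly in the text, is that the longer window lets System Server WatchDog finish its own restart attempt before the kernel is restarted. The function name is hypothetical and periods are whole seconds.

```python
def periods_are_consistent(upper_s: int, lower_s: int,
                           require_multiple: bool = True) -> bool:
    """Check a lower-layer detection period against the upper-layer one.

    upper_s: upper-layer watchdog's period for its service (e.g. 30 s).
    lower_s: lower-layer watchdog's period for the upper watchdog (e.g. 60 s).
    """
    if upper_s <= 0 or lower_s <= 0:
        raise ValueError("periods must be positive")
    if lower_s <= upper_s:
        return False  # lower layer must wait longer than the upper layer
    return lower_s % upper_s == 0 if require_multiple else True
```

The 30 s and 60 s example from the text passes the check, while equal periods do not.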

In this way, hierarchical recovery of the electronic device is achieved by providing a detection mechanism between the kernel layer and the application framework layer. Because a kernel restart has a larger granularity than a System Server process restart, when the System Server process cannot be restarted successfully, restarting the kernel can greatly improve the success rate of recovering the System Server process, and avoids the application framework layer repeatedly restarting the System Server process without success.

In a scenario where the electronic device restarts, the kernel system may start successfully while the System Server process fails to start. At this point, the Hungdetect watchdog in the kernel layer still monitors System Server WatchDog. Because the System Server process has not started successfully, System Server WatchDog cannot perform the feeding operation on the Hungdetect watchdog, so the Hungdetect watchdog considers System Server WatchDog abnormal and restarts the kernel system again. This can cause the kernel system to be restarted repeatedly.

To solve this problem, this embodiment adjusts when the Hungdetect watchdog in the kernel layer starts monitoring: its detection function for System Server WatchDog is started after the System Server has started successfully, rather than as soon as the kernel system has started.

Fig. 6 is a flow chart illustrating an exemplary watchdog detection method. As shown in fig. 6, the flow of the method for detecting a watchdog specifically includes:

Step 11, System Server WatchDog is started while the system service process is initializing.

Step 12, the Hungdetect watchdog starts its detection function for System Server WatchDog.

Step 13, the Hungdetect watchdog restarts the kernel system when System Server WatchDog is abnormal.

When the electronic device is initialized or restarted, the System Server is killed and System Server WatchDog disables its feeding function, so it performs no feeding operation on the Hungdetect watchdog in the kernel layer. System Server WatchDog does not resume feeding the Hungdetect watchdog until the System Server has restarted successfully.

During initialization of the electronic device, the kernel system starts first, followed by the processes in the application framework layer. When the kernel system is initialized, the Hungdetect watchdog in the kernel layer starts, but its detection function for System Server WatchDog does not. When the System Server in the application framework layer is initialized, it instructs the Hungdetect watchdog in the kernel layer to start its detection function for System Server WatchDog.

For example, when the System Server performs its initialization, it sends indication information to the Hungdetect watchdog in the kernel layer, indicating either that the System Server has been initialized or that the Hungdetect watchdog should start its detection function for System Server WatchDog. The Hungdetect watchdog then starts this detection function based on the indication information, so as to monitor System Server WatchDog.
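The deferred start can be sketched as a monitor that ignores System Server WatchDog entirely until the indication information arrives, and that switches itself off again when it restarts the kernel, so the next boot waits for a fresh indication. The class, method names, and the timeout are hypothetical; timestamps are passed in explicitly.

```python
from typing import Callable, Optional


class DeferredSswMonitor:
    """Hungdetect-side monitoring of System Server WatchDog, enabled on demand."""

    def __init__(self, timeout_s: float, restart_kernel: Callable[[], None]):
        self.timeout_s = timeout_s
        self.restart_kernel = restart_kernel
        self.enabled = False            # off when the kernel comes up
        self.last_feed: Optional[float] = None

    def on_indication(self, now: float) -> None:
        """Indication from System Server that it has initialized."""
        self.enabled = True
        self.last_feed = now

    def on_feed(self, now: float) -> None:
        if self.enabled:
            self.last_feed = now

    def check(self, now: float) -> bool:
        if not self.enabled:
            return False  # System Server not up yet: no judgment, no restart
        if now - self.last_feed > self.timeout_s:
            self.enabled = False        # wait for the next indication
            self.restart_kernel()
            return True
        return False
```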

In this way, the Hungdetect watchdog's detection function for System Server WatchDog starts only after the System Server has started successfully at least once. This prevents the Hungdetect watchdog from monitoring System Server WatchDog before the System Server has started, and thus from mistakenly considering System Server WatchDog abnormal and restarting the kernel system.

Scene two

In this scenario, a detection mechanism is provided between the application framework layer and the kernel layer of the electronic device, and the watchdog detection method provided in this application is explained by taking, as an example, the detection of Xcollie (system critical task timeout detection watchdog) in the application framework layer by the Hungdetect watchdog in the kernel layer.

Fig. 7 is a flow chart illustrating an exemplary watchdog detection method executed by Xcollie. As shown in fig. 7, the flow of the watchdog detection method executed by Xcollie specifically includes:

Step 201, the system critical task timeout detection watchdog is initialized, and the state information identifying a critical process is set to a normal flag.

Specifically, the critical processes in this embodiment may be Vold, SurfaceFlinger, AudioFlinger, face regcon, and the like mentioned above, which are not listed exhaustively here; the present application is not limited thereto.

In addition, regarding when the system critical task timeout detection watchdog is initialized: in some implementations, it may be initialized while the critical process is starting, which ensures that it can begin detection promptly when the function corresponding to an action executed by the critical process is called.

In addition, as described above, the system critical task timeout detection watchdog is a resident thread in the critical process and detects whether an action performed by the critical process is completed. Taking SurfaceFlinger as an example, the action detected by the system critical task timeout detection watchdog may be, for example, rendering the system UI. Therefore, in other implementations, the system critical task timeout detection watchdog may be initialized when the critical process has started and the function corresponding to an action executed by the critical process is called. In this way, which actions the system critical task timeout detection watchdog needs to detect can be determined according to service requirements, so that it adapts better to various application scenarios.

That is, when the system critical task timeout detection watchdog is initialized, and which actions it checks for completion, can be determined according to actual service requirements.

Further, after the system critical task timeout detection watchdog is initialized successfully, the state information identifying the critical process may be set to a normal flag, for example, "OK". Accordingly, the abnormal flag mentioned below may be represented by, for example, "ERROR".

In addition, in some implementations, a "1" may be used as a normal flag and a "0" may be used as an abnormal flag, as desired.

It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment. In practical applications, the normal flag and the abnormal flag can be agreed upon as required, which is not limited in this application.

Step 202, after detecting that the start node of an action executed by the critical process has been called, the system critical task timeout detection watchdog records the execution duration of the action.

It will be appreciated that in practical applications, an action executed by a critical process is implemented by a function (or program code, hereinafter collectively referred to as a function). Within the function, a start node (begin flag) marks the start of the action and an end node (end flag) marks its end.

For example, the system critical task timeout detection watchdog may learn the start time of an action from the call of the begin flag, and its end time from the call of the end flag.

In addition, the timeout period of each action can be set according to the service characteristics and the time the action normally needs to complete.

For example, for an action that normally completes in 5 ms (the time from calling the start node to calling the end node), the timeout period may be 5 ms, 8 ms, or n × 5 ms, where n is, for example, an integer greater than 0.

It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.

In step 203, when the execution duration reaches the timeout period of the action, the system critical task timeout detection watchdog queries whether the end node of the action has been called.

Specifically, if the execution duration reaches the timeout period but the end node has not been called, that is, a timeout has occurred, step 204 is executed; otherwise, step 206 is executed.

In step 204, the system critical task timeout detection watchdog modifies state information identifying the critical process from a normal flag to an abnormal flag.

Specifically, a timeout indicates that the action executed by the critical process has not completed, which may be caused by an exception in the critical process. To let the system critical task timeout detection watchdog, or the stuck detection watchdog that monitors it, perform a reset operation that restores the critical process, the state information identifying the critical process must be changed from the normal flag to the abnormal flag, for example from "OK" to "ERROR". After the critical process has been reset and successfully initialized again, "ERROR" can be changed back to "OK", and the system critical task timeout detection watchdog can continue detecting the actions to be monitored.
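As an illustration only, the timing logic of steps 202 to 204 and 206 can be sketched in Python. The class and method names (`TimeoutDetector`, `begin`, `end`, `check`) are hypothetical. Time is passed in explicitly in milliseconds so the logic is deterministic, and "OK"/"ERROR" are the example flags given above. This is a sketch under these assumptions, not the implementation described in the application.

```python
from dataclasses import dataclass, field

OK, ERROR = "OK", "ERROR"  # example normal / abnormal flags


@dataclass
class TimeoutDetector:
    """Hypothetical sketch of the system critical task timeout detection watchdog."""
    timeouts_ms: dict                                # action name -> timeout period (ms)
    state: dict = field(default_factory=dict)        # critical process -> state flag
    running: dict = field(default_factory=dict)      # (process, action) -> begin time (ms)

    def init_process(self, process):
        # After successful initialization the state is set to the normal flag.
        self.state[process] = OK

    def begin(self, process, action, now_ms):
        # Step 202: the begin flag was called, so start timing the action.
        self.running[(process, action)] = now_ms

    def end(self, process, action):
        # End flag called within the timeout: stop detecting this action (step 206).
        self.running.pop((process, action), None)

    def check(self, now_ms):
        # Steps 203-204: an action still running after its timeout period has not
        # called its end flag, so its process is marked abnormal and timing stops.
        for (process, action), started in list(self.running.items()):
            if now_ms - started >= self.timeouts_ms[action]:
                self.state[process] = ERROR
                del self.running[(process, action)]
        return dict(self.state)


wd = TimeoutDetector(timeouts_ms={"render": 8})   # 5 ms action, 8 ms timeout
wd.init_process("surfaceflinger")
wd.begin("surfaceflinger", "render", now_ms=0)
print(wd.check(now_ms=5))   # {'surfaceflinger': 'OK'}
print(wd.check(now_ms=8))   # {'surfaceflinger': 'ERROR'}
```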

In step 205, the system critical task timeout detection watchdog sends the abnormal flag to the stuck detection watchdog in the kernel layer, and the stuck detection watchdog performs a reset operation.

It can be appreciated that, because the application framework layer is located above the kernel layer, a reset performed by the system critical task timeout detection watchdog in the application framework layer has less impact on the electronic device than a reset performed by the stuck detection watchdog in the kernel layer. Thus, in some implementations, the system critical task timeout detection watchdog may perform a reset operation itself before step 205 is executed.

For example, if the reset by the system critical task timeout detection watchdog succeeds, that is, the critical process is about to return to normal, the state information of the critical process is changed from the abnormal flag back to the normal flag, and the normal flag is sent to the stuck detection watchdog in the kernel layer. The stuck detection watchdog then neither barks nor bites (i.e., does not reset the kernel system) when the preset period (the feeding period) arrives.

Step 205 is performed, for example, if the reset does not succeed and the critical process is not restored to normal, or if the critical process cannot be reset because the system critical task timeout detection watchdog itself has failed. In these cases the reset is performed by the underlying stuck detection watchdog.
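A minimal Python sketch of this escalation, assuming the upper-layer reset and the channel to the kernel layer are supplied as callables. `handle_timeout`, `try_local_reset` and `notify_kernel` are hypothetical names. A raised exception stands in for a failed upper-layer watchdog, which is an assumption made for illustration.

```python
OK, ERROR = "OK", "ERROR"


def handle_timeout(process, try_local_reset, notify_kernel):
    """Hypothetical escalation after a timeout (local reset first, then step 205).

    try_local_reset(process) -> bool: True if the upper-layer reset restored the process.
    notify_kernel(process, flag): forwards a state flag to the stuck detection watchdog.
    """
    try:
        recovered = try_local_reset(process)
    except Exception:
        # Models the case where the upper-layer watchdog itself has failed.
        recovered = False
    if recovered:
        # Local reset succeeded: report the normal flag so the kernel-layer
        # watchdog neither barks nor bites in the next feeding period.
        notify_kernel(process, OK)
        return "local_reset"
    # Step 205: hand the reset over to the stuck detection watchdog.
    notify_kernel(process, ERROR)
    return "escalated"


sent = []
print(handle_timeout("surfaceflinger", lambda p: False, lambda p, f: sent.append((p, f))))
print(sent)   # [('surfaceflinger', 'ERROR')]
```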

In addition, it should be understood that in practical applications, the lower-layer stuck detection watchdog recovers or restarts services at a coarser granularity than the upper-layer system critical task timeout detection watchdog. Its reset operation also has a greater impact on the electronic device. Therefore, the period that triggers a reset by the stuck detection watchdog is generally longer than the period that triggers a reset by the system critical task timeout detection watchdog.

Based on this, two specific implementations are given below. They show when, in the watchdog detection scheme of this embodiment, the reset operation is performed by the system critical task timeout detection watchdog and when it is performed by the stuck detection watchdog.

Mode 1:

Illustratively, the system critical task timeout detection watchdog measures the abnormal duration, that is, how long the state information of the critical process has remained at the abnormal flag.

Correspondingly, when the abnormal duration is less than a duration threshold, the system critical task timeout detection watchdog performs the reset operation again. When the abnormal duration is not less than the duration threshold, step 205 is executed.

It will be appreciated that, in some implementations, the duration threshold may be determined from the timeout period of the action, the detection period (also called the feeding period) of the stuck detection watchdog, and the delay between the application framework layer and the kernel layer.

Mode 2:

Illustratively, as in mode 1, the system critical task timeout detection watchdog measures the abnormal duration of the critical process. The difference is that in this mode, the number of timeouts decides whether the system critical task timeout detection watchdog keeps performing the reset or hands it to the stuck detection watchdog. Therefore, after obtaining the abnormal duration, the system critical task timeout detection watchdog determines the number of timeouts from the abnormal duration and the timeout period.

Correspondingly, when the number of timeouts is less than a count threshold, the system critical task timeout detection watchdog performs the reset operation again. When the number of timeouts is not less than the count threshold, step 205 is executed.

Like the duration threshold, the count threshold can be set according to service requirements and the actual situation, and the present application does not limit it.
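The two decision rules can be compared side by side in a short Python sketch. The function names are hypothetical, and the thresholds are plain parameters because the text leaves their values to service requirements. Converting the abnormal duration into a number of timeouts by floor division is an assumption; the text only says the count is derived from the abnormal duration and the timeout period.

```python
def decide_mode1(abnormal_ms: int, duration_threshold_ms: int) -> str:
    # Mode 1: compare how long the state has stayed abnormal with a duration threshold.
    return "retry_local" if abnormal_ms < duration_threshold_ms else "escalate"


def decide_mode2(abnormal_ms: int, timeout_ms: int, count_threshold: int) -> str:
    # Mode 2: count how many whole timeout periods fit into the abnormal duration
    # (floor division is an assumed conversion), then compare with a count threshold.
    timeouts = abnormal_ms // timeout_ms
    return "retry_local" if timeouts < count_threshold else "escalate"


# With an 8 ms timeout and a count threshold of 3:
print(decide_mode2(20, 8, 3))   # 2 timeouts -> retry_local
print(decide_mode2(24, 8, 3))   # 3 timeouts -> escalate (step 205)
```

With a fixed timeout period the two modes are interchangeable: a count threshold of k corresponds to a duration threshold of k × timeout period. Mode 2 is convenient when thresholds are easier to reason about as a number of retries.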

In addition, in practical applications there may be other watchdogs besides the system critical task timeout detection watchdog, such as a system service watchdog or an initialization watchdog, that detect different service processes. So that the lower-layer stuck detection watchdog can detect all of these upper-layer watchdogs, a pre-encapsulated common node may be provided in the kernel layer to decide whether the stuck detection watchdog performs a reset operation.

Correspondingly, in implementations that provide the common node, the abnormal flag that the system critical task timeout detection watchdog sends toward the stuck detection watchdog is sent to the common node in the kernel layer.

A preset policy, that is, a policy for deciding whether the stuck detection watchdog performs a reset operation, is configured in the common node. After the common node receives an abnormal flag from an upper-layer watchdog, such as the system critical task timeout detection watchdog in this embodiment, it determines from the abnormal flag and the preset policy whether the reset operation needs to be performed by the stuck detection watchdog.

Accordingly, when this processing determines that the reset operation must be performed by the stuck detection watchdog, the common node notifies the stuck detection watchdog to perform it.

In some implementations, the common node actively sends a reset instruction to the stuck detection watchdog when it determines that a reset is needed. Alternatively, the stuck detection watchdog periodically fetches the decision from the common node and performs the reset operation when it obtains a reset instruction.
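The push and pull variants can be illustrated with a hypothetical `CommonNode` class in Python. The preset policy is modeled as a callable over the latest flag from each upper-layer watchdog, and the "RESET" token and all names are assumptions for illustration only. A real kernel-layer node would not be written in Python.

```python
from collections import deque


class CommonNode:
    """Hypothetical kernel-layer common node that holds a preset policy."""

    def __init__(self, policy, push_target=None):
        self.policy = policy              # callable(flags) -> True if a reset is needed
        self.push_target = push_target    # push mode: callable that receives "RESET"
        self.pending = deque()            # pull mode: decisions waiting to be fetched
        self.flags = {}                   # latest flag reported by each upper-layer watchdog

    def receive(self, watchdog, flag):
        self.flags[watchdog] = flag
        if self.policy(self.flags):
            if self.push_target is not None:
                self.push_target("RESET")      # actively send the reset instruction
            else:
                self.pending.append("RESET")   # keep it for the stuck detection watchdog

    def fetch(self):
        # Pull mode: called by the stuck detection watchdog once per period.
        return self.pending.popleft() if self.pending else None


any_error = lambda flags: "ERROR" in flags.values()   # example policy only
node = CommonNode(any_error)
node.receive("xcollie", "OK")
print(node.fetch())   # None
node.receive("xcollie", "ERROR")
print(node.fetch())   # RESET
```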

In addition, so that the technical solution of this embodiment can be applied to more scenarios and meet different service requirements, practical applications can specify, according to service requirements, which actions should trigger a reset by the lower-layer stuck detection watchdog when their execution is abnormal.

For example, a hierarchical recovery flag may be set on the function corresponding to an action executed by the critical process. This can be done when the critical process is started, when the function corresponding to the action is called, and when the system critical task timeout detection watchdog is initialized. Then, when the state information of the critical process is abnormal and the executed action carries the hierarchical recovery flag, the system critical task timeout detection watchdog notifies the stuck detection watchdog to perform the reset operation.

The hierarchical recovery flag may be set as needed in practical applications, and the present application is not limited in this respect.

In some implementations, the system critical task timeout detection watchdog sends to the stuck detection watchdog only the abnormal flags of critical processes whose actions carry the hierarchical recovery flag. In that case, when the stuck detection watchdog (or the common node) receives an abnormal flag, it does not need to check whether the corresponding critical process has set the hierarchical recovery flag. It assumes that every received critical process has set it, so the stuck detection watchdog intervenes for all of them.

For example, in other implementations, the system critical task timeout detection watchdog does not check whether the critical process has set the hierarchical recovery flag. When it detects that the state information identifying the critical process has changed, it directly sends that state information, together with other flag information set for the critical process such as the hierarchical recovery flag, to the lower-layer common node or stuck detection watchdog, which then decides whether intervention is needed.
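The two places where the hierarchical recovery flag can be evaluated are shown in a brief Python sketch. The report dictionaries and the `hier_recovery` key are hypothetical representations chosen for illustration.

```python
def upper_filter(reports):
    # Variant 1: the upper layer forwards only abnormal reports whose action carries
    # the hierarchical recovery flag; the lower layer intervenes for everything it receives.
    return [r for r in reports if r["state"] == "ERROR" and r["hier_recovery"]]


def lower_needs_intervention(report):
    # Variant 2: the upper layer forwards every state change with its flags attached,
    # and the common node / stuck detection watchdog decides.
    return report["state"] == "ERROR" and report.get("hier_recovery", False)


reports = [
    {"process": "a", "state": "ERROR", "hier_recovery": True},
    {"process": "b", "state": "ERROR", "hier_recovery": False},
    {"process": "c", "state": "OK", "hier_recovery": True},
]
print([r["process"] for r in upper_filter(reports)])                      # ['a']
print([r["process"] for r in reports if lower_needs_intervention(r)])     # ['a']
```

Both variants select the same processes for intervention. They differ only in where the check runs and how much information crosses the layer boundary.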

In step 206, the system critical task timeout detection watchdog stops detecting the action executed by the critical process.

It will be appreciated that in practical applications, the state of the critical process changes as shown in Table 1.

TABLE 1 Critical Process State Change Table

Scenario	State
Initialization (init)	Normal
Execution end (end)	Normal
Execution timeout (Timeout)	Abnormal

That is, if the end node is called within the timeout period, the action executed this time has ended normally. In this case the system critical task timeout detection watchdog does not need to do anything, and in particular no reset is needed. It can therefore stop detecting the action executed by the critical process, which saves system resources of the electronic device.

In addition, when the watchdog detection method of this embodiment is implemented on a system architecture that uses the hierarchical recovery scheme, state information identifying the critical process may also be sent to the stuck detection watchdog in the lower layer, i.e., the kernel layer, when the end node is called. In this case the state information that the system critical task timeout detection watchdog sends to the stuck detection watchdog is a normal flag, such as "OK", indicating that the critical process is in a normal state.

In addition, in some implementations, when detection of the action ends, the system critical task timeout detection watchdog may instead send the stuck detection watchdog state information indicating that the action ended normally. Then, if the stuck detection watchdog later receives no state information about the critical process's actions within a preset period (the feeding period), it does not consider the critical process abnormal or the system critical task timeout detection watchdog failed, and does not perform a reset operation.

Therefore, in the watchdog detection method of this embodiment, Xcollie, which detects critical processes, is connected to the Hungdetect watchdog of the kernel layer. When Xcollie is abnormal, or cannot restore the detected business process by resetting it, the Hungdetect watchdog of the kernel layer performs the reset operation. The abnormal business process can thus be restored on the principle of hierarchical recovery, which ensures normal use of the electronic device.

How the Hungdetect watchdog detects Xcollie is described in detail below with reference to fig. 8.

Fig. 8 is a schematic view of an exemplary application scenario. As shown in fig. 8, the application framework layer of the electronic device includes Xcollie, which detects whether an action performed by the critical process SurfaceFlinger completes, for example whether the system's UI rendering operations complete.

It will be appreciated that SurfaceFlinger is started in the System Server process and is responsible for unified management of the device's frame buffers. During startup, SurfaceFlinger creates two threads: one detects console events (hereinafter thread A), and the other renders the system UI (hereinafter thread B). In particular, SurfaceFlinger may be used to manage the display subsystem and to composite 2D and 3D layers for multiple applications.

Xcollie detects whether the actions executed in the critical process complete. Xcollie may set up two threads. One (hereinafter thread C) sets the state flag of the critical process to normal when the critical thread starts executing an action, and decides whether to set the flag to abnormal according to whether the action times out. The other (hereinafter thread D) polls the state of each critical process and resets any critical process whose state flag is abnormal.
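Thread D's polling loop can be sketched in Python using the standard `threading` module. The function names, the shared `states` dictionary and the lock are assumptions made for illustration. The reset callback stands in for restarting a critical process. Per the description of step 204, it sets the flag back to "OK" after a successful re-initialization.

```python
import threading


def poll_once(states, reset, lock):
    """One pass of thread D: reset every critical process whose flag is abnormal."""
    with lock:
        abnormal = [p for p, flag in states.items() if flag == "ERROR"]
    for process in abnormal:
        reset(process)   # hypothetical restart of the critical process
    return abnormal


def run_thread_d(states, reset, lock, period_s, stop):
    # Event.wait returns False on timeout (poll again) and True once stop is set (exit).
    while not stop.wait(period_s):
        poll_once(states, reset, lock)


lock = threading.Lock()
states = {"surfaceflinger": "ERROR", "other": "OK"}


def reset(process):
    # Assume a successful restart re-initializes the process to the normal flag.
    with lock:
        states[process] = "OK"


print(poll_once(states, reset, lock))   # ['surfaceflinger']
print(states)                           # {'surfaceflinger': 'OK', 'other': 'OK'}
```

In a long-running setup, `run_thread_d` would be started with `threading.Thread(target=run_thread_d, args=(states, reset, lock, 1.0, stop))` and stopped by calling `stop.set()`.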

With continued reference to FIG. 8, based on the characteristics of Xcollie and SurfaceFlinger, Xcollie (specifically, thread D in Xcollie) obtains a first state flag from SurfaceFlinger.

For example, in some implementations, thread D actively determines the first state flag based on console events detected by thread A in SurfaceFlinger and/or rendering progress information from thread B.

For example, in other implementations, thread A and thread B in SurfaceFlinger actively send the information that determines the first state flag to Xcollie, and thread D in Xcollie then determines the first state flag from the received information.

With continued reference to FIG. 8, when an action of SurfaceFlinger executes normally, the first state flag acquired by Xcollie is the normal flag, e.g., "OK". When an action of SurfaceFlinger times out, the first state flag acquired by Xcollie is the abnormal flag, e.g., "ERROR".

That is, the information SurfaceFlinger gives to Xcollie essentially identifies its state. The first state flag determines whether Xcollie performs a reset operation itself or notifies the Hungdetect watchdog of the kernel layer to perform one.

Illustratively, in some implementations, if the first state flag acquired in one detection period, or in each of several (e.g., 3) consecutive detection periods, is the abnormal flag, the SurfaceFlinger process is restarted, i.e., Xcollie performs a reset operation.
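The "one period or several consecutive periods" rule is a consecutive-failure counter, which can be sketched in Python as below. `ConsecutiveTrigger` is a hypothetical name. `limit=1` reproduces the single-period variant and `limit=3` the example above. Clearing the counter after it fires is an assumption, so that a restarted process gets a fresh window.

```python
class ConsecutiveTrigger:
    """Fires after `limit` consecutive abnormal observations (one per detection period)."""

    def __init__(self, limit=3):
        self.limit = limit
        self.count = 0

    def observe(self, abnormal: bool) -> bool:
        # Any normal observation breaks the streak.
        self.count = self.count + 1 if abnormal else 0
        if self.count >= self.limit:
            self.count = 0   # start a fresh window after triggering a restart
            return True
        return False


trigger = ConsecutiveTrigger(limit=3)
flags = ["ERROR", "ERROR", "OK", "ERROR", "ERROR", "ERROR"]
print([trigger.observe(f == "ERROR") for f in flags])
# [False, False, False, False, False, True] -> restart SurfaceFlinger after period 6
```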

With continued reference to fig. 8, the kernel layer of the electronic device includes a Hungdetect watchdog that detects whether the kernel system is operating properly.

When the kernel system operates normally, it feeds the Hungdetect watchdog periodically. When the kernel system cannot operate normally, it stops feeding the Hungdetect watchdog.

Illustratively, the Hungdetect watchdog restarts the kernel system, i.e., performs a reset operation, if it receives no feeding operation from the kernel system within one detection period or within several (e.g., 3) consecutive detection periods.

With continued reference to fig. 8, besides the kernel system, the Hungdetect watchdog may also detect Xcollie. When Xcollie is abnormal, it restarts the kernel system so that the SurfaceFlinger process is reloaded after the kernel restarts successfully and returns to normal. An Xcollie exception may mean that Xcollie has failed, or that Xcollie could not restart the SurfaceFlinger process successfully.

When Xcollie operates normally and the SurfaceFlinger process it detects also operates normally, Xcollie may actively send a second state flag to the Hungdetect watchdog, or the Hungdetect watchdog may actively obtain the second state flag from Xcollie.

It is appreciated that in some implementations, either approach, Xcollie actively sending the second state flag or the Hungdetect watchdog actively obtaining it, may follow a preset period, i.e., timed sending or timed obtaining.

In practical applications, the second status flag may be the same as or different from the first status flag.

Illustratively, when the first status flag is a normal flag and Xcollie is normal (valid), the second status flag is the same as the first, i.e., also a normal flag.

Illustratively, when the first status flag is a normal flag but Xcollie is abnormal (failed), the second status flag differs from the first and is an abnormal flag.

Illustratively, when the first status flag is an abnormal flag, the second status flag is the same as the first, i.e., an abnormal flag, regardless of whether Xcollie is normal.
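The three cases above form a small truth table, encoded here as a Python function. The function name and the boolean `xcollie_valid` parameter are hypothetical. "OK"/"ERROR" are the example flags used earlier.

```python
def second_status_flag(first_flag: str, xcollie_valid: bool) -> str:
    # An abnormal first flag propagates regardless of whether Xcollie is healthy.
    if first_flag == "ERROR":
        return "ERROR"
    # First flag normal: the second flag reflects Xcollie's own validity.
    return "OK" if xcollie_valid else "ERROR"


for first in ("OK", "ERROR"):
    for valid in (True, False):
        print(first, valid, "->", second_status_flag(first, valid))
```

The second flag is normal only when both the critical process and Xcollie are normal, which is a logical AND of the two conditions.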

That is, when Xcollie is connected to Hungdetect, the Hungdetect watchdog can obtain a second status flag whether or not Xcollie functions normally and whether or not the SurfaceFlinger process can be restarted.

Furthermore, in other implementations, the second status flag may be derived from the Hungdetect watchdog's detection of the channel through which it communicates with Xcollie, or of the information Xcollie transmits.

Specifically, if the Hungdetect watchdog cannot query any information provided by Xcollie and cannot detect Xcollie's current state, it may generate an abnormal second status flag.

Illustratively, the Hungdetect watchdog restarts the kernel system if, within one detection period or within several (e.g., 3) consecutive detection periods, it obtains neither a second state flag indicating that the SurfaceFlinger process has returned to normal nor information indicating that Xcollie has stopped detecting actions performed by SurfaceFlinger. After the kernel restarts successfully, the SurfaceFlinger process is reloaded and started, and returns to normal.
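A Python sketch of the Hungdetect side can treat each detection period as one entry in a history list. `None` means nothing could be queried from Xcollie, which is treated as an abnormal second flag. The "DETECTION_STOPPED" token for "Xcollie has stopped detecting" is a hypothetical encoding, as are the function names.

```python
def period_is_bad(info):
    # Only a normal second flag, or notice that detection ended normally,
    # counts as a good period. None (nothing queried) and "ERROR" are bad.
    return info not in ("OK", "DETECTION_STOPPED")


def should_restart_kernel(history, limit=3):
    """Restart when each of the last `limit` detection periods was bad."""
    recent = history[-limit:]
    return len(recent) == limit and all(period_is_bad(i) for i in recent)


print(should_restart_kernel(["OK", None, "ERROR"]))         # False
print(should_restart_kernel(["OK", None, "ERROR", None]))   # True
```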

Furthermore, for this scheme to work, the period in which the Hungdetect watchdog detects Xcollie must be longer than the period in which Xcollie detects the SurfaceFlinger process.

Optionally, the period in which the Hungdetect watchdog detects Xcollie is an integer multiple of the period in which Xcollie detects the SurfaceFlinger process. For example, Xcollie's detection period for the SurfaceFlinger process is 30 seconds, and the Hungdetect watchdog's detection period for Xcollie is 60 seconds.
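The period constraint can be written as a small Python check. The function name and the `require_multiple` switch for the optional integer-multiple condition are hypothetical. Periods are given in whole seconds, as in the 30 s / 60 s example.

```python
def periods_valid(upper_period_s: int, lower_period_s: int, require_multiple: bool = True) -> bool:
    # Required: the kernel-layer period is longer than the upper-layer period.
    if lower_period_s <= upper_period_s:
        return False
    # Optional: the kernel-layer period is an integer multiple of the upper-layer period.
    return lower_period_s % upper_period_s == 0 if require_multiple else True


print(periods_valid(30, 60))   # True: 60 s is 2 x 30 s
print(periods_valid(30, 45))   # False: longer, but not an integer multiple
```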

In this way, hierarchical recovery of the electronic device is achieved by arranging a detection mechanism between the kernel layer and the application framework layer. A kernel restart has a coarser granularity than a SurfaceFlinger process restart. Therefore, when the SurfaceFlinger process cannot be restarted successfully, a kernel restart greatly improves the success rate of recovering it, and prevents the application framework layer of the electronic device from repeatedly failing to restart the SurfaceFlinger process.

Scene three

In this scenario, a detection mechanism is arranged between the application framework layer and the kernel layer of the electronic device, so that the Hungdetect watchdog in the kernel layer can detect multiple software watchdogs in the application framework layer simultaneously.

Referring to fig. 9, fig. 9 is a schematic flowchart showing how multiple software watchdogs in the application framework layer are connected to the Hungdetect watchdog through a common node pre-encapsulated in the kernel layer, so that the Hungdetect watchdog can detect them simultaneously.

As shown in fig. 9, the watchdog detection method provided in this embodiment is applied to a common node pre-encapsulated in the kernel layer, and specifically includes:

In step 301, the common node receives information provided by each software watchdog in the application framework layer.

The software watchdogs may be System Server WatchDog, Xcollie, the init watchdog, etc. They are not listed exhaustively here, and the present application is not limited thereto.

Accordingly, the information each software watchdog provides depends on the characteristics of that watchdog in practical applications.

For example, for System Server WatchDog, the information provided may be the second feeding information described in scenario one above.

It will be appreciated that when System Server WatchDog provides the second feeding information, System Server WatchDog is valid and the System Server process it detects is also normal.

Accordingly, in some implementation scenarios, System Server WatchDog may not provide the second feeding information. For example, when System Server WatchDog fails, or the System Server process it detects is abnormal, System Server WatchDog may stop providing the second feeding information. In this case, System Server WatchDog may provide no information at all, set the provided information to "null", or provide agreed abnormality information to inform the underlying common node that an abnormality currently exists.

The information provided by Xcollie may be the second status flag described in scenario two above. As described in scenario two, the second status flag is the abnormal flag when the reset needs to be performed by the stuck detection watchdog (the Hungdetect watchdog). It is the normal flag when Xcollie and the corresponding critical process are both normal.
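Before it applies any policy, the common node must reduce these heterogeneous inputs to normal/abnormal. A Python sketch follows. The watchdog identifiers and the "FEED" token for the second feeding information are hypothetical. Per the text, anything other than the second feeding information (no information, "null", or agreed abnormality information) counts as abnormal for System Server WatchDog.

```python
from typing import Optional

FEED = "FEED"   # hypothetical token for the second feeding information


def normalize(source: str, info: Optional[str]) -> str:
    """Map raw upper-layer information to "normal" or "abnormal"."""
    if source == "system_server_watchdog":
        # Missing info (None), "null", or agreed abnormality info are all abnormal.
        return "normal" if info == FEED else "abnormal"
    if source == "xcollie":
        # Xcollie reports its second status flag directly.
        return "normal" if info == "OK" else "abnormal"
    raise ValueError(f"unknown watchdog: {source}")


print(normalize("system_server_watchdog", FEED))   # normal
print(normalize("system_server_watchdog", None))   # abnormal
print(normalize("xcollie", "ERROR"))               # abnormal
```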

In step 302, the common node determines the priority of the information provided by each software watchdog according to the service process corresponding to that watchdog.

For example, in the Android system most critical processes are registered and started in the System Server process, so a normal System Server process is a precondition for the other critical processes being normal. Therefore, in some implementations, information related to the System Server process may be assigned the first priority, and information provided for other critical processes running in the System Server process may be assigned the second priority, the first priority being higher than the second.

On this basis, information provided by System Server WatchDog is assigned the first priority, and information provided by Xcollie the second priority.

It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment. In practical application, the priorities of different business processes can be set according to other business requirements, so that the method suits a wider range of application scenarios.

In step 303, the common node determines whether to perform a reset operation according to the preset policy and the determined priorities of the information provided by each software watchdog.

For example, in some implementations, the preset policy may specify the following: when the first-priority information is the second feeding information, that is, the System Server process is normal, the stuck detection watchdog does not currently need to intervene to perform a reset operation, regardless of what the other software watchdogs provide.

For example, in other implementations, the preset policy may specify the following: when the first-priority information is the second feeding information and n of the N pieces of information provided by the other software watchdogs are abnormal, the stuck detection watchdog currently needs to intervene to perform a reset operation.

Illustratively, n is an integer greater than 1, and N is an integer greater than n.

For example, suppose the common node receives 5 pieces of information from software watchdogs in one detection period. One is provided by System Server WatchDog, and the other 4 (i.e., N = 4) are provided by other software watchdogs: 2 are normal information and 2 are abnormal information. If the policy specifies that the stuck detection watchdog intervenes when n ≥ 2, the decision made by the common node in this case is a reset instruction.

For example, in other implementations, the preset policy may specify the following: when the first-priority information is the second feeding information and the second-priority information comes from a designated software watchdog, for example one for which a reset flag is set, the stuck detection watchdog currently needs to intervene to perform a reset operation.

Further, on the basis of the above, the common node may also consider how long an upper-layer software watchdog has been failed, or how long the business process it detects has been abnormal. If the normal state is not recovered within several consecutive periods (e.g., 3), the common node determines that the stuck detection watchdog needs to intervene to perform a reset operation.

It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment. In practical application, different preset policies can be set according to other service requirements, so that the method suits a wider range of application scenarios.

Correspondingly, if the preset policy and the determined priorities indicate that a reset operation is required, step 304 is executed; otherwise, step 305 is executed.
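Steps 302 and 303 can be sketched as a priority assignment followed by one of the example policies. `Report`, the watchdog names other than System Server WatchDog and Xcollie, and the policy strings are hypothetical. The text does not say what happens when the first-priority information is missing or abnormal, so treating that case as a reset is a labeled assumption here. The consecutive-period refinement is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Report:
    watchdog: str              # e.g. "system_server_watchdog", "xcollie"
    normal: bool               # second feeding info / normal status flag received
    reset_flag: bool = False   # hypothetical marker for a "designated" watchdog


def priority(report: Report) -> int:
    # Step 302: System Server information is first priority, everything else second.
    return 1 if report.watchdog == "system_server_watchdog" else 2


def decide(reports: list, policy: str, n: int = 2) -> str:
    """Step 303: return "RESET" (step 304) or agreed feeding info "OK" (step 305)."""
    first = [r for r in reports if priority(r) == 1]
    second = [r for r in reports if priority(r) == 2]
    if not first or not all(r.normal for r in first):
        # Assumption: an abnormal or missing first-priority report triggers a reset.
        return "RESET"
    abnormal = [r for r in second if not r.normal]
    if policy == "ignore_others":          # System Server normal: never intervene
        return "OK"
    if policy == "n_of_N":                 # at least n abnormal second-priority reports
        return "RESET" if len(abnormal) >= n else "OK"
    if policy == "designated":             # an abnormal report from a flagged watchdog
        return "RESET" if any(r.reset_flag for r in abnormal) else "OK"
    raise ValueError(f"unknown policy: {policy}")


# The worked example: 1 System Server report plus N = 4 others, 2 of them abnormal.
reports = [Report("system_server_watchdog", True),
           Report("xcollie", True), Report("init_watchdog", True),
           Report("wd_a", False), Report("wd_b", False)]
print(decide(reports, "n_of_N", n=2))   # RESET
print(decide(reports, "ignore_others")) # OK
```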

In step 304, the common node provides a reset instruction that triggers the stuck detection watchdog to perform a reset operation.

The reset operation of the stuck detection watchdog is described in detail in scenarios one and two and is not repeated here.

In step 305, the common node provides agreed feeding information to the stuck detection watchdog.

It can be appreciated that the common node provides agreed feeding information, for example a "key" flag or an "OK" flag. This tells the stuck detection watchdog that the upper-layer software watchdogs and the business processes they detect are normal, and that it need not intervene by restarting the kernel system.

Therefore, in this watchdog detection method, a common node is pre-encapsulated to decide whether the lower-layer watchdog performs a reset when an upper-layer software watchdog cannot successfully reset an abnormal business process. The common node uniformly receives the state flags sent by the different upper-layer software watchdogs, which identify the actions or business processes they monitor. It analyzes these flags according to the preset policy and notifies the stuck detection watchdog to perform a reset only when it determines that one is needed. This makes the hierarchical recovery scheme of the present application more reasonable.

In addition, the common node can determine a processing result from the feeding information and state flags of the different upper-layer software watchdogs according to service requirements. Restarts of the kernel system by the stuck detection watchdog are thus kept to a minimum without affecting the user's use of the electronic device, which reduces resource overhead.

To illustrate how multiple software watchdogs in the application framework layer are connected to the Hungdetect watchdog, the watchdog detection method of the present application is explained below using the example of the Hungdetect watchdog in the kernel layer detecting System Server WatchDog and Xcollie in the application framework layer simultaneously.

Referring to fig. 10, fig. 10 is a schematic view of an exemplary application scenario.

As shown in fig. 10, the application framework layer of the electronic device includes System Server WatchDog and the System Server process it detects, as well as Xcollie and its corresponding critical process, such as SurfaceFlinger. The kernel layer of the electronic device includes a Hungdetect watchdog that detects whether the kernel system operates properly, and a common node that communicates with System Server WatchDog and Xcollie in the application framework layer and with the Hungdetect watchdog in the kernel layer.

Scenario one describes how System Server WatchDog detects the System Server process. This covers how the System Server process performs the first feeding operation on System Server WatchDog, when the System Server process is restarted, and when the second feeding operation (the second feeding information in scenario one) is performed. These details are not repeated here.

For how Xcollie detects the actions in SurfaceFlinger, refer to scenario two, which describes how Xcollie obtains the first status flag of SurfaceFlinger, when SurfaceFlinger is restarted, and when the second status flag is provided. Details are not repeated here.

For how the Hungdetect watchdog detects the kernel system, namely how the kernel system performs the third feeding operation on the Hungdetect watchdog and when the kernel system is restarted, refer to the description of the Hungdetect watchdog detecting the kernel system in scenario one or scenario two. Details are not repeated here.

The following description, with reference to fig. 10, explains how the second feeding operation and the second status flag are written to the common node in the kernel layer rather than directly to the Hungdetect watchdog. It also explains how the common node decides whether the stuck-detection (Hungdetect) watchdog should perform the reset operation.

For example, in a specific implementation, a preset policy may be configured in the common node according to service requirements. This is the policy for deciding whether the Hungdetect watchdog should perform a reset operation.

The common node then decides whether the Hungdetect watchdog currently needs to perform a reset operation. The decision is based on the preset policy and on the feeding information and status flags the common node has obtained.

As for the feeding information, in some implementation scenarios the obtained information may indicate that System Server WatchDog is normal. This also means that the System Server process it detects is normal, and in this case System Server WatchDog feeds the Hungdetect watchdog normally.

Correspondingly, in other implementation scenarios the obtained information may indicate that System Server WatchDog has failed. Alternatively, it may indicate that System Server WatchDog is normal but has detected that the System Server process is abnormal. In these cases System Server WatchDog stops providing feeding information to the Hungdetect watchdog, that is, the second feeding operation cannot be performed. This can also be understood as System Server WatchDog feeding the Hungdetect watchdog abnormally.

As for the status flag, in some implementation scenarios it may be a normal flag, meaning that both Xcollie and SurfaceFlinger are normal. In other implementation scenarios it may be an abnormal flag, meaning that at least one of Xcollie and SurfaceFlinger is abnormal.

The common node processes this information to obtain decision information.

Illustratively, in some implementations, the decision information produced by the common node may be actively pushed to the Hungdetect watchdog. Alternatively, the Hungdetect watchdog may actively fetch it from the common node.

Take the case where the decision is that the Hungdetect watchdog needs to perform the reset operation. In one scenario, the common node actively notifies the Hungdetect watchdog to perform the reset as soon as it determines that a reset is needed. In another scenario, the common node first saves the decision information. The Hungdetect watchdog periodically fetches the decision from the common node and performs the reset operation once it obtains a reset instruction.

Correspondingly, when the decision is that the Hungdetect watchdog does not need to perform the reset operation, the common node can uniformly send the agreed feeding information to the Hungdetect watchdog to feed it. This prevents the Hungdetect watchdog from barking (raising an alarm) or even biting (performing a reset).
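The decision step and the two delivery modes (push to the Hungdetect watchdog, or store for it to pull) can be sketched as below. This is a minimal illustration under stated assumptions. The class `CommonNode`, its methods, and the policy itself are hypothetical, because the document leaves the preset policy open. The sketch assumes a conservative policy that requests a kernel reset only when System Server WatchDog has stopped feeding and Xcollie reports an abnormal flag.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

RESET = "reset"
FEED = "feed"


@dataclass
class CommonNode:
    """Hypothetical common node between framework-layer watchdogs and Hungdetect."""
    # Push mode: callbacks registered on behalf of the Hungdetect watchdog.
    subscribers: List[Callable[[str], None]] = field(default_factory=list)
    # Pull mode: the latest decision, kept until the Hungdetect watchdog reads it.
    pending: Optional[str] = None

    def preset_policy(self, sswd_fed: bool, xcollie_ok: bool) -> str:
        # Illustrative policy only (an assumption): reset when System Server
        # WatchDog stopped feeding AND Xcollie reports an abnormal flag;
        # otherwise forward one unified feed so Hungdetect neither barks nor bites.
        if not sswd_fed and not xcollie_ok:
            return RESET
        return FEED

    def decide(self, sswd_fed: bool, xcollie_ok: bool) -> str:
        decision = self.preset_policy(sswd_fed, xcollie_ok)
        self.pending = decision            # stored for pull mode
        for notify in self.subscribers:    # delivered immediately in push mode
            notify(decision)
        return decision

    def poll(self) -> Optional[str]:
        """Pull mode: Hungdetect periodically fetches and clears the decision."""
        decision, self.pending = self.pending, None
        return decision
```

A stricter or looser policy only changes `preset_policy`. The push and pull paths stay the same, which mirrors the text's point that the policy is configured in the node according to service requirements.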

Also by way of example, in some implementations a common node in the kernel layer stores the kick information sent by the software watchdogs of the application framework layer to the Hungdetect watchdog. It also stores the process state information those watchdogs transmit. The kick information may include the name of the software watchdog and a kick action (kick), and the process state information may include the name of the software watchdog and a process state (OK or ERROR). The Hungdetect watchdog acts as a consumer of the common node and periodically checks the kick information and the process state information (e.g., OK or ERROR) stored there.

The Hungdetect watchdog determines whether to reset the kernel system based on a preset policy and on the kick information and/or process state information it periodically obtains from the common node. The preset policy may be the preset decision described above together with the priorities of the information provided by each software watchdog. Details are not repeated here.
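The kick and process state records described above can be modeled as follows. This sketch assumes that the common node keeps only the latest record per software watchdog name and that every software watchdog writes once per period. It also assumes the Hungdetect watchdog consumes and clears the records once per period. The names `KickRecord`, `StateRecord`, and `CommonNodeStore` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class KickRecord:
    watchdog: str          # name of the software watchdog, e.g. "System Server WatchDog"
    action: str = "kick"   # the kick action


@dataclass(frozen=True)
class StateRecord:
    watchdog: str          # name of the software watchdog, e.g. "Xcollie"
    state: str             # "OK" or "ERROR"


class CommonNodeStore:
    """Hypothetical store: framework-layer watchdogs write, Hungdetect reads."""

    def __init__(self) -> None:
        self._kicks: Dict[str, KickRecord] = {}
        self._states: Dict[str, StateRecord] = {}

    def write_kick(self, watchdog: str) -> None:
        self._kicks[watchdog] = KickRecord(watchdog)

    def write_state(self, watchdog: str, state: str) -> None:
        if state not in ("OK", "ERROR"):
            raise ValueError(f"unexpected state {state!r}")
        self._states[watchdog] = StateRecord(watchdog, state)

    def consume(self) -> Tuple[Dict[str, KickRecord], Dict[str, StateRecord]]:
        """Called by Hungdetect once per period: return and clear this period's records."""
        snapshot = (self._kicks, self._states)
        self._kicks, self._states = {}, {}
        return snapshot
```

Clearing on each read makes a missing kick in the next snapshot mean "not fed in this period". A real node might instead keep timestamps.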

Scene four

In this scenario, a detection mechanism is arranged between the kernel layer and the hardware layer of the electronic device. The watchdog detection method provided by the application is explained using the example of a hardware watchdog in the hardware layer detecting the Hungdetect watchdog in the kernel layer.

Fig. 11 is a schematic view of an exemplary application scenario.

As shown in fig. 11, a Hungdetect watchdog is included in the kernel layer of the electronic device to detect whether the kernel system is operating normally.

When the kernel system operates normally, it periodically performs a first feeding operation on the Hungdetect watchdog. When the kernel system cannot operate normally, it stops the first feeding operation. The Hungdetect watchdog restarts the kernel system if it does not receive the first feeding operation within one detection period, or within a number of consecutive detection periods (e.g., 3).
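The period check above can be sketched as a missed-feed counter. The sketch assumes that the Hungdetect watchdog is evaluated once per detection period and restarts the kernel after a configurable number of consecutive missed feeds, with 3 taken from the example in the text. `HungdetectSketch` and its methods are hypothetical. A real kernel implementation would run this logic from a timer or kernel thread.

```python
class HungdetectSketch:
    """Hypothetical model of the kernel-layer Hungdetect watchdog."""

    def __init__(self, max_missed_periods: int = 3) -> None:
        self.max_missed_periods = max_missed_periods
        self.fed_this_period = False
        self.missed = 0
        self.kernel_restarts = 0

    def feed(self) -> None:
        # First feeding operation, performed by a healthy kernel system.
        self.fed_this_period = True

    def end_of_period(self) -> bool:
        """Evaluate one detection period; return True if a kernel restart is triggered."""
        if self.fed_this_period:
            self.missed = 0              # any feed in the period clears the streak
        else:
            self.missed += 1
        self.fed_this_period = False     # the next period starts unfed
        if self.missed >= self.max_missed_periods:
            self.missed = 0
            self.kernel_restarts += 1
            return True
        return False
```

Setting `max_missed_periods=1` gives the "within one detection period" variant from the text.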

As shown in fig. 11, a hardware watchdog (Hardware WatchDog) is included in the hardware layer of the electronic device for detecting the hardware chip.

When the program in the hardware chip runs normally, the hardware chip periodically performs a second feeding operation on the hardware watchdog. For example, it clears the hardware watchdog's first timing to zero so that timing restarts. When the program in the hardware chip runs abnormally, the hardware chip stops performing the second feeding operation on the hardware watchdog.

If the hardware watchdog does not receive the second feeding operation from the hardware chip and the first timing increases to a first set value, the hardware watchdog resets the hardware chip, which restarts the whole electronic device. Here, the detection period corresponding to the first timing is the period in which the hardware watchdog detects the hardware chip. The first timing is implemented by a first timer in the hardware watchdog.

With continued reference to fig. 11, in addition to the hardware chip, the hardware watchdog may also detect the Hungdetect watchdog. When the Hungdetect watchdog is abnormal, the hardware watchdog resets the hardware chip, which restarts the whole electronic device and recovers the kernel system. The Hungdetect watchdog being abnormal may mean that the Hungdetect watchdog itself has failed, or that it cannot restart the kernel system successfully.

When the Hungdetect watchdog operates normally and the kernel system it detects operates normally, the Hungdetect watchdog periodically performs a third feeding operation on the hardware watchdog. For example, it clears the hardware watchdog's second timing to zero so that the second timing restarts. When the Hungdetect watchdog cannot operate normally or the kernel system cannot be restarted successfully, the Hungdetect watchdog stops the third feeding operation on the hardware watchdog.

As an alternative embodiment, the Hungdetect watchdog stops the third feeding operation on the hardware watchdog before triggering the kernel system reset (or restart).

If the hardware watchdog does not receive the third feeding operation from the Hungdetect watchdog and the second timing increases to a second set value, the hardware watchdog resets the hardware chip, which restarts the whole electronic device. Here, the detection period corresponding to the second timing is the period in which the hardware watchdog detects the Hungdetect watchdog. The second timing is implemented by a second timer in the hardware watchdog.
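The interplay described in this scenario can be simulated in abstract ticks. First the Hungdetect watchdog stops the third feeding before a kernel reset. Feeding then resumes after a successful restart. Otherwise the hardware watchdog's second timer resets the chip. `HardwareTimer` and `kernel_restart_outcome` are hypothetical names, and the tick-based timing is an assumption made for illustration only.

```python
from typing import Optional


class HardwareTimer:
    """Hypothetical model of one hardware-watchdog timer (e.g. the second timer)."""

    def __init__(self, set_value: int) -> None:
        self.set_value = set_value
        self.count = 0

    def feed(self) -> None:
        self.count = 0            # feeding clears the count so timing restarts

    def tick(self) -> bool:
        """Advance one tick; return True when the count reaches the set value."""
        self.count += 1
        return self.count >= self.set_value


def kernel_restart_outcome(second_set_value: int, restart_ticks: Optional[int]) -> str:
    """Hungdetect has stopped the third feeding and is restarting the kernel.

    restart_ticks is how many ticks the kernel restart takes before Hungdetect
    can resume feeding, or None if the restart never succeeds.
    """
    timer = HardwareTimer(second_set_value)
    tick = 0
    while True:
        tick += 1
        if restart_ticks is not None and tick >= restart_ticks:
            timer.feed()          # restart succeeded: third feeding resumes
            return "recovered by kernel restart"
        if timer.tick():          # still unfed while the kernel is down
            return "hardware reset of the whole device"
```

The function stops at the first decisive event to keep the example short. In a running system, feeding would simply continue after recovery.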

In this way, arranging a detection mechanism between the hardware layer and the kernel layer achieves hierarchical recovery of the electronic device. A hardware restart has a coarser granularity than a kernel restart. Therefore, when the kernel cannot be restarted successfully, a hardware restart can greatly improve the success rate of recovering the kernel system. It also prevents the kernel layer from repeatedly attempting, and failing, to restart the kernel system.

Scene five

In this scenario, a detection mechanism is arranged between the kernel layer and the hardware layer of the electronic device. The watchdog detection method provided by the application is explained using the example of a hardware watchdog in the hardware layer detecting the CPU core status watchdog in the kernel layer.

Fig. 12 is a schematic view of an exemplary application scenario.

As shown in fig. 12, the kernel layer of the electronic device includes a CPU core status watchdog. It checks the CPU running state and restarts the kernel when the CPU running state meets a preset kernel restart condition.

Specifically, the CPU core status watchdog may detect the running state of each CPU core. It restarts the kernel when the running states of the CPU cores meet the preset kernel restart condition.

The CPU core status watchdog periodically obtains the running state of each CPU core and restarts the kernel when these states meet the preset kernel restart condition. For example, it obtains the running state of each CPU core every 30 seconds.

As an alternative implementation, the CPU core status watchdog may determine the running state of a CPU core based on whether a task on that core can be scheduled. This allows CPU scheduling problems to be detected.

Each CPU core is bound to a target task that runs periodically on that core, e.g., once every 30 seconds. Through a first detection task, the CPU core status watchdog periodically checks whether the target task bound to each CPU core can be scheduled, and thereby determines the running state of that core. For example, the first detection task may perform this check every 30 seconds. If the target task on a CPU core cannot be scheduled, the CPU core status watchdog determines that the running state of that core is abnormal. If the target task can be scheduled, the watchdog determines that the running state of that core is normal.
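One way to model this check is with timestamps. Each bound target task records when it last ran, and the first detection task treats a core as abnormal when that record is older than a tolerance. The 30-second interval comes from the text. The extra grace period, the function names, and the use of timestamps rather than, for example, per-core counters are assumptions for illustration.

```python
from typing import Dict, List

TASK_INTERVAL_S = 30.0    # target tasks run about once every 30 s (from the text)
GRACE_S = 5.0             # hypothetical tolerance for scheduling jitter


def record_target_task_run(last_run: Dict[int, float], cpu: int, now: float) -> None:
    """Called by the target task bound to `cpu` whenever it gets scheduled."""
    last_run[cpu] = now


def first_detection_task(last_run: Dict[int, float], cpus: List[int],
                         now: float) -> Dict[int, bool]:
    """Return {cpu: True if normal}; a core is abnormal if its task has not run recently."""
    status: Dict[int, bool] = {}
    for cpu in cpus:
        ran_at = last_run.get(cpu)
        # Never ran, or last ran too long ago -> the task could not be scheduled.
        status[cpu] = ran_at is not None and now - ran_at <= TASK_INTERVAL_S + GRACE_S
    return status
```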

As shown in fig. 13, the CPU of the electronic device includes 8 cores: CPU0, CPU1, CPU2, …, CPU7. Each CPU core is bound to a target task: CPU0 to target task Task 0, CPU1 to Task 1, CPU2 to Task 2, …, and CPU7 to Task 7. Each target task runs periodically on its bound core, so Task 0 runs on CPU0, Task 1 on CPU1, Task 2 on CPU2, …, and Task 7 on CPU7.

Through the first detection task, the CPU core status watchdog periodically checks whether each target task (Task 0 to Task 7) can be scheduled on its bound CPU core, and thereby determines the running state of each core. The first detection task may run on any CPU core. For example, if Task 0 cannot be scheduled on its bound CPU0 core, the CPU core status watchdog may determine that the running state of CPU0 is abnormal. If Task 7 can be scheduled on its bound CPU7 core, the watchdog may determine that the running state of CPU7 is normal. In this manner, the CPU core status watchdog determines the running state of every CPU core.

For example, the running state of a CPU core may be represented by "1" and "0": "1" indicates that the core is running normally, and "0" that it is running abnormally. The first detection task periodically checks whether each target task can be scheduled on its bound core. From the results, it generates a running state identifier covering all CPU cores, and the CPU core status watchdog then determines from this identifier whether each core's running state is abnormal. The number of bits in the running state identifier equals the number of CPU cores. For example, if the CPU includes 8 cores (CPU0, CPU1, CPU2, …, CPU7), the identifier contains 8 bits whose values, in order, indicate the running states of the respective cores. Assume the identifier generated by the first detection task is "11111101". In that case, the CPU core status watchdog determines that the running state of CPU6 is abnormal and that the running states of the remaining cores are normal.
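The running state identifier can be built and decoded as follows. As in the example, the sketch assumes one character per core, read left to right from CPU0 to CPU7, with "1" meaning normal. The function names are hypothetical. A string is used to mirror the example, although a kernel implementation would more likely use an integer bitmask.

```python
from typing import Dict, List


def build_state_identifier(status: Dict[int, bool], num_cores: int) -> str:
    """One character per core, CPU0 first; '1' = normal, '0' = abnormal or unknown."""
    return "".join("1" if status.get(cpu, False) else "0" for cpu in range(num_cores))


def abnormal_cores(identifier: str) -> List[int]:
    """Indices of cores whose state character is '0'."""
    return [cpu for cpu, bit in enumerate(identifier) if bit == "0"]
```

For the 8-core example, a stuck CPU6 produces `"11111101"`, and `abnormal_cores` returns `[6]`.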

As another alternative implementation, the CPU core status watchdog may determine the running state of a CPU core based on a probe message.

The CPU core status watchdog may perform this check periodically through a second detection task. For example, it may determine the running state of the CPU cores based on probe messages every 30 seconds.

Through the second detection task, the CPU core status watchdog may send a probe message to each CPU core whose physical state is online. If it receives the probe feedback message sent by that core in response, it determines that the core's running state is normal; otherwise it determines that the running state is abnormal. It should be noted that the CPU core status watchdog may skip CPU cores whose physical state is offline.

Optionally, the probe message may be a ping message. Through the second detection task, the CPU core status watchdog sends the ping message to a CPU core as an interrupt, and the CPU core answers with a feedback message, also as an interrupt. This allows the CPU core status watchdog to detect CPU interrupt storms.

As shown in fig. 14, the CPU of the electronic device includes 8 cores: CPU0, CPU1, CPU2, …, CPU7. Through the second detection task, the CPU core status watchdog periodically sends ping messages in turn to each CPU core whose physical state is online. If it receives the feedback message sent by a core in response, the core can respond normally and its running state is determined to be normal. Otherwise, the core cannot respond normally and its running state is determined to be abnormal. For example, suppose the CPU core status watchdog sends a ping message to the online CPU0 core and receives the feedback message from CPU0; it then determines that the running state of CPU0 is normal. As another example, suppose it sends a ping message to the online CPU1 core and receives no feedback message from CPU1; it then determines that the running state of CPU1 is abnormal. In this manner, the CPU core status watchdog determines the running state of every CPU core.
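The probe loop can be sketched as follows. Real interrupt delivery is kernel-specific, so the sketch abstracts it as a hypothetical `send_ping` callable. This callable is assumed to return True when the core's feedback arrives before a timeout. Offline cores are skipped, matching the text, and `second_detection_task` is an illustrative name.

```python
from typing import Callable, Dict


def second_detection_task(online: Dict[int, bool],
                          send_ping: Callable[[int], bool]) -> Dict[int, bool]:
    """Ping every online core in turn; return {cpu: True if it answered}.

    `online` maps each core to its physical state (True = online); offline
    cores are not probed. `send_ping(cpu)` stands in for sending an
    inter-processor interrupt and waiting for the core's interrupt-based reply.
    """
    status: Dict[int, bool] = {}
    for cpu in sorted(online):
        if not online[cpu]:
            continue                     # physical state offline: skip
        status[cpu] = send_ping(cpu)     # feedback received -> normal
    return status
```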

After determining the running state of each CPU core, the CPU core status watchdog judges whether these states meet the preset kernel restart condition. If so, it restarts the kernel. This embodiment does not limit the preset kernel restart condition.

For example, if the number of CPU cores whose running state is abnormal exceeds a preset number threshold, the CPU core status watchdog determines that the preset kernel restart condition is met and restarts the kernel.

As another example, if the running state of a target CPU core is abnormal, the CPU core status watchdog determines that the preset kernel restart condition is met and restarts the kernel. A target core is a CPU core of a preset type, such as a big core or another important core of the CPU. For example, if CPU0 is a big core and its running state is abnormal, the CPU core status watchdog determines that the preset kernel restart condition is met and restarts the kernel.
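The two example conditions can be combined into one check. The embodiment does not fix the restart condition, so the threshold of 2 abnormal cores and the choice of CPU0 as the target (big) core are illustrative assumptions. The function name is hypothetical.

```python
from typing import Dict, Iterable


def meets_restart_condition(status: Dict[int, bool],
                            abnormal_threshold: int = 2,
                            target_cores: Iterable[int] = (0,)) -> bool:
    """Return True if the kernel should be restarted (status: True = normal).

    Condition A: the number of abnormal cores exceeds `abnormal_threshold`.
    Condition B: any preset target core (e.g. a big core) is abnormal.
    """
    abnormal = [cpu for cpu, ok in status.items() if not ok]
    if len(abnormal) > abnormal_threshold:
        return True
    return any(cpu in abnormal for cpu in target_cores)
```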

As an alternative embodiment, the CPU core status watchdog may determine a first running state of a CPU core based on whether tasks on that core can be scheduled. It may also determine a second running state of the core based on the probe message. Only when both the first and the second running states of a CPU core indicate an abnormality does the CPU core status watchdog determine that the CPU is running abnormally.

As another alternative embodiment, the first and second running states are determined in the same way. However, the CPU core status watchdog determines that the CPU is running abnormally when either the first or the second running state of a CPU core indicates an abnormality.
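The two embodiments differ only in how the scheduling-based state and the probe-based state are combined. The following minimal sketch uses a hypothetical `mode` parameter to select between them. The "both" mode is the conservative option, in which both states must be abnormal. The "either" mode is the sensitive option, in which either state may be abnormal. The fallback for a core missing from one source, for example an offline core that was not probed, is an added assumption.

```python
from typing import Dict


def fuse_states(first: Dict[int, bool], second: Dict[int, bool],
                mode: str = "both") -> Dict[int, bool]:
    """Combine per-core states (True = normal) from the two detection tasks.

    mode="both":   a core is abnormal only if BOTH states are abnormal.
    mode="either": a core is abnormal if EITHER state is abnormal.
    A core missing from one source falls back to the other source's value.
    """
    fused: Dict[int, bool] = {}
    for cpu in set(first) | set(second):
        a = first.get(cpu, second.get(cpu, True))
        b = second.get(cpu, a)
        if mode == "both":
            fused[cpu] = a or b           # normal unless both are abnormal
        elif mode == "either":
            fused[cpu] = a and b          # normal only if both are normal
        else:
            raise ValueError(f"unknown mode {mode!r}")
    return fused
```

The CPU is then treated as running abnormally when any fused entry is False.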

As shown in fig. 12, a hardware watchdog (Hardware WatchDog) is included in the hardware layer of the electronic device for detecting the hardware chip.

With continued reference to fig. 12, in addition to the hardware chip, the hardware watchdog may also detect the CPU core status watchdog. When the CPU core status watchdog is abnormal, the hardware watchdog resets the hardware chip, which restarts the whole electronic device and recovers the kernel system. The CPU core status watchdog being abnormal may mean that it has failed, or that it cannot restart the kernel system successfully.

When the CPU core status watchdog operates normally and the detected CPU core states do not meet the preset kernel restart condition, the CPU core status watchdog periodically performs a third feeding operation on the hardware watchdog. For example, it clears the hardware watchdog's second timing to zero so that timing restarts. When the CPU core status watchdog cannot operate normally or the kernel system cannot be restarted successfully, the CPU core status watchdog stops the third feeding operation on the hardware watchdog.

As an alternative embodiment, the CPU core status watchdog stops the third feeding operation on the hardware watchdog before triggering the kernel system reset (or restart).

If the hardware watchdog does not receive the third feeding operation from the CPU core status watchdog and the second timing increases to the second set value, the hardware watchdog resets the hardware chip, which restarts the whole electronic device. Here, the detection period corresponding to the second timing is the period in which the hardware watchdog detects the CPU core status watchdog. The second timing is implemented by a second timer in the hardware watchdog.

In one embodiment, in addition to the hardware chip, the hardware watchdog may detect both the CPU core status watchdog and the Hungdetect watchdog in the kernel layer at the same time. In this case, a second timer and a third timer may be set in the hardware watchdog. They monitor the feeding operations performed on the hardware watchdog by the CPU core status watchdog and by the Hungdetect watchdog, respectively. For example, if the hardware watchdog does not receive the feeding operation from the CPU core status watchdog and the second timing increases to the second set value, it resets the hardware chip and the whole electronic device restarts. Likewise, if it does not receive the feeding operation from the Hungdetect watchdog and the third timing increases to a third set value, it resets the hardware chip and the whole electronic device restarts. The detection period corresponding to the second timing is the period in which the hardware watchdog detects the CPU core status watchdog. The detection period corresponding to the third timing is the period in which it detects the Hungdetect watchdog. This embodiment does not limit the first, second, or third set values.
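A hardware watchdog that supervises the hardware chip, the CPU core status watchdog, and the Hungdetect watchdog with independent timers can be modeled with one counter per feeder. The class name, channel names, and set values below are hypothetical, since the embodiment leaves the set values open. A real hardware watchdog implements each timer in hardware.

```python
from typing import Dict, Optional


class MultiChannelHardwareWatchdog:
    """Hypothetical model: one timer and one set value per monitored feeder."""

    def __init__(self, set_values: Dict[str, int]) -> None:
        self.set_values = dict(set_values)
        self.counts = {name: 0 for name in set_values}

    def feed(self, name: str) -> None:
        self.counts[name] = 0                 # clear only this feeder's timer

    def tick(self) -> Optional[str]:
        """Advance all timers by one tick.

        Returns the name of the first feeder whose timer reached its set value
        (meaning: reset the hardware chip and restart the whole device), else None.
        """
        for name in self.counts:
            self.counts[name] += 1
        for name, count in self.counts.items():
            if count >= self.set_values[name]:
                return name
        return None
```

Because each feeder clears only its own counter, a healthy Hungdetect watchdog cannot mask a stalled CPU core status watchdog, which is the point of using separate timers.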

If the hardware watchdog also detects other watchdogs in the kernel layer at the same time, it can handle them in the same way as the Hungdetect watchdog or the CPU core status watchdog. Details are not repeated here.

Scene six

In this scenario, detection mechanisms are arranged both between the application framework layer and the kernel layer and between the kernel layer and the hardware layer of the electronic device. The Hungdetect watchdog in the kernel layer can detect multiple software watchdogs in the application framework layer at the same time.

Fig. 15 is a schematic view of an exemplary application scenario.

As shown in fig. 15, the application framework layer of the electronic device includes System Server WatchDog, Xcollie, and Init watchdog. System Server WatchDog detects the System Server process in the application framework layer, for example whether the System Server process is deadlocked or unresponsive. Xcollie detects whether an action performed by the key process SurfaceFlinger has completed, for example whether a UI rendering operation of the system has completed. Init watchdog detects the Init flow, for example whether the power-off and power-on flows of the electronic device are abnormal.

A Hungdetect watchdog and a CPU core status watchdog are included in the kernel layer of the electronic device. The Hungdetect watchdog is used to detect the kernel system, for example, whether the kernel system is stuck. And the CPU core state watchdog is used for detecting the running state of each core of the CPU, for example, detecting whether each CPU core runs abnormally or not.

The hardware layer of the electronic device includes a hardware watchdog. The first, second, and third hardware watchdogs shown in fig. 15 can be understood as hardware watchdogs with different processing logic for the platforms of different chips. The embodiments of the application provide a hierarchical recovery scheme for electronic devices that can be adapted to different hardware watchdogs, such as the first, second, and third hardware watchdogs shown in fig. 15. The adaptation to a particular hardware watchdog can be performed at the code compilation stage. The following description uses the first hardware watchdog, which is the one actually used in the electronic device, as an example.

Referring to fig. 15, in this embodiment the Hungdetect watchdog in the kernel layer can detect not only the kernel system but also System Server WatchDog, Xcollie, Init watchdog, and the like in the application framework layer.

In this embodiment, the kernel layer is provided with a common node. It stores the kick information sent by the software watchdogs of the application framework layer to the Hungdetect watchdog, together with the process state information they transmit. The Hungdetect watchdog acts as a consumer of the common node and periodically checks the kick information and the process state information (e.g., OK or ERROR) stored there.

The Hungdetect watchdog periodically obtains the kick information and process state information stored in the common node. It then determines, according to a preset policy, whether to perform a kernel restart operation.

Optionally, when any software watchdog in the application framework layer detected by the Hungdetect watchdog is abnormal, the Hungdetect watchdog performs a kernel restart operation. The kernel restart recovers the corresponding process in the application framework layer.

Alternatively, the Hungdetect watchdog performs the kernel restart operation only when multiple software watchdogs that it detects in the application framework layer are abnormal. The kernel restart then recovers the corresponding processes.

Alternatively, the Hungdetect watchdog performs the kernel restart operation only when a software watchdog that it detects in the application framework layer is abnormal in several consecutive periods. The kernel restart then recovers the corresponding process.

Taking three periods as an example, the Hungdetect watchdog performs the kernel restart operation in either of two cases. One case is that it does not obtain kick information from System Server WatchDog in three consecutive periods. The other is that it obtains process state information of ERROR from Xcollie in three consecutive periods.
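The three optional policies above can be expressed as one decision function over per-period observations. The sketch assumes that each period's observation is the set of software watchdog names judged abnormal, meaning that no kick was received or that the process state was ERROR. The class name, the policy labels, and the "multiple" threshold of 2 are hypothetical.

```python
from collections import deque
from typing import Deque, Set


class KernelRestartPolicy:
    """Hypothetical decision logic for Hungdetect over common-node observations."""

    def __init__(self, policy: str = "consecutive", periods: int = 3,
                 multiple: int = 2) -> None:
        self.policy = policy
        self.periods = periods
        self.multiple = multiple
        self.history: Deque[Set[str]] = deque(maxlen=periods)

    def observe(self, abnormal: Set[str]) -> bool:
        """Record one period's abnormal-watchdog set; return True to restart the kernel."""
        self.history.append(set(abnormal))
        if self.policy == "any":
            return len(abnormal) >= 1
        if self.policy == "multiple":
            return len(abnormal) >= self.multiple
        if self.policy == "consecutive":
            if len(self.history) < self.periods:
                return False
            # Restart only if the SAME watchdog was abnormal in each of the last N periods.
            return bool(set.intersection(*self.history))
        raise ValueError(f"unknown policy {self.policy!r}")
```

With `periods=3`, three consecutive periods without a System Server WatchDog kick trigger a restart. Alternating failures of System Server WatchDog and Xcollie do not.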

Similarly, in this embodiment, the first hardware watchdog of the hardware layer may detect not only the hardware chip, but also the Hungdetect watchdog, the CPU core status watchdog, and the like in the kernel layer.

If the Hungdetect watchdog or the CPU core state watchdog does not execute the kicking operation on the first hardware watchdog on time, the first hardware watchdog resets the hardware chip detected by the first hardware watchdog, so that the whole machine is restarted, and the kernel system is recovered through restarting the whole machine.

For details of this scenario, please refer to the description in the foregoing scenario, and the details are not repeated here.

In this way, arranging detection mechanisms between the layers of the electronic device achieves hierarchical recovery. Recovery at a lower layer has a coarser granularity than recovery at the layer above it. Therefore, when the upper layer cannot be recovered successfully, restarting at the lower layer improves the success rate of recovery. It also prevents a layer of the electronic device from being restarted repeatedly without success.

In addition, a pre-packaged adaptation node may be set in the kernel layer so that the watchdog detection method provided by the application suits different chip platforms. This provides multi-level watchdog supervision that fully covers the electronic device and ensures its normal use. With the adaptation node, the CPU core status watchdog, the Hungdetect watchdog, and the like in the kernel layer can be detected by the hardware watchdogs provided by different chip platforms.

For example, hardware watchdogs on different platforms take different amounts of time to load, so they do not receive the feeding information provided by the CPU core status watchdog at the same point in time. Some hardware watchdogs may consider it normal, while others may consider it abnormal and then bark, or even bite. To ensure that the technical scheme can be adapted to different hardware watchdogs, the close interface of the hardware watchdog can therefore be called to close it when the electronic device starts. The hardware watchdog is then started only after the upper-layer software watchdogs have started.

In addition, in a specific implementation, several hardware-dog interfaces can be uniformly encapsulated to simplify the calling procedure and avoid code redundancy. These are the interface for initializing the hardware dog, the interface for triggering a bite (performing a reset), and the interface for stopping feeding. The processing logic of the hardware dog currently used by the electronic device is then invoked through these unified interfaces, so that one scheme can be adapted to different hardware watchdogs.
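The unified encapsulation described above resembles an adapter pattern. Callers use one set of wrapper methods, and a platform-specific backend chosen at build time supplies the behavior. In this sketch, the classes, the method names (`init`, `disable`, `enable`, `feed`, `stop_feeding`, `bite`), the platform keys, and the `boot` helper are all hypothetical. The boot order follows the text: close the hardware watchdog at startup and start it only after the upper-layer software watchdogs are running.

```python
from typing import Callable, Dict, List


class HardwareDogOps:
    """Unified interface; each platform subclass would supply its own logic."""

    def __init__(self, platform: str) -> None:
        self.platform = platform
        self.log: List[str] = []

    def _do(self, action: str) -> None:
        # Placeholder: a real backend would program platform registers here.
        self.log.append(f"{self.platform}: {action}")

    def init(self) -> None:
        self._do("init")

    def disable(self) -> None:
        self._do("disable")

    def enable(self) -> None:
        self._do("enable")

    def feed(self) -> None:
        self._do("feed")

    def stop_feeding(self) -> None:
        self._do("stop feeding")

    def bite(self) -> None:
        self._do("bite (reset chip)")


class FirstHardwareDog(HardwareDogOps):
    def __init__(self) -> None:
        super().__init__("first")


class SecondHardwareDog(HardwareDogOps):
    def __init__(self) -> None:
        super().__init__("second")


# The text selects the backend at code-compilation time; a lookup table stands in here.
BACKENDS: Dict[str, Callable[[], HardwareDogOps]] = {
    "first": FirstHardwareDog,
    "second": SecondHardwareDog,
}


def boot(platform: str, start_software_watchdogs: Callable[[], None]) -> HardwareDogOps:
    dog = BACKENDS[platform]()
    dog.init()
    dog.disable()                 # closed while platform load times differ
    start_software_watchdogs()    # upper-layer software watchdogs come up first
    dog.enable()                  # only now start hardware supervision
    return dog
```

The callers, such as the CPU core status watchdog or the Hungdetect watchdog, only ever use the `HardwareDogOps` methods, so porting to a new chip means adding one backend rather than changing every caller.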

The present embodiment also provides a computer storage medium having stored therein computer instructions which, when executed on an electronic device, cause the electronic device to perform the above-described related method steps to implement the watchdog detection method in the above-described embodiments.

The present embodiment also provides a computer program product which, when run on a computer, causes the computer to perform the above-described related steps to implement the watchdog detection method in the above-described embodiments.

In addition, embodiments of the present application also provide an apparatus, which may be specifically a chip, a component, or a module, and may include a processor and a memory connected to each other; the memory is used for storing computer-executable instructions, and when the device is running, the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the watchdog detection method in each method embodiment.

The electronic device, the computer storage medium, the computer program product, or the chip provided in this embodiment are used to execute the corresponding methods provided above, so that the beneficial effects of the electronic device, the computer storage medium, the computer program product, or the chip can refer to the beneficial effects of the watchdog detection method provided above, and are not described herein.

It will be appreciated by those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The foregoing embodiments are merely intended to describe the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A watchdog detection method, applied to an electronic device, the method comprising:

a software watchdog periodically detects a CPU running state of the electronic device, wherein the software watchdog is located in a kernel layer of the electronic device;

if the CPU running state meets a preset kernel restart condition, the software watchdog restarts the kernel system and stops feeding a hardware watchdog;

if the kernel system restarts successfully, the software watchdog resumes feeding the hardware watchdog;

and if the hardware watchdog does not receive a feed from the software watchdog within a detection period, the electronic device is restarted.

2. The method according to claim 1, wherein the software watchdog detecting the CPU running state of the electronic device comprises:

the software watchdog separately detects a running state of each target CPU core, wherein the physical state of each target CPU core is online.

3. The method according to claim 2, wherein the software watchdog detecting the running state of a target CPU core comprises:

if the software watchdog detects that a target task on the target CPU core can be scheduled, determining that the running state of the target CPU core is normal;

if the software watchdog detects that the target task on the target CPU core cannot be scheduled, determining that the running state of the target CPU core is abnormal;

wherein the target task is bound to run on the target CPU core.

4. The method according to claim 2, wherein the software watchdog detecting the running state of a target CPU core comprises:

the software watchdog sends a probe message to the target CPU core;

if the software watchdog receives a probe feedback message from the target CPU core, determining that the running state of the target CPU core is normal, wherein the probe feedback message is a response to the probe message;

and if the software watchdog does not receive the probe feedback message, determining that the running state of the target CPU core is abnormal.

5. The method according to claim 4, wherein the probe message is a ping message sent as an interrupt.

6. The method according to claim 2, wherein the software watchdog separately detecting the running state of each target CPU core comprises:

the software watchdog detects the running state of each target CPU core in a first detection mode and determines a first running state of each target CPU core;

the software watchdog detects the running state of each target CPU core in a second detection mode and determines a second running state of each target CPU core;

and the software watchdog determines, according to the first running state and/or the second running state of a target CPU core, whether the target CPU core is running abnormally.

7. The method according to claim 2, wherein the CPU running state meeting the preset kernel restart condition comprises:

the software watchdog determines the number of abnormal CPU cores according to the running state of each target CPU core;

and if the number of abnormal CPU cores reaches a preset number threshold, the software watchdog determines that the CPU running state meets the preset kernel restart condition.

8. The method according to claim 2, wherein the CPU running state meeting the preset kernel restart condition comprises:

if the running state of a target CPU core is abnormal and the target CPU core is of a preset type, the software watchdog determines that the CPU running state meets the preset kernel restart condition.

9. An electronic device, comprising one or more processors and one or more memories, wherein the one or more memories store one or more programs that, when executed by the one or more processors, cause the electronic device to perform the watchdog detection method according to any one of claims 1 to 8.

10. A computer-readable storage medium, comprising a computer program, wherein the computer program, when run on an electronic device, causes the electronic device to perform the watchdog detection method according to any one of claims 1 to 8.
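The two-level recovery recited in claim 1 can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the applicant's implementation: time is modeled as abstract ticks, `HardwareWatchdog`, `SoftwareWatchdog`, and `simulate` are hypothetical names, and whether the kernel restart condition is met and whether the restarted kernel is back up are supplied as inputs for each tick rather than detected.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HardwareWatchdog:
    period: int              # hypothetical detection period, in ticks
    last_feed: int = 0
    device_restarts: int = 0

    def feed(self, now: int) -> None:
        self.last_feed = now

    def check(self, now: int) -> bool:
        # No feed within the detection period: restart the electronic device.
        if now - self.last_feed >= self.period:
            self.device_restarts += 1
            self.last_feed = now  # the timer starts fresh after the device restart
            return True
        return False


@dataclass
class SoftwareWatchdog:
    hw: HardwareWatchdog
    feeding: bool = True
    kernel_restarts: int = 0

    def tick(self, now: int, restart_condition_met: bool, kernel_up: bool) -> None:
        if self.feeding and restart_condition_met:
            # Level 1: restart the kernel system and stop feeding the hardware watchdog.
            self.kernel_restarts += 1
            self.feeding = False
        elif not self.feeding and kernel_up:
            # The kernel restarted successfully: resume feeding.
            self.feeding = True
        if self.feeding:
            self.hw.feed(now)


def simulate(ticks: List[Tuple[bool, bool]], period: int) -> Tuple[int, int]:
    """Each tick is (restart_condition_met, kernel_up); returns (kernel, device) restarts."""
    hw = HardwareWatchdog(period)
    sw = SoftwareWatchdog(hw)
    for now, (condition, kernel_up) in enumerate(ticks):
        sw.tick(now, condition, kernel_up)
        if hw.check(now):
            # Level 2: after a full device restart the software watchdog starts over.
            sw.feeding = True
    return sw.kernel_restarts, hw.device_restarts
```

For example, `simulate([(False, True), (True, False), (False, True), (False, True)], period=3)` returns `(1, 0)`: the kernel-level restart is enough and the hardware watchdog never fires. With `[(False, True), (True, False), (False, False), (False, False)]` the kernel does not come back, feeding stays stopped, and the result is `(1, 1)` because the device is restarted once the detection period elapses.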
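The per-core detection of claims 2 to 8 can be sketched in the same spirit. This is a hedged, user-space illustration only: a real kernel-layer watchdog would rely on kernel scheduling and interrupt mechanisms that are not modeled here. The `Core` fields `task_schedulable` and `answers_ping` are hypothetical stand-ins for "the bound target task got scheduled" (claim 3) and "a probe feedback message came back" (claims 4 and 5), and the core type strings such as `"big"` and `"little"` are assumed labels for the "preset type" of claim 8.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Core:
    core_id: int
    online: bool               # physical state; only online cores are target cores
    task_schedulable: bool     # hypothetical: the bound target task was scheduled
    answers_ping: bool         # hypothetical: a probe feedback message was received
    core_type: str = "little"  # hypothetical core classification


def detect_by_scheduling(core: Core) -> bool:
    """First detection mode (schedulability of the bound task); True means normal."""
    return core.task_schedulable


def detect_by_probe(core: Core) -> bool:
    """Second detection mode (ping probe and feedback); True means normal."""
    return core.answers_ping


def abnormal_cores(cores: Iterable[Core], require_both: bool = False) -> Set[int]:
    """Combine both detection modes for every online core."""
    result: Set[int] = set()
    for core in cores:
        if not core.online:
            continue  # offline cores are not target cores
        first = detect_by_scheduling(core)
        second = detect_by_probe(core)
        # "and/or": flag a core when either mode fails, or only when both fail.
        if require_both:
            failed = not first and not second
        else:
            failed = not first or not second
        if failed:
            result.add(core.core_id)
    return result


def kernel_restart_needed(cores: Iterable[Core], count_threshold: int,
                          preset_types: Set[str]) -> bool:
    cores = list(cores)
    bad = abnormal_cores(cores)
    # Count rule: the number of abnormal cores reaches the preset threshold.
    if len(bad) >= count_threshold:
        return True
    # Type rule: any abnormal core belongs to a preset type.
    return any(c.core_id in bad and c.core_type in preset_types for c in cores)
```

Treating a core as abnormal when either mode fails is the more sensitive reading of "and/or"; passing `require_both=True` gives the more conservative reading, which tolerates a single missed ping or scheduling delay.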