CN107526646A - Monitoring method, device and watchdog system - Google Patents

Monitoring method, device and watchdog system

Info

Publication number
CN107526646A
Authority
CN
China
Prior art keywords
value
counter
processor
watchdog
count value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610443850.8A
Other languages
Chinese (zh)
Inventor
杜宝山
刘俊峰
卢小张
郑红波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201610443850.8A
Priority to PCT/CN2017/086710 (WO2017219834A1)
Publication of CN107526646A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/88 Monitoring involving counting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a monitoring method, a monitoring device, and a watchdog system. The method includes: monitoring the running state of each processor in the watchdog system to obtain a monitoring result; and determining, according to the obtained monitoring result, the processor whose running state is abnormal. The invention solves the problem in the related art that, when a processor in a multiprocessor system becomes abnormal, the cause of the system reset fault cannot be located, and thereby achieves the effect of accurately locating the cause of the system reset fault.

Description

Monitoring method and device and watchdog system
Technical Field
The invention relates to the field of communication, in particular to a monitoring method and device and a watchdog system.
Background
In a typical system, a hardware watchdog circuit is provided to enhance the stability and reliability of the system. A watchdog, also called a watchdog timer, is essentially a timer circuit. In a multiprocessor system, it is often the case that only one processor feeds the dog (i.e., issues an instruction to the watchdog that clears the watchdog circuit and restarts the countdown), so that when another processor fails, the system cannot be reset in time. After the system is reset, the reason for the watchdog reset is often unclear, so latent defects in the system remain undetected.
Therefore, in the related art, the multiprocessor system has a problem that when the processor is abnormal, the cause of the system reset fault cannot be located.
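The failure mode described in this background can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Watchdog` class, the `run` function, and the three-tick timeout are hypothetical names and values chosen for illustration, and a real watchdog is a hardware circuit rather than a Python object.

```python
class Watchdog:
    """Toy model of a hardware watchdog: counts down and fires at zero."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.fired = False

    def feed(self):
        # "Feeding the dog": clear the circuit and restart the countdown.
        self.remaining = self.timeout_ticks

    def tick(self):
        if self.fired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.fired = True  # real hardware would reset the system here


def run(ticks, cpu_alive):
    """Simulate a system in which only CPU 0 feeds the watchdog.

    cpu_alive[i] is True when CPU i is healthy (assumed input).
    Returns True if the watchdog fired within the given number of ticks.
    """
    wd = Watchdog(timeout_ticks=3)
    for _ in range(ticks):
        if cpu_alive[0]:
            wd.feed()
        wd.tick()
    return wd.fired
```

With `run(10, [True, False])` the watchdog never fires even though CPU 1 has failed, which is the situation the background describes; with `run(10, [False, True])` it fires, but nothing records which processor caused the reset.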
Disclosure of Invention
The embodiments of the invention provide a monitoring method, a monitoring device, and a watchdog system, which at least solve the problem in the related art that, when a processor of a multiprocessor system is abnormal, the cause of the system reset fault cannot be located.
According to an embodiment of the present invention, there is provided a monitoring method including: monitoring the running state of each processor in the watchdog system to obtain a monitoring result; and determining a processor with abnormal running state according to the obtained monitoring result.
Optionally, determining the processor with an abnormal running state according to the obtained monitoring result includes: judging whether each slave processor in the watchdog system is scheduled within a preset time according to the obtained monitoring result; and if the judgment result is negative, determining that the slave processor that has not been scheduled is operating abnormally.
Optionally, the determining, according to the obtained monitoring result, whether each slave processor in the watchdog system is scheduled within a predetermined time includes: in the timeout processing process of the watchdog timer of the master processor in the watchdog system, subtracting a first preset value from the count value of a first counter of each slave processor in the watchdog system, wherein the count value of the first counter is reset to a second preset value larger than a first preset threshold value in the timeout processing process of the watchdog timer of the corresponding slave processor; and determining that the slave processor corresponding to the first counter of which the count value obtained by subtracting the first preset value is less than or equal to the first preset threshold value is not scheduled in the preset time.
Optionally, the determining, according to the obtained monitoring result, whether each slave processor in the watchdog system is scheduled within a predetermined time includes: adding a third preset value to the count value of a second counter of each slave processor in the watchdog system in the timeout processing process of the watchdog timer of the master processor in the watchdog system, wherein the count value of the second counter is reset to a fourth preset value smaller than a second preset threshold in the timeout processing process of the watchdog timer of the corresponding slave processor; and determining that the slave processor corresponding to the second counter with the count value added with the third preset value being greater than or equal to the second preset threshold value is not scheduled in the preset time.
Optionally, before determining, according to the obtained monitoring result, a processor with an abnormal operating state, the method further includes: subtracting a fifth preset value from a count value of a third counter of a main processor in a clock interrupt process of the main processor in the watchdog system, wherein the count value of the third counter is reset to a sixth preset value larger than a third preset threshold value in a timeout processing process of a watchdog timer of the main processor; recording the state information of the main processor under the condition that the count value of the third counter after subtracting the fifth preset value is less than or equal to the third preset threshold value; or, during a clock interrupt process of a slave processor in the watchdog system, subtracting the fifth preset value from a count value of a fourth counter of the slave processor, wherein the count value of the fourth counter is reset to the sixth preset value larger than the third preset threshold value during a watchdog timer timeout process of the slave processor; and recording the state information of the slave processor when the count value of the fourth counter after subtracting the fifth preset value is less than or equal to the third preset threshold value.
Optionally, before determining, according to the obtained monitoring result, a processor with an abnormal operating state, the method further includes: adding a seventh preset value to a count value of a fifth counter of a main processor in a clock interrupt process of the main processor in the watchdog system, wherein the count value of the fifth counter is reset to an eighth preset value smaller than a fourth preset threshold value in a watchdog timer timeout processing process of the main processor; recording the state information of the main processor under the condition that the count value of the fifth counter after the seventh preset value is added is greater than or equal to the fourth preset threshold value; or, during a clock interrupt process of a slave processor in the watchdog system, adding the seventh predetermined value to a count value of a sixth counter of the slave processor, wherein the count value of the sixth counter is reset to an eighth predetermined value smaller than a fourth predetermined threshold during a watchdog timer timeout process of the slave processor; and recording the state information of the slave processor when the count value of the sixth counter added with the seventh preset value is greater than or equal to the fourth preset threshold value.
According to another embodiment of the present invention, there is provided a monitoring apparatus including: the acquisition module is used for monitoring the running state of each processor in the watchdog system to obtain a monitoring result; and the determining module is used for determining the processor with abnormal running state according to the obtained monitoring result.
Optionally, the determining module includes: a judging unit, configured to judge whether each slave processor in the watchdog system is scheduled within a preset time according to the obtained monitoring result; and a determining unit, configured to determine, when the judgment result is negative, that the slave processor that has not been scheduled is operating abnormally.
Optionally, the judging unit includes: a first calculating subunit, configured to subtract a first preset value from the count value of a first counter of each slave processor in the watchdog system during the watchdog timer timeout processing of the master processor in the watchdog system, wherein the count value of the first counter is reset to a second preset value larger than a first preset threshold value during the watchdog timer timeout processing of the corresponding slave processor; and a first determining subunit, configured to determine that the slave processor corresponding to a first counter whose count value, after the first preset value is subtracted, is less than or equal to the first preset threshold value has not been scheduled within the preset time.
Optionally, the judging unit includes: a second calculating subunit, configured to add a third preset value to the count value of a second counter of each slave processor in the watchdog system during the watchdog timer timeout processing of the master processor in the watchdog system, wherein the count value of the second counter is reset to a fourth preset value smaller than a second preset threshold value during the watchdog timer timeout processing of the corresponding slave processor; and a second determining subunit, configured to determine that the slave processor corresponding to a second counter whose count value, after the third preset value is added, is greater than or equal to the second preset threshold value has not been scheduled within the preset time.
Optionally, the apparatus further comprises: a first calculating module, configured to subtract a fifth preset value from a count value of a third counter of a main processor in a clock interrupt process of the main processor in the watchdog system, wherein the count value of the third counter is reset to a sixth preset value larger than a third preset threshold value in a timeout processing process of a watchdog timer of the main processor; a first recording module, configured to record the state information of the main processor under the condition that the count value of the third counter after subtracting the fifth preset value is less than or equal to the third preset threshold value; or, a second calculating module, configured to subtract the fifth preset value from a count value of a fourth counter of a slave processor in the watchdog system during a clock interrupt of the slave processor, wherein the count value of the fourth counter is reset to the sixth preset value larger than the third preset threshold value during a watchdog timer timeout process of the slave processor; and a second recording module, configured to record the state information of the slave processor under the condition that the count value of the fourth counter after subtracting the fifth preset value is less than or equal to the third preset threshold value.
Optionally, the apparatus further comprises: a third calculating module, configured to add a seventh predetermined value to a count value of a fifth counter of a main processor in the watchdog system during a clock interrupt process of the main processor, where the count value of the fifth counter is reset to an eighth predetermined value smaller than a fourth predetermined threshold during a timeout processing procedure of a watchdog timer of the main processor; a third recording module, configured to record the state information of the main processor when the count value of the fifth counter after the seventh predetermined value is added is greater than or equal to the fourth predetermined threshold; or, the fourth calculating module is configured to add the seventh predetermined value to a count value of a sixth counter of the slave processor in the watchdog system during a clock interrupt of the slave processor, where the count value of the sixth counter is reset to an eighth predetermined value smaller than a fourth predetermined threshold during a watchdog timer timeout process of the slave processor; and the fourth recording module is used for recording the state information of the slave processor under the condition that the count value of the sixth counter after the seventh preset value is added is greater than or equal to the fourth preset threshold value.
According to still another embodiment of the present invention, there is also provided a storage medium. The storage medium is configured to store program code for performing the steps of: monitoring the running state of each processor in the watchdog system to obtain a monitoring result; and determining a processor with abnormal running state according to the obtained monitoring result.
Optionally, the storage medium is further arranged to store program code for performing the steps of: determining the processor with an abnormal running state according to the obtained monitoring result includes: judging whether each slave processor in the watchdog system is scheduled within a preset time according to the obtained monitoring result; and if the judgment result is negative, determining that the slave processor that has not been scheduled is operating abnormally.
Optionally, the storage medium is further arranged to store program code for performing the steps of: judging whether each slave processor in the watchdog system is scheduled in a preset time according to the obtained monitoring result comprises: in the timeout processing process of the watchdog timer of the master processor in the watchdog system, subtracting a first preset value from the count value of a first counter of each slave processor in the watchdog system, wherein the count value of the first counter is reset to a second preset value larger than a first preset threshold value in the timeout processing process of the watchdog timer of the corresponding slave processor; and determining that the slave processor corresponding to the first counter of which the count value obtained by subtracting the first preset value is less than or equal to the first preset threshold value is not scheduled in the preset time.
Optionally, the storage medium is further arranged to store program code for performing the steps of: judging whether each slave processor in the watchdog system is scheduled in a preset time according to the obtained monitoring result comprises: adding a third preset value to the count value of a second counter of each slave processor in the watchdog system in the timeout processing process of the watchdog timer of the master processor in the watchdog system, wherein the count value of the second counter is reset to a fourth preset value smaller than a second preset threshold in the timeout processing process of the watchdog timer of the corresponding slave processor; and determining that the slave processor corresponding to the second counter with the count value added with the third preset value being greater than or equal to the second preset threshold value is not scheduled in the preset time.
Optionally, the storage medium is further arranged to store program code for performing the steps of: before determining a processor with an abnormal operating state according to the obtained monitoring result, the method further comprises the following steps: subtracting a fifth preset value from a count value of a third counter of a main processor in a clock interrupt process of the main processor in the watchdog system, wherein the count value of the third counter is reset to a sixth preset value larger than a third preset threshold value in a timeout processing process of a watchdog timer of the main processor; recording the state information of the main processor under the condition that the count value of the third counter after subtracting the fifth preset value is less than or equal to the third preset threshold value; or, during a clock interrupt process of a slave processor in the watchdog system, subtracting the fifth preset value from a count value of a fourth counter of the slave processor, wherein the count value of the fourth counter is reset to the sixth preset value larger than the third preset threshold value during a watchdog timer timeout process of the slave processor; and recording the state information of the slave processor when the count value of the fourth counter after subtracting the fifth preset value is less than or equal to the third preset threshold value.
Optionally, the storage medium is further arranged to store program code for performing the steps of: before determining a processor with an abnormal operating state according to the obtained monitoring result, the method further comprises the following steps: adding a seventh preset value to a count value of a fifth counter of a main processor in a clock interrupt process of the main processor in the watchdog system, wherein the count value of the fifth counter is reset to an eighth preset value smaller than a fourth preset threshold value in a watchdog timer timeout processing process of the main processor; recording the state information of the main processor under the condition that the count value of the fifth counter after the seventh preset value is added is greater than or equal to the fourth preset threshold value; or, during a clock interrupt process of a slave processor in the watchdog system, adding the seventh predetermined value to a count value of a sixth counter of the slave processor, wherein the count value of the sixth counter is reset to an eighth predetermined value smaller than a fourth predetermined threshold during a watchdog timer timeout process of the slave processor; and recording the state information of the slave processor when the count value of the sixth counter added with the seventh preset value is greater than or equal to the fourth preset threshold value.
By monitoring the running state of each processor in the watchdog system, the invention can determine the processor with an abnormal running state from the monitoring result when the system becomes abnormal. This solves the problem in the related art that the cause of a system reset fault cannot be located when a processor in a multiprocessor system is abnormal, and achieves the effect of accurately locating the cause of the system reset fault.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of the hardware configuration of a computer terminal for the monitoring method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a monitoring method according to an embodiment of the invention;
FIG. 3 is a block diagram of a watchdog monitoring system for the monitoring method according to a preferred embodiment of the present invention;
FIG. 4 is a schematic flow chart of the main functions of the watchdog management unit 32 of the watchdog monitoring system according to the preferred embodiment of the present invention;
FIG. 5 is a schematic flow chart of the main functions of the watchdog monitoring unit 34 of the watchdog monitoring system according to the preferred embodiment of the present invention;
FIG. 6 is a flow chart of a monitoring method according to a preferred embodiment of the present invention;
FIG. 7 is a block diagram of a monitoring device according to an embodiment of the present invention;
FIG. 8 is a block diagram of the determining module 74 of the monitoring device according to an embodiment of the invention;
FIG. 9 is a first block diagram of the structure of the judging unit 82 of the monitoring device according to an embodiment of the present invention;
FIG. 10 is a second block diagram of the structure of the judging unit 82 of the monitoring device according to an embodiment of the present invention;
FIG. 11 is a block diagram of a monitoring device according to an embodiment of the present invention;
FIG. 12 is a block diagram of the structure of a monitoring device according to an embodiment of the present invention;
FIG. 13 is a block diagram of a watchdog system according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Example 1
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking the example of running on a computer terminal, fig. 1 is a hardware structure block diagram of a computer terminal of the monitoring method according to the embodiment of the present invention. As shown in fig. 1, the computer terminal 10 may include one or more (only one shown) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission device 106 for communication functions. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the monitoring method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by executing the software programs and modules stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
In this embodiment, a monitoring method running on the computer terminal is provided. Fig. 2 is a flowchart of the monitoring method according to the embodiment of the present invention. As shown in fig. 2, the flow includes the following steps:
step S202, monitoring the running state of each processor in the watchdog system to obtain a monitoring result;
and step S204, determining a processor with abnormal running state according to the obtained monitoring result.
Through the above steps, the running state of each processor in the watchdog system is monitored, so that when the system becomes abnormal, the processor with the abnormal running state can be determined from the monitoring result. This solves the problem that the cause of a system reset fault cannot be located when a processor in a multiprocessor system is abnormal, and achieves the effect of accurately locating the cause of the system reset fault.
Optionally, the processor with an abnormal operating state may be determined from the obtained monitoring result in various manners. For example, the operating state information of each processor may be obtained periodically, the operating state of each processor may be evaluated according to its operating state information, and whether the operating state of each processor is abnormal may be determined from the evaluation result.
For another example, the processor with an abnormal running state may also be determined from the obtained monitoring result in the following manner: the master processor determines whether the operating state of each slave processor is abnormal according to whether that slave processor in the watchdog system is scheduled within a predetermined time. If any slave processor has not been scheduled within the predetermined time, the master processor may stop feeding the watchdog, wait for the watchdog to overflow, and let the system reset. As for the main processor itself, if its running state is abnormal, it does not feed the watchdog, so the watchdog overflows and the system is reset.
With the technical scheme of the invention, the processor with an abnormal operating state is determined by judging whether each slave processor is scheduled within the predetermined time. This solves the problem that the system cannot be reset in time when a processor in the multiprocessor system is abnormal, improves the efficiency of resetting the system, and reduces the occupation of system resources by the processor with the abnormal operating state.
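The feed decision described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `master_watchdog_timeout`, the `slave_scheduled` mapping, and the `feed_watchdog` callable are all hypothetical, and how the scheduling result is obtained is left out here.

```python
def master_watchdog_timeout(slave_scheduled, feed_watchdog):
    """Feed the hardware watchdog only if every slave was scheduled in time.

    slave_scheduled: dict mapping a slave CPU id to True/False, the assumed
        result of the "scheduled within the predetermined time" check.
    feed_watchdog: callable that feeds the hardware watchdog (assumed).
    Returns the sorted list of slave CPUs judged to be abnormal.
    """
    abnormal = sorted(cpu for cpu, ok in slave_scheduled.items() if not ok)
    if not abnormal:
        feed_watchdog()
    # Otherwise the feed is withheld, so the watchdog overflows
    # and the system is reset.
    return abnormal
```

Returning the list of abnormal slaves, rather than a plain boolean, keeps the information needed to locate the cause of the reset.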
Optionally, whether each slave processor in the watchdog system is scheduled within the predetermined time may be determined in various ways. For example, during the watchdog timer timeout process of the master processor, a first predetermined value (e.g., 1) may be subtracted from the count value of the first counter of each slave processor, where the count value of the first counter is reset to a second predetermined value greater than a first predetermined threshold during the watchdog timer timeout process of the corresponding slave processor; a slave processor whose first counter, after the subtraction, holds a count value less than or equal to the first predetermined threshold is determined not to have been scheduled within the predetermined time. For another example, a third predetermined value (e.g., 1) may be added to the count value of the second counter of each slave processor during the watchdog timer timeout process of the master processor, where the count value of the second counter is reset to a fourth predetermined value (e.g., 0) smaller than a second predetermined threshold during the watchdog timer timeout process of the corresponding slave processor; a slave processor whose second counter, after the addition, holds a count value greater than or equal to the second predetermined threshold is determined not to have been scheduled within the predetermined time.
The count value of the first counter (or the second counter) of a slave processor is reset to the second predetermined value (or the fourth predetermined value) during the watchdog timer timeout process of that slave processor. During the watchdog timer timeout process of the master processor, a subtraction (or addition) is performed on the count value of the first counter (or the second counter) of each slave processor. If the watchdog timer timeout process of a slave processor is never executed, the count value of its first counter (or second counter) is never reset, so after several watchdog timer timeout processes of the master processor (i.e., the predetermined time), the count value falls to or below the first predetermined threshold (or rises to or above the second predetermined threshold). Whether the slave processor is scheduled within the predetermined time can therefore be judged by comparing the count value of its first counter (or second counter) with the first predetermined threshold (or the second predetermined threshold). If the watchdog timer timeout process of the slave processor is not executed within the predetermined time, the slave processor is considered not to have been scheduled within the predetermined time.
With the technical scheme of the invention, whether a slave processor is scheduled within the predetermined time is judged in the watchdog timer timeout process of the master processor by comparing the count value of the first counter of the slave processor with the first predetermined threshold (or the count value of the second counter with the second predetermined threshold), which reduces the system resources occupied in determining the processor with an abnormal operating state.
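The count-down variant of this check (the first counter) can be sketched as follows. The concrete numbers are assumptions for illustration only: the text gives 1 as an example of the first predetermined value, while the reset value 3 and the threshold 0 are chosen here. The class and method names are hypothetical.

```python
RESET_VALUE = 3  # "second predetermined value" (assumed); must exceed THRESHOLD
THRESHOLD = 0    # "first predetermined threshold" (assumed)
STEP = 1         # "first predetermined value" (1, as in the text's example)


class SlaveMonitor:
    """Tracks one count-down counter (the "first counter") per slave CPU."""

    def __init__(self, slave_ids):
        self.counters = {cpu: RESET_VALUE for cpu in slave_ids}

    def slave_timeout_handler(self, cpu):
        # Runs in the slave's own watchdog timer timeout processing,
        # i.e. only when that slave actually gets scheduled.
        self.counters[cpu] = RESET_VALUE

    def master_timeout_handler(self):
        # Runs in the master's watchdog timer timeout processing:
        # decrement every slave counter, then compare with the threshold.
        unscheduled = []
        for cpu in sorted(self.counters):
            self.counters[cpu] -= STEP
            if self.counters[cpu] <= THRESHOLD:
                unscheduled.append(cpu)
        return unscheduled
```

For example, with slaves 1 and 2 where only slave 1 keeps running its timeout handler, the master's handler returns an empty list on its first two runs and `[2]` on the third, since slave 2's counter has dropped from 3 to 0 without being reset. The count-up variant (the second counter) mirrors this: reset to a small value such as 0, add on each master timeout, and compare with `>=` against the second predetermined threshold.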
Optionally, before step S204, the method may further include: subtracting a fifth predetermined value (e.g., 1) from a count value of a third counter of a main processor during a clock interrupt of the main processor in the watchdog system, the count value of the third counter being reset to a sixth predetermined value greater than a third predetermined threshold during a watchdog timer timeout process of the main processor; and recording the state information of the main processor in the case that the count value of the third counter after subtracting the fifth predetermined value is less than or equal to the third predetermined threshold. Or, during clock interrupt of the slave processor in the watchdog system, subtracting a fifth predetermined value from a count value of a fourth counter of the slave processor, wherein the count value of the fourth counter is reset to a sixth predetermined value larger than a third predetermined threshold value during timeout processing of the watchdog timer of the slave processor; and recording the state information of the slave processor when the count value of the fourth counter after subtracting the fifth preset value is less than or equal to the third preset threshold value.
During a clock interrupt of the main processor, a subtraction operation (subtracting a fifth predetermined value, e.g., 1) is performed on the count value of the third counter of the main processor, and the result is compared with a third predetermined threshold. The count value of the third counter is reset during the watchdog timer timeout process of the main processor. If that timeout process cannot be executed for a long time, for example because system resources are occupied, the count value of the third counter is never reset, and after several clock interrupts of the main processor it falls to or below the third predetermined threshold. At that point it can be judged that the system is about to be reset by the watchdog, and the state information of the main processor is recorded in time. Alternatively, during the clock interrupt of the slave processor, a subtraction operation (subtracting a fifth predetermined value, e.g., 1) may be performed on the count value of a fourth counter of the slave processor, where the count value of the fourth counter is reset to a sixth predetermined value greater than the third predetermined threshold during the watchdog timer timeout process of the slave processor; the state information of the slave processor is recorded when the count value of the fourth counter, after the fifth predetermined value is subtracted, is less than or equal to the third predetermined threshold.
Alternatively, during the clock interrupt of the main processor, an addition operation (adding a seventh predetermined value, for example, 1) may be performed on a count value of a fifth counter of the main processor, the count value of the fifth counter is reset to an eighth predetermined value (for example, 0) smaller than a fourth predetermined threshold value during the watchdog timer timeout processing of the main processor, and the state information of the main processor is recorded when the count value of the fifth counter after the seventh predetermined value is added is greater than or equal to the fourth predetermined threshold value. Alternatively, during the clock interrupt of the slave processor, an addition operation (adding a seventh predetermined value, for example, 1) may be performed on the count value of a sixth counter of the slave processor, the count value of the sixth counter may be reset to an eighth predetermined value (for example, 0) smaller than the fourth predetermined threshold value during the watchdog timer timeout process of the slave processor, and the slave processor state information may be recorded when the count value of the sixth counter to which the seventh predetermined value is added is greater than or equal to the fourth predetermined threshold value.
Through the technical scheme of the invention, in the clock interrupt process of the master processor or the slave processor, the count value of the third counter or of the fourth counter, after the fifth predetermined value is subtracted, is compared with the third predetermined threshold (or the count value of the fifth counter or of the sixth counter, after the seventh predetermined value is added, is compared with the fourth predetermined threshold). When it is judged that the system is about to be reset by the watchdog, the state information of the master processor or the slave processor is recorded in time, which avoids the situation where the state information cannot be recorded, or cannot be recorded completely, because of the watchdog reset.
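As an illustration of the count-up variant (fifth or sixth counter), the following Python sketch records state once the counter reaches its threshold in the clock interrupt. The names UpMonitor, on_clock_interrupt, on_watchdog_timeout and the capture callback are assumptions made for this example and do not come from the embodiment.

```python
class UpMonitor:
    """Fifth/sixth counter: counted up per clock interrupt, reset by the timeout handler."""

    def __init__(self, threshold, reset_value=0, step=1):
        # threshold: fourth predetermined threshold; reset_value: eighth
        # predetermined value; step: seventh predetermined value.
        if reset_value >= threshold:
            raise ValueError("reset_value must be smaller than threshold")
        self.threshold, self.reset_value, self.step = threshold, reset_value, step
        self.count = reset_value
        self.recorded = None  # last recorded state information, if any

    def on_watchdog_timeout(self):
        # The processor's own watchdog timer timeout processing ran in time.
        self.count = self.reset_value

    def on_clock_interrupt(self, capture):
        # capture is a hypothetical callable returning the state information.
        self.count += self.step
        if self.count >= self.threshold:
            self.recorded = capture()
            return True
        return False
```

As written, the sketch records again on every later clock interrupt while the counter stays at or above the threshold; a real system might record only once.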
Optionally, when the obtained monitoring result shows that the running states of all processors are normal, the master processor or one of the slave processors (whichever processor controls the watchdog) sends an instruction to the watchdog to feed it, and the watchdog restarts its countdown. The master and slave processors may be distinguished by name only, and there is no difference in specific structure or function.
Optionally, when the obtained monitoring result shows that the running state of one or more processors is abnormal, the master processor or one of the slave processors does not feed the watchdog (that is, does not send a clear signal or instruction to the watchdog), but waits for the watchdog to overflow so that the system is reset.
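The feed-or-wait decision can be sketched with a small simulated watchdog. HardwareWatchdog and service are hypothetical names, and the tick-based countdown is an assumption used only to make the behaviour observable in software.

```python
class HardwareWatchdog:
    """Simulated watchdog: counts down each tick and resets the system on overflow."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.resets = 0

    def feed(self):
        # Feeding restarts the countdown.
        self.remaining = self.timeout_ticks

    def tick(self):
        # Returns True if this tick caused a system reset.
        self.remaining -= 1
        if self.remaining <= 0:
            self.resets += 1
            self.remaining = self.timeout_ticks
            return True
        return False


def service(watchdog, abnormal_processors):
    """Feed only when no processor is abnormal; otherwise let the watchdog overflow."""
    if abnormal_processors:
        return False
    watchdog.feed()
    return True
```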
Optionally, the difference between the second predetermined value and the first predetermined threshold, the difference between the fourth predetermined value and the second predetermined threshold, the difference between the sixth predetermined value and the third predetermined threshold, and the difference between the eighth predetermined value and the fourth predetermined threshold (the absolute value is taken when a difference is negative) may each be set to be smaller than the system watchdog reset time.
Optionally, the execution order of step S202 and step S204 may be interchanged, i.e., step S202 and step S204 may be executed in a loop.
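A small Python check can make this constraint concrete. The text compares a count difference with a time, so the sketch assumes that each counting step corresponds to one period of the event that changes the counter and that the period and the reset time use the same time unit; the function names and the period parameter are illustrative assumptions.

```python
def detection_time(reset_value, threshold, step, period):
    """Time for an unreset counter to reach its threshold (count-down or count-up)."""
    # Ceiling division: number of steps needed to cover the distance.
    steps = -(-abs(reset_value - threshold) // step)
    return steps * period


def fits_before_reset(reset_value, threshold, step, period, hw_reset_time):
    """True if the counter reaches its threshold before the watchdog would reset."""
    return detection_time(reset_value, threshold, step, period) < hw_reset_time
```

For example, a counter reset to 3 with threshold 0, step 1 and a period of 2 time units reaches its threshold after 6 time units, so under these assumptions it fits a watchdog reset time of 10 but not one of 5.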
Based on the foregoing embodiments and optional implementations, and to illustrate how the whole scheme interacts, the preferred embodiment provides a monitoring method that can be executed in a watchdog monitoring system (similar to the foregoing watchdog system) as shown in fig. 3. Fig. 3 is a block diagram of a watchdog monitoring system of a monitoring method according to a preferred embodiment of the present invention. As shown in fig. 3, the system includes: a watchdog management unit 32, a watchdog monitoring unit 34, an information recording unit 36, and a hardware watchdog 38 (similar to the watchdog described above). The system is explained below.
The watchdog management unit 32 is configured to start a watchdog timer for each processor of the system, manage the watchdog counters, feed the watchdog periodically, and manage the running state of the hardware watchdog.
The watchdog monitoring unit 34 mainly maintains the watchdog monitoring counters of all processors, and is configured to monitor the operating state of the system, perform on-site photographing (i.e., collecting information such as the CPU state and information stored in a heap or a stack) before any processor is abnormally reset, and record the collected information.
The information recording unit 36 is used for recording the acquired information.
The hardware watchdog 38 is used for monitoring the running state of the system and resetting the system.
The information recording unit 36 can ensure that the recorded monitoring information is not lost after a system reset caused by the hardware dog, and provides a plurality of reading modes.
The main functional flow of the watchdog managing unit 32 is explained below. Fig. 4 is a schematic diagram of a main functional flow of the watchdog management unit 32 of the watchdog monitoring system according to the preferred embodiment of the present invention, and as shown in fig. 4, the main functional flow of the watchdog management unit 32 includes the following steps:
Step S402, judging whether the watchdog timer is overtime, if yes, executing step S404.
Step S404, judging whether the current processor is the main processor, if so, executing step S406, otherwise, executing step S408.
In step S406, all slave processor watchdog counters are decremented.
In the watchdog timer timeout process of the master processor, the watchdog counters CNTn of the other slave processors (same as the aforementioned first counter) are decremented (e.g., by 1). Here n is the number of processors in the system; the types of the watchdog counters CNTn of different processors may be the same or different, and the count value of the watchdog counter CNTn of each processor varies with the actual situation.
In step S408, the watchdog counter of the processor is reset.
The watchdog counter CNTn of the processor is reset to the initial value a (same as the aforementioned second predetermined value).
Step S410, determining whether the watchdog counters of all the processors are greater than a threshold, if so, executing step S412, otherwise, executing step S414.
Step S412, feeding a hardware watchdog.
If the watchdog counters CNTn of all processors are greater than the threshold (same as the first predetermined threshold), the system is currently in a normal state, and the hardware watchdog 38 is fed.
In step S414, the watchdog monitor counter of the current processor is reset.
If the watchdog counter CNTn of any processor is not greater than the threshold, indicating that the processor may be abnormal, the hardware dog is not fed and the system waits to be reset. The watchdog monitoring counter MCNTn of the current processor (where n is the number of processors in the system) is reset; that is, if the current processor is the master processor, the watchdog monitoring counter MCNTn of the master processor (same as the third counter) is reset to the initial value T (same as the sixth predetermined value), and if the current processor is a slave processor, the watchdog monitoring counter MCNTn of the slave processor (same as the fourth counter) is reset to the initial value T.
Step S416, ending.
The above initial value a and the corresponding threshold are set based on the following principle: the watchdog time (the difference between the initial value a and the corresponding threshold) is guaranteed to be less than the system hardware dog reset time. The above initial value T and the corresponding threshold are set based on the following principle: the watchdog monitoring time (the difference between the initial value T and the corresponding threshold) is guaranteed to be less than the system hardware dog reset time.
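The flow of steps S402 to S416 can be sketched in Python as follows. The class WatchdogManager and its attribute names are hypothetical. The sketch assumes processor 0 is the master, that the hardware dog may be fed from any processor's timeout pass, and that the monitoring counter MCNTn of the current processor is reset on every pass (as step S606 of the later flow suggests), since the step text leaves that ordering open.

```python
MASTER = 0  # assumption: processor 0 is the master processor


class WatchdogManager:
    def __init__(self, n_cpus, a, cnt_threshold, t):
        self.a, self.cnt_threshold, self.t = a, cnt_threshold, t
        # CNTn: one watchdog counter per slave, starting at initial value a.
        self.cnt = {cpu: a for cpu in range(1, n_cpus)}
        # MCNTn: one watchdog monitoring counter per processor, starting at T.
        self.mcnt = {cpu: t for cpu in range(n_cpus)}
        self.feeds = 0  # how many times the hardware dog was fed

    def on_timer_timeout(self, cpu):
        """Watchdog timer timeout processing for one processor (S404 to S414)."""
        if cpu == MASTER:
            for slave in self.cnt:          # S406: decrement all slave counters
                self.cnt[slave] -= 1
        else:
            self.cnt[cpu] = self.a          # S408: reset this slave's counter
        # S410: are all watchdog counters above the threshold?
        healthy = all(v > self.cnt_threshold for v in self.cnt.values())
        if healthy:
            self.feeds += 1                 # S412: feed the hardware watchdog
        self.mcnt[cpu] = self.t             # S414: reset the monitoring counter
        return healthy
```

With three processors, a = 3 and a threshold of 0, a slave that stops running its timeout processing causes on_timer_timeout(0) to return False on the master's third consecutive pass after that slave's last reset, at which point feeding stops.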
The main functional flow of the watchdog monitoring unit 34 is explained below. Fig. 5 is a schematic diagram of a main functional flow of the watchdog monitoring unit 34 of the watchdog monitoring system according to the preferred embodiment of the present invention, and as shown in fig. 5, the main functional flow of the watchdog monitoring unit 34 includes the following steps:
Step S502, decrementing the watchdog monitoring counter of the current processor.
In a system clock interrupt, the watchdog monitor counter MCNTn of the current processor is decremented (e.g., by 1).
Step S504, determining whether the watchdog monitoring counter is greater than the threshold; if not, executing step S506; otherwise, executing step S510.
The watchdog monitoring counter MCNTn is checked after the subtraction operation to determine whether its count value has fallen below the threshold (the same as the third predetermined threshold).
Step S506, taking a picture on site.
If the value after the subtraction operation has fallen below the threshold, the system is about to be reset by the watchdog, and on-site photographing of the system is performed immediately.
Step S508, writes the acquired information to the information recording unit.
Information collected by photographing (such as CPU state, information stored in a heap or a stack) is written to the information recording unit 36.
Step S510, ending.
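Steps S502 to S510 can be sketched as a single clock-interrupt handler. The functions take_snapshot and on_clock_interrupt, the class InfoRecorder, and the placeholder snapshot contents are all hypothetical; a real system would collect actual register and stack contents, and the sketch treats a count value equal to the threshold as having reached it.

```python
def take_snapshot(cpu):
    # Hypothetical on-site photographing: placeholders stand in for real data.
    return {"cpu": cpu, "cpu_state": "<registers>", "stack": "<stack dump>"}


class InfoRecorder:
    """Stand-in for the information recording unit 36."""

    def __init__(self):
        self.entries = []

    def write(self, info):
        self.entries.append(info)


def on_clock_interrupt(cpu, mcnt, threshold, recorder, snapshot=take_snapshot):
    """Clock-interrupt processing for one processor; mcnt maps cpu to MCNTn."""
    mcnt[cpu] -= 1                       # S502: decrement the monitoring counter
    if mcnt[cpu] > threshold:            # S504: still above the threshold?
        return False                     # S510: nothing to do
    recorder.write(snapshot(cpu))        # S506 and S508: photograph and record
    return True
```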
In the preferred embodiment, a monitoring method operating in the watchdog monitoring system is provided, and fig. 6 is a flowchart of the monitoring method according to the preferred embodiment of the present invention, as shown in fig. 6, the flowchart includes the following steps:
Step S602, the system starts, and a watchdog timer is started for each processor of the system.
When the system is started, the watchdog management unit 32 starts a watchdog timer for each processor of the system. Specifically, if processor 0 is by default the master processor and the other processors are the slave processors, the watchdog timer of the master processor is T0 and the watchdog timers of the slave processors are T1 to Tn-1 (where n is the number of processors in the system).
Step S604, in the timeout processing of the watchdog timer of the master processor, decrementing the watchdog counters of the other slave processors, judging whether the watchdog counters of the other slave processors are greater than a threshold, and executing the corresponding operation according to the judgment result.
The watchdog management unit 32 sets a watchdog counter CNTn to an initial value a for each slave processor. In the timeout processing of the watchdog timer of the master processor, the watchdog management unit 32 first performs a subtraction (e.g., subtracting 1) on the watchdog counters of the other slave processors and determines whether the watchdog counters of the other processors are greater than a threshold. If the watchdog counters of all the processors (the master processor and each slave processor) are greater than the threshold, the system is currently in a normal state and the hardware watchdog 38 is fed; conversely, if the watchdog counter of any processor is not greater than the threshold, indicating that the processor may be abnormal, the hardware dog is not fed and the system waits to be reset. The above initial value a and the corresponding threshold are set based on the following principle: the watchdog time (the difference between the initial value a and the corresponding threshold) is guaranteed to be less than the system hardware dog reset time.
Step S606: resetting the watchdog counter and the watchdog monitoring counter of the corresponding processor in the timeout processing of the watchdog timer of each slave processor.
In the watchdog timer timeout process of each slave processor, the watchdog management unit 32 needs to reset the watchdog counter CNTn (reset to the initial value a) and the watchdog monitoring counter MCNTn (reset to the initial value) of the processor, and the initial value of the watchdog monitoring counter MCNTn is T. The above initial value T and the corresponding threshold are set based on the following principle: the watchdog monitoring time (the difference between the initial value T and the corresponding threshold) is guaranteed to be less than the system hardware dog reset time, so as to guarantee that the field information can be successfully collected and recorded to the information recording unit 36.
Step S608: in the clock interrupt of the system, the watchdog monitoring counter of the current processor is subtracted, the value is judged, and corresponding operation is executed according to the judgment result.
In the system clock interrupt, the watchdog monitoring unit 34 performs a subtraction (e.g., subtracting 1) on the watchdog monitoring counter MCNTn of the current processor and checks the resulting value. If the value has fallen below the threshold, the system is about to be reset by the watchdog, so on-site photographing of the system is performed immediately and the acquired information is written to the information recording unit 36.
Step S610: reading information.
After the system is reset (because of hardware dog overflow), the relevant files (for example, the collected processor information recorded in the information recording unit 36) can be obtained locally, e.g., through a serial port or local telnet, or remotely, e.g., through remote telnet, for analysis and fault location, so as to determine the abnormal processor and the reason for the abnormality.
The watchdog monitoring system in the preferred embodiment may be any general watchdog monitoring system under a multiprocessor framework, and the monitoring method provides a general means of locating system reset problems caused by the watchdog. Note that each operation on a counter in the present preferred embodiment (for example, subtraction or reset) is an operation performed on the count value of that counter.
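The interaction of steps S602 to S610 can be illustrated with a small tick-based simulation of one master (processor 0) and one slave (processor 1). All parameter values, the function name simulate, and the timing model are assumptions for illustration: every processor takes a clock interrupt on each tick even when its timeout processing is no longer scheduled, the watchdog timers fire every PERIOD ticks, and only one record is kept per processor.

```python
def simulate(hang_tick, total_ticks=60):
    """Return the tick of the hardware reset (or None) and the recorded snapshots."""
    PERIOD, A, CNT_THR = 4, 2, 0      # timer period, CNT initial value a, CNT threshold
    T, MCNT_THR = 10, 0               # MCNT initial value T and its threshold
    HW_TIMEOUT = 12                   # ticks without feeding before hardware reset
    cnt = {1: A}                      # watchdog counter of the slave
    mcnt = {0: T, 1: T}               # monitoring counters of both processors
    hw_left = HW_TIMEOUT
    record = []                       # stands in for the information recording unit

    for tick in range(1, total_ticks + 1):
        # S608: clock interrupt on every processor.
        for cpu in (0, 1):
            mcnt[cpu] -= 1
            already = any(r["cpu"] == cpu for r in record)
            if mcnt[cpu] <= MCNT_THR and not already:
                record.append({"cpu": cpu, "tick": tick})
        if tick % PERIOD == 0:
            # S604: master timeout processing.
            cnt[1] -= 1
            mcnt[0] = T
            if cnt[1] > CNT_THR:
                hw_left = HW_TIMEOUT  # feed the hardware watchdog
            # S606: slave timeout processing, skipped once the slave hangs.
            if tick < hang_tick:
                cnt[1] = A
                mcnt[1] = T
        hw_left -= 1
        if hw_left <= 0:
            return {"reset_tick": tick, "record": record}
    return {"reset_tick": None, "record": record}
```

With these assumed values, simulate(hang_tick=10) records a snapshot for processor 1 at tick 18 and the hardware reset follows at tick 23, so the record exists before the reset and can be read back afterwards as in step S610.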
Similarly, the count value of the counter of the master processor and the count values of the counters of the slave processors may also perform an addition operation, and at this time, each initial value and a corresponding threshold value may be set as needed (for example, the initial value (a or T) is smaller than the corresponding threshold value, and an absolute value of a difference between the initial value and the corresponding threshold value is smaller than the system hardware dog reset time, and the like).
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
In this embodiment, a monitoring device is further provided. The device is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 7 is a first block diagram of a monitoring device according to an embodiment of the present invention, as shown in fig. 7, the device includes: an obtaining module 72 and a determining module 74, which are explained below.
An obtaining module 72, configured to monitor an operating state of each processor in the watchdog system, and obtain a monitoring result; and a determining module 74, connected to the obtaining module 72, for determining a processor with abnormal operating condition according to the obtained monitoring result.
Fig. 8 is a block diagram of a structure of the determination module 74 of the monitoring apparatus according to the embodiment of the present invention, and as shown in fig. 8, the apparatus includes, in addition to all the modules shown in fig. 7: a judging unit 82 and a determining unit 84. The determination module 74 is described below.
A judging unit 82, configured to judge whether each slave processor in the watchdog system is scheduled within a predetermined time according to the obtained monitoring result; and the determining unit 84 is connected with the judging unit 82 and used for determining that the slave processor which is not scheduled runs abnormally if the judging result is negative.
Fig. 9 is a block diagram of a first structure of the judging unit 82 of the monitoring apparatus according to the embodiment of the present invention, and as shown in fig. 9, the apparatus includes, in addition to all the modules shown in fig. 8: a first calculating subunit 92, a first determining subunit 94. The judgment unit 82 will be explained below.
The first calculating subunit 92 is configured to subtract a first predetermined value from a count value of a first counter of each slave processor in the watchdog system during timeout processing of a watchdog timer of a master processor in the watchdog system, where the count value of the first counter is reset to a second predetermined value greater than a first predetermined threshold during timeout processing of the watchdog timer of the corresponding slave processor; and a first determining subunit 94, connected to the first calculating subunit 92, for determining that the slave processor corresponding to the first counter with the count value obtained by subtracting the first predetermined value being less than or equal to the first predetermined threshold value is not scheduled within a predetermined time.
Fig. 10 is a block diagram of a second structure of the judging unit 82 of the monitoring apparatus according to the embodiment of the present invention, and as shown in fig. 10, the apparatus includes, in addition to all modules shown in fig. 8: a second calculating subunit 102, a second determining subunit 104. The judgment unit 82 will be explained below.
The second calculating subunit 102 is configured to add a third predetermined value to a count value of a second counter of each slave processor in the watchdog system in the watchdog timer timeout process of the master processor in the watchdog system, where the count value of the second counter is reset to a fourth predetermined value smaller than the second predetermined threshold in the watchdog timer timeout process of the corresponding slave processor; and a second determining subunit 104, connected to the second calculating subunit 102, configured to determine that the slave processor corresponding to the second counter whose count value, after the third predetermined value is added, is greater than or equal to the second predetermined threshold is not scheduled within a predetermined time.
Fig. 11 is a block diagram of a second structure of a monitoring device according to an embodiment of the present invention, and as shown in fig. 11, the device includes, in addition to all modules shown in fig. 7: a first calculation module 112, a first recording module 114, or a second calculation module 116, a second recording module 118. The apparatus will be explained below.
The first calculating module 112 is configured to subtract a fifth predetermined value from a count value of a third counter of the main processor in a clock interrupt process of the main processor in the watchdog system, where the count value of the third counter is reset to a sixth predetermined value greater than a third predetermined threshold in a timeout processing process of the watchdog timer of the main processor; a first recording module 114, connected to the first calculating module 112, for recording the state information of the main processor when the count value of the third counter after subtracting the fifth predetermined value is less than or equal to the third predetermined threshold;
or,
a second calculating module 116, configured to subtract a fifth predetermined value from a count value of a fourth counter of the slave processor during a clock interrupt of the slave processor in the watchdog system, wherein the count value of the fourth counter is reset to a sixth predetermined value greater than the third predetermined threshold during a watchdog timer timeout process of the slave processor; and a second recording module 118, connected to the second calculating module 116, configured to record the status information of the slave processor in a case that the count value of the fourth counter after subtracting the fifth predetermined value is less than or equal to the third predetermined threshold.
Fig. 12 is a block diagram of a third structure of a monitoring device according to an embodiment of the present invention, and as shown in fig. 12, the device includes, in addition to all modules shown in fig. 7: a third calculation module 122, a third recording module 124, or a fourth calculation module 126, a fourth recording module 128. The apparatus will be explained below.
A third calculating module 122, configured to add a seventh predetermined value to a count value of a fifth counter of a main processor in a clock interrupt process of the main processor in the watchdog system, where the count value of the fifth counter is reset to an eighth predetermined value smaller than a fourth predetermined threshold in a timeout processing process of a watchdog timer of the main processor; a third recording module 124, connected to the third calculating module 122, for recording the state information of the main processor when the count value of the fifth counter after adding the seventh predetermined value is greater than or equal to the fourth predetermined threshold;
or,
a fourth calculating module 126, configured to add a seventh predetermined value to a count value of a sixth counter of the slave processor during a clock interrupt of the slave processor in the watchdog system, where the count value of the sixth counter is reset to an eighth predetermined value smaller than the fourth predetermined threshold during a watchdog timer timeout process of the slave processor; and a fourth recording module 128, connected to the fourth calculating module 126, for recording the status information of the slave processor when the count value of the sixth counter after adding the seventh predetermined value is greater than or equal to a fourth predetermined threshold.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Example 3
In this embodiment, a watchdog system is further provided. Fig. 13 is a block diagram of a watchdog system according to an embodiment of the present invention; as shown in fig. 13, the watchdog system includes the monitoring device 132 of the above embodiment.
Example 4
The embodiment of the invention also provides a storage medium. Alternatively, in the present embodiment, the storage medium may be configured to store program codes for performing the following steps:
S1, monitoring the running state of each processor in the watchdog system to obtain a monitoring result;
and S2, determining the processor with abnormal running state according to the obtained monitoring result.
Optionally, the storage medium is further arranged to store program code for performing the steps of:
determining the processor with abnormal running state according to the obtained monitoring result comprises:
S1, judging whether each slave processor in the watchdog system is scheduled in a preset time according to the obtained monitoring result;
and S2, if the judgment result is negative, determining that the operation state of the slave processor which is not scheduled is abnormal.
Optionally, the storage medium is further arranged to store program code for performing the steps of:
judging whether each slave processor in the watchdog system is scheduled in a preset time according to the obtained monitoring result comprises the following steps:
S1, in the timeout process of the watchdog timer of the master processor in the watchdog system, subtracting a first preset value from the count value of the first counter of each slave processor in the watchdog system, wherein the count value of the first counter is reset to a second preset value larger than the first preset threshold value in the timeout process of the watchdog timer of the corresponding slave processor;
and S2, determining that the slave processor corresponding to the first counter with the count value after subtracting the first preset value less than or equal to the first preset threshold value is not scheduled in the preset time.
Optionally, the storage medium is further arranged to store program code for performing the steps of:
judging whether each slave processor in the watchdog system is scheduled in a preset time according to the obtained monitoring result comprises the following steps:
S1, in the timeout process of the watchdog timer of the master processor in the watchdog system, adding a third preset value to the count value of the second counter of each slave processor in the watchdog system, wherein the count value of the second counter is reset to a fourth preset value smaller than the second preset threshold in the timeout process of the watchdog timer of the corresponding slave processor;
and S2, determining that the slave processor corresponding to the second counter with the count value after the third preset value is added being larger than or equal to the second preset threshold value is not scheduled in the preset time.
Optionally, the storage medium is further arranged to store program code for performing the steps of:
before determining the processor with abnormal running state according to the obtained monitoring result, the method further comprises the following steps:
in the clock interrupt process of a main processor in the watchdog system, subtracting a fifth preset value from the count value of a third counter of the main processor, wherein the count value of the third counter is reset to a sixth preset value larger than a third preset threshold value in the timeout processing process of a watchdog timer of the main processor; recording the state information of the main processor under the condition that the count value of the third counter after subtracting the fifth preset value is less than or equal to a third preset threshold value;
or,
subtracting a fifth predetermined value from a count value of a fourth counter of the slave processor during a clock interrupt of the slave processor in the watchdog system, wherein the count value of the fourth counter is reset to a sixth predetermined value greater than a third predetermined threshold during a watchdog timer timeout process of the slave processor; and recording the state information of the slave processor when the count value of the fourth counter after subtracting the fifth preset value is less than or equal to the third preset threshold value.
Optionally, the storage medium is further arranged to store program code for performing the steps of:
before determining the processor with abnormal running state according to the obtained monitoring result, the method further comprises the following steps:
adding a seventh preset value to a count value of a fifth counter of a main processor in a clock interrupt process of the main processor in the watchdog system, wherein the count value of the fifth counter is reset to an eighth preset value smaller than a fourth preset threshold in a timeout processing process of a watchdog timer of the main processor; recording the state information of the main processor under the condition that the count value of the fifth counter after the seventh preset value is added is greater than or equal to a fourth preset threshold value;
or,
adding a seventh preset value to the count value of a sixth counter of the slave processor in the clock interrupt process of the slave processor in the watchdog system, wherein the count value of the sixth counter is reset to an eighth preset value smaller than a fourth preset threshold value in the timeout process of the watchdog timer of the slave processor; and recording the state information of the slave processor when the count value of the sixth counter after the seventh preset value is added is greater than or equal to a fourth preset threshold value.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and various other media capable of storing program code.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: monitoring the running state of each processor in the watchdog system to obtain a monitoring result; and determining the processor with abnormal running state according to the obtained monitoring result.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: determining the processor with abnormal running state according to the obtained monitoring result comprises: judging whether each slave processor in the watchdog system is scheduled within a preset time according to the obtained monitoring result; and if the judgment result is negative, determining that the running state of the slave processor that was not scheduled is abnormal.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: judging whether each slave processor in the watchdog system is scheduled in a preset time according to the obtained monitoring result comprises the following steps: in the timeout processing process of the watchdog timer of the master processor in the watchdog system, subtracting a first preset value from the count value of the first counter of each slave processor in the watchdog system, wherein the count value of the first counter is reset to a second preset value larger than the first preset threshold in the timeout processing process of the watchdog timer of the corresponding slave processor; and determining that the slave processor corresponding to the first counter of which the count value obtained by subtracting the first preset value is less than or equal to the first preset threshold value is not scheduled in a preset time.
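The decrement scheme described above can be sketched as a small Python simulation. This is only an illustrative sketch under stated assumptions, not the patented implementation: the names SlaveCounter, slave_watchdog_timeout and master_watchdog_timeout, and the concrete numbers chosen for the first preset value, the second preset value and the first preset threshold, are hypothetical and do not appear in the source. The text only requires that the second preset value be larger than the first preset threshold.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical values; the source only requires SECOND_PRESET_VALUE > FIRST_THRESHOLD.
FIRST_PRESET_VALUE = 1   # amount the master subtracts on each of its timeouts
SECOND_PRESET_VALUE = 3  # value a slave resets its own counter to
FIRST_THRESHOLD = 0      # at or below this, the slave counts as not scheduled


@dataclass
class SlaveCounter:
    """The 'first counter' kept for one slave processor (illustrative type)."""
    cpu_id: int
    count: int = SECOND_PRESET_VALUE


def slave_watchdog_timeout(counter: SlaveCounter) -> None:
    """Runs in the slave's watchdog timer timeout handling: reset its counter."""
    counter.count = SECOND_PRESET_VALUE


def master_watchdog_timeout(counters: List[SlaveCounter]) -> List[int]:
    """Runs in the master's watchdog timer timeout handling.

    Subtracts the first preset value from every slave counter and returns the
    ids of slaves whose counter is now at or below the threshold.
    """
    unscheduled = []
    for counter in counters:
        counter.count -= FIRST_PRESET_VALUE
        if counter.count <= FIRST_THRESHOLD:
            unscheduled.append(counter.cpu_id)
    return unscheduled
```

With these assumed values, a slave that fails to reset its counter across three consecutive master timeouts is reported as not scheduled within the preset time.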
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: judging whether each slave processor in the watchdog system is scheduled in a preset time according to the obtained monitoring result comprises the following steps: adding a third preset value to the count value of the second counter of each slave processor in the watchdog system in the timeout processing process of the watchdog timer of the master processor in the watchdog system, wherein the count value of the second counter is reset to a fourth preset value smaller than the second preset threshold in the timeout processing process of the watchdog timer of the corresponding slave processor; and determining that the slave processor corresponding to the second counter with the count value after the third preset value is added being larger than or equal to the second preset threshold value is not scheduled in the preset time.
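The increment variant mirrors the decrement one: the master counts up and the slave resets to a low value. A minimal Python sketch follows, assuming a plain dictionary that maps a slave id to its second counter. The function names slave_feed and master_check and the numeric values are hypothetical; the source only requires that the fourth preset value be smaller than the second preset threshold.

```python
from typing import Dict, List

# Hypothetical values; the source only requires FOURTH_PRESET_VALUE < SECOND_THRESHOLD.
THIRD_PRESET_VALUE = 1   # amount the master adds on each of its timeouts
FOURTH_PRESET_VALUE = 0  # value a slave resets its own counter to
SECOND_THRESHOLD = 3     # at or above this, the slave counts as not scheduled


def slave_feed(counts: Dict[int, int], cpu_id: int) -> None:
    """Slave-side watchdog timer timeout handling: reset the slave's counter."""
    counts[cpu_id] = FOURTH_PRESET_VALUE


def master_check(counts: Dict[int, int]) -> List[int]:
    """Master-side watchdog timer timeout handling.

    Adds the third preset value to every slave counter and returns the ids of
    slaves whose counter has reached the threshold.
    """
    unscheduled = []
    for cpu_id in sorted(counts):
        counts[cpu_id] += THIRD_PRESET_VALUE
        if counts[cpu_id] >= SECOND_THRESHOLD:
            unscheduled.append(cpu_id)
    return unscheduled
```

Counting up or counting down is a free design choice here; both detect the same condition, namely that a slave has not run its own timeout handling for a given number of master timeouts.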
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: before determining the processor with abnormal running state according to the obtained monitoring result, the method further comprises the following steps: in the clock interrupt process of a main processor in the watchdog system, subtracting a fifth preset value from the count value of a third counter of the main processor, wherein the count value of the third counter is reset to a sixth preset value larger than a third preset threshold value in the timeout processing process of a watchdog timer of the main processor; recording the state information of the main processor under the condition that the count value of the third counter after subtracting the fifth preset value is less than or equal to the third preset threshold value; or, in the clock interrupt process of the slave processor in the watchdog system, subtracting the fifth preset value from the count value of a fourth counter of the slave processor, wherein the count value of the fourth counter is reset to a sixth preset value larger than the third preset threshold value in the timeout processing process of the watchdog timer of the slave processor; and recording the state information of the slave processor under the condition that the count value of the fourth counter after subtracting the fifth preset value is less than or equal to the third preset threshold value.
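The per-processor self-check in the clock interrupt can be sketched as follows. This is a hedged illustration only: the class TickMonitor, its method names, the capture_state callback and the numeric values are assumptions introduced for the example, and the source does not say what the recorded state information contains.

```python
from typing import Callable, List, Tuple

# Hypothetical values; the source only requires SIXTH_PRESET_VALUE > THIRD_THRESHOLD.
FIFTH_PRESET_VALUE = 1  # amount subtracted on every clock interrupt
SIXTH_PRESET_VALUE = 5  # value restored by the watchdog timer timeout handling
THIRD_THRESHOLD = 0     # at or below this, state information is recorded


class TickMonitor:
    """Counter owned by one processor (master or slave), checked on every tick."""

    def __init__(self, cpu_name: str, capture_state: Callable[[], str]) -> None:
        self.cpu_name = cpu_name
        self.capture_state = capture_state  # hypothetical hook returning state info
        self.count = SIXTH_PRESET_VALUE
        self.records: List[Tuple[str, str]] = []

    def on_watchdog_timeout(self) -> None:
        """Watchdog timer timeout handling ran, so the processor is still scheduling."""
        self.count = SIXTH_PRESET_VALUE

    def on_clock_interrupt(self) -> None:
        """Clock interrupt: age the counter and record state once it runs out."""
        self.count -= FIFTH_PRESET_VALUE
        if self.count <= THIRD_THRESHOLD:
            self.records.append((self.cpu_name, self.capture_state()))
```

Because the check runs in the clock interrupt, it can still fire when ordinary task scheduling on that processor has stalled, which is when the recorded state is most useful for diagnosis.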
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: before determining the processor with abnormal running state according to the obtained monitoring result, the method further comprises the following steps: adding a seventh preset value to a count value of a fifth counter of a main processor in a clock interrupt process of the main processor in the watchdog system, wherein the count value of the fifth counter is reset to an eighth preset value smaller than a fourth preset threshold in a timeout processing process of a watchdog timer of the main processor; recording the state information of the main processor under the condition that the count value of the fifth counter after the seventh preset value is added is greater than or equal to a fourth preset threshold value; or, during clock interrupt of the slave processor in the watchdog system, adding a seventh predetermined value to the count value of a sixth counter of the slave processor, wherein the count value of the sixth counter is reset to an eighth predetermined value smaller than a fourth predetermined threshold during timeout processing of the watchdog timer of the slave processor; and recording the state information of the slave processor when the count value of the sixth counter after the seventh preset value is added is greater than or equal to a fourth preset threshold value.
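As a worked illustration of the incrementing variant, the number of clock interrupts that may pass without a watchdog timer reset before state information is recorded follows directly from the three parameters. The symbols below are introduced here for convenience and are not the source's notation: V_7 is the seventh preset value (assumed positive), V_8 the eighth preset value, T_4 the fourth preset threshold, and c_n the count value after n clock interrupts with no reset in between.

```latex
c_n = V_8 + n\,V_7, \qquad V_8 < T_4,\; V_7 > 0
\\
c_n \ge T_4 \iff n \ge \frac{T_4 - V_8}{V_7}
\quad\Longrightarrow\quad
N = \left\lceil \frac{T_4 - V_8}{V_7} \right\rceil
```

For example, with the assumed values V_8 = 0, V_7 = 1 and T_4 = 5, state information is first recorded on the fifth consecutive clock interrupt without a reset. The same reasoning gives the decrementing variant with the roles of the reset value and the threshold exchanged.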
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; details are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that described herein, or the modules or steps may be fabricated separately as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A method of monitoring, comprising:
monitoring the running state of each processor in the watchdog system to obtain a monitoring result;
and determining a processor with abnormal running state according to the obtained monitoring result.
2. The method of claim 1, wherein determining a processor with an abnormal running state according to the obtained monitoring result comprises:
judging whether each slave processor in the watchdog system is scheduled within a preset time according to the obtained monitoring result;
and if the judgment result is negative, determining that the running state of the slave processor which is not scheduled is abnormal.
3. The method of claim 2, wherein determining whether each slave processor in the watchdog system is scheduled within a predetermined time based on the obtained monitoring result comprises:
in the timeout processing process of the watchdog timer of the master processor in the watchdog system, subtracting a first preset value from the count value of a first counter of each slave processor in the watchdog system, wherein the count value of the first counter is reset to a second preset value larger than a first preset threshold value in the timeout processing process of the watchdog timer of the corresponding slave processor;
and determining that the slave processor corresponding to the first counter of which the count value obtained by subtracting the first preset value is less than or equal to the first preset threshold value is not scheduled in the preset time.
4. The method of claim 2, wherein determining whether each slave processor in the watchdog system is scheduled within a predetermined time based on the obtained monitoring result comprises:
adding a third preset value to the count value of a second counter of each slave processor in the watchdog system in the timeout processing process of the watchdog timer of the master processor in the watchdog system, wherein the count value of the second counter is reset to a fourth preset value smaller than a second preset threshold in the timeout processing process of the watchdog timer of the corresponding slave processor;
and determining that the slave processor corresponding to the second counter with the count value added with the third preset value being greater than or equal to the second preset threshold value is not scheduled in the preset time.
5. The method according to any one of claims 1 to 4, wherein, before determining a processor with an abnormal running state according to the obtained monitoring result, the method further comprises:
subtracting a fifth preset value from a count value of a third counter of a main processor in a clock interrupt process of the main processor in the watchdog system, wherein the count value of the third counter is reset to a sixth preset value larger than a third preset threshold value in a timeout processing process of a watchdog timer of the main processor; recording the state information of the main processor under the condition that the count value of the third counter after subtracting the fifth preset value is less than or equal to the third preset threshold value;
or,
subtracting the fifth predetermined value from a count value of a fourth counter of a slave processor in the watchdog system during a clock interrupt of the slave processor, wherein the count value of the fourth counter is reset to a sixth predetermined value greater than a third predetermined threshold during a watchdog timer timeout process of the slave processor; and recording the state information of the slave processor when the count value of the fourth counter after subtracting the fifth predetermined value is less than or equal to the third predetermined threshold.
6. The method according to any one of claims 1 to 4, wherein, before determining a processor with an abnormal running state according to the obtained monitoring result, the method further comprises:
adding a seventh preset value to a count value of a fifth counter of a main processor in a clock interrupt process of the main processor in the watchdog system, wherein the count value of the fifth counter is reset to an eighth preset value smaller than a fourth preset threshold value in a watchdog timer timeout processing process of the main processor; recording the state information of the main processor under the condition that the count value of the fifth counter after the seventh preset value is added is greater than or equal to the fourth preset threshold value;
or,
adding the seventh predetermined value to a count value of a sixth counter of a slave processor in the watchdog system during a clock interrupt of the slave processor, wherein the count value of the sixth counter is reset to an eighth predetermined value less than a fourth predetermined threshold during a watchdog timer timeout process of the slave processor; and recording the state information of the slave processor when the count value of the sixth counter added with the seventh preset value is greater than or equal to the fourth preset threshold value.
7. A monitoring device, comprising:
the acquisition module is used for monitoring the running state of each processor in the watchdog system to obtain a monitoring result;
and the determining module is used for determining the processor with abnormal running state according to the obtained monitoring result.
8. The apparatus of claim 7, wherein the determining module comprises:
the judging unit is used for judging whether each slave processor in the watchdog system is scheduled within preset time according to the obtained monitoring result;
and the determining unit is used for determining, if the judgment result is negative, that the running state of the slave processor which is not scheduled is abnormal.
9. The apparatus according to claim 8, wherein the judging unit includes:
the first calculating subunit is used for subtracting a first preset value from the count value of a first counter of each slave processor in the watchdog system in the watchdog timer timeout processing process of the master processor in the watchdog system, wherein the count value of the first counter is reset to a second preset value larger than a first preset threshold value in the watchdog timer timeout processing process of the corresponding slave processor;
and the first determining subunit is used for determining that the slave processor corresponding to the first counter of which the count value obtained by subtracting the first preset value is less than or equal to the first preset threshold value is not scheduled in the preset time.
10. The apparatus according to claim 8, wherein the judging unit includes:
the second calculating subunit is configured to add a third predetermined value to a count value of a second counter of each slave processor in the watchdog system in the watchdog timer timeout processing procedure of the master processor in the watchdog system, where the count value of the second counter is reset to a fourth predetermined value smaller than the second predetermined threshold in the watchdog timer timeout processing procedure of the corresponding slave processor;
and the second determining subunit is used for determining that the slave processor corresponding to the second counter of which the count value added with the third preset value is greater than or equal to the second preset threshold value is not scheduled in the preset time.
11. The apparatus of any one of claims 7 to 10, further comprising:
the first calculating module is used for subtracting a fifth preset value from a count value of a third counter of a main processor in the watchdog system in a clock interrupt process of the main processor, wherein the count value of the third counter is reset to a sixth preset value larger than a third preset threshold in a timeout processing process of a watchdog timer of the main processor; the first recording module is used for recording the state information of the main processor under the condition that the count value of the third counter after subtracting the fifth preset value is smaller than or equal to the third preset threshold value;
or,
the second calculating module is used for subtracting the fifth preset value from the count value of a fourth counter of the slave processor in the watchdog system during the clock interrupt process of the slave processor, wherein the count value of the fourth counter is reset to a sixth preset value which is larger than a third preset threshold value during the timeout process of the watchdog timer of the slave processor; and the second recording module is used for recording the state information of the slave processor under the condition that the count value of the fourth counter after the fifth preset value is subtracted is less than or equal to the third preset threshold value.
12. The apparatus of any one of claims 7 to 10, further comprising:
a third calculating module, configured to add a seventh predetermined value to a count value of a fifth counter of a main processor in the watchdog system during a clock interrupt process of the main processor, where the count value of the fifth counter is reset to an eighth predetermined value smaller than a fourth predetermined threshold during a timeout processing procedure of a watchdog timer of the main processor; a third recording module, configured to record the state information of the main processor when the count value of the fifth counter after the seventh predetermined value is added is greater than or equal to the fourth predetermined threshold;
or,
a fourth calculating module, configured to add the seventh predetermined value to a count value of a sixth counter of the slave processor in the watchdog system during a clock interrupt of the slave processor, where the count value of the sixth counter is reset to an eighth predetermined value smaller than a fourth predetermined threshold during timeout processing of the watchdog timer of the slave processor; and the fourth recording module is used for recording the state information of the slave processor under the condition that the count value of the sixth counter after the seventh preset value is added is greater than or equal to the fourth preset threshold value.
13. A watchdog system comprising the apparatus of any one of claims 7 to 12.
CN201610443850.8A 2016-06-20 2016-06-20 Monitoring method, device and watchdog system Pending CN107526646A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610443850.8A CN107526646A (en) 2016-06-20 2016-06-20 Monitoring method, device and watchdog system
PCT/CN2017/086710 WO2017219834A1 (en) 2016-06-20 2017-05-31 Monitoring method and device, and watchdog system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610443850.8A CN107526646A (en) 2016-06-20 2016-06-20 Monitoring method, device and watchdog system

Publications (1)

Publication Number Publication Date
CN107526646A true CN107526646A (en) 2017-12-29

Family

ID=60734663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610443850.8A Pending CN107526646A (en) 2016-06-20 2016-06-20 Monitoring method, device and watchdog system

Country Status (2)

Country Link
CN (1) CN107526646A (en)
WO (1) WO2017219834A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664357A (en) * 2018-05-14 2018-10-16 许继集团有限公司 A kind of embedded device system repair and system based on startup Information Statistics
CN109878533A (en) * 2018-12-29 2019-06-14 百度在线网络技术(北京)有限公司 Monitoring method, automatic Pilot control unit and the storage medium of processing unit
WO2021198804A1 (en) * 2020-03-31 2021-10-07 International Business Machines Corporation Partial computer processor core shutoff
CN113515429A (en) * 2021-07-28 2021-10-19 深圳忆联信息系统有限公司 Multi-core abnormity monitoring method and device for solid state disk, computer equipment and storage medium
CN113656211A (en) * 2021-08-24 2021-11-16 南方电网数字电网研究院有限公司 Watchdog control method and system based on dual-CPU multi-core system
CN114200874A (en) * 2022-02-17 2022-03-18 四川创智联恒科技有限公司 Device and method for detecting equipment reset event

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806130B (en) * 2021-09-22 2023-08-08 广州通则康威智能科技有限公司 Watchdog period self-adaption method, device, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1400529A (en) * 2001-07-30 2003-03-05 华为技术有限公司 Fault location method of real-time embedding system
CN1862499A (en) * 2005-09-15 2006-11-15 上海华为技术有限公司 Main-standby protection method for multi-processor device units
WO2008058435A1 (en) * 2006-11-16 2008-05-22 Zte Corporation A method of mobile terminal with double processors mornitoring and controlling the working status processor
CN101452420A (en) * 2008-12-30 2009-06-10 中兴通讯股份有限公司 Embedded software abnormal monitoring and handling arrangement and method thereof
CN104407927A (en) * 2014-11-11 2015-03-11 南京科远自动化集团股份有限公司 Circuit and method for monitoring synchronous running state of processor
CN105260255A (en) * 2015-10-10 2016-01-20 中国兵器工业集团第二一四研究所苏州研发中心 Method for implementing watchdog on system on chip with multiple processor cores

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011002993A (en) * 2009-06-18 2011-01-06 Toyota Motor Corp Watchdog timer monitoring device, and watchdog timer monitoring method
CN102073572B (en) * 2009-11-24 2015-10-21 中兴通讯股份有限公司 For method for supervising and the system of polycaryon processor
CN103870350A (en) * 2014-03-27 2014-06-18 浪潮电子信息产业股份有限公司 Microprocessor multi-core strengthening method based on watchdog

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1400529A (en) * 2001-07-30 2003-03-05 华为技术有限公司 Fault location method of real-time embedding system
CN1862499A (en) * 2005-09-15 2006-11-15 上海华为技术有限公司 Main-standby protection method for multi-processor device units
WO2008058435A1 (en) * 2006-11-16 2008-05-22 Zte Corporation A method of mobile terminal with double processors mornitoring and controlling the working status processor
CN101188828A (en) * 2006-11-16 2008-05-28 中兴通讯股份有限公司 Method for dual-processor mobile terminal to work status of process slave processor
CN101452420A (en) * 2008-12-30 2009-06-10 中兴通讯股份有限公司 Embedded software abnormal monitoring and handling arrangement and method thereof
CN104407927A (en) * 2014-11-11 2015-03-11 南京科远自动化集团股份有限公司 Circuit and method for monitoring synchronous running state of processor
CN105260255A (en) * 2015-10-10 2016-01-20 中国兵器工业集团第二一四研究所苏州研发中心 Method for implementing watchdog on system on chip with multiple processor cores

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664357A (en) * 2018-05-14 2018-10-16 许继集团有限公司 A kind of embedded device system repair and system based on startup Information Statistics
CN108664357B (en) * 2018-05-14 2021-07-13 许继集团有限公司 Embedded equipment system repairing method and system based on starting information statistics
CN109878533A (en) * 2018-12-29 2019-06-14 百度在线网络技术(北京)有限公司 Monitoring method, automatic Pilot control unit and the storage medium of processing unit
CN109878533B (en) * 2018-12-29 2020-12-08 百度在线网络技术(北京)有限公司 Monitoring method for processing unit, automatic driving control unit and storage medium
WO2021198804A1 (en) * 2020-03-31 2021-10-07 International Business Machines Corporation Partial computer processor core shutoff
US11281474B2 (en) 2020-03-31 2022-03-22 International Business Machines Corporation Partial computer processor core shutoff
GB2609790A (en) * 2020-03-31 2023-02-15 Ibm Partial computer processor core shutoff
CN113515429A (en) * 2021-07-28 2021-10-19 深圳忆联信息系统有限公司 Multi-core abnormity monitoring method and device for solid state disk, computer equipment and storage medium
CN113656211A (en) * 2021-08-24 2021-11-16 南方电网数字电网研究院有限公司 Watchdog control method and system based on dual-CPU multi-core system
CN114200874A (en) * 2022-02-17 2022-03-18 四川创智联恒科技有限公司 Device and method for detecting equipment reset event

Also Published As

Publication number Publication date
WO2017219834A1 (en) 2017-12-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171229
