WO2022228012A1 - Exception handling method, apparatus and system, and electronic device

Info

Publication number
WO2022228012A1
Authority
WO
WIPO (PCT)
Prior art keywords
verification information
program
monitored
monitored device
task processing
Prior art date
Application number
PCT/CN2022/084143
Other languages
French (fr)
Chinese (zh)
Inventor
宗诗皓
丁进超
郑尧成
李怡康
Original Assignee
上海商汤智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2022228012A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing

Definitions

  • the present disclosure relates to the technical field of program exception monitoring, and in particular, to an exception processing method, device, electronic device, and system.
  • the present disclosure provides an exception handling method, device, electronic device and system.
  • an exception handling method is provided, the method is applied to a monitored device, and the method includes:
  • the verification information is continuously generated based on the task processing progress of the program running on the monitored device, wherein different task processing progress corresponds to different verification information;
  • an exception handling operation is performed.
  • the task processing progress includes a plurality of nodes
  • the verification information is generated based on the task processing progress of the program running on the monitored device, including:
  • the method further includes:
  • the monitored device is restarted based on a trigger signal output by the monitoring device, wherein the trigger signal is output when the monitoring device determines that the program runs abnormally.
  • the monitored device is provided with a relay, the relay is used to control the on-off of a reset switch of the monitored device, and the monitored device is restarted based on a trigger signal output by the monitoring device, including:
  • the state of the relay is changed based on the trigger signal output by the monitoring device, so that the relay turns on the reset switch of the monitored device.
  • the monitoring device and the monitored device are powered by different power sources.
  • an exception handling method is provided, the method is applied to a monitoring device, and the method includes:
  • the task processing progress includes multiple nodes, and for each node in the multiple nodes, verification information corresponding to the node is generated after the task processing progress reaches the node.
  • performing abnormality diagnosis based on the verification information includes:
  • executing an exception handling operation when it is determined that the program runs abnormally including:
  • a trigger signal is output to trigger the monitored device to restart.
  • the monitored device is provided with a relay, and the relay is used to control the on-off of a reset switch of the monitored device, and output a trigger signal to trigger the monitored device to restart, including:
  • a trigger signal is output to the relay, so that the relay turns on the reset switch of the monitored device.
  • an exception processing apparatus is provided, the apparatus is applied to a monitored device, and the apparatus includes:
  • a verification information generation module configured to continuously generate verification information based on the task processing progress of the program running on the monitored device during the operation of the monitored device, wherein different task processing progress corresponds to different verification information;
  • a sending module configured to send the verification information to a monitoring device, so that the monitoring device performs abnormal diagnosis based on the received verification information
  • a first processing module configured to perform an exception handling operation in response to determining that the program runs abnormally.
  • an exception processing apparatus is provided, the apparatus is applied to a monitoring device, and the apparatus includes:
  • a receiving module configured to receive verification information continuously sent by the monitored device during operation, wherein the verification information is generated based on the task processing progress of the program running on the monitored device, and different task processing progress corresponds to different verification information;
  • an abnormality diagnosis module for performing abnormality diagnosis based on the verification information
  • the second processing module is configured to perform an exception processing operation when it is determined that the program runs abnormally.
  • an electronic device includes a processor and a memory, the memory stores a computer program executable by the processor, and when the processor executes the computer program, the exception handling method mentioned in the first aspect or the second aspect is implemented.
  • an exception handling system including a monitored device and a monitoring device;
  • the monitored device is used to continuously generate verification information based on the task processing progress of the program running on the monitored device during the running process, wherein different task processing progress corresponds to different verification information, and to send the verification information to the monitoring device;
  • the monitoring device is configured to perform abnormality diagnosis based on the verification information, and execute abnormality processing operations when it is determined that the program runs abnormally.
  • the traditional approach monitors exceptions by sending static verification information, so an exception cannot be detected when the program is stuck in an infinite loop.
  • the monitored device can continuously generate verification information based on the task processing progress of the running program, and send the generated verification information to the monitoring device. Since different task processing progress corresponds to different verification information, any two pieces of verification information are different as long as the program on the monitored device runs normally. The monitoring device can therefore detect abnormal operation of the program, such as entering an infinite loop, from the received verification information, and perform the corresponding exception handling.
  • the embodiment of the present disclosure can monitor the program more comprehensively and reliably.
  • FIG. 1 is a flowchart of an exception processing method according to an embodiment of the present disclosure.
  • FIG. 2 is a flowchart of an exception processing method according to an embodiment of the present disclosure.
  • FIG. 3 is a logical structural block diagram of an exception processing apparatus according to an embodiment of the present disclosure.
  • FIG. 4 is a logical structural block diagram of an exception processing apparatus according to an embodiment of the present disclosure.
  • FIG. 5 is a logical structural block diagram of an electronic device according to an embodiment of the present disclosure.
  • FIG. 6 is a logical structural block diagram of an exception handling system according to an embodiment of the present disclosure.
  • although the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
  • for example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure.
  • depending on the context, the word "if" as used herein can be interpreted as "at the time of", "when", or "in response to determining".
  • Many tasks can be performed automatically by programs or software. For example, the data to be processed can be continuously received by a dedicated data processing program and then processed. Because many tasks take a long time to process (for example, a program may need to run all day to complete its pending tasks), many programs run unattended. However, exceptions inevitably occur while a program is running; they interrupt the task processing and affect the task processing progress. Therefore, it is necessary to monitor the running state of the program that processes the task, detect abnormalities in time, and handle them accordingly.
  • When monitoring the running state of the program processing the task, some monitoring methods directly start an auxiliary program on the monitored terminal and use the auxiliary program to monitor the monitored program for abnormalities. However, this monitoring method becomes ineffective when the device loses power or malfunctions.
  • Some monitoring methods can monitor the program on the monitored terminal through monitoring equipment independent of the monitored terminal, so as to overcome the problem of monitoring failure due to equipment failure. Specifically, the monitored terminal can continuously send verification information to the monitoring device. If the monitoring device does not receive the verification information for a certain period of time, it can be determined that the running state of the monitored program is abnormal.
  • however, when the monitored program is stuck in an infinite loop, the monitored terminal will still send verification information to the monitoring device. The verification information currently sent is static information, such as the software version, the remaining capacity of the hard disk, the program log, the client address, or the application path, so the verification information sent over a period of time is the same.
  • the monitoring device therefore cannot distinguish whether identical verification information is received because the program is stuck in an infinite loop or simply because the static verification information has not changed, so the abnormality of the monitored program cannot be detected and handled in time.
  • an embodiment of the present disclosure provides an exception handling method, which can monitor a program running on a monitored device through a monitoring device that is independent of the monitored device. The monitored device can generate dynamic verification information, which changes with the progress of the task processing. Therefore, as long as the program executing the task runs normally, any two pieces of verification information sent to the monitoring device are different, so that the monitoring device can detect from the received verification information that the program is running abnormally, for example, that it has entered an abnormal state such as an infinite loop.
  • the monitored device in the embodiment of the present disclosure refers to a device running a program that needs to be monitored, and the device may be an electronic device such as a mobile phone, a notebook computer, a server, and a smart wearable device.
  • the monitored program can be used to perform some specific tasks, for example, can be used to process images, or clean and integrate data, and so on.
  • the monitored program may be any code or code set capable of implementing certain specific functions, which is not limited in this embodiment of the present disclosure.
  • the monitoring device in the embodiment of the present disclosure may be a device that is physically independent from the monitored device, and the device has a function of monitoring programs on the monitored device.
  • the monitoring device may be a device with a relatively simple structure and low performance, for example, a single-chip microcomputer that integrates a function of monitoring programs on the monitored device.
  • it can also be a device with high performance that integrates other functions at the same time, for example, it can also be a mobile phone, a computer, a tablet, a smart wearable device, and so on.
  • the processing flow performed by the monitored device includes the following steps:
  • the exception handling method executed by the monitored device may be executed by a monitored program in the monitored device, or may be executed by another program in the monitored device that is independent of the monitored program.
  • the exception handling method may be executed by the data processing program, or may be executed by other programs independent of the data processing program.
  • verification information can be continuously generated based on the task processing progress of the program running on the monitored device, wherein different task processing progress corresponds to different verification information; that is, the verification information changes as the task processing progress of the program changes. If the task processing progress of the program does not change, the verification information does not change either, so that the abnormal state in which the program enters an infinite loop can be detected according to the verification information.
  • the verification information can be sent to the monitoring device.
  • the monitoring device can receive the verification information continuously sent by the monitored device, and then perform abnormality diagnosis on the program running on the monitored device according to the received verification information. If the monitoring device determines, based on the received verification information, that the program running on the monitored device runs abnormally, a corresponding exception handling operation is performed.
  • the monitored device generates verification information based on the task processing progress of the running program, and the verification information changes with the task processing progress. Therefore, as long as the program runs normally, any two pieces of verification information sent by the monitored device are different, and the monitoring device can detect the abnormal state in which the program enters an infinite loop based on the received verification information. At the same time, because the monitored device and the monitoring device are independent devices, an abnormality of the monitored program caused by equipment failure can be detected in time, which is not possible when the monitored program and the monitoring program are located on the same device. Various abnormal situations can thus be covered, and the running state of the monitored program can be monitored more comprehensively and reliably.
  • the verification information may include information identifying the current task processing progress of the program. For example, if the task executed by the program is to process a batch of data, then when the program has processed 1% of the batch of data, the verification information includes information indicating the current progress (e.g., 1%).
  • the verification information may also include other information generated according to the current task processing progress, for example, a string uniquely representing the progress generated based on the current task processing progress, which is not limited in this embodiment of the present disclosure.
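  • as a minimal sketch of verification information that both identifies the progress and carries a string derived from it, consider the following. The function name `make_verification_info`, the field names, and the use of a truncated SHA-256 digest are assumptions made for illustration; the disclosure does not prescribe any particular encoding.

```python
import hashlib


def make_verification_info(task_id: str, done: int, total: int) -> dict:
    """Build one piece of verification information for the current progress.

    Hypothetical helper: the field layout is an assumption, not the patent's.
    """
    if total <= 0 or not 0 <= done <= total:
        raise ValueError("progress out of range")
    # Integer percentage of the task completed so far (e.g. 10 of 1000 -> 1).
    percent = done * 100 // total
    # A string derived from the task and the progress counter; the same
    # progress always yields the same token, different progress a different one
    # (barring hash collisions, which are negligible here).
    token = hashlib.sha256(f"{task_id}:{done}".encode("utf-8")).hexdigest()[:16]
    return {"task": task_id, "done": done, "total": total,
            "percent": percent, "token": token}


if __name__ == "__main__":
    print(make_verification_info("denoise", 10, 1000))
```

  • in this sketch, a program that stops making progress keeps producing identical information, which is exactly what the monitoring device looks for.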
  • multiple nodes may be set in advance for the task processing progress of the monitored program, and the verification information corresponding to a node is generated only when the task processing progress of the monitored program reaches that node. This ensures that when the monitoring device receives the verification information corresponding to a node, the processing task corresponding to that node has been completed.
  • for example, if the task to be processed by the monitored program is to denoise 1000 frames of images, 1000 nodes can be set for the task processing progress.
  • the first node means that the first frame of image has been processed.
  • the second node means that the second frame of image has been processed, and so on.
  • the first piece of verification information is generated and sent by the monitored device after the program has processed the first frame of image, so that when the monitoring device receives a piece of verification information, the task corresponding to the processing progress indicated by that verification information has already been completed.
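  • the node-based generation described above can be sketched as follows, assuming one node per frame. The function `run_with_nodes` and the `node=<index>` message format are hypothetical; `process` and `send` stand in for the real processing routine and the real transmission channel.

```python
from typing import Callable, Iterable


def run_with_nodes(frames: Iterable[bytes],
                   process: Callable[[bytes], None],
                   send: Callable[[str], None]) -> int:
    """Process frames one by one and report a node only after it is reached.

    Returns the number of nodes reported.
    """
    reported = 0
    for index, frame in enumerate(frames, start=1):
        process(frame)         # finish the work belonging to this node first
        send(f"node={index}")  # only then generate and send its verification info
        reported = index
    return reported


if __name__ == "__main__":
    sent = []
    run_with_nodes([b"frame1", b"frame2"], lambda frame: None, sent.append)
    print(sent)
```

  • because `send` is called after `process` returns, a frame whose processing hangs or raises never produces verification information for its node.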
  • the monitored device can continuously send verification information to the monitoring device.
  • the monitored device and the monitoring device can be connected wirelessly, and the verification information can be sent to the monitoring device through wireless communication.
  • the monitoring device and the monitored device may be electrically connected through a hardware communication interface, and the verification information may be sent through the hardware communication interface.
  • the monitoring device and the monitored device can be connected through the serial port, and the verification information is transmitted through the serial port.
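  • a serial link delivers bytes in arbitrary chunks, so the receiving side has to reassemble complete messages. The sketch below assumes newline-terminated ASCII messages, which is an illustrative choice and not something the disclosure specifies; `encode_message` and `LineFramer` are hypothetical names, and no particular serial library is assumed.

```python
from typing import List


def encode_message(info: str) -> bytes:
    """Encode one piece of verification information as a newline-terminated frame."""
    if "\n" in info:
        raise ValueError("verification information must not contain a newline")
    return info.encode("ascii") + b"\n"


class LineFramer:
    """Reassemble complete messages from chunks read off a byte stream."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> List[str]:
        """Add a chunk and return every message completed by it, in order."""
        self._buffer.extend(chunk)
        messages = []
        while True:
            newline = self._buffer.find(b"\n")
            if newline < 0:
                break  # the remaining bytes are an incomplete message
            messages.append(bytes(self._buffer[:newline]).decode("ascii"))
            del self._buffer[:newline + 1]
        return messages


if __name__ == "__main__":
    framer = LineFramer()
    stream = encode_message("node=1") + encode_message("node=2")
    print(framer.feed(stream[:4]), framer.feed(stream[4:]))
```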
  • After receiving the verification information sent by the monitored device, the monitoring device can determine whether the program is abnormal according to the received verification information. In some embodiments, if two or more pieces of verification information received consecutively by the monitoring device are the same, or the task processing progress indicated by them is the same, the program may be stuck in an infinite loop, so it can be determined that the program is abnormal. In some embodiments, if the monitoring device does not receive verification information within a preset duration, the program may also be abnormal (for example, interrupted), so it can likewise be determined that the monitored program is abnormal.
  • the preset duration may be determined based on the maximum task processing duration between two adjacent nodes, for example, the preset duration may be greater than the maximum task processing duration between two adjacent nodes.
  • for example, 100 nodes representing the task processing progress can be set, where the processing task between two adjacent nodes is to process one frame of image. If processing one frame takes 10s, the preset duration must be greater than 10s; for example, it can be set to 15s.
  • a timer may be set on the monitoring device. If the monitoring device receives verification information that differs from the previously received verification information, the timer is reset to zero; otherwise, it is not reset. If the duration counted by the timer exceeds the preset duration, it is determined that the program is abnormal and the subsequent exception handling operation is performed.
  • After the monitoring device determines that the program is abnormal, it can perform the exception handling operation.
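  • the timer-based diagnosis can be sketched as follows. The class name `ProgressWatchdog` and the explicit `now` arguments (which stand in for a real clock so the logic is easy to test) are assumptions for illustration. A single timer covers both cases described above: it expires when no verification information arrives, and also when only repeated information arrives.

```python
from typing import Optional


class ProgressWatchdog:
    """Timer that is reset only by verification information that has changed."""

    def __init__(self, preset_duration: float, now: float = 0.0) -> None:
        self.preset_duration = preset_duration
        self.last_info: Optional[str] = None
        self.timer_started_at = now

    def on_receive(self, info: str, now: float) -> None:
        """Handle one received piece of verification information."""
        if info != self.last_info:
            # Progress was made: remember the information and reset the timer.
            self.last_info = info
            self.timer_started_at = now
        # Identical information does not reset the timer.

    def is_abnormal(self, now: float) -> bool:
        """True once the timer has run longer than the preset duration."""
        return now - self.timer_started_at > self.preset_duration


if __name__ == "__main__":
    watchdog = ProgressWatchdog(preset_duration=15.0)
    watchdog.on_receive("node=1", now=10.0)
    watchdog.on_receive("node=1", now=20.0)  # repeated information
    print(watchdog.is_abnormal(now=26.0))
```

  • with the 10s-per-frame example and a 15s preset duration, a program that keeps resending the same node is flagged once more than 15s have passed since that node was first reported.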
  • the exception handling operation can be set according to actual requirements.
  • the monitoring device can send an alarm message when the running state of the monitored program is abnormal to prompt the user that the monitored program is abnormal so that the user can handle it in time.
  • the monitoring device can issue a sound prompt, or the monitoring device can send instant messaging information, such as SMS, WeChat, QQ, etc., to the user's mobile phone, computer, smart wearable device, or other terminal to remind the user that the running state of the monitored program is abnormal, so that the user can deal with it in time.
  • the monitoring device can also send an exception handling instruction to the monitored device, and the monitored device can then perform the exception handling operation, for example, by sending alarm information from the monitored device.
  • the monitoring device may send an exception handling instruction to the monitored device, and after receiving the exception handling instruction, the monitored device may restart the program or restart the monitored device itself. The monitored program can be set to start automatically when the device starts, so that the program is also re-run when the monitored device is restarted.
  • the monitoring device can send an exception handling instruction to the program running on the monitored device, and then the program executes the instruction to restart the monitored device.
  • however, if the monitored device or the program itself is abnormal (for example, the monitored device has crashed or the program cannot receive instructions), the exception handling instruction cannot be received normally, and the corresponding exception handling operation cannot be performed.
  • the monitoring device and the monitored device can be connected through a hardware interface. After the monitoring device detects that the program runs abnormally, it can output a trigger signal through the hardware interface to trigger the monitored device to restart. Compared with outputting software instructions to control the restart of the monitored device, triggering the restart through a signal output on the hardware interface is more reliable.
  • the monitoring device can output a trigger signal through the hardware interface to change the on-off state of some hardware switches on the monitored device. After the state of these hardware switches is changed, the monitored device, or the software running on it, can be restarted, so that the monitored program can resume normal operation.
  • a relay can be provided on the monitored device, and the relay can be used to change the on-off state of the reset switch on the monitored device. After the monitoring device detects that the program is running abnormally, it can output a trigger signal through a hardware interface. The trigger signal can change the state of the relay (for example, voltage, current or frequency), so that the relay turns on the reset switch, thereby causing the monitored device to restart.
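  • the relay-driven restart can be sketched as a short pulse. Everything hardware-specific here is an assumption: `set_relay` is a hypothetical callable that drives whatever output pin controls the relay, the hold time is arbitrary, and releasing the relay after the pulse reflects the usual behavior of a momentary reset switch rather than anything stated in the disclosure.

```python
import time
from typing import Callable


def pulse_reset(set_relay: Callable[[bool], None],
                hold_seconds: float = 0.5,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Close the relay briefly so that it turns on the reset switch.

    set_relay(True) energizes the relay, set_relay(False) releases it.
    """
    set_relay(True)
    try:
        sleep(hold_seconds)  # keep the reset switch on long enough to register
    finally:
        set_relay(False)     # always release so the monitored device can boot


if __name__ == "__main__":
    pulse_reset(lambda closed: print("relay closed:", closed), hold_seconds=0.1)
```

  • the `sleep` parameter exists only so that the timing can be replaced in tests; on real hardware the default `time.sleep` would be used.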
  • an alarm message can also be sent to prompt the user to come over to deal with the abnormality in time.
  • the monitoring device and the monitored device may be powered by different power sources. In this way, it can be avoided that the monitoring fails due to the failure of the power supply when the two use the same power supply for power supply.
  • the processing flow performed by the monitoring device specifically includes the following steps:
  • S206 Execute an exception handling operation when it is determined that the program runs abnormally.
  • the exception handling method executed by the monitoring device may be executed by a specific application program installed on the monitoring device.
  • the exception handling function can also be built in the monitoring device.
  • the monitoring device can receive verification information continuously sent by the monitored device, where the verification information is generated based on the task processing progress of the program running on the monitored device, and the verification information corresponding to different processing progress is also different. After receiving the verification information, the monitoring device can perform abnormality diagnosis according to the received verification information, and execute abnormality processing operations when it is determined that the program is running abnormally.
  • the monitoring device can also detect the abnormal state that the program enters an infinite loop based on the received verification information.
  • because the monitored device and the monitoring device are independent devices, an abnormality of the monitored program caused by equipment failure can be detected in time, which is not possible when the monitored program and the monitoring program are located on the same device. Various abnormal situations can thus be covered, and the running state of the monitored program can be monitored more comprehensively and reliably.
  • the task processing progress includes multiple nodes, and for each node in the multiple nodes, verification information corresponding to the node is generated after the task processing progress reaches the node.
  • the abnormality diagnosis is performed based on the received verification information, including:
  • the monitoring device and the monitored device are connected through a hardware interface (in other embodiments, the monitoring device and the monitored device may also be connected wirelessly), and executing an exception handling operation when it is determined that the program is running abnormally includes:
  • a trigger signal is output through the hardware interface to trigger the monitored device to restart.
  • the monitored device is provided with a relay, the relay is used to control the on-off of the reset switch of the monitored device, and outputting a trigger signal through the hardware interface to trigger the monitored device to restart includes:
  • a trigger signal is output to the relay through the hardware interface, so that the relay turns on the reset switch of the monitored device.
  • the monitored device is a high-performance computer on which a data processing program runs.
  • the data processing program can automatically receive the data to be processed from the outside, complete the data processing, and then save the processing results locally. Since the entire process of data processing by the data processing program can be run automatically, no human supervision is required.
  • an additional single-chip microcomputer can be configured as a monitoring device.
  • the single-chip microcomputer and the high-performance computer can communicate with each other through the serial port, and the single-chip microcomputer can control the on-off of the reset switch on the motherboard of the high-performance computer.
  • a relay can be set between the reset pin and the ground on the motherboard of a high-performance computer, and the microcontroller can change the state of the relay to control the state of the reset switch on the motherboard to restart the high-performance computer.
  • the verification information can be sent to the single-chip microcomputer through the serial port of the high-performance computer continuously.
  • the verification information may change with the change of the data processing progress.
  • the verification information may include information identifying the current data processing progress.
  • the content of the verification information can be set according to specific data processing tasks.
  • any two pieces of verification information sent by the data processing program to the microcontroller should not be the same, and the verification information can only be generated after the data processing program has processed a piece of data normally. For example, assuming that the current data processing program has processed 1% of the data, it can generate a piece of verification information, and the verification information can indicate that the current processing progress is 1% of all the data processed.
  • a timer can be set on the single-chip microcomputer. If the single-chip microcomputer does not receive the verification information for a preset period of time, or if multiple pieces of verification information received continuously show that the progress of data processing has not changed, it is determined that the data processing program is abnormal. At this time, the single-chip microcomputer will turn on the reset switch on the high-performance computer motherboard, and restart the high-performance computer and the data processing program. Of course, if the single-chip microcomputer fails to restart the high-performance computer and the data processing program several times in a row, an alarm message can be sent to the maintenance personnel.
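  • the escalation from restarting to alarming can be sketched as a small counter. The class `RestartSupervisor`, its callbacks, and the default limit of three attempts are hypothetical; the disclosure only says that an alarm can be sent after several consecutive failed restarts. Here a restart counts as failed until new progress is observed afterwards.

```python
from typing import Callable


class RestartSupervisor:
    """Restart on abnormality; raise an alarm after repeated failed restarts."""

    def __init__(self, restart: Callable[[], None],
                 alarm: Callable[[str], None],
                 max_failed_restarts: int = 3) -> None:
        self.restart = restart
        self.alarm = alarm
        self.max_failed_restarts = max_failed_restarts
        self.failed_restarts = 0  # restarts not yet followed by new progress

    def on_progress(self) -> None:
        """New verification information arrived: the last restart succeeded."""
        self.failed_restarts = 0

    def on_abnormal(self) -> str:
        """Handle a detected abnormality and return the action taken."""
        if self.failed_restarts >= self.max_failed_restarts:
            self.alarm("restart did not restore the data processing program")
            return "alarm"
        self.restart()
        self.failed_restarts += 1
        return "restart"


if __name__ == "__main__":
    supervisor = RestartSupervisor(lambda: print("restarting"), print,
                                   max_failed_restarts=2)
    for _ in range(3):
        print(supervisor.on_abnormal())
```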
  • the single-chip microcomputer and the high-performance computer can use different power supplies.
  • the monitored data processing program can be comprehensively monitored, and the abnormality caused by the power supply, the high-performance computer, and each link of the data processing program can be detected in time, and the corresponding abnormality processing can be carried out.
  • restarting the high-performance computer by controlling the hardware switch on the high-performance computer to restore the data processing program to a normal state is more secure and reliable than restarting the high-performance computer under software control.
  • an embodiment of the present disclosure further provides an exception processing apparatus.
  • the apparatus 30 includes:
  • the verification information generation module 31 is used for continuously generating verification information based on the task processing progress of the program running on the monitored device during the operation of the monitored device, wherein different task processing progress corresponds to different verification information;
  • a sending module 32 configured to send the verification information to the monitoring device, so that the monitoring device can perform abnormal diagnosis based on the received verification information
  • the first processing module 33 is configured to perform an exception handling operation in response to determining that the program runs abnormally.
  • the task processing progress includes a plurality of nodes
  • the verification information generation module is configured to:
  • the verification information corresponding to the node is generated.
  • the monitoring device and the monitored device are connected through a hardware interface, and the exception handling apparatus is further configured to:
  • the monitored device is restarted based on a trigger signal output by the monitoring device through the hardware interface, and the trigger signal is output when the monitoring device determines that the program is running abnormally.
  • the monitored device is provided with a relay, and the relay is used to control the on-off of the reset switch of the monitored device, and the abnormality processing device is further configured to:
  • the state of the relay is changed, so that the relay turns on the reset switch of the monitored device.
  • the monitoring device and the monitored device are powered by different power sources.
  • an embodiment of the present disclosure further provides an exception processing apparatus.
  • the apparatus 40 includes:
  • the receiving module 41 is configured to receive the verification information continuously sent by the monitored device during the running process, wherein the verification information is generated based on the task processing progress of the program running on the monitored device, and different task processing progress corresponds to different verification information;
  • an abnormality diagnosis module 42 configured to perform abnormality diagnosis based on the verification information
  • the second processing module 43 is configured to perform an exception processing operation when it is determined that the program runs abnormally.
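  • the task processing progress includes multiple nodes, and for each node in the multiple nodes, verification information corresponding to the node is generated after the task processing progress reaches that node.
  • the abnormality diagnosis module is used to determine that the program runs abnormally when two consecutively received pieces of verification information are identical, or when no verification information is received within a preset time.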
  • the monitoring device and the monitored device are connected through a hardware interface, and when it is determined that the program runs abnormally, an exception handling operation is performed, and the exception handling apparatus is further configured to:
  • a trigger signal is output through the hardware interface to trigger the monitored device to restart.
  • the monitored device is provided with a relay, and the relay is used to control the on-off of a reset switch of the monitored device, and the abnormality processing device is used for:
  • a trigger signal is output to the relay through the hardware interface, so that the relay turns on the reset switch of the monitored device.
  • an embodiment of the present disclosure further provides an electronic device.
  • the electronic device includes a processor 51 and a memory 52 , and the memory 52 stores a computer program executable by the processor 51 .
  • the processor 51 executes the computer program, any one of the processing methods in the foregoing embodiments is implemented.
  • an embodiment of the present disclosure also provides an exception handling system, as shown in FIG. 6 , which is a schematic diagram of an exception handling system in an embodiment of the present disclosure, and the system includes a monitored device and a monitoring device;
  • the monitored device is used to generate verification information based on the task processing progress of the program running on the monitored device during the running process, wherein different task processing progress corresponds to different verification information, and to send the verification information to the monitoring device;
  • the monitoring device is configured to perform abnormality diagnosis based on the received verification information, and perform abnormality processing operations when it is determined that the program is abnormally running.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, implements the exception handling method described in any of the foregoing embodiments.
  • Computer-readable media includes both persistent and non-permanent, removable and non-removable media, and storage of information may be implemented by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • a typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, email sending and receiving device, game console, tablet, wearable device, or a combination of any of these devices.
  • each embodiment in this specification is described in a progressive manner, and the same and similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from other embodiments.
  • the description is relatively simple, and reference may be made to the partial description of the method embodiment for related parts.
  • the device embodiments described above are only illustrative, wherein the modules described as separate components may or may not be physically separated.
  • the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution in this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Embodiments of the present description provide an exception handling method, apparatus and system, and an electronic device. A monitored device can continuously generate verification information on the basis of the task handling progress of a running program, and send the generated verification information to a monitoring device. Because different task handling progress corresponds to different verification information, any two pieces of verification information differ as long as the program on the monitored device runs normally. The monitoring device can therefore detect, from the received verification information, an exception state such as the program entering an endless loop, and handle the exception accordingly.

Description

Exception handling method, apparatus, electronic device and system
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202110447342.8, filed on April 25, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of program exception monitoring, and in particular to an exception handling method, apparatus, electronic device and system.
Background
Many tasks are now processed automatically by programs. Because a program typically runs unattended while executing a task, its running state needs to be monitored, especially when the task takes a long time to process; monitoring allows exceptions that occur while the program is running to be discovered promptly and handled accordingly. If exceptions arising from various causes cannot be detected reliably, the progress of task processing is seriously affected. It is therefore necessary to provide a more reliable solution for detecting exceptions in program operation and handling them.
Summary
The present disclosure provides an exception handling method, apparatus, electronic device and system.
According to a first aspect of the embodiments of the present disclosure, an exception handling method is provided. The method is applied to a monitored device and includes:
during operation of the monitored device, continuously generating verification information based on the task processing progress of a program running on the monitored device, wherein different task processing progress corresponds to different verification information;
sending the verification information to a monitoring device, so that the monitoring device performs abnormality diagnosis based on the verification information; and
in response to determining that the program is running abnormally, performing an exception handling operation.
In some embodiments, the task processing progress includes a plurality of nodes, and generating the verification information based on the task processing progress of the program running on the monitored device includes:
for any node, in response to determining that the task processing progress has reached the node, generating the verification information corresponding to the node.
In some embodiments, the method further includes:
restarting the monitored device based on a trigger signal output by the monitoring device, wherein the trigger signal is output when the monitoring device determines that the program is running abnormally.
In some embodiments, the monitored device is provided with a relay for switching a reset switch of the monitored device on and off, and restarting the monitored device based on the trigger signal output by the monitoring device includes:
changing the state of the relay based on the trigger signal output by the monitoring device, so that the relay turns on the reset switch of the monitored device.
In some embodiments, the monitoring device and the monitored device are powered by different power sources.
According to a second aspect of the embodiments of the present disclosure, an exception handling method is provided. The method is applied to a monitoring device and includes:
receiving verification information sent by a monitored device, wherein the verification information is generated based on the task processing progress of a program running on the monitored device, and different task processing progress corresponds to different verification information;
performing abnormality diagnosis based on the verification information; and
performing an exception handling operation when it is determined that the program is running abnormally.
In some embodiments, the task processing progress includes a plurality of nodes, and for each of the nodes, the verification information corresponding to the node is generated after the task processing progress reaches that node.
In some embodiments, performing abnormality diagnosis based on the verification information includes:
determining that the program is running abnormally when two consecutively received pieces of verification information are identical, or when no verification information has been received for longer than a preset time.
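As a non-limiting illustration, the diagnosis rule above (two identical consecutive messages, or silence longer than a preset time) might be sketched in Python as follows. The class name `ProgressWatchdog`, the parameter `timeout_s` and the injectable `clock` are hypothetical names chosen for this sketch and do not appear in the disclosure.

```python
import time
from typing import Callable, Optional


class ProgressWatchdog:
    """Hypothetical monitor-side check: repeated message or prolonged silence."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s          # the "preset time" (assumed to be in seconds)
        self.clock = clock                  # injectable so the logic can be tested
        self.last_message: Optional[str] = None
        self.last_seen = clock()

    def on_message(self, message: str) -> bool:
        """Record a received message; return True if it repeats the previous one."""
        repeated = message == self.last_message
        self.last_message = message
        self.last_seen = self.clock()
        return repeated

    def timed_out(self) -> bool:
        """Return True if no message has arrived for longer than timeout_s."""
        return self.clock() - self.last_seen > self.timeout_s
```

In such a sketch the monitoring device would call `on_message` for every received piece of verification information and poll `timed_out` periodically; either call returning `True` would correspond to determining that the program is running abnormally.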
In some embodiments, performing an exception handling operation when it is determined that the program is running abnormally includes:
when it is determined that the program is running abnormally, outputting a trigger signal to trigger a restart of the monitored device.
In some embodiments, the monitored device is provided with a relay for switching a reset switch of the monitored device on and off, and outputting the trigger signal to trigger a restart of the monitored device includes:
outputting the trigger signal to the relay, so that the relay turns on the reset switch of the monitored device.
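One possible way for the monitoring device to drive the relay is a short pulse, sketched below in Python. The disclosure does not specify the signal shape, so the pulse, the default hold time of 0.5 seconds, and the names `pulse_reset_relay` and `set_relay` are all assumptions made for illustration; `set_relay` stands in for whatever hardware-specific call actually drives the output pin.

```python
import time
from typing import Callable


def pulse_reset_relay(set_relay: Callable[[bool], None],
                      hold_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Energize the relay briefly so that it turns on the reset switch, then release it."""
    set_relay(True)              # trigger signal: relay turns on the reset switch
    try:
        sleep(hold_s)            # hold long enough for the reset to register (assumed value)
    finally:
        set_relay(False)         # always release the relay, even if the wait is interrupted
```

Passing the pin driver in as a callable keeps the sketch independent of any particular microcontroller or GPIO library.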
According to a third aspect of the embodiments of the present disclosure, an exception handling apparatus is provided. The apparatus is applied to a monitored device and includes:
a verification information generation module, configured to continuously generate, during operation of the monitored device, verification information based on the task processing progress of a program running on the monitored device, wherein different task processing progress corresponds to different verification information;
a sending module, configured to send the verification information to a monitoring device, so that the monitoring device performs abnormality diagnosis based on the received verification information; and
a first processing module, configured to perform an exception handling operation in response to determining that the program is running abnormally.
According to a fourth aspect of the embodiments of the present disclosure, an exception handling apparatus is provided. The apparatus is applied to a monitoring device and includes:
a receiving module, configured to receive verification information continuously sent by a monitored device during operation, wherein the verification information is generated based on the task processing progress of a program running on the monitored device, and different task processing progress corresponds to different verification information;
an abnormality diagnosis module, configured to perform abnormality diagnosis based on the verification information; and
a second processing module, configured to perform an exception handling operation when it is determined that the program is running abnormally.
According to a fifth aspect of the embodiments of the present disclosure, an electronic device is provided. The electronic device includes a processor and a memory, the memory stores a computer program executable by the processor, and the processor, when executing the computer program, implements the exception handling method of the first aspect or the second aspect.
According to a sixth aspect of the embodiments of the present disclosure, an exception handling system is provided, including a monitored device and a monitoring device.
The monitored device is configured to continuously generate, during operation, verification information based on the task processing progress of a program running on the monitored device, wherein different task processing progress corresponds to different verification information, and to send the verification information to the monitoring device.
The monitoring device is configured to perform abnormality diagnosis based on the verification information, and to perform an exception handling operation when it is determined that the program is running abnormally.
Conventional approaches monitor for exceptions by sending static verification information, and therefore cannot detect an exception when the program is stuck in an infinite loop. In the embodiments of the present disclosure, the monitored device can continuously generate verification information based on the task processing progress of the running program and send the generated verification information to the monitoring device. Because different task processing progress corresponds to different verification information, any two pieces of verification information differ as long as the program on the monitored device is running normally. The monitoring device can therefore detect from the received verification information that the program is running abnormally, for example that it has entered an infinite loop, and handle the exception accordingly. Compared with the conventional approach of sending static verification information, the embodiments of the present disclosure can monitor the program more comprehensively and reliably.
It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure.
FIG. 1 is a flowchart of an exception handling method according to an embodiment of the present disclosure.
FIG. 2 is a flowchart of an exception handling method according to an embodiment of the present disclosure.
FIG. 3 is a block diagram of the logical structure of an exception handling apparatus according to an embodiment of the present disclosure.
FIG. 4 is a block diagram of the logical structure of an exception handling apparatus according to an embodiment of the present disclosure.
FIG. 5 is a block diagram of the logical structure of an electronic device according to an embodiment of the present disclosure.
FIG. 6 is a block diagram of the logical structure of an exception handling system according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments are described in detail herein, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The singular forms "a", "said" and "the" used in the present disclosure and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality.
It should be understood that although the terms first, second, third and so on may be used in the present disclosure to describe various pieces of information, the information should not be limited by these terms. These terms are only used to distinguish pieces of information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when" or "in response to determining".
To help those skilled in the art better understand the technical solutions in the embodiments of the present disclosure, and to make the above objects, features and advantages of the embodiments more apparent, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
Many tasks can be executed automatically by purpose-built programs or software. For example, a dedicated data processing program can continuously receive data to be processed and then process it. Because many tasks take a long time to process (for example, a program may need to run all day to complete its pending tasks), many programs run unattended. However, exceptions inevitably occur while a program is running; they interrupt task processing and affect its progress. The running state of the program that processes the task therefore needs to be monitored so that exceptions are discovered promptly and handled accordingly.
When the running state of a task-processing program is monitored, some approaches start an auxiliary program directly on the monitored end and use it to monitor the monitored program for exceptions, but this kind of monitoring fails as soon as the device loses power or malfunctions. Other approaches use a monitoring device that is independent of the monitored end to monitor the program on the monitored end, which overcomes the problem of monitoring failing because of a device fault. Specifically, the monitored end continuously sends verification information to the monitoring device, and if the monitoring device has not received verification information for more than a certain period, it determines that the running state of the monitored program is abnormal. However, when the monitored program is stuck in an infinite loop, the monitored end still keeps sending verification information to the monitoring device. The verification information sent at present is static information such as the software version, remaining hard disk capacity, program log, client address or application path, so the verification information stays the same over a period of time. In this case the monitoring device cannot tell whether the monitored device is sending identical verification information because the program is stuck in an infinite loop or simply because the static verification information has not changed, and so it cannot detect the exception in the monitored program in time and handle it accordingly.
Based on this, an embodiment of the present disclosure provides an exception handling method in which a monitoring device that is independent of the monitored device monitors the program running on the monitored device. The monitored device can generate dynamic verification information according to the task processing progress of the program, and this verification information changes as the task processing progress changes. Consequently, as long as the program executing the task is running normally, any two pieces of verification information it sends to the monitoring device differ, so the monitoring device can detect from the received verification information that the program is running abnormally, for example that it has entered an abnormal state such as an infinite loop.
The monitored device in the embodiments of the present disclosure is a device running a program that needs to be monitored; it may be an electronic device such as a mobile phone, notebook computer, server or smart wearable device. The monitored program can be used to perform specific tasks, for example processing images, or cleaning and integrating data. The monitored program may be any code or collection of code capable of implementing specific functions, which is not limited in the embodiments of the present disclosure.
The monitoring device in the embodiments of the present disclosure may be a device that is physically independent of the monitored device and has the function of monitoring the program on the monitored device. In some embodiments, the monitoring device may be a device with a relatively simple structure and relatively low performance, for example a single-chip microcomputer that integrates only the function of monitoring the program on the monitored device. It may also be a higher-performance device that integrates other functions as well, for example a mobile phone, computer, tablet or smart wearable device.
The exception handling method in the embodiments of the present disclosure is described below from the perspective of the monitored device and of the monitoring device, respectively.
As shown in FIG. 1, the processing flow performed by the monitored device includes the following steps:
S102: during operation of the monitored device, continuously generating verification information based on the task processing progress of the program running on the monitored device, wherein different task processing progress corresponds to different verification information;
S104: sending the generated verification information to the monitoring device, so that the monitoring device performs abnormality diagnosis based on the received verification information;
S106: in response to determining that the program is running abnormally, performing an exception handling operation.
The exception handling method performed by the monitored device may be executed by the monitored program on the monitored device, or by another program on the monitored device that is independent of the monitored program. For example, if the monitored program is a data processing program, the exception handling method may be executed by that data processing program or by another program independent of it.
Once the monitored device is running, verification information can be generated continuously based on the task processing progress of the program running on it. Different task processing progress corresponds to different verification information; that is, the verification information changes as the program's progress through the task changes. If the task processing progress of the program does not change, the verification information does not change either, so the abnormal state in which the program has entered an infinite loop can be detected from the verification information.
Each time the monitored device generates a piece of verification information based on the task processing progress, it can send that verification information to the monitoring device. The monitoring device receives the verification information continuously sent by the monitored device and performs abnormality diagnosis on the program running on the monitored device according to the received verification information. If the monitoring device determines, based on the received verification information, that the program running on the monitored device is running abnormally, a corresponding exception handling operation is performed.
Because the monitored device generates the verification information based on the task processing progress of the running program, and the verification information changes with that progress, any two pieces of verification information sent by the monitored device differ as long as the program is running normally. The monitoring device can therefore detect, from the received verification information, the abnormal state in which the program has entered an infinite loop. In addition, because the monitored device and the monitoring device are independent of each other, abnormalities of the monitored program caused by a device fault can be detected promptly, unlike the case where the monitored program and the monitoring program reside on the same device. Various abnormal situations are thus covered, and the running state of the monitored program is monitored more comprehensively and reliably.
In some embodiments, the verification information may include information identifying the program's current task processing progress. For example, if the task executed by the program is to process a batch of data, then when the program has processed 1% of the batch, information representing the current progress (for example, 1%) can be added to the verification information. The verification information may also include other information generated from the current task processing progress, for example a character string generated from the current progress that uniquely represents it, which is not limited in the embodiments of the present disclosure.
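By way of example only, verification information that carries the current progress together with a string derived from it might be built as in the Python sketch below. The function name `make_verification`, the message layout and the use of a truncated SHA-256 digest are assumptions for illustration; the disclosure does not prescribe any particular format.

```python
import hashlib


def make_verification(task_id: str, done: int, total: int) -> str:
    """Build a message that changes whenever the progress (done/total) changes."""
    progress = f"{task_id}:{done}/{total}"                       # human-readable progress
    digest = hashlib.sha256(progress.encode("utf-8")).hexdigest()[:8]
    # The progress text alone already makes messages distinct; the digest is an
    # illustrative "string derived from the progress" as mentioned in the text.
    return f"{progress}:{digest}"
```

Because the progress text is part of the message, two messages are equal only when the task identifier and the progress are equal, which is the property the monitoring device relies on.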
In some embodiments, multiple nodes may be set in advance for the task processing progress of the monitored program. The verification information corresponding to a node is generated only once the task processing progress of the monitored program reaches that node, which ensures that when the monitoring device receives the verification information for a node, the processing task corresponding to that node has already been completed. For example, suppose the task of the monitored program is to denoise 1000 frames of images. Then 1000 nodes can be set for the progress of the task: the first node means that the first frame has been processed, the second node means that the second frame has been processed, and so on. The monitored device generates and sends the first piece of verification information only after the program has finished processing the first frame, so that when the monitoring device receives a piece of verification information, the task corresponding to the progress it indicates has been completed.
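The per-node behaviour described above can be sketched as follows. This is only an illustrative sketch, not the disclosed implementation: the message format `PROGRESS i/n` and the names `make_verification`, `run_task`, `process` and `send` are assumptions introduced here.

```python
from typing import Callable, List


def make_verification(node_index: int, total_nodes: int) -> str:
    """Return a message that differs for every progress node (hypothetical format)."""
    return f"PROGRESS {node_index}/{total_nodes}"


def run_task(items: List[bytes],
             process: Callable[[bytes], None],
             send: Callable[[str], None]) -> None:
    """Process items one by one and report a node only after its work is done."""
    total = len(items)
    for index, item in enumerate(items, start=1):
        process(item)                           # the work for this node finishes first
        send(make_verification(index, total))   # only then is the message emitted
```

Because `send` is called after `process` returns, a message for node i can never be observed before the work for node i is complete, and a program that hangs inside `process` simply stops producing new messages.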
The monitored device can continuously send verification information to the monitoring device. In some embodiments, the monitored device and the monitoring device can be connected wirelessly, and the verification information can be sent to the monitoring device by wireless communication. In other embodiments, to make the transmission of the verification information more reliable, the monitoring device and the monitored device can be electrically connected through a hardware communication interface, and the verification information can be sent through that interface. For example, the monitoring device and the monitored device can be connected through a serial port, and the verification information is transmitted over the serial port.
After receiving the verification information sent by the monitored device, the monitoring device can determine from it whether the program is abnormal. In some embodiments, if two or more consecutively received pieces of verification information are identical, or the task processing progress they indicate is the same, the program may be stuck in an infinite loop, so the program can be determined to be abnormal. In some embodiments, if the monitoring device receives no verification information for longer than a preset duration, the program may also have become abnormal, for example because it was interrupted, so the monitored program can likewise be determined to be abnormal. The preset duration can be determined from the maximum task processing duration between two adjacent nodes; for example, it can be greater than that maximum. For instance, when 100 frames of images are processed, 100 nodes representing the task processing progress can be set, and the processing task between two adjacent nodes is the processing of one frame. If the most time-consuming of the 100 frames takes 10 s to process, the preset duration must be greater than 10 s and can be set to, for example, 15 s.
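The two criteria above, and the choice of a preset duration, can be illustrated with a small sketch. The function names and the 1.5 safety margin are assumptions chosen so that the 10 s example yields 15 s; the text only requires the preset duration to exceed the slowest step.

```python
from typing import Iterable, Optional


def preset_timeout(node_durations_s: Iterable[float], margin: float = 1.5) -> float:
    """Pick a timeout strictly greater than the slowest node-to-node step."""
    if margin <= 1.0:
        raise ValueError("margin must be > 1 so the timeout exceeds the slowest step")
    return max(node_durations_s) * margin


def is_abnormal(previous: Optional[str], current: Optional[str],
                silent_for_s: float, timeout_s: float) -> bool:
    """Abnormal if the latest two messages match or no message arrived in time."""
    repeated = current is not None and current == previous
    timed_out = silent_for_s > timeout_s
    return repeated or timed_out
```

Here `silent_for_s` stands for the time elapsed since the last message was received; how it is measured is left to the monitoring device.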
In some embodiments, a timer can be set on the monitoring device. If the monitoring device receives verification information from the monitored device and that information differs from the previously received verification information, the timer is reset to zero; otherwise the timer is not reset. If the time counted by the timer exceeds the preset duration, the program is determined to be abnormal and the subsequent exception handling operation is performed.
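A minimal sketch of such a timer is given below, assuming a class name `ProgressWatchdog` and an injectable clock, neither of which comes from the disclosure. The clock is injectable only so that the behaviour can be checked without waiting in real time.

```python
import time
from typing import Callable, Optional


class ProgressWatchdog:
    """Timer that is reset only when a *different* verification message arrives."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_message: Optional[str] = None
        self._timer_start = clock()

    def on_message(self, message: str) -> None:
        """Record a received message; identical messages leave the timer running."""
        if message != self._last_message:
            self._last_message = message
            self._timer_start = self._clock()  # reset the timer to zero

    def is_abnormal(self) -> bool:
        """True once the timer has run longer than the preset duration."""
        return self._clock() - self._timer_start > self._timeout_s
```

With this design a program stuck in a loop that keeps resending the same message is caught by the same timeout that catches a program that has gone silent.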
After determining that the program is abnormal, the monitoring device can perform an exception handling operation, which can be set according to actual requirements. In some embodiments, when the running state of the monitored program becomes abnormal, the monitoring device can issue alarm information to notify the user so that the user can handle the problem in time. For example, the monitoring device can emit a sound prompt, or it can send an instant message (such as an SMS, WeChat or QQ message) to the user's mobile phone, computer, smart wearable device or other terminal, indicating that the running state of the monitored program is abnormal. In some embodiments, if the monitoring device has limited capability and cannot implement such an alarm function, for example when the monitoring device is only a single-chip microcomputer, the monitoring device can instead send an exception handling instruction to the monitored device, and the monitored device performs the exception handling operation, for example by issuing the alarm information itself. In some embodiments, after determining that the monitored program is abnormal, the monitoring device can send an exception handling instruction to the monitored device, and on receiving it the monitored device can restart the program or restart itself. The monitored program can be configured to start automatically when the device starts, so that the program runs again when the monitored device is restarted.
After finding that the program is running abnormally, the monitoring device can send an exception handling instruction to the program running on the monitored device, and the program then executes the instruction to restart the monitored device. However, if the monitored device or the program itself is abnormal (for example, the monitored device has crashed, or the program is in a state where it cannot receive instructions), the exception handling instruction cannot be received and the corresponding exception handling operation cannot be performed. In some embodiments, to make the exception handling operation more reliable, the monitoring device and the monitored device can be connected through a hardware interface. After detecting that the program is running abnormally, the monitoring device can output a trigger signal through the hardware interface to trigger the monitored device to restart. Triggering the restart with a signal output through a hardware interface is more reliable than controlling the restart with a software instruction.
In some examples, the monitoring device can output a trigger signal through the hardware interface to change the on-off state of certain hardware switches on the monitored device. Changing the state of these switches can cause the monitored device to restart, or can return the monitored program running on it to a normal running state. In some embodiments, a relay can be provided on the monitored device to switch the reset switch of the monitored device on and off. After detecting that the program is running abnormally, the monitoring device can output a trigger signal through the hardware interface. The trigger signal can change the state of the relay (for example, voltage, current or frequency), so that the relay turns on the reset switch and the monitored device restarts. If the monitoring device triggers a restart of the monitored device several times and every attempt fails, it can also issue alarm information to prompt the user to handle the abnormality in time.
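The restart-then-alarm behaviour can be sketched as below. The hardware access is deliberately abstracted: `pulse_reset`, `device_recovered` and `raise_alarm` are hypothetical callables supplied by the caller (for example, one that drives the relay line), and the limit of three attempts is an assumption, since the text only says "several times".

```python
from typing import Callable


def recover(pulse_reset: Callable[[], None],
            device_recovered: Callable[[], bool],
            raise_alarm: Callable[[str], None],
            max_attempts: int = 3) -> bool:
    """Trigger a hardware reset up to max_attempts times, then fall back to an alarm."""
    for _attempt in range(max_attempts):
        pulse_reset()                 # e.g. drive the relay that closes the reset switch
        if device_recovered():        # e.g. fresh verification information arrived
            return True
    raise_alarm(f"restart failed after {max_attempts} attempts")
    return False
```

Keeping the hardware calls behind plain callables means the same logic can run on a microcontroller port or be exercised with stubs on a desktop machine.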
In some embodiments, the monitoring device and the monitored device may be powered by different power supplies. This avoids the situation in which both share one power supply and a failure of that supply disables the monitoring.
The following describes the method from the perspective of the monitoring device. As shown in FIG. 2, the processing flow performed by the monitoring device includes the following steps:
S202: receive verification information sent by the monitored device, wherein the verification information is generated based on the task processing progress of the program running on the monitored device, and different task processing progress corresponds to different verification information;
S204: perform abnormality diagnosis based on the received verification information;
S206: perform an exception handling operation when it is determined that the program is running abnormally.
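Steps S202 to S206 can be sketched as a single loop. This is an illustrative sketch under assumptions: `receive` is a hypothetical callable that returns the next message or `None` if nothing arrives within the timeout, and the loop stops after the first diagnosed abnormality to keep the example short.

```python
from typing import Callable, Optional


def monitor_until_abnormal(receive: Callable[[float], Optional[str]],
                           handle_exception: Callable[[str], None],
                           timeout_s: float) -> None:
    """Run S202/S204 repeatedly; on the first abnormality run S206 and return."""
    previous: Optional[str] = None
    while True:
        message = receive(timeout_s)      # S202: wait for verification information
        if message is None:               # S204: nothing arrived within the preset duration
            reason = "timeout"
        elif message == previous:         # S204: two consecutive messages are identical
            reason = "repeated"
        else:
            previous = message            # progress advanced, keep monitoring
            continue
        handle_exception(reason)          # S206: perform the exception handling operation
        return
```

A real monitoring device would typically resume the loop after handling the exception rather than return.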
The exception handling method performed by the monitoring device may be executed by a specific application program installed on the monitoring device. The exception handling function may also be built into the monitoring device.
The monitoring device can receive the verification information continuously sent by the monitored device. The verification information is generated based on the task processing progress of the program running on the monitored device, and different processing progress corresponds to different verification information. After receiving the verification information, the monitoring device can perform abnormality diagnosis based on it and perform an exception handling operation when it determines that the program is running abnormally.
Because the verification information changes as the task processing progress changes, any two pieces of verification information sent by the monitored device are different as long as the program runs normally. This lets the monitoring device also detect the abnormal state in which the program has entered an infinite loop. In addition, because the monitored device and the monitoring device are independent of each other, program abnormalities caused by a device failure can be detected in time, unlike the case where the monitored program and the monitoring program reside on the same device. Various abnormal situations can thus be covered, and the running status of the monitored program can be monitored more comprehensively and reliably.
In some embodiments, the task processing progress includes multiple nodes, and for each of the multiple nodes, the verification information corresponding to the node is generated after the task processing progress reaches the node.
In some embodiments, performing abnormality diagnosis based on the received verification information includes:
determining that the program is abnormal when two consecutively received pieces of verification information are identical and/or when no verification information has been received for longer than a preset duration.
In some embodiments, the monitoring device and the monitored device are connected through a hardware interface (in other embodiments, they may also be connected wirelessly), and performing an exception handling operation when it is determined that the program is running abnormally includes:
outputting a trigger signal through the hardware interface when it is determined that the program is running abnormally, so as to trigger the monitored device to restart.
In some embodiments, the monitored device is provided with a relay for switching the reset switch of the monitored device on and off, and outputting a trigger signal through the hardware interface to trigger the monitored device to restart includes:
outputting a trigger signal to the relay through the hardware interface, so that the relay turns on the reset switch of the monitored device.
For the specific implementation details of abnormality detection and exception handling performed by the monitoring device on the monitored device, reference may be made to the descriptions in the embodiments on the monitored device side, which are not repeated here.
To further explain the exception handling method of the embodiments of the present disclosure, a specific embodiment is described below.
As an example, the monitored device is a high-performance computer running a data processing program. The data processing program can automatically receive data to be processed from outside, complete the processing, and then save the results locally. The whole data processing procedure runs automatically and requires no human supervision.
To detect in time any interruption of the data processing task caused by abnormalities in the data processing program, a single-chip microcomputer can additionally be configured as the monitoring device. The single-chip microcomputer and the high-performance computer can communicate with each other through a serial port, and the single-chip microcomputer can switch the reset switch on the motherboard of the high-performance computer on and off. For example, a relay can be placed between the reset pin on the motherboard and ground, and the single-chip microcomputer can change the state of the relay to control the state of the reset switch on the motherboard and thereby restart the high-performance computer.
While running, the data processing program can continuously send verification information to the single-chip microcomputer through the serial port of the high-performance computer. The verification information changes as the data processing progress changes; for example, it can include information identifying the current data processing progress. The content of the verification information can be set according to the specific data processing task. When the data processing program on the main device (the high-performance computer) runs normally, no two pieces of verification information it sends to the single-chip microcomputer should be identical, and a piece of verification information can be generated only after the data processing program has processed a segment of data completely and normally. For example, once the data processing program has processed 1% of the data, it can generate a piece of verification information indicating that the current progress is 1% of all the data.
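One way to carry such progress messages over a byte stream such as a serial link is a newline-terminated text line. The sketch below is an assumption for illustration only: the `done/total` line format and the function names are not from the disclosure, and `port` is any binary file-like object standing in for the opened serial port.

```python
from typing import BinaryIO, Optional, Tuple


def write_progress(port: BinaryIO, done: int, total: int) -> None:
    """Send one newline-terminated progress line, e.g. b"1/100\n" (hypothetical format)."""
    port.write(f"{done}/{total}\n".encode("ascii"))
    port.flush()


def read_progress(port: BinaryIO) -> Optional[Tuple[int, int]]:
    """Read one complete progress line; return None if no full line is available."""
    line = port.readline()
    if not line.endswith(b"\n"):
        return None                      # nothing received, or a truncated line
    done, total = line.decode("ascii").strip().split("/")
    return int(done), int(total)
```

Since each completed segment yields a larger `done` value, consecutive lines differ whenever the program is making progress.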
A timer can be set on the single-chip microcomputer. If the single-chip microcomputer receives no verification information for longer than the preset duration, or if several consecutively received pieces of verification information all show that the data processing progress has not changed, the data processing program is determined to be abnormal. The single-chip microcomputer then turns on the reset switch on the motherboard of the high-performance computer to restart the high-performance computer and the data processing program. If several consecutive attempts to restart the high-performance computer and the data processing program fail, the single-chip microcomputer can send alarm information to maintenance personnel.
To prevent a power supply fault from leaving the single-chip microcomputer unable to monitor the high-performance computer, the two can be powered by different power supplies.
With the above method, the monitored data processing program can be monitored comprehensively: abnormalities arising in the power supply, the high-performance computer, or the data processing program itself can be detected in time and handled accordingly. In addition, restarting the high-performance computer by controlling a hardware switch on it, so as to return the data processing program to a normal state, is safer and more reliable than restarting the high-performance computer under software control.
Correspondingly, an embodiment of the present disclosure further provides an exception handling apparatus. As shown in FIG. 3, the apparatus 30 includes:
a verification information generation module 31, configured to continuously generate verification information, while the monitored device is running, based on the task processing progress of the program running on the monitored device, wherein different task processing progress corresponds to different verification information;
a sending module 32, configured to send the verification information to the monitoring device, so that the monitoring device performs abnormality diagnosis based on the received verification information;
a first processing module 33, configured to perform an exception handling operation in response to determining that the program is running abnormally.
In some embodiments, the task processing progress includes multiple nodes, and the verification information generation module is configured to:
for any node, generate the verification information corresponding to the node when the task processing progress reaches the node.
In some embodiments, the monitoring device and the monitored device are connected through a hardware interface, and the exception handling apparatus is further configured to:
restart the monitored device based on a trigger signal output by the monitoring device through the hardware interface, the trigger signal being output when the monitoring device determines that the program is running abnormally.
In some embodiments, the monitored device is provided with a relay for switching the reset switch of the monitored device on and off, and the exception handling apparatus is further configured to:
change the state of the relay based on the trigger signal output by the monitoring device through the hardware interface, so that the relay turns on the reset switch of the monitored device.
In some embodiments, the monitoring device and the monitored device are powered by different power supplies.
Correspondingly, an embodiment of the present disclosure further provides an exception handling apparatus. As shown in FIG. 4, the apparatus 40 includes:
a receiving module 41, configured to receive verification information continuously sent by the monitored device while it is running, wherein the verification information is generated based on the task processing progress of the program running on the monitored device, and different task processing progress corresponds to different verification information;
an abnormality diagnosis module 42, configured to perform abnormality diagnosis based on the verification information;
a second processing module 43, configured to perform an exception handling operation when it is determined that the program is running abnormally.
In some embodiments, the task processing progress includes multiple nodes, and for each of the multiple nodes, the verification information corresponding to the node is generated after the task processing progress reaches that node.
In some embodiments, the abnormality diagnosis module is configured to:
determine that the program is running abnormally when two consecutively received pieces of the verification information are identical or when the verification information has not been received for longer than a preset duration, wherein the preset duration is greater than the maximum task processing duration between two adjacent nodes.
In some embodiments, the monitoring device and the monitored device are connected through a hardware interface, and, to perform an exception handling operation when it is determined that the program is running abnormally, the exception handling apparatus is further configured to:
output a trigger signal through the hardware interface when it is determined that the program is running abnormally, so as to trigger the monitored device to restart.
In some embodiments, the monitored device is provided with a relay for switching the reset switch of the monitored device on and off, and the exception handling apparatus is configured to:
output a trigger signal to the relay through the hardware interface, so that the relay turns on the reset switch of the monitored device.
Further, an embodiment of the present disclosure provides an electronic device. As shown in FIG. 5, the electronic device includes a processor 51 and a memory 52. The memory 52 stores a computer program executable by the processor 51, and when the processor 51 executes the computer program, any one of the processing methods in the foregoing embodiments is implemented.
In addition, an embodiment of the present disclosure provides an exception handling system. FIG. 6 is a schematic diagram of an exception handling system in an embodiment of the present disclosure; the system includes a monitored device and a monitoring device;
the monitored device is configured to generate verification information, while running, based on the task processing progress of the program running on the monitored device, wherein different task processing progress corresponds to different verification information, and to continuously send the verification information to the monitoring device;
the monitoring device is configured to perform abnormality diagnosis based on the received verification information and to perform an exception handling operation when it is determined that the program is running abnormally.
For the specific implementation details of the monitored device and the monitoring device during exception handling, reference may be made to the descriptions in the foregoing method embodiments, which are not repeated here.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the exception handling method described in any of the foregoing embodiments.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of this specification or in certain parts of those embodiments.
The systems, apparatuses, modules or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email sending and receiving device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于装置实施例而言,由于其基本相似于方法实施例,所以描述得比较简单,相关之处参见方法实施例的部分说明即可。以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,在实施本说明书实施例方案时可以把各模块的功能在同一个或多个软件和/或硬件中实现。也可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。Each embodiment in this specification is described in a progressive manner, and the same and similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the apparatus embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for related parts. The device embodiments described above are only illustrative, wherein the modules described as separate components may or may not be physically separated. When implementing the solutions of the embodiments of the present specification, the functions of each module may be integrated into the same module. or multiple software and/or hardware implementations. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution in this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
The above are merely specific implementations of the embodiments of this specification. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of this specification, and such improvements and refinements shall also fall within the protection scope of the embodiments of this specification.

Claims (15)

  1. An exception handling method, applied to a monitored device, the method comprising:
    during operation of the monitored device, continuously generating verification information based on the task processing progress of a program running on the monitored device, wherein different task processing progress corresponds to different verification information;
    sending the verification information to a monitoring device, so that the monitoring device performs abnormality diagnosis based on the verification information; and
    in response to determining that the program is running abnormally, performing an exception handling operation.
  2. The method according to claim 1, wherein the task processing progress comprises a plurality of nodes, and generating the verification information based on the task processing progress of the program running on the monitored device comprises:
    for any node, in response to determining that the task processing progress has reached the node, generating the verification information corresponding to the node.
  3. The method according to claim 1 or 2, further comprising:
    restarting the monitored device based on a trigger signal output by the monitoring device, wherein the trigger signal is output when the monitoring device determines that the program is running abnormally.
  4. The method according to claim 3, wherein the monitored device is provided with a relay for switching a reset switch of the monitored device on and off, and restarting the monitored device based on the trigger signal output by the monitoring device comprises:
    changing the state of the relay based on the trigger signal output by the monitoring device, so that the relay turns on the reset switch of the monitored device.
  5. The exception handling method according to any one of claims 1-4, wherein the monitoring device and the monitored device are powered by different power supplies.
  6. An exception handling method, applied to a monitoring device, the method comprising:
    receiving verification information sent by a monitored device, wherein the verification information is generated based on the task processing progress of a program running on the monitored device, and different task processing progress corresponds to different verification information;
    performing abnormality diagnosis based on the verification information; and
    performing an exception handling operation when it is determined that the program is running abnormally.
  7. The method according to claim 6, wherein the task processing progress comprises a plurality of nodes, and for each of the plurality of nodes, the verification information corresponding to the node is generated after the task processing progress reaches the node.
  8. The method according to claim 6 or 7, wherein performing abnormality diagnosis based on the verification information comprises:
    determining that the program is running abnormally when two consecutively received pieces of verification information are identical and/or no verification information is received within a preset duration, wherein the preset duration is greater than the maximum task processing duration between two adjacent nodes.
  9. The method according to any one of claims 6-8, wherein performing the exception handling operation when it is determined that the program is running abnormally comprises:
    when it is determined that the program is running abnormally, outputting a trigger signal to trigger the monitored device to restart.
  10. The method according to claim 9, wherein the monitored device is provided with a relay for switching a reset switch of the monitored device on and off, and
    outputting the trigger signal to trigger the monitored device to restart comprises:
    outputting the trigger signal to the relay, so that the relay turns on the reset switch of the monitored device.
  11. An exception handling apparatus, applied to a monitored device, the apparatus comprising:
    a verification information generation module, configured to continuously generate, during operation of the monitored device, verification information based on the task processing progress of a program running on the monitored device, wherein different task processing progress corresponds to different verification information;
    a sending module, configured to send the verification information to a monitoring device, so that the monitoring device performs abnormality diagnosis based on the received verification information; and
    a first processing module, configured to perform an exception handling operation in response to determining that the program is running abnormally.
  12. An exception handling apparatus, applied to a monitoring device, the apparatus comprising:
    a receiving module, configured to receive verification information sent by a monitored device, wherein the verification information is generated based on the task processing progress of a program running on the monitored device, and different task processing progress corresponds to different verification information;
    an abnormality diagnosis module, configured to perform abnormality diagnosis based on the verification information; and
    a second processing module, configured to perform an exception handling operation when it is determined that the program is running abnormally.
  13. An electronic device, comprising a processor and a memory, wherein the memory stores a computer program executable by the processor, and the processor, when executing the computer program, implements the steps of the method according to any one of claims 1-5 or 6-10.
  14. An exception handling system, comprising a monitored device and a monitoring device, wherein:
    the monitored device is configured to continuously generate, during operation, verification information based on the task processing progress of a program running on the monitored device, wherein different task processing progress corresponds to different verification information, and to send the verification information to the monitoring device; and
    the monitoring device is configured to perform abnormality diagnosis based on the verification information, and to perform an exception handling operation when it is determined that the program is running abnormally.
  15. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-5 or 6-10.
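
The interaction recited in claims 1, 2, 6 and 8 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the class names `ProgressReporter` and `ProgressMonitor`, the `node-N` message format, the `on_abnormal` callback (standing in for the trigger signal that would drive the relay), and the injectable `clock` are all assumptions made for illustration. The sketch only shows the two diagnosis conditions: two consecutive identical pieces of verification information, or no verification information within a preset duration.

```python
import time
from typing import Callable, Optional


class ProgressReporter:
    """Monitored-device side (hypothetical): one message per progress node."""

    def __init__(self, send: Callable[[str], object]) -> None:
        self._send = send
        self._node_index = 0

    def reached_node(self) -> str:
        # Each newly reached node yields different verification information;
        # a running index is used here purely as an assumed encoding.
        self._node_index += 1
        info = f"node-{self._node_index}"
        self._send(info)
        return info


class ProgressMonitor:
    """Monitoring-device side (hypothetical): diagnoses stalls and timeouts."""

    def __init__(self, timeout_s: float,
                 on_abnormal: Callable[[str], object],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # timeout_s should exceed the longest processing time between
        # two adjacent nodes, as claim 8 requires of the preset duration.
        self._timeout_s = timeout_s
        self._on_abnormal = on_abnormal
        self._clock = clock
        self._last_info: Optional[str] = None
        self._last_seen = clock()

    def receive(self, info: str) -> bool:
        """Record a message; return True if it repeats the previous one."""
        repeated = info == self._last_info
        self._last_info = info
        self._last_seen = self._clock()
        if repeated:
            self._on_abnormal("repeated verification information")
        return repeated

    def check_timeout(self) -> bool:
        """Return True if nothing was received within the preset duration."""
        timed_out = self._clock() - self._last_seen > self._timeout_s
        if timed_out:
            self._on_abnormal("no verification information within timeout")
        return timed_out
```

In a real deployment `on_abnormal` would presumably output the trigger signal of claims 9 and 10, and `check_timeout` would be polled periodically by the monitoring device; both choices are left open here.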
PCT/CN2022/084143 2021-04-25 2022-03-30 Exception handling method, apparatus and system, and electronic device WO2022228012A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110447342.8A CN113064798A (en) 2021-04-25 2021-04-25 Exception handling method and device, electronic equipment and system
CN202110447342.8 2021-04-25

Publications (1)

Publication Number Publication Date
WO2022228012A1 true WO2022228012A1 (en) 2022-11-03

Family

ID=76567558

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/084143 WO2022228012A1 (en) 2021-04-25 2022-03-30 Exception handling method, apparatus and system, and electronic device

Country Status (2)

Country Link
CN (1) CN113064798A (en)
WO (1) WO2022228012A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113064798A (en) * 2021-04-25 2021-07-02 上海商汤临港智能科技有限公司 Exception handling method and device, electronic equipment and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060277448A1 (en) * 2005-06-06 2006-12-07 Denso Corporation Malfunction monitoring method and system
CN105898318A (en) * 2015-12-21 2016-08-24 乐视云计算有限公司 Offline transcoding method and system
CN105898554A (en) * 2015-12-18 2016-08-24 乐视云计算有限公司 Real-time transcoding monitoring method and real-time transcoding system
CN111625428A (en) * 2020-04-20 2020-09-04 中国建设银行股份有限公司 Method, system, device and storage medium for monitoring running state of Java application program
CN113064798A (en) * 2021-04-25 2021-07-02 上海商汤临港智能科技有限公司 Exception handling method and device, electronic equipment and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284217B (en) * 2018-09-28 2023-01-10 平安科技(深圳)有限公司 Application program exception handling method and device, electronic equipment and storage medium
WO2020107198A1 (en) * 2018-11-27 2020-06-04 刘馥祎 Operation apparatus maintenance method and device, storage medium, and program product

Also Published As

Publication number Publication date
CN113064798A (en) 2021-07-02

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 22794485; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 22794485; Country of ref document: EP; Kind code of ref document: A1