CN116795599A - Proxy process exception self-recovery method and device - Google Patents

Proxy process exception self-recovery method and device

Info

Publication number
CN116795599A
Authority
CN
China
Prior art keywords
proxy process
proxy
self
abnormal
recovery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310760556.XA
Other languages
Chinese (zh)
Inventor
黄梓锋
王竟成
郑天文
李海龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310760556.XA
Publication of CN116795599A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking

Abstract

The disclosure provides a proxy process exception self-recovery method, which relates to the technical field of cloud computing and can be applied to the technical field of finance. Each proxy process corresponds to a daemon process that monitors it. The method comprises: when it is determined that the proxy process responds abnormally, determining the survival state of the proxy process according to the proxy process identifier; when it is determined that the proxy process is alive, performing a business function test on the proxy process; when it is determined that the state of the proxy process is abnormal, performing anomaly location on the proxy process to determine the cause of the abnormality; and performing a self-recovery operation on the proxy process according to the cause of the abnormality. The disclosure also provides a proxy process exception self-recovery device, a storage medium and a program product.

Description

Proxy process exception self-recovery method and device
Technical Field
The present disclosure relates to the field of cloud computing technologies, and in particular, to the field of automatic operation and maintenance technologies, and more particularly, to a proxy process anomaly self-recovery method, apparatus, device, storage medium, and program product.
Background
In an enterprise operation and maintenance architecture, all servers in the enterprise are managed through a system management operation and maintenance platform, which is used to perform operation and maintenance tasks in batches, such as issuing commands and executing scripts. Management operations are issued through a proxy (agent) process; that is, a proxy process is installed on each server's operating system so that the server can be managed. In daily use, a proxy process may become abnormal and stop running normally, so that unified management can no longer be achieved. Such abnormalities are especially common in environments with many manual operations and frequent changes, such as development and test environments.
When a proxy process is abnormal, operation and maintenance personnel have to log in to the server manually to investigate the problem and recover the process. This traditional operation and maintenance mode is inefficient and cannot meet the requirements of a large operation and maintenance platform.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a proxy process anomaly self-recovery method, apparatus, device, storage medium, and program product that improve operation and maintenance efficiency.
According to a first aspect of the present disclosure, there is provided a proxy process exception self-recovery method, wherein each proxy process corresponds to a daemon process for monitoring the proxy process, the method comprising:
when it is determined that the proxy process responds abnormally, determining the survival state of the proxy process according to the proxy process identifier;
when it is determined that the proxy process is alive, performing a business function test on the proxy process;
when it is determined that the state of the proxy process is abnormal, performing anomaly location on the proxy process to determine the cause of the abnormality of the proxy process; and
performing a self-recovery operation on the proxy process according to the cause of the abnormality.
According to an embodiment of the disclosure, performing anomaly location on the proxy process to determine the cause of the abnormality of the proxy process includes:
collecting real-time values of the monitoring indicators; and
determining the cause of the abnormality of the proxy process according to the real-time values and the preset thresholds of the monitoring indicators.
According to an embodiment of the present disclosure, the monitoring indicators include a memory usage rate, a processor usage rate, and a disk usage rate, and determining the cause of the abnormality of the proxy process according to the real-time values and the preset thresholds of the monitoring indicators includes:
when it is determined that the memory usage rate is greater than a first preset threshold, determining that the cause of the abnormality is excessive process memory usage;
when it is determined that the processor usage rate is greater than a second preset threshold, determining that the cause of the abnormality is excessive process processor usage; and
when it is determined that the disk usage rate is greater than a third preset threshold, determining that the cause of the abnormality is excessive process disk usage.
According to an embodiment of the disclosure, performing a self-recovery operation on the proxy process according to the cause of the abnormality includes:
determining a corresponding repair script according to the cause of the abnormality;
running the repair script to repair the proxy process; and
restarting the proxy process.
According to an embodiment of the present disclosure, the method further includes:
restarting the proxy process when it is determined that the proxy process is not alive.
According to an embodiment of the present disclosure, after performing the self-recovery operation on the proxy process according to the cause of the abnormality, the method further includes:
performing a business function test on the restarted proxy process to determine the self-recovery result of the proxy process.
A second aspect of the present disclosure provides a proxy process exception self-recovery apparatus, wherein each proxy process corresponds to a daemon process for monitoring the proxy process, the apparatus comprising:
a survival state determining module, configured to determine the survival state of the proxy process according to the proxy process identifier when it is determined that the proxy process responds abnormally;
a first testing module, configured to perform a business function test on the proxy process when it is determined that the proxy process is alive;
an anomaly locating module, configured to perform anomaly location on the proxy process when it is determined that the state of the proxy process is abnormal, so as to determine the cause of the abnormality of the proxy process; and
a self-recovery module, configured to perform a self-recovery operation on the proxy process according to the cause of the abnormality.
According to an embodiment of the present disclosure, the anomaly locating module includes an acquisition sub-module and a first determination sub-module.
The acquisition sub-module is configured to collect real-time values of the monitoring indicators; and
the first determination sub-module is configured to determine the cause of the abnormality of the proxy process according to the real-time values and the preset thresholds of the monitoring indicators.
According to an embodiment of the disclosure, the monitoring indicators include a memory usage rate, a processor usage rate, and a disk usage rate, and the first determination sub-module includes a first determining unit, a second determining unit, and a third determining unit.
The first determining unit is configured to determine that the cause of the abnormality is excessive process memory usage when it is determined that the memory usage rate is greater than a first preset threshold;
the second determining unit is configured to determine that the cause of the abnormality is excessive process processor usage when it is determined that the processor usage rate is greater than a second preset threshold; and
the third determining unit is configured to determine that the cause of the abnormality is excessive process disk usage when it is determined that the disk usage rate is greater than a third preset threshold.
According to an embodiment of the present disclosure, the self-recovery module includes a second determination sub-module, a repair sub-module, and a restart sub-module.
The second determination sub-module is configured to determine a corresponding repair script according to the cause of the abnormality;
the repair sub-module is configured to run the repair script to repair the proxy process; and
the restart sub-module is configured to restart the proxy process.
According to an embodiment of the present disclosure, the apparatus further includes a restart module.
The restart module is configured to restart the proxy process when it is determined that the proxy process is not alive.
According to an embodiment of the present disclosure, the apparatus further includes a second testing module.
The second testing module is configured to perform a business function test on the restarted proxy process so as to determine the self-recovery result of the proxy process.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the agent process exception self-recovery method described above.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described proxy process exception self-recovery method.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above proxy process exception self-recovery method.
According to the proxy process exception self-recovery method provided by the embodiments of the present disclosure, a daemon process is established for each proxy process and monitors that proxy process. When the proxy process responds abnormally, its survival state is determined; a business function test is performed according to the survival state; anomaly location is performed according to the test result; and the proxy process is automatically recovered according to the cause of the abnormality. Compared with manual operation and maintenance by operation and maintenance personnel, the method can recover an abnormal proxy process quickly, reduce business risk, and reduce manual operations on the proxy process, thereby improving operation and maintenance efficiency.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a proxy process exception self-recovery method, apparatus, device, storage medium, and program product according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a proxy process exception self-recovery method provided in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates a first flowchart of a proxy process exception self-recovery method provided in accordance with another embodiment of the present disclosure;
FIG. 4 schematically illustrates a second flowchart of a proxy process exception self-recovery method provided in accordance with another embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a proxy process exception self-recovery method provided in accordance with another embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a proxy process exception self-recovery method provided in accordance with yet another embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a proxy process exception self-recovery apparatus according to an embodiment of the present disclosure; and
FIG. 8 schematically illustrates a block diagram of an electronic device adapted to implement a proxy process exception self-recovery method in accordance with an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions such as "at least one of A, B and C, etc." are used, they should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
The terms appearing in the embodiments of the present disclosure will first be explained:
System management operation and maintenance platform: a platform for unified management and unified operation and maintenance of the operating systems of the servers in an IT architecture.
Proxy (agent) process: a program deployed on a server that automatically executes the various instructions or functional operations issued by the server side.
In view of the above technical problems, an embodiment of the present disclosure provides a proxy process exception self-recovery method, in which each proxy process corresponds to a daemon process configured to monitor the proxy process. The method includes: when it is determined that the proxy process responds abnormally, determining the survival state of the proxy process according to the proxy process identifier; when it is determined that the proxy process is alive, performing a business function test on the proxy process; when it is determined that the state of the proxy process is abnormal, performing anomaly location on the proxy process to determine the cause of the abnormality; and performing a self-recovery operation on the proxy process according to the cause of the abnormality.
FIG. 1 schematically illustrates an application scenario diagram of a proxy process exception self-recovery method, apparatus, device, storage medium, and program product according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 according to this embodiment may include a proxy process exception self-recovery scenario. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a proxy process exception self-recovery server that creates a daemon process for each proxy process. When a proxy process responds abnormally, the daemon process checks the survival state of the proxy process and performs a business function test on the proxy process if it is alive. When the proxy process is determined to be abnormal, the problem is located according to the collected real-time values of the monitoring indicators, and the repair script corresponding to the specific problem is run to repair it, thereby realizing automatic operation and maintenance of the proxy process.
It should be noted that, the proxy process exception self-recovery method provided in the embodiments of the present disclosure may be generally executed by the server 105. Accordingly, the proxy process exception self-recovery apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 105. The proxy process exception self-recovery method provided by the embodiments of the present disclosure may also be performed by a server or server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the proxy process anomaly self-recovery apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the proxy process exception self-recovery method and apparatus provided by the embodiments of the present disclosure may be used in the technical field of cloud computing, in the technical field of finance, or in any field other than the financial field; their application field is not limited.
The proxy process anomaly self-recovery method according to the embodiment of the present disclosure will be described in detail below with reference to fig. 2 to 6 based on the application scenario described in fig. 1.
Fig. 2 schematically illustrates a flowchart of a proxy process exception self-recovery method provided according to an embodiment of the present disclosure. As shown in fig. 2, the proxy process exception self-recovery method of this embodiment includes operations S210 to S240, which may be performed by a server or other computing device.
In one example, recovering the state of an abnormal proxy process by manual operation leads to low operation and maintenance efficiency and high operation and maintenance cost. Moreover, when many proxy processes are abnormal, a shortage of manpower may prevent them from being recovered in time, which may cause problems in the actual business or even business losses. In the embodiments of the present disclosure, a daemon process is added for each proxy process based on the daemon process technology of the Linux system. When the proxy process becomes abnormal and cannot provide service normally, various system indicators are checked, and whether the relevant monitoring indicators are abnormal is judged against preset abnormal conditions, for example, the proxy process running abnormally, the storage space of the host system being exhausted, or the memory of the host system being exhausted. The cause of the proxy process abnormality is located accordingly, and once the cause is confirmed to be a known problem, the problem is solved automatically so that the proxy process returns to normal operation. For the specific procedure, see operations S210 to S240.
In operation S210, when it is determined that the proxy process responds abnormally, the survival state of the proxy process is determined according to the proxy process identifier.
In operation S220, when it is determined that the proxy process is alive, a business function test is performed on the proxy process.
In one example, when the proxy process does not respond to the system management operation and maintenance platform, the daemon process checks whether the process number (PID) of the proxy process exists in the system. If it exists, the proxy process is alive and needs to be tested further to confirm whether it can work normally. If it does not exist, the proxy process has died and needs to be restarted. When the proxy process is determined to be alive, a business function test is performed on it to judge whether it can respond normally. If the proxy process works normally, the lack of response is probably caused by a network environment problem and is unrelated to the running of the process. Otherwise, the proxy process is alive but its working state is abnormal, possibly because of an abnormal system environment, and operation S230 is then performed to locate the abnormality.
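The survival check and the three possible outcomes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `pid_alive` and `triage`, the outcome labels, and the `business_test` callback are hypothetical, and the PID probe relies on POSIX signal 0 semantics, so it applies to Linux-like systems only.

```python
import os
from typing import Callable


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 delivers nothing; it only checks that the PID exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True


def triage(pid: int, business_test: Callable[[], bool]) -> str:
    """Map the survival check and business function test onto an outcome.

    The outcome labels are hypothetical names chosen for this sketch.
    """
    if not pid_alive(pid):
        return "restart"              # process has died: restart it
    if business_test():
        return "network_suspected"    # process works: suspect the network
    return "locate_anomaly"           # alive but not working: go to S230
```

Note that a zombie process still passes the signal 0 probe, which is one reason the business function test is needed in addition to the PID check.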
In operation S230, when it is determined that the state of the proxy process is abnormal, anomaly location is performed on the proxy process to determine the cause of the abnormality.
In operation S240, a self-recovery operation is performed on the proxy process according to the cause of the abnormality.
In one example, when the state of the proxy process is determined to be abnormal, whether the relevant monitoring indicators are abnormal is judged according to the preset abnormal conditions of the monitoring indicators, and the cause of the abnormality is located; for the specific process, see operations S231 to S232 shown in fig. 3, which are not described here. According to the cause determined in operation S230, such as excessive memory usage, excessive processor usage, or excessive disk usage, the corresponding repair script is selected and run to repair the problem automatically; for the specific self-repair process, see operations S241 to S243.
According to the proxy process exception self-recovery method provided by the embodiments of the present disclosure, a daemon process is established for each proxy process and monitors that proxy process. When the proxy process responds abnormally, its survival state is determined; a business function test is performed according to the survival state; anomaly location is performed according to the test result; and the proxy process is automatically recovered according to the cause of the abnormality. Compared with manual operation and maintenance by operation and maintenance personnel, the method can recover an abnormal proxy process quickly, reduce business risk, and reduce manual operations on the proxy process, thereby improving operation and maintenance efficiency.
Fig. 3 schematically illustrates a first flowchart of a proxy process anomaly location method provided according to another embodiment of the present disclosure; fig. 4 schematically illustrates a second flowchart of the proxy process anomaly location method according to another embodiment of the present disclosure. As shown in fig. 3, operation S230 includes operations S231 to S232.
In operation S231, a real-time value of the monitoring index is acquired.
According to an embodiment of the present disclosure, the monitoring index includes a memory usage rate, a processor usage rate, and a disk usage rate.
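One possible way to obtain these three real-time values on a Linux host is sketched below. The disclosure only says the values are obtained "through the related command", so the choice of data sources here is an assumption: memory usage is parsed from the text of `/proc/meminfo`, processor usage from two snapshots of the first line of `/proc/stat`, and disk usage from Python's `shutil.disk_usage`. All function names are hypothetical.

```python
import shutil


def memory_usage_percent(meminfo_text: str) -> float:
    """Memory usage (%) computed from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # first token is the value
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def cpu_usage_percent(stat_before: str, stat_after: str) -> float:
    """CPU usage (%) between two snapshots of the 'cpu' line of /proc/stat."""
    def totals(line: str):
        # Fields after the 'cpu' label: user nice system idle iowait irq softirq steal
        values = [int(v) for v in line.split()[1:9]]
        idle = values[3] + values[4]  # idle + iowait
        return sum(values), idle

    total0, idle0 = totals(stat_before)
    total1, idle1 = totals(stat_after)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / delta_total)


def disk_usage_percent(path: str) -> float:
    """Usage (%) of the filesystem that contains the given path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

In practice the two parsing functions would be fed the contents of the corresponding `/proc` files, with the two `/proc/stat` snapshots taken a short interval apart; keeping them as pure functions over text makes them easy to test.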
In operation S232, the cause of the abnormality of the agent process is determined according to the real-time value and the monitoring index preset threshold.
As shown in fig. 4, operation S232 includes operations S2321 to S2323.
In operation S2321, when it is determined that the memory usage rate is greater than the first preset threshold, the cause of the abnormality is determined to be excessive process memory usage.
In operation S2322, when it is determined that the processor usage rate is greater than the second preset threshold, the cause of the abnormality is determined to be excessive process processor usage.
In operation S2323, when it is determined that the disk usage rate is greater than the third preset threshold, the cause of the abnormality is determined to be excessive process disk usage.
In one example, the monitoring indicators are set in advance, and the real-time values of the relevant indicators of the operating system on which the proxy process runs are obtained through the relevant commands and used as the basis for judging the cause of the process abnormality. The indicators may include CPU usage, memory usage, disk usage of the folder in which the process resides, and the like, and may also be customized according to user needs. Each indicator value is compared with its preset threshold to judge whether the indicator is abnormal and to locate the cause of the abnormality. For example, if the CPU usage exceeds the second preset threshold (for example, 80%), excessive CPU usage is located as a cause; if the memory usage exceeds the first preset threshold (for example, 80%), excessive memory usage is located as a cause; and if the disk usage of the folder in which the process resides reaches the third preset threshold (for example, 99%), excessive disk usage is located as a cause. If all the obtained indicator values are normal, the preset indicators are not the cause of the proxy process abnormality.
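The threshold comparison can be illustrated with a short sketch. The default thresholds below reuse the example values from the text (80% memory, 80% CPU, 99% disk); the class name `Thresholds`, the function name `locate_causes`, the metric keys, and the cause labels are hypothetical. The sketch uses a strict "greater than" comparison for all three indicators, following the wording of operations S2321 to S2323.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Thresholds:
    memory: float = 80.0  # first preset threshold (%)
    cpu: float = 80.0     # second preset threshold (%)
    disk: float = 99.0    # third preset threshold (%)


def locate_causes(metrics: Dict[str, float],
                  thresholds: Thresholds = Thresholds()) -> List[str]:
    """Return every located cause; an empty list means all indicators are normal."""
    causes = []
    if metrics["memory"] > thresholds.memory:
        causes.append("memory_usage_too_high")
    if metrics["cpu"] > thresholds.cpu:
        causes.append("cpu_usage_too_high")
    if metrics["disk"] > thresholds.disk:
        causes.append("disk_usage_too_high")
    return causes
```

For instance, `locate_causes({"memory": 91.5, "cpu": 12.0, "disk": 40.0})` returns `["memory_usage_too_high"]`. Returning a list rather than a single cause is a design choice of this sketch, since more than one indicator may exceed its threshold at the same time.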
Fig. 5 schematically illustrates a flowchart of a proxy process exception self-recovery method according to another embodiment of the present disclosure, and as illustrated in fig. 5, operation S240 includes operations S241 to S243.
In operation S241, a corresponding repair script is determined according to the abnormality cause. In operation S242, the repair script is run to repair the proxy process. In operation S243, the proxy process is restarted.
In one example, once a cause that clearly leads to the process abnormality has been located, the pre-written repair script corresponding to that problem is called to repair the operating system automatically, and the proxy process is restarted after the problem is solved. The preset repair scripts can be adjusted to specific requirements. For example, if the CPU usage is too high, the process is killed and its CPU usage limit is adjusted; if the memory usage is too high, the process is killed and the memory quota available to the proxy process is adjusted appropriately; if the disk usage of the folder in which the process resides is too high, the corresponding folder is expanded through LVM or similar technologies. The repair scripts can also be modified to fit actual needs based on operation and maintenance experience.
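Operations S241 to S243 amount to a lookup from cause to repair script, followed by running the script and restarting the process. The sketch below shows one way to express that dispatch; the script paths, the cause labels, and the `restart_agent` callback are hypothetical placeholders, and the contents of the repair scripts themselves are outside the scope of the sketch. The `run` parameter defaults to the standard library's `subprocess.run` and can be replaced for testing.

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical mapping from located cause to a pre-written repair script.
REPAIR_SCRIPTS: Dict[str, List[str]] = {
    "cpu_usage_too_high": ["/opt/agent/repair/limit_cpu.sh"],
    "memory_usage_too_high": ["/opt/agent/repair/limit_memory.sh"],
    "disk_usage_too_high": ["/opt/agent/repair/extend_disk.sh"],
}


def self_recover(cause: str,
                 restart_agent: Callable[[], None],
                 scripts: Dict[str, List[str]] = REPAIR_SCRIPTS,
                 run=subprocess.run) -> bool:
    """Run the repair script for the cause, then restart the proxy process.

    Returns False when no script is registered for the cause or when the
    script exits with a non-zero status; the process is not restarted then.
    """
    command = scripts.get(cause)          # S241: select the repair script
    if command is None:
        return False
    result = run(command, check=False)    # S242: run the repair script
    if result.returncode != 0:
        return False
    restart_agent()                       # S243: restart the proxy process
    return True
```

Skipping the restart when the script fails is an assumption of this sketch; the text only states that the process is restarted after the problem is solved.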
Fig. 6 schematically illustrates a flowchart of a proxy process exception self-recovery method according to still another embodiment of the present disclosure, as shown in fig. 6, including operations S310 to S380.
In operation S310, when it is determined that the proxy process responds abnormally, the survival state of the proxy process is determined according to the proxy process identifier.
In operation S320, when it is determined that the proxy process is alive, a business function test is performed on the proxy process.
In operation S330, when it is determined that the proxy process is not alive, the proxy process is restarted.
In operation S340, when it is determined that the state of the proxy process is abnormal, anomaly location is performed on the proxy process to determine the cause of the abnormality.
In operation S350, a self-recovery operation is performed on the proxy process according to the cause of the abnormality.
In operation S360, a business function test is performed on the restarted proxy process to determine the self-recovery result of the proxy process.
In one example, the technical solutions and principles of operations S310, S320, S340 and S350 are the same as those of operations S210 to S240 and are not repeated here. In operation S330, when it is determined that the proxy process does not survive, the proxy process is restarted and is then monitored again to check whether it can respond normally. After the problem has been automatically resolved in operation S350, the proxy process is again subjected to a business function test to verify whether it has returned to a normal operating state. If it still cannot operate normally, the problem is reported to operation and maintenance personnel for manual resolution.
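The flow of operations S310 to S360 can be sketched as a single function that receives each step as a callable. This is an illustrative sketch under stated assumptions: the function and parameter names are hypothetical, a failed business function test is taken to be what marks the process state as abnormal, the final test is applied after either kind of restart, and the "healthy" outcome for a process that passes the first test is not described in the disclosure.

```python
from typing import Callable


def handle_response_exception(
    is_alive: Callable[[], bool],
    function_test: Callable[[], bool],
    locate_cause: Callable[[], str],
    self_recover: Callable[[str], None],
    restart: Callable[[], None],
    report: Callable[[], None],
) -> str:
    """Run one pass of the self-recovery flow and return its outcome."""
    if not is_alive():            # S310: survival state from the process ID
        restart()                 # S330: not alive, so restart
    elif function_test():         # S320: alive, so test the business function
        return "healthy"          # assumption: nothing further to do
    else:
        cause = locate_cause()    # S340: locate the abnormality cause
        self_recover(cause)       # S350: repair script plus restart
    if function_test():           # S360: test the restarted process
        return "recovered"
    report()                      # still failing: hand over to operations staff
    return "escalated"
```

Passing the steps in as callables keeps the sketch independent of how liveness, testing and repair are actually implemented.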
Based on the proxy process exception self-recovery method described above, the present disclosure also provides a proxy process exception self-recovery apparatus. The apparatus is described in detail below in connection with fig. 7.
Fig. 7 schematically illustrates a block diagram of a proxy process exception self-recovery apparatus according to an embodiment of the present disclosure. As shown in fig. 7, the proxy process exception self-recovery apparatus 700 of this embodiment includes a survival status determination module 710, a test module 720, an exception positioning module 730, and a self-recovery module 740.
The survival status determination module 710 is configured to determine, when it is determined that the response of the proxy process is abnormal, the survival status of the proxy process according to the proxy process identification. In an embodiment, the survival status determination module 710 may be configured to perform operation S210 described above, which is not described again here.
The test module 720 is configured to perform a business function test on the proxy process when it is determined that the proxy process survives. In an embodiment, the test module 720 may be configured to perform operation S220 described above, which is not described again here.
The anomaly locating module 730 is configured to perform exception localization on the proxy process, when it is determined that the state of the proxy process is abnormal, to determine the cause of the abnormality of the proxy process. In an embodiment, the anomaly locating module 730 may be configured to perform operation S230 described above, which is not described again here.
The self-recovery module 740 is configured to perform a self-recovery operation on the proxy process according to the abnormality cause. In an embodiment, the self-recovery module 740 may be configured to perform operation S240 described above, which is not described again here.
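The survival check performed by the survival status determination module 710 can be illustrated with a common POSIX idiom: sending signal 0 to the process ID, which performs existence and permission checks without delivering a signal. This is only one possible approach and an assumption about the implementation; the disclosure does not say how the survival state is obtained from the proxy process identification. The function name `is_process_alive` is hypothetical, and the sketch is intended for POSIX systems only.

```python
import os


def is_process_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        # Non-positive values address process groups in kill(2), not one process.
        return False
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True
```

A PID can be reused by an unrelated process after the original one exits, so a production check would likely also compare the process name or start time.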
According to an embodiment of the present disclosure, the anomaly locating module 730 includes an acquisition sub-module and a first determination sub-module.
The acquisition sub-module is configured to collect the real-time value of the monitoring index. In an embodiment, the acquisition sub-module may be configured to perform operation S231 described above, which is not described again here.
The first determination sub-module is configured to determine the cause of the abnormality of the proxy process according to the real-time value and the preset threshold of the monitoring index. In an embodiment, the first determination sub-module may be configured to perform operation S232 described above, which is not described again here.
According to an embodiment of the disclosure, the monitoring index includes a memory usage rate, a processor usage rate, and a disk usage rate, and the first determination submodule includes a first determination unit, a second determination unit, and a third determination unit.
The first determining unit is configured to determine, when the memory usage rate is determined to be greater than a first preset threshold, that the cause of the abnormality is that the memory usage rate of the process is too high. In an embodiment, the first determining unit may be configured to perform operation S2321 described above, which is not described again here.
The second determining unit is configured to determine, when the processor usage rate is determined to be greater than a second preset threshold, that the cause of the abnormality is that the processor usage rate of the process is too high. In an embodiment, the second determining unit may be configured to perform operation S2322 described above, which is not described again here.
The third determining unit is configured to determine, when the disk usage rate is determined to be greater than a third preset threshold, that the cause of the abnormality is that the disk usage rate of the process is too high. In an embodiment, the third determining unit may be configured to perform operation S2323 described above, which is not described again here.
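The three determining units amount to comparing each real-time value against its own preset threshold. A minimal sketch follows; the threshold values, metric keys and the name `determine_causes` are assumptions chosen for illustration, since the disclosure gives no concrete numbers or identifiers.

```python
from typing import Dict, List, Optional

# Hypothetical preset thresholds, in percent.
DEFAULT_THRESHOLDS: Dict[str, float] = {"memory": 90.0, "cpu": 85.0, "disk": 80.0}

CAUSE_TEXT: Dict[str, str] = {
    "memory": "process memory usage too high",
    "cpu": "process processor usage too high",
    "disk": "process disk usage too high",
}


def determine_causes(
    real_time: Dict[str, float],
    thresholds: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Return every cause whose real-time value exceeds its preset threshold."""
    limits = DEFAULT_THRESHOLDS if thresholds is None else thresholds
    causes: List[str] = []
    for metric in ("memory", "cpu", "disk"):  # first, second, third unit
        if real_time[metric] > limits[metric]:  # strictly greater, as in the text
            causes.append(CAUSE_TEXT[metric])
    return causes
```

For example, `determine_causes({"memory": 95.0, "cpu": 10.0, "disk": 80.0})` reports only the memory cause, because a disk value equal to its threshold is not greater than it.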
According to an embodiment of the present disclosure, the self-recovery module includes a second determination sub-module, a repair sub-module, and a restart sub-module.
The second determination sub-module is configured to determine a corresponding repair script according to the abnormality cause. In an embodiment, the second determination sub-module may be configured to perform operation S241 described above, which is not described again here.
The repair sub-module is configured to run the repair script to repair the proxy process. In an embodiment, the repair sub-module may be configured to perform operation S242 described above, which is not described again here.
The restart sub-module is configured to restart the proxy process. In an embodiment, the restart sub-module may be configured to perform operation S243 described above, which is not described again here.
According to an embodiment of the present disclosure, the apparatus further includes a restart module.
The restart module is configured to restart the proxy process when it is determined that the proxy process does not survive. In an embodiment, the restart module may be configured to perform operation S330 described above, which is not described again here.
According to an embodiment of the present disclosure, the apparatus further includes a second test module.
The second test module is configured to perform a business function test on the restarted proxy process to determine the self-recovery result of the proxy process. In an embodiment, the second test module may be configured to perform operation S360 described above, which is not described again here.
According to embodiments of the present disclosure, any number of the survival status determination module 710, the test module 720, the anomaly localization module 730, and the self-recovery module 740 may be combined into one module, or any one of them may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of these modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the survival status determination module 710, the test module 720, the anomaly localization module 730, and the self-recovery module 740 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging circuitry, or in any one of software, hardware, and firmware, or in a suitable combination of the three. Alternatively, at least one of the survival status determination module 710, the test module 720, the anomaly localization module 730, and the self-recovery module 740 may be at least partially implemented as a computer program module that, when executed, performs the corresponding functions.
Fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement a proxy process exception self-recovery method in accordance with an embodiment of the present disclosure.
As shown in fig. 8, an electronic device 900 according to an embodiment of the present disclosure includes a processor 901 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. The processor 901 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 901 may also include on-board memory for caching purposes. Processor 901 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. The processor 901 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the program may be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication portion 909 including a network interface card such as a LAN card, a modem, or the like. The communication portion 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read from it can be installed into the storage portion 908 as needed.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs that, when executed, implement a proxy process exception self-recovery method according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 902 and/or RAM 903 and/or one or more memories other than ROM 902 and RAM 903 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. When the computer program product runs in a computer system, the program code causes the computer system to implement the proxy process exception self-recovery method provided by the embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 901. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be based on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed, and downloaded and installed in the form of a signal on a network medium, via communication portion 909, and/or installed from removable medium 911. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 901. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
According to embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, and similar languages. The program code may execute entirely on the user's computing device, partly on the user's device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined in a variety of ways, even if such combinations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined in various ways without departing from the spirit and teachings of the present disclosure. All such combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (10)

1. A proxy process exception self-recovery method, wherein each proxy process corresponds to a daemon, and the daemon is used for monitoring the proxy process, the method comprising:
when it is determined that the response of the proxy process is abnormal, determining the survival state of the proxy process according to the proxy process identification;
when it is determined that the proxy process survives, performing a business function test on the proxy process;
when it is determined that the state of the proxy process is abnormal, performing exception localization on the proxy process to determine the cause of the abnormality of the proxy process; and
performing a self-recovery operation on the proxy process according to the cause of the abnormality.
2. The method of claim 1, wherein performing exception localization on the proxy process to determine the cause of the abnormality of the proxy process comprises:
collecting a real-time value of a monitoring index; and
determining the cause of the abnormality of the proxy process according to the real-time value and a preset threshold of the monitoring index.
3. The method of claim 2, wherein the monitoring index includes a memory usage rate, a processor usage rate, and a disk usage rate, and wherein determining the cause of the abnormality of the proxy process according to the real-time value and the preset threshold of the monitoring index comprises:
when the memory usage rate is determined to be greater than a first preset threshold, determining that the cause of the abnormality is that the memory usage rate of the process is too high;
when the processor usage rate is determined to be greater than a second preset threshold, determining that the cause of the abnormality is that the processor usage rate of the process is too high; and
when the disk usage rate is determined to be greater than a third preset threshold, determining that the cause of the abnormality is that the disk usage rate of the process is too high.
4. The method of claim 1, wherein performing a self-recovery operation on the proxy process according to the cause of the abnormality comprises:
determining a corresponding repair script according to the cause of the abnormality;
running the repair script to repair the proxy process; and
restarting the proxy process.
5. The method according to any one of claims 1 to 4, further comprising:
restarting the proxy process when it is determined that the proxy process does not survive.
6. The method of claim 5, further comprising, after performing the self-recovery operation on the proxy process according to the cause of the abnormality:
performing a business function test on the restarted proxy process to determine a self-recovery result of the proxy process.
7. A proxy process exception self-recovery apparatus, wherein each proxy process corresponds to a daemon, and the daemon is used for monitoring the proxy process, the apparatus comprising:
a survival status determination module configured to determine the survival state of the proxy process according to the proxy process identification when the response of the proxy process is abnormal;
a test module configured to perform a business function test on the proxy process when it is determined that the proxy process survives;
an anomaly locating module configured to perform exception localization on the proxy process when the state of the proxy process is abnormal, to determine the cause of the abnormality of the proxy process; and
a self-recovery module configured to perform a self-recovery operation on the proxy process according to the cause of the abnormality.
8. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the proxy process exception self-recovery method of any one of claims 1-6.
9. A computer readable storage medium having stored thereon executable instructions which when executed by a processor cause the processor to perform the proxy process exception self-recovery method of any one of claims 1 to 6.
10. A computer program product comprising a computer program which when executed by a processor implements the proxy process exception self-recovery method according to any one of claims 1 to 6.
CN202310760556.XA 2023-06-26 2023-06-26 Proxy process exception self-recovery method and device Pending CN116795599A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310760556.XA CN116795599A (en) 2023-06-26 2023-06-26 Proxy process exception self-recovery method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310760556.XA CN116795599A (en) 2023-06-26 2023-06-26 Proxy process exception self-recovery method and device

Publications (1)

Publication Number Publication Date
CN116795599A true CN116795599A (en) 2023-09-22

Family

ID=88041688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310760556.XA Pending CN116795599A (en) 2023-06-26 2023-06-26 Proxy process exception self-recovery method and device

Country Status (1)

Country Link
CN (1) CN116795599A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination