WO2021169064A1

WO2021169064A1 - Edge network-based anomaly processing method and apparatus

Info

Publication number: WO2021169064A1
Application number: PCT/CN2020/091867
Authority: WO
Inventors: Zhu Shaowu (朱少武)
Original assignee: Wangsu Science & Technology Co., Ltd. (网宿科技股份有限公司)
Priority date: 2020-02-25
Filing date: 2020-05-22
Publication date: 2021-09-02
Also published as: CN111355610A

Abstract

Disclosed in the present invention are an edge network-based anomaly processing method and apparatus, used to solve the prior-art technical problems of heavy load on the central node and untimely anomaly processing, both caused by the central node processing anomalies centrally. The method comprises: an edge node analyzes service data by using an anomaly analysis rule; after determining that a first service in the edge node is abnormal, if the edge node has an anomaly processing rule for the first service, the edge node repairs the first service by using that rule; if the edge node has no anomaly processing rule for the first service, the edge node reports to the central node. By performing anomaly identification and anomaly repair of a service on the edge node side instead of reporting uniformly to the central node, the present invention can effectively reduce the workload of the central node and reduce network overhead and time cost; moreover, the edge node performs self-closed-loop processing on its own anomalies, so anomalies can be found and processed in a timely fashion, thereby improving anomaly processing efficiency.

Description

Edge network-based anomaly processing method and device

Technical field

The present invention relates to the technical field of network security, and in particular to an edge network-based anomaly processing method and device.

Background art

Currently, when services are provided to users, the status of each service usually needs to be monitored. Once an abnormal service status is detected, the service needs to be repaired promptly to improve its availability and service capability.

In an existing self-closed-loop strategy, each edge node collects its own service data and reports it to the central node, and the central node then analyzes, based on these service data, whether each edge node is abnormal. If an anomaly exists, operation and maintenance personnel are notified to repair the abnormal edge node. The problem with this approach is that each edge node holds a large amount of service data, and uploading all of it to the central node for centralized anomaly analysis usually costs the central node considerable time and resources. This places heavy load on the central node and also reduces the real-time performance of anomaly handling.

In summary, there is an urgent need for an edge network-based anomaly processing method that solves the prior-art technical problems of heavy load on the central node and untimely anomaly processing caused by the central node analyzing the anomalies of all edge nodes centrally.

Summary of the invention

The present invention provides an edge network-based anomaly processing method and device, used to solve the prior-art technical problems of heavy load on the central node and untimely anomaly processing caused by the central node analyzing the anomalies of all edge nodes centrally.

In a first aspect, the present invention provides an edge network-based anomaly processing method, where the edge network includes a central node and at least one edge node; the method includes:

Any edge node analyzes service data using anomaly analysis rules to determine whether a first service in the edge node is abnormal, where the service data includes service data corresponding to the first service. After the edge node determines that the first service is abnormal, if an exception handling rule for the first service exists in the edge node, the edge node uses that exception handling rule to repair the first service; if no exception handling rule for the first service exists in the edge node, the edge node reports to the central node.
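The decision in this aspect can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `EdgeNode` class, the rule types (plain callables keyed by service name) and the list that stands in for the report channel to the central node are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical rule types; the description does not prescribe a data model.
AnalysisRule = Callable[[dict], bool]  # True when the service data shows an anomaly
HandlingRule = Callable[[str], str]    # repairs the service and describes the action


@dataclass
class EdgeNode:
    analysis_rules: Dict[str, AnalysisRule]
    handling_rules: Dict[str, HandlingRule]
    reported: List[str] = field(default_factory=list)  # stand-in for the uplink

    def report_to_central(self, service: str) -> str:
        # Placeholder: a real node would send the anomaly to the central node.
        self.reported.append(service)
        return "reported"

    def process(self, service: str, service_data: dict) -> str:
        rule = self.analysis_rules.get(service)
        if rule is None or not rule(service_data):
            return "normal"                      # no anomaly detected
        handler = self.handling_rules.get(service)
        if handler is not None:
            return handler(service)              # repair locally (self-closed loop)
        return self.report_to_central(service)   # no local rule: escalate
```

Keying both rule tables by service mirrors the per-service configuration described later; a service that has an analysis rule but no handling rule falls through to the report branch.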

In the present invention, service anomaly identification and anomaly repair are executed on the edge node side instead of being reported uniformly to the central node, which effectively reduces the workload of the central node and saves network overhead and time cost. Because each edge node performs self-closed-loop processing of its own anomalies, anomalies can also be discovered and handled promptly, which improves the efficiency of anomaly identification and processing and restores service availability in time.

In a possible implementation, after the edge node reports to the central node, the central node determines the exception handling rule of the first service and sends it to the edge node; correspondingly, the edge node receives the exception handling rule of the first service sent by the central node and uses it to repair the first service.

In the above implementation, when the edge node cannot handle an anomaly by itself, it reports the anomaly to the central node, and the central node issues an exception handling rule, so that the edge node can handle the anomaly according to the rule set by the central node. This improves the accuracy and comprehensiveness of exception handling.
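A sketch of this fallback path, assuming the report is a synchronous call that returns the central node's handling rule or nothing. The class names, the `on_report` method and the choice to cache the received rule on the edge node are hypothetical; the description only states that the central node determines the rule and sends it to the edge node.

```python
from typing import Callable, Dict, Optional

HandlingRule = Callable[[str], str]  # hypothetical: repairs a service, returns a description


class CentralNode:
    def __init__(self, rules: Dict[str, HandlingRule]) -> None:
        self.rules = rules

    def on_report(self, service: str) -> Optional[HandlingRule]:
        # Determine the handling rule for the reported service and send it back.
        return self.rules.get(service)


class EdgeNode:
    def __init__(self, central: CentralNode) -> None:
        self.central = central
        self.handling_rules: Dict[str, HandlingRule] = {}

    def handle_abnormal(self, service: str) -> str:
        rule = self.handling_rules.get(service)
        if rule is None:
            rule = self.central.on_report(service)  # report, then receive a rule
            if rule is None:
                return "unresolved"                 # central node has no rule either
            self.handling_rules[service] = rule     # assumption: keep it for next time
        return rule(service)
```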

In a possible implementation, before analyzing service data using anomaly analysis rules, the edge node also sends a registration request to the central node; the registration request is used to establish a communication connection between the central node and the edge node. After establishing the communication connection with the central node, the edge node obtains from the central node the self-closed-loop strategies corresponding to various services, which include the first service. The self-closed-loop strategy corresponding to any service includes the anomaly analysis rules of the service, and may also include the exception handling rules of the service.

In the above implementation, the central node manages the self-closed-loop strategies uniformly and each edge node obtains the strategies for various services from it, so the strategies can be configured once on the central node side instead of separately on each edge node, which improves the flexibility and convenience of strategy configuration. Moreover, configuring self-closed-loop strategies per service makes anomaly identification more targeted and better reflects the true service capability of each service, which improves the accuracy of anomaly identification and anomaly handling.

In a possible implementation, the self-closed-loop strategies corresponding to the various services are obtained as follows: after detecting that a user has entered anomaly monitoring configuration information in an anomaly monitoring configuration interface, the central node obtains and parses that configuration information to obtain the self-closed-loop strategies corresponding to the various services, and stores them in its local database.

In the above implementation, the user can set the self-closed-loop strategies for various services in the anomaly monitoring configuration interface of the central node, so a service's self-closed-loop strategy is decoupled from its business logic and different strategies can be configured for different services, which improves the flexibility of exception handling. Moreover, configuring each strategy through the configuration interface simplifies operation, reduces manual operation and maintenance cost and time, and improves the efficiency of exception handling.

In a possible implementation, the anomaly analysis rule of any service includes an anomaly analysis rule corresponding to each monitoring event in the service. The edge node analyzing service data using anomaly analysis rules to determine whether the first service in the edge node is abnormal includes: for any monitoring event in the first service, the edge node parses out the service data of the monitoring event from the service data of the first service, and calls an anomaly analysis algorithm that matches the type of that service data to analyze it. If the analysis result meets the first abnormal condition corresponding to the monitoring event, the edge node determines that the monitoring event is abnormal and determines, at least according to the monitoring event, whether the first service is abnormal; if the analysis result does not meet the first abnormal condition corresponding to the monitoring event, the edge node determines that the first service is not abnormal.
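The idea of sharing one analysis algorithm per data type and distinguishing monitoring events only by their first abnormal condition might look like the following sketch. The type names (`gauge`, `ratio`), the algorithms chosen for them and the `(operator, threshold)` encoding of the condition are illustrative assumptions, not taken from the description.

```python
import operator
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical: one shared algorithm per service-data type, not per event.
ALGORITHMS: Dict[str, Callable[[List[float]], float]] = {
    "gauge": lambda samples: samples[-1],    # latest sample, e.g. concurrent volume
    "ratio": lambda samples: mean(samples),  # window average, e.g. error rate
}
OPERATORS = {">=": operator.ge, ">": operator.gt}


def event_abnormal(event: dict, service_data: Dict[str, List[float]]) -> bool:
    """Return True when the event's data meets its first abnormal condition."""
    samples = service_data[event["name"]]             # this event's data only
    result = ALGORITHMS[event["data_type"]](samples)  # type-matched algorithm
    op, threshold = event["first_condition"]
    return OPERATORS[op](result, threshold)
```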

In the above implementation, a common anomaly analysis algorithm is set for monitoring events of the same type, and different monitoring events are distinguished by their abnormal conditions. It is therefore no longer necessary to develop a separate algorithm for each monitoring event, which reduces development difficulty and improves the flexibility of anomaly analysis.

In a possible implementation, the edge node determining, at least according to the monitoring event, whether the first service is abnormal includes: if the edge node determines that the abnormal conditions corresponding to the monitoring event include only the first abnormal condition, it determines that the first service is abnormal; if it determines that the abnormal conditions also include a second abnormal condition and the second abnormal condition is an impact time, it determines that the first service is not abnormal when the abnormal duration of the monitoring event is less than the impact time, and that the first service is abnormal when the abnormal duration is greater than or equal to the impact time.
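One way to evaluate the impact-time condition is to remember when the monitoring event first became abnormal and compare the elapsed time against the impact time, as in this hypothetical helper. Timestamps are plain seconds supplied by the caller, and resetting the timer when the event recovers is an assumption the description does not spell out.

```python
from typing import Optional


class ImpactTimeCheck:
    """Hypothetical helper that tracks how long one monitoring event has been abnormal."""

    def __init__(self, impact_time: Optional[float] = None) -> None:
        self.impact_time = impact_time  # seconds; None means no second condition
        self.abnormal_since: Optional[float] = None

    def update(self, event_abnormal: bool, now: float) -> bool:
        """Return True when the service should be judged abnormal."""
        if not event_abnormal:
            self.abnormal_since = None  # event recovered: reset the timer
            return False
        if self.abnormal_since is None:
            self.abnormal_since = now   # first abnormal observation
        if self.impact_time is None:
            return True                 # only the first abnormal condition applies
        return now - self.abnormal_since >= self.impact_time
```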

In a possible implementation, the method further includes: if the edge node determines that the second abnormal condition is that associated monitoring events are abnormal at the same time, it determines whether the other monitoring events associated with the monitoring event are abnormal; when all of those other monitoring events are also abnormal, it determines that the first service is abnormal, and when at least one of them is normal, it determines that the first service is not abnormal.
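The associated-event condition reduces to an all-of check over the related monitoring events. In this sketch the event states and the association map are plain dictionaries (hypothetical names); an event with no configured associations falls back to the first condition alone, and an event missing from the state map is treated as normal, both of which are assumptions.

```python
from typing import Dict, List


def service_abnormal(event: str, states: Dict[str, bool], associations: Dict[str, List[str]]) -> bool:
    """Judge the service abnormal only if the event and all its associated events are abnormal."""
    if not states.get(event, False):
        return False  # first abnormal condition not met
    related = associations.get(event, [])
    # A single normal associated event is enough to clear the service.
    return all(states.get(other, False) for other in related)
```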

In the foregoing implementations, setting an associated monitoring event or an impact time makes it possible to accurately identify services that are truly abnormal, reducing the probability of misjudgment and correspondingly improving the accuracy of anomaly identification and anomaly handling.

In a second aspect, the present invention provides an edge network-based anomaly processing device, where the edge network includes a central node and at least one edge node; the device includes:

An anomaly analysis module, configured to analyze service data using anomaly analysis rules to determine whether a first service in the edge node is abnormal, where the service data includes service data corresponding to the first service;

An exception handling module, configured to: after the first service is determined to be abnormal, if an exception handling rule for the first service exists in the edge node, use that exception handling rule to repair the first service; if no exception handling rule for the first service exists in the edge node, report to the central node.

In a possible implementation, after the exception handling module reports to the central node, the central node determines the exception handling rule of the first service and sends it to the edge node. The device further includes a transceiver module configured to receive the exception handling rule of the first service sent by the central node; correspondingly, the exception handling module is further configured to use the exception handling rule of the first service to repair the first service.

In a possible implementation, the device further includes a transceiver module. Before the anomaly analysis module analyzes the service data using anomaly analysis rules, the transceiver module is configured to send a registration request to the central node, where the registration request is used by the central node to establish a communication connection with the edge node, and, after the communication connection with the central node is established, to obtain from the central node the self-closed-loop strategies corresponding to various services, which include the first service. The self-closed-loop strategy corresponding to any service includes the anomaly analysis rules of the service, and may also include the exception handling rules of the service.

In a possible implementation, the anomaly analysis rule of any service includes an anomaly analysis rule corresponding to each monitoring event in the service; the anomaly analysis module is specifically configured to: for any monitoring event in the first service, parse out the service data of the monitoring event from the service data of the first service, and call an anomaly analysis algorithm that matches the type of that service data to analyze it; if the analysis result meets the first abnormal condition corresponding to the monitoring event, determine that the monitoring event is abnormal and determine, at least according to the monitoring event, whether the first service is abnormal; if the analysis result does not meet the first abnormal condition corresponding to the monitoring event, determine that the first service is not abnormal.

In a possible implementation, the anomaly analysis module is specifically configured to: if it determines that the abnormal conditions corresponding to the monitoring event include only the first abnormal condition, determine that the first service is abnormal; if it determines that the abnormal conditions also include a second abnormal condition and the second abnormal condition is an impact time, determine that the first service is not abnormal when the abnormal duration of the monitoring event is less than the impact time, and that the first service is abnormal when the abnormal duration is greater than or equal to the impact time.

In a possible implementation, the anomaly analysis module is further configured to: if it determines that the second abnormal condition is that associated monitoring events are abnormal at the same time, determine whether the other monitoring events associated with the monitoring event are abnormal; when all of those other monitoring events are also abnormal, determine that the first service is abnormal, and when at least one of them is normal, determine that the first service is not abnormal.

In a third aspect, the present invention provides a computing device including at least one processor and at least one memory, where the memory stores a computer program that, when executed by the processor, causes the processor to perform any of the methods described in the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program executable by a computing device; when the program runs on the computing device, it causes the computing device to perform any of the methods described in the first aspect.

These and other aspects of the present invention will become clearer and easier to understand from the description of the following embodiments.

Description of the drawings

In order to explain the technical solutions in the embodiments of the present invention more clearly, the following will briefly introduce the drawings needed in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can be obtained from these drawings without creative labor.

FIG. 1 is a schematic diagram of a system architecture of an edge network provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of an edge network-based exception handling method provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of the overall interaction flow of an exception handling method provided by an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a monitoring device provided by an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.

Detailed description of the embodiments

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

FIG. 1 is a schematic diagram of a system architecture of an edge network provided by an embodiment of the present invention. As shown in FIG. 1, the edge network includes a central node 110 and at least one edge node, such as an edge node 121, an edge node 122, and an edge node 123. The central node 110 may be connected to any edge node in a wired or wireless manner, which is not specifically limited.

In the embodiment of the present invention, the central node 110 is a remote device and each edge node is a near-end device; any edge node can also be connected to clients (not shown in FIG. 1) to provide near-end services to them. For example, as shown in FIG. 1, the edge node 121 can be connected to the client 131 and the client 132 and provide near-end services to both; the edge node 122 can be connected to the client 133 and provide near-end services to it; the edge node 123 can be connected to the client 134 and the client 135 and provide near-end services to both. A client can be any terminal device with communication capability, such as a notebook computer, an iPad, a mobile phone, or a router, which is not limited.

In a specific implementation, the central node 110 can deliver service data to each edge node in advance. When a client needs to access data, it sends a data access request toward the central node 110, and the request first arrives at the edge node adjacent to the client. The edge node checks, according to the data access request, whether the corresponding service data is stored locally; if so, it can return the service data directly to the client; if not, it can forward the data access request to the central node 110.

It should be noted that the architecture in FIG. 1 is only an exemplary description and does not limit the solution. In a specific implementation, multiple layers (that is, two or more layers) of edge nodes can also be deployed in the edge network. A client's data access request first reaches the lowest-layer edge node. If that edge node stores the corresponding service data locally, it returns the data to the client; if not, it forwards the data access request to the next-layer edge node, which performs the same response operation, and so on until the corresponding service data is returned to the client.
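The layered lookup can be pictured as a walk up a list of tiers. This Python sketch assumes each tier is just an in-memory dictionary and that the central node is the last resort; `Tier` and `serve` are illustrative names, not part of the described system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Tier:
    name: str
    store: Dict[str, bytes] = field(default_factory=dict)  # locally cached service data


def serve(key: str, tiers: List[Tier], central: Dict[str, bytes]) -> Tuple[str, Optional[bytes]]:
    """Answer from the lowest tier holding the data, else fall back to the central node."""
    for tier in tiers:  # tiers[0] is the edge node adjacent to the client
        if key in tier.store:
            return tier.name, tier.store[key]
    return "central", central.get(key)
```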

It should be noted that an edge node in the embodiments of the present invention may be an edge device, a cluster of edge devices, or a process in an edge device, which is not limited.

Based on the edge network illustrated in FIG. 1, FIG. 2 is a schematic flowchart of an edge network-based exception handling method provided by an embodiment of the present invention. The method is applicable to any edge node in the edge network and includes:

Step 201: The edge node analyzes the service data using an anomaly analysis rule to determine whether the first service in the edge node is abnormal; the service data includes service data corresponding to the first service.

Step 202: After determining that the first service is abnormal, the edge node determines whether an exception handling rule for the first service exists in the edge node. If so, it uses that exception handling rule to repair the first service; if not, it reports the anomaly of the first service to the central node.

In the embodiment of the present invention, service anomaly identification and anomaly repair are executed on the edge node side instead of being reported uniformly to the central node, which effectively reduces the workload of the central node and saves network overhead and time cost. Moreover, because the edge node performs self-closed-loop processing of its own anomalies, anomalies can be discovered and handled promptly, which improves the efficiency of anomaly identification and processing and restores service availability in time.

In step 201, the anomaly analysis rule can be configured in the edge node based on anomaly monitoring configuration information. The anomaly monitoring configuration information can be preconfigured on the edge node side by operation and maintenance personnel, configured on the central node by business personnel and then synchronized to the edge node, or obtained by the edge node from a third-party interface device; this is not specifically limited.

As a possible implementation, the anomaly monitoring configuration information can be configured in the edge node through the following steps:

Step a: The central node receives the anomaly monitoring configuration information input by the user.

In a specific implementation, the central node can provide users with an anomaly monitoring configuration interface. After detecting that a user has entered anomaly monitoring configuration information in the interface, the central node obtains and parses the information and, taking the service as the unit, extracts the entries belonging to the same service, thereby obtaining the anomaly monitoring configuration information corresponding to each service. The central node can then parse the anomaly monitoring configuration information corresponding to any service to obtain the self-closed-loop strategy for that service, and store it in its local database. The self-closed-loop strategy corresponding to any service may include the anomaly analysis rules of the service, and may also include the exception handling rules of the service and/or the acquisition rules for service data, which is not limited.
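Grouping configuration entries by service and persisting the resulting strategies could be sketched as follows. The entry fields (`service`, `event`, `condition`, `handling`, `acquisition`) and the SQLite table used as the central node's local database are assumptions for illustration; only the standard-library `sqlite3` and `json` modules are used.

```python
import json
import sqlite3
from collections import defaultdict
from typing import Dict, List


def build_strategies(entries: List[dict]) -> Dict[str, dict]:
    """Group configuration entries by service into self-closed-loop strategies."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for entry in entries:
        grouped[entry["service"]].append(entry)  # the service is the unit

    strategies: Dict[str, dict] = {}
    for service, items in grouped.items():
        strategy: Dict[str, dict] = {"analysis_rules": {}, "handling_rules": {}, "acquisition_rules": {}}
        for item in items:
            strategy["analysis_rules"][item["event"]] = item["condition"]
            if "handling" in item:     # handling rules are optional
                strategy["handling_rules"][item["event"]] = item["handling"]
            if "acquisition" in item:  # so are data-acquisition rules
                strategy["acquisition_rules"][item["event"]] = item["acquisition"]
        strategies[service] = strategy
    return strategies


def save_strategies(conn: sqlite3.Connection, strategies: Dict[str, dict]) -> None:
    """Persist strategies as JSON rows in the central node's local database."""
    conn.execute("CREATE TABLE IF NOT EXISTS strategy (service TEXT PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO strategy (service, body) VALUES (?, ?)",
        [(service, json.dumps(body)) for service, body in strategies.items()],
    )
    conn.commit()
```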

It should be noted that a self-closed-loop strategy is a strategy for processing a service's abnormal conditions in a self-closed loop, and includes the various rules related to that processing, such as anomaly analysis rules, exception handling rules, data acquisition rules, and abnormal conditions. In other words, a self-closed-loop strategy is obtained by extracting rules from the service's anomaly monitoring configuration information; it is the collective name for the rules used to process anomalies of the same service in a self-closed loop, not a processing method.

In an example, the anomaly analysis rule corresponding to any service may include an anomaly analysis rule for each monitoring event in the service, and the exception handling rule corresponding to any service may include an exception handling rule for each monitoring event in the service.

Table 1 illustrates a schematic table of a self-closed loop strategy corresponding to each service.

Table 1

As shown in Table 1, any service can correspond to one or more monitoring events, and each monitoring event can be configured with corresponding abnormal conditions and exception handling rules. For example, the concurrent service corresponds to two monitoring events: the concurrent volume event and the concurrent error rate event. When the concurrent volume is greater than or equal to 10,000, the concurrent volume event is determined to be abnormal, and a concurrent service process can be added to restore the concurrent service in the edge node. When the concurrent error rate is greater than 45%, the concurrent error rate event is determined to be abnormal, and the concurrent service can be restarted to restore its accuracy in the edge node. As another example, the resource service corresponds to one monitoring event, the resource occupancy event. When resource occupancy remains greater than or equal to 95% for more than 5 minutes, the resource service is determined to be abnormal, and the cache of the resource service can be cleaned to restore the availability of the resource service in the edge node.
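The example rules from the paragraph above can be encoded as data and checked with a single function. The field names and the `check` helper are hypothetical; the thresholds, the 5-minute (300 s) impact time and the handling actions are the ones given in the text, while the remaining rows of Table 1 are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventRule:
    service: str
    event: str
    op: str                       # ">=" or ">"
    threshold: float
    impact_time_s: Optional[int]  # None when no duration condition is set
    handling: str


# Values from the examples in the text; names are illustrative only.
RULES = [
    EventRule("concurrent", "concurrent_volume", ">=", 10_000, None, "add concurrent service process"),
    EventRule("concurrent", "concurrent_error_rate", ">", 0.45, None, "restart concurrent service"),
    EventRule("resource", "resource_occupancy", ">=", 0.95, 300, "clean resource service cache"),
]


def check(rule: EventRule, value: float, abnormal_for_s: int = 0) -> Optional[str]:
    """Return the handling action when the rule fires, otherwise None."""
    hit = value >= rule.threshold if rule.op == ">=" else value > rule.threshold
    if hit and (rule.impact_time_s is None or abnormal_for_s >= rule.impact_time_s):
        return rule.handling
    return None
```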

In an example, the central node can also allow the user to perform update operations such as creating new self-closed-loop strategies, clearing existing ones, modifying existing ones, or querying existing ones; after detecting that a self-closed-loop strategy has been updated, the central node can automatically load the updated strategy to improve the accuracy of exception handling. Taking the clearing of an existing self-closed-loop strategy as an example: when the central node detects that the user has triggered a modification instruction for existing strategies in the anomaly monitoring configuration interface, it can display the existing strategies to the user, and the user can either directly select the strategy to be cleared and delete it, or change its state from effective to invalid.
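A minimal in-memory model of these update operations, assuming each strategy carries an effective/invalid flag and a version number. `StrategyStore` and its `changed` list (standing in for the automatic reload trigger) are illustrative, not an actual central-node API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StrategyRecord:
    body: dict
    effective: bool = True  # False once the strategy is switched to invalid
    version: int = 1


@dataclass
class StrategyStore:
    records: Dict[str, StrategyRecord] = field(default_factory=dict)
    changed: List[str] = field(default_factory=list)  # services to reload automatically

    def create(self, service: str, body: dict) -> None:
        self.records[service] = StrategyRecord(body)
        self.changed.append(service)

    def modify(self, service: str, body: dict) -> None:
        record = self.records[service]
        record.body = body
        record.version += 1
        self.changed.append(service)

    def clear(self, service: str, hard: bool = False) -> None:
        # Either delete the record or flip it from effective to invalid.
        if hard:
            del self.records[service]
        else:
            self.records[service].effective = False
            self.records[service].version += 1
        self.changed.append(service)

    def query(self) -> Dict[str, dict]:
        """Return only the strategies that are still effective."""
        return {s: r.body for s, r in self.records.items() if r.effective}
```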

In the above example, setting the self-closed-loop strategies for various services on the anomaly monitoring configuration page of the central node decouples a service's self-closed-loop strategy from its business logic, and allows users to configure different strategies for different services according to their respective business needs, which improves the flexibility of exception handling. Moreover, configuring each strategy through the configuration interface simplifies operation, reduces manual operation and maintenance cost and time, and improves the efficiency of exception handling.

Step b: The edge node sends a registration request to the central node when it is started.

Step c: The central node verifies the registration request of the edge node. If verification succeeds, it establishes a communication connection with the edge node (used by the edge node to obtain the self-closed-loop strategies corresponding to various services) and sends a registration success response message to the edge node; if verification fails, it refuses to establish a communication connection with the edge node and sends a registration failure response message to the edge node.

Step d: If the edge node receives the registration success response message, it can obtain the self-closed-loop strategies corresponding to various services from the central node and store them in its local database. Correspondingly, if the edge node receives no response message, or receives a registration failure response message, it can periodically resend the registration request to the central node; if registration still has not succeeded after a set number of retransmissions, it gives up registering and generates a warning message.
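Steps b to d on the edge-node side amount to a bounded retry loop. In this sketch `send_request` is a hypothetical callable that hides the central node's verification and returns `True` for a success response, `False` for a failure response and `None` when nothing comes back; the attempt count and interval are placeholder values, not figures from the description.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("edge.registration")


def register_with_retry(
    send_request: Callable[[], Optional[bool]],
    max_attempts: int = 3,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send registration requests until one succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        if send_request() is True:
            return True        # next step: fetch the self-closed-loop strategies
        if attempt < max_attempts:
            sleep(interval_s)  # wait one period before resending
    logger.warning("registration failed after %d attempts; giving up", max_attempts)
    return False
```

Injecting `sleep` keeps the loop testable without real waiting.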

Here, the various services can be the services deployed on the edge node, or all services stored in the central node, which is not limited.

Taking the services deployed on the edge node as an example, and based on Table 1: after receiving a registration success response message, if the edge node determines that a concurrent service and a port service are deployed locally, it can obtain from the central node the self-closed-loop strategies corresponding to the concurrent service and the port service, and store them in its local database. There are many ways to obtain them. For example, the edge node can send an obtain request to the central node carrying the identifiers of the concurrent service and the port service, so that the central node returns the corresponding self-closed-loop strategies to the edge node according to the request. Alternatively, the central node can upload the self-closed-loop strategies of all services to a set location and grant the edge node access rights to that location, so that the edge node can fetch the strategies for the concurrent service and the port service from that location by itself.
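The first retrieval option (an obtain request carrying service identifiers) can be sketched with dictionaries standing in for both nodes' local databases. The function names are hypothetical, and silently skipping identifiers unknown to the central node is an assumption.

```python
from typing import Dict, Iterable


def handle_obtain_request(central_db: Dict[str, dict], service_ids: Iterable[str]) -> Dict[str, dict]:
    """Central node side: return strategies for the identifiers carried in the request."""
    return {sid: central_db[sid] for sid in service_ids if sid in central_db}


def sync_deployed(local_db: Dict[str, dict], deployed: Iterable[str], central_db: Dict[str, dict]) -> Dict[str, dict]:
    """Edge node side: request only the strategies for services deployed locally."""
    local_db.update(handle_obtain_request(central_db, deployed))
    return local_db
```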

As an example, after registering successfully with the central node, the edge node can also periodically obtain the self-closed-loop strategies corresponding to various services from the central node, to keep each service's strategy consistent between the configuration side (that is, the central node) and the execution side (that is, the edge node) and thus improve the accuracy of exception handling. As another example, the central node can monitor its local database in real time; once it detects that the user has updated the self-closed-loop strategy of a service, it can issue an update instruction to the edge nodes corresponding to that service, so that those edge nodes obtain the updated strategy in real time. This keeps the service's strategy consistent between the configuration side and the execution side and improves the accuracy of exception handling for the service.
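Both consistency mechanisms, periodic pull and pushed update instructions, can be combined as below. The version counter used to detect changes and the subscriber list are assumptions for this sketch; the description does not say how the edge node decides that a strategy has changed.

```python
from typing import Dict, List, Tuple

Versioned = Dict[str, Tuple[int, dict]]  # service -> (version, strategy body)


class Central:
    def __init__(self) -> None:
        self.db: Versioned = {}
        self.subscribers: Dict[str, List["Edge"]] = {}

    def update(self, service: str, body: dict) -> None:
        version = self.db.get(service, (0, {}))[0] + 1
        self.db[service] = (version, body)
        for edge in self.subscribers.get(service, []):  # push an update instruction
            edge.on_update_instruction(service)


class Edge:
    def __init__(self, central: Central, services: List[str]) -> None:
        self.central = central
        self.services = services
        self.db: Versioned = {}
        for s in services:
            central.subscribers.setdefault(s, []).append(self)

    def on_update_instruction(self, service: str) -> None:
        self.db[service] = self.central.db[service]  # fetch the updated strategy

    def periodic_pull(self) -> List[str]:
        """Run on a timer: refresh every service whose version differs."""
        refreshed = []
        for s in self.services:
            remote = self.central.db.get(s)
            if remote is not None and self.db.get(s, (0, {}))[0] != remote[0]:
                self.db[s] = remote
                refreshed.append(s)
        return refreshed
```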

In the embodiment of the present invention, a service process is set in the edge node for any service (such as the first service), and the edge node provides the first service to clients or other devices through that service process. After storing the self-closed-loop strategy corresponding to the first service in its local database, the edge node can obtain the service data of the first service by invoking the service process of the first service. There are many ways to obtain the data. For example, after detecting, by listening to the service process of the first service, that a service related to some monitoring event of the first service has been executed, the edge node can send an acquisition request carrying the identifier of the monitoring event to the service process, so that the service process returns the service data corresponding to the monitoring event in real time. Alternatively, the edge node can send acquisition requests to the service process at a set period, so that the service process returns the service data corresponding to the monitoring event at that period. This is not limited.

In a possible implementation, the edge node obtains the service data of the first service as follows. The self-closed-loop strategy also contains a data source interface for each monitoring event of the first service. Each data source interface is a function pre-encapsulated inside the edge node that records the service data of its monitoring event while the service process is providing the first service. For any monitoring event of the first service, the edge node first looks up the corresponding data source interface in the self-closed-loop strategy and then calls that interface to obtain the event's service data.

For example, suppose a first service process in the edge node provides port service at Internet Protocol (IP) address 127.0.0.1. For the request count event stored in the local database, the edge node can call the data source interface corresponding to that event, which sends acquisition requests to the first service process at the set period and obtains the number of requests that accessed the port at IP address 127.0.0.1 (that is, the service data).
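The lookup-then-call pattern of the data source interface can be sketched as follows. The registry `DATA_SOURCES`, the `data_source` decorator, the strategy layout, the in-memory request log, and port 8080 are all hypothetical (the text gives only the IP address 127.0.0.1); a real edge node would also invoke `collect_service_data` on the set period.

```python
from typing import Callable, Dict

# Hypothetical registry of data source interfaces pre-encapsulated in the edge node.
DATA_SOURCES: Dict[str, Callable[..., int]] = {}


def data_source(name: str):
    """Register a function under the interface name used in the strategy."""
    def register(fn):
        DATA_SOURCES[name] = fn
        return fn
    return register


# Stand-in for what the service process records while serving requests: (ip, port).
_REQUEST_LOG = [("127.0.0.1", 8080), ("127.0.0.1", 8080), ("10.0.0.2", 8080)]


@data_source("port_request_count")
def port_request_count(ip: str, port: int) -> int:
    """Number of requests that accessed ip:port in the current period."""
    return sum(1 for entry in _REQUEST_LOG if entry == (ip, port))


def collect_service_data(strategy: dict) -> dict:
    """For each monitoring event, look up its data source interface and call it."""
    data = {}
    for event, spec in strategy["events"].items():
        fn = DATA_SOURCES[spec["data_source"]]
        data[event] = fn(**spec.get("args", {}))
    return data


strategy = {
    "service": "port_service",
    "events": {
        "request_count": {"data_source": "port_request_count",
                          "args": {"ip": "127.0.0.1", "port": 8080}},
    },
}
print(collect_service_data(strategy))  # {'request_count': 2}
```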

It should be noted that the self-closed-loop strategy may also contain other configuration information needed to call the data source interfaces, such as environment variables and communication protocol conventions; this is not limited.

In this embodiment, the acquisition can be performed by a monitoring process running in the edge node, and the monitoring process communicates with the service process over sockets to improve the efficiency and accuracy of communication.
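As a rough illustration of the exchange between the monitoring process and the service process, the sketch below uses `socket.socketpair()` from Python's standard library, with a thread standing in for the service process. The JSON message format, the metric values, and the assumption that each message fits in one `recv()` call are hypothetical simplifications.

```python
import json
import socket
import threading

# Hypothetical metrics the service process keeps per monitoring-event identifier.
SERVICE_METRICS = {"request_count": 2, "abnormal_status_code": 0}


def serve_one_request(conn: socket.socket) -> None:
    """Service-process side: answer a single acquisition request."""
    request = json.loads(conn.recv(4096).decode())
    reply = {"event": request["event"], "value": SERVICE_METRICS.get(request["event"])}
    conn.sendall(json.dumps(reply).encode())


def acquire(conn: socket.socket, event_id: str) -> dict:
    """Monitoring-process side: send a request carrying the event id, read the reply.

    Assumes each message fits in one recv() call; a real protocol would add framing.
    """
    conn.sendall(json.dumps({"event": event_id}).encode())
    return json.loads(conn.recv(4096).decode())


# socketpair() stands in for a Unix-domain or TCP socket between two real processes.
monitor_end, service_end = socket.socketpair()
worker = threading.Thread(target=serve_one_request, args=(service_end,))
worker.start()
reply = acquire(monitor_end, "request_count")
worker.join()
monitor_end.close()
service_end.close()
print(reply)  # {'event': 'request_count', 'value': 2}
```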

In the above implementation, because the data source interface of each monitoring event is configured in the self-closed-loop strategy, the edge node can call that interface directly to obtain the corresponding service data without manual configuration. This is simple to operate and implement, and it also makes service data acquisition more efficient.

In this embodiment, the abnormality analysis rule of a monitoring event may contain one or more abnormal conditions. Every monitoring event has its own first abnormal condition, which indicates whether the monitoring event itself is abnormal. If a monitoring event has only a first abnormal condition, that condition indicates both that the monitoring event is abnormal and that the corresponding service is abnormal. If a monitoring event has a first abnormal condition and at least one second abnormal condition, the first abnormal condition indicates that the monitoring event is abnormal, while the first abnormal condition together with the second abnormal condition(s) indicates that the corresponding service is abnormal. The second abnormal condition(s) can be set by those skilled in the art from experience or according to actual needs; this is not specifically limited.

In a specific implementation, if the abnormality analysis rule of a monitoring event contains only the first abnormal condition, then when the event's service data meets the first abnormal condition, the corresponding service is in an abnormal state in the edge node, and the exception handling rule of that monitoring event can be invoked directly on the edge node to restore the service. If the service data does not meet the first abnormal condition, the monitoring event is in a normal state in the edge node and no processing is required. For example, as shown in Table 1, the concurrent volume event and the concurrent error rate event of the concurrent service each have only a first abnormal condition, and each has its own exception handling rule. Therefore, when either event is abnormal, the concurrent service is determined to be abnormal, and the exception handling rule of the abnormal event is used to repair the concurrent service in the edge node.

Correspondingly, if the abnormality analysis rule of a monitoring event also contains at least one second abnormal condition, the corresponding service is in an abnormal state in the edge node only when the event's service data meets both the first abnormal condition and the second abnormal condition(s). In that case, the exception handling rule of the monitoring event can be invoked on the edge node to restore the service. When the service data meets the first abnormal condition but not the second abnormal condition(s), the monitoring event is abnormal in the edge node but the corresponding service is not, so no processing is required.

In an example, the second abnormal condition may be an associated monitoring event and/or an impact time, and it can be derived from the actual failure scenarios of the service. Specifically, for any service, one can first collect the historical service data of each monitoring event at the times the service failed, analyze that data together to identify the characteristic factors behind the failures, and then set the second abnormal condition according to those factors. For example, if the service is truly abnormal only when a certain monitoring event and some other monitoring events are abnormal together, the second abnormal condition of that monitoring event can be set to those associated events, and the associated events can share the same exception handling rule. If the service is truly abnormal only when a monitoring event stays abnormal for longer than a certain duration, the second abnormal condition of that monitoring event can be set to that impact time.

For example, based on Table 1, the second abnormal condition of the abnormal status code event in the port service is the associated request count event. When the first abnormal condition shows that the abnormal status code event is abnormal, the edge node checks whether the associated request count event is also abnormal. If it is, the port service is determined to be abnormal and is repaired using the exception handling rule of the abnormal status code event; if it is not, the port service is not abnormal and no processing is needed. As another example, the second abnormal condition of the resource occupancy event in the resource service is an impact time of at least 5 minutes. If resource occupancy stays above 95% for less than 5 minutes, the resource occupancy event is abnormal, but the resource service recovers quickly and is not truly abnormal, so it can be left alone. If resource occupancy stays above 95% for 5 minutes or longer, the resource service cannot recover quickly and is truly abnormal, so it is repaired using the exception handling rule of the resource occupancy event.
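The decision logic of the first and second abnormal conditions can be sketched as below. Only the 95% occupancy threshold and the 5-minute (300 s) impact time come from the text; the status-code and request-count thresholds are hypothetical placeholders, and `EventRule` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class EventRule:
    first_condition: Callable[[float], bool]                    # is the event itself abnormal?
    associated_events: List[str] = field(default_factory=list)  # second condition (optional)
    impact_time_s: Optional[float] = None                       # second condition (optional)


def service_abnormal(event: str, rules: Dict[str, EventRule], values: Dict[str, float],
                     abnormal_duration_s: float = 0.0) -> bool:
    """Decide whether the service behind `event` is truly abnormal."""
    rule = rules[event]
    if not rule.first_condition(values[event]):
        return False  # the monitoring event is normal, so nothing to do
    # Associated-event condition: every associated event must be abnormal too.
    if any(not rules[o].first_condition(values[o]) for o in rule.associated_events):
        return False
    # Impact-time condition: the abnormality must have lasted at least impact_time_s.
    if rule.impact_time_s is not None and abnormal_duration_s < rule.impact_time_s:
        return False
    return True


rules = {
    "abnormal_status_code": EventRule(lambda v: v > 10, associated_events=["request_count"]),
    "request_count": EventRule(lambda v: v > 1000),
    "resource_occupancy": EventRule(lambda v: v > 0.95, impact_time_s=300),
}
values = {"abnormal_status_code": 25, "request_count": 1500, "resource_occupancy": 0.97}
print(service_abnormal("abnormal_status_code", rules, values))                         # True
print(service_abnormal("resource_occupancy", rules, values, abnormal_duration_s=120))  # False
print(service_abnormal("resource_occupancy", rules, values, abnormal_duration_s=300))  # True
```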

It should be noted that Table 1 is only illustrative and does not limit the solution. In a specific implementation, a monitoring event may also have three or more abnormal conditions. For example, a third abnormal condition may indicate the abnormality level of the service, so that the exception handling rule of the monitoring event is used for repair only when the service's abnormality level exceeds the level indicated by the third abnormal condition. A fourth abnormal condition may indicate a combined abnormality across services, so that the exception handling rule of the monitoring event is used only when all the services indicated by the fourth abnormal condition are abnormal. The specific conditions are not limited.

In the above example, deriving the second abnormal condition from real failure scenarios reduces the probability of falsely detecting abnormal services and improves detection accuracy. Moreover, by setting the second abnormal condition to an impact time and/or to the simultaneous abnormality of associated monitoring events, the service's abnormality can be judged from both how long the abnormality lasts and how many events are abnormal, which further improves the accuracy of the judgment.

In a possible implementation, the abnormality analysis rule of a monitoring event can also contain the abnormality analysis algorithm of that event. Monitoring events of the same type can share the same abnormality analysis algorithm; because each rule combines an algorithm with the event's own abnormal conditions, the rule of each monitoring event can still be unique. After obtaining the service data of a monitoring event, the edge node calls the abnormality analysis algorithm that matches the type of the service data to compute the abnormality judgment data from it, and then checks whether that data meets the abnormal condition of the monitoring event. If it does, the monitoring event is determined to be abnormal; otherwise it is determined to be normal.

In this embodiment, the abnormality analysis algorithm may include any one or more of a log keyword analysis method, a service health value analysis method, a threshold analysis method, and a service custom analysis method. Each is described below:

The log keyword analysis method analyzes service data of the log data type, such as batch processing time and batch processing success count. In a specific implementation, the service data is first split into monitoring log fields according to preset log fields. A multi-pattern matching algorithm (such as the Aho-Corasick algorithm or the Wu-Manber algorithm) is then used to match each monitoring log field, and the successfully matched fields are taken as the abnormality judgment data and compared with the preset log fields in the abnormal condition to determine whether the monitoring event is abnormal.
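A compact sketch of log keyword analysis using an Aho-Corasick automaton (trie, failure links, merged outputs). The `|`-separated log layout, the monitored fields, and the keyword list are assumptions; in this sketch any keyword hit counts as meeting the abnormal condition.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


class AhoCorasick:
    """Classic Aho-Corasick automaton: trie + failure links + merged outputs."""

    def __init__(self, patterns: Iterable[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]
        self.fail: List[int] = [0]
        self.out: List[Set[str]] = [set()]
        for pattern in patterns:
            self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern: str) -> None:
        node = 0
        for ch in pattern:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append(set())
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].add(pattern)

    def _build_failure_links(self) -> None:
        queue = deque(self.goto[0].values())  # depth-1 nodes fail to the root
        while queue:  # breadth-first, so shallower failure targets are final first
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                self.out[child] |= self.out[self.fail[child]]

    def search(self, text: str) -> Set[str]:
        node, found = 0, set()
        for ch in text:
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            found |= self.out[node]
        return found


LOG_FIELDS = ("timestamp", "level", "message")  # hypothetical "|"-separated layout
MONITORED_FIELDS = ("level", "message")


def log_keyword_analysis(line: str, matcher: AhoCorasick) -> Set[str]:
    """Split a log line into preset fields and match keywords in the monitored ones."""
    fields = dict(zip(LOG_FIELDS, line.split("|")))
    hits: Set[str] = set()
    for name in MONITORED_FIELDS:
        hits |= matcher.search(fields.get(name, ""))
    return hits  # non-empty means abnormal under this sketch's condition


matcher = AhoCorasick(["ERROR", "timeout", "batch failed"])
hits = log_keyword_analysis("2020-02-25 10:00:00|ERROR|upstream timeout after 3 retries", matcher)
print(sorted(hits))  # ['ERROR', 'timeout']
```

Aho-Corasick scans each field once regardless of how many keywords the abnormal condition lists, which is why multi-pattern algorithms suit large keyword sets.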

The service health value analysis method analyzes service data of the operation data type, such as status codes, bandwidth, request counts, and resource occupancy. In a specific implementation, a monitoring model is first trained for each indicator from historical service data. The model of an indicator is then used to predict a score for the current service data under that indicator. The predicted score is compared with a user-defined indicator score, a health level is determined from the comparison, and that health level is taken as the abnormality judgment data and compared with the preset health level in the abnormal condition to determine whether the monitoring event is abnormal.
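The text does not say which monitoring model is trained, so the sketch below swaps in a simple mean/standard-deviation baseline as a stand-in model. The scoring formula (20 points lost per standard deviation), the health level cutoffs, and the abnormal level are all hypothetical.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class BaselineModel:
    """Stand-in 'monitoring model': mean and spread of one indicator's history."""
    mean: float
    std: float

    @classmethod
    def train(cls, history: Sequence[float]) -> "BaselineModel":
        return cls(statistics.mean(history), statistics.pstdev(history) or 1.0)

    def score(self, value: float) -> float:
        """Predicted score in [0, 100]; drops by 20 points per standard deviation."""
        z = abs(value - self.mean) / self.std
        return max(0.0, 100.0 - 20.0 * z)


# User-defined indicator scores mapped to health levels (rank 3 is best); all assumptions.
HEALTH_LEVELS = ((80.0, 3, "healthy"), (50.0, 2, "degraded"), (0.0, 1, "unhealthy"))


def health_value_analysis(model: BaselineModel, value: float,
                          abnormal_at_or_below: int = 1) -> Tuple[bool, str]:
    """Return (abnormal?, health level) for one value of the indicator."""
    score = model.score(value)
    rank, name = next((r, n) for cutoff, r, n in HEALTH_LEVELS if score >= cutoff)
    return rank <= abnormal_at_or_below, name


model = BaselineModel.train([100, 110, 90, 105, 95])
print(health_value_analysis(model, 102))  # (False, 'healthy')
print(health_value_analysis(model, 140))  # (True, 'unhealthy')
```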

The threshold analysis method analyzes service data of the indicator data type, such as request counts and alarm counts. In a specific implementation, the monitoring value of the monitoring event under each specific indicator of the service is extracted from the service data, taken as the abnormality judgment data, and compared with the threshold of that indicator in the abnormal condition to determine whether the monitoring event is abnormal.
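A minimal sketch of threshold analysis over named indicators. The indicator names and limits are hypothetical, as are the choices that a value strictly above its threshold is abnormal and that any abnormal indicator makes the monitoring event abnormal.

```python
from typing import Dict, Mapping, Tuple


def threshold_analysis(service_data: Mapping[str, float],
                       thresholds: Mapping[str, float]) -> Tuple[bool, Dict[str, bool]]:
    """Compare each indicator's monitoring value with its threshold.

    Returns (event_abnormal, per-indicator result).
    """
    per_indicator = {
        name: service_data[name] > limit
        for name, limit in thresholds.items()
        if name in service_data  # indicators without data are skipped
    }
    return any(per_indicator.values()), per_indicator


abnormal, detail = threshold_analysis({"request_count": 1500, "alarm_count": 2},
                                      {"request_count": 1000, "alarm_count": 5})
print(abnormal, detail)  # True {'request_count': True, 'alarm_count': False}
```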

The service custom analysis method analyzes service data of unknown data types, or serves users who need a custom anomaly analysis algorithm. In a specific implementation, after detecting that a user needs a custom analysis method for a service, the edge node provides the user with a general interface through which the custom anomaly analysis algorithm can be uploaded. After receiving the algorithm, the edge node loads it and uses it to compute the abnormality judgment data from the service data of the monitoring event. The user can also define custom abnormal conditions at the same time, in which case the edge node compares the computed abnormality judgment data with those user-defined conditions to determine whether the monitoring event is abnormal.
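The upload, load, and analyze flow of the service custom analysis method might look like the sketch below. `CustomAnalysisInterface` is a hypothetical name, and user algorithms are passed as Python callables rather than loaded from uploaded code, to keep the example self-contained.

```python
from typing import Any, Callable, Dict, Tuple

Algorithm = Callable[[Any], Any]   # service data -> abnormality judgment data
Condition = Callable[[Any], bool]  # judgment data -> abnormal?


class CustomAnalysisInterface:
    """Hypothetical 'general interface' for user-uploaded analysis algorithms.

    A real edge node would load uploaded code (for example a module or plug-in);
    plain callables keep this sketch self-contained.
    """

    def __init__(self) -> None:
        self._loaded: Dict[str, Tuple[Algorithm, Condition]] = {}

    def upload(self, event_id: str, algorithm: Algorithm, condition: Condition) -> None:
        """Accept and 'load' a user algorithm together with its custom abnormal condition."""
        self._loaded[event_id] = (algorithm, condition)

    def analyze(self, event_id: str, service_data: Any) -> Tuple[bool, Any]:
        """Compute judgment data, then compare it with the user-defined condition."""
        algorithm, condition = self._loaded[event_id]
        judgment = algorithm(service_data)
        return condition(judgment), judgment


iface = CustomAnalysisInterface()
# Hypothetical user algorithm: failed-batch ratio, abnormal above 10%.
iface.upload("batch_failure", lambda d: d["failed"] / d["total"], lambda r: r > 0.1)
print(iface.analyze("batch_failure", {"failed": 3, "total": 20}))  # (True, 0.15)
```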

Based on these abnormality analysis algorithms, after obtaining the service data of a monitoring event, the edge node selects the algorithm by data type: the log keyword analysis method for the log data type, the service health value analysis method for the operation data type, and the threshold analysis method for the indicator data type. If the service data is of some other type, or the user needs a custom anomaly analysis algorithm, the service custom analysis method is called.
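The type-based selection described above reduces to a dispatch table. The analyzer bodies are deliberately trivial stand-ins, the data-type labels are hypothetical, and giving an explicit custom request precedence over the data type is an assumption the text leaves open.

```python
from typing import Any, Callable, Dict

# Minimal stand-ins for the four algorithm families; each returns True when abnormal.
ANALYZERS: Dict[str, Callable[[Any, dict], bool]] = {
    "log_keyword": lambda data, cond: any(k in data for k in cond["keywords"]),
    "service_health_value": lambda data, cond: data < cond["min_health_score"],
    "threshold": lambda data, cond: data > cond["threshold"],
    "service_custom": lambda data, cond: cond["custom"](data),
}

BY_DATA_TYPE = {"log": "log_keyword", "operation": "service_health_value",
                "indicator": "threshold"}


def select_algorithm(data_type: str, wants_custom: bool = False) -> str:
    """Pick the algorithm by data type; unknown types fall back to the custom method."""
    if wants_custom:
        return "service_custom"
    return BY_DATA_TYPE.get(data_type, "service_custom")


def analyze(data_type: str, data: Any, cond: dict, wants_custom: bool = False) -> bool:
    return ANALYZERS[select_algorithm(data_type, wants_custom)](data, cond)


print(analyze("indicator", 1500, {"threshold": 1000}))               # True
print(analyze("log", "disk ERROR", {"keywords": ["ERROR"]}))          # True
print(analyze("binary", b"\x00", {"custom": lambda d: len(d) == 0}))  # False
```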

In this embodiment, using a unified set of abnormality analysis algorithms while configuring separate abnormal conditions for each monitoring event decouples the abnormality analysis method from the actual business and makes abnormality analysis more flexible. Configuring the corresponding analysis algorithm for each monitoring event reduces development difficulty and further improves flexibility. The method also supports user-defined anomaly analysis algorithms, so new algorithms can be added continuously as users configure them. This widens the scenarios the analysis covers, meets the needs of different users, and improves the versatility of anomaly analysis.

In step 202, once the first service is determined to be abnormal, the edge node queries its local database to check whether an exception handling rule exists for the first service. If one exists, the edge node calls it directly to repair the first service; if not, the edge node generates an exception message and reports it to the central node 110. The exception message carries the abnormality data of the first service, such as the identifier of the abnormal monitoring event in the first service and the abnormal field, abnormal time, and abnormality level in that event's service data.
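The step 202 branch (repair locally if a rule exists, otherwise report) can be sketched as follows. The local database layout, the repair action, and the `report` callback are hypothetical; `ExceptionMessage` carries the fields the text lists.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExceptionMessage:
    """Abnormality data carried to the central node (fields follow the text above)."""
    service: str
    event_id: str
    abnormal_field: str
    abnormal_time: float
    level: str


def handle_abnormal_service(service: str, event_id: str, abnormal_field: str, level: str,
                            local_db: Dict[str, dict],
                            report: Callable[[ExceptionMessage], None]) -> str:
    """Repair locally if a handling rule exists, otherwise report to the central node."""
    rule = local_db.get(service, {}).get("handling_rules", {}).get(event_id)
    if rule is not None:
        rule()
        return "repaired_locally"
    report(ExceptionMessage(service, event_id, abnormal_field, time.time(), level))
    return "reported"


repairs: List[str] = []
sent: List[ExceptionMessage] = []
local_db = {"port_service": {"handling_rules": {
    "abnormal_status_code": lambda: repairs.append("restart port service")}}}

print(handle_abnormal_service("port_service", "abnormal_status_code", "status=502",
                              "major", local_db, sent.append))     # repaired_locally
print(handle_abnormal_service("resource_service", "resource_occupancy", "cpu=97%",
                              "critical", local_db, sent.append))  # reported
```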

In a possible implementation, after receiving the exception message, the central node 110 first parses it to obtain the abnormal field in the service data of the abnormal monitoring event, and then calculates the matching degree between the abnormal field and each preset abnormal event in the operation and maintenance (O&M) knowledge base. A preset abnormal event whose matching degree exceeds a preset matching degree is taken as the preset abnormal event corresponding to the monitoring event. If such an event exists, the central node 110 analyzes it to generate the corresponding exception handling rule and sends the rule to the edge node. If none exists, the central node 110 pushes the exception message to the user, who sets the corresponding exception handling rule, and the central node then sends that rule to the edge node.
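The text does not define the matching degree, so this sketch assumes a Jaccard similarity over lowercase word sets with a hypothetical preset matching degree of 0.5. The knowledge base entries and rule templates are invented for illustration.

```python
import re
from typing import Callable, Dict, List, Optional, Set, Tuple


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matching_degree(abnormal_field: str, preset_event: str) -> float:
    """Jaccard similarity of word sets (an assumed stand-in for the matching degree)."""
    a, b = _tokens(abnormal_field), _tokens(preset_event)
    return len(a & b) / len(a | b) if a and b else 0.0


def match_preset_events(abnormal_field: str, knowledge_base: Dict[str, str],
                        min_degree: float = 0.5) -> List[Tuple[str, float]]:
    """Preset abnormal events whose degree exceeds min_degree, best match first."""
    scored = [(name, matching_degree(abnormal_field, desc))
              for name, desc in knowledge_base.items()]
    return sorted((s for s in scored if s[1] > min_degree), key=lambda s: s[1], reverse=True)


def central_handle(abnormal_field: str, knowledge_base: Dict[str, str],
                   rule_templates: Dict[str, str],
                   notify_user: Callable[[str], None]) -> Optional[str]:
    """Return a handling rule for the best match, or escalate to the user."""
    matches = match_preset_events(abnormal_field, knowledge_base)
    if matches:
        return rule_templates[matches[0][0]]
    notify_user(abnormal_field)
    return None


kb = {"upstream_timeout": "upstream connection timeout",
      "disk_full": "no space left on device"}
templates = {"upstream_timeout": "switch to backup upstream",
             "disk_full": "purge expired cache files"}
escalated: List[str] = []
print(match_preset_events("upstream connection timeout after retries", kb))
# [('upstream_timeout', 0.6)]
print(central_handle("segfault in worker", kb, templates, escalated.append))  # None
```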

Correspondingly, after receiving the exception handling rule, the edge node not only uses it to repair the first service but also uses the abnormal monitoring event of the first service and the received exception handling rule to update the self-closed-loop strategy of the first service stored in its local database, continuously enriching the local database. When the same exception of the first service occurs again, the edge node can call the exception handling rule from its local database directly, without reporting to the central node, which improves the exception handling capability of the edge node.
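A small sketch of the edge node caching a received rule so that the next occurrence is handled locally. The dictionary layout of the local database and the repair action are hypothetical.

```python
from typing import Callable, Dict, List, Optional

LocalDB = Dict[str, dict]  # service -> self-closed-loop strategy (hypothetical layout)


def find_rule(local_db: LocalDB, service: str,
              event_id: str) -> Optional[Callable[[], None]]:
    return local_db.get(service, {}).get("handling_rules", {}).get(event_id)


def apply_received_rule(local_db: LocalDB, service: str, event_id: str,
                        rule: Callable[[], None]) -> None:
    """Repair with the rule sent by the central node, then store it in the local strategy."""
    rule()
    strategy = local_db.setdefault(service, {})
    strategy.setdefault("handling_rules", {})[event_id] = rule


repairs: List[str] = []
local_db: LocalDB = {}
print(find_rule(local_db, "resource_service", "resource_occupancy"))  # None: must report
apply_received_rule(local_db, "resource_service", "resource_occupancy",
                    lambda: repairs.append("release idle resources"))
cached = find_rule(local_db, "resource_service", "resource_occupancy")  # next time: local
cached()
print(repairs)  # ['release idle resources', 'release idle resources']
```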

In an example, the central node 110 may also display the service status of each edge node to the user, so that the user can check the abnormal status and distribution of the various services in time. The displayed information can include any one or more of: the abnormal status of each service in each edge node, the abnormal status of each monitoring event in each service, the processing results of abnormal monitoring events, the distribution of abnormal monitoring events, and the correlations between monitoring events. The central node 110 may present this information as a holographic view or as a table; this is not limited.

In this embodiment, service abnormality identification and repair are performed on the edge node side instead of being uniformly reported to the central node, which effectively reduces the workload of the central node and saves network overhead and time. In addition, because each edge node handles its own anomalies in a self-closed loop, anomalies are discovered and handled in time, which improves the efficiency of anomaly identification and processing and restores service availability promptly.

Fig. 3 is a schematic diagram of the overall interaction flow corresponding to an exception handling method provided by an embodiment of the present invention. As shown in Fig. 3, the method includes:

Step 301: After detecting that the user inputs abnormal monitoring configuration information in the abnormal monitoring configuration interface, the central node acquires and stores the abnormal monitoring configuration information.

The abnormality monitoring configuration information may contain the self-closed-loop strategy of each service. The self-closed-loop strategy of a service contains the abnormality analysis rules of the service, and may also contain the exception handling rules of the service and/or the service data acquisition rules.
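One possible in-memory shape for a self-closed-loop strategy, with a check that associated events are actually defined. The field names, the `{"gt": ...}` condition encoding, and the data source and rule names are assumptions; only the 95% threshold and 5-minute impact time echo the text's Table 1 example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MonitoringEventConfig:
    data_source: str                    # service data acquisition rule (data source interface)
    algorithm: str                      # abnormality analysis algorithm name
    first_condition: Dict[str, float]   # e.g. {"gt": 0.95}
    associated_events: List[str] = field(default_factory=list)  # second condition
    impact_time_s: Optional[int] = None                          # second condition
    handling_rule: Optional[str] = None  # exception handling rule; may be absent


@dataclass
class SelfClosedLoopStrategy:
    service: str
    events: Dict[str, MonitoringEventConfig]

    def validate(self) -> List[str]:
        """Report associated events that the strategy does not define."""
        return [
            f"{name}: unknown associated event {other!r}"
            for name, cfg in self.events.items()
            for other in cfg.associated_events
            if other not in self.events
        ]


port = SelfClosedLoopStrategy("port_service", {
    "abnormal_status_code": MonitoringEventConfig(
        "status_code_counter", "threshold", {"gt": 10},
        associated_events=["request_count"], handling_rule="restart_port_service"),
    "request_count": MonitoringEventConfig("port_request_count", "threshold", {"gt": 1000}),
})
resource = SelfClosedLoopStrategy("resource_service", {
    "resource_occupancy": MonitoringEventConfig(
        "resource_usage", "threshold", {"gt": 0.95},
        impact_time_s=300, handling_rule="release_resources"),
})
print(port.validate(), resource.validate())  # [] []
```

Leaving `handling_rule` optional mirrors the text: a strategy without a handling rule still lets the edge node detect the abnormality and report it.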

Step 302: The edge node sends a registration request to the central node when it is started.

In step 303, the central node verifies the registration request. If verification succeeds, step 304 is executed; if it fails, step 314 is executed.

Step 304: The central node sends a response message of successful registration to the edge node.

Step 305: The edge node obtains the self-closed loop strategy corresponding to various services from the central node and stores it in the local database of the edge node; the various services include the first service.

Step 306: The edge node invokes the data source interface corresponding to the first service to obtain the service data of the first service from the service process of the first service.

In step 307, the edge node uses the abnormality analysis rule of the first service to analyze the service data of the first service and determine whether the first service is abnormal.

In step 308, the edge node queries the local database to determine whether there is an exception handling rule for the first service, if not, execute step 309, and if yes, execute step 312.

Step 309: The edge node sends an abnormal message to the central node, and the abnormal message carries related abnormal data of the first service.

Step 310: The central node sets an exception handling rule for the first service based on the parsed related exception data of the first service.

Step 311: The central node sends the exception handling rule of the first service to the edge node.

Step 312: The edge node uses the exception handling rule of the first service to repair the first service.

Step 313: If the edge node determines that the exception handling rule of the first service is not stored in its local database, it updates the local database using that exception handling rule.

In step 314, the edge node resends the registration request to the central node; if registration has still not succeeded after a set number of attempts, an alarm message is generated.
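Step 314 amounts to a bounded retry loop. In this sketch the attempt limit, retry interval, and alarm text are hypothetical, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, List


def register_with_retry(send_registration: Callable[[], bool], max_attempts: int = 3,
                        interval_s: float = 5.0,
                        alarm: Callable[[str], None] = print,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Resend the registration request up to max_attempts times, then raise an alarm."""
    for attempt in range(1, max_attempts + 1):
        if send_registration():
            return True
        if attempt < max_attempts:
            sleep(interval_s)
    alarm(f"edge node registration failed after {max_attempts} attempts")
    return False


attempts: List[int] = []


def flaky_central() -> bool:
    """Hypothetical central node whose verification succeeds on the second try."""
    attempts.append(1)
    return len(attempts) >= 2


print(register_with_retry(flaky_central, sleep=lambda s: None))  # True
```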

In the above embodiment, each edge node analyzes service data, including the service data of the first service, using abnormality analysis rules to determine whether the first service in the edge node is abnormal. After determining that the first service is abnormal, the edge node repairs it using the exception handling rule of the first service if such a rule exists in the edge node, and reports to the central node otherwise. Because service abnormality identification and repair are performed on the edge node side rather than uniformly reported to the central node, the workload of the central node is effectively reduced and network overhead and time are saved. In addition, because each edge node handles its own anomalies in a self-closed loop, anomalies are discovered and handled in time, which improves the efficiency of anomaly identification and processing and restores service availability promptly.

In view of the foregoing method flow, an embodiment of the present invention also provides an edge network-based exception handling device, and the specific content of the device can be implemented with reference to the foregoing method.

Fig. 4 is a schematic structural diagram of an abnormality processing device based on an edge network provided by an embodiment of the present invention. The edge network includes a central node and at least one edge node; the device includes:

The abnormality analysis module 401 is configured to analyze service data using abnormality analysis rules to determine whether the first service in the edge node is abnormal, where the service data includes the service data of the first service;

The exception handling module 402 is configured to, after the first service is determined to be abnormal, repair the first service using the exception handling rule of the first service if such a rule exists in the edge node, and report to the central node if it does not.

Optionally, after the exception handling module 402 reports to the central node, the central node determines the exception handling rule of the first service and issues it to the edge node;

The device also includes a transceiver module 403, which is configured to: receive the exception handling rule of the first service sent by the central node;

The exception handling module 402 is further configured to: use the exception handling rule of the first service to repair the first service.

Optionally, the device further includes a transceiver module 403; before the abnormality analysis module 401 analyzes the service data using anomaly analysis rules, the transceiver module 403 is configured to:

Send a registration request to the central node, where the registration request is used by the central node to establish a communication connection with the edge node; and, after the communication connection with the central node is established, obtain the self-closed-loop strategies of the various services from the central node, where the various services include the first service, and the self-closed-loop strategy of any service contains the abnormality analysis rules of the service and may also contain the exception handling rules of the service.

Optionally, the self-closed loop strategy corresponding to the various services is obtained in the following manner:

When the central node detects that the user has entered abnormality monitoring configuration information in the abnormality monitoring configuration interface, it obtains and parses that information to obtain the self-closed-loop strategies of the various services, and stores them in the local database of the central node.

Optionally, the anomaly analysis rule of any service includes an anomaly analysis rule corresponding to each monitoring event in the service;

The abnormality analysis module 401 is specifically used for:

For any monitoring event of the first service, parse the service data of the monitoring event out of the service data of the first service, and call the abnormality analysis algorithm that matches the type of that data to analyze it. If the analysis result meets the first abnormal condition of the monitoring event, determine that the monitoring event is abnormal, and determine whether the first service is abnormal based at least on the monitoring event; if the analysis result does not meet the first abnormal condition of the monitoring event, determine that the first service is not abnormal.

Optionally, the abnormality analysis module 401 is specifically configured to:

If the abnormal conditions of the monitoring event contain only the first abnormal condition, determine that the first service is abnormal. If they also contain a second abnormal condition and that condition is an impact time, determine that the first service is not abnormal when the abnormal duration of the monitoring event is less than the impact time, and that the first service is abnormal when the abnormal duration is greater than or equal to the impact time.

Optionally, the abnormality analysis module 401 is further configured to:

If the second abnormal condition is that associated monitoring events are abnormal at the same time, determine whether the other monitoring events associated with the monitoring event are abnormal. If all of them are also abnormal, determine that the first service is abnormal; if at least one of them is normal, determine that the first service is not abnormal.

As can be seen from the above, in this embodiment each edge node analyzes service data, including the service data of the first service, using abnormality analysis rules to determine whether the first service in the edge node is abnormal. After determining that the first service is abnormal, the edge node repairs it using the exception handling rule of the first service if such a rule exists in the edge node, and reports to the central node otherwise. Because service abnormality identification and repair are performed on the edge node side rather than uniformly reported to the central node, the workload of the central node is effectively reduced and network overhead and time are saved. In addition, because each edge node handles its own anomalies in a self-closed loop, anomalies are discovered and handled in time, which improves the efficiency of anomaly identification and processing and restores service availability promptly.

Based on the same inventive concept, an embodiment of the present invention also provides a computing device which, as shown in FIG. 5, includes at least one processor 501 and a memory 502 connected to the at least one processor. The embodiment does not limit the specific connection medium between the processor 501 and the memory 502; in FIG. 5 they are connected through a bus, as an example. The bus can be divided into an address bus, a data bus, a control bus, and so on.

In this embodiment, the memory 502 stores instructions executable by the at least one processor 501, and by executing those instructions the at least one processor 501 can perform the steps of the aforementioned edge network-based exception handling method.

The processor 501 is the control center of the computing device. It connects the various parts of the computing device through various interfaces and lines, and processes data by running or executing the instructions stored in the memory 502 and calling the data stored in the memory 502. Optionally, the processor 501 may include one or more processing units and may integrate an application processor and a modem processor: the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles issuing instructions. It can be understood that the modem processor may also not be integrated into the processor 501. In some embodiments, the processor 501 and the memory 502 may be implemented on the same chip; in others, they may be implemented on separate chips.

The processor 501 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the edge network-based exception handling method disclosed in the embodiments may be directly executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.

As a non-volatile computer-readable storage medium, the memory 502 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 502 may include at least one type of storage medium, such as flash memory, hard disk, multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, magnetic disk, optical disc, and the like. The memory 502 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 502 in the embodiment of the present invention may also be a circuit or any other device capable of realizing a storage function, for storing program instructions and/or data.

Based on the same inventive concept, embodiments of the present invention also provide a computer-readable storage medium storing a computer program executable by a computing device; when the program runs on the computing device, the computing device is caused to execute the edge network-based exception handling method described in either FIG. 2 or FIG. 3.

Those skilled in the art should understand that the embodiments of the present invention can be provided as a method or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce a means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, which implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications falling within the scope of the present invention.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. In this way, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.

Claims

1. An abnormality processing method based on an edge network, wherein the edge network includes a central node and at least one edge node, and the method includes:

Any edge node analyzes service data using an abnormality analysis rule to determine whether the first service in the edge node is abnormal; the service data includes service data corresponding to the first service;

After the edge node determines that the first service is abnormal, if the edge node has an exception handling rule for the first service, the exception handling rule is used to repair the first service; if the edge node has no exception handling rule for the first service, the abnormality is reported to the central node.
2. The method according to claim 1, wherein after the reporting to the central node, the central node determines the exception handling rule of the first service and sends it to the edge node;

Receiving, by the edge node, the exception handling rule of the first service sent by the central node;

The edge node uses the exception handling rule of the first service to repair the first service.
3. The method according to claim 1, characterized in that, before any edge node analyzes the service data using an abnormality analysis rule, the method further comprises:

The edge node sends a registration request to the central node; the registration request is used for the central node to establish a communication connection with the edge node;

After the edge node establishes the communication connection with the central node, it obtains the self-closed-loop strategies corresponding to various services from the central node; the various services include the first service; the self-closed-loop strategy corresponding to any service includes the abnormality analysis rule of the service or the exception handling rule of the service.
4. The method according to claim 3, wherein the self-closed-loop strategies corresponding to the various services are obtained as follows:

When the central node detects that a user has entered anomaly monitoring configuration information in an anomaly monitoring configuration interface, it obtains and parses the anomaly monitoring configuration information to obtain the self-closed-loop strategies corresponding to the various services, and stores them in the local database of the central node.
5. The method according to any one of claims 1 to 4, wherein the anomaly analysis rule of any service includes an anomaly analysis rule corresponding to each monitoring event in the service;

The analysis of service data by any edge node using an abnormality analysis rule to determine whether the first service in the edge node is abnormal includes:

For any monitoring event in the first service, the edge node parses the service data of the monitoring event out of the service data of the first service, and invokes the abnormality analysis algorithm of the monitoring event that matches the type of the service data of the monitoring event to analyze that service data. If the analysis result meets the first abnormal condition corresponding to the monitoring event, the edge node determines that the monitoring event is abnormal and determines, at least according to the monitoring event, whether the first service is abnormal; if the analysis result does not meet the first abnormal condition corresponding to the monitoring event, the edge node determines that the first service is not abnormal.
6. The method according to claim 5, wherein the determining, by the edge node, whether the first service is abnormal at least according to the monitoring event comprises:

If the edge node determines that the abnormal conditions corresponding to the monitoring event include only the first abnormal condition, it determines that the first service is abnormal. If it determines that the abnormal conditions corresponding to the monitoring event further include a second abnormal condition, and the second abnormal condition is an impact time, it determines that the first service is not abnormal when the abnormal duration of the monitoring event is less than the impact time, and determines that the first service is abnormal when the abnormal duration of the monitoring event is greater than or equal to the impact time.
7. The method according to claim 6, wherein the method further comprises:

If the edge node determines that the second abnormal condition is that associated monitoring events are abnormal at the same time, it determines whether the other monitoring events associated with the monitoring event are abnormal; when the other monitoring events are also abnormal, it determines that the first service is abnormal, and when at least one of the other monitoring events is normal, it determines that the first service is not abnormal.
8. An abnormality processing device based on an edge network, wherein the edge network includes a central node and at least one edge node, and the device includes:

An anomaly analysis module, configured to analyze service data using anomaly analysis rules to determine whether the first service in the edge node is abnormal; the service data includes service data corresponding to the first service;

The exception processing module is configured to, after determining that the first service is abnormal, if there is an exception handling rule for the first service in the edge node, use the exception handling rule to repair the first service; If there is no exception handling rule for the first service in the edge node, it is reported to the central node.
9. A computing device, comprising at least one processor and at least one memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores a computer program executable by a computing device, and when the program runs on the computing device, the computing device is caused to perform the method according to any one of claims 1 to 7.
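The per-event determination recited in claims 5 to 7 can be sketched in Python as below. This is a hedged illustration under stated assumptions, not the patented implementation: service data is modeled as a dictionary of numeric samples keyed by monitoring-event name; the "type-matched" analysis algorithms in `ALGORITHMS` are hypothetical (average for error rates, maximum for latency); the first abnormal condition is assumed to be a simple threshold; and the caller is assumed to track when each event first became abnormal (`abnormal_since`). All names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mapping from service-data type to an analysis algorithm (claim 5).
ALGORITHMS: Dict[str, Callable[[List[float]], float]] = {
    "error_rate": lambda xs: sum(xs) / len(xs),  # average over the sampling window
    "latency_ms": max,                           # worst sample in the window
}


@dataclass
class MonitoringEvent:
    name: str
    data_type: str                       # selects the analysis algorithm
    threshold: float                     # assumed first abnormal condition: result > threshold
    impact_time: Optional[float] = None  # second condition of claim 6, in seconds
    associated: Tuple[str, ...] = ()     # second condition of claim 7


def event_abnormal(event: MonitoringEvent, service_data: Dict[str, List[float]]) -> bool:
    samples = service_data[event.name]   # parse this event's data out of the service data
    result = ALGORITHMS[event.data_type](samples)
    return result > event.threshold


def service_abnormal(
    name: str,
    events: Dict[str, MonitoringEvent],
    service_data: Dict[str, List[float]],
    abnormal_since: Dict[str, float],
    now: float,
) -> bool:
    event = events[name]
    if not event_abnormal(event, service_data):
        return False                     # first condition not met: service not abnormal
    # This sketch checks at most one second condition, impact time taking precedence.
    if event.impact_time is not None:    # claim 6: require a minimum abnormal duration
        duration = now - abnormal_since.get(name, now)
        return duration >= event.impact_time
    if event.associated:                 # claim 7: associated events must also be abnormal
        return all(event_abnormal(events[n], service_data) for n in event.associated)
    return True                          # only the first condition is configured


# Usage with hypothetical events and data.
events = {
    "http_5xx": MonitoringEvent("http_5xx", "error_rate", 0.05, impact_time=60.0),
    "latency": MonitoringEvent("latency", "latency_ms", 500.0, associated=("http_5xx",)),
}
data = {"http_5xx": [0.1, 0.2], "latency": [300.0, 800.0]}
print(service_abnormal("http_5xx", events, data, {"http_5xx": 100.0}, now=130.0))  # False
print(service_abnormal("http_5xx", events, data, {"http_5xx": 100.0}, now=170.0))  # True
print(service_abnormal("latency", events, data, {}, now=0.0))                      # True
```

The impact-time check suppresses short spikes, while the association check requires corroboration from related events before the service is treated as abnormal. Both act only after the first abnormal condition is met, as in claim 5.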