WO2024139937A1

WO2024139937A1 - Edge-computing-based method and apparatus for monitoring livestream pulling

Info

Publication number: WO2024139937A1
Application number: PCT/CN2023/134464
Authority: WO
Inventors: Wu Zhao (吴钊)
Original assignee: Tianyi Digital Life Technology Co., Ltd. (天翼数字生活科技有限公司)
Priority date: 2022-12-28
Filing date: 2023-11-27
Publication date: 2024-07-04
Also published as: CN116112340A

Abstract

The present application discloses an edge-computing-based method and apparatus for monitoring livestream pulling. The method comprises: according to received historical monitoring data information, using a scheduling center node to sequentially perform classification processing and model training operations, and obtaining a plurality of anomaly detection models, wherein the historical monitoring data information comprises information related to basic data, loads, and monitored devices; based on the historical monitoring data information, using a proxy node to respectively send the plurality of anomaly detection models to the servers corresponding to each monitored device; and according to a computing power balancing mechanism, using the servers and the anomaly detection models to perform real-time anomaly detection on the monitored devices, and obtaining detection results. The present application solves the technical problem of existing monitoring methods either being too one-sided, easily resulting in low accuracy, or easily causing excess pressure on a central node, resulting in the inability to achieve real-time performance.

Description

A method and device for monitoring live streaming based on edge computing

Technical Field

The present application relates to the field of operation and maintenance monitoring technology, and in particular to a live streaming monitoring method and device based on edge computing.

Background Art

With the continuous evolution of the Internet, the number of servers has grown rapidly and business scenarios have become more diverse. This is especially true of video-network scenarios that depend on a series of different application services, such as livestream pushing and pulling and authentication, where conventional alarms based on fixed alarm thresholds are increasingly one-sided. When AI models are instead used for threshold-free anomaly detection, the feature values of multi-indicator anomaly detection differ too much between deployment projects, which results in low accuracy. Training separate models adapted to each deployment project and running them all at once places excessive pressure on the server and over-centralizes processing: when a large number of machine indicators flow into the central node to invoke the AI model for real-time anomaly detection, the central node becomes overloaded. In addition, local data must traverse multiple network layers to reach the anomaly detection node, which takes time that is difficult to control. By the time an anomaly result is produced, a customer complaint may well have already arrived, harming the system's availability evaluation.

Summary of the Invention

The present application provides a live streaming monitoring method and device based on edge computing, which are used to solve the technical problems that existing monitoring methods are either too one-sided and simplistic, which easily leads to low accuracy, or place excessive pressure on the central node and therefore cannot meet real-time requirements.

In view of this, the first aspect of the present application provides a live streaming monitoring method based on edge computing, comprising:

The dispatch center node sequentially performs classification processing and model training operations based on the received historical monitoring data information to obtain multiple anomaly detection models, wherein the historical monitoring data information includes basic data, load, and information related to the monitored machine;

The proxy node sends the plurality of anomaly detection models, based on the historical monitoring data information, to the servers corresponding to the respective monitored machines;

Under a computing power balancing mechanism, the server performs real-time anomaly detection on the monitored machine using the anomaly detection model to obtain a detection result.

Preferably, the step in which the dispatch center node sequentially performs classification processing and model training operations on the received historical monitoring data information to obtain multiple anomaly detection models includes:

Classifying the received historical monitoring data information according to the monitored machine information it contains, to obtain a corresponding plurality of monitoring information sequences;

Performing, based on a projection mechanism, model training on each of the monitoring information sequences respectively to obtain multiple types of anomaly detection models.

Preferably, before the dispatch center node sequentially performs classification processing and model training operations on the received historical monitoring data information to obtain multiple anomaly detection models, the method further includes:

Monitoring the monitored machine in real time through anomaly monitoring equipment, and sending the collected historical monitoring data information to the dispatch center.

Preferably, the step in which the server performs, under the computing power balancing mechanism, real-time anomaly detection on the monitored machine using the anomaly detection model to obtain a detection result includes:

The server performing real-time anomaly detection on the monitored machine using the received anomaly detection model;

If the current load of the server exceeds a preset standard value, an adjacent server, the proxy node, or the dispatch center performing the real-time anomaly detection on the monitored machine using the anomaly detection model;

Wherein the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center.

Preferably, after the server performs, under the computing power balancing mechanism, real-time anomaly detection on the monitored machine using the anomaly detection model to obtain a detection result, the method further includes:

If the detection result is abnormal, pulling the system operation log of the monitored machine;

Parsing the system operation log and performing root cause analysis on it to obtain a root cause analysis result;

Annotating the system operation log according to the root cause analysis result to obtain an annotated structured log.

The second aspect of the present application provides a live streaming monitoring device based on edge computing, including:

A model training unit, configured to perform, through the dispatch center node, classification processing and model training operations in sequence on the received historical monitoring data information to obtain multiple anomaly detection models, wherein the historical monitoring data information includes basic data, load and information related to the monitored machine;

A model distribution unit, configured to send, through a proxy node and based on the historical monitoring data information, the plurality of anomaly detection models to the servers corresponding to the respective monitored machines;

An anomaly detection unit, configured to perform, through the server and under the computing power balancing mechanism, real-time anomaly detection on the monitored machine using the anomaly detection model to obtain a detection result.

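Preferably, the model training unit is specifically configured to:

Classify the received historical monitoring data information according to the monitored machine information it contains, to obtain a corresponding plurality of monitoring information sequences;

Perform, based on a projection mechanism, model training on each of the monitoring information sequences respectively to obtain multiple anomaly detection models.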

Preferably, the device further includes:

A data acquisition unit, configured to monitor the monitored machine in real time through anomaly monitoring equipment and send the collected historical monitoring data information to the dispatch center.

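Preferably, the anomaly detection unit is specifically configured to:

Perform, through the server, real-time anomaly detection on the monitored machine using the received anomaly detection model;

If the current load of the server exceeds a preset standard value, perform the real-time anomaly detection on the monitored machine using the anomaly detection model through an adjacent server, the proxy node, or the dispatch center;

Wherein the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center.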

Preferably, the device further includes:

A log pulling unit, configured to pull the system operation log of the monitored machine when the detection result is abnormal;

A log analysis unit, configured to parse the system operation log and perform anomaly root cause analysis on it to obtain a root cause analysis result;

A log annotation unit, configured to annotate the system operation log according to the root cause analysis result to obtain an annotated structured log.

It can be seen from the above technical solutions that the embodiments of the present application have the following advantages:

In the present application, a live streaming monitoring method based on edge computing is provided, including: a dispatch center node performing classification processing and model training operations in sequence on received historical monitoring data information to obtain a plurality of anomaly detection models, wherein the historical monitoring data information includes basic data, load and information related to the monitored machine; a proxy node sending the plurality of anomaly detection models, based on the historical monitoring data information, to the servers corresponding to the respective monitored machines; and the servers performing, under a computing power balancing mechanism, real-time anomaly detection on the monitored machines using the anomaly detection models to obtain detection results.

The live streaming monitoring method based on edge computing provided by the present application uses multiple classes of threshold-free anomaly detection models to detect anomalies on machines belonging to different engineering projects. It therefore avoids both the one-sidedness of fixed thresholds and the low detection accuracy of a single, one-size-fits-all model. Moreover, each class of model is trained on the machines' own monitoring data, so it better reflects actual operating conditions. To keep model detection computation from concentrating on the dispatch center node, the computing power for the models is pushed down to the edge: the trained models are sent directly to the servers corresponding to the monitored machines and run there. This reduces the computing pressure on the dispatch center and avoids the long data transmission times that would degrade real-time performance. Therefore, the present application can solve the technical problems that existing monitoring methods are either too one-sided and simplistic, which easily leads to low accuracy, or place excessive pressure on the central node and cannot meet real-time requirements.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart of a live streaming monitoring method based on edge computing provided in an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a live streaming monitoring device based on edge computing provided in an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a live streaming monitoring system with computing pushed down to the edge, provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of a multi-indicator data projection process provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of an anomaly detection computing power balancing process provided in an embodiment of the present application.

Detailed Description of the Embodiments

To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.

For ease of understanding, refer to FIG. 1. An embodiment of a live streaming monitoring method based on edge computing provided by the present application includes:

Step 101: The dispatch center node performs classification processing and model training operations in sequence according to the received historical monitoring data information to obtain multiple anomaly detection models. The historical monitoring data information includes basic data, load and information related to the monitored machine.

Furthermore, step 101 includes:

Classifying the historical monitoring data information according to the monitored machine information contained in it, to obtain a corresponding plurality of monitoring information sequences;

Performing, based on a projection mechanism, model training on each of the monitoring information sequences respectively to obtain multiple types of anomaly detection models.

Referring to FIG. 3, the dispatch center receives reported data, including detection results, from the servers corresponding to the monitored machines. The reported data contains monitoring data information. Because the monitoring data used to train the models has already been collected by the time training takes place, it is referred to as historical monitoring data information.

In addition to basic data, load, and information related to the monitored machine, the historical monitoring data information may include other types of indicator information, as long as it is relevant to the monitoring solution. The information related to the monitored machine specifically includes indicators such as CPU, memory, network, disk IO, the request success rate of the RTSP/RTMP/HLS/proprietary protocols, and the number of connections. To facilitate analysis, the information related to the monitored machine in this embodiment also includes the machine specification and the project category. The uploaded historical monitoring data information can therefore be used not only for model training but also for other tasks, such as data sorting and model distribution.
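As an illustration only, the record layout and sorting step could be sketched in Python as below. The class and field names (`MonitoringRecord`, `machine_spec`, `project_category`, `metrics`) are assumptions made for this sketch, not details taken from the application. The sketch groups historical records by machine specification and project category, the basis of classification used in this embodiment, and orders each group by time so that it forms one monitoring information sequence.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MonitoringRecord:
    """One historical monitoring sample for a monitored machine (field names are assumptions)."""
    machine_id: str
    machine_spec: str        # machine specification, e.g. "8C16G" (hypothetical format)
    project_category: str    # project category, e.g. "live-pull" (hypothetical label)
    timestamp: float         # Unix time of the sample
    # Indicator values such as CPU, memory, network, disk IO,
    # protocol request success rate and connection count.
    metrics: Dict[str, float] = field(default_factory=dict)


def classify(records: List[MonitoringRecord]) -> Dict[Tuple[str, str], List[MonitoringRecord]]:
    """Group records by (machine specification, project category).

    Each group is sorted by timestamp so that it forms one monitoring
    information sequence, which can then be used to train its own model.
    """
    groups: Dict[Tuple[str, str], List[MonitoringRecord]] = defaultdict(list)
    for rec in records:
        groups[(rec.machine_spec, rec.project_category)].append(rec)
    return {key: sorted(recs, key=lambda r: r.timestamp) for key, recs in groups.items()}
```

The same two grouping fields can later serve as the routing key when models are distributed, which is one way the uploaded data could support both data sorting and model distribution.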

Referring to FIG. 4, the classification in this embodiment is based on machine specification and project category. Then, based on the projection mechanism, the various monitoring indicators in the time series are randomly projected: characteristic values are computed over random time series of pairs of monitored indicators to obtain the correlation between the two indicators, and, based on that correlation, a histogram of the one-dimensional projection is estimated to obtain the probability of the current point. The projected monitoring information sequences are input into an initial model for training, yielding anomaly detection models suited to a variety of anomalies. Multiple anomaly detection models better match actual live streaming anomaly detection needs and can, to a certain extent, improve the detection accuracy of threshold-free models.
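The projection step reads like the LODA family of detectors: sparse random projections, each summarized by a one-dimensional histogram, with the anomaly score taken as the average negative log-probability across k projections, score(x) = -(1/k) * sum_i log p_i(w_i . x). The sketch below implements that technique with NumPy under stated assumptions: the number of projections, bin count, projection sparsity and Laplace smoothing are illustrative choices, and the explicit pairwise-correlation computation described above is not reproduced; each projection simply combines a few indicators so that their joint behaviour is reflected in its histogram. `ProjectionHistogramDetector` is a hypothetical name, not the application's model.

```python
import numpy as np


class ProjectionHistogramDetector:
    """LODA-style detector: sparse random 1-D projections, each modelled by a histogram.

    Hyperparameters (projection count, bins, sparsity, smoothing) are illustrative assumptions.
    """

    def __init__(self, n_projections: int = 50, n_bins: int = 20, seed: int = 0):
        self.n_projections = n_projections
        self.n_bins = n_bins
        self.rng = np.random.default_rng(seed)

    def fit(self, X):
        """X: array of shape (n_samples, n_indicators) from one monitoring information sequence."""
        X = np.asarray(X, dtype=float)
        n_samples, n_features = X.shape
        # Each projection mixes about sqrt(d) randomly chosen indicators.
        k = max(1, int(round(np.sqrt(n_features))))
        self.W = np.zeros((self.n_projections, n_features))
        for i in range(self.n_projections):
            idx = self.rng.choice(n_features, size=k, replace=False)
            self.W[i, idx] = self.rng.standard_normal(k)
        Z = X @ self.W.T  # projected values, shape (n_samples, n_projections)
        self.hists = []
        for j in range(self.n_projections):
            counts, edges = np.histogram(Z[:, j], bins=self.n_bins)
            denom = (n_samples + self.n_bins) * np.diff(edges)
            # Laplace-smoothed density estimate: no in-range bin has probability 0.
            density = (counts + 1) / denom
            # Smallest empty-bin density, reused for points outside the training range.
            floor = 1.0 / denom.max()
            self.hists.append((edges, density, floor))
        return self

    def score(self, X):
        """X: 2-D array of rows. Returns -mean(log p) per row; larger means less probable."""
        X = np.asarray(X, dtype=float)
        Z = X @ self.W.T
        log_p = np.empty_like(Z)
        for j, (edges, density, floor) in enumerate(self.hists):
            z = Z[:, j]
            inside = (z >= edges[0]) & (z <= edges[-1])
            idx = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, len(density) - 1)
            log_p[:, j] = np.log(np.where(inside, density[idx], floor))
        return -log_p.mean(axis=1)
```

In this embodiment one such detector would presumably be fitted per monitoring information sequence, i.e. per machine-specification and project-category group. No per-indicator alarm threshold is involved; how the score is turned into an abnormal/normal result is not specified in the application.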

Furthermore, before step 101, the method also includes:

Monitoring the monitored machine in real time through anomaly monitoring equipment, and sending the collected historical monitoring data information to the dispatch center.

Referring to FIG. 3, this embodiment uses a dedicated anomaly detection module as the anomaly monitoring equipment to monitor the monitored machine in real time, and sends the acquired monitoring data information to the dispatch center for storage, display and model training.
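A minimal sketch of the reporting side of the anomaly monitoring equipment, assuming samples are batched and serialized as JSON before being sent to the dispatch center. `AnomalyMonitorReporter`, the payload fields and the injected `sample` and `send` callables are hypothetical; the transport (for example an HTTP request) is deliberately left to the caller because the application does not name one.

```python
import json
import time
from typing import Callable, Dict, List, Optional


class AnomalyMonitorReporter:
    """Hypothetical reporter running on the anomaly monitoring equipment.

    `sample` returns the current indicator values; `send` delivers one JSON
    batch to the dispatch center (transport left to the caller).
    """

    def __init__(self, machine_id: str, sample: Callable[[], Dict[str, float]],
                 send: Callable[[str], None], batch_size: int = 10):
        self.machine_id = machine_id
        self.sample = sample
        self.send = send
        self.batch_size = batch_size
        self.buffer: List[dict] = []

    def collect_once(self, now: Optional[float] = None) -> None:
        """Take one sample and flush automatically once the batch is full."""
        self.buffer.append({
            "machine_id": self.machine_id,
            "timestamp": time.time() if now is None else now,
            "metrics": self.sample(),
        })
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered samples as one JSON array, then clear the buffer."""
        if self.buffer:
            self.send(json.dumps(self.buffer))
            self.buffer = []
```

In a deployment, `collect_once` would typically be driven by a timer and `flush` called on shutdown so that no samples are lost; batching trades a little latency for fewer round trips to the dispatch center.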

Step 102: The proxy node sends the multiple anomaly detection models, based on the historical monitoring data information, to the servers corresponding to the respective monitored machines.

Referring to FIG. 3, the edge proxy node is the upstream node of the monitored machines. The proxy node forwards the models according to the monitored machine information in the historical monitoring data information. Forwarding is targeted by the server type, machine model or project category corresponding to each monitored machine, so that the server for each machine receives the anomaly detection model for its anomaly type, which ensures the reliability of the detection results. In addition, the proxy node can encapsulate each anomaly detection model before distribution to facilitate parsing and processing; the specific process is not described in detail here.
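The targeted forwarding and encapsulation described above might look like the following Python sketch. It assumes models are keyed by (machine specification, project category), that each machine record names the server it runs on, and that encapsulation means serializing the model and attaching a SHA-256 digest so the receiving server can verify integrity. `ModelEnvelope`, `encapsulate`, `distribute` and `unpack` are hypothetical names. `pickle` is used only for brevity and should be exchanged only between trusted nodes.

```python
import hashlib
import pickle
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

GroupKey = Tuple[str, str]  # (machine specification, project category)


@dataclass
class ModelEnvelope:
    """Hypothetical package shipped by the proxy node to one edge server."""
    model_key: GroupKey
    payload: bytes   # serialized model
    sha256: str      # digest the receiving server checks before loading


def encapsulate(model_key: GroupKey, model: Any) -> ModelEnvelope:
    payload = pickle.dumps(model)
    return ModelEnvelope(model_key, payload, hashlib.sha256(payload).hexdigest())


def distribute(models: Dict[GroupKey, Any], machines: List[dict]) -> Dict[str, ModelEnvelope]:
    """Return server_id -> envelope holding the model trained for that machine's group.

    `machines` entries carry "server_id", "machine_spec" and "project_category"
    (assumed field names). Machines whose group has no trained model are skipped.
    """
    envelopes: Dict[str, ModelEnvelope] = {}
    for m in machines:
        key = (m["machine_spec"], m["project_category"])
        if key in models:
            envelopes[m["server_id"]] = encapsulate(key, models[key])
    return envelopes


def unpack(envelope: ModelEnvelope) -> Any:
    """Server side: verify the digest, then deserialize (trusted senders only)."""
    if hashlib.sha256(envelope.payload).hexdigest() != envelope.sha256:
        raise ValueError("model payload failed integrity check")
    return pickle.loads(envelope.payload)
```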

Step 103: Under the computing power balancing mechanism, the server performs real-time anomaly detection on the monitored machine using the anomaly detection model to obtain the detection result.

Furthermore, step 103 includes:

The server performing real-time anomaly detection on the monitored machine using the received anomaly detection model;

If the current load of the server exceeds the preset standard value, an adjacent server, the proxy node, or the dispatch center performing the real-time anomaly detection on the monitored machine using the anomaly detection model;

Wherein the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center.

In this embodiment, all model detection computation is pushed down to the edge nodes, that is, the servers corresponding to the monitored machines, which relieves the computing pressure on the dispatch center. To prevent an edge server from being asked to compute beyond its capacity and thus failing to perform normal detection, this embodiment applies a computing power balancing mechanism when performing real-time anomaly detection with the anomaly detection model, which improves the stability and robustness of the detection process.

Specifically, when the current load of a server is normal, the server corresponding to each machine performs real-time anomaly detection directly with the received anomaly detection model to obtain the anomaly detection result. Once the current load of the server exceeds the preset standard value, the server is overloaded and cannot take on the anomaly detection task, so adjacent servers share the computing load; this sharing, i.e. computing power balancing, is coordinated through the edge proxy node. If the adjacent servers are also overloaded and cannot share the load, the task is escalated to the superior node, which in this embodiment is the edge proxy node; the proxy node then performs real-time anomaly detection with the anomaly detection model to obtain the anomaly detection result. Likewise, if the proxy node cannot handle the task either, it is escalated one level further and the dispatch center performs the anomaly detection. Distributing the computing load hierarchically in this way reduces load pressure during anomaly detection, improves the stability of the algorithm and ensures the reliability of detection. In general, the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center. FIG. 5 gives an example of this computing power balancing mechanism.
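The escalation order above can be expressed as a small selection function. This is a sketch under assumptions: each node reports a numeric load and its own preset standard value, the least-loaded eligible adjacent server is chosen (the application does not say how a neighbour is picked), and the dispatch center is treated as the last resort that always accepts the task, as the text describes. `Node` and `pick_executor` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    load: float       # current load, e.g. CPU utilisation in [0, 1] (assumed metric)
    threshold: float  # the node's preset standard value

    def overloaded(self) -> bool:
        return self.load > self.threshold


def pick_executor(local: Node, neighbours: List[Node], proxy: Node, dispatch_center: Node) -> Node:
    """Choose the node that runs anomaly detection for one monitored machine.

    Priority: local server, then an adjacent server, then the proxy node,
    then the dispatch center (last resort, always accepts).
    """
    if not local.overloaded():
        return local
    # Among adjacent servers with spare capacity, pick the least loaded (assumption).
    candidates = [n for n in neighbours if not n.overloaded()]
    if candidates:
        return min(candidates, key=lambda n: n.load)
    if not proxy.overloaded():
        return proxy
    return dispatch_center
```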

Furthermore, after step 103, the method further includes:

If the detection result is abnormal, pulling the system operation log of the monitored machine;

Parsing the system operation log and performing root cause analysis on it to obtain a root cause analysis result;

Annotating the system operation log according to the root cause analysis result to obtain an annotated structured log.

It should be noted that, in this embodiment, abnormal detection results are reported to the proxy node, which directly pulls the system operation log of the monitored machine, parses it, and performs root cause analysis on the parsed content. The anomaly is then segmented and located in the structured log, and the abnormal points are marked to obtain an annotated structured log. Sending the annotated structured log to operation and maintenance personnel gives them a basis for their work and facilitates troubleshooting.
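A rough sketch of the pull, parse and annotate flow, assuming a simple line format of ISO timestamp, level, component and message, and a keyword rule table standing in for root cause analysis, which the application does not detail. `LOG_PATTERN`, `ROOT_CAUSE_RULES`, `parse`, `root_cause` and `annotate` are hypothetical; the rules shown are illustrative, not the application's analysis method.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Optional

# Assumed line format: "<ISO timestamp> <LEVEL> <component>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<component>[\w-]+):\s*(?P<message>.*)$"
)

# Hypothetical keyword -> root-cause rules standing in for the unspecified analysis step.
ROOT_CAUSE_RULES: Dict[str, str] = {
    "connection refused": "upstream stream source unreachable",
    "timeout": "network latency or packet loss",
    "out of memory": "memory exhaustion on the monitored machine",
}


def parse(lines: Iterable[str]) -> List[dict]:
    """Turn raw log lines into structured entries, skipping lines that do not parse."""
    entries = []
    for line in lines:
        m = LOG_PATTERN.match(line.strip())
        if not m:
            continue
        entry = m.groupdict()
        try:
            entry["ts"] = datetime.fromisoformat(entry["ts"])
        except ValueError:
            continue
        entries.append(entry)
    return entries


def root_cause(message: str) -> Optional[str]:
    """Return the first matching rule's cause, or None if no keyword matches."""
    text = message.lower()
    return next((cause for key, cause in ROOT_CAUSE_RULES.items() if key in text), None)


def annotate(entries: List[dict], start: datetime, end: datetime) -> List[dict]:
    """Mark ERROR/WARN entries inside the anomaly window and attach a root cause."""
    for e in entries:
        e["anomaly"] = start <= e["ts"] <= end and e["level"] in ("ERROR", "WARN")
        e["root_cause"] = root_cause(e["message"]) if e["anomaly"] else None
    return entries
```

The anomaly window would come from the timestamp of the abnormal detection result; entries outside it are kept but left unmarked so that operators still see the surrounding context.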

The edge computing-based live streaming monitoring method provided in this embodiment of the present application uses multiple classes of threshold-free anomaly detection models to detect anomalies on machines belonging to different engineering projects, so it avoids both the one-sidedness of fixed thresholds and the low detection accuracy of a single, one-size-fits-all model. Each class of model is trained on the machines' own monitoring data and therefore better reflects actual conditions. To keep model detection computation from concentrating on the dispatch center node, the computing power for the models is pushed down to the edge: the trained models are sent directly to the servers corresponding to the monitored machines and run there, which reduces the computing pressure on the dispatch center and avoids long data transmission times that would degrade real-time performance. Therefore, the embodiment of the present application can solve the technical problems that existing monitoring methods are either too one-sided and simplistic, which easily leads to low accuracy, or place excessive pressure on the central node and cannot meet real-time requirements.

For ease of understanding, refer to FIG. 2. The present application provides an embodiment of a live streaming monitoring device based on edge computing, including:

A model training unit 201, configured to perform, through the dispatch center node, classification processing and model training operations in sequence on the received historical monitoring data information to obtain multiple anomaly detection models, wherein the historical monitoring data information includes basic data, load and information related to the monitored machine;

A model distribution unit 202, configured to send, through a proxy node and based on the historical monitoring data information, the multiple anomaly detection models to the servers corresponding to the respective monitored machines;

An anomaly detection unit 203, configured to perform, through the server and under the computing power balancing mechanism, real-time anomaly detection on the monitored machine using the anomaly detection model to obtain the detection result.

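Furthermore, the model training unit 201 is specifically configured to:

Classify the received historical monitoring data information according to the monitored machine information it contains, to obtain a corresponding plurality of monitoring information sequences;

Perform, based on a projection mechanism, model training on each of the monitoring information sequences respectively to obtain multiple anomaly detection models.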

Furthermore, the device also includes:

A data acquisition unit 204, configured to monitor the monitored machine in real time through anomaly monitoring equipment and send the collected historical monitoring data information to the dispatch center.

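Furthermore, the anomaly detection unit 203 is specifically configured to:

Perform, through the server, real-time anomaly detection on the monitored machine using the received anomaly detection model;

If the current load of the server exceeds the preset standard value, perform the real-time anomaly detection on the monitored machine using the anomaly detection model through an adjacent server, the proxy node, or the dispatch center;

Wherein the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center.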

Furthermore, the device also includes:

A log pulling unit 205, configured to pull the system operation log of the monitored machine when the detection result is abnormal;

A log analysis unit 206, configured to parse the system operation log and perform anomaly root cause analysis on it to obtain a root cause analysis result;

A log annotation unit 207, configured to annotate the system operation log according to the root cause analysis result to obtain an annotated structured log.

In the several embodiments provided in the present application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

As described above, the above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or replace some of the technical features with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A live streaming monitoring method based on edge computing, characterized by comprising:

A dispatch center node sequentially performing classification processing and model training operations on received historical monitoring data information to obtain multiple anomaly detection models, wherein the historical monitoring data information includes basic data, load, and information related to the monitored machine;

A proxy node sending, based on the historical monitoring data information, the multiple anomaly detection models to the servers corresponding to the respective monitored machines;

The server performing, under a computing power balancing mechanism, real-time anomaly detection on the monitored machine using the anomaly detection model to obtain a detection result.
2. The live streaming monitoring method based on edge computing according to claim 1, characterized in that the dispatch center node sequentially performing classification processing and model training operations on the received historical monitoring data information to obtain multiple anomaly detection models comprises:

Classifying the received historical monitoring data information according to the monitored machine information it contains, to obtain a corresponding plurality of monitoring information sequences;

Performing, based on a projection mechanism, model training on each of the monitoring information sequences respectively to obtain multiple anomaly detection models.
3. The live streaming monitoring method based on edge computing according to claim 1, characterized in that, before the dispatch center node sequentially performs classification processing and model training operations on the received historical monitoring data information to obtain multiple anomaly detection models, the method further comprises:

Monitoring the monitored machine in real time through anomaly monitoring equipment, and sending the collected historical monitoring data information to the dispatch center.
4. The live streaming monitoring method based on edge computing according to claim 1, characterized in that the server performing, under the computing power balancing mechanism, real-time anomaly detection on the monitored machine using the anomaly detection model to obtain the detection result comprises:

The server performing real-time anomaly detection on the monitored machine using the received anomaly detection model;

If the current load of the server exceeds a preset standard value, an adjacent server, the proxy node, or the dispatch center performing the real-time anomaly detection on the monitored machine using the anomaly detection model;

Wherein the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center.
5. The live streaming monitoring method based on edge computing according to claim 1, characterized in that, after the server performs, under the computing power balancing mechanism, real-time anomaly detection on the monitored machine using the anomaly detection model to obtain the detection result, the method further comprises:

If the detection result is abnormal, pulling the system operation log of the monitored machine;

Parsing the system operation log and performing root cause analysis on it to obtain a root cause analysis result;

Annotating the system operation log according to the root cause analysis result to obtain an annotated structured log.
6. A live streaming monitoring device based on edge computing, characterized by comprising:

A model training unit, configured to perform, through a dispatch center node, classification processing and model training operations in sequence on received historical monitoring data information to obtain multiple anomaly detection models, wherein the historical monitoring data information includes basic data, load and information related to the monitored machine;

A model distribution unit, configured to send, through a proxy node and based on the historical monitoring data information, the multiple anomaly detection models to the servers corresponding to the respective monitored machines;

An anomaly detection unit, configured to perform, through the server and under a computing power balancing mechanism, real-time anomaly detection on the monitored machine using the anomaly detection model to obtain a detection result.
7. The live streaming monitoring device based on edge computing according to claim 6, characterized in that the model training unit is specifically configured to:

Classify the received historical monitoring data information according to the monitored machine information it contains, to obtain a corresponding plurality of monitoring information sequences;

Perform, based on a projection mechanism, model training on each of the monitoring information sequences respectively to obtain multiple anomaly detection models.
8. The live streaming monitoring device based on edge computing according to claim 6, characterized by further comprising:

A data acquisition unit, configured to monitor the monitored machine in real time through anomaly monitoring equipment and send the collected historical monitoring data information to the dispatch center.
9. The live streaming monitoring device based on edge computing according to claim 6, characterized in that the anomaly detection unit is specifically configured to:

Perform, through the server, real-time anomaly detection on the monitored machine using the received anomaly detection model;

If the current load of the server exceeds the preset standard value, perform the real-time anomaly detection on the monitored machine using the anomaly detection model through an adjacent server, the proxy node, or the dispatch center;

Wherein the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center.
10. The live streaming monitoring device based on edge computing according to claim 6, characterized by further comprising:

A log pulling unit, configured to pull the system operation log of the monitored machine when the detection result is abnormal;

A log analysis unit, configured to parse the system operation log and perform anomaly root cause analysis on it to obtain a root cause analysis result;

A log annotation unit, configured to annotate the system operation log according to the root cause analysis result to obtain an annotated structured log.