WO2024139937A1 - Edge-computing-based method and apparatus for monitoring livestream pulling - Google Patents

Edge-computing-based method and apparatus for monitoring livestream pulling

Info

Publication number
WO2024139937A1
WO2024139937A1 PCT/CN2023/134464 CN2023134464W WO2024139937A1 WO 2024139937 A1 WO2024139937 A1 WO 2024139937A1 CN 2023134464 W CN2023134464 W CN 2023134464W WO 2024139937 A1 WO2024139937 A1 WO 2024139937A1
Authority
WO
WIPO (PCT)
Prior art keywords
anomaly detection
data information
monitoring data
monitored machine
server
Application number
PCT/CN2023/134464
Inventor
吴钊
Original Assignee
天翼数字生活科技有限公司
Application filed by 天翼数字生活科技有限公司
Publication of WO2024139937A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/069: Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1001: Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004: Server selection for load balancing
    • H04L67/1008: Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the first aspect of the present application provides a live streaming monitoring method based on edge computing, comprising:
  • the dispatch center node sequentially performs classification processing and model training operations according to the received historical monitoring data information to obtain multiple anomaly detection models, including:
  • the computing power balancing mechanism is used to perform real-time anomaly detection on the monitored machine by the server according to the anomaly detection model to obtain a detection result, including:
  • real-time anomaly detection is performed on the monitored machine according to the anomaly detection model through an adjacent server, or the proxy node, or the dispatch center;
  • the system operation log of the monitored machine is retrieved
  • the classification process in this embodiment is based on machine specifications and project categories. Then, based on the projection mechanism, the various monitoring data information in the time series can be randomly projected, that is, the characteristic values of the random time series of the two monitored indicator data are calculated to obtain the correlation between the two indicators. Then, based on the correlation, the histogram of the one-dimensional projection is estimated to obtain the probability of the current point. After projecting the monitoring information sequence, it is input into the initial model for model training operation, and an anomaly detection model that is suitable for a variety of anomalies can be obtained. A variety of anomaly detection models are more in line with the actual live anomaly detection needs, and can improve the detection accuracy of non-threshold models to a certain extent.
  • step 101 also includes:
  • the edge proxy node is the upper node of the monitored machine.
  • the proxy node can forward the model according to the relevant information of the monitored machine in the historical monitoring data information.
  • the model is forwarded in a targeted manner according to the server type, machine model or project category corresponding to the monitored machine, ensuring that the server corresponding to each machine receives the anomaly detection model for its anomaly type, thereby ensuring the reliability of the detection results.
  • the proxy node can also encapsulate the anomaly detection model before distribution to facilitate analysis and processing. The specific process is not repeated here.
  • Step 103 Based on the computing power balancing mechanism, the server performs real-time anomaly detection on the monitored machine according to the anomaly detection model to obtain the detection result.
  • step 103 includes:
  • the monitored machine will be subjected to real-time anomaly detection according to the anomaly detection model through the adjacent server, proxy node, or dispatch center;
  • the server corresponding to each machine directly performs real-time anomaly detection according to the received anomaly detection model to obtain the anomaly detection result.
  • the server is overloaded and cannot undertake the anomaly detection task.
  • the computing load can be shared by the adjacent servers. This coordination is carried out through the edge proxy node and constitutes the computing power balancing. If the adjacent server is also overloaded and cannot share the computing pressure, help is sought from the superior node.
  • the superior node of this embodiment is the edge proxy node, and the proxy node performs real-time anomaly detection according to the anomaly detection model to obtain the anomaly detection result.
  • the anomaly detection results in this embodiment are reported to the proxy node, which directly pulls the system operation log of the monitored machine, parses it, and performs root cause analysis on the parsed content. The anomaly is then segmented and located in the structured log, and the abnormal points are marked to obtain an annotated structured log. Sending the annotated structured log to the operation and maintenance personnel gives them a basis for their work and makes troubleshooting easier.
  • the edge computing-based live streaming monitoring method uses a multi-class anomaly detection model without a threshold to perform anomaly detection on machines under different engineering projects. It is neither subject to the one-sidedness of the threshold nor the low detection accuracy due to the overly single model. Moreover, each type of model is trained based on machine monitoring data, which is more in line with the actual situation. In order to avoid the model detection calculation being concentrated in the dispatch center node, the model computing power is sunk to the edge, that is, the trained model is directly sent to the server corresponding to the monitored machine for operation, which reduces the computing pressure of the dispatch center and does not cause the data transmission time to be too long, resulting in poor real-time performance. Therefore, the embodiment of the present application can solve the technical problems that the existing monitoring methods are either too one-sided and single, which easily leads to low accuracy, or easily cause excessive pressure on the central node and cannot meet the real-time performance.
  • the present application provides an embodiment of a live streaming monitoring device based on edge computing, including:
  • the model training unit 201 is used to perform classification processing and model training operations in sequence according to the received historical monitoring data information through the dispatch center node to obtain multiple anomaly detection models, and the historical monitoring data information includes basic data, load and information related to the monitored machine;
  • a model distribution unit 202 is used to send multiple anomaly detection models to servers corresponding to each monitored machine based on historical monitoring data information through an agent node;
  • the anomaly detection unit 203 is used to perform real-time anomaly detection on the monitored machine according to the anomaly detection model through the server based on the computing power balancing mechanism to obtain the detection result.
  • model training unit 201 is specifically used for:
  • the data acquisition unit 204 is used to monitor the monitored machine in real time through the abnormal monitoring device, and send the collected historical monitoring data information to the dispatching center.
  • the abnormality detection unit 203 is specifically configured to:
  • the server performs real-time anomaly detection on the monitored machine according to the received anomaly detection model
  • the monitored machine will be subjected to real-time anomaly detection according to the anomaly detection model through the adjacent server, proxy node, or dispatch center;
  • the priority of the neighboring server is higher than that of the proxy node, and the priority of the proxy node is higher than that of the dispatch center.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • a software product which is stored in a storage medium and includes a number of instructions for executing the program through a computer device (which can be a personal computer, server, or
  • the aforementioned storage medium includes: a USB flash drive, a mobile hard disk, a read-only memory (full name: Read-Only Memory, English abbreviation: ROM), a random access memory (full name: Random Access Memory, English abbreviation: RAM), a magnetic disk or an optical disk, and other media that can store program codes.

Landscapes

  • Engineering & Computer Science
  • Computer Networks & Wireless Communication
  • Signal Processing
  • Computer Hardware Design
  • General Engineering & Computer Science
  • Debugging And Monitoring
  • Management, Administration, Business Operations System, And Electronic Commerce

Abstract

The present application discloses an edge-computing-based method and apparatus for monitoring livestream pulling. The method comprises: according to received historical monitoring data information, using a scheduling center node to sequentially perform classification processing and model training operations, and obtaining a plurality of anomaly detection models, wherein the historical monitoring data information comprises information related to basic data, loads, and monitored devices; based on the historical monitoring data information, using a proxy node to respectively send the plurality of anomaly detection models to the servers corresponding to each monitored device; and according to a computing power balancing mechanism, using the servers and the anomaly detection models to perform real-time anomaly detection on the monitored devices, and obtaining detection results. The present application solves the technical problem of existing monitoring methods either being too one-sided, easily resulting in low accuracy, or easily causing excess pressure on a central node, resulting in the inability to achieve real-time performance.

Description

An edge-computing-based method and device for monitoring live stream pulling
Technical Field
The present application relates to the field of operation and maintenance monitoring, and in particular to an edge-computing-based method and device for monitoring live stream pulling.
Background
As the Internet continues to evolve, the number of servers has surged and business scenarios have diversified. This is especially true in video networking (视联网), where a business scenario is made up of a series of different application services such as live stream pushing and pulling and authentication. In such scenarios, conventional alarms based on alarm thresholds increasingly give a one-sided picture. When threshold-free monitoring is used instead and an AI model is called for anomaly detection, multi-indicator anomaly detection suffers from low accuracy because feature values differ too much between deployment projects. Training a different model for each deployment project and running all of the models at the same time puts excessive pressure on the server and leads to over-centralization: when a massive volume of machine indicators enters the central node and the AI models are called for real-time anomaly detection, the central node is overloaded. In addition, when local data must pass through multiple network layers to reach the anomaly detection node, the latency is hard to control. Customer complaints may well have arrived by the time an abnormal result is produced, which harms the evaluation of system availability.
Summary of the Invention
The present application provides an edge-computing-based method and device for monitoring live stream pulling, which solve the technical problem that existing monitoring methods are either too one-sided and uniform, which easily leads to low accuracy, or put excessive pressure on the central node and cannot meet real-time requirements.
In view of this, a first aspect of the present application provides an edge-computing-based method for monitoring live stream pulling, comprising:
performing, by a dispatch center node, classification processing and model training operations in sequence according to received historical monitoring data information to obtain multiple anomaly detection models, wherein the historical monitoring data information includes basic data, load, and information related to the monitored machines;
sending, by a proxy node and based on the historical monitoring data information, the multiple anomaly detection models respectively to the servers corresponding to the monitored machines;
performing, by the server and in accordance with a computing power balancing mechanism, real-time anomaly detection on the monitored machine according to the anomaly detection model to obtain a detection result.
Preferably, the performing, by the dispatch center node, of classification processing and model training operations in sequence according to the received historical monitoring data information to obtain multiple anomaly detection models comprises:
classifying the historical monitoring data information according to the monitored machine information contained in it, and obtaining the corresponding multiple monitoring information sequences;
performing, based on a projection mechanism, multiple classes of model training operations on the respective monitoring information sequences to obtain multiple anomaly detection models.
Preferably, before the performing, by the dispatch center node, of classification processing and model training operations in sequence according to the received historical monitoring data information to obtain multiple anomaly detection models, the method further comprises:
monitoring the monitored machine in real time through an anomaly monitoring device, and sending the collected historical monitoring data information to the dispatch center.
Preferably, the performing, by the server and in accordance with the computing power balancing mechanism, of real-time anomaly detection on the monitored machine according to the anomaly detection model to obtain a detection result comprises:
performing, by the server, real-time anomaly detection on the monitored machine according to the received anomaly detection model;
if the current load of the server exceeds a preset standard value, performing real-time anomaly detection on the monitored machine according to the anomaly detection model through an adjacent server, or the proxy node, or the dispatch center;
wherein the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center.
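The fallback order stated above (the machine's own server first, then an adjacent server, then the proxy node, then the dispatch center) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Node` class, its `load` and `max_load` fields, and the `pick_detector` function are hypothetical names, and treating the dispatch center as the unconditional last resort is an assumption that the text does not spell out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical view of a node that can run an anomaly detection model."""
    name: str
    load: float      # current load, e.g. CPU utilisation in [0, 1]
    max_load: float  # the "preset standard value" for this node

    def overloaded(self) -> bool:
        return self.load > self.max_load


def pick_detector(server: Node, neighbors: List[Node],
                  proxy: Node, dispatch_center: Node) -> Node:
    """Choose where detection runs: server > adjacent server > proxy > dispatch center."""
    if not server.overloaded():
        return server
    for neighbor in neighbors:          # adjacent servers have the highest fallback priority
        if not neighbor.overloaded():
            return neighbor
    if not proxy.overloaded():          # then the edge proxy node
        return proxy
    return dispatch_center              # assumed last resort, regardless of its load
```

Keeping the choice in one pure function makes the priority order easy to test separately from any real load measurement.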
Preferably, after the performing, by the server and in accordance with the computing power balancing mechanism, of real-time anomaly detection on the monitored machine according to the anomaly detection model to obtain a detection result, the method further comprises:
when the detection result is abnormal, pulling the system operation log of the monitored machine;
performing anomaly root cause analysis on the parsed system operation log to obtain a root cause analysis result;
annotating the system operation log according to the root cause analysis result to obtain an annotated structured log.
A second aspect of the present application provides an edge-computing-based device for monitoring live stream pulling, comprising:
a model training unit, configured to perform, through a dispatch center node, classification processing and model training operations in sequence according to received historical monitoring data information to obtain multiple anomaly detection models, wherein the historical monitoring data information includes basic data, load, and information related to the monitored machines;
a model distribution unit, configured to send, through a proxy node and based on the historical monitoring data information, the multiple anomaly detection models respectively to the servers corresponding to the monitored machines;
an anomaly detection unit, configured to perform, through the server and in accordance with a computing power balancing mechanism, real-time anomaly detection on the monitored machine according to the anomaly detection model to obtain a detection result.
Preferably, the model training unit is specifically configured to:
classify the historical monitoring data information according to the monitored machine information contained in it, and obtain the corresponding multiple monitoring information sequences;
perform, based on a projection mechanism, multiple classes of model training operations on the respective monitoring information sequences to obtain multiple anomaly detection models.
Preferably, the device further comprises:
a data acquisition unit, configured to monitor the monitored machine in real time through an anomaly monitoring device and send the collected historical monitoring data information to the dispatch center.
Preferably, the anomaly detection unit is specifically configured to:
perform, through the server, real-time anomaly detection on the monitored machine according to the received anomaly detection model;
if the current load of the server exceeds a preset standard value, perform real-time anomaly detection on the monitored machine according to the anomaly detection model through an adjacent server, or the proxy node, or the dispatch center;
wherein the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center.
Preferably, the device further comprises:
a log pulling unit, configured to pull the system operation log of the monitored machine when the detection result is abnormal;
a log analysis unit, configured to perform anomaly root cause analysis on the parsed system operation log to obtain a root cause analysis result;
a log annotation unit, configured to annotate the system operation log according to the root cause analysis result to obtain an annotated structured log.
As can be seen from the above technical solutions, the embodiments of the present application have the following advantages:
The present application provides an edge-computing-based method for monitoring live stream pulling, comprising: performing, by a dispatch center node, classification processing and model training operations in sequence according to received historical monitoring data information to obtain multiple anomaly detection models, wherein the historical monitoring data information includes basic data, load, and information related to the monitored machines; sending, by a proxy node and based on the historical monitoring data information, the multiple anomaly detection models respectively to the servers corresponding to the monitored machines; and performing, by the server and in accordance with a computing power balancing mechanism, real-time anomaly detection on the monitored machine according to the anomaly detection model to obtain a detection result.
The method provided by the present application uses threshold-free, multi-class anomaly detection models to detect anomalies on machines belonging to different engineering projects. It is therefore not constrained by the one-sidedness of thresholds, nor does it suffer from the low detection accuracy of a single uniform model, and because each class of model is trained on machine monitoring data, it better reflects actual conditions. To keep the detection computation from concentrating on the dispatch center node, the model computing power is pushed down to the edge: the trained model is sent directly to the server corresponding to the monitored machine and runs there. This reduces the computing pressure on the dispatch center and avoids the poor real-time performance caused by long data transmission times. The present application can therefore solve the technical problem that existing monitoring methods are either too one-sided and uniform, which easily leads to low accuracy, or put excessive pressure on the central node and cannot meet real-time requirements.
Brief Description of the Drawings
FIG. 1 is a schematic flow chart of an edge-computing-based method for monitoring live stream pulling provided in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an edge-computing-based device for monitoring live stream pulling provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of the edge-sinking live stream pulling monitoring system provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of the multi-indicator data projection process provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of the anomaly detection computing power balancing process provided in an embodiment of the present application.
Detailed Description
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the present application without creative effort fall within the scope of protection of the present application.
For ease of understanding, referring to FIG. 1, an embodiment of the edge-computing-based method for monitoring live stream pulling provided by the present application includes:
Step 101: the dispatch center node performs classification processing and model training operations in sequence according to the received historical monitoring data information to obtain multiple anomaly detection models, wherein the historical monitoring data information includes basic data, load, and information related to the monitored machines.
Further, step 101 includes:
classifying the historical monitoring data information according to the monitored machine information contained in it, and obtaining the corresponding multiple monitoring information sequences;
performing, based on the projection mechanism, multiple classes of model training operations on the respective monitoring information sequences to obtain multiple anomaly detection models.
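The classification sub-step can be pictured as grouping records into one time-ordered monitoring sequence per class. The sketch below assumes that the class key is the pair of machine specification and project category, which this embodiment names as the basis for classification; the `MonitoringRecord` fields and the `classify_records` function are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MonitoringRecord:
    """Hypothetical shape of one historical monitoring sample."""
    machine_id: str
    machine_spec: str       # e.g. "8c16g" (illustrative label)
    project_category: str   # e.g. "live" (illustrative label)
    timestamp: float
    metrics: Tuple[float, ...]  # indicator values in a fixed order


def classify_records(
    records: List[MonitoringRecord],
) -> Dict[Tuple[str, str], List[MonitoringRecord]]:
    """Group records by (machine_spec, project_category), each group sorted by time."""
    groups = defaultdict(list)
    for record in records:
        groups[(record.machine_spec, record.project_category)].append(record)
    return {key: sorted(group, key=lambda r: r.timestamp)
            for key, group in groups.items()}
```

Each resulting sequence would then feed one model training run, giving one anomaly detection model per class.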
Referring to FIG. 3, the dispatch center can receive the detection results, that is, the reported data, from the servers corresponding to the monitored machines. The reported data includes monitoring data information. Because the data used to train a model is generally monitoring data from past periods, it is referred to as historical monitoring data information.
Besides indicator information such as basic data, load, and information related to the monitored machines, the historical monitoring data information may contain other kinds of indicator information, as long as they are relevant to the monitoring scheme. The information related to the monitored machines specifically includes indicators such as CPU, memory, network, disk IO, request success rate for RTSP/RTMP/HLS/proprietary protocols, and number of connections. For ease of analysis, in this embodiment it also includes indicators such as machine specification and the project category to which the machine belongs. The uploaded historical monitoring data information can therefore be used not only for model training but also for other tasks, such as data organization and model distribution.
Referring to FIG. 4, the classification in this embodiment is based on machine specification and project category. The various monitoring data in the time series can then be randomly projected based on the projection mechanism: characteristic values (特征值) of the random time series are computed for two monitored indicators to obtain the correlation between them, and the histogram of the one-dimensional projection is then estimated on the basis of that correlation to obtain the probability of the current point. The projected monitoring information sequences are input into an initial model for model training, which yields anomaly detection models suited to multiple kinds of anomalies. Having multiple anomaly detection models better matches the actual needs of live-streaming anomaly detection and can improve the detection accuracy of threshold-free models to a certain extent.
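One way to picture "random projection, then a one-dimensional histogram, then the probability of the current point" is the simplified sketch below. It is not the applicant's training procedure: it replaces the characteristic-value and correlation step with plain random linear combinations of the indicators, and the class name, parameters, and Laplace smoothing are all assumptions. Because each projection mixes several indicators, a point whose indicators are individually plausible but jointly unusual tends to land in a sparsely populated histogram bin.

```python
import math
import random


class ProjectionHistogramDetector:
    """Scores a multi-indicator sample by its average negative log probability
    under histograms of random one-dimensional projections (illustrative only)."""

    def __init__(self, n_projections=20, n_bins=10, seed=0):
        self.n_projections = n_projections
        self.n_bins = n_bins
        self.rng = random.Random(seed)
        self.models = []  # one (weights, low, high, width, probs, floor) per projection

    @staticmethod
    def _project(weights, sample):
        return sum(w * x for w, x in zip(weights, sample))

    def fit(self, samples):
        dim = len(samples[0])
        self.models = []
        for _ in range(self.n_projections):
            weights = [self.rng.gauss(0.0, 1.0) for _ in range(dim)]
            values = [self._project(weights, s) for s in samples]
            low, high = min(values), max(values)
            width = (high - low) / self.n_bins or 1.0  # guard against a zero-width range
            counts = [0] * self.n_bins
            for v in values:
                counts[min(int((v - low) / width), self.n_bins - 1)] += 1
            total = len(values) + self.n_bins  # Laplace smoothing: empty bins stay non-zero
            probs = [(c + 1) / total for c in counts]
            self.models.append((weights, low, high, width, probs, 1.0 / total))
        return self

    def score(self, sample):
        """Higher score means the sample is less probable under the training data."""
        total = 0.0
        for weights, low, high, width, probs, floor in self.models:
            v = self._project(weights, sample)
            if low <= v <= high:
                p = probs[min(int((v - low) / width), self.n_bins - 1)]
            else:
                p = floor  # outside the range seen during training
            total -= math.log(p)
        return total / len(self.models)
```

In practice the indicators would likely be standardized first so that no single large-valued indicator dominates the projections; that step is omitted here for brevity. No threshold is built in: the score only ranks samples, and how it becomes an abnormal or normal result is left open.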
Further, before step 101, the method also includes:
monitoring the monitored machine in real time through an anomaly monitoring device, and sending the collected historical monitoring data information to the dispatch center.
Referring to FIG. 3, this embodiment uses a dedicated anomaly detection module as the anomaly monitoring device to monitor the monitored machine in real time, and sends the acquired monitoring data information to the dispatch center for storage, display, and model training.
Step 102: through a proxy node, send the multiple anomaly detection models to the servers corresponding to the respective monitored machines, based on the historical monitoring data information.
Referring to FIG. 3, the edge proxy node is the upper-level node of the monitored machines. The proxy node forwards models according to the information about the monitored machines contained in the historical monitoring data information. Specifically, it can forward models in a targeted manner according to the server type, machine model, or project category corresponding to each monitored machine, so that the server corresponding to each machine receives the anomaly detection model for its anomaly type, which ensures the reliability of the detection results. In addition, the proxy node may encapsulate the anomaly detection models before distributing them to facilitate analysis and processing; the specific process is not described here.
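As a rough illustration of targeted forwarding, the sketch below maps each monitored machine to the model trained for its class. The source only says that forwarding is based on attributes such as server type, machine model, or project category; the `MachineInfo` fields, the `(machine_spec, project_category)` key, and the handling of unmatched machines are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineInfo:
    machine_id: str
    machine_spec: str      # hypothetical label, e.g. "8c16g"
    project_category: str  # hypothetical label, e.g. "live-cdn"


def plan_distribution(machines, models):
    """Return (plan, unmatched).

    plan maps machine_id to the model registered for that machine's
    (machine_spec, project_category) class; unmatched lists machines for
    which no model of the right class exists.
    """
    plan, unmatched = {}, []
    for machine in machines:
        key = (machine.machine_spec, machine.project_category)
        if key in models:
            plan[machine.machine_id] = models[key]
        else:
            unmatched.append(machine.machine_id)
    return plan, unmatched
```

Returning the unmatched machines separately leaves it to the caller to decide whether to skip them or request a new model, a choice the source does not address.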
Step 103: according to a computing-power balancing mechanism, the server performs real-time anomaly detection on the monitored machine using the anomaly detection model to obtain a detection result.
Further, step 103 includes:
performing, by the server, real-time anomaly detection on the monitored machine according to the received anomaly detection model;
if the current load of the server exceeds a preset standard value, performing real-time anomaly detection on the monitored machine according to the anomaly detection model through an adjacent server, the proxy node, or the dispatch center;
wherein the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center.
In this embodiment, the computing power for model-based detection is pushed down to the edge nodes, that is, to the servers corresponding to the monitored machines, which relieves the computing pressure on the dispatch center. To prevent the computation on an edge-node server from exceeding its capacity and making normal detection impossible, this embodiment uses a computing-power balancing mechanism when performing real-time anomaly detection on the monitored machine with the anomaly detection model, which improves the stability and robustness of the detection process.
Specifically, when the current load of a server is normal, the server corresponding to each machine performs real-time anomaly detection directly with the anomaly detection model it received and obtains the anomaly detection result. Once the current load of the server exceeds the preset standard value, the server is overloaded and cannot take on the anomaly detection task, so the computation can be shared by an adjacent server; the edge proxy node can coordinate this, which is what computing-power balancing means here. If the adjacent server is also overloaded and cannot share the computing load, help is sought from the upper-level node, which in this embodiment is the edge proxy node; the proxy node then performs real-time anomaly detection with the anomaly detection model and obtains the anomaly detection result. Likewise, if the proxy node still cannot handle the task, it is escalated one level further and the dispatch center performs the anomaly detection. Spreading the computing pressure across levels in this way eases the load during anomaly detection, improves the stability of the algorithm, and ensures reliable detection. In short, the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center; see the example of the computing-power distribution mechanism in FIG. 5.
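The escalation order can be sketched as a simple priority walk. This is a minimal illustration, not the patent's implementation: the `Node` type, the single scalar `load`, and the `max_load` threshold standing in for the "preset standard value" are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    load: float  # current load, in the same unit as the threshold


def choose_executor(local: Node, neighbors: List[Node], proxy: Node,
                    center: Node, max_load: float) -> Node:
    """Pick the node that should run anomaly detection for a machine.

    Priority follows the text: the machine's own server first, then
    adjacent servers, then the proxy node, and finally the dispatch center.
    A node is skipped when its load exceeds max_load.
    """
    for node in [local, *neighbors, proxy]:
        if node.load <= max_load:
            return node
    # The dispatch center is the last level of the hierarchy, so it takes
    # the task whenever every lower level is overloaded.
    return center
```

For example, with `max_load=0.8`, a local server at load 0.9 and one neighbor at load 0.1, the function returns the neighbor; the dispatch center is only chosen when the local server, all neighbors, and the proxy node are overloaded.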
Further, after step 103, the method also includes:
when the detection result is abnormal, pulling the system operation log of the monitored machine;
performing root cause analysis of the anomaly on the parsed system operation log to obtain a root cause analysis result;
annotating the system operation log according to the root cause analysis result to obtain an annotated structured log.
It should be noted that in this embodiment the anomaly detection result is reported to the proxy node, and the proxy node determines whether the result is abnormal. If it is, the proxy node directly pulls the system operation log of the monitored machine, parses it, and performs root cause analysis on the parsed content. The anomaly is then located segment by segment in the structured log and the abnormal points are annotated, which yields the annotated structured log. Sending the annotated structured log to operations and maintenance (O&M) personnel gives them a well-founded basis for their work and makes troubleshooting easier.
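The parse, analyze, and annotate sequence might look like the following sketch. The source specifies neither the log format nor the root cause analysis technique, so the line pattern, the keyword rules in `ROOT_CAUSE_RULES`, and the record fields are all hypothetical; keyword matching merely stands in for whatever analysis is actually used.

```python
import re

# Hypothetical log line layout: "<date> <time> <LEVEL> <message>".
LINE_RE = re.compile(r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<msg>.*)$")

# Hypothetical keyword-to-cause rules, checked in order.
ROOT_CAUSE_RULES = [
    ("connection refused", "upstream_unreachable"),
    ("out of memory", "memory_exhaustion"),
    ("timeout", "network_latency"),
]


def parse_line(line):
    """Turn one raw log line into a structured record."""
    match = LINE_RE.match(line.strip())
    if match is None:
        # Keep unparseable lines so that nothing is silently dropped.
        return {"ts": None, "level": None, "msg": line.strip()}
    return match.groupdict()


def annotate(lines):
    """Return structured records, each tagged with a root cause or None."""
    records = []
    for number, line in enumerate(lines, start=1):
        record = parse_line(line)
        record["line"] = number
        message = record["msg"].lower()
        record["root_cause"] = next(
            (cause for keyword, cause in ROOT_CAUSE_RULES if keyword in message),
            None,
        )
        records.append(record)
    return records
```

Keeping the original line number in each record lets O&M personnel jump from an annotated entry back to the raw log.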
The method for monitoring livestream pulling based on edge computing provided in the embodiments of this application uses threshold-free, multi-class anomaly detection models to detect anomalies on machines under different engineering projects. It is therefore not constrained by the one-sidedness of thresholds, nor does it suffer from the low detection accuracy of an overly uniform model; moreover, each class of model is trained on machine monitoring data and thus better reflects actual conditions. To avoid concentrating detection computation on the dispatch center node, the model computation is pushed down to the edge: the trained models are sent directly to the servers corresponding to the monitored machines and run there, which reduces the computing pressure on the dispatch center and avoids the poor real-time performance caused by long data transmission times. The embodiments of this application can therefore solve the technical problem that existing monitoring methods are either too one-sided and uniform, which tends to lower accuracy, or tend to overload the central node and cannot meet real-time requirements.
For ease of understanding, referring to FIG. 2, this application provides an embodiment of an apparatus for monitoring livestream pulling based on edge computing, including:
a model training unit 201, configured to perform, through the dispatch center node, classification processing and a model training operation in sequence on the received historical monitoring data information to obtain multiple anomaly detection models, where the historical monitoring data information includes basic data, load, and information about the monitored machine;
a model distribution unit 202, configured to send, through a proxy node, the multiple anomaly detection models to the servers corresponding to the respective monitored machines based on the historical monitoring data information;
an anomaly detection unit 203, configured to perform, according to a computing-power balancing mechanism, real-time anomaly detection on the monitored machine through the server using the anomaly detection model to obtain a detection result.
Further, the model training unit 201 is specifically configured to:
classify the historical monitoring data information according to the monitored machine information in the received historical monitoring data information, and obtain the corresponding multiple monitoring information sequences;
based on a projection mechanism, perform multi-class model training operations on the multiple monitoring information sequences respectively to obtain multiple anomaly detection models.
Further, the apparatus also includes:
a data acquisition unit 204, configured to monitor the monitored machine in real time through an anomaly monitoring device and send the collected historical monitoring data information to the dispatch center.
Further, the anomaly detection unit 203 is specifically configured to:
perform, through the server, real-time anomaly detection on the monitored machine according to the received anomaly detection model;
if the current load of the server exceeds a preset standard value, perform real-time anomaly detection on the monitored machine according to the anomaly detection model through an adjacent server, the proxy node, or the dispatch center;
wherein the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center.
Further, the apparatus also includes:
a log pulling unit 205, configured to pull the system operation log of the monitored machine when the detection result is abnormal;
a log analysis unit 206, configured to perform root cause analysis of the anomaly on the parsed system operation log to obtain a root cause analysis result;
a log annotation unit 207, configured to annotate the system operation log according to the root cause analysis result to obtain an annotated structured log.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (10)

  1. A method for monitoring livestream pulling based on edge computing, characterized by comprising:
    performing, through a dispatch center node, classification processing and a model training operation in sequence on received historical monitoring data information to obtain multiple anomaly detection models, wherein the historical monitoring data information comprises basic data, load, and information about a monitored machine;
    sending, through a proxy node, the multiple anomaly detection models to servers corresponding to respective monitored machines based on the historical monitoring data information;
    performing, according to a computing-power balancing mechanism, real-time anomaly detection on the monitored machine through the server using the anomaly detection model to obtain a detection result.
  2. The method for monitoring livestream pulling based on edge computing according to claim 1, characterized in that performing, through the dispatch center node, classification processing and a model training operation in sequence on the received historical monitoring data information to obtain multiple anomaly detection models comprises:
    classifying the historical monitoring data information according to the monitored machine information in the received historical monitoring data information, and obtaining corresponding multiple monitoring information sequences;
    based on a projection mechanism, performing multi-class model training operations on the multiple monitoring information sequences respectively to obtain multiple anomaly detection models.
  3. The method for monitoring livestream pulling based on edge computing according to claim 1, characterized in that, before performing, through the dispatch center node, classification processing and a model training operation in sequence on the received historical monitoring data information to obtain multiple anomaly detection models, the method further comprises:
    monitoring the monitored machine in real time through an anomaly monitoring device, and sending the collected historical monitoring data information to the dispatch center.
  4. The method for monitoring livestream pulling based on edge computing according to claim 1, characterized in that performing, according to the computing-power balancing mechanism, real-time anomaly detection on the monitored machine through the server using the anomaly detection model to obtain a detection result comprises:
    performing, through the server, real-time anomaly detection on the monitored machine according to the received anomaly detection model;
    if the current load of the server exceeds a preset standard value, performing real-time anomaly detection on the monitored machine according to the anomaly detection model through an adjacent server, the proxy node, or the dispatch center;
    wherein the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center.
  5. The method for monitoring livestream pulling based on edge computing according to claim 1, characterized in that, after performing, according to the computing-power balancing mechanism, real-time anomaly detection on the monitored machine through the server using the anomaly detection model to obtain a detection result, the method further comprises:
    when the detection result is abnormal, pulling the system operation log of the monitored machine;
    performing root cause analysis of the anomaly on the parsed system operation log to obtain a root cause analysis result;
    annotating the system operation log according to the root cause analysis result to obtain an annotated structured log.
  6. An apparatus for monitoring livestream pulling based on edge computing, characterized by comprising:
    a model training unit, configured to perform, through a dispatch center node, classification processing and a model training operation in sequence on received historical monitoring data information to obtain multiple anomaly detection models, wherein the historical monitoring data information comprises basic data, load, and information about a monitored machine;
    a model distribution unit, configured to send, through a proxy node, the multiple anomaly detection models to servers corresponding to respective monitored machines based on the historical monitoring data information;
    an anomaly detection unit, configured to perform, according to a computing-power balancing mechanism, real-time anomaly detection on the monitored machine through the server using the anomaly detection model to obtain a detection result.
  7. The apparatus for monitoring livestream pulling based on edge computing according to claim 6, characterized in that the model training unit is specifically configured to:
    classify the historical monitoring data information according to the monitored machine information in the received historical monitoring data information, and obtain corresponding multiple monitoring information sequences;
    based on a projection mechanism, perform multi-class model training operations on the multiple monitoring information sequences respectively to obtain multiple anomaly detection models.
  8. The apparatus for monitoring livestream pulling based on edge computing according to claim 6, characterized by further comprising:
    a data acquisition unit, configured to monitor the monitored machine in real time through an anomaly monitoring device and send the collected historical monitoring data information to the dispatch center.
  9. The apparatus for monitoring livestream pulling based on edge computing according to claim 6, characterized in that the anomaly detection unit is specifically configured to:
    perform, through the server, real-time anomaly detection on the monitored machine according to the received anomaly detection model;
    if the current load of the server exceeds a preset standard value, perform real-time anomaly detection on the monitored machine according to the anomaly detection model through an adjacent server, the proxy node, or the dispatch center;
    wherein the adjacent server has a higher priority than the proxy node, and the proxy node has a higher priority than the dispatch center.
  10. The apparatus for monitoring livestream pulling based on edge computing according to claim 6, characterized by further comprising:
    a log pulling unit, configured to pull the system operation log of the monitored machine when the detection result is abnormal;
    a log analysis unit, configured to perform root cause analysis of the anomaly on the parsed system operation log to obtain a root cause analysis result;
    a log annotation unit, configured to annotate the system operation log according to the root cause analysis result to obtain an annotated structured log.
PCT/CN2023/134464 2022-12-28 2023-11-27 Edge-computing-based method and apparatus for monitoring livestream pulling WO2024139937A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211715309.X 2022-12-28
CN202211715309.XA CN116112340A (en) 2022-12-28 2022-12-28 Live broadcast pulling flow monitoring method and device based on edge calculation

Publications (1)

Publication Number Publication Date
WO2024139937A1 true WO2024139937A1 (en) 2024-07-04

Family

ID=86266760

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/134464 WO2024139937A1 (en) 2022-12-28 2023-11-27 Edge-computing-based method and apparatus for monitoring livestream pulling

Country Status (2)

Country Link
CN (1) CN116112340A (en)
WO (1) WO2024139937A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116112340A (en) * 2022-12-28 2023-05-12 天翼数字生活科技有限公司 Live broadcast pulling flow monitoring method and device based on edge calculation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103401947A (en) * 2013-08-20 2013-11-20 曙光信息产业(北京)有限公司 Method and device for allocating tasks to multiple servers
CN105915405A (en) * 2016-03-29 2016-08-31 深圳市中博科创信息技术有限公司 Large-scale cluster node performance monitoring system
CN113011745A (en) * 2021-03-19 2021-06-22 中国南方电网有限责任公司 Abnormity detection method, device, equipment and medium in power grid safety operation and maintenance
CN116112340A (en) * 2022-12-28 2023-05-12 天翼数字生活科技有限公司 Live broadcast pulling flow monitoring method and device based on edge calculation

Also Published As

Publication number Publication date
CN116112340A (en) 2023-05-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23909868

Country of ref document: EP

Kind code of ref document: A1