WO2023179042A1 - Data updating method, fault diagnosis method, electronic device, and storage medium - Google Patents

Info

Publication number
WO2023179042A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
indicator
fault diagnosis
historical
historical indicator
Application number
PCT/CN2022/130723
Other languages
French (fr)
Chinese (zh)
Inventor
刘煌
梁帅
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date
2022-03-21 (Chinese patent application 202210283753.2)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2023179042A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection

Definitions

  • Embodiments of the present application relate to the field of communications, and in particular to a data update method, fault diagnosis method, electronic device and storage medium.
  • Big data systems are widely used not only in the Internet industry but also in key fields such as government, finance, communications, aerospace and the military. Such systems typically serve hundreds of millions of users, so a service interruption or a drop in service quality caused by even a minor fault can bring huge losses. Moreover, these systems are large in scale and complex in composition and operating logic, so failures occur frequently; once a fault occurs it is difficult to detect, locate and diagnose, and the system is hard to analyze and debug after it goes wrong.
  • The fault diagnosis method in common use today is based on graph neural network algorithms.
  • This method trains a graph neural network on existing abnormal data to obtain a fault diagnosis model, and then diagnoses the root cause of a fault from real-time data using the trained model.
  • However, because samples of abnormal data are scarce, the resulting fault diagnosis model has low accuracy.
  • The main purpose of the embodiments of this application is to provide a data update method, a fault diagnosis method, an electronic device and a storage medium that update the fault diagnosis model through dynamic thresholds, so that the model can locate faults accurately and be applied in a variety of scenarios.
  • To this end, embodiments of the present application provide a data update method, which includes: obtaining historical indicator data of a business, and classifying the historical indicator data according to how it changes over time to obtain a classification result; performing anomaly detection on the historical indicator data with the anomaly detection algorithm corresponding to the classification result, to determine the abnormal data in the historical indicator data; and updating a preset indicator threshold of the business according to the distribution of the abnormal data.
  • Embodiments of the present application also provide a fault diagnosis method, which includes: obtaining fault data of a service; when the indicator threshold of the service has not been updated, analyzing the fault data with a preset fault diagnosis model to generate and output a fault diagnosis result, where the preset fault diagnosis model includes a preset indicator threshold; and when the indicator threshold of the service has been updated, obtaining the latest indicator threshold, updating the preset fault diagnosis model according to the latest indicator threshold, and analyzing the fault data with the updated fault diagnosis model to generate and output a fault diagnosis result, where the latest indicator threshold is obtained through the data update method described in the above embodiment.
  • Embodiments of the present application also provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the data update method or the fault diagnosis method described in the above embodiments.
  • Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data update method or the fault diagnosis method described in the above embodiments.
  • The data update method proposed in this application classifies the historical indicator data of a business to obtain a classification result.
  • The classification result indicates how the historical indicator data changes over time.
  • The anomaly detection algorithm corresponding to the type of historical indicator data is selected to perform anomaly detection on the data and identify abnormal data, and the indicator threshold of the business is then determined or updated based on the distribution of the abnormal data.
  • This application can obtain historical indicator data of the business periodically or continuously and determine the latest indicator threshold from the latest historical indicator data, so this dynamic threshold reflects the current operating state of the business in a timely manner.
  • The fault diagnosis method proposed in this application obtains fault data of the service in real time and then determines whether the indicator threshold of the service has been updated. If it has not, the fault data is analyzed with the preset fault diagnosis model and the preset indicator threshold to obtain the fault diagnosis result (the cause of the fault). If it has, the preset fault diagnosis model is updated according to the latest indicator threshold, and the updated model is used to perform fault diagnosis and determine the cause of the fault.
  • The fault diagnosis model of this application is therefore not static: it is continuously updated with the latest indicator thresholds, which in turn are determined from the latest historical indicator data. In this way, the model adapts promptly to changes in the current operating state of the business, which makes it better suited to real business scenarios and improves the accuracy of the fault diagnosis results.
  • Figure 1 is a flow chart of a data update method provided by an embodiment of the present application.
  • Figure 2 is a flow chart of a fault diagnosis method provided by an embodiment of the present application.
  • Figure 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The fault diagnosis method in common use today is based on graph neural network algorithms.
  • This method typically abstracts the big data system into a network graph and collects abnormal data from all nodes in the graph (such as servers and clients) to train a pre-built graph neural network, yielding a fault diagnosis model.
  • When a fault occurs, abnormal data is fed into the fault diagnosis model in real time to diagnose the root cause of the fault.
  • However, this method needs a large amount of abnormal data in the initial training stage, and such data only appears when faults occur, so samples are usually few. Training on a small sample yields a fault diagnosis model with low accuracy, while iteratively retraining the model as abnormal data accumulates is very costly and slow.
  • An embodiment of the present application relates to a data update method which, as shown in Figure 1, includes the following steps.
  • Step 101: Obtain the historical indicator data of the business, and classify it according to how it changes over time to obtain a classification result.
  • The historical indicator data of the business includes indicator data related to the operation of the business itself and indicator data of the underlying platform on which the business depends.
  • For example, [Business Service A] is located in the business center.
  • The data required by business service A comes from [Other Services 1] and [Other Services 2] in the data center, and the data center in turn relies on the underlying PaaS platform.
  • Depending on the business, the content of the historical indicator data may differ.
  • The updated indicator threshold should match the current operating state of the business as closely as possible. Therefore, the historical indicator data should be taken as close to the current moment as possible, that is, the latest historical indicator data should be used. For example, if one week of historical indicator data is needed to determine the indicator threshold of a business, it is best to use the data from the week immediately before the current time rather than data from some week a year ago.
  • If the business has no indicator threshold yet, one can be obtained through the data update method of this application; if an indicator threshold has already been determined, the method can keep updating that existing (historical) threshold.
  • Accordingly, this application may obtain historical indicator data continuously, in real time, or periodically.
  • Classifying the historical indicator data according to how it changes over time to obtain the classification result includes: obtaining the autocorrelation coefficient amplitude spectrum of the historical indicator data and determining from it whether the data is periodic; obtaining the mean and variance of the data and determining from them whether the data is volatile; and linearly fitting the data to obtain a fitting function and determining from it whether the data has a trend.
  • To determine periodicity, first obtain the autocorrelation coefficient plot of the historical indicator data, apply a Fourier transform to it to obtain the autocorrelation coefficient amplitude spectrum, and compute the maximum, mean and variance of this spectrum. If the maximum is greater than the sum of the mean and the variance, the historical indicator data is determined to be periodic; if the maximum is less than or equal to that sum, the data is determined to be aperiodic.
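  • The periodicity test just described can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the biased sample autocorrelation, applying `np.fft.rfft` to the one-sided autocorrelation sequence, and defaulting to half the series length for the number of lags are choices made here, and the helper names `autocorrelation` and `is_periodic` are hypothetical. The comparison against mean + variance follows the text literally; because variance is in squared units, the outcome depends on the scale of the spectrum and may need tuning in practice.

```python
import numpy as np


def autocorrelation(x, nlags):
    """Biased sample autocorrelation r_0 .. r_{nlags-1} of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0.0:
        # A constant series has no autocorrelation structure to analyze.
        return np.zeros(nlags)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(nlags)])


def is_periodic(x, nlags=None):
    """Periodic if the spectrum maximum exceeds its mean plus its variance."""
    nlags = nlags or len(x) // 2              # assumed default lag count
    r = autocorrelation(x, nlags)
    amp = np.abs(np.fft.rfft(r))              # amplitude spectrum of the ACF
    return bool(amp.max() > amp.mean() + amp.var())
```

  • For instance, a sine wave with a 24-sample period over 480 samples has one dominant spectral peak and is reported as periodic, while a constant series yields an all-zero spectrum and is reported as aperiodic.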
  • Alternatively, the autocorrelation coefficient of the historical indicator data can be computed directly. If it is below a preset threshold, the data can be judged aperiodic immediately. If it is above the threshold, however, the data cannot be concluded to be periodic, because aperiodic data may also have an autocorrelation coefficient above the threshold. In one embodiment, for historical indicator data whose autocorrelation coefficient exceeds the preset threshold, its autocorrelation coefficient amplitude spectrum is obtained, and if the spectrum has no peaks, the data is determined to be aperiodic.
  • For periodic data, the strength of the periodicity can further be determined and different anomaly detection algorithms selected accordingly. For example, for strongly periodic data a simpler, lower-complexity anomaly detection algorithm can be used, while for weakly periodic data an algorithm with better detection performance and stronger compatibility can be chosen.
  • The maximum peak value of the main peak of the autocorrelation coefficient amplitude spectrum can be obtained; it is determined by the following formula:
  • value_max is the maximum value of the amplitude spectrum
  • is the mean value
  • is the variance
  • peak_max is the maximum peak value of the main peak
  • is the preset discrimination rate.
  • The mean and variance of the historical indicator data can be obtained. If the data satisfies any of the following conditions, it is determined to be non-volatile; otherwise it is volatile:
  • is the mean value of historical indicator data
  • is the variance of historical indicator data
  • is the preset adjustment parameter
  • The historical indicator data can be linearly fitted to obtain a fitting function, and whether the data as a whole shows an upward or downward trend is determined from the type or the slope of the fitting function.
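  • A minimal sketch of the trend check, assuming a first-degree least-squares fit via `np.polyfit` and a hypothetical slope tolerance `slope_tol`, below which the data is treated as having no trend (the text does not give one). The function name `trend_direction` is likewise illustrative.

```python
import numpy as np


def trend_direction(x, slope_tol=1e-3):
    """Fit x(t) = a*t + b by least squares and classify the slope a."""
    y = np.asarray(x, dtype=float)
    t = np.arange(len(y), dtype=float)
    slope, _intercept = np.polyfit(t, y, deg=1)   # highest power first
    if slope > slope_tol:
        return "upward"
    if slope < -slope_tol:
        return "downward"
    return "no trend"
```

  • In practice `slope_tol` would need to be scaled to the units and sampling rate of the indicator.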
  • Step 102: Perform anomaly detection on the historical indicator data with the anomaly detection algorithm corresponding to the classification result, and determine the abnormal data in the historical indicator data.
  • When the classification result is that the historical indicator data is periodic, a regression algorithm is used for anomaly detection; when it is aperiodic, a clustering algorithm is used; when it is non-volatile, the box plot algorithm is used; and when it is trend data, a time series decomposition algorithm is used.
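  • Of the detectors listed, the box plot is simple enough to show in a few lines. The sketch below flags points outside Tukey's fences [Q1 − k·IQR, Q3 + k·IQR]; the factor k = 1.5 is the conventional choice and an assumption here, and `boxplot_outliers` is a hypothetical name.

```python
import numpy as np


def boxplot_outliers(x, k=1.5):
    """Boolean mask of points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])   # linear interpolation by default
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)
```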
  • In other words, the anomaly detection algorithm is selected according to the type of historical indicator data.
  • For periodic historical indicator data, the data follows a periodic pattern over time and successive values are correlated, so a regression algorithm is used for anomaly detection.
  • The regression algorithm may be, for example, a histogram-based decision tree algorithm, a leaf-wise growth algorithm with depth limits, or a gradient-based one-side sampling algorithm.
  • For aperiodic historical indicator data, the data fluctuates but the fluctuation pattern is irregular and non-periodic, so a clustering algorithm is used for anomaly detection.
  • The clustering algorithm may be a Gaussian mixture model, K-means clustering, mean-shift clustering, density-based clustering, agglomerative hierarchical clustering, etc.
  • The time series decomposition algorithm may be X11 decomposition, classical decomposition, SEATS decomposition, STL decomposition, etc.
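  • As one concrete instance of the decomposition route, the sketch below performs a classical additive decomposition (centered moving-average trend, per-phase seasonal means, residual) and flags points whose residual exceeds k standard deviations. The known period, the additive model and the 3-sigma rule are assumptions rather than details from the text, and the function names are hypothetical; library implementations such as STL are more robust to outliers.

```python
import numpy as np


def classical_decompose(x, period):
    """Additive classical decomposition: x = trend + seasonal + residual."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if period % 2 == 0:
        # 2 x period moving average: half weights on the two end points.
        w = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        w = np.ones(period) / period
    half = len(w) // 2
    trend = np.full(n, np.nan)                  # undefined near the edges
    trend[half:n - half] = np.convolve(x, w, mode="valid")
    detrended = x - trend
    # Seasonal component: mean detrended value per phase, centered on zero.
    phase_means = np.array([np.nanmean(detrended[p::period]) for p in range(period)])
    phase_means -= phase_means.mean()
    seasonal = np.resize(phase_means, n)        # repeat the pattern over x
    return trend, seasonal, x - trend - seasonal


def decomposition_anomalies(x, period, k=3.0):
    """Flag points whose residual exceeds k standard deviations."""
    _, _, resid = classical_decompose(x, period)
    sigma = np.nanstd(resid)
    with np.errstate(invalid="ignore"):
        return np.abs(resid) > k * sigma         # NaN edges compare as False
```

  • For a linear trend plus a zero-sum seasonal pattern of period 4 with a single spike added, only the spike's index is flagged: the moving average removes the trend and season exactly, so the spike dominates the residual.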
  • Step 103: Update the indicator threshold of the business according to the distribution of the abnormal data.
  • Updating the preset indicator threshold of the business according to the distribution of the abnormal data includes: determining the upper quartile, lower quartile and interquartile range of the abnormal data according to its distribution; and determining the current indicator threshold of the business from the upper quartile, the lower quartile and the interquartile range, and replacing the preset indicator threshold with the current indicator threshold.
  • Specifically, the lower quartile Q1, the median Q2 and the upper quartile Q3 of the abnormal data are obtained.
  • The values at the 25th, 50th and 75th percentiles are the lower quartile Q1, the median Q2 and the upper quartile Q3, respectively.
  • The difference between the upper and lower quartiles is the interquartile range, IQR = Q3 − Q1. The upper limit of the indicator threshold is Q3 + 1.5 × IQR, and the lower limit is Q1 − 1.5 × IQR.
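  • The threshold rule can be written directly. Following the text, the quartiles are taken over the abnormal data. `np.percentile` uses linear interpolation by default, which is only one of several quartile conventions, and the name `iqr_threshold` is hypothetical.

```python
import numpy as np


def iqr_threshold(abnormal, k=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR) of the abnormal data."""
    q1, _median, q3 = np.percentile(np.asarray(abnormal, dtype=float), [25, 50, 75])
    iqr = q3 - q1                     # the median Q2 is not used by the rule
    return float(q1 - k * iqr), float(q3 + k * iqr)
```

  • The returned pair would then replace the preset lower and upper limits of the indicator threshold.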
  • The data update method is illustrated below with an example.
  • In this example, a clustering algorithm is used for anomaly detection: all of the CPU usage data is clustered into a set number of categories, the abnormal category or categories are identified according to the discrimination rules, and their points are labeled as abnormal, which accomplishes the anomaly detection.
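  • The CPU-usage example can be sketched with a plain one-dimensional k-means (Lloyd's algorithm). The text does not state its discrimination rules, so the rule used here, treating the smallest cluster as abnormal, is a hypothetical stand-in, as are the function names and the quantile-based initialization.

```python
import numpy as np


def kmeans_1d(x, k, iters=100):
    """Lloyd's k-means on scalar data; returns (labels, centers)."""
    x = np.asarray(x, dtype=float)
    centers = np.quantile(x, np.linspace(0.0, 1.0, k))  # spread initial centers
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def cluster_anomalies(x, k=2):
    """Hypothetical rule: points in the smallest cluster are abnormal."""
    labels, _ = kmeans_1d(x, k)
    sizes = np.bincount(labels, minlength=k)
    return labels == np.argmin(sizes)
```

  • With CPU usage samples such as 19 to 22 percent plus two readings near 95 percent, the two high readings form the smaller cluster and are labeled abnormal.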
  • An embodiment of the present application relates to a fault diagnosis method which, as shown in Figure 2, includes the following steps.
  • Step 201: Obtain fault data of the service.
  • The fault data includes log data, alarm data, indicator data, third-party operation and maintenance data, and so on.
  • As for when the fault data is acquired, it may be collected in real time when the fault occurs and stored so that it can be retrieved during later fault diagnosis, or it may be obtained from the relevant devices or platforms at the time of fault diagnosis.
  • For example, the fault diagnosis method is deployed as a module in the data center.
  • The data center is built on the data warehouse and data platform, and packages data as data API services to provide it to the business more efficiently.
  • [Business Service A] is located in the business center.
  • The data required by business service A comes from [Other Services 1] and [Other Services 2] in the data center.
  • The data center relies on the underlying PaaS platform to support industry applications, so fault data must be obtained from the PaaS platform, the data center, the business center, and so on.
  • Step 202: When the indicator threshold of the service has not been updated, analyze the fault data with the preset fault diagnosis model, and generate and output a fault diagnosis result, where the preset fault diagnosis model includes the preset indicator threshold.
  • Step 203: When the indicator threshold of the service has been updated, obtain the latest indicator threshold, update the preset fault diagnosis model according to it, analyze the fault data with the updated fault diagnosis model, and generate and output a fault diagnosis result, where the latest indicator threshold is obtained through the data update method described in the above embodiment.
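  • Steps 202 and 203 amount to choosing between the preset model and a copy whose thresholds have been replaced. The minimal sketch below models the fault diagnosis model as an ordered list of template functions; every name in it (`FaultDiagnosisModel`, `run_diagnosis`, `low_init_tasks`, the `init_tasks` indicator key) is hypothetical and is not part of any rule engine's API.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types: indicator name -> (lower, upper) limits, and a template
# that returns a fault cause, or None when its condition does not match.
Thresholds = Dict[str, Tuple[float, float]]
Template = Callable[[Dict[str, float], Thresholds], Optional[str]]


@dataclass
class FaultDiagnosisModel:
    thresholds: Thresholds
    templates: List[Template] = field(default_factory=list)

    def analyze(self, fault_data: Dict[str, float]) -> str:
        # Templates are tried in order; the first match gives the cause.
        for template in self.templates:
            cause = template(fault_data, self.thresholds)
            if cause is not None:
                return cause
        return "no matching fault template"


def run_diagnosis(model: FaultDiagnosisModel, fault_data: Dict[str, float],
                  latest_thresholds: Optional[Thresholds] = None) -> str:
    if latest_thresholds:
        # Step 203: build an updated model in which the latest thresholds
        # replace the preset ones; the preset model itself is left unchanged.
        model = replace(model, thresholds={**model.thresholds, **latest_thresholds})
    # Step 202 (no update) or step 203 (updated model): analyze the fault data.
    return model.analyze(fault_data)


def low_init_tasks(data: Dict[str, float], th: Thresholds) -> Optional[str]:
    lower, _upper = th["init_tasks"]
    return "task initialization exception" if data["init_tasks"] < lower else None


model = FaultDiagnosisModel({"init_tasks": (10.0, 100.0)}, [low_init_tasks])
```

  • With the preset lower limit of 10, eight initialized tasks match the template; after an update lowers the limit to 5, the same fault data no longer matches it.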
  • The updated indicator threshold is obtained through the data update method described in the above embodiment. The data update method may be deployed as a module on a device or platform, and the fault diagnosis method may likewise be deployed as a module, on the same device or platform as the data update method or on a different one.
  • The latest indicator threshold replaces the preset indicator threshold in the fault diagnosis model (the preset indicator threshold may initially be set manually, or it may be a threshold obtained at an earlier time through the data update method), yielding the updated fault diagnosis model.
  • The preset fault diagnosis model includes multiple fault diagnosis templates, which are created from a rule engine and preset fault diagnosis conditions.
  • The rule engine may be Drools, OpenL Tablets, Easy Rules, etc.
  • The rule engine converts the preset fault diagnosis conditions into fault diagnosis templates.
  • A fault diagnosis model usually includes multiple fault diagnosis templates.
  • The thresholds in a rule engine are generally fixed values, which cannot support fault diagnosis across the various scenarios of the same business. This application therefore combines the rule engine with the data update method to obtain dynamic thresholds. As a result, the fault diagnosis model of this application does not need to be trained on a large amount of collected abnormal data, and its thresholds adapt to the current operating state of the business. The overall fault diagnosis method has low cost, short model construction time and high accuracy.
  • The fault diagnosis method of this application is illustrated below using a sudden drop in the number of running tasks in Yarn as an example.
  • The preset fault diagnosis conditions are: (1) if the task submission service is down, the cause of the drop in Yarn running tasks is that the submitting node is down; (2) if the task submission service is not down but the Pg service is abnormal, the cause is the abnormal Pg service; (3) if neither the task submission service nor the Pg service is abnormal but the tasks initialized at the current moment are abnormal, the cause is a task initialization exception. Multiple fault diagnosis templates are created from these conditions with the rule engine, and the fault diagnosis model is built from these templates.
  • The indicator data involved in this fault are (1) the number of task submission services; (2) the number of Pg service processes; and (3) the number of initialized tasks.
  • For indicators (1) and (2), the corresponding thresholds are fixed values, so they do not need to be updated by the data update method. Whether the number of tasks initialized in the current hour is abnormal (that is, how many tasks have been initialized relative to how many need to be submitted), however, cannot be judged with a fixed threshold: different nodes carry different task volumes, the volume differs from hour to hour within a day, and the count for the same hour is not necessarily the same on different days.
  • The data update method is therefore used to continuously update the indicator threshold for the number of initialized tasks, and the fault diagnosis model is updated accordingly to diagnose the root cause of the sudden drop in running tasks in Yarn.
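  • The three ordered conditions of the Yarn example can be expressed as a single function, mimicking what the rule-engine templates would encode. The function and parameter names, and the fixed thresholds assumed for indicators (1) and (2) (fewer than one live instance counts as down or abnormal), are illustrative assumptions; only `init_task_lower` would come from the data update method.

```python
from typing import Optional


def diagnose_yarn_task_drop(submit_service_count: int,
                            pg_process_count: int,
                            init_task_count: float,
                            init_task_lower: float) -> Optional[str]:
    """Ordered conditions (1)-(3) for a sudden drop in Yarn running tasks."""
    # (1) Fixed threshold (assumed): no live task submission service means down.
    if submit_service_count < 1:
        return "submitting node is down"
    # (2) Fixed threshold (assumed): no Pg service process means Pg is abnormal.
    if pg_process_count < 1:
        return "Pg service is abnormal"
    # (3) Dynamic threshold maintained by the data update method.
    if init_task_count < init_task_lower:
        return "task initialization exception"
    return None  # none of the templated causes applies
```

  • The order of the checks mirrors the conditions: a later cause is only reported when every earlier condition has been ruled out.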
  • An embodiment of the present application relates to an electronic device which, as shown in Figure 3, includes at least one processor 301 and a memory 302 communicatively connected to the at least one processor 301. The memory 302 stores instructions executable by the at least one processor 301, and the instructions, when executed by the at least one processor 301, enable it to perform the fault diagnosis method or the data update method of the above embodiments.
  • The memory and the processor are connected by a bus, which may include any number of interconnected buses and bridges.
  • The bus links together various circuits of the one or more processors and the memory.
  • The bus may also link together various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further here.
  • A bus interface provides an interface between the bus and a transceiver.
  • The transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium.
  • Data processed by the processor is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor.
  • The processor is responsible for managing the bus and for general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management and other control functions.
  • The memory can be used to store data used by the processor when performing operations.
  • An embodiment of the present application relates to a computer-readable storage medium storing a computer program.
  • When the computer program is executed by a processor, the above fault diagnosis method or data update method is implemented.
  • The program is stored in a storage medium and includes several instructions that cause a device (such as a microcontroller or a chip) or a processor to execute all or part of the steps of the methods described in the embodiments of this application.
  • The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks and other media that can store program code.

Abstract

Provided are a data updating method, a fault diagnosis method, an electronic device, and a storage medium, which relate to the field of communications. The data updating method comprises: acquiring historical index data of a service, according to a change rule of the historical index data over time, classifying the historical index data, and acquiring a classification result (101); according to an anomaly detection algorithm corresponding to the classification result, performing anomaly detection on the historical index data, and determining abnormal data in the historical index data (102); and according to the distribution of the abnormal data, updating a preset index threshold of the service (103).

Description

Data update method, fault diagnosis method, electronic device and storage medium
Related applications
This application claims priority to Chinese patent application No. 202210283753.2, filed on March 21, 2022.
Technical field
Embodiments of the present application relate to the field of communications, and in particular to a data update method, a fault diagnosis method, an electronic device and a storage medium.
Background
Big data systems are widely used not only in the Internet industry but also in key fields such as government, finance, communications, aerospace and the military. Such systems typically serve hundreds of millions of users, so a service interruption or a drop in service quality caused by even a minor fault can bring huge losses. Moreover, these systems are large in scale and complex in composition and operating logic, so failures occur frequently; once a fault occurs it is difficult to detect, locate and diagnose, and the system is hard to analyze and debug after it goes wrong.
The fault diagnosis method in common use today is based on graph neural network algorithms. It trains a graph neural network on existing abnormal data to obtain a fault diagnosis model, and then diagnoses the root cause of a fault from real-time data using the trained model. However, because samples of abnormal data are scarce, the resulting fault diagnosis model has low accuracy.
Summary of the invention
The main purpose of the embodiments of the present application is to provide a data updating method, a fault diagnosis method, an electronic device, and a storage medium that update a fault diagnosis model through dynamic thresholds, so that the model can locate faults accurately and is applicable to a variety of scenarios.
To achieve the above purpose, an embodiment of the present application provides a data updating method, comprising: acquiring historical indicator data of a service, and classifying the historical indicator data according to how it changes over time to obtain a classification result; performing anomaly detection on the historical indicator data using the anomaly detection algorithm corresponding to the classification result, to determine abnormal data in the historical indicator data; and updating a preset indicator threshold of the service according to the distribution of the abnormal data.
An embodiment of the present application provides a fault diagnosis method, comprising: acquiring fault data of a service; when the indicator threshold of the service has not been updated, analyzing the fault data with a preset fault diagnosis model, which contains a preset indicator threshold, and generating and outputting a fault diagnosis result; and when the indicator threshold of the service has been updated, acquiring the latest indicator threshold, updating the preset fault diagnosis model according to it, and analyzing the fault data with the updated model to generate and output a fault diagnosis result, wherein the latest indicator threshold is obtained by the data updating method of the above embodiment.
To achieve the above purpose, an embodiment of the present application further provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the data updating method or the fault diagnosis method described in the above embodiments.
To achieve the above purpose, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data updating method or the fault diagnosis method described in the above embodiments.
In the data updating method proposed by the present application, the historical indicator data of a service is classified, and the classification result indicates how the data changes over time. An anomaly detection algorithm matching the type of the data is then selected to detect abnormal data, and the indicator threshold of the service is determined or updated according to the distribution of that abnormal data. In other words, the present application can acquire historical indicator data periodically or continuously, derive the latest indicator threshold from the latest data, and use this dynamic threshold to track the current operating state of the service in a timely manner.
In the fault diagnosis method proposed by the present application, fault data of a service is acquired in real time, and it is then determined whether the indicator threshold of the service has been updated. If it has not, the preset fault diagnosis model and the preset indicator threshold are used to analyze the fault data and obtain a fault diagnosis result (the cause of the fault). If it has, the preset fault diagnosis model is updated according to the latest indicator threshold, and the updated model is used to diagnose the fault and determine its cause. In other words, the fault diagnosis model of the present application is not static: it is continually updated with the latest indicator thresholds, which are in turn derived from the latest historical indicator data. The model therefore adapts promptly to changes in the current operating state of the service, fits actual business scenarios better, and yields more accurate fault diagnosis results.
Brief description of the drawings
One or more embodiments are illustrated by way of example in the corresponding drawings; these illustrations do not limit the embodiments.
Figure 1 is a flowchart of a data updating method provided by an embodiment of the present application;
Figure 2 is a flowchart of a fault diagnosis method provided by an embodiment of the present application;
Figure 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that many technical details are given in each embodiment to help readers better understand the present application; however, the technical solutions claimed in the present application can be implemented even without these technical details and with various changes and modifications based on the following embodiments. The following division into embodiments is for convenience of description and does not limit the specific implementation of the present application; the embodiments may be combined with and refer to one another where they do not conflict.
A commonly used fault diagnosis method today is based on graph neural networks. It usually abstracts the big data system into a network graph, collects abnormal data from all nodes in the graph (such as servers and clients), and trains a pre-built graph neural network on that data to obtain a fault diagnosis model. When a fault occurs, abnormal data is fed into the model in real time to diagnose the root cause. However, the training stage requires a large amount of abnormal data, specifically data produced while faults occur, and such samples are usually scarce. Training on a small sample yields a fault diagnosis model with low accuracy, while iteratively retraining as abnormal data accumulates is very costly and time-consuming.
An embodiment of the present application relates to a data updating method which, as shown in Figure 1, comprises:
Step 101: acquire the historical indicator data of the service, classify it according to how it changes over time, and obtain a classification result.
In this embodiment, the historical indicator data of the service includes indicator data related to the operation of the service itself and indicator data of the underlying platforms on which the service depends. For example, [Business Service A] is located in the business middle platform; the data it needs comes from [Other Service 1] and [Other Service 2] in the data middle platform, which in turn relies on the underlying PaaS platform to support industry applications. When acquiring historical indicator data, therefore, the indicator data of the PaaS platform itself, of the data middle platform, and of the business middle platform all need to be acquired. The content of the historical indicator data may differ across services, scenarios, and devices.
In addition, the time period of indicator data to acquire should be chosen according to factors such as the service type, fault diagnosis requirements (data volume, time, accuracy, etc.), and the application scenario. Because the updated indicator threshold should reflect the current operating state of the service as closely as possible, the data closest to the current moment, i.e., the latest historical indicator data, should be selected wherever possible. For example, if a service needs one week of historical indicator data to determine its indicator threshold, it is best to use the week immediately preceding the current time rather than some week a year ago.
It should be noted that if the indicator threshold of the service has not yet been determined, the data updating method of the present application can be used to obtain the indicator threshold for the current moment; if it has been determined, the method can be used to keep updating the existing (historical) indicator threshold. Accordingly, the historical indicator data may be acquired continuously in real time, or periodically.
In one embodiment, classifying the historical indicator data according to how it changes over time to obtain the classification result comprises: obtaining the amplitude spectrum of the autocorrelation coefficients of the historical indicator data and determining from it whether the data is periodic; obtaining the mean and variance of the data and determining from them whether the data is volatile; and performing a linear fit on the data to obtain a fitting function and determining from it whether the data has a trend.
Specifically, to determine periodicity, the autocorrelation coefficient plot of the historical indicator data is first obtained and Fourier-transformed to obtain the autocorrelation amplitude spectrum. The maximum, mean, and variance of the amplitude spectrum are then computed: if the maximum is greater than the sum of the mean and the variance, the data is determined to be periodic; if the maximum is less than or equal to that sum, the data is determined to be non-periodic.
Alternatively, non-periodic data can be identified by directly computing the autocorrelation coefficient of the historical indicator data: if it is less than a preset threshold, the data can be directly determined to be non-periodic. A coefficient greater than the threshold, however, does not prove the data is periodic, because non-periodic data may also exceed the threshold. In one implementation, for data whose autocorrelation coefficient exceeds the preset threshold, the autocorrelation amplitude spectrum is obtained, and if the spectrum has no peak, the data is determined to be non-periodic.
In this embodiment, periodic data can be further graded by the strength of its periodicity, and a different anomaly detection algorithm can be chosen accordingly. For example, strongly periodic data can use a simpler, lower-complexity algorithm, while weakly periodic data can use an algorithm with better detection results and broader applicability. To judge the strength of periodicity, the maximum peak value of the main peak of the autocorrelation amplitude spectrum is obtained and the following formula is applied:
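A minimal sketch of the two periodicity checks above, in Python using only the standard library. Assumptions not in the text: the quick check uses the lag-1 autocorrelation with a hypothetical threshold `acf_threshold=0.3`, and the spectrum is a naive DFT of the autocorrelation sequence with the zero-frequency bin dropped and only the positive-frequency half kept. Following the text literally, the spectrum test compares the maximum against mean + variance.

```python
import cmath
import statistics


def autocorrelation(xs, max_lag=None):
    """Sample autocorrelation coefficients r_0 .. r_max_lag of a series."""
    n = len(xs)
    max_lag = n - 1 if max_lag is None else max_lag
    mean = statistics.fmean(xs)
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    if denom == 0:
        # Constant series: no correlation structure to measure.
        return [0.0] * (max_lag + 1)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def amplitude_spectrum(xs):
    """Magnitude of the discrete Fourier transform (naive O(n^2) DFT)."""
    n = len(xs)
    return [abs(sum(xs[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def is_periodic(xs, acf_threshold=0.3):
    acf = autocorrelation(xs)
    # Quick rejection: a weak lag-1 autocorrelation means non-periodic.
    if abs(acf[1]) < acf_threshold:
        return False
    # Keep positive-frequency bins 1 .. n/2 of the autocorrelation spectrum.
    spec = amplitude_spectrum(acf)[1 : len(acf) // 2 + 1]
    return max(spec) > statistics.fmean(spec) + statistics.pvariance(spec)
```

For example, `is_periodic([math.sin(2 * math.pi * t / 24) for t in range(96)])` is expected to report a period, while a constant series is rejected by the quick check. Because the variance grows with the square of the spectrum's scale, this literal test is scale-sensitive; if σ in the original denotes the standard deviation, `statistics.pstdev` is the drop-in replacement.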
[Formula image PCTCN2022130723-appb-000001; not available as text]
where value_max is the maximum of the amplitude spectrum, μ is the mean, σ is the variance, peak_max is the maximum peak value of the main peak, and λ is a preset discrimination rate.
In one embodiment, volatility is detected from the mean and variance of the historical indicator data: if the data satisfies any of the following conditions, it is determined to be non-volatile; otherwise it is volatile:
[Formula image PCTCN2022130723-appb-000002; not available as text]
where μ is the mean of the historical indicator data, σ is its variance, and λ is a preset adjustment parameter.
In one embodiment, to determine whether the data has a trend, a linear fit is performed on the historical indicator data to obtain a fitting function, and whether the data as a whole trends upward or downward is judged from the type or the slope of the fitting function.
Step 102: perform anomaly detection on the historical indicator data using the anomaly detection algorithm corresponding to the classification result, and determine the abnormal data in the historical indicator data.
In one embodiment, when the classification result indicates that the historical indicator data is periodic, a regression algorithm is used for anomaly detection; when it is non-periodic, a clustering algorithm is used; when it is non-volatile, a box plot algorithm is used; and when it has a trend, a time series decomposition algorithm is used.
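A sketch of the trend check using an ordinary least-squares line fit. The text decides trend from the type or slope of the fitted function but gives no cut-off, so `rel_slope_threshold` (slope per sample, relative to the mean absolute level of the series) is a hypothetical parameter, as are the function names.

```python
import statistics


def linear_fit(ys):
    """Least-squares fit of y = slope * t + intercept for t = 0 .. n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(ys)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def trend_direction(ys, rel_slope_threshold=0.001):
    """Return "up", "down", or None when the fitted slope is negligible."""
    slope, _ = linear_fit(ys)
    scale = statistics.fmean([abs(y) for y in ys]) or 1.0  # avoid a zero scale
    if slope > rel_slope_threshold * scale:
        return "up"
    if slope < -rel_slope_threshold * scale:
        return "down"
    return None
```

For instance, `linear_fit([1, 3, 5])` returns `(2.0, 1.0)`. Normalizing the slope by the series level lets one threshold serve indicators on very different scales (CPU percentages versus task counts).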
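The mapping from classification result to detector family can be expressed as a lookup table. The text does not say which label takes precedence when a series has several properties at once (for example, both a trend and a period), so the precedence in `classify` below is an assumption, as are all the names.

```python
from enum import Enum


class SeriesClass(Enum):
    PERIODIC = "periodic"
    APERIODIC = "aperiodic"
    NON_VOLATILE = "non_volatile"
    TREND = "trend"


# Detector family the text assigns to each classification result.
DETECTOR_FAMILY = {
    SeriesClass.PERIODIC: "regression",
    SeriesClass.APERIODIC: "clustering",
    SeriesClass.NON_VOLATILE: "box_plot",
    SeriesClass.TREND: "time_series_decomposition",
}


def classify(periodic: bool, volatile: bool, trending: bool) -> SeriesClass:
    """Combine the three checks into one label (precedence is assumed)."""
    if not volatile:
        return SeriesClass.NON_VOLATILE
    if trending:
        return SeriesClass.TREND
    return SeriesClass.PERIODIC if periodic else SeriesClass.APERIODIC
```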
Specifically, the anomaly detection algorithm is selected according to the type of historical indicator data. Periodic data follows a periodic pattern over time and its values are correlated with one another, so a regression algorithm is used; it may be a histogram-based decision tree algorithm, a leaf-wise algorithm with a depth limit, a gradient-based one-side sampling algorithm, and so on. Non-periodic data fluctuates, but irregularly and without any period, so a clustering algorithm is used; it may be a Gaussian mixture model, K-means clustering, mean shift clustering, density-based clustering, agglomerative hierarchical clustering, and so on. Non-volatile data stays essentially constant or oscillates within a very narrow range, so a box plot algorithm is used. Trending data fluctuates locally but moves upward or downward overall, so a time series decomposition algorithm is used; it may be X11 decomposition, classical decomposition, SEATS decomposition, STL decomposition, and so on.
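As one concrete instance of the clustering branch, the sketch below runs plain one-dimensional k-means (Lloyd's algorithm) and flags every point in a cluster that holds less than a small share of the data. The text leaves the rule for picking abnormal clusters open; the "small cluster" rule, the rank-spaced initialization, and the `min_share` parameter are assumptions, and a production system would more likely use a library implementation of one of the algorithms listed above.

```python
import statistics


def kmeans_1d(xs, k=3, iters=50):
    """Lloyd's k-means on scalars; initial centers are evenly spaced ranks."""
    s = sorted(xs)
    if k > 1:
        centers = [s[int(i * (len(s) - 1) / (k - 1))] for i in range(k)]
    else:
        centers = [statistics.fmean(s)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in xs]
        new = []
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            new.append(statistics.fmean(members) if members else centers[c])
        if new == centers:  # converged
            break
        centers = new
    return labels, centers


def cluster_anomalies(xs, k=3, min_share=0.05):
    """Indices of points whose cluster holds < min_share of all points."""
    labels, _ = kmeans_1d(xs, k)
    sizes = {c: labels.count(c) for c in set(labels)}
    return [i for i, lab in enumerate(labels) if sizes[lab] < min_share * len(xs)]
```

With 40 CPU readings around 20 %, 20 around 50 %, and two spikes at 95 % and 97 % appended last, `cluster_anomalies(data, k=3)` returns the indices of the two spikes.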
Step 103: update the indicator threshold of the service according to the distribution of the abnormal data.
In this embodiment, updating the preset indicator threshold of the service according to the distribution of the abnormal data comprises: determining the upper quartile, lower quartile, and interquartile range of the abnormal data from its distribution; and determining the current indicator threshold of the service from the upper quartile, lower quartile, and interquartile range, and replacing the preset indicator threshold with the current one.
Specifically, the abnormal data is sorted in ascending order and divided into four equal parts to obtain the lower quartile Q1, the median Q2, and the upper quartile Q3, i.e., the values at the 25th, 50th, and 75th percentiles of the sorted data. The difference between the upper and lower quartiles is the interquartile range IQR = Q3 - Q1. The upper limit of the indicator threshold is Q3 + 1.5×IQR, and the lower limit is Q1 - 1.5×IQR. For example, a set of abnormal data sorted in ascending order is (2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 3325). Divided into four equal parts it becomes (2710, 2755, 2850 | 2880, 2880, 2890 | 2920, 2940, 2950 | 3050, 3130, 3325), so Q1 = 2850 + (2880 - 2850)/2 = 2865, Q2 = 2890 + (2920 - 2890)/2 = 2905, and Q3 = 2950 + (3050 - 2950)/2 = 3000. Hence IQR = 135, and the threshold range is [2865 - 202.5, 3000 + 202.5] = [2662.5, 3202.5]. Depending on the service and the indicator, the indicator threshold may be a range or a single value.
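The quartile computation in the example can be reproduced as follows. The "divide into four equal parts" rule is implemented here as the median of the lower and upper halves (the middle value is excluded from both halves when the count is odd); this is one of several quartile conventions, and `statistics.quantiles` with its default method gives slightly different values for the same data. The function names are illustrative.

```python
import statistics


def quartiles(values):
    """Q1, Q2, Q3 as medians of the lower half, the whole set and the upper half."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower, upper = s[:half], s[half + (n % 2):]  # drop the middle value if n is odd
    return statistics.median(lower), statistics.median(s), statistics.median(upper)


def iqr_thresholds(values, k=1.5):
    """(lower, upper) indicator threshold: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


abnormal = [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3130, 3325]
print(quartiles(abnormal))       # (2865.0, 2905.0, 3000.0)
print(iqr_thresholds(abnormal))  # (2662.5, 3202.5)
```

At least two values are needed so that both halves are non-empty.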
Take CPU usage as the historical indicator data to illustrate the data updating method. (1) First, the CPU usage for every minute of a day is selected and classified: its mean and variance are computed to determine whether it is volatile, the amplitude spectrum of its autocorrelation coefficients is computed to determine whether it is periodic, and a fitting function is computed to determine whether it has a trend. The CPU usage data is thereby determined to be non-periodic. (2) Because the CPU usage data is non-periodic, a clustering algorithm is used for anomaly detection: all CPU usage data points are grouped into a set number of clusters, and the abnormal cluster or clusters are identified according to a discrimination rule and labeled, which accomplishes the anomaly detection. (3) Finally, the threshold interval for normal CPU usage is derived from the distribution of the abnormal data.
In the data updating method proposed by the present application, the historical indicator data of a service is classified, and the classification result indicates how the data changes over time. An anomaly detection algorithm matching the type of the data is then selected to detect abnormal data, and the indicator threshold of the service is determined or updated according to the distribution of that abnormal data. In other words, the present application can acquire historical indicator data periodically or continuously, derive the latest indicator threshold from the latest data, and use this dynamic threshold to track the current operating state of the service in a timely manner.
An embodiment of the present application relates to a fault diagnosis method which, as shown in Figure 2, comprises:
Step 201: acquire the fault data of the service.
In this embodiment, the fault data includes log data, alarm data, indicator data, third-party operation and maintenance data, and so on. The fault data may be acquired in real time when a fault occurs and stored for later use in fault diagnosis, or it may be fetched from the relevant devices or platforms at diagnosis time. For example, the fault diagnosis method may be deployed as a module in the data middle platform, which builds on a data warehouse and data platform to package data as data API services and provide it to the business more efficiently. [Business Service A] is located in the business middle platform; the data it needs comes from [Other Service 1] and [Other Service 2] in the data middle platform, which in turn relies on the underlying PaaS platform to support industry applications. Fault data therefore needs to be acquired from the PaaS platform, the data middle platform, the business middle platform, and so on.
Step 202: when the indicator threshold of the service has not been updated, analyze the fault data with the preset fault diagnosis model, which contains the preset indicator threshold, and generate and output a fault diagnosis result.
Step 203: when the indicator threshold of the service has been updated, acquire the latest indicator threshold, update the preset fault diagnosis model according to it, analyze the fault data with the updated fault diagnosis model, and generate and output a fault diagnosis result, wherein the latest indicator threshold is obtained by the data updating method described in the above embodiment.
In this embodiment, the updated indicator threshold is obtained by the data updating method of the above embodiment. The data updating method may be deployed as a module on a device or platform, and the fault diagnosis method may likewise be deployed as a module on the same or a different device or platform. When diagnosing a fault in a service, the fault data of the service is acquired first, and it is then determined whether the indicator threshold of the service has been updated. If so, the updated indicator threshold is acquired and replaces the preset indicator threshold in the preset fault diagnosis model, yielding the updated fault diagnosis model. (The preset indicator threshold may be set manually at the outset, or it may be a threshold obtained earlier by the data updating method.)
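A sketch of steps 202 and 203 as a threshold swap. How the method detects that a threshold "has been updated" is not specified; a monotonically increasing version number is assumed here, and `DiagnosisModel` and `refresh_model` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Bounds = Tuple[float, float]  # (lower, upper) indicator threshold


@dataclass
class DiagnosisModel:
    thresholds: Dict[str, Bounds] = field(default_factory=dict)
    version: int = 0  # version of the thresholds this model was built with


def refresh_model(model: DiagnosisModel, latest: Dict[str, Bounds],
                  latest_version: int) -> DiagnosisModel:
    """Step 202: no newer thresholds, so keep the preset model.
    Step 203: newer thresholds, so return a copy with them swapped in."""
    if latest_version <= model.version:
        return model
    merged = {**model.thresholds, **latest}  # updated metrics override preset ones
    return DiagnosisModel(thresholds=merged, version=latest_version)
```

Returning a new model rather than mutating the preset one keeps the preset thresholds available as a fallback.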
In one embodiment, the preset fault diagnosis model includes multiple fault diagnosis templates, which are created from a rule engine and preset fault diagnosis conditions. The rule engine may be Drools, OpenL Tablets, Easy Rules, etc.; it converts the preset fault diagnosis conditions into fault diagnosis templates, and a fault diagnosis model usually contains multiple such templates.
It should be noted that, with a rule engine, users can create a fault diagnosis model from preset fault diagnosis conditions without writing code or training, which avoids the need for professionals to write code, large volumes of training data, and long training times. However, thresholds in a rule engine are generally fixed values and cannot serve fault diagnosis across the various scenarios of the same service. The present application therefore combines the rule engine with the data updating method to obtain dynamic thresholds, so that the fault diagnosis model neither requires collecting a large amount of abnormal data for training nor relies on fixed thresholds, and instead adjusts its thresholds to the current operating state of the service. As a result, the fault diagnosis method as a whole has low cost, short model construction time, and high accuracy.
The fault diagnosis method of the present application is illustrated with the example of a sudden drop in the number of running Yarn tasks. For this fault, the preset fault diagnosis conditions are: (1) if the task-submission service is down, the cause of the drop is that the submission node is down; (2) if the task-submission service is not down but the Pg service is abnormal, the cause is the abnormal Pg service; (3) if neither the task-submission service nor the Pg service is abnormal but the tasks initialized at the current moment are abnormal, the cause is a task initialization exception. Multiple fault diagnosis templates are created from these conditions using the rule engine, and the fault diagnosis model is built from these templates.
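The three conditions can be sketched as an ordered rule list evaluated first-match-wins, which is how the "not down, but..." qualifiers of conditions (2) and (3) are encoded. This is plain Python standing in for a rule engine such as Drools, not that engine's API; the fact names are hypothetical, and in practice `task_init_abnormal` would come from comparing the current hour's initialized-task count with the dynamic threshold.

```python
from typing import Callable, Dict, List, Optional, Tuple

Facts = Dict[str, bool]
# A hypothetical "template": (condition over the facts, root cause it implies).
Rule = Tuple[Callable[[Facts], bool], str]

YARN_TASK_DROP_RULES: List[Rule] = [
    (lambda f: f["submit_service_down"], "task-submission node is down"),
    (lambda f: f["pg_service_abnormal"], "Pg service is abnormal"),
    (lambda f: f["task_init_abnormal"], "task initialization is abnormal"),
]


def diagnose(facts: Facts, rules: List[Rule]) -> Optional[str]:
    """Return the root cause of the first matching rule, or None."""
    for condition, cause in rules:
        if condition(facts):
            return cause
    return None
```

Rule order matters: a later rule only fires when every earlier condition is false, which reproduces the qualifiers without repeating them in each rule.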
The indicator data involved in this fault are (1) the number of task submission services, (2) the processing count of the Pg service, and (3) the number of initial tasks. Indicators (1) and (2) generally have fixed thresholds, so they do not need to be updated by the data update method. Whether the number of initial tasks in the current hour is abnormal, however (the number of tasks initialized determines the number of tasks that need to be submitted), cannot be judged with a fixed threshold: task volume differs from node to node, it differs between hours of the same day, and the count for the same hour is not necessarily the same on different days. The data update method is therefore used to keep updating the threshold for the number of initial tasks, which in turn keeps the fault diagnosis model up to date, so that the root cause of the sudden drop in Yarn running tasks can be diagnosed.
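Because the normal number of initial tasks depends on both the node and the hour of the day, one way to picture a dynamic threshold is to keep separate bounds per (node, hour). The Python sketch below assumes the history is a list of `(node, hour, count)` samples and, in the spirit of the quartile-based update in claim 6, derives each bound as `Q1 - k*IQR` and `Q3 + k*IQR`. The fence form, `k = 1.5`, and the function names are our assumptions; the text does not say how the threshold is computed in this example.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]  # (node, hour of day)


def hourly_bounds(history: Iterable[Tuple[str, int, float]],
                  k: float = 1.5) -> Dict[Key, Tuple[float, float]]:
    """Compute (lower, upper) bounds for the initial-task count per node and hour."""
    samples: Dict[Key, List[float]] = defaultdict(list)
    for node, hour, count in history:
        samples[(node, hour)].append(count)

    bounds: Dict[Key, Tuple[float, float]] = {}
    for key, counts in samples.items():
        if len(counts) < 2:
            continue  # statistics.quantiles needs at least two data points
        q1, _, q3 = statistics.quantiles(counts, n=4)
        iqr = q3 - q1
        bounds[key] = (q1 - k * iqr, q3 + k * iqr)
    return bounds


def is_abnormal(bounds: Dict[Key, Tuple[float, float]],
                node: str, hour: int, count: float) -> bool:
    """True if the count lies outside the bounds learned for this node and hour."""
    lower, upper = bounds[(node, hour)]
    return not lower <= count <= upper


# Eight days of history for one node: busy at 09:00, quiet at 03:00.
history = [("n1", 9, c) for c in (100, 102, 98, 101, 99, 103, 97, 100)]
history += [("n1", 3, c) for c in (10, 12, 9, 11, 10, 8, 12, 11)]
bounds = hourly_bounds(history)
print(is_abnormal(bounds, "n1", 9, 10), is_abnormal(bounds, "n1", 3, 10))
# prints: True False
```

The same count of 10 is flagged at 09:00 but accepted at 03:00, which a single fixed threshold cannot express.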
The fault diagnosis method proposed in this application obtains the fault data of a service in real time and then determines whether the indicator threshold of the service has been updated. If it has not, the preset fault diagnosis model and the preset indicator threshold are used to analyze the fault data and obtain a fault diagnosis result (the cause of the fault). If it has, the preset fault diagnosis model is updated with the latest indicator threshold, and the updated model is used to complete the diagnosis and determine the cause of the fault. In other words, the fault diagnosis model of this application is not static: it is continually updated with the latest indicator thresholds, which are in turn determined from the latest historical indicator data. The model can therefore adapt promptly to changes in the service's current operating state, which makes it better suited to real service scenarios and improves the accuracy of the fault diagnosis results.
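The check-then-update flow described above can be sketched as follows. This is an illustrative Python sketch, not the application's implementation: `ThresholdStore` is a hypothetical holder that the data update method would write to, a version counter stands in for "the threshold has been updated", and each rule simply flags a metric that falls below its threshold.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, str]  # (metric name, cause); a hypothetical rule shape


class ThresholdStore:
    """Hypothetical store of the latest thresholds, written by the data update method."""

    def __init__(self, thresholds: Dict[str, float]) -> None:
        self.thresholds = dict(thresholds)
        self.version = 0

    def publish(self, thresholds: Dict[str, float]) -> None:
        self.thresholds = dict(thresholds)
        self.version += 1  # signals that the thresholds have been updated


class FaultDiagnosisModel:
    """Ordered rules evaluated against the thresholds the model was built with."""

    def __init__(self, rules: List[Rule], thresholds: Dict[str, float], version: int = 0) -> None:
        self.rules = list(rules)
        self.thresholds = dict(thresholds)
        self.version = version

    def analyze(self, fault_data: Dict[str, float]) -> Optional[str]:
        for metric, cause in self.rules:
            if fault_data[metric] < self.thresholds[metric]:
                return cause
        return None


def diagnose(model: FaultDiagnosisModel, store: ThresholdStore,
             fault_data: Dict[str, float]) -> Tuple[FaultDiagnosisModel, Optional[str]]:
    """Rebuild the model if the thresholds changed, then analyze the fault data."""
    if store.version != model.version:
        model = FaultDiagnosisModel(model.rules, store.thresholds, store.version)
    return model, model.analyze(fault_data)


rules = [("initial_tasks", "task initialization is abnormal")]
store = ThresholdStore({"initial_tasks": 50})
model = FaultDiagnosisModel(rules, store.thresholds)

model, result = diagnose(model, store, {"initial_tasks": 40})   # preset threshold 50
store.publish({"initial_tasks": 30})                             # data update lowers it
model, updated = diagnose(model, store, {"initial_tasks": 40})
print(result, updated)
# prints: task initialization is abnormal None
```

Rebuilding the model only when the version changes keeps the common case (thresholds unchanged) as cheap as using the preset model directly.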
In addition, it should be understood that the division of the above methods into steps is only for clarity of description. In implementation, steps may be merged into one, or a step may be split into several; as long as the same logical relationship is preserved, such variations fall within the protection scope of this application. Adding insignificant modifications or introducing insignificant designs into the process, without changing its core design, also falls within the protection scope of this application.
An embodiment of this application relates to an electronic device which, as shown in Figure 3, includes at least one processor 301 and a memory 302 communicatively connected to the at least one processor 301. The memory 302 stores instructions executable by the at least one processor 301; when executed, the instructions enable the at least one processor 301 to perform the fault diagnosis method or the data update method of the above embodiments.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, and it links the various circuits of the one or more processors and the memory together. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits; these are well known in the art and are not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, and provides a unit for communicating with various other devices over a transmission medium. Data processed by the processor is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor.
The processor is responsible for managing the bus and for general processing, and may also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory may be used to store data used by the processor when performing operations.
An embodiment of this application relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the above fault diagnosis method or data update method.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (such as a microcontroller or a chip) or a processor to perform all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that the above embodiments are specific embodiments for implementing this application, and that in practical applications various changes may be made to them in form and detail without departing from the spirit and scope of this application.

Claims (10)

  1. A data update method, comprising:
    obtaining historical indicator data of a service, and classifying the historical indicator data according to how it changes over time, to obtain a classification result;
    performing anomaly detection on the historical indicator data using the anomaly detection algorithm corresponding to the classification result, to determine abnormal data in the historical indicator data; and
    updating a preset indicator threshold of the service according to the distribution of the abnormal data.
  2. The data update method according to claim 1, wherein classifying the historical indicator data according to how it changes over time, to obtain the classification result, comprises:
    obtaining an autocorrelation coefficient amplitude spectrum of the historical indicator data, and determining from the amplitude spectrum whether the historical indicator data is periodic;
    obtaining the mean and variance of the historical indicator data, and determining from the mean and variance whether the historical indicator data is volatile; and
    performing linear fitting on the historical indicator data to obtain a fitting function, and determining from the fitting function whether the historical indicator data has a trend.
  3. The data update method according to claim 2, wherein determining from the autocorrelation coefficient amplitude spectrum whether the historical indicator data is periodic comprises:
    calculating the maximum, mean, and variance of the autocorrelation coefficient amplitude spectrum;
    when the maximum is greater than the sum of the mean and the variance, determining that the historical indicator data is periodic data; and
    when the maximum is less than or equal to the sum of the mean and the variance, determining that the historical indicator data is aperiodic data.
  4. The data update method according to claim 2, wherein whether the historical indicator data is volatile is determined by the following formula:
    [Formula image PCTCN2022130723-appb-100001 not reproduced in the text]
    where μ is the mean of the historical indicator data, σ is the variance of the historical indicator data, and λ is a preset adjustment parameter.
  5. The data update method according to any one of claims 1-4, wherein the method further comprises:
    when the classification result is that the historical indicator data is periodic data, performing anomaly detection on the historical indicator data using a regression algorithm;
    when the classification result is that the historical indicator data is aperiodic data, performing anomaly detection on the historical indicator data using a clustering algorithm;
    when the classification result is that the historical indicator data is non-volatile data, performing anomaly detection on the historical indicator data using a box plot algorithm; and
    when the classification result is that the historical indicator data is trend data, performing anomaly detection on the historical indicator data using a time series decomposition algorithm.
  6. The data update method according to claim 1, wherein updating the preset indicator threshold of the service according to the distribution of the abnormal data comprises:
    determining the upper quartile, the lower quartile, and the interquartile range of the abnormal data according to the distribution of the abnormal data; and
    determining the current indicator threshold of the service from the upper quartile, the lower quartile, and the interquartile range, and replacing the preset indicator threshold with the current indicator threshold.
  7. A fault diagnosis method, comprising:
    obtaining fault data of a service;
    when the indicator threshold of the service has not been updated, analyzing the fault data with a preset fault diagnosis model to generate and output a fault diagnosis result, wherein the preset fault diagnosis model contains a preset indicator threshold; and
    when the indicator threshold of the service has been updated, obtaining the latest indicator threshold, updating the preset fault diagnosis model with the latest indicator threshold, and analyzing the fault data with the updated fault diagnosis model to generate and output a fault diagnosis result, wherein the latest indicator threshold is obtained by the data update method according to any one of claims 1-6.
  8. The fault diagnosis method according to claim 7, wherein the preset fault diagnosis model comprises a plurality of fault diagnosis templates, and the fault diagnosis templates are created from a rule engine and preset fault diagnosis conditions.
  9. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the data update method according to any one of claims 1 to 6, or the fault diagnosis method according to any one of claims 7 to 8.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the data update method according to any one of claims 1 to 6, or the fault diagnosis method according to any one of claims 7 to 8.
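As an illustration of claims 2, 3, 5, and 6, the following pure-Python sketch classifies a series and derives a new threshold. Several details are our assumptions rather than the application's: the "autocorrelation coefficient amplitude spectrum" is taken to be the DFT magnitude of the biased sample autocorrelation; the trend test uses the R² of a least-squares line with a hypothetical cut-off of 0.8; the precedence among the four detectors of claim 5 is ours; and claim 6's threshold is taken as quartile fences with `k = 1.5`. The volatility test of claim 4 is omitted because its formula is not reproduced in this text, so `volatile` is supplied by the caller.

```python
import cmath
import math
import statistics
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float]) -> List[float]:
    """Biased sample autocorrelation for lags 0..n-1, normalized so acf[0] == 1."""
    n = len(x)
    mean = statistics.fmean(x)
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    if c0 == 0:
        raise ValueError("a constant series has no autocorrelation")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / (n * c0) for k in range(n)]


def amplitude_spectrum(acf: Sequence[float]) -> List[float]:
    """Magnitude of the DFT of the autocorrelation, non-negative frequency bins only."""
    n = len(acf)
    return [abs(sum(acf[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)))
            for m in range(n // 2 + 1)]


def is_periodic(x: Sequence[float]) -> bool:
    """Claim 3: periodic if max(spectrum) > mean(spectrum) + variance(spectrum)."""
    try:
        spec = amplitude_spectrum(autocorrelation(x))
    except ValueError:
        return False  # a constant series is treated as aperiodic
    return max(spec) > statistics.fmean(spec) + statistics.pvariance(spec)


def has_trend(x: Sequence[float], r2_min: float = 0.8) -> bool:
    """Fit x ~ a + b*t by least squares; r2_min is a hypothetical cut-off."""
    n = len(x)
    t_mean, x_mean = (n - 1) / 2, statistics.fmean(x)
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_xx = sum((v - x_mean) ** 2 for v in x)
    if s_tt == 0 or s_xx == 0:
        return False
    s_tx = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    return s_tx * s_tx / (s_tt * s_xx) >= r2_min  # R^2 of the linear fit


def choose_detector(periodic: bool, volatile: bool, trending: bool) -> str:
    """Claim 5 mapping; the order in which the cases are checked is an assumption."""
    if trending:
        return "time series decomposition"
    if not volatile:
        return "box plot"
    return "regression" if periodic else "clustering"


def update_threshold(abnormal: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Claim 6 sketch: (lower, upper) fences from the quartiles of the abnormal data."""
    q1, _, q3 = statistics.quantiles(abnormal, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


daily = [math.sin(2 * math.pi * t / 24) for t in range(240)]  # 10 days of hourly samples
ramp = [float(t) for t in range(240)]
print(is_periodic(daily), has_trend(daily), has_trend(ramp))
print(choose_detector(is_periodic(daily), True, has_trend(daily)))
```

Read literally, claim 3 compares a maximum with a mean plus a variance, so the outcome depends on how the spectrum is scaled; normalizing the autocorrelation so that `acf[0] == 1`, as above, makes the test independent of the series' units.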
PCT/CN2022/130723 2022-03-21 2022-11-08 Data updating method, fault diagnosis method, electronic device, and storage medium WO2023179042A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210283753.2 2022-03-21
CN202210283753.2A CN116821141A (en) 2022-03-21 2022-03-21 Data updating method, fault diagnosis method, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023179042A1 true WO2023179042A1 (en) 2023-09-28

Family

ID=88099730

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/130723 WO2023179042A1 (en) 2022-03-21 2022-11-08 Data updating method, fault diagnosis method, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN116821141A (en)
WO (1) WO2023179042A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170094537A1 (en) * 2015-09-24 2017-03-30 Futurewei Technologies, Inc. System and Method for a Multi View Learning Approach to Anomaly Detection and Root Cause Analysis
CN110807024A (en) * 2019-10-12 2020-02-18 广州市申迪计算机系统有限公司 Dynamic threshold anomaly detection method and system, storage medium and intelligent device
CN112712113A (en) * 2020-12-29 2021-04-27 广州品唯软件有限公司 Alarm method and device based on indexes and computer system
CN113010389A (en) * 2019-12-20 2021-06-22 阿里巴巴集团控股有限公司 Training method, fault prediction method, related device and equipment
CN113887616A (en) * 2021-09-30 2022-01-04 海看网络科技(山东)股份有限公司 Real-time abnormity detection system and method for EPG (electronic program guide) connection number

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117591607A (en) * 2024-01-19 2024-02-23 杭州青橄榄网络技术有限公司 Data synchronization management method and system
CN117591607B (en) * 2024-01-19 2024-05-07 杭州青橄榄网络技术有限公司 Data synchronization management method and system

Also Published As

Publication number Publication date
CN116821141A (en) 2023-09-29

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22933093
Country of ref document: EP
Kind code of ref document: A1