WO2022048168A1

WO2022048168A1 - Training method and device for failure prediction neural network model

Info

Publication number: WO2022048168A1
Application number: PCT/CN2021/090028
Authority: WO
Inventors: 王洪涛
Original assignee: 上海上讯信息技术股份有限公司
Priority date: 2020-09-03
Filing date: 2021-04-26
Publication date: 2022-03-10
Also published as: CN112115024B; CN112115024A

Abstract

A training method and device for a failure prediction neural network model. The method comprises: first obtaining a historical index data set of monitored points, wherein the historical index data set is composed of monitoring index data, collected at different historical time points, of the monitored points (S11); next, on the basis of a preset period, processing the historical index data set to determine a training set and a test set (S12); and then, training a neural network on the basis of the training set till an output error output by the neural network satisfies a first preset threshold, testing the neural network on the basis of the test set, and if an accuracy rate satisfies a second preset threshold, obtaining a trained monitored point failure prediction neural network model (S13). According to the method, the trained neural network model is obtained and is used for performing failure prediction on a running state or a service state of the monitored computer, so that operation and maintenance personnel can intervene in advance, a failure and abnormity are effectively prevented or failures are eliminated in time, and MTBF can be effectively increased or MTTR can be effectively reduced.

Description

A training method and device for a fault prediction neural network model

Technical field

The present application relates to the technical field of computer data processing, and in particular, to a technology for training a fault prediction neural network model.

Background art

At present, in the daily operation and maintenance of computers of all kinds, especially the large numbers of data computing and storage servers, monitoring indicators are widely used. Some, such as CPU usage and memory usage, monitor the health of the servers themselves; others, such as the business volume per minute or the inbound and outbound data volume of a network card per unit time, monitor the status of the business running on the servers.

The prior art determines, offline or in real time, whether a monitoring indicator is abnormal by setting fixed and/or dynamic thresholds for it. However, these methods can only detect an abnormality that is occurring or has already occurred; they are after-the-fact monitoring methods and cannot predict in advance that an abnormality will occur.

SUMMARY OF THE INVENTION

The purpose of this application is to provide a training method and device for a fault prediction neural network model, to solve the technical problem in the prior art that it is impossible to predict in advance the abnormality of the monitored computer operating state.

According to an aspect of the present application, a training method for a fault prediction neural network model is provided, wherein the method includes:

Obtain the historical indicator data set of the monitored point, wherein the historical indicator data set is composed of the monitoring indicator data of the monitored point collected at different historical time points;

Based on a preset period, processing the historical indicator data set to determine a training set and a test set;

The neural network is trained based on the training set until the output error of the neural network meets a first preset threshold, and the neural network is tested based on the test set; if the accuracy rate meets a second preset threshold, the trained neural network model for fault prediction of the monitored point is obtained.

Optionally, the processing of the historical indicator data set based on a preset period to determine the training set and the test set includes:

Based on the preset period, determine the sampling number N;

Traverse the historical indicator data in the historical indicator data set, and construct historical indicator data sequences at different time points, wherein the historical indicator data sequences at different time points are composed of N historical indicator data before the time point;

Determine the historical indicator data at different time points as the true value annotation of the historical indicator data sequence corresponding to the time point;

The training set and the test set are determined based on the historical indicator data sequence and the true value label, wherein the samples in the training set and the test set include historical indicator data sequences and corresponding true value labels at different time points.

Optionally, before constructing the historical indicator data series at different time points, the method further includes:

The historical indicator data set is preprocessed to eliminate the influence of abnormal historical indicator data.

Optionally, wherein, the neural network is an LSTM neural network, and the structure of the LSTM neural network includes:

1 input layer;

2 LSTM hidden layers;

1 fully connected output layer.

Optionally, wherein the output error includes a mean square error.

Optionally, wherein, the method further includes:

Obtain the index data sequence at the current time point, wherein the index data sequence at the current time point is composed of N historical index data before the current time point;

Determine the iterative extrapolation prediction times M based on the preset prediction time length, the preset period and the sampling number N;

The index data sequence at the current time point is input into the trained neural network model for failure prediction of monitored points, and M iterative extrapolation predictions are performed to obtain M items of prediction index data for the monitored point.

Optionally, wherein, the method further includes:

The M items of prediction index data of the monitored point are compared with a third preset threshold in chronological order, and the time point corresponding to the first non-compliant item of prediction index data is determined as the failure time point.

Optionally, wherein, the method further includes:

Based on the failure time point and the corresponding prediction index data, alarm information is determined, and the alarm information is reported.

According to another aspect of the present application, a training device for a fault prediction neural network model is also provided, wherein the device includes:

a first device, configured to obtain a historical indicator data set of a monitored point, wherein the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points;

a second device, configured to process the historical indicator data set based on a preset period and sampling frequency to determine a training set and a test set;

a third device, configured to train a neural network based on the training set until the output error of the neural network meets a first preset threshold, and to test the neural network based on the test set; if the accuracy meets a second preset threshold, the trained neural network model for failure prediction of monitored points is obtained.

Optionally, wherein the device further includes:

The fourth device is used for preprocessing the historical indicator data set to eliminate the influence of abnormal historical indicator data.

Optionally, wherein the device further includes:

a fifth device, configured to obtain the index data sequence at the current time point, wherein the index data sequence at the current time point is composed of N historical index data before the current time point;

a sixth device, configured to determine the number of iteratively extrapolated predictions M based on a preset prediction time length, a preset period and a sampling number N;

a seventh device, configured to input the index data sequence of the current time point into the trained neural network model for failure prediction of monitored points, and to perform M iterative extrapolation predictions to obtain M items of prediction index data for the monitored point.

Optionally, wherein the device further includes:

an eighth device, configured to compare the M items of prediction index data of the monitored point with a third preset threshold in chronological order, and to determine the time point corresponding to the first non-compliant item of prediction index data as the failure time point.

Optionally, wherein the device further includes:

A ninth device is configured to determine alarm information based on the failure time point and corresponding prediction index data, and report the alarm information.

Compared with the prior art, the present application provides a method and device for training a fault prediction neural network model. The method first obtains the historical indicator data set of the monitored point, then processes the historical indicator data set based on a preset period to determine a training set and a test set, and then trains a neural network based on the training set until the output error of the neural network meets a first preset threshold and tests the neural network based on the test set; if the accuracy rate meets a second preset threshold, the trained neural network model for fault prediction of the monitored point is obtained. The trained model can be used to predict failures in the operating state or business state of the monitored computer. This allows operation and maintenance personnel to intervene in advance and prevent abnormal failures, effectively increasing MTBF (Mean Time Between Failures), or, where a failure is unavoidable, to prepare for recovery in advance and eliminate the failure promptly when it occurs, effectively reducing MTTR (Mean Time To Restoration).

Description of drawings

Other features, objects and advantages of the present invention will become more apparent by reading the detailed description of non-limiting embodiments made with reference to the following drawings:

FIG. 1 shows a flowchart of a training method for a fault prediction neural network model according to an aspect of the present application;

FIG. 2 shows a schematic diagram of a training device for a fault prediction neural network model according to another aspect of the present application;

The same or similar reference numbers in the drawings represent the same or similar parts.

Detailed description

The present invention will be described in further detail below with reference to the accompanying drawings.

In a typical configuration of the present application, each module of the system and the trusted party include one or more processors (CPUs), input/output interfaces, network interfaces and memory.

Memory may include forms of non-persistent memory, random access memory (RAM) and/or non-volatile memory in computer readable media, such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include both persistent and non-persistent, removable and non-removable media, and storage of information may be implemented by any method or technology. Information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

In order to further illustrate the technical means adopted in the present application and the obtained effects, the following describes the technical solutions of the present application in a clear and complete manner with reference to the accompanying drawings and preferred embodiments.

FIG. 1 shows a flowchart of a training method for a fault prediction neural network model according to an aspect of the present application, wherein the method of an embodiment includes:

S11 obtains the historical indicator data set of the monitored point, wherein the historical indicator data set is composed of the monitoring indicator data of the monitored point collected at different historical time points;

S12, based on a preset period, process the historical indicator data set to determine a training set and a test set;

S13 trains the neural network based on the training set until the output error of the neural network meets the first preset threshold, and tests the neural network based on the test set; if the accuracy meets the second preset threshold, the trained neural network model for fault prediction of the monitored point is obtained.

In this application, the method is performed by a device 1, which is a computer device and/or a cloud. The computer device includes, but is not limited to, a personal computer, a laptop computer, an industrial computer, a network host, a single network server, or a set of multiple network servers. The cloud is composed of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a virtual supercomputer composed of a group of loosely coupled computers.

Here, the computer device and/or cloud are only examples; other existing or future devices and/or resource sharing platforms, if applicable to this application, should also be included in the protection scope of this application and are incorporated herein by reference.

In this embodiment, in step S11, the device 1 obtains the historical indicator data set of the monitored point, wherein the monitored point is set on the monitored device, or is set based on the status of a service running on the monitored device.

For example, device 1 directly obtains the indicator data of the monitored points of the monitored device, such as the CPU and memory of the monitored computer, or the indicator data of the business running on the monitored device, such as the network traffic of the monitored computer, by polling over SNMP (Simple Network Management Protocol) or another protocol, by receiving pushes from a monitoring agent, or by similar means.

For indicator data that cannot be obtained directly, device 1 can first obtain the operation logs of the operating system or service on the monitored device and then process them, for example by classifying the log entries by keyword or with a clustering algorithm, to obtain counts of similar log entries at different time points or in different time windows.

For example, the access log of the monitored computer 192.168.212.124 accessing port 30443 on the device 192.168.211.22 is as follows:

192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:16:25:03 +0800] "GET / HTTP/1.1" 200 10799 "https://192.168.211.22:30443/verifypasslog/detail/id/6" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"

192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:16:25:03 +0800] "GET /themes/blue/css/login.css HTTP/1.1" 200 5530 "https://192.168.211.22:30443/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"

192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:17:25:03 +0800] "GET /js/jquery-1.7.2.min.js HTTP/1.1" 304 0 "https://192.168.211.22:30443/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"

192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:18:25:03 +0800] "GET /js/jquery.placeholder.js HTTP/1.1" 304 0 "https://192.168.211.22:30443/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"

192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:19:25:03 +0800] "GET /default/getcodes HTTP/1.1" 200 5892 "https://192.168.211.22:30443/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"

192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:19:25:03 +0800] "GET /themes/blue/images/oma_login_bg.jpg HTTP/1.1" 304 0 "https://192.168.211.22:30443/themes/blue/css/login.css" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"

If 1 hour is used as the collection time window and the number of visits within 1 hour is counted, a data set of visits at different time points can be counted as shown in Table 1 below.

Table 1

Acquisition time window	Number of visits
16:00	2
17:00	1
18:00	1
19:00	2

Similarly, the data set about the number of visits to the /default/getcodes page at different time points can also be counted as shown in Table 2.

Table 2

Acquisition time window	Number of visits
16:00	0
17:00	0
18:00	0
19:00	1
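The windowed counting behind Tables 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `visits_per_window`, the regular expression, and the shortened log entries (referrer and user-agent fields omitted) are hypothetical and not part of the application.

```python
import re
from collections import Counter
from datetime import datetime

# Captures the bracketed timestamp and the request path of one access-log entry.
ENTRY = re.compile(r'\[(?P<ts>[^\]\s]+)[^\]]*\]\s*"[A-Z]+\s+(?P<path>\S+)')


def visits_per_window(lines, path=None):
    """Count entries per one-hour window, optionally for a single request path."""
    counts = Counter()
    for line in lines:
        match = ENTRY.search(line)
        if match is None:
            continue  # skip lines that are not access-log entries
        if path is not None and match.group("path") != path:
            continue
        stamp = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S")
        counts[stamp.strftime("%H:00")] += 1
    return dict(counts)


# Shortened copies of the six log entries (assumption: trailing fields dropped).
PREFIX = "192.168.212.124 192.168.211.22:30443 - "
SAMPLE_LOG = [
    PREFIX + '[16/Aug/2017:16:25:03 +0800] "GET / HTTP/1.1" 200 10799',
    PREFIX + '[16/Aug/2017:16:25:03 +0800] "GET /themes/blue/css/login.css HTTP/1.1" 200 5530',
    PREFIX + '[16/Aug/2017:17:25:03 +0800] "GET /js/jquery-1.7.2.min.js HTTP/1.1" 304 0',
    PREFIX + '[16/Aug/2017:18:25:03 +0800] "GET /js/jquery.placeholder.js HTTP/1.1" 304 0',
    PREFIX + '[16/Aug/2017:19:25:03 +0800] "GET /default/getcodes HTTP/1.1" 200 5892',
    PREFIX + '[16/Aug/2017:19:25:03 +0800] "GET /themes/blue/images/oma_login_bg.jpg HTTP/1.1" 304 0',
]

all_visits = visits_per_window(SAMPLE_LOG)
getcodes_visits = visits_per_window(SAMPLE_LOG, path="/default/getcodes")
```

Windows without any matching entry are simply absent from the returned dictionary; a caller that needs explicit zeros can look them up with `counts.get(window, 0)`.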

In step S11, the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points, wherein the interval between adjacent historical time points is the same. The historical indicator data set is required to contain a sufficient amount of monitoring indicator data for training.

Continuing in this embodiment, in the step S12, the device 1 processes the obtained historical indicator data set based on a preset period to determine a training set and a test set.

Optionally, wherein the processing of the historical indicator data set based on a preset period to determine the training set and the test set includes:

Based on the preset period, determine the sampling number N;

For example, a preset period T includes monitoring indicator data collected at N time points. If the time interval T_b between adjacent time points is constant, the number N of collection time points within a preset period T is T / T_b. If T_b is 1, then N equals T in value; that is, there are T time points in a preset period T, corresponding to T historical indicator data. The T+1 historical indicator data s_{t-T-1}, s_{t-T}, s_{t-T+1}, ..., s_{t-2}, s_{t-1} form a historical indicator data sequence S_{t-1} = {s_{t-T-1}, s_{t-T}, s_{t-T+1}, ..., s_{t-2}, s_{t-1}}, and the historical indicator data s_t collected at time point t is determined as the true value label of the historical indicator data sequence S_{t-1}.

Based on the historical indicator data sequence S_{t-1} and its true value label s_t, a sample corresponding to the time point t can be determined. By traversing the historical indicator data in the historical indicator data set P, samples corresponding to different historical time points can be constructed, wherein each sample includes the historical indicator data sequence corresponding to a historical time point and, as the true value label of that sequence, the historical indicator data collected at that historical time point. All samples together form a sample set, which is divided into a training set and a test set. The division ratio can be 4:1 or another ratio, and can be adjusted according to the actual training situation during the subsequent training of the neural network.
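The sample construction and the 4:1 division described above can be sketched as follows. The function names are hypothetical, and the chronological split (earliest samples for training, latest for testing) is an assumption: the application states only the ratio, not how samples are assigned to each set.

```python
def build_samples(series, period_len):
    """Pair each run of period_len + 1 consecutive values with the value after it.

    The run plays the role of the historical indicator data sequence and the
    following value is its true value label.
    """
    window = period_len + 1
    samples = []
    for t in range(window, len(series)):
        samples.append((series[t - window:t], series[t]))
    return samples


def split_samples(samples, train_parts=4, test_parts=1):
    """Divide samples in the ratio train_parts:test_parts, keeping time order.

    Assumption: the split is chronological; a random split would also satisfy
    the stated ratio.
    """
    cut = len(samples) * train_parts // (train_parts + test_parts)
    return samples[:cut], samples[cut:]


# Placeholder data: 240 hourly readings (10 days), period T = 24 with T_b = 1.
readings = list(range(240))
samples = build_samples(readings, period_len=24)
train_set, test_set = split_samples(samples)
```

With 240 readings and sequences of 25 elements, 215 samples result, which the 4:1 ratio divides into 172 training samples and 43 test samples.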

Individual data in the historical indicator data set obtained by device 1 may be abnormal (deviating from the normal high or normal low values) because of accidental abnormal conditions, or several consecutive data may increase or decrease one after another. Therefore, before the historical indicator data set is used to determine the training set and the test set, it can be preprocessed to eliminate the influence of abnormal historical indicator data.

For example, the moving average method is used to process the data in the historical indicator data set P. For the historical indicator data s_t collected at historical time point t, the average of the n historical indicator data before s_t is taken as the new historical indicator data s'_t for time point t; alternatively, the average of n historical indicator data before and after s_t can be taken as s'_t. Here, n should be much smaller than the number of data in the historical indicator data set, so that the data still truly reflect the actual situation of the monitored point; how the n historical indicator data before and after s_t are selected is not specifically limited. Traversing all the data in the historical indicator data set in this way yields a new historical indicator data set P' composed of the new historical indicator data s'_t for the different historical time points t. The new set P' contains n fewer data than P if the n data selected for the moving average do not include the data s_t at time t, or n-1 fewer if they do. The new historical indicator data set P' is then processed based on the preset period to determine the training set and the test set.
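A trailing moving average of this kind might look like the sketch below. The function name and the `include_current` flag are illustrative assumptions; the application leaves the exact choice of the n neighbouring data open, and a centred window would be an equally valid variant.

```python
def moving_average(series, n, include_current=False):
    """Smooth a series with a trailing window of n values.

    include_current=False: average the n values before s_t  (output is n shorter)
    include_current=True:  average s_t and the n - 1 values before it
                           (output is n - 1 shorter)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    smoothed = []
    if include_current:
        for t in range(n - 1, len(series)):
            smoothed.append(sum(series[t - n + 1:t + 1]) / n)
    else:
        for t in range(n, len(series)):
            smoothed.append(sum(series[t - n:t]) / n)
    return smoothed


# Placeholder CPU readings with one accidental spike (95).
raw = [10, 12, 11, 95, 13, 12]
without_current = moving_average(raw, n=3)
with_current = moving_average(raw, n=3, include_current=True)
```

The two output lengths, `len(raw) - n` and `len(raw) - (n - 1)`, correspond to the two cases distinguished in the text.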

Continuing in this embodiment, in step S13, the device 1 inputs the historical indicator data sequences of the training set samples into the neural network for training, and compares the output of the neural network with the true value labels of the samples to obtain the output error, until the output error meets the first preset threshold. It then uses the test set samples to test the resulting neural network. If the accuracy rate meets the second preset threshold, the trained neural network is determined as the trained neural network model for failure prediction of the monitored point.

For example, the device 1 obtains the CPU occupancy historical data set P of the monitored computer and preprocesses it to obtain a new CPU occupancy historical data set P', which includes the CPU occupancy data s'_t corresponding to different historical time points t. The new set P' is processed to obtain the CPU occupancy data sequences S'_{t-1} at different historical time points, and s'_t is determined as the true value label of S'_{t-1}, so that S'_{t-1} and s'_t form a sample. Traversing P' yields samples at different historical time points, which constitute the CPU occupancy training set and test set; the division ratio can be 4:1 or another ratio, and can be adjusted according to the actual training situation during the subsequent training of the neural network. The CPU occupancy data sequence S'_{t-1} corresponding to time point t in the training set is input into the neural network, and the network output is compared with the true value label s'_t of S'_{t-1} to obtain the output error. The output errors of the training samples give the loss function value of the neural network. If the loss function value meets the preset threshold, or the preset number of training iterations is completed, the training of the neural network is finished. The trained neural network is then tested on all samples in the test set; if the accuracy rate meets the preset threshold, the trained neural network is determined to be the trained neural network model for failure prediction of the CPU occupancy of the monitored computer.

Optionally, the neural network is an LSTM (Long Short-Term Memory) neural network, and the structure of the LSTM neural network includes:

1 input layer;

2 LSTM hidden layers;

1 fully connected output layer.

The LSTM network structure in one embodiment is shown in Table 3.

Table 3

The technical solution of the present application does not limit the neural network structure: an LSTM network can be used, as can other neural networks commonly used in the prior art, such as an ANN (Artificial Neural Network) or a CNN (Convolutional Neural Network), and no special adjustment to the neural network structure is required.

Optionally, wherein the output error includes a mean square error.

For example, in the above embodiment about the monitored CPU occupancy, the CPU occupancy data sequence S'_{t-1} corresponding to time point t in the CPU occupancy training set is input into the neural network, and the network output is compared with the true value label s'_t of S'_{t-1} to obtain the output mean square error of the training sample at time t. Based on the output mean square errors of the different training samples, the mean square error loss function value of the neural network is obtained. If the mean square error loss function value meets the preset threshold, or the preset number of training iterations is completed, the training of the neural network is finished. The trained neural network is then tested on all samples in the test set; if the accuracy rate meets the preset threshold, the trained neural network is determined to be the trained neural network model for fault prediction of the CPU occupancy of the monitored computer.
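The stopping rule and the test-set check can be illustrated with plain Python, independent of any particular neural network library. All names below are hypothetical. In particular, the application does not define how the accuracy rate is computed for a numeric prediction; counting a prediction as correct when it lies within a fixed tolerance of the label is an assumption made only for this sketch.

```python
def mean_squared_error(predictions, labels):
    """Mean of the squared differences between predictions and true value labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


def training_finished(loss, loss_threshold, epoch, max_epochs):
    """Stop when the loss meets the threshold or the preset number of rounds is done."""
    return loss <= loss_threshold or epoch >= max_epochs


def accuracy_within_tolerance(predictions, labels, tolerance):
    """Share of predictions within `tolerance` of the label.

    Assumption: this definition of accuracy is not taken from the application.
    """
    hits = sum(1 for p, y in zip(predictions, labels) if abs(p - y) <= tolerance)
    return hits / len(labels)


loss = mean_squared_error([0.5, 0.75], [0.5, 0.25])
accuracy = accuracy_within_tolerance([50, 60, 70], [52, 60, 90], tolerance=5)
```

A training loop would evaluate `training_finished` after each round and, once it returns true, compare the accuracy on the test set with the second preset threshold.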

Optionally, wherein, the training method for a fault prediction neural network model further includes:

Based on the preset prediction time length, preset period and sampling number N, determine the number of iterative extrapolation predictions M;

Inputting the index data sequence at the current time point into the trained neural network model for failure prediction of monitored points, and performing M iterative extrapolation predictions to obtain M items of prediction index data for the monitored point.

The device 1 obtains the index data sequence S_{tp} used to predict the next time point t_{p+1} after the current time point t_p. S_{tp} contains T+1 index data: the index data s_{tp} of the monitored point collected at the current time point, and the T historical index data s_{tp-T}, s_{tp-T+1}, ..., s_{tp-2}, s_{tp-1} of the monitored point collected at the different historical time points within a preset period T before s_{tp}. The number of index data contained in S_{tp} is the same as the number of historical index data contained in each historical index data sequence in the training set and the test set.

Based on the preset prediction time length T _f , the preset period T and the number of samples N, the iterative extrapolation prediction times M can be determined, wherein the calculation formula of M is as follows:

M = (T_f / T) × N

S_{tp} = {s_{tp-T}, s_{tp-T+1}, ..., s_{tp-1}, s_{tp}} is input into the trained neural network model for failure prediction of monitored points to obtain the prediction index data s_{tp+1} of the monitored point at the next time point t_{p+1}. The index data sequence S_{tp+1} = {s_{tp-T+1}, ..., s_{tp-1}, s_{tp}, s_{tp+1}} used to predict time point t_{p+2} is then constructed and input into the trained model to obtain the prediction index data s_{tp+2} of the monitored point at time point t_{p+2}. Performing M iterative extrapolation predictions in this way yields the prediction index data s_{tp+1}, s_{tp+2}, ..., s_{tp+M} of the monitored point at the time points t_{p+1}, t_{p+2}, ..., t_{p+M}.
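The iterative extrapolation can be sketched as a loop that slides the input window forward by one element per prediction. Here `model` is a hypothetical callable standing in for the trained network (it takes a sequence and returns the next value), and the function names are illustrative only.

```python
def extrapolation_steps(forecast_len, period_len, samples_per_period):
    """M = (T_f / T) * N, computed with integer arithmetic."""
    return forecast_len * samples_per_period // period_len


def iterative_forecast(model, sequence, steps):
    """Predict `steps` future values, feeding each prediction back as input."""
    window = list(sequence)
    predictions = []
    for _ in range(steps):
        next_value = model(window)
        predictions.append(next_value)
        # Drop the oldest element and append the new prediction, so the
        # window keeps the length the model was trained on.
        window = window[1:] + [next_value]
    return predictions


def stub_model(window):
    """Placeholder for the trained network: continues the series by +1."""
    return window[-1] + 1


m = extrapolation_steps(forecast_len=24, period_len=24, samples_per_period=24)
forecast = iterative_forecast(stub_model, [1, 2, 3], steps=4)
```

Because every prediction after the first depends on earlier predictions, errors can accumulate as M grows, which is one reason to keep the prediction time length moderate.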

For example, in the above embodiment about the monitored CPU occupancy, suppose the historical indicator data are collected at an interval of 1 hour at the collection time points t_{1-1}, t_{1-2}, ..., t_{10-23}, t_{10-24}, 240 collection time points in total. The corresponding CPU occupancy data D_{1-1}, D_{1-2}, ..., D_{10-23}, D_{10-24}, 240 values collected over 10 days, constitute the CPU occupancy historical data set. If the selected preset period T is 24 hours, then N is 24, the number of elements in each CPU occupancy historical data sequence determined from the data set is 25, and the resulting training samples are shown in Table 4 below.

Table 4

Sequence	CPU occupancy data	Ground-truth label
S_{2-1}	D_{1-1}, D_{1-2}, ..., D_{1-23}, D_{1-24}, D_{2-1}	D_{2-2}
S_{2-2}	D_{1-2}, D_{1-3}, ..., D_{1-24}, D_{2-1}, D_{2-2}	D_{2-3}
S_{2-3}	D_{1-3}, D_{1-4}, ..., D_{2-1}, D_{2-2}, D_{2-3}	D_{2-4}
...	...	...
S_{10-21}	D_{9-21}, D_{9-22}, ..., D_{10-19}, D_{10-20}, D_{10-21}	D_{10-22}
S_{10-22}	D_{9-22}, D_{9-23}, ..., D_{10-20}, D_{10-21}, D_{10-22}	D_{10-23}
S_{10-23}	D_{9-23}, D_{9-24}, ..., D_{10-21}, D_{10-22}, D_{10-23}	D_{10-24}

The above samples are divided into a CPU occupancy training set and test set in a ratio of 4:1, and a trained CPU occupancy prediction neural network model is obtained after training the LSTM neural network. The sequence S_{10-24} = {D_{9-24}, D_{10-1}, ..., D_{10-22}, D_{10-23}, D_{10-24}} is then input into the trained CPU occupancy prediction neural network model to obtain the predicted CPU occupancy D_{11-1} one hour later. The sequence S_{11-1} = {D_{10-1}, D_{10-2}, ..., D_{10-24}, D_{11-1}} is then formed and input into the trained model to obtain the predicted CPU occupancy D_{11-2} two hours later. If the preset prediction time length is 24 hours, 24 iterative extrapolation predictions are performed, yielding the predicted CPU occupancy values D_{11-1}, D_{11-2}, ..., D_{11-23}, D_{11-24}.

For example, in the above embodiment about the monitored CPU occupancy, the 24 predicted CPU occupancy values D_{11-1}, D_{11-2}, ..., D_{11-24} obtained through 24 iterative extrapolation predictions are compared in chronological order with a preset threshold (for example, 90%), and the time point corresponding to the first predicted CPU occupancy exceeding 90% is determined as the failure time point.

Alternatively, the predicted value obtained by each prediction can be compared with the preset threshold immediately. If the predicted value is compliant, the prediction for the next time point is made by iterative extrapolation; if it is not, the iterative extrapolation ends.
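Both variants, scanning a finished list of predictions and stopping the extrapolation at the first non-compliant value, can be sketched as below. The names are hypothetical, `model` again stands in for the trained network, and treating "greater than the threshold" as non-compliant follows the CPU occupancy example; other indicators may need a lower bound instead.

```python
def first_failure(predictions, threshold):
    """Return (step, value) of the first prediction above the threshold, else None.

    Steps are counted from 1, so step k is k sampling intervals after the
    current time point.
    """
    for step, value in enumerate(predictions, start=1):
        if value > threshold:
            return step, value
    return None


def forecast_until_failure(model, sequence, max_steps, threshold):
    """Extrapolate step by step and stop at the first non-compliant prediction."""
    window = list(sequence)
    for step in range(1, max_steps + 1):
        value = model(window)
        if value > threshold:
            return step, value
        window = window[1:] + [value]
    return None


def rising_model(window):
    """Placeholder for the trained network: occupancy rises 10 points per hour."""
    return window[-1] + 10


scan_result = first_failure([70, 85, 93, 95], threshold=90)
early_stop_result = forecast_until_failure(rising_model, [60, 70], max_steps=24, threshold=90)
```

The early-stop variant avoids computing predictions beyond the failure time point, at the cost of not having the full forecast available for other uses.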

If the failure time point and the corresponding prediction index data are determined, they can be used as the alarm information, or as part of it, and reported to the stakeholders of the monitored point so that the stakeholders can take corresponding measures in time. For example, operation and maintenance personnel can intervene in advance to eliminate hidden faults, which effectively prevents abnormal faults and increases MTBF; or they can deal with a fault immediately when it cannot be avoided, which shortens fault handling time and reduces MTTR.
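
One possible shape for such alarm information is sketched below. The `Alarm` fields, the message wording, the point name `server-01`, and the conversion from extrapolation step to wall-clock time are all assumptions; the application only states that the failure time point and the corresponding prediction index data form the alarm information or part of it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm record for one predicted failure."""
    monitored_point: str
    indicator: str
    failure_time: datetime
    predicted_value: float
    threshold: float

    def message(self) -> str:
        return (f"{self.monitored_point}: {self.indicator} predicted to reach "
                f"{self.predicted_value:.1f} (threshold {self.threshold:.1f}) "
                f"at {self.failure_time:%Y-%m-%d %H:%M}")


def build_alarm(point: str, indicator: str, now: datetime, step: int,
                interval: timedelta, predicted_value: float,
                threshold: float) -> Alarm:
    """Convert an extrapolation step (1-based) into a failure time and wrap it."""
    return Alarm(point, indicator, now + step * interval,
                 predicted_value, threshold)


alarm = build_alarm("server-01", "cpu_usage", datetime(2024, 1, 1, 12, 0),
                    step=3, interval=timedelta(hours=1),
                    predicted_value=91.2, threshold=90.0)
print(alarm.message())
```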

FIG. 2 shows a schematic diagram of a training device for a fault prediction neural network model according to another aspect of the present application, wherein the device includes:

a first device 21, configured to obtain a historical indicator data set of a monitored point, wherein the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points;

a second device 22, configured to process the historical indicator data set based on a preset period and sampling frequency to determine a training set and a test set;

a third device 23, configured to train a neural network based on the training set until the output error of the neural network meets a first preset threshold, and to test the neural network based on the test set; if the accuracy rate meets a second preset threshold, a trained neural network model for fault prediction of the monitored point is obtained.

The first device 21 of the device 1 obtains the historical indicator data set of the monitored point, wherein the historical indicator data set is composed of the monitoring indicator data of the monitored point collected at different historical time points. The second device 22 processes the historical indicator data set obtained by the first device 21, based on the preset period and sampling frequency, to determine a training set and a test set. The third device 23 trains the neural network based on the training set determined by the second device 22 until the output error of the neural network meets the first preset threshold, and tests the neural network based on the test set; if the accuracy rate meets the second preset threshold, a trained neural network model for fault prediction of the monitored point is obtained.
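
The two-threshold control flow of the third device can be sketched as follows. To keep the example self-contained, the LSTM is replaced by a deliberately trivial one-weight model (`LastValueModel`), which is not the network used in the application. The accuracy definition (share of test predictions within a tolerance of the true value), the learning rate, the epoch limit, and the normalized sample data are likewise assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[List[float], float]  # (input sequence, true-value label)


class LastValueModel:
    """Stand-in for the LSTM: predicts weight * (last value in the window)."""

    def __init__(self) -> None:
        self.weight = 0.0

    def predict(self, window: Sequence[float]) -> float:
        return self.weight * window[-1]

    def mse(self, samples: Sequence[Sample]) -> float:
        return sum((self.predict(x) - y) ** 2 for x, y in samples) / len(samples)

    def train_epoch(self, samples: Sequence[Sample], lr: float) -> float:
        """One full-batch gradient-descent step on the MSE; returns the new MSE."""
        grad = sum(2 * (self.predict(x) - y) * x[-1]
                   for x, y in samples) / len(samples)
        self.weight -= lr * grad
        return self.mse(samples)


def accuracy(model: LastValueModel, samples: Sequence[Sample],
             tolerance: float) -> float:
    """Assumed metric: share of predictions within `tolerance` of the label."""
    hits = sum(1 for x, y in samples if abs(model.predict(x) - y) <= tolerance)
    return hits / len(samples)


def train_and_validate(train_set: Sequence[Sample], test_set: Sequence[Sample],
                       mse_threshold: float, accuracy_threshold: float,
                       tolerance: float = 0.01, lr: float = 0.5,
                       max_epochs: int = 1000) -> Optional[LastValueModel]:
    """Train until the MSE meets the first threshold, then check test accuracy."""
    model = LastValueModel()
    for _ in range(max_epochs):
        if model.train_epoch(train_set, lr) <= mse_threshold:
            break
    else:
        return None  # the first preset threshold was never reached
    if accuracy(model, test_set, tolerance) >= accuracy_threshold:
        return model
    return None  # the second preset threshold was not met


# Hypothetical normalized readings in which each value is 0.9 times the previous one.
history = [0.9 ** t for t in range(13)]
samples = [(history[i:i + 3], history[i + 3]) for i in range(10)]
trained = train_and_validate(samples[:8], samples[8:],
                             mse_threshold=1e-6, accuracy_threshold=0.9)
```

Returning `None` when either threshold is missed makes it explicit that a model is only released for prediction after passing both checks.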

Optionally, the training device for a fault prediction neural network model further includes:

a fourth device 24 (not shown), configured to preprocess the historical indicator data set to eliminate the influence of abnormal historical indicator data.

The fourth device 24 of the device 1 preprocesses the historical indicator data set acquired by the first device 21 to eliminate the influence of abnormal historical indicator data, and the second device 22 processes the historical indicator data set preprocessed by the fourth device 24, based on the preset period and sampling frequency, to determine the training set and the test set.
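
This passage does not say how abnormal data is identified or handled. Purely as an illustration, the sketch below assumes that a reading is abnormal when it is missing or falls outside a valid range (0 to 100 for a percentage such as CPU usage) and replaces it with the last valid reading; the function name and the sample data are hypothetical.

```python
from typing import List, Optional, Sequence


def clean_history(raw: Sequence[Optional[float]], low: float = 0.0,
                  high: float = 100.0) -> List[float]:
    """Replace missing or out-of-range readings with the last valid reading."""
    cleaned: List[float] = []
    last_valid: Optional[float] = None
    for value in raw:
        if value is not None and low <= value <= high:
            last_valid = value
        if last_valid is not None:
            cleaned.append(last_valid)
        # Leading invalid readings are dropped: nothing exists to carry forward.
    return cleaned


# Hypothetical raw CPU usage readings with a gap, a spike and a negative value.
raw_usage = [None, 35.0, 37.5, 250.0, None, 41.0, -3.0]
print(clean_history(raw_usage))  # [35.0, 37.5, 37.5, 37.5, 41.0, 41.0]
```

Other strategies, such as interpolation or statistical outlier detection, would fit the same position in the pipeline.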

a fifth device 25 (not shown), configured to obtain the index data sequence at the current time point, wherein the index data sequence at the current time point is composed of N historical index data before the current time point;

a sixth device 26 (not shown), configured to determine the number of iteratively extrapolated predictions M based on a preset prediction time length, a preset period and a sampling number N;

a seventh device 27 (not shown), configured to input the index data sequence at the current time point into the trained neural network model for fault prediction of monitored points and perform M iterations of extrapolation prediction to obtain prediction index data of M monitored points.

The fifth device 25 of the device 1 obtains the index data sequence at the current time point, wherein the index data sequence at the current time point is composed of N historical index data before the current time point. The sixth device 26 determines the number of iterative extrapolation predictions M based on the preset prediction time length, the preset period and the sampling number N. The seventh device 27 inputs the index data sequence at the current time point obtained by the fifth device 25 into the trained neural network model for fault prediction of monitored points obtained by the third device 23, and performs M iterations of extrapolation prediction to obtain the prediction index data of the M monitored points.
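
The exact rule for deriving M is not given in this passage. One plausible reading, used here only as an assumption, is that the sampling interval equals the preset period divided by N, and that M is the prediction time length divided by that interval, rounded up. The function name is hypothetical.

```python
from datetime import timedelta


def extrapolation_count(prediction_length: timedelta, period: timedelta,
                        n: int) -> int:
    """Assumed rule: M = ceil(prediction_length / (period / n))."""
    if n <= 0:
        raise ValueError("n must be positive")
    interval = period / n  # assumed spacing between two consecutive samples
    if interval <= timedelta(0):
        raise ValueError("period must be positive")
    # Ceiling division on timedeltas without going through floats.
    return -(-prediction_length // interval)


m = extrapolation_count(prediction_length=timedelta(hours=24),
                        period=timedelta(hours=24), n=24)
print(m)  # 24
```

Under this reading, hourly sampling with a 24-hour prediction time length gives M = 24, which agrees with the 24 iterations in the CPU usage example.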

an eighth device 28 (not shown), configured to compare the prediction index data of the M monitored points with a third preset threshold in chronological order, and to determine the time point corresponding to the first non-compliant prediction index data of the monitored point as the failure time point.

The eighth device 28 of the device 1 compares the prediction index data of the M monitored points obtained by the seventh device 27 with the third preset threshold in chronological order, and determines the time point corresponding to the first non-compliant prediction index data of the monitored point as the failure time point.

a ninth device 29 (not shown), configured to determine alarm information based on the failure time point and the corresponding prediction index data, and to report the alarm information.

The ninth device 29 of the device 1 obtains the failure time point and the corresponding prediction index data determined by the eighth device 28, uses them as the alarm information or as part of it, and reports the alarm information to the stakeholders of the monitored point so that the stakeholders can take corresponding measures in time. For example, operation and maintenance personnel can intervene in advance to eliminate hidden faults, which effectively prevents abnormal faults and increases MTBF; or they can deal with a fault immediately when it cannot be avoided, which shortens fault handling time and reduces MTTR.

According to yet another aspect of the present application, there is also provided a computer-readable medium storing computer-readable instructions executable by a processor to implement the aforementioned method.

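According to yet another aspect of the present application, there is also provided a device, wherein the device comprises:

one or more processors; and

a memory storing computer-readable instructions which, when executed, cause the processor to perform the operations of the aforementioned method.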

For example, the computer-readable instructions, when executed, cause the one or more processors to: obtain a historical indicator data set of the monitored point; preprocess the historical indicator data set to eliminate the influence of abnormal historical indicator data; process the historical indicator data set based on a preset period to determine a training set and a test set; train a neural network based on the training set until the output error of the neural network meets the first preset threshold; test the neural network based on the test set and, if the accuracy rate meets the second preset threshold, obtain a trained neural network model for fault prediction of the monitored point; obtain the index data sequence at the current time point; determine the number of iterative extrapolation predictions M based on the preset prediction time length, the preset period and the sampling number; input the index data sequence at the current time point into the trained neural network model for fault prediction of monitored points and perform M iterations of extrapolation prediction to obtain prediction index data of the M monitored points; compare the prediction index data of the M monitored points with a third preset threshold in chronological order and determine the time point corresponding to the first non-compliant prediction index data of the monitored point as the failure time point; and determine alarm information based on the failure time point and the corresponding prediction index data, and report the alarm information.

It will be apparent to those skilled in the art that the present invention is not limited to the details of the above-described exemplary embodiments, and that the present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments are to be regarded in all respects as illustrative and not restrictive, and the scope of the invention is defined by the appended claims rather than by the foregoing description; all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be included in the present invention. Any reference signs in the claims shall not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Several units or means recited in the device claims can also be realized by one unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.

Claims

A training method for a fault prediction neural network model, characterized in that the method comprises:

obtaining a historical indicator data set of a monitored point, wherein the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points;

processing the historical indicator data set based on a preset period to determine a training set and a test set;

training a neural network based on the training set until the output error of the neural network meets a first preset threshold, and testing the neural network based on the test set; if the accuracy rate meets a second preset threshold, obtaining a trained neural network model for fault prediction of the monitored point.
The method according to claim 1, wherein the processing of the historical indicator data set based on a preset period to determine a training set and a test set includes:

determining the sampling number N based on the preset period;

traversing the historical indicator data in the historical indicator data set and constructing historical indicator data sequences at different time points, wherein the historical indicator data sequence at each time point is composed of the N historical indicator data before that time point;

determining the historical indicator data at each time point as the true-value label of the historical indicator data sequence corresponding to that time point;

determining the training set and the test set based on the historical indicator data sequences and the true-value labels, wherein the samples in the training set and the test set include the historical indicator data sequences at different time points and the corresponding true-value labels.
The method according to claim 2, characterized in that, before constructing the historical indicator data sequences at different time points, the method further comprises:

preprocessing the historical indicator data set to eliminate the influence of abnormal historical indicator data.
The method according to any one of claims 1 to 3, wherein the neural network is an LSTM neural network, and the structure of the LSTM neural network comprises:

1 input layer;

2 LSTM hidden layers;

1 fully connected output layer.
The method of any one of claims 1 to 4, wherein the output error comprises a mean squared error.
The method according to claim 1, wherein the method further comprises:

obtaining the index data sequence at the current time point, wherein the index data sequence at the current time point is composed of N historical index data before the current time point;

determining the number of iterative extrapolation predictions M based on a preset prediction time length, the preset period and the sampling number N;

inputting the index data sequence at the current time point into the trained neural network model for fault prediction of monitored points, and performing M iterations of extrapolation prediction to obtain prediction index data of M monitored points.
The method according to claim 6, wherein the method further comprises:

comparing the prediction index data of the M monitored points with a third preset threshold in chronological order, and determining the time point corresponding to the first non-compliant prediction index data of the monitored point as the failure time point.
The method according to claim 7, wherein the method further comprises:

determining alarm information based on the failure time point and the corresponding prediction index data, and reporting the alarm information.
A training device for a fault prediction neural network model, characterized in that the device includes:

a first device, configured to obtain a historical indicator data set of a monitored point, wherein the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points;

a second device, configured to process the historical indicator data set based on a preset period and sampling frequency to determine a training set and a test set;

a third device, configured to train a neural network based on the training set until the output error of the neural network meets a first preset threshold, and to test the neural network based on the test set; if the accuracy rate meets a second preset threshold, a trained neural network model for fault prediction of the monitored point is obtained.
The device according to claim 9, wherein the device further comprises:

a fourth device, configured to preprocess the historical indicator data set to eliminate the influence of abnormal historical indicator data.
The device according to claim 9 or 10, wherein the device further comprises:

a fifth device, configured to obtain the index data sequence at the current time point, wherein the index data sequence at the current time point is composed of N historical index data before the current time point;

a sixth device, configured to determine the number of iteratively extrapolated predictions M based on a preset prediction time length, a preset period and a sampling number N;

a seventh device, configured to input the index data sequence at the current time point into the trained neural network model for fault prediction of monitored points and perform M iterations of extrapolation prediction to obtain prediction index data of M monitored points.
The device according to claim 11, wherein the device further comprises:

an eighth device, configured to compare the prediction index data of the M monitored points with a third preset threshold in chronological order, and to determine the time point corresponding to the first non-compliant prediction index data of the monitored point as the failure time point.
The device according to claim 12, wherein the device further comprises:

a ninth device, configured to determine alarm information based on the failure time point and the corresponding prediction index data, and to report the alarm information.
A computer-readable medium, characterized in that:

computer-readable instructions are stored thereon, the instructions being executable by a processor to implement the method of any one of claims 1 to 8.
A device, characterized in that the device comprises:

one or more processors; and

a memory storing computer-readable instructions which, when executed, cause the processor to perform the operations of the method of any one of claims 1 to 8.