WO2022048168A1 - Training method and device for failure prediction neural network model - Google Patents

Publication number
WO2022048168A1
WO2022048168A1 PCT/CN2021/090028 CN2021090028W WO2022048168A1 WO 2022048168 A1 WO2022048168 A1 WO 2022048168A1 CN 2021090028 W CN2021090028 W CN 2021090028W WO 2022048168 A1 WO2022048168 A1 WO 2022048168A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
historical
indicator data
prediction
monitored
Prior art date
Application number
PCT/CN2021/090028
Other languages
French (fr)
Chinese (zh)
Inventor
王洪涛
Original Assignee
上海上讯信息技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海上讯信息技术股份有限公司
Publication of WO2022048168A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Definitions

  • the present application relates to the technical field of computer data processing, and in particular, to a technology for training a fault prediction neural network model.
  • monitoring indicators such as CPU usage and memory usage are widely used to monitor the health status of computer servers. They are also used to monitor the status of the business running on the server, such as the business volume per minute or the inbound and outbound data volume of the network card per unit time.
  • the prior art determines, offline or in real time, whether monitoring indicators are abnormal by setting fixed thresholds and/or dynamic thresholds for them.
  • these methods can only detect an abnormality that is occurring or has already occurred. They are after-the-fact monitoring methods and cannot predict an anomaly before it occurs.
  • the purpose of this application is to provide a training method and device for a fault prediction neural network model, to solve the technical problem in the prior art that it is impossible to predict in advance the abnormality of the monitored computer operating state.
  • a training method for a fault prediction neural network model includes:
  • obtaining a historical indicator data set of a monitored point, wherein the historical indicator data set is composed of the monitoring indicator data of the monitored point collected at different historical time points;
  • processing the historical indicator data set based on a preset period to determine a training set and a test set;
  • training the neural network based on the training set until the output error of the neural network meets the first preset threshold, and testing the neural network based on the test set; if the accuracy rate meets the second preset threshold, the trained neural network model for failure prediction of the monitored point is obtained.
  • processing of the historical indicator data set based on a preset period to determine the training set and the test set includes:
  • the training set and the test set are determined based on the historical indicator data sequence and the true value label, wherein the samples in the training set and the test set include historical indicator data sequences and corresponding true value labels at different time points.
  • the method further includes:
  • the historical indicator data set is preprocessed to eliminate the influence of abnormal historical indicator data.
  • the neural network is an LSTM neural network
  • the structure of the LSTM neural network includes:
  • the output error includes a mean square error.
  • the method further includes:
  • the index data sequence at the current time point is composed of N historical index data before the current time point;
  • the index data sequence at the current time point is input into the trained neural network model for failure prediction of monitored points, and M iterative extrapolation predictions are performed to obtain M prediction index data of the monitored point.
  • the method further includes:
  • the prediction index data of the M monitored points are compared with the third preset threshold in time sequence, and the time point corresponding to the first non-compliant prediction index data of the monitored point is determined as the failure time point.
  • the method further includes:
  • alarm information is determined based on the failure time point and the corresponding prediction index data, and the alarm information is reported.
  • a training device for a fault prediction neural network model includes:
  • a first device configured to obtain a historical indicator data set of a monitored point, wherein the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points;
  • a second device configured to process the historical indicator data set based on a preset period and sampling frequency to determine a training set and a test set;
  • a third device configured to train a neural network based on the training set until the output error of the neural network meets a first preset threshold, and to test the neural network based on the test set; if the accuracy rate meets the second preset threshold, the trained neural network model for failure prediction of monitored points is obtained.
  • the device further includes:
  • the fourth device is used for preprocessing the historical indicator data set to eliminate the influence of abnormal historical indicator data.
  • the device further includes:
  • a fifth device configured to obtain the index data sequence at the current time point, wherein the index data sequence at the current time point is composed of N historical index data before the current time point;
  • a sixth device configured to determine the number of iteratively extrapolated predictions M based on a preset prediction time length, a preset period and a sampling number N;
  • a seventh device configured to input the index data sequence at the current time point into the trained neural network model for failure prediction of monitored points, and to perform M iterative extrapolation predictions to obtain M prediction index data of the monitored point.
  • the device further includes:
  • an eighth device configured to compare the M prediction index data of the monitored point with a third preset threshold in time sequence, and to determine the time point corresponding to the first non-compliant prediction index data as the failure time point.
  • the device further includes:
  • a ninth device is configured to determine alarm information based on the failure time point and corresponding prediction index data, and report the alarm information.
  • the present application uses a method and device for training a neural network model for fault prediction. It first obtains the historical indicator data set of the monitored point, then processes the historical indicator data set based on a preset period to determine a training set and a test set, then trains a neural network based on the training set until the output error of the neural network meets a first preset threshold, and tests the neural network based on the test set. If the accuracy rate meets the second preset threshold, the trained neural network model for fault prediction of the monitored point is obtained.
  • the trained neural network model can be used to predict failures in the operating state or business state of the monitored computer. This allows operation and maintenance personnel to intervene in advance and prevent abnormal failures, which increases MTBF (Mean Time Between Failures). When a failure is unavoidable, they can prepare for recovery in advance and resolve the failure promptly when it occurs, which reduces MTTR (Mean Time To Restoration).
  • FIG. 1 shows a flowchart of a training method for a fault prediction neural network model according to an aspect of the present application
  • FIG. 2 shows a schematic diagram of a training device for a fault prediction neural network model according to an aspect of the present application
  • each module of the system and the trusted party include one or more processors (CPUs), input/output interfaces, network interfaces and memory.
  • Memory may include forms of non-persistent memory, random access memory (RAM) and/or non-volatile memory in computer readable media, such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media includes both persistent and non-permanent, removable and non-removable media, and storage of information may be implemented by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cartridges, magnetic tape disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
  • FIG. 1 shows a flowchart of a training method for a fault prediction neural network model according to an aspect of the present application, wherein the method of an embodiment includes:
  • S11 obtains the historical indicator data set of the monitored point, wherein the historical indicator data set is composed of the monitoring indicator data of the monitored point collected at different historical time points;
  • S12 processes the historical indicator data set based on a preset period to determine a training set and a test set;
  • S13 trains the neural network based on the training set until the output error of the neural network meets the first preset threshold, and tests the neural network based on the test set; if the accuracy rate meets the second preset threshold, the trained neural network model for fault prediction of the monitored point is obtained.
  • the method is performed by a device 1, which is a computer device and/or a cloud.
  • the computer device includes, but is not limited to, a personal computer, a laptop computer, an industrial computer, a network host, a single network server, or a set of multiple network servers.
  • the cloud is composed of a large number of computers or network servers based on cloud computing, wherein cloud computing is a kind of distributed computing: a virtual supercomputer composed of a set of loosely coupled computers.
  • the device 1 obtains the historical indicator data set of the monitored point, wherein the monitored point is set on the monitored device, or is set based on the status of a service running on the monitored device.
  • device 1 obtains the indicator data of the monitored points of the monitored device directly, by polling through SNMP (Simple Network Management Protocol) or another protocol, by monitoring-agent push, or by similar means. Examples are the CPU and memory indicator data of the monitored computer, or the indicator data of the business running on the monitored device, such as the network traffic of the monitored computer.
  • device 1 can first obtain the operation log of the operating system or of a service on the monitored device, and then process the log, for example by classifying log entries by keyword or with a clustering algorithm, to obtain the number of operations of similar log entries at different time points or in different time windows.
  • the access log of the monitored computer 192.168.211.124 accessing port 30443 on the device 192.168.212.22 is as follows:
  • the historical indicator data set is composed of monitoring indicator data of the monitored points collected at different historical time points, wherein the interval between adjacent historical time points is the same. The historical indicator data set must contain a sufficient amount of monitoring indicator data for training.
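  • The log-derived indicator described above (counting similar log entries per time window, at equally spaced time points) can be sketched as follows. This is a minimal illustration, not the application's implementation: the log line format, the keyword filter, the function names, and the sample log lines are all assumptions invented for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def matching_timestamps(lines, keyword):
    """Timestamps of log lines containing `keyword`.

    Assumes a hypothetical 'YYYY-MM-DD HH:MM:SS message' line format.
    """
    return [
        datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        for line in lines
        if keyword in line
    ]


def counts_per_window(timestamps, start, window, n_windows):
    """Count events in n_windows consecutive, equal-length windows from `start`."""
    counts = Counter()
    for ts in timestamps:
        index = (ts - start) // window  # timedelta // timedelta gives an int
        if 0 <= index < n_windows:
            counts[index] += 1
    return [counts[i] for i in range(n_windows)]


# Hypothetical log lines, invented for illustration only.
log_lines = [
    "2021-01-01 00:00:05 connect 192.168.212.22:30443",
    "2021-01-01 00:00:40 connect 192.168.212.22:30443",
    "2021-01-01 00:01:10 heartbeat",
    "2021-01-01 00:02:30 connect 192.168.212.22:30443",
]
series = counts_per_window(
    matching_timestamps(log_lines, "connect"),
    start=datetime(2021, 1, 1, 0, 0, 0),
    window=timedelta(minutes=1),
    n_windows=3,
)
```

  • Each element of `series` is one indicator value for one window, so the result has the equal spacing between time points that the historical indicator data set requires.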
  • the device 1 processes the obtained historical indicator data set based on a preset period to determine a training set and a test set.
  • processing of the historical indicator data set based on a preset period to determine the training set and the test set includes:
  • the training set and the test set are determined based on the historical indicator data sequence and the true value label, wherein the samples in the training set and the test set include historical indicator data sequences and corresponding true value labels at different time points.
  • a preset time period T includes monitoring indicator data collected at N time points. If the time interval T_b between adjacent time points is constant, the number N of collection time points within a preset period T is T/T_b. If T_b is 1, N equals T in value, that is, a preset period T contains T time points, corresponding to T historical indicator data.
  • a sample corresponding to the time point t can be determined.
  • samples corresponding to different historical time points can be constructed, wherein each sample includes a historical indicator data sequence corresponding to a certain historical time point and the true value label of that data sequence, namely the labeled historical indicator data collected at this historical time point.
  • a sample set is composed of all samples, and the sample set is divided into a training set and a test set.
  • the division ratio can be 4:1 or other division ratios, which can be adjusted according to the actual training situation in the subsequent training of the neural network.
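  • The sample construction and the 4:1 division described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the `seq_len` parameter are hypothetical, and the split is done chronologically, which the text does not specify.

```python
def build_samples(series, seq_len):
    """Slide a window over the series.

    Each sample pairs `seq_len` consecutive values with the value that
    immediately follows them, which serves as the true value label.
    """
    samples = []
    for end in range(seq_len, len(series)):
        samples.append((series[end - seq_len:end], series[end]))
    return samples


def split_samples(samples, train_parts=4, test_parts=1):
    """Divide samples into training and test sets, e.g. in a 4:1 ratio.

    Assumption: the earliest samples go to the training set.
    """
    cut = len(samples) * train_parts // (train_parts + test_parts)
    return samples[:cut], samples[cut:]


# Toy series standing in for equally spaced indicator data.
toy_series = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19, 21, 20]
toy_samples = build_samples(toy_series, seq_len=2)
train_set, test_set = split_samples(toy_samples)
```

  • Changing `train_parts` and `test_parts` gives the other division ratios mentioned in the text.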
  • the method further includes:
  • the historical indicator data set is preprocessed to eliminate the influence of abnormal historical indicator data.
  • individual data in the historical indicator data set obtained by device 1 may be anomalous (deviating abnormally high or abnormally low) because of accidental abnormal conditions, or several consecutive data may increase or decrease one after another. Therefore, before the historical indicator data set is used to determine the training set and test set, it can be preprocessed to eliminate the influence of abnormal historical indicator data.
  • the moving average method is used to process the data in the historical indicator data set P.
  • for the historical indicator data s_t at historical time point t, the average value of the n historical indicator data before s_t is selected as the corresponding historical indicator data s'_t at time point t. Alternatively, the average value of n historical indicator data before and after s_t can be selected as the historical indicator data s'_t corresponding to historical time point t.
  • n should be much smaller than the number of data in the historical indicator data set, so that the data still truly reflects the actual situation of the monitored point. How the n historical indicator data before and after s_t are selected is not specifically limited.
  • the new historical indicator data s'_t corresponding to the different historical time points t form a new historical indicator data set P'. P' contains n fewer data than the historical indicator data set P when the n data selected for the moving average do not include the data s_t at time t, or (n-1) fewer when the n selected data include s_t.
  • the new historical indicator data set P' is processed to determine the training set and the test set.
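  • The moving-average preprocessing above can be sketched as follows. This is a minimal sketch of the trailing-window variant only; the function name and the `include_current` flag are hypothetical, and the variant that averages values on both sides of s_t is not shown.

```python
def moving_average(series, n, include_current=False):
    """Smooth a series with a trailing moving average of n values.

    include_current=False: s'_t is the mean of the n values before s_t,
    so the result has n fewer values than the input.
    include_current=True: the window also contains s_t itself,
    so the result has n - 1 fewer values than the input.
    """
    if n <= 0 or n >= len(series):
        raise ValueError("n must be positive and smaller than the series length")
    first = n - 1 if include_current else n
    smoothed = []
    for t in range(first, len(series)):
        if include_current:
            window = series[t - n + 1:t + 1]
        else:
            window = series[t - n:t]
        smoothed.append(sum(window) / n)
    return smoothed


p = [10, 20, 30, 40, 50]
p_prime = moving_average(p, n=2)
p_prime_with_current = moving_average(p, n=2, include_current=True)
```

  • The lengths of `p_prime` and `p_prime_with_current` differ from the length of `p` by n and n - 1 respectively, matching the counts stated in the text.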
  • the device 1 inputs the historical index data sequence of each training set sample into the neural network for training, and compares the output of the neural network with the true value label of the sample to obtain the output error, until the output error meets the preset threshold. It then uses the test set samples to test the resulting neural network model. If the accuracy rate meets the preset threshold, the trained neural network is determined to be the trained neural network model for failure prediction of monitored points.
  • the device 1 obtains the CPU usage historical data set P of the monitored computer and preprocesses it to obtain a new CPU usage historical data set P', which includes the CPU occupancy rate data s'_t corresponding to different historical time points t. The new data set P' is processed to obtain the CPU occupancy rate data sequence S'_{t-1} at different historical time points t-1. s'_t is determined as the true value label of S'_{t-1}, and S'_{t-1} together with s'_t forms a sample. Traversing P' yields the samples at the different historical time points, which constitute the CPU usage training set and test set. The division ratio can be 4:1 or another ratio, and can be adjusted according to the actual training situation in the subsequent training of the neural network.
  • the output error of the training samples is used to obtain the loss function value of the neural network. If the loss function value meets the preset threshold, or the preset number of training iterations is completed, the training of the neural network is complete. The samples in the test set are then used to test the trained neural network; if the accuracy rate meets the preset threshold, the trained neural network is determined to be the trained neural network model for failure prediction of the CPU usage of the monitored computer.
  • the neural network is an LSTM (Long Short-Term Memory) neural network
  • the structure of the LSTM neural network includes:
  • the LSTM network structure in one embodiment is as follows in Table 3,
  • the neural network structure is not limited in the technical solution of the present application. An LSTM network can be used, or another neural network commonly used in the prior art, such as an ANN (Artificial Neural Network) or a CNN (Convolutional Neural Network), and no special adjustment is made to the neural network structure.
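  • For readers unfamiliar with LSTM networks, the sketch below shows the standard gate computation of one LSTM cell step, reduced to a single hidden unit and a scalar input. It illustrates the general LSTM mechanism only; it is not the network structure of the embodiment, and the function names and parameter layout are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM cell on a scalar input.

    `params` maps each gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state h and cell state c.
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre_activation("input"))        # how much new content to write
    f = sigmoid(pre_activation("forget"))       # how much old cell state to keep
    o = sigmoid(pre_activation("output"))       # how much cell state to expose
    g = math.tanh(pre_activation("candidate"))  # candidate new content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def run_sequence(sequence, params):
    """Feed a sequence through the cell and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h
```

  • In a prediction model, the final hidden state would typically be passed through an output layer to produce the next indicator value; that layer is omitted here.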
  • the output error includes a mean square error.
  • the CPU occupancy rate data sequence S'_{t-1} corresponding to time point t in the CPU occupancy rate training set is input into the neural network, and the network output is compared with s'_t, the true value label of S'_{t-1}, to obtain the output mean square error of the training sample at time point t. Based on the output mean square errors of the different training samples, the mean square error loss function value of the neural network is obtained. If the mean square error loss function value meets the preset threshold, or the preset number of training iterations is completed, the training of the neural network is complete. The samples in the test set are then used to test the trained neural network; if the accuracy rate meets the preset threshold, the trained neural network is determined to be the trained neural network model for fault prediction of the CPU occupancy rate of the monitored computer.
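  • The stopping rule and the test step described above can be sketched as follows. To stay short and dependency-free, the sketch swaps the LSTM for a plain linear autoregressive model trained by gradient descent; only the control flow (mean square error loss, loss threshold or maximum number of rounds, then an accuracy check on the test set) mirrors the text. The accuracy definition (share of test predictions within a tolerance of the label) is an assumption, since the text does not define accuracy for a regression output, and all names are hypothetical.

```python
def predict(weights, bias, seq):
    return bias + sum(w * x for w, x in zip(weights, seq))


def mse(weights, bias, samples):
    """Mean square error of the model over (sequence, label) samples."""
    return sum((predict(weights, bias, s) - y) ** 2 for s, y in samples) / len(samples)


def train(samples, seq_len, loss_threshold, max_epochs, lr=0.01):
    """Gradient descent until the MSE loss meets the threshold or rounds run out."""
    weights, bias = [0.0] * seq_len, 0.0
    for _ in range(max_epochs):
        if mse(weights, bias, samples) <= loss_threshold:
            break
        grad_w, grad_b = [0.0] * seq_len, 0.0
        for seq, y in samples:
            err = predict(weights, bias, seq) - y
            for i, x in enumerate(seq):
                grad_w[i] += 2 * err * x / len(samples)
            grad_b += 2 * err / len(samples)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def accuracy(weights, bias, samples, tolerance):
    """Assumed metric: share of predictions within `tolerance` of the label."""
    hits = sum(abs(predict(weights, bias, s) - y) <= tolerance for s, y in samples)
    return hits / len(samples)


def train_and_accept(train_set, test_set, seq_len, loss_threshold,
                     max_epochs, tolerance, accuracy_threshold):
    """Return the model if it passes the test-set accuracy check, else None."""
    weights, bias = train(train_set, seq_len, loss_threshold, max_epochs)
    if accuracy(weights, bias, test_set, tolerance) >= accuracy_threshold:
        return weights, bias
    return None
```

  • Inputs are assumed to be scaled to roughly the range 0 to 1 (for example, CPU usage as a fraction); with much larger values the fixed learning rate would need to be reduced.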
  • the training method for a fault prediction neural network model further includes:
  • the index data sequence at the current time point is composed of N historical index data before the current time point;
  • the device 1 obtains the index data sequence S_{t_p} for predicting the next time point t_{p+1} after the current time point t_p, wherein S_{t_p} contains T+1 index data: the indicator data s_{t_p} of the monitored point collected at the current time point, and the T historical indicator data s_{t_p-T}, s_{t_p-(T-1)}, ..., s_{t_p-2}, s_{t_p-1} of the monitored point collected at different historical time points within a preset period T before s_{t_p}. The number of index data contained in S_{t_p} is the same as the number of historical index data contained in each historical index data sequence in the training set and the test set.
  • the iterative extrapolation prediction times M can be determined, wherein the calculation formula of M is as follows:
  • the collection interval is 1 hour, and the collection time points are t_{1-1}, t_{1-2}, ..., t_{10-23}, t_{10-24}, a total of 240 collection time points. The corresponding collected CPU usage data are D_{1-1}, D_{1-2}, ..., D_{10-23}, D_{10-24}; the 240 CPU occupancy rate data collected over 10 days constitute the CPU occupancy rate historical data set. If the selected preset period T is 24 hours, then N is 24, and the number of elements of each CPU occupancy rate historical data sequence determined from the obtained data set is 25. The resulting training samples are shown in Table 4 below.
  • Table 4: training samples

    Sequence  | CPU usage data                                            | True value label
    S_{2-1}   | D_{1-1}, D_{1-2}, ..., D_{1-23}, D_{1-24}, D_{2-1}        | D_{2-2}
    S_{2-2}   | D_{1-2}, D_{1-3}, ..., D_{1-24}, D_{2-1}, D_{2-2}         | D_{2-3}
    S_{2-3}   | D_{1-3}, D_{1-4}, ..., D_{2-1}, D_{2-2}, D_{2-3}          | D_{2-4}
    ...       | ...                                                       | ...
    S_{10-21} | D_{9-1}, D_{9-2}, ..., D_{10-19}, D_{10-20}, D_{10-21}    | D_{10-22}
    S_{10-22} | D_{9-2}, D_{9-3}, ..., D_{10-20}, D_{10-21}, D_{10-22}    | D_{10-23}
    S_{10-23} | D_{9-3}, D_{9-4}, ..., D_{10-21}, D_{10-22}, D_{10-23}    | D_{10-24}
  • the above samples are divided into a CPU occupancy training set and a test set in a ratio of 4:1, and a trained CPU occupancy prediction neural network model is obtained after training the LSTM neural network. The sequence S_{10-24} {D_{9-4}, D_{9-5}, ..., D_{10-22}, D_{10-23}, D_{10-24}} is then input into the trained neural network model for predicting CPU usage, and the CPU usage prediction value D_{11-1} for 1 hour later is obtained. The sequence S_{11-1} {D_{9-6}, D_{9-7}, ..., D_{10-24}, D_{11-1}} is then obtained iteratively and input into the trained model to predict the CPU occupancy rate prediction value D_{11-2} for 2 hours later. If the preset prediction time length is 24 hours, the iterative extrapolation prediction is performed 24 times, yielding the CPU usage prediction values D_{11-1}, D_{11-2}, ..., D_{11-23}, D_{11-24}.
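  • The iterative extrapolation in this example can be sketched as follows. The trained model is represented by a hypothetical `predict_next` callable that maps a sequence to the next value. The number of steps M is computed here as the prediction time length divided by the collection interval; this reading is an assumption that agrees with the worked example (24 hours at a 1-hour interval gives 24 predictions), not a formula quoted from the application.

```python
def prediction_steps(prediction_length, collection_interval):
    """Assumed reading of M: how many collection intervals fit in the horizon."""
    return int(prediction_length // collection_interval)


def extrapolate(predict_next, last_sequence, steps):
    """Predict `steps` future values one at a time.

    After each prediction the oldest value is dropped and the prediction
    is appended, so the window length stays constant.
    """
    window = list(last_sequence)
    predictions = []
    for _ in range(steps):
        next_value = predict_next(window)
        predictions.append(next_value)
        window = window[1:] + [next_value]
    return predictions


# Stand-in for a trained model: repeats the last observed increment.
def toy_model(window):
    return window[-1] + (window[-1] - window[-2])


m = prediction_steps(prediction_length=24, collection_interval=1)
forecast = extrapolate(toy_model, [40, 42, 44], steps=m)
```

  • Because each prediction is fed back as input, errors can accumulate over the M steps; a longer horizon generally deserves more caution than a shorter one.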
  • the training method for a fault prediction neural network model further includes:
  • the M prediction index data of the monitored point are compared with the third preset threshold in time sequence, and the time point corresponding to the first non-compliant prediction index data is determined as the failure time point.
  • for example, with a threshold of 90% for CPU usage, the time point corresponding to the first predicted CPU usage value exceeding 90% is determined as the failure time point.
  • the predicted value obtained by each prediction can also be compared with the preset threshold immediately. If the predicted value complies with the threshold, the prediction for the next time point is obtained by iterative extrapolation; if it does not comply, the iterative extrapolation prediction ends.
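  • Both ways of locating the failure time point can be sketched as follows: scanning a finished list of predictions, or checking each prediction as it is produced and stopping at the first violation. The sketch assumes that "non-compliant" means exceeding an upper threshold, as in the 90% CPU example; the function names and the `predict_next` callable are hypothetical.

```python
def first_failure_index(predictions, threshold):
    """Index of the first prediction above the threshold, or None."""
    for index, value in enumerate(predictions):
        if value > threshold:
            return index
    return None


def extrapolate_until_failure(predict_next, last_sequence, max_steps, threshold):
    """Extrapolate step by step and stop at the first threshold violation.

    Returns the predictions made so far and the index of the violating
    prediction (None if all max_steps predictions comply).
    """
    window = list(last_sequence)
    predictions = []
    for _ in range(max_steps):
        value = predict_next(window)
        predictions.append(value)
        if value > threshold:
            return predictions, len(predictions) - 1
        window = window[1:] + [value]
    return predictions, None


# CPU usage in percent; the stand-in model adds 10 points per step.
early, failure_at = extrapolate_until_failure(
    lambda window: window[-1] + 10, [60, 70], max_steps=24, threshold=90
)
```

  • The early-stopping variant avoids computing predictions beyond the failure time point, at the cost of not showing how the indicator would evolve afterwards.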
  • the training method for a fault prediction neural network model further includes:
  • alarm information is determined, and the alarm information is reported.
  • the failure time point and the corresponding prediction index data can be used as the alarm information or part of the alarm information, and are reported to the stakeholders related to the monitored point so that they can take corresponding measures in a timely manner. For example, operation and maintenance personnel can intervene in advance to eliminate hidden faults and prevent abnormal faults, which increases MTBF; or they can deal with a fault immediately when it inevitably occurs, which shortens fault processing time and reduces MTTR.
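  • Assembling the alarm information from the failure time point and the corresponding predicted value can be sketched as follows. The `Alarm` fields, the function name, and the conversion from prediction index to clock time (first prediction lies one collection interval after the current time) are assumptions for illustration; how the alarm is delivered to stakeholders is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass
class Alarm:
    monitored_point: str
    failure_time: datetime
    predicted_value: float


def build_alarm(monitored_point: str, predictions: Sequence[float],
                failure_index: Optional[int], current_time: datetime,
                interval: timedelta) -> Optional[Alarm]:
    """Build alarm information, or return None when no failure is predicted.

    Prediction k (0-based) is assumed to refer to the time point
    (k + 1) collection intervals after `current_time`.
    """
    if failure_index is None:
        return None
    failure_time = current_time + (failure_index + 1) * interval
    return Alarm(monitored_point, failure_time, predictions[failure_index])


alarm = build_alarm(
    "cpu_usage", [80, 95], failure_index=1,
    current_time=datetime(2021, 1, 1, 0, 0), interval=timedelta(hours=1),
)
```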
  • FIG. 2 shows a schematic diagram of a training device for a fault prediction neural network model according to another aspect of the present application, wherein the device includes:
  • a first device 21 configured to obtain a historical indicator data set of a monitored point, wherein the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points;
  • the second device 22 is configured to process the historical indicator data set based on a preset period and sampling frequency to determine a training set and a test set;
  • a third device 23 configured to train a neural network based on the training set until the output error of the neural network meets a first preset threshold, and to test the neural network based on the test set; if the accuracy rate meets the second preset threshold, the trained neural network model for fault prediction of monitored points is obtained.
  • the first device 21 of the device 1 obtains the historical indicator data set of the monitored point, wherein the historical indicator data set is composed of the monitoring indicator data of the monitored point collected at different historical time points. The second device 22 processes the historical index data set obtained by the first device 21, based on the preset period and sampling frequency, to determine a training set and a test set. The third device 23 trains the neural network based on the training set determined by the second device 22 until the output error of the neural network meets the first preset threshold, and tests the neural network based on the test set. If the accuracy rate meets the second preset threshold, a trained neural network model for fault prediction of the monitored point is obtained.
  • the training device for a fault prediction neural network model further includes:
  • the fourth device 24 (not shown) is configured to preprocess the historical indicator data set to eliminate the influence of abnormal historical indicator data.
  • the fourth device 24 of the device 1 preprocesses the historical indicator data set acquired by the first device 21 to eliminate the influence of abnormal historical indicator data, and the second device 22 processes the historical indicator data set preprocessed by the fourth device 24, based on the preset period and sampling frequency, to determine the training set and the test set.
  • the training device for a fault prediction neural network model further includes:
  • a fifth device 25 configured to obtain the index data sequence at the current time point, wherein the index data sequence at the current time point is composed of N historical index data before the current time point;
  • a sixth device 26 configured to determine the number of iteratively extrapolated predictions M based on a preset prediction time length, a preset period and a sampling number N;
  • a seventh device 27 (not shown) configured to input the index data sequence at the current time point into the trained neural network model for failure prediction of monitored points, and to perform M iterative extrapolation predictions to obtain M prediction index data of the monitored point.
  • the fifth device 25 of the device 1 obtains the index data sequence at the current time point, wherein the index data sequence at the current time point is composed of N historical index data before the current time point. The sixth device 26 determines the number of iterative extrapolation predictions M based on a preset prediction time length, a preset period and a sampling number N. The seventh device 27 inputs the index data sequence at the current time point obtained by the fifth device 25 into the trained monitored-point fault prediction neural network model obtained by the third device 23, and performs M iterative extrapolation predictions to obtain M prediction index data of the monitored point.
  • the training device for a fault prediction neural network model further includes:
  • the eighth device 28 (not shown) is configured to compare the prediction index data of the M monitored points with a third preset threshold in chronological order, and to determine the time point corresponding to the first non-compliant prediction index data as the failure time point.
  • the eighth device 28 of the device 1 compares the prediction index data of the M monitored points obtained by the seventh device 27 with the third preset threshold in chronological order, and determines the time point corresponding to the first non-compliant prediction index data as the failure time point.
  • the training device for a fault prediction neural network model further includes:
  • a ninth device 29 (not shown) is configured to determine alarm information based on the failure time point and corresponding prediction index data, and report the alarm information.
  • the ninth device 29 of the device 1 obtains the failure time point and the corresponding prediction index data determined by the eighth device 28, takes them as the alarm information or as part of it, and reports the alarm information to the stakeholders of the monitored point so that they can respond in time. For example, operation and maintenance personnel can intervene in advance to eliminate hidden faults and prevent the failure from occurring, which increases MTBF; or, when a failure is unavoidable, they can handle it as soon as it occurs, which shortens troubleshooting time and reduces MTTR.
  • a computer-readable medium storing computer-readable instructions executable by a processor to implement the aforementioned method.
  • a training device for a fault prediction neural network model includes:
  • a memory storing computer-readable instructions which, when executed, cause the processor to perform the operations of the aforementioned method.
  • the computer-readable instructions, when executed, cause the one or more processors to: obtain a historical indicator data set of the monitored point; preprocess the historical indicator data set to eliminate the influence of abnormal historical indicator data; process the historical indicator data set based on a preset period to determine a training set and a test set; train a neural network based on the training set until the output error of the neural network meets the first preset threshold; test the neural network based on the test set and, if the accuracy rate meets the second preset threshold, obtain a trained neural network model for fault prediction of the monitored point; obtain the index data sequence at the current time point; determine the number of iterative extrapolation predictions M based on the preset prediction time length, the preset period and the sampling number N; input the index data sequence at the current time point into the trained neural network model for fault prediction of the monitored point and perform M iterations of extrapolation prediction to obtain the prediction index data of the M monitored points; compare the prediction index data of the M monitored points with a third preset threshold in chronological order and determine the time point corresponding to the first non-compliant prediction index data as the failure time point; and determine alarm information based on the failure time point and the corresponding prediction index data, and report the alarm information.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A training method and device for a failure prediction neural network model. The method comprises: first obtaining a historical index data set of monitored points, wherein the historical index data set is composed of monitoring index data, collected at different historical time points, of the monitored points (S11); next, on the basis of a preset period, processing the historical index data set to determine a training set and a test set (S12); and then, training a neural network on the basis of the training set till an output error output by the neural network satisfies a first preset threshold, testing the neural network on the basis of the test set, and if an accuracy rate satisfies a second preset threshold, obtaining a trained monitored point failure prediction neural network model (S13). According to the method, the trained neural network model is obtained and is used for performing failure prediction on a running state or a service state of the monitored computer, so that operation and maintenance personnel can intervene in advance, a failure and abnormity are effectively prevented or failures are eliminated in time, and MTBF can be effectively increased or MTTR can be effectively reduced.

Description

Training method and device for a fault prediction neural network model
Technical Field
The present application relates to the technical field of computer data processing, and in particular to a technique for training a fault prediction neural network model.
Background
At present, in the routine operation and maintenance of computers of all kinds, especially the many servers used for data computation and storage, monitoring indicators are widely used to monitor the health of computer servers (for example, CPU usage and memory usage) and the status of the services running on them (for example, business volume per minute, or the inbound and outbound data volume of a network card per unit time).
The prior art judges, offline or in real time, whether a monitoring indicator is abnormal by setting fixed and/or dynamic thresholds for it. However, such methods can only detect anomalies that are occurring or have already occurred; they are a form of after-the-fact monitoring and cannot predict an anomaly before it happens.
Summary of the Invention
The purpose of the present application is to provide a training method and device for a fault prediction neural network model, so as to solve the technical problem that the prior art cannot predict an abnormal operating state of a monitored computer before it occurs.
According to one aspect of the present application, a training method for a fault prediction neural network model is provided, wherein the method includes:
obtaining a historical indicator data set of a monitored point, wherein the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points;
processing the historical indicator data set based on a preset period to determine a training set and a test set;
training a neural network based on the training set until the output error of the neural network meets a first preset threshold, and testing the neural network based on the test set; if the accuracy rate meets a second preset threshold, a trained neural network model for fault prediction of the monitored point is obtained.
Optionally, processing the historical indicator data set based on a preset period to determine the training set and the test set includes:
determining a sampling number N based on the preset period;
traversing the historical indicator data in the historical indicator data set and constructing historical indicator data sequences for different time points, wherein the historical indicator data sequence for a given time point is composed of the N historical indicator data before that time point;
taking the historical indicator data at each time point as the ground-truth label of the historical indicator data sequence corresponding to that time point;
determining the training set and the test set based on the historical indicator data sequences and the ground-truth labels, wherein each sample in the training set and the test set includes the historical indicator data sequence for a time point and the corresponding ground-truth label.
Optionally, before constructing the historical indicator data sequences for different time points, the method further includes:
preprocessing the historical indicator data set to eliminate the influence of abnormal historical indicator data.
Optionally, the neural network is an LSTM neural network, and the structure of the LSTM neural network includes:
1 input layer;
2 LSTM hidden layers;
1 fully connected output layer.
Optionally, the output error includes a mean square error.
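For reference, the mean square error over n samples is conventionally written as below. The symbols n, y_i (ground-truth label of sample i) and \hat{y}_i (network output for sample i) are notation introduced here for illustration and do not appear in the application.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```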
Optionally, the method further includes:
obtaining the indicator data sequence at the current time point, wherein the indicator data sequence at the current time point is composed of the N historical indicator data before the current time point;
determining the number of iterative extrapolation predictions M based on a preset prediction time length, the preset period and the sampling number N;
inputting the indicator data sequence at the current time point into the trained neural network model for fault prediction of the monitored point and performing M iterations of extrapolation prediction to obtain the predicted indicator data of M monitored points.
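A minimal sketch of iterative extrapolation is given below, under stated assumptions. The text above does not give the formula for M, so `extrapolation_steps` assumes one prediction per sampling interval (period / N); that rule, the function names and the `toy_model` stand-in are all hypothetical and are not taken from the application. Each prediction is appended to the window and the oldest value is dropped before the next step.

```python
import math
from typing import Callable, List, Sequence


def extrapolation_steps(horizon: float, period: float, n: int) -> int:
    """Assumed rule: one prediction per sampling interval (period / n)."""
    interval = period / n
    return math.floor(horizon / interval + 1e-9)


def iterative_extrapolation(
    model: Callable[[Sequence[float]], float],
    window: Sequence[float],
    m: int,
) -> List[float]:
    """Predict m future values by feeding each prediction back as input."""
    current = list(window)
    predictions = []
    for _ in range(m):
        nxt = model(current)
        predictions.append(nxt)
        current = current[1:] + [nxt]  # slide the window forward by one step
    return predictions


def toy_model(seq: Sequence[float]) -> float:
    """Stand-in for the trained network: last value plus the last difference."""
    return seq[-1] + (seq[-1] - seq[-2])


steps = extrapolation_steps(horizon=3.0, period=4.0, n=4)
forecast = iterative_extrapolation(toy_model, [1.0, 2.0, 3.0, 4.0], steps)
```

In practice `model` would wrap the trained network's inference call; because each step consumes earlier predictions, errors can accumulate as M grows.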
Optionally, the method further includes:
comparing the predicted indicator data of the M monitored points with a third preset threshold in chronological order, and determining the time point corresponding to the first non-compliant predicted indicator data as the failure time point.
Optionally, the method further includes:
determining alarm information based on the failure time point and the corresponding predicted indicator data, and reporting the alarm information.
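The chronological threshold check can be sketched as follows. This is an illustration only: it assumes "non-compliant" means strictly greater than the threshold (the application does not fix the direction of the comparison), and the function name, time stamps and values are made up.

```python
from typing import List, Optional, Tuple


def first_failure(
    predictions: List[float],
    timestamps: List[str],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return (time point, predicted value) for the first prediction that
    violates the threshold, or None if every prediction complies.

    Assumption: a value is non-compliant when it exceeds the threshold.
    """
    for ts, value in zip(timestamps, predictions):
        if value > threshold:
            return ts, value
    return None


# Hypothetical predicted CPU usage (%) for the next four hourly time points.
hit = first_failure(
    [71.0, 84.5, 93.2, 97.8],
    ["10:00", "11:00", "12:00", "13:00"],
    threshold=90.0,
)
```

The returned pair is exactly what a later alarm step would need: the failure time point and the predicted value at that point.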
According to another aspect of the present application, a training device for a fault prediction neural network model is also provided, wherein the device includes:
a first device, configured to obtain a historical indicator data set of a monitored point, wherein the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points;
a second device, configured to process the historical indicator data set based on a preset period and sampling frequency to determine a training set and a test set;
a third device, configured to train a neural network based on the training set until the output error of the neural network meets a first preset threshold, and to test the neural network based on the test set; if the accuracy rate meets a second preset threshold, a trained neural network model for fault prediction of the monitored point is obtained.
Optionally, the device further includes:
a fourth device, configured to preprocess the historical indicator data set to eliminate the influence of abnormal historical indicator data.
Optionally, the device further includes:
a fifth device, configured to obtain the indicator data sequence at the current time point, wherein the indicator data sequence at the current time point is composed of the N historical indicator data before the current time point;
a sixth device, configured to determine the number of iterative extrapolation predictions M based on a preset prediction time length, the preset period and the sampling number N;
a seventh device, configured to input the indicator data sequence at the current time point into the trained neural network model for fault prediction of the monitored point and perform M iterations of extrapolation prediction to obtain the predicted indicator data of M monitored points.
Optionally, the device further includes:
an eighth device, configured to compare the predicted indicator data of the M monitored points with a third preset threshold in chronological order, and to determine the time point corresponding to the first non-compliant predicted indicator data as the failure time point.
Optionally, the device further includes:
a ninth device, configured to determine alarm information based on the failure time point and the corresponding predicted indicator data, and to report the alarm information.
Compared with the prior art, the training method and device of the present application first obtain the historical indicator data set of the monitored point, then process the historical indicator data set based on a preset period to determine a training set and a test set, and then train a neural network based on the training set until the output error of the neural network meets a first preset threshold and test the neural network based on the test set; if the accuracy rate meets a second preset threshold, a trained neural network model for fault prediction of the monitored point is obtained. The trained model can be used to predict faults in the operating state or service state of the monitored computer. This lets operation and maintenance personnel intervene in advance and prevent failures from occurring, which increases MTBF (Mean Time Between Failure); or, when a failure is unavoidable, prepare for recovery ahead of time and eliminate the failure promptly when it occurs, which reduces MTTR (Mean Time To Restoration).
Brief Description of the Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 shows a flowchart of a training method for a fault prediction neural network model according to one aspect of the present application;
FIG. 2 shows a schematic diagram of a training device for a fault prediction neural network model according to one aspect of the present application;
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings.
In a typical configuration of the present application, each module of the system and each trusted party includes one or more processors (CPUs), input/output interfaces, network interfaces and memory.
Memory may include non-persistent memory in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
To further explain the technical means adopted by the present application and the effects achieved, the technical solutions of the present application are described clearly and completely below with reference to the accompanying drawings and preferred embodiments.
FIG. 1 shows a flowchart of a training method for a fault prediction neural network model according to one aspect of the present application, wherein the method of one embodiment includes:
S11: obtaining a historical indicator data set of a monitored point, wherein the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points;
S12: processing the historical indicator data set based on a preset period to determine a training set and a test set;
S13: training a neural network based on the training set until the output error of the neural network meets a first preset threshold, and testing the neural network based on the test set; if the accuracy rate meets a second preset threshold, a trained neural network model for fault prediction of the monitored point is obtained.
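The control flow of step S13 (train until the error threshold is met, then test against the accuracy threshold) can be sketched as below. Several things here are assumptions rather than the application's method: a one-parameter `ScaleModel` stands in for the neural network so the sketch runs without a deep-learning framework, "accuracy" is taken to be the share of test predictions within a tolerance of the label, and all names and numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (indicator sequence, ground-truth label)


class ScaleModel:
    """Stand-in for the neural network: predicts w * (last value of the sequence)."""

    def __init__(self) -> None:
        self.w = 0.0

    def __call__(self, x: Sequence[float]) -> float:
        return self.w * x[-1]


def gradient_step(model: ScaleModel, samples: List[Sample], lr: float = 0.05) -> None:
    """One gradient-descent update of w on the mean square error."""
    grad = sum(2 * (model(x) - y) * x[-1] for x, y in samples) / len(samples)
    model.w -= lr * grad


def mse(model, samples: List[Sample]) -> float:
    return sum((model(x) - y) ** 2 for x, y in samples) / len(samples)


def accuracy(model, samples: List[Sample], tolerance: float) -> float:
    """Assumed definition: share of predictions within `tolerance` of the label."""
    hits = sum(1 for x, y in samples if abs(model(x) - y) <= tolerance)
    return hits / len(samples)


def train_and_validate(model, train_set, test_set, error_threshold,
                       accuracy_threshold, tolerance, max_epochs=1000):
    """Return the model if both thresholds are met, otherwise None."""
    for _ in range(max_epochs):
        if mse(model, train_set) <= error_threshold:       # first preset threshold
            if accuracy(model, test_set, tolerance) >= accuracy_threshold:
                return model                                 # second preset threshold
            return None
        gradient_step(model, train_set)
    return None  # error threshold never reached within max_epochs


# Made-up data in which the label is always twice the last value.
train_set = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]
test_set = [([4.0], 8.0)]
trained = train_and_validate(ScaleModel(), train_set, test_set,
                             error_threshold=1e-6, accuracy_threshold=0.9,
                             tolerance=0.01)
```

With a real network, `gradient_step` would be replaced by one optimisation pass of the chosen framework; the two threshold checks stay the same.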
In the present application, the method is performed by a device 1, which is a computer device and/or a cloud. The computer device includes, but is not limited to, a personal computer, a laptop computer, an industrial computer, a network host, a single network server, or a set of multiple network servers. The cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing: a virtual supercomputer made up of a group of loosely coupled computers.
The computer device and/or cloud described here are only examples. Other existing or future devices and/or resource-sharing platforms, if applicable to the present application, should also fall within its scope of protection and are hereby incorporated by reference.
In this embodiment, in step S11, the device 1 obtains the historical indicator data set of the monitored point, wherein the monitored point is set on a monitored device, or is set for the status of a service running on the monitored device.
For example, the device 1 directly obtains the indicator data of a monitored point on the monitored device, such as the CPU or memory of a monitored computer, or the indicator data of a service running on the monitored device, such as the network traffic of the monitored computer, by polling over SNMP (Simple Network Management Protocol) or another protocol, by push from a monitoring agent, or by similar means.
For indicator data that cannot be obtained directly, the device 1 can first obtain the operation logs of the operating system or services on the monitored device and then process those logs, for example by classifying them by keyword or with a clustering algorithm, to obtain counts of operations of the same log type at different time points or within different time windows.
For example, the access log obtained on the device 192.168.212.22 for the monitored computer 192.168.211.124 accessing port 30443 is as follows:
192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:16:25:03 +0800] "GET / HTTP/1.1" 200 10799 "https://192.168.211.22:30443/verifypasslog/detail/id/6" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:16:25:03 +0800] "GET /themes/blue/css/login.css HTTP/1.1" 200 5530 "https://192.168.211.22:30443/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:17:25:03 +0800] "GET /js/jquery-1.7.2.min.js HTTP/1.1" 304 0 "https://192.168.211.22:30443/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:18:25:03 +0800] "GET /js/jquery.placeholder.js HTTP/1.1" 304 0 "https://192.168.211.22:30443/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:19:25:03 +0800] "GET /default/getcodes HTTP/1.1" 200 5892 "https://192.168.211.22:30443/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:19:25:03 +0800] "GET /themes/blue/images/oma_login_bg.jpg HTTP/1.1" 304 0 "https://192.168.211.22:30443/themes/blue/css/login.css" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
If one hour is used as the collection time window and the number of visits within each hour is counted, a data set of visit counts at different time points is obtained, as shown in Table 1.
Table 1
Collection time window | Number of visits
16:00 | 2
17:00 | 1
18:00 | 1
19:00 | 2
Similarly, a data set of the number of visits to the /default/getcodes page at different time points can be compiled, as shown in Table 2.
Table 2
Collection time window | Number of visits
16:00 | 0
17:00 | 0
18:00 | 0
19:00 | 1
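The hourly counts in Tables 1 and 2 can be reproduced with a short script such as the sketch below. It is only an illustration: the function name and the regular expression are choices made here, and `log_lines` holds abbreviated copies of the six sample log lines (referrer and user agent omitted).

```python
import re
from collections import Counter

# Captures the hour from a timestamp such as [16/Aug/2017:16:25:03 +0800].
TIMESTAMP = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}):\d{2}:\d{2}")


def visits_per_hour(lines, path_filter=None):
    """Count log lines per one-hour window.

    If `path_filter` is given, only lines whose text contains it are counted.
    """
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines without a recognisable timestamp
        if path_filter is not None and path_filter not in line:
            continue
        counts[match.group(1) + ":00"] += 1
    return counts


log_lines = [
    '192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:16:25:03 +0800] "GET / HTTP/1.1" 200 10799',
    '192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:16:25:03 +0800] "GET /themes/blue/css/login.css HTTP/1.1" 200 5530',
    '192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:17:25:03 +0800] "GET /js/jquery-1.7.2.min.js HTTP/1.1" 304 0',
    '192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:18:25:03 +0800] "GET /js/jquery.placeholder.js HTTP/1.1" 304 0',
    '192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:19:25:03 +0800] "GET /default/getcodes HTTP/1.1" 200 5892',
    '192.168.212.124 192.168.211.22:30443 - [16/Aug/2017:19:25:03 +0800] "GET /themes/blue/images/oma_login_bg.jpg HTTP/1.1" 304 0',
]

all_visits = visits_per_hour(log_lines)
getcodes_visits = visits_per_hour(log_lines, path_filter="/default/getcodes")
```

A `Counter` returns 0 for hours with no matching lines, which corresponds to the zero rows in Table 2.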
In step S11, the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points, wherein the interval between adjacent historical time points is constant, and the historical indicator data set is required to contain enough monitoring indicator data for training.
Continuing with this embodiment, in step S12, the device 1 processes the obtained historical indicator data set based on a preset period to determine a training set and a test set.
Optionally, processing the historical indicator data set based on a preset period to determine the training set and the test set includes:
determining a sampling number N based on the preset period;
traversing the historical indicator data in the historical indicator data set and constructing historical indicator data sequences for different time points, wherein the historical indicator data sequence for a given time point is composed of the N historical indicator data before that time point;
taking the historical indicator data at each time point as the ground-truth label of the historical indicator data sequence corresponding to that time point;
determining the training set and the test set based on the historical indicator data sequences and the ground-truth labels, wherein each sample in the training set and the test set includes the historical indicator data sequence for a time point and the corresponding ground-truth label.
For example, a preset time period $T$ contains monitoring indicator data collected at $N$ time points. If the time interval $T_b$ between adjacent time points is constant, the number of collection time points within one preset period is $N = T / T_b$. If $T_b = 1$, then $N$ equals $T$ numerically; that is, one preset time period $T$ contains $T$ time points, corresponding to $T$ historical indicator data. The $T+1$ historical indicator data preceding $s_t$, the historical indicator datum collected at time point $t$ (numerically, $t \ge T+1$), namely $s_{t-T+1}$, $s_{t-T}$, $s_{t-T-1}$, $\ldots$, $s_{t-2}$, $s_{t-1}$, form a historical indicator data sequence $S_{t-1} = \{s_{t-T+1}, s_{t-T}, s_{t-T-1}, \ldots, s_{t-2}, s_{t-1}\}$ containing $T+1$ historical indicator data, and the historical indicator datum $s_t$ collected at time point $t$ is taken as the ground-truth label of the sequence $S_{t-1}$.
Based on the historical indicator data sequence $S_{t-1}$ and the ground-truth label $s_t$, a sample corresponding to time point $t$ can be determined. By traversing the historical indicator data in the historical indicator data set $P$, samples corresponding to different historical time points can be constructed, wherein each sample includes the historical indicator data sequence corresponding to a historical time point and, as the ground-truth label of that sequence, the historical indicator datum collected at that time point. All samples together form the sample set, which is divided into a training set and a test set. The division ratio may be 4:1 or another ratio, and can be adjusted according to how training actually goes in the subsequent training of the neural network.
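The sample construction and 4:1 split described above can be sketched as follows. This is a minimal illustration with assumptions: the window length is left as a parameter, the split is chronological (the text does not say how the samples are partitioned), and the function names and the numeric series are made up.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (indicator sequence, ground-truth label)


def build_samples(series: Sequence[float], window: int) -> List[Sample]:
    """Pair each value with the `window` values immediately before it."""
    return [
        (list(series[t - window:t]), series[t])
        for t in range(window, len(series))
    ]


def split_samples(samples: List[Sample], train_ratio: float = 0.8):
    """Chronological split; 0.8 corresponds to the 4:1 ratio in the text."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


series = [float(v) for v in range(1, 13)]  # 12 made-up readings
samples = build_samples(series, window=2)
train_set, test_set = split_samples(samples)
```

A chronological split keeps the test samples strictly later than the training samples, which avoids evaluating on windows that overlap heavily with training windows; a shuffled split is an equally valid reading of the text.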
Optionally, before constructing the historical indicator data sequences at different time points, the method further includes:
preprocessing the historical indicator data set to eliminate the influence of abnormal historical indicator data.
Individual values in the historical indicator data set obtained by device 1 may be abnormal because of incidental conditions (deviating from the normal high or low values), and several consecutive values may show a steadily increasing or decreasing trend. To handle such possible anomalies, the historical indicator data set may be preprocessed before it is used to determine the training set and the test set, so as to eliminate the influence of abnormal historical indicator data.
For example, the data in the historical indicator data set P may be processed with a moving average. For the historical indicator datum s_t collected at historical time point t, the average of the n historical indicator data before s_t may be taken as the historical indicator datum s'_t for time point t; alternatively, the average of n historical indicator data around s_t (before and after it) may be taken as s'_t. Here n should be much smaller than the number of data in the historical indicator data set, so that the averaged data still reflect the actual condition of the monitored point; the way the n historical indicator data before and after s_t are selected is not specifically limited. Traversing all data in the historical indicator data set in this way yields a new historical indicator data set P' composed of the new historical indicator data s'_t for the different historical time points t. P' contains n fewer data than P when the n data used for the moving average do not include s_t, or (n-1) fewer when they do include s_t. The new historical indicator data set P' is then processed based on the preset period to determine the training set and the test set.
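A minimal sketch of the two moving-average variants follows. The function names are hypothetical, and the second variant assumes a window that ends at s_t; the text leaves the exact choice of the n neighbouring values open, so this is only one possible selection.

```python
from typing import List, Sequence


def trailing_moving_average(data: Sequence[float], n: int) -> List[float]:
    """Replace each s_t by the mean of the n values before it (s_t excluded).

    The first n points have no full window, so the result is n shorter.
    """
    if not 0 < n < len(data):
        raise ValueError("n must be positive and smaller than the data length")
    return [sum(data[t - n:t]) / n for t in range(n, len(data))]


def inclusive_moving_average(data: Sequence[float], n: int) -> List[float]:
    """Variant whose window of n values ends at, and includes, s_t.

    The result is n - 1 shorter than the input.
    """
    if not 0 < n <= len(data):
        raise ValueError("n must be positive and at most the data length")
    return [sum(data[t - n + 1:t + 1]) / n for t in range(n - 1, len(data))]


raw = [10.0, 12.0, 11.0, 95.0, 12.0, 13.0]  # 95.0 is an isolated spike
smoothed_excl = trailing_moving_average(raw, 3)
smoothed_incl = inclusive_moving_average(raw, 3)
print(len(raw) - len(smoothed_excl))  # 3, i.e. n fewer values
print(len(raw) - len(smoothed_incl))  # 2, i.e. n - 1 fewer values
```

With n = 3 the spike of 95.0 is spread over three averaged values instead of appearing as a single outlier, which is the smoothing effect the preprocessing step relies on.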
Continuing with this embodiment, in step S13 device 1 inputs the historical indicator data sequences of the training-set samples into the neural network for training and compares the output of the neural network with the ground-truth label of each sample to obtain the output error, until the output error meets the preset threshold. The resulting neural network model is then tested with the test-set samples; if the accuracy meets the preset threshold, the trained neural network is determined to be the trained failure prediction neural network model for the monitored point.
For example, device 1 obtains the CPU usage historical data set P of the monitored computer and preprocesses it to obtain a new CPU usage historical data set P', which contains the CPU usage data s'_t corresponding to the different historical time points t. The new data set P' is processed to obtain the CPU usage data sequences S'_{t-1} for the different historical time points t, and s'_t is taken as the ground-truth label of S'_{t-1}, so that S'_{t-1} and s'_t together form one sample. Traversing P' in this way, the samples obtained for the different historical time points are divided into a CPU usage training set and a test set; the division ratio may be 4:1 or another ratio, and may be adjusted during subsequent training according to how training actually proceeds. The CPU usage data sequence S'_{t-1} corresponding to time point t in the training set is input into the neural network, and the network output is compared with the ground-truth label s'_t of S'_{t-1} to obtain the output error. The loss function value of the neural network is obtained from the output errors of the different training samples. Training is complete when the loss function value meets the preset threshold or when the preset number of training iterations has been completed. The trained neural network is then tested on all samples in the test set; if the accuracy meets the preset threshold, the trained neural network is determined to be the trained CPU usage failure prediction neural network model for the monitored computer.
Optionally, the neural network is an LSTM (Long Short-Term Memory) neural network, and the structure of the LSTM neural network includes:
1 input layer;
2 LSTM hidden layers;
1 fully connected output layer.
The LSTM network structure in one embodiment is shown in Table 3 below.
Table 3
Figure PCTCN2021090028-appb-000001
The technical solution of the present application does not limit the neural network structure. An LSTM network may be used, as may other neural networks commonly used in the prior art, such as an ANN (Artificial Neural Network) or a CNN (Convolutional Neural Network), and no special adjustment to the neural network structure is required.
Optionally, the output error includes a mean square error.
For example, in the above embodiment concerning the monitored CPU usage, the CPU usage data sequence S'_{t-1} corresponding to time point t in the CPU usage training set is input into the neural network, and the network output is compared with the ground-truth label s'_t of S'_{t-1} to obtain the output mean square error of the training sample at time point t. The mean square error loss function value of the neural network is obtained from the output mean square errors of the different training samples. Training is complete when the mean square error loss function value meets the preset threshold or when the preset number of training iterations has been completed. The trained neural network is then tested on all samples in the test set; if the accuracy meets the preset threshold, the trained neural network is determined to be the trained CPU usage failure prediction neural network model for the monitored computer.
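The mean square error loss and the two stopping conditions (loss threshold reached, or iteration budget exhausted) can be sketched as below. To keep the example self-contained, a one-parameter stand-in model trained by gradient descent replaces the LSTM; `LastValueScaler`, `mse_loss`, `train_until`, the learning rate, and the threshold values are all illustrative assumptions rather than anything specified in the application.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (input sequence, ground-truth label)


def mse_loss(predict: Callable[[Sequence[float]], float],
             samples: Sequence[Sample]) -> float:
    """Mean of the squared differences between predictions and labels."""
    return sum((predict(seq) - label) ** 2 for seq, label in samples) / len(samples)


def train_until(step: Callable[[Sequence[Sample]], None],
                predict: Callable[[Sequence[float]], float],
                samples: Sequence[Sample],
                loss_threshold: float,
                max_epochs: int) -> Tuple[int, float]:
    """Train until the loss meets the threshold or the epoch budget is spent."""
    loss = mse_loss(predict, samples)
    epochs = 0
    while loss > loss_threshold and epochs < max_epochs:
        step(samples)          # one pass over the training samples
        epochs += 1
        loss = mse_loss(predict, samples)
    return epochs, loss


class LastValueScaler:
    """Hypothetical stand-in model (not an LSTM): predicts w * last input value."""

    def __init__(self) -> None:
        self.w = 0.0

    def predict(self, seq: Sequence[float]) -> float:
        return self.w * seq[-1]

    def step(self, samples: Sequence[Sample], lr: float = 0.01) -> None:
        # One gradient-descent update of w on the MSE loss.
        grad = sum(2 * (self.predict(s) - y) * s[-1] for s, y in samples) / len(samples)
        self.w -= lr * grad


model = LastValueScaler()
train_samples = [([1.0, 2.0], 4.0), ([2.0, 3.0], 6.0), ([3.0, 4.0], 8.0)]
epochs, loss = train_until(model.step, model.predict, train_samples,
                           loss_threshold=1e-4, max_epochs=200)
print(epochs, loss)
```

In a real setup the `step` and `predict` callables would wrap the chosen network and optimizer; the control flow around them would stay the same.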
Optionally, the training method for a failure prediction neural network model further includes:
obtaining the indicator data sequence of the current time point, where the indicator data sequence of the current time point is composed of the N historical indicator data before the current time point;
determining the number M of iterative extrapolation predictions based on a preset prediction time length, the preset period, and the sampling number N;
inputting the indicator data sequence of the current time point into the trained failure prediction neural network model for the monitored point and performing M iterative extrapolation predictions to obtain M predicted indicator data of the monitored point.
Here, device 1 obtains the indicator data sequence S_{tp} used to predict the time point t_{p+1} that follows the current time point t_p. S_{tp} contains T+1 indicator data: the indicator datum s_{tp} of the monitored point collected at the current time point, and the T historical indicator data s_{tp-T}, s_{tp-T+1}, …, s_{tp-2}, s_{tp-1} of the monitored point collected at the historical time points within one preset period T before s_{tp}. The number of indicator data contained in S_{tp} is the same as the number of historical indicator data contained in each historical indicator data sequence of the training set and the test set.
Based on the preset prediction time length T_f, the preset period T, and the sampling number N, the number M of iterative extrapolation predictions can be determined, where M is calculated as follows:
M = (T_f / T) × N
The sequence S_{tp} = {s_{tp-T}, s_{tp-T+1}, …, s_{tp-2}, s_{tp-1}, s_{tp}} is input into the trained failure prediction neural network model for the monitored point to obtain the predicted indicator datum s_{tp+1} of the monitored point at t_{p+1}, the time point following the current time point t_p. The indicator data sequence S_{tp+1} = {s_{tp-T+1}, …, s_{tp-2}, s_{tp-1}, s_{tp}, s_{tp+1}} for time point t_{p+2} is then constructed and input into the trained model to obtain the predicted indicator datum s_{tp+2} of the monitored point at t_{p+2}. Performing M iterative extrapolation predictions in this way yields the predicted indicator data s_{tp+3}, s_{tp+4}, …, s_{tp+M} of the monitored point at the time points t_{p+3}, t_{p+4}, …, t_{p+M}.
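The iterative extrapolation can be sketched as a rolling window that drops its oldest value and appends each new prediction. The names `extrapolation_steps` and `iterative_forecast` are hypothetical, the averaging function stands in for the trained model, and rounding M to a whole number is an assumption for the case where T_f is not a multiple of T.

```python
from typing import Callable, List, Sequence


def extrapolation_steps(horizon: float, period: float, n_per_period: int) -> int:
    """M = (T_f / T) * N, rounded to a whole number of steps (assumption)."""
    return round(horizon / period * n_per_period)


def iterative_forecast(predict: Callable[[Sequence[float]], float],
                       window: Sequence[float], steps: int) -> List[float]:
    """Predict one step, slide the window forward by one, and repeat."""
    current = list(window)
    forecasts: List[float] = []
    for _ in range(steps):
        nxt = predict(current)
        forecasts.append(nxt)
        current = current[1:] + [nxt]  # drop the oldest value, append the prediction
    return forecasts


def mean_of_last_two(seq: Sequence[float]) -> float:
    """Stand-in for the trained model, used only to make the sketch runnable."""
    return (seq[-1] + seq[-2]) / 2


m = extrapolation_steps(horizon=24, period=24, n_per_period=24)
print(m)                                                         # 24
print(iterative_forecast(mean_of_last_two, [1.0, 2.0, 4.0], 3))  # [3.0, 3.5, 3.25]
```

Because every step after the first consumes earlier predictions, errors can accumulate over the M steps; the sketch makes no attempt to correct for that.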
For example, in the above embodiment concerning the monitored CPU usage, suppose the historical indicator data are collected at intervals of 1 hour at the time points t_{1-1}, t_{1-2}, …, t_{10-23}, t_{10-24}, 240 collection time points in total, and the corresponding CPU usage data are D_{1-1}, D_{1-2}, …, D_{10-23}, D_{10-24}. These 240 CPU usage data, collected over 10 days, form the CPU usage historical data set. If the selected preset period T is 24 hours, then N is 24, and each CPU usage historical data sequence determined from the CPU usage historical data set contains 25 elements. The training samples obtained are shown in Table 4 below.
Table 4
Sequence | CPU usage data | Ground-truth label
S_{2-1} | D_{1-1}, D_{1-2}, …, D_{1-23}, D_{1-24}, D_{2-1} | D_{2-2}
S_{2-2} | D_{1-2}, D_{1-3}, …, D_{1-24}, D_{2-1}, D_{2-2} | D_{2-3}
S_{2-3} | D_{1-3}, D_{1-4}, …, D_{2-1}, D_{2-2}, D_{2-3} | D_{2-4}
… | … | …
S_{10-21} | D_{9-21}, D_{9-22}, …, D_{10-19}, D_{10-20}, D_{10-21} | D_{10-22}
S_{10-22} | D_{9-22}, D_{9-23}, …, D_{10-20}, D_{10-21}, D_{10-22} | D_{10-23}
S_{10-23} | D_{9-23}, D_{9-24}, …, D_{10-21}, D_{10-22}, D_{10-23} | D_{10-24}
The above samples are divided in a 4:1 ratio into a CPU usage training set and a test set, and the LSTM neural network is trained to obtain the trained CPU usage prediction neural network model. The sequence S_{10-24} = {D_{9-24}, D_{10-1}, …, D_{10-22}, D_{10-23}, D_{10-24}} is then input into the trained model to predict D_{11-1}, the CPU usage 1 hour ahead. Next, the sequence S_{11-1} = {D_{10-1}, D_{10-2}, …, D_{10-24}, D_{11-1}} is formed and input into the trained model to predict D_{11-2}, the CPU usage 2 hours ahead. If the preset prediction time length is 24 hours, 24 iterative extrapolation predictions yield the 24 predicted CPU usage values D_{11-1}, D_{11-2}, …, D_{11-23}, D_{11-24}.
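The bookkeeping of this example (240 hourly values, 25-element sequences, 4:1 division) can be checked with a short script. The label strings and the chronological split are illustrative assumptions; only the counts follow from the figures given in the text.

```python
# Labels D_{1-1} .. D_{10-24}: 10 days of hourly CPU usage readings.
labels = [f"D_{{{day}-{hour}}}" for day in range(1, 11) for hour in range(1, 25)]

window = 25  # T + 1 elements per sequence, with T = 24 hours sampled hourly
samples = [(labels[end - window:end], labels[end])
           for end in range(window, len(labels))]

first_inputs, first_truth = samples[0]
print(len(labels))                                     # 240
print(first_inputs[0], first_inputs[-1], first_truth)  # D_{1-1} D_{2-1} D_{2-2}
print(len(samples))                                    # 215

cut = len(samples) * 4 // 5      # 4:1 division, taken in time order (assumption)
print(cut, len(samples) - cut)   # 172 43
```

The first sample reproduces the first row of Table 4: inputs D_{1-1} through D_{2-1} with ground-truth label D_{2-2}.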
Optionally, the training method for a failure prediction neural network model further includes:
comparing the M predicted indicator data of the monitored point, in chronological order, with a third preset threshold, and determining the time point corresponding to the first non-compliant predicted indicator datum as the failure time point.
For example, in the above embodiment concerning the monitored CPU usage, the 24 CPU usage values obtained through 24 iterative extrapolation predictions, D_{11-1}, D_{11-2}, …, are compared in chronological order with a preset threshold (for example, 90%), and the time point corresponding to the first predicted value that exceeds 90% CPU usage is determined as the failure time point.
Alternatively, the predicted value obtained at each step may be compared with the preset threshold immediately. If the predicted value complies with the threshold, the next time point is predicted by a further iterative extrapolation; if it does not comply, the iterative extrapolation prediction ends.
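Both ways of locating the failure time point, scanning a finished list of predictions or stopping the extrapolation at the first violation, can be sketched as follows. The function names are hypothetical, "non-compliant" is interpreted as strictly exceeding the threshold as in the 90% CPU example, and the `predict` callable stands in for the trained model.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def first_failure(forecasts: Sequence[float], threshold: float) -> Optional[int]:
    """Return the 1-based step of the first forecast above the threshold, else None."""
    for step, value in enumerate(forecasts, start=1):
        if value > threshold:
            return step
    return None


def forecast_until_failure(predict: Callable[[Sequence[float]], float],
                           window: Sequence[float], max_steps: int,
                           threshold: float) -> Tuple[Optional[int], List[float]]:
    """Extrapolate step by step and stop at the first threshold violation."""
    current = list(window)
    forecasts: List[float] = []
    for step in range(1, max_steps + 1):
        value = predict(current)
        forecasts.append(value)
        if value > threshold:
            return step, forecasts      # failure found: no further extrapolation
        current = current[1:] + [value]
    return None, forecasts              # no violation within max_steps


print(first_failure([70.0, 85.0, 92.0, 95.0], threshold=90.0))  # 3


def rising(seq: Sequence[float]) -> float:
    """Stand-in model: CPU usage grows by 5 points per step."""
    return seq[-1] + 5.0


print(forecast_until_failure(rising, [80.0, 85.0], max_steps=24, threshold=90.0))
# (2, [90.0, 95.0])
```

With hourly sampling, step k corresponds to k hours after the current time point, so the returned step maps directly to the failure time point.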
Optionally, the training method for a failure prediction neural network model further includes:
determining alarm information based on the failure time point and the corresponding predicted indicator datum, and reporting the alarm information.
Once the failure time point and the corresponding predicted indicator datum have been determined, they can be used as the alarm information, or as part of it, and reported to the stakeholders associated with the monitored point so that they can respond promptly with appropriate measures. For example, operation and maintenance personnel can intervene in advance to eliminate hidden faults and prevent failures from occurring, which effectively increases MTBF; or, when a failure is unavoidable, they can handle it immediately, which shortens the handling time and effectively reduces MTTR.
Figure 2 shows a schematic diagram of a training device for a failure prediction neural network model according to another aspect of the present application, where the device includes:
a first device 21, configured to obtain a historical indicator data set of a monitored point, where the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points;
a second device 22, configured to process the historical indicator data set based on a preset period and a sampling frequency to determine a training set and a test set;
a third device 23, configured to train a neural network based on the training set until the output error of the neural network meets a first preset threshold, and to test the neural network based on the test set; if the accuracy meets a second preset threshold, the trained failure prediction neural network model for the monitored point is obtained.
The first device 21 of device 1 obtains the historical indicator data set of the monitored point, where the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points. The second device 22 processes the historical indicator data set obtained by the first device 21, based on the preset period and the sampling frequency, to determine the training set and the test set. The third device 23 trains the neural network based on the training set determined by the second device 22 until the output error of the neural network meets the first preset threshold, and tests the neural network based on the test set; if the accuracy meets the second preset threshold, the trained failure prediction neural network model for the monitored point is obtained.
Optionally, the training device for a failure prediction neural network model further includes:
a fourth device 24 (not shown), configured to preprocess the historical indicator data set to eliminate the influence of abnormal historical indicator data.
The fourth device 24 of device 1 preprocesses the historical indicator data set obtained by the first device 21 to eliminate the influence of abnormal historical indicator data, and the second device 22 processes the historical indicator data set preprocessed by the fourth device 24, based on the preset period and the sampling frequency, to determine the training set and the test set.
Optionally, the training device for a failure prediction neural network model further includes:
a fifth device 25 (not shown), configured to obtain the indicator data sequence of the current time point, where the indicator data sequence of the current time point is composed of the N historical indicator data before the current time point;
a sixth device 26 (not shown), configured to determine the number M of iterative extrapolation predictions based on a preset prediction time length, the preset period, and the sampling number N;
a seventh device 27 (not shown), configured to input the indicator data sequence of the current time point into the trained failure prediction neural network model for the monitored point and perform M iterative extrapolation predictions to obtain M predicted indicator data of the monitored point.
The fifth device 25 of device 1 obtains the indicator data sequence of the current time point, which is composed of the N historical indicator data before the current time point. The sixth device 26 determines the number M of iterative extrapolation predictions based on the preset prediction time length, the preset period, and the sampling number N. The seventh device 27 inputs the indicator data sequence of the current time point obtained by the fifth device 25 into the trained failure prediction neural network model obtained by the third device 23 and performs M iterative extrapolation predictions to obtain M predicted indicator data of the monitored point.
Optionally, the training device for a failure prediction neural network model further includes:
an eighth device 28 (not shown), configured to compare the M predicted indicator data of the monitored point, in chronological order, with a third preset threshold, and to determine the time point corresponding to the first non-compliant predicted indicator datum as the failure time point.
The eighth device 28 of device 1 compares the M predicted indicator data of the monitored point obtained by the seventh device 27, in chronological order, with the third preset threshold, and determines the time point corresponding to the first non-compliant predicted indicator datum as the failure time point.
Optionally, the training device for a failure prediction neural network model further includes:
a ninth device 29 (not shown), configured to determine alarm information based on the failure time point and the corresponding predicted indicator datum, and to report the alarm information.
The ninth device 29 of device 1 obtains the failure time point and the corresponding predicted indicator datum determined by the eighth device 28, determines them as the alarm information or as part of it, and reports the alarm information to the stakeholders associated with the monitored point so that they can respond promptly with appropriate measures. For example, operation and maintenance personnel can intervene in advance to eliminate hidden faults and prevent failures from occurring, which effectively increases MTBF; or, when a failure is unavoidable, they can handle it immediately, which shortens the handling time and effectively reduces MTTR.
According to yet another aspect of the present application, a computer-readable medium is also provided. The computer-readable medium stores computer-readable instructions that can be executed by a processor to implement the foregoing method.
According to yet another aspect of the present application, a training device for a failure prediction neural network model is also provided, where the device includes:
one or more processors; and
a memory storing computer-readable instructions that, when executed, cause the processor to perform the operations of the foregoing method.
For example, the computer-readable instructions, when executed, cause the one or more processors to: obtain a historical indicator data set of the monitored point; preprocess the historical indicator data set to eliminate the influence of abnormal historical indicator data; process the historical indicator data set based on a preset period to determine a training set and a test set; train a neural network based on the training set until the output error of the neural network meets a first preset threshold, and test the neural network based on the test set, obtaining the trained failure prediction neural network model for the monitored point if the accuracy meets a second preset threshold; obtain the indicator data sequence of the current time point; determine the number M of iterative extrapolation predictions based on a preset prediction time length, the preset period, and the sampling number; input the indicator data sequence of the current time point into the trained failure prediction neural network model and perform M iterative extrapolation predictions to obtain M predicted indicator data of the monitored point; compare the M predicted indicator data, in chronological order, with a third preset threshold and determine the time point corresponding to the first non-compliant predicted indicator datum as the failure time point; and determine alarm information based on the failure time point and the corresponding predicted indicator datum, and report the alarm information.
It will be apparent to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be included in the present invention. No reference sign in the claims should be construed as limiting the claim concerned. In addition, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (15)

  1. A training method for a failure prediction neural network model, characterized in that the method comprises:
    obtaining a historical indicator data set of a monitored point, wherein the historical indicator data set is composed of monitoring indicator data of the monitored point collected at different historical time points;
    processing the historical indicator data set based on a preset period to determine a training set and a test set;
    training a neural network based on the training set until the output error of the neural network meets a first preset threshold, and testing the neural network based on the test set, and, if the accuracy meets a second preset threshold, obtaining a trained failure prediction neural network model for the monitored point.
  2. The method according to claim 1, characterized in that processing the historical indicator data set based on a preset period to determine a training set and a test set comprises:
    determining a sampling number N based on the preset period;
    traversing the historical indicator data in the historical indicator data set to construct historical indicator data sequences at different time points, wherein the historical indicator data sequence at each time point is composed of the N historical indicator data before that time point;
    determining the historical indicator datum at each time point as the ground-truth label of the historical indicator data sequence corresponding to that time point;
    determining the training set and the test set based on the historical indicator data sequences and the ground-truth labels, wherein the samples in the training set and the test set comprise the historical indicator data sequences at different time points and the corresponding ground-truth labels.
  3. The method according to claim 2, characterized in that, before constructing the historical indicator data sequences at different time points, the method further comprises:
    preprocessing the historical indicator data set to eliminate the influence of abnormal historical indicator data.
  4. The method according to any one of claims 1 to 3, characterized in that the neural network is an LSTM neural network, and the structure of the LSTM neural network comprises:
    1 input layer;
    2 LSTM hidden layers;
    1 fully connected output layer.
  5. The method according to any one of claims 1 to 4, characterized in that the output error comprises a mean square error.
  6. The method according to claim 1, wherein the method further comprises:
    obtaining an indicator data sequence for the current time point, wherein the indicator data sequence for the current time point consists of the N historical indicator data items preceding the current time point;
    determining the number M of iterative extrapolation predictions based on a preset prediction time length, the preset period, and the number of samples N;
    inputting the indicator data sequence for the current time point into the trained monitored-point failure prediction neural network model and performing M iterative extrapolation predictions to obtain M items of predicted indicator data for the monitored point.
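The iterative extrapolation in claim 6 can be sketched as a loop in which each prediction is appended to the input window and the oldest value is dropped. The claim does not give the formula for M. The sketch assumes that one preset period contains N evenly spaced samples, so that the sampling interval is period / N and M is the prediction time length divided by that interval, rounded up. The names `extrapolation_steps`, `iterative_forecast`, and `toy_model` are hypothetical, and `toy_model` is a moving average standing in for the trained network.

```python
import math
from typing import Callable, List, Sequence


def extrapolation_steps(horizon: float, period: float, n: int) -> int:
    """ASSUMPTION: a period holds n evenly spaced samples; M covers the horizon."""
    interval = period / n
    return math.ceil(horizon / interval)


def iterative_forecast(model: Callable[[List[float]], float],
                       window: Sequence[float], m: int) -> List[float]:
    """Predict m steps ahead, feeding each prediction back into the window."""
    window = list(window)
    predictions = []
    for _ in range(m):
        y = model(window)
        predictions.append(y)
        window = window[1:] + [y]  # slide forward: drop oldest, append newest
    return predictions


def toy_model(window: List[float]) -> float:
    """Stand-in for the trained network: a plain moving average."""
    return sum(window) / len(window)


m = extrapolation_steps(horizon=2.0, period=4.0, n=4)  # interval 1.0, so M = 2
forecast = iterative_forecast(toy_model, [1.0, 2.0, 3.0, 6.0], m)
```

Because every step after the first consumes earlier predictions, errors can compound as M grows, which is a general property of this kind of recursive forecasting.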
  7. The method according to claim 6, wherein the method further comprises:
    comparing the M items of predicted indicator data for the monitored point with a third preset threshold in chronological order, and determining the time point corresponding to the first item of predicted indicator data that fails to meet the threshold as the failure time point.
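The chronological comparison in claim 7 can be sketched as a scan that stops at the first violation. The sketch assumes that a prediction "fails to meet" the third preset threshold when it exceeds it, as would be the case for an indicator such as CPU utilisation; for other indicators the comparison could run the other way. The function name `first_failure` and the mapping from step index to timestamp are also assumptions.

```python
from typing import Optional, Sequence, Tuple


def first_failure(predictions: Sequence[float], threshold: float,
                  start_time: float, interval: float
                  ) -> Optional[Tuple[float, float]]:
    """Return (time point, value) of the first prediction above the threshold.

    ASSUMPTION: prediction k (0-based) lies (k + 1) intervals after start_time.
    """
    for k, value in enumerate(predictions):
        if value > threshold:
            return start_time + (k + 1) * interval, value
    return None  # no predicted failure within the forecast horizon


result = first_failure([0.71, 0.78, 0.86, 0.93], threshold=0.85,
                       start_time=100.0, interval=5.0)
```

Here the third prediction is the first to exceed 0.85, so the failure time point is three intervals after the start time.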
  8. The method according to claim 7, wherein the method further comprises:
    determining alarm information based on the failure time point and the corresponding predicted indicator data, and reporting the alarm information.
  9. A training device for a failure prediction neural network model, wherein the device comprises:
    a first means, configured to obtain a historical indicator data set of a monitored point, wherein the historical indicator data set consists of monitoring indicator data of the monitored point collected at different historical time points;
    a second means, configured to process the historical indicator data set based on a preset period and a sampling frequency to determine a training set and a test set;
    a third means, configured to train a neural network based on the training set until the output error of the neural network meets a first preset threshold, to test the neural network based on the test set, and, if the accuracy meets a second preset threshold, to obtain a trained monitored-point failure prediction neural network model.
  10. The device according to claim 9, wherein the device further comprises:
    a fourth means, configured to preprocess the historical indicator data set to eliminate the influence of abnormal historical indicator data.
  11. The device according to claim 9 or 10, wherein the device further comprises:
    a fifth means, configured to obtain an indicator data sequence for the current time point, wherein the indicator data sequence for the current time point consists of the N historical indicator data items preceding the current time point;
    a sixth means, configured to determine the number M of iterative extrapolation predictions based on a preset prediction time length, the preset period, and the number of samples N;
    a seventh means, configured to input the indicator data sequence for the current time point into the trained monitored-point failure prediction neural network model and to perform M iterative extrapolation predictions to obtain M items of predicted indicator data for the monitored point.
  12. The device according to claim 11, wherein the device further comprises:
    an eighth means, configured to compare the M items of predicted indicator data for the monitored point with a third preset threshold in chronological order, and to determine the time point corresponding to the first item of predicted indicator data that fails to meet the threshold as the failure time point.
  13. The device according to claim 12, wherein the device further comprises:
    a ninth means, configured to determine alarm information based on the failure time point and the corresponding predicted indicator data, and to report the alarm information.
  14. A computer-readable medium, wherein
    computer-readable instructions are stored thereon, the computer-readable instructions being executable by a processor to implement the method according to any one of claims 1 to 8.
  15. A device, wherein the device comprises:
    one or more processors; and
    a memory storing computer-readable instructions which, when executed, cause the processors to perform the operations of the method according to any one of claims 1 to 8.
PCT/CN2021/090028 2020-09-03 2021-04-26 Training method and device for failure prediction neural network model WO2022048168A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010916672.2A CN112115024B (en) 2020-09-03 2020-09-03 Training method and device for fault prediction neural network model
CN202010916672.2 2020-09-03

Publications (1)

Publication Number Publication Date
WO2022048168A1 (en) 2022-03-10

Family

ID=73801715

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090028 WO2022048168A1 (en) 2020-09-03 2021-04-26 Training method and device for failure prediction neural network model

Country Status (2)

Country Link
CN (1) CN112115024B (en)
WO (1) WO2022048168A1 (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114638412A (en) * 2022-03-11 2022-06-17 平安国际智慧城市科技股份有限公司 Hazardous waste disposal capacity early warning method and device, electronic equipment and storage medium
CN115001942A (en) * 2022-05-26 2022-09-02 腾云悦智科技(深圳)有限责任公司 Method and system for recommending operation and maintenance monitoring threshold
CN114999182A (en) * 2022-05-25 2022-09-02 中国人民解放军国防科技大学 Vehicle flow prediction method, device and equipment based on LSTM feedback mechanism
CN115269319A (en) * 2022-07-21 2022-11-01 河南职业技术学院 CEPH distributed computer fault diagnosis method
CN115293057A (en) * 2022-10-10 2022-11-04 深圳先进技术研究院 Wind driven generator fault prediction method based on multi-source heterogeneous data
CN115392056A (en) * 2022-10-26 2022-11-25 广东电网有限责任公司中山供电局 Method and device for monitoring and early warning running state of high-voltage overhead transmission line
CN115600764A (en) * 2022-11-17 2023-01-13 中船重工(武汉)凌久高科有限公司(Cn) Rolling time domain energy consumption prediction method based on weight neighborhood rough set rapid reduction
CN115671616A (en) * 2022-10-28 2023-02-03 厦门海辰储能科技股份有限公司 Fire fighting system and method for energy storage container and storage medium
CN115865992A (en) * 2023-03-02 2023-03-28 中国建材检验认证集团湖南有限公司 Wisdom water conservancy on-line monitoring system
CN116027736A (en) * 2023-02-14 2023-04-28 广东热浪新材料科技有限公司 Control optimization method and control system of star basin processing equipment
CN116046078A (en) * 2023-03-31 2023-05-02 东莞市楷德精密机械有限公司 Fault monitoring and early warning method and system for semiconductor cleaning equipment
CN116192608A (en) * 2023-01-18 2023-05-30 北京百度网讯科技有限公司 Cloud mobile phone fault prediction method, device and equipment
CN116248959A (en) * 2023-05-12 2023-06-09 深圳市橙视科技发展有限公司 Network player fault detection method, device, equipment and storage medium
CN116304928A (en) * 2023-03-21 2023-06-23 北京思维实创科技有限公司 Power supply equipment fault prediction method, device, equipment and storage medium
CN116359683A (en) * 2023-02-28 2023-06-30 四川大唐国际甘孜水电开发有限公司 Partial discharge mode identification method and system based on information interaction
CN116432542A (en) * 2023-06-12 2023-07-14 国网江西省电力有限公司电力科学研究院 Switch cabinet busbar temperature rise early warning method and system based on error sequence correction
CN116471196A (en) * 2023-06-19 2023-07-21 宏景科技股份有限公司 Operation and maintenance monitoring network maintenance method, system and equipment
CN116517921A (en) * 2023-07-03 2023-08-01 成都飞机工业(集团)有限责任公司 On-line detection method and system for aviation hydraulic oil vehicle state
CN116558824A (en) * 2023-04-19 2023-08-08 华中科技大学 Multi-channel-oriented bearing comprehensive index health monitoring method and system
CN116611006A (en) * 2023-05-22 2023-08-18 广州吉谷电器有限公司 Fault identification method and device of electric kettle based on user feedback
CN116861202A (en) * 2023-09-05 2023-10-10 青岛哈尔滨工程大学创新发展中心 Ship motion envelope forecasting method and system based on long-term and short-term memory neural network
CN116934354A (en) * 2023-07-21 2023-10-24 浙江远图技术股份有限公司 Method and device for supervising medicine metering scale, electronic equipment and medium
CN116975639A (en) * 2023-07-28 2023-10-31 东莞盟大集团有限公司 Abnormality prevention and control system and method for equipment
CN117370848A (en) * 2023-12-08 2024-01-09 深圳市明心数智科技有限公司 Equipment fault prediction method, device, computer equipment and storage medium
CN117880055A (en) * 2024-03-12 2024-04-12 灵长智能科技(杭州)有限公司 Network fault diagnosis method, device, equipment and medium based on transmission layer index
CN117880055B (en) * 2024-03-12 2024-05-31 灵长智能科技(杭州)有限公司 Network fault diagnosis method, device, equipment and medium based on transmission layer index

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115024B (en) * 2020-09-03 2023-07-18 上海上讯信息技术股份有限公司 Training method and device for fault prediction neural network model
CN113111585A (en) * 2021-04-15 2021-07-13 德州欧瑞电子通信设备制造有限公司 Intelligent cabinet fault prediction method and system and intelligent cabinet
CN113452379B (en) * 2021-07-16 2022-08-02 燕山大学 Section contour dimension reduction model training method and system and data compression method and system
CN115859188A (en) * 2021-09-24 2023-03-28 中兴通讯股份有限公司 Service abnormity prediction method and device, storage medium and electronic device
CN114710413A (en) * 2022-03-31 2022-07-05 中国农业银行股份有限公司 Method and device for predicting network state of bank outlets
CN115331155B (en) * 2022-10-14 2023-02-03 智慧齐鲁(山东)大数据科技有限公司 Mass video monitoring point location graph state detection method and system
CN117455666A (en) * 2023-10-16 2024-01-26 厦门国际银行股份有限公司 Transaction technical index prediction method, device and equipment based on neural network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075356A1 (en) * 2016-09-09 2018-03-15 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for monitoring system
CN109117941A (en) * 2018-07-16 2019-01-01 北京思特奇信息技术股份有限公司 Alarm prediction method, system, storage medium and computer equipment
CN110008079A (en) * 2018-12-25 2019-07-12 阿里巴巴集团控股有限公司 Monitor control index method for detecting abnormality, model training method, device and equipment
CN110441065A (en) * 2019-07-04 2019-11-12 杭州华电江东热电有限公司 Gas turbine online test method and device based on LSTM
CN110865929A (en) * 2019-11-26 2020-03-06 携程旅游信息技术(上海)有限公司 Abnormity detection early warning method and system
CN112115024A (en) * 2020-09-03 2020-12-22 上海上讯信息技术股份有限公司 Training method and device for fault prediction neural network model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200104639A1 (en) * 2018-09-28 2020-04-02 Applied Materials, Inc. Long short-term memory anomaly detection for multi-sensor equipment monitoring
CN109697852B (en) * 2019-01-23 2021-04-02 吉林大学 Urban road congestion degree prediction method based on time sequence traffic events
CN110163189B (en) * 2019-06-10 2020-12-25 哈尔滨工业大学 Bandwidth-limited signal dynamic extrapolation method

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114638412A (en) * 2022-03-11 2022-06-17 平安国际智慧城市科技股份有限公司 Hazardous waste disposal capacity early warning method and device, electronic equipment and storage medium
CN114999182B (en) * 2022-05-25 2023-07-04 中国人民解放军国防科技大学 Traffic flow prediction method, device and equipment based on LSTM feedback mechanism
CN114999182A (en) * 2022-05-25 2022-09-02 中国人民解放军国防科技大学 Vehicle flow prediction method, device and equipment based on LSTM feedback mechanism
CN115001942A (en) * 2022-05-26 2022-09-02 腾云悦智科技(深圳)有限责任公司 Method and system for recommending operation and maintenance monitoring threshold
CN115269319A (en) * 2022-07-21 2022-11-01 河南职业技术学院 CEPH distributed computer fault diagnosis method
CN115269319B (en) * 2022-07-21 2023-09-01 河南职业技术学院 CEPH distributed computer fault diagnosis method
CN115293057B (en) * 2022-10-10 2022-12-20 深圳先进技术研究院 Wind driven generator fault prediction method based on multi-source heterogeneous data
CN115293057A (en) * 2022-10-10 2022-11-04 深圳先进技术研究院 Wind driven generator fault prediction method based on multi-source heterogeneous data
CN115392056A (en) * 2022-10-26 2022-11-25 广东电网有限责任公司中山供电局 Method and device for monitoring and early warning running state of high-voltage overhead transmission line
CN115671616A (en) * 2022-10-28 2023-02-03 厦门海辰储能科技股份有限公司 Fire fighting system and method for energy storage container and storage medium
CN115600764A (en) * 2022-11-17 2023-01-13 中船重工(武汉)凌久高科有限公司(Cn) Rolling time domain energy consumption prediction method based on weight neighborhood rough set rapid reduction
CN116192608A (en) * 2023-01-18 2023-05-30 北京百度网讯科技有限公司 Cloud mobile phone fault prediction method, device and equipment
CN116027736A (en) * 2023-02-14 2023-04-28 广东热浪新材料科技有限公司 Control optimization method and control system of star basin processing equipment
CN116027736B (en) * 2023-02-14 2023-09-12 广东热浪新材料科技有限公司 Control optimization method and control system of star basin processing equipment
CN116359683A (en) * 2023-02-28 2023-06-30 四川大唐国际甘孜水电开发有限公司 Partial discharge mode identification method and system based on information interaction
CN116359683B (en) * 2023-02-28 2023-12-26 四川大唐国际甘孜水电开发有限公司 Partial discharge mode identification method and system based on information interaction
CN115865992A (en) * 2023-03-02 2023-03-28 中国建材检验认证集团湖南有限公司 Wisdom water conservancy on-line monitoring system
CN116304928A (en) * 2023-03-21 2023-06-23 北京思维实创科技有限公司 Power supply equipment fault prediction method, device, equipment and storage medium
CN116046078A (en) * 2023-03-31 2023-05-02 东莞市楷德精密机械有限公司 Fault monitoring and early warning method and system for semiconductor cleaning equipment
CN116558824A (en) * 2023-04-19 2023-08-08 华中科技大学 Multi-channel-oriented bearing comprehensive index health monitoring method and system
CN116558824B (en) * 2023-04-19 2024-02-06 华中科技大学 Multi-channel-oriented bearing comprehensive index health monitoring method and system
CN116248959A (en) * 2023-05-12 2023-06-09 深圳市橙视科技发展有限公司 Network player fault detection method, device, equipment and storage medium
CN116611006B (en) * 2023-05-22 2024-02-20 广州吉谷电器有限公司 Fault identification method and device of electric kettle based on user feedback
CN116611006A (en) * 2023-05-22 2023-08-18 广州吉谷电器有限公司 Fault identification method and device of electric kettle based on user feedback
CN116432542A (en) * 2023-06-12 2023-07-14 国网江西省电力有限公司电力科学研究院 Switch cabinet busbar temperature rise early warning method and system based on error sequence correction
CN116432542B (en) * 2023-06-12 2023-10-20 国网江西省电力有限公司电力科学研究院 Switch cabinet busbar temperature rise early warning method and system based on error sequence correction
CN116471196A (en) * 2023-06-19 2023-07-21 宏景科技股份有限公司 Operation and maintenance monitoring network maintenance method, system and equipment
CN116471196B (en) * 2023-06-19 2023-10-20 宏景科技股份有限公司 Operation and maintenance monitoring network maintenance method, system and equipment
CN116517921B (en) * 2023-07-03 2023-11-10 成都飞机工业(集团)有限责任公司 On-line detection method and system for aviation hydraulic oil vehicle state
CN116517921A (en) * 2023-07-03 2023-08-01 成都飞机工业(集团)有限责任公司 On-line detection method and system for aviation hydraulic oil vehicle state
CN116934354A (en) * 2023-07-21 2023-10-24 浙江远图技术股份有限公司 Method and device for supervising medicine metering scale, electronic equipment and medium
CN116934354B (en) * 2023-07-21 2024-04-05 浙江远图技术股份有限公司 Method and device for supervising medicine metering scale, electronic equipment and medium
CN116975639A (en) * 2023-07-28 2023-10-31 东莞盟大集团有限公司 Abnormality prevention and control system and method for equipment
CN116861202B (en) * 2023-09-05 2023-12-19 青岛哈尔滨工程大学创新发展中心 Ship motion envelope forecasting method and system based on long-term and short-term memory neural network
CN116861202A (en) * 2023-09-05 2023-10-10 青岛哈尔滨工程大学创新发展中心 Ship motion envelope forecasting method and system based on long-term and short-term memory neural network
CN117370848A (en) * 2023-12-08 2024-01-09 深圳市明心数智科技有限公司 Equipment fault prediction method, device, computer equipment and storage medium
CN117370848B (en) * 2023-12-08 2024-04-02 深圳市明心数智科技有限公司 Equipment fault prediction method, device, computer equipment and storage medium
CN117880055A (en) * 2024-03-12 2024-04-12 灵长智能科技(杭州)有限公司 Network fault diagnosis method, device, equipment and medium based on transmission layer index
CN117880055B (en) * 2024-03-12 2024-05-31 灵长智能科技(杭州)有限公司 Network fault diagnosis method, device, equipment and medium based on transmission layer index

Also Published As

Publication number Publication date
CN112115024A (en) 2020-12-22
CN112115024B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
WO2022048168A1 (en) Training method and device for failure prediction neural network model
WO2020259421A1 (en) Method and apparatus for monitoring service system
JP6690011B2 (en) System and method for measuring effective customer impact of network problems in real time using streaming analysis
US20190095266A1 (en) Detection of Misbehaving Components for Large Scale Distributed Systems
US20190235944A1 (en) Anomaly Detection using Circumstance-Specific Detectors
US8918345B2 (en) Network analysis system
EP2874064B1 (en) Adaptive metric collection, storage, and alert thresholds
US11093349B2 (en) System and method for reactive log spooling
US20060188011A1 (en) Automated diagnosis and forecasting of service level objective states
WO2022028120A1 (en) Indicator detection model acquisition method and apparatus, fault locating method and apparatus, and device and storage medium
CN112188534B (en) Abnormality detection method and device
CN114338372B (en) Network information security monitoring method and system
WO2018125628A1 (en) A network monitor and method for event based prediction of radio network outages and their root cause
CN108038624A (en) Method and device for analyzing health state of wind turbine generator
US20200233587A1 (en) Method, device and computer product for predicting disk failure
CN113590429A (en) Server fault diagnosis method and device and electronic equipment
US11294748B2 (en) Identification of constituent events in an event storm in operations management
US11868203B1 (en) Systems and methods for computer infrastructure monitoring and maintenance
CN115509853A (en) Cluster data anomaly detection method and electronic equipment
CN116804957A (en) System monitoring method and device
WO2022000285A1 (en) Health index of a service
Syal et al. Automatic detection of network traffic anomalies and changes
Li et al. Generic and robust root cause localization for multi-dimensional data in online service systems
US11941284B2 (en) Management system, QoS violation detection method, and QoS violation detection program
US20240036963A1 (en) Multi-contextual anomaly detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21863238

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21863238

Country of ref document: EP

Kind code of ref document: A1