CN110275814A - Monitoring method and device for a business system - Google Patents
Monitoring method and device for a business system
- Publication number
- CN110275814A (application CN201910580570.5A)
- Authority
- CN
- China
- Prior art keywords
- time period
- index data
- prediction
- service system
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Testing And Monitoring For Control Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
Embodiments of the present invention relate to the field of machine learning, and in particular to a monitoring method and device for a business system, which address the hysteresis and low accuracy of business system alarms. The method includes: obtaining monitoring index data of the business system in a reference time period; comparing the monitoring index data with a direct alarm condition; if the monitoring index data does not satisfy the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model; and comparing the prediction result with a predicted alarm condition to predict whether the business system will become abnormal in the prediction time period.
Description
Technical Field
The invention relates to the field of machine learning in financial technology (Fintech), in particular to a monitoring method and a monitoring device for a business system.
Background
With the development of computer technology, more and more technologies (big data, distributed systems, blockchain, artificial intelligence, etc.) are being applied to the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). The security and real-time requirements of the financial industry, however, place higher demands on these technologies. On a traditional service system monitoring platform, users configure alarm strategies according to their own requirements. When a service system goes online and needs daily monitoring, the service operation and maintenance/development personnel identify the key points of the service system, define alarm strategy conditions for those key points, and configure the corresponding monitoring alarm strategies in the monitoring platform. The monitoring platform then scans the configured service key points to obtain the corresponding detection indexes, matches the detection indexes against the monitoring alarm strategy configured by the user (that is, checks whether the alarm condition is met), and alarms and notifies the user if the condition is met.
In the prior art, the monitoring alarm strategies are alarm thresholds that users pre-configure in an alarm tool. Such thresholds are generally set by operation and maintenance/development personnel according to historical experience, so their accuracy is low. In addition, the whole alarming process, from detecting an abnormality, through confirming it, to finally notifying the relevant operation and maintenance personnel, takes a long time, so in some cases the alarm lags behind the fault. When a service system becomes abnormal, the data abnormality is uncontrollable and may grow exponentially. The index data may not yet meet the alarm condition when the abnormality has just begun; by the time operation and maintenance personnel receive the alarm notification, the service index is already severely abnormal, the affected range has spread rapidly, the service has been damaged, and the alarm has lost its purpose.
Disclosure of Invention
The application provides a monitoring method and a monitoring device for a service system, which are used for solving the problems of hysteresis and low accuracy of service system alarm.
The monitoring method for the service system provided by the embodiment of the invention comprises the following steps:
acquiring monitoring index data of a service system in a reference time period;
comparing the monitoring index data with a direct alarm condition;
if the monitoring index data do not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model;
and comparing the prediction result with the predicted alarm condition, and predicting whether the service system is abnormal in the prediction time period.
In an optional embodiment, before inputting the monitoring index data into a machine learning algorithm model trained in advance and determining a prediction result in a prediction time period by using the machine learning algorithm model, the method further includes:
acquiring training data of a service system in a historical time period;
and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.
In an alternative embodiment, the predicted alarm condition is determined according to the following:
inputting historical fault sample data of the service system into the machine learning algorithm model for training, and determining fault model parameters;
inputting historical non-fault sample data of the service system into the machine learning algorithm model for training, and determining non-fault model parameters;
and determining a fault condition according to the fault model parameters and the non-fault model parameters.
In an optional embodiment, the monitoring index data of the business system includes hardware index data of the business system;
for the hardware index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:
acquiring hardware index data of the service system in a first reference time period;
the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:
determining fluctuation conditions of the hardware index data in the prediction time period;
comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:
comparing the fluctuation condition of the hardware index data with the fault condition, and judging whether the service system has hardware fault in the prediction time period;
and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.
In an optional embodiment, the monitoring index data of the service system includes service index data of the service system;
for the service index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:
acquiring service index data of the service system in a second reference time period;
the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:
determining the fluctuation condition of the service index data in the prediction time period;
comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:
comparing the fluctuation condition of the service index data with a normal fluctuation range, and judging whether the service system is abnormal in the prediction time period;
if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
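As a non-limiting illustration, the comparison of a predicted service index against a normal fluctuation range could be sketched as follows. The embodiment derives the range from the machine learning algorithm model; this sketch substitutes a simple statistical band (mean plus or minus `k` standard deviations of the historical data) purely for clarity. The function names `normal_range` and `abnormal_steps` and the parameter `k` are assumptions, not part of the original disclosure.

```python
from statistics import mean, pstdev


def normal_range(history, k=3.0):
    """Return (low, high) as the mean +/- k population standard deviations.

    Stand-in for the model-derived normal fluctuation range (assumption).
    """
    mu = mean(history)
    sigma = pstdev(history)
    return mu - k * sigma, mu + k * sigma


def abnormal_steps(predicted, bounds):
    """Return the indices of predicted points that fall outside the range."""
    low, high = bounds
    return [i for i, value in enumerate(predicted) if value < low or value > high]


# Hypothetical historical and predicted values of one service index.
bounds = normal_range([100, 102, 98, 100])
outside = abnormal_steps([101, 99, 110, 90], bounds)
```

The indices in `outside` mark the positions within the prediction time period at which the service index is expected to leave its normal range, which corresponds to the abnormal prediction time period.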
In an optional embodiment, the monitoring index data of the service system includes monitoring index data of a plurality of monitoring indexes;
before determining the prediction result in the prediction time period by using the machine learning algorithm model, the method further comprises:
determining a weight parameter of each monitoring index;
and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.
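As a non-limiting illustration, one way the weight parameters might be combined with the monitoring index data before it is input into the model is sketched below. The disclosure does not specify how the model consumes the weights; scaling each index by its weight is an assumption made here, and `apply_weights` and `weighted_feature_vector` are hypothetical helper names.

```python
def apply_weights(metrics, weights):
    """Scale each monitoring index by its weight (default weight 1.0)."""
    return {name: value * weights.get(name, 1.0) for name, value in metrics.items()}


def weighted_feature_vector(metrics, weights, order):
    """Produce a fixed-order feature list usable as one time step of model input."""
    weighted = apply_weights(metrics, weights)
    return [weighted[name] for name in order]


# Hypothetical indexes: CPU usage is given half the influence of latency.
features = weighted_feature_vector(
    {"cpu": 80.0, "latency": 200.0}, {"cpu": 0.5}, ["cpu", "latency"]
)
```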
A monitoring device of a business system, comprising:
the acquisition unit is used for acquiring monitoring index data of the service system in a reference time period;
the comparison unit is used for comparing the monitoring index data with the direct alarm condition;
the prediction unit is used for inputting the monitoring index data into a machine learning algorithm model trained in advance if the monitoring index data does not meet the direct alarm condition, and determining a prediction result in a prediction time period by using the machine learning algorithm model;
and the alarm unit is used for comparing the prediction result with the predicted alarm condition and predicting whether the service system is abnormal in the prediction time period.
In an optional embodiment, the apparatus further comprises a training unit configured to:
acquiring training data of a service system in a historical time period;
and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.
In an optional embodiment, the apparatus further comprises a training unit configured to:
inputting historical fault sample data of the service system in a historical time period into the machine learning algorithm model for training, and determining fault model parameters;
inputting historical non-fault sample data of the service system in the historical time period into the machine learning algorithm model for training, and determining non-fault model parameters;
and determining a fault condition according to the fault model parameters and the non-fault model parameters.
In an optional embodiment, the monitoring index data of the business system includes hardware index data of the business system;
for the hardware index data of the business system,
the acquiring unit is used for acquiring hardware index data of the service system in a first reference time period;
the prediction unit is used for determining the fluctuation condition of the hardware index data in the prediction time period;
the alarm unit is used for comparing the fluctuation condition of the hardware index data with the fault condition and judging whether the service system has a hardware fault in the prediction time period; and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.
In an optional embodiment, the monitoring index data of the service system includes service index data of the service system;
for the service index data of the service system,
the acquiring unit is used for acquiring service index data of the service system in a second reference time period;
the prediction unit is used for determining the fluctuation condition of the service index data in the prediction time period;
the alarm unit is used for comparing the fluctuation condition of the service index data with a normal fluctuation range and judging whether the service system is abnormal in the prediction time period; if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
In an optional embodiment, the monitoring index data of the service system includes monitoring index data of a plurality of monitoring indexes;
the prediction unit is further configured to:
determining a weight parameter of each monitoring index;
and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.
An embodiment of the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
Embodiments of the present invention also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method as described above.
In the embodiment of the invention, the monitoring index data of the service system in the reference time period is obtained and compared with the direct alarm condition. If the monitoring index data meets the direct alarm condition, the user is alarmed directly. If it does not, the monitoring index data is input into the machine learning algorithm model, which determines a prediction result for the prediction time period. The prediction time period includes a time period after the current time point; that is, the machine learning algorithm model predicts the operating condition of the service system for a period of time in the future. This prediction is compared with the predicted alarm condition to predict whether an abnormality may occur, and if so, the user is alarmed. By using the machine learning algorithm model to predict possible future abnormalities, the embodiment allows service operation and maintenance personnel to prepare service disaster tolerance in advance for an impending abnormality, which improves the availability of the service system, and the prediction accuracy is high.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a possible system architecture according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a monitoring method for a service system according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a monitoring apparatus of a service system according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, a system architecture to which the embodiment of the present invention is applicable includes a business system 101, a monitoring platform 102, and a monitoring client 103. The service system 101 and/or the monitoring platform 102 may be a network device such as a computer, an independent device, or a server cluster formed by a plurality of servers. Preferably, the business system 101 and/or the monitoring platform 102 may employ cloud computing technology for information processing.
The monitoring client 103 is installed on the monitoring platform 102. The monitoring platform 102 may be an electronic device with a wireless communication function, such as a mobile phone, a tablet computer, or a dedicated handheld device, and may also be a device connected to the internet in a wired access manner, such as a Personal Computer (PC), a notebook computer, or a server.
The monitoring platform 102 may communicate with the service system 101 through the Internet, or through a mobile communication system such as the Global System for Mobile Communications (GSM) or Long Term Evolution (LTE). The monitoring client 103 may likewise communicate with the monitoring platform 102 through the Internet, or through a mobile communication system such as GSM or LTE.
From the user's point of view, the system architecture in the embodiment of the invention is almost the same as that of a traditional monitoring platform. Users only need to configure the service index monitoring strategies they care about and do not need to know how fault prediction is implemented inside the monitoring platform, so the platform is more user-friendly and has no barrier to use.
For convenience of understanding, terms that may be referred to in the embodiments of the present invention are defined and explained below.
User: the users in the embodiment of the invention include service system developers, service operation and maintenance personnel, and all other personnel who use the monitoring platform to monitor services.
Intelligent monitoring platform: a tool responsible for monitoring a business system and raising alarms. It monitors the service indexes and basic service indexes (such as server hardware health and network connection status) of the system, integrates the detected indexes through a machine learning algorithm model, and predicts faults and abnormalities that may occur in the future.
Alarm detection/prediction: also called service system fault detection/prediction; the monitoring platform detects and predicts faults/abnormalities that may occur in the daily operation of a service system.
LSTM: Long Short-Term Memory, a recurrent neural network algorithm in machine learning.
Time series: a numerical sequence formed by arranging the values of the same statistical index in order of their time of occurrence. The main purpose of time series analysis is to predict the future based on existing historical data. Most economic data is given as time series. The time unit in a time series may be a year, quarter, month, or any other unit, depending on when observations are made.
In order to predict abnormalities of the service system and improve the accuracy of the prediction, an embodiment of the present invention provides a monitoring method for a service system. As shown in FIG. 2, the method includes the following steps:
Step 201, obtaining monitoring index data of the service system in the reference time period.
Different server manufacturers collect data in different formats, different hardware records hardware data in different formats, and different service interfaces and services also use different data formats. The reported data therefore needs to be cleaned so that the various data formats are unified; the cleaned data can then be used by the big data processing and machine learning algorithm module for machine learning training, alarm matching, and prediction.
Meanwhile, because the monitoring indexes of the server hardware and of the service interfaces differ, and each component and each interface may report data in many dimensions, data sources that are positively correlated with the monitoring index (for example, the SMART value of a hard disk or the health value of a mainboard) need to be selected so that interference items are eliminated.
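As a non-limiting illustration, the cleaning and index selection described above could be sketched as follows. The vendor names, the raw field names (`cpu_pct`, `cpuUsage`, and so on), the unified schema, and the helper `normalize_record` are all hypothetical; the sketch also assumes that timestamp strings without a time zone are in UTC.

```python
from datetime import datetime, timezone

# Hypothetical per-vendor field mappings: raw field name -> unified name.
FIELD_MAPS = {
    "vendor_a": {"ts": "timestamp", "cpu_pct": "cpu_usage", "disk_smart": "smart_value"},
    "vendor_b": {"time": "timestamp", "cpuUsage": "cpu_usage", "smart": "smart_value"},
}

# Indexes assumed to be positively correlated with the monitored target.
SELECTED_INDEXES = {"cpu_usage", "smart_value"}


def normalize_record(vendor, raw):
    """Map a raw vendor record to the unified format and drop unselected fields."""
    mapping = FIELD_MAPS[vendor]
    # Fields without a mapping (interference items) are discarded here.
    unified = {mapping[key]: value for key, value in raw.items() if key in mapping}
    ts = unified.pop("timestamp")
    # Accept epoch seconds or ISO-8601 strings; strings are treated as UTC.
    if isinstance(ts, str):
        ts = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc).timestamp()
    metrics = {k: float(v) for k, v in unified.items() if k in SELECTED_INDEXES}
    return {"timestamp": float(ts), "metrics": metrics}


a = normalize_record("vendor_a", {"ts": 1561716000, "cpu_pct": "71.5", "disk_smart": 98, "fan": 3})
b = normalize_record("vendor_b", {"time": "2019-06-28T10:00:00", "cpuUsage": 64, "smart": "97"})
```

Both records end up with the same keys and numeric types, so later stages (training, alarm matching, prediction) can treat data from either source identically.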
Step 202, comparing the monitoring index data with a direct alarm condition.
Logic processing is performed on the cleaned monitoring index data to judge whether it reaches the direct alarm condition. If the direct alarm condition is met, the business system is currently abnormal and the user is alarmed directly. If the direct alarm condition is not met, the reported monitoring index data is evaluated by the trained machine learning algorithm model to predict whether an abnormality may occur in a future time period.
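As a non-limiting illustration, the branch between a direct alarm and the prediction stage could be sketched as follows. The rule format `(index name, operator, threshold)`, the index names, and the functions `meets_direct_alarm` and `handle_report` are assumptions introduced for this sketch; `predict_fn` stands in for the trained machine learning algorithm model.

```python
import operator

# Comparison operators that a hypothetical alarm rule may use.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def meets_direct_alarm(metrics, rules):
    """Return the rules satisfied by the current metrics (empty list = none)."""
    hits = []
    for index_name, op, threshold in rules:
        value = metrics.get(index_name)
        if value is not None and OPS[op](value, threshold):
            hits.append((index_name, op, threshold))
    return hits


def handle_report(metrics, rules, predict_fn):
    """Alarm directly when a rule fires; otherwise defer to the prediction stage."""
    hits = meets_direct_alarm(metrics, rules)
    if hits:
        return {"action": "direct_alarm", "rules": hits}
    return {"action": "predict", "result": predict_fn(metrics)}


rules = [("cpu_usage", ">", 95.0), ("success_rate", "<", 0.9)]
r1 = handle_report({"cpu_usage": 99.0, "success_rate": 0.99}, rules, lambda m: "forecast")
r2 = handle_report({"cpu_usage": 50.0, "success_rate": 0.99}, rules, lambda m: "forecast")
```

Here `r1` takes the direct alarm branch because the CPU rule fires, while `r2` is handed to the prediction stage.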
The direct alarm condition in the embodiment of the invention can also be trained through a machine learning algorithm model and automatically judged by the system in the daily iteration process in the production environment, so that the time spent by operation and maintenance personnel in the process of configuring the direct alarm condition is reduced, the management efficiency is improved, and the false alarm caused by manual configuration is avoided.
Step 203, if the monitoring index data does not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model.
The machine learning algorithm model may include a convolutional neural network (CNN), a support vector machine (SVM), K-Means clustering, logistic regression, and the like. Considering the trade-off between the training cost (running time and the size of the server cluster required) and the prediction result, the embodiment of the invention preferably uses the LSTM neural network algorithm for prediction.
Step 204, comparing the prediction result with the predicted alarm condition, and predicting whether the service system is abnormal in the prediction time period.
In a specific implementation, the predicted alarm condition may be set by operation and maintenance personnel according to experience, obtained through machine learning algorithm model training, or determined by the system during daily iteration in the production environment. If an abnormality is predicted, the user can be notified by e-mail, short message, telephone, WeChat, and/or similar means.
In the embodiment of the invention, the monitoring index data of the service system in the reference time period is obtained and compared with the direct alarm condition. If the monitoring index data meets the direct alarm condition, the user is alarmed directly. If it does not, the monitoring index data is input into the machine learning algorithm model, which determines a prediction result for the prediction time period. The prediction time period includes a time period after the current time point; that is, the machine learning algorithm model predicts the operating condition of the service system for a period of time in the future. This prediction is compared with the predicted alarm condition to predict whether an abnormality may occur, and if so, the user is alarmed. By using the machine learning algorithm model to predict possible future abnormalities, the embodiment allows service operation and maintenance personnel to prepare service disaster tolerance in advance for an impending abnormality, which improves the availability of the service system, and the prediction accuracy is high.
Because the monitoring index data of the service system is correlated with time and forms time series data, a prediction result for a future time period can be derived from the monitoring index data, and this result indicates the operating condition of the service system. The prediction result is then compared with the configured predicted alarm condition to determine whether the service system may become abnormal in a future time period.
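As a non-limiting illustration, forecasting a time series of one monitoring index and comparing the forecast with a predicted alarm condition could be sketched as follows. The embodiment prefers an LSTM model; this sketch deliberately substitutes a least-squares linear trend so that it stays self-contained, and it assumes the predicted alarm condition is a single upper threshold. The names `linear_forecast` and `predict_abnormal` are hypothetical.

```python
def linear_forecast(series, horizon):
    """Extrapolate a least-squares line fitted to `series` for `horizon` steps.

    Stand-in for the trained model (assumption); needs at least two points.
    """
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + h) for h in range(horizon)]


def predict_abnormal(series, horizon, alarm_threshold):
    """Return the first future step whose forecast reaches the threshold, or None."""
    for step, value in enumerate(linear_forecast(series, horizon), start=1):
        if value >= alarm_threshold:
            return step
    return None


# Hypothetical index readings in the reference time period, rising steadily.
first_bad_step = predict_abnormal([10, 20, 30, 40], horizon=3, alarm_threshold=60)
```

A non-`None` result means an abnormality is predicted within the prediction time period, and the step number indicates how far ahead it is expected.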
Further, the LSTM algorithm model is trained based on training data over a historical period of time. Before inputting the monitoring index data into a machine learning algorithm model trained in advance and determining a prediction result in a prediction time period by using the machine learning algorithm model, the method further comprises:
acquiring training data of a service system in a historical time period;
and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.
In the specific implementation process, the training data of the service system at each time point is used as the output parameters of the LSTM algorithm model, and for each output parameter, a lot of training data in the historical time period before the corresponding time point is used as the input parameters of the LSTM algorithm model. Thus, after a large number of corresponding relations between the input parameters and the output parameters are obtained, model parameters of the LSTM algorithm model can be obtained based on the existing LSTM algorithm model training method.
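As a non-limiting illustration, the construction of input/output training pairs described above could be sketched as follows: each value in the series serves as an output, and the `window` values immediately preceding it serve as the corresponding input. The function name `make_training_pairs` and the parameter `window` are assumptions.

```python
def make_training_pairs(series, window):
    """Build (input window, target) pairs from a time-ordered series."""
    pairs = []
    for t in range(window, len(series)):
        # Input: the `window` points before time t. Output: the point at time t.
        pairs.append((series[t - window:t], series[t]))
    return pairs


pairs = make_training_pairs([1, 2, 3, 4, 5], window=3)
```

The resulting pairs could then be passed to whichever LSTM training routine is used to determine the model parameters.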
It should be noted that the historical time period corresponding to the training process and the reference time period corresponding to the prediction process may be the same time period or different time periods; if they are different time periods, they may or may not overlap. For example, the historical time period is the 1000 hours before the current time point, and the reference time period is the 999 hours before the current time point; or the historical time period is 9 a.m. to 11 a.m. each day from January to March 2018, and the reference time period is 9 a.m. to 11 a.m. each day from January to March 2019. The historical time period and the reference time period are selected according to the calculation requirements, and the embodiment of the invention places no limitation on them.
Further, in the embodiment of the present invention, the predicted alarm condition may also be obtained by training using the LSTM algorithm. The predicted alarm condition is determined according to the following manner:
inputting historical fault sample data of the service system into the machine learning algorithm model for training, and determining fault model parameters;
inputting historical non-fault sample data of the service system into the machine learning algorithm model for training, and determining non-fault model parameters;
and determining a fault condition according to the fault model parameters and the non-fault model parameters.
In a specific implementation process, the historical fault samples are the hardware index data collected from the service system when a hardware fault was determined to have occurred; by inputting the historical fault samples into the LSTM algorithm model, the fault model parameters of the hardware of the service system in the fault state can be determined. The historical non-fault samples are the hardware index data collected while the service system was operating normally; by inputting the historical non-fault samples into the LSTM algorithm model, the non-fault model parameters of the hardware of the service system during normal operation can be determined. The specific fault condition can then be determined from the fault model parameters and the non-fault model parameters.
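The embodiment does not state how the fault condition is computed from the two sets of model parameters. The sketch below is therefore only one possible reading, with loud assumptions: the per-index mean stands in for the trained model parameters, the fault condition is a threshold halfway between the two means, and all names and sample values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Sample = Dict[str, float]
Condition = Dict[str, Tuple[float, bool]]


def summarize(samples: List[Sample]) -> Dict[str, float]:
    """Stand-in for model training: the mean of each hardware index."""
    return {name: mean(s[name] for s in samples) for name in samples[0]}


def derive_fault_condition(
    fault_params: Dict[str, float], normal_params: Dict[str, float]
) -> Condition:
    """Assumed rule: a threshold halfway between normal and fault means."""
    condition = {}
    for name in fault_params:
        if fault_params[name] == normal_params[name]:
            continue  # this index does not separate the two states
        threshold = (fault_params[name] + normal_params[name]) / 2
        condition[name] = (threshold, fault_params[name] > normal_params[name])
    return condition


def matches_fault_condition(sample: Sample, condition: Condition) -> bool:
    """True when any index lies on the fault side of its threshold."""
    for name, (threshold, fault_is_higher) in condition.items():
        value = sample[name]
        if fault_is_higher and value >= threshold:
            return True
        if not fault_is_higher and value <= threshold:
            return True
    return False


# Hypothetical historical samples.
fault_samples = [
    {"disk_realloc_sectors": 120.0, "temp_c": 71.0},
    {"disk_realloc_sectors": 80.0, "temp_c": 69.0},
]
normal_samples = [
    {"disk_realloc_sectors": 0.0, "temp_c": 41.0},
    {"disk_realloc_sectors": 4.0, "temp_c": 39.0},
]
condition = derive_fault_condition(
    summarize(fault_samples), summarize(normal_samples)
)
```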
Because the monitoring index data of the service system comprises the hardware index data of the service system and the service index data of the service system, the embodiment of the invention performs prediction and alarming separately for these two kinds of monitoring index.
Further, for the hardware index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:
acquiring hardware index data of the service system in a first reference time period;
the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:
determining fluctuation conditions of the hardware index data in the prediction time period;
comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:
comparing the fluctuation condition of the hardware index data with the fault condition, and judging whether the service system has hardware fault in the prediction time period;
and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.
In the specific implementation process, for hardware index data, each server has its own life cycle, and the closer the data is to the time node at which the fault abnormality occurs, the higher the prediction accuracy. Therefore, the first reference time period of the hardware index data is selected to be as close as possible to the current time point.
Table 1 shows the failure prediction results of the hardware index data.
TABLE 1
For example, as shown in Table 1, for monitoring index 1, when the server hardware is predicted to become abnormal within 45 days, the prediction accuracy is 78%; when the server hardware is predicted to become abnormal within 60 days, the prediction accuracy is 80%.
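As a rough illustration of comparing the predicted fluctuation of a hardware index with a fault condition, the sketch below returns the first predicted day on which a threshold is reached. The threshold form of the fault condition, the function name and the forecast values are assumptions; the prediction accuracy reported in the embodiment would come from the model itself and is not computed here.

```python
from typing import Optional, Sequence


def predict_fault_day(
    forecast: Sequence[float], fault_threshold: float
) -> Optional[int]:
    """Return the 1-based day on which the forecast first reaches the
    fault threshold, or None if it never does within the forecast."""
    for day, value in enumerate(forecast, start=1):
        if value >= fault_threshold:
            return day
    return None


# Hypothetical forecast of a reallocated-sector count, one value per day.
forecast = [10.0, 18.0, 27.0, 39.0, 52.0, 66.0]
fault_day = predict_fault_day(forecast, fault_threshold=50.0)
```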
For the service index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:
acquiring service index data of the service system in a second reference time period;
the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:
determining the fluctuation condition of the service index data in the prediction time period;
comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:
comparing the fluctuation condition of the service index data with a normal fluctuation range, and judging whether the service system is abnormal in the prediction time period;
if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
In the specific implementation process, since the service index data changes every day, the prediction accuracy of the prediction model improves as the time the server hardware has been in production, or the time the service index has been monitored, increases. For the service index, the more monitoring index data is used for prediction, the larger the sample and the more accurate the result. Therefore, the second reference time period of the service index data is selected to be as long as possible.
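In the embodiment the normal fluctuation range is determined by the machine learning algorithm model. As a simple stand-in for illustration only, the sketch below assumes the range is the historical mean plus or minus `k` standard deviations and flags predicted values outside it; the rule, the names and the values are all assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def normal_range(history: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Assumed rule: mean +/- k sample standard deviations of the history."""
    mu = mean(history)
    sigma = stdev(history)
    return (mu - k * sigma, mu + k * sigma)


def abnormal_points(
    predicted: Sequence[float], bounds: Tuple[float, float]
) -> List[int]:
    """Indices of predicted values that fall outside the normal range."""
    low, high = bounds
    return [i for i, v in enumerate(predicted) if not (low <= v <= high)]


# Hypothetical requests-per-minute history and predicted values.
history = [100.0, 102.0, 98.0, 101.0, 99.0]
bounds = normal_range(history)
outliers = abnormal_points([101.0, 97.0, 110.0, 103.0], bounds)
```

The indices returned by `abnormal_points` correspond to the abnormal prediction time period in the text, expressed here as positions in the predicted sequence.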
Because the volume of collected monitoring index data may be very large, and each monitoring index influences abnormality to a different degree, the weight of each monitoring index needs to be configured through calculation. Further, the monitoring index data of the service system comprises monitoring index data of a plurality of monitoring indexes;
before determining the prediction result in the prediction time period by using the machine learning algorithm model, the method further comprises:
determining a weight parameter of each monitoring index;
and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.
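The embodiment does not specify how the weight parameters are calculated or how the model consumes them. The following sketch assumes, purely for illustration, that raw importance scores are normalized to sum to 1 and that each index value is scaled by its weight before being passed to the model; the names and numbers are hypothetical.

```python
from typing import Dict, List


def normalize_weights(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale raw importance scores so that the weights sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {name: value / total for name, value in raw.items()}


def weighted_features(
    metrics: Dict[str, float], weights: Dict[str, float]
) -> List[float]:
    """Scale each monitoring index by its weight, in a fixed name order."""
    return [metrics[name] * weights[name] for name in sorted(weights)]


# Hypothetical importance scores and one sample of monitoring index data.
weights = normalize_weights({"cpu": 2.0, "mem": 1.0, "disk": 1.0})
features = weighted_features({"cpu": 0.8, "mem": 0.4, "disk": 0.2}, weights)
```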
In the embodiment of the invention, the state of the monitoring indexes in a certain future time period is predicted by the LSTM model, and if the monitoring index data detected at the next monitoring time point is not within the predicted normal fluctuation range, an alarm is sent to the service operation and maintenance or development staff. In addition, the service operation and maintenance or development staff can make corresponding preparations in advance according to the monitoring index fluctuation predicted by the monitoring platform, so that the service is not affected. For example, before holidays or new business activities, the monitoring platform predicts the increase in daily access traffic that the service may see, so that the operation and maintenance personnel can expand system capacity in advance and avoid the service system becoming unavailable due to insufficient performance.
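The capacity planning use mentioned above can be illustrated with a small check of predicted daily traffic against current capacity. The headroom ratio, the function name and the traffic figures are assumptions made for the example, not values from the embodiment.

```python
from typing import List, Sequence


def days_needing_expansion(
    predicted_daily_requests: Sequence[float],
    capacity: float,
    headroom: float = 0.8,
) -> List[int]:
    """Return the 1-based days on which predicted traffic exceeds the
    usable share (`headroom`) of the current capacity."""
    limit = capacity * headroom
    return [
        day
        for day, requests in enumerate(predicted_daily_requests, start=1)
        if requests > limit
    ]


# Hypothetical forecast for the four days leading up to a holiday.
risky_days = days_needing_expansion([600.0, 700.0, 850.0, 1200.0], capacity=1000.0)
```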
In order to more clearly understand the present invention, the above flow is described in detail below with a specific embodiment based on the architecture of fig. 1, and the steps of the specific embodiment are as follows, including:
Step S300: train the LSTM algorithm model to obtain model parameters.
Step S301: acquire monitoring index data.
Because the data indexes of server hardware and of service interfaces differ, and each component and each interface may have many data dimensions, data sources that are positively correlated with the service index, such as the SMART values of a hard disk or the health value of a mainboard, need to be selected so as to eliminate interference items.
Step S302: preprocess the data.
Because the volume of collected monitoring index data may be very large, and each monitoring index influences abnormality to a different degree, the weight parameter of each monitoring index needs to be acquired.
Step S303: acquire the predicted alarm condition. Model training is carried out separately with historical fault sample data and historical non-fault sample data to obtain fault model parameters and non-fault model parameters, and the predicted alarm condition is determined according to the fault model parameters and the non-fault model parameters.
Step S304: input the monitoring index data and the weight parameters into the trained LSTM algorithm model, and use the LSTM algorithm model to calculate the prediction result for the prediction time period.
In the specific implementation process, the LSTM algorithm model is used to predict a complete sequence: the window is initialized only once, using the first part of the training data, and then the window keeps sliding forward and the next point is predicted, as in point-by-point prediction. The difference is that the LSTM algorithm model predicts using previously predicted data. In the second prediction, one data point (the last point) in the data used by the model comes from the previous prediction; in the third prediction, two points come from previous predictions; and so on, so that by the 99th prediction the data in the test set has been fully predicted. This greatly extends the time series that the algorithm model can predict.
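The full-sequence prediction described above, in which each new prediction is pushed into the sliding window and reused as input, can be sketched as follows. The `predict_next` callable is a placeholder for the trained LSTM model, and `mean_predictor` is only a trivial stand-in so that the example runs; neither is part of the embodiment.

```python
from collections import deque
from typing import Callable, List, Sequence


def predict_sequence(
    seed: Sequence[float],
    steps: int,
    predict_next: Callable[[List[float]], float],
) -> List[float]:
    """Initialize the window once from `seed`, then repeatedly predict the
    next point and slide the window so predictions feed later predictions."""
    if not seed:
        raise ValueError("seed must not be empty")
    window = deque(seed, maxlen=len(seed))
    predictions = []
    for _ in range(steps):
        next_value = predict_next(list(window))
        predictions.append(next_value)
        window.append(next_value)  # the oldest point drops out of the window
    return predictions


def mean_predictor(window: List[float]) -> float:
    """Trivial stand-in for the model: predicts the window mean."""
    return sum(window) / len(window)


forecast = predict_sequence([1.0, 2.0, 3.0], steps=5, predict_next=mean_predictor)
```

With a window of length n, every prediction from step n + 1 onward is computed purely from earlier predictions, which is why errors can accumulate over long horizons.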
Step S305: compare the prediction result with the predicted alarm condition, determine whether the service system will be abnormal in the prediction time period, and display the prediction result to the user.
Different service systems/servers may have different alarm priorities. Because the algorithm model also feeds back the prediction accuracy, whether fault prediction needs to be carried out for the corresponding service system can be determined according to the prediction accuracy and a threshold matching strategy that the user can define in advance.
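One possible form of such a user-defined threshold matching strategy is a minimum accuracy per priority level. The priority names, the threshold values and the function name below are hypothetical and only illustrate the idea.

```python
from typing import Dict

# Hypothetical user-defined strategy: minimum prediction accuracy
# required before a fault prediction is acted on, per priority level.
ACCURACY_THRESHOLDS: Dict[str, float] = {
    "high": 0.60,    # critical systems: act even on less certain predictions
    "normal": 0.80,
    "low": 0.90,
}


def should_act_on_prediction(
    accuracy: float, priority: str, thresholds: Dict[str, float]
) -> bool:
    """True when the prediction accuracy meets the threshold configured
    for the priority of the service system or server."""
    return accuracy >= thresholds[priority]


# A 78% accurate prediction is acted on for a high-priority system only.
acts_for_high = should_act_on_prediction(0.78, "high", ACCURACY_THRESHOLDS)
acts_for_normal = should_act_on_prediction(0.78, "normal", ACCURACY_THRESHOLDS)
```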
An embodiment of the present invention further provides a monitoring apparatus for a service system, as shown in fig. 3, including:
an obtaining unit 31, configured to obtain monitoring index data of a service system in a reference time period;
a comparison unit 32, configured to compare the monitoring index data with a direct alarm condition;
the prediction unit 33 is configured to, if the monitoring index data does not satisfy the direct alarm condition, input the monitoring index data into a machine learning algorithm model trained in advance, and determine a prediction result within a prediction time period by using the machine learning algorithm model;
and the warning unit 34 is configured to compare the prediction result with a predicted warning condition, and predict whether the service system is abnormal within a prediction time period.
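The cooperation of the four units can be sketched as a single function that first checks the direct alarm condition and only then invokes the model. The callables passed in are placeholders for the comparison, prediction and alarm units; their names, the returned labels and the sample data are assumptions for illustration.

```python
from typing import Callable, Sequence


def monitor(
    metrics: Sequence[float],
    meets_direct_alarm: Callable[[Sequence[float]], bool],
    predict: Callable[[Sequence[float]], Sequence[float]],
    meets_predicted_alarm: Callable[[Sequence[float]], bool],
) -> str:
    """Direct alarm check first; prediction only when it does not fire."""
    if meets_direct_alarm(metrics):
        return "direct_alarm"
    prediction = predict(metrics)
    if meets_predicted_alarm(prediction):
        return "predicted_abnormal"
    return "normal"


# Placeholder rules: a direct alarm above 0.95, a naive "model" that adds
# 0.1 to each value, and a predicted alarm above 0.9.
result = monitor(
    [0.70, 0.82, 0.85],
    meets_direct_alarm=lambda m: max(m) > 0.95,
    predict=lambda m: [v + 0.1 for v in m],
    meets_predicted_alarm=lambda p: max(p) > 0.9,
)
```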
The apparatus further comprises a training unit 35, configured to:
acquiring training data of a service system in a historical time period;
and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.
Optionally, the training unit 35 is further configured to:
inputting historical fault sample data of the service system in a historical time period into the machine learning algorithm model for training, and determining fault model parameters;
inputting historical non-fault sample data of the service system in the historical time period into the machine learning algorithm model for training, and determining non-fault model parameters;
and determining a fault condition according to the fault model parameters and the non-fault model parameters.
Optionally, the monitoring index data of the service system includes hardware index data of the service system;
for the hardware index data of the service system,
the acquiring unit 31 is configured to acquire hardware index data of the service system in a first reference time period;
the prediction unit 33 is configured to determine a fluctuation condition of the hardware index data in the prediction time period;
the alarm unit 34 is configured to compare the fluctuation condition of the hardware indicator data with the fault condition, and determine whether a hardware fault occurs in the service system within the prediction time period; and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.
Optionally, the monitoring index data of the service system comprises service index data of the service system; for the service index data of the service system,
the acquiring unit 31 is configured to acquire service index data of the service system in a second reference time period;
the prediction unit 33 is configured to determine a fluctuation condition of the service index data in the prediction time period;
the alarm unit 34 is configured to compare the fluctuation condition of the service index data with a normal fluctuation range, and determine whether the service system is abnormal in the prediction time period; if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
Optionally, the monitoring index data of the service system includes monitoring index data of a plurality of monitoring indexes;
the prediction unit 33 is further configured to:
determining a weight parameter of each monitoring index;
and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.
Based on the same principle, the present invention also provides an electronic device, as shown in fig. 4, including:
a processor 501, a memory 502, a transceiver 503 and a bus interface 504, wherein the processor 501, the memory 502 and the transceiver 503 are connected through the bus interface 504;
the processor 501 is configured to read the program in the memory 502, and execute the following method:
acquiring monitoring index data of a service system in a reference time period;
comparing the monitoring index data with a direct alarm condition;
if the monitoring index data do not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model;
and comparing the prediction result with the predicted alarm condition, and predicting whether the service system is abnormal in the prediction time period.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (14)
1. A method for monitoring a service system, comprising:
acquiring monitoring index data of a service system in a reference time period;
comparing the monitoring index data with a direct alarm condition;
if the monitoring index data do not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model;
and comparing the prediction result with the predicted alarm condition, and predicting whether the service system is abnormal in the prediction time period.
2. The method of claim 1, wherein before inputting the monitoring index data into a pre-trained machine learning algorithm model and determining a prediction result within a prediction time period using the machine learning algorithm model, the method further comprises:
acquiring training data of a service system in a historical time period;
and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.
3. The method of claim 1, wherein the predicted alarm condition is determined according to:
inputting historical fault sample data of the service system into the machine learning algorithm model for training, and determining fault model parameters;
inputting historical non-fault sample data of the service system into the machine learning algorithm model for training, and determining non-fault model parameters;
and determining a fault condition according to the fault model parameters and the non-fault model parameters.
4. The method of claim 3, wherein the monitoring index data of the service system comprises hardware index data of the service system;
for the hardware index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:
acquiring hardware index data of the service system in a first reference time period;
the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:
determining fluctuation conditions of the hardware index data in the prediction time period;
comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:
comparing the fluctuation condition of the hardware index data with the fault condition, and judging whether the service system has hardware fault in the prediction time period;
and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.
5. The method of claim 1, wherein the monitoring index data of the service system comprises service index data of the service system;
for the service index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:
acquiring service index data of the service system in a second reference time period;
the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:
determining the fluctuation condition of the service index data in the prediction time period;
comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:
comparing the fluctuation condition of the service index data with a normal fluctuation range, and judging whether the service system is abnormal in the prediction time period;
if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
6. The method of claim 1, wherein the monitoring index data of the service system comprises monitoring index data of a plurality of monitoring indexes;
before determining the prediction result in the prediction time period by using the machine learning algorithm model, the method further comprises:
determining a weight parameter of each monitoring index;
and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.
7. A monitoring apparatus for a service system, comprising:
the acquisition unit is used for acquiring monitoring index data of the service system in a reference time period;
the comparison unit is used for comparing the monitoring index data with the direct alarm condition;
the prediction unit is used for inputting the monitoring index data into a machine learning algorithm model trained in advance if the monitoring index data does not meet the direct alarm condition, and determining a prediction result in a prediction time period by using the machine learning algorithm model;
and the alarm unit is used for comparing the prediction result with the predicted alarm condition and predicting whether the service system is abnormal in the prediction time period.
8. The apparatus of claim 7, further comprising a training unit to:
acquiring training data of a service system in a historical time period;
and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.
9. The apparatus of claim 7, further comprising a training unit to:
inputting historical fault sample data of the service system in a historical time period into the machine learning algorithm model for training, and determining fault model parameters;
inputting historical non-fault sample data of the service system in the historical time period into the machine learning algorithm model for training, and determining non-fault model parameters;
and determining a fault condition according to the fault model parameters and the non-fault model parameters.
10. The apparatus of claim 9, wherein the monitoring index data of the service system comprises hardware index data of the service system;
for the hardware index data of the service system,
the acquiring unit is used for acquiring hardware index data of the service system in a first reference time period;
the prediction unit is used for determining the fluctuation condition of the hardware index data in the prediction time period;
the alarm unit is used for comparing the fluctuation condition of the hardware index data with the fault condition and judging whether the service system has a hardware fault in the prediction time period; and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.
11. The apparatus of claim 7, wherein the monitoring index data of the service system comprises service index data of the service system;
for the service index data of the service system,
the acquiring unit is used for acquiring service index data of the service system in a second reference time period;
the prediction unit is used for determining the fluctuation condition of the service index data in the prediction time period;
the alarm unit is used for comparing the fluctuation condition of the service index data with a normal fluctuation range and judging whether the service system is abnormal in the prediction time period; if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
12. The apparatus of claim 7, wherein the monitoring index data of the service system comprises monitoring index data of a plurality of monitoring indexes;
the prediction unit is further configured to:
determining a weight parameter of each monitoring index;
and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910580570.5A CN110275814A (en) | 2019-06-28 | 2019-06-28 | A kind of monitoring method and device of operation system |
PCT/CN2020/097249 WO2020259421A1 (en) | 2019-06-28 | 2020-06-19 | Method and apparatus for monitoring service system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910580570.5A CN110275814A (en) | 2019-06-28 | 2019-06-28 | A kind of monitoring method and device of operation system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110275814A (en) | 2019-09-24 |
Family
ID=67963677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910580570.5A Pending CN110275814A (en) | 2019-06-28 | 2019-06-28 | A kind of monitoring method and device of operation system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110275814A (en) |
WO (1) | WO2020259421A1 (en) |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112766698B (en) * | 2021-01-13 | 2024-02-09 | 中国工商银行股份有限公司 | Application service pressure determining method and device |
CN115103386B (en) * | 2021-03-05 | 2024-07-16 | 中国电信股份有限公司 | Cell 5G wireless network performance early warning device, method and recording medium |
CN115119237B (en) * | 2021-03-17 | 2024-07-05 | 中国移动通信集团福建有限公司 | Chamber separation hidden fault identification method and device |
CN113780329A (en) * | 2021-04-06 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | Method, apparatus, server and medium for identifying data anomalies |
CN113111589B (en) * | 2021-04-25 | 2024-07-05 | 北京百度网讯科技有限公司 | Training method of prediction model, method, device and equipment for predicting heat supply temperature |
CN113127309B (en) * | 2021-04-30 | 2023-10-10 | 北京奇艺世纪科技有限公司 | Program monitoring method and device, electronic equipment and storage medium |
CN113391981A (en) * | 2021-06-30 | 2021-09-14 | 中国民航信息网络股份有限公司 | Early warning method for monitoring index and related equipment |
CN113468022B (en) * | 2021-07-01 | 2024-02-09 | 丁鹤 | Automatic operation and maintenance method for centralized monitoring of products |
CN113626285A (en) * | 2021-07-30 | 2021-11-09 | 平安普惠企业管理有限公司 | Model-based job monitoring method and device, computer equipment and storage medium |
CN113590427B (en) * | 2021-08-09 | 2024-05-03 | 中国建设银行股份有限公司 | Alarm method, device, storage medium and equipment for monitoring index abnormality |
CN113835961B (en) * | 2021-09-23 | 2023-05-16 | 中国联合网络通信集团有限公司 | Alarm information monitoring method, device, server and storage medium |
CN114003461A (en) * | 2021-09-26 | 2022-02-01 | 苏州浪潮智能科技有限公司 | Server failure prediction method, system, terminal and storage medium |
CN114157585B (en) * | 2021-12-09 | 2024-09-20 | 京东科技信息技术有限公司 | Method and device for monitoring service resources |
CN114185948A (en) * | 2021-12-16 | 2022-03-15 | 北京宏天信业信息技术股份有限公司 | Data quality monitoring method and system based on data center |
CN114971057A (en) * | 2022-06-09 | 2022-08-30 | 支付宝(杭州)信息技术有限公司 | Model selection method and device |
CN115314412B (en) * | 2022-06-22 | 2023-09-05 | 北京邮电大学 | Operation-and-maintenance-oriented type self-adaptive index prediction and early warning method and device |
CN115169709B (en) * | 2022-07-18 | 2023-04-18 | 华能汕头海门发电有限责任公司 | Power station auxiliary machine fault diagnosis method and system based on data driving |
CN115412326A (en) * | 2022-08-23 | 2022-11-29 | 天翼安全科技有限公司 | Abnormal flow detection method and device, electronic equipment and storage medium |
CN115473784B (en) * | 2022-09-06 | 2024-07-09 | 中国银联股份有限公司 | Method and device for determining invalid alarm |
CN115981969A (en) * | 2023-03-10 | 2023-04-18 | 中国信息通信研究院 | Monitoring method and device for block chain data platform, electronic equipment and storage medium |
CN116664110B (en) * | 2023-06-08 | 2024-03-29 | 湖北华中电力科技开发有限责任公司 | Electric power marketing digitizing method and system based on business center |
CN116455679B (en) * | 2023-06-16 | 2023-09-08 | 杭州美创科技股份有限公司 | Abnormal database operation and maintenance flow monitoring method and device and computer equipment |
CN116895046B (en) * | 2023-07-21 | 2024-05-07 | 北京亿宇嘉隆科技有限公司 | Abnormal operation and maintenance data processing method based on virtualization |
CN117806900B (en) * | 2023-07-28 | 2024-05-07 | 苏州浪潮智能科技有限公司 | Server management method, device, electronic equipment and storage medium |
CN116991108B (en) * | 2023-09-25 | 2023-12-12 | 四川公路桥梁建设集团有限公司 | Intelligent management and control method, system and device for bridge girder erection machine and storage medium |
CN117149552A (en) * | 2023-10-31 | 2023-12-01 | 联通在线信息科技有限公司 | Automatic interface detection method and device, electronic equipment and storage medium |
CN117896284A (en) * | 2024-01-17 | 2024-04-16 | 北京奇虎科技有限公司 | Performance fluctuation positioning method, device, equipment and storage medium |
CN117648383B (en) * | 2024-01-30 | 2024-06-11 | 中国人民解放军国防科技大学 | Heterogeneous database real-time data synchronization method, device, equipment and medium |
CN117892249B (en) * | 2024-03-15 | 2024-05-31 | 宁波析昶环保科技有限公司 | Intelligent operation and maintenance platform early warning system |
CN118260167B (en) * | 2024-05-08 | 2024-09-10 | 国家气象信息中心(中国气象局气象数据中心) | Meteorological data product processing flow monitoring method, system, equipment and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8352216B2 (en) * | 2008-05-29 | 2013-01-08 | General Electric Company | System and method for advanced condition monitoring of an asset system |
CN108172288A (en) * | 2018-01-05 | 2018-06-15 | 深圳倍佳医疗科技服务有限公司 | Medical Devices intelligent control method, device and computer readable storage medium |
CN110275814A (en) * | 2019-06-28 | 2019-09-24 | 深圳前海微众银行股份有限公司 | A kind of monitoring method and device of operation system |
- 2019-06-28: CN application CN201910580570.5A, published as CN110275814A (active, Pending)
- 2020-06-19: WO application PCT/CN2020/097249, published as WO2020259421A1 (active, Application Filing)
Cited By (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020259421A1 (en) * | 2019-06-28 | 2020-12-30 | 深圳前海微众银行股份有限公司 | Method and apparatus for monitoring service system |
CN112702184A (en) * | 2019-10-23 | 2021-04-23 | 中国电信股份有限公司 | Fault early warning method and device and computer-readable storage medium |
CN110941797B (en) * | 2019-11-07 | 2023-04-07 | 中信银行股份有限公司 | Operation index monitoring and trend prediction system based on service index |
CN110941797A (en) * | 2019-11-07 | 2020-03-31 | 中信银行股份有限公司 | Operation index monitoring and trend prediction system based on service index |
CN112825175A (en) * | 2019-11-20 | 2021-05-21 | 顺丰科技有限公司 | Client abnormity early warning method, device and equipment |
CN110865929B (en) * | 2019-11-26 | 2024-01-23 | 携程旅游信息技术(上海)有限公司 | Abnormality detection early warning method and system |
CN112948223A (en) * | 2019-11-26 | 2021-06-11 | 北京沃东天骏信息技术有限公司 | Method and device for monitoring operation condition |
CN110865929A (en) * | 2019-11-26 | 2020-03-06 | 携程旅游信息技术(上海)有限公司 | Abnormity detection early warning method and system |
CN111104299A (en) * | 2019-11-29 | 2020-05-05 | 山东英信计算机技术有限公司 | Server performance prediction method and device, electronic equipment and storage medium |
CN112994960B (en) * | 2019-12-02 | 2022-09-16 | 中国移动通信集团浙江有限公司 | Method and device for detecting business data abnormity and computing equipment |
CN112994960A (en) * | 2019-12-02 | 2021-06-18 | 中国移动通信集团浙江有限公司 | Method and device for detecting business data abnormity and computing equipment |
CN111078503A (en) * | 2019-12-23 | 2020-04-28 | 中国建设银行股份有限公司 | Abnormity monitoring method and system |
CN111241151A (en) * | 2019-12-27 | 2020-06-05 | 北京健康之家科技有限公司 | Service data analysis early warning method, system, storage medium and computing device |
CN111339156A (en) * | 2020-02-07 | 2020-06-26 | 京东城市(北京)数字科技有限公司 | Long-term determination method and device of business data and computer readable storage medium |
CN111339156B (en) * | 2020-02-07 | 2023-09-26 | 京东城市(北京)数字科技有限公司 | Method, apparatus and computer readable storage medium for long-term determination of business data |
CN113535444A (en) * | 2020-04-14 | 2021-10-22 | 中国移动通信集团浙江有限公司 | Transaction detection method, transaction detection device, computing equipment and computer storage medium |
CN113535444B (en) * | 2020-04-14 | 2023-11-03 | 中国移动通信集团浙江有限公司 | Abnormal motion detection method, device, computing equipment and computer storage medium |
CN113572625A (en) * | 2020-04-28 | 2021-10-29 | 中国移动通信集团浙江有限公司 | Fault early warning method, early warning device, equipment and computer medium |
CN111563022A (en) * | 2020-05-12 | 2020-08-21 | 中国民航信息网络股份有限公司 | Centralized storage monitoring method and device |
CN111563022B (en) * | 2020-05-12 | 2023-09-05 | 中国民航信息网络股份有限公司 | Centralized memory monitoring method and device |
US11500368B2 (en) * | 2020-05-21 | 2022-11-15 | Tata Consultancy Services Limited | Predicting early warnings of an operating mode of equipment in industry plants |
CN111708682A (en) * | 2020-06-17 | 2020-09-25 | 腾讯科技(深圳)有限公司 | Data prediction method, device, equipment and storage medium |
CN111796995B (en) * | 2020-06-30 | 2024-02-09 | 中国工商银行股份有限公司 | Integrated learning-based cyclic serial number usage early warning method and system |
CN111796995A (en) * | 2020-06-30 | 2020-10-20 | 中国工商银行股份有限公司 | Cyclic serial number usage early warning method and system based on ensemble learning |
CN111752816A (en) * | 2020-06-30 | 2020-10-09 | 深圳前海微众银行股份有限公司 | Operating system analysis method and device |
CN111833557A (en) * | 2020-07-27 | 2020-10-27 | 中国工商银行股份有限公司 | Fault identification method and device |
CN112019390A (en) * | 2020-09-09 | 2020-12-01 | 腾讯科技(深圳)有限公司 | Network fault positioning method and related device |
CN112182508A (en) * | 2020-09-16 | 2021-01-05 | 支付宝(杭州)信息技术有限公司 | Abnormity monitoring method and device for compliance business indexes |
CN112102049A (en) * | 2020-09-23 | 2020-12-18 | 中国建设银行股份有限公司 | Model training method, business processing method, device and equipment |
CN112256526B (en) * | 2020-10-14 | 2024-02-23 | 中国银联股份有限公司 | Machine learning-based data real-time monitoring method and device |
CN112256526A (en) * | 2020-10-14 | 2021-01-22 | 中国银联股份有限公司 | Data real-time monitoring method and device based on machine learning |
CN113516270A (en) * | 2020-10-30 | 2021-10-19 | 腾讯科技(深圳)有限公司 | Service data monitoring method and device |
CN112486767A (en) * | 2020-11-25 | 2021-03-12 | 中移(杭州)信息技术有限公司 | Intelligent monitoring method, system, server and storage medium for cloud resources |
CN113411549A (en) * | 2021-06-11 | 2021-09-17 | 上海兴容信息技术有限公司 | Method for judging whether business of target store is normal or not |
CN113411233A (en) * | 2021-06-17 | 2021-09-17 | 建信金融科技有限责任公司 | Method and device for monitoring CPU utilization rate of central processing unit |
CN113411233B (en) * | 2021-06-17 | 2022-12-23 | 中国建设银行股份有限公司 | Method and device for monitoring CPU utilization rate of central processing unit |
CN113411217A (en) * | 2021-06-21 | 2021-09-17 | 广州迷听科技有限公司 | Method and device for monitoring and alarming call system |
CN113537809A (en) * | 2021-07-28 | 2021-10-22 | 深圳供电局有限公司 | Active decision-making method and system for resource expansion in deep learning |
CN113807690A (en) * | 2021-09-09 | 2021-12-17 | 国网江苏省电力有限公司苏州供电分公司 | Online evaluation and early warning method and system for operation state of regional power grid regulation and control system |
CN113821416A (en) * | 2021-09-18 | 2021-12-21 | 中国电信股份有限公司 | Monitoring alarm method, device, storage medium and electronic equipment |
CN114399321A (en) * | 2021-11-15 | 2022-04-26 | 湖南快乐阳光互动娱乐传媒有限公司 | Business system stability analysis method, device and equipment |
CN114415602A (en) * | 2021-12-03 | 2022-04-29 | 珠海格力电器股份有限公司 | Monitoring method, device and system of industrial equipment and storage medium |
CN114415602B (en) * | 2021-12-03 | 2023-09-26 | 珠海格力电器股份有限公司 | Monitoring method, device, system and storage medium for industrial equipment |
CN114328118B (en) * | 2021-12-30 | 2023-11-14 | 苏州浪潮智能科技有限公司 | Intelligent alarming method, device, equipment and medium for operation and maintenance monitoring data |
CN114328118A (en) * | 2021-12-30 | 2022-04-12 | 苏州浪潮智能科技有限公司 | Intelligent alarm method, device, equipment and medium for operation and maintenance monitoring data |
CN115439089A (en) * | 2022-09-08 | 2022-12-06 | 江苏方洋智能科技有限公司 | Business management system based on machine learning |
CN115439089B (en) * | 2022-09-08 | 2023-09-08 | 江苏方洋智能科技有限公司 | Service management system based on machine learning |
Also Published As
Publication number | Publication date |
---|---|
WO2020259421A1 (en) | 2020-12-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110275814A (en) | | A kind of monitoring method and device of operation system |
CN110069810B (en) | | Battery failure prediction method, device, equipment and readable storage medium |
US11403164B2 (en) | | Method and device for determining a performance indicator value for predicting anomalies in a computing infrastructure from values of performance indicators |
KR101984730B1 (en) | | Automatic predicting system for server failure and automatic predicting method for server failure |
CN111045894B (en) | | Database abnormality detection method, database abnormality detection device, computer device and storage medium |
US20140351642A1 (en) | | System and methods for automated plant asset failure detection |
US11307916B2 (en) | | Method and device for determining an estimated time before a technical incident in a computing infrastructure from values of performance indicators |
US20210232104A1 (en) | | Method and system for identifying and forecasting the development of faults in equipment |
CN114267178B (en) | | Intelligent operation maintenance method and device for station |
US11675643B2 (en) | | Method and device for determining a technical incident risk value in a computing infrastructure from performance indicator values |
CN113099476B (en) | | Network quality detection method, device, equipment and storage medium |
CN111897705A (en) | | Service state processing method, service state processing device, model training method, model training device, equipment and storage medium |
CN107480703B (en) | | Transaction fault detection method and device |
KR101960755B1 (en) | | Method and apparatus of generating unacquired power data |
CN114138601A (en) | | Service alarm method, device, equipment and storage medium |
CN110413482B (en) | | Detection method and device |
CN116866152A (en) | | Risk operation management and control method and device, electronic equipment and storage medium |
CN108664696B (en) | | Method and device for evaluating running state of water chiller |
CN115858291A (en) | | System index detection method and device, electronic equipment and storage medium thereof |
CN114938339B (en) | | Data processing method and related device |
CN110517731A (en) | | Genetic test quality monitoring data processing method and system |
CN114861909A (en) | | Model quality monitoring method and device, electronic equipment and storage medium |
CN116962229A (en) | | Cluster health degree assessment method and device, electronic equipment and storage medium |
CN116347466A (en) | | Base station out-of-service alarm prediction method and device |
CN117573412A (en) | | System fault early warning method and device, electronic equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||