CN110275814A - Monitoring method and device of service system

Publication number
CN110275814A
Authority
CN
China
Prior art keywords
time period
index data
prediction
service system
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910580570.5A
Other languages
Chinese (zh)
Inventor
陈泽昊
邹高锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN201910580570.5A
Publication of CN110275814A
Priority to PCT/CN2020/097249 (WO2020259421A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the present invention relate to the field of machine learning, and in particular to a monitoring method and device for a service system, which address the hysteresis and low accuracy of service system alarms. The method includes: acquiring monitoring index data of a service system in a reference time period; comparing the monitoring index data with a direct alarm condition; if the monitoring index data does not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model; and comparing the prediction result with a predicted alarm condition to predict whether the service system will be abnormal in the prediction time period.

Description

Monitoring method and device of service system
Technical Field
The invention relates to the field of machine learning in financial technology (Fintech), in particular to a monitoring method and a monitoring device for a business system.
Background
With the development of computer technology, more and more technologies (big data, distributed computing, blockchain, artificial intelligence, etc.) are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). The security and real-time requirements of the financial industry, however, place higher demands on these technologies. A traditional service system monitoring platform mainly relies on users configuring alarm strategies according to their needs. When a service system goes online and requires daily monitoring, service operation and maintenance or development personnel identify the key points of the service system, define alarm strategy conditions for those key points, and configure the corresponding monitoring alarm strategies in the monitoring platform. The monitoring platform then scans the configured key points to obtain the corresponding detection indexes, matches the detection indexes against the monitoring alarm strategy configured by the user (that is, checks whether the alarm condition is met), and raises an alarm to notify the user if the configured alarm condition is met.
In the prior art, a monitoring alarm strategy is an alarm threshold that the user pre-configures in an alarm tool. Such thresholds are generally set by operation and maintenance or development personnel according to historical experience, so their accuracy is low. In addition, the alarm process takes a long time, from detecting an abnormality, to confirming it, to finally notifying the relevant operation and maintenance personnel, so in some cases the alarm lags behind the fault. When a service system becomes abnormal, the abnormal data cannot be controlled and may grow exponentially. The index data may not yet meet the alarm condition when the abnormality first appears, and by the time operation and maintenance personnel receive the alarm notification, the service index is already severely abnormal, the affected scope has spread rapidly, the service has been damaged, and the alarm has lost its purpose.
Disclosure of Invention
The application provides a monitoring method and a monitoring device for a service system, which are used for solving the problems of hysteresis and low accuracy of service system alarm.
The monitoring method for the service system provided by the embodiment of the invention comprises the following steps:
acquiring monitoring index data of a service system in a reference time period;
comparing the monitoring index data with a direct alarm condition;
if the monitoring index data do not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model;
and comparing the prediction result with the predicted alarm condition, and predicting whether the service system is abnormal in the prediction time period.
In an optional embodiment, before inputting the monitoring index data into a machine learning algorithm model trained in advance and determining a prediction result in a prediction time period by using the machine learning algorithm model, the method further includes:
acquiring training data of a service system in a historical time period;
and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.
In an optional embodiment, the predicted alarm condition is determined according to the following:
inputting historical fault sample data of the service system into the machine learning algorithm model for training, and determining fault model parameters;
inputting historical non-fault sample data of the service system into the machine learning algorithm model for training, and determining non-fault model parameters;
and determining a fault condition according to the fault model parameters and the non-fault model parameters.
In an optional embodiment, the monitoring index data of the business system includes hardware index data of the business system;
for the hardware index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:
acquiring hardware index data of the service system in a first reference time period;
the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:
determining fluctuation conditions of the hardware index data in the prediction time period;
comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:
comparing the fluctuation condition of the hardware index data with the fault condition, and judging whether the service system has hardware fault in the prediction time period;
and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.
In an optional embodiment, the monitoring index data of the service system includes service index data of the service system;
for the service index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:
acquiring service index data of the service system in a second reference time period;
the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:
determining the fluctuation condition of the service index data in the prediction time period;
comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:
comparing the fluctuation condition of the service index data with a normal fluctuation range, and judging whether the service system is abnormal in the prediction time period;
if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
In an optional embodiment, the monitoring index data of the service system includes monitoring index data of a plurality of monitoring indexes;
before determining the prediction result in the prediction time period by using the machine learning algorithm model, the method further comprises:
determining a weight parameter of each monitoring index;
and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.
A monitoring device of a business system, comprising:
the acquisition unit is used for acquiring monitoring index data of the service system in a reference time period;
the comparison unit is used for comparing the monitoring index data with the direct alarm condition;
the prediction unit is used for inputting the monitoring index data into a machine learning algorithm model trained in advance if the monitoring index data does not meet the direct alarm condition, and determining a prediction result in a prediction time period by using the machine learning algorithm model;
and the alarm unit is used for comparing the prediction result with the predicted alarm condition and predicting whether the service system is abnormal in the prediction time period.
In an optional embodiment, the apparatus further comprises a training unit configured to:
acquiring training data of a service system in a historical time period;
and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.
In an optional embodiment, the apparatus further comprises a training unit configured to:
inputting historical fault sample data of the service system in a historical time period into the machine learning algorithm model for training, and determining fault model parameters;
inputting historical non-fault sample data of the service system in the historical time period into the machine learning algorithm model for training, and determining non-fault model parameters;
and determining a fault condition according to the fault model parameters and the non-fault model parameters.
In an optional embodiment, the monitoring index data of the business system includes hardware index data of the business system;
for the hardware index data of the business system,
the acquiring unit is used for acquiring hardware index data of the service system in a first reference time period;
the prediction unit is used for determining the fluctuation condition of the hardware index data in the prediction time period;
the alarm unit is used for comparing the fluctuation condition of the hardware index data with the fault condition and judging whether the service system has a hardware fault in the prediction time period; and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.
In an optional embodiment, the monitoring index data of the service system includes service index data of the service system;
for the service index data of the service system,
the acquiring unit is used for acquiring service index data of the service system in a second reference time period;
the prediction unit is used for determining the fluctuation condition of the service index data in the prediction time period;
the alarm unit is used for comparing the fluctuation condition of the service index data with a normal fluctuation range and judging whether the service system is abnormal in the prediction time period; if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
In an optional embodiment, the monitoring index data of the service system includes monitoring index data of a plurality of monitoring indexes;
the prediction unit is further configured to:
determining a weight parameter of each monitoring index;
and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.
An embodiment of the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
Embodiments of the present invention also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method as described above.
In the embodiment of the invention, the monitoring index data of the service system in the reference time period is obtained and compared with the direct alarm condition; if the monitoring index data meets the direct alarm condition, an alarm is sent to the user directly. If the monitoring index data does not meet the direct alarm condition, the monitoring index data is input into the machine learning algorithm model, which determines a prediction result for the prediction time period. The prediction time period includes a period after the current time point; that is, the machine learning algorithm model predicts the operating condition of the service system for a period of time in the future. The prediction is compared with the predicted alarm condition to judge whether an abnormality is likely, and if so, an alarm is sent to the user. By using the machine learning algorithm model to predict possible future abnormalities, the embodiment of the invention lets service operation and maintenance personnel prepare disaster recovery in advance for an impending abnormality, which improves the availability of the service system, and the prediction accuracy is high.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a possible system architecture according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a monitoring method for a service system according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a monitoring apparatus of a service system according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, a system architecture to which the embodiment of the present invention is applicable includes a service system 101, a monitoring platform 102, and a monitoring client 103. The service system 101 and/or the monitoring platform 102 may be a network device such as a computer, may be an independent device, or may be a server cluster formed by a plurality of servers. Preferably, the service system 101 and/or the monitoring platform 102 may employ cloud computing technology for information processing.
The monitoring client 103 is installed on the monitoring platform 102. The monitoring platform 102 may be an electronic device with a wireless communication function, such as a mobile phone, a tablet computer, or a dedicated handheld device, and may also be a device connected to the internet in a wired access manner, such as a Personal Computer (PC), a notebook computer, or a server.
The monitoring platform 102 may communicate with the service system 101 over the Internet, or through a mobile communication system such as Global System for Mobile Communications (GSM) or Long Term Evolution (LTE). Likewise, the monitoring client 103 may communicate with the monitoring platform 102 over the Internet, or through a mobile communication system such as GSM or LTE.
From the user's perspective, the system architecture in the embodiment of the invention is almost the same as that of a traditional monitoring platform. The user only needs to configure the service index monitoring strategies of interest and does not need to know how fault prediction is implemented inside the monitoring platform, so the platform is user-friendly and has no barrier to use.
For convenience of understanding, terms that may be referred to in the embodiments of the present invention are defined and explained below.
User: the users in the embodiment of the invention include service system developers, service operation and maintenance personnel, and all other personnel who use the monitoring platform to monitor services.
Intelligent monitoring platform: a tool responsible for monitoring the service system and raising alarms. It monitors the service indexes and basic service indexes (such as server hardware health and network connectivity) of the system, integrates the detected indexes through a machine learning algorithm model, and predicts faults and abnormalities that may occur in the future.
Alarm detection/prediction: also called service system fault detection/prediction; the monitoring platform detects and predicts faults or abnormalities that may occur in the daily operation of the service system.
LSTM: Long Short-Term Memory, a recurrent neural network algorithm in machine learning.
Time series: a numerical sequence formed by arranging the values of the same statistical index in order of occurrence time. The main purpose of time series analysis is to predict the future based on existing historical data. Most economic data is given as time series. Depending on the observation interval, the time unit in a time series may be a year, a quarter, a month, or any other unit of time.
In order to predict abnormalities of the service system in advance and improve the accuracy of the prediction, an embodiment of the present invention provides a monitoring method for a service system. As shown in FIG. 2, the monitoring method for a service system provided by the embodiment of the present invention includes the following steps:
Step 201, obtaining monitoring index data of the service system in the reference time period.
Because data formats differ across server manufacturers, across hardware, and across service interfaces and services, the reported data needs to be cleaned so that the various formats are unified. The cleaned data can then be used by the big data processing and machine learning algorithm module for machine learning training, alarm matching, and prediction.
Meanwhile, because the monitoring indexes of server hardware and of service interfaces differ, and each component and interface may produce data in many dimensions, data sources that are positively correlated with the monitoring index, such as the SMART value of a hard disk or the health value of a mainboard, need to be selected in order to eliminate interference items.
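The cleaning and selection described above can be sketched as follows. This is only a minimal illustration: the two report layouts (`vendor_a`, `vendor_b`), all field names, and the whitelist `RELEVANT_FIELDS` are hypothetical, since the embodiment does not specify concrete formats or a selection rule.

```python
from typing import Any

# Hypothetical whitelist of fields assumed to correlate with the monitored index.
RELEVANT_FIELDS = {"smart_value", "board_health"}


def clean_record(raw: dict[str, Any]) -> dict[str, Any]:
    """Map a vendor-specific report onto one unified record layout."""
    vendor = raw.get("vendor")
    if vendor == "vendor_a":
        # Assumed layout: flat keys, timestamp in seconds.
        timestamp = int(raw["ts"])
        metrics = {
            "smart_value": float(raw["SMART"]),
            "board_health": float(raw["Health"]),
            "fan_rpm": float(raw["Fan"]),
        }
    elif vendor == "vendor_b":
        # Assumed layout: nested metrics, timestamp in milliseconds.
        timestamp = int(raw["time_ms"]) // 1000
        metrics = {name: float(value) for name, value in raw["metrics"].items()}
    else:
        raise ValueError(f"unknown report format: {vendor!r}")

    # Keep only the selected data sources; everything else is treated as interference.
    selected = {k: v for k, v in metrics.items() if k in RELEVANT_FIELDS}
    return {"timestamp": timestamp, "metrics": selected}
```

After this step every record has the same shape regardless of its origin, so the later alarm matching and model input code only needs to handle one layout.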
Step 202, comparing the monitoring index data with a direct alarm condition.
The cleaned monitoring index data is processed to judge whether it meets a direct alarm condition. If so, the service system is currently abnormal and an alarm is sent to the user directly. If not, the reported monitoring index data is fed to the trained machine learning algorithm model to predict whether an abnormality is likely in a future time period.
The direct alarm condition in the embodiment of the invention can also be obtained by training a machine learning algorithm model, or be determined automatically by the system during day-to-day iteration in the production environment. This reduces the time operation and maintenance personnel spend configuring direct alarm conditions, improves management efficiency, and avoids false alarms caused by manual configuration.
Step 203, if the monitoring index data does not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model.
The machine learning algorithm model may include a convolutional neural network (CNN), a support vector machine (SVM), K-Means clustering, logistic regression, and the like. Considering the trade-off between training cost (running time and the scale of the server cluster required) and the quality of the prediction result, the LSTM neural network algorithm is preferably used for prediction in the embodiment of the invention.
Step 204, comparing the prediction result with the predicted alarm condition, and predicting whether the service system is abnormal in the prediction time period.
In a specific implementation, the predicted alarm condition can be set by operation and maintenance personnel according to experience, obtained by training a machine learning algorithm model, or determined by the system during day-to-day iteration in the production environment. If an abnormality is predicted, the user can be notified by email, short message, telephone, WeChat, or similar means.
In the embodiment of the invention, the monitoring index data of the service system in the reference time period is obtained and compared with the direct alarm condition; if the monitoring index data meets the direct alarm condition, an alarm is sent to the user directly. If the monitoring index data does not meet the direct alarm condition, the monitoring index data is input into the machine learning algorithm model, which determines a prediction result for the prediction time period. The prediction time period includes a period after the current time point; that is, the machine learning algorithm model predicts the operating condition of the service system for a period of time in the future. The prediction is compared with the predicted alarm condition to judge whether an abnormality is likely, and if so, an alarm is sent to the user. By using the machine learning algorithm model to predict possible future abnormalities, the embodiment of the invention lets service operation and maintenance personnel prepare disaster recovery in advance for an impending abnormality, which improves the availability of the service system, and the prediction accuracy is high.
Since the monitoring index data of the service system is correlated with time and forms time series data, a prediction result for a future time period can be derived from the monitoring index data, and this result indicates the expected operating condition of the service system. The prediction result is then compared with the configured predicted alarm condition to determine whether the service system is likely to be abnormal in the future time period.
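The two-stage flow of steps 201 to 204 can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the threshold of 90 is an arbitrary illustrative value, and `linear_extrapolation` is a deliberately simple stand-in for the trained model, not the LSTM model preferred by the embodiment.

```python
from typing import Callable, Sequence

Series = Sequence[float]


def monitor(
    reference_data: Series,
    direct_alarm: Callable[[Series], bool],
    predict: Callable[[Series], Series],
    predicted_alarm: Callable[[Series], bool],
) -> str:
    """Return 'direct_alarm', 'predicted_alarm', or 'normal'."""
    # Step 202: compare the monitoring index data with the direct alarm condition.
    if direct_alarm(reference_data):
        return "direct_alarm"
    # Step 203: otherwise ask the (pre-trained) model for a forecast.
    forecast = predict(reference_data)
    # Step 204: compare the forecast with the predicted alarm condition.
    if predicted_alarm(forecast):
        return "predicted_alarm"
    return "normal"


def linear_extrapolation(data: Series, horizon: int = 3) -> list[float]:
    """Hypothetical stand-in model: continue the last observed step."""
    step = data[-1] - data[-2]
    return [data[-1] + step * (i + 1) for i in range(horizon)]


def reaches_90(values: Series) -> bool:
    """Hypothetical alarm condition: any value at or above 90."""
    return max(values) >= 90.0
```

With these stand-ins, `monitor([60.0, 70.0, 80.0], reaches_90, linear_extrapolation, reaches_90)` raises no direct alarm, because no observed value has reached 90, but it does raise a predicted alarm, because the extrapolated values cross the threshold.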
Further, the LSTM algorithm model is trained based on training data over a historical period of time. Before inputting the monitoring index data into a machine learning algorithm model trained in advance and determining a prediction result in a prediction time period by using the machine learning algorithm model, the method further comprises:
acquiring training data of a service system in a historical time period;
and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.
In a specific implementation, the training data of the service system at each time point is used as an output parameter of the LSTM algorithm model, and for each output parameter, multiple training data points in the historical time period before the corresponding time point are used as the input parameters of the LSTM algorithm model. After a large number of such input-output correspondences are obtained, the model parameters of the LSTM algorithm model can be obtained with an existing LSTM training method.
It should be noted that the historical time period corresponding to the training process and the reference time period corresponding to the prediction process may be the same time period or different time periods, and if they are different time periods, the two may or may not overlap. For example, the historical time period is the 1000 hours before the current time point, and the reference time period is the 999 hours before the current time point; or the historical time period is 9 to 11 am every day from January to March 2018, and the reference time period is 9 to 11 am every day from January to March 2019. The historical time period and the reference time period are selected according to the calculation requirements, and the embodiment of the invention does not limit them.
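The construction of input and output parameters described above amounts to a sliding window over the historical series. The sketch below illustrates this pairing only; the function name and the `window` parameter are assumptions, and the actual LSTM training is left to an existing training method as the text states.

```python
from typing import Sequence


def make_training_pairs(
    series: Sequence[float], window: int
) -> list[tuple[list[float], float]]:
    """Pair each value with the `window` values that immediately precede it."""
    if window <= 0:
        raise ValueError("window must be positive")
    pairs = []
    for t in range(window, len(series)):
        inputs = list(series[t - window:t])  # history before time point t
        target = series[t]                   # value at time point t
        pairs.append((inputs, target))
    return pairs
```

For a series of five values and a window of three, this yields two pairs: the first three values paired with the fourth, and the second to fourth values paired with the fifth.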
Further, in the embodiment of the present invention, the predicted alarm condition may also be obtained by training using the LSTM algorithm. The predicted alarm condition is determined according to the following manner:
inputting historical fault sample data of the service system into the machine learning algorithm model for training, and determining fault model parameters;
inputting historical non-fault sample data of the service system into the machine learning algorithm model for training, and determining non-fault model parameters;
and determining a fault condition according to the fault model parameters and the non-fault model parameters.
In a specific implementation, the historical fault samples are the hardware index data collected from the service system when a hardware fault was confirmed; inputting them into the LSTM algorithm model determines the fault model parameters that characterize the hardware of the service system when it is faulty. The historical non-fault samples are the hardware index data collected while the service system was operating normally; inputting them into the LSTM algorithm model determines the non-fault model parameters that characterize the hardware during normal operation. The specific fault condition can then be determined from the fault model parameters and the non-fault model parameters.
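The embodiment does not state how the fault condition is computed from the two sets of parameters. The sketch below shows one possible reading under strong simplifying assumptions: instead of LSTM parameters, each class is summarized by a single hypothetical statistic (the mean of the per-sample peak value), and the fault condition is the midpoint between the two. It also assumes that faulty hardware shows higher index values than healthy hardware, as with a temperature reading.

```python
from statistics import mean
from typing import Sequence


def fit_class_parameter(samples: Sequence[Sequence[float]]) -> float:
    """Stand-in for 'model parameters': mean of each sample's peak value."""
    return mean(max(sample) for sample in samples)


def derive_fault_threshold(
    fault_samples: Sequence[Sequence[float]],
    non_fault_samples: Sequence[Sequence[float]],
) -> float:
    """Place the fault condition midway between the two class parameters."""
    fault_param = fit_class_parameter(fault_samples)
    non_fault_param = fit_class_parameter(non_fault_samples)
    if fault_param <= non_fault_param:
        # Assumption of this sketch: faults show up as higher index values.
        raise ValueError("fault samples are not separable from non-fault samples")
    return (fault_param + non_fault_param) / 2


def meets_fault_condition(forecast: Sequence[float], threshold: float) -> bool:
    """A forecast meets the fault condition if its peak reaches the threshold."""
    return max(forecast) >= threshold
```

A learned classifier would normally replace the midpoint rule; the point of the sketch is only that the condition is derived from both fault and non-fault history rather than configured by hand.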
Because the monitoring index data of the service system comprises the hardware index data of the service system and the service index data of the service system, the embodiment of the invention respectively predicts and alarms aiming at two different monitoring indexes.
Further, for the hardware index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:
acquiring hardware index data of the service system in a first reference time period;
the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:
determining fluctuation conditions of the hardware index data in the prediction time period;
comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:
comparing the fluctuation condition of the hardware index data with the fault condition, and judging whether the service system has hardware fault in the prediction time period;
and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.
In a specific implementation, for hardware index data, each server has its own life cycle, and the closer the reference data is to the time node at which a fault occurs, the higher the prediction accuracy. Therefore, the first reference time period of the hardware index data is selected to be as close as possible to the current time point.
Table 1 shows the failure prediction results of the hardware index data.
TABLE 1
For example, as shown in Table 1, for monitoring index 1, the prediction that the server hardware may become abnormal within 45 days has a prediction accuracy of 78%, and the prediction that it may become abnormal within 60 days has a prediction accuracy of 80%.
For the service index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:
acquiring service index data of the service system in a second reference time period;
the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:
determining the fluctuation condition of the service index data in the prediction time period;
comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:
comparing the fluctuation condition of the service index data with a normal fluctuation range, and judging whether the service system is abnormal in the prediction time period;
if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
In a specific implementation, the service index data changes every day, so the prediction accuracy of the prediction model improves as the server hardware stays in production longer or as the service index is monitored for longer. For the service index, the more monitoring index data is used for prediction, the larger the sample and the more accurate the result. Therefore, the second reference time period of the service index data is selected to be as long as possible.
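As a non-limiting illustration, comparing predicted service index data with a normal fluctuation range and determining the abnormal prediction time period might be sketched as follows. In the source the normal fluctuation range is determined by the machine learning algorithm model; the mean plus or minus `k` standard deviations used here is only a simple stand-in, and the function names and the parameter `k` are hypothetical.

```python
from statistics import mean, stdev


def normal_range(history, k=3.0):
    """Return (low, high) as mean +/- k sample standard deviations.

    Assumption: this statistic replaces the range that the source derives
    from the machine learning algorithm model.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def abnormal_periods(predicted, low, high):
    """Return inclusive (start, end) index pairs covering each run of
    consecutive predicted values that fall outside [low, high]."""
    periods = []
    start = None
    for i, value in enumerate(predicted):
        out_of_range = value < low or value > high
        if out_of_range and start is None:
            start = i
        elif not out_of_range and start is not None:
            periods.append((start, i - 1))
            start = None
    if start is not None:
        periods.append((start, len(predicted) - 1))
    return periods
```

A longer `history` gives a more stable estimate of the range, which is consistent with selecting the second reference time period to be as long as possible.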
Because the volume of collected monitoring index data may be very large, and each monitoring index influences an abnormality to a different degree, the weight range of each monitoring index needs to be configured through calculation. Further, the monitoring index data of the service system includes monitoring index data of a plurality of monitoring indexes;
before determining the prediction result in the prediction time period by using the machine learning algorithm model, the method further comprises:
determining a weight parameter of each monitoring index;
and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.
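As a non-limiting illustration, applying per-index weight parameters before the data reaches the model might be sketched as follows. The source does not specify how the weights are calculated or how the model consumes them; the sketch assumes the raw weights are already given, normalizes them to sum to 1, and scales each index value by its weight. The function names and the index names used in any example are hypothetical.

```python
def normalize_weights(raw_weights):
    """Scale raw per-index weights so that they sum to 1."""
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {name: weight / total for name, weight in raw_weights.items()}


def weighted_inputs(series_by_index, weights):
    """Return (index names, rows), where each row holds the weighted values
    of all monitoring indexes at one time step.

    Assumption: every series has the same length and the values have
    already been scaled to comparable units.
    """
    names = sorted(series_by_index)
    length = len(series_by_index[names[0]])
    rows = []
    for t in range(length):
        rows.append([series_by_index[name][t] * weights[name] for name in names])
    return names, rows
```

For example, raw weights of 2, 1 and 1 for three indexes normalize to 0.5, 0.25 and 0.25, so the first index contributes twice as strongly to every model input row as either of the others.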
In the embodiment of the invention, the monitoring index condition in a future time period is predicted by an LSTM model; if the monitoring index data detected at the next monitoring time point falls outside the predicted normal fluctuation range, an alarm is sent to the service operation and maintenance/development staff. In addition, these staff can prepare in advance according to the monitoring index fluctuation predicted by the monitoring platform, so that the service is not affected. For example, before holidays or new business activities, the monitoring platform predicts the daily access traffic that the service may gain in the future, so that service operation and maintenance personnel can expand system capacity in advance and avoid the service system becoming unavailable because of insufficient performance.
To make the present invention easier to understand, the above flow is described in detail below with a specific embodiment based on the architecture of fig. 1. The specific embodiment includes the following steps:
Step S300: Train the LSTM algorithm model to obtain model parameters.
Step S301: Acquire monitoring index data.
Because the data indexes of server hardware and of service interfaces differ, and each component and each interface may have many data dimensions, data sources positively correlated with the service index, such as the SMART value of a hard disk or the Health value of a mainboard, need to be selected so as to eliminate interference items.
Step S302: Preprocess the data.
Because the volume of collected monitoring index data may be very large, and each monitoring index influences an abnormality to a different degree, the weight parameter of each monitoring index needs to be acquired.
Step S303: Acquire the predicted alarm condition. Model training is carried out separately with historical fault sample data and historical non-fault sample data to obtain fault model parameters and non-fault model parameters, and the predicted alarm condition is determined from the fault model parameters and the non-fault model parameters.
Step S304: Input the monitoring index data and the weight parameters into the trained LSTM algorithm model, and use the LSTM algorithm model to calculate the prediction result for the prediction time period.
In a specific implementation, the LSTM algorithm model is used to predict a complete sequence: the training window is initialized only once, with the first part of the training data, and the window then slides forward and predicts the next point each time, as in point-by-point prediction. The LSTM algorithm model predicts from previously predicted data: in the second prediction, one data point (the last point) in the data used by the model comes from the previous prediction; in the third prediction, two points come from previous predictions; and so on. By the 99th prediction, the data in the test set has been fully predicted. This means that the time series the algorithm model can predict is greatly extended.
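As a non-limiting illustration, the sliding-window prediction of a complete sequence described above might be sketched as follows. The sketch is independent of any particular model: `predict_next` is a hypothetical callable standing in for the trained LSTM algorithm model, and `linear_extrapolation` is a trivial placeholder used only so that the example runs without a trained model.

```python
def predict_sequence(seed_window, predict_next, steps):
    """Predict `steps` future points, feeding each prediction back in.

    The window is initialized once from real data. After every prediction
    the oldest point is dropped and the new prediction is appended, so
    after len(seed_window) steps the window holds only predicted values.
    """
    window = list(seed_window)
    predictions = []
    for _ in range(steps):
        next_point = predict_next(window)
        predictions.append(next_point)
        window = window[1:] + [next_point]
    return predictions


def linear_extrapolation(window):
    """Placeholder predictor (not an LSTM): repeat the last observed step."""
    return window[-1] + (window[-1] - window[-2])
```

With the seed window `[1.0, 2.0, 3.0]` and four steps, the placeholder predictor yields `[4.0, 5.0, 6.0, 7.0]`; the fourth value is computed from a window that contains predicted values only. Because errors in early predictions are fed back into later ones, accuracy can be expected to degrade as the horizon grows.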
Step S305: Compare the prediction result with the predicted alarm condition, determine whether the service system will be abnormal in the prediction time period, and display the prediction result to the user.
Different service systems/servers may have different alarm priorities. Because the algorithm model also feeds back the prediction accuracy, whether fault prediction needs to be carried out for the corresponding service system can be determined from the prediction accuracy and a threshold matching strategy that the user can define in advance.
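As a non-limiting illustration, a user-defined threshold matching strategy might be sketched as follows. The source does not define the form of the strategy; the sketch assumes a per-system minimum accuracy with a default fallback. The class `FaultPrediction`, the function `should_alert`, the default value and the system names in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FaultPrediction:
    system: str          # service system or server the prediction refers to
    horizon_days: int    # predicted time period in which a fault may occur
    accuracy: float      # prediction accuracy fed back by the model, 0.0 to 1.0


def should_alert(prediction, thresholds, default_threshold=0.9):
    """Return True when the prediction accuracy reaches the minimum accuracy
    configured for the system, or the default if none is configured."""
    threshold = thresholds.get(prediction.system, default_threshold)
    return prediction.accuracy >= threshold
```

For instance, with thresholds `{"core-payment": 0.75, "batch-report": 0.85}`, a 45-day prediction with 78% accuracy for `core-payment` would be reported, whereas a 60-day prediction with 80% accuracy for `batch-report` would not. A high-priority system can thus be given a lower threshold so that it is warned earlier.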
An embodiment of the present invention further provides a monitoring apparatus for a service system, as shown in fig. 3, including:
an acquiring unit 31, configured to acquire monitoring index data of a service system in a reference time period;
a comparison unit 32, configured to compare the monitoring index data with a direct alarm condition;
a prediction unit 33, configured to, if the monitoring index data does not satisfy the direct alarm condition, input the monitoring index data into a machine learning algorithm model trained in advance, and determine a prediction result within a prediction time period by using the machine learning algorithm model;
and an alarm unit 34, configured to compare the prediction result with a predicted alarm condition, and predict whether the service system is abnormal within the prediction time period.
The apparatus further includes a training unit 35, configured to:
acquiring training data of a service system in a historical time period;
and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.
Optionally, a training unit 35 is further included, configured to:
inputting historical fault sample data of the service system in a historical time period into the machine learning algorithm model for training, and determining fault model parameters;
inputting historical non-fault sample data of the service system in the historical time period into the machine learning algorithm model for training, and determining non-fault model parameters;
and determining a fault condition according to the fault model parameters and the non-fault model parameters.
Optionally, the monitoring index data of the service system includes hardware index data of the service system;
for the hardware index data of the service system,
the acquiring unit 31 is configured to acquire hardware index data of the service system in a first reference time period;
the prediction unit 33 is configured to determine a fluctuation condition of the hardware index data in the prediction time period;
the alarm unit 34 is configured to compare the fluctuation condition of the hardware indicator data with the fault condition, and determine whether a hardware fault occurs in the service system within the prediction time period; and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.
The monitoring index data of the service system includes service index data of the service system; for the service index data of the service system,
the acquiring unit 31 is configured to acquire service index data of the service system in a second reference time period;
the prediction unit 33 is configured to determine a fluctuation condition of the service index data in the prediction time period;
the alarm unit 34 is configured to compare the fluctuation condition of the service index data with a normal fluctuation range, and determine whether the service system is abnormal in the prediction time period; if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
Optionally, the monitoring index data of the service system includes monitoring index data of a plurality of monitoring indexes;
the prediction unit 33 is further configured to:
determining a weight parameter of each monitoring index;
and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.
Based on the same principle, the present invention also provides an electronic device, as shown in fig. 4, including:
a processor 501, a memory 502, a transceiver 503 and a bus interface 504, wherein the processor 501, the memory 502 and the transceiver 503 are connected through the bus interface 504;
the processor 501 is configured to read the program in the memory 502, and execute the following method:
acquiring monitoring index data of a service system in a reference time period;
comparing the monitoring index data with a direct alarm condition;
if the monitoring index data do not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model;
and comparing the prediction result with the predicted alarm condition, and predicting whether the service system is abnormal in the prediction time period.
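As a non-limiting illustration, the order of the four operations executed by the processor might be sketched as follows. The three callables are hypothetical placeholders: `direct_alarm` for the direct alarm condition, `predict` for the pre-trained machine learning algorithm model, and `predicted_alarm` for the predicted alarm condition. The returned status strings are likewise assumptions.

```python
def monitor(index_data, direct_alarm, predict, predicted_alarm):
    """Run the two-stage check on monitoring index data.

    Stage 1: if the data already satisfies the direct alarm condition,
    alarm immediately and skip the model.
    Stage 2: otherwise predict the coming period and compare the
    prediction with the predicted alarm condition.
    """
    if direct_alarm(index_data):
        return "direct_alarm"
    prediction = predict(index_data)
    if predicted_alarm(prediction):
        return "predicted_alarm"
    return "normal"
```

The model is consulted only when the direct alarm condition is not satisfied, so data that is already abnormal triggers an alarm without waiting for a prediction.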
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (14)

1. A method for monitoring a service system, comprising:
acquiring monitoring index data of a service system in a reference time period;
comparing the monitoring index data with a direct alarm condition;
if the monitoring index data do not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model;
and comparing the prediction result with the predicted alarm condition, and predicting whether the service system is abnormal in the prediction time period.
2. The method of claim 1, wherein before inputting the monitoring index data into a pre-trained machine learning algorithm model and determining a prediction result within a prediction time period using the machine learning algorithm model, the method further comprises:
acquiring training data of a service system in a historical time period;
and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.
3. The method of claim 1, wherein the predicted alarm condition is determined according to:
inputting historical fault sample data of the service system into the machine learning algorithm model for training, and determining fault model parameters;
inputting historical non-fault sample data of the service system into the machine learning algorithm model for training, and determining non-fault model parameters;
and determining a fault condition according to the fault model parameters and the non-fault model parameters.
4. The method of claim 3, wherein the monitoring index data of the service system comprises hardware index data of the service system;
for the hardware index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:
acquiring hardware index data of the service system in a first reference time period;
the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:
determining fluctuation conditions of the hardware index data in the prediction time period;
comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:
comparing the fluctuation condition of the hardware index data with the fault condition, and judging whether the service system has hardware fault in the prediction time period;
and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.
5. The method of claim 1, wherein the monitoring index data of the service system comprises service index data of the service system;
for the service index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:
acquiring service index data of the service system in a second reference time period;
the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:
determining the fluctuation condition of the service index data in the prediction time period;
comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:
comparing the fluctuation condition of the service index data with a normal fluctuation range, and judging whether the service system is abnormal in the prediction time period;
if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
6. The method of claim 1, wherein the monitoring index data of the service system comprises monitoring index data of a plurality of monitoring indexes;
before determining the prediction result in the prediction time period by using the machine learning algorithm model, the method further comprises:
determining a weight parameter of each monitoring index;
and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.
7. A monitoring apparatus for a service system, comprising:
an acquiring unit, configured to acquire monitoring index data of the service system in a reference time period;
a comparison unit, configured to compare the monitoring index data with a direct alarm condition;
a prediction unit, configured to, if the monitoring index data does not satisfy the direct alarm condition, input the monitoring index data into a machine learning algorithm model trained in advance, and determine a prediction result within a prediction time period by using the machine learning algorithm model;
and an alarm unit, configured to compare the prediction result with a predicted alarm condition, and predict whether the service system is abnormal within the prediction time period.
8. The apparatus of claim 7, further comprising a training unit to:
acquiring training data of a service system in a historical time period;
and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.
9. The apparatus of claim 7, further comprising a training unit to:
inputting historical fault sample data of the service system in a historical time period into the machine learning algorithm model for training, and determining fault model parameters;
inputting historical non-fault sample data of the service system in the historical time period into the machine learning algorithm model for training, and determining non-fault model parameters;
and determining a fault condition according to the fault model parameters and the non-fault model parameters.
10. The apparatus of claim 9, wherein the monitoring index data of the service system comprises hardware index data of the service system;
for the hardware index data of the service system,
the acquiring unit is used for acquiring hardware index data of the service system in a first reference time period;
the prediction unit is used for determining the fluctuation condition of the hardware index data in the prediction time period;
the alarm unit is used for comparing the fluctuation condition of the hardware index data with the fault condition and judging whether the service system has a hardware fault in the prediction time period; and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.
11. The apparatus of claim 7, wherein the monitoring index data of the service system comprises service index data of the service system;
for the service index data of the service system,
the acquiring unit is used for acquiring service index data of the service system in a second reference time period;
the prediction unit is used for determining the fluctuation condition of the service index data in the prediction time period;
the alarm unit is used for comparing the fluctuation condition of the service index data with a normal fluctuation range and judging whether the service system is abnormal in the prediction time period; if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
12. The apparatus of claim 7, wherein the monitoring index data of the service system comprises monitoring index data of a plurality of monitoring indexes;
the prediction unit is further configured to:
determining a weight parameter of each monitoring index;
and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
CN201910580570.5A 2019-06-28 2019-06-28 A kind of monitoring method and device of operation system Pending CN110275814A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910580570.5A CN110275814A (en) 2019-06-28 2019-06-28 A kind of monitoring method and device of operation system
PCT/CN2020/097249 WO2020259421A1 (en) 2019-06-28 2020-06-19 Method and apparatus for monitoring service system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910580570.5A CN110275814A (en) 2019-06-28 2019-06-28 A kind of monitoring method and device of operation system

Publications (1)

Publication Number Publication Date
CN110275814A (en) 2019-09-24

Family

ID=67963677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910580570.5A Pending CN110275814A (en) 2019-06-28 2019-06-28 A kind of monitoring method and device of operation system

Country Status (2)

Country Link
CN (1) CN110275814A (en)
WO (1) WO2020259421A1 (en)



Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8352216B2 (en) * 2008-05-29 2013-01-08 General Electric Company System and method for advanced condition monitoring of an asset system
CN108172288A (en) * 2018-01-05 2018-06-15 深圳倍佳医疗科技服务有限公司 Medical Devices intelligent control method, device and computer readable storage medium
CN110275814A (en) * 2019-06-28 2019-09-24 深圳前海微众银行股份有限公司 A kind of monitoring method and device of operation system

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020259421A1 (en) * 2019-06-28 2020-12-30 深圳前海微众银行股份有限公司 Method and apparatus for monitoring service system
CN112702184A (en) * 2019-10-23 2021-04-23 中国电信股份有限公司 Fault early warning method and device and computer-readable storage medium
CN110941797B (en) * 2019-11-07 2023-04-07 中信银行股份有限公司 Operation index monitoring and trend prediction system based on service index
CN110941797A (en) * 2019-11-07 2020-03-31 中信银行股份有限公司 Operation index monitoring and trend prediction system based on service index
CN112825175A (en) * 2019-11-20 2021-05-21 顺丰科技有限公司 Client abnormity early warning method, device and equipment
CN110865929B (en) * 2019-11-26 2024-01-23 携程旅游信息技术(上海)有限公司 Abnormality detection early warning method and system
CN112948223A (en) * 2019-11-26 2021-06-11 北京沃东天骏信息技术有限公司 Method and device for monitoring operation condition
CN110865929A (en) * 2019-11-26 2020-03-06 携程旅游信息技术(上海)有限公司 Abnormity detection early warning method and system
CN111104299A (en) * 2019-11-29 2020-05-05 山东英信计算机技术有限公司 Server performance prediction method and device, electronic equipment and storage medium
CN112994960B (en) * 2019-12-02 2022-09-16 中国移动通信集团浙江有限公司 Method and device for detecting business data abnormity and computing equipment
CN112994960A (en) * 2019-12-02 2021-06-18 中国移动通信集团浙江有限公司 Method and device for detecting business data abnormity and computing equipment
CN111078503A (en) * 2019-12-23 2020-04-28 中国建设银行股份有限公司 Abnormity monitoring method and system
CN111241151A (en) * 2019-12-27 2020-06-05 北京健康之家科技有限公司 Service data analysis early warning method, system, storage medium and computing device
CN111339156A (en) * 2020-02-07 2020-06-26 京东城市(北京)数字科技有限公司 Long-term determination method and device of business data and computer readable storage medium
CN111339156B (en) * 2020-02-07 2023-09-26 京东城市(北京)数字科技有限公司 Method, apparatus and computer readable storage medium for long-term determination of business data
CN113535444A (en) * 2020-04-14 2021-10-22 中国移动通信集团浙江有限公司 Transaction detection method, transaction detection device, computing equipment and computer storage medium
CN113535444B (en) * 2020-04-14 2023-11-03 中国移动通信集团浙江有限公司 Abnormal motion detection method, device, computing equipment and computer storage medium
CN113572625A (en) * 2020-04-28 2021-10-29 中国移动通信集团浙江有限公司 Fault early warning method, early warning device, equipment and computer medium
CN111563022A (en) * 2020-05-12 2020-08-21 中国民航信息网络股份有限公司 Centralized storage monitoring method and device
CN111563022B (en) * 2020-05-12 2023-09-05 中国民航信息网络股份有限公司 Centralized memory monitoring method and device
US11500368B2 (en) * 2020-05-21 2022-11-15 Tata Consultancy Services Limited Predicting early warnings of an operating mode of equipment in industry plants
CN111708682A (en) * 2020-06-17 2020-09-25 腾讯科技(深圳)有限公司 Data prediction method, device, equipment and storage medium
CN111796995B (en) * 2020-06-30 2024-02-09 中国工商银行股份有限公司 Integrated learning-based cyclic serial number usage early warning method and system
CN111796995A (en) * 2020-06-30 2020-10-20 中国工商银行股份有限公司 Cyclic serial number usage early warning method and system based on ensemble learning
CN111752816A (en) * 2020-06-30 2020-10-09 深圳前海微众银行股份有限公司 Operating system analysis method and device
CN111833557A (en) * 2020-07-27 2020-10-27 中国工商银行股份有限公司 Fault identification method and device
CN112019390A (en) * 2020-09-09 2020-12-01 腾讯科技(深圳)有限公司 Network fault positioning method and related device
CN112182508A (en) * 2020-09-16 2021-01-05 支付宝(杭州)信息技术有限公司 Abnormity monitoring method and device for compliance business indexes
CN112102049A (en) * 2020-09-23 2020-12-18 中国建设银行股份有限公司 Model training method, business processing method, device and equipment
CN112256526B (en) * 2020-10-14 2024-02-23 中国银联股份有限公司 Machine learning-based data real-time monitoring method and device
CN112256526A (en) * 2020-10-14 2021-01-22 中国银联股份有限公司 Data real-time monitoring method and device based on machine learning
CN113516270A (en) * 2020-10-30 2021-10-19 腾讯科技(深圳)有限公司 Service data monitoring method and device
CN112486767A (en) * 2020-11-25 2021-03-12 中移(杭州)信息技术有限公司 Intelligent monitoring method, system, server and storage medium for cloud resources
CN113411549A (en) * 2021-06-11 2021-09-17 上海兴容信息技术有限公司 Method for judging whether business of target store is normal or not
CN113411233A (en) * 2021-06-17 2021-09-17 建信金融科技有限责任公司 Method and device for monitoring CPU utilization rate of central processing unit
CN113411233B (en) * 2021-06-17 2022-12-23 中国建设银行股份有限公司 Method and device for monitoring CPU utilization rate of central processing unit
CN113411217A (en) * 2021-06-21 2021-09-17 广州迷听科技有限公司 Method and device for monitoring and alarming call system
CN113537809A (en) * 2021-07-28 2021-10-22 深圳供电局有限公司 Active decision-making method and system for resource expansion in deep learning
CN113807690A (en) * 2021-09-09 2021-12-17 国网江苏省电力有限公司苏州供电分公司 Online evaluation and early warning method and system for operation state of regional power grid regulation and control system
CN113821416A (en) * 2021-09-18 2021-12-21 中国电信股份有限公司 Monitoring alarm method, device, storage medium and electronic equipment
CN114399321A (en) * 2021-11-15 2022-04-26 湖南快乐阳光互动娱乐传媒有限公司 Business system stability analysis method, device and equipment
CN114415602A (en) * 2021-12-03 2022-04-29 珠海格力电器股份有限公司 Monitoring method, device and system of industrial equipment and storage medium
CN114415602B (en) * 2021-12-03 2023-09-26 珠海格力电器股份有限公司 Monitoring method, device, system and storage medium for industrial equipment
CN114328118B (en) * 2021-12-30 2023-11-14 苏州浪潮智能科技有限公司 Intelligent alarming method, device, equipment and medium for operation and maintenance monitoring data
CN114328118A (en) * 2021-12-30 2022-04-12 苏州浪潮智能科技有限公司 Intelligent alarm method, device, equipment and medium for operation and maintenance monitoring data
CN115439089A (en) * 2022-09-08 2022-12-06 江苏方洋智能科技有限公司 Business management system based on machine learning
CN115439089B (en) * 2022-09-08 2023-09-08 江苏方洋智能科技有限公司 Service management system based on machine learning

Also Published As

Publication number Publication date
WO2020259421A1 (en) 2020-12-30

Similar Documents

Publication Publication Date Title
CN110275814A (en) A kind of monitoring method and device of operation system
CN110069810B (en) Battery failure prediction method, device, equipment and readable storage medium
US11403164B2 (en) Method and device for determining a performance indicator value for predicting anomalies in a computing infrastructure from values of performance indicators
KR101984730B1 (en) Automatic predicting system for server failure and automatic predicting method for server failure
CN111045894B (en) Database abnormality detection method, database abnormality detection device, computer device and storage medium
US20140351642A1 (en) System and methods for automated plant asset failure detection
US11307916B2 (en) Method and device for determining an estimated time before a technical incident in a computing infrastructure from values of performance indicators
US20210232104A1 (en) Method and system for identifying and forecasting the development of faults in equipment
CN114267178B (en) Intelligent operation maintenance method and device for station
US11675643B2 (en) Method and device for determining a technical incident risk value in a computing infrastructure from performance indicator values
CN113099476B (en) Network quality detection method, device, equipment and storage medium
CN111897705A (en) Service state processing method, service state processing device, model training method, model training device, equipment and storage medium
CN107480703B (en) Transaction fault detection method and device
KR101960755B1 (en) Method and apparatus of generating unacquired power data
CN114138601A (en) Service alarm method, device, equipment and storage medium
CN110413482B (en) Detection method and device
CN116866152A (en) Risk operation management and control method and device, electronic equipment and storage medium
CN108664696B (en) Method and device for evaluating running state of water chiller
CN115858291A (en) System index detection method and device, electronic equipment and storage medium thereof
CN114938339B (en) Data processing method and related device
CN110517731A (en) Genetic test quality monitoring data processing method and system
CN114861909A (en) Model quality monitoring method and device, electronic equipment and storage medium
CN116962229A (en) Cluster health degree assessment method and device, electronic equipment and storage medium
CN116347466A (en) Base station out-of-service alarm prediction method and device
CN117573412A (en) System fault early warning method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination