CN110275814A

CN110275814A - Monitoring method and device of service system

Info

Publication number: CN110275814A
Application number: CN201910580570.5A
Authority: CN
Inventors: 陈泽昊; 邹高锋
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2019-06-28
Filing date: 2019-06-28
Publication date: 2019-09-24
Also published as: WO2020259421A1

Abstract

Embodiments of the invention relate to the field of machine learning, and in particular to a monitoring method and device for a service system, which are intended to solve the problems of hysteresis and low accuracy in service system alarms. An embodiment of the invention comprises: acquiring monitoring index data of a service system in a reference time period; comparing the monitoring index data with a direct alarm condition; if the monitoring index data do not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model; and comparing the prediction result with a predicted alarm condition to predict whether the service system will be abnormal in the prediction time period.

Description

Monitoring method and device of service system

Technical Field

The invention relates to the field of machine learning in financial technology (Fintech), in particular to a monitoring method and a monitoring device for a business system.

Background

With the development of computer technology, more and more technologies (big data, distributed computing, blockchain, artificial intelligence, etc.) are applied to the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). However, the security and real-time requirements of the financial industry also place higher demands on these technologies. A traditional service system monitoring platform mainly lets users configure alarm strategies according to their requirements. When a service system goes online and needs daily monitoring, service operation and maintenance or development personnel identify the key points of the service system, define alarm strategy conditions for those key points, and configure the corresponding monitoring alarm strategies in the monitoring platform. The monitoring platform then scans the configured key points to obtain the corresponding detection indexes, matches the detection indexes against the monitoring alarm strategy configured by the user (that is, checks whether the alarm condition is met), and alarms and notifies the user if the configured alarm condition is met.

In the prior art, a monitoring alarm strategy is an alarm threshold that the user pre-configures in an alarm tool. Such thresholds are generally set by operation and maintenance or development personnel according to historical experience, so their accuracy is low. In addition, the alarm process takes a long time, because it runs from detecting the abnormality, through confirming it, to finally notifying the relevant operation and maintenance personnel. In some cases the alarm therefore lags behind the fault. When a service system becomes abnormal, the abnormal data is uncontrollable and may grow exponentially, yet the index data does not meet the alarm condition at the moment the abnormality begins. By the time operation and maintenance personnel receive the alarm notification, the service index is already seriously abnormal, the affected range has spread rapidly, the service has been damaged, and the alarm has lost its purpose.

Disclosure of Invention

The application provides a monitoring method and a monitoring device for a service system, which are used for solving the problems of hysteresis and low accuracy of service system alarm.

The monitoring method for the service system provided by the embodiment of the invention comprises the following steps:

acquiring monitoring index data of a service system in a reference time period;

comparing the monitoring index data with a direct alarm condition;

if the monitoring index data do not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model;

and comparing the prediction result with the predicted alarm condition, and predicting whether the service system is abnormal in the prediction time period.

In an optional embodiment, before inputting the monitoring index data into a machine learning algorithm model trained in advance and determining a prediction result in a prediction time period by using the machine learning algorithm model, the method further includes:

acquiring training data of a service system in a historical time period;

and inputting training data of the service system in the historical time period as parameters into the machine learning algorithm model, and determining model parameters of the machine learning algorithm model.

In an optional embodiment, the predicted alarm condition is determined as follows:

inputting historical fault sample data of the service system into the machine learning algorithm model for training, and determining fault model parameters;

inputting historical non-fault sample data of the service system into the machine learning algorithm model for training, and determining non-fault model parameters;

and determining a fault condition according to the fault model parameters and the non-fault model parameters.

In an optional embodiment, the monitoring index data of the business system includes hardware index data of the business system;

for the hardware index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:

acquiring hardware index data of the service system in a first reference time period;

the determining a prediction result within a prediction time period by using the machine learning algorithm model comprises:

determining fluctuation conditions of the hardware index data in the prediction time period;

comparing the prediction result with the predicted alarm condition to predict whether the service system is abnormal in the prediction time period comprises the following steps:

comparing the fluctuation condition of the hardware index data with the fault condition, and judging whether the service system has hardware fault in the prediction time period;

and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.

In an optional embodiment, the monitoring index data of the service system includes service index data of the service system;

for the service index data of the service system, the acquiring of the monitoring index data of the service system in the reference time period includes:

acquiring service index data of the service system in a second reference time period;

determining the fluctuation condition of the service index data in the prediction time period;

comparing the fluctuation condition of the service index data with a normal fluctuation range, and judging whether the service system is abnormal in the prediction time period;

if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.

In an optional embodiment, the monitoring index data of the service system includes monitoring index data of a plurality of monitoring indexes;

before determining the prediction result in the prediction time period by using the machine learning algorithm model, the method further comprises:

determining a weight parameter of each monitoring index;

and inputting the weight parameters corresponding to the monitoring index data into the machine learning algorithm model.

A monitoring device of a business system, comprising:

the acquisition unit is used for acquiring monitoring index data of the service system in a reference time period;

the comparison unit is used for comparing the monitoring index data with the direct alarm condition;

the prediction unit is used for inputting the monitoring index data into a machine learning algorithm model trained in advance if the monitoring index data does not meet the direct alarm condition, and determining a prediction result in a prediction time period by using the machine learning algorithm model;

and the alarm unit is used for comparing the prediction result with the predicted alarm condition and predicting whether the service system is abnormal in the prediction time period.

In an optional embodiment, the apparatus further comprises a training unit configured to:

acquiring training data of a service system in a historical time period;

inputting historical fault sample data of the service system in a historical time period into the machine learning algorithm model for training, and determining fault model parameters;

inputting historical non-fault sample data of the service system in the historical time period into the machine learning algorithm model for training, and determining non-fault model parameters;

for the hardware index data of the service system,

the acquiring unit is used for acquiring hardware index data of the service system in a first reference time period;

the prediction unit is used for determining the fluctuation condition of the hardware index data in the prediction time period;

the alarm unit is used for comparing the fluctuation condition of the hardware index data with the fault condition and judging whether the service system has a hardware fault in the prediction time period; and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.

for the service index data of the service system,

the acquiring unit is used for acquiring service index data of the service system in a second reference time period;

the prediction unit is used for determining the fluctuation condition of the service index data in the prediction time period;

the alarm unit is used for comparing the fluctuation condition of the service index data with a normal fluctuation range and judging whether the service system is abnormal in the prediction time period; if the service is abnormal in the prediction time period, determining an abnormal prediction time period; and the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.

the prediction unit is further configured to:

determining a weight parameter of each monitoring index;

An embodiment of the present invention further provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.

Embodiments of the present invention also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method as described above.

In the embodiment of the invention, the monitoring index data of the service system in the reference time period is obtained and compared with the direct alarm condition, and if the monitoring index data meets the direct alarm condition, the user is alarmed directly. If the monitoring index data does not meet the direct alarm condition, the monitoring index data is input into the machine learning algorithm model, which determines a prediction result in the prediction time period. The prediction time period comprises a time period after the current time point; that is, the machine learning algorithm model can predict the operating condition of the service system for a period of time in the future. The predicted operating condition is compared with the predicted alarm condition to predict whether an abnormality may occur, and if so, the user is alarmed. Because the embodiment of the invention uses the machine learning algorithm model to predict possible future abnormalities, service operation and maintenance personnel can prepare service disaster tolerance in advance for an impending abnormality, which improves the availability of the service system, and the prediction accuracy is high.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of a possible system architecture according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a monitoring method for a service system according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a monitoring apparatus of a service system according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1, a system architecture to which the embodiment of the present invention is applicable includes a business system 101, a monitoring platform 102, and a monitoring client 103. The service system 101 and/or the monitoring platform 102 may be a network device such as a computer, an independent device, or a server cluster formed by a plurality of servers. Preferably, the business system 101 and/or the monitoring platform 102 may employ cloud computing technology for information processing.

The monitoring client 103 is installed on the monitoring platform 102. The monitoring platform 102 may be an electronic device with a wireless communication function, such as a mobile phone, a tablet computer, or a dedicated handheld device, and may also be a device connected to the internet in a wired access manner, such as a Personal Computer (PC), a notebook computer, or a server.

The monitoring platform 102 may communicate with the service system 101 through the Internet, or through a Global System for Mobile Communications (GSM) network, a Long Term Evolution (LTE) system, or another mobile communication system. Likewise, the monitoring client 103 may communicate with the monitoring platform 102 through the Internet, or through a GSM network, an LTE system, or another mobile communication system.

From the user's point of view, the system architecture in the embodiment of the invention is almost the same as that of a traditional monitoring platform. The user only needs to configure the monitoring strategies for the service indexes of interest and does not need to know how fault prediction is implemented inside the monitoring platform, so the platform is friendlier to use and has no barrier to entry.

For convenience of understanding, terms that may be referred to in the embodiments of the present invention are defined and explained below.

The user: the users in the embodiment of the invention comprise service system developers, service operation and maintenance personnel and all related personnel for monitoring the service by using the monitoring platform.

Intelligent monitoring platform: a tool responsible for monitoring a service system and raising alarms. It monitors the service indexes and basic-service indexes of the system (such as server hardware health and network connectivity), integrates the detected indexes through a machine learning algorithm model, and predicts faults or abnormalities that may occur in the future.

Alarm detection/prediction: also called service system fault detection/prediction; the monitoring platform detects and predicts faults or abnormalities that may occur in the daily operation of a service system.

LSTM: Long Short-Term Memory, a recurrent neural network algorithm in machine learning.

Time series: a numerical sequence formed by arranging the values of the same statistical index in order of occurrence time. The main purpose of time series analysis is to predict the future based on existing historical data. Most economic data is given as time series. Depending on the observation time, the time in a time series may be a year, a quarter, a month, or any other unit of time.

In order to predict abnormalities of the service system and improve the accuracy of the prediction, an embodiment of the present invention provides a monitoring method for a service system. As shown in fig. 2, the method includes the following steps:

Step 201, obtaining monitoring index data of the service system in the reference time period.

Servers from different manufacturers report data in different formats, different hardware records its data in different formats, and different service interfaces and services also use different data formats. The reported data therefore needs to be cleaned so that the various data formats are unified. The cleaned data can then be used by the big data processing and machine learning algorithm module for machine learning training, alarm matching, and prediction.
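For illustration only, the cleaning step may be sketched as follows. The sketch maps reports from two sources onto one unified record format. The vendor names, field names, units, and the `Metric` schema are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Unified record produced by cleaning (hypothetical schema)."""
    host: str
    name: str
    value: float
    timestamp: int  # seconds since the epoch


def clean_vendor_a(raw: dict) -> Metric:
    # Hypothetical vendor A: Celsius as a number, timestamp in milliseconds.
    return Metric(raw["hostname"], "cpu_temp_c", float(raw["temp_c"]), raw["ts_ms"] // 1000)


def clean_vendor_b(raw: dict) -> Metric:
    # Hypothetical vendor B: Fahrenheit as a string, timestamp in seconds as a string.
    celsius = (float(raw["TempF"]) - 32.0) * 5.0 / 9.0
    return Metric(raw["Host"], "cpu_temp_c", round(celsius, 2), int(raw["Time"]))


# One cleaner per data source; new sources are added by registering a function.
CLEANERS = {"vendor_a": clean_vendor_a, "vendor_b": clean_vendor_b}


def clean(source: str, raw: dict) -> Metric:
    """Dispatch a raw report to the cleaner registered for its source."""
    return CLEANERS[source](raw)


a = clean("vendor_a", {"hostname": "srv-1", "temp_c": 61.5, "ts_ms": 1561680000000})
b = clean("vendor_b", {"Host": "srv-2", "TempF": "140", "Time": "1561680000"})
```

After cleaning, both records share the same index name, unit, and timestamp resolution, so downstream training and alarm matching can treat them alike.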

Meanwhile, the monitoring indexes of server hardware and service interfaces differ, and each component and each interface may have many dimensions of data. To eliminate interference items, the data sources that are positively correlated with the monitoring index need to be selected, such as the SMART value of a hard disk or the Health value of a mainboard.
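One possible way to select positively correlated data sources is to rank the candidates by their Pearson correlation with the monitored index and keep those above a cut-off. The embodiment does not specify the selection method; the Pearson criterion, the `min_corr` cut-off, and the sample series below are assumptions made for this sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sx * sy)


def select_sources(candidates, target, min_corr=0.5):
    """Keep the data sources whose correlation with the target is at least min_corr."""
    return [name for name, series in candidates.items()
            if pearson(series, target) >= min_corr]


# Hypothetical history: a fault score per day and three candidate data sources.
fault_score = [0.0, 0.1, 0.2, 0.6, 0.9]
candidates = {
    "smart_reallocated_sectors": [0, 1, 2, 7, 12],
    "mainboard_health_alerts": [0, 0, 1, 2, 3],
    "fan_label_id": [3, 3, 3, 3, 3],  # constant, so it is dropped as an interference item
}
selected = select_sources(candidates, fault_score)
```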

Step 202, comparing the monitoring index data with a direct alarm condition.

Logic processing is performed on the cleaned monitoring index data to judge whether it meets the direct alarm condition. If so, the service system is currently abnormal and the user is alarmed directly. If not, the reported monitoring index data is processed by the trained machine learning algorithm model to predict whether an abnormality may occur in a future time period.

The direct alarm condition in the embodiment of the invention can also be trained through a machine learning algorithm model and automatically judged by the system in the daily iteration process in the production environment, so that the time spent by operation and maintenance personnel in the process of configuring the direct alarm condition is reduced, the management efficiency is improved, and the false alarm caused by manual configuration is avoided.

Step 203, if the monitoring index data does not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining a prediction result in a prediction time period by using the machine learning algorithm model.

The machine learning algorithm model may include a convolutional neural network (CNN), a support vector machine (SVM), K-Means clustering, logistic regression, and the like. Considering the balance between the training cost (running time and the scale of the server cluster required) and the prediction result, the embodiment of the invention preferably uses the LSTM neural network algorithm for prediction.

Step 204, comparing the prediction result with the predicted alarm condition, and predicting whether the service system is abnormal in the prediction time period.

In the specific implementation process, the predicted alarm condition can be determined by operation and maintenance personnel according to experience, can be obtained through machine learning algorithm model training, or can be judged by the system in the daily iteration process in the production environment. If the abnormality occurs, the user can be informed by means of mails and/or short messages and/or telephones and/or WeChat and the like.

Since the monitoring index data of the service system is correlated with time and forms time series data, a prediction result for a future time period can be derived from the monitoring index data, and this result indicates the operating condition of the service system. The prediction result is then compared with the set predicted alarm condition to determine whether the service system may become abnormal in the future time period.
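The control flow of steps 202 to 204 can be sketched as below. This is a minimal illustration rather than the implementation of the embodiment: the function names are hypothetical, a linear trend stands in for the trained machine learning algorithm model, and the thresholds of 95 and 90 are invented for the example.

```python
from typing import Callable, Sequence


def monitor(
    reference_data: Sequence[float],
    direct_alarm: Callable[[Sequence[float]], bool],
    predict: Callable[[Sequence[float]], Sequence[float]],
    predicted_alarm: Callable[[Sequence[float]], bool],
) -> str:
    """Return 'direct_alarm', 'predicted_alarm', or 'normal' for one monitoring pass."""
    # Step 202: data that already meets the direct alarm condition is reported at once.
    if direct_alarm(reference_data):
        return "direct_alarm"
    # Step 203: otherwise the pre-trained model forecasts the prediction time period.
    forecast = predict(reference_data)
    # Step 204: the forecast is compared with the predicted alarm condition.
    return "predicted_alarm" if predicted_alarm(forecast) else "normal"


# Hypothetical stand-ins: CPU usage in percent and a linear-trend "model".
def linear_trend(data, steps=3):
    slope = data[-1] - data[-2]
    return [data[-1] + slope * (i + 1) for i in range(steps)]


def over_95(values):
    return max(values) > 95


def over_90(values):
    return max(values) > 90


status_now = monitor([97, 98, 99], over_95, linear_trend, over_90)
status_rising = monitor([70, 78, 86], over_95, linear_trend, over_90)
status_flat = monitor([40, 41, 40], over_95, linear_trend, over_90)
```

In the second call no current value exceeds the direct threshold, but the extrapolated values do exceed the predicted threshold, which is the case the prediction stage is meant to catch early.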

Further, the LSTM algorithm model is trained based on training data over a historical period of time. Before inputting the monitoring index data into a machine learning algorithm model trained in advance and determining a prediction result in a prediction time period by using the machine learning algorithm model, the method further comprises:

acquiring training data of a service system in a historical time period;

In the specific implementation process, the training data of the service system at each time point is used as the output parameters of the LSTM algorithm model, and for each output parameter, a lot of training data in the historical time period before the corresponding time point is used as the input parameters of the LSTM algorithm model. Thus, after a large number of corresponding relations between the input parameters and the output parameters are obtained, model parameters of the LSTM algorithm model can be obtained based on the existing LSTM algorithm model training method.
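The pairing of input and output parameters described above can be sketched with a sliding window. The window length and the sample series are assumptions; fitting the LSTM itself is left to an existing training method and is not shown.

```python
def make_training_pairs(series, window):
    """Pair each point with the `window` points that precede it.

    Returns (inputs, outputs): inputs[i] is the history before time point
    i + window, and outputs[i] is the value observed at that time point.
    """
    if window <= 0 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    count = len(series) - window
    inputs = [list(series[i:i + window]) for i in range(count)]
    outputs = [series[i + window] for i in range(count)]
    return inputs, outputs


# Hypothetical hourly request counts from the historical time period.
history = [10, 12, 11, 15, 18, 17]
X, y = make_training_pairs(history, window=3)
```

Each row of `X` is an input parameter set and the matching entry of `y` is the output parameter that the model is trained to reproduce.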

It should be noted that the historical time period used in training and the reference time period used in prediction may be the same time period or different time periods, and if they are different, they may or may not overlap. For example, the historical time period is the 1000 hours before the current time point, and the reference time period is the 999 hours before the current time point; or the historical time period is 9 to 11 am each day from January to March 2018, and the reference time period is 9 to 11 am each day from January to March 2019. The historical time period and the reference time period are selected according to the calculation requirement, and the embodiment of the invention does not limit them.

Further, in the embodiment of the present invention, the predicted alarm condition may also be obtained by training using the LSTM algorithm. The predicted alarm condition is determined as follows:

In a specific implementation process, the historical fault samples are various hardware index data acquired by the service system when the hardware fault is determined, and the fault model parameters of the hardware of the service system when the hardware of the service system is in fault can be determined by inputting the historical fault samples into the LSTM algorithm model. The historical non-fault samples are various hardware index data collected when the service system operates normally, and the historical non-fault samples are input into the LSTM algorithm model, so that non-fault model parameters of the hardware of the service system in the normal operation process can be determined. Thus, the specific fault condition may be determined based on the fault model parameters and non-fault model parameters.
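The embodiment does not state how the fault condition is derived from the two sets of model parameters. Purely as an illustration of the idea, the sketch below replaces the LSTM with a much simpler stand-in: each group of samples is summarised by its mean, and the fault condition is the midpoint between the two means. The midpoint rule, the function names, and the sample temperatures are all assumptions.

```python
from statistics import mean


def fit_group(samples):
    """Stand-in for 'model parameters': the mean of one group of samples."""
    return mean(samples)


def fault_condition(fault_samples, normal_samples):
    """Return a predicate that flags values on the fault side of the midpoint."""
    fault_param = fit_group(fault_samples)
    normal_param = fit_group(normal_samples)
    threshold = (fault_param + normal_param) / 2
    fault_is_higher = fault_param > normal_param

    def is_fault(value):
        return value > threshold if fault_is_higher else value < threshold

    return is_fault


# Hypothetical disk temperatures (degrees Celsius) recorded with and without faults.
is_fault = fault_condition(fault_samples=[68, 72, 70], normal_samples=[40, 42, 38])
```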

Because the monitoring index data of the service system comprises the hardware index data of the service system and the service index data of the service system, the embodiment of the invention respectively predicts and alarms aiming at two different monitoring indexes.

Further, the acquiring of the monitoring index data of the service system in the reference time period for the hardware index data of the service system includes:

In the specific implementation process, for hardware index data, each server has its own life cycle, and the closer the data is to the time node at which the fault occurs, the higher the prediction accuracy. Therefore, the first reference time period of the hardware index data is selected to be as close as possible to the current time point.

Table 1 shows the failure prediction results of the hardware index data.

TABLE 1

For example, as shown in table 1, for monitoring index 1, it is predicted that abnormality may occur in the server hardware within 45 days, and the prediction accuracy is 78%; if the server hardware is predicted to be abnormal within 60 days, the prediction accuracy is 80%.

In the specific implementation process, since the service index data changes every day, the prediction accuracy of the prediction model also improves as the server hardware stays in production longer or as the service index is monitored longer. For the service index, the more monitoring index data is used for prediction, the larger the sample and the more accurate the result. Therefore, the second reference time period of the service index data is selected to be as long as possible.

Because the volume of collected monitoring index data may be very large, and each monitoring index influences an abnormality to a different degree, the weight range of each monitoring index needs to be configured through calculation. Further, the monitoring index data of the service system comprises monitoring index data of a plurality of monitoring indexes;

determining a weight parameter of each monitoring index;

In the embodiment of the invention, the monitoring index condition in a certain time period in the future is predicted through an LSTM model, and if the monitoring index data detected at the next monitoring time point is not in the predicted normal fluctuation range, the alarm is given to a service operation and maintenance/development staff. In addition, the service operation and maintenance/development personnel can also make corresponding preparation in advance according to the monitoring index fluctuation condition predicted by the monitoring platform, so that the service is prevented from being influenced. For example, before holidays or new business activities, the monitoring platform predicts daily access traffic which may increase in the future of the business, so that business operation and maintenance personnel can perform system capacity expansion in advance, and the situation that the business system is unavailable due to insufficient performance of the business system is avoided.
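The comparison against a normal fluctuation range can be sketched as follows. In the embodiment the range is produced by the machine learning algorithm model; here, as an explicitly simpler stand-in, the range is the historical mean plus or minus `k` sample standard deviations. The value of `k` and the sample data are assumptions.

```python
from statistics import mean, stdev


def normal_range(history, k=3.0):
    """Band of mean +/- k sample standard deviations over historical values."""
    centre, spread = mean(history), stdev(history)
    return centre - k * spread, centre + k * spread


def is_abnormal(value, band):
    """True when the observed value lies outside the normal fluctuation range."""
    low, high = band
    return not (low <= value <= high)


# Hypothetical transactions per minute observed at the same hour on previous days.
history = [100, 104, 98, 102, 96]
band = normal_range(history, k=3.0)
```

A value detected at the next monitoring time point would then be passed to `is_abnormal`, and an alarm raised when it returns true.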

In order to more clearly understand the present invention, the above flow is described in detail below with a specific embodiment based on the architecture of fig. 1, and the steps of the specific embodiment are as follows, including:

Step S300: train the LSTM algorithm model to obtain model parameters.

Step S301: acquire monitoring index data.

Because the data indexes of server hardware and service interfaces differ, and each component and each interface may have many dimensions of data, the data sources that are positively correlated with the service index need to be selected to eliminate interference items; examples of such data sources are the SMART value of a hard disk and the Health value of a mainboard.

Step S302: preprocess the data.

Because the volume of collected monitoring index data may be very large, and each monitoring index influences an abnormality to a different degree, the weight parameter of each monitoring index needs to be acquired.
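As a sketch of how weight parameters might accompany the monitoring index data, the example below normalises raw weights so that they sum to 1 and scales each index value by its weight before it is passed to the model. The embodiment does not say how weights are computed or applied; the normalisation, the index names, and the values are assumptions.

```python
def normalise_weights(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in weights.items()}


def weighted_features(sample, weights):
    """Multiply each monitoring index value by its weight, in sorted key order."""
    return [sample[name] * weights[name] for name in sorted(weights)]


# Hypothetical raw weights and one sample of already-scaled index values.
weights = normalise_weights({"cpu": 2.0, "disk_smart": 5.0, "latency": 3.0})
features = weighted_features({"cpu": 0.5, "disk_smart": 0.2, "latency": 0.4}, weights)
```

Sorting the keys keeps the feature order stable between training and prediction, which matters when the vector is fed to a model.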

Step S303: acquire the predicted alarm condition. Model training is performed separately with historical fault sample data and historical non-fault sample data to obtain fault model parameters and non-fault model parameters, and the predicted alarm condition is determined according to the fault model parameters and the non-fault model parameters.

Step S304: input the monitoring index data and the weight parameters into the trained LSTM algorithm model, and calculate the prediction result for the prediction time period by using the LSTM algorithm model.

In the specific implementation process, the LSTM algorithm model is used to predict a complete sequence: the training window is initialized only once, using the first part of the training data, and then, as in point-by-point prediction, the window is slid forward repeatedly and the next point is predicted each time. The LSTM algorithm model predicts from its own predicted data. In the second prediction, one data point (the last point) in the data used by the model comes from the previous prediction; in the third prediction, two data points come from previous predictions; and so on. By the 99th prediction, the data used by the model consists entirely of predicted data. This means that the time series that the algorithm model can predict is greatly extended.
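The full-sequence prediction described above can be sketched as a loop that feeds each prediction back into the sliding window. The `predict_next` callable stands in for the trained LSTM; the mean-of-window model used here is only a placeholder so that the sketch runs without a deep learning library.

```python
def predict_sequence(seed_window, predict_next, steps):
    """Forecast `steps` points by feeding each prediction back into the window."""
    window = list(seed_window)
    forecast = []
    for _ in range(steps):
        nxt = predict_next(window)
        forecast.append(nxt)
        # Slide the window: drop the oldest point and append the newest prediction.
        window = window[1:] + [nxt]
    return forecast


# Placeholder for the trained model (an assumption): the mean of the window.
def mean_model(window):
    return sum(window) / len(window)


forecast = predict_sequence([1.0, 2.0, 3.0], mean_model, steps=4)
```

With a window of length n, the input to the (n+1)-th prediction consists entirely of earlier predictions, which matches the behaviour described for the later predictions in the text. Errors can accumulate over such a horizon, so the forecast length is a trade-off.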

Step S305: compare the prediction result with the predicted alarm condition, determine whether the service system will be abnormal in the prediction time period, and display the prediction result to the user.

Different service systems/servers may have different alarm priorities. Because the algorithm model also reports the prediction accuracy, whether fault prediction needs to be performed for a given service system can be determined from the prediction accuracy and a threshold matching strategy that the user can define in advance.
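A user-defined threshold matching strategy of this kind might look like the sketch below, in which each service system has its own minimum prediction accuracy. The system names, the thresholds, and the `Prediction` record are hypothetical; the accuracy figures reuse the 78% and 80% values from the example given for Table 1.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    system: str
    horizon_days: int  # a fault is expected within this many days
    accuracy: float    # prediction accuracy reported by the model, from 0 to 1


def should_alert(pred, thresholds, default=0.9):
    """Alert only when the accuracy reaches the threshold configured for the system."""
    return pred.accuracy >= thresholds.get(pred.system, default)


# Hypothetical user-defined thresholds: the higher-priority system alerts on weaker evidence.
thresholds = {"payments-core": 0.75, "report-batch": 0.85}
predictions = [
    Prediction("payments-core", 45, 0.78),
    Prediction("report-batch", 45, 0.78),
    Prediction("report-batch", 60, 0.80),
]
alerts = [p for p in predictions if should_alert(p, thresholds)]
```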

An embodiment of the present invention further provides a monitoring apparatus for a service system, as shown in fig. 3, including:

an obtaining unit 31, configured to obtain monitoring index data of a service system in a reference time period;

a comparison unit 32, configured to compare the monitoring index data with a direct alarm condition;

the prediction unit 33 is configured to, if the monitoring index data does not satisfy the direct alarm condition, input the monitoring index data into a machine learning algorithm model trained in advance, and determine a prediction result within a prediction time period by using the machine learning algorithm model;

and the warning unit 34 is configured to compare the prediction result with a predicted warning condition, and predict whether the service system is abnormal within a prediction time period.

The apparatus further includes a training unit 35, configured to:

acquiring training data of a service system in a historical time period;

Optionally, a training unit 35 is further included, configured to:

Optionally, the monitoring index data of the service system includes hardware index data of the service system;

for the hardware index data of the service system,

the obtaining unit 31 is configured to obtain hardware index data of the service system in a first reference time period;

the prediction unit 33 is configured to determine a fluctuation condition of the hardware index data in the prediction time period;

the alarm unit 34 is configured to compare the fluctuation condition of the hardware indicator data with the fault condition, and determine whether a hardware fault occurs in the service system within the prediction time period; and if the service system has hardware faults in the prediction time period, determining the hardware fault prediction time period and the prediction accuracy.

The monitoring index data of the service system includes service index data of the service system;

for the service index data of the service system,

the obtaining unit 31 is configured to obtain service index data of the service system in a second reference time period;

the prediction unit 33 is configured to determine a fluctuation condition of the service index data in the prediction time period;

the alarm unit 34 is configured to compare the fluctuation condition of the service index data with a normal fluctuation range, and determine whether the service system is abnormal in the prediction time period; if the service system is abnormal in the prediction time period, determine the predicted time period of the abnormality; the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation condition of the service index data in the historical time period.
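The comparison against a normal fluctuation range can be illustrated as follows. In the embodiment the range is determined by the machine learning algorithm model from historical data; the mean plus or minus `k` standard deviations used here is only an assumed stand-in, and `normal_range`, `abnormal_periods`, `k`, and the sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def normal_range(history: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Assumed stand-in: derive a normal range from historical index data."""
    center = mean(history)
    spread = pstdev(history)
    return center - k * spread, center + k * spread


def abnormal_periods(
    predicted: Sequence[float], bounds: Tuple[float, float]
) -> List[int]:
    """Return the indices of predicted points outside the normal range."""
    low, high = bounds
    return [i for i, value in enumerate(predicted) if not low <= value <= high]


# Historical transaction counts per interval and a predicted sequence.
bounds = normal_range([100.0, 102.0, 98.0, 100.0])
flagged = abnormal_periods([101.0, 99.0, 120.0, 100.0], bounds)
```

The returned indices identify which intervals of the prediction time period are predicted to be abnormal, corresponding to the predicted time period of the abnormality.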

Optionally, the monitoring index data of the service system includes monitoring index data of a plurality of monitoring indexes;

the prediction unit 33 is further configured to:

determining a weight parameter of each monitoring index;

Based on the same principle, the present invention also provides an electronic device, as shown in fig. 4, including:

a processor 501, a memory 502, a transceiver 503, and a bus interface 504, wherein the processor 501, the memory 502, and the transceiver 503 are connected through the bus interface 504;

the processor 501 is configured to read the program in the memory 502, and execute the following method:

acquiring monitoring index data of a service system in a reference time period;

comparing the monitoring index data with a direct alarm condition;

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method for monitoring a service system, comprising:

acquiring monitoring index data of a service system in a reference time period;

comparing the monitoring index data with a direct alarm condition;

2. The method of claim 1, wherein before inputting the monitoring index data into a pre-trained machine learning algorithm model and determining a prediction result within a prediction time period using the machine learning algorithm model, the method further comprises:

acquiring training data of a service system in a historical time period;

3. The method of claim 1, wherein the predicted alarm condition is determined according to:

4. The method of claim 3, wherein the monitoring index data of the service system comprises hardware index data of the service system;

5. The method of claim 1, wherein the monitoring index data of the service system comprises service index data of the service system;

6. The method of claim 1, wherein the monitoring index data of the service system comprises monitoring index data of a plurality of monitoring indexes;

determining a weight parameter of each monitoring index;

7. A monitoring apparatus for a service system, comprising:

8. The apparatus of claim 7, further comprising a training unit to:

acquiring training data of a service system in a historical time period;

9. The apparatus of claim 7, further comprising a training unit to:

10. The apparatus of claim 9, wherein the monitoring index data of the service system comprises hardware index data of the service system;

for the hardware index data of the service system,

11. The apparatus of claim 7, wherein the monitoring index data of the service system comprises service index data of the service system;

for the service index data of the service system,

12. The apparatus of claim 7, wherein the monitoring index data of the service system comprises monitoring index data of a plurality of monitoring indexes;

the prediction unit is further configured to:

determining a weight parameter of each monitoring index;

13. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.

14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.