WO2020259421A1

WO2020259421A1 - Method and apparatus for monitoring service system

Info

Publication number: WO2020259421A1
Application number: PCT/CN2020/097249
Authority: WO
Inventors: 陈泽昊; 邹高锋
Original assignee: 深圳前海微众银行股份有限公司
Priority date: 2019-06-28
Filing date: 2020-06-19
Publication date: 2020-12-30
Also published as: CN110275814A

Abstract

The embodiments of the present invention relate to the field of machine learning, and in particular to a method and apparatus for monitoring a service system, which are used for solving the problem of lag and low accuracy in a service system alarm. The embodiment of the present invention comprises: acquiring monitoring index data of a service system within a reference time period; comparing the monitoring index data with a direct alarm condition; if the monitoring index data does not meet the direct alarm condition, inputting the monitoring index data into a machine learning algorithm model trained in advance, and determining, using the machine learning algorithm model, a prediction result within a prediction time period; and comparing the prediction result with a predicted alarm condition to predict whether the service system is abnormal within the prediction time period.

Description

Monitoring method and device of business system

Cross references to related applications

This application claims priority to Chinese patent application No. 201910580570.5, filed with the Chinese Patent Office on June 28, 2019 and entitled "A monitoring method and device for a business system", the entire content of which is incorporated herein by reference.

Technical field

The present invention relates to the field of machine learning in Fintech, and in particular to a monitoring method and device of a business system.

Background

With the development of computer technology, more and more technologies (big data, distributed computing, blockchain, artificial intelligence, etc.) are being applied in the financial field, and the traditional financial industry is gradually transforming into Fintech. However, the security and real-time requirements of the financial industry also place higher demands on these technologies. In a traditional business system monitoring platform, users configure alarm policies according to their needs. When a business system goes online and needs daily monitoring, the business operation and maintenance personnel first identify the key points of the business system, formulate alarm policy conditions for them, and configure the corresponding monitoring and alarm policies in the monitoring platform. The monitoring platform then scans and detects these configured key points to obtain the corresponding detection indicators and matches them against the monitoring and alarm policies configured by the user (that is, checks whether an alarm condition is met). If an alarm condition configured by the user is met, the user is notified.

In the prior art, monitoring and alarm policies are alarm thresholds that users pre-configure in the alarm tool. Such thresholds are usually set by operation and maintenance personnel based on historical experience, so their accuracy is low. In addition, the alarm process, from detecting an abnormality, to confirming it, to finally notifying the relevant operation and maintenance personnel, takes a long time, so in some cases the alarm lags behind the fault. When a business system becomes abnormal, the abnormality in its data is uncontrollable and may grow exponentially. When the abnormality first appears, the indicator data does not yet meet the alarm conditions; by the time the operation and maintenance personnel receive the alarm notification, the business indicators are already severely abnormal and the abnormality has spread rapidly. At that point the business has already been damaged and the alarm has lost its purpose.

Summary of the invention

This application provides a monitoring method and device for a business system to solve the problems of lag and low accuracy in business system alarms.

An embodiment of the present invention provides a monitoring method for a business system, including:

Obtain the monitoring index data of the business system in the reference time period;

Compare the monitoring index data with the direct alarm condition;

If the monitoring index data does not satisfy the direct warning condition, input the monitoring index data into a pre-trained machine learning algorithm model, and use the machine learning algorithm model to determine the prediction result in the prediction time period;

The prediction result is compared with the predicted alarm condition to predict whether the business system is abnormal in the predicted time period.

In an optional embodiment, before inputting the monitoring index data into the pre-trained machine learning algorithm model and using the machine learning algorithm model to determine the prediction result in the prediction time period, the method further includes:

Obtain the training data of the business system in the historical time period;

The training data of the business system in the historical time period is input into the machine learning algorithm model as a parameter to determine the model parameters of the machine learning algorithm model.

In an optional embodiment, the predicted alarm condition is determined in the following manner:

Inputting historical fault sample data of the business system into the machine learning algorithm model for training, and determining fault model parameters;

Input historical non-fault sample data of the business system into the machine learning algorithm model for training, and determine non-fault model parameters;

Determine a fault condition according to the fault model parameters and the non-fault model parameters.

In an optional embodiment, the monitoring index data of the business system includes hardware index data of the business system;

Regarding the hardware index data of the business system, the obtaining the monitoring index data of the business system in a reference time period includes:

Acquiring hardware index data of the business system in the first reference time period;

The using the machine learning algorithm model to determine the prediction result in the prediction time period includes:

Determine the fluctuation of the hardware indicator data in the prediction time period;

The comparing the prediction result with the predicted alarm condition and predicting whether the business system is abnormal during the predicted time period includes:

Comparing the fluctuation of the hardware index data with the failure condition, and judging whether the business system has a hardware failure within the predicted time period;

If the business system has a hardware failure within the predicted time period, the hardware failure prediction time period and the prediction accuracy rate are determined.

In an optional embodiment, the monitoring index data of the business system includes the business index data of the business system;

Regarding the business index data of the business system, the obtaining the monitoring index data of the business system in a reference time period includes:

Acquiring business index data of the business system in the second reference time period;

Determine the fluctuation of the business index data in the forecast time period;

Comparing the fluctuation of the business index data with the normal fluctuation range, and judging whether the business system is abnormal in the predicted time period;

If the business is abnormal in the predicted time period, determine the abnormal predicted time period; the normal fluctuation range is determined by the machine learning algorithm model based on the fluctuation of the business index data in the historical time period.

In an optional embodiment, the monitoring index data of the business system includes monitoring index data of multiple monitoring indexes;

Before determining the prediction result in the prediction time period by using the machine learning algorithm model, the method further includes:

Determine the weight parameter of each monitoring index;

The weight parameter corresponding to the monitoring index data is input into the machine learning algorithm model.

A monitoring device for a business system includes:

The obtaining unit is used to obtain the monitoring index data of the business system in the reference time period;

The comparison unit is used to compare the monitoring index data with the direct alarm condition;

The prediction unit is configured to input the monitoring index data into a pre-trained machine learning algorithm model if the monitoring index data does not meet the direct alarm condition, and use the machine learning algorithm model to determine the prediction result in the prediction time period;

The alarm unit is used to compare the prediction result with the predicted alarm condition and predict whether the business system is abnormal in the predicted time period.

In an optional embodiment, a training unit is further included for:

Obtain the training data of the business system in the historical time period;

In an optional embodiment, a training unit is further included for:

Input the historical fault sample data of the business system in the historical time period into the machine learning algorithm model for training, and determine the fault model parameters;

Input historical non-fault sample data of the business system in the historical time period into the machine learning algorithm model for training, and determine non-fault model parameters;

For the hardware index data of the business system,

The acquiring unit is configured to acquire hardware index data of the business system in a first reference time period;

The prediction unit is configured to determine the fluctuation of the hardware index data in the prediction time period;

The alarm unit is configured to compare the fluctuation of the hardware index data with the failure condition and determine whether the business system has a hardware failure within the predicted time period; if the business system has a hardware failure within the predicted time period, the hardware failure prediction time period and the prediction accuracy rate are determined.

For the business index data of the business system,

The acquiring unit is configured to acquire business index data of the business system in a second reference time period;

The prediction unit is configured to determine the fluctuation of the business index data in the prediction time period;

The alarm unit is configured to compare the fluctuation of the business index data with the normal fluctuation range and determine whether the business system is abnormal in the predicted time period; if the business is abnormal in the predicted time period, the abnormal prediction time period is determined; the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation of the business index data in the historical time period.

The prediction unit is also used for:

Determine the weight parameter of each monitoring index;

This application provides a computing device, which includes:

A processor, a memory, a transceiver, and a bus interface, wherein the processor, the memory, and the transceiver are connected by a bus;

The processor is configured to read the program in the memory and execute the monitoring method of the business system described above;

The memory is used to store one or more executable programs, and can store data used by the processor when performing operations.

This application provides a non-transitory computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the above-mentioned monitoring method of the business system.

This application provides a computer program product containing instructions, which when running on a computer, enables the computer to execute the monitoring method of the above-mentioned business system.

In the embodiment of the present invention, the monitoring index data of the business system in the reference time period is obtained and first compared with the direct alarm condition. If the monitoring index data meets the direct alarm condition, the user is alerted directly. If the monitoring index data does not meet the direct alarm condition, the monitoring index data is input into the machine learning algorithm model, and the machine learning algorithm model is used to determine the prediction result within the prediction time period. Since the prediction time period includes a time period after the current time point, the machine learning algorithm model can predict the operation status of the business system for a period of time in the future and compare that status with the predicted alarm condition to predict whether an abnormality may occur. If an abnormality is predicted, the user is alerted. The embodiment of the present invention uses a machine learning algorithm model to predict abnormalities that may occur in the future, so that business operation and maintenance personnel can prepare disaster recovery in advance for upcoming abnormalities, which improves the availability of the business system and achieves high prediction accuracy.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present invention, the following will briefly introduce the drawings needed in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, other drawings may be obtained from these drawings without creative labor.

Figure 1 is a schematic structural diagram of a possible system architecture provided by an embodiment of the present invention;

Figure 2 is a schematic flowchart of a monitoring method for a business system provided by an embodiment of the present invention;

Figure 3 is a schematic structural diagram of a monitoring device of a business system provided by an embodiment of the present invention;

Figure 4 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.

Detailed description

In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

As shown in FIG. 1, a system architecture to which the embodiment of the present invention is applicable includes a business system 101, a monitoring platform 102, and a monitoring client 103. The business system 101 and/or the monitoring platform 102 may be a network device such as a computer, an independent device, or a server cluster formed by multiple servers. Preferably, the business system 101 and/or the monitoring platform 102 can use cloud computing technology for information processing.

The monitoring client 103 is installed on the monitoring platform 102. The monitoring platform 102 can be an electronic device with wireless communication functions, such as a mobile phone, a tablet computer, or a dedicated handheld device, or it can be a device with wired access to the Internet, such as a personal computer (PC), a notebook computer, or a server.

The monitoring platform 102 can communicate with the business system 101 through the Internet, or through a mobile communication system such as the Global System for Mobile Communications (GSM) or a Long Term Evolution (LTE) system. Likewise, the monitoring client 103 can communicate with the monitoring platform 102 through the Internet, or through a mobile communication system such as GSM or LTE.

For users, the system architecture in the embodiment of the present invention is almost the same as a traditional monitoring platform. Users only need to configure monitoring policies for the business indicators they care about and do not need to know how fault prediction is implemented inside the monitoring platform, so the platform is user-friendly and has no barrier to use.

For ease of understanding, the following defines and explains the terms that may be involved in the embodiments of the present invention.

User: The user in the embodiment of the present invention includes business system developers, business operation and maintenance personnel, and all relevant personnel who use the monitoring platform for business monitoring.

Intelligent monitoring platform: a tool responsible for monitoring and alerting on business systems. It monitors business indicators and basic service indicators (such as server hardware health status and network connection status) and integrates the detected indicators through the machine learning algorithm model to predict failures and abnormalities that may occur in the future.

Alarm detection/prediction: also known as business system failure detection/prediction; the monitoring platform detects and predicts failures/abnormalities that may occur in the daily operation of the business system.

Long Short-Term Memory (LSTM): A time recurrent neural network algorithm in machine learning.

Time series: refers to the sequence of numbers of the same statistical indicator arranged in the order of the time of occurrence. The main purpose of time series analysis is to predict the future based on existing historical data. Most of the economic data are given in the form of time series. Depending on the observation time, the time in the time series can be year, quarter, month or any other time format.

In order to implement forecasting node position data and improve the accuracy of prediction, an embodiment of the present invention provides a monitoring method of a business system. As shown in FIG. 2, the monitoring method of a business system provided by the embodiment of the present invention includes the following steps:

Step 201: Obtain monitoring index data of the business system in a reference time period.

Because server manufacturers differ, the format of the collected data differs as well: different hardware records hardware data in different formats, and the data formats of different service interfaces and different services may also differ. The reported data therefore needs to be cleaned to unify the various data formats, so that the cleaned data can be used by the big data processing and machine learning algorithm modules for machine learning training, as well as for alarm matching and prediction.
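The cleaning step can be pictured as mapping each manufacturer's record layout onto one common schema. The following is a minimal sketch under stated assumptions: the vendor names, the field names, and the `normalize_record` helper are hypothetical illustrations and are not taken from this application.

```python
from datetime import datetime, timezone

# Hypothetical per-vendor field mappings: source field -> unified field.
FIELD_MAPS = {
    "vendor_a": {"ts": "timestamp", "cpu_temp_c": "cpu_temp_c", "host": "host"},
    "vendor_b": {"time": "timestamp", "cpuTempF": "cpu_temp_f", "hostname": "host"},
}


def normalize_record(vendor, raw):
    """Map a raw vendor record onto one unified schema (illustrative only)."""
    mapping = FIELD_MAPS[vendor]
    rec = {unified: raw[src] for src, unified in mapping.items() if src in raw}
    # Unify units: convert Fahrenheit readings to Celsius.
    if "cpu_temp_f" in rec:
        rec["cpu_temp_c"] = round((rec.pop("cpu_temp_f") - 32) * 5 / 9, 2)
    # Unify timestamps: epoch seconds become ISO 8601 strings in UTC.
    if isinstance(rec.get("timestamp"), (int, float)):
        rec["timestamp"] = datetime.fromtimestamp(
            rec["timestamp"], tz=timezone.utc
        ).isoformat()
    return rec
```

After this step, records from both hypothetical vendors carry the same keys and units, so downstream alarm matching and model training can treat them uniformly.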

At the same time, because the monitoring indicators of server hardware and business interfaces differ, and each component and interface may have data in many dimensions, it is necessary to select data sources that are positively correlated with the monitoring indicators, such as the SMART values of the hard disk and the health value of the motherboard, and to eliminate interference items.
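One simple way to select positively correlated data sources is to keep only candidates whose Pearson correlation with the monitored indicator exceeds a cutoff. The sketch below assumes this approach; the application does not state which correlation measure is used, and the `min_corr` cutoff and the series names in the usage note are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (sx * sy)


def select_sources(candidates, target, min_corr=0.5):
    """Keep the names of candidate series positively correlated with target."""
    return [
        name
        for name, series in candidates.items()
        if pearson(series, target) >= min_corr
    ]
```

For example, given a rising failure indicator, a rising hypothetical series such as `smart_reallocated` would be kept, while a falling or constant series would be dropped as an interference item.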

Step 202: Compare the monitoring index data with the direct alarm condition.

The cleaned monitoring index data is processed logically to determine whether it meets the direct alarm condition. If the direct alarm condition is met, the business system is currently abnormal and the user is alerted directly; if the direct alarm condition is not met, the reported monitoring index data is evaluated by the trained machine learning algorithm model to predict whether an abnormality may occur in a future time period.

The direct alarm conditions in the embodiment of the present invention can also be obtained by training the machine learning algorithm model, with the system judging them by itself during daily iteration in the production environment. This reduces the time that operation and maintenance personnel spend configuring direct alarm conditions, improves management efficiency, and avoids false alarms caused by manual configuration.

Step 203: If the monitoring index data does not meet the direct alarm condition, input the monitoring index data into a pre-trained machine learning algorithm model, and use the machine learning algorithm model to determine the prediction result in the prediction time period.

Among them, the machine learning algorithm model may include a Convolutional Neural Network (CNN), a Support Vector Machine (SVM), K-Means clustering, or Logistic Regression. Considering the balance between training cost (computation time, required size of the computing server cluster) and prediction results, the embodiment of the present invention preferably uses a Long Short-Term Memory (LSTM) neural network algorithm for prediction.
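For readers unfamiliar with LSTM, the sketch below shows one step of a cell using the standard textbook gate equations, reduced to scalar input and scalar state for readability. It is not the model of this application: the parameter names in `p` are hypothetical, and a practical system would use a trained, vector-valued implementation from a deep learning library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (standard textbook equations).

    p holds hypothetical parameters: for each of the gates f, i, o and the
    candidate c there is an input weight w_*, a recurrent weight u_*, and
    a bias b_*.
    """
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_c"] * x + p["u_c"] * h_prev + p["b_c"])  # candidate
    c = f * c_prev + i * g   # new cell state: keep part of the old, add new
    h = o * math.tanh(c)     # new hidden state exposed to the next step
    return h, c


def run_lstm(series, p):
    """Feed a time series through the cell and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in series:
        h, c = lstm_step(x, h, c, p)
    return h
```

The cell state `c` is what lets the network carry information across many time steps, which is why this family of models suits time-ordered monitoring data.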

Step 204: Compare the prediction result with the predicted alarm condition, and predict whether the business system is abnormal in the predicted time period.

In the specific implementation process, the predicted alarm conditions can be determined by the operation and maintenance personnel based on experience, obtained through machine learning algorithm model training, or judged by the system itself during daily iteration in the production environment. If an abnormality is predicted, the user can be notified via email, SMS, phone call, and/or WeChat.

Since the monitoring index data of the business system is time-dependent and forms time series data, the results in a future time period can be predicted from the monitoring index data, thereby indicating the operational status of the business system. The predicted results are then compared with the configured predicted alarm conditions to determine whether the business system is likely to be abnormal in the future time period.

Further, the LSTM algorithm model is trained based on the training data in the historical time period. Before the monitoring index data is input into the pre-trained machine learning algorithm model and the machine learning algorithm model is used to determine the prediction result in the prediction time period, the method further includes:
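Steps 201 to 204 amount to a two-stage check: a direct alarm on current data, then a predicted alarm on forecast data. The sketch below illustrates that control flow only. The function names, the threshold of 90, and the `linear_extrapolation` predictor are hypothetical; in particular the predictor is a stand-in for the trained model, not an LSTM.

```python
def monitor(window, direct_condition, predict, predicted_condition):
    """Two-stage check: direct alarm first, predicted alarm second."""
    # Stage 1: the current data already meets the direct alarm condition.
    if direct_condition(window):
        return "direct_alarm", None
    # Stage 2: forecast the prediction time period and test the forecast.
    forecast = predict(window)
    if predicted_condition(forecast):
        return "predicted_alarm", forecast
    return "normal", forecast


def linear_extrapolation(window, steps=3):
    """Stand-in predictor (NOT an LSTM): continue the last observed slope."""
    slope = window[-1] - window[-2]
    return [window[-1] + slope * k for k in range(1, steps + 1)]


def over_90(values):
    """Hypothetical alarm condition: any value at or above 90."""
    return max(values) >= 90
```

With these stand-ins, a window of `[50, 60, 70]` does not trigger a direct alarm, but its forecast reaches 90 and therefore yields a predicted alarm before the indicator itself crosses the threshold.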

Obtain the training data of the business system in the historical time period;

In the specific implementation process, the training data of the business system at each time point is used as an output parameter of the LSTM algorithm model. For each output parameter, the training data in the historical time period before the corresponding time point is used as the input parameters of the LSTM algorithm model. After a large number of such correspondences between input parameters and output parameters have been obtained, the model parameters of the LSTM algorithm model can be obtained using an existing training method for LSTM algorithm models.

It should be noted that the historical time period used in training and the reference time period used in prediction can be the same time period or different time periods. If they are different, the two time periods may or may not overlap. For example, the historical time period is the 1000 hours before the current time point and the reference time period is the 999 hours before the current time point; or the historical time period is 9 am to 11 am every day from January to March 2018 and the reference time period is 9 am to 11 am every day from January to March 2019. The historical time period and the reference time period are selected according to calculation requirements and are not limited in the embodiment of the present invention.

Further, in the embodiment of the present invention, the predicted alarm condition can also be obtained by training with the LSTM algorithm. The predicted alarm condition is determined as follows:

In the specific implementation process, the historical fault samples are the various hardware index data collected when a hardware fault of the business system was confirmed. Inputting the historical fault samples into the LSTM algorithm model determines the fault model parameters of the hardware of the business system when it fails. The historical non-fault samples are the various hardware index data collected during normal operation of the business system. Inputting the historical non-fault samples into the LSTM algorithm model determines the non-fault model parameters of the hardware of the business system during normal operation. Specific fault conditions can then be determined from the fault model parameters and the non-fault model parameters.
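The pairing of input parameters and output parameters described here corresponds to the usual sliding-window construction for time series training sets. The sketch below assumes a fixed window length, which the application does not specify; `make_training_pairs` and `window` are hypothetical names.

```python
def make_training_pairs(series, window):
    """Build (input window, next value) pairs from a time series.

    For every time point t, the `window` values before t are the model
    input and the value at t is the expected output.
    """
    pairs = []
    for t in range(window, len(series)):
        pairs.append((series[t - window:t], series[t]))
    return pairs
```

A series of length n yields n - window pairs, so longer historical time periods directly increase the number of training examples.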
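The application does not say how the fault condition is derived from the two parameter sets. As a deliberately simplified illustration, the sketch below replaces the trained model parameters with a single summary statistic (the mean) per sample set and places the decision threshold midway between them. Every name and the midpoint rule are assumptions made for this example.

```python
from statistics import mean


def fit_summary(samples):
    """Stand-in for 'model parameters': here just the sample mean."""
    return {"mean": mean(samples)}


def fault_condition(fault_params, normal_params):
    """Return a predicate that separates fault-like from normal-like values.

    Assumption: the threshold sits midway between the two means.
    """
    threshold = (fault_params["mean"] + normal_params["mean"]) / 2
    fault_is_high = fault_params["mean"] > normal_params["mean"]

    def is_fault(value):
        return value >= threshold if fault_is_high else value <= threshold

    return is_fault
```

The point of the sketch is the structure: two separately fitted descriptions, one of faulty behavior and one of normal behavior, are combined into a single condition that new data can be tested against.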

Since the monitoring index data of the business system includes the hardware index data of the business system and the business index data of the business system, the embodiment of the present invention respectively performs prediction and alarm for two different types of monitoring indexes.

Further, for the hardware index data of the business system, the obtaining the monitoring index data of the business system in a reference time period includes:

In the specific implementation process, for hardware index data, each server has its own life cycle. The closer the data is to the time point at which the fault occurs, the higher the accuracy of the prediction. Therefore, the first reference time period of the hardware index data should be as close as possible to the current time point.

Table 1 shows the failure prediction results of the hardware index data.

Table 1

For example, as shown in Table 1, for monitoring indicator 1, the server hardware is predicted to possibly become abnormal within 45 days with a prediction accuracy rate of 78%, and within 60 days with a prediction accuracy rate of 80%.

In the specific implementation process, because the business index data changes every day, the accuracy of the prediction model increases the longer the server hardware has been in production, or the longer the business indicators have been monitored. For business indicators, the more monitoring indicator data is used for forecasting, the larger the sample and the more accurate the results. Therefore, the second reference time period of the business indicator data should be as long as possible.

Since the amount of collected monitoring index data may be very large, and each monitoring index affects abnormalities to a different degree, it is necessary to calculate and configure the weight range of each monitoring index. Further, the monitoring index data of the business system includes monitoring index data of multiple monitoring indexes;

Determine the weight parameter of each monitoring index;

In the embodiment of the present invention, the behavior of the monitoring indexes in a future time period is predicted by the LSTM model. If, at the next monitoring time point, the monitoring index data is detected to be outside the predicted normal fluctuation range, an alarm is sent to the business operation and maintenance personnel/developers. In addition, business operation and maintenance personnel can make preparations in advance according to the fluctuations of the monitoring indicators predicted by the monitoring platform to avoid business impact. For example, before holidays or before new business activities go online, the monitoring platform predicts the increase in daily access traffic that the business may see, so that business operation and maintenance personnel can expand the system in advance and avoid the business system becoming unavailable because of insufficient performance.
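The check against a normal fluctuation range can be sketched as follows. In the application the range is determined by the machine learning algorithm model; the sketch substitutes a simple statistical band (mean plus or minus k standard deviations of historical values) purely for illustration, and the value of `k` is an assumption.

```python
from statistics import mean, stdev


def normal_range(history, k=3.0):
    """Stand-in for the learned range: mean +/- k sample standard deviations."""
    m, s = mean(history), stdev(history)
    return m - k * s, m + k * s


def out_of_range(value, bounds):
    """True when a new observation falls outside the normal range."""
    low, high = bounds
    return value < low or value > high
```

For a history such as `[100, 102, 98, 101, 99]`, a new reading of 101 stays inside the band while a reading of 120 falls outside it and would raise an alarm.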

In order to understand the present invention more clearly, the following describes the above process in detail with specific embodiments based on the architecture of FIG. 1. The steps of the specific embodiments are as follows, including:

Step 300: Train the LSTM algorithm model to obtain model parameters.

Step 301: Obtain monitoring index data.

Since the data indicators of server hardware and business interfaces differ, and each component and interface may have data in many dimensions, it is necessary to select data sources that are positively correlated with the business indicators, such as the SMART values of the hard disk and the health value of the motherboard, and to eliminate interference items.

Step 302: Data preprocessing.

Since the amount of collected monitoring index data may be very large, and each monitoring index affects abnormalities to a different degree, it is necessary to obtain the weight parameter of each monitoring index.

Step 303: Obtain the predicted alarm conditions. Historical fault sample data and historical non-fault sample data are used separately for model training to obtain fault model parameters and non-fault model parameters, and the predicted alarm conditions are then determined according to the fault model parameters and the non-fault model parameters.
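How the weight parameters enter the model is not detailed in the application. One plausible reading, sketched below, is to normalize the configured weights and scale each monitoring index by its weight before the data is passed to the model. The index names and both helper functions are hypothetical.

```python
def normalize_weights(raw_weights):
    """Scale raw weights so that they sum to 1."""
    total = sum(raw_weights.values())
    return {name: w / total for name, w in raw_weights.items()}


def apply_weights(sample, weights):
    """Scale each monitoring index by its weight; unknown indexes get 0."""
    return {name: value * weights.get(name, 0.0) for name, value in sample.items()}
```

Indexes with larger weights then contribute more strongly to the model input, while indexes without a configured weight are suppressed.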

Step 304: Input the monitoring index data and weight parameters into the trained LSTM algorithm model, and use the LSTM algorithm model to calculate the prediction result of the prediction time period.

In the specific implementation process, the LSTM algorithm model is used to predict a complete sequence: the window is initialized once with the first part of the data, and then the sliding window is moved forward step by step, predicting the next point each time as in point-by-point prediction. The difference is that the model feeds its own predictions back in. In the second prediction, one data point (the last point) in the data used by the model comes from the previous prediction; in the third prediction, two points come from previous predictions; and so on. By the 99th prediction, the data in the test set has been completely predicted. This means that the time span the algorithm model can predict is greatly extended.
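The feed-back of predictions into the sliding window can be sketched as a short loop. `predict_next` stands for the trained model's one-step prediction and is passed in as a parameter; in the usage note it is replaced by a trivial hypothetical stand-in rather than an LSTM.

```python
def predict_sequence(seed, predict_next, steps):
    """Recursive multi-step forecast with a sliding window.

    The window starts as real data. After each prediction the oldest point
    is dropped and the predicted point is appended, so later predictions
    rely more and more on earlier predictions.
    """
    window = list(seed)
    predicted = []
    for _ in range(steps):
        nxt = predict_next(window)
        predicted.append(nxt)
        window = window[1:] + [nxt]
    return predicted
```

With a stand-in such as `lambda w: w[-1] + (w[-1] - w[-2])` and a seed of `[1, 2, 3]`, three steps return `[4, 5, 6]`; from the fourth step onward the window would contain predicted values only. Because errors can accumulate in this scheme, accuracy typically degrades as the horizon grows.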

Step 305: Compare the predicted result with the predicted alarm condition, determine whether the business system will be abnormal in the predicted time period, and display the predicted result to the user.

Different business systems/servers have different alarm priorities. Since the algorithm model also returns the prediction accuracy rate, whether fault prediction needs to be performed for the corresponding business system can be determined according to the prediction accuracy rate and a threshold matching strategy that users define in advance.
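One possible reading of the threshold matching strategy is a per-system minimum accuracy below which a prediction is not acted on. The sketch below assumes that reading; the system names, the threshold values, and the default are hypothetical.

```python
def should_act_on_prediction(system, accuracy, thresholds, default=0.8):
    """Act on a prediction only if its accuracy meets the system's threshold."""
    return accuracy >= thresholds.get(system, default)


# Hypothetical user-defined strategy: a high-priority system accepts
# lower-accuracy predictions, a low-priority system demands higher accuracy.
THRESHOLDS = {"payments": 0.75, "batch-report": 0.90}
```

Under this strategy a prediction with a 78% accuracy rate would be acted on for the hypothetical `payments` system but ignored for `batch-report`.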

The embodiment of the present invention also provides a monitoring device for a business system, as shown in FIG. 3, including:

The obtaining unit 31 is configured to obtain monitoring index data of the business system in the reference time period;

The comparing unit 32 is configured to compare the monitoring index data with the direct alarm condition;

The prediction unit 33 is configured to input the monitoring index data into a pre-trained machine learning algorithm model if the monitoring index data does not meet the direct alarm condition, and use the machine learning algorithm model to determine the prediction result in the prediction time period;

The alarm unit 34 is configured to compare the prediction result with the predicted alarm condition, and predict whether the business system is abnormal in the predicted time period.
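The cooperation of the comparing unit, the prediction unit and the alarm unit can be sketched as a two-stage check. This is only an illustration under simplifying assumptions: both alarm conditions are modelled as fixed upper limits (`direct_limit`, `predicted_limit`), and `linear_extrapolation` is a hypothetical stand-in for the pre-trained machine learning algorithm model.

```python
from typing import Callable, List, Sequence


def monitor(
    index_data: Sequence[float],
    direct_limit: float,
    predict: Callable[[Sequence[float]], Sequence[float]],
    predicted_limit: float,
) -> str:
    """Return "direct_alarm", "predicted_alarm" or "normal"."""
    # Comparing unit: check the observed data against the direct alarm condition.
    if max(index_data) >= direct_limit:
        return "direct_alarm"
    # Prediction unit: reached only when the direct alarm condition is not met.
    predicted = predict(index_data)
    # Alarm unit: check the prediction against the predicted alarm condition.
    if max(predicted) >= predicted_limit:
        return "predicted_alarm"
    return "normal"


def linear_extrapolation(values: Sequence[float]) -> List[float]:
    """Hypothetical model stand-in: extend the last observed slope by three steps."""
    slope = values[-1] - values[-2]
    return [values[-1] + slope * k for k in range(1, 4)]


rising = monitor([50.0, 60.0, 70.0], 95.0, linear_extrapolation, 90.0)
already_high = monitor([50.0, 96.0], 95.0, linear_extrapolation, 90.0)
flat = monitor([50.0, 50.0, 50.0], 95.0, linear_extrapolation, 90.0)
```

The rising series stays below the direct limit but is predicted to cross the predicted limit, so it yields a predicted alarm before the direct alarm condition is ever met.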

It also includes a training unit 35 for:

Obtain the training data of the business system in the historical time period;

Optionally, it also includes a training unit 35 for:

Optionally, the monitoring index data of the business system includes hardware index data of the business system;

For the hardware index data of the business system,

The acquiring unit 31 is configured to acquire hardware index data of the business system in the first reference time period;

The prediction unit 33 is configured to determine the fluctuation of the hardware indicator data in the prediction time period;

The alarm unit 34 is configured to compare the fluctuation of the hardware index data with the failure condition, and determine whether the business system has a hardware failure within the prediction time period; if the business system has a hardware failure within the prediction time period, the hardware failure prediction time period and the prediction accuracy rate are determined.
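A minimal sketch of how a hardware failure prediction time period might be derived from the predicted fluctuation is given below. It assumes the failure condition is a single upper limit on the index value and that the period is reported as step indices; the function name `first_failure_period` and the sample values are hypothetical, and the prediction accuracy rate is assumed to be reported separately by the model.

```python
from typing import Optional, Sequence, Tuple


def first_failure_period(
    predicted: Sequence[float], failure_limit: float
) -> Optional[Tuple[int, int]]:
    """Return (start, end) step indices of the first run of predicted points
    at or above `failure_limit`, or None if no point reaches it."""
    start: Optional[int] = None
    for i, value in enumerate(predicted):
        if value >= failure_limit:
            if start is None:
                start = i  # first step at which the failure condition is met
        elif start is not None:
            return (start, i - 1)  # the run ended at the previous step
    if start is not None:
        return (start, len(predicted) - 1)  # the run lasts until the last step
    return None


# Hypothetical forecast of disk usage in percent for the prediction time period.
disk_usage_forecast = [71.0, 78.0, 86.0, 93.0, 97.0, 88.0]
period = first_failure_period(disk_usage_forecast, failure_limit=90.0)
```

Mapping the returned step indices to wall-clock times depends on the sampling interval of the hardware index data, which the sketch leaves open.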

The monitoring index data of the business system includes business index data of the business system; for the business index data of the business system,

The acquiring unit 31 is configured to acquire business index data of the business system in a second reference time period;

The prediction unit 33 is configured to determine the fluctuation of the business index data in the prediction time period;

The alarm unit 34 is configured to compare the fluctuation of the business index data with the normal fluctuation range, and determine whether the business system is abnormal in the prediction time period; if the business system is abnormal in the prediction time period, the abnormal prediction time period is determined; the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation of the business index data in the historical time period.
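The comparison with a normal fluctuation range can be illustrated as follows. In the embodiment the range is determined by the machine learning algorithm model; the sketch swaps in a much simpler statistical band (mean plus or minus `k` standard deviations of the historical data) purely for illustration. The names `normal_range` and `abnormal_steps`, the factor `k`, and the sample values are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def normal_range(history: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Derive a (low, high) band from historical business index data."""
    mu = mean(history)
    sigma = pstdev(history)
    return (mu - k * sigma, mu + k * sigma)


def abnormal_steps(predicted: Sequence[float], band: Tuple[float, float]) -> List[int]:
    """Return the step indices whose predicted value falls outside the band."""
    low, high = band
    return [i for i, value in enumerate(predicted) if value < low or value > high]


# Hypothetical transaction volumes per minute in the historical time period.
history = [100.0, 102.0, 98.0, 101.0, 99.0]
band = normal_range(history, k=3.0)
outliers = abnormal_steps([100.0, 97.0, 80.0, 103.0], band)
```

The steps listed in `outliers` would form the abnormal prediction time period; here only the third predicted point (index 2) leaves the band.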

Optionally, the monitoring index data of the business system includes monitoring index data of multiple monitoring indexes;

The prediction unit 33 is further configured to:

Determine the weight parameter of each monitoring index;
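How the weight parameters enter the model is not detailed here. One possible approach, shown purely as an assumption, is to scale each monitoring index value by its normalised weight before building the model input; the function `weighted_features` and the index names are hypothetical.

```python
from typing import Dict, List


def weighted_features(latest: Dict[str, float], weights: Dict[str, float]) -> List[float]:
    """Scale each index value by its normalised weight, ordered by index name."""
    total = sum(weights[name] for name in latest)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Sorting the names keeps the feature order stable between calls.
    return [latest[name] * weights[name] / total for name in sorted(latest)]


# Assumption: CPU load is considered three times as important as memory usage.
features = weighted_features(
    {"cpu": 0.8, "memory": 0.5},
    {"cpu": 3.0, "memory": 1.0},
)
```

Other designs are equally plausible, for example passing the weights to the model as separate inputs alongside the unscaled index values.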

Based on the same concept as the method shown in FIG. 2, this application also provides a computing device. As shown in FIG. 4, the computing device includes:

The processor 401, the memory 402, the transceiver 403, and the bus interface 404; wherein the processor 401, the memory 402 and the transceiver 403 are connected by a bus;

The processor 401 is configured to read a program in the memory 402, and execute the foregoing monitoring method of the business system;

The processor 401 may be a central processing unit (CPU for short), a network processor (NP for short), or a combination of a CPU and an NP. It may also be a hardware chip. The aforementioned hardware chip may be an application-specific integrated circuit (ASIC for short), a programmable logic device (PLD for short), or a combination thereof. The aforementioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

The memory 402 is configured to store one or more executable programs, and can store data used by the processor 401 when performing operations.

Specifically, the program may include program code, and the program code includes computer operation instructions. The memory 402 may include a volatile memory, such as random-access memory (RAM for short); the memory 402 may also include a non-volatile memory, such as flash memory, a hard disk drive (HDD for short) or a solid-state drive (SSD for short); the memory 402 may also include a combination of the foregoing types of memories.

The memory 402 stores the following elements, executable modules or data structures, or their subsets, or their extended sets:

Operating instructions: including various operating instructions, used to implement various operations.

Operating system: including various system programs, used to implement various basic services and process hardware-based tasks.

The bus may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The bus can be divided into address bus, data bus, control bus, etc.

The bus interface 404 may be a wired bus interface, a wireless bus interface or a combination thereof, where the wired bus interface may be, for example, an Ethernet interface. The Ethernet interface can be an optical interface, an electrical interface or a combination thereof. The wireless bus interface may be a WLAN interface.

Based on the same inventive concept, the embodiments of the present application also provide a non-transitory computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to execute the foregoing monitoring method of the business system.

Based on the same inventive concept, the embodiments of the present application provide a computer program product containing instructions, which when running on a computer, cause the computer to execute the above-mentioned monitoring method of the business system.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each process and/or block in the flowchart and/or block diagram, and each combination of processes and/or blocks in the flowchart and/or block diagram, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

Although the preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic creative concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. In this way, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.

Claims

A monitoring method for a business system, characterized in that it comprises:

Obtain the monitoring index data of the business system in the reference time period;

Compare the monitoring index data with the direct alarm condition;

If the monitoring index data does not satisfy the direct alarm condition, input the monitoring index data into a pre-trained machine learning algorithm model, and use the machine learning algorithm model to determine the prediction result in the prediction time period;

The prediction result is compared with the predicted alarm condition to predict whether the business system is abnormal in the predicted time period.
The method according to claim 1, characterized in that, before inputting the monitoring index data into the pre-trained machine learning algorithm model and using the machine learning algorithm model to determine the prediction result in the prediction time period, the method further comprises:

Obtain the training data of the business system in the historical time period;

The training data of the business system in the historical time period is input as a parameter into the machine learning algorithm model to determine the model parameters of the machine learning algorithm model.
The method according to claim 1, wherein the predicted alarm condition is determined in the following manner:

Inputting historical fault sample data of the business system into the machine learning algorithm model for training, and determining fault model parameters;

Input historical non-fault sample data of the business system into the machine learning algorithm model for training, and determine non-fault model parameters;

Determine a fault condition according to the fault model parameters and the non-fault model parameters.
The method according to claim 3, wherein the monitoring index data of the business system includes hardware index data of the business system;

Regarding the hardware index data of the business system, the obtaining the monitoring index data of the business system in a reference time period includes:

Acquiring hardware index data of the business system in the first reference time period;

The using the machine learning algorithm model to determine the prediction result in the prediction time period includes:

Determine the fluctuation of the hardware indicator data in the prediction time period;

The comparing the prediction result with the predicted alarm condition and predicting whether the business system is abnormal during the predicted time period includes:

Comparing the fluctuation of the hardware index data with the failure condition, and judging whether the business system has a hardware failure within the predicted time period;

If the business system has a hardware failure within the predicted time period, the hardware failure prediction time period and the prediction accuracy rate are determined.
The method according to claim 1, wherein the monitoring index data of the business system includes the business index data of the business system;

Regarding the business index data of the business system, the obtaining the monitoring index data of the business system in a reference time period includes:

Acquiring business index data of the business system in the second reference time period;

The using the machine learning algorithm model to determine the prediction result in the prediction time period includes:

Determine the fluctuation of the business index data in the prediction time period;

The comparing the prediction result with the predicted alarm condition and predicting whether the business system is abnormal during the predicted time period includes:

Comparing the fluctuation of the business index data with the normal fluctuation range, and judging whether the business system is abnormal in the predicted time period;

If the business system is abnormal in the predicted time period, determine the abnormal predicted time period; the normal fluctuation range is determined by the machine learning algorithm model based on the fluctuation of the business index data in the historical time period.
The method according to claim 1, wherein the monitoring index data of the business system includes monitoring index data of multiple monitoring indexes;

Before determining the prediction result in the prediction time period by using the machine learning algorithm model, the method further includes:

Determine the weight parameter of each monitoring index;

The weight parameter corresponding to the monitoring index data is input into the machine learning algorithm model.
A monitoring device for a business system, characterized in that it comprises:

The obtaining unit is used to obtain the monitoring index data of the business system in the reference time period;

The comparison unit is used to compare the monitoring index data with the direct alarm condition;

The prediction unit is configured to input the monitoring index data into a pre-trained machine learning algorithm model if the monitoring index data does not meet the direct alarm condition, and use the machine learning algorithm model to determine the prediction result in the prediction time period;

The alarm unit is used to compare the prediction result with the predicted alarm condition and predict whether the business system is abnormal in the predicted time period.
The device according to claim 7, further comprising a training unit for:

Obtain the training data of the business system in the historical time period;

The training data of the business system in the historical time period is input as a parameter into the machine learning algorithm model to determine the model parameters of the machine learning algorithm model.
The device according to claim 7, further comprising a training unit for:

Input the historical fault sample data of the business system in the historical time period into the machine learning algorithm model for training, and determine the fault model parameters;

Input historical non-fault sample data of the business system in the historical time period into the machine learning algorithm model for training, and determine non-fault model parameters;

Determine a fault condition according to the fault model parameters and the non-fault model parameters.
The device according to claim 9, wherein the monitoring index data of the business system includes hardware index data of the business system;

For the hardware index data of the business system,

The acquiring unit is configured to acquire hardware index data of the business system in a first reference time period;

The prediction unit is configured to determine the fluctuation of the hardware index data in the prediction time period;

The alarm unit is configured to compare the fluctuation of the hardware index data with the failure condition, and determine whether the business system has a hardware failure within the prediction time period; if the business system has a hardware failure within the prediction time period, the hardware failure prediction time period and the prediction accuracy rate are determined.
The device according to claim 7, wherein the monitoring index data of the business system includes the business index data of the business system;

For the business index data of the business system,

The acquiring unit is configured to acquire business index data of the business system in a second reference time period;

The prediction unit is configured to determine the fluctuation of the business index data in the prediction time period;

The alarm unit is configured to compare the fluctuation of the business index data with the normal fluctuation range, and determine whether the business system is abnormal in the predicted time period; if the business system is abnormal in the predicted time period, the abnormal prediction time period is determined; the normal fluctuation range is determined by the machine learning algorithm model according to the fluctuation of the business index data in the historical time period.
The device according to claim 7, wherein the monitoring index data of the business system includes monitoring index data of multiple monitoring indexes;

The prediction unit is also used for:

Determine the weight parameter of each monitoring index;

The weight parameter corresponding to the monitoring index data is input into the machine learning algorithm model.
A computing device, characterized by comprising a processor, a memory, a transceiver, and a bus interface, wherein the processor, the memory and the transceiver are connected by a bus;

The processor is configured to read the program in the memory and execute the method according to any one of claims 1 to 6;

The memory is used to store one or more executable programs and store data used by the processor when performing operations.
A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions are used to make the computer execute the method described in any one of claims 1 to 6.
A computer program product, characterized in that the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and the program instructions, when executed by a computer, cause the computer to execute the method described in any one of claims 1 to 6.