CN113297045B

CN113297045B - Monitoring method and device for distributed system

Info

Publication number: CN113297045B
Application number: CN202010732311.2A
Authority: CN
Inventors: 王梦天; 王鹏; 闫小龙; 王勇
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-07-27
Filing date: 2020-07-27
Publication date: 2024-03-08
Anticipated expiration: 2040-07-27
Also published as: CN113297045A

Abstract

The application provides a monitoring method and device for a distributed system. The monitoring method comprises: acquiring attribute data of a distributed system in at least one monitoring dimension; preprocessing the attribute data, and labeling the attribute data according to the preprocessing result to obtain attribute data carrying a label; inputting the attribute data carrying the label into a score prediction model corresponding to the monitoring dimension, and predicting the score of the distributed system according to the label to obtain a target score; and comparing the target score with a score threshold of the monitoring dimension, and determining a monitoring result of the distributed system in the monitoring dimension according to the comparison result.

Description

Monitoring method and device for distributed system

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a method and an apparatus for monitoring a distributed system.

Background

With the development of internet technology, distributed systems keep growing in scale to meet user demand, and the number of clusters grows with them, which raises the probability that a distributed system fails. To avoid failures, the condition of the system is currently predicted by manual scoring, so that potential problems can be analyzed and maintenance carried out in time. Manual scoring, however, is hard to apply in practice: it is difficult to assign a specific score, the scoring standards are not uniform, and the scoring indexes differ from one perspective to another. As a result, faults of the distributed system are hard to predict accurately and the prediction results have low accuracy, so an effective solution to this problem is needed.

Disclosure of Invention

In view of this, the embodiments of the present application provide two monitoring methods for a distributed system. The application also relates to two monitoring devices for a distributed system, two computing devices and a computer-readable storage medium, so as to address the technical deficiencies of the prior art.

According to a first aspect of an embodiment of the present application, there is provided a first monitoring method of a distributed system, including:

acquiring attribute data of a distributed system in at least one monitoring dimension;

preprocessing the attribute data, and marking the attribute data according to a preprocessing result to obtain attribute data carrying a label;

inputting attribute data carrying a tag into a score prediction model corresponding to the monitoring dimension, and predicting the score of the distributed system according to the tag to obtain a target score;

and comparing the target score with a score threshold of the monitoring dimension, and determining a monitoring result of the distributed system in the monitoring dimension according to a comparison result.

Optionally, the preprocessing the attribute data, and labeling the attribute data according to the preprocessing result to obtain attribute data carrying a tag, including:

Acquiring an analysis result of analyzing the attribute data;

and marking the attribute data according to the analysis result to obtain the attribute data carrying the processed label or the label to be processed.

Optionally, the inputting the attribute data carrying the tag to the score prediction model corresponding to the monitoring dimension includes:

determining a score prediction model corresponding to the monitoring dimension;

and inputting attribute data carrying the processed label or the label to be processed into the score prediction model.

Optionally, the predicting the score of the distributed system according to the label, to obtain a target score, includes:

when the tag is the processed tag, inputting the attribute data into a first score prediction module in the score prediction model to predict the score of the distributed system, and obtaining a first target score of the distributed system in the monitoring dimension as the target score;

or,

and under the condition that the tag is the tag to be processed, inputting the attribute data into a second score prediction module in the score prediction model to predict the score of the distributed system, and obtaining a second target score of the distributed system in the monitoring dimension as the target score.

Optionally, the comparing the target score with the score threshold of the monitoring dimension, and determining the monitoring result of the distributed system in the monitoring dimension according to the comparison result includes:

reading the score threshold of the monitoring dimension, and judging whether the target score is larger than the score threshold;

if yes, determining that the distributed system has no fault in the monitoring dimension;

if not, determining that the distributed system has faults in the monitoring dimension.
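The comparison step above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the claimed implementation: the function name `monitoring_result` and the result strings are hypothetical, and only the rule itself (a score strictly greater than the threshold means no fault) comes from the text.

```python
def monitoring_result(target_score: float, score_threshold: float) -> str:
    """Return the monitoring result for one monitoring dimension.

    A target score strictly greater than the threshold means no fault;
    any other score (including one equal to the threshold) means a fault.
    """
    return "no_fault" if target_score > score_threshold else "fault"
```

Note that a score exactly equal to the threshold falls into the fault branch, because the text tests for "larger than" rather than "larger than or equal to".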

Optionally, after the step of determining that the distributed system has a fault in the monitoring dimension, the method further includes:

generating notification information of the distributed system in the monitoring dimension based on attribute data;

and sending the notification information to a supervisor of the distributed system.

Optionally, the score prediction model includes a first score prediction module and a second score prediction module, the first score prediction module introduces a first relationship weight of the second score prediction module in the process of performing score prediction, and the second score prediction module introduces a second relationship weight of the first score prediction module in the process of performing score prediction;

Correspondingly, the score prediction model is trained by the following method:

acquiring first sample data in the sample set of the monitoring dimension, and marking the first sample data to acquire first sample data carrying a label;

inputting first sample data carrying a label and a first sample score corresponding to the first sample data into an initial score prediction model for preliminary training to obtain an intermediate score prediction model;

acquiring second sample data in the sample set, and labeling the second sample data to acquire second sample data carrying a label;

and inputting second sample data carrying a label and a second sample score corresponding to the second sample data into the intermediate score prediction model for deep training to obtain the score prediction model.
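The two training stages above can be illustrated with a small sketch. It is only an assumption-laden illustration: it uses a single logistic model trained by per-sample gradient descent, treats sample scores as values in [0, 1], and omits the two prediction modules and their relationship weights. The names `train`, `first_samples` and `second_samples`, the learning rate and the sample values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (features, sample score in [0, 1])


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(weights: List[float], bias: float, samples: Sequence[Sample],
          epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Run gradient-descent updates on a logistic model, starting from the
    parameters passed in, and return the updated (weights, bias)."""
    weights = list(weights)
    for _ in range(epochs):
        for features, score in samples:
            pred = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = pred - score  # gradient of the cross-entropy loss w.r.t. z
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


# Illustrative data: one feature (disk utilization); low utilization is healthy.
first_samples = [([0.2], 1.0), ([0.9], 0.0)]
second_samples = [([0.3], 1.0), ([0.85], 0.0)]

# Preliminary training: initial model -> intermediate model.
intermediate = train([0.0], 0.0, first_samples)
# Deep training: continue from the intermediate parameters on the second samples.
final_weights, final_bias = train(*intermediate, second_samples)
```

The point of the sketch is the warm start: the second call does not begin from the initial parameters but from those produced by the first stage.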

Optionally, the obtaining second sample data in the sample set and labeling the second sample data to obtain second sample data carrying a tag includes:

acquiring the second sample data in the sample set;

dividing the second sample data according to each sub-monitoring dimension of the monitoring dimension, and determining target sample data of each sub-monitoring dimension according to a dividing result;

And labeling the target sample data to obtain target sample data carrying a label.

Optionally, the determining the target sample data of each sub-monitoring dimension according to the division result includes:

determining at least two sub-sample data of each sub-monitoring dimension according to the division result;

and selecting the sub-sample data with the highest weight from the at least two sub-sample data as the target sample data.
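The selection step above can be sketched as follows, assuming each sub-sample already carries a weight. How weights are assigned is not specified in this passage, so the tuple layout, the name `select_target_samples` and the sample values are purely illustrative.

```python
from typing import Dict, List, Tuple

# (sub-monitoring dimension, sample value, weight); layout is hypothetical.
SubSample = Tuple[str, float, float]


def select_target_samples(sub_samples: List[SubSample]) -> Dict[str, SubSample]:
    """Group sub-samples by sub-monitoring dimension and keep, for each
    dimension, the sub-sample with the highest weight."""
    targets: Dict[str, SubSample] = {}
    for sample in sub_samples:
        dimension, _, weight = sample
        best = targets.get(dimension)
        if best is None or weight > best[2]:
            targets[dimension] = sample
    return targets


samples = [
    ("cpu", 0.72, 0.4),
    ("cpu", 0.91, 0.9),
    ("disk", 0.85, 0.7),
    ("disk", 0.60, 0.2),
]
targets = select_target_samples(samples)
```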

Optionally, the method further comprises:

determining the data correlation degree between each sub-sample data in the second sample data;

judging whether the data correlation degree is larger than a preset correlation threshold;

if yes, compressing the score prediction model, and taking the compressed score prediction model as the score prediction model corresponding to the monitoring dimension.
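The correlation check above might look like the sketch below. The text does not say how the data correlation degree is computed or how the model is compressed, so this example assumes the Pearson correlation coefficient between two sub-sample series and only returns the decision; `pearson` and `should_compress` are hypothetical names.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def should_compress(xs: Sequence[float], ys: Sequence[float],
                    correlation_threshold: float) -> bool:
    """True when the correlation degree exceeds the preset threshold,
    i.e. when the text calls for compressing the score prediction model."""
    return pearson(xs, ys) > correlation_threshold
```

Highly correlated sub-samples carry overlapping information, which is presumably why the text treats a high correlation degree as the trigger for using a smaller model.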

Optionally, the score threshold is determined by:

drawing an analysis curve of the score prediction model;

and selecting the intersection point of the analysis curves as the score threshold value.

Optionally, the method further comprises:

receiving an update request for the sample set;

updating the sample data contained in the sample set and the corresponding sample scores according to the updating request;

Performing secondary training on the score prediction model according to the updated result to obtain a target score prediction model; the target score prediction model is used for predicting the score of the distributed system of the next time node in the monitoring dimension.

Optionally, the preprocessing the attribute data includes:

and inputting the attribute data into a classification model corresponding to the monitoring dimension for classification processing, obtaining a classification result corresponding to the attribute data, and taking the classification result as a preprocessing result of the attribute data.

Optionally, the monitoring dimension includes at least one of:

flow monitoring dimension, resource monitoring dimension, time monitoring dimension, cluster monitoring dimension;

accordingly, the attribute data includes at least one of:

traffic data, resource data, time data, cluster data.

According to a second aspect of embodiments of the present application, there is provided a first monitoring device of a distributed system, including:

an acquisition attribute data unit configured to acquire attribute data of the distributed system in at least one monitoring dimension;

the attribute data labeling unit is configured to preprocess the attribute data and label the attribute data according to the preprocessing result to obtain attribute data carrying a label;

The score prediction unit is configured to input attribute data carrying labels into a score prediction model corresponding to the monitoring dimension, and predict the score of the distributed system according to the labels to obtain target scores;

and the monitoring result determining unit is configured to compare the target score with a score threshold value of the monitoring dimension and determine a monitoring result of the distributed system in the monitoring dimension according to a comparison result.

According to a third aspect of the embodiments of the present application, there is provided a second monitoring method of a distributed system, including:

receiving attribute data uploaded by a supervisor in at least one monitoring dimension for a distributed system;

and comparing the target score with a score threshold of the monitoring dimension, generating monitoring reminding information of the distributed system in the monitoring dimension according to a comparison result, and sending the monitoring reminding information to the supervisor.

According to a fourth aspect of the embodiments of the present application, there is provided a second monitoring device of a distributed system, including:

a receiving attribute data unit configured to receive attribute data uploaded by a supervisor for the distributed system in at least one monitoring dimension;

the model prediction unit is configured to input attribute data carrying labels into a score prediction model corresponding to the monitoring dimension, and perform score prediction on the distributed system according to the labels to obtain target scores;

and the reminding information sending unit is configured to compare the target score with the score threshold value of the monitoring dimension, generate monitoring reminding information of the distributed system in the monitoring dimension according to a comparison result, and send the monitoring reminding information to the supervisor.

According to a fifth aspect of embodiments of the present application, there is provided a first computing device comprising:

a memory and a processor;

the memory is for storing computer-executable instructions, and the processor is for executing the computer-executable instructions:

According to a sixth aspect of embodiments of the present application, there is provided a second computing device comprising:

a memory and a processor;

According to a seventh aspect of embodiments of the present application, there is provided a computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the steps of either of the two monitoring methods of a distributed system.

According to the monitoring method of the distributed system provided by the present application, after the attribute data of the distributed system in at least one monitoring dimension is obtained, the attribute data is preprocessed and labeled according to the preprocessing result, so that the preprocessing serves as a preliminary evaluation of the distributed system. The attribute data carrying the label is then input into the score prediction model corresponding to the monitoring dimension, which predicts the score of the distributed system in that dimension according to the label and yields the target score. Finally, the target score is compared with the score threshold. The comparison result intuitively reflects the health degree of the distributed system and simplifies the monitoring process, and because the monitoring result is scored by the score prediction model, the accuracy of monitoring the distributed system is further improved.

Drawings

FIG. 1 is a flow chart of a method for monitoring a distributed system according to an embodiment of the present application;

FIG. 2 is a diagram showing a relationship between health and index according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a function curve provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of a training method for a score prediction model according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a monitoring device of a distributed system according to an embodiment of the present application;

FIG. 6 is a flow chart of another method for monitoring a distributed system according to an embodiment of the present application;

FIG. 7 is a schematic diagram of another monitoring method of a distributed system according to an embodiment of the present application;

FIG. 8 is a schematic diagram of a monitoring device of another distributed system according to an embodiment of the present application;

FIG. 9 is a block diagram of a computing device provided in an embodiment of the present application;

FIG. 10 is a block diagram of another computing device provided in an embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. This application is, however, susceptible of embodiment in many other ways than those herein described and similar generalizations can be made by those skilled in the art without departing from the spirit of the application and the application is therefore not limited to the specific embodiments disclosed below.

The terminology used in one or more embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of one or more embodiments of the application. As used in this application in one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present application. The word "if" as used herein may be interpreted as "when", "upon" or "in response to a determination", depending on the context.

First, terms related to one or more embodiments of the present application will be explained.

Cluster risk: an unstable factor in a cloud computing cluster that does not cause a fault immediately but poses a risk over the long term and needs to be addressed.

Cluster health score: a score that quantifies the various risks present in a cluster, so that clusters can be ranked against one another and the effect of risk remediation can be tracked over time.

Logistic regression model: a generalized linear regression analysis model that can be applied to binary or multi-class classification problems and predicts the probability of each class from the input factors.
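As a brief illustration of the last term, a binary logistic regression model maps a weighted sum of the input factors to a probability through the logistic (sigmoid) function. The sketch below shows only that standard formula; the name `logistic_probability` and the example parameters are hypothetical and are not taken from the application.

```python
import math
from typing import Sequence


def logistic_probability(features: Sequence[float],
                         weights: Sequence[float],
                         bias: float) -> float:
    """P(class = 1 | features) = 1 / (1 + exp(-(w . x + b)))."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

A weighted sum of zero gives a probability of exactly 0.5, and larger sums move the probability toward 1.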

In the present application, two monitoring methods of a distributed system are provided, and the present application relates to two monitoring apparatuses of a distributed system, two computing devices, and one computer readable storage medium, which are described in detail in the following embodiments.

Fig. 1 is a flowchart of a monitoring method of a distributed system according to an embodiment of the present application, fig. 2 is a schematic diagram of a relationship between health and index values according to an embodiment of the present application, fig. 3 is a schematic diagram of a function curve according to an embodiment of the present application, and fig. 4 is a schematic diagram of a training method of a score prediction model according to an embodiment of the present application, wherein fig. 1 specifically includes the following steps:

Step S102, attribute data of the distributed system in at least one monitoring dimension is obtained.

In practical applications, as the services provided by a distributed system become richer, the system also becomes more complex, and with that complexity the variety of risks that may cause it to fail increases. The distributed system therefore needs to be monitored in time, so that small faults do not grow into large faults and large faults do not evolve into faults that cannot be repaired. In general, an expert monitors the distributed system and scores its health degree, and the score is used to measure the health of the system. However, the standards of manual scoring are not uniform, so a reasonable score is difficult to give, which in turn affects the reliability of the distributed system.

To improve the accuracy with which the distributed system is measured, the monitoring method provided by the present application obtains attribute data of the distributed system in at least one monitoring dimension, preprocesses the attribute data and labels it according to the preprocessing result, so that the preprocessing serves as a preliminary evaluation of the distributed system. The attribute data carrying the label is then input into the score prediction model corresponding to the monitoring dimension, which predicts the score of the distributed system in that dimension according to the label and yields the target score. Finally, the target score is compared with the score threshold. The comparison result intuitively reflects the health degree of the distributed system and simplifies the monitoring process, and because the score is produced by the score prediction model, the monitoring result is more interpretable, which further improves the accuracy of monitoring the distributed system.

In specific implementations, although enlarging the scale of the distributed system broadens the services it can provide, risk management becomes a notable concern. Score prediction therefore needs to be performed on the distributed system: the system is scored in different dimensions and its health degree is measured through the scores, so that risks or faults are discovered in time, high-risk faults are eliminated promptly, and the operation of the distributed system is ensured.

Based on the above, the monitoring dimension specifically refers to a dimension along which the distributed system is monitored, and may be a flow monitoring dimension, a resource monitoring dimension, a time monitoring dimension or a cluster monitoring dimension; accordingly, the attribute data may be traffic data, resource data, time data or cluster data. The traffic data is used to predict the score of the distributed system in the flow monitoring dimension and to analyze its health degree in that dimension; the resource data, the time data and the cluster data are used in the same way for the resource monitoring dimension, the time monitoring dimension and the cluster monitoring dimension, respectively. Monitoring the distributed system means measuring its health degree, which reflects the degree of risk of the distributed system: the lower the health degree, the higher the risk, and the higher the health degree, the lower the risk.

The flow data specifically refers to data related to the flow of data processed by the distributed system, data related to flow variation coefficients or data related to standard deviation among machines, and the like; the resource data specifically refers to data related to CPU utilization rate of the distributed system, data related to CPU service life or data related to CPU running time; the time data specifically refers to data related to the time of processing data by the distributed system, and data related to the time of creating the data or data related to the time of reading and writing the data by the disk; the cluster data specifically refers to data related to the number of servers of the distributed system, data related to the number of disks or data related to deployment positions, and the like.

It should be noted that the monitoring dimension may also be other dimensions capable of measuring the health degree of the distributed system, such as an operation monitoring dimension (a dimension to which the relevant data for maintaining the distributed system belongs), which is not described in detail herein.

In addition, in the process of monitoring the distributed system, the distributed system may be monitored from multiple dimensions at the same time, for example, the distributed system is monitored through three dimensions of a resource monitoring dimension, a flow monitoring dimension and a time monitoring dimension, the health degree of the distributed system in the three dimensions is measured, then when the predicted values of the distributed system in each dimension are obtained, the health condition of the distributed system may be determined in a manner of analyzing one by one, or the predicted values of the three dimensions may be integrated, and the health condition of the distributed system is analyzed as a whole.
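The two ways of using multi-dimensional scores described above can be sketched as follows. The text does not fix how the scores are integrated, so the unweighted mean used here is an assumption, as are the names `analyze_one_by_one` and `integrated_score` and the example values.

```python
from typing import Dict


def analyze_one_by_one(scores: Dict[str, float],
                       thresholds: Dict[str, float]) -> Dict[str, bool]:
    """Per-dimension view: True when the predicted score of a dimension
    exceeds that dimension's score threshold (no fault)."""
    return {dim: scores[dim] > thresholds[dim] for dim in scores}


def integrated_score(scores: Dict[str, float]) -> float:
    """Whole-system view: unweighted mean of the per-dimension scores.
    The averaging rule is an assumption made for illustration only."""
    return sum(scores.values()) / len(scores)


scores = {"resource": 0.9, "flow": 0.5, "time": 0.7}
thresholds = {"resource": 0.6, "flow": 0.6, "time": 0.6}
per_dimension = analyze_one_by_one(scores, thresholds)
overall = integrated_score(scores)
```

The per-dimension view pinpoints which dimension is unhealthy (here, flow), while the integrated view gives one number that is easier to track but can hide a single weak dimension.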

For the process of monitoring the distributed system from multiple dimensions at the same time, reference may be made to the description in this embodiment of monitoring the distributed system in a single dimension, which is not repeated here. This embodiment describes the monitoring method by taking the resource monitoring dimension as the monitoring dimension; correspondingly, the attribute data is resource utilization data, disk capacity data and the like of the distributed system.

The health degree of the distributed system in each dimension is reflected in the predicted score of that monitoring dimension, so changes in the predicted scores reveal changes in the health degree, which makes it more convenient for the supervisor of the distributed system to maintain the system.

Step S104, preprocessing the attribute data, and marking the attribute data according to the preprocessing result to obtain the attribute data carrying the tag.

Specifically, after the attribute data of the distributed system in at least one monitoring dimension has been obtained, the attribute data is preprocessed and labeled according to the preprocessing result, yielding attribute data carrying a label that is subsequently input into the score prediction model corresponding to the monitoring dimension to predict the score of the distributed system.

Preprocessing the attribute data specifically means performing a preliminary analysis of the health degree of the distributed system. Because a detailed analysis at this stage could be insufficiently accurate, only a coarse, binary analysis is performed: the analysis result of the distributed system based on the attribute data is 0 or 1, where 0 indicates that the distributed system has a fault in the monitoring dimension and 1 indicates that it has no fault in the monitoring dimension.

Based on the above, after determining the preprocessing result of the distributed system, labeling the attribute data according to the preprocessing result to obtain attribute data carrying a label, namely, the label carried by the attribute data is label 1 or label 0, so as to be used for carrying out score prediction on the distributed system by a subsequent input model.

Further, different preprocessing results lead to different labels on the attribute data, which in turn affect the prediction of the subsequent score prediction model. The attribute data therefore needs to be analyzed accurately before it is labeled. In this embodiment, the specific implementation is as follows:

Acquiring an analysis result of analyzing the attribute data;

In practical application, analyzing the attribute data specifically means primarily judging whether the distributed system has a fault, if yes, marking the attribute data to obtain the attribute data carrying the label to be processed, and if not, marking the attribute data to obtain the attribute data carrying the processed label.

For example, suppose the attribute data obtained for the distributed system in the resource monitoring dimension is that the disk utilization rate has reached 85%. Analyzing this value determines whether the distributed system needs maintenance in the resource monitoring dimension, and the attribute data is labeled according to the analysis result. If the analysis result is that maintenance is needed, label 0 (the label to be processed) is added to the attribute data "disk utilization rate reaches 85%"; if the analysis result is that no maintenance is needed, label 1 (the processed label) is added, for subsequent scoring of the distributed system by the score prediction model.
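The labeling in this example can be sketched as a small data structure. Only the meaning of the two labels (0 for the label to be processed, 1 for the processed label) comes from the text; the class `LabeledAttribute`, the function `label_attribute` and the field names are hypothetical.

```python
from dataclasses import dataclass

PENDING_TAG = 0    # label to be processed: preliminary analysis found a need for maintenance
PROCESSED_TAG = 1  # processed label: preliminary analysis found no need for maintenance


@dataclass(frozen=True)
class LabeledAttribute:
    dimension: str  # monitoring dimension, e.g. "resource"
    name: str       # attribute name, e.g. "disk_utilization"
    value: float    # attribute value, e.g. 0.85 for 85%
    tag: int        # PENDING_TAG or PROCESSED_TAG


def label_attribute(dimension: str, name: str, value: float,
                    needs_maintenance: bool) -> LabeledAttribute:
    """Attach the label that corresponds to the preliminary analysis result."""
    tag = PENDING_TAG if needs_maintenance else PROCESSED_TAG
    return LabeledAttribute(dimension, name, value, tag)
```

For the 85% disk utilization case, `label_attribute("resource", "disk_utilization", 0.85, needs_maintenance=True)` carries label 0, and the same call with `needs_maintenance=False` carries label 1.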

In addition, in the process of preprocessing the attribute data, since only preliminary analysis is needed for the distributed system with respect to the attribute data, a process of preprocessing can be implemented by using a classification model, and a preprocessing result of 0 or 1 is directly given to the attribute data by using the classification model, so as to improve the preprocessing effect on the attribute data, and in this embodiment, the specific implementation manner is as follows:

Specifically, different classification models are configured in different monitoring dimensions, preliminary analysis can be performed on the distributed system through attribute data of each dimension, and the classification model outputs only 0 or 1, namely, two results of failure or failure-free of the distributed system in the monitoring dimension are output, so that the attribute data can be more conveniently marked, and the subsequent score prediction of the distributed system is facilitated.

Continuing the example above, when the attribute data obtained for the distributed system in the resource monitoring dimension is that the disk utilization rate has reached 85%, this attribute data is input into the classification model of the resource monitoring dimension for classification, and the output indicates either that the distributed system needs maintenance in the resource monitoring dimension or that it does not, for use in the subsequent labeling.

The attribute data is preprocessed through the classification model, so that the efficiency of health degree measurement of the distributed system can be effectively improved, the preliminary analysis of the attribute data can be completed without human intervention, and the accuracy of health degree measurement of the distributed system is further improved.
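A per-dimension classification step of this kind could be wired up as sketched below. The text does not describe the classification models themselves, so a fixed cutoff stands in for a trained model here; the cutoff values, the registry `CLASSIFIERS` and the function names are all illustrative assumptions.

```python
from typing import Callable, Dict

# A classifier maps a raw attribute value to 0 (fault) or 1 (no fault).
Classifier = Callable[[float], int]


def make_cutoff_classifier(cutoff: float) -> Classifier:
    """Hypothetical stand-in for a trained classification model:
    values at or above the cutoff are classified as a fault (0)."""
    return lambda value: 0 if value >= cutoff else 1


# One classifier per monitoring dimension; cutoffs are made up for illustration.
CLASSIFIERS: Dict[str, Classifier] = {
    "resource": make_cutoff_classifier(0.80),
    "flow": make_cutoff_classifier(0.95),
}


def preprocess(dimension: str, value: float) -> int:
    """Look up the classifier of the monitoring dimension and return 0 or 1."""
    return CLASSIFIERS[dimension](value)
```

Keeping the classifiers in a registry keyed by dimension mirrors the statement that different classification models are configured for different monitoring dimensions.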

Step S106, inputting attribute data carrying labels into a score prediction model corresponding to the monitoring dimension, and predicting the score of the distributed system according to the labels to obtain target scores.

Specifically, once the attribute data carrying the label has been obtained, the score prediction model of the monitoring dimension is used to predict the score of the distributed system in that dimension, yielding the target score. The target score represents the health score of the distributed system in the monitoring dimension: the higher the target score, the healthier the distributed system, that is, the lower the probability of a fault and the less maintenance is needed; the lower the target score, the less healthy the distributed system, that is, the higher the probability of a fault and the more maintenance is needed.

Based on the above, in the process of predicting the score of the distributed system, the attribute data is preprocessed to obtain the preliminary analysis result of the distributed system, so that in the process of predicting the score of the distributed system through the score prediction model, the influence of the preliminary analysis result needs to be considered, thereby realizing more accurate prediction of the target score of the distributed system, namely, the score prediction needs to be performed on the distributed system according to the label carried by the attribute data, and the target score is obtained.

In a specific implementation, predicting the score of the distributed system according to the label means selecting, within the score prediction model, the prediction mode used to calculate the target score. The score prediction model may be a Logistic regression model. It should be noted that the score prediction models of different dimensions are trained with different sample sets, so that each model fits its monitoring dimension and predicts scores more accurately.

Further, to monitor the distributed system from multiple dimensions, the server that monitors the distributed system configures a different score prediction model for each dimension, processes the attribute data of each dimension with the corresponding model, and outputs a target score of the distributed system in each dimension. To avoid feeding a model attribute data that does not match its dimension, the score prediction model of the monitoring dimension is determined first and the subsequent score prediction is performed afterwards. In this embodiment, the specific implementation manner is as follows:

Determining a score prediction model corresponding to the monitoring dimension;

Further, after the attribute data carrying the tag is input into the score prediction model, different modules need to be selected according to the type of the tag to perform score prediction on the distributed system, and in this embodiment, the specific implementation manner is as follows:

when the tag is the processed tag, inputting the attribute data into a first score prediction module in the score prediction model to predict the score of the distributed system, and obtaining a first target score of the distributed system in the monitoring dimension as the target score; or if the label is the label to be processed, inputting the attribute data into a second score prediction module in the score prediction model to predict the score of the distributed system, and obtaining a second target score of the distributed system in the monitoring dimension as the target score.

Specifically, if the tag is a processed tag, it is indicated that there is no fault as a result of the preliminary analysis on the distributed system according to the attribute data, a first score prediction module may be selected in the score prediction model to perform score prediction on the distributed system, and the first target score output by the first score prediction module may be used as the target score of the distributed system in the monitoring dimension.

And under the condition that the label is a label to be processed, the fact that the result of preliminary analysis on the distributed system according to the attribute data is that a fault exists is indicated, a second score prediction module can be selected in the score prediction model to conduct score prediction on the distributed system, and a second target score output by the second score prediction module is taken as the target score of the distributed system in the monitoring dimension.

Based on the above, the first score prediction module processes attribute data whose preliminary judgment result is that no fault exists (processed label) and outputs the first target score of the distributed system in the monitoring dimension, while the second score prediction module processes attribute data whose preliminary judgment result is that a fault exists (label to be processed) and outputs the second target score of the distributed system in the monitoring dimension. The first score prediction module thus incorporates the influence of a "no fault" preprocessing result and the second score prediction module incorporates the influence of a "fault exists" preprocessing result, so that the output of the score prediction model better meets the requirements of the scene.

Continuing the above example, when the attribute data "disk utilization reaches 85%" carries label 1 (processed label), the attribute data is input into the resource score prediction model corresponding to the resource monitoring dimension, and the first score prediction module in that model predicts the score of the distributed system in the resource monitoring dimension. The resulting first target score of the distributed system is 75, which is used later to analyze whether the distributed system is at risk of a fault.

Alternatively, when the attribute data "disk utilization reaches 85%" carries label 0 (label to be processed), the attribute data is input into the resource score prediction model corresponding to the resource monitoring dimension, and the second score prediction module in that model predicts the score of the distributed system in the resource monitoring dimension. The resulting second target score of the distributed system is 78, which is used later to analyze whether the distributed system is at risk of a fault.

By setting different modules for different label types in the score prediction model to process the attribute data, the target score of the distributed system in the monitoring dimension can be predicted more accurately, so that the distributed system can be monitored more conveniently and subsequently, and the monitoring accuracy is further improved.
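The label-based module selection described above can be sketched as follows. This is only an illustrative sketch: the class name `ScorePredictionModel`, the constants, and the placeholder modules that return the example scores 75 and 78 are assumptions introduced here, not part of the described embodiment.

```python
from typing import Callable, Dict, Mapping

# Label values follow the example in the text:
# 1 = processed label (no fault found in preprocessing),
# 0 = label to be processed (fault found in preprocessing).
PROCESSED = 1
TO_BE_PROCESSED = 0

Features = Mapping[str, float]
ScoreModule = Callable[[Features], float]


class ScorePredictionModel:
    """Hypothetical wrapper that routes attribute data to one of two modules."""

    def __init__(self, first_module: ScoreModule, second_module: ScoreModule):
        self._modules: Dict[int, ScoreModule] = {
            PROCESSED: first_module,          # first score prediction module
            TO_BE_PROCESSED: second_module,   # second score prediction module
        }

    def predict(self, features: Features, label: int) -> float:
        """Return the target score produced by the module matching the label."""
        if label not in self._modules:
            raise ValueError(f"unknown label: {label!r}")
        return self._modules[label](features)


# Placeholder modules that simply return the scores used in the example.
resource_model = ScorePredictionModel(
    first_module=lambda features: 75.0,
    second_module=lambda features: 78.0,
)
```

Calling `resource_model.predict({"disk_utilization": 0.85}, PROCESSED)` routes the data to the first module, and passing `TO_BE_PROCESSED` routes it to the second.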

In addition, in the process of constructing the score prediction model, since the health degree of the distributed system has an inverse relationship with the risk of the distributed system, a sensitive interval exists in the process of measuring the health degree, as shown in (a) of fig. 2, wherein the abscissa represents an index value, such as a CPU utilization rate, a storage space utilization rate, and the like, and the ordinate represents the health degree of the distributed system, the smaller the index value, the higher the health degree, and the health degree decreases with the increase of the index value; referring to fig. 2 (b), when the index value exceeds a certain threshold value, the health degree is rapidly reduced, and the risk degree of the distributed system becomes high, and at this time, the interval in which the health degree is rapidly changed can be determined as the sensitive interval.

As described above, the relationship between the health degree and the index value is an inverse S-shaped curve with a sensitive interval, and the shape of this curve is similar to that of the Sigmoid function. The distribution function of the Logistic distribution can be described by the Sigmoid function, which is calculated as $S(z) = \frac{1}{1 + e^{-z}}$, where $z$ is a value of the continuous random variable $Z$; the corresponding relationship is shown in FIG. 3. From this, the distribution function of the Logistic distribution can be derived as $F(z) = P(Z \le z) = \frac{1}{1 + e^{-(z - \mu)/\gamma}}$, where $\mu$ represents a position parameter and $\gamma$ represents a shape parameter.

At this point, the distribution function F(z) of the Logistic distribution is known to be described by the Sigmoid function. Let $-z = wx + b$, where $x$ represents a variable, $w$ represents a weight, and $b$ represents a constant; the result finally learned by the Logistic regression model is then given by formula (1) and formula (2):

$P(Y = 0 \mid x) = \frac{1}{1 + e^{wx + b}}$ (1);

$P(Y = 1 \mid x) = 1 - P(Y = 0 \mid x) = \frac{e^{wx + b}}{1 + e^{wx + b}}$ (2);

wherein P represents the output score of the score prediction model and x represents the characteristic variable; that is, the relationship curve between the health degree and the index value given by P(Y=0|x) is a Sigmoid curve. As described above, the label to be processed (fault exists) is identified by 0 and the processed label (no fault) is identified by 1. When the attribute data input into the Logistic regression model carries label 0 (label to be processed), the characteristic variable corresponding to the attribute data is substituted into formula (1) to obtain the second target score of the distributed system, which indicates the score that the distributed system has no fault. When the attribute data input into the Logistic regression model carries label 1 (processed label), the characteristic variable corresponding to the attribute data is substituted into formula (2) to obtain the first target score of the distributed system, which likewise indicates the score that the distributed system has no fault. The score is a value in the interval [0,1] and can be output by the score prediction model as the health score, namely the target score.
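As a minimal sketch of how formula (1) and formula (2) might be evaluated, the following code computes both probabilities for a single characteristic variable and picks the formula according to the label. The function names and the numeric values of `w` and `b` in the usage note are hypothetical; the text does not give learned parameter values.

```python
import math


def p_y0(x: float, w: float, b: float) -> float:
    """Formula (1): P(Y=0|x) = 1 / (1 + e^(wx+b)), evaluated without overflow."""
    t = w * x + b
    if t >= 0:
        e = math.exp(-t)          # rewrite as e^(-t) / (1 + e^(-t)) for large t
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def p_y1(x: float, w: float, b: float) -> float:
    """Formula (2): P(Y=1|x) = 1 - P(Y=0|x) = e^(wx+b) / (1 + e^(wx+b))."""
    return 1.0 - p_y0(x, w, b)


def target_score(x: float, label: int, w: float, b: float) -> float:
    """Label 0 uses formula (1); label 1 uses formula (2), as in the text."""
    if label == 0:
        return p_y0(x, w, b)
    if label == 1:
        return p_y1(x, w, b)
    raise ValueError(f"unknown label: {label!r}")
```

For instance, with the assumed values `w = 10.0` and `b = -8.0`, `target_score(0.5, 0, 10.0, -8.0)` is larger than `target_score(0.9, 0, 10.0, -8.0)`, reflecting that formula (1) falls as the index value rises.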

In summary, the attribute data is labeled by the label 0 or the label 1, and then a calculation model for measuring the health degree of the distributed system is fitted by a score prediction model (Logistic regression model), so that the distributed system is predicted in the monitoring dimension more accurately, the accuracy of monitoring the distributed system is improved, and the efficiency of measuring the health degree of the distributed system is also effectively improved.

Further, the score prediction model monitors the distributed system based on attribute data carrying different labels, so it includes a first score prediction module and a second score prediction module, and attribute data carrying either label can be processed by the same model. To improve the monitoring efficiency of the distributed system, the first score prediction module and the second score prediction module are deployed in the distributed system. In addition, when predicting the score of the distributed system, the first score prediction module introduces a first relationship weight of the second score prediction module, and the second score prediction module introduces a second relationship weight of the first score prediction module. Each module can thus use the relationship weight to adjust its output during score prediction, which improves the accuracy of the predicted target score.

Based on this, the training process of the score prediction model needs to consider not only the prediction capability of the model but also its stability, so as to improve the accuracy with which the model predicts the score of the distributed system. In this embodiment, the score prediction model is trained as follows:

(1) Acquiring first sample data in the sample set of the monitoring dimension, and labeling the first sample data to obtain first sample data carrying a label.

Specifically, the sample set refers to the set of sample data corresponding to the monitoring dimension, and the sample data it contains is similar to the attribute data. Correspondingly, the first sample data refers to the initial sample data used to train the initial score prediction model. In the initial training stage, the first sample data needs to have a definite critical point (warning, error or fatal), so that the model can clearly learn the sample scores at both ends. Before that, the first sample data needs to be labeled, that is, a processed label or a label to be processed is added to it. Training the score prediction model can be understood as training the first score prediction module and the second score prediction module in the score prediction model.

(2) Inputting the first sample data carrying the label and the first sample score corresponding to the first sample data into the initial score prediction model for preliminary training to obtain an intermediate score prediction model.

Specifically, after the first sample data has been labeled and the first sample data carrying the label has been obtained, the first sample score corresponding to the first sample data is extracted from the sample set, a sample pair is formed from the first sample data carrying the label and the first sample score, and the initial score prediction model is preliminarily trained with the sample pair to obtain the intermediate score prediction model. The initial score prediction model refers to the score prediction model as first constructed, and the intermediate score prediction model refers to the score prediction model obtained after the preliminary training.

Based on the above, the initial score prediction model is primarily trained to enable the model to have the capability of score prediction, but the prediction effect does not reach the requirement, and the intermediate score prediction model is further trained deeply to obtain the score prediction model meeting the prediction requirement.

(3) Acquiring second sample data in the sample set, and labeling the second sample data to obtain second sample data carrying a label.

Specifically, after the intermediate score prediction model is obtained, it needs further training to yield a score prediction model that meets the requirements of the actual scene and predicts better, so second sample data is acquired from the sample set for this purpose. The second sample data refers to sample data of various types, namely the various types of attribute data in the monitoring dimension. For example, when the monitoring dimension is the resource monitoring dimension, the second sample data may be data on different CPU occupancy rates, data on different disk storage space usage rates, and the like.

Based on this, since the training of the intermediate score prediction model is to obtain a score prediction model with a better prediction effect, a large amount of second sample data needs to be extracted to train the intermediate score prediction model, and if the sample data contained in the second sample data is not balanced enough, the trained model prediction effect may also deviate, so that the second sample data needs to be adjusted, so that the trained score prediction model has a better prediction effect.

Acquiring the second sample data in the sample set;

Specifically, each sub-monitoring dimension in the monitoring dimension specifically refers to a dimension corresponding to different types of data, for example, the monitoring dimension is a resource monitoring dimension, then each sub-monitoring dimension in the resource monitoring dimension may be different types of resource data, each type represents a sub-monitoring dimension, if the resource data is a CPU occupancy rate and a disk utilization rate, then the data type of the CPU occupancy rate belongs to one sub-monitoring dimension, and the data type of the disk utilization rate belongs to one sub-monitoring dimension.

Based on this, the second sample data may include a large number of sub-sample data in each sub-monitoring dimension, in order to perform deep training on the intermediate score prediction model with balance maintained, the second sample data is divided according to each sub-monitoring dimension, the target sample data of each sub-monitoring dimension is determined according to the division result, and then the target sample data is labeled to obtain target sample data with labels, so as to implement subsequent deep training on the intermediate score prediction model.

In the process of determining the target sample data of each sub-monitoring dimension according to the division result, if the correlation of the sub-sample data included in the second sample data is higher, the trained model prediction effect may not reach a better effect, and then principal component analysis may be performed on the sample data, in this embodiment, the specific implementation manner is as follows:

Specifically, at least two pieces of sub-sample data in each sub-monitoring dimension are determined according to the division result, and the piece with the highest weight among them is selected as the target sample data for subsequently training the intermediate score prediction model. Reducing the correlation among the sample data in this way improves the prediction effect of the model.

In practical applications, performing principal component analysis on the second sample data may be understood as performing dimension reduction processing on the second sample data; for example, the monitoring dimensions include 50 sub-monitoring dimensions, and the second sample data includes 100 sub-sample data, at this time, it is determined that each sub-monitoring dimension in the 50 sub-monitoring dimensions corresponds to at least two pieces of sub-sample data, in order to ensure the uniformity of the trained score prediction model, the dimension reduction process is performed on the 100 pieces of sample data, so that the 50 sub-monitoring dimensions respectively correspond to one piece of sub-sample data, and in the dimension reduction process, the sample data with the highest weight in each sub-monitoring dimension is selected as the target sample data for the subsequent deep training.

Before the intermediate score prediction model is subjected to deep training, the second sample data is adjusted, so that the influence of sample data with higher relevance on the prediction effect of the score prediction model is avoided, and the prediction accuracy of the score prediction model is further improved.
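The selection of one target sample per sub-monitoring dimension can be illustrated with the following sketch. The `SubSample` type and its `weight` field are assumptions made for illustration; the text does not specify how the weight of a piece of sub-sample data is computed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class SubSample:
    sub_dimension: str   # e.g. "cpu_occupancy" or "disk_utilization"
    value: float         # observed index value
    weight: float        # hypothetical weight; its origin is not specified


def select_target_samples(samples: Iterable[SubSample]) -> List[SubSample]:
    """Divide samples by sub-monitoring dimension and keep the highest-weight one."""
    best: Dict[str, SubSample] = {}
    for sample in samples:
        current = best.get(sample.sub_dimension)
        if current is None or sample.weight > current.weight:
            best[sample.sub_dimension] = sample
    return list(best.values())
```

Applied to two sub-monitoring dimensions with two samples each, the function returns two target samples, mirroring the reduction from 100 pieces of sub-sample data to 50 in the example above.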

(4) Inputting the second sample data carrying the label and the second sample scores corresponding to the second sample data into the intermediate score prediction model for deep training to obtain the score prediction model.

Specifically, after second sample data carrying a label is obtained, second sample scores corresponding to the second sample data are extracted from the sample set, a sample pair is formed based on the second sample data carrying the label and the second sample scores, and deep training is performed on the intermediate score prediction model to obtain a score prediction model meeting the score prediction requirement.
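The two-stage training in steps (1) to (4) can be sketched as a warm-started fit of a one-variable Logistic model. This is a simplified illustration under stated assumptions: it uses full-batch gradient descent on a cross-entropy loss, ignores the labels and the two modules, and all sample values, the learning rate, and the epoch count are invented for the example.

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (index value x, sample score in [0, 1])


def health(x: float, w: float, b: float) -> float:
    """Health score 1 / (1 + e^(wx+b)), computed without overflow."""
    t = w * x + b
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def train(samples: Sequence[Sample], w: float, b: float,
          lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float]:
    """Full-batch gradient descent on cross-entropy, starting from (w, b)."""
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, score in samples:
            err = score - health(x, w, b)  # derivative of the loss w.r.t. wx+b
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Stage 1: samples with clear critical points at both ends (hypothetical).
first_samples = [(0.10, 1.0), (0.20, 1.0), (0.90, 0.0), (0.95, 0.0)]
# Stage 2: more varied samples across the index range (hypothetical).
second_samples = [(0.20, 0.95), (0.40, 0.90), (0.60, 0.70),
                  (0.75, 0.40), (0.85, 0.20), (0.95, 0.05)]

w_mid, b_mid = train(first_samples, 0.0, 0.0)      # intermediate model
w_final, b_final = train(second_samples, w_mid, b_mid)  # deep training
```

Starting the second call from `(w_mid, b_mid)` rather than from zero is what makes the second stage a continuation of the first, which is the point of the preliminary and deep training split.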

Referring to fig. 4, in the training process of the score prediction model, firstly, data capable of measuring the health degree of the distributed system in the monitoring dimension needs to be extracted, then, sample data are marked, whether each sub-sample data is processed or to be processed is marked, and finally, sample data carrying labels and corresponding sample scores thereof are input into the score prediction model for training, so that the score prediction model meeting the actual scene requirements is obtained.

In summary, to improve the interpretability of the score prediction model, the model is trained in two stages, preliminary training and deep training, and relatively balanced second sample data is extracted for the deep training, which further improves the prediction effect of the trained score prediction model.

In addition, if the second sample data has more features corresponding to each sub-sample data, the trained model may have larger volume and more redundancy parameters, and the score prediction model may be compressed at this time, in this embodiment, the specific implementation manner is as follows:

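Determining the data correlation between the sub-sample data in the second sample data;

Judging whether the data correlation is greater than a preset correlation threshold;

if yes, compressing the score prediction model, and taking the compressed score prediction model as the score prediction model corresponding to the monitoring dimension;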

if not, do not do any processing.

Specifically, the reason why the excessive redundancy parameter occurs in the score prediction model is that the score prediction model is trained by using excessive sample data with strong correlation, so that whether the score prediction model needs to be compressed or not can be determined by calculating the data correlation between the second sample data and comparing the data correlation with the preset correlation threshold.

And under the condition that the data correlation is not greater than the preset correlation threshold, the fact that the correlation between all sub-sample data in the second sample data used in the process of training the score prediction model is weak is indicated, and no processing is needed for the score prediction model.

In practical application, the score prediction model can be compressed by means of a regularization term, which reduces the redundant parameters of the model without degrading its prediction effect.
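The correlation check and the compression step can be sketched as follows. Two assumptions are made here that the text does not state: the data correlation is taken to be the largest absolute pairwise Pearson correlation between feature columns, and compression is illustrated with soft-thresholding, the shrinkage step commonly associated with an L1 regular term. The threshold `0.8` and the strength `0.05` are placeholder values.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equally long columns (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def data_correlation(columns: Dict[str, Sequence[float]]) -> float:
    """Largest absolute pairwise correlation (one possible definition)."""
    return max(
        (abs(pearson(columns[p], columns[q])) for p, q in combinations(columns, 2)),
        default=0.0,
    )


def shrink_weights(weights: Sequence[float], strength: float) -> List[float]:
    """Soft-thresholding: small weights become exactly zero, others move toward zero."""
    shrunk = []
    for w in weights:
        magnitude = max(abs(w) - strength, 0.0)
        shrunk.append(0.0 if magnitude == 0.0 else math.copysign(magnitude, w))
    return shrunk


def maybe_compress(weights: Sequence[float], columns: Dict[str, Sequence[float]],
                   threshold: float = 0.8, strength: float = 0.05) -> List[float]:
    """Compress only when the data correlation exceeds the preset threshold."""
    if data_correlation(columns) > threshold:
        return shrink_weights(weights, strength)
    return list(weights)  # "if not, do not do any processing"
```

Weights that shrink to zero correspond to parameters that can be dropped from the model, which is one way to read "reducing redundant parameters".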

On the other hand, since the score prediction model is a measure of the health of the distributed system, in the case where an adjustment occurs in the distributed system, the score prediction model also needs to be adjusted following the adjustment of the distributed system, and in this embodiment, the specific implementation manner is as follows:

Receiving an update request for the sample set;

Specifically, receiving an update request for the sample set indicates that the score prediction model needs to be updated, and the update retrains the model with the adjusted data. Based on the update request, the sample data contained in the sample set and the corresponding sample scores are updated; sample data and corresponding sample scores are then reselected from the sample set according to the update result, and the score prediction model undergoes secondary training to obtain a target score prediction model that meets the requirements. The target score prediction model is used to predict the score of the distributed system in the monitoring dimension at the next time node.

In practical application, in the process of performing secondary training on the score prediction model, because the scoring standard of the distributed system is changed, if the score prediction model before adjustment is still used for predicting the changed distributed system, the prediction accuracy is greatly reduced, and in order to avoid the situation, the score prediction model needs to be trained again, so that the version of the score prediction model can be changed along with the change of the distributed system, and the accuracy of measuring the health degree of the distributed system is improved.

Step S108: comparing the target score with the score threshold of the monitoring dimension, and determining the monitoring result of the distributed system in the monitoring dimension according to the comparison result.

Specifically, once the target score of the distributed system in the monitoring dimension has been obtained, the target score is compared with the score threshold of the monitoring dimension, and whether the distributed system has a fault in the monitoring dimension is determined according to the comparison result.

The score threshold of the monitoring dimension may be determined according to an analysis curve of a score prediction model, and in this embodiment, the specific implementation manner is as follows:

drawing an analysis curve of the score prediction model;

Specifically, after the distributed system is scored, whether the distributed system has a fault or not needs to be analyzed according to the size of a target score, and if a threshold value for distinguishing whether the fault exists is set manually, a problem of inaccurate setting exists, in order to avoid the problem, a variable corresponding to an intersection point in the ROC curve can be selected as the score threshold value by drawing an analysis curve (ROC curve) of the score prediction model, so that more accurate comparison with the target score is realized, and the health condition of the distributed system is analyzed.
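One way to derive a score threshold from an ROC curve is sketched below. The text does not define the "intersection point" precisely, so this sketch substitutes a common rule, choosing the candidate threshold that maximizes the true positive rate minus the false positive rate (Youden's J). A sample is treated as predicted faulty when its score is not greater than the threshold, matching the comparison rule of this embodiment. The scores and fault flags in the test data are invented.

```python
from typing import List, Sequence, Tuple

RocPoint = Tuple[float, float, float]  # (threshold, false positive rate, true positive rate)


def roc_points(scores: Sequence[float], has_fault: Sequence[bool]) -> List[RocPoint]:
    """Compute one ROC point per candidate threshold taken from the scores."""
    positives = sum(1 for f in has_fault if f)
    negatives = len(has_fault) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both faulty and healthy samples")
    points = []
    for threshold in sorted(set(scores)):
        # A sample is predicted faulty when its score is <= threshold.
        tp = sum(1 for s, f in zip(scores, has_fault) if f and s <= threshold)
        fp = sum(1 for s, f in zip(scores, has_fault) if not f and s <= threshold)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def pick_threshold(points: Sequence[RocPoint]) -> float:
    """Pick the threshold with the largest TPR - FPR (an assumed selection rule)."""
    return max(points, key=lambda p: p[2] - p[1])[0]
```

Other selection rules, such as the point closest to the top-left corner of the ROC plot, would fit the same structure by changing only the `key` function.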

Further, the health degree of the distributed system is determined from the comparison of the target score with the score threshold. After the score threshold of the monitoring dimension is read, it is judged whether the target score is greater than the score threshold. If yes, it is determined that the distributed system has no fault in the monitoring dimension, and the monitoring result is that no fault exists; if not, it is determined that the distributed system has a fault in the monitoring dimension, and the monitoring result is that a fault exists. It should be noted that both monitoring results are predictions, intended to prompt the supervisor to perform maintenance in time and thereby avoid a larger impact.

Furthermore, in the case that the monitoring result of the distributed system is that there is a fault, in order to avoid greater influence caused by the fault predicted by the distributed system, a reminding message is sent to a supervisor of the distributed system, so as to achieve the purpose of timely maintenance, and in this embodiment, the specific implementation manner is as follows:

In practical application, since the distributed system is monitored in a specific monitoring dimension, the notification information needs to be generated in combination with the attribute data, so that the supervisor is informed of where the fault of the distributed system lies and can maintain the distributed system in time.

Following the above example, after the score prediction model scores the distributed system in the resource monitoring dimension, the score of the distributed system in that dimension is 78 and the score threshold is 80. The analysis therefore indicates a hidden danger in the resource monitoring dimension that needs to be investigated. Notification information is generated based on the attribute data "disk utilization reaches 85%" and sent to the supervisor of the distributed system, informing the supervisor in time of the hidden danger of excessive disk utilization so that it can be investigated promptly and an impact on upstream and downstream services is avoided.
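The comparison and notification steps of this example can be sketched as follows. The `MonitoringResult` type, the `evaluate` function, and the wording of the notification message are assumptions made for illustration; only the rule "greater than the threshold means no fault" and the values 78, 80 and 85% come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MonitoringResult:
    dimension: str
    target_score: float
    score_threshold: float
    has_fault: bool
    notification: Optional[str]  # None when no fault is predicted


def evaluate(dimension: str, target_score: float, score_threshold: float,
             attribute_summary: str) -> MonitoringResult:
    """Compare the target score with the threshold and build a notification if needed."""
    has_fault = not (target_score > score_threshold)
    notification = None
    if has_fault:
        # The notification is generated in combination with the attribute data.
        notification = (
            f"[{dimension}] predicted fault: score {target_score:g} "
            f"does not exceed threshold {score_threshold:g}; {attribute_summary}"
        )
    return MonitoringResult(dimension, target_score, score_threshold,
                            has_fault, notification)


result = evaluate("resource", 78, 80, "disk utilization reaches 85%")
```

Here `result.has_fault` is true because 78 is not greater than 80, so a notification mentioning the disk utilization is produced for the supervisor.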

Corresponding to the method embodiment, the present application further provides a monitoring device embodiment of the distributed system, and fig. 5 shows a schematic structural diagram of a monitoring device of the distributed system according to an embodiment of the present application. As shown in fig. 5, the apparatus includes:

An acquire attribute data unit 502 configured to acquire attribute data of the distributed system in at least one monitoring dimension;

the attribute data labeling unit 504 is configured to preprocess the attribute data and label the attribute data according to the preprocessing result to obtain attribute data carrying a tag;

the score prediction unit 506 is configured to input attribute data carrying a tag into a score prediction model corresponding to the monitoring dimension, and predict the score of the distributed system according to the tag to obtain a target score;

and the monitoring result determining unit 508 is configured to compare the target score with a score threshold value of the monitoring dimension, and determine a monitoring result of the distributed system in the monitoring dimension according to a comparison result.
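The four units of the device can be pictured as stages of a single pipeline, as in the sketch below. The class `MonitoringDevice`, its constructor parameters, and the placeholder callables in the usage are hypothetical; in particular, the labeling rule based on a fixed disk utilization cut-off is invented and stands in for the classification model.

```python
from typing import Callable, Mapping, Tuple

Attributes = Mapping[str, float]


class MonitoringDevice:
    """Hypothetical composition of units 502, 504, 506 and 508."""

    def __init__(self,
                 acquire: Callable[[str], Attributes],
                 label: Callable[[Attributes], int],
                 predict: Callable[[Attributes, int], float],
                 score_threshold: float):
        self.acquire = acquire                  # attribute data acquisition unit 502
        self.label = label                      # attribute data labeling unit 504
        self.predict = predict                  # score prediction unit 506
        self.score_threshold = score_threshold  # used by result determining unit 508

    def run(self, dimension: str) -> Tuple[float, bool]:
        """Return (target score, has_fault) for one monitoring dimension."""
        attributes = self.acquire(dimension)
        tag = self.label(attributes)
        target_score = self.predict(attributes, tag)
        has_fault = not (target_score > self.score_threshold)
        return target_score, has_fault


device = MonitoringDevice(
    acquire=lambda dimension: {"disk_utilization": 0.85},
    label=lambda attrs: 0 if attrs["disk_utilization"] >= 0.85 else 1,  # invented rule
    predict=lambda attrs, tag: 78.0 if tag == 0 else 75.0,              # placeholder scores
    score_threshold=80.0,
)
```

Passing the units in as callables keeps each one replaceable, for example with a different score prediction model per monitoring dimension.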

In an alternative embodiment, the attribute data labeling unit 504 includes:

an analysis result acquisition subunit configured to acquire an analysis result of analyzing the attribute data;

and the attribute data labeling subunit is configured to label the attribute data according to the analysis result to obtain the attribute data carrying the processed label or the label to be processed.

In an alternative embodiment, the score prediction unit 506 includes:

a determining model subunit configured to determine a score prediction model corresponding to the monitored dimension;

and the data input subunit is configured to input attribute data carrying the processed label or the label to be processed into the score prediction model.

In an alternative embodiment, the score prediction unit 506 is further configured to:

or,

In an alternative embodiment, the monitoring result determining unit 508 includes:

A score judgment subunit configured to read the score threshold of the monitoring dimension and judge whether the target score is greater than the score threshold;

if yes, a first determining subunit is operated, and the first determining subunit is configured to determine that the distributed system has no fault in the monitoring dimension;

and if not, operating a second determining subunit, wherein the second determining subunit is configured to determine that the distributed system has a fault in the monitoring dimension.

In an alternative embodiment, the monitoring device of the distributed system further includes:

a generation notification information unit configured to generate notification information of the distributed system in the monitoring dimension based on attribute data;

and a sending notification information unit configured to send the notification information to a supervisor of the distributed system.

In an optional embodiment, the score prediction model includes a first score prediction module and a second score prediction module, where the first score prediction module introduces a first relationship weight of the second score prediction module in a process of making a score prediction, and the second score prediction module introduces a second relationship weight of the first score prediction module in a process of making a score prediction;

Correspondingly, the score prediction model is trained by the following units:

the first sample data obtaining unit is configured to obtain first sample data in the sample set of the monitoring dimension, mark the first sample data and obtain first sample data carrying a label;

the initial training unit is configured to input the first sample data carrying the label and the first sample scores corresponding to the first sample data into the initial score prediction model for preliminary training to obtain an intermediate score prediction model;

the second sample data obtaining unit is configured to obtain second sample data in the sample set, mark the second sample data and obtain second sample data carrying a label;

and the deep training unit is configured to input the second sample data carrying the label and the second sample scores corresponding to the second sample data into the intermediate score prediction model for deep training to obtain the score prediction model.

In an alternative embodiment, the second sample data obtaining unit includes:

an acquire sample data subunit configured to acquire the second sample data in the sample set;

A sample data dividing subunit configured to divide the second sample data according to each sub-monitoring dimension of the monitoring dimension, and determine target sample data of each sub-monitoring dimension according to a division result;

and the sample data labeling subunit is configured to label the target sample data to obtain target sample data carrying a label.

In an alternative embodiment, the sample data dividing subunit includes:

a determining data sub-module configured to determine at least two sub-sample data of the respective sub-monitoring dimensions according to the division result;

and the selecting data sub-module is configured to select the sub-sample data with the highest weight in the at least two sub-sample data as the target sample data.
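As an illustration of the division and selection just described, the following sketch groups sub-sample data by sub-monitoring dimension and keeps the entry with the highest weight in each group. The `SubSample` structure, its field names and the way the weight is attached to a sub-sample are assumptions made for the example; the text does not specify them.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class SubSample(NamedTuple):
    sub_dimension: str    # hypothetical name of the sub-monitoring dimension
    weight: float         # hypothetical weight attached to this sub-sample
    values: List[float]   # the sub-sample data itself


def select_target_samples(samples: List[SubSample]) -> Dict[str, SubSample]:
    """Divide by sub-monitoring dimension, then keep the highest-weight entry."""
    groups: Dict[str, List[SubSample]] = defaultdict(list)
    for sample in samples:
        groups[sample.sub_dimension].append(sample)
    return {
        dimension: max(group, key=lambda s: s.weight)
        for dimension, group in groups.items()
    }
```

If two sub-samples of the same sub-monitoring dimension share the highest weight, `max` keeps the first one encountered; a real implementation would need an explicit tie-breaking rule.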

a data correlation determination unit configured to determine a data correlation between respective sub-sample data in the second sample data;

a judging correlation unit configured to judge whether the data correlation is greater than a preset correlation threshold;

if yes, a compression model unit is operated, the compression model unit is configured to compress the score prediction model, and the compressed score prediction model is used as the score prediction model corresponding to the monitoring dimension.
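The correlation check can be sketched as follows. The sketch assumes that the data correlation is measured as the mean absolute Pearson coefficient over all pairs of sub-sample series; the text does not fix the measure, so this choice, the function names and the threshold value in the usage note are hypothetical. The compression of the model itself is not shown.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (dev_x * dev_y)


def data_correlation(sub_samples: Dict[str, Sequence[float]]) -> float:
    """Assumed measure: mean absolute pairwise Pearson coefficient."""
    pairs = list(combinations(sub_samples.values(), 2))
    if not pairs:
        return 0.0
    return sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs)


def should_compress(sub_samples: Dict[str, Sequence[float]], threshold: float) -> bool:
    """True when the data correlation exceeds the preset correlation threshold."""
    return data_correlation(sub_samples) > threshold
```

For example, `should_compress(series, threshold=0.8)` would trigger compression only when the sub-sample series are strongly related on average, which is when a smaller model is least likely to lose information.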

In an alternative embodiment, the score threshold is determined by:

drawing an analysis curve of the score prediction model;

a reception request unit configured to receive an update request for the sample set;

an update data unit configured to update sample data and corresponding sample scores thereof included in the sample set according to the update request;

the secondary training unit is configured to perform secondary training on the score prediction model according to the updating result to obtain a target score prediction model; the target score prediction model is used for predicting the score of the distributed system of the next time node in the monitoring dimension.

In an alternative embodiment, the attribute data labeling unit 504 is further configured to:

In an alternative embodiment, the monitoring dimension includes at least one of:

accordingly, the attribute data includes at least one of:

traffic data, resource data, time data, cluster data.

According to the monitoring device of the distributed system, after the attribute data of the distributed system in at least one monitoring dimension is acquired, the attribute data is preprocessed and labeled according to the preprocessing result, so that a preliminary evaluation of the distributed system is achieved through preprocessing. The attribute data carrying the tag is then input into the score prediction model corresponding to the monitoring dimension, score prediction is performed on the distributed system in the monitoring dimension according to the tag to obtain the target score of the distributed system, and the target score is compared with the score threshold. The comparison result intuitively reflects the health degree of the distributed system and simplifies the monitoring process. Because the score is obtained through the score prediction model, the monitoring result of the distributed system has better interpretability, which further improves the accuracy of monitoring the distributed system.

The above is a schematic scheme of a monitoring device of a distributed system of the present embodiment. It should be noted that, the technical solution of the monitoring device of the distributed system and the technical solution of the monitoring method of the distributed system belong to the same concept, and details of the technical solution of the monitoring device of the distributed system, which are not described in detail, can be referred to the description of the technical solution of the monitoring method of the distributed system.

The following is an embodiment of another monitoring method for a distributed system provided in the present application:

Fig. 6 is a flowchart of another monitoring method of a distributed system according to an embodiment of the present application, and Fig. 7 is a schematic diagram of that monitoring method. The method shown in Fig. 6 specifically includes the following steps:

Step S602, receiving attribute data uploaded by a supervisor for a distributed system in at least one monitoring dimension.

According to the monitoring method of the distributed system, in order to improve the accuracy with which the distributed system is measured, after the attribute data of the distributed system in at least one monitoring dimension is received, the attribute data is preprocessed and labeled according to the preprocessing result, so that a preliminary evaluation of the distributed system is achieved through preprocessing. The attribute data carrying the tag is then input into the score prediction model corresponding to the monitoring dimension, score prediction is performed on the distributed system in the monitoring dimension according to the tag to obtain the target score of the distributed system, the target score is compared with the score threshold, and finally monitoring reminding information is sent to the supervisor. The comparison result intuitively reflects the health degree of the distributed system and simplifies the monitoring process. Because the monitoring result is scored by the score prediction model, the accuracy of monitoring the distributed system is further improved, and reminding the supervisor allows the distributed system to be maintained in time, which reduces the impact on upstream and downstream services.

In specific implementation, the supervisor specifically refers to a party that maintains the distributed system, such as a service platform or a maintenance department of the service platform. The monitoring reminding information specifically refers to reminding information sent to the supervisor after the prediction for the distributed system is made, for example reminding information about a hidden danger of the distributed system.

In practical application, the technology for monitoring a distributed system is complex, and constructing a separate monitoring node for the distributed system consumes considerable resources. To reduce the resource consumption of the party that owns the distributed system, the attribute data of the distributed system can be uploaded to a server; the server measures the health degree of the distributed system according to the attribute data and finally returns monitoring reminding information to the supervisor of the distributed system. In this way, the distributed system can be monitored accurately while resource consumption is reduced, which further improves the experience of the supervisor.

Step S604, preprocessing the attribute data, and labeling the attribute data according to the preprocessing result to obtain the attribute data carrying the tag.

Step S606, inputting the attribute data carrying the tag into a score prediction model corresponding to the monitoring dimension, and predicting the score of the distributed system according to the tag to obtain a target score.

Step S608, comparing the target score with a score threshold of the monitoring dimension, generating monitoring reminding information of the distributed system in the monitoring dimension according to a comparison result, and sending the monitoring reminding information to the supervisor.
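The flow from receiving the attribute data to sending the reminder can be sketched as a small pipeline. The tag names mirror the "label to be processed" and "processed label" described in this application; everything else is an assumption made for illustration: the preliminary fault check `looks_faulty`, the `model` callable, the data classes, and in particular the rule that a target score below the threshold indicates a hidden danger.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Tag names follow the "label to be processed" / "processed label" wording.
TO_BE_PROCESSED = "to_be_processed"
PROCESSED = "processed"


@dataclass
class LabeledData:
    dimension: str
    values: Dict[str, float]
    tag: str


@dataclass
class Reminder:
    dimension: str
    target_score: float
    hidden_danger: bool
    message: str


def preprocess_and_label(dimension: str, values: Dict[str, float],
                         looks_faulty: Callable[[Dict[str, float]], bool]) -> LabeledData:
    """Preliminary fault judgment, then tagging of the attribute data."""
    tag = TO_BE_PROCESSED if looks_faulty(values) else PROCESSED
    return LabeledData(dimension, dict(values), tag)


def monitor(dimension: str, values: Dict[str, float],
            looks_faulty: Callable[[Dict[str, float]], bool],
            model: Callable[[LabeledData], float],
            threshold: float) -> Reminder:
    """Label the received data, predict a score, compare, build the reminder."""
    labeled = preprocess_and_label(dimension, values, looks_faulty)
    target_score = model(labeled)            # hypothetical score prediction model
    hidden_danger = target_score < threshold  # assumed comparison direction
    status = "hidden danger detected" if hidden_danger else "no hidden danger detected"
    message = (f"{dimension}: target score {target_score:.2f}, "
               f"threshold {threshold:.2f}, {status}")
    return Reminder(dimension, target_score, hidden_danger, message)
```

A caller would supply one `model` and one `threshold` per monitoring dimension and forward `Reminder.message` to the supervisor through whatever notification channel is available.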

The monitoring method of the distributed system according to the present embodiment is similar to the monitoring method of the distributed system according to the foregoing embodiment, and specific descriptions of the monitoring method of the distributed system may refer to descriptions corresponding to the foregoing embodiment, which is not repeated herein.

In addition, when the attribute data uploaded by the supervisor corresponds to a plurality of dimensions, the total predicted score is determined by integrating the scores of the dimensions, and subsequent health analysis is performed on that basis. Referring to Fig. 7, the description takes the traffic monitoring dimension and the resource monitoring dimension as examples. After the traffic data and the resource data are obtained, the traffic data is input into the traffic score prediction model corresponding to the traffic monitoring dimension, and the resource data is input into the resource score prediction model corresponding to the resource monitoring dimension. The traffic data and the resource data are preprocessed, and a processed label or a label to be processed is added according to the preprocessing result. The traffic score prediction model then yields the traffic target score of the distributed system in the traffic monitoring dimension, and the resource score prediction model yields the resource target score of the distributed system in the resource monitoring dimension. Finally, the target scores of the dimensions are integrated, and the total target score of the distributed system is determined by taking the minimum value, so as to reflect the health degree of the distributed system.

Further, the average value of the score threshold values of all the dimensions is calculated, and the total target score value is compared with the average value, so that monitoring reminding information is sent to the supervision party according to the comparison result, the purpose of reminding the supervision party in time is achieved, and the influence on upstream and downstream services caused by faults of the distributed system is avoided.
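The multi-dimension aggregation described above reduces to two small computations: the total target score is the minimum of the per-dimension target scores, and it is compared with the average of the per-dimension score thresholds. The sketch below shows both; the function name and the rule that a total score below the average threshold triggers a reminder are assumptions.

```python
from statistics import mean
from typing import Dict, Tuple


def overall_result(target_scores: Dict[str, float],
                   thresholds: Dict[str, float]) -> Tuple[float, float, bool]:
    """Return (total target score, average threshold, reminder needed)."""
    # The weakest dimension determines the health of the whole system.
    total_target_score = min(target_scores.values())
    # Average the thresholds of exactly the dimensions that were scored.
    average_threshold = mean(thresholds[d] for d in target_scores)
    # Assumed direction: a total score below the average threshold needs a reminder.
    return total_target_score, average_threshold, total_target_score < average_threshold
```

Taking the minimum rather than the mean of the scores means that a single unhealthy dimension cannot be masked by healthy ones.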

According to the monitoring method of the distributed system, after the attribute data of the distributed system in at least one monitoring dimension is received, the attribute data is preprocessed and labeled according to the preprocessing result, so that a preliminary evaluation of the distributed system is achieved through preprocessing. The attribute data carrying the tag is then input into the score prediction model corresponding to the monitoring dimension, score prediction is performed on the distributed system in the monitoring dimension according to the tag to obtain the target score of the distributed system, the target score is compared with the score threshold, and finally monitoring reminding information is sent to the supervisor. The comparison result intuitively reflects the health degree of the distributed system and simplifies the monitoring process. Because the monitoring result is scored by the score prediction model, the accuracy of monitoring the distributed system is further improved, and informing the supervisor allows the distributed system to be maintained in time, which reduces the impact on upstream and downstream services.

Corresponding to the method embodiment, the present application further provides another embodiment of a monitoring device of the distributed system, and fig. 8 shows a schematic structural diagram of another monitoring device of the distributed system according to an embodiment of the present application. As shown in fig. 8, the apparatus includes:

a receive attribute data unit 802 configured to receive attribute data uploaded by a supervisor for the distributed system in at least one monitoring dimension;

an attribute data labeling unit 804 configured to preprocess the attribute data, and label the attribute data according to the preprocessing result, so as to obtain attribute data carrying a tag;

the model prediction unit 806 is configured to input attribute data carrying a tag into a score prediction model corresponding to the monitoring dimension, and predict the score of the distributed system according to the tag to obtain a target score;

and a reminding information sending unit 808, configured to compare the target score with the score threshold of the monitoring dimension, generate monitoring reminding information of the distributed system in the monitoring dimension according to the comparison result, and send the monitoring reminding information to the supervisor.

According to the monitoring device of the distributed system, after the attribute data of the distributed system in at least one monitoring dimension is received, the attribute data is preprocessed and labeled according to the preprocessing result, so that a preliminary evaluation of the distributed system is achieved through preprocessing. The attribute data carrying the tag is then input into the score prediction model corresponding to the monitoring dimension, score prediction is performed on the distributed system in the monitoring dimension according to the tag to obtain the target score of the distributed system, the target score is compared with the score threshold, and finally monitoring reminding information is sent to the supervisor. The comparison result intuitively reflects the health degree of the distributed system and simplifies the monitoring process. Because the monitoring result is scored by the score prediction model, the accuracy of monitoring the distributed system is further improved, and prompting the supervisor allows the distributed system to be maintained in time, which reduces the impact on upstream and downstream services.

The above is a schematic solution of another monitoring device of the distributed system of the present embodiment. It should be noted that the technical solution of this monitoring device and the technical solution of the other monitoring method of the distributed system described above belong to the same concept; for details of the technical solution of the monitoring device that are not described in detail, reference may be made to the description of the technical solution of that monitoring method.

Fig. 9 illustrates a block diagram of a computing device 900 provided in accordance with an embodiment of the present application. The components of computing device 900 include, but are not limited to, memory 910 and processor 920. Processor 920 is coupled to memory 910 via bus 930, and database 950 is used to store data.

Computing device 900 also includes an access device 940, which enables computing device 900 to communicate via one or more networks 960. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 940 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present application, the above-described components of computing device 900 and other components not shown in FIG. 9 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 9 is for exemplary purposes only and is not intended to limit the scope of the present application. Those skilled in the art may add or replace other components as desired.

Computing device 900 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 900 may also be a mobile or stationary server.

Wherein the processor 920 is configured to execute the following computer-executable instructions:

The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the monitoring method of the distributed system belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the monitoring method of the distributed system.

Fig. 10 illustrates a block diagram of another computing device 1000 provided in accordance with an embodiment of the present application. The components of the computing device 1000 include, but are not limited to, a memory 1010 and a processor 1020. Processor 1020 is coupled to memory 1010 via bus 1030 and database 1050 is used to store data.

Computing device 1000 also includes an access device 1040, which enables computing device 1000 to communicate via one or more networks 1060. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 1040 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present application, the above-described components of computing device 1000, as well as other components not shown in FIG. 10, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 10 is for exemplary purposes only and is not intended to limit the scope of the present application. Those skilled in the art may add or replace other components as desired.

Computing device 1000 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 1000 may also be a mobile or stationary server.

Wherein the processor 1020 is configured to execute the following computer-executable instructions:

The foregoing is a schematic illustration of another computing device of this embodiment. It should be noted that the technical solution of this computing device and the technical solution of the other monitoring method of the distributed system belong to the same concept; for details of the technical solution of the computing device that are not described in detail, reference may be made to the description of the technical solution of that monitoring method.

An embodiment of the present application further provides a computer readable storage medium storing computer instructions which, when executed by a processor, are used to implement the two monitoring methods of the distributed system. It should be noted that the technical solution of the storage medium and the technical solutions of the two monitoring methods of the distributed system belong to the same concept; for details of the technical solution of the storage medium that are not described in detail, reference may be made to the descriptions of the technical solutions of the two monitoring methods of the distributed system.

The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.

It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all necessary for the present application.

In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.

The above-disclosed preferred embodiments of the present application are provided only as an aid to the elucidation of the present application. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the teaching of this application. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. This application is to be limited only by the claims and the full scope and equivalents thereof.

Claims

1. A method of monitoring a distributed system, comprising:

comparing the target score with a score threshold of the monitoring dimension, and determining a monitoring result of the distributed system in the monitoring dimension according to a comparison result;

the preprocessing the attribute data, labeling the attribute data according to the preprocessing result, and obtaining the attribute data carrying the tag comprises the following steps:

acquiring an analysis result of analyzing the attribute data; labeling the attribute data according to the analysis result to obtain attribute data carrying a processed label or a label to be processed; analyzing the attribute data refers to primarily judging whether the distributed system has faults or not, if so, marking the attribute data to obtain attribute data carrying a label to be processed, and if not, marking the attribute data to obtain attribute data carrying a processed label;

The score prediction is performed on the distributed system according to the label to obtain a target score, which comprises the following steps:

2. The method for monitoring a distributed system according to claim 1, wherein the step of inputting the attribute data carrying the tag into the score prediction model corresponding to the monitoring dimension includes:

determining a score prediction model corresponding to the monitoring dimension;

3. The method for monitoring the distributed system according to claim 1, wherein comparing the target score with the score threshold of the monitoring dimension, and determining the monitoring result of the distributed system in the monitoring dimension according to the comparison result comprises:

4. The method for monitoring a distributed system according to claim 3, wherein after the step of determining that the distributed system has a fault in the monitoring dimension is performed, the method further comprises:

5. The method for monitoring a distributed system according to claim 1, wherein the score prediction model comprises a first score prediction module and a second score prediction module, the first score prediction module introduces a first relationship weight of the second score prediction module in the process of making a score prediction, and the second score prediction module introduces a second relationship weight of the first score prediction module in the process of making a score prediction;

correspondingly, the score prediction model is trained by the following method:

6. The method for monitoring a distributed system according to claim 5, wherein the obtaining the second sample data in the sample set and labeling the second sample data to obtain the second sample data with the tag includes:

acquiring the second sample data in the sample set;

7. The method for monitoring a distributed system according to claim 6, wherein determining the target sample data of each sub-monitoring dimension according to the division result comprises:

8. The method of monitoring a distributed system according to claim 5, further comprising:

9. The method of monitoring a distributed system according to claim 1, wherein the score threshold is determined by:

drawing an analysis curve of the score prediction model;

10. The method of monitoring a distributed system according to claim 5, further comprising:

Receiving an update request for the sample set;

11. The method for monitoring a distributed system according to claim 1, wherein the preprocessing the attribute data includes:

12. The method of monitoring a distributed system according to claim 1, the monitoring dimension comprising at least one of:

accordingly, the attribute data includes at least one of:

traffic data, resource data, time data, cluster data.

13. A monitoring device of a distributed system, comprising:

the monitoring result determining unit is configured to compare the target score with a score threshold value of the monitoring dimension, and determine a monitoring result of the distributed system in the monitoring dimension according to a comparison result;

wherein the attribute data annotation unit is further configured to:

wherein the score prediction unit is further configured to:

14. A method of monitoring a distributed system, comprising:

Comparing the target score with a score threshold of the monitoring dimension, generating monitoring reminding information of the distributed system in the monitoring dimension according to a comparison result, and sending the monitoring reminding information to the supervision party;

15. A monitoring device of a distributed system, comprising:

the reminding information sending unit is configured to compare the target score with a score threshold value of the monitoring dimension, generate monitoring reminding information of the distributed system in the monitoring dimension according to a comparison result, and send the monitoring reminding information to the supervision party;

wherein the attribute data annotation unit is further configured to:

wherein the model prediction unit is further configured to:

16. A computing device, comprising:

a memory and a processor;

17. A computing device, comprising:

a memory and a processor;

18. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of monitoring a distributed system as claimed in any one of claims 1 to 12 or 14.