CN104063747A - Performance abnormality prediction method in distributed system and system - Google Patents

Publication number
CN104063747A
Authority
CN
China
Prior art keywords
data pattern
pattern
performance
distributed
feature value
Application number
CN201410294472.2A
Other languages
Chinese (zh)
Inventor
曹健
杨定裕
仇沂
顾骅
沈琪骏
王烺
Original Assignee
上海交通大学
Application filed by 上海交通大学
Priority to CN201410294472.2A
Publication of CN104063747A

Abstract

The invention relates to a method and a system for predicting performance anomalies in a distributed system. Historical and real-time performance data are collected through the monitoring system of the distributed environment. Feature values are extracted to describe the data and to construct patterns of a performance variable, and a classification model is trained with naive Bayes classification. The current data pattern is compared with the historical data patterns to find the most similar historical pattern, and the naive Bayes prediction model then predicts whether the current data pattern is in an abnormal state. Because the method considers the features of the variable comprehensively, its accuracy is high. A Bayesian machine learning model guides the prediction, performance anomalies are detected in real time, and each prediction is evaluated against the previously trained Bayesian model, which raises the confidence of the prediction. The degree of automation is high, and the reliability and practicality of the prediction are improved.

Description

Method and system for predicting performance anomalies in a distributed system

Technical field

The present invention relates to a method and system for detecting and predicting performance anomalies, and in particular to a method and system for predicting performance anomalies in a distributed system.

Background art

In a distributed system, the computers are independent of one another. They may be physically adjacent or geographically dispersed, and they are connected by a network or other means to form a whole. From a research perspective, distributed computing has the following characteristics: 1. resource sharing; 2. scalability; 3. fault tolerance; 4. concurrency.

To make full use of the data processing power of distributed computing, monitoring the distributed computing environment becomes particularly important. The system must coordinate the running tasks and allocate resources sensibly so that resources are fully utilized and the performance of the whole system improves. Normally, the system uses a scheduler to manage these tasks. The scheduler collects information about the various resources in the system to determine whether each resource is available, and the scheduling algorithm then determines task priorities and assigns available resources according to resource availability, task running time, and so on. As tasks run, however, the state of resources such as CPU load, free memory, and remaining disk space can change at any time. If, before scheduling, the system could predict whether a resource will still be usable over some future period and avoid using it during abnormal periods, the scheduling result would be better. It is therefore of great value to monitor the resources in the system in real time and to detect the precursors of an anomaly before it occurs.

A system performance anomaly is the phenomenon in which, while software is running, resource exhaustion or the gradual accumulation of run-time errors causes computer system performance to decline until it reaches an intolerable level. It typically means that the system state (for example, CPU load or memory usage) can no longer sustain the running applications. Most anomaly prediction models are based on regression techniques, which have specific limitations, so each such model has its own defects: it applies only to particular data, or its prediction error is large. Existing classification-based anomaly prediction models still require historical data to be labeled manually, so their degree of automation is low, and they observe only the value of a variable without considering its features comprehensively, so their predictions carry some error.

Summary of the invention

The object of the present invention is to provide a method and system for predicting performance anomalies in a distributed system, solving the problems that performance prediction in distributed environments has a low degree of automation and that observation is made only from the variable's value without comprehensively considering the variable's features.

To solve the above problems, the present invention provides a method for predicting performance anomalies in a distributed system, comprising the following steps:

S1: extract target data values from the historical performance data obtained from a number of monitoring nodes of the monitoring system as the training data source, and calculate the feature values of each historical data pattern in the data source;

S2: from the feature values of each historical data pattern, obtain the prior probability distribution of each historical data pattern under each state, and compute the probability distribution of the states, thereby training a Bayesian model of the states of the data patterns;

S3: calculate the feature values of the current data pattern from the real-time performance data obtained by the monitoring system;

S4: find, among the historical data patterns, the data pattern most similar to the current data pattern;

S5: using the output of S4, predict with the Bayesian model trained in S2 to obtain the probability of each state;

S6: from the result of S5, set a confidence factor and an abnormal threshold, and predict the abnormal state if the confidence factor exceeds the abnormal threshold.

Preferably, the feature values comprise the performance value change, the performance value change rate, and the performance value.

Preferably, in S2, the standard deviations of each feature value over all historical data patterns are sorted by magnitude and divided into a number of subspaces, and the prior probability of a particular state is calculated for each subspace.

Preferably, in S2, the Bayesian model of each historical data pattern is trained from the feature values of that pattern, yielding the prior probability of each state for each pattern.

Preferably, S4 further comprises:

calculating the standard deviation of each feature value between the current data pattern and each historical normal pattern;

taking the historical data pattern whose sum of standard deviations with respect to the current data pattern is smallest as the similar pattern of the current data pattern.

Preferably, the states are the abnormal state, the alert state, and the normal state.

Preferably, S6 further comprises setting an alert threshold: if the confidence factor lies between the alert threshold and the abnormal threshold, the alert state is predicted, and if the confidence factor is below the alert threshold, the normal state is predicted.

To solve the above problems, the present invention also provides a system for predicting performance anomalies in a distributed system, connected to the monitoring system of the distributed system and comprising:

a historical feature value calculation module, which extracts target data values from the historical performance data obtained from a number of monitoring nodes of the monitoring system as the training data source, and calculates the feature values of each historical data pattern in the data source;

a prior probability module, connected to the output of the historical feature value calculation module, which obtains from the feature values of each historical data pattern the prior probability distribution of each historical data pattern under each state and computes the probability distribution of the states, thereby training a Bayesian model of the states of the data patterns;

a real-time feature value calculation module, which calculates the feature values of the current data pattern from the real-time performance data obtained from a number of monitoring nodes of the monitoring system;

a similar pattern module, connected to the output of the historical feature value calculation module and the output of the real-time feature value calculation module, which finds among the historical data patterns the data pattern most similar to the current data pattern;

a probability calculation module, which uses the output of the similar pattern module to predict with the Bayesian model trained in the prior probability module, obtaining the probability of each state; and

an anomaly alarm module, which sets a confidence factor and an abnormal threshold from the result of the probability calculation module and predicts the abnormal state if the confidence factor exceeds the abnormal threshold.

Preferably, the feature values comprise the performance value change, the performance value change rate, and the performance value.

Preferably, the states comprise the abnormal state, the alert state, and the normal state.

By adopting the above technical solution, the present invention has the following advantages and positive effects compared with the prior art:

1) For performance anomaly prediction in a distributed system, the invention analyzes the performance of distributed nodes through feature values and divides the data into patterns, so the features of the variable are considered comprehensively and accuracy is higher;

2) The invention uses a Bayesian machine learning model to guide prediction, detects performance anomalies in real time, and evaluates each prediction with the previously trained Bayesian model to provide a confidence level for the prediction; the degree of automation is high, and the reliability and practicality of prediction are improved;

3) The invention divides the feature-value standard deviations of the historical data patterns into multiple subspaces, trains the Bayesian model with these subspaces as parameters, and calculates the prior probability of a particular state for each subspace, which further improves the accuracy of anomaly prediction.

Brief description of the drawings

Fig. 1 is a flowchart of the method for predicting performance anomalies in a distributed system according to the present invention;

Fig. 2 is a block diagram of the system for predicting performance anomalies in a distributed system according to the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described here are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

To aid understanding, specific embodiments are explained further below with reference to the drawings; the individual embodiments do not limit the invention.

Embodiment 1

Referring to Fig. 1, the invention provides a method for predicting performance anomalies in a distributed system, which mainly comprises the following steps:

S1: extract target data values from the historical performance data obtained from a number of monitoring nodes of the monitoring system as the training data source, and calculate the feature values of each historical data pattern in the data source;

In this embodiment, a data point is described by three feature values: the performance value change (Change Value, CV), the performance value change rate (Change Rate, CR), and the performance value (Value, V). The performance value is the value of the performance metric at a time $t_i$.

The performance value change is the difference between the performance metric at time $t_i$ and at the preceding time $t_{i-1}$:

$$CV(t_i) = V_{t_i} - V_{t_{i-1}}$$

where $V_{t_i}$ is the value of the performance metric at time $t_i$, $i = 0, 1, \ldots, n$;

$V_{t_{i-1}}$ is the value of the performance metric at time $t_{i-1}$, $i = 1, \ldots, n$.

The performance value change rate is the relative change of the performance metric, equal to the performance value change divided by the performance value at the current time $t_i$:

$$CR(t_i) = \frac{V_{t_i} - V_{t_{i-1}}}{V_{t_i}}$$

where $V_{t_i}$ is the value of the performance metric at time $t_i$, $i = 0, 1, \ldots, n$;

$V_{t_{i-1}}$ is the value of the performance metric at time $t_{i-1}$, $i = 1, \ldots, n$.
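The three feature values can be illustrated with a minimal Python sketch. The function name `extract_features`, the sample CPU load series, and the handling of a zero performance value (CR set to 0) are assumptions made for illustration; the text does not specify them.

```python
from typing import List, Tuple


def extract_features(values: List[float]) -> List[Tuple[float, float, float]]:
    """Return (CV, CR, V) for each time step t_1..t_n of a metric series."""
    features = []
    for prev, cur in zip(values, values[1:]):
        cv = cur - prev                      # CV(t_i) = V_{t_i} - V_{t_{i-1}}
        cr = cv / cur if cur != 0 else 0.0   # assumption: CR is 0 when V_{t_i} is 0
        features.append((cv, cr, cur))
    return features


# Hypothetical CPU load samples at t_0..t_3.
cpu_load = [0.50, 0.60, 0.75, 0.60]
feats = extract_features(cpu_load)
for cv, cr, v in feats:
    print(f"CV={cv:+.3f}  CR={cr:+.3f}  V={v:.3f}")
```

The first sample has no predecessor, so a series of n + 1 values yields n feature triples.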

S2: from the feature values of each historical data pattern, obtain the prior probability distribution of each historical data pattern under each state, and compute the probability distribution of the states, thereby training a Bayesian model of the states of the data patterns;

Based on the feature values from S1, the historical data is divided into multiple patterns, and each pattern is labeled with one of three states: abnormal, alert, or normal. The prior probability distribution is then trained for the three states by counting how each pattern is distributed over the states, which yields the Bayesian model of the patterns. To further improve the model's accuracy, the pattern features are divided into multiple subspaces, and the Bayesian model is trained with these subspaces as its parameters.

In S2, the Bayesian model of each historical data pattern can be trained from the feature values of that pattern, yielding the prior probability of each state for each pattern. The states can be the abnormal state, the alert state, and the normal state.

A classification model is built with a naive Bayes classifier. Naive Bayes classification requires the parameters to be mutually independent; the three parameters of the patterns obtained here are mutually independent, so the requirement is met.

Suppose the current time is $t_i$. The feature values of all data in the period from $t_{i-L}$ to $t_i$ then form the current data pattern, where $L$ is the length of the current data pattern.

During training, each pattern in the training data is given a label indicating its state, so a pattern can be written as $(V_{t_1}, V_{t_2}, \ldots, V_{t_n}, \mathit{Status})$. From the labeled training set, the prior probability distribution of all patterns under the three states can be obtained:

$$P\big((SD_{CV}, SD_{CR}, SD_V) \mid \mathit{status}\big)$$

where $\mathit{status}$ is the normal state (normal), the alert state (alert), or the abnormal state (abnormal).

This is the probability of observing the three standard deviations $SD_{CV}$, $SD_{CR}$, $SD_V$ of the similar pattern when the state of the pattern is $\mathit{status}$. The distribution $P(\mathit{status})$ of each state can also be obtained from the training data.

From these prior probabilities, the probability of a particular state given the standard deviation values can be obtained with Bayes' rule:

$$P\big(\mathit{status} \mid (SD_{CV}, SD_{CR}, SD_V)\big) = \frac{P\big((SD_{CV}, SD_{CR}, SD_V) \mid \mathit{status}\big)\, P(\mathit{status})}{P\big((SD_{CV}, SD_{CR}, SD_V)\big)}$$

As noted above, the three parameters are mutually independent, so this can be written as:

$$P\big(\mathit{status} \mid (SD_{CV}, SD_{CR}, SD_V)\big) = \frac{P(SD_{CV} \mid \mathit{status})\, P(SD_{CR} \mid \mathit{status})\, P(SD_V \mid \mathit{status})\, P(\mathit{status})}{P(SD_{CV})\, P(SD_{CR})\, P(SD_V)}$$

To further improve the model's accuracy, the standard deviations of each feature value over all historical data patterns can also be sorted by magnitude and divided into a number of subspaces, and the prior probability of a particular state calculated for each subspace. The particular state can be the abnormal state, the alert state, or the normal state.

The model space is divided into several subspaces, each containing all the feature values that fall within one continuous range, which yields several discrete subspaces that serve as the parameters of the naive Bayes classification. For example, let the full range of the performance value change rate standard deviation $SD_{CR}$ be $r = [a, b]$, where $a$ is its minimum and $b$ is its maximum. Dividing this range into $m$ subspaces gives each subspace the length:

$$\Delta r = \frac{b - a}{m}$$

Each subspace can then be expressed as:

$$S_{SDCR_1} = [a,\, a + \Delta r],\quad S_{SDCR_2} = [a + \Delta r,\, a + 2\Delta r],\quad \ldots,\quad S_{SDCR_m} = [b - \Delta r,\, b]$$
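The equal-width partition above can be sketched as a function that maps a standard deviation to the index of its subspace. The name `subspace_index`, the 0-based indexing, and the clamping of out-of-range values to the first or last subspace are illustrative assumptions.

```python
def subspace_index(x: float, a: float, b: float, m: int) -> int:
    """Map x in [a, b] to the 0-based index of its equal-width subspace."""
    if m <= 0 or b <= a:
        raise ValueError("need m > 0 and b > a")
    delta_r = (b - a) / m               # length of each subspace
    idx = int((x - a) / delta_r)
    # Clamp so that x == b falls in the last subspace [b - delta_r, b]
    # and values outside [a, b] fall in the nearest edge subspace.
    return min(max(idx, 0), m - 1)


# Hypothetical range [0, 1] split into m = 4 subspaces.
print(subspace_index(0.30, 0.0, 1.0, 4))
print(subspace_index(1.00, 0.0, 1.0, 4))
```

Because adjacent subspaces share a boundary point, the sketch assigns a boundary value to the upper subspace, except for `b` itself.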

Each performance value change rate standard deviation only needs to be placed in the appropriate subspace. It is therefore unnecessary to calculate the prior probability of a particular state for every individual standard deviation; only the prior probability of that state for each subspace is needed:

$$P\big(\mathit{status} \mid (SD_{CV}, SD_{CR}, SD_V)\big) = \frac{P(S_{SDCV_i} \mid \mathit{status})\, P(S_{SDCR_j} \mid \mathit{status})\, P(S_{SDV_k} \mid \mathit{status})\, P(\mathit{status})}{P(S_{SDCV_i})\, P(S_{SDCR_j})\, P(S_{SDV_k})}$$

where $S_{SDCV_i}$ is the subspace containing the performance value change standard deviation $SD_{CV}$;

$S_{SDCR_j}$ is the subspace containing the performance value change rate standard deviation $SD_{CR}$;

$S_{SDV_k}$ is the subspace containing the performance value standard deviation $SD_V$;

$\mathit{status}$ is a particular state: normal, alert, or abnormal.
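A minimal sketch of training and querying a naive Bayes model over subspace indices follows. It is an illustration under stated assumptions, not the patented implementation: the names `train` and `posterior`, the add-one (Laplace) smoothing, and the toy training set are all hypothetical. The sketch also normalizes over the three states rather than dividing by the product of marginal probabilities, which leaves the ranking of the states unchanged.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

STATES = ("normal", "alert", "abnormal")
Bins = Tuple[int, int, int]  # subspace indices (i, j, k) for SD_CV, SD_CR, SD_V


def train(samples: List[Tuple[Bins, str]], m: int):
    """Count states and per-feature subspace occurrences from labeled patterns."""
    state_counts = Counter(status for _, status in samples)
    bin_counts = defaultdict(Counter)   # (feature index, status) -> subspace counts
    for bins, status in samples:
        for f, b in enumerate(bins):
            bin_counts[(f, status)][b] += 1
    return state_counts, bin_counts, m


def posterior(model, bins: Bins) -> Dict[str, float]:
    """Return P(status | subspaces) for each state, normalized to sum to 1."""
    state_counts, bin_counts, m = model
    total = sum(state_counts.values())
    scores = {}
    for s in STATES:
        # Smoothed P(status); add-one smoothing is an assumption of this sketch.
        p = (state_counts[s] + 1) / (total + len(STATES))
        for f, b in enumerate(bins):
            # Smoothed P(subspace | status) for feature f.
            p *= (bin_counts[(f, s)][b] + 1) / (state_counts[s] + m)
        scores[s] = p
    z = sum(scores.values())
    return {s: p / z for s, p in scores.items()}


# Toy labeled data with m = 3 subspaces per feature.
samples = ([((0, 0, 0), "normal")] * 5
           + [((1, 1, 1), "alert")] * 5
           + [((2, 2, 2), "abnormal")] * 5)
model = train(samples, m=3)
post = posterior(model, (2, 2, 2))
print(max(post, key=post.get))
```

Smoothing keeps every probability strictly positive, which matters later when logarithms of the posteriors are taken.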

S3: calculate the feature values of the current data pattern from the real-time performance data obtained by the monitoring system. Suppose the current time is $t_i$. The features of all data in the period from $t_{i-L}$ to $t_i$ then form the current data pattern, where $L$ is the length of the current data pattern.

S4: find, among the historical data patterns, the data pattern most similar to the current data pattern;

Specifically:

S41: calculate the standard deviation of each feature value between the current data pattern and each historical normal pattern;

The data at each time $t_i$ has three features, $(CV(t_i), CR(t_i), V(t_i))$. Suppose the current time is $t_i$. The features of all data in the period from $t_{i-L}$ to $t_i$ then form the pattern of the current performance metric, where $L$ is the length of the current data pattern.

As shown in Fig. 2, the current pattern is compared with the historical normal patterns to find the historical normal pattern most similar to the current data pattern. The standard deviation of each feature between the current data pattern and each historical normal pattern is calculated. If a historical data pattern starts at time $t_{j-L}$ and ends at $t_j$, the performance value change standard deviation between the current data pattern and this historical data pattern is denoted $SD_{CV}(t_j)$, the performance value change rate standard deviation is denoted $SD_{CR}(t_j)$, and the performance value standard deviation is denoted $SD_V(t_j)$. The current data pattern is compared with the earlier historical data patterns one by one.

S42: the historical data pattern for which the sum of all standard deviations with respect to the current data pattern is smallest is taken as the similar pattern of the current data pattern.

A historical pattern is selected when it satisfies:

$$SD_{CV}(t_k) + SD_{CR}(t_k) + SD_V(t_k) = \min_j \{ SD_{CV}(t_j) + SD_{CR}(t_j) + SD_V(t_j) \}$$

where $\{ SD_{CV}(t_j) + SD_{CR}(t_j) + SD_V(t_j) \}$ is the set of feature standard deviation sums between the current data pattern and all historical data patterns;

$\min$ denotes the minimum of that set.

That is, when the sum of all standard deviations between the current data pattern and a historical data pattern is smallest, that historical data pattern is called the similar pattern of the current data pattern. For each current data pattern, the most similar historical pattern can therefore be found:

$$\big(SD_{CV}(t_k),\, SD_{CR}(t_k),\, SD_V(t_k)\big)$$
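The search for the similar pattern can be sketched as follows. The text does not give the exact formula for the standard deviation between two patterns, so the sketch assumes the root-mean-square difference of each feature over the aligned points of two equal-length patterns; `pattern_sd` and `most_similar` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Feature = Tuple[float, float, float]  # (CV, CR, V) at one time step


def pattern_sd(cur: Sequence[Feature], hist: Sequence[Feature]) -> Tuple[float, ...]:
    """Per-feature deviation between two equal-length patterns.

    Assumption: root-mean-square difference over aligned time steps.
    """
    if len(cur) != len(hist) or not cur:
        raise ValueError("patterns must be non-empty and of equal length")
    n = len(cur)
    return tuple(
        math.sqrt(sum((c[f] - h[f]) ** 2 for c, h in zip(cur, hist)) / n)
        for f in range(3)
    )


def most_similar(cur: Sequence[Feature], history: List[Sequence[Feature]]):
    """Return (k, (SD_CV, SD_CR, SD_V)) for the pattern with the smallest SD sum."""
    if not history:
        raise ValueError("history must contain at least one pattern")
    best_k, best_sd = -1, None
    for k, hist in enumerate(history):
        sd = pattern_sd(cur, hist)
        if best_sd is None or sum(sd) < sum(best_sd):
            best_k, best_sd = k, sd
    return best_k, best_sd


# Hypothetical patterns of length 2.
current = [(0.1, 0.1, 1.0), (0.2, 0.2, 1.2)]
history = [
    [(1.0, 1.0, 5.0), (1.0, 1.0, 6.0)],
    [(0.1, 0.1, 1.0), (0.2, 0.2, 1.2)],
]
k, sd = most_similar(current, history)
print(k, sd)
```

The returned triple is what the later steps treat as $(SD_{CV}(t_k), SD_{CR}(t_k), SD_V(t_k))$; on a tie, the sketch keeps the earliest historical pattern.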

S5: using the output of S4, predict with the Bayesian model trained in S2 to obtain the probability of each state;

In this embodiment, the similar pattern $(SD_{CV}(t_k), SD_{CR}(t_k), SD_V(t_k))$ from S4 is passed to the Bayesian model trained in S2 to guide prediction, giving the probability of each state of the pattern:

$$P\big(\mathit{status} \mid (SD_{CV}, SD_{CR}, SD_V)\big) = \frac{P\big((SD_{CV}, SD_{CR}, SD_V) \mid \mathit{status}\big)\, P(\mathit{status})}{P\big((SD_{CV}, SD_{CR}, SD_V)\big)}$$

The resulting probabilities determine the state of the pattern. Judging the state of the pattern accurately makes it possible to catch the precursors of an anomaly and thus to predict it.

S6: from the result of S5, set a confidence factor and an abnormal threshold, and predict the abnormal state if the confidence factor exceeds the abnormal threshold.

S6 also comprises setting an alert threshold: if the confidence factor lies between the alert threshold and the abnormal threshold, the alert state is predicted, and if the confidence factor is below the alert threshold, the normal state is predicted. An alarm mechanism also needs to be established, and the preset mechanism triggers defensive measures after an alarm.

In this embodiment, for the current pattern $(SD_{CV}, SD_{CR}, SD_V)$, the probabilities of the three states are obtained by the method above:

$$P\big(\mathit{normal} \mid (SD_{CV}, SD_{CR}, SD_V)\big)$$

$$P\big(\mathit{alert} \mid (SD_{CV}, SD_{CR}, SD_V)\big)$$

$$P\big(\mathit{abnormal} \mid (SD_{CV}, SD_{CR}, SD_V)\big)$$

To determine which state the pattern is in, the three probabilities are compared:

$$\delta_1 = \log P\big(\mathit{alert} \mid (SD_{CV}, SD_{CR}, SD_V)\big) - \log P\big(\mathit{normal} \mid (SD_{CV}, SD_{CR}, SD_V)\big)$$

$$\delta_2 = \log P\big(\mathit{alert} \mid (SD_{CV}, SD_{CR}, SD_V)\big) - \log P\big(\mathit{abnormal} \mid (SD_{CV}, SD_{CR}, SD_V)\big)$$

If the following condition holds, the current data pattern is judged to be in the alert state, and an anomaly may occur next:

$$\delta_1 \ge 0 \quad \text{and} \quad \delta_2 \ge 0$$

$\delta_1$ indicates whether the current data pattern is more likely to be in the alert state or in the normal state, and $\delta_2$ indicates whether it is more likely to be in the alert state or in the abnormal state. If the condition above holds, the current data pattern is more likely to be in the alert state than in either the normal or the abnormal state, and an anomaly is likely to occur next.

When a predicted-anomaly alarm is issued, if $\delta_1 \ge 0$ and $\delta_1$ is large, the pattern is significantly more likely to be in the alert state than in the normal state. Similarly, if $\delta_2 \ge 0$ and $\delta_2$ is large, the pattern is significantly more likely to be in the alert state than in the abnormal state. The larger $|\delta_1|$ and $|\delta_2|$ are, the more credible the prediction, so $|\delta_1|$ and $|\delta_2|$ can serve as reference indicators of the credibility of an anomaly prediction. Each anomaly prediction is assigned a confidence factor (Confidence Factor, CF), calculated as follows:

CF=δ 12

Clearly, the more likely the pattern is to be in the alert state, the larger CF is, so CF is an effective measure of the confidence of an anomaly prediction. The CF value shows how credible a prediction is, and the alert threshold is chosen according to it: if CF lies between the alert threshold and the abnormal threshold, the alert state is predicted, and if CF is below the alert threshold, the normal state is predicted. An alarm mechanism also needs to be established so that, in the alert and abnormal states, the preset mechanism triggers defensive measures that prevent the anomaly or reduce the loss it causes.
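The threshold decision of S6 can be sketched as below. Two points are assumptions rather than statements of the source: the sketch combines the two log differences as a sum, CF = δ1 + δ2, and the threshold values are hypothetical. The function name `classify` is also illustrative. The posteriors must be strictly positive for the logarithms to be defined.

```python
import math
from typing import Dict, Tuple


def classify(post: Dict[str, float],
             alert_threshold: float,
             abnormal_threshold: float) -> Tuple[str, float]:
    """Predict a state from the three posteriors using a confidence factor."""
    d1 = math.log(post["alert"]) - math.log(post["normal"])    # delta_1
    d2 = math.log(post["alert"]) - math.log(post["abnormal"])  # delta_2
    cf = d1 + d2   # assumption: CF taken as the sum of the two differences
    if cf > abnormal_threshold:
        state = "abnormal"      # CF exceeds the abnormal threshold
    elif cf >= alert_threshold:
        state = "alert"         # CF between the alert and abnormal thresholds
    else:
        state = "normal"        # CF below the alert threshold
    return state, cf


# Hypothetical posteriors and thresholds.
state, cf = classify({"normal": 0.3, "alert": 0.5, "abnormal": 0.2},
                     alert_threshold=1.0, abnormal_threshold=3.0)
print(state, round(cf, 3))
```

In practice the two thresholds would be tuned on historical data, trading missed anomalies against false alarms.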

Embodiment 2

Referring to Fig. 2, the invention provides a system for predicting performance anomalies in a distributed system, connected to the monitoring system of the distributed system and mainly comprising: a historical feature value calculation module, a prior probability module, a real-time feature value calculation module, a similar pattern module, a probability calculation module, and an anomaly alarm module.

The historical feature value calculation module extracts target data values from the historical performance data obtained from a number of monitoring nodes of the monitoring system as the training data source, and calculates the feature values of each historical data pattern in the data source;

In this embodiment, a data point is described by three feature values: the performance value change (Change Value, CV), the performance value change rate (Change Rate, CR), and the performance value (Value, V). The performance value is the value of the performance metric at a time $t_i$.

The performance value change is the difference between the performance metric at time $t_i$ and at the preceding time $t_{i-1}$:

$$CV(t_i) = V_{t_i} - V_{t_{i-1}}$$

where $V_{t_i}$ is the value of the performance metric at time $t_i$, $i = 0, 1, \ldots, n$;

$V_{t_{i-1}}$ is the value of the performance metric at time $t_{i-1}$, $i = 1, \ldots, n$.

The performance value change rate is the relative change of the performance metric, equal to the performance value change divided by the performance value at the current time $t_i$:

$$CR(t_i) = \frac{V_{t_i} - V_{t_{i-1}}}{V_{t_i}}$$

where $V_{t_i}$ is the value of the performance metric at time $t_i$, $i = 0, 1, \ldots, n$;

$V_{t_{i-1}}$ is the value of the performance metric at time $t_{i-1}$, $i = 1, \ldots, n$.

The prior probability module is connected to the output of the historical feature value calculation module. From the feature values of each historical data pattern, it obtains the prior probability distribution of each historical data pattern under each state and computes the probability distribution of the states, thereby training a Bayesian model of the states of the data patterns;

Based on the feature values output by the historical feature value calculation module, the historical data is divided into multiple patterns, and each pattern is labeled with one of three states: abnormal, alert, or normal. The prior probability distribution is then trained for the three states by counting how each pattern is distributed over the states, which yields the Bayesian model of the patterns. To further improve the model's accuracy, the pattern features are divided into multiple subspaces, and the Bayesian model is trained with these subspaces as its parameters.

In the prior probability module, the Bayesian model of each historical data pattern can be trained from the feature values of that pattern, yielding the prior probability of each state for each pattern. The states can be the abnormal state, the alert state, and the normal state.

A classification model is built with a naive Bayes classifier. Naive Bayes classification requires the parameters to be mutually independent; the three parameters of the patterns obtained here are mutually independent, so the requirement is met.

Suppose the current time is $t_i$. The feature values of all data in the period from $t_{i-L}$ to $t_i$ then form the current data pattern, where $L$ is the length of the current data pattern.

During training, each pattern in the training data is given a label indicating its state, so a pattern can be written as $(V_{t_1}, V_{t_2}, \ldots, V_{t_n}, \mathit{Status})$. From the labeled training set, the prior probability distribution of all patterns under the three states can be obtained:

$$P\big((SD_{CV}, SD_{CR}, SD_V) \mid \mathit{status}\big)$$

where $\mathit{status}$ is the normal state (normal), the alert state (alert), or the abnormal state (abnormal).

This is the probability of observing the three standard deviations $SD_{CV}$, $SD_{CR}$, $SD_V$ of the similar pattern when the state of the pattern is $\mathit{status}$. The distribution $P(\mathit{status})$ of each state can also be obtained from the training data.

From these prior probabilities, the probability of a particular state given the standard deviation values can be obtained with Bayes' rule:

$$P\big(\mathit{status} \mid (SD_{CV}, SD_{CR}, SD_V)\big) = \frac{P\big((SD_{CV}, SD_{CR}, SD_V) \mid \mathit{status}\big)\, P(\mathit{status})}{P\big((SD_{CV}, SD_{CR}, SD_V)\big)}$$

As noted above, the three parameters are mutually independent, so this can be written as:

$$P\big(\mathit{status} \mid (SD_{CV}, SD_{CR}, SD_V)\big) = \frac{P(SD_{CV} \mid \mathit{status})\, P(SD_{CR} \mid \mathit{status})\, P(SD_V \mid \mathit{status})\, P(\mathit{status})}{P(SD_{CV})\, P(SD_{CR})\, P(SD_V)}$$

To further improve the model's accuracy, the standard deviations of each feature value over all historical data patterns can also be sorted by magnitude and divided into a number of subspaces, and the prior probability of a particular state calculated for each subspace. The particular state can be the abnormal state, the alert state, or the normal state.

The model space is divided into several subspaces, each containing all the feature values that fall within one continuous range, which yields several discrete subspaces that serve as the parameters of the naive Bayes classification. For example, let the full range of the performance value change rate standard deviation $SD_{CR}$ be $r = [a, b]$, where $a$ is its minimum and $b$ is its maximum. Dividing this range into $m$ subspaces gives each subspace the length:

$$\Delta r = \frac{b - a}{m}$$

Each subspace can then be expressed as:

$$S_{SDCR_1} = [a,\, a + \Delta r],\quad S_{SDCR_2} = [a + \Delta r,\, a + 2\Delta r],\quad \ldots,\quad S_{SDCR_m} = [b - \Delta r,\, b]$$

Each performance value change rate standard deviation only needs to be placed in the appropriate subspace. It is therefore unnecessary to calculate the prior probability of a particular state for every individual standard deviation; only the prior probability of that state for each subspace is needed:

$$P\big(\mathit{status} \mid (SD_{CV}, SD_{CR}, SD_V)\big) = \frac{P(S_{SDCV_i} \mid \mathit{status})\, P(S_{SDCR_j} \mid \mathit{status})\, P(S_{SDV_k} \mid \mathit{status})\, P(\mathit{status})}{P(S_{SDCV_i})\, P(S_{SDCR_j})\, P(S_{SDV_k})}$$

where $S_{SDCV_i}$ is the subspace containing the performance value change standard deviation $SD_{CV}$;

$S_{SDCR_j}$ is the subspace containing the performance value change rate standard deviation $SD_{CR}$;

$S_{SDV_k}$ is the subspace containing the performance value standard deviation $SD_V$;

$\mathit{status}$ is a particular state: the normal state (normal), the alert state (alert), or the abnormal state (abnormal).

Real-time characteristic value computing module, the real-time performance data of obtaining according to the some monitoring nodes in supervisory system calculate the eigenwert of current data pattern.Suppose that current time is t i, so from t i-L is to t itime period in the relevant feature of all data form current data pattern, wherein L is the length of current data pattern.

Similar pattern module: connected to the real-time feature value calculation module and the historical feature value calculation module; finds, among the historical data patterns, the data pattern most similar to the current data pattern.

Specifically, it comprises:

Historical data comparison module: connected to the real-time feature value calculation module and the historical feature value calculation module; calculates the standard deviation of the feature values between the current data pattern and each historical normal pattern.

The data at each time $t_i$ has three features, namely $(CV(t_i), CR(t_i), V(t_i))$. If the current time is $t_i$, the features of all data in the time period from $t_{i-L}$ to $t_i$ form the pattern of the current performance metric, where $L$ is the length of the current data pattern.

As shown in Fig. 2, the current pattern is compared with the historical normal patterns in order to find the historical normal pattern most similar to the current data pattern. The standard deviation of each feature between the current data pattern and each historical normal pattern is calculated. If a historical data pattern starts at time $t_{j-L}$ and ends at $t_j$, the standard deviation of the performance value change amount between the current data pattern and this historical data pattern is denoted $SD_{CV}(t_j)$, that of the performance value change rate is denoted $SD_{CR}(t_j)$, and that of the performance value is denoted $SD_V(t_j)$. The current data pattern is compared with the earlier historical data patterns one by one.

Minimum variance acquisition module: connected to the output of the historical data comparison module; identifies the historical data pattern for which the sum of all standard deviations with respect to the current data pattern is minimal, and takes that historical data pattern as the similar pattern of the current data pattern.

A pattern in the historical data is selected when it satisfies the following formula:

$$SD_{CV}(t_k) + SD_{CR}(t_k) + SD_V(t_k) = \min_j \{ SD_{CV}(t_j) + SD_{CR}(t_j) + SD_V(t_j) \}$$

where $\{ SD_{CV}(t_j) + SD_{CR}(t_j) + SD_V(t_j) \}$ is the set of sums of feature standard deviations between the current data pattern and all historical data patterns, and $\min$ denotes the minimum value in that set.

That is, when the sum of all standard deviations between the current data pattern and a historical data pattern is minimal, that historical data pattern is called the similar pattern of the current data pattern. Therefore, for each current data pattern, the most similar historical pattern can be found:

$$(SD_{CV}(t_k),\ SD_{CR}(t_k),\ SD_V(t_k))$$
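The minimum-sum search can be sketched in Python as below. The text does not spell out how the standard deviation between two patterns is computed, so this sketch assumes it is the root mean square of the point-by-point differences of one feature over two patterns of equal length. The function names and the representation of a pattern as a list of `(CV, CR, V)` tuples are likewise hypothetical.

```python
import math


def feature_sd(current, historical, idx):
    """Assumed measure: RMS of pointwise differences for feature idx."""
    diffs = [c[idx] - h[idx] for c, h in zip(current, historical)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def most_similar(current, history):
    """Return (k, (sd_cv, sd_cr, sd_v)) for the historical pattern whose
    three standard deviations with respect to `current` have the smallest sum.

    current: list of (CV, CR, V) tuples; history: list of such patterns.
    """
    if not history:
        raise ValueError("history must contain at least one pattern")
    best_k, best_sds = None, None
    for k, pattern in enumerate(history):
        if len(pattern) != len(current):
            raise ValueError("patterns must have the same length")
        sds = tuple(feature_sd(current, pattern, idx) for idx in range(3))
        if best_sds is None or sum(sds) < sum(best_sds):
            best_k, best_sds = k, sds
    return best_k, best_sds
```

The returned triple plays the role of $(SD_{CV}(t_k), SD_{CR}(t_k), SD_V(t_k))$ and is what the probability calculation step takes as its input.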

Probability calculation module: based on the output of the similar pattern module, makes a prediction using the Bayesian model trained in the prior probability module and obtains the probability distribution over the states.

In this embodiment, based on the similar pattern obtained by the minimum variance acquisition module,

$(SD_{CV}(t_k),\ SD_{CR}(t_k),\ SD_V(t_k))$, the Bayesian model trained in the prior probability module guides the prediction and gives the probability of each state for the pattern:

$$P(status \mid (SD_{CV}, SD_{CR}, SD_V)) = \frac{P((SD_{CV}, SD_{CR}, SD_V) \mid status)\, P(status)}{P((SD_{CV}, SD_{CR}, SD_V))}$$

The state of the pattern is determined from these probabilities. Judging the state of the pattern accurately makes it possible to capture the precursors of an anomaly and thereby to predict anomalies.

Anomaly alarm module: sets a confidence factor and an abnormal threshold based on the output of the probability calculation module; if the confidence factor exceeds the abnormal threshold, the abnormal state is predicted.

Typically an alert threshold is also set: if the confidence factor lies between the alert threshold and the abnormal threshold, the alert state is predicted; if the confidence factor is below the alert threshold, the normal state is predicted. An alarm mechanism must also be established, and after an alarm is raised the defensive handling measures defined by that mechanism are taken.

In this embodiment, for the current pattern $(SD_{CV}, SD_{CR}, SD_V)$, the probabilities of the three states are obtained by the above method:

$$P(normal \mid (SD_{CV}, SD_{CR}, SD_V))$$

$$P(alert \mid (SD_{CV}, SD_{CR}, SD_V))$$

$$P(abnormal \mid (SD_{CV}, SD_{CR}, SD_V))$$

To determine which state the pattern is in, the three probabilities are compared:

$$\delta_1 = \log P(alert \mid (SD_{CV}, SD_{CR}, SD_V)) - \log P(normal \mid (SD_{CV}, SD_{CR}, SD_V))$$

$$\delta_2 = \log P(alert \mid (SD_{CV}, SD_{CR}, SD_V)) - \log P(abnormal \mid (SD_{CV}, SD_{CR}, SD_V))$$

If the following condition is met, the current data pattern is judged to be in the alert state, and an anomaly may occur next:

$$\delta_1 \ge 0 \ \text{and}\ \delta_2 \ge 0$$

$\delta_1$ indicates which is larger: the likelihood that the current data pattern is in the alert state or the likelihood that it is in the normal state. $\delta_2$ indicates which is larger: the likelihood that the current data pattern is in the alert state or the likelihood that it is in the abnormal state. If the condition above is met, the current data pattern is more likely to be in the alert state than in either the normal or the abnormal state, and an anomaly can be judged likely to occur next.
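The comparison of the three state probabilities can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that all three probabilities are strictly positive so that their logarithms exist.

```python
import math


def alert_deltas(p_normal, p_alert, p_abnormal):
    """Return (delta_1, delta_2) as log-probability differences."""
    if min(p_normal, p_alert, p_abnormal) <= 0:
        raise ValueError("probabilities must be strictly positive")
    delta_1 = math.log(p_alert) - math.log(p_normal)    # alert vs. normal
    delta_2 = math.log(p_alert) - math.log(p_abnormal)  # alert vs. abnormal
    return delta_1, delta_2


def is_alert(p_normal, p_alert, p_abnormal):
    """True when the alert state is at least as likely as both other states."""
    delta_1, delta_2 = alert_deltas(p_normal, p_alert, p_abnormal)
    return delta_1 >= 0 and delta_2 >= 0
```

For example, probabilities of 0.2 (normal), 0.5 (alert) and 0.3 (abnormal) give two positive differences, so the pattern is judged to be in the alert state; with 0.6, 0.3 and 0.1 the first difference is negative and no alert is raised.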

When a predicted-anomaly alarm is issued, if $\delta_1 \ge 0$ and $\delta_1$ is large, the pattern is significantly more likely to be in the alert state than in the normal state. Similarly, if $\delta_2 \ge 0$ and $\delta_2$ is large, the pattern is significantly more likely to be in the alert state than in the abnormal state. The larger the values of $|\delta_1|$ and $|\delta_2|$, the more credible the prediction result, so $|\delta_1|$ and $|\delta_2|$ can serve as reference indicators of the credibility of an anomaly prediction. Each anomaly prediction is assigned a confidence factor (CF), calculated as follows:

$$CF = \delta_1 + \delta_2$$

Clearly, the more likely the pattern is to be in the alert state, the larger the CF value, so CF is an effective measure of the confidence of an anomaly prediction. The CF value shows how credible the prediction is, and the alert threshold is determined from this confidence level. If the confidence factor lies between the alert threshold and the abnormal threshold, the alert state is predicted; if the confidence factor is below the alert threshold, the normal state is predicted. An alarm mechanism must also be established, and in the alert state and the abnormal state the defensive handling measures defined by that mechanism are taken to prevent the anomaly from occurring or to reduce the loss it causes.
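The threshold comparison can be sketched in Python as below. The sketch takes an already computed confidence factor as input. The function name is hypothetical, and because the text does not say how values that are exactly equal to a threshold are handled, the sketch assumes that a value equal to either threshold counts as the alert state.

```python
def classify_by_confidence(cf, alert_threshold, abnormal_threshold):
    """Map a confidence factor to 'normal', 'alert' or 'abnormal'."""
    if alert_threshold > abnormal_threshold:
        raise ValueError("alert_threshold must not exceed abnormal_threshold")
    if cf > abnormal_threshold:
        return "abnormal"  # the confidence factor exceeds the abnormal threshold
    if cf >= alert_threshold:
        return "alert"     # between the two thresholds (bounds assumed inclusive)
    return "normal"        # below the alert threshold
```

With hypothetical thresholds of 1.0 (alert) and 3.0 (abnormal), a confidence factor of 0.5 is classified as normal, 2.0 as alert and 3.5 as abnormal. Suitable threshold values would have to be chosen for the monitored system.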

The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims (10)

1. A performance anomaly prediction method in a distributed system, characterized in that it comprises the following steps:
S1: extracting target data values from the historical performance data obtained by the monitoring nodes of a monitoring system as the training data source, and calculating the feature values of each historical data pattern in the data source;
S2: obtaining, from the feature values of each historical data pattern, the prior probability distribution of each historical data pattern under the various states, and compiling statistics on the probability distribution of the various states, thereby training a Bayesian model of the states of the data patterns;
S3: calculating the feature values of the current data pattern from the real-time performance data obtained by the monitoring system;
S4: finding, among said historical data patterns, the data pattern most similar to the current data pattern;
S5: making a prediction with the Bayesian model trained in S2 based on the output of S4, and obtaining the probability distribution over the various states;
S6: setting a confidence factor and an abnormal threshold based on the result of S5, and predicting the abnormal state if the confidence factor exceeds the abnormal threshold.
2. The performance anomaly prediction method in a distributed system as claimed in claim 1, characterized in that
said feature values comprise the performance value change amount, the performance value change rate and the performance value.
3. The performance anomaly prediction method in a distributed system as claimed in claim 1, characterized in that, in S2, the variances of each feature over all historical data patterns are sorted by value and divided into several subspaces, and the prior probability of a specific state is calculated for the feature variance in each subspace.
4. The performance anomaly prediction method in a distributed system as claimed in claim 3, characterized in that, in S2, a Bayesian model of each historical data pattern is trained from the feature values of said historical data patterns, and the prior probability of each state is obtained for each pattern.
5. The performance anomaly prediction method in a distributed system as claimed in claim 3, characterized in that S4 further comprises:
calculating the standard deviation of the feature values between the current data pattern and each historical normal pattern;
taking the historical data pattern with the smallest sum of all standard deviations with respect to the current data pattern as the similar pattern of the current data pattern.
6. The performance anomaly prediction method in a distributed system as claimed in claim 3, characterized in that said states are the abnormal state, the alert state and the normal state.
7. The performance anomaly prediction method in a distributed system as claimed in claim 3, characterized in that S6 further comprises setting an alert threshold: if the confidence factor lies between the alert threshold and the abnormal threshold, the alert state is predicted; if the confidence factor is below the alert threshold, the normal state is predicted.
8. A performance anomaly prediction system in a distributed system, connected to the monitoring system of the distributed system, characterized in that it comprises:
a historical feature value calculation module, which extracts target data values from the historical performance data obtained by the monitoring nodes of the monitoring system as the training data source, and calculates the feature values of each historical data pattern in the data source;
a prior probability module, connected to the output of the historical feature value calculation module, which obtains, from the feature values of each historical data pattern, the prior probability distribution of each historical data pattern under the various states and compiles statistics on the probability distribution of the various states, thereby training a Bayesian model of the states of the data patterns;
a real-time feature value calculation module, which calculates the feature values of the current data pattern from the real-time performance data obtained by the monitoring nodes of the monitoring system;
a similar pattern module, connected to the output of the historical feature value calculation module and to the real-time feature value calculation module, which finds, among said historical data patterns, the data pattern most similar to the current data pattern;
a probability calculation module, which makes a prediction with the Bayesian model trained in the prior probability module based on the output of the similar pattern module, and obtains the probability distribution over said various states; and
an anomaly alarm module, which sets a confidence factor and an abnormal threshold based on the result of the probability calculation module, and predicts the abnormal state if the confidence factor exceeds the abnormal threshold.
9. The performance anomaly prediction system in a distributed system as claimed in claim 8, characterized in that said feature values comprise the performance value change amount, the performance value change rate and the performance value.
10. The performance anomaly prediction system in a distributed system as claimed in claim 8, characterized in that said states comprise the abnormal state, the alert state and the normal state.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410294472.2A CN104063747A (en) 2014-06-26 2014-06-26 Performance abnormality prediction method in distributed system and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410294472.2A CN104063747A (en) 2014-06-26 2014-06-26 Performance abnormality prediction method in distributed system and system

Publications (1)

Publication Number Publication Date
CN104063747A true CN104063747A (en) 2014-09-24

Family

ID=51551447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410294472.2A CN104063747A (en) 2014-06-26 2014-06-26 Performance abnormality prediction method in distributed system and system

Country Status (1)

Country Link
CN (1) CN104063747A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324155A (en) * 2012-03-19 2013-09-25 通用电气航空系统有限公司 System monitoring
KR20140056952A (en) * 2012-11-02 2014-05-12 주식회사 세이프티아 Method and system for evaluating abnormality detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
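仇沂 (Qiu Yi): "Performance anomaly prediction and monitoring in distributed environments", China Master's Theses Full-text Database, Information Science and Technology series *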

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105629947A (en) * 2015-11-30 2016-06-01 东莞酷派软件技术有限公司 Household equipment monitoring method, household equipment monitoring device and terminal
CN105629947B (en) * 2015-11-30 2019-02-01 东莞酷派软件技术有限公司 Home equipment monitoring method, home equipment monitoring device and terminal
WO2017124953A1 (en) * 2016-01-21 2017-07-27 阿里巴巴集团控股有限公司 Method for processing machine abnormality, method for adjusting learning rate, and device
CN106991095A (en) * 2016-01-21 2017-07-28 阿里巴巴集团控股有限公司 Machine abnormal processing method, the method for adjustment of learning rate and device
US10748090B2 (en) 2016-01-21 2020-08-18 Alibaba Group Holding Limited Method and apparatus for machine-exception handling and learning rate adjustment
CN105871879B (en) * 2016-05-06 2019-03-05 中国联合网络通信集团有限公司 Network element abnormal behaviour automatic testing method and device
CN105871879A (en) * 2016-05-06 2016-08-17 中国联合网络通信集团有限公司 Automatic network element abnormal behavior detection method and device
CN106095639A (en) * 2016-05-30 2016-11-09 中国农业银行股份有限公司 A kind of cluster subhealth state method for early warning and system
CN106125643A (en) * 2016-06-22 2016-11-16 华东师范大学 A kind of industry control safety protection method based on machine learning techniques
CN106293976A (en) * 2016-08-15 2017-01-04 东软集团股份有限公司 Application performance Risk Forecast Method, device and system
CN106897113A (en) * 2017-02-23 2017-06-27 郑州云海信息技术有限公司 The method and device of a kind of virtualized host operation conditions prediction
CN109297582A (en) * 2017-07-25 2019-02-01 台达电子电源(东莞)有限公司 The detection device and detection method of fan abnormal sound
CN107844406A (en) * 2017-10-25 2018-03-27 千寻位置网络有限公司 Method for detecting abnormality and system, service terminal, the memory of distributed system
CN109921955A (en) * 2017-12-12 2019-06-21 北京嘀嘀无限科技发展有限公司 Portfolio monitoring method, system, computer equipment and storage medium
CN109921955B (en) * 2017-12-12 2020-10-02 北京嘀嘀无限科技发展有限公司 Traffic monitoring method, system, computer device and storage medium
CN108039971A (en) * 2017-12-18 2018-05-15 北京搜狐新媒体信息技术有限公司 A kind of alarm method and device
WO2020078385A1 (en) * 2018-10-18 2020-04-23 杭州海康威视数字技术股份有限公司 Data collecting method and apparatus, and storage medium and system

Similar Documents

Publication Publication Date Title
CN107408225B (en) Adaptive handling of operational data
US9672085B2 (en) Adaptive fault diagnosis
US10192170B2 (en) System and methods for automated plant asset failure detection
US20190114244A1 (en) Correlation-Based Analytic For Time-Series Data
US9722945B2 (en) Dynamically identifying target capacity when scaling cloud resources
Tsui et al. Prognostics and health management: A review on data driven approaches
US20170351241A1 (en) Predictive and prescriptive analytics for systems under variable operations
JP6394726B2 (en) Operation management apparatus, operation management method, and program
CN106104496B (en) The abnormality detection not being subjected to supervision for arbitrary sequence
EP3101599A2 (en) Advanced analytical infrastructure for machine learning
CN104350471B (en) Method and system for detecting anomalies in real-time in processing environment
US10223403B2 (en) Anomaly detection system and method
Langone et al. LS-SVM based spectral clustering and regression for predicting maintenance of industrial machines
US9070121B2 (en) Approach for prioritizing network alerts
Saxena et al. Metrics for offline evaluation of prognostic performance
US8060342B2 (en) Self-learning integrity management system and related methods
US9600394B2 (en) Stateful detection of anomalous events in virtual machines
US9122273B2 (en) Failure cause diagnosis system and method
US10013303B2 (en) Detecting anomalies in an internet of things network
Wang et al. Spatiotemporal anomaly detection in gas monitoring sensor networks
US10410135B2 (en) Systems and/or methods for dynamic anomaly detection in machine sensor data
US9424157B2 (en) Early detection of failing computers
US20160231738A1 (en) Information processing apparatus and analysis method
US7693982B2 (en) Automated diagnosis and forecasting of service level objective states
JP2018500709A5 (en) Computing system, program and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140924