CN110275809A - Data fluctuation identification method, device and storage medium

Info

Publication number
CN110275809A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810214976.7A
Other languages
Chinese (zh)
Other versions
CN110275809B (en)
Inventor
阮航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810214976.7A
Publication of CN110275809A
Application granted
Publication of CN110275809B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Abstract

Embodiments of the present invention disclose a data fluctuation identification method, device and storage medium. The method obtains the data value of current data; obtains a first fluctuation parameter between the data value and a historical data value; trains a nonlinear regression model on a training data sequence; obtains a current first predicted data value from the trained nonlinear regression model, and obtains a second fluctuation parameter between the data value and the first predicted data value; and determines, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal. Because the scheme obtains fluctuation parameters of the data in multiple dimensions and judges abnormality from these multidimensional parameters, it can improve the accuracy of data fluctuation identification.

Description

Data fluctuation identification method, device and storage medium
Technical field
The present invention relates to the field of computer technology, and in particular to a data fluctuation identification method, device and storage medium.
Background
To guarantee service quality, a data monitoring scheme is needed to monitor each business metric and to detect and report anomalies.
Current data monitoring schemes focus mainly on the data collection side of monitoring. The main approach is cloud-platform-based monitoring: data is collected and aggregated on a cloud platform, and the cloud computing platform calculates the fluctuation of the data to determine whether the data is abnormal.
However, current data monitoring schemes are concerned only with the platformization of monitoring data and with its real-time nature. For externally served business data, these schemes can only "perceive" that the data has changed. Business data occasionally fluctuates as a matter of course, and these schemes cannot accurately identify such fluctuations, for example whether the fluctuation of the current data falls within the normal range. Existing data monitoring schemes therefore identify data fluctuations with low accuracy.
Summary of the invention
Embodiments of the present invention provide a data fluctuation identification method, device and storage medium that can improve the accuracy of data fluctuation identification.
An embodiment of the present invention provides a data fluctuation identification method, comprising:
obtaining the data value of current data;
obtaining a first fluctuation parameter between the data value and a historical data value;
training a nonlinear regression model on a training data sequence;
obtaining a current first predicted data value from the trained nonlinear regression model, and obtaining a second fluctuation parameter between the data value and the first predicted data value;
determining, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal.
Correspondingly, an embodiment of the present invention also provides a data fluctuation identification device, comprising:
a data acquisition unit, for obtaining the data value of current data;
a first parameter acquisition unit, for obtaining a first fluctuation parameter between the data value and a historical data value;
a training unit, for training a nonlinear regression model on a training data sequence;
a second parameter acquisition unit, for obtaining a current first predicted data value from the trained nonlinear regression model, and obtaining a second fluctuation parameter between the data value and the first predicted data value;
a determination unit, for determining, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal.
Correspondingly, an embodiment of the present invention also provides a storage medium storing instructions which, when executed by a processor, implement the steps of any method provided by the embodiments of the present invention.
The embodiments of the present invention obtain the data value of current data; obtain a first fluctuation parameter between the data value and a historical data value; train a nonlinear regression model on a training data sequence; obtain a current first predicted data value from the trained nonlinear regression model, and obtain a second fluctuation parameter between the data value and the first predicted data value; and determine, according to the first and second fluctuation parameters, whether the fluctuation of the current data is abnormal. Because the scheme obtains fluctuation parameters of the data in multiple dimensions and judges abnormality from these multidimensional parameters, it can improve the accuracy of data fluctuation identification.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a schematic diagram of a scenario of the data monitoring system provided by an embodiment of the present invention;
Fig. 1b is a flow diagram of the data fluctuation identification method provided by an embodiment of the present invention;
Fig. 1c is a schematic diagram of solving the nonlinear regression model provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of lag order determination provided by an embodiment of the present invention;
Fig. 3a is another flow diagram of the data fluctuation identification method provided by an embodiment of the present invention;
Fig. 3b is a schematic diagram of the logical architecture of the data fluctuation identification method provided by an embodiment of the present invention;
Fig. 4a is a first structural diagram of the data fluctuation identification device provided by an embodiment of the present invention;
Fig. 4b is a second structural diagram of the data fluctuation identification device provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of the server provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the invention.
Embodiments of the present invention provide a data fluctuation identification method, device and storage medium.
An embodiment of the present invention provides a data monitoring system, which may include any data fluctuation identification device provided by the embodiments. The data fluctuation identification device may be integrated in a server, such as a monitoring server.
The data monitoring system may also include other equipment, such as a terminal, which may be a mobile phone, tablet computer, laptop or similar device.
For example, referring to Fig. 1a, a data monitoring system includes a terminal 10 and a server 20, connected through a network 30. The network 30 includes network entities such as routers and gateways, which are not shown in the figure. Terminal 10 may communicate with server 20 over a wired or wireless network to request services on server 20, for example to download applications, application update packages, or application-related data or business information from server 20. Terminal 10 may be a mobile phone, tablet computer, laptop or similar device; Fig. 1a takes a laptop as the example. Terminal 10 also has the applications its users need installed, such as applications with entertainment functions (image processing, audio playback, games, OCR software) and applications with service functions.
Terminal 10 may report data to server 20, and server 20 may obtain the data value of that data locally. Server 20 then obtains a first fluctuation parameter between the data value and a historical data value; trains a nonlinear regression model on a training data sequence; obtains a current first predicted data value from the trained nonlinear regression model, and obtains a second fluctuation parameter between the data value and the first predicted data value; and determines, according to the first and second fluctuation parameters, whether the fluctuation of the current data is abnormal.
In addition, when server 20 determines that the fluctuation of the data value is abnormal, it may send out an alarm notification.
Each part is described in detail below.
This embodiment is described from the perspective of the data fluctuation identification device, which may be integrated in a server or similar equipment.
As shown in Fig. 1b, a data fluctuation identification method is provided. The method may be executed by a processor in a server, and the specific flow may be as follows:
101. Obtain the data value of current data.
The current data is the data currently obtained from a data source, such as a monitored data source. For example, the current data may be the data obtained from the data source today.
The monitored data sources are varied: for example, databases such as mysql and hive, a single file (file) or distributed files (hdfs), or even a piece of executable code (shell script). The data types or formats obtained are therefore not the same.
To improve the efficiency of data fluctuation identification, the data format or type may optionally be standardized. That is, the step "obtaining the data value of current data" may include:
obtaining data from a data source to get the current data;
converting the current data into data in a unified data format, and obtaining the data value of the converted data.
Specifically, each monitored data source may be abstracted into a corresponding data generator (generator). The data generator obtains the data for the current time from its data source and converts it into the unified data format.
The data source may be abstracted in the manner of jdbc. In practice, the generator abstraction can be implemented in a data abstraction layer. The core function of the data abstraction layer is to analyze the various data sources and invoke the matching generator for each one; once every data source has passed through the data abstraction layer, they all appear externally in the single generator data format.
The data abstraction layer thus mainly standardizes data types. Because monitored data sources vary, these differently formatted sources must be given a unified structure before fluctuation identification, that is, redescribed as generators. A generator obtains data from its data source and provides the monitoring logic layer (the layer that identifies fluctuations) with data in a unified reporting format.
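As a minimal sketch of the generator abstraction described above, the following Python code wraps two kinds of source behind one interface. The class names, the `Report` fields and the `fetch` method are hypothetical, chosen only for illustration; the patent does not specify them.

```python
import csv
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date


@dataclass
class Report:
    """Unified reporting format (field names are assumptions)."""
    metric: str
    day: date
    value: float


class Generator(ABC):
    """One generator per data source; every generator emits a Report."""

    @abstractmethod
    def fetch(self, day: date) -> Report:
        ...


class CsvTextGenerator(Generator):
    """Reads 'day,value' rows from CSV text, standing in for a single file."""

    def __init__(self, metric: str, text: str):
        self.metric = metric
        reader = csv.DictReader(io.StringIO(text))
        self.rows = {row["day"]: float(row["value"]) for row in reader}

    def fetch(self, day: date) -> Report:
        return Report(self.metric, day, self.rows[day.isoformat()])


class CallableGenerator(Generator):
    """Wraps any callable, standing in for a database query or a script."""

    def __init__(self, metric: str, func):
        self.metric = metric
        self.func = func

    def fetch(self, day: date) -> Report:
        return Report(self.metric, day, float(self.func(day)))
```

The monitoring logic layer would then call `fetch` on any generator and receive the same `Report` shape regardless of where the data came from.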
102. Obtain the first fluctuation parameter between the data value and a historical data value.
The historical data value is the value of data obtained from the data source earlier, that is, before the current time. For example, it may be the value of the data most recently obtained from the data source.
For example, the historical data value may be the value of the data obtained from the data source yesterday.
The first fluctuation parameter measures the amplitude of change of a data value, for example the change of the data value of the current data relative to the historical data value. The first fluctuation parameter may include a fluctuation rate, confidence, obtained by dividing the difference between the data value of the current data and the historical data value by the historical data value, as follows:
Fluctuation rate $\mathit{confidence} = (x - x_{-1}) / x_{-1}$, where $x$ is the data value of the current data and $x_{-1}$ is the historical data value.
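The fluctuation rate can be sketched as a one-line Python function. The function name and the zero check are assumptions added for illustration; the patent does not say how a zero historical value is handled.

```python
def fluctuation_rate(current: float, previous: float) -> float:
    """First fluctuation parameter: (x - x_prev) / x_prev."""
    if previous == 0:
        # Assumption: treat a zero historical value as an error.
        raise ValueError("historical data value is zero; rate is undefined")
    return (current - previous) / previous
```

For instance, a value of 120 today against 100 yesterday gives a fluctuation rate of 0.2.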
103. Train the nonlinear regression model on the training data sequence.
Linear regression works as follows. Given a series of linear data, such as a time series (also called a dynamic series: a sequence formed by arranging the values of one statistical indicator in the chronological order in which they occurred), if the time series has a linear character it can be expressed as $Y = WX + b$ (the linear regression model expression), where $W$ and $b$ are parameters to be estimated. Linear regression computes the 2-norm between the known samples and the function $Y = WX + b$ and minimizes it, so that the regression equation $Y = WX + b$ is as close as possible to the existing time-series samples.
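For a single input variable, minimizing the 2-norm has a well-known closed form. The sketch below uses it to estimate $W$ and $b$; the function name is hypothetical and the code is only an illustration of the idea, not the patent's implementation.

```python
def fit_line(xs, ys):
    """Return (W, b) minimizing the sum of (W*x + b - y)**2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx            # slope
    b = mean_y - w * mean_x  # intercept
    return w, b
```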
Nonlinear regression is similar to linear regression, except that the function to be estimated, $Y = f(X)$, is a nonlinear function. The 2-norm is again minimized so that the regression equation $Y = f(X)$ is as close as possible to the existing time-series samples.
There are many kinds of nonlinear regression model, for example the hyperbolic model, the power function model and the nonlinear polynomial model.
For example, taking the nonlinear polynomial model as the nonlinear regression model, its model expression is:
$Y_T = a_0 + a_1 T^1 + a_2 T^2 + \dots + a_p T^p$
Here $p$ is the order of the power series. Any curve, surface or hypersurface can be approximated arbitrarily closely by a polynomial within a certain range, and $p$ represents the degree of approximation. In the embodiments of the present invention, $p = 4$ is preferred, i.e. a quartic expression. The $a$ are the model parameters to be estimated and can be solved by the least squares method.
The training data sequence is a time series containing multiple historical data values, arranged in the chronological order of their corresponding times.
In the embodiments of the present invention, training the nonlinear regression model on the training data sequence means solving the model parameters to be estimated from the training data sequence. For example, when the nonlinear regression model is the nonlinear polynomial model, training the model means solving the model parameters $a$.
Specifically, the step "training the nonlinear regression model on the training data sequence" may include:
determining the number of model parameters to be estimated in the nonlinear regression model;
solving the model parameters to be estimated by the least squares method on the training data sequence, to obtain the trained nonlinear regression model.
For example, taking the nonlinear polynomial model as the nonlinear regression model, its model expression is:
$Y_T = a_0 + a_1 T^1 + a_2 T^2 + \dots + a_p T^p$
Here the $a$ are the model parameters to be estimated. The number of parameters is set directly to 4, i.e. $p = 4$, because in the embodiments of the present invention this is enough to fit a reasonable curve. In theory, more parameters bring the fit closer to the actual values, but that is not necessarily the most reasonable result, because each actual value is not necessarily reliable; this is the over-fitting phenomenon. The embodiments therefore take 4 to reach the best fitting state.
Referring to Fig. 1c, let the known actual values be $y_t$ (the training data sequence). All the $a$ are found by calling a least squares routine in python so that the value of $(Y_t - y_t)^2$ is minimized.
104. Obtain the current first predicted data value from the trained nonlinear regression model, and obtain the second fluctuation parameter between the data value and the first predicted data value.
The current first predicted data value is the predicted data value for the current time, for example the predicted data value for today.
In the embodiments of the present invention, training the nonlinear regression model yields its model parameters to be estimated, and thus the trained nonlinear regression model.
For example, taking the nonlinear polynomial model $Y_T = a_0 + a_1 T^1 + a_2 T^2 + \dots + a_p T^p$ as the nonlinear regression model, the trained nonlinear polynomial model (denoted $G(X)$) is obtained once all the $a$ are solved. The trained model can then produce the current first predicted data value, that is, predict the current data value, for example the first predicted data value for today.
In the embodiments of the present invention, after the current first predicted data value is obtained, the second fluctuation parameter between the data value of the current data and the first predicted data value can also be obtained.
The second fluctuation parameter measures the amplitude of change of a data value, for example the change of the data value of the current data relative to the predicted data value. The second fluctuation parameter may be denoted confidence' and may be obtained by dividing the difference between the data value of the current data and the predicted data value by the data value of the current data, as follows:
Second fluctuation parameter $\mathit{confidence}' = (x - G(X)) / x$, where $x$ is the data value of the current data and $G(X)$ is the predicted data value of the nonlinear regression model.
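Given already-solved coefficients, the prediction and the second fluctuation parameter can be sketched as follows. The function names are hypothetical, the coefficients in the usage note are made up for illustration, and the zero check is an assumption the text does not cover.

```python
def predict_polynomial(coeffs, t):
    """Evaluate G = a0 + a1*t + ... + ap*t**p using Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * t + a
    return result


def second_fluctuation(current, coeffs, t):
    """Second fluctuation parameter: (x - G(X)) / x."""
    if current == 0:
        # Assumption: a zero current value makes the ratio undefined.
        raise ValueError("current data value is zero; ratio is undefined")
    return (current - predict_polynomial(coeffs, t)) / current
```

With illustrative coefficients `[100.0, 2.0, 0.0, 0.0, 0.0]` and time index 10, the model predicts 120; a current value of 150 then gives a second fluctuation parameter of 0.2.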
In the embodiments of the present invention, the first and second fluctuation parameters may be obtained in various orders, for example simultaneously or one after the other.
105. Determine, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal.
Optionally, a final fluctuation parameter value of the current data may be obtained from the first and second fluctuation parameters, and whether the fluctuation of the current data is abnormal is then determined from the final fluctuation parameter value.
For example, when the final fluctuation parameter value is within a preset range, the fluctuation of the current data is determined to be normal;
when the final fluctuation parameter value is not within the preset range, the fluctuation of the current data is determined to be abnormal.
There are many ways to generate the final fluctuation parameter value from the values of the first and second fluctuation parameters. For example, the two parameter values may be aggregated, and the aggregated value used as the final fluctuation parameter value of the current data.
For example, a weighted average of the values of the first and second fluctuation parameters may be computed and used as the final fluctuation parameter value.
Suppose the first fluctuation parameter is confidence and the second fluctuation parameter is confidence'. The final fluctuation parameter value is then $\mathit{confidence}_{final} = q_1 \cdot \mathit{confidence} + q_2 \cdot \mathit{confidence}'$, where $q_1$ and $q_2$ are weights that can be set according to actual needs, for example $q_1 = 0.3$ and $q_2 = 0.7$.
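The weighted average and the range check can be sketched as below. The default weights come from the example in the text; the range bounds of ±0.2 and the function names are assumptions, since the patent leaves the preset range unspecified.

```python
def final_fluctuation(first, second, q1=0.3, q2=0.7):
    """Weighted average of the first and second fluctuation parameters."""
    return q1 * first + q2 * second


def is_abnormal(final_value, low=-0.2, high=0.2):
    """True when the final value falls outside the preset range.

    The bounds are hypothetical; set them to suit the monitored metric.
    """
    return not (low <= final_value <= high)
```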
In one embodiment, to further improve the accuracy of data fluctuation identification, an autoregressive moving average model (Auto-Regressive and Moving Average Model, ARMA) may also be introduced for data prediction, and a third fluctuation parameter between the data value of the current data and that model's predicted value is obtained. Whether the fluctuation of the current data is abnormal is then determined from the fluctuation parameters in three dimensions.
ARMA prediction is restricted to stationary sequences and is unavailable for non-stationary sequences; that is, the sequence fitted by the ARMA model, i.e. the training data sequence, must be a stationary sequence. Before predicting with the ARMA model, it must therefore be determined whether the model's training data sequence is stationary.
Optionally, the step "determining, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal" may include:
when the training data sequence is a stationary sequence, training the autoregressive moving average model on the training data sequence;
obtaining a current second predicted data value from the trained autoregressive moving average model, and obtaining a third fluctuation parameter between the data value and the second predicted data value;
determining, according to the first fluctuation parameter, the second fluctuation parameter and the third fluctuation parameter, whether the fluctuation of the current data is abnormal;
when the training data sequence is not a stationary sequence, determining, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal.
A stationary sequence is a time series whose expected value shows no trend, whose variance does not change markedly, whose values are only weakly correlated with the current time point, and whose periodic features are not obvious. The simplest example is a constant sequence, i.e. an arithmetic progression whose common difference is zero.
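The text does not say how stationarity is tested. As a rough heuristic only, the sketch below compares the mean and variance of the two halves of the sequence; the function name and both tolerances are assumptions. A formal unit-root test, such as the augmented Dickey-Fuller test, would be the usual choice in practice.

```python
from statistics import mean, pvariance


def looks_stationary(series, mean_tol=0.5, var_ratio_min=0.5):
    """Heuristic: both halves share roughly the same mean and variance."""
    if len(series) < 4:
        raise ValueError("need at least 4 points to compare two halves")
    half = len(series) // 2
    first, second = series[:half], series[half:]
    spread = pvariance(series) ** 0.5
    if spread == 0:
        return True  # constant sequence: no trend, no variance change
    # Shift of the mean between halves, in units of overall std deviation.
    mean_shift = abs(mean(first) - mean(second)) / spread
    v1, v2 = pvariance(first), pvariance(second)
    var_ratio = 1.0 if max(v1, v2) == 0 else min(v1, v2) / max(v1, v2)
    return mean_shift <= mean_tol and var_ratio >= var_ratio_min
```

A sequence that oscillates around a fixed level passes this check, while a steadily rising sequence fails it because the mean of the second half is far from the mean of the first.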
The ARMA model is commonly used in econometrics to analyze the trend of future data. It models the variation of a set of data through AR(p) (Auto-Regressive, the autoregressive model) and MA (Moving-Average, the moving average model). The expression of the ARMA model is:
$Y_t = a_0 + a_1 Y_{t-1} + a_2 Y_{t-2} + \dots + a_p Y_{t-p} + b_1 e_t + b_2 e_{t-2} + \dots + b_q e_{t-q}$
Here $e_t$ follows a distribution with expected value $E(e_t) = 0$ and variance $D(e_t) = d^2$, and $e_t$ and $e_{t+n}$ are mutually independent. The $a$ and $b$ are parameters to be estimated. Given this model, the values of $a$ and $b$ must be found before the expression $F(Y_t)$ of $Y_t$ can be obtained. They can be solved by the least squares method.
The autoregressive model (AutoRegression, AR), abbreviated AR(P), refers to a random process of the following form: $Y_T = A_1 Y_{T-1} + A_2 Y_{T-2} + \dots + A_P Y_{T-P} + U_T$, where $A_1, A_2, \dots, A_P$ are the $P$ parameters to be solved and $P$ is the number of lag periods.
The moving average model: again taking the above autoregressive process as an example, $Y_T$ is a function of $Y_{T-1}, \dots, Y_{T-P}$, and by difference operations $Y_T = U_T - A_1 U_{T-1} - \dots - A_P U_{T-P}$ can be calculated.
In the embodiments of the present invention, when the training data sequence is a stationary sequence (that is, the above $Y_t$ is stationary: its expectation, variance and autocorrelation function are independent of $t$), the ARMA model may be used to fit the sequence. Specifically, the ARMA model is trained on the stationary sequence, that is, the model parameters to be estimated, such as $a$ and $b$ in the above ARMA expression, are solved.
When the training data sequence is not a stationary sequence, it cannot be fitted with the ARMA model. The embodiments of the present invention therefore do not use the ARMA model to predict data values when the training data sequence is non-stationary. Instead, the training data sequence is fitted with the nonlinear regression model, a data value is predicted from that model, and whether the fluctuation of the data is abnormal is determined from the fluctuation parameter between the current data and the predicted data value together with the fluctuation parameter between the current data and the historical data value.
In the embodiments of the present invention, the nonlinear regression model serves on the one hand as a supplement to ARMA, predicting data values when the data sequence is not stationary. On the other hand, when the sequence is stationary, it predicts alongside the ARMA model to obtain a more reasonable predicted value, generating fluctuation parameters in multiple dimensions and improving the accuracy of data fluctuation identification.
Training the ARMA model on the training data sequence means solving the model parameters to be estimated of the ARMA model.
The most important step in solving these parameters is determining the lag order of the ARMA model, also called the lag period, i.e. the parameters $p$ and $q$ in the above ARMA expression.
The embodiments of the present invention may determine the lag order through autocorrelation function analysis and partial autocorrelation function analysis, and then solve the model parameters to be estimated by the least squares method.
Assume the ARMA model is $R = X_t + Y_t$, where $Y_t = b_1 e_t + b_2 e_{t-2} + \dots + b_q e_{t-q}$ and $X_t = a_0 + a_1 Y_{t-1} + a_2 Y_{t-2} + \dots + a_p Y_{t-p}$.
Referring to Fig. 2, the lag order of the ARMA model is determined as follows:
201. Set the lag orders q = 1 and p = 1.
202. Judge whether q is greater than 5; if not, execute step 203; if so, execute step 207.
203. Obtain the autocorrelation coefficient for lag order q.
That is, calculate the correlation coefficient between the sequences $Y_t$ and $Y_{t+q}$:
$Cov(Y_t, Y_{t+q}) = E[(Y_t - u_t)(Y_{t+q} - u_t)] / D(Y_t)$
204. Judge whether the autocorrelation coefficient is zero; if so, execute step 206; if not, execute step 205.
205. Add 1 to the value of q and return to step 202.
206. Take the currently set value as q and go to step 208.
After the lag order q is determined, $Y_t = b_1 e_t + b_2 e_{t-2} + \dots + b_q e_{t-q}$ is determined.
207. Set $Y_t$ to 0 and go to step 208.
When q > 5, the degree of fit is too low; $Y_t$ can then be set to zero, giving the ARMA model $R = X_t + 0$.
208. Judge whether the lag order p is greater than 5; if not, execute step 209; if so, execute step 213.
209. Obtain the partial autocorrelation coefficient for lag order p.
That is, calculate the partial correlation coefficient between the sequences $X_t$ and $X_{t+p}$:
$E\{[x(t) - E x(t)][x(t-k) - E x(t-k)]\} / E\{[x(t-k) - E x(t-k)]^2\}$
210. Judge whether the partial autocorrelation coefficient is zero; if so, execute step 211; if not, execute step 212.
211. Take the currently set value as p and end the flow.
After p is determined, $X_t = a_0 + a_1 Y_{t-1} + a_2 Y_{t-2} + \dots + a_p Y_{t-p}$ is determined.
212. Add 1 to the value of p and return to step 208.
213. Set $X_t$ to 0 and end the flow.
When p > 5, the degree of fit is too low; $X_t$ can then be set to zero, giving the ARMA model $R = 0 + Y_t$.
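The flow of Fig. 2 can be sketched in Python as below. Several points are assumptions rather than part of the text: "is zero" is read as "within 2/sqrt(n) of zero" (the approximate 95% band for a sample autocorrelation), the partial autocorrelations are computed with the Durbin-Levinson recursion, and `None` stands for the "greater than 5, set the term to zero" outcome. All function names are hypothetical.

```python
def autocorr(series, lag):
    """Sample autocorrelation: covariance at `lag` divided by the variance."""
    n = len(series)
    mu = sum(series) / n
    var = sum((v - mu) ** 2 for v in series)
    cov = sum((series[t] - mu) * (series[t + lag] - mu)
              for t in range(n - lag))
    return cov / var


def partial_autocorr(series, max_lag):
    """Partial autocorrelations for lags 1..max_lag (Durbin-Levinson)."""
    r = [autocorr(series, k) for k in range(max_lag + 1)]
    pacf, prev = [], []  # prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j]
                for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def first_zero_lag(coeffs, threshold):
    """First lag (1-based) whose coefficient is near zero, else None."""
    for lag, value in enumerate(coeffs, start=1):
        if abs(value) < threshold:
            return lag
    return None  # corresponds to steps 207 / 213: drop this part


def lag_orders(series, max_lag=5):
    """Return (p, q) following steps 201-213, with None for a dropped part."""
    threshold = 2.0 / len(series) ** 0.5  # assumed zero band
    acf = [autocorr(series, k) for k in range(1, max_lag + 1)]
    q = first_zero_lag(acf, threshold)
    p = first_zero_lag(partial_autocorr(series, max_lag), threshold)
    return p, q
```

This follows the stopping rule exactly as the steps state it, selecting the first lag whose coefficient is zero. A constant sequence has zero variance and would need to be handled before calling `autocorr`.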
The lag orders p and q of the ARMA model can be determined in the above manner. After p and q are determined, the ARMA model can be constructed:
Y_t = a_0 + a_1 Y_{t-1} + a_2 Y_{t-2} + ... + a_p Y_{t-p} + b_1 e_t + b_2 e_{t-2} + ... + b_q e_{t-q}. Here R is assigned to Y_t.
After the ARMA model is constructed, the model parameters to be estimated can be solved by the least squares method. For example, let the known actual values be y_t (i.e., the training data sequence); a least squares routine in Python is called to find all a and b such that the sum of (Y_t - y_t)^2 is minimized.
After the parameters to be estimated, such as a and b, have been solved, the exact expression of the ARMA model is obtained, for example the expression F(Y_t) for the data Y_t. The current data value is then predicted based on the ARMA model, that is, the current second prediction data value is obtained, for example the data value for today. For example, the value of Y_{t+1} can be obtained from F(Y_{t+1}).
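As a minimal illustration of solving parameters by least squares and then predicting the next value, the sketch below fits only the simplest special case, Y_t = a_0 + a_1 Y_{t-1} (that is, p = 1 with no moving average terms). The restriction to this case and the function names are assumptions made for brevity; the description does not specify which Python routine is called.

```python
def fit_ar1(series):
    """Least-squares estimate of (a0, a1) in Y_t = a0 + a1 * Y_{t-1}."""
    xs = series[:-1]  # lagged values Y_{t-1}
    ys = series[1:]   # target values Y_t
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Constant history: fall back to predicting the mean.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


def predict_next(series, a0, a1):
    """One-step-ahead prediction from the last observed value."""
    return a0 + a1 * series[-1]
```

A full ARMA(p, q) fit would also estimate the b coefficients of the error terms, which this sketch deliberately leaves out.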
After the second prediction data value is obtained from the ARMA model, the third fluctuation parameter between the data value of the current data and the second prediction data value can be obtained. Finally, whether the fluctuation of the current data is abnormal is determined by combining the first, second and third fluctuation parameters.
The third fluctuation parameter measures the amplitude of change of a data value; for example, it can measure the amplitude of change of the data value of the current data relative to the prediction data value. For example, the third fluctuation parameter can be denoted confidence'' and obtained by dividing the difference between the data value of the current data and the prediction data value by the data value of the current data, as follows:
Third fluctuation parameter confidence'' = (x - ARMA(x)) / x, where x is the data value of the current data and ARMA(x) is the prediction data value of the ARMA model.
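The formula above can be written as a small helper. The function name and the explicit handling of a zero current value are assumptions; the description does not say what happens when x is 0.

```python
def prediction_deviation(current_value, predicted_value):
    """Relative deviation (x - predicted) / x of the current value from a prediction."""
    if current_value == 0:
        # Assumed behaviour: the ratio is undefined for x == 0.
        raise ValueError("current_value must be non-zero")
    return (current_value - predicted_value) / current_value
```

For example, a current value of 100 against a predicted value of 80 gives a deviation of 0.2.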
Based on the first, second and third fluctuation parameters, there are many ways to determine whether the fluctuation of the current data is abnormal. For example, a final fluctuation parameter value can be obtained from the parameter values of the three fluctuation parameters, and whether the fluctuation of the data is abnormal is then determined based on the final fluctuation parameter value.
For example, the step "determining, according to the first fluctuation parameter, the second fluctuation parameter and the third fluctuation parameter, whether the fluctuation of the current data is abnormal" may include:
obtaining the final fluctuation parameter value of the current data according to the parameter values of the first, second and third fluctuation parameters;
when the final fluctuation parameter value is within a preset range, determining that the fluctuation of the current data is normal;
when the final fluctuation parameter value is not within the preset range, determining that the fluctuation of the current data is abnormal.
There are many ways to generate the final fluctuation parameter value from the parameter values of the first, second and third fluctuation parameters. For example, the parameter values of the three fluctuation parameters can be aggregated, and the aggregated parameter value used as the final fluctuation parameter value of the current data.
For example, to improve the accuracy and efficiency of data fluctuation identification, the parameter values of the first, second and third fluctuation parameters can be weighted and averaged, and the weighted average used as the final fluctuation parameter value.
Assume the first fluctuation parameter is confidence, the second is confidence' and the third is confidence''. The final fluctuation parameter value is then confidence final = q1*confidence + q2*confidence' + q3*confidence'', where q1, q2 and q3 are weights that can be set according to actual needs, for example q1 = 0.3, q2 = 0.3, q3 = 0.3.
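The weighted combination and the range check can be sketched as below. The function names and the example range of -0.2 to 0.2 are hypothetical; the description only states that a preset range exists and gives 0.3 as an example weight.

```python
def final_fluctuation(parameters, weights):
    """Weighted sum q1*c1 + q2*c2 + ... of the fluctuation parameters."""
    if len(parameters) != len(weights):
        raise ValueError("parameters and weights must have the same length")
    return sum(w * p for w, p in zip(weights, parameters))


def is_within_range(value, lower, upper):
    """True when the final fluctuation parameter value lies in the preset range."""
    return lower <= value <= upper


# Example with the weights from the text and an assumed preset range.
value = final_fluctuation([0.1, 0.2, 0.3], [0.3, 0.3, 0.3])
normal = is_within_range(value, -0.2, 0.2)
```

Keeping the range check separate from the aggregation makes it easy to swap in a different aggregation method without touching the decision rule.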
From the foregoing, the embodiment of the present invention obtains the data value of the current data; obtains the first fluctuation parameter between the data value and a historical data value; trains a nonlinear regression model according to the training data sequence; obtains the current first prediction data value from the trained nonlinear regression model, and obtains the second fluctuation parameter between the data value and the first prediction data value; and determines, according to the first and second fluctuation parameters, whether the fluctuation of the current data is abnormal. The scheme obtains fluctuation parameters of the data in multiple dimensions (such as the first and second fluctuation parameters) and determines whether the data fluctuation is abnormal based on these multidimensional fluctuation parameters, so the identification accuracy of data fluctuations can be improved.
In addition, the scheme can add ARMA model prediction to increase the dimensions of the fluctuation indicators: multidimensional prediction is performed on the business data to obtain fluctuation indicators in multiple dimensions (such as the first, second and third fluctuation parameters), and whether the data fluctuation is abnormal is determined based on these indicators, which further improves the accuracy and flexibility of data fluctuation identification.
The method described in the above embodiment is described in further detail below.
With reference to Fig. 3a and Fig. 3b, the detailed process of a data fluctuation identification method is as follows:
301, receive the data currently reported by the data generator, and obtain the data value of the current data.
The monitored data sources are varied: for example, a data source can be a database such as mysql or hive, a single file or a distributed file (hdfs), or even an executable piece of code (a shell script). The types and formats of the acquired data therefore differ.
To improve the efficiency of data fluctuation identification, the data format or type can optionally be standardized. Each monitored data source can be abstracted into a corresponding data generator (generator); the generator obtains the data for the current time from its data source, converts it into data in a uniform data format, and reports it.
The data sources can be abstracted in a manner similar to jdbc. In practical applications, the generator abstraction can be implemented in a data abstraction layer. The core function of the data abstraction layer is to analyze the various data sources and call the corresponding generator for adaptation, so that every data source, after passing through the data abstraction layer, appears externally in the single data format of a generator.
A generator obtains data from a data source and reports it in a uniform data format to the monitoring logic layer (the layer that identifies fluctuations).
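One way to picture the generator abstraction is the sketch below. All class names, the fields of the uniform record and the two example adapters are hypothetical; the description does not define the uniform data format or any concrete interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DataPoint:
    """Assumed uniform format reported to the monitoring logic layer."""
    source: str
    timestamp: datetime
    value: float


class Generator(ABC):
    """Adapter that hides the concrete data source behind one interface."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def fetch_raw(self):
        """Read the current raw value from the underlying data source."""

    def report(self):
        # Convert whatever the source returned into the uniform format.
        return DataPoint(self.name, datetime.now(), float(self.fetch_raw()))


class CallableGenerator(Generator):
    """Wraps any function that returns the current value (e.g. a query)."""

    def __init__(self, name, func):
        super().__init__(name)
        self._func = func

    def fetch_raw(self):
        return self._func()


class FileGenerator(Generator):
    """Reads a single numeric value from a text file."""

    def __init__(self, name, path):
        super().__init__(name)
        self._path = path

    def fetch_raw(self):
        with open(self._path, encoding="utf-8") as handle:
            return handle.read().strip()
```

With this shape, the monitoring logic only ever calls `report()` and never needs to know whether the value came from a database, a file or a script.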
302, obtain the first fluctuation parameter between the data value and a historical data value, then go to step 307.
The historical data value is the value of data previously obtained from the data source, i.e., obtained from the data source before the current time. For example, it can be the value of the data most recently obtained from the data source.
For example, the historical data value can be the value of the data obtained from the data source yesterday.
The first fluctuation parameter measures the amplitude of change of a data value; for example, it can measure the amplitude of change of the data value of the current data relative to the historical data value. For example, the first fluctuation parameter may include a fluctuation rate confidence, obtained by dividing the difference between the data value of the current data and the historical data value by the historical data value, as follows:
Fluctuation rate confidence = (x - x_{-1}) / x_{-1}, where x is the data value of the current data and x_{-1} is the historical data value.
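A short sketch of this formula follows. Note that, unlike the prediction-based parameters, the denominator here is the historical value. The function name and the zero check are assumptions.

```python
def fluctuation_rate(current_value, historical_value):
    """First fluctuation parameter: (x - x_prev) / x_prev."""
    if historical_value == 0:
        # Assumed behaviour: the rate is undefined for a zero historical value.
        raise ValueError("historical_value must be non-zero")
    return (current_value - historical_value) / historical_value
```

For example, 120 orders today against 100 yesterday gives a fluctuation rate of 0.2.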
303, train a nonlinear regression model according to the training data sequence, and obtain the current first prediction data value from the trained nonlinear regression model.
Linear regression works as follows: given a series of linear data, such as a time series (a sequence formed by arranging the values of one statistical indicator in the chronological order in which they occurred), if the time series is linear it can be represented by Y = WX + b (the linear regression model expression), where W and b are the parameters to be estimated. Linear regression computes the two-norm between the known samples and the function Y = WX + b and minimizes it, so that the regression equation Y = WX + b is as close as possible to the existing time series samples.
Nonlinear regression is similar to linear regression, except that the function to be estimated, Y = f(X), is nonlinear; the two-norm is likewise minimized so that the regression equation Y = f(X) is as close as possible to the existing time series samples.
There are many kinds of nonlinear regression model, for example a hyperbolic model, a power function model, a nonlinear polynomial model, and so on.
Taking the nonlinear polynomial model as an example, its model expression is:
Y_T = a_0 + a_1 T^1 + a_2 T^2 + ... + a_p T^p
Here p is the degree of the polynomial. Any curve, surface or hypersurface can be approximated arbitrarily closely by a polynomial within a certain range, and p represents the degree of approximation. In the embodiment of the present invention, p = 4 is preferred, i.e., a quartic polynomial. The a are the model parameters to be estimated, which can be solved by the least squares method.
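A least-squares fit of the polynomial model can be sketched in plain Python by solving the normal equations. This is one possible realisation, not necessarily the routine used in the embodiment; the function names are hypothetical, and in practice a library routine would usually be preferred for numerical stability.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= factor * a[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (a[row][n] - tail) / a[row][row]
    return solution


def fit_polynomial(ts, ys, degree=4):
    """Least-squares coefficients a_0..a_p of Y = a_0 + a_1*T + ... + a_p*T^p."""
    size = degree + 1
    normal = [[sum(t ** (i + j) for t in ts) for j in range(size)]
              for i in range(size)]
    rhs = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(size)]
    return solve_linear_system(normal, rhs)


def predict(coefficients, t):
    """Evaluate the fitted polynomial at time index t."""
    return sum(a * t ** k for k, a in enumerate(coefficients))
```

The training sequence needs more than p distinct time indices, otherwise the normal equations are singular; with p = 4 that means at least five historical points.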
The training data sequence is a time series comprising multiple historical data values arranged in chronological order.
For the training process of the nonlinear regression model, refer to the description of the above embodiment.
304, obtain the second fluctuation parameter between the data value and the first prediction data value, then go to step 307.
The second fluctuation parameter measures the amplitude of change of a data value; for example, it can measure the amplitude of change of the data value of the current data relative to the prediction data value. For example, the second fluctuation parameter can be denoted confidence' and obtained by dividing the difference between the data value of the current data and the prediction data value by the data value of the current data, as follows:
Second fluctuation parameter confidence' = (x - G(X)) / x, where x is the data value of the current data and G(X) is the prediction data value of the nonlinear regression model.
305, when the training data sequence is a stationary sequence, train an autoregressive moving average model according to the training data sequence, and obtain the current second prediction data value from the trained autoregressive moving average model; when the training data sequence is not a stationary sequence, do not perform autoregressive moving average model training and prediction.
The ARMA model is a model commonly used in econometrics to analyze the trend of future data. It simulates the variation of a group of data through AR(p) (Auto-Regressive, autoregressive model) and MA (Moving-Average, moving average model). The expression of the ARMA model is:
Y_t = a_0 + a_1 Y_{t-1} + a_2 Y_{t-2} + ... + a_p Y_{t-p} + b_1 e_t + b_2 e_{t-2} + ... + b_q e_{t-q}
Here e_t follows a distribution with expected value E(e_t) = 0 and variance D(e_t) = d^2, and e_t and e_{t+n} are mutually independent. a and b are the parameters to be estimated; once the model is given, the values of a and b must be found before the expression F(Y_t) of Y_t can be obtained. They can be solved by the least squares method.
Autoregressive model (AutoRegression, AR), abbreviated AR(P), refers to a random process of the following form: Y_T = A_1 Y_{T-1} + A_2 Y_{T-2} + ... + A_P Y_{T-P} + U_T, where A_1, A_2, ..., A_P are the P parameters to be determined and P is the number of lag periods.
Moving average model: taking the above autoregressive process as an example, Y_T is a function of Y_T, ..., Y_{T-P}, and Y_T = U_T - A_1 U_{T-1} - ... - A_P U_{T-P} can be calculated by a difference operation.
In the embodiment of the present invention, when the training data sequence is a stationary sequence (i.e., the above Y_t is a stationary sequence, meaning that the expectation, variance and autocorrelation function of Y_t are independent of t), the sequence can be fitted with the ARMA model. Specifically, the ARMA model can be trained according to the stationary sequence, i.e., the model parameters to be estimated, such as a and b in the above ARMA model expression, are solved.
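The description does not say how stationarity is tested. As a rough illustration only, the sketch below uses a crude heuristic that compares the mean and variance of the two halves of the sequence; the heuristic, the function names and the tolerance are all assumptions, and a formal statistical test would normally be used instead.

```python
def mean_and_variance(values):
    """Population mean and variance of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


def looks_stationary(series, tolerance=0.25):
    """Heuristic check: mean and variance should be similar in both halves.

    This only probes the "expectation and variance independent of t" part of
    the definition; it does not examine the autocorrelation function.
    """
    if len(series) < 4:
        raise ValueError("need at least 4 points")
    half = len(series) // 2
    mean_a, var_a = mean_and_variance(series[:half])
    mean_b, var_b = mean_and_variance(series[half:])
    mean_scale = max(abs(mean_a), abs(mean_b), 1e-12)
    var_scale = max(var_a, var_b, 1e-12)
    mean_ok = abs(mean_a - mean_b) / mean_scale <= tolerance
    var_ok = abs(var_a - var_b) / var_scale <= tolerance
    return mean_ok and var_ok
```

A sequence that oscillates around a fixed level passes this check, while a steadily rising sequence fails it because its mean drifts between the two halves.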
When the training data sequence is not a stationary sequence, it cannot be fitted with the ARMA model. Therefore, the embodiment of the present invention does not use the ARMA model to predict data values when the training data sequence is non-stationary. In this case, the training data sequence can be fitted with the nonlinear regression model, a data value is predicted based on the nonlinear regression model, and whether the fluctuation of the data is abnormal is then determined from the fluctuation parameter between the current data and the prediction data value and the fluctuation parameter between the current data and the historical data value.
It can be seen that, in the embodiment of the present invention, the nonlinear regression model serves on the one hand as a supplement to ARMA, predicting the data value when the data sequence is not stationary; on the other hand, when the sequence is stationary, it can predict together with the ARMA model to obtain a more reasonable prediction, generating fluctuation parameters in multiple dimensions and improving the accuracy of data fluctuation identification.
Training the ARMA model on the training data sequence means solving the model parameters to be estimated of the ARMA model.
The most important part of solving the model parameters to be estimated of the ARMA model is determining the lag orders of the ARMA model, also called lag periods, i.e., the parameters p and q in the above ARMA model expression.
Specifically, for determining the lag orders and solving the model parameters, refer to the description of the above embodiment.
306, obtain the third fluctuation parameter between the data value and the second prediction data value, then go to step 307.
The third fluctuation parameter measures the amplitude of change of a data value; for example, it can measure the amplitude of change of the data value of the current data relative to the prediction data value. For example, the third fluctuation parameter can be denoted confidence'' and obtained by dividing the difference between the data value of the current data and the prediction data value by the data value of the current data, as follows:
Third fluctuation parameter confidence'' = (x - ARMA(x)) / x, where x is the data value of the current data and ARMA(x) is the prediction data value of the ARMA model.
307, determine whether the fluctuation of the current data is abnormal according to the current fluctuation parameters.
In the embodiment of the present invention, if the training data sequence is a stationary sequence, the current fluctuation parameters may include: the first fluctuation parameter, the second fluctuation parameter and the third fluctuation parameter;
if the training data sequence is not a stationary sequence, the current fluctuation parameters may include: the first fluctuation parameter and the second fluctuation parameter.
After the current fluctuation parameters are obtained, the final fluctuation parameter value of the current data can be obtained according to the current fluctuation parameters;
when the final fluctuation parameter value is within a preset range, the fluctuation of the current data is determined to be normal;
when the final fluctuation parameter value is not within the preset range, the fluctuation of the current data is determined to be abnormal.
For example, the final fluctuation parameter value of the current data can be obtained according to the parameter values of the first, second and third fluctuation parameters, or according to the first and second fluctuation parameters;
when the final fluctuation parameter value is within the preset range, the fluctuation of the current data is determined to be normal;
when the final fluctuation parameter value is not within the preset range, the fluctuation of the current data is determined to be abnormal.
There are many ways to generate the final fluctuation parameter value from the parameter values of the first, second and third fluctuation parameters. For example, the parameter values of the three fluctuation parameters can be aggregated, and the aggregated parameter value used as the final fluctuation parameter value of the current data.
For example, with reference to Fig. 3b, to improve the accuracy and efficiency of data fluctuation identification, the parameter values of the first, second and third fluctuation parameters can be weighted and averaged, and the weighted average used as the final fluctuation parameter value.
Assume the first fluctuation parameter is confidence, the second is confidence' and the third is confidence''. The final fluctuation parameter value is then confidence final = q1*confidence + q2*confidence' + q3*confidence'', where q1, q2 and q3 are weights that can be set according to actual needs, for example q1 = 0.3, q2 = 0.3, q3 = 0.3.
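The branching in step 307 between the stationary and non-stationary cases can be sketched as follows. The function name, the default range and the choice to reuse the first two weights when only two parameters are available are assumptions; the description does not state how the weights are handled in the two-parameter case.

```python
def is_fluctuation_abnormal(first, second, third, stationary,
                            weights=(0.3, 0.3, 0.3), allowed=(-0.2, 0.2)):
    """Decide whether the current data fluctuation is abnormal.

    first, second, third: fluctuation parameters; third is ignored when the
    training data sequence is not stationary (no ARMA prediction was made).
    """
    parameters = [first, second]
    if stationary:
        if third is None:
            raise ValueError("third parameter is required for a stationary sequence")
        parameters.append(third)
    # zip stops at the shorter list, so only two weights are used when
    # the ARMA-based parameter is absent (an assumed convention).
    final_value = sum(w * p for w, p in zip(weights, parameters))
    lower, upper = allowed
    return not (lower <= final_value <= upper)
```

For example, three parameters of 0.1 each on a stationary sequence combine to 0.09, which lies inside the assumed range, so the fluctuation is treated as normal.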
In practical applications, the identification scheme provided by the embodiment of the present invention can be implemented in the monitoring logic layer.
From the foregoing, the scheme provided by the embodiment of the present invention can predict data by combining the ARMA prediction model with the nonlinear regression model, obtain fluctuation parameters in multiple dimensions, and then determine whether the data fluctuation is abnormal based on those parameters, which can improve the identification accuracy and authenticity of data fluctuations.
The scheme provided by the embodiment of the present invention can detect the weekend effect of business data. For example, in our online consultation business, consultation orders fall considerably at weekends; with an existing identification method, this fluctuation of the business data would be alerted as abnormal. With the scheme of the embodiment of the present invention, however, the predicted value is itself learned from historical data, so its fluctuation rate relative to the actual value lies within the normal interval, and, in combination with the fluctuation parameters of multiple dimensions, the fluctuation of the business data is determined to be normal.
In addition, the scheme can abstract the monitored data source into a data generator without restricting the specific data type input by the user; the user only needs to provide an interface capable of outputting data and configure it in the monitoring system. This includes, but is not limited to, mysql and hive data sources, and other executable scripts, such as python, shell and perl scripts, that can be invoked directly through shell.
To better implement the above method, the embodiment of the present invention also provides a data fluctuation identification device. As shown in Fig. 4a, the data fluctuation identification device may include: a data capture unit 401, a first parameter acquiring unit 402, a training unit 403, a second parameter acquiring unit 404 and a determination unit 405, as follows:
the data capture unit 401, configured to obtain the data value of the current data;
the first parameter acquiring unit 402, configured to obtain the first fluctuation parameter between the data value and a historical data value;
the training unit 403, configured to train a nonlinear regression model according to the training data sequence;
the second parameter acquiring unit 404, configured to obtain the current first prediction data value from the trained nonlinear regression model, and to obtain the second fluctuation parameter between the data value and the first prediction data value;
the determination unit 405, configured to determine, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal.
In one embodiment, with reference to Fig. 4 b, wherein determination unit 405 may include:
Training subelement 4051 is used for when the training data sequence is stationary sequence, according to the training data sequence Column are trained autoregressive moving-average model;
Parameter obtains subelement 4052, pre- for obtaining current second according to the autoregressive moving-average model after training Measured data value, and obtain the third fluctuation parameters between the data value and the second prediction data value;
First abnormal determining subelement 4053, for according to first fluctuation parameters, second fluctuation parameters and Third fluctuation parameters determine whether the fluctuation of the current data is abnormal;
Second abnormal determining subelement 4054, for when the training data sequence is not stationary sequence, according to described First fluctuation parameters and second fluctuation parameters determine whether the fluctuation of the current data is abnormal.
In one embodiment, the described first abnormal determining subelement 4053, is used for:
According to the acquisition of the parameter value of first fluctuation parameters, second fluctuation parameters and third fluctuation parameters The final fluctuation parameters value of current data;
When the final fluctuation parameters value within a preset range when, determine that the fluctuation of the current data is normal;
When the final fluctuation parameters value not within a preset range when, determine that the fluctuation of the current data is abnormal.
In one embodiment, the described first abnormal determining subelement 4053, can be specifically used for:
The parameter value of first fluctuation parameters, second fluctuation parameters and third fluctuation parameters is weighted flat It handles, obtains weighted average parameter value;
Using the weighted average parameter value as the final fluctuation parameters value of the current data.
In one embodiment, the data capture unit 401 can be configured to:
obtain data from a data source to obtain the current data;
convert the current data into data in a uniform data format, and obtain the data value of the converted data.
In one embodiment, the training unit 403 can be configured to:
determine the number of model parameters to be estimated in the nonlinear regression model;
solve the model parameters to be estimated of the nonlinear regression model based on the least squares method and the training data sequence, to obtain the trained nonlinear regression model.
In one embodiment, the training subunit 4051 can be configured to:
determine the lag orders of the autoregressive moving average model based on autocorrelation function analysis and partial autocorrelation function analysis;
solve the model parameters to be estimated of the autoregressive moving average model based on the least squares method, the lag orders and the training data sequence, to obtain the trained autoregressive moving average model.
For the steps performed by each of the above units, refer to the description of the above method embodiment.
In specific implementation, each of the above units can be implemented as an independent entity, or the units can be combined arbitrarily and implemented as one or several entities. For the specific implementation of each unit, refer to the preceding method embodiment; details are not repeated here.
The data fluctuation identification device can be integrated in a server, such as a monitoring server.
From the foregoing, in the data fluctuation identification device of the embodiment of the present invention, the data capture unit 401 obtains the data value of the current data; the first parameter acquiring unit 402 obtains the first fluctuation parameter between the data value and a historical data value; the training unit 403 trains a nonlinear regression model according to the training data sequence; the second parameter acquiring unit 404 obtains the current first prediction data value from the trained nonlinear regression model, and obtains the second fluctuation parameter between the data value and the first prediction data value; and the determination unit 405 determines, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal. The scheme obtains fluctuation parameters of the data in multiple dimensions (such as the first and second fluctuation parameters) and determines whether the data fluctuation is abnormal based on these multidimensional fluctuation parameters, so the identification accuracy of data fluctuations can be improved.
In addition, the scheme can add ARMA model prediction to increase the dimensions of the fluctuation indicators: multidimensional prediction is performed on the business data to obtain fluctuation indicators in multiple dimensions (such as the first, second and third fluctuation parameters), and whether the data fluctuation is abnormal is determined based on these indicators, which further improves the accuracy and flexibility of data fluctuation identification.
With reference to Fig. 5, an embodiment of the present invention provides a server 500, which may include components such as a processor 501 with one or more processing cores, a memory 502 of one or more computer-readable storage media, a radio frequency (Radio Frequency, RF) circuit 503, a power supply 504 and an input unit 505. Those skilled in the art will understand that the server architecture shown in Fig. 5 does not constitute a limitation on the server, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. Wherein:
The processor 501 is the control center of the server. It connects the various parts of the entire server through various interfaces and lines, and performs the various functions of the server and processes data by running or executing the software programs and/or modules stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the server as a whole. Optionally, the processor 501 may include one or more processing cores. Preferably, the processor 501 can integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 501.
The memory 502 can be used to store software programs and modules. The processor 501 executes various functional applications and data processing by running the software programs and modules stored in the memory 502.
The RF circuit 503 can be used to receive and send signals in the course of receiving and sending information. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 501 for processing; in addition, it sends uplink data to the base station.
The server further includes a power supply 504 (such as a battery) that supplies power to all components. Preferably, the power supply can be logically connected to the processor 501 through a power management system, so that functions such as charging management, discharging management and power consumption management are realized through the power management system. The power supply 504 can also include any components such as one or more direct current or alternating current power sources, a recharging system, a power failure detection circuit, a power adapter or inverter, and a power status indicator.
The server may also include an input unit 505, which can be used to receive input numeric or character information.
Specifically, in the present embodiment, the processor 501 in the server loads the executable files corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and runs the application programs stored in the memory 502, thereby realizing various functions, as follows:
obtaining the data value of the current data; obtaining the first fluctuation parameter between the data value and a historical data value; training a nonlinear regression model according to the training data sequence; obtaining the current first prediction data value from the trained nonlinear regression model, and obtaining the second fluctuation parameter between the data value and the first prediction data value; and determining, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal.
In some embodiments, when determining, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal, the processor 501 specifically executes the following steps:
when the training data sequence is a stationary sequence, training an autoregressive moving average model according to the training data sequence;
obtaining the current second prediction data value from the trained autoregressive moving average model, and obtaining the third fluctuation parameter between the data value and the second prediction data value;
determining, according to the first fluctuation parameter, the second fluctuation parameter and the third fluctuation parameter, whether the fluctuation of the current data is abnormal;
when the training data sequence is not a stationary sequence, determining, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal.
In some embodiments, when determining whether the fluctuation of the current data is abnormal according to the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter, the processor 501 specifically performs the following steps:
Obtain a final fluctuation parameter value of the current data according to the parameter values of the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter;
When the final fluctuation parameter value is within a preset range, determine that the fluctuation of the current data is normal;
When the final fluctuation parameter value is not within the preset range, determine that the fluctuation of the current data is abnormal.
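A minimal sketch of this decision step, assuming the final value is the weighted average of the individual parameter values (the combination named in claim 4). The weights, the preset range, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def final_fluctuation(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the fluctuation parameter values."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def is_abnormal(final_value: float, preset_range: Tuple[float, float]) -> bool:
    """Abnormal when the final value falls outside the closed preset range."""
    low, high = preset_range
    return not (low <= final_value <= high)


# Hypothetical parameter values, weights, and range.
final_value = final_fluctuation([0.2, 0.1, 0.3], [0.5, 0.25, 0.25])
abnormal = is_abnormal(final_value, (-0.15, 0.15))
```

Normalizing by the sum of the weights means the weights do not have to add up to 1, so a parameter can be dropped (for a non-stationary sequence) without rescaling the rest.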
In some embodiments, when obtaining the data value of the current data, the processor 501 specifically performs the following steps:
Acquire data from a data source to obtain the current data;
Convert the current data into data in a unified data format, and obtain the data value of the converted data.
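The conversion to a unified data format might look like the sketch below. The two raw record layouts, the `DataPoint` type, and every field name are invented for illustration; the source does not describe the actual formats.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Mapping


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical unified format shared by all data sources."""
    metric: str
    timestamp: datetime
    value: float


def to_uniform(record: Mapping[str, Any]) -> DataPoint:
    """Convert a raw record from one of two assumed layouts.

    Layout A: {"metric": str, "ts": epoch seconds, "value": number}
    Layout B: {"name": str, "time": ISO 8601 string, "val": numeric string}
    """
    if "ts" in record:
        ts = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
        return DataPoint(record["metric"], ts, float(record["value"]))
    ts = datetime.fromisoformat(record["time"])
    return DataPoint(record["name"], ts, float(record["val"]))


# Two sources reporting the same observation in different shapes.
point_a = to_uniform({"metric": "pv", "ts": 0, "value": 120})
point_b = to_uniform({"name": "pv", "time": "1970-01-01T00:00:00+00:00", "val": "120.0"})
data_value = point_a.value
```

After conversion, the downstream steps only ever read `DataPoint.value`, regardless of which source produced the record.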
In some embodiments, when training the nonlinear regression model according to the training data sequence, the processor 501 specifically performs the following steps:
Determine the number of model parameters to be estimated in the nonlinear regression model;
Solve the model parameters to be estimated of the nonlinear regression model based on the least squares method and the training data sequence, to obtain the trained nonlinear regression model.
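The two training steps can be sketched with a polynomial trend model, which is nonlinear in time but linear in its coefficients, so ordinary least squares applies directly. The polynomial form, the time index 0, 1, 2, ..., and all function names are assumptions for this example; the model actually used may differ.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(y: Sequence[float], n_params: int) -> List[float]:
    """Least-squares fit of y_t = sum_k beta_k * t**k for k < n_params."""
    design = [[float(t) ** k for k in range(n_params)] for t in range(len(y))]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(n_params)]
           for i in range(n_params)]
    xty = [sum(r[i] * yt for r, yt in zip(design, y)) for i in range(n_params)]
    return solve_linear(xtx, xty)


def predict(beta: Sequence[float], t: int) -> float:
    """Evaluate the fitted polynomial at time index t."""
    return sum(b * float(t) ** k for k, b in enumerate(beta))
```

Here `n_params` plays the role of the number of model parameters to be estimated, and `predict(beta, len(y))` would give the predicted value for the next point in the sequence.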
In some embodiments, when training the autoregressive moving average model according to the training data sequence, the processor 501 specifically performs the following steps:
Determine the lag order of the autoregressive moving average model based on autocorrelation function analysis and partial autocorrelation function analysis;
Solve the model parameters to be estimated of the autoregressive moving average model based on the least squares method, the lag order, and the training data sequence, to obtain the trained autoregressive moving average model.
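The sketch below illustrates both steps under a deliberate simplification: it fits only the autoregressive part, AR(p), and leaves out the moving average terms. The lag order is taken as the largest lag whose partial autocorrelation exceeds the usual approximate 95% bound of 1.96/sqrt(n); that selection rule and all function names are assumptions, not details from the source.

```python
import math
from typing import List, Sequence


def acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation for lags 0..max_lag."""
    n, mean = len(x), sum(x) / len(x)
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def pacf(rho: Sequence[float]) -> List[float]:
    """Partial autocorrelations for lags 1..K from rho[0..K] (Durbin-Levinson)."""
    phi_prev: List[float] = []
    out: List[float] = []
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def select_lag_order(pacf_values: Sequence[float], n_obs: int) -> int:
    """Largest lag whose |PACF| exceeds the approximate 95% bound."""
    bound = 1.96 / math.sqrt(n_obs)
    lags = [lag for lag, v in enumerate(pacf_values, start=1) if abs(v) > bound]
    return max(lags, default=0)


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ar(x: Sequence[float], p: int) -> List[float]:
    """Least-squares fit of x_t = c + phi_1*x_{t-1} + ... + phi_p*x_{t-p}.

    Returns [c, phi_1, ..., phi_p].
    """
    rows = [[1.0] + [x[t - i] for i in range(1, p + 1)] for t in range(p, len(x))]
    targets = [x[t] for t in range(p, len(x))]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)
```

A typical call sequence would be `p = select_lag_order(pacf(acf(series, max_lag)), len(series))` followed by `fit_ar(series, p)`. Estimating the moving average terms as well generally requires an iterative procedure, which is outside the scope of this sketch.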
The server of the embodiment of the present invention can obtain a data value of current data; obtain a first fluctuation parameter between the data value and a historical data value; train a nonlinear regression model according to a training data sequence; obtain a current first predicted data value according to the trained nonlinear regression model, and obtain a second fluctuation parameter between the data value and the first predicted data value; and determine, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal. This scheme obtains fluctuation parameters of the data in multiple dimensions (for example, the first fluctuation parameter and the second fluctuation parameter) and determines whether the data fluctuation is abnormal based on these multi-dimensional fluctuation parameters; it can therefore improve the accuracy of data fluctuation identification.
Those of ordinary skill in the art will understand that all or part of the steps in the various methods of the above embodiments may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include: read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, an optical disc, or the like.
The data fluctuation identification method, device, and storage medium provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. Meanwhile, those skilled in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (14)

1. A data fluctuation identification method, characterized by comprising:
obtaining a data value of current data;
obtaining a first fluctuation parameter between the data value and a historical data value;
training a nonlinear regression model according to a training data sequence;
obtaining a current first predicted data value according to the trained nonlinear regression model, and obtaining a second fluctuation parameter between the data value and the first predicted data value;
determining, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal.
2. The data fluctuation identification method according to claim 1, characterized in that determining, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal comprises:
when the training data sequence is a stationary sequence, training an autoregressive moving average model according to the training data sequence;
obtaining a current second predicted data value according to the trained autoregressive moving average model, and obtaining a third fluctuation parameter between the data value and the second predicted data value;
determining, according to the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter, whether the fluctuation of the current data is abnormal;
when the training data sequence is not a stationary sequence, determining, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal.
3. The data fluctuation identification method according to claim 1, characterized in that determining, according to the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter, whether the fluctuation of the current data is abnormal comprises:
obtaining a final fluctuation parameter value of the current data according to the parameter values of the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter;
when the final fluctuation parameter value is within a preset range, determining that the fluctuation of the current data is normal;
when the final fluctuation parameter value is not within the preset range, determining that the fluctuation of the current data is abnormal.
4. The data fluctuation identification method according to claim 3, characterized in that obtaining the final fluctuation parameter value of the current data according to the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter comprises:
performing weighted averaging on the parameter values of the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter to obtain a weighted average parameter value;
using the weighted average parameter value as the final fluctuation parameter value of the current data.
5. The data fluctuation identification method according to claim 1, characterized in that obtaining the data value of the current data comprises:
acquiring data from a data source to obtain the current data;
converting the current data into data in a unified data format, and obtaining the data value of the converted data.
6. The data fluctuation identification method according to claim 1, characterized in that training the nonlinear regression model according to the training data sequence comprises:
determining the number of model parameters to be estimated in the nonlinear regression model;
solving the model parameters to be estimated of the nonlinear regression model based on the least squares method and the training data sequence, to obtain the trained nonlinear regression model.
7. The data fluctuation identification method according to claim 2, characterized in that training the autoregressive moving average model according to the training data sequence comprises:
determining the lag order of the autoregressive moving average model based on autocorrelation function analysis and partial autocorrelation function analysis;
solving the model parameters to be estimated of the autoregressive moving average model based on the least squares method, the lag order, and the training data sequence, to obtain the trained autoregressive moving average model.
8. A data fluctuation identification device, characterized by comprising:
a data acquisition unit, configured to obtain a data value of current data;
a first parameter acquisition unit, configured to obtain a first fluctuation parameter between the data value and a historical data value;
a training unit, configured to train a nonlinear regression model according to a training data sequence;
a second parameter acquisition unit, configured to obtain a current first predicted data value according to the trained nonlinear regression model, and to obtain a second fluctuation parameter between the data value and the first predicted data value;
a determination unit, configured to determine, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal.
9. The data fluctuation identification device according to claim 8, characterized in that the determination unit comprises:
a training subunit, configured to train an autoregressive moving average model according to the training data sequence when the training data sequence is a stationary sequence;
a parameter acquisition subunit, configured to obtain a current second predicted data value according to the trained autoregressive moving average model, and to obtain a third fluctuation parameter between the data value and the second predicted data value;
a first abnormality determination subunit, configured to determine, according to the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter, whether the fluctuation of the current data is abnormal;
a second abnormality determination subunit, configured to determine, according to the first fluctuation parameter and the second fluctuation parameter, whether the fluctuation of the current data is abnormal when the training data sequence is not a stationary sequence.
10. The data fluctuation identification device according to claim 9, characterized in that the first abnormality determination subunit is configured to:
obtain a final fluctuation parameter value of the current data according to the parameter values of the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter;
when the final fluctuation parameter value is within a preset range, determine that the fluctuation of the current data is normal;
when the final fluctuation parameter value is not within the preset range, determine that the fluctuation of the current data is abnormal.
11. The data fluctuation identification device according to claim 8, characterized in that the data acquisition unit is configured to:
acquire data from a data source to obtain the current data;
convert the current data into data in a unified data format, and obtain the data value of the converted data.
12. The data fluctuation identification device according to claim 8, characterized in that the training unit is configured to:
determine the number of model parameters to be estimated in the nonlinear regression model;
solve the model parameters to be estimated of the nonlinear regression model based on the least squares method and the training data sequence, to obtain the trained nonlinear regression model.
13. The data fluctuation identification device according to claim 9, characterized in that the training subunit is configured to:
determine the lag order of the autoregressive moving average model based on autocorrelation function analysis and partial autocorrelation function analysis;
solve the model parameters to be estimated of the autoregressive moving average model based on the least squares method, the lag order, and the training data sequence, to obtain the trained autoregressive moving average model.
14. A storage medium, characterized in that the storage medium stores instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
CN201810214976.7A 2018-03-15 2018-03-15 Data fluctuation identification method and device and storage medium Active CN110275809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810214976.7A CN110275809B (en) 2018-03-15 2018-03-15 Data fluctuation identification method and device and storage medium

Publications (2)

Publication Number Publication Date
CN110275809A true CN110275809A (en) 2019-09-24
CN110275809B CN110275809B (en) 2022-07-08

Family

ID=67957686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810214976.7A Active CN110275809B (en) 2018-03-15 2018-03-15 Data fluctuation identification method and device and storage medium

Country Status (1)

Country Link
CN (1) CN110275809B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113419141A (en) * 2021-08-26 2021-09-21 中国南方电网有限责任公司超高压输电公司广州局 Direct-current line fault positioning method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103685347A (en) * 2012-09-03 2014-03-26 阿里巴巴集团控股有限公司 Method and device for allocating network resources
CN106991285A (en) * 2017-04-01 2017-07-28 广东工业大学 A kind of short-term wind speed multistep forecasting method and device
CN107506871A (en) * 2017-09-08 2017-12-22 广东工业大学 A kind of method and system of interval prediction
US20180041527A1 (en) * 2013-03-15 2018-02-08 Shape Security, Inc. Using instrumentation code to detect bots or malware

Also Published As

Publication number Publication date
CN110275809B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN107704387B (en) Method, device, electronic equipment and computer readable medium for system early warning
EP4020315A1 (en) Method, apparatus and system for determining label
CN114298322B (en) Federal learning method and apparatus, system, electronic device, and computer readable medium
CN110474896A (en) Data communications method and relevant device based on Modbus consensus standard
CN114500339B (en) Node bandwidth monitoring method and device, electronic equipment and storage medium
CN109558248A (en) A kind of method and system for the determining resource allocation parameters calculated towards ocean model
CN110275809A (en) A kind of data fluctuations recognition methods, device and storage medium
CN110688098A (en) Method and device for generating system framework code, electronic equipment and storage medium
CN113886006A (en) Resource scheduling method, device and equipment and readable storage medium
CN114019400A (en) Lithium battery life cycle monitoring and management method, system and storage medium
CN111901405B (en) Multi-node monitoring method and device, electronic equipment and storage medium
CN110389876B (en) Method, device and equipment for supervising basic resource capacity and storage medium
CN110175083A (en) The monitoring method and device of operating system
CN111046082A (en) Data source determination method, device, server and storage medium
CN107590012B (en) Equipment disconnection reason analysis method and device, storage medium and electronic equipment
CN113449008B (en) Modeling method and device
CN114446427A (en) Electronic equipment and health data attribution identification method
CN110220639A (en) Pressure gauge meter register method, device and terminal device in substation
CN109067620A (en) The monitoring method and device of gateway
CN115473343B (en) Intelligent gateway multi-master-station parallel access test method
CN116703248B (en) Data auditing method, device, electronic equipment and computer readable storage medium
CN115081942B (en) Data processing method and related device
CN113656270B (en) Method, device, medium and computer program product for testing application performance
CN109031041B (en) Distribution network voltage monitoring device point distribution method and system
CN117236885A (en) Emergency treatment method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant