CN110275809B - Data fluctuation identification method and device and storage medium

Info

Publication number: CN110275809B
Application number: CN201810214976.7A
Authority: CN
Inventors: 阮航
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2018-03-15
Filing date: 2018-03-15
Publication date: 2022-07-08
Anticipated expiration: 2038-03-15
Also published as: CN110275809A

Abstract

Embodiments of the invention disclose a data fluctuation identification method, a data fluctuation identification device, and a storage medium. The method acquires the data value of the current data; acquires a first fluctuation parameter between the data value and a historical data value; trains a nonlinear regression model on a training data sequence; obtains a current first predicted data value from the trained nonlinear regression model and acquires a second fluctuation parameter between the data value and the first predicted data value; and determines whether the fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter. Because the scheme obtains fluctuation parameters of the data in multiple dimensions and bases the abnormality decision on these multidimensional parameters, it can improve the accuracy of data fluctuation identification.

Description

Data fluctuation identification method and device and storage medium

Technical Field

The invention relates to the technical field of computers, in particular to a data fluctuation identification method and device and a storage medium.

Background

To ensure quality of service, a data fluctuation identification scheme is used to monitor the fluctuation of each service metric and to detect anomalies in the reported data.

Existing data fluctuation identification schemes focus mainly on data acquisition. They are typically cloud-platform based: data is collected and aggregated on a cloud platform, and the cloud computing platform calculates the fluctuation of the data to determine whether the data is abnormal.

However, these schemes attend only to the platform on which fluctuation is identified and to the real-time performance of identification. For the business data of externally facing services, they can only "perceive" that the data has changed. They cannot accurately judge the occasional fluctuations that occur in routine business data, for example whether the current fluctuation falls within a normal range, so their accuracy in identifying data fluctuation is low.

Disclosure of Invention

The embodiment of the invention provides a data fluctuation identification method, a data fluctuation identification device and a storage medium, which can improve the identification accuracy of data fluctuation.

The embodiment of the invention provides a data fluctuation identification method, which comprises the following steps:

acquiring a data value of current data;

acquiring a first fluctuation parameter between the data value and a historical data value;

training the nonlinear regression model according to the training data sequence;

acquiring a current first predicted data value according to the trained nonlinear regression model, and acquiring a second fluctuation parameter between the data value and the first predicted data value;

and determining whether the fluctuation of the current data is abnormal or not according to the first fluctuation parameter and the second fluctuation parameter.

Correspondingly, an embodiment of the present invention further provides a data fluctuation identification apparatus, including:

the data acquisition unit is used for acquiring the data value of the current data;

a first parameter acquisition unit for acquiring a first fluctuation parameter between the data value and a historical data value;

the training unit is used for training the nonlinear regression model according to the training data sequence;

the second parameter obtaining unit is used for obtaining a current first prediction data value according to the trained nonlinear regression model and obtaining a second fluctuation parameter between the data value and the first prediction data value;

and the determining unit is used for determining whether the fluctuation of the current data is abnormal or not according to the first fluctuation parameter and the second fluctuation parameter.

Correspondingly, the embodiment of the present invention further provides a storage medium, where the storage medium stores instructions, and the instructions, when executed by a processor, implement the steps of any of the methods provided in the embodiment of the present invention.

The embodiment of the invention adopts the steps of obtaining the data value of the current data; acquiring a first fluctuation parameter between a data value and a historical data value; training the nonlinear regression model according to the training data sequence; acquiring a current first predicted data value according to the trained nonlinear regression model, and acquiring a second fluctuation parameter between the data value and the first predicted data value; and determining whether the fluctuation of the current data is abnormal or not according to the first fluctuation parameter and the second fluctuation parameter. According to the scheme, fluctuation parameters of the data on multiple dimensions can be acquired, and whether the data fluctuation is abnormal or not is determined based on the multidimensional fluctuation parameters, so that the identification accuracy of the data fluctuation can be improved.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1a is a schematic diagram of a scenario of a data fluctuation identification system according to an embodiment of the present invention;

FIG. 1b is a schematic flow chart of a data fluctuation identification method according to an embodiment of the present invention;

FIG. 1c is a schematic diagram of a solution of a non-linear regression model provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of a hysteresis order determination process provided by an embodiment of the present invention;

FIG. 3a is another schematic flow chart of a data fluctuation identification method according to an embodiment of the present invention;

FIG. 3b is a schematic diagram of a logic architecture of a data fluctuation identification method according to an embodiment of the present invention;

FIG. 4a is a schematic diagram of a first structure of a data fluctuation recognition apparatus according to an embodiment of the present invention;

FIG. 4b is a schematic diagram of a second structure of the data fluctuation recognition apparatus according to the embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a data fluctuation identification method, a data fluctuation identification device and a storage medium.

The embodiment of the invention provides a data fluctuation identification system, which can comprise any data fluctuation identification device provided by the embodiment of the invention. The data fluctuation identification device may be in a server, such as a fluctuation identification server.

In addition, the data fluctuation identification system may further include other devices, such as a terminal, where the terminal may be a mobile phone, a tablet computer, a notebook computer, or the like.

For example, referring to fig. 1a, there is provided a data fluctuation identification system comprising: a terminal 10 and a server 20, the terminal 10 and the server 20 being connected via a network 30. The network 30 includes network entities such as routers and gateways, which are shown schematically in the figure. The terminal 10 may communicate with the server 20 via a wired network or a wireless network to request a service on the server 20, such as downloading an application and/or an application update package and/or data information or service information related to the application from the server 20. The terminal 10 may be a mobile phone, a tablet computer, a notebook computer, etc., and fig. 1a illustrates the terminal 10 as a notebook computer. The terminal 10 may also have various user-desired applications installed therein, such as entertainment-enabled applications (e.g., image processing applications, audio playback applications, gaming applications, reading software), and service-enabled applications.

The terminal 10 may report data to the server 20, and the server 20 may obtain the data value of the data locally. The server 20 then acquires a first fluctuation parameter between the data value and the historical data value; trains the nonlinear regression model according to the training data sequence; obtains a current first predicted data value from the trained nonlinear regression model and acquires a second fluctuation parameter between the data value and the first predicted data value; and determines whether the fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter.

In addition, the server 20 may issue an alarm when it determines that the fluctuation of the data value is abnormal.

The details will be described below separately.

The present embodiment will be described from the perspective of a data fluctuation recognizing apparatus, which may be specifically a server or the like.

As shown in FIG. 1b, a data fluctuation identification method is provided, which may be executed by a processor in a server. The specific flow may be as follows:

101. Acquire the data value of the current data.

The current data is data currently acquired from a data source, such as data currently acquired from a fluctuation identification data source. The current data may be data obtained from a data source today.

The data sources for fluctuation identification are varied: for example, a database such as MySQL or Hive, a single file, a distributed file system (HDFS), or even a piece of executable code (a shell script). The type and format of the acquired data therefore differ from source to source.

In order to improve the data fluctuation identification efficiency, optionally, the data format or type can also be normalized. That is, the step of "obtaining the data value of the current data" may include:

acquiring data from a data source to obtain current data;

and converting the current data into data with a uniform data format to obtain a data value of the converted data.

Specifically, each data source for fluctuation identification may be abstracted into a corresponding data generator. The generator acquires the data for the current time from its data source and converts it into a uniform data format.

The data sources may be abstracted in a JDBC-like manner. In practice, the generator abstraction can be implemented in a data abstraction layer, whose core function is to parse and adapt the various data sources and produce a corresponding generator for each, so that after passing through the data abstraction layer every data source is presented externally in the generator's data format.

In other words, the data abstraction layer mainly normalizes data types. Because the data sources for fluctuation identification vary widely, sources in different formats must be given a uniform structure before fluctuation is identified: each is re-described as a generator, which acquires data from the data source and provides a uniform reporting format to the fluctuation identification logic layer (the layer that identifies fluctuations).
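The generator abstraction described above can be sketched as follows. This is only an illustrative sketch: the `Record` format, the `make_generator` helper, and the two stand-in sources are hypothetical names and data that do not appear in the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

@dataclass(frozen=True)
class Record:
    """Hypothetical uniform reporting format: one metric value per date."""
    metric: str
    date: str
    value: float

def make_generator(metric: str,
                   fetch_rows: Callable[[], Iterable[tuple]]) -> Callable[[], Iterator[Record]]:
    """Wrap a source-specific fetch function as a generator of uniform Records."""
    def generator() -> Iterator[Record]:
        for date, raw_value in fetch_rows():
            # Sources may hand back ints or strings; normalize every value to float.
            yield Record(metric=metric, date=str(date), value=float(raw_value))
    return generator

# Stand-in for a database query result (assumed data).
def rows_from_database():
    return [("2018-03-14", 1200), ("2018-03-15", 1260)]

# Stand-in for a comma-separated file (assumed data).
def rows_from_file():
    text = "2018-03-14,98.5\n2018-03-15,97.0\n"
    return [tuple(line.split(",")) for line in text.splitlines()]

db_gen = make_generator("daily_active_users", rows_from_database)
file_gen = make_generator("success_rate", rows_from_file)
records = list(db_gen()) + list(file_gen())
```

Whatever the native format of a source, the fluctuation identification logic only ever sees `Record` objects, which is the point of the abstraction layer.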

102. Acquire a first fluctuation parameter between the data value and the historical data value.

The historical data value is the value of data previously acquired from the data source, that is, the value of data acquired from the data source before the current time. For example, the value may be the last data retrieved from the data source.

For example, the historical data value may be the value of the data obtained from the data source yesterday.

The first fluctuation parameter measures the magnitude of change of the data value, for example the change of the data value of the current data relative to the historical data value. For example, the first fluctuation parameter may be the fluctuation rate, denoted confidence, obtained by dividing the difference between the data value of the current data and the historical data value by the historical data value, as follows:

$$\text{confidence} = \frac{x - x_{-1}}{x_{-1}}$$

where $x$ is the data value of the current data and $x_{-1}$ is the historical data value.
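As a minimal sketch of this calculation (the function name and the sample values are assumptions for illustration), the first fluctuation parameter can be computed as:

```python
def first_fluctuation(current: float, previous: float) -> float:
    """Relative change of the current value against the historical value."""
    if previous == 0:
        # The ratio is undefined for a zero historical value; how to handle
        # this case is not specified in the embodiment.
        raise ValueError("historical value is zero; the ratio is undefined")
    return (current - previous) / previous

# Yesterday's value was 1200 and today's is 1260 (assumed sample values).
confidence = first_fluctuation(current=1260.0, previous=1200.0)
print(round(confidence, 4))  # 0.05
```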

103. Train the nonlinear regression model according to the training data sequence.

Linear regression works as follows. Given a data sequence such as a time series (a sequence of values of the same statistical index arranged in order of occurrence) that exhibits linear behavior, the sequence is expressed by the linear regression model $Y = WX + b$, where $W$ and $b$ are the parameters to be estimated. Linear regression computes the two-norm between the known samples and the function $Y = WX + b$ and minimizes it, so that the regression equation $Y = WX + b$ comes as close as possible to the existing time series samples.

Nonlinear regression is similar, except that the function to be estimated, $Y = f(X)$, is a nonlinear function; the regression equation $Y = f(X)$ is likewise fitted to the existing time series samples by minimizing the two-norm.
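Written out, the two-norm criterion for the two cases can be stated as the following least-squares objectives. The sample notation $(x_i, y_i)$, $i = 1, \dots, n$, is assumed here for illustration and does not appear in the embodiment.

```latex
\min_{W,\,b} \; \sum_{i=1}^{n} \bigl( y_i - (W x_i + b) \bigr)^2
\qquad \text{and} \qquad
\min_{f} \; \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
```

Minimizing the sum of squared residuals is equivalent to minimizing the two-norm of the residual vector, since the square root is monotonic.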

The nonlinear regression model may be various, such as a hyperbolic model, a power function model, a nonlinear polynomial model, and the like.

For example, taking the nonlinear regression model as the nonlinear polynomial model as an example, the model expression of the nonlinear polynomial model is:

$$Y_T = a_0 + a_1 T^1 + a_2 T^2 + \dots + a_p T^p$$

where $p$ is the degree of the polynomial. Any curve, surface, or hypersurface can be approximated arbitrarily closely by a polynomial within a certain range, and $p$ determines the degree of approximation. In the embodiment of the present invention, $p$ is preferably 4. The coefficients $a$ are the model parameters to be estimated, and they are solved by the least squares method.

The training data sequence is a time sequence and comprises a plurality of historical data values, and the historical data are arranged according to the corresponding time sequence.

In the embodiment of the invention, the nonlinear regression model is trained according to the training data sequence, namely, the model parameters to be estimated of the nonlinear regression model are solved according to the training data sequence. For example, taking a nonlinear regression model as a nonlinear polynomial model as an example, training the model is to solve the model parameter a to be estimated.

Specifically, the step "training the non-linear regression model according to the training data sequence" may include:

determining the number of model parameters to be estimated in the nonlinear regression model;

and solving the model parameters to be estimated of the nonlinear regression model based on a least square method and a training data sequence to obtain the trained nonlinear regression model.

For example, for the nonlinear polynomial model

$$Y_T = a_0 + a_1 T^1 + a_2 T^2 + \dots + a_p T^p$$

the number of parameters is set directly to 4, that is, $p = 4$. In the embodiment of the present invention, a reasonable curve can be fitted with 4 parameters. In theory, more parameters bring the fitted values closer to the actual values, but too many parameters cause overfitting. The embodiment therefore uses 4 parameters to achieve the best fit.

Referring to FIG. 1c, given the known actual values $Y_t$ (that is, the training data sequence), all coefficients $a$ are found by calling a least squares routine in Python so that the squared error $(Y_t - \hat{Y}_t)^2$ between the actual value $Y_t$ and the fitted value $\hat{Y}_t$ is minimized.
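The embodiment only says that a Python least squares routine is called; it does not name one. The sketch below is a hand-written stand-in that solves the normal equations with Gaussian elimination, so that it depends only on the standard library (in practice a library routine such as `numpy.polyfit` could play this role). The names `solve_linear`, `fit_polynomial`, and `predict`, and the sample series, are assumptions for illustration.

```python
from typing import List, Sequence

def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; need more distinct time points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_polynomial(ts: Sequence[float], ys: Sequence[float], p: int = 4) -> List[float]:
    """Least-squares coefficients [a0, ..., ap] via the normal equations."""
    ata = [[sum(t ** (i + j) for t in ts) for j in range(p + 1)] for i in range(p + 1)]
    aty = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(p + 1)]
    return solve_linear(ata, aty)

def predict(coeffs: Sequence[float], t: float) -> float:
    """Evaluate a0 + a1*t + ... + ap*t**p."""
    return sum(a * t ** k for k, a in enumerate(coeffs))

# Assumed training sequence: seven daily values indexed T = 0..6.
ts = [0, 1, 2, 3, 4, 5, 6]
ys = [2 + 0.5 * t - 0.1 * t ** 2 for t in ts]
coeffs = fit_polynomial(ts, ys, p=4)
next_value = predict(coeffs, 7)  # prediction for the next time index
```

The normal equations become ill-conditioned for large time indices, so a real implementation would likely rescale the time axis or use a numerically more robust solver.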

104. Acquire a current first predicted data value according to the trained nonlinear regression model, and acquire a second fluctuation parameter between the data value and the first predicted data value.

The current first predicted data value is a predicted data value of the current time, such as a predicted data value of today.

In the embodiment of the invention, after the nonlinear regression model is trained, the model parameters to be estimated of the nonlinear regression model can be obtained, so that the trained nonlinear regression model is obtained.

For example, take the nonlinear polynomial model $Y_T = a_0 + a_1 T^1 + a_2 T^2 + \dots + a_p T^p$ as the nonlinear regression model. After all coefficients $a$ are solved, the trained nonlinear polynomial model (denoted $g(x)$) is obtained. The current first predicted data value, that is, the predicted value of the current data (such as the value predicted for today), can then be obtained from the trained model $Y_T = a_0 + a_1 T^1 + a_2 T^2 + \dots + a_p T^p$.

In the embodiment of the present invention, after the current first predicted data value is obtained, a second fluctuation parameter between the data value of the current data and the first predicted data value may also be obtained.

The second fluctuation parameter also measures the magnitude of change of the data value, for example the change of the data value of the current data relative to the predicted data value. It may be denoted confidence' and obtained by dividing the difference between the data value of the current data and the predicted data value by the data value of the current data, as follows:

$$\text{confidence}' = \frac{x - g(x)}{x}$$

where $x$ is the data value of the current data and $g(x)$ is the data value predicted by the nonlinear regression model.
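A minimal sketch of step 104, assuming the polynomial coefficients have already been solved: the coefficients, the time index, and the current value below are hypothetical, as are the function names.

```python
from typing import Sequence

def evaluate_polynomial(coeffs: Sequence[float], t: float) -> float:
    """Evaluate a0 + a1*t + ... + ap*t**p with Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * t + a
    return result

def second_fluctuation(current: float, predicted: float) -> float:
    """Deviation from the prediction, relative to the current value."""
    if current == 0:
        # Undefined ratio; handling of this case is not specified in the embodiment.
        raise ValueError("current value is zero; the ratio is undefined")
    return (current - predicted) / current

# Hypothetical trained coefficients [a0, a1, a2, a3, a4] and time index T = 10.
coeffs = [100.0, 2.0, 0.5, 0.0, 0.0]
predicted = evaluate_polynomial(coeffs, t=10)  # 100 + 20 + 50 = 170
confidence_prime = second_fluctuation(current=200.0, predicted=predicted)
print(confidence_prime)  # 0.15
```

Note that, unlike the first fluctuation parameter, the denominator here is the current value rather than the reference value, as stated in the formula above.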

In the embodiment of the present invention, the first fluctuation parameter and the second fluctuation parameter may be acquired in any order, for example simultaneously or one after the other.

105. Determine whether the fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter.

Alternatively, a final fluctuation parameter value of the current data may be obtained according to the first fluctuation parameter and the second fluctuation parameter, and then, whether the fluctuation of the current data is abnormal or not may be determined according to the final fluctuation parameter value.

For example, when the final fluctuation parameter value is within a preset range, determining that the fluctuation of the current data is normal;

and when the final fluctuation parameter value is not in the preset range, determining that the fluctuation of the current data is abnormal.

For example, the parameter values of the first fluctuation parameter and the second fluctuation parameter may be aggregated, and the aggregated value used as the final fluctuation parameter value of the current data.

For example, the parameter values of the first fluctuation parameter and the second fluctuation parameter may be subjected to weighted average processing, and the weighted average value may be taken as the final fluctuation parameter value.

Assuming that the first fluctuation parameter is confidence and the second fluctuation parameter is confidence', the final fluctuation parameter value is

$$\text{confidence}_{\text{final}} = q_1 \cdot \text{confidence} + q_2 \cdot \text{confidence}'$$

where $q_1$ and $q_2$ are weights that may be set according to actual requirements, for example $q_1 = 0.3$ and $q_2 = 0.7$.
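The weighted combination and the range check of step 105 can be sketched as follows. The weights 0.3 and 0.7 are the example values from the text; the preset range of ±0.2 and the function names are assumptions, since the embodiment does not give concrete bounds.

```python
def final_fluctuation(confidence: float, confidence_prime: float,
                      q1: float = 0.3, q2: float = 0.7) -> float:
    """Weighted combination of the first and second fluctuation parameters."""
    return q1 * confidence + q2 * confidence_prime

def is_abnormal(value: float, lower: float = -0.2, upper: float = 0.2) -> bool:
    """Abnormal when the final value falls outside the preset range [lower, upper]."""
    return not (lower <= value <= upper)

# 0.3 * 0.05 + 0.7 * 0.15 = 0.015 + 0.105, approximately 0.12
final = final_fluctuation(confidence=0.05, confidence_prime=0.15)
abnormal = is_abnormal(final)
```

With these assumed bounds, a final value of about 0.12 is treated as normal, whereas a value such as 0.35 would be flagged as abnormal.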

In an embodiment, in order to further improve the accuracy of data fluctuation identification, an Auto-Regressive Moving Average Model (ARMA) may be further introduced to perform data prediction, a third fluctuation parameter between a data value of current data and a predicted value of the Model is obtained, and then whether the fluctuation of the current data is abnormal is determined based on the fluctuation parameters in three dimensions.

The ARMA model has a limitation: it can only predict on the basis of a stationary sequence and cannot be applied to a non-stationary one. That is, the sequence fitted by the ARMA model (the training data sequence) must be a stationary sequence. Therefore, before the ARMA model is used for prediction, it must be determined whether the training data sequence is a stationary sequence.

Optionally, the step of "determining whether the fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter" may include:

when the training data sequence is a stationary sequence, training an autoregressive moving average model according to the training data sequence;

acquiring a current second predicted data value according to the trained autoregressive moving average model, and acquiring a third fluctuation parameter between the data value and the second predicted data value;

determining whether the fluctuation of the current data is abnormal or not according to the first fluctuation parameter, the second fluctuation parameter and the third fluctuation parameter;

and when the training data sequence is not a stationary sequence, determining whether the fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter.
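The branching in the steps above can be sketched as follows. The embodiment does not say how stationarity is tested, so `looks_stationary` is a deliberately crude, assumed heuristic (comparing the mean and variance of the two halves of the sequence); a real implementation would more likely use a statistical unit-root test. All names and thresholds here are hypothetical.

```python
from statistics import mean, pvariance
from typing import Callable, List, Sequence

def looks_stationary(series: Sequence[float], tol: float = 0.25) -> bool:
    """Crude illustrative check: both halves have similar mean and variance."""
    if len(series) < 4:
        raise ValueError("need at least four observations")
    half = len(series) // 2
    first, second = series[:half], series[half:]
    scale = max(abs(mean(series)), 1e-9)    # avoid division by zero
    spread = max(pvariance(series), 1e-9)
    mean_shift = abs(mean(first) - mean(second)) / scale
    var_shift = abs(pvariance(first) - pvariance(second)) / spread
    return mean_shift <= tol and var_shift <= tol

def collect_parameters(series: Sequence[float], first: float, second: float,
                       third_from_arma: Callable[[], float]) -> List[float]:
    """Use the ARMA-based third parameter only when the series is stationary."""
    params = [first, second]
    if looks_stationary(series):
        params.append(third_from_arma())
    return params

steady = [10, 12, 11, 9, 10, 12, 11, 9]
trending = [1, 2, 3, 4, 5, 6, 7, 8]
# The lambda stands in for training ARMA and computing the third parameter.
with_arma = collect_parameters(steady, 0.05, 0.15, lambda: 0.10)
without_arma = collect_parameters(trending, 0.05, 0.15, lambda: 0.10)
```

Passing the ARMA step as a callable means it is only evaluated when the stationarity check succeeds, mirroring the conditional training described above.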

A stationary sequence is a time series whose expected value shows no trend, whose variance does not change greatly, whose values are only weakly dependent on the current time point, and which has no obvious periodicity, for example the simplest arithmetic series.

The ARMA model is widely used in econometrics to analyze future trends in data. It models the variation of a data set by combining an AR (Auto-Regressive) model and an MA (Moving-Average) model. The expression of the ARMA model is:

$$Y_t = a_0 + a_1 Y_{t-1} + a_2 Y_{t-2} + \dots + a_p Y_{t-p} + b_1 e_t + b_2 e_{t-2} + \dots + b_q e_{t-q}$$

where $e_t$ follows a distribution with expected value $E(e_t) = 0$ and variance $D(e_t) = d^2$, and $e_t$ and $e_{t+n}$ are mutually independent. $a$ and $b$ are the parameters to be estimated; once the model is given, their values must be solved to obtain the expression $F(Y_t)$ of $Y_t$. The solution can be obtained by the least squares method.

Autoregressive model (AutoRegression, AR), abbreviated AR(p): a stochastic process of the form $Y_T = A_1 Y_{T-1} + A_2 Y_{T-2} + \dots + A_P Y_{T-P} + U_T$, where $A_1, A_2, \dots, A_P$ are the $P$ parameters to be solved and $P$ is the number of lag periods.

Moving average model: again taking the autoregressive process as an example, $Y_T$ is obtained from $Y_T, \dots, Y_{T-P}$ by a difference operation, giving $Y_T = U_T - A_1 U_{T-1} - \dots - A_P U_{T-P}$.

In the embodiment of the present invention, when the training data sequence is a stationary sequence (that is, the expectation, variance, and autocorrelation function of $Y_t$ do not depend on $t$), the ARMA model can be used to fit it. Specifically, the ARMA model is trained on the stationary sequence, that is, its model parameters to be estimated, such as $a$ and $b$ in the ARMA model expression above, are solved.

When the training data sequence is not a stationary sequence, the ARMA model cannot be used to fit it, so the ARMA model is not used to predict the data value. In that case the training data sequence is fitted with the nonlinear regression model, a data value is predicted from that model, and whether the fluctuation of the data is abnormal is determined from the fluctuation parameter between the current data and the predicted data value together with the fluctuation parameter between the current data and the historical data value.

In the embodiment of the present invention, the nonlinear regression model serves two purposes. On the one hand, it supplements ARMA: when the data sequence is non-stationary, the data value is predicted with the nonlinear regression model. On the other hand, when the sequence is stationary, it is used together with the ARMA model to produce more reasonable predicted values and fluctuation parameters in several dimensions, which improves the accuracy of data fluctuation identification.

The ARMA model is trained based on the training data sequence, namely model parameters to be estimated of the ARMA model are solved.

Solving the model parameters of the ARMA model mainly requires determining its lag order (also called the hysteresis order or lag period), that is, the parameters $p$ and $q$ in the ARMA model expression above.

The embodiment of the present invention may determine the lag order by autocorrelation function analysis and partial autocorrelation function analysis, and then solve the model parameters to be estimated by the least squares method.

Assume that the ARMA model is $R = X_t + Y_t$, where $Y_t = b_1 e_t + b_2 e_{t-2} + \dots + b_q e_{t-q}$ and $X_t = a_0 + a_1 Y_{t-1} + a_2 Y_{t-2} + \dots + a_p Y_{t-p}$.

Referring to FIG. 2, the lag order determination process of the ARMA model is as follows:

201. Set the lag order q to 1 and p to 1.

202. Determine whether q is greater than 5; if not, go to step 203; if so, go to step 207.

203. Acquire the autocorrelation coefficient at lag order q.

That is, calculate the correlation coefficient between the sequences $Y_t$ and $Y_{t+q}$:

$$\operatorname{Cov}(Y_t, Y_{t+q}) = \frac{E\left[(Y_t - u_t)(Y_{t+q} - u_t)\right]}{D(Y_t)}$$

204. Determine whether the autocorrelation coefficient is zero; if so, go to step 206; if not, go to step 205.

205. Increment q by 1 and return to step 202.

206. Determine that q is the currently set value, and go to step 208.

After the lag order q is determined, $Y_t = b_1 e_t + b_2 e_{t-2} + \dots + b_q e_{t-q}$ is determined.

207. Set $Y_t$ to 0 and go to step 208.

When q > 5, the degree of fit is too low; $Y_t$ can then be set to zero, and the ARMA model becomes $R = X_t + 0$.

208. Determine whether the lag order p is greater than 5; if not, go to step 209; if so, go to step 213.

209. Acquire the partial autocorrelation coefficient at lag order p.

That is, calculate the partial correlation coefficient between the sequences $X_t$ and $X_{t+p}$:

$$\frac{E\{[x(t) - E x(t)]\,[x(t-k) - E x(t-k)]\}}{E\{[x(t-k) - E x(t-k)]^2\}}$$

210. Determine whether the partial autocorrelation coefficient is zero; if so, go to step 211; otherwise, go to step 212.

211. Determine that p is the currently set value, and end the flow.

After p is determined, $X_t = a_0 + a_1 Y_{t-1} + a_2 Y_{t-2} + \dots + a_p Y_{t-p}$ is determined.

212. Increment p by 1 and return to step 208.

213. Set $X_t$ to 0 and end the flow.

When p > 5, the degree of fit is too low; $X_t$ can then be set to zero, and the ARMA model becomes $R = 0 + Y_t$.
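The search loop of steps 201 to 213 can be sketched as follows. The flow tests whether a coefficient "is zero"; on sampled data an exact zero is unlikely, so the tolerance `tol` below is an assumption, as are the function names and the sample series. Only the autocorrelation coefficient is implemented; the same loop could be reused for p by passing a partial autocorrelation function.

```python
from typing import Callable, Optional, Sequence

def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation: lag covariance divided by the variance."""
    n = len(series)
    mu = sum(series) / n
    var = sum((y - mu) ** 2 for y in series) / n
    if var == 0:
        return 0.0  # a constant series has no lagged co-variation
    cov = sum((series[t] - mu) * (series[t + lag] - mu) for t in range(n - lag)) / n
    return cov / var

def select_lag(series: Sequence[float],
               coefficient: Callable[[Sequence[float], int], float],
               max_lag: int = 5, tol: float = 0.1) -> Optional[int]:
    """Return the first lag whose coefficient is (approximately) zero.

    Returns None when no lag up to max_lag qualifies, which corresponds to
    setting the component to zero in steps 207 and 213.
    """
    lag = 1
    while lag <= max_lag:
        if abs(coefficient(series, lag)) <= tol:
            return lag
        lag += 1
    return None

block = [1, 1, 1, 1, -1, -1, -1, -1]        # assumed sample series
alternating = [1, -1, 1, -1, 1, -1, 1, -1]  # assumed sample series
q_block = select_lag(block, autocorrelation, tol=0.2)
q_alternating = select_lag(alternating, autocorrelation)
```

For the `block` series the coefficients at lags 1, 2, and 3 are 0.625, 0.25, and -0.125, so with a tolerance of 0.2 the search stops at lag 3; for the `alternating` series no lag up to 5 is close to zero and the function returns `None`.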

The lag orders p and q of the ARMA model can be determined in this way. Once they are determined, the ARMA model can be constructed:

$$Y_t = a_0 + a_1 Y_{t-1} + a_2 Y_{t-2} + \dots + a_p Y_{t-p} + b_1 e_t + b_2 e_{t-2} + \dots + b_q e_{t-q}$$

Here, $R$ is denoted $Y_t$.

After the ARMA model is constructed, the model parameters to be estimated can be solved by the least squares method. For example, given the known actual values $Y_t$ (that is, the training data sequence), all $a$ and $b$ are found by calling a least squares routine in Python so that the squared error $(Y_t - \hat{Y}_t)^2$ between the actual value $Y_t$ and the fitted value $\hat{Y}_t$ is minimized.

After the model parameters to be estimated, such as $a$ and $b$, are solved, an exact expression of the ARMA model is obtained, such as the expression $F(Y_t)$ of the data $Y_t$. The current data value is then predicted with the ARMA model, yielding the current second predicted data value, such as the data value predicted for today. For example, the value of $Y_{t+1}$ can be obtained from $F(Y_{t+1})$.

After the second predicted data value is obtained by the ARMA model, a third fluctuation parameter between the data value of the current data and the second predicted data value may be obtained. And finally, determining whether the fluctuation of the current data is abnormal or not by combining the first fluctuation parameter, the second fluctuation parameter and the third fluctuation parameter.

The third fluctuation parameter also measures the magnitude of change of the data value, for example the change of the data value of the current data relative to the predicted data value. It may be denoted confidence'' and obtained by dividing the difference between the data value of the current data and the predicted data value by the data value of the current data, as follows:

$$\text{confidence}'' = \frac{x - \mathrm{ARMA}(x)}{x}$$

where $x$ is the data value of the current data and $\mathrm{ARMA}(x)$ is the data value predicted by the ARMA model.

Whether the fluctuation of the current data is abnormal may be determined from the three fluctuation parameters (the first, second, and third fluctuation parameters) in various ways. For example, a final fluctuation parameter value is obtained from the parameter values of the three fluctuation parameters, and whether the fluctuation of the data is abnormal is then determined based on the final fluctuation parameter value.

For example, the step "determining whether the fluctuation of the current data is abnormal according to the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter" may include:

acquiring a final fluctuation parameter value of the current data according to the parameter values of the first fluctuation parameter, the second fluctuation parameter and the third fluctuation parameter;

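when the final fluctuation parameter value is within a preset range, determining that the fluctuation of the current data is normal; and when the final fluctuation parameter value is not within the preset range, determining that the fluctuation of the current data is abnormal.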

The final fluctuation parameter value may be generated from the parameter values of the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter in various ways. For example, the parameter values of the three fluctuation parameters may be aggregated, and the aggregated parameter value used as the final fluctuation parameter value of the current data.

For example, in order to improve the accuracy and efficiency of data fluctuation identification, the parameter values of the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter may be subjected to weighted average processing, and the weighted average value may be used as the final fluctuation parameter value.

Assuming that the first fluctuation parameter is confidence, the second fluctuation parameter is confidence', and the third fluctuation parameter is confidence'', the final fluctuation parameter value is confidence_final = q1 * confidence + q2 * confidence' + q3 * confidence'', where q1, q2, and q3 are weights that may be set according to actual requirements, for example, q1 = 0.3, q2 = 0.3, q3 = 0.3, and so on.
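The weighted combination and the preset-range check can be sketched as follows. The function names, the sample parameter values, and the range [-0.15, 0.15] are assumptions chosen for illustration; only the weights 0.3, 0.3, 0.3 come from the text.

```python
from typing import Sequence


def final_fluctuation(params: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum q1*c1 + q2*c2 + ... of the fluctuation parameters."""
    if len(params) != len(weights):
        raise ValueError("each fluctuation parameter needs exactly one weight")
    return sum(q * c for q, c in zip(weights, params))


def is_normal(final_value: float, lower: float, upper: float) -> bool:
    """True when the final value lies inside the preset range [lower, upper]."""
    return lower <= final_value <= upper


# Hypothetical first, second, and third fluctuation parameters.
params = [-0.20, -0.10, -0.05]
weights = [0.3, 0.3, 0.3]  # example weights from the text
final_value = final_fluctuation(params, weights)  # about -0.105
normal = is_normal(final_value, -0.15, 0.15)      # assumed preset range
```

With these numbers the final value falls inside the assumed range, so the fluctuation would be treated as normal.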

As can be seen from the above, the embodiment of the present invention obtains the data value of the current data; acquiring a first fluctuation parameter between a data value and a historical data value; training the nonlinear regression model according to the training data sequence; acquiring a current first predicted data value according to the trained nonlinear regression model, and acquiring a second fluctuation parameter between the data value and the first predicted data value; and determining whether the fluctuation of the current data is abnormal or not according to the first fluctuation parameter and the second fluctuation parameter. According to the scheme, fluctuation parameters (such as a first fluctuation parameter and a second fluctuation parameter) of the data on multiple dimensions can be obtained, and whether the data fluctuation is abnormal or not is determined based on the multidimensional fluctuation parameters, so that the identification accuracy of the data fluctuation can be improved.

In addition, the scheme can also add an ARMA model for data prediction, which increases the dimensionality of the fluctuation indexes: multi-dimensional prediction is performed on the service data to obtain fluctuation indexes of multiple dimensions (such as the first, second, and third fluctuation parameters), and whether the data fluctuation is abnormal is determined based on these fluctuation indexes, which further improves the accuracy and flexibility of data fluctuation identification.

The method described in the above embodiments is further described in detail below.

Referring to fig. 3a and 3b, a data fluctuation identification method specifically includes the following steps:

301. Receive the data currently reported by the data generator, and acquire the data value of the current data.

In order to improve the data fluctuation identification efficiency, the data format or type may optionally be normalized. Each data source subject to fluctuation identification can be abstracted into a corresponding data generator (generator); the data generator acquires the data corresponding to the current time from its data source and converts the data into a uniform data format for reporting.

The generator implements data acquisition from a data source and provides data in a unified data reporting format to a fluctuation recognition logic layer (a layer for recognizing fluctuations).
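One way to picture the generator abstraction is the sketch below. The class names `DataGenerator`, `CallableGenerator`, and `Report`, and the fields of the unified report, are hypothetical; the text only states that a generator fetches data from its source and reports it in a uniform format.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class Report:
    """Assumed unified reporting format: metric name, time, numeric value."""
    metric: str
    timestamp: datetime
    value: float


class DataGenerator(ABC):
    """Wraps one data source and reports its data in the unified format."""

    def __init__(self, metric: str) -> None:
        self.metric = metric

    @abstractmethod
    def fetch_raw(self, now: datetime) -> object:
        """Acquire the raw data for the given time from the underlying source."""

    def report(self, now: datetime) -> Report:
        # Convert whatever the source returned into the unified numeric form.
        return Report(self.metric, now, float(self.fetch_raw(now)))


class CallableGenerator(DataGenerator):
    """Generator backed by any callable, standing in for a real data source."""

    def __init__(self, metric: str, source: Callable[[datetime], object]) -> None:
        super().__init__(metric)
        self._source = source

    def fetch_raw(self, now: datetime) -> object:
        return self._source(now)


# The stand-in source returns a string; the generator normalizes it to a float.
generator = CallableGenerator("orders", lambda now: "1250")
report = generator.report(datetime(2018, 3, 15))
```

The fluctuation recognition logic layer then only needs to consume `Report` objects, regardless of where the data came from.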

302. Obtain a first fluctuation parameter between the data value and the historical data value, then go to step 307.

The historical data value is the value of data previously acquired from the data source, that is, the value of data acquired from the data source before the current time. For example, it may be the value of the data last retrieved from the data source.

The first fluctuation parameter measures the change amplitude of the data value, for example the change amplitude of the data value of the current data relative to the data value of the historical data. For example, the first fluctuation parameter may include a fluctuation rate confidence, which may be obtained by dividing the difference between the data value of the current data and the historical data value by the historical data value, that is, confidence = (current data value - historical data value) / historical data value.
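A minimal sketch of the fluctuation rate against the most recently retrieved historical value follows. The function name, the choice of the last list element as the historical value, and the sample data are assumptions for illustration. Note that the denominator here is the historical value, not the current value.

```python
from typing import Sequence


def first_fluctuation(current: float, history: Sequence[float]) -> float:
    """Return (current - previous) / previous, using the last retrieved value."""
    if not history:
        raise ValueError("at least one historical data value is required")
    previous = history[-1]
    if previous == 0:
        # Assumption: the rate is treated as undefined for a zero historical value.
        raise ValueError("historical data value is zero; the rate is undefined")
    return (current - previous) / previous


# Hypothetical history of daily values, followed by today's value of 90.
rate = first_fluctuation(90.0, [100.0, 110.0, 120.0])
print(rate)  # -0.25
```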

303. Train the nonlinear regression model according to the training data sequence, and obtain a current first predicted data value according to the trained nonlinear regression model.

Y_T = a_0 + a_1 T + a_2 T^2 + ... + a_p T^p

The training process of the nonlinear regression model may refer to the description of the above embodiments.
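As an illustration of fitting the polynomial above by least squares, the following sketch solves the normal equations with plain Gaussian elimination. The helper names, the choice of T = 1, 2, ... as the time index, and the sample sequence are assumptions; a production implementation would more likely rely on a numerical library.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a lower polynomial degree")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(values: Sequence[float], degree: int) -> List[float]:
    """Least-squares estimate of a_0..a_p for Y_T = a_0 + a_1 T + ... + a_p T^p."""
    ts = range(1, len(values) + 1)  # assumed time index T = 1, 2, ...
    size = degree + 1
    # Normal equations (X^T X) a = X^T y for the design matrix X[T][k] = T^k.
    xtx = [[float(sum(t ** (i + j) for t in ts)) for j in range(size)]
           for i in range(size)]
    xty = [sum((t ** i) * y for t, y in zip(ts, values)) for i in range(size)]
    return solve_linear(xtx, xty)


def predict(coeffs: Sequence[float], t: int) -> float:
    """Evaluate the fitted polynomial at time index t."""
    return sum(a * t ** k for k, a in enumerate(coeffs))


# Hypothetical training sequence; it equals T^2 + T + 1 for T = 1..5.
history = [3.0, 7.0, 13.0, 21.0, 31.0]
coeffs = fit_polynomial(history, degree=2)
first_predicted = predict(coeffs, len(history) + 1)  # close to 43
```

The number of coefficients, degree + 1, is the number of model parameters to be estimated; the first predicted data value is the polynomial evaluated one step past the end of the training sequence.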

304. Obtain a second fluctuation parameter between the data value and the first predicted data value, then go to step 307.

305. When the training data sequence is a stationary sequence, train an autoregressive moving average model according to the training data sequence, and obtain a current second predicted data value according to the trained autoregressive moving average model; when the training data sequence is not a stationary sequence, do not perform the autoregressive moving average model training and prediction.

The ARMA model is a model commonly used in econometrics for analyzing the trend of future data; it simulates the change of a group of data by combining an AR (auto-regressive) model and an MA (moving-average) model. The expression of the ARMA model is:

Y_t = a_0 + a_1 Y_{t-1} + a_2 Y_{t-2} + ... + a_p Y_{t-p} + b_1 e_{t-1} + b_2 e_{t-2} + ... + b_q e_{t-q}

When the training data sequence is not a stationary sequence, the ARMA model cannot be used to fit it, so the ARMA model is not used to predict the data value for a non-stationary training data sequence. In this case, the training data sequence may be fitted with the nonlinear regression model and the data value predicted based on that model; whether the fluctuation of the data is abnormal is then determined according to the fluctuation parameter between the current data and the predicted data value and the fluctuation parameter between the current data and the historical data value.

It can be seen that, in the embodiments of the present invention, the nonlinear regression model on the one hand supplements the ARMA model, that is, it predicts the data value when the data sequence is non-stationary; on the other hand, when the sequence is stationary, it works together with the ARMA model to produce more reasonable predicted values and fluctuation parameters of multiple dimensions, which improves the accuracy of data fluctuation identification.

Specifically, for the determination of the lag order and the solution of the model parameters, reference may be made to the description of the above embodiments.
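The branch in step 305 can be sketched as follows, under strong simplifying assumptions: the lag order is fixed at p = 1, the moving-average terms are omitted (so this is an AR(1) fit, not a full ARMA fit), and the stationarity decision is passed in as a flag because the test used for it is not restated here. All function names and the sample series are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_ar1(series: Sequence[float]) -> Tuple[float, float]:
    """Least-squares estimate of (a_0, a_1) in Y_t = a_0 + a_1 * Y_{t-1}."""
    x = series[:-1]  # lagged values Y_{t-1}
    y = series[1:]   # targets Y_t
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("lagged values are constant; a_1 is not identifiable")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


def second_prediction(series: Sequence[float], is_stationary: bool) -> Optional[float]:
    """Return the one-step-ahead prediction, or None for a non-stationary series."""
    if not is_stationary:
        return None  # step 305: skip training and prediction
    a0, a1 = fit_ar1(series)
    return a0 + a1 * series[-1]


# Hypothetical series generated by Y_t = 2 + 0.5 * Y_{t-1}, starting from 10.
series = [10.0, 7.0, 5.5, 4.75, 4.375]
predicted = second_prediction(series, is_stationary=True)  # close to 4.1875
skipped = second_prediction(series, is_stationary=False)   # None
```

When the function returns None, only the first and second fluctuation parameters are available to step 307.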

306. Obtain a third fluctuation parameter between the data value and the second predicted data value, then go to step 307.

307. Determine whether the fluctuation of the current data is abnormal according to the current fluctuation parameters.

In this embodiment of the present invention, if the training data sequence is a stationary sequence, the current fluctuation parameter may include: a first fluctuation parameter, a second fluctuation parameter, and a third fluctuation parameter;

if the training data sequence is not a stationary sequence, the current fluctuation parameters may include: a first fluctuation parameter, a second fluctuation parameter.

After the current fluctuation parameters are obtained, the final fluctuation parameter value of the current data can be obtained according to the current fluctuation parameters.

For example, the final fluctuation parameter value of the current data may be obtained according to the parameter values of the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter, or according to the first fluctuation parameter and the second fluctuation parameter.

For example, referring to fig. 3b, in order to improve the accuracy and efficiency of data fluctuation identification, the parameter values of the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter may be subjected to weighted average processing, and the weighted average value is taken as the final fluctuation parameter value.

If the first fluctuation parameter is confidence, the second fluctuation parameter is confidence', and the third fluctuation parameter is confidence'', then the final fluctuation parameter value is confidence_final = q1 * confidence + q2 * confidence' + q3 * confidence'', where q1, q2, and q3 are weights that may be set according to actual requirements, for example, q1 = 0.3, q2 = 0.3, q3 = 0.3, and so on.
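Step 307 works with three parameters for a stationary sequence and two otherwise. The sketch below shows one possible way to handle both cases in a single function. The function name, the symmetric preset range [-limit, limit], the default limit of 0.2, and the choice to reuse q1 and q2 unchanged in the two-parameter case are all assumptions, since the text does not specify them.

```python
from typing import Optional, Sequence, Tuple


def judge_fluctuation(
    first: float,
    second: float,
    third: Optional[float] = None,
    weights: Sequence[float] = (0.3, 0.3, 0.3),
    limit: float = 0.2,
) -> Tuple[float, bool]:
    """Return (final fluctuation value, abnormal flag) for the current data."""
    # Non-stationary training sequence: no third parameter is available.
    params = [first, second] if third is None else [first, second, third]
    # zip stops at the shorter input, so only q1 and q2 are used for two parameters.
    final_value = sum(q * c for q, c in zip(weights, params))
    abnormal = not (-limit <= final_value <= limit)
    return final_value, abnormal


# Non-stationary case with hypothetical parameters: about -0.27, outside the range.
value_two, abnormal_two = judge_fluctuation(-0.5, -0.4)

# Stationary case with hypothetical parameters: about 0.03, inside the range.
value_three, abnormal_three = judge_fluctuation(0.1, 0.05, -0.05)
```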

In practical application, the identification scheme provided by the embodiment of the invention can be realized in a fluctuation identification logic layer.

Therefore, the scheme provided by the embodiment of the invention can predict the data based on the combination of the ARMA prediction model and the nonlinear regression model to obtain fluctuation parameters of multiple dimensions, and then determine whether the data fluctuation is abnormal based on the fluctuation parameters of the multiple dimensions, so that the identification accuracy and the authenticity of the data fluctuation can be improved.

The scheme provided by the embodiment of the invention can account for the weekend effect of business data. For example, in an online inquiry business, inquiry orders drop considerably over the weekend; under the existing identification mode, this fluctuation of the business data would be regarded as abnormal and an alarm would be raised. After the scheme of the embodiment of the present invention is adopted, however, because the predicted value is learned from historical data, the fluctuation rate of the actual value relative to the predicted value falls in a normal interval, and the fluctuation of the business data is determined to be normal by combining fluctuation parameters of multiple dimensions.

In addition, the scheme abstracts the data source for fluctuation identification into a data generator and does not restrict the user to specific data types; the user only needs to provide an interface capable of outputting data and configure it in the fluctuation identification system. Such interfaces include, but are not limited to, MySQL and Hive data sources, as well as executable scripts such as Python, shell, and Perl scripts that can be invoked directly through the shell.
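A script-backed data source of this kind might be read as in the sketch below. The function name and the convention that the script prints the metric value on the first line of standard output are assumptions for illustration; the example invokes the current Python interpreter as a stand-in for a user-supplied script.

```python
import subprocess
import sys
from typing import Sequence


def read_value_from_command(argv: Sequence[str], timeout: float = 10.0) -> float:
    """Run a command and parse the first line of its standard output as a float."""
    result = subprocess.run(
        list(argv),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    lines = result.stdout.strip().splitlines()
    if not lines:
        raise ValueError("the command produced no output")
    return float(lines[0])


# Stand-in for a user script: a one-line Python program that prints a value.
value = read_value_from_command([sys.executable, "-c", "print(128.5)"])
```

Passing the command as an argument list avoids shell quoting issues; a shell or Perl script would be invoked the same way with its own interpreter and path.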

In order to better implement the above method, an embodiment of the present invention further provides a data fluctuation identification apparatus. As shown in fig. 4a, the data fluctuation identification apparatus may include a data obtaining unit 401, a first parameter obtaining unit 402, a training unit 403, a second parameter obtaining unit 404, and a determining unit 405, as follows:

a data obtaining unit 401, configured to obtain a data value of current data;

a first parameter obtaining unit 402, configured to obtain a first fluctuation parameter between the data value and a historical data value;

a training unit 403, configured to train the nonlinear regression model according to a training data sequence;

a second parameter obtaining unit 404, configured to obtain a current first predicted data value according to the trained nonlinear regression model, and obtain a second fluctuation parameter between the data value and the first predicted data value;

a determining unit 405, configured to determine whether the fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter.

In an embodiment, referring to fig. 4b, the determining unit 405 may include:

the training subunit 4051 is configured to, when the training data sequence is a stationary sequence, train an autoregressive moving average model according to the training data sequence;

a parameter obtaining subunit 4052, configured to obtain a current second predicted data value according to the trained autoregressive moving average model, and obtain a third fluctuation parameter between the data value and the second predicted data value;

a first anomaly determination subunit 4053, configured to determine whether the fluctuation of the current data is abnormal according to the first fluctuation parameter, the second fluctuation parameter, and a third fluctuation parameter;

a second anomaly determination subunit 4054, configured to determine, when the training data sequence is not a stationary sequence, whether the fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter.

In an embodiment, the first anomaly determination subunit 4053 is configured to:

and when the final fluctuation parameter value is not in a preset range, determining that the fluctuation of the current data is abnormal.

In an embodiment, the first anomaly determination subunit 4053 may specifically be configured to:

carrying out weighted average processing on the parameter values of the first fluctuation parameter, the second fluctuation parameter and the third fluctuation parameter to obtain a weighted average parameter value;

and taking the weighted average parameter value as a final fluctuation parameter value of the current data.

In an embodiment, the data obtaining unit 401 may be configured to:

acquiring data from a data source to obtain current data;

In an embodiment, the training unit 403 may be configured to:

and solving the model parameters to be estimated of the nonlinear regression model based on a least square method and the training data sequence to obtain the trained nonlinear regression model.

In an embodiment, the training sub-unit 4051 may be configured to:

determining a lag order of the autoregressive moving average model based on an autocorrelation function analysis mode and a partial autocorrelation function analysis mode;

and solving the model parameters to be estimated of the autoregressive moving average model based on a least square method, the lag order and the training data sequence to obtain the trained autoregressive moving average model.
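The autocorrelation and partial autocorrelation values used for lag-order analysis can be computed as in the sketch below. The function names and the sample series are assumptions; the partial autocorrelations are obtained with the Durbin-Levinson recursion, which is one common choice and is not stated in the text. How the lag order is read off these values is not shown here.

```python
from typing import List, Sequence


def acf(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations rho_0..rho_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("constant series has no defined autocorrelation")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev: List[float] = []  # phi_{k-1,1}..phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi
    return result


# Hypothetical alternating series with strong negative lag-1 correlation.
sample = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
rho_values = acf(sample, 2)    # lag 1 is -5/6, lag 2 is 4/6
phi_values = pacf(sample, 2)   # lag 1 equals the lag-1 autocorrelation
```

In the usual reading, the lag at which the partial autocorrelation cuts off suggests the AR order p, and the lag at which the autocorrelation cuts off suggests the MA order q.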

The steps performed by the above units may refer to the description of the above method embodiments.

In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.

The data fluctuation identification device can be integrated into a server, such as a fluctuation identification server.

As can be seen from the above, the data fluctuation identification apparatus in the embodiment of the present invention obtains the data value of the current data through the data obtaining unit 401; acquiring, by a first parameter acquisition unit 402, a first fluctuation parameter between the data value and the historical data value; the non-linear regression model is trained by the training unit 403 according to the training data sequence; a second parameter obtaining unit 404 obtains a current first predicted data value according to the trained nonlinear regression model, and obtains a second fluctuation parameter between the data value and the first predicted data value; determining, by the determining unit 405, whether the fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter. According to the scheme, fluctuation parameters (such as a first fluctuation parameter, a second fluctuation parameter and the like) of the data on multiple dimensions can be obtained, and whether the data fluctuation is abnormal or not is determined based on the multidimensional fluctuation parameters, so that the identification accuracy of the data fluctuation can be improved.

Referring to fig. 5, an embodiment of the present invention provides a server 500, which may include a processor 501 having one or more processing cores, a memory 502 having one or more computer-readable storage media, a Radio Frequency (RF) circuit 503, a power supply 504, an input unit 505, and the like. Those skilled in the art will appreciate that the server architecture shown in fig. 5 does not limit the server, which may include more or fewer components than those shown, combine some components, or use a different arrangement of components. Wherein:

the processor 501 is a control center of the server, connects various parts of the entire server with various interfaces and lines, and performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 502 and calling data stored in the memory 502, thereby performing overall wave recognition of the server. Optionally, processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 501.

The memory 502 may be used to store software programs and modules, and the processor 501 executes various functional applications and data processing by operating the software programs and modules stored in the memory 502.

The RF circuit 503 may be used for receiving and transmitting signals during information transmission and reception, and in particular, for receiving downlink information of a base station and then processing the received downlink information by one or more processors 501; in addition, data relating to uplink is transmitted to the base station.

The server also includes a power supply 504 (e.g., a battery) for powering the various components, which may preferably be logically connected to the processor 501 via a power management system to manage charging, discharging, and power consumption management functions via the power management system. The power supply 504 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.

The server may further include an input unit 505, and the input unit 505 may be used to receive input numeric or character information.

Specifically, in this embodiment, the processor 501 in the server loads the executable file corresponding to the process of one or more application programs into the memory 502 according to the following instructions, and the processor 501 runs the application program stored in the memory 502, thereby implementing various functions as follows:

acquiring a data value of current data; acquiring a first fluctuation parameter between the data value and a historical data value; training the nonlinear regression model according to the training data sequence; acquiring a current first predicted data value according to the trained nonlinear regression model, and acquiring a second fluctuation parameter between the data value and the first predicted data value; and determining whether the fluctuation of the current data is abnormal or not according to the first fluctuation parameter and the second fluctuation parameter.

In some embodiments, when determining whether the fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter, the processor 501 specifically performs the following steps:

and when the training data sequence is not a stationary sequence, determining whether the fluctuation of the current data is abnormal or not according to the first fluctuation parameter and the second fluctuation parameter.

In some embodiments, when determining whether the fluctuation of the current data is abnormal according to the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter, the processor 501 specifically performs the following steps:

In some embodiments, when acquiring the data value of the current data, the processor 501 specifically performs the following steps:

acquiring data from a data source to obtain current data;

In some embodiments, when the non-linear regression model is trained according to the training data sequence, the processor 501 specifically performs the following steps:

In some embodiments, when training the autoregressive moving average model according to the training data sequence, the processor 501 specifically performs the following steps:

The server of the embodiment of the invention can obtain the data value of the current data; acquiring a first fluctuation parameter between the data value and a historical data value; training the nonlinear regression model according to the training data sequence; acquiring a current first predicted data value according to the trained nonlinear regression model, and acquiring a second fluctuation parameter between the data value and the first predicted data value; and determining whether the fluctuation of the current data is abnormal or not according to the first fluctuation parameter and the second fluctuation parameter. According to the scheme, fluctuation parameters (such as a first fluctuation parameter, a second fluctuation parameter and the like) of the data on multiple dimensions can be obtained, and whether the data fluctuation is abnormal or not is determined based on the multidimensional fluctuation parameters, so that the identification accuracy of the data fluctuation can be improved.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing associated hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.

The data fluctuation identification method, device and storage medium provided by the embodiments of the present invention are described in detail above, and the principles and embodiments of the present invention are explained herein by applying specific examples, and the descriptions of the above embodiments are only used to help understanding the method and the core ideas of the present invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A data fluctuation identification method, comprising:

the server acquires the data value of the current data, and the acquiring the data value of the current data comprises the following steps: abstracting a monitored data source into a corresponding data generator, acquiring data corresponding to the current time from the corresponding data source through the data generator, and converting the data into data in a uniform data format through the data generator to obtain a data value of the converted data;

acquiring a first fluctuation parameter between the data value and a historical data value, wherein the first fluctuation parameter is used for measuring the change amplitude of the data value;

training the nonlinear regression model according to the training data sequence, which specifically comprises: determining the number of model parameters to be estimated in the nonlinear regression model, and solving the model parameters to be estimated of the nonlinear regression model based on a least square method and a training data sequence to obtain a trained nonlinear regression model;

acquiring a current first predicted data value according to the trained nonlinear regression model, and acquiring a second fluctuation parameter between the data value and the first predicted data value, wherein the second fluctuation parameter is a parameter for measuring the variation amplitude of the data value;

determining whether the fluctuation of the current data is abnormal or not according to the first fluctuation parameter and the second fluctuation parameter;

wherein determining whether the fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter comprises: when the training data sequence is a stationary sequence, training an autoregressive moving average model according to the training data sequence; obtaining a current second predicted data value according to the trained autoregressive moving average model, and obtaining a third fluctuation parameter between the data value and the second predicted data value; determining whether the fluctuation of the current data is abnormal according to the first fluctuation parameter, the second fluctuation parameter and the third fluctuation parameter; and when the training data sequence is not a stationary sequence, determining whether the fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter.

2. The data fluctuation identification method according to claim 1, wherein determining whether the fluctuation of the current data is abnormal based on the first fluctuation parameter, the second fluctuation parameter, and a third fluctuation parameter includes:

3. The data fluctuation identification method according to claim 1, wherein obtaining a final fluctuation parameter of the current data based on the first fluctuation parameter, the second fluctuation parameter, and a third fluctuation parameter comprises:

4. The data fluctuation identification method as claimed in claim 1, wherein the acquiring of the data value of the current data comprises:

acquiring data from a data source to obtain current data;

5. The data fluctuation identification method of claim 1, wherein training the non-linear regression model based on the training data sequence comprises:

6. The data fluctuation identification method of claim 1, wherein training an autoregressive moving average model based on the training data sequence comprises:

7. A data fluctuation recognition device, applied to a server, includes:

the data acquisition unit is used for acquiring the data value of the current data; the data acquisition unit is specifically used for abstracting the monitored data source into a corresponding data generator, acquiring data corresponding to the current time from the corresponding data source through the data generator, and converting the data into data in a uniform data format through the data generator to obtain a data value of the converted data;

the first parameter acquiring unit is used for acquiring a first fluctuation parameter between the data value and the historical data value, wherein the first fluctuation parameter is used for measuring the change amplitude of the data value;

the training unit is used for training the nonlinear regression model according to the training data sequence; the training unit is specifically used for determining the number of model parameters to be estimated in the nonlinear regression model, solving the model parameters to be estimated of the nonlinear regression model based on a least square method and a training data sequence, and obtaining the trained nonlinear regression model;

the second parameter obtaining unit is used for obtaining a current first predicted data value according to the trained nonlinear regression model and obtaining a second fluctuation parameter between the data value and the first predicted data value, wherein the second fluctuation parameter is a parameter for measuring the change amplitude of the data value;

a determining unit, configured to determine whether fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter;

wherein the determining unit includes: a training subunit, configured to train an autoregressive moving average model according to the training data sequence when the training data sequence is a stationary sequence; a parameter obtaining subunit, configured to obtain a current second predicted data value according to the trained autoregressive moving average model, and obtain a third fluctuation parameter between the data value and the second predicted data value; a first abnormality determining subunit, configured to determine whether the fluctuation of the current data is abnormal according to the first fluctuation parameter, the second fluctuation parameter, and the third fluctuation parameter; and a second abnormality determining subunit, configured to determine whether the fluctuation of the current data is abnormal according to the first fluctuation parameter and the second fluctuation parameter when the training data sequence is not a stationary sequence.

8. The data fluctuation identifying apparatus according to claim 7, wherein the first abnormality determining subunit is configured to:

9. The data fluctuation recognition apparatus according to claim 7, wherein the data acquisition unit is configured to:

acquiring data from a data source to obtain current data;

10. The data fluctuation recognition apparatus of claim 7, wherein the training unit is configured to:

11. The data fluctuation recognition apparatus of claim 7, wherein the training subunit is configured to:

12. A storage medium storing instructions which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 6.