CN107577721A - Data stability detection method and device, storage medium, server for big data - Google Patents


Info

Publication number
CN107577721A
Authority
CN
China
Prior art keywords
data
training
stability detection
big
actual provision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710705979.6A
Other languages
Chinese (zh)
Inventor
汤奇峰
侯东东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd
Original Assignee
ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd filed Critical ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd
Priority to CN201710705979.6A priority Critical patent/CN107577721A/en
Publication of CN107577721A publication Critical patent/CN107577721A/en
Pending legal-status Critical Current


Abstract

A data stability detection method and device, a storage medium, and a server for big data. The method includes: obtaining training data arranged as a time series based on historical data; fitting the training data to obtain the data distribution of the training data over time; and determining whether the actual supply data is abnormal according to the result of comparing the actual supply data with the data distribution. With the technical solution of the present invention, the stability of the actual supply data accessing a data management platform can be detected; if the actual supply data is found to be abnormal, the data management platform can issue a timely early warning.

Description

Data stability detection method and device, storage medium, server for big data
Technical field
The present invention relates to the field of information processing, and in particular to a data stability detection method and device, a storage medium, and a server for big data.
Background
With the rapid development of the Internet and the big data industry, the data management platform (Data Management Platform, DMP) has become the foundation of big data management because it combines data collection, management, analysis, and application. First, a DMP receives and integrates data from all parties in a unified manner, so it can integrate and standardize data. Second, a DMP can manage data in fine-grained segments by using techniques such as data modeling and machine algorithms. Third, a DMP provides full-featured data labels, such as audience label management and product label management, which help marketing achieve the greatest effect.
A DMP has a powerful capacity for aggregating and integrating data and can handle data from various sources, including website monitoring data, Software Development Kit (SDK) monitoring data, offline business data, Structured Query Language (SQL) data, real-time interface data, and so on. In the aggregation and integration process, normal access of external data to the DMP is one of the key steps. If the externally accessed data has problems, the subsequent analysis by the DMP may be biased, which in turn affects marketing results. Therefore, for data accessing the DMP, it is necessary to detect whether the data is within the normal range so that the DMP can issue a timely early warning.
A DMP usually holds massive data resources. Taking user labels as an example, a DMP can provide clients with full-featured user labels. However, although the DMP has collected a large amount of user data, data from different channels (such as cross-screen data and online and offline data) is often unstructured, inconsistent in dimension, and non-relational, so the collected user labels need to be supplemented to obtain user labels with a standard structure. Clearly, the more stable the data accessing the DMP, the higher the quality of the user labels the DMP provides.
It follows that, in order to extract and mine the maximum value from data, detecting the stability of the data accessing the DMP is one of the key problems a DMP needs to solve.
No effective solution to the above problem in the related art has yet been proposed.
Summary of the invention
The technical problem solved by the present invention is how to detect the stability of the data supply for big data, so that the DMP can give timely early warning of abnormal data supply.
To solve the above technical problem, an embodiment of the present invention provides a data stability detection method for big data, including: obtaining training data arranged as a time series based on historical data; fitting the training data to obtain the data distribution of the training data over time; and determining whether the actual supply data is abnormal according to the result of comparing the actual supply data with the data distribution.
Optionally, obtaining the training data arranged as a time series based on historical data includes: extracting the data in the historical data according to the time series to obtain time series data; and cleaning the time series data to obtain the training data.
Optionally, cleaning the time series data to obtain the training data includes: applying a logarithmic transformation to the time series data to obtain transformed data; and rejecting abnormal data from the transformed data to obtain the training data.
Optionally, the abnormal data refers to data outside the range $(Q_L - 1.5\,IQR,\ Q_U + 1.5\,IQR)$, where $Q_L$ denotes the lower quartile of the transformed data, $Q_U$ denotes the upper quartile of the transformed data, and $IQR$ denotes the interquartile range of the transformed data, $IQR = Q_U - Q_L$.
Optionally, the historical data includes one or more of the following: website monitoring data, SDK monitoring data, offline business data, SQL data, and real-time interface data.
Optionally, fitting the training data to obtain the data distribution of the training data over time includes: constructing a polynomial model containing parameters; and solving for the parameters of the polynomial model using the least squares method or the gradient descent method, to obtain the data distribution of the training data over time.
Optionally, determining whether the actual supply data is abnormal according to the result of comparing the actual supply data with the data distribution includes: constructing a confidence interval of the data distribution at a preset confidence level; and, if the actual supply data falls within the confidence interval, judging the actual supply data to be normal, and otherwise judging the actual supply data to be abnormal.
Optionally, the data stability detection method for big data further includes graphically displaying one or more of the following: the data distribution, the confidence interval of the data distribution at the preset confidence level, the actual supply data, and a mark indicating that the actual supply data is abnormal.
To solve the above technical problem, an embodiment of the present invention also provides a data stability detection device for big data, including: a training module, adapted to obtain training data arranged as a time series based on historical data; a fitting module, adapted to fit the training data to obtain the data distribution of the training data over time; and a judging module, adapted to determine whether the actual supply data is abnormal according to the result of comparing the actual supply data with the data distribution.
Optionally, the training module is adapted to extract the data in the historical data according to the time series to obtain time series data, and to clean the time series data to obtain the training data.
Optionally, the training module is further adapted to apply a logarithmic transformation to the time series data to obtain transformed data, and to reject abnormal data from the transformed data to obtain the training data.
Optionally, the abnormal data refers to data outside the range $(Q_L - 1.5\,IQR,\ Q_U + 1.5\,IQR)$, where $Q_L$ denotes the lower quartile of the transformed data, $Q_U$ denotes the upper quartile of the transformed data, and $IQR$ denotes the interquartile range of the transformed data, $IQR = Q_U - Q_L$.
Optionally, the historical data includes one or more of the following: website monitoring data, SDK monitoring data, offline business data, SQL data, and real-time interface data.
Optionally, the fitting module includes: a building submodule, adapted to construct a polynomial model containing parameters; and a solving submodule, adapted to solve for the parameters of the polynomial model based on the training data, using the least squares method or the gradient descent method, to obtain the data distribution of the training data over time.
Optionally, the judging module is adapted to construct a confidence interval of the data distribution at a preset confidence level; if the actual supply data falls within the confidence interval, the judging module judges the actual supply data to be normal, and otherwise judges the actual supply data to be abnormal.
Optionally, the device further includes an image display module, adapted to graphically display one or more of the following: the data distribution, the confidence interval of the data distribution at the preset confidence level, the actual supply data, and a mark indicating that the actual supply data is abnormal.
To solve the above technical problem, an embodiment of the present invention also discloses a storage medium on which computer instructions are stored; when the computer instructions run, the steps of any one of the above data stability detection methods for big data are performed.
To solve the above technical problem, an embodiment of the present invention also discloses a server including a memory and a processor, the memory storing computer instructions that can run on the processor; when the processor runs the computer instructions, the steps of any one of the above data stability detection methods for big data are performed.
Compared with the prior art, the technical solution of the embodiments of the present invention has the following beneficial effects:
The technical solution of the embodiments of the present invention provides a data stability detection method for big data, including: obtaining training data arranged as a time series based on historical data; fitting the training data to obtain the data distribution of the training data over time; and determining whether the actual supply data is abnormal according to the result of comparing the actual supply data with the data distribution. The technical solution takes into account that historical data is continuously supplied data: the continuous data is first divided into training data distributed as a time series, the training data is then fitted to obtain its data distribution over time, and finally the result of comparing the actual supply data with the data distribution is used to detect whether the actual supply data is within the normal range. With this technical solution, the stability of the actual supply data accessing the DMP can be detected. If the actual supply data is found to be abnormal, the DMP can issue a timely early warning.
Further, obtaining the training data arranged as a time series based on historical data includes: extracting the data in the historical data according to the time series to obtain time series data; applying a logarithmic transformation to the time series data to obtain transformed data; and rejecting abnormal data from the transformed data to obtain the training data. The logarithmic transformation is applied because the time series data obtained by extracting historical data in time order is usually very large in magnitude (on the order of millions to tens of millions); the transformation reduces the magnitude of the data and simplifies processing. Rejecting abnormal data from the transformed data excludes historical data that may be abnormal, avoids interference from such data, and helps the subsequent statistical analysis reach reliable conclusions.
Further, the data stability detection method provided by the technical solution of the embodiments of the present invention also includes graphically displaying one or more of the following: the data distribution, the confidence interval of the data distribution at the preset confidence level, the actual supply data, and a mark indicating that the actual supply data is abnormal. With this technical solution, when the actual supply data is abnormal, the DMP can display the type of abnormality clearly and intuitively.
Brief description of the drawings
Fig. 1 is a flow chart of a data stability detection method for big data according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the graphical display result obtained according to an embodiment of the present invention when the data supply is abnormal;
Fig. 3 is a schematic structural diagram of a data stability detection device for big data according to an embodiment of the present invention.
Detailed description
As described in the background, detecting the stability of the actual supply data accessing the DMP is one of the key problems a DMP needs to consider, and no effective solution has yet been proposed.
The technical solution of the embodiments of the present invention provides a data stability detection method for big data, including: obtaining training data arranged as a time series based on historical data; fitting the training data to obtain the data distribution of the training data over time; and determining whether the actual supply data is abnormal according to the result of comparing the actual supply data with the data distribution. The technical solution takes into account that historical data is continuously supplied data: the continuous data is first divided into training data distributed as a time series, the training data is then fitted to obtain its data distribution, and finally the result of comparing the actual supply data with the data distribution is used to detect whether the actual supply data is within the normal range. With this technical solution, the stability of the actual supply data accessing the DMP can be detected. If the actual supply data is found to be abnormal, the DMP can issue a timely early warning.
To make the above objects, features, and beneficial effects of the present invention clearer and easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flow chart of a data stability detection method for big data according to an embodiment of the present invention.
The data stability detection method for big data shown in Fig. 1 may include:
Step S101: obtain training data arranged as a time series based on historical data.
Step S102: fit the training data to obtain the data distribution of the training data over time.
Step S103: determine whether the actual supply data is abnormal according to the result of comparing the actual supply data with the data distribution.
Generally, data accessing the DMP is supplied continuously; it is continuous data and can form time series data arranged in time order. In a specific implementation of step S101, training data arranged as a time series is obtained based on historical data. The historical data may be the continuous data extracted up to the present (for example, not including the current day). The historical data may include one or more of the following: website monitoring data, SDK monitoring data, offline business data, SQL data, and real-time interface data.
In a non-limiting example, the data in the historical data may be extracted to obtain time series data. The time series obtained by extracting historical data in time order is usually very large in volume (possibly millions to tens of millions of records), the values of the data may also be very large, and the historical data may contain abnormal values. The time series data can therefore be cleaned to avoid interference from abnormal data, yielding clean training data and, in turn, fitted data with high confidence. For example, a logarithmic transformation may be applied to the time series data to obtain transformed data; the values of the transformed data are significantly smaller, which facilitates subsequent fitting and other statistical analysis operations. Abnormal data is then rejected from the transformed data to obtain the training data. Abnormal data can be identified by means of a box plot. Specifically, abnormal data refers to data outside the range $(Q_L - 1.5\,IQR,\ Q_U + 1.5\,IQR)$, where $Q_L$ denotes the lower quartile of the transformed data, $Q_U$ denotes the upper quartile of the transformed data, and $IQR$ denotes the interquartile range of the transformed data, $IQR = Q_U - Q_L$.
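The cleaning step described above (logarithmic transformation followed by box-plot rejection) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `clean_time_series`, the choice of the natural logarithm, the quartile interpolation used by Python's `statistics.quantiles`, and the sample `history` values are all assumptions made for the example.

```python
import math
import statistics


def clean_time_series(points, k=1.5):
    """Log-transform (timestamp, value) pairs and drop box-plot outliers.

    Values must be positive. Returns the (timestamp, log_value) pairs whose
    log value lies inside the open interval (Q_L - k*IQR, Q_U + k*IQR).
    """
    logged = [(t, math.log(v)) for t, v in points]  # natural log is an assumption
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q_l, _median, q_u = statistics.quantiles([v for _, v in logged], n=4)
    iqr = q_u - q_l
    low, high = q_l - k * iqr, q_u + k * iqr
    return [(t, v) for t, v in logged if low < v < high]


# Hypothetical daily record counts; the value on day 7 is an obvious spike.
history = [(0, 1.00e6), (1, 1.10e6), (2, 1.20e6), (3, 0.90e6),
           (4, 1.05e6), (5, 0.95e6), (6, 1.15e6), (7, 1.00e9)]
training = clean_time_series(history)
```

With this sample, the spike on day 7 falls above the upper fence and is rejected, while the other seven points are kept as training data.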
In step S102, a polynomial model containing parameters may be constructed; based on the training data, the parameters of the polynomial are solved using the least squares method or the gradient descent method, to obtain the data distribution of the training data over time.
Taking the least squares method as an example, the following describes a process of constructing the polynomial model containing parameters and solving for the parameters from the training data by least squares to obtain the data distribution.
(1) Construct the polynomial model, i.e. the polynomial function $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_p x^p$, where $x$ is the independent variable, $p$ is a non-negative integer, and $\beta_i$, $i = 0, 1, 2, \dots, p$, are the model parameters.
(2) Compute the parameters of the polynomial model. Construct the residual sum of squares $M = \sum_i (y_i - X_i'\beta)^2$, where $y_i$ is the value estimated by the polynomial function, $X_i = (1, x_i, x_i^2, \dots, x_i^p)'$ is a $(p+1)$-dimensional vector, and the parameter $\beta = (\beta_0, \beta_1, \beta_2, \dots, \beta_p)'$ is a $(p+1)$-dimensional vector.
Differentiate $M$ with respect to each of the $(p+1)$ parameters and set $\frac{\partial M}{\partial \beta_k} = -2 \sum_i (y_i - X_i'\beta)\, X_{ik} = 0$, where $X_{ik} = x_i^k$, $k = 0, 1, 2, \dots, p$. This yields $(p+1)$ equations, from which the parameter $\beta$ can be obtained by matrix operations.
(3) Generate fitted data from the polynomial model, then compute the difference between the time series of the training data and the fitted data to obtain a residual sequence. The distribution of the residual sequence is then estimated by a nonparametric method, from which the variance of the residual sequence is obtained. Those skilled in the art will understand that whether the parameter $\beta$ is significant can be determined from the value of $\beta$ and the variance of the residual sequence. If $\beta$ is significant, $\beta$ is substituted into the polynomial regression model to obtain the data distribution of the training data over time.
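The least squares steps above can be sketched in plain Python by building the $(p+1)$ normal equations and solving them with Gaussian elimination. This is an illustrative sketch under stated assumptions, not the patent's own code: the names `fit_polynomial` and `predict` and the sample data are hypothetical, and the significance test on $\beta$ and the nonparametric residual estimate are not shown.

```python
def fit_polynomial(xs, ys, p):
    """Return [beta_0, ..., beta_p] minimising sum_i (y_i - X_i' beta)^2."""
    size = p + 1
    # Normal equations A beta = b, obtained by setting dM/d(beta_j) = 0:
    # A[j][k] = sum_i x_i^(j+k), b[j] = sum_i y_i * x_i^j.
    a = [[sum(x ** (j + k) for x in xs) for k in range(size)] for j in range(size)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(size)]
    # Forward elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(a[r][c] * beta[c] for c in range(r + 1, size))
        beta[r] = (b[r] - tail) / a[r][r]
    return beta


def predict(beta, x):
    """Evaluate beta_0 + beta_1*x + ... + beta_p*x^p."""
    return sum(b_k * x ** k for k, b_k in enumerate(beta))


# Hypothetical training data lying exactly on y = 1 + 2x + 3x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]
beta = fit_polynomial(xs, ys, p=2)
residuals = [y - predict(beta, x) for x, y in zip(xs, ys)]
```

For long time series or high polynomial degree the normal equations become ill-conditioned, so in practice the time index would usually be rescaled first; that choice is left out of the sketch.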
In a variation of this embodiment, a linear polynomial model may instead be constructed and its parameters solved by the gradient descent method to obtain the data distribution. The steps are as follows:
(1) Construct a linear polynomial model, i.e. the polynomial function $y = \sum_{i=0}^{p} \beta_i x_i$, where $x_i$ are the independent variables, $p$ is a non-negative integer, and $\beta_i$, $i = 0, 1, \dots, p$, are the model parameters.
(2) Solve for the parameters of the linear polynomial model.
First, assume a loss function $J(\beta)$, where $m$ denotes the number of training data and $y_i$ denotes the $i$-th element of the training data output. The goal of the computation is to find the parameters $\beta_i$, $i = 0, 1, \dots, p$, that minimize the loss function. Second, start from given initial values of the parameters $\beta_i$, $i = 0, 1, \dots, p$; for example, the initial values may be $\beta_i = 0$, $i = 0, 1, \dots, p$. Then update $\beta_i$ along the gradient descent direction of the loss function and repeat until the loss function converges, which gives the optimal solution for the parameters $\beta_i$, $i = 0, 1, \dots, p$.
(3) Substitute $\beta_i$, $i = 0, 1, \dots, p$, into the linear polynomial model to obtain the data distribution of the training data over time.
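A minimal sketch of the gradient descent variant follows. The source does not spell out the loss function in this text, so the half mean squared error used here is an assumption, as are the function name `gradient_descent`, the learning rate `lr`, the convergence tolerance `tol`, and the sample data. Each input row carries a leading 1 so that $\beta_0$ acts as the intercept.

```python
def gradient_descent(rows, ys, lr=0.1, tol=1e-12, max_iter=100_000):
    """Fit y ~ sum_k beta_k * row[k] by batch gradient descent.

    Assumed loss: J(beta) = 1/(2m) * sum_j (prediction_j - y_j)^2.
    Iteration stops once the loss changes by less than `tol`.
    """
    m, n = len(rows), len(rows[0])
    beta = [0.0] * n  # initial values beta_i = 0, as in the text
    previous_loss = float("inf")
    for _ in range(max_iter):
        errors = [sum(b * x for b, x in zip(beta, row)) - y
                  for row, y in zip(rows, ys)]
        loss = sum(e * e for e in errors) / (2 * m)
        if abs(previous_loss - loss) < tol:
            break  # loss has converged
        previous_loss = loss
        # Partial derivative of J with respect to beta_k.
        grad = [sum(e * row[k] for e, row in zip(errors, rows)) / m
                for k in range(n)]
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta


# Hypothetical data on y = 2 + 3x, with x already scaled into [0, 1].
rows = [[1.0, x / 4] for x in range(5)]
ys = [2.0 + 3.0 * row[1] for row in rows]
beta = gradient_descent(rows, ys)
```

Scaling the inputs to a small range, as in the sample, keeps a fixed learning rate stable; unscaled time indices would typically need a much smaller step.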
Preferably, a confidence interval at a preset confidence level may be set based on the data distribution; for example, setting the preset confidence level of the data distribution to 95% yields the corresponding confidence interval. Those skilled in the art will understand that the parameter vector of the fitted data may be computed in any practicable way to obtain the data distribution over time; the embodiments of the present invention place no restriction on this.
In step S103, the actual supply data may be compared with the data distribution, and whether the actual supply data is abnormal is determined from the comparison result. Preferably, a confidence interval is determined according to the preset confidence level, and it is then judged whether the actual supply data falls within the confidence interval. If it does, the actual supply data is judged to be normal; otherwise, the actual supply data is judged to be abnormal.
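The judgment in step S103 can be sketched as a band around the fitted value. The source estimates the residual distribution nonparametrically; purely for illustration, this sketch instead assumes normally distributed residuals and uses the sample standard deviation, which is a swapped-in simplification. The names `confidence_band` and `judge`, the status strings, and the sample residuals are hypothetical.

```python
from statistics import NormalDist, stdev


def confidence_band(fitted_value, residuals, confidence=0.95):
    """Return (lower, upper) around fitted_value, assuming normal residuals."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided quantile
    half_width = z * stdev(residuals)
    return fitted_value - half_width, fitted_value + half_width


def judge(actual, fitted_value, residuals, confidence=0.95):
    """Classify the actual supply value against the band."""
    lower, upper = confidence_band(fitted_value, residuals, confidence)
    if actual > upper:
        return "abnormally high"
    if actual < lower:
        return "abnormally low"
    return "normal"


# Hypothetical residuals from a previous fit (training data minus fitted data).
residuals = [-1.0, 0.0, 1.0, -1.0, 1.0, 0.0]
status = judge(13.0, 10.0, residuals)
```

Returning the direction of the deviation, not only a normal/abnormal flag, matches the abnormally high and abnormally low types that the display step distinguishes.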
Further, whether the actual supply data is abnormal may be shown intuitively by graphical display. For example, the graphical display may include one or more of the following: the data distribution, the confidence interval of the data distribution at the preset confidence level, the actual supply data, and a mark indicating that the actual supply data is abnormal. There are many possible display styles: the items may be shown as curves of different colors or of different shapes, or color and shape may be combined. Numeric labels may also be used, as in the display result shown in Fig. 2. In addition, text may be used to show the type of abnormality of the actual supply data, in a language such as Chinese or English; the type of abnormality may be abnormally high or abnormally low.
Fig. 2 is a schematic diagram of the graphical display result obtained according to an embodiment of the present invention when the data supply is abnormal.
Referring to Fig. 2, a graphical display result is shown for a case in which actual supply data detected by the method of the Fig. 1 embodiment is abnormal. Fig. 2 intuitively shows actual supply data 1 from March 2017 to May 2017, the data distribution 2 over time obtained by fitting the training data, the confidence interval of the data distribution at a preset confidence level of 95% (comprising the confidence interval upper bound 3 and the confidence interval lower bound 4), and the mark 5 indicating that the actual supply data is abnormal (shown as black serrated circles). In addition, text may be used to show the type of abnormality in the actual data supply; the text displayed in Fig. 2 is "Status: value abnormally high!".
Fig. 3 is a schematic structural diagram of a data stability detection device for big data according to an embodiment of the present invention.
The data stability detection device 30 for big data shown in Fig. 3 may include:
a training module 301, adapted to obtain training data arranged as a time series based on historical data;
a fitting module 302, adapted to fit the training data to obtain the data distribution of the training data over time; and
a judging module 303, adapted to determine whether the actual supply data is abnormal according to the result of comparing the actual supply data with the data distribution.
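One way to picture how the three modules fit together is a small class that chains them. This is only a structural sketch: the class name `StabilityDetector`, the callable-based wiring, the stand-in `toy_fit` (a mean plus or minus two standard deviations band in place of the polynomial fit and confidence interval), and the sample `history` are assumptions for illustration, not the device described in the source.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, List, Tuple


@dataclass
class StabilityDetector:
    # Role of training module 301: historical data -> training data.
    training_module: Callable[[List[float]], List[float]]
    # Role of fitting module 302: training data -> (lower, upper) normal range.
    fitting_module: Callable[[List[float]], Tuple[float, float]]

    def judge(self, history: List[float], actual: float) -> bool:
        """Role of judging module 303: True means the value is abnormal."""
        training = self.training_module(history)
        lower, upper = self.fitting_module(training)
        return not (lower <= actual <= upper)


def toy_fit(training: List[float]) -> Tuple[float, float]:
    """Stand-in for fitting: mean +/- 2 sample standard deviations."""
    centre, spread = mean(training), stdev(training)
    return centre - 2 * spread, centre + 2 * spread


# The training stage here simply copies the history (no cleaning).
detector = StabilityDetector(training_module=list, fitting_module=toy_fit)
history = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0]
```

Passing the stages in as callables keeps the sketch short and lets a cleaning routine or a polynomial fit be swapped in without changing the judging logic.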
Generally, data accessing the DMP is supplied continuously; it is continuous data and can form time series data arranged in time order. In a specific implementation of the training module 301, training data arranged as a time series is obtained based on historical data. The historical data may be the continuous data extracted up to the present (for example, not including the current day). The historical data may include one or more of the following: website monitoring data, SDK monitoring data, offline business data, SQL data, and real-time interface data.
In a non-limiting example, the data in the historical data may be extracted to obtain time series data. The time series obtained by extracting historical data in time order is usually very large in volume (possibly millions to tens of millions of records), the values of the data may also be very large, and the historical data may contain abnormal values. The time series data can therefore be cleaned to avoid interference from abnormal data, yielding clean training data and, in turn, fitted data with high confidence. For example, a logarithmic transformation may be applied to the time series data to obtain transformed data; the values of the transformed data are significantly smaller, which facilitates subsequent fitting and other statistical analysis operations. Abnormal data is then rejected from the transformed data to obtain the training data. Abnormal data can be identified by means of a box plot. Abnormal data refers to data outside the range $(Q_L - 1.5\,IQR,\ Q_U + 1.5\,IQR)$, where $Q_L$ denotes the lower quartile of the transformed data, $Q_U$ denotes the upper quartile of the transformed data, and $IQR$ denotes the interquartile range of the transformed data, $IQR = Q_U - Q_L$.
The fitting module 302 may include: a building submodule 3021, adapted to construct a polynomial model containing parameters; and a solving submodule 3022, adapted to solve for the parameters of the polynomial model based on the training data, using the least squares method or the gradient descent method, to obtain the data distribution of the training data over time. In the building submodule 3021, the polynomial model containing parameters may be a polynomial regression model, a model commonly used by those skilled in the art for data fitting.
Taking the least squares method as an example, the following describes a process of constructing a polynomial model containing parameters and applying least squares to the training data to obtain the data distribution.
(1) Construct the polynomial model, i.e. the polynomial function $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_p x^p$, where $x$ is the independent variable, $p$ is a non-negative integer, and $\beta_i$, $i = 0, 1, 2, \dots, p$, are the model parameters.
(2) Compute the parameters of the polynomial model. Construct the residual sum of squares $M = \sum_i (y_i - X_i'\beta)^2$, where $y_i$ is the value estimated by the polynomial function, $X_i = (1, x_i, x_i^2, \dots, x_i^p)'$ is a $(p+1)$-dimensional vector, and the parameter $\beta = (\beta_0, \beta_1, \beta_2, \dots, \beta_p)'$ is a $(p+1)$-dimensional vector.
Differentiate $M$ with respect to each of the $(p+1)$ parameters and set $\frac{\partial M}{\partial \beta_k} = -2 \sum_i (y_i - X_i'\beta)\, X_{ik} = 0$, where $X_{ik} = x_i^k$, $k = 0, 1, 2, \dots, p$. This yields $(p+1)$ equations, from which the parameter $\beta$ can be obtained by matrix operations.
(3) Obtain fitted data from the polynomial model, then compute the difference between the time series of the training data and the fitted data to obtain a residual sequence. The distribution of the residual sequence is then estimated by a nonparametric method, from which the variance of the residual sequence is obtained. Those skilled in the art will understand that whether the parameter $\beta$ is significant can be determined from the value of $\beta$ and the variance of the residual sequence. If $\beta$ is significant, $\beta$ is substituted into the polynomial model to obtain the data distribution of the training data over time.
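For reference, the $(p+1)$ equations obtained in step (2) are the normal equations of least squares, and the "matrix operations" can be written compactly. This is standard least squares algebra rather than text from the source; the symbols $n$ (the number of training points), $\mathbf{X}$, $\mathbf{y}$, and $\hat{\beta}$ are introduced here for illustration, and the closed form assumes that $\mathbf{X}'\mathbf{X}$ is invertible.

```latex
\mathbf{X} =
\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^p \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^p
\end{pmatrix},
\qquad
\mathbf{y} = (y_1, \dots, y_n)',
\qquad
\mathbf{X}'\mathbf{X}\,\beta = \mathbf{X}'\mathbf{y}
\;\Longrightarrow\;
\hat{\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}.
```

Row $i$ of $\mathbf{X}$ is the vector $X_i'$ defined above, so the $k$-th normal equation is exactly the condition that the derivative of $M$ with respect to $\beta_k$ equals zero.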
In a variation of this embodiment, the parameters of the polynomial model may also be solved by the gradient descent method based on the training data to obtain the data distribution. The main flow is as follows:
(1) Construct a linear polynomial model, i.e. the polynomial function $y = \sum_{i=0}^{p} \beta_i x_i$, where $x_i$ are the independent variables, $p$ is a non-negative integer, and $\beta_i$, $i = 0, 1, \dots, p$, are the model parameters.
(2) Compute the parameters of the linear polynomial model.
First, assume a loss function $J(\beta)$, where $m$ denotes the number of training data and $y_i$ denotes the $i$-th element of the training data output. The goal of the computation is to find the parameters $\beta_i$, $i = 0, 1, \dots, p$, that minimize the loss function. Second, start from given initial values of the parameters $\beta_i$, $i = 0, 1, \dots, p$; for example, the initial values may be $\beta_i = 0$, $i = 0, 1, \dots, p$. Then update $\beta_i$ along the gradient descent direction of the loss function and repeat until the loss function converges, which gives the optimal solution for the parameters $\beta_i$, $i = 0, 1, \dots, p$.
(3) Substitute $\beta_i$, $i = 0, 1, \dots, p$, into the linear polynomial model to obtain the data distribution of the training data over time. Preferably, a confidence interval at a preset confidence level may be set based on the data distribution; for example, setting the preset confidence level of the data distribution to 95% yields the corresponding confidence interval.
Those skilled in the art will understand that the parameter vector of the fitted data may be computed in any practicable way to obtain the data distribution over time; the embodiments of the present invention place no restriction on this.
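As an illustration of the update step in the gradient descent variant, one commonly used choice is a half mean squared error loss. The exact loss function is not reproduced in this text, so the form below is an assumption, as are the symbols $J$, the learning rate $\alpha$, and the superscript $(j)$ indexing the $j$-th training sample.

```latex
J(\beta) = \frac{1}{2m} \sum_{j=1}^{m}
  \Bigl( \sum_{i=0}^{p} \beta_i x_i^{(j)} - y_j \Bigr)^{2},
\qquad
\beta_k \leftarrow \beta_k - \alpha \frac{\partial J}{\partial \beta_k}
  = \beta_k - \frac{\alpha}{m} \sum_{j=1}^{m}
    \Bigl( \sum_{i=0}^{p} \beta_i x_i^{(j)} - y_j \Bigr) x_k^{(j)},
\qquad k = 0, 1, \dots, p.
```

All $(p+1)$ parameters are updated simultaneously in each pass, and the passes repeat until the change in $J(\beta)$ falls below a chosen tolerance.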
, can be according to actual provision data compared with the data distribution, by comparing in the judge module 303 As a result determine whether the actual provision data exception occur.Preferably, set reliability that confidential interval is set, then described in judgement Whether actual provision data fall into the confidential interval.If in the confidential interval, the actual provision data are judged Normally, the actual provision data exception is otherwise judged.
Further, the data stability detection means 30 for big data can also include image display module 304, suitable for intuitively showing whether actual provision data are abnormal by the way of figure shows.The result signal shown with reference to Fig. 2 Figure, the actual provision data 1 can be shown and represent the mark 5 of the actual provision data exception, the data distribution 2, The confidential interval upper bound 3 and confidential interval lower bound 4 of the data distribution under default confidence level.Further, it is also possible to show reality It is low for abnormal high or exception to supply data mode.By taking Fig. 2 as an example, it can be shown as:" state:Numerical exception is high!”.
Further, an embodiment of the present invention also discloses a storage medium on which computer instructions are stored; when run, the computer instructions perform the steps of the data stability detection method for big data described in the embodiment shown in Fig. 1 above. Preferably, the storage medium may include a computer-readable storage medium. Preferably, the storage medium may include ROM, RAM, a magnetic disk, an optical disc, or the like.
Further, an embodiment of the present invention also discloses a server, including a memory and a processor. The memory stores computer instructions that can be run on the processor, and when the processor runs the computer instructions it performs the steps of the data stability detection method for big data described in the embodiment shown in Fig. 1 above.
Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art may make various changes or modifications without departing from the spirit and scope of the present invention; the scope of protection of the present invention shall therefore be the scope defined by the claims.

Claims (18)

  1. A data stability detection method for big data, characterized by comprising:
    obtaining training data arranged in time series based on historical data;
    fitting the training data to obtain a data distribution of the training data over time;
    determining, according to a comparison result between actual provision data and the data distribution, whether the actual provision data is abnormal.
  2. The data stability detection method for big data according to claim 1, characterized in that obtaining the training data arranged in time series based on historical data comprises: extracting data from the historical data in time-series order to obtain time series data; and cleaning the time series data to obtain the training data.
  3. The data stability detection method for big data according to claim 2, characterized in that cleaning the time series data to obtain the training data comprises: applying a logarithmic transformation to the time series data to obtain transformed data; and rejecting abnormal data in the transformed data to obtain the training data.
  4. The data stability detection method for big data according to claim 3, characterized in that the abnormal data refers to data outside the range $(Q_L - 1.5\,\mathrm{IQR},\ Q_U + 1.5\,\mathrm{IQR})$, where $Q_L$ denotes the lower quartile of the transformed data, $Q_U$ denotes the upper quartile of the transformed data, and $\mathrm{IQR}$ denotes the interquartile range of the transformed data, $\mathrm{IQR} = Q_U - Q_L$.
  5. The data stability detection method for big data according to claim 1, characterized in that the historical data includes one or more of the following: website monitoring data, SDK monitoring data, offline business data, Structured Query Language data, and real-time interface data.
  6. The data stability detection method for big data according to claim 1, characterized in that fitting the training data to obtain the data distribution of the training data over time comprises: constructing a polynomial model containing parameters; and, based on the training data, solving for the parameters of the polynomial model using the least squares method or the gradient descent method, to obtain the data distribution of the training data over time.
  7. The data stability detection method for big data according to claim 1, characterized in that determining, according to the comparison result between the actual provision data and the data distribution, whether the actual provision data is abnormal comprises:
    constructing a confidence interval of the data distribution under a preset confidence level;
    if the actual provision data falls within the confidence interval, judging the actual provision data to be normal, and otherwise judging the actual provision data to be abnormal.
  8. The data stability detection method for big data according to claim 7, characterized by further comprising: graphically displaying one or more of the following: the data distribution, the confidence interval of the data distribution under the preset confidence level, the actual provision data, and a mark indicating that the actual provision data is abnormal.
  9. A data stability detection device for big data, characterized by comprising:
    a training module, adapted to obtain training data arranged in time series based on historical data;
    a fitting module, adapted to fit the training data to obtain a data distribution of the training data over time;
    a judge module, adapted to determine, according to a comparison result between actual provision data and the data distribution, whether the actual provision data is abnormal.
  10. The data stability detection device for big data according to claim 9, characterized in that the training module is adapted to extract data from the historical data in time-series order to obtain time series data, and to clean the time series data to obtain the training data.
  11. The data stability detection device for big data according to claim 10, characterized in that the training module is further adapted to apply a logarithmic transformation to the time series data to obtain transformed data, and to reject abnormal data in the transformed data to obtain the training data.
  12. The data stability detection device for big data according to claim 11, characterized in that the abnormal data refers to data outside the range $(Q_L - 1.5\,\mathrm{IQR},\ Q_U + 1.5\,\mathrm{IQR})$, where $Q_L$ denotes the lower quartile of the training data, $Q_U$ denotes the upper quartile of the training data, and $\mathrm{IQR}$ denotes the interquartile range of the transformed data, $\mathrm{IQR} = Q_U - Q_L$.
  13. The data stability detection device for big data according to claim 9, characterized in that the historical data includes one or more of the following: website monitoring data, SDK monitoring data, offline business data, Structured Query Language data, and real-time interface data.
  14. The data stability detection device for big data according to claim 9, characterized in that
    the fitting module includes:
    a construction submodule, adapted to construct a polynomial model containing parameters;
    a solving submodule, adapted to solve, based on the training data, for the parameters of the polynomial model using the least squares method or the gradient descent method, to obtain the data distribution of the training data over time.
  15. The data stability detection device for big data according to claim 9, characterized in that
    the judge module is adapted to construct a confidence interval of the data distribution under a preset confidence level;
    if the actual provision data falls within the confidence interval, the judge module is adapted to judge the actual provision data to be normal, and otherwise to judge the actual provision data to be abnormal.
  16. The data stability detection device for big data according to claim 15, characterized by further comprising: an image display module, adapted to graphically display one or more of the following: the data distribution, the confidence interval of the data distribution under the preset confidence level, the actual provision data, and a mark indicating that the actual provision data is abnormal.
  17. A storage medium on which computer instructions are stored, characterized in that, when run, the computer instructions perform the steps of the data stability detection method for big data according to any one of claims 1 to 8.
  18. A server, including a memory and a processor, the memory storing computer instructions that can be run on the processor, characterized in that, when the processor runs the computer instructions, it performs the steps of the data stability detection method for big data according to any one of claims 1 to 8.
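The cleaning step recited in claims 3 and 4 (logarithmic transformation followed by rejection of data outside the quartile fences) can be sketched as follows. This is an illustrative sketch only: it assumes a natural logarithm, strictly positive input values, and NumPy's default linear-interpolation percentiles; the function name `clean_series` is hypothetical.

```python
import numpy as np

def clean_series(values):
    """Log-transform a positive series and drop points outside the IQR fences."""
    transformed = np.log(np.asarray(values, dtype=float))
    q_l, q_u = np.percentile(transformed, [25, 75])   # lower and upper quartiles
    iqr = q_u - q_l
    # Keep only points strictly inside (Q_L - 1.5*IQR, Q_U + 1.5*IQR).
    keep = (transformed > q_l - 1.5 * iqr) & (transformed < q_u + 1.5 * iqr)
    return transformed[keep], keep

# Usage: the spike at 5000 is rejected; the other points are kept.
values = [100, 102, 98, 101, 99, 5000, 103]
cleaned, keep = clean_series(values)
```

Returning the boolean mask alongside the cleaned values lets the caller drop the matching timestamps, so the remaining training data stays aligned with its time axis.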

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710705979.6A CN107577721A (en) 2017-08-17 2017-08-17 Data stability detection method and device, storage medium, server for big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710705979.6A CN107577721A (en) 2017-08-17 2017-08-17 Data stability detection method and device, storage medium, server for big data

Publications (1)

Publication Number Publication Date
CN107577721A true CN107577721A (en) 2018-01-12

Family

ID=61034285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710705979.6A Pending CN107577721A (en) 2017-08-17 2017-08-17 Data stability detection method and device, storage medium, server for big data

Country Status (1)

Country Link
CN (1) CN107577721A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180112)