CN106709250A - Data flow abnormality detection method based on parallel Kalman algorithm - Google Patents

Publication number
CN106709250A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611197599.8A
Other languages
Chinese (zh)
Inventor
许国艳
花青
石水倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Z INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
    • G16Z99/00 Subject matter not provided for in other main groups of this subclass

Abstract

The invention discloses a data-stream anomaly detection method based on a parallel Kalman algorithm. The method comprises the following steps: (1) sensor measurement data over a period of time are acquired; (2) the measurement data are compared with the measured values of the preceding period; if a change has occurred, an estimate is computed from the measured value by the Kalman algorithm, and the absolute value of the difference between the estimate and the measured value is compared with a specified threshold; if the absolute difference is not smaller than the threshold, the measured value is judged to be abnormal and the next step is carried out; (3) the cause of the abnormal value is determined by considering a time influence factor, a space influence factor, and other factors that affect anomaly detection, such as the flood season, the weather, and human factors; the cause is recorded and the information is stored in a database. The method takes the time influence factor, the space influence factor, and the other provenance-information influence factors into account, which improves detection precision, and it decomposes the algorithm into tasks that are processed in parallel, which improves efficiency.

Description

A data-stream anomaly detection method based on a parallel Kalman algorithm
Technical field
The invention belongs to the technical field of big data management, and in particular relates to a data-stream anomaly detection method based on a parallel Kalman algorithm.
Background art
Anomaly detection methods for data streams generally require complicated computation, and the data must also be corrected and fused. This is a complex process, so ensuring both the accuracy and the efficiency of detection is essential.
Kalman algorithms exist in the prior art, and time, space, and other provenance information have also been considered, but existing approaches only compute and detect each item individually. As a result, the precision of anomaly detection over the data stream as a whole is low, and errors arise in the detection process.
Summary of the invention
Object of the invention: the present invention provides a data-stream anomaly detection method based on a parallel Kalman algorithm, which addresses the problems of low accuracy and low efficiency in data-stream anomaly detection.
The invention discloses a data-stream anomaly detection method based on a parallel Kalman algorithm, comprising the following steps:
Step 1: Acquire a segment of measurement data from a sensor, mainly real-time water level measurements;
Step 2: Compare the sensor's current measured value with the measured values of the preceding period to judge whether the measured value is stable and unchanged; if the measured value has not changed over the period, extract a summary of the data and store it in the database; if the measured value differs from the data of the preceding period, proceed to Step 3;
Step 3: Compute an estimate from the measured value of Step 2 using the Kalman algorithm, and compare the absolute value of the difference between the estimate and the measured value with a given threshold; if it is smaller than the threshold, judge the value to be normal, extract a summary, and store it in the database; otherwise judge it to be an abnormal value and proceed to Step 4;
Step 4: Compute the time influence factor from the sensor's measured values over the preceding period; if the time influence factor is smaller than its threshold, judge the value to be an erroneous value whose anomaly type is a single-value anomaly, record it, replace the abnormal value with a corrected value, and store it in the database; otherwise proceed to Step 5;
Step 5: Compute the space influence factor from the measured values of the sensors associated with the sensor's location; if the space influence factor is smaller than its threshold, judge the value to be an erroneous value whose anomaly type is a single-sensor continuous anomaly, record and correct the anomaly, store it in the database, and handle it according to the sensor state; otherwise proceed to Step 6;
Step 6: Determine the cause of the abnormal value from the other factors that affect anomaly detection, including the flood season, the weather, and human factors; the anomaly type is a multi-sensor continuous anomaly; finally, record the cause of the abnormal value and store the information in the database.
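The decision cascade of Steps 2 to 6 can be sketched as a single function. This is a minimal illustration, not the patented implementation: the names `Verdict` and `classify`, the default thresholds, and the choice of which value is stored in each branch are assumptions, and the influence factors are taken as already computed inputs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Verdict:
    label: str                    # "stable", "normal", or an anomaly type
    stored_value: float           # value that would be written to the database
    cause: Optional[str] = None   # provenance-based cause (Step 6 only)


def classify(current: float, recent: Sequence[float], estimate: float,
             lam_t: float, lam_s: float, provenance: str,
             eps: float = 0.5, xi_t: float = 0.5, xi_s: float = 0.5,
             tol: float = 1e-9) -> Verdict:
    """Hypothetical cascade; eps, xi_t, xi_s are assumed threshold names."""
    # Step 2: an unchanged reading is summarised and stored directly.
    if all(abs(current - r) <= tol for r in recent):
        return Verdict("stable", current)
    # Step 3: compare |estimate - measurement| with the threshold.
    if abs(estimate - current) < eps:
        return Verdict("normal", current)
    # Step 4: small time influence factor -> isolated bad value,
    # replaced here by the Kalman estimate (an assumed correction).
    if lam_t < xi_t:
        return Verdict("single-value anomaly", estimate)
    # Step 5: small space influence factor -> only this sensor misbehaves.
    if lam_s < xi_s:
        return Verdict("single-sensor continuous anomaly", estimate)
    # Step 6: several sensors agree; attribute the cause to provenance data.
    return Verdict("multi-sensor continuous anomaly", current, cause=provenance)
```

A reading that passes the first two checks never reaches the influence factors, so those only need to be computed for values already flagged as abnormal.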
Based on the characteristics of stream data, the present invention provides a parallel Kalman method based on multidimensional influence factors. The algorithm uses three dimensions of information (time, space, and other provenance information) as influence factors to improve the accuracy of the anomaly detection result; it then decomposes the work into tasks that are processed in parallel to improve efficiency; finally, the algorithm was tested experimentally, demonstrating its feasibility.
Further, in Step 3, the specific operations for computing the estimate with the Kalman algorithm and comparing the difference between the estimate and the measured value with the given threshold are as follows:
Step 3.1: Input the initial state estimate, the initial mean-square-error estimate, and the initial covariance;
Step 3.2: Decompose a segment of measurement data preceding the current time using the wavelet transform, and extract the high-frequency part whose coefficients are below a threshold as the measurement noise, obtaining a measured value;
Step 3.3: Apply a time-dependent weight to the measured values within this period;
Step 3.4: Predict the measured value at the current time from the measured value at the previous time, then correct the state estimate in real time using the measured value at the current time; the three tasks (updating the measurement noise, prediction, and correction) are processed in parallel, and their results are combined to obtain an estimate;
Step 3.5: Using the results obtained in Step 3.1 and Step 3.3, compare the estimate with the measured value; if the absolute value of their difference is greater than the threshold, proceed to Step 4; otherwise judge the value to be normal.
Further, in Step 3.4, the three tasks (updating the measurement noise, prediction, and correction) are processed in parallel and their results are combined to obtain the estimate, specifically as follows:
3.4.1: Update the measurement noise
Wavelet decomposition is applied to a measurement sequence of length L. The inputs are the L measured values y(t-L-1), y(t-L), ..., y(t-1), and the output is the covariance $Q_w(t)$ of the measurement noise at time t. The measurement noise is updated by:
$Q_w(t-1) = E[w(t-1)\,w^T(t-1)]$
where the noise is extracted by the wavelet process $w(t-1) = \mathrm{WaveDec}(y(t-1))$;
A forgetting factor is then added:
$Q_w(t-1) = (1-\lambda_t)\,Q_w(t-1) + \lambda_t\,[w(t-1)\,w^T(t-1) - C(t-1)\,P(t|t-1)\,C^T(t-1)]$
where $\lambda_t = (1-\lambda)/(1-\lambda^t)$ and $0 < \lambda < 1$;
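The noise update above can be illustrated for a scalar water-level signal. This is a sketch under stated assumptions: the text does not name the wavelet used by WaveDec, so a one-level Haar transform is substituted here, and the function names `haar_noise` and `update_qw` and the default `lam` are hypothetical.

```python
import math
from typing import List


def haar_noise(y: List[float], thresh: float) -> List[float]:
    """One-level Haar decomposition (an assumed stand-in for WaveDec).

    Detail coefficients with magnitude below `thresh` are treated as noise
    and reconstructed back into the time domain; larger ones are dropped.
    """
    if len(y) % 2:
        raise ValueError("window length must be even")
    s = math.sqrt(2.0)
    noise: List[float] = []
    for a, b in zip(y[0::2], y[1::2]):
        d = (a - b) / s                     # high-frequency (detail) coefficient
        d = d if abs(d) < thresh else 0.0   # threshold screening
        noise.extend([d / s, -d / s])       # inverse Haar with approximation = 0
    return noise


def update_qw(qw_prev: float, w: float, c: float, p_pred: float,
              t: int, lam: float = 0.95) -> float:
    """Scalar forgetting-factor update of the measurement noise variance.

    Requires t >= 1 and 0 < lam < 1. The result is not clamped, so it can
    become negative when c * p_pred * c exceeds w * w.
    """
    lam_t = (1.0 - lam) / (1.0 - lam ** t)
    return (1.0 - lam_t) * qw_prev + lam_t * (w * w - c * p_pred * c)
```

At t = 1 the weight $\lambda_t$ equals 1, so the previous variance is discarded entirely; as t grows, $\lambda_t$ approaches 1 - λ and older values are forgotten at a steady rate.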
3.4.2: Prediction task
The prediction step computes the state estimate and the mean-square-error estimate. The inputs of the prediction task are therefore the state estimate $\hat{x}(t-1)$ and the error estimate P(t-1) at time t-1, and the outputs are the state prediction $\hat{x}(t|t-1)$ and the error prediction P(t|t-1) at time t;
Compute the one-step state prediction:
Compute the one-step mean-square-error prediction:
$P(t|t-1) = F(t-1)\,P(t-1)\,F^T(t-1) + Q_v(t-1)$
where $Q_v(t-1)$ is the system noise variance at time t-1; here $Q_v(t)$ is a constant;
3.4.3: Correction task
The correction task computes the Kalman filter gain and then corrects the state estimate and the error estimate using the Kalman gain and the measured value. The inputs of the correction step are the error prediction P(t|t-1) at time t, the measurement noise variance $Q_w(t-1)$ at time t-1, the state prediction $\hat{x}(t|t-1)$ at time t, and the measured value y(t) at time t; the outputs are the state estimate $\hat{x}(t)$ and the error estimate P(t) at time t;
Compute the filter gain from the mean-square-error prediction:
Update the state estimate from the one-step state prediction and the filter gain:
Update the mean-square error:
$P(t) = [1 - K(t)\,C(t-1)]\,P(t|t-1)$
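The equations for the one-step state prediction, the filter gain, and the state update are not reproduced in the text above. Assuming the standard Kalman filter forms, and writing $\hat{x}$ for the state estimate (an assumed symbol), they would plausibly read as follows, with F, C, K, P, and $Q_w$ as used in the surrounding formulas. The time index on C follows the document's C(t-1) convention and is likewise an assumption.

```latex
\begin{align}
\hat{x}(t|t-1) &= F(t-1)\,\hat{x}(t-1) \\
K(t) &= P(t|t-1)\,C^T(t-1)\,\bigl[C(t-1)\,P(t|t-1)\,C^T(t-1) + Q_w(t-1)\bigr]^{-1} \\
\hat{x}(t) &= \hat{x}(t|t-1) + K(t)\,\bigl[y(t) - C(t-1)\,\hat{x}(t|t-1)\bigr]
\end{align}
```

These forms are consistent with the error prediction $P(t|t-1) = F(t-1)\,P(t-1)\,F^T(t-1) + Q_v(t-1)$ and the error update $P(t) = [1 - K(t)\,C(t-1)]\,P(t|t-1)$ given in the text.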
3.4.4: Combine the results to compute the measurement estimate:
Further, the time influence factor in Step 4 is computed as follows:
where $\lambda_t(t)$ is the time-dimension influence factor at time t, $y_i(t-j)$ is the measured value of node i at time t-j, and $\hat{y}_i(t-j)$ is the predicted value of node i at time t-j.
Further, the space influence factor in Step 5 is computed as follows:
where $\lambda_s(t)$ is the space-dimension influence factor at time t, $y_i(t)$ is the measured value of node i at time t, and $\hat{y}_i(t)$ is the predicted value of node i at time t.
Further, the other factor affecting anomaly detection in Step 6 is mainly the flood season, whose influence factor is computed as follows:
where $\lambda_f(t)$ is the flood-season influence factor at time t; N and M are the numbers of samples in the non-flood period and the flood period, respectively; and $P_{N,t}$ and $P_{M,t}$ are the sampled water levels at time t in the non-flood period and the flood period, respectively.
To improve accuracy in data-stream anomaly detection, the present invention introduces time, space, and other provenance information and proposes a Kalman method based on multidimensional influence factors; on this basis, to improve efficiency, the tasks are parallelized, giving a parallel Kalman method based on multidimensional influence factors; finally, experiments on the proposed algorithm demonstrate that it is feasible and effective.
Brief description of the drawings
Fig. 1 is a flow chart of the existing Kalman algorithm;
Fig. 2 is a flow chart of the data-stream anomaly detection method of the present invention;
Fig. 3 is a flow chart of the Kalman algorithm as improved by the present invention;
Fig. 4 is a schematic diagram of the weight function;
Fig. 5 compares the running times of the AKF, WKF, and MDF-KF algorithms in the embodiment;
Fig. 6 compares the running times of the AKF, WKF, MDF-KF, and PKF algorithms in the embodiment.
Detailed description of the embodiments
The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments merely illustrate the invention and do not limit its scope; after reading the present disclosure, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the appended claims.
The parallel Kalman algorithm based on multidimensional influence factors is developed in two stages. First, to improve the accuracy of data-stream anomaly detection and to determine the type and cause of each anomaly, influence factors in three dimensions (time, space, and other provenance information) are added, giving a Kalman method based on multidimensional influence factors. Then, to improve efficiency, the algorithm is decomposed into tasks that run in parallel, giving a parallel Kalman algorithm based on multidimensional influence factors.
Kalman algorithm based on multidimensional influence factors:
1. Algorithm improvement
(1) Improved extraction of measurement noise by wavelet transform
The Kalman algorithm is essentially a continuously iterating process in which the estimate $\hat{x}(t)$ is repeatedly corrected by the measured value y(t), thereby improving prediction accuracy. However, the basic Kalman filter applies only to systems whose measurement noise is known, so its performance in practical applications is unsatisfactory. The Kalman algorithm based on multidimensional influence factors therefore improves the way measurement noise is obtained. The flow chart of the Kalman algorithm before improvement is shown in Fig. 1.
The Kalman algorithm based on multidimensional influence factors obtains the measurement noise in two steps:
1. Before anomaly detection, the algorithm first selects a segment of measurement data of length L preceding the current time, decomposes it using the wavelet transform, and extracts the high-frequency part whose coefficients are below a threshold as the measurement noise. The algorithm uses a threshold-based method; noise extraction can be subdivided into four steps: wavelet decomposition, threshold screening, wavelet reconstruction, and noise extraction.
2. After the measurement noise has been extracted in real time by the wavelet transform, the algorithm also applies a time-dependent weight to the measurement noise within this period, further improving the accuracy of the measurement noise estimate. Suppose that, within a data segment of time length L, the measurement noise values obtained by the wavelet transform are w1, w2, ..., wL; then the noise value at time L+1 is:
where $\lambda_t = (1-\lambda)/(1-\lambda^t)$ is the weight of the measurement noise $w_t$ at time t, and $\lambda \in (0,1)$. Fig. 4 is a schematic diagram of the weight function. The first panel shows the measurement noise values at time k and at the preceding L-1 times, obtained by the wavelet transform; the original method averages these L measurement noise values to obtain the measurement noise value at time t+1. This clearly introduces a large error, because the measurement noise at earlier times has less influence on the measurement noise at the current time. The second panel is the weight function curve: the closer a time is to the current time, the larger its weight; the farther away, the smaller its weight. This property matches the way noise at different times influences the current noise, so the weight function is added to the wavelet-based computation of measurement noise to improve its accuracy. The third panel shows the result after the weights are applied.
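The weighting described above can be illustrated numerically. The noise equation itself is not reproduced in the text, so this sketch assumes that $\lambda_t$ is applied recursively, in the same way as the forgetting-factor update of $Q_w$; under that assumption the recursion is equivalent to a normalized, exponentially decaying weighting in which recent noise values count most. The function names and the default `lam` are hypothetical.

```python
from typing import List, Sequence


def weighted_noise(ws: Sequence[float], lam: float = 0.8) -> float:
    """Recursive estimate: est_t = (1 - lam_t) * est_{t-1} + lam_t * w_t."""
    est = 0.0
    for t, w in enumerate(ws, start=1):
        lam_t = (1.0 - lam) / (1.0 - lam ** t)   # equals 1 at t = 1
        est = (1.0 - lam_t) * est + lam_t * w
    return est


def effective_weights(length: int, lam: float = 0.8) -> List[float]:
    """Closed-form weight placed on w_1 .. w_L by the recursion above."""
    norm = (1.0 - lam) / (1.0 - lam ** length)
    return [norm * lam ** (length - i) for i in range(1, length + 1)]
```

For example, with λ = 0.5 and three noise values the effective weights are 1/7, 2/7, and 4/7: they sum to 1 and grow toward the most recent value, matching the shape described for the second panel of Fig. 4.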
(2) Adding the multidimensional influence factors
In a wireless sensor network environment, existing data fusion operations mainly fuse data in the time dimension and the space dimension. The time dimension refers to fusing the data of a single sensor node across different times; the space dimension refers to fusing data across adjacent nodes at the same time. On this basis, data provenance is added to the anomaly detection algorithm, giving factors in three dimensions that affect anomaly detection: time, space, and other provenance information.
For the time dimension, while detecting anomalies at the current time, the Kalman algorithm based on multidimensional influence factors also combines the detection results for the node's measured values at several preceding times to obtain the time influence factor. The time range to be compared can be adjusted dynamically as needed, which improves the algorithm's adaptability in practical applications. The time influence factor is defined as follows:
where $\lambda_t(t)$ is the time-dimension influence factor at time t, $y_i(t-j)$ is the measured value of node i at time t-j, and $\hat{y}_i(t-j)$ is the predicted value of node i at time t-j.
For the space dimension, the Kalman algorithm based on multidimensional influence factors considers how the anomaly detection results of nodes that are adjacent to the current node, or related to it as upstream or downstream nodes, influence the detection result of the current node, and fuses the data of these spatially related nodes. The space-dimension influence factor is defined as follows:
where $\lambda_s(t)$ is the space-dimension influence factor at time t, $y_i(t)$ is the measured value of node i at time t, and $\hat{y}_i(t)$ is the predicted value of node i at time t.
Data fusion in the other-provenance-information dimension is based on the prediction of the Kalman algorithm and on other provenance information that affects anomaly detection, including the weather, the flood season, the sensor operating condition, and human activity. Because river water levels vary markedly differently in the flood period than in the non-flood period, the concept of a flood-season influence factor is proposed, defined as follows:
where $\lambda_f(t)$ is the flood-season influence factor at time t; N and M are the numbers of samples in the non-flood period and the flood period, respectively; and $P_{N,t}$ and $P_{M,t}$ are the sampled water levels at time t in the non-flood period and the flood period, respectively.
2. Algorithm implementation
(1) First, each sensor node runs Kalman detection on its own data. If the value at the current time is detected as abnormal, the time-dimension influence factor $\lambda_t(t)$ is computed from the data of the preceding period. If $\lambda_t(t)$ is greater than or equal to the threshold $\xi_t$, the difference between the predicted value and the measured value at the current time is recorded together with $\lambda_t(t)$, and the second step is entered; otherwise the point is considered a single anomalous point, the abnormal value is replaced with the predicted value, and the result is recorded in the database.
(2) Then, the space-dimension influence factor $\lambda_s(t)$ is computed from the feature values (the prediction-measurement difference and $\lambda_t(t)$) transmitted by each related sensor. If $\lambda_s(t)$ is greater than or equal to the threshold $\xi_s$, $\lambda_s(t)$ is recorded and the third step is entered; otherwise the anomaly is considered a sensor measurement anomaly, and the result is recorded in the database.
(3) Finally, the anomaly type is determined from the $\lambda_s(t)$ passed in by the system and from other provenance information such as the flood season and the sensor working state.
Parallel Kalman algorithm based on multidimensional influence factors
1. Parallelization improvement
In a sensor network, parallelizing the Kalman algorithm makes full use of the nodes in the network and improves the efficiency of the algorithm.
There are two common methods for parallelizing the Kalman algorithm:
(1) Matrix decomposition. The iterations of the Kalman algorithm involve a large number of matrix additions and multiplications. These matrix operations have been decomposed and simplified so that they can be computed simultaneously, achieving multi-machine execution.
(2) Task decomposition. The Kalman algorithm has two main steps, prediction and correction. In a single-machine Kalman algorithm, the CPU must wait for the prediction to finish before correction can begin, which seriously limits computational efficiency. In a distributed environment, the two processes can be separated, with results passed through inter-processor communication.
The present algorithm is ultimately decomposed into four tasks: updating the measurement noise, prediction, correction, and computation of the three-dimensional influence factors. Of these, updating the measurement noise, prediction, and correction can be executed in parallel:
(1) Updating the measurement noise
The method by which the improved Kalman algorithm based on multidimensional influence factors extracts measurement noise was described above. This process requires wavelet decomposition of a measurement sequence of length L. The inputs of this task are therefore the L measured values y(t-L-1), y(t-L), ..., y(t-1), and the output is the covariance $Q_w(t)$ of the measurement noise at time t. The measurement noise is updated by:
$Q_w(t-1) = E[w(t-1)\,w^T(t-1)]$ (Formula 5)
where $w(t-1) = \mathrm{WaveDec}(y(t-1))$ is the wavelet noise extraction process.
where $\lambda_t = (1-\lambda)/(1-\lambda^t)$ and $0 < \lambda < 1$.
(2) Prediction task
Fig. 1 shows the flow chart of the existing Kalman algorithm, in which the prediction step computes the state estimate and the error covariance matrix. The inputs of the prediction task are therefore the state estimate $\hat{x}(t-1)$ and the error estimate P(t-1) at time t-1, and the outputs are the state prediction $\hat{x}(t|t-1)$ and the error prediction P(t|t-1) at time t. The prediction formulas are:
$P(t|t-1) = F(t-1)\,P(t-1)\,F^T(t-1) + Q_v(t-1)$ (Formula 8)
where $Q_v(t-1)$ is the system noise variance at time t-1. Because the application scenario of the invention is a river hydrological sensor network, which can be regarded as a time-invariant system, $Q_v(t)$ is constant in the short term.
(3) Correction task
The correction task computes the Kalman filter gain and then corrects the state estimate and the error estimate using the Kalman gain and the measured value. The inputs of the correction step are the error prediction P(t|t-1) at time t, the measurement noise variance $Q_w(t-1)$ at time t-1, the state prediction $\hat{x}(t|t-1)$ at time t, and the measured value y(t) at time t; the outputs are the state estimate $\hat{x}(t)$ and the error estimate P(t) at time t. The formulas of the correction step are:
$P(t) = [1 - K(t)\,C(t-1)]\,P(t|t-1)$ (Formula 11)
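The prediction and correction tasks can be sketched for a scalar state such as a water level. This is a minimal illustration that assumes the standard Kalman gain and state-update forms for the formulas not reproduced in the text; the function names and the default values of `f`, `c`, `qv`, and `qw` are hypothetical.

```python
from typing import Tuple


def predict(x_prev: float, p_prev: float,
            f: float = 1.0, qv: float = 1e-4) -> Tuple[float, float]:
    """Prediction task: returns the state and error predictions for time t."""
    x_pred = f * x_prev                 # assumed standard one-step prediction
    p_pred = f * p_prev * f + qv        # Formula 8 in scalar form
    return x_pred, p_pred


def correct(x_pred: float, p_pred: float, y: float,
            c: float = 1.0, qw: float = 0.01) -> Tuple[float, float, float]:
    """Correction task: returns the corrected state, error, and gain."""
    k = p_pred * c / (c * p_pred * c + qw)   # assumed standard Kalman gain
    x_new = x_pred + k * (y - c * x_pred)    # assumed standard state update
    p_new = (1.0 - k * c) * p_pred           # Formula 11 in scalar form
    return x_new, p_new, k
```

In this sequential form `correct` consumes the output of `predict` for the same time step, which is exactly the dependency that has to be broken before the tasks can run on separate processors.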
As can be seen from the description of the three tasks above, the output of each task is the input of another; to achieve parallelization, these three steps must be adjusted. The method used here lags the correction step and advances the prediction step by one time step; that is, the correction step is unchanged, and the formulas of the prediction step become:
In this way, the three tasks described above can be computed simultaneously on three different processors, saving computation time.
Besides the three tasks executed concurrently, there is one further task that computes the anomaly result from the three-dimensional influence factors. Because this task needs the result of the improved Kalman algorithm, it cannot run concurrently with the three tasks above, so it is executed after the three parallel tasks.
2. Parallelization implementation
In the implementation, the initialization parameters of the system are given first, and then three processors compute the three tasks described above. Because the correction task has been lagged, the three tasks do not need to wait for one another; they only need to communicate after each computation completes in order to pass on their results. Finally, the measurement estimate is output for the next stage of processing. Fig. 3 gives the flow chart of the improved parallel algorithm.
Combining the Kalman algorithm based on multidimensional influence factors with the parallel Kalman algorithm based on multidimensional influence factors, the time influence factor, the space influence factor, and the other-provenance factor (here, the flood-season factor) are added, and the three tasks of measurement noise extraction, state estimation, and state correction are processed in parallel. The overall detection flow is shown in Fig. 2.
Because extracting the measurement noise requires wavelet decomposition of the measured values over the period preceding the current time, this process takes far longer than state estimation and state correction, so the noise extraction task needs to be divided further. Since noise extraction at each time is not coupled with that at other times, it can be decomposed directly.
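Because the noise extraction for each time step depends only on its own window of measurements, the windows can be handed to independent workers. The sketch below illustrates this with Python's standard `concurrent.futures` module rather than the Storm deployment mentioned in the experiments. The per-window estimator `window_noise_var` is a hypothetical placeholder (half the variance of first differences) standing in for the wavelet extraction, and `noise_per_step` and `workers` are assumed names.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import pvariance
from typing import List, Sequence


def window_noise_var(window: Sequence[float]) -> float:
    """Placeholder noise estimate for one window (not the wavelet method)."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    return pvariance(diffs) / 2.0 if len(diffs) > 1 else 0.0


def noise_per_step(y: Sequence[float], length: int,
                   workers: int = 4) -> List[float]:
    """Estimate the noise for every time step from its own preceding window.

    Windows share no state, so they can be mapped over a worker pool;
    pool.map returns the results in window order.
    """
    windows = [y[t - length:t] for t in range(length, len(y) + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(window_noise_var, windows))
```

Threads keep the example short, but in CPython they do not speed up CPU-bound work; a process pool or a distributed stream processor would be the natural choice for real wavelet decompositions.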
Experimental verification
1. Experimental results and analysis of the Kalman algorithm based on multidimensional influence factors
Detection accuracy of the Kalman algorithm based on multidimensional influence factors. The same data were processed with the Kalman algorithm based on a time forgetting factor (Amnesic Kalman Filtering, AKF), the Kalman algorithm based on the wavelet transform (Wavelet Kalman Filtering, WKF), and the Kalman algorithm based on multidimensional influence factors (MDF-KF), and the three algorithms' detection of different types of anomalous points was compared. From the real-time telemetered water level data set of a river, 1000 consecutive records from May were selected. First, 5 non-consecutive records were chosen at random from the 1000 and the preset threshold was added to or subtracted from each, to serve as single anomalous data points. Then, a segment of 10 consecutive records was chosen and the threshold was added to each, to represent continuous anomalous points. Finally, 5 further consecutive records were chosen and modified into continuously decreasing data, to represent a sudden drop in measured values. Over repeated experiments, the following conclusions were drawn. Compared with the other two algorithms, AKF has the highest error rate, with relatively serious missed detections and false detections, and it also performs poorly on continuous anomalous points. WKF has a slightly lower error rate than AKF and fewer missed detections, and performs slightly better on single and continuous constant anomalies, but it performs poorly on continuous abrupt changes, and its false detection rate is also relatively high. The Kalman algorithm based on multidimensional influence factors combines the advantages of the two algorithms above: the number of continuous anomalies detected rises significantly, the numbers of missed and false detections fall, and detection accuracy improves.
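The anomaly injection procedure used to build the test set can be sketched as follows. This is an illustrative reconstruction, not the authors' script: the function name `inject_anomalies`, the label strings, the placement of the two runs in separate halves of the series, and the size of each downward step in the sudden-drop segment are all assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def inject_anomalies(levels: Sequence[float], eps: float,
                     seed: int = 0) -> Tuple[List[float], Dict[int, str]]:
    """Return a modified copy of `levels` and a map index -> anomaly type."""
    rng = random.Random(seed)
    data = list(levels)
    n = len(data)
    if n < 100:
        raise ValueError("need at least 100 records")
    labels: Dict[int, str] = {}
    # 10 consecutive records shifted up by the threshold (first half).
    run = rng.randrange(0, n // 2 - 10)
    for i in range(run, run + 10):
        data[i] += eps
        labels[i] = "continuous"
    # 5 consecutive records rewritten as a decreasing ramp (second half).
    drop = rng.randrange(n // 2, n - 5)
    base = data[drop]
    for k in range(5):
        data[drop + k] = base - (k + 1) * eps
        labels[drop + k] = "sudden-drop"
    # 5 isolated records, mutually non-adjacent, shifted up or down.
    free = [i for i in range(n) if i not in labels]
    singles: List[int] = []
    while len(singles) < 5:
        i = rng.choice(free)
        if all(abs(i - j) > 1 for j in singles):
            singles.append(i)
            data[i] += rng.choice([-eps, eps])
            labels[i] = "single"
    return data, labels
```

Returning the label map alongside the data makes it straightforward to count missed and false detections for each anomaly type afterwards.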
The execution time of Kalman algorithm of the analysis based on multidimensional factor of influence.To the telemetry of above-mentioned selection, difference 8 sampled points between 100 to 5000 records are taken, three run times of algorithm (second) are contrasted, Fig. 5 is difference The broken line graph of Riming time of algorithm during data volume.From experiment, with the growth of data volume, the run time of AKF algorithms increases Long unobvious, because AKF algorithms do not need extract real-time measurement noise, calculating process is most simple.And WKF algorithms and based on many Wavelet transformation is needed to use to extract the measurement noise in a time period in the Kalman algorithms for tieing up factor of influence, so with number According to the growth of amount, the run time of the two algorithms can also increase quickly.Kalman algorithms based on multidimensional factor of influence Although operational efficiency is higher than AKF algorithm, also it is higher by than WKF algorithm a little, the accuracy rate of abnormality detection is apparently higher than forefathers Algorithm.Table 1 is algorithm detection error rate contrast (MDF-KF is the abbreviation of Kalman algorithms in table).
The algorithm of table 1 detects error rate contrast table
2nd, the parallel Kalman algorithm experimentals result based on multidimensional factor of influence and analysis
Present invention is generally directed to the on-line checking of abnormal data, the data volume in whole wireless sensor network is not very It is many, therefore in an experiment, at most only select 5000 datas.It is same to use remote measurement waterlevel data collection, choose 100 respectively and arrive 8 sampled points between 5000 datas, on the algorithm after parallelization and AKF algorithms, WKF algorithms and based on multidimensional influence because The run time (second) of Kalman algorithms (MDF-KF) these three algorithms of son is contrasted, and Fig. 6 is right for algorithm execution time Than figure.
The run time of AKF algorithms is most short, and WKF algorithms and the Kalman Riming time of algorithm based on multidimensional factor of influence It is more long.Parallel Kalman algorithms (PKF) based on multidimensional factor of influence after the parallelization effect in lower data amount is poor, Because Storm is before data processing is carried out, first having to take some time carries out resource allocation.When data volume is relatively low, point With the shared large percentage in the wastage in bulk or weight time of task and distribution time for being consumed of resource, therefore parallelization effect and pay no attention to Think;After data volume increase to a certain extent, the time effects that distribution task and resource are consumed diminish, at this moment parallelization Advantage is also embodied.From fig. 6, it can be seen that when data volume is reached after 1000, needed for the algorithm after parallelization treatment Time less than the algorithm before parallelization.Table 2 gives mistake when carrying out abnormality detection using these factors of influence respectively Rate.As can be seen from the results, the factor of influence of time dimension can reduce missing inspection number and lift the detection of continuous abnormal value Number, the factor of influence of Spatial Dimension can reduce flase drop number, and other factors of influence for playing source information dimension can reduce mistake Inspection number.
Table 2: Comparison of error rates across influence-factor dimensions

Claims (6)

1. A data stream anomaly detection method based on a parallel Kalman algorithm, characterized in that it comprises the following steps:
Step 1: Acquire a segment of measurement data from a sensor, mainly real-time water-level measurements;
Step 2: Compare the sensor's current measured value with the measured values of the preceding period to judge whether the measured value is stable and unchanged; if the measured value has not changed over a period of time, extract a summary of the data and store it in the database; if the measured value differs from the data of the preceding period, proceed to Step 3;
Step 3: Compute an estimate from the measured value of Step 2 using the Kalman algorithm, and compare the absolute value of the difference between the estimate and the measured value with a given threshold; if it is less than the threshold, judge the value to be normal, extract a summary, and store it in the database; otherwise judge it to be an outlier and proceed to Step 4;
Step 4: Compute the time influence factor from the sensor's measured values over the preceding period; if the time influence factor is less than the threshold, judge the value to be erroneous, the anomaly type being a single-value anomaly, then record it, correct and replace the outlier, and store it in the database; otherwise proceed to Step 5;
Step 5: Compute the spatial influence factor from the measured values of the sensors associated with the sensor's position; if the spatial influence factor is less than the threshold, judge the value to be erroneous, the anomaly type being a single-sensor continuous anomaly, then record the anomaly, correct it, store it in the database, and handle it according to the sensor state; otherwise proceed to Step 6;
Step 6: Judge the cause of the outlier according to the other factors that influence anomaly detection, including flood season, weather, and human factors, the anomaly type being a multi-sensor continuous anomaly; finally record the cause of the outlier and store the information in the database.
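The cascade of Steps 2 to 6 can be sketched as a chain of threshold checks. This is only an illustrative sketch, not the claimed implementation: the names `Verdict` and `classify` and all threshold parameters are hypothetical, the influence factors are assumed to be computed elsewhere, and the stability test of Step 2 is simplified to an equality check against a single previous value.

```python
from enum import Enum


class Verdict(Enum):
    STABLE = "stable, summarize and store"                # Step 2
    NORMAL = "normal value"                               # Step 3
    SINGLE_VALUE = "single-value anomaly"                 # Step 4
    SINGLE_SENSOR = "single-sensor continuous anomaly"    # Step 5
    MULTI_SENSOR = "multi-sensor continuous anomaly"      # Step 6


def classify(current, previous, estimate, time_factor, space_factor,
             residual_threshold, time_threshold, space_threshold):
    """Walk one measurement through Steps 2-6 of claim 1.

    Every threshold is a hypothetical parameter; the claim does not fix one.
    """
    # Step 2 (simplified assumption): an unchanged reading is only summarized.
    if current == previous:
        return Verdict.STABLE
    # Step 3: compare |estimate - measurement| with the given threshold.
    if abs(estimate - current) < residual_threshold:
        return Verdict.NORMAL
    # Step 4: a small time influence factor marks an isolated erroneous value.
    if time_factor < time_threshold:
        return Verdict.SINGLE_VALUE
    # Step 5: a small spatial influence factor marks one misbehaving sensor.
    if space_factor < space_threshold:
        return Verdict.SINGLE_SENSOR
    # Step 6: otherwise the cause is sought in flood season, weather, etc.
    return Verdict.MULTI_SENSOR
```

For example, `classify(3.0, 2.0, 2.1, 0.1, 0.9, 0.5, 0.2, 0.2)` fails the residual check of Step 3 and then stops at Step 4, returning `Verdict.SINGLE_VALUE`.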
2. The data stream anomaly detection method based on a parallel Kalman algorithm according to claim 1, characterized in that, in Step 3, the specific operations for computing the estimate with the Kalman algorithm and comparing the difference between the estimate and the measured value with the given threshold are as follows:
Step 3.1: Input the initial state estimate, the initial mean-square-error estimate, and the initial covariance;
Step 3.2: For the segment of measurement data preceding the current time, decompose the data using a wavelet transform and extract the high-frequency part whose coefficients are below a threshold as the measurement noise, obtaining a measured value;
Step 3.3: Add a time-related weight to the measured values in this period;
Step 3.4: Estimate the measured value at the current time from the measured value at the previous time, then correct the state estimate in real time using the measured value at the current time; process the three tasks of updating the measurement noise, prediction, and correction in parallel, and combine their results to compute an estimate;
Step 3.5: Using the calculation results obtained in Step 3.1 and Step 3.3, judge the difference between the estimate and the measured value: if the absolute value of the difference between the estimate and the measured value is greater than the threshold, proceed to Step 4; otherwise judge the value to be normal.
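The noise extraction of Step 3.2 can be illustrated with a one-level Haar decomposition. This is a sketch under stated assumptions: the claim does not name a wavelet family or a decomposition depth, so the Haar wavelet, the single level, and the names `haar_noise` and `coeff_threshold` are all hypothetical choices made for illustration.

```python
import math


def haar_noise(samples, coeff_threshold):
    """Split `samples` into (denoised, noise) with a one-level Haar transform.

    Detail (high-frequency) coefficients whose magnitude is below
    `coeff_threshold` are treated as measurement noise, as in Step 3.2.
    The Haar wavelet itself is an assumption, not taken from the claim.
    """
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    s = math.sqrt(2.0)
    denoised, noise = [], []
    for k in range(0, len(samples), 2):
        approx = (samples[k] + samples[k + 1]) / s   # low-frequency part
        detail = (samples[k] - samples[k + 1]) / s   # high-frequency part
        # Small detail coefficients are attributed to noise, large ones kept.
        d_noise = detail if abs(detail) < coeff_threshold else 0.0
        d_keep = detail - d_noise
        # Inverse Haar transform, applied separately to signal and noise.
        denoised += [(approx + d_keep) / s, (approx - d_keep) / s]
        noise += [d_noise / s, -d_noise / s]
    return denoised, noise
```

With `haar_noise([10.0, 10.2, 10.0, 14.0], 1.0)`, the small wiggle in the first pair is moved into the noise series, while the jump from 10.0 to 14.0 in the second pair has a large detail coefficient and stays in the signal. The two returned series always add back up to the input.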
3. The data stream anomaly detection method based on a parallel Kalman algorithm according to claim 1, characterized in that, in Step 3.4, the three tasks of updating the measurement noise, prediction, and correction are processed in parallel, and the specific steps for finally computing the combined estimate are as follows:
3.4.1: Update the measurement noise
Perform wavelet decomposition on a measurement sequence of length L; the input is the L measured values $y(t-L-1), y(t-L), \ldots, y(t-1)$, and the output is the covariance $Q_w(t)$ of the measurement noise at time t. The formula for updating the measurement noise is:
$$Q_w(t-1) = E\left[w(t-1)\,w^T(t-1)\right]$$
where the noise is extracted by the wavelet as: $w(t-1) = \mathrm{WaveDec}(y(t-1))$;
A forgetting factor is then added:
$$Q_w(t-1) = (1-\lambda_t)\,Q_w(t-1) + \lambda_t\left[w(t-1)\,w^T(t-1) - C(t-1)\,P(t|t-1)\,C^T(t-1)\right]$$
where $\lambda_t = (1-\lambda)/(1-\lambda^t)$ and $0 < \lambda < 1$;
3.4.2: Prediction task
The prediction stage of the task computes the state estimate and the mean-square-error estimate; the inputs of the prediction task are therefore the state estimate $\hat{x}(t-1)$ at time t-1 and the error estimate $P(t-1)$ at time t-1, and the outputs are the state estimate $\hat{x}(t|t-1)$ at time t and the error estimate $P(t|t-1)$ at time t;
Compute the one-step state estimate:
$$\hat{x}(t|t-1) = F(t-1)\,\hat{x}(t-1) + v(t-1)$$
Compute the one-step mean-square-error estimate:
$$P(t|t-1) = F(t-1)\,P(t-1)\,F^T(t-1) + Q_v(t-1)$$
where $Q_v(t-1)$ is the system noise variance at time t-1, and $Q_v(t)$ is a constant here;
3.4.3: Correction task
The correction task mainly computes the Kalman filter gain and then corrects the state estimate and the error estimate according to the Kalman gain and the measured value; the inputs of the correction stage are the error estimate $P(t|t-1)$ at time t, the measurement noise variance $Q_w(t-1)$ at time t-1, the state estimate $\hat{x}(t|t-1)$ at time t, and the measured value $y(t)$ at time t, and the outputs are the state estimate $\hat{x}(t)$ at time t and the error estimate $P(t)$ at time t;
Compute the filter gain from the mean-square-error estimate:
$$K(t) = \frac{P(t|t-1)\,C^T(t-1)}{C(t-1)\,P(t|t-1)\,C^T(t-1) + Q_w(t-1)}$$
Update the state estimate from the one-step state estimate and the filter gain:
$$\hat{x}(t) = \hat{x}(t|t-1) + K(t)\left[y(t) - C(t-1)\,\hat{x}(t|t-1)\right]$$
Update the mean-square error:
$$P(t) = \left[1 - K(t)\,C(t-1)\right]P(t|t-1)$$
3.4.4: Compute the combined measurement estimate:
$$\hat{y}(t) = C(t)\,\hat{x}(t) + w(t).$$
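The formulas of 3.4.1 to 3.4.4 can be sketched for the scalar case as follows. This is a minimal illustration under stated assumptions: the state, the matrices F and C, and the noise terms are all treated as plain floats, the three tasks are written as ordinary sequential functions (the Storm-based parallel scheduling is not modeled), and every function and parameter name is hypothetical.

```python
def forgetting_weight(lam, t):
    """lambda_t = (1 - lam) / (1 - lam**t), for 0 < lam < 1 and t >= 1."""
    return (1.0 - lam) / (1.0 - lam ** t)


def update_measurement_noise(q_w, w, c, p_pred, lam, t):
    """3.4.1: blend the previous Q_w with the newly extracted noise sample w."""
    lt = forgetting_weight(lam, t)
    return (1.0 - lt) * q_w + lt * (w * w - c * p_pred * c)


def predict(x_prev, p_prev, f, q_v, v=0.0):
    """3.4.2: one-step state estimate and mean-square-error estimate."""
    x_pred = f * x_prev + v
    p_pred = f * p_prev * f + q_v
    return x_pred, p_pred


def correct(x_pred, p_pred, y, c, q_w):
    """3.4.3: filter gain, corrected state estimate, corrected error."""
    k = p_pred * c / (c * p_pred * c + q_w)
    x_new = x_pred + k * (y - c * x_pred)
    p_new = (1.0 - k * c) * p_pred
    return x_new, p_new, k


def measurement_estimate(x_new, c, w=0.0):
    """3.4.4: combined measurement estimate y_hat(t) = C x_hat(t) + w."""
    return c * x_new + w
```

As a usage example, `predict(10.0, 1.0, 1.0, 0.0)` returns `(10.0, 1.0)`, and `correct(10.0, 1.0, 12.0, 1.0, 1.0)` then gives a gain of 0.5, a corrected state of 11.0, and a corrected error of 0.5. Note that the bracketed term in the noise update can become negative for a small noise sample, so a practical implementation would probably need a guard that the claim does not describe.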
4. The data stream anomaly detection method based on a parallel Kalman algorithm according to any one of claims 1 to 3, characterized in that the time influence factor in Step 4 is computed as follows:
$$\lambda_t(t) = \sum_{j=0}^{N} \left| y_i(t-j) - \hat{y}_i(t-j) \right| / N$$
where $\lambda_t(t)$ is the time-dimension influence factor at time t, $y_i(t-j)$ is the measured value of node $i$ at time t-j, and $\hat{y}_i(t-j)$ is the predicted value of node $i$ at time t-j.
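A direct transcription of this formula might look like the sketch below. The function name and the list-based input layout are assumptions; the sketch follows the claim literally, so the sum covers the N + 1 terms j = 0..N while the divisor is N.

```python
def time_influence_factor(measured, estimated):
    """Time-dimension influence factor for one node.

    measured[j] and estimated[j] hold y_i(t-j) and its predicted value
    for j = 0..N (hypothetical layout, newest sample first).
    """
    if len(measured) != len(estimated) or len(measured) < 2:
        raise ValueError("need two equally long sequences with N >= 1")
    n = len(measured) - 1  # the sum has N + 1 terms, the divisor is N
    total = sum(abs(y - y_hat) for y, y_hat in zip(measured, estimated))
    return total / n
```

For instance, measured values `[1.0, 2.0, 4.0]` against predictions `[1.5, 2.0, 3.0]` give absolute residuals 0.5, 0.0, and 1.0, so with N = 2 the factor is 0.75.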
5. The data stream anomaly detection method based on a parallel Kalman algorithm according to claim 4, characterized in that the spatial influence factor in Step 5 is computed as follows:
$$\lambda_s(t) = \sum_{i=0}^{N} \left| y_i(t) - \hat{y}_i(t) \right| / N$$
where $\lambda_s(t)$ is the spatial-dimension influence factor at time t, $y_i(t)$ is the measured value of node $i$ at time t, and $\hat{y}_i(t)$ is the predicted value of node $i$ at time t.
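The spatial factor has the same shape, but the sum runs over neighboring nodes at one instant rather than over past samples of one node. The sketch below assumes the associated sensors have already been selected; the function name and the pair-based input layout are hypothetical, and the divisor again follows the claim literally.

```python
def spatial_influence_factor(readings):
    """Spatial-dimension influence factor at a single time t.

    `readings` is a list of (measured, predicted) pairs, one pair per
    associated node i = 0..N (hypothetical layout).
    """
    if len(readings) < 2:
        raise ValueError("need at least two nodes (N >= 1)")
    n = len(readings) - 1  # the sum has N + 1 terms, the divisor is N
    return sum(abs(y - y_hat) for y, y_hat in readings) / n
```

For example, three nodes with readings `[(5.0, 5.5), (6.0, 6.0), (7.0, 5.0)]` have residuals 0.5, 0.0, and 2.0, giving a factor of 1.25 with N = 2.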
6. The data stream anomaly detection method based on a parallel Kalman algorithm according to claim 1, characterized in that the other factor influencing anomaly detection in Step 6 is mainly the flood season, and the flood-season influence factor is computed as follows:
$$\lambda_f(t) = \left( \frac{1}{N} \sum_{i=1}^{N} P_{N,t} \right) \Big/ \left( \frac{1}{M} \sum_{i=1}^{M} P_{M,t} \right)$$
where $\lambda_f(t)$ is the flood-season influence factor at time t; N and M are the numbers of samples in the non-flood period and the flood period, respectively; and $P_{N,t}$ and $P_{M,t}$ are the water levels sampled at time t in the non-flood period and the flood period, respectively.
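Read as a ratio of two averages, the flood-season factor can be sketched as below. The function name and the list inputs are assumptions, and how the samples for time t are collected from each period is not specified here.

```python
from statistics import fmean


def flood_influence_factor(non_flood_levels, flood_levels):
    """Ratio of the mean non-flood water level to the mean flood water level.

    Both arguments are lists of sampled water levels (hypothetical layout):
    N samples from the non-flood period and M samples from the flood period.
    """
    flood_mean = fmean(flood_levels)
    if flood_mean == 0:
        raise ValueError("mean flood-period water level must be non-zero")
    return fmean(non_flood_levels) / flood_mean
```

For example, non-flood levels `[2.0, 4.0]` (mean 3.0) and flood levels `[5.0, 7.0]` (mean 6.0) give a factor of 0.5.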
CN201611197599.8A 2016-12-22 2016-12-22 Data flow abnormality detection method based on parallel Kalman algorithm Pending CN106709250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611197599.8A CN106709250A (en) 2016-12-22 2016-12-22 Data flow abnormality detection method based on parallel Kalman algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611197599.8A CN106709250A (en) 2016-12-22 2016-12-22 Data flow abnormality detection method based on parallel Kalman algorithm

Publications (1)

Publication Number Publication Date
CN106709250A true CN106709250A (en) 2017-05-24

Family

ID=58938799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611197599.8A Pending CN106709250A (en) 2016-12-22 2016-12-22 Data flow abnormality detection method based on parallel Kalman algorithm

Country Status (1)

Country Link
CN (1) CN106709250A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107484196A (en) * 2017-08-14 2017-12-15 北京上格云技术有限公司 The quality of data ensuring method and computer-readable medium of sensor network
CN108571997A (en) * 2017-12-26 2018-09-25 深圳市鼎阳科技有限公司 A kind of method and apparatus that measured point is steadily contacted in detection probe
CN108616838A (en) * 2018-04-29 2018-10-02 山东省计算中心(国家超级计算济南中心) Agricultural greenhouse Data Fusion method based on Kalman filtering algorithm
CN108981679A (en) * 2017-05-31 2018-12-11 精工爱普生株式会社 Circuit device, physical amount measuring device, electronic equipment and moving body
CN109388772A (en) * 2018-09-04 2019-02-26 河海大学 A kind of taboo search method that time-based Large Scale Graphs equilibrium k is divided
CN109522520A (en) * 2018-11-09 2019-03-26 河海大学 The multiple small echo coherent analysis method of groundwater level fluctuation and multiple factors
CN109699021A (en) * 2018-12-31 2019-04-30 宁波工程学院 One kind is based on time-weighted agriculture Internet of Things method for diagnosing faults
CN109990789A (en) * 2019-03-27 2019-07-09 广东工业大学 A kind of flight navigation method, apparatus and relevant device
CN112650281A (en) * 2020-12-14 2021-04-13 一飞(海南)科技有限公司 Multi-sensor tri-redundancy system, control method, unmanned aerial vehicle, medium and terminal
CN114137636A (en) * 2021-11-11 2022-03-04 四川九通智路科技有限公司 Regional meteorological monitoring management method and system for annular pressure sensor
CN115388931A (en) * 2022-10-27 2022-11-25 河北省科学院应用数学研究所 Credible monitoring method, monitoring terminal and storage medium for sensor abnormal data
CN115795350A (en) * 2023-01-29 2023-03-14 北京众驰伟业科技发展有限公司 Abnormal data information processing method in production process of blood rheology test cup

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
KATSUYA KONDO等: "《European Signal Processing Conference》", 31 December 2015 *
焉晓贞等: ""基于卡尔曼滤波的动态传感数据流估计方法"", 《仪器仪表学报》 *
王永利等: ""数据流上异常数据的在线检测与修正"", 《应该科学学报》 *
花青等: ""基于多维滑窗的异常数据检测方法"", 《计算机应用》 *
高羽等: ""小波变换域估计观测噪声方差的Kalman滤波算法及其在数据融合中的应用"", 《电子学报》 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108981679A (en) * 2017-05-31 2018-12-11 精工爱普生株式会社 Circuit device, physical amount measuring device, electronic equipment and moving body
CN107484196B (en) * 2017-08-14 2020-10-09 博锐尚格科技股份有限公司 Data quality assurance method for sensor network and computer readable medium
CN107484196A (en) * 2017-08-14 2017-12-15 北京上格云技术有限公司 The quality of data ensuring method and computer-readable medium of sensor network
CN108571997A (en) * 2017-12-26 2018-09-25 深圳市鼎阳科技有限公司 A kind of method and apparatus that measured point is steadily contacted in detection probe
CN108616838A (en) * 2018-04-29 2018-10-02 山东省计算中心(国家超级计算济南中心) Agricultural greenhouse Data Fusion method based on Kalman filtering algorithm
CN109388772A (en) * 2018-09-04 2019-02-26 河海大学 A kind of taboo search method that time-based Large Scale Graphs equilibrium k is divided
CN109522520A (en) * 2018-11-09 2019-03-26 河海大学 The multiple small echo coherent analysis method of groundwater level fluctuation and multiple factors
CN109699021A (en) * 2018-12-31 2019-04-30 宁波工程学院 One kind is based on time-weighted agriculture Internet of Things method for diagnosing faults
CN109699021B (en) * 2018-12-31 2021-08-10 宁波工程学院 Agricultural Internet of things fault diagnosis method based on time weighting
CN109990789A (en) * 2019-03-27 2019-07-09 广东工业大学 A kind of flight navigation method, apparatus and relevant device
CN112650281A (en) * 2020-12-14 2021-04-13 一飞(海南)科技有限公司 Multi-sensor tri-redundancy system, control method, unmanned aerial vehicle, medium and terminal
CN112650281B (en) * 2020-12-14 2023-08-22 一飞(海南)科技有限公司 Multi-sensor three-redundancy system, control method, unmanned aerial vehicle, medium and terminal
CN114137636A (en) * 2021-11-11 2022-03-04 四川九通智路科技有限公司 Regional meteorological monitoring management method and system for annular pressure sensor
CN114137636B (en) * 2021-11-11 2022-08-12 四川九通智路科技有限公司 Regional meteorological monitoring management method and system for annular pressure sensor
CN115388931A (en) * 2022-10-27 2022-11-25 河北省科学院应用数学研究所 Credible monitoring method, monitoring terminal and storage medium for sensor abnormal data
CN115388931B (en) * 2022-10-27 2023-02-03 河北省科学院应用数学研究所 Credible monitoring method, monitoring terminal and storage medium for sensor abnormal data
CN115795350A (en) * 2023-01-29 2023-03-14 北京众驰伟业科技发展有限公司 Abnormal data information processing method in production process of blood rheology test cup

Similar Documents

Publication Publication Date Title
CN106709250A (en) Data flow abnormality detection method based on parallel Kalman algorithm
US10606862B2 (en) Method and apparatus for data processing in data modeling
CN108027594B (en) Method for detecting anomalies in a water distribution system
US8892478B1 (en) Adaptive model training system and method
Visser Estimation and detection of flexible trends
CN113435725B (en) Power grid host dynamic threshold setting method based on FARIMA-LSTM prediction
CN106529145A (en) ARIMA-BP neutral network-based bridge monitoring data prediction method
CN111126680A (en) Road section traffic flow prediction method based on time convolution neural network
CN110991625B (en) Surface anomaly remote sensing monitoring method and device based on recurrent neural network
CN111461321A (en) Improved deep reinforcement learning method and system based on Double DQN
CN104156615A (en) Sensor test data point anomaly detection method based on LS-SVM
CN111639798A (en) Intelligent prediction model selection method and device
Meng et al. Analysis of ecological resilience to evaluate the inherent maintenance capacity of a forest ecosystem using a dense Landsat time series
CN114580260A (en) Landslide section prediction method based on machine learning and probability theory
CN113868953A (en) Multi-unit operation optimization method, device and system in industrial system and storage medium
Chabane et al. Sensor fault detection and diagnosis using zonotopic set-membership estimation
CN115935283B (en) Drought cause tracing method based on multi-element nonlinear causal analysis
CN115759263A (en) Strategy effect evaluation method and device based on cause and effect inference
CN115310705A (en) Method and device for determining gas emission quantity and computer readable storage medium
CN113569324A (en) Slope deformation monitoring abnormal data analysis and optimization method
CN115310359A (en) Method, device, equipment and medium for determining transient emission of nitrogen oxides
CN111221479B (en) Method, system and storage medium for judging abnormal storage capacity variation
Chen et al. Iterative convolution particle filtering for nonlinear parameter estimation and data assimilation with application to crop yield prediction
CN115470718B (en) Landslide prediction method combining random forest and logistic regression
CN117574298A (en) Time sequence data anomaly detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170524