A kind of dam thundering observed data automatic identifying method
Technical field
The present invention relates to Dam Monitoring field more particularly to a kind of dam thundering observed data automatic identifying methods.
Background technique
China builds together into 100 meters or more more than 200 seats of dam, wherein 150 meters or more of more than 40 seat of high dam.To these dams into
Row monitoring, monitoring are to guarantee the important means of its safe operation." the Dam operational safety prison that Bureau of Energy in 2015 promulgates
Superintend and direct administrative provisions " " for 100 meters of height of dam or more of dam, electric power enterprise should establish dam safety and supervise online further requirement
Control system ", " dam center should carry out remote online technical supervision to the dam operational safety that registration (putting on record) registers ".Due to
Operating condition is complicated, and the monitoring point that high dam is laid is usually far more than common engineering;And when being managed to group dam, the survey that is related to
Point number is even more huge.
The vital task of on-line monitoring is identification abnormal operating condition, and basis is identification thundering observed data.Currently,
The visible outlier identification method in Dam Monitoring field is divided into method of identification based on envelope domain according to its principle, based on item
The method of identification of part similitude and method of identification three classes based on mathematical model.Wherein, the exception based on statistical regression mathematical model
Value recognition methods can be normally reached higher susceptibility, while also be not easy for normal value to be mistaken for exception, therefore application is most
It is extensive.But the people that the foundation of regression model is typically required for abundant data analysis experience operates completion.When to monitoring point pole
When more high dams or group dam is monitored, human cost is huge;Meanwhile different operators be likely to occur standard it is inconsistent,
The even situation of inaccuracy.In addition, also typically including the environment parameters such as water level, temperature in statistical model, and pass through in practical projects
Often there is the case where reporting and submitting when environment parameter and effect quantity difference, at this moment will be unable to obtain effect quantity predicted value using statistical model,
Thus also can not just judge whether measured value is abnormal.
Summary of the invention
Manual operation and environment parameter are relied in the dam thundering observed data identification technology based on statistical model
The excessively high problem of degree, the present invention provide a kind of method of automatic identification thundering observed data, it is intended to promote dam safety monitor
Precision, efficiency and the robustness that abnormal value identifies in work, and reduce human resources investment.
To achieve the goals above, the present invention discloses a kind of dam thundering observed data automatic identifying method, the method
The following steps are included:
1) by primary monitoring data sequence construct track matrix, singular value decomposition then is carried out to track matrix and obtains a system
Column feature group;
2) feature group is arranged from big to small according to characteristic value, selection accumulation contribution rate more than or equal to 85% it is preceding several
Feature group is main feature group;
3) the corresponding fundamental matrix of main feature group is calculated, diagonal equalization then is carried out to fundamental matrix and obtains data sequence
Several preceding main components of column;
4) main component obtained is added up and obtains reconstruct data sequence;
5) subtract each other to obtain residual sequence with reproducing sequence and former data sequence, find out the standard deviation of residual sequence;
6) judge whether measured value is exceptional value by Pauta criterion according to the standard deviation of residual sequence.
Further, in above-mentioned steps disclosed by the invention, the track matrix X is by monitoring data sequence f0,f1,
f2,...,fN-1Lag arrangement in time obtains, and indicates are as follows:
Wherein, N is monitoring data sequent data count, and L is length of window, and 1 < L < N, K includes by the track every row of matrix X
Measured value quantity, K=N-L+1;I, j are used to indicate element xijIt is to be arranged in the i-th row jth the location of in the matrix X of track.
Further, described that track matrix X progress singular value decomposition is referred to: to seek S=XXTNon-negative eigenvalue λ1,λ2,
λ3,...,λlAnd the feature vector U of corresponding orthonormalization1,U2,U3,...,UlAndThe spy
Sign group refers to (λi,Ui,Vi), referred to as ith feature group.
Further, the contribution rate CR of the ith feature groupiIt is calculated by following formula:
Further, the main feature group is the preceding m feature group accumulated contribution rate and be more than or equal to 85%, it may be assumed that
It is that i, j are used to indicate for which characteristic value, and m indicates that the sum of main feature group, l indicate the total of non-negative feature group
Number.
Further, the fundamental matrix XiIt is calculated by feature group:
Further, described to fundamental matrix XiThe process diagonally equalized are as follows: for the matrix X of L × Ki, enable it
Element is xij, L*=min (L, K), K*=max (L, K), N=L+K-1;If L < K, xij *=xij, otherwise xij *=xji.It is described
I-th of main componentIt is calculated as follows:
The reconstruct data sequence is obtained by preceding m main component is cumulative, it may be assumed that
The residual sequence ΔkSubtract each other to obtain by reproducing sequence and former data sequence, it may be assumed that
When the Pauta criterion is that take confidence level be 99%, the standard deviation of residual sequence is σ, when following formula is set up
Measured value is normal value, is otherwise exceptional value:
|Δk|≤2.58σ (14)
The invention has the benefit that monitoring data can be automatically extracted using the outlier identification method that the present invention announces
The main feature of sequence, avoids artificial founding mathematical models, not only can guarantee the consistency and accuracy of judgement, also substantially reduces
Human resources investments.Meanwhile it still being able to differentiate monitoring data when the environment parameters such as water level, temperature missing.
Detailed description of the invention
Fig. 1 is the flow chart of one embodiment of method of the invention.
Fig. 2 is measured data to be analyzed in one embodiment, is a top of arch dam displacement monitoring data.
Fig. 3 is the contribution rate distribution map of preceding 20 feature groups in one embodiment.
Fig. 4 is the corresponding main component curve graph of CR1=67.1% in one embodiment.
Fig. 5 is the corresponding main component curve graph of CR2=12% in one embodiment.
Fig. 6 is the corresponding main component curve graph of CR3=11.98% in one embodiment.
Fig. 7 is the reconstruct data sequence obtained in one embodiment by first three main component.
Fig. 8 is the residual sequence acquired in one embodiment by reconstruct data sequence and measured data sequence and reasonable residual error
Range.
Specific embodiment
Below in conjunction with attached drawing and one embodiment to the reality of Dam Monitoring Data outlier identification method of the present invention
It applies mode and beneficial effect is described further.Fig. 1 is the flow chart for the outlier identification method announced according to the present invention, is used
This method carries out analysis identification to top of arch dam displacement monitoring data shown in Fig. 2.The following steps are included:
Step 1: reading monitoring data sequence f0,f1,f2,...,fN-1, data sequence is lagged to arrangement in time and is obtained
Track matrix X:
Wherein, N is monitoring data sequent data count, and L is known as length of window, 1 < L < N;For there is periodic monitoring data
Sequence, it is desirable that the value of L is greater than the length of a cycle.L=400 days, N=6800 days in the present embodiment.
Step 2: singular value decomposition being carried out to track matrix X: seeking matrix S=XXT, seek the non-negative eigenvalue λ of S1,λ2,
λ3,...,λlAnd the feature vector U of corresponding orthonormalization1,U2,U3,...,UlAndObtain l
Feature group (λi,Ui,Vi), i=1,2 ... .l.
Step 3: seeking the contribution rate CR of ith feature groupi, its calculation formula is:
Wherein, the non-negative characteristic value sum that l is matrix S.
Step 4: feature group being arranged from big to small according to characteristic value, the accumulation contribution rate SCR of i feature group before askingi:
The accumulation contribution rate SCR of i feature group before judgingiWhether it is more than or equal to 85%, if set up, enables main feature
Group number m=i, and choosing this m feature group is main feature group;Otherwise, the cumulative tribute for seeking next feature group is continued through
Offer rate.
Fig. 3 is the contribution rate distribution map of preceding 20 feature groups of embodiment illustrated in fig. 2, it is seen that the contribution of first three feature group
Rate be noticeably greater than after feature group, accumulation contribution rate be 91%;Therefore, select first three feature group for the master of the present embodiment
Want feature group.
Step 5: calculating the corresponding fundamental matrix of main feature group: the corresponding fundamental matrix X of i-th of main feature groupiAre as follows:
Wherein, i=1,2 ..., m.
Step 6: diagonal equalization being carried out to the corresponding fundamental matrix of main feature group and seeks main component: for the square of L × K
Battle array Xi, enabling its element is xij, L*=min (L, K), K*=max (L, K), N=L+K-1;If L < K, xij *=xij, otherwise xij *
=xji.I-th of main componentIt is calculated as follows:
Fig. 4 show first three corresponding main component of main feature group of embodiment illustrated in fig. 2.
Step 7: preceding m main component being added up and obtains reconstruct data sequence, it may be assumed that
Wherein, k=0,1,2 ..., N-1.Heavy line show the reconstruct data sequence of corresponding embodiment illustrated in fig. 2 in Fig. 5
Column are obtained by first three corresponding main component of main feature group is cumulative.
Step 8: subtracting each other to obtain residual sequence Δ by reproducing sequence and former data sequencek, it may be assumed that
Wherein, k=0,1,2 ..., N-1.Fig. 6 show the residual sequence of corresponding embodiment illustrated in fig. 2.
Step 9: find out the standard deviation sigma of residual sequence:
The standard deviation of the residual error of embodiment illustrated in fig. 2 is 0.9090.
Step 10: for each measured value, judge whether measured value is exceptional value according to Pauta criterion:
|Δk|≤2.58σ (9)
When above formula is set up, measured value corresponding to residual error is normal value;Otherwise the probability that the residual error corresponds to measured value appearance is less than
1%, thus be exceptional value.
The grey area Fig. 6 is residual error range ± 2.3452 of normal measured value in the embodiment determined according to above-mentioned criterion,
Measured value corresponding to residual error except the region is abnormal value.