CN116451161A - Self-adaptive identification method for abnormal value of dam deformation monitoring data - Google Patents

Info

Publication number
CN116451161A
Authority
CN
China
Prior art keywords
data
regression model
linear regression
model
monitoring data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310272375.2A
Other languages
Chinese (zh)
Inventor
程琳
肖晟
杨杰
陈家敏
徐笑颜
贾冬焱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology
Priority to CN202310272375.2A
Publication of CN116451161A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation
    • G06T11/20: Drawing from basic elements, e.g. lines or circles
    • G06T11/203: Drawing of straight lines or curves
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation
    • G06T11/20: Drawing from basic elements, e.g. lines or circles
    • G06T11/206: Drawing of charts or graphs
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40: Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Testing Or Calibration Of Command Recording Devices (AREA)

Abstract

The invention discloses an adaptive identification method for outliers in dam deformation monitoring data. The method first introduces the Bayesian model selection (BMS) method to select the key influence factors of the dam deformation monitoring model during modeling. It then establishes a robust regression model of dam deformation from the selected set of key influence factors and performs robust regression analysis on the dam deformation monitoring data using least trimmed squares (LTS) estimation, so that regression modeling, outlier identification, and structural deformation prediction are achieved simultaneously without data preprocessing. The method adaptively overcomes the misleading effect of outliers on the regression, which enhances the significance of the regression and effectively improves prediction accuracy.

Description

Self-adaptive identification method for abnormal value of dam deformation monitoring data
Technical Field
The invention belongs to the technical field of dam safety monitoring methods, and particularly relates to a self-adaptive identification method for abnormal values of dam deformation monitoring data.
Background
Establishing a safety monitoring model from dam deformation monitoring data is an important way to quantitatively analyze the deformation state of a dam. At present, three main types of safety monitoring model can be established for dam deformation; among them, the statistical model is widely used for quantitative analysis of dam deformation monitoring data because its theory is mature, it is simple to build, and it is convenient to use.
Common statistical models mostly use empirical influence factors to describe the effect of loads on the deformation state of the dam. In actual engineering, however, the factors influencing dam deformation are complex, and redundant factors among the empirical influence factors may lead to poor goodness of fit and low prediction accuracy. In addition, owing to the complexity of the data acquisition environment, the performance limits of the acquisition equipment, and human factors, the raw monitoring data not only contain information on the actual operating state of the dam but are also contaminated with outliers that do not conform to the specific physical and mechanical relationship between the environmental quantities and the deformation effect quantity, which causes the statistical model to deviate seriously from reality. Moreover, because the volume of dam safety monitoring data is large, removing outliers adds cost and workload.
In summary, to effectively remove redundant factors from the empirical influence factors of dam deformation and to adaptively overcome the adverse effect of outliers on the performance of the statistical model, it is necessary to propose a new safety monitoring model and to identify outliers in the dam deformation monitoring data on the basis of that model.
Disclosure of Invention
The invention aims to provide an adaptive identification method for outliers in dam deformation monitoring data. The method constructs a robust regression model based on BMS-LTS, adaptively overcomes the misleading effect of outliers on the regression, and effectively improves prediction accuracy.
The technical scheme adopted by the invention is an adaptive identification method for outliers in dam deformation monitoring data, which specifically comprises the following steps:
Step 1, determining, according to the actual engineering conditions, the dam deformation measuring points that require adaptive outlier identification; taking the deformation of those measuring points and the monitoring data of their empirical influence factors as the data set of the robust regression model to be constructed; and dividing the data set into a training set and a test set according to its size and actual needs;
Step 2, reducing the empirical influence factors of dam deformation by the BMS method, removing the redundant factors among them by BIC backward elimination, and thereby determining the key influence factors of the robust regression model to be constructed;
Step 3, constructing a multiple linear regression model from the measured values of the training-set deformation and its key influence factor monitoring data, and obtaining the final regression coefficients by LTS estimation, thereby constructing the robust regression model based on BMS-LTS;
Step 4, marking the data used to estimate the final regression coefficients as the optimal data group and marking the remaining data as anomalous, identifying any horizontal shift anomaly that may exist in the data sequence, and visualizing how the outliers vary along the data sequence with a double wedge plot;
Step 5, inputting the key influence factors of the test set into the BMS-LTS-based robust regression model trained in step 3 to obtain the predicted dam deformation at the corresponding measuring points.
The present invention is also characterized in that,
the step 2 specifically comprises the following steps:
Step 2.1, constructing a complete linear regression model from all the empirical influence factors obtained in step 1; removing one empirical influence factor from the complete model, fitting a new linear regression model with the remaining factors, and recording its BIC value; repeating this as many times as there are empirical influence factors, and selecting the linear regression model with the smallest BIC value for subsequent calculation and analysis;
Step 2.2, for the linear regression model with the smallest BIC value selected in step 2.1, again removing one empirical influence factor at a time, observing the BIC values of the resulting new linear regression models, and selecting the model with the lowest BIC value for subsequent calculation and analysis;
Step 2.3, repeating steps 2.1 to 2.2 until the BIC value of the fitted linear regression model no longer decreases, and finally selecting the linear regression model with the globally smallest BIC value as the optimal parsimonious model, whose factors are the key influence factors of the robust regression model to be constructed.
The calculation formula of the BIC value in the step 2 is as follows:
BIC=-2ln(likelihood)+(m+1)ln(n) (1)
wherein: n is the number of observations in the linear regression model, and m is the number of predictors;
the likelihood function for a given model M and its parameters θ is as follows:
likelihood=P(data|θ,M)=L(θ,M) (2);
when the sample size is large enough and the data obeys an exponential family distribution, the BIC value can be approximated as:
BIC≈-2 ln(P(data|M))=-2ln(∫P(data|θ,M)P(θ|M)dθ) (3)
wherein: P(data|M) is the marginal likelihood of the data under model M, and P(θ|M) is the prior distribution of the parameter θ.
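As an illustration of formula (1), the following worked form may help. It assumes independent Gaussian errors with the variance estimated by maximum likelihood, which the source does not state explicitly; RSS denotes the residual sum of squares of the fitted linear regression model and is a symbol introduced here.

```latex
\ln \hat{L} = -\frac{n}{2}\left[\ln(2\pi) + \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 1\right]
\quad\Longrightarrow\quad
\mathrm{BIC} = n\,\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + (m+1)\ln(n) + n\left[1 + \ln(2\pi)\right]
```

Under this assumption the last term is the same for every candidate model fitted to the same n observations, so comparing BIC values reduces to comparing the goodness-of-fit term against the penalty (m+1)ln(n).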
The step 3 specifically comprises the following steps:
Step 3.1, constructing a multiple linear regression model from the training-set deformation of the corresponding measuring point and the key influence factor monitoring data selected by the BMS method; with n denoting the number of measured values of the monitoring data, the multiple linear regression model can be expressed as:
$y = x\beta + \varepsilon$ (4)
wherein: x is the independent variable, $x \in \mathbb{R}^{n \times p}$; y is the dependent variable, $y \in \mathbb{R}^{n}$; $\beta$ is the parameter to be estimated, $\beta \in \mathbb{R}^{p}$; $\varepsilon$ is the random error term, $\varepsilon \in \mathbb{R}^{n}$; p is the number of independent variables;
Step 3.2, estimating the parameters of the multiple linear regression model constructed in step 3.1 by LTS estimation: randomly selecting h different sample points from the n measured values to form a sample subset $H_0$, and calculating the initial regression coefficient $\hat{\beta}_0$ by the LS method;
Step 3.3, substituting the initial regression coefficient obtained in step 3.2 into the multiple linear regression model established in step 3.1, calculating the squared residuals of the n measured values, arranging them in ascending order, $r_{(1)}^2 \le \dots \le r_{(n)}^2$, and recording the sum of the squared residuals of the first h measured values as $Q(\hat{\beta}_0) = \sum_{i=1}^{h} r_{(i)}^2(\hat{\beta}_0)$. For any given parameter $\beta$, the squared residual is defined as:
$r_i^2(\beta) = (y_i - x_i^{T}\beta)^2$ (5)
wherein: $(x_i, y_i)$ is the measured value of the monitoring data at the i-th sample point, and $r_i^2(\beta)$ is the squared residual of the i-th sample point;
Step 3.4, retaining the samples corresponding to the h measured values with the smallest squared residuals in step 3.3 as a new sample subset $H_1$, and calculating a new regression coefficient $\hat{\beta}_1$ by the LS method;
Step 3.5, substituting the new regression coefficient obtained in step 3.4 into the multiple linear regression model established in step 3.1, calculating the squared residuals of the n measured values, arranging them in ascending order, and recording the sum of the squared residuals of the first h measured values as $Q(\hat{\beta}_1) = \sum_{i=1}^{h} r_{(i)}^2(\hat{\beta}_1)$;
Step 3.6, repeating steps 3.2 to 3.5 until the sum of the squared residuals of the first h measured values converges; the LTS estimate is then expressed as:
$\hat{\beta}_{LTS} = \arg\min_{\beta} \sum_{i=1}^{h} r_{(i)}^2(\beta)$ (6)
wherein, the value range of h needs to satisfy
Step 3.7, substituting the final regression coefficient obtained when the sum of squared residuals in step 3.6 converges into the multiple linear regression model established in step 3.1 to obtain the robust regression model based on BMS-LTS.
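The reason the iteration in steps 3.3 to 3.6 can converge is worth spelling out. The short derivation below is a sketch using notation introduced here for illustration: $\hat{\beta}_0$ and $\hat{\beta}_1$ are the LS coefficients fitted on the subsets $H_0$ and $H_1$, and $Q(\beta)$ is the sum of the h smallest squared residuals under $\beta$.

```latex
\begin{aligned}
Q(\hat{\beta}_1)
  &\le \sum_{i \in H_1} r_i^2(\hat{\beta}_1)
  && \text{(the $h$ smallest residuals cannot exceed those of any $h$-subset)} \\
  &\le \sum_{i \in H_1} r_i^2(\hat{\beta}_0)
  && \text{(LS on $H_1$ minimizes the sum over $H_1$)} \\
  &= Q(\hat{\beta}_0)
  && \text{($H_1$ holds the $h$ smallest residuals under $\hat{\beta}_0$)}
\end{aligned}
```

Each pass therefore never increases the trimmed sum of squares, and since there are only finitely many subsets of size h, the sequence must settle after a finite number of passes. The limit may be a local rather than a global minimum, which is why random restarts are commonly used.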
The step 4 specifically comprises the following steps:
Step 4.1, adding a horizontal shift anomaly detection term, namely $\delta_1 I(w \ge \delta_2)$, to the multiple linear regression model obtained in step 3.1, wherein w = 1, …, n; $I(\cdot)$ is the indicator function; $\delta_2$ is the exact position of the horizontal shift anomaly; and $w^{(1)}, \dots, w^{(S)}$ are the candidate values of $\delta_2$, i.e. $w^{(s)} \in \{w^{(1)}, \dots, w^{(S)}\}$;
Step 4.2, setting $\delta_2 = w^{(s)}$, with s = 1 initially;
Step 4.3, constructing an elemental subset E containing p-1 different measured values and, keeping $\delta_2 = w^{(s)}$ fixed, performing steps 3.2 to 3.4 twice on the basis of E;
Step 4.4, repeating step 4.3 a number of times, from 1 up to the number of elemental subsets E, and executing steps 3.2 to 3.6 until convergence on the nbest elemental subsets that yield the smallest objective function; if s > 1, additionally starting from the nbest elemental subsets found when examining $w^{(s-1)}$, likewise setting $\delta_2 = w^{(s)}$ and executing steps 3.2 to 3.6 until convergence;
Step 4.5, selecting, from the 2×nbest candidates, the regression coefficient with the smallest objective function, denoting it by $\hat{\beta}^{(s)}$, and storing the corresponding residuals;
Step 4.6, looping steps 4.2 to 4.5 over all $w^{(s)}$, where s = 1, …, S, then taking, among the resulting $\hat{\beta}^{(s)}$, the one with the smallest objective function and denoting it by $\hat{\beta}$;
Step 4.7, applying a univariate outlier detection procedure to the data sequence;
Step 4.8, starting from the initial estimate $\hat{\beta}$ and keeping $\delta_2$ fixed, applying the LS method to all points not flagged as anomalous in step 4.7 to obtain the outlier detection result;
Step 4.9, drawing a double wedge plot from the outlier detection result obtained in step 4.8 to visualize the horizontal shift anomaly and any other anomalies in the monitoring data.
The elemental subset E in step 4.3 should include the measured value corresponding to $w^{(s)}$, one measured value satisfying $w < w^{(s)}$, and another p-3 measured values randomly drawn from the entire data sequence.
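The source does not specify which univariate outlier detection procedure is used in step 4.7. As a minimal sketch under that caveat, the snippet below flags residuals with a median/MAD rule; the function name `flag_outliers` and the cutoff of 2.5 are hypothetical choices made for illustration, not values taken from the patent.

```python
import numpy as np

def flag_outliers(residuals, cutoff=2.5):
    """Flag residuals whose robust z-score exceeds `cutoff`.

    Assumption: a median/MAD rule stands in for the unspecified
    univariate outlier detection procedure of step 4.7.
    """
    r = np.asarray(residuals, dtype=float)
    center = np.median(r)
    # 1.4826 makes the MAD consistent with the standard deviation
    # when the clean residuals are normally distributed.
    scale = 1.4826 * np.median(np.abs(r - center))
    if scale == 0.0:
        return np.zeros(r.shape, dtype=bool)
    return np.abs(r - center) / scale > cutoff

# Toy residuals with two gross errors at positions 5 and 8.
residuals = np.array([0.1, -0.2, 0.05, 0.15, -0.1, 3.0, -0.05, 0.0, -2.5, 0.2])
flags = flag_outliers(residuals)
print(np.flatnonzero(flags))
```

Using the median and MAD rather than the mean and standard deviation keeps the cutoff itself from being inflated by the very outliers it is meant to catch.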
The invention has the advantages that,
(1) The adaptive identification method for outliers in dam deformation monitoring data uses the BMS method to reduce the empirical influence factors of dam deformation, removes the redundant factors from the dam safety monitoring model, and determines the set of key influence factors of the model, which effectively improves the reliability of the parameter estimates;
(2) The BMS-LTS-based robust regression model constructed by the method overcomes the adverse effect of outliers in the deformation monitoring data on the regression estimates and achieves robustness to outliers adaptively during learning, which enhances the significance of the regression and improves the goodness of fit and prediction accuracy;
(3) The method accurately identifies outliers and horizontal shift anomalies in dam deformation monitoring data, visualizes possible anomalies in the data sequence with a double wedge plot, and has broad application prospects.
Drawings
FIG. 1 is a flow chart of a method for adaptively identifying outliers of dam deformation monitoring data according to the present invention;
FIG. 2 is a diagram showing the arrangement of deformation measuring points of an arch dam body in embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of the radial horizontal displacement process of each measuring point after noise addition in the embodiment 1 of the present invention;
FIG. 4 is a schematic diagram of the adaptive recognition result of abnormal values of the PL3-2 measuring point horizontal displacement monitoring data in the embodiment 1 of the present invention;
FIG. 5 is a schematic diagram of the adaptive recognition result of abnormal values of the PL5-1 measuring point horizontal displacement monitoring data in the embodiment 1 of the present invention;
FIG. 6 is a graph showing the comparison of measured values of horizontal displacement of PL3-2 measuring points with the fitted values and predicted values of different models in example 1 of the present invention;
FIG. 7 is a graph showing the comparison of the measured horizontal displacement values of the PL5-1 measuring points in example 1 of the present invention with the fitting values and predicted values of different models.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
The invention relates to a self-adaptive identification method for abnormal values of dam deformation monitoring data, which is shown in fig. 1 and specifically comprises the following steps:
Step 1, determining, according to the actual engineering conditions, the dam deformation measuring points that require adaptive outlier identification; taking the deformation of those measuring points and the monitoring data of their empirical influence factors as the data set of the robust regression model to be constructed; and dividing the data set into a training set and a test set according to its size and actual needs;
Step 2, reducing the empirical influence factors of dam deformation by the BMS method, i.e. the Bayesian model selection method, and removing the redundant factors among them by backward elimination based on the BIC, i.e. the Bayesian information criterion, thereby determining the key influence factors of the robust regression model to be constructed, specifically as follows:
Step 2.1, constructing a complete linear regression model from all the empirical influence factors obtained in step 1; removing one empirical influence factor from the complete model, fitting a new linear regression model with the remaining factors, and recording its BIC value; repeating this as many times as there are empirical influence factors, and selecting the linear regression model with the smallest BIC value for subsequent calculation and analysis, wherein the BIC value is calculated as follows:
BIC=-2 ln(likelihood)+(m+1)ln(n) (1)
wherein: n is the number of observations in the linear regression model and m is the number of predictors. The likelihood function for a given model M and its parameters θ is as follows:
likelihood=P(data|θ,M)=L(θ,M) (2);
when the sample size is large enough and the data obeys an exponential family distribution, the BIC can be approximated as:
BIC≈-2 ln(P(data|M))=-2 ln(∫P(data|θ,M)P(θ|M)dθ) (3)
wherein: P(data|M) is the marginal likelihood of the data under model M, and P(θ|M) is the prior distribution of the parameter θ;
Step 2.2, for the linear regression model with the smallest BIC value selected in step 2.1, again removing one empirical influence factor at a time, observing the BIC values of the resulting new linear regression models, and selecting the model with the lowest BIC value for subsequent calculation and analysis;
Step 2.3, repeating steps 2.1 to 2.2 until the BIC value of the fitted linear regression model no longer decreases, and finally selecting the linear regression model with the globally smallest BIC value as the optimal parsimonious model, whose factors are the key influence factors of the robust regression model to be constructed.
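Steps 2.1 to 2.3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes Gaussian errors so that the likelihood in formula (1) has a closed form, and the helper names `gaussian_bic` and `backward_eliminate` as well as the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_bic(X, y):
    """BIC = -2 ln(likelihood) + (m + 1) ln(n) for an LS fit with intercept.

    Assumption: independent Gaussian errors, variance estimated as RSS / n.
    """
    n, m = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    neg2_loglik = n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    return neg2_loglik + (m + 1) * np.log(n)

def backward_eliminate(X, y):
    """Drop one factor per round while the best reduced model lowers the BIC."""
    kept = list(range(X.shape[1]))
    best = gaussian_bic(X[:, kept], y)
    while len(kept) > 1:
        # Fit every model that omits exactly one of the remaining factors.
        trials = [(gaussian_bic(X[:, [j for j in kept if j != d]], y), d)
                  for d in kept]
        bic, drop = min(trials)
        if bic >= best:          # BIC no longer decreases: stop (step 2.3)
            break
        best = bic
        kept.remove(drop)
    return kept, best

# Synthetic example: only the first three of five factors drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(0.0, 0.5, 200)
kept, best_bic = backward_eliminate(X, y)
print(kept, round(best_bic, 2))
```

Backward elimination fits at most on the order of m² models instead of all 2^m subsets, at the price of not guaranteeing the globally best subset.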
Step 3, constructing a multiple linear regression model from the measured values of the training-set deformation and its key influence factor monitoring data, and obtaining the final regression coefficients by LTS estimation, thereby constructing the robust regression model based on BMS-LTS, specifically as follows:
Step 3.1, constructing a multiple linear regression model from the training-set deformation of the corresponding measuring point and the key influence factor monitoring data selected by the BMS method; with n denoting the number of measured values of the monitoring data, the multiple linear regression model can be expressed as:
$y = x\beta + \varepsilon$ (4)
wherein: x is the independent variable, $x \in \mathbb{R}^{n \times p}$; y is the dependent variable, $y \in \mathbb{R}^{n}$; $\beta$ is the parameter to be estimated, $\beta \in \mathbb{R}^{p}$; $\varepsilon$ is the random error term, $\varepsilon \in \mathbb{R}^{n}$; p is the number of independent variables;
Step 3.2, estimating the parameters of the multiple linear regression model constructed in step 3.1 by LTS estimation, i.e. least trimmed squares estimation: randomly selecting h different sample points from the n measured values to form a sample subset $H_0$, and calculating the initial regression coefficient $\hat{\beta}_0$ by the LS method, i.e. the least squares method;
Step 3.3, substituting the initial regression coefficient obtained in step 3.2 into the multiple linear regression model established in step 3.1, calculating the squared residuals of the n measured values, arranging them in ascending order, $r_{(1)}^2 \le \dots \le r_{(n)}^2$, and recording the sum of the squared residuals of the first h measured values as $Q(\hat{\beta}_0) = \sum_{i=1}^{h} r_{(i)}^2(\hat{\beta}_0)$. For any given parameter $\beta$, the squared residual is defined as:
$r_i^2(\beta) = (y_i - x_i^{T}\beta)^2$ (5)
wherein: $(x_i, y_i)$ is the measured value of the monitoring data at the i-th sample point, and $r_i^2(\beta)$ is the squared residual of the i-th sample point;
Step 3.4, retaining the samples corresponding to the h measured values with the smallest squared residuals in step 3.3 as a new sample subset $H_1$, and calculating a new regression coefficient $\hat{\beta}_1$ by the LS method, wherein the new regression coefficient generally has a smaller objective function than the initial regression coefficient;
Step 3.5, substituting the new regression coefficient obtained in step 3.4 into the multiple linear regression model established in step 3.1, calculating the squared residuals of the n measured values, arranging them in ascending order, and recording the sum of the squared residuals of the first h measured values as $Q(\hat{\beta}_1) = \sum_{i=1}^{h} r_{(i)}^2(\hat{\beta}_1)$;
Step 3.6, repeating steps 3.2 to 3.5 until the sum of the squared residuals of the first h measured values converges; the LTS estimate is then expressed as:
$\hat{\beta}_{LTS} = \arg\min_{\beta} \sum_{i=1}^{h} r_{(i)}^2(\beta)$ (6)
wherein, the value range of h needs to satisfy
Step 3.7, substituting the final regression coefficient obtained when the sum of squared residuals in step 3.6 converges into the multiple linear regression model established in step 3.1 to obtain the robust regression model based on BMS-LTS.
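The procedure of steps 3.2 to 3.7 can be sketched in NumPy as below. This is an illustrative sketch only: the function name `lts_fit`, the number of random restarts, the convergence tolerance, the choice h = 75 and the synthetic data are all assumptions, and an intercept column is added for convenience.

```python
import numpy as np

def lts_fit(X, y, h, n_starts=50, max_iter=100, seed=0):
    """Least trimmed squares by random starts plus concentration steps."""
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])             # add an intercept column
    best_obj, best_beta, best_subset = np.inf, None, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=h, replace=False)  # step 3.2: subset H_0
        prev = np.inf
        for _ in range(max_iter):
            beta, *_ = np.linalg.lstsq(A[subset], y[subset], rcond=None)
            r2 = (y - A @ beta) ** 2                 # squared residuals, all n points
            subset = np.argsort(r2)[:h]              # step 3.4: keep the h smallest
            obj = float(r2[subset].sum())            # trimmed sum of squares
            if prev - obj < 1e-12:                   # step 3.6: converged
                break
            prev = obj
        if obj < best_obj:
            best_obj, best_beta, best_subset = obj, beta, subset
    return best_beta, best_obj, np.sort(best_subset)

# Synthetic line y = 1 + 2x with 15 of 100 points shifted upward by 10.
rng = np.random.default_rng(42)
n = 100
x = np.linspace(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, n)
bad = rng.choice(n, size=15, replace=False)
y[bad] += 10.0

beta, obj, kept = lts_fit(x.reshape(-1, 1), y, h=75)
flagged = np.setdiff1d(np.arange(n), kept)           # points left out of the fit
print(beta, len(flagged))
```

Points outside the final subset are only candidates for outliers, since n - h points are always left out regardless of how clean the data are; a separate residual-based rule is needed to decide which of them are genuinely anomalous.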
Step 4, marking the data used to estimate the final regression coefficients as the optimal data group and marking the remaining data as anomalous, identifying any horizontal shift anomaly that may exist in the data sequence, and visualizing how the outliers vary along the data sequence with a double wedge plot, specifically as follows:
Step 4.1, adding a horizontal shift anomaly detection term, namely $\delta_1 I(w \ge \delta_2)$, to the multiple linear regression model obtained in step 3.1, wherein w = 1, …, n; $I(\cdot)$ is the indicator function; $\delta_2$ is the exact position of the horizontal shift anomaly; and $w^{(1)}, \dots, w^{(S)}$ are the candidate values of $\delta_2$, i.e. $w^{(s)} \in \{w^{(1)}, \dots, w^{(S)}\}$;
Step 4.2, setting $\delta_2 = w^{(s)}$, with s = 1 initially;
Step 4.3, constructing an elemental subset E containing p-1 different measured values and, keeping $\delta_2 = w^{(s)}$ fixed, performing steps 3.2 to 3.4 twice on the basis of E, wherein the elemental subset E should include the measured value corresponding to $w^{(s)}$, one measured value satisfying $w < w^{(s)}$, and another p-3 measured values randomly drawn from the entire data sequence;
Step 4.4, repeating step 4.3 a number of times, from 1 up to the number of elemental subsets E, and executing steps 3.2 to 3.6 until convergence on the nbest elemental subsets that yield the smallest objective function; if s > 1, additionally starting from the nbest elemental subsets found when examining $w^{(s-1)}$, likewise setting $\delta_2 = w^{(s)}$ and executing steps 3.2 to 3.6 until convergence;
Step 4.5, selecting, from the 2×nbest candidates, the regression coefficient with the smallest objective function, denoting it by $\hat{\beta}^{(s)}$, and storing the corresponding residuals;
Step 4.6, looping steps 4.2 to 4.5 over all $w^{(s)}$, where s = 1, …, S, then taking, among the resulting $\hat{\beta}^{(s)}$, the one with the smallest objective function and denoting it by $\hat{\beta}$;
Step 4.7, applying a univariate outlier detection procedure to the data sequence;
Step 4.8, starting from the initial estimate $\hat{\beta}$ and keeping $\delta_2$ fixed, applying the LS method to all points not flagged as anomalous in step 4.7 to obtain the outlier detection result;
Step 4.9, drawing a double wedge plot from the outlier detection result obtained in step 4.8 to visualize the horizontal shift anomaly and any other anomalies in the monitoring data.
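The core idea of steps 4.1 to 4.6, scanning candidate shift positions and keeping the one with the smallest trimmed objective, can be sketched in simplified form. The sketch below is not the procedure of the patent: it replaces the elemental-subset and nbest search with concentration steps started from an ordinary LS fit, uses 0-based positions, and picks h to match the number of injected outliers in the toy data. The names `trimmed_fit` and `scan_level_shift` and all data are hypothetical.

```python
import numpy as np

def trimmed_fit(A, y, h, max_iter=50):
    """Concentration steps from an LS start: (coefficients, objective, kept indices)."""
    subset = np.arange(len(y))                   # first fit uses all points
    prev = np.inf
    for _ in range(max_iter):
        beta, *_ = np.linalg.lstsq(A[subset], y[subset], rcond=None)
        r2 = (y - A @ beta) ** 2
        subset = np.argsort(r2)[:h]              # keep the h smallest residuals
        obj = float(r2[subset].sum())
        if prev - obj < 1e-12:
            break
        prev = obj
    return beta, obj, subset

def scan_level_shift(x, y, h, candidates):
    """Try each candidate position for delta_2 and keep the smallest objective."""
    n = len(y)
    w = np.arange(n)                             # 0-based observation index
    best = None
    for s in candidates:
        shift = (w >= s).astype(float)           # indicator I(w >= delta_2)
        A = np.column_stack([np.ones(n), x, shift])
        beta, obj, subset = trimmed_fit(A, y, h)
        if best is None or obj < best[1]:
            best = (s, obj, beta, subset)
    return best

# Toy series: one covariate, a shift of +3.0 from index 70, six isolated outliers.
rng = np.random.default_rng(1)
n = 120
w = np.arange(n)
x = np.sin(2 * np.pi * w / 30)
y = 0.5 + 1.0 * x + 3.0 * (w >= 70) + rng.normal(0.0, 0.1, n)
bad = {12: 2.5, 30: -2.5, 45: 2.0, 85: -2.0, 100: 3.0, 110: -3.0}
for i, d in bad.items():
    y[i] += d

s_hat, obj, beta, kept = scan_level_shift(x, y, h=n - len(bad),
                                          candidates=range(5, n - 5))
flagged = sorted(set(range(n)) - set(kept.tolist()))
print(s_hat, round(beta[2], 2), flagged)
```

Because the shift is modelled as an explicit regressor, observations after the shift are fitted rather than trimmed, which is what separates a persistent change in level from isolated outliers.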
Step 5, inputting the key influence factors of the test set into the BMS-LTS-based robust regression model trained in step 3 to obtain the predicted dam deformation at the corresponding measuring points.
Example 1
The adaptive outlier identification method of the invention is used to identify anomalies in the horizontal displacement of the dam of a hydropower station, specifically as follows:
Step 1, the water-retaining structure of the hydropower station is a concrete double-curvature arch dam with a maximum dam height of 250 m. The arch dam has a number of deformation monitoring items, including horizontal displacement, vertical displacement, and deflection. The horizontal displacement of the dam body is monitored by the plumb line method; FIG. 2 shows the layout of the deformation measuring points of the arch dam body. To ensure that the data used truly reflect how the operating state of the dam evolves, the horizontal displacements of measuring points PL3-2 and PL5-1, which are close to the middle of the dam and at different elevations, together with the monitoring data of their empirical influence factors, are selected as the model data set, and the model data set is divided into a training set and a test set at a ratio of 8:2.
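The 8:2 division can be sketched as follows. The source does not say whether the split is chronological or random; the sketch assumes a chronological split, which is a common choice for monitoring time series, and the function name `chronological_split` is hypothetical.

```python
import numpy as np

def chronological_split(X, y, train_ratio=0.8):
    """Split time-ordered data into a leading training part and a trailing test part."""
    n_train = int(round(train_ratio * len(y)))
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

# Toy data: 10 observations of 3 influence factors.
X = np.arange(30, dtype=float).reshape(10, 3)
y = np.arange(10, dtype=float)
X_train, y_train, X_test, y_test = chronological_split(X, y)
print(X_train.shape, X_test.shape)
```

Keeping the test set at the end of the series means the model is always evaluated on observations later than those it was trained on, which mirrors how a deformation prediction would be used in practice.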
In addition, to test the effectiveness of the adaptive outlier identification method of the invention, noise is added to the horizontal displacement data sequences of measuring points PL3-2 and PL5-1. The radial horizontal displacement process lines of the measuring points after noise addition are shown in FIG. 3.
Noise 1: the data sequence of measuring point PL3-2 is contaminated with one horizontal shift anomaly and three isolated outliers. Specifically, 3.0 is added to the 231st data point (October 18, 2017) and all subsequent data; 2.5 is subtracted from the 130th data point (April 3, 2015); 1.5 is added to the 202nd data point (19 th month 10 days in 2016); and 2.0 is subtracted from the 268th data point (4 th month 10 years in 2018).
Noise 2: the data sequence at PL5-1 station was noisy by adding three consecutive outliers, specifically by adding 8.0 to 171 (2015, 10, 16) to 192 (2016, 3), 8.0 to 193 (2016, 3, 15), 212 (2016, 10, 19), 8.0 to 264 (2018, 6) to 282 (2018, 11, 14) and 7.0 to each other.
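The Noise 1 contamination described above can be reproduced with a few array operations. The sketch below is illustrative: the function name `add_noise_1` and the all-zero series of length 300 are hypothetical stand-ins for the real PL3-2 data, and the 1-based positions from the text are converted to 0-based indices.

```python
import numpy as np

def add_noise_1(series):
    """Apply the Noise 1 pattern: one horizontal shift plus three isolated outliers."""
    noisy = np.array(series, dtype=float)    # work on a copy
    noisy[231 - 1:] += 3.0                   # shift: 231st point and all later points
    noisy[130 - 1] -= 2.5                    # isolated outlier at the 130th point
    noisy[202 - 1] += 1.5                    # isolated outlier at the 202nd point
    noisy[268 - 1] -= 2.0                    # isolated outlier at the 268th point
    return noisy

clean = np.zeros(300)                        # hypothetical placeholder series
noisy = add_noise_1(clean)
print(noisy[129], noisy[201], noisy[230], noisy[267])
```

Since the 268th point lies after the shift position, it receives both changes in this sketch, for a net offset of +1.0 relative to the clean series.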
Step 2, reducing the empirical influence factors of dam deformation by the BMS method and removing the redundant factors among them by BIC backward elimination, thereby determining the key influence factors of the robust regression model to be constructed. In this example, 12 initial empirical influence factors are chosen to represent the horizontal displacement of the arch dam, as shown in Table 1.
TABLE 1 Summary of the initial empirical influence factors
In the table: $H$ and $H_0$ are the upstream water heads on the monitoring day and the initial measurement day, respectively; $a_i$ are the regression coefficients of the hydraulic factors; $t$ is the cumulative number of days from the monitoring day to the initial measurement day; $t_0$ is the cumulative number of days from the first monitoring day of the modeling data series to the initial measurement day; $b_{1i}$ and $b_{2i}$ are the regression coefficients of the temperature factors; $\theta$ is $t$ divided by 100; $\theta_0$ is $t_0$ divided by 100; $c_1$ and $c_2$ are the regression coefficients of the aging factors. To account for the influence of the coordinated displacement reference value and the initial measured value, a constant term $a_0$ is generally added to the regression model.
To find the best linear regression model among the 4096 (i.e. $2^{12} = 4096$) possible models, the influence factors are reduced by BIC backward elimination. The model selection procedure using BIC backward elimination in this embodiment is shown in Table 2.
TABLE 2 Model selection procedure using BIC backward elimination
Step | Model factors | Factor eliminated | BIC value
Complete model | ($x_1$, $x_2$, …, $x_{12}$) | | 501.69
Step 1 | ($x_1$, $x_2$, …, $x_8$, $x_{10}$, $x_{11}$, $x_{12}$) | $x_9$ | 500.90
Step 2 | ($x_1$, $x_2$, …, $x_8$, $x_{11}$, $x_{12}$) | $x_{10}$ | 498.61
First, each of the 12 influence factors is eliminated in turn, and the BIC value is found to be smallest when the temperature-related influence factor $x_9$ is removed from the complete model; the model without $x_9$ is therefore selected as the new model in this step. Next, the model with 11 influence factors is analyzed further by again eliminating one influence factor, and the BIC value is found to decrease further after factor $x_{10}$ is eliminated. Finally, eliminating any one of the remaining 10 influence factors does not significantly reduce the BIC value of the model. The selected model therefore consists of the 10 factors other than $x_9$ and $x_{10}$, which are the key influence factors of dam deformation.
Step 3: A multiple linear regression model is constructed using the 10 key influence factors selected above and the measured horizontal displacement monitoring data of the training set. The final regression coefficient estimates of the model are obtained through LTS estimation, giving the BMS-LTS-based robust regression model. The final form of the model is shown in formula (8), and the final regression coefficients of the horizontal displacement statistical models for measuring points PL3-2 and PL5-1 are listed in Table 3:
Table 3. Final regression coefficients of the horizontal displacement statistical models for measuring points PL3-2 and PL5-1
Regression coefficient | PL3-2 | PL5-1
$a_0$ | -3848.7 | -1508.6
$a_1$ | 871.3 | 362.1
$a_2$ | -73.05 | -31.71
$a_3$ | 2.71 | 1.23
$a_4$ | -0.038 | -0.018
$c_1$ | 2.13 | -9.46
$c_2$ | 0.096 | 0.98
$b_{11}$ | -1.19 | -5.36
$b_{21}$ | 0.74 | -1.19
$b_{12}$ | -0.026 | 0.45
$b_{22}$ | 0.019 | 0.015
$b_{ls}$ | 3.91 | /
Step 4: The adaptive outlier identification method for dam deformation monitoring data is applied to monitoring data sequences that may contain different types of outliers. Its outlier identification capability is tested on two different data sets, namely the training set and the test set. The adaptive outlier identification results for the horizontal displacement monitoring data of measuring points PL3-2 and PL5-1 are shown in Fig. 4 and Fig. 5, respectively. The results show that the BMS-LTS-based robust regression model adaptively identifies the outliers added to the horizontal displacement data sequences of measuring points PL3-2 and PL5-1, which effectively avoids the adverse effect of outliers on the performance of the regression model.
Step 5: The 10 key influence factors of the test set are input into the BMS-LTS-based robust regression model trained in step 3 to obtain the dam deformation predictions for measuring points PL3-2 and PL5-1. To verify the performance of the BMS-LTS-based robust regression model in dam deformation prediction, regression models are constructed from the denoised horizontal displacement of measuring points PL3-2 and PL5-1 and the corresponding key influence factor monitoring data, the data samples of the training set and the test set are fitted and predicted respectively, and the results are compared with those of a traditional multiple linear regression model based on LS fitting. Fig. 6 and Fig. 7 compare the measured horizontal displacement of measuring points PL3-2 and PL5-1 with the fitted and predicted values of the different models. As the figures show, when outliers exist in the training data, the regression errors of the LS-based multiple linear regression model at the outliers dominate the change in the overall loss function value, so the trained model is biased toward the outlying samples; the model is therefore sensitive to outliers and lacks robustness. The LTS-based robust regression model achieves high fitting and prediction accuracy and performs stably on both the training set and the test set.
Compared with traditional non-robust regression methods, applying the LTS estimation technique in the model makes it possible to adaptively identify outliers in dam deformation monitoring data, to accurately capture the complex relationships between the dam deformation effect quantity and its influence factors despite the interference of outliers, and thereby to assess the evolution trend of the dam deformation state and achieve the goal of dam deformation safety monitoring.

Claims (6)

1. The self-adaptive identification method for abnormal values of the dam deformation monitoring data is characterized by comprising the following steps of:
step 1, determining, according to engineering practice, the dam deformation measuring points that require adaptive outlier identification, taking the deformation of the corresponding measuring points and the monitoring data of their empirical influence factors as the data set of the robust regression model to be constructed, and dividing the data set into a training set and a test set according to its size and actual needs;
step 2, reducing the empirical influence factors affecting dam deformation by the BMS method, removing redundant factors among the empirical influence factors by BIC backward elimination, and thereby determining the key influence factors of the robust regression model to be constructed;
step 3, constructing a multiple linear regression model using the deformation of the training set and the measured values of the key influence factor monitoring data, and obtaining the final regression coefficients through LTS estimation, thereby constructing the BMS-LTS-based robust regression model;
step 4, marking the data used for the final regression coefficient estimation as the optimal data group and marking the remaining data as outliers, identifying possible level shift anomalies in the data sequence, and visualizing the variation of the outliers in the data sequence with a double wedge plot;
step 5, inputting the key influence factors of the test set into the BMS-LTS-based robust regression model trained in step 3 to obtain the dam deformation predictions of the corresponding measuring points.
2. The dam deformation monitoring data outlier adaptive identification method according to claim 1, wherein the step 2 specifically comprises the steps of:
step 2.1, constructing a complete linear regression model using all the empirical influence factors obtained in step 1, removing one empirical influence factor from the complete linear regression model, fitting a new linear regression model with the remaining factors, and recording the BIC value of the new linear regression model; repeating this as many times as there are empirical influence factors, so that each factor is removed once, and selecting the linear regression model with the minimum BIC value for subsequent calculation and analysis;
step 2.2, for the linear regression model with the minimum BIC value selected in step 2.1, again removing one empirical influence factor at a time, observing the BIC values of the new linear regression models thus fitted, and selecting the model with the lowest BIC value for subsequent calculation and analysis;
step 2.3, repeating steps 2.1 to 2.2 until the BIC value of the fitted linear regression model no longer decreases, and finally selecting the linear regression model with the globally minimum BIC value as the optimal simple model, the corresponding factors being the key influence factors of the robust regression model to be constructed.
3. The adaptive identification method for abnormal values of dam deformation monitoring data according to claim 2, wherein the calculation formula of the BIC value in step 2 is as follows:
$$\mathrm{BIC} = -2\ln(\mathrm{likelihood}) + (m+1)\ln(n) \tag{1}$$
wherein: $n$ is the number of observations in the linear regression model, and $m$ is the number of predictors;
the likelihood function for a given model $M$ and its parameters $\theta$ is as follows:
$$\mathrm{likelihood} = P(\mathrm{data} \mid \theta, M) = L(\theta, M) \tag{2}$$
when the sample size is large enough and the data obey an exponential family distribution, the BIC value can be approximated as:
$$\mathrm{BIC} \approx -2\ln\big(P(\mathrm{data} \mid M)\big) = -2\ln\Big(\int P(\mathrm{data} \mid \theta, M)\,P(\theta \mid M)\,d\theta\Big) \tag{3}$$
wherein: $P(\mathrm{data} \mid M)$ is the marginal likelihood function of the data under model $M$, and $P(\theta \mid M)$ is the prior distribution of the parameter $\theta$.
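As a worked illustration of formula (1), consider the special case of a linear regression model with independent Gaussian errors. This distributional assumption, and the symbols $\mathrm{RSS}$ (residual sum of squares of the LS fit) and $\hat{\sigma}^2$ (maximum-likelihood error variance), are introduced here for illustration only and are not part of the claim.

```latex
\ln L(\hat{\theta}, M)
  = -\frac{n}{2}\ln\!\left(2\pi\hat{\sigma}^{2}\right) - \frac{\mathrm{RSS}}{2\hat{\sigma}^{2}},
  \qquad \hat{\sigma}^{2} = \frac{\mathrm{RSS}}{n}
\;\Longrightarrow\;
-2\ln L(\hat{\theta}, M) = n\ln\!\left(\frac{2\pi\,\mathrm{RSS}}{n}\right) + n

\mathrm{BIC} = n\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + (m+1)\ln(n) + n\left(1 + \ln 2\pi\right)
```

The last term depends only on $n$, so when candidate models are fitted to the same data it does not affect which model has the minimum BIC value. The comparison is then driven by the trade-off between the goodness of fit, $n\ln(\mathrm{RSS}/n)$, and the complexity penalty, $(m+1)\ln(n)$.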
4. A method for adaptively identifying abnormal values of dam deformation monitoring data according to claim 2 or 3, wherein the step 3 specifically comprises the steps of:
step 3.1, constructing a multiple linear regression model using the deformation of the training set of the corresponding measuring point and the key influence factor monitoring data selected by the BMS method; letting the number of measured values of the monitoring data be $n$, the multiple linear regression model can be expressed as:
$$y = x\beta + \varepsilon$$
wherein: $x$ is the independent variable, $x \in R^{n \times p}$; $y$ is the dependent variable, $y \in R^{n}$; $\beta$ is the parameter to be estimated, $\beta \in R^{p}$; $\varepsilon$ is the random error term, $\varepsilon \in R^{n}$; $p$ is the number of independent variables;
step 3.2, performing parameter estimation on the multiple linear regression model constructed in step 3.1 by LTS estimation: randomly selecting $h$ different sample points from the $n$ measured values to form a sample subset $H_0$, and calculating the initial regression coefficients by the LS method;
step 3.3, substituting the initial regression coefficients obtained in step 3.2 into the multiple linear regression model established in step 3.1, calculating the squared residuals of the $n$ measured values, arranging them in ascending order, and recording the sum of the squared residuals of the first $h$ measured values; for any given parameter $\beta$, the squared residual is defined as:
$$r_i^2(\beta) = \left(y_i - x_i^{T}\beta\right)^2$$
wherein: $(x_i, y_i)$ is the measured monitoring data at the $i$-th sample point, and $r_i^2(\beta)$ is the squared residual corresponding to the $i$-th sample point;
step 3.4, retaining the samples corresponding to the $h$ measured values with the smallest squared residuals in step 3.3 as a new sample subset $H_1$, and calculating new regression coefficients by the LS method;
step 3.5, substituting the new regression coefficients obtained in step 3.4 into the multiple linear regression model established in step 3.1, calculating the squared residuals of the $n$ measured values, arranging them in ascending order, and recording the sum of the squared residuals of the first $h$ measured values;
step 3.6, repeating steps 3.2 to 3.5 until the sum of the squared residuals of the first $h$ measured values converges; with $r_{(1)}^2(\beta) \le \dots \le r_{(n)}^2(\beta)$ denoting the squared residuals arranged in ascending order, the LTS estimate is then expressed as:
$$\hat{\beta}_{LTS} = \arg\min_{\beta} \sum_{i=1}^{h} r_{(i)}^2(\beta)$$
wherein, the value range of h needs to satisfy
step 3.7, substituting the final regression coefficients obtained at convergence in step 3.6 into the multiple linear regression model established in step 3.1 to obtain the BMS-LTS-based robust regression model.
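Steps 3.2 to 3.7 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: an intercept column is added, several random starting subsets are tried and the best result kept, and the function name `lts_fit` with its parameters `n_starts`, `max_iter` and `seed` is hypothetical.

```python
import numpy as np

def lts_fit(X, y, h, n_starts=50, max_iter=100, seed=0):
    """Least trimmed squares via repeated LS fits on the h best-fitting points."""
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # intercept column (assumption)
    best_obj, best_beta, best_subset = np.inf, None, None
    for _ in range(n_starts):
        # Step 3.2: random initial subset H_0 of h sample points
        subset = rng.choice(n, size=h, replace=False)
        prev_obj = np.inf
        for _ in range(max_iter):
            # LS fit on the current subset
            beta, *_ = np.linalg.lstsq(A[subset], y[subset], rcond=None)
            # Steps 3.3 / 3.5: squared residuals of all n points, ascending order
            r2 = (y - A @ beta) ** 2
            subset = np.argsort(r2)[:h]          # step 3.4: keep the h smallest
            obj = float(np.sum(r2[subset]))      # sum of the h smallest
            if prev_obj - obj <= 1e-12:          # step 3.6: converged
                break
            prev_obj = obj
        if obj < best_obj:
            best_obj, best_beta, best_subset = obj, beta, np.sort(subset)
    # Points outside the final subset are marked as outliers
    outliers = np.setdiff1d(np.arange(n), best_subset)
    return best_beta, best_subset, outliers

# Synthetic example: a straight line with 20 % of the points shifted upward
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=100)
y = 2.0 + 3.0 * x + 0.1 * rng.normal(size=100)
y[:20] += 50.0
beta, subset, outliers = lts_fit(x.reshape(-1, 1), y, h=75)
```

Because each refit uses only the `h` points with the smallest squared residuals, grossly deviating points drop out of the subset after the first iteration and no longer influence the coefficients. An ordinary LS fit on the same data would be pulled toward the shifted points.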
5. The method for adaptively identifying abnormal values of dam deformation monitoring data according to claim 4, wherein the step 4 specifically comprises the steps of:
step 4.1, adding the corresponding level shift anomaly detection term, namely $\delta_1 I(w \ge \delta_2)$, to the multiple linear regression model obtained in step 3.1, wherein $w = 1, \dots, n$; $I(\cdot)$ is the indicator function, $\delta_2$ is the exact position of the level shift anomaly, and $w^{(1)}, \dots, w^{(S)}$ are the candidate values of $\delta_2$, i.e. $w^{(s)} \in \{w^{(1)}, \dots, w^{(S)}\}$;
step 4.2, setting $\delta_2 = w^{(s)}$, with $s = 1$ initially;
step 4.3, constructing an element subset $E$ containing $p-1$ different measured values and, on this basis, performing steps 3.2 to 3.4 twice while keeping $\delta_2 = w^{(s)}$ unchanged;
step 4.4, repeating step 4.3 over the range from 1 to the number of element subsets $E$, and executing steps 3.2 to 3.6 until convergence on the nbest element subsets that yield the smallest objective function; if $s > 1$, also starting from the nbest element subsets found when examining $w^{(s-1)}$, likewise setting $\delta_2 = w^{(s)}$, and performing steps 3.2 to 3.6 until convergence;
step 4.5, selecting, from the 2 × nbest candidates, the regression coefficients with the smallest objective function, and storing the corresponding residuals;
step 4.6, looping steps 4.2 to 4.5 over all $w^{(s)}$, where $s = 1, \dots, S$, and then taking, among the resulting estimates, the one with the smallest objective function as the initial estimate;
step 4.7, performing anomaly detection on the data sequence by applying a univariate outlier detection procedure;
step 4.8, starting from the initial estimate and keeping $\delta_2$ unchanged, applying the LS method to all points not marked as outliers in step 4.7 to obtain the outlier detection result;
step 4.9, drawing a double wedge plot from the outlier detection result obtained in step 4.8 to visualize the level shift anomalies and other possible anomalies in the monitoring data.
6. The method for adaptively identifying outliers of dam deformation monitoring data according to claim 5, wherein the element subset $E$ in step 4.3 comprises the measured value corresponding to $w^{(s)}$, one measured value satisfying $w < w^{(s)}$, and another $p-3$ measured values randomly drawn from the entire data sequence.
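The level shift search of claim 5 can be illustrated with the simplified sketch below. It is not the claimed procedure: instead of the element subsets $E$ and the nbest bookkeeping, it starts each trimmed fit from random subsets of $h$ points, and it uses a median absolute deviation (MAD) rule with a cutoff of 2.5 as the univariate outlier detection procedure. Both are assumptions of this sketch, and the names `trimmed_ls`, `detect_level_shift` and `cutoff` are hypothetical.

```python
import numpy as np

def trimmed_ls(A, y, h, rng, n_starts=20, max_iter=50):
    """Small LTS helper: random h-subsets refined by repeated LS refits."""
    n = len(y)
    best = (np.inf, None)
    for _ in range(n_starts):
        subset = rng.choice(n, size=h, replace=False)
        prev = np.inf
        for _ in range(max_iter):
            beta, *_ = np.linalg.lstsq(A[subset], y[subset], rcond=None)
            r2 = (y - A @ beta) ** 2
            subset = np.argsort(r2)[:h]
            obj = float(r2[subset].sum())
            if prev - obj <= 1e-12:
                break
            prev = obj
        if obj < best[0]:
            best = (obj, beta)
    return best

def detect_level_shift(X, y, candidates, h, cutoff=2.5, seed=0):
    """Scan candidate shift positions, then flag outliers and refit by LS."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.arange(1, n + 1)
    results = []
    for pos in candidates:                      # loop over the candidate w^(s)
        step = (w >= pos).astype(float)         # indicator I(w >= delta_2)
        A = np.column_stack([np.ones(n), X, step])
        obj, beta = trimmed_ls(A, y, h, rng)
        results.append((obj, pos, beta))
    # Candidate with the smallest trimmed objective gives the initial estimate
    obj, pos, beta = min(results, key=lambda r: r[0])
    A = np.column_stack([np.ones(n), X, (w >= pos).astype(float)])
    resid = y - A @ beta
    # Univariate outlier rule on the residuals (MAD-based, an assumption)
    centre = np.median(resid)
    scale = 1.4826 * np.median(np.abs(resid - centre))
    flagged = np.abs(resid - centre) > cutoff * scale
    # Final LS fit on the unflagged points with the shift position held fixed
    beta_final, *_ = np.linalg.lstsq(A[~flagged], y[~flagged], rcond=None)
    return pos, beta_final, np.flatnonzero(flagged)

# Synthetic example: level shift of 5 starting at w = 61, plus two isolated outliers
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 10.0, size=120)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=120)
y[60:] += 5.0
y[[10, 90]] += 20.0
pos, beta_final, flagged = detect_level_shift(x.reshape(-1, 1), y,
                                              candidates=range(31, 92), h=110)
```

The last entry of `beta_final` estimates the shift magnitude $\delta_1$. Because the trimmed objective ignores the worst-fitting points, candidate positions a few observations away from the true shift can score almost as well, so the located position should be read as approximate.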
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310272375.2A CN116451161A (en) 2023-03-20 2023-03-20 Self-adaptive identification method for abnormal value of dam deformation monitoring data

Publications (1)

Publication Number Publication Date
CN116451161A true CN116451161A (en) 2023-07-18

Family

ID=87134772

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination