CN107992447B - Feature selection decomposition method applied to river water level prediction data - Google Patents

Feature selection decomposition method applied to river water level prediction data

Info

Publication number
CN107992447B
Authority
CN
China
Prior art keywords
input set
decomposition
feature
water level
feature selection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201711330726.1A
Other languages
Chinese (zh)
Other versions
CN107992447A (en)
Inventor
杨拥军
管杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201711330726.1A
Publication of CN107992447A
Application granted
Publication of CN107992447B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/14: Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/148: Wavelet transforms
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/30: Circuit design
    • G06F30/32: Circuit design at the digital level
    • G06F30/333: Design for testability [DFT], e.g. scan chain or built-in self-test [BIST]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a feature selection decomposition method applied to river water level prediction data. To obtain the features best suited as model input, the method uses LASSO regression to select features from an original input set, applies MODWT to decompose the selected features into components, and uses multiple linear regression as the base model to test the performance of LASSO-MODWT. Tests show that the LASSO-MODWT-based feature selection decomposition method helps improve both the performance and the interpretability of a river water level prediction model.

Description

Feature selection decomposition method applied to river water level prediction data
Technical Field
The invention belongs to the technical field of water level prediction, and particularly relates to a design of a feature selection decomposition method applied to river water level prediction data.
Background
Water level prediction plays an extremely important role in flood control and disaster reduction and in the utilization, allocation, and management of water resources. A robust water level prediction model tells decision makers how the water level is likely to change, so that potential hydrological disasters can be identified in time and early-warning measures deployed sooner. Because the factors that influence water level are multi-dimensional and complex, the candidate inputs of a prediction model often exhibit nonlinear dynamic relationships and various correlations with one another. In addition, the number of input variables is generally large, and introducing lagged values of each variable sharply increases the feature dimension and the computational complexity, even though these variables contain a large amount of redundant information and noise. To reduce the computational complexity of the model and improve its flexibility and explanatory power, effective features with minimal redundancy must be selected from the original high-dimensional data set, so that the resulting model is flexible, simpler, and better reflects the real pattern of water level change.
LASSO, the least absolute shrinkage and selection operator, was first proposed by Robert Tibshirani in 1996. It is a shrinkage estimator: by adding a penalty function it shrinks the regression coefficients and sets some of them exactly to zero, which yields a more parsimonious model. It therefore retains the advantage of subset selection and is a biased estimation method for data with complex collinearity. The basic idea of LASSO is to minimize the residual sum of squares (RSS) under the constraint that the sum of the absolute values of the regression coefficients is less than or equal to a constant, so that some regression coefficients become exactly 0 and the compressed feature set gives an interpretable model.
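The constrained problem described above can be written compactly. The notation below is an assumption introduced for illustration and does not appear in the original text: $y_i$ is the i-th observation, $x_{ij}$ the j-th standardized feature of that observation, $\beta_j$ the regression coefficients, and $t \ge 0$ the constant bounding the sum of absolute coefficients.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
  \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t

% Equivalent penalized (Lagrangian) form with tuning parameter \lambda \ge 0:
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
  + \lambda \sum_{j=1}^{p} |\beta_j|
```

A smaller $t$ (equivalently, a larger $\lambda$) drives more coefficients to exactly zero, which is what makes the estimator usable for feature selection.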
Discrete Wavelet Transform (DWT) is widely used in many wavelet-integrated models and can extract detailed spectral information from data, such as periodicity, local variation characteristics, randomness, and abrupt changes. However, because of its decimation (downsampling) effect, information may be lost during the model building phase, which biases the prediction. In addition, the wavelet coefficients produced by the DWT depend on the starting position of the transform, which introduces a degree of arbitrariness.
To address these shortcomings of DWT, researchers proposed the Maximal Overlap Discrete Wavelet Transform (MODWT) as a feature decomposition method. MODWT is a linear filtering operation that avoids the decimation effect, and it yields wavelet coefficients at every level that have the same length as the observed series. In addition, the result of the transform is independent of the starting position, and the transform can be applied to data of any sample size. In general, MODWT can extract the components of the input signal in different frequency bands, providing richer information and revealing the underlying variation pattern of the data.
Disclosure of Invention
The invention aims to reduce the computational complexity of existing water level prediction models and improve their flexibility and explanatory power, and provides a feature selection decomposition method applied to river water level prediction data.
The technical scheme of the invention is as follows: a feature selection decomposition method applied to river water level prediction data comprises the following steps:
S1, acquiring the hydrological elements that influence the water level of the target prediction station (current water level information of the target station, upstream basin water level information, rainfall along the river, and the like).
S2, constructing a feature set from the hydrological elements based on information theory.
S3, introducing lags for each feature in the feature set based on correlation analysis, and constructing an original input set.
S4, standardizing the original input set.
S5, performing LASSO-based feature selection on the standardized input set.
S6, performing MODWT-based feature decomposition on the input set after feature selection to obtain the input set optimized by LASSO-MODWT.
The invention has the beneficial effects that: the invention adopts LASSO regression to select the characteristics of the original input set and integrates MODWT to decompose the selected characteristics, thereby obviously improving the prediction performance of the river water level and being beneficial to improving the performance and the model interpretation capability of a river water level prediction model.
Further, step S2 is specifically: calculating the maximum information coefficient (MIC) between each hydrological element and the prediction target, analyzing the strength of the relationship between them, and constructing the feature set by taking as input features the hydrological elements whose MIC with the prediction target is larger than a set threshold.
The maximum information coefficient MIC is calculated by the formula:
$$\mathrm{MIC}[X;Y] = \max_{|X||Y|<B} \frac{I[X;Y]}{\log_2\left(\min(|X|,|Y|)\right)} \tag{1}$$
where X and Y are two random variables, |X| and |Y| are the numbers of segments into which X and Y are discretized, B is the segmentation limit, generally taken as the 0.6 or 0.55 power of the total amount of data, MIC[X;Y] represents the maximum information coefficient between X and Y, and I[X;Y] represents the mutual information between X and Y, calculated as:
$$I[X;Y] = \sum_{X,Y} p(X,Y)\,\log_2\frac{p(X,Y)}{p(X)\,p(Y)} \tag{2}$$
where p(X) and p(Y) represent the probability density distribution functions of X and Y, respectively, and p(X,Y) represents their joint probability density distribution function.
The beneficial effects of the above further scheme are: the maximum information coefficient MIC is used to analyze the strength of the relationship between each hydrological element and the prediction target, and the elements having a strong relationship with the prediction target are taken as input features to construct the feature set.
Further, step S3 is specifically: for the current water level information of the target station in the feature set, determining the lags with the partial autocorrelation function (PACF); for the other input features in the feature set, determining the lags by cross-correlation coefficient analysis; for each lag, if it exhibits a clear statistical correlation with the prediction target, i.e., the correlation is significant at the 95% confidence level, the lag is added to the input set, thereby constructing the original input set.
The beneficial effects of the above further scheme are: since the river water level to be predicted is a time series, the influence of lagged values should be taken into account when constructing the original input set.
Further, step S4 is specifically: standardizing the original input set by min-max normalization, which scales it to the [0,1] interval, with the processing formula:
$$x_{i,norm} = N_{min} + \frac{x_i - x_{min}}{x_{max} - x_{min}}\,(N_{max} - N_{min}) \tag{3}$$
where $x_{i,norm}$ is the normalized data value, $x_i$ is the i-th data item to be normalized in the original input set, $N_{min}$ and $N_{max}$ are the minimum and maximum of the scaling range, i.e., 0 and 1, respectively, and $x_{min}$ and $x_{max}$ are the minimum and maximum in the original input set, respectively.
The beneficial effects of the above further scheme are: because different input data have different dimensions (units), the original input set must be standardized to make it dimensionless so that it can be evaluated on a common scale; it is therefore scaled to the [0,1] interval.
Further, step S5 specifically includes the following sub-steps:
S51, taking the standardized input set as model input and the water level data set of the prediction target station as model output, and constructing a LASSO regression model.
S52, training the LASSO regression model, optimizing the parameter lambda of the LASSO regression by grid search, and finding the optimal parameter.
S53, scoring the features in the input set with the LASSO regression model with optimal parameters, where the score is the regression coefficient obtained by LASSO regression; features with a positive LASSO regression coefficient are kept in the input set, and features with a LASSO regression coefficient of 0 or a negative coefficient are removed from the input set, thereby realizing feature selection on the input set.
The beneficial effects of the above further scheme are: after LASSO-based feature selection on the standardized input set, the prediction accuracy can be improved while the number of model input parameters is greatly reduced.
Further, step S6 is specifically: performing feature decomposition on the input set after feature selection with the MODWT model, and using the wavelet coefficient sets obtained from all feature decompositions to construct the optimized input set.
The formula of the feature decomposition is:
$$f(t) = A_M(t) + \sum_{m=1}^{M} W_m(t) \tag{4}$$
where f(t) is the wavelet coefficient set obtained by the feature decomposition, $A_M(t)$ is the smooth wavelet approximation of the original signal in the M-layer decomposition, $W_m(t)$ is the decomposition wavelet of the original signal at layer m, m = 1, 2, ..., M, and M is the minimum number of decomposition layers:
$$M = \mathrm{int}[\log(N)] \tag{5}$$
where N is the length of the input set after feature selection and int[·] is the round-up (ceiling) function.
The beneficial effects of the above further scheme are: performing feature decomposition on the input set after feature selection with the MODWT model can significantly improve the accuracy of river water level prediction.
Further, the MODWT model employs Daubechies wavelet basis.
The beneficial effects of the above further scheme are: considering that irregular wavelet bases are suitable for hydrological prediction, the invention adopts the Daubechies wavelet basis, which is widely used in the field of hydrological prediction.
Drawings
Fig. 1 is a flowchart of a feature selection decomposition method applied to river water level prediction data according to an embodiment of the present invention.
Fig. 2 is a graph of the results of applying MODWT to WL_CS using the db3 Daubechies wavelet basis, according to an embodiment of the present invention.
Fig. 3 is a comparison graph of predicted values and actual values of three-hour prediction of different input sets according to an embodiment of the present invention.
Fig. 4 is a scatter diagram illustrating predicted values and true values of three-hour prediction for different input sets according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It is to be understood that the embodiments shown and described in the drawings are merely exemplary and are intended to illustrate the principles and spirit of the invention, not to limit the scope of the invention.
The embodiment of the invention provides a feature selection decomposition method applied to river water level prediction data, as shown in fig. 1, comprising the following steps S1-S6:
S1, acquiring the hydrological elements that influence the water level of the target prediction station (including current water level information, upstream basin water level information, rainfall along the river, and other hydrological elements).
The embodiment of the invention takes the water level trend in the lower reaches of the Chishui River as an example, with the aim of predicting the water level at the Chishui station 3 hours and 6 hours ahead. The data were collected by automatic monitoring stations along the Chishui River during 2015 and 2016, and the station information is shown in Table 1. The data were recorded hourly, giving a total of 8834 data points. Gaps are inevitable during data acquisition and storage; analysis found that the missing data are the 126 WL_MT records from 2015-10-09 02:00 to 2015-10-14 07:00, which were filled by interpolation using pandas.
TABLE 1
Code | Parameter meaning | Monitoring station | Data type | Acquisition cycle
WL_CS | Chishui station water level | Chishui station | Water level | Hourly
WL_EL | Erlang station water level | Erlang station | Water level | Hourly
WL_MT | Maotai station water level | Maotai station | Water level | Hourly
RF_CS | Chishui station rainfall | Chishui station | Rainfall | Hourly
RF_XS | Xishui station rainfall | Xishui station | Rainfall | Hourly
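The gap filling mentioned in S1 might look like the following minimal sketch. The text only says that pandas was used for interpolation; the time-weighted linear method, the synthetic values, and the variable names (`wl_mt`, `wl_mt_filled`) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series standing in for WL_MT; the values are synthetic.
index = pd.date_range("2015-10-08 00:00", "2015-10-15 23:00",
                      freq=pd.Timedelta(hours=1))
wl_mt = pd.Series(np.linspace(400.0, 402.0, len(index)), index=index, name="WL_MT")

# Blank out the gap reported in the text: 2015-10-09 02:00 to 2015-10-14 07:00.
gap_start = pd.Timestamp("2015-10-09 02:00")
gap_end = pd.Timestamp("2015-10-14 07:00")
wl_mt.loc[gap_start:gap_end] = np.nan
n_missing = int(wl_mt.isna().sum())  # number of hourly records inside the gap

# Assumed method: time-weighted linear interpolation between the last valid
# reading before the gap and the first valid reading after it.
wl_mt_filled = wl_mt.interpolate(method="time")
```

For a gap of several days, linear interpolation cannot recover intermediate peaks, so a real application may prefer a method informed by neighboring stations.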
S2, constructing a feature set from the hydrological elements based on information theory.
The maximum information coefficient (MIC) between each hydrological element and the prediction target is calculated, the strength of the relationship between them is analyzed, and the feature set is constructed by taking as input features the hydrological elements whose MIC with the prediction target is larger than a set threshold (i.e., the hydrological elements that have a strong relationship with the prediction target).
The maximum information coefficient MIC is calculated by the formula:
$$\mathrm{MIC}[X;Y] = \max_{|X||Y|<B} \frac{I[X;Y]}{\log_2\left(\min(|X|,|Y|)\right)} \tag{1}$$
where X and Y are two random variables, |X| and |Y| are the numbers of segments into which X and Y are discretized, B is the segmentation limit, which determines the upper limit of the discrete segmentation of X and Y and is generally taken as the 0.6 or 0.55 power of the total amount of data, MIC[X;Y] represents the maximum information coefficient between X and Y, and I[X;Y] represents the mutual information between X and Y, calculated as:
$$I[X;Y] = \sum_{X,Y} p(X,Y)\,\log_2\frac{p(X,Y)}{p(X)\,p(Y)} \tag{2}$$
where p(X) and p(Y) represent the probability density distribution functions of X and Y, respectively, and p(X,Y) represents their joint probability density distribution function.
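To make the MIC definition concrete, the sketch below estimates it with NumPy. It is a deliberate simplification and not the procedure used in the patent: it tries only equal-frequency grids, whereas the full MIC algorithm also optimizes the bin boundaries, so the value returned is a lower bound on the true MIC. The function names and the default exponent of 0.6 are assumptions.

```python
import numpy as np

def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins of (nearly) equal population."""
    ranks = np.argsort(np.argsort(values))
    return (ranks * n_bins // len(values)).astype(int)

def mutual_information(x_bins, y_bins, nx, ny):
    """Mutual information in bits between two discretized variables."""
    joint = np.zeros((nx, ny))
    np.add.at(joint, (x_bins, y_bins), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of X, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, ny)
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log2(joint[nonzero] / (px @ py)[nonzero])))

def approx_mic(x, y, exponent=0.6):
    """Largest normalized mutual information over grids with nx * ny < B."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = len(x) ** exponent                  # segmentation limit B = n ** 0.6
    y_bin_cache = {}
    best = 0.0
    for nx in range(2, int(b) // 2 + 1):
        x_bins = equal_frequency_bins(x, nx)
        ny = 2
        while nx * ny < b:
            if ny not in y_bin_cache:
                y_bin_cache[ny] = equal_frequency_bins(y, ny)
            mi = mutual_information(x_bins, y_bin_cache[ny], nx, ny)
            best = max(best, mi / np.log2(min(nx, ny)))
            ny += 1
    return best
```

A hydrological element would then be kept as a feature when `approx_mic(element, target)` exceeds the chosen threshold; the text does not state the threshold value.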
In the embodiment of the present invention, the feature set has 5 features: (1) water level data from three hydrological monitoring stations, namely the Chishui, Maotai, and Erlang stations (codes WL_CS, WL_MT, and WL_EL); (2) rainfall data from two weather monitoring stations, namely the Chishui and Xishui stations (codes RF_CS and RF_XS).
S3, introducing lags for each feature in the feature set based on correlation analysis, and constructing an original input set.
Since the river water level to be predicted is a time series, the influence of lagged values should be taken into account when constructing the original input set. In the embodiment of the invention, the lags for the current water level information of the target station in the feature set are determined with the partial autocorrelation function (PACF), and the lags for the other input features in the feature set are determined by cross-correlation coefficient analysis; for each lag, if it exhibits a clear statistical correlation with the prediction target (i.e., the correlation is significant at the 95% confidence level), the lag is added to the input set, thereby constructing the original input set. The partial autocorrelation function and cross-correlation coefficient analysis are correlation analysis methods commonly used in the art and are not described further here.
In the embodiment of the invention, after lags are introduced for each feature through correlation analysis, the original input set has 221 features for the 3 h prediction and 229 features for the 6 h prediction.
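The cross-correlation part of S3 can be sketched as follows. The significance test is an assumption: the text states only a 95% criterion, and the sketch uses the common large-sample bound of 1.96 / sqrt(N) on the correlation coefficient. The function name `significant_lags` is hypothetical, and the PACF step for the target station's own series is not shown.

```python
import numpy as np

def significant_lags(feature, target, max_lag):
    """Return (lag, correlation) pairs whose correlation exceeds the 95% bound.

    The pair for a given lag correlates feature[t] with target[t + lag],
    i.e. the feature value observed `lag` steps before the target.
    """
    feature = np.asarray(feature, dtype=float)
    target = np.asarray(target, dtype=float)
    bound = 1.96 / np.sqrt(len(target))     # assumed large-sample 95% bound
    selected = []
    for lag in range(1, max_lag + 1):
        r = np.corrcoef(feature[:-lag], target[lag:])[0, 1]
        if abs(r) > bound:
            selected.append((lag, float(r)))
    return selected
```

Each selected lag would become one column of the original input set, which is how five base features can grow to a few hundred inputs.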
S4, standardizing the original input set.
Because different input data have different dimensions (units), the original input set must be standardized to make it dimensionless so that it can be evaluated on a common scale. In the embodiment of the invention, min-max normalization (Min-Max Scaler) is used to standardize the original input set and scale it to the [0,1] interval, with the processing formula:
$$x_{i,norm} = N_{min} + \frac{x_i - x_{min}}{x_{max} - x_{min}}\,(N_{max} - N_{min}) \tag{3}$$
where $x_{i,norm}$ is the normalized data value, $x_i$ is the i-th data item to be normalized in the original input set, $N_{min}$ and $N_{max}$ are the minimum and maximum of the scaling range, i.e., 0 and 1, respectively, and $x_{min}$ and $x_{max}$ are the minimum and maximum in the original input set, respectively.
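Formula (3) translates directly into a few lines of NumPy. The sketch below assumes that scaling is applied per feature (column), as min-max scalers conventionally do; the function name and the handling of constant columns are illustrative choices rather than details from the text.

```python
import numpy as np

def min_max_scale(data, n_min=0.0, n_max=1.0):
    """Scale each column of data to the interval [n_min, n_max]."""
    data = np.asarray(data, dtype=float)
    x_min = data.min(axis=0)
    x_max = data.max(axis=0)
    # A constant column has x_max == x_min; use a span of 1 to avoid dividing by zero.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return n_min + (data - x_min) / span * (n_max - n_min)
```

In a forecasting setting, `x_min` and `x_max` would normally be computed on the training period only and reused for later data.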
S5, performing LASSO-based feature selection on the standardized input set.
To simplify the input set and select the features most suitable as input, the embodiment of the invention performs feature selection on the input set based on LASSO regression. Because LASSO introduces an L1 regularization term as the penalty, the regression coefficients of redundant features can be shrunk to 0, so LASSO-based feature selection is a sparse feature selection method.
The step S5 specifically includes the following substeps S51-S53:
S51, taking the standardized input set as model input and the water level data set of the prediction target station as model output, and constructing a LASSO regression model.
S52, training the LASSO regression model, optimizing the parameter lambda of the LASSO regression by grid search, and finding the optimal parameter.
S53, scoring the features in the input set with the LASSO regression model with optimal parameters, where the score is the regression coefficient obtained by LASSO regression; features with a positive LASSO regression coefficient are kept in the input set, and features with a LASSO regression coefficient of 0 or a negative coefficient are removed from the input set, thereby realizing feature selection on the input set.
In the embodiment of the invention, after LASSO-based feature selection the number of features is 49 for the 3 h prediction and 88 for the 6 h prediction. The number of input features is greatly reduced in both prediction scenarios, which in turn reduces the complexity of model construction.
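Sub-steps S51 to S53 can be sketched with a small coordinate-descent LASSO in NumPy. This is an illustrative stand-in, not the implementation used in the embodiment: the solver, the use of validation mean squared error as the grid-search criterion, and all function names are assumptions.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def fit_lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n) * ||y - b0 - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, residual = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant column carries no information
            residual = residual + Xc[:, j] * beta[j]   # remove feature j's contribution
            rho = Xc[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            residual = residual - Xc[:, j] * beta[j]   # add the updated contribution back
    return beta, y_mean - x_mean @ beta

def grid_search_lambda(X_train, y_train, X_val, y_val, lambdas):
    """S52: pick the lambda with the lowest validation mean squared error."""
    best = None
    for lam in lambdas:
        beta, intercept = fit_lasso(X_train, y_train, lam)
        mse = float(np.mean((y_val - (X_val @ beta + intercept)) ** 2))
        if best is None or mse < best[0]:
            best = (mse, lam, beta)
    return best[1], best[2]

def select_features(beta, names):
    """S53: keep only features whose LASSO coefficient is strictly positive."""
    return [name for name, coefficient in zip(names, beta) if coefficient > 0]
```

Note that the rule in S53 discards features with negative coefficients as well as zero ones, so a feature that is informative but inversely related to the target is dropped by design.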
S6, performing MODWT-based feature decomposition on the input set after feature selection to obtain the input set optimized by LASSO-MODWT.
Feature decomposition is performed on the input set after feature selection with the MODWT model, and the wavelet coefficient sets obtained from all feature decompositions are used to construct the optimized input set.
The formula of the feature decomposition is:
$$f(t) = A_M(t) + \sum_{m=1}^{M} W_m(t) \tag{4}$$
where f(t) is the wavelet coefficient set obtained by the feature decomposition, $A_M(t)$ is the smooth wavelet approximation of the original signal in the M-layer decomposition, $W_m(t)$ is the decomposition wavelet of the original signal at layer m, m = 1, 2, ..., M, and M is the minimum number of decomposition layers:
$$M = \mathrm{int}[\log(N)] \tag{5}$$
where N is the length of the input set after feature selection and int[·] is the round-up (ceiling) function.
The effective length of the input set in the embodiment of the present invention is 8678, so the minimum number of MODWT decomposition layers is M = int[log(8678)] = int[3.94] = 4, where log is the base-10 logarithm. The embodiment tests both M = 4 and M = 5.
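Formula (5) is a one-line computation. The sketch assumes a base-10 logarithm, which is what the worked value for N = 8678 implies; the function name is hypothetical.

```python
import math

def min_decomposition_level(n_samples):
    """Minimum MODWT decomposition level M = ceil(log10(N)) from formula (5)."""
    return math.ceil(math.log10(n_samples))

levels_for_embodiment = min_decomposition_level(8678)
```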
Although MODWT has proven to have many advantages as a multi-resolution feature recognition tool, one challenge in building a MODWT-based model is choosing a suitable wavelet basis function. There is currently no definite general criterion for selecting basis functions, and no literature establishes which basis function gives the best model performance; in principle, different application scenarios suit different basis functions. Considering that irregular wavelet bases are suitable for hydrological prediction, the embodiment of the invention adopts Daubechies wavelet bases, which are widely used in the field of hydrological prediction. Three forms, db2, db3, and db4, are compared experimentally to find the wavelet basis best suited to predicting the Chishui River water level.
Fig. 2 shows the result of applying MODWT to WL_CS with the db3 Daubechies wavelet basis; the 6 subplots from top to bottom are the original signal waveform, the smooth approximation waveform (a4), and the four layers of MODWT decomposition coefficients (d1, d2, d3, d4). To reduce computational complexity, the embodiment of the present invention decomposes only WL_CS, the feature scored most important by LASSO, and adds the wavelet coefficients obtained from the decomposition to the input set as new features (the 4-layer and 5-layer decompositions yield 5-dimensional and 6-dimensional coefficients, respectively), giving 53 features for the 3-hour prediction and 92 features for the 6-hour prediction.
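The shape of the decomposition output can be illustrated with a compact MODWT pyramid in NumPy. To stay short, the sketch uses the Haar (db1) filter with circular boundary handling instead of the db2 to db4 bases tested in the embodiment, so its coefficients will differ from those in Fig. 2; the function names are hypothetical.

```python
import numpy as np

def haar_modwt(signal, levels):
    """Haar MODWT with circular boundaries.

    Returns a list of `levels` detail-coefficient arrays (d1, d2, ...) and the
    final smooth array; every array has the same length as the input.
    """
    smooth = np.asarray(signal, dtype=float)
    details = []
    for level in range(1, levels + 1):
        shift = 2 ** (level - 1)
        lagged = np.roll(smooth, shift)        # lagged[t] = smooth[t - shift]
        details.append((smooth - lagged) / 2.0)
        smooth = (smooth + lagged) / 2.0
    return details, smooth

def modwt_features(signal, levels):
    """Stack d1..dM and the level-M smooth as columns: shape (N, levels + 1)."""
    details, smooth = haar_modwt(signal, levels)
    return np.column_stack(details + [smooth])
```

With `levels=4` the result has 5 columns, matching the 5-dimensional coefficients described for the 4-layer decomposition. Because no downsampling takes place, each column lines up sample-by-sample with the original series and can be appended to the input set directly.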
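Since there is no universal single index for evaluating the performance of a hydrological prediction model, the embodiment of the present invention evaluates prediction performance comprehensively with three statistical indexes: the Nash efficiency coefficient $E_{NS}$, the root mean square error RMSE, and the mean absolute error MAE.
(1) Nash efficiency coefficient $E_{NS}$:
$$E_{NS} = 1 - \frac{\sum_{i=1}^{N}\left(SWL_{OBS,i} - SWL_{FOR,i}\right)^2}{\sum_{i=1}^{N}\left(SWL_{OBS,i} - \overline{SWL}_{OBS}\right)^2} \tag{6}$$
(2) Root mean square error RMSE:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(SWL_{OBS,i} - SWL_{FOR,i}\right)^2} \tag{7}$$
(3) Mean absolute error MAE:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|SWL_{OBS,i} - SWL_{FOR,i}\right| \tag{8}$$
where $SWL_{OBS}$ is the measured water level, $SWL_{FOR}$ is the water level obtained through model prediction, N is the number of data points, and $\overline{SWL}_{OBS}$ is the overall mean of the measured water level.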
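The three indexes are simple to compute. The sketch below uses only the standard library and the standard definitions of the Nash efficiency coefficient, RMSE, and MAE; the function names are illustrative.

```python
import math

def nash_efficiency(observed, forecast):
    """Nash efficiency coefficient: 1 means a perfect fit, 0 matches the mean."""
    mean_observed = sum(observed) / len(observed)
    error_sum = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    variance_sum = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - error_sum / variance_sum

def rmse(observed, forecast):
    """Root mean square error, in the units of the water level."""
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed))

def mae(observed, forecast):
    """Mean absolute error, in the units of the water level."""
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)
```

For example, with observations [1, 2, 3, 4] and forecasts [1, 2, 3, 5], the Nash efficiency coefficient is 0.8, the RMSE is 0.5, and the MAE is 0.25.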
In the embodiment of the invention, the original input set obtained from correlation analysis, the input set after LASSO-based feature selection, and the input set optimized by LASSO-MODWT are each used as the input of a multiple linear regression model that predicts the Chishui station water level 3 hours and 6 hours ahead, in order to evaluate the performance of the LASSO-MODWT feature selection decomposition method. Table 2 compares the performance of the different input sets for the 3-hour and 6-hour water level predictions at the Chishui station. As can be seen from Table 2, for both the 3-hour and the 6-hour prediction, LASSO-based feature selection improves the prediction accuracy while greatly reducing the number of model input parameters, and integrating MODWT further improves the prediction accuracy significantly, with good performance for both prediction horizons.
TABLE 2
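The multiple linear regression base model used for this comparison can be sketched with an ordinary least-squares fit. The function names are hypothetical and no data from the embodiment are reproduced; the same two functions would be applied to each of the three input sets and the predictions scored with the indexes defined above.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + X b; returns [b0, b1, ..., bp]."""
    design = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients

def predict_mlr(coefficients, X):
    """Apply fitted coefficients to a feature matrix."""
    return coefficients[0] + X @ coefficients[1:]
```

Because the base model is linear, any accuracy difference between the three input sets can be attributed to the inputs rather than to model capacity.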
Fig. 3 compares the predicted and actual values of the 3-hour water level prediction at the Chishui station during August 2016 for the different input sets, and Fig. 4 is a scatter diagram of the predicted and actual values for the three input sets. It can be seen that, after LASSO-MODWT feature selection decomposition, the predicted values of LASSO-W-MLR are closer to the true values and the model performance is more stable than with the original input set. The LASSO-MODWT feature selection decomposition method can therefore significantly improve the accuracy and stability of the Chishui River water level prediction model.
To further study the influence of different wavelet basis types on the Chishui River water level prediction performance, the embodiment of the invention tests three wavelets (db2, db3, and db4) with two numbers of decomposition layers (4 and 5); Table 3 shows the performance of the 3 h and 6 h predictions for the different wavelet bases and numbers of decomposition layers. As can be seen from Table 3, the db2 wavelet basis with 5-layer wavelet decomposition gives the best prediction performance in the Chishui River water level prediction model. The result further shows that different application scenarios suit different wavelet bases; in actual modeling, trials should be run for the specific requirements to find the most suitable wavelet basis and number of decomposition layers and thereby improve model accuracy.
TABLE 3
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention, and that the scope of the invention is not limited to these specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations remain within the scope of the invention.

Claims (9)

1. A feature selection decomposition method applied to river water level prediction data is characterized by comprising the following steps:
S1, acquiring hydrological elements influencing the water level of the target prediction station;
S2, constructing a feature set based on information theory according to each hydrological element;
S3, introducing lags for each feature in the feature set based on correlation analysis, and constructing an original input set;
S4, carrying out standardization processing on the original input set;
S5, performing feature selection on the input set after the standardization processing based on the LASSO;
S6, performing feature decomposition on the input set after feature selection based on MODWT to obtain an input set optimized by LASSO-MODWT;
The step S5 specifically includes the following sub-steps:
S51, taking the input set after standardization as model input, taking the water level data set of the predicted target site as model output, and constructing an LASSO regression model;
S52, training the LASSO regression model, optimizing a parameter lambda of the LASSO regression by adopting a grid search method, and searching for an optimal parameter;
And S53, scoring the features in the input set by using an LASSO regression model with optimal parameters, wherein the scoring standard is a regression coefficient obtained by LASSO regression, selecting the features with the LASSO regression coefficient being positive to continuously keep in the input set, and removing the features with the LASSO regression coefficient being 0 or negative from the input set to realize the feature selection of the input set.
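Sub-steps S51 to S53 can be illustrated with a minimal sketch. It is not the patented implementation: the coordinate-descent solver, the hold-out split used to score each λ, the function names (`lasso_fit`, `select_features`) and the synthetic data are all assumptions made for illustration. Only the grid search over λ and the rule "keep positive coefficients, drop zero or negative ones" come from the claim.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the LASSO proximal step)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b - Xw||^2 + lam*||w||_1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual when all weights are 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: weight stays 0
            rho = sum(r[j] * e for r, e in zip(Xc, resid)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta:
                resid = [e - r[j] * delta for r, e in zip(Xc, resid)]
                w[j] = new_w
    b = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, b

def mse(X, y, w, b):
    """Mean squared error of the linear model (w, b) on (X, y)."""
    preds = [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]
    return sum((p_ - t) ** 2 for p_, t in zip(preds, y)) / len(y)

def select_features(X, y, lambdas, n_val):
    """Grid-search lambda on a hold-out tail, then keep positive coefficients."""
    X_tr, y_tr, X_va, y_va = X[:-n_val], y[:-n_val], X[-n_val:], y[-n_val:]
    best = min(lambdas, key=lambda lam: mse(X_va, y_va, *lasso_fit(X_tr, y_tr, lam)))
    w, _ = lasso_fit(X, y, best)
    kept = [j for j, wj in enumerate(w) if wj > 0]  # rule from S53
    return best, w, kept

# Synthetic stand-in for a standardized input set (hypothetical data).
random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(120)]
y = [2.0 * r[0] - 1.5 * r[1] + random.gauss(0, 0.05) for r in X]
best_lam, coefs, kept = select_features(X, y, [0.001, 0.01, 0.05], n_val=30)
```

In this toy data the first feature drives the target positively and the second negatively, so the S53 rule retains the first and discards the second even though both are informative. That is a direct consequence of the sign-based criterion stated in the claim.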
2. The feature selection decomposition method according to claim 1, wherein the hydrological elements influencing the water level of the target prediction site in step S1 include the current water level information of the target site, the upstream basin water level information, and the rainfall along the river.
3. The feature selection decomposition method according to claim 1, wherein step S2 specifically comprises: calculating the maximum information coefficient MIC between each hydrological element and the prediction target, analyzing the strength of the relationship between each hydrological element and the prediction target, and constructing the feature set by taking as input features the hydrological elements whose MIC with the prediction target is larger than a set threshold.
4. The feature selection decomposition method according to claim 3, wherein the maximum information coefficient MIC is calculated as:
$$\mathrm{MIC}[X;Y] = \max_{|X||Y|<B} \frac{I[X;Y]}{\log_2\left(\min(|X|,|Y|)\right)} \quad (1)$$
where X and Y are two random variables, |X| and |Y| are the numbers of grid bins along X and Y, B is the grid-size limit, taken as the total number of data points raised to the power 0.6 or 0.55, MIC[X;Y] is the maximum information coefficient between X and Y, and I[X;Y] is the mutual information between X and Y, calculated as:
$$I[X;Y] = \iint p(X,Y)\,\log_2\frac{p(X,Y)}{p(X)\,p(Y)}\,dX\,dY \quad (2)$$
where p(X) and p(Y) are the probability density functions of X and Y respectively, and p(X,Y) is the joint probability density function of X and Y.
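The MIC screening of claims 3 and 4 can be approximated with a short sketch. The code below is a simplification and an assumption, not the reference MIC algorithm: it only tries equal-frequency grids, whereas the full method searches over grid partitions. The function names are hypothetical; the grid-size limit B (the number of samples raised to the power 0.6) and the normalization by the logarithm of the smaller grid dimension follow the claim.

```python
import bisect
import math
from collections import Counter

def equal_freq_bins(values, k):
    """Map each value to one of k rank-based bins; tied values share a bin."""
    ordered = sorted(values)
    n = len(values)
    return [bisect.bisect_left(ordered, v) * k // n for v in values]

def mutual_information(bx, by):
    """Discrete mutual information (in bits) between two bin-label sequences."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def mic_approx(x, y, exponent=0.6):
    """Max normalized MI over equal-frequency grids with nx * ny < B."""
    limit = len(x) ** exponent  # B in the claim
    best = 0.0
    nx = 2
    while nx * 2 < limit:
        bx = equal_freq_bins(x, nx)
        ny = 2
        while nx * ny < limit:
            by = equal_freq_bins(y, ny)
            score = mutual_information(bx, by) / math.log2(min(nx, ny))
            best = max(best, score)
            ny += 1
        nx += 1
    return best

def mic_feature_set(candidates, target, threshold):
    """Keep the names of candidate series whose score exceeds the threshold."""
    return [name for name, series in candidates.items()
            if mic_approx(series, target) > threshold]
```

A perfectly monotone relationship scores about 1 and a constant series scores 0, so a threshold between those extremes separates informative hydrological elements from uninformative ones. The threshold value itself is not given in this section.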
5. The feature selection decomposition method according to claim 3, wherein step S3 specifically comprises: for the current water level information of the target site in the feature set, determining the lags with the partial autocorrelation function (PACF); for the other input features in the feature set, determining the lags by cross-correlation coefficient analysis; each lag that exhibits a clear statistical correlation with the prediction target, i.e., is significant at the 95% confidence level, is added to the input set, thereby constructing the original input set.
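The cross-correlation part of this lag selection can be sketched as follows. The sketch makes two assumptions not stated in the source: significance is tested against the common large-sample bound 1.96/sqrt(N), and the helper names are hypothetical. The PACF analysis used for the target site's own water level is not shown.

```python
import math

def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t - lag] and y[t]."""
    xs, ys = x[:len(x) - lag], y[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(vx * vy)

def significant_lags(x, y, max_lag):
    """Lags 1..max_lag whose correlation exceeds the approximate 95% bound."""
    bound = 1.96 / math.sqrt(len(y))  # assumed white-noise bound
    return [k for k in range(1, max_lag + 1)
            if abs(lagged_correlation(x, y, k)) > bound]

def build_lagged_inputs(x, lags):
    """One input column per selected lag, aligned to t = max(lags)..end."""
    start = max(lags)
    return [[x[t - k] for k in lags] for t in range(start, len(x))]
```

For an upstream series `x` and a target series `y`, `significant_lags(x, y, max_lag)` returns the lags to add, and `build_lagged_inputs` turns them into rows of the original input set.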
6. The feature selection decomposition method according to claim 1, wherein step S4 specifically comprises: standardizing the original input set by min-max normalization, scaling it to the interval [0,1], with the processing formula:
$$x_{i,\mathrm{norm}} = N_{\min} + \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}\,(N_{\max} - N_{\min}) \quad (3)$$
where $x_{i,\mathrm{norm}}$ is the normalized data value, $x_i$ is the i-th data item to be normalized in the original input set, $N_{\min}$ and $N_{\max}$ are the minimum and maximum of the scaling range, i.e., 0 and 1 respectively, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values in the original input set.
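As a small worked illustration of the min-max scaling in this claim, the sketch below maps a list of values onto [N_min, N_max]. The function name and the handling of a constant series are assumptions; the source does not say what to do when the maximum equals the minimum.

```python
def min_max_scale(values, n_min=0.0, n_max=1.0):
    """Scale values linearly so that min(values) -> n_min and max(values) -> n_max."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant series is mapped to the lower bound.
        return [n_min for _ in values]
    span = x_max - x_min
    return [n_min + (n_max - n_min) * (v - x_min) / span for v in values]

# Example: water levels of 2 m, 4 m and 6 m map to 0, 0.5 and 1.
scaled = min_max_scale([2.0, 4.0, 6.0])
```

In a forecasting setting the minimum and maximum would normally be taken from the training portion only and reused for later data. That is a general modeling remark rather than something stated in the claim.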
7. The feature selection decomposition method according to claim 1, wherein step S6 specifically comprises: performing feature decomposition on the feature-selected input set with a MODWT model, and constructing the optimized input set from the wavelet coefficient sets obtained by decomposing all features.
8. The method of feature selection decomposition of claim 7 wherein the formula of the feature decomposition is:
where f(t) is the wavelet coefficients obtained from the feature decomposition, comprising the wavelet smooth approximation of the original signal at the M-level decomposition and the detail wavelets $W_m(t)$ of the original signal at levels m = 1, 2, ..., M; the number of decomposition levels M is:
M=int[log(N)] (5)
where N is the length of the input set after feature selection and int[·] is the round-up (ceiling) function.
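The decomposition in claims 7 and 8 can be sketched with a pure-Python MODWT pyramid. Two assumptions are made: the Haar filter (the simplest member of the Daubechies family, db1) is used for brevity, whereas the embodiment tests db2 to db4; and the logarithm base in equation (5) is not stated in the source, so `base` is left as an explicit argument. All function names are hypothetical.

```python
import math

def decomposition_levels(n, base):
    """M = ceil(log_base(n)); the base is an assumption the caller must supply."""
    return math.ceil(math.log(n, base))

def modwt_haar(signal, levels):
    """Haar MODWT with circular boundary handling.

    Returns (details, smooth): details[m - 1] holds the level-m wavelet
    coefficients and smooth holds the level-`levels` scaling coefficients.
    Every output has the same length as the input, since MODWT does not
    downsample.
    """
    n = len(signal)
    smooth = list(signal)
    details = []
    for level in range(1, levels + 1):
        shift = 2 ** (level - 1)  # filter taps spread apart at each level
        prev = smooth
        details.append([(prev[t] - prev[(t - shift) % n]) / 2 for t in range(n)])
        smooth = [(prev[t] + prev[(t - shift) % n]) / 2 for t in range(n)]
    return details, smooth

def decompose_features(columns, levels):
    """Replace each feature column by its detail series plus the final smooth."""
    optimized = []
    for col in columns:
        details, smooth = modwt_haar(col, levels)
        optimized.extend(details)
        optimized.append(smooth)
    return optimized
```

Each selected feature therefore expands into `levels + 1` series of unchanged length, which together form the optimized input set. A quick sanity check is that the squared coefficients sum to the squared signal, a property of the MODWT.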
9. The feature selection decomposition method according to claim 7, wherein the MODWT model employs a Daubechies wavelet basis.
CN201711330726.1A 2017-12-13 2017-12-13 Feature selection decomposition method applied to river water level prediction data Expired - Fee Related CN107992447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711330726.1A CN107992447B (en) 2017-12-13 2017-12-13 Feature selection decomposition method applied to river water level prediction data

Publications (2)

Publication Number Publication Date
CN107992447A CN107992447A (en) 2018-05-04
CN107992447B true CN107992447B (en) 2019-12-17

Family

ID=62038276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711330726.1A Expired - Fee Related CN107992447B (en) 2017-12-13 2017-12-13 Feature selection decomposition method applied to river water level prediction data

Country Status (1)

Country Link
CN (1) CN107992447B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20191217

Termination date: 20211213