CN107992447A - A feature selection and decomposition method applied to river water level prediction data - Google Patents
- Publication number: CN107992447A
- Application number: CN201711330726.1A
- Authority: CN (China)
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/148—Wavelet transforms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/32—Circuit design at the digital level
- G06F30/333—Design for testability [DFT], e.g. scan chain or built-in self-test [BIST]
Abstract
The invention discloses a feature selection and decomposition method applied to river water level prediction data. To obtain the features best suited as model inputs, the invention introduces LASSO regression to perform feature selection on the original input set, integrates MODWT to decompose the selected features into components, and tests the performance of LASSO-MODWT with a model based on multiple linear regression. The tests show that the LASSO-MODWT-based feature selection and decomposition method helps improve the performance and interpretability of river water level prediction models.
Description
Technical field
The invention belongs to the technical field of water level prediction, and in particular relates to the design of a feature selection and decomposition method applied to river water level prediction data.
Background art
Water level prediction plays a particularly important role in flood control and disaster reduction and in the utilization, allocation, and management of water resources. A robust water level prediction model gives decision makers the future trend of the water level, so that potential hydrological disasters can be identified in time and early-warning measures deployed sooner. In water level prediction, because the factors that influence water level are multidimensional and complex, the potential input variables of a model often exhibit nonlinear dynamic relationships and multiple correlations with one another. The number of input variables is also generally large, and the feature dimension and computational complexity increase sharply once lags of each variable are introduced, even though these variables actually contain a large amount of duplicate information and noise. To reduce the computational complexity of the model and improve its flexibility and interpretability, effective features with minimal redundancy must be selected from the original high-dimensional data set, so as to build a scalable model that is more concise and better reflects the real pattern of water level change.
LASSO (least absolute shrinkage and selection operator) was first proposed by Robert Tibshirani in 1996. The method is a shrinkage estimator: by constructing a penalty function it obtains a more refined model, shrinking some coefficients and setting others exactly to zero. It therefore retains the advantages of subset selection and is a biased estimator suited to data with multicollinearity. The basic idea of LASSO is to minimize the residual sum of squares (RSS) subject to the constraint that the sum of the absolute values of the regression coefficients is less than or equal to a constant. This produces some regression coefficients that are exactly 0 and yields an interpretable model with a compressed feature set.
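For reference, the constrained problem just described can be written out as follows. The notation ($y_i$, $x_{ij}$, $\beta_j$, $t$, $n$, $p$) is assumed here for illustration and does not come from the patent text; $\lambda$ is the penalty parameter of the equivalent penalized form.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2
\quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t

% Equivalent penalized (Lagrangian) form:
\hat{\beta} = \arg\min_{\beta_0,\,\beta} \Bigl\{ \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \Bigr\}
```

A smaller bound $t$ (equivalently, a larger $\lambda$) forces more coefficients to exactly 0, which is what makes the estimator usable for feature selection.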
The discrete wavelet transform (DWT) is widely used in models that integrate wavelets, and it can extract detailed spectral information from data, such as periodicity, local variation, randomness, and abrupt changes. However, because DWT involves decimation (downsampling), it can introduce information loss at the model-building stage and hence bias in prediction. In addition, the DWT coefficients depend on the starting position of the transform, which introduces a degree of arbitrariness.
To address these drawbacks of DWT, researchers proposed the maximal overlap discrete wavelet transform (MODWT) as a feature decomposition method. MODWT is a linear filtering operation that avoids the decimation effect; it yields multi-level wavelet coefficients with the same length as the observations. Moreover, the result of the transform does not depend on its starting position, and it can be applied to data of any sample size. In summary, MODWT can extract the components of the input signal in different frequency bands, providing richer information and revealing the underlying patterns of change in the data.
Summary of the invention
The purpose of the invention is to reduce the computational complexity of existing water level prediction models while improving their flexibility and interpretability. To this end it proposes a feature selection and decomposition method applied to river water level prediction data.
The technical solution of the invention is a feature selection and decomposition method applied to river water level prediction data, comprising the following steps:
S1. Collect the hydrological factors that influence the water level at the target prediction station (the current water level at the target station, water levels in the upstream basin, rainfall along the river, and so on).
S2. Build a feature set from the hydrological factors based on information theory.
S3. Introduce lags for each feature in the feature set based on correlation analysis to build the original input set.
S4. Standardize the original input set.
S5. Perform feature selection on the standardized input set based on LASSO.
S6. Perform feature decomposition on the selected input set based on MODWT to obtain the LASSO-MODWT-optimized input set.
The beneficial effects of the invention are as follows: the invention uses LASSO regression to perform feature selection on the original input set and integrates MODWT to decompose the selected features into components, which markedly improves the prediction of river water level and helps improve the performance and interpretability of river water level prediction models.
Further, step S2 is specifically: compute the maximal information coefficient (MIC) between each hydrological factor and the prediction target, analyze the strength of its relationship with the prediction target, and take the hydrological factors whose MIC with the prediction target exceeds a set threshold as input features to build the feature set.
The maximal information coefficient MIC is calculated as:
$$\mathrm{MIC}[X;Y] = \max_{|X||Y| < B} \frac{I[X;Y]}{\log_2(\min(|X|, |Y|))} \tag{1}$$
where X and Y are two random variables; B is the partition limit, taken as the total number of data points raised to the power 0.6 or 0.55; MIC[X;Y] denotes the maximal information coefficient between X and Y; and I[X;Y] denotes the mutual information between X and Y, calculated as:
$$I[X;Y] = \sum_{X,Y} p(X,Y) \log_2 \frac{p(X,Y)}{p(X)\,p(Y)} \tag{2}$$
where p(X) and p(Y) are the probability density functions of X and Y, respectively, and p(X, Y) is the joint probability density function of X and Y.
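The following is a minimal sketch of the MIC idea, not the patent's implementation. It computes the mutual information of two binned variables, normalizes it by log2 of the smaller grid dimension, and takes the maximum over grids whose cell count stays below B = n^alpha. As a simplifying assumption it tries only equal-width grids, whereas the full MIC algorithm also searches over where the grid lines are placed, so the value returned here is a lower bound on the true MIC. The function names and the `alpha` parameter are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I[X;Y] in bits for two sequences of discrete labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width bins (labels 0..k-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in values]


def approx_mic(xs, ys, alpha=0.6):
    """Lower bound on MIC using equal-width grids only (a simplification)."""
    b = len(xs) ** alpha  # partition limit B = n^0.6 (or n^0.55)
    best = 0.0
    for kx in range(2, int(b) + 1):
        for ky in range(2, int(b) + 1):
            if kx * ky >= b:  # enforce |X||Y| < B
                break
            mi = mutual_information(equal_width_bins(xs, kx),
                                    equal_width_bins(ys, ky))
            best = max(best, mi / math.log2(min(kx, ky)))
    return best
```

A hydrological factor would then be kept as an input feature when its score against the target series exceeds the chosen threshold; the patent does not state the threshold value.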
The beneficial effect of this further scheme is: the maximal information coefficient MIC is used to analyze the strength of the relationship between each hydrological factor and the prediction target, and the factors with a strong relationship to the prediction target are taken as input features to build the feature set.
Further, step S3 is specifically: for the current water level of the target station in the feature set, the lags are determined using the partial autocorrelation function (PACF); for the other input features in the feature set, the lags are determined by cross-correlation analysis. For each lag, if it shows a clear statistical correlation with the prediction target, that is, it is significant at the 95% confidence level, the lag is added to the input set, thereby building the original input set.
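The lag screening for the non-target features can be sketched as below. This is an illustration under stated assumptions, not the patent's code: it uses the sample cross-correlation and the common large-sample 95% bound of ±1.96/√N as the significance test, which the text does not spell out. The PACF step for the target station's own water level is not shown. The function names are hypothetical.

```python
import math


def cross_correlation(x, y, lag):
    """Sample correlation between x[t - lag] and y[t], for lag >= 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    scale_x = math.sqrt(sum((v - mean_x) ** 2 for v in x))
    scale_y = math.sqrt(sum((v - mean_y) ** 2 for v in y))
    cov = sum((x[t - lag] - mean_x) * (y[t] - mean_y) for t in range(lag, n))
    return cov / (scale_x * scale_y)


def significant_lags(x, y, max_lag):
    """Lags 1..max_lag whose cross-correlation exceeds the approximate 95% bound."""
    bound = 1.96 / math.sqrt(len(x))  # assumed large-sample significance bound
    return [k for k in range(1, max_lag + 1)
            if abs(cross_correlation(x, y, k)) > bound]
```

For example, `significant_lags(rainfall, target_level, 24)` would list the hourly lags of a rainfall series worth adding to the input set, where both argument names are placeholders for equal-length series.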
The beneficial effect of this further scheme is: because the prediction target, river water level, is a time series, the influence of lagged values must be considered when building the original input set.
Further, step S4 is specifically: standardize the original input set using min-max standardization, scaling it into the interval [0, 1], with the formula:
$$x_{i,\mathrm{norm}} = N_{\min} + \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \times (N_{\max} - N_{\min}) \tag{3}$$
where $x_{i,\mathrm{norm}}$ is the standardized value, $x_i$ is the i-th data item in the original input set to be standardized, $N_{\min}$ and $N_{\max}$ are the minimum and maximum of the scaled range (0 and 1, respectively), and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values in the original input set.
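A minimal sketch of this min-max scaling for a single feature column follows. The function name and the handling of a constant column are assumptions added for illustration; the text does not say what to do when the maximum equals the minimum.

```python
def min_max_scale(values, n_min=0.0, n_max=1.0):
    """Scale a list of numbers into [n_min, n_max] using min-max standardization."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Constant column: the formula is undefined, so map everything to
        # n_min (an assumption, not something the patent specifies).
        return [n_min for _ in values]
    span = x_max - x_min
    return [n_min + (v - x_min) / span * (n_max - n_min) for v in values]
```

Each feature column is scaled independently, so water levels and rainfall amounts end up on the same dimensionless [0, 1] scale.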
The beneficial effect of this further scheme is: because different input data have different units and scales, the original input set must be standardized so that it can be evaluated on a common scale. This makes the data dimensionless and scales the original input set into the interval [0, 1].
Further, step S5 specifically includes the following sub-steps:
S51. Take the standardized input set as the model input and the water level data set of the target prediction station as the model output, and build a LASSO regression model.
S52. Train the LASSO regression model, and tune the LASSO regression parameter λ by grid search to find the optimal parameter.
S53. Score the features in the input set using the LASSO regression model with the optimal parameter, where the score is the LASSO regression coefficient. Features with a positive LASSO regression coefficient are retained in the input set, and features with a zero or negative LASSO regression coefficient are removed from the input set, which completes the feature selection.
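Sub-steps S51 to S53 can be sketched with a small coordinate-descent LASSO solver. This is an illustration only: the solver, the 1/(2n) scaling of the residual sum of squares, and all function names are assumptions, and the grid search over λ in S52 is omitted (λ is passed in as `lam`). In practice a library implementation would normally be used instead.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * RSS + lam * sum(|beta_j|) by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre the data so the unpenalized intercept can be recovered at the end.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # residual = y_centred - Xc @ beta
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # constant feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = new_beta
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def select_positive(beta):
    """Indices of features kept under S53: strictly positive coefficients only."""
    return [j for j, b in enumerate(beta) if b > 0]
```

Note that S53 as written also discards features with negative coefficients, which is stricter than the usual practice of keeping every non-zero coefficient; the sketch follows the text.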
The beneficial effect of this further scheme is: after LASSO-based feature selection on the standardized input set, prediction accuracy can be improved while the number of model inputs is greatly reduced.
Further, step S6 is specifically: decompose the features in the selected input set using the MODWT model, and use the sets of wavelet coefficients obtained from all feature decompositions to build the optimized input set.
The feature decomposition formula, equation (4), is defined in terms of the following quantities: f(t) is the wavelet coefficient obtained by feature decomposition; the smooth term is the smoothed approximation of the original signal at decomposition level M; $W_m(t)$ is the wavelet component of the original signal at decomposition level m, with m = 1, 2, ..., M; and M is the minimum number of decomposition levels, calculated as:
$$M = \mathrm{int}[\log(N)] \tag{5}$$
where N is the length of the input set after feature selection and int[·] is the round-up (ceiling) function.
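As a rough illustration of the decomposition, the sketch below implements a MODWT with the Haar filter and periodic boundary handling. The Haar filter is swapped in here only because it keeps the code short; the patent uses Daubechies bases. For the Haar filter the smooth and detail coefficients at each time step sum back to the original signal, which is checked in the usage note. The base-10 logarithm in `min_levels` is an assumption inferred from the worked value in the embodiment (N = 8678 gives M = 4). All function names are hypothetical.

```python
import math


def min_levels(n):
    """Minimum number of decomposition levels: log10(N), rounded up (assumed base 10)."""
    return math.ceil(math.log10(n))


def haar_modwt(signal, levels):
    """Haar MODWT with circular boundaries.

    Returns (details, smooth): details[m - 1] holds the level-m wavelet
    coefficients and smooth holds the level-`levels` scaling coefficients.
    Every output has the same length as the input (no decimation).
    """
    n = len(signal)
    smooth = list(signal)
    details = []
    for level in range(1, levels + 1):
        shift = 2 ** (level - 1)  # filter taps are spread apart at each level
        prev = smooth
        smooth = [0.5 * (prev[t] + prev[(t - shift) % n]) for t in range(n)]
        details.append([0.5 * (prev[t] - prev[(t - shift) % n]) for t in range(n)])
    return details, smooth
```

With `details, smooth = haar_modwt(x, 3)`, the value `smooth[t] + sum(d[t] for d in details)` reproduces `x[t]` at every time step, and each coefficient series can be appended to the input set as a new feature column.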
The beneficial effect of this further scheme is: decomposing the features of the selected input set with the MODWT model significantly improves the accuracy of river water level prediction.
Further, the MODWT model uses a Daubechies wavelet basis.
The beneficial effect of this further scheme is: a suitable wavelet basis function must be chosen when building a MODWT-based model. There is currently no clear, general criterion for choosing the basis function, nor does the literature indicate which basis function gives the best modeling results; in theory, different application scenarios suit different basis functions. Considering that hydrological forecasting is suited to irregular wavelet bases, the invention uses the Daubechies wavelet basis, which is widely used in hydrological forecasting.
Brief description of the drawings
Fig. 1 is a flow chart of the feature selection and decomposition method applied to river water level prediction data provided by an embodiment of the invention.
Fig. 2 shows the result of applying MODWT to WL_CS using the db3 Daubechies wavelet basis, provided by an embodiment of the invention.
Fig. 3 compares predicted and actual values for the different input sets in the three-hour prediction, provided by an embodiment of the invention.
Fig. 4 is a scatter plot of predicted versus actual values for the different input sets in the three-hour prediction, provided by an embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the invention are now described in detail with reference to the drawings. It should be understood that the embodiments shown in the drawings and described here are merely exemplary; they are intended to explain the principle and spirit of the invention, not to limit its scope.
An embodiment of the invention provides a feature selection and decomposition method applied to river water level prediction data which, as shown in Fig. 1, comprises the following steps S1-S6:
S1. Collect the hydrological factors that influence the water level at the target prediction station (including the current water level at the target station, water levels in the upstream basin, and rainfall along the river).
In this embodiment, the water level trend in the middle and lower reaches of the Chishui River is taken as an example, with the aim of predicting the water level at Chishui station 3 hours and 6 hours ahead. The data were collected by automatic monitoring stations along the middle and lower reaches of the Chishui River from May to October in 2015 and 2016; the stations involved are listed in Table 1. Because the data are collected and stored hourly, there are 8834 data points in total. Gaps inevitably occur during data acquisition and storage; analysis found that the missing data are the WL_MT records from 2015-10-09 07 to 2015-10-14 02, 126 items in total, which were filled by interpolation using pandas.
Table 1
Code | Parameter | Monitoring station | Data type | Sampling interval |
WL_CS | Chishui station water level | Chishui station | Water level | Hourly |
WL_EL | Erlang station water level | Erlang station | Water level | Hourly |
WL_MT | Maotai station water level | Maotai station | Water level | Hourly |
RF_CS | Chishui station rainfall | Chishui station | Rainfall | Hourly |
RF_XS | Xishui station rainfall | Xishui station | Rainfall | Hourly |
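The gap filling mentioned before Table 1 was done with pandas, but the interpolation method is not stated. The sketch below assumes plain linear interpolation between the nearest known neighbours and is written without pandas so that it is self-contained; the function name is hypothetical and missing values are represented as `None`.

```python
def interpolate_gaps(values):
    """Linearly fill runs of None that lie between two known values."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            frac = (i - left) / gap
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled  # leading or trailing None values are left untouched
```

Applied to an hourly series with a multi-day hole, each missing hour is placed on the straight line joining the last reading before the gap and the first reading after it.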
S2. Build a feature set from the hydrological factors based on information theory.
Compute the maximal information coefficient (MIC) between each hydrological factor and the prediction target, analyze the strength of its relationship with the prediction target, and take the hydrological factors whose MIC with the prediction target exceeds a set threshold (that is, the factors with a strong relationship to the prediction target) as input features to build the feature set.
The maximal information coefficient MIC is calculated as:
$$\mathrm{MIC}[X;Y] = \max_{|X||Y| < B} \frac{I[X;Y]}{\log_2(\min(|X|, |Y|))} \tag{1}$$
where X and Y are two random variables; B is the partition limit, which bounds how finely X and Y may each be partitioned and is generally taken as the total number of data points raised to the power 0.6 or 0.55; MIC[X;Y] denotes the maximal information coefficient between X and Y; and I[X;Y] denotes the mutual information between X and Y, calculated as:
$$I[X;Y] = \sum_{X,Y} p(X,Y) \log_2 \frac{p(X,Y)}{p(X)\,p(Y)} \tag{2}$$
where p(X) and p(Y) are the probability density functions of X and Y, respectively, and p(X, Y) is the joint probability density function of X and Y.
In this embodiment the feature set has 5 features in total: (1) water level data from the three hydrological monitoring stations, Chishui, Maotai, and Erlang (codes WL_CS, WL_MT, WL_EL); (2) rainfall data from the two weather monitoring stations, Chishui and Xishui (codes RF_CS, RF_XS).
S3. Introduce lags for each feature in the feature set based on correlation analysis to build the original input set.
Because the prediction target, river water level, is a time series, the influence of lagged values must be considered when building the original input set. In this embodiment, for the current water level of the target station in the feature set, the lags are determined using the partial autocorrelation function (PACF); for the other input features in the feature set, the lags are determined by cross-correlation analysis. For each lag, if it shows a clear statistical correlation with the prediction target (significant at the 95% confidence level), the lag is added to the input set, thereby building the original input set. PACF and cross-correlation analysis are correlation analysis methods commonly used in the art and are not described further here.
In this embodiment, after lags are introduced for each feature by correlation analysis, the original input set has 221 features for the 3 h prediction and 229 features for the 6 h prediction.
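Once the significant lags are known, assembling the original input set amounts to shifting each series by its lags and aligning it with the target value at the prediction horizon. The sketch below shows one way to do this; the function name, the dictionary layout, and the column naming scheme are all assumptions made for illustration.

```python
def build_lagged_inputs(series, lags, target, horizon):
    """Build a lagged design matrix.

    series:  dict mapping feature name -> list of hourly values (equal lengths)
    lags:    dict mapping feature name -> list of lags (in time steps, >= 0)
    target:  name of the series to predict
    horizon: number of steps ahead to predict (e.g. 3 or 6 for hourly data)
    Returns (column_names, rows, targets).
    """
    n = len(series[target])
    max_lag = max(lag for lag_list in lags.values() for lag in lag_list)
    names = [f"{name}_lag{lag}" for name, lag_list in lags.items() for lag in lag_list]
    rows, targets = [], []
    # Only time steps with a full lag history and a known future target are usable.
    for t in range(max_lag, n - horizon):
        rows.append([series[name][t - lag]
                     for name, lag_list in lags.items() for lag in lag_list])
        targets.append(series[target][t + horizon])
    return names, rows, targets
```

The number of columns equals the total number of selected lags across all features, which is how a 5-feature set can grow to a few hundred inputs.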
S4. Standardize the original input set.
Because different input data have different units and scales, the original input set must be standardized so that it can be evaluated on a common scale and made dimensionless. In this embodiment, min-max standardization (Min-Max Scaler) is used to scale the original input set into the interval [0, 1], with the formula:
$$x_{i,\mathrm{norm}} = N_{\min} + \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \times (N_{\max} - N_{\min}) \tag{3}$$
where $x_{i,\mathrm{norm}}$ is the standardized value, $x_i$ is the i-th data item in the original input set to be standardized, $N_{\min}$ and $N_{\max}$ are the minimum and maximum of the scaled range (0 and 1, respectively), and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values in the original input set.
S5. Perform feature selection on the standardized input set based on LASSO.
To simplify the input set and select the features best suited as inputs, this embodiment applies feature selection based on LASSO regression to the input set. Because LASSO introduces an L1 regularization term as a penalty, it can shrink the regression coefficients of redundant features to 0, so feature selection based on LASSO regression is a sparse feature selection method.
Step S5 specifically includes the following sub-steps S51-S53:
S51. Take the standardized input set as the model input and the water level data set of the target prediction station as the model output, and build a LASSO regression model.
S52. Train the LASSO regression model, and tune the LASSO regression parameter λ by grid search to find the optimal parameter.
S53. Score the features in the input set using the LASSO regression model with the optimal parameter, where the score is the LASSO regression coefficient. Features with a positive LASSO regression coefficient are retained in the input set, and features with a zero or negative LASSO regression coefficient are removed from the input set, which completes the feature selection.
In this embodiment, after LASSO-based feature selection the 3 h prediction has 49 features and the 6 h prediction has 88 features. The number of input features is greatly reduced in both prediction scenarios, which reduces the complexity of building the model.
S6. Perform feature decomposition on the selected input set based on MODWT to obtain the LASSO-MODWT-optimized input set.
Decompose the features in the selected input set using the MODWT model, and use the sets of wavelet coefficients obtained from all feature decompositions to build the optimized input set.
The feature decomposition formula, equation (4), is defined in terms of the following quantities: f(t) is the wavelet coefficient obtained by feature decomposition; the smooth term is the smoothed approximation of the original signal at decomposition level M; $W_m(t)$ is the wavelet component of the original signal at decomposition level m, with m = 1, 2, ..., M; and M is the minimum number of decomposition levels, calculated as:
$$M = \mathrm{int}[\log(N)] \tag{5}$$
where N is the length of the input set after feature selection and int[·] is the round-up (ceiling) function.
The effective input set in this embodiment has 8678 samples, so the minimum number of MODWT decomposition levels is M = log(8678) ≈ 3.94 (base-10 logarithm), which rounds up to M = 4. Both M = 4 and M = 5 are tested in this embodiment.
Although MODWT has been shown to have many advantages as a multiresolution feature identification tool, one challenge in building a MODWT-based model is choosing a suitable wavelet basis function. There is currently no clear, general criterion for choosing the basis function, nor does the literature indicate which basis function gives the best modeling results; in theory, different application scenarios suit different basis functions. Considering that hydrological forecasting is suited to irregular wavelet bases, this embodiment uses the Daubechies wavelet basis, which is widely used in hydrological forecasting. The db2, db3, and db4 wavelet bases are compared in this embodiment to find the one best suited to Chishui River water level prediction.
Fig. 2 shows the result of applying MODWT to WL_CS using the db3 Daubechies wavelet basis. From top to bottom, the 6 subplots are the original signal waveform, the smooth approximation waveform (A4), and the four levels of MODWT decomposition coefficients (d1, d2, d3, d4). To reduce computational complexity, this embodiment decomposes only WL_CS, the feature scored most important by LASSO, and adds the wavelet coefficients obtained from the decomposition to the input set as new features (the 4-level and 5-level decompositions yield 5 and 6 dimensions, respectively). The 3-hour prediction then has 53 features and the 6-hour prediction has 92 features.
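Because there is no single general index for assessing the performance of hydrological prediction models, this embodiment evaluates prediction performance using three statistics together: the Nash-Sutcliffe efficiency coefficient $E_{NS}$, the root-mean-square error RMSE, and the mean absolute error MAE.
(1) Nash-Sutcliffe efficiency coefficient $E_{NS}$:
$$E_{NS} = 1 - \frac{\sum_{i=1}^{N} (SWL_{OBS,i} - SWL_{FOR,i})^2}{\sum_{i=1}^{N} (SWL_{OBS,i} - \overline{SWL}_{OBS})^2}$$
(2) Root-mean-square error RMSE:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (SWL_{OBS,i} - SWL_{FOR,i})^2}$$
(3) Mean absolute error MAE:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| SWL_{OBS,i} - SWL_{FOR,i} \right|$$
where $SWL_{OBS}$ is the measured water level, $SWL_{FOR}$ is the water level obtained by model prediction, N is the number of data points, and $\overline{SWL}_{OBS}$ is the overall mean of the measured water level.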
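The three statistics above can be computed as in the following sketch, which uses their standard definitions. The function names are chosen for illustration; `obs` holds the measured water levels and `sim` the model predictions.

```python
import math


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches predicting the mean."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance_sum = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / variance_sum


def rmse(obs, sim):
    """Root-mean-square error, in the units of the water level."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error, in the units of the water level."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)
```

A higher efficiency coefficient and lower RMSE and MAE indicate better predictions, so the three are read together rather than ranked against each other.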
In this embodiment, the original input set obtained from correlation analysis, the input set after LASSO-based feature selection, and the input set optimized by LASSO-MODWT are each used as the input of a multiple linear regression model to predict the water level at Chishui station 3 hours and 6 hours ahead, in order to assess the performance of the LASSO-MODWT feature selection and decomposition method. Table 2 compares the performance of the different input sets for predicting the water level at Chishui station 3 hours and 6 hours ahead. Table 2 shows that, for both the 3-hour and the 6-hour prediction, LASSO-based feature selection improves prediction accuracy while greatly reducing the number of model inputs. Integrating MODWT improves prediction accuracy significantly further and performs well for both the 3-hour and the 6-hour prediction.
Table 2
Fig. 3 compares the 3-hour-ahead water level predictions from the different input sets with the actual values at Chishui station in August 2016, and Fig. 4 is a scatter plot of predicted versus actual values for the three input sets. After LASSO-MODWT feature selection and decomposition, the predictions of LASSO-W-MLR are closer to the actual values than those obtained with the original input set, and the model performance is more stable. This shows that the LASSO-MODWT feature selection and decomposition method can markedly improve the accuracy and stability of the Chishui River water level prediction model.
To further study the influence of different wavelet basis types on the performance of Chishui River water level prediction, this embodiment simulates the three wavelets db2, db3, and db4 at two decomposition levels, level 4 and level 5. Table 3 gives the performance of the 3 h and 6 h predictions for the different wavelet bases and decomposition levels. Table 3 shows that the db2 wavelet basis with a 5-level wavelet decomposition gives the best prediction performance in the Chishui River water level prediction model. This result further shows that different application scenarios suit different wavelet bases; in actual modeling, trials should be run against the specific requirements to find the most suitable wavelet basis and number of decomposition levels and thereby improve model accuracy.
Table 3
Those of ordinary skill in the art will understand that the embodiments described here are intended to help the reader understand the principle of the invention, and that the scope of protection of the invention is not limited to these particular statements and embodiments. Based on the technical teachings disclosed by the invention, those of ordinary skill in the art can make various other specific variations and combinations that do not depart from the essence of the invention, and these variations and combinations remain within the scope of protection of the invention.
Claims (10)
1. A feature selection and decomposition method applied to river water level prediction data, characterized by comprising the following steps:
S1. collecting the hydrological factors that influence the water level at the target prediction station;
S2. building a feature set from the hydrological factors based on information theory;
S3. introducing lags for each feature in the feature set based on correlation analysis to build the original input set;
S4. standardizing the original input set;
S5. performing feature selection on the standardized input set based on LASSO;
S6. performing feature decomposition on the selected input set based on MODWT to obtain the LASSO-MODWT-optimized input set.
2. The feature selection and decomposition method according to claim 1, characterized in that the hydrological factors influencing the water level at the target prediction station in step S1 include the current water level at the target station, water levels in the upstream basin, and rainfall along the river.
3. The feature selection and decomposition method according to claim 1, characterized in that step S2 is specifically: computing the maximal information coefficient (MIC) between each hydrological factor and the prediction target, analyzing the strength of its relationship with the prediction target, and taking the hydrological factors whose MIC with the prediction target exceeds a set threshold as input features to build the feature set.
4. feature selecting decomposition method according to claim 3, it is characterised in that the meter of the maximum information coefficient MIC
Calculating formula is:
<mrow>
<mi>M</mi>
<mi>I</mi>
<mi>C</mi>
<mo>&lsqb;</mo>
<mi>X</mi>
<mo>;</mo>
<mi>Y</mi>
<mo>&rsqb;</mo>
<mo>=</mo>
<munder>
<mrow>
<mi>m</mi>
<mi>a</mi>
<mi>x</mi>
</mrow>
<mrow>
<mo>|</mo>
<mi>X</mi>
<mo>|</mo>
<mo>|</mo>
<mi>Y</mi>
<mo>|</mo>
<mo><</mo>
<mi>B</mi>
</mrow>
</munder>
<mfrac>
<mrow>
<mi>I</mi>
<mo>&lsqb;</mo>
<mi>X</mi>
<mo>;</mo>
<mi>Y</mi>
<mo>&rsqb;</mo>
</mrow>
<mrow>
<msub>
<mi>log</mi>
<mn>2</mn>
</msub>
<mrow>
<mo>(</mo>
<mi>m</mi>
<mi>i</mi>
<mi>n</mi>
<mo>(</mo>
<mrow>
<mo>|</mo>
<mi>X</mi>
<mo>|</mo>
<mo>,</mo>
<mo>|</mo>
<mi>Y</mi>
<mo>|</mo>
</mrow>
<mo>)</mo>
<mo>)</mo>
</mrow>
</mrow>
</mfrac>
<mo>-</mo>
<mo>-</mo>
<mo>-</mo>
<mrow>
<mo>(</mo>
<mn>1</mn>
<mo>)</mo>
</mrow>
</mrow>
Wherein X, Y are two stochastic variables, and B limits for segmentation, takes 0.6 or 0.55 power of total amount of data, MIC [X;Y] represent X
Maximum information coefficient between Y, I [X;Y] represent mutual information between X and Y, calculation formula is:
$$I[X;Y] = \sum_{X,Y} p(X,Y) \log_2 \frac{p(X,Y)}{p(X)\,p(Y)} \qquad (2)$$
where p(X) and p(Y) denote the probability density functions of X and Y respectively, and p(X, Y) denotes the joint probability density function of X and Y.
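As an illustration of equations (1) and (2), the following is a minimal sketch, not the patented implementation. It evaluates the mutual information on equal-width grids only and takes the best normalized score over all grid sizes with |X||Y| < B. The full MIC definition also optimizes where the grid lines are placed, so this simplified search can only underestimate it. The function names, the equal-width binning, and the default exponent of 0.6 are assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(xs, ys, nx, ny):
    """Equation (2) estimated on an nx-by-ny equal-width grid, in bits."""
    n = len(xs)

    def bin_index(v, lo, hi, k):
        # Map v in [lo, hi] to a bin index 0..k-1; a constant series goes to bin 0.
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * k), k - 1)

    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    cells = [(bin_index(x, x_lo, x_hi, nx), bin_index(y, y_lo, y_hi, ny))
             for x, y in zip(xs, ys)]
    joint = Counter(cells)               # counts per grid cell
    px = Counter(i for i, _ in cells)    # marginal counts along X
    py = Counter(j for _, j in cells)    # marginal counts along Y
    return sum((c / n) * math.log2((c / n) / ((px[i] / n) * (py[j] / n)))
               for (i, j), c in joint.items())


def mic_equal_width(xs, ys, exponent=0.6):
    """Equation (1), restricted to equal-width grids with nx * ny < B."""
    b = len(xs) ** exponent              # B = (number of samples) ** 0.6
    best = 0.0
    for nx in range(2, int(b) + 1):
        for ny in range(2, int(b) + 1):
            if nx * ny < b:
                score = mutual_information(xs, ys, nx, ny) / math.log2(min(nx, ny))
                best = max(best, score)
    return best
```

A feature would then be kept when `mic_equal_width(feature, target)` exceeds the chosen threshold; the threshold value itself is not specified in the claims.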
5. The feature selection decomposition method according to claim 3, characterized in that step S3 is specifically: for the current water level information of the target site in the feature set, determine the lags using the partial autocorrelation function (PACF); for the other input features in the feature set, determine the lags by cross-correlation coefficient analysis; for each lag, if it shows a clear statistical correlation with the prediction target, i.e. it reaches the 95% confidence interval, add that lag to the input set, thereby constructing the original input set.
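The lag screening in step S3 can be sketched as follows. This is a minimal illustration under stated assumptions: the 95% band is approximated by the usual large-sample bound ±1.96/√n, the PACF is computed with the Durbin-Levinson recursion, and the function names and synthetic series (`upstream`, `level`) are invented for the example rather than taken from the patent.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def significant_cross_lags(feature, target, max_lag):
    """Lags k >= 1 where corr(feature[t-k], target[t]) leaves the 95% band."""
    bound = 1.96 / math.sqrt(len(target))
    return [k for k in range(1, max_lag + 1)
            if abs(pearson(feature[:-k], target[k:])) > bound]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 1..max_lag (Durbin-Levinson)."""
    n = len(series)
    mean = sum(series) / n
    d = [v - mean for v in series]
    c0 = sum(v * v for v in d)
    r = [sum(d[t] * d[t - k] for t in range(k, n)) / c0
         for k in range(max_lag + 1)]
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the AR(k) coefficients from the AR(k-1) coefficients.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def significant_pacf_lags(series, max_lag):
    """Lags whose partial autocorrelation leaves the 95% band."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k, v in enumerate(pacf(series, max_lag), start=1)
            if abs(v) > bound]


# Synthetic demo: the target level follows an upstream series with a 3-step delay.
rng = random.Random(0)
n = 300
upstream = [rng.gauss(0, 1) for _ in range(n)]
level = [0.0] * n
for t in range(3, n):
    level[t] = 0.6 * level[t - 1] + upstream[t - 3] + 0.1 * rng.gauss(0, 1)

cross_lags = significant_cross_lags(upstream, level, max_lag=6)
own_lags = significant_pacf_lags(level, max_lag=6)
```

In this demo, `cross_lags` should contain the built-in delay of 3 and `own_lags` should contain lag 1; other lags may pass the bound by chance, which is expected at a 95% level.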
6. The feature selection decomposition method according to claim 1, characterized in that step S4 is specifically: standardize the original input set using min-max normalization, scaling the original input set into the interval [0, 1], with the processing formula:
$$x_{i,norm} = N_{min} + \frac{x_i - x_{min}}{x_{max} - x_{min}} \times (N_{max} - N_{min}) \qquad (3)$$
where $x_{i,norm}$ is the data value after standardization, $x_i$ is the i-th data item in the original input set to be standardized, $N_{min}$ and $N_{max}$ are the minimum and maximum of the scaled range, here 0 and 1, and $x_{min}$ and $x_{max}$ are the minimum and maximum values in the original input set.
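Equation (3) translates almost directly into code. The sketch below is illustrative only; the function name and the handling of a constant column (where equation (3) would divide by zero) are assumptions, since the claim does not address that case.

```python
def min_max_scale(values, n_min=0.0, n_max=1.0):
    """Scale values into [n_min, n_max] following equation (3)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumed convention: a constant column maps to the lower bound.
        return [n_min for _ in values]
    span = x_max - x_min
    return [n_min + (x - x_min) / span * (n_max - n_min) for x in values]


scaled = min_max_scale([2.0, 4.0, 6.0])  # -> [0.0, 0.5, 1.0]
```

When a model is later applied to new data, the same `x_min` and `x_max` from the training set would normally be reused so that both sets share one scale.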
7. The feature selection decomposition method according to claim 1, characterized in that step S5 specifically comprises the following sub-steps:
S51: take the standardized input set as the model input and the water level data set of the prediction target site as the model output, and build a LASSO regression model;
S52: train the LASSO regression model, and optimize the LASSO regression parameter λ by grid search to find the optimal parameter;
S53: score the features in the input set using the LASSO regression model with the optimal parameter, the scoring criterion being the LASSO regression coefficient; features whose LASSO regression coefficient is positive are retained in the input set, and features whose LASSO regression coefficient is 0 or negative are removed from the input set, thereby realizing feature selection on the input set.
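Sub-steps S51 to S53 can be sketched as below. This is a minimal standard-library illustration, not the patented implementation: it fits LASSO by cyclic coordinate descent on the objective (1/2n)·‖y − b₀ − Xb‖² + λ‖b‖₁, chooses λ from a small grid using a hold-out split of the last 25% of samples, and keeps features with positive coefficients. The objective scaling, the hold-out scheme, the grid values, and all function names are assumptions made for the example.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the LASSO proximal step)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def fit_lasso(X, y, lam, n_iter=200):
    """Cyclic coordinate descent; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]          # residual while all coefficients are 0
    beta = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                     # constant column carries no signal
            # Covariance of column j with the residual that excludes beta[j].
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, col_mean))
    return intercept, beta


def mse(X, y, intercept, beta):
    """Mean squared prediction error of a fitted model."""
    return sum((intercept + sum(b * v for b, v in zip(beta, row)) - t) ** 2
               for row, t in zip(X, y)) / len(y)


def grid_search_lasso(X, y, lambdas, val_fraction=0.25):
    """S52: pick the lambda with the lowest hold-out error, then refit on all data."""
    split = int(len(X) * (1 - val_fraction))
    best_lam = min(lambdas, key=lambda lam: mse(
        X[split:], y[split:], *fit_lasso(X[:split], y[:split], lam)))
    return best_lam, fit_lasso(X, y, best_lam)


def select_positive(beta):
    """S53: indices of features whose coefficient is positive."""
    return [j for j, b in enumerate(beta) if b > 0]


# Synthetic demo: feature 0 raises the target, feature 1 is noise, feature 2 lowers it.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(120)]
y = [2.0 * r[0] - 1.5 * r[2] + 0.05 * rng.gauss(0, 1) for r in X]
best_lam, (intercept, beta) = grid_search_lasso(X, y, [0.001, 0.01, 0.1])
selected = select_positive(beta)
```

In the demo, feature 0 should be retained and feature 2 removed because its coefficient is negative; whether the pure-noise feature 1 survives depends on the selected λ.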
8. The feature selection decomposition method according to claim 1, characterized in that step S6 is specifically: perform feature decomposition on the feature-selected input set using a MODWT model, and use the set of wavelet coefficients obtained by decomposing all features to construct the optimized input set.
9. The feature selection decomposition method according to claim 8, characterized in that the formula of the feature decomposition is:
$$f(t) = \overline{W}(t) + \sum_{m=1}^{M} W_m(t) \qquad (4)$$
where f(t) is the wavelet coefficient obtained by the feature decomposition, $\overline{W}(t)$ is the smooth approximation wavelet of the original signal under an M-level decomposition, $W_m(t)$ is the wavelet of the original signal at the m-th decomposition level, m = 1, 2, ..., M, and M is the minimum number of decomposition levels, calculated as:
$$M = \operatorname{int}[\log(N)] \qquad (5)$$
where N is the length of the input set after feature selection, and int[·] is the function that rounds up to an integer.
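Equations (4) and (5) can be illustrated with the Haar filter, the simplest member of the Daubechies family. This is a minimal sketch under stated assumptions: the logarithm in equation (5) is taken as the natural logarithm (the base is not stated in the claim), the boundary is treated as periodic, and the function names are invented for the example. For the Haar filter the MODWT coefficients themselves add back to the signal, which matches the additive form of equation (4); longer Daubechies filters need their own filter coefficients and are not shown.

```python
import math


def decomposition_levels(n, base=math.e):
    """Equation (5): M = int[log(N)], with int[] read as rounding up."""
    return math.ceil(math.log(n, base))


def haar_modwt(signal, levels):
    """Haar MODWT by the pyramid algorithm with circular boundary.

    Returns (smooth, details) such that for every t
    signal[t] == smooth[t] + sum(details[m][t] for all m).
    """
    n = len(signal)
    smooth = list(signal)
    details = []
    for m in range(1, levels + 1):
        shift = 2 ** (m - 1)                 # filter taps spread apart at each level
        prev = smooth
        # Index t - shift wraps around the end of the series (periodic boundary).
        smooth = [(prev[t] + prev[(t - shift) % n]) / 2 for t in range(n)]
        details.append([(prev[t] - prev[(t - shift) % n]) / 2 for t in range(n)])
    return smooth, details


# Demo on a short synthetic series with a trend and a 7-step cycle.
series = [float(t % 7) + 0.1 * t for t in range(32)]
M = decomposition_levels(len(series))
smooth, details = haar_modwt(series, M)
rebuilt = [smooth[t] + sum(d[t] for d in details) for t in range(len(series))]
```

Each of the `M` detail series and the smooth series has the same length as the input, so every selected feature expands into `M + 1` columns of the optimized input set.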
10. The feature selection decomposition method according to claim 8, characterized in that the MODWT model uses a Daubechies wavelet basis.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711330726.1A CN107992447B (en) | 2017-12-13 | 2017-12-13 | Feature selection decomposition method applied to river water level prediction data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107992447A (en) | 2018-05-04 |
CN107992447B (en) | 2019-12-17 |
Family
ID=62038276
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711330726.1A Expired - Fee Related CN107992447B (en) | 2017-12-13 | 2017-12-13 | Feature selection decomposition method applied to river water level prediction data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107992447B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101877029A (en) * | 2009-11-25 | 2010-11-03 | 国网电力科学研究院 | Hydrologic forecasting method of hydrologic model combination of different mechanisms |
CN102789445A (en) * | 2012-07-13 | 2012-11-21 | 南京大学 | Establishment method for wavelet analysis and rank set pair analysis of medium and long-term hydrological forecast model |
CN104050242A (en) * | 2014-05-27 | 2014-09-17 | 哈尔滨理工大学 | Feature selection and classification method based on maximum information coefficient and feature selection and classification device based on maximum information coefficient |
CN105512767A (en) * | 2015-12-15 | 2016-04-20 | 武汉大学 | Flood forecasting method of multiple forecast periods |
CN105577679A (en) * | 2016-01-14 | 2016-05-11 | 华东师范大学 | Method for detecting anomaly traffic based on feature selection and density peak clustering |
JP2017015453A (en) * | 2015-06-29 | 2017-01-19 | 株式会社東芝 | Control processing apparatus for dam management and control processing method for dam management |
Non-Patent Citations (2)
Title |
---|
小木匠_: "Data normalization methods", 《HTTPS://BLOG.CSDN.NET/QQ_20823641/ARTICLE/DETAILS/51345057》 * |
钱塘小甲子: "Maximal information coefficient (MIC)", 《HTTPS://BLOG.CSDN.NET/QTLYX/ARTICLE/DETAILS/50780400》 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108921222A (en) * | 2018-07-05 | 2018-11-30 | 四川泰立智汇科技有限公司 | A kind of air-conditioning energy consumption feature selection approach based on big data |
CN109488321A (en) * | 2019-01-03 | 2019-03-19 | 天津大学 | A kind of Cutter Head Torque in Shield Tunneling determines method and system |
CN109736819A (en) * | 2019-01-03 | 2019-05-10 | 天津大学 | A kind of shield driving gross thrust determines method and system |
CN109488321B (en) * | 2019-01-03 | 2019-11-29 | 天津大学 | A kind of Cutter Head Torque in Shield Tunneling determines method and system |
CN110427663A (en) * | 2019-07-17 | 2019-11-08 | 武汉大学 | Face precipitation-water-level simulation method based on time series network |
CN110311376A (en) * | 2019-07-31 | 2019-10-08 | 三峡大学 | A kind of Electrical Power System Dynamic security evaluation collective model and space-time method for visualizing |
CN110311376B (en) * | 2019-07-31 | 2022-12-20 | 三峡大学 | Dynamic safety assessment comprehensive model and space-time visualization method for power system |
CN111539587B (en) * | 2020-03-06 | 2023-11-24 | 武汉极善信息技术有限公司 | Hydrologic forecasting method |
CN111539587A (en) * | 2020-03-06 | 2020-08-14 | 李�杰 | Hydrological forecasting method |
US20210374864A1 (en) * | 2020-05-29 | 2021-12-02 | Fortia Financial Solutions | Real-time time series prediction for anomaly detection |
CN112529252A (en) * | 2020-11-18 | 2021-03-19 | 贵州电网有限责任公司 | Small hydropower station forebay water level prediction method and prediction system |
CN112529252B (en) * | 2020-11-18 | 2022-05-03 | 贵州电网有限责任公司 | Small hydropower station forebay water level prediction method and prediction system |
CN113222145A (en) * | 2021-06-04 | 2021-08-06 | 西安邮电大学 | MODWT-EMD-based time sequence hybrid prediction method |
CN113222145B (en) * | 2021-06-04 | 2023-12-22 | 西安邮电大学 | MODTT-EMD-based time sequence hybrid prediction method |
CN115905198A (en) * | 2022-11-24 | 2023-04-04 | 中国长江电力股份有限公司 | Water level data early warning method for key water level station of Yangtze river basin |
CN115713164B (en) * | 2022-11-26 | 2023-11-24 | 福建中锐汉鼎数字科技有限公司 | Drainage basin downstream water level prediction method |
CN115713164A (en) * | 2022-11-26 | 2023-02-24 | 福建中锐汉鼎数字科技有限公司 | Drainage basin downstream water level prediction method |
CN115828757A (en) * | 2022-12-12 | 2023-03-21 | 福建中锐汉鼎数字科技有限公司 | Flood discharge hysteresis characteristic construction and selection method for basin water level prediction |
CN115828757B (en) * | 2022-12-12 | 2024-02-23 | 福建中锐汉鼎数字科技有限公司 | Flood discharge hysteresis characteristic structure and selection method for drainage basin water level prediction |
Also Published As
Publication number | Publication date |
---|---|
CN107992447B (en) | 2019-12-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107992447A (en) | A kind of feature selecting decomposition method applied to river level prediction data | |
Jayawardena et al. | Noise reduction and prediction of hydrometeorological time series: dynamical systems approach vs. stochastic approach | |
Pougaza et al. | Maximum entropies copulas | |
Wang et al. | A compound framework for wind speed forecasting based on comprehensive feature selection, quantile regression incorporated into convolutional simplified long short-term memory network and residual error correction | |
CN110621026B (en) | Multi-moment prediction method for base station flow | |
CN106502815A (en) | A kind of abnormal cause localization method, device and computing device | |
Zhang et al. | A conjunction method of wavelet transform-particle swarm optimization-support vector machine for streamflow forecasting | |
Nourani et al. | A new hybrid algorithm for rainfall–runoff process modeling based on the wavelet transform and genetic fuzzy system | |
CN112036042A (en) | Power equipment abnormality detection method and system based on variational modal decomposition | |
CN108647807A (en) | The prediction technique of river discharge | |
CN115587666A (en) | Load prediction method and system based on seasonal trend decomposition and hybrid neural network | |
Kosana et al. | Hybrid wind speed prediction framework using data pre-processing strategy based autoencoder network | |
CN114358389A (en) | Short-term power load prediction method combining VMD decomposition and time convolution network | |
Hosseini et al. | Short-term traffic flow forecasting by mutual information and artificial neural networks | |
Shiri et al. | Coupling wavelet transform with multivariate adaptive regression spline for simulating suspended sediment load: independent testing approach | |
CN108241900A (en) | Engineering project construction period prediction method, device and system | |
CN105303051A (en) | Air pollutant concentration prediction method | |
Lian | Runoff forecasting model based on CEEMD and combination model: a case study in the Manasi River, China | |
Pandhiani et al. | Time series forecasting by using hybrid models for monthly streamflow data | |
CN113626929A (en) | Multi-stage multi-topology ship traffic complexity measuring method and system | |
Collins et al. | Markov models in geography | |
CN115936196A (en) | Monthly rainfall model prediction method based on time sequence convolution network | |
CN116578858A (en) | Air compressor fault prediction and health degree evaluation method and system based on graphic neural network | |
Mombeini et al. | Developing a new approach for forecasting the trends of oil price | |
Latifoğlu | A novel approach for prediction of daily streamflow discharge data using correlation based feature selection and random forest method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20191217; Termination date: 20211213 |