CN107992447A - A feature selection and decomposition method applied to river water level prediction data

Info

Publication number
CN107992447A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711330726.1A
Other languages
Chinese (zh)
Other versions
CN107992447B (en)
Inventor
杨拥军
管杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201711330726.1A priority Critical patent/CN107992447B/en
Publication of CN107992447A publication Critical patent/CN107992447A/en
Application granted granted Critical
Publication of CN107992447B publication Critical patent/CN107992447B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/148 Wavelet transforms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G06F30/333 Design for testability [DFT], e.g. scan chain or built-in self-test [BIST]

Abstract

The invention discloses a feature selection and decomposition method applied to river water level prediction data. To obtain the features best suited as model input, the invention introduces LASSO regression to perform feature selection on the original input set, integrates MODWT to decompose the selected features into components, and tests the performance of LASSO-MODWT using a model based on multiple linear regression. Tests show that the LASSO-MODWT feature selection and decomposition method helps improve the performance and interpretability of river water level prediction models.

Description

A feature selection and decomposition method applied to river water level prediction data
Technical field
The invention belongs to the technical field of water level prediction, and in particular relates to the design of a feature selection and decomposition method applied to river water level prediction data.
Background art
Water level prediction plays an important role in flood control and disaster mitigation and in the utilization, allocation, and management of water resources. A robust water level prediction model gives decision makers the expected changes in future water levels, so that potential hydrological disasters are identified in time and early-warning measures are deployed sooner. In the field of water level prediction, because the factors that influence water level are multi-dimensional and complex, the potential inputs of a model system often exhibit nonlinear dynamic relationships and multiple correlations. In addition, the number of inputs is generally large; in particular, once lags are introduced for each variable, the feature dimension and the computational complexity increase sharply, even though these variables actually contain a large amount of duplicated information and noise. To reduce the computational complexity of the model and improve its flexibility and interpretability, effective features containing minimal redundancy must be selected from the original high-dimensional data, so as to build a scalable model that is more concise and better reflects the real pattern of water level changes.
LASSO, the least absolute shrinkage and selection operator, was first proposed by Robert Tibshirani in 1996. The method is a shrinkage estimator: by constructing a penalty function it obtains a more refined model, shrinking some coefficients and setting others to zero. It therefore retains the advantages of subset shrinkage and is a biased estimator for data with multicollinearity. The basic idea of LASSO is to minimize the residual sum of squares (RSS) under the constraint that the sum of the absolute values of the regression coefficients is less than or equal to a constant, which produces some regression coefficients that are exactly 0 and yields an interpretable model with a compressed feature set.
The discrete wavelet transform (DWT) is widely used in many wavelet-integrated models and provides detailed spectral information about the data, such as periodicity, local variation, randomness, and abrupt changes. However, because it involves decimation (downsampling), it can introduce information loss during model construction and thereby bias the predictions. In addition, the DWT coefficients depend on the starting position of the transform, which introduces a degree of arbitrariness.
To address these drawbacks of the DWT, researchers proposed the maximal overlap discrete wavelet transform (MODWT) as a feature decomposition method. MODWT is a linear filtering operation that avoids the decimation effect; it yields multi-level wavelet coefficients with the same length as the observations. Moreover, the result of the transform does not depend on the starting position, and the transform can be applied to data of different sample sizes. Overall, MODWT extracts the components of the input signal in different frequency bands, providing richer information and revealing the underlying patterns in the data.
Summary of the invention
The purpose of the invention is to reduce the computational complexity of existing water level prediction models while improving their flexibility and interpretability; to this end, a feature selection and decomposition method applied to river water level prediction data is proposed.
The technical solution of the invention is a feature selection and decomposition method applied to river water level prediction data, comprising the following steps:
S1. Collect the hydrological factors that influence the water level at the target prediction station (current water level information of the target station, water level information of the upstream basin, rainfall along the river, etc.).
S2. Build a feature set from the hydrological factors based on information theory.
S3. Introduce lags for each feature in the feature set based on correlation analysis, and build the original input set.
S4. Standardize the original input set.
S5. Perform LASSO-based feature selection on the standardized input set.
S6. Perform MODWT-based feature decomposition on the input set after feature selection, obtaining the LASSO-MODWT optimized input set.
The beneficial effects of the invention are as follows: the invention uses LASSO regression to perform feature selection on the original input set and integrates MODWT to decompose the selected features into components, so that the prediction performance for river water level is significantly improved, which helps improve the performance and interpretability of river water level prediction models.
Further, step S2 is specifically: compute the maximal information coefficient (MIC) between each hydrological factor and the prediction target, analyze the strength of the relationship between them, and take the hydrological factors whose MIC value with the prediction target exceeds a set threshold as input features to build the feature set.
The maximal information coefficient MIC is calculated as:
$$\mathrm{MIC}[X;Y] = \max_{|X||Y| < B} \frac{I[X;Y]}{\log_2\left(\min(|X|,|Y|)\right)} \tag{1}$$
where X and Y are two random variables, B is the partition limit, taken as the total number of data points raised to the power 0.6 or 0.55, MIC[X;Y] is the maximal information coefficient between X and Y, and I[X;Y] is the mutual information between X and Y, calculated as:
$$I[X;Y] = \sum_{X,Y} p(X,Y)\,\log_2 \frac{p(X,Y)}{p(X)\,p(Y)} \tag{2}$$
where p(X) and p(Y) are the probability density functions of X and Y respectively, and p(X, Y) is the joint probability density function of X and Y.
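Formulas (1) and (2) can be illustrated with a short Python sketch. It is a simplified approximation rather than the patent's implementation: it assumes equal-width bins (a full MIC search optimizes the bin boundaries), and the function names `bin_index`, `mutual_information`, and `mic_equal_width` are hypothetical.

```python
import math
from collections import Counter

def bin_index(values, bins):
    """Assign each value to one of `bins` equal-width bins (assumed binning rule)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(xb, yb):
    """I[X;Y] in bits from two sequences of bin labels, following formula (2)."""
    n = len(xb)
    pxy = Counter(zip(xb, yb))
    px, py = Counter(xb), Counter(yb)
    # (c/n) * log2( (c/n) / ((px/n) * (py/n)) ) simplifies to the expression below
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

def mic_equal_width(x, y, exponent=0.6):
    """Approximate formula (1): maximise normalised MI over grids with nx * ny < B."""
    b_limit = len(x) ** exponent          # B = (number of data points) ** 0.6
    best = 0.0
    for nx in range(2, int(b_limit) + 1):
        for ny in range(2, int(b_limit) + 1):
            if nx * ny >= b_limit:
                break
            mi = mutual_information(bin_index(x, nx), bin_index(y, ny))
            best = max(best, mi / math.log2(min(nx, ny)))
    return best
```

A factor would then be kept as an input feature when its score against the target water level exceeds the chosen threshold; the threshold value itself is not stated in the text.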
The beneficial effect of this further scheme is that the maximal information coefficient MIC is used to analyze the strength of the relationship between each hydrological factor and the prediction target, and the factors with a stronger relationship to the prediction target are taken as input features to build the feature set.
Further, step S3 is specifically: for the current water level information of the target station in the feature set, the partial autocorrelation function (PACF) is used to determine the lags; for the other input features in the feature set, cross-correlation coefficient analysis is used to determine the lags. Each lag that shows a clear statistical correlation with the prediction target, i.e. reaches the 95% confidence interval, is added to the input set, thereby building the original input set.
The beneficial effect of this further scheme is that, because the river water level of the prediction target is a time series, the influence of lags should be considered when building the original input set.
Further, step S4 is specifically: the original input set is standardized using min-max scaling, which rescales the original input set to the interval [0, 1]. The processing formula is:
$$x_{i,\mathrm{norm}} = N_{\min} + \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \left(N_{\max} - N_{\min}\right) \tag{3}$$
where x_{i,norm} is the data value after standardization, x_i is the i-th data item in the original input set to be standardized, N_min and N_max are the minimum and maximum of the scaled range, here 0 and 1, and x_min and x_max are the minimum and maximum values in the original input set.
The beneficial effect of this further scheme is that, because different input data have different units, the original input set must be standardized so that it can be assessed on a common scale; standardization makes the data dimensionless and rescales the original input set to the interval [0, 1].
Further, step S5 comprises the following sub-steps:
S51. Take the standardized input set as the model input and the water level data set of the target prediction station as the model output, and build a LASSO regression model.
S52. Train the LASSO regression model, and tune the LASSO regression parameter λ by grid search to find the optimal parameter.
S53. Score the features in the input set using the LASSO regression model with the optimal parameter, the scoring criterion being the LASSO regression coefficient. Features with positive LASSO regression coefficients are retained in the input set, and features whose coefficients are 0 or negative are removed from it, thereby achieving feature selection on the input set.
The beneficial effect of this further scheme is that, after LASSO feature selection on the standardized input set, prediction accuracy can be improved while the number of model input parameters is greatly reduced.
Further, step S6 is specifically: feature decomposition is performed on the input set after feature selection using a MODWT model, and the wavelet coefficient sets obtained from all feature decompositions are used to build the optimized input set.
The feature decomposition formula is:
$$f(t) = A_M(t) + \sum_{m=1}^{M} W_m(t) \tag{4}$$
where f(t) denotes the wavelet coefficients obtained by feature decomposition, A_M(t) is the smooth approximation sub-wave of the original signal at decomposition level M, W_m(t) is the sub-wave of the original signal at decomposition level m, m = 1, 2, ..., M, and M is the minimum number of decomposition levels, calculated as:
M = int[log(N)]    (5)
where N is the length of the input set after feature selection, and int[ ] is the round-up function.
The beneficial effect of this further scheme is that performing feature decomposition on the input set after feature selection using a MODWT model significantly improves the accuracy of river water level prediction.
Further, the MODWT model uses the Daubechies wavelet basis.
The beneficial effect of this further scheme is as follows: a suitable wavelet basis function must be selected when building a MODWT-based model. There is currently no clear, general criterion for selecting the basis function, nor does the literature indicate which basis function gives the best modelling results; in principle, different application scenarios suit different basis functions. Considering that hydrological prediction suits irregular wavelet bases, the invention uses the Daubechies wavelet basis, which is widely used in the field of hydrological prediction.
Brief description of the drawings
Fig. 1 is a flow chart of the feature selection and decomposition method applied to river water level prediction data provided in an embodiment of the invention.
Fig. 2 shows the result of applying MODWT to WL_CS using the db3 Daubechies wavelet basis, provided in an embodiment of the invention.
Fig. 3 compares the predicted values of the different input sets with the actual values for the three-hour prediction, provided in an embodiment of the invention.
Fig. 4 is a scatter plot of the predicted values of the different input sets against the actual values for the three-hour prediction, provided in an embodiment of the invention.
Detailed description of the embodiments
Illustrative embodiments of the invention are now described in detail with reference to the drawings. It should be understood that the embodiments shown in the drawings and described here are merely exemplary; they are intended to explain the principle and spirit of the invention, not to limit its scope.
An embodiment of the invention provides a feature selection and decomposition method applied to river water level prediction data which, as shown in Fig. 1, comprises the following steps S1-S6:
S1. Collect the hydrological factors that influence the water level at the target prediction station (including the current water level information of the target station, water level information of the upstream basin, rainfall along the river, and similar hydrological factors).
This embodiment takes the water level trend in the middle and lower reaches of the Chishui River as an example, with the aim of predicting the water level at Chishui station 3 hours and 6 hours ahead. The data were collected by automatic monitoring stations along the middle and lower reaches of the Chishui River from May to October of 2015 and 2016; the stations involved are listed in Table 1. Because the data are collected and stored hourly, there are 8834 data points in total. Gaps are unavoidable during data acquisition and storage; analysis shows that the missing data are the 126 WL_MT items from 2015-10-09 02 to 2015-10-14 07, which are filled by interpolation using pandas.
Table 1
Code | Parameter meaning | Monitoring station | Data type | Collection period
WL_CS | Chishui station water level | Chishui station | Water level | Hourly
WL_EL | Erlang station water level | Erlang station | Water level | Hourly
WL_MT | Maotai station water level | Maotai station | Water level | Hourly
RF_CS | Chishui station rainfall | Chishui station | Rainfall | Hourly
RF_XS | Xishui station rainfall | Xishui station | Rainfall | Hourly
S2. Build a feature set from the hydrological factors based on information theory.
Compute the maximal information coefficient (MIC) between each hydrological factor and the prediction target, analyze the strength of the relationship between them, and take the hydrological factors whose MIC value with the prediction target exceeds a set threshold (i.e. the factors with a stronger relationship to the prediction target) as input features to build the feature set.
The maximal information coefficient MIC is calculated as:
$$\mathrm{MIC}[X;Y] = \max_{|X||Y| < B} \frac{I[X;Y]}{\log_2\left(\min(|X|,|Y|)\right)} \tag{1}$$
where X and Y are two random variables, B is the partition limit, which bounds how finely X and Y may be partitioned and is generally taken as the total number of data points raised to the power 0.6 or 0.55, MIC[X;Y] is the maximal information coefficient between X and Y, and I[X;Y] is the mutual information between X and Y, calculated as:
$$I[X;Y] = \sum_{X,Y} p(X,Y)\,\log_2 \frac{p(X,Y)}{p(X)\,p(Y)} \tag{2}$$
where p(X) and p(Y) are the probability density functions of X and Y respectively, and p(X, Y) is the joint probability density function of X and Y.
In this embodiment the feature set has 5 features in total: (1) the water level data of three hydrological monitoring stations, Chishui station, Maotai station, and Erlang station (codes WL_CS, WL_MT, WL_EL); (2) the rainfall data of two weather monitoring stations, Chishui station and Xishui station (codes RF_CS, RF_XS).
S3. Introduce lags for each feature in the feature set based on correlation analysis, and build the original input set.
Because the river water level of the prediction target is a time series, the influence of lags should be considered when building the original input set. In this embodiment, for the current water level information of the target station in the feature set, the partial autocorrelation function (PACF) is used to determine the lags; for the other input features in the feature set, cross-correlation coefficient analysis is used to determine the lags. Each lag that shows a clear statistical correlation with the prediction target (reaching the 95% confidence interval) is added to the input set, thereby building the original input set. PACF and cross-correlation coefficient analysis are correlation analysis methods commonly used in the art and are not described further here.
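The cross-correlation part of step S3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the 95% criterion is approximated by the common white-noise bound 1.96/sqrt(N), which the text does not spell out, the PACF analysis for the target station is not reproduced, and the function names are hypothetical.

```python
import math

def cross_correlation(x, y, lag):
    """Pearson correlation between x[t - lag] and y[t] for lag >= 0."""
    a = x[:len(x) - lag]
    b = y[lag:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def significant_lags(x, y, max_lag):
    """Lags whose correlation exceeds the assumed 95% bound 1.96 / sqrt(N)."""
    bound = 1.96 / math.sqrt(len(y))
    return [k for k in range(1, max_lag + 1)
            if abs(cross_correlation(x, y, k)) > bound]

def build_lagged_inputs(features, target, lags_by_feature):
    """Stack the chosen lags of each feature into rows aligned with the target."""
    max_lag = max(max(lags) for lags in lags_by_feature.values())
    names = [f"{name}_lag{k}" for name, lags in lags_by_feature.items() for k in lags]
    rows = [[features[name][t - k] for name, lags in lags_by_feature.items() for k in lags]
            for t in range(max_lag, len(target))]
    return names, rows, target[max_lag:]
```

For example, `significant_lags(wl_mt, wl_cs, 48)` would list the hourly lags of an upstream series `wl_mt` that pass the bound against the target series `wl_cs`; both variable names and the 48-hour horizon are placeholders.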
In this embodiment, after lags are introduced for each feature through correlation analysis, the original input set has 221 features for the 3h prediction and 229 features for the 6h prediction.
S4. Standardize the original input set.
Because different input data have different units, the original input set must be standardized so that it can be assessed on a common scale and made dimensionless. In this embodiment, min-max scaling (Min-Max Scaler) is used to standardize the original input set, rescaling it to the interval [0, 1]. The processing formula is:
$$x_{i,\mathrm{norm}} = N_{\min} + \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \left(N_{\max} - N_{\min}\right) \tag{3}$$
where x_{i,norm} is the data value after standardization, x_i is the i-th data item in the original input set to be standardized, N_min and N_max are the minimum and maximum of the scaled range, here 0 and 1, and x_min and x_max are the minimum and maximum values in the original input set.
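The min-max scaling of step S4 can be sketched for a single feature column as follows. The function name `min_max_scale` and the handling of a constant column (mapped to N_min) are assumptions made for illustration.

```python
def min_max_scale(column, n_min=0.0, n_max=1.0):
    """Rescale one feature column to [n_min, n_max] using its own min and max."""
    x_min, x_max = min(column), max(column)
    if x_max == x_min:
        # Assumed convention: a constant column carries no information, map it to n_min.
        return [n_min for _ in column]
    span = x_max - x_min
    return [n_min + (x - x_min) / span * (n_max - n_min) for x in column]

# Three hourly water levels rescaled to [0, 1].
scaled = min_max_scale([2.0, 4.0, 6.0])
```

When the scaled features feed a predictive model, x_min and x_max would normally be taken from the training portion only, so that test data do not leak into the scaling; the text does not say how this was handled.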
S5. Perform LASSO-based feature selection on the standardized input set.
To simplify the input set and select the features best suited as input, this embodiment performs feature selection on the input set based on LASSO regression. Because LASSO introduces an L1 regularization term as the penalty, it can shrink the regression coefficients of redundant features to 0, so feature selection based on LASSO regression is a sparse feature selection method.
Step S5 comprises the following sub-steps S51-S53:
S51. Take the standardized input set as the model input and the water level data set of the target prediction station as the model output, and build a LASSO regression model.
S52. Train the LASSO regression model, and tune the LASSO regression parameter λ by grid search to find the optimal parameter.
S53. Score the features in the input set using the LASSO regression model with the optimal parameter, the scoring criterion being the LASSO regression coefficient. Features with positive LASSO regression coefficients are retained in the input set, and features whose coefficients are 0 or negative are removed from it, thereby achieving feature selection on the input set.
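Sub-steps S51-S53 can be sketched in plain Python as below. This is an illustrative sketch, not the patent's implementation: it assumes the objective (1/(2n))·RSS + λ·Σ|w|, solves it by coordinate descent, and picks λ on a time-ordered hold-out split; the text only says that grid search is used. All function names are hypothetical.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator that produces exact zeros."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * RSS + lam * sum(|w|), with an intercept."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - means[j] for row in X] for j in range(p)]  # centred columns
    z = [sum(v * v for v in col) / n for col in cols]
    w = [0.0] * p
    resid = [v - y_mean for v in y]                              # residual for w = 0
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue
            rho = sum(c * r for c, r in zip(cols[j], resid)) / n + z[j] * w[j]
            new_w = soft_threshold(rho, lam) / z[j]
            delta = new_w - w[j]
            if delta:
                resid = [r - c * delta for c, r in zip(cols[j], resid)]
                w[j] = new_w
    b = y_mean - sum(wj * m for wj, m in zip(w, means))
    return w, b

def grid_search_lambda(X, y, grid, train_fraction=0.75):
    """Pick the lambda with the lowest hold-out MSE (assumed selection rule)."""
    cut = int(len(X) * train_fraction)
    best_lam, best_mse = None, float("inf")
    for lam in grid:
        w, b = lasso_fit(X[:cut], y[:cut], lam)
        preds = [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X[cut:]]
        mse = sum((p_ - t) ** 2 for p_, t in zip(preds, y[cut:])) / len(preds)
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam

def select_positive(w, names):
    """S53: keep only the features whose LASSO coefficient is positive."""
    return [name for name, coef in zip(names, w) if coef > 0]
```

The positive-coefficient rule in `select_positive` follows S53 as written; a more common convention keeps every feature with a nonzero coefficient, which would also retain negatively related inputs.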
In this embodiment, after LASSO-based feature selection the 3h prediction has 49 features and the 6h prediction has 88 features. The number of input features is greatly reduced in both prediction scenarios, which reduces the complexity of model construction.
S6. Perform MODWT-based feature decomposition on the input set after feature selection, obtaining the LASSO-MODWT optimized input set.
Feature decomposition is performed on the input set after feature selection using a MODWT model, and the wavelet coefficient sets obtained from all feature decompositions are used to build the optimized input set.
The feature decomposition formula is:
$$f(t) = A_M(t) + \sum_{m=1}^{M} W_m(t) \tag{4}$$
where f(t) denotes the wavelet coefficients obtained by feature decomposition, A_M(t) is the smooth approximation sub-wave of the original signal at decomposition level M, W_m(t) is the sub-wave of the original signal at decomposition level m, m = 1, 2, ..., M, and M is the minimum number of decomposition levels, calculated as:
M = int[log(N)]    (5)
where N is the length of the input set after feature selection, and int[ ] is the round-up function.
The effective input set length in this embodiment is 8678, so log(8678) = 3.93 (base-10 logarithm) and rounding up gives a minimum number of MODWT decomposition levels of M = 4. Both M = 4 and M = 5 are tested in this embodiment.
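The arithmetic of formula (5) can be checked with a one-line Python helper. It assumes a base-10 logarithm, inferred from log(8678) = 3.93, and the function name is hypothetical.

```python
import math

def min_decomposition_level(n):
    """Formula (5): round log10 of the input set length up to the next integer."""
    return math.ceil(math.log10(n))

# log10(8678) is about 3.94, so the minimum level is 4.
level = min_decomposition_level(8678)
```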
Although MODWT has been shown to have many advantages as a multi-resolution feature identification tool, one challenge when building a MODWT-based model is selecting a suitable wavelet basis function. There is currently no clear, general criterion for selecting the basis function, nor does the literature indicate which basis function gives the best modelling results; in principle, different application scenarios suit different basis functions. Considering that hydrological prediction suits irregular wavelet bases, this embodiment uses the Daubechies wavelet basis, which is widely used in the field of hydrological prediction. The db2, db3, and db4 wavelet bases are compared in this embodiment to find the wavelet basis best suited to Chishui River water level prediction.
Fig. 2 shows the result of applying MODWT to WL_CS using the db3 Daubechies wavelet basis; the 6 subplots, from top to bottom, are the original signal waveform, the smooth approximation waveform (A4), and the four levels of MODWT decomposition coefficients (d1, d2, d3, d4). To reduce computational complexity, this embodiment decomposes only WL_CS, the feature scored most important by LASSO, and adds the wavelet coefficients obtained from the decomposition to the input set as new features (the 4-level and 5-level decompositions yield 5 and 6 dimensions respectively). The 3-hour prediction then has 53 features and the 6-hour prediction has 92 features.
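The decomposition described above can be sketched with a pure-Python MODWT pyramid. The sketch assumes the db2 basis (one of the three bases compared in this embodiment) and circular boundary handling, which the text does not specify; the function names are hypothetical.

```python
import math

def db2_modwt_filters():
    """Return (wavelet, scaling) MODWT filters for the db2 basis."""
    s3 = math.sqrt(3.0)
    g = [(1 + s3), (3 + s3), (3 - s3), (1 - s3)]
    g = [v / (4 * math.sqrt(2.0)) for v in g]               # DWT scaling filter
    size = len(g)
    h = [(-1) ** l * g[size - 1 - l] for l in range(size)]  # quadrature mirror filter
    # MODWT filters are the DWT filters rescaled by 1 / sqrt(2).
    return [v / math.sqrt(2.0) for v in h], [v / math.sqrt(2.0) for v in g]

def modwt(signal, levels):
    """Pyramid MODWT with circular filtering; no downsampling at any level."""
    h, g = db2_modwt_filters()
    n = len(signal)
    v = list(signal)
    details = []
    for j in range(1, levels + 1):
        step = 2 ** (j - 1)                                 # filter upsampling factor
        w_j = [sum(h[l] * v[(t - step * l) % n] for l in range(len(h)))
               for t in range(n)]
        v = [sum(g[l] * v[(t - step * l) % n] for l in range(len(g)))
             for t in range(n)]
        details.append(w_j)
    return details, v

# A 4-level decomposition gives 4 detail series plus 1 smooth series,
# each as long as the input, i.e. 5 new feature columns.
signal = [math.sin(0.3 * t) + 0.01 * t for t in range(64)]
details, smooth = modwt(signal, 4)
feature_columns = details + [smooth]
```

Because nothing is downsampled, every coefficient series stays aligned with the hourly observations, which is what allows the coefficients to be appended to the input set as ordinary feature columns.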
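Because there is no single general index for assessing the performance of hydrological prediction models, this embodiment evaluates prediction performance using three statistical indices together: the Nash-Sutcliffe efficiency coefficient E_NS, the root-mean-square error RMSE, and the mean absolute error MAE.
(1) Nash-Sutcliffe efficiency coefficient E_NS:
$$E_{NS} = 1 - \frac{\sum_{i=1}^{N} \left(SWL_i^{OBS} - SWL_i^{FOR}\right)^2}{\sum_{i=1}^{N} \left(SWL_i^{OBS} - \overline{SWL^{OBS}}\right)^2}$$
(2) Root-mean-square error RMSE:
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(SWL_i^{OBS} - SWL_i^{FOR}\right)^2}$$
(3) Mean absolute error MAE:
$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left|SWL_i^{OBS} - SWL_i^{FOR}\right|$$
where SWL^OBS is the measured water level, SWL^FOR is the water level obtained by model prediction, N is the number of data points, and $\overline{SWL^{OBS}}$ is the overall mean of the measured water level.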
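The three indices can be computed with a few lines of Python. The sketch uses the standard definitions of the Nash-Sutcliffe efficiency, RMSE, and MAE; the function names are hypothetical.

```python
import math

def nash_sutcliffe(obs, pred):
    """E_NS: 1 minus squared error relative to the variance about the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def rmse(obs, pred):
    """Root-mean-square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)
```

An E_NS of 1 indicates a perfect fit, while a value of 0 means the model does no better than predicting the observed mean; RMSE and MAE are in the same units as the water level.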
In this embodiment, the original input set obtained from correlation analysis, the input set after LASSO-based feature selection, and the input set optimized by LASSO-MODWT are each used as input to a multiple linear regression model to predict the water level at Chishui station 3 hours and 6 hours ahead, and the performance of the LASSO-MODWT feature selection and decomposition method is then assessed. Table 2 compares the performance of the different input sets in predicting the 3-hour and 6-hour water levels at Chishui station. As Table 2 shows, for both the 3-hour and the 6-hour prediction, LASSO-based feature selection improves prediction accuracy while greatly reducing the number of model input parameters; integrating MODWT further improves prediction accuracy significantly, with good performance for both the 3-hour and the 6-hour prediction.
Table 2
Fig. 3 compares the 3-hour water level predictions of the different input sets with the actual values at Chishui station during August 2016, and Fig. 4 is a scatter plot of the predicted and actual values for the three input sets. After LASSO-MODWT feature selection and decomposition, compared with the predictions from the original input set, the LASSO-W-MLR predictions are closer to the actual values and the model performance is more stable. This shows that the LASSO-MODWT feature selection and decomposition method can significantly improve the accuracy and stability of the Chishui River water level prediction model.
To further study the influence of different wavelet basis types on Chishui River water level prediction performance, this embodiment simulates the three wavelets db2, db3, and db4 with two decomposition levels, level 4 and level 5. Table 3 gives the performance results of the 3h and 6h predictions using the different wavelet bases and decomposition levels. As Table 3 shows, the db2 wavelet basis with 5-level wavelet decomposition gives the best prediction performance in the Chishui River water level prediction model. This result further shows that different application scenarios suit different wavelet bases; in actual modelling, trials should be carried out according to the specific requirements to find the most suitable wavelet basis and number of decomposition levels, thereby improving model accuracy.
Table 3
Those of ordinary skill in the art will appreciate that the embodiments described here are intended to help the reader understand the principle of the invention, and that the scope of protection of the invention is not limited to these specific statements and embodiments. Based on the technical teachings disclosed by the invention, those of ordinary skill in the art can make various other specific variations and combinations that do not depart from the essence of the invention, and these variations and combinations remain within the scope of protection of the invention.

Claims (10)

1. A feature selection and decomposition method applied to river water level prediction data, characterised in that it comprises the following steps:
S1. collecting the hydrological factors that influence the water level at the target prediction station;
S2. building a feature set from the hydrological factors based on information theory;
S3. introducing lags for each feature in the feature set based on correlation analysis, and building the original input set;
S4. standardizing the original input set;
S5. performing LASSO-based feature selection on the standardized input set;
S6. performing MODWT-based feature decomposition on the input set after feature selection, obtaining the LASSO-MODWT optimized input set.
2. The feature selection and decomposition method according to claim 1, characterised in that the hydrological factors influencing the water level at the target prediction station in step S1 include the current water level information of the target station, water level information of the upstream basin, and rainfall along the river.
3. The feature selection and decomposition method according to claim 1, characterised in that step S2 is specifically: computing the maximal information coefficient MIC between each hydrological factor and the prediction target, analyzing the strength of the relationship between them, and taking the hydrological factors whose MIC value with the prediction target exceeds a set threshold as input features to build the feature set.
4. The feature selection and decomposition method according to claim 3, characterised in that the maximal information coefficient MIC is calculated as:
<mrow> <mi>M</mi> <mi>I</mi> <mi>C</mi> <mo>&amp;lsqb;</mo> <mi>X</mi> <mo>;</mo> <mi>Y</mi> <mo>&amp;rsqb;</mo> <mo>=</mo> <munder> <mrow> <mi>m</mi> <mi>a</mi> <mi>x</mi> </mrow> <mrow> <mo>|</mo> <mi>X</mi> <mo>|</mo> <mo>|</mo> <mi>Y</mi> <mo>|</mo> <mo>&lt;</mo> <mi>B</mi> </mrow> </munder> <mfrac> <mrow> <mi>I</mi> <mo>&amp;lsqb;</mo> <mi>X</mi> <mo>;</mo> <mi>Y</mi> <mo>&amp;rsqb;</mo> </mrow> <mrow> <msub> <mi>log</mi> <mn>2</mn> </msub> <mrow> <mo>(</mo> <mi>m</mi> <mi>i</mi> <mi>n</mi> <mo>(</mo> <mrow> <mo>|</mo> <mi>X</mi> <mo>|</mo> <mo>,</mo> <mo>|</mo> <mi>Y</mi> <mo>|</mo> </mrow> <mo>)</mo> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow>
Wherein X, Y are two stochastic variables, and B limits for segmentation, takes 0.6 or 0.55 power of total amount of data, MIC [X;Y] represent X Maximum information coefficient between Y, I [X;Y] represent mutual information between X and Y, calculation formula is:
<mrow> <mi>I</mi> <mo>&amp;lsqb;</mo> <mi>X</mi> <mo>;</mo> <mi>Y</mi> <mo>&amp;rsqb;</mo> <mo>=</mo> <munder> <mo>&amp;Sigma;</mo> <mrow> <mi>X</mi> <mo>,</mo> <mi>Y</mi> </mrow> </munder> <mi>p</mi> <mrow> <mo>(</mo> <mi>X</mi> <mo>,</mo> <mi>Y</mi> <mo>)</mo> </mrow> <msub> <mi>log</mi> <mn>2</mn> </msub> <mfrac> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <mi>X</mi> <mo>,</mo> <mi>Y</mi> <mo>)</mo> </mrow> </mrow> <mrow> <mi>p</mi> <mrow> <mo>(</mo> <mi>X</mi> <mo>)</mo> </mrow> <mi>p</mi> <mrow> <mo>(</mo> <mi>Y</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>2</mn> <mo>)</mo> </mrow> </mrow>
Wherein p (X) and p (Y) represents the probability density function of X, Y respectively, and p (X, Y) represents the joint probability density point of X, Y Cloth function.
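As an illustration of formulas (1) and (2), the following is a minimal Python sketch, not the patented implementation. It assumes a simplification: only equal-width grids are searched, whereas the maximal information coefficient is defined over all grid partitions satisfying |X||Y| < B. The function names `mutual_information`, `equal_width_bins` and `mic_equal_width` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x_bins, y_bins):
    """Formula (2): mutual information in bits between two sequences of bin labels."""
    n = len(x_bins)
    joint = Counter(zip(x_bins, y_bins))
    px = Counter(x_bins)
    py = Counter(y_bins)
    mi = 0.0
    for (xb, yb), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[xb] / n) * (py[yb] / n)))
    return mi


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width bins (assumed discretization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in values]


def mic_equal_width(x, y, exponent=0.6):
    """Formula (1), restricted to equal-width grids with nx * ny < B = n ** exponent."""
    b = len(x) ** exponent
    best = 0.0
    nx = 2
    while nx * 2 < b:
        ny = 2
        while nx * ny < b:
            mi = mutual_information(equal_width_bins(x, nx), equal_width_bins(y, ny))
            best = max(best, mi / math.log2(min(nx, ny)))
            ny += 1
        nx += 1
    return best
```

Under claim 3, a feature would be kept when its score exceeds the chosen threshold; the threshold value itself is not given in the claims.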
5. The feature selection decomposition method according to claim 3, characterized in that step S3 is specifically: for the current water level information of the target station in the feature set, determining the lags using the partial autocorrelation function (PACF); for the other input features in the feature set, determining the lags by cross-correlation coefficient analysis; and, for each lag, if it shows a clear statistical correlation with the prediction target, that is, it reaches the 95% confidence interval, adding the lag to the input set, thereby constructing the original input set.
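The cross-correlation part of step S3 can be sketched as follows. This is a hedged example: the claim does not give the significance formula, so the sketch assumes the common large-sample 95% bound of ±1.96/√N, and it omits the PACF branch used for the target station's own water level. The names `cross_correlation` and `significant_lags` are hypothetical.

```python
import math


def cross_correlation(x, y, lag):
    """Sample correlation between x[t - lag] and y[t] for a non-negative lag."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    num = sum((x[t - lag] - mx) * (y[t] - my) for t in range(lag, n))
    return num / (sx * sy)


def significant_lags(x, y, max_lag):
    """Return lags 1..max_lag whose cross-correlation exceeds the assumed 95% bound."""
    bound = 1.96 / math.sqrt(len(x))  # assumption: large-sample approximation
    return [k for k in range(1, max_lag + 1)
            if abs(cross_correlation(x, y, k)) > bound]
```

Each lag returned by `significant_lags` would correspond to one lagged copy of the feature added to the original input set.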
6. The feature selection decomposition method according to claim 1, characterized in that step S4 is specifically: standardizing the original input set using min-max normalization, scaling the original input set into the interval [0, 1], with the processing formula:
$$x_{i,\mathrm{norm}} = N_{\min} + \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \times (N_{\max} - N_{\min}) \qquad (3)$$
where x_{i,norm} is the data value after standardization; x_i is the i-th data item in the original input set to be standardized; N_min and N_max are the minimum and maximum of the scaled range, here 0 and 1; and x_min and x_max are the minimum and maximum values in the original input set.
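Formula (3) translates directly into code. The following short Python sketch uses a hypothetical function name `min_max_scale`; the guard against a constant series is an added assumption, since the formula is undefined when x_max equals x_min.

```python
def min_max_scale(values, n_min=0.0, n_max=1.0):
    """Scale values into [n_min, n_max] following formula (3)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant feature cannot be scaled by formula (3).
        raise ValueError("cannot scale a constant series")
    span = x_max - x_min
    return [n_min + (v - x_min) / span * (n_max - n_min) for v in values]


scaled = min_max_scale([2.0, 4.0, 6.0])  # smallest value maps to 0, largest to 1
```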
7. The feature selection decomposition method according to claim 1, characterized in that step S5 specifically comprises the following sub-steps:
S51, taking the standardized input set as the model input and the water level data set of the target prediction station as the model output, and constructing a LASSO regression model;
S52, training the LASSO regression model, and optimizing the LASSO regression parameter λ by grid search to find the optimal parameter;
S53, scoring the features in the input set using the LASSO regression model with the optimal parameter, the scoring criterion being the LASSO regression coefficient; retaining in the input set the features whose LASSO regression coefficient is positive, and removing from the input set the features whose LASSO regression coefficient is 0 or negative, thereby realizing feature selection on the input set.
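Sub-steps S51 to S53 can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the patent's implementation: the LASSO objective is taken as (1/2n)·‖y − Xw‖² + λ‖w‖₁ without an intercept, it is solved by coordinate descent, and the grid search scores each λ by mean squared error on a held-out validation set (the claim does not name the scoring criterion). All function names are hypothetical.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_fit(X, y, lam, sweeps=200, tol=1e-10):
    """Coordinate descent for the assumed LASSO objective; returns the coefficients."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            new_w = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
            max_change = max(max_change, abs(new_w - w[j]))
            w[j] = new_w
        if max_change < tol:
            break
    return w


def mse(X, y, w):
    """Mean squared prediction error of coefficients w on (X, y)."""
    return sum((yi - sum(a * b for a, b in zip(row, w))) ** 2
               for row, yi in zip(X, y)) / len(y)


def grid_search_lambda(X_train, y_train, X_val, y_val, grid):
    """S52: pick the lambda with the lowest validation MSE (assumed criterion)."""
    return min(grid, key=lambda lam: mse(X_val, y_val, lasso_fit(X_train, y_train, lam)))


def select_positive_features(w):
    """S53: keep the indices of features with a positive regression coefficient."""
    return [j for j, coef in enumerate(w) if coef > 0]
```

Features with zero or negative coefficients are dropped by `select_positive_features`, mirroring the rule stated in S53.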
8. The feature selection decomposition method according to claim 1, characterized in that step S6 is specifically: performing feature decomposition on the feature-selected input set using a MODWT model, the set of wavelet coefficients obtained from decomposing all features being used to construct the optimized input set.
9. The feature selection decomposition method according to claim 8, characterized in that the formula of the feature decomposition is:
$$f(t) = \overline{W}(t) + \sum_{m=1}^{M} W_m(t) \qquad (4)$$
where f(t) is the set of wavelet coefficients obtained by the feature decomposition; \overline{W}(t) is the smooth approximation of the original signal when an M-level decomposition is carried out; W_m(t) is the wavelet component of the original signal at the m-th decomposition level, m = 1, 2, ..., M; and M is the minimum number of decomposition levels, calculated as:
$$M = \mathrm{int}[\log(N)] \qquad (5)$$
where N is the length of the feature-selected input set and int[·] is the function that rounds up.
10. The feature selection decomposition method according to claim 8, characterized in that the MODWT model uses a Daubechies wavelet basis.
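The additive structure of formula (4) can be illustrated with a small MODWT sketch in Python. The assumptions are stated plainly: the Haar wavelet (the lowest-order Daubechies wavelet) is used because the claims do not specify the order, boundaries are treated as circular, and the number of levels is passed in as a parameter rather than computed from formula (5), whose logarithm base is not stated. The function name `haar_modwt` is hypothetical.

```python
def haar_modwt(signal, levels):
    """Haar MODWT with circular boundary handling.

    Returns (smooth, details): the level-`levels` smooth series and a list of
    detail series for levels 1..levels, each the same length as the input.
    """
    n = len(signal)
    smooth = list(signal)
    details = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)  # filter taps are spread apart at each level
        prev = smooth
        smooth = [0.5 * (prev[t] + prev[(t - shift) % n]) for t in range(n)]
        details.append([0.5 * (prev[t] - prev[(t - shift) % n]) for t in range(n)])
    return smooth, details


series = [1.0, 3.0, 2.0, 6.0, 4.0, 8.0, 5.0, 7.0]
smooth, details = haar_modwt(series, levels=2)
# For the Haar filter the smooth and detail series sum back to the input,
# which is the additive form of formula (4).
rebuilt = [smooth[t] + sum(d[t] for d in details) for t in range(len(series))]
```

Because no downsampling takes place, every output series keeps the original length, so the coefficients stay aligned in time with the water level record.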
CN201711330726.1A 2017-12-13 2017-12-13 Feature selection decomposition method applied to river water level prediction data Expired - Fee Related CN107992447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711330726.1A CN107992447B (en) 2017-12-13 2017-12-13 Feature selection decomposition method applied to river water level prediction data

Publications (2)

Publication Number Publication Date
CN107992447A true CN107992447A (en) 2018-05-04
CN107992447B CN107992447B (en) 2019-12-17

Family

ID=62038276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711330726.1A Expired - Fee Related CN107992447B (en) 2017-12-13 2017-12-13 Feature selection decomposition method applied to river water level prediction data

Country Status (1)

Country Link
CN (1) CN107992447B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877029A (en) * 2009-11-25 2010-11-03 国网电力科学研究院 Hydrologic forecasting method of hydrologic model combination of different mechanisms
CN102789445A (en) * 2012-07-13 2012-11-21 南京大学 Establishment method for wavelet analysis and rank set pair analysis of medium and long-term hydrological forecast model
CN104050242A (en) * 2014-05-27 2014-09-17 哈尔滨理工大学 Feature selection and classification method based on maximum information coefficient and feature selection and classification device based on maximum information coefficient
JP2017015453A (en) * 2015-06-29 2017-01-19 株式会社東芝 Control processing apparatus for dam management and control processing method for dam management
CN105512767A (en) * 2015-12-15 2016-04-20 武汉大学 Flood forecasting method of multiple forecast periods
CN105577679A (en) * 2016-01-14 2016-05-11 华东师范大学 Method for detecting anomaly traffic based on feature selection and density peak clustering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
小木匠_: "Data normalization methods", 《HTTPS://BLOG.CSDN.NET/QQ_20823641/ARTICLE/DETAILS/51345057》 *
钱塘小甲子: "Maximal information coefficient (MIC)", 《HTTPS://BLOG.CSDN.NET/QTLYX/ARTICLE/DETAILS/50780400》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108921222A (en) * 2018-07-05 2018-11-30 四川泰立智汇科技有限公司 A kind of air-conditioning energy consumption feature selection approach based on big data
CN109488321A (en) * 2019-01-03 2019-03-19 天津大学 A kind of Cutter Head Torque in Shield Tunneling determines method and system
CN109736819A (en) * 2019-01-03 2019-05-10 天津大学 A kind of shield driving gross thrust determines method and system
CN109488321B (en) * 2019-01-03 2019-11-29 天津大学 A kind of Cutter Head Torque in Shield Tunneling determines method and system
CN110427663A (en) * 2019-07-17 2019-11-08 武汉大学 Face precipitation-water-level simulation method based on time series network
CN110311376A (en) * 2019-07-31 2019-10-08 三峡大学 A kind of Electrical Power System Dynamic security evaluation collective model and space-time method for visualizing
CN110311376B (en) * 2019-07-31 2022-12-20 三峡大学 Dynamic safety assessment comprehensive model and space-time visualization method for power system
CN111539587B (en) * 2020-03-06 2023-11-24 武汉极善信息技术有限公司 Hydrologic forecasting method
CN111539587A (en) * 2020-03-06 2020-08-14 李�杰 Hydrological forecasting method
US20210374864A1 (en) * 2020-05-29 2021-12-02 Fortia Financial Solutions Real-time time series prediction for anomaly detection
CN112529252A (en) * 2020-11-18 2021-03-19 贵州电网有限责任公司 Small hydropower station forebay water level prediction method and prediction system
CN112529252B (en) * 2020-11-18 2022-05-03 贵州电网有限责任公司 Small hydropower station forebay water level prediction method and prediction system
CN113222145A (en) * 2021-06-04 2021-08-06 西安邮电大学 MODWT-EMD-based time sequence hybrid prediction method
CN113222145B (en) * 2021-06-04 2023-12-22 西安邮电大学 MODWT-EMD-based time sequence hybrid prediction method
CN115905198A (en) * 2022-11-24 2023-04-04 中国长江电力股份有限公司 Water level data early warning method for key water level station of Yangtze river basin
CN115713164B (en) * 2022-11-26 2023-11-24 福建中锐汉鼎数字科技有限公司 Drainage basin downstream water level prediction method
CN115713164A (en) * 2022-11-26 2023-02-24 福建中锐汉鼎数字科技有限公司 Drainage basin downstream water level prediction method
CN115828757A (en) * 2022-12-12 2023-03-21 福建中锐汉鼎数字科技有限公司 Flood discharge hysteresis characteristic construction and selection method for basin water level prediction
CN115828757B (en) * 2022-12-12 2024-02-23 福建中锐汉鼎数字科技有限公司 Flood discharge hysteresis characteristic structure and selection method for drainage basin water level prediction

Also Published As

Publication number Publication date
CN107992447B (en) 2019-12-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20191217

Termination date: 20211213