CN108875962A - Kernel ridge regression online learning method based on fixed budget - Google Patents
- Publication number
- CN108875962A
- Authority
- CN
- China
- Prior art keywords
- ridge regression
- budget
- sample
- model
- kernel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to a kernel ridge regression online learning method based on a fixed budget. The method proceeds as follows:

- The budget value is first determined by numerical experiments.
- An initial learning sample set is constructed.
- A kernel ridge regression model is established and solved to obtain a predictor.
- The model is then updated using a low-rank matrix correction technique and the Sherman-Morrison-Woodbury formula to obtain an online predictor, which performs online prediction on a data stream.

Because the method uses a fixed-budget strategy, it effectively controls the size of the online learning model, saves storage space, reduces computational complexity, and is easy to implement. It can flexibly handle online prediction problems with data-stream characteristics, and data can be collected in blocks (chunks). Compared with traditional batch processing and existing online learning methods, it considerably reduces computational complexity and model running time, and it can efficiently handle both regression and classification problems.
Description
Technical field
The invention belongs to the fields of data mining and machine learning and relates to methods for data mining and data processing. Specifically, it relates to a kernel ridge regression online learning method based on a fixed budget.
Background art
Ridge regression gives up the unbiasedness of the regression coefficient estimate. At the cost of losing some information and some precision, it obtains more stable coefficient estimates, and on ill-conditioned data it fits better than least squares. Kernel ridge regression incorporates the kernel trick, can effectively handle nonlinear problems, and has therefore been widely applied.

Traditional kernel ridge regression models are solved with batch algorithms whose computational complexity is $O(n^3)$, where $n$ is the number of samples. However, more and more practical problems involve data with data-stream characteristics. Examples include dynamic industrial process optimization and real-time sensor monitoring, where sampled data are acquired continuously over time as a data stream. Because of their high computational complexity, batch algorithms are unsuitable for such problems.

Researchers in China and abroad have therefore begun to study online learning algorithms for kernel ridge regression that reduce computational complexity and model running time. A representative result is the incremental kernel ridge regression online learning method proposed by B.W. Chen. It updates the model iteratively with the Sherman-Morrison-Woodbury formula, reducing the cost of each model update from $O(n^3)$ to $O(n^2)$. However, because the number of samples grows linearly over time, the model's size, storage space, and running time all keep increasing.

There is therefore an urgent need for a kernel ridge regression online learning method based on a fixed-budget learning sample set. Such a method should control the model's storage space and learning time while maintaining model accuracy, so that it suits data-stream environments.
Summary of the invention
The object of the invention is to overcome shortcomings of existing kernel ridge regression online learning methods, such as their inability to control model size effectively. To this end it proposes a kernel ridge regression online learning method based on a fixed budget that reduces model storage space and running time and meets the real-time requirements of applications.
According to an embodiment of the present invention, a kernel ridge regression online learning method based on a fixed-budget learning sample set is proposed, comprising the following steps:
(1) Determine the budget value through numerical experiments.
(2) Construct the initial learning sample set and the predictor:
- Randomly select initial learning samples according to the budget to construct the initial learning sample set.
- Establish a ridge regression model, convert it by centering into a ridge regression model without an intercept, and obtain the ridge regression solution.
- Introduce the kernel trick to convert the ridge regression predictor equivalently into a kernel ridge regression predictor.
(3) Acquire the data stream in mini-batches or one sample at a time, and use the predictor to predict the samples in the data stream.
(4) Reject noise in the data stream using the 3σ rule to keep the predictor stable.
(5) Add some samples to the learning sample set according to their contribution values, and remove the same number of samples according to the minimum-contribution criterion so that the budget stays fixed.
(6) Update the kernel ridge regression model using the low-rank matrix correction technique and the Sherman-Morrison-Woodbury formula to obtain an online predictor, and use the online predictor to perform online prediction on the data stream.
In the learning method according to an embodiment of the present invention, the sub-steps for determining the budget in step (1) are:
(1) Determine a training sample set and a test sample set.
(2) Take each candidate budget value in turn. Randomly draw that number of samples from the training sample set, establish a kernel ridge regression model, and test the model's accuracy at that budget on the test sample set.
(3) Repeat sub-step (2) 10 times, and compute the average test accuracy and average test time for each budget.
(4) Plot dual-y-axis curves of average test accuracy and average test time, and choose a reasonable budget by weighing time cost against kernel ridge regression model accuracy.
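The budget-selection procedure above can be sketched in Python/NumPy as follows. This is a hypothetical illustration, not the patent's code. The following are all assumptions:

- the RBF scaling $\exp(-\lVert x-x'\rVert^2/(2\sigma^2))$,
- the regularization value `lam`,
- RMSE as the accuracy measure,
- timing training plus prediction as the "test time",
- the function names.

Setting `sigma` to the input dimension follows the default described in the embodiment.

```python
import time
import numpy as np

def rbf_kernel(A, B, sigma):
    # Gaussian RBF kernel; the 2*sigma**2 scaling convention is an assumption
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_krr(X, y, lam, sigma):
    # Dual coefficients alpha = (K + lam*I)^{-1} y
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def sweep_budgets(X_tr, y_tr, X_te, y_te, budgets, lam=1e-2, repeats=10, seed=0):
    """For each candidate budget, average test RMSE and wall-clock time over `repeats` random subsets."""
    rng = np.random.default_rng(seed)
    sigma = X_tr.shape[1]  # default kernel width = input dimension
    results = {}
    for n in budgets:
        errs, times = [], []
        for _ in range(repeats):
            idx = rng.choice(len(X_tr), size=n, replace=False)  # random subset of size n
            t0 = time.perf_counter()
            alpha = fit_krr(X_tr[idx], y_tr[idx], lam, sigma)
            pred = rbf_kernel(X_te, X_tr[idx], sigma) @ alpha
            times.append(time.perf_counter() - t0)
            errs.append(np.sqrt(np.mean((pred - y_te) ** 2)))
        results[n] = (float(np.mean(errs)), float(np.mean(times)))
    return results
```

Plotting the mean error and mean time from `results` against the budget on two y-axes gives the kind of trade-off curve used in sub-step (4) to pick the budget.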
In the learning method according to an embodiment of the present invention, the sub-steps for obtaining the predictor in step (2) are as follows.
Randomly select training samples according to the determined budget $n$ to construct the learning sample set, and establish the ridge regression model:
$$\min_{\beta,\,b}\ \sum_{i=1}^{n} e_i^2 + \lambda\lVert\beta\rVert^2 \quad \text{s.t.}\quad y_i = \beta^\top\varphi(x_i) + b + e_i,\ \ i = 1,\dots,n, \tag{1}$$
The symbols in (1) are:
- $\beta$: the coefficient vector of the ridge regression predictor;
- $b$: the intercept term;
- $e_i$: the error term;
- $\lambda$: the model regularization parameter;
- $\varphi(\cdot)$: the feature mapping, determined implicitly by specifying a kernel function.

The intercept term is removed from the model by centering. Each $x_{ij}$ is replaced by $x_{ij} - \bar{x}_j$, where $\bar{x}_j$ is the sample mean of the $j$-th input variable, and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ is used as the estimate of the intercept $b$. The coefficient vector $\beta$ can then be solved to obtain the ridge regression solution:
$$\beta = [\varphi^\top(X)\varphi(X) + \lambda I]^{-1}\varphi^\top(X)\,y, \tag{2}$$
where $\varphi(X) = [\varphi^\top(x_1); \dots; \varphi^\top(x_n)]$ and $y = [y_1; \dots; y_n]$.
The ridge regression solution (2) can be rewritten equivalently in the following inner-product form:
$$\beta = \varphi^\top(X)[\varphi(X)\varphi^\top(X) + \lambda I]^{-1}y. \tag{3}$$
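The equivalence of (2) and (3) rests on the push-through identity. Below is a short derivation. It writes $\Phi = \varphi(X)$ for brevity (an abbreviation introduced only here) and ignores the intercept removed by centering:

```latex
\begin{aligned}
\Phi^\top(\Phi\Phi^\top + \lambda I) &= \Phi^\top\Phi\Phi^\top + \lambda\Phi^\top = (\Phi^\top\Phi + \lambda I)\,\Phi^\top \\
\Longrightarrow\quad (\Phi^\top\Phi + \lambda I)^{-1}\Phi^\top &= \Phi^\top(\Phi\Phi^\top + \lambda I)^{-1}
  \qquad (\lambda > 0 \text{ makes both inverses exist}) \\
\Longrightarrow\quad f(x) = \varphi^\top(x)\,\beta &= \varphi^\top(x)\,\Phi^\top(\Phi\Phi^\top + \lambda I)^{-1}y
  = k(x,X)\,(K + \lambda I)^{-1}y
\end{aligned}
```

The last line uses $K = \Phi\Phi^\top$ and $k(x,X) = \varphi^\top(x)\Phi^\top$. The right-hand side involves only kernel evaluations, which is why the feature map never has to be formed explicitly.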
Further introducing the kernel trick yields the kernel ridge regression predictor:
$$f(x) = k(x,X)\,(K + \lambda I)^{-1}y, \tag{4}$$
where $K = \varphi(X)\varphi^\top(X)$, i.e. $K_{ij} = k(x_i, x_j)$, $k(x,X) = [k(x,x_1), k(x,x_2), \dots, k(x,x_n)]$, and $k(\cdot,\cdot)$ is the kernel function, specified by the user.
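Predictor (4), together with the centering step, can be sketched in NumPy as below. The following are assumptions not stated in the text:

- the kernel is passed as a callable `kernel(A, B)` returning pairwise kernel values;
- the intercept is estimated by $\bar y$ and subtracted from $y$ before solving, since the text does not say exactly how the intercept enters (4);
- the class name `KernelRidge` is hypothetical.

```python
import numpy as np

class KernelRidge:
    """Kernel ridge regression predictor f(x) = b + k(x - mu, Xc)(K + lam*I)^{-1}(y - b).

    Assumptions: inputs are centred column-wise, the intercept b is estimated by mean(y)
    and subtracted from y before solving, and kernel(A, B) returns the matrix of
    pairwise kernel values between the rows of A and B.
    """

    def __init__(self, kernel, lam):
        self.kernel, self.lam = kernel, lam

    def fit(self, X, y):
        self.mu = X.mean(axis=0)           # column means: x_ij -> x_ij - mean_j
        self.Xc = X - self.mu
        self.b = float(np.mean(y))         # intercept estimate
        K = self.kernel(self.Xc, self.Xc)  # Gram matrix K_ij = k(x_i, x_j)
        # Solve (K + lam*I) alpha = y - b instead of forming the inverse explicitly
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), y - self.b)
        return self

    def predict(self, X_new):
        # f(x) = b + k(x, X)(K + lam*I)^{-1}(y - b), with x centred by the training means
        return self.b + self.kernel(X_new - self.mu, self.Xc) @ self.alpha
```

With `kernel=lambda A, B: A @ B.T` (a linear kernel), the predictions coincide with those of the primal solution (2) on the centred data. This gives a quick numerical check of the dual form.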
In the learning method according to an embodiment of the present invention, step (4) works as follows. Once a sample's true label has been collected, it is compared with the predictor's output to compute the sample's contribution $|y_i - f(x_i)|$, and noise in the data stream is rejected according to the 3σ rule. Preferably, in step (5), the samples with the largest contribution values are added to the fixed-budget learning sample set. The same number of samples are then removed from that set according to the minimum-contribution criterion to maintain the budget.
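Steps (4) and (5) can be sketched as follows. The text leaves two details open, so this sketch makes assumptions for both:

- **3σ statistics:** the caller supplies a reference mean `mu` and standard deviation `sd`, for example from past residuals.
- **Stored-sample contributions:** each stored sample keeps the contribution it had when it was added.

The function names are hypothetical.

```python
import numpy as np

def three_sigma_keep(residuals, mu, sd):
    """True for samples whose signed residual lies within mu +/- 3*sd (the 3-sigma rule)."""
    return np.abs(residuals - mu) <= 3.0 * sd

def select_swap(y_new, f_new, contrib_set, mu, sd, q):
    """Return (indices of new samples to add, indices of stored samples to drop).

    Contribution is |y_i - f(x_i)|. New samples failing the 3-sigma test are treated as noise.
    Up to q highest-contribution survivors are added, and the same number of stored samples
    with the smallest stored contribution are dropped, so the budget stays unchanged.
    """
    r = y_new - f_new
    keep = np.flatnonzero(three_sigma_keep(r, mu, sd))
    c = np.abs(r[keep])
    add = keep[np.argsort(-c, kind="stable")[:q]]                   # largest contributions first
    drop = np.argsort(contrib_set, kind="stable")[: len(add)]       # minimum-contribution criterion
    return add, drop
```

Consider residuals of about `[0.1, 1.0, 10.0, 0.0]` with `mu = 0`, `sd = 1` and `q = 2`:

- The third new sample is rejected as noise.
- New samples 1 and 0 are added.
- Stored samples 3 and 1, which have the two smallest contributions in `[0.5, 0.05, 2.0, 0.01]`, are dropped.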
In the learning method according to an embodiment of the present invention, step (6) updates the kernel ridge regression model using the low-rank matrix correction technique and the Sherman-Morrison-Woodbury formula to obtain the online predictor. Its sub-steps are:
(1) Replace $m$ samples in the original learning sample set with $m$ samples from the data stream.
(2) Denote by $A = K + \lambda I$ the symmetric positive definite matrix of the old model that must be inverted, and construct the correction matrices $U \in \mathbb{R}^{n\times m}$ and $V \in \mathbb{R}^{n\times m}$.
(3) Use the constructed correction matrices $U, V \in \mathbb{R}^{n\times m}$ to correct the symmetric positive definite matrix $A$:
$$A + UV^\top + VU^\top \tag{8}$$
(4) Update the inverse of the corrected symmetric positive definite matrix (8) using the Sherman-Morrison-Woodbury formula:
$$(A + UV^\top + VU^\top)^{-1} = Q^{-1} - Q^{-1}V(I + U^\top Q^{-1}V)^{-1}U^\top Q^{-1}, \tag{9}$$
where $Q = A + UV^\top$, so that $Q^{-1} = A^{-1} - A^{-1}U(I + V^\top A^{-1}U)^{-1}V^\top A^{-1}$.
(5) Update the right-hand-side vector $y$ according to the new learning sample set to obtain the updated predictor, i.e. the online predictor.
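The explicit forms of $U$ and $V$ in sub-step (2) are not reproduced in this text. The sketch below therefore uses one construction, an assumption, that satisfies (8) exactly when the samples at positions $S$ are replaced:

- $U$ holds the identity columns for $S$.
- $V = C - \tfrac12 U C_{S}$, where $C$ is the change in the affected columns of $A$ and $C_S$ is its $S\times S$ block.

The inverse is then updated by the two Woodbury steps in (9). Two caveats apply:

- The intermediate $Q = A + UV^\top$ is not symmetric.
- Each step needs its $m\times m$ matrix to be nonsingular.

After the update, replacing $y$ at positions $S$ and forming $A_{\text{new}}^{-1}y$ gives the new coefficients of sub-step (5). The function names are hypothetical.

```python
import numpy as np

def replacement_factors(A_old, A_new, S):
    """Return U, V (n x m) with A_new = A_old + U V^T + V U^T.

    Assumes symmetric A_old, A_new that differ only in the rows/columns listed in S
    (the positions whose samples were replaced). Illustrative construction only.
    """
    n = A_old.shape[0]
    U = np.eye(n)[:, S]            # identity columns selecting the replaced positions
    C = (A_new - A_old)[:, S]      # change in the affected columns (n x m)
    V = C - 0.5 * U @ C[S, :]      # halve the S x S block so it is not counted twice
    return U, V

def woodbury_update(A_inv, U, V):
    """Return (A + U V^T + V U^T)^{-1} from A^{-1} via two Sherman-Morrison-Woodbury steps."""
    I = np.eye(U.shape[1])
    AU = A_inv @ U
    # Q^{-1} = A^{-1} - A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1}, with Q = A + U V^T
    Q_inv = A_inv - AU @ np.linalg.solve(I + V.T @ AU, V.T @ A_inv)
    QV = Q_inv @ V
    # Equation (9): (Q + V U^T)^{-1} = Q^{-1} - Q^{-1} V (I + U^T Q^{-1} V)^{-1} U^T Q^{-1}
    return Q_inv - QV @ np.linalg.solve(I + U.T @ QV, U.T @ Q_inv)
```

Each update costs $O(n^2 m)$ operations, versus $O(n^3)$ for re-inverting $A$ from scratch.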
The fixed-budget kernel ridge regression online learning method proposed by the invention works in four stages:
- Determine the size of the learning sample set, i.e. the budget.
- Select the initial learning sample set.
- Establish and solve the kernel ridge regression model to obtain the predictor.
- Update the model using the low-rank matrix correction technique and the Sherman-Morrison-Woodbury formula to obtain the online predictor, which performs online prediction on the data stream.

Because the method uses a fixed-budget strategy, it effectively controls the size of the online learning model, saves storage space, reduces computational complexity, and is easy to implement. It can flexibly handle online prediction problems with data-stream characteristics, and data can be collected in blocks. Compared with traditional batch processing and existing online learning methods, it considerably reduces computational complexity and model running time, and it can flexibly handle both regression and classification problems. In particular, for leave-one-out cross-validation it reduces the computational complexity from $O(n^4)$ to $O(n^3)$.
Description of the drawings
Figure 1 is a schematic diagram of the fixed-budget kernel ridge regression online learning method according to an embodiment of the invention.
Figure 2 shows, for an embodiment of the invention, how the budget (Budget Size) affects model accuracy and running time on the benchmark dataset Cpusmall.
Figure 3 shows how the data block size (Chunk Size) affects the average test time of the learning method of the invention and of existing learning methods on the benchmark dataset Cpusmall.
Figure 4 shows how the data block size (Chunk Size) affects the average test time of the learning method of the invention and of existing learning methods on the benchmark dataset Casp.
Table 1 compares the average online test accuracy and average test time of the learning method of the invention and existing learning methods on six benchmark datasets.
Specific embodiment
The embodiments of the present invention are further described below with reference to the accompanying drawings.
Embodiment: a regression problem is used as the example. As shown in Figure 1, the fixed-budget kernel ridge regression online learning method provided according to an embodiment of the invention comprises the following steps:
Step 1: Determine the budget value through numerical experiments. The sub-steps are:
(1) Select the dataset to be processed; this embodiment uses the benchmark dataset Cpusmall, which contains 8192 samples in total.
- The sample block size is set to 1.
- 5000 samples are randomly drawn from Cpusmall to form the training sample set, and the remaining samples form the test set.
- A Gaussian radial basis function is used as the kernel function, and the kernel width parameter σ takes its default value, namely the dimension of the samples.
(2) The set of candidate budgets is {200, 300, ..., 4900, 5000}.
(3) Take each budget from the set in sub-step (2) in turn. Randomly draw that number of samples from the training sample set to establish a kernel ridge regression model, and test the accuracy at that budget on the test set.
(4) Repeat sub-step (3) 10 times, and compute the average test accuracy and running time for each budget.
(5) Plot dual-y-axis curves of average test time and average test accuracy, as shown in Figure 2. Weighing time cost against model accuracy places the budget between 3500 and 4500. In this embodiment, without loss of generality, the budget is set to 4000.
Step 2: Build the initial predictor. This involves three operations:
- Randomly select the initial training set according to the budget and establish the ridge regression model.
- Convert it by centering into a ridge regression model without an intercept, and obtain the ridge regression solution.
- Convert the ridge regression predictor equivalently into the kernel ridge regression predictor via the kernel trick.

The sub-steps are as follows.
Randomly select training samples according to the determined budget $n$ to construct the learning sample set, and establish the ridge regression model:
$$\min_{\beta,\,b}\ \sum_{i=1}^{n} e_i^2 + \lambda\lVert\beta\rVert^2 \quad \text{s.t.}\quad y_i = \beta^\top\varphi(x_i) + b + e_i,\ \ i = 1,\dots,n, \tag{1}$$
Here $\beta$ is the coefficient vector of the ridge regression predictor, $b$ is the intercept term, and $e_i$ is the error term. $\lambda$ is the model regularization parameter, and $\varphi(\cdot)$ denotes the feature mapping, which is determined implicitly by specifying a kernel function.
The intercept term is removed from the model by centering. Each $x_{ij}$ is replaced by $x_{ij} - \bar{x}_j$, where $\bar{x}_j$ is the sample mean of the $j$-th input variable, and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ is used as the estimate of the intercept $b$. The coefficient vector $\beta$ can then be solved to obtain the ridge regression solution:
$$\beta = [\varphi^\top(X)\varphi(X) + \lambda I]^{-1}\varphi^\top(X)\,y, \tag{2}$$
where $\varphi(X) = [\varphi^\top(x_1); \dots; \varphi^\top(x_n)]$ and $y = [y_1; \dots; y_n]$.
The ridge regression solution (2) can be rewritten equivalently in the following inner-product form:
$$\beta = \varphi^\top(X)[\varphi(X)\varphi^\top(X) + \lambda I]^{-1}y. \tag{3}$$
Further introducing the kernel trick yields the kernel ridge regression predictor:
$$f(x) = k(x,X)\,(K + \lambda I)^{-1}y, \tag{4}$$
where $K = \varphi(X)\varphi^\top(X)$, i.e. $K_{ij} = k(x_i, x_j)$, $k(x,X) = [k(x,x_1), k(x,x_2), \dots, k(x,x_n)]$, and $k(\cdot,\cdot)$ is the kernel function, specified by the user.
Step 3: As shown in Figure 1, acquire the data stream in mini-batches, and use the predictor to predict the samples in the data stream.
Step 4: Reject noise in the data stream using the 3σ rule to keep the predictor stable.
Step 5: Once the true labels of the samples have been collected, compare them with the predictor's outputs to compute each sample's contribution $|y_i - f(x_i)|$. Add the samples with the largest contribution to the fixed-budget learning sample set. Then remove the same number of learning samples from that set according to the minimum-contribution criterion, so that the budget stays fixed.
Step 6: Update the kernel ridge regression model using the low-rank matrix correction technique and the Sherman-Morrison-Woodbury formula to obtain the online predictor, and use the online predictor to perform online prediction on the data stream.
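Steps 3 to 6 combine into a single predict-then-update loop. The sketch below is a hypothetical end-to-end illustration; the class `FixedBudgetKRR` and its parameters are invented for this example. It relies on several assumptions:

- **3σ statistics:** taken from previously accepted residuals.
- **Initial contributions:** the initial samples get zero contribution, so they are the first to be replaced.
- **Centering:** omitted for brevity.
- **Correction matrices:** $U$ holds the identity columns of the replaced positions, and $V = C - \tfrac12 U C_S$, with $C$ the change in those kernel columns.

The budget $n$ never changes, so each update costs $O(n^2 m)$ instead of a fresh $O(n^3)$ inversion.

```python
import numpy as np

def rbf(A, B, sigma):
    # Gaussian RBF kernel; the 2*sigma**2 scaling is an assumption
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2.0 * sigma**2))

def woodbury_sym(A_inv, U, V):
    """(A + U V^T + V U^T)^{-1} from A^{-1} by two Sherman-Morrison-Woodbury steps."""
    I = np.eye(U.shape[1])
    AU = A_inv @ U
    Q_inv = A_inv - AU @ np.linalg.solve(I + V.T @ AU, V.T @ A_inv)
    QV = Q_inv @ V
    return Q_inv - QV @ np.linalg.solve(I + U.T @ QV, U.T @ Q_inv)

class FixedBudgetKRR:
    """Hypothetical fixed-budget online KRR; the budget n = len(X0) never changes."""

    def __init__(self, X0, y0, lam=0.1, sigma=1.0, swap=1):
        self.X, self.y = X0.astype(float).copy(), y0.astype(float).copy()
        self.lam, self.sigma, self.swap = lam, sigma, swap
        self.A_inv = np.linalg.inv(rbf(self.X, self.X, sigma) + lam * np.eye(len(self.X)))
        self.contrib = np.zeros(len(self.X))  # stored contributions (initial samples: 0, assumption)
        self.history = []                     # accepted residuals, used for the 3-sigma test

    def predict(self, Xq):
        return rbf(Xq, self.X, self.sigma) @ (self.A_inv @ self.y)

    def partial_fit(self, Xc, yc):
        r = yc - self.predict(Xc)  # step 3: predict the chunk before learning from it
        h = np.asarray(self.history)
        if h.size >= 2:            # step 4: 3-sigma noise rejection
            keep = np.abs(r - h.mean()) <= 3.0 * h.std() + 1e-12
        else:
            keep = np.ones(r.size, dtype=bool)
        self.history.extend(r[keep].tolist())
        cand = np.flatnonzero(keep)
        add = cand[np.argsort(-np.abs(r[cand]), kind="stable")[: self.swap]]  # step 5: largest contributions
        if add.size == 0:
            return r
        S = np.argsort(self.contrib, kind="stable")[: add.size]  # minimum-contribution criterion
        old_cols = rbf(self.X, self.X[S], self.sigma)
        self.X[S], self.y[S], self.contrib[S] = Xc[add], yc[add], np.abs(r[add])
        C = rbf(self.X, self.X[S], self.sigma) - old_cols  # the lam*I part cancels
        U = np.eye(len(self.X))[:, S]
        V = C - 0.5 * U @ C[S, :]
        self.A_inv = woodbury_sym(self.A_inv, U, V)  # step 6: low-rank inverse update
        return r
```

Repeated Woodbury updates accumulate round-off error. A practical version might re-invert $A$ from scratch every so many chunks; this is a suggestion, not part of the described method.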
Figs. 3 and 4 compare, for different chunk sizes, the average test time of three methods on the benchmark datasets Casp and Cpusmall: the online learning method of the invention, the sliding-window kernel ridge regression method, and the LS-SVMs online learning method based on a budgeted support vector set. As the figures show, the test time of the online learning method of the invention is lower than that of the other two methods for every chunk size tested.
Table 1 lists the average online test accuracy and average test time of four methods on the benchmark datasets Abalonescale, Kin, Letters, Pendigits, Cpusmall and Poker:
- the online learning method of the invention;
- the existing incremental kernel ridge regression method;
- the sliding-window kernel ridge regression method;
- the LS-SVMs online learning method based on a budgeted support vector set.

Table 1 shows that, while maintaining test accuracy, the online learning method of the invention consistently has a shorter test time than the other methods.
Table 1
The above embodiment is intended to explain the present invention, not to limit it. Any modification or change made to the invention within its spirit and within the scope of protection of the claims falls within the scope of protection of the invention.
Claims (4)
1. A kernel ridge regression online learning method based on a fixed budget, characterized by comprising the following steps:
(1) Determine the budget value through numerical experiments.
(2) Construct the initial learning sample set and the predictor:
- Randomly select initial learning samples according to the budget to construct the initial learning sample set.
- Establish a ridge regression model, convert it by centering into a ridge regression model without an intercept, and obtain the ridge regression solution.
- Introduce the kernel trick to convert the ridge regression predictor equivalently into a kernel ridge regression predictor.
(3) Acquire the data stream in mini-batches or one sample at a time, and use the predictor to predict the samples in the data stream.
(4) Reject noise in the data stream using the 3σ rule to keep the predictor stable.
(5) Add some samples to the learning sample set according to their contribution values, and remove the same number of samples according to the minimum-contribution criterion so that the budget stays fixed.
(6) Update the kernel ridge regression model using the low-rank matrix correction technique and the Sherman-Morrison-Woodbury formula to obtain an online predictor, and use the online predictor to perform online prediction on the data stream.
2. The fixed-budget kernel ridge regression online learning method according to claim 1, characterized in that the sub-steps for determining the budget value in step (1) are:
(1) Determine a training sample set and a test sample set.
(2) Take each candidate budget value in turn. Randomly draw that number of samples from the training sample set, establish a kernel ridge regression model, and test the model's accuracy at that budget on the test sample set.
(3) Repeat sub-step (2) 10 times, and compute the average test accuracy and average test time for each budget.
(4) Plot dual-y-axis curves of average test accuracy and average test time, and choose a reasonable budget by weighing time cost against kernel ridge regression model accuracy.
3. The fixed-budget kernel ridge regression online learning method according to claim 1, characterized in that the sub-steps for obtaining the predictor in step (2) are as follows.
Randomly select training samples according to the determined budget $n$ to construct the learning sample set, and establish the ridge regression model:
$$\min_{\beta,\,b}\ \sum_{i=1}^{n} e_i^2 + \lambda\lVert\beta\rVert^2 \quad \text{s.t.}\quad y_i = \beta^\top\varphi(x_i) + b + e_i,\ \ i = 1,\dots,n, \tag{1}$$
Here $\beta$ is the coefficient vector of the ridge regression predictor, $b$ is the intercept term, and $e_i$ is the error term. $\lambda$ is the model regularization parameter, and $\varphi(\cdot)$ denotes the feature mapping, which is determined implicitly by specifying a kernel function.
The intercept term is removed from the model by centering. Each $x_{ij}$ is replaced by $x_{ij} - \bar{x}_j$, where $\bar{x}_j$ is the sample mean of the $j$-th input variable, and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ is used as the estimate of the intercept $b$. The coefficient vector $\beta$ can then be solved to obtain the ridge regression solution:
$$\beta = [\varphi^\top(X)\varphi(X) + \lambda I]^{-1}\varphi^\top(X)\,y, \tag{2}$$
where $\varphi(X) = [\varphi^\top(x_1); \dots; \varphi^\top(x_n)]$ and $y = [y_1; \dots; y_n]$.
The ridge regression solution (2) can be rewritten equivalently in the following inner-product form:
$$\beta = \varphi^\top(X)[\varphi(X)\varphi^\top(X) + \lambda I]^{-1}y. \tag{3}$$
Further introducing the kernel trick yields the kernel ridge regression predictor:
$$f(x) = k(x,X)\,(K + \lambda I)^{-1}y, \tag{4}$$
where $K = \varphi(X)\varphi^\top(X)$, i.e. $K_{ij} = k(x_i, x_j)$, $k(x,X) = [k(x,x_1), k(x,x_2), \dots, k(x,x_n)]$, and $k(\cdot,\cdot)$ is the kernel function, specified by the user.
4. The fixed-budget kernel ridge regression online learning method according to claim 1, characterized in that step (6) updates the kernel ridge regression model using the low-rank matrix correction technique and the Sherman-Morrison-Woodbury formula to obtain the online predictor, with the following sub-steps:
(1) Replace $m$ samples in the original learning sample set with $m$ samples from the data stream.
(2) Denote by $A = K + \lambda I$ the symmetric positive definite matrix of the old model that must be inverted, and construct the correction matrices $U \in \mathbb{R}^{n\times m}$ and $V \in \mathbb{R}^{n\times m}$.
(3) Use the constructed correction matrices $U, V \in \mathbb{R}^{n\times m}$ to correct the symmetric positive definite matrix $A$:
$$A + UV^\top + VU^\top \tag{8}$$
(4) Update the inverse of the corrected symmetric positive definite matrix (8) using the Sherman-Morrison-Woodbury formula:
$$(A + UV^\top + VU^\top)^{-1} = Q^{-1} - Q^{-1}V(I + U^\top Q^{-1}V)^{-1}U^\top Q^{-1}, \tag{9}$$
where $Q = A + UV^\top$, so that $Q^{-1} = A^{-1} - A^{-1}U(I + V^\top A^{-1}U)^{-1}V^\top A^{-1}$.
(5) Update the right-hand-side vector $y$ according to the new learning sample set to obtain the updated predictor, i.e. the online predictor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810593893.3A CN108875962A (en) | 2018-06-11 | 2018-06-11 | Core ridge regression on-line study method based on fixed budget |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810593893.3A CN108875962A (en) | 2018-06-11 | 2018-06-11 | Core ridge regression on-line study method based on fixed budget |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108875962A true CN108875962A (en) | 2018-11-23 |
Family
ID=64337912
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810593893.3A Pending CN108875962A (en) | 2018-06-11 | 2018-06-11 | Core ridge regression on-line study method based on fixed budget |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108875962A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111815681A (en) * | 2020-09-04 | 2020-10-23 | 中国科学院自动化研究所 | Target tracking method based on deep learning and discriminant model training and memory |
CN117952566A (en) * | 2024-03-25 | 2024-04-30 | 南京审计大学 | Project cost prediction method and computer system based on ridge regression machine learning |
2018
- 2018-06-11 CN CN201810593893.3A (CN108875962A), active, pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106485262B (en) | Bus load prediction method | |
Iskrev | Local identification in DSGE models | |
GB2601929A (en) | A machine-learning based architecture search method for a neural network | |
CN106021298B (en) | A kind of collaborative filtering recommending method and system based on asymmetric Weighted Similarity | |
CN106355192A (en) | Support vector machine method based on chaos and grey wolf optimization | |
CN109034175B (en) | Image processing method, device and equipment | |
CN112557034B (en) | Bearing fault diagnosis method based on PCA _ CNNS | |
CN109409425B (en) | Fault type identification method based on neighbor component analysis | |
CN111582538A (en) | Community value prediction method and system based on graph neural network | |
CN113393057A (en) | Wheat yield integrated prediction method based on deep fusion machine learning model | |
CN113674087A (en) | Enterprise credit rating method, apparatus, electronic device and medium | |
CN108875962A (en) | Core ridge regression on-line study method based on fixed budget | |
Davino et al. | Quantile composite-based path modeling | |
CN110324178B (en) | Network intrusion detection method based on multi-experience nuclear learning | |
CN105787507B (en) | LS SVMs on-line study methods based on budget supporting vector collection | |
CN112950048A (en) | National higher education system health evaluation based on fuzzy comprehensive evaluation | |
Westphal et al. | Improving model selection by employing the test data | |
CN114580151A (en) | Water demand prediction method based on gray linear regression-Markov chain model | |
CN108875961A (en) | A kind of online weighting extreme learning machine method based on pre- boundary's mechanism | |
CN111026661B (en) | Comprehensive testing method and system for software usability | |
Degeest et al. | Feature ranking in changing environments where new features are introduced | |
US11803815B1 (en) | System for the computer matching of targets using machine learning | |
Nikolikj et al. | Sensitivity Analysis of RF+ clust for Leave-one-problem-out Performance Prediction | |
CN113296947A (en) | Resource demand prediction method based on improved XGboost model | |
CN109902762A (en) | The data preprocessing method deviateed based on 1/2 similarity |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20181123 |