CN110097116A - A virtual sample generation method based on independent component analysis and kernel density estimation
- Publication number: CN110097116A
- Application number: CN201910357339.XA
- Authority: CN (China)
- Prior art keywords: sample, component analysis, independent component, density, virtual
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2134—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a virtual sample generation method based on independent component analysis and kernel density estimation. In the early stage of system operation, when the number of training samples is insufficient, the method uses kernel density estimation to estimate the probability density function of the whole sample population from a small number of samples. When the attributes of the original samples are correlated, independent component analysis is first used to remove the correlation between attributes, kernel density estimation is then performed, and virtual samples are generated from the estimated probability density function. The invention alleviates the shortage of training samples when training a machine learning model and thereby improves the model's accuracy. Compared with other virtual sample generation methods, the invention introduces independent component analysis to handle correlation between sample attributes, which broadens its range of application.
Description
Technical field
The invention belongs to the field of computing, and in particular relates to a virtual sample generation method based on independent component analysis and kernel density estimation.
Background
Machine learning methods are increasingly used in every field. Problems that classical statistical theory cannot solve are expected to be solved with machine learning. Sample size has a large effect on the accuracy of machine learning methods. In many cases, however, sampling time and cost are limited, so the sample size is often insufficient.
Virtual sample generation was first proposed by Niyogi et al. Wang Xu et al. divide virtual sample generation methods into three classes: those based on prior knowledge, those based on perturbation, and those based on the distribution function of the research field. When virtual sample generation was applied to the construction of an energy prediction model, it markedly improved the model's precision. Lee et al. used a potential information function to generate virtual samples and improved the performance of a neural-network-based demand forecasting model. Arora et al. generated virtual samples from empirical formulas and, using the data set augmented with virtual samples, successfully built a computational model based on an artificial neural network to estimate the battery heat generation rate.
Existing virtual sample generation methods mainly target samples whose attributes are mutually independent, and do not consider the correlation between sample attributes.
Summary of the invention
The purpose of the invention is to overcome the above shortcomings by providing a virtual sample generation method based on independent component analysis and kernel density estimation that is more widely applicable and simpler to operate, and that improves the accuracy of machine learning models.
To achieve the above purpose, the invention comprises the following steps:
Step 1: perform independent component analysis on the raw sample data to remove the correlation between attributes, and determine whether the analysis result converges;
Step 2: if it converges, estimate the probability density function of the independent samples by kernel density estimation and sample from it; if it does not converge, estimate the probability density function of the original samples by kernel density estimation and sample from it;
Step 3: using the result of the independent component analysis in Step 1, restore the correlation of the data sampled in the converged case of Step 2 by mapping the sampled data back to the original sample space, obtaining virtual samples;
Step 4: mix the virtual samples with the original samples to obtain the final expanded sample set.
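The control flow of Steps 1 to 4 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the applicant's implementation: `ica` is a hypothetical callable that returns a convergence flag, the independent samples and the mixing matrix; a single smoothing coefficient `h` is assumed for all attributes; and the sampling perturbs one randomly chosen sample with Gaussian noise.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
IcaResult = Tuple[bool, List[Vector], Matrix]  # (converged, sources, mixing)


def mat_vec(a: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix a by vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def kde_sample(points: List[Vector], h: float, count: int,
               rng: random.Random) -> List[Vector]:
    """Draw `count` points: pick a sample, add h * N(0, 1) to each attribute."""
    drawn = []
    for _ in range(count):
        base = rng.choice(points)
        drawn.append([value + h * rng.gauss(0.0, 1.0) for value in base])
    return drawn


def generate_virtual_samples(samples: List[Vector],
                             ica: Callable[[List[Vector]], IcaResult],
                             h: float, count: int,
                             rng: random.Random) -> List[Vector]:
    # Step 1: independent component analysis and convergence check.
    converged, sources, mixing = ica(samples)
    if converged:
        # Step 2: sample in the space of independent components.
        virtual_sources = kde_sample(sources, h, count, rng)
        # Step 3: map back to the original sample space with x = A s.
        virtual = [mat_vec(mixing, s_v) for s_v in virtual_sources]
    else:
        # Step 2 (fallback): sample directly from the original samples.
        virtual = kde_sample(samples, h, count, rng)
    # Step 4: mix the virtual samples with the original samples.
    return list(samples) + virtual
```

Passing the analysis in as a callable keeps the sketch independent of any particular ICA library; the fallback branch only runs when that callable reports non-convergence.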
When the attributes of the sample data are correlated, the result of the independent component analysis is obtained as follows:
Assume the small number of collected samples is
$x = (x_1, x_2, \ldots, x_n), \quad x \in \mathbb{R}^n$
Assume x is obtained by a linear transformation of n mutually independent random variables s, so that
$s = (s_1, s_2, \ldots, s_n), \quad s \in \mathbb{R}^n$
Assume A is the mixing matrix; then
$x^{(i)} = A s^{(i)}, \quad i = 1, 2, \ldots, m$, where A is constant;
where x is a collected sample and s is the independent random variable obtained by independent component analysis.
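A small numerical illustration of the mixing model may help. In the sketch below the 2x2 mixing matrix `A` is a hypothetical value chosen for the example and is assumed to be known; a real independent component analysis has to estimate it from the data. Mixing independent sources produces correlated attributes, and applying the inverse of A recovers the sources.

```python
import random


def mat_vec(a, v):
    """Multiply matrix a by vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def inverse_2x2(a):
    """Invert a 2x2 matrix; raises if it is singular."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[s / det, -q / det], [-r / det, p / det]]


def correlation(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sum((a - mean_u) ** 2 for a in u) ** 0.5
    norm_v = sum((b - mean_v) ** 2 for b in v) ** 0.5
    return cov / (norm_u * norm_v)


rng = random.Random(0)
A = [[1.0, 0.5], [0.3, 1.0]]  # hypothetical mixing matrix, assumed known

# Independent sources s and the observed, correlated samples x = A s.
sources = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(2000)]
mixed = [mat_vec(A, s) for s in sources]

# Undo the mixing with the inverse of A.
A_inv = inverse_2x2(A)
recovered = [mat_vec(A_inv, x) for x in mixed]

print("source correlation:", correlation(*zip(*sources)))
print("mixed correlation: ", correlation(*zip(*mixed)))
```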
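The probability density function is estimated by kernel density estimation as follows:
The mathematical expression of the kernel density estimate is
$\hat{f}(s) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{s - s_i}{h}\right)$
where $s_i$, $1 \le i \le n$, are the samples of s, K is the kernel function and h is the smoothing coefficient.
The smoothing coefficient h is solved from the mean integrated squared error function
$\mathrm{MISE}(h) = E\left[\int \left(\hat{f}(s) - f(s)\right)^2 \, ds\right]$
where f(s) is the true probability density function of s and $\hat{f}(s)$ is the estimate of f(s);
Once the smoothing coefficient h is solved, the estimate of the probability density function of s is complete.
A Gaussian function is used as the kernel function; its expression is
$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$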
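A kernel density estimate with a Gaussian kernel can be written in a few lines of Python. The sketch below uses the standard textbook form, the average of kernels centred on the samples and scaled by the smoothing coefficient h; the sample values and h in the usage lines are illustrative assumptions, not data from the patent.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel function K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(s, samples, h):
    """Kernel density estimate at point s with smoothing coefficient h."""
    n = len(samples)
    return sum(gaussian_kernel((s - s_i) / h) for s_i in samples) / (n * h)


# Illustrative values only.
samples = [-1.0, 0.0, 2.0]
h = 0.8
print(kde(0.0, samples, h))
```

Because every kernel integrates to one and the sum is divided by n, the estimate is itself a valid density for any h > 0.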
In Step 2, sampling is performed as follows:
$s_v = s_i + h s_r$, where $1 \le i \le n$ and $s_r \sim N(0, 1)$;
where $s_v$ is the sampled value.
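The sampling rule is the usual way of drawing from a Gaussian kernel density estimate: choose one original sample and add normally distributed noise scaled by h. A minimal sketch follows; choosing the index i uniformly at random is an assumption, as are the function name `draw_virtual` and the example values.

```python
import random


def draw_virtual(samples, h, count, rng):
    """Draw `count` virtual values using s_v = s_i + h * s_r."""
    virtual = []
    for _ in range(count):
        s_i = rng.choice(samples)   # original sample, index chosen uniformly
        s_r = rng.gauss(0.0, 1.0)   # s_r ~ N(0, 1)
        virtual.append(s_i + h * s_r)
    return virtual


rng = random.Random(42)             # seeded for reproducibility
print(draw_virtual([0.0, 10.0], h=0.5, count=5, rng=rng))
```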
The raw sample data are data for which the number of training samples is insufficient.
Compared with the prior art, in the early stage of system operation, when the number of training samples is insufficient, the invention uses kernel density estimation to estimate the probability density function of the whole sample population from a small number of samples. When the attributes of the original samples are correlated, independent component analysis is first used to remove the correlation between attributes, kernel density estimation is then performed, and virtual samples are generated from the estimated probability density function. The invention alleviates the shortage of training samples when training a machine learning model and improves the model's accuracy. Compared with other virtual sample generation methods, the invention introduces independent component analysis to handle correlation between sample attributes, which broadens its range of application.
Description of the drawings
Fig. 1 is a flowchart of the invention;
Fig. 2 is a schematic diagram of an embodiment of the invention.
Detailed description
The invention is further described below with reference to the drawings.
Referring to Fig. 1, the invention comprises the following steps:
Step 1: perform independent component analysis on raw sample data for which the number of training samples is insufficient, to remove the correlation between attributes, and determine whether the analysis result converges;
Step 2: if it converges, estimate the probability density function of the independent samples by kernel density estimation and sample from it; if it does not converge, estimate the probability density function of the original samples by kernel density estimation and sample from it;
Step 3: using the result of the independent component analysis in Step 1, restore the correlation of the data sampled in the converged case of Step 2 by mapping the sampled data back to the original sample space, obtaining virtual samples;
Step 4: mix the virtual samples with the data sampled in the non-converged case to obtain the final virtual samples.
When the attributes of the sample data are correlated, the result of the independent component analysis is obtained as follows:
Assume the small number of collected samples is
$x = (x_1, x_2, \ldots, x_n), \quad x \in \mathbb{R}^n$
Assume x is obtained by a linear transformation of n mutually independent random variables s, so that
$s = (s_1, s_2, \ldots, s_n), \quad s \in \mathbb{R}^n$
Assume A is the mixing matrix; then
$x^{(i)} = A s^{(i)}, \quad i = 1, 2, \ldots, m$;
where x is a collected sample, s is the independent random variable obtained by independent component analysis, and A is constant.
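The probability density function is estimated by kernel density estimation as follows:
The mathematical expression of the kernel density estimate is
$\hat{f}(s) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{s - s_i}{h}\right)$
where $s_i$, $1 \le i \le n$, are the samples of s, K is the kernel function and h is the smoothing coefficient.
The smoothing coefficient h is solved from the mean integrated squared error function
$\mathrm{MISE}(h) = E\left[\int \left(\hat{f}(s) - f(s)\right)^2 \, ds\right]$
where f(s) is the true probability density function of s and $\hat{f}(s)$ is the estimate of f(s);
Once the smoothing coefficient h is solved, the estimate of the probability density function of s is complete.
A Gaussian function is used as the kernel function; its expression is
$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$
Sampling is performed as follows:
$s_v = s_i + h s_r$, where $1 \le i \le n$, $s_r \sim N(0, 1)$, and $s_v$ is the sampled value.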
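The text does not say how the mean integrated squared error is minimized to obtain h. One common closed-form choice, shown here as a stand-in and not as the patent's procedure, is the normal-reference rule of thumb, which minimizes the asymptotic error for a Gaussian kernel when the true density is assumed to be normal: h = (4 / (3n))^(1/5) σ, roughly 1.06 σ n^(-1/5). The function name is hypothetical.

```python
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth for a Gaussian kernel (a stand-in for
    a full minimization of the mean integrated squared error)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    sigma = statistics.stdev(samples)   # sample standard deviation
    return (4.0 / (3.0 * n)) ** 0.2 * sigma


print(rule_of_thumb_bandwidth([1.0, 2.0, 3.0, 4.0, 5.0]))
```

When the data are far from normal, for example clearly bimodal, this rule tends to oversmooth, and a data-driven choice such as cross-validation may be preferable.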
Embodiment:
Let s = [-4, -3.5, -2, -1, -0.75, 1, 3, 3.2, 4, 4.2, 4, 6]. After kernel density estimation, the solid line in Fig. 2 is $\hat{f}(s)$, and the dotted lines depict the Gaussian kernel functions placed on each original sample. To generate a virtual sample, first select an original sample; in Fig. 2 the selected original sample is s = 3, and the blue curve depicts the Gaussian kernel function at s = 3. Then generate a one-dimensional random number $s_r$ drawn from the standard normal distribution; here $s_r = 0.29$. Finally, compute the virtual independent sample $s_v$ using the h obtained by kernel density estimation. In this example s = 3, $s_r = 0.29$ and h = 1.4614, so the formula gives $s_v = 3 + 0.29 \times 1.4614 \approx 3.4238$.
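The embodiment can be reproduced numerically. The sketch below takes the list s exactly as printed and the stated h = 1.4614. It treats each dotted curve as a Gaussian kernel scaled by 1/(nh), which is an assumption about how Fig. 2 was drawn, so that the solid curve is the sum of the dotted ones. With h = 1.4614 the virtual sample comes out at about 3.4238; truncating h to 1.461 gives 3.4237.

```python
import math

S = [-4, -3.5, -2, -1, -0.75, 1, 3, 3.2, 4, 4.2, 4, 6]  # as printed
H = 1.4614                                              # stated in the text


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_bump(s, s_i, h, n):
    """One dotted curve: the kernel centred on s_i, scaled by 1 / (n h)."""
    return gaussian_kernel((s - s_i) / h) / (n * h)


def density(s, samples, h):
    """Solid curve: the sum of the bumps of all original samples."""
    return sum(kernel_bump(s, s_i, h, len(samples)) for s_i in samples)


# Generate one virtual sample from the selected original sample s = 3.
s_i, s_r = 3, 0.29
s_v = s_i + H * s_r

print("density at s = 3:", density(3.0, S, H))
print("virtual sample:  ", round(s_v, 4))
```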
Sampling is repeated according to the above steps until the desired number of virtual independent samples $s_v$ is obtained. Finally, according to formula (3), the independent virtual samples are mapped back to the original sample space to obtain the virtual samples:
$x_v^{(i)} = A s_v^{(i)}, \quad i = 1, 2, \ldots, m.$
Claims (6)
1. A virtual sample generation method based on independent component analysis and kernel density estimation, characterized by comprising the following steps:
Step 1: perform independent component analysis on the raw sample data to remove the correlation between attributes, and determine whether the analysis result converges;
Step 2: if it converges, estimate the probability density function of the independent samples by kernel density estimation and sample from it; if it does not converge, estimate the probability density function of the original samples by kernel density estimation and sample from it;
Step 3: using the result of the independent component analysis in Step 1, restore the correlation of the data sampled in the converged case of Step 2 by mapping the sampled data back to the original sample space, obtaining virtual samples;
Step 4: mix the virtual samples with the original samples to obtain the final expanded sample set.
2. The virtual sample generation method based on independent component analysis and kernel density estimation according to claim 1, characterized in that, when the attributes of the sample data are correlated, the result of the independent component analysis is obtained as follows:
Assume the small number of collected samples is
$x = (x_1, x_2, \ldots, x_n), \quad x \in \mathbb{R}^n$
Assume x is obtained by a linear transformation of n mutually independent random variables s, so that
$s = (s_1, s_2, \ldots, s_n), \quad s \in \mathbb{R}^n$
Assume A is the mixing matrix; then
$x^{(i)} = A s^{(i)}, \quad i = 1, 2, \ldots, m$;
where x is a collected sample, s is the independent random variable obtained by independent component analysis, and A is constant.
3. The virtual sample generation method based on independent component analysis and kernel density estimation according to claim 1, characterized in that the probability density function is estimated by kernel density estimation as follows:
The mathematical expression of the kernel density estimate is
$\hat{f}(s) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{s - s_i}{h}\right)$
The smoothing coefficient h is solved from the mean integrated squared error function
$\mathrm{MISE}(h) = E\left[\int \left(\hat{f}(s) - f(s)\right)^2 \, ds\right]$
where f(s) is the true probability density function of s and $\hat{f}(s)$ is the estimate of f(s);
Once the smoothing coefficient h is solved, the estimate of the probability density function of s is complete.
4. The virtual sample generation method based on independent component analysis and kernel density estimation according to claim 3, characterized in that a Gaussian function is used as the kernel function, its expression being
$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$
5. The virtual sample generation method based on independent component analysis and kernel density estimation according to claim 1, characterized in that, in Step 2, sampling is performed as follows:
$s_v = s_i + h s_r$, where $1 \le i \le n$ and $s_r \sim N(0, 1)$;
where $s_v$ is the sampled value.
6. The virtual sample generation method based on independent component analysis and kernel density estimation according to claim 1, characterized in that the raw sample data are data for which the number of training samples is insufficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910357339.XA CN110097116A (en) | 2019-04-29 | 2019-04-29 | A virtual sample generation method based on independent component analysis and kernel density estimation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910357339.XA CN110097116A (en) | 2019-04-29 | 2019-04-29 | A virtual sample generation method based on independent component analysis and kernel density estimation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110097116A | 2019-08-06 |
Family
ID=67446517
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910357339.XA (Pending, published as CN110097116A) | 2019-04-29 | 2019-04-29 | A virtual sample generation method based on independent component analysis and kernel density estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110097116A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111160619A (en) * | 2019-12-06 | 2020-05-15 | 北京国电通网络技术有限公司 | Power load prediction method based on data derivation |
CN112098915A (en) * | 2020-11-05 | 2020-12-18 | 武汉格蓝若智能技术有限公司 | Method for evaluating secondary errors of multiple voltage transformers under double-bus segmented wiring |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kim et al. | Softflow: Probabilistic framework for normalizing flow on manifolds | |
CN110381523B (en) | Cellular base station network traffic prediction method based on TVF-EMD-LSTM model | |
CN107506822B (en) | Deep neural network method based on space fusion pooling | |
CN106202756B (en) | Deficient based on single layer perceptron determines blind source separating source signal restoration methods | |
CN110097116A (en) | A kind of virtual sample generation method based on independent component analysis and Density Estimator | |
CN111639808A (en) | Multi-wind-farm output scene generation method and system considering time-space correlation | |
CN111209974A (en) | Tensor decomposition-based heterogeneous big data core feature extraction method and system | |
CN111783209A (en) | Self-adaptive structure reliability analysis method combining learning function and kriging model | |
CN106204477B (en) | Video frequency sequence background restoration methods based on online low-rank background modeling | |
CN105895089A (en) | Speech recognition method and device | |
CN113627685B (en) | Wind driven generator power prediction method considering wind power internet load limit | |
Vu et al. | Accelerating iterative hard thresholding for low-rank matrix completion via adaptive restart | |
CN114528097A (en) | Cloud platform service load prediction method based on time sequence convolution neural network | |
CN105978733A (en) | Network flow modelling method and system based on Weibull distribution | |
CN109376651A (en) | The system that a kind of GPU based on CUDA framework accelerates spike classification | |
CN105844094B (en) | Blind source separating source signal restoration methods are determined based on gradient descent method and the deficient of Newton method | |
CN105354807B (en) | A kind of image analogy method based on parsing rarefaction representation | |
CN111460368A (en) | Parallel Bayesian optimization method | |
Okamura et al. | Estimating markov-modulated compound poisson processes | |
CN113158134B (en) | Method, device and storage medium for constructing non-invasive load identification model | |
CN109390946B (en) | Optimal probability load flow rapid calculation method based on multi-parameter planning theory | |
EP3678140A1 (en) | Method and device for simulating atomic dynamics | |
CN111476408A (en) | Power communication equipment state prediction method and system | |
CN104064195A (en) | Multidimensional blind separation method in noise environment | |
Kuznetsov et al. | Comparative analysis of two modified fast simulation methods for evaluation of the failure probability of a rank structure system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190806 |