CN110097116A - Virtual sample generation method based on independent component analysis and kernel density estimation - Google Patents

Virtual sample generation method based on independent component analysis and kernel density estimation

Info

Publication number
CN110097116A
Authority
CN
China
Prior art keywords
sample
component analysis
independent component
density
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910357339.XA
Other languages
Chinese (zh)
Inventor
董小社
袁坤
王龙翔
张兴军
王强
王宇菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201910357339.XA
Publication of CN110097116A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2134: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a virtual sample generation method based on independent component analysis (ICA) and kernel density estimation (KDE). In the early stage of system operation, when the number of training samples is insufficient, the method uses kernel density estimation to estimate the probability density function of the whole population from a small number of samples. When the attributes of the original samples are correlated, independent component analysis is first applied to remove the correlation between the attributes; kernel density estimation is then performed, and virtual samples are generated from the estimated probability density function. The invention alleviates the shortage of training samples when training a machine learning model and improves the accuracy of the model. Compared with other virtual sample generation methods, the invention introduces independent component analysis to handle correlation between sample attributes, which widens its range of application.

Description

Virtual sample generation method based on independent component analysis and kernel density estimation
Technical field
The invention belongs to the computer field, and in particular relates to a virtual sample generation method based on independent component analysis and kernel density estimation.
Background
Machine learning methods are increasingly used in many fields. Problems that classical statistical theory cannot solve are expected to be solved with machine learning. Sample size strongly affects the accuracy of machine learning methods. In many cases, however, limits on sampling time and cost mean that the sample size is insufficient.
Virtual sample generation was first proposed by Niyogi et al. Wang Xu et al. divide virtual sample generation methods into three classes: methods based on prior knowledge, methods based on perturbation, and methods based on the distribution function of the research field. Virtual sample generation has been applied to the construction of energy prediction models, where it significantly improved prediction precision. Lee et al. used a potential information function to generate virtual samples and improved the performance of a neural-network-based demand forecasting model. Arora et al. generated virtual samples from empirical formulas and, using a data set that included the virtual samples, successfully built a computational model based on an artificial neural network to estimate the battery heat generation rate.
Existing virtual sample generation methods mainly target samples whose attributes are mutually independent, and do not consider correlation between sample attributes.
Summary of the invention
The purpose of the present invention is to overcome the above shortcomings and to provide a virtual sample generation method based on independent component analysis and kernel density estimation that is more widely applicable and simpler to operate, and that improves the accuracy of machine learning models.
To achieve the above object, the present invention comprises the following steps:
Step 1: perform independent component analysis on the raw sample data to remove the correlation between attributes, and determine whether the analysis converges;
Step 2: if the analysis converges, estimate the probability density function of the independent samples by kernel density estimation and sample from it; if it does not converge, estimate the probability density function of the original samples by kernel density estimation and sample from it;
Step 3: using the result of the independent component analysis in step 1, restore the correlation in the data sampled in the converged case of step 2, mapping the sampled data back to the original sample space to obtain virtual samples;
Step 4: mix the virtual samples with the original samples to obtain the final expanded sample set.
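The control flow of the four steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `generate_virtual_samples`, the `ica` callable (assumed to return the independent samples, a square mixing matrix and a convergence flag) and the `bandwidth` callable (assumed to return one smoothing coefficient per attribute) are hypothetical names. Picking one original row per virtual sample and perturbing each attribute with its own bandwidth is also an assumption, since the source only gives the one-dimensional sampling rule.

```python
import random


def generate_virtual_samples(samples, ica, bandwidth, n_virtual, rng=random):
    """Sketch of steps 1-4; `samples` is a list of equal-length lists."""
    # Step 1: hypothetical ICA routine returning (sources, mixing matrix, converged flag).
    sources, mixing, converged = ica(samples)

    # Step 2: sample in the independent space if ICA converged, else in the original space.
    base = sources if converged else samples
    dim = len(base[0])
    h = [bandwidth([row[j] for row in base]) for j in range(dim)]
    drawn = []
    for _ in range(n_virtual):
        picked = rng.choice(base)  # one original sample s_i
        drawn.append([picked[j] + h[j] * rng.gauss(0.0, 1.0) for j in range(dim)])

    # Step 3: map converged-case draws back with the mixing matrix, x_v = A s_v.
    if converged:
        drawn = [[sum(mixing[r][c] * s[c] for c in range(dim)) for r in range(dim)]
                 for s in drawn]

    # Step 4: mix the virtual samples with the original samples.
    return [list(row) for row in samples] + drawn
```

Passing the ICA routine and the bandwidth rule as arguments keeps the sketch independent of any particular library; either can be swapped without touching the branch logic.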
When the attributes of the sample data are correlated, independent component analysis gives the following result:
Assume that the small number of collected samples is
$x = (x_1, x_2, \ldots, x_n),\ x \in \mathbb{R}^n$
Assume that $x$ is obtained by a linear transformation of $n$ mutually independent random variables $s$, where
$s = (s_1, s_2, \ldots, s_n),\ s \in \mathbb{R}^n$
Assume that $A$ is the mixing matrix; then
$x^{(i)} = A s^{(i)},\ i = 1, 2, \ldots, m$, where $A$ is constant;
Here $x$ is a collected sample and $s$ is the vector of independent random variables obtained by independent component analysis.
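The mixing model x = As can be illustrated with a small two-attribute example. The matrix `A` and the source values below are hypothetical, and `A` is treated as known only to show the relation between the two spaces; in practice independent component analysis has to estimate the mixing matrix from the observed samples.

```python
def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[c] * vector[c] for c in range(len(vector))) for row in matrix]


def inverse_2x2(matrix):
    """Invert a 2x2 matrix; raises if it is singular."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1.0, 0.5], [0.25, 1.0]]                       # hypothetical mixing matrix
sources = [[-1.0, 2.0], [0.5, -0.5], [3.0, 1.0]]    # hypothetical s^(i), i = 1..m
observed = [matvec(A, s) for s in sources]          # x^(i) = A s^(i)
recovered = [matvec(inverse_2x2(A), x) for x in observed]  # s^(i) = A^{-1} x^(i)
```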
The probability density function is estimated by kernel density estimation as follows:
The mathematical expression of the kernel density estimate is
$\hat{f}(s) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{s - s_i}{h}\right)$
The smoothing coefficient $h$ is solved by minimizing the mean integrated squared error
$\mathrm{MISE}(h) = E\left[\int \left(\hat{f}(s) - f(s)\right)^2 ds\right]$
where $f(s)$ is the true probability density function of $s$ and $\hat{f}(s)$ is the estimate of $f(s)$;
Once the smoothing coefficient $h$ has been solved, the estimate of the probability density function of $s$ is complete.
A Gaussian function is used as the kernel function; its expression is
$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$
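A Gaussian kernel density estimate of this kind can be evaluated with a few lines of standard-library Python. The sketch below is illustrative only: `gaussian_kernel` and `kde` are hypothetical helper names, the sample values and h = 1.4614 are taken as listed in the embodiment, and the integration grid is an arbitrary choice used to check that the estimate behaves like a density.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(s, samples, h):
    """Kernel density estimate at s: (1 / (n h)) * sum of K((s - s_i) / h)."""
    return sum(gaussian_kernel((s - s_i) / h) for s_i in samples) / (len(samples) * h)


samples = [-4, -3.5, -2, -1, -0.75, 1, 3, 3.2, 4, 4.2, 4, 6]  # values as listed in the embodiment
h = 1.4614                                                     # smoothing coefficient from the embodiment

# Riemann sum over [-20, 20]; the estimate should integrate to roughly 1.
step = 0.01
area = sum(kde(-20.0 + k * step, samples, h) for k in range(4001)) * step
```

Each original sample contributes one Gaussian bump of width h, and the estimate is the average of those bumps, which is why the total area stays close to 1.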
In step 2, sampling is performed as follows:
$s_v = s_i + h s_r$, where $1 \le i \le n$ and $s_r \sim N(0, 1)$;
Here $s_v$ is the sampled value.
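The sampling rule amounts to picking one original sample at random and adding Gaussian noise scaled by h. A minimal sketch, assuming the index i is drawn uniformly (the source does not say how i is chosen) and using the hypothetical name `draw_virtual`:

```python
import random


def draw_virtual(samples, h, rng=random):
    """Draw one virtual independent sample s_v = s_i + h * s_r."""
    s_i = rng.choice(samples)   # original sample, index chosen uniformly (assumption)
    s_r = rng.gauss(0.0, 1.0)   # s_r ~ N(0, 1)
    return s_i + h * s_r


rng = random.Random(0)          # seeded only to make the illustration repeatable
virtual = [draw_virtual([-4, -1, 3, 4.2], 1.4614, rng) for _ in range(5)]
```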
The raw sample data is data with an insufficient number of training samples.
Compared with the prior art, the present invention addresses the early stage of system operation, when the number of training samples is insufficient. It uses kernel density estimation to estimate the probability density function of the whole population from a small number of samples. When the attributes of the original samples are correlated, independent component analysis is first used to remove the correlation between the attributes; kernel density estimation is then performed, and virtual samples are generated from the estimated probability density function. The invention alleviates the shortage of training samples when training a machine learning model and improves the accuracy of the model. Compared with other virtual sample generation methods, the invention introduces independent component analysis to handle correlation between sample attributes, which widens its range of application.
Brief description of the drawings
Fig. 1 is a flow chart of the invention;
Fig. 2 is a schematic diagram of an embodiment of the invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1, the present invention comprises the following steps:
Step 1: perform independent component analysis on the raw sample data, which has an insufficient number of training samples, to remove the correlation between attributes, and determine whether the analysis converges;
Step 2: if the analysis converges, estimate the probability density function of the independent samples by kernel density estimation and sample from it; if it does not converge, estimate the probability density function of the original samples by kernel density estimation and sample from it;
Step 3: using the result of the independent component analysis in step 1, restore the correlation in the data sampled in the converged case of step 2, mapping the sampled data back to the original sample space to obtain virtual samples;
Step 4: mix the virtual samples with the data sampled in the non-converged case to obtain the final virtual samples.
When the attributes of the sample data are correlated, independent component analysis gives the following result:
Assume that the small number of collected samples is
$x = (x_1, x_2, \ldots, x_n),\ x \in \mathbb{R}^n$
Assume that $x$ is obtained by a linear transformation of $n$ mutually independent random variables $s$, where
$s = (s_1, s_2, \ldots, s_n),\ s \in \mathbb{R}^n$
Assume that $A$ is the mixing matrix; then
$x^{(i)} = A s^{(i)},\ i = 1, 2, \ldots, m$;
Here $x$ is a collected sample, $s$ is the vector of independent random variables obtained by independent component analysis, and $A$ is constant.
The probability density function is estimated by kernel density estimation as follows:
The mathematical expression of the kernel density estimate is
$\hat{f}(s) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{s - s_i}{h}\right)$
The smoothing coefficient $h$ is solved by minimizing the mean integrated squared error
$\mathrm{MISE}(h) = E\left[\int \left(\hat{f}(s) - f(s)\right)^2 ds\right]$
where $f(s)$ is the true probability density function of $s$ and $\hat{f}(s)$ is the estimate of $f(s)$;
Once the smoothing coefficient $h$ has been solved, the estimate of the probability density function of $s$ is complete.
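The source does not state how the minimization over h is carried out. One common closed-form route, shown here purely as an assumption, replaces the mean integrated squared error by its large-sample approximation (AMISE) and takes the unknown density to be normal with standard deviation σ (the normal reference rule). Here R(g) denotes the integral of g squared and μ₂(K) the second moment of the kernel; both symbols are introduced for this sketch only.

```latex
\begin{align*}
\mathrm{AMISE}(h) &= \frac{R(K)}{nh} + \frac{h^4}{4}\,\mu_2(K)^2\,R(f''),
  \qquad R(g) = \int g(u)^2\,du,\quad \mu_2(K) = \int u^2 K(u)\,du \\
\frac{d}{dh}\,\mathrm{AMISE}(h) = 0
  &\;\Longrightarrow\; h^{*} = \left[\frac{R(K)}{n\,\mu_2(K)^2\,R(f'')}\right]^{1/5} \\
\text{Gaussian kernel:}\quad & R(K) = \frac{1}{2\sqrt{\pi}},\qquad \mu_2(K) = 1 \\
\text{normal reference } f = N(\mu, \sigma^2):\quad & R(f'') = \frac{3}{8\sqrt{\pi}\,\sigma^5} \\
h^{*} &= \left(\frac{4}{3n}\right)^{1/5}\sigma \;\approx\; 1.06\,\sigma\,n^{-1/5}
\end{align*}
```

Other choices, such as cross-validation over h, minimize an estimate of the same error without the normality assumption.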
A Gaussian function is used as the kernel function; its expression is
$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$
Sampling is performed as follows:
$s_v = s_i + h s_r$, where $1 \le i \le n$, $s_r \sim N(0, 1)$, and $s_v$ is the sampled value.
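Drawing many values with this rule is equivalent to sampling from the Gaussian kernel density estimate, which is an equal-weight mixture of normals centred on the original samples. A quick check, under the assumption that i is chosen uniformly, is that the mean of the draws approaches the mean of the original samples and their variance approaches the population variance of the original samples plus h squared. The name `sample_kde`, the seed and the draw count are illustrative choices.

```python
import random
import statistics


def sample_kde(samples, h, count, rng):
    """Draw `count` virtual values, each s_i + h * s_r with s_r ~ N(0, 1)."""
    return [rng.choice(samples) + h * rng.gauss(0.0, 1.0) for _ in range(count)]


samples = [-4.0, -3.5, -2.0, -1.0, -0.75, 1.0, 3.0, 3.2, 4.0, 4.2, 4.0, 6.0]  # as listed in the embodiment
h = 1.4614
rng = random.Random(42)
virtual = sample_kde(samples, h, 20000, rng)

expected_mean = statistics.mean(samples)
expected_variance = statistics.pvariance(samples) + h * h  # mixture variance
observed_mean = statistics.mean(virtual)
observed_variance = statistics.pvariance(virtual)
```

The extra h squared term shows that a larger smoothing coefficient spreads the virtual samples further from the originals.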
Embodiment:
Let s = [-4, -3.5, -2, -1, -0.75, 1, 3, 3.2, 4, 4.2, 4, 6]. After kernel density estimation, the solid line in Fig. 2 is the estimated density $\hat{f}(s)$, and the dotted lines depict the Gaussian kernel functions placed on each original sample. To generate a virtual sample, one original sample is selected first; here the original sample s = 3 in Fig. 2 is selected, and the blue curve depicts the Gaussian kernel function at s = 3. A one-dimensional random number $s_r$ following the standard normal distribution is then generated; here $s_r = 0.29$. Finally, the virtual independent sample $s_v$ is computed using the $h$ obtained by kernel density estimation. In this example s = 3, $s_r = 0.29$ and $h = 1.4614$, so the sampling formula gives $s_v = 3 + 0.29 \times 1.4614 \approx 3.4238$.
Sampling is repeated according to the above steps until a satisfactory number of virtual independent samples $s_v$ has been obtained. Finally, according to formula (3), the independent virtual samples are mapped back to the original sample space to obtain the virtual samples,
$x_v^{(i)} = A s_v^{(i)},\ i = 1, 2, \ldots, m.$
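The two operations of the embodiment, drawing a virtual independent value and mapping it back, can be sketched as follows. The values s = 3, s_r = 0.29 and h = 1.4614 come from the embodiment; the mixing matrix `A` and the second independent component are hypothetical, because the embodiment is one-dimensional and does not give a mixing matrix.

```python
def sample_value(s_i, h, s_r):
    """Sampling rule of the embodiment: s_v = s_i + h * s_r."""
    return s_i + h * s_r


def map_back(mixing, s_v):
    """Map a virtual independent sample back to the original space: x_v = A s_v."""
    return [sum(row[c] * s_v[c] for c in range(len(s_v))) for row in mixing]


s_v = sample_value(3.0, 1.4614, 0.29)   # values taken from the embodiment

A = [[1.0, 0.5], [0.25, 1.0]]           # hypothetical mixing matrix
x_v = map_back(A, [s_v, -1.0])          # second component is a hypothetical draw
```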

Claims (6)

1. A virtual sample generation method based on independent component analysis and kernel density estimation, characterized by comprising the following steps:
Step 1: perform independent component analysis on the raw sample data to remove the correlation between attributes, and determine whether the analysis converges;
Step 2: if the analysis converges, estimate the probability density function of the independent samples by kernel density estimation and sample from it; if it does not converge, estimate the probability density function of the original samples by kernel density estimation and sample from it;
Step 3: using the result of the independent component analysis in step 1, restore the correlation in the data sampled in the converged case of step 2, mapping the sampled data back to the original sample space to obtain virtual samples;
Step 4: mix the virtual samples with the original samples to obtain the final expanded sample set.
2. The virtual sample generation method based on independent component analysis and kernel density estimation according to claim 1, characterized in that, when the attributes of the sample data are correlated, independent component analysis gives the following result:
Assume that the small number of collected samples is
$x = (x_1, x_2, \ldots, x_n),\ x \in \mathbb{R}^n$
Assume that $x$ is obtained by a linear transformation of $n$ mutually independent random variables $s$, where
$s = (s_1, s_2, \ldots, s_n),\ s \in \mathbb{R}^n$
Assume that $A$ is the mixing matrix; then
$x^{(i)} = A s^{(i)},\ i = 1, 2, \ldots, m$;
Here $x$ is a collected sample, $s$ is the vector of independent random variables obtained by independent component analysis, and $A$ is constant.
3. The virtual sample generation method based on independent component analysis and kernel density estimation according to claim 1, characterized in that the probability density function is estimated by kernel density estimation as follows:
The mathematical expression of the kernel density estimate is
$\hat{f}(s) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{s - s_i}{h}\right)$
The smoothing coefficient $h$ is solved by minimizing the mean integrated squared error
$\mathrm{MISE}(h) = E\left[\int \left(\hat{f}(s) - f(s)\right)^2 ds\right]$
where $f(s)$ is the true probability density function of $s$ and $\hat{f}(s)$ is the estimate of $f(s)$;
Once the smoothing coefficient $h$ has been solved, the estimate of the probability density function of $s$ is complete.
4. The virtual sample generation method based on independent component analysis and kernel density estimation according to claim 3, characterized in that a Gaussian function is used as the kernel function, and its expression is
$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$
5. The virtual sample generation method based on independent component analysis and kernel density estimation according to claim 1, characterized in that, in step 2, sampling is performed as follows:
$s_v = s_i + h s_r$, where $1 \le i \le n$ and $s_r \sim N(0, 1)$;
Here $s_v$ is the sampled value.
6. The virtual sample generation method based on independent component analysis and kernel density estimation according to claim 1, characterized in that the raw sample data is data with an insufficient number of training samples.
CN201910357339.XA 2019-04-29 2019-04-29 A kind of virtual sample generation method based on independent component analysis and Density Estimator Pending CN110097116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910357339.XA CN110097116A (en) 2019-04-29 2019-04-29 A kind of virtual sample generation method based on independent component analysis and Density Estimator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910357339.XA CN110097116A (en) 2019-04-29 2019-04-29 A kind of virtual sample generation method based on independent component analysis and Density Estimator

Publications (1)

Publication Number Publication Date
CN110097116A true CN110097116A (en) 2019-08-06

Family

ID=67446517

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910357339.XA Pending CN110097116A (en) 2019-04-29 2019-04-29 A kind of virtual sample generation method based on independent component analysis and Density Estimator

Country Status (1)

Country Link
CN (1) CN110097116A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160619A (en) * 2019-12-06 2020-05-15 北京国电通网络技术有限公司 Power load prediction method based on data derivation
CN112098915A (en) * 2020-11-05 2020-12-18 武汉格蓝若智能技术有限公司 Method for evaluating secondary errors of multiple voltage transformers under double-bus segmented wiring

Similar Documents

Publication Publication Date Title
Kim et al. Softflow: Probabilistic framework for normalizing flow on manifolds
CN110381523B (en) Cellular base station network traffic prediction method based on TVF-EMD-LSTM model
CN107506822B (en) Deep neural network method based on space fusion pooling
CN106202756B (en) Deficient based on single layer perceptron determines blind source separating source signal restoration methods
CN110097116A (en) A kind of virtual sample generation method based on independent component analysis and Density Estimator
CN111639808A (en) Multi-wind-farm output scene generation method and system considering time-space correlation
CN111209974A (en) Tensor decomposition-based heterogeneous big data core feature extraction method and system
CN111783209A (en) Self-adaptive structure reliability analysis method combining learning function and kriging model
CN106204477B (en) Video frequency sequence background restoration methods based on online low-rank background modeling
CN105895089A (en) Speech recognition method and device
CN113627685B (en) Wind driven generator power prediction method considering wind power internet load limit
Vu et al. Accelerating iterative hard thresholding for low-rank matrix completion via adaptive restart
CN114528097A (en) Cloud platform service load prediction method based on time sequence convolution neural network
CN105978733A (en) Network flow modelling method and system based on Weibull distribution
CN109376651A (en) The system that a kind of GPU based on CUDA framework accelerates spike classification
CN105844094B (en) Blind source separating source signal restoration methods are determined based on gradient descent method and the deficient of Newton method
CN105354807B (en) A kind of image analogy method based on parsing rarefaction representation
CN111460368A (en) Parallel Bayesian optimization method
Okamura et al. Estimating markov-modulated compound poisson processes
CN113158134B (en) Method, device and storage medium for constructing non-invasive load identification model
CN109390946B (en) Optimal probability load flow rapid calculation method based on multi-parameter planning theory
EP3678140A1 (en) Method and device for simulating atomic dynamics
CN111476408A (en) Power communication equipment state prediction method and system
CN104064195A (en) Multidimensional blind separation method in noise environment
Kuznetsov et al. Comparative analysis of two modified fast simulation methods for evaluation of the failure probability of a rank structure system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190806
