CN110458180A - Classifier training method based on small samples - Google Patents
Classifier training method based on small samples Download PDF Info
- Publication number
- CN110458180A CN110458180A CN201910351889.0A CN201910351889A CN110458180A CN 110458180 A CN110458180 A CN 110458180A CN 201910351889 A CN201910351889 A CN 201910351889A CN 110458180 A CN110458180 A CN 110458180A
- Authority
- CN
- China
- Prior art keywords
- label
- latent variable
- priori
- image
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Abstract
The invention discloses a classifier training method based on small samples, comprising the following steps. S1: set the parameters α and β, the learning rate, and the maximum number of training steps T. S2: read a batch of images from the training set, input each image x into the prior classifier, and obtain the value of the prior label ŷ and the value of the latent variable z of each image. S3: for the i-th class of the read images, compute the mean z̄ᵢ of the latent variables of that class and the assertion P(zᵢ) = N(αz̄ᵢ, βI). S4: repeat step S3 until the latent-variable mean and assertion have been computed for every class of the read images. S5: combine a posterior classifier with the learned image x and prior label ŷ to obtain the posterior label ỹ. S6: optimize the loss function to reduce its computational cost. S7: compute the gradient of the loss function with respect to every variable in the set Θ of variables to be optimized. The method converges quickly, takes less time to train, and can produce a high-precision classifier.
Description
Technical field
The present invention relates to the technical field of machine learning, and more particularly to a classifier training method based on small samples.
Background art
With the development of deep learning, image classification methods based on deep neural networks have been proposed. Such methods need to be trained on a large number of samples so that the deep neural network performs well. However, in some practical applications, such as object tracking or object detection, only a limited number of samples may be available, making it difficult to build a large, valuable, labeled sample set.
With only scarce samples, the supervised training process of a deep neural network is very difficult; it easily leads to insufficient expressive power and poor generalization of the model, and a deep neural network with insufficient training data often hits a performance limit as the network is deepened.
For the problem of deep learning with insufficient samples, prior work has compared the feature expressive power of convolutional neural network structures at different levels and proposed a layer-freezing method that fine-tunes a convolutional model, performing classification and recognition on small-scale data sets. However, this method converges slowly when the network structure is deep, and its training takes a long time.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and provide a classifier training method based on small samples that converges quickly, takes less time to train, and can produce a high-precision classifier.
To achieve the above object, the technical solution provided by the present invention is as follows:
The method comprises the following steps:
S1: set the parameters α and β, the learning rate, and the maximum number of training steps T;
S2: read a batch of images from the training set, input each image x into the prior classifier, and obtain the value of the prior label ŷ and the value of the latent variable z of each image;
S3: for the i-th class of the read images, compute the mean z̄ᵢ of the latent variables of that class and the assertion P(zᵢ) = N(αz̄ᵢ, βI);
S4: repeat step S3 until the latent-variable mean and assertion have been computed for every class of the read images;
S5: combine a posterior classifier with the learned image x and prior label ŷ to obtain the posterior label ỹ;
S6: optimize the loss function to reduce its computational cost;
S7: compute the gradient of the loss function with respect to every variable in the set Θ of variables to be optimized.
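The step sequence S1-S4 can be sketched as a short loop. The patent publishes no code, so the networks and data below are toy stand-ins; every function name, shape, and value is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the trained prior classifier; in the patent this
# is a CNN whose hidden layer g(x) produces the latent description z.
def prior_classifier(x):
    """Return a prior label y_hat and a latent description z for each image."""
    z = x.mean(axis=(1, 2))            # toy feature extractor g(x)
    y_hat = z.argmax(axis=-1)          # toy prior label
    return y_hat, z

def train(images, labels, alpha=0.5, beta=1.0, lr=1e-3, T=3):
    # S1: parameters alpha, beta, learning rate and max training steps T are given.
    for step in range(T):
        # S2: read a batch and run the prior classifier.
        y_hat, z = prior_classifier(images)
        # S3/S4: per-class latent means and assertions N(alpha * z_bar, beta * I).
        assertions = {}
        for i in np.unique(labels):
            z_bar = z[labels == i].mean(axis=0)
            assertions[i] = (alpha * z_bar, beta)   # (mean, isotropic variance)
        # S5-S7 (posterior classifier, loss, gradients) are elaborated later
        # in the description and omitted from this outline.
    return assertions

images = rng.normal(size=(8, 4, 4, 3))   # toy batch of 8 RGB 4x4 "images"
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
stats = train(images, labels)            # one assertion per class
```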
Further, the detailed process of step S2 is as follows:
A deep network (CNN) is trained as the prior classifier. The prior classifier takes the image x as input; its hidden layer g(x) acts as a feature extractor that produces a feature description z, also called the latent variable, of the input image; a prior label ŷ is then assigned to each feature description z.
Further, the detailed process by which step S5 obtains the posterior label ỹ is as follows:
S5-1: input the image x and the prior label ŷ into the encoder Q(z|x, ŷ) to obtain the mean μ and covariance σ of the latent variable; the distribution of the latent variable z satisfies z ~ N(μ, σ²I);
S5-2: based on the estimates of the first k classifiers and the inverse of the cross entropy, introduce a correction layer to correct the result of the prior label;
S5-3: sample a latent variable z⁽ʲ⁾ from the generated distribution N(μ, σ²I) and input it into the CNN, which assigns a label to the input image;
S5-4: repeat step S5-3 γ times to obtain a set of latent variables {z⁽¹⁾, …, z⁽ᵞ⁾}, where z⁽ʲ⁾ = μ + σ * ε, ε ~ N(0, 1), and * denotes the element-wise (dot) product;
S5-5: after the sampled latent variables are input into the decoder P(ỹ|z), the posterior label ỹ is output.
Further, in step S5-2, the correction layer takes the prior label obtained from the prior classifier as input. In the correction layer, the label of an image is drawn from a multinomial input distribution, and an activation function assigns a label to the sampled latent variable z. A subtraction operation computes the difference between the prior label and the true label of the first half of the classes, where L denotes the number of classes, and a label of 1 is assigned to each of the first L/2 classes.
Further, the computation of step S5-5 is as follows:
The sampled latent variables z⁽¹⁾, …, z⁽ᵞ⁾ are processed by an averaging operation and the cross entropy is computed. Given an arbitrary sample x of the i-th class, vᵢ is defined as the variable in the posterior classifier that is input to the top-level activation function softmax or sigmoid, yielding the output posterior label
ỹ = softmax(θᵥvᵢ + bᵥ),
where θᵥ and bᵥ are the parameters of the top-level activation function.
Further, the detailed process by which step S6 optimizes the loss function is as follows:
In the two formulas above, formula (1) is the loss function, Θ denotes the set of variables to be optimized in the posterior probability, and the second term of formula (1) can be rewritten as formula (2).
By introducing the idea of the stochastic gradient variational Bayes method, the mean of the latent variables of each class in a mini-batch is regarded as the mean of the global variables of all images of that class in the image set. Then, given a mini-batch sample B of any i-th class in the training set, the latent variable associated with the i-th class is defined as the mean z̄ᵢ of the latent variables of the class-i images in B;
the assertion about the latent variable of that class is defined as P(zᵢ) = N(αz̄ᵢ, βI);
the optimization problem can then be rewritten accordingly.
Further, the Adam method is used to solve the optimization problem of step S6.
In the Adam method, Θ denotes the set of all variables to be optimized, and θₜ₋₁ denotes any one of the variables in Θ at the t-th iteration.
First, the gradient gₜ of the objective with respect to the element θₜ₋₁ at the t-th iteration is computed by back-propagation. Then each parameter is updated by a three-step update rule:
Compute the bias-corrected first moment estimate: mₜ = β₁mₜ₋₁ + (1 − β₁)gₜ, m̂ₜ = mₜ/(1 − β₁ᵗ);
Compute the bias-corrected second moment estimate: vₜ = β₂vₜ₋₁ + (1 − β₂)gₜ², v̂ₜ = vₜ/(1 − β₂ᵗ);
Substitute the bias-corrected first and second moment estimates into the following formula to obtain the expression for any variable θₜ₋₁ at the t-th step:
θₜ = θₜ₋₁ − η·m̂ₜ/(√v̂ₜ + ε).
Computing the above formula updates and optimizes each parameter.
Compared with the prior art, the principles and advantages of this scheme are as follows:
1. The method is based on a variational auto-encoder (VAE) composed of a prior classifier and a posterior classifier. Unlike data augmentation methods that enhance the original data by adding perturbations or other means, it generates more examples by sampling more latent variables from the learned distribution, which avoids the insufficient expressive power and poor generalization of a deep neural network model caused by scarce training samples.
2. A correction layer is introduced in the encoder to correct the result of the prior label, compensating for the difference between the true label and the prior label.
3. The idea of the stochastic gradient variational Bayes method is adopted: the mean of the latent variables of each class in a mini-batch is regarded as the mean of the global variables of all images of that class in the image set, which greatly reduces the computational cost.
Description of the drawings
Fig. 1 is a workflow diagram of the classifier training method based on small samples of the present invention;
Fig. 2 is a flow chart of obtaining the posterior label by combining the prior classifier and the posterior classifier;
Fig. 3 is a schematic diagram of the correction layer.
Specific embodiment
The present invention is further explained below with reference to specific embodiments:
As shown in Figs. 1-3, the classifier training method based on small samples described in this embodiment comprises the following steps:
S1: set the parameters α and β, the learning rate, and the maximum number of training steps T;
S2: train a deep network (CNN) as the prior classifier. The prior classifier takes the image x as input; its hidden layer g(x) acts as a feature extractor that produces a feature description z, also called the latent variable, of the input image; a prior label ŷ is then assigned to each feature description z;
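As a rough illustration of a hidden layer g(x) doubling as a feature extractor that yields both the latent description z and a prior label, here is a minimal sketch; the layer sizes, random weights, and function names are assumptions for illustration, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative two-layer stand-in for the prior network: the hidden layer
# produces the latent description z, the top layer scores the prior label.
W1 = rng.normal(scale=0.1, size=(64, 16))   # hidden layer -> latent z
W2 = rng.normal(scale=0.1, size=(16, 10))   # top layer -> class scores

def g(x_flat):
    """Hidden layer acting as feature extractor: returns latent description z."""
    return np.maximum(0.0, x_flat @ W1)     # ReLU features

def prior_label(x_flat):
    """Assign a prior label y_hat to each feature description z."""
    z = g(x_flat)
    scores = z @ W2
    return z, scores.argmax(axis=-1)

x = rng.normal(size=(5, 64))                # 5 flattened toy images
z, y_hat = prior_label(x)
```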
S3: for the i-th class of the read images, compute the mean z̄ᵢ of the latent variables of that class and the assertion P(zᵢ) = N(αz̄ᵢ, βI), where the assertion means that the prior distribution of the latent variable obeys a Gaussian distribution;
S4: repeat step S3 until the latent-variable mean and assertion have been computed for every class of the read images;
S5: combine a posterior classifier with the learned image x and prior label ŷ to obtain the posterior label ỹ; the detailed process is as follows:
S5-1: input the image x and the prior label ŷ into the encoder Q(z|x, ŷ) to obtain the mean μ and covariance σ of the latent variable; the distribution of the latent variable z satisfies z ~ N(μ, σ²I);
S5-2: based on the estimates of the first k classifiers and the inverse of the cross entropy, introduce a correction layer to correct the result of the prior label;
Specifically, the correction layer takes the prior label obtained from the prior classifier as input. In the correction layer, the label of an image is drawn from a multinomial input distribution, and an activation function assigns a label to the sampled latent variable z. A subtraction operation computes the difference between the prior label and the true label of the first half of the classes, where L denotes the number of classes, and a label of 1 is assigned to each of the first L/2 classes;
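The correction layer is only loosely specified in the text. The sketch below is one plausible reading, assuming the prior label is a probability vector over L classes and the "first L/2 classes labeled 1" marker enters via the subtraction step; the corrective update coefficient is invented for illustration:

```python
import numpy as np

# Plausible sketch of the correction layer, NOT the definitive implementation:
# the patent states only that it subtracts the prior label from a marker that
# labels the first L/2 classes with 1.
def correction_layer(prior_probs, L):
    """prior_probs: (batch, L) prior label distribution; L: number of classes."""
    first_half = np.zeros(L)
    first_half[: L // 2] = 1.0             # first L/2 classes are labeled 1
    correction = first_half - prior_probs  # subtraction step from the text
    return prior_probs + 0.1 * correction  # hypothetical small corrective update

L = 6
prior = np.full((2, L), 1.0 / L)           # uniform prior labels for 2 images
corrected = correction_layer(prior, L)
```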
S5-3: sample a latent variable z⁽ʲ⁾ from the generated distribution N(μ, σ²I) and input it into the CNN, which assigns a label to the input image;
S5-4: repeat step S5-3 γ times to obtain a set of latent variables {z⁽¹⁾, …, z⁽ᵞ⁾}, where z⁽ʲ⁾ = μ + σ * ε, ε ~ N(0, 1), and * denotes the element-wise (dot) product;
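Steps S5-3 and S5-4 amount to the reparameterization trick stated in the text, z⁽ʲ⁾ = μ + σ * ε with ε ~ N(0, 1). A minimal sketch with toy values for μ and σ:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample gamma latent variables from N(mu, sigma^2 I) via z = mu + sigma * eps,
# eps ~ N(0, 1), where * is the element-wise product, as stated in the text.
def sample_latents(mu, sigma, gamma):
    eps = rng.standard_normal(size=(gamma,) + mu.shape)
    return mu + sigma * eps               # one z^(j) per repetition j

mu = np.array([0.5, -1.0, 2.0])           # encoder mean (toy values)
sigma = np.array([0.1, 0.2, 0.3])         # encoder std (toy values)
zs = sample_latents(mu, sigma, gamma=4)
```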
S5-5: after the sampled latent variables are input into the decoder P(ỹ|z), the posterior label ỹ is output. The computation is as follows:
The sampled latent variables z⁽¹⁾, …, z⁽ᵞ⁾ are processed by an averaging operation and the cross entropy is computed. Given an arbitrary sample x of the i-th class, vᵢ is defined as the variable in the posterior classifier that is input to the top-level activation function softmax or sigmoid, yielding the output posterior label
ỹ = softmax(θᵥvᵢ + bᵥ),
where θᵥ and bᵥ are the parameters of the top-level activation function;
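Step S5-5 (averaging the sampled latents and applying a top-level softmax) can be sketched as follows; θᵥ and bᵥ are random toy parameters here, and the latent and class dimensions are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

# Average the gamma sampled latent variables into the posterior-classifier
# variable v_i, then apply the top-level softmax y_tilde = softmax(theta_v v_i + b_v).
def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

zs = rng.normal(size=(4, 8))              # 4 sampled latents of dimension 8
v_i = zs.mean(axis=0)                     # averaging operation over samples
theta_v = rng.normal(scale=0.1, size=(5, 8))
b_v = np.zeros(5)
y_tilde = softmax(theta_v @ v_i + b_v)    # posterior label over 5 classes
```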
S6: after the posterior label ỹ is obtained, optimize the loss function to reduce its computational cost;
The specific optimization process is as follows:
In the two formulas above, formula (1) is the loss function, Θ denotes the set of variables to be optimized in the posterior probability, and the second term of formula (1) can be rewritten as formula (2).
From the above it can be seen that, owing to the presence of the assertion P(zᵢ) = N(αz̄ᵢ, βI), the cost of optimizing the loss function is very large. This is because z̄ᵢ denotes the mean of the latent variables of all images in the i-th class, so computing it requires traversing all images of every class in the data set;
For this purpose, this embodiment introduces the idea of the stochastic gradient variational Bayes method: the mean of the latent variables of each class in a mini-batch is regarded as the mean of the global variables of all images of that class in the image set. Then, given a mini-batch sample B of any i-th class in the training set, the latent variable associated with the i-th class is defined as the mean z̄ᵢ of the latent variables of the class-i images in B;
the assertion about the latent variable of that class is defined as P(zᵢ) = N(αz̄ᵢ, βI);
the optimization problem can then be rewritten accordingly.
S7: compute the gradient of the loss function with respect to every variable in the set Θ of variables to be optimized.
In the back-propagation of the algorithm, the output ŷ of the prior classifier is treated as a constant, so the gradient in formula (1) cannot be obtained with respect to it; that is, the loss gradient of formula (1) cannot propagate back to the previous classifier. The Adam method is a low-order-moment optimizer based on gradient descent and adaptive moment estimation; it is widely used in mini-batch deep learning optimization and helps the loss function converge toward zero. This embodiment therefore uses the Adam method to solve the optimization problem stated in the previous step.
In the Adam method, Θ denotes the set of all variables to be optimized, and θₜ₋₁ denotes any one of the variables in Θ at the t-th iteration.
First, the gradient gₜ of the objective with respect to the element θₜ₋₁ at the t-th iteration is computed by back-propagation. Then each parameter is updated by a three-step update rule:
Compute the bias-corrected first moment estimate: mₜ = β₁mₜ₋₁ + (1 − β₁)gₜ, m̂ₜ = mₜ/(1 − β₁ᵗ);
Compute the bias-corrected second moment estimate: vₜ = β₂vₜ₋₁ + (1 − β₂)gₜ², v̂ₜ = vₜ/(1 − β₂ᵗ);
Substitute the bias-corrected first and second moment estimates into the following formula to obtain the expression for any variable θₜ₋₁ at the t-th step:
θₜ = θₜ₋₁ − η·m̂ₜ/(√v̂ₜ + ε).
Computing the above formula updates and optimizes each parameter.
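The three-step Adam update rule can be sketched as follows; β₁ = 0.9 and β₂ = 0.999 are the common defaults, which the patent does not specify, so they are assumptions here:

```python
import numpy as np

# Standard Adam update for one variable: first/second moment accumulation,
# bias correction, then the parameter step. Hyper-parameters are assumed
# defaults, not taken from the patent.
def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g             # first moment estimate m_t
    v = b2 * v + (1 - b2) * g * g         # second moment estimate v_t
    m_hat = m / (1 - b1 ** t)             # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)             # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0])
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 4):                     # three iterations on gradient 2*theta
    g = 2.0 * theta
    theta, m, v = adam_step(theta, g, m, v, t)
```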
This embodiment is based on a variational auto-encoder (VAE) composed of a prior classifier and a posterior classifier. Unlike data augmentation methods that enhance the original data by adding perturbations or other means, it generates more examples by sampling more latent variables from the learned distribution, which avoids the insufficient expressive power and poor generalization of a deep neural network model caused by scarce training samples. In addition, a correction layer is introduced in the encoder to correct the result of the prior label, compensating for the difference between the true label and the prior label. Furthermore, the idea of the stochastic gradient variational Bayes method is adopted: the mean of the latent variables of each class in a mini-batch is regarded as the mean of the global variables of all images of that class in the image set, which greatly reduces the computational cost.
The embodiments described above are only preferred embodiments of the present invention, and the scope of implementation of the present invention is not limited thereto; accordingly, all changes made according to the shapes and principles of the present invention shall fall within the protection scope of the present invention.
Claims (7)
1. A classifier training method based on small samples, characterized by comprising the following steps:
S1: set the parameters α and β, the learning rate, and the maximum number of training steps T;
S2: read a batch of images from the training set, input each image x into the prior classifier, and obtain the value of the prior label ŷ and the value of the latent variable z of each image;
S3: for the i-th class of the read images, compute the mean z̄ᵢ of the latent variables of that class and the assertion P(zᵢ) = N(αz̄ᵢ, βI);
S4: repeat step S3 until the latent-variable mean and assertion have been computed for every class of the read images;
S5: combine a posterior classifier with the learned image x and prior label ŷ to obtain the posterior label ỹ;
S6: optimize the loss function to reduce its computational cost;
S7: compute the gradient of the loss function with respect to every variable in the set Θ of variables to be optimized.
2. The classifier training method based on small samples according to claim 1, characterized in that the detailed process of step S2 is as follows:
a deep network (CNN) is trained as the prior classifier; the prior classifier takes the image x as input, its hidden layer g(x) acts as a feature extractor that produces a feature description z, also called the latent variable, of the input image, and a prior label ŷ is then assigned to each feature description z.
3. The classifier training method based on small samples according to claim 1, characterized in that the detailed process by which step S5 obtains the posterior label ỹ is as follows:
S5-1: input the image x and the prior label ŷ into the encoder Q(z|x, ŷ) to obtain the mean μ and covariance σ of the latent variable; the distribution of the latent variable z satisfies z ~ N(μ, σ²I);
S5-2: based on the estimates of the first k classifiers and the inverse of the cross entropy, introduce a correction layer to correct the result of the prior label;
S5-3: sample a latent variable z⁽ʲ⁾ from the generated distribution N(μ, σ²I) and input it into the CNN, which assigns a label to the input image;
S5-4: repeat step S5-3 γ times to obtain a set of latent variables {z⁽¹⁾, …, z⁽ᵞ⁾}, where z⁽ʲ⁾ = μ + σ * ε, ε ~ N(0, 1), and * denotes the element-wise (dot) product;
S5-5: input the sampled latent variables into the decoder P(ỹ|z) and output the posterior label ỹ.
4. The classifier training method based on small samples according to claim 3, characterized in that in step S5-2 the correction layer takes the prior label obtained from the prior classifier as input; in the correction layer, the label of an image is drawn from a multinomial input distribution, and an activation function assigns a label to the sampled latent variable z; a subtraction operation computes the difference between the prior label and the true label of the first half of the classes, where L denotes the number of classes, and a label of 1 is assigned to each of the first L/2 classes.
5. The classifier training method based on small samples according to claim 3, characterized in that the computation of step S5-5 is as follows:
the sampled latent variables z⁽¹⁾, …, z⁽ᵞ⁾ are processed by an averaging operation and the cross entropy is computed; given an arbitrary sample x of the i-th class, vᵢ is defined as the variable in the posterior classifier that is input to the top-level activation function softmax or sigmoid, yielding the output posterior label ỹ = softmax(θᵥvᵢ + bᵥ), where θᵥ and bᵥ are the parameters of the top-level activation function.
6. The classifier training method based on small samples according to claim 1, characterized in that the detailed process by which step S6 optimizes the loss function is as follows:
in the two formulas above, formula (1) is the loss function, Θ denotes the set of variables to be optimized in the posterior probability, and the second term of formula (1) can be rewritten as formula (2);
by introducing the idea of the stochastic gradient variational Bayes method, the mean of the latent variables of each class in a mini-batch is regarded as the mean of the global variables of all images of that class in the image set; then, given a mini-batch sample B of any i-th class in the training set, the latent variable associated with the i-th class is defined as the mean z̄ᵢ of the latent variables of the class-i images in B;
the assertion about the latent variable of that class is defined as P(zᵢ) = N(αz̄ᵢ, βI);
the optimization problem can then be rewritten accordingly.
7. The classifier training method based on small samples according to claim 6, characterized in that the Adam method is used to solve the optimization problem of step S6:
in the Adam method, Θ denotes the set of all variables to be optimized, and θₜ₋₁ denotes any one of the variables in Θ at the t-th iteration;
first, the gradient gₜ of the objective with respect to the element θₜ₋₁ at the t-th iteration is computed by back-propagation; then each parameter is updated by a three-step update rule:
(1) compute the bias-corrected first moment estimate: mₜ = β₁mₜ₋₁ + (1 − β₁)gₜ, m̂ₜ = mₜ/(1 − β₁ᵗ);
(2) compute the bias-corrected second moment estimate: vₜ = β₂vₜ₋₁ + (1 − β₂)gₜ², v̂ₜ = vₜ/(1 − β₂ᵗ);
(3) substitute the bias-corrected first and second moment estimates into the following formula to obtain the expression for any variable θₜ₋₁ at the t-th step: θₜ = θₜ₋₁ − η·m̂ₜ/(√v̂ₜ + ε);
computing the above formula updates and optimizes each parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910351889.0A CN110458180B (en) | 2019-04-28 | 2019-04-28 | Classifier training method based on small samples |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110458180A true CN110458180A (en) | 2019-11-15 |
CN110458180B CN110458180B (en) | 2023-09-19 |
Family
ID=68480903
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910351889.0A Active CN110458180B (en) | 2019-04-28 | 2019-04-28 | Classifier training method based on small samples |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458180B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103020971A (en) * | 2012-12-28 | 2013-04-03 | 青岛爱维互动信息技术有限公司 | Method for automatically segmenting target objects from images |
CN105117429A (en) * | 2015-08-05 | 2015-12-02 | 广东工业大学 | Scenario image annotation method based on active learning and multi-label multi-instance learning |
US20180150728A1 (en) * | 2016-11-28 | 2018-05-31 | D-Wave Systems Inc. | Machine learning systems and methods for training with noisy labels |
CN108932705A (en) * | 2018-06-27 | 2018-12-04 | 北京工业大学 | A kind of image processing method based on matrix variables variation self-encoding encoder |
- 2019-04-28: application CN201910351889.0A granted as patent CN110458180B (en), status Active
Also Published As
Publication number | Publication date |
---|---|
CN110458180B (en) | 2023-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Messikommer et al. | Event-based asynchronous sparse convolutional networks | |
CN107330480B (en) | Computer recognition method for hand-written character | |
CN106203625B (en) | A kind of deep-neural-network training method based on multiple pre-training | |
Shih et al. | Grey number prediction using the grey modification model with progression technique | |
CN109754078A (en) | Method for optimization neural network | |
CN111080675A (en) | Target tracking method based on space-time constraint correlation filtering | |
CN107330355A (en) | A kind of depth pedestrian based on positive sample Constraints of Equilibrium identification method again | |
Sun et al. | A spatially constrained shifted asymmetric Laplace mixture model for the grayscale image segmentation | |
CN108256630A (en) | A kind of over-fitting solution based on low dimensional manifold regularization neural network | |
CN114359631A (en) | Target classification and positioning method based on coding-decoding weak supervision network model | |
Gaedke-Merzhäuser et al. | Multilevel minimization for deep residual networks | |
Talaván et al. | A continuous Hopfield network equilibrium points algorithm | |
Yang et al. | Accelerating the training process of convolutional neural networks for image classification by dropping training samples out | |
CN108509986A (en) | Based on the Aircraft Target Recognition for obscuring constant convolutional neural networks | |
Lee et al. | An edge detection–based eGAN model for connectivity in ambient intelligence environments | |
Chen et al. | A novel neural network training framework with data assimilation | |
Springer et al. | Robust parameter estimation of chaotic systems | |
CN107529647B (en) | Cloud picture cloud amount calculation method based on multilayer unsupervised sparse learning network | |
CN110458180A (en) | A kind of classifier training method based on small sample | |
CN111598580A (en) | XGboost algorithm-based block chain product detection method, system and device | |
CN116486150A (en) | Uncertainty perception-based regression error reduction method for image classification model | |
Wang et al. | Single image rain removal via cascading attention aggregation network on challenging weather conditions | |
Terekhov et al. | Text CAPTCHA Traversal via Knowledge Distillation of Convolutional Neural Networks: Exploring the Impact of Color Channels Selection | |
CN114708501A (en) | Remote sensing image building change detection method based on condition countermeasure network | |
CN113989256A (en) | Detection model optimization method, detection method and detection device for remote sensing image building |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||