CN110458180A - Classifier training method based on small samples - Google Patents
Classifier training method based on small samples Download PDF Info
- Publication number
- CN110458180A CN110458180A CN201910351889.0A CN201910351889A CN110458180A CN 110458180 A CN110458180 A CN 110458180A CN 201910351889 A CN201910351889 A CN 201910351889A CN 110458180 A CN110458180 A CN 110458180A
- Authority
- CN
- China
- Prior art keywords
- label
- latent variable
- priori
- image
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Abstract
The invention discloses a classifier training method based on small samples, comprising the following steps. S1: set the parameters α and β, the learning rate, and the maximum number of training steps T. S2: read a batch of images from the training set, input each image x into the prior classifier, and obtain the value of the prior label ŷ and the value of the latent variable z of each image. S3: for the i-th class of the read images, compute the mean z̄ᵢ of the latent variables of that class and the assertion P(zᵢ) = N(αz̄ᵢ, βI). S4: repeat step S3 until the latent-variable mean and assertion have been computed for every class of the read images. S5: combine a posterior classifier with the learned image x and prior label ŷ to obtain the posterior label ỹ. S6: optimize the loss function to reduce its computational cost. S7: compute the gradient of the loss function with respect to every variable in the set Θ of variables to be optimized. The method converges quickly, takes less time to train, and can produce a high-precision classifier.
Description
Technical field
The present invention relates to the technical field of machine learning, and more particularly to a classifier training method based on small samples.
Background art
With the development of deep learning, image classification methods based on deep neural networks have been proposed. Such methods need to be trained on a large number of samples so that the deep neural network performs well. However, in some practical applications, such as object tracking or object detection, only a limited number of samples may be available, making it difficult to build a large, valuable, labeled sample set.
With only scarce samples, the supervised training process of a deep neural network is very difficult; it easily leads to insufficient expressive power and poor generalization of the model, and a deep neural network with insufficient training data often hits a performance limit as the network is deepened.
For the problem of deep learning with insufficient samples, prior work has compared the feature expressive power of convolutional neural network structures at different levels and proposed a layer-freezing method that fine-tunes a convolutional model, performing classification and recognition on small-scale data sets. However, this method converges slowly when the network structure is deep, and its training takes a long time.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and provide a classifier training method based on small samples that converges quickly, takes less time to train, and can produce a high-precision classifier.
To achieve the above object, the technical solution provided by the present invention is as follows:
The method comprises the following steps:
S1: set the parameters α and β, the learning rate, and the maximum number of training steps T;
S2: read a batch of images from the training set, input each image x into the prior classifier, and obtain the value of the prior label ŷ and the value of the latent variable z of each image;
S3: for the i-th class of the read images, compute the mean z̄ᵢ of the latent variables of that class and the assertion P(zᵢ) = N(αz̄ᵢ, βI);
S4: repeat step S3 until the latent-variable mean and assertion have been computed for every class of the read images;
S5: combine a posterior classifier with the learned image x and prior label ŷ to obtain the posterior label ỹ;
S6: optimize the loss function to reduce its computational cost;
S7: compute the gradient of the loss function with respect to every variable in the set Θ of variables to be optimized.
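The step sequence S1-S4 can be sketched as a short loop. The patent publishes no code, so the networks and data below are toy stand-ins; every function name, shape, and value is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the trained prior classifier; in the patent this
# is a CNN whose hidden layer g(x) produces the latent description z.
def prior_classifier(x):
    """Return a prior label y_hat and a latent description z for each image."""
    z = x.mean(axis=(1, 2))            # toy feature extractor g(x)
    y_hat = z.argmax(axis=-1)          # toy prior label
    return y_hat, z

def train(images, labels, alpha=0.5, beta=1.0, lr=1e-3, T=3):
    # S1: parameters alpha, beta, learning rate and max training steps T are given.
    for step in range(T):
        # S2: read a batch and run the prior classifier.
        y_hat, z = prior_classifier(images)
        # S3/S4: per-class latent means and assertions N(alpha * z_bar, beta * I).
        assertions = {}
        for i in np.unique(labels):
            z_bar = z[labels == i].mean(axis=0)
            assertions[i] = (alpha * z_bar, beta)   # (mean, isotropic variance)
        # S5-S7 (posterior classifier, loss, gradients) are elaborated later
        # in the description and omitted from this outline.
    return assertions

images = rng.normal(size=(8, 4, 4, 3))   # toy batch of 8 RGB 4x4 "images"
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
stats = train(images, labels)            # one assertion per class
```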
Further, the detailed process of step S2 is as follows:
A deep network (CNN) is trained as the prior classifier. The prior classifier takes the image x as input; its hidden layer g(x) acts as a feature extractor that produces a feature description z, also called the latent variable, of the input image; a prior label ŷ is then assigned to each feature description z.
Further, the detailed process by which step S5 obtains the posterior label ỹ is as follows:
S5-1: input the image x and the prior label ŷ into the encoder Q(z|x, ŷ) to obtain the mean μ and covariance σ of the latent variable; the distribution of the latent variable z satisfies z ~ N(μ, σ²I);
S5-2: based on the estimates of the first k classifiers and the inverse of the cross entropy, introduce a correction layer to correct the result of the prior label;
S5-3: sample a latent variable z⁽ʲ⁾ from the generated distribution N(μ, σ²I) and input it into the CNN, which assigns a label to the input image;
S5-4: repeat step S5-3 γ times to obtain a set of latent variables {z⁽¹⁾, …, z⁽ᵞ⁾}, where z⁽ʲ⁾ = μ + σ * ε, ε ~ N(0, 1), and * denotes the element-wise (dot) product;
S5-5: after the sampled latent variables are input into the decoder P(ỹ|z), the posterior label ỹ is output.
Further, in step S5-2, the correction layer takes the prior label obtained from the prior classifier as input. In the correction layer, the label of an image is drawn from a multinomial input distribution, and an activation function assigns a label to the sampled latent variable z. A subtraction operation computes the difference between the prior label and the true label of the first half of the classes, where L denotes the number of classes, and a label of 1 is assigned to each of the first L/2 classes.
Further, the computation of step S5-5 is as follows:
The sampled latent variables z⁽¹⁾, …, z⁽ᵞ⁾ are processed by an averaging operation and the cross entropy is computed. Given an arbitrary sample x of the i-th class, vᵢ is defined as the variable in the posterior classifier that is input to the top-level activation function softmax or sigmoid, yielding the output posterior label
ỹ = softmax(θᵥvᵢ + bᵥ),
where θᵥ and bᵥ are the parameters of the top-level activation function.
Further, the detailed process by which step S6 optimizes the loss function is as follows:
In the two formulas above, formula (1) is the loss function, Θ denotes the set of variables to be optimized in the posterior probability, and the second term of formula (1) can be rewritten as formula (2).
By introducing the idea of the stochastic gradient variational Bayes method, the mean of the latent variables of each class in a mini-batch is regarded as the mean of the global variables of all images of that class in the image set. Then, given a mini-batch sample B of any i-th class in the training set, the latent variable associated with the i-th class is defined as the mean z̄ᵢ of the latent variables of the class-i images in B;
the assertion about the latent variable of that class is defined as P(zᵢ) = N(αz̄ᵢ, βI);
the optimization problem can then be rewritten accordingly.
Further, the Adam method is used to solve the optimization problem of step S6.
In the Adam method, Θ denotes the set of all variables to be optimized, and θₜ₋₁ denotes any one of the variables in Θ at the t-th iteration.
First, the gradient gₜ of the objective with respect to the element θₜ₋₁ at the t-th iteration is computed by back-propagation. Then each parameter is updated by a three-step update rule:
Compute the bias-corrected first moment estimate: mₜ = β₁mₜ₋₁ + (1 − β₁)gₜ, m̂ₜ = mₜ/(1 − β₁ᵗ);
Compute the bias-corrected second moment estimate: vₜ = β₂vₜ₋₁ + (1 − β₂)gₜ², v̂ₜ = vₜ/(1 − β₂ᵗ);
Substitute the bias-corrected first and second moment estimates into the following formula to obtain the expression for any variable θₜ₋₁ at the t-th step:
θₜ = θₜ₋₁ − η·m̂ₜ/(√v̂ₜ + ε).
Computing the above formula updates and optimizes each parameter.
Compared with the prior art, the principles and advantages of this scheme are as follows:
1. The method is based on a variational auto-encoder (VAE) composed of a prior classifier and a posterior classifier. Unlike data augmentation methods that enhance the original data by adding perturbations or other means, it generates more examples by sampling more latent variables from the learned distribution, which avoids the insufficient expressive power and poor generalization of a deep neural network model caused by scarce training samples.
2. A correction layer is introduced in the encoder to correct the result of the prior label, compensating for the difference between the true label and the prior label.
3. The idea of the stochastic gradient variational Bayes method is adopted: the mean of the latent variables of each class in a mini-batch is regarded as the mean of the global variables of all images of that class in the image set, which greatly reduces the computational cost.
Description of the drawings
Fig. 1 is a workflow diagram of the classifier training method based on small samples of the present invention;
Fig. 2 is a flow chart of obtaining the posterior label by combining the prior classifier and the posterior classifier;
Fig. 3 is a schematic diagram of the correction layer.
Specific embodiment
The present invention is further explained below with reference to specific embodiments:
As shown in Figs. 1-3, the classifier training method based on small samples described in this embodiment comprises the following steps:
S1: set the parameters α and β, the learning rate, and the maximum number of training steps T;
S2: train a deep network (CNN) as the prior classifier. The prior classifier takes the image x as input; its hidden layer g(x) acts as a feature extractor that produces a feature description z, also called the latent variable, of the input image; a prior label ŷ is then assigned to each feature description z;
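As a rough illustration of a hidden layer g(x) doubling as a feature extractor that yields both the latent description z and a prior label, here is a minimal sketch; the layer sizes, random weights, and function names are assumptions for illustration, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative two-layer stand-in for the prior network: the hidden layer
# produces the latent description z, the top layer scores the prior label.
W1 = rng.normal(scale=0.1, size=(64, 16))   # hidden layer -> latent z
W2 = rng.normal(scale=0.1, size=(16, 10))   # top layer -> class scores

def g(x_flat):
    """Hidden layer acting as feature extractor: returns latent description z."""
    return np.maximum(0.0, x_flat @ W1)     # ReLU features

def prior_label(x_flat):
    """Assign a prior label y_hat to each feature description z."""
    z = g(x_flat)
    scores = z @ W2
    return z, scores.argmax(axis=-1)

x = rng.normal(size=(5, 64))                # 5 flattened toy images
z, y_hat = prior_label(x)
```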
S3: for the i-th class of the read images, compute the mean z̄ᵢ of the latent variables of that class and the assertion P(zᵢ) = N(αz̄ᵢ, βI), where the assertion means that the prior distribution of the latent variable obeys a Gaussian distribution;
S4: repeat step S3 until the latent-variable mean and assertion have been computed for every class of the read images;
S5: combine a posterior classifier with the learned image x and prior label ŷ to obtain the posterior label ỹ; the detailed process is as follows:
S5-1: input the image x and the prior label ŷ into the encoder Q(z|x, ŷ) to obtain the mean μ and covariance σ of the latent variable; the distribution of the latent variable z satisfies z ~ N(μ, σ²I);
S5-2: based on the estimates of the first k classifiers and the inverse of the cross entropy, introduce a correction layer to correct the result of the prior label;
Specifically, the correction layer takes the prior label obtained from the prior classifier as input. In the correction layer, the label of an image is drawn from a multinomial input distribution, and an activation function assigns a label to the sampled latent variable z. A subtraction operation computes the difference between the prior label and the true label of the first half of the classes, where L denotes the number of classes, and a label of 1 is assigned to each of the first L/2 classes;
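The correction layer is only loosely specified in the text. The sketch below is one plausible reading, assuming the prior label is a probability vector over L classes and the "first L/2 classes labeled 1" marker enters via the subtraction step; the corrective update coefficient is invented for illustration:

```python
import numpy as np

# Plausible sketch of the correction layer, NOT the definitive implementation:
# the patent states only that it subtracts the prior label from a marker that
# labels the first L/2 classes with 1.
def correction_layer(prior_probs, L):
    """prior_probs: (batch, L) prior label distribution; L: number of classes."""
    first_half = np.zeros(L)
    first_half[: L // 2] = 1.0             # first L/2 classes are labeled 1
    correction = first_half - prior_probs  # subtraction step from the text
    return prior_probs + 0.1 * correction  # hypothetical small corrective update

L = 6
prior = np.full((2, L), 1.0 / L)           # uniform prior labels for 2 images
corrected = correction_layer(prior, L)
```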
S5-3: sample a latent variable z⁽ʲ⁾ from the generated distribution N(μ, σ²I) and input it into the CNN, which assigns a label to the input image;
S5-4: repeat step S5-3 γ times to obtain a set of latent variables {z⁽¹⁾, …, z⁽ᵞ⁾}, where z⁽ʲ⁾ = μ + σ * ε, ε ~ N(0, 1), and * denotes the element-wise (dot) product;
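Steps S5-3 and S5-4 amount to the reparameterization trick stated in the text, z⁽ʲ⁾ = μ + σ * ε with ε ~ N(0, 1). A minimal sketch with toy values for μ and σ:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample gamma latent variables from N(mu, sigma^2 I) via z = mu + sigma * eps,
# eps ~ N(0, 1), where * is the element-wise product, as stated in the text.
def sample_latents(mu, sigma, gamma):
    eps = rng.standard_normal(size=(gamma,) + mu.shape)
    return mu + sigma * eps               # one z^(j) per repetition j

mu = np.array([0.5, -1.0, 2.0])           # encoder mean (toy values)
sigma = np.array([0.1, 0.2, 0.3])         # encoder std (toy values)
zs = sample_latents(mu, sigma, gamma=4)
```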
S5-5: after the sampled latent variables are input into the decoder P(ỹ|z), the posterior label ỹ is output. The computation is as follows:
The sampled latent variables z⁽¹⁾, …, z⁽ᵞ⁾ are processed by an averaging operation and the cross entropy is computed. Given an arbitrary sample x of the i-th class, vᵢ is defined as the variable in the posterior classifier that is input to the top-level activation function softmax or sigmoid, yielding the output posterior label
ỹ = softmax(θᵥvᵢ + bᵥ),
where θᵥ and bᵥ are the parameters of the top-level activation function;
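Step S5-5 (averaging the sampled latents and applying a top-level softmax) can be sketched as follows; θᵥ and bᵥ are random toy parameters here, and the latent and class dimensions are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

# Average the gamma sampled latent variables into the posterior-classifier
# variable v_i, then apply the top-level softmax y_tilde = softmax(theta_v v_i + b_v).
def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

zs = rng.normal(size=(4, 8))              # 4 sampled latents of dimension 8
v_i = zs.mean(axis=0)                     # averaging operation over samples
theta_v = rng.normal(scale=0.1, size=(5, 8))
b_v = np.zeros(5)
y_tilde = softmax(theta_v @ v_i + b_v)    # posterior label over 5 classes
```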
S6: after the posterior label ỹ is obtained, optimize the loss function to reduce its computational cost;
The specific optimization process is as follows:
In the two formulas above, formula (1) is the loss function, Θ denotes the set of variables to be optimized in the posterior probability, and the second term of formula (1) can be rewritten as formula (2).
From the above it can be seen that, owing to the presence of the assertion P(zᵢ) = N(αz̄ᵢ, βI), the cost of optimizing the loss function is very large. This is because z̄ᵢ denotes the mean of the latent variables of all images in the i-th class, so computing it requires traversing all images of every class in the data set;
For this purpose, this embodiment introduces the idea of the stochastic gradient variational Bayes method: the mean of the latent variables of each class in a mini-batch is regarded as the mean of the global variables of all images of that class in the image set. Then, given a mini-batch sample B of any i-th class in the training set, the latent variable associated with the i-th class is defined as the mean z̄ᵢ of the latent variables of the class-i images in B;
the assertion about the latent variable of that class is defined as P(zᵢ) = N(αz̄ᵢ, βI);
the optimization problem can then be rewritten accordingly.
S7: compute the gradient of the loss function with respect to every variable in the set Θ of variables to be optimized.
In the back-propagation of the algorithm, the output ŷ of the prior classifier is treated as a constant, so the gradient in formula (1) cannot be obtained with respect to it; that is, the loss gradient of formula (1) cannot propagate back to the previous classifier. The Adam method is a low-order-moment optimizer based on gradient descent and adaptive moment estimation; it is widely used in mini-batch deep learning optimization and helps the loss function converge toward zero. This embodiment therefore uses the Adam method to solve the optimization problem stated in the previous step.
In the Adam method, Θ denotes the set of all variables to be optimized, and θₜ₋₁ denotes any one of the variables in Θ at the t-th iteration.
First, the gradient gₜ of the objective with respect to the element θₜ₋₁ at the t-th iteration is computed by back-propagation. Then each parameter is updated by a three-step update rule:
Compute the bias-corrected first moment estimate: mₜ = β₁mₜ₋₁ + (1 − β₁)gₜ, m̂ₜ = mₜ/(1 − β₁ᵗ);
Compute the bias-corrected second moment estimate: vₜ = β₂vₜ₋₁ + (1 − β₂)gₜ², v̂ₜ = vₜ/(1 − β₂ᵗ);
Substitute the bias-corrected first and second moment estimates into the following formula to obtain the expression for any variable θₜ₋₁ at the t-th step:
θₜ = θₜ₋₁ − η·m̂ₜ/(√v̂ₜ + ε).
Computing the above formula updates and optimizes each parameter.
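The three-step Adam update rule can be sketched as follows; β₁ = 0.9 and β₂ = 0.999 are the common defaults, which the patent does not specify, so they are assumptions here:

```python
import numpy as np

# Standard Adam update for one variable: first/second moment accumulation,
# bias correction, then the parameter step. Hyper-parameters are assumed
# defaults, not taken from the patent.
def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g             # first moment estimate m_t
    v = b2 * v + (1 - b2) * g * g         # second moment estimate v_t
    m_hat = m / (1 - b1 ** t)             # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)             # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0])
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 4):                     # three iterations on gradient 2*theta
    g = 2.0 * theta
    theta, m, v = adam_step(theta, g, m, v, t)
```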
This embodiment is based on a variational auto-encoder (VAE) composed of a prior classifier and a posterior classifier. Unlike data augmentation methods that enhance the original data by adding perturbations or other means, it generates more examples by sampling more latent variables from the learned distribution, which avoids the insufficient expressive power and poor generalization of a deep neural network model caused by scarce training samples. In addition, a correction layer is introduced in the encoder to correct the result of the prior label, compensating for the difference between the true label and the prior label. Furthermore, the idea of the stochastic gradient variational Bayes method is adopted: the mean of the latent variables of each class in a mini-batch is regarded as the mean of the global variables of all images of that class in the image set, which greatly reduces the computational cost.
The embodiments described above are only preferred embodiments of the present invention, and the scope of implementation of the present invention is not limited thereto; accordingly, all changes made according to the shapes and principles of the present invention shall fall within the protection scope of the present invention.
Claims (7)
1. A classifier training method based on small samples, characterized by comprising the following steps:
S1: set the parameters α and β, the learning rate, and the maximum number of training steps T;
S2: read a batch of images from the training set, input each image x into the prior classifier, and obtain the value of the prior label ŷ and the value of the latent variable z of each image;
S3: for the i-th class of the read images, compute the mean z̄ᵢ of the latent variables of that class and the assertion P(zᵢ) = N(αz̄ᵢ, βI);
S4: repeat step S3 until the latent-variable mean and assertion have been computed for every class of the read images;
S5: combine a posterior classifier with the learned image x and prior label ŷ to obtain the posterior label ỹ;
S6: optimize the loss function to reduce its computational cost;
S7: compute the gradient of the loss function with respect to every variable in the set Θ of variables to be optimized.
2. The classifier training method based on small samples according to claim 1, characterized in that the detailed process of step S2 is as follows:
a deep network (CNN) is trained as the prior classifier; the prior classifier takes the image x as input, its hidden layer g(x) acts as a feature extractor that produces a feature description z, also called the latent variable, of the input image, and a prior label ŷ is then assigned to each feature description z.
3. The classifier training method based on small samples according to claim 1, characterized in that the detailed process by which step S5 obtains the posterior label ỹ is as follows:
S5-1: input the image x and the prior label ŷ into the encoder Q(z|x, ŷ) to obtain the mean μ and covariance σ of the latent variable; the distribution of the latent variable z satisfies z ~ N(μ, σ²I);
S5-2: based on the estimates of the first k classifiers and the inverse of the cross entropy, introduce a correction layer to correct the result of the prior label;
S5-3: sample a latent variable z⁽ʲ⁾ from the generated distribution N(μ, σ²I) and input it into the CNN, which assigns a label to the input image;
S5-4: repeat step S5-3 γ times to obtain a set of latent variables {z⁽¹⁾, …, z⁽ᵞ⁾}, where z⁽ʲ⁾ = μ + σ * ε, ε ~ N(0, 1), and * denotes the element-wise (dot) product;
S5-5: input the sampled latent variables into the decoder P(ỹ|z) and output the posterior label ỹ.
4. The classifier training method based on small samples according to claim 3, characterized in that in step S5-2 the correction layer takes the prior label obtained from the prior classifier as input; in the correction layer, the label of an image is drawn from a multinomial input distribution, and an activation function assigns a label to the sampled latent variable z; a subtraction operation computes the difference between the prior label and the true label of the first half of the classes, where L denotes the number of classes, and a label of 1 is assigned to each of the first L/2 classes.
5. The classifier training method based on small samples according to claim 3, characterized in that the computation of step S5-5 is as follows:
the sampled latent variables z⁽¹⁾, …, z⁽ᵞ⁾ are processed by an averaging operation and the cross entropy is computed; given an arbitrary sample x of the i-th class, vᵢ is defined as the variable in the posterior classifier that is input to the top-level activation function softmax or sigmoid, yielding the output posterior label ỹ = softmax(θᵥvᵢ + bᵥ), where θᵥ and bᵥ are the parameters of the top-level activation function.
6. The classifier training method based on small samples according to claim 1, characterized in that the detailed process by which step S6 optimizes the loss function is as follows:
in the two formulas above, formula (1) is the loss function, Θ denotes the set of variables to be optimized in the posterior probability, and the second term of formula (1) can be rewritten as formula (2);
by introducing the idea of the stochastic gradient variational Bayes method, the mean of the latent variables of each class in a mini-batch is regarded as the mean of the global variables of all images of that class in the image set; then, given a mini-batch sample B of any i-th class in the training set, the latent variable associated with the i-th class is defined as the mean z̄ᵢ of the latent variables of the class-i images in B;
the assertion about the latent variable of that class is defined as P(zᵢ) = N(αz̄ᵢ, βI);
the optimization problem can then be rewritten accordingly.
7. The classifier training method based on small samples according to claim 6, characterized in that the Adam method is used to solve the optimization problem of step S6:
in the Adam method, Θ denotes the set of all variables to be optimized, and θₜ₋₁ denotes any one of the variables in Θ at the t-th iteration;
first, the gradient gₜ of the objective with respect to the element θₜ₋₁ at the t-th iteration is computed by back-propagation; then each parameter is updated by a three-step update rule:
(1) compute the bias-corrected first moment estimate: mₜ = β₁mₜ₋₁ + (1 − β₁)gₜ, m̂ₜ = mₜ/(1 − β₁ᵗ);
(2) compute the bias-corrected second moment estimate: vₜ = β₂vₜ₋₁ + (1 − β₂)gₜ², v̂ₜ = vₜ/(1 − β₂ᵗ);
(3) substitute the bias-corrected first and second moment estimates into the following formula to obtain the expression for any variable θₜ₋₁ at the t-th step: θₜ = θₜ₋₁ − η·m̂ₜ/(√v̂ₜ + ε);
computing the above formula updates and optimizes each parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910351889.0A CN110458180B (en) | 2019-04-28 | 2019-04-28 | Classifier training method based on small samples |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110458180A true CN110458180A (en) | 2019-11-15 |
CN110458180B CN110458180B (en) | 2023-09-19 |
Family
ID=68480903
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910351889.0A Active CN110458180B (en) | 2019-04-28 | 2019-04-28 | Classifier training method based on small samples |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458180B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103020971A (en) * | 2012-12-28 | 2013-04-03 | 青岛爱维互动信息技术有限公司 | Method for automatically segmenting target objects from images |
CN105117429A (en) * | 2015-08-05 | 2015-12-02 | 广东工业大学 | Scenario image annotation method based on active learning and multi-label multi-instance learning |
US20180150728A1 (en) * | 2016-11-28 | 2018-05-31 | D-Wave Systems Inc. | Machine learning systems and methods for training with noisy labels |
CN108932705A (en) * | 2018-06-27 | 2018-12-04 | 北京工业大学 | A kind of image processing method based on matrix variables variation self-encoding encoder |
- 2019-04-28: application CN201910351889.0A granted as patent CN110458180B (en), status Active
Also Published As
Publication number | Publication date |
---|---|
CN110458180B (en) | 2023-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Messikommer et al. | Event-based asynchronous sparse convolutional networks | |
CN107330480B (en) | Computer recognition method for hand-written character | |
CN106203625B (en) | A kind of deep-neural-network training method based on multiple pre-training | |
Shih et al. | Grey number prediction using the grey modification model with progression technique | |
CN109754078A (en) | Method for optimization neural network | |
CN111080675A (en) | Target tracking method based on space-time constraint correlation filtering | |
CN107330355A (en) | A kind of depth pedestrian based on positive sample Constraints of Equilibrium identification method again | |
Sun et al. | A spatially constrained shifted asymmetric Laplace mixture model for the grayscale image segmentation | |
CN108256630A (en) | A kind of over-fitting solution based on low dimensional manifold regularization neural network | |
CN114359631A (en) | Target classification and positioning method based on coding-decoding weak supervision network model | |
Gaedke-Merzhäuser et al. | Multilevel minimization for deep residual networks | |
Talaván et al. | A continuous Hopfield network equilibrium points algorithm | |
Yang et al. | Accelerating the training process of convolutional neural networks for image classification by dropping training samples out | |
CN108509986A (en) | Based on the Aircraft Target Recognition for obscuring constant convolutional neural networks | |
Lee et al. | An edge detection–based eGAN model for connectivity in ambient intelligence environments | |
Chen et al. | A novel neural network training framework with data assimilation | |
Springer et al. | Robust parameter estimation of chaotic systems | |
CN107529647B (en) | Cloud picture cloud amount calculation method based on multilayer unsupervised sparse learning network | |
CN110458180A (en) | A kind of classifier training method based on small sample | |
CN111598580A (en) | XGboost algorithm-based block chain product detection method, system and device | |
CN116486150A (en) | Uncertainty perception-based regression error reduction method for image classification model | |
Wang et al. | Single image rain removal via cascading attention aggregation network on challenging weather conditions | |
Terekhov et al. | Text CAPTCHA Traversal via Knowledge Distillation of Convolutional Neural Networks: Exploring the Impact of Color Channels Selection | |
CN114708501A (en) | Remote sensing image building change detection method based on condition countermeasure network | |
CN113989256A (en) | Detection model optimization method, detection method and detection device for remote sensing image building |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||