CN110458180A - A kind of classifier training method based on small sample - Google Patents

A kind of classifier training method based on small sample

Info

Publication number
CN110458180A
CN110458180A
Authority
CN
China
Prior art keywords
label
latent variable
priori
image
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910351889.0A
Other languages
Chinese (zh)
Other versions
CN110458180B (en)
Inventor
刘芷菁
刘波
林露樾
肖燕珊
刘倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201910351889.0A priority Critical patent/CN110458180B/en
Publication of CN110458180A publication Critical patent/CN110458180A/en
Application granted granted Critical
Publication of CN110458180B publication Critical patent/CN110458180B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The invention discloses a classifier training method based on small samples, comprising the following steps: S1: set parameters α, β, the learning rate, and the maximum number of training steps T; S2: read a batch of images from the training set, input each image x into the prior classifier, and obtain the value of the prior label and the value of the latent variable z of each picture; S3: for the i-th class of the read pictures, calculate the mean of the latent variables of that class and the corresponding assertion; S4: repeat step S3 until the mean of the latent variables and the assertion have been calculated for every class of the read pictures; S5: on the basis of the learned image x and prior label, combine the posterior classifier to obtain the posterior label; S6: optimize the loss function and reduce its computational cost; S7: calculate the gradient of the loss function with respect to every variable in the set Θ of variables to be optimized. The invention converges quickly, requires a shorter training time, and can be trained to obtain a high-precision classifier.

Description

A kind of classifier training method based on small sample
Technical field
The present invention relates to the technical field of machine learning, and more particularly to a classifier training method based on small samples.
Background technique
With the development of deep learning, image classification methods based on deep neural networks have been proposed. Such methods need to be trained on a large number of samples so that the deep neural network achieves good performance. However, in certain practical applications, such as object tracking or object detection, only a limited number of samples may be available, so it is difficult to build a large, valuable, labeled sample set.
When only scarce samples are available, the supervised training of a deep neural network is very difficult; it easily leads to insufficient expressive ability and poor generalization of the deep neural network model, and a deep neural network with insufficient training data often hits a performance limit as the network is deepened.
To address deep learning with insufficient samples, prior work has compared the feature expression ability of convolutional neural network structures at different levels and proposed a layer-freezing method to fine-tune a convolutional model, carrying out classification and recognition on a small-scale data set. However, that method converges slowly when the network structure is deep, and training takes a long time.
Summary of the invention
It is an object of the invention to overcome the deficiencies of the prior art and to provide a classifier training method based on small samples that converges quickly, requires a shorter training time, and can be trained to obtain a high-precision classifier.
To achieve the above object, the technical solution provided by the present invention comprises the following steps (an illustrative sketch of these steps is given after the list):
S1: set parameters α, β, the learning rate, and the maximum number of training steps T;
S2: read a batch of images from the training set, input each image x into the prior classifier, and obtain the value of the prior label and the value of the latent variable z of each picture;
S3: for the i-th class of the read pictures, calculate the mean of the latent variables of that class and the corresponding assertion;
S4: repeat step S3 until the mean of the latent variables and the assertion have been calculated for every class of the read pictures;
S5: on the basis of the learned image x and prior label, combine the posterior classifier to obtain the posterior label;
S6: optimize the loss function and reduce its computational cost;
S7: calculate the gradient of the loss function with respect to every variable in the set Θ of variables to be optimized.
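Purely as an illustration of how steps S1–S7 fit together, a minimal training-loop sketch is given below. It assumes hypothetical interfaces (prior_clf, encoder, decoder, loss_fn and the data loader) that stand in for the components described above; the patent itself does not publish reference code.

```python
import torch

def train_small_sample_classifier(prior_clf, encoder, decoder, loss_fn, loader,
                                  alpha, beta, lr=1e-3, T=1000, gamma=5):
    # S1: parameters alpha, beta, the learning rate and the maximum step count T are given.
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)                 # S7 uses Adam-style updates
    step = 0
    while step < T:
        for x, y in loader:                               # S2: read a batch of images
            with torch.no_grad():
                y_prior, z_prior = prior_clf(x)           # prior label and latent variable z
            # S3/S4: mean of the latent variables for every class present in the batch
            class_means = {int(c): z_prior[y == c].mean(dim=0) for c in y.unique()}
            # S5: encoder, gamma-fold sampling, and decoder give the posterior label
            mu, log_var = encoder(x, y_prior)
            zs = [mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
                  for _ in range(gamma)]
            y_post = decoder(torch.stack(zs).mean(dim=0))
            # S6: the loss combines the posterior-label term with the per-class assertion
            loss = loss_fn(y_post, y, mu, log_var, class_means, alpha, beta)
            # S7: gradients with respect to every optimised variable, then an update
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
            if step >= T:
                break
    return encoder, decoder
```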
Further, the detailed process of step S2 is as follows:
A deep network (CNN) is trained as the prior classifier. The prior classifier takes an image x as input; its hidden layer g(x) acts as a feature extractor and produces a feature description z of the input image, also called a latent variable; each feature description z is then assigned a prior label.
Further, the detailed process by which step S5 obtains the posterior label is as follows:
S5-1: input the image x and the prior label into the encoder to obtain the mean and covariance of the latent variable; the distribution of the latent variable is a Gaussian with this mean and covariance;
S5-2: based on the estimates of the preceding k classifiers and the inverse of the cross entropy, introduce a modifying layer to correct the result of the prior label;
S5-3: sample a latent variable from the generated distribution and input it into the CNN, assigning a label to the input image;
S5-4: repeat step S5-3 γ times to obtain a group of latent variables, where * denotes the element-wise (dot-product) operation;
S5-5: input the sampled latent variables into the decoder to obtain the output posterior label.
Further, in step S5-2 the modifying layer takes the prior label obtained from the prior classifier as its input. In the modifying layer, the label of the image is drawn from a multinomial input distribution, and an activation function assigns a label to the sampled latent variable z. A subtraction operation computes the difference between the prior label and the true labels of the first half of the classes, where L denotes the number of classes; the first L/2 classes are assigned one label and marked as 1.
Further, the calculation process of step S5-5 is as follows:
The sampled latent variables are processed by an averaging operation and the cross entropy is calculated. Given an arbitrary sample x of the i-th class, define vi as the variable in the posterior classifier; it is input into the top-layer activation function, softmax or sigmoid, to obtain the output posterior label.
In the above formula, θv and bv are the parameters of the top-layer activation function.
Further, the detailed process by which step S6 optimizes the loss function is as follows:
In the two formulas above, formula (1) is the loss function, Θ denotes the set of variables to be optimized in the posterior probability, and the second term in formula (1) can be rewritten as formula (2);
By introducing the idea of the stochastic gradient variational Bayes method, the average of the latent variables of each class within a mini-batch is regarded as the average of the global variable over all images in the image set. Then, given a mini-batch sample B of any i-th class in the training set, the latent variable relevant to the i-th class is defined as:
The assertion about the latent variable of that class is defined as:
The optimization problem can then be rewritten as follows:
Further, the Adam method is used to solve the optimization problem of step S6;
In the Adam method, Θ denotes the set of all variables to be optimized, and θt-1 denotes, in the t-th iteration, any one of the variables in Θ to be optimized;
First, by back-propagation, the gradient gt of the objective with respect to the element θt-1 is calculated at the t-th iteration. Each parameter is then updated by a three-step update rule:
Calculate the bias-corrected first-moment estimate:
Calculate the bias-corrected second-moment estimate:
Substitute the bias-corrected first-moment estimate and the bias-corrected second-moment estimate into the following formula to obtain the expression for any variable θt-1 at the t-th step:
By evaluating the above formula, each parameter can be updated and optimized.
Compared with the prior art, the principle and advantages of this scheme are as follows:
1. The method is based on a variational auto-encoder (VAE) composed of a prior classifier and a posterior classifier. Unlike data augmentation methods that enhance the original data by adding perturbations or by other means, it draws more latent variables from the learned distribution to generate more examples, avoiding the insufficient expressive ability and poor generalization of a deep neural network model caused by scarce training samples.
2. A modifying layer is introduced in the encoder to correct the result of the prior label, compensating for the difference between the true label and the prior label.
3. Following the idea of the stochastic gradient variational Bayes method, the average of the latent variables of each class in a mini-batch is regarded as the average of the global variable over all images in the image set, which substantially reduces the computational cost.
Description of the drawings
Fig. 1 is a workflow diagram of the classifier training method based on small samples of the present invention;
Fig. 2 is a flowchart of obtaining the posterior label by combining the prior classifier and the posterior classifier;
Fig. 3 is a schematic diagram of the modifying layer.
Specific embodiment
The present invention is further explained in the light of specific embodiments:
As shown in Figures 1-3, the classifier training method based on small samples described in this embodiment includes the following steps:
S1: set parameters α, β, the learning rate, and the maximum number of training steps T;
S2: a deep network (CNN) is trained as the prior classifier; the prior classifier takes an image x as input, its hidden layer g(x) acts as a feature extractor and produces a feature description z of the input image, also called a latent variable, and each feature description z is then assigned a prior label;
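By way of example only, a small prior classifier of this shape could be written as follows in PyTorch; the layer sizes, the 3-channel input, and the latent dimension are assumptions, since the patent does not fix a particular architecture.

```python
import torch
import torch.nn as nn

class PriorClassifier(nn.Module):
    """CNN prior classifier: the hidden layer g(x) yields the latent description z,
    and a linear head assigns a prior label to each z (sizes are assumptions)."""
    def __init__(self, num_classes, latent_dim=64):
        super().__init__()
        self.g = nn.Sequential(                      # feature extractor g(x)
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        self.head = nn.Linear(latent_dim, num_classes)

    def forward(self, x):
        z = self.g(x)                                # latent variable / feature description z
        y_prior = self.head(z).softmax(dim=-1)       # prior label distribution
        return y_prior, z
```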
S3: for the i-th class of the read pictures, calculate the mean of the latent variables of that class and the corresponding assertion, where the assertion refers to the prior distribution of the latent variable, which follows a Gaussian distribution;
S4: repeat step S3 until the mean of the latent variables and the assertion have been calculated for every class of the read pictures;
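A short sketch of steps S3–S4, under the assumption that the latent variables of the read pictures are collected in a tensor z with integer class labels y (the names z, y, alpha, beta are illustrative):

```python
import torch

def class_means_and_assertions(z, y, alpha, beta):
    """For every class i present in the batch, compute the mean latent vector and the
    parameters of its assertion P(z_i) = N(alpha * mean_i, beta * I)."""
    out = {}
    for c in y.unique():
        mean_i = z[y == c].mean(dim=0)           # mean of the latent variables of class i
        out[int(c)] = (alpha * mean_i, beta)     # Gaussian assertion: mean and isotropic variance
    return out
```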
S5: on the basis of the learned image x and prior label, combine the posterior classifier to obtain the posterior label. The detailed process is as follows:
S5-1: input the image x and the prior label into the encoder to obtain the mean and covariance of the latent variable; the distribution of the latent variable is a Gaussian with this mean and covariance;
S5-2: based on the estimates of the preceding k classifiers and the inverse of the cross entropy, introduce a modifying layer to correct the result of the prior label;
Specifically, the modifying layer takes the prior label obtained from the prior classifier as its input. In the modifying layer, the label of the image is drawn from a multinomial input distribution, and an activation function assigns a label to the sampled latent variable z. A subtraction operation computes the difference between the prior label and the true labels of the first half of the classes, where L denotes the number of classes; the first L/2 classes are assigned one label and marked as 1;
S5-3: sample a latent variable from the generated distribution and input it into the CNN, assigning a label to the input image;
S5-4: repeat step S5-3 γ times to obtain a group of latent variables, where each sample is formed from the mean and covariance with noise ε ~ N(0, 1), and * denotes the element-wise (dot-product) operation;
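Steps S5-3 and S5-4 correspond to the usual reparameterised sampling of a variational auto-encoder; a minimal sketch, assuming the encoder outputs a mean mu and a log-variance log_var, is:

```python
import torch

def sample_latents(mu, log_var, gamma):
    """Draw gamma latent samples z = mu + sigma * eps with eps ~ N(0, 1),
    where * is the element-wise (dot-product) operation mentioned in step S5-4."""
    sigma = torch.exp(0.5 * log_var)
    return [mu + sigma * torch.randn_like(mu) for _ in range(gamma)]
```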
S5-5: input the sampled latent variables into the decoder to obtain the output posterior label. The calculation process is as follows:
The sampled latent variables are processed by an averaging operation and the cross entropy is calculated. Given an arbitrary sample x of the i-th class, define vi as the variable in the posterior classifier; it is input into the top-layer activation function, softmax or sigmoid, to obtain the output posterior label.
In the above formula, θv and bv are the parameters of the top-layer activation function;
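A hedged sketch of the calculation in step S5-5, assuming θv is a weight matrix of shape (latent dimension, number of classes), bv is a bias vector, and softmax is chosen as the top-layer activation:

```python
import torch

def posterior_label(z_samples, theta_v, b_v):
    """Average the gamma sampled latent variables, then apply the top-layer activation
    (softmax here) with parameters theta_v, b_v to obtain the posterior label."""
    z_bar = torch.stack(z_samples).mean(dim=0)            # averaging operation over samples
    return torch.softmax(z_bar @ theta_v + b_v, dim=-1)   # posterior label distribution
```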
S6: after the posterior label is obtained, the loss function is optimized and its computational cost is reduced;
The specific optimization process is as follows:
In the two formulas above, formula (1) is the loss function, Θ denotes the set of variables to be optimized in the posterior probability, and the second term in formula (1) can be rewritten as formula (2);
It can be seen from the above that, because of the assertion P(zi) = N(α·z̄i, βI), the cost of optimizing the loss function is very large. This is because z̄i denotes the average of the latent variables of all images in the i-th class, so it can only be computed by traversing all images of each class in the data set;
To this end, this embodiment introduces the idea of the stochastic gradient variational Bayes method and regards the average of the latent variables of each class within a mini-batch as the average of the global variable over all images in the image set. Then, given a mini-batch B of any i-th class in the training set, the latent variable relevant to the i-th class is defined as:
The assertion about the latent variable of that class is defined as:
The optimization problem can then be rewritten as follows:
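The KL term implied by the assertion P(zi) = N(α·z̄i, βI), with the global class mean replaced by its mini-batch estimate as described above, can be sketched as follows; this uses the standard closed form of the KL divergence between diagonal Gaussians and does not reproduce the exact weighting inside loss (1):

```python
import math
import torch

def kl_to_assertion(mu, log_var, z_batch, alpha, beta):
    """KL( N(mu, diag(exp(log_var))) || N(alpha * z_bar_B, beta * I) ), where z_bar_B is
    the mini-batch estimate of the class latent mean (stochastic gradient VB idea)."""
    z_bar = z_batch.mean(dim=0)                    # mini-batch stand-in for the global class mean
    var = torch.exp(log_var)
    kl = 0.5 * ((var + (mu - alpha * z_bar) ** 2) / beta
                - 1.0 + math.log(beta) - log_var)
    return kl.sum(dim=-1).mean()                   # sum over latent dims, average over the batch
```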
S7: calculate the gradient of the loss function with respect to every variable in the set Θ of variables to be optimized.
In the back-propagation of the algorithm, the output of the prior classifier is treated as a constant, so the gradient in formula (1) cannot be obtained with respect to it; that is, the loss gradient of formula (1) cannot be propagated back to the preceding classifier. The Adam method is a low-order-moment optimizer based on gradient descent and adaptive estimation; it is widely used in mini-batch deep learning optimization and ensures that the loss function converges towards zero. This embodiment therefore uses the Adam method to solve the optimization problem of the previous step;
In the Adam method, Θ denotes the set of all variables to be optimized, and θt-1 denotes, in the t-th iteration, any one of the variables in Θ to be optimized;
First, by back-propagation, the gradient gt of the objective with respect to the element θt-1 is calculated at the t-th iteration. Each parameter is then updated by a three-step update rule:
Calculate the bias-corrected first-moment estimate:
Calculate the bias-corrected second-moment estimate:
Substitute the bias-corrected first-moment estimate and the bias-corrected second-moment estimate into the following formula to obtain the expression for any variable θt-1 at the t-th step:
By evaluating the above formula, each parameter can be updated and optimized.
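For reference, the three-step Adam update described above, written for a single parameter tensor; β1, β2, and the numerical-stability constant eps are the usual Adam defaults and are assumptions here, since the patent only names α and β for the loss:

```python
import torch

def adam_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter tensor theta given its gradient g.
    m, v are the running first/second moment estimates; t is the iteration index (>= 1)."""
    m = beta1 * m + (1 - beta1) * g            # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g        # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)               # bias-corrected second moment
    theta = theta - lr * m_hat / (torch.sqrt(v_hat) + eps)
    return theta, m, v
```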
This embodiment is based on a variational auto-encoder (VAE) composed of a prior classifier and a posterior classifier. Unlike data augmentation methods that enhance the original data by adding perturbations or by other means, it draws more latent variables from the learned distribution to generate more examples, avoiding the insufficient expressive ability and poor generalization of a deep neural network model caused by scarce training samples. In addition, a modifying layer is introduced in the encoder to correct the result of the prior label, compensating for the difference between the true label and the prior label. Furthermore, following the idea of the stochastic gradient variational Bayes method, the average of the latent variables of each class in a mini-batch is regarded as the average of the global variable over all images in the image set, which substantially reduces the computational cost.
The above examples are only preferred embodiments of the invention, and the scope of implementation of the invention is not limited thereby; all changes made according to the shapes and principles of the present invention shall therefore fall within the scope of protection of the present invention.

Claims (7)

1. A classifier training method based on small samples, characterized by comprising the following steps:
S1: set parameters α, β, the learning rate, and the maximum number of training steps T;
S2: read a batch of images from the training set, input each image x into the prior classifier, and obtain the value of the prior label and the value of the latent variable z of each picture;
S3: for the i-th class of the read pictures, calculate the mean of the latent variables of that class and the corresponding assertion;
S4: repeat step S3 until the mean of the latent variables and the assertion have been calculated for every class of the read pictures;
S5: on the basis of the learned image x and prior label, combine the posterior classifier to obtain the posterior label;
S6: optimize the loss function and reduce its computational cost;
S7: calculate the gradient of the loss function with respect to every variable in the set Θ of variables to be optimized.
2. The classifier training method based on small samples according to claim 1, characterized in that the detailed process of step S2 is as follows:
a deep network (CNN) is trained as the prior classifier; the prior classifier takes an image x as input, its hidden layer g(x) acts as a feature extractor and produces a feature description z of the input image, also called a latent variable, and each feature description z is then assigned a prior label.
3. The classifier training method based on small samples according to claim 1, characterized in that the detailed process by which step S5 obtains the posterior label is as follows:
S5-1: input the image x and the prior label into the encoder to obtain the mean and covariance of the latent variable; the distribution of the latent variable is a Gaussian with this mean and covariance;
S5-2: based on the estimates of the preceding k classifiers and the inverse of the cross entropy, introduce a modifying layer to correct the result of the prior label;
S5-3: sample a latent variable from the generated distribution and input it into the CNN, assigning a label to the input image;
S5-4: repeat step S5-3 γ times to obtain a group of latent variables, where ε ~ N(0, 1) and * denotes the element-wise (dot-product) operation;
S5-5: input the sampled latent variables into the decoder to obtain the output posterior label.
4. The classifier training method based on small samples according to claim 3, characterized in that, in step S5-2, the modifying layer takes the prior label obtained from the prior classifier as its input; in the modifying layer, the label of the image is drawn from a multinomial input distribution, and an activation function assigns a label to the sampled latent variable z; a subtraction operation computes the difference between the prior label and the true labels of the first half of the classes, where L denotes the number of classes, and the first L/2 classes are assigned one label and marked as 1.
5. The classifier training method based on small samples according to claim 3, characterized in that the calculation process of step S5-5 is as follows:
the sampled latent variables are processed by an averaging operation and the cross entropy is calculated; given an arbitrary sample x of the i-th class, define vi as the variable in the posterior classifier, which is input into the top-layer activation function, softmax or sigmoid, to obtain the output posterior label;
in the above formula, θv and bv are the parameters of the top-layer activation function.
6. The classifier training method based on small samples according to claim 1, characterized in that the detailed process by which step S6 optimizes the loss function is as follows:
in the two formulas above, formula (1) is the loss function, Θ denotes the set of variables to be optimized in the posterior probability, and the second term in formula (1) can be rewritten as formula (2);
by introducing the idea of the stochastic gradient variational Bayes method, the average of the latent variables of each class within a mini-batch is regarded as the average of the global variable over all images in the image set; then, given a mini-batch sample B of any i-th class in the training set, the latent variable relevant to the i-th class is defined as:
the assertion about the latent variable of that class is defined as:
the optimization problem can then be rewritten as follows:
7. The classifier training method based on small samples according to claim 6, characterized in that the Adam method is used to solve the optimization problem of step S6;
in the Adam method, Θ denotes the set of all variables to be optimized, and θt-1 denotes, in the t-th iteration, any one of the variables in Θ to be optimized;
first, by back-propagation, the gradient gt of the objective with respect to the element θt-1 is calculated at the t-th iteration; each parameter is then updated by a three-step update rule:
(1) calculate the bias-corrected first-moment estimate:
(2) calculate the bias-corrected second-moment estimate:
(3) substitute the bias-corrected first-moment estimate and the bias-corrected second-moment estimate into the following formula to obtain the expression for any variable θt-1 at the t-th step:
by evaluating the above formula, each parameter can be updated and optimized.
CN201910351889.0A 2019-04-28 2019-04-28 Classifier training method based on small samples Active CN110458180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910351889.0A CN110458180B (en) 2019-04-28 2019-04-28 Classifier training method based on small samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910351889.0A CN110458180B (en) 2019-04-28 2019-04-28 Classifier training method based on small samples

Publications (2)

Publication Number Publication Date
CN110458180A true CN110458180A (en) 2019-11-15
CN110458180B CN110458180B (en) 2023-09-19

Family

ID=68480903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910351889.0A Active CN110458180B (en) 2019-04-28 2019-04-28 Classifier training method based on small samples

Country Status (1)

Country Link
CN (1) CN110458180B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020971A (en) * 2012-12-28 2013-04-03 青岛爱维互动信息技术有限公司 Method for automatically segmenting target objects from images
CN105117429A (en) * 2015-08-05 2015-12-02 广东工业大学 Scenario image annotation method based on active learning and multi-label multi-instance learning
US20180150728A1 (en) * 2016-11-28 2018-05-31 D-Wave Systems Inc. Machine learning systems and methods for training with noisy labels
CN108932705A (en) * 2018-06-27 2018-12-04 北京工业大学 A kind of image processing method based on matrix variables variation self-encoding encoder


Also Published As

Publication number Publication date
CN110458180B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
Messikommer et al. Event-based asynchronous sparse convolutional networks
CN107330480B (en) Computer recognition method for hand-written character
CN106203625B (en) A kind of deep-neural-network training method based on multiple pre-training
Shih et al. Grey number prediction using the grey modification model with progression technique
CN109754078A (en) Method for optimization neural network
CN111080675A (en) Target tracking method based on space-time constraint correlation filtering
CN107330355A (en) A kind of depth pedestrian based on positive sample Constraints of Equilibrium identification method again
Sun et al. A spatially constrained shifted asymmetric Laplace mixture model for the grayscale image segmentation
CN108256630A (en) A kind of over-fitting solution based on low dimensional manifold regularization neural network
CN114359631A (en) Target classification and positioning method based on coding-decoding weak supervision network model
Gaedke-Merzhäuser et al. Multilevel minimization for deep residual networks
Talaván et al. A continuous Hopfield network equilibrium points algorithm
Yang et al. Accelerating the training process of convolutional neural networks for image classification by dropping training samples out
CN108509986A (en) Based on the Aircraft Target Recognition for obscuring constant convolutional neural networks
Lee et al. An edge detection–based eGAN model for connectivity in ambient intelligence environments
Chen et al. A novel neural network training framework with data assimilation
Springer et al. Robust parameter estimation of chaotic systems
CN107529647B (en) Cloud picture cloud amount calculation method based on multilayer unsupervised sparse learning network
CN110458180A (en) A kind of classifier training method based on small sample
CN111598580A (en) XGboost algorithm-based block chain product detection method, system and device
CN116486150A (en) Uncertainty perception-based regression error reduction method for image classification model
Wang et al. Single image rain removal via cascading attention aggregation network on challenging weather conditions
Terekhov et al. Text CAPTCHA Traversal via Knowledge Distillation of Convolutional Neural Networks: Exploring the Impact of Color Channels Selection
CN114708501A (en) Remote sensing image building change detection method based on condition countermeasure network
CN113989256A (en) Detection model optimization method, detection method and detection device for remote sensing image building

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant