CN112836629B - Image classification method - Google Patents
- Publication number
- CN112836629B CN112836629B CN202110136790.6A CN202110136790A CN112836629B CN 112836629 B CN112836629 B CN 112836629B CN 202110136790 A CN202110136790 A CN 202110136790A CN 112836629 B CN112836629 B CN 112836629B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Abstract
The invention provides an image classification method based on a random-batch class center updating strategy, which can be applied to image classification tasks and image verification tasks based on class center points, and comprises the following steps: constructing class centers and extracting image features; calculating the Mahalanobis distance between the image features and the class center points and constructing a classification loss; calculating the Mahalanobis distance between the class centers of the classes corresponding to the samples in each random batch and constructing a divergence loss; and alternately updating the trainable parameters of the feature extraction module and the class center points with a two-stage weight-updating method. The updating strategy of the invention gives class-center-based image classification better engineering significance and better image classification results. Compared with the prior art, the invention obtains better classification results and verification results on image classification and image verification tasks.
Description
Technical Field
The invention relates to the field of computer vision and image processing, in particular to an image classification method based on metric learning, and a class center updating strategy based on random batches.
Background
In the task of image classification, methods based on the cross-entropy loss are the most common. The specific flow of such a method is to use a convolutional neural network to extract the features of an input sample, which is generally an image; the features are typically high-dimensional vectors, for example 512-dimensional feature vectors; a multi-layer perceptron is then used to classify the image features.
For the face recognition task, the identity of each face picture in the dataset must be determined. Training a face recognition model amounts to training a classification model: a convolutional neural network extracts the features, and a fully connected layer then produces a score for each class. In the verification and test stages, the face recognition task does not need to classify the input samples, and the face identities encountered there generally do not appear in the training set; it therefore suffices for the convolutional neural network to extract the feature vectors of the input images. The feature vectors of the two face pictures are normalized, so that the feature vectors of all samples have the same length while differing in angle. During training, the features extracted from face pictures of the same person are gathered together as much as possible, so that whether two samples belong to the same class can be judged from the angle between their two feature vectors.
For common classification tasks and the face recognition task, the feature extraction module of the model is generally a convolutional neural network. For a classification model, the cross-entropy loss is generally adopted; its computation is shown in formulas (1) and (2), where e is the natural constant, logit_i denotes the score of the feature for class i, and P_i denotes the probability that the feature belongs to class i. Assuming the input feature belongs to class j, the loss value is given by formula (2). Features trained with the cross-entropy loss have angular characteristics, i.e., the feature vectors exhibit an angular distribution in the high-dimensional space.

P_i = e^{logit_i} / Σ_k e^{logit_k} (1)

L = -log P_j (2)
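As a concrete illustration, the softmax probability of formula (1) and the loss of formula (2) can be computed in a few lines of NumPy (a minimal sketch; the `logits` values and the class index are made up for the example):

```python
import numpy as np

def cross_entropy_loss(logits, true_class):
    """Softmax cross-entropy per formulas (1)-(2):
    P_i = e^{logit_i} / sum_k e^{logit_k},  L = -log P_j."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()              # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return -np.log(probs[true_class])

# Three-class example: the feature scores class 0 highest, so the loss is small.
loss = cross_entropy_loss([2.0, 0.5, -1.0], true_class=0)
```

The max-subtraction does not change the probabilities (the factor cancels in the ratio) but avoids overflow for large scores.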
In the face recognition task, in order to better compare whether two face pictures of unknown identity belong to the same class, model training must ensure that the intra-class distance is smaller than the inter-class distance, so that a threshold can be set directly: when the angle between two features is larger than the threshold, the two features belong to face pictures of different people; if the angle between the two features is smaller than the threshold, they belong to face pictures of the same person. A commonly used loss function is shown in formula (3), where s is a scale parameter, θ_i denotes the angle between the feature vector and the weight vector of the i-th node of the fully connected layer, and a is a margin parameter; adding this constraint yields an improved cross-entropy loss.

L = -log( e^{s·cos(θ_j + a)} / ( e^{s·cos(θ_j + a)} + Σ_{i≠j} e^{s·cos θ_i} ) ) (3)
In addition to methods based on improving the cross-entropy loss, a currently better family of methods constructs class centers; for example, the Euclidean distance between a sample's feature vector and the center point of each class can be added to the cross-entropy loss as an intra-class constraint. However, such a method uses the Euclidean distance and the angle-based cross-entropy loss at the same time, so its constraints are inconsistent. Another method directly uses the Euclidean distance as the metric between features and also constructs class center points, but it must compute the Euclidean distance between all class centers when calculating the inter-class distance; applied to face recognition, where the number of classes in the dataset is large, this method is no longer practical. It is therefore necessary to design a more practical inter-class distance calculation method. Moreover, whether the Euclidean distance or the angle-based cosine similarity is used, the metric between features is set by hand; for deep learning, a hand-designed metric is not necessarily the best and may correspond only to a locally optimal solution.
It should be noted that the information disclosed in the above background section is only for understanding the background of the present application and thus may include information that does not form the prior art that is already known to those of ordinary skill in the art.
Disclosure of Invention
In order to overcome the defects of the background technology, the invention provides an image classification method, which is based on a class center updating strategy of random batches, so as to effectively solve the defects in the image classification and face recognition method based on measurement learning and further improve the performance of a model.
The image classification method provided by the invention comprises the following steps:
1) Construct class center points according to the number of classes of the input images; each class center point is a high-dimensional vector, the dimension of the vector equals the dimension of the features, and the number of center points equals the number of classes in the dataset.
2) For each random batch in the training process, compute the feature vectors with the convolutional neural network, compute the Mahalanobis distance between each feature vector and its class center point as the intra-class distance, construct the classification loss from the intra-class distance, and update the parameters of the convolutional neural network by back-propagating the gradient of the classification loss, so that the model is optimized in the direction of reducing the intra-class distance.
3) For each random batch in the training process, compute the distances between the class center points corresponding to the classes of the samples in the batch, construct the divergence loss from these distances, and update the class center point parameters by back-propagating the gradient of the divergence loss, so that the class center points are optimized in the direction of increasing the inter-class distance.
The weights of the feature extraction module and the trainable parameters of the class centers are updated alternately through a two-stage training mechanism, and the trainable parameters of the Mahalanobis distance are updated in both stages.
According to an embodiment of the present invention, in step 1), the vectors are measured in the high-dimensional space by the Mahalanobis distance. A class center point, containing trainable parameters, is constructed as the basis of image classification. The constructed class center points serve as several vectors of the high-dimensional space and are initialized randomly. Measuring features with the Mahalanobis distance is a metric method with trainable parameters; in a deep learning task, the model can thus automatically learn a metric suited to the specific task.
In step 2), the convolutional neural network is used to extract the features of the image, the Mahalanobis distance between the image features and each class center is calculated, and the classification loss is constructed from these distances; the Mahalanobis distance contains trainable parameters. For the extracted sample features, the current parameters are used to compute the Mahalanobis distance between the feature vector and the center vector of each class, and the classification loss is likewise built on the Mahalanobis distances between the sample and the center points. As the model trains, the Mahalanobis distance between a feature and the center point of its correct class keeps decreasing, while its distances to the center points of the other classes keep increasing.
In step 3), the Mahalanobis distances between the class centers of the classes corresponding to the samples in each random batch are calculated as the inter-class distances, and the divergence loss is constructed from them; this updates the weights of the class center points so that the model is optimized in the direction of increasing the inter-class distance. When calculating the inter-class distance, only the classes of the samples in the random batch are considered; the divergence loss is a weighted average of the shortest distance between the center point of each class and the other center points, where the shortest distance of each class is weighted according to the number of its samples in the random batch.
A two-stage weight-updating method alternately updates the trainable parameters of the feature extraction module and of the class center points. During training the model adopts a two-stage scheme: first, the parameters of the feature extraction module are updated with the classification loss so that the model extracts image features better; then, the trainable parameters of the class center points are updated with the divergence loss. Both the classification loss and the divergence loss update the trainable parameters of the Mahalanobis distance.
In some embodiments, the two-stage training mechanism first randomly initializes the class centers and then optimizes the model parameters by alternating between two stages, an intra-class optimization stage and an inter-class optimization stage. The intra-class optimization stage optimizes the trainable parameters of the model's feature extraction module according to the classification loss computed on each random batch; the inter-class optimization stage updates the class center points of the model according to the divergence loss computed on each random batch. Each stage iterates over one or more random batches before alternating to the other stage, and both stages optimize the parameters of the Mahalanobis distance.
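The alternation described above can be illustrated with a deliberately tiny toy (everything here is a hypothetical stand-in: a single scalar `a` replaces the CNN extractor, there are two 2-D classes, and K is fixed to the identity so the metric is plain Euclidean). It sketches the idea of the two-stage schedule, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D classes, five samples each.
X = np.vstack([rng.normal([2, 2], 0.1, (5, 2)),
               rng.normal([-2, -2], 0.1, (5, 2))])
y = np.array([0] * 5 + [1] * 5)
a = 1.5                                  # scalar stand-in for extractor weights
C = rng.normal(size=(2, 2))              # randomly initialized class centers
lr = 0.05

def intra_loss(a, C):
    """Mean squared distance of each 'feature' a*x to its class center."""
    f = a * X
    return np.mean(np.sum((f - C[y]) ** 2, axis=1))

def center_gap(C):
    return np.linalg.norm(C[0] - C[1])

gap0, intra0 = center_gap(C), intra_loss(a, C)
for step in range(50):
    # Stage 1 (intra-class): classification loss updates the extractor only.
    f = a * X
    grad_a = np.mean(np.sum(2 * (f - C[y]) * X, axis=1))
    a -= lr * grad_a
    # Stage 2 (inter-class): divergence loss L2 = -log(d + 1) updates centers only.
    d = center_gap(C)
    g = -(C[0] - C[1]) / (d * (d + 1))   # dL2/dC0; dL2/dC1 is the negative
    C[0] -= lr * g
    C[1] += lr * g
```

After the alternating loop, the intra-class loss has dropped and the center gap has grown, which is exactly the division of labor the two stages are meant to achieve.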
In some embodiments, weights are set for each center point based on the number of class sample points in each random lot, reducing the impact of class imbalance.
In some embodiments, a difficult-sample mining mechanism is used: the difficult samples of each random batch are determined during training from the Mahalanobis distances between the sample feature vector and each class center point, the loss of the difficult samples is calculated, and their weight in the loss function is increased to promote model training. In a preferred embodiment, samples meeting certain conditions are classified as difficult samples during training, and their weights in the loss value are increased. During model training, misclassified samples are judged to be difficult samples, and extra weight can be added to the classification loss of the difficult samples in the loss function; increasing the weight of these samples lets the model achieve a better training effect.
In some embodiments, the Mahalanobis distance is a learnable metric. A parameter matrix K ∈ R^{n×n} is constructed; for vectors X, Y ∈ R^n, the Mahalanobis distance D of the vectors X and Y is defined as follows, where ‖·‖₂ denotes the L2 norm:

D(X, Y) = ‖K(X − Y)‖₂ (1)

where the matrix M = KᵀK is positive semi-definite, and the elements of the parameter matrix K are trainable parameters that can be optimized according to the gradient of the loss function.
In some embodiments, the parameter matrix K is initialized to the identity matrix I, and the Mahalanobis distance between the two vectors X and Y takes the form:

D(X, Y) = ‖K(X − Y)‖₂ = ‖I(X − Y)‖₂ = ‖X − Y‖₂ (2)

i.e., the Mahalanobis distance degenerates to the Euclidean distance.
In some embodiments, identity-matrix initialization ensures that the model starts training from the Euclidean distance; but as the model computes the loss function and back-propagates gradients on each random batch, the elements of the parameter matrix are updated and the Mahalanobis distance takes the general form of formula (1), allowing the model to learn better metrics in the high-dimensional space for different classification tasks.
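A minimal sketch of this metric and its identity initialization (the perturbed `K_learned` is a made-up stand-in for a matrix after training):

```python
import numpy as np

def mahalanobis(x, y, K):
    """D(X, Y) = ||K (X - Y)||_2 per formula (1); M = K^T K is PSD."""
    return np.linalg.norm(K @ (np.asarray(x) - np.asarray(y)))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])

K = np.eye(3)                       # identity initialization, formula (2)
d_euclid = mahalanobis(x, y, K)     # degenerates to the Euclidean distance

K_learned = np.eye(3) + 0.1 * np.ones((3, 3))  # hypothetical trained K
d_learned = mahalanobis(x, y, K_learned)       # a different, learned metric
```

With K = I the value equals ‖x − y‖₂ exactly; any deviation of K from the identity changes the geometry the distance induces.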
In some embodiments, the intra-class distance calculation and the classification loss calculation in step 2) include: for the extracted feature f of each sample, calculating the Mahalanobis distance between the feature and the class center points. Let the set of class center points be C = [C_1, C_2, …, C_m], where m is the number of classes of the dataset; if the label of feature f is the i-th class, the intra-class distance can be expressed as:

D(f, C_i) = ‖K(f − C_i)‖₂ (3)

When the model judges the classification result of a sample, the Mahalanobis distance between the sample feature and the center point of its own class must be smaller than the Mahalanobis distances to the center points of all other classes, as shown in formula (4):

D(f, C_i) < D(f, C_j), j = 1, 2, …, m, j ≠ i (4)

The classification loss L_1 is defined in formula (5).
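Since formula (5) itself is not reproduced in this text, the following sketch assumes one plausible instantiation — a softmax cross-entropy over negative Mahalanobis distances — which is consistent with the stated goal of shrinking D(f, C_i) relative to the other centers; the centers and feature values are made up:

```python
import numpy as np

def classification_loss(f, centers, K, label):
    """Assumed form of L1: softmax cross-entropy over negative Mahalanobis
    distances, so minimizing it pulls f toward its own center (formula (3))
    and enforces the inequality of formula (4)."""
    d = np.array([np.linalg.norm(K @ (f - c)) for c in centers])
    logits = -d                       # closer center => larger logit
    logits -= logits.max()            # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[label]), d

centers = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])
K = np.eye(2)
f = np.array([0.5, -0.2])             # a feature near center 0
loss, d = classification_loss(f, centers, K, label=0)
```

Because f sits nearest its own center, inequality (4) holds and the loss is small; labeling the same feature with a wrong class would produce a much larger loss.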
in some embodiments, the inter-class distance calculation in step 3) comprises: for each random batch participating in training, the sample number is s, the class set of the samples in the random batch is B, the Markov distance between every two classes in the set B is calculated, then for each class in the set B, the minimum value of the distances between other classes in the set B is calculated, and all the minimum values are weighted and averaged to obtain the final inter-class distance.
In some embodiments, the difficult-sample mining mechanism includes: for each input sample with feature f, the Mahalanobis distances to all class center points are D(f, C_j). Suppose the sample belongs to the i-th class; if some q ∈ {x | x = 1, 2, …, m, x ≠ i} satisfies formula (6), where p is a hyperparameter, the sample is judged to be a difficult sample:

p·D(f, C_i) > D(f, C_q) (6)

The set of all q satisfying this condition is denoted Q, and the difficult-sample loss is defined over Q in formula (7).
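Only the difficult-sample decision of formula (6) is sketched below (the value p = 1.2 and the toy centers are assumptions; the loss over the set Q is not instantiated here):

```python
import numpy as np

def is_hard_sample(f, centers, K, label, p=1.2):
    """Formula (6): the sample is difficult if p * D(f, C_i) exceeds its
    distance to some wrong-class center C_q (p is a hyperparameter)."""
    d = np.array([np.linalg.norm(K @ (f - c)) for c in centers])
    others = np.delete(d, label)          # distances to all wrong centers
    return bool(np.any(p * d[label] > others))

centers = np.array([[0.0, 0.0], [4.0, 0.0]])
K = np.eye(2)
easy = is_hard_sample(np.array([0.2, 0.0]), centers, K, label=0)  # deep inside class 0
hard = is_hard_sample(np.array([1.9, 0.0]), centers, K, label=0)  # near the boundary
```

A sample well inside its own class fails the test, while one near the decision boundary passes it and would receive extra weight in the loss.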
a computer readable storage medium storing a computer program which when executed by a processor implements the image classification method.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides an image classification method, which designs a class center updating strategy based on random batches, and can be applied to an image classification task and an image verification task based on class center points. Compared with a general class center point-based image classification method, the random batch-based center point updating strategy provided by the invention can only calculate the inter-class distance for the class of the sample in the random batch, and construct the dispersion loss according to the inter-class distance, so that the model can learn the feature distribution with larger inter-class distance. The invention can enable the image classification method based on the class center to be directly used for the task of large class number of the data set such as face recognition. Moreover, the two-stage training, the weight calculation of inter-class distance weighted average and the difficult sample mining mechanism proposed by the invention can ensure that the method based on the invention can obtain good effects in tasks such as image classification, face verification and the like, and avoid training barriers on data sets with larger sample numbers and class numbers.
The image classification method of the invention provides a class center updating strategy based on random batches, updates the corresponding class centers for the classes present in each random batch, and designs a Mahalanobis distance with learnable parameters as the metric between features. The updating strategy of the invention gives class-center-based image classification better engineering significance and better image classification results. Compared with the prior art, the invention obtains better classification results and verification results on image classification and image verification tasks.
Drawings
Fig. 1 is a basic flowchart of an update strategy in an image classification method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of inter-class distance weights according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a simple sample and a difficult sample according to an embodiment of the present invention.
Detailed Description
The invention will be described in further detail with reference to the following detailed description and with reference to the accompanying drawings. It should be emphasized that the following description is merely exemplary in nature and is in no way intended to limit the scope of the invention or its applications.
Non-limiting and non-exclusive embodiments will be described with reference to the following drawings, in which like reference numerals refer to like elements unless otherwise specified.
Those skilled in the art will recognize that numerous variations to the above description are possible, and that the examples are intended only to be illustrative of one or more particular implementations.
According to the image classification method, a class center updating strategy based on random batches is provided, applicable to image classification and image verification tasks. The method mainly comprises the following steps: constructing class centers and extracting image features; calculating the Mahalanobis distance between the image features and the class center points and constructing the classification loss; calculating the Mahalanobis distance between the class centers of the classes corresponding to the samples in each random batch and constructing the divergence loss; alternately updating the trainable parameters of the feature extraction module and the class center points with a two-stage weight-updating method; and using a difficult-sample mining mechanism, whereby samples meeting certain conditions are classified as difficult samples during training and their weights in the loss value are increased.
As described in further detail below.
Feature vector and class center: the feature vector is extracted with a convolutional neural network; for an input image, a high-dimensional vector f is extracted as the feature of the input image. In the present invention, a class center vector, i.e., a center point, must be constructed for each class; the set of center points is C = [C_1, C_2, …, C_m], where m is the number of classes.
Mahalanobis distance:
two eigenvectors x= [ x ] for high-dimensional space 1 ,x 2 ,…,x n ] T ,y=[y 1 ,y 2 ,…,y n ] T Various feature measurement methods can be adopted, the common feature measurement methods are Euclidean distance dis and cosine similarity sim, the calculation processes of the two measurement methods are shown in the formula (4) and the formula (5), after the length of the vector is normalized, the two distances are equivalent, the Euclidean distance and the cosine similarity are the vector measurement methods of the better features, but the defects are that the two methods are the artificially designed distances, the image features extracted by the neural network are quite abstract, and a more suitable distance measurement method cannot be determined artificially.
The invention proposes measuring image feature vectors with the Mahalanobis distance, which uses a trainable parameter matrix to compute the distance D between two feature vectors, as shown in formula (6):

D(x, y) = ‖K(x − y)‖₂ = sqrt( (x − y)ᵀ M (x − y) ) (6)

where the matrix M = KᵀK is positive semi-definite and the elements of the parameter matrix K are trainable parameters, optimized according to the gradient of the loss function. When the matrix K (or M) is the identity matrix I, the Mahalanobis distance equals the Euclidean distance; to let the model search for a better metric starting from the Euclidean distance, the invention initializes the parameter matrix to the identity, giving a learnable metric that is no worse than the Euclidean distance and improves as the model trains.
The present patent calculates the classification loss and the divergence loss, both of which can update the values of the individual elements in the matrix by gradient back propagation.
Classification loss:
for the extracted feature f of each sample, the mahalanobis distance between the feature and the class center point needs to be calculated, assuming that the class center point is the set c= [ C ] 1 ,C 2 ,…,C m ]And m is the class number of the data set, if the label of the image feature f is the ith class, the class-to-class distance calculation method is shown as a formula (7), the aggregation degree of the ith class is expressed by using the distance between the ith class feature and the center point of the ith class feature, and the intra-class distance is required to be ensured to be small enough for a classification task or a verification task.
D(f,C i )=‖K(f-C i )‖ 2 (7)
When the mahalanobis distance training classification model is used, the sample can be correctly classified only if the mahalanobis distance between the sample feature and the corresponding class is smaller than the mahalanobis distance between the sample feature and the center point of all other classes, that is, the inequality shown in the formula (8) needs to be satisfied.
D(f,C i )<D(f,C j ),j=1,2,…,m,j≠i (8)
The patent of the invention will classify the loss L 1 The definition is shown in the formula (9), the classification loss value is smaller and smaller along with continuous training, and the image characteristics are thatThe mahalanobis distance of the sign f from the center point of the corresponding class and the mahalanobis distance of the feature from other center points are also as small as possible.
Divergence loss:
for each random lot participating in training, the sample set is J, the class set of samples in the random lot is B, each lot adopts a random sampling strategy, all classes in the training set may not be contained, if the mahalanobis distance between every two center points of all classes is calculated as the inter-class distance in each random lot, the calculation amount is great, so the patent considers that only the classes appearing in the class set B of the random lot are calculated.
Taking B = [b_1, b_2, …, b_r] as an example, the invention constrains the minimum Mahalanobis distance between each class center point and the other class center points: a distance matrix Z ∈ R^{r×r} is computed, whose element z_ij denotes the Mahalanobis distance between the center point of the i-th class and the center point of the j-th class.

For the center point of each class in B, a shortest distance can be found as in formula (11), and the weighted sum of all v_1, v_2, …, v_r gives the inter-class distance d_inter as in formula (12). The inter-class distance could be used directly as the divergence loss, but for better convergence the invention takes the logarithm of the distance value; the loss function is shown in formula (13).

v_i = min_{j ≠ i} z_ij (11)

d_inter = Σ_{i=1}^{r} w_i · v_i (12)

L_2 = −log(d_inter + 1) (13)

Calculating the inter-class distance requires setting a weight w_i for the shortest distance of each class; the weight value is related to the number of samples of each class in the random batch J. In this patent, w_i is computed as in formula (14), where n_i denotes the number of samples of the i-th class in the set J: the more samples a class has in the batch, the larger its weight in the weighted sum, and the fewer samples, the smaller its weight.

w_i = n_i / Σ_{k=1}^{r} n_k (14)
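The steps above — pairwise center distances z_ij, per-class minima v_i, sample-count weights, and the log-damped loss — can be sketched as follows (a minimal sketch: the normalized weight form w_i = n_i / s is an assumption, and the class center array is assumed indexed by class label; all values are made up):

```python
import numpy as np

def divergence_loss(centers, labels, K):
    """v_i = min_{j != i} z_ij; d_inter = sum_i w_i * v_i with assumed
    weights w_i = n_i / s; L2 = -log(d_inter + 1)."""
    classes, counts = np.unique(labels, return_counts=True)
    r = len(classes)
    # Pairwise Mahalanobis distances between the centers present in the batch.
    z = np.array([[np.linalg.norm(K @ (centers[a] - centers[b]))
                   for b in classes] for a in classes])
    # Shortest distance from each center to any *other* center.
    v = np.array([np.min(np.delete(z[i], i)) for i in range(r)])
    w = counts / counts.sum()            # larger class => larger weight
    d_inter = float(w @ v)
    return -np.log(d_inter + 1.0), d_inter

centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
labels = np.array([0, 0, 0, 1, 2])       # class 0 dominates this toy batch
K = np.eye(2)
L2, d_inter = divergence_loss(centers, labels, K)
```

Note that minimizing L2 = −log(d_inter + 1) drives d_inter up, i.e., pushes the batch's class centers apart, and the dominant class 0 contributes the most to the weighted average.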
A visual explanation is shown in FIG. 2: the more samples a class contains, the more dispersed they are likely to be and the more space they occupy in the high-dimensional feature space. For the three classes A, B, and C in FIG. 2, computing v_A, v_B, and v_C yields the line segments AC, BC, and CB, respectively. What matters most is that the samples of classes B and C stay as far as possible from the samples of class A; since classes B and C contain fewer samples, the constraint on segment BC can be weaker than that on AC. The weight of the shortest distance between class A and the other classes therefore needs to be increased, so that this class is kept as far as possible from the others, which further increases the inter-class distances over the whole data set and yields a sample feature distribution more favorable to classification.
Two-stage updating:
in order for the training process of the class-center-based classification model to be practically meaningful, the invention provides a two-stage updating strategy; the two stages are an intra-class optimization stage and an inter-class optimization stage. The class centers are randomly initialized. First, the intra-class optimization stage is executed on one or more random batches: the classification loss is calculated, the gradient of the loss function is back-propagated, and the parameters of the feature extraction module are updated. Then the inter-class optimization stage is executed on one or more random batches: the inter-class distance and the divergence loss of the class centers are calculated, the gradient of the loss is back-propagated, and the parameters of the center points are updated according to the distances between the class center points, i.e., the classes are kept as far apart as possible. By alternately updating the feature extraction module and the class center points in this way, the embodiment avoids the problem that the model is difficult to train.
The updating strategy is illustrated in FIG. 1: computing the classification constraint back-propagates the gradient to the convolutional neural network and updates the parameters of the feature extraction module, while computing the inter-class constraint back-propagates the gradient to the class center points and updates their trainable parameters. The trainable parameters in the Mahalanobis distance are updated by both the classification constraint and the inter-class constraint.
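The alternating schedule above can be sketched as follows. The function and variable names (`train_two_stage`, `steps_per_stage`, the tuples of parameter-group labels) are illustrative only, standing in for the actual gradient updates; the point of the sketch is that the Mahalanobis matrix K is touched in both stages, while the CNN weights and the class centers are each updated in only one.

```python
def train_two_stage(batches, n_alternations, steps_per_stage=1):
    """Sketch of the two-stage alternating update: stage 1 (intra-class)
    updates the CNN feature extractor via the classification loss, stage 2
    (inter-class) updates the class center points via the divergence loss;
    the Mahalanobis parameter matrix K is updated in both stages."""
    batch_iter = iter(batches)
    schedule = []  # records which parameter groups each batch updates
    for _ in range(n_alternations):
        for _ in range(steps_per_stage):
            next(batch_iter)                   # one random batch
            schedule.append(("cnn", "K"))      # classification-loss step
        for _ in range(steps_per_stage):
            next(batch_iter)                   # one random batch
            schedule.append(("centers", "K"))  # divergence-loss step
    return schedule
```

Running it over four batches with two alternations produces the interleaved pattern cnn → centers → cnn → centers, with K present in every step.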
Difficult sample mechanism:
for each input sample, the Mahalanobis distances between its feature f and the center points of all classes are D(f, C_j), j = 1, 2, …, m. Assuming the sample belongs to the i-th class, if the set defined by formula (15) is not empty, where p is a margin hyperparameter, the sample is judged to be a difficult sample.
Q = {q | p·D(f, C_i) > D(f, C_q), q = 1, 2, …, m, q ≠ i}  (15)
In the embodiment of this patent, p > 1 is taken. As illustrated in FIG. 3, X in FIG. 3(a) is the feature point of a sample of class 2, and the four open circles represent the four class center points. X is relatively close to center point 2 and is still correctly classified even when its distance is multiplied by a coefficient greater than 1, so it is defined as a simple sample. In FIG. 3(b), Y is also the feature point of a sample of class 2; the distance between Y and center point 2 is the shortest, so Y can be correctly classified, but if that distance is multiplied by a coefficient greater than 1, the shortest distance may become the distance between Y and center point 3, so Y is defined as a difficult sample. As FIG. 3 shows, when a sample feature is close enough to the correct class center, the model already classifies the sample well; such a simple sample has a small loss value and contributes little to training. For a sample that is correctly classified but whose distance to the correct class center is not markedly smaller than its distances to the other class centers, further use of the sample can promote training; however, because such a sample is already correctly classified, its value in the loss function is also relatively small, so a dedicated difficult-sample loss needs to be added to the loss function.
When calculating the difficult-sample loss, unlike formula (9), the denominator of the loss function is no longer the sum of the distances between the feature point and all class center points; only the distances between feature f and the class center points in the set Q are considered, as shown in formula (16) (assuming the sample belongs to class i).
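The hard-sample test of formula (15) reduces to a few distance comparisons. A sketch under our own variable names (the margin p and the center points come from the text; the helper name `hard_sample_set` is ours):

```python
import numpy as np

def hard_sample_set(f, centers, label, K, p):
    """Return Q = {q | p * D(f, C_label) > D(f, C_q), q != label} from
    formula (15); the sample is a difficult sample iff Q is non-empty.
    D is the Mahalanobis distance D(f, C_j) = ||K (f - C_j)||_2."""
    D = np.linalg.norm((f - centers) @ K.T, axis=1)  # distances to all m centers
    return [q for q in range(len(centers))
            if q != label and p * D[label] > D[q]]
```

For instance, with four collinear centers and p = 1.2, a feature lying very close to its own center yields an empty Q (a simple sample, like X in FIG. 3(a)), while a feature roughly midway between its center and a neighboring one yields a non-empty Q (a difficult sample, like Y in FIG. 3(b)).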
The background section of the present invention may contain background information about the problems or environments of the present invention and is not necessarily descriptive of the prior art. Accordingly, inclusion in the background section is not an admission of prior art by the applicant.
While there have been described and illustrated what are considered to be example embodiments of the present invention, it will be understood by those skilled in the art that various changes and substitutions can be made therein without departing from the spirit of the invention. In addition, many modifications may be made to adapt a particular situation to the teachings of the invention without departing from the central concept thereof as described herein. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the invention and equivalents thereof.
Claims (10)
1. An image classification method, characterized by comprising the steps of:
1) According to the number of categories of the input image, constructing category center points, wherein each category center point is a high-dimensional vector, the dimension of the vector is equal to the dimension of the feature, and the number of the center points is equal to the number of categories in the data set;
2) The method comprises the steps of using a convolutional neural network as a feature extraction module, for each random batch in a training process, calculating a feature vector for each sample by using the convolutional neural network, calculating a mahalanobis distance between the feature vector and a class center point as an intra-class distance, constructing a classification loss according to the intra-class distance, and updating parameters of the convolutional neural network by back propagation of gradients of the classification loss to optimize a model towards a direction in which the intra-class distance is reduced;
3) For each random batch in the training process, calculating the distance between the corresponding class center points according to the class of the samples in the batch, constructing a divergence loss according to the distance, updating the parameters of the class center points by the gradient back propagation of the divergence loss, and optimizing the parameters of the class center points towards the direction of increasing the distance between classes;
the weight of the feature extraction module and the trainable parameters of the class center are alternately updated through a two-stage training mechanism, and the trainable parameters in the mahalanobis distance are updated in both stages.
2. The image classification method according to claim 1, wherein the two-stage training mechanism first randomly initializes the class centers and then optimizes the model parameters by alternating between two stages, namely an intra-class optimization stage and an inter-class optimization stage; the intra-class optimization stage optimizes the trainable parameters in the feature extraction module of the model according to the classification loss calculated on each random batch, the inter-class optimization stage updates the class center points of the model according to the divergence loss calculated on each random batch, each stage iterates over one or more random batches before alternating to the other stage, and both the intra-class optimization stage and the inter-class optimization stage optimize the parameters in the Mahalanobis distance.
3. The image classification method of claim 1 or 2, wherein a weight is set for each center point based on the number of sample points of each class in each random lot to reduce the influence of class imbalance.
4. The image classification method according to any one of claims 1 to 2, characterized in that a difficult sample mining mechanism is used, wherein the difficult samples of each random batch are determined according to the mahalanobis distance between the sample feature vector and the center points of each class in the training process, the loss of the difficult samples is calculated, the weight of the difficult samples in the loss function is increased, and the model training is promoted.
5. The image classification method according to any one of claims 1-2, characterized in that, when calculating the Mahalanobis distance, a parameter matrix K ∈ R^{n×n} is constructed; for vectors X, Y ∈ R^n, the Mahalanobis distance D of the vectors X and Y is defined as follows, where ‖·‖_2 denotes the 2-norm:

D(X, Y) = ‖K(X − Y)‖_2  (1)
wherein the matrix M = K^T·K is a positive semi-definite matrix, and the elements of the parameter matrix K are trainable parameters that can be optimized according to the gradient of the loss function.
6. The image classification method of claim 5, wherein the parameter matrix K is initialized with the identity matrix I, and the mahalanobis distance between the two vectors X and Y is of the form:
D(X, Y) = ‖K(X − Y)‖_2 = ‖I(X − Y)‖_2 = ‖X − Y‖_2  (2)
i.e., the Mahalanobis distance degenerates to the Euclidean distance.
7. The image classification method according to any one of claims 1-2, wherein the intra-class distance calculation and the classification loss calculation in step 2) include: for the extracted feature f of each sample, calculating the Mahalanobis distance between the feature and its class center point; letting the class center points be C = [C_1, C_2, …, C_m], where m is the number of classes in the data set, and letting the label of feature f be the i-th class, the intra-class distance can be expressed as:
D(f, C_i) = ‖K(f − C_i)‖_2  (3)
when the model judges the classification result of a sample, the Mahalanobis distance between the sample feature and the center point of its own class must be smaller than the Mahalanobis distances between the sample feature and the center points of all other classes, as shown in formula (4):
D(f, C_i) < D(f, C_j), j = 1, 2, …, m, j ≠ i  (4)
the classification loss L_1 is defined as formula (5):
8. The image classification method according to any one of claims 1-2, wherein the inter-class distance calculation in step 3) includes: for each random batch participating in training, the number of samples is s and the class set of the samples in the random batch is B; the Mahalanobis distance between every two classes in the set B is calculated, then for each class in the set B the minimum of its distances to the other classes in the set B is taken, and all the minima are weighted and averaged to obtain the final inter-class distance.
9. The image classification method according to claim 4, wherein the difficult sample mining mechanism comprises: for each input sample, the Mahalanobis distances between the feature f and all class center points C_j are D(f, C_j), j = 1, 2, …, m; if the sample belongs to the i-th class and there exists q ∈ {x | x = 1, 2, …, m, x ≠ i} satisfying formula (6):
p·D(f, C_i) > D(f, C_q)  (6)
where p is the margin hyperparameter, the sample is judged to be a difficult sample,
and, letting Q be the set of all q satisfying the condition, the difficult-sample loss is defined as follows:
10. a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the image classification method according to any one of claims 1 to 9.
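The Mahalanobis distance of claims 5 and 6 (formulas (1)-(2)) is fully determined by the parameter matrix K, so it can be checked with a small sketch; this is illustrative code, not part of the patent.

```python
import numpy as np

def mahalanobis(X, Y, K):
    """D(X, Y) = ||K(X - Y)||_2 with trainable parameter matrix K;
    the induced M = K^T K is positive semi-definite by construction
    (claim 5)."""
    return float(np.linalg.norm(K @ (X - Y), ord=2))
```

With K initialized to the identity matrix I, the distance degenerates to the Euclidean distance, as claim 6 states; scaling K scales the distance accordingly.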
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110136790.6A CN112836629B (en) | 2021-02-01 | 2021-02-01 | Image classification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112836629A CN112836629A (en) | 2021-05-25 |
CN112836629B true CN112836629B (en) | 2024-03-08 |
Family
ID=75931273
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110136790.6A Active CN112836629B (en) | 2021-02-01 | 2021-02-01 | Image classification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112836629B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115880524A (en) * | 2022-11-17 | 2023-03-31 | 苏州大学 | Small sample image classification method based on Mahalanobis distance loss characteristic attention network |
CN117314891B (en) * | 2023-11-23 | 2024-04-12 | 南阳市永泰光电有限公司 | Optical lens surface defect detection method and system based on image processing |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109214360A (en) * | 2018-10-15 | 2019-01-15 | 北京亮亮视野科技有限公司 | A kind of construction method of the human face recognition model based on ParaSoftMax loss function and application |
CN109961089A (en) * | 2019-02-26 | 2019-07-02 | 中山大学 | Small sample and zero sample image classification method based on metric learning and meta learning |
WO2019128367A1 (en) * | 2017-12-26 | 2019-07-04 | 广州广电运通金融电子股份有限公司 | Face verification method and apparatus based on triplet loss, and computer device and storage medium |
WO2019127451A1 (en) * | 2017-12-29 | 2019-07-04 | 深圳前海达闼云端智能科技有限公司 | Image recognition method and cloud system |
CN111079790A (en) * | 2019-11-18 | 2020-04-28 | 清华大学深圳国际研究生院 | Image classification method for constructing class center |
CN111242199A (en) * | 2020-01-07 | 2020-06-05 | 中国科学院苏州纳米技术与纳米仿生研究所 | Training method and classification method of image classification model |
CN111429405A (en) * | 2020-03-04 | 2020-07-17 | 清华大学深圳国际研究生院 | Tin ball defect detection method and device based on 3D CNN |
CN111429407A (en) * | 2020-03-09 | 2020-07-17 | 清华大学深圳国际研究生院 | Chest X-ray disease detection device and method based on two-channel separation network |
CN111814584A (en) * | 2020-06-18 | 2020-10-23 | 北京交通大学 | Vehicle weight identification method under multi-view-angle environment based on multi-center measurement loss |
CN111985310A (en) * | 2020-07-08 | 2020-11-24 | 华南理工大学 | Training method of deep convolutional neural network for face recognition |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11636344B2 (en) * | 2018-03-12 | 2023-04-25 | Carnegie Mellon University | Discriminative cosine embedding in machine learning |
US10872258B2 (en) * | 2019-03-15 | 2020-12-22 | Huawei Technologies Co., Ltd. | Adaptive image cropping for face recognition |
US11720790B2 (en) * | 2019-05-22 | 2023-08-08 | Electronics And Telecommunications Research Institute | Method of training image deep learning model and device thereof |
Non-Patent Citations (2)
Title |
---|
Hyperspectral remote sensing image classification algorithm based on few-shot learning; Zhang Jing; Yuan Xiguo; Journal of Liaocheng University (Natural Science Edition); 2020-08-04 (Issue 06); full text *
Face recognition based on deep convolutional neural network and center loss; Zhang Yan'an; Wang Hongyu; Xu Fang; Science Technology and Engineering; 2017-12-18 (Issue 35); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647583B (en) | Face recognition algorithm training method based on multi-target learning | |
US7711156B2 (en) | Apparatus and method for generating shape model of object and apparatus and method for automatically searching for feature points of object employing the same | |
CN107885778B (en) | Personalized recommendation method based on dynamic near point spectral clustering | |
CN111523621A (en) | Image recognition method and device, computer equipment and storage medium | |
CN113378632A (en) | Unsupervised domain pedestrian re-identification algorithm based on pseudo label optimization | |
CN110852755B (en) | User identity identification method and device for transaction scene | |
CN111339988B (en) | Video face recognition method based on dynamic interval loss function and probability characteristic | |
US9189750B1 (en) | Methods and systems for sequential feature selection based on significance testing | |
CN112836629B (en) | Image classification method | |
CN110046634B (en) | Interpretation method and device of clustering result | |
Barman et al. | Shape: A novel graph theoretic algorithm for making consensus-based decisions in person re-identification systems | |
CN111079790B (en) | Image classification method for constructing class center | |
CN110942091A (en) | Semi-supervised few-sample image classification method for searching reliable abnormal data center | |
Xing et al. | A self-organizing incremental neural network based on local distribution learning | |
WO2021079442A1 (en) | Estimation program, estimation method, information processing device, relearning program, and relearning method | |
US20230267317A1 (en) | Sign-aware recommendation apparatus and method using graph neural network | |
CN112926397A (en) | SAR image sea ice type classification method based on two-round voting strategy integrated learning | |
CN115311478A (en) | Federal image classification method based on image depth clustering and storage medium | |
CN112668482A (en) | Face recognition training method and device, computer equipment and storage medium | |
US6778701B1 (en) | Feature extracting device for pattern recognition | |
CN114255381A (en) | Training method of image recognition model, image recognition method, device and medium | |
Dou et al. | V-SOINN: A topology preserving visualization method for multidimensional data | |
CN115563519A (en) | Federal contrast clustering learning method and system for non-independent same-distribution data | |
CN113724325B (en) | Multi-scene monocular camera pose regression method based on graph convolution network | |
Kajimura et al. | Quality control for crowdsourced POI collection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||