CN112836629B - Image classification method - Google Patents


Info

Publication number
CN112836629B
Authority
CN
China
Prior art keywords
class
distance
sample
loss
feature
Prior art date
Legal status
Active
Application number
CN202110136790.6A
Other languages
Chinese (zh)
Other versions
CN112836629A (en)
Inventor
王好谦
刘志宏
Current Assignee
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University
Priority to CN202110136790.6A
Publication of CN112836629A
Application granted
Publication of CN112836629B
Legal status: Active

Classifications

    • G06V40/172 Human faces: classification, e.g. identification
    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24133 Classification techniques based on distances to prototypes
    • G06N3/084 Neural-network learning methods: backpropagation, e.g. using gradient descent
    • G06V40/168 Human faces: feature extraction; face representation


Abstract

The invention provides an image classification method based on a random-batch class-center updating strategy, applicable to class-center-based image classification and image verification tasks. The method comprises the following steps: constructing class centers and extracting image features; computing the Mahalanobis distance between the image features and the class center points and constructing a classification loss; computing the Mahalanobis distance between the centers of the classes present in each random batch and constructing a divergence loss; and alternately updating the trainable parameters of the feature-extraction module and the class center points with a two-stage weight-updating method. The proposed updating strategy gives class-center-based image classification better engineering practicality and better classification results. Compared with the prior art, the invention obtains better classification and verification results on image classification and image verification tasks.

Description

Image classification method
Technical Field
The invention relates to the field of computer vision and image processing, and in particular to a metric-learning-based image classification method employing a random-batch class-center updating strategy.
Background
In image classification tasks, methods based on the cross-entropy loss are the most common. A convolutional neural network first extracts features from the input sample, which is generally an image; the feature is typically a high-dimensional vector, e.g. a 512-dimensional feature vector, and a multi-layer perceptron then classifies the image feature.
For the face recognition task, the identity of each face picture in the data set must be determined. Training a face recognition model amounts to training a classification model: a convolutional neural network extracts the feature, and a fully connected layer then produces a score for each class. In the verification and test stages, however, the input samples need not be classified, because the face identities encountered there generally do not appear in the training set; the convolutional neural network therefore only needs to extract feature vectors from the input images. The feature vectors of the two face pictures are normalized, so that every sample's feature vector has the same length while feature vectors of different samples differ in angle. During training, the features extracted from face pictures of the same person are pulled together as much as possible, so whether two samples belong to the same class can be judged from the angle between their feature vectors.
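The verification step described above reduces to comparing the angle between two normalized feature vectors. A minimal sketch (the function names and the threshold value are illustrative, not taken from the patent):

```python
import math

def cosine_similarity(u, v):
    # cos(angle) between two feature vectors; larger value = smaller angle.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def same_person(f1, f2, cos_threshold=0.5):
    # Two face features are judged to belong to the same identity when
    # their angle is below a threshold, i.e. the cosine is above it.
    return cosine_similarity(f1, f2) >= cos_threshold
```

The threshold would in practice be tuned on a validation set of face pairs.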
For common classification tasks and for face recognition, the feature-extraction module of the model is generally a convolutional neural network, and classification models generally adopt the cross-entropy loss. Its computation is shown in formulas (1) and (2), where e is the natural constant and logit_i is the score of the feature for class i; P_i denotes the probability that the feature belongs to the i-th class. Assuming the input feature belongs to the j-th class, the loss value is given by formula (2). Features trained with the cross-entropy loss have angular characteristics, i.e. the feature vectors exhibit an angular distribution in high-dimensional space.

P_i = e^{logit_i} / Σ_k e^{logit_k}  (1)

L = −log P_j  (2)
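Formulas (1) and (2), a softmax followed by the negative log-likelihood, can be checked numerically with a short sketch (names are illustrative):

```python
import math

def softmax_cross_entropy(logits, target):
    """P_i = e^{logit_i} / sum_k e^{logit_k}  (formula (1));
    loss L = -log P_j for the true class j    (formula (2))."""
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[target])

# A confident, correct prediction yields a small loss:
loss = softmax_cross_entropy([2.0, 1.0, 0.1], target=0)
```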
In the face recognition task, to better compare whether two face pictures of unknown identity belong to the same class, model training must ensure that the intra-class distance is smaller than the inter-class distance. A threshold can then be set directly: when the angle between two features is larger than the threshold, the two features belong to face pictures of different people; if the angle is smaller than the threshold, they belong to face pictures of the same person. A common loss function is shown in formula (3), where s is a scale parameter, θ_i is the angle between the feature vector and the weight vector of the i-th node of the fully connected layer, and a is a margin parameter; adding this constraint makes it an improved cross-entropy loss.

L = −log( e^{s·cos(θ_j + a)} / ( e^{s·cos(θ_j + a)} + Σ_{i≠j} e^{s·cos θ_i} ) )  (3)
Besides methods that improve the cross-entropy loss, a better family of current methods constructs class centers; for example, the Euclidean distance between a sample's feature vector and each class center point is added to the cross-entropy loss as an intra-class constraint. However, that method mixes the Euclidean distance with the angular cross-entropy loss, so its constraints are inconsistent. Another method uses the Euclidean distance directly as the metric between features and also constructs class center points, but it must compute the Euclidean distance between all class centers when calculating the inter-class distance; applied to face recognition, where the number of classes in the data set is large, the method is no longer practical. A more practical inter-class distance computation is therefore needed. Moreover, whether Euclidean distance or angle-based cosine similarity, the metric between features is set by hand; for deep learning, a hand-designed metric is not necessarily the best and may be only a local optimum.
It should be noted that the information disclosed in the above background section is only for understanding the background of the present application and thus may include information that does not form the prior art that is already known to those of ordinary skill in the art.
Disclosure of Invention
In order to overcome the defects of the background art, the invention provides an image classification method based on a random-batch class-center updating strategy, so as to remedy the shortcomings of metric-learning-based image classification and face recognition methods and further improve model performance.
The image classification method provided by the invention comprises the following steps:
1) Construct class center points according to the number of classes of the input images. Each class center point is a high-dimensional vector whose dimension equals the feature dimension, and the number of center points equals the number of classes in the data set.
2) For each random batch in training, compute the feature vectors with the convolutional neural network, compute the Mahalanobis distance between each feature vector and its class center point as the intra-class distance, construct the classification loss from it, and back-propagate the gradient of the classification loss to update the parameters of the convolutional neural network, so that the model is optimized towards reducing the intra-class distance.
3) For each random batch in training, compute the distances between the center points of the classes to which the samples in the batch belong, construct the divergence loss from these distances, and back-propagate the gradient of the divergence loss to update the class center points, optimizing them towards increasing the inter-class distance.
The weights of the feature-extraction module and the trainable parameters of the class centers are updated alternately through a two-stage training mechanism, and the trainable parameters of the Mahalanobis distance are updated in both stages.
According to an embodiment of the present invention, in step 1) the vectors are measured in the high-dimensional space with the Mahalanobis distance. The constructed class center points serve as the basis of image classification and contain trainable parameters; they are vectors in the high-dimensional space and are randomly initialized. Measuring features with the Mahalanobis distance is a metric method with trainable parameters: in a deep learning task, the model can automatically learn a suitable metric for the specific task.
In step 2), the convolutional neural network extracts the features of the image, the Mahalanobis distance between the image feature and each class center is computed, and the classification loss is constructed from these distances; the Mahalanobis distance contains trainable parameters. For each extracted sample feature, the current parameters are used to compute the Mahalanobis distance between the feature vector and each class center vector, and the classification loss is built on the Mahalanobis distances between the sample and the center points. As training proceeds, the Mahalanobis distance between a feature and the center point of its correct class keeps decreasing, while its distances to the other class centers keep increasing.
In step 3), the Mahalanobis distances between the centers of the classes represented in each random batch are computed as the inter-class distances, and the divergence loss is constructed from them; this updates the weights of the class center points and optimizes the model towards larger inter-class distances. When computing the inter-class distance, only the classes of the samples in the random batch need be considered. The divergence loss is a weighted average of the shortest distance from each class center to the other centers, where each class's shortest distance is weighted by the number of samples of that class in the random batch.
A two-stage weight-updating method alternately updates the trainable parameters of the feature-extraction module and the class center points. Training proceeds in two alternating stages: first the parameters of the feature-extraction module are updated with the classification loss, so that the model extracts better image features; then the trainable parameters of the class center points are updated with the divergence loss. Both the classification loss and the divergence loss update the trainable parameters of the Mahalanobis distance.
In some embodiments, the two-stage training mechanism first randomly initializes the class centers and then alternates between two stages to optimize the model parameters: an intra-class optimization stage and an inter-class optimization stage. The intra-class stage optimizes the trainable parameters of the model's feature-extraction module according to the classification loss computed on each random batch; the inter-class stage updates the class center points according to the divergence loss computed on each random batch. Each stage iterates over one or more random batches before the stages alternate, and both stages optimize the parameters of the Mahalanobis distance.
In some embodiments, weights are set for each center point based on the number of class sample points in each random lot, reducing the impact of class imbalance.
In some embodiments, a difficult-sample mining mechanism is used: the difficult samples of each random batch are determined during training according to the Mahalanobis distance between the sample feature vector and each class center point, the loss of the difficult samples is computed, and their weight in the loss function is increased to promote model training. In a preferred embodiment, samples satisfying certain conditions are marked as difficult samples during training and their weight in the loss value is increased. In particular, misclassified samples are judged to be difficult samples, and extra weight can be added to their classification loss in the loss function, so that the model obtains a better training effect.
In some embodiments, the Mahalanobis distance is a learnable metric. A parameter matrix K ∈ R^{n×n} is constructed; for vectors X, Y ∈ R^n, the Mahalanobis distance D of X and Y is defined as follows, where ‖·‖₂ denotes the 2-norm:

D(X, Y) = ‖K(X − Y)‖₂  (1)

where the matrix M = KᵀK is positive semi-definite, and the elements of the parameter matrix K are trainable parameters that can be optimized according to the gradient of the loss function.
In some embodiments, the parameter matrix K is initialized with the identity matrix I, in which case the Mahalanobis distance between two vectors X and Y takes the form

D(X, Y) = ‖K(X − Y)‖₂ = ‖I(X − Y)‖₂ = ‖X − Y‖₂  (2)

i.e. the Mahalanobis distance degenerates to the Euclidean distance.
In some embodiments, identity-matrix initialization ensures that the model starts training from the Euclidean distance; as the model computes the loss function and back-propagates gradients on each random batch, the elements of the parameter matrix are updated and the Mahalanobis distance takes the general form of formula (1), allowing the model to learn a better metric in high-dimensional space for different classification tasks.
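The metric of formulas (1) and (2) can be sketched as follows; with K initialized to the identity matrix the distance coincides with the Euclidean distance (function names are illustrative, and the gradient update of K is omitted):

```python
import math

def mat_vec(K, v):
    # Matrix-vector product for a plain list-of-lists matrix.
    return [sum(K[i][j] * v[j] for j in range(len(v))) for i in range(len(K))]

def mahalanobis(K, x, y):
    """D(X, Y) = ||K (X - Y)||_2  (formula (1))."""
    d = [a - b for a, b in zip(x, y)]
    kd = mat_vec(K, d)
    return math.sqrt(sum(c * c for c in kd))

n = 3
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
# With K = I the metric degenerates to the Euclidean distance (formula (2)):
assert abs(mahalanobis(I, x, y) - 5.0) < 1e-9
```

In training, the entries of K would be ordinary trainable parameters updated by back-propagation alongside the network weights.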
In some embodiments, the intra-class distance and classification-loss computation in step 2) comprises: for each extracted sample feature f, compute the Mahalanobis distance between the feature and the class center points. Let the class center points be C = [C₁, C₂, …, C_m], where m is the number of classes of the data set; if the label of feature f is the i-th class, the intra-class distance can be expressed as:

D(f, C_i) = ‖K(f − C_i)‖₂  (3)
When the model judges the classification result of a sample, the Mahalanobis distance between the sample feature and the center of its own class must be smaller than its distance to the center points of all other classes, as shown in formula (4):

D(f, C_i) < D(f, C_j),  j = 1, 2, …, m,  j ≠ i  (4)
The classification loss L₁ is defined in formula (5).
In some embodiments, the inter-class distance computation in step 3) comprises: for each random batch participating in training, with s samples whose class set is B, compute the Mahalanobis distance between every pair of classes in B; then, for each class in B, take the minimum of its distances to the other classes in B, and form the weighted average of all these minima to obtain the final inter-class distance.
In some embodiments, the difficult-sample mining mechanism comprises: for each input sample with feature f, the Mahalanobis distances to all class center points are D(f, C_j), j = 1, 2, …, m. Suppose the sample belongs to the i-th class; if some q ∈ {x | x = 1, 2, …, m, x ≠ i} satisfies formula (6), where p is a hyper-parameter, the sample is judged to be a difficult sample:

pD(f, C_i) > D(f, C_q)  (6)

The set of all q satisfying the condition is denoted Q, and the difficult-sample loss is defined over this set.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the image classification method.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides an image classification method, which designs a class center updating strategy based on random batches, and can be applied to an image classification task and an image verification task based on class center points. Compared with a general class center point-based image classification method, the random batch-based center point updating strategy provided by the invention can only calculate the inter-class distance for the class of the sample in the random batch, and construct the dispersion loss according to the inter-class distance, so that the model can learn the feature distribution with larger inter-class distance. The invention can enable the image classification method based on the class center to be directly used for the task of large class number of the data set such as face recognition. Moreover, the two-stage training, the weight calculation of inter-class distance weighted average and the difficult sample mining mechanism proposed by the invention can ensure that the method based on the invention can obtain good effects in tasks such as image classification, face verification and the like, and avoid training barriers on data sets with larger sample numbers and class numbers.
The image classification method of the invention proposes a random-batch class-center updating strategy that updates the centers of the classes present in each random batch, and designs a Mahalanobis distance with learnable parameters as the metric between features. The proposed updating strategy gives class-center-based image classification better engineering practicality and better classification results. Compared with the prior art, the invention obtains better classification and verification results on image classification and image verification tasks.
Drawings
Fig. 1 is a basic flowchart of an update strategy in an image classification method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of inter-class distance weights according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a simple sample and a difficult sample according to an embodiment of the present invention.
Detailed Description
The invention will be described in further detail with reference to the following detailed description and with reference to the accompanying drawings. It should be emphasized that the following description is merely exemplary in nature and is in no way intended to limit the scope of the invention or its applications.
Non-limiting and non-exclusive embodiments will be described with reference to the following drawings, in which like reference numerals refer to like elements unless otherwise specified.
Those skilled in the art will recognize that numerous variations to the above description are possible, and that the examples are intended only to be illustrative of one or more particular implementations.
The image classification method according to the invention proposes a random-batch class-center updating strategy and can be applied to image classification and image verification tasks. It mainly comprises the following steps: constructing class centers and extracting image features; computing the Mahalanobis distance between the image features and the class center points and constructing the classification loss; computing the Mahalanobis distance between the centers of the classes present in the random batch and constructing the divergence loss; alternately updating the trainable parameters of the feature-extraction module and the class center points with a two-stage weight-updating method; and, through a difficult-sample mining mechanism, marking samples that satisfy certain conditions as difficult samples during training and increasing their weight in the loss value.
As described in further detail below.
Feature vector and class center: the feature vector is extracted with a convolutional neural network; for an input image, a high-dimensional vector f is extracted as its feature. The invention constructs a class center vector, i.e. a center point, for each class; the set of center points is C = [C₁, C₂, …, C_m], where m is the number of classes.
Mahalanobis distance:
two eigenvectors x= [ x ] for high-dimensional space 1 ,x 2 ,…,x n ] T ,y=[y 1 ,y 2 ,…,y n ] T Various feature measurement methods can be adopted, the common feature measurement methods are Euclidean distance dis and cosine similarity sim, the calculation processes of the two measurement methods are shown in the formula (4) and the formula (5), after the length of the vector is normalized, the two distances are equivalent, the Euclidean distance and the cosine similarity are the vector measurement methods of the better features, but the defects are that the two methods are the artificially designed distances, the image features extracted by the neural network are quite abstract, and a more suitable distance measurement method cannot be determined artificially.
The invention proposes measuring image feature vectors with the Mahalanobis distance, which uses a trainable parameter matrix to compute the distance D between two feature vectors, as shown in formula (6):

D(x, y) = ‖K(x − y)‖₂ = √((x − y)ᵀ M (x − y))  (6)

where the matrix M = KᵀK is positive semi-definite, and the elements of the parameter matrix K are trainable parameters that can be optimized according to the gradient of the loss function. When the matrix K (or M) is the identity matrix I, the Mahalanobis distance reduces to the Euclidean distance. So that the model can search for a better metric starting from the Euclidean distance, the invention initializes the parameter matrix with the identity matrix, yielding a learnable metric that is no worse than the Euclidean distance and improves as training proceeds.
The present patent calculates the classification loss and the divergence loss, both of which can update the values of the individual elements in the matrix by gradient back propagation.
Classification loss:
For each extracted sample feature f, the Mahalanobis distance between the feature and the class center points must be computed. Let the class center points be the set C = [C₁, C₂, …, C_m], with m the number of classes of the data set. If the label of image feature f is the i-th class, the intra-class distance is computed as in formula (7); the distance between an i-th-class feature and the i-th center point expresses the aggregation degree of the i-th class, and for a classification or verification task the intra-class distance must be kept small enough.

D(f, C_i) = ‖K(f − C_i)‖₂  (7)
When training the classification model with the Mahalanobis distance, a sample can be correctly classified only if the Mahalanobis distance between the sample feature and the center of its own class is smaller than its distance to the center points of all other classes, i.e. the inequality of formula (8) must hold.

D(f, C_i) < D(f, C_j),  j = 1, 2, …, m,  j ≠ i  (8)
This patent defines the classification loss L₁ as shown in formula (9). As training proceeds, the classification loss value becomes smaller and smaller: the Mahalanobis distance between the image feature f and the center point of its own class becomes as small as possible, and the feature is pushed away from the other center points.
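The exact expression of the classification loss (formulas (5) and (9)) is not reproduced legibly in this text; one plausible instantiation with the behavior described above is a softmax over negative Mahalanobis distances, sketched here as an assumption rather than the patent's literal formula:

```python
import math

def classification_loss(distances, label):
    """Hypothetical form of L1: softmax over negative Mahalanobis
    distances D(f, C_j) to each class centre, then negative log-likelihood.
    Decreasing the distance to the correct centre lowers the loss;
    increasing the distances to other centres also lowers it."""
    exps = [math.exp(-d) for d in distances]
    return -math.log(exps[label] / sum(exps))
```

Under this form, gradients flow into both the feature extractor (through f) and the metric matrix K (through the distances), as the two-stage scheme requires.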
Divergence loss:
for each random lot participating in training, the sample set is J, the class set of samples in the random lot is B, each lot adopts a random sampling strategy, all classes in the training set may not be contained, if the mahalanobis distance between every two center points of all classes is calculated as the inter-class distance in each random lot, the calculation amount is great, so the patent considers that only the classes appearing in the class set B of the random lot are calculated.
Taking B = [b₁, b₂, …, b_r] as an example, the invention constrains the minimum Mahalanobis distance between each class center point and the other class center points: a distance matrix Z ∈ R^{r×r} is computed, whose element z_ij is the Mahalanobis distance between the center point of the i-th class and that of the j-th class.
For the center point of each class in B, a shortest distance can be found, as in formula (11); the weighted sum of all v₁, v₂, …, v_r gives the inter-class distance d_inter, as in formula (12). The inter-class distance could be used directly as the divergence loss, but for better convergence the invention takes the logarithm of the distance value; the loss function is shown in formula (13).

v_i = min_{j ≠ i} z_ij  (11)

d_inter = Σ_{i=1}^{r} w_i v_i  (12)

L₂ = −log(d_inter + 1)  (13)
Computing the inter-class distance requires setting a weight w_i for the shortest distance of each class, related to the number of samples of that class in the random batch J. In this patent, w_i is computed as in formula (14), where n_i is the number of samples of the i-th class in J: the larger the sample count of a class in the batch, the larger its weight in the weighted sum, and vice versa.

w_i = n_i / Σ_{k=1}^{r} n_k  (14)
An intuitive explanation is shown in Fig. 2: the more samples a class has, the more dispersed it is likely to be and the more space it occupies in the high-dimensional space. For the three classes A, B and C of Fig. 2, the computed v_A, v_B, v_C correspond to the segments AC, BC and CB respectively. It is more important that the samples of classes B and C stay as far as possible from the samples of A; because B and C contain fewer samples, the constraint on segment BC should be weaker than that on AC. The weight of the shortest distance between class A and the other classes is therefore increased, so that this class is kept as far as possible from the others, the inter-class distances of all classes in the data set grow, and the feature distribution of the samples becomes more favorable for classification.
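Formulas (11) to (13) can be sketched as follows, assuming (as the text suggests but does not state verbatim) that the weights w_i of formula (14) are the normalized per-class sample counts:

```python
import math

def divergence_loss(centers_dist, counts):
    """centers_dist: r x r matrix z_ij of Mahalanobis distances between the
    centres of the r classes present in the batch; counts: n_i, the number
    of samples of each class in the batch.
    v_i = min_{j != i} z_ij                     (formula (11))
    d_inter = sum_i w_i * v_i, w_i = n_i / s    (formulas (12), (14), assumed)
    L2 = -log(d_inter + 1)                      (formula (13))"""
    r = len(counts)
    total = sum(counts)
    d_inter = 0.0
    for i in range(r):
        v_i = min(centers_dist[i][j] for j in range(r) if j != i)
        d_inter += (counts[i] / total) * v_i
    return -math.log(d_inter + 1.0)
```

Minimizing L₂ increases d_inter, i.e. pushes each batch class's nearest neighbouring centre further away, with heavier weight on well-populated classes.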
Two-stage updating:
In order to make the training process of the classification model based on class center points tractable, the invention proposes a two-stage updating strategy, the two stages being an intra-class optimization stage and an inter-class optimization stage. The class centers are randomly initialized. First, the intra-class optimization stage is executed on one or more random batches: the classification loss is computed, its gradient is back-propagated, and the parameters of the feature extraction module are updated. Then the inter-class optimization stage is executed on one or more random batches: the inter-class distance and the divergence loss of the class centers are computed, the gradient of this loss is back-propagated, and the parameters of the center points are updated according to the distances between the class centers, i.e. the classes are pushed as far apart as possible. By alternately updating the feature extraction module and the class center points in this way, the embodiment avoids the problem of the model being difficult to train.
A diagram of the updating strategy is shown in fig. 1: computing the classification constraint back-propagates the gradient to the convolutional neural network and updates the parameters of the feature extraction module, while computing the inter-class constraint back-propagates the gradient to the class center points and updates their trainable parameters. The trainable parameters in the mahalanobis distance are updated in both cases, whether by the classification constraint or by the inter-class constraint.
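The alternation between the two stages can be sketched as a simple batch schedule; the stage labels and the batches_per_stage parameter below are illustrative, and the actual gradient updates (feature extractor vs. class centers, with the mahalanobis parameters updated in both) depend on the training framework:

```python
def two_stage_schedule(num_batches, batches_per_stage=1):
    """Assign each random batch to the intra-class or inter-class stage,
    starting with intra-class and alternating every `batches_per_stage`
    batches. 'intra' batches would update the feature-extractor weights;
    'inter' batches would update the class-center parameters; the trainable
    matrix K of the mahalanobis distance is updated in both stages."""
    return ["intra" if (b // batches_per_stage) % 2 == 0 else "inter"
            for b in range(num_batches)]
```

With batches_per_stage = 1 the stages alternate on every batch; a larger value lets each stage iterate over several consecutive batches before switching, as the description allows.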
Difficult sample mechanism:
For each input sample, the mahalanobis distance between the feature f and the center point of each class is D(f, C_j), j = 1, 2, …, m. Assuming the sample belongs to the i-th class, if the set of formula (15) is not empty, where p is a margin hyper-parameter, the sample is judged to be a difficult sample.
Q = {q | p·D(f, C_i) > D(f, C_q), q = 1, 2, …, m, q ≠ i} (15)
In the embodiment of the patent, p > 1 is taken. As illustrated in fig. 3, X in fig. 3(a) is the feature point of a sample of class 2, and the four open circles represent the four class center points. Since X is so close to center point 2 that it is still correctly classified even after its distance is multiplied by a coefficient greater than 1, this sample is defined as a simple sample. In fig. 3(b), Y is also the feature point of a sample of class 2; the distance between Y and center point 2 is the shortest, so Y can be correctly classified, but if that distance is multiplied by a coefficient greater than 1, the shortest distance may become the one between Y and center point 3, so this sample is defined as a difficult sample. As can be seen from fig. 3, when a sample feature is close enough to its correct class center, the model already classifies it well; such a simple sample has a small loss value and contributes little to model training. For a sample that can be correctly classified but whose distance to the correct class center is not significantly smaller than its distances to the other class centers, further training on that sample can still improve the model; yet, because the sample is already correctly classified, its value in the loss function is also relatively small, so it is necessary to add a difficult-sample loss to the loss function.
When calculating the difficult-sample loss, unlike formula (9), the denominator of the loss function is no longer the sum of the distances between the feature point and all class center points; only the distances between feature f and the class center points in the set Q are considered, as shown in formula (16) (assuming the sample belongs to the i-th class).
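A minimal sketch of the difficult-sample test of formula (15); Euclidean distance again stands in for the learned mahalanobis distance, and the value of the margin p is illustrative:

```python
import math

def euclidean(a, b):
    # stands in for the learned Mahalanobis distance (assumes K = I)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hard_sample_set(f, centers, i, p=1.3):
    """Formula (15): Q collects every class q whose center is closer to
    feature f than p times the distance to the true class center C_i.
    The sample is judged 'difficult' when Q is non-empty."""
    d_true = euclidean(f, centers[i])
    return {q for q in range(len(centers))
            if q != i and p * d_true > euclidean(f, centers[q])}

def is_hard_sample(f, centers, i, p=1.3):
    return bool(hard_sample_set(f, centers, i, p))
```

A sample lying very near its own center (like X in fig. 3(a)) yields an empty Q; one that is closest to its own center but only narrowly (like Y in fig. 3(b)) yields a non-empty Q, and its distances to the centers in Q would then form the restricted denominator of formula (16).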
The background section of the present invention may contain background information about the problems or environments of the present invention and is not necessarily descriptive of the prior art. Accordingly, inclusion in the background section is not an admission of prior art by the applicant.
While there have been described and illustrated what are considered to be example embodiments of the present invention, it will be understood by those skilled in the art that various changes and substitutions can be made therein without departing from the spirit of the invention. In addition, many modifications may be made to adapt a particular situation to the teachings of the invention without departing from the central concept thereof as described herein. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the invention and equivalents thereof.

Claims (10)

1. An image classification method, characterized by comprising the steps of:
1) According to the number of categories of the input image, constructing category center points, wherein each category center point is a high-dimensional vector, the dimension of the vector is equal to the dimension of the feature, and the number of the center points is equal to the number of categories in the data set;
2) The method comprises the steps of using a convolutional neural network as a feature extraction module, for each random batch in a training process, calculating a feature vector for each sample by using the convolutional neural network, calculating a mahalanobis distance between the feature vector and a class center point as an intra-class distance, constructing a classification loss according to the intra-class distance, and updating parameters of the convolutional neural network by back propagation of gradients of the classification loss to optimize a model towards a direction in which the intra-class distance is reduced;
3) For each random batch in the training process, calculating the distance between the corresponding class center points according to the class of the samples in the batch, constructing a divergence loss according to the distance, updating the parameters of the class center points by the gradient back propagation of the divergence loss, and optimizing the parameters of the class center points towards the direction of increasing the distance between classes;
the weight of the feature extraction module and the trainable parameters of the class center are alternately updated through a two-stage training mechanism, and the trainable parameters in the mahalanobis distance are updated in both stages.
2. The image classification method according to claim 1, wherein the two-stage training mechanism first randomly initializes the class centers and then optimizes the model parameters by alternating between two stages, namely an intra-class optimization stage and an inter-class optimization stage; the intra-class optimization stage optimizes the trainable parameters in the feature extraction module of the model according to the classification loss calculated on each random batch, and the inter-class optimization stage updates the class center points of the model according to the divergence loss calculated on each random batch; each stage iterates over one or more random batches before alternating to the other stage, and both the intra-class and the inter-class optimization stages optimize the parameters in the mahalanobis distance.
3. The image classification method of claim 1 or 2, wherein a weight is set for each center point based on the number of sample points of each class in each random batch to reduce the influence of class imbalance.
4. The image classification method according to any one of claims 1 to 2, characterized in that a difficult sample mining mechanism is used, wherein the difficult samples of each random batch are determined according to the mahalanobis distance between the sample feature vector and the center points of each class in the training process, the loss of the difficult samples is calculated, the weight of the difficult samples in the loss function is increased, and the model training is promoted.
5. The image classification method according to any one of claims 1-2, characterized in that a parameter matrix K ∈ R^(n×n) is constructed when calculating the mahalanobis distance; for vectors X, Y ∈ R^n, the mahalanobis distance D of the vectors X and Y is defined as formula (1), where ||·||_2 denotes the 2-norm:

D(X, Y) = ||K(X - Y)||_2 (1)
wherein the matrix M = K^T·K is a positive semi-definite matrix, and the elements of the parameter matrix K are trainable parameters that can be optimized according to the gradient of the loss function.
6. The image classification method of claim 5, wherein the parameter matrix K is initialized with the identity matrix I, and the mahalanobis distance between the two vectors X and Y is of the form:
D(X, Y) = ||K(X - Y)||_2 = ||I(X - Y)||_2 = ||X - Y||_2 (2)
the mahalanobis distance thus degenerates to the euclidean distance.
7. The image classification method according to any one of claims 1 to 2, wherein the intra-class distance calculation and the classification loss calculation in step 2) include: for the feature f extracted from each sample, the mahalanobis distance between the feature and its class center point is calculated; let the set of class center points be C = [C_1, C_2, ..., C_m], where m is the number of categories in the dataset; if the label of feature f is the i-th category, the intra-class distance can be expressed as:
D(f, C_i) = ||K(f - C_i)||_2 (3)
when the model judges the classification result of a sample, it is ensured that the mahalanobis distance between the sample feature and its corresponding class center is smaller than the mahalanobis distances between the sample feature and the center points of all other classes, as shown in formula (4):
D(f, C_i) < D(f, C_j), j = 1, 2, …, m, j ≠ i (4)
the classification loss L_1 is defined as formula (5):
8. The image classification method according to any one of claims 1 to 2, wherein the inter-class distance calculation in step 3) includes: for each random batch participating in training with sample number s, let B be the set of classes of the samples in the random batch; the mahalanobis distance between every two classes in set B is calculated; then, for each class in set B, the minimum of its distances to the other classes in set B is taken, and a weighted average of all the minimum values is computed to obtain the final inter-class distance.
9. The image classification method of claim 4, wherein the difficult sample mining mechanism comprises: for each input sample, the mahalanobis distance between the feature f and each class center point C_j is D(f, C_j), j = 1, 2, …, m; if the sample belongs to the i-th class and there exists q ∈ {x | x = 1, 2, …, m, x ≠ i} satisfying formula (6):
p·D(f, C_i) > D(f, C_q) (6)
where p is a margin hyper-parameter, then the sample is judged to be a difficult sample; the set of all q satisfying the condition is denoted Q, and the difficult sample loss is defined as follows:
10. a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the image classification method according to any one of claims 1 to 9.
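The trainable mahalanobis distance of claims 5 and 6 can be sketched as follows (plain-Python with hypothetical helper names; in practice K would be a framework tensor updated by back-propagation):

```python
import math

def identity(n):
    # initialize K as the identity matrix I, per claim 6
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def mahalanobis(K, x, y):
    """D(X, Y) = ||K(X - Y)||_2, as in formulas (1) and (2)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    # matrix-vector product K(X - Y)
    kd = [sum(K[r][c] * diff[c] for c in range(len(diff)))
          for r in range(len(K))]
    return math.sqrt(sum(v * v for v in kd))
```

With K initialized to the identity matrix the distance reduces to the Euclidean distance, exactly as formula (2) states; training then moves K away from I so the metric adapts to the feature distribution.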
CN202110136790.6A 2021-02-01 2021-02-01 Image classification method Active CN112836629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110136790.6A CN112836629B (en) 2021-02-01 2021-02-01 Image classification method


Publications (2)

Publication Number Publication Date
CN112836629A CN112836629A (en) 2021-05-25
CN112836629B true CN112836629B (en) 2024-03-08

Family

ID=75931273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110136790.6A Active CN112836629B (en) 2021-02-01 2021-02-01 Image classification method

Country Status (1)

Country Link
CN (1) CN112836629B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880524A (en) * 2022-11-17 2023-03-31 苏州大学 Small sample image classification method based on Mahalanobis distance loss characteristic attention network
CN117314891B (en) * 2023-11-23 2024-04-12 南阳市永泰光电有限公司 Optical lens surface defect detection method and system based on image processing

Citations (10)

Publication number Priority date Publication date Assignee Title
CN109214360A (en) * 2018-10-15 2019-01-15 北京亮亮视野科技有限公司 A kind of construction method of the human face recognition model based on ParaSoftMax loss function and application
CN109961089A (en) * 2019-02-26 2019-07-02 中山大学 Small sample and zero sample image classification method based on metric learning and meta learning
WO2019128367A1 (en) * 2017-12-26 2019-07-04 广州广电运通金融电子股份有限公司 Face verification method and apparatus based on triplet loss, and computer device and storage medium
WO2019127451A1 (en) * 2017-12-29 2019-07-04 深圳前海达闼云端智能科技有限公司 Image recognition method and cloud system
CN111079790A (en) * 2019-11-18 2020-04-28 清华大学深圳国际研究生院 Image classification method for constructing class center
CN111242199A (en) * 2020-01-07 2020-06-05 中国科学院苏州纳米技术与纳米仿生研究所 Training method and classification method of image classification model
CN111429405A (en) * 2020-03-04 2020-07-17 清华大学深圳国际研究生院 Tin ball defect detection method and device based on 3D CNN
CN111429407A (en) * 2020-03-09 2020-07-17 清华大学深圳国际研究生院 Chest X-ray disease detection device and method based on two-channel separation network
CN111814584A (en) * 2020-06-18 2020-10-23 北京交通大学 Vehicle weight identification method under multi-view-angle environment based on multi-center measurement loss
CN111985310A (en) * 2020-07-08 2020-11-24 华南理工大学 Training method of deep convolutional neural network for face recognition

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US11636344B2 (en) * 2018-03-12 2023-04-25 Carnegie Mellon University Discriminative cosine embedding in machine learning
US10872258B2 (en) * 2019-03-15 2020-12-22 Huawei Technologies Co., Ltd. Adaptive image cropping for face recognition
US11720790B2 (en) * 2019-05-22 2023-08-08 Electronics And Telecommunications Research Institute Method of training image deep learning model and device thereof


Non-Patent Citations (2)

Title
Hyperspectral remote sensing image classification algorithm based on few-shot learning; Zhang Jing; Yuan Xiguo; Journal of Liaocheng University (Natural Science Edition); 2020-08-04 (No. 06); full text *
Face recognition based on deep convolutional neural network and center loss; Zhang Yan'an; Wang Hongyu; Xu Fang; Science Technology and Engineering; 2017-12-18 (No. 35); full text *


Similar Documents

Publication Publication Date Title
CN108647583B (en) Face recognition algorithm training method based on multi-target learning
US7711156B2 (en) Apparatus and method for generating shape model of object and apparatus and method for automatically searching for feature points of object employing the same
CN107885778B (en) Personalized recommendation method based on dynamic near point spectral clustering
CN111523621A (en) Image recognition method and device, computer equipment and storage medium
CN113378632A (en) Unsupervised domain pedestrian re-identification algorithm based on pseudo label optimization
CN110852755B (en) User identity identification method and device for transaction scene
CN111339988B (en) Video face recognition method based on dynamic interval loss function and probability characteristic
US9189750B1 (en) Methods and systems for sequential feature selection based on significance testing
CN112836629B (en) Image classification method
CN110046634B (en) Interpretation method and device of clustering result
Barman et al. Shape: A novel graph theoretic algorithm for making consensus-based decisions in person re-identification systems
CN111079790B (en) Image classification method for constructing class center
CN110942091A (en) Semi-supervised few-sample image classification method for searching reliable abnormal data center
Xing et al. A self-organizing incremental neural network based on local distribution learning
WO2021079442A1 (en) Estimation program, estimation method, information processing device, relearning program, and relearning method
US20230267317A1 (en) Sign-aware recommendation apparatus and method using graph neural network
CN112926397A (en) SAR image sea ice type classification method based on two-round voting strategy integrated learning
CN115311478A (en) Federal image classification method based on image depth clustering and storage medium
CN112668482A (en) Face recognition training method and device, computer equipment and storage medium
US6778701B1 (en) Feature extracting device for pattern recognition
CN114255381A (en) Training method of image recognition model, image recognition method, device and medium
Dou et al. V-SOINN: A topology preserving visualization method for multidimensional data
CN115563519A (en) Federal contrast clustering learning method and system for non-independent same-distribution data
CN113724325B (en) Multi-scene monocular camera pose regression method based on graph convolution network
Kajimura et al. Quality control for crowdsourced POI collection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant