Disclosure of Invention
Aiming at the technical problem of high-discrimination feature learning in small sample image classification, the invention provides a small sample image classification method and device based on feature migration and orthogonal prior.
In order to achieve the above purpose, the invention provides the following technical scheme:
the invention firstly provides a small sample image classification method based on feature migration and orthogonal prior, which comprises the following steps:
S1, data preparation: pre-training on images to obtain an embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
S2, introducing the orthogonal prior idea into the convolutional neural network model, and constructing a feature learning network model based on feature migration and orthogonal prior;
S3, training and optimizing the objective function of the orthogonal prior feature learning network model;
and S4, classifying the images in the test set by using the optimized image orthogonal prior feature learning network model.
Further, step S1 includes:
S11, dividing the data set D into two parts, D_train and D_test, whose class spaces are mutually exclusive; D_train serves as base class data for training the model, and D_test serves as new class data for testing the model;
S12, for a C-way K-shot classification task, randomly selecting C classes from D_train, and randomly selecting M samples in each class, wherein K samples serve as support samples S_i and the remaining M−K samples serve as query samples Q_i; S_i and Q_i form a task T_i; tasks are formed from D_test in the same way;
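As an illustrative sketch (not part of the claimed invention), the C-way K-shot task construction of step S12 can be written as follows; the dictionary-based `dataset` layout and the name `sample_task` are assumptions made here for illustration:

```python
import random

def sample_task(dataset, C, K, M):
    # dataset: dict mapping class label -> list of samples.
    # Randomly pick C classes and M samples per class; the first K
    # samples of each class form the support set S_i, the remaining
    # M - K samples form the query set Q_i.
    classes = random.sample(list(dataset), C)
    support, query = [], []
    for c in classes:
        samples = random.sample(dataset[c], M)
        support += [(c, s) for s in samples[:K]]
        query += [(c, s) for s in samples[K:]]
    return support, query
```

A task T_i is then simply the pair (support, query) returned by one call.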
S13, first training stage: pre-training the embedding module f_θ with base class data. f_θ contains 4 convolution blocks, each containing a convolution layer, a pooling layer and a nonlinear activation function; each convolution block uses a 3×3 convolution kernel with one batch normalization; the input has three RGB channels; each pooling layer is a 2×2 max pooling layer, and the max pooling layers of the last two blocks are removed; the nonlinear activation layer uses the ReLU activation function.
Further, in the feature learning network model based on feature migration and orthogonal prior of step S2, the orthogonalized feature adaptation network consists of three parts: the embedding module f_θ, the orthogonal adaptation module and the metric module. The orthogonal adaptation module consists of two convolution layers with 5×5 convolution kernels, and is used for transforming new class sample features and learning an orthogonalized feature subspace.
Further, step S3 includes:
S31, in the second training stage, performing a classification task on the new class data, and inputting all support samples into the embedding module f_θ with fixed parameters to obtain the corresponding support sample features f_θ(S_ck);
S32, performing feature transformation with the orthogonal adaptation module to obtain the transformed features;
S33, the transformed features correspond to the masks M of each class c Multiplying to enable the features between different classes to be orthogonal pairwise;
S34, calculating the cosine distance C(P_ci, P_cj) (i, j ∈ [0, K), i ≠ j) between same-class features with the metric module;
S35, optimizing the orthogonal adaptation module with the mean square error loss function.
Further, step S33 is calculated as:

P_ck = g(f_θ(S_ck)) ⊙ M_c (1)

wherein S_ck is the kth support sample of class c, g denotes the transformation of the orthogonal adaptation module, ⊙ represents element-wise multiplication of same-order matrices, and M_cijh is the value at row i, column j of the hth channel of the class-c mask M_c; the elements of M_c are constituted as follows:

M_cijh = 1 when cH/C ≤ h < (c+1)H/C, and M_cijh = 0 otherwise (2)

wherein C is the total number of classes under the current task, H is the number of feature channels, H is an integral multiple of C, and the channel index h starts from 0.
Further, step S34 is calculated as follows:

C(P_ci, P_cj) = Σ(P_ci ⊙ P_cj) / (‖P_ci‖ · ‖P_cj‖) (3)

wherein C(P_ci, P_cj) is the cosine distance between same-class features, K is the number of support samples, c denotes the cth class, P_ci represents the ith support sample feature in class c, P_cj represents the jth support sample feature in class c, ⊙ represents element-wise multiplication of the corresponding matrix elements, and ‖P_ci‖ represents the two-norm of the matrix P_ci.
Further, the mean square error loss function of step S35 is calculated as follows:

L = (1/N) Σ_c Σ_{i≠j} MSE[cos(P_ci, P_cj), 1] (4)

wherein N is the total number of classes under the current task, cos(P_ci, P_cj) is the cosine distance between same-class features, and MSE[cos(P_ci, P_cj), 1] = [cos(P_ci, P_cj) − 1]².
After the loss over the support samples is calculated, gradient descent is performed, and the orthogonal adaptation module is updated with mini-batches and the Adam optimizer; training is repeated over multiple tasks until the network converges.
Further, the specific steps of the Adam adaptive optimization algorithm are as follows:
Initialization: v_dW = 0, S_dW = 0, v_db = 0, S_db = 0, which represent the biased first and second moment estimates respectively, and dW, db represent the differentials of W and b respectively;
Calculating the Momentum exponentially weighted averages:

v_dW = β1·v_dW + (1 − β1)·dW (5)
v_db = β1·v_db + (1 − β1)·db (6)

Calculating the exponentially weighted averages of the squared gradients of the RMSprop algorithm:

S_dW = β2·S_dW + (1 − β2)·(dW)² (7)
S_db = β2·S_db + (1 − β2)·(db)² (8)

Calculating the bias corrections of the Momentum and RMSprop algorithms.
Bias correction of the Momentum algorithm:

v̂_dW = v_dW / (1 − β1^t) (9)
v̂_db = v_db / (1 − β1^t) (10)

Bias correction of the RMSprop algorithm:

Ŝ_dW = S_dW / (1 − β2^t) (11)
Ŝ_db = S_db / (1 − β2^t) (12)

Performing gradient descent and updating the weights:

W = W − α·v̂_dW / (√Ŝ_dW + ε) (13)
b = b − α·v̂_db / (√Ŝ_db + ε) (14)

In equations (5)–(14), t denotes the tth iteration, α denotes the learning rate, which controls the update rate of the weights, ε denotes a very small constant, β1 and β2 denote the exponential decay rates of the first and second moment estimates respectively, and v̂_dW, v̂_db, Ŝ_dW, Ŝ_db denote the first and second moment estimates after bias correction.
Further, step S4 includes:
S41, testing process: each task T_i consists of a support set S_i and a query set Q_i; the query set of the test set is input into the embedding module f_θ and the fine-tuned orthogonal adaptation module to obtain the features;
S42, multiplying the features output by the orthogonal adaptation module by the masks M_c corresponding to the different classes respectively, the specific operation being shown in formula (1), wherein the input is the kth query sample, ⊙ represents element-wise multiplication of same-order matrices, and M_cijh is the value at row i, column j of the hth channel of the class-c mask M_c; the elements of M_c equal 1 when cH/C ≤ h < (c+1)H/C and 0 otherwise, wherein C is the total number of classes under the current task, H is the number of feature channels, H is an integral multiple of C, and the channel index h starts from 0;
S43, sending the products into the metric module, and calculating the cosine distances between the query sample and all the support samples;
S44, taking the class of the closest support sample as the predicted class of the query sample.
On the other hand, the invention also provides a small sample image classification device based on feature migration and orthogonal prior, which is used for realizing the method and comprises the following functional modules:
a pre-training module, for pre-training on images to obtain an embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
a processing module, for introducing the orthogonal prior idea and constructing a feature learning network model based on feature migration and orthogonal prior;
a calculation module, for solving the model parameters by training and optimizing the objective function of the orthogonal prior feature learning network model; and
a classification module, for classifying the images of the test set with the optimized orthogonal prior feature learning network model.
Compared with the prior art, the invention has the beneficial effects that:
the invention discloses a small sample image classification method and device based on feature migration and orthogonal prior, which are based on Deep Convolutional Neural Networks (DCNN), and are used for researching a small sample classification framework of high-resolution feature extraction on the basis of small sample image classification research based on depth measurement. The method comprises the steps of learning an orthogonal feature subspace by introducing feature migration and orthogonal prior small sample image feature learning, assuming a new class and base class shared feature extraction mode and assuming feature orthogonality of new class data among different classes, wherein mutual correlation does not exist, and establishing an orthogonalized feature adaptation network to enable different classes of features to be mutually orthogonal, so that different classes are easily distinguished, and the identification degree of the features is improved. The method has very important significance for theoretical research of small sample learning and promotion of wide application of machine identification technology. Meanwhile, the advanced technology for breaking through the theoretical bottleneck of small sample learning and mastering artificial intelligence in China firstly plays a role in adding bricks and tiles.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort belong to the protection scope of the present invention.
The invention provides a small sample image classification method based on feature migration and orthogonal prior; the flow is shown in figure 2 and comprises the following steps:
S1, data preparation: pre-training on images to obtain an embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
specifically, step S1 includes:
S11, dividing the data set D into two parts, D_train and D_test, whose class spaces are mutually exclusive; D_train serves as base class data for training the model, and D_test serves as new class data for testing the model;
S12, for a C-way K-shot classification task, randomly selecting C classes from D_train, and randomly selecting M samples in each class, wherein K samples serve as support samples S_i and the remaining M−K samples serve as query samples Q_i; S_i and Q_i form a task T_i; tasks are formed from D_test in the same way.
S13, first training stage: pre-training the embedding module f_θ with base class data. f_θ contains 4 convolution blocks, each containing a convolution layer, a pooling layer and a nonlinear activation function; each convolution block uses a 3×3 convolution kernel with one batch normalization; the input has three RGB channels; each pooling layer is a 2×2 max pooling layer, and the max pooling layers of the last two blocks are removed; the nonlinear activation layer uses the ReLU activation function. For example, for an 84×84×3 RGB image, each block uses 3×3 convolution kernels with 64 filters. Each block consists of 1 convolution, 1 ReLU and one pooling, as shown in fig. 3. The pre-trained embedding module can be reused in different scenarios.
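The feature-map sizes produced by the four convolution blocks above can be checked with a small sketch. Note that 'same' padding for the 3×3 convolutions is an assumption made here, since the description does not state the padding:

```python
def embed_output_shape(hw=(84, 84), filters=64):
    """Trace the feature-map size through the 4 convolution blocks.

    Each block: 3x3 conv (assumed 'same' padding, so spatial size is
    preserved), batch norm, ReLU; the first two blocks end with a 2x2
    max pooling, which is removed in the last two blocks.
    """
    pools = [True, True, False, False]
    h, w = hw
    for has_pool in pools:
        if has_pool:
            h, w = h // 2, w // 2
    return (filters, h, w)

print(embed_output_shape())  # (64, 21, 21) for an 84x84x3 input
```

Under these assumptions, f_θ maps an 84×84×3 image to a 64×21×21 feature tensor.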
S2, introducing the orthogonal prior idea into the convolutional neural network model, and constructing a feature learning network model based on feature migration and orthogonal prior, as shown in fig. 4.
Specifically, in the feature learning network model based on feature migration and orthogonal prior of step S2, the orthogonalized feature adaptation network consists of three parts: the embedding module f_θ, the orthogonal adaptation module and the metric module. The orthogonal adaptation module consists of two convolution layers with 5×5 convolution kernels, and is used for transforming new class sample features and learning an orthogonalized feature subspace, as shown in fig. 5.
S3, training and optimizing the objective function of the orthogonal prior feature learning network model.
specifically, step S3 includes:
S31, in the second training stage, performing a classification task on the new class data, and inputting all support samples into the embedding module f_θ with fixed parameters to obtain the corresponding support sample features f_θ(S_ck);
S32, performing feature transformation with the orthogonal adaptation module to obtain the transformed features;
S33, multiplying the transformed features by the mask M_c of each class c, so that features of different classes are pairwise orthogonal.
Step S33 is calculated as:

P_ck = g(f_θ(S_ck)) ⊙ M_c (1)

wherein S_ck is the kth support sample of class c, g denotes the transformation of the orthogonal adaptation module, ⊙ represents element-wise multiplication of same-order matrices, and M_cijh is the value at row i, column j of the hth channel of the class-c mask M_c; the elements of M_c are constituted as follows:

M_cijh = 1 when cH/C ≤ h < (c+1)H/C, and M_cijh = 0 otherwise (2)

wherein C is the total number of classes under the current task, H is the number of feature channels, H is an integral multiple of C, and the channel index h starts from 0.
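A minimal sketch of this mask construction: each class c is assigned the channel band [cH/C, (c+1)H/C), so masked features of different classes occupy disjoint channels and are therefore orthogonal. The function name and array layout are illustrative assumptions:

```python
import numpy as np

def class_mask(c, C, H, height, width):
    # M_c: 1 on the channel band assigned to class c, 0 elsewhere.
    # H (number of feature channels) must be an integral multiple of C.
    assert H % C == 0
    band = H // C
    m = np.zeros((H, height, width))
    m[c * band:(c + 1) * band, :, :] = 1.0
    return m

# Masked features of two different classes have zero inner product:
feat = np.random.rand(10, 5, 5)                      # H=10 channels, 5x5 maps
p0 = feat * class_mask(0, C=5, H=10, height=5, width=5)
p1 = feat * class_mask(1, C=5, H=10, height=5, width=5)
print(np.sum(p0 * p1))  # 0.0 -> pairwise orthogonal
```

The disjoint channel bands are what make the orthogonality hold exactly, regardless of the feature values.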
S34, calculating the cosine distance C(P_ci, P_cj) (i, j ∈ [0, K), i ≠ j) between same-class features with the metric module, thereby obtaining the cosine distances of same-class features under multiple classes.
Step S34 is calculated as follows:

C(P_ci, P_cj) = Σ(P_ci ⊙ P_cj) / (‖P_ci‖ · ‖P_cj‖) (3)

wherein C(P_ci, P_cj) is the cosine distance between same-class features, K is the number of support samples, c denotes the cth class, P_ci represents the ith support sample feature in class c, P_cj represents the jth support sample feature in class c, ⊙ represents element-wise multiplication of the corresponding matrix elements, and ‖P_ci‖ represents the two-norm of the matrix P_ci.
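Formula (3) can be sketched directly; summing the element-wise product over all positions of the feature maps, and the small `eps` guard against division by zero, are assumptions made here for numerical safety:

```python
import numpy as np

def cosine_distance(p, q, eps=1e-12):
    # C(P_ci, P_cj): element-wise product summed over all positions,
    # divided by the product of the two-norms of the two features.
    num = np.sum(p * q)
    return num / (np.linalg.norm(p) * np.linalg.norm(q) + eps)

a = np.ones((4, 3, 3))
print(round(cosine_distance(a, 2 * a), 6))  # 1.0 for parallel features
```

Parallel features give a value near 1 and orthogonal features give 0, which is what the loss in step S35 exploits.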
S35, optimizing the orthogonal adaptation module with the mean square error loss function.
Step S35 is calculated with the mean square error loss function as follows:

L = (1/N) Σ_c Σ_{i≠j} MSE[cos(P_ci, P_cj), 1] (4)

wherein N is the total number of classes under the current task, cos(P_ci, P_cj) is the cosine distance between same-class features, and MSE[cos(P_ci, P_cj), 1] = [cos(P_ci, P_cj) − 1]².
After the loss over the support samples is calculated, gradient descent is performed, and the orthogonal adaptation module is updated with mini-batches and the Adam optimizer; training is repeated over multiple tasks until the network converges.
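The support-sample loss of formula (4) can be sketched as below; averaging over all ordered same-class pairs i ≠ j is an assumption made here, since the description states only the per-class pairwise terms and the factor N:

```python
import numpy as np

def cosine(p, q, eps=1e-12):
    return np.sum(p * q) / (np.linalg.norm(p) * np.linalg.norm(q) + eps)

def support_loss(features_by_class):
    # features_by_class: {class label: [P_c0, ..., P_c(K-1)]} masked
    # support features; the loss drives cos(P_ci, P_cj) toward 1 for
    # every same-class pair with i != j.
    total, count = 0.0, 0
    for feats in features_by_class.values():
        K = len(feats)
        for i in range(K):
            for j in range(K):
                if i != j:
                    total += (cosine(feats[i], feats[j]) - 1.0) ** 2
                    count += 1
    return total / max(count, 1)
```

Identical same-class features yield a loss of 0; orthogonal same-class features yield the maximum penalty of 1 per pair.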
The specific steps of the Adam adaptive optimization algorithm are as follows:
Initialization: v_dW = 0, S_dW = 0, v_db = 0, S_db = 0, which represent the biased first and second moment estimates respectively, and dW, db represent the differentials of W and b respectively;
Calculating the Momentum exponentially weighted averages:

v_dW = β1·v_dW + (1 − β1)·dW (5)
v_db = β1·v_db + (1 − β1)·db (6)

Calculating the exponentially weighted averages of the squared gradients of the RMSprop algorithm:

S_dW = β2·S_dW + (1 − β2)·(dW)² (7)
S_db = β2·S_db + (1 − β2)·(db)² (8)

Calculating the bias corrections of the Momentum and RMSprop algorithms.
Bias correction of the Momentum algorithm:

v̂_dW = v_dW / (1 − β1^t) (9)
v̂_db = v_db / (1 − β1^t) (10)

Bias correction of the RMSprop algorithm:

Ŝ_dW = S_dW / (1 − β2^t) (11)
Ŝ_db = S_db / (1 − β2^t) (12)

Performing gradient descent and updating the weights:

W = W − α·v̂_dW / (√Ŝ_dW + ε) (13)
b = b − α·v̂_db / (√Ŝ_db + ε) (14)

In equations (5)–(14), t denotes the tth iteration, α denotes the learning rate, which controls the update rate of the weights, ε denotes a very small constant, β1 and β2 denote the exponential decay rates of the first and second moment estimates respectively, and v̂_dW, v̂_db, Ŝ_dW, Ŝ_db denote the first and second moment estimates after bias correction.
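Equations (5)-(14) for a single parameter tensor W can be sketched as follows (the same update applies to b); the hyperparameter defaults are the common choices, not values stated in this description:

```python
import numpy as np

def adam_step(W, dW, v, S, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    v = beta1 * v + (1 - beta1) * dW                 # eq. (5): momentum average
    S = beta2 * S + (1 - beta2) * dW ** 2            # eq. (7): RMSprop average
    v_hat = v / (1 - beta1 ** t)                     # eq. (9): bias correction
    S_hat = S / (1 - beta2 ** t)                     # eq. (11): bias correction
    W = W - alpha * v_hat / (np.sqrt(S_hat) + eps)   # eq. (13): weight update
    return W, v, S

# Minimize (w - 3)^2, whose gradient is 2(w - 3):
w, v, S = np.array(0.0), 0.0, 0.0
for t in range(1, 2001):
    w, v, S = adam_step(w, 2 * (w - 3.0), v, S, t, alpha=0.01)
print(float(w))  # approaches 3.0
```

Note that t starts from 1, otherwise the bias-correction denominators would be zero.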
And S4, classifying the images of the test set by using the optimized image orthogonal prior feature learning network model.
Step S4 comprises the following steps:
S41, testing process: each task T_i consists of a support set S_i and a query set Q_i; the query set of the test set is input into the embedding module f_θ and the fine-tuned orthogonal adaptation module to obtain the features;
S42, multiplying the features output by the orthogonal adaptation module by the masks M_c corresponding to the different classes respectively, the specific operation being shown in formula (1), wherein the input is the kth query sample, ⊙ represents element-wise multiplication of same-order matrices, and M_cijh is the value at row i, column j of the hth channel of the class-c mask M_c; the elements of M_c equal 1 when cH/C ≤ h < (c+1)H/C and 0 otherwise, wherein C is the total number of classes under the current task, H is the number of feature channels, H is an integral multiple of C, and the channel index h starts from 0;
S43, sending the products into the metric module, and calculating the cosine distances between the query sample and all the support samples; note that in the training stage the metric module only calculates cosine distances between same-class features, not between different classes, so the metric module is used differently in the training and testing stages;
S44, taking the class of the closest support sample as the predicted class of the query sample. Unlike conventional training, the model is fine-tuned with the support samples of the new classes, and the query samples are tested directly after the optimization is completed.
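Steps S43-S44 amount to nearest-neighbor classification under the cosine metric; the list-of-pairs layout chosen for the support features is an assumption made here for illustration:

```python
import numpy as np

def cosine(p, q, eps=1e-12):
    return np.sum(p * q) / (np.linalg.norm(p) * np.linalg.norm(q) + eps)

def predict_class(query_feat, support_feats):
    # support_feats: list of (class label, masked support feature);
    # the class of the support sample with the largest cosine
    # similarity to the query wins.
    best_label, best_cos = None, -2.0
    for label, feat in support_feats:
        c = cosine(query_feat, feat)
        if c > best_cos:
            best_label, best_cos = label, c
    return best_label

support = [(0, np.array([1.0, 0.0])), (1, np.array([0.0, 1.0]))]
print(predict_class(np.array([0.9, 0.1]), support))  # 0
```

Because the masks of step S42 place different classes on disjoint channel bands, a query masked for class c can only score highly against class-c support samples.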
On the other hand, the invention also provides a small sample image classification device based on feature migration and orthogonal prior, which is used for implementing the method, and as shown in fig. 6, the device comprises the following functional modules:
a pre-training module, for pre-training on images to obtain an embedding module f_θ for extracting image features, wherein the images comprise a training set and a test set;
a processing module, for introducing the orthogonal prior idea and constructing a feature learning network model based on feature migration and orthogonal prior;
a calculation module, for solving the model parameters by training and optimizing the objective function of the orthogonal prior feature learning network model; and
a classification module, for classifying the images of the test set with the optimized orthogonal prior feature learning network model.
The invention learns an orthogonal feature subspace by introducing feature migration and orthogonal prior into small sample image feature learning, assuming that the new classes and base classes share a feature extraction mode and that the features of new class data are orthogonal between different classes, and constructs an orthogonalized feature adaptation network that makes the features of different classes mutually orthogonal, thereby improving the discriminability of the features.
The proposed small sample image classification method and device based on feature migration and orthogonal prior have been described in detail above with reference to the accompanying drawings. From the above description of the embodiments, those skilled in the art will clearly understand how to implement the method and device.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein, and the structure required to construct such a system will be apparent from the description above. Moreover, this disclosure is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the disclosure described herein, and any references to specific languages above are provided for disclosure of the best mode of the invention.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various disclosed aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the disclosed aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.
The above-mentioned embodiments are only specific embodiments of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions of some technical features within the technical scope disclosed herein; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present application, and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.