CN111860697A

CN111860697A - Local descriptor-based critic-driven small sample learning method

Info

Publication number: CN111860697A
Application number: CN202010777958.7A
Authority: CN
Inventors: 陶文源; 郭白鹭; 翁仲铭
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2020-08-05
Filing date: 2020-08-05
Publication date: 2020-10-30

Abstract

The invention discloses a local descriptor-based critic-driven small sample learning method. A critic network scores the local descriptors extracted from a picture, and local descriptors with high scores are given greater weight in the classification result. During training, the critic network parameters are adjusted so that the critic network assigns high scores to the important local descriptors of the picture. The overall small sample learning network is thereby forced to attend to the key regions of the picture, the influence of cluttered backgrounds and irrelevant information on classification is reduced, and the final classification performance of the network is improved.

Description

Local descriptor-based critic-driven small sample learning method

Technical Field

The invention relates to the fields of image recognition, small sample learning, image feature extraction and deep learning, and in particular to a critic-driven small sample learning method based on local descriptors.

Background

In recent years, deep learning methods and technologies have advanced continuously, greatly driving research in image recognition tasks such as image classification, image segmentation and target detection; on certain tasks, deep learning networks even exceed human performance. However, mainstream deep learning methods typically require a very large amount of data to train the network model, tend to over-fit on specific tasks, and often perform poorly on entirely new data sets. The human ability to adapt rapidly to an entirely new task remains out of reach for them. Small sample learning (also known as few-shot learning) arose in this context.

Small sample learning aims to force the model to learn from a small number of samples and to generalize beyond the training data set. Existing small sample learning methods based on meta-learning or metric learning generally improve the learning capability of the model in one of several ways, such as data augmentation, improved embedding network structures, improved comparison metrics, and target localization. For example, a maximum-entropy patch generator can simulate the sequence of patches that a human visual trajectory samples from a picture; a reinforcement learning method divides the sequence into background and target sequences and guides the model to focus on the target patch sequence, which improves the accuracy of small sample image recognition. Image saliency maps have also been applied to small sample recognition: a data transformation method mixes the backgrounds and foregrounds of different pictures to increase the amount of training data while weakening the influence of the picture background on recognition, which likewise improves accuracy. These approaches, however, draw little on how the human visual system reacts when faced with a completely new picture.

Humans can learn to classify new pictures very quickly, mainly because of their extensive knowledge and past experience. Another important factor is their ability to capture information rapidly. Given a new picture with a complex background, humans can quickly locate its key information points, extract the important information, and ignore the irrelevant information in the picture. This is because humans have developed, through past experience, a critic mechanism that forces them to shift their visual attention to a specific area of the picture; the shift is so brief that they are not aware of it. This human visual mechanism can help in designing a better network for small sample learning.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a critic-driven small sample learning method based on local descriptors. The method uses the local descriptors of different regions of a picture to describe its detailed feature information, and adopts a picture-to-class classification method to improve classification accuracy. A critic network scores the extracted local descriptors, and local descriptors with high scores are given greater weight in the classification result. The critic network parameters are adjusted so that the critic network assigns high scores to the important local descriptors of the picture. The overall small sample learning network is thereby forced to attend to the key regions of the picture, the influence of cluttered backgrounds and irrelevant information on classification is reduced, and the final classification performance of the network is improved.

The purpose of the invention is realized by the following technical scheme:

A critic-driven small sample learning method based on local descriptors comprises the following steps:

(1) extracting the local descriptors of a picture through a local descriptor extraction network;

(2) scoring the extracted local descriptors through a critic network;

(3) classifying the picture according to the extracted local descriptors and their scores by using a picture-to-class classification method.

Further, in step (1), the local descriptor extraction network extracts the local descriptors of all query set pictures and all support set pictures; all local descriptors of one query set picture are integrated into one local descriptor pool, and the local descriptors of each class of pictures in the support set are integrated into a local descriptor pool for that class;

in step (2), the critic network scores all local descriptors of a query set picture;

in step (3), a K-nearest-neighbor algorithm is used to calculate the similarity between all local descriptors of the query set picture and every local descriptor pool of the support set, with higher-scoring local descriptors of the query set picture given higher classification weight, so as to find the support set class that is most similar to the query set picture.

Compared with the prior art, the technical scheme of the invention has the following beneficial effects:

1. The critic network drives the small sample learning network to attend to the key information in a picture and to ignore background interference and irrelevant information, which improves the performance of the network on small sample learning tasks.

2. The method uses local descriptors to capture the detailed features of a picture and a picture-to-class classification method to classify pictures under small sample conditions. The critic network scores the importance of the extracted local descriptors, so that high-scoring local descriptors have greater influence on the classification result. The network is thus forced to pay more attention to the high-scoring regions, i.e. the more important regions of the picture, and interference from cluttered backgrounds and irrelevant information is reduced. This addresses the problem of cluttered picture backgrounds interfering with recognition.

3. The problem that the embedding network extracts picture feature information too coarsely is addressed. In the prior art, a global descriptor is generally used as the embedding network's output for picture features, so the extracted features describe the picture as a whole and its fine details can be lost. The method instead uses local descriptors to extract local information from the picture and integrates them into the picture's feature set, thereby preserving its detailed features.

Drawings

Fig. 1 is a diagram of the network architecture of the present invention.

Fig. 2 is a schematic diagram of the operation mechanism of the critic network.

Fig. 3 shows the experimental results of the invention on the miniImageNet data set.

Fig. 4 shows the experimental results of the invention on three fine-grained picture classification data sets.

Detailed Description

The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

A small sample learning task T is in essence a relation learning task over a large number of picture classes. From T, a support set S and a query set Q are drawn. S contains many classes of pictures with K pictures per class (K is small, usually 1 or 5, hence the name small sample learning), and the pictures in S and Q do not overlap. Under this data setting, the small sample learning network is trained to indicate which class in S each picture in Q belongs to, so that after many rounds of training, when faced with a new task T', the overall network can indicate which class of the new support set S' each picture in the new query set Q' belongs to.
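The task setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_episode`, the parameter names `n_way` and `n_query`, the toy data set, and the choice to draw query pictures per class are all hypothetical and are not taken from the patent.

```python
import random

def sample_episode(dataset, n_way, k_shot, n_query, rng):
    """Sample one task from a dict {class_name: [picture ids]}.

    Returns (support, query): support maps each sampled class to K pictures,
    query is a list of (picture, class) pairs that never overlap the support.
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, query = {}, []
    for cls in classes:
        # Draw K + n_query distinct pictures, then split them so S and Q are disjoint.
        picks = rng.sample(dataset[cls], k_shot + n_query)
        support[cls] = picks[:k_shot]
        query.extend((pic, cls) for pic in picks[k_shot:])
    return support, query

# Toy data set (hypothetical): 10 classes with 30 placeholder picture ids each.
toy = {f"class_{c}": [f"c{c}_img{i}" for i in range(30)] for c in range(10)}
support, query = sample_episode(toy, n_way=5, k_shot=1, n_query=15,
                                rng=random.Random(0))
```

Each call yields one task; repeating the call with fresh classes gives the stream of tasks on which the network is trained and later tested.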

Therefore, the invention provides a local descriptor-based critic-driven small sample learning method, and the overall network method mainly comprises the following parts:

(1) A local descriptor extraction network, which extracts the local descriptors of the pictures.

(2) A critic network, which scores the extracted local descriptors.

(3) A comparison module that considers the scores, which classifies the pictures according to the extracted local descriptors and their scores by using a picture-to-class classification method.

Specifically, with respect to local descriptors: because of quantization errors, a picture-level descriptor, i.e. a global descriptor, loses detailed information that contributes to classification. The effect of such errors can be mitigated when enough data is available, but it cannot be undone in a small sample setting. In addition, a picture-level descriptor has the whole picture as its receptive field, so the information it extracts is relatively coarse, whereas a local descriptor has a specific region of the picture as its receptive field. Local descriptors can therefore provide more detailed feature information about different regions of the picture, which helps the overall network classify it.

Specifically, regarding the critic network: the invention adopts a critic mechanism to imitate the human visual mechanism and to drive the overall small sample learning network to attend to the key regions of a picture. The critic network is itself a deep learning network whose purpose is to score the extracted local descriptors. Trained in an unsupervised manner, the critic network adjusts its parameters so that it tends to give high scores to the more important local descriptors of a picture, which gives those local descriptors greater weight in the classification result.

Specifically, regarding the picture-to-class classification method: picture-to-picture classification methods sometimes perform poorly because of intra-class differences. If the training set is small, a query set picture may not closely resemble any single picture in the support set. The picture-to-class classification method integrates all local descriptors of the pictures of a class into one local descriptor pool, and uses a K-nearest-neighbor method to calculate the similarity between all local descriptors of a query set picture and the local descriptor pool of each class in the support set, thereby determining which support set class the query set picture belongs to. This effectively mitigates the influence of intra-class differences.
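As a minimal sketch of the pooling idea, the snippet below merges the local descriptors of all pictures of a class into a single pool. The function name `build_class_pools`, the nested-list data layout and the toy values are assumptions made for illustration only.

```python
def build_class_pools(support_descriptors):
    """Merge the local descriptors of all pictures of a class into one pool.

    support_descriptors: {class_name: [picture_1_descriptors, picture_2_descriptors, ...]}
    where each picture's descriptors is a list of m vectors (lists of floats).
    """
    pools = {}
    for cls, pictures in support_descriptors.items():
        pool = []
        for descriptors in pictures:
            pool.extend(descriptors)  # picture boundaries are deliberately dropped
        pools[cls] = pool
    return pools

# Two classes, K = 2 pictures each, m = 3 descriptors of dimension d = 2 per picture.
toy_support = {
    "cat": [[[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]], [[1.0, 0.1], [0.7, 0.3], [0.9, 0.0]]],
    "dog": [[[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]], [[0.1, 1.0], [0.3, 0.7], [0.0, 0.9]]],
}
pools = build_class_pools(toy_support)
```

Because a query descriptor is matched against the whole pool rather than against one picture at a time, a good match can come from any picture of the class, which is how the pool softens intra-class differences.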

In this embodiment, the local descriptor extraction network first extracts the local descriptors of all query set pictures and all support set pictures. All local descriptors of one query set picture are integrated into one local descriptor pool, and the local descriptors of each class of pictures in the support set are integrated into the local descriptor pool of that class. The critic network then scores all local descriptors of the query set picture. Finally, the picture-to-class classification method uses a K-nearest-neighbor algorithm to calculate the similarity between all local descriptors of the query set picture and every local descriptor pool of the support set, with higher-scoring local descriptors of the query set picture given higher classification weight, so as to find the support set class that is most similar to the query set picture. The overall network structure is shown in Fig. 1.

The specific operation flow of this embodiment is as follows:

1. A multi-layer convolutional neural network is used as the local descriptor extraction network F_e. A picture is input into the local descriptor extraction network, and the output is the set of local descriptors of the picture. In this embodiment, the fully connected layers of the extraction network are removed and only the convolutional layers are kept, so that the output is a set of local descriptors rather than a global descriptor of the picture. The extraction process is given by the following formula:

$$F_e(X) = [x_1, x_2, \ldots, x_m] \in \mathbb{R}^{d \times m}$$

where X is the pixel value matrix of the currently input picture, m is the number of extracted local descriptors, m = w × h (w and h being the width and height of the feature map output by the extraction network), x_i is the i-th local descriptor of the picture, d is the number of filters in the last convolutional layer of the local descriptor extraction network and hence the dimension of each output local descriptor, and R denotes the set of real numbers. In this embodiment, the picture is resized to 84 × 84 pixels, m = w × h = 21 × 21 = 441, and d = 64. The output set of local descriptors represents the feature information of the picture.
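The shape bookkeeping can be checked with a small pure-Python sketch. It assumes that the w × h grid is the spatial size of the last convolutional layer's output (which is consistent with 441 = 21 × 21 for an 84 × 84 input) and that the feature map is stored channel-first; the function name `feature_map_to_descriptors` is hypothetical.

```python
def feature_map_to_descriptors(feature_map):
    """Turn a d x h x w feature map (nested lists) into m = h * w descriptors of length d."""
    d = len(feature_map)
    h = len(feature_map[0])
    w = len(feature_map[0][0])
    descriptors = []
    for row in range(h):
        for col in range(w):
            # One descriptor collects the d channel values at a single spatial position.
            descriptors.append([feature_map[ch][row][col] for ch in range(d)])
    return descriptors

# Shape check with the embodiment's sizes: d = 64 channels on a 21 x 21 grid.
d, h, w = 64, 21, 21
fmap = [[[float(ch) for _ in range(w)] for _ in range(h)] for ch in range(d)]
descs = feature_map_to_descriptors(fmap)
```

With these sizes the picture is represented by 441 vectors of 64 values each, one per spatial position, instead of a single global vector.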

2. For each local descriptor of a query set picture X extracted by the local descriptor extraction network, the critic network F_c gives a score that evaluates the importance of that local descriptor, as shown in Fig. 2. The scoring process is given by the following formula:

$$F_c(X) = [s_1, s_2, \ldots, s_m] \in \mathbb{R}^{d \times m}$$

where m is the same as in the local descriptor extraction network, s_i is the score given by the critic network to the i-th local descriptor, and d is fixed to 1, i.e. each position of a picture receives a single score that applies to the whole local descriptor at that position. As in the local descriptor extraction network, m = w × h = 21 × 21 = 441. The critic network also contains only convolutional layers, and the activation function of its last layer is set to the Sigmoid function so that the scores it produces are compressed into the range 0 to 1.
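The effect of the Sigmoid output layer can be illustrated with a deliberately tiny stand-in for the critic. The sketch assumes, purely for illustration, that the critic reads the extracted descriptors and consists of a single 1 × 1 convolution, i.e. one weight per channel; the source writes F_c(X), so the actual critic may well operate on the picture itself, and its real depth is not stated. The name `critic_scores` and all numeric values are hypothetical.

```python
import math

def critic_scores(descriptors, weights, bias):
    """Score each local descriptor with a linear map followed by a Sigmoid.

    This stands in for the critic network: a single hypothetical 1 x 1
    convolution (one weight per channel) with a Sigmoid activation.
    """
    scores = []
    for x in descriptors:
        logit = sum(w_c * x_c for w_c, x_c in zip(weights, x)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))  # Sigmoid keeps the score in (0, 1)
    return scores

toy_descriptors = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
scores = critic_scores(toy_descriptors, weights=[1.5, -1.5], bias=0.0)
```

Whatever the real architecture, the output has one score per spatial position (d = 1), and each score lies strictly between 0 and 1.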

3. A comparison module that considers scores. The extraction network F_e extracts the local descriptors of a query set picture X_q, F_e(X_q) = [x₁,x₂,…,x_m] ∈ R^(d×m), and also extracts the local descriptor pool of each class of the support set, which gathers the local descriptors of all K pictures of that class, K being the number of pictures of each class in the support set. The critic network F_c gives the score of each local descriptor of the query set picture, F_c(X_q) = [s₁,s₂,…,s_m] ∈ R^(1×m). Each score-weighted query set local descriptor is compared with the local descriptor pool of each class of the support set to find its k nearest neighbors in that pool.

The m × k similarity values between the query set picture X_q and a class C (k nearest neighbors for each of the m local descriptors) are then summed to give the picture-to-class similarity between X_q and C. The similarity between two local descriptors a and b is their cosine similarity:

$$\cos(a, b) = \frac{a^{\top} b}{\lVert a \rVert \, \lVert b \rVert}$$

After the query set picture has been compared with every class in the support set, the similarity score for each class is output, and the class with the highest score is the classification result for the current query set picture.
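The comparison step can be sketched as follows. The text does not spell out exactly how a score enters the similarity, so the sketch makes one explicit assumption: the score s_i multiplies the cosine similarities of descriptor x_i. (Scaling the descriptor itself by a positive score would leave a cosine similarity unchanged, which is why this reading is used here.) All function names and toy values are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0

def picture_to_class_similarity(query_descriptors, scores, pool, k):
    """Sum, over all query descriptors, the score-weighted cosine similarity
    to each descriptor's k nearest neighbours in one class pool."""
    total = 0.0
    for x, s in zip(query_descriptors, scores):
        sims = sorted((cosine(x, y) for y in pool), reverse=True)
        total += s * sum(sims[:k])  # assumption: the score scales the similarity
    return total

def classify(query_descriptors, scores, pools, k):
    """Return the best class and the per-class similarity scores."""
    sims = {cls: picture_to_class_similarity(query_descriptors, scores, pool, k)
            for cls, pool in pools.items()}
    return max(sims, key=sims.get), sims

pools = {
    "cat": [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]],
}
query = [[1.0, 0.1], [0.1, 1.0]]   # one cat-like region, one dog-like region
scores = [0.9, 0.2]                # the critic favours the first region
label, sims = classify(query, scores, pools, k=2)
```

In this toy case the query picture contains one region resembling each class, so the critic's scores decide the outcome: favouring the first region selects "cat", and swapping the two scores selects "dog".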

4. The network is trained over many rounds of meta-tasks, each of which comprises a support set and a query set. The parameters of the local descriptor extraction network and of the critic network are adjusted in the direction that improves classification accuracy until they become relatively stable, so that the overall small sample picture classification network, when faced with a picture of a class it has never seen, can indicate which picture class of the support set the picture belongs to.
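The embodiment uses a cross-entropy loss. One way such a loss could be applied to the per-class similarity scores is sketched below; treating the similarity scores directly as logits is an assumption of this sketch, and the function name `cross_entropy` and the example values are hypothetical.

```python
import math

def cross_entropy(class_similarities, true_class):
    """Softmax cross-entropy over per-class similarity scores."""
    values = list(class_similarities.values())
    peak = max(values)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in values))
    return log_norm - class_similarities[true_class]

sims = {"cat": 1.9, "dog": 0.9, "bird": 0.4}
loss_correct = cross_entropy(sims, "cat")
loss_wrong = cross_entropy(sims, "dog")
```

The loss is smaller when the true class already has the highest similarity, so minimizing it pushes both the extraction network and the critic network toward parameters that raise the similarity to the correct class.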

Specifically, in this embodiment the overall network is built with the PyTorch library, and experiments are performed on four data sets, namely miniImageNet, Stanford Dogs, Stanford Cars and CUB-200, under both 1-shot and 5-shot task settings. The method uses a cross-entropy loss function and the Adam optimizer. In the training stage, 15 and 10 query set pictures are drawn from each meta-task for the recognition tasks under the 1-shot and 5-shot settings respectively, and the initial learning rate is set to 5e-3. In the testing stage, 600 test tasks are sampled with the same data settings as in the training stage, and the overall network is evaluated by top-1 mean accuracy with a 95% confidence interval. The experimental results show that the invention achieves good performance on all four data sets, as shown in Figs. 3 and 4.
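The evaluation metric can be sketched as follows. The source does not say how its 95% confidence interval is computed, so the normal approximation used here (1.96 times the standard error) is an assumption, and the accuracy values are randomly generated placeholders, not experimental results.

```python
import math
import random

def mean_and_ci95(accuracies):
    """Mean accuracy and half-width of a 95% confidence interval.

    Uses the normal approximation 1.96 * s / sqrt(n), an assumption;
    the source does not say how its interval is computed.
    """
    n = len(accuracies)
    mean = sum(accuracies) / n
    variance = sum((a - mean) ** 2 for a in accuracies) / (n - 1)
    return mean, 1.96 * math.sqrt(variance / n)

# 600 made-up per-task accuracies, only to exercise the function.
rng = random.Random(0)
fake_accuracies = [min(1.0, max(0.0, rng.gauss(0.5, 0.1))) for _ in range(600)]
mean_acc, half_width = mean_and_ci95(fake_accuracies)
```

A result would then be reported as the mean plus or minus the half-width, with one accuracy value per test task.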

The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the present invention, and the above specific embodiments are merely illustrative and not restrictive. Those skilled in the art can make many changes and modifications to the invention without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A critic-driven small sample learning method based on local descriptors is characterized by comprising the following steps:

2. The local descriptor-based critic-driven small sample learning method according to claim 1, wherein in step (1), the local descriptors of all query set pictures and support set pictures are extracted by using the local descriptor extraction network, all local descriptors of one query set picture are integrated into one local descriptor pool, and the local descriptors of each class of pictures in the support set are integrated into a local descriptor pool for that class;