CN113627522A - Image classification method, device and equipment based on relational network and storage medium - Google Patents

Image classification method, device and equipment based on relational network and storage medium Download PDF

Info

Publication number
CN113627522A
Authority
CN
China
Prior art keywords
image
support set
target image
similarity
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110907203.9A
Other languages
Chinese (zh)
Other versions
CN113627522B (en)
Inventor
梁军
余嘉琳
余松森
苏俊光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University
Priority to CN202110907203.9A
Publication of CN113627522A
Application granted
Publication of CN113627522B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/32Indexing scheme for image data processing or generation, in general involving image mosaicing


Abstract

The invention relates to an image classification method based on a relation network, comprising the following steps: acquiring a target image; inputting the target image and the support set images into a trained image classification model to obtain the similarity between the target image and each category of image in the support set, where the image classification model comprises an embedding module and a measurement module, the embedding module is a stochastic depth network, and the measurement module comprises a convolutional layer and a fully-connected layer connected to each other; and obtaining the category of the target image according to the maximum similarity. In this method, a stochastic depth network replaces the convolutional layers of the relation network in the embedding module. By randomly removing redundant layers, this network optimizes the training of the residual network, so that the depth of the network can be increased while avoiding overfitting, more accurate support set and query set image features can be extracted, and the category judgment of the query set is further improved.

Description

Image classification method, device and equipment based on relational network and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for classifying images based on a relationship network.
Background
In recent years, the unprecedented breakthroughs of deep learning in various fields have depended largely on large amounts of available labeled data, which must be collected and annotated at great cost. This severely limits extension to new categories, and more importantly, deep learning models struggle when only a small amount of labeled data is available. The problem of small-sample (few-shot) learning based on relation networks has therefore become a hot topic of recent research.
The goal of small-sample research is to design a learning model that can learn quickly and identify the classes of new samples from only a small number of labeled samples. The current research lines applicable to the small-sample problem are data augmentation, meta learning, and metric learning. Data augmentation can to some extent relieve the overfitting and data-scarcity problems that arise when training on a small amount of data, but it cannot fundamentally solve the small-sample problem. Meta learning lifts the model from learning from raw data to learning from tasks, providing a new direction for research on the small-sample learning problem.
Extracting image features with a deep convolutional network is a key step in small-sample learning, yet with existing learning methods it is difficult for a deep convolutional network to improve a model's classification accuracy on small-sample tasks.
Deep neural networks suffer from vanishing gradients, from information gradually diminishing as it flows forward, and from excessively long training times, all of which make them very difficult to train. In some tasks, a shallow neural network is simple in structure and easy to train but has poor expressive power, while a deep neural network is expressive but contains more redundant layers and is harder to train.
Disclosure of Invention
Based on this, the present invention provides an image classification method, apparatus, device and storage medium based on a relational network, which can extract more accurate support set image features and query set image features to further improve the category judgment of a query set.
In a first aspect, an embodiment of the present application provides an image classification method based on a relationship network, including the following steps:
acquiring a target image;
inputting the target image and the support set image into a trained image classification model to obtain the similarity between the target image and each class image in the support set image; the image classification model comprises an embedding module and a measurement module, wherein the embedding module is a random depth network, and the measurement module comprises a convolution layer and a full-connection layer which are connected with each other;
and obtaining the category of the target image according to the maximum similarity.
Further, inputting the target image and the support set image into a trained image classification model to obtain the similarity between the target image and each category image in the support set image, including:
inputting the target image and the support set image into the random depth network, and extracting the features of the target image and the support set image;
splicing the extracted features of the target image and the features of the support set image to obtain a spliced image;
inputting the spliced image into the convolutional layer, and further extracting the characteristics of the spliced image;
and inputting the extracted characteristics of the spliced images into the full-connection layer to obtain the similarity between the target image and each category of image in the support set image.
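The four steps above can be sketched in PyTorch. This is an illustrative reading of the pipeline, not the patent's implementation: `embed` stands in for the stochastic depth embedding network, `metric` for the measurement module, and all layer shapes and channel counts are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: `embed` plays the role of the stochastic-depth
# embedding network, `metric` the measurement module (conv layer + FC layer).
embed = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
metric = nn.Sequential(
    nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(64, 1), nn.Sigmoid(),  # relation score in [0, 1]
)

def relation_scores(target, support):
    """target: (1, 3, H, W); support: (C, 3, H, W), one image per category."""
    f_t = embed(target)                        # features of the target image
    f_s = embed(support)                       # features of the support set
    f_t = f_t.expand(f_s.size(0), -1, -1, -1)  # pair target with every category
    stitched = torch.cat([f_s, f_t], dim=1)    # "splicing" along the channel axis
    return metric(stitched).squeeze(1)         # C similarities

scores = relation_scores(torch.randn(1, 3, 84, 84), torch.randn(5, 3, 84, 84))
predicted_class = scores.argmax().item()       # category with maximum similarity
```

Note that stitching happens on the channel axis, so the measurement module's first convolution sees twice the embedding's channel count.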
Further, extracting features of the target image and the support set image comprises:
randomly discarding redundant layers according to the rule generated by the survival probability when processing the target image and the support set images;
obtaining the feature map of the target image f_φ(x_j) and the feature maps of the support set images f_φ(x_i), where x_j is the target image and x_i is a support set image.
Further, obtaining the similarity between the target image and each image in the support set image includes:
obtaining the matching degree between the target image and each category of image in the support set by analyzing the extracted features of the stitched image, as shown in formula (1):
r_{i,j} = g_φ(C(f_φ(x_i), f_φ(x_j))), i = 1, 2, …, C    (1)
where C(·,·) denotes feature concatenation, f_φ(x_i) is the feature map of a support set image, f_φ(x_j) is the feature map of the target image, r_{i,j} represents the similarity between the target image and support set category i, and C is the number of support set categories, so that C similarities are generated.
Further, the training process of the image classification model comprises the following steps:
acquiring a query set image and a training set image;
inputting the query set image and the training set image into the random depth network, and extracting the characteristics of the query set image and the training set image;
splicing the extracted characteristics of the images of the query set and the training set to obtain a spliced image;
inputting the output result of the random depth network into the convolutional layer, and further extracting the characteristics of the query set image and the training set image;
and inputting the output result of the convolutional layer to the full-connection layer to obtain the similarity between the image of the query set and each image in the image of the training set.
In a second aspect, an embodiment of the present application provides an apparatus, including:
the image acquisition module is used for acquiring a target image, a query set image and a support set image;
and the similarity judging module is used for inputting the target image and the support set image into a trained image classification model to obtain the similarity between the target image and each image in the support set image.
And the image classification module is used for obtaining the category of the target image according to the size of the similarity.
Further, in an apparatus provided in an embodiment of the present application, the similarity determining module includes:
a first input unit, configured to input the target image and the support set image to the random depth network, and extract features of the target image and the support set image;
the first splicing unit is used for splicing the extracted features of the target image and the features of the support set image to obtain a spliced image;
the second input unit is used for inputting the spliced image to the first convolution layer and further extracting the characteristics of the spliced image;
and the third input unit is used for inputting the extracted characteristics of the spliced images into the full-connection layer to obtain the similarity between the target image and each image in the support set image.
Further, an apparatus provided in an embodiment of the present application further includes a training module:
the training module is used for inputting the images of the query set and the images of the support set into the image classification model for training to obtain an image classification model set corresponding to the image classification model, and classifying and identifying the images of the query set by adopting the image classification model.
In a third aspect, an embodiment of the present application provides an electronic device, including:
the system comprises a processor and a memory, wherein the memory stores a program which can be called by the processor;
wherein the processor, when executing the program, implements the method for image classification based on a relational network according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program; the computer program, when executed by a processor, implements the steps of the relation-network-based image classification method according to the first aspect.
In the embodiments of the application, in order to extract more accurate support set and query set image features and thereby further improve the category judgment of the query set, an improved model is proposed on the basis of the relation network using the techniques of few-shot learning, and the improved model is applied to the image classification problem. The difference between this model and the relation network is that a stochastic depth network replaces the original four convolutional layers in the embedding module. The stochastic depth network deepens the embedding module and optimizes the training of the residual network by randomly removing redundant layers, preventing overfitting while increasing the number of layers.
Because some residual modules in the model are not activated, the idea of model fusion is in fact embodied: the depth of the model is random during training and fixed during prediction, so models of different depths are effectively fused at test time, which makes the network simpler.
For a better understanding and practice, the invention is described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of a method for image classification based on a relational network according to the present invention;
FIG. 2 is a diagram of the original ResNet structure in a random deep network;
FIG. 3 is a schematic diagram of probability of survival generation in a random deep network;
FIG. 4 is a diagram illustrating an image classification model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the construction of a convolutional layer and a fully-connected layer in a metrology module;
FIG. 6 is a schematic block diagram of an image classification apparatus based on a relational network according to the present invention;
fig. 7 is a schematic diagram of a similarity determination module in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be understood that the embodiments described are only some embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the embodiments in the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application; rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims. In the description of the present application, it is to be understood that the terms "first," "second," "third," and the like are used solely to distinguish between similar objects, and are not necessarily used to describe a particular order or sequence, nor are they to be construed as indicating or implying relative importance. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Further, in the description of the present application, "a plurality" means two or more unless otherwise specified. "And/or" describes an association relationship between associated objects, meaning three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects are in an "or" relationship.
To solve the technical problem in the background art, an embodiment of the present application provides an image classification method based on a relationship network, as shown in fig. 1, the method includes the following steps:
in step S101, a target image is acquired;
in step S102, inputting the target image and the support set image into a trained image classification model to obtain a similarity between the target image and each category image in the support set image; the image classification model comprises an embedding module and a measurement module, wherein the embedding module is a random depth network, and the measurement module comprises a convolution layer and a full-connection layer which are connected with each other;
in step S103, the category of the target image is obtained according to the maximum similarity.
The target image is an image whose category label is to be recognized.
Specifically, the image classification model is a model that, from a small number of labeled support set images and a given target image, extracts the features of the support set images and the target image and performs recognition and classification by measuring the distance between the extracted features. The similarity between the target image and each category of image in the support set is the distance between the features of the target image and the features of the support set images: the greater the distance, the lower the similarity, and the smaller the distance, the higher the similarity.
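The final decision rule — obtaining the category from the maximum similarity — reduces to an argmax over the per-category relation scores. A minimal sketch (category names and score values are purely illustrative):

```python
def classify(similarities):
    """similarities: dict mapping support-set category name -> relation score."""
    # the predicted category is the one with the maximum similarity
    return max(similarities, key=similarities.get)

scores = {"cat": 0.12, "dog": 0.87, "bird": 0.31}  # illustrative scores only
assert classify(scores) == "dog"
```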
Experiments on different data sets show that this training method can effectively relieve the difficulty of training deep networks and greatly improve model precision and training speed. Fig. 2 shows the original ResNet structure, where f denotes the residual branch and id the identity mapping; the two parts are summed, activated, and then output. This process can be represented by the following equation:
H_l = ReLU(f_l(H_{l-1}) + id(H_{l-1}))    (1)
where H_{l-1} denotes the input to the l-th residual block and H_l denotes its output.
Stochastic depth adds a random variable b during training, where the distribution of b satisfies a Bernoulli distribution, multiplies the residual branch f by b, and thereby randomly discards the residual part. If b = 1, the structure is the original ResNet structure; when b = 0, the residual branch is not activated and the whole block degenerates to an identity function. This process can be represented by the following equation:
H_l = ReLU(b_l f_l(H_{l-1}) + id(H_{l-1}))    (2)
b satisfies a Bernoulli distribution and takes only the values 0 and 1, where the probability of 0 is 1 − p and the probability of 1 is p. This p is called the survival probability: it is the probability that b = 1, and it is set as a smooth function of the residual-layer index l, decreasing linearly from p_0 = 1 to p_L = 0.5, where L is the total number of residual blocks. The formula is as follows:
p_l = 1 − (l/L)(1 − p_L)    (3)
where p_l is the survival probability of the l-th layer during training and L is the total number of residual blocks. The resulting rule for p is shown in fig. 3.
Because the embedding module should not use an overly complex network, a model optimized from ResNet-18 is used (the "18" in ResNet-18 refers to its 18 weighted layers). Since the embedding module needs to extract the feature maps of the target image and the support set images and feed these feature maps as input to the measurement module, the last two layers of ResNet-18 — the pooling layer and the fully-connected layer — are removed, yielding a ResNet-16 model. ResNet-16 is then further optimized by randomly discarding redundant layers according to the rule generated by the survival probability p, finally forming Stochastic Depth-16.
In a specific embodiment, as shown in fig. 4-5, fig. 4-5 are specific structures of an image classification model, wherein the embedding module is a random depth network, and the metrology module comprises a convolutional layer and a fully-connected layer connected to each other.
The random deep network uses Stochastic Depth-16.
The convolutional layer comprises convolution block 1 and convolution block 2, and the fully-connected part comprises max pooling layer 1, a ReLU activation layer, max pooling layer 2, and a Sigmoid layer. Each convolution block comprises a convolution kernel, a batch normalization layer, and a ReLU activation layer; the parameters of each convolution kernel are the same, namely a 64-channel 3 × 3 kernel, and each max pooling layer is 2 × 2.
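A sketch of this measurement module in PyTorch follows. Only the 64-channel 3 × 3 kernels and 2 × 2 max pooling are taken from the text; the input channel count of convolution block 1, the hidden fully-connected size, and the input spatial size are assumptions the patent does not state:

```python
import torch
import torch.nn as nn

def conv_block(in_ch):
    # each convolution block: 64-channel 3x3 kernel + batch norm + ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, kernel_size=3, padding=1),
        nn.BatchNorm2d(64),
        nn.ReLU(),
    )

# measurement module: two convolution blocks, each followed by 2x2 max pooling,
# then a fully-connected part ending in a Sigmoid relation score.
metric_module = nn.Sequential(
    conv_block(128),           # stitched feature maps (128 channels is assumed)
    nn.MaxPool2d(2),           # max pooling layer 1
    conv_block(64),
    nn.MaxPool2d(2),           # max pooling layer 2
    nn.Flatten(),
    nn.Linear(64 * 5 * 5, 8),  # hidden FC size is an assumption
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid(),
)

score = metric_module(torch.randn(1, 128, 21, 21))  # 21x21 -> 10x10 -> 5x5
```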
According to the specific structure of the image classification model, inputting the target image and the support set images into the trained image classification model specifically comprises the following steps:
inputting the target image and the support set image into the random depth network, and extracting the features of the target image and the support set image;
splicing the extracted features of the target image and the features of the support set image to obtain a spliced image;
inputting the spliced image into the convolutional layer, and further extracting the characteristics of the spliced image;
and inputting the extracted characteristics of the spliced images into the full-connection layer to obtain the similarity between the target image and each category of image in the support set image.
Specifically, extracting the features of the target image and the support set image includes:
randomly discarding redundant layers according to the rule generated by the survival probability when processing the target image and the support set images, and obtaining the feature map of the target image f_φ(x_j) and the feature maps of the support set images f_φ(x_i), where x_j is the target image and x_i is a support set image.
Obtaining the similarity between the target image and each image in the support set image, including:
obtaining the matching degree between the target image and each category of image in the support set by analyzing the extracted features of the stitched image, as shown in formula (4):
r_{i,j} = g_φ(C(f_φ(x_i), f_φ(x_j))), i = 1, 2, …, C    (4)
where C(·,·) denotes feature concatenation, f_φ(x_i) is the feature map of a support set image, f_φ(x_j) is the feature map of the target image, r_{i,j} represents the similarity between the target image and support set category i, and C is the number of support set categories, so that C similarities are generated.
In a specific embodiment, the training process of the image classification model comprises the following steps:
acquiring a query set image and a training set image;
inputting the query set image and the training set image into the random depth network, and extracting the characteristics of the query set image and the training set image;
splicing the extracted characteristics of the images of the query set and the training set to obtain a spliced image;
inputting the output result of the random depth network into the convolutional layer, and further extracting the characteristics of the query set image and the training set image;
and inputting the output result of the convolutional layer to the full-connection layer to obtain the similarity between the image of the query set and each image in the image of the training set.
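The training steps above can be sketched as one episodic update. This is a toy-scale illustration: the embedding and measurement networks are stand-ins, and the MSE regression target (1 for matching pairs, 0 otherwise) is borrowed from the original relation-network formulation as an assumption, since the patent does not name the loss:

```python
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())  # toy embedding
metric = nn.Sequential(nn.Conv2d(16, 8, 3, padding=1), nn.ReLU(),
                       nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                       nn.Linear(8, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(metric.parameters()), lr=1e-3)

def train_step(query, support, query_labels):
    """query: (Q, 3, H, W); support: (C, 3, H, W); query_labels: (Q,) indices."""
    C, Q = support.size(0), query.size(0)
    f_q, f_s = embed(query), embed(support)
    # pair every query image with every support category, then stitch channels
    f_q = f_q.unsqueeze(1).expand(-1, C, -1, -1, -1)    # (Q, C, ch, H, W)
    f_s = f_s.unsqueeze(0).expand(Q, -1, -1, -1, -1)
    pairs = torch.cat([f_s, f_q], dim=2).flatten(0, 1)  # (Q*C, 2ch, H, W)
    scores = metric(pairs).view(Q, C)                   # C similarities per query
    # regression target: 1 for the true category, 0 for the others (assumed loss)
    target = torch.zeros(Q, C).scatter_(1, query_labels.unsqueeze(1), 1.0)
    loss = nn.functional.mse_loss(scores, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

loss = train_step(torch.randn(4, 3, 32, 32), torch.randn(5, 3, 32, 32),
                  torch.randint(0, 5, (4,)))
```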
As shown in fig. 6, it is a schematic block diagram of an apparatus 200 for classifying a target image based on a relationship network according to the present invention, including:
an image obtaining module 210, configured to obtain a target image, a query set image, and a support set image.
And the similarity judging module 220 is configured to input the target image and the support set image into a trained image classification model, so as to obtain a similarity between the target image and each image in the support set image.
An image category obtaining module 230, configured to obtain a category of the target image according to the size of the similarity.
As shown in fig. 7, the similarity determination module 220 includes:
a first input unit 221, configured to input the target image and the support set images into the random depth network, so as to obtain feature maps of the target image and the support set images;
a first stitching unit 222, configured to stitch the target image feature map with the feature map of each support set category image to obtain a stitched feature map;
a second input unit 223, configured to input the stitching feature map into the convolutional layer, and extract features of the stitching feature map;
a third input unit 224, configured to input the output result of the convolutional layer to the fully-connected layer, so as to obtain the similarity between the target image and each category of image in the support set images.
In a preferred embodiment, the image classification system further includes a training module, where the training module is configured to input the query set images and the support set images into an image classification model for training, to obtain an image classification model set corresponding to the image classification model, and perform classification and identification on the query set images by using the image classification model.
Corresponding to the image classification method based on the relational network, an embodiment of the present application further provides an electronic device, including:
at least one processor and at least one memory;
the memory stores a program that can be called by the processor;
when the processor executes the program, the steps of the image classification method based on the relational network can be realized.
In particular, the electronic device may be a computer or a server.
Corresponding to the above-mentioned image classification method based on the relational network, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the image classification method based on the relational network.
In a specific embodiment, the image classification method based on the relation network provided by the invention was evaluated on the mini-ImageNet dataset and the RP2K dataset.
The experiments were implemented with the PyTorch framework; the experimental environment is shown in Table 1 below.
[Table 1: experimental environment — contents not recoverable from the source]
1) mini-ImageNet dataset
mini-ImageNet is derived from ImageNet and contains 100 classes of 100 samples each, with each picture sized 84 × 84; 64 classes are used for training, 16 for validation, and 20 for testing. We tested both the 5-way 1-shot and 5-way 5-shot tasks. In the model of the application, the four convolutional layers of the embedding module in the relation network are replaced by a stochastic depth network, and the measurement module is kept consistent with the relation network. On the mini-ImageNet dataset, the experimental results are shown in Table 2: the precision of the model improves by 1.58% and 1.21% on the 5-way 1-shot and 5-way 5-shot tasks, respectively.
[Table 2: results on the mini-ImageNet dataset — contents not recoverable from the source]
2) RP2K data set
The RP2K dataset is a large-scale item image dataset for retail product classification. It collects over 500,000 images of retail products covering 2,000 different categories, and is currently the largest product picture dataset. To verify whether our improved model can classify small-sample retail product images more effectively, we followed the mini-ImageNet setup and randomly drew 100 product categories from RP2K: 64 as the training set, 16 as the validation set, and 20 as the test set. The drawing and splitting were repeated 3 times, the 3 resulting data sets were input into the model to obtain 3 results, and their average was taken as the final result. Because RP2K pictures are not all the same size, we uniformly resized all pictures to 84 × 84. The experimental setup was otherwise the same as described above. The results are shown in Table 3: on the RP2K dataset, compared with the relation network, the SD-RNET model improves precision by 0.85% and 0.26% on the 5-way 1-shot and 5-way 5-shot tasks, respectively.
(Table 3: results on RP2K; rendered as an image in the original publication and not reproduced here.)
In the embodiment of the application, in order to extract more accurate image features from the support set and the query set and thereby improve category prediction for the query set, an improved model is proposed on the basis of the relation network using few-shot learning, and applied to the image classification problem. The model differs from the relation network in that a stochastic depth network replaces the original four convolutional layers in the embedding module. The stochastic depth network deepens the embedding module while optimizing the training of the residual network by randomly dropping redundant layers, which prevents overfitting even as the number of layers grows.
Because only part of the residual blocks are active in any given training pass, the model in fact embodies the idea of model fusion: the depth of the model is random during training but fixed during prediction, so at test time models of different depths are effectively fused, making the network simpler.
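The training-time dropping and test-time fusion described above can be sketched with a toy numeric stand-in for the residual blocks. This is an illustrative sketch, not the patent's implementation: the blocks here are scalar functions rather than convolutional residual branches, and the linear survival-probability rule is the one from Huang et al.'s stochastic depth paper:

```python
import random

def survival_probs(n_blocks, p_last=0.5):
    # linear decay rule: p_l = 1 - (l / L) * (1 - p_L); deeper blocks
    # are dropped more often during training
    return [1 - (l / n_blocks) * (1 - p_last) for l in range(1, n_blocks + 1)]

def forward(x, blocks, probs, training, rng=None):
    rng = rng or random.Random()
    for f, p in zip(blocks, probs):
        if training:
            if rng.random() < p:   # keep the residual branch with prob. p
                x = x + f(x)
            # otherwise the block is skipped: identity shortcut only
        else:
            x = x + p * f(x)       # test time: scale branch by survival prob.
    return x

blocks = [lambda v: 0.1 * v] * 4   # toy stand-ins for residual branches
probs = survival_probs(4, p_last=0.5)
y = forward(1.0, blocks, probs, training=False)
```

At test time every branch is active but scaled by its survival probability, which is the "fusion of models of different depths" the description refers to.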
The above-mentioned embodiments express only several embodiments of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the protection scope of the present invention.

Claims (10)

1. An image classification method based on a relational network, characterized in that the method comprises the following steps:
acquiring a target image;
inputting the target image and the support set images into a trained image classification model to obtain the similarity between the target image and each category of image in the support set images; the image classification model comprises an embedding module and a metric module, wherein the embedding module is a stochastic depth network, and the metric module comprises a convolutional layer and a fully connected layer connected to each other;
and determining the category of the target image according to the maximum similarity.
2. The method of claim 1, wherein inputting the target image and the support set image into a trained image classification model to obtain a similarity between the target image and each class image in the support set image comprises:
inputting the target image and the support set images into the stochastic depth network, and extracting features of the target image and the support set images;
concatenating the extracted features of the target image with the features of the support set images to obtain a concatenated feature map;
inputting the concatenated feature map into the convolutional layer to further extract its features;
and inputting the extracted features of the concatenated feature map into the fully connected layer to obtain the similarity between the target image and each category of image in the support set images.
3. The method of claim 2, wherein extracting the features of the target image and the support set image comprises:
randomly discarding redundant layers, according to rules generated from the survival probabilities, while extracting the features of the target image and the support set images;
obtaining the feature map f_φ(x_j) of the target image and the feature map f_φ(x_i) of the support set image, wherein x_j is the target image and x_i is a support set image.
4. The method of claim 3, wherein obtaining the similarity between the target image and each image in the support set images comprises:
obtaining the matching degree between the target image and each category of image in the support set images by analyzing the extracted features of the concatenated feature map, as shown in formula 1;

r_{i,j} = g_φ(C(f_φ(x_i), f_φ(x_j))), i = 1, 2, …, C    (formula 1)

wherein f_φ(x_i) is the feature map of the support set image, f_φ(x_j) is the feature map of the target image, r_{i,j} represents the similarity between the target image and the support set category, and C is the number of support set category images, so that C similarity scores are generated.
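Formula 1 can be illustrated numerically. The sketch below is a toy stand-in in which g_φ is a fixed linear map plus a sigmoid rather than the convolutional and fully connected layers of the metric module; the feature values and weights are invented for illustration:

```python
import math

def relation_score(feat_support, feat_target, weights, bias=0.0):
    pair = feat_support + feat_target            # C(., .): concatenation
    z = sum(w * v for w, v in zip(weights, pair)) + bias
    return 1 / (1 + math.exp(-z))                # sigmoid -> score in (0, 1)

f_xi = [0.2, 0.9]          # support-set feature map (flattened, illustrative)
f_xj = [0.1, 0.8]          # target-image feature map
w = [0.5, 1.0, 0.5, 1.0]   # invented weights standing in for g_phi
r = relation_score(f_xi, f_xj, w)
```

Repeating this for each of the C support categories yields C scores, and the category with the maximum score is taken as the prediction.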
5. The image classification method based on a relational network according to claim 3, wherein the training process of the image classification model comprises:
acquiring a query set image and a training set image;
inputting the query set images and the training set images into the stochastic depth network, and extracting features of the query set images and the training set images;
concatenating the extracted features of the query set images with those of the training set images to obtain a concatenated feature map;
inputting the output of the stochastic depth network into the convolutional layer to further extract features of the query set images and the training set images;
and inputting the output of the convolutional layer into the fully connected layer to obtain the similarity between the query set images and each image in the training set images.
6. An image classification apparatus based on a relational network, characterized by comprising:
the image acquisition module is used for acquiring a target image, a query set image and a support set image;
the similarity judging module is used for inputting the target image and the support set images into a trained image classification model to obtain the similarity between the target image and each image in the support set images; and
the image classification module is used for determining the category of the target image according to the maximum similarity.
7. The image classification device based on the relational network according to claim 6, wherein the similarity determination module comprises:
a first input unit, configured to input the target image and the support set images into the stochastic depth network and extract features of the target image and the support set images;
a first concatenation unit, configured to concatenate the extracted features of the target image with the features of the support set images to obtain a concatenated feature map;
a second input unit, configured to input the concatenated feature map into the convolutional layer to further extract its features;
and a third input unit, configured to input the extracted features of the concatenated feature map into the fully connected layer to obtain the similarity between the target image and each image in the support set images.
8. The image classification device based on the relational network according to claim 7, further comprising:
a training module, configured to input the query set images and the support set images into the image classification model for training to obtain the trained image classification model, and to classify and identify the query set images using the trained image classification model.
9. An electronic device, comprising:
the system comprises a processor and a memory, wherein the memory stores a program which can be called by the processor;
wherein the processor, when executing the program, implements the image classification method of any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that:
the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-5.
CN202110907203.9A 2021-08-09 Image classification method, device, equipment and storage medium based on relational network Active CN113627522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110907203.9A CN113627522B (en) 2021-08-09 Image classification method, device, equipment and storage medium based on relational network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110907203.9A CN113627522B (en) 2021-08-09 Image classification method, device, equipment and storage medium based on relational network

Publications (2)

Publication Number Publication Date
CN113627522A true CN113627522A (en) 2021-11-09
CN113627522B CN113627522B (en) 2024-07-02


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115965817A (en) * 2023-01-05 2023-04-14 北京百度网讯科技有限公司 Training method and device of image classification model and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434721A (en) * 2020-10-23 2021-03-02 特斯联科技集团有限公司 Image classification method, system, storage medium and terminal based on small sample learning
US20210232915A1 (en) * 2020-01-23 2021-07-29 UMNAI Limited Explainable neural net architecture for multidimensional data


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GAO HUANG et al.: "Deep Networks with Stochastic Depth", arXiv:1603.09382v3 [cs.LG], pages 1-16 *
SONGSEN YU et al.: "Sketch works ranking based on improved transfer learning model", Multimedia Tools and Applications, 23 August 2023, pages 33663-33678 *
DAI LEICHAO et al.: "A Robust Few-Shot Learning Method" (in Chinese), Journal of Chinese Computer Systems (小型微型计算机系统), vol. 42, no. 2, pages 340-347 *


Similar Documents

Publication Publication Date Title
US20220237788A1 (en) Multiple instance learner for tissue image classification
Zhao et al. A visual long-short-term memory based integrated CNN model for fabric defect image classification
US20180247156A1 (en) Machine learning systems and methods for document matching
CN106294344B (en) Video retrieval method and device
CN113360701B (en) Sketch processing method and system based on knowledge distillation
CN110175613A (en) Street view image semantic segmentation method based on Analysis On Multi-scale Features and codec models
WO2019015246A1 (en) Image feature acquisition
CN112464865A (en) Facial expression recognition method based on pixel and geometric mixed features
CN112132014B (en) Target re-identification method and system based on non-supervised pyramid similarity learning
Akrim et al. Classification of Tajweed Al-Qur'an on Images Applied Varying Normalized Distance Formulas
CN113761259A (en) Image processing method and device and computer equipment
CN112560710B (en) Method for constructing finger vein recognition system and finger vein recognition system
CN116612335B (en) Few-sample fine-granularity image classification method based on contrast learning
CN114913923A (en) Cell type identification method aiming at open sequencing data of single cell chromatin
CN109993187A (en) A kind of modeling method, robot and the storage device of object category for identification
CN115393666A (en) Small sample expansion method and system based on prototype completion in image classification
CN108229505A (en) Image classification method based on FISHER multistage dictionary learnings
CN111414930B (en) Deep learning model training method and device, electronic equipment and storage medium
CN113496260A (en) Grain depot worker non-standard operation detection method based on improved YOLOv3 algorithm
CN112861881A (en) Honeycomb lung recognition method based on improved MobileNet model
CN108805152A (en) A kind of scene classification method and device
CN116524243A (en) Classification method and device for fossil images of penstones
Winiarti et al. Application of Artificial Intelligence in Digital Architecture to Identify Traditional Javanese Buildings
CN115033700A (en) Cross-domain emotion analysis method, device and equipment based on mutual learning network
CN113627522A (en) Image classification method, device and equipment based on relational network and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant