CN111242183A

CN111242183A - Image identification and classification method and device based on attention mechanism

Info

Publication number: CN111242183A
Application number: CN202010005582.8A
Authority: CN
Inventors: 张顺利; 林贝贝
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2020-01-03
Filing date: 2020-01-03
Publication date: 2020-06-05

Abstract

The invention provides an attention mechanism-based image identification and classification method and device. The method comprises the following steps: constructing an image feature extraction model based on a convolutional neural network and an attention mechanism; extracting the image features of a target image with the image feature extraction model; and identifying and classifying the target image according to the image features to determine its category. The method and device improve the accuracy with which the image features of the target image are extracted, and thereby improve the accuracy and precision of image identification.

Description

Image identification and classification method and device based on attention mechanism

Technical Field

The invention relates to the technical field of image processing, and in particular to an attention mechanism-based image identification and classification method and device.

Background

With the development of information technology, the volume of image data is growing rapidly, and so is the demand for image processing. Image recognition and classification mainly consists of extracting specific features from an image, representing the image's information through those features, and then recognizing and classifying the image according to the extracted features. Image recognition is used in many fields; because it can quickly and accurately extract the characteristics of objects under a variety of complex conditions, it has broad application prospects.

A typical image recognition and classification system consists of two main parts: image feature extraction, and recognition and classification based on the extracted features. To model image features, a large amount of image sample data is first collected and used to build an image recognition database, from which a model suitable for extracting image features can be trained. The recognition and classification result is then obtained by applying a corresponding recognition and classification model to the output of the feature extraction model.

At present, most image feature extraction models use two-dimensional convolution, but a convolutional network treats all image features equally. For example, when identifying an animal in an image, it is desirable for the network to "look" at the animal itself rather than giving equal weight to the animal and the background. Existing image feature extraction therefore extracts features inaccurately, which in turn reduces the accuracy of image identification and classification.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides an attention mechanism-based image identification and classification method and device, which can improve the accuracy of image identification and classification.

To achieve this object, the invention provides the following technical solution:

In one aspect, the invention provides an attention mechanism-based image recognition and classification method, comprising the following steps:

constructing an image feature extraction model based on a convolutional neural network and an attention mechanism;

extracting the image features of a target image with the image feature extraction model;

and identifying and classifying the target image according to the image characteristics to determine the category of the target image.

Constructing the image feature extraction model based on the convolutional neural network and the attention mechanism comprises the following steps:

training a convolutional neural network on the sample training set and its corresponding labels to obtain a convolutional network model;

determining weights for the convolutional network model through an attention mechanism to obtain a second convolutional network model;

and training the second convolutional network model on the sample training set and its corresponding labels to obtain the image feature extraction model.

Training a convolutional neural network on the sample training set and its corresponding labels to obtain a convolutional network model comprises the following steps:

preprocessing the sample training set to generate target training samples of size N₁×N₂×C;

training the convolutional neural network on the target training samples and labels by iterative optimization to obtain the convolutional network model;

where N₁ and N₂ denote the height and width of the samples input to the convolutional neural network, and C denotes the three channels of the RGB input images.

Determining the weights of the convolutional network model through the attention mechanism to obtain the second convolutional network model comprises:

determining the weights of the convolutional network model with an attention mechanism, and multiplying the weights by the convolutional network model to obtain the second convolutional network model.

In another aspect, the present invention further provides an attention mechanism-based image recognition and classification apparatus, including:

the modeling unit is used for constructing an image feature extraction model based on the convolutional neural network and the attention mechanism;

the feature extraction unit is used for extracting the image features of the target image with the image feature extraction model;

and the identification and classification unit is used for identifying and classifying the target image according to the image characteristics to determine the category of the target image.

The modeling unit includes:

a first training subunit, configured to train a convolutional neural network on the sample training set and its corresponding labels to obtain a convolutional network model;

a fitting subunit, configured to determine weights for the convolutional network model through an attention mechanism to obtain a second convolutional network model;

and a second training subunit, configured to train the second convolutional network model on the sample training set and its corresponding labels to obtain the image feature extraction model.

The first training subunit comprises:

a preprocessing module, configured to preprocess the sample training set to generate target training samples of size N₁×N₂×C;

and a convolution module, configured to train the convolutional neural network on the target training samples and labels by iterative optimization to obtain the convolutional network model.

The fitting subunit comprises:

a generating module, configured to determine the weights of the convolutional network model with an attention mechanism and multiply the weights by the convolutional network model to obtain the second convolutional network model.

In another aspect, the present invention further provides an electronic device, including: a processor, a memory, a communication interface, and a communication bus; wherein

the processor, the communication interface and the memory communicate with one another through the communication bus;

and the processor is configured to call logic instructions in the memory to execute the above attention mechanism-based image identification and classification method.

In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the above attention mechanism-based image recognition and classification method.

According to the above technical solution, the attention mechanism-based image recognition and classification method and device construct an image feature extraction model based on a convolutional neural network and an attention mechanism, and use it to extract the image features of the target image, which improves the accuracy of feature extraction. The target image is then identified and classified according to these features to determine its category, which in turn improves the accuracy and precision of image identification.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart of an image recognition and classification method based on an attention mechanism according to an embodiment of the present invention;

FIG. 2 is a flow diagram of the channel-domain attention framework according to an embodiment of the present invention;

FIG. 3 is a flow diagram of the framework combining channel-domain attention and spatial-domain attention according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an image recognition and classification apparatus based on an attention mechanism according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

An embodiment of the present invention provides an image identification and classification method based on an attention mechanism. Referring to FIG. 1, the method specifically includes the following steps:

S101: constructing an image feature extraction model based on a convolutional neural network and an attention mechanism;

In this step, a sample training set and a test set are taken from the CIFAR-10 or CIFAR-100 dataset, and a convolutional neural network is trained on the sample training set and its corresponding labels to obtain a convolutional network model;

the input size of the convolutional neural network is set to N₁×N₂×C, and the sample training set is preprocessed during training to generate target training samples of size N₁×N₂×C;

where N₁ and N₂ denote the height and width of the samples input to the convolutional neural network, and C denotes the three channels of the RGB input images. In the present embodiment, the target training samples are uniformly scaled to a size of 112×122.
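The preprocessing step can be sketched as follows. This is a minimal illustration, assuming nearest-neighbour resizing and scaling of pixel values to [0, 1]; the embodiment does not specify the interpolation method or any normalization, and the function name `preprocess` is hypothetical. The example resizes a random 32×32×3 array (the CIFAR image size) to the 112×122 size stated above.

```python
import numpy as np

def preprocess(image, n1, n2):
    """Resize an (H, W, 3) uint8 RGB image to (n1, n2, 3) and scale it to [0, 1].

    Hypothetical helper: nearest-neighbour resizing and /255 scaling are
    assumptions, not details taken from the patent.
    """
    h, w, c = image.shape
    if c != 3:
        raise ValueError("expected an RGB three-channel image")
    rows = np.arange(n1) * h // n1  # source row for each output row
    cols = np.arange(n2) * w // n2  # source column for each output column
    resized = image[rows[:, None], cols[None, :]]  # shape (n1, n2, 3)
    return resized.astype(np.float32) / 255.0

rng = np.random.default_rng(0)
cifar_like = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
sample = preprocess(cifar_like, 112, 122)
print(sample.shape)  # (112, 122, 3)
```

Bilinear interpolation would give smoother upscaled images; nearest-neighbour is used here only because it needs no extra library.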

In this embodiment, the convolutional network is pre-trained on the samples and their labels with an iterative optimization strategy so that the trained network extracts better features from the images. Specifically, the convolutional neural network is trained on the target training samples and labels by iterative optimization to obtain the convolutional network model.
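The iterative optimization step can be illustrated with mini-batch gradient descent on a softmax cross-entropy loss. This is a hedged sketch, not the patent's training procedure: the convolutional network is replaced by a linear softmax classifier on flattened inputs to keep the example short, and the optimizer, learning rate, batch size and epoch count are all assumptions. The function name `train_iteratively` and the synthetic three-class data are hypothetical.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def train_iteratively(x, y, num_classes, lr=0.2, epochs=50, batch_size=32, seed=0):
    """Mini-batch gradient descent on softmax cross-entropy.

    x: (num_samples, num_features) flattened samples; y: integer labels.
    A linear classifier stands in for the CNN (assumption for brevity).
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    losses = []
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            probs = softmax(x[idx] @ w + b)
            grad = (probs - onehot[idx]) / len(idx)  # d(mean loss)/d(logits)
            w -= lr * x[idx].T @ grad
            b -= lr * grad.sum(axis=0)
        probs_all = softmax(x @ w + b)
        losses.append(-np.mean(np.log(probs_all[np.arange(n), y] + 1e-12)))
    return w, b, losses

# Synthetic, well-separated 3-class data standing in for image features.
rng = np.random.default_rng(1)
labels = rng.integers(0, 3, size=300)
centers = 4.0 * np.eye(3, 4)
features = centers[labels] + 0.5 * rng.standard_normal((300, 4))
w, b, losses = train_iteratively(features, labels, num_classes=3)
accuracy = np.mean(np.argmax(features @ w + b, axis=1) == labels)
```

In a real CNN the gradient with respect to each layer would come from backpropagation (usually via a deep-learning framework); the outer loop of shuffling, mini-batching and parameter updates stays the same.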

Next, the weights of the convolutional network model are determined through an attention mechanism to obtain a second convolutional network model;

specifically, the weights of the convolutional network model are determined with an attention mechanism and multiplied by the convolutional network model to obtain the second convolutional network model.

Finally, the second convolutional network model is pre-trained on the samples and their labels with the same iterative optimization strategy, so that the trained attention-based convolutional network model extracts better features from the images. Specifically, the second convolutional network model is trained on the sample training set and its corresponding labels to obtain the image feature extraction model.

S102: extracting the image features of the target image with the image feature extraction model;

In this step, the image feature extraction model is applied to the target image to obtain its image features, on the basis of which the target image is identified and classified.

S103: identifying and classifying the target image according to the image features to determine the category of the target image.

In this step, a set of image features and the class corresponding to each of them are stored in advance. After the image features of the target image are obtained, they are matched against the stored image features to find the corresponding stored feature, and the class associated with that stored feature is taken as the class of the target image.
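The matching step above can be sketched as a nearest-neighbour lookup over the stored features. The patent does not state how the target image's features are matched to the stored ones; cosine similarity is an assumption here, and `classify_by_stored_features` and the toy feature vectors are hypothetical.

```python
import numpy as np

def classify_by_stored_features(query, stored_features, stored_labels):
    """Return (label, similarity) of the stored feature closest to `query`.

    query: (D,) feature of the target image.
    stored_features: (K, D) features stored in advance; stored_labels: K classes.
    Cosine similarity is an assumed matching criterion.
    """
    q = query / np.linalg.norm(query)
    s = stored_features / np.linalg.norm(stored_features, axis=1, keepdims=True)
    similarities = s @ q  # one cosine similarity per stored feature
    best = int(np.argmax(similarities))
    return stored_labels[best], float(similarities[best])

stored = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
names = ["cat", "dog", "ship"]
label, score = classify_by_stored_features(np.array([0.1, 0.9, 0.2]), stored, names)
print(label)  # dog
```

A trained classification layer (for example a softmax over the extracted features) is a common alternative to this lookup when the stored set is large.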

As can be seen from the above description, the attention mechanism-based image identification and classification method provided by this embodiment constructs an image feature extraction model based on a convolutional neural network and an attention mechanism and uses it to extract the image features of the target image, which improves the accuracy of feature extraction; the target image is then identified and classified according to these features to determine its category, which in turn improves the accuracy and precision of image identification.

This embodiment of the present invention provides a way of determining the weights of the convolutional network model with an attention mechanism in the above embodiment, which specifically includes the following:

In image recognition, a convolutional neural network extracts image features, yielding a set of feature channels and the feature map of each channel. Most of the information useful for recognition usually comes from a subset of the feature channels, so the feature maps of each channel are aggregated and adaptive weights are generated for the channels.

This embodiment uses two spatial-domain compression methods, max pooling (MAX POOLING) and average pooling (AVG POOLING), and adds a side connection between the two branches so that the two kinds of spatial-domain information interact. As shown in the channel-domain attention flow chart of FIG. 2, combining the two compression methods better preserves the spatial-domain information of the image.

Once spatial-domain compression is complete, adaptive weights for the channel domain are generated through a sequence of compression, nonlinear activation, restoration, and mapping to probabilities. After both branches have completed the compression and restoration operations, their information is fused through a side connection. Introducing the side connection allows the information in the feature channels to be compressed more fully. Compared with the conventional single-path compression and multi-path weight-sharing schemes, the multi-path parallel scheme with information interaction proposed in this embodiment is more reasonable and effective.

Finally, so that the network can effectively enhance its input, a sigmoid function maps the result to probabilities.

Specifically, the two spatial-domain compression methods, max pooling (MAX POOLING) and average pooling (AVG POOLING), compress a feature of size (N, C, H, W) to (N, C, 1, 1), and the information obtained by max pooling is then added to the information obtained by average pooling. A 1×1 convolution then projects each compressed feature from (N, C, 1, 1) to a reduced size (N, C/r, 1, 1), where r is the channel reduction ratio, integrating and compressing the feature map across the whole channel dimension. A nonlinear activation lets the network fit nonlinear relations, after which a second 1×1 convolution projects the feature from (N, C/r, 1, 1) back to (N, C, 1, 1), and the two branches are fused again. This channel-domain attention module thus extracts the adaptive weights.

Here N is the number of samples input to the network, C is the number of feature channels, and H (= N₁) and W (= N₂) are the height and width of the picture, so the input X of the neural network has size (N, C, H, W).

The output of the channel-domain attention module is X_c = M_c(X) ⊗ X, where X_c denotes the output of the channel-domain attention module, M_c(X) denotes the adaptive weights generated by the channel-domain attention mechanism, and ⊗ denotes element-wise multiplication with the weights broadcast over the spatial dimensions. Enhancing the effective regions in the channel domain yields more robust features.
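The channel-domain attention module described above can be sketched in NumPy as follows. This is a minimal, hedged illustration: on a (N, C, 1, 1) tensor a 1×1 convolution reduces to a matrix product over the channel axis, so each branch is written as two matrix products. The separate (unshared) weights per branch, ReLU as the nonlinear activation, summation as the final fusion, and the names `channel_attention` and `init_params` are assumptions, not details confirmed by the patent. The function returns the reweighted features together with the weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

def init_params(c, r, seed=0):
    """Hypothetical random weights for the two branches (C -> C/r -> C)."""
    rng = np.random.default_rng(seed)
    shapes = {"w1_max": (c, c // r), "w2_max": (c // r, c),
              "w1_avg": (c, c // r), "w2_avg": (c // r, c)}
    return {k: rng.standard_normal(s) / np.sqrt(c) for k, s in shapes.items()}

def channel_attention(x, params):
    """x: (N, C, H, W). Returns (reweighted x, weights of shape (N, C, 1, 1))."""
    z_max = x.max(axis=(2, 3))        # (N, C) max-pooling branch
    z_avg = x.mean(axis=(2, 3))       # (N, C) average-pooling branch
    z_avg_lateral = z_avg + z_max     # side connection: max info added to avg branch

    def branch(z, w1, w2):
        return relu(z @ w1) @ w2      # compress C -> C/r, activate, restore to C

    fused = (branch(z_max, params["w1_max"], params["w2_max"])
             + branch(z_avg_lateral, params["w1_avg"], params["w2_avg"]))
    weights = sigmoid(fused)[:, :, None, None]  # probabilities, shape (N, C, 1, 1)
    return weights * x, weights

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 16, 8, 8))
x_c, m_c = channel_attention(x, init_params(c=16, r=4))
```

In a framework such as PyTorch the matrix products would normally be `Conv2d` layers with `kernel_size=1`, which is equivalent on a 1×1 spatial map.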

In image recognition, adaptive weighting is needed not only in the channel domain but also over the feature map of the image, that is, in the spatial domain. The spatial-domain attention branch produces a spatial attention map that emphasizes or suppresses features at different spatial locations. The spatial-domain attention mechanism is designed by starting from the source of image recognition performance. In this embodiment, the adaptive weights are generated by acquiring global information about the feature map and computing the weights from it.

Global information about the feature map is obtained with dilated (hole) convolution, which saves parameters and computation. Dilation rates of different scales are used in the dilated convolution to mitigate the information loss it can cause. Because dilation rates of different scales affect image recognition differently, a second selection is applied across the different dilation rates. Referring to FIG. 3, which shows the combination of channel-domain attention and spatial-domain attention, once the channel-domain adaptive weights have been generated, the spatial-domain adaptive weights are generated on top of them, as follows:

X_s = M_s(X_c) ⊗ X_c, where X_s denotes the output of the spatial-domain attention module, M_s(X_c) denotes the adaptive weights generated by the spatial-domain attention mechanism, and softmax denotes the normalized exponential (multinomial logistic) function used in producing these weights. Enhancing the effective regions in the spatial domain yields more robust features.
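The spatial-domain step can be sketched in the same way. Everything beyond the use of multi-scale dilated convolutions, a softmax, and multiplication of the weights with the channel-attended features is an assumption: the channel-averaged descriptor, the per-pixel softmax across dilation rates used as the "second selection", the final sigmoid, the rates (1, 2, 3) and all function names are hypothetical. `effective_kernel_size` shows why dilation saves parameters: a 3×3 kernel with dilation rate 3 spans 7×7 pixels with 9 weights instead of 49.

```python
import numpy as np

def effective_kernel_size(k, rate):
    """Span covered by a k x k kernel with dilation `rate` (weights stay k * k)."""
    return k + (k - 1) * (rate - 1)

def dilated_conv2d(x, kernel, rate):
    """'Same'-padded 2D cross-correlation with dilation over the last two axes of x."""
    k = kernel.shape[0]  # odd, square kernel assumed
    pad = rate * (k // 2)
    xp = np.pad(x, [(0, 0)] * (x.ndim - 2) + [(pad, pad), (pad, pad)])
    h, w = x.shape[-2:]
    out = np.zeros(x.shape)
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * xp[..., i * rate:i * rate + h, j * rate:j * rate + w]
    return out

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x_c, kernels, rates):
    """x_c: (N, C, H, W) channel-attended features. Returns (x_s, m_s), m_s: (N, 1, H, W)."""
    desc = x_c.mean(axis=1)  # (N, H, W) channel-averaged descriptor (assumption)
    maps = np.stack([dilated_conv2d(desc, k, r) for k, r in zip(kernels, rates)])  # (S, N, H, W)
    select = softmax(maps, axis=0)  # per-pixel weighting of the dilation scales
    fused = (select * maps).sum(axis=0)  # (N, H, W)
    m_s = (1.0 / (1.0 + np.exp(-fused)))[:, None]  # (N, 1, H, W), values in (0, 1)
    return m_s * x_c, m_s

rng = np.random.default_rng(0)
x_c = rng.standard_normal((2, 16, 8, 8))
kernels = [0.1 * rng.standard_normal((3, 3)) for _ in range(3)]
x_s, m_s = spatial_attention(x_c, kernels, rates=[1, 2, 3])
```

The padding `rate * (k // 2)` keeps the output the same size as the input for every dilation rate, so the maps from different scales can be stacked and weighted pixel by pixel.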

An embodiment of the present invention provides an image recognition and classification device based on an attention mechanism. As shown in FIG. 4, the device specifically comprises:

the modeling unit 10 is used for constructing an image feature extraction model based on a convolutional neural network and an attention mechanism;

a feature extraction unit 20, configured to extract an image feature of the target image by using the image feature extraction model;

and the identification and classification unit 30 is used for identifying and classifying the target image according to the image characteristics to determine the category of the target image.

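The modeling unit 10 comprises: a first training subunit, a fitting subunit, and a second training subunit.

The first training subunit comprises: a preprocessing module and a convolution module.

The fitting subunit comprises: a generating module.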

The functions implemented by the modules in the apparatus correspond to the corresponding operation steps in the method embodiment, and are not described herein again.

According to the above technical solution, the attention mechanism-based image recognition and classification device constructs an image feature extraction model based on a convolutional neural network and an attention mechanism and uses it to extract the image features of the target image, which improves the accuracy of feature extraction; the target image is then identified and classified according to these features to determine its category, which in turn improves the accuracy and precision of image identification.

An embodiment of the present invention provides an electronic device. Referring to FIG. 5, the electronic device may include a processor 810, a communication interface 820, a memory 830 and a communication bus 840, where the processor 810, the communication interface 820 and the memory 830 communicate with one another through the communication bus 840. The processor 810 may call logic instructions in the memory 830 to perform the following method: constructing an image feature extraction model based on a convolutional neural network and an attention mechanism; extracting the image features of a target image with the image feature extraction model; and identifying and classifying the target image according to the image features to determine the category of the target image.

In addition, the logic instructions in the memory 830 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium and including instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

An embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the method provided by the above method embodiments, for example, the method includes: constructing an image feature extraction model based on a convolutional neural network and an attention mechanism; extracting the image characteristics of the target image by adopting the image characteristic extraction model; and identifying and classifying the target image according to the image characteristics to determine the category of the target image.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means/systems for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element. The terms "upper", "lower", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are used only for convenience in describing the present invention and simplifying the description; they do not indicate or imply that the referred devices or elements must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Unless expressly stated or limited otherwise, the terms "mounted," "connected," and "coupled" are to be interpreted broadly: for example, the connection may be fixed, detachable, or integral; mechanical or electrical; direct or indirect through intervening media; or it may refer to internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.

In the description of the present invention, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description. Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention is not limited to any single aspect, nor is it limited to any single embodiment, nor is it limited to any combination and/or permutation of these aspects and/or embodiments. Moreover, each aspect and/or embodiment of the present invention may be utilized alone or in combination with one or more other aspects and/or embodiments thereof.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims

1. An image recognition and classification method based on an attention mechanism is characterized by comprising the following steps:

2. The method for image recognition and classification based on the attention mechanism as claimed in claim 1, wherein the constructing an image feature extraction model based on the convolutional neural network and the attention mechanism comprises:

3. The method for image recognition and classification based on the attention mechanism as claimed in claim 2, wherein the training with a convolutional neural network based on the labels corresponding to the sample training set and the sample training set to obtain a convolutional network model comprises:

4. The method for image recognition and classification based on attention mechanism as claimed in claim 2, wherein the determining the weights of the convolutional network model through attention mechanism and obtaining a second convolutional network model comprises:

5. An image recognition and classification device based on an attention mechanism is characterized by comprising:

6. The attention mechanism-based image recognition and classification device according to claim 5, wherein the modeling unit comprises:

7. The attention-based image recognition and classification device of claim 6, wherein the first training subunit comprises:

8. The attention-based image recognition and classification device according to claim 6, wherein the fitting subunit comprises:

9. An electronic device, comprising: a processor, a memory, a communication interface, and a communication bus; wherein

the processor is used for calling logic instructions in the memory to execute the attention mechanism-based image recognition classification method according to any one of claims 1 to 4.

10. A non-transitory computer readable storage medium storing computer instructions that cause the computer to perform the attention mechanism based image recognition classification method of any one of claims 1-5.