CN116188875B - Image classification method, device, electronic equipment, medium and product - Google Patents


Info

Publication number
CN116188875B
CN116188875B (application number CN202310323632.0A)
Authority
CN
China
Prior art keywords
feature
ith
classification
image
query
Prior art date
Legal status
Active
Application number
CN202310323632.0A
Other languages
Chinese (zh)
Other versions
CN116188875A (en)
Inventor
尉德利
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310323632.0A priority Critical patent/CN116188875B/en
Publication of CN116188875A publication Critical patent/CN116188875A/en
Application granted granted Critical
Publication of CN116188875B publication Critical patent/CN116188875B/en


Classifications

    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/40: Extraction of image or video features
    • G06V 10/809: Fusion of classification results, e.g. where the classifiers operate on the same input data
    • G06V 10/82: Image or video recognition or understanding using neural networks

Abstract

The disclosure provides an image classification method and apparatus, an electronic device, a medium, and a product, relating to the technical field of artificial intelligence, and particularly to the technical fields of computer vision, image processing, deep learning, and the like. The specific implementation scheme is as follows: acquiring an image to be classified; extracting features of the image to be classified to obtain target feature information, wherein the target feature information comprises a first feature vector and a second feature vector, the first feature vector is a feature vector generated based on a feature map of the image to be classified, and the second feature vector is a feature vector generated based on the key value feature corresponding to the feature map, the value feature corresponding to the feature map, and the initial query feature acquired in advance; generating first classification information based on the first feature vector, and generating second classification information based on the second feature vector; and determining a target classification category based on the first classification information and the second classification information. The method and the device can improve the accuracy of image classification results.

Description

Image classification method, device, electronic equipment, medium and product
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the technical fields of computer vision, image processing, deep learning, and the like, and more particularly to an image classification method and apparatus, an electronic device, a medium, and a product.
Background
In the field of computer vision, convolutional neural networks (CNNs) are the dominant model structure, mainly applied to tasks such as image classification, detection, and segmentation. To improve the image processing effect of a CNN, the main means adopted in the related art is to increase the width, depth, or input image resolution of the CNN.
Disclosure of Invention
The present disclosure provides an image classification method, apparatus, electronic device, medium, and product.
According to a first aspect of the present disclosure, there is provided an image classification method, comprising:
acquiring an image to be classified;
extracting features of the images to be classified to obtain target feature information, wherein the target feature information comprises a first feature vector and a second feature vector, and the first feature vector is a feature vector generated based on a feature map of the images to be classified; the second feature vector is a feature vector generated based on the key value feature corresponding to the feature map, the value feature corresponding to the feature map and the initial query feature acquired in advance;
Generating first classification information based on the first feature vector, and generating second classification information based on the second feature vector, wherein the first classification information is used for representing the classification category of the image to be classified, and the second classification information is used for representing the classification category of the image to be classified;
a target classification category is determined based on the first classification information and the second classification information.
According to a second aspect of the present disclosure, there is provided an image classification apparatus comprising:
the acquisition module is used for acquiring the images to be classified;
the feature extraction module is used for extracting features of the images to be classified to obtain target feature information, wherein the target feature information comprises a first feature vector and a second feature vector, and the first feature vector is a feature vector generated based on a feature map of the images to be classified; the second feature vector is a feature vector generated based on the key value feature corresponding to the feature map, the value feature corresponding to the feature map and the initial query feature acquired in advance;
the generation module is used for generating first classification information based on the first feature vector and generating second classification information based on the second feature vector, wherein the first classification information is used for representing the classification class of the image to be classified, and the second classification information is used for representing the classification class of the image to be classified;
And the determining module is used for determining a target classification category based on the first classification information and the second classification information.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect described above.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect described above.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the first aspect.
In the embodiment of the disclosure, in the process of classifying an image, a first feature vector and a second feature vector are generated, first classification information is generated based on the first feature vector, second classification information is generated based on the second feature vector, and then a target classification class is determined based on the first classification information and the second classification information, so that the accuracy of an image classification result is improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is one of the flowcharts of an image classification method provided by an embodiment of the present disclosure;
FIG. 2 is a second flowchart of an image classification method according to an embodiment of the disclosure;
FIG. 3 is a flow chart of an image classification method in the related art;
FIG. 4 is a flow chart illustrating a process of updating query features by a feature processing module in an embodiment of the disclosure;
fig. 5 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the structure of a feature extraction module in an embodiment of the disclosure;
FIG. 7 is a schematic diagram of the structure of an update sub-module in an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of the structure of a generation module in an embodiment of the disclosure;
FIG. 9 is a schematic diagram of the configuration of a determination module in an embodiment of the present disclosure;
fig. 10 is a block diagram of an electronic device for implementing an image classification method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Referring to fig. 1, fig. 1 is a flowchart of an image classification method according to an embodiment of the disclosure, where the image classification method includes:
s101, obtaining an image to be classified;
step S102, extracting features of the images to be classified to obtain target feature information, wherein the target feature information comprises a first feature vector and a second feature vector, and the first feature vector is a feature vector generated based on a feature map of the images to be classified; the second feature vector is a feature vector generated based on the key value feature corresponding to the feature map, the value feature corresponding to the feature map and the initial query feature acquired in advance;
Step S103, generating first classification information based on the first feature vector and generating second classification information based on the second feature vector, wherein the first classification information is used for representing the classification class of the image to be classified, and the second classification information is used for representing the classification class of the image to be classified;
step S104, determining a target classification category based on the first classification information and the second classification information.
The image classification method can be applied to various image classification scenes or image recognition scenes, such as face recognition scenes, image searching scenes, image retrieval scenes, commodity recognition scenes and the like.
The key value feature is the Key feature corresponding to the feature map, and the value feature is the Value feature corresponding to the feature map. The initial query feature acquired in advance may be a feature generated based on a Query obtained by random initialization; for example, the Query feature may be obtained by linearly projecting the Query.
The feature map may be obtained by performing feature extraction on the image to be classified with an image classification model, for example, a CNN image classification model. Accordingly, on the one hand, the image classification model may generate the first feature vector based on the feature map; on the other hand, the image classification model may generate the second feature vector based on the key value feature corresponding to the feature map, the value feature corresponding to the feature map, and the initial query feature acquired in advance. First classification information is then generated based on the first feature vector, and second classification information is generated based on the second feature vector. That is, the image classification model can acquire features of the image to be classified in two different ways, classify the features acquired in each way to obtain the first classification information and the second classification information, and finally fuse the first classification information and the second classification information to obtain the target classification category. Since feature vectors acquired in different ways generally differ, this improves the global coverage of feature acquisition and enlarges the receptive field of the CNN, which can improve the accuracy of image classification.
In this embodiment, in the process of classifying an image, the first feature vector and the second feature vector are generated, the first classification information is generated based on the first feature vector, the second classification information is generated based on the second feature vector, and then the target classification class is determined based on the first classification information and the second classification information, so that the accuracy of the image classification result is improved.
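For orientation, the following is a minimal, hypothetical PyTorch sketch of this two-branch scheme. It is not the disclosed implementation: it collapses the iterative query updates described below into a single attention step, and the names (TwoBranchClassifier, backbone, head1, head2) and the mean over the k query vectors are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchClassifier(nn.Module):
    # Hypothetical sketch: CNN pooling branch + class-attention branch,
    # fused by averaging the two probability distributions.
    def __init__(self, backbone: nn.Module, m: int, d: int, n: int, k: int = 6):
        super().__init__()
        self.backbone = backbone                      # CNN trunk -> (B, m, H, W)
        self.query = nn.Parameter(torch.randn(k, d))  # randomly initialized Query
        self.attn = nn.MultiheadAttention(embed_dim=d, num_heads=1,
                                          kdim=m, vdim=m, batch_first=True)
        self.head1 = nn.Linear(m, n, bias=False)      # first classification matrix
        self.head2 = nn.Linear(d, n, bias=False)      # second classification matrix

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        fmap = self.backbone(images)                       # feature map
        f_cnn = F.adaptive_avg_pool2d(fmap, 1).flatten(1)  # first feature vector
        tokens = fmap.flatten(2).transpose(1, 2)           # (B, H*W, m) feature sequence
        q = self.query.unsqueeze(0).expand(images.size(0), -1, -1)
        f_class, _ = self.attn(q, tokens, tokens)          # query update via attention
        f_class = f_class.mean(dim=1)                      # second feature vector (assumed pooling)
        f1 = self.head1(f_cnn).softmax(dim=-1)             # first classification information
        f2 = self.head2(f_class).softmax(dim=-1)           # second classification information
        return (f1 + f2) / 2                               # target classification information
```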
Optionally, the extracting the features of the image to be classified to obtain target feature information includes:
performing s iterations of feature processing on the image to be classified based on an image classification model to obtain the target feature information, wherein the image classification model comprises s sequentially connected feature processing layers, each feature processing layer comprises a feature extraction module and a feature processing module, the feature extraction module is used for extracting features in the received image so as to output a feature map, the feature processing module is used for updating the initial query feature based on the key value feature corresponding to the feature map, the value feature corresponding to the feature map and the initial query feature, and the s sequentially connected feature processing layers are in one-to-one correspondence with the s iterations of feature processing, s being an integer greater than 1.
Wherein the image classification model may be a CNN image classification model.
The image classification model can perform the s iterations of feature processing on the image to be classified based on the s sequentially connected feature processing layers. After each feature processing layer performs feature processing, it can output an updated query feature and a feature map, which are used as the input of the next feature processing layer.
In this embodiment, the s sequentially connected feature processing layers are used to perform the s iterative feature processing on the image to be classified, so that the first feature vector and the second feature vector include more key features for image classification, and further accuracy of classification results is improved.
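As a sketch of how the s layers chain together, assuming hypothetical module lists stages (feature extraction modules) and attn_blocks (feature processing modules) of equal length s:

```python
def iterate_features(image, query, stages, attn_blocks):
    # One pass of the s iterations: each layer emits a feature map and
    # an updated query, both of which feed the next layer.
    x = image
    for stage, attn in zip(stages, attn_blocks):
        x = stage(x)            # ith feature map from the ith extraction module
        query = attn(x, query)  # (i+1)th query from K/V of the ith feature map
    return x, query             # s-th feature map and (s+1)th query feature
```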
Optionally, the performing, based on the image classification model, the s iterations of feature processing on the image to be classified to obtain the target feature information includes:
inputting the ith image into a feature extraction module of an ith feature processing layer to perform feature extraction to obtain an ith feature image, wherein i is an integer greater than 0 and less than or equal to s;
generating an ith key value feature and an ith value feature by using the ith feature map based on a feature processing module of the ith feature processing layer;
updating the ith query feature by using the ith key value feature, the ith value feature and the ith query feature based on the feature processing module of the ith feature processing layer to obtain an (i+1)th query feature;
wherein, in the case that the i is equal to 1, the i-th image is the image to be classified, and the i-th query feature is the initial query feature;
in the case that i is greater than 1, the ith image is the (i-1)th feature map;
the first feature vector is a feature vector generated based on the s-th feature map, and the second feature vector comprises the (s+1)th query feature output by the feature processing module of the s-th feature processing layer.
Referring to fig. 2, fig. 2 is a flow chart illustrating a process of classifying an image to be classified by an image classification model according to an embodiment of the disclosure. The feature extraction module may be a feature extraction layer of a CNN image classification model in the related art, and the feature processing module may be a class attention (Class Attention) module added on the basis of that CNN image classification model. Further, referring to fig. 3, the CNN image classification model in the related art has only one input, since it does not include a Class Attention module. In the embodiment of the disclosure, since the classifier needs to classify based on both the first feature vector and the second feature vector, a hybrid classifier may be used to replace the classifier of the related-art CNN image classification model. The hybrid classifier includes two inputs, so the first feature vector and the second feature vector may be fed to the hybrid classifier through its two inputs, respectively; the hybrid classifier then classifies based on the first feature vector and the second feature vector, respectively, to obtain the first classification information and the second classification information. After obtaining the first classification information and the second classification information, the hybrid classifier may further determine the target classification category based on them, thereby completing the classification of the image.
Referring to fig. 2, in an embodiment of the present disclosure the value of s is 3. After the image classification model receives the image to be classified, the feature extraction module of the 1st feature processing layer may perform feature extraction on the image to be classified to obtain the 1st feature map. The 1st feature map and the initial query feature are input into the feature processing module of the 1st feature processing layer, which generates the 1st key value feature and the 1st value feature based on the 1st feature map, and updates the initial query feature using the 1st key value feature, the 1st value feature and the initial query feature to obtain the 2nd query feature. Then, the 2nd query feature and the 1st feature map are input into the 2nd feature processing layer for the second round of feature processing, yielding the 3rd query feature and the 2nd feature map. Finally, the 3rd query feature and the 2nd feature map are input into the 3rd feature processing layer for the third round of feature processing, yielding the 4th query feature and the 3rd feature map.
The ith query feature, the ith key value feature and the ith value feature may all be features in vector form; accordingly, the (s+1)th query feature may be directly used as the second feature vector. That is, in the embodiment shown in fig. 2, the 4th query feature may be directly taken as the second feature vector.
The first feature vector generated based on the s-th feature map may specifically be a feature vector obtained by pooling the s-th feature map with a maximum pooling layer (max pooling) of the image classification model. Alternatively, it may be a feature vector obtained by pooling the s-th feature map with an average pooling layer (average pooling) of the image classification model. That is, in the embodiment shown in fig. 2, the 3rd feature map may be pooled by a maximum pooling layer or an average pooling layer of the image classification model to obtain the first feature vector.
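Both pooling variants can be written compactly; a sketch, assuming fmap_s is the s-th feature map of shape (B, C, H, W):

```python
import torch.nn.functional as F

def first_feature_vector(fmap_s, mode: str = "avg"):
    # Collapse the spatial grid to 1x1 with average or max pooling,
    # then flatten to a (B, C) feature vector.
    pooled = (F.adaptive_avg_pool2d(fmap_s, 1) if mode == "avg"
              else F.adaptive_max_pool2d(fmap_s, 1))
    return pooled.flatten(1)
```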
In this embodiment, the Class Attention module can model global relations among features, so that the updated Query has a global receptive field; compared with the CNN model in the related art, the image classification method therefore has stronger classification and recognition capability. In addition, in the embodiment of the disclosure, only one Class Attention module is added on the basis of one CNN model and the classifier of the CNN model is replaced by a hybrid classifier, so that the computation of the image classification model during classification is only slightly more than that of a single CNN model; compared with hybrid classification using a plurality of CNN models, the computation during classification is effectively reduced.
Optionally, the feature processing module based on the ith feature processing layer updates the ith query feature by using the ith key value feature, the ith value feature and the ith query feature to obtain an (i+1)th query feature, including:
processing the ith key value feature and the ith query feature by using an attention mechanism based on a feature processing module of the ith feature processing layer to obtain an ith attention matrix;
and carrying out aggregation processing on the ith value feature and the ith attention matrix based on the feature processing module of the ith feature processing layer to obtain the (i+1)th query feature.
Referring to fig. 4, which is a flow chart of the feature processing performed by the feature processing module in an embodiment of the disclosure, the input of the feature processing module may be a Query feature and a feature map. The feature processing module may generate a Key feature and a Value feature based on the feature map, and then perform attention enhancement processing on the Query feature and the Key feature using an attention mechanism to obtain an attention matrix. The attention mechanism may be one commonly used in the related art; for example, attention enhancement processing may be performed on the Query feature and the Key feature based on scaled dot-product attention (Scaled Dot-Product Attention) to obtain the attention matrix. Referring to fig. 4, after obtaining the attention matrix, the feature processing module may further aggregate the attention matrix and the Value feature to obtain the updated Query feature.
In one embodiment of the present disclosure, a set of k query vectors of dimension d may be initialized, where k and d are system hyperparameters, typically set to k = 6 and d = 192. Thus, when the image classification model receives the image to be classified, the image classification process can be implemented based on the initialized query vectors.
That the feature processing module may further aggregate the attention matrix and the Value feature may specifically refer to: performing a weighted summation of the attention matrix and the Value feature to obtain the updated Query feature. Specifically, the Value feature may be a feature matrix with the same dimensions as the attention matrix, or the Value feature may be converted into a feature matrix with the same dimensions as the attention matrix, after which the attention matrix and the converted feature matrix are weighted and summed. The weight ratio used in the weighted summation can be determined according to actual needs; for example, the weight ratio of the attention matrix to the Value feature may be 1:1, or 1:2, etc.
In this embodiment, the ith key value feature and the ith query feature are processed using an attention mechanism by the feature processing module of the ith feature processing layer to obtain the ith attention matrix, and the ith value feature and the ith attention matrix are aggregated by the feature processing module of the ith feature processing layer to obtain the (i+1)th Query feature. The global relation among features can thus be modeled, so that the updated Query has a global receptive field, improving the image classification effect.
Optionally, the feature processing module based on the ith feature processing layer generates an ith key value feature and an ith value feature by using the ith feature map, including:
the feature processing module based on the ith feature processing layer performs linear projection on the ith feature map by using a first projection matrix to obtain the ith key value feature, and the feature processing module based on the ith feature processing layer performs linear projection on the ith feature map by using a second projection matrix to obtain the ith value feature, wherein the first projection matrix and the second projection matrix are different matrices.
The first projection matrix may be expressed as W_K ∈ R^(d×d′), and the second projection matrix may be expressed as W_V ∈ R^(d×d′). In addition, obtaining the Query feature by linearly projecting the Query may specifically refer to: linearly projecting the Query based on a third projection matrix W_Q ∈ R^(d×d′) to obtain the Query feature. The first projection matrix, the second projection matrix and the third projection matrix are different projection matrices. In one embodiment of the present disclosure, the image classification model may include three linear layers in one-to-one correspondence with the three projection matrices, and the three linear layers are used to implement the linear projection.
After the linear projection, the feature dimensions of the obtained Query feature, key feature and Value feature are the same. The dimension is a system hyper-parameter, typically set to 256.
In this embodiment, the i-th key value feature is obtained by linearly projecting the i-th feature map with a first projection matrix by the feature processing module of the i-th feature processing layer, and the i-th value feature is obtained by linearly projecting the i-th feature map with a second projection matrix by the feature processing module of the i-th feature processing layer, so that the process of obtaining the key value feature and the value feature can be realized.
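Putting the projection and update steps together, here is a minimal sketch of one feature processing module, assuming single-head scaled dot-product attention and bias-free linear layers standing in for W_Q, W_K, W_V; the plain attention-weighted sum used for aggregation is one of the weighted combinations contemplated above:

```python
import math
import torch
import torch.nn as nn

class ClassAttention(nn.Module):
    def __init__(self, c_in: int, d: int = 192):
        super().__init__()
        self.w_q = nn.Linear(d, d, bias=False)     # third projection matrix (Query)
        self.w_k = nn.Linear(c_in, d, bias=False)  # first projection matrix (Key)
        self.w_v = nn.Linear(c_in, d, bias=False)  # second projection matrix (Value)

    def forward(self, fmap: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # fmap: (B, C, H, W) ith feature map; query: (B, k, d) ith query feature
        tokens = fmap.flatten(2).transpose(1, 2)   # (B, N, C), N = H*W
        q = self.w_q(query)                        # projected query
        k = self.w_k(tokens)                       # ith key value feature
        v = self.w_v(tokens)                       # ith value feature
        scores = q @ k.transpose(1, 2) / math.sqrt(q.size(-1))  # scaled dot-product
        attn = scores.softmax(dim=-1)              # ith attention matrix
        return attn @ v                            # aggregation -> (i+1)th query feature
```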
Optionally, the first feature vector includes features of m dimensions, the second feature vector includes features of d dimensions, the generating first classification information based on the first feature vector, and generating second classification information based on the second feature vector includes:
determining a first classification matrix and a second classification matrix, wherein the first classification matrix is an n×m classification matrix and the second classification matrix is an n×d classification matrix, n being the number of classification categories;
the first classification information is generated based on the first classification matrix and the first feature vector, and the second classification information is generated based on the second classification matrix and the second feature vector.
In one embodiment of the present disclosure, the first feature vector may be expressed as f_cnn and the second feature vector as f_class. After determining that the first feature vector includes m-dimensional features and the second feature vector includes d-dimensional features, and determining that the number of categories is n, the first classification matrix and the second classification matrix may be determined based on methods in the related art; for example, W_1 ∈ R^(n×m) may represent the first classification matrix and W_2 ∈ R^(n×d) the second classification matrix.
Generating the first classification information f_1 based on the first classification matrix and the first feature vector, and generating the second classification information f_2 based on the second classification matrix and the second feature vector, may specifically be calculated using the following formulas:

f_1 = softmax(W_1 · f_cnn)

f_2 = softmax(W_2 · f_class)

where f_1 is the first classification information, softmax() is the normalized exponential function, W_1 is the first classification matrix, f_cnn is the first feature vector, f_2 is the second classification information, W_2 is the second classification matrix, and f_class is the second feature vector.
It can be appreciated that the first classification information f_1 and the second classification information f_2 may each be a probability distribution, i.e., f_1 and f_2 respectively comprise the probability values of the image to be classified belonging to each of the n categories.
In this embodiment, the acquiring process of the first classification information and the second classification information is implemented by determining a first classification matrix and a second classification matrix, generating the first classification information based on the first classification matrix and the first feature vector, and generating the second classification information based on the second classification matrix and the second feature vector.
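A short sketch of the two heads, with illustrative sizes (n = 10 classes, m = 512, d = 192) and bias-free linear layers playing the roles of W_1 and W_2:

```python
import torch
import torch.nn as nn

n, m, d = 10, 512, 192
W1 = nn.Linear(m, n, bias=False)                       # first classification matrix
W2 = nn.Linear(d, n, bias=False)                       # second classification matrix

f_cnn, f_class = torch.randn(1, m), torch.randn(1, d)  # example feature vectors
f1 = W1(f_cnn).softmax(dim=-1)                         # f_1 = softmax(W_1 f_cnn)
f2 = W2(f_class).softmax(dim=-1)                       # f_2 = softmax(W_2 f_class)
```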
Optionally, the determining the target classification category based on the first classification information and the second classification information includes:
the first classification information and the second classification information are weighted and summed to obtain target classification information, wherein the target classification information comprises n probability values, and the n probability values are in one-to-one correspondence with n preset classification categories;
and determining the classification category corresponding to the maximum probability value in the n probability values as the target classification category.
The weight ratio used when weighting and summing the first classification information and the second classification information can be selected according to actual needs; for example, the weight ratio of the first classification information to the second classification information may be 1:1, or 1:2, etc. In one embodiment of the present disclosure, when the weight ratio is 1:1, the target classification information may be calculated based on the following formula:

f = (f_1 + f_2) / 2

where f is the target classification information, f_1 is the first classification information, and f_2 is the second classification information. It may be appreciated that the target classification information may be a probability distribution, i.e., f comprises a probability value for the image to be classified belonging to each of the n categories.
In this embodiment, the target classification information is obtained by performing weighted summation on the first classification information and the second classification information; in this way, it can be ensured that the target classification information can fuse the classification results of the first classification information and the second classification information. Meanwhile, the classification category corresponding to the largest probability value in the n probability values is determined as the target classification category, so that the accuracy of the classification result can be improved.
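Continuing the previous sketch, the 1:1 fusion and category selection amount to two lines:

```python
f = 0.5 * f1 + 0.5 * f2             # target classification information (n probabilities)
target_category = f.argmax(dim=-1)  # index of the maximum of the n probability values
```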
It should be noted that, in the method provided by the embodiments of the disclosure, only one Class Attention module is added on the basis of one CNN model and the classifier of the CNN model is replaced by a hybrid classifier, so that in the process of classifying images, the computation of the image classification model is only slightly more than that of a single CNN model; compared with hybrid classification using a plurality of CNN models, the computation during classification is effectively reduced. For example, when the first classification information and the second classification information are obtained by classifying with two CNN models respectively, the corresponding computation is O(N²). With the method of the embodiments of the disclosure, the computation for obtaining the first classification information and the second classification information is O(kN), where k is the number of initialized Query vectors, N is the length of the feature sequence of the CNN feature map, and k is far smaller than N. Therefore, performing hybrid classification with the method provided by the embodiments of the disclosure can effectively reduce computational complexity and thus improve image classification efficiency.
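As a worked example of this claim, assuming a hypothetical 7×7 CNN feature map (N = 49) and k = 6 query vectors:

```python
N = 7 * 7     # length of the feature sequence of the CNN feature map
k = 6         # number of initialized Query vectors
print(N * N)  # O(N^2) cost of token-to-token attention: 2401
print(k * N)  # O(kN) cost of query-to-token class attention: 294
```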
Experiments prove this: the experimental data of the related-art CNN image classification model are a classification accuracy of 75.11% with a computation of 229.6M, while the experimental data for hybrid classification with the image classification model optimized according to the embodiments of the disclosure are a classification accuracy of 75.42% with a computation of 230.6M. Thus, by optimizing the model structure of the image classification model, the embodiments of the disclosure can improve the accuracy of an existing classification model at only a small additional computation cost.
Referring to fig. 5, a schematic structural diagram of an image classification apparatus 500 according to an embodiment of the disclosure is provided, where the image classification apparatus 500 includes:
an obtaining module 501, configured to obtain an image to be classified;
the feature extraction module 502 is configured to perform feature extraction on the image to be classified to obtain target feature information, where the target feature information includes a first feature vector and a second feature vector, and the first feature vector is a feature vector generated based on a feature map of the image to be classified; the second feature vector is a feature vector generated based on the key value feature corresponding to the feature map, the value feature corresponding to the feature map and the initial query feature acquired in advance;
A generating module 503, configured to generate first classification information based on the first feature vector, and generate second classification information based on the second feature vector, where the first classification information is used to characterize a classification class of the image to be classified, and the second classification information is used to characterize the classification class of the image to be classified;
a determining module 504 is configured to determine a target classification category based on the first classification information and the second classification information.
Optionally, the feature extraction module 502 is specifically configured to perform s iterations of feature processing on the image to be classified based on an image classification model to obtain the target feature information, where the image classification model includes s sequentially connected feature processing layers, one feature processing layer includes a feature extraction module 502 and a feature processing module, the feature extraction module 502 is configured to extract features in the received image to output the feature map, the feature processing module is configured to update the initial query feature based on the key value feature corresponding to the feature map, the value feature corresponding to the feature map, and the initial query feature, and the s sequentially connected feature processing layers are in one-to-one correspondence with the s iterations of feature processing, where s is an integer greater than 1.
Optionally, referring to fig. 6, the feature extraction module 502 includes:
the feature extraction submodule 5021 is configured to input an ith image into the feature extraction module 502 of the ith feature processing layer to perform feature extraction to obtain an ith feature map, where i is an integer greater than 0 and less than or equal to s;
a first generating sub-module 5022, configured to generate an ith key value feature and an ith value feature by using the ith feature map based on the feature processing module of the ith feature processing layer;
an updating submodule 5023, configured to update the ith query feature by using the ith key value feature, the ith value feature and the ith query feature based on the feature processing module of the ith feature processing layer to obtain an (i+1)th query feature;
wherein, in the case that the i is equal to 1, the i-th image is the image to be classified, and the i-th query feature is the initial query feature;
in the case that i is greater than 1, the ith image is the (i-1)th feature map;
the first feature vector is a feature vector generated based on the s-th feature map, and the second feature vector comprises the (s+1)th query feature output by the feature processing module of the s-th feature processing layer.
Optionally, referring to fig. 7, the update sub-module 5023 includes:
an attention processing unit 50231, configured to process the ith key value feature and the ith query feature by using an attention mechanism based on the feature processing module of the ith feature processing layer to obtain an ith attention matrix;
and an aggregation processing unit 50232, configured to perform aggregation processing on the ith value feature and the ith attention matrix based on the feature processing module of the ith feature processing layer to obtain the (i+1)th query feature.
Optionally, the first generating sub-module 5022 is specifically configured to perform linear projection on the ith feature map by using a first projection matrix based on the feature processing module of the ith feature processing layer to obtain the ith key value feature, and perform linear projection on the ith feature map by using a second projection matrix based on the feature processing module of the ith feature processing layer to obtain the ith value feature, where the first projection matrix and the second projection matrix are different matrices.
Optionally, the first feature vector includes features in m dimensions, the second feature vector includes features in d dimensions, referring to fig. 8, the generating module 503 includes:
a first determining submodule 5031, configured to determine a first classification matrix and a second classification matrix, where the first classification matrix is an n×m classification matrix and the second classification matrix is an n×d classification matrix;
a second generation sub-module 5032 for generating the first classification information based on the first classification matrix and the first feature vector, and generating the second classification information based on the second classification matrix and the second feature vector.
Optionally, referring to fig. 9, the determining module 504 includes:
a calculating submodule 5041, configured to perform weighted summation on the first classification information and the second classification information to obtain target classification information, where the target classification information includes n probability values, and the n probability values are in one-to-one correspondence with preset n classification categories;
and a second determining submodule 5042, configured to determine, as the target classification category, a classification category corresponding to a largest probability value among the n probability values.
It should be noted that, the image classification device 500 provided in this embodiment can implement all the technical solutions of the above-mentioned image classification method embodiments, so at least all the above-mentioned technical effects can be implemented, and the description thereof is omitted here.
In the technical solution of the present disclosure, the acquisition, storage, and application of the user personal information involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 10 shows a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the electronic device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Various components in the electronic device 1000 are connected to the I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and communication unit 1009 such as a network card, modem, wireless communication transceiver, etc. Communication unit 1009 allows device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1001 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1001 performs the respective methods and processes described above, for example, the image classification method. For example, in some embodiments, the image classification method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1000 via ROM 1002 and/or communication unit 1009. When the computer program is loaded into RAM 1003 and executed by computing unit 1001, one or more steps of the image classification method described above are performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the image classification method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (15)

1. An image classification method, comprising:
acquiring an image to be classified;
extracting features of the images to be classified to obtain target feature information, wherein the target feature information comprises a first feature vector and a second feature vector, and the first feature vector is a feature vector generated based on a feature map of the images to be classified; the second feature vector is a feature vector generated based on key value features corresponding to the feature map, value features corresponding to the feature map and initial query features acquired in advance;
Generating first classification information based on the first feature vector, and generating second classification information based on the second feature vector, wherein the first classification information is used for representing the classification category of the image to be classified, and the second classification information is used for representing the classification category of the image to be classified;
determining a target classification category based on the first classification information and the second classification information;
the step of extracting the characteristics of the images to be classified to obtain target characteristic information comprises the following steps:
performing s times of iterative feature processing on the image to be classified based on an image classification model to obtain the target feature information, wherein the image classification model comprises s feature processing layers which are sequentially connected, one feature processing layer comprises a feature extraction module and a feature processing module, the s feature processing layers which are sequentially connected are in one-to-one correspondence with the s times of iterative feature processing, and s is an integer greater than 1;
after each feature processing layer performs feature processing, outputting an updated query feature and feature map, taking the updated query feature and feature map as input of a next feature processing layer, wherein the input of the 1st feature processing layer is the initial query feature and the image to be classified, the first feature vector is a feature vector generated based on the feature map output by the s-th feature processing layer, and the second feature vector comprises the query feature output by the s-th feature processing layer;
in one feature processing layer, the feature extraction module is used for extracting features in the received image to output a feature map, and the feature processing module is used for updating the received query feature based on the key value feature and the value feature of the feature map output by the feature extraction module in the feature processing layer, together with the received query feature, to obtain the updated query feature.
2. The method according to claim 1, wherein the performing the s-time iterative feature processing on the image to be classified based on the image classification model to obtain the target feature information includes:
inputting the ith image into a feature extraction module of an ith feature processing layer to perform feature extraction to obtain an ith feature image, wherein i is an integer greater than 0 and less than or equal to s;
generating an ith key value feature and an ith value feature by using the ith feature map based on a feature processing module of the ith feature processing layer;
updating the ith query feature by using the ith key value feature, the ith value feature and the ith query feature based on the feature processing module of the ith feature processing layer to obtain an (i+1)th query feature;
Wherein, in the case that the i is equal to 1, the i-th image is the image to be classified, and the i-th query feature is the initial query feature;
in the case that i is greater than 1, the ith image is the (i-1)th feature map;
the first feature vector is a feature vector generated based on the s-th feature map, and the second feature vector comprises the (s+1)th query feature output by the feature processing module of the s-th feature processing layer.
3. The method of claim 2, wherein the updating the ith query feature by using the ith key value feature, the ith value feature and the ith query feature based on the feature processing module of the ith feature processing layer to obtain the (i+1)th query feature comprises:
processing the ith key value feature and the ith query feature by using an attention mechanism based on the feature processing module of the ith feature processing layer to obtain an ith attention matrix;
and aggregating the ith value feature and the ith attention matrix based on the feature processing module of the ith feature processing layer to obtain the (i+1)th query feature.
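The attention update of this claim admits, for example, a scaled dot-product instantiation; the softmax and the 1/√c scaling below are assumptions, since the claim only requires an attention mechanism followed by aggregation.

```python
import math
import torch

def update_query(key: torch.Tensor,    # ith key value feature, shape (N, c)
                 value: torch.Tensor,  # ith value feature,     shape (N, c)
                 query: torch.Tensor   # ith query feature,     shape (1, c)
                 ) -> torch.Tensor:
    """Scaled dot-product attention followed by aggregation (both assumed)."""
    scores = query @ key.T / math.sqrt(key.shape[-1])  # shape (1, N)
    attn = torch.softmax(scores, dim=-1)               # ith attention matrix
    return attn @ value                                # (i+1)th query feature, shape (1, c)
```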
4. The method of claim 2, wherein the generating an ith key value feature and an ith value feature from the ith feature map based on the feature processing module of the ith feature processing layer comprises:
performing, by the feature processing module of the ith feature processing layer, linear projection on the ith feature map with a first projection matrix to obtain the ith key value feature, and linear projection on the ith feature map with a second projection matrix to obtain the ith value feature, wherein the first projection matrix and the second projection matrix are different matrices.
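A sketch of the two distinct projections follows; flattening the spatial grid of the feature map into a token sequence, and the bias-free linear layers, are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class KVProjection(nn.Module):
    """Two distinct linear projections of the ith feature map."""

    def __init__(self, channels: int, dim: int):
        super().__init__()
        self.proj_k = nn.Linear(channels, dim, bias=False)  # first projection matrix
        self.proj_v = nn.Linear(channels, dim, bias=False)  # second (different) projection matrix

    def forward(self, feat_map: torch.Tensor):
        # (B, C, H, W) -> (B, H*W, C): flattening to a token sequence is assumed.
        tokens = feat_map.flatten(2).transpose(1, 2)
        return self.proj_k(tokens), self.proj_v(tokens)  # ith key value / ith value features
```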
5. The method of claim 1, wherein the first feature vector comprises m-dimensional features, the second feature vector comprises d-dimensional features, and the generating first classification information based on the first feature vector and second classification information based on the second feature vector comprises:
determining a first classification matrix and a second classification matrix, wherein the first classification matrix is an n×m classification matrix, the second classification matrix is an n×d classification matrix, and n represents the number of classification categories;
and generating the first classification information based on the first classification matrix and the first feature vector, and the second classification information based on the second classification matrix and the second feature vector.
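For illustration, realizing the generation step as plain matrix-vector products gives the following sketch; the claim does not mandate this exact form.

```python
import torch

def branch_scores(w1: torch.Tensor, v1: torch.Tensor,
                  w2: torch.Tensor, v2: torch.Tensor):
    """w1: (n, m) first classification matrix, v1: (m,) first feature vector;
    w2: (n, d) second classification matrix,  v2: (d,) second feature vector."""
    c1 = w1 @ v1  # first classification information, n scores
    c2 = w2 @ v2  # second classification information, n scores
    return c1, c2
```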
6. The method of claim 1, wherein the determining a target classification category based on the first classification information and the second classification information comprises:
performing weighted summation on the first classification information and the second classification information to obtain target classification information, wherein the target classification information comprises n probability values, and the n probability values are in one-to-one correspondence with n preset classification categories;
and determining the classification category corresponding to the maximum probability value among the n probability values as the target classification category.
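A sketch of the weighted fusion and argmax decision follows; treating the two pieces of classification information as per-branch probability distributions (e.g. after a softmax) and using a single scalar weight alpha are assumptions of the sketch.

```python
import torch

def fuse_and_decide(p1: torch.Tensor, p2: torch.Tensor, alpha: float = 0.5) -> int:
    """p1, p2: per-branch probability distributions over n categories (assumed)."""
    target_info = alpha * p1 + (1.0 - alpha) * p2  # weighted summation, n probability values
    return int(target_info.argmax())               # category with the maximum probability
```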
7. An image classification apparatus comprising:
an acquisition module configured to acquire an image to be classified;
a feature extraction module configured to extract features of the image to be classified to obtain target feature information, wherein the target feature information comprises a first feature vector and a second feature vector, the first feature vector being a feature vector generated based on a feature map of the image to be classified, and the second feature vector being a feature vector generated based on a key value feature corresponding to the feature map, a value feature corresponding to the feature map, and a pre-acquired initial query feature;
a generation module configured to generate first classification information based on the first feature vector and second classification information based on the second feature vector, wherein the first classification information and the second classification information each represent a classification category of the image to be classified;
a determining module configured to determine a target classification category based on the first classification information and the second classification information;
wherein the feature extraction module is specifically configured to: perform s iterations of feature processing on the image to be classified based on an image classification model to obtain the target feature information, wherein the image classification model comprises s sequentially connected feature processing layers, each feature processing layer comprises a feature extraction module and a feature processing module, the s sequentially connected feature processing layers are in one-to-one correspondence with the s iterations of feature processing, and s is an integer greater than 1;
each feature processing layer, after performing feature processing, outputs an updated query feature and a feature map, which serve as the input of the next feature processing layer, wherein the input of the 1st feature processing layer is the initial query feature and the image to be classified, the first feature vector is a feature vector generated based on the feature map output by the s-th feature processing layer, and the second feature vector comprises the query feature output by the s-th feature processing layer;
in one feature processing layer, the feature extraction module is configured to extract features from the received image to output a feature map, and the feature processing module is configured to update the received query feature, based on the key value feature and the value feature of the feature map output by the feature extraction module in the same layer together with the received query feature, to obtain the updated query feature.
8. The apparatus of claim 7, wherein the feature extraction module comprises:
a feature extraction sub-module configured to input the ith image into the feature extraction module of the ith feature processing layer for feature extraction to obtain an ith feature map, wherein i is an integer greater than 0 and less than or equal to s;
a first generation sub-module configured to generate an ith key value feature and an ith value feature from the ith feature map based on the feature processing module of the ith feature processing layer;
an updating sub-module configured to update the ith query feature by using the ith key value feature, the ith value feature and the ith query feature based on the feature processing module of the ith feature processing layer to obtain an (i+1)th query feature;
wherein, in the case that i is equal to 1, the ith image is the image to be classified, and the ith query feature is the initial query feature;
in the case that i is greater than 1, the ith image is the (i-1)th feature map;
the first feature vector is a feature vector generated based on the s-th feature map, and the second feature vector comprises the (s+1)th query feature output by the feature processing module of the s-th feature processing layer.
9. The apparatus of claim 8, wherein the update sub-module comprises:
an attention processing unit configured to process the ith key value feature and the ith query feature by using an attention mechanism based on the feature processing module of the ith feature processing layer to obtain an ith attention matrix;
and an aggregation processing unit configured to aggregate the ith value feature and the ith attention matrix based on the feature processing module of the ith feature processing layer to obtain the (i+1)th query feature.
10. The apparatus of claim 8, wherein the first generation sub-module is specifically configured to perform, based on the feature processing module of the ith feature processing layer, linear projection on the ith feature map with a first projection matrix to obtain the ith key value feature, and linear projection on the ith feature map with a second projection matrix to obtain the ith value feature, wherein the first projection matrix and the second projection matrix are different matrices.
11. The apparatus of claim 7, wherein the first feature vector comprises m-dimensional features and the second feature vector comprises d-dimensional features, the generating module comprising:
a first determining sub-module configured to determine a first classification matrix and a second classification matrix, wherein the first classification matrix is an n×m classification matrix, the second classification matrix is an n×d classification matrix, and n represents the number of classification categories;
and a second generation sub-module configured to generate the first classification information based on the first classification matrix and the first feature vector, and the second classification information based on the second classification matrix and the second feature vector.
12. The apparatus of claim 7, wherein the determining module comprises:
a computing sub-module configured to perform weighted summation on the first classification information and the second classification information to obtain target classification information, wherein the target classification information comprises n probability values, and the n probability values are in one-to-one correspondence with n preset classification categories;
and a second determining sub-module configured to determine the classification category corresponding to the maximum probability value among the n probability values as the target classification category.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the image classification method of any one of claims 1-6.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the steps of the image classification method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the image classification method of any of claims 1-6.
CN202310323632.0A 2023-03-29 2023-03-29 Image classification method, device, electronic equipment, medium and product Active CN116188875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310323632.0A CN116188875B (en) 2023-03-29 2023-03-29 Image classification method, device, electronic equipment, medium and product

Publications (2)

Publication Number Publication Date
CN116188875A (en) 2023-05-30
CN116188875B (en) 2024-03-01

Family

ID=86446447

Country Status (1)

Country Link
CN (1) CN116188875B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112562741A (en) * 2021-02-20 2021-03-26 金陵科技学院 Singing voice detection method based on dot product self-attention convolution neural network
CN112598024A (en) * 2020-12-03 2021-04-02 天津理工大学 Medical image classification method based on depth multi-instance learning and self-attention
WO2021072885A1 (en) * 2019-10-18 2021-04-22 平安科技(深圳)有限公司 Method and apparatus for recognizing text, device and storage medium
CN114022506A (en) * 2021-11-16 2022-02-08 天津大学 Image restoration method with edge prior fusion multi-head attention mechanism
CN114926693A (en) * 2022-06-01 2022-08-19 北京航空航天大学杭州创新研究院 SAR image small sample identification method and device based on weighted distance
JP2022185144A (en) * 2021-10-29 2022-12-13 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Object detection method and training method and device of object detection model


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Deli Yu et al., "Towards Accurate Scene Text Recognition With Semantic Reasoning Networks," 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 12110-19. *
Zhang Huifan; Luo Ze, "Research on Bird Video Image Retrieval Based on Convolutional Neural Networks," E-Science Technology & Application, 2017, No. 5, pp. 50-57. *


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant