CN112749293A - Image classification method and device and storage medium - Google Patents

Image classification method and device and storage medium Download PDF

Info

Publication number
CN112749293A
Authority
CN
China
Prior art keywords
image
classified
feature
label
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010504243.4A
Other languages
Chinese (zh)
Inventor
沈伟
康斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yayue Technology Co ltd
Original Assignee
Tencent Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Beijing) Co., Ltd.
Priority to CN202010504243.4A
Publication of CN112749293A
Legal status: Pending

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The present application relates to the field of computer technologies, and in particular to an image classification method, apparatus, and storage medium for improving the accuracy of image classification results. In the method, a plurality of image features of an image to be classified are obtained according to the feature vectors corresponding to labels, a predicted value of each image feature for each label is determined, the label predicted value of the image to be classified is then determined from the predicted values corresponding to each label, and the image to be classified is classified according to its label predicted values. By acquiring a plurality of image features of the image to be classified and determining its label predicted values from the predicted values of those image features for each label, the accuracy of each label predicted value is improved, and with it the accuracy of image classification.

Description

Image classification method and device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image classification method and apparatus, and a storage medium.
Background
With the rapid development of internet technology, the requirements on image classification are increasingly demanding. At present, in order to determine which category or categories an image to be classified belongs to, multi-label classification needs to be performed on it; that is, it is detected whether the image belongs to one or more known categories. Most existing multi-label classification schemes ignore the semantic differences among labels, so the accuracy of image classification results in the prior art is low, which is unfavorable for the subsequent classification task.
Disclosure of Invention
The embodiment of the application provides an image classification method, an image classification device and a storage medium, so as to improve the accuracy of image classification results.
In a first aspect, an image classification method provided in an embodiment of the present application includes:
acquiring at least two image characteristics of an image to be classified; different image features are obtained according to the feature vectors corresponding to different labels;
respectively obtaining a predicted value of each image feature corresponding to each label;
for each label, determining a target predicted value of the label according to each predicted value of the label corresponding to each image feature;
taking the target predicted value of each label as the label predicted value of the image to be classified;
and classifying the image to be classified according to the label prediction value of the image to be classified.
In a second aspect, an image classification apparatus provided in an embodiment of the present application includes:
the first acquisition module is used for acquiring at least two image characteristics of the image to be classified; different image features are obtained according to the feature vectors corresponding to different labels;
the second acquisition module is used for respectively acquiring the predicted value of each image feature corresponding to each label;
the first determination module is used for determining a target predicted value of each label according to each predicted value of the label corresponding to each image feature;
the second determination module is used for taking the target predicted value of each label as the label predicted value of the image to be classified;
and the classification module is used for classifying the image to be classified according to the label prediction value of the image to be classified.
Optionally, the first obtaining module includes:
a sub-image determining unit, configured to obtain at least two sub-images from the image to be classified, wherein the spliced image of all the sub-images comprises all areas of the multi-label image;
the first feature extraction unit is used for extracting features of each sub-image to obtain sub-image features of the sub-images;
and the first image feature determining unit is used for taking the features of the sub-images as the image features of the image to be classified.
Optionally, the first obtaining module includes:
a sub-image determining unit, configured to obtain at least two sub-images from the image to be classified, wherein the spliced image of all the sub-images comprises all areas of the multi-label image;
the second feature extraction unit is used for performing feature extraction on the image to be classified to obtain at least two global image features of the image to be classified; and,
the third feature extraction unit is used for extracting features of each sub-image to obtain at least two local image features of each sub-image; the number of the local image features of each sub-image is the same as that of the global image features, and the local image features and the global image features correspond to each other;
the fusion unit is used for fusing the local image features with global image features corresponding to the local image features in the image to be classified according to the local image features of each sub-image to obtain fused image features of the local image features;
and the second image feature determining unit is used for taking the fusion image features of the sub-images as the image features of the image to be classified.
Optionally, the fused image feature is obtained by:
the first fusion subunit is used for taking, for each local image feature of each sub-image, the average value of the local image feature and the global image feature corresponding to it in the image to be classified as a fusion image feature; or,
the second fusion subunit is used for taking, for each local image feature of each sub-image, the per-dimension maximum of the local image feature and the global image feature corresponding to it in the image to be classified, and using the combined result as a fusion image feature; or,
the third fusion subunit is used for connecting, for each local image feature of each sub-image, the local image feature with the global image feature corresponding to it in the image to be classified, inputting the result into a fully connected network model, and taking the output of the fully connected network model as the fusion image feature.
Optionally, the sub-image determining unit includes:
an edge determining subunit, configured to determine a long side and a short side of the image to be classified, a first side of the image to be classified being the short side and a second side being the long side;
and a sub-image determining subunit, configured to set a segmentation window of the same size as the sub-image, move the segmentation window, starting from the short side, along the long side by a set length each time, and obtain a sub-image according to the region of the segmentation window in the image to be classified.
Optionally, the first obtaining module includes:
the fourth feature extraction unit is used for performing feature extraction on the image to be classified to obtain a feature map of the image to be classified;
the convolution unit is used for respectively carrying out convolution calculation on the feature map and the feature vectors corresponding to the labels to obtain the attention masks corresponding to the labels;
and a third image feature determining unit, configured to perform dot product calculation on the feature map and the attention mask corresponding to each label, respectively, to obtain at least two image features of the image to be classified.
Optionally, the first determining module is specifically configured to determine, for each label, a maximum value of the predicted values of the label corresponding to each image feature as a target predicted value of the label.
In a third aspect, a computing device is provided, comprising at least one processing unit, and at least one memory unit, wherein the memory unit stores a computer program that, when executed by the processing unit, causes the processing unit to perform the steps of any of the image classification methods described above.
In one embodiment, the computing device may be a server or a terminal device.
In a fourth aspect, there is provided a computer readable medium storing a computer program executable by a terminal device, the program, when run on the terminal device, causing the terminal device to perform the steps of any of the image classification methods described above.
The beneficial effect of this application is as follows:
according to the image classification method, the image classification device, the electronic equipment and the storage medium, the plurality of image features of the image to be classified are obtained according to the feature vectors corresponding to the labels, the predicted value of each image feature to each label is determined, finally, the predicted value of the image to be classified is determined according to each predicted value corresponding to each label, and the image to be classified is classified according to the label predicted value of the image to be classified. Therefore, the accuracy of each label predicted value in the image to be classified can be improved by acquiring the plurality of image characteristics of the image to be classified and determining the label predicted value of the image to be classified according to the predicted values of the plurality of image characteristics to each label, so that the accuracy of image classification is improved.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic flowchart of an image classification method in an embodiment of the present application;
FIG. 2 is a schematic diagram of an attention mask in an embodiment of the present application;
FIG. 3 is a block diagram of a neural network model in an embodiment of the present application;
fig. 4 is a schematic diagram of the splitting of sub-images in the embodiment of the present application;
FIG. 5 is a flow chart of an overall method in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device in an embodiment of the present application.
Detailed Description
In order to improve the accuracy of the classification result of an image, the embodiment of the application provides an image classification method, an image classification device and a storage medium. In order to better understand the technical solution provided by the embodiments of the present application, the following brief description is made on the basic principle of the solution:
With the development of internet technology, more and more attention is paid to accurate image classification. Particularly in multi-label classification tasks, it is very important to be able to accurately determine the probability of an image with respect to each label; the higher the accuracy, the more helpful it is for the subsequent classification task. In the prior art, the technical schemes for multi-label classification are mostly based on a single-scale convolutional neural network model: a given input image is fed into a convolutional neural network, which outputs the recognition result. During training, each label is treated as an independent binary classification task. When a convolutional neural network is used for feature extraction, only one image feature is usually extracted for an image, and that single feature is then used for multi-label classification.
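For reference, the conventional single-feature scheme described in the previous paragraph can be sketched in Python (PyTorch) roughly as follows; the backbone layers, feature dimension and label count here are assumptions made only for the sketch, not details taken from any particular prior-art system:

import torch
import torch.nn as nn

class SingleFeatureMultiLabelNet(nn.Module):
    """Prior-art style baseline: one shared image feature, one independent score per label."""
    def __init__(self, num_labels=2, feat_dim=512):
        super().__init__()
        # Any convolutional backbone that maps an image to a single feature vector.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(feat_dim, num_labels)

    def forward(self, x):
        feature = self.backbone(x)                       # one feature for the whole image
        return torch.sigmoid(self.classifier(feature))   # one probability per label

# Training treats each label as an independent binary task, e.g.:
# loss = nn.BCELoss()(model(images), multi_hot_labels.float())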
However, since the semantic differences between labels are large, if the feature extraction step extracts only one image feature, the representative features of multiple labels are mixed together, which makes the feature representation ambiguous and is unfavorable for the final classification task. For example, suppose a single image needs to be classified to determine whether it contains label 1 and label 2 and the probability value of each label in the image. When label 1 and label 2 appear in the image at the same time but only one image feature is extracted, that feature may contain the content of label 1 and the content of label 2 simultaneously, so the feature representation loses its focus; the accuracy of the classification result is therefore low, which is unfavorable for the subsequent classification task.
Therefore, in order to improve the accuracy of the image classification result, embodiments of the present application provide an image classification method, an apparatus, an electronic device, and a storage medium, where a plurality of image features of an image to be classified are obtained according to feature vectors corresponding to labels, a prediction value of each image feature for each label is determined, and finally, the prediction value of the image to be classified is determined by using each prediction value corresponding to each label. Therefore, the accuracy of each label predicted value in the image to be classified can be improved by acquiring the plurality of image characteristics of the image to be classified and determining the label predicted value of the image to be classified according to the predicted values of the plurality of image characteristics to each label, so that the accuracy of image classification is improved.
Therefore, if an image needs to be classified to judge the probability value of each label in the image, the predicted value of each label can be determined more accurately with the method provided by the embodiment of the application. For example, if an image needs to be classified and its predicted values for label 1 and label 2 are to be determined, feature extraction can be performed on the image according to the feature vector corresponding to label 1 to obtain image feature 1, and feature extraction can be performed on the image according to the feature vector corresponding to label 2 to obtain image feature 2. The predicted value of image feature 1 for label 1 and the predicted value of image feature 2 for label 2 are thus more accurate, and these two predicted values are finally taken as the predicted values of the image for the two labels.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it should be understood that the preferred embodiments described herein are merely for illustrating and explaining the present application, and are not intended to limit the present application, and that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The following further explains the image classification method provided in the embodiments of the present application. As shown in fig. 1, the method comprises the following steps:
s101: acquiring at least two image characteristics of an image to be classified; wherein, different image characteristics are obtained according to the characteristic vectors corresponding to different labels.
In the embodiment of the application, a plurality of image features can be output by modifying the neural network model.
First, the neural network model needs to be trained; the purpose of training is to determine the feature vector corresponding to each label. A sample is input into the neural network model to be trained, and the image feature of the sample is output, where each sample carries one label and contains only the content of that label. The feature error is then computed between the image feature of the sample and the feature vector of the label corresponding to the sample. The parameters of the neural network model are adjusted according to the feature error, so that the feature error between the image feature output for the sample and the feature vector of its label falls within a preset range.
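A minimal sketch of this training step, assuming (purely for illustration) that the label feature vectors are learnable parameters and that the feature error is measured with a mean squared error; the embodiment itself does not prescribe a particular distance:

import torch
import torch.nn as nn
import torch.nn.functional as F

# One learnable feature vector per label, trained jointly with the network.
num_labels, feat_dim = 2, 512
label_feature_vectors = nn.Parameter(torch.randn(num_labels, feat_dim))

def feature_error(sample_feature, label_index):
    """Feature error between the image feature of a single-label sample and the
    feature vector of that sample's label (assumed here to be the MSE)."""
    target_vector = label_feature_vectors[label_index]
    return F.mse_loss(sample_feature, target_vector)

# The model parameters (and label_feature_vectors) are adjusted until this error
# falls within a preset range on the training samples.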
After training, feature extraction can be performed on the image to be classified through the trained neural network model, which specifically includes the following steps A1 to A3:
step A1: and performing feature extraction on the image to be classified to obtain a feature map of the image to be classified.
In the embodiment of the application, the feature map of the image to be classified is obtained through the feature extraction network. Namely, convolution calculation is carried out on the image to be classified to obtain a feature map of the image to be classified.
Step A2: and performing convolution calculation on the feature map and the feature vectors corresponding to the labels respectively to obtain the attention masks corresponding to the labels.
The attention mask is a binary image that indicates the approximate positions in the image to be classified where the label is distributed. Fig. 2 is a schematic view of the attention mask: the white area indicates the positions where the label is approximately distributed in the image to be classified, and the black area indicates the positions where other contents of the image to be classified are located.
Therefore, the distribution of the labels in the image to be classified can be confirmed more clearly through the attention mask.
Step A3: and respectively carrying out dot product calculation on the feature graph and the attention mask corresponding to each label to obtain at least two image features of the image to be classified.
In the embodiment of the application, the feature map is multiplied element-wise by the attention mask and the result is summed over the spatial domain, so as to obtain the image feature attended to by the corresponding label.
The same number of feature vectors as labels can be obtained in the neural network model; that is, the number of labels is the same as the number of feature vectors, in one-to-one correspondence.
Fig. 3 is a framework diagram of the neural network model. A feature map of the image to be classified is obtained through the feature extraction network, convolution is performed between the feature map and the feature vector corresponding to each label to obtain the attention mask corresponding to each label, and finally the attention mask corresponding to each label is applied to the feature map to obtain the image features.
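Steps A1 to A3 can be sketched as follows; the backbone, the channel count and the use of a sigmoid to turn the convolution response into a mask are assumptions of the sketch, not requirements of the embodiment:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelAttentionFeatures(nn.Module):
    """One image feature per label via label-specific attention masks (steps A1-A3)."""
    def __init__(self, backbone, num_labels, channels=512):
        super().__init__()
        self.backbone = backbone  # step A1: produces the feature map of the image
        # One feature vector per label, used as a 1x1 convolution kernel (step A2).
        self.label_vectors = nn.Parameter(torch.randn(num_labels, channels, 1, 1))

    def forward(self, image):
        fmap = self.backbone(image)                                 # (B, C, H, W) feature map
        masks = torch.sigmoid(F.conv2d(fmap, self.label_vectors))   # (B, L, H, W) attention masks
        # Step A3: weight the feature map by each mask and sum over the spatial domain,
        # giving one C-dimensional image feature per label.
        return torch.einsum('bchw,blhw->blc', fmap, masks)          # (B, L, C)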
As can be seen from the above, the positions of the labels distributed on the image are different, so that when the image features of the image to be classified are obtained, the image to be classified may be processed into a plurality of sub-images, and thus, the information included in each sub-image is different, and the obtained image features are also different. Specifically, the method comprises the following steps:
step B1: and obtaining at least two sub-images from the image to be classified, wherein the spliced image of all the sub-images comprises all areas of the multi-label image.
In the embodiment of the present application, the sub-image may be obtained according to each side length of the image to be classified, specifically:
determining long edges and short edges of the images to be classified, determining a first edge of the images to be classified as a short edge, and determining a second edge of the images to be classified as a long edge;
and setting a segmentation window with the same size as the sub-image, starting from the short side, moving the segmentation window along the long side for a set length each time, and obtaining the sub-image in the region of the image to be classified according to the segmentation window.
For objects with different aspect ratios, their positions on the image to be classified cannot be determined in advance. In order to fit an image to a fixed size, the short side of the image is usually padded with blank pixels (for example, filled with black or white), and the padded image is scaled and input into the model for processing. In this way, the focus of the acquired image features may be reduced.
The sub-image segmentation method solves this problem. When segmenting sub-images, the long side and the short side of the image are determined, the short side is set as the side length of the segmentation window (the segmentation window is square, i.e., all of its sides have the same length), the window is moved from one short side of the image to be classified to the other, and the image content covered by the segmentation window during the movement is taken as a sub-image.
It should be noted that the number of sub-images may be selected according to the actual situation: one sub-image may be determined each time the segmentation window is moved by a set length, or the sub-images may be obtained by random cropping.
Fig. 4 is a schematic diagram of sub-image segmentation. In fig. 4, 401 denotes the segmentation window, which moves from left to right and defines one sub-image after each movement by the set distance; 3 sub-images are obtained through segmentation.
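A minimal sketch of this segmentation, assuming evenly spaced window positions and a fixed number of sub-images; the embodiment equally allows a fixed step length or random cropping:

from PIL import Image

def split_into_sub_images(image: Image.Image, num_sub_images: int = 3):
    """Slide a square window (side length = short side) along the long side and crop
    one sub-image per position; together the crops cover the whole image."""
    w, h = image.size
    short, long_side = min(w, h), max(w, h)
    step = (long_side - short) / max(num_sub_images - 1, 1)  # evenly spaced positions (assumption)
    crops = []
    for i in range(num_sub_images):
        offset = round(i * step)
        if w >= h:   # landscape: move the window from left to right
            box = (offset, 0, offset + short, short)
        else:        # portrait: move the window from top to bottom
            box = (0, offset, short, offset + short)
        crops.append(image.crop(box))
    return crops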
Step B2: and performing feature extraction on each sub-image to obtain the sub-image features of the sub-images.
In the embodiment of the present application, an ordinary feature extraction may be performed on the sub-images, that is, one image feature is extracted from each sub-image.
Step B3: and taking the characteristics of the sub-images as the image characteristics of the image to be classified.
The image to be classified is processed to obtain sub-images, and the image feature of each sub-image is acquired. Because the image is segmented, each sub-image attends to a different region, and the obtained classification result is more accurate.
Of course, in order to more accurately determine the tag prediction value, after a plurality of sub-images are acquired, feature extraction for acquiring a plurality of image features may be performed on the plurality of sub-images as well.
For example: the image to be classified is divided into 3 sub-images through a division window, and the 3 sub-images are subjected to feature extraction of a plurality of image features, for example, each sub-image is extracted to obtain two image features, namely, a sub-image 1 obtains a feature 11 and a feature 12, a sub-image 2 obtains a feature 21 and a feature 22, and a sub-image 3 obtains a feature 31 and a feature 32. Thus, 6 image features can be obtained through one image to be classified.
In order to further improve the accuracy of image classification, on the basis of acquiring a plurality of image features of a plurality of sub-images, the image features of the whole image are added, and the image features of the whole image are fused with the image features of the sub-images, so that the image features are more multi-scale. Specifically, the method comprises the steps of C1-C5:
step C1: and obtaining at least two sub-images from the image to be classified, wherein the spliced image of all the sub-images comprises all areas of the multi-label image.
Step C2: and performing feature extraction on the image to be classified to obtain at least two global image features of the image to be classified.
In the embodiment of the application, the image to be classified may first be padded, then scaled and input into the model for feature extraction.
Step C3: extracting the features of each sub-image to obtain at least two local image features of each sub-image; the number of the local image features of each sub-image is the same as the number of the global image features, and the local image features and the global image features correspond to each other.
It should be noted that the global image features of the image to be classified correspond to the image features described above, and the local image features of the sub-images correspond to the sub-image features described above.
In the embodiment of the present application, each image feature is obtained through a feature vector corresponding to a label, and therefore, image features obtained through feature vectors corresponding to the same label correspond to each other.
For example: the number of the labels to be classified is two, the image to be classified is subjected to window segmentation to obtain 3 sub-images, feature extraction processing is carried out on the image to be classified and the 3 sub-images, the image to be classified obtains global features 1 and 2, the sub-image 1 obtains local features 11 and 12, the sub-image 2 obtains local features 21 and 22, and the sub-image 3 obtains local features 31 and 32. Thus, 8 image features can be obtained through one image to be classified. The global feature 1, the local feature 11, the local feature 21, and the local feature 31 are obtained from a feature vector corresponding to the tag 1, and the global feature 2, the local feature 12, the local feature 22, and the local feature 32 are obtained from a feature vector corresponding to the tag 2.
Step C4: and for each local image feature of each sub-image, fusing the local image feature with the global image feature corresponding to the local image feature in the image to be classified to obtain a fused image feature of the local image feature.
In the embodiment of the application, in order to improve the multi-scale property (including both global features and local features) of the image features, the obtained image features are fused.
For example: one image to be classified is subjected to feature extraction to obtain 8 image features which are respectively as follows: global feature 1, local feature 11, local feature 21, local feature 31, global feature 2, local feature 12, local feature 22, local feature 32.
And fusing the global feature 1 and the local feature 11 to obtain a fused feature 11, fusing the global feature 1 and the local feature 21 to obtain a fused feature 21, fusing the global feature 1 and the local feature 31 to obtain a fused feature 31, and the like.
The image features can be fused by:
1. and (3) averaging:
and aiming at each local image feature of each sub-image, taking the average value of the local image feature and the global image feature corresponding to the local image feature in the image to be classified as a fusion image feature.
For example: the feature vector of global feature 1 is 1, 4, 8, 6, 5, 2; the feature vector of the local feature 11 is 4, 5, 6, 7, 2, 1; then, after averaging, the resulting fusion signature 11 is 2.5, 4.5, 7, 6, 3.5, 1.5.
2. Taking the maximum value:
and aiming at each local image feature of each sub-image, combining the local image feature with the maximum value of each dimension in the global image feature corresponding to the local image feature in the image to be classified, and taking the combined result as a fusion image feature.
For example: the feature vector of global feature 1 is 1, 4, 8, 6, 5, 2; the feature vector of the local feature 11 is 4, 5, 6, 7, 2, 1; the fused features 11 obtained after taking the maximum value are 4, 5, 8, 7, 5, 2.
3. Classification selection through a fully connected layer:
For each local image feature of each sub-image, the local image feature is concatenated with the global image feature corresponding to it in the image to be classified, the result is input into a fully connected network model, and the output is the fused image feature.
For example: the feature vector of global feature 1 is 1, 4, 8, 6, 5, 2; the feature vector of local feature 11 is 4, 5, 6, 7, 2, 1; concatenating the two feature vectors gives a new feature vector: 1, 4, 8, 6, 5, 2, 4, 5, 6, 7, 2, 1; this new feature vector is input into a fully connected network, which outputs a more complex fused feature.
Step C5: and taking the fusion image characteristics of the sub-images as the image characteristics of the image to be classified.
Therefore, the image characteristics of the whole image and the sub image are combined, and the accuracy of image classification is further improved.
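The three fusion strategies of step C4 can be sketched as follows; the single linear layer used for the third strategy is an assumption, since the embodiment only requires some fully connected network whose output is the fused feature:

import torch
import torch.nn as nn

def fuse_average(local_feat, global_feat):
    """Strategy 1: per-dimension average of the local and global features."""
    return (local_feat + global_feat) / 2

def fuse_max(local_feat, global_feat):
    """Strategy 2: per-dimension maximum of the local and global features."""
    return torch.maximum(local_feat, global_feat)

class ConcatFuse(nn.Module):
    """Strategy 3: concatenate the two features and pass them through a fully connected network."""
    def __init__(self, feat_dim):
        super().__init__()
        self.fc = nn.Linear(2 * feat_dim, feat_dim)  # a single linear layer, assumed for the sketch

    def forward(self, local_feat, global_feat):
        return self.fc(torch.cat([local_feat, global_feat], dim=-1))

# With the vectors used in the text:
# g = torch.tensor([1., 4., 8., 6., 5., 2.]); l = torch.tensor([4., 5., 6., 7., 2., 1.])
# fuse_average(g, l) -> [2.5, 4.5, 7.0, 6.5, 3.5, 1.5]; fuse_max(g, l) -> [4., 5., 8., 7., 5., 2.]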
S102: and respectively obtaining the predicted value of each image feature corresponding to each label.
In the embodiment of the application, after the image features of the image to be classified are obtained, the predicted value of each feature for each label is obtained through a classifier.
For example: 4 image features are extracted from the image to be classified, namely feature 11, feature 12, feature 21 and feature 22, where features 11 and 21 are obtained through label 1 and features 12 and 22 are obtained through label 2. The predicted value of each image feature is obtained through the classifier: feature 11 yields predicted values of 78 and 22, feature 12 yields 83 and 17, feature 21 yields 15 and 85, and feature 22 yields 20 and 80 (for label 1 and label 2, respectively). Therefore, the predicted values of the image features for label 1 are 78, 83, 15 and 20, and the predicted values of the image features for label 2 are 22, 17, 85 and 80.
S103: and determining the target predicted value of each label according to the predicted value of each label corresponding to each image feature.
In the embodiment of the present application, after the predicted value of each image feature for the label is determined, the target predicted value of the label may be determined from the determined predicted values.
For example, the average value of the predicted values of each label may be used as the target predicted value of the label: if the predicted values of the image features for label 1 are 78, 83, 15 and 20 respectively, the target predicted value of label 1 is (78 + 83 + 15 + 20)/4 = 49.
A threshold may also be set, and the average of the predicted values greater than the threshold used as the target predicted value of the label: with the threshold set to 50 and the predicted values of the image features for label 1 being 78, 83, 15 and 20 respectively, the target predicted value of label 1 is (78 + 83)/2 = 80.5.
Of course, in order to make the final predicted value more accurate, the maximum value corresponding to each label may be used as the predicted value of the label. Specifically, the method comprises the following steps: and determining the maximum value of the predicted values of each image feature corresponding to the label as the target predicted value of the label.
For example, the predicted values for the image features of the tag 1 are 78, 83, 15, and 20, respectively; the predicted values for the image features of the label 2 are 22, 17, 85, and 80, respectively. Then, the target predicted value for tag 1 is 83; the target predicted value for tag 2 is 85.
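Taking the per-label maximum over the predicted values of all image features (the aggregation of steps S102 and S103) can be sketched as follows, reusing the example numbers above (scores written on a 0-100 scale for readability):

import torch

def aggregate_label_predictions(per_feature_scores: torch.Tensor) -> torch.Tensor:
    """per_feature_scores: (num_image_features, num_labels) classifier outputs.
    The target predicted value of each label is the maximum over all image features."""
    return per_feature_scores.max(dim=0).values

scores = torch.tensor([[78., 22.],   # feature 11
                       [83., 17.],   # feature 12
                       [15., 85.],   # feature 21
                       [20., 80.]])  # feature 22
print(aggregate_label_predictions(scores))  # tensor([83., 85.]) -> label 1: 83, label 2: 85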
S104: and taking the target predicted value of each label as the label predicted value of the image to be classified.
S105: and classifying the images to be classified according to the label predicted values of the images to be classified.
Following the example above, the label prediction values of the image to be classified are 83 and 85, and the image to be classified is classified according to these determined label prediction values; that is, the image to be classified contains label 1 with a probability of 83% and label 2 with a probability of 85%.
Therefore, the accuracy of each label predicted value in the image to be classified can be improved by acquiring the plurality of image characteristics of the image to be classified and determining the label predicted value of the image to be classified according to the predicted values of the plurality of image characteristics to each label, so that the accuracy of image classification is improved.
Fig. 5 is a flowchart of the whole process of the embodiment of the present application. In fig. 5, the image to be classified is divided into 3 sub-images, all 3 sub-images are input into the feature extraction network, and each sub-image yields 2 image features, 6 image features in total. The label predicted values of the 6 features are obtained through the classifier, and the maximum predicted value under each label is taken as the label predicted value of the image to be classified.
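The overall flow of fig. 5 can be sketched end to end as follows; split_fn, feature_extractor and classifier stand for the components described above and are assumed callables, not concrete implementations taken from the embodiment:

import torch

def classify_image(image, split_fn, feature_extractor, classifier, num_sub_images=3):
    """Split the image into sub-images, extract one feature per label from each
    sub-image, score every feature on every label, then take per-label maxima."""
    sub_images = split_fn(image, num_sub_images)        # e.g. 3 sub-images
    all_scores = []
    for sub in sub_images:
        feats = feature_extractor(sub)                  # (num_labels, feat_dim): one feature per label
        all_scores.append(classifier(feats))            # (num_features, num_labels) scores
    scores = torch.cat(all_scores, dim=0)               # every extracted feature scored on every label
    return scores.max(dim=0).values                     # label prediction values of the image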
Based on the same inventive concept, the embodiment of the application also provides an image classification device. As shown in fig. 6, the apparatus includes:
a first obtaining module 601, configured to obtain at least two image features of an image to be classified; different image features are obtained according to the feature vectors corresponding to different labels;
a second obtaining module 602, configured to obtain, for each label, a predicted value of each image feature corresponding to the label;
a first determining module 603, configured to determine, for each label, a maximum value of the predicted values of the label corresponding to each image feature as a target predicted value of the label;
a second determining module 604, configured to use the target prediction value of each label as a label prediction value of the image to be classified;
and the classification module 605 is configured to classify the image to be classified according to the label prediction value of the image to be classified.
Optionally, the first obtaining module 601 includes:
a sub-image determining unit, configured to obtain at least two sub-images from the image to be classified, wherein the spliced image of all the sub-images comprises all areas of the multi-label image;
the first feature extraction unit is used for extracting features of each sub-image to obtain sub-image features of the sub-images;
and the first image feature determining unit is used for taking the features of the sub-images as the image features of the image to be classified.
Optionally, the first obtaining module 601 includes:
a sub-image determining unit, configured to obtain at least two sub-images from the image to be classified, wherein the spliced image of all the sub-images comprises all areas of the multi-label image;
the second feature extraction unit is used for performing feature extraction on the image to be classified to obtain at least two global image features of the image to be classified; and,
the third feature extraction unit is used for extracting features of each sub-image to obtain at least two local image features of each sub-image; the number of the local image features of each sub-image is the same as that of the global image features, and the local image features and the global image features correspond to each other;
the fusion unit is used for fusing the local image features with global image features corresponding to the local image features in the images to be classified according to the local image features of each sub-image to obtain fusion image features of the local image features;
and the second image feature determining unit is used for taking the fusion image features of the sub-images as the image features of the image to be classified.
Optionally, the fused image feature is obtained by:
the first fusion subunit is used for taking, for each local image feature of each sub-image, the average value of the local image feature and the global image feature corresponding to it in the image to be classified as the fusion image feature; or,
the second fusion subunit is used for taking, for each local image feature of each sub-image, the per-dimension maximum of the local image feature and the global image feature corresponding to it in the image to be classified, and using the combined result as the fusion image feature; or,
the third fusion subunit is used for connecting, for each local image feature of each sub-image, the local image feature with the global image feature corresponding to it in the image to be classified, inputting the result into a fully connected network model, and taking the output of the fully connected network model as the fusion image feature.
Optionally, the sub-image determining unit includes:
an edge determining subunit, configured to determine a long side and a short side of the image to be classified, a first side of the image to be classified being the short side and a second side being the long side;
and a sub-image determining subunit, configured to set a segmentation window of the same size as the sub-image, move the segmentation window, starting from the short side, along the long side by a set length each time, and obtain a sub-image from the region of the segmentation window in the image to be classified.
Optionally, the first obtaining module 601 includes:
the fourth feature extraction unit is used for extracting features of the image to be classified to obtain a feature map of the image to be classified;
the convolution unit is used for respectively carrying out convolution calculation on the feature vector corresponding to each label and the feature map to obtain an attention mask corresponding to each label;
and the third image feature determining unit is used for respectively performing dot product calculation on the feature map and the attention masks corresponding to the labels to obtain at least two image features of the image to be classified.
Based on the same technical concept, the present application further provides a terminal device 700. As shown in fig. 7, the terminal device 700 is configured to implement the methods described in the above method embodiments, for example the embodiment shown in fig. 1, and may include a memory 701, a processor 702, an input unit 703, and a display panel 704.
A memory 701 for storing a computer program executed by the processor 702. The memory 701 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the terminal device 700, and the like. The processor 702 may be a Central Processing Unit (CPU), a digital processing unit, or the like. The input unit 703 may be used to obtain a user instruction input by a user. The display panel 704 is configured to display information input by a user or information provided to the user, and in this embodiment of the present application, the display panel 704 is mainly configured to display a display interface of each application program in the terminal device and a control entity displayed in each display interface. Alternatively, the display panel 704 may be configured in the form of a Liquid Crystal Display (LCD) or an organic light-emitting diode (OLED), and the like.
The embodiment of the present application does not limit the specific connection medium among the memory 701, the processor 702, the input unit 703, and the display panel 704. In the embodiment of the present application, the memory 701, the processor 702, the input unit 703, and the display panel 704 are connected by the bus 705 in fig. 7, the bus 705 is shown by a thick line in fig. 7, and the connection manner between other components is only schematically illustrated and is not limited thereto. The bus 705 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
The memory 701 may be a volatile memory (volatile memory), such as a random-access memory (RAM); the memory 701 may also be a non-volatile memory (non-volatile memory) such as, but not limited to, a read-only memory (ROM), a flash memory (flash memory), a Hard Disk Drive (HDD) or a solid-state drive (SSD), or any other medium which can be used to carry or store desired program code in the form of instructions or data structures and which can be accessed by a computer. The memory 701 may also be a combination of the above.
The processor 702 is configured to implement the embodiment shown in fig. 1 by calling the computer program stored in the memory 701.
The embodiment of the present application further provides a computer-readable storage medium, which stores computer-executable instructions required to be executed by the processor, and includes a program required to be executed by the processor.
In some possible embodiments, aspects of an image classification method provided herein may also be implemented in the form of a program product including program code for causing a terminal device to perform the steps of an image classification method according to various exemplary embodiments of the present application described above in this specification when the program product is run on the terminal device. For example, the terminal device may perform the embodiment as shown in fig. 1.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
An image classification program product for an embodiment of the present application may employ a portable compact disk read only memory (CD-ROM) and include program code, and may be executable on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device over any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., over the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functions of two or more units described above may be embodied in one unit, according to embodiments of the application. Conversely, the features and functions of one unit described above may be further divided into embodiments by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable image classification apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable image classification apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method of image classification, the method comprising:
acquiring at least two image characteristics of an image to be classified; different image features are obtained according to the feature vectors corresponding to different labels;
respectively obtaining a predicted value of each image feature corresponding to each label;
for each label, determining a target predicted value of the label according to each predicted value of the label corresponding to each image feature;
taking the target predicted value of each label as the label predicted value of the image to be classified;
and classifying the image to be classified according to the label prediction value of the image to be classified.
2. The method of claim 1, wherein the obtaining at least two image features of the image to be classified comprises:
obtaining at least two sub-images from the image to be classified, wherein the spliced image of all the sub-images comprises all areas of the multi-label image;
extracting the characteristics of each sub-image to obtain the sub-image characteristics of the sub-image;
and taking the characteristics of the sub-images as the image characteristics of the image to be classified.
3. The method of claim 1, wherein the obtaining at least two image features of the image to be classified comprises:
obtaining at least two sub-images from the image to be classified, wherein the spliced image of all the sub-images comprises all areas of the multi-label image;
performing feature extraction on the image to be classified to obtain at least two global image features of the image to be classified; and,
performing feature extraction on each sub-image to obtain at least two local image features of each sub-image; the number of the local image features of each sub-image is the same as that of the global image features, and the local image features and the global image features correspond to each other;
for each local image feature of each sub-image, fusing the local image feature with a global image feature corresponding to the local image feature in the image to be classified to obtain a fused image feature of the local image feature;
and taking the fusion image characteristics of the sub-images as the image characteristics of the image to be classified.
4. The method of claim 3, wherein the fused image feature is obtained by:
for each local image feature of each sub-image, taking the average value of the local image feature and the global image feature corresponding to the local image feature in the image to be classified as a fusion image feature; or,
for each local image feature of each sub-image, taking the per-dimension maximum of the local image feature and the global image feature corresponding to the local image feature in the image to be classified, and taking the combined result as a fusion image feature; or,
for each local image feature of each sub-image, connecting the local image feature with the global image feature corresponding to the local image feature in the image to be classified, inputting the result into a fully connected network model, and outputting the fusion image feature.
5. The method according to claim 2 or 3, wherein said obtaining at least two sub-images from said image to be classified comprises:
determining a long edge and a short edge of the image to be classified, determining a first edge of the image to be classified as the short edge, and determining a second edge as the long edge;
and setting a segmentation window with the same size as the sub-image, starting from the short side, moving the segmentation window along the long side for a set length each time, and obtaining the sub-image according to the region of the segmentation window in the image to be classified.
6. The method of claim 1, wherein the obtaining at least two image features of the image to be classified comprises:
carrying out feature extraction on the image to be classified to obtain a feature map of the image to be classified;
performing convolution calculation on the feature map and the feature vectors corresponding to the labels respectively to obtain the attention masks corresponding to the labels;
and respectively carrying out dot product calculation on the feature graph and the attention mask corresponding to each label to obtain at least two image features of the image to be classified.
7. The method according to any one of claims 1 to 4 or 6, wherein the determining, for each label, the target predicted value of the label according to the respective predicted value of the label corresponding to the respective image feature specifically comprises:
and determining the maximum value of the predicted values of each image feature corresponding to the label as the target predicted value of the label.
8. An image classification apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring at least two image characteristics of the image to be classified; different image features are obtained according to the feature vectors corresponding to different labels;
the second acquisition module is used for respectively acquiring the predicted value of each image feature corresponding to each label;
the first determination module is used for determining a target predicted value of each label according to each predicted value of the label corresponding to each image feature;
the second determination module is used for taking the target predicted value of each label as the label predicted value of the image to be classified;
and the classification module is used for classifying the image to be classified according to the label prediction value of the image to be classified.
9. An electronic device, characterized in that it comprises a processor and a memory, wherein the memory stores program code which, when executed by the processor, causes the processor to carry out the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it comprises program code which, when run on an electronic device, causes the electronic device to carry out the steps of the method of any one of claims 1 to 7.
CN202010504243.4A 2020-06-05 2020-06-05 Image classification method and device and storage medium Pending CN112749293A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010504243.4A CN112749293A (en) 2020-06-05 2020-06-05 Image classification method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010504243.4A CN112749293A (en) 2020-06-05 2020-06-05 Image classification method and device and storage medium

Publications (1)

Publication Number Publication Date
CN112749293A true CN112749293A (en) 2021-05-04

Family

ID=75645232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010504243.4A Pending CN112749293A (en) 2020-06-05 2020-06-05 Image classification method and device and storage medium

Country Status (1)

Country Link
CN (1) CN112749293A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117540306A (en) * 2024-01-09 2024-02-09 腾讯科技(深圳)有限公司 Label classification method, device, equipment and medium for multimedia data
CN117540306B (en) * 2024-01-09 2024-04-09 腾讯科技(深圳)有限公司 Label classification method, device, equipment and medium for multimedia data

Similar Documents

Publication Publication Date Title
CN109117831B (en) Training method and device of object detection network
CN108229303B (en) Detection recognition and training method, device, equipment and medium for detection recognition network
CN108304835B (en) character detection method and device
CN110363252B (en) End-to-end trend scene character detection and identification method and system
CN109086811B (en) Multi-label image classification method and device and electronic equipment
EP3637310A1 (en) Method and apparatus for generating vehicle damage information
CN112041851A (en) Text recognition method and terminal equipment
CN111488873B (en) Character level scene text detection method and device based on weak supervision learning
US10762389B2 (en) Methods and systems of segmentation of a document
CN113971727A (en) Training method, device, equipment and medium of semantic segmentation model
CN113657202B (en) Component identification method, training set construction method, device, equipment and storage medium
CN114330588A (en) Picture classification method, picture classification model training method and related device
CN112200193A (en) Distributed license plate recognition method, system and device based on multi-attribute fusion
CN116433903A (en) Instance segmentation model construction method, system, electronic equipment and storage medium
CN113223011B (en) Small sample image segmentation method based on guide network and full-connection conditional random field
US20210319264A1 (en) Resolving training dataset category ambiguity
CN114283343A (en) Map updating method, training method and equipment based on remote sensing satellite image
CN112749293A (en) Image classification method and device and storage medium
CN110728229B (en) Image processing method, device, equipment and storage medium
CN114548192A (en) Sample data processing method and device, electronic equipment and medium
CN112070093A (en) Method for generating image classification model, image classification method, device and equipment
CN112364933A (en) Image classification method and device, electronic equipment and storage medium
CN116580407A (en) Training method of text detection model, text detection method and device
CN115861809A (en) Rod detection and training method and device for model thereof, electronic equipment and medium
CN113420727B (en) Training method and device of form detection model and form detection method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40043911

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20221116

Address after: 1402, Floor 14, Block A, Haina Baichuan Headquarters Building, No. 6, Baoxing Road, Haibin Community, Xin'an Street, Bao'an District, Shenzhen, Guangdong 518,101

Applicant after: Shenzhen Yayue Technology Co.,Ltd.

Address before: Room 1601-1608, Floor 16, Yinke Building, 38 Haidian Street, Haidian District, Beijing

Applicant before: Tencent Technology (Beijing) Co.,Ltd.