CN111507403A - Image classification method and device, computer equipment and storage medium - Google Patents

Image classification method and device, computer equipment and storage medium

Info

Publication number
CN111507403A
CN111507403A
Authority
CN
China
Prior art keywords
image
classification
classified
classifiers
features
Prior art date
Legal status
Pending
Application number
CN202010303814.8A
Other languages
Chinese (zh)
Inventor
李岩
康斌
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010303814.8A priority Critical patent/CN111507403A/en
Publication of CN111507403A publication Critical patent/CN111507403A/en
Pending legal-status Critical Current

Classifications

    • G06F18/24 — Pattern recognition; Analysing; Classification techniques
    • G06F18/214 — Pattern recognition; Design or setup of recognition systems; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045 — Neural networks; Architecture; Combinations of networks

Abstract

The application relates to the technical field of artificial intelligence and provides an image classification method, an image classification apparatus, a computer device, and a storage medium. An image to be classified is obtained, and at least two image features of the image are correspondingly input into at least two image classifiers, which respectively correspond to at least two classification levels; the image features input to the image classifiers of adjacent classification levels are subject to a similarity constraint relationship that reduces the similarity between them. The hierarchical classification result of the image to be classified is then obtained from the classification results output by the image classifiers at their corresponding classification levels. Because the similarity constraint relationship reduces the similarity between the image features input to the classifiers of adjacent classification levels, image classifiers at different classification levels attend to different image features of the image to be classified, which improves the accuracy of hierarchical image classification.

Description

Image classification method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an image classification method, apparatus, computer device, and storage medium.
Background
With the development of artificial intelligence, techniques for classifying images based on deep learning, such as deep neural networks, have emerged; for example, an image classifier can be constructed on a deep neural network to classify input images. In a conventional image classification task, the categories of the images have equal status, i.e., no hierarchy is imposed among them, and the image classifier handles only a flat classification task, such as distinguishing car images from other categories such as cat and dog images. With hierarchical image classification, a hierarchical classification task can be completed, for example, by first judging whether an image belongs to the animal class or the non-animal class, and then further distinguishing the cat class and the dog class within the animal class.
Hierarchical image classification methods provided by conventional technology usually input images directly into, for example, two different image classifiers, one trained to classify coarse categories and the other to classify the sub-categories under those categories. However, hierarchical classification performed in this way is less accurate.
Disclosure of Invention
In view of the above, it is necessary to provide an image classification method, apparatus, computer device and storage medium for solving the above technical problems.
A method of image classification, the method comprising:
acquiring an image to be classified;
correspondingly inputting at least two image features of the image to be classified into at least two image classifiers; the at least two image classifiers respectively correspond to at least two classification levels; the image features input to the image classifiers corresponding to adjacent classification levels are subject to a similarity constraint relationship that reduces the similarity between them;
and obtaining a hierarchical classification result of the image to be classified according to the classification results of the image to be classified output by the image classifiers at their corresponding classification levels.
An image classification apparatus, the apparatus comprising:
the image acquisition module is used for acquiring an image to be classified;
the feature input module is used for correspondingly inputting at least two image features of the image to be classified into at least two image classifiers; the at least two image classifiers respectively correspond to at least two classification levels; the image features input to the image classifiers corresponding to adjacent classification levels are subject to a similarity constraint relationship that reduces the similarity between them;
and the result acquisition module is used for obtaining a hierarchical classification result of the image to be classified according to the classification results of the image to be classified output by the image classifiers at their corresponding classification levels.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring an image to be classified; correspondingly inputting at least two image features of the image to be classified into at least two image classifiers; the at least two image classifiers respectively correspond to at least two classification levels; the image features input to the image classifiers corresponding to adjacent classification levels are subject to a similarity constraint relationship that reduces the similarity between them; and obtaining a hierarchical classification result of the image to be classified according to the classification results of the image to be classified output by the image classifiers at their corresponding classification levels.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring an image to be classified; correspondingly inputting at least two image features of the image to be classified into at least two image classifiers; the at least two image classifiers respectively correspond to at least two classification levels; the image features input to the image classifiers corresponding to adjacent classification levels are subject to a similarity constraint relationship that reduces the similarity between them; and obtaining a hierarchical classification result of the image to be classified according to the classification results of the image to be classified output by the image classifiers at their corresponding classification levels.
With the image classification method, apparatus, computer device, and storage medium, an image to be classified is acquired, and at least two image features of the image are correspondingly input into at least two image classifiers, which respectively correspond to at least two classification levels; the image features input to the image classifiers of adjacent classification levels are subject to a similarity constraint relationship that reduces the similarity between them. The hierarchical classification result of the image to be classified can then be obtained according to the classification results output by the image classifiers at their corresponding classification levels. By applying the similarity constraint relationship between the image features, the similarity between the image features input to the image classifiers of adjacent classification levels is reduced as far as possible, so that image classifiers at different classification levels attend to different image features of the same image to be classified and classify the image at their respective classification levels according to the corresponding image features. This improves the accuracy of hierarchical image classification and allows the classification tasks of the image at multiple classification levels to be completed simultaneously.
Drawings
FIG. 1 is a diagram of an application environment of an image classification method in one embodiment;
FIG. 2 is a diagram illustrating an image classification task in one embodiment;
FIG. 3 is a flow diagram illustrating a method for image classification in one embodiment;
FIG. 4 is a flowchart illustrating steps of constructing an image classifier in one embodiment;
FIG. 5 is a schematic diagram of image classification in one embodiment;
FIG. 6 is a schematic flow chart illustrating the steps for obtaining features of a sample image in one embodiment;
FIG. 7 is a schematic diagram of an interface for displaying image information in one embodiment;
FIG. 8 is a schematic diagram of image classification in an example application;
FIG. 9 is a block diagram showing the structure of an image classification apparatus according to an embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The image classification method provided by the present application can be applied to the application environment shown in fig. 1, where fig. 1 is an application environment diagram of the image classification method in one embodiment. Wherein the terminal 110 may communicate with the server 120 through a network. The terminal 110 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 120 may be implemented by an independent server or a server cluster formed by a plurality of servers.
The application provides an image classification method and relates to the technical field of artificial intelligence. Artificial Intelligence (AI) comprises the theories, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
Artificial intelligence technology includes Computer Vision (CV) technology, in which cameras, computers, and other terminal devices replace human eyes to perform machine vision tasks such as identification, tracking, and measurement of a target, and further perform image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. Computer vision techniques may include image recognition and image classification, such as recognizing whether an image is an image of a car, a cat, or a dog.
By combining machine learning with computer vision, the terminal device can intelligently classify images to be classified according to the learned image classification knowledge. Furthermore, the terminal device can be used to classify the images to be classified hierarchically.
Fig. 2 is a schematic diagram of an image classification task in an embodiment and illustrates the difference between general classification and hierarchical classification. In a general classification task, the categories of images have equal status and the image classifier does not distinguish levels among them; an image is directly recognized as a cat image, dog image, bicycle image, or automobile image. In reality, however, the relationships between categories differ: among the four categories of cats, dogs, automobiles, and bicycles, cats and dogs both belong to the animal class and are relatively close to each other, while the animal class and the vehicle class to which automobiles and bicycles belong are relatively far apart. A hierarchical classification task can therefore first judge whether an image belongs to the animal class or a non-animal class such as the vehicle class, and then further identify a cat, dog, bicycle, or automobile within those classes. The animal class and the vehicle class belong to the same classification level, which may be called the coarse classification level; cats and dogs under the animal class and bicycles and automobiles under the vehicle class belong to another classification level, which may be called the sub-class classification level. The image classification method provided by the application can obtain the classification result of the image to be classified at each classification level and thereby obtain its hierarchical classification result, for example, that the image belongs to the animal class and is a cat image.
The image classification method provided by the application can be applied to various content-auditing and content-understanding tasks that involve images; for video services, for example, a video frame-extraction strategy can be adopted to audit and understand the video content.
Specifically, the image classification method provided by the present application may be executed by the terminal 110 or the server 120 alone, or may be executed by the terminal 110 and the server 120 in cooperation.
First, taking execution by the terminal 110 alone as an example: the terminal 110 may obtain an image to be classified and correspondingly input at least two image features of the image into at least two image classifiers. The at least two image classifiers may be pre-configured on the terminal 110 and respectively correspond to at least two classification levels, and the image features input to the image classifiers of adjacent classification levels are subject to a similarity constraint relationship that reduces the similarity between them. Finally, the terminal 110 may obtain the hierarchical classification result of the image to be classified according to the classification results output by the image classifiers at their corresponding classification levels.
The image classification method provided by the application can also be executed by the terminal 110 and the server 120 in cooperation. Specifically, the terminal 110 can acquire an image to be classified and send it to the server 120, which correspondingly inputs at least two image features of the image into at least two image classifiers. The at least two image classifiers may be pre-configured on the server 120 and respectively correspond to at least two classification levels, and the image features input to the image classifiers of adjacent classification levels are subject to a similarity constraint relationship that reduces the similarity between them. The server 120 may then send the classification results output by each image classifier at its corresponding classification level to the terminal 110, which obtains the hierarchical classification result of the image to be classified from those results.
In an embodiment, as shown in fig. 3, fig. 3 is a flowchart illustrating an image classification method in an embodiment, and an image classification method is provided, which is described by taking the method as an example applied to the terminal 110 in fig. 1, and includes the following steps:
step S301, acquiring an image to be classified;
in this step, the terminal 110 may obtain an image to be classified. The image to be classified may be an image captured by the terminal 110 through an image capturing device such as a camera, or may be an image pre-stored in an electronic gallery of the terminal 110, and the image to be classified may include objects such as animals and plants. Specifically, the terminal 110 may capture an image of the cat in real time through a camera configured therein, and the image of the cat may be used as an image to be classified.
Step S302, correspondingly inputting at least two image characteristics of the image to be classified into at least two image classifiers;
In this step, the terminal 110 may correspondingly input at least two image features of the image to be classified into at least two image classifiers. In a hierarchical classification task, the terminal 110 may be configured with two image classifiers: for example, one classifies between animals and vehicles at its level, and the other classifies, at its level, between cats and dogs under the animal class and between bicycles and cars under the vehicle class.
Furthermore, the image features input to the image classifiers corresponding to adjacent classification levels are subject to a similarity constraint relationship, which is used to reduce the similarity between those image features. Take a three-level classification task as an example: the first classification level is plant or animal; the second classification level, taking animals as an example, is mammal or reptile; the third classification level, taking reptiles as an example, is lizard or snake. In this case, two similarity constraint relationships are added: the first is applied between the image features input to the image classifiers corresponding to the first and second classification levels, and the second between the image features input to the image classifiers corresponding to the second and third classification levels. Image features are generally expressed as vectors, so the similarity constraint relationship can be applied by reducing the similarity between the vectors, thereby reducing the similarity between the image features input to the image classifiers of adjacent classification levels.
Illustratively, the similarity constraint relationship may include a mutual-information constraint or an orthogonality constraint. Under the mutual-information constraint, the similarity between two image features can be reduced by obtaining the mutual information between the image features input to the image classifiers of adjacent classification levels and minimizing it. Similarly, under the orthogonality constraint, the similarity between two image features can be reduced by making the image features input to the image classifiers of adjacent classification levels orthogonal to each other.
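As an illustration of the orthogonality constraint described above, the following sketch computes a penalty that vanishes when the feature vectors of adjacent classification levels are orthogonal. The concrete loss form and the function name are assumptions for illustration; the patent does not prescribe a specific formula.

```python
import numpy as np

def orthogonality_penalty(feat_a, feat_b):
    """Squared cosine similarity between two feature vectors.

    The penalty is 0 when the vectors are orthogonal and 1 when they are
    parallel, so minimizing it pushes the image features of adjacent
    classification levels apart (illustrative loss form only).
    """
    dot = float(np.dot(feat_a, feat_b))
    denom = float(np.linalg.norm(feat_a) * np.linalg.norm(feat_b))
    return (dot / denom) ** 2

# Two toy image features for adjacent classification levels.
a = np.array([1.0, 0.0, 1.0, 0.0])
b = np.array([0.0, 1.0, 0.0, 1.0])
print(orthogonality_penalty(a, b))  # orthogonal pair -> 0.0
```

During training, such a penalty would be added to the classification losses so that the feature extractor learns to decouple the two features.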
In this way, the terminal 110 can decouple the two image features input to the image classifiers of adjacent classification levels, eliminating the similar parts between them so that the two image features correspond to different characteristics of the image. Based on the decoupled input features, each image classifier can focus on a different type of feature of the image and, using the image features best suited to its classification level, classify the image to be classified more accurately at that level, completing the classification tasks of multiple classification levels simultaneously.
Step S303, obtaining a hierarchical classification result of the image to be classified according to the classification results of the image to be classified output by the image classifiers at their corresponding classification levels.
In this step, the terminal 110 may obtain the classification result output by each image classifier, which includes the classification result of the image to be classified at each classification level. For example, with three image classifiers, the terminal 110 may obtain the three classification results, corresponding to the three classification levels, output by the three classifiers respectively. The terminal 110 may take the three classification results together as the hierarchical classification result of the image to be classified, or may select the classification result of one classification level from them as the hierarchical classification result it requires. In this way, the terminal 110 can simultaneously complete the classification tasks of the image to be classified at multiple classification levels.
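The step of combining the per-level classifier outputs into a hierarchical classification result can be sketched as follows. This is a minimal illustration; the class names and the `hierarchical_result` helper are assumptions, not part of the patent.

```python
def hierarchical_result(level_outputs):
    """Take the most probable class at each classification level.

    level_outputs: one dict per classification level, mapping class
    name -> probability (class names here are illustrative).
    """
    return [max(probs, key=probs.get) for probs in level_outputs]

# Outputs of two image classifiers on adjacent classification levels.
coarse = {"animal": 0.9, "vehicle": 0.1}
fine = {"cat": 0.7, "dog": 0.2, "bicycle": 0.05, "car": 0.05}
print(hierarchical_result([coarse, fine]))  # ['animal', 'cat']
```

The returned list corresponds to the classification result at each level; keeping the whole list, or selecting one entry, matches the two options described above.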
In the image classification method, the terminal 110 obtains an image to be classified and correspondingly inputs at least two image features of the image into at least two image classifiers, which respectively correspond to at least two classification levels; the image features input to the image classifiers of adjacent classification levels are subject to a similarity constraint relationship that reduces the similarity between them. The terminal 110 may then obtain a hierarchical classification result of the image to be classified according to the classification results output by the image classifiers at their corresponding classification levels. By applying the similarity constraint relationship between the image features, the similarity between the image features input to the image classifiers of adjacent classification levels is reduced as far as possible, so that image classifiers at different classification levels attend to different image features of the same image and classify it at their respective levels according to the corresponding features. This improves the accuracy of hierarchical image classification and allows the classification tasks of the image at multiple classification levels to be completed simultaneously.
In one embodiment, the inputting of the at least two image features of the image to be classified into the at least two image classifiers in step S302 may include:
at least two image features are obtained through a pre-constructed feature extractor and are correspondingly input to at least two image classifiers.
In this embodiment, the terminal 110 may obtain at least two image features from the image to be classified using a pre-constructed feature extractor and input them into the at least two image classifiers. The feature extractor and the at least two image classifiers are constructed based on the similarity constraint relationship, and the feature extractor may be implemented based on a neural network model. The terminal 110 may input the image to be classified into the neural-network-based feature extractor, divide the image features output by its last convolutional layer into the at least two image features, and correspondingly input them into the at least two image classifiers.
In the scheme of this embodiment, the terminal 110 may perform feature extraction on the image to be classified through a feature extractor that was trained in advance together with the at least two image classifiers under the similarity constraint relationship, thereby obtaining multiple image features that each image classifier can use for classification. There is no need to recalculate the similarity between image features each time an image is classified, which improves image classification efficiency.
In one embodiment, further, the feature extractor may further include a feature extraction network and an encoder; the step of obtaining at least two image features through the pre-constructed feature extractor may specifically include:
inputting the image to be classified into the feature extraction network to obtain the initial image features output by the network; inputting the initial image features into the encoder to obtain the encoded initial image features output by the encoder; and acquiring at least two image features based on the encoded initial image features.
In this embodiment, the feature extractor may further include a feature extraction network and an encoder. Wherein the feature extraction network and the encoder may be implemented based on a neural network model, such as a ResNet residual network model.
The feature extraction network preliminarily acquires the image features of the image to be classified; these features are often high-dimensional and contain redundant information. Therefore, the terminal 110 may first input the image to be classified into the feature extraction network to obtain the initial image features it outputs, and then input the initial image features into the encoder, which maps them to encoded features of lower dimensionality and removes redundant information from the initial image features. The terminal 110 then obtains at least two image features from the encoded initial image features output by the encoder. With this scheme, when classifying images the terminal 110 can directly use the trained feature extraction network to obtain the initial image features, reduce their dimensionality with the encoder, and divide the encoded initial image features into at least two image features, where the image features input to the image classifiers of adjacent classification levels are subject to the constraint relationship; this improves the efficiency and accuracy of image classification.
In one embodiment, as shown in fig. 4, fig. 4 is a flowchart illustrating steps of constructing an image classifier in an embodiment, before at least two image features are obtained by a pre-constructed feature extractor, the feature extractor and the image classifier may be constructed by the following steps:
step S401, obtaining a sample image, and obtaining classification labels of the sample image on at least two classification levels as real classification labels of at least two image classifiers;
in this step, the terminal 110 may obtain a sample image, and the number of the sample images is generally multiple. The terminal 110 further needs to obtain a classification label corresponding to the sample image, where the classification label includes a classification label of the sample image at each classification level. As described with reference to fig. 2, the terminal 110 may obtain a cat image as a sample image, and the terminal 110 further needs to obtain classification labels at two classification levels corresponding to the cat image, that is, "animal" and "cat". Further, the terminal 110 may use the classification labels of the sample image on at least two classification levels as real classification labels of at least two image classifiers, for training the image classifiers and the feature extractor.
Step S402, inputting the sample image into the feature extractor, and acquiring at least two sample image features with the same dimension according to the image features of the sample image output by the feature extractor.
Referring to fig. 5, a schematic diagram of image classification in an embodiment, the terminal 110 inputs a sample image into the feature extractor, which outputs the image features of the sample image; the terminal 110 then divides these into at least two sample image features of the same dimension. Specifically, the terminal 110 may divide the sample image feature into an image feature A and an image feature B of the same dimension: assuming the sample image feature has dimension 2d, the terminal 110 splits it into an image feature A and an image feature B of dimension d each. For example, if the image feature of the sample image output by the feature extractor is a 2048-dimensional vector, the split may assign the first half (the 0th to 1023rd dimensions) to image feature A and the second half (the 1024th to 2047th dimensions) to image feature B.
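The split of a 2048-dimensional feature vector into two 1024-dimensional halves described above can be sketched as follows; the function name and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def split_features(features):
    """Split a 2d-dimensional feature vector into two d-dimensional halves."""
    d = features.shape[0] // 2
    return features[:d], features[d:]

# e.g. a 2048-dimensional feature vector from the extractor's last layer
features = np.arange(2048, dtype=np.float32)
feat_a, feat_b = split_features(features)
print(feat_a.shape, feat_b.shape)  # (1024,) (1024,)
```

Each half is then routed to its own image classifier, as in fig. 5.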
In one embodiment, the inputting the sample image to the feature extractor in step S402 may specifically include: preprocessing the sample image to obtain a sample image with the image size being a preset image size; the sample image of the preset image size is input to a feature extractor.
Specifically, the image size used for model training of the image classifier, the feature extractor and the like generally needs to be fixed. Therefore, in this embodiment, a sample image of any image size may first be scaled to an image size of 256 × 256, and a 224 × 224 region is then randomly cropped from it to serve as the sample image of the preset image size for training.
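The scale-then-random-crop preprocessing can be sketched as below; actual resizing to 256 × 256 would use an image library, so here the 256 × 256 input is simulated with a random array, and all names are illustrative:

```python
import numpy as np

# Assume the sample image has already been scaled to 256 x 256 (an image
# library such as PIL would do the resizing); then randomly crop 224 x 224.
rng = np.random.default_rng(0)
image_256 = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)

top = int(rng.integers(0, 256 - 224 + 1))    # random crop origin, rows
left = int(rng.integers(0, 256 - 224 + 1))   # random crop origin, columns
crop = image_256[top:top + 224, left:left + 224]   # sample image of preset size
```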
Step S403, respectively inputting at least two sample image characteristics to at least two image classifiers, and obtaining predicted classification labels of sample images output by the at least two image classifiers on corresponding classification levels;
In this step, referring to fig. 5, the terminal 110 may input the image feature A to the image classifier A and the image feature B to the image classifier B. The image classifier A and the image classifier B are used for classifying the image to be classified at different classification levels. During model construction, the image classifier A predicts a classification result from the input image feature A to obtain a predicted classification label A; similarly, the image classifier B obtains a predicted classification label B from the input image feature B. A predicted classification label may be a probability value of belonging to each category at the corresponding classification level. Specifically, as described with reference to fig. 2, the predicted classification label may be the probability value of the sample image belonging to "animal" or "vehicle" at the animal-or-vehicle classification level.
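A classifier head that outputs such per-category probability values is commonly a softmax over logits; the small sketch below, with hypothetical logits for the two-way "animal or vehicle" level, illustrates the idea (the softmax head is an assumption, not stated in the source):

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax: probabilities over the categories at one level
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits from image classifier A over {"animal", "vehicle"}
logits_a = np.array([2.0, -1.0])
probs_a = softmax(logits_a)   # predicted classification label A (probabilities)
```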
S404, constructing a similarity constraint relation among sample image features input to the image classifiers corresponding to the adjacent classification levels;
in this step, the terminal 110 constructs a similarity constraint relationship between sample image features input to the image classifiers corresponding to the adjacent classification levels.
Step S405, training the feature extractor and the at least two image classifiers based on the real classification label, the prediction classification label and the similarity constraint relation, and constructing the feature extractor and the at least two image classifiers.
According to the technical scheme of this embodiment, the terminal 110 may jointly train the feature extractor and the at least two image classifiers based on the real classification labels, the predicted classification labels and the similarity constraint relation of the sample image, thereby constructing the feature extractor and the at least two image classifiers. The trained feature extractor can then obtain, from an image to be classified, at least two image features having the similarity constraint relation, and input the at least two image features into the at least two image classifiers for classification, realizing rapid and accurate classification of the image to be classified.
In one embodiment, as shown in fig. 6, fig. 6 is a flow chart illustrating the steps of obtaining the features of the sample image in one embodiment, and the feature extractor may include a feature extraction network and an encoder; the step of inputting the sample image into the feature extractor and obtaining at least two sample image features with the same dimension according to the image features of the sample image output by the feature extractor in step S402 may include:
step S601, inputting a sample image into a feature extraction network to obtain initial sample image features output by the feature extraction network;
step S602, inputting the initial sample image characteristics to an encoder to obtain the encoded initial sample image characteristics output by the encoder;
step S603, splitting the initial sample image features into at least two sample image features with the same dimension.
In this embodiment, the terminal 110 may obtain the at least two sample image features based on the feature extraction network and the encoder included in the feature extractor. Referring to fig. 5, fig. 5 is a schematic diagram of image classification in an embodiment. The terminal 110 may input a sample image to the feature extraction network, which preliminarily obtains image features from the sample image, and the terminal 110 obtains the initial sample image features output by the feature extraction network. As described in the above embodiment, the image features obtained by the feature extraction network often contain redundant information and have relatively high feature dimensions, so the terminal 110 further inputs the initial sample image features to the encoder. The encoder maps the image features to encoded features, reducing the feature dimension of the initial sample image features acquired by the feature extraction network and removing redundant information from them. Finally, the terminal 110 splits the encoded initial sample image features output by the encoder into at least two sample image features with the same dimension. With the scheme of this embodiment, the feature extraction network, the encoder and the at least two image classifiers can be trained together based on the similarity constraint relation, so that after training they can be used as a whole, as an image classification tool, to rapidly and accurately classify images to be classified.
In one embodiment, the training of the feature extractor and the at least two image classifiers based on the real classification label, the prediction classification label and the similarity constraint relationship in step S405 may include:
according to the real classification label and the predicted classification label, constructing a first loss function corresponding to each classification level to obtain at least two first loss functions; constructing a second loss function according to the similarity constraint relation; and training the feature extractor and the at least two image classifiers based on the at least two first loss functions and the second loss function, so that the at least two first loss functions are minimized while the second loss function, through gradient reversal, is maximized.
The present embodiment provides a specific way to train the feature extractor and the at least two image classifiers. Specifically, the terminal 110 may construct the first loss functions based on the real classification labels and the predicted classification labels; there are several first loss functions, each corresponding to a different classification level — for example, with three classification levels there are three first loss functions. In addition, the terminal 110 constructs the second loss function according to the similarity constraint relation, that is, the second loss function is constructed from the similarity constraint relation between the image features input to the image classifiers corresponding to adjacent classification levels. Accordingly, if there are two classification levels, one second loss function is constructed, and if there are three classification levels, two second loss functions are constructed. The terminal 110 then trains the feature extractor and the at least two image classifiers using the at least two first loss functions and the second loss function; minimizing the combined loss drives the first loss functions down, while the gradient reversal applied inside the second loss function (described below) effectively maximizes it, reducing the similarity between the image features.
Specifically, take an image classification task with two classification levels as an example, corresponding to large-category classification and sub-category classification, where the large-category label is denoted $y^{super}$ and the sub-category label $y^{sub}$. The first loss functions corresponding to the two classification levels are the cross-entropy losses:

$$L_{super} = -\sum_{i=1}^{C_{super}} y_i^{super} \log p_i^{super}$$

$$L_{sub} = -\sum_{i=1}^{C_{sub}} y_i^{sub} \log p_i^{sub}$$

wherein $L_{super}$ denotes the first loss function of the large category, $L_{sub}$ the first loss function of the sub-category, $C_{super}$ the total number of large (super) categories, $C_{sub}$ the total number of sub-categories, $y_i^{super}$ the true classification label of the large category, $y_i^{sub}$ the true classification label of the sub-category, $p_i^{super}$ the predicted classification label of the large category, i.e. the probability value that the image to be classified belongs to category $i$ among the large categories, and $p_i^{sub}$ the predicted classification label of the sub-category, i.e. the probability value that the image to be classified belongs to category $i$ among the sub-categories.
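The two cross-entropy first loss functions can be evaluated numerically as below, with hypothetical one-hot true labels and predicted probabilities for a single sample:

```python
import numpy as np

# Hypothetical labels/predictions for one sample at two classification levels.
y_super = np.array([1.0, 0.0])       # true large category: "animal" of {animal, vehicle}
p_super = np.array([0.9, 0.1])       # predicted probabilities at the large-category level
y_sub = np.array([0.0, 1.0, 0.0])    # true sub-category: "cat" of 3 sub-categories
p_sub = np.array([0.1, 0.8, 0.1])    # predicted probabilities at the sub-category level

L_super = -np.sum(y_super * np.log(p_super))   # cross-entropy, large category
L_sub = -np.sum(y_sub * np.log(p_sub))         # cross-entropy, sub-category
```

With a one-hot true label, each loss reduces to the negative log-probability assigned to the true category, so confident correct predictions give a loss near zero.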
In addition, denote the image features input to the image classifiers of the large category and the sub-category as $E_\alpha(x)$ and $E_\beta(x)$, respectively. The similarity constraint relation applied to the two image features may be a mutual information constraint relation or an orthogonal constraint relation. Taking the mutual information constraint as an example, the corresponding second loss function is:

$$L_{mul} = -\left( r\big(E_\alpha(x)\big)^{\top} r\big(E_\beta(x)\big) \right)^{2}$$

wherein $L_{mul}$ denotes the second loss function and $r$ denotes a gradient reversal layer (gradient reverse layer), whose role is to multiply the gradient by $-1$, i.e. to "reverse the gradient", when the network back-propagates gradients. $E_\alpha(x)$ and $E_\beta(x)$ are normalized using L2 normalization (i.e. L2 Normalization) to limit the range of values of the features. In this way, the second loss function $L_{mul}$ takes its minimum value $-1$ only when the two image features are identical, i.e. $E_\alpha(x) = E_\beta(x)$, and takes its maximum value $0$ when the two image features are perfectly orthogonal and thus maximally different. Since a gradient reversal layer is used, minimizing the mutual information loss after gradient reversal is equivalent to maximizing the second loss function, i.e. the similarity between the two image features is reduced as much as possible.
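The stated extremes of the second loss (−1 for identical features, 0 for orthogonal features) can be checked numerically; the squared inner product of L2-normalized features used below is an assumed concrete form chosen to match those extremes, not a definitive statement of the patented loss:

```python
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v)

def mutual_info_loss(e_a, e_b):
    # -(E_alpha . E_beta)^2 on L2-normalized features:
    # -1 when the features are identical, 0 when they are orthogonal
    a, b = l2_normalize(e_a), l2_normalize(e_b)
    return -(float(np.dot(a, b)) ** 2)

identical = mutual_info_loss(np.array([1.0, 2.0]), np.array([1.0, 2.0]))
orthogonal = mutual_info_loss(np.array([1.0, 0.0]), np.array([0.0, 5.0]))
```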
Finally, the two first loss functions, together with the one second loss function, cooperatively train the whole network using the combined loss $L = L_{super} + L_{sub} + L_{mul}$, which constitutes the training of the feature extractor and the at least two image classifiers; when the feature extractor includes the feature extraction network and the encoder, the feature extraction network and the encoder are trained together with the at least two image classifiers.
In one embodiment, the inputting of the at least two image features of the image to be classified into the at least two image classifiers in step S302 may include:
sending the image to be classified to a server so that the server correspondingly inputs at least two image characteristics of the image to be classified to at least two image classifiers to obtain classification results of the image to be classified output by the image classifiers on corresponding classification levels; and receiving the classification result obtained by the server.
Referring to fig. 1, in this embodiment the image classification process is mainly handled by the server 120. Specifically, the terminal 110 may obtain an image to be classified and then send it to the server 120, where the at least two image classifiers may be configured in advance. After receiving the image to be classified, the server 120 correspondingly inputs at least two image features of the image to be classified to the at least two image classifiers to obtain the classification results of the image to be classified at the corresponding classification levels output by the image classifiers, and then sends the classification results to the terminal 110, which receives them.
By adopting the technical solution of the embodiment, the terminal 110 may transfer the task of image classification processing to the server 120 for processing, so as to reduce the data processing pressure of the terminal 110.
In an embodiment, as shown in fig. 7, fig. 7 is an interface schematic diagram showing image information in an embodiment, and after the hierarchical classification result of the image to be classified is obtained according to the classification result of the image to be classified on the corresponding classification level output by the image classifier in step S303, the method may further include the following steps:
acquiring image classification information carrying the hierarchical classification result; and displaying the image classification information on the image to be classified.
In this embodiment, the terminal 110 can directly display the classification result of the image to be classified on the image itself. Referring to fig. 7, the terminal 110 may display an image 700 to be classified, and after obtaining the hierarchical classification result of the image 700, display the image classification information carrying the hierarchical classification result in an information display area 710. The hierarchical classification result of the image 700 may include a large-category classification result A1 and a sub-category classification result B2 of the image to be classified. Specifically, assuming the image 700 to be classified is a cat image, the image classification information displayed by the terminal 110 may include the large-category classification result "animal" and the sub-category classification result "cat". With this technical scheme, the hierarchical classification result can be displayed superimposed on the image to be classified, improving the display efficiency of the hierarchical classification result.
In order to clarify the technical solutions provided by the present application more clearly, the principle of image classification is described in detail with reference to fig. 8, and fig. 8 is a schematic diagram of the principle of image classification in an application example.
In general, the input image x may be of any size, while training of the model (including the feature extraction network, the encoder, and the large-category and sub-category classifiers) generally requires a fixed image size. Therefore, an image of any image size can first be scaled to an image size of 256 × 256, and a 224 × 224 region is then randomly cropped from it as the image to be processed. A feature extraction network is then used to extract the image features f(x). Next, an encoder maps the image features to encoding features E(x), which can be viewed as the image features prior to decoupling. The encoding features are split into two parts: the first partial features E_α(x) are used for training the large-category classifier, and the second partial features E_β(x) are used for training the sub-category classifier; meanwhile, a mutual information constraint is applied between the two decoupled features, so that the similarity between them is reduced. It should be noted that the two partial features extracted from the image to be processed and input to the large-category classifier and the sub-category classifier need not have any correlated characteristics; that is, only the image to be processed needs to be provided to the large-category classifier and the sub-category classifier, and no special processing of the image features needs to be performed in advance. Training of the models, including the feature extraction network, the encoder, the large-category classifier and the sub-category classifier, can be completed by inputting the image to be processed into the model, and based on the trained model, hierarchical classification of an image can be realized by inputting the image to be classified.
Specifically, assume the input image is x, its large-category label is y^super and its sub-category label is y^sub. The image features are extracted using a feature extraction network, which is not particularly limited; various neural network models may be used, and generally the output of the last convolutional layer of the neural network can be used as the extracted image feature f(x). Next, an encoder is used to map the extracted features f(x) to encoding features E(x) of dimension 2d; the encoder structure may be a single fully connected layer (fully connected layer). Then the 2d-dimensional encoding feature E(x) is split into two parts of the same dimension: E(x) → [E_α(x); E_β(x)], where the features E_α(x) and E_β(x) are both 1d-dimensional features, used for large-category classification and sub-category classification respectively. The corresponding loss functions are:
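Under toy dimensions, the encode-and-split step just described can be sketched as follows; the random weights and all sizes are illustrative stand-ins, not the patented model:

```python
import numpy as np

# Toy sketch: a single fully connected layer maps the extracted feature f(x)
# to a 2d-dimensional encoding E(x), which is split into E_alpha(x) and E_beta(x).
rng = np.random.default_rng(0)
d = 4                                  # toy half-dimension "1d"; real models might use 1024
f_x = rng.standard_normal(16)          # extracted image feature f(x) (toy size)
W = rng.standard_normal((2 * d, 16))   # encoder weights: one fully connected layer
b = np.zeros(2 * d)                    # encoder bias

E_x = W @ f_x + b                      # encoding feature E(x), dimension 2d
E_alpha, E_beta = E_x[:d], E_x[d:]     # E(x) -> [E_alpha(x); E_beta(x)]
```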
$$L_{super} = -\sum_{i=1}^{C_{super}} y_i^{super} \log p_i^{super}$$

$$L_{sub} = -\sum_{i=1}^{C_{sub}} y_i^{sub} \log p_i^{sub}$$

wherein $L_{super}$ denotes the loss function of the large category, $L_{sub}$ the loss function of the sub-categories, $C_{super}$ the total number of large (super) categories, $C_{sub}$ the total number of sub-categories, $y_i^{super}$ the true category label of the large category, $y_i^{sub}$ the true category label of the sub-category, $p_i^{super}$ the predicted classification label of the large category, i.e. the probability value that the image belongs to category $i$ among the large categories, and $p_i^{sub}$ the predicted classification label of the sub-category, i.e. the probability value that the image belongs to category $i$ among the sub-categories.
In addition, to ensure that the features $E_\alpha(x)$ and $E_\beta(x)$ learn image features that are as different as possible, a mutual information constraint relation is applied between the two image features:

$$L_{mul} = -\left( r\big(E_\alpha(x)\big)^{\top} r\big(E_\beta(x)\big) \right)^{2}$$

wherein $L_{mul}$ denotes the mutual information loss function and $r$ denotes a gradient reversal layer (gradient reverse layer), whose role is to multiply the gradient by $-1$, i.e. to "reverse the gradient", when the network back-propagates gradients. The decoupled features $E_\alpha(x)$ and $E_\beta(x)$ can be normalized using L2 normalization (i.e. L2 Normalization) to limit the range of values of the features.

In this case, the mutual information loss function takes its minimum value $-1$ only when the two image features are identical, i.e. $E_\alpha(x) = E_\beta(x)$, and the second loss function $L_{mul}$ takes its maximum value $0$ when the two image features are perfectly orthogonal and thus maximally different. Finally, the three loss functions are used to cooperatively train the whole network (including the feature extraction network, the encoder, and the large-category and sub-category classifiers): $L = L_{super} + L_{sub} + L_{mul}$.
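The effect of the gradient reversal layer can be illustrated with the analytic gradient of the assumed loss form L_mul = −(E_α·E_β)²: the forward value is unchanged, while the gradient actually propagated to the encoder is the negated one, so gradient descent pushes the two features toward orthogonality rather than similarity. A numpy sketch with hypothetical, already L2-normalized feature values:

```python
import numpy as np

def grad_l_mul_wrt_a(a, b):
    # analytic gradient of L_mul = -(a . b)^2 with respect to a (b held fixed)
    return -2.0 * np.dot(a, b) * b

a = np.array([0.6, 0.8])   # L2-normalized feature E_alpha(x) (hypothetical values)
b = np.array([0.8, 0.6])   # L2-normalized feature E_beta(x)

g = grad_l_mul_wrt_a(a, b)   # gradient the loss would send backward
g_reversed = -g              # gradient after the reversal layer reaches the encoder
```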
The technical scheme provided by the application example can decouple the image features of the image into two partial image features suitable for large-class classification and sub-class classification, and simultaneously reduce the similarity degree between the two partial image features by using mutual information constraint, so that the two partial image features pay attention to different characteristics in the image as far as possible, and a hierarchical classification task is better completed.
It should be understood that, although the steps in the flowcharts of fig. 3 to 6 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 3 to 6 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different moments; these sub-steps or stages are likewise not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In an embodiment, as shown in fig. 9, fig. 9 is a block diagram of an image classification apparatus in an embodiment, and provides an image classification apparatus, which may adopt a software module or a hardware module, or a combination of the two modules, as a part of a computer device, where the apparatus 900 specifically includes:
an image obtaining module 901, configured to obtain an image to be classified;
a feature input module 902, configured to correspondingly input at least two image features of the image to be classified to at least two image classifiers; the at least two image classifiers correspond respectively to at least two classification levels; there is a similarity constraint relation between the image features input to the image classifiers corresponding to adjacent classification levels, the similarity constraint relation being used for reducing the similarity between the image features;
and the result obtaining module 903 is configured to obtain a hierarchical classification result of the image to be classified according to the classification result of the image to be classified output by the image classifier on the corresponding classification level.
In one embodiment, the feature input module 902 is further configured to obtain at least two image features through a pre-constructed feature extractor, and correspondingly input the at least two image features to at least two image classifiers; the feature extractor and the at least two image classifiers are constructed based on similarity constraint relations.
In one embodiment, the feature extractor includes a feature extraction network and an encoder; the feature input module 902 is further configured to input the image to be classified into the feature extraction network, so as to obtain an initial image feature output by the feature extraction network; inputting the initial image characteristics to an encoder to obtain the encoded initial image characteristics output by the encoder; and acquiring at least two image characteristics based on the coded initial image characteristics.
In one embodiment, the apparatus 900 may further include:
the classifier building module is used for obtaining a sample image and obtaining classification labels of the sample image on at least two classification levels as real classification labels of at least two image classifiers; inputting the sample image into a feature extractor, and acquiring at least two sample image features with the same dimension according to the image features of the sample image output by the feature extractor; respectively inputting at least two sample image characteristics to at least two image classifiers, and acquiring predicted classification labels of sample images output by the at least two image classifiers on corresponding classification levels; constructing similarity constraint relation among sample image features input to image classifiers corresponding to adjacent classification levels; training the feature extractor and the at least two image classifiers based on the real classification label, the prediction classification label and the similarity constraint relation, and constructing the feature extractor and the at least two image classifiers.
In one embodiment, the feature extractor includes a feature extraction network and an encoder; a classifier building module further configured to: inputting the sample image into a feature extraction network to obtain initial sample image features output by the feature extraction network; inputting the initial sample image characteristics to an encoder to obtain the encoded initial sample image characteristics output by the encoder; and splitting the initial sample image features into at least two sample image features with the same dimension.
In one embodiment, the classifier building module is further configured to: construct, according to the real classification label and the predicted classification label, a first loss function corresponding to each classification level to obtain at least two first loss functions; construct a second loss function according to the similarity constraint relation; and train the feature extractor and the at least two image classifiers based on the at least two first loss functions and the second loss function, so that the at least two first loss functions are minimized while the second loss function, through gradient reversal, is maximized.
In one embodiment, the similarity constraint relationship comprises a mutual information constraint relationship or an orthogonal constraint relationship.
In one embodiment, the classifier building module is further configured to: preprocessing the sample image to obtain a sample image with the image size being a preset image size; a sample image of a preset image size is input to the feature extractor.
In one embodiment, the apparatus 900 may further include:
the information display module is used for acquiring image classification information carrying the hierarchical classification result, and displaying the image classification information on the image to be classified.
In an embodiment, the feature input module 902 is further configured to send the image to be classified to a server, so that the server correspondingly inputs at least two image features of the image to be classified to at least two image classifiers, and obtains a classification result of the image to be classified on a corresponding classification level, which is output by the image classifiers; and receiving the classification result obtained by the server.
For the specific definition of the image classification device, reference may be made to the above definition of the image classification method, which is not described herein again. The modules in the image classification device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, the computer device may be a terminal, an internal structure diagram of which may be as shown in fig. 10, and fig. 10 is an internal structure diagram of the computer device in one embodiment. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement an image classification method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (15)

1. A method of image classification, the method comprising:
acquiring an image to be classified;
correspondingly inputting at least two image characteristics of the image to be classified into at least two image classifiers; the at least two image classifiers correspond respectively to at least two classification levels; there is a similarity constraint relation between the image features input to the image classifiers corresponding to adjacent classification levels, the similarity constraint relation being used for reducing the similarity between the image features;
and acquiring the hierarchical classification result of the image to be classified according to the classification result of the image to be classified at the corresponding classification level output by the image classifier.
2. The method according to claim 1, wherein the inputting at least two image features of the image to be classified into at least two image classifiers comprises:
acquiring the at least two image characteristics through a pre-constructed characteristic extractor, and correspondingly inputting the at least two image characteristics to the at least two image classifiers; the feature extractor and the at least two image classifiers are constructed based on the similarity constraint relationship.
3. The method of claim 2, wherein the feature extractor comprises a feature extraction network and an encoder; the obtaining of the at least two image features by a pre-constructed feature extractor includes:
inputting the image to be classified into the feature extraction network to obtain an initial image feature output by the feature extraction network;
inputting the initial image feature to the encoder to obtain an encoded initial image feature output by the encoder;
and acquiring the at least two image features based on the encoded initial image feature.
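The claim-3 pipeline (feature extraction network, then encoder, then deriving the per-level features from the encoded feature) can be sketched as below. The two linear maps are hypothetical stand-ins for real neural networks, and the final split into same-dimension halves follows the splitting described in claim 5:

```python
def linear(weights, feat):
    return [sum(w * f for w, f in zip(row, feat)) for row in weights]

image = [0.5, -0.5, 1.0, 0.0]                       # flattened input image (toy)

# stand-in feature extraction network: 4-dim input -> 6-dim initial feature
W_extract = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
             [1, 1, 0, 0], [0, 0, 1, 1]]
# stand-in encoder: 6-dim initial feature -> 4-dim encoded feature
W_encode = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0],
            [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]

initial_feature = linear(W_extract, image)          # initial image feature
encoded = linear(W_encode, initial_feature)         # encoded initial image feature
half = len(encoded) // 2
feat_level1, feat_level2 = encoded[:half], encoded[half:]  # same dimension each
```

Splitting one encoded vector guarantees the per-level features have equal dimension, which the training procedure of claim 4 relies on.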
4. The method of claim 2, wherein before the obtaining the at least two image features through the pre-constructed feature extractor, the method further comprises:
acquiring a sample image, and acquiring classification labels of the sample image at the at least two classification levels as real classification labels for the at least two image classifiers;
inputting the sample image into the feature extractor, and acquiring at least two sample image features of the same dimension according to the image features of the sample image output by the feature extractor;
respectively inputting the at least two sample image features to the at least two image classifiers, and obtaining predicted classification labels, output by the at least two image classifiers, of the sample image at the corresponding classification levels;
constructing a similarity constraint relationship between the sample image features input to the image classifiers corresponding to adjacent classification levels;
and training the feature extractor and the at least two image classifiers based on the real classification labels, the predicted classification labels, and the similarity constraint relationship, so as to construct the feature extractor and the at least two image classifiers.
5. The method of claim 4, wherein the feature extractor comprises a feature extraction network and an encoder; and the inputting the sample image into the feature extractor and acquiring at least two sample image features of the same dimension according to the image features of the sample image output by the feature extractor comprises:
inputting the sample image into the feature extraction network to obtain an initial sample image feature output by the feature extraction network;
inputting the initial sample image feature to the encoder to obtain an encoded initial sample image feature output by the encoder;
and splitting the encoded initial sample image feature into at least two sample image features of the same dimension.
6. The method of claim 4, wherein training the feature extractor and the at least two image classifiers based on the true classification label, the predicted classification label, and a similarity constraint relationship comprises:
according to the real classification label and the prediction classification label, constructing first loss functions corresponding to the two classification levels to obtain at least two first loss functions;
constructing a second loss function according to the similarity constraint relation;
training the feature extractor and the at least two image classifiers based on the at least two first loss functions and the second loss function such that the at least two first loss functions and the second loss function are maximized.
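Under the orthogonal-constraint reading of claim 7, the claim-6 losses can be sketched as one cross-entropy first loss per classification level plus a second loss that penalizes similarity between the two levels' features; in standard practice the combined loss is minimized jointly during training. All feature values and labels below are hypothetical, and the "classifiers" are reduced to identity maps for brevity:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(logits, label):
    # negative log-likelihood of the true class
    return -math.log(softmax(logits)[label])

feat1 = [1.0, 0.0, 2.0]                   # feature for level 1
feat2 = [0.0, 3.0, 0.5]                   # feature for level 2
label1, label2 = 2, 1                     # real classification labels

loss1 = cross_entropy(feat1, label1)      # first loss, level 1
loss2 = cross_entropy(feat2, label2)      # first loss, level 2
# second loss: squared inner product, zero iff the two features are orthogonal
loss_sim = sum(a * b for a, b in zip(feat1, feat2)) ** 2
total_loss = loss1 + loss2 + loss_sim     # quantity minimized during training
```

Driving `loss_sim` toward zero pushes the per-level features apart, which is the stated purpose of the similarity constraint (reducing similarity between the features).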
7. The method of claim 6, wherein the similarity constraint relationship comprises a mutual information constraint relationship or an orthogonal constraint relationship.
8. The method of claim 4, wherein inputting the sample image to the feature extractor comprises:
preprocessing the sample image to obtain a sample image having a preset image size;
inputting the sample image of the preset image size to the feature extractor.
9. The method according to claim 1, wherein after the acquiring the hierarchical classification result of the image to be classified according to the classification results, output by the image classifiers, of the image to be classified at the corresponding classification levels, the method further comprises:
acquiring image classification information carrying the hierarchical classification result;
and displaying the image classification information on the image to be classified.
10. The method according to claim 1, wherein the correspondingly inputting at least two image features of the image to be classified into at least two image classifiers comprises:
sending the image to be classified to a server, so that the server correspondingly inputs the at least two image features of the image to be classified into the at least two image classifiers and obtains the classification results, output by the image classifiers, of the image to be classified at the corresponding classification levels;
and receiving the classification results obtained by the server.
11. An image classification apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring an image to be classified;
the feature input module is used for correspondingly inputting at least two image features of the image to be classified into at least two image classifiers, the at least two image classifiers respectively corresponding to at least two classification levels; wherein the image features input to the image classifiers corresponding to adjacent classification levels have a similarity constraint relationship, the similarity constraint relationship being used to reduce the similarity between those image features;
and the result acquisition module is used for acquiring a hierarchical classification result of the image to be classified according to the classification results, output by the image classifiers, of the image to be classified at the corresponding classification levels.
12. The apparatus of claim 11, wherein the feature input module is further configured to obtain the at least two image features through a pre-constructed feature extractor, and correspondingly input the at least two image features to the at least two image classifiers; the feature extractor and the at least two image classifiers are constructed based on the similarity constraint relationship.
13. The apparatus of claim 12, wherein the feature extractor comprises a feature extraction network and an encoder; the feature input module is further configured to input the image to be classified into the feature extraction network to obtain an initial image feature output by the feature extraction network; input the initial image feature to the encoder to obtain an encoded initial image feature output by the encoder; and acquire the at least two image features based on the encoded initial image feature.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 10 when executing the computer program.
15. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 10.
CN202010303814.8A 2020-04-17 2020-04-17 Image classification method and device, computer equipment and storage medium Pending CN111507403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010303814.8A CN111507403A (en) 2020-04-17 2020-04-17 Image classification method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010303814.8A CN111507403A (en) 2020-04-17 2020-04-17 Image classification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111507403A (en) 2020-08-07

Family

ID=71864176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010303814.8A Pending CN111507403A (en) 2020-04-17 2020-04-17 Image classification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111507403A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364933A (en) * 2020-11-23 2021-02-12 北京达佳互联信息技术有限公司 Image classification method and device, electronic equipment and storage medium
CN113255766A (en) * 2021-05-25 2021-08-13 平安科技(深圳)有限公司 Image classification method, device, equipment and storage medium
CN113673576A (en) * 2021-07-26 2021-11-19 浙江大华技术股份有限公司 Image detection method, terminal and computer readable storage medium thereof
CN113836338A (en) * 2021-07-21 2021-12-24 北京邮电大学 Fine-grained image classification method and device, storage medium and terminal
JP7082239B1 (en) 2021-06-09 2022-06-07 京セラ株式会社 Recognition device, terminal device, recognizer construction device, recognizer correction device, construction method, and correction method

Citations (29)

Publication number Priority date Publication date Assignee Title
JP2001160057A (en) * 1999-12-03 2001-06-12 Nippon Telegr & Teleph Corp <Ntt> Method for hierarchically classifying image and device for classifying and retrieving picture and recording medium with program for executing the method recorded thereon
US20020122596A1 (en) * 2001-01-02 2002-09-05 Bradshaw David Benedict Hierarchical, probabilistic, localized, semantic image classifier
US20130322740A1 (en) * 2012-05-31 2013-12-05 Lihui Chen Method of Automatically Training a Classifier Hierarchy by Dynamic Grouping the Training Samples
CN104200238A (en) * 2014-09-22 2014-12-10 北京酷云互动科技有限公司 Station caption recognition method and station caption recognition device
KR20160032533A (en) * 2014-09-16 2016-03-24 삼성전자주식회사 Feature extracting method of input image based on example pyramid and apparatus of face recognition
US20170193328A1 (en) * 2015-12-31 2017-07-06 Microsoft Technology Licensing, Llc Structure and training for image classification
CN107067022A (en) * 2017-01-04 2017-08-18 美的集团股份有限公司 The method for building up of image classification model, set up device and equipment
US9928448B1 (en) * 2016-09-23 2018-03-27 International Business Machines Corporation Image classification utilizing semantic relationships in a classification hierarchy
CN108171254A (en) * 2017-11-22 2018-06-15 北京达佳互联信息技术有限公司 Image tag determines method, apparatus and terminal
CN108664924A (en) * 2018-05-10 2018-10-16 东南大学 A kind of multi-tag object identification method based on convolutional neural networks
CN108681695A (en) * 2018-04-26 2018-10-19 北京市商汤科技开发有限公司 Video actions recognition methods and device, electronic equipment and storage medium
CN108875934A (en) * 2018-05-28 2018-11-23 北京旷视科技有限公司 A kind of training method of neural network, device, system and storage medium
CN109002845A (en) * 2018-06-29 2018-12-14 西安交通大学 Fine granularity image classification method based on depth convolutional neural networks
CN109189959A (en) * 2018-09-06 2019-01-11 腾讯科技(深圳)有限公司 A kind of method and device constructing image data base
CN109241880A (en) * 2018-08-22 2019-01-18 北京旷视科技有限公司 Image processing method, image processing apparatus, computer readable storage medium
CN109359566A (en) * 2018-09-29 2019-02-19 河南科技大学 The gesture identification method of hierarchical classification is carried out using finger characteristic
US20190171904A1 (en) * 2017-12-01 2019-06-06 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for training fine-grained image recognition model, fine-grained image recognition method and apparatus, and storage mediums
CN109919177A (en) * 2019-01-23 2019-06-21 西北工业大学 Feature selection approach based on stratification depth network
CN110163127A (en) * 2019-05-07 2019-08-23 国网江西省电力有限公司检修分公司 A kind of video object Activity recognition method from thick to thin
CN110287836A (en) * 2019-06-14 2019-09-27 北京迈格威科技有限公司 Image classification method, device, computer equipment and storage medium
CN110309888A (en) * 2019-07-11 2019-10-08 南京邮电大学 A kind of image classification method and system based on layering multi-task learning
CN110347870A (en) * 2019-06-19 2019-10-18 西安理工大学 The video frequency abstract generation method of view-based access control model conspicuousness detection and hierarchical clustering method
CN110390350A (en) * 2019-06-24 2019-10-29 西北大学 A kind of hierarchical classification method based on Bilinear Structure
CN110647907A (en) * 2019-08-05 2020-01-03 广东工业大学 Multi-label image classification algorithm using multi-layer classification and dictionary learning
CN110659378A (en) * 2019-09-07 2020-01-07 吉林大学 Fine-grained image retrieval method based on contrast similarity loss function
CN110738247A (en) * 2019-09-30 2020-01-31 中国科学院大学 fine-grained image classification method based on selective sparse sampling
CN110796183A (en) * 2019-10-17 2020-02-14 大连理工大学 Weak supervision fine-grained image classification algorithm based on relevance-guided discriminant learning
CN110929624A (en) * 2019-11-18 2020-03-27 西北工业大学 Construction method of multi-task classification network based on orthogonal loss function
CN110929730A (en) * 2019-11-18 2020-03-27 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium

Non-Patent Citations (9)

Title
ANG LI 等: "Adaptive Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition", 2019 IEEE GLOBECOM WORKSHOPS (GC WKSHPS), pages 1 - 5 *
RUYI JI 等: "Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization", ARXIV, pages 1 - 10 *
XINQI ZHU 等: "B-CNN: Branch Convolutional Neural Network for Hierarchical Classification", ARXIV, pages 1 - 9 *
ZIHAO MAO 等: "Multi-branch Structure for Hierarchical Classification in Plant Disease Recognition", PATTERN RECOGNITION AND COMPUTER VISION. PRCV 2019. LECTURE NOTES IN COMPUTER SCIENCE, pages 528 *
LIU Peng et al.: "An Image Classification Method Based on Multi-level Abstract Semantic Decision", Acta Automatica Sinica, vol. 41, no. 05, pages 960 - 969 *
SUN Yanpeng et al.: "An Improved Method for Hierarchical Semantic Image Classification", Computer Applications and Software, vol. 30, no. 09, pages 263 - 265 *
YANG Huixian et al.: Laser & Optoelectronics Progress, vol. 56, no. 18, pages 134 - 142 *
HU Min et al.: "A Hierarchical Expression Classification Method Based on Geometric and Texture Features", Acta Electronica Sinica, vol. 45, no. 01, pages 164 - 172 *
DONG Xiongxiong: "Design and Implementation of a Fine-Grained Vehicle Classification System Based on an Attention-Mechanism Deep Neural Network", China Master's Theses Full-text Database, Engineering Science and Technology II, vol. 2019, no. 8, pages 034 - 197 *

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN112364933A (en) * 2020-11-23 2021-02-12 北京达佳互联信息技术有限公司 Image classification method and device, electronic equipment and storage medium
CN113255766A (en) * 2021-05-25 2021-08-13 平安科技(深圳)有限公司 Image classification method, device, equipment and storage medium
CN113255766B (en) * 2021-05-25 2023-12-22 平安科技(深圳)有限公司 Image classification method, device, equipment and storage medium
JP7082239B1 (en) 2021-06-09 2022-06-07 京セラ株式会社 Recognition device, terminal device, recognizer construction device, recognizer correction device, construction method, and correction method
WO2022260148A1 (en) * 2021-06-09 2022-12-15 京セラ株式会社 Recognizing device, terminal device, recognizer constructing device, recognizer correcting device, construction method, and correction method
JP2022188727A (en) * 2021-06-09 2022-12-21 京セラ株式会社 Recognition device, terminal device, recognizer builder, recognizer corrector, building method and correction method
CN113836338A (en) * 2021-07-21 2021-12-24 北京邮电大学 Fine-grained image classification method and device, storage medium and terminal
CN113836338B (en) * 2021-07-21 2024-05-24 北京邮电大学 Fine granularity image classification method, device, storage medium and terminal
CN113673576A (en) * 2021-07-26 2021-11-19 浙江大华技术股份有限公司 Image detection method, terminal and computer readable storage medium thereof

Similar Documents

Publication Publication Date Title
CN111507403A (en) Image classification method and device, computer equipment and storage medium
Springenberg et al. Improving deep neural networks with probabilistic maxout units
CN111680672B (en) Face living body detection method, system, device, computer equipment and storage medium
CN110827236B (en) Brain tissue layering method, device and computer equipment based on neural network
CN111797326A (en) False news detection method and system fusing multi-scale visual information
CN114170516B (en) Vehicle weight recognition method and device based on roadside perception and electronic equipment
JP2022014776A (en) Activity detection device, activity detection system, and activity detection method
CN114418030A (en) Image classification method, and training method and device of image classification model
CN115512005A (en) Data processing method and device
CN117079299B (en) Data processing method, device, electronic equipment and storage medium
US11062141B2 (en) Methods and apparatuses for future trajectory forecast
Ousmane et al. Automatic recognition system of emotions expressed through the face using machine learning: Application to police interrogation simulation
CN115238888A (en) Training method, using method, device, equipment and medium of image classification model
CN112364828B (en) Face recognition method and financial system
CN112329735B (en) Training method of face recognition model and online education system
CN111178370A (en) Vehicle retrieval method and related device
CN115115910A (en) Training method, using method, device, equipment and medium of image processing model
CN113516182A (en) Visual question-answering model training method and device, and visual question-answering method and device
CN114639132A (en) Feature extraction model processing method, device and equipment in face recognition scene
CN116612466B (en) Content identification method, device, equipment and medium based on artificial intelligence
CN117437425B (en) Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium
CN113569887B (en) Picture recognition model training and picture recognition method, device and storage medium
CN115100419B (en) Target detection method and device, electronic equipment and storage medium
CN113255408B (en) Behavior recognition method, behavior recognition device, electronic equipment and storage medium
Andreș et al. Automatic License Plate Recognition and Real-Time Car Vignette Notifications

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40029152

Country of ref document: HK

SE01 Entry into force of request for substantive examination