CN111783791A - Image classification method, device and computer-readable storage medium - Google Patents

Info

Publication number
CN111783791A
Authority
CN
China
Prior art keywords
image
features
classified
feature extraction
extraction network
Prior art date
Legal status
Pending
Application number
CN202010075143.4A
Other languages
Chinese (zh)
Inventor
刘晓
余京蕾
王林芳
梅涛
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2020-01-22
Filing date: 2020-01-22
Publication date: 2020-10-16
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010075143.4A
Publication of CN111783791A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image classification method, an image classification apparatus, and a computer-readable storage medium, and relates to the technical field of image processing. The image classification method comprises the following steps: extracting a target object from an image to be classified; extracting features of the image to be classified and features of the target object, and fusing the extracted features; and inputting the fused features into a classifier to obtain a classification result of the image to be classified. Because the method extracts and fuses features of both the image to be classified and the target object, classification is performed on features that carry visual semantic information and visual saliency information at the same time, which improves the accuracy of image classification.

Description

Image classification method, device and computer-readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image classification method and apparatus, and a computer-readable storage medium.
Background
The advent of deep learning has provided powerful technical support for image classification, and classification accuracy has improved continuously as a result. The related art performs classification based on the original pixel image.
Disclosure of Invention
Through analysis, the inventors found that classification based on the original pixel image performs feature extraction and prediction using only the overall information of the image. As a result, the image classification results of the related art are not sufficiently accurate.
The embodiments of the invention aim to solve the following technical problem: how to improve the accuracy of image classification.
According to a first aspect of some embodiments of the present invention, there is provided an image classification method, comprising: extracting a target object from an image to be classified; extracting the features of the image to be classified and the features of the target object, and fusing the extracted features; and inputting the fused features into a classifier to obtain a classification result of the image to be classified.
In some embodiments, extracting the features of the image to be classified and the features of the target object and fusing the extracted features includes: extracting the features of the target object with a first feature extraction network, wherein the input features of each layer of the first feature extraction network are the output features of the previous layer in the first feature extraction network; and extracting the features of the image to be classified with a second feature extraction network, wherein the input features of each layer of the second feature extraction network comprise a fusion of the output features of the previous layer in the second feature extraction network and the output features of the corresponding layer in the first feature extraction network.
In some embodiments, extracting the features of the image to be classified and the features of the target object and fusing the extracted features further includes: concatenating the output features of the first feature extraction network with the output features of the second feature extraction network, or computing their weighted sum, and inputting the concatenation result or the weighted-sum result into the classifier.
In some embodiments, the input features of each layer of the second feature extraction network comprise the sum of the output features of the previous layer in the second feature extraction network and the output features of the corresponding layer in the first feature extraction network.
In some embodiments, extracting the target object from the image to be classified comprises acquiring a saliency image of the image to be classified.
In some embodiments, acquiring the target object of the image to be classified includes: calculating the color distance and the spatial distance between each image block and the other blocks in the image to be classified; determining the degree of difference between two different image blocks in the image to be classified according to the color distance and the spatial distance; determining salient blocks in the image to be classified according to the degree of difference; and constructing a saliency image of the image to be classified from the salient blocks.
In some embodiments, the image to be classified has a plurality of channels, and the saliency image has one channel.
In some embodiments, the image classification method further comprises: acquiring a target object of a training image, wherein the training image has a labeled classification result; and training an image classification model using the training image and the target object of the training image, wherein the image classification model comprises a feature extraction network and a classifier, and the feature extraction network is used to extract features of the training image and features of the target object of the training image and to fuse the extracted features.
According to a second aspect of some embodiments of the present invention, there is provided an image classification apparatus comprising: a target object extraction module configured to extract a target object from an image to be classified; a feature fusion module configured to extract features of the image to be classified and features of the target object and fuse the extracted features; and a classification module configured to input the fused features into a classifier to obtain a classification result of the image to be classified.
According to a third aspect of some embodiments of the present invention, there is provided an image classification apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform any of the foregoing image classification methods based on instructions stored in the memory.
According to a fourth aspect of some embodiments of the present invention, there is provided a computer readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements any one of the image classification methods described above.
Some embodiments of the invention described above have the following advantage or benefit: the features of the image to be classified and of the target object are extracted and fused, so that classification is performed on features carrying both visual semantic information and visual saliency information, which improves the accuracy of image classification.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 illustrates a flow diagram of an image classification method according to some embodiments of the invention.
Fig. 2 exemplarily shows a schematic structural diagram of the first feature extraction network and the second feature extraction network.
FIG. 3 illustrates a flow diagram of an image classification model training method according to some embodiments of the invention.
Fig. 4 exemplarily shows a structure of the image classification model.
FIG. 5 illustrates a schematic structural diagram of an image classification apparatus according to some embodiments of the invention.
Fig. 6 is a schematic structural diagram illustrating an image classification apparatus according to further embodiments of the present invention.
Fig. 7 shows a schematic structural diagram of an image classification apparatus according to further embodiments of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that, for convenience of description, the portions shown in the drawings are not drawn to scale.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Upon further analysis, the inventors noted a finding proposed by scholars of physiology: the cerebral cortex contains two visual perception pathways. The so-called "What" pathway is related to visual semantics and mainly processes the visual semantic information of objects, while the so-called "Where" pathway is associated with object location and mainly processes the visual saliency information of objects. The invention considers both kinds of information when classifying an image, in order to obtain a better classification result. An embodiment of the image classification method of the present invention is described below with reference to FIG. 1.
FIG. 1 illustrates a flow diagram of an image classification method according to some embodiments of the invention. As shown in fig. 1, the image classification method of this embodiment includes steps S102 to S106.
In step S102, a target object is extracted from the image to be classified.
The target object is an object of interest for human vision in the image to be classified.
In some embodiments, the target object can be extracted by acquiring a saliency image of the image to be classified. A saliency image is an image obtained by visual saliency detection. Visual saliency detection uses an algorithm to simulate the characteristics of human vision and extracts the salient region of an image, that is, the region of interest to human vision. Visual saliency detection algorithms include the LC algorithm, based on pixel-level global contrast; the HC algorithm, based on color information; the FT algorithm, which uses color and luminance features; and the AC algorithm, which computes saliency over neighborhoods, among others.
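To make the saliency step concrete, the following is a minimal Python/OpenCV sketch of the FT (frequency-tuned) algorithm named above. The patent does not prescribe any particular implementation; the Gaussian kernel size and the normalization are illustrative choices.

    import cv2
    import numpy as np

    def ft_saliency(bgr: np.ndarray) -> np.ndarray:
        # FT saliency: distance of each pixel's smoothed Lab color
        # from the mean Lab color of the whole image.
        lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
        mean_lab = lab.reshape(-1, 3).mean(axis=0)        # global mean color
        blurred = cv2.GaussianBlur(lab, (5, 5), 0)        # suppress high-frequency noise
        sal = np.linalg.norm(blurred - mean_lab, axis=2)  # per-pixel color distance
        return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)  # map to [0, 1]

The result is a single-channel map, consistent with the one-channel saliency image described in the embodiments below.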
In some embodiments, the color distance and the spatial distance between each image block and the other blocks in the image to be classified are calculated; the degree of difference between two different image blocks is determined from the color distance and the spatial distance; salient blocks in the image to be classified are determined according to the degree of difference; and the salient blocks are used to construct the saliency image of the image to be classified. When calculating the color distance, each color can be mapped to a point in a color space, and the distance between the points in that space measured. The spatial distance refers to the distance between the positions of different image blocks within the image. In this way, blocks that stand out in both color and position can be extracted and a saliency image generated.
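Below is a hedged sketch of this block-contrast scheme in Python. The grid size, the spatial falloff sigma, the keep fraction, and the exact form of the degree of difference (color contrast discounted by spatial distance) are assumptions made for illustration; the patent leaves these choices open.

    import cv2
    import numpy as np

    def block_saliency(bgr: np.ndarray, grid: int = 16,
                       sigma: float = 0.25, keep: float = 0.3) -> np.ndarray:
        h, w = bgr.shape[:2]
        lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
        bh, bw = h // grid, w // grid   # remainder pixels stay non-salient
        colors, centers = [], []
        for i in range(grid):
            for j in range(grid):
                block = lab[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
                colors.append(block.reshape(-1, 3).mean(axis=0))      # mean Lab color
                centers.append(((i + 0.5) / grid, (j + 0.5) / grid))  # normalized center
        colors = np.asarray(colors)
        centers = np.asarray(centers)
        color_d = np.linalg.norm(colors[:, None] - colors[None], axis=2)   # color distance
        space_d = np.linalg.norm(centers[:, None] - centers[None], axis=2) # spatial distance
        # Degree of difference: color contrast discounted by spatial distance,
        # so nearby contrasting blocks matter more than distant ones.
        diff = color_d * np.exp(-space_d ** 2 / (2 * sigma ** 2))
        score = diff.sum(axis=1)
        thresh = np.quantile(score, 1.0 - keep)   # keep the most contrasting blocks
        sal = np.zeros((h, w), dtype=np.float32)
        for idx, s in enumerate(score):
            if s >= thresh:
                i, j = divmod(idx, grid)
                sal[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw] = s
        return sal / (sal.max() + 1e-8)   # one-channel saliency image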
Those skilled in the art may also obtain the target object by other methods as required, for example by using an object detection method, which is not described in detail here.
In step S104, the features of the image to be classified and the features of the target object are extracted, and the extracted features are fused.
In some embodiments, a feature extraction network may be used to extract and fuse the features of the image to be classified and of the target object; the feature fusion may also be performed during the feature extraction process.
The fused features therefore carry both visual semantic information and visual saliency information.
In step S106, the fused features are input into a classifier to obtain a classification result of the image to be classified. The classification result may be, for example, that the image is a sea image, a plant image, a building image, or an animal image.
By the method of the embodiment, the features of the image to be classified and the target object can be extracted and fused, so that the features with visual semantic information and visual saliency information are adopted for classification, and the accuracy of image classification is improved.
In some embodiments, the image to be classified and the target object may be input into two separate feature extraction networks. A first feature extraction network extracts the features of the target object, where the input features of each layer of the first feature extraction network are the output features of the previous layer in that network. A second feature extraction network extracts the features of the image to be classified, where the input features of each layer of the second feature extraction network comprise a fusion of the output features of the previous layer in the second feature extraction network and the output features of the corresponding layer in the first feature extraction network.
In some embodiments, the first layer of the first feature extraction network receives data of a target object, such as a saliency image, and the first layer of the second feature extraction network receives data of an image to be classified.
In this embodiment, the first feature extraction network extracts the features of the target object, and the second feature extraction network continuously refers to those intermediate features while extracting information from the image to be classified, so that the visual semantic information extracted from the image to be classified is more targeted. Performing fusion continuously throughout the extraction process blends the features of the image to be classified and of the target object more thoroughly.
In some embodiments, the input features of each layer of the second feature extraction network comprise the sum of the output features of the previous layer in the second feature extraction network and the output features of the corresponding layer in the first feature extraction network. In this case, the corresponding layers of the two networks have the same number of nodes, so their outputs can be added directly. Besides addition, other methods such as weighted summation or vector concatenation can be adopted as needed, and are not described in detail here.
Fig. 2 exemplarily shows a schematic structural diagram of the first feature extraction network and the second feature extraction network. As shown in fig. 2, the m-th layer of the first feature extraction network receives the input of the (m-1)-th layer, the (m+1)-th layer receives the input of the m-th layer, and so on; the n-th layer of the second feature extraction network receives inputs from the (n-1)-th layer of the second feature extraction network and the (m-1)-th layer of the first feature extraction network, the (n+1)-th layer receives inputs from the n-th layer of the second feature extraction network and the m-th layer of the first feature extraction network, and so on. m and n may be set to the same or different values; that is, the second feature extraction network may perform feature fusion layer by layer, or only at some layers.
In some embodiments, the output features of the first feature extraction network are concatenated, or combined by weighted summation, with the output features of the second feature extraction network, and the concatenation or weighted-sum result is input into the classifier. For example, the overall feature extraction network includes the first feature extraction network, the second feature extraction network, and a feature fusion layer that further merges the output results of the two networks. In this way, the final fusion processing can be carried out in a straightforward manner.
Since the output of the second feature extraction network already contains features extracted from the image to be classified, those skilled in the art may, as needed, also input the output of the second feature extraction network directly into the classifier.
The first feature extraction network and the second feature extraction network may adopt the structure of a network such as AlexNet, VGGNet, CaffeNet, or ResNet.
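To make the two-path structure concrete, here is a hedged PyTorch sketch of the architecture described above: a saliency tower and an image tower fused layer by layer through element-wise addition, with the two final outputs concatenated before the classifier. The tower depths, channel widths, and block design are illustrative assumptions, not taken from the patent; in practice each tower could be an AlexNet- or ResNet-style backbone as noted above.

    import torch
    import torch.nn as nn

    class DualPathClassifier(nn.Module):
        def __init__(self, num_classes: int = 10, widths=(32, 64, 128)):
            super().__init__()

            def block(c_in, c_out):
                # one stride-2 conv stage; both towers use identical stages so
                # that corresponding layers have matching output shapes
                return nn.Sequential(
                    nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                    nn.BatchNorm2d(c_out),
                    nn.ReLU(inplace=True))

            chans = list(widths)
            # first network ("Where" path): input is the 1-channel saliency image
            self.sal_layers = nn.ModuleList(
                [block(1, chans[0])] +
                [block(chans[i], chans[i + 1]) for i in range(len(chans) - 1)])
            # second network ("What" path): input is the 3-channel image to be classified
            self.img_layers = nn.ModuleList(
                [block(3, chans[0])] +
                [block(chans[i], chans[i + 1]) for i in range(len(chans) - 1)])
            self.pool = nn.AdaptiveAvgPool2d(1)
            # feature fusion layer and classifier over the concatenated tower outputs
            self.classifier = nn.Linear(2 * chans[-1], num_classes)

        def forward(self, image: torch.Tensor, saliency: torch.Tensor) -> torch.Tensor:
            x, s = image, saliency
            for img_layer, sal_layer in zip(self.img_layers, self.sal_layers):
                s = sal_layer(s)
                # the next image-path layer receives the fusion (element-wise sum)
                # of this layer's output and the corresponding saliency features
                x = img_layer(x) + s
            fused = torch.cat([self.pool(x), self.pool(s)], dim=1).flatten(1)
            return self.classifier(fused)

For example, DualPathClassifier(num_classes=4)(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 224, 224)) returns a (2, 4) tensor of class logits. Using identical stride-2 stages in both towers is what keeps the corresponding layer outputs the same shape, so the element-wise sum is well defined.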
An embodiment of the image classification model training method of the present invention is described below with reference to fig. 3.
FIG. 3 illustrates a flow diagram of an image classification model training method according to some embodiments of the invention. As shown in fig. 3, the image classification model training method of this embodiment includes steps S302 to S304.
In step S302, a target object of a training image is acquired, wherein the training image has labeled classification results.
In step S304, the image classification model is trained using the training image and the target object of the training image.
The image classification model comprises a feature extraction network and a classifier, wherein the feature extraction network is used for extracting features of the training images and features of target objects of the training images and fusing the extracted features. An exemplary structure of the image classification model is shown in fig. 4.
In FIG. 4, the target object is input into the first feature extraction network, forming a visual saliency path, and the image to be classified is input into the second feature extraction network, forming a visual semantic path. The output results of the first and second feature extraction networks are input into the feature fusion layer for further fusion. The first feature extraction network, the second feature extraction network, and the feature fusion layer together constitute the feature extraction network. The output of the feature extraction network is input into the classifier. This dual-path deep neural network structure, in which the target object provides visual saliency assistance, realizes feature fusion of the image to be classified and the target object.
Then, the trained image classification model can be used to obtain the classification result of the image.
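The patent does not fix a loss function, optimizer, or schedule, so the following training loop is only a plausible sketch for the model outlined earlier, using ordinary supervised-learning choices; the (image, saliency, label) batch layout is a hypothetical convention of this example.

    import torch
    import torch.nn as nn

    def train(model: nn.Module, loader, epochs: int = 10,
              lr: float = 1e-3, device: str = "cuda") -> nn.Module:
        model.to(device).train()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()
        for _ in range(epochs):
            # each batch pairs a training image with its target object
            # (saliency image) and its labeled classification result
            for image, saliency, label in loader:
                image = image.to(device)
                saliency = saliency.to(device)
                label = label.to(device)
                logits = model(image, saliency)   # extract, fuse, classify
                loss = criterion(logits, label)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()   # updates both towers and the classifier jointly
        return model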
With the method of this embodiment, labeled data is used to train the feature extraction network and the classifier jointly, improving the accuracy of image classification.
An embodiment of the image classification apparatus of the present invention is described below with reference to fig. 5.
FIG. 5 illustrates a schematic structural diagram of an image classification apparatus according to some embodiments of the invention. As shown in fig. 5, the image classification device 50 of this embodiment includes: a target object extraction module 510 configured to extract a target object from an image to be classified; a feature fusion module 520 configured to extract features of the image to be classified and features of the target object, and fuse the extracted features; and the classification module 530 is configured to input the fused features into a classifier, and obtain a classification result of the image to be classified.
In some embodiments, the feature fusion module 520 is further configured to extract the features of the target object using a first feature extraction network, where the input features of each layer of the first feature extraction network are the output features of the previous layer in that network, and to extract the features of the image to be classified using a second feature extraction network, where the input features of each layer of the second feature extraction network comprise a fusion of the output features of the previous layer in the second feature extraction network and the output features of the corresponding layer in the first feature extraction network.
In some embodiments, the feature fusion module 520 is further configured to concatenate the output features of the first feature extraction network with the output features of the second feature extraction network, or to compute their weighted sum, and to input the concatenation result or the weighted-sum result into the classifier.
In some embodiments, the input features of each layer of the second feature extraction network comprise the sum of the output features of the previous layer in the second feature extraction network and the output features of the corresponding layer in the first feature extraction network.
In some embodiments, the target object extraction module 510 is further configured to obtain a saliency image of the image to be classified.
In some embodiments, the target object extraction module 510 is further configured to calculate the color distance and the spatial distance between each image block and the other blocks in the image to be classified, determine the degree of difference between two different image blocks according to the color distance and the spatial distance, determine salient blocks in the image to be classified according to the degree of difference, and construct a saliency image of the image to be classified from the salient blocks.
In some embodiments, the image to be classified has a plurality of channels, and the saliency image has one channel.
In some embodiments, the image classification device 50 further comprises a training module 540 configured to acquire a target object of a training image, the training image having a labeled classification result, and to train an image classification model using the training image and the target object of the training image, wherein the image classification model comprises a feature extraction network and a classifier, and the feature extraction network is used to extract features of the training image and features of the target object of the training image and to fuse the extracted features.
Fig. 6 is a schematic structural diagram illustrating an image classification apparatus according to further embodiments of the present invention. As shown in fig. 6, the image classification device 60 of this embodiment includes: a memory 610 and a processor 620 coupled to the memory 610, the processor 620 being configured to perform the image classification method of any of the preceding embodiments based on instructions stored in the memory 610.
Memory 610 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
Fig. 7 shows a schematic structural diagram of an image classification apparatus according to further embodiments of the present invention. As shown in fig. 7, the image classification device 70 of this embodiment includes a memory 710 and a processor 720, and may further include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, 750, as well as the memory 710 and the processor 720, may be connected, for example, by a bus 760. The input/output interface 730 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 740 provides a connection interface for various networking devices. The storage interface 750 provides a connection interface for external storage devices such as an SD card or a USB flash drive.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements any one of the image classification methods described above.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (11)

1. An image classification method, comprising:
extracting a target object from an image to be classified;
extracting the features of the image to be classified and the features of the target object, and fusing the extracted features;
and inputting the fused features into a classifier to obtain a classification result of the image to be classified.
2. The image classification method according to claim 1, wherein the extracting features of the image to be classified and features of the target object and fusing the extracted features comprises:
extracting the features of the target object by adopting a first feature extraction network, wherein the input features of each layer of the first feature extraction network are the output features of the previous layer in the first feature extraction network;
extracting the features of the image to be classified by adopting a second feature extraction network, wherein the input features of each layer of the second feature extraction network comprise: a fusion of the output features of the previous layer in the second feature extraction network and the output features of the corresponding layer in the first feature extraction network.
3. The image classification method according to claim 1, wherein the extracting features of the image to be classified and features of the target object and fusing the extracted features further comprises:
and concatenating or weighted-summing the output features of the first feature extraction network and the output features of the second feature extraction network, and inputting the concatenation result or the weighted-sum result into the classifier.
4. The image classification method of claim 1, wherein the input features of each layer of the second feature extraction network comprise: the sum of the output features of a previous layer in the second feature extraction network and the output features of a corresponding layer in the first feature extraction network.
5. The image classification method according to claim 1, wherein the extracting of the target object from the image to be classified comprises:
and acquiring a saliency image of the image to be classified.
6. The image classification method according to claim 5, wherein the acquiring a target object of the image to be classified includes:
calculating the color distance and the spatial distance between each image block and other blocks in the image to be classified;
determining the degree of difference between two different image blocks in the image to be classified according to the color distance and the spatial distance;
determining salient blocks in the image to be classified according to the degree of difference;
and constructing a saliency image of the image to be classified by using the salient blocks.
7. The image classification method according to claim 5, wherein the image to be classified has a plurality of channels, and the saliency image has one channel.
8. The image classification method according to any one of claims 1 to 7, further comprising:
acquiring a target object of a training image, wherein the training image has a labeled classification result;
and training an image classification model by adopting the training image and the target object of the training image, wherein the image classification model comprises a feature extraction network and a classifier, and the feature extraction network is used for extracting the features of the training image and the features of the target object of the training image and fusing the extracted features.
9. An image classification apparatus comprising:
a target object extraction module configured to extract a target object from an image to be classified;
the feature fusion module is configured to extract features of the image to be classified and features of the target object and fuse the extracted features;
and the classification module is configured to input the fused features into a classifier to obtain a classification result of the image to be classified.
10. An image classification apparatus comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the image classification method of any of claims 1-8 based on instructions stored in the memory.
11. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the image classification method of any one of claims 1 to 8.
CN202010075143.4A 2020-01-22 2020-01-22 Image classification method, device and computer-readable storage medium Pending CN111783791A (en)

Priority Applications (1)

Application Number: CN202010075143.4A; Priority Date: 2020-01-22; Filing Date: 2020-01-22; Title: Image classification method, device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number: CN202010075143.4A; Priority Date: 2020-01-22; Filing Date: 2020-01-22; Title: Image classification method, device and computer-readable storage medium

Publications (1)

Publication Number: CN111783791A; Publication Date: 2020-10-16

Family

ID=72753148

Family Applications (1)

Application Number: CN202010075143.4A; Priority Date: 2020-01-22; Filing Date: 2020-01-22; Title: Image classification method, device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111783791A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862340A (en) * 2017-11-16 2018-03-30 深圳市华尊科技股份有限公司 A kind of model recognizing method and device
CN109117879A (en) * 2018-08-03 2019-01-01 南京旷云科技有限公司 Image classification method, apparatus and system
CN109886282A (en) * 2019-02-26 2019-06-14 腾讯科技(深圳)有限公司 Method for checking object, device, computer readable storage medium and computer equipment
CN109919223A (en) * 2019-03-05 2019-06-21 北京联合大学 Object detection method and device based on deep neural network
CN110135237A (en) * 2019-03-24 2019-08-16 北京化工大学 A kind of gesture identification method
CN110321920A (en) * 2019-05-08 2019-10-11 腾讯科技(深圳)有限公司 Image classification method, device, computer readable storage medium and computer equipment

Similar Documents

Publication Publication Date Title
CN112434721A (en) Image classification method, system, storage medium and terminal based on small sample learning
CN108229504B (en) Image analysis method and device
CN107808111B (en) Method and apparatus for pedestrian detection and attitude estimation
JP6240199B2 (en) Method and apparatus for identifying object in image
CN111340195B (en) Training method and device for network model, image processing method and storage medium
CN113168510A (en) Segmenting objects a priori by refining shape
CN111046980A (en) Image detection method, device, equipment and computer readable storage medium
KR20220038475A (en) Video content recognition method and apparatus, storage medium, and computer device
CN110956646B (en) Target tracking method, device, equipment and storage medium
CN108960412B (en) Image recognition method, device and computer readable storage medium
CN113591872A (en) Data processing system, object detection method and device
KR20160053749A (en) Method and systems of face expression features classification robust to variety of face image appearance
CN109034136A (en) Image processing method, device, picture pick-up device and storage medium
CN111652181B (en) Target tracking method and device and electronic equipment
CN113411550B (en) Video coloring method, device, equipment and storage medium
CN114549369B (en) Data restoration method and device, computer and readable storage medium
Nakamura et al. An effective combination of loss gradients for multi-task learning applied on instance segmentation and depth estimation
JP2023131117A (en) Joint perception model training, joint perception method, device, and medium
KR101993085B1 (en) Semantic image segmentation method based on deep learing
CN115620122A (en) Training method of neural network model, image re-recognition method and related equipment
Craye et al. Exploring to learn visual saliency: The RL-IAC approach
JP6808783B2 (en) Image processing using artificial neural networks
KR101592087B1 (en) Method for generating saliency map based background location and medium for recording the same
CN111783791A (en) Image classification method, device and computer-readable storage medium
Pan et al. Multi-person pose estimation with mid-points for human detection under real-world surveillance

Legal Events

Code and Description
PB01: Publication
SE01: Entry into force of request for substantive examination