CN111783822B - Image classification method, device and storage medium


Info

Publication number
CN111783822B
Authority
CN
China
Prior art keywords
image
target
model
sample
original image
Legal status
Active
Application number
CN202010432868.4A
Other languages
Chinese (zh)
Other versions
CN111783822A (en)
Inventor
申世伟 (Shen Shiwei)
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010432868.4A
Publication of CN111783822A
Application granted
Publication of CN111783822B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The present disclosure relates to an image classification method, apparatus and storage medium. The image classification method comprises: obtaining a feature map of an original image, reconstructing a target image corresponding to the original image from the feature map, and determining the category information of the target included in the original image according to the target image. During classification, features are first extracted from the original image to obtain the feature map; a target image containing only the features of the original image is then reconstructed from the feature map, and the original image is classified by classifying the target image. Obtaining the feature map compresses the original image and discards other information irrelevant to its features, so the target image generated from the feature map contains only the features of the original image. When the target image is classified, the classification process is therefore not influenced by the other information in the original image, and the accuracy of image classification can be improved.

Description

Image classification method, device and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to an image classification method, an image classification device and a storage medium.
Background
Image classification distinguishes images of different categories according to their features, and is the basis of visual tasks such as image detection, image segmentation, object tracking and behavior analysis. It is applied in many technical fields, for example face recognition in the security field, scene recognition in the traffic field, content-based image retrieval and automatic album classification in the Internet field, and image recognition in the medical field.
In practice, the accuracy of image classification is reduced because images may carry subtle interference information that is difficult to identify.
Disclosure of Invention
The disclosure provides an image classification method, an image classification device and a storage medium, so as to at least solve the problem of low accuracy in image classification.
The technical scheme of the present disclosure is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided an image classification method, including:
acquiring an original image;
acquiring a feature map of the original image, and reconstructing the feature map to obtain a target image corresponding to the original image;
and determining the category information of the target included in the original image according to the target image.
Optionally, the obtaining of the feature map of the original image and reconstructing the target image corresponding to the original image according to the feature map includes:
inputting the original image into a first target model, so as to downsample the original image through the first target model to obtain the feature map, and upsample the feature map to reconstruct the target image, wherein the first target model is obtained by training a first preset model according to a first sample image set.
Optionally, the determining, according to the target image, of the category information of the target included in the original image includes:
inputting the target image into a second target model, so as to downsample the target image through the second target model to obtain a feature map of the target image, and determine the category information according to the feature map of the target image, wherein the second target model is obtained by training a second preset model according to a second sample image set.
Optionally, before the acquiring of the original image, the method further includes:
acquiring a plurality of sample images;
inputting a first sample image into a first preset model to obtain a reconstructed image corresponding to the first sample image, wherein the first sample image is any one image of the plurality of sample images;
training the first preset model according to the first sample image and the reconstructed image to obtain a first target model;
inputting a second sample image into a second preset model to obtain category information of a target included in the second sample image, wherein the second sample image is any one image of the plurality of sample images;
training the second preset model according to the label information of the second sample image and the category information of the target included in the second sample image to obtain a second target model;
the obtaining of the feature map of the original image and reconstructing the target image corresponding to the original image according to the feature map includes:
inputting the original image into the first target model to obtain the target image;
the determining, according to the target image, of the category information of the target included in the original image includes:
and inputting the target image into the second target model to obtain the category information.
Optionally, before the acquiring of the original image, the method further includes:
acquiring a plurality of sample images;
inputting a target sample image into a first preset model to obtain a reconstructed image corresponding to the target sample image, wherein the target sample image is any one image of the plurality of sample images;
inputting the reconstructed image into a second preset model to obtain category information of a target included in the target sample image;
training the first preset model and the second preset model according to the label information of the target sample image and the category information of the target included in the target sample image to obtain a first target model and a second target model;
the obtaining of the feature map of the original image and reconstructing the target image corresponding to the original image according to the feature map includes:
inputting the original image into the first target model to obtain the target image;
the determining, according to the target image, of the category information of the target included in the original image includes:
and inputting the target image into the second target model to obtain the category information.
According to a second aspect of embodiments of the present disclosure, there is provided an image classification apparatus, comprising:
a first acquisition module configured to acquire an original image;
a second acquisition module configured to acquire a feature map of the original image and reconstruct a target image corresponding to the original image according to the feature map;
and a determining module configured to determine category information of a target included in the original image according to the target image.
Optionally, the second acquisition module is specifically configured to input the original image into a first target model, so as to downsample the original image through the first target model to obtain the feature map, and upsample the feature map to reconstruct the target image, where the first target model is obtained by training a first preset model according to a first sample image set.
Optionally, the determining module is specifically configured to input the target image into a second target model, so as to downsample the target image through the second target model, obtain a feature map of the target image, and determine the category information according to the feature map of the target image, where the second target model is obtained by training a second preset model according to a second sample image set.
Optionally, the apparatus further comprises:
a third acquisition module configured to acquire a plurality of sample images;
the first input module is configured to input a first sample image into a first preset model to obtain a reconstructed image corresponding to the first sample image, wherein the first sample image is any one image of the plurality of sample images;
The first training module is configured to train the first preset model according to the first sample image and the reconstructed image to obtain a first target model;
the second input module is configured to input a second sample image into a second preset model to obtain category information of a target included in the second sample image, wherein the second sample image is any one image of the plurality of sample images;
the second training module is configured to train the second preset model according to the label information of the second sample image and the category information of the target included in the second sample image to obtain a second target model;
the second acquisition module is specifically configured to input the original image into the first target model to obtain the target image;
the determining module is specifically configured to input the target image into the second target model to obtain the category information.
Optionally, the apparatus further comprises:
a fourth acquisition module configured to acquire a plurality of sample images;
the third input module is configured to input a target sample image into a first preset model to obtain a reconstructed image corresponding to the target sample image, wherein the target sample image is any one image of the plurality of sample images;
The fourth input module is configured to input the reconstructed image into a second preset model to obtain category information of a target included in the target sample image;
the third training module is configured to train the first preset model and the second preset model according to the label information of the target sample image and the category information of the target included in the target sample image to obtain a first target model and a second target model;
the second acquisition module is specifically configured to input the original image into the first target model to obtain the target image;
the determining module is specifically configured to input the target image into the second target model to obtain the category information.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image classification method as provided in the first aspect of the embodiments of the disclosure described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the image classification method as provided in the first aspect of the embodiments of the present disclosure.
According to a fifth aspect of embodiments of the present disclosure there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the image classification method as provided in the first aspect of embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
in this embodiment, a feature map of an original image is obtained, a target image corresponding to the original image is reconstructed from the feature map, and category information of a target included in the original image is determined according to the target image. During classification, features are first extracted from the original image to obtain the feature map; a target image containing only the features of the original image is then reconstructed from the feature map, and the original image is classified by classifying the target image. Because the target image is generated from the feature map of the original image, extracting the features and obtaining the feature map compresses the original image and discards other information irrelevant to those features (such as interference or noise information), so the target image generated from the feature map contains only the features of the original image. When the target image is classified, the classification process is therefore not influenced by the other information in the original image, and the accuracy of image classification can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a flow chart illustrating a method of image classification according to an exemplary embodiment;
FIG. 2 is a block diagram of an image classification model, according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating another image classification method according to an exemplary embodiment;
FIG. 4 is a block diagram of an image classification device according to an exemplary embodiment;
FIG. 5 is a block diagram of another electronic device shown in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating yet another image classification apparatus according to an exemplary embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
FIG. 1 is a flowchart illustrating an image classification method according to an exemplary embodiment. Referring to FIG. 1, the image classification method provided in this embodiment may be applied to image classification to improve its accuracy. The method may be performed by an image classification apparatus, typically implemented in software and/or hardware and provided in an electronic device, and may include the following steps:
Step 101, acquiring an original image.
In this embodiment, the original image may be pre-stored image data, image data directly input by the user, or image data downloaded from a server; the method for obtaining the original image may be set as required, which this embodiment does not limit.
Step 102, obtaining a feature map of the original image, and reconstructing the target image corresponding to the original image according to the feature map.
The feature map is obtained by extracting features of the original image and only comprises features in the original image. The target image is reconstructed according to the feature map, and the obtained image only comprises the features in the original image. The features in the original image include, for example, color features, texture features, shape features, spatial relationship features, and the like in the image, and the kind of the specifically extracted features may be set according to the need, which is not limited in this embodiment.
In this embodiment, after the original image is acquired, feature extraction may first be performed on the original image to obtain its feature map. For example, the original image may be downsampled a preset number of times. Each round of downsampling includes one convolution calculation and a corresponding downsampling calculation. Taking three rounds as an example: first, a first convolution calculation is performed on the original image, and a first downsampling calculation (first downsampling) is performed on its result; then, a second convolution calculation is performed on the result of the first downsampling calculation, and a second downsampling calculation (second downsampling) is performed on its result; finally, a third convolution calculation is performed on the result of the second downsampling calculation, and a third downsampling calculation (third downsampling) is performed on its result, yielding the feature map of the original image. The convolution and downsampling calculations themselves follow the prior art and are not detailed in this embodiment. In practice, the number of downsampling rounds, and the parameters used in each convolution and downsampling calculation, may be set as required.
After the feature map of the original image is obtained, the target image corresponding to the original image can be reconstructed from it. Specifically, the feature map may be upsampled a preset number of times. Each round of upsampling includes one deconvolution calculation and a corresponding upsampling calculation. Taking three rounds as an example: first, a first deconvolution calculation is performed on the feature map, and a first upsampling calculation (first upsampling) is performed on its result; then, a second deconvolution calculation is performed on the result of the first upsampling calculation, and a second upsampling calculation (second upsampling) is performed on its result; finally, a third deconvolution calculation is performed on the result of the second upsampling calculation, and a third upsampling calculation (third upsampling) is performed on its result, yielding the target image corresponding to the original image. The deconvolution and upsampling calculations follow the prior art and are not detailed in this embodiment. In practice, the number of upsampling rounds, and the parameters used in each deconvolution and upsampling calculation, may be set as required.
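As an illustration of the three rounds of downsampling and upsampling described above, the following is a minimal PyTorch sketch. It is not part of the disclosure: the 32x32 RGB input, the channel widths, and the use of max pooling, ReLU activations and nearest-neighbor upsampling are all assumptions chosen only to make the shape bookkeeping concrete.

```python
import torch
from torch import nn

def down_block(c_in, c_out):
    # one convolution calculation followed by one downsampling calculation
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                         nn.MaxPool2d(2))

def up_block(c_in, c_out):
    # one deconvolution calculation followed by one upsampling calculation
    return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                         nn.Upsample(scale_factor=2))

x = torch.randn(1, 3, 32, 32)        # original image (batch of one)
f = down_block(3, 16)(x)             # first downsampling  -> (1, 16, 16, 16)
f = down_block(16, 32)(f)            # second downsampling -> (1, 32, 8, 8)
f = down_block(32, 64)(f)            # third downsampling  -> (1, 64, 4, 4): the feature map
y = up_block(64, 32)(f)              # first upsampling    -> (1, 32, 8, 8)
y = up_block(32, 16)(y)              # second upsampling   -> (1, 16, 16, 16)
y = up_block(16, 3)(y)               # third upsampling    -> (1, 3, 32, 32): the target image
assert y.shape == x.shape            # reconstruction matches the original size
```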
It should be noted that the above is only one exemplary way to obtain the feature map of the original image and reconstruct the target image from it; in practical application, the feature map may be obtained and the target image reconstructed by other methods, which this embodiment does not limit.
Step 103, determining category information of the target included in the original image according to the target image.
In this embodiment, in combination with step 101 and step 102, since the target image is reconstructed from the feature map of the original image, the target image contains the features extracted from the original image. Therefore, after the target image is reconstructed, it can be classified directly: the category information of the target included in the target image is determined and taken as the category information of the target included in the original image. The category information may include, for example, the category of the target in the original image and a probability value corresponding to that category; the specific parameters it contains may be set as required. The method for classifying the target image may follow image classification methods in the prior art, which this embodiment does not limit.
In summary, in this embodiment, a feature map of an original image is obtained, a target image corresponding to the original image is reconstructed from the feature map, and category information of a target included in the original image is determined according to the target image. During classification, features are first extracted from the original image to obtain the feature map; a target image containing only the features of the original image is then reconstructed from the feature map, and the original image is classified by classifying the target image. Because the target image is generated from the feature map of the original image, extracting the features and obtaining the feature map compresses the original image and discards other information irrelevant to those features (such as interference or noise information), so the target image generated from the feature map contains only the features of the original image. When the target image is classified, the classification process is therefore not influenced by the other information in the original image, and the accuracy of image classification can be improved.
Optionally, before the original image is acquired, the first preset model may be trained to obtain a first target model, and the second preset model may be trained to obtain a second target model.
Referring to FIG. 2, FIG. 2 is a block diagram illustrating an image classification model according to an exemplary embodiment. As shown in FIG. 2, the classification model may include a first preset model and a second preset model, where the output of the first preset model is the input of the second preset model. The first preset model extracts features from a sample image to obtain its feature map and reconstructs a target image corresponding to the sample image from that feature map; the second preset model classifies the target image, and the classification result of the target image is taken as the classification result of the sample image.
Specifically, a classification model including a first preset model and a second preset model may be constructed. The first preset model may include a feature extraction module and an image reconstruction module. The feature extraction module may include a preset number of convolution layers, a downsampling layer corresponding to each convolution layer, and an activation function between each convolution layer and its downsampling layer. For example, the feature extraction module may include three convolution layers A, B and C, with corresponding downsampling layers a, b and c. During training of the first preset model, convolution layer A performs a first convolution calculation on a sample image; its result is processed by the activation function between convolution layer A and downsampling layer a and input into downsampling layer a, which performs a first downsampling calculation and passes its result to convolution layer B. Convolution layer B then performs a second convolution calculation on the result of the first downsampling calculation; its result is processed by the activation function between convolution layer B and downsampling layer b and input into downsampling layer b, which performs a second downsampling calculation and passes its result to convolution layer C. Finally, convolution layer C performs a third convolution calculation on the result of the second downsampling calculation; its result is processed by the activation function between convolution layer C and downsampling layer c and input into downsampling layer c, which performs a third downsampling calculation to obtain the feature map of the sample image.
Correspondingly, the image reconstruction module may include deconvolution layers D, E and F, corresponding upsampling layers d, e and f, and an activation function between each deconvolution layer and its upsampling layer. During training of the image reconstruction module, deconvolution layer D first performs a first deconvolution calculation on the feature map; its result is processed by the activation function between deconvolution layer D and upsampling layer d and input into upsampling layer d, which performs a first upsampling calculation and passes its result to deconvolution layer E. Deconvolution layer E then performs a second deconvolution calculation on the result of the first upsampling calculation; its result is processed by the activation function between deconvolution layer E and upsampling layer e and input into upsampling layer e, which performs a second upsampling calculation and passes its result to deconvolution layer F. Finally, deconvolution layer F performs a third deconvolution calculation on the result of the second upsampling calculation; its result is processed by the activation function between deconvolution layer F and upsampling layer f and input into upsampling layer f, which performs a third upsampling calculation to obtain the target image corresponding to the sample image, completing the reconstruction.
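Gathering the two modules above into a single model, the following hedged sketch uses the patent's layer lettering (convolution layers A/B/C with downsampling layers a/b/c, deconvolution layers D/E/F with upsampling layers d/e/f). The channel widths and the ReLU/Sigmoid activation choices are illustrative assumptions; the disclosure only requires that input and output sizes match.

```python
from torch import nn

class FirstPresetModel(nn.Module):
    """Feature extraction module followed by an image reconstruction module."""
    def __init__(self):
        super().__init__()
        # feature extraction: convolution layer, activation, downsampling layer (x3)
        self.feature_extraction = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # A, a
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # B, b
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # C, c
        )
        # image reconstruction: deconvolution layer, activation, upsampling layer (x3)
        self.image_reconstruction = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2),                                  # D, d
            nn.ConvTranspose2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2),                                  # E, e
            nn.ConvTranspose2d(16, 3, 3, padding=1), nn.Sigmoid(),
            nn.Upsample(scale_factor=2),                                  # F, f
        )

    def forward(self, image):
        feature_map = self.feature_extraction(image)
        target_image = self.image_reconstruction(feature_map)
        return target_image  # same spatial size as the input image
```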
The convolution and downsampling calculations on the sample image, and the deconvolution and upsampling calculations on the feature map, may follow the convolution and sampling calculations in the prior art and are not detailed in this embodiment. It should be noted that the specific structures of the first preset model and the second preset model may be set as required; the only constraint is that the sample image input into the first preset model and the output target image have the same size, so that the reconstruction error between them can be calculated during training to adjust the parameters of the first preset model.
In this embodiment, a second preset model may also be constructed. The second preset model may include a preset number of convolution layers, a downsampling layer corresponding to each convolution layer, a fully connected layer, and an output layer. The specific structure of the second preset model may follow the structure of classification models in the prior art, which this embodiment does not limit.
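A corresponding sketch of the second preset model, again under assumed dimensions (two convolution/downsampling stages, a 32x32 input and ten classes stand in for whatever a real deployment would use):

```python
from torch import nn

class SecondPresetModel(nn.Module):
    """Convolution and downsampling layers, a fully connected layer, an output layer."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, num_classes),  # fully connected output (32x32 input assumed)
        )

    def forward(self, target_image):
        feature_map = self.features(target_image)  # feature map of the target image
        return self.head(feature_map)              # class logits; softmax yields probabilities
```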
In practical application, the output layer of the first preset model can directly serve as the input layer of the second preset model, or the output layer of the first preset model can be connected to the input layer of the second preset model; either way, the combination yields the classification model.
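Under the sketches above, this combination is a one-line composition; the output of the first model feeds the input of the second:

```python
from torch import nn

# Reusing the FirstPresetModel and SecondPresetModel sketches above.
first_model = FirstPresetModel()
second_model = SecondPresetModel(num_classes=10)
classification_model = nn.Sequential(first_model, second_model)
```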
The training of the first preset model to obtain the first target model and the training of the second preset model to obtain the second target model can be achieved in the following two modes:
Mode one may include the following steps:
acquiring a plurality of sample images;
inputting a first sample image into a first preset model to obtain a reconstructed image corresponding to the first sample image, wherein the first sample image is any one image of a plurality of sample images;
training a first preset model according to the first sample image and the reconstructed image to obtain a first target model;
inputting a second sample image into a second preset model to obtain category information of a target included in the second sample image, wherein the second sample image is any one image of the plurality of sample images;
training the second preset model according to the label information of the second sample image and the category information of the target included in the second sample image to obtain a second target model.
In this embodiment, the first preset model in the classification model may be trained separately to obtain the first target model. Specifically, according to a preset rule, part or all of the plurality of sample images may be selected as a first sample image, and the first preset model may be trained through the first sample image.
During training of the first preset model, the parameters of the second preset model are first fixed; each first sample image is then input into the first preset model in turn, and the corresponding reconstructed image is obtained through the first preset model. The reconstruction error between the first sample image and the reconstructed image is calculated, and the parameters of the first preset model are adjusted according to this error. For example, after a first sample image is input and its reconstructed image obtained, the root mean square error (RMSE) between the two may be used as the reconstruction error: the parameters of the first preset model are adjusted according to the RMSE value, and training is judged complete when the RMSE value converges or falls below a preset value, yielding the first target model. The specific process of adjusting the parameters according to the RMSE value may follow the prior art and is not detailed in this embodiment. In practical applications, the reconstruction error between the first sample image and the reconstructed image may also be calculated by other methods, which this embodiment does not limit.
It should be noted that when the first preset model is trained on its own, the parameters of the second preset model may be kept unchanged, and the reconstructed image may be taken directly from the output layer of the first preset model; for example, the output result (reconstructed image) of upsampling layer f may be taken directly, and the reconstruction error between that output and the first sample image calculated. The method for training the first preset model may follow the prior art, which this embodiment does not limit.
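A hedged sketch of this first training stage, continuing the model sketches above. The Adam optimizer, learning rate and epoch count are assumptions, and `loader` stands for any iterable of (image, label) batches; labels are unused at this stage.

```python
import torch
from torch import nn

first_model = FirstPresetModel()  # from the sketch above
optimizer = torch.optim.Adam(first_model.parameters(), lr=1e-3)

def train_first_model(loader, epochs=10):
    # Mode one, stage one: only the first preset model is trained;
    # the second preset model's parameters are left untouched.
    for _ in range(epochs):
        for first_sample_image, _ in loader:
            reconstructed = first_model(first_sample_image)
            # root mean square error (RMSE) as the reconstruction error
            rmse = torch.sqrt(nn.functional.mse_loss(reconstructed, first_sample_image))
            optimizer.zero_grad()
            rmse.backward()
            optimizer.step()
```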
In this embodiment, the second preset model in the classification model may be trained separately to obtain the second target model. Specifically, part or all of the plurality of sample images may be selected as the second sample image according to a preset rule, and the second preset model is trained through the second sample image. The first sample image and the second sample image may be the same or different sample images, which is not limited in this embodiment.
After the first preset model has been trained into the first target model, the parameters of the first target model can be kept unchanged and training of the second preset model begun. Each second sample image is input into the second preset model in turn, the category information of the target included in the second sample image is obtained through the second preset model, the error between that category information and the label information of the second sample image is calculated, and the parameters of the second preset model are adjusted according to this error. For example, the error may be calculated with a cross-entropy loss function; the parameters of the second preset model are adjusted according to the error value, and training is judged complete when the error value converges or falls below a preset value, yielding the second target model. The specific process of calculating the error and adjusting the parameters may follow the prior art, which this embodiment does not limit.
It should be noted that when the second preset model is trained on its own, the parameters of the first target model may be kept unchanged, and the second sample images may be input directly at the input layer of the second preset model. The method for training the second preset model may follow the prior art, which this embodiment does not limit.
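A matching sketch of the second training stage, under the same assumptions; per the note above, the second sample images go straight into the second preset model while the first target model's parameters stay fixed.

```python
import torch
from torch import nn

second_model = SecondPresetModel(num_classes=10)  # from the sketch above
cls_optimizer = torch.optim.Adam(second_model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()                 # error vs. the label information

def train_second_model(loader, epochs=10):
    first_model.requires_grad_(False)  # keep the trained first target model unchanged
    for _ in range(epochs):
        for second_sample_image, label in loader:
            logits = second_model(second_sample_image)  # fed directly into the input layer
            loss = criterion(logits, label)
            cls_optimizer.zero_grad()
            loss.backward()
            cls_optimizer.step()
```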
In practical application, the first preset model may be trained first to obtain the first target model, after which its parameters are kept unchanged while the second preset model is trained; alternatively, the second preset model may be trained first to obtain the second target model, after which its parameters are kept unchanged while the first preset model is trained. This embodiment does not limit the order.
In this embodiment, after training of the first preset model and the second preset model is completed, the first target model and the second target model are obtained respectively; training of the classification model is then complete, and the trained classification model consists of the first target model and the second target model.
In practical application, training the first preset model and the second preset model separately has advantages: each model is structurally simpler on its own, so the models converge better during training, training takes less time, and the resulting first target model and second target model are more accurate.
Mode two may include the following steps:
acquiring a plurality of sample images;
inputting a target sample image into a first preset model to obtain a reconstructed image corresponding to the target sample image, wherein the target sample image is any one of a plurality of sample images;
inputting the reconstructed image into a second preset model to obtain category information of a target included in the target sample image;
training the first preset model and the second preset model according to the label information of the target sample image and the category information of the target included in the target sample image to obtain the first target model and the second target model.
In this embodiment, the whole classification model may instead be trained directly on the sample images. A plurality of sample images are input into the classification model in turn; for each, a reconstructed image corresponding to the sample image is obtained through the first preset model, the category information of the target included in the sample image is then determined from the reconstructed image by the second preset model, and finally the error value between that category information and the label information of the sample image is calculated. The parameters of the first preset model are adjusted according to this error value to obtain the first target model, and the parameters of the second preset model are adjusted according to the same error value to obtain the second target model. The process of determining the category information through the second preset model is as in mode one and is not detailed in this embodiment.
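A sketch of mode two under the same assumptions: the two sketched models are chained and a single error value adjusts both.

```python
import torch
from torch import nn

model = nn.Sequential(FirstPresetModel(), SecondPresetModel(num_classes=10))
joint_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_jointly(loader, epochs=10):
    # Mode two: one forward pass reconstructs and then classifies;
    # the error value backpropagates through both preset models.
    for _ in range(epochs):
        for target_sample_image, label in loader:
            logits = model(target_sample_image)
            loss = criterion(logits, label)
            joint_optimizer.zero_grad()
            loss.backward()
            joint_optimizer.step()
```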
After the trained classification model is obtained, the category information of the target included in an original image can be determined directly through the trained classification model.
Specifically, the original image can be input directly into the trained classification model, that is, into the first target model within it. As the trained classification model runs, the target image corresponding to the original image is obtained through the first target model, the target image is then input into the second target model, and the category information of the target included in the original image is obtained through the second target model. The specific process may refer to the handling of sample images during training of the first preset model and the second preset model, and is not repeated here.
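At inference time the pipeline reduces to two forward passes; a sketch, assuming the trained `first_model` and `second_model` from above and a single 32x32 RGB tensor as the original image:

```python
import torch

def classify(original_image):
    # original_image: tensor of shape (1, 3, 32, 32) under the assumptions above
    with torch.no_grad():
        target_image = first_model(original_image)  # first target model: reconstruct
        logits = second_model(target_image)         # second target model: classify
        probs = torch.softmax(logits, dim=1)
    # category of the target and the probability value corresponding to it
    return probs.argmax(dim=1).item(), probs.max().item()
```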
In practical applications, every classification model has challenge (adversarial) samples: image data formed by artificially adding subtle interference information, imperceptible to the human eye, to ordinary image data. When a challenge sample is input into a classification model, the model is very likely to misclassify it and produce wrong category information; the existence of challenge samples therefore reduces the model's accuracy and degrades its robustness. In this embodiment, when the classification model classifies an original image, the original image is compressed and the other information in it is discarded. If the original image is a challenge sample, its interference information is, with high probability, part of that other information, so the classification model is not influenced by the interference information during classification. This improves the model's ability to handle challenge samples, and hence its accuracy and robustness.
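The patent names no particular attack, so as a purely illustrative check one could craft a challenge sample with the well-known fast gradient sign method (FGSM, an assumption of this sketch, not part of the disclosure) and compare classification with and without the reconstruction step:

```python
import torch
from torch import nn

def fgsm_challenge_sample(classifier, x, y, eps=0.01):
    # FGSM: perturb x in the direction that increases the classification loss.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(classifier(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

# x_adv = fgsm_challenge_sample(second_model, x, y)
# second_model(x_adv).argmax(dim=1)               # may misclassify with high probability
# second_model(first_model(x_adv)).argmax(dim=1)  # interference largely removed first
```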
FIG. 3 is a flowchart illustrating another image classification method according to an exemplary embodiment. Referring to FIG. 3, the method may include:
step 301, acquiring an original image.
Step 302, inputting the original image into a first target model, so as to downsample the original image through the first target model to obtain a feature map, and upsample the feature map to reconstruct the target image.
The first target model is obtained by training a first preset model according to a first sample image set.
In this embodiment, the first target model may be trained in advance before the original image is acquired. After the original image is acquired, the original image can be input into a first target model, a feature map of the original image is acquired through the first target model, and a target image corresponding to the original image is obtained through reconstruction according to the feature map.
Specifically, a first preset model may be pre-built and trained on a first sample image set to obtain the first target model. For example, a first preset model may be constructed, a first sample image set acquired, and the first preset model trained with the plurality of first sample images in that set. The processes of constructing the first preset model, obtaining the first sample image set, and training the first preset model to obtain the first target model may refer to the above embodiment and are not repeated here.
The process of downsampling the original image with the first target model to obtain the feature map, and upsampling the feature map to reconstruct the target image, may refer to the training process of the first target model in the foregoing embodiment and is not repeated here.
Step 303, inputting the target image into a second target model, so as to downsample the target image through the second target model, obtain a feature map of the target image, and determine category information according to the feature map of the target image.
The second target model is obtained by training a second preset model according to a second sample image set.
In this embodiment, before the original image is acquired, a second target model may be trained in advance, so that after the target image of the original image is acquired, the target image is input into the second target model, and the category information of the target included in the original image is determined through the second target model.
Specifically, a second preset model may be pre-built and trained on a second sample image set to obtain the second target model. For example, a second preset model may be constructed, a second sample image set acquired, and the second preset model trained with the plurality of second sample images in that set. The processes of constructing the second preset model, obtaining the second sample image set, and training the second preset model to obtain the second target model may refer to the above embodiment and are not repeated here.
The process of downsampling the target image by the second target model to obtain the feature map of the target image and determining the category information according to the feature map of the target image may refer to the training process of the second target model in the foregoing embodiment, which is not described herein.
In summary, in this embodiment, a feature map of an original image is obtained, a target image corresponding to the original image is reconstructed from the feature map, and category information of a target included in the original image is determined according to the target image. During classification, features are first extracted from the original image to obtain the feature map; a target image containing only the features of the original image is then reconstructed from the feature map, and the original image is classified by classifying the target image. Because the target image is generated from the feature map of the original image, extracting the features and obtaining the feature map compresses the original image and discards other information irrelevant to those features (such as interference or noise information), so the target image generated from the feature map contains only the features of the original image. When the target image is classified, the classification process is therefore not influenced by the other information in the original image, and the accuracy of image classification can be improved.
Referring to fig. 4, fig. 4 is a block diagram illustrating an image classification apparatus according to an exemplary embodiment. The image classification apparatus 400 may be applied to image classification, and may include: a first acquisition module 401, a second acquisition module 402, and a determination module 403.
The first acquisition module 401 is configured to acquire an original image.
The second acquisition module 402 is configured to acquire a feature map of the original image, and reconstruct a target image corresponding to the original image according to the feature map.
The determining module 403 is configured to determine, from the target image, category information of the target included in the original image.
In summary, in this embodiment, a feature map of an original image is obtained, a target image corresponding to the original image is reconstructed from the feature map, and category information of a target included in the original image is determined according to the target image. During classification, features are first extracted from the original image to obtain the feature map; a target image containing only the features of the original image is then reconstructed from the feature map, and the original image is classified by classifying the target image. Because the target image is generated from the feature map of the original image, extracting the features and obtaining the feature map compresses the original image and discards other information irrelevant to those features (such as interference or noise information), so the target image generated from the feature map contains only the features of the original image. When the target image is classified, the classification process is therefore not influenced by the other information in the original image, and the accuracy of image classification can be improved.
Optionally, the second obtaining module 402 is specifically configured to input the original image into a first target model, so as to obtain a feature map by downsampling the original image with the first target model, and upsample the feature map, and reconstruct the feature map to obtain the target image, where the first target model is obtained by training a first preset model according to a first sample image set.
Optionally, the determining module 403 is specifically configured to input the target image into a second target model, so as to downsample the target image through the second target model, obtain a feature map of the target image, and determine the category information according to the feature map of the target image, where the second target model is obtained by training a second preset model according to a second sample image set.
Optionally, the apparatus may further include: a third acquisition module, a first input module, a first training module, a second input module and a second training module.
The third acquisition module is configured to acquire a plurality of sample images.
The first input module is configured to input a first sample image into a first preset model to obtain a reconstructed image corresponding to the first sample image, wherein the first sample image is any one image of a plurality of sample images.
The first training module is configured to train the first preset model according to the first sample image and the reconstructed image to obtain a first target model.
The second input module is configured to input a second sample image into a second preset model to obtain category information of a target included in the second sample image, wherein the second sample image is any one image of a plurality of sample images.
The second training module is configured to train a second preset model according to the label information of the second sample image and the category information of the target included in the second sample image to obtain a second target model.
The second acquisition module 402 is specifically configured to input the original image into the first object model to obtain the object image.
The determining module 403 is specifically configured to input the target image into the second target model to obtain the category information.
Optionally, the apparatus may further include: a fourth acquisition module, a third input module, a fourth input module and a third training module.
The fourth acquisition module is configured to acquire a plurality of sample images.
The third input module is configured to input a target sample image into the first preset model to obtain a reconstructed image corresponding to the target sample image, wherein the target sample image is any one image of a plurality of sample images.
The fourth input module is configured to input the reconstructed image into a second preset model to obtain category information of the target included in the target sample image.
The third training module is configured to train the first preset model and the second preset model according to the label information of the target sample image and the category information of the target included in the target sample image to obtain the first target model and the second target model.
The second acquisition module 402 is specifically configured to input the original image into the first object model to obtain the object image.
The determining module 403 is specifically configured to input the target image into the second target model to obtain the category information.
Referring to fig. 5, fig. 5 is a block diagram of another electronic device, shown in accordance with an exemplary embodiment.
The electronic device 500 includes:
a processor 501.
A memory 502 for storing instructions executable by the processor 501.
Wherein the processor 501 is configured to execute executable instructions stored in the memory 502 to implement the image classification method in the embodiment shown in fig. 1 and 3.
In an exemplary embodiment, a storage medium is also provided, such as a memory 502 including instructions executable by the processor 501 of the electronic device 500 to perform the image classification method of the embodiments shown in fig. 1 and 3.
Alternatively, the storage medium may be a non-transitory computer-readable storage medium, for example a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the image classification method of the embodiment as shown in fig. 1 and 3 is also provided.
Referring to fig. 6, fig. 6 is a block diagram of yet another image classification apparatus, according to an exemplary embodiment, apparatus 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 613, a sensor component 614, and a communication component 616.
The processing component 602 generally controls overall operation of the apparatus 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or part of the steps of the image classification method described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operations at the apparatus 600. Examples of such data include instructions for any application or method operating on the apparatus 600, contact data, phonebook data, messages, pictures, videos, and the like. The memory 604 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 606 provides power to the various components of the device 600. The power supply components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.
The multimedia component 608 includes a screen that provides an output interface between the apparatus 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 600 is in an operational mode, such as a photographing mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 613 provides an interface between the processing component 602 and peripheral interface modules, such as a keyboard, a click wheel, or buttons. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 614 includes one or more sensors for providing status assessments of various aspects of the apparatus 600. For example, the sensor assembly 614 may detect the open/closed state of the apparatus 600 and the relative positioning of components, such as the display and keypad of the apparatus 600. The sensor assembly 614 may also detect a change in position of the apparatus 600 or a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in temperature of the apparatus 600. The sensor assembly 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate communication between the apparatus 600 and other devices in a wired or wireless manner. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the above-described image classification method.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as the memory 604 including instructions executable by the processor 620 of the apparatus 600 to perform the image classification method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used for implementation, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed with respect to each other may be indirect couplings or communication connections via some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. An image classification method, comprising:
acquiring an original image, wherein the original image is an adversarial sample;
acquiring a feature map of the original image, and reconstructing the feature map to obtain a target image corresponding to the original image, wherein in the process of acquiring the feature map, the original image is compressed and interference information in the adversarial sample is deleted; and
determining the category information of the target included in the original image according to the target image.
2. The method according to claim 1, wherein the obtaining the feature map of the original image and reconstructing a target image corresponding to the original image according to the feature map includes:
inputting the original image into a first target model, performing downsampling on the original image through the first target model to obtain the feature map, and performing upsampling on the feature map to reconstruct the target image, wherein the first target model is obtained by training a first preset model according to a first sample image set.
3. The method according to claim 1 or 2, wherein said determining, from the target image, category information of a target included in the original image includes:
and inputting the target image into a second target model, so as to perform downsampling on the target image through the second target model, obtain a feature map of the target image, and determine the category information according to the feature map of the target image, wherein the second target model is obtained by training a second preset model according to a second sample image set.
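For illustration (not part of the claims), the following is a minimal PyTorch sketch of the two models recited in claims 2 and 3. The layer counts, channel widths, and class count are assumptions; the claims fix only the downsampling/upsampling structure.

```python
import torch.nn as nn

class FirstTargetModel(nn.Module):
    # Claim 2: downsample the original image to a feature map, then
    # upsample the feature map to reconstruct the target image.
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.up = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        feature_map = self.down(x)   # compression step of claim 2
        return self.up(feature_map)  # reconstructed target image

class SecondTargetModel(nn.Module):
    # Claim 3: downsample the target image to a feature map and determine
    # the category information from that feature map.
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.head(f)  # category logits
```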
4. The method of claim 1, further comprising, prior to said acquiring the original image:
acquiring a plurality of sample images;
inputting a first sample image into a first preset model to obtain a reconstructed image corresponding to the first sample image, wherein the first sample image is any one image of the plurality of sample images;
training the first preset model according to the first sample image and the reconstructed image to obtain a first target model;
inputting a second sample image into a second preset model to obtain category information of a target included in the second sample image, wherein the second sample image is any one image of the plurality of sample images;
training the second preset model according to the label information of the second sample image and the category information of the target included in the second sample image to obtain a second target model;
wherein the obtaining the feature map of the original image and reconstructing the feature map to obtain a target image corresponding to the original image includes:
inputting the original image into the first target model to obtain the target image;
the determining, according to the target image, category information of a target included in the original image includes:
and inputting the target image into the second target model to obtain the category information.
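For illustration (not part of the claims), a minimal sketch of the separate training scheme of claim 4: the first preset model is trained on reconstruction, and the second preset model is trained on labels, independently. The reconstruction loss, classification loss, optimizer, and hyperparameters are assumptions, as the claim does not fix them.

```python
import torch
import torch.nn.functional as F

def train_separately(first_model, second_model, sample_images, labels,
                     epochs: int = 10, lr: float = 1e-3):
    # sample_images: (N, 3, H, W) tensor; labels: (N,) tensor of class ids.
    opt1 = torch.optim.Adam(first_model.parameters(), lr=lr)
    opt2 = torch.optim.Adam(second_model.parameters(), lr=lr)
    for _ in range(epochs):
        # First preset model: train on the difference between each sample
        # image and its reconstructed image.
        reconstructed = first_model(sample_images)
        loss1 = F.mse_loss(reconstructed, sample_images)
        opt1.zero_grad()
        loss1.backward()
        opt1.step()

        # Second preset model: train on the label information against the
        # predicted category information of each sample image.
        logits = second_model(sample_images)
        loss2 = F.cross_entropy(logits, labels)
        opt2.zero_grad()
        loss2.backward()
        opt2.step()
```

Mini-batching and validation are omitted for brevity.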
5. The method of claim 1, further comprising, prior to said acquiring the original image:
acquiring a plurality of sample images;
inputting a target sample image into a first preset model to obtain a reconstructed image corresponding to the target sample image, wherein the target sample image is any one image of the plurality of sample images;
inputting the reconstructed image into a second preset model to obtain category information of a target included in the target sample image;
training the first preset model and the second preset model according to the label information of the target sample image and the category information of the target included in the target sample image to obtain a first target model and a second target model;
wherein the obtaining the feature map of the original image and reconstructing the feature map to obtain a target image corresponding to the original image includes:
inputting the original image into the first target model to obtain the target image;
the determining, according to the target image, category information of a target included in the original image includes:
and inputting the target image into the second target model to obtain the category information.
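For illustration (not part of the claims), a minimal sketch of the joint training scheme of claim 5, in which the reconstructed image produced by the first preset model is classified by the second preset model and a single loss updates both models. The loss and optimizer choices are assumptions.

```python
import torch
import torch.nn.functional as F

def train_jointly(first_model, second_model, sample_images, labels,
                  epochs: int = 10, lr: float = 1e-3):
    params = list(first_model.parameters()) + list(second_model.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        reconstructed = first_model(sample_images)   # reconstructed image
        logits = second_model(reconstructed)         # category information
        loss = F.cross_entropy(logits, labels)       # vs. label information
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Compared with the separate scheme of claim 4, end-to-end training can let the reconstruction adapt to what the classifier needs, at the cost of no longer enforcing a faithful reconstruction.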
6. An image classification apparatus, comprising:
a first acquisition module configured to acquire an original image, wherein the original image is an adversarial sample;
a second acquisition module configured to acquire a feature map of the original image and reconstruct a target image corresponding to the original image according to the feature map, wherein in the process of acquiring the feature map, the original image is compressed and interference information in the adversarial sample is deleted; and
a determining module configured to determine category information of a target included in the original image according to the target image.
7. The apparatus according to claim 6, wherein the second acquisition module is specifically configured to input the original image into a first target model, so as to downsample the original image through the first target model to obtain the feature map, and upsample the feature map to reconstruct the target image, wherein the first target model is obtained by training a first preset model according to a first sample image set.
8. The apparatus according to claim 6 or 7, wherein the determining module is specifically configured to input the target image into a second target model, so as to downsample the target image through the second target model, obtain a feature map of the target image, and determine the category information according to the feature map of the target image, where the second target model is obtained by training a second preset model according to a second sample image set.
9. The apparatus as recited in claim 6, further comprising:
a third acquisition module configured to acquire a plurality of sample images;
the first input module is configured to input a first sample image into a first preset model to obtain a reconstructed image corresponding to the first sample image, wherein the first sample image is any one image of the plurality of sample images;
the first training module is configured to train the first preset model according to the first sample image and the reconstructed image to obtain a first target model;
the second input module is configured to input a second sample image into a second preset model to obtain category information of a target included in the second sample image, wherein the second sample image is any one image of the plurality of sample images;
The second training module is configured to train the second preset model according to the label information of the second sample image and the category information of the target included in the second sample image to obtain a second target model;
the second acquisition module is specifically configured to input the original image into the first target model to obtain the target image;
the determining module is specifically configured to input the target image into the second target model to obtain the category information.
10. The apparatus as recited in claim 6, further comprising:
a fourth acquisition module configured to acquire a plurality of sample images;
the third input module is configured to input a target sample image into a first preset model to obtain a reconstructed image corresponding to the target sample image, wherein the target sample image is any one image of the plurality of sample images;
the fourth input module is configured to input the reconstructed image into a second preset model to obtain category information of a target included in the target sample image;
the third training module is configured to train the first preset model and the second preset model according to the label information of the target sample image and the category information of the target included in the target sample image to obtain a first target model and a second target model;
The second acquisition module is specifically configured to input the original image into the first target model to obtain the target image;
the determining module is specifically configured to input the target image into the second target model to obtain the category information.
11. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image classification method of any of claims 1-5.
12. A storage medium having stored thereon instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the image classification method of any of claims 1-5.
CN202010432868.4A 2020-05-20 2020-05-20 Image classification method, device and storage medium Active CN111783822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010432868.4A CN111783822B (en) 2020-05-20 2020-05-20 Image classification method, device and storage medium


Publications (2)

Publication Number Publication Date
CN111783822A CN111783822A (en) 2020-10-16
CN111783822B true CN111783822B (en) 2024-04-16

Family

ID=72754305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010432868.4A Active CN111783822B (en) 2020-05-20 2020-05-20 Image classification method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111783822B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107240102A (en) * 2017-04-20 2017-10-10 合肥工业大学 Malignant tumour area of computer aided method of early diagnosis based on deep learning algorithm
CN107609536A (en) * 2017-09-29 2018-01-19 百度在线网络技术(北京)有限公司 Information generating method and device
CN108960214A (en) * 2018-08-17 2018-12-07 中控智慧科技股份有限公司 Fingerprint enhancement binarization method, device, equipment, system and storage medium
CN110750876A (en) * 2019-11-13 2020-02-04 上海海事大学 Bearing data model training and using method
CN110969627A (en) * 2019-11-29 2020-04-07 北京达佳互联信息技术有限公司 Image processing method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2699687C1 (en) * 2018-06-18 2019-09-09 Общество с ограниченной ответственностью "Аби Продакшн" Detecting text fields using neural networks




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant