CN112818995A - Image classification method and device, electronic equipment and storage medium - Google Patents

Image classification method and device, electronic equipment and storage medium

Info

Publication number
CN112818995A
CN112818995A
Authority
CN
China
Prior art keywords
image
information
target
characteristic
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110111742.1A
Other languages
Chinese (zh)
Inventor
申世伟
李家宏
李思则
王仲远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110111742.1A
Publication of CN112818995A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Abstract

The disclosure relates to an image classification method, an image classification device, an electronic device and a storage medium, wherein the image classification method comprises the steps of: obtaining a segmentation characteristic image and scene information of an image to be classified; inputting the segmentation characteristic image into a target image classification network for classification processing to obtain initial category information; generating target splicing characteristic information based on the scene information and the initial category information; inputting the target splicing characteristic information into a target generation network for image synthesis processing to obtain a target synthetic image; fusing the target synthetic image and the segmentation characteristic image to obtain a target fusion characteristic image; and inputting the target fusion characteristic image into the target image classification network for classification processing to obtain first target category information of the image to be classified. By using the image classification method and device, image classification accuracy can be improved and the error rate reduced.

Description

Image classification method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an image classification method and apparatus, an electronic device, and a storage medium.
Background
Artificial Intelligence (AI) is a comprehensive discipline that covers a wide range of fields and involves both hardware and software technologies. Classification processing based on artificial intelligence technology plays an important role in many fields, such as video surveillance and public safety.
In the related art, because constructing training samples is costly and difficult, zero-sample (zero-shot) classification schemes that automatically synthesize image pixel features from the word vectors of unknown classes have become popular in the industry. However, such zero-sample classification introduces only text information into the training of the image classification network, so the trained image classification network suffers from low classification accuracy and is prone to errors.
Disclosure of Invention
The present disclosure provides an image classification method and apparatus, an electronic device and a storage medium, which at least solve the problems of low image classification accuracy and high error rates in the related art. The technical solution of the present disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided an image classification method, including:
acquiring a segmentation characteristic image and scene information of an image to be classified;
inputting the segmentation characteristic image into a target image classification network for classification processing to obtain initial category information;
generating target splicing characteristic information based on the scene information and the initial category information;
inputting the target splicing characteristic information into a target generation network for image synthesis processing to obtain a target synthesis image;
fusing the target synthetic image and the segmentation characteristic image to obtain a target fusion characteristic image;
and inputting the target fusion characteristic image into the target image classification network for classification processing to obtain first target class information of the image to be classified.
Optionally, the method further includes:
acquiring a first category confidence corresponding to the initial category information;
acquiring a second category confidence corresponding to the first target category information;
and determining second target category information according to the first category confidence and the second category confidence.
Optionally, the fusing the target synthesized image and the segmentation feature image to obtain a target fusion feature image includes:
comparing feature values of corresponding pixel points between the target synthetic image and the segmentation characteristic image;
selecting, according to the comparison result, target characteristic information corresponding to each pixel point of the image to be classified from the target synthetic image and the segmentation characteristic image;
generating the target fusion characteristic image based on the target characteristic information;
or, alternatively,
and overlapping the characteristic values of corresponding pixel points in the target synthetic image and the segmentation characteristic image to obtain the target fusion characteristic image.
Optionally, the generating target splicing feature information based on the scene information and the initial category information includes:
respectively inputting the scene information and the initial category information into a target word vector network to obtain scene characteristic information corresponding to the scene information and category characteristic information corresponding to the initial category information;
and splicing the category characteristic information and the scene characteristic information to obtain target splicing characteristic information.
Optionally, the method further includes:
acquiring target category characteristic information and associated scene characteristic information of the target category characteristic information, wherein the target category characteristic information represents category characteristics of a training sample and a prediction sample;
splicing the target category characteristic information and the associated scene characteristic information to obtain first spliced characteristic information;
inputting the first splicing characteristic information into an initial generation network for image synthesis processing to obtain a first sample synthetic image;
inputting the first sample synthetic image into an initial discrimination network for authenticity discrimination to obtain a first image discrimination result;
inputting the first sample synthetic image into an initial image classification network for classification processing to obtain first sample class information;
and training the initial image classification network based on the first image discrimination result, the first sample class information and the target class characteristic information to obtain the target image classification network.
Optionally, the obtaining of the associated scene feature information includes:
acquiring a scene image set, inputting the scene image set into a scene recognition network for scene recognition to obtain a scene information set;
inputting the scene information set into a target word vector network to obtain a scene characteristic information set;
calculating the similarity between the target category characteristic information and the scene characteristic information in the scene characteristic information set;
determining the associated scene feature information from the set of scene feature information based on the similarity.
Optionally, the method further includes:
acquiring the training sample, training scene characteristic information of the training sample and training category characteristic information of the training sample;
inputting the training sample into a feature extraction network of a segmentation network to be trained for feature extraction to obtain a sample segmentation feature image;
splicing the training category characteristic information and the training scene characteristic information to obtain second spliced characteristic information;
inputting the second splicing characteristic information into a to-be-trained generation network for image synthesis processing to obtain a second sample synthetic image;
fusing the second sample synthetic image and the sample segmentation characteristic image to obtain a sample fusion characteristic image;
inputting the second sample synthetic image, the sample segmentation feature image and the sample fusion feature image into a classification network of the segmentation network to be trained, and performing classification processing respectively to obtain second sample category information corresponding to the second sample synthetic image, third sample category information corresponding to the sample segmentation feature image and fourth sample category information corresponding to the sample fusion feature image;
inputting the sample segmentation characteristic image, the second sample composite image and the sample fusion characteristic image into a to-be-trained discrimination network, and performing authenticity discrimination respectively to obtain a second image discrimination result corresponding to the sample segmentation characteristic image, a third image discrimination result corresponding to the second sample composite image and a fourth image discrimination result corresponding to the sample fusion characteristic image;
training the segmentation network to be trained, the generation network to be trained and the discrimination network to be trained based on the second sample synthesized image, the sample segmentation feature image, the second sample category information, the third sample category information, the fourth sample category information, the training category feature information, the second image discrimination result, the third image discrimination result and the fourth image discrimination result to obtain the initial image segmentation network, the initial generation network and the initial discrimination network;
wherein the initial image segmentation network comprises the initial image classification network.
Optionally, the initial image segmentation network further includes a target feature extraction network;
the acquiring of the segmentation feature image of the image to be classified comprises:
and inputting the image to be classified into the target feature extraction network for feature extraction to obtain the segmentation feature image.
According to a second aspect of the embodiments of the present disclosure, there is provided an image classification apparatus including:
the data acquisition module is configured to acquire a segmentation characteristic image and scene information of an image to be classified;
the first classification processing module is configured to input the segmentation characteristic image into a target image classification network for classification processing to obtain initial classification information;
a target splicing characteristic information generation module configured to perform generation of target splicing characteristic information based on the scene information and the initial category information;
the first image synthesis processing module is configured to input the target splicing characteristic information into a target generation network for image synthesis processing to obtain a target synthesis image;
the first fusion processing module is configured to perform fusion processing on the target synthetic image and the segmentation characteristic image to obtain a target fusion characteristic image;
and the second classification processing module is configured to execute the step of inputting the target fusion characteristic image into the target image classification network for classification processing, so as to obtain first target class information of the image to be classified.
Optionally, the apparatus further comprises:
a first category confidence obtaining module configured to perform obtaining of a first category confidence corresponding to the initial category information;
a second category confidence obtaining module configured to perform obtaining of a second category confidence corresponding to the first target category information;
a target category information determination module configured to perform determining second target category information according to the first category confidence and the second category confidence.
Optionally, the first fusion processing module includes:
a characteristic value comparison unit configured to compare feature values of corresponding pixel points between the target synthetic image and the segmentation characteristic image;
a target characteristic information selecting unit configured to select, according to the comparison result, target characteristic information corresponding to each pixel point of the image to be classified from the target synthetic image and the segmentation characteristic image;
a target fusion characteristic image generation unit configured to generate the target fusion characteristic image based on the target characteristic information;
or, alternatively,
and the superposition processing unit is configured to perform superposition processing on the feature values of the corresponding pixel points in the target synthetic image and the segmentation feature image to obtain the target fusion feature image.
Optionally, the target splicing characteristic information generating module includes:
a feature information obtaining unit configured to perform input of the scene information and the initial category information into a target word vector network, respectively, to obtain scene feature information corresponding to the scene information and category feature information corresponding to the initial category information;
and the splicing processing unit is configured to perform splicing processing on the category characteristic information and the scene characteristic information to obtain target splicing characteristic information.
Optionally, the apparatus further comprises:
the characteristic information acquisition module is configured to acquire target category characteristic information and associated scene characteristic information of the target category characteristic information, wherein the target category characteristic information represents category characteristics of a training sample and a prediction sample;
the first splicing processing module is configured to perform splicing processing on the target category characteristic information and the associated scene characteristic information to obtain first splicing characteristic information;
the second image synthesis processing module is configured to input the first splicing characteristic information into an initial generation network for image synthesis processing to obtain a first sample synthesis image;
a first authenticity discrimination module configured to input the first sample synthetic image into an initial discrimination network for authenticity discrimination to obtain a first image discrimination result;
a third classification processing module configured to perform classification processing of the first sample synthesized image input to an initial image classification network, so as to obtain first sample classification information;
a first network training module configured to perform training of the initial image classification network based on the first image discrimination result, the first sample class information, and the target class feature information, so as to obtain the target image classification network.
Optionally, the characteristic information obtaining module includes:
a scene image set acquisition unit configured to perform acquisition of a scene image set;
a scene recognition unit configured to perform scene recognition by inputting the scene image set into a scene recognition network, resulting in a scene information set;
the scene characteristic information set acquisition unit is configured to input the scene information set into a target word vector network to obtain a scene characteristic information set;
a similarity calculation unit configured to perform calculation of a similarity between the target category feature information and scene feature information in the scene feature information set;
an associated scene feature information determination unit configured to perform determination of the associated scene feature information from the scene feature information set based on the similarity.
Optionally, the apparatus further comprises:
a training data acquisition module configured to perform acquisition of the training sample, training scene feature information of the training sample, and training category feature information of the training sample;
the characteristic extraction module is configured to input the training sample into a characteristic extraction network of a segmentation network to be trained for characteristic extraction to obtain a sample segmentation characteristic image;
the second splicing processing module is configured to perform splicing processing on the training category characteristic information and the training scene characteristic information to obtain second splicing characteristic information;
the third image synthesis processing module is configured to input the second splicing characteristic information into a to-be-trained generation network for image synthesis processing to obtain a second sample synthetic image;
the second fusion processing module is configured to perform fusion processing on the second sample composite image and the sample segmentation feature image to obtain a sample fusion feature image;
a fourth classification processing module, configured to execute inputting the second sample composite image, the sample segmentation feature image, and the sample fusion feature image into a classification network of the segmentation network to be trained, and perform classification processing respectively to obtain second sample class information corresponding to the second sample composite image, third sample class information corresponding to the sample segmentation feature image, and fourth sample class information corresponding to the sample fusion feature image;
a second authenticity discrimination module configured to input the sample segmentation feature image, the second sample composite image and the sample fusion feature image into a to-be-trained discrimination network for authenticity discrimination, respectively, so as to obtain a second image discrimination result corresponding to the sample segmentation feature image, a third image discrimination result corresponding to the second sample composite image and a fourth image discrimination result corresponding to the sample fusion feature image;
a second network training module configured to perform training of the segmented network to be trained, the generated network to be trained, and the discriminant network to be trained based on the second sample composite image, the sample segmentation feature image, the second sample class information, the third sample class information, the fourth sample class information, the training class feature information, the second image discriminant result, the third image discriminant result, and the fourth image discriminant result, to obtain the initial image segmented network, the initial generated network, and the initial discriminant network;
wherein the initial image segmentation network comprises the initial image classification network.
Optionally, the initial image segmentation network further includes a target feature extraction network;
the data acquisition module comprises:
and the feature extraction unit is configured to input the image to be classified into the target feature extraction network for feature extraction, so as to obtain the segmentation feature image.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the image classification method according to the first aspect of the embodiments of the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the image classification method according to the first aspect of the embodiments of the present disclosure.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the image classification method according to the first aspect of the embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
Classification is first performed according to the segmentation characteristic image of the image to be classified to obtain initial category information. Target splicing characteristic information containing both the category information and the scene information of the image to be classified is then generated, and a synthetic image corresponding to the image to be classified is generated from the target splicing characteristic information; this constrains the scenes in which the objects in the image to be classified can appear, so that a target synthetic image that accurately represents the category information and scene information of the objects in the image to be classified is obtained. The target synthetic image and the segmentation characteristic image are then fused to obtain a target fusion characteristic image that better represents the characteristic information of the image to be classified, and classification is performed a second time based on the target fusion characteristic image. In this way, image classification accuracy can be greatly improved and the error rate reduced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic diagram illustrating an application environment in accordance with an illustrative embodiment;
FIG. 2 is a flow diagram illustrating a method of image classification according to an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a pre-trained target image classification network according to an exemplary embodiment;
FIG. 4 is a flow diagram illustrating a method of obtaining associated scene feature information in accordance with an exemplary embodiment;
FIG. 5 is a flow diagram illustrating pre-training an initial image segmentation network, an initial generation network, and an initial discrimination network in accordance with an exemplary embodiment;
FIG. 6 is a flow diagram illustrating a process for fusing a target composite image and a segmented feature image to obtain a target fused feature image in accordance with an illustrative embodiment;
FIG. 7 is a flow diagram illustrating another method of image classification according to an exemplary embodiment;
FIG. 8 is a block diagram illustrating an image classification device according to an exemplary embodiment;
FIG. 9 is a block diagram illustrating an electronic device for image classification in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment according to an exemplary embodiment, which may include a server 01 and a terminal 02, as shown in fig. 1.
In an alternative embodiment, server 01 may be used to train a target image classification network for the classification process. Specifically, the server 01 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like.
In an alternative embodiment, the terminal 02 may perform classification processing in conjunction with the target image classification network trained by the server 01. Specifically, the terminal 02 may include, but is not limited to, a smart phone, a desktop computer, a tablet computer, a notebook computer, a smart speaker, a digital assistant, an Augmented Reality (AR)/Virtual Reality (VR) device, a smart wearable device, and other types of electronic devices. Optionally, the operating system running on the electronic device may include, but is not limited to, Android, iOS, Linux, Windows, and the like.
In addition, it should be noted that fig. 1 shows only one application environment provided by the present disclosure, and in practical applications, other application environments may also be included, for example, training of a target image classification network may also be implemented on the terminal 02.
In the embodiment of the present specification, the server 01 and the terminal 02 may be directly or indirectly connected through a wired or wireless communication method, and the disclosure is not limited herein.
Fig. 2 is a flowchart illustrating an image classification method according to an exemplary embodiment, and as shown in fig. 2, the image classification method may be applied to an electronic device such as a server, a terminal, an edge computing node, and the like, and includes the following steps.
In step S201, a segmentation feature image and scene information of an image to be classified are acquired.
In the embodiments of the present specification, the image to be classified may be an image that needs to be classified. Specifically, the image to be classified may include an object (i.e., an object to be classified). Optionally, the object may be an entity in the image (e.g., a person, an animal, a thing, a building, etc.); for example, for an image containing a cat, the object may be the cat.
In an optional embodiment, a preset feature extraction network may be combined to perform feature extraction on an image to be classified to obtain a segmented feature image.
In an optional embodiment, the image to be classified may be input to a scene recognition network for scene recognition, so as to obtain scene information.
In a specific embodiment, an image with a scene mark may be used as training data to train a preset deep learning network, so as to obtain a scene recognition network capable of performing scene recognition. Correspondingly, the image to be classified is input into the scene recognition network for scene recognition, and scene information of the image to be classified can be obtained.
Optionally, the preset deep learning network may include, but is not limited to, a convolutional neural network, a logistic regression neural network, a recurrent neural network, and other deep learning networks.
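By way of illustration only, the following Python sketch shows one possible form of such a scene recognition network: a small convolutional classifier trained on scene-labelled images. The layer sizes, the number of scene classes and the training hyperparameters are assumptions made for this example and are not specified by the present disclosure.

    import torch
    import torch.nn as nn

    NUM_SCENES = 20  # assumed number of scene classes

    class SceneRecognitionNet(nn.Module):
        def __init__(self, num_scenes: int = NUM_SCENES):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(64, num_scenes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, 3, H, W) image tensor; returns per-scene logits
            return self.classifier(self.features(x).flatten(1))

    # Standard supervised training step on scene-labelled images
    model = SceneRecognitionNet()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    images = torch.randn(4, 3, 224, 224)            # placeholder batch
    scene_labels = torch.randint(0, NUM_SCENES, (4,))
    loss = criterion(model(images), scene_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()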
In step S203, the segmentation feature image is input into a target image classification network for classification processing to obtain initial category information.
In an alternative embodiment, the target image classification network may be trained in advance, and accordingly, the method may further include the step of training the target image classification network in advance, and specifically, as shown in fig. 3, the method may include the following steps:
in step S301, target category feature information and associated scene feature information of the target category feature information are acquired.
In the embodiments of the present specification, the target category feature information may represent the category features of training samples and prediction samples. In a specific embodiment, the category features of the training samples are a large number of known category features, that is, the category features of the training samples used to train the target image classification network; the category features of the prediction samples are a large number of unknown category features, that is, the category features of images that did not participate in training the target image classification network. Accordingly, the training samples may include a large number of sample images used to train the target image classification network, and the prediction samples may include a large number of images that did not participate in training but that the trained target image classification network needs to classify (i.e., zero samples).
In an optional embodiment, the obtaining of the target category feature information includes:
acquiring class information of a training sample and a prediction sample;
and inputting the category information into a target word vector network to obtain target category characteristic information.
In the embodiment of the present specification, during training, although the prediction sample is not obtained, the category information of the image that needs to be segmented by the target image classification network in practical application may be obtained as the category information of the prediction sample in combination with the practical application requirements.
In a specific embodiment, the category information may be a category of an object included in a plurality of images (i.e., training samples or prediction samples), for example, a cat (object) is included in one image, and accordingly, the category information of the image is the cat.
In a specific embodiment, the target word vector network may be obtained by training a preset word vector network based on preset training text information. In an alternative embodiment, the preset training text information may be text information related to an application field of the target image classification network.
Optionally, in the process of training the target word vector network, word segmentation may be performed on the preset training text information, and the resulting words are input into the word vector network for training; during training, each word may be mapped to a K-dimensional real-valued vector, so that a word vector set representing the degree of semantic association between words is obtained together with the target word vector network. Optionally, when category information (a word) is subsequently input into the target word vector network, the network may determine the word vector of the category information based on the word vectors in the word vector set and use this word vector as the target category feature information corresponding to the category information.
In an alternative embodiment, the preset word vector network may include, but is not limited to, word2vec, fasttext, glove, etc.
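As a non-limiting illustration, the following sketch uses the gensim word2vec implementation as the preset word vector network. The word-segmented training corpus and the vector dimension K = 50 are assumptions made for this example.

    from gensim.models import Word2Vec

    # Hypothetical, already word-segmented stand-in for the preset training text information
    segmented_corpus = [
        ["cat", "sits", "on", "the", "sofa", "in", "the", "bedroom"],
        ["camel", "walks", "across", "the", "desert"],
        ["fish", "swims", "in", "the", "pond"],
    ]

    # Each word is mapped to a K-dimensional real-valued vector (K = vector_size)
    word_vector_net = Word2Vec(sentences=segmented_corpus, vector_size=50,
                               window=3, min_count=1, epochs=50)

    # Category information (a word) -> category feature information (its word vector)
    category_feature = word_vector_net.wv["cat"]
    print(category_feature.shape)  # (50,)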
In the above embodiment, the training data of the initial image classification network is obtained by obtaining the corresponding class characteristics of the training sample and the prediction sample, so that the recognition capability of the trained target image classification network on unknown classes can be improved, and the classification accuracy is further greatly improved.
In a specific embodiment, as shown in fig. 4, fig. 4 is a flowchart illustrating a method for obtaining associated scene feature information according to an exemplary embodiment, and specifically, the method may include the following steps:
in step S401, a scene image set is acquired.
In step S403, inputting the scene image set into a scene recognition network for scene recognition, so as to obtain a scene information set;
in step S405, inputting a scene information set into a target word vector network to obtain a scene feature information set;
in step S407, calculating a similarity between the target category feature information and the scene feature information in the scene feature information set;
in step S409, associated scene feature information is determined from the scene feature information set based on the similarity.
In a specific embodiment, the scene image set may include a plurality of images, each corresponding to a scene. The scene information set may include the scene information corresponding to the images in the scene image set; for example, for an image taken in a bedroom, the scene information is the bedroom, and for an image of fish in a pond, the scene information may be the pond.
In a specific embodiment, the scene recognition network may be combined, and the scene image set is input to the scene recognition network for scene recognition, so as to obtain a scene information set corresponding to the images in the scene image set.
In an alternative embodiment, the scene information (words) in the scene information set is input into a target word vector network, and the target word vector network may determine word vectors of the scene information based on the word vectors in the word vector set, and use the word vectors of the scene information as scene feature information corresponding to the scene information.
In an optional embodiment, the target word vector network used to obtain the scene feature information set and the target word vector network used to obtain the target category feature information are the same word vector network, that is, a network trained on the same preset training text information, which can improve the accuracy of representing the degree of semantic association between the scene information and the category information.
In an optional embodiment, the similarity between the target category characteristic information and the scene characteristic information may represent a semantic similarity between words (category information and scene information) corresponding to the target category characteristic information and the scene characteristic information; specifically, the higher the similarity between the target category characteristic information and the scene characteristic information is, the higher the semantic similarity between the words corresponding to the target category characteristic information and the scene characteristic information is; conversely, the lower the similarity between the target category feature information and the scene feature information, the lower the semantic similarity between the words corresponding to the target category feature information and the scene feature information.
In an alternative embodiment, the similarity between the object class feature information and the scene feature information may include, but is not limited to, a cosine distance, a euclidean distance, and a manhattan distance between the object class feature information and the scene feature information.
In an optional embodiment, the target category feature information may include category feature information (word vectors) corresponding to a plurality of pieces of category information. Accordingly, for each piece of category information, the top-N pieces of scene feature information ranked by similarity with the category feature information corresponding to that category information may be selected as initially selected scene feature information, and one piece of scene feature information may be randomly selected from the initially selected scene feature information as the associated scene feature information of that category feature information.
Optionally, for each piece of category information, the scene feature information whose similarity with the corresponding category feature information is greater than or equal to a preset threshold may also be selected as the initially selected scene feature information, and one piece of scene feature information is randomly selected from the initially selected scene feature information as the associated scene feature information of that category feature information.
In this embodiment of the present specification, the preset threshold and N may be set according to actual application requirements.
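The following sketch illustrates how associated scene feature information might be selected for one piece of category feature information using cosine similarity and the top-N rule described above. The vector dimensions, the value of N and the random placeholder vectors are assumptions made for this example.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def pick_associated_scene(category_vec: np.ndarray,
                              scene_vecs: np.ndarray,
                              n: int = 3,
                              rng: np.random.Generator = np.random.default_rng(0)) -> np.ndarray:
        # Similarity between the category feature information and each scene feature vector
        sims = np.array([cosine_similarity(category_vec, s) for s in scene_vecs])
        # Keep the top-N most similar scene feature vectors as initially selected candidates
        top_n_idx = np.argsort(sims)[::-1][:n]
        # Randomly select one candidate as the associated scene feature information
        return scene_vecs[rng.choice(top_n_idx)]

    category_vec = np.random.rand(50)
    scene_vecs = np.random.rand(10, 50)   # scene feature information set (10 scenes)
    associated_scene_vec = pick_associated_scene(category_vec, scene_vecs)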
In the above embodiment, the prediction of the scene where a certain class of objects appears can be realized by obtaining the associated scene feature information of the target class feature information, and further, when it is ensured that the picture pixel features are automatically synthesized based on the word vectors of unknown classes or known classes, the limitation of the scene where the classes appear can be increased, so that the training of the image classification network is more focused on the synthesis of the image pixel features in a specific scene.
In step S303, the target category feature information and the associated scene feature information are subjected to stitching processing to obtain first stitching feature information.
In a specific embodiment, the splicing processing of the target category characteristic information and the associated scene characteristic information may include splicing the category characteristic information corresponding to each piece of category information in the target category characteristic information with the associated scene characteristic information of that category characteristic information. For example, if the category characteristic information corresponding to a certain piece of category information is [1, 2, 3, 4, 5] and its associated scene characteristic information is [6, 7, 8, 9, 0], the first splicing characteristic information corresponding to that category characteristic information may optionally be [1, 2, 3, 4, 5, 6, 7, 8, 9, 0] or [6, 7, 8, 9, 0, 1, 2, 3, 4, 5].
In an optional embodiment, in order to improve the accuracy of feature extraction in the zero sample learning process, a training sample, training scene feature information of the training sample, and training category feature information of the training sample may be combined to perform pre-training, and accordingly, as shown in fig. 5, the method may further include:
s501: acquiring training samples, training scene characteristic information of the training samples and training category characteristic information of the training samples;
in this embodiment of the present specification, the training scenario feature information may be a word vector of the training sample corresponding to the scenario information; in an optional embodiment, the specific refining step for obtaining the training scene feature information of the training sample may refer to the specific refining step for obtaining the scene feature information set of the scene image set, which is not described herein again.
In this embodiment of the present specification, the training category feature information of the training sample may be a word vector of the training sample corresponding to the category information. In an optional embodiment, the specific refining step for obtaining the training class feature information of the training sample may refer to the above-mentioned refining step for obtaining the target class feature information, and is not described herein again.
S503: and inputting the training sample into a feature extraction network of the segmentation network to be trained for feature extraction to obtain a sample segmentation feature image.
In an alternative embodiment, the segmentation network to be trained may include DeepLab (a semantic image segmentation network), but the embodiments of the present specification are not limited thereto; in practical applications, other deep learning networks may also be used.
In a particular embodiment, the segmented network to be trained may include a feature extraction network and a classification network. In this embodiment, the feature extraction network may be configured to extract feature information of an image (training sample), and input the training sample into the feature extraction network of the segmentation network to be trained to perform feature extraction, so as to obtain a sample segmentation feature image.
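As an illustrative sketch only, the feature extraction step could be realized with the backbone of the torchvision DeepLabV3 implementation. The choice of deeplabv3_resnet50 and the input resolution are assumptions made for this example rather than the specific network of the disclosure.

    import torch
    from torchvision.models.segmentation import deeplabv3_resnet50

    # Segmentation network whose backbone serves as the feature extraction network;
    # weights are left uninitialized here (no pretrained download).
    seg_net = deeplabv3_resnet50(weights=None, weights_backbone=None, num_classes=21)

    training_sample = torch.randn(1, 3, 512, 512)          # placeholder training image
    with torch.no_grad():
        backbone_out = seg_net.backbone(training_sample)    # feature extraction only
        sample_seg_feature = backbone_out["out"]             # (1, 2048, 64, 64) feature map
    print(sample_seg_feature.shape)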
S505: and splicing the training category characteristic information and the training scene characteristic information to obtain second spliced characteristic information.
In an optional embodiment, the specific step of performing the splicing processing on the training category characteristic information and the training scene characteristic information to obtain the second spliced characteristic information may refer to the step of performing the splicing processing on the target category characteristic information and the associated scene characteristic information, which is not described herein again.
S507: and inputting the second splicing characteristic information into a to-be-trained generation network for image synthesis processing to obtain a second sample synthetic image.
In an alternative embodiment, the generation network to be trained may be the generator in a GAN (Generative Adversarial Network). The second splicing characteristic information is input into the generation network to be trained for image synthesis processing, so as to obtain a second sample synthetic image.
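A minimal generator sketch is given below, mapping the second splicing feature information (two concatenated word vectors) to a synthetic feature image. The added noise vector is common GAN practice assumed for this example, and the layer sizes and output resolution are illustrative assumptions not prescribed by the disclosure.

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        def __init__(self, splice_dim: int = 100, noise_dim: int = 64, out_channels: int = 256):
            super().__init__()
            self.fc = nn.Linear(splice_dim + noise_dim, 128 * 8 * 8)
            self.deconv = nn.Sequential(
                nn.ConvTranspose2d(128, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(128, out_channels, kernel_size=4, stride=2, padding=1),
            )

        def forward(self, splice_feat: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
            x = self.fc(torch.cat([splice_feat, noise], dim=1))
            x = x.view(-1, 128, 8, 8)
            return self.deconv(x)          # (batch, out_channels, 32, 32) synthetic feature image

    # Splicing feature = concatenation of a category word vector and a scene word vector (50 + 50)
    splice_feat = torch.randn(4, 100)
    noise = torch.randn(4, 64)
    second_sample_synth = Generator()(splice_feat, noise)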
In practical applications, camels often appear in desert scenes, and fish often appear in scenes such as oceans and ponds; that is, the scenes in which most objects appear are limited. In the above embodiment, after the training category feature information and the training scene feature information of the training sample are spliced, the obtained second splicing feature information is used to synthesize the composite image corresponding to the training sample. This adds a constraint on the scenes in which the object corresponding to the training sample can appear, so that a second sample synthetic image capable of accurately representing the object category information and scene information can be obtained, which greatly improves the feature mapping capability for the training sample.
S509: and carrying out fusion processing on the second sample synthetic image and the sample segmentation characteristic image to obtain a sample fusion characteristic image.
In an optional embodiment, the fusing the second sample synthesized image and the sample segmented feature image to obtain a sample fused feature image may include:
comparing the characteristic value of the corresponding pixel points of the second sample synthetic image and the sample segmentation characteristic image;
according to the comparison result, selecting sample characteristic information corresponding to each pixel point of the training sample from the second sample synthetic image and the sample segmentation characteristic image;
and generating a sample fusion feature image based on the sample feature information.
In a specific embodiment, for each sample image in the training sample, the feature values of corresponding pixel points in the second sample synthetic image and the sample segmentation feature image may be compared. Optionally, when the comparison result indicates that the feature value of a pixel point in the second sample synthetic image is greater than the feature value of that pixel point in the sample segmentation feature image, the feature value in the second sample synthetic image may be used as the sample feature information corresponding to that pixel point; correspondingly, when the comparison result indicates that the feature value of a pixel point in the second sample synthetic image is smaller than the feature value of that pixel point in the sample segmentation feature image, the feature value in the sample segmentation feature image may be used as the sample feature information corresponding to that pixel point. Further, the sample fusion feature image corresponding to each sample image may be generated based on the sample feature information corresponding to all pixel points of that sample image.
In addition, when it is determined that the feature value of a certain pixel point on the second sample composite image is equal to the feature value of the pixel point on the sample segmentation feature image according to the comparison result, the feature value can be used as sample feature information corresponding to the pixel point.
In an optional embodiment, the fusing the second sample synthesized image and the sample segmented feature image to obtain a sample fused feature image may include:
and overlapping the characteristic values of the pixel points corresponding to the second sample synthetic image and the sample segmentation characteristic image to obtain a sample fusion characteristic image.
Specifically, the feature values of the second sample composite image and the sample segmentation feature image corresponding to each sample image on the corresponding pixel points may be superimposed to obtain a sample fusion feature image corresponding to each sample image.
In addition, it should be noted that, in practical applications, the second sample synthetic image and the sample segmentation feature image may also be fused in other manners, and the embodiments of the present specification are not limited to the above.
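The two fusion strategies described above can be summarized in the following sketch, assuming the second sample synthetic image and the sample segmentation feature image share the same shape (the shapes used here are placeholders).

    import torch

    synth = torch.randn(1, 256, 32, 32)      # second sample synthetic image
    seg_feat = torch.randn(1, 256, 32, 32)   # sample segmentation feature image

    # Strategy 1: per-pixel comparison, keeping the larger feature value at each position
    fused_max = torch.maximum(synth, seg_feat)

    # Strategy 2: superposing (adding) the feature values of corresponding pixel points
    fused_sum = synth + seg_feat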
In the embodiment, the second sample synthetic image and the sample segmentation characteristic image which can accurately represent the object class information and the scene information are fused, so that the extracted characteristic information of the training sample is effectively increased.
S511: and inputting the second sample synthetic image, the sample segmentation characteristic image and the sample fusion characteristic image into a classification network of a segmentation network to be trained, and performing classification processing respectively to obtain third sample class information corresponding to the second sample class information sample segmentation characteristic image corresponding to the second sample synthetic image and fourth sample class information corresponding to the sample fusion characteristic image.
In this embodiment of the present specification, the second sample composite image may include a composite image corresponding to each sample image in the training sample, and accordingly, the second sample category information corresponding to each composite image may represent the prediction category feature information of the composite image; optionally, the sample segmentation feature image may include image feature information corresponding to each sample image in the training sample; correspondingly, the third sample category information corresponding to each image feature information here can represent the prediction category feature information of the image feature information; optionally, the sample fusion feature image may include a fusion feature image corresponding to each sample image in the training sample, and correspondingly, the fourth sample category information corresponding to each fusion feature image may represent prediction category feature information of the fusion feature image.
S513: and inputting the sample segmentation characteristic image, the second sample composite image and the sample fusion characteristic image into a to-be-trained discrimination network, and performing authenticity discrimination respectively to obtain a second image discrimination result corresponding to the sample segmentation characteristic image, a third image discrimination result corresponding to the second sample composite image and a fourth image discrimination result corresponding to the sample fusion characteristic image.
In an alternative embodiment, the discrimination network to be trained may be the discriminator in a GAN. In the embodiments of the present specification, the second image discrimination result corresponding to the sample segmentation feature image may represent the predicted probability that the sample segmentation feature image is a real image; the third image discrimination result corresponding to the second sample synthetic image may represent the predicted probability that the second sample synthetic image is a real image; and the fourth image discrimination result corresponding to the sample fusion feature image may represent the predicted probability that the sample fusion feature image is a real image. In the embodiments of the present specification, a real image may be a non-synthesized image.
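For illustration, a minimal discriminator sketch that outputs the predicted probability that an input feature image is real is given below; the layer sizes and input channel count are assumptions made for this example.

    import torch
    import torch.nn as nn

    class Discriminator(nn.Module):
        def __init__(self, in_channels: int = 256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(128, 1),
                nn.Sigmoid(),              # predicted probability that the input is a real image
            )

        def forward(self, feature_image: torch.Tensor) -> torch.Tensor:
            return self.net(feature_image)

    disc = Discriminator()
    p_real = disc(torch.randn(1, 256, 32, 32))   # e.g. a third image discrimination result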
S515: and training a segmentation network to be trained, a generation network to be trained and a discrimination network to be trained based on the second sample synthesized image, the sample segmentation feature image, the second sample class information, the third sample class information, the fourth sample class information, the training class feature information, the second image discrimination result, the third image discrimination result and the fourth image discrimination result to obtain an initial image segmentation network, an initial generation network and an initial discrimination network.
Specifically, the initial image segmentation network may include an initial image classification network and a target feature extraction network.
In an optional embodiment, the obtained segmented feature image of the image to be classified may also be obtained by inputting the image to be classified into a target feature extraction network for feature extraction, so as to obtain the segmented feature image.
In the above embodiment, the feature extraction of the image to be classified is performed by combining with the target feature extraction network in the initial image segmentation network, so that the feature information of the image to be classified can be better extracted.
In a specific embodiment, the training the segmentation network to be trained, the generation network to be trained, and the discrimination network to be trained based on the second sample synthesized image, the sample segmentation feature image, the second sample class information, the third sample class information, the fourth sample class information, the training class feature information, the second image discrimination result, the third image discrimination result, and the fourth image discrimination result to obtain the initial image segmentation network, the initial generation network, and the initial discrimination network may include:
calculating content loss by using the second sample synthesis image and the sample segmentation characteristic image;
calculating second segmentation loss by using the second sample class information, the third sample class information, the fourth sample class information and the training class characteristic information;
calculating a second discrimination loss using the second image discrimination result, the third image discrimination result and the fourth image discrimination result;
determining a second target loss according to the content loss, the second judgment loss and the second segmentation loss;
under the condition that the second target loss does not meet a second preset condition, updating network parameters in the segmentation network to be trained, the generation network to be trained and the judgment network to be trained;
and updating the second target loss based on the updated segmentation network to be trained, the updated generation network to be trained and the updated discrimination network to be trained until the second target loss meets the second preset condition, and then taking the current segmentation network to be trained as the initial image segmentation network, the current generation network to be trained as the initial generation network, and the current discrimination network to be trained as the initial discrimination network.
In a particular embodiment, the content loss may reflect the difference between the second sample composite image generated by the generation network to be trained and the sample segmentation feature image. In a specific embodiment, the content loss may be a similarity distance between the second sample composite image corresponding to the sample image in the training sample and the sample segmentation feature image. In an alternative embodiment, the similarity distance between the second sample composite image and the sample segmentation feature image may include, but is not limited to, a cosine distance, a Euclidean distance, or a Manhattan distance between the two images. In an alternative embodiment, the content loss value is proportional to the difference between the second sample composite image and the sample segmentation feature image; accordingly, the smaller the content loss value, the better the performance of the initial generation network obtained by training.
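For illustration only, the following is a minimal sketch in Python/PyTorch of one way the content loss described above could be computed; the tensor shapes, the selectable distances, and the function name content_loss are assumptions for this example rather than details from the embodiment.

```python
import torch
import torch.nn.functional as F

def content_loss(composite: torch.Tensor, seg_feature: torch.Tensor,
                 distance: str = "euclidean") -> torch.Tensor:
    """Similarity distance between the second sample composite image and the
    sample segmentation feature image; both assumed to have shape (N, C, H, W)."""
    diff = (composite - seg_feature).flatten(1)  # (N, C*H*W)
    if distance == "euclidean":
        return diff.norm(p=2, dim=1).mean()
    if distance == "manhattan":
        return diff.norm(p=1, dim=1).mean()
    if distance == "cosine":
        # Cosine distance = 1 - cosine similarity of the flattened feature maps.
        sim = F.cosine_similarity(composite.flatten(1), seg_feature.flatten(1), dim=1)
        return (1.0 - sim).mean()
    raise ValueError(f"unknown distance: {distance}")
```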
In a specific embodiment, calculating the second segmentation loss using the second sample class information, the third sample class information, the fourth sample class information, and the training class feature information may include calculating, based on a preset loss function, a first segmentation sub-loss between the second sample class information and the training class feature information, a second segmentation sub-loss between the third sample class information and the training class feature information, and a third segmentation sub-loss between the fourth sample class information and the training class feature information; the first, second, and third segmentation sub-losses are then weighted to obtain the second segmentation loss. The weights of the three segmentation sub-losses may be set in combination with actual application requirements.
Specifically, the first segmentation sub-loss can represent the difference between each pixel point of the second sample composite image and each pixel point of the training category feature information; the second segmentation sub-loss can represent the difference between each pixel point of the sample segmentation characteristic image and each pixel point of the training class characteristic information, and the third segmentation sub-loss can represent the difference between each pixel point of the sample fusion characteristic image and each pixel point of the training class characteristic information.
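As a hedged illustration, the weighted combination of the three segmentation sub-losses might be sketched as follows; the use of a per-pixel cross-entropy loss against a class label map, the tensor shapes, and the placeholder weights are assumptions, since the embodiment only requires a preset loss function and application-dependent weights.

```python
import torch
import torch.nn.functional as F

def second_segmentation_loss(pred_composite, pred_seg, pred_fused, class_labels,
                             weights=(1.0, 1.0, 1.0)):
    """pred_*: (N, num_classes, H, W) class logits predicted for the second sample
    composite image, the sample segmentation feature image, and the sample fusion
    feature image; class_labels: (N, H, W) int64 per-pixel training class labels."""
    l1 = F.cross_entropy(pred_composite, class_labels)  # first segmentation sub-loss
    l2 = F.cross_entropy(pred_seg, class_labels)        # second segmentation sub-loss
    l3 = F.cross_entropy(pred_fused, class_labels)      # third segmentation sub-loss
    w1, w2, w3 = weights                                # placeholder weights
    return w1 * l1 + w2 * l2 + w3 * l3
```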
In a specific embodiment, calculating the second discrimination loss using the second image discrimination result, the third image discrimination result, and the fourth image discrimination result may include calculating, based on a preset loss function, a first discrimination sub-loss between the second image discrimination result and the authenticity label corresponding to the sample segmentation feature image, a second discrimination sub-loss between the third image discrimination result and the authenticity label corresponding to the second sample composite image, and a third discrimination sub-loss between the fourth image discrimination result and the authenticity label corresponding to the sample fusion feature image. The first, second, and third discrimination sub-losses are weighted to obtain the second discrimination loss. The weights of the three discrimination sub-losses may be set according to actual application requirements.
Specifically, the first discrimination sub-loss can represent the difference between the second image discrimination result and the authenticity label corresponding to the sample segmentation feature image; the second discrimination sub-loss can represent the difference between the third image discrimination result and the authenticity label corresponding to the second sample composite image; and the third discrimination sub-loss can represent the difference between the fourth image discrimination result and the authenticity label corresponding to the sample fusion feature image.
In an optional embodiment, since the sample segmentation feature image is a real image, the authenticity label corresponding to the sample segmentation feature image may be 1 (1 represents a real image); since the second sample composite image is a composite image rather than a real image, the authenticity label corresponding to the second sample composite image may be 0 (0 represents a non-real image, i.e., a composite image); and since the sample fusion feature image includes the composite image (the second sample composite image), it is likewise not a real image, so the authenticity label corresponding to the sample fusion feature image may also be 0.
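A minimal sketch of the second discrimination loss under these authenticity labels is shown below; it assumes the discrimination network outputs probabilities, that binary cross entropy is chosen as the preset loss function, and that the weights are placeholders, none of which is mandated by the embodiment.

```python
import torch
import torch.nn.functional as F

def second_discrimination_loss(d_seg, d_composite, d_fused, weights=(1.0, 1.0, 1.0)):
    """d_*: discriminator output probabilities of shape (N, 1) for the sample
    segmentation feature image, the second sample composite image, and the
    sample fusion feature image."""
    real_label = torch.ones_like(d_seg)        # authenticity label 1: real image
    fake_label_c = torch.zeros_like(d_composite)  # authenticity label 0: composite image
    fake_label_f = torch.zeros_like(d_fused)
    l1 = F.binary_cross_entropy(d_seg, real_label)        # first discrimination sub-loss
    l2 = F.binary_cross_entropy(d_composite, fake_label_c)  # second discrimination sub-loss
    l3 = F.binary_cross_entropy(d_fused, fake_label_f)      # third discrimination sub-loss
    w1, w2, w3 = weights
    return w1 * l1 + w2 * l2 + w3 * l3
```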
In the embodiment of the present disclosure, the preset loss function may include, but is not limited to, a cross-entropy loss function, a logistic loss function, a hinge loss function, an exponential loss function, and the like; the embodiment of the present disclosure is not limited to the above. The loss functions used to calculate the discrimination loss and the segmentation loss may be the same or different.
In a specific embodiment, after obtaining the content loss, the second division loss, and the second discrimination loss, the content loss, the second division loss, and the second discrimination loss may be subjected to weighted calculation to obtain a second target loss. Specifically, the weights of the content loss, the second segmentation loss and the second discrimination loss may be set according to the actual application requirements.
In an optional embodiment, the second target loss meeting the second preset condition may mean that the second target loss is less than or equal to a specified threshold, or that the difference between the second target losses obtained in two consecutive training iterations is less than a certain threshold. In the embodiment of the present specification, the specified threshold and the certain threshold may be set in combination with actual training requirements.
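The following sketch illustrates, under placeholder weights and thresholds, how the weighted second target loss and the second preset condition described above could be expressed; it is an assumption-laden example, not the embodiment's reference implementation.

```python
def second_target_loss(content_loss, segmentation_loss, discrimination_loss,
                       w_content=1.0, w_seg=1.0, w_disc=1.0):
    # Weighted combination of the three losses; the weights are set according
    # to actual application requirements (placeholder values here).
    return (w_content * content_loss
            + w_seg * segmentation_loss
            + w_disc * discrimination_loss)

def meets_second_preset_condition(current_loss, previous_loss,
                                  specified_threshold=1e-3, change_threshold=1e-4):
    # Condition (a): the loss is less than or equal to a specified threshold.
    # Condition (b): the loss changes by less than a certain threshold between
    # two consecutive training iterations.
    below_threshold = current_loss <= specified_threshold
    converged = previous_loss is not None and abs(previous_loss - current_loss) < change_threshold
    return below_threshold or converged
```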
In practical application, during the multiple iterations of network training, part of the training samples are randomly selected each time to participate in the training. Correspondingly, updating the second target loss based on the updated segmentation network to be trained, the updated generation network to be trained, and the updated discrimination network to be trained may include randomly selecting part of the training samples together with their training category feature information and training scene feature information, and repeating the step of determining the second target loss in the above steps S503 to S515 in combination with the updated segmentation network to be trained, the updated generation network to be trained, and the updated discrimination network to be trained.
In the above embodiment, the pre-training process adds the constraint of the scenes in which objects of each category appear, so that the training of the image classification network focuses more on the synthesis of image pixel features in a specific scene. Fusing the synthesized second sample synthetic image with the sample segmentation feature image can greatly improve the feature mapping capability of the training samples. Determining the second target loss by combining the content loss, the second segmentation loss, and the second discrimination loss can improve the similarity between the composite image generated by the trained initial generation network and a real sample, and further improves the classification accuracy of the trained initial image classification network.
In step S305, the first stitching feature information is input to the initial generation network and image synthesis processing is performed, so that a first sample synthesized image is obtained.
In an alternative embodiment, the initial generation network may be obtained by pre-training the generator in the GAN based on the training class feature information of the training samples and the training scenario feature information of the training samples. In the embodiment of the present specification, the first stitching feature information is input to the initial generation network to perform image synthesis processing, so as to obtain a first sample synthesis image.
In the above embodiment, after the class characteristic information corresponding to the class information and the associated scene characteristic information are spliced, the obtained first splicing characteristic information is used to synthesize an image corresponding to the class information, so that the limitation of a scene where an object corresponding to the class information appears can be increased, a first sample synthesized image capable of accurately representing the class information and the scene information of the object can be obtained, the capability of mapping the unknown class characteristics is greatly improved, and the classification accuracy in subsequent applications is further improved.
In step S307, the first sample synthesized image is input to the initial discrimination network to perform authenticity discrimination, and a first image discrimination result is obtained.
In an optional embodiment, the initial discriminant network may be obtained by pre-training the discriminant in the GAN based on the training samples, the training class feature information of the training samples, and the training scene feature information of the training samples.
In this embodiment, the first sample composite image may include a composite image corresponding to each sample image in the training samples or each image in the prediction samples; accordingly, the first image discrimination result of each composite image may represent the prediction probability that the composite image is a real sample image or a real image in the prediction samples.
In step S309, the first sample synthesized image is input to the initial image classification network and subjected to classification processing, so as to obtain first sample classification information.
In an optional embodiment, the initial image classification network is obtained by pre-training the segmentation network to be trained based on the training samples, the training scene feature information of the training samples, and the training class feature information of the training samples.
Optionally, the first sample composite image is input to an initial image classification network for classification processing, so as to obtain first sample classification information. Optionally, the first sample category information corresponding to the first sample composite image may represent the prediction category feature information of the first sample composite image.
In step S311, an initial image classification network is trained based on the first image discrimination result, the first sample class information, and the target class feature information, so as to obtain a target image classification network.
In a specific embodiment, training an initial image classification network based on the first image discrimination result, the first sample class information, and the target class feature information, and obtaining the target image classification network may include:
calculating a first discrimination loss using the first image discrimination result and the authenticity label of the first sample composite image;
calculating a first segmentation loss by using the first sample class information and the target class characteristic information;
determining a first target loss according to the first discrimination loss and the first segmentation loss;
under the condition that the first target loss does not meet a first preset condition, updating network parameters in the initial image classification network, the initial generation network and the initial judgment network;
and updating the first target loss based on the updated initial image classification network, the initial generation network and the initial judgment network until the first target loss meets a first preset condition, and taking the current initial image classification network as a target image classification network.
In a specific embodiment, calculating the first discrimination loss using the first image discrimination result and the authenticity label of the first sample composite image may include calculating a discrimination loss between the first image discrimination result and the authenticity label of the first sample composite image based on a preset loss function, the discrimination loss being taken as the first discrimination loss. Specifically, the first discrimination loss may represent a difference between the first image discrimination result and the authenticity label corresponding to the first sample composite image.
In an alternative embodiment, since the first sample composite image is a composite image rather than a real image, the authenticity label of the first sample composite image may be 0 (0 represents a non-real image, i.e., a composite image).
In a specific embodiment, calculating the first segmentation loss using the first sample class information and the target class feature information may include calculating, based on a preset loss function, a segmentation loss between the first sample class information and the target class feature information, and taking this segmentation loss as the first segmentation loss. The first segmentation loss may characterize the difference between each pixel point of the first sample composite image and each pixel point of the target class feature information.
In the embodiment of the present disclosure, the preset loss function may include, but is not limited to, a cross-entropy loss function, a logistic loss function, a hinge loss function, an exponential loss function, and the like. The loss functions used to calculate the discrimination loss and the segmentation loss may be the same or different.
In a specific embodiment, after obtaining the first segmentation loss and the first discriminant loss, a weighted calculation may be performed on the first segmentation loss and the first discriminant loss to obtain a first target loss. Specifically, the weight of the first segmentation loss and the first discrimination loss may be set in combination with the actual application requirement.
In an alternative embodiment, the first target loss meeting the first preset condition may mean that the first target loss is less than or equal to a specified threshold, or that the difference between the first target losses obtained in two consecutive training iterations is less than a certain threshold. In the embodiment of the present specification, the specified threshold and the certain threshold may be set in combination with actual training requirements.
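For illustration, the sketch below combines the first discrimination loss and the first segmentation loss into a first target loss; binary cross entropy and per-pixel cross entropy are assumed choices for the preset loss functions, and the weights and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def first_target_loss(d_first_composite, pred_first_composite, class_labels,
                      w_disc=1.0, w_seg=1.0):
    """d_first_composite: discriminator probability for the first sample composite
    image, shape (N, 1); pred_first_composite: class logits (N, num_classes, H, W);
    class_labels: (N, H, W) int64 per-pixel target class labels."""
    # First discrimination loss: the composite image carries authenticity label 0.
    fake_label = torch.zeros_like(d_first_composite)
    disc_loss = F.binary_cross_entropy(d_first_composite, fake_label)
    # First segmentation loss: per-pixel cross entropy against the target class labels.
    seg_loss = F.cross_entropy(pred_first_composite, class_labels)
    return w_disc * disc_loss + w_seg * seg_loss
```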
In practical application, during the multiple iterations of network training, part of the target category feature information, together with its associated scene feature information, is randomly selected each time to participate in the training. Optionally, unknown class features are sampled with a larger probability and known class features with a smaller probability. Correspondingly, for the specific refinement of updating the first target loss based on the updated initial image classification network, initial generation network, and initial discrimination network, reference may be made to the above refinement steps of updating the second target loss based on the updated segmentation network to be trained, the updated generation network to be trained, and the updated discrimination network to be trained, and details are not repeated here.
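A small sketch of the biased random sampling mentioned above (unknown class features drawn with a larger probability than known ones); the sampling probability, list-based representation, and function name are illustrative assumptions.

```python
import random

def sample_class_features(unknown_feats, known_feats, k, p_unknown=0.7):
    """Draw k class feature vectors, preferring unknown (unseen) classes.
    unknown_feats / known_feats are lists of feature vectors; p_unknown is a placeholder."""
    picked = []
    for _ in range(k):
        use_unknown = bool(unknown_feats) and random.random() < p_unknown
        pool = unknown_feats if use_unknown else known_feats
        picked.append(random.choice(pool))
    return picked
```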
In the above embodiment, the first target loss is determined by combining the first segmentation loss, determined from the first sample category information and the target category feature information, with the first discrimination loss, determined from the first image discrimination result and the authenticity label of the first sample composite image. In this way, the initial image classification network can be trained better while effectively ensuring the similarity between the first sample composite image generated by the initial generation network and a real sample (a training sample or a prediction sample), so that the zero-sample classification accuracy is greatly improved.
In the above embodiment, the class features corresponding to both the training samples and the prediction samples are obtained and used as training data for the initial image classification network, which can improve the ability of the trained target image classification network to recognize unknown classes. Obtaining the associated scene feature information of the target class feature information enables prediction of the scene in which an object of a certain class appears. This ensures that, when image pixel features are automatically synthesized from the word vectors of unknown or known classes, a constraint on the scenes in which those classes appear can be added, so that the training of the image classification network focuses more on the synthesis of image pixel features in a specific scene. The classification network in zero-sample image segmentation training can thus be better adjusted using the scene context, and the accuracy of zero-sample classification is greatly improved.
In a specific embodiment, after the target image classification network is trained, the segmentation feature images are input into the target image classification network for classification processing, and initial classification information is obtained. Specifically, the initial category information may be category information corresponding to each pixel point in an image to be classified predicted by combining the segmented feature image.
In step S205, target stitching feature information is generated based on the scene information and the initial category information of the image to be classified.
In a specific embodiment, generating the target stitching feature information based on the scene information and the initial category information of the image to be classified may include: respectively inputting the scene information and the initial category information into a target word vector network to obtain scene feature information corresponding to the scene information and category feature information corresponding to the initial category information; and stitching the category feature information and the scene feature information to obtain the target stitching feature information.
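The step can be sketched as follows, assuming the target word vector network behaves like an embedding lookup over word indices; the vocabulary size, embedding dimension, and variable names are illustrative assumptions rather than details of the embodiment.

```python
import torch
import torch.nn as nn

# Stand-in for the target word vector network (assumed to be an embedding lookup).
word_vector_net = nn.Embedding(num_embeddings=10000, embedding_dim=300)

def target_stitching_features(scene_ids: torch.Tensor, class_ids: torch.Tensor) -> torch.Tensor:
    scene_feat = word_vector_net(scene_ids)   # scene feature information
    class_feat = word_vector_net(class_ids)   # category feature information
    # Stitching: concatenate the category and scene features along the last dimension.
    return torch.cat([class_feat, scene_feat], dim=-1)
```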
In this embodiment of the present specification, the specific refinement step of respectively inputting the scene information and the initial category information of the image to be classified into the target word vector network to obtain the scene feature information corresponding to the scene information and the category feature information corresponding to the initial category information may refer to the specific refinement for obtaining the target category feature information, and is not described herein again.
In a specific embodiment, for the specific refinement of stitching the category feature information and the scene feature information to obtain the target stitching feature information, reference may be made to the above stitching of the target category feature information and the associated scene feature information to obtain the first stitching feature information, and details are not repeated here.
In step S207, the target stitching feature information is input to the target generation network to perform image synthesis processing, so as to obtain a target synthesis image.
In a specific embodiment, the target stitching feature information is input into a target generation network for image synthesis processing, so as to obtain a target synthetic image.
In the above embodiment, after the class feature information and the scene feature information of the image to be classified are subjected to stitching processing, the obtained target stitching feature information is used to generate the composite image corresponding to the image to be classified, so that the limitation on the scene where the object appears in the image to be classified can be increased, the target composite image capable of accurately representing the class information and the scene information of the object in the image to be classified can be obtained, the feature information of the extracted image to be classified is effectively increased, and the image classification accuracy is further improved.
In step S209, the target synthesized image and the segmentation feature image are subjected to fusion processing, and a target fusion feature image is obtained.
In a specific embodiment, as shown in fig. 6, the fusing the target synthesized image and the segmented feature image to obtain the target fused feature image may include the following steps:
in step S2091, the feature values of the pixels corresponding to the target synthesized image and the segmented feature image are compared.
In step S2093, according to the comparison result, the target feature information corresponding to each pixel point of the image to be segmented is selected from the target synthesized image and the segmented feature image.
In step S2095, a target fusion feature image is generated based on the target feature information.
In an optional embodiment, when it is determined that the feature value of a certain pixel point on the target synthesized image is greater than the feature value of the pixel point on the segmentation feature image according to the comparison result, the feature value on the target synthesized image may be used as the target feature information corresponding to the pixel point; correspondingly, under the condition that the characteristic value of a certain pixel point on the target synthetic image is smaller than the characteristic value of the pixel point on the segmentation characteristic image according to the comparison result, the characteristic value on the segmentation characteristic image can be used as the target characteristic information corresponding to the pixel point. Further, the target feature information corresponding to all the pixel points in the image to be classified can be used as the target fusion feature image corresponding to the image to be classified.
In addition, when it is determined that the feature value of a certain pixel point on the target synthesized image is equal to the feature value of the pixel point on the segmented feature image according to the comparison result, the feature value can be used as the target feature information corresponding to the pixel point.
In an optional embodiment, the fusing the target synthesized image and the segmented feature image to obtain the target fused feature image may include:
and overlapping the characteristic values of the corresponding pixel points in the target synthetic image and the segmentation characteristic image to obtain a target fusion characteristic image.
In addition, it should be noted that, in practical applications, the target synthesized image and the segmented feature image may be subjected to the fusion processing in other manners, and the embodiments of the present specification are not limited to the above.
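Both fusion variants described above can be sketched compactly as element-wise operations; the PyTorch tensor representation and function names are assumptions for illustration.

```python
import torch

def fuse_by_max(target_composite: torch.Tensor, seg_feature: torch.Tensor) -> torch.Tensor:
    # Per-pixel comparison: keep the larger feature value from either image
    # (when the values are equal, the shared value is kept unchanged).
    return torch.maximum(target_composite, seg_feature)

def fuse_by_sum(target_composite: torch.Tensor, seg_feature: torch.Tensor) -> torch.Tensor:
    # Alternative fusion: superpose the feature values of corresponding pixel points.
    return target_composite + seg_feature
```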
In the embodiment, the target synthetic image and the segmentation characteristic image which can accurately represent the object class information and the scene information in the image to be classified are fused, so that the extracted characteristic information of the image to be classified can be effectively increased, and the image classification precision is further improved.
In step S211, the target fusion feature image is input to a target image classification network for classification processing, so as to obtain first target class information of the image to be classified.
In this embodiment of the present specification, the first target category information of the image to be classified may be the target category information, corresponding to each pixel point in the image to be classified, predicted by combining the target fusion feature image.
According to the technical solution provided in the embodiments of the present specification, after the initial category information is obtained by performing classification processing in combination with the segmentation feature image of the image to be classified, target stitching feature information containing both the category information and the scene information is generated from the category information and the scene information of the image to be classified, and a composite image corresponding to the image to be classified is generated from the target stitching feature information. This adds a constraint on the scenes in which objects appear in the image to be classified, so that a target composite image that accurately represents the category information and scene information of the objects in the image to be classified can be obtained. The target composite image and the segmentation feature image are then fused, so that a target fusion feature image carrying richer feature information of the image to be classified can be obtained; performing a second classification based on the target fusion feature image can greatly improve the image classification accuracy and reduce the error rate.
In an alternative embodiment, as shown in fig. 7, the method may further include:
in step S213, a first category confidence corresponding to the initial category information is obtained;
in step S215, a second category confidence corresponding to the first target category information is obtained;
in step S217, second target category information is determined according to the first category confidence and the second category confidence.
In practical application, in the process of classification processing by the target image classification network, the category confidence of multiple pieces of category information corresponding to each pixel point of the image to be classified can be determined; the category confidence can represent the probability that the corresponding pixel point belongs to the corresponding category information; accordingly, the category information with the highest category confidence may be used as the target category information of the pixel point.
In a specific embodiment, for each pixel point, the category information corresponding to the higher category confidence may be used as the second target category information of the pixel point in combination with the first category confidence corresponding to the initial category information and the second category confidence corresponding to the first target category information.
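A minimal sketch of this per-pixel selection between the two classification results, assuming per-pixel class ids and confidence maps of the same spatial size; the tie-breaking rule (preferring the second pass when confidences are equal) is an assumption.

```python
import torch

def second_target_category(initial_cls, initial_conf, target_cls, target_conf):
    """All inputs have shape (N, H, W): per-pixel class ids and their confidences."""
    # For each pixel, keep the category from whichever classification pass was more confident.
    use_target = target_conf >= initial_conf
    return torch.where(use_target, target_cls, initial_cls)
```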
In the above embodiment, the images to be classified are classified again by combining the class confidence of the class information obtained by the two classification processes, and more accurate class information can be selected for each pixel point in the images to be classified, so that the overall classification precision of the images is greatly improved, and the error rate is reduced.
Fig. 8 is a block diagram illustrating an image classification device according to an exemplary embodiment. Referring to fig. 8, the apparatus includes:
a data obtaining module 810 configured to perform obtaining of a segmentation feature image and scene information of an image to be classified;
a first classification processing module 820 configured to perform classification processing of inputting the segmentation feature image into a target image classification network, so as to obtain initial classification information;
a target splicing characteristic information generating module 830 configured to generate target splicing characteristic information based on the scene information and the initial category information;
a first image synthesis processing module 840 configured to perform image synthesis processing by inputting the target stitching feature information into a target generation network, so as to obtain a target synthesized image;
a first fusion processing module 850 configured to perform fusion processing on the target synthesized image and the segmentation feature image to obtain a target fusion feature image;
and the second classification processing module 860 is configured to perform classification processing on the target fusion characteristic image input into the target image classification network, so as to obtain first target class information of the image to be classified.
Optionally, the apparatus further comprises:
the first category confidence acquisition module is configured to execute acquisition of a first category confidence corresponding to the initial category information;
the second category confidence acquisition module is configured to execute acquisition of a second category confidence corresponding to the first target category information;
and the object class information determining module is configured to determine second object class information according to the first class confidence and the second class confidence.
Optionally, the first fusion processing module 850 includes:
the characteristic value comparison unit is configured to compare the sizes of the characteristic values between the corresponding pixel points of the target synthetic image and the segmentation characteristic image;
the target characteristic information selecting unit is configured to execute the step of selecting target characteristic information corresponding to each pixel point of the image to be segmented from the target synthetic image and the segmented characteristic image according to the comparison result;
a target fusion feature image generation unit configured to perform generation of a target fusion feature image based on the target feature information;
or,
and the superposition processing unit is configured to perform superposition processing on the characteristic values of the corresponding pixel points in the target synthetic image and the segmentation characteristic image to obtain a target fusion characteristic image.
Optionally, the target splicing characteristic information generating module 830 includes:
the characteristic information acquisition unit is configured to input scene information and initial category information into a target word vector network respectively to obtain scene characteristic information corresponding to the scene information and category characteristic information corresponding to the initial category information;
and the splicing processing unit is configured to perform splicing processing on the category characteristic information and the scene characteristic information to obtain target splicing characteristic information.
Optionally, the apparatus further comprises:
the characteristic information acquisition module is configured to acquire target category characteristic information and associated scene characteristic information of the target category characteristic information, and the target category characteristic information represents category characteristics of the training sample and the prediction sample;
the first splicing processing module is configured to perform splicing processing on the target category characteristic information and the associated scene characteristic information to obtain first splicing characteristic information;
the second image synthesis processing module is configured to input the first splicing characteristic information into an initial generation network for image synthesis processing to obtain a first sample synthesis image;
the first authenticity judging module is configured to input the first sample composite image into an initial judging network for authenticity judgment to obtain a first image judging result;
the third classification processing module is configured to input the first sample synthetic image into the initial image classification network for classification processing to obtain first sample classification information;
and the first network training module is configured to execute training of an initial image classification network based on the first image discrimination result, the first sample class information and the target class characteristic information to obtain a target image classification network.
Optionally, the characteristic information obtaining module includes:
a scene image set acquisition unit configured to perform acquisition of a scene image set;
the scene recognition unit is configured to input the scene image set into a scene recognition network for scene recognition, and a scene information set is obtained;
the scene characteristic information set acquisition unit is configured to input a scene information set into a target word vector network to obtain a scene characteristic information set;
a similarity calculation unit configured to perform calculation of a similarity between the target category feature information and the scene feature information in the scene feature information set;
an associated scene feature information determination unit configured to perform determination of associated scene feature information from the scene feature information set based on the similarity.
Optionally, the apparatus further comprises:
a training data acquisition module configured to perform acquisition of a training sample, training scene feature information of the training sample, and training category feature information of the training sample;
the characteristic extraction module is configured to input a training sample into a characteristic extraction network of a segmentation network to be trained for characteristic extraction to obtain a sample segmentation characteristic image;
the second splicing processing module is configured to perform splicing processing on the training category characteristic information and the training scene characteristic information to obtain second splicing characteristic information;
the third image synthesis processing module is configured to input the second splicing characteristic information into a to-be-trained generation network for image synthesis processing to obtain a second sample synthetic image;
the second fusion processing module is configured to perform fusion processing on the second sample composite image and the sample segmentation characteristic image to obtain a sample fusion characteristic image;
the fourth classification processing module is configured to input the second sample synthetic image, the sample segmentation characteristic image and the sample fusion characteristic image into the classification network of the segmentation network to be trained for classification processing respectively, so as to obtain second sample class information corresponding to the second sample synthetic image, third sample class information corresponding to the sample segmentation characteristic image, and fourth sample class information corresponding to the sample fusion characteristic image;
the second reality judging module is configured to input the sample segmentation characteristic image, the second sample synthesis image and the sample fusion characteristic image into a to-be-trained judging network, and respectively judge the reality to obtain a second image judging result corresponding to the sample segmentation characteristic image, a third image judging result corresponding to the second sample synthesis image and a fourth image judging result corresponding to the sample fusion characteristic image;
the second network training module is configured to execute training of a to-be-trained segmented network, a to-be-trained generated network and a to-be-trained discrimination network based on a second sample synthetic image, a sample segmentation feature image, second sample category information, third sample category information, fourth sample category information, training category feature information, a second image discrimination result, a third image discrimination result and a fourth image discrimination result to obtain an initial image segmentation network, an initial generated network and an initial discrimination network;
wherein the initial image segmentation network comprises an initial image classification network.
Optionally, the initial image segmentation network further includes a target feature extraction network;
the data acquisition module 810 includes:
and the characteristic extraction unit is configured to input the image to be classified into a target characteristic extraction network for characteristic extraction to obtain a segmentation characteristic image.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 9 is a block diagram illustrating an electronic device for image classification, which may be a terminal according to an exemplary embodiment, and an internal structure thereof may be as shown in fig. 9. The electronic device comprises a processor, a memory, a network interface, a display screen and an input device which are connected through a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement an image classification method. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and does not constitute a limitation on the electronic devices to which the disclosed aspects apply, as a particular electronic device may include more or less components than those shown, or combine certain components, or have a different arrangement of components.
In an exemplary embodiment, there is also provided an electronic device including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the image classification method as in the embodiments of the present disclosure.
In an exemplary embodiment, a computer-readable storage medium is also provided, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform an image classification method in an embodiment of the present disclosure. Alternatively, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product containing instructions is also provided, which when run on a computer, causes the computer to perform the image classification method in the embodiments of the present disclosure.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An image classification method, comprising:
acquiring a segmentation characteristic image and scene information of an image to be classified;
inputting the segmentation characteristic image into a target image classification network for classification processing to obtain initial classification information;
generating target splicing characteristic information based on the scene information and the initial category information;
inputting the target splicing characteristic information into a target generation network for image synthesis processing to obtain a target synthesis image;
fusing the target synthetic image and the segmentation characteristic image to obtain a target fusion characteristic image;
and inputting the target fusion characteristic image into the target image classification network for classification processing to obtain first target class information of the image to be classified.
2. The image classification method according to claim 1, characterized in that the method further comprises:
acquiring a first category confidence corresponding to the initial category information;
acquiring a second category confidence corresponding to the first target category information;
and determining second target category information according to the first category confidence and the second category confidence.
3. The image classification method according to claim 1, wherein the fusing the target synthesized image and the segmented feature image to obtain a target fused feature image comprises:
comparing the characteristic value between the corresponding pixel points of the target synthetic image and the segmentation characteristic image;
according to the comparison result, selecting target characteristic information corresponding to each pixel point of the image to be segmented from the target synthetic image and the segmentation characteristic image;
generating the target fusion characteristic image based on the target characteristic information;
or,
and overlapping the characteristic values of corresponding pixel points in the target synthetic image and the segmentation characteristic image to obtain the target fusion characteristic image.
4. The image classification method according to claim 1, wherein the generating target stitching feature information based on the scene information and the initial category information comprises:
respectively inputting the scene information and the initial category information into a target word vector network to obtain scene characteristic information corresponding to the scene information and category characteristic information corresponding to the initial category information;
and splicing the category characteristic information and the scene characteristic information to obtain target splicing characteristic information.
5. The image classification method according to any one of claims 1 to 4, characterized in that the method further comprises:
acquiring target category characteristic information and associated scene characteristic information of the target category characteristic information, wherein the target category characteristic information represents category characteristics of a training sample and a prediction sample;
splicing the target category characteristic information and the associated scene characteristic information to obtain first spliced characteristic information;
inputting the first splicing characteristic information into an initial generation network for image synthesis processing to obtain a first sample synthetic image;
inputting the first sample composite image into an initial judgment network for authenticity judgment to obtain a first image judgment result;
inputting the first sample synthetic image into an initial image classification network for classification processing to obtain first sample class information;
and training the initial image classification network based on the first image discrimination result, the first sample class information and the target class characteristic information to obtain the target image classification network.
6. The image classification method according to claim 5, wherein the obtaining of the associated scene feature information includes:
acquiring a scene image set, inputting the scene image set into a scene recognition network for scene recognition to obtain a scene information set;
inputting the scene information set into a target word vector network to obtain a scene characteristic information set;
calculating the similarity between the target category characteristic information and the scene characteristic information in the scene characteristic information set;
determining the associated scene feature information from the set of scene feature information based on the similarity.
7. An image classification apparatus, comprising:
the data acquisition module is configured to acquire a segmentation characteristic image and scene information of an image to be classified;
the first classification processing module is configured to input the segmentation characteristic image into a target image classification network for classification processing to obtain initial classification information;
a target splicing characteristic information generation module configured to perform generation of target splicing characteristic information based on the scene information and the initial category information;
the first image synthesis processing module is configured to input the target splicing characteristic information into a target generation network for image synthesis processing to obtain a target synthesis image;
the first fusion processing module is configured to perform fusion processing on the target synthetic image and the segmentation characteristic image to obtain a target fusion characteristic image;
and the second classification processing module is configured to execute the step of inputting the target fusion characteristic image into the target image classification network for classification processing, so as to obtain first target class information of the image to be classified.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image classification method of any of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the image classification method of any of claims 1 to 6.
10. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the image classification method of any of claims 1 to 6.
CN202110111742.1A 2021-01-27 2021-01-27 Image classification method and device, electronic equipment and storage medium Pending CN112818995A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110111742.1A CN112818995A (en) 2021-01-27 2021-01-27 Image classification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110111742.1A CN112818995A (en) 2021-01-27 2021-01-27 Image classification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112818995A true CN112818995A (en) 2021-05-18

Family

ID=75859830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110111742.1A Pending CN112818995A (en) 2021-01-27 2021-01-27 Image classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112818995A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200302231A1 (en) * 2019-03-22 2020-09-24 Royal Bank Of Canada System and method for generation of unseen composite data objects
CN111476294A (en) * 2020-04-07 2020-07-31 南昌航空大学 Zero sample image identification method and system based on generation countermeasure network
CN111814538A (en) * 2020-05-25 2020-10-23 北京达佳互联信息技术有限公司 Target object type identification method and device, electronic equipment and storage medium
CN112183334A (en) * 2020-09-28 2021-01-05 南京大学 Video depth relation analysis method based on multi-modal feature fusion
CN112270686A (en) * 2020-12-24 2021-01-26 北京达佳互联信息技术有限公司 Image segmentation model training method, image segmentation device and electronic equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362286A (en) * 2021-05-24 2021-09-07 江苏星月测绘科技股份有限公司 Natural resource element change detection method based on deep learning
CN113362286B (en) * 2021-05-24 2022-02-01 江苏星月测绘科技股份有限公司 Natural resource element change detection method based on deep learning
CN113421212A (en) * 2021-06-23 2021-09-21 华侨大学 Medical image enhancement method, device, equipment and medium
CN113421212B (en) * 2021-06-23 2023-06-02 华侨大学 Medical image enhancement method, device, equipment and medium
CN114998277A (en) * 2022-06-16 2022-09-02 吉林大学 Grab point identification method and device, electronic equipment and computer storage medium

Similar Documents

Publication Publication Date Title
CN112330685B (en) Image segmentation model training method, image segmentation device and electronic equipment
CN111062871B (en) Image processing method and device, computer equipment and readable storage medium
CN112270686B (en) Image segmentation model training method, image segmentation device and electronic equipment
CN112084331A (en) Text processing method, text processing device, model training method, model training device, computer equipment and storage medium
CN112818995A (en) Image classification method and device, electronic equipment and storage medium
CN110990631A (en) Video screening method and device, electronic equipment and storage medium
CN108205684B (en) Image disambiguation method, device, storage medium and electronic equipment
CN111582409A (en) Training method of image label classification network, image label classification method and device
CN112533051A (en) Bullet screen information display method and device, computer equipment and storage medium
US20230143452A1 (en) Method and apparatus for generating image, electronic device and storage medium
CN108198172B (en) Image significance detection method and device
CN109902681B (en) User group relation determining method, device, equipment and storage medium
CN110866469A (en) Human face facial features recognition method, device, equipment and medium
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN112883258A (en) Information recommendation method and device, electronic equipment and storage medium
CN111507285A (en) Face attribute recognition method and device, computer equipment and storage medium
CN112765387A (en) Image retrieval method, image retrieval device and electronic equipment
CN113128526B (en) Image recognition method and device, electronic equipment and computer-readable storage medium
CN113641835B (en) Multimedia resource recommendation method and device, electronic equipment and medium
CN113704511B (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
CN113705293A (en) Image scene recognition method, device, equipment and readable storage medium
CN113706550A (en) Image scene recognition and model training method and device and computer equipment
CN113824989B (en) Video processing method, device and computer readable storage medium
CN113822291A (en) Image processing method, device, equipment and storage medium
CN113762037A (en) Image recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination