CN111476294B - Zero-shot image recognition method and system based on a generative adversarial network - Google Patents

Zero-shot image recognition method and system based on a generative adversarial network

Info

Publication number
CN111476294B
Authority
CN
China
Prior art keywords
semantic
visual
discriminator
features
loss function
Prior art date
Legal status
Expired - Fee Related
Application number
CN202010263452.4A
Other languages
Chinese (zh)
Other versions
CN111476294A (en)
Inventor
张桂梅
龙邦耀
Current Assignee
Nanchang Hangkong University
Original Assignee
Nanchang Hangkong University
Priority date
Filing date
Publication date
Application filed by Nanchang Hangkong University filed Critical Nanchang Hangkong University
Priority to CN202010263452.4A
Publication of CN111476294A
Application granted
Publication of CN111476294B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a zero-shot image recognition method and system based on a generative adversarial network. The method comprises the following steps: acquiring training image samples with annotation information and test image samples without annotation information; constructing a generative adversarial network model comprising a semantic feature generator, a visual feature generator, a semantic discriminator and a visual discriminator; constructing a multi-objective loss function comprising a cycle consistency loss function, an adversarial loss function of the semantic discriminator, an adversarial loss function of the visual discriminator and a classification loss function of the semantic discriminator; taking the training image samples as the input of the generative adversarial network model and iteratively training the model based on the multi-objective loss function to obtain a trained generative adversarial network model; and inputting the test image samples into the trained generative adversarial network model to obtain a recognition result. The invention can recognize sketches without annotation information and achieves high zero-shot recognition accuracy.

Description

Zero-shot image recognition method and system based on a generative adversarial network
Technical Field
The invention relates to the field of weakly/semi-supervised image recognition, and in particular to a zero-shot image recognition method and system based on a generative adversarial network.
Background
The concept of Zero-Shot Learning (ZSL) was first proposed by H. Larochelle et al. in 2008. It is mainly used to solve the problem of how to correctly classify and recognize unknown new objects when the labeled training samples do not sufficiently cover all object classes. If, following traditional supervised learning, a classifier is learned on the training set and then applied to the test sample set, the classification performance is poor because the sample distributions of the two domains differ. This image recognition problem is called zero-shot recognition.
Zero-shot recognition requires only labeled samples of known classes to predict unknown classes. The main idea is to introduce category semantic information as a middle-layer feature and to link visual features with semantic features. At the feature level, the key problems in realizing zero-shot recognition are therefore: 1) finding visual features that can fully express the visual information of an image and semantic features that can fully represent its semantics; 2) how to relate the visual features to the category semantic information.
For key problem 1), finding visual features that can sufficiently express the visual information of an image is one of the challenges of zero-shot recognition. With the rise of deep learning, researchers extract discriminative image features with deep convolutional neural networks. Zero-shot image recognition requires not only the visual features of an image but also semantic features that represent the semantics of image classes, so as to link known classes to unknown classes. The most widely used semantic features at present are attribute features and text features. Because attribute features are annotated manually, their accuracy is poor. In recent years, with the development of natural language processing techniques, research using text description features instead of attribute features has received much attention. Text description features can be extracted directly from a corpus, and each class corresponds to a vector in the text description space. Compared with attribute features, text description features can obtain the text vector of any word from an unlabeled text corpus through natural language processing, and therefore scale better. A commonly used text vector extraction method is Word2Vec.
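As an illustration of this Word2Vec step, a minimal gensim sketch follows; the toy corpus, the 300-dimensional vector size and the class names are placeholders, not data from the patent:

```python
from gensim.models import Word2Vec

# toy corpus: in practice, sentences come from a large unlabeled text corpus
corpus = [["zebra", "has", "black", "and", "white", "stripes"],
          ["dolphin", "lives", "in", "the", "water"]]
model = Word2Vec(sentences=corpus, vector_size=300, window=5, min_count=1)
zebra_vec = model.wv["zebra"]   # 300-d text vector for the class "zebra"
```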
Existing semantic feature spaces can be divided into three categories: (1) attribute-based semantic feature spaces; (2) text-based semantic feature spaces; (3) common semantic feature spaces. After the semantic feature space is selected, how to establish the mapping relationship between visual features and semantic features is the second key problem of zero-shot recognition.
For key problem 2), after the semantic features of the known and unknown classes are extracted in a given semantic space, the semantic correlation between classes can be obtained from the similarity between semantic features. However, sample images are represented by visual features in the visual space, and because of the semantic gap they cannot be linked directly to the semantic features of the semantic space. Most existing methods learn a mapping function from the visual space to the semantic space using the visual features of known-class pictures and the semantic features of the corresponding labels. The visual features of a test image are then mapped into the semantic space through this mapping function, giving predicted semantic features. Finally, the closest unknown-class semantic features are found to determine the class to which the image belongs.
In zero-shot image recognition, since the known and unknown classes are disjoint, directly applying the model learned on the training sample set to the test set causes a large deviation between the mapping of the test set samples in the semantic space and the real class semantics; this is called domain shift. Recently, many methods have been proposed to alleviate the domain shift problem in zero-shot learning, such as data enhancement, self-training, and pivot correction.
Zero-shot recognition has received wide attention from scholars in recent years, and related algorithms have begun to be applied in practice. Previous zero-shot learning methods mainly recognize targets under the conventional zero-shot setting, i.e., the test images are limited to the target classes, whereas in real scenarios the test images come not only from the target classes but possibly also from the source classes. In this case, data from both the source and target classes should be taken into account, so the generalized zero-shot setting has been introduced in recent years. However, recognition accuracy under generalized zero-shot learning is much lower than under conventional zero-shot learning. Conventional generalized zero-shot recognition methods therefore suffer from low recognition accuracy.
Disclosure of Invention
Based on the above, there is a need for a zero-shot image recognition method and system based on a generative adversarial network that can recognize test images from both the target classes and the source classes with high accuracy.
In order to achieve the purpose, the invention provides the following scheme:
a zero sample image identification method based on a generation countermeasure network comprises the following steps:
acquiring a training image sample and a test image sample; the training image sample is a sample image with marking information, and the test image sample is a sample image without marking information;
constructing and generating a confrontation network model; the generation countermeasure network model comprises a semantic feature generator, a visual feature generator, a semantic discriminator and a visual discriminator; the semantic feature generator is used for generating pseudo-semantic features according to the real visual features; the visual feature generator is used for generating a pseudo visual feature according to the pseudo semantic feature; the semantic discriminator is used for discriminating the real semantic features and the pseudo semantic features; the visual discriminator is used for discriminating the real visual features and the pseudo visual features;
constructing a multi-target loss function; the multi-target loss function comprises a cycle consistency loss function of a real visual feature and a pseudo visual feature, an antagonistic loss function of a semantic discriminator, an antagonistic loss function of the visual discriminator and a classification loss function of the semantic discriminator;
taking the training image sample as the input of the generated countermeasure network model, and performing iterative training on the generated countermeasure network model based on the multi-target loss function to obtain a trained generated countermeasure network model;
and inputting the test image sample into the trained generated confrontation network model to obtain a recognition result.
The invention also provides a zero-shot image recognition system based on a generative adversarial network, comprising:
a sample acquisition module, configured to acquire a training image sample and a test image sample; the training image sample is a sample image with annotation information, and the test image sample is a sample image without annotation information;
a network model construction module, configured to construct the generative adversarial network model; the generative adversarial network model comprises a semantic feature generator, a visual feature generator, a semantic discriminator and a visual discriminator; the semantic feature generator is used for generating pseudo-semantic features from the real visual features; the visual feature generator is used for generating pseudo-visual features from the pseudo-semantic features; the semantic discriminator is used for discriminating between the real semantic features and the pseudo-semantic features; the visual discriminator is used for discriminating between the real visual features and the pseudo-visual features;
a loss function construction module, configured to construct the multi-objective loss function; the multi-objective loss function comprises a cycle consistency loss function of the real visual features and the pseudo-visual features, an adversarial loss function of the semantic discriminator, an adversarial loss function of the visual discriminator and a classification loss function of the semantic discriminator;
a training module, configured to take the training image sample as the input of the generative adversarial network model and to iteratively train the generative adversarial network model based on the multi-objective loss function to obtain a trained generative adversarial network model;
and a test recognition module, configured to input the test image sample into the trained generative adversarial network model to obtain a recognition result.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a zero-shot image recognition method and system based on a generative adversarial network. The method constructs a generative adversarial network model comprising a semantic feature generator, a visual feature generator, a semantic discriminator and a visual discriminator; constructs a multi-objective loss function comprising a cycle consistency loss function, an adversarial loss function of the semantic discriminator, an adversarial loss function of the visual discriminator and a classification loss function of the semantic discriminator; takes the training image samples as the input of the generative adversarial network model and iteratively trains the model based on the multi-objective loss function to obtain the trained generative adversarial network model; and inputs the test image samples into the trained model to obtain the recognition result. The method can recognize sketches without annotation information, improves zero-shot recognition accuracy, and improves the generalization ability of the model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the zero-shot image recognition method based on a generative adversarial network according to an embodiment of the present invention;
FIG. 2 shows the network structure of the semantic feature generator G_1 according to an embodiment of the present invention;
FIG. 3 shows the network structure of the visual feature generator G_2 according to an embodiment of the present invention;
FIG. 4 shows the network structure of the semantic discriminator D_1 according to an embodiment of the present invention;
FIG. 5 shows the network structure of the visual discriminator D_2 according to an embodiment of the present invention;
FIG. 6 is a structural diagram of the trained generative adversarial network model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of the zero-shot image recognition system based on a generative adversarial network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
In order to improve generalized zero-shot recognition accuracy, the following two problems need to be solved: on the one hand, mapping visual information to the semantic space currently requires aligned image pairs or relies on inefficient feature fusion; on the other hand, when an autoencoder is used to extract semantic information from Wikipedia, redundant noisy text is present, which degrades the recognition result.
Fig. 1 is a flowchart of the zero-shot image recognition method based on a generative adversarial network according to an embodiment of the present invention. Referring to fig. 1, the method includes:
step 101: training image samples and test image samples are obtained.
The training image sample is a sample image with annotation information, and the test image sample is a sample image without annotation information.
Step 102: constructing the generative adversarial network model; the generative adversarial network model comprises a semantic feature generator, a visual feature generator, a semantic discriminator and a visual discriminator.
The semantic feature generator is used for generating pseudo-semantic features from the real visual features; the visual feature generator is used for generating pseudo-visual features from the pseudo-semantic features; the semantic discriminator is used for discriminating between the real semantic features and the pseudo-semantic features; the visual discriminator is used for discriminating between the real visual features and the pseudo-visual features.
Before this step is performed, two preprocessing steps are also needed: 1) input the Wikipedia texts into a hierarchical model to obtain the useful information of the texts, then input this useful information into an autoencoder to obtain the real semantic features; 2) input the training image samples into an attention-based CNN model to obtain the real visual features.
Step 103: constructing the multi-objective loss function; the multi-objective loss function comprises a cycle consistency loss function of the real visual features and the pseudo-visual features, an adversarial loss function of the semantic discriminator, an adversarial loss function of the visual discriminator and a classification loss function of the semantic discriminator.
Step 104: taking the training image samples as the input of the generative adversarial network model, and iteratively training the generative adversarial network model based on the multi-objective loss function to obtain the trained generative adversarial network model.
Step 105: inputting the test image samples into the trained generative adversarial network model to obtain the recognition result.
Step 101 is the initial training stage of this embodiment; this stage of the recognition model is completed under the TensorFlow deep learning framework. The specific procedure for obtaining training and test image samples is as follows:
the training image samples and the test image samples in this embodiment may be selected from Sketchy and TU-Berlin. Sketchy and TU-Berlin are two common and popular sketch datasets.
The Sketchy dataset is a large sketch collection. The dataset consists of 125 different categories, each with 100 photos. Sketches of the objects appearing in these 12,500 photos were collected by crowdsourcing, resulting in 75,471 sketches. The dataset also contains fine-grained correspondences (alignments) between particular images and sketches, as well as various data augmentations for deep learning based methods. The dataset was later expanded by adding 60,502 photos, yielding 73,002 images in total. We randomly draw the sketches of 25 classes as the unseen test set for zero-shot recognition (without using their annotation information), and the data of the remaining 100 classes are used for training (with annotation information).
The TU-Berlin dataset (extended) contains 250 categories with 20,000 sketches, later extended with natural images corresponding to the sketch classes, 204,489 images in total. The sketches of 30 randomly selected classes serve as the test set (without using their annotation information); the remaining 220 classes are used for training (with annotation information).
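The random class split described above can be sketched as follows; this is a hypothetical numpy helper, with class indices standing in for the actual category names:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
all_classes = np.arange(250)                # the 250 TU-Berlin categories
# 30 classes become the unseen zero-shot test set
unseen = rng.choice(all_classes, size=30, replace=False)
seen = np.setdiff1d(all_classes, unseen)    # 220 classes kept for training
```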
Step 102 is the model construction stage of this embodiment, namely constructing the structure of the generative adversarial network model, which includes the semantic feature generator G_1, the visual feature generator G_2, the semantic discriminator D_1 and the visual discriminator D_2. The specific construction process is as follows:
1) Construction of the generator network:
Two generator networks are constructed: the semantic feature generator G_1 and the visual feature generator G_2. As shown in FIG. 2, the semantic feature generator G_1 comprises 2 groups of convolution modules and 2 groups of fully connected modules. Each convolution module consists of a convolution layer (Conv), a max pooling layer (MaxPool) and a normalization layer; each fully connected module consists of a fully connected layer (FC) and a Leaky ReLU. As shown in FIG. 3, the visual feature generator G_2 comprises two groups of fully connected modules, three 4096-dimensional fully connected layers (FC 4096), a resampling layer (Reshape) and 5 groups of up-sampling modules. Each fully connected module consists of a fully connected layer and a Leaky ReLU; each up-sampling module consists of two up-sampling layers (Upconv) and two Leaky ReLUs connected alternately. The input of G_2 is the semantic features output by G_1.
In particular, the semantic feature generator G_1 comprises 2 groups of convolution modules and 2 groups of fully connected modules. After an image is input into the generator, it first passes through a convolution module whose convolution layer has kernel size 11 and stride 4; max pooling with pool size 3 and stride 2 reduces the mean-square-error deviation left by the convolution layer's parameter errors, and the subsequent normalization normalizes the dimensions of the input data. A convolution with kernel size 5 and stride 1 follows, again with max pooling of size 3 and stride 2 and normalization, after which the data is fed into a 1024-unit fully connected module. Finally, the input visual features are turned into semantic features by two fully connected modules of the same size.
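As an illustration, a minimal sketch of G_1 with the TensorFlow Keras API used by the embodiment; the filter counts, the 128x128 RGB input size and the 300-dimensional semantic output are assumptions not fixed by the text:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_G1(sem_dim=300):
    img = layers.Input(shape=(128, 128, 3))
    # convolution module 1: Conv(k=11, s=4) -> MaxPool(3, s=2) -> normalization
    x = layers.Conv2D(64, 11, strides=4, padding='same')(img)
    x = layers.MaxPooling2D(pool_size=3, strides=2)(x)
    x = layers.BatchNormalization()(x)
    # convolution module 2: Conv(k=5, s=1) -> MaxPool(3, s=2) -> normalization
    x = layers.Conv2D(256, 5, strides=1, padding='same')(x)
    x = layers.MaxPooling2D(pool_size=3, strides=2)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Flatten()(x)
    # 1024-unit fully connected module (FC + Leaky ReLU) ...
    x = layers.LeakyReLU(0.2)(layers.Dense(1024)(x))
    # ... followed by two fully connected modules of the same size
    for _ in range(2):
        x = layers.LeakyReLU(0.2)(layers.Dense(1024)(x))
    # project to the semantic feature dimension (an assumption; the text
    # does not fix the dimensionality of the semantic space)
    return tf.keras.Model(img, layers.Dense(sem_dim)(x), name='G1')
```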
In particular, the visual feature generator G_2 comprises two groups of fully connected modules, three 4096-dimensional fully connected layers, a resampling layer and 5 groups of up-sampling modules. The semantic features generated by the semantic feature generator are input into the visual feature generator and first pass through two 1024-unit fully connected modules; three 4096-dimensional fully connected layers then extract a 4096-dimensional feature vector from the input data; the resampling layer reshapes the input feature vector to 4 x 4 x 256; finally, 5 up-sampling modules with kernel size 4 and stride 2 up-sample the feature vector, with an activation function applied after every up-sampling to prevent vanishing gradients, and the feature vector is output.
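A matching sketch of G_2 follows; the transposed-convolution channel widths and the 3-channel 128x128 output are assumptions, while the two 1024-unit FC modules, the three 4096-d layers, the 4x4x256 reshape and the five kernel-4 stride-2 up-sampling stages follow the text:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_G2(sem_dim=300):
    sem = layers.Input(shape=(sem_dim,))   # pseudo-semantic features from G1
    x = sem
    # two 1024-unit fully connected modules (FC + Leaky ReLU)
    for _ in range(2):
        x = layers.LeakyReLU(0.2)(layers.Dense(1024)(x))
    # three 4096-dimensional fully connected layers
    for _ in range(3):
        x = layers.Dense(4096)(x)
    # resample the 4096-d vector into a 4 x 4 x 256 feature map
    x = layers.Reshape((4, 4, 256))(x)
    # five up-sampling stages (kernel 4, stride 2), Leaky ReLU after each
    for filters in (256, 128, 64, 32, 3):
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding='same')(x)
        x = layers.LeakyReLU(0.2)(x)
    return tf.keras.Model(sem, x, name='G2')   # 128x128x3 pseudo-visual output
```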
2) Construction of the discriminator network:
Two discriminator networks are constructed: the semantic discriminator D_1 and the visual discriminator D_2. D_1 comprises two branches: one branch for 0/1 (real/fake) classification and the other for classifying the input label category. The network structure of the first branch comprises a group of fully connected modules and a two-way fully connected layer; the fully connected module consists of a fully connected layer and a Leaky ReLU. The network structure of the other branch comprises a group of fully connected modules and an n-way fully connected layer, the fully connected module again consisting of a fully connected layer and a Leaky ReLU. D_2 comprises a group of fully connected modules and a fully connected layer, where the fully connected module consists of a fully connected layer and a Leaky ReLU. The last fully connected layer of each of the two discriminators D_1 and D_2 serves as a classifier within the overall convolutional neural network.
As shown in FIG. 4, the semantic discriminator D_1 specifically comprises two branches, one for the 0/1 binary classification and the other for class-label classification. It receives the real semantic features extracted by the autoencoder and the pseudo-semantic features generated by the semantic feature generator G_1. In the binary-classification branch, features are first extracted by a group of 1024-unit fully connected modules, the gradient is then stabilized with an activation function, and finally a fully connected layer performs the 0/1 binary classification to judge whether the input features are real; in the other, n-way branch, the last fully connected layer performs n-way classification of the input data.
As shown in FIG. 5, the visual discriminator D_2 discriminates the authenticity of features between the pseudo-visual features generated by the visual feature generator G_2 and the real visual features extracted by the CNN. The generated pseudo-visual features are input into the discriminator network D_2; features are first extracted with a 1024-unit fully connected layer, an activation function is then applied to prevent vanishing gradients, and finally a fully connected layer performs binary classification of the data to judge whether the input features are real.
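A minimal Keras sketch of the two discriminators follows; the 1024-unit modules follow the text, while the input dimensions, the number of classes and the scalar real/fake head (standing in for the two-way layer so that it can double as the WGAN-style critic used below) are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_D1(sem_dim=300, n_classes=100):
    """Semantic discriminator: shared input, two branches."""
    sem = layers.Input(shape=(sem_dim,))     # real or pseudo-semantic features
    # branch 1: FC module + output layer for the real/fake decision
    h1 = layers.LeakyReLU(0.2)(layers.Dense(1024)(sem))
    real_fake = layers.Dense(1)(h1)          # scalar critic score
    # branch 2: FC module + n-way layer for class-label classification
    h2 = layers.LeakyReLU(0.2)(layers.Dense(1024)(sem))
    class_logits = layers.Dense(n_classes)(h2)
    return tf.keras.Model(sem, [real_fake, class_logits], name='D1')

def build_D2(feat_shape=(128, 128, 3)):
    """Visual discriminator: FC module + binary output layer."""
    vis = layers.Input(shape=feat_shape)     # real or pseudo-visual features
    h = layers.Flatten()(vis)
    h = layers.LeakyReLU(0.2)(layers.Dense(1024)(h))
    out = layers.Dense(1)(h)                 # real/fake logit
    return tf.keras.Model(vis, out, name='D2')
```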
In step 103, the multi-objective loss function is constructed. The purpose of constructing the loss function is as follows: according to the convergence of the loss value, the corresponding parameters in the zero-shot recognition network model can be better updated and optimized, the optimized generative adversarial network model is finally obtained, and the images to be recognized in the real dataset are recognized more accurately. Specifically:
The adversarial loss function is divided into two parts: first, the adversarial loss of CTGAN, which evaluates the synthesized semantic features; the CTGAN adversarial loss imposes a corresponding constraint on the gradient penalty to improve the quality of the synthesized features. Second, the adversarial loss of an ordinary GAN, which evaluates the synthesized pseudo-visual features; the ordinary adversarial mechanism reduces the domain difference well.
The cycle consistency loss function records well the degree of match between the visual features extracted by the attention-based CNN and the generated pseudo-visual features.
The classifier is attached to the semantic discriminator D_1, so the classifier can effectively classify the class-label data and thereby fulfil the zero-shot image recognition task. The adversarial loss function of the semantic discriminator D_1 in the generative adversarial network model is as follows:

L_CTGAN = E_{x~P_f}[D_1(G_1(x))] - E_{a~P_r}[D_1(a)] + λ_1 E_{x̂~P_{r,f}}[(||∇_x̂ D_1(x̂)||_2 - 1)^2] + λ_2 CT|_{x',x''}

where x represents the real visual features, a represents the real semantic features, G_1(x) represents the semantic generator whose input visual features are x, D_1(G_1(x)) represents the semantic discriminator whose input is G_1(x), D_1(a) represents the semantic discriminator whose input semantic features are a, P_f represents the prior distribution of the real visual features, P_r represents the prior distribution of the real semantic features, x̂ represents a linear interpolation between features, and P_{r,f} represents the prior distribution obeying the real visual features and the real semantic features. The first term E_{x~P_f}[D_1(G_1(x))] represents the expectation of the pseudo-feature distribution; the second term E_{a~P_r}[D_1(a)] represents the expectation of the real feature distribution; the difference between the first and second terms is the Wasserstein distance between the feature distributions. E_{x̂~P_{r,f}}[(||∇_x̂ D_1(x̂)||_2 - 1)^2] denotes the gradient penalty enforcing the Lipschitz constraint, and λ_2 CT|_{x',x''} is the consistency or continuity term added to the constrained gradient penalty; λ_1 is the weight of the gradient penalty and λ_2 is the weight of the consistency or continuity term. Here,

CT|_{x',x''} = E[max(0, ||D_1(x') - D_1(x'')|| / ||x' - x''|| - c)]

where x' and x'' both represent perturbation data near the real visual features (perturbation data drawn arbitrarily in the neighbourhood of the real samples), c is a fixed constant, D(x') represents the semantic discriminator whose input is x', D(x'') represents the semantic discriminator whose input is x'', ||D(x') - D(x'')|| represents the distance between the two discriminator values, and ||x' - x''|| represents the distance between the two perturbation data features. The consistency term uses ||D(x') - D(x'')|| / ||x' - x''|| to approximate the gradient and limit it to be below c.
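A minimal TensorFlow sketch of this loss under the reconstruction above; `d1_score` is assumed to be a callable returning the scalar critic value of D_1, `lambda1`, `lambda2` and `c` follow the notation in the text, and their default values as well as the Gaussian perturbation scale 0.01 are assumptions (the text only says "perturbation data near the real samples"):

```python
import tensorflow as tf

def ctgan_loss(d1_score, a_real, a_fake, lambda1=10.0, lambda2=2.0, c=1.0):
    # Wasserstein term: E[D1(G1(x))] - E[D1(a)]
    w = tf.reduce_mean(d1_score(a_fake)) - tf.reduce_mean(d1_score(a_real))
    # gradient penalty on linear interpolations between real and fake features
    eps = tf.random.uniform([tf.shape(a_real)[0], 1], 0.0, 1.0)
    a_hat = eps * a_real + (1.0 - eps) * a_fake
    with tf.GradientTape() as tape:
        tape.watch(a_hat)
        d_hat = d1_score(a_hat)
    grads = tape.gradient(d_hat, a_hat)
    gp = tf.reduce_mean((tf.norm(grads, axis=1) - 1.0) ** 2)
    # consistency term: draw two perturbations x', x'' near the real features
    # and bound the finite-difference slope ||D(x')-D(x'')|| / ||x'-x''|| by c
    x1 = a_real + 0.01 * tf.random.normal(tf.shape(a_real))
    x2 = a_real + 0.01 * tf.random.normal(tf.shape(a_real))
    slope = tf.abs(d1_score(x1) - d1_score(x2)) / (
        tf.norm(x1 - x2, axis=1, keepdims=True) + 1e-8)
    ct = tf.reduce_mean(tf.nn.relu(slope - c))
    return w + lambda1 * gp + lambda2 * ct
```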
The adversarial loss function of the visual discriminator is constructed as an ordinary GAN loss:

L_adv = E_x[log D_2(x)] + E_ã[log(1 - D_2(G_2(ã)))]

where ã = G_1(x) represents the pseudo-semantic features, D_2(x) represents the visual discriminator whose input visual features are x, and G_2(ã) represents the visual feature generator whose input pseudo-semantic features are ã; x̃ = G_2(ã) denotes the generated pseudo-visual features. The network is continuously optimized through this loss function so that the generated pseudo-visual features x̃ get closer and closer to the real visual features x.
The adversarial loss analyzes the real feature distribution and the generated feature distribution as a whole, outputs a feedback signal to the generator network, and adjusts and optimizes the network parameters.
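A short sketch of this objective with the TensorFlow Keras API, assuming `d2` returns a logit; the split into a discriminator part and a generator part is the usual way the ordinary GAN loss is optimized in practice:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def adv_loss_d2(d2, x_real, x_fake):
    # D2 is trained to score real visual features as 1 and pseudo ones as 0
    return (bce(tf.ones_like(d2(x_real)), d2(x_real)) +
            bce(tf.zeros_like(d2(x_fake)), d2(x_fake)))

def adv_loss_g2(d2, x_fake):
    # G2 is rewarded when D2 mistakes its pseudo-visual features for real ones
    return bce(tf.ones_like(d2(x_fake)), d2(x_fake))
```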
The cycle consistency loss function of the real visual features and the pseudo-visual features is constructed as:

L_cyc = E[||G_2(G_1(x)) - x||_1] + E[||G_1(G_2(a)) - a||_1]

where E[||G_2(G_1(x)) - x||_1] represents the distribution expectation of the two visual features measured by cycle consistency, and E[||G_1(G_2(a)) - a||_1] represents the distribution expectation of the two semantic features measured by cycle consistency. The cycle consistency loss L_cyc is used to optimize the network parameters so that the real visual features x and the pseudo-semantic features ã match better.
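A direct sketch of this loss; writing the L1 norms as element-wise means is a common convention not fixed by the text:

```python
import tensorflow as tf

def cycle_loss(g1, g2, x_real, a_real):
    vis_cyc = tf.reduce_mean(tf.abs(g2(g1(x_real)) - x_real))  # E[||G2(G1(x)) - x||_1]
    sem_cyc = tf.reduce_mean(tf.abs(g1(g2(a_real)) - a_real))  # E[||G1(G2(a)) - a||_1]
    return vis_cyc + sem_cyc
```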
The classification loss function of the semantic discriminator is constructed as:

L_cls = -E[log P(b|G_1(a); θ)]

where P(b|G_1(a); θ) represents the class conditional probability of the class label, G_1(a) represents the semantic generator whose input semantic features are a, θ is a parameter of the classification network, and b is the class label of a. The classification accuracy of class labels is improved by minimizing the classification loss of the generated features.
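Since the classifier is the n-way branch of D_1, this negative log-likelihood is ordinary softmax cross-entropy; a minimal sketch, with `class_logits` standing for that branch's output:

```python
import tensorflow as tf

ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def cls_loss(class_logits, labels):
    # labels are the integer class labels b; minimizing the negative
    # log-likelihood -E[log P(b | .; theta)] is softmax cross-entropy
    return ce(labels, class_logits)
```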
In step 104, the constructed generative adversarial network model is iteratively trained, and the parameters of the network model are updated and optimized to obtain the trained generative adversarial network model. Specifically, the training image samples are used as the input of the semantic feature generator, and the semantic feature generator, the visual feature generator, the semantic discriminator and the visual discriminator are jointly trained by back propagation according to the multi-objective loss function, so that the parameters in these four networks are continuously updated and optimized, giving the trained generative adversarial network model. Fig. 6 shows the structure of the trained generative adversarial network model according to an embodiment of the present invention. The specific iterative training steps are as follows:
Training sample data from the Sketchy and TU-Berlin datasets are input into the attention-based CNN to extract the visual feature information of the training samples, which is then input into the semantic feature generator G_1 to generate the pseudo-semantic features ã.
The pseudo-semantic features obtained in the previous step are input into the visual feature generator G_2 to generate the pseudo-visual features x̃.
To better measure the similarity between a sketch and a real image during training, the cycle consistency loss constraint of Cycle-GAN is introduced; Cycle-GAN consists of two generators and two discriminators. Taking the semantic features and the visual features as data information of two different domains, the semantic feature generator G_1 generates the pseudo-semantic features ã from the real visual features x, and the visual feature generator G_2 generates the pseudo-visual features x̃ back from the obtained pseudo-semantic features ã. The cycle consistency loss L_cyc is then used to measure the similarity between the real visual features and the pseudo-visual features.
The Wikipedia texts are input into the hierarchical model to obtain the useful information of the texts, which is then input into the autoencoder to extract the real semantic information of the Wikipedia texts. The real semantic information a serves as the input of the discriminator D_1 and is used for adversarial learning against the pseudo-semantic features generated by G_1.
CTGAN, a variant of WGAN, is used as the discriminator D_1 to improve the accuracy of zero-shot image recognition. The gradient penalty of WGAN is not applied everywhere: if the real sample distribution and the generated pseudo-sample distribution are far apart, the gradient penalty often cannot reach the region near the real samples, i.e., the discriminator may break Lipschitz continuity there. On the basis of WGAN, CTGAN adds a consistency term that constrains the gradient over the real sample distribution, thereby enforcing Lipschitz continuity near the data sample distribution.
The pseudo-visual features x̃ generated by the visual feature generator G_2 and the real visual features x serve as the input of the visual discriminator D_2. G_2 and D_2 judge the authenticity of the visual features to produce the adversarial loss, and the network parameters are updated and optimized through the loss function so that the pseudo-visual features x̃ get closer and closer to the real visual features x.
The adversarial loss function L_CTGAN of the discriminator D_1 and the adversarial loss function L_adv of the discriminator D_2 are constructed from the feature information of the Wikipedia texts and the sketches; the cycle consistency loss function L_cyc is constructed from the real visual features and the pseudo-visual features of the sketches; finally, the loss function L_cls for classifying the label categories is constructed.
The specific update and optimization process is as follows: the generator network parameters are fixed while the discriminator network is trained, giving a trained discriminator network model; the trained discriminator network parameters are then fixed while the generator network is trained by back propagation, giving an optimized generator network model; these steps are repeated to obtain the optimal generative adversarial network model.
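This alternating scheme can be sketched as follows, reusing the loss helpers from the earlier sketches; the optimizers, learning rates and the weight 10.0 on the cycle loss are assumptions, and `x_real`, `a_real`, `labels` denote one batch of real visual features, real semantic features and class labels:

```python
import tensorflow as tf

opt_d = tf.keras.optimizers.Adam(1e-4, beta_1=0.5)
opt_g = tf.keras.optimizers.Adam(1e-4, beta_1=0.5)

def train_step(x_real, a_real, labels, G1, G2, D1, D2):
    d1_score = lambda s: D1(s)[0]   # scalar real/fake branch of D1
    # 1) discriminator step: generator parameters stay fixed, only the
    #    discriminator variables receive gradients
    with tf.GradientTape() as tape:
        a_fake = G1(x_real)
        x_fake = G2(a_fake)
        _, cls_real = D1(a_real)
        d_loss = (ctgan_loss(d1_score, a_real, a_fake)
                  + adv_loss_d2(D2, x_real, x_fake)
                  + cls_loss(cls_real, labels))
    d_vars = D1.trainable_variables + D2.trainable_variables
    opt_d.apply_gradients(zip(tape.gradient(d_loss, d_vars), d_vars))
    # 2) generator step: discriminator parameters stay fixed
    with tf.GradientTape() as tape:
        a_fake = G1(x_real)
        x_fake = G2(a_fake)
        g_loss = (-tf.reduce_mean(d1_score(a_fake))   # fool D1
                  + adv_loss_g2(D2, x_fake)           # fool D2
                  + 10.0 * cycle_loss(G1, G2, x_real, a_real))
    g_vars = G1.trainable_variables + G2.trainable_variables
    opt_g.apply_gradients(zip(tape.gradient(g_loss, g_vars), g_vars))
    return d_loss, g_loss
```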
The zero-shot image recognition method based on a generative adversarial network in this embodiment has the following advantages. A semantically aligned cycle consistency loss constraint is introduced into the generative model to solve the problem that common semantic knowledge cannot be exploited between training and test images in real scenarios, and to measure the correlation between visual features and semantic features; a classification network parallel to the discriminator is added at the discriminator's output to correctly classify class labels. The CTGAN variant of WGAN is used for adversarial learning between real and synthesized features, adding a consistency term on top of WGAN to constrain the gradient over the real feature distribution. Zero-shot learning over the whole feature-based attribute set suffers from high training cost and complexity; the proposed scheme, based on Wikipedia texts and a hierarchical structure, extracts features of attribute subsets with an autoencoder, uses the hierarchy to partition the subsets, screens the useful information, and extracts the important feature information from the text, thereby reducing training cost and complexity. Recognition over attribute subsets is more effective than over the whole attribute set.
The method in this embodiment uses a generative adversarial network to realize zero-shot recognition, can recognize sketches without annotation information, can improve zero-shot recognition accuracy, and improves the generalization ability of the model.
Fig. 7 is a schematic structural diagram of the zero-shot image recognition system based on a generative adversarial network according to an embodiment of the present invention. Referring to fig. 7, the system includes:
a sample obtaining module 201, configured to obtain a training image sample and a test image sample; the training image sample is a sample image with marking information, and the test image sample is a sample image without marking information.
A network model construction module 202, configured to construct and generate a confrontation network model; the generation countermeasure network model comprises a semantic feature generator, a visual feature generator, a semantic discriminator and a visual discriminator; the semantic feature generator is used for generating pseudo-semantic features according to the real visual features; the visual feature generator is used for generating a pseudo visual feature according to the pseudo semantic feature; the semantic discriminator is used for discriminating the real semantic features and the pseudo semantic features; the visual discriminator is used for discriminating the real visual features and the pseudo visual features.
A loss function constructing module 203, configured to construct a multi-objective loss function; the multi-target loss function comprises a cycle consistency loss function of a real visual feature and a pseudo visual feature, a confrontation loss function of a semantic discriminator, a confrontation loss function of the visual discriminator and a classification loss function of the semantic discriminator.
The training module 204 is configured to use the training image sample as an input of the generated confrontation network model, and perform iterative training on the generated confrontation network model based on the multi-objective loss function to obtain a trained generated confrontation network model.
And the test recognition module 205 is configured to input the test image sample into the trained generated confrontation network model to obtain a recognition result.
As an optional implementation, the zero-shot image recognition system based on a generative adversarial network further includes:
a real semantic feature acquisition module, configured to input the Wikipedia texts into the hierarchical model to obtain the useful information of the texts and to input the useful information of the texts into the autoencoder to obtain the real semantic features;
and a real visual feature acquisition module, configured to input the training image samples into the attention-based CNN model to obtain the real visual features.
As an optional implementation manner, the network model building module 202 specifically includes:
the first generator constructing unit is used for constructing a semantic feature generator; the semantic feature generator comprises two groups of convolution modules and two groups of full-connection modules; the convolution module comprises a convolution layer, a maximum pooling layer and a normalization layer which are connected in sequence; the full-connection module comprises a full-connection layer and a Leaky ReLU layer.
A second generator building unit for building a visual feature generator; the visual feature generator comprises two groups of fully-connected modules, three 4096-dimensional fully-connected layers, a resampling layer and five groups of upsampling modules which are sequentially connected; the up-sampling module comprises two up-sampling layers and two Leaky ReLU layers; and the upsampling layer in the upsampling module and the Leaky ReLU layer are alternately connected.
The first discriminator constructing unit is used for constructing a semantic discriminator; the semantic discriminator comprises a group of full-connection modules, a two-path full-connection layer, an n-path full-connection layer, two classifiers and an input label classifier.
A second discriminator establishing unit for establishing a visual discriminator; the visual discriminator comprises a group of fully connected modules, a fully connected layer and a classifier.
As an optional implementation manner, the loss function constructing module 203 specifically includes:
a first loss function construction unit for constructing a countermeasure loss function of the semantic discriminator
Figure BDA0002440299930000131
Where x represents the true visual features, a represents the true semantic features, G1(x) Semantic Generator, D, representing an input visual feature as x1(G1(x) Represents input G)1(x) Semantic discriminator of, D1(a) Semantic discriminator expressing input semantic features as a, PfA priori distribution, P, representing true visual featuresrA prior distribution representing the true semantic features,
Figure BDA0002440299930000132
representing linear interpolation between features, Pr,fRepresenting a prior distribution of obedient real visual features and real semantic features;
Figure BDA0002440299930000133
representing a desire for a pseudo feature distribution;
Figure BDA0002440299930000134
an expectation representing a true feature distribution;
Figure BDA0002440299930000135
denotes the gradient penalty, λ, of performing the Lipschitz constraint2CT|x',x”A consistency or continuity term representing an increased constraint gradient penalty; lambda [ alpha ]1A weight representing a gradient penalty; lambda [ alpha ]2Weights representing consistency or continuity terms; wherein,
Figure BDA0002440299930000136
x' and x "both represent perturbation data in the vicinity of the true visual feature; c is a fixed constant; d (x ') represents the semantic arbiter input as x', D (x ") represents the semantic arbiter input as x", | D (x ') -D (x ") | | represents the distance between two arbiter values, | | x' -x" | | represents the distance between two perturbation data features.
A second loss function constructing unit for constructing a countering loss function of the visual discriminator
Figure BDA0002440299930000141
Wherein,
Figure BDA0002440299930000142
representing pseudo-semantic features, D2(x) A visual discriminator representing the input visual feature x,
Figure BDA0002440299930000143
representing input pseudo-semantic features
Figure BDA0002440299930000144
The visual sense generator of (a) is,
Figure BDA0002440299930000145
presentation input
Figure BDA0002440299930000146
The visual characteristics generator of (1).
A third loss function constructing unit for constructing a circular consistency loss function of the real visual features and the pseudo visual features
Figure BDA0002440299930000147
E[||G2(G1(x))-x||1]Representing a distribution expectation of two visual features measured by cyclic consistency;
Figure BDA0002440299930000148
representing the expectation of distribution of two semantic features measured by circular consistency.
A fourth loss function construction unit for constructing the classification loss function of the semantic discriminator
Lcls=-E[logP(b|G1(a);θ)];
Wherein, P (c | G)1(a) (ii) a θ) represents the class conditional probability of the class label, G1(a) And a semantic generator which represents the input semantic features as a, theta is a parameter of the classification network, and b is a class label of a.
As an optional implementation manner, the training module 204 specifically includes:
and the training unit is used for taking the training image sample as the input of the semantic feature generator, and performing combined training on the semantic feature generator, the visual feature generator, the semantic discriminator and the visual discriminator in a back propagation mode according to the multi-target loss function, so that parameters in the semantic feature generator, the visual feature generator, the semantic discriminator and the visual discriminator are continuously updated and optimized, and a trained generated confrontation network model is obtained.
The zero-shot image recognition system based on a generative adversarial network in this embodiment uses a generative adversarial network to realize zero-shot recognition, can recognize sketches without annotation information, can improve zero-shot recognition accuracy, and improves the generalization ability of the model.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A zero-shot image recognition method based on a generative adversarial network, characterized by comprising the following steps:
acquiring a training image sample and a test image sample; the training image sample is a sample image with annotation information, and the test image sample is a sample image without annotation information;
constructing a generative adversarial network model; the generative adversarial network model comprises a semantic feature generator, a visual feature generator, a semantic discriminator and a visual discriminator; the semantic feature generator is used for generating pseudo-semantic features from real visual features; the visual feature generator is used for generating pseudo-visual features from the pseudo-semantic features; the semantic discriminator is used for discriminating between real semantic features and pseudo-semantic features; the visual discriminator is used for discriminating between real visual features and pseudo-visual features;
constructing a multi-objective loss function; the multi-objective loss function comprises a cycle consistency loss function of the real visual features and the pseudo-visual features, an adversarial loss function of the semantic discriminator, an adversarial loss function of the visual discriminator and a classification loss function of the semantic discriminator;
taking the training image sample as the input of the generative adversarial network model, and iteratively training the generative adversarial network model based on the multi-objective loss function to obtain a trained generative adversarial network model;
inputting the test image sample into the trained generative adversarial network model to obtain a recognition result;
the constructing of the multi-objective loss function specifically includes:
constructing the adversarial loss function of the semantic discriminator

L_CTGAN = E_{x~P_f}[D_1(G_1(x))] - E_{a~P_r}[D_1(a)] + λ_1 E_{x̂~P_{r,f}}[(||∇_x̂ D_1(x̂)||_2 - 1)^2] + λ_2 CT|_{x',x''}

wherein x represents the real visual features, a represents the real semantic features, G_1(x) represents the semantic generator whose input visual features are x, D_1(G_1(x)) represents the semantic discriminator whose input is G_1(x), D_1(a) represents the semantic discriminator whose input semantic features are a, P_f represents the prior distribution of the real visual features, P_r represents the prior distribution of the real semantic features, x̂ represents a linear interpolation between features, and P_{r,f} represents the prior distribution obeying the real visual features and the real semantic features; E_{x~P_f}[D_1(G_1(x))] represents the expectation of the pseudo-feature distribution; E_{a~P_r}[D_1(a)] represents the expectation of the real feature distribution; E_{x̂~P_{r,f}}[(||∇_x̂ D_1(x̂)||_2 - 1)^2] represents the gradient penalty enforcing the Lipschitz constraint, and λ_2 CT|_{x',x''} represents the consistency or continuity term added to the constrained gradient penalty; λ_1 represents the weight of the gradient penalty; λ_2 represents the weight of the consistency or continuity term; wherein

CT|_{x',x''} = E[max(0, ||D_1(x') - D_1(x'')|| / ||x' - x''|| - c)]

x' and x'' both represent perturbation data in the vicinity of the real visual features; c is a fixed constant; D(x') represents the semantic discriminator whose input is x', D(x'') represents the semantic discriminator whose input is x'', ||D(x') - D(x'')|| represents the distance between the two discriminator values, and ||x' - x''|| represents the distance between the two perturbation data features;
constructing the adversarial loss function of the visual discriminator

L_adv = E_x[log D_2(x)] + E_ã[log(1 - D_2(G_2(ã)))]

wherein ã represents the pseudo-semantic features, D_2(x) represents the visual discriminator whose input visual features are x, and G_2(ã) represents the visual feature generator whose input pseudo-semantic features are ã;
constructing the cycle consistency loss function of the real visual features and the pseudo-visual features

L_cyc = E[||G_2(G_1(x)) - x||_1] + E[||G_1(G_2(a)) - a||_1]

wherein E[||G_2(G_1(x)) - x||_1] represents the distribution expectation of the two visual features measured by cycle consistency, and E[||G_1(G_2(a)) - a||_1] represents the distribution expectation of the two semantic features measured by cycle consistency;
constructing the classification loss function of the semantic discriminator

L_cls = -E[log P(b|G_1(a); θ)]

wherein P(b|G_1(a); θ) represents the class conditional probability of the class label, G_1(a) represents the semantic generator whose input semantic features are a, θ is a parameter of the classification network, and b is the class label of a.
2. The zero-shot image recognition method based on a generative adversarial network according to claim 1, further comprising, before the constructing of the generative adversarial network model:
inputting the Wikipedia texts into a hierarchical model to obtain useful information of the texts, and inputting the useful information of the texts into an autoencoder to obtain the real semantic features;
and inputting the training image samples into an attention-based CNN model to obtain the real visual features.
3. The zero-shot image recognition method based on a generative adversarial network according to claim 1, wherein the constructing of the generative adversarial network model specifically includes:
constructing a semantic feature generator; the semantic feature generator comprises two groups of convolution modules and two groups of full-connection modules; the convolution module comprises a convolution layer, a maximum pooling layer and a normalization layer which are connected in sequence; the full-connection module comprises a full-connection layer and a Leaky ReLU layer;
constructing a visual feature generator; the visual feature generator comprises two groups of fully-connected modules, three 4096-dimensional fully-connected layers, a resampling layer and five groups of upsampling modules which are sequentially connected; the up-sampling module comprises two up-sampling layers and two Leaky ReLU layers; the upsampling layer in the upsampling module is alternately connected with the Leaky ReLU layer;
constructing a semantic discriminator; the semantic discriminator comprises a group of full-connection modules, a two-path full-connection layer, an n-path full-connection layer, two classifiers and an input label classifier;
constructing a visual discriminator; the visual discriminator comprises a group of fully connected modules, a fully connected layer and a classifier.
4. The method as claimed in claim 1, wherein the step of taking the training image sample as the input of the generative adversarial network model and iteratively training the generative adversarial network model based on the multi-objective loss function to obtain the trained generative adversarial network model specifically comprises:
taking the training image sample as the input of the semantic feature generator, and jointly training the semantic feature generator, the visual feature generator, the semantic discriminator and the visual discriminator by back propagation according to the multi-objective loss function, so that the parameters in the semantic feature generator, the visual feature generator, the semantic discriminator and the visual discriminator are continuously updated and optimized to obtain the trained generative adversarial network model.
5. A zero-sample image recognition system based on a generative adversarial network, comprising:

a sample acquisition module, configured to acquire training image samples and test image samples; the training image samples are sample images with annotation information, and the test image samples are sample images without annotation information;

a network model construction module, configured to construct a generative adversarial network model; the generative adversarial network model comprises a semantic feature generator, a visual feature generator, a semantic discriminator and a visual discriminator; the semantic feature generator is configured to generate pseudo-semantic features from the real visual features; the visual feature generator is configured to generate pseudo-visual features from the pseudo-semantic features; the semantic discriminator is configured to discriminate between the real semantic features and the pseudo-semantic features; the visual discriminator is configured to discriminate between the real visual features and the pseudo-visual features;

a loss function construction module, configured to construct a multi-objective loss function; the multi-objective loss function comprises a cycle consistency loss function of the real visual features and the pseudo visual features, an adversarial loss function of the semantic discriminator, an adversarial loss function of the visual discriminator, and a classification loss function of the semantic discriminator;

a training module, configured to take the training image samples as the input of the generative adversarial network model, and to iteratively train the generative adversarial network model based on the multi-objective loss function to obtain a trained generative adversarial network model;

a test recognition module, configured to input the test image samples into the trained generative adversarial network model to obtain a recognition result;
wherein the loss function construction module specifically comprises:

a first loss function construction unit, configured to construct the adversarial loss function of the semantic discriminator:

Ladv1 = Ε_{x~Pf}[D1(G1(x))] - Ε_{a~Pr}[D1(a)] + λ1Ε_{x̂~Pr,f}[(||∇x̂D1(x̂)||2 - 1)^2] + λ2CT|x',x'';

wherein x represents the real visual features and a represents the real semantic features; G1(x) represents the semantic generator with input visual feature x, D1(G1(x)) represents the semantic discriminator with input G1(x), and D1(a) represents the semantic discriminator with input semantic feature a; Pf represents the prior distribution of the real visual features and Pr represents the prior distribution of the real semantic features; x̂ represents a linear interpolation between features and obeys the joint prior distribution Pr,f of the real visual features and the real semantic features; Ε_{x~Pf}[D1(G1(x))] represents the expectation over the pseudo feature distribution; Ε_{a~Pr}[D1(a)] represents the expectation over the real feature distribution; λ1Ε_{x̂~Pr,f}[(||∇x̂D1(x̂)||2 - 1)^2] denotes the gradient penalty enforcing the Lipschitz constraint, and λ2CT|x',x'' is a consistency or continuity term added to the gradient penalty constraint; λ1 represents the weight of the gradient penalty; λ2 represents the weight of the consistency or continuity term; wherein,
CT|x',x'' = Ε[max(0, ||D(x')-D(x'')||/||x'-x''|| - c)];

wherein x' and x'' both represent perturbed data in the vicinity of the real visual features; c is a fixed constant; D(x') represents the semantic discriminator with input x', and D(x'') represents the semantic discriminator with input x''; ||D(x')-D(x'')|| represents the distance between the two discriminator outputs, and ||x'-x''|| represents the distance between the two perturbed data features;
a second loss function construction unit, configured to construct the adversarial loss function of the visual discriminator:

Ladv2 = Ε[D2(G2(ã))] - Ε[D2(x)];

wherein ã = G1(x) represents the pseudo-semantic features; D2(x) represents the visual discriminator with input visual feature x; G2(ã) represents the visual feature generator with input pseudo-semantic feature ã; and D2(G2(ã)) represents the visual discriminator with input G2(ã);
a third loss function construction unit, configured to construct the cycle consistency loss function of the real visual features and the pseudo visual features:

Lcyc = Ε[||G2(G1(x))-x||1] + Ε[||G1(G2(a))-a||1];

wherein Ε[||G2(G1(x))-x||1] represents the expected distance between the two visual feature distributions measured by cycle consistency, and Ε[||G1(G2(a))-a||1] represents the expected distance between the two semantic feature distributions measured by cycle consistency;
a fourth loss function construction unit, configured to construct the classification loss function of the semantic discriminator:

Lcls = -Ε[log P(b|G1(a);θ)];

wherein P(b|G1(a);θ) represents the class conditional probability of the class label, G1(a) represents the semantic generator with input semantic feature a, θ is the parameter of the classification network, and b is the class label of a. (A hedged sketch of the gradient penalty and consistency term in the first loss function construction unit follows this claim.)
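For illustration only, a hedged PyTorch sketch of the gradient penalty and consistency term appearing in the first loss function construction unit. The linear interpolation scheme, the Gaussian perturbations standing in for "perturbed data in the vicinity of the real features", the ratio form of CT, and all constants are reconstructions from the claim wording, not verified against the original filing.

import torch

def gradient_penalty(D1, real, fake, lambda1=10.0):
    # lambda1 * E[(||grad_xhat D1(x_hat)||2 - 1)^2], with x_hat sampled on the
    # line between a real and a generated feature (Lipschitz constraint).
    eps = torch.rand(real.size(0), 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(D1(x_hat).sum(), x_hat, create_graph=True)[0]
    return lambda1 * ((grad.norm(2, dim=1) - 1) ** 2).mean()

def consistency_term(D1, real, c=0.1, sigma=0.01, lambda2=2.0):
    # lambda2 * E[max(0, ||D1(x') - D1(x'')|| / ||x' - x''|| - c)] for two
    # perturbations x', x'' drawn near the real features.
    x1 = real + sigma * torch.randn_like(real)
    x2 = real + sigma * torch.randn_like(real)
    ratio = (D1(x1) - D1(x2)).abs().view(-1) / (x1 - x2).norm(2, dim=1)
    return lambda2 * torch.clamp(ratio - c, min=0).mean()

In a full training step these two terms would simply be added to the discriminator loss of the sketch after claim 4.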
6. The system according to claim 5, further comprising:

a real semantic feature acquisition module, configured to input text from Wikipedia into a hierarchical model to obtain the useful information of the text, and to input the useful information of the text into an auto-encoder to obtain the real semantic features;

a real visual feature acquisition module, configured to input the training image samples into an attention-mechanism-based CNN model to obtain the real visual features.
7. The zero-sample image recognition system based on a generative adversarial network according to claim 5, wherein the network model construction module specifically comprises:

a first generator construction unit, configured to construct the semantic feature generator; the semantic feature generator comprises two groups of convolution modules and two groups of fully-connected modules; each convolution module comprises a convolution layer, a max-pooling layer and a normalization layer which are connected in sequence; each fully-connected module comprises a fully-connected layer and a Leaky ReLU layer;

a second generator construction unit, configured to construct the visual feature generator; the visual feature generator comprises two groups of fully-connected modules, three 4096-dimensional fully-connected layers, a resampling layer and five groups of up-sampling modules which are connected in sequence; each up-sampling module comprises two up-sampling layers and two Leaky ReLU layers; the up-sampling layers in the up-sampling module are connected alternately with the Leaky ReLU layers;

a first discriminator construction unit, configured to construct the semantic discriminator; the semantic discriminator comprises a group of fully-connected modules, a two-way fully-connected layer, an n-way fully-connected layer, two classifiers and an input-label classifier;

a second discriminator construction unit, configured to construct the visual discriminator; the visual discriminator comprises a group of fully-connected modules, a fully-connected layer and a classifier.
8. The system according to claim 5, wherein the training module specifically comprises:

a training unit, configured to take the training image samples as the input of the semantic feature generator, and to jointly train the semantic feature generator, the visual feature generator, the semantic discriminator and the visual discriminator by back propagation according to the multi-objective loss function, so that the parameters of the semantic feature generator, the visual feature generator, the semantic discriminator and the visual discriminator are continuously updated and optimized, thereby obtaining the trained generative adversarial network model.
CN202010263452.4A 2020-04-07 2020-04-07 Zero sample image identification method and system based on generation countermeasure network Expired - Fee Related CN111476294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010263452.4A CN111476294B (en) 2020-04-07 2020-04-07 Zero sample image identification method and system based on generation countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010263452.4A CN111476294B (en) 2020-04-07 2020-04-07 Zero sample image identification method and system based on generation countermeasure network

Publications (2)

Publication Number Publication Date
CN111476294A (en) 2020-07-31
CN111476294B (en) 2022-03-22

Family

ID=71749908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010263452.4A Expired - Fee Related CN111476294B (en) 2020-04-07 2020-04-07 Zero sample image identification method and system based on generation countermeasure network

Country Status (1)

Country Link
CN (1) CN111476294B (en)

Families Citing this family (66)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950619B (en) * 2020-08-05 2022-09-09 东北林业大学 Active learning method based on dual-generation countermeasure network
CN112069397B (en) * 2020-08-21 2023-08-04 三峡大学 Rumor detection method combining self-attention mechanism and generation of countermeasure network
CN112001122B (en) * 2020-08-26 2023-09-26 合肥工业大学 Non-contact physiological signal measurement method based on end-to-end generation countermeasure network
CN112199479B (en) * 2020-09-15 2024-08-02 北京捷通华声科技股份有限公司 Method, device, equipment and storage medium for optimizing language semantic understanding model
CN112132197B (en) * 2020-09-15 2024-07-09 腾讯科技(深圳)有限公司 Model training, image processing method, device, computer equipment and storage medium
CN112149802B (en) * 2020-09-17 2022-08-09 广西大学 Image content conversion method with consistent semantic structure
CN112101470B (en) * 2020-09-18 2023-04-11 上海电力大学 Guide zero sample identification method based on multi-channel Gauss GAN
CN112199637B (en) * 2020-09-21 2024-04-12 浙江大学 Regression modeling method for generating contrast network data enhancement based on regression attention
CN112308113A (en) * 2020-09-23 2021-02-02 济南浪潮高新科技投资发展有限公司 Target identification method, device and medium based on semi-supervision
CN112232378A (en) * 2020-09-23 2021-01-15 中国人民解放军战略支援部队信息工程大学 Zero-order learning method for fMRI visual classification
CN112364138A (en) * 2020-10-12 2021-02-12 上海交通大学 Visual question-answer data enhancement method and device based on anti-attack technology
CN112287779B (en) * 2020-10-19 2022-03-25 华南农业大学 Low-illuminance image natural illuminance reinforcing method and application
CN112364894B (en) * 2020-10-23 2022-07-08 天津大学 Zero sample image classification method of countermeasure network based on meta-learning
CN112415514B (en) * 2020-11-16 2023-05-02 北京环境特性研究所 Target SAR image generation method and device
CN113191381B (en) * 2020-12-04 2022-10-11 云南大学 Image zero-order classification model based on cross knowledge and classification method thereof
CN112560034B (en) * 2020-12-11 2024-03-29 宿迁学院 Malicious code sample synthesis method and device based on feedback type deep countermeasure network
CN112667496B (en) * 2020-12-14 2022-11-18 清华大学 Black box countermeasure test sample generation method and device based on multiple prior
CN112580722B (en) * 2020-12-20 2024-06-14 大连理工大学人工智能大连研究院 Generalized zero sample image recognition method based on conditional countermeasure automatic encoder
CN112731327B (en) * 2020-12-25 2023-05-23 南昌航空大学 HRRP radar target identification method based on CN-LSGAN, STFT and CNN
CN112700408B (en) * 2020-12-28 2023-09-08 中国银联股份有限公司 Model training method, image quality evaluation method and device
CN112767505B (en) * 2020-12-31 2023-12-22 深圳市联影高端医疗装备创新研究院 Image processing method, training device, electronic terminal and storage medium
CN112767507B (en) * 2021-01-15 2022-11-18 大连理工大学 Cartoon sketch coloring method based on dynamic memory module and generation confrontation network
CN112766366A (en) * 2021-01-18 2021-05-07 深圳前海微众银行股份有限公司 Training method for resisting generation network and image processing method and device thereof
CN112766386B (en) * 2021-01-25 2022-09-20 大连理工大学 Generalized zero sample learning method based on multi-input multi-output fusion network
CN112818995B (en) * 2021-01-27 2024-05-21 北京达佳互联信息技术有限公司 Image classification method, device, electronic equipment and storage medium
CN113283423B (en) * 2021-01-29 2022-08-16 南京理工大学 Natural scene distortion text image correction method and system based on generation network
CN113221948B (en) * 2021-04-13 2022-08-05 复旦大学 Digital slice image classification method based on countermeasure generation network and weak supervised learning
CN113222002B (en) * 2021-05-07 2024-04-05 西安交通大学 Zero sample classification method based on generative discriminative contrast optimization
CN113140020B (en) * 2021-05-13 2022-10-14 电子科技大学 Method for generating image based on text of countermeasure network generated by accompanying supervision
CN113269274B (en) * 2021-06-18 2022-04-19 南昌航空大学 Zero sample identification method and system based on cycle consistency
CN113726545B (en) * 2021-06-23 2022-12-23 清华大学 Network traffic generation method and device for generating countermeasure network based on knowledge enhancement
CN113378959B (en) * 2021-06-24 2022-03-15 中国矿业大学 Zero sample learning method for generating countermeasure network based on semantic error correction
CN113706645A (en) * 2021-06-30 2021-11-26 酷栈(宁波)创意科技有限公司 Information processing method for landscape painting
CN113609569B (en) * 2021-07-01 2023-06-09 湖州师范学院 Distinguishing type generalized zero sample learning fault diagnosis method
CN113361646A (en) * 2021-07-01 2021-09-07 中国科学技术大学 Generalized zero sample image identification method and model based on semantic information retention
CN113537322B (en) * 2021-07-02 2023-04-18 电子科技大学 Zero sample visual classification method for cross-modal semantic enhancement generation countermeasure network
CN113505845A (en) * 2021-07-23 2021-10-15 黑龙江省博雅智睿科技发展有限责任公司 Deep learning training set image generation method based on language
CN113706379B (en) * 2021-07-29 2023-05-26 山东财经大学 Interlayer interpolation method and system based on medical image processing
CN113642621B (en) * 2021-08-03 2024-06-28 南京邮电大学 Zero sample image classification method based on generation countermeasure network
CN113657272B (en) * 2021-08-17 2022-06-28 山东建筑大学 Micro video classification method and system based on missing data completion
CN113746087B (en) * 2021-08-19 2023-03-21 浙江大学 Power grid transient stability sample controllable generation and evaluation method and system based on CTGAN
CN113763442B (en) * 2021-09-07 2023-06-13 南昌航空大学 Deformable medical image registration method and system
CN113762180B (en) * 2021-09-13 2023-09-01 中国科学技术大学 Training method and system for human body activity imaging based on millimeter wave radar signals
CN113806584B (en) * 2021-09-17 2022-10-14 河海大学 Self-supervision cross-modal perception loss-based method for generating command actions of band
CN114154550B (en) * 2021-10-12 2024-10-18 清华大学 Domain name countermeasure sample generation method and device
CN114067195B (en) * 2021-10-20 2024-08-13 北京航天自动控制研究所 Target detector learning method based on generated countermeasure
CN114373077A (en) * 2021-12-07 2022-04-19 燕山大学 Sketch identification method based on double-layer structure
CN114359659B (en) * 2021-12-17 2024-09-06 华南理工大学 Attention disturbance-based automatic image labeling method, system and medium
CN114176549B (en) * 2021-12-23 2024-04-16 杭州电子科技大学 Fetal heart rate signal data enhancement method and device based on generation type countermeasure network
CN114387444B (en) * 2021-12-24 2024-10-15 大连理工大学 Zero sample classification method based on negative boundary triplet loss and data enhancement
CN114005005B (en) * 2021-12-30 2022-03-22 深圳佑驾创新科技有限公司 Double-batch standardized zero-instance image classification method
CN114511737B (en) * 2022-01-24 2022-09-09 北京建筑大学 Training method of image recognition domain generalization model
CN114519118A (en) * 2022-02-21 2022-05-20 安徽大学 Zero sample sketch retrieval method based on multiple times of GAN and semantic cycle consistency
CN114998124B (en) * 2022-05-23 2024-06-18 北京航空航天大学 Image sharpening processing method for target detection
CN115187467B (en) * 2022-05-31 2024-07-02 北京昭衍新药研究中心股份有限公司 Enhanced virtual image data generation method based on generation countermeasure network
CN114723611B (en) * 2022-06-10 2022-09-30 季华实验室 Image reconstruction model training method, reconstruction method, device, equipment and medium
CN114757342B (en) * 2022-06-14 2022-09-09 南昌大学 Electronic data information evidence-obtaining method based on confrontation training
CN115314254B (en) * 2022-07-07 2023-06-23 中国人民解放军战略支援部队信息工程大学 Semi-supervised malicious traffic detection method based on improved WGAN-GP
CN115308705A (en) * 2022-08-05 2022-11-08 北京理工大学 Multi-pose extremely narrow pulse echo generation method based on generation countermeasure network
CN115222752B (en) * 2022-09-19 2023-01-24 之江实验室 Pathological image feature extractor training method and device based on feature decoupling
CN115424119B (en) * 2022-11-04 2023-03-24 之江实验室 Image generation training method and device capable of explaining GAN based on semantic fractal
CN115527216B (en) * 2022-11-09 2023-05-23 中国矿业大学(北京) Text image generation method based on modulation fusion and antagonism network generation
CN116579414B (en) * 2023-03-24 2024-04-02 浙江医准智能科技有限公司 Model training method, MRI thin layer data reconstruction method, device and equipment
CN117541883B (en) * 2024-01-09 2024-04-09 四川见山科技有限责任公司 Image generation model training, image generation method, system and electronic equipment
CN117610614B (en) * 2024-01-11 2024-03-22 四川大学 Attention-guided generation countermeasure network zero sample nuclear power seal detection method
CN117934930B (en) * 2024-01-12 2024-09-10 西南计算机有限责任公司 Target identification method based on unmanned platform


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10810767B2 (en) * 2018-06-12 2020-10-20 Siemens Healthcare Gmbh Machine-learned network for Fourier transform in reconstruction for medical imaging

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875818A (en) * 2018-06-06 2018-11-23 西安交通大学 Based on variation from code machine and confrontation network integration zero sample image classification method
CN109460814A (en) * 2018-09-28 2019-03-12 浙江工业大学 A kind of deep learning classification method for attacking resisting sample function with defence
CN109816032A (en) * 2019-01-30 2019-05-28 中科人工智能创新技术研究院(青岛)有限公司 Zero sample classification method and apparatus of unbiased mapping based on production confrontation network
CN110334781A (en) * 2019-06-10 2019-10-15 大连理工大学 A kind of zero sample learning algorithm based on Res-Gan
CN110490946A (en) * 2019-07-15 2019-11-22 同济大学 Text generation image method based on cross-module state similarity and generation confrontation network
CN110443293A (en) * 2019-07-25 2019-11-12 天津大学 Based on double zero sample image classification methods for differentiating and generating confrontation network text and reconstructing
CN110795585A (en) * 2019-11-12 2020-02-14 福州大学 Zero sample image classification model based on generation countermeasure network and method thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks";Jun-Yan Zhu等;《2017 IEEE International Conference on Computer Vision (ICCV)》;20171225;第2242-2251页 *
"基于去冗余特征和语义关系约束的零样本属性识别";张桂梅等;《模式识别与人工智能》;20210930;第 34 卷(第 9 期);第809-823页 *
"结合迁移引导和双向循环结构 GAN 的零样本文本识别";张桂梅等;《模式识别与人工智能 》;20201231;第 33 卷(第 12 期);第1083-1096页 *

Also Published As

Publication number Publication date
CN111476294A (en) 2020-07-31

Similar Documents

Publication Publication Date Title
CN111476294B (en) Zero sample image identification method and system based on generation countermeasure network
CN110147457B (en) Image-text matching method, device, storage medium and equipment
CN108875818B (en) Zero sample image classification method based on combination of variational self-coding machine and antagonistic network
CN110059217B (en) Image text cross-media retrieval method for two-stage network
CN111581405A (en) Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning
CN112966127A (en) Cross-modal retrieval method based on multilayer semantic alignment
CN110232395B (en) Power system fault diagnosis method based on fault Chinese text
Chen Model reprogramming: Resource-efficient cross-domain machine learning
CN112232053B (en) Text similarity computing system, method and storage medium based on multi-keyword pair matching
CN112732916A (en) BERT-based multi-feature fusion fuzzy text classification model
CN110287354A (en) A kind of high score remote sensing images semantic understanding method based on multi-modal neural network
Wang et al. Advanced Multimodal Deep Learning Architecture for Image-Text Matching
CN117725261A (en) Cross-modal retrieval method, device, equipment and medium for video text
Tang et al. Class-level prototype guided multiscale feature learning for remote sensing scene classification with limited labels
CN115204171A (en) Document-level event extraction method and system based on hypergraph neural network
Nijhawan et al. VTnet+ Handcrafted based approach for food cuisines classification
CN113222002A (en) Zero sample classification method based on generative discriminative contrast optimization
Li et al. Evaluating BERT on cloud-edge time series forecasting and sentiment analysis via prompt learning
Fang Detection of white blood cells using YOLOV3 network
Xie et al. Full-view salient feature mining and alignment for text-based person search
CN115640418A (en) Cross-domain multi-view target website retrieval method and device based on residual semantic consistency
Li et al. ViT2CMH: Vision Transformer Cross-Modal Hashing for Fine-Grained Vision-Text Retrieval.
CN113723111B (en) Small sample intention recognition method, device, equipment and storage medium
Singh et al. Visual content generation from textual description using improved adversarial network
Wang et al. Contrastive embedding-based feature generation for generalized zero-shot learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20220322