CN112364894B - Zero sample image classification method of countermeasure network based on meta-learning - Google Patents
- Publication number: CN112364894B (application CN202011147848.9A)
- Authority: CN (China)
- Prior art keywords: visual, training, loss, semantic, decoder
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Neural networks; combinations of networks
- G06N3/08 — Neural networks; learning methods
Abstract
The invention belongs to the technical field of image classification, and particularly relates to a zero sample image classification method of a countermeasure (adversarial) network based on meta-learning. The method strengthens generalized zero sample image classification, improves the generalization ability of the model, and alleviates the domain shift problem common in zero sample learning.
Description
Technical Field
The invention belongs to the technical field of image classification, and particularly relates to a zero sample image classification method of a countermeasure network based on meta-learning.
Background
In recent years, machine learning has been widely applied in fields such as natural language processing, computer vision, and speech recognition. Within computer vision, image classification is one of the most studied and most widely applied tasks; classification techniques keep emerging and their performance keeps improving. In machine learning, supervised learning, which trains a classifier on a large number of manually labeled images, is the traditional approach to image classification and is well applied in real life. In practice, however, it is not easy to collect and label enough samples for every image category, and doing so consumes considerable labor. Species distribution in nature exhibits a long-tail effect: only a few classes have enough image samples for supervised training of a classification model, while many classes have few samples and are difficult to label, which poses a huge challenge to supervised learning. Zero sample learning therefore arose to address the problem of missing labeled samples.
Zero sample image classification is an important direction of zero sample learning, used to solve classification problems where image labeling is difficult. In the traditional zero sample image classification setting, a model is trained with image samples of the seen classes and their labels and tested on image samples of the unseen classes; under this setting, the classes of the test images and of the training images are disjoint. In the generalized zero sample image classification setting, the test samples include images of both the seen and the unseen classes. The zero sample learning referred to in this patent covers both settings. Currently, the main research approaches to zero sample image classification fall roughly into two types. The first is mapping-based: visual features and semantic features are related by learning a mapping between the visual feature space and the semantic feature space, or from both spaces into a common space, so as to obtain better classification results. The second is generation-based: generative models such as generative adversarial networks and variational auto-encoders generate pseudo features for the test samples, and the class of a test sample is determined by comparing the similarity between the generated pseudo features and the real features.
In order to predict the class of a test sample, zero sample image classification achieves knowledge transfer by exploiting the semantic information of the seen and unseen classes. The experimental setup is as follows: in the training phase, labeled samples of the seen classes are given as S = {(x_i, y_i, a_i)}_{i=1}^{n}, where n is the number of seen-class samples, x_i is the visual feature of the i-th sample, y_i denotes its class label, and a_i denotes its class-level semantic prototype. Traditional zero sample image classification is, given the semantic features A_U of the unseen classes, to classify a test sample x_t into the unseen classes Y_U, where the seen classes Y_S and the unseen classes Y_U are disjoint. Generalized zero sample image classification is to classify a test sample x_t into either the seen or the unseen classes according to the semantic features of both. In summary, zero sample image classification trains a model with the relevant features of the seen-class samples and uses the model to predict the class label y_t of a test sample.
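The difference between the two test-time settings above can be sketched in a few lines of Python; the class names and set sizes below are hypothetical, purely for illustration:

```python
# Hypothetical toy label spaces; all class names are illustrative.
seen_classes = {"horse", "dog", "cat"}        # Y_S: labeled during training
unseen_classes = {"okapi", "tapir"}           # Y_U: no training images

def candidate_labels(generalized: bool) -> set:
    """Traditional ZSL searches Y_U only; generalized ZSL searches Y_S ∪ Y_U."""
    return (seen_classes | unseen_classes) if generalized else set(unseen_classes)

assert seen_classes.isdisjoint(unseen_classes)    # Y_S ∩ Y_U = ∅
assert candidate_labels(False) == unseen_classes
```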
Learning only a simple mapping between the visual space and the semantic space yields incomplete feature representations and gives rise to the hubness (pivot point) problem in the low-dimensional space: a simple learned mapping from the high-dimensional visual space to the low-dimensional semantic space compresses samples of different classes into the semantics of the same class, and similar problems occur with simple mappings from the low-dimensional space to the high-dimensional space. In recent years, generative adversarial networks have attracted researchers' attention; combined with zero sample learning, they improve classification accuracy by generating a large number of pseudo features. However, an essential disadvantage of generative adversarial networks is that the training process is unstable and prone to mode collapse. Yet another generation-based approach introduces the variational auto-encoder (VAE), which generates pseudo visual features from an input conditioned on semantic information. Because of the variational lower bound it introduces, a VAE tends to distort the generated visual features.
Disclosure of Invention
The invention aims to address the defects of the prior art by providing a zero sample image classification method of a countermeasure network based on meta-learning, which improves zero sample image classification accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
a zero sample image classification method of a countermeasure network based on meta-learning comprises the following steps:
1) randomly selecting M categories from the seen classes as the training set of one episode, and using the remaining seen classes as the test classes of the episode, obtaining the training set D_tr = {(x_i, y_i, a_i)}_{i=1}^{n_tr}, wherein n_tr is the number of training-set samples in each episode, x_i is the visual feature of the i-th training sample, y_i is the class label of the i-th training sample, a_i ∈ A_tr is the semantic prototype of the class of the i-th training sample, and a_te ∈ A_te is the semantic prototype of a test class in the episode; defining two memory modules m_1, m_2;
2) for the visual features x_i of the training samples, randomly selecting a batch of data x and inputting it into the variational auto-encoder composed of the encoder E_1 and decoder D_1 to generate pseudo visual features x̂ similar to the real visual samples, the reconstruction constraint being:

L_rec1 = ||x̂ − x||_2^2
3) after passing through the variational auto-encoder, calculating the variational auto-encoder loss function L_VAE;
4) passing the generated pseudo visual features through a dimension-reduction matrix W and inputting them into a softmax classifier to obtain one-hot classification probabilities for each class, and calculating the classification loss against the real labels as:

L_cls1 = −E[y log f(W x̂)]

wherein f denotes the softmax classifier and W is the classifier parameter, whose function is to reduce the generated features to M dimensions for comparison with the real label y; W is defined as the classifier of the visual modality;
5) inputting the visual features x of the training samples and the generated pseudo visual features x̂ into the discriminator D, the adversarial loss L_D being:

L_D = E_x[log D(x)] + E_x̂[log(1 − D(x̂))]
6) calculating the distillation losses L_kd-w and L_kd-v of the visual-modality training process of this episode;
7) setting the objective function to the sum of the above loss functions and training the visual-modality variational auto-encoder over multiple iterations:

L_v = λ_1·L_rec1 + λ_2·L_VAE + L_cls1 + L_D + L_kd-w + L_kd-v

wherein λ_1, λ_2 are the weight coefficients of the feature reconstruction loss and the variational auto-encoder loss; the parameters of the trained encoder E_1 and decoder D_1 of the variational auto-encoder are stored in the two memory modules, respectively;
8) taking the class semantic prototypes a_tr of the training classes as the input of the semantic auto-encoder to generate the corresponding visual prototypes, defining the generated visual prototypes as the classifier of the semantic modality, using them to classify the reconstructed features, and calculating the classification loss L_cls2;
9) constraining the classifier of the semantic modality with the classifier W of the visual modality, so as to obtain the visual-to-semantic distillation constraint, and calculating the distillation loss L_kd2;
10) the objective function for training the semantic-modality auto-encoder being:

L_a = L_cls2 + λ_3·L_sup + λ_4·L_kd2

wherein L_sup is the supervision of the semantic-modality decoder by the visual-modality decoder, and λ_3 and λ_4 are the weight coefficients of the supervision loss and the distillation loss, respectively;
11) testing process of the episode: inputting the semantic prototypes a_te of the test set into the trained encoder E_2 and decoder D_2 to obtain the corresponding visual prototypes;
12) concatenating the visual prototypes generated for the training classes and for the test classes to obtain the classifier over all seen classes; the classifier C_S is then reused to classify all seen-class samples, the classification loss is calculated, and the previously learned parameters are fine-tuned;
13) inputting the semantic features a_t of the test samples of the seen and unseen classes into the semantic encoder and decoder, comparing the generated visual feature prototypes with x_t, and obtaining the classification result with the nearest-neighbor method;
14) repeating steps 1) to 13) to complete the meta-training process over multiple episodes until the best classification performance is obtained.
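The episodic loop of steps 1) to 14), with its memory-module distillation between consecutive episodes, can be sketched with a toy scalar parameter; the quadratic task loss, learning rate, and distillation weight below are illustrative assumptions, not the patent's networks:

```python
def episode_update(p, p_before, target=3.0, lr=0.2, lam=0.5):
    """One training iteration: gradient step on a toy task loss
    (p - target)**2 plus the distillation term lam * (p - p_before)**2
    tying the parameter to the copy stored by the previous episode."""
    grad = 2.0 * (p - target) + 2.0 * lam * (p - p_before)
    return p - lr * grad

p, memory = 0.0, 0.0                  # at episode 1 the memory module holds 0
for episode in range(50):
    p_before = memory                 # parameters stored by the previous episode
    for _ in range(5):                # several training iterations per episode
        p = episode_update(p, p_before)
    memory = p                        # store this episode's parameters for the next
```

With the distillation term, each episode is anchored near the previous episode's solution while the task loss pulls the parameter toward the optimum, so the sequence converges smoothly to the target.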
As an improvement of the zero sample image classification method of a countermeasure network based on meta-learning of the invention, the working process of generating the pseudo visual features x̂ in step 2) and calculating L_VAE in step 3) comprises the following steps:
(2.1) for the visual features x_i of the training samples, randomly select a batch of data x and input it into the encoder E_1 to obtain the probability distribution of the latent variable z:
p(z|x) = N(μ, Σ)
where p(z|x) denotes the distribution of the latent variable z, μ and Σ denote the mean and variance of z, respectively, and N denotes a normal distribution;
(2.2) sample z from p(z|x) and input it into the decoder D_1 to generate the pseudo visual features:
x̂ = D_1(z, v_1)
where w_1 and v_1 are the parameters of the encoder E_1 and decoder D_1, respectively;
(2.3) calculate the variational auto-encoder loss function L_VAE:
L_VAE = −E_{q(z|x)}[log p(x|z)] + D_KL(q(z|x) || p(z))
where L_VAE denotes the variational auto-encoder loss function, E_{q(z|x)} denotes the expectation over the distribution of the latent variable z, p(x|z) denotes the distribution of the visual features generated from z, q(z|x) denotes the conditional distribution of z, p(z) denotes the prior distribution of z and is set to a normal distribution, log is the logarithm, and D_KL is the KL divergence.
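A minimal numerical sketch of these VAE quantities, assuming a diagonal Gaussian q(z|x) and a standard-normal prior (the closed-form KL below holds only under that assumption; dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps: sampling from N(mu, Sigma) via standard-normal noise
    return mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)

def vae_losses(x, x_hat, mu, logvar):
    """Monte-Carlo reconstruction term (squared 2-norm, as in L_rec1) and the
    closed-form D_KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    rec = float(np.sum((x - x_hat) ** 2))
    kl = float(0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar))
    return rec, kl

mu, logvar = np.zeros(4), np.zeros(4)   # q(z|x) equal to the prior N(0, I)
z = reparameterize(mu, logvar, rng)
rec, kl = vae_losses(np.ones(4), np.ones(4), mu, logvar)
assert rec == 0.0 and kl == 0.0         # KL vanishes when q(z|x) = p(z)
```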
As an improvement of the zero sample image classification method of a countermeasure network based on meta-learning of the invention, the working process of calculating the distillation losses L_kd-w and L_kd-v in step 6) comprises the following steps:
calculate the distillation losses using the encoder E_1 and decoder D_1 parameters stored in the memory modules:
L_kd-w = ||w_1 − w_1-before||_2^2
L_kd-v = ||v_1 − v_1-before||_2^2
where w_1-before and v_1-before respectively denote the encoder E_1 parameters and decoder D_1 parameters stored in the two memory modules in the previous episode; when episode = 1, w_1-before = v_1-before = 0.
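A minimal sketch of this parameter-level distillation; the squared 2-norm form is an assumption consistent with the 2-norm used elsewhere in the method:

```python
import numpy as np

def distillation_loss(w_now, w_before):
    """Sketch of L_kd-w (and identically L_kd-v): a squared 2-norm penalty
    between the current parameters and the copy in the memory module."""
    return float(np.sum((np.asarray(w_now) - np.asarray(w_before)) ** 2))

w1 = np.array([1.0, 2.0])
w1_before = np.zeros(2)                   # episode = 1: the memory module holds 0
loss = distillation_loss(w1, w1_before)   # 1.0 + 4.0 = 5.0
```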
As an improvement of the zero sample image classification method of a countermeasure network based on meta-learning of the invention, the working process of generating the visual prototypes in step 8) comprises the following steps:
(4.1) take the class semantic prototypes a_tr of the training classes as the input of the encoder E_2, and map a_tr into a hidden space with the same dimension as z to obtain z_a:
z_a = E_2(a_tr, w_2)
where w_2 is the parameter of the encoder E_2;
(4.2) input z_a into the decoder D_2 to generate the corresponding visual prototypes x̂_a, whose dimension is the same as that of the real visual features x_i:
x̂_a = D_2(z_a, v_2)
where v_2 is the parameter of the decoder D_2.
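Steps (4.1) and (4.2) amount to two mappings a_tr → z_a → visual prototype with fixed dimensions; the linear layers and dimensions below are illustrative stand-ins for the actual E_2 and D_2 networks:

```python
import numpy as np

rng = np.random.default_rng(0)
d_sem, d_z, d_vis = 85, 64, 2048      # illustrative dimensions only

# Hypothetical linear E_2 / D_2; the patent's encoder and decoder are networks.
w2 = rng.normal(scale=0.01, size=(d_sem, d_z))    # encoder E_2 parameters
v2 = rng.normal(scale=0.01, size=(d_z, d_vis))    # decoder D_2 parameters

def semantic_to_visual(a_tr, w2, v2):
    z_a = a_tr @ w2        # (4.1): z_a = E_2(a_tr, w_2), same dimension as z
    return z_a @ v2        # (4.2): visual prototype, same dimension as x_i

a_tr = rng.normal(size=(5, d_sem))    # five class-level semantic prototypes
prototypes = semantic_to_visual(a_tr, w2, v2)
assert prototypes.shape == (5, d_vis)
```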
As an improvement of the zero sample image classification method of a countermeasure network based on meta-learning of the invention, the working process of calculating L_sup in step 10) is:
L_sup = ||v_1 − v_2||_2^2
where v_1 and v_2 are the parameters of the decoders D_1 and D_2, respectively; the 2-norm is used to make the decoder of the semantic modality similar to the decoder of the visual modality, so that the generated visual prototypes are closer to the real visual prototypes.
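The effect of L_sup can be illustrated by gradient descent on a squared 2-norm between two parameter vectors, which pulls the semantic decoder parameters toward the visual decoder parameters (toy vectors and learning rate, purely illustrative):

```python
import numpy as np

v1 = np.array([0.5, -1.0, 2.0])   # trained visual-modality decoder parameters
v2 = np.zeros(3)                  # semantic-modality decoder parameters

for _ in range(200):
    # gradient step on L_sup = ||v1 - v2||_2^2 with respect to v2
    v2 = v2 - 0.1 * 2.0 * (v2 - v1)

# v2 is pulled onto v1, so D_2 ends up mimicking D_1
assert np.allclose(v1, v2)
```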
The advantage of the invention is that it completes an episodic meta-training process with a two-path generative network, so that the semantic classifier learns from the visual classifier; the adversarial game between generator and discriminator, together with the knowledge distillation of features between consecutive episodes, improves zero sample learning performance intuitively and efficiently. The episodic training mode of meta-learning is applied to the zero sample classification task: the visual features and the semantic features are fed into the network in sequence, the zero sample image classification task is simulated during the training stage, the generation of visual features is completed, the alignment between the different classifiers is guaranteed, and the knowledge obtained in each episodic task is fully exploited, so that the semantic classifier is trained better under the supervision of the visual classifier and visual and semantic features closer to the real distribution are synthesized, yielding a zero sample image classification technique suited to real situations. The method therefore strengthens generalized zero sample image classification, improves the generalization ability of the model, and alleviates the domain shift problem common in zero sample learning, enabling classification tasks in more realistic scenarios, promoting the application of zero sample learning in production and daily life, and accelerating the development of deep learning algorithms toward practical use.
Drawings
Features, advantages and technical effects of exemplary embodiments of the present invention will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of meta learning in the present invention.
Detailed Description
As used in the specification and in the claims, certain terms are used to refer to particular components. As one skilled in the art will appreciate, manufacturers may refer to a component by different names. This specification and the claims do not intend to distinguish between components that differ in name but not in function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion and should thus be interpreted to mean "including, but not limited to". "Substantially" means within an acceptable error range: a person skilled in the art can solve the technical problem within a certain error range and substantially achieve the technical effect.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
The present invention will be described in further detail with reference to fig. 1, but the present invention is not limited thereto.
The invention discloses a zero sample image classification method of a countermeasure network based on meta-learning. Its basic idea is that each subtask simulates the whole generalized zero sample image classification process, and knowledge distillation between tasks enhances the memory and generalization ability of the model. In each episodic task, several classes are randomly selected from all seen classes to serve as the seen classes of that task, simulating generalized zero sample learning; after the visual classifier is learned with the variational auto-encoder, it is used to guide the learning of the semantic classifier. During the learning of each episode, the relevant parameters are stored in the memory modules to supervise the learning of the corresponding parameters in the next episode, thereby performing knowledge distillation. Likewise, the supervision of the semantic classifier by the visual classifier can be regarded as a knowledge-distillation operation. In the testing process after each episode of training, the test samples are classified by the nearest-neighbor method, realizing the zero sample image classification technique.
In zero sample image classification, the currently common training mode is to train a model on the seen classes in a single round of many iterations and then predict the classes of the test samples, where the test samples include both seen-class and unseen-class samples. In recent years, meta-learning has been widely used in few-shot learning and has achieved excellent performance. Among meta-learning training methods, episode-based meta-training is widely used: during training, each episode updates the model with different training data, so that previous knowledge and experience are fully exploited to guide the learning of the new task.
In the zero sample image classification method of a countermeasure network based on meta-learning disclosed by the invention, the image data set is first divided into seen classes and unseen classes; M classes are then randomly selected from the seen classes as the training set of one episode, and the remaining seen classes are used as the test classes of the episode. Given the training set D_tr = {(x_i, y_i, a_i)}_{i=1}^{n_tr}, where n_tr is the number of training-set samples in each episode, x_i is the visual feature of the i-th training sample, y_i is the class label of the i-th training sample, a_i ∈ A_tr is the semantic prototype of the class of the i-th training sample, and a_te ∈ A_te is the semantic prototype of a test class in each episode. Let x_t be the visual feature of a test sample and a_t be the class semantic feature of the test sample. As shown in fig. 1, the following steps are performed:
1) M categories are randomly selected from the seen classes as the training set of one episode, and the remaining seen classes are used as the test classes of the episode. The encoder E_1 and decoder D_1 of the visual-modality variational auto-encoder, the encoder E_2 and decoder D_2 of the semantic-modality auto-encoder, and the parameters w_1, v_1, w_2, v_2 and the discriminator D parameter r are respectively initialized; the two memory modules that store the parameters w_1, v_1 are defined as m_1 and m_2;
2) in this episode, for the visual features x_i of the training samples, a batch of data x is randomly selected as the input of the encoder E_1;
3) the pseudo visual features x̂ are generated according to the following formula:

x̂ = D_1(z, v_1) (1)

where the output of the encoder E_1 is a latent variable, denoted z, whose probability distribution is expressed as follows:

p(z|x) = N(μ, Σ) (2)

where p(z|x) denotes the distribution of the latent variable z, μ and Σ denote the mean and variance of z, respectively, and N denotes the normal distribution;
4) after passing through the variational auto-encoder, the generated pseudo visual features are expected to be close to the real features; the feature reconstruction loss function and the variational auto-encoder loss function are calculated, respectively:

L_rec1 = ||x̂ − x||_2^2 (3)

L_VAE = −E_{q(z|x)}[log p(x|z)] + D_KL(q(z|x) || p(z)) (4)

where L_rec1 denotes the reconstruction loss function, ||·||_2 denotes the 2-norm, L_VAE denotes the variational auto-encoder loss function, E_{q(z|x)} denotes the expectation over the distribution of the latent variable z, p(x|z) denotes the distribution of the visual features generated from z, q(z|x) denotes the conditional distribution of z, p(z) denotes the prior distribution of z and is set to a normal distribution, log is the logarithm, and D_KL is the KL divergence;
5) the generated pseudo visual features are passed through the dimension-reduction matrix W and input into the softmax classifier to obtain one-hot classification probabilities for each class, and the classification loss against the real labels is calculated as:

L_cls1 = −E[y log f(W x̂)] (5)

where f denotes the softmax classifier and W is the classifier parameter, used to reduce the generated features to M dimensions for comparison with the real label y. W is defined here as the classifier of the visual modality.
6) the visual features x of the training samples and the generated pseudo visual features x̂ are input into the discriminator D, the discriminator D is trained with the adversarial loss function, and the parameter r giving the best discriminator performance is retained; the adversarial loss function is:

L_D = E_x[log D(x)] + E_x̂[log(1 − D(x̂))] (6)

where L_D is the loss function of the discriminator D, E_x denotes the expectation over the distribution of the visual features x of the training samples, and E_x̂ denotes the expectation over the distribution of the generated pseudo visual features x̂;
7) the distillation losses of this episode are calculated as follows:

L_kd-w = ||w_1 − w_1-before||_2^2 (7)

L_kd-v = ||v_1 − v_1-before||_2^2 (8)

where w_1-before and v_1-before respectively denote the encoder E_1 parameters and decoder D_1 parameters stored in the two memory modules in the immediately preceding episode; when episode = 1, w_1-before = v_1-before = 0;
8) the loss functions of formulas (3) to (8) are added to train E_1 and D_1 of the visual variational auto-encoder, and the memory modules are updated:

L_v = λ_1·L_rec1 + λ_2·L_VAE + L_cls1 + L_D + L_kd-w + L_kd-v (9)

where λ_1, λ_2 are the weight coefficients of the feature reconstruction loss and the variational auto-encoder loss.
9) in this episode, the class semantic prototypes a_tr of the training classes are further taken as the input of the auto-encoder, in which the encoder E_2 maps the class semantic prototypes into a hidden space with the same dimension as z, and the decoder D_2 reconstructs the hidden-space features into the visual space to generate the corresponding visual prototypes, with the decoder supervised and constrained by D_1:

x̂_a = D_2(E_2(a_tr, w_2), v_2) (10)

L_sup = ||v_1 − v_2||_2^2 (11)

where x̂_a is the visual prototype generated from a class semantic prototype and is defined as the classifier of the semantic modality, and L_sup denotes the 2-norm constraint from the decoder D_1 to D_2;
10) meanwhile, the visual prototype features also need to be constrained by the dimension-reduction matrix W, i.e., the classifier of the visual modality constrains the classifier of the semantic modality, yielding the visual-to-semantic distillation constraint; from it the distillation loss L_kd2 (12) is calculated, and the classification loss L_cls2 (13) is calculated by classifying the reconstructed features with the generated visual prototypes;
12) the loss functions in equations (11) to (13) are added to train the encoder E_2 and decoder D_2:

L_a = L_cls2 + λ_3·L_sup + λ_4·L_kd2 (14)

where λ_3 and λ_4 are the weight coefficients of the supervision loss and the distillation loss, respectively;
13) the semantic prototypes a_te of the test set of this episode are input into the trained encoder E_2 and decoder D_2 to obtain the corresponding visual prototypes:

x̂_te = D_2(E_2(a_te, w_2), v_2) (15)
14) the visual prototypes generated for the training classes and for the test classes are concatenated to obtain the classifier over all seen classes; the classifier C_S is then reused to classify all seen-class samples, the classification loss (16) is calculated, and the parameters w_1, v_1, w_2, v_2 and r are fine-tuned with it;
15) the semantic features a_t of the test samples of the seen and unseen classes are input into the semantic encoder and decoder, the generated visual feature prototypes are compared with x_t, and the classification result is obtained with the nearest-neighbor method.
16) steps 1) to 15) are repeated to complete the meta-training process over multiple episodes until the best classification performance is obtained.
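The nearest-neighbor test of step 15) can be sketched as follows; the prototypes and test features below are toy values:

```python
import numpy as np

def nearest_neighbor_classify(x_t, prototypes, labels):
    """Assign each test visual feature the label of the closest
    generated visual prototype (Euclidean distance)."""
    d = np.linalg.norm(x_t[:, None, :] - prototypes[None, :, :], axis=-1)
    return labels[np.argmin(d, axis=1)]

prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])   # toy 2-D visual prototypes
labels = np.array([0, 1])
x_t = np.array([[1.0, -1.0], [9.0, 11.0]])          # toy test visual features
pred = nearest_neighbor_classify(x_t, prototypes, labels)
```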
Variations and modifications to the above-described embodiments may also occur to those skilled in the art, which fall within the scope of the invention as disclosed and taught herein. Therefore, the present invention is not limited to the above-mentioned embodiments, and any obvious improvement, replacement or modification made by those skilled in the art based on the present invention is within the protection scope of the present invention. Furthermore, although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Claims (5)
1. A zero sample image classification method of a countermeasure network based on meta-learning is characterized by comprising the following steps:
1) randomly selecting M classes from the seen classes as the training set of one episode, and using the remaining seen classes as the test classes of the episode, the training set being D_tr = {(x_i, y_i, a_i)}_{i=1}^{n_tr}, wherein n_tr is the number of training-set samples in each episode, x_i is the visual feature of the i-th training sample, y_i is the class label of the i-th training sample, a_i ∈ A_tr is the semantic prototype of the class of the i-th training sample, and a_te ∈ A_te is the semantic prototype of a test class in the episode; defining two memory modules m_1, m_2;
2) for the visual features x_i of the training samples, randomly selecting a batch of data x and inputting it into the variational auto-encoder composed of the encoder E_1 and decoder D_1 to generate pseudo visual features x̂ similar to the real visual samples, the reconstruction constraint being:

L_rec1 = ||x̂ − x||_2^2
3) after passing through the variational auto-encoder, calculating the variational auto-encoder loss function L_VAE;
4) passing the generated pseudo visual features through a dimension-reduction matrix W and inputting them into a softmax classifier to obtain one-hot classification probabilities for each class, and calculating the classification loss against the real labels as:

L_cls1 = −E[y log f(W x̂)]

wherein f denotes the softmax classifier and W is the classifier parameter, whose function is to reduce the generated features to M dimensions for comparison with the real label y; W is defined as the classifier of the visual modality;
5) inputting the visual features x of the training samples and the generated pseudo visual features x̂ into the discriminator D, the adversarial loss L_D being:

L_D = E_x[log D(x)] + E_x̂[log(1 − D(x̂))]
6) calculating the distillation losses L_kd-w and L_kd-v of the visual-modality training process of this episode;
7) setting the objective function to the sum of the above loss functions and training the visual-modality variational auto-encoder over multiple iterations:

L_v = λ_1·L_rec1 + λ_2·L_VAE + L_cls1 + L_D + L_kd-w + L_kd-v

wherein λ_1, λ_2 are the weight coefficients of the feature reconstruction loss and the variational auto-encoder loss; the parameters of the trained encoder E_1 and decoder D_1 of the variational auto-encoder are stored in the two memory modules, respectively;
8) taking the class semantic prototypes a_tr of the training classes as the input of the semantic auto-encoder to generate the corresponding visual prototypes, defining the generated visual prototypes as the classifier of the semantic modality, using them to classify the reconstructed features, and calculating the classification loss L_cls2;
9) constraining the classifier of the semantic modality with the classifier W of the visual modality, so as to obtain the visual-to-semantic distillation constraint, and calculating the distillation loss L_kd2;
10) the objective function for training the semantic-modality auto-encoder being:

L_a = L_cls2 + λ_3·L_sup + λ_4·L_kd2

wherein L_sup is the supervision of the semantic-modality decoder by the visual-modality decoder, and λ_3 and λ_4 are the weight coefficients of the supervision loss and the distillation loss, respectively;
11) Test procedure of the episode: input the semantic prototypes a_te of the test set into the trained encoder E_2 and decoder D_2 to obtain the corresponding visual prototypes;
12) Splice the visual prototypes of the training and test classes together to obtain those of all visible classes; then reuse the classifier C_S to classify all visible-class samples, compute the classification loss, and fine-tune the previously learned parameters;
13) Input the semantic features a_t of the test samples of the visible and invisible classes into the semantic encoder and decoder, and compare the generated visual feature prototypes with x_t, the visual features of the test samples, obtaining the classification result by the nearest-neighbor method;
14) Repeat steps 1) to 13) to complete the meta-training process over multiple episodes until optimal classification performance is obtained.
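The patent gives no implementation of the step-4) classification loss. The following is a minimal NumPy sketch; the function names, the cross-entropy form, and the toy dimensions are our assumptions, not part of the claim:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis (the classifier f).
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def classification_loss(x_fake, W, y):
    # Cross-entropy between f(xW) and the true labels y.
    # x_fake: (batch, d) generated pseudo-visual features
    # W:      (d, M)    reduces the features to M dimensions (one per class)
    # y:      (batch,)  integer class labels in [0, M)
    probs = softmax(x_fake @ W)
    return -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))

# Toy example: 4 samples, 8-dim pseudo-features, M = 3 classes.
rng = np.random.default_rng(0)
x_fake = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 3))
y = np.array([0, 2, 1, 0])
loss = classification_loss(x_fake, W, y)
```

In a real pipeline W would be a trained parameter shared with step 9), not a random matrix.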
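Step 13) classifies test features by nearest neighbor against the generated visual prototypes. A sketch on assumed 2-D toy data (the class ids 7 and 9 are hypothetical):

```python
import numpy as np

def nearest_neighbor_classify(x_test, prototypes, labels):
    # Assign each test feature the label of its nearest visual prototype
    # under Euclidean distance.
    d = np.linalg.norm(x_test[:, None, :] - prototypes[None, :, :], axis=-1)
    return labels[d.argmin(axis=1)]

prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])  # generated visual prototypes
labels = np.array([7, 9])                          # hypothetical class ids
x_test = np.array([[0.5, -0.2], [9.0, 11.0]])      # test visual features x_t
pred = nearest_neighbor_classify(x_test, prototypes, labels)  # -> [7, 9]
```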
2. The zero-sample image classification method of a countermeasure network based on meta-learning as claimed in claim 1, wherein generating the pseudo-visual features in step 2) and computing L_VAE in step 3) comprise the following working process:
(2.1) From the visual features x_i of the training samples, randomly select a batch of data x of the set batch size and input it into encoder E_1 to obtain the probability distribution of the latent variable z:
p(z|x) = N(μ, Σ)
where p(z|x) denotes the distribution of the latent variable z, μ and Σ denote the mean and variance of z, respectively, and N denotes a normal distribution;
(2.2) Decode z with decoder D_1 to reconstruct the pseudo-visual features, where w_1 and v_1 are the parameters of encoder E_1 and decoder D_1, respectively;
(2.3) Compute the variational autoencoder loss function:
L_VAE = E_{q(z|x)}[log p(x|z)] − D_KL(q(z|x) ∥ p(z))
where L_VAE denotes the variational autoencoder loss function, E_{q(z|x)} denotes the expectation over the distribution of the latent variable z, p(x|z) denotes the distribution of the visual features generated from the latent variable z, q(z|x) denotes the conditional distribution of z, p(z) denotes the prior distribution of z, set to a normal distribution, log is the logarithmic operation, and D_KL is the KL divergence.
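A numerical sketch of the L_VAE computation in (2.3), using a Gaussian reconstruction term and the closed-form KL divergence against a standard-normal prior. The sign convention (a loss to be minimized) and the squared-error reconstruction term are our assumptions, chosen to be consistent with the definitions above:

```python
import numpy as np

def vae_loss(x, x_recon, mu, logvar):
    # Reconstruction term: squared error, i.e. -log p(x|z) up to a constant
    # for a Gaussian decoder.
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    # Closed-form D_KL(N(mu, diag(exp(logvar))) || N(0, I)).
    kl = -0.5 * np.mean(np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))
    return recon + kl

# Toy batch: 4 samples, 8-dim features, 3-dim latent z.
rng = np.random.default_rng(2)
x = rng.normal(size=(4, 8))
x_recon = x + 0.1 * rng.normal(size=(4, 8))   # near-perfect reconstruction
mu = rng.normal(scale=0.1, size=(4, 3))
logvar = np.zeros((4, 3))
L_VAE = vae_loss(x, x_recon, mu, logvar)
```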
3. The zero-sample image classification method of a countermeasure network based on meta-learning as claimed in claim 1, wherein computing the distillation losses L_kd-w and L_kd-v in step 6) comprises the following working process:
compute the distillation losses using the encoder E_1 and decoder D_1 parameters stored in the memory modules,
where w_1-before and v_1-before denote the parameters of encoder E_1 and of decoder D_1 stored in the two memory modules during the immediately preceding episode; for the first episode, w_1-before = v_1-before = 0.
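The exact form of the distillation loss is not spelled out in the extracted text. A plausible sketch (a squared-2-norm drift penalty between the current parameters and those stored from the previous episode; the form and names are assumptions):

```python
import numpy as np

def distillation_loss(w_now, w_before):
    # Penalize drift of the current parameters away from the parameters
    # stored in the memory module after the previous episode.
    return float(np.sum((w_now - w_before) ** 2))

w1_before = np.zeros(5)  # episode 1: stored parameters initialized to 0
w1_now = np.array([0.1, 0.0, -0.2, 0.0, 0.3])
L_kd_w = distillation_loss(w1_now, w1_before)  # ≈ 0.14
```

The same function would be applied to the decoder parameters v_1 to obtain L_kd-v.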
4. The zero-sample image classification method of a countermeasure network based on meta-learning as claimed in claim 1, wherein generating the visual prototypes in step 8) comprises the following working process:
(4.1) Take the class semantic prototypes a_tr of the training classes as input to encoder E_2, mapping a_tr to a hidden space of the same dimension as z to obtain z_a:
z_a = E_2(a_tr, w_2)
where w_2 is the parameter of encoder E_2;
(4.2) Input z_a into decoder D_2 to generate the corresponding visual prototypes, which have the same dimension as the real visual features x_i, where v_2 is the parameter of decoder D_2.
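A shape-level sketch of steps (4.1)-(4.2), with E_2 and D_2 reduced to linear maps (the real encoder/decoder would be neural networks; all dimensions here are toy assumptions):

```python
import numpy as np

def encode_semantic(a_tr, w2):
    # E_2: map class semantic prototypes into the latent space (same dim as z).
    return a_tr @ w2

def decode_to_visual(z_a, v2):
    # D_2: map latent codes to visual-prototype space (same dim as x_i).
    return z_a @ v2

rng = np.random.default_rng(3)
a_tr = rng.normal(size=(5, 10))  # 5 training classes, 10-dim semantic prototypes
w2 = rng.normal(size=(10, 3))    # latent dimension 3, matching z
v2 = rng.normal(size=(3, 8))     # visual dimension 8, matching x_i
proto = decode_to_visual(encode_semantic(a_tr, w2), v2)  # one prototype per class
```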
5. The zero-sample image classification method of a countermeasure network based on meta-learning as claimed in claim 1, wherein computing L_sup in step 10) comprises the following working process:
L_sup = ‖v_1 − v_2‖_2
where v_1 and v_2 are the parameters of decoders D_1 and D_2, respectively; the 2-norm makes the decoder of the semantic modality similar to the decoder of the visual modality, so that the generated visual prototypes are closer to the real visual prototypes.
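The supervision loss described in claim 5 reduces to a 2-norm between the two decoders' parameter vectors; a one-line sketch (treating the parameters as flat vectors is our simplification):

```python
import numpy as np

def supervision_loss(v1, v2):
    # L_sup: 2-norm between the visual decoder parameters v1 and the
    # semantic decoder parameters v2, pulling the two decoders together.
    return float(np.linalg.norm(v1 - v2))

v1 = np.array([1.0, 2.0, 2.0])
v2 = np.array([1.0, 0.0, 0.0])
L_sup = supervision_loss(v1, v2)  # ||(0, 2, 2)||_2 = sqrt(8)
```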
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011147848.9A CN112364894B (en) | 2020-10-23 | 2020-10-23 | Zero sample image classification method of countermeasure network based on meta-learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112364894A CN112364894A (en) | 2021-02-12 |
CN112364894B true CN112364894B (en) | 2022-07-08 |
Family
ID=74511961
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011147848.9A Active CN112364894B (en) | 2020-10-23 | 2020-10-23 | Zero sample image classification method of countermeasure network based on meta-learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112364894B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113139591B (en) * | 2021-04-14 | 2023-02-24 | 广州大学 | Generalized zero-sample image classification method based on enhanced multi-mode alignment |
CN113177587B (en) * | 2021-04-27 | 2023-04-07 | 西安电子科技大学 | Generalized zero sample target classification method based on active learning and variational self-encoder |
CN113344069B (en) * | 2021-05-31 | 2023-01-24 | 成都快眼科技有限公司 | Image classification method for unsupervised visual representation learning based on multi-dimensional relation alignment |
CN113537322B (en) * | 2021-07-02 | 2023-04-18 | 电子科技大学 | Zero sample visual classification method for cross-modal semantic enhancement generation countermeasure network |
CN113610212B (en) * | 2021-07-05 | 2024-03-05 | 宜通世纪科技股份有限公司 | Method and device for synthesizing multi-mode sensor data and storage medium |
CN113343941B (en) * | 2021-07-20 | 2023-07-25 | 中国人民大学 | Zero sample action recognition method and system based on mutual information similarity |
CN113688879B (en) * | 2021-07-30 | 2024-05-24 | 南京理工大学 | Generalized zero sample learning classification method based on confidence distribution external detection |
CN113642621A (en) * | 2021-08-03 | 2021-11-12 | 南京邮电大学 | Zero sample image classification method based on generation countermeasure network |
CN113610173B (en) * | 2021-08-13 | 2022-10-04 | 天津大学 | Knowledge distillation-based multi-span domain few-sample classification method |
CN113688944B (en) * | 2021-09-29 | 2022-12-27 | 南京览众智能科技有限公司 | Image identification method based on meta-learning |
CN114048850A (en) * | 2021-10-29 | 2022-02-15 | 广东坚美铝型材厂(集团)有限公司 | Maximum interval semantic feature self-learning method, computer device and storage medium |
CN114037866B (en) * | 2021-11-03 | 2024-04-09 | 哈尔滨工程大学 | Generalized zero sample image classification method based on distinguishable pseudo-feature synthesis |
CN114120049B (en) * | 2022-01-27 | 2023-08-29 | 南京理工大学 | Long-tail distribution visual identification method based on prototype classifier learning |
CN114998613B (en) * | 2022-06-24 | 2024-04-26 | 安徽工业大学 | Multi-mark zero sample learning method based on deep mutual learning |
CN115331012B (en) * | 2022-10-14 | 2023-03-24 | 山东建筑大学 | Joint generation type image instance segmentation method and system based on zero sample learning |
CN117541555A (en) * | 2023-11-16 | 2024-02-09 | 广州市公路实业发展有限公司 | Road pavement disease detection method and system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108399421A (en) * | 2018-01-31 | 2018-08-14 | A deep zero-sample classification method based on word embedding
CN108875818A (en) * | 2018-06-06 | 2018-11-23 | Zero-sample image classification method combining a variational autoencoder and an adversarial network
WO2019055114A1 (en) * | 2017-09-12 | 2019-03-21 | Hrl Laboratories, Llc | Attribute aware zero shot machine vision system via joint sparse representations |
CN110097095A (en) * | 2019-04-15 | 2019-08-06 | A zero-sample classification method based on a multi-view generative adversarial network
CN111476294A (en) * | 2020-04-07 | 2020-07-31 | 南昌航空大学 | Zero sample image identification method and system based on generation countermeasure network |
CA3076646A1 (en) * | 2019-03-22 | 2020-09-22 | Royal Bank Of Canada | System and method for generation of unseen composite data objects |
US10803646B1 (en) * | 2019-08-19 | 2020-10-13 | Neon Evolution Inc. | Methods and systems for image and voice processing |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11113599B2 (en) * | 2017-06-22 | 2021-09-07 | Adobe Inc. | Image captioning utilizing semantic text modeling and adversarial learning |
CN112889073A (en) * | 2018-08-30 | 2021-06-01 | 谷歌有限责任公司 | Cross-language classification using multi-language neural machine translation |
US11087184B2 (en) * | 2018-09-25 | 2021-08-10 | Nec Corporation | Network reparameterization for new class categorization |
US11087174B2 (en) * | 2018-09-25 | 2021-08-10 | Nec Corporation | Deep group disentangled embedding and network weight generation for visual inspection |
CN109492662B (en) * | 2018-09-27 | 2021-09-14 | 天津大学 | Zero sample image classification method based on confrontation self-encoder model |
CN110175251A (en) * | 2019-05-25 | 2019-08-27 | 西安电子科技大学 | Zero-sample sketch retrieval method based on a semantic adversarial network
CN110580501B (en) * | 2019-08-20 | 2023-04-25 | 天津大学 | Zero sample image classification method based on variational self-coding countermeasure network |
CN110826638B (en) * | 2019-11-12 | 2023-04-18 | 福州大学 | Zero sample image classification model based on repeated attention network and method thereof |
CN111581405B (en) * | 2020-04-26 | 2021-10-26 | 电子科技大学 | Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning |
Non-Patent Citations (3)
Title |
---|
Incremental zero-shot learning based on attributes for image classification; Nan Xue, et al; 2017 IEEE International Conference on Image Processing (ICIP); 20180222; full text *
Research on zero-shot learning based on Res-GAN networks; Lin Jiaojiao; China Master's Theses Full-text Database, Information Science and Technology Series; 20200215 (No. 2); full text *
Zero-shot image classification based on generative adversarial networks; Wei Hongxi et al.; Journal of Beijing University of Aeronautics and Astronautics; 20191231 (No. 12); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112364894B (en) | Zero sample image classification method of countermeasure network based on meta-learning | |
CN110580501B (en) | Zero sample image classification method based on variational self-coding countermeasure network | |
Bakhtin et al. | Real or fake? learning to discriminate machine from human generated text | |
CN109376242B (en) | Text classification method based on cyclic neural network variant and convolutional neural network | |
Perez-Martin et al. | Improving video captioning with temporal composition of a visual-syntactic embedding | |
CN105975573B (en) | A kind of file classification method based on KNN | |
US7362892B2 (en) | Self-optimizing classifier | |
CN108647226B (en) | Hybrid recommendation method based on variational automatic encoder | |
CN114998602B (en) | Domain adaptive learning method and system based on low confidence sample contrast loss | |
CN112364893B (en) | Semi-supervised zero-sample image classification method based on data enhancement | |
CN113127737B (en) | Personalized search method and search system integrating attention mechanism | |
Huang et al. | Large-scale weakly-supervised content embeddings for music recommendation and tagging | |
CN112015902A (en) | Least-order text classification method under metric-based meta-learning framework | |
CN113886562A (en) | AI resume screening method, system, equipment and storage medium | |
Shannon et al. | Non-saturating GAN training as divergence minimization | |
CN112529638A (en) | Service demand dynamic prediction method and system based on user classification and deep learning | |
CN106021402A (en) | Multi-modal multi-class Boosting frame construction method and device for cross-modal retrieval | |
Chen et al. | Trada: tree based ranking function adaptation | |
CN114399661A (en) | Instance awareness backbone network training method | |
CN116432125B (en) | Code Classification Method Based on Hash Algorithm | |
CN116704208A (en) | Local interpretable method based on characteristic relation | |
CN114202038B (en) | Crowdsourcing defect classification method based on DBM deep learning | |
CN114943216A (en) | Case microblog attribute-level viewpoint mining method based on graph attention network | |
CN114969511A (en) | Content recommendation method, device and medium based on fragments | |
CN110162629B (en) | Text classification method based on multi-base model framework |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||