CN112364894B - Zero-shot image classification method based on a meta-learning adversarial network - Google Patents

Zero-shot image classification method based on a meta-learning adversarial network

Info

Publication number
CN112364894B
CN112364894B
Authority
CN
China
Prior art keywords
visual
training
loss
semantic
decoder
Prior art date
Legal status
Active
Application number
CN202011147848.9A
Other languages
Chinese (zh)
Other versions
CN112364894A (en)
Inventor
Ji Zhong (冀中)
Cui Biying (崔碧莹)
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202011147848.9A
Publication of CN112364894A
Application granted
Publication of CN112364894B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image classification and specifically relates to a zero-shot image classification method based on a meta-learning adversarial network. The method makes generalized zero-shot image classification markedly more effective, improves the generalization ability of the model, and alleviates the domain shift problem common in zero-shot learning.

Description

Zero-shot image classification method based on a meta-learning adversarial network
Technical Field
The invention belongs to the technical field of image classification and specifically relates to a zero-shot image classification method based on a meta-learning adversarial network.
Background
In recent years, machine learning has been widely applied in natural language processing, computer vision, speech recognition, and other fields. Within computer vision, image classification is one of the most studied and most widely applied tasks; classification techniques keep emerging and their performance keeps improving. Supervised learning over large numbers of manually labeled images is the traditional approach to image classification and is well established in practice. However, collecting and labeling enough samples for every image category is laborious and often impractical. Species distribution in nature exhibits a long-tail effect: only a few classes have enough image samples to train a classification model by supervised learning, while many classes have few samples that are hard to label, which poses a great challenge to supervised learning. Zero-shot learning therefore arose to address this scarcity of labeled samples.
Zero-shot image classification is an important direction of zero-shot learning, addressing classification problems where images are hard to label. In the traditional zero-shot setting, the model is trained with image samples of seen classes and their labels and tested on image samples of unseen classes; the test classes and the training classes are disjoint. In the generalized zero-shot setting, the test samples include images of both seen and unseen classes. The zero-shot learning referred to in this patent covers both settings. Current research on zero-shot image classification falls roughly into two families. Mapping-based methods connect visual and semantic features through a mapping between the visual and semantic feature spaces, or from both spaces into a common space, to obtain better classification results. Generation-based methods use generative models such as generative adversarial networks and variational autoencoders to synthesize pseudo-features of the test samples, and determine the class of a test sample by comparing the similarity between the generated pseudo-features and the real features.
To predict the classes of test samples, zero-shot image classification achieves knowledge transfer through the semantic information of the seen and unseen classes. The setup is as follows: in the training phase, a labeled set of seen-class samples $S = \{(x_i, y_i, a_i)\}_{i=1}^{n}$ is given, where $n$ is the number of seen-class samples, $x_i$ is the visual feature of the $i$-th sample, $y_i \in Y^S$ denotes its class label, and $a_i \in A^S$ denotes the class-level semantic prototype of its class. Traditional zero-shot image classification uses the semantic features $A^U$ of the unseen classes to classify a test sample $x_t$ into the unseen classes $Y^U$, with $Y^S \cap Y^U = \emptyset$; generalized zero-shot image classification classifies a test sample $x_t$ into either the seen or the unseen classes according to the semantic features of both. In summary, zero-shot image classification trains a model with features of the seen-class samples and uses that model to predict the class label $y_t$ of a test sample.
Learning only a simple mapping between the visual and semantic spaces can make the feature representation incomplete and creates a hubness problem in the low-dimensional space: a simple mapping from the high-dimensional visual space to the low-dimensional semantic space compresses samples of different classes onto the same low-dimensional class semantics, and a simple mapping from the low-dimensional space to the high-dimensional space suffers similar problems. In recent years, generative adversarial networks have drawn researchers' attention and, combined with zero-shot learning, improve classification accuracy by generating large numbers of pseudo-features. However, an inherent drawback of generative adversarial networks is unstable training, which easily leads to mode collapse. Another generation-based approach introduces a variational autoencoder (VAE) that generates pseudo-visual features from a VAE conditioned on semantic information; because of the variational lower bound it introduces, a VAE tends to distort the generated visual features.
Disclosure of Invention
The invention aims, in view of the defects of the prior art, to provide a zero-shot image classification method based on a meta-learning adversarial network that improves zero-shot image classification accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
a zero sample image classification method of a countermeasure network based on meta-learning comprises the following steps:
1) randomly select M classes from the seen classes as the training set of one episode and use the remaining seen classes as the test classes of that episode, obtaining the training set $\mathcal{T} = \{(x_i, y_i, a_i)\}_{i=1}^{n_{tr}}$, for which $A_{tr} \cap A_{te} = \emptyset$, where $n_{tr}$ is the number of training samples in each episode, $x_i$ is the visual feature of the $i$-th training sample, $y_i$ is the class label of the $i$-th training sample, $a_i \in A_{tr}$ is the semantic prototype of the class of the $i$-th training sample, and $a_{te} \in A_{te}$ is a semantic prototype of a test class of the episode; define two memory modules $m_1$, $m_2$;
2) for the visual features $x_i$ of the training samples, randomly select a batch of data $x$ and input it into the variational autoencoder composed of the encoder $E_1$ and the decoder $D_1$, generating pseudo-visual features $\hat{x}$ similar to the real visual samples under the reconstruction constraint

$$L_{rec1} = \lVert x - \hat{x} \rVert_2^2,$$

where $\lVert \cdot \rVert_2$ denotes the 2-norm;
3) after the variational autoencoder, calculate the variational autoencoder loss function $L_{VAE}$;
4) pass the generated pseudo-visual features through the dimension-reduction matrix $W$ into a softmax classifier, obtain the probability of each class in one-hot form, and calculate the classification loss against the real labels as

$$L_{cls1} = -\mathbb{E}\big[y \log f(W^{\top}\hat{x})\big],$$

where $f$ denotes the softmax classifier and $W$ is the classifier parameter that reduces the generated features to M dimensions for comparison with the real label $y$; $W$ is defined as the classifier of the visual modality;
5) input the visual features $x$ of the training samples and the generated pseudo-visual features $\hat{x}$ into a discriminator $D$, with the adversarial loss

$$L_D = \mathbb{E}_x[\log D(x)] + \mathbb{E}_{\hat{x}}[\log(1 - D(\hat{x}))];$$

6) calculate the distillation losses $L_{kd\text{-}w}$ and $L_{kd\text{-}v}$ of the visual-modality training process of this episode;
7) set the objective function to the sum of the above loss functions and train the visual-modality variational autoencoder over multiple iterations:

$$L_v = \lambda_1 L_{rec1} + \lambda_2 L_{VAE} + L_{cls1} + L_D + L_{kd\text{-}w} + L_{kd\text{-}v},$$

where $\lambda_1$, $\lambda_2$ are the weight coefficients of the reconstruction loss and the variational autoencoder loss; the parameters of the trained encoder $E_1$ and decoder $D_1$ are stored in the two memory modules respectively;
8) take the class semantic prototypes $a_{tr}$ of the training classes as the input of an autoencoder to generate the corresponding visual prototypes $\hat{x}_a$; at the same time define $\hat{x}_a$ as the classifier of the semantic modality, use $\hat{x}_a$ to classify the reconstructed features, and calculate the classification loss

$$L_{cls2} = -\mathbb{E}\big[y \log f(\hat{x}_a^{\top}\hat{x})\big];$$

9) constrain the classifier $\hat{x}_a$ of the semantic modality with the classifier $W$ of the visual modality to obtain the visual-to-semantic distillation constraint, and calculate the distillation loss

$$L_{kd2} = \lVert W - \hat{x}_a \rVert_2^2;$$

10) the objective function for training the semantic-modality autoencoder is

$$L_a = L_{cls2} + \lambda_3 L_{sup} + \lambda_4 L_{kd2},$$

where $L_{sup}$ is the supervision of the semantic-modality decoder by the visual-modality decoder, and $\lambda_3$ and $\lambda_4$ are the weight coefficients of the supervision loss and the distillation loss respectively;
11) testing procedure of the episode: input the semantic prototypes $a_{te}$ of the test set into the trained encoder $E_2$ and decoder $D_2$ to obtain the corresponding visual prototypes $\hat{x}_a^{te}$;
12) concatenate $\hat{x}_a^{tr}$ and $\hat{x}_a^{te}$ to obtain the classifier $C_S$ of all seen classes; then use the classifier $C_S$ to classify all seen-class samples, calculate the classification loss, and fine-tune the previously learned parameters:

$$L_{cls3} = -\mathbb{E}\big[y \log f(C_S^{\top} x)\big];$$

13) input the semantic features $a_t$ of the test samples of the seen and unseen classes into the semantic encoder and decoder, compare the generated visual-feature prototypes with $x_t$, and obtain the classification result by the nearest-neighbor method;
14) repeat steps 1) to 13) over multiple episodes of meta-training until the best classification performance is obtained.
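To make the flow of steps 1) to 14) concrete, here is a hedged PyTorch sketch of one episode, assuming single-layer encoders and decoders, illustrative dimensions, and omitted optimizer steps; the forms used for $L_{cls1}$, $L_{cls2}$, and $L_{kd2}$ follow the reconstructed equations above and are a plausible reading, not a verbatim implementation of the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed dimensions: 2048-d visual features, 85-d semantic prototypes,
# 64-d latent space, M = 20 training classes per episode.
d_vis, d_sem, d_z, M = 2048, 85, 64, 20

E1 = nn.Linear(d_vis, 2 * d_z)                            # visual encoder (mu, log-variance)
D1 = nn.Linear(d_z, d_vis)                                # visual decoder
E2 = nn.Linear(d_sem, d_z)                                # semantic encoder
D2 = nn.Linear(d_z, d_vis)                                # semantic decoder
disc = nn.Sequential(nn.Linear(d_vis, 1), nn.Sigmoid())   # discriminator D
W = torch.randn(d_vis, M, requires_grad=True)             # visual-modality classifier W
mem = {"w1": None, "v1": None}                            # memory modules m1, m2

def param_dist(module, stored):
    """Squared 2-norm between a module's parameters and a stored snapshot (0 if empty)."""
    if stored is None:
        return torch.tensor(0.0)
    return sum(((p - q) ** 2).sum() for p, q in zip(module.parameters(), stored))

def episode(x, y, a_tr, lam1=1.0, lam2=1.0, lam3=1.0, lam4=1.0):
    # --- visual phase, steps 2)-7) ---
    mu, logvar = E1(x).chunk(2, dim=1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()            # reparameterization
    x_hat = D1(z)                                                   # pseudo-visual features
    L_rec = ((x - x_hat) ** 2).sum(1).mean()                        # reconstruction constraint
    L_vae = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp())).sum(1).mean()  # KL term
    L_cls1 = F.cross_entropy(x_hat @ W, y)                          # visual classification loss
    L_D = (disc(x).clamp_min(1e-6).log()
           + (1 - disc(x_hat)).clamp_min(1e-6).log()).mean()        # adversarial term
    L_kd = param_dist(E1, mem["w1"]) + param_dist(D1, mem["v1"])    # episode-to-episode distillation
    L_v = lam1 * L_rec + lam2 * L_vae + L_cls1 + L_D + L_kd         # objective of step 7)
    mem["w1"] = [p.detach().clone() for p in E1.parameters()]       # update memory modules
    mem["v1"] = [p.detach().clone() for p in D1.parameters()]

    # --- semantic phase, steps 8)-10) ---
    x_a = D2(E2(a_tr))                                              # one visual prototype per class
    L_cls2 = F.cross_entropy(x_hat @ x_a.t(), y)                    # prototypes act as classifier
    L_sup = param_dist(D2, [p.detach() for p in D1.parameters()])   # D1 supervises D2
    L_kd2 = ((W - x_a.t()) ** 2).sum()                              # visual-to-semantic distillation
    L_a = L_cls2 + lam3 * L_sup + lam4 * L_kd2                      # objective of step 10)
    return L_v, L_a

x = torch.randn(32, d_vis); y = torch.randint(0, M, (32,)); a_tr = torch.randn(M, d_sem)
print([round(t.item(), 3) for t in episode(x, y, a_tr)])
```

In practice the discriminator and the variational autoencoder would be updated alternately, and the visual phase would run for several iterations before the semantic phase, as described in step 7).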
As an improvement of the zero-shot image classification method based on a meta-learning adversarial network of the invention, the process of generating the pseudo-visual features $\hat{x}$ in step 2) and calculating $L_{VAE}$ in step 3) is as follows:
(2.1) for the visual features $x_i$ of the training samples, randomly select a batch of data $x$ and input it into the encoder $E_1$ to obtain the probability distribution of the latent variable $z$:

$$p(z \mid x) = \mathcal{N}(\mu, \Sigma)$$

where $p(z \mid x)$ denotes the distribution of the latent variable $z$, $\mu$ and $\Sigma$ denote the mean and variance of $z$ respectively, and $\mathcal{N}$ denotes the normal distribution;
(2.2) input $z$ into the decoder $D_1$ to generate the pseudo-visual features $\hat{x}$:

$$\hat{x} = D_1(z; v_1)$$

where $w_1$, $v_1$ are the parameters of the encoder $E_1$ and the decoder $D_1$ respectively;
(2.3) calculate the variational autoencoder loss function $L_{VAE}$:

$$L_{VAE} = -\mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] + D_{KL}\big(q(z \mid x) \,\Vert\, p(z)\big)$$

where $L_{VAE}$ denotes the variational autoencoder loss, $\mathbb{E}_{q(z \mid x)}$ denotes the expectation over the distribution of the latent variable $z$, $p(x \mid z)$ denotes the distribution of the visual features generated from the latent variable $z$, $q(z \mid x)$ denotes the conditional distribution of the latent variable $z$, $p(z)$ denotes the prior distribution of $z$, set to a normal distribution, $\log$ is the logarithm, and $D_{KL}$ is the KL divergence.
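A minimal PyTorch sketch of steps (2.1) to (2.3), assuming a diagonal-Gaussian $q(z \mid x)$, a standard-normal prior, squared error for the reconstruction term, and illustrative layer sizes:

```python
import torch
import torch.nn as nn

class VisualVAE(nn.Module):
    """Encoder E1 / decoder D1 over visual features, as in steps (2.1)-(2.3)."""
    def __init__(self, d_vis=2048, d_z=64):
        super().__init__()
        self.enc = nn.Linear(d_vis, 2 * d_z)   # E1: predicts mean and log-variance
        self.dec = nn.Linear(d_z, d_vis)       # D1: reconstructs visual features

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # z ~ N(mu, Sigma)
        return self.dec(z), mu, logvar                        # pseudo-visual features x_hat

def vae_loss(x, x_hat, mu, logvar):
    # -E_q[log p(x|z)] modeled as squared error, plus KL(q(z|x) || N(0, I))
    rec = ((x - x_hat) ** 2).sum(dim=1).mean()
    kl = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp())).sum(dim=1).mean()
    return rec + kl

x = torch.randn(32, 2048)                      # a batch of visual features
vae = VisualVAE()
x_hat, mu, logvar = vae(x)
print(vae_loss(x, x_hat, mu, logvar).item())
```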
As an improvement of the zero-shot image classification method based on a meta-learning adversarial network of the invention, the process of calculating the distillation losses $L_{kd\text{-}w}$ and $L_{kd\text{-}v}$ in step 6) is as follows:
calculate the distillation losses using the encoder $E_1$ and decoder $D_1$ parameters stored in the memory modules:

$$L_{kd\text{-}w} = \lVert w_1 - w_{1\text{-}before} \rVert_2^2$$

$$L_{kd\text{-}v} = \lVert v_1 - v_{1\text{-}before} \rVert_2^2$$

where $w_{1\text{-}before}$ and $v_{1\text{-}before}$ denote the parameters of the encoder $E_1$ and the decoder $D_1$ stored in the two memory modules in the immediately preceding episode; for episode 1, $w_{1\text{-}before} = v_{1\text{-}before} = 0$.
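In implementation terms, the memory modules amount to parameter snapshots taken after each episode, with the distillation loss penalizing the squared distance to the snapshot in the next episode; a sketch follows, in which the helper names and the per-tensor summation are assumptions rather than the patented implementation:

```python
import torch
import torch.nn as nn

def snapshot(module: nn.Module):
    """Store a module's parameters in a memory module (detached copy)."""
    return [p.detach().clone() for p in module.parameters()]

def distill_loss(module: nn.Module, stored):
    """|| current - previous ||_2^2; returns 0 for episode 1 (empty memory)."""
    if stored is None:
        return torch.tensor(0.0)
    return sum(((p - q) ** 2).sum() for p, q in zip(module.parameters(), stored))

E1 = nn.Linear(2048, 128)
m1 = None                       # memory module m1, empty before episode 1
print(distill_loss(E1, m1))     # episode 1: L_kd-w = 0
m1 = snapshot(E1)               # store w1 after training this episode
print(distill_loss(E1, m1))     # the next episode is penalized against this snapshot
```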
As an improvement of the zero-shot image classification method based on a meta-learning adversarial network of the invention, the process of generating the visual prototypes $\hat{x}_a$ in step 8) is as follows:
(4.1) take the class semantic prototypes $a_{tr}$ of the training classes as the input of the encoder $E_2$ and map $a_{tr}$ into a latent space of the same dimension as $z$, obtaining $z_a$:

$$z_a = E_2(a_{tr}; w_2)$$

where $w_2$ is the parameter of the encoder $E_2$;
(4.2) input $z_a$ into the decoder $D_2$ to generate the corresponding visual prototypes $\hat{x}_a$, which have the same dimension as the real visual features $x_i$:

$$\hat{x}_a = D_2(z_a; v_2)$$

where $v_2$ is the parameter of the decoder $D_2$.
As an improvement of the zero-shot image classification method based on a meta-learning adversarial network of the invention, the process of calculating $L_{sup}$ in step 10) is as follows:

$$L_{sup} = \lVert v_1 - v_2 \rVert_2^2$$

where $v_1$, $v_2$ are the parameters of the decoders $D_1$ and $D_2$ respectively; the 2-norm is used to make the decoder of the semantic modality resemble the decoder of the visual modality, so that the generated visual prototypes are closer to the real visual prototypes.
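Steps (4.1) and (4.2), together with the supervision loss $L_{sup}$, could look as follows in PyTorch; the single-layer encoder and decoder and the dimensions are assumptions for illustration:

```python
import torch
import torch.nn as nn

d_sem, d_z, d_vis = 85, 64, 2048

E2 = nn.Linear(d_sem, d_z)      # maps a_tr into the latent space shared with z
D2 = nn.Linear(d_z, d_vis)      # reconstructs latent codes into the visual space
D1 = nn.Linear(d_z, d_vis)      # visual-modality decoder supervising D2

a_tr = torch.randn(20, d_sem)   # one semantic prototype per training class
z_a = E2(a_tr)                  # z_a = E2(a_tr; w2)
x_a = D2(z_a)                   # visual prototypes with the same dimension as real features

# L_sup = || v1 - v2 ||_2^2 pulls the semantic decoder toward the (fixed) visual
# decoder, so the generated visual prototypes stay close to the real distribution.
L_sup = sum(((p1.detach() - p2) ** 2).sum()
            for p1, p2 in zip(D1.parameters(), D2.parameters()))
print(x_a.shape, L_sup.item())
```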
The invention is advantageous in that an episode's meta-training process is completed with a two-path generative network, so that the semantic classifier learns from the visual classifier; the adversarial game between generator and discriminator and the knowledge distillation of features between consecutive episodes improve zero-shot learning performance intuitively and efficiently. The meta-learning training regime is applied to the zero-shot classification task: visual and semantic features are fed into the network in turn, the zero-shot image classification task is simulated during training, the generation of visual features is completed, and the alignment of the different classifiers is guaranteed while the knowledge obtained in each episode's task is fully exploited. The semantic classifier is thus trained better under the supervision of the visual classifier, visual and semantic features closer to the real distribution are synthesized, and a zero-shot image classification technique suited to realistic conditions is obtained. The method therefore makes generalized zero-shot image classification more effective, improves the generalization ability of the model, and alleviates the domain shift problem common in zero-shot learning, enabling classification in more realistic scenarios, promoting the application of zero-shot learning in production and daily life, and accelerating the move of deep learning algorithms toward practical use.
Drawings
Features, advantages and technical effects of exemplary embodiments of the present invention will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of the meta-learning framework in the present invention.
Detailed Description
As used in the specification and in the claims, certain terms refer to particular components. As one skilled in the art will appreciate, manufacturers may refer to a component by different names; this specification and the claims do not distinguish between components that differ in name but not in function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion and should be interpreted as "including, but not limited to". "Substantially" means within an acceptable error range within which a person skilled in the art can solve the technical problem and substantially achieve the technical effect.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
The present invention will be described in further detail with reference to Fig. 1, but the present invention is not limited thereto.
The basic idea of the disclosed zero-shot image classification method based on a meta-learning adversarial network is that each subtask simulates the whole generalized zero-shot image classification process, and knowledge distillation between tasks strengthens the memory and generalization ability of the model. In each episode's task, several classes are randomly selected from all seen classes as the seen classes of that task to simulate generalized zero-shot learning; after a visual classifier has been learned with a variational autoencoder, the visual classifier guides the semantics and a semantic classifier is learned. During the learning of each episode, the relevant parameters are stored in the memory modules to supervise the learning of the corresponding parameters in the next episode, acting as knowledge distillation. The supervision of the semantic classifier by the visual classifier can likewise be regarded as knowledge distillation. In the test following each episode's training, test samples are classified by nearest neighbor, realizing the zero-shot image classification technique.
In zero-shot image classification, the currently common training regime trains a model on the seen classes in a single round of many iterations and then predicts the classes of test samples, where the test samples include both seen-class and unseen-class samples. In recent years, meta-learning has been widely used in few-shot learning with excellent results. Among meta-learning training regimes, episode-based meta-training is widely used: each episode updates the model with different training data, so that previous knowledge and experience are fully exploited to guide the learning of a new task.
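As an illustration of this episodic regime, the following sketch draws a fresh class split from the seen classes for each episode; the class count and split size are hypothetical:

```python
import random

def sample_episode(seen_classes, m):
    """Split the seen classes into M episode-train classes and episode-test classes."""
    train_cls = random.sample(seen_classes, m)
    test_cls = [c for c in seen_classes if c not in train_cls]
    return train_cls, test_cls

seen = list(range(40))              # e.g. 40 seen classes in the data set
for episode in range(3):            # each episode sees a different split
    tr, te = sample_episode(seen, m=20)
    print(f"episode {episode}: {len(tr)} train classes, {len(te)} test classes")
```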
The invention discloses a zero-shot image classification method based on a meta-learning adversarial network. An image data set is first divided into seen classes and unseen classes; M classes are then randomly selected from the seen classes as the training set of one episode, and the remaining seen classes serve as the test classes of that episode. The training set $\mathcal{T} = \{(x_i, y_i, a_i)\}_{i=1}^{n_{tr}}$ is given, and it holds that $A_{tr} \cap A_{te} = \emptyset$, where $n_{tr}$ is the number of training samples in each episode, $x_i$ is the visual feature of the $i$-th training sample, $y_i$ is the class label of the $i$-th training sample, $a_i \in A_{tr}$ is the semantic prototype of the class of the $i$-th training sample, and $a_{te} \in A_{te}$ is a semantic prototype of a test class of the episode. Let $x_t$ denote the visual features of a test sample and $a_t$ its class semantic features. As shown in Fig. 1, the following steps are performed (a sketch of the nearest-neighbor inference of step 15 follows the list):
1) Randomly select M classes from the seen classes as the training set of one episode and use the remaining seen classes as the test classes of that episode. Initialize the encoder $E_1$ and decoder $D_1$ of the visual-modality variational autoencoder, the encoder $E_2$ and decoder $D_2$ of the semantic-modality autoencoder, and the parameters $w_1$, $v_1$, $w_2$, $v_2$ and the discriminator parameter $r$; define the two memory modules storing the parameters $w_1$, $v_1$ as $m_1$, $m_2$.
2) In this episode, randomly select a batch of data $x$ from the visual features $x_i$ of the training samples as the input of the encoder $E_1$;
3) generate the pseudo-visual features $\hat{x}$ according to

$$\hat{x} = D_1(z; v_1) \qquad (1)$$

where the output of the encoder $E_1$ is the latent variable $z$, whose probability distribution is

$$p(z \mid x) = \mathcal{N}(\mu, \Sigma) \qquad (2)$$

where $p(z \mid x)$ denotes the distribution of the latent variable $z$, $\mu$ and $\Sigma$ denote the mean and variance of $z$ respectively, and $\mathcal{N}$ denotes the normal distribution;
4) after the variational autoencoder, the generated pseudo-visual features are expected to be close to the real features; calculate the feature reconstruction loss and the variational autoencoder loss respectively:

$$L_{rec1} = \lVert x - \hat{x} \rVert_2^2 \qquad (3)$$

$$L_{VAE} = -\mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] + D_{KL}\big(q(z \mid x) \,\Vert\, p(z)\big) \qquad (4)$$

where $L_{rec1}$ denotes the reconstruction loss, $\lVert \cdot \rVert_2$ denotes the 2-norm, $L_{VAE}$ denotes the variational autoencoder loss, $\mathbb{E}_{q(z \mid x)}$ denotes the expectation over the distribution of the latent variable $z$, $p(x \mid z)$ denotes the distribution of the visual features generated from the latent variable $z$, $q(z \mid x)$ denotes the conditional distribution of the latent variable $z$, $p(z)$ denotes the prior distribution of $z$, set to a normal distribution, $\log$ is the logarithm, and $D_{KL}$ is the KL divergence;
5) pass the generated pseudo-visual features through the dimension-reduction matrix $W$ into a softmax classifier, obtain the probability of each class in one-hot form, and calculate the classification loss against the real labels:

$$L_{cls1} = -\mathbb{E}\big[y \log f(W^{\top}\hat{x})\big] \qquad (5)$$

where $f$ denotes the softmax classifier and $W$ is the classifier parameter that reduces the generated features to M dimensions for comparison with the real label $y$; $W$ is defined here as the classifier of the visual modality.
6) Input the visual features $x$ of the training samples and the generated pseudo-visual features $\hat{x}$ into the discriminator $D$, train $D$ with the adversarial loss, and keep the parameter $r$ that gives $D$ the best performance:

$$L_D = \mathbb{E}_x[\log D(x)] + \mathbb{E}_{\hat{x}}[\log(1 - D(\hat{x}))] \qquad (6)$$

where $L_D$ is the loss function of the discriminator $D$, $\mathbb{E}_x$ is the expectation over the distribution of the training-sample visual features $x$, and $\mathbb{E}_{\hat{x}}$ is the expectation over the distribution of the generated pseudo-visual features $\hat{x}$;
7) calculate the distillation losses of this episode:

$$L_{kd\text{-}w} = \lVert w_1 - w_{1\text{-}before} \rVert_2^2 \qquad (7)$$

$$L_{kd\text{-}v} = \lVert v_1 - v_{1\text{-}before} \rVert_2^2 \qquad (8)$$

where $w_{1\text{-}before}$ and $v_{1\text{-}before}$ denote the parameters of the encoder $E_1$ and the decoder $D_1$ stored in the two memory modules in the immediately preceding episode; for episode 1, $w_{1\text{-}before} = v_{1\text{-}before} = 0$;
8) train $E_1$ and $D_1$ of the visual variational autoencoder by summing the losses of formulas (3) to (8), and update the memory modules:

$$L_v = \lambda_1 L_{rec1} + \lambda_2 L_{VAE} + L_{cls1} + L_D + L_{kd\text{-}w} + L_{kd\text{-}v} \qquad (9)$$

where $\lambda_1$, $\lambda_2$ are the weight coefficients of the reconstruction loss and the variational autoencoder loss.
9) In this episode, further take the class semantic prototypes $a_{tr}$ of the training classes as the input of the semantic autoencoder, whose encoder $E_2$ maps the class semantic prototypes into a latent space of the same dimension as $z$ and whose decoder $D_2$ reconstructs the latent features into the visual space to generate the corresponding visual prototypes, the decoder being supervised and constrained by $D_1$:

$$\hat{x}_a = D_2(E_2(a_{tr}; w_2); v_2) \qquad (10)$$

$$L_{sup} = \lVert v_1 - v_2 \rVert_2^2 \qquad (11)$$

where $\hat{x}_a$ is the visual prototype generated from the class semantic prototype, defined as the classifier of the semantic modality, and $L_{sup}$ denotes the 2-norm constraint of $D_1$ on $D_2$;
10) meanwhile, the visual prototype features are also constrained by the dimension-reduction matrix $W$, i.e., the classifier of the visual modality constrains the classifier of the semantic modality, giving the visual-to-semantic distillation constraint and the distillation loss $L_{kd2}$:

$$L_{kd2} = \lVert W - \hat{x}_a \rVert_2^2 \qquad (12)$$

11) use $\hat{x}_a$ to classify the features and calculate the classification loss:

$$L_{cls2} = -\mathbb{E}\big[y \log f(\hat{x}_a^{\top}\hat{x})\big] \qquad (13)$$

12) sum the losses of formulas (11) to (13) to train the encoder $E_2$ and decoder $D_2$:

$$L_a = L_{cls2} + \lambda_3 L_{sup} + \lambda_4 L_{kd2} \qquad (14)$$

where $\lambda_3$ and $\lambda_4$ are the weight coefficients of the supervision loss and the distillation loss respectively;
13) input the semantic prototypes $a_{te}$ of the test set of this episode into the trained encoder $E_2$ and decoder $D_2$ to obtain the corresponding visual prototypes:

$$\hat{x}_a^{te} = D_2(E_2(a_{te}; w_2); v_2) \qquad (15)$$

14) concatenate $\hat{x}_a^{tr}$ and $\hat{x}_a^{te}$ to obtain the classifier $C_S$ of all seen classes; then use the classifier $C_S$ to classify all seen-class samples, calculate the classification loss, and fine-tune the parameters $w_1$, $v_1$, $w_2$, $v_2$ and $r$:

$$L_{cls3} = -\mathbb{E}\big[y \log f(C_S^{\top} x)\big] \qquad (16)$$

15) Input the semantic features $a_t$ of the test samples of the seen and unseen classes into the semantic encoder and decoder, compare the generated visual-feature prototypes with $x_t$, and obtain the classification result by the nearest-neighbor method.
16) Repeat steps 1) to 15) over multiple episodes of meta-training until the best classification performance is obtained.
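For step 15), a hedged sketch of the nearest-neighbor decision: the semantic prototypes of all candidate classes are decoded into visual prototypes, and each test feature is assigned the class of its nearest prototype (Euclidean distance is used here; the patent does not fix the metric):

```python
import torch
import torch.nn as nn

d_sem, d_z, d_vis, n_cls = 85, 64, 2048, 50

E2 = nn.Linear(d_sem, d_z)          # trained semantic encoder
D2 = nn.Linear(d_z, d_vis)          # trained semantic decoder

A_all = torch.randn(n_cls, d_sem)   # prototypes of seen + unseen classes
x_t = torch.randn(200, d_vis)       # visual features of test samples

with torch.no_grad():
    protos = D2(E2(A_all))                  # one visual prototype per class
    dists = torch.cdist(x_t, protos)        # pairwise Euclidean distances
    y_pred = dists.argmin(dim=1)            # nearest-neighbor class label
print(y_pred[:10])
```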
Variations and modifications to the above-described embodiments may also occur to those skilled in the art, which fall within the scope of the invention as disclosed and taught herein. Therefore, the present invention is not limited to the above-mentioned embodiments, and any obvious improvement, replacement or modification made by those skilled in the art based on the present invention is within the protection scope of the present invention. Furthermore, although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims (5)

1. A zero-shot image classification method based on a meta-learning adversarial network, characterized by comprising the following steps:
1) randomly selecting M classes from the seen classes as the training set of one episode and using the remaining seen classes as the test classes of the episode, the training set being $\mathcal{T} = \{(x_i, y_i, a_i)\}_{i=1}^{n_{tr}}$, for which $A_{tr} \cap A_{te} = \emptyset$, where $n_{tr}$ is the number of training samples in each episode, $x_i$ is the visual feature of the $i$-th training sample, $y_i$ is the class label of the $i$-th training sample, $a_i \in A_{tr}$ is the semantic prototype of the class of the $i$-th training sample, and $a_{te} \in A_{te}$ is a semantic prototype of a test class of the episode; defining two memory modules $m_1$, $m_2$;
2) for the visual features $x_i$ of the training samples, randomly selecting a batch of data $x$ and inputting it into the variational autoencoder composed of the encoder $E_1$ and the decoder $D_1$, generating pseudo-visual features $\hat{x}$ similar to the real visual samples under the reconstruction constraint

$$L_{rec1} = \lVert x - \hat{x} \rVert_2^2,$$

where $\lVert \cdot \rVert_2$ denotes the 2-norm;
3) after the variational autoencoder, calculating the variational autoencoder loss function $L_{VAE}$;
4) passing the generated pseudo-visual features through the dimension-reduction matrix $W$ into a softmax classifier, obtaining the probability of each class in one-hot form, and calculating the classification loss against the real labels as

$$L_{cls1} = -\mathbb{E}\big[y \log f(W^{\top}\hat{x})\big],$$

where $f$ denotes the softmax classifier and $W$ is the classifier parameter that reduces the generated features to M dimensions for comparison with the real label $y$, $W$ being defined as the classifier of the visual modality;
5) inputting the visual features $x$ of the training samples and the generated pseudo-visual features $\hat{x}$ into a discriminator $D$, with the adversarial loss

$$L_D = \mathbb{E}_x[\log D(x)] + \mathbb{E}_{\hat{x}}[\log(1 - D(\hat{x}))];$$

6) calculating the distillation losses $L_{kd\text{-}w}$ and $L_{kd\text{-}v}$ of the visual-modality training process of this episode;
7) setting the objective function to the sum of the above loss functions and training the visual-modality variational autoencoder over multiple iterations:

$$L_v = \lambda_1 L_{rec1} + \lambda_2 L_{VAE} + L_{cls1} + L_D + L_{kd\text{-}w} + L_{kd\text{-}v},$$

where $\lambda_1$, $\lambda_2$ are the weight coefficients of the reconstruction loss and the variational autoencoder loss, the parameters of the trained encoder $E_1$ and decoder $D_1$ being stored in the two memory modules respectively;
8) taking the class semantic prototypes $a_{tr}$ of the training classes as the input of an autoencoder to generate the corresponding visual prototypes $\hat{x}_a$, at the same time defining $\hat{x}_a$ as the classifier of the semantic modality, using $\hat{x}_a$ to classify the reconstructed features, and calculating the classification loss

$$L_{cls2} = -\mathbb{E}\big[y \log f(\hat{x}_a^{\top}\hat{x})\big];$$

9) constraining the classifier $\hat{x}_a$ of the semantic modality with the classifier $W$ of the visual modality to obtain the visual-to-semantic distillation constraint, and calculating the distillation loss

$$L_{kd2} = \lVert W - \hat{x}_a \rVert_2^2;$$

10) the objective function for training the semantic-modality autoencoder being

$$L_a = L_{cls2} + \lambda_3 L_{sup} + \lambda_4 L_{kd2},$$

where $L_{sup}$ is the supervision of the semantic-modality decoder by the visual-modality decoder, and $\lambda_3$ and $\lambda_4$ are the weight coefficients of the supervision loss and the distillation loss respectively;
11) testing procedure of the episode: inputting the semantic prototypes $a_{te}$ of the test set into the trained encoder $E_2$ and decoder $D_2$ to obtain the corresponding visual prototypes $\hat{x}_a^{te}$;
12) concatenating $\hat{x}_a^{tr}$ and $\hat{x}_a^{te}$ to obtain the classifier $C_S$ of all seen classes, then using the classifier $C_S$ to classify all seen-class samples, calculating the classification loss, and fine-tuning the previously learned parameters:

$$L_{cls3} = -\mathbb{E}\big[y \log f(C_S^{\top} x)\big];$$

13) inputting the semantic features $a_t$ of the test samples of the seen and unseen classes into the semantic encoder and decoder, and comparing the generated visual-feature prototypes with $x_t$, where $x_t$ is the visual feature of the test sample, obtaining the classification result by the nearest-neighbor method;
14) repeating steps 1) to 13) over multiple episodes of meta-training until the best classification performance is obtained.
2. The zero-shot image classification method based on a meta-learning adversarial network according to claim 1, wherein the process of generating the pseudo-visual features $\hat{x}$ in step 2) and calculating $L_{VAE}$ in step 3) is as follows:
(2.1) for the visual features $x_i$ of the training samples, randomly selecting a batch of data $x$ and inputting it into the encoder $E_1$ to obtain the probability distribution of the latent variable $z$:

$$p(z \mid x) = \mathcal{N}(\mu, \Sigma)$$

where $p(z \mid x)$ denotes the distribution of the latent variable $z$, $\mu$ and $\Sigma$ denote the mean and variance of $z$ respectively, and $\mathcal{N}$ denotes the normal distribution;
(2.2) inputting $z$ into the decoder $D_1$ to generate the pseudo-visual features $\hat{x}$:

$$\hat{x} = D_1(z; v_1)$$

where $w_1$, $v_1$ are the parameters of the encoder $E_1$ and the decoder $D_1$ respectively;
(2.3) calculating the variational autoencoder loss function $L_{VAE}$:

$$L_{VAE} = -\mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] + D_{KL}\big(q(z \mid x) \,\Vert\, p(z)\big)$$

where $L_{VAE}$ denotes the variational autoencoder loss, $\mathbb{E}_{q(z \mid x)}$ denotes the expectation over the distribution of the latent variable $z$, $p(x \mid z)$ denotes the distribution of the visual features generated from the latent variable $z$, $q(z \mid x)$ denotes the conditional distribution of the latent variable $z$, $p(z)$ denotes the prior distribution of $z$, set to a normal distribution, $\log$ is the logarithm, and $D_{KL}$ is the KL divergence.
3. The zero-shot image classification method based on a meta-learning adversarial network according to claim 1, wherein the process of calculating the distillation losses $L_{kd\text{-}w}$ and $L_{kd\text{-}v}$ in step 6) is as follows:
calculating the distillation losses using the encoder $E_1$ and decoder $D_1$ parameters stored in the memory modules:

$$L_{kd\text{-}w} = \lVert w_1 - w_{1\text{-}before} \rVert_2^2$$

$$L_{kd\text{-}v} = \lVert v_1 - v_{1\text{-}before} \rVert_2^2$$

where $w_{1\text{-}before}$ and $v_{1\text{-}before}$ denote the parameters of the encoder $E_1$ and the decoder $D_1$ stored in the two memory modules in the immediately preceding episode; for episode 1, $w_{1\text{-}before} = v_{1\text{-}before} = 0$.
4. The zero-shot image classification method based on a meta-learning adversarial network according to claim 1, wherein the process of generating the visual prototypes $\hat{x}_a$ in step 8) is as follows:
(4.1) taking the class semantic prototypes $a_{tr}$ of the training classes as the input of the encoder $E_2$ and mapping $a_{tr}$ into a latent space of the same dimension as $z$, obtaining $z_a$:

$$z_a = E_2(a_{tr}; w_2)$$

where $w_2$ is the parameter of the encoder $E_2$;
(4.2) inputting $z_a$ into the decoder $D_2$ to generate the corresponding visual prototypes $\hat{x}_a$, which have the same dimension as the real visual features $x_i$:

$$\hat{x}_a = D_2(z_a; v_2)$$

where $v_2$ is the parameter of the decoder $D_2$.
5. The zero-shot image classification method based on a meta-learning adversarial network according to claim 1, wherein the process of calculating $L_{sup}$ in step 10) is as follows:

$$L_{sup} = \lVert v_1 - v_2 \rVert_2^2$$

where $v_1$, $v_2$ are the parameters of the decoders $D_1$ and $D_2$ respectively; the 2-norm is used to make the decoder of the semantic modality resemble the decoder of the visual modality, so that the generated visual prototypes are closer to the real visual prototypes.
CN202011147848.9A 2020-10-23 2020-10-23 Zero-shot image classification method based on a meta-learning adversarial network Active CN112364894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011147848.9A CN112364894B (en) 2020-10-23 Zero-shot image classification method based on a meta-learning adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011147848.9A CN112364894B (en) 2020-10-23 Zero-shot image classification method based on a meta-learning adversarial network

Publications (2)

Publication Number Publication Date
CN112364894A (en) 2021-02-12
CN112364894B true CN112364894B (en) 2022-07-08

Family

ID=74511961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011147848.9A Active CN112364894B (en) 2020-10-23 2020-10-23 Zero-shot image classification method based on a meta-learning adversarial network

Country Status (1)

Country Link
CN (1) CN112364894B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139591B (en) * 2021-04-14 2023-02-24 广州大学 Generalized zero-sample image classification method based on enhanced multi-mode alignment
CN113177587B (en) * 2021-04-27 2023-04-07 西安电子科技大学 Generalized zero sample target classification method based on active learning and variational self-encoder
CN113344069B (en) * 2021-05-31 2023-01-24 成都快眼科技有限公司 Image classification method for unsupervised visual representation learning based on multi-dimensional relation alignment
CN113537322B (en) * 2021-07-02 2023-04-18 电子科技大学 Zero sample visual classification method for cross-modal semantic enhancement generation countermeasure network
CN113610212B (en) * 2021-07-05 2024-03-05 宜通世纪科技股份有限公司 Method and device for synthesizing multi-mode sensor data and storage medium
CN113343941B (en) * 2021-07-20 2023-07-25 中国人民大学 Zero sample action recognition method and system based on mutual information similarity
CN113688879B (en) * 2021-07-30 2024-05-24 南京理工大学 Generalized zero sample learning classification method based on confidence distribution external detection
CN113642621A (en) * 2021-08-03 2021-11-12 南京邮电大学 Zero sample image classification method based on generation countermeasure network
CN113610173B (en) * 2021-08-13 2022-10-04 天津大学 Knowledge distillation-based multi-span domain few-sample classification method
CN113688944B (en) * 2021-09-29 2022-12-27 南京览众智能科技有限公司 Image identification method based on meta-learning
CN114048850A (en) * 2021-10-29 2022-02-15 广东坚美铝型材厂(集团)有限公司 Maximum interval semantic feature self-learning method, computer device and storage medium
CN114037866B (en) * 2021-11-03 2024-04-09 哈尔滨工程大学 Generalized zero sample image classification method based on distinguishable pseudo-feature synthesis
CN114120049B (en) * 2022-01-27 2023-08-29 南京理工大学 Long-tail distribution visual identification method based on prototype classifier learning
CN114998613B (en) * 2022-06-24 2024-04-26 安徽工业大学 Multi-mark zero sample learning method based on deep mutual learning
CN115331012B (en) * 2022-10-14 2023-03-24 山东建筑大学 Joint generation type image instance segmentation method and system based on zero sample learning
CN117541555A (en) * 2023-11-16 2024-02-09 广州市公路实业发展有限公司 Road pavement disease detection method and system


Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11113599B2 (en) * 2017-06-22 2021-09-07 Adobe Inc. Image captioning utilizing semantic text modeling and adversarial learning
CN112889073A (en) * 2018-08-30 2021-06-01 谷歌有限责任公司 Cross-language classification using multi-language neural machine translation
US11087184B2 (en) * 2018-09-25 2021-08-10 Nec Corporation Network reparameterization for new class categorization
US11087174B2 (en) * 2018-09-25 2021-08-10 Nec Corporation Deep group disentangled embedding and network weight generation for visual inspection
CN109492662B (en) * 2018-09-27 2021-09-14 天津大学 Zero sample image classification method based on confrontation self-encoder model
CN110175251A (en) * 2019-05-25 2019-08-27 西安电子科技大学 The zero sample Sketch Searching method based on semantic confrontation network
CN110580501B (en) * 2019-08-20 2023-04-25 天津大学 Zero sample image classification method based on variational self-coding countermeasure network
CN110826638B (en) * 2019-11-12 2023-04-18 福州大学 Zero sample image classification model based on repeated attention network and method thereof
CN111581405B (en) * 2020-04-26 2021-10-26 电子科技大学 Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019055114A1 (en) * 2017-09-12 2019-03-21 Hrl Laboratories, Llc Attribute aware zero shot machine vision system via joint sparse representations
CN108399421A (en) * 2018-01-31 2018-08-14 南京邮电大学 A kind of zero sample classification method of depth of word-based insertion
CN108875818A (en) * 2018-06-06 2018-11-23 西安交通大学 Based on variation from code machine and confrontation network integration zero sample image classification method
CA3076646A1 (en) * 2019-03-22 2020-09-22 Royal Bank Of Canada System and method for generation of unseen composite data objects
CN110097095A (en) * 2019-04-15 2019-08-06 天津大学 A kind of zero sample classification method generating confrontation network based on multiple view
US10803646B1 (en) * 2019-08-19 2020-10-13 Neon Evolution Inc. Methods and systems for image and voice processing
CN111476294A (en) * 2020-04-07 2020-07-31 南昌航空大学 Zero sample image identification method and system based on generation countermeasure network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Incremental zero-shot learning based on attributes for image classification; Nan Xue, et al.; 2017 IEEE International Conference on Image Processing (ICIP); 2018-02-22; full text *
Research on zero-shot learning based on Res-GAN networks; Lin Jiaojiao; China Masters' Theses Full-text Database (Information Science and Technology), No. 2; 2020-02-15; full text *
Zero-shot image classification based on generative adversarial networks; Wei Hongxi et al.; Journal of Beijing University of Aeronautics and Astronautics, No. 12; 2019-12-31; full text *

Also Published As

Publication number Publication date
CN112364894A (en) 2021-02-12

Similar Documents

Publication Publication Date Title
CN112364894B (en) Zero-shot image classification method based on a meta-learning adversarial network
CN110580501B (en) Zero sample image classification method based on variational self-coding countermeasure network
Bakhtin et al. Real or fake? learning to discriminate machine from human generated text
CN109376242B (en) Text classification method based on cyclic neural network variant and convolutional neural network
Perez-Martin et al. Improving video captioning with temporal composition of a visual-syntactic embedding
CN105975573B (en) A kind of file classification method based on KNN
US7362892B2 (en) Self-optimizing classifier
CN108647226B (en) Hybrid recommendation method based on variational automatic encoder
CN114998602B (en) Domain adaptive learning method and system based on low confidence sample contrast loss
CN112364893B (en) Semi-supervised zero-sample image classification method based on data enhancement
CN113127737B (en) Personalized search method and search system integrating attention mechanism
Huang et al. Large-scale weakly-supervised content embeddings for music recommendation and tagging
CN112015902A (en) Least-order text classification method under metric-based meta-learning framework
CN113886562A (en) AI resume screening method, system, equipment and storage medium
Shannon et al. Non-saturating GAN training as divergence minimization
CN112529638A (en) Service demand dynamic prediction method and system based on user classification and deep learning
CN106021402A (en) Multi-modal multi-class Boosting frame construction method and device for cross-modal retrieval
Chen et al. Trada: tree based ranking function adaptation
CN114399661A (en) Instance awareness backbone network training method
CN116432125B (en) Code Classification Method Based on Hash Algorithm
CN116704208A (en) Local interpretable method based on characteristic relation
CN114202038B (en) Crowdsourcing defect classification method based on DBM deep learning
CN114943216A (en) Case microblog attribute-level viewpoint mining method based on graph attention network
CN114969511A (en) Content recommendation method, device and medium based on fragments
CN110162629B (en) Text classification method based on multi-base model framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant