CN111914929B - Zero sample learning method - Google Patents
Zero sample learning method
- Publication number
- CN111914929B CN202010750578.4A
- Authority
- CN
- China
- Prior art keywords
- network
- visual
- zero
- features
- visual feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 25
- 230000000007 visual effect Effects 0.000 claims abstract description 95
- 238000012549 training Methods 0.000 claims abstract description 30
- 239000013598 vector Substances 0.000 claims abstract description 22
- 230000006870 function Effects 0.000 claims abstract description 16
- 238000009826 distribution Methods 0.000 claims abstract description 10
- 230000007246 mechanism Effects 0.000 claims abstract description 10
- 238000005457 optimization Methods 0.000 claims abstract description 7
- 238000012360 testing method Methods 0.000 claims description 7
- 238000013528 artificial neural network Methods 0.000 claims description 5
- 238000013508 migration Methods 0.000 claims description 3
- 230000005012 migration Effects 0.000 claims description 3
- 230000005540 biological transmission Effects 0.000 claims 1
- 238000012546 transfer Methods 0.000 abstract description 9
- 241000282412 Homo Species 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 238000013527 convolutional neural network Methods 0.000 description 2
- 230000009977 dual effect Effects 0.000 description 2
- 238000003058 natural language processing Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000001143 conditioned effect Effects 0.000 description 1
- 238000013434 data augmentation Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 230000004438 eyesight Effects 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 241000894007 species Species 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
- 230000004382 visual function Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
- Filters That Use Time-Delay Elements (AREA)
Abstract
The present invention provides a zero-shot learning method that recognizes never-before-seen data categories by transferring knowledge from seen-class samples to unseen-class samples. The method mainly comprises the following steps: acquiring a training feature data set; building a zero-shot learning model based on a generative network, a noise autoencoder, a regression network and a discriminant network; training the generative network and the noise autoencoder; training the discriminant network; and obtaining the total objective function and iterating to optimize the algorithm. Through knowledge transfer, the invention fuses two kinds of semantic features, attributes and word vectors, trains under an adversarial mechanism to minimize the distribution difference between real samples and generated samples, and maps visual features back to semantic features through a regression network. This effectively alleviates the problem of the model's prediction results shifting toward neighboring classes, allows hard-to-label examples to be recognized, and reduces the cost of recognition.
Description
Technical Field
The present invention relates to a zero-shot learning method and belongs to the field of pattern recognition.
Background Art
With the development of deep learning, the performance of computer vision and machine learning methods has improved greatly, and deep learning models have achieved surprising success in image classification, even rivaling human recognition ability. Humans, however, have a natural advantage in recognizing novel objects, that is, objects they have only heard of or seen a few times before, or new objects they have never encountered at all. The most fundamental reason for this difference is that deep models rely on fully supervised learning. Training neural networks therefore requires large amounts of annotated data, and since nature contains tens of thousands of species, collecting and annotating visual data is cumbersome and expensive. This gives rise to a new task: recognizing unseen-class samples by transferring knowledge from seen-class samples to unseen-class samples, so as to solve the image annotation problem.
Zero-shot learning is currently receiving more and more attention. In zero-shot learning, the sets of seen classes and unseen classes are usually assumed to be disjoint. Some of the samples in the feature space are labeled; these are called seen-class samples, and only the visual instances of the seen classes are used to train the model. The feature space also contains unlabeled sample instances, and these categories are called unseen classes. The feature space consists of vectors extracted from the samples by a neural network, and each sample belongs to one category. To establish a connection between the seen-class samples and the unseen-class samples, semantic features are usually introduced for zero-shot learning. Attributes are the most commonly used semantic features in zero-shot learning, but manually annotating each semantic attribute for the visual data is time-consuming and laborious. Natural language processing techniques instead exploit alternative semantic features (for example word vectors such as GloVe), obtaining textual information directly from Wikipedia articles; however, because such semantic features are acquired in a coarse manner, their performance is worse than that of attribute features.
In view of this, it is necessary to propose a zero-shot learning method that solves the above problems.
Summary of the Invention
The purpose of the present invention is to provide a zero-shot learning method for recognizing hard-to-label examples while reducing the cost of recognition.
To achieve the above purpose, the present invention provides a zero-shot learning method for transferring knowledge from seen-class samples to unseen-class samples in order to recognize never-before-seen data categories, mainly comprising the following steps:
Step 1: obtain a training feature data set, the training feature data set comprising seen-class samples, where the seen-class samples include labels, real visual features and semantic features;
Step 2: build a zero-shot learning model based on a generative network, a noise autoencoder, a regression network and a discriminant network, and initialize the generative network, noise autoencoder, regression network and discriminant network in the zero-shot learning model;
Step 3: train the generative network and the noise autoencoder to generate a first visual feature and a second visual feature respectively, and fuse the first visual feature and the second visual feature into a pseudo visual feature according to different weights;
Step 4: train the discriminant network to classify the pseudo visual features and the real visual features, and optimize the generative network and the discriminant network through an adversarial mechanism;
Step 5: train the regression network, taking the pseudo visual features as input, so as to map the pseudo visual features to semantic features;
Step 6: add the loss functions of the generative network, the noise autoencoder, the regression network and the discriminant network to obtain the total objective function, and iterate to optimize the algorithm.
Optionally, in step 1, the labels include quantity labels and category labels, and the semantic features include word vectors and attribute features.
Optionally, in step 2, feedforward neural networks are used to pass data among the generative network, the noise autoencoder, the regression network and the discriminant network.
Optionally, in step 3, the generative network generates the first visual feature from attribute features and Gaussian random noise, and the noise autoencoder generates the second visual feature from word vectors, latent variables and Gaussian random noise.
Optionally, in step 3, the formula for fusing the first visual feature and the second visual feature into the pseudo visual feature is:
x_f = λ·x_1 + (1 - λ)·x_2,
where x_f is the pseudo visual feature, λ is the corresponding weight, x_1 is the first visual feature representation, x_2 is the second visual feature representation, and the weights of the two parts sum to 1.
Optionally, in step 4, the adversarial mechanism can be expressed as:
L_WGAN = E[D(x)] - E[D(x_f)] - λ_gp·E[(||∇_x̂ D(x̂)||_2 - 1)^2], with x̂ = α·x + (1 - α)·x_f,
where x is the real visual feature, x_f = G(a, w, z), α ~ U(0, 1), and λ_gp is the gradient-penalty coefficient.
Optionally, in step 4, the distributions of the pseudo visual features and the real visual features are both constrained by a least-squares loss, given by:
L_1 = E[||x - x_f||_2^2],
where x is the real visual feature and x_f is the pseudo visual feature.
Optionally, the total objective function in step 6 is:
L = L_WGAN + L_1 + λ_2·L_R,
where λ_2 is a hyperparameter that assigns weights to the different parts.
Optionally, in step 6, Adam is used as the optimizer for the algorithm optimization.
Optionally, the method further includes step 7: using the generative network trained in step 3 to generate visual features for unseen-class samples and classifying them, so as to test the total objective function of step 6.
The beneficial effects of the present invention are as follows: through knowledge transfer, the invention fuses two kinds of semantic features, attributes and word vectors, trains under an adversarial mechanism to minimize the distribution difference between real and generated samples, and maps visual features back to semantic features through a regression network. This effectively alleviates the problem of the model's prediction results shifting toward neighboring classes, allows hard-to-label examples to be recognized, and reduces the cost of recognition.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the zero-shot learning method of the present invention.
FIG. 2 is a flowchart of generating the first visual feature in the zero-shot learning method of the present invention.
FIG. 3 is a flowchart of generating the second visual feature in the zero-shot learning method of the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the present invention discloses a zero-shot learning method for transferring knowledge from seen-class samples to unseen-class samples in order to recognize never-before-seen data categories, mainly comprising the following steps:
Step 1: obtain a training feature data set, the training feature data set comprising seen-class samples, where the seen-class samples include labels, real visual features and semantic features;
Step 2: build a zero-shot learning model based on a generative network, a noise autoencoder, a regression network and a discriminant network, and initialize the generative network, noise autoencoder, regression network and discriminant network in the zero-shot learning model;
Step 3: train the generative network and the noise autoencoder to generate a first visual feature and a second visual feature respectively, and fuse the first visual feature and the second visual feature into a pseudo visual feature according to different weights;
Step 4: train the discriminant network to classify the pseudo visual features and the real visual features, and optimize the generative network and the discriminant network through an adversarial mechanism;
Step 5: train the regression network, taking the pseudo visual features as input, so as to map the pseudo visual features to semantic features;
Step 6: add the loss functions of the generative network, the noise autoencoder, the regression network and the discriminant network to obtain the total objective function, and iterate to optimize the algorithm.
Steps 1 to 6 are described in detail below.
In step 1, the training feature data set consists of 2048-dimensional visual features extracted by a deep convolutional neural network and is a group of vectors; the labels include quantity labels and category labels, and the semantic features include word vectors and attribute features. The visual features are the 2048-dimensional output of the top-level pooling unit of the deep convolutional neural network ResNet101, which has shown excellent performance. For the AwA1 and AwA2 databases, in addition to the attribute features used as semantic features, each category is also represented by a word vector of dimension 1000. Specifically, natural language processing techniques are used to extract a word vector for each category from a large language corpus. As for the attribute features, continuous-valued semantic attributes are used, whose dimensions are shown in Table 1 below.
Table 1
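For illustration only, the following is a minimal sketch of how such 2048-dimensional ResNet101 features (the extraction described in step 1 above) can be obtained with PyTorch/torchvision; the patent does not include extraction code, and the preprocessing settings below are the usual ImageNet defaults rather than values taken from the text:

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Pretrained ResNet101; replacing the final fc layer with Identity exposes the
# 2048-dimensional output of the top-level pooling unit described above.
backbone = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_visual_feature(image_path: str) -> torch.Tensor:
    """Return a 2048-dimensional visual feature vector for a single image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    return backbone(img).squeeze(0)   # shape: (2048,)
```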
In step 2, the generative network, the noise autoencoder, the regression network and the discriminant network pass data to one another through feedforward neural networks.
The noise autoencoder has two hidden fully connected layers of 1200 and 600 units respectively, implemented together with hidden fully connected layers of 2048 and 4096 units; the discriminant network is implemented with only a single hidden fully connected layer of 512 units; and the regression network has a single hidden layer of 600 units. All noise dimensions in the present invention are 100. Pseudo visual features can thus be synthesized from the two different kinds of semantic features respectively, while the regression network and the discriminant network are used to perform semantic inference and impose the related constraints.
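A minimal PyTorch sketch of the four modules with the layer sizes listed above is given below; the activation functions, the exact placement of the 2048- and 4096-unit layers, and the example feature dimensions are assumptions, since the text does not specify them:

```python
import torch
import torch.nn as nn

# Example dimensions (e.g. 85-dimensional attributes as in AwA-style data sets); adjust per data set.
VIS_DIM, ATT_DIM, WORD_DIM = 2048, 85, 1000
NOISE_DIM, LATENT_DIM = 100, 100   # all noise dimensions are 100 in the text

def mlp(*sizes, out_act=None):
    """Plain feedforward stack of fully connected layers with LeakyReLU in between."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.LeakyReLU(0.2))
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

# Generative network G: (attribute, noise) -> first visual feature x1
G = mlp(ATT_DIM + NOISE_DIM, 4096, VIS_DIM, out_act=nn.ReLU())

# Noise autoencoder: encoder with 1200/600-unit hidden layers producing Q(Z|X),
# decoder turning (word vector, latent code, noise) into the second visual feature x2
encoder = mlp(VIS_DIM + WORD_DIM, 1200, 600, LATENT_DIM)
decoder = mlp(WORD_DIM + LATENT_DIM + NOISE_DIM, 4096, VIS_DIM, out_act=nn.ReLU())

# Discriminant network D: single 512-unit hidden layer, scalar critic output
D = mlp(VIS_DIM, 512, 1)

# Regression network R: single 600-unit hidden layer, maps visual features to attributes
R = mlp(VIS_DIM, 600, ATT_DIM)
```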
In step 3, the purpose of the generative network is to learn the probability distribution of the data points so that samples can be drawn from it as a data augmentation mechanism. As one of the most promising generative models, generative adversarial models have been widely studied.
As shown in FIG. 2, given the training data D_tr of the seen classes, the aim is to learn a generative network G: Z × C → X that takes Gaussian random noise z ∈ Z and an attribute feature a_i ∈ R^q as input and outputs the first visual feature. Once the generative network has learned to generate the first visual feature from the seen-class samples, attribute features can be embedded to generate visual features for any unseen class. The generative network is learned through the adversarial objective given in step 4 below, where x_1 = G(z, a_i) is the i-th generated first visual feature representation in the visual space, with the corresponding attribute feature a_i and noise z.
The present invention also uses a noise autoencoder to obtain the second visual feature, taking word vectors as the semantic features; word vectors are another kind of semantic information that describes the unseen-class samples from a different angle and complements the attribute features. For this purpose the WAE is extended to a conditional WAE.
As shown in FIG. 3, given the specific conditional information (the word vector), the noise autoencoder is used to produce a probability distribution Q(Z|X), where Q_Z is the distribution induced on the latent space and P_Z is an isotropic Gaussian prior on the latent space Z. Specifically, a discriminant network is introduced in the latent space Z, whose goal is to distinguish the "fake" points sampled from the prior P_Z from the "true" points sampled from Q(Z|X). The decoder then decodes the word vector w_i ∈ R^p together with a latent variable drawn from Q(Z|X) to generate the second visual feature. The loss function of the conditional WAE is defined as:
L_WAE = E[c(x, x_2)] + λ·D_Z(Q_Z, P_Z),
where x_2 is the i-th generated second visual feature representation in the visual space, with the corresponding word vector w_i, and c(·,·) is the reconstruction cost. D_Z(Q_Z, P_Z) = D_JS(Q_Z, P_Z) is chosen and is estimated with adversarial training, and λ > 0 is a hyperparameter.
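A sketch of one conditional-WAE loss computation under these definitions follows, reusing the `encoder`, `decoder` and dimension constants from the module sketch above; taking the reconstruction cost c as squared error and estimating the D_Z term with a small latent-space critic trained adversarially are implementation assumptions rather than details given in the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Small critic on the latent space Z, used to estimate D_Z(Q_Z, P_Z) adversarially.
latent_critic = nn.Sequential(nn.Linear(LATENT_DIM, 128),
                              nn.LeakyReLU(0.2),
                              nn.Linear(128, 1))

def cwae_step(x, w, lam=1.0):
    """x: real visual features (B, 2048); w: paired word vectors (B, 1000)."""
    z_q = encoder(torch.cat([x, w], dim=1))        # latent codes ~ Q(Z|X)
    z_p = torch.randn_like(z_q)                    # samples from the Gaussian prior P_Z
    eps = torch.randn(x.size(0), NOISE_DIM, device=x.device)
    x2 = decoder(torch.cat([w, z_q, eps], dim=1))  # second visual feature

    # Critic loss: tell prior samples apart from encoded samples (label choice is arbitrary).
    p_logit, q_logit = latent_critic(z_p), latent_critic(z_q.detach())
    critic_loss = F.binary_cross_entropy_with_logits(p_logit, torch.ones_like(p_logit)) + \
                  F.binary_cross_entropy_with_logits(q_logit, torch.zeros_like(q_logit))

    # WAE loss: reconstruction cost c(x, x2) plus lam times the D_Z term; the encoder is
    # pushed to make Q_Z indistinguishable from the prior in the eyes of the critic.
    q_logit_enc = latent_critic(z_q)
    wae_loss = F.mse_loss(x2, x) + lam * F.binary_cross_entropy_with_logits(
        q_logit_enc, torch.ones_like(q_logit_enc))
    return wae_loss, critic_loss, x2
```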
The present invention fuses the first visual feature and the second visual feature. On the one hand, based on human experience in learning to recognize new objects, attribute features carry more effective semantic information than word vectors. On the other hand, compared with the real visual features, the generated pseudo visual features contain a large amount of invalid information, which can be removed through feature fusion so as to keep the information effective.
Based on the above, it is necessary to assign different weights to the pseudo visual features generated from the attribute features and from the word vectors. The first visual feature and the second visual feature are fused into the pseudo visual feature by:
x_f = λ·x_1 + (1 - λ)·x_2,
where x_f is the pseudo visual feature, λ is the corresponding weight, x_1 is the first visual feature representation, x_2 is the second visual feature representation, and the weights of the two parts sum to 1.
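In code, the fusion above is a single weighted sum; λ = 0.7 below is only an example value, since the patent only requires that the two weights sum to 1:

```python
import torch

def fuse_features(x1: torch.Tensor, x2: torch.Tensor, lam: float = 0.7) -> torch.Tensor:
    """Fuse the attribute-conditioned feature x1 and the word-vector-conditioned
    feature x2 into the pseudo visual feature x_f; the two weights sum to 1."""
    return lam * x1 + (1.0 - lam) * x2
```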
In step 4, the discriminant network is used to classify the pseudo visual features and the real visual features. The pseudo visual features try to fool the discriminant network as successfully as possible; as the discriminant network keeps improving its discriminative ability, the generative network and the discriminant network are optimized through the adversarial mechanism, and the quality of the generated pseudo visual features keeps improving as well. The present invention uses the improved WGAN for adversarial training, and the adversarial process of training the discriminant network can be expressed as:
L_WGAN = E[D(x)] - E[D(x_f)] - λ_gp·E[(||∇_x̂ D(x̂)||_2 - 1)^2], with x̂ = α·x + (1 - α)·x_f,
where x is the real visual feature, x_f = G(a, w, z), and α ~ U(0, 1). The first two terms of the expression approximate the Wasserstein distance, while the third term is a gradient penalty that forces the gradient of D to have unit norm along straight lines between real and generated features.
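The gradient penalty in the expression above can be computed as in the standard improved-WGAN recipe; the sketch below reuses the discriminant network `D` from the module sketch, and the penalty coefficient of 10 is the value commonly used for WGAN-GP rather than one stated in the text:

```python
import torch

def gradient_penalty(D, x_real, x_fake, gp_coef=10.0):
    """Force the gradient of D to have unit norm along lines between real and fake features."""
    alpha = torch.rand(x_real.size(0), 1, device=x_real.device)        # alpha ~ U(0, 1)
    x_hat = (alpha * x_real + (1.0 - alpha) * x_fake).requires_grad_(True)
    d_hat = D(x_hat)
    grads = torch.autograd.grad(outputs=d_hat, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_hat),
                                create_graph=True, retain_graph=True)[0]
    return gp_coef * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

def discriminator_loss(D, x_real, x_fake):
    # Maximizing E[D(x)] - E[D(x_f)] is implemented by minimizing its negative,
    # plus the gradient penalty term.
    return D(x_fake).mean() - D(x_real).mean() + gradient_penalty(D, x_real, x_fake)

def generator_adv_loss(D, x_fake):
    # The generator side tries to raise the critic score of the fused pseudo features.
    return -D(x_fake).mean()
```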
Although the adversarial mechanism gives the generative network the ability to produce realistic visual features with a similar distribution, this alone is not enough to ensure that the fused pseudo visual features are effective. Therefore, the distributions of the fused pseudo visual features and the real visual features are further constrained by the least-squares loss:
L_1 = E[||x - x_f||_2^2].
In step 5, the regression network takes the pseudo visual features as input and converts them into semantic features. The generative network and the regression network together form a dual-learning framework, so they can learn from each other. In the present invention, the primal task is to generate visual features conditioned on class embeddings, while the dual task is to map visual features back to the corresponding class semantic space.
Let x denote a real visual feature sampled from the training feature data set and x_f the fused pseudo visual feature. With the paired training data (x, a), the regression network can be trained under a supervised loss L_R.
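A sketch of this supervised loss follows, under the assumption that it is a mean-squared error on both the real and the fused features (the exact form is not spelled out in the text); `R` is the regression network from the module sketch:

```python
import torch.nn.functional as F

def regression_loss(R, x_real, x_fake, a):
    """Map visual features back to attribute space and compare with the paired attributes a."""
    return F.mse_loss(R(x_real), a) + F.mse_loss(R(x_fake), a)
```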
In step 6, the loss functions of the generative network, the noise autoencoder, the regression network and the discriminant network are added to obtain the total objective function:
L = L_WGAN + L_1 + λ_2·L_R,
where λ_2 is a hyperparameter that assigns weights to the different parts.
Adam is chosen as the optimizer, with the parameters β_1 and β_2 set to (0.9, 0.999). The discriminant network is trained first and its parameters are optimized, after which its parameters are fixed; the learning rate of the discriminant network is set to 0.00001. The generative network and the regression network are then trained and their parameters optimized, with their learning rates set to 0.0001. All modules of the zero-shot learning model are trained with a batch size of 128, for 1000 epochs on each data set; the model parameters are saved every 10 epochs, and the model is then evaluated on the test set.
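A condensed training-loop sketch with the optimizer settings above is given below, reusing the modules and loss helpers from the earlier sketches; the data loader, the per-batch alternation between the discriminant update and the generator/regressor update, and the value of λ_2 are assumptions:

```python
import itertools
import torch

d_opt = torch.optim.Adam(itertools.chain(D.parameters(), latent_critic.parameters()),
                         lr=1e-5, betas=(0.9, 0.999))
g_opt = torch.optim.Adam(itertools.chain(G.parameters(), encoder.parameters(),
                                         decoder.parameters(), R.parameters()),
                         lr=1e-4, betas=(0.9, 0.999))
lambda2 = 1.0   # weight of the regression term (example value)

def generate_pseudo(x, a, w):
    """Generate x1, x2 and the fused pseudo feature x_f for one batch."""
    z = torch.randn(x.size(0), NOISE_DIM)
    x1 = G(torch.cat([a, z], dim=1))               # first visual feature
    wae_loss, critic_loss, x2 = cwae_step(x, w)    # second visual feature
    return fuse_features(x1, x2), wae_loss, critic_loss

for epoch in range(1000):
    for x, a, w in train_loader:   # real features, attributes, word vectors (batch size 128)
        # Train the discriminant network (and latent critic) first.
        x_f, _, critic_loss = generate_pseudo(x, a, w)
        d_opt.zero_grad()
        (discriminator_loss(D, x, x_f.detach()) + critic_loss).backward()
        d_opt.step()

        # Then fix it and update the generative, autoencoding and regression modules
        # (features are re-generated for this phase for clarity).
        x_f, wae_loss, _ = generate_pseudo(x, a, w)
        g_opt.zero_grad()
        total = generator_adv_loss(D, x_f) + wae_loss + lambda2 * regression_loss(R, x, x_f, a)
        total.backward()
        g_opt.step()

    if (epoch + 1) % 10 == 0:      # save model parameters every 10 epochs
        torch.save({"G": G.state_dict(), "R": R.state_dict()}, f"checkpoint_{epoch + 1}.pt")
```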
Preferably, the present invention further includes step 7: the generative network trained in step 3 is used to generate visual features for the unseen-class samples, which are then classified in order to test the total objective function of step 6.
After the model is trained, in order to predict the labels of the unseen-class samples, new samples are first synthesized for each unseen class. These synthetic samples are then combined with the other samples in the training data, and any new classifier covering samples of both the seen and the unseen classes can be trained on this new data set.
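For illustration, the following sketch shows this final step: features are synthesized for the unseen classes with the trained generator, and a classifier is trained over the union of real seen-class features and synthetic unseen-class features. The `train_x`/`train_y`/`test_x` tensors, the `unseen_attrs` dictionary and the number of synthesized samples per class are assumptions; the 1-NN classifier matches the evaluation used in the comparison below.

```python
import torch
from sklearn.neighbors import KNeighborsClassifier

@torch.no_grad()
def synthesize_unseen(G, unseen_attrs, n_per_class=300):
    """unseen_attrs: dict mapping an unseen class label to its attribute vector."""
    feats, labels = [], []
    for label, a in unseen_attrs.items():
        a_rep = a.unsqueeze(0).repeat(n_per_class, 1)
        z = torch.randn(n_per_class, NOISE_DIM)
        feats.append(G(torch.cat([a_rep, z], dim=1)))
        labels.extend([label] * n_per_class)
    return torch.cat(feats), torch.tensor(labels)

# Combine synthetic unseen-class features with the real seen-class training features,
# then fit any classifier on the union; here a simple 1-NN classifier is used.
syn_x, syn_y = synthesize_unseen(G, unseen_attrs)
all_x = torch.cat([train_x, syn_x]).numpy()
all_y = torch.cat([train_y, syn_y]).numpy()
clf = KNeighborsClassifier(n_neighbors=1).fit(all_x, all_y)
predicted_labels = clf.predict(test_x.numpy())
```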
Further, the present invention is compared with 15 methods, including: SSE, LATEM, ALE, DEVISE, SJE, ESZSL, SYNC, SAE, DEM, RelationNet, PSR-ZSL, SP-AEN, CAPD, CVAE and GDAN.
For a fair comparison with the other baselines, a simple 1-NN classifier is applied for testing. GDFN is compared with the state-of-the-art generalized zero-shot learning methods, and the results are shown in Table 2 below.
Table 2: Results of generalized zero-shot learning methods on four benchmark data sets
It can be seen from the results that the present invention achieves good results on the generalized zero-shot learning data sets.
For the CUB data set, the present invention achieves good results on the unseen classes and the highest accuracy on the seen classes. It also performs well in terms of the harmonic mean, which again shows that it keeps a good balance of predictions between the seen and unseen classes, and it shows better performance compared with previous models.
For the AwA2 data set, the present invention performs better than recent methods (for example SP-AEN and PSR-ZSL) in terms of unseen-class accuracy and harmonic mean, and also shows higher accuracy on the seen-class samples.
For the SUN data set, the present invention recognizes both seen-class and unseen-class samples with high accuracy, and shows a clear improvement in the recognition and classification of seen-class samples.
For the aPY data set, the similarity between the attribute variances of the disjoint training images and test images is much smaller than for the other data sets, which indicates that it is difficult to synthesize and classify the unseen classes. Although the prior art has relatively low accuracy for unseen-class recognition, testing the present invention on this data set still yields good results. The prior art achieves high accuracy on the seen-class samples; the present invention also achieves high accuracy on the seen-class samples while striking a balance between the seen and unseen classes, providing a good harmonic-mean accuracy on aPY.
In summary, through knowledge transfer the present invention fuses two kinds of semantic features, attributes and word vectors, trains under an adversarial mechanism to minimize the distribution difference between real and generated samples, and maps visual features back to semantic features through a regression network. This effectively alleviates the problem of the model's prediction results shifting toward neighboring classes, allows hard-to-label examples to be recognized, and reduces the cost of recognition.
The above embodiments are only intended to illustrate, not limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010750578.4A CN111914929B (en) | 2020-07-30 | 2020-07-30 | Zero sample learning method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010750578.4A CN111914929B (en) | 2020-07-30 | 2020-07-30 | Zero sample learning method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111914929A CN111914929A (en) | 2020-11-10 |
CN111914929B true CN111914929B (en) | 2022-08-23 |
Family
ID=73286794
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010750578.4A Active CN111914929B (en) | 2020-07-30 | 2020-07-30 | Zero sample learning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111914929B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113191381B (en) * | 2020-12-04 | 2022-10-11 | 云南大学 | Image zero-order classification model based on cross knowledge and classification method thereof |
CN112580722B (en) * | 2020-12-20 | 2024-06-14 | 大连理工大学人工智能大连研究院 | Generalized zero sample image recognition method based on conditional countermeasure automatic encoder |
CN112674709B (en) * | 2020-12-22 | 2022-07-29 | 泉州装备制造研究所 | A method for amblyopia detection based on anti-noise |
CN112766386B (en) * | 2021-01-25 | 2022-09-20 | 大连理工大学 | A generalized zero-shot learning method based on multi-input multi-output fusion network |
CN113222002B (en) * | 2021-05-07 | 2024-04-05 | 西安交通大学 | Zero sample classification method based on generative discriminative contrast optimization |
CN113269274B (en) * | 2021-06-18 | 2022-04-19 | 南昌航空大学 | Zero sample identification method and system based on cycle consistency |
CN113378959B (en) * | 2021-06-24 | 2022-03-15 | 中国矿业大学 | Zero sample learning method for generating countermeasure network based on semantic error correction |
CN113723106B (en) * | 2021-07-29 | 2024-03-12 | 北京工业大学 | Zero sample text classification method based on label extension |
CN114266307B (en) * | 2021-12-21 | 2024-08-09 | 复旦大学 | Method for identifying noise samples in parallel based on non-zero mean shift parameters |
CN114842398B (en) * | 2022-05-23 | 2024-12-17 | 北京邮电大学 | Video action recognition method based on zero sample learning |
CN115424262A (en) * | 2022-08-04 | 2022-12-02 | 暨南大学 | Method for optimizing zero sample learning |
CN116051909B (en) * | 2023-03-06 | 2023-06-16 | 中国科学技术大学 | Direct push zero-order learning unseen picture classification method, device and medium |
CN117893743B (en) * | 2024-03-18 | 2024-05-31 | 山东军地信息技术集团有限公司 | Zero sample target detection method based on channel weighting and double-comparison learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679556A (en) * | 2017-09-18 | 2018-02-09 | 天津大学 | The zero sample image sorting technique based on variation autocoder |
CN109492662A (en) * | 2018-09-27 | 2019-03-19 | 天津大学 | A kind of zero sample classification method based on confrontation self-encoding encoder model |
CN110175251A (en) * | 2019-05-25 | 2019-08-27 | 西安电子科技大学 | The zero sample Sketch Searching method based on semantic confrontation network |
- 2020
- 2020-07-30 CN CN202010750578.4A patent/CN111914929B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679556A (en) * | 2017-09-18 | 2018-02-09 | 天津大学 | The zero sample image sorting technique based on variation autocoder |
CN109492662A (en) * | 2018-09-27 | 2019-03-19 | 天津大学 | A kind of zero sample classification method based on confrontation self-encoding encoder model |
CN110175251A (en) * | 2019-05-25 | 2019-08-27 | 西安电子科技大学 | The zero sample Sketch Searching method based on semantic confrontation network |
Non-Patent Citations (1)
Title |
---|
Zero-shot image recognition (零样本图像识别); Lan Hong (兰红) et al.; Journal of Electronics & Information Technology (《电子与信息学报》), Issue 05; full text *
Also Published As
Publication number | Publication date |
---|---|
CN111914929A (en) | 2020-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111914929B (en) | Zero sample learning method | |
CN111581405B (en) | Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning | |
CN111476294B (en) | A zero-sample image recognition method and system based on generative adversarial network | |
CN111368896B (en) | Hyperspectral Remote Sensing Image Classification Method Based on Dense Residual 3D Convolutional Neural Network | |
CN108875818B (en) | A zero-shot image classification method based on the combination of variational autoencoder and adversarial network | |
CN101315663B (en) | A Natural Scene Image Classification Method Based on Regional Latent Semantic Features | |
Gao et al. | Multi‐dimensional data modelling of video image action recognition and motion capture in deep learning framework | |
Shang et al. | Are noisy sentences useless for distant supervised relation extraction? | |
CN109598279B (en) | Zero sample learning method based on self-coding countermeasure generation network | |
CN112949647B (en) | Three-dimensional scene description method and device, electronic equipment and storage medium | |
CN111324765A (en) | Fine-grained sketch image retrieval method based on depth cascade cross-modal correlation | |
CN115861995B (en) | Visual question-answering method and device, electronic equipment and storage medium | |
CN106529605A (en) | Image identification method of convolutional neural network model based on immunity theory | |
CN108765383A (en) | Video presentation method based on depth migration study | |
CN113837229B (en) | Knowledge-driven text-to-image generation method | |
CN111461067B (en) | A zero-sample remote sensing image scene recognition method based on prior knowledge mapping and correction | |
Tran et al. | Aggregating image and text quantized correlated components | |
CN113361646A (en) | Generalized zero sample image identification method and model based on semantic information retention | |
Li et al. | Bidirectional generative transductive zero-shot learning | |
CN114626461A (en) | A cross-domain object detection method based on domain adaptation | |
CN112380374B (en) | A zero-shot image classification method based on semantic augmentation | |
CN108170823A (en) | Hand-drawn interactive three-dimensional model retrieval method based on high-level semantic attribute understanding | |
CN116663539A (en) | Chinese entity and relationship joint extraction method and system based on RoBERTa and pointer network | |
Soysal et al. | An introduction to zero-shot learning: An essential review | |
CN113032601A (en) | Zero sample sketch retrieval method based on discriminant improvement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||