CN115170813A - Network supervision fine-grained image identification method based on partial label learning - Google Patents


Publication number
CN115170813A
Authority
CN
China
Prior art keywords
label
noise
sample
correlation
images
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210761418.9A
Other languages
Chinese (zh)
Inventor
魏秀参
许玉燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Application filed by Nanjing University of Science and Technology
Priority to CN202210761418.9A
Publication of CN115170813A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a network-supervised fine-grained image recognition method based on partial label learning, comprising the following steps: a pre-trained deep neural network model performs deep descriptor transformation to evaluate the positive correlation between web images, and the open-set label noise present in the data set is detected from the resulting correlation matrix and removed; a loss function drives the model toward a higher recall rate, so that the label set of each sample contains the label of the sample's true class as far as possible; and the true label of each sample is selected from its label set to correct the closed-set label noise, after which the clean data and the relabelled closed-set noise samples are fed into a deep neural network for training. The method effectively removes open-set label noise and, through partial label learning, converts closed-set label noise into accurately labelled training images, which increases the number of usable samples in the web data set and thereby improves the learning performance of the neural network model.

Description

Network-supervised fine-grained image recognition method based on partial label learning

Technical Field

The invention belongs to the field of network-supervised image recognition, and in particular relates to a network-supervised fine-grained image recognition method based on partial label learning.

Background

Building a fine-grained data set is difficult, because domain experts must classify images correctly on the basis of subtle differences between fine-grained subclasses. To reduce the dependence on manual annotation and to learn more practical models, it is increasingly popular to collect images of the relevant categories directly from the Internet, build a web data set from them, and train on it. Such web data sets, however, contain considerable noise, and training on them directly causes the model to overfit, which lowers accuracy. Fine-grained web data sets generally contain two kinds of noise: open-set label noise and closed-set label noise. Open-set label noise is usually "cross-domain": the noisy image does not belong to any class in the fine-grained domain. Closed-set label noise refers to images that belong to the fine-grained domain but carry the wrong label.

General approaches to label noise include sample selection, soft labels, and related loss functions. Although these approaches already classify well, (1) they risk discarding some clean images, and (2) they make poor use of noisy images, since they cannot convert closed-set label-noise images into accurately labelled training images.

Summary of the Invention

The purpose of the present invention is to provide a network-supervised fine-grained image recognition method based on partial label learning.

The technical solution that achieves this purpose is as follows. In a first aspect, the invention provides a network-supervised fine-grained image recognition method based on partial label learning, comprising the following steps:

Step 1: perform deep descriptor transformation with a pre-trained deep neural network model to evaluate the positive correlation between web images, detect the open-set label noise present in the data set from the correlation matrix that reflects this correlation, and remove the open-set label noise;

Step 2: use a loss function to drive the model so that the label set of each sample contains the label of the sample's true class as far as possible;

Step 3: following the idea of partial label learning, select the true label of each sample from its label set, thereby correcting the closed-set label noise.

In a second aspect, the invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method of the first aspect when executing the program.

In a third aspect, the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the first aspect.

In a fourth aspect, the invention provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of the first aspect.

Compared with the prior art, the invention has the following notable advantages: (1) it proposes an open-set label noise removal strategy and a closed-set label noise correction strategy for the practical but challenging task of network-supervised fine-grained recognition; (2) it uses a pre-trained deep model to perform deep descriptor transformation, estimates the positive correlation between web images, and detects and removes open-set label noise effectively on the basis of the correlation values; (3) it builds a candidate label set for every sample and uses a loss function to drive the model toward a high recall rate, so that the true label of each sample is present in its candidate label set as often as possible, which safeguards the performance of partial label learning; (4) it uses partial label learning to convert closed-set label noise into accurately labelled training images, so that correcting the closed-set label noise also turns it into training data, which increases the number of usable samples in the web data set and improves the learning performance of the neural network model.

Brief Description of the Drawings

Fig. 1 is a flowchart of the network-supervised fine-grained image recognition method based on partial label learning according to the invention.

Detailed Description

With reference to Fig. 1, a network-supervised fine-grained image recognition method based on partial label learning comprises the following steps:

Step 1: perform deep descriptor transformation with a pre-trained deep neural network model to evaluate the positive correlation between web images, detect the open-set label noise present in the data set from the correlation matrix that reflects this correlation, and remove the open-set label noise.

Let [formula] be the label space and [formula] be the sample space. For each random batch [formula] of $n$ images, a pre-trained convolutional neural network $\Phi_{\mathrm{pre}}$ extracts a feature-map set [formula] containing $n$ feature maps:

[formula image]

where $H$, $W$ and $d$ denote the height, width and depth of the feature map $t_i$, respectively. To find a more general high-level feature representation over the feature-map set [formula], principal component analysis is used to obtain the eigenvector [formula] with the largest eigenvalue. For the given feature-map set, each feature map is combined with the eigenvector $p$ by a channel-wise weighted sum to obtain a heat map, and the heat maps form the heat-map set [formula]. The $i$-th heat map $H_i$ in [formula], which corresponds to the $i$-th feature map $t_i$, is computed as:

[formula image]

where [formula]. Each heat map is then upsampled to the input image size to obtain the correlation matrix $C$. The correlation matrix consists of correlation values: a positive value indicates a positive correlation with the general representation present in [formula], a negative value indicates a negative correlation, and the larger the absolute value, the stronger the correlation. The number and magnitude of the positive values in the correlation matrix indicate reliably whether a sample is open-set label noise. The invention therefore sets a threshold $\delta$ and tests every sample; the $i$-th sample is tested with the condition:

[formula image]

A sample that does not satisfy this condition is regarded as open-set label noise and is removed from the sample space, which yields a sample space [formula] composed of clean data and closed-set label noise.
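
Step 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the heat-map and threshold formulas are not reproduced in this text, so the sketch projects mean-centred descriptors onto the first principal direction and keeps a sample when the fraction of positive heat-map values reaches `delta`. The function names, the sign convention for the eigenvector, and the fraction-based criterion are all hypothetical, and the upsampling step is omitted.

```python
import numpy as np


def ddt_heatmaps(feature_maps: np.ndarray) -> np.ndarray:
    """Project every d-dimensional descriptor onto the first principal direction.

    feature_maps has shape (n, H, W, d); the result has shape (n, H, W).
    """
    n, h, w, d = feature_maps.shape
    descriptors = feature_maps.reshape(-1, d)
    centered = descriptors - descriptors.mean(axis=0)
    cov = centered.T @ centered / centered.shape[0]
    # eigh returns eigenvalues in ascending order, so the last column is the
    # eigenvector with the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(cov)
    p = eigvecs[:, -1]
    # Assumption: resolve the sign ambiguity of PCA by making the component
    # with the largest magnitude positive.
    if p[np.argmax(np.abs(p))] < 0:
        p = -p
    return (centered @ p).reshape(n, h, w)


def keep_mask(heatmaps: np.ndarray, delta: float) -> np.ndarray:
    """Assumed criterion: keep a sample if at least `delta` of its values are positive."""
    positive_fraction = (heatmaps > 0).mean(axis=(1, 2))
    return positive_fraction >= delta
```

Samples whose mask entry is `False` would be treated as open-set label noise and dropped before step 2.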

Step 2: use a loss function to drive the model toward a higher recall rate, so that the label set of each sample contains the label of the sample's true class as far as possible.

The invention defines the label space as [formula] and the sample space as [formula], where [formula] denotes the set of instances whose label is the $i$-th class $y_i$. In the sample-selection stage at the start of training, $C$ classes are first selected at random to generate a batch [formula]. For each selected class $y_i$, $n^*$ samples of that class are selected at random. Based on the batch [formula], with $a = n^* \times C$, the embedded features [formula] are obtained through a convolutional neural network $\Phi_{\mathrm{CNN}}$. For a sample [formula] in [formula], the embedded feature $f_i$ is obtained through $\Phi_{\mathrm{CNN}}$:

[formula image]

where $c$ is the length of the embedded feature $f_i$. A similarity matrix [formula] is obtained by computing the cosine similarity between the embedded features, where the cosine similarity between the $i$-th query image and the $j$-th support image is computed as:

[formula image]
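
The batch construction and the similarity matrix described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the embedding network is left out (any array of shape `(a, c)` stands in for its output), and the cosine similarity is the standard normalised dot product, which is assumed to match the formula not reproduced here.

```python
import numpy as np


def sample_balanced_batch(labels, num_classes, per_class, rng):
    """Pick `num_classes` classes at random, then `per_class` sample indices from each."""
    classes = rng.choice(np.unique(labels), size=num_classes, replace=False)
    picks = [
        rng.choice(np.flatnonzero(labels == c), size=per_class, replace=False)
        for c in classes
    ]
    return np.concatenate(picks)


def cosine_similarity_matrix(features):
    """features has shape (a, c); the result is the (a, a) cosine-similarity matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T
```

With $C$ classes and $n^*$ samples per class, `sample_balanced_batch` returns $a = n^* \times C$ indices, matching the batch size stated in the text.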

For each query image, the similarities $s_{q,:}$ to the other images are sorted, and the $k$ most similar images are placed in the set [formula]. The invention calls images that belong to the same class as the query image but are not in the set [formula] positive images, and images that are in the set [formula] but do not belong to the same class as the query image negative images. The positive images outside the set [formula] form the set [formula], where [formula] denotes the complement of [formula] in [formula], and $y_q$ is the label of the query image. The invention sets $n^* < k$, which gives the set of negative images [formula], where [formula] is the number of images outside the set [formula] that belong to the same class as the query image, and $s_n$ is a matrix containing only the similarity scores of the negative images. The loss function is therefore defined as

[formula image]

Applying this loss function ensures, as far as possible, that every image produces a label set containing its true class.
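
The positive and negative sets that the loss operates on can be built as in the sketch below. It covers only the set construction; the loss itself is not reproduced in this text and is therefore not implemented. The function name and return layout are hypothetical.

```python
import numpy as np


def topk_sets(sim, labels, q, k):
    """Return (top_k, positives_outside, negatives_inside) index arrays for query q.

    sim is the (a, a) similarity matrix and labels holds the a class labels.
    """
    others = np.array([j for j in range(len(labels)) if j != q])
    # Sort the other images by decreasing similarity to the query.
    order = others[np.argsort(-sim[q, others], kind="stable")]
    top_k, outside = order[:k], order[k:]
    # Positive images: same class as the query but missing from the top-k set.
    positives = outside[labels[outside] == labels[q]]
    # Negative images: inside the top-k set but of a different class.
    negatives = top_k[labels[top_k] != labels[q]]
    return top_k, positives, negatives
```

Because $n^* < k$, the top-$k$ set always contains at least one image of another class, so the negative set is never empty.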

Step 3: following the idea of partial label learning, select the true label of each sample from its label set, thereby correcting the closed-set label noise.

After step 2 has produced label sets [formula] that contain the true class as often as possible, one true label must be determined from the label set of each closed-set label-noise sample. In the encoding stage, a coding matrix $M \in \{+1,-1\}^{N \times L}$ is built by randomly generating $N$-bit column codes, where $N$ is the number of classes and $L$ is the number of binary classifiers; the coding matrix is used to partition the samples during training. A randomly generated column code $v = [v_1, v_2, \ldots, v_N]^{\mathsf T} \in \{+1,-1\}^N$ divides the label space into a positive label space [formula] and a negative label space [formula]:

[formula image]

[formula image]

Positive and negative samples are selected using the positive and negative label spaces. Given a training sample [formula], where [formula], the invention treats the label set [formula] as a whole when building a binary classifier: the sample [formula] is used as a positive or negative sample only when all classes in its label set [formula] fall into [formula] or [formula], respectively. These positive and negative samples form the binary training set [formula].
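
The encoding stage can be sketched as follows. This is a hedged illustration: the function names are hypothetical, classes are assumed to be indexed `0..N-1`, and the rule that a column must place at least one class on each side is an assumption added here so that every binary problem is non-trivial; the text does not state an acceptance rule for columns.

```python
import numpy as np


def random_column_code(num_classes, rng):
    """Draw a random column in {+1, -1}^N with at least one class on each side (assumed rule)."""
    while True:
        v = rng.choice([-1, 1], size=num_classes)
        if 0 < np.sum(v == 1) < num_classes:
            return v


def binary_training_set(candidate_sets, v):
    """Return (sample_index, +1 or -1) pairs for one column code v.

    A sample is used only when its whole candidate label set falls on one
    side of the partition induced by v; otherwise it is skipped.
    """
    pairs = []
    for i, candidates in enumerate(candidate_sets):
        signs = {int(v[c]) for c in candidates}
        if len(signs) == 1:
            pairs.append((i, signs.pop()))
    return pairs
```

Repeating this for $L$ columns yields the coding matrix $M$ and one binary training set per classifier.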

In the decoding stage, a connected set is constructed for every class; the connected set of the $j$-th class can be expressed as:

[formula image]

From the connected sets $\varepsilon_y$, a performance matrix $G_{N \times L}$ is generated to reflect the capability of the classifiers. The performance of the $j$-th class on the $t$-th classifier $g_t$ is computed as:

[formula image]

where [formula] and [formula] is the indicator function. To obtain the relative performance of the classifiers on each class, the performance matrix $G$ is normalized row by row:

[formula image]

where [formula]. For a closed-set label-noise sample [formula], the class prediction is obtained by:

[formula image]

Finally, each closed-set label-noise sample receives the pseudo-label [formula]. The clean samples and the pseudo-labelled closed-set label-noise samples are merged and fed into the convolutional neural network for training.
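
The decoding stage can be illustrated with the sketch below. Every formula in this stage is missing from the text, so each concrete choice here is an assumption made for illustration rather than the patented definition: the connected set of class $j$ is taken to be the samples whose candidate set contains $j$, the performance entry is the fraction of those samples on which the sign of classifier $g_t$ agrees with the code bit, rows are normalized to sum to one, and the prediction is the candidate class with the highest performance-weighted agreement. All function names are hypothetical.

```python
import numpy as np


def performance_matrix(M, outputs, candidate_sets):
    """Row-normalised (N, L) performance matrix under the assumptions stated above.

    M is the (N, L) coding matrix; outputs is (m, L) real-valued classifier outputs.
    """
    num_classes, num_classifiers = M.shape
    G = np.zeros((num_classes, num_classifiers))
    for j in range(num_classes):
        # Assumed connected set: samples whose candidate labels include class j.
        members = [i for i, cands in enumerate(candidate_sets) if j in cands]
        if members:
            agree = np.sign(outputs[members]) == M[j]
            G[j] = agree.mean(axis=0)
    row_sums = G.sum(axis=1, keepdims=True)
    return np.divide(G, row_sums, out=np.zeros_like(G), where=row_sums > 0)


def decode(M, G_norm, output, candidates):
    """Pick the candidate class with the largest weighted agreement score."""
    cands = sorted(candidates)
    scores = [float((G_norm[j] * M[j] * output).sum()) for j in cands]
    return cands[int(np.argmax(scores))]
```

The class returned by `decode` would serve as the pseudo-label of a closed-set noise sample before the merged data is used for training.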

The above embodiments only illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (7)

1. A network-supervised fine-grained image recognition method based on partial label learning, characterized by comprising the following steps:

Step 1: perform deep descriptor transformation with a pre-trained deep neural network model to evaluate the positive correlation between web images, detect the open-set label noise present in the data set from the correlation matrix that reflects this correlation, and remove the open-set label noise;

Step 2: use a loss function to drive the model so that the label set of each sample contains the label of the sample's true class as far as possible;

Step 3: following the idea of partial label learning, select the true label of each sample from its label set, thereby correcting the closed-set label noise.

2. The network-supervised fine-grained image recognition method based on partial label learning according to claim 1, characterized in that step 1 performs deep descriptor transformation with a pre-trained deep neural network model to evaluate the positive correlation between web images, detects the open-set label noise present in the data set from the correlation matrix that reflects this correlation, and removes the open-set label noise;

let [formula] be the label space and [formula] be the sample space; for each random batch [formula] of $n$ images, a pre-trained convolutional neural network $\Phi_{\mathrm{pre}}$ extracts a feature-map set [formula] containing $n$ feature maps:

[formula image]

where $H$, $W$ and $d$ denote the height, width and depth of the feature map $t_i$, respectively; principal component analysis is used to obtain the eigenvector [formula] with the largest eigenvalue; for the given feature-map set, each feature map is combined with the eigenvector $p$ by a channel-wise weighted sum to obtain a heat map, and the heat maps form the heat-map set [formula]; the $i$-th heat map $H_i$ in [formula], which corresponds to the $i$-th feature map $t_i$, is computed as:

[formula image]

where [formula]; each heat map is then upsampled to the input image size to obtain the correlation matrix $C$; the correlation matrix consists of correlation values, where a positive value indicates a positive correlation with the general representation present in [formula], a negative value indicates a negative correlation, and the larger the absolute value, the stronger the correlation; whether a sample is open-set label noise is judged from the number and magnitude of the positive values in the correlation matrix, so a threshold $\delta$ is set and every sample is tested, the $i$-th sample being tested with the condition:

[formula image]

a sample that does not satisfy this condition is regarded as open-set label noise and is removed from the sample space, which yields a sample space [formula] composed of clean data and closed-set label noise.
3. The network-supervised fine-grained image recognition method based on partial label learning according to claim 2, characterized in that step 2 uses a loss function to drive the model toward a higher recall rate;

the label space is defined as [formula] and the sample space as [formula], where [formula] denotes the set of instances whose label is the $i$-th class $y_i$; in the sample-selection stage at the start of training, $C$ classes are first selected at random to generate a batch [formula]; for each selected class $y_i$, $n^*$ samples of that class are selected at random; based on the batch [formula], the embedded features [formula] are obtained through a convolutional neural network $\Phi_{\mathrm{CNN}}$; for a sample [formula] in [formula], the embedded feature $f_i$ is obtained through $\Phi_{\mathrm{CNN}}$:

[formula image]

where $c$ is the length of the embedded feature $f_i$; a similarity matrix [formula] is obtained by computing the cosine similarity between the embedded features, where the cosine similarity between the $i$-th query image and the $j$-th support image is computed as:

[formula image]

for each query image, the similarities $s_{q,:}$ to the other images are sorted, and the $k$ most similar images are placed in the set [formula]; images that belong to the same class as the query image but are not in the set [formula] are called positive images, and images that are in the set [formula] but do not belong to the same class as the query image are called negative images; the positive images outside the set [formula] form the set [formula], where [formula] denotes the complement of [formula] in [formula], and $y_q$ is the label of the query image; with $n^* < k$, the set of negative images [formula] is obtained, where [formula] is the number of images outside the set [formula] that belong to the same class as the query image, and $s_n$ is a matrix containing only the similarity scores of the negative images; the loss function is therefore defined as

[formula image]

applying this loss function ensures, as far as possible, that every image produces a label set containing its true class.
4.根据权利要求3所述的基于偏标签学习的网络监督细粒度图像识别方法,其特征在于,在步骤3中,利用偏标签学习从通过步骤2获得的标签集S中确定一个真实的标签;4. The network-supervised fine-grained image recognition method based on partial label learning according to claim 3, wherein in step 3, a real label is determined from the label set S obtained by step 2 using partial label learning ; 在编码阶段,通过随机生成N位的列编码来构建编码矩阵M∈{+1,-1}N×L,其中N表示类别的数量,L表示二分类器的数量,编码矩阵用于对训练过程中的样本进行划分;一随机生成的列编码
Figure FDA00037241466200000225
可将标签空间划分正标签空间
Figure FDA00037241466200000319
和负标签空间
Figure FDA00037241466200000320
In the encoding stage, an encoding matrix M∈{+1,-1} N×L is constructed by randomly generating N-bit column encodings, where N represents the number of classes, L represents the number of binary classifiers, and the encoding matrix is used for training Process samples are divided; a randomly generated column encoding
Figure FDA00037241466200000225
The label space can be divided into positive label space
Figure FDA00037241466200000319
and negative label space
Figure FDA00037241466200000320
Figure FDA0003724146620000031
Figure FDA0003724146620000031
Figure FDA0003724146620000032
Figure FDA0003724146620000032
The positive and negative label spaces are used to select positive and negative samples. Given a training sample
Figure FDA00037241466200000316
where
Figure FDA00037241466200000315
the label set
Figure FDA00037241466200000318
is treated as a whole to help build the binary classifier; only when all classes in the label set
Figure FDA00037241466200000317
fall into
Figure FDA00037241466200000314
or
Figure FDA00037241466200000312
is the sample
Figure FDA00037241466200000313
used as a positive or negative sample; these positive and negative samples then form the binary classification training set
Figure FDA00037241466200000311
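The encoding stage described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the formula images are not recoverable from this page, so the function name `build_coding_matrix`, the `min_samples` acceptance rule, and the retry limit `max_tries` are all hypothetical. The only behaviour taken from the claim is that each column is a random N-bit code and that a sample is used only when its whole label set falls inside the positive or the negative label space.

```python
import numpy as np

def build_coding_matrix(candidate_sets, num_classes, num_classifiers,
                        min_samples=1, max_tries=10000, seed=0):
    """Draw random N-bit column codes and build one binary training set per column.

    candidate_sets: list of non-empty sets of class indices (one label set per sample).
    Returns M with shape (N, L) and a list of (sample_indices, binary_labels) pairs.
    """
    rng = np.random.default_rng(seed)
    # member[i, c] is True when class c belongs to the label set of sample i.
    member = np.zeros((len(candidate_sets), num_classes), dtype=bool)
    for i, label_set in enumerate(candidate_sets):
        member[i, list(label_set)] = True

    columns, train_sets = [], []
    for _ in range(max_tries):
        if len(columns) == num_classifiers:
            break
        code = rng.choice([-1, 1], size=num_classes)  # random N-bit column code
        pos_space = code == 1
        # A sample is positive only if no class of its label set has bit -1,
        # and negative only if no class of its label set has bit +1.
        pos_idx = np.flatnonzero(~(member & ~pos_space).any(axis=1))
        neg_idx = np.flatnonzero(~(member & pos_space).any(axis=1))
        # Hypothetical acceptance rule: skip codes that leave one side empty.
        if len(pos_idx) < min_samples or len(neg_idx) < min_samples:
            continue
        idx = np.concatenate([pos_idx, neg_idx])
        labels = np.concatenate([np.ones(len(pos_idx), dtype=int),
                                 -np.ones(len(neg_idx), dtype=int)])
        columns.append(code)
        train_sets.append((idx, labels))

    if len(columns) < num_classifiers:
        raise RuntimeError("could not draw enough usable column codes")
    return np.stack(columns, axis=1), train_sets
```

Samples whose label set straddles both spaces are simply left out of that column's training set, which is why no single label ever has to be picked from the set during encoding.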
In the decoding stage, a connected set is constructed for each class; the connected set of the j-th class is expressed as:
Figure FDA0003724146620000033
From the connected set ε_y, a performance matrix G of size N×L is generated to reflect the capability of the classifiers. The performance of the j-th class on the t-th classifier g_t is computed as follows:
Figure FDA0003724146620000034
where
Figure FDA00037241466200000310
Figure FDA00037241466200000321
is the indicator function. The performance matrix G is then normalized row by row:
Figure FDA0003724146620000035
where
Figure FDA0003724146620000038
For a closed-set label-noise sample
Figure FDA0003724146620000039
the class prediction is obtained by:
Figure FDA0003724146620000036
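The decoding stage can be illustrated with the sketch below. Because the formulas are only available as images here, every concrete form is an assumption: the connected set of class j is taken to be the training samples whose label set contains j, the performance entry is taken to be the fraction of those samples on which classifier t outputs the code bit M[j, t], and the prediction is taken to be the class with the largest performance-weighted agreement between its code word and the classifier outputs. The function names and the optional `candidates` restriction are hypothetical.

```python
import numpy as np

def performance_matrix(M, candidate_sets, outputs):
    """Row-normalized performance matrix (assumed form, see lead-in).

    M: coding matrix of shape (N, L) with entries in {-1, +1}.
    outputs: array of shape (num_samples, L), outputs[i, t] in {-1, +1}.
    """
    num_classes, num_classifiers = M.shape
    G = np.zeros((num_classes, num_classifiers))
    for j in range(num_classes):
        # Assumed connected set: samples whose label set contains class j.
        connected = [i for i, s in enumerate(candidate_sets) if j in s]
        if connected:
            # Indicator average: how often classifier t agrees with bit M[j, t].
            G[j] = (outputs[connected] == M[j]).mean(axis=0)
    row_sum = G.sum(axis=1, keepdims=True)
    # Row-wise normalization; all-zero rows are left at zero.
    return np.divide(G, row_sum, out=np.zeros_like(G), where=row_sum > 0)

def predict(M, G_norm, scores, candidates=None):
    """Pick the class whose code word best matches the classifier scores.

    scores: array of shape (L,), signed outputs of the L binary classifiers.
    candidates: optional set of class indices to restrict the prediction to
                (an assumption, not stated in the claim text).
    """
    votes = (G_norm * M * scores).sum(axis=1)
    if candidates is not None:
        mask = np.full(len(votes), -np.inf)
        mask[list(candidates)] = 0.0
        votes = votes + mask
    return int(np.argmax(votes))
```

The index returned by `predict` would play the role of the pseudo-label assigned to a closed-set label-noise sample.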
Finally, the closed-set label-noise samples obtain pseudo-labels
Figure FDA0003724146620000037
The clean samples and the pseudo-labeled closed-set label-noise samples are merged and fed into a convolutional neural network for training.
5. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1-4.
6. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-4.
7. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-4.
CN202210761418.9A 2022-06-30 2022-06-30 Network supervision fine-grained image identification method based on partial label learning Pending CN115170813A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210761418.9A CN115170813A (en) 2022-06-30 2022-06-30 Network supervision fine-grained image identification method based on partial label learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210761418.9A CN115170813A (en) 2022-06-30 2022-06-30 Network supervision fine-grained image identification method based on partial label learning

Publications (1)

Publication Number Publication Date
CN115170813A true CN115170813A (en) 2022-10-11

Family

ID=83489216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210761418.9A Pending CN115170813A (en) 2022-06-30 2022-06-30 Network supervision fine-grained image identification method based on partial label learning

Country Status (1)

Country Link
CN (1) CN115170813A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115564960A (en) * 2022-11-10 2023-01-03 南京码极客科技有限公司 Network image label denoising method combining sample selection and label correction
CN118552756A (en) * 2024-03-22 2024-08-27 杭州电子科技大学 Method for unsupervised detection of generated image based on visual language model

Similar Documents

Publication Publication Date Title
CN111581405B (en) Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning
CN107515895B (en) A visual target retrieval method and system based on target detection
CN112115995A (en) A semi-supervised learning based image multi-label classification method
US8254699B1 (en) Automatic large scale video object recognition
CN112966691A (en) Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN113139664B (en) A cross-modal transfer learning method
CN113076465A (en) Universal cross-modal retrieval model based on deep hash
CN111079847A (en) Remote sensing image automatic labeling method based on deep learning
CN114358188A (en) Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment
CN112559781B (en) Image retrieval system and method
CN115712740B (en) Method and system for multimodal entailment enhanced image text retrieval
CN110647907A (en) Multi-label image classification algorithm using multi-layer classification and dictionary learning
CN115170813A (en) Network supervision fine-grained image identification method based on partial label learning
CN114926742B (en) A loop detection and optimization method based on second-order attention mechanism
CN111191033A (en) Open set classification method based on classification utility
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
CN114722892A (en) Continuous learning method and device based on machine learning
CN116737877A (en) Cross-modal retrieval method and device based on attention network against hashing
CN117557886A (en) Noise-containing tag image recognition method and system integrating bias tags and passive learning
CN116863250A (en) An open scene target detection method involving multi-modal unknown class recognition
CN116681128A (en) A neural network model training method and device for noisy multi-label data
CN114692750B (en) A fine-grained image classification method, device, electronic device and storage medium
CN118628813A (en) Passive domain adaptive image recognition method based on transferable semantic knowledge
CN115712751B (en) Cross-domain person search method based on text description
CN117437426A (en) A semi-supervised semantic segmentation method guided by high-density representative prototypes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Xu Yuyan

Inventor after: Wei Xiucan

Inventor before: Wei Xiucan

Inventor before: Xu Yuyan