CN115170813A - Network supervision fine-grained image identification method based on partial label learning - Google Patents
- Publication number
- CN115170813A
- Authority
- CN
- China
- Prior art keywords
- label
- noise
- sample
- correlation
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Image Analysis (AREA)
Description
Technical Field
The invention belongs to the field of web-supervised image recognition and specifically relates to a web-supervised fine-grained image recognition method based on partial label learning.
Background
Building a fine-grained dataset is difficult because it requires domain experts to classify samples correctly on the basis of subtle differences between fine-grained subclasses. To reduce the dependence on manual annotation and to learn more practical models, it has become increasingly popular to collect images of the relevant categories directly from the Internet, build a web dataset from them, and train on it. Such web datasets, however, contain considerable noise, and training on them directly causes the model to overfit, which lowers accuracy. Fine-grained web datasets generally contain two kinds of noise: open-set label noise and closed-set label noise. Open-set label noise is usually caused by "cross-domain" images, that is, noisy images that do not belong to any class of the fine-grained domain. Closed-set label noise refers to images that belong to the fine-grained domain but carry a wrong label.
Methods for handling general label noise include sample selection, soft labels, and related loss functions. Although these methods already classify well, they have two problems: (1) they risk discarding some clean images, and (2) when using noisy images, they still cannot convert closed-set label noise images into accurately labeled training images.
Summary of the Invention
The purpose of the invention is to provide a web-supervised fine-grained image recognition method based on partial label learning.
The technical solution that achieves this purpose is as follows. In a first aspect, the invention provides a web-supervised fine-grained image recognition method based on partial label learning, comprising the following steps:
Step 1: use a pre-trained deep neural network model to perform deep descriptor transformation to evaluate the positive correlation between web images; detect the open-set label noise present in the dataset according to the correlation matrix that reflects this correlation; and remove the open-set label noise.
Step 2: use a loss function to drive the model so that the label set of each sample contains the label of the sample's true class as far as possible.
Step 3: following the idea of partial label learning, select the true label of each sample from its label set, thereby correcting the closed-set label noise.
In a second aspect, the invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the method of the first aspect when it executes the program.
In a third aspect, the invention provides a computer-readable storage medium on which a computer program is stored; the program implements the steps of the method of the first aspect when executed by a processor.
In a fourth aspect, the invention provides a computer program product comprising a computer program that implements the steps of the method of the first aspect when executed by a processor.
Compared with the prior art, the invention has the following notable advantages: (1) it proposes an open-set label noise removal strategy and a closed-set label noise correction strategy to handle the practical but challenging task of web-supervised fine-grained recognition; (2) it uses a pre-trained deep model to perform deep descriptor transformation to estimate the positive correlation between web images, and detects and removes open-set label noise effectively according to the correlation values; (3) it builds the candidate label set of each sample itself and uses a loss function to drive the model toward a high recall, so that the true label of each sample is in its candidate label set as often as possible, which safeguards the performance of partial label learning; (4) it uses partial label learning to convert closed-set label noise into training images with accurate labels, so that correcting the closed-set label noise also turns it into training data, which increases the number of usable samples in the web dataset and improves the learning performance of the neural network model.
Brief Description of the Drawings
FIG. 1 is a flowchart of the web-supervised fine-grained image recognition method based on partial label learning according to the invention.
Detailed Description
With reference to FIG. 1, a web-supervised fine-grained image recognition method based on partial label learning comprises the following steps:
Step 1: use a pre-trained deep neural network model to perform deep descriptor transformation to evaluate the positive correlation between web images; detect the open-set label noise present in the dataset according to the correlation matrix that reflects this correlation; and remove the open-set label noise.
Let a label space and a sample space be given. For each random batch of n images, a pre-trained convolutional neural network Φ_pre extracts a set of n feature maps, one per image.
Here H, W, and d denote the height, width, and depth of feature map t_i. To find a more general high-level feature representation over the feature map set, principal component analysis is used to obtain the eigenvector p with the largest eigenvalue. For the given feature map set, each feature map is combined with p by a channel-weighted sum to obtain a heat map, and the heat maps together form a heat map set. The i-th heat map H_i in this set corresponds to the i-th feature map t_i.
Each heat map is then upsampled to the size of the input image to obtain the correlation matrix C. The correlation matrix consists of correlation values: a positive value indicates a positive correlation with the general representation found in the feature map set, a negative value indicates a negative correlation, and the larger the absolute value, the stronger the correlation. The number and magnitude of the positive values in the correlation matrix indicate effectively whether a sample is open-set label noise. The invention therefore sets a threshold δ and tests each sample against it to decide whether the i-th sample is noise.
A sample that does not satisfy the threshold condition is regarded as open-set label noise and removed from the sample space, which leaves a sample space consisting of clean data and closed-set label noise.
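The open-set noise detection of step 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the exact threshold condition is not reproduced in the text, so the rule below (flag a sample when its share of positive correlation values falls below δ) is assumed, as are the mean-centering of descriptors, the sign convention for the eigenvector, and the function names. Upsampling to the input size is omitted.

```python
import numpy as np

def heat_maps_from_features(feature_maps):
    """Project every spatial descriptor onto the first principal direction.

    feature_maps: array of shape (n, H, W, d), one feature map per image.
    Returns (heat, p): heat has shape (n, H, W); p has shape (d,).
    """
    n, H, W, d = feature_maps.shape
    descriptors = feature_maps.reshape(-1, d).astype(float)
    centered = descriptors - descriptors.mean(axis=0)  # assumption: mean-centering
    cov = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    p = eigvecs[:, -1]                                  # eigenvector of the largest eigenvalue
    # Eigenvector signs are arbitrary; assumed convention: largest-magnitude entry positive.
    if p[np.argmax(np.abs(p))] < 0:
        p = -p
    heat = (centered @ p).reshape(n, H, W)              # channel-weighted sum per location
    return heat, p

def flag_open_set_noise(heat, delta):
    """Assumed rule: a sample is noise if its share of positive values is below delta."""
    positive_share = (heat > 0).mean(axis=(1, 2))
    return positive_share < delta
```

Because the sign of a principal direction is not determined by PCA itself, any real use of this idea needs some convention for which side counts as "positive"; the one used here is only one possible choice.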
Step 2: use a loss function to drive the model toward a high recall, so that the label set of each sample contains the label of the sample's true class as far as possible.
The invention defines a label space and a sample space in which the samples are grouped by class, each group being the set of instances labeled with the i-th class y_i. In the sample selection stage at the start of training, C classes are selected at random to generate a batch, and for each selected class y_i, n* samples of that class are selected at random. The batch therefore contains a = n* × C samples. For each sample in the batch, the convolutional neural network Φ_CNN produces an embedded feature f_i.
Here c is the length of the embedded feature f_i. Computing the cosine similarity between embedded features gives a similarity matrix, in which the entry for the i-th query image and the j-th support image is their cosine similarity.
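As an illustration of the similarity matrix, the sketch below computes pairwise cosine similarities for a batch of embedded features with NumPy. The function name and the small clipping constant that guards against zero-length vectors are assumptions made for the example.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """features: array of shape (a, c), one embedded feature per row.

    Returns an (a, a) matrix whose (i, j) entry is the cosine similarity
    between feature i (query) and feature j (support).
    """
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # avoid division by zero
    return unit @ unit.T
```

Normalizing the rows once and taking a single matrix product avoids computing each pairwise norm separately.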
For each query image, the similarities s_q,: to the other images are sorted, and the k images with the highest similarity are placed in a set (the top-k set). Images that belong to the same class as the query image but are not in this set are called positive images; images that are in the set but do not belong to the same class as the query image are called negative images. The positive images outside the top-k set form a set defined through the complement of the top-k set, where y_q is the label of the query image. The invention sets n* < k, so that a set of negative images can be obtained; its definition uses the number of images outside the top-k set that belong to the same class as the query image, and s_n is a matrix containing only the similarity scores of the negative images. The loss function is defined on these quantities.
Applying this loss function ensures, as far as possible, that the label set generated for each image contains its true class.
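The bookkeeping behind this step (the k most similar images, the positive images that were missed, and the negative images that were retrieved) can be sketched as follows. The loss function itself is not reproduced in the text, so the sketch stops short of it. Taking the candidate label set to be the labels of the k nearest images is an assumption, as are the function name and the returned dictionary layout.

```python
import numpy as np

def top_k_analysis(sim, labels, k):
    """sim: (a, a) similarity matrix; labels: length-a class labels.

    For each query q, returns a dict with:
      candidates       - labels of the k most similar other images (assumed label set)
      missed_positives - same-class images that are NOT among the top k
      negatives        - top-k images whose class differs from the query's
    """
    sim = np.asarray(sim, dtype=float)
    labels = np.asarray(labels)
    n = sim.shape[0]
    k = min(k, n - 1)                      # the query itself is never retrieved
    results = []
    for q in range(n):
        scores = sim[q].copy()
        scores[q] = -np.inf                # exclude the query from its own ranking
        top = np.argsort(-scores, kind="stable")[:k]
        in_top = np.zeros(n, dtype=bool)
        in_top[top] = True
        same = labels == labels[q]
        same[q] = False
        results.append({
            "candidates": set(labels[top].tolist()),
            "missed_positives": np.flatnonzero(same & ~in_top).tolist(),
            "negatives": np.flatnonzero(~same & in_top).tolist(),
        })
    return results
```

A recall-oriented loss would push the similarities of the missed positives above those of the retrieved negatives; how the patent weights these terms is not recoverable from the text.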
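Step 3: following the idea of partial label learning, select the true label of each sample from its label set, thereby correcting the closed-set label noise.
After step 2 has produced label sets that contain the true class as often as possible, a single true label must be determined from the label set of each closed-set label noise sample. In the encoding stage, a coding matrix M ∈ {+1, -1}^(N×L) is built by randomly generating N-bit column codes, where N is the number of classes and L is the number of binary classifiers; the coding matrix is used to partition the samples during training. A randomly generated column code v = [v_1, v_2, …, v_N]^T ∈ {+1, -1}^N divides the label space into a positive label space and a negative label space.
The positive and negative label spaces are used to select positive and negative samples. Given a training sample together with its label set, the invention treats the label set as a whole to help build the binary classifier: the sample is used as a positive or negative sample only when all classes in its label set fall into the positive label space or into the negative label space, respectively. These positive and negative samples form the binary training set.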
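The encoding stage can be illustrated with a short sketch: draw a random column code, split the classes into a positive and a negative label space, and keep only those samples whose whole candidate label set falls on one side. Function names are hypothetical, classes are assumed to be indexed 0 to N-1, and rejecting all-positive or all-negative codes is an added assumption (such a code would give a binary problem with a single class).

```python
import numpy as np

def random_column_code(num_classes, rng):
    """Draw v in {+1, -1}^N, redrawing until both signs occur (assumption)."""
    while True:
        v = rng.choice([-1, 1], size=num_classes)
        if (v == 1).any() and (v == -1).any():
            return v

def binary_training_set(candidate_sets, v):
    """candidate_sets: list of sets of class indices, one per training sample.

    Returns (indices, targets): the samples usable for this column code and
    their binary targets (+1 or -1). Samples whose label set straddles both
    label spaces are skipped.
    """
    positive_space = set(np.flatnonzero(v == 1).tolist())
    negative_space = set(np.flatnonzero(v == -1).tolist())
    indices, targets = [], []
    for i, label_set in enumerate(candidate_sets):
        if not label_set:
            continue                       # nothing to decide for an empty set
        if label_set <= positive_space:
            indices.append(i)
            targets.append(1)
        elif label_set <= negative_space:
            indices.append(i)
            targets.append(-1)
    return indices, targets
```

Repeating this for L column codes yields the L columns of M and one binary training set per classifier.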
In the decoding stage, a connected set is constructed for each class j.
From the connected sets ε_y, a performance matrix G of size N×L is generated to reflect the ability of the classifiers; its entry for the j-th class and the t-th classifier g_t measures the performance of g_t on that class.
The computation uses an indicator function. To obtain the relative performance of each classifier on each class, the performance matrix G is normalized row by row.
For a closed-set label noise sample, a class prediction is then obtained in this decoding stage.
Finally, each closed-set label noise sample receives a pseudo-label. The clean samples and the pseudo-labeled closed-set label noise samples are merged and fed into the convolutional neural network for training.
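A possible shape of the decoding step is sketched below. The formulas for the performance matrix, its normalization, and the prediction rule are not reproduced in the text, so everything here is an assumption: rows of G are normalized to sum to one, and the predicted class is the one whose code word agrees best with the classifier outputs, weighted by the normalized performance. Function names are hypothetical.

```python
import numpy as np

def normalize_rows(G):
    """Assumed normalization: divide each row of G by its sum (zero rows stay zero)."""
    G = np.asarray(G, dtype=float)
    row_sums = G.sum(axis=1, keepdims=True)
    return G / np.where(row_sums == 0, 1.0, row_sums)

def decode(M, G_norm, outputs):
    """Assumed performance-weighted decoding.

    M:       (N, L) coding matrix with entries +1 / -1
    G_norm:  (N, L) row-normalized performance matrix
    outputs: (L,) real-valued outputs of the L binary classifiers for one sample
    Returns the index of the class with the highest weighted agreement.
    """
    M = np.asarray(M, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    scores = (G_norm * M * outputs[None, :]).sum(axis=1)
    return int(np.argmax(scores))
```

The returned class index would serve as the pseudo-label of a closed-set label noise sample before it is merged with the clean samples for training.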
The above embodiments only illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, or some of their technical features replaced by equivalents, without such modifications or replacements causing the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210761418.9A CN115170813A (en) | 2022-06-30 | 2022-06-30 | Network supervision fine-grained image identification method based on partial label learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210761418.9A CN115170813A (en) | 2022-06-30 | 2022-06-30 | Network supervision fine-grained image identification method based on partial label learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115170813A true CN115170813A (en) | 2022-10-11 |
Family
ID=83489216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210761418.9A Pending CN115170813A (en) | 2022-06-30 | 2022-06-30 | Network supervision fine-grained image identification method based on partial label learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115170813A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115564960A (en) * | 2022-11-10 | 2023-01-03 | 南京码极客科技有限公司 | Network image label denoising method combining sample selection and label correction |
CN118552756A (en) * | 2024-03-22 | 2024-08-27 | 杭州电子科技大学 | Method for unsupervised detection of generated image based on visual language model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111581405B (en) | Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning | |
CN107515895B (en) | A visual target retrieval method and system based on target detection | |
CN112115995A (en) | A semi-supervised learning based image multi-label classification method | |
US8254699B1 (en) | Automatic large scale video object recognition | |
CN112966691A (en) | Multi-scale text detection method and device based on semantic segmentation and electronic equipment | |
CN113139664B (en) | A cross-modal transfer learning method | |
CN113076465A (en) | Universal cross-modal retrieval model based on deep hash | |
CN111079847A (en) | Remote sensing image automatic labeling method based on deep learning | |
CN114358188A (en) | Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment | |
CN112559781B (en) | Image retrieval system and method | |
CN115712740B (en) | Method and system for multimodal entailment enhanced image text retrieval | |
CN110647907A (en) | Multi-label image classification algorithm using multi-layer classification and dictionary learning | |
CN115170813A (en) | Network supervision fine-grained image identification method based on partial label learning | |
CN114926742B (en) | A loop detection and optimization method based on second-order attention mechanism | |
CN111191033A (en) | Open set classification method based on classification utility | |
CN112883931A (en) | Real-time true and false motion judgment method based on long and short term memory network | |
CN114722892A (en) | Continuous learning method and device based on machine learning | |
CN116737877A (en) | Cross-modal retrieval method and device based on attention network against hashing | |
CN117557886A (en) | Noise-containing tag image recognition method and system integrating bias tags and passive learning | |
CN116863250A (en) | An open scene target detection method involving multi-modal unknown class recognition | |
CN116681128A (en) | A neural network model training method and device for noisy multi-label data | |
CN114692750B (en) | A fine-grained image classification method, device, electronic device and storage medium | |
CN118628813A (en) | Passive domain adaptive image recognition method based on transferable semantic knowledge | |
CN115712751B (en) | Cross-domain person search method based on text description | |
CN117437426A (en) | A semi-supervised semantic segmentation method guided by high-density representative prototypes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | ||
Inventor after: Xu Yuyan, Wei Xiucan. Inventor before: Wei Xiucan, Xu Yuyan.