CN116630694A - Object classification method, system and electronic device for partial multi-label images - Google Patents
Object classification method, system and electronic device for partial multi-label images
- Publication number
- CN116630694A (application number CN202310544125.XA)
- Authority
- CN
- China
- Prior art keywords
- label
- iteration
- image
- tag
- classifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Image Analysis (AREA)
Abstract
The invention provides an object classification method, system and electronic device for partial multi-label images, and relates to the technical field of object classification for partial multi-label images. The method includes: acquiring a partial multi-label image to be identified; inputting the image into a relevant-label determination model to determine the relevant labels of all objects in the image; and determining the categories of all objects in the image according to the plurality of relevant labels, where the relevant labels correspond one-to-one to the object categories. The invention trains a classifier using the principle of contrastive label disambiguation to obtain a relevant-label determination model that can accurately identify the relevant labels of unseen images, thereby improving the accuracy of image classification.
Description
Technical Field
The present invention relates to the technical field of object classification for partial multi-label images, and in particular to an object classification method, system and electronic device for partial multi-label images.
Background Art
Multi-label image classification addresses classification problems in which each image is associated with multiple labels, and the problem has received wide attention. It relies, however, on accurate annotation of the data, which is very difficult to obtain in real-world scenarios with limited resources. To ease the annotation burden, a common practice is to let non-expert annotators assign each image several candidate labels; the candidate set contains not only relevant labels that are useful for classification but also some noisy labels. Learning from such candidate label sets of images is defined as the object classification problem for partial multi-label images.
Partial multi-label learning problems fall into two types. The first type assigns a confidence to each candidate label and iteratively updates the label confidences and the classification model parameters during training. For example, the paper "Partial Multi-Label Learning" proposes optimizing a label-ranking confidence matrix during classifier training by considering label correlations and feature prototypes separately. The paper "Feature-Induced Partial Multi-label Learning" introduces a feature-induced partial multi-label method that exploits the low-rank structure of the label and feature spaces. The paper "Partial Multi-Label Learning with Meta Disambiguation" achieves disambiguation by iteratively minimizing a confidence-weighted ranking loss and adaptively estimating the confidence of each candidate label from the model's performance on a validation set. The second type is a two-stage training approach that extracts reliable labels from the candidate label sets and then uses these reliable labels to train a multi-label classifier. For example, the paper "Discriminative and Correlative Partial Multi-Label Learning" applies feature manifolds to induce high-confidence labels and then trains a multi-label classifier, and the paper "Partial Multi-Label Learning via Credible Label Elicitation" induces the classifier by extracting credible labels with an iterative label propagation strategy. On the patent side, Chinese invention patent application No. 202010412162.1 provides a partial multi-label learning method based on multi-subspace representation; application No. 202010412161.7 provides a partial multi-label learning method based on noise tolerance; application No. 202111369388.9 provides a patient screening labelling method based on partial multi-label learning; application No. 202010411579.6 provides a partial multi-label learning method based on global and local label relationships; application No. 202110717550.5 provides a partial multi-label learning method based on complementary-label co-training; and application No. 202010411580.9 provides a partial multi-label learning method for noisy feature information, which recovers the correct feature information with a low-rank and sparse decomposition and effectively reduces the influence of noisy features.
Although these traditional methods have made significant progress, they generally learn from hand-crafted features; when faced with the object classification problem for partial multi-label images, their representation and label-correction abilities are weak, and they cannot achieve a good label disambiguation effect.
Summary of the Invention
The object of the present invention is to provide an object classification method, system and electronic device for partial multi-label images, in which a classifier is trained using the principle of contrastive label disambiguation to obtain a relevant-label determination model that can accurately identify the relevant labels in unseen partial multi-label images, thereby improving the accuracy of image classification.
To achieve the above object, the present invention provides the following scheme:
An object classification method for partial multi-label images, comprising:
acquiring a partial multi-label image to be identified, the image containing at least one object;
inputting the partial multi-label image to be identified into a relevant-label determination model, and determining the relevant labels of all objects in the image; the relevant-label determination model is obtained by training a classifier on a plurality of historical partial multi-label images using the principle of contrastive label disambiguation;
determining the categories of all objects in the partial multi-label image to be identified according to the plurality of relevant labels, the relevant labels corresponding one-to-one to the object categories.
Optionally, before acquiring the image to be classified, the method further includes:
acquiring a plurality of historical partial multi-label images, each annotated with multiple labels, where each label is either a relevant label or a noisy label;
performing random data augmentation on the historical partial multi-label images to obtain a query view and a key view of each historical image;
determining the label-level embeddings under the query view and the label-level embeddings under the key view, the label-level embeddings corresponding one-to-one to the multiple labels of the historical image;
training the classifier using the principle of contrastive label disambiguation according to the label-level embeddings under the query view and the key view, to obtain the relevant-label determination model.
Optionally, after determining the label-level embeddings under the query view and the key view, the method further includes:
determining the positivity or negativity of the label-level embeddings under the query view;
determining the positivity or negativity of the label-level embeddings under the key view.
Optionally, training the classifier using the principle of contrastive label disambiguation according to the label-level embeddings under the query view and the key view, to obtain the relevant-label determination model, includes:
taking the classifier as the classifier at iteration 0;
taking the initial positive prototype of each label in the classifier as the positive prototype at iteration 0;
taking the initial negative prototype of each label in the classifier as the negative prototype at iteration 0;
setting the first iteration counter i = 1;
setting the second iteration counter j = 1;
taking any label-level embedding under the query view as the current label-level embedding;
updating the positive prototype and the negative prototype of iteration i-1 according to the positivity or negativity of the current label-level embedding;
computing the similarity between the current label-level embedding and the updated positive prototype of iteration i-1 as the first similarity;
computing the similarity between the current label-level embedding and the updated negative prototype of iteration i-1 as the second similarity;
determining, from the first and second similarities, the label vector predicted by the prototypes for the current label-level embedding;
updating the pseudo-label of the label corresponding to the current label-level embedding according to the label vector, to obtain the pseudo-label at iteration j;
increasing the second iteration counter j by 1, updating the current label-level embedding to another label-level embedding under the same query view, and returning to the step of updating the positive and negative prototypes of iteration i-1 according to the positivity or negativity of the current label-level embedding, until the second iteration counter reaches a second iteration threshold;
inputting the plurality of current label-level embeddings under the corresponding query view into the classifier of iteration i-1 to obtain a plurality of category outputs;
determining the classification loss function at iteration i-1 according to the plurality of category outputs and the pseudo-labels obtained over the iterations;
judging whether the classification loss function is smaller than a classification loss threshold, to obtain a first judgment result;
if the first judgment result is no, updating the parameters of the classifier of iteration i-1 to obtain the classifier of iteration i, increasing the first iteration counter i by 1, and returning to the step of setting the second iteration counter j = 1;
if the first judgment result is yes, judging whether the first iteration counter has reached a first iteration threshold, to obtain a second judgment result;
if the second judgment result is no, taking the classifier of iteration i-1 as the classifier of iteration i, increasing the first iteration counter i by 1, and returning to the step of setting the second iteration counter j = 1;
if the second judgment result is yes, taking the classifier of iteration i-1 as the relevant-label determination model.
Optionally, before taking the classifier of iteration i-1 as the relevant-label determination model, the method further includes:
taking the label-level embeddings under the query view and under the key view as an embedding pool, the embedding pool further including the label-level embeddings in a momentum label-level embedding queue;
taking any positive label-level embedding under the query view as the current positive label-level embedding;
taking the positive label-level embeddings in the embedding pool that correspond to the same label as the current positive label-level embedding as the positive sample set of the current positive label-level embedding;
forming a plurality of positive sample pairs from the current positive label-level embedding and the samples in its positive sample set;
determining the contrastive loss function of the corresponding historical partial multi-label image according to the plurality of positive sample pairs under the same query view;
judging whether all of the contrastive loss functions are smaller than a contrastive loss threshold, to obtain a third judgment result;
if the third judgment result is no, updating the parameters of the classifier of iteration i-1 to obtain the classifier of iteration 0, and returning to the step of setting the first iteration counter i = 1;
if the third judgment result is yes, invoking the step of taking the classifier of iteration i-1 as the relevant-label determination model.
An object classification system for partial multi-label images, comprising:
an image acquisition module, configured to acquire a partial multi-label image to be identified, the image containing at least one object;
a relevant-label identification module, configured to input the partial multi-label image to be identified into a relevant-label determination model and determine the relevant labels of all objects in the image, the relevant-label determination model being obtained by training a classifier on a plurality of historical partial multi-label images using the principle of contrastive label disambiguation;
an object category determination module, configured to determine the categories of all objects in the partial multi-label image to be identified according to the plurality of relevant labels, the relevant labels corresponding one-to-one to the object categories.
An electronic device, optionally comprising a memory and a processor, the memory being configured to store a computer program and the processor running the computer program so that the electronic device executes the above object classification method for partial multi-label images.
Optionally, the memory is a readable storage medium.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects:
The object classification method, system and electronic device for partial multi-label images provided by the invention acquire a partial multi-label image to be identified; input the image into a relevant-label determination model to determine the relevant labels of all objects in the image; and determine the categories of all objects in the image according to the plurality of relevant labels, the relevant labels corresponding one-to-one to the object categories. The invention trains a classifier using the principle of contrastive label disambiguation to obtain a relevant-label determination model that can accurately identify the relevant labels of unseen images, thereby improving the accuracy of image classification.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow chart of an object classification method for partial multi-label images according to Embodiment 1 of the present invention;
Fig. 2 is a framework diagram of the CPLD model according to Embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
The object of the present invention is to provide an object classification method, system and electronic device for partial multi-label images, in which a classifier is trained using the principle of contrastive label disambiguation to obtain a relevant-label determination model that can accurately identify the relevant labels in unseen partial multi-label images, thereby improving the accuracy of image classification.
To make the above objects, features and advantages of the present invention more comprehensible, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment 1
As shown in Fig. 1, this embodiment provides an object classification method for partial multi-label images, including:
Step 101: acquiring a partial multi-label image to be identified.
The partial multi-label image to be identified contains at least one object.
Step 102: inputting the partial multi-label image to be identified into a relevant-label determination model, and determining the relevant labels of all objects in the image.
The relevant-label determination model is obtained by training a classifier on a plurality of historical partial multi-label images using the principle of contrastive label disambiguation.
Step 103: determining the categories of all objects in the partial multi-label image to be identified according to the plurality of relevant labels; the relevant labels correspond one-to-one to the object categories.
Before step 101, the method further includes:
Step 104: acquiring a plurality of historical partial multi-label images, each annotated with multiple labels. Each label is either a relevant label or a noisy label.
Step 105: performing random data augmentation on the historical partial multi-label images to obtain a query view and a key view of each historical image.
Step 106: determining the label-level embeddings under the query view and under the key view. The label-level embeddings correspond one-to-one to the multiple labels of the historical image.
Step 107: training the classifier using the principle of contrastive label disambiguation according to the label-level embeddings under the query view and the key view, to obtain the relevant-label determination model.
After step 106, the method further includes:
Step 108: determining the positivity or negativity of the label-level embeddings under the query view.
Step 109: determining the positivity or negativity of the label-level embeddings under the key view.
Step 107 includes:
Step 1071: taking the classifier as the classifier at iteration 0.
Step 1072: taking the initial positive prototype of each label in the classifier as the positive prototype at iteration 0.
Step 1073: taking the initial negative prototype of each label in the classifier as the negative prototype at iteration 0.
Step 1074: setting the first iteration counter i = 1.
Step 1075: setting the second iteration counter j = 1.
Step 1076: taking any label-level embedding under the query view as the current label-level embedding.
Step 1077: updating the positive prototype and the negative prototype of iteration i-1 according to the positivity or negativity of the current label-level embedding.
Step 1078: computing the similarity between the current label-level embedding and the updated positive prototype of iteration i-1 as the first similarity.
Step 1079: computing the similarity between the current label-level embedding and the updated negative prototype of iteration i-1 as the second similarity.
Step 10710: determining, from the first and second similarities, the label vector predicted by the prototypes for the current label-level embedding.
Step 10711: updating the pseudo-label of the label corresponding to the current label-level embedding according to the label vector, to obtain the pseudo-label at iteration j.
Step 10712: increasing the second iteration counter j by 1, updating the current label-level embedding to another label-level embedding under the same query view, and returning to step 1077 until the second iteration counter reaches the second iteration threshold.
Step 10713: inputting the plurality of current label-level embeddings under the corresponding query view into the classifier of iteration i-1 to obtain a plurality of category outputs.
Step 10714: determining the classification loss function at iteration i-1 according to the plurality of category outputs and the pseudo-labels obtained over the iterations.
Step 10715: judging whether the classification loss function is smaller than the classification loss threshold, to obtain a first judgment result; if the first judgment result is no, executing step 10716; if it is yes, executing step 10717.
Step 10716: updating the parameters of the classifier of iteration i-1 to obtain the classifier of iteration i, increasing the first iteration counter i by 1, and returning to step 1075.
Step 10717: judging whether the first iteration counter has reached the first iteration threshold, to obtain a second judgment result; if the second judgment result is no, executing step 10718; if it is yes, executing step 10719.
Step 10718: taking the classifier of iteration i-1 as the classifier of iteration i, increasing the first iteration counter i by 1, and returning to step 1075.
Step 10719: taking the classifier of iteration i-1 as the relevant-label determination model.
Before step 10719, the method further includes:
Step 10720: taking the label-level embeddings under the query view and under the key view as an embedding pool; the embedding pool further includes the label-level embeddings in the momentum label-level embedding queue.
Step 10721: taking any positive label-level embedding under the query view as the current positive label-level embedding.
Step 10722: taking the positive label-level embeddings in the embedding pool that correspond to the same label as the current positive label-level embedding as the positive sample set of the current positive label-level embedding.
Step 10723: forming a plurality of positive sample pairs from the current positive label-level embedding and the samples in its positive sample set.
Step 10724: determining the contrastive loss function of the corresponding historical partial multi-label image according to the plurality of positive sample pairs under the same query view.
Step 10725: judging whether all of the contrastive loss functions are smaller than the contrastive loss threshold, to obtain a third judgment result; if the third judgment result is no, executing step 10726; if it is yes, executing step 10727.
Step 10726: updating the parameters of the classifier of iteration i-1 to obtain the classifier of iteration 0, and returning to step 1074.
Step 10727: if the third judgment result is yes, invoking step 10719.
Embodiment 2
As shown in Fig. 2, the object classification method for partial multi-label images provided by this embodiment consists of two parts: a contrastive learning module and a prototype-based label disambiguation module. The method uses these two modules to build a collaborative framework: contrastive learning aims to obtain high-quality representations, while prototype-based label disambiguation uses the high-quality representations learned by contrastive learning to obtain improved prototypes, which are then used to update the pseudo-labels that guide the model's predictions and, in turn, help contrastive learning build more accurate positive sample pairs. At the same time, the invention uses a two-stage training strategy so that the contrastive learning technique is applied more reasonably. The two modules depend on and cooperate with each other; as training progresses, the model gradually updates the label confidences, extracts the relevant labels, and pays less attention to noisy labels. The specific steps are as follows:
Step 1: take partial multi-label image training data as input. Define the feature space and the label space, where K denotes the number of labels of interest. The training dataset D consists of n samples, where x_i denotes the i-th observed image, y_i denotes the candidate label vector of the i-th image, and y_{i,j} = 1 indicates that label j is a label of the i-th image, and vice versa.
Step 2: obtain augmented views of the image.
For brevity, the index i is omitted below. For an input image x, two image augmentation schemes are used to obtain a query view Aug_q(x) and a key view Aug_k(x), respectively.
For the query network, the invention uses the SimAugment data augmentation scheme from the paper "Supervised contrastive learning"; for the key network, it uses the RandAugment data augmentation scheme from the paper "Randaugment: Practical Automated Data Augmentation With a Reduced Search Space".
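For illustration, a minimal Python/PyTorch sketch of how the two views could be produced with torchvision is given below; the crop sizes, jitter strengths and RandAugment settings are assumptions, since the patent does not list them.

```python
# A minimal sketch of the two augmented views, assuming PyTorch/torchvision;
# the exact SimAugment/RandAugment parameters are illustrative assumptions.
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# query view: SimAugment-style pipeline (SupCon-like crop / flip / color jitter / grayscale)
sim_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
    normalize,
])

# key view: RandAugment pipeline
rand_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandAugment(num_ops=2, magnitude=9),
    transforms.ToTensor(),
    normalize,
])

def two_views(pil_image):
    """Return the query view Aug_q(x) and the key view Aug_k(x)."""
    return sim_augment(pil_image), rand_augment(pil_image)
```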
Step 3: obtain the label-level embeddings under both views.
For the query view, the encoder network produces Enc(Aug_q(x)) ∈ R^{d×h×w}, where d, h and w denote the output dimension of the encoder and the height and width of the feature map, respectively. A 1×1 convolution then reduces the feature map to K channels to obtain one feature map per category; each class feature map is flattened into a vector and passed through a projection head that projects the class vector into the contrastive space required later, giving the output g(Aug_q(x)) ∈ R^{K×D} of the query network g(·), where D is the dimension of the contrastive space. In this way the image-level feature map is decoupled into K D-dimensional label-level embeddings q_j ∈ R^{1×D}, j ∈ {1,...,K}; each label-level embedding can be regarded as a representation vector of the image in the context of the corresponding label, containing the feature information of that class. The key network g'(·) is a momentum moving average of the query network's parameters, and its output g'(Aug_k(x)) ∈ R^{K×D} is prepared for the subsequent contrastive learning, where each row k_j ∈ R^{1×D}, j ∈ {1,...,K}, is, analogously to q_j, the label-level embedding obtained by the key network when decoupling the image-level representation.
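The following sketch illustrates one way such a decoupling head could be written in PyTorch (1×1 convolution to K class maps followed by a per-class projection head); the layer sizes and the two-layer projection head are illustrative assumptions rather than the patent's exact architecture.

```python
# A minimal sketch of decoupling an image-level feature map into K label-level embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelLevelHead(nn.Module):
    def __init__(self, enc_dim: int, num_classes: int, embed_dim: int, hw: int):
        super().__init__()
        self.to_class_maps = nn.Conv2d(enc_dim, num_classes, kernel_size=1)  # d -> K channels
        self.projection = nn.Sequential(                                     # projection head
            nn.Linear(hw, hw), nn.ReLU(inplace=True), nn.Linear(hw, embed_dim)
        )

    def forward(self, feat):                     # feat: (B, d, h, w) from the encoder
        class_maps = self.to_class_maps(feat)    # (B, K, h, w), one map per class
        flat = class_maps.flatten(2)             # (B, K, h*w), flattened class maps
        emb = self.projection(flat)              # (B, K, D) label-level embeddings
        return F.normalize(emb, dim=-1)          # unit-norm embeddings for the contrastive space

# usage sketch (sizes assumed): feat = encoder(aug_q_x); q = LabelLevelHead(2048, 20, 128, 7*7)(feat)
```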
Step 4: use contrastive learning to obtain high-quality embedding representations.
First step: judge the positivity or negativity of each label-level embedding.
For the augmented image Aug_q(x) above, the classifier produces an output f(Aug_q(x)) ∈ [0,1]^K, where each entry of f(Aug_q(x)) is obtained through a Sigmoid activation function.
Each label-level embedding is then judged positive or negative using a criterion based on the classifier output: if the criterion holds for the j-th entry, the j-th label-level embedding is judged to be a positive label-level embedding, and vice versa. Aggregating these label-level embeddings yields the positive/negative label-level embedding sets PE(x)/NE(x) of the sample image. In this criterion, α is a hyperparameter, and b_j and the mean baseline probability denote the baseline probability of class j and the average baseline probability over all classes, respectively.
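The exact criterion (involving α, b_j and the mean baseline probability) is not reproduced in this text. The embodiment later notes that a simple fixed threshold on the classifier's predicted probabilities can be used instead, and the sketch below shows that simpler variant; the threshold value is an assumption.

```python
# Fixed-threshold variant of the positive/negative judgment (the patent's exact
# alpha/b_j criterion is not reproduced here); threshold = 0.5 is an assumption.
import torch

def split_label_embeddings(q, logits, threshold: float = 0.5):
    """q: (K, D) label-level embeddings of one image, logits: (K,) classifier scores.
    Returns index lists for PE(x) (positive) and NE(x) (negative) embeddings."""
    probs = torch.sigmoid(logits)                                  # f(Aug_q(x)) in [0, 1]^K
    positive = (probs > threshold).nonzero(as_tuple=True)[0].tolist()
    negative = (probs <= threshold).nonzero(as_tuple=True)[0].tolist()
    return positive, negative
```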
Second step: construct the positive sample set of each positive label-level embedding.
The positive label-level embeddings produced by the query network and the key network are combined with the historical embeddings in the embedding queue to build an embedding pool A = B_q ∪ B_k ∪ queue, with A(q_j) = A \ {q_j}, where B_q and B_k denote the positive label-level embeddings of all sample images in the current batch from the query network and the key network, respectively. The invention uses the queue to retain the positive label-level embeddings of recent batches produced by the key network.
P(q_j) = {k_j | k_j ∈ A_j(q_j)} denotes the positive sample set of the positive label-level embedding q_j, where A_j(q_j) denotes the label-level embeddings in A(q_j) that correspond to label j. That is, the positive sample set of each positive label-level embedding consists of the other positive label-level embeddings of the same class in the embedding pool.
Third step: establish the contrastive loss.
Each positive label-level embedding is paired with the samples in its positive sample set to form positive sample pairs, and a contrastive loss is established over these pairs to obtain high-quality label-level embedding representations.
The contrastive loss function of a single image sample is computed over these positive pairs against the embedding pool, with τ denoting the temperature parameter.
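The loss formula itself is not reproduced in this text. Given the positive set P(q_j), the pool A(q_j) and the temperature τ, a standard supervised-contrastive form is a natural reading, and the sketch below assumes that form for a single positive label-level embedding.

```python
# Assumed supervised-contrastive (SupCon-style) loss for one positive label-level embedding.
import torch

def label_level_contrastive_loss(q, pool, pool_labels, q_label, tau: float = 0.2):
    """q: (D,) one positive label-level embedding (unit norm).
    pool: (N, D) embedding pool A(q) (unit norm); pool_labels: (N,) class index of each entry.
    q_label: class index of q. Returns the contrastive loss contribution of q."""
    sims = pool @ q / tau                           # similarities to every entry of the pool
    log_prob = sims - torch.logsumexp(sims, dim=0)  # log-softmax over the whole pool A(q)
    positives = (pool_labels == q_label)            # P(q): same-class entries of the pool
    if positives.sum() == 0:
        return q.new_zeros(())                      # no positives available for this label
    return -log_prob[positives].mean()              # average over the positive set
```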
Step 5: prototype-based label disambiguation.
For each class c ∈ {1,...,K}, the invention uses a positive prototype and a negative prototype that represent, respectively, a representative positive and a representative negative label-level embedding feature of class c.
First step: update the positive/negative prototypes.
In the current mini-batch, the positive/negative label-level embeddings of the samples are used to update the positive/negative prototypes of the corresponding classes.
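The update formula is not reproduced in this text. Since the experiments mention a coefficient γ = 0.99 for updating the prototypes, the sketch below assumes a standard momentum moving-average update with re-normalization.

```python
# Assumed momentum moving-average prototype update (gamma taken from the experiments).
import torch
import torch.nn.functional as F

def update_prototypes(pos_proto, neg_proto, q, is_positive, class_idx, gamma: float = 0.99):
    """pos_proto, neg_proto: (K, D) current prototypes (unit norm).
    q: (D,) one label-level embedding of class `class_idx`; is_positive: bool."""
    target = pos_proto if is_positive else neg_proto
    target[class_idx] = F.normalize(gamma * target[class_idx] + (1.0 - gamma) * q, dim=0)
    return pos_proto, neg_proto
```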
Second step: update the pseudo-labels.
The similarity between the label-level embeddings of a sample and the improved positive/negative prototypes is computed to obtain the label vector z predicted by the prototypes. The pseudo-label s is then updated gradually with a moving average.
Here φ ∈ (0,1) is a constant. That is, if the label-level embedding q_c of a label turns out to be more similar to the positive prototype than to the negative prototype, the label is considered a relevant label of image x. As model training proceeds, the prototype-based predictions of the label-level embeddings gradually become consistent; therefore the pseudo-labels of relevant labels gradually stabilize at 1, while those of irrelevant labels smoothly approach 0.
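The update formula is not reproduced in this text. The description (a prototype comparison producing z, followed by a moving average with constant φ) suggests the exponential update sketched below, which is shown as an assumption.

```python
# Assumed pseudo-label update: prototype comparison plus exponential moving average.
import torch

def update_pseudo_labels(s, q, pos_proto, neg_proto, phi: float = 0.9):
    """s: (K,) current pseudo-label vector of the image; q: (K, D) its label-level embeddings;
    pos_proto / neg_proto: (K, D) prototypes. Returns the updated pseudo-label vector."""
    sim_pos = (q * pos_proto).sum(dim=1)   # similarity to the positive prototype, per class
    sim_neg = (q * neg_proto).sum(dim=1)   # similarity to the negative prototype, per class
    z = (sim_pos > sim_neg).float()        # prototype-predicted label vector z in {0, 1}^K
    return phi * s + (1.0 - phi) * z       # moving-average pseudo-label update
```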
Third step: construct the classification loss from the classifier output and the pseudo-labels.
The updated pseudo-labels and the classifier output are used to build the classification loss, so that the pseudo-labels improved by the contrastive-learning-refined prototypes guide the predictions of the model's classifier. The classification loss is a binary cross-entropy loss between the classifier output and the updated pseudo-labels.
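Since the embodiment states that binary cross-entropy is used, the classification loss can be sketched directly:

```python
# Minimal sketch of the classification loss: BCE between classifier output and pseudo-labels.
import torch
import torch.nn.functional as F

def classification_loss(logits, pseudo_labels):
    """logits: (B, K) raw classifier scores; pseudo_labels: (B, K) soft pseudo-labels in [0, 1]."""
    return F.binary_cross_entropy_with_logits(logits, pseudo_labels)
```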
Step 6: combine the classification loss and the contrastive loss into a total loss function, which serves as the objective function for training and optimizing the neural network.
Here λ is an adjustable weighting hyperparameter.
Step 7: use a two-stage training strategy.
The two-stage training strategy consists of a pre-disambiguation stage and a contrastive disambiguation stage.
Pre-disambiguation stage: the contrastive learning branch is removed, i.e., only the query network and the prototype disambiguation strategy are used, and only the classification loss serves as the objective function of the network.
Contrastive disambiguation stage: the total loss function is used to train the entire system.
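A compact sketch of the two-stage schedule is given below, assuming a total loss of the form L_total = L_cls + λ·L_cont; the epoch counts, the placement of λ and the model interface (a forward pass returning both loss terms) are assumptions.

```python
# Assumed two-stage training schedule with a combined objective L_cls + lambda * L_cont.
def total_loss(cls_loss, cont_loss, lam: float = 0.1):
    return cls_loss + lam * cont_loss

def train(model, loader, optimizer, warmup_epochs: int = 5, epochs: int = 40):
    for epoch in range(epochs):
        pre_disambiguation = epoch < warmup_epochs        # stage 1: contrastive branch removed
        for batch in loader:
            out = model(batch)                            # assumed to return both loss terms
            if pre_disambiguation:
                loss = out["cls_loss"]                    # only the classification objective
            else:
                loss = total_loss(out["cls_loss"], out["cont_loss"])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```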
Step 8: perform multi-label prediction for unseen samples on the test data using a threshold δ.
For an unseen sample x, the relevant-label prediction is obtained from the classifier output and the threshold δ.
That is, for an unseen instance x, the predicted probability of the classifier is compared with the threshold δ to judge whether each label is a relevant label. The model is evaluated with multi-label evaluation metrics.
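A minimal inference sketch follows; δ = 0.5 is an assumed default.

```python
# Minimal inference sketch: labels whose predicted probability exceeds delta are relevant.
import torch

@torch.no_grad()
def predict_relevant_labels(model, image, delta: float = 0.5):
    probs = torch.sigmoid(model(image.unsqueeze(0))).squeeze(0)   # (K,) per-label probabilities
    return (probs > delta).nonzero(as_tuple=True)[0].tolist()     # indices of relevant labels
```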
The invention was evaluated on the mainstream multi-label image classification dataset VOC2007. The VOC2007 dataset contains images from 20 object categories, with each image containing objects from 2.5 categories on average. It consists of a training set of 5011 images and a test set of 4952 images.
To construct the partial multi-label dataset, the invention sets the ratio of the average size of the candidate label set to the number of labels in the whole label space to q. In the experiments, q is set to 0.1 for the VOC2007 dataset.
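The exact corruption protocol for building the candidate label sets is not described here; the sketch below shows one plausible way to synthesize candidate sets whose average size is roughly q·K from clean annotations, purely as an illustration.

```python
# Illustrative (assumed) construction of candidate label sets with average size about q * K.
import torch

def make_candidate_labels(true_labels, q: float = 0.1):
    """true_labels: (N, K) binary 0/1 ground-truth matrix (float). Returns a (N, K) candidate
    matrix whose average number of candidates per image is roughly q * K."""
    n, k = true_labels.shape
    target = q * k
    extra = max(target - true_labels.sum(dim=1).float().mean().item(), 0.0)
    noise_rate = extra / k                                  # chance of adding each irrelevant label
    noise = (torch.rand(n, k) < noise_rate).float() * (1 - true_labels)
    return torch.clamp(true_labels + noise, max=1.0)        # relevant labels always stay in the set
```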
Following many existing studies, the invention uses mAP, OF1 and CF1 as evaluation metrics. mAP, also called class-wise mean average precision, is obtained as a weighted average of the per-class average precision (AP). CF1 and OF1 jointly consider overall and per-class recall and precision. These three metrics are therefore the most important and representative of all the measures.
To verify the effectiveness of the invention, a baseline method is formed by training directly on the partial multi-label image dataset with the binary cross-entropy (BCE) loss. In addition, two advanced multi-label image classification methods, ASL (published in "Asymmetric loss for multi-label classification") and ML-GCN (published in "Multi-Label Image Recognition with Graph Convolutional Networks"), are added as two further baseline comparison methods.
For consistency of comparison, following the ASL study, the invention adopts TresnetL ("Tresnet: High performance gpu-dedicated architecture") pre-trained on ImageNet as the backbone and uses 224×224 as the input image resolution. For the VOC2007 dataset, α = 0.8. The prototypes are updated with γ = 0.99, and the constant φ of the pseudo-label update decreases linearly from 0.95 to 0.8. The temperature parameter is τ = 0.2, the weighting factor of the loss function is λ = 0.1, and the model parameters are updated with an exponential moving average (EMA) with a decay of 0.9997.
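The parameter EMA mentioned above can be sketched as follows (decay 0.9997); keeping a full shadow copy of the model is one common way to implement it.

```python
# Minimal sketch of an exponential moving average of the model parameters (decay 0.9997).
import copy
import torch

class ModelEMA:
    def __init__(self, model, decay: float = 0.9997):
        self.ema = copy.deepcopy(model).eval()     # shadow copy used for evaluation
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        for ema_p, p in zip(self.ema.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```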
The method proposed by the invention is named CPLD (Contrastive Prototype-based Label Disambiguation); the comparison results are shown in Table 1. On the VOC2007 dataset the method of the invention outperforms the other methods in performance, which demonstrates its effectiveness.
Table 1. Comparison of model classification results
The invention provides an object classification method for partial multi-label images that trains directly on partial multi-label image data and obtains a classification model capable of multi-label prediction on unseen instances, greatly reducing the annotation cost.
In addition, the classification model of the invention uses the binary cross-entropy loss function (BCE); more advanced loss formulations such as Focal Loss ("Focal Loss for Dense Object Detection") or ASL ("Asymmetric Loss For Multi-Label Classification") can also be used. Moreover, the positivity/negativity judgment of the label-level embeddings obtained by decoupling the model can also be made by simply applying a fixed threshold to the classifier's predicted probabilities, or by obtaining predicted probabilities with softmax and then thresholding them. These improvements, as well as other easily conceivable changes or substitutions, fall within the protection scope of the invention.
Embodiment 3
To execute the method corresponding to Embodiment 1 above and achieve the corresponding functions and technical effects, an object classification system for partial multi-label images is provided below, comprising:
an image acquisition module, configured to acquire a partial multi-label image to be identified; the image contains at least one object;
a relevant-label identification module, configured to input the partial multi-label image to be identified into a relevant-label determination model and determine the relevant labels of all objects in the image; the relevant-label determination model is obtained by training a classifier on a plurality of historical partial multi-label images using the principle of contrastive label disambiguation;
an object category determination module, configured to determine the categories of all objects in the partial multi-label image to be identified according to the plurality of relevant labels; the relevant labels correspond one-to-one to the object categories.
Embodiment 4
This embodiment provides an electronic device comprising a memory and a processor; the memory stores a computer program, and the processor runs the computer program so that the electronic device executes the object classification method for partial multi-label images described in Embodiment 1. The memory is a readable storage medium.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to each other. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
Specific examples are used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope based on the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310544125.XA CN116630694A (en) | 2023-05-12 | 2023-05-12 | A target classification method, system and electronic equipment for more marked images |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310544125.XA CN116630694A (en) | 2023-05-12 | 2023-05-12 | A target classification method, system and electronic equipment for more marked images |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116630694A true CN116630694A (en) | 2023-08-22 |
Family
ID=87590462
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310544125.XA Pending CN116630694A (en) | 2023-05-12 | 2023-05-12 | A target classification method, system and electronic equipment for more marked images |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116630694A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117992835A (en) * | 2024-04-03 | 2024-05-07 | 安徽大学 | Multi-strategy label disambiguation partial multi-label classification method, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113378632B (en) | Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method | |
CN111079847B (en) | Remote sensing image automatic labeling method based on deep learning | |
CN105354595A (en) | Robust visual image classification method and system | |
CN106203521A (en) | Based on disparity map from the SAR image change detection of step study | |
CN108877947B (en) | Deep sample learning method based on iterative mean clustering | |
CN114882531B (en) | A cross-domain person re-identification method based on deep learning | |
CN115221947B (en) | A robust multimodal active learning method based on pre-trained language model | |
CN113095229B (en) | Self-adaptive pedestrian re-identification system and method for unsupervised domain | |
CN117516937A (en) | Unknown fault detection method of rolling bearing based on multi-modal feature fusion enhancement | |
CN113379037A (en) | Multi-label learning method based on supplementary label collaborative training | |
CN102663681B (en) | Gray scale image segmentation method based on sequencing K-mean algorithm | |
CN114266321A (en) | Weak supervision fuzzy clustering algorithm based on unconstrained prior information mode | |
CN116630694A (en) | A target classification method, system and electronic equipment for more marked images | |
CN114842330B (en) | Multi-scale background perception pooling weak supervision building extraction method | |
CN111144462A (en) | Unknown individual identification method and device for radar signals | |
CN109754000A (en) | A Dependency-Based Semi-Supervised Multi-Label Classification Method | |
CN112465016A (en) | Partial multi-mark learning method based on optimal distance between two adjacent marks | |
CN107247996A (en) | A kind of Active Learning Method applied to different distributed data environment | |
CN117541562A (en) | Semi-supervised non-reference image quality evaluation method based on uncertainty estimation | |
CN117576471A (en) | Method and device for classifying few-sample images by introducing local feature alignment and prototype correction mechanisms | |
CN117392420A (en) | Semantic association method of collection cultural relics image data based on multi-label image classification | |
CN113409351B (en) | Unsupervised Domain Adaptive Remote Sensing Image Segmentation Based on Optimal Transmission | |
CN114067165B (en) | Image screening and learning method and device for noise-containing mark distribution | |
CN117079011A (en) | Image type increment learning method and system based on out-of-distribution detection | |
CN102033933B (en) | Optimal distance measure method for maximizing the mean of average precision |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |