CN117409456A - Non-aligned multi-view multi-label learning method based on graph matching mechanism - Google Patents
- Publication number: CN117409456A
- Application number: CN202311195295.8A
- Authority: CN (China)
- Prior art keywords: view, matrix, data, aligned, mark
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/161 — Human faces: detection; localisation; normalisation
- G06V10/765 — Image or video recognition using machine-learning classification, using rules for classification or partitioning the feature space
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V40/174 — Facial expression recognition
Description
Technical field
The invention relates to non-negative matrix factorization, graph matching, non-aligned multi-view learning, and multi-label classification, and specifically to a non-aligned multi-view multi-label learning method based on a graph matching mechanism.
Background
The large-scale growth of the Internet and the spread of big data and artificial intelligence applications have produced massive amounts of multi-view multi-label data, and multi-view multi-label learning has received wide attention as the main framework for handling such data. In a multi-view multi-label learning task, each sample is represented by several heterogeneous views and annotated with several associated labels. Existing multi-view multi-label learning methods typically exploit consistent and complementary information across views, and the search for this information usually assumes that the views are aligned, i.e. that instances in different views describe the same object. Because of spatial, temporal, or spatio-temporal asynchrony, however, this alignment can degrade into partial alignment or complete non-alignment. For example, in video recommendation, labeled data comes from different video applications, but privacy protection prevents matching these records to the same user; in face recognition, failures of facial feature detection can leave multi-view faces unaligned, making facial expression recognition impossible. Existing multi-view multi-label models cannot learn a robust multi-label classifier directly from such non-aligned data.
To this end, the present invention proposes a non-aligned multi-view multi-label learning method based on a graph matching mechanism (non-aligned multi-view multi-label classification via graph matching mechanism, MCGM for short) to address the view non-alignment problem and the problem of comprehensive semantic expression in multi-view multi-label learning. For the non-alignment problem of multi-view data, cross-view "instance-instance" and "instance relation-instance relation" graph matching relations are mined to accurately align the feature nodes of the same instance across views, and the aligned data are used in the subsequent classification task. Since existing multi-view multi-label algorithms based on a shared subspace representation struggle to capture all the semantic information of multi-view data, a multi-view multi-label classification model based on a "common-individual" semantic representation is designed, which emphasizes the contribution of individual views to specific semantics and promotes the semantic expression of samples from rare classes.
Summary of the invention
The technical problem solved by the present invention is to propose a non-aligned multi-view multi-label learning method based on a graph matching mechanism that classifies non-aligned multi-view multi-label data and, through comprehensive semantic expression, guarantees the efficiency and accuracy of the method.
The technical solution of the present invention is a non-aligned multi-view multi-label learning method based on a graph matching mechanism. First, non-aligned multi-view multi-label data are acquired, stored, preprocessed, and divided into data sets to form a sample data set. From the training data in the sample data set, a feature matrix and an observable label matrix are constructed. The non-aligned data are then aligned using these matrices: 1) the non-aligned data are explicitly aligned through permutation matrices, i.e. a point-to-point first-order alignment between samples; 2) the distance matrices of the samples in different views are used to perform a second-order alignment of the graph structures of the views, further improving the alignment accuracy of the model. The "commonality-individuality" expression between the aligned views is then mined, and the cross-view consistency and complementarity are used to build a non-aligned multi-view multi-label learning model based on the graph matching mechanism. The model is trained by alternating optimization until convergence, yielding a classification predictor. Finally, the converged classifier predicts the test set, and label classification results are obtained from the output probabilities. The specific steps are as follows:
In the present invention, matrices are written in bold capital letters, e.g. X, and vectors in bold lowercase letters, e.g. x. (X·R) denotes the matrix obtained from the product X·R, where · is matrix multiplication. The inverse and transpose of a matrix X are written X^{-1} and X^T. X^v denotes the feature matrix of the v-th view; the i-th column and j-th row of X^v are written (X^v)_{:,i} and (X^v)_{j,:}, and (X^v)_{i,j} is the (i, j) element of X^v. x_i denotes the i-th element of a vector x. In addition, R denotes the field of real numbers, and ||·||_F denotes the Frobenius norm.
Step S1: acquire non-aligned multi-view multi-label data, then store, preprocess, and divide them into data sets. Because this problem is new and no public non-aligned data set currently exists, synthetic data sets are used: starting from six public multi-view multi-label data sets, the instances are randomly shuffled so that instances in different views describe different objects, producing non-aligned multi-view multi-label data sets.
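The shuffling used to build the synthetic non-aligned data sets can be sketched as follows (a minimal NumPy illustration; the array shapes, the fixed seed, and the choice to keep the first view in its original order are assumptions of the sketch, not details fixed by the method):

```python
import numpy as np

def make_non_aligned(views, seed=0):
    """Randomly permute the instances of every view except the first,
    so that row i no longer describes the same object in all views."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    shuffled = [views[0]]          # reference view kept in original order
    perms = [np.arange(n)]         # identity permutation for the reference
    for X in views[1:]:
        p = rng.permutation(n)
        shuffled.append(X[p])      # instance order is now scrambled
        perms.append(p)
    return shuffled, perms

# toy example: three views of 100 samples with different dimensions
views = [np.random.rand(100, d) for d in (8, 12, 5)]
non_aligned, perms = make_non_aligned(views)
```

Keeping the permutations around is useful only for verifying a recovered alignment; the learning method itself never sees them.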
From the training data, construct a sample data set with V views {X^v}_{v=1}^V, where X^v ∈ R^{n×d_v} is the complete feature space of the v-th view, n is the number of training samples x_i, and d_v is the feature dimension of each sample in view v. Y ∈ {0,1}^{n×q} is the label space corresponding to the feature space, where y_i ∈ {0,1}^q is the label vector of x_i and q is the number of labels.
Step S2: from the feature matrices X^v and the observable label matrix Y built from the training data in step S1, construct first-order and second-order cross-view relation matching to align the features of the multiple views, and on this basis build a non-aligned multi-view multi-label classification model with a "common-individual" semantic representation. The specific steps are:
(a) Use permutation matrices to explicitly align the non-aligned data, performing a point-to-point first-order alignment between instances. After first-order alignment, a relatively correct mapping between instances is obtained, and non-negative matrix factorization is applied to extract a common low-dimensional representation of the different views. Based on this common representation and the observable label matrix, a feature mapping matrix W_0 is introduced to build a linear mapping from the shared subspace to the label space, giving the initial learning model:
min_{M^v, P, H^v, W_0} Σ_{v=1}^V ||M^v X^v − P H^v||_F^2 + α ||P W_0 − Y||_F^2 + γ ||W_0||_F^2
s.t. (M^v)_{i,j} ∈ {0,1}, M^v 1 = 1, 1^T M^v = 1^T, P ≥ 0, H^v ≥ 0, W_0 ≥ 0.  (1)
Here M^v ∈ {0,1}^{n×n} is the permutation matrix of the v-th view; multiplying the feature matrix by the permutation matrix yields the aligned multi-view data. P and H^v are obtained by non-negative matrix factorization of the aligned data: H^v ∈ R^{k×d_v} is the individual mapping matrix of the v-th view, and P ∈ R^{n×k} is the shared subspace, where k is the desired reduced dimension of the data; P ≥ 0 and H^v ≥ 0 are non-negativity constraints. W_0 ∈ R^{k×q} is the coefficient matrix corresponding to P, so W_0 ≥ 0 is also imposed. Through the mapping from the shared subspace P to Y, W_0 learns the common information of the heterogeneous views. α and γ are two hyperparameters. The last term is a regularization of W_0, introduced to avoid overfitting and reduce the influence of noisy features.
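The cost of a candidate solution to the initial model can be evaluated directly from the terms just described. The sketch below is a non-authoritative reading of formula (1) — aligned-view reconstruction, label fitting, and regularization — and assumes NumPy arrays with the stated shapes:

```python
import numpy as np

def initial_model_loss(Ms, Xs, P, Hs, W0, Y, alpha, gamma):
    """Sum of the aligned-reconstruction terms ||M^v X^v - P H^v||_F^2,
    the label-fitting term alpha * ||P W0 - Y||_F^2, and the
    regularization gamma * ||W0||_F^2."""
    rec = sum(np.linalg.norm(M @ X - P @ H, 'fro') ** 2
              for M, X, H in zip(Ms, Xs, Hs))
    fit = alpha * np.linalg.norm(P @ W0 - Y, 'fro') ** 2
    reg = gamma * np.linalg.norm(W0, 'fro') ** 2
    return rec + fit + reg
```

Such a function is handy as the convergence monitor for the alternating updates of step S3.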
(b) Since the individual mapping matrices H^v actually encode the characteristics of a single view, another set of coefficient matrices W^v ∈ R^{d_v×q} is defined to capture the unique characteristics of each view, where W^v maps the reconstructed feature matrix P H^v, which carries the individual information of the v-th view, to the label space. Using both the individual and common information of the heterogeneous views, formula (1) is extended as follows:
min_{M^v, P, H^v, W_0, W^v} Σ_{v=1}^V ||M^v X^v − P H^v||_F^2 + α (||P W_0 − Y||_F^2 + Σ_{v=1}^V ||P H^v W^v − Y||_F^2) + γ (||W_0||_F^2 + Σ_{v=1}^V ||W^v||_F^2)
s.t. (M^v)_{i,j} ∈ {0,1}, M^v 1 = 1, 1^T M^v = 1^T, P ≥ 0, H^v ≥ 0, W_0 ≥ 0, W^v ≥ 0.  (2)
Here P can be regarded as a dictionary matrix containing the information of all views, while H^v holds the view-specific encoding coefficients. P H^v therefore captures the individual information of a specific view, while P captures the information shared by all views; together, P H^v and P express the "commonality-individuality" of the multi-view data.
(c) Next, the alignment of the second-order graph structure between views and the label correlations are considered. M^v X^v and M^j X^j denote the correctly aligned versions of views X^v and X^j, and a cross-view mapping matrix represents the matching degree of samples across the two views. Based on the common understanding that the graph structures formed after aligning multi-view data should be as consistent as possible, a structure matching loss term is built to explore the correct cross-view mapping. For each view, a distance matrix S^v between its samples is built, representing the graph structure of view v; the cross-view mapping matrix permutes the distance matrices S^v and S^j so that the graph connection structures of the two views become as similar as possible. A denotes the label correlation matrix, through which the correlations in the multi-label data are exploited to extract known related label information. A hyperparameter β is introduced to balance the weight of the second-order alignment. The final non-aligned multi-view multi-label learning model based on the graph matching mechanism is then expressed as follows:
s.t. (M^v)_{i,j} ∈ {0,1}, M^v 1 = 1, 1^T M^v = 1^T, P ≥ 0, H^v ≥ 0, W_0 ≥ 0, W^v ≥ 0.  (3)
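The per-view graph structure used for second-order alignment can be built as a plain pairwise distance matrix. The sketch below also shows one plausible form of the structure mismatch between two views under a candidate mapping; since the patent's exact loss expression is not reproduced in the text, both function names and the mismatch form are illustrative assumptions:

```python
import numpy as np

def distance_matrix(X):
    """Pairwise Euclidean distances between the samples (rows) of one view;
    this matrix S^v plays the role of the view's graph structure."""
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.sqrt(d2)

def structure_mismatch(Sv, Sj, M):
    """Discrepancy between the graph of view v and the graph of view j
    permuted by a candidate cross-view mapping M (illustrative form)."""
    return np.linalg.norm(Sv - M @ Sj @ M.T, 'fro') ** 2
```

When M is the correct permutation, the permuted graph of view j coincides with the graph of view v and the mismatch drops to zero, which is the intuition behind the structure matching loss term.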
Step S3: simplify the expression of the non-aligned multi-view multi-label learning model based on the graph matching mechanism from step S2, and train it by alternating optimization, minimizing the model until convergence to obtain the classification predictor. The specific steps are:
(a) Because the constraint on M^v is non-convex, the optimal solution is hard to obtain. Since orthogonal transformations do not change the relations between vectors, a less strict constraint can still preserve the structure of the data, so the constraint on M^v is relaxed to M^v ≥ 0, M^v (M^v)^T = I.
(b) By introducing the Lagrange multipliers λ, Φ, Θ, Ω, Ψ, the objective function is transformed into an unconstrained problem; taking view v as an example, objective (3) is rewritten on view v in this unconstrained Lagrangian form.
(c) Iteratively optimize M^v. With P, H^v, A, W_0, and W^v fixed, the computation of M^v is independent of M^{v'}, v' ≠ v, so M^v is optimized separately for each view v.
The standard way to solve the coupled equations arising from (3) and its constraints is a nonlinear method such as Newton's method, but such nonlinear systems are usually hard to solve, so an approximate solution is sought instead, which yields a multiplicative iterative update rule for M^v.
(d) Iteratively optimize P. With M^v, H^v, A, W_0, and W^v fixed, take the derivative of objective (3) with respect to P; applying the KKT condition Φ_{i,j} P_{i,j} = 0 then yields a multiplicative iterative update rule for P.
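The KKT-based multiplicative rules in steps (c)-(h) follow the standard non-negative matrix factorization pattern, in which each factor is rescaled by the ratio of the negative to the positive part of its gradient so that non-negativity is preserved. The sketch below shows this pattern for the plain two-factor problem min ||X − P H||_F^2; it is the classical Lee-Seung update, not the patent's exact rule, whose expression is not reproduced in the text:

```python
import numpy as np

def nmf_multiplicative(X, k, iters=300, eps=1e-9, seed=0):
    """Multiplicative updates for min ||X - P H||_F^2 with P, H >= 0."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    P = rng.random((n, k))
    H = rng.random((k, d))
    for _ in range(iters):
        # each factor is scaled by (negative part) / (positive part)
        # of its gradient, which keeps all entries non-negative
        H *= (P.T @ X) / (P.T @ P @ H + eps)
        P *= (X @ H.T) / (P @ H @ H.T + eps)
    return P, H
```

Because every update is a non-negative rescaling, no projection step is needed, which is why the same pattern is convenient for the coupled updates of P, H^v, W_0, and W^v.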
(e) Iteratively optimize A. With M^v, P, H^v, W_0, and W^v fixed, take the derivative of objective (3) with respect to A; setting the derivative to zero yields a closed-form update rule for A.
(f) Iteratively optimize H^v. With M^v, P, A, W_0, and W^v fixed, the computation of H^v is independent of H^{v'}, v' ≠ v, so H^v is optimized separately for each view v: take the derivative of objective (3) with respect to H^v; applying the KKT condition Θ_{i,j} (H^v)_{i,j} = 0 then yields a multiplicative iterative update rule for H^v.
(g) Iteratively optimize W_0. With M^v, P, A, H^v, and W^v fixed, take the derivative of objective (3) with respect to W_0; applying the KKT condition Ω_{i,j} (W_0)_{i,j} = 0 then yields a multiplicative iterative update rule for W_0.
(h) Iteratively optimize W^v. With M^v, P, A, H^v, and W_0 fixed, take the derivative of objective (3) with respect to W^v; applying the KKT condition Ψ_{i,j} (W^v)_{i,j} = 0 then yields a multiplicative iterative update rule for W^v.
(i) Repeat (c) to (h), alternately updating M^v, P, A, H^v, W_0, and W^v until the iteration stopping condition is met. When the objective function has converged, output the optimal parameters of the non-aligned multi-view multi-label learning model based on the graph matching mechanism, giving the classification predictor.
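The alternating procedure of step S3 can be organized as a generic loop; the sketch below uses placeholder update functions standing in for the rules derived in (c)-(h), with the stopping condition described in the text (change in objective below a threshold, or a maximum number of iterations):

```python
def alternate_optimize(params, loss_fn, update_fns, tol=1e-6, max_iter=500):
    """Cycle through the block updates until the objective stops changing."""
    prev = loss_fn(params)
    for _ in range(max_iter):
        for update in update_fns:      # e.g. M^v, P, A, H^v, W_0, W^v in turn
            params = update(params)
        cur = loss_fn(params)
        if abs(prev - cur) < tol:      # convergence test on the objective
            break
        prev = cur
    return params

# toy usage: minimize (x - 1)^2 + (y - 2)^2 by two coordinate updates
loss = lambda p: (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2
result = alternate_optimize((0.0, 0.0), loss,
                            [lambda p: (1.0, p[1]), lambda p: (p[0], 2.0)])
```

Each block update decreases (or leaves unchanged) the objective, so monitoring the objective value is a simple and sufficient convergence test.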
Step S4: based on the converged non-aligned multi-view multi-label learning model, predict the test set and obtain label prediction results from the output probabilities; the label prediction matrix is given by mapping the (aligned) test features through the learned model.
Compared with the prior art, the advantages of the present invention are:
1. For the view non-alignment problem in multi-view multi-label learning tasks, first-order and second-order alignment are proposed. The reordering matrix adaptively reorders the features in each view, and first-order alignment yields the correct mapping, so the view non-alignment problem reduces to a view alignment problem. In addition, the distance matrices of the samples in different views are used for a structural second-order alignment of the views, improving the efficiency and accuracy of the alignment.
2. For the problem of comprehensive multi-view multi-label semantic expression, the method of the present invention jointly exploits the consistency and diversity of multi-view multi-label data. The model learns a shared subspace from the different views, the label correlations, and an ensemble classifier based on the individual and shared feature spaces. The aligned data are fed into this multi-view multi-label classification model based on the "common-individual" semantic representation, which emphasizes the contribution of individual views to specific semantics and promotes the semantic expression of samples from rare classes.
3. A dynamic label correlation matrix A is introduced to learn the latent correlations among labels. Although a fixed label correlation matrix could be estimated from the known label matrix, using only the known labeled samples may be insufficient; the proposed dynamic label correlation matrix adaptively measures the correlations between labels and helps improve the learning performance of the multi-label classification model.
4. The model is reduced to a general form and an iterative optimization method for solving the objective function is proposed, seeking an approximate solution of the model within a certain time complexity. The effectiveness of the model is then verified on six real-world data sets.
Brief description of the drawings
Figure 1 is the processing flow chart of the method of the present invention.
Figure 2 is the training workflow chart of the method of the present invention.
Detailed description of embodiments
To enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below in conjunction with the drawings and implementations.
As shown in Figure 1, the present invention comprises the following steps:
1. Acquire non-aligned multi-view multi-label data; store, preprocess, and divide the data into data sets, and construct the feature matrices X^v and the observable label matrix Y. Because this problem is new and no public non-aligned data set exists, synthetic data sets are used: starting from six public multi-view multi-label data sets, instances are randomly shuffled so that instances in different views describe different objects, producing non-aligned multi-view multi-label data sets. From the training data, a sample data set with V views {X^v}_{v=1}^V is constructed, and Y is the label space corresponding to the feature set.
2. From the feature matrices X^v and the observable label matrix Y built from the training data in step 1, construct first-order and second-order cross-view relation matching to align the features of the multiple views, and on this basis build the non-aligned multi-view multi-label classification model based on the "common-individual" semantic representation. Alternating optimization training then minimizes the model until convergence, yielding the classification predictor. The objective function is formula (3) above.
The specific steps are as follows:
(a) Input: the feature matrices X^v; the observable label matrix Y; the dimension of the shared subspace; the hyperparameters of formula (6); the convergence threshold; and the number of iterations. The last four inputs may be changed for different data sets to achieve better results.
(b) Randomly initialize M^v, H^v, W^v, P, W_0, and A, and construct the adjacency matrix S^a of each view from its feature matrix.
(c) Alternately and iteratively optimize M^v, P, A, H^v, W_0, and W^v according to formulas (5), (7), (9), (11), (13), and (15), respectively, until the iteration stopping condition is met; the stopping condition can be that the difference of the objective value between two iterations is smaller than the convergence threshold, or that the maximum number of iterations is reached. Finally, output the optimal solution of the objective function, giving the classifier of the non-aligned multi-view multi-label learning model based on the graph matching mechanism.
3. Based on the converged non-aligned multi-view multi-label learning model, predict the test set and obtain label prediction results from the output probabilities: the label prediction matrix is given by the learned mapping. To obtain precise label information, a threshold is set; elements of the prediction vector above the threshold are set to 1, meaning the label is assigned to the sample, and elements below the threshold are set to 0, meaning it is not. The threshold can generally be taken as 0.5, although different data sets often call for different values.
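The thresholding rule above can be written in one line; the sketch below assumes the prediction scores arrive as a NumPy array (the 0.5 default matches the value suggested in the text):

```python
import numpy as np

def binarize_predictions(scores, threshold=0.5):
    """Set entries above the threshold to 1 (label present), else 0."""
    return (scores >= threshold).astype(int)

scores = np.array([[0.9, 0.3, 0.6],
                   [0.2, 0.8, 0.4]])
labels = binarize_predictions(scores)   # [[1, 0, 1], [0, 1, 0]]
```

In practice the threshold would be tuned per data set, e.g. on a validation split.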
The present invention is evaluated on six real-world data sets for an in-depth experimental study. All six multi-view data sets used in the experiments are public; their statistics are summarized in Table 1, which for each data set lists the number of samples (n), the number of views (m), the number of distinct labels (c), the average number of labels per sample (#avg), and the minimum dimensionality over all views (d_min).
Table 1. Statistics of the six multi-view data sets
Emotions is a music data set whose two views correspond to the rhythm and timbre features of a piece of music; Yeast is a biological data set whose two views correspond to the genetic expression and phylogenetic profile of a gene; Corel5k, Pascal07, ESPGame, and Mirflicker are four widely used multi-view image data sets, for which multiple features were collected so that each image is represented by six representative views oriented to different application settings: HUE, SIFT, GIST, HSV, RGB, and LAB. By randomly shuffling the instances so that instances in different views describe different objects, six non-aligned multi-view multi-label data sets are obtained. To verify the effectiveness of the proposed method MCGM, it is compared with the following six multi-label methods: two single-view multi-label methods, which use a concatenation strategy to turn the multi-view data sets into single-view data sets before the experiments, while the remaining methods are multi-view multi-label learning methods.
Comparative methods include single-view multi-label learning methods MLkNN and LLSF, which were published in the top journals in the field of computer vision 2007PR and the top conference in the field of data mining 2016TKDE. The multi-view multi-label learning methods FIMAN, ICM2L, iMvWL and BEMVL were published in the top journal in the field of data mining 2020SIGKDD, the top journal in the field of artificial intelligence 2019TCYB, the top conference in the field of artificial intelligence 2018IJCAI, and the top conference in the field of data mining 2022TKDD. This method uses five evaluation metrics that are widely used in multi-label learning to measure the performance of each algorithm. Specific evaluation indicators include Average Precision, Coverage, Hamming Loss, One Error and rankingLoss. The mean and standard deviation of each metric for each data set are shown in Tables 2 to 7. It should be noted that the present invention displays the value of 1-Ranking Loss in the table.
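The construction of the non-aligned variants described above can be sketched as follows. This is a minimal illustration, not the patent's exact protocol: the helper name and the choice to keep the first view (and hence the label matrix) in place while the other views are shuffled are assumptions.

```python
import numpy as np

def make_non_aligned(views, seed=0):
    """Independently permute the rows of every view except the first,
    so that row i of different views no longer describes the same
    object (hypothetical helper; which view, if any, stays aligned
    with the labels is an assumption here)."""
    rng = np.random.default_rng(seed)
    out = [views[0]]  # first view kept aligned with the label matrix
    for V in views[1:]:
        perm = rng.permutation(V.shape[0])
        out.append(V[perm])
    return out
```

Each shuffled view still contains exactly the same instance descriptions, only in a different row order, which is what makes the cross-view correspondence unknown to the learner.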
Table 2 Experimental results on Emotions (mean ± standard deviation)
Table 3 Experimental results on Yeast (mean ± standard deviation)
Table 4 Experimental results on Corel5k (mean ± standard deviation)
Table 5 Experimental results on Pascal07 (mean ± standard deviation)
Table 6 Experimental results on ESPGame (mean ± standard deviation)
Table 7 Experimental results on Mirflicker (mean ± standard deviation)
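As a reference for the five metrics reported in the tables above, the following is a minimal NumPy sketch of their standard multi-label definitions (Y is the binary ground-truth matrix, S the real-valued score matrix, P a binary prediction matrix; tie handling is simplified relative to common toolkit implementations):

```python
import numpy as np

def one_error(Y, S):
    # fraction of samples whose top-ranked label is not relevant
    top = S.argmax(axis=1)
    return float(np.mean(Y[np.arange(len(Y)), top] == 0))

def hamming_loss(Y, P):
    # fraction of individual label assignments that are wrong
    return float(np.mean(Y != P))

def coverage(Y, S):
    # average depth of the ranking needed to cover all relevant labels (0-based)
    ranks = (-S).argsort(axis=1).argsort(axis=1)  # 0 = highest-scored label
    return float(np.mean([ranks[i][Y[i] == 1].max() for i in range(len(Y))]))

def ranking_loss(Y, S):
    # average fraction of (relevant, irrelevant) label pairs ordered wrongly
    losses = []
    for y, s in zip(Y, S):
        pos, neg = s[y == 1], s[y == 0]
        if len(pos) and len(neg):
            losses.append(np.mean(pos[:, None] <= neg[None, :]))
    return float(np.mean(losses))

def average_precision(Y, S):
    # label-ranking average precision over the relevant labels of each sample
    ranks = (-S).argsort(axis=1).argsort(axis=1) + 1  # 1-based ranks
    aps = []
    for y, r in zip(Y, ranks):
        rel = r[y == 1]
        if len(rel):
            aps.append(np.mean([np.sum(rel <= rj) / rj for rj in rel]))
    return float(np.mean(aps))
```

Since the tables report 1 - Ranking Loss, higher is better for every column except Coverage and Hamming Loss in their raw form.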
From the results reported in Tables 2 to 7, it can be observed that MCGM outperforms the comparison methods in most cases, on both large and small datasets. Across the 30 experimental settings (6 datasets × 5 evaluation metrics), the proposed method ranks first in 57% and second in 40% of the cases, and no competing method is significantly better than it on any metric.
The comparison of MCGM with LLSF and MLkNN shows that turning traditional multi-label methods into multi-view multi-label learners through a simple concatenation strategy is flawed, mainly because it ignores the consistency across views and the mining of their complementary information; that is, it discards the physical meaning of the individual views in the dataset.
The comparison of MCGM with FIMAN, ICM2L, BEMVL and iMvWL shows that the proposed method performs well on the non-aligned view problem. Since the other algorithms do not account for view misalignment, they are deficient when the views are not aligned. Among them, iMvWL also ignores the diversity of the views, which limits its ability to extract view-specific information.
It should be noted that the method of the embodiments of the present invention is applicable to non-aligned multi-view multi-label classification problems.
The embodiments of the present invention have been described in detail above, and specific implementations have been used herein to illustrate the invention; the description of the above embodiments is only intended to help understand the method of the present invention. Meanwhile, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. In summary, the contents of this description should not be construed as limiting the invention.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311195295.8A CN117409456A (en) | 2023-09-16 | 2023-09-16 | Non-aligned multi-view multi-mark learning method based on graph matching mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117409456A true CN117409456A (en) | 2024-01-16 |
Family
ID=89487860
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311195295.8A Pending CN117409456A (en) | 2023-09-16 | 2023-09-16 | Non-aligned multi-view multi-mark learning method based on graph matching mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117409456A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117690192A (en) * | 2024-02-02 | 2024-03-12 | 天度(厦门)科技股份有限公司 | Abnormal behavior identification method and equipment for multi-view instance-semantic consensus mining |
CN117690192B (en) * | 2024-02-02 | 2024-04-26 | 天度(厦门)科技股份有限公司 | Abnormal behavior identification method and equipment for multi-view instance-semantic consensus mining |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||