CN117409456A

CN117409456A - Non-aligned multi-view multi-mark learning method based on graph matching mechanism

Info

Publication number: CN117409456A
Application number: CN202311195295.8A
Authority: CN
Inventors: 杨震; 钟淇宇; 吕庚育
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2023-09-16
Filing date: 2023-09-16
Publication date: 2024-01-16

Abstract

The invention discloses a non-aligned multi-view multi-mark learning method based on a graph matching mechanism. Based on training data in the sample data set, a feature matrix and an observable mark matrix are constructed. Explicit alignment is carried out on unaligned data through a permutation matrix, namely point-to-point first-order alignment among samples; and carrying out second-order alignment on the graph structure on the view by using distance matrixes of samples in different views, so that the alignment accuracy of the model is further improved. The expression of 'commonality-personality' among aligned data views is mined, and a non-aligned multi-view multi-label learning model based on a graph matching mechanism is constructed by utilizing consistency and complementarity of cross views. Training the model by an alternative optimization method until the model converges to obtain the classification predictor.

Description

Unaligned multi-view multi-label learning method based on graph matching mechanism

技术领域Technical field

本发明涉及非负矩阵分解、图匹配技术、非对齐多视图学习、多标记分类，具体涉及一种基于图匹配机制的非对齐多视图多标记学习方法。The invention relates to non-negative matrix decomposition, graph matching technology, non-aligned multi-view learning, and multi-label classification, and specifically relates to a non-aligned multi-view multi-label learning method based on a graph matching mechanism.

背景技术Background technique

大规模的互联网发展，大数据与人工智能技术应用的普及，随之也带来了海量的多视图多标记数据。多视图多标记学习作为解决此类问题的主要架构得到了广泛的关注。在多视图多标记学习任务中，每个样本对象由多种异构的视图信息进行表示，同时标注有若干相关联的标记。然而现有的多视图多标记学习方法通常是探索跨视图的相关信息和互补信息。这些信息的探寻通常是基于视图对齐关系(不同视图中的实例描述同一对象)。然而，由于空间、时间或时空异步导致，这种视图对齐关系可以变为部分视图对齐或视图非对齐关系。如：在视频推荐中，标记数据来自不同的视频软件，但由于用户的隐私保护原则，不能将这些数据与同一用户进行匹配和对齐。在人脸识别领域，由于人脸特征检测失败，多视图人脸无法对齐，这会导致人脸表情识别无法进行。现有多视图多标记学习模型无法直接从这些非对齐数据中学习到鲁棒的多标记分类模型。The large-scale development of the Internet and the popularization of big data and artificial intelligence technology applications have also brought massive multi-view and multi-label data. Multi-view multi-label learning has received widespread attention as the main architecture for solving such problems. In the multi-view multi-label learning task, each sample object is represented by a variety of heterogeneous view information and is marked with several associated labels. However, existing multi-view multi-label learning methods usually explore relevant and complementary information across views. The search for this information is usually based on view alignment relationships (instances in different views describe the same object). However, this view alignment relationship can become a partial view alignment or a view non-alignment relationship due to spatial, temporal, or spatiotemporal asynchrony. For example: in video recommendation, the labeled data comes from different video software, but due to the user's privacy protection principle, these data cannot be matched and aligned with the same user. In the field of face recognition, due to the failure of facial feature detection, multi-view faces cannot be aligned, which makes facial expression recognition impossible. Existing multi-view multi-label learning models cannot directly learn robust multi-label classification models from these non-aligned data.

为此，本发明提出一种基于图匹配机制的非对齐多视图多标记学习方法(non-aligned multi-view multi-label classification via graph matching mechanism，简称MCGM)以解决多视图多标记学习中的试图非对齐问题以及语义全面表达问题。针对多视图数据存在的非对齐问题，通过挖掘跨视图“实例-实例”和“实例关系-实例关系”图匹配关系，实现同一实例在不同视图下特征节点的精准对齐，并用于后续分类任务；针对现有基于共享子空间表示的多视图多标记学习算法难以刻画多视图数据的全部语义信息，设计一种基于“共体-单体”语义表示的多视图多标记分类模型，强调单体视图在特定语义表达的贡献，促进少类别样本的语义表达。To this end, the present invention proposes a non-aligned multi-view multi-label classification via graph matching mechanism (MCGM for short) based on a graph matching mechanism to solve the problems in multi-view multi-label learning. Non-alignment issues and semantic comprehensive expression issues. In order to solve the non-alignment problem of multi-view data, by mining the cross-view "instance-instance" and "instance relationship-instance relationship" graph matching relationships, the feature nodes of the same instance in different views can be accurately aligned, and used for subsequent classification tasks; As the existing multi-view multi-label learning algorithm based on shared subspace representation is difficult to describe all the semantic information of multi-view data, a multi-view multi-label classification model based on "community-single" semantic representation is designed, emphasizing the single view. Contribution to specific semantic expressions, promoting semantic expression of few-category samples.

发明内容Contents of the invention

本发明的技术解决问题是：提出一种基于图匹配机制的非对齐多视图多标记学习方法，实现对非对齐多视图多标记数据的分类，并通过语义全面表达保证方法的效率和准确性。The technical problem solved by the present invention is to propose a non-aligned multi-view multi-label learning method based on a graph matching mechanism, realize the classification of non-aligned multi-view multi-label data, and ensure the efficiency and accuracy of the method through comprehensive semantic expression.

本发明的技术解决方案为：一种基于图匹配机制的非对齐多视图多标记学习方法，首先获取非对齐多视图多标记数据，对非对齐多视图多标记数据进行存储、预处理和数据集划分，形成样本数据集。基于样本数据集中的训练数据，构建特征矩阵、可观测标记矩阵。根据特征矩阵、可观测标记矩阵非对齐数据对齐：1)通过置换矩阵对非对齐数据进行显式的对齐，即样本之间点对点的一阶对齐；2)利用不同视图中样本的距离矩阵对视图进行图结构上的二阶对齐，从而进一步提高模型的对齐准确性。挖掘对齐的数据视图间“共性-个性”的表达，利用跨视图的一致性和互补性构建基于图匹配机制的非对齐多视图多标记学习模型。通过交替优化方法对模型进行训练，直至模型收敛，得到分类预测器。基于收敛后的分类器对测试集进行预测，利用输出概率获得标记分类结果。其具体步骤如下：The technical solution of the present invention is: a non-aligned multi-view multi-label learning method based on a graph matching mechanism. First, the non-aligned multi-view multi-label data is obtained, and the non-aligned multi-view multi-label data is stored, preprocessed and data set. Divide to form a sample data set. Based on the training data in the sample data set, a feature matrix and an observable label matrix are constructed. Alignment of non-aligned data according to the feature matrix and observable label matrix: 1) Explicitly align the non-aligned data through the permutation matrix, that is, point-to-point first-order alignment between samples; 2) Use the distance matrix of samples in different views to align the views Perform second-order alignment on the graph structure to further improve the alignment accuracy of the model. Mining the expression of "commonality-individuality" between aligned data views, and utilizing the consistency and complementarity across views to build a non-aligned multi-view multi-label learning model based on the graph matching mechanism. The model is trained through the alternating optimization method until the model converges, and a classification predictor is obtained. Predict the test set based on the converged classifier, and use the output probability to obtain the label classification result. The specific steps are as follows:

在本发明中，矩阵由加粗大写字母表示，如X。向量由加粗小写字母表示，如x。另外，(XR)表示由X.R得到的矩阵，其中·为矩阵乘法。矩阵X的逆和转置分别表示为X^-1，X^T。X_v表示第v个视图的特征矩阵，X_v的第i列和第j行分别记为(X_v)_：,i和(X_v)_j，：。(X_v)_i，j是X_v的(i，j)元素，x_i表示向量x的第i个元素。另外，我们用代表实数域，弗罗贝尼乌斯(Frobenius)范数记为/> In the present invention, matrices are represented by bold capital letters, such as X. Vectors are represented by bold lowercase letters, such as x. In addition, (XR) represents the matrix obtained from XR, where · is matrix multiplication. The inverse and transpose of matrix X are represented as X ^-1 and X ^T respectively. X _v represents the feature matrix of the v-th view, and the i-th column and j-th row of X _v are recorded as (X _v ) _:,i and (X _v ) _j,: respectively. (X _v ) _{i, j} is the (i, j) element of X _v , and _xi represents the i-th element of vector x. Additionally, we use Represents the field of real numbers, and the Frobenius norm is expressed as/>

步骤S1，获取非对齐多视图多标记数据，对非对齐多视图多标记数据进行存储、预处理和数据集划分。由于该问题是一个全新的问题，目前没有公开的非对齐数据集，因此采用人工合成数据集。具体地，以6个公开的多视图多标记数据集为基础，通过随机打乱实例，使得不同视图中的实例描述不同对象。获得非对齐多视图多标记数据集。Step S1: Obtain non-aligned multi-view multi-label data, store, pre-process and divide the data set into non-aligned multi-view multi-label data. Since this problem is a completely new one and there is currently no public non-aligned dataset, a synthetic dataset is used. Specifically, based on 6 public multi-view multi-label data sets, instances are randomly shuffled so that instances in different views describe different objects. Obtain non-aligned multi-view multi-label datasets.

用训练数据构建一个具有V个视图样本数据集其中/> 是第v个视图的完整特征空间，n代表训练样本x_i的数量，d_v表示每个样本的维度。表示与特征空间对应的标记空间。其中y_i∈{0，1}^n×q为x_i的标记向量，q表示标记数。Construct a sample data set with V views using the training data Among them/> is the complete feature space of the v-th view, n represents the number of training samples _xi , and d _v represents the dimension of each sample. Represents the label space corresponding to the feature space. Among them, _yi ∈ {0, 1} ^n×q is the tag vector of _xi , and q represents the number of tags.

步骤S2，对步骤S1中获取样本数据集中训练数据构建的特征矩阵X_v，可观测标记矩阵Y构建跨视图的一阶和二阶关系匹配，实现多个视图的特征对齐，并基于此构建基于“共体-单体”语义表示的非对齐多视图多标记分类模型。具体步骤包括：Step S2: Construct the _first -order and second-order relationship matching across views for the feature matrix A non-aligned multi-view multi-label classification model with "community-single" semantic representation. Specific steps include:

(a)利用置换矩阵对非对齐数据进行显式的对齐，进行实例之间点对点的一阶对齐。经过一阶对齐后，得到了实例间相对正确的映射关系，采用非负矩阵分解，提取不同视图数据的共同低维表示。基于获得的共同表示和可观测标记矩阵引入特征映射矩阵W₀构建共享子空间到标记空间的线形映射关系，获得初始学习模型：(a) Use the permutation matrix to explicitly align the non-aligned data and perform point-to-point first-order alignment between instances. After first-order alignment, a relatively correct mapping relationship between instances is obtained, and non-negative matrix decomposition is used to extract a common low-dimensional representation of data from different views. Based on the obtained common representation and observable label matrix, the feature mapping matrix W ₀ is introduced to construct a linear mapping relationship from the shared subspace to the label space, and the initial learning model is obtained:

s.t.(M_v)_i，j∈{0，1}，M_v1＝1，1^TM_v＝1^T，P≥0，H_v≥0，W₀≥0#(1)st(M _v ) _{i, j} ∈ {0, 1}, M _v 1＝1, 1 ^T M _v ＝1 ^T , P≥0, H _v ≥0, W ₀ ≥0#(1)

其中，是第v个视图的置换矩阵，特征矩阵经过置换矩阵相乘得到对齐后的多视图数据。P和H_v是对齐后的数据经过非负矩分解得到的。/>表示第v个视图的个体映射矩阵；/>是共享子空间，其中k是期望的数据降低维数后的大小；P≥0和H_v≥0是矩阵的非负约束。/>是P对应的系数矩阵，因此也约束W₀≥0。W₀通过共享子空间P到Y的映射关系，学习异构视图的共性信息。α和γ是两个超参数。最后一项是是对W₀进行正则化约束，是为了避免过拟合问题，减少噪声特征的影响。in, is the permutation matrix of the v-th view. The feature matrix is multiplied by the permutation matrix to obtain the aligned multi-view data. P and H _v are obtained by non-negative moment decomposition of the aligned data. /> Represents the individual mapping matrix of the v-th view;/> is a shared subspace, where k is the size of the expected data after reducing the dimension; P ≥ 0 and H _v ≥ 0 are non-negative constraints of the matrix. /> is the coefficient matrix corresponding to P, so W ₀ ≥ 0 is also constrained. W ₀ learns the common information of heterogeneous views by sharing the mapping relationship from subspace P to Y. α and γ are two hyperparameters. The last item is to regularize W ₀ to avoid over-fitting problems and reduce the influence of noise features.

(b)考虑到个体映射矩阵实际上是用于编码单个视图的相应的特性，定义了另一组系数矩阵/>来捕获单个视图的独特特性，其中，/>将带有第v个视图个性信息的重构特征矩阵PH_v映射到标记空间。利用异构视图的个性和共性信息，并进一步扩展公式(1)如下：(b) Considering the individual mapping matrix In fact, another set of coefficient matrices is defined for encoding the corresponding characteristics of a single view/> to capture the unique characteristics of a single view, where/> Map the reconstructed feature matrix PH _v with the v-th view personality information to the label space. Utilize the individuality and commonality information of heterogeneous views and further expand formula (1) as follows:

s.t.(M_v)_i，j∈{0，1}，M_v1＝1，1^TM_v＝1^T，P≥0，H_v≥0，W₀≥0，W_v≥0#(2)st(M _v ) _{i, j} ∈ {0, 1}, M _v 1＝1, 1 ^T M _v ＝1 ^T , P≥0, H _v ≥0, W ₀ ≥0, W _v ≥0#(2 )

其中，P可以看作是包含所有视图信息的字典矩阵，而H_v表示单个视图特定的编码系数。因此，PH_v的目标是捕获特定视图的个体信息，而P则捕获所有视图的共享信息。因此，PH_v和P可以看作是多视图数据的“共性-个性”的表达。Among them, P can be regarded as a dictionary matrix containing all view information, and H _v represents a single view-specific encoding coefficient. Therefore, PH _v aims to capture the individual information of a specific view, while P captures the shared information of all views. Therefore, PH _v and P can be regarded as the expression of the "commonality-individuality" of multi-view data.

(c)进一步考虑视图间二阶图结构的对齐和标记相关性。由于M_vX_v和M_jX_j分别表示视图X_v和视图X_j的正确对齐视图。用跨视图映射矩阵来表示视图X_v和视图X_j之间的跨视图样本的匹配度。基于对多视图数据对齐后形成的图结构应尽可能一致的共同认知，构建了结构匹配损失项来探索正确的跨视图映射关系。为每个视图建立各自样本间的距离矩阵S_v，代表视图v的图结构。通过跨视图映射矩阵/>置换距离矩阵S_v和S_j，使两个视图的图连接结构尽可能相似。/>表示标记相关性矩阵，拟通过矩阵A利用多标记数据中的相关性，提取出已知的相关标记信息。引入超参数β平衡二阶对齐的权重。最后，最终的基于图匹配机制的非对齐多视图多标记学习模型表述如下：(c) Further consider the alignment and labeling correlation of second-order graph structures between views. Since M _v X _v and M _j X _j represent the correctly aligned views of view X _v and view X _j respectively. Mapping matrices across views to represent the matching degree of cross-view samples between view X _v and view X _j . Based on the common understanding that the graph structure formed after multi-view data alignment should be as consistent as possible, a structure matching loss term is constructed to explore the correct cross-view mapping relationship. A distance matrix S _v between respective samples is established for each view, representing the graph structure of view v. By mapping matrices across views/> Replace the distance matrices S _v and S _j to make the graph connection structure of the two views as similar as possible. /> Represents the tag correlation matrix. It is intended to use the correlation in multi-mark data through matrix A to extract known relevant tag information. The hyperparameter β is introduced to balance the weight of the second-order alignment. Finally, the final unaligned multi-view multi-label learning model based on the graph matching mechanism is expressed as follows:

s.t.(M_v)_i，j∈{0，1}，M_v1＝1，1^TM_v＝1^T，P≥0，H_v≥0，W₀≥0，W_v≥0st(M _v ) _{i, j} ∈ {0, 1}, M _v 1＝1, 1 ^T M _v ＝1 ^T , P≥0, H _v ≥0, W ₀ ≥0, W _v ≥0

其中 in

步骤S3，对S2中的基于图匹配机制的非对齐多视图多标记学习模型简化表述，并进行交替优化训练，使模型最小化，直至模型收敛，得到分类预测器。具体步骤包括：Step S3: Simplify the expression of the non-aligned multi-view multi-label learning model based on the graph matching mechanism in S2, and conduct alternate optimization training to minimize the model until the model converges to obtain a classification predictor. Specific steps include:

(a)由于M_v的约束是非凸很难得到最优解。又由于正交变换不会改变向量之间的关系，这意味着不那么严格的约束也可以保留数据的结构。把M_v的约束放宽为：(a) Since the constraints of M _v are non-convex, it is difficult to obtain the optimal solution. And since orthogonal transformations do not change the relationship between vectors, this means that less stringent constraints can also preserve the structure of the data. Relax the constraint of M _v to:

(b)通过引入拉格朗日乘子λ，Φ，Θ，Ω，Ψ，将目标函数转变为无约束问题。以视图υ为例，将目标函数(3)在视图υ上变为以下形式：(b) By introducing Lagrange multipliers λ, Φ, Θ, Ω, Ψ, the objective function is transformed into an unconstrained problem. Taking view υ as an example, the objective function (3) on view υ becomes the following form:

(c)迭代优化固定P，/>A，W₀，/>时，M_v计算独立于M_v′，v′≠v。因此对于每一个视图v，对于M_v单独进行优化。(c) Iterative optimization Fixed P,/> A, W ₀ ,/> When , M _v is calculated independently of M _v′ , v′≠v. Therefore, for each view v, M _v is optimized separately.

标准的方法求解耦合方程等式(3)和约束是使用非线性方法，如牛顿方法。这个非线性方程组通常很难求解。选择寻求一个近似解。通过该方法，可以得到以下关于M_v的迭代更新规则：Standard approach to solving coupled equations Equation (3) and constraints is to use nonlinear methods, such as Newton's method. This nonlinear system of equations is often difficult to solve. Choose to seek an approximate solution. Through this method, the following iterative update rules for M _v can be obtained:

其中：in:

(d)迭代优化P。固定A，W₀，/>时，公式(3)中/>对P求导数，可以得到：(d) Iterative optimization of P. fixed A, W ₀ ,/> When, in formula (3)/> Taking the derivative of P, we can get:

使用KKT条件，即Φ_i，jP_i，j＝0，可以得到以下关于P的迭代更新规则：Using the KKT condition, that is, Φ _{i, j} P _{i, j} = 0, the following iterative update rules for P can be obtained:

(e)迭代优化A。固定P，W₀，/>时，公式(3)中/>对A求导数，可以得到：(e) Iterative optimization A. fixed P, W ₀ ,/> When, in formula (3)/> Taking the derivative of A, we can get:

令导数等于0，可以得到以下关于A更新规则：Let the derivative equal to 0, we can get the following update rules for A:

(f)迭代优化固定P，/>A，W₀，/>时，H_v计算独立于H_v，v′≠v。因此对于每一个视图v，对于H_v单独进行优化，公式(3)中/>对H_v求导数，可以得到：(f)Iterative optimization Fixed P,/> A, W ₀ ,/> When , H _v is calculated independently of H _v , v′≠v. Therefore, for each view v, H _v is optimized separately, in formula (3)/> Taking the derivative of H _v , we can get:

使用KKT条件，即Θ_i，j(H_v)_i，j＝0，可以得到以下关于H_v的迭代更新规则：Using the KKT condition, that is, Θ _{i, j} (H _v ) _{i, j} = 0, the following iterative update rules for H _v can be obtained:

(g)迭代优化W₀。固定P，A，/>时，公式(3)中/>对W₀求导数，可以得到：(g) Iterative optimization of W ₀ . Fixed P, A,/> When, in formula (3)/> Taking the derivative of W ₀ , we can get:

使用KKT条件，即Ω_i，j(W₀)_i，j＝0，可以得到以下关于W₀的迭代更新规则：Using the KKT condition, that is, Ω _i,j (W ₀ ) _i,j = 0, the following iterative update rules for W ₀ can be obtained:

(h)迭代优化固定P，/>A，/>W₀时，公式(3)中/>对W_v求导数，可以得到：(h) Iterative optimization Fixed P,/> A,/> When W ₀ , in formula (3)/> Taking the derivative of W _v , we can get:

使用KKT条件，即Ψ_i，j(W_v)_i，j＝0可以得到以下关于W_v更新规则：Using the KKT condition, that is, Ψ _{i, j} (W _v ) _{i, j} = 0, the following update rules for W _v can be obtained:

(i)重复(c)到(h)，不断交替更新参数A，W₀，/>，P直到满足迭代停止条件。目标函数收敛，输出所述基于图匹配机制的非对齐多视图多标记学习模型的最优参数，得到分类预测器。(i) Repeat (c) to (h), updating parameters alternately A, W ₀ ,/> ,P until the iteration stop condition is met. The objective function converges, outputs the optimal parameters of the non-aligned multi-view multi-label learning model based on the graph matching mechanism, and obtains a classification predictor.

步骤S4，基于收敛后的基于图匹配机制的非对齐多视图多标记学习模型，对测试集进行预测，利用输出概率获得标记预测结果，具体步骤包括：标记预测矩阵将会被等式给出。Step S4, based on the converged non-aligned multi-view multi-label learning model based on the graph matching mechanism, predict the test set, and use the output probability to obtain the label prediction result. The specific steps include: label prediction matrix will be equal to given.

本发明与现有技术相比的优点在于：The advantages of the present invention compared with the prior art are:

1、针对多视图多标记学习任务中视图非对齐问题，提出了一阶和二阶对齐来解决这个问题。重新排序矩阵可以自适应地对每个视图中的特征进行重排序，进行一阶对齐得到正确映射关系。因此，视图非对齐问题可以简化为视图对齐问题，此外，利用不同视图中样本的距离矩阵对视图进行结构上的二阶对齐，从而提高对齐的效率和准确性。1. In view of the view non-alignment problem in multi-view multi-label learning tasks, first-order and second-order alignment are proposed to solve this problem. The reordering matrix can adaptively reorder the features in each view and perform first-order alignment to obtain the correct mapping relationship. Therefore, the view non-alignment problem can be simplified to a view alignment problem. In addition, the distance matrices of samples in different views are used to perform structural second-order alignment of the views, thereby improving the efficiency and accuracy of the alignment.

2、针对多视图多标记语义全面表达问题，本发明的方法可以联合利用多视图多标记数据的一致性和多样性信息。模型从不同的视图、标记相关性、一种基于个体和共享特征空间的集成分类器中学习一个共享的子空间。将对齐的数据输入这个基于“共体-单体”语义表示的多视图多标记分类模型，强调单体视图在特定语义表达的贡献，促进少类别样本的语义表达。2. Aiming at the problem of comprehensive expression of multi-view and multi-label semantics, the method of the present invention can jointly utilize the consistency and diversity information of multi-view and multi-label data. The model learns a shared subspace from different views, label correlations, an ensemble classifier based on individual and shared feature spaces. Input the aligned data into this multi-view multi-label classification model based on the "community-single" semantic representation, emphasizing the contribution of the single view in specific semantic expression, and promoting the semantic expression of samples with few categories.

3、通过引入动态标记关联矩阵A来学习标记中存在的隐形相关。虽然可以基于已知标记矩阵估计的固定标记关联矩阵。但是仅用已知标记的样本可能是不充分的，提出的动态标记关联矩阵能够自适应的度量标记之间的相关关系，帮助提升多标记分类模型学习性能。3. Learn the invisible correlation existing in the tags by introducing the dynamic tag association matrix A. While a fixed tag correlation matrix can be estimated based on a known tag matrix. However, only using known labeled samples may not be sufficient. The proposed dynamic label association matrix can adaptively measure the correlation between labels and help improve the learning performance of multi-label classification models.

4、模型被简化为了一般的情况，提出了一种求解目标函数的迭代优化方法。在一定的时间复杂度下寻求模型的近似解。之后，在六个真实世界的数据集上验证了模型的有效性。4. The model is simplified to a general situation, and an iterative optimization method for solving the objective function is proposed. Find an approximate solution to the model under a certain time complexity. Afterwards, the effectiveness of the model is verified on six real-world datasets.

附图说明Description of the drawings

图1为本发明方法的处理流程图。Figure 1 is a processing flow chart of the method of the present invention.

图2为本发明方法的训练工作流程图。Figure 2 is a training workflow diagram of the method of the present invention.

具体实施方式Detailed ways

为了使本技术领域的人员更好地理解本发明实施例的方案，下面结合附图和实施方式对本发明实施例作进一步的详细说明。In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments of the present invention will be further described in detail below in conjunction with the drawings and implementation modes.

如图1所示，本发明包括以下步骤：As shown in Figure 1, the present invention includes the following steps:

1、获取非对齐多视图多标记数据，对数据进行存储、预处理和数据集划分。构建的特征矩阵X_v，可观测标记矩阵Y。由于该问题是一个全新的问题，目前没有公开的非对齐数据集，因此采用了人工合成数据集。具体地说，以6个公开的多视图多标记数据集为基础，通过随机打乱实例，使得不同视图中的实例描述不同对象。获得非对齐多视图多标记数据集。用训练数据构建一个具有V个视图样本数据集是与特征集对应的标记空间。1. Obtain non-aligned multi-view multi-label data, store, preprocess and divide the data set. Constructed feature matrix X _v , observable label matrix Y. Since this problem is a completely new one and there are currently no publicly available non-aligned datasets, a synthetic dataset is used. Specifically, based on 6 public multi-view multi-label data sets, instances are randomly shuffled so that instances in different views describe different objects. Obtain non-aligned multi-view multi-label datasets. Construct a sample data set with V views using the training data is the label space corresponding to the feature set.

2、对步骤1中获取样本数据集中训练数据建的特征矩阵X_v，可观测标记矩阵Y构建跨视图的一阶和二阶关系匹配，实现多个视图的特征对齐，并基于此构建基于“共体-单体”语义表示的非对齐多视图多标记分类模型。并进行交替优化训练，使模型最小化，直至模型收敛，得到分类预测器。目标函数如下：2. Construct the first-order and second-order relationship matching across views using _the feature matrix A non-aligned multi-view multi-label classification model with semantic representation of “community-single entity”. And perform alternating optimization training to minimize the model until the model converges and obtain the classification predictor. The objective function is as follows:

具体步骤如下：Specific steps are as follows:

(a)输入特征矩阵X_v；可观测标记矩阵Y；共享子空间的维度；公式(6)中的超参数；收敛阈值；迭代次数。后面四项输入值可能会由于数据集不同而改变以达到更好的效果。(a) Input feature matrix X _v ; observable label matrix Y; dimension of shared subspace; hyperparameters in formula (6); convergence threshold; number of iterations. The last four input values may be changed due to different data sets to achieve better results.

(b)随机初始化M_v、H_v、W_v、P、W₀和A。通过特征矩阵构造每个视图的邻接矩阵S_a。(b) Randomly initialize M _v , H _v , W _v , P, W ₀ and A. Construct the adjacency matrix S _a of each view through the feature matrix.

(c)根据公式(5)，(7)，(9)，(11)，(13)，(15)分别交替迭代优化M_v、P、A、H_v、W₀和W_v。直到满足迭代停止条件，上述迭代停止条件可以为目标函数值两次迭代的差值小于收敛阈值，或者达到迭代的最大次数，最后输出目标函数的最优解，得到基于图匹配机制的非对齐多视图多标记学习模型分类器。(c) According to formulas (5), (7), (9), (11), (13), and (15), M _v , P, A, H _v , W ₀ and W _v are alternately and iteratively optimized respectively. Until the iteration stop condition is met, the above iteration stop condition can be that the difference between the two iterations of the objective function value is less than the convergence threshold, or the maximum number of iterations is reached, and finally the optimal solution of the objective function is output, and a non-aligned multi-step algorithm based on the graph matching mechanism is obtained. View multi-label learning model classifier.

3、基于收敛后的基于图匹配机制的非对齐多视图多标记学习模型，对测试集进行预测，利用输出概率获得标记预测结果，具体步骤包括：标记预测矩阵将会被等式给出。为得到准确的标记信息，设置某个阈值，/>向量中元素高于此阈值设置为1，即此标记为样本的标记。低于此阈值设置为0，表示该标记不是此样本标记。阈值的设置一般可以取0.5，但是不同数据集该阈值往往取值不同。3. Based on the converged non-aligned multi-view multi-label learning model based on the graph matching mechanism, predict the test set and use the output probability to obtain the label prediction results. The specific steps include: label prediction matrix will be equal to given. In order to obtain accurate marking information, set a certain threshold,/> Elements in the vector that are higher than this threshold are set to 1, that is, this mark is the mark of the sample. Below this threshold is set to 0, indicating that the marker is not a marker for this sample. The threshold setting can generally be 0.5, but the threshold often takes different values in different data sets.

本发明在六个真实世界的数据集上进行了实验，以进行深入的实验研究。并且在实验中使用的六个多视图数据集都是公开的。它们的统计数据汇总见表1。对于每个数据集，表1总结了样本的数量(n)；视图的数量(m)；不同标记的数量(c)；每个样本的平均标记数量(#avg)；所有视图的最小维数(d_min)。The present invention is experimented on six real-world data sets to conduct in-depth experimental studies. And the six multi-view datasets used in the experiments are all publicly available. Their statistical data are summarized in Table 1. For each dataset, Table 1 summarizes the number of samples (n); the number of views (m); the number of different labels (c); the average number of labels per sample (#avg); and the minimum dimensionality of all views (d _min ).

表1 六个多视图数据集的统计数据Table 1 Statistics of six multi-view datasets

Emotions是一个音乐数据集，每个示例的两个视图对应于一段音乐的节奏特征和音色特征；Yeast是一个生物数据集，每个例子的两个视图对应于一个基因的遗传表达和系统发育；Corel5k，Pascal07，ESPGame，Mirflicker是四个被广泛使用的多视图图像数据集。从中收集了这些图像的多个特征，每个图像由6个具有代表性的颜色空间视图表示，分别面向不同的应用背景：HUE、SIFT、GIST、HSV、RGB和LAB。通过随机打乱实例，使得不同视图中的实例描述不同对象，获得六个非对齐多视图多标记数据集。此外，为了验证本发明所述方法MCGM的有效性，将本发明方法MCGM与以下6种多标记方法进行了比较。其中两种单视图多标记方法采用串联策略，将多视图数据集转变为单视图数据集后再进行实验。其他的方法均是多视图多标记学习方法。对比方法包括单视图多标记学习方法MLkNN、LLSF，分别发表计算机视觉领域顶级期刊2007PR和数据挖掘领域顶级会议2016TKDE。多视图多标记学习方法FIMAN、ICM2L、iMvWL和BEMVL，分别发表在数据挖掘领域顶级期刊2020SIGKDD，人工智能领域顶级期刊2019TCYB，人工智能领域顶级会议2018IJCAI，数据挖掘领域顶级会议2022TKDD。本方法使用在多标记学习中广泛使用的五个评价指标来衡量每个算法的性能。具体的评估指标包括Average Precision、Coverage、Hamming Loss、One Error和rankingLoss。每个数据集下每个度量值的平均值和标准差将会在表2到表7中显示。需要注意的是本发明在表格中显示的是1-Ranking Loss的值。Emotions is a music data set, and the two views of each example correspond to the rhythm characteristics and timbre characteristics of a piece of music; Yeast is a biological data set, and the two views of each example correspond to the genetic expression and phylogeny of a gene; Corel5k, Pascal07, ESPGame, Mirflicker are four widely used multi-view image datasets. Multiple features of these images were collected, and each image was represented by 6 representative color space views, respectively oriented to different application backgrounds: HUE, SIFT, GIST, HSV, RGB and LAB. By randomly shuffling the instances so that instances in different views describe different objects, six non-aligned multi-view multi-label datasets are obtained. In addition, in order to verify the effectiveness of the MCGM method of the present invention, the MCGM method of the present invention was compared with the following six multi-label methods. Two of the single-view multi-label methods use a concatenation strategy to convert multi-view data sets into single-view data sets before conducting experiments. Other methods are multi-view multi-label learning methods. Comparative methods include single-view multi-label learning methods MLkNN and LLSF, which were published in the top journals in the field of computer vision 2007PR and the top conference in the field of data mining 2016TKDE. The multi-view multi-label learning methods FIMAN, ICM2L, iMvWL and BEMVL were published in the top journal in the field of data mining 2020SIGKDD, the top journal in the field of artificial intelligence 2019TCYB, the top conference in the field of artificial intelligence 2018IJCAI, and the top conference in the field of data mining 2022TKDD. This method uses five evaluation metrics that are widely used in multi-label learning to measure the performance of each algorithm. Specific evaluation indicators include Average Precision, Coverage, Hamming Loss, One Error and rankingLoss. The mean and standard deviation of each metric for each data set are shown in Tables 2 to 7. It should be noted that the present invention displays the value of 1-Ranking Loss in the table.

表2 Emotions的实验结果(平均值±标准差)Table 2 Experimental results of Emotions (mean ± standard deviation)

表3 Yeast的实验结果(平均值±标准差)Table 3 Experimental results of Yeast (mean ± standard deviation)

表4 Corel5k的实验结果(平均值±标准差)Table 4 Experimental results of Corel5k (mean ± standard deviation)

表5 Pascal07的实验结果(平均值±标准差)Table 5 Experimental results of Pascal07 (mean ± standard deviation)

表6 ESPGame的实验结果(平均值±标准差)Table 6 Experimental results of ESPGame (mean ± standard deviation)

表7 Mirflicker的实验结果(平均值±标准差)Table 7 Experimental results of Mirflicker (mean ± standard deviation)

从表2到表7报告的结果中，可以观察到，无论是在大的数据集还是小的数据集，MCGM在大多数情况下都优于其他比较方法。在30种实验设置(6个数据集和5个评价指标)中，本发明方法在结果中排名第一和第二的比率分别是57％和40％。并且没有一个方法在指标上明显优于本发明方法。From the results reported in Table 2 to Table 7, it can be observed that MCGM outperforms other comparison methods in most cases, both in large and small datasets. In 30 experimental settings (6 data sets and 5 evaluation indicators), the ratio of the inventive method ranking first and second in the results is 57% and 40% respectively. And none of the methods is significantly better than the method of the present invention in terms of indicators.

MCGM与LLSF和MLkNN的比较表明，可以看到传统的多标记方法的性能通过并联策略改进成为多视角多标记学习方法是有缺陷的，主要是因为它们忽略了多视图的一致性和互补信息挖掘。也就是说它们忽略了在数据集中的各个视图的物理意义。The comparison of MCGM with LLSF and MLkNN shows that the performance of traditional multi-label methods can be seen to be improved by the parallel strategy into multi-view multi-label learning methods which are flawed mainly because they ignore the consistency and complementary information mining of multi-views. . That is, they ignore the physical meaning of individual views in the data set.

MCGM与FIMAN、ICM2L、BEMVL和iMvWL的比较表明，本发明方法在处理非对齐视图问题上具有很好的性能。由于其它算法并没有考虑到视图非对齐问题，在遇到视图非对齐情况下是存在缺陷的。其中iMvWL忽略了视图的多样性，这导致它在视图信息提取方面有局限性。Comparison of MCGM with FIMAN, ICM2L, BEMVL and iMvWL shows that the invented method has good performance in dealing with non-aligned view problems. Since other algorithms do not consider the problem of view misalignment, they are flawed when encountering view misalignment. Among them, iMvWL ignores the diversity of views, which leads to its limitations in view information extraction.

需要说明的是，本发明实施例的方法适用于非对齐多视图多标记分类问题。It should be noted that the method in the embodiment of the present invention is suitable for non-aligned multi-view multi-label classification problems.

以上对本发明实施例进行了详细介绍，本文中应用了具体实施方式对本发明进行了阐述，以上实施例的说明只是用于帮助理解本发明的方法；同时，对于本领域的一般技术人员，依据本发明的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本发明的限制。The embodiments of the present invention have been introduced in detail above, and specific implementation modes are used in this article to illustrate the present invention. The description of the above embodiments is only used to help understand the method of the present invention; at the same time, for those of ordinary skill in the art, based on this The idea of the invention will be subject to change in the specific implementation and scope of application. In summary, the contents of this description should not be understood as limiting the invention.

Claims

1. The non-aligned multi-view multi-mark learning method based on the graph matching mechanism is characterized in that firstly, non-aligned multi-view multi-mark data are acquired, and the non-aligned multi-view multi-mark data are stored, preprocessed and data set divided to form a sample data set; constructing a feature matrix and an observable mark matrix based on training data in the sample data set; according to the feature matrix and the observable mark matrix, unaligned data are aligned: 1) Explicit alignment is carried out on unaligned data through a permutation matrix, namely point-to-point first-order alignment among samples; 2) Performing second-order alignment on the view on the graph structure by using distance matrixes of samples in different views, so that the alignment accuracy of the model is improved; mining the expression of commonality-personality among aligned data views, and constructing a non-aligned multi-view multi-mark learning model based on a graph matching mechanism by utilizing consistency and complementarity of cross views; training the model by an alternative optimization method until the model converges to obtain a classification predictor; and predicting the test set based on the converged classifier, and obtaining a mark classification result by using the output probability.

2. The non-aligned multi-view multi-label learning method based on a graph matching mechanism according to claim 1, wherein the method comprises the steps of,

step S1, acquiring non-aligned multi-view multi-mark data, and storing, preprocessing and dividing a data set for the non-aligned multi-view multi-mark data; adopting artificial synthetic data set; obtaining a non-aligned multi-view multi-marker dataset;

constructing a data set with V view samples from training dataWherein-> Is the complete feature space of the v-th view, n represents the training sample x _i Number d of (d) _v Representing the dimension of each sample;representing a markup space corresponding to the feature space; wherein y is _i ∈{0,1} ^n×q Is x _i Q represents the number of marks;

step S2, constructing a feature matrix X for training data in the sample data set acquired in the step S1 _v The observable mark matrix Y is used for constructing first-order and second-order relation matching of cross views, realizing characteristic alignment of a plurality of views, and constructing a non-aligned multi-view multi-mark classification model based on 'common-single' semantic representation; the method comprises the following specific steps:

(a) Explicit alignment is carried out on unaligned data by utilizing a permutation matrix, and point-to-point first-order alignment between examples is carried out; obtaining a relatively correct mapping relation between the examples after first-order alignment, and extracting common low-dimensional representation of different view data by adopting non-negative matrix factorization; introducing a feature mapping matrix W based on the obtained common representation and the observable marking matrix ₀ Constructing a linear mapping relation from a shared subspace to a mark space to obtain an initial learning model:

s.t.(M _v ) _i,j ∈{0,1},M _v 1＝1,1 ^T M _v ＝1 ^T ,P≥0,H _v ≥0,W ₀ ≥0(1)

wherein,is the replacement matrix of the v-th view, and the feature matrix is multiplied by the replacement matrix to obtain aligned multi-view data; p and H _v The aligned data are obtained through non-negative moment decomposition; />An individual mapping matrix representing a v-th view; />Is a shared subspace, where k is the size of the desired data reduced dimension; p is greater than or equal to 0 and H _v 0 is the non-negative constraint of the matrix; />Is a coefficient matrix corresponding to P, thus also constraining W ₀ ≥0；W ₀ Learning the commonality information of the heterogeneous view by sharing the mapping relation from the subspace P to the subspace Y; alpha and gamma are two hyper-parameters;

(b) Taking into account the individual mapping matrixIn fact, for encoding the corresponding characteristics of a single view, another set of coefficient matrices is defined +.>To capture the unique properties of a single view, +.>The reconstructed feature matrix PH with the v-th view personalized information _v Mapping to a markup space; utilizing personality and commonality information of heterogeneous views, andthe expansion formula (1) is as follows:

s.t.(M _v ) _i,j ∈{0,1},M _v 1＝1,1 ^T M _v ＝1 ^T ,P≥0,H _v ≥0,W ₀ ≥0,W _v ≥0(2)

wherein P can be regarded as a dictionary matrix containing all view information, and H _v Representing individual view-specific coding coefficients; PH value _v The goal of (1) is to capture individual information for a particular view, while P captures shared information for all views; PH value _v And P is considered as the expression of "commonality-personality" of the multi-view data;

(c) Considering alignment and mark correlation of second-order diagram structures among views; due to M _v X _v And M _j X _j Respectively represent view X _v And view X _j Is a properly aligned view of (1); mapping matrices with cross-viewTo represent view X _v And view X _j Matching degree of cross-view samples; constructing a structure matching loss item to explore a correct cross-view mapping relation; establishing a distance matrix S between respective samples for each view _v A graph structure representing view v; by mapping matrix across views->Permutation distance matrix S _v And S is _j Making the graph connection structure of the two views as similar as possible; />Representing a marker correlation matrix, and extracting known correlation marker information by utilizing correlation in the multi-marker data through the matrix A; introducing a weight of the second order alignment of the super-parameter beta balance; graph-based matching mechanismThe non-aligned multi-view multi-label learning model of (a) is expressed as follows:

s.t.(M _v ) _i,j ∈{0,1},M _v 1＝1,1 ^T M _v ＝1 ^T ,P≥0,H _v ≥0,W ₀ ≥0,M _v ≥0

wherein the method comprises the steps of

S3, simplifying the expression of the non-aligned multi-view multi-mark learning model based on the graph matching mechanism in S2, and performing alternating optimization training to minimize the model until the model converges to obtain a classification predictor; the method comprises the following specific steps:

(a) Due to M _v Is non-convex and it is difficult to obtain an optimal solution; handle M _v Is relaxed as:

(b) Converting the objective function into an unconstrained problem by introducing Lagrangian multipliers lambda, phi, theta, omega, psi; in view v, the objective function (3) is changed over view v to the following form:

(c) Iterative optimizationFix P->M _v Calculation is independent of M _v′ V' noteqv; thus for each view v, for M _v Optimizing independently;

standard methods solve the coupling equation (3) and constraintsThe following is obtained with respect to M using a nonlinear method _v Is a rule for iterative updating of:

wherein:

(d) Iterative optimization P; fixingIn the formula (3)>Derivative of P to obtain:

using the KKT condition, i.e. phi _i,j P _i,j =0, resulting in the following iterative update rule for P:

(e) Iterative optimization A; fixingIn the formula (3)>Derivative A is obtained by:

let the derivative equal to 0, the following update rule for A is obtained:

(f) Iterative optimizationFix->When H is _v Calculation is independent of H _v V' noteqv; for each view v, for H _v Optimizing alone, in formula (3)>For H _v Derivative is obtained, and the following steps are obtained:

using the KKT condition, i.e. Θ _i,j (H _v ) _i,j =0, giving the following relation to H _v Is a rule for iterative updating of:

(g) Iterative optimization W ₀ The method comprises the steps of carrying out a first treatment on the surface of the FixingIn the formula (3)>For W ₀ Derivative is obtained, and the following steps are obtained:

using the KKT condition, i.e. Ω _i,j (W ₀ ) _i,j =0, giving the following relation to W ₀ Is a rule for iterative updating of:

(h) Iterative optimizationFix->In the formula (3)>For W _v Derivative is obtained, and the following steps are obtained:

using the KKT condition, i.e. ψ _i,j (W _v ) _i,j =0 gets the following about W _v Updating rules:

(i) Repeating (c) to (h) continuously and alternatelyUpdating parameters Until the iteration stop condition is met; converging an objective function, outputting optimal parameters of the non-aligned multi-view multi-mark learning model based on the graph matching mechanism, and obtaining a classification predictor;

step S4, based on the non-aligned multi-view multi-mark learning model based on the graph matching mechanism after convergence, predicting the test set, and obtaining a mark prediction result by using the output probability, wherein the step comprises the following steps: marking prediction matricesWill be given by equationGiven.