CN112508117A - Self-adaptive multi-view dimension reduction method and device based on graph embedding

Info

Publication number: CN112508117A
Application number: CN202011484154.4A
Authority: CN
Inventors: 尹宝才; 张超辉; 王博岳; 胡永利; 孙艳丰
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2020-12-16
Filing date: 2020-12-16
Publication date: 2021-03-16
Anticipated expiration: 2040-12-16
Also published as: CN112508117B

Abstract

The adaptive multi-view dimensionality reduction method and device based on graph embedding fully explore the correlation of data across different views without ignoring the relationships among data within a single view, and can highlight the importance of particular features. The method includes: (1) embedding high-dimensional data into a low-dimensional space by means of graph embedding; (2) calculating the distances between different samples within the same view to measure the similarity of those samples; (3) sharing one common similarity matrix across the different views to explore the relationships between samples across views; (4) measuring the similarity so as to explore the relationships between different views; (5) obtaining the projection matrix of each view and multiplying it with the original data to obtain the final dimensionality reduction result.

Description

Adaptive multi-view dimensionality reduction method and device based on graph embedding

Technical field

The invention relates to the technical fields of data mining, machine learning and pattern recognition, and in particular to an adaptive multi-view dimensionality reduction method based on graph embedding and an adaptive multi-view dimensionality reduction device based on graph embedding.

Background

Dimensionality reduction is a common basic method for processing high-dimensional data. Its purpose is to reduce the feature dimension of the data from the original dimension to a specified dimension, so that the reduced data still preserve the local structure among the data, with similar data becoming more similar and dissimilar data moving further apart. Over the past few decades, many classical dimensionality reduction methods (such as PCA, LLE and CCA) have been proposed and have achieved great success in data mining and image processing.

Today, with the wide application of technologies such as cameras and sensors, data usually consist of different information sources, modalities or features. For example, the same news story can be reported in different languages; an image can be described by features such as HOG, GIST and LBP; a person can be photographed by cameras from different angles. These different features or angles describe different characteristics of the data, and all such data are called multi-view data.

The main approach currently used in machine learning is to reduce the dimensionality of data represented by a single view, which is called single-view (or traditional) dimensionality reduction. Because it does not consider related information from different views, it tends to generalize from a partial picture and cannot truly reflect the data as a whole. For multi-view data, exploiting the compatible and complementary information of each view reflects the data comprehensively and thereby improves dimensionality reduction performance.

With the advent of the big data era, data collection and storage capabilities keep advancing, but massive data lead to an overload of scientific information. Dimensionality reduction for multi-view data has attracted growing attention from researchers. It reduces unlabeled multi-view data to a very low dimension, so that existing methods can process the data while saving memory. It plays an important role in information retrieval, biological data analysis and medical diagnosis.

The purpose of multi-view dimensionality reduction is to take the information of different views into account and, by fusing the information of each view, to obtain a low-dimensional representation of the multi-view high-dimensional data that preserves the original structural relationships at the lower dimension. Most existing methods for the multi-view dimensionality reduction problem still apply a single-view dimensionality reduction method directly to the multi-view data, without considering the correlation and complementarity between views. PCA is the most classical dimensionality reduction method. The flexible multi-view collaborative dimensionality reduction proposed by Zhang Changqing et al. has attracted much attention because it processes multi-view data directly and uses the Hilbert-Schmidt independence criterion to enhance the correlation between views; it directly obtains the projection matrix of each view, reduces the dimension of each view, and achieves good results. However, current multi-view dimensionality reduction methods have the following shortcomings: 1) The correlation of data between different views is not fully explored. 2) Methods that do explore the correlation between data of different views ignore the relationships among data within a single view. 3) Most dimensionality reduction methods cannot highlight the importance of particular features.

Summary of the invention

In order to overcome the defects of the prior art, the technical problem to be solved by the present invention is to provide an adaptive multi-view dimensionality reduction method based on graph embedding, which can fully explore the correlation of data between different views without ignoring the relationships among data within a single view, and can highlight the importance of particular features.

The technical solution of the present invention is as follows: the adaptive multi-view dimensionality reduction method based on graph embedding includes the following steps:

(1) embedding high-dimensional data into a low-dimensional space by means of graph embedding;

(2) calculating the distances between different samples within the same view to measure the similarity of those samples;

(3) sharing one common similarity matrix across the different views to explore the relationships between samples across views;

(4) measuring the similarity so as to explore the relationships between different views;

(5) obtaining the projection matrix of each view, and multiplying the projection matrix with the original data to obtain the final dimensionality reduction result.

The invention first embeds high-dimensional data into a low-dimensional space by means of graph embedding; second, it measures the similarity of different samples by calculating the distances between samples within the same view; it then shares one common similarity matrix across the different views to fully explore the relationships between samples across views; next, it measures the similarity so as to explore the relationships between different views; finally, it obtains the projection matrix of each view and multiplies the projection matrix with the original data to obtain the final dimensionality reduction result. The invention can therefore fully explore the correlation of data between different views without ignoring the relationships among data within a single view, and can highlight the importance of particular features.
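Step (2) can be illustrated with a minimal sketch. It assumes that each view stores its samples as columns and that the distance is the squared Euclidean distance; the text above only says "distance", so both choices, as well as the names `pairwise_sq_dists`, `view_a` and `view_b`, are illustrative assumptions.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between the columns of X (shape D x N)."""
    sq_norms = np.sum(X * X, axis=0)                       # ||x_i||^2 for every sample
    d = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X.T @ X)
    return np.maximum(d, 0.0)                              # clip tiny negatives from round-off

# Two hypothetical views of the same N = 3 samples (columns are samples).
view_a = np.array([[0.0, 3.0, 0.0],
                   [0.0, 4.0, 1.0]])
view_b = np.array([[1.0, 1.0, 2.0]])

dist_a = pairwise_sq_dists(view_a)   # distances inside view A
dist_b = pairwise_sq_dists(view_b)   # distances inside view B
```

Each view yields its own N x N distance matrix over the same samples, which is what makes it possible for the views to share a single N x N similarity matrix in step (3).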

Also provided is an adaptive multi-view dimensionality reduction device based on graph embedding, the device comprising:

an embedding module, which embeds high-dimensional data into a low-dimensional space by means of graph embedding;

a distance calculation module, which calculates the distances between different samples within the same view to measure the similarity of those samples;

a sharing module, which shares one common similarity matrix across the different views to explore the relationships between samples across views;

a measurement module, which measures the similarity so as to explore the relationships between different views;

an iteration module, which obtains the projection matrix of each view and multiplies the projection matrix with the original data to obtain the final dimensionality reduction result.

Description of drawings

FIG. 1 shows a flow chart of the adaptive multi-view dimensionality reduction method based on graph embedding according to the present invention.

Detailed description

As shown in FIG. 1, the adaptive multi-view dimensionality reduction method based on graph embedding includes the following steps:

Preferably, in step (1),

suppose a data set X is given, where D denotes the dimension of each sample and N denotes the number of samples, and Z is the low-dimensional representation matrix of the data X, where K denotes the dimension of the data after reduction and K ≪ N. If the original data x_i and x_j are similar, their low-dimensional representations z_i and z_j are also similar. With the similarity between the i-th sample and the j-th sample denoted by s_ij, the objective to be solved is expressed as formula (1):

The graph regularization of formula (1) is expressed as:

where L denotes the normalized graph Laplacian matrix, and

D is a diagonal matrix, where

For multi-view data, formula (2) becomes:

where m indexes the views, Z^(m) denotes the low-dimensional data of the m-th view, and L^(m) denotes the normalized graph Laplacian matrix of the m-th view. Because the low-dimensional data Z^(m) is obtained from the projection matrix P^(m) and the original data X^(m), formula (3) is written as:

where the purpose of

is to avoid trivial solutions.

Preferably, in step (3), formula (4) becomes:

A regularization term, the F-norm of the matrix S, is added to formula (6), which becomes:

Preferably, in step (4), after a constraint on the original data is added, formula (7) becomes:

Formula (8) does not consider the data reconstruction error:

where

represents projecting the original data into the low-dimensional space, and

represents projecting the data in the low-dimensional space back into the original space and computing the difference from the original data. Combining formula (8) with formula (9) gives:

Preferably, in step (5), the projection matrix P^(m) of each view is finally obtained through iterative updating, and the dimension to which each view is reduced is selected.

First, the similarity matrix S is constructed from the original data. When the similarity matrix S is held fixed, the terms

and

in formula (10) are constant, and constructing the Lagrangian function of the remaining terms gives formula (11):

P^(m) is obtained by eigenvalue decomposition, and the target dimension is specified.

When the projection matrix P^(m) is fixed, the reconstruction error term does not take part in the update, and formula (10) becomes:

Let d_ij denote the combined distance term of the low-dimensional data and the original data in formula (12):

Formula (12) is then written as:

Solving formula (14) reduces to:

In this way, the following is obtained:

Those of ordinary skill in the art will understand that all or part of the steps of the method in the above embodiment can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, includes each step of the method in the above embodiment; the storage medium may be ROM/RAM, a magnetic disk, an optical disc, a memory card, and the like. Therefore, corresponding to the method of the present invention, the present invention also includes an adaptive multi-view dimensionality reduction device based on graph embedding, which is usually expressed in the form of functional modules corresponding to the steps of the method. The device includes:

The technical solution of the present invention is described in more detail below.

The adaptive multi-view dimensionality reduction method based on graph embedding mainly includes the following steps:

1. Dimensionality reduction based on graph embedding

Suppose a data set X is given, where D denotes the dimension of each sample and N denotes the number of samples. Z is the low-dimensional representation matrix of the data X, where K denotes the dimension of the data after reduction and K ≪ N. If the original data x_i and x_j are similar, their low-dimensional representations z_i and z_j should also be similar. With the similarity between the i-th sample and the j-th sample denoted by s_ij, the objective to be solved can be expressed by the following formula:

The graph-regularized form of formula (1) can be written as the following function:
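The formula images for (1) and (2) are not reproduced in this text. A standard graph-embedding form that is consistent with the surrounding definitions is sketched here as an assumption, not as the exact published formulas; it assumes that the samples are the columns of Z and omits constant factors.

```latex
% Assumed form of formula (1): similar samples (large s_ij) are kept close after reduction.
\min_{Z} \; \sum_{i=1}^{N} \sum_{j=1}^{N} \lVert z_i - z_j \rVert_2^2 \, s_{ij}
\tag{1}

% Assumed form of formula (2): the same objective written with a graph Laplacian L.
\min_{Z} \; \operatorname{tr}\!\left( Z L Z^{\top} \right)
\tag{2}
```

With the unnormalized Laplacian the two expressions differ only by a factor of 2; with a normalized Laplacian each z_i is additionally rescaled by the degree of sample i.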

where L in formula (2) denotes the normalized graph Laplacian matrix, and

D is a diagonal matrix, where
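The construction of L and D can be sketched as follows. The definitions used here, d_ii = sum_j s_ij and L = D^(-1/2) (D - S) D^(-1/2), are the conventional ones and are assumed, since the formula images are missing from this text; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def normalized_laplacian(S):
    """Return (L, D) for a symmetric, non-negative similarity matrix S."""
    degrees = S.sum(axis=1)                                # d_ii = sum_j s_ij
    D = np.diag(degrees)
    inv_sqrt = 1.0 / np.sqrt(degrees)                      # assumes every degree is positive
    L = inv_sqrt[:, None] * (D - S) * inv_sqrt[None, :]    # D^{-1/2} (D - S) D^{-1/2}
    return L, D

S = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0]])
L, D = normalized_laplacian(S)

# Identity behind the trace form, checked on the unnormalized Laplacian D - S:
# sum_ij ||z_i - z_j||^2 s_ij == 2 * tr(Z (D - S) Z^T)
rng = np.random.default_rng(0)
Z = rng.standard_normal((2, 3))                            # K = 2, N = 3, samples as columns
pairwise = sum(S[i, j] * np.sum((Z[:, i] - Z[:, j]) ** 2)
               for i in range(3) for j in range(3))
trace_form = 2.0 * np.trace(Z @ (D - S) @ Z.T)
```

The numerical check shows why the pairwise objective can be replaced by a trace expression, which is what later allows the projection matrix to be found by eigenvalue decomposition.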

For multi-view data, formula (2) becomes:

where m indexes the views, Z^(m) denotes the low-dimensional data of the m-th view, and L^(m) denotes the normalized graph Laplacian matrix of the m-th view. Because the low-dimensional data Z^(m) is obtained from the projection matrix P^(m) and the original data X^(m), formula (3) can in turn be written in the following form:

where the purpose of

is to avoid trivial solutions. Formula (4) does not take the relationships between views into account.
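Formulas (3) and (4) are likewise missing from this text. One form consistent with the description is sketched below under stated assumptions: M is the number of views, the reduced data are Z^(m) = P^(m)T X^(m), and the constraint that rules out trivial solutions is column orthonormality of each P^(m). The published formulas may use a different convention or constraint.

```latex
% Assumed form of formula (3): one graph-embedding term per view.
\min_{\{Z^{(m)}\}} \; \sum_{m=1}^{M}
  \operatorname{tr}\!\left( Z^{(m)} L^{(m)} Z^{(m)\top} \right)
\tag{3}

% Assumed form of formula (4): substitute Z^{(m)} = P^{(m)\top} X^{(m)}.
\min_{\{P^{(m)}\}} \; \sum_{m=1}^{M}
  \operatorname{tr}\!\left( P^{(m)\top} X^{(m)} L^{(m)} X^{(m)\top} P^{(m)} \right)
\quad \text{s.t.} \quad P^{(m)\top} P^{(m)} = I
\tag{4}
```

Without such a constraint the objective is minimized by P^(m) = 0, which is the trivial solution the text refers to.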

2. Adaptive local structure learning

Since formula (4) does not explore the relationships between views, the following describes how to incorporate them. Sharing one similarity matrix across the views is an effective way to explore the relationships between views: if the views share the same similarity matrix, they also share the same Laplacian matrix, and formula (4) becomes:

Formula (5) is in turn a transformation of the following formula:

However, formula (6) admits a degenerate solution in which the two most similar samples have similarity 1 and all other samples have similarity 0. To solve this problem, a regularization term, the F-norm of the matrix S, is added, and formula (6) becomes:
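A possible reading of formulas (6) and (7), whose images are missing here, is sketched below. The degenerate solution described above (one similarity equal to 1, all others 0) suggests that each row of S is constrained to be non-negative and to sum to 1; that constraint, the orthonormality constraint on P^(m), the squared F-norm, and the weight \alpha are assumptions of this sketch.

```latex
% Assumed form of formula (6): all views share one similarity matrix S.
\min_{\{P^{(m)}\},\, S} \; \sum_{m=1}^{M} \sum_{i,j=1}^{N}
  \bigl\lVert P^{(m)\top} x_i^{(m)} - P^{(m)\top} x_j^{(m)} \bigr\rVert_2^2 \, s_{ij}
\quad \text{s.t.} \quad P^{(m)\top} P^{(m)} = I, \;\;
  s_{ij} \ge 0, \;\; \textstyle\sum_{j} s_{ij} = 1
\tag{6}

% Assumed form of formula (7): add the F-norm regularizer on S.
\min_{\{P^{(m)}\},\, S} \; \sum_{m=1}^{M} \sum_{i,j=1}^{N}
  \bigl\lVert P^{(m)\top} x_i^{(m)} - P^{(m)\top} x_j^{(m)} \bigr\rVert_2^2 \, s_{ij}
  + \alpha \lVert S \rVert_F^2
\quad \text{s.t. the same constraints}
\tag{7}
```

Under (6) the objective is linear in each row of S, so its minimum sits at a vertex of the simplex; the quadratic term in (7) spreads the weight over several neighbours.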

3. Adaptive multi-view dimensionality reduction based on graph embedding

The similarity in formula (7) constrains only the data after dimensionality reduction and not the original data before reduction, so it cannot guarantee that the structure of the reduced data matches that of the original data. After a constraint on the original data is added, formula (7) becomes:

However, formula (8) does not consider the data reconstruction error:

where

represents projecting the original data into the low-dimensional space, and

represents projecting the data in the low-dimensional space back into the original space and computing the difference from the original data. Combining formula (8) with formula (9) gives the final formula:
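The reconstruction error described above can be sketched for a single view as follows. The sketch assumes the convention Z = P^T X with a column-orthonormal P of shape D x K and measures the error with the squared Frobenius norm; the formula image for (9) is missing, so these choices and the names used are assumptions.

```python
import numpy as np

def reconstruction_error(X, P):
    """Squared Frobenius norm of X - P P^T X for X (D x N) and P (D x K)."""
    Z = P.T @ X                 # project the original data into the K-dimensional space
    X_back = P @ Z              # project the low-dimensional data back to the original space
    return float(np.sum((X - X_back) ** 2))

X = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0],
              [2.0, 0.0, 1.0]])
P_axis = np.eye(3)[:, :2]       # keep the first two coordinates, drop the third
err = reconstruction_error(X, P_axis)        # energy of the dropped row: 2^2 + 0^2 + 1^2
err_full = reconstruction_error(X, np.eye(3))  # no reduction, so nothing is lost
```

A small error means that little information is lost by the projection, which is why this term is combined with the graph-embedding objective.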

Finally, P^(m) and S are solved by iterative updating.

(1) Fix S, update P^(m)

First, the similarity matrix S is constructed from the original data. When the similarity matrix S is held fixed, the terms

and

in formula (10) are constant, and constructing the Lagrangian function of the remaining terms gives:

By eigenvalue decomposition, P^(m) can be obtained and the target dimension can be specified.
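A minimal sketch of this P-step for one view follows. It assumes the per-view objective tr(P^T X L X^T P) + beta * ||X - P P^T X||_F^2 subject to P^T P = I. Under that constraint the reconstruction term equals tr(X X^T) - tr(P^T X X^T P), so the minimizer consists of the K eigenvectors of X L X^T - beta X X^T with the smallest eigenvalues. The weight `beta`, the function name and the random data are hypothetical, and formulas (10) and (11) may weight the terms differently.

```python
import numpy as np

def update_projection(X, L, K, beta=1.0):
    """Hypothetical P-step: K eigenvectors of X L X^T - beta X X^T with smallest eigenvalues."""
    M = X @ L @ X.T - beta * (X @ X.T)
    M = 0.5 * (M + M.T)                       # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    return eigvecs[:, :K]                     # orthonormal columns, so P^T P = I

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))               # one hypothetical view: D = 5, N = 8
A = rng.random((8, 8))
S = 0.5 * (A + A.T)                           # symmetric toy similarity matrix
np.fill_diagonal(S, 0.0)
L = np.diag(S.sum(axis=1)) - S                # unnormalized Laplacian, for brevity
P = update_projection(X, L, K=2)
Z = P.T @ X                                   # reduced data, shape (2, 8)
```

The number of eigenvectors kept, `K`, is the target dimension that the text says can be specified.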

(2) Fix P^(m), update S

When the projection matrix P^(m) is fixed, the reconstruction error term in the formula does not take part in the update, and formula (10) becomes:

Let d_ij denote the combined distance term of the low-dimensional data and the original data in formula (12):

Formula (12) can then be written as:

Solving formula (14) reduces to:

In this way, the following is obtained:
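The S-step can be sketched under the assumption that, for each sample i, the subproblem is min over s_i of sum_j (d_ij s_ij + alpha s_ij^2) with s_ij >= 0 and sum_j s_ij = 1. Completing the square shows that the minimizer is the Euclidean projection of -d_i / (2 alpha) onto the probability simplex. The exact formulas (12) to (14) are missing from this text, so this closed form, the exclusion of self-similarity, and the names used are assumptions.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {s : s >= 0, sum(s) = 1}."""
    u = np.sort(v)[::-1]                            # sort in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]        # last index that stays positive
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def update_similarity(dist, alpha):
    """Row-wise minimizer of sum_j d_ij s_ij + alpha * s_ij^2 on the simplex."""
    n = dist.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i                  # exclude self-similarity (an assumption)
        S[i, others] = project_to_simplex(-dist[i, others] / (2.0 * alpha))
    return S

dist = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 1.0],
                 [4.0, 1.0, 0.0]])
S_small = update_similarity(dist, alpha=0.01)   # weak regularization: sparse rows
S_large = update_similarity(dist, alpha=100.0)  # strong regularization: nearly uniform rows
```

With a small `alpha` the first row puts all its weight on the nearest neighbour, which is the degenerate behaviour the F-norm term is meant to soften; a larger `alpha` spreads the weight across neighbours.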

In summary, by iterating these updates, the projection matrix P^(m) of each view is finally obtained, and the dimension to which each view is reduced can be selected.

The experiments use one text data set, 3sources, and one image data set, IXMAS.

3sources is a multi-view text data set collected from three well-known news websites: BBC News, Reuters and The Guardian. The data set covers 416 different news stories from February to April 2009. The experiments use the 169 stories that were reported by all three websites. Features of 3560, 3631 and 3068 dimensions are extracted for the three views, respectively, and each story belongs to one of six categories: business, entertainment, health, sports, politics and technology. The data set thus contains 169 samples in 6 categories. IXMAS is a multi-view image data set captured from five different camera angles. It consists of photos of ten people performing eleven actions (checking a watch, crossing arms, scratching the head, sitting down, getting up, turning around, walking, waving, punching, kicking and picking up), with each action repeated three times. The data set contains 339 samples in 11 categories.

To verify the dimensionality reduction performance of the proposed method, the adaptive multi-view dimensionality reduction method based on graph embedding (MVDR) is compared with the classical PCA method, the multi-view non-negative matrix factorization method (MVNMF), and the flexible multi-view dimensionality reduction method (McDR), which performs multi-view dimensionality reduction directly. All methods reduce the data to 20 dimensions, the reduced data are clustered with the same clustering method, and the clustering performance is compared.

The experiments use three metrics to evaluate clustering performance: normalized mutual information (NMI), accuracy (ACC) and purity. Higher values indicate better clustering performance. The detailed results are shown in Table 1 and Table 2.
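Two of these metrics can be sketched with the standard library alone. The text does not state which NMI normalization was used, so the geometric mean of the two entropies is an assumption here; ACC additionally needs an optimal matching between clusters and classes and is not shown. The label lists are made-up examples, not experimental data.

```python
import math
from collections import Counter

def purity(labels_true, labels_pred):
    """Fraction of samples that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return majority / len(labels_true)

def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(labels_true, labels_pred):
    """Mutual information normalized by the geometric mean of the two entropies."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_t, count_p = Counter(labels_true), Counter(labels_pred)
    mi = sum((c / n) * math.log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in joint.items())
    denom = math.sqrt(entropy(labels_true) * entropy(labels_pred))
    return mi / denom if denom > 0 else 0.0   # degenerate labelings score 0 in this sketch

truth = [0, 0, 1, 1, 2, 2]
perfect = [1, 1, 0, 0, 2, 2]      # same partition, different cluster names
coarse = [0, 0, 0, 1, 1, 1]       # two clusters that mix the classes
```

Both metrics are invariant to how the clusters are numbered, so `perfect` scores 1 on each of them even though its cluster ids differ from the class ids.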

Table 1 and Table 2 show the clustering performance after dimensionality reduction on the 3sources and IXMAS data sets, respectively. "Single-view" means that the data of one view are reduced and then clustered, and "multi-view" means that the data of multiple views are jointly reduced and then clustered. The best single-view and multi-view results are marked in bold in the tables. Both tables show that multi-view dimensionality reduction outperforms the corresponding single-view dimensionality reduction, and that MVDR outperforms the other methods in both the single-view and the multi-view setting.

The following conclusions can be drawn from these tables:

1) The multi-view dimensionality reduction method MVDR is clearly better than the other multi-view dimensionality reduction methods and achieves the best clustering performance throughout. This shows that the proposed method fuses multi-view information well when reducing the dimensionality of the data and extracts more discriminative information, which leads to better clustering results.

2) In the experiments, clustering after multi-view dimensionality reduction is better than clustering after single-view dimensionality reduction in almost all cases, which shows that multi-view fusion is more effective than single-view dimensionality reduction.

In short, compared with the other dimensionality reduction methods, MVDR achieves higher clustering accuracy and a better dimensionality reduction effect, which suggests good prospects for the MVDR method in practical applications.

Table 1

Table 2

The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Any simple modification, equivalent change or refinement made to the above embodiments according to the technical essence of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims

1. An adaptive multi-view dimensionality reduction method based on graph embedding, characterized in that the method comprises the following steps:

(1) embedding high-dimensional data into a low-dimensional space by means of graph embedding;

(2) calculating the distances between different samples within the same view to measure the similarity of those samples;

(3) sharing one common similarity matrix across the different views to explore the relationships between samples across views;

(4) measuring the similarity so as to explore the relationships between different views;

(5) obtaining the projection matrix of each view, and multiplying the projection matrix with the original data to obtain the final dimensionality reduction result.

2. The adaptive multi-view dimensionality reduction method based on graph embedding according to claim 1, characterized in that, in step (1),

suppose a data set X is given, where D denotes the dimension of each sample and N denotes the number of samples, and Z is the low-dimensional representation matrix of the data X, where K denotes the dimension of the data after reduction and K ≪ N; if the original data x_i and x_j are similar, their low-dimensional representations z_i and z_j are also similar; with the similarity between the i-th sample and the j-th sample denoted by s_ij, the objective to be solved is expressed as formula (1):

The graph regularization of formula (1) is expressed as:

where L represents the normalized graph Laplacian matrix, and

D is a diagonal matrix where

For multi-view data, formula (2) becomes:

where m indexes the views, Z^(m) denotes the low-dimensional data of the m-th view, and L^(m) denotes the normalized graph Laplacian matrix of the m-th view; because the low-dimensional data Z^(m) is obtained from the projection matrix P^(m) and the original data X^(m), formula (3) is written as:

where the purpose of

is to avoid trivial solutions.

3. The adaptive multi-view dimensionality reduction method based on graph embedding according to claim 2, characterized in that, in step (3), formula (4) becomes:

a regularization term, the F-norm of the matrix S, is added to formula (6), which becomes:

4. The adaptive multi-view dimensionality reduction method based on graph embedding according to claim 3, characterized in that, in step (4), after a constraint on the original data is added, formula (7) becomes:

formula (8) does not consider the data reconstruction error:

where

5. The adaptive multi-view dimensionality reduction method based on graph embedding according to claim 4, characterized in that, in step (5), the projection matrix P^(m) of each view is finally obtained through iterative updating, and the dimension to which each view is reduced is selected;

first, the similarity matrix S is constructed from the original data; when the similarity matrix S is held fixed, the terms

and

in formula (10) are constant; P^(m) is obtained by eigenvalue decomposition, and the target dimension is specified;

when the projection matrix P^(m) is fixed, the reconstruction error term does not take part in the update, and formula (10) becomes:

let d_ij denote the combined distance term of the low-dimensional data and the original data in formula (12):

formula (12) is then written as:

solving formula (14) reduces to:

in this way, the following is obtained:

6. An adaptive multi-view dimensionality reduction device based on graph embedding, characterized in that the device comprises:

an embedding module, which embeds high-dimensional data into a low-dimensional space by means of graph embedding;

a distance calculation module, which calculates the distances between different samples within the same view to measure the similarity of those samples;

a sharing module, which shares one common similarity matrix across the different views to explore the relationships between samples across views;

a measurement module, which measures the similarity so as to explore the relationships between different views;

an iteration module, which obtains the projection matrix of each view and multiplies the projection matrix with the original data to obtain the final dimensionality reduction result.