CN112766412B - Multi-view clustering method based on self-adaptive sparse graph learning - Google Patents

Publication number
CN112766412B
Authority
CN
China
Prior art keywords
view
learning
following
solving
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110158287.0A
Other languages
Chinese (zh)
Other versions
CN112766412A (en)
Inventor
肖庆江
黄奕轩
杜世强
石玉清
单广荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest Minzu University
Original Assignee
Northwest Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-02-05
Filing date
2021-02-05
Publication date
2023-11-07
2021-02-05 Application filed by Northwest Minzu University
2021-02-05 Priority to CN202110158287.0A
2021-05-07 Publication of CN112766412A
2023-11-07 Application granted
2023-11-07 Publication of CN112766412B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-view clustering method based on self-adaptive sparse graph learning, comprising the following steps: first, for the data matrix of each view, a similarity matrix of that view is obtained through adaptive neighbor learning; then, the similarity matrices of the views are automatically weighted, and a shared similarity matrix with a sparse structure is learned under a sparsity constraint; finally, the objective function is optimized with an efficient iterative updating algorithm based on the alternating direction method of multipliers (ADMM), and standard spectral clustering is performed on the shared similarity matrix to obtain the final clustering result. The invention improves the quality of the similarity graph of each view and enhances robustness to noise and outliers. Its computational complexity is approximately equal to that of single-view spectral clustering, the model is fast to compute, and the framework is simple and easy to implement.

Description

Multi-view clustering method based on self-adaptive sparse graph learning
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to a multi-view clustering method based on self-adaptive sparse graph learning.
Background
Clustering is a data analysis method commonly used in machine learning, pattern recognition, data mining, artificial intelligence and related fields. It aims to divide a data set into several subsets of similar objects according to the characteristics of the data. With the rapid development of internet and sensor technology, the data actually acquired have evolved from single-view descriptions to ubiquitous multi-view descriptions. Multi-view data provide more information for data analysis tasks and are semantically richer and more useful, but also more complex. Numerous studies have shown that multi-view learning is more effective and more robust than single-view learning and generalizes better. Multi-view learning methods learn one function to model each view and improve generalization performance by jointly optimizing all of the functions, so that information from the different views is fused effectively.
Multi-view clustering aims to group data points into a given number of clusters by exploiting the consistent and complementary information in multi-view data. Existing algorithms mainly comprise graph-based methods, matrix factorization methods, multiple kernel learning methods and subspace learning methods. Among them, graph-based methods have received attention because they are simple and efficient. They can be subdivided into multi-view spectral clustering methods, which generally use graphs constructed by KNN, and multi-view subspace clustering methods, which generally use graphs constructed from a representation model such as sparse representation or low-rank representation.
Most multi-view clustering methods fuse multi-view information on the basis of graph models. Early methods often focused on fusing the information of two views and are not applicable to three or more views. Researchers have proposed a multi-view spectral clustering method based on sparse graph learning (S-MVSC), which learns a shared similarity matrix with a sparse structure from multiple views but ignores the quality differences between the different similarity matrices. Other researchers have proposed a self-weighted multi-view clustering method with multiple graphs (SwMC), which learns a Laplacian rank-constrained graph but requires a singular value decomposition in each iteration, making the whole process very time-consuming.
Disclosure of Invention
To address the shortcomings identified in the Background, the invention provides a multi-view clustering method based on self-adaptive sparse graph learning.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A multi-view clustering method based on self-adaptive sparse graph learning comprises the following steps:
(1) Obtaining the similarity matrix of each view from the data matrix through adaptive neighbor learning: for the i-th data point x_i^v of the v-th view, every data point x_j^v (j = 1, ..., n) of the v-th view is connected to x_i^v as a neighbor with probability s_ij^v. The probabilities s_ij^v for all data points of the v-th view are determined by solving the following problem:
The vector s_i^v learned from S^v has k non-zero values, where k is the neighbor parameter, and the result obtained after solving is:
According to this result, the similarity matrix of each view is learned adaptively from the data;
(2) Shared similarity matrix learning
For multi-view data, p similarity matrices S^(1), S^(2), ..., S^(p) are first constructed, where S^(v) ∈ R^(n×n) (1 ≤ v ≤ p). Parameters λ and γ are introduced, and the following model is proposed:
s.t. α^(v) ≥ 0, α^T 1_p = 1,
where α = [α^(1), α^(2), ..., α^(p)], λ > 0 and γ > 0. Solving the above model is converted into solving the following relaxation problem:
s.t. α^(v) ≥ 0, α^T 1_p = 1,
where ||S||_1 is the convex relaxation of ||S||_0;
(3) Optimization algorithm
The relaxation problem is solved with the alternating direction method of multipliers (ADMM): α and S are updated alternately, each with the other variable held fixed, and the learned consensus similarity matrix with a sparse structure is used as the input of standard spectral clustering to obtain the clustering result.
Preferably, in step (1), when the initial problem is solved directly, only the nearest data point becomes a neighbor of x_i^v and none of the other data points do; the following problem is therefore considered as well:
Since the optimal solution of this problem makes every data point of the v-th view a neighbor of x_i^v with the same probability 1/n, the two problems are combined and further converted into solving the following problem:
The final result of solving is:
Preferably, in step (3), when S is updated, α is fixed and the relaxation problem is converted into solving the following problem:
The objective function of this problem satisfies:
The problem is therefore equivalent to the following formula:
After an auxiliary matrix is defined, this is further simplified to:
A soft-threshold shrinkage operator with threshold μ > 0 is introduced, and the updated similarity matrix S is obtained as:
Preferably, in step (3), when α is updated, S is fixed and the relaxation problem is converted into solving the following problem:
s.t. α^(v) ≥ 0, α^T 1_p = 1;
After an auxiliary quantity is defined, this is equivalent to solving the following problem:
s.t. α^(v) ≥ 0, α^T 1_p = 1;
This is further converted into an equivalent form, which is then solved by the alternating direction method of multipliers (ADMM).
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention constructs a similarity matrix for each view by adaptive neighbor learning and uses it as input, thereby improving the quality of the similarity graph constructed for each view.
(2) The multi-view clustering model based on self-adaptive sparse graph learning (ASGL) automatically weights each view and learns, from the multiple views, a shared similarity matrix with a sparse structure as the input of standard spectral clustering. Because quality differences among the views are taken into account, noise arising from the different views can be effectively eliminated, and robustness to noise and outliers is improved.
(3) The ASGL model is fast and easy to implement. Leaving aside the time spent constructing the similarity matrices of all views and iteratively solving for the optimal shared similarity matrix, the computational complexity of ASGL is approximately equal to that of single-view spectral clustering.
(4) The ASGL model is optimized by an efficient iterative updating algorithm based on the alternating direction method of multipliers (ADMM), and its effectiveness is verified by numerical experiments on six data sets in comparison with several recent algorithms.
Drawings
Fig. 1 is a flowchart of a multi-view clustering method based on adaptive sparse graph learning according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
1. Adaptive neighbor graph learning
For the i-th data point x_i^v of the v-th view, every data point x_j^v (j = 1, ..., n) of the v-th view is connected to x_i^v as a neighbor with probability s_ij^v. In general, a smaller distance between x_i^v and x_j^v should be assigned a larger probability s_ij^v. Thus, for all data points of the v-th view, a natural way to determine the probabilities s_ij^v is to solve the following problem:
Problem (1) has a trivial solution, in which only the nearest data point becomes a neighbor of x_i^v, with probability 1. On the other hand, if the distance information in the data is not considered at all, the following problem is solved:
The optimal solution of problem (2) makes every data point of the v-th view a neighbor of x_i^v with the same probability 1/n. Combining problems (1) and (2) leads to solving the following problem:
The vector s_i^v learned from S^v has k non-zero values, where k is the neighbor parameter, and the final solution is:
By equation (4), the similarity matrix of each view can be learned adaptively from the data.
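The closed-form result referred to above is not reproduced in this text. A minimal sketch follows, assuming the closed form commonly used in the adaptive-neighbor literature: with d_ij the squared Euclidean distance from point i to its j-th nearest neighbor, s_ij = (d_{i,k+1} - d_ij) / (k * d_{i,k+1} - sum of the k smallest d_ih) for the k nearest neighbors and 0 otherwise. The function name and the use of squared Euclidean distance are assumptions made for illustration, not details taken from the patent.

```python
import numpy as np

def adaptive_neighbor_similarity(X, k):
    """Build an n x n similarity matrix with k non-zero entries per row.

    X: (n, d) data matrix of one view, one row per data point.
    k: neighbor parameter, with 1 <= k <= n - 2.
    Assumes the standard adaptive-neighbor closed form (see lead-in).
    """
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of points.
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(D, np.inf)  # a point is never its own neighbor

    S = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D[i])      # nearest neighbors first
        d_sorted = D[i, order]
        d_k1 = d_sorted[k]            # distance to the (k+1)-th neighbor
        denom = k * d_k1 - np.sum(d_sorted[:k])
        if denom > 0:
            S[i, order[:k]] = (d_k1 - d_sorted[:k]) / denom
        else:
            # The k+1 nearest distances all tie: fall back to uniform weights.
            S[i, order[:k]] = 1.0 / k
    return S
```

Each row of the returned matrix is non-negative and sums to 1, so row i can be read as the neighbor probabilities of point i. Only k has to be chosen; in this closed form the regularization parameter of the combined problem is determined implicitly by k.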
2. Shared similarity matrix learning
The performance of spectral clustering depends largely on the quality of the similarity matrix. On this basis, the present invention learns a shared similarity matrix from multiple input data views while taking into account the quality differences between the different similarity matrices. For multi-view data, p similarity matrices S^(1), S^(2), ..., S^(p) are first constructed, where S^(v) ∈ R^(n×n) (1 ≤ v ≤ p). Parameters λ and γ are introduced, and the following model is proposed:
where α = [α^(1), α^(2), ..., α^(p)], λ > 0 and γ > 0. Since solving formula (5) involves minimizing the l0 norm, the model is converted into the following relaxation problem:
where ||S||_1 is the convex relaxation of ||S||_0.
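The formulas for models (5) and (6) are not reproduced in this text. One formulation consistent with the surrounding description (view weights α on the simplex, a sparsity term on S controlled by λ, and a second parameter γ) is sketched below. The weighted Frobenius fit and the quadratic regularizer on α are assumptions made for illustration; the patent's actual objective may differ.

```latex
% Assumed form of model (5): weighted fit to each view, l0 sparsity on S,
% and a quadratic regularizer on the view weights.
\min_{S,\,\alpha}\ \sum_{v=1}^{p} \alpha^{(v)} \bigl\lVert S - S^{(v)} \bigr\rVert_F^2
  + \lambda \lVert S \rVert_0 + \gamma \lVert \alpha \rVert_2^2
\quad \text{s.t.}\quad \alpha^{(v)} \ge 0,\ \ \alpha^{T}\mathbf{1}_p = 1

% Assumed form of relaxation (6): the l0 norm is replaced by its convex
% relaxation, the l1 norm.
\min_{S,\,\alpha}\ \sum_{v=1}^{p} \alpha^{(v)} \bigl\lVert S - S^{(v)} \bigr\rVert_F^2
  + \lambda \lVert S \rVert_1 + \gamma \lVert \alpha \rVert_2^2
\quad \text{s.t.}\quad \alpha^{(v)} \ge 0,\ \ \alpha^{T}\mathbf{1}_p = 1
```

Under this assumed form, a larger γ pushes the weights toward the uniform value 1/p, while a smaller γ concentrates weight on the views closest to S.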
3. Optimization algorithm
An efficient iterative updating algorithm based on the alternating direction method of multipliers (ADMM) is used: α and S are updated alternately, each with the other variable held fixed, and the shared similarity matrix is learned and used to obtain the clustering result.
(1) Fixing α and updating S converts problem (6) into solving the following problem:
The objective function of formula (7) is:
Formula (8) is equivalent to the following formula:
To reduce the amount of computation, formula (9) is rewritten as follows:
With the auxiliary matrix in formula (10) so defined, a soft-threshold shrinkage operator is introduced:
where μ > 0, and the similarity matrix S that solves formula (10) is obtained as:
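The S update described above can be sketched as follows. The soft-threshold operator itself is standard: it maps each entry b to sign(b) * max(|b| - μ, 0). The rest rests on an assumption, because the patent's formulas are not reproduced here: the S subproblem is taken to be min_S sum_v α^(v) ||S - S^(v)||_F^2 + λ ||S||_1 with the weights summing to 1, which reduces to ||S - B||_F^2 + λ ||S||_1 with B the weighted average of the views. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(B, mu):
    """Element-wise soft-threshold shrinkage: sign(b) * max(|b| - mu, 0)."""
    return np.sign(B) * np.maximum(np.abs(B) - mu, 0.0)

def update_S(S_views, alpha, lam):
    """Update S with alpha fixed, under the assumed objective
    sum_v alpha[v] * ||S - S_views[v]||_F^2 + lam * ||S||_1, sum(alpha) == 1.

    S_views: array of shape (p, n, n); alpha: array of shape (p,).
    """
    # Weighted average of the per-view similarity matrices.
    B = np.tensordot(alpha, S_views, axes=1)
    # Minimizing (s - b)^2 + lam * |s| entry by entry gives threshold lam / 2.
    return soft_threshold(B, lam / 2.0)
```

Entries of the weighted average whose magnitude is below λ/2 are set exactly to zero, which is what gives the shared similarity matrix its sparse structure.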
(2) Fixing S and updating α converts problem (6) into solving the following problem:
After an auxiliary quantity is defined, this is equivalent to solving the following problem:
Formula (14) is rewritten as follows:
Formula (15) can be solved by an efficient iterative algorithm based on the alternating direction method of multipliers (ADMM).
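A minimal sketch of the α update follows. It is not the ADMM procedure named in the text: it swaps in a direct Euclidean projection onto the probability simplex, which solves the same kind of constrained least-squares problem in closed form. It also assumes, since the patent's formulas are not reproduced here, that the α subproblem is min_α sum_v α^(v) d^(v) + γ ||α||_2^2 with d^(v) = ||S - S^(v)||_F^2, which is equivalent to projecting -d / (2γ) onto the simplex. Function names are hypothetical.

```python
import numpy as np

def project_to_simplex(y):
    """Euclidean projection of y onto {a : a >= 0, sum(a) == 1}."""
    u = np.sort(y)[::-1]                       # sort in descending order
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, y.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]  # last index kept positive
    theta = css[rho] / (rho + 1.0)
    return np.maximum(y - theta, 0.0)

def update_alpha(S, S_views, gamma):
    """Update the view weights with S fixed (assumed subproblem, see lead-in)."""
    # Squared Frobenius distance of each view's matrix from the shared S.
    d = np.array([np.linalg.norm(S - Sv, "fro") ** 2 for Sv in S_views])
    return project_to_simplex(-d / (2.0 * gamma))
```

Views whose similarity matrix lies far from the current shared matrix receive a smaller weight, and a view that is far enough away receives a weight of exactly zero.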
By analysis, it can be easily found that the solution of α is related to S and that the solution of S is also related to α, so the original formula (6) is solved by alternately and iteratively optimizing S and α.
The whole process of solving formula (6) is shown in Algorithm 1:
The flowchart of the multi-view clustering method based on self-adaptive sparse graph learning (ASGL) is shown in Fig. 1.
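The alternating procedure described above can be sketched as a short loop. This is an illustration under stated assumptions, not the patent's Algorithm 1, which is not reproduced in this text: the objective is assumed to be sum_v α^(v) ||S - S^(v)||_F^2 + λ ||S||_1 + γ ||α||_2^2 on the simplex, the α step uses a direct simplex projection in place of ADMM, and the function names, default parameters and stopping rule are hypothetical.

```python
import numpy as np

def soft_threshold(B, mu):
    """Element-wise soft-threshold shrinkage."""
    return np.sign(B) * np.maximum(np.abs(B) - mu, 0.0)

def project_to_simplex(y):
    """Euclidean projection of y onto {a : a >= 0, sum(a) == 1}."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, y.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(y - css[rho] / (rho + 1.0), 0.0)

def learn_shared_similarity(S_views, lam=0.1, gamma=1.0, max_iter=50, tol=1e-6):
    """Alternate between the S update and the alpha update until S stabilizes."""
    S_views = np.asarray(S_views, dtype=float)
    p = S_views.shape[0]
    alpha = np.full(p, 1.0 / p)                 # start from equal view weights
    S = np.tensordot(alpha, S_views, axes=1)
    for _ in range(max_iter):
        # Fix alpha, update S: shrink the weighted average of the views.
        S_new = soft_threshold(np.tensordot(alpha, S_views, axes=1), lam / 2.0)
        # Fix S, update alpha: views far from S get smaller weights.
        d = np.sum((S_views - S_new) ** 2, axis=(1, 2))
        alpha = project_to_simplex(-d / (2.0 * gamma))
        converged = np.linalg.norm(S_new - S) <= tol * max(1.0, np.linalg.norm(S))
        S = S_new
        if converged:
            break
    return S, alpha
```

The returned matrix S would then be passed to a standard spectral clustering routine as a precomputed affinity matrix; that step is left out here to keep the sketch dependent on NumPy only.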
To verify the effectiveness of the method, clustering experiments were carried out on six data sets: the 3Sources text data set, the COIL20 toy data set, the MSRC image data set, the NUS image data set, the ORL face data set and the Outdoor Scene data set. The results under four evaluation criteria, namely accuracy, normalized mutual information (NMI), adjusted Rand index (ARI) and purity, are as follows:
Data set        Accuracy   NMI      ARI      Purity
3Sources        73.79%     67.45%   57.63%   81.07%
COIL20          93.36%     97.01%   92.73%   94.78%
MSRC            82.90%     73.75%   67.95%   83.29%
NUS             28.67%     16.34%   10.12%   31.10%
ORL             83.13%     93.19%   79.34%   86.85%
Outdoor Scene   64.47%     55.58%   46.20%   66.55%
Compared with five recent multi-view clustering methods (S-MVSC, SwMC, AMGL, MLAN and AWP), the ASGL method provided by the invention obtains the best clustering results on all six experimental data sets under all four common clustering evaluation criteria.
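Of the four evaluation criteria above, purity is the simplest to state. The sketch below uses the usual definition, assumed here because the text does not say how the criteria were computed: each predicted cluster is credited with its most frequent true label, and the credited counts are summed and divided by the number of samples. The function name is hypothetical.

```python
import numpy as np

def purity(labels_true, labels_pred):
    """Fraction of samples that carry the majority true label of their cluster."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    total = 0
    for c in np.unique(labels_pred):
        # True labels of the samples assigned to predicted cluster c.
        members = labels_true[labels_pred == c]
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()
    return total / labels_true.size
```

Purity does not depend on how the predicted clusters are numbered, so a labeling that swaps the cluster identifiers of a perfect clustering still scores 1.0.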
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (3)

1. A multi-view clustering method based on self-adaptive sparse graph learning is characterized by clustering text or image data, and comprises the following steps:
(1) Self-adaptive neighbor graph learning: a similarity matrix is constructed as input for each view of the given multi-view data, the similarity matrix being learned from the data matrix of the view through self-adaptive neighbor graph learning, as follows:
for the i-th data point x_i^v of the v-th view, every data point x_j^v (j = 1, ..., n) of the v-th view is connected to x_i^v as a neighbor with probability s_ij^v; the probabilities s_ij^v for all data points of the v-th view are determined by solving the following problem:
the vector s_i^v learned from S^v has k non-zero values, k being the neighbor parameter, and the result obtained after solving is:
according to this result, the similarity matrix of each view is learned adaptively from the data;
(2) Shared similarity matrix learning
A shared similarity matrix with a sparse structure is learned from the given multi-view data as the input of standard spectral clustering, as follows:
p similarity matrices S^(1), S^(2), ..., S^(p) are first constructed, where S^(v) ∈ R^(n×n) (1 ≤ v ≤ p); parameters λ and γ are introduced, and the following model is proposed:
s.t. α^(v) ≥ 0, α^T 1_p = 1,
where α = [α^(1), α^(2), ..., α^(p)], λ > 0 and γ > 0; solving the above model is converted into solving the following relaxation problem:
s.t. α^(v) ≥ 0, α^T 1_p = 1,
where ||S||_1 is the convex relaxation of ||S||_0;
(3) Optimization algorithm
The relaxation problem is solved with the alternating direction method of multipliers (ADMM): α and S are updated alternately, each with the other variable held fixed, and the learned shared similarity matrix with a sparse structure is used as the input of standard spectral clustering to obtain the clustering result;
in step (1), when the initial problem is solved directly, only the nearest data point becomes a neighbor of x_i^v and none of the other data points do; the following problem is therefore considered as well:
since the optimal solution of this problem makes every data point of the v-th view a neighbor of x_i^v with the same probability 1/n, the two problems are combined and further converted into solving the following problem:
the final result of solving is:
2. The multi-view clustering method based on self-adaptive sparse graph learning of claim 1, wherein in step (3), α is fixed when S is updated, and the relaxation problem is converted into solving the following problem:
the objective function of this problem satisfies:
the problem is therefore equivalent to the following formula:
after an auxiliary matrix is defined, this is further simplified; a soft-threshold shrinkage operator with threshold μ > 0 is introduced, and the updated similarity matrix S is obtained as:
3. The multi-view clustering method based on self-adaptive sparse graph learning of claim 1, wherein in step (3), S is fixed when α is updated, and the relaxation problem is converted into solving the following problem:
s.t. α^(v) ≥ 0, α^T 1_p = 1;
after an auxiliary quantity is defined, this is equivalent to solving the following problem:
s.t. α^(v) ≥ 0, α^T 1_p = 1;
this is further converted into an equivalent form, which is then solved by the alternating direction method of multipliers (ADMM).
CN202110158287.0A 2021-02-05 2021-02-05 Multi-view clustering method based on self-adaptive sparse graph learning Active CN112766412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110158287.0A CN112766412B (en) 2021-02-05 2021-02-05 Multi-view clustering method based on self-adaptive sparse graph learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110158287.0A CN112766412B (en) 2021-02-05 2021-02-05 Multi-view clustering method based on self-adaptive sparse graph learning

Publications (2)

Publication Number Publication Date
CN112766412A (en) 2021-05-07
CN112766412B (en) 2023-11-07

Family

ID=75705070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110158287.0A Active CN112766412B (en) 2021-02-05 2021-02-05 Multi-view clustering method based on self-adaptive sparse graph learning

Country Status (1)

Country Link
CN (1) CN112766412B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550906A (en) * 2022-01-14 2022-05-27 山东师范大学 Cancer subtype recognition system based on multi-view robust representation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764276A (en) * 2018-04-12 2018-11-06 西北大学 A kind of robust weights multi-characters clusterl method automatically
CN109508752A (en) * 2018-12-20 2019-03-22 西北工业大学 A kind of quick self-adapted neighbour's clustering method based on structuring anchor figure
CN110378365A (en) * 2019-06-03 2019-10-25 广东工业大学 A kind of multiple view Subspace clustering method based on joint sub-space learning
CN111008637A (en) * 2018-10-08 2020-04-14 北京京东尚科信息技术有限公司 Image classification method and system
CN111401468A (en) * 2020-03-26 2020-07-10 上海海事大学 Weight self-updating multi-view spectral clustering method based on shared neighbor
CN112148911A (en) * 2020-08-19 2020-12-29 江苏大学 Image clustering method of multi-view intrinsic low-rank structure

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190138786A1 (en) * 2017-06-06 2019-05-09 Sightline Innovation Inc. System and method for identification and classification of objects

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
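A multitask multiview clustering algorithm in heterogeneous situations based on LLE and LE; Yiling Zhang et al.; Knowledge-Based Systems; Vol. 163; 776-786 *
Multi-view spectral clustering algorithm based on shared nearest neighbor; Song Yan et al.; Journal of Computer Applications; Vol. 40 (No. 11); 3211-3216 *
Research on multi-view clustering methods based on dynamic neighbor learning; 肖庆江; China Master's Theses Full-text Database, Information Science and Technology; No. (2022)09; I138-133 *
Research on multi-view clustering methods based on subspace learning; 许楠; China Master's Theses Full-text Database, Information Science and Technology; No. (2020)02; I138-1014 *
Research on multi-view embedding learning methods and their applications; 沈肖波; China Doctoral Dissertations Full-text Database, Information Science and Technology; No. (2018)07; I138-45 *
Research on dimensionality reduction and clustering algorithms for multi-view data; 何云; China Master's Theses Full-text Database, Information Science and Technology; No. (2020)02; I138-821 *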

Also Published As

Publication number Publication date
CN112766412A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
Zhang et al. Improved deep hashing with soft pairwise similarity for multi-label image retrieval
CN109783666B (en) Image scene graph generation method based on iterative refinement
CN113065974B (en) Link prediction method based on dynamic network representation learning
CN111931814B (en) Unsupervised countering domain adaptation method based on intra-class structure tightening constraint
CN113205048B (en) Gesture recognition method and system
CN109960732B (en) Deep discrete hash cross-modal retrieval method and system based on robust supervision
CN115688913A (en) Cloud-side collaborative personalized federal learning method, system, equipment and medium
CN112416293B (en) Neural network enhancement method, system and application thereof
CN114969367B (en) Cross-language entity alignment method based on multi-aspect subtask interaction
CN112766412B (en) Multi-view clustering method based on self-adaptive sparse graph learning
CN116861001A (en) Medical common sense knowledge graph automatic construction method based on meta learning
CN111709523A (en) Width learning method based on internal integration
Anwaar et al. Genetic algorithms: Brief review on genetic algorithms for global optimization problems
CN116386899A (en) Graph learning-based medicine disease association relation prediction method and related equipment
CN115272774A (en) Sample attack resisting method and system based on improved self-adaptive differential evolution algorithm
CN113806559B (en) Knowledge graph embedding method based on relationship path and double-layer attention
US8874615B2 (en) Method and apparatus for implementing a learning model for facilitating answering a query on a database
CN111241326A (en) Image visual relation referring and positioning method based on attention pyramid network
CN112784902B (en) Image classification method with missing data in mode
CN109063725B (en) Multi-view clustering-oriented multi-graph regularization depth matrix decomposition method
Chao et al. Incomplete contrastive multi-view clustering with high-confidence guiding
CN117078312A (en) Advertisement putting management method and system based on artificial intelligence
CN117409456A (en) Non-aligned multi-view multi-mark learning method based on graph matching mechanism
Balázs et al. Hierarchical fuzzy system modeling by genetic and bacterial programming approaches
CN115131605A (en) Structure perception graph comparison learning method based on self-adaptive sub-graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant