CN105160357A - Multimodal data subspace clustering method based on global consistency and local topology - Google Patents

Multimodal data subspace clustering method based on global consistency and local topology

Publication number
CN105160357A
Authority
CN
China
Prior art keywords
modal data
matrix
expression
expression matrix
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510546959.XA
Other languages
Chinese (zh)
Inventor
赫然
胡包钢
樊艳波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201510546959.XA
Publication of CN105160357A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a multimodal data subspace clustering method based on global consistency and local topology. The method comprises obtaining a Laplacian matrix corresponding to each piece of modal data, establishing a multimodal data subspace clustering model according to the Laplacian matrixes, obtaining a self-expression matrix corresponding to each piece of modal data through the multimodal data subspace clustering model, selecting the first self-expression matrixes from all the self-expression matrixes of the various pieces of modal data, and clustering the first self-expression matrixes to obtain a clustering result. The multimodal data subspace clustering method based on global consistency and local topology is capable of obtaining better clustering performance and enhancing the robustness.

Description

Multimodal data subspace clustering method based on global consistency and local topology
Technical field
The present invention relates to the field of computing, and in particular to a multimodal data subspace clustering method based on global consistency and local topology.
Background art
With the development of science and technology and the growing reach of networks, collecting data has become ever easier. Data volumes grow daily and data are increasingly diverse; multimodal data in particular are now common. Learning methods based on multimodal data have therefore attracted growing attention. Compared with single-modal data, multimodal data provide more, and more complex, information, so learning models based on multimodal data can usually achieve better results and better statistical properties.
In multimodal learning, multimodal data clustering has been widely studied because it can handle large-scale unlabeled data. Its goal is to use the features available under several modalities to group samples into their true categories more accurately. A key problem is how best to establish and exploit the associations between modalities. Much existing work addresses this problem. One line of work first learns a common feature representation from the features of all modalities and then clusters on that common representation rather than on the original multimodal features, as in multimodal clustering based on non-negative matrix factorization. Another line uses information from the different modalities to add constraint terms during model training. Both can achieve good results. Subspace clustering based on spectral clustering has also been studied intensively in recent years. These methods generally assume that samples of the same class have similar self-expression coefficients and that samples close to each other in space can linearly reconstruct one another. They first compute the self-expression matrix of the samples and then feed it to spectral clustering to produce the final clustering result. Adding structural priors, such as structured sparsity constraints and structured low-rank constraints, yields more robust results.
However, although some existing methods improve the clustering performance on multimodal data to a certain extent, how to better mine and exploit information such as the correlation and the differences between modalities remains a major challenge.
Summary of the invention
The multimodal data subspace clustering method based on global consistency and local topology provided by the invention achieves better clustering performance and improved robustness.
According to one aspect of the invention, a multimodal data subspace clustering method based on global consistency and local topology is provided, comprising: obtaining the Laplacian matrix corresponding to each modality of the data; building a multimodal data subspace clustering model from the Laplacian matrices; obtaining, through the multimodal data subspace clustering model, the self-expression matrix corresponding to each modality; selecting a first self-expression matrix from the self-expression matrices corresponding to the modalities; and clustering the first self-expression matrix to obtain a clustering result.
In the method provided by the embodiment of the invention, the Laplacian matrix corresponding to each modality is obtained, a multimodal data subspace clustering model is built from the Laplacian matrices, the self-expression matrix corresponding to each modality is obtained through the model, a first self-expression matrix is selected from these self-expression matrices, and the first self-expression matrix is clustered by a spectral clustering method to obtain a clustering result. The method achieves better clustering performance and improved robustness.
Brief description of the drawings
Fig. 1 is a flowchart of the multimodal data subspace clustering method based on global consistency and local topology provided by the embodiment of the invention.
Detailed description
The multimodal data subspace clustering method based on global consistency and local topology provided by the embodiment of the invention is described in detail below with reference to the accompanying drawing.
Fig. 1 is a flowchart of the multimodal data subspace clustering method based on global consistency and local topology provided by the embodiment of the invention.
Referring to Fig. 1, in step S101, the Laplacian matrix corresponding to each modality is obtained.
In step S102, a multimodal data subspace clustering model is built from the Laplacian matrices.
In step S103, the self-expression matrix corresponding to each modality is obtained through the multimodal data subspace clustering model.
In step S104, a first self-expression matrix is selected from the self-expression matrices corresponding to the modalities.
Here, the first self-expression matrix is the optimal one among the self-expression matrices of the modalities. The optimal self-expression matrix can be chosen from prior knowledge about the data, or it can be found by testing on a validation set.
In step S105, the first self-expression matrix is clustered to obtain a clustering result.
Further, obtaining the Laplacian matrix corresponding to each modality comprises:
computing the similarities for each modality with a Gaussian kernel function;
obtaining the corresponding similarity matrix from the similarities;
obtaining the corresponding Laplacian matrix from the similarity matrix.
Here, to improve the efficiency of the algorithm, the similarity matrix can be built with, for example but not limited to, the k-nearest-neighbor method. Specifically, for each modality a Gaussian kernel function is used to compute the similarity between each sample and its k nearest neighbors, which gives a similarity matrix W_i; the Laplacian matrix L_i is then obtained from W_i.
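The construction just described can be sketched as follows. This is a minimal illustration and not the applicant's implementation: the function name `knn_gaussian_laplacian`, the parameters `k` and `sigma`, the symmetrization by element-wise maximum and the choice of the unnormalized Laplacian L = D - W are all assumptions, since the text does not specify them. Samples are stored as rows here.

```python
import numpy as np

def knn_gaussian_laplacian(X, k=5, sigma=1.0):
    """Build a k-NN Gaussian similarity matrix W and the Laplacian L = D - W.

    X has shape (n_samples, n_features) and holds one modality.
    k, sigma and the unnormalized Laplacian are illustrative assumptions.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq_norms = np.sum(X ** 2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    dist2 = np.maximum(dist2, 0.0)  # clip small negatives caused by round-off

    W = np.zeros((n, n))
    for a in range(n):
        order = np.argsort(dist2[a])
        neighbours = [b for b in order if b != a][:k]  # k nearest, excluding a
        W[a, neighbours] = np.exp(-dist2[a, neighbours] / (2.0 * sigma ** 2))

    W = np.maximum(W, W.T)          # make the k-NN graph symmetric
    L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian
    return W, L
```

With this choice L is symmetric and positive semidefinite and its rows sum to zero, which is what the trace terms in the clustering model rely on.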
Further, obtaining the self-expression matrix corresponding to each modality through the multimodal data subspace clustering model comprises:
computing the self-expression matrix corresponding to each modality according to formula (1):
$$\langle Z \rangle = \arg\min_{Z} \sum_{i=1}^{m} \|X_i - X_i Z_i\|_F^2 + \lambda \sum_{i=1}^{m} \sum_{j=1,\, j \neq i}^{m} \operatorname{tr}\left(Z_i L_j Z_i^{T}\right) + \beta \|Z\|_* + \rho \|Z\|_F^2 \tag{1}$$
where Z = [Z_1 Z_2 ... Z_m]; Z_i is the self-expression matrix corresponding to the i-th modality; ||X_i - X_i Z_i||_F^2 is the reconstruction error of the i-th modality; L_j is the Laplacian matrix corresponding to the j-th modality; tr(Z_i L_j Z_i^T), summed over j ≠ i, is the local topology invariance term; ||Z||_* is the global consistency term; ||Z||_F^2 is the regularization term; and λ, β and ρ are weight parameters.
Here, ||Z||_* is replaced by a trace expression in an auxiliary matrix S, which gives formula (2):
$$\langle Z \rangle = \arg\min_{Z} \sum_{i=1}^{m} \|X_i - X_i Z_i\|_F^2 + \lambda \sum_{i=1}^{m} \sum_{j=1,\, j \neq i}^{m} \operatorname{tr}\left(Z_i L_j Z_i^{T}\right) + \frac{\beta}{2} \sum_{i=1}^{m} \operatorname{tr}\left(Z_i^{T} S^{-1} Z_i\right) + \frac{\beta}{2} \operatorname{tr}(S) + \rho \|Z\|_F^2 \tag{2}$$
Formula (2) is minimized by alternately optimizing S and Z.
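As background for this replacement, the following standard variational form of the nuclear norm is worth keeping in mind. It is stated here as general context and is not quoted from the application; S is assumed symmetric positive definite and ZZ^T is assumed invertible so that the minimum is attained.

```latex
\|Z\|_* \;=\; \min_{S \succ 0}\; \frac{1}{2}\operatorname{tr}\!\left(Z^{T} S^{-1} Z\right) + \frac{1}{2}\operatorname{tr}(S),
\qquad \text{attained at } S = \left(Z Z^{T}\right)^{1/2},
\qquad
\operatorname{tr}\!\left(Z^{T} S^{-1} Z\right) = \sum_{i=1}^{m} \operatorname{tr}\!\left(Z_i^{T} S^{-1} Z_i\right)
\ \text{for } Z = [Z_1\; Z_2\; \dots\; Z_m].
```

At the minimizer both halves equal tr(S)/2, so their sum is tr((ZZ^T)^{1/2}), which is the sum of the singular values of Z. Read this way, the μI term in formula (3) plausibly acts as a small smoothing term that keeps S invertible when ZZ^T is rank deficient.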
When the self-expression matrices Z_i, i = 1, ..., m, are held fixed, S is updated by formula (3):
$$S = \left(Z Z^{T} + \mu I\right)^{0.5}, \qquad Z = [Z_1 \; Z_2 \; \dots \; Z_m] \tag{3}$$
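The S update of formula (3) is a matrix square root of a symmetric positive definite matrix, which can be computed from an eigendecomposition. The sketch below assumes each Z_i is an n-by-n NumPy array; the function name `update_S`, the default value of `mu` and the decision to return the inverse as well are illustrative choices, not details given in the text.

```python
import numpy as np

def update_S(Z_list, mu=1e-6):
    """Return S = (Z Z^T + mu * I)^(1/2) and S^{-1} for Z = [Z_1 ... Z_m].

    Z_list is a list of (n, n) arrays; mu > 0 is an assumed small constant.
    """
    Z = np.hstack(Z_list)                 # shape (n, m * n)
    n = Z.shape[0]
    M = Z @ Z.T + mu * np.eye(n)          # symmetric positive definite
    eigvals, eigvecs = np.linalg.eigh(M)  # M = V diag(eigvals) V^T
    sqrt_vals = np.sqrt(eigvals)
    S = (eigvecs * sqrt_vals) @ eigvecs.T      # V diag(sqrt) V^T
    S_inv = (eigvecs / sqrt_vals) @ eigvecs.T  # V diag(1 / sqrt) V^T
    return S, S_inv
```

Returning the inverse alongside S avoids a second factorization, since the Z_i update only needs S^{-1}.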
When S is held fixed, the self-expression matrix Z_i of each modality, i = 1, ..., m, is updated by formula (4):
$$\langle Z_i \rangle = \arg\min_{Z_i} \|X_i - X_i Z_i\|_F^2 + \lambda \sum_{j=1,\, j \neq i}^{m} \operatorname{tr}\left(Z_i L_j Z_i^{T}\right) + \frac{\beta}{2} \operatorname{tr}\left(Z_i^{T} S^{-1} Z_i\right) + \rho \|Z_i\|_F^2 \tag{4}$$
Transforming formula (4) yields formula (5):
$$\left(X_i^{T} X_i + \frac{\beta}{2} S^{-1} + \rho I\right) Z_i + \lambda Z_i \sum_{j=1,\, j \neq i}^{m} L_j = X_i^{T} X_i \tag{5}$$
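Formula (5) has the form A Z_i + Z_i B = Q, a Sylvester equation. When A and B are both symmetric it can be solved with two eigendecompositions, as in the sketch below. The text does not say which solver is used, so this is only one possible approach; the function name `update_Z_i`, the default weights and the column-per-sample layout of X_i (shape d_i by n) are assumptions, and at least one other modality must be supplied in `L_others`.

```python
import numpy as np

def update_Z_i(X_i, S_inv, L_others, lam=0.1, beta=1.0, rho=0.1):
    """Solve (X^T X + beta/2 S^-1 + rho I) Z + lam * Z * sum_j L_j = X^T X.

    X_i: (d_i, n) data of modality i, one sample per column (assumed layout).
    S_inv: (n, n) inverse of S; L_others: non-empty list of (n, n) Laplacians
    of the other modalities. Default weights are illustrative only.
    """
    n = X_i.shape[1]
    G = X_i.T @ X_i
    A = G + 0.5 * beta * S_inv + rho * np.eye(n)  # symmetric positive definite
    B = lam * sum(L_others)                       # symmetric positive semidefinite
    a, U = np.linalg.eigh((A + A.T) / 2.0)        # A = U diag(a) U^T
    b, V = np.linalg.eigh((B + B.T) / 2.0)        # B = V diag(b) V^T
    # With Y = U^T Z V the equation decouples: (a_p + b_q) * Y[p, q] = Q[p, q].
    Q = U.T @ G @ V
    Y = Q / (a[:, None] + b[None, :])
    return U @ Y @ V.T
```

Because rho > 0 makes every a_p strictly positive and graph Laplacians have non-negative eigenvalues, the denominators a_p + b_q never vanish, so the solution is unique.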
Further, clustering the first self-expression matrix to obtain a clustering result comprises:
clustering the first self-expression matrix by a spectral clustering method to obtain the clustering result.
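A minimal sketch of spectral clustering on a self-expression matrix is given below. It is not the applicant's implementation: it uses a normalized-Laplacian embedding followed by a small k-means loop, which is a common variant rather than the exact Normalized Cut procedure, and the affinity (|Z| + |Z|^T) / 2 is an assumed, widely used convention in subspace clustering. The function name `spectral_clustering` and the deterministic farthest-point initialization are also illustrative choices.

```python
import numpy as np

def spectral_clustering(Z, n_clusters, n_iter=100):
    """Cluster n samples from a self-expression matrix Z of shape (n, n)."""
    W = 0.5 * (np.abs(Z) + np.abs(Z).T)  # symmetric affinity (assumed form)
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Eigenvectors of the smallest eigenvalues form the spectral embedding.
    _, vecs = np.linalg.eigh(L_sym)
    E = vecs[:, :n_clusters]
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)

    # Plain k-means with deterministic farthest-point initialization.
    centres = [E[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((E - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(E[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        dists = ((E[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centres = np.array([
            E[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels
```

On a block-diagonal Z, where each sample is reconstructed only from samples of its own subspace, the embedding collapses each block to a single point, so the k-means step separates the blocks directly.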
To illustrate the method in detail, it is applied to five commonly used multimodal clustering databases: Movies617, PASCAL-VOC, WiKiText-Image, Animal and 3-Sources. Movies617 contains 617 films in 17 categories; its two modalities are a 1878-dimensional keyword feature and a 1398-dimensional cast feature. PASCAL-VOC contains image-text pairs in 20 classes; removing the samples that carry multiple category labels leaves 5649 samples. To limit the running time of some comparison methods, the first three classes are taken as the evaluation set, using 512-dimensional Gist features and 399-dimensional text word-frequency features. The WiKiText-Image database consists of 2866 image-text pairs in 10 categories; 60 samples are randomly selected from each category to form a test set of 600 samples, with 10-dimensional LDA features for the text and 128-dimensional SIFT features for the images. The Animal database consists of 30475 samples in 50 classes; the first ten categories are chosen and 50 samples are randomly selected from each to form the test set, with PyramidHOG (PHOG), colorSIFT and SURF features as the representations under three modalities. The 3-Sources database contains 416 distinct news stories collected from BBC, Reuters and The Guardian, divided into 6 categories; the 169 stories reported by all three sources are used as the test set.
In summary, the Laplacian matrices under the different modalities are computed first. The data of each data set are then input to the model, which is trained to obtain the self-expression matrices under the different modalities, and the optimal self-expression matrix is selected from among them. Finally, the spectral clustering method Normalized Cut is applied to the optimal self-expression matrix, and the result is taken as the final clustering result.
The above is only a specific embodiment of the invention, but the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims (4)

1. A multimodal data subspace clustering method based on global consistency and local topology, characterized in that the method comprises:
obtaining the Laplacian matrix corresponding to each modality of the data;
building a multimodal data subspace clustering model from the Laplacian matrices;
obtaining, through the multimodal data subspace clustering model, the self-expression matrix corresponding to each modality;
selecting a first self-expression matrix from the self-expression matrices corresponding to the modalities;
clustering the first self-expression matrix to obtain a clustering result.
2. The method according to claim 1, characterized in that obtaining the Laplacian matrix corresponding to each modality comprises:
computing the similarities for each modality with a Gaussian kernel function;
obtaining the corresponding similarity matrix from the similarities;
obtaining the corresponding Laplacian matrix from the similarity matrix.
3. The method according to claim 1, characterized in that obtaining the self-expression matrix corresponding to each modality through the multimodal data subspace clustering model comprises:
computing the self-expression matrix corresponding to each modality according to the following formula:
$$\langle Z \rangle = \arg\min_{Z} \sum_{i=1}^{m} \|X_i - X_i Z_i\|_F^2 + \lambda \sum_{i=1}^{m} \sum_{j=1,\, j \neq i}^{m} \operatorname{tr}\left(Z_i L_j Z_i^{T}\right) + \beta \|Z\|_* + \rho \|Z\|_F^2$$
where Z = [Z_1 Z_2 ... Z_m]; Z_i is the self-expression matrix corresponding to the i-th modality; ||X_i - X_i Z_i||_F^2 is the reconstruction error of the i-th modality; L_j is the Laplacian matrix corresponding to the j-th modality; tr(Z_i L_j Z_i^T), summed over j ≠ i, is the local topology invariance term; ||Z||_* is the global consistency term; ||Z||_F^2 is the regularization term; and λ, β and ρ are weight parameters.
4. The method according to claim 1, characterized in that clustering the first self-expression matrix to obtain a clustering result comprises:
clustering the first self-expression matrix by a spectral clustering method to obtain the clustering result.
CN201510546959.XA 2015-08-31 2015-08-31 Multimodal data subspace clustering method based on global consistency and local topology Pending CN105160357A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510546959.XA CN105160357A (en) 2015-08-31 2015-08-31 Multimodal data subspace clustering method based on global consistency and local topology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510546959.XA CN105160357A (en) 2015-08-31 2015-08-31 Multimodal data subspace clustering method based on global consistency and local topology

Publications (1)

Publication Number Publication Date
CN105160357A true CN105160357A (en) 2015-12-16

Family

ID=54801209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510546959.XA Pending CN105160357A (en) 2015-08-31 2015-08-31 Multimodal data subspace clustering method based on global consistency and local topology

Country Status (1)

Country Link
CN (1) CN105160357A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022351A (en) * 2016-04-27 2016-10-12 天津中科智能识别产业技术研究院有限公司 Learning robustness multi-view clustering method based on nonnegative dictionaries
CN106055879A (en) * 2016-05-24 2016-10-26 北京千安哲信息技术有限公司 Adverse drug reaction mining method and system
CN106971197A (en) * 2017-03-02 2017-07-21 北京工业大学 The Subspace clustering method of multi-view data based on otherness and consistency constraint
CN110456985A (en) * 2019-07-02 2019-11-15 华南师范大学 Hierarchical storage method and system towards multi-modal network big data

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617292A (en) * 2013-12-16 2014-03-05 中国科学院自动化研究所 Multi-view data clustering method based on mutual regularization constraint sub-space expression
CN104008177A (en) * 2014-06-09 2014-08-27 华中师范大学 Method and system for rule base structure optimization and generation facing image semantic annotation

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022351A (en) * 2016-04-27 2016-10-12 天津中科智能识别产业技术研究院有限公司 Learning robustness multi-view clustering method based on nonnegative dictionaries
CN106055879A (en) * 2016-05-24 2016-10-26 北京千安哲信息技术有限公司 Adverse drug reaction mining method and system
CN106971197A (en) * 2017-03-02 2017-07-21 北京工业大学 The Subspace clustering method of multi-view data based on otherness and consistency constraint
CN106971197B (en) * 2017-03-02 2019-12-13 北京工业大学 Subspace clustering method of multi-view data based on difference and consistency constraint
CN110456985A (en) * 2019-07-02 2019-11-15 华南师范大学 Hierarchical storage method and system towards multi-modal network big data

Similar Documents

Publication Publication Date Title
CN110674407B (en) Hybrid recommendation method based on graph convolution neural network
CN103092911B (en) A kind of mosaic society label similarity is based on the Collaborative Filtering Recommendation System of k nearest neighbor
Liu et al. The causal nexus between energy consumption, carbon emissions and economic growth: New evidence from China, India and G7 countries using convergent cross mapping
CN106096066A (en) The Text Clustering Method embedded based on random neighbor
CN103886067B (en) Method for recommending books through label implied topic
CN103399858A (en) Socialization collaborative filtering recommendation method based on trust
CN103473307B (en) Across media sparse hash indexing means
CN103310003A (en) Method and system for predicting click rate of new advertisement based on click log
CN107220311B (en) Text representation method for modeling by utilizing local embedded topics
CN105160357A (en) Multimodal data subspace clustering method based on global consistency and local topology
CN102693316B (en) Linear generalization regression model based cross-media retrieval method
CN104933156A (en) Collaborative filtering method based on shared neighbor clustering
CN106599227B (en) Method and device for acquiring similarity between objects based on attribute values
Fienberg Introduction to papers on the modeling and analysis of network data
Yu et al. Hybrid self-optimized clustering model based on citation links and textual features to detect research topics
CN105678590A (en) topN recommendation method for social network based on cloud model
Colavizza et al. Clustering citation histories in the Physical Review
CN110473073A (en) The method and device that linear weighted function is recommended
CN104572733A (en) User interest tag classification method and device
CN103095849B (en) A method and a system of spervised web service finding based on attribution forecast and error correction of quality of service (QoS)
CN103123685B (en) Text mode recognition method
CN106021289A (en) Method for establishing probability matrix decomposition model based on node user
CN104008177A (en) Method and system for rule base structure optimization and generation facing image semantic annotation
CN115795131A (en) Electronic file classification method and device based on artificial intelligence and electronic equipment
CN104951505A (en) Large-scale data clustering method based on graphic calculation technology

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20151216