CN105160357A - Multimodal data subspace clustering method based on global consistency and local topology - Google Patents
Multimodal data subspace clustering method based on global consistency and local topology
- Publication number
- CN105160357A (application CN201510546959.XA)
- Authority
- CN
- China
- Prior art keywords
- modal data
- matrix
- expression
- expression matrix
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Abstract
The invention provides a multimodal data subspace clustering method based on global consistency and local topology. The method comprises: obtaining the Laplacian matrix corresponding to each modality of the data; building a multimodal data subspace clustering model from the Laplacian matrices; obtaining, through the model, the self-expression matrix corresponding to each modality; selecting a first self-expression matrix from among the self-expression matrices of the modalities; and clustering with the first self-expression matrix to obtain a clustering result. The method can achieve better clustering performance and improved robustness.
Description
Technical field
The present invention relates to the field of computing, and in particular to a multimodal data subspace clustering method based on global consistency and local topology.
Background
With the development of science and technology and the growing reach of networks, collecting data has become ever easier, data volumes keep increasing, and data are becoming more diverse; multimodal data in particular are increasingly common. Learning methods based on multimodal data have accordingly attracted growing attention and research. Compared with single-modality data, multimodal data provide more, and more complex, information, so learning models built on multimodal data usually achieve better results and have better statistical properties.
Within multimodal learning, multimodal data clustering has received wide attention and development because it can handle large-scale unsupervised data. Its aim is to use the features available under multiple modalities to group the data more accurately into their underlying classes. A key problem is how to establish and exploit the correlation between modalities, and much existing work addresses it. Some methods first learn a common feature representation from the features of all modalities and then cluster on that common representation rather than on the original multimodal features, for example multimodal clustering based on non-negative matrix factorization. Others use information from the different modalities to add constraint terms during model training. These methods can achieve good results. Subspace clustering based on spectral clustering has also been widely studied recently. Such methods usually assume that samples of the same class have similar self-expression coefficients and that samples close to one another in space can linearly reconstruct each other. They first compute the self-expression matrix of the samples and then use it as the input to spectral clustering to produce the final clustering result; adding structural priors such as a structured sparsity constraint or a structured low-rank constraint yields more robust results.
However, although some current methods improve clustering performance on multimodal data to a certain extent, how to better mine and exploit information such as the correlations and differences between modalities remains a major challenge.
Summary of the invention
The multimodal data subspace clustering method based on global consistency and local topology provided by the invention can achieve better clustering performance and improved robustness.
According to one aspect of the invention, a multimodal data subspace clustering method based on global consistency and local topology is provided, comprising: obtaining the Laplacian matrix corresponding to each modality of the data; building a multimodal data subspace clustering model from the Laplacian matrices; obtaining, through the multimodal data subspace clustering model, the self-expression matrix corresponding to each modality; selecting a first self-expression matrix from among the self-expression matrices of the modalities; and clustering with the first self-expression matrix to obtain a clustering result.
In the method provided by the embodiment of the invention, the Laplacian matrix corresponding to each modality is obtained, a multimodal data subspace clustering model is built from the Laplacian matrices, the self-expression matrix corresponding to each modality is obtained through the model, a first self-expression matrix is selected from among the self-expression matrices of the modalities, and the first self-expression matrix is clustered by spectral clustering to obtain a clustering result. Better clustering performance and improved robustness can thereby be obtained.
Brief description of the drawings
Fig. 1 is a flowchart of the multimodal data subspace clustering method based on global consistency and local topology provided by an embodiment of the invention.
Detailed description
The multimodal data subspace clustering method based on global consistency and local topology provided by an embodiment of the invention is described in detail below with reference to the accompanying drawing.
Fig. 1 is a flowchart of the multimodal data subspace clustering method based on global consistency and local topology provided by an embodiment of the invention.
With reference to Fig. 1, in step S101, the Laplacian matrix corresponding to each modality is obtained.
In step S102, a multimodal data subspace clustering model is built from the Laplacian matrices.
In step S103, the self-expression matrix corresponding to each modality is obtained through the multimodal data subspace clustering model.
In step S104, a first self-expression matrix is selected from among the self-expression matrices of the modalities.
Here, the first self-expression matrix is the optimal one among the self-expression matrices of the modalities. It can be chosen from prior knowledge of the data, or by testing on a validation set.
In step S105, clustering is performed with the first self-expression matrix to obtain a clustering result.
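The validation-set option for step S104 can be sketched as follows. This is only an illustration under stated assumptions: the text does not name a scoring metric, so cluster purity is used here as a stand-in, and the names `purity`, `select_first_modality` and the toy labels are hypothetical.

```python
from collections import Counter

def purity(pred, truth):
    """Fraction of samples that fall in the majority true class of their cluster."""
    by_cluster = {}
    for p, t in zip(pred, truth):
        by_cluster.setdefault(p, []).append(t)
    hits = sum(Counter(ts).most_common(1)[0][1] for ts in by_cluster.values())
    return hits / len(truth)

def select_first_modality(pred_by_modality, truth):
    """Return (index, scores): the modality whose clustering scores best on the validation set."""
    scores = [purity(pred, truth) for pred in pred_by_modality]
    return max(range(len(scores)), key=scores.__getitem__), scores

# Hypothetical validation labels and the cluster labels obtained from each Z_i.
validation_truth = [0, 0, 0, 1, 1, 1]
pred_by_modality = [
    [0, 0, 1, 1, 1, 0],   # clustering obtained from Z_1
    [1, 1, 1, 0, 0, 0],   # clustering obtained from Z_2 (perfect up to relabelling)
]
best, scores = select_first_modality(pred_by_modality, validation_truth)
```

Purity is insensitive to how cluster labels are numbered, which is why the second modality is preferred here even though its label ids are swapped relative to the ground truth.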
Further, obtaining the Laplacian matrix corresponding to each modality comprises:
computing the similarities for each modality with a Gaussian kernel function;
obtaining the corresponding similarity matrix from the similarities;
obtaining the corresponding Laplacian matrix from the similarity matrix.
Here, to improve the efficiency of the algorithm, the similarity matrix can be built with, but is not limited to, the k-nearest-neighbor method. Specifically, for each modality, a Gaussian kernel function is used to compute the similarity between each sample and its k nearest neighbors, giving a similarity matrix $W_i$; the Laplacian matrix $L_i$ is then obtained from $W_i$.
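The construction of a k-nearest-neighbor Gaussian similarity matrix and its Laplacian for one modality might look like the dependency-free sketch below. Several details are assumptions not fixed by the text: the kernel bandwidth `sigma`, the symmetrisation rule (an edge is kept if either sample selects the other), and the use of the unnormalised Laplacian L = D - W. The function name and toy data are hypothetical.

```python
import math

def knn_gaussian_laplacian(samples, k=2, sigma=1.0):
    """Return (W, L): a symmetric k-NN Gaussian similarity matrix and L = D - W."""
    n = len(samples)
    # Squared Euclidean distance between every pair of samples.
    dist2 = [[sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
              for j in range(n)] for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Indices of the k nearest neighbours of sample i (excluding i itself).
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: dist2[i][j])[:k]
        for j in neighbours:
            w = math.exp(-dist2[i][j] / (2.0 * sigma ** 2))
            # Symmetrise: keep the edge if either sample selects the other.
            W[i][j] = w
            W[j][i] = w
    # Unnormalised graph Laplacian: degree matrix minus similarity matrix.
    L = [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
         for i in range(n)]
    return W, L

# Toy modality: two well-separated groups of 2-D points.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
W, L = knn_gaussian_laplacian(X, k=2, sigma=1.0)
```

Restricting each sample to k neighbors keeps W sparse, which is the efficiency motivation given above; in the toy data no edge connects the two groups.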
Further, obtaining the self-expression matrix corresponding to each modality through the multimodal data subspace clustering model comprises:
computing the self-expression matrix corresponding to each modality according to formula (1):
where $Z = [Z_1\ Z_2\ \cdots\ Z_m]$, $Z_i$ is the self-expression matrix corresponding to the $i$-th modality, and $L_i$ is the Laplacian matrix corresponding to the $i$-th modality. The terms of formula (1) are, respectively, the reconstruction error of each modality, the local topology invariance of each modality, the global consistency of the modalities $\|Z\|_*$, and a regularization term; $\lambda$, $\beta$ and $\rho$ are weight parameters.
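As a small illustration of why a term of the form $\|Z\|_*$ can express consistency across modalities, the sketch below reads $\|\cdot\|_*$ as the nuclear norm (the sum of singular values), which is the usual meaning of this notation but is not spelled out in the text. The stacking of the $Z_i$ side by side and all variable names are assumptions made for the example.

```python
import numpy as np

def nuclear_norm(M):
    """Sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

u = np.array([[1.0], [2.0], [3.0]])
v = np.array([[3.0], [-1.0], [0.5]])
Z1 = u @ u.T            # rank-1 coefficient matrix
Z2 = v @ v.T            # rank-1, but with a different column space

agree = np.hstack([Z1, Z1])      # both modalities share one structure
differ = np.hstack([Z1, Z2])     # the modalities disagree

rank_agree = np.linalg.matrix_rank(agree)
rank_differ = np.linalg.matrix_rank(differ)
```

When the modalities agree, the stacked matrix stays rank one and its nuclear norm is only $\sqrt{2}$ times that of a single $Z_i$; disagreement raises the rank, which a nuclear-norm penalty discourages.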
Here, $\|Z\|_*$ is replaced, which gives formula (2):
Formula (2) is minimized by optimizing $S$ and $Z$ in alternating iterations.
When the self-expression matrices $Z_i$, $i = 1, \dots, m$, of the modalities are held fixed, $S$ is updated by formula (3):
$$S = (ZZ^{T} + \mu I)^{0.5}, \qquad Z = [Z_1\ Z_2\ \cdots\ Z_m] \tag{3}$$
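The update of $S$ in formula (3) can be sketched with NumPy as below. The column-wise stacking of the $Z_i$, the value of `mu`, and the random test matrices are assumptions for illustration; the matrix square root is taken through an eigendecomposition, which is valid because $ZZ^{T} + \mu I$ is symmetric positive definite for $\mu > 0$.

```python
import numpy as np

def update_S(Z_list, mu=1e-3):
    """S = (Z Z^T + mu I)^(1/2) with Z = [Z_1 Z_2 ... Z_m] stacked column-wise."""
    Z = np.hstack(Z_list)                     # n x (m*n)
    A = Z @ Z.T + mu * np.eye(Z.shape[0])     # symmetric positive definite
    # Square root of a symmetric PD matrix through its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(A)
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T

rng = np.random.default_rng(0)
Z_list = [rng.standard_normal((4, 4)) for _ in range(2)]   # toy Z_1, Z_2
S = update_S(Z_list, mu=1e-3)
```

The small positive `mu` keeps the matrix invertible even when $ZZ^{T}$ is rank deficient, so the square root and any later inverse of $S$ remain well defined.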
When $S$ is held fixed, the self-expression matrix $Z_i$, $i = 1, \dots, m$, of each modality is updated by formula (4):
Formula (4) is rearranged to give formula (5):
Further, clustering with the first self-expression matrix to obtain a clustering result comprises:
clustering the first self-expression matrix by spectral clustering to obtain the clustering result.
To illustrate the method in detail, the proposed method is applied to five commonly used multimodal clustering databases: Movies617, PASCAL-VOC, WiKiText-Image, Animal and 3-Sources. Movies617 contains 617 films in 17 categories, with two modalities: 1878-dimensional keyword features and 1398-dimensional actor features. PASCAL-VOC contains image-text pairs in 20 classes; removing the samples that carry more than one class label leaves 5649 samples. In view of the time cost of some comparison methods, the samples of the first three classes are taken as the evaluation set, using 512-dimensional Gist features and 399-dimensional text word-frequency features. WiKiText-Image consists of 2866 image-text pairs in 10 categories; 60 samples are randomly selected from each category to form a test set of 600 samples, where the text features are 10-dimensional LDA features and the image features are 128-dimensional SIFT features. Animal consists of 30475 samples in 50 classes; the first ten classes are chosen and 50 samples are randomly selected from each to form the test set, with Pyramid HOG (PHOG), colorSIFT and SURF features as the representations under three modalities. 3-Sources contains 416 distinct news stories collected from BBC, Reuters and The Guardian, divided into 6 categories; the 169 stories reported by all three sources are used as the test set.
In summary, the Laplacian matrices under the different modalities are computed first; the data of each data set are then fed to the model, which is trained to obtain the self-expression matrix under each modality; the optimal self-expression matrix is selected from among them; finally, the spectral clustering method Normalized Cut is applied to the optimal self-expression matrix, and the result is taken as the final clustering result.
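The final clustering step can be sketched as a generic normalised spectral clustering on the selected self-expression matrix. This is not claimed to be the exact Normalized Cut routine used in the experiments: the affinity $(|Z| + |Z^{T}|)/2$, the row normalisation of the embedding, and the small deterministic k-means are all assumptions made to keep the example self-contained, and the toy matrix is hypothetical.

```python
import numpy as np

def spectral_clustering(Z, n_clusters, n_iter=50):
    """Cluster samples from a self-expression matrix Z with normalised spectral clustering."""
    # Symmetric, non-negative affinity built from the self-expression coefficients.
    A = 0.5 * (np.abs(Z) + np.abs(Z.T))
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # Symmetric normalised Laplacian: I - D^{-1/2} A D^{-1/2}.
    L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Eigenvectors of the n_clusters smallest eigenvalues form the embedding.
    _, vecs = np.linalg.eigh(L_sym)
    U = vecs[:, :n_clusters]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Plain k-means with deterministic farthest-point initialisation.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = U[labels == c].mean(axis=0)
    return labels

# Toy block-diagonal self-expression matrix: samples 0-2 and 3-5 reconstruct each other.
Z = np.zeros((6, 6))
Z[:3, :3] = 0.5
Z[3:, 3:] = 0.5
np.fill_diagonal(Z, 0.0)
labels = spectral_clustering(Z, n_clusters=2)
```

A block-diagonal self-expression matrix is the ideal case: samples only reconstruct others from their own subspace, so the affinity graph splits into one connected component per cluster.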
The above is only a specific embodiment of the invention, but the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.
Claims (4)
1. A multimodal data subspace clustering method based on global consistency and local topology, characterized in that the method comprises:
obtaining the Laplacian matrix corresponding to each modality of the data;
building a multimodal data subspace clustering model from the Laplacian matrices;
obtaining, through the multimodal data subspace clustering model, the self-expression matrix corresponding to each modality;
selecting a first self-expression matrix from among the self-expression matrices of the modalities;
clustering with the first self-expression matrix to obtain a clustering result.
2. The method according to claim 1, characterized in that obtaining the Laplacian matrix corresponding to each modality comprises:
computing the similarities for each modality with a Gaussian kernel function;
obtaining the corresponding similarity matrix from the similarities;
obtaining the corresponding Laplacian matrix from the similarity matrix.
3. The method according to claim 1, characterized in that obtaining the self-expression matrix corresponding to each modality through the multimodal data subspace clustering model comprises:
computing the self-expression matrix corresponding to each modality according to the following formula:
where $Z = [Z_1\ Z_2\ \cdots\ Z_m]$, $Z_i$ is the self-expression matrix corresponding to the $i$-th modality, and $L_i$ is the Laplacian matrix corresponding to the $i$-th modality. The terms of the formula are, respectively, the reconstruction error of each modality, the local topology invariance of each modality, the global consistency of the modalities $\|Z\|_*$, and a regularization term; $\lambda$, $\beta$ and $\rho$ are weight parameters.
4. The method according to claim 1, characterized in that clustering with the first self-expression matrix to obtain a clustering result comprises:
clustering the first self-expression matrix by spectral clustering to obtain the clustering result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510546959.XA CN105160357A (en) | 2015-08-31 | 2015-08-31 | Multimodal data subspace clustering method based on global consistency and local topology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510546959.XA CN105160357A (en) | 2015-08-31 | 2015-08-31 | Multimodal data subspace clustering method based on global consistency and local topology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105160357A (en) | 2015-12-16 |
Family ID: 54801209
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510546959.XA Pending CN105160357A (en) | 2015-08-31 | 2015-08-31 | Multimodal data subspace clustering method based on global consistency and local topology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105160357A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106022351A (en) * | 2016-04-27 | 2016-10-12 | 天津中科智能识别产业技术研究院有限公司 | Learning robustness multi-view clustering method based on nonnegative dictionaries |
CN106055879A (en) * | 2016-05-24 | 2016-10-26 | 北京千安哲信息技术有限公司 | Adverse drug reaction mining method and system |
CN106971197A (en) * | 2017-03-02 | 2017-07-21 | 北京工业大学 | The Subspace clustering method of multi-view data based on otherness and consistency constraint |
CN110456985A (en) * | 2019-07-02 | 2019-11-15 | 华南师范大学 | Hierarchical storage method and system towards multi-modal network big data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103617292A (en) * | 2013-12-16 | 2014-03-05 | 中国科学院自动化研究所 | Multi-view data clustering method based on mutual regularization constraint sub-space expression |
CN104008177A (en) * | 2014-06-09 | 2014-08-27 | 华中师范大学 | Method and system for rule base structure optimization and generation facing image semantic annotation |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106022351A (en) * | 2016-04-27 | 2016-10-12 | 天津中科智能识别产业技术研究院有限公司 | Learning robustness multi-view clustering method based on nonnegative dictionaries |
CN106055879A (en) * | 2016-05-24 | 2016-10-26 | 北京千安哲信息技术有限公司 | Adverse drug reaction mining method and system |
CN106971197A (en) * | 2017-03-02 | 2017-07-21 | 北京工业大学 | The Subspace clustering method of multi-view data based on otherness and consistency constraint |
CN106971197B (en) * | 2017-03-02 | 2019-12-13 | 北京工业大学 | Subspace clustering method of multi-view data based on difference and consistency constraint |
CN110456985A (en) * | 2019-07-02 | 2019-11-15 | 华南师范大学 | Hierarchical storage method and system towards multi-modal network big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20151216 |