CN117409456A - Non-aligned multi-view multi-mark learning method based on graph matching mechanism - Google Patents
- Publication number
- CN117409456A
- Authority
- CN
- China
- Prior art keywords
- view
- matrix
- data
- mark
- aligned
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a non-aligned multi-view multi-label learning method based on a graph matching mechanism. A feature matrix and an observable label matrix are constructed from the training data in the sample dataset. The unaligned data are explicitly aligned through permutation matrices, which gives a point-to-point first-order alignment between samples; the distance matrices of the samples in different views are then used to perform a second-order alignment of the views at the graph-structure level, further improving the alignment accuracy of the model. The 'commonality-personality' representation of the aligned views is mined, and a non-aligned multi-view multi-label learning model based on the graph matching mechanism is constructed by exploiting cross-view consistency and complementarity. The model is trained by alternating optimization until it converges, which yields the classification predictor.
Description
Technical Field
The invention relates to non-negative matrix factorization, graph matching technology, non-aligned multi-view learning and multi-label classification, in particular to a non-aligned multi-view multi-label learning method based on a graph matching mechanism.
Background
The large-scale development of the Internet and the spread of big data and artificial intelligence applications have produced massive amounts of multi-view multi-label data. Multi-view multi-label learning has received extensive attention as a primary framework for handling such data. In a multi-view multi-label learning task, each sample object is represented by several heterogeneous views and is annotated with several associated labels at the same time. Existing multi-view multi-label learning methods generally explore the correlated and complementary information across views, and this exploration typically assumes view alignment (the instances at the same position in different views describe the same object). However, spatial, temporal, or spatio-temporal asynchrony can turn this relationship into partial view alignment or complete view non-alignment. For example, in video recommendation the label data come from different video applications, but user privacy protection makes it impossible to match and align these data to the same user. In face recognition, a failure of facial feature detection can leave multi-view faces unaligned, which may make facial expression recognition impossible. Existing multi-view multi-label learning models cannot learn a robust multi-label classification model directly from such non-aligned data.
Therefore, the invention provides a non-aligned multi-view multi-label learning method based on a graph matching mechanism (MCGM for short) to solve the view non-alignment problem and the comprehensive semantic representation problem in multi-view multi-label learning. For the non-alignment of multi-view data, the cross-view 'instance-instance' and 'instance relation-instance relation' graph matching relationships are mined to precisely align the feature nodes of the same instance under different views, and the aligned nodes are used for the subsequent classification task. For the difficulty that existing multi-view multi-label learning algorithms based on a shared subspace representation have in describing all the semantic information of multi-view data, a multi-view multi-label classification model based on a 'commonality-personality' semantic representation is designed, which emphasizes the contribution of individual views to specific semantics and improves the semantic representation of minority-class samples.
Disclosure of Invention
The technical problem solved by the invention is: to provide a non-aligned multi-view multi-label learning method based on a graph matching mechanism that classifies non-aligned multi-view multi-label data, with comprehensive semantic representation ensuring the efficiency and accuracy of the method.
The technical scheme of the invention is as follows: in the non-aligned multi-view multi-label learning method based on a graph matching mechanism, non-aligned multi-view multi-label data are first acquired, stored, preprocessed and split to form a sample dataset. A feature matrix and an observable label matrix are constructed from the training data in the sample dataset. Based on the feature matrix and the observable label matrix, the unaligned data are aligned in two ways: 1) the unaligned data are explicitly aligned through permutation matrices, which gives a point-to-point first-order alignment between samples; 2) the distance matrices of the samples in different views are used to perform a second-order alignment of the views at the graph-structure level, further improving the alignment accuracy of the model. The 'commonality-personality' representation of the aligned views is mined, and a non-aligned multi-view multi-label learning model based on the graph matching mechanism is constructed by exploiting cross-view consistency and complementarity. The model is trained by alternating optimization until it converges, which yields the classification predictor. The test set is then predicted with the converged classifier, and the label classification result is obtained from the output probabilities. The specific steps are as follows:
In the present invention, matrices are denoted by bold upper-case letters, such as $\mathbf{X}$, and vectors by bold lower-case letters, such as $\mathbf{x}$. $(XR)$ denotes the matrix obtained by $X \cdot R$, where $\cdot$ is matrix multiplication. The inverse and transpose of a matrix $X$ are denoted $X^{-1}$ and $X^{T}$ respectively. $X_v$ denotes the feature matrix of the $v$-th view; the $i$-th column and the $j$-th row of $X_v$ are denoted $(X_v)_{:,i}$ and $(X_v)_{j,:}$ respectively. $(X_v)_{i,j}$ is the $(i,j)$ element of $X_v$, and $x_i$ is the $i$-th element of the vector $\mathbf{x}$. In addition, $\mathbb{R}$ denotes the real number field, and the Frobenius norm is denoted $\lVert \cdot \rVert_F$.
Step S1: acquire non-aligned multi-view multi-label data, then store, preprocess and split the data into datasets. Because this is an entirely new problem, no public non-aligned dataset exists, so synthetic datasets are used. Specifically, starting from six public multi-view multi-label datasets, the instances are randomly shuffled so that the instances at the same position in different views describe different objects. This yields the non-aligned multi-view multi-label datasets.
From the training data, a dataset with $V$ views is constructed, $\{X_v\}_{v=1}^{V}$, where $X_v \in \mathbb{R}^{n \times d_v}$ is the complete feature space of the $v$-th view, $n$ is the number of training samples $x_i$, and $d_v$ is the dimension of each sample in that view. $Y = [y_1; \ldots; y_n] \in \{0,1\}^{n \times q}$ is the label space corresponding to the feature space, where $y_i \in \{0,1\}^{q}$ is the label vector of $x_i$ and $q$ is the number of labels.
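The synthetic construction in step S1 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the patent's own preprocessing code: the function name `scramble_views`, the toy arrays and the fixed seed are all assumptions introduced here. Each view gets its own random row permutation, so row `i` no longer refers to the same object across views.

```python
import numpy as np

def scramble_views(views, seed=0):
    """Shuffle the rows of every view with its own random permutation.

    views: list of arrays, each of shape (n, d_v), rows aligned across views.
    Returns the scrambled views and the permutations used, so that
    scrambled[v][i] == views[v][perms[v][i]].
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    scrambled, perms = [], []
    for X in views:
        if X.shape[0] != n:
            raise ValueError("all views must have the same number of samples")
        perm = rng.permutation(n)      # independent permutation per view
        scrambled.append(X[perm])
        perms.append(perm)
    return scrambled, perms

# Toy data (hypothetical): 5 samples, two views with 3 and 2 features.
X1 = np.arange(15, dtype=float).reshape(5, 3)
X2 = np.arange(10, dtype=float).reshape(5, 2)
scrambled, perms = scramble_views([X1, X2], seed=42)
```

Keeping the permutations around is only useful for evaluation on synthetic data; in the real non-aligned setting they are unknown and are exactly what the alignment step has to recover.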
Step S2: for the training data in the sample dataset obtained in step S1, construct the feature matrices $X_v$ and the observable label matrix $Y$, build first-order and second-order cross-view relation matching to align the features of the multiple views, and construct a non-aligned multi-view multi-label classification model based on the 'commonality-personality' semantic representation. The specific steps are as follows:
(a) Permutation matrices are used to explicitly align the unaligned data, giving a point-to-point first-order alignment between instances. After the first-order alignment, a relatively correct mapping between instances is obtained, and non-negative matrix factorization is used to extract a common low-dimensional representation of the data in the different views. Based on the common representation and the observable label matrix, a feature mapping matrix $W_0$ is introduced to construct a linear mapping from the shared subspace to the label space, which gives the initial learning model:
$$\text{s.t. } (M_v)_{i,j} \in \{0,1\},\; M_v\mathbf{1} = \mathbf{1},\; \mathbf{1}^{T} M_v = \mathbf{1}^{T},\; P \ge 0,\; H_v \ge 0,\; W_0 \ge 0 \qquad (1)$$
where $M_v \in \{0,1\}^{n \times n}$ is the permutation matrix of the $v$-th view; multiplying the feature matrix by the permutation matrix gives the aligned multi-view data. $P$ and $H_v$ are obtained by non-negative matrix factorization of the aligned data. $H_v \in \mathbb{R}^{k \times d_v}$ is the individual mapping matrix of the $v$-th view; $P \in \mathbb{R}^{n \times k}$ is the shared subspace, where $k$ is the target reduced dimension; $P \ge 0$ and $H_v \ge 0$ are the non-negativity constraints of the factorization. $W_0 \in \mathbb{R}^{k \times q}$ is the coefficient matrix corresponding to $P$, so it is also constrained by $W_0 \ge 0$. $W_0$ learns the commonality information of the heterogeneous views through the mapping from the shared subspace $P$ to the label space $Y$. $\alpha$ and $\gamma$ are two hyperparameters. The last term regularizes $W_0$ to avoid over-fitting and to reduce the influence of noisy features.
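The objective of model (1) is not present in this text; only its constraints survive. Going by the term-by-term description above (a non-negative factorization of the aligned views, a linear map from the shared subspace to the label space, and a regularizer on $W_0$ as the last term), one plausible reading is sketched below. The squared Frobenius norms, the placement of $\alpha$ and $\gamma$, and the shapes $X_v \in \mathbb{R}^{n \times d_v}$, $P \in \mathbb{R}^{n \times k}$, $H_v \in \mathbb{R}^{k \times d_v}$, $W_0 \in \mathbb{R}^{k \times q}$ are assumptions, not the confirmed formula of the patent.

```latex
\min_{\{M_v\},\,P,\,\{H_v\},\,W_0}\;
\sum_{v=1}^{V} \lVert M_v X_v - P H_v \rVert_F^2
\;+\; \alpha\,\lVert P W_0 - Y \rVert_F^2
\;+\; \gamma\,\lVert W_0 \rVert_F^2
```

Under this reading, the first term ties every aligned view $M_v X_v$ to the same shared factor $P$, which is what lets the permutation matrices and the shared subspace be learned jointly.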
(b) Considering that the individual mapping matrix $H_v$ in fact encodes the characteristics of a single view, another set of coefficient matrices $\{W_v\}_{v=1}^{V}$ is defined to capture the unique characteristics of each view, where $W_v \in \mathbb{R}^{d_v \times q}$ maps the reconstructed feature matrix $PH_v$, which carries the personalized information of the $v$-th view, to the label space. Using both the personality and the commonality information of the heterogeneous views, formula (1) is further extended as follows:
$$\text{s.t. } (M_v)_{i,j} \in \{0,1\},\; M_v\mathbf{1} = \mathbf{1},\; \mathbf{1}^{T} M_v = \mathbf{1}^{T},\; P \ge 0,\; H_v \ge 0,\; W_0 \ge 0,\; W_v \ge 0 \qquad (2)$$
where $P$ can be regarded as a dictionary matrix containing the information of all views, and $H_v$ holds the coding coefficients specific to each view. Thus $PH_v$ aims to capture the individual information of a particular view, while $P$ captures the information shared by all views. $PH_v$ and $P$ can therefore be regarded as the 'commonality-personality' representation of the multi-view data.
(c) The alignment of the second-order graph structure between views and the label correlations are further considered. Since $M_v X_v$ and $M_j X_j$ are the correctly aligned versions of view $X_v$ and view $X_j$, a cross-view mapping matrix, formed from $M_v$ and $M_j$, is used to represent the degree of matching between the samples of view $X_v$ and view $X_j$. Based on the prior that the graph structures formed by the aligned multi-view data should be as consistent as possible, a structure matching loss term is constructed to explore the correct cross-view mapping. For each view, a distance matrix $S_v$ between its samples is built to represent the graph structure of view $v$. The cross-view mapping matrix permutes the distance matrices $S_v$ and $S_j$ so that the graph connection structures of the two views become as similar as possible. $A$ denotes the label correlation matrix; through $A$, the correlations within the multi-label data are used to extract the known correlated label information. The hyperparameter $\beta$ is introduced to balance the weight of the second-order alignment. The final non-aligned multi-view multi-label learning model based on the graph matching mechanism is expressed as follows:
$$\text{s.t. } (M_v)_{i,j} \in \{0,1\},\; M_v\mathbf{1} = \mathbf{1},\; \mathbf{1}^{T} M_v = \mathbf{1}^{T},\; P \ge 0,\; H_v \ge 0,\; W_0 \ge 0,\; W_v \ge 0 \qquad (3)$$
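The first-order and second-order alignment ideas of step S2 can be checked on toy data. The sketch below is an illustration under stated assumptions, not the patent's loss term: the helper names, the Euclidean distance graph, and the mismatch measure $\lVert M_v S_v M_v^{T} - M_j S_j M_j^{T} \rVert_F^2$ are all introduced here for demonstration. Two views with the same geometry are scrambled differently; the true permutation matrices satisfy the row and column sum constraints, restore the row order, and make the permuted distance graphs coincide.

```python
import numpy as np

def permutation_matrix(perm):
    """Return M with M[i, perm[i]] = 1, so that (M @ X)[i] == X[perm[i]]."""
    n = len(perm)
    M = np.zeros((n, n))
    M[np.arange(n), perm] = 1.0
    return M

def pairwise_distances(X):
    """Euclidean distance matrix S with S[i, j] = ||X[i] - X[j]||."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def structure_mismatch(M_v, S_v, M_j, S_j):
    """Squared Frobenius gap between the two permuted distance graphs."""
    return np.linalg.norm(M_v @ S_v @ M_v.T - M_j @ S_j @ M_j.T) ** 2

rng = np.random.default_rng(0)
n = 6
Z = rng.normal(size=(n, 3))                    # hidden aligned samples
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal map
views = [Z, Z @ Q]                             # two views, same geometry

# Scramble each view with a different permutation (non-aligned setting).
p0 = rng.permutation(n)
perms = [p0, np.roll(p0, 1)]
X = [V[p] for V, p in zip(views, perms)]
S = [pairwise_distances(Xv) for Xv in X]

# The aligning permutation is the inverse of the scrambling one.
M_true = [permutation_matrix(np.argsort(p)) for p in perms]
M_identity = [np.eye(n), np.eye(n)]

good = structure_mismatch(M_true[0], S[0], M_true[1], S[1])
bad = structure_mismatch(M_identity[0], S[0], M_identity[1], S[1])
```

With the true permutations the mismatch `good` is zero up to rounding, while leaving the views unaligned gives a clearly positive `bad`. This is the signal a second-order (graph structure) term can exploit even when no point-to-point correspondence is known in advance.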
Step S3: simplify the non-aligned multi-view multi-label learning model based on the graph matching mechanism from step S2, and minimize it by alternating optimization until it converges, which yields the classification predictor. The specific steps are as follows:
(a) Because the constraint on $M_v$ is non-convex, an optimal solution is difficult to obtain. Since an orthogonal transformation does not change the relationships between vectors, a less strict constraint can still preserve the structure of the data. The constraint on $M_v$ is therefore relaxed to:
(b) The objective function is transformed into an unconstrained problem by introducing the Lagrange multipliers λ, Φ, Θ, Ω and ψ. Taking view $v$ as an example, objective function (3) takes the following form on view $v$:
(c) Iteratively optimize $M_v$. With $P$, $\{H_v\}$, $A$, $W_0$ and $\{W_v\}$ fixed, the computation of $M_v$ is independent of $M_{v'}$, $v' \neq v$. Thus $M_v$ is optimized separately for each view $v$.
The standard approach solves equation (3) coupled with the relaxed constraint on $M_v$ using a nonlinear method such as Newton's method. This system of nonlinear equations is often difficult to solve, so an approximate solution is sought instead. This gives the following iterative update rule for $M_v$:
where:
(d) Iteratively optimize $P$. With $\{M_v\}$, $\{H_v\}$, $A$, $W_0$ and $\{W_v\}$ fixed, differentiating the Lagrangian of formula (3) with respect to $P$ gives:
Using the KKT condition $\Phi_{i,j} P_{i,j} = 0$, the following iterative update rule for $P$ is derived:
(e) Iteratively optimize $A$. With $\{M_v\}$, $\{H_v\}$, $P$, $W_0$ and $\{W_v\}$ fixed, differentiating the Lagrangian of formula (3) with respect to $A$ gives:
Setting the derivative to 0 gives the following update rule for $A$:
(f) Iteratively optimize $H_v$. With $P$, $\{M_v\}$, $A$, $W_0$ and $\{W_v\}$ fixed, the computation of $H_v$ is independent of $H_{v'}$, $v' \neq v$. Thus $H_v$ is optimized separately for each view $v$. Differentiating the Lagrangian of formula (3) with respect to $H_v$ gives:
Using the KKT condition $\Theta_{i,j} (H_v)_{i,j} = 0$, the following iterative update rule for $H_v$ is obtained:
(g) Iteratively optimize $W_0$. With $P$, $\{M_v\}$, $\{H_v\}$, $A$ and $\{W_v\}$ fixed, differentiating the Lagrangian of formula (3) with respect to $W_0$ gives:
Using the KKT condition $\Omega_{i,j} (W_0)_{i,j} = 0$, the following iterative update rule for $W_0$ is obtained:
(h) Iteratively optimize $W_v$. With $P$, $\{M_v\}$, $\{H_v\}$, $A$ and $W_0$ fixed, differentiating the Lagrangian of formula (3) with respect to $W_v$ gives:
Using the KKT condition $\psi_{i,j} (W_v)_{i,j} = 0$, the following update rule for $W_v$ is obtained:
(i) Repeat (c) to (h), alternately updating the parameters $\{M_v\}$, $\{H_v\}$, $A$, $W_0$, $\{W_v\}$ and $P$ until the iteration stopping condition is met. Once the objective function has converged, the optimal parameters of the non-aligned multi-view multi-label learning model based on the graph matching mechanism are output, which gives the classification predictor.
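The update rules of the patent itself are not present in this text, so they cannot be reproduced here. As a stand-in, the sketch below shows the same optimization pattern on the simplest sub-problem, plain non-negative matrix factorization $X \approx PH$: the classic multiplicative updates, which follow from the KKT conditions of the non-negativity constraints, applied alternately until the loss stops changing or an iteration cap is reached. The function name, the tolerance and the toy data are assumptions; the full model has additional terms and variables that this sketch leaves out.

```python
import numpy as np

def nmf_multiplicative(X, k, max_iter=500, tol=1e-6, seed=0, eps=1e-12):
    """Factor a non-negative X (n x d) as P (n x k) @ H (k x d).

    Uses the standard multiplicative updates for the squared Frobenius
    loss. Stops when the loss changes by less than tol between two
    consecutive iterations, or after max_iter iterations.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    P = rng.random((n, k)) + eps
    H = rng.random((k, d)) + eps
    losses = [np.linalg.norm(X - P @ H) ** 2]
    for _ in range(max_iter):
        # Fix P and update H, then fix H and update P.
        H *= (P.T @ X) / (P.T @ P @ H + eps)
        P *= (X @ H.T) / (P @ H @ H.T + eps)
        losses.append(np.linalg.norm(X - P @ H) ** 2)
        if abs(losses[-2] - losses[-1]) < tol:
            break
    return P, H, losses

# Toy non-negative matrix of rank at most 4 (hypothetical data).
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 10))
P, H, losses = nmf_multiplicative(X, k=4)
```

Because every update multiplies a non-negative entry by a non-negative ratio, the factors stay non-negative without any projection step, which is the practical appeal of KKT-derived multiplicative rules in this kind of model.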
Step S4: predict the test set with the converged non-aligned multi-view multi-label learning model based on the graph matching mechanism, and obtain the label predictions from the output probabilities. Specifically, the label prediction matrix $\hat{Y}$ is given by the prediction equation of the learned model.
Compared with the prior art, the invention has the advantages that:
1. For the view non-alignment problem in multi-view multi-label learning tasks, first-order and second-order alignment are proposed. The permutation matrix adaptively reorders the feature nodes in each view, performing a first-order alignment that yields a correct mapping. The non-aligned view problem thereby reduces to an aligned view problem. In addition, the distance matrices of the samples in different views are used to perform a structural second-order alignment of the views, which improves the efficiency and accuracy of the alignment.
2. For the problem of comprehensive semantic representation of multi-view multi-label data, the method jointly exploits the consistency and diversity information of the data. The model learns a shared subspace across the different views, the label correlations, and an integrated classifier built on the individual and shared feature spaces. The aligned data are fed into the multi-view multi-label classification model based on the 'commonality-personality' semantic representation, which emphasizes the contribution of individual views to specific semantics and improves the semantic representation of minority-class samples.
3. The hidden correlations among labels are learned by introducing a dynamic label correlation matrix $A$. Although a fixed label correlation matrix can be estimated from the known label matrix, using only the samples with known labels may not be sufficient; the proposed dynamic label correlation matrix adaptively measures the correlation between labels, which helps improve the learning performance of the multi-label classification model.
4. The model is reduced to a general form, and an iterative optimization method for solving the objective function is given, which seeks an approximate solution of the model with a certain time complexity. The validity of the model is then verified on six real-world datasets.
Drawings
FIG. 1 is a process flow diagram of the method of the present invention.
Fig. 2 is a training workflow diagram of the method of the present invention.
Detailed Description
In order to make the solution of the embodiment of the present invention better understood by those skilled in the art, the embodiment of the present invention is further described in detail below with reference to the accompanying drawings and embodiments.
As shown in fig. 1, the present invention includes the steps of:
1. Acquire unaligned multi-view multi-label data, then store, preprocess and split the data into datasets. Construct the feature matrices $X_v$ and the observable label matrix $Y$. Because this is an entirely new problem, no public non-aligned dataset exists, so synthetic datasets are used. Specifically, starting from six public multi-view multi-label datasets, the instances are randomly shuffled so that the instances at the same position in different views describe different objects. This yields the non-aligned multi-view multi-label datasets. From the training data, a dataset with $V$ views is constructed, and $Y$ is the label space corresponding to the feature set.
2. For the training data in the sample dataset obtained in step 1, with feature matrices $X_v$ and observable label matrix $Y$, build first-order and second-order cross-view relation matching to align the features of the multiple views, and construct a non-aligned multi-view multi-label classification model based on the 'commonality-personality' semantic representation. Minimize the model by alternating optimization until it converges, which yields the classification predictor. The objective function is as follows:
The specific steps are as follows:
(a) Input: the feature matrices $X_v$; the observable label matrix $Y$; the dimension of the shared subspace; the hyperparameters in equation (6); the convergence threshold; the number of iterations. The latter four inputs may be tuned per dataset to achieve better results.
(b) Randomly initialize $M_v$, $H_v$, $W_v$, $P$, $W_0$ and $A$. Construct the adjacency matrix $S_v$ of each view from its feature matrix.
(c) Alternately and iteratively optimize $M_v$, $P$, $A$, $H_v$, $W_0$ and $W_v$ according to formulas (5), (7), (9), (11), (13) and (15) until the iteration stopping condition is met. The stopping condition can be that the difference between the objective function values of two consecutive iterations is smaller than the convergence threshold, or that the maximum number of iterations is reached. Finally, the optimal solution of the objective function is output, which gives the classifier of the non-aligned multi-view multi-label learning model based on the graph matching mechanism.
3. Predict the test set with the converged non-aligned multi-view multi-label learning model based on the graph matching mechanism, and obtain the label predictions from the output probabilities. Specifically, the label prediction matrix $\hat{Y}$ is given by the prediction equation of the learned model. To obtain definite label information, a threshold is set: elements of a prediction vector above the threshold are set to 1, meaning the label belongs to the sample, and elements below the threshold are set to 0, meaning the label does not belong to the sample. The threshold can typically be set to 0.5, but it often differs between datasets.
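The thresholding step described above is easy to make concrete. The following is a minimal sketch: the function name and the example scores are made up for illustration, and the treatment of scores exactly equal to the threshold (mapped to 0 here) is an assumption, since the text only specifies the behavior above and below the threshold.

```python
import numpy as np

def binarize_predictions(scores, threshold=0.5):
    """Map real-valued label scores to {0, 1} with a fixed threshold.

    Scores strictly above the threshold become 1; everything else,
    including ties, becomes 0 (the tie rule is an assumption).
    """
    return (np.asarray(scores) > threshold).astype(int)

# Hypothetical output scores for 2 samples and 3 labels.
scores = np.array([[0.91, 0.12, 0.55],
                   [0.30, 0.75, 0.49]])
labels = binarize_predictions(scores)
# labels is [[1, 0, 1], [0, 1, 0]]
```

Since the text notes that the best threshold varies across datasets, `threshold` is kept as a parameter so it can be tuned on held-out data rather than fixed at 0.5.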
Extensive experiments were conducted on six real-world datasets. All six multi-view datasets used in the experiments are public. Their statistics are summarized in Table 1. For each dataset, Table 1 gives the number of samples (n), the number of views (m), the number of distinct labels (c), the average number of labels per sample (#avg), and the minimum dimension ($d_{min}$).
Table 1 statistics for six multiview datasets
Emotion is a music dataset in which the two views of each instance correspond to the rhythmic and tonal features of a piece of music; Yeast is a biological dataset in which the two views of each instance correspond to the genetic expression and the phylogenetic profile of a gene; Corel5k, Pascal07, ESPGame and Mirflicker are four widely used multi-view image datasets. Several features are extracted from these images, and each image is represented by six representative feature views, each suited to a different application setting: HUE, SIFT, GIST, HSV, RGB and LAB. Six unaligned multi-view multi-label datasets are obtained by randomly shuffling the instances so that the instances at the same position in different views describe different objects. To verify the effectiveness of the proposed method MCGM, it was compared with the following six multi-label methods. The two single-view multi-label methods use a concatenation strategy: the multi-view dataset is converted into a single-view dataset before the experiments are run. The other methods are multi-view multi-label learning methods. The single-view multi-label learning methods are MLkNN and LLSF, published respectively in the leading computer vision venue PR (2007) and the leading data mining venue TKDE (2016). The multi-view multi-label learning methods FIMAN, ICM2L, iMvWL and BEMVL were published respectively in the leading data mining venue SIGKDD (2020), the leading artificial intelligence venue TCYB (2019), the leading artificial intelligence venue IJCAI (2018), and the leading data mining venue TKDD (2022). Five evaluation metrics widely used in multi-label learning measure the performance of each algorithm: Average Precision, Coverage, Hamming Loss, One Error, and Ranking Loss.
The mean and standard deviation of each metric on each dataset are shown in Tables 2 to 7. Note that the tables report 1 - Ranking Loss.
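The patent names the evaluation metrics without defining them. For orientation, the sketch below implements three of them using definitions that are standard in the multi-label literature; the function names, the tie-handling convention in `ranking_loss`, and the toy inputs are assumptions made here, and the patent's own evaluation code may differ in such details.

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of label slots predicted incorrectly."""
    return float(np.mean(Y_true != Y_pred))

def one_error(Y_true, scores):
    """Fraction of samples whose top-scored label is not a true label."""
    top = np.argmax(scores, axis=1)
    return float(np.mean(Y_true[np.arange(len(top)), top] == 0))

def ranking_loss(Y_true, scores):
    """Average fraction of (relevant, irrelevant) label pairs ranked wrongly.

    Samples with no relevant or no irrelevant labels are skipped.
    Ties count as errors here, which is one common convention.
    """
    losses = []
    for y, s in zip(Y_true, scores):
        pos, neg = s[y == 1], s[y == 0]
        if len(pos) == 0 or len(neg) == 0:
            continue
        wrong = (pos[:, None] <= neg[None, :]).sum()
        losses.append(wrong / (len(pos) * len(neg)))
    return float(np.mean(losses))

# Hypothetical ground truth and scores for 2 samples and 3 labels.
Y_true = np.array([[1, 0, 1], [0, 1, 0]])
scores = np.array([[0.9, 0.2, 0.6], [0.7, 0.4, 0.1]])
Y_pred = (scores > 0.5).astype(int)
```

On this toy input the three functions return 1/3, 0.5 and 0.25 respectively, so the quantity reported in the tables, 1 - Ranking Loss, would be 0.75. Reporting the complement makes Ranking Loss read as "higher is better", like Average Precision.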
Table 2 Emotion experimental results (mean ± standard deviation)
Table 3 Yeast experimental results (mean ± standard deviation)
Table 4 Corel5k experimental results (mean ± standard deviation)
Table 5 Pascal07 experimental results (mean ± standard deviation)
Table 6 ESPGame experimental results (mean ± standard deviation)
Table 7 Mirflicker experimental results (mean ± standard deviation)
From the results reported in Tables 2 to 7, MCGM outperforms the other comparison methods in most cases, on both large and small datasets. Across the 30 experimental settings (6 datasets × 5 evaluation metrics), the method of the invention ranks first in 57% and second in 40% of the settings. No comparison method is significantly better than the method of the invention on any metric.
The comparison of MCGM with LLSF and MLkNN shows that conventional multi-label methods, adapted to multi-view multi-label learning through a concatenation strategy, perform poorly, mainly because they ignore the mining of multi-view consistency and complementary information. That is, they ignore the physical meaning of the individual views in the dataset.
The comparison of MCGM with FIMAN, ICM2L, BEMVL and iMvWL shows that the method of the invention performs well on the non-aligned view problem. The other algorithms degrade when the views are not aligned because they do not take view non-alignment into account. In addition, iMvWL ignores view diversity, which limits its ability to extract view information.
It should be noted that the method according to the embodiment of the present invention is applicable to the non-aligned multi-view multi-label classification problem.
The embodiments of the invention have been described in detail above with reference to the drawings; the description is intended only to aid understanding of the invention. Those skilled in the art may vary the specific embodiments and the scope of application in accordance with the ideas of the invention, so this description should not be construed as limiting the invention.
Claims (2)
1. A non-aligned multi-view multi-mark learning method based on a graph matching mechanism, characterized in that: non-aligned multi-view multi-mark data are first acquired, then stored, preprocessed and divided into data sets to form a sample data set; a feature matrix and an observable mark matrix are constructed from the training data in the sample data set; according to the feature matrix and the observable mark matrix, the unaligned data are aligned: 1) the unaligned data are explicitly aligned through a permutation matrix, that is, point-to-point first-order alignment among samples; 2) second-order alignment of the views is performed on the graph structure using the distance matrices of the samples in the different views, which improves the alignment accuracy of the model; the commonality-personality representation among the aligned views is mined, and a non-aligned multi-view multi-mark learning model based on the graph matching mechanism is constructed using the consistency and complementarity across views; the model is trained by alternating optimization until it converges, yielding a classification predictor; and the test set is predicted with the converged classifier, the mark classification result being obtained from the output probabilities.
2. The non-aligned multi-view multi-label learning method based on a graph matching mechanism according to claim 1, wherein the method comprises the following steps:
Step S1: acquire non-aligned multi-view multi-mark data, and store, preprocess and divide them into data sets; an artificially synthesized data set is adopted, yielding a non-aligned multi-view multi-mark data set;
a data set with $V$ views, $\{X_v\}_{v=1}^{V}$, is constructed from the training data, where $X_v \in \mathbb{R}^{n \times d_v}$ is the complete feature space of the $v$-th view, $n$ is the number of training samples $x_i$, and $d_v$ is the dimension of each sample in the $v$-th view; $Y = [y_1, y_2, \ldots, y_n]^{T} \in \{0,1\}^{n \times q}$ denotes the mark space corresponding to the feature space, where $y_i \in \{0,1\}^{q}$ is the mark vector of $x_i$ and $q$ is the number of marks;
Step S2: for the training data in the sample data set acquired in step S1, construct the feature matrices $X_v$ and the observable mark matrix $Y$, construct first-order and second-order relation matching across views to align the features of the multiple views, and construct a non-aligned multi-view multi-mark classification model based on the "commonality-personality" semantic representation; the specific steps are as follows:
(a) The unaligned data are explicitly aligned using a permutation matrix, giving point-to-point first-order alignment between instances; after first-order alignment a relatively correct mapping between instances is obtained, and non-negative matrix factorization is used to extract a common low-dimensional representation of the different views; based on the common representation and the observable mark matrix, a feature mapping matrix $W_0$ is introduced to build a linear mapping from the shared subspace to the mark space, giving the initial learning model:
$\text{s.t. } (M_v)_{i,j} \in \{0,1\},\ M_v \mathbf{1} = \mathbf{1},\ \mathbf{1}^{T} M_v = \mathbf{1}^{T},\ P \ge 0,\ H_v \ge 0,\ W_0 \ge 0 \quad (1)$
wherein $M_v \in \mathbb{R}^{n \times n}$ is the permutation matrix of the $v$-th view, and multiplying the feature matrix by the permutation matrix gives the aligned multi-view data; $P$ and $H_v$ are obtained from the aligned data by non-negative matrix factorization; $H_v \in \mathbb{R}^{k \times d_v}$ is the individual mapping matrix of the $v$-th view; $P \in \mathbb{R}^{n \times k}$ is the shared subspace, where $k$ is the dimension to which the data are reduced; $P \ge 0$ and $H_v \ge 0$ are the non-negativity constraints on these matrices; $W_0 \in \mathbb{R}^{k \times q}$ is the coefficient matrix corresponding to $P$, so $W_0 \ge 0$ is also imposed; $W_0$ learns the commonality information of the heterogeneous views through the mapping from the shared subspace $P$ to the mark space $Y$; $\alpha$ and $\gamma$ are two hyperparameters;
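The role of the permutation matrix in step (a) can be illustrated with a small NumPy sketch. The sizes, the shuffling `perm` and the variable names are hypothetical and chosen only for illustration; the sketch shows first-order alignment alone and does not reproduce the objective of formula (1).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_v = 5, 3                        # hypothetical number of samples and view dimension

X_ref = rng.random((n, d_v))         # the view in the reference sample order
perm = np.array([2, 0, 4, 1, 3])     # hypothetical shuffling of the samples
X_v = X_ref[perm]                    # non-aligned view: row r holds sample perm[r]

# Permutation matrix with entries in {0, 1} and unit row and column sums,
# matching the constraints (M_v)_{i,j} in {0,1}, M_v 1 = 1, 1^T M_v = 1^T.
M_v = np.zeros((n, n))
M_v[perm, np.arange(n)] = 1.0        # sends row r of X_v back to position perm[r]

X_aligned = M_v @ X_v                # point-to-point (first-order) alignment
```

In the method itself $M_v$ is unknown and is learned jointly with the factorization; here it is built from a known shuffling only to show what a correct $M_v$ does.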
(b) Since the individual mapping matrices $\{H_v\}_{v=1}^{V}$ in fact encode the characteristics of the individual views, another set of coefficient matrices $\{W_v\}_{v=1}^{V}$ is defined to capture the unique properties of each view; $W_v$ maps the reconstructed feature matrix $PH_v$, which carries the personalized information of the $v$-th view, to the mark space; using both the personality and the commonality information of the heterogeneous views, formula (1) is extended as follows:
$\text{s.t. } (M_v)_{i,j} \in \{0,1\},\ M_v \mathbf{1} = \mathbf{1},\ \mathbf{1}^{T} M_v = \mathbf{1}^{T},\ P \ge 0,\ H_v \ge 0,\ W_0 \ge 0,\ W_v \ge 0 \quad (2)$
wherein $P$ can be regarded as a dictionary matrix containing the information of all views, and $H_v$ represents the view-specific coding coefficients; $PH_v$ aims to capture the individual information of a specific view, while $P$ captures the information shared by all views; $PH_v$ and $P$ are regarded as the "commonality-personality" representation of the multi-view data;
(c) The alignment of the second-order graph structure between views and the mark correlation are considered; $M_v X_v$ and $M_j X_j$ denote the correctly aligned versions of view $X_v$ and view $X_j$, respectively; a cross-view mapping matrix is used to represent the degree of matching between the samples of view $X_v$ and view $X_j$; a structure matching loss term is constructed to explore the correct cross-view mapping; for each view, a distance matrix $S_v$ between its samples is built to represent the graph structure of view $v$; the distance matrices $S_v$ and $S_j$ are permuted by the cross-view mapping matrix so that the graph connection structures of the two views are as similar as possible; $A$ denotes the mark correlation matrix, through which the known correlated mark information is extracted by exploiting the correlations in the multi-mark data; a hyperparameter $\beta$ is introduced to balance the weight of the second-order alignment; the non-aligned multi-view multi-mark learning model based on the graph matching mechanism is expressed as follows:
$\text{s.t. } (M_v)_{i,j} \in \{0,1\},\ M_v \mathbf{1} = \mathbf{1},\ \mathbf{1}^{T} M_v = \mathbf{1}^{T},\ P \ge 0,\ H_v \ge 0,\ W_0 \ge 0,\ W_v \ge 0 \quad (3)$
wherein the method comprises the steps of
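The second-order alignment of step (c) can be sketched as follows. The loss form $\|S_v - M S_j M^{T}\|_F^2$, the Euclidean distance and the function names are assumptions made for illustration; the exact structure matching term of formula (3) is not reproduced here.

```python
import numpy as np

def distance_matrix(X):
    """Pairwise Euclidean distances between the rows (samples) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def structure_matching_loss(S_v, S_j, M):
    """Assumed loss: mismatch between graph S_v and graph S_j permuted by M."""
    return np.linalg.norm(S_v - M @ S_j @ M.T, "fro") ** 2

rng = np.random.default_rng(1)
n = 5
X_v = rng.random((n, 3))                 # hypothetical reference view
perm = np.array([2, 0, 4, 1, 3])         # hypothetical shuffling
X_j = X_v[perm]                          # toy second view: same samples, shuffled

S_v, S_j = distance_matrix(X_v), distance_matrix(X_j)

M_true = np.zeros((n, n))
M_true[perm, np.arange(n)] = 1.0         # mapping that undoes the shuffling

loss_true = structure_matching_loss(S_v, S_j, M_true)
loss_identity = structure_matching_loss(S_v, S_j, np.eye(n))
```

In this toy setting the correct mapping makes the two graphs coincide, while leaving the views unaligned (identity mapping) gives a strictly larger loss; real views have different feature spaces, so the match is only approximate.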
Step S3: simplify the expression of the non-aligned multi-view multi-mark learning model based on the graph matching mechanism of step S2, and minimize it by alternating optimization until the model converges, obtaining a classification predictor; the specific steps are as follows:
(a) Because the constraint on $M_v$ is non-convex, an optimal solution is difficult to obtain; the constraint on $M_v$ is relaxed as:
(b) The objective function is converted into an unconstrained problem by introducing the Lagrangian multipliers $\Lambda$, $\Phi$, $\Theta$, $\Omega$, $\Psi$; for view $v$, objective function (3) takes the following form:
(c) Iteratively optimize $\{M_v\}_{v=1}^{V}$: with $P$ and the remaining variables fixed, the computation of $M_v$ is independent of $M_{v'}$, $v' \neq v$; therefore $M_v$ is optimized independently for each view $v$;
using a standard method to solve formula (3) together with its constraints, the following iterative update rule for $M_v$ is obtained by a nonlinear method:
wherein:
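One common way to picture a relaxed permutation constraint is the set of doubly stochastic matrices (non-negative entries, unit row and column sums). The sketch below assumes that form of relaxation and uses Sinkhorn normalization, a standard technique that is swapped in here for illustration and is not the update rule of this claim; `sinkhorn` and its parameters are hypothetical names.

```python
import numpy as np

def sinkhorn(K, n_iter=200):
    """Alternately normalize rows and columns of a strictly positive matrix."""
    M = np.asarray(K, dtype=float).copy()
    for _ in range(n_iter):
        M /= M.sum(axis=1, keepdims=True)   # make every row sum to 1
        M /= M.sum(axis=0, keepdims=True)   # make every column sum to 1
    return M

rng = np.random.default_rng(2)
K = rng.random((4, 4)) + 0.1                # hypothetical positive affinity scores
M_relaxed = sinkhorn(K)                     # approximately doubly stochastic
```

A relaxed matrix of this kind satisfies $M\mathbf{1} = \mathbf{1}$ and $\mathbf{1}^{T} M = \mathbf{1}^{T}$ with real-valued entries, which makes gradient-based or multiplicative updates applicable.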
(d) Iteratively optimize $P$: with the other variables fixed, differentiating the objective of formula (3) with respect to $P$ gives:
using the KKT condition $\Phi_{i,j} P_{i,j} = 0$, the following iterative update rule for $P$ is obtained:
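How a KKT complementary-slackness condition leads to a multiplicative update can be seen on the plain factorization term $\|X - PH\|_F^2$ alone. This is the classical non-negative matrix factorization update, shown as an assumption-laden illustration; the update rule of this claim contains further terms from formula (3) that are not reproduced here.

```python
import numpy as np

def frob_loss(X, P, H):
    """Squared Frobenius reconstruction error ||X - P H||_F^2."""
    return np.linalg.norm(X - P @ H, "fro") ** 2

def update_P(X, P, H, eps=1e-12):
    # The gradient with respect to P is 2 (P H H^T - X H^T). Setting
    # gradient_ij * P_ij = 0 (KKT) and solving for P_ij yields the ratio below;
    # eps only guards against division by zero.
    return P * (X @ H.T) / (P @ H @ H.T + eps)

rng = np.random.default_rng(3)
X = rng.random((6, 4))                  # hypothetical aligned, non-negative view
P = rng.random((6, 2))                  # shared representation, k = 2
H = rng.random((2, 4))                  # view-specific mapping

loss_before = frob_loss(X, P, H)
P_new = update_P(X, P, H)
loss_after = frob_loss(X, P_new, H)
```

Because the update multiplies a non-negative matrix by a non-negative ratio, $P \ge 0$ is preserved automatically, which is why this style of rule suits the non-negativity constraints of the model.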
(e) Iteratively optimize $A$: with the other variables fixed, differentiating the objective of formula (3) with respect to $A$ gives:
setting the derivative equal to 0, the following update rule for $A$ is obtained:
(f) Iteratively optimize $\{H_v\}_{v=1}^{V}$: with the other variables fixed, the computation of $H_v$ is independent of $H_{v'}$, $v' \neq v$; $H_v$ is therefore optimized separately for each view $v$, and differentiating the objective of formula (3) with respect to $H_v$ gives:
using the KKT condition $\Theta_{i,j} (H_v)_{i,j} = 0$, the following iterative update rule for $H_v$ is obtained:
(g) Iteratively optimize $W_0$: with the other variables fixed, differentiating the objective of formula (3) with respect to $W_0$ gives:
using the KKT condition $\Omega_{i,j} (W_0)_{i,j} = 0$, the following iterative update rule for $W_0$ is obtained:
(h) Iteratively optimize $\{W_v\}_{v=1}^{V}$: with the other variables fixed, differentiating the objective of formula (3) with respect to $W_v$ gives:
using the KKT condition $\Psi_{i,j} (W_v)_{i,j} = 0$, the following update rule for $W_v$ is obtained:
(i) Steps (c) to (h) are repeated, alternately updating the parameters until the iteration stop condition is met; once the objective function has converged, the optimal parameters of the non-aligned multi-view multi-mark learning model based on the graph matching mechanism are output and the classification predictor is obtained;
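The alternating scheme of steps (c) to (i) can be sketched as a generic driver that cycles through per-variable updates until the objective stops changing. The driver name, the relative-change stopping rule and the toy two-variable objective are all assumptions for illustration; the actual updates are those of steps (c) to (h).

```python
def alternating_minimization(objective, updates, state, tol=1e-10, max_iter=500):
    """Apply each block update in turn until the objective change is below tol."""
    prev = objective(state)
    for n_iter in range(1, max_iter + 1):
        for update in updates:
            state = update(state)           # update one block, others held fixed
        cur = objective(state)
        if abs(prev - cur) <= tol * max(1.0, abs(prev)):
            break                           # iteration stop condition met
        prev = cur
    return state, cur, n_iter

# Toy problem: f(x, y) = (x - 1)^2 + (y + 2)^2 + x*y, minimized one variable at a time.
objective = lambda s: (s[0] - 1) ** 2 + (s[1] + 2) ** 2 + s[0] * s[1]
update_x = lambda s: (1 - s[1] / 2, s[1])   # exact minimizer in x for fixed y
update_y = lambda s: (s[0], -2 - s[0] / 2)  # exact minimizer in y for fixed x

state, value, n_iter = alternating_minimization(
    objective, [update_x, update_y], (0.0, 0.0)
)
```

On this convex toy problem the iterates approach the joint minimizer $(8/3, -10/3)$; for the non-convex model of formula (3), the same loop can only be expected to reach a stationary point.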
Step S4: using the converged non-aligned multi-view multi-mark learning model based on the graph matching mechanism, predict the test set and obtain the mark prediction result from the output probabilities, wherein the mark prediction matrix is given by the equation:
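Turning output probabilities into a mark prediction result is commonly done by thresholding. The sketch below assumes a score matrix with one row per test sample and one column per mark, and a hypothetical threshold of 0.5; the prediction equation of this claim is not reproduced here.

```python
import numpy as np

def predict_marks(scores, threshold=0.5):
    """Assign mark j to sample i when its score reaches the threshold."""
    return (np.asarray(scores) >= threshold).astype(int)

scores = np.array([[0.90, 0.20, 0.55],
                   [0.50, 0.49, 0.10]])     # hypothetical output probabilities
Y_pred = predict_marks(scores)
```

The threshold is a design choice: lowering it trades precision for recall on each mark.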
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311195295.8A CN117409456A (en) | 2023-09-16 | 2023-09-16 | Non-aligned multi-view multi-mark learning method based on graph matching mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117409456A (en) | 2024-01-16 |
Family
ID=89487860
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311195295.8A Pending CN117409456A (en) | 2023-09-16 | 2023-09-16 | Non-aligned multi-view multi-mark learning method based on graph matching mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117409456A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117690192A (en) * | 2024-02-02 | 2024-03-12 | 天度(厦门)科技股份有限公司 | Abnormal behavior identification method and equipment for multi-view instance-semantic consensus mining |
CN117690192B (en) * | 2024-02-02 | 2024-04-26 | 天度(厦门)科技股份有限公司 | Abnormal behavior identification method and equipment for multi-view instance-semantic consensus mining |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||