CN110633732A - Multi-modal image recognition method based on low-rank and joint sparsity - Google Patents
- Publication number
- CN110633732A
- Authority
- CN
- China
- Prior art keywords
- dictionary
- matrix
- low
- rank
- updating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2132—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
- G06F18/21322—Rendering the within-class scatter matrix non-singular
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/28—Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2132—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
- G06F18/21322—Rendering the within-class scatter matrix non-singular
- G06F18/21324—Rendering the within-class scatter matrix non-singular involving projections, e.g. Fisherface techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/513—Sparse representations
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a multi-modal image recognition method based on low rank and joint sparsity, and belongs to the technical field of image recognition. To overcome the technical problem that the difference between modalities of multi-modal images is larger than the difference between categories, the original multi-modal data are projected into a low-rank common subspace. The low-rank constraint on the common subspace effectively retains the information shared by different modalities of the same category, so that samples of the same category are more closely connected in the low-rank common subspace; it also reduces the image dimensionality, which mitigates the curse of dimensionality to some extent. A joint sparse representation of the data of the different modalities is then obtained under a joint sparsity constraint, yielding the fused features, and the features are classified by a common classifier to obtain the final recognition result. For the multi-modal problem, the invention uses feature fusion to combine the features of multiple modalities into features that are more favorable for recognition, thereby improving the recognition efficiency.
Description
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a multi-modal image recognition technology based on low-rank and joint sparsity.
Background
Image recognition technology uses a computer to process and analyze images, classify the objects in them, and make meaningful judgments. With the development of sensors, multi-modal image data are easy to capture in real life. Fusing multi-modal data provides complementary information and thereby improves recognition performance, so a scheme based on multi-modal information has higher practical value than one based on single-modal information. Because the imaging mechanisms of different modalities differ, traditional single-modal image recognition algorithms cannot process multi-modal images, which limits further application of image recognition. Since data from different modalities differ greatly, multi-modal data can be regarded as coming from different domains with different distributions, and therefore cannot be compared directly. Compared with single-modal recognition methods, the challenge for a multi-modal recognition algorithm is to link the information of multiple modalities so as to reduce the modality difference.
To combine multi-modal features, feature fusion techniques can be applied to the fusion and extraction of multi-modal features. Joint sparse representation is a common feature fusion tool. Its basic principle is to combine the features of multiple modalities by constraining the sparse representations of samples of the same category to share the same sparsity pattern (i.e. the sparse representations use the same dictionary atoms). The document "X. Yuan, S. Yan. Visual classification with multi-task joint sparse representation [C]. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, 3493-3500" uses the same sparsity pattern for feature fusion. This assumption suits the case in which different features extracted from the same reference are used as multi-modal information. When the observations of same-category samples differ greatly, as with multi-view or multi-sensor data, constraining the sparsity pattern to be identical restricts the composition of the dictionary atoms, so the assumption is not suitable for data with large modality differences. The document "S. Shekhar, V. M. Patel, N. M. Nasrabadi, et al. Joint Sparse Representation for Robust Multimodal Biometrics Recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(1): 113-126" proposes a joint sparsity constraint that relaxes the constraint on the sparsity pattern relative to the joint sparse representation assumption and is more applicable to multi-modal situations. However, because the similarity within a modality is greater than the similarity within a category, conventional multi-modal feature fusion methods, which directly fuse features that differ greatly, tend to lose modality information.
Disclosure of Invention
The object of the invention is, in view of the problems described above, to provide a recognition method suited to multi-modal situations.
The multi-modal recognition method based on low-rank and joint sparsity constraints of the invention comprises the following steps:
Step S1: train a low-rank projection matrix P and a dictionary D for multi-modal recognition:
Step S101: construct an optimization model:
where K denotes the number of modalities, C denotes the number of categories, $\|\cdot\|_F$ denotes the Frobenius norm, $\|\cdot\|_*$ denotes the nuclear norm, and the superscript T denotes matrix transposition;
$X_i$ denotes the feature matrix of the training samples of the i-th modality and is an $m_i \times n$ matrix, where $m_i$ denotes the feature dimension (image feature dimension) of the training samples of the i-th modality and n denotes the number of samples contained in each modality; $D_i$ denotes the dictionary of the i-th modality; $\Lambda_i$ denotes the coefficient matrix of dictionary $D_i$, i.e. its sparse coefficient matrix; the sparse coefficient is $\Lambda = [\Lambda_1, \Lambda_2, \ldots, \Lambda_K]$; λ denotes a regularization parameter; I denotes the identity matrix; and the remaining symbol in the model denotes an atom of the dictionary $D_i$ of the i-th modality;
Step S102: based on a preset training sample set, solve the constructed optimization model by the alternating direction method of multipliers to obtain the low-rank projection matrix P and the dictionary D;
Step S2: given the feature matrices $Y_1, Y_2, \ldots, Y_K$ of the different modalities of the object to be classified, project the object to be classified through the low-rank projection matrix P and then solve the joint sparse representation with respect to the dictionary D, thereby obtaining the joint sparse coefficient of the object to be classified;
Step S3: perform classification based on the joint sparse coefficient of the object to be classified to obtain the classification and recognition result of the object to be classified.
Further, in step S102, the low-rank projection matrix P and the dictionary D are obtained as follows:
Step S102-1: initialize the parameters, including the sparse coefficient $\Lambda_0$, the low-rank projection matrix $P_0$, the dictionary $D_0$, the auxiliary variables $Z_0$ and $W_0$, the Lagrange multipliers $A_{Z,0}$ and $A_{W,0}$, and the Lagrange parameters $\alpha_Z$, $\alpha_W$; set the iteration counter t = 0 and the maximum number of iterations k;
The auxiliary variable $Z_0$ has the same matrix dimensions as the sparse coefficient $\Lambda_0$, and $W_0$ has the same matrix dimensions as the low-rank projection matrix $P_0$ (the dimensions of the low-rank projection matrix are preset according to the actual application scenario);
Step S102-2: update the sparse coefficient Λ:
using the update formula for Λ, obtain the coefficient matrix of the dictionary corresponding to the i-th modality after the (t+1)-th update, and thereby the coefficient $\Lambda_{t+1}$ after the (t+1)-th update;
Step S102-3: update the dictionary D:
solve the dictionary subproblem with a quadratic programming solver to obtain the dictionary $D_{t+1}$ after the (t+1)-th update;
Step S102-4: update the low-rank projection P:
using the update formula for P, obtain the low-rank projection matrix $P_{t+1}$ after the (t+1)-th update;
Step S102-5: update the auxiliary variable Z:
using the update formula for Z, obtain the row vector $z_{i,t+1}$ of the i-th row of Z after the (t+1)-th update, and thereby the auxiliary variable $Z_{t+1}$ after the (t+1)-th update;
Step S102-6: update the auxiliary variable W:
where $F\Sigma B^T$ is the singular value decomposition of the matrix decomposed in the update of W, and the function S(a, b) takes the values $S(a, b) = \mathrm{sgn}(a)(|a| - b)$ when $|a| \ge b$ and $S(a, b) = 0$ when $|a| < b$;
Step S102-7: update the Lagrange multipliers $A_Z$ and $A_W$:
by the formulas $A_{Z,t+1} = A_{Z,t} + \alpha_Z(\Lambda_{t+1} - Z_{t+1})$ and $A_{W,t+1} = A_{W,t} + \alpha_W(P_{t+1} - W_{t+1})$, obtain the Lagrange multipliers $A_{Z,t+1}$ and $A_{W,t+1}$ after the (t+1)-th update;
Step S102-8: determine whether the iteration counter t has reached the maximum number of iterations k; if so, take the most recently updated $P_{t+1}$ and $D_{t+1}$ as the trained values of the projection matrix P and the dictionary D; otherwise, set t = t + 1 and return to step S102-2.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
The method reduces the difference between modalities through low-rank projection. The low-rank constraint on the common subspace effectively retains the information shared by different modalities of the same category, so that samples of the same category are more closely connected in the low-rank common subspace; it also reduces the image dimensionality, which mitigates the curse of dimensionality to some extent. For the multi-modal problem, feature fusion is used to combine the features of multiple modalities into features that are more favorable for recognition, improving the recognition efficiency.
Drawings
FIG. 1 shows the recognition rate of the present invention under the near infrared and visible light face data set (CASIA HFB).
FIG. 2 is a parameter characteristic diagram of the present invention. The numbers 1 to 8 on the abscissa represent these values, respectively: 0.001,0.01,0.1,0.5,1,5, 10, 100.
Fig. 3 is a graph of the convergence of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
The solution of the invention is as follows. First, the multi-modal images, which are distributed in different spaces, are projected with a low-rank projection so as to reduce the difference between modalities, retain the important discriminative information of the same category, and reduce the data dimension. Then, in the common projected space, feature fusion is performed under a joint sparsity constraint. Finally, image recognition is carried out on the fused features. Samples of the same category but different modalities share similar information, and compared with the high-dimensional original image information this shared information should be low-dimensional. To reflect this low-dimensional property in the multi-modal common subspace, the method extracts the information shared between modalities under the premise that the multi-modal projection matrix is low-rank, which reduces the modality difference and further improves the recognition efficiency. The invention can be applied to recognition tasks in scenarios such as identity recognition, security monitoring, and criminal investigation.
The specific implementation process of the multi-modal identification method based on the low-rank and joint sparse constraint is as follows:
Consider training samples with C categories and K modalities, the training samples of each modality being expressed as $X_i \in \mathbb{R}^{m_i \times n}$, i = 1, 2, ..., K, where $m_i$ denotes the dimension of the training samples and n is the number of samples contained in each modality. If the low-rank subspace is represented by P (also called the low-rank common projection), the low-rank projection of the samples is $P^T X_i$. The invention adopts a joint sparse representation based on dictionary learning. Let $D_i$ be the dictionary of the corresponding modality and $N_i$ the noise of the i-th modality samples; the joint sparse representation model is then:
$P^T X_i = D_i \Lambda_i + N_i, \quad i = 1, 2, \ldots, K$
where $\Lambda_i$ denotes the coefficient matrix of dictionary $D_i$.
Introducing the low-rank constraint and the joint sparsity constraint, the low-rank common projection P, the dictionaries $D_i$ and their coefficient matrices $\Lambda_i$ are obtained by solving the following optimization problem (optimization model):
where the remaining symbol denotes an atom of the dictionary $D_i$; $\|\cdot\|_F$ denotes the Frobenius norm; the sparse coefficient is $\Lambda = [\Lambda_1, \Lambda_2, \ldots, \Lambda_K]$, i.e. each $\Lambda_i$ is a sub-matrix of the matrix Λ, which may also be called the sparse coefficient matrix; and the nuclear norm $\|P\|_* = \sum_i \sigma_i(P)$ is the sum of the singular values of the matrix. Because the rank function rank(P) cannot be minimized directly, the rank minimization problem is solved approximately through the nuclear norm. The orthogonality constraint $P^T P = I$ ensures that the resulting P is a basis transformation matrix, where I is the identity matrix and λ is the regularization parameter.
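For orientation, one form of the optimization model that is consistent with the symbols defined above can be written out. It is a reconstruction under stated assumptions, not necessarily the exact formula of the invention: the unit weight on the nuclear-norm term, the unit bound on the atom norms, and the atom symbol $d_{i,j}$ are all assumptions, and $\|\cdot\|_{1,2}$ is the joint sparse norm named in the claims.

```latex
% Hedged reconstruction; assumed: unit weight on \|P\|_*, the bound
% \|d_{i,j}\|_2 \le 1, and the atom symbol d_{i,j} (j-th atom of D_i).
\min_{P,\,D,\,\Lambda}\;
  \sum_{i=1}^{K} \bigl\| P^{T} X_i - D_i \Lambda_i \bigr\|_F^{2}
  + \lambda \,\|\Lambda\|_{1,2}
  + \|P\|_{*}
\quad \text{s.t.} \quad
  P^{T} P = I, \qquad \|d_{i,j}\|_2 \le 1 \;\; \forall\, i, j
```

Under this reading, $\|\Lambda\|_{1,2}$ is the sum of the $\ell_2$ norms of the rows of Λ, so whole rows are driven to zero together and the modalities share dictionary atoms.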
The model is solved by the Alternating Direction Method of Multipliers (ADMM). Auxiliary variables Z and W are introduced, and the augmented Lagrangian function is defined as follows:
where $A_Z$ and $A_W$ are the multipliers of the linear constraints (i.e. the Lagrange multipliers), $\alpha_Z$ and $\alpha_W$ are positive parameters, $\langle A, B \rangle$ denotes $\mathrm{tr}(A^T B)$ with tr(·) the trace of a matrix, a further quantity is defined in terms of the elements of $A_Z$ and $A_W$, and the dictionary is $D = [D_1, D_2, \ldots, D_K]$.
Following the augmented Lagrangian method, the function is minimized with respect to P, Λ, Z and W while $A_Z$ and $A_W$ are held fixed; the other variables are then fixed and $A_Z$ and $A_W$ are updated. Because the objective function has a separable structure, the problem can be simplified by treating each of the variables P, Λ, Z and W in turn as the only variable of the objective function. The solution of each sub-problem is described in detail below. Since the optimization is an iterative update process, the result of the t-th update is indicated by adding the subscript t (t ≥ 0) to the corresponding variable in the formulas below.
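As an aid to reading the sub-problems, an augmented Lagrangian of the usual ADMM form can be sketched. It assumes the splitting constraints Λ = Z and P = W, the joint sparse norm $\|\cdot\|_{1,2}$ named in the claims applied to Z, and a unit weight on the nuclear-norm term; these are assumptions chosen to agree with the multiplier updates $A_{Z,t+1} = A_{Z,t} + \alpha_Z(\Lambda_{t+1} - Z_{t+1})$ and $A_{W,t+1} = A_{W,t} + \alpha_W(P_{t+1} - W_{t+1})$ given in the description, not a transcription of the original formula.

```latex
% Hedged sketch; assumed: constraints \Lambda = Z and P = W, unit weight on \|W\|_*.
\mathcal{L}(P, D, \Lambda, Z, W, A_Z, A_W) =
  \sum_{i=1}^{K} \bigl\| P^{T} X_i - D_i \Lambda_i \bigr\|_F^{2}
  + \lambda \|Z\|_{1,2} + \|W\|_{*}
  + \langle A_Z,\, \Lambda - Z \rangle + \langle A_W,\, P - W \rangle
  + \frac{\alpha_Z}{2} \|\Lambda - Z\|_F^{2}
  + \frac{\alpha_W}{2} \|P - W\|_F^{2}
```

With this form the non-smooth terms act only on Z and W, which is why those two updates reduce to shrinkage operations while the Λ update is a smooth quadratic problem.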
(1) Updating the sparse coefficient Λ.
When solving for the sparse coefficient, the optimization problem becomes:
This objective is a convex function, and the update formula of the sparse coefficient Λ is obtained by setting its first-order partial derivative to zero:
where I is the identity matrix and $Z_i$ is a sub-matrix of Z, i.e. $Z = [Z_1, Z_2, \ldots, Z_K]$.
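The closed-form character of this step can be illustrated with a short NumPy sketch. It assumes the per-modality sub-problem is $\|P^T X_i - D_i\Lambda_i\|_F^2 + \langle A_{Z,i}, \Lambda_i - Z_i\rangle + \frac{\alpha_Z}{2}\|\Lambda_i - Z_i\|_F^2$, which is a reconstruction and not the patent's stated formula; the function name `update_lambda_i` and its argument names are hypothetical.

```python
import numpy as np

def update_lambda_i(PtX_i, D_i, Z_i, A_Zi, alpha_Z):
    """Closed-form minimizer of the ASSUMED sub-problem
    ||PtX_i - D_i L||_F^2 + <A_Zi, L - Z_i> + (alpha_Z / 2) ||L - Z_i||_F^2.

    PtX_i : (d, n) projected samples P^T X_i of modality i
    D_i   : (d, k) dictionary of modality i
    Z_i, A_Zi : (k, n) blocks of the auxiliary variable and multiplier
    """
    k = D_i.shape[1]
    # Setting the gradient to zero gives the linear system
    # (2 D^T D + alpha_Z I) L = 2 D^T PtX + alpha_Z Z - A_Z.
    lhs = 2.0 * D_i.T @ D_i + alpha_Z * np.eye(k)
    rhs = 2.0 * D_i.T @ PtX_i + alpha_Z * Z_i - A_Zi
    return np.linalg.solve(lhs, rhs)
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse; the system matrix is positive definite whenever `alpha_Z > 0`, so the solution is unique.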
(2) Updating the dictionary D.
Fixing the other variables and parameters gives the following optimization problem:
This is a quadratically constrained quadratic programming (QCQP) problem, which can be solved by a quadratic programming solver.
(3) Updating the low-rank projection P.
The optimization problem for solving the low-rank projection is:
This objective is a convex function, and setting its first-order partial derivative to zero gives:
(4) Updating the auxiliary variable Z.
The optimization problem for the auxiliary variable Z becomes:
which is equivalently transformed into:
Since the above problem has a separable structure, each row of Z can be treated separately. Let $\gamma_i$ and $z_i$ denote the i-th rows of Λ and Z, respectively, with the i-th row of $A_Z$ denoted analogously. The problem is then solved row by row as:
where $(c)_+$ denotes the vector obtained by taking $\max(c_i, 0)$ for each entry $c_i$ of the vector c.
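The row-wise treatment described above corresponds to the standard group soft-thresholding operator, sketched below in NumPy. The sketch assumes the sub-problem is $\frac{\lambda}{\alpha_Z}\|Z\|_{1,2} + \frac{1}{2}\|Z - (\Lambda + A_Z/\alpha_Z)\|_F^2$, with $\|\cdot\|_{1,2}$ the sum of row $\ell_2$ norms; this form, the function name `update_Z` and the small constant guarding against division by zero are assumptions.

```python
import numpy as np

def update_Z(Lam, A_Z, alpha_Z, lam):
    """Row-wise group soft-thresholding for the ASSUMED sub-problem
    (lam / alpha_Z) * ||Z||_{1,2} + 0.5 * ||Z - (Lam + A_Z / alpha_Z)||_F^2.
    """
    Q = Lam + A_Z / alpha_Z                      # point being shrunk
    row_norms = np.linalg.norm(Q, axis=1, keepdims=True)
    # (1 - tau / ||q_i||)_+ per row; tiny floor avoids dividing by zero
    scale = np.maximum(1.0 - (lam / alpha_Z) / np.maximum(row_norms, 1e-12), 0.0)
    return scale * Q

# Rows with norm below lam / alpha_Z are zeroed; larger rows are shortened.
Z_demo = update_Z(np.array([[3.0, 4.0], [0.3, 0.4]]),
                  np.zeros((2, 2)), alpha_Z=1.0, lam=1.0)
```

Because an entire row is either kept (and shortened) or set to zero, the same dictionary atoms are switched on or off for all modalities at once.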
(5) Updating the auxiliary variable W.
Updating the auxiliary variable W requires solving the following optimization problem:
which is equivalently transformed into:
This is a shrinkage problem, whose solution is:
where $F\Sigma B^T$ is the singular value decomposition (SVD) of the matrix being shrunk, and the function S takes the values $S(a, b) = \mathrm{sgn}(a)(|a| - b)$ when $|a| \ge b$ and $S(a, b) = 0$ when $|a| < b$.
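The shrinkage step can be sketched as singular value thresholding in NumPy. The function S follows the definition given above; the matrix being decomposed is assumed here to be $P + A_W/\alpha_W$ and the threshold $1/\alpha_W$ (unit weight on the nuclear norm), neither of which is confirmed by the surviving text. The names `soft_threshold` and `update_W` are hypothetical.

```python
import numpy as np

def soft_threshold(a, b):
    """S(a, b) = sgn(a)(|a| - b) if |a| >= b, else 0 (element-wise)."""
    return np.sign(a) * np.maximum(np.abs(a) - b, 0.0)

def update_W(P, A_W, alpha_W, tau=None):
    """Singular value thresholding of the ASSUMED matrix P + A_W / alpha_W.

    tau defaults to 1 / alpha_W, i.e. a unit weight on the nuclear norm
    (an assumption; pass tau explicitly for another weight).
    """
    if tau is None:
        tau = 1.0 / alpha_W
    F, sigma, Bt = np.linalg.svd(P + A_W / alpha_W, full_matrices=False)
    # Scale the columns of F by the shrunk singular values, then recombine.
    return (F * soft_threshold(sigma, tau)) @ Bt

W_demo = update_W(np.diag([3.0, 0.5]), np.zeros((2, 2)), alpha_W=1.0)
```

In the demo the singular values 3 and 0.5 are shrunk by 1, so the smaller one vanishes and the result has rank one, which is how the nuclear-norm term lowers the rank of the projection.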
(6) Updating the parameters $A_Z$ and $A_W$.
The update formulas of the Lagrange multipliers $A_Z$ and $A_W$ are:
$A_{Z,t+1} = A_{Z,t} + \alpha_Z(\Lambda_{t+1} - Z_{t+1})$
$A_{W,t+1} = A_{W,t} + \alpha_W(P_{t+1} - W_{t+1})$
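The two multiplier updates translate directly into code. The sketch below implements exactly the formulas above; the function names and the residual helper, which is a common ADMM diagnostic and not part of the described method, are hypothetical additions.

```python
import numpy as np

def update_multipliers(A_Z, A_W, Lam, Z, P, W, alpha_Z, alpha_W):
    """Dual ascent step on both multipliers, as in the formulas above."""
    A_Z_next = A_Z + alpha_Z * (Lam - Z)
    A_W_next = A_W + alpha_W * (P - W)
    return A_Z_next, A_W_next

def primal_residuals(Lam, Z, P, W):
    """Frobenius norms of Lam - Z and P - W (diagnostic only, an addition)."""
    return np.linalg.norm(Lam - Z), np.linalg.norm(P - W)
```

When both residuals shrink toward zero the multipliers stop changing, which is one way to monitor the iteration in addition to the fixed iteration limit k.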
In summary, the low-rank common projection P, the dictionaries $D_i$ and their coefficient matrices $\Lambda_i$ are solved as follows:
Step 1: initialize the parameters:
the parameters to initialize include: the sparse coefficient $\Lambda_0$; the low-rank projection matrix $P_0$; the dictionary $D_0$; the auxiliary variable $Z_0$; the auxiliary variable $W_0$; the linear-constraint multipliers $A_Z$, $A_W$; the Lagrange parameters $\alpha_Z$, $\alpha_W$; and the maximum number of iterations k.
Step 2: update the sparse coefficient Λ:
Step 3: update the dictionary D:
Step 4: update the low-rank projection P:
Step 5: update the auxiliary variable Z:
update the parameter Z using its update formula.
Step 6: update the auxiliary variable W:
update the parameter W using its update formula.
Step 7: update the Lagrange multipliers $A_Z$ and $A_W$:
update the parameters $A_Z$ and $A_W$ by the formulas $A_{Z,t+1} = A_{Z,t} + \alpha_Z(\Lambda_{t+1} - Z_{t+1})$ and $A_{W,t+1} = A_{W,t} + \alpha_W(P_{t+1} - W_{t+1})$.
Step 8: check the number of iterations:
when the iteration counter t is less than k, set t = t + 1 and return to step 2; when t is greater than or equal to k, stop and output the obtained projection matrix P and dictionary D.
Through the steps, the training process of the optimization model is completed, and a projection matrix P and a dictionary D are obtained.
In the test process, the test samples of the K modalities are denoted $\{Y_1, Y_2, \ldots, Y_K\}$. After the test samples are projected through the low-rank projection matrix P, the joint sparse representation with respect to the dictionary D is solved, and the joint sparse coefficient is obtained by solving the following formula:
where λ takes the same value as in the training process.
Step 10: the classifier classifies according to the joint sparse coefficient to obtain the recognition result.
For example, category labels matched with different joint sparse coefficients are preset according to the actual application scenario, so that the joint sparse coefficient obtained from the current solution is matched with the corresponding category label, giving the category recognition result of the current image to be classified.
The classifier may be, but is not limited to, a KNN classifier, a support vector machine (SVM), or a naive Bayes classifier.
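As an illustration of the last step, the following NumPy sketch classifies joint sparse coefficients with a plain k-nearest-neighbour vote. It assumes each sample's joint sparse coefficient has been flattened into one column vector and that labels are non-negative integers; the function name `knn_predict`, the data layout and the Euclidean distance are assumptions, and any of the classifiers listed above could be used instead.

```python
import numpy as np

def knn_predict(train_codes, train_labels, test_codes, k=1):
    """k-NN vote on coefficient vectors stored as columns.

    train_codes : (d, n_train), test_codes : (d, n_test)
    train_labels: (n_train,) non-negative integer class labels
    """
    train_labels = np.asarray(train_labels)
    # Pairwise Euclidean distances, shape (n_train, n_test)
    diff = train_codes[:, :, None] - test_codes[:, None, :]
    dist = np.linalg.norm(diff, axis=0)
    nearest = np.argsort(dist, axis=0)[:k]        # (k, n_test) indices
    votes = train_labels[nearest]                 # (k, n_test) labels
    # Majority vote per test sample
    return np.array([np.bincount(votes[:, j]).argmax()
                     for j in range(votes.shape[1])])
```

A call such as `knn_predict(train_codes, train_labels, test_codes, k=3)` returns one predicted label per test column; ties are resolved in favour of the smaller label because `argmax` returns the first maximum.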
Examples
To further verify the recognition performance of the invention, simulation verification was performed in MATLAB 2016. For convenience of analysis, the simulation scenarios consider face recognition in near-infrared and visible light scenes and in multi-view scenes. The eight existing classification methods selected for the comparative experiment are: SCDL (Semi-coupled Dictionary Learning), CDL (Coupled Dictionary Learning), GCDL1 and GCDL2 (Generalized Coupled Dictionary Learning), PCA (Principal Component Analysis), SRRS (Supervised Regularization based Robust Subspace), LRCS (Low-rank Common Subspace) and CLRS (Collective Low-rank Subspace). SCDL, CDL, GCDL1 and GCDL2 are dictionary learning-based methods, and PCA, SRRS, LRCS and CLRS are common subspace learning-based methods.
The method of the invention (Ours) was compared with the eight existing methods in a recognition rate test on a face data set with two different viewing angles (CMU Multi PIE). The comparison is shown in Table 1, where cases 1 to 6 denote different viewing angle combinations, i.e. two different viewing angles are selected from CMU Multi PIE and combined, giving 6 different combinations for simulation verification.
TABLE 1
As can be seen from Table 1, the low-rank and joint sparsity based multi-modal image recognition method of the invention achieves a better recognition rate than the existing methods.
Fig. 1 compares the recognition rate of the invention with those of the eight existing classification methods on the face data set of near-infrared and visible light scenes (CASIA HFB); the recognition rate of the invention is the highest.
Fig. 2 shows the parameter characteristic diagram of the invention for recognition in near-infrared and visible light scenes. The numbers 1 to 8 on the abscissa denote the following values of the regularization parameter λ, respectively: 0.001, 0.01, 0.1, 0.5, 1, 5, 10, 100.
Fig. 3 shows the convergence graph for recognition in near-infrared and visible light scenes, in which the abscissa is the number of iterations, the curve marked with circles is the convergence curve (Objective value), and the curve marked with "x" is the recognition rate curve (Recognition rate). As can be seen from Fig. 3, in this embodiment the maximum number of iterations may be set between 10 and 15, which reduces the amount of computation while maintaining the recognition rate.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.
Claims (3)
1. A multi-modal image recognition method based on low rank and joint sparsity, characterized by comprising the following steps:
Step S1: train a low-rank projection matrix P and a dictionary D for multi-modal recognition:
Step S101: construct an optimization model:
where K denotes the number of modalities, C denotes the number of categories, $\|\cdot\|_F$ denotes the Frobenius norm, $\|\cdot\|_{1,2}$ denotes the joint sparse norm, $\|\cdot\|_*$ denotes the nuclear norm, and the superscript T denotes matrix transposition;
$X_i$ denotes the feature matrix of the training samples of the i-th modality and is an $m_i \times n$ matrix, where $m_i$ denotes the feature dimension of the training samples of the i-th modality and n denotes the number of samples contained in each modality; $D_i$ denotes the dictionary of the i-th modality; $\Lambda_i$ denotes the coefficient matrix of dictionary $D_i$; the sparse coefficient is $\Lambda = [\Lambda_1, \Lambda_2, \ldots, \Lambda_K]$; λ denotes a regularization parameter; I denotes the identity matrix; and the remaining symbol in the model denotes an atom of the dictionary $D_i$ of the i-th modality;
Step S102: based on a preset training sample set, solve the constructed optimization model by the alternating direction method of multipliers to obtain the low-rank projection matrix P and the dictionary D;
Step S2: given the feature matrices $Y_1, Y_2, \ldots, Y_K$ of the different modalities of the object to be classified, project the object to be classified through the low-rank projection matrix P and then solve the joint sparse representation with respect to the dictionary D, thereby obtaining the joint sparse coefficient of the object to be classified.
2. The method of claim 1, wherein in step S102 the low-rank projection matrix P and the dictionary D are obtained by the following steps:
Step S102-1: initialize the parameters, including the sparse coefficient $\Lambda_0$, the low-rank projection matrix $P_0$, the dictionary $D_0$, the auxiliary variables $Z_0$ and $W_0$, the Lagrange multipliers $A_{Z,0}$ and $A_{W,0}$, and the Lagrange parameters $\alpha_Z$, $\alpha_W$; set the iteration counter t = 0 and the maximum number of iterations k;
The auxiliary variable $Z_0$ has the same matrix dimensions as the sparse coefficient $\Lambda_0$, and the auxiliary variable $W_0$ has the same matrix dimensions as the low-rank projection matrix $P_0$;
Step S102-2: update the sparse coefficient Λ:
using the update formula for Λ, obtain the coefficient matrix of the dictionary corresponding to the i-th modality after the (t+1)-th update, and thereby the sparse coefficient $\Lambda_{t+1}$ after the (t+1)-th update;
Step S102-3: update the dictionary D:
Step S102-4: update the low-rank projection P:
Step S102-5: update the auxiliary variable Z:
using the update formula for Z, obtain the row vector $z_{i,t+1}$ of the i-th row of the auxiliary variable Z after the (t+1)-th update, and thereby the auxiliary variable $Z_{t+1}$ after the (t+1)-th update;
Step S102-6: update the auxiliary variable W:
where $F\Sigma B^T$ is the singular value decomposition of the matrix decomposed in the update of W, and the function S(a, b) takes the values $S(a, b) = \mathrm{sgn}(a)(|a| - b)$ when $|a| \ge b$ and $S(a, b) = 0$ when $|a| < b$;
Step S102-7: update the Lagrange multipliers $A_Z$ and $A_W$:
by the formulas $A_{Z,t+1} = A_{Z,t} + \alpha_Z(\Lambda_{t+1} - Z_{t+1})$ and $A_{W,t+1} = A_{W,t} + \alpha_W(P_{t+1} - W_{t+1})$, obtain the Lagrange multipliers $A_{Z,t+1}$ and $A_{W,t+1}$ after the (t+1)-th update;
Step S102-8: determine whether the iteration counter t has reached the maximum number of iterations k; if so, take the most recently updated $P_{t+1}$ and $D_{t+1}$ as the trained values of the projection matrix P and the dictionary D; otherwise, set t = t + 1 and return to step S102-2.
3. The method of claim 2, wherein the maximum number of iterations k ranges from 10 to 15.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910751979.9A CN110633732B (en) | 2019-08-15 | 2019-08-15 | Multi-modal image recognition method based on low-rank and joint sparsity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910751979.9A CN110633732B (en) | 2019-08-15 | 2019-08-15 | Multi-modal image recognition method based on low-rank and joint sparsity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110633732A (en) | 2019-12-31 |
CN110633732B (en) | 2022-05-03 |
Family
ID=68969698
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910751979.9A Active CN110633732B (en) | 2019-08-15 | 2019-08-15 | Multi-modal image recognition method based on low-rank and joint sparsity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110633732B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112149053A (en) * | 2020-08-19 | 2020-12-29 | 江苏大学 | Multi-view image characterization method based on low-rank correlation analysis |
CN112541554A (en) * | 2020-12-18 | 2021-03-23 | 华中科技大学 | Multi-modal process monitoring method and system based on time constraint kernel sparse representation |
CN116246712A (en) * | 2023-02-13 | 2023-06-09 | 中国人民解放军军事科学院军事医学研究院 | Data subtype classification method with sparse constraint multi-mode matrix joint decomposition |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103632138A (en) * | 2013-11-20 | 2014-03-12 | 南京信息工程大学 | Low-rank partitioning sparse representation human face identifying method |
CN103761537A (en) * | 2014-02-07 | 2014-04-30 | 重庆市国土资源和房屋勘测规划院 | Image classification method based on low-rank optimization feature dictionary model |
US20170024855A1 (en) * | 2015-07-26 | 2017-01-26 | Macau University Of Science And Technology | Single Image Super-Resolution Method Using Transform-Invariant Directional Total Variation with S1/2+L1/2-norm |
CN107563968A (en) * | 2017-07-26 | 2018-01-09 | 昆明理工大学 | A kind of method based on the group medicine image co-registration denoising for differentiating dictionary learning |
CN107977949A (en) * | 2017-07-26 | 2018-05-01 | 昆明理工大学 | A kind of method improved based on projection dictionary to the Medical image fusion quality of study |
CN108460412A (en) * | 2018-02-11 | 2018-08-28 | 北京盛安同力科技开发有限公司 | A kind of image classification method based on subspace joint sparse low-rank Structure learning |
CN109215780A (en) * | 2018-08-24 | 2019-01-15 | 齐鲁工业大学 | The multi-modal data analysis method and system of high Laplace regularization low-rank representation |
CN109447009A (en) * | 2018-11-02 | 2019-03-08 | 南京审计大学 | Hyperspectral image classification method based on subspace nuclear norm regularized regression model |
CN109522956A (en) * | 2018-11-16 | 2019-03-26 | 哈尔滨理工大学 | A kind of low-rank differentiation proper subspace learning method |
CN110069978A (en) * | 2019-03-04 | 2019-07-30 | 杭州电子科技大学 | The face identification method that the non-convex low-rank decomposition of identification and superposition Sparse indicate |
Non-Patent Citations (5)
Title |
---|
DING Z et al.: "Low-rank embedded ensemble semantic dictionary for zero-shot learning", 《PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
LIU G et al.: "Robust recovery of subspace structures by low-rank representation", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 * |
SUN B et al.: "Fusion of noisy images based on joint distribution model in dual‐tree complex wavelet domain", 《INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY》 * |
DENG Zhihua et al.: "Medical image fusion based on low-rank sparse decomposition and saliency measure", 《Optical Technique》 * |
GAO Shibo et al.: "Research progress on sparse representation methods for object detection", 《Acta Electronica Sinica》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112149053A (en) * | 2020-08-19 | 2020-12-29 | 江苏大学 | Multi-view image characterization method based on low-rank correlation analysis |
CN112541554A (en) * | 2020-12-18 | 2021-03-23 | 华中科技大学 | Multi-modal process monitoring method and system based on time constraint kernel sparse representation |
CN112541554B (en) * | 2020-12-18 | 2024-03-22 | 华中科技大学 | Multi-mode process monitoring method and system based on time constraint and nuclear sparse representation |
CN116246712A (en) * | 2023-02-13 | 2023-06-09 | 中国人民解放军军事科学院军事医学研究院 | Data subtype classification method with sparse constraint multi-mode matrix joint decomposition |
CN116246712B (en) * | 2023-02-13 | 2024-03-26 | 中国人民解放军军事科学院军事医学研究院 | Data subtype classification method with sparse constraint multi-mode matrix joint decomposition |
Also Published As
Publication number | Publication date |
---|---|
CN110633732B (en) | 2022-05-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tang et al. | Learning a joint affinity graph for multiview subspace clustering | |
Patel et al. | Latent space sparse and low-rank subspace clustering | |
Patel et al. | Kernel sparse subspace clustering | |
Zheng et al. | Iterative re-constrained group sparse face recognition with adaptive weights learning | |
Yan et al. | Graph embedding and extensions: A general framework for dimensionality reduction | |
Zhang et al. | Graph based constrained semi-supervised learning framework via label propagation over adaptive neighborhood | |
He et al. | Robust principal component analysis based on maximum correntropy criterion | |
CN110633732B (en) | Multi-modal image recognition method based on low-rank and joint sparsity | |
Li et al. | Mutual component analysis for heterogeneous face recognition | |
CN107392107B (en) | Face feature extraction method based on heterogeneous tensor decomposition | |
Zheng et al. | A novel approach inspired by optic nerve characteristics for few-shot occluded face recognition | |
Nguyen et al. | Kernel low-rank representation for face recognition | |
Lu et al. | Nuclear norm-based 2DLPP for image classification | |
CN104715266B (en) | The image characteristic extracting method being combined based on SRC DP with LDA | |
CN107918761A (en) | A kind of single sample face recognition method based on multiple manifold kernel discriminant analysis | |
Nguyen et al. | Discriminative low-rank dictionary learning for face recognition | |
Puthenputhussery et al. | A sparse representation model using the complete marginal fisher analysis framework and its applications to visual recognition | |
Li et al. | Robust subspace clustering with independent and piecewise identically distributed noise modeling | |
CN111310813A (en) | Subspace clustering method and device for potential low-rank representation | |
Jin et al. | Multiple graph regularized sparse coding and multiple hypergraph regularized sparse coding for image representation | |
Zhang et al. | Cost-sensitive joint feature and dictionary learning for face recognition | |
He et al. | Low-rank representation with graph regularization for subspace clustering | |
Chen et al. | Semi-supervised dictionary learning with label propagation for image classification | |
Li et al. | Unsupervised active learning via subspace learning | |
Givens et al. | Biometric face recognition: from classical statistics to future challenges |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||