CN110633732A

CN110633732A - Multi-modal image recognition method based on low-rank and joint sparsity

Info

Publication number: CN110633732A
Application number: CN201910751979.9A
Authority: CN
Inventors: 孙彬; 杨轲; 王子强; 朱韦丹; 卢陶然; 刘强; 徐利梅
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2019-08-15
Filing date: 2019-08-15
Publication date: 2019-12-31
Anticipated expiration: 2039-08-15
Also published as: CN110633732B

Abstract

The invention discloses a multi-modal image recognition method based on low rank and joint sparsity, belonging to the technical field of image recognition. To overcome the technical problem that the inter-modal difference among multi-modal images is larger than the inter-category difference, the original multi-modal data are projected into a low-rank common subspace. The low-rank constraint on the common subspace effectively retains the information shared by different modalities of the same category, so that the connection among categories in the low-rank common subspace is stronger; it also reduces the image dimensionality, which mitigates the curse of dimensionality to a certain extent. The joint sparse representation of the data of the different modalities is then obtained under a joint sparse constraint to give the fused features, and these features are classified by a common classifier to obtain the final recognition result. For the multi-modal problem, the invention combines the features of multiple modalities through feature fusion to obtain features that are more favorable for recognition, thereby improving recognition efficiency.

Description

Multi-modal image recognition method based on low-rank and joint sparsity

Technical Field

The invention belongs to the technical field of image recognition, and particularly relates to a multi-modal image recognition technology based on low-rank and joint sparsity.

Background

Image recognition technology uses a computer to process and analyze images, classify the objects in them and make meaningful judgments. With the development of sensors, multi-modal image data is easily captured in real life. Multi-modal data can be fused to provide complementary information and thus improve recognition performance, so schemes based on multi-modal information have higher practical value than schemes based on single-modal information. Because the imaging mechanisms of different modalities differ, traditional single-modal image recognition algorithms cannot process multi-modal images, which limits the further application of image recognition. Since data from different modalities differ greatly, multi-modal data can be regarded as coming from different domains with different distributions, and therefore cannot be compared directly. Compared with single-modal recognition methods, the challenge for multi-modal recognition algorithms is to link the information of the multiple modalities so as to reduce the modality difference.

To combine multi-modal features, feature fusion techniques can be applied to fuse and extract features across modalities. Joint sparse representation is a common feature fusion tool. Its basic principle is to combine the features of multiple modalities by constraining the sparse representations of samples of the same class to share the same sparse pattern (i.e. the sparse representations use the same dictionary atoms). The document "X. Yuan, S. Yan. Visual classification with multi-task joint sparse representation [C]. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, 3493-3500" uses the same sparse pattern for feature fusion. This assumption suits the case where different features extracted from the same source serve as the multi-modal information; where the observations of same-class samples differ greatly, as with multiple views or multiple sensors, constraining the sparse pattern to be identical restricts the composition of the dictionary atoms and is not suitable for data with large modal differences. The document "S. Shekhar, V. M. Patel, N. M. Nasrabadi, et al. Joint Sparse Representation for Robust Multimodal Biometrics Recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(1): 113-126" proposes joint sparsity constraints that relax the constraint on the sparse pattern relative to the joint sparse representation assumption and are more applicable to multi-modal situations. However, since the similarity within a modality is greater than the similarity within a category, directly fusing features with large differences, as conventional multi-modal feature fusion methods do, easily loses modality information.

Disclosure of Invention

The object of the invention is to provide, in view of the problems described above, a recognition method suited to multi-modal situations.

The invention discloses a multi-modal identification method based on low-rank and joint sparse constraint, which specifically comprises the following steps:

step S1: training a low-rank projection matrix P for multi-modal recognition and a dictionary D:

step S101: constructing an optimization model:

where $X^i$ denotes the feature matrix of the training samples of the i-th modality, and $X^i$ is an $m_i \times n$ matrix, in which $m_i$ is the feature dimension (image feature dimension) of the training samples of the i-th modality and $n$ is the number of samples contained in each modality; $D^i$ denotes the dictionary of the i-th modality; $\Lambda^i$ denotes the coefficient matrix of the dictionary $D^i$, i.e. the sparse coefficient matrix of the dictionary $D^i$; the sparse coefficient $\Lambda = [\Lambda^1, \Lambda^2, \ldots, \Lambda^K]$; $\lambda$ denotes a regularization parameter;

the atom symbol in the constraint denotes an atom of the dictionary $D^i$ of the i-th modality; $I$ denotes the identity matrix;

step S102: solving the constructed optimization model by the alternating direction method of multipliers (ADMM) on a preset training sample set to obtain the low-rank projection matrix P and the dictionary D;

step S2: based on the feature matrices $Y^1, Y^2, \ldots, Y^K$ of the different modalities of the object to be classified, projecting the object to be classified through the low-rank projection matrix P and then solving its joint sparse representation with respect to the dictionary D, thereby obtaining the joint sparse coefficient of the object to be classified;

step S3: performing classification based on the joint sparse coefficient of the object to be classified to obtain the classification and recognition result of the object to be classified.

Further, in step S102, the specific steps of obtaining the low-rank projection matrix P and the dictionary D are as follows:

step S102-1: initializing the parameters, including the sparse coefficient $\Lambda_0$, the low-rank projection matrix $P_0$, the dictionary $D_0$, the auxiliary variables $Z_0$ and $W_0$, the Lagrange multipliers $A_{Z,0}$ and $A_{W,0}$, and the Lagrange parameters $\alpha_Z$, $\alpha_W$; setting the iteration count $t = 0$ and the maximum number of iterations $k$;

wherein the auxiliary variable $Z_0$ has the same matrix dimensions as the sparse coefficient $\Lambda_0$, and $W_0$ has the same matrix dimensions as the low-rank projection matrix $P_0$ (the dimensions of the low-rank projection matrix are preset according to the actual application scenario);

step S102-2: updating the sparse coefficient $\Lambda$:

by the formula

obtaining the coefficient matrix $\Lambda^i_{t+1}$ of the dictionary corresponding to the i-th modality after the (t+1)-th update,

thereby obtaining the sparse coefficient $\Lambda_{t+1}$ after the (t+1)-th update;

Step S102-3: updating the dictionary D:

solving the dictionary sub-problem with a quadratic-programming solver, thereby obtaining the dictionary $D_{t+1}$ after the (t+1)-th update;

Step S102-4: update low rank projection P:

by the update formula for the low-rank projection, obtaining the low-rank projection matrix $P_{t+1}$ after the (t+1)-th update;

step S102-5: updating the auxiliary variable $Z$:

by the formula

obtaining the row vector $z_{i,t+1}$ of the i-th row of the auxiliary variable $Z$ after the (t+1)-th update, and thereby the auxiliary variable $Z_{t+1}$ after the (t+1)-th update;

wherein $\gamma_{i,t+1}$ and the corresponding multiplier row are the row vectors of the i-th rows of $\Lambda_{t+1}$ and $A_{Z,t}$, respectively;

step S102-6: updating the auxiliary variable $W$:

by the formula

obtaining the auxiliary variable $W_{t+1}$ after the (t+1)-th update;

wherein $F \Sigma B^T$ is the singular value decomposition used in the formula, and the function $S(a, b)$ takes the values: $S(a, b) = \operatorname{sgn}(a)(|a| - b)$ when $|a| \ge b$, and $S(a, b) = 0$ when $|a| < b$;

step S102-7: updating the Lagrange multipliers $A_Z$ and $A_W$:

by the formulas $A_{Z,t+1} = A_{Z,t} + \alpha_Z(\Lambda_{t+1} - Z_{t+1})$ and $A_{W,t+1} = A_{W,t} + \alpha_W(P_{t+1} - W_{t+1})$, obtaining the Lagrange multipliers $A_{Z,t+1}$ and $A_{W,t+1}$ after the (t+1)-th update;

step S102-8: judging whether the iteration count $t$ has reached the maximum number of iterations $k$; if so, taking the latest $P_{t+1}$ and $D_{t+1}$ as the training results for the projection matrix $P$ and the dictionary $D$; otherwise, setting $t = t + 1$ and returning to step S102-2.

In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:

According to the method, the difference between modalities is reduced through low-rank projection, and the low-rank constraint on the common subspace effectively retains the information shared by different modalities of the same category, so that the connection among categories in the low-rank common subspace is stronger; the image dimensionality is also reduced, which mitigates the curse of dimensionality to a certain extent. For the multi-modal problem, the features of multiple modalities are combined through feature fusion to obtain features that are more favorable for recognition, improving recognition efficiency.

Drawings

FIG. 1 shows the recognition rate of the present invention under the near infrared and visible light face data set (CASIA HFB).

FIG. 2 is a parameter characteristic diagram of the present invention. The numbers 1 to 8 on the abscissa represent the following values of the regularization parameter λ, respectively: 0.001, 0.01, 0.1, 0.5, 1, 5, 10, 100.

Fig. 3 is a graph of the convergence of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.

The solution of the invention is as follows. First, the multi-modal images, which are distributed in different spaces, are projected with a low-rank projection so as to reduce the difference between modalities, retain the important discriminative information of the same category and reduce the data dimension. Then, in the common projected space, feature fusion is performed under a joint sparse constraint. Finally, image recognition is carried out on the fused features. Because samples of the same category but different modalities share similar information, and this shared information must be low-dimensional compared with the high-dimensional original image information, the method reflects this low-dimensional property in the multi-modal common subspace by requiring the multi-modal projection matrix to be low-rank when extracting the shared information. This reduces the modality difference and further improves recognition efficiency. The invention can be applied to recognition in scenarios such as identity recognition, security monitoring and criminal investigation.

The specific implementation process of the multi-modal identification method based on the low-rank and joint sparse constraint is as follows:

Consider training samples with $C$ classes and $K$ modalities, where the training samples of each modality are expressed as $X^i \in \mathbb{R}^{m_i \times n}$, $i = 1, 2, \ldots, K$, in which $m_i$ is the dimension of the training samples and $n$ is the number of samples contained in each modality. If the low-rank subspace is represented by $P$ (also called the low-rank common projection), the sample becomes $P^T X^i$ after low-rank projection. The invention adopts joint sparse representation based on dictionary learning. Let $D^i$ be the dictionary of the i-th modality and $N^i$ the noise of the samples of the i-th modality; the joint sparse representation model is then:

$P^T X^i = D^i \Lambda^i + N^i, \quad i = 1, 2, \ldots, K$

where $\Lambda^i$ denotes the coefficient matrix of the dictionary $D^i$.
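The dimensions involved in this model can be checked with a small NumPy sketch. It assumes, for illustration only, that all modalities share one feature dimension m so that a single projection P applies to every X^i; the subspace dimension p, the number of atoms a and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 2    # number of modalities
m = 64   # feature dimension (assumed equal across modalities in this sketch)
n = 30   # number of samples per modality
p = 16   # dimension of the common subspace (hypothetical choice)
a = 20   # number of atoms per dictionary (hypothetical choice)

# Projection with orthonormal columns, so that P^T P = I.
P, _ = np.linalg.qr(rng.standard_normal((m, p)))

X = [rng.standard_normal((m, n)) for _ in range(K)]    # X^i, m x n
D = [rng.standard_normal((p, a)) for _ in range(K)]    # D^i, p x a
Lam = [rng.standard_normal((a, n)) for _ in range(K)]  # Lambda^i, a x n

# Residual N^i = P^T X^i - D^i Lambda^i, one p x n matrix per modality.
N = [P.T @ X[i] - D[i] @ Lam[i] for i in range(K)]
```

Each projected sample matrix P^T X^i is p x n, so dictionaries of different modalities all live in the same p-dimensional space.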

Introducing the low-rank constraint and the joint sparse constraint, the low-rank common projection $P$, the dictionaries $D^i$ and their coefficient matrices $\Lambda^i$ are obtained by solving the following optimization formula (optimization model):

where the constrained vectors are the atoms of the dictionary $D^i$; $\|\cdot\|_F$ denotes the Frobenius norm; the sparse coefficient $\Lambda = [\Lambda^1, \Lambda^2, \ldots, \Lambda^K]$, i.e. $\Lambda^i$ is a sub-matrix of the matrix $\Lambda$, which may also be called the sparse coefficient matrix; the nuclear norm $\|P\|_* = \sum_i \sigma_i(P)$ is the sum of the singular values of the matrix. Since $\operatorname{rank}(P)$ cannot be minimized directly, the rank minimization problem is solved approximately through the nuclear norm. The orthogonality constraint $P^T P = I$ ensures that the resulting $P$ is a basis transformation matrix, where $I$ is the identity matrix and $\lambda$ is the regularization parameter.
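One form of the optimization model that is consistent with the definitions above can be sketched as follows. The atom symbol $d^i_j$, the unit bound on the atom norm, the placement of $\lambda$ on the joint sparse term and the unit weight on the nuclear-norm term are assumptions made for illustration, not taken from the text.

```latex
\min_{P,\,D,\,\Lambda}\ \sum_{i=1}^{K}\left\|P^{T}X^{i}-D^{i}\Lambda^{i}\right\|_{F}^{2}
+\lambda\left\|\Lambda\right\|_{1,2}+\left\|P\right\|_{*}
\quad\text{s.t.}\quad P^{T}P=I,\qquad \left\|d^{i}_{j}\right\|_{2}\le 1\ \ \text{for all } i, j
```

Here $\|\Lambda\|_{1,2}$ is read as the sum of the Euclidean norms of the rows of $\Lambda$, which is also an assumption; under that reading the penalty encourages whole rows of $\Lambda$ to vanish together across modalities.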

The model is solved by the Alternating Direction Method of Multipliers (ADMM). Auxiliary variables $Z$ and $W$ are introduced and the augmented Lagrangian function is defined, with the following expression:

where $A_Z$, $A_W$ are the multipliers of the linear constraints (i.e. Lagrange multipliers), $\alpha_Z$, $\alpha_W$ are positive parameters, $\langle A, B \rangle$ denotes $\operatorname{tr}(A^T B)$ with $\operatorname{tr}(\cdot)$ the trace of a matrix, the element symbols appearing in the expression denote elements of $A_Z$ and $A_W$, respectively, and the dictionary $D = [D^1, D^2, \ldots, D^K]$.
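A possible form of the augmented Lagrangian is sketched below. It assumes the objective $\sum_i \|P^T X^i - D^i \Lambda^i\|_F^2 + \lambda\|\Lambda\|_{1,2} + \|P\|_*$ and the splitting constraints $\Lambda = Z$ and $P = W$, which match the multiplier updates $A_Z \leftarrow A_Z + \alpha_Z(\Lambda - Z)$ and $A_W \leftarrow A_W + \alpha_W(P - W)$ given in the text; the term weights and the factor $1/2$ on the penalties are assumptions.

```latex
\mathcal{L}(P, D, \Lambda, Z, W, A_Z, A_W)
= \sum_{i=1}^{K}\left\|P^{T}X^{i}-D^{i}\Lambda^{i}\right\|_{F}^{2}
+ \lambda\left\|Z\right\|_{1,2} + \left\|W\right\|_{*}
+ \langle A_Z,\ \Lambda - Z\rangle + \frac{\alpha_Z}{2}\left\|\Lambda - Z\right\|_{F}^{2}
+ \langle A_W,\ P - W\rangle + \frac{\alpha_W}{2}\left\|P - W\right\|_{F}^{2}
```

Moving the non-smooth norms onto $Z$ and $W$ leaves smooth quadratic sub-problems in $\Lambda$ and $P$, while the sub-problems in $Z$ and $W$ reduce to shrinkage operations.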

Following the augmented Lagrangian method, the augmented Lagrangian function is first minimized with respect to $P$, $\Lambda$, $Z$ and $W$ while $A_Z$ and $A_W$ are held fixed; the other variables are then fixed and $A_Z$ and $A_W$ are updated. Because the objective function has a separable structure, the problem can be simplified by treating each of the variables $P$, $\Lambda$, $Z$ and $W$ in turn as the only variable of the objective function. The solution of each sub-problem is described in detail below. Since the optimization is an iterative update process, the result of the t-th update is indicated in the following formulas by adding the subscript $t$ ($t \ge 0$) to the corresponding variable.

(1) Updating the sparse coefficient $\Lambda$.

When solving for the sparse coefficient, the optimization formula becomes:

This optimization formula is a convex function; the update formula for the sparse coefficient $\Lambda$ is obtained by taking the first-order partial derivative and setting it to zero:

where $I$ is the identity matrix and $Z^i$ is a sub-matrix of $Z$, i.e. $Z = [Z^1, Z^2, \ldots, Z^K]$.
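As a hedged illustration of such a closed-form update, assume the sub-problem for one modality is $\|P^T X^i - D^i \Lambda^i\|_F^2 + \langle A_Z^i, \Lambda^i - Z^i\rangle + \frac{\alpha_Z}{2}\|\Lambda^i - Z^i\|_F^2$. Setting its gradient to zero gives the linear system $(2 D^{iT} D^i + \alpha_Z I)\Lambda^i = 2 D^{iT} P^T X^i + \alpha_Z Z^i - A_Z^i$. The function name and argument layout below are hypothetical.

```python
import numpy as np

def update_lambda(P, X_i, D_i, Z_i, A_Zi, alpha_Z):
    """Closed-form update of Lambda^i for one modality (assumed sub-problem)."""
    a = D_i.shape[1]  # number of dictionary atoms
    # Left-hand side: 2 D^T D + alpha_Z I (symmetric positive definite).
    lhs = 2.0 * D_i.T @ D_i + alpha_Z * np.eye(a)
    # Right-hand side: 2 D^T P^T X + alpha_Z Z - A_Z.
    rhs = 2.0 * D_i.T @ (P.T @ X_i) + alpha_Z * Z_i - A_Zi
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(lhs, rhs)
```

Solving the system with `np.linalg.solve` avoids an explicit matrix inverse, which is usually the more stable choice.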

(2) Updating the dictionary $D$.

Fixing the other variables and parameters gives the following optimization formula:

This is a quadratically constrained quadratic programming (QCQP) problem, which can be solved with a quadratic-programming solver.
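The text leaves the choice of solver open. As an illustrative stand-in (not the quadratic-programming solver referred to above), the sketch below uses projected gradient descent, assuming the sub-problem is to minimize $\|R - D\Lambda\|_F^2$ with $R = P^T X^i$ subject to every atom satisfying $\|d_j\|_2 \le 1$. The function name, the norm bound and the iteration count are hypothetical.

```python
import numpy as np

def update_dictionary(R, Lam, D0, n_iter=200):
    """Projected gradient for min ||R - D Lam||_F^2 s.t. ||d_j||_2 <= 1.

    R   : projected samples P^T X^i
    Lam : current coefficient matrix Lambda^i
    D0  : starting dictionary (columns are atoms)
    """
    G = Lam @ Lam.T
    # Step size 1/L, where L = 2 * spectral norm of Lam Lam^T bounds the
    # Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * np.linalg.norm(G, 2) + 1e-12)
    D = D0.copy()
    for _ in range(n_iter):
        grad = 2.0 * (D @ G - R @ Lam.T)
        D = D - step * grad
        # Project each atom back onto the unit ball.
        norms = np.maximum(np.linalg.norm(D, axis=0), 1.0)
        D = D / norms
    return D
```

Because the objective is convex in $D$ and the constraint set is convex, this iteration does not increase the objective when started from a feasible dictionary; a dedicated QCQP solver would be expected to reach the same minimizer.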

(3) Updating the low-rank projection $P$.

The optimization formula for the low-rank projection is:

This optimization formula is a convex function; taking the first-order partial derivative and setting it to zero gives:

where the superscript $T$ denotes matrix transposition.
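A stationarity-based update of this kind can be sketched as follows, assuming the sub-problem $\sum_i \|P^T X^i - D^i \Lambda^i\|_F^2 + \langle A_W, P - W\rangle + \frac{\alpha_W}{2}\|P - W\|_F^2$, a common feature dimension for all modalities, and no explicit handling of the orthogonality constraint. Under those assumptions the zero-gradient condition is $(2\sum_i X^i X^{iT} + \alpha_W I)P = 2\sum_i X^i \Lambda^{iT} D^{iT} + \alpha_W W - A_W$. The function name is hypothetical.

```python
import numpy as np

def update_projection(X, D, Lam, W, A_W, alpha_W):
    """Closed-form update of P from the zero-gradient condition (assumed form).

    X, D, Lam are lists with one matrix per modality.
    """
    m = W.shape[0]
    lhs = alpha_W * np.eye(m)   # accumulates 2 * sum_i X^i X^i^T + alpha_W I
    rhs = alpha_W * W - A_W     # accumulates 2 * sum_i X^i (D^i Lam^i)^T + ...
    for X_i, D_i, Lam_i in zip(X, D, Lam):
        lhs += 2.0 * X_i @ X_i.T
        rhs += 2.0 * X_i @ (D_i @ Lam_i).T
    return np.linalg.solve(lhs, rhs)
```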

(4) Updating the auxiliary variable $Z$.

The optimization problem for the auxiliary variable $Z$ becomes:

which is equivalently transformed into:

Since the above formula has a separable structure, each row of $Z$ can be handled separately. Let $\gamma_i$, the i-th multiplier row and $z_i$ denote the i-th rows of $\Lambda$, $A_Z$ and $Z$, respectively. The problem then becomes:

where

and $z_{i,t+1}$ is computed in the following form:

where the operator $(c)_+$ replaces each entry $c_i$ of the vector $c$ with $\max(c_i, 0)$.
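A row-wise shrinkage of this kind can be sketched as follows, assuming the sub-problem is $\lambda\|Z\|_{1,2} + \frac{\alpha_Z}{2}\|Z - (\Lambda + A_Z/\alpha_Z)\|_F^2$ with $\|\cdot\|_{1,2}$ taken as the sum of the row norms. Each row $q_i$ of $\Lambda + A_Z/\alpha_Z$ is then scaled by $(1 - (\lambda/\alpha_Z)/\|q_i\|_2)_+$. The function name and the threshold $\lambda/\alpha_Z$ are assumptions.

```python
import numpy as np

def update_Z(Lam, A_Z, alpha_Z, lam):
    """Row-wise group shrinkage of Q = Lam + A_Z / alpha_Z (assumed sub-problem)."""
    Q = Lam + A_Z / alpha_Z
    row_norms = np.linalg.norm(Q, axis=1, keepdims=True)
    # (1 - threshold / ||q_i||)_+ ; the inner maximum guards against division by zero.
    scale = np.maximum(1.0 - (lam / alpha_Z) / np.maximum(row_norms, 1e-12), 0.0)
    return scale * Q
```

For example, with zero multipliers and $\lambda = \alpha_Z = 1$, the row $(3, 4)$ has norm 5 and is scaled by 0.8, while the row $(0.3, 0.4)$ has norm 0.5 and is set to zero.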

(5) Updating the auxiliary variable $W$.

Updating the auxiliary variable $W$ requires solving the following optimization problem:

which is equivalently transformed into:

The above is a shrinkage problem, whose solution is:

where $F \Sigma B^T$ is the singular value decomposition (SVD) used in the solution, and the function $S(a, b)$ is defined as: $S(a, b) = \operatorname{sgn}(a)(|a| - b)$ when $|a| \ge b$, and $S(a, b) = 0$ when $|a| < b$.
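The shrinkage function $S$ and a singular-value-thresholding update built on it can be sketched as follows. The sketch assumes the sub-problem is $\|W\|_* + \frac{\alpha_W}{2}\|W - (P + A_W/\alpha_W)\|_F^2$, so that the decomposed matrix is $P + A_W/\alpha_W$ and the threshold is $1/\alpha_W$; both of these, and the name `update_W`, are assumptions.

```python
import numpy as np

def S(a, b):
    """Soft threshold: sgn(a)(|a| - b) when |a| >= b, otherwise 0."""
    return np.sign(a) * np.maximum(np.abs(a) - b, 0.0)

def update_W(P, A_W, alpha_W):
    """Singular value thresholding of P + A_W / alpha_W (assumed sub-problem)."""
    F, sigma, Bt = np.linalg.svd(P + A_W / alpha_W, full_matrices=False)
    # Shrink the singular values, then rebuild F * S(Sigma) * B^T.
    return (F * S(sigma, 1.0 / alpha_W)) @ Bt
```

Singular values below the threshold are set to zero, which is how this step lowers the rank of $W$.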

(6) Updating the parameters $A_Z$ and $A_W$.

The update formulas for the Lagrange multipliers $A_Z$ and $A_W$ are:

$A_{Z,t+1} = A_{Z,t} + \alpha_Z(\Lambda_{t+1} - Z_{t+1})$

$A_{W,t+1} = A_{W,t} + \alpha_W(P_{t+1} - W_{t+1})$
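The two multiplier updates translate directly into code. The function name and argument order in this minimal sketch are hypothetical; the arithmetic follows the two formulas above.

```python
import numpy as np

def update_multipliers(A_Z, A_W, Lam, Z, P, W, alpha_Z, alpha_W):
    """Dual ascent step on the constraints Lam = Z and P = W."""
    A_Z_next = A_Z + alpha_Z * (Lam - Z)
    A_W_next = A_W + alpha_W * (P - W)
    return A_Z_next, A_W_next
```

The residuals `Lam - Z` and `P - W` shrink toward zero as the iteration converges, so the multipliers stop changing once the auxiliary variables agree with the variables they copy.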

In summary, the low-rank common projection $P$, the dictionaries $D^i$ and their coefficient matrices $\Lambda^i$ are solved for as follows:

Step 1: initializing the parameters:

the parameters to be initialized include: the sparse coefficient $\Lambda_0$; the low-rank projection matrix $P_0$; the dictionary $D_0$; the auxiliary variable $Z_0$; the auxiliary variable $W_0$; the linear-constraint multipliers $A_Z$, $A_W$; the Lagrange parameters $\alpha_Z$, $\alpha_W$; and the maximum number of iterations $k$.

Step 2: updating the sparse coefficient $\Lambda$:

the sparse coefficient $\Lambda$ is updated by the update formula obtained in (1).

Step 3: updating the dictionary $D$:

the dictionary sub-problem in (2) is solved with a quadratic-programming solver.

Step 4: updating the low-rank projection $P$:

the low-rank projection $P$ is updated by the formula obtained in (3).

Step 5: updating the auxiliary variable $Z$:

the auxiliary variable $Z$ is updated by the formula obtained in (4).

Step 6: updating the auxiliary variable $W$:

the auxiliary variable $W$ is updated by the formula obtained in (5).

Step 7: updating the Lagrange multipliers $A_Z$ and $A_W$:

the parameters $A_Z$ and $A_W$ are updated by the formulas $A_{Z,t+1} = A_{Z,t} + \alpha_Z(\Lambda_{t+1} - Z_{t+1})$ and $A_{W,t+1} = A_{W,t} + \alpha_W(P_{t+1} - W_{t+1})$.

Step 8: checking the iteration count:

when the iteration count $t < k$, set $t = t + 1$ and return to Step 2; when $t \ge k$, stop and output the resulting projection matrix $P$ and dictionary $D$.

Through the above steps, training of the optimization model is completed and the projection matrix $P$ and the dictionary $D$ are obtained.

Step 9: solving for the joint sparse coefficients of the test samples:

In the test process, the test samples of the $K$ modalities are denoted $\{Y^1, Y^2, \ldots, Y^K\}$. After the test samples are projected through the low-rank projection matrix $P$, their joint sparse representation with respect to the dictionary $D$ is solved, and the joint sparse coefficient is obtained by solving the following formula:

where the value of $\lambda$ is the same as in the training process.
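One way to carry out this test-time coding is sketched below. It assumes the problem is to minimize $\sum_i \|P^T Y^i - D^i \Lambda^i\|_F^2 + \lambda\|\Lambda\|_{1,2}$ over $\Lambda = [\Lambda^1, \ldots, \Lambda^K]$, with $\|\cdot\|_{1,2}$ taken as the sum of row norms, and it uses proximal gradient descent, which is a choice made for this illustration; the text does not name a solver. The function name and iteration count are hypothetical.

```python
import numpy as np

def joint_sparse_code(P, Y, D, lam, n_iter=300):
    """Proximal gradient sketch for the assumed joint sparse coding problem.

    Y and D are lists with one matrix per modality; returns a list of Lambda^i.
    """
    R = [P.T @ Y_i for Y_i in Y]  # projected test samples
    # Lipschitz constant of the gradient of the reconstruction term.
    L = 2.0 * max(np.linalg.norm(D_i, 2) ** 2 for D_i in D)
    a = D[0].shape[1]
    Lam = [np.zeros((a, R_i.shape[1])) for R_i in R]
    widths = np.cumsum([R_i.shape[1] for R_i in R])[:-1]
    for _ in range(n_iter):
        # Gradient step on the reconstruction term, one modality at a time.
        V = [Lam_i - (2.0 / L) * D_i.T @ (D_i @ Lam_i - R_i)
             for Lam_i, D_i, R_i in zip(Lam, D, R)]
        # Row-wise shrinkage on the concatenation [V^1, ..., V^K], so that a
        # row (atom) is kept or dropped jointly for all modalities.
        V_all = np.hstack(V)
        norms = np.linalg.norm(V_all, axis=1, keepdims=True)
        scale = np.maximum(1.0 - (lam / L) / np.maximum(norms, 1e-12), 0.0)
        Lam = np.split(scale * V_all, widths, axis=1)
    return Lam
```

Larger values of `lam` zero out more rows, so fewer atoms are shared across the modalities.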

Step 10: the classifier classifies according to the joint sparse coefficient to obtain the recognition result.

For example, category labels matched to different joint sparse coefficients are preset according to the actual application scenario, so that the joint sparse coefficient obtained from the current solution is matched to its category label, giving the category recognition result for the current image to be classified.

Classifiers that may be used include, but are not limited to, the KNN classifier, the support vector machine (SVM) and the naive Bayes classifier.
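As a minimal sketch of the KNN option, the function below classifies test samples by majority vote among their nearest training samples. It assumes that each sample's joint sparse coefficients have already been flattened into one feature row (for instance by stacking the sample's coefficient columns from all modalities); that layout and the function name are assumptions.

```python
import numpy as np

def knn_predict(train_feats, train_labels, test_feats, k=1):
    """k-nearest-neighbour vote; rows of the feature matrices are samples."""
    train_labels = np.asarray(train_labels)
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    preds = []
    for idx in nearest:
        labels, counts = np.unique(train_labels[idx], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # majority label among neighbours
    return np.array(preds)
```

An SVM or naive Bayes classifier would consume the same feature rows.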

Examples

To further verify the recognition performance of the present invention, simulation verification was performed in MATLAB 2016. For ease of analysis, the simulation considers face recognition in near-infrared and visible-light scenes and in multi-view scenes. The eight existing classification methods selected for the comparative experiments are: SCDL (Semi-coupled Dictionary Learning), CDL (Coupled Dictionary Learning), GCDL1 and GCDL2 (Generalized Coupled Dictionary Learning), PCA (Principal Component Analysis), SRRS (Supervised Regularization-based Robust Subspace), LRCS (Low-rank Common Subspace) and CLRS (Collective Low-rank Subspace). Of these, SCDL, CDL, GCDL1 and GCDL2 are dictionary-learning-based methods, and PCA, SRRS, LRCS and CLRS are common-subspace-learning-based methods.

The method of the present invention (Ours) was compared with the eight existing methods in a recognition-rate test on the face data set CMU Multi-PIE with two different viewing angles. The comparison is shown in Table 1, where cases 1 to 6 represent different viewing-angle combinations: two different viewing angles are selected from CMU Multi-PIE and combined, giving 6 different combinations for simulation verification.

TABLE 1

As can be seen from Table 1, the low rank and joint sparsity based multi-modal image recognition method of the invention has better recognition rate than the existing method.

Fig. 1 compares the recognition rate of the present invention with those of the eight existing classification methods above on the near-infrared and visible-light face data set (CASIA HFB); the recognition rate of the present invention is the highest.

Fig. 2 shows a parameter characteristic diagram of the present invention when performing recognition processing in a near-infrared and visible light scene. The numbers 1 to 8 on the abscissa represent the values of the regularization parameter λ respectively as follows: 0.001,0.01,0.1,0.5,1,5, 10, 100.

Fig. 3 shows the convergence graph for recognition in the near-infrared and visible-light scenes, in which the abscissa is the number of iterations, the curve marked with circles is the convergence curve (Objective value), and the curve marked with "x" is the recognition rate curve (Recognition rate). As can be seen from Fig. 3, in this embodiment the maximum number of iterations may be set between 10 and 15, which reduces the amount of computation while maintaining the recognition rate.

While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims

1. A multi-modal image recognition method based on low rank and joint sparsity is characterized by comprising the following steps:

step S101: constructing an optimization model:

wherein $K$ denotes the number of modalities, $C$ denotes the number of categories, $\|\cdot\|_F$ denotes the Frobenius norm, $\|\cdot\|_{1,2}$ denotes the joint sparse norm, $\|\cdot\|_*$ denotes the nuclear norm, and the superscript $T$ denotes matrix transposition;

$X^i$ denotes the feature matrix of the training samples of the i-th modality, and $X^i$ is an $m_i \times n$ matrix, in which $m_i$ is the feature dimension of the training samples of the i-th modality and $n$ is the number of samples contained in each modality; $D^i$ denotes the dictionary of the i-th modality; $\Lambda^i$ denotes the coefficient matrix of the dictionary $D^i$; the sparse coefficient $\Lambda = [\Lambda^1, \Lambda^2, \ldots, \Lambda^K]$; $\lambda$ denotes a regularization parameter;

step S3: performing classification based on the joint sparse coefficient of the object to be classified to obtain the classification and recognition result of the object to be classified.

2. The method of claim 1, wherein the step S102 of obtaining the low rank projection matrix P and the dictionary D comprises the steps of:

wherein the auxiliary variable $Z_0$ has the same matrix dimensions as the sparse coefficient $\Lambda_0$, and the auxiliary variable $W_0$ has the same matrix dimensions as the low-rank projection matrix $P_0$;

step S102-2: updating the sparse coefficient $\Lambda$:

by the formula

thereby obtaining the sparse coefficient $\Lambda_{t+1}$ after the (t+1)-th update;

step S102-3: updating the dictionary $D$:

solving the dictionary sub-problem, subject to its constraint, with a quadratic-programming solver,

thereby obtaining the dictionary $D_{t+1}$ after the (t+1)-th update;

step S102-4: updating the low-rank projection $P$:

by the formula

obtaining the low-rank projection matrix $P_{t+1}$ after the (t+1)-th update;

step S102-5: updating the auxiliary variable $Z$:

by the formula

obtaining the row vector $z_{i,t+1}$ of the i-th row of the auxiliary variable $Z$ after the (t+1)-th update, and thereby the auxiliary variable $Z_{t+1}$ after the (t+1)-th update;

wherein $\gamma_{i,t+1}$ and the corresponding multiplier row are the row vectors of the i-th rows of $\Lambda_{t+1}$ and $A_{Z,t}$, respectively;

step S102-6: updating the auxiliary variable $W$:

by the formula

obtaining the auxiliary variable $W_{t+1}$ after the (t+1)-th update;

wherein $F \Sigma B^T$ is the singular value decomposition used in the formula;

step S102-7: updating the Lagrange multipliers $A_Z$ and $A_W$:

3. The method of claim 2, wherein the maximum number of iterations k ranges from 10 to 15.