CN107451537A

CN107451537A - Face identification method based on deep learning multilayer Non-negative Matrix Factorization

Info

Publication number: CN107451537A
Application number: CN201710568578.0A
Authority: CN
Inventors: 同鸣; 李明阳; 陈逸然; 席圣男
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2017-07-13
Filing date: 2017-07-13
Publication date: 2017-12-08
Anticipated expiration: 2037-07-13
Also published as: CN107451537B

Abstract

The invention discloses a kind of face identification method based on deep learning multilayer Non-negative Matrix Factorization, mainly solves the problems, such as that existing face recognition technology discrimination under complicated cosmetic variation is low.Its technical scheme is：1. utilize the characteristic of VGG Face extraction training samples and each channel data of test sample；2. the characteristic of each channel data of pair training sample repeats the characteristic extraction procedure of L normalization, nonlinear transformation and matrix decomposition respectively, low-rank robust features are obtained；3. build K nearest neighbor classifier；4. the characteristic of each channel data of test sample is projected respectively, projection coefficient vector is obtained；5. projection coefficient vector is input into K nearest neighbor classifier to be classified；6. integrating the classification results of K nearest neighbor classifier, the recognition result of test sample is obtained.The present invention improves the face identification rate under complicated cosmetic variation, can be applied to identity authentication and information security field.

Description

Face recognition method based on deep learning multi-layer non-negative matrix factorization

技术领域technical field

本发明属于图像处理技术领域，特别涉及人脸图像识别方法，可应用于身份鉴定和信息安全领域。The invention belongs to the technical field of image processing, in particular to a face image recognition method, which can be applied to the fields of identification and information security.

背景技术Background technique

随着人类社会的不断发展，人脸识别在安保、金融、电子政务等多个领域具有广泛应用，提高人脸识别性能有利于扩大人脸识别的应用。当前关于人脸识别的主要研究在于提取有效、鲁棒并且更具鉴别性的特征和设计具有更好分类能力的分类器。选择更加鲁棒、更有鉴别力的特征以及设计具有良好分类能力的分类器是提高人脸识别鲁棒性的关键。With the continuous development of human society, face recognition has been widely used in security, finance, e-government and many other fields. Improving the performance of face recognition is conducive to expanding the application of face recognition. The current main research on face recognition is to extract effective, robust and more discriminative features and design classifiers with better classification ability. Selecting more robust and discriminative features and designing a classifier with good classification ability are the keys to improve the robustness of face recognition.

非负矩阵分解是一种在非负约束下进行矩阵分解的特征提取方法，具有良好的数据表示能力，可大幅度降低数据特征的维数，而且其分解特性合乎人类视觉感知的直观体验，分解结果具有可解释和明确的物理意义。基本非负矩阵分解NMF直接将原始系数矩阵分解为基矩阵和系数矩阵，并要求基矩阵以及系数矩阵都是非负的，这表明非负矩阵分解NMF只存在加性组合。因此，非负矩阵分解NMF可以看作是一个基于部分表示的模型，能够提供观测数据的局部结构，但有些情况下，NMF算法也会给出全局特征，导致分类性能受限。Non-negative matrix decomposition is a feature extraction method for matrix decomposition under non-negative constraints. It has good data representation ability and can greatly reduce the dimensionality of data features, and its decomposition characteristics are in line with the intuitive experience of human visual perception. Decomposition The results have interpretable and clear physical meaning. The basic non-negative matrix factorization NMF directly decomposes the original coefficient matrix into a base matrix and a coefficient matrix, and requires both the base matrix and the coefficient matrix to be non-negative, which shows that there is only an additive combination in the non-negative matrix factorization NMF. Therefore, non-negative matrix factorization NMF can be regarded as a model based on partial representation, which can provide the local structure of the observed data, but in some cases, the NMF algorithm will also give global features, resulting in limited classification performance.

深度学习是机器学习领域中特征表示的一个新的研究方向，近年来在语音识别、计算机视觉等多类应用中取得突破性的进展，深度学习通过组合底层特征形成更加抽象的高层表示或特征。深度学习模型中，具有更多的非线性变换层，具有更强的泛化能力。但在实际应用中，头部姿势、照明、遮挡等因素产生的外观变化会导致深度学习的性能下降，到目前为止没有良好的解决办法。Deep learning is a new research direction of feature representation in the field of machine learning. In recent years, breakthroughs have been made in various applications such as speech recognition and computer vision. Deep learning forms more abstract high-level representations or features by combining low-level features. In the deep learning model, there are more nonlinear transformation layers and stronger generalization ability. But in practical applications, appearance changes caused by factors such as head posture, lighting, occlusion, etc. will lead to a decline in the performance of deep learning, and so far there is no good solution.

发明内容Contents of the invention

鉴于以上所述的现有技术缺点，本发明的目的在于提供一种基于深度学习多层非负矩阵分解的人脸识别方法，以获取深层次更具判别性的低秩鲁棒特征，提高复杂外观变化下的人脸识别率。In view of the disadvantages of the prior art described above, the purpose of the present invention is to provide a face recognition method based on deep learning multi-layer non-negative matrix factorization, to obtain low-rank robust features that are more discriminative in depth, and improve complex Face recognition rate under appearance variation.

实现本发明的技术关键是在深度学习的基础上，引入了一种新的多层非负矩阵分解，以对现有的深度学习方法进行改进。具体来说，本发明是通过对深度学习得到的样本特征进行多次非负矩阵分解，以此得到更具鉴别力的低秩特征表示，从而提高人脸识别率，其步骤包括如下：The technical key to realize the present invention is that on the basis of deep learning, a new multi-layer non-negative matrix decomposition is introduced to improve the existing deep learning methods. Specifically, the present invention obtains a more discriminative low-rank feature representation by performing multiple non-negative matrix decompositions on the sample features obtained by deep learning, thereby improving the face recognition rate. The steps include the following:

(1)将训练样本的每个通道数据输入到VGG-Face深度卷积神经网络中，得到训练样本每个通道数据的特征数据X(k)，其中，k＝1,2,...,K，K为训练样本的通道数；(1) Input the data of each channel of the training sample into the VGG-Face deep convolutional neural network to obtain the characteristic data X(k) of each channel data of the training sample, where k=1,2,..., K, K is the channel number of training samples;

(2)对步骤(1)得到的特征数据X(k)分别进行归一化、非线性变换和矩阵分解的特征提取过程，得到系数矩阵H(k)；(2) Carry out the feature extraction process of normalization, nonlinear transformation and matrix decomposition to the characteristic data X (k) that step (1) obtains respectively, obtain coefficient matrix H (k);

(3)将步骤(2)中的特征提取过程重复L次，得到低秩鲁棒特征h_j(k)，其中，j＝1,2,...,n，n为训练样本总数；(3) Repeat the feature extraction process in step (2) L times to obtain low-rank robust features h _j (k), where j=1,2,...,n, n is the total number of training samples;

(4)根据步骤(3)得到的低秩鲁棒特征h_j(k)，构造K个最近邻分类器；(4) Construct K nearest neighbor classifiers according to the low-rank robust features h _j (k) obtained in step (3);

(5)将测试样本的每个通道数据输入到VGG-Face深度卷积神经网络中，得到测试样本每个通道数据的特征数据Y(k)；(5) Input each channel data of the test sample into the VGG-Face deep convolutional neural network to obtain the characteristic data Y(k) of each channel data of the test sample;

(6)根据步骤(5)得到的特征数据Y(k)进行投影过程，得到投影系数向量 (6) Perform the projection process according to the characteristic data Y(k) obtained in step (5), and obtain the projection coefficient vector

(7)将步骤(6)得到的投影系数向量输入到K个最近邻分类器中，得到测试样本每个通道的分类结果，其中，i＝1,2,...,e，e为测试样本总数；(7) The projection coefficient vector obtained in step (6) Input to the K nearest neighbor classifiers to obtain the classification result of each channel of the test sample, where i=1,2,...,e, e is the total number of test samples;

(8)综合步骤(7)得到的测试样本每个通道的分类结果，得到测试样本的分类结果。(8) Combining the classification results of each channel of the test sample obtained in step (7) to obtain the classification result of the test sample.

本发明与现有技术相比，具有如下优点：Compared with the prior art, the present invention has the following advantages:

1)本发明在深度学习的基础上结合了多层非负矩阵分解，能够得到更具鉴别力的特征表示；1) The present invention combines multi-layer non-negative matrix decomposition on the basis of deep learning, which can obtain more discriminative feature representation;

2)本发明通过综合不同通道的分类结果，进一步提高了复杂外观变化下的人脸识别率。2) The present invention further improves the face recognition rate under complex appearance changes by synthesizing the classification results of different channels.

附图说明Description of drawings

图1是本发明的实现流程图。Fig. 1 is the realization flowchart of the present invention.

具体实施方式detailed description

参照图1，本发明的基于深度学习多层非负矩阵分解的人脸识别步骤如下：With reference to Fig. 1, the face recognition step of the present invention based on deep learning multi-layer non-negative matrix decomposition is as follows:

步骤1，获取训练样本每个通道数据的特征数据X(k)。Step 1, obtain the feature data X(k) of each channel data of the training sample.

(1a)获取人脸数据集V^train作为训练数据集，该训练数据集中的训练样本总数为n，该训练数据集的类别数量为c，该训练数据集中的每个训练样本等分为K个区域，每个区域作为训练样本的1个通道数据，训练样本共包含K个通道数据；(1a) Obtain the face data set V ^train as a training data set, the total number of training samples in the training data set is n, the number of categories in the training data set is c, and each training sample in the training data set is equally divided into K Area, each area is used as one channel data of the training sample, and the training sample contains K channel data in total;

(1b)根据训练数据集，在Linux操作系统下，利用Caffe深度学习框架对VGG-Face深度卷积神经网络参数进行微调；(1b) According to the training data set, under the Linux operating system, use the Caffe deep learning framework to fine-tune the parameters of the VGG-Face deep convolutional neural network;

(1c)将训练数据集中每个训练样本的每个通道数据输入到VGG-Face深度卷积神经网络中，得到训练每个通道数据的特征数据X(k)，其中，k＝1,2,...,K；K为训练样本的通道数。(1c) Input each channel data of each training sample in the training data set into the VGG-Face deep convolutional neural network to obtain the feature data X(k) of training each channel data, where k=1,2, ...,K; K is the channel number of training samples.

步骤2，根据特征数据X(k)，获取系数矩阵H(k)。Step 2, according to the feature data X(k), obtain the coefficient matrix H(k).

对特征数据X(k)分别进行归一化、非线性变换和矩阵分解的特征提取过程，得到系数矩阵H(k)；Perform the feature extraction process of normalization, nonlinear transformation and matrix decomposition on the feature data X(k) to obtain the coefficient matrix H(k);

(2a)使用L2范数对特征数据X(k)进行归一化处理；(2a) Normalize the feature data X(k) using the L2 norm;

(2b)使用sigmoid函数对步骤(2a)归一化处理后的结果进行非线性变换，得到变换后的结果B(k)；(2b) use the sigmoid function to carry out non-linear transformation to the result after step (2a) normalization processing, obtain the transformed result B(k);

(2c)使用软约束非负矩阵分解对步骤(2b)中非线性变换后的结果B(k)进行矩阵分解，得到B(k)≈Z(k)A(k)F(k)，其中，B(k)为m×n阶矩阵，Z(k)为m×φ阶的基矩阵，A(k)为φ×c阶的辅助矩阵，F(k)为c×n阶的预测标签矩阵，m为原始特征维数，φ为分解维数，c为类别数，n为训练样本总数；(2c) Use soft-constrained non-negative matrix decomposition to perform matrix decomposition on the result B(k) after nonlinear transformation in step (2b), and obtain B(k)≈Z(k)A(k)F(k), where , B(k) is a matrix of order m×n, Z(k) is a base matrix of order m×φ, A(k) is an auxiliary matrix of order φ×c, and F(k) is a predicted label of order c×n Matrix, m is the original feature dimension, φ is the decomposition dimension, c is the number of categories, and n is the total number of training samples;

(2c1)随机初始化基矩阵Z⁽¹⁾(k)、辅助矩阵A⁽¹⁾(k)和预测标签矩阵F⁽¹⁾(k)作为迭代1次后的结果，其中，基矩阵Z⁽¹⁾(k)中的任意元素满足为基矩阵Z⁽¹⁾(k)的第p行q列元素；辅助矩阵A⁽¹⁾(k)中的任意元素满足为辅助矩阵A⁽¹⁾(k)的第α行β列元素；预测标签矩阵F⁽¹⁾(k)中的任意元素满足为预测标签矩阵F⁽¹⁾(k)的第γ行列元素；p＝1,2,...,m，q＝1,2,...,φ，α＝1,2,...,φ，β＝1,2,...,c，γ＝1,2,...,c， (2c1) Randomly initialize the base matrix Z ⁽¹⁾ (k), auxiliary matrix A ⁽¹⁾ (k) and predictive label matrix F ⁽¹⁾ (k) as the result of one iteration, where the base matrix Z ^{(1 )} Any element in (k) satisfies is the pth row q column element of the base matrix Z ⁽¹⁾ (k); any element in the auxiliary matrix A ⁽¹⁾ (k) satisfies is the element in row β and column α of the auxiliary matrix A ⁽¹⁾ (k); any element in the predicted label matrix F ⁽¹⁾ (k) satisfies is the γth row of the predicted label matrix F ⁽¹⁾ (k) Column elements; p=1,2,...,m, q=1,2,...,φ, α=1,2,...,φ, β=1,2,...,c , γ=1,2,...,c,

(2c2)根据如下公式，对基矩阵Z中的元素Z_p,q进行更新：(2c2) Update the elements Z _{p, q} in the base matrix Z according to the following formula:

其中，t为迭代次数，t＝2,...,iter，iter为最大迭代次数，T为矩阵转置，为迭代t次后得到的非归一化基矩阵Z^(t)′(k)的第p行q列元素；Among them, t is the number of iterations, t=2,..., iter, iter is the maximum number of iterations, T is the matrix transpose, is the pth row q column element of the unnormalized base matrix Z ^(t)' (k) obtained after iteration t times;

(2c3)对步骤(2c2)中得到的基矩阵Z^(t)′(k)进行归一化处理，得到迭代t次的基矩阵Z^(t)(k)；(2c3) normalize the base matrix Z ^(t) '(k) obtained in step (2c2), obtain the base matrix Z ( ^t ) (k) of iteration t times;

(2c4)根据如下公式，对辅助矩阵A(k)中的元素A_α,β(k)进行更新：(2c4) Update the elements A _{α, β} (k) in the auxiliary matrix A(k) according to the following formula:

其中，为迭代t次后得到的辅助矩阵A^(t)(k)的第α行β列元素；A^(t)(k)为迭代t次后得到的辅助矩阵；in, For the auxiliary matrix A ^(t) (k) obtained after the iteration t times, the α row and β column element; A ^(t) (k) is the auxiliary matrix obtained after the iteration t times;

(2c5)根据如下公式，对预测标签矩阵F(k)中的元素进行更新：(2c5) According to the following formula, predict the elements in the label matrix F(k) Make an update:

其中，为迭代t次后预测标签矩阵F^(t)(k)的第γ行列元素；F^(t)(k)为迭代t次后预测标签矩阵；λ为正则项系数；为预先定义的本地标签矩阵C(k)的第γ行列元素；in, is the γ-th row of the predicted label matrix F ^(t) (k) after iteration t times Column elements; F ^(t) (k) is the predicted label matrix after iteration t times; λ is the coefficient of the regular term; is the γ-th row of the pre-defined local label matrix C(k) column element;

(2c6)判断迭代次数t是否达到最大迭代次数iter：如果是，则停止迭代，将第iter次迭代得到的基矩阵Z^(iter)(k)、辅助矩阵A^(iter)(k)和预测标签矩阵F^(iter)(k)，作为最终的基矩阵Z(k)、辅助矩阵A(k)和预测标签矩阵F(k)；否则，返回步骤(2c2)；(2c6) Determine whether the number of iterations t has reached the maximum number of iterations iter: if so, stop the iteration, and use the base matrix Z ^(iter) (k), auxiliary matrix A ^(iter) (k) and predicted label obtained in the iter iteration Matrix F ^(iter) (k), as the final base matrix Z(k), auxiliary matrix A(k) and predicted label matrix F(k); otherwise, return to step (2c2);

(2d)根据步骤(2c)中软约束非负矩阵分解后得到的辅助矩阵A(k)和预测标签矩阵F(k)，得到系数矩阵：H(k)＝A(k)F(k)。(2d) According to the auxiliary matrix A(k) obtained after the soft-constrained non-negative matrix decomposition in step (2c) and the predicted label matrix F(k), the coefficient matrix is obtained: H(k)=A(k)F(k).

步骤3，获取训练样本的低秩鲁棒特征h(k)。Step 3, obtain the low-rank robust features h(k) of the training samples.

重复步骤2中的特征提取过程，得到训练样本的每个通道特征数据X(k)的低秩鲁棒特征h(k)；Repeat the feature extraction process in step 2 to obtain the low-rank robust feature h(k) of each channel feature data X(k) of the training sample;

(3a)根据步骤2对训练样本的各个通道的特征数据X(k)进行处理，得到第1层基矩阵Z¹(k)和第1层系数矩阵H¹(k)；(3a) Process the feature data X(k) of each channel of the training sample according to step 2 to obtain the first-layer base matrix Z ¹ (k) and the first-layer coefficient matrix H ¹ (k);

(3b)根据步骤2对步骤(3a)得到的第1层系数矩阵H¹(k)进行处理，得到第2层基矩阵Z²(k)和第2层系数矩阵H²(k)；(3b) Process the coefficient matrix H ¹ (k) of the first layer obtained in step (3a) according to step 2 to obtain the base matrix Z ² (k) of the second layer and the coefficient matrix H ² (k) of the second layer;

(3c)根据步骤(3a)和(3b)继续重复相同步骤，根据第l-1层系数矩阵H^l-1(k)，得到第l层基矩阵Z^l(k)和第l层系数矩阵H^l(k)，直到重复次数l＝L，得到第L层基矩阵Z^L(k)和第L层系数矩阵H^L(k)，其中，l＝2,...,L，L为多层非负矩阵分解的层数；(3c) Continue to repeat the same steps according to steps (3a) and (3b), and obtain the base matrix Z ^l (k) of the l-th layer and the coefficient matrix of the l-th layer according to the coefficient matrix H ^l-1 (k) of the l-1th layer H ^l (k), until the number of repetitions l=L, the base matrix Z ^L (k) of the L layer and the coefficient matrix H ^L (k) of the L layer are obtained, wherein, l=2,...,L, L is The number of layers of multi-level non-negative matrix factorization;

(3d)根据步骤(3c)得到的第L层系数矩阵H^L(k)，得到训练样本各个通道的低秩鲁棒特征h_j(k)，其中，j＝1,2,...,n。(3d) According to the L-th layer coefficient matrix H ^L (k) obtained in step (3c), obtain the low-rank robust features h _j (k) of each channel of the training sample, where j=1,2,..., n.

步骤4，根据步骤3得到的低秩鲁棒特征h_j(k)，构造K个最近邻分类器。Step 4, according to the low-rank robust feature h _j (k) obtained in step 3, construct K nearest neighbor classifiers.

(4a)从步骤3中得到的结果，选取每个训练样本第k个通道的低秩鲁棒特征h_j(k)，构成一个特征集合；(4a) From the results obtained in step 3, select the low-rank robust features h _j (k) of the kth channel of each training sample to form a feature set;

(4b)根据步骤(4a)得到的特征集合，构成一个最近邻分类器；(4b) form a nearest neighbor classifier according to the feature set obtained in step (4a);

(4c)针对不同的通道重复步骤(4a)和(4b)，得到K个最近邻分类器。(4c) Repeat steps (4a) and (4b) for different channels to obtain K nearest neighbor classifiers.

步骤5，获取测试样本每个通道数据的特征数据Y(k)。Step 5, obtain the characteristic data Y(k) of each channel data of the test sample.

(5a)获取与训练数据集属性相同的人脸数据集V^test作为测试数据集，该测试数据集中的测试样本总数为e，该测试数据集的类别数量为c，该测试数据集中的每个测试样本根据步骤(1a)划分为K个通道数据；(5a) Obtain the face data set V ^test with the same attributes as the training data set as the test data set, the total number of test samples in the test data set is e, the number of categories in the test data set is c, and each The test sample is divided into K channel data according to step (1a);

(5b)根据步骤(1b)对VGG-Face深度卷积神经网络的参数进行设置；(5b) according to step (1b), the parameter of VGG-Face deep convolutional neural network is set;

(5c)将测试样本的每个通道数据输入到VGG-Face深度卷积神经网络中，得到测试样本每个通道数据的特征数据Y(k)。(5c) Input the data of each channel of the test sample into the VGG-Face deep convolutional neural network to obtain the feature data Y(k) of each channel data of the test sample.

步骤6，将步骤5得到的测试样本的每个通道数据的特征数据Y(k)分别进行投影，输出投影系数向量 Step 6, respectively project the characteristic data Y(k) of each channel data of the test sample obtained in step 5, and output the projection coefficient vector

(6a)对测试样本的特征数据Y(k)进行归一化处理、非线性变换和投影变换的投影处理过程，得到第1层投影矩阵 (6a) Perform normalization processing, nonlinear transformation and projection transformation on the characteristic data Y(k) of the test sample to obtain the projection matrix of the first layer

(6a1)使用L2范数对测试样本的特征数据Y(k)进行归一化处理；(6a1) Normalize the feature data Y(k) of the test sample using the L2 norm;

(6a2)使用Sigmoid函数对步骤(6a1)中归一化处理后得到的结果进行非线性变换，得到非线性变换后的变换结果f(Y(k))，其中，f(·)表示利用Sigmoid函数进行非线性变换；(6a2) Use the Sigmoid function to perform nonlinear transformation on the result obtained after the normalization process in step (6a1), and obtain the transformation result f(Y(k)) after the nonlinear transformation, where f(·) means using Sigmoid function for nonlinear transformation;

(6a3)将步骤(6a2)非线性变换后的结果f(Y(k))分别在步骤(3a)得到的第1层基矩阵Z¹(k)上进行投影变换，得到第1层投影矩阵：其中，表示广义逆运算；(6a3) Perform projection transformation on the first-layer base matrix Z ¹ (k) obtained in step (3a) to obtain the first-layer projection matrix : in, Indicates the generalized inverse operation;

(6b)根据步骤(6a)得到的第1层投影矩阵和第2层基矩阵Z²(k)进行相同处理过程，得到第2层投影矩阵 (6b) The first layer projection matrix obtained according to step (6a) Perform the same process as the base matrix Z ² (k) of the second layer to obtain the projection matrix of the second layer

(6c)根据步骤(6a)和(6b)继续重复相同步骤，根据第l-1层投影矩阵和第l层基矩阵Z^l(k)得到第l层投影矩阵直到重复次数l＝L，得到第L层投影矩阵其中，l＝2,...,L；(6c) Continue to repeat the same steps according to steps (6a) and (6b), according to the l-1th layer projection matrix and the base matrix Z ^l (k) of the l-th layer to obtain the projection matrix of the l-th layer Until the number of repetitions l=L, the L-th layer projection matrix is obtained Among them, l=2,...,L;

(6d)根据步骤(6c)得到的第L层投影矩阵得到每个测试样本的投影系数向量其中，i＝1,2,...,e。(6d) The L-th layer projection matrix obtained according to step (6c) Get the projection coefficient vector for each test sample Wherein, i=1, 2, . . . , e.

步骤7，将步骤6得到的投影系数向量输入到K个最近邻分类器中，得到测试样本每个通道的分类结果。Step 7, the projection coefficient vector obtained in step 6 Input to the K nearest neighbor classifiers to get the classification results of each channel of the test sample.

(7a)计算训练样本的低秩鲁棒特征h_j(k)与测试样本的投影系数向量之间的低维欧氏距离得到距离集合其中，j＝1,2,...,n，i∈{1,2,...,e}，||·||₂表示2范数；(7a) Calculate the low-rank robust feature h _j (k) of the training sample and the projection coefficient vector of the test sample The low-dimensional Euclidean distance between get distance set Among them, j=1,2,...,n, i∈{1,2,...,e}, ||·|| ₂ means 2-norm;

(7b)根据步骤(7a)得到的距离集合将距离集合中最小值对应的第ξ个训练样本的类别作为第i个测试样本在第k个最近邻分类器上的分类结果，其中，ξ∈{1,2,...,n}；(7b) The distance set obtained according to step (7a) The minimum value in the distance set The category of the corresponding ξ-th training sample is used as the classification result of the i-th test sample on the k-th nearest neighbor classifier, where ξ∈{1,2,...,n};

(7c)根据步骤(7a)和(7b)分别对每个测试样本的K个通道进行分类，得到每个测试样本在K个最近邻分类器上的分类结果。(7c) Classify the K channels of each test sample according to steps (7a) and (7b), and obtain the classification results of each test sample on K nearest neighbor classifiers.

步骤8，综合步骤7得到的测试样本每个通道的分类结果，得到测试样本的最终分类结果。In step 8, the classification results of each channel of the test sample obtained in step 7 are integrated to obtain the final classification result of the test sample.

(8a)根据步骤7得到的每个测试样本在K个最近邻分类器上的分类结果，分别统计每个最近邻分类器上的被正确分类的测试样本数目CN_k，计算每个最近邻分类器的识别率：(8a) According to the classification results of each test sample on the K nearest neighbor classifiers obtained in step 7, respectively count the number of correctly classified test samples CN _k on each nearest neighbor classifier, and calculate each nearest neighbor classification The recognition rate of the device:

其中，CN_k为第k个最近邻分类器上的被正确分类的测试样本数目，o_k为第k个最近邻分类器的识别率；Among them, CN _k is the number of correctly classified test samples on the kth nearest neighbor classifier, and o _k is the recognition rate of the kth nearest neighbor classifier;

(8b)根据步骤(8a)得到的K个最近邻分类器的识别率分别计算K个最近邻分类器的线性权重系数α_k：(8b) Calculate the linear weight coefficients α _k of the K nearest neighbor classifiers respectively according to the recognition rates of the K nearest neighbor classifiers obtained in step (8a):

(8c)根据步骤(8b)得到的线性权重系数α_k，计算测试样本K个通道投影系数向量与训练样本K个通道低秩鲁棒特征h_j(k)之间的加权距离：(8c) According to the linear weight coefficient α _k obtained in step (8b), calculate the test sample K channel projection coefficient vectors The weighted distance between the K channel low-rank robust features h _j (k) of the training sample:

，得到加权距离集合{d_1i,d_2i,...,d_ji,...,d_ni}；, get the weighted distance set {d _1i ,d _2i ,...,d _ji ,...,d _ni };

(8d)根据步骤(8c)得到的加权距离集合{d_1i,d_2i,...,d_ji,...,d_ni}，将加权距离集合中最小值d_ωi对应的第ω个训练样本的类别作为测试样本的分类结果，其中，ω∈{1,2,...,n}。(8d) According to the weighted distance set {d _1i ,d _2i ,...,d _ji ,...,d _ni } obtained in step (8c), train the ωth corresponding to the minimum value d _ωi in the weighted distance set The category of the sample is the classification result of the test sample, where ω∈{1,2,...,n}.

以上描述仅是本发明的一个具体实例，不构成对本发明的任何限制，显然对于本领域的专业人员来说，在了解了本发明内容和原理后，都可能在不背离本发明原理、结构的情况下，进行形式和细节上的各种修正和改变，但是这些基于本发明思想的修正和改变仍在本发明的权利要求保护范围之内。The above description is only a specific example of the present invention, and does not constitute any limitation to the present invention. Obviously, for those skilled in the art, after understanding the content and principles of the present invention, it is possible without departing from the principles and structures of the present invention. In some cases, various modifications and changes in form and details are made, but these modifications and changes based on the idea of the present invention are still within the protection scope of the claims of the present invention.

Claims

1. A face recognition method based on deep learning multi-layer non-negative matrix factorization, including:

(1) Input the data of each channel of the training sample into the VGG-Face deep convolutional neural network to obtain the characteristic data X(k) of each channel data of the training sample, where k=1,2,..., K, K is the channel number of training samples;

(2) Carry out the feature extraction process of normalization, nonlinear transformation and matrix decomposition to the characteristic data X (k) that step (1) obtains respectively, obtain coefficient matrix H (k);

(3) Repeat the feature extraction process in step (2) L times to obtain low-rank robust features h _j (k), where j=1,2,...,n, n is the total number of training samples;

(4) Construct K nearest neighbor classifiers according to the low-rank robust features h _j (k) obtained in step (3);

(5) Input each channel data of the test sample into the VGG-Face deep convolutional neural network to obtain the characteristic data Y(k) of each channel data of the test sample;

(6) Perform the projection process according to the characteristic data Y(k) obtained in step (5), and obtain the projection coefficient vector

(7) The projection coefficient vector obtained in step (6) Input to the K nearest neighbor classifiers to obtain the classification result of each channel of the test sample, where i=1,2,...,e, e is the total number of test samples;

(8) Combining the classification results of each channel of the test sample obtained in step (7) to obtain the classification result of the test sample.

2. The method according to claim 1, wherein the implementation steps of the step (2) are as follows:

(2a) Normalize the feature data X(k) using the L2 norm;

(2b) use the sigmoid function to carry out non-linear transformation to the result after step (2a) normalization processing, obtain the transformed result B(k);

(2c) Use soft-constrained non-negative matrix decomposition to perform matrix decomposition on the result B(k) after nonlinear transformation in step (2b), and obtain B(k)≈Z(k)A(k)F(k), where , B(k) is a matrix of order m×n, Z(k) is a base matrix of order m×φ, A(k) is an auxiliary matrix of order φ×c, and F(k) is a predicted label of order c×n Matrix, m is the original feature dimension, φ is the decomposition dimension, c is the number of categories, and n is the total number of training samples;

(2d) According to the auxiliary matrix A(k) obtained after the soft-constrained non-negative matrix decomposition in step (2c) and the predicted label matrix F(k), the coefficient matrix is obtained: H(k)=A(k)F(k).

3. The method according to claim 2, wherein in the step (2c), use soft constraint non-negative matrix decomposition to carry out matrix decomposition to the result B (k) after nonlinear transformation in the step (2b), carry out as follows:

(2c1) Randomly initialize the base matrix Z ⁽¹⁾ (k), auxiliary matrix A ⁽¹⁾ (k) and predictive label matrix F ⁽¹⁾ (k) as the result of one iteration, where the base matrix Z ^{(1 )} Any element in (k) satisfies is the pth row q column element of the base matrix Z ⁽¹⁾ (k); any element in the auxiliary matrix A ⁽¹⁾ (k) satisfies is the element in row β and column α of the auxiliary matrix A ⁽¹⁾ (k); any element in the predicted label matrix F ⁽¹⁾ (k) satisfies is the γth row of the predicted label matrix F ⁽¹⁾ (k) Column elements; p=1,2,...,m, q=1,2,...,φ, α=1,2,...,φ, β=1,2,...,c , γ=1,2,...,c,

(2c2) Update the elements Z _{p, q} in the base matrix Z according to the following formula:

<mrow><msubsup><mi>Z</mi><mrow><mi>p</mi><mo>,</mo><mi>q</mi></mrow><mrow><mo>(</mo><mi>t</mi><mo>)</mo><mo>&prime;</mo></mrow></msubsup><mrow><mo>(</mo><mi>k</mi><mo>)</mo></mrow><mo>=</mo><msubsup><mi>Z</mi><mrow><mi>p</mi><mo>,</mo><mi>q</mi></mrow><mrow><mo>(</mo><mi>t</mi><mo>-</mo><mn>1</mn><mo>)</mo></mrow></msubsup><mrow><mo>(</mo><mi>k</mi><mo>)</mo></mrow><mfrac><msubsup><mrow><mo>(</mo><mi>B</mi><mo>(</mo><mi>k</mi><mo>)</mo><msup><mi>F</mi><mi>T</mi></msup><mo>(</mo><mi>k</mi><mo>)</mo><msup><mi>A</mi><mi>T</mi></msup><mo>(</mo><mi>k</mi><mo>)</mo><mo>)</mo></mrow><mrow><mi>p</mi><mo>,</mo><mi>q</mi></mrow><mrow><mo>(</mo><mi>t</mi><mo>-</mo><mn>1</mn><mo>)</mo></mrow></msubsup><msubsup><mrow><mo>(</mo><mi>Z</mi><mo>(</mo><mi>k</mi><mo>)</mo><mi>A</mi><mo>(</mo><mi>k</mi><mo>)</mo><mi>F</mi><mo>(</mo><mi>k</mi><mo>)</mo><msup><mi>F</mi><mi>T</mi></msup><mo>(</mo><mi>k</mi><mo>)</mo><msup><mi>A</mi><mi>T</mi></msup><mo>(</mo><mi>k</mi><mo>)</mo><mo>)</mo></mrow><mrow><mi>p</mi><mo>,</mo><mi>q</mi></mrow><mrow><mo>(</mo><mi>t</mi><mo>-</mo><mn>1</mn><mo>)</mo></mrow></msubsup></mfrac><mo>,</mo></mrow>

Among them, t is the number of iterations, t=2,..., iter, iter is the maximum number of iterations, T is the matrix transpose, It is the pth row q column element of the unnormalized base matrix Z ^(t) '(k) obtained after iteration t times;

(2c3) normalize the base matrix Z ^(t) '(k) obtained in step (2c2), obtain the base matrix Z ( ^t ) (k) of iteration t times;

(2c4) Update the elements A _{α, β} (k) in the auxiliary matrix A(k) according to the following formula:

<mrow><msubsup><mi>A</mi><mrow><mi>&alpha;</mi><mo>,</mo><mi>&beta;</mi></mrow><mrow><mo>(</mo><mi>t</mi><mo>)</mo></mrow></msubsup><mrow><mo>(</mo><mi>k</mi><mo>)</mo></mrow><mo>=</mo><msubsup><mi>A</mi><mrow><mi>&alpha;</mi><mo>,</mo><mi>&beta;</mi></mrow><mrow><mo>(</mo><mi>t</mi><mo>-</mo><mn>1</mn><mo>)</mo></mrow></msubsup><mrow><mo>(</mo><mi>k</mi><mo>)</mo></mrow><mfrac><msubsup><mrow><mo>(</mo><msup><mi>Z</mi><mi>T</mi></msup><mo>(</mo><mi>k</mi><mo>)</mo><mi>B</mi><mo>(</mo><mi>k</mi><mo>)</mo><msup><mi>F</mi><mi>T</mi></msup><mo>(</mo><mi>k</mi><mo>)</mo><mo>)</mo></mrow><mrow><mi>&alpha;</mi><mo>,</mo><mi>&beta;</mi></mrow><mrow><mo>(</mo><mi>t</mi><mo>-</mo><mn>1</mn><mo>)</mo></mrow></msubsup><msubsup><mrow><mo>(</mo><msup><mi>Z</mi><mi>T</mi></msup><mo>(</mo><mi>k</mi><mo>)</mo><mi>Z</mi><mo>(</mo><mi>k</mi><mo>)</mo><mi>A</mi><mo>(</mo><mi>k</mi><mo>)</mo><mi>F</mi><mo>(</mo><mi>k</mi><mo>)</mo><msup><mi>F</mi><mi>T</mi></msup><mo>(</mo><mi>k</mi><mo>)</mo><mo>)</mo></mrow><mrow><mi>&alpha;</mi><mo>,</mo><mi>&beta;</mi></mrow><mrow><mo>(</mo><mi>t</mi><mo>-</mo><mn>1</mn><mo>)</mo></mrow></msubsup></mfrac><mo>,</mo></mrow>

in, For the auxiliary matrix A ^(t) (k) obtained after the iteration t times, the α row and β column element; A ^(t) (k) is the auxiliary matrix obtained after the iteration t times;

(2c5) According to the following formula, predict the elements in the label matrix F(k) Make an update:

in, is the γ-th row of the predicted label matrix F ^(t) (k) after iteration t times Column elements; F ^(t) (k) is the predicted label matrix after iteration t times; λ is the coefficient of the regular term; is the γ-th row of the pre-defined local label matrix C(k) column element;

(2c6) Determine whether the number of iterations t has reached the maximum number of iterations iter: if so, stop the iteration, and use the base matrix Z ^(iter) (k), auxiliary matrix A ^(iter) (k) and predicted label obtained in the iter iteration Matrix F ^(iter) (k), as the final base matrix Z(k), auxiliary matrix A(k) and predicted label matrix F(k); otherwise, return to step (2c2).

4. The method according to claim 1, wherein the implementation steps of the step (3) are as follows:

(3a) Process the feature data X(k) of each channel of the training sample according to step (2), and obtain the first-layer base matrix Z ¹ (k) and the first-layer coefficient matrix H ¹ (k), where k =1,2,...,K;

(3b) Process the first layer coefficient matrix H ¹ (k) obtained in step (3a) according to step (2), and obtain the second layer basic matrix Z ² (k) and the second layer coefficient matrix H ² (k) ;

(3c) Continue to repeat the same steps according to steps (3a) and (3b), and obtain the base matrix Z ^l (k) of the l-th layer and the coefficient matrix of the l-th layer according to the coefficient matrix H ^l-1 (k) of the l-1th layer H ^l (k), until the number of repetitions l=L, the base matrix Z ^L (k) of the L layer and the coefficient matrix H ^L (k) of the L layer are obtained, wherein, l=2,...,L, L is The number of layers of multi-level non-negative matrix factorization;

(3d) According to the L-th layer coefficient matrix H ^L (k) obtained in step (3c), obtain the low-rank robust features h _j (k) of each channel of the training sample, where j=1,2,..., n.

5. The method according to claim 1, wherein the implementation steps of the step (6) are as follows:

(6a) Perform normalization processing, nonlinear transformation and projection transformation process on the characteristic data Y(k) of the test sample to obtain the projection matrix Among them, k∈{1,2,...,K}, K is the number of sample channels;

(6a1) Normalize the feature data Y(k) of the test sample using the L2 norm;

(6a2) Use the Sigmoid function to perform nonlinear transformation on the result obtained after the normalization process in step (6a1), and obtain the transformation result f(Y(k)) after the nonlinear transformation, where f(·) means using Sigmoid function for nonlinear transformation;

(6a3) Perform projection transformation on the first-layer base matrix Z ¹ (k) obtained in step (3a) to obtain the first-layer projection matrix : in, Indicates the generalized inverse operation;

(6b) The first layer projection matrix obtained according to step (6a) Perform the same process as the base matrix Z ² (k) of the second layer to obtain the projection matrix of the second layer

(6c) Continue to repeat the same steps according to steps (6a) and (6b), according to the l-1th layer projection matrix and the base matrix Z ^l (k) of the l-th layer to obtain the projection matrix of the l-th layer Until the number of repetitions l=L, the L-th layer projection matrix is obtained Wherein, l=2,...,L, L is the number of layers of multi-layer non-negative matrix decomposition;

(6d) The L-th layer projection matrix obtained according to step (6c) Get the projection coefficient vector for each test sample Wherein, i=1, 2, . . . , e.

6. The method according to claim 1, wherein said step (7) is carried out as follows:

(7a) Calculate the low-rank robust feature h _j (k) of the training sample and the projection coefficient vector of the test sample The low-dimensional Euclidean distance between get distance set Among them, j=1,2,...,n, k∈{1,2,...,K}, i∈{1,2,...,e}, ||·|| ₂ means 2 norm;

(7b) The distance set obtained according to step (7a) The minimum value in the distance set The category of the corresponding ξ-th training sample is used as the classification result of the i-th test sample on the k-th nearest neighbor classifier, where ξ∈{1,2,...,n};

(7c) Classify the K channels of each test sample according to steps (7a) and (7b), and obtain the classification results of each test sample on K nearest neighbor classifiers.

7. The method according to claim 1, wherein said step (8) is carried out as follows:

(8a) According to the classification results of each test sample obtained in step (7) on the K nearest neighbor classifiers, respectively count the number of correctly classified test samples CN _k on each nearest neighbor classifier, and calculate each nearest neighbor classifier. The recognition rate of the neighbor classifier:

Among them, CN _k is the number of correctly classified test samples on the kth nearest neighbor classifier, and o _k is the recognition rate of the kth nearest neighbor classifier;

(8b) Calculate the linear weight coefficients α _k of the K nearest neighbor classifiers respectively according to the recognition rates of the K nearest neighbor classifiers obtained in step (8a):

<mrow><msub><mi>&alpha;</mi><mi>k</mi></msub><mo>=</mo><mfrac><msub><mi>o</mi><mi>k</mi></msub><mrow><munderover><mo>&Sigma;</mo><mrow><mi>k</mi><mo>=</mo><mn>1</mn></mrow><mi>K</mi></munderover><msub><mi>o</mi><mi>k</mi></msub></mrow></mfrac><mo>;</mo></mrow>

(8c) According to the linear weight coefficient α _k obtained in step (8b), calculate the test sample K channel projection coefficient vectors The weighted distance between the K channel low-rank robust features h _j (k) of the training sample:

<mrow><msub><mi>d</mi><mrow><mi>j</mi><mi>i</mi></mrow></msub><mo>=</mo><msub><mi>&alpha;</mi><mn>1</mn></msub><mo>|</mo><mo>|</mo><msub><mover><mi>h</mi><mo>^</mo></mover><mi>i</mi></msub><mrow><mo>(</mo><mn>1</mn><mo>)</mo></mrow><mo>-</mo><msub><mi>h</mi><mi>j</mi></msub><mrow><mo>(</mo><mn>1</mn><mo>)</mo></mrow><mo>|</mo><msub><mo>|</mo><mn>2</mn></msub><mo>+</mo><msub><mi>&alpha;</mi><mn>2</mn></msub><mo>|</mo><mo>|</mo><msub><mover><mi>h</mi><mo>^</mo></mover><mi>i</mi></msub><mrow><mo>(</mo><mn>2</mn><mo>)</mo></mrow><mo>-</mo><msub><mi>h</mi><mi>j</mi></msub><mrow><mo>(</mo><mn>2</mn><mo>)</mo></mrow><mo>|</mo><msub><mo>|</mo><mn>2</mn></msub><mo>+</mo><mo>...</mo><mo>+</mo><msub><mi>&alpha;</mi><mi>k</mi></msub><mo>|</mo><mo>|</mo><msub><mover><mi>h</mi><mo>^</mo></mover><mi>i</mi></msub><mrow><mo>(</mo><mi>k</mi><mo>)</mo></mrow><mo>-</mo><msub><mi>h</mi><mi>j</mi></msub><mrow><mo>(</mo><mi>k</mi><mo>)</mo></mrow><mo>|</mo><msub><mo>|</mo><mn>2</mn></msub><mo>+</mo><mo>...</mo><mo>+</mo><msub><mi>&alpha;</mi><mi>K</mi></msub><mo>|</mo><mo>|</mo><msub><mover><mi>h</mi><mo>^</mo></mover><mi>i</mi></msub><mrow><mo>(</mo><mi>K</mi><mo>)</mo></mrow><mo>-</mo><msub><mi>h</mi><mi>j</mi></msub><mrow><mo>(</mo><mi>K</mi><mo>)</mo></mrow><mo>|</mo><msub><mo>|</mo><mn>2</mn></msub><mo>,</mo></mrow>

Get the weighted distance set {d _1i ,d _2i ,...,d _ji ,...,d _ni }, where j=1,2,...,n, i∈{1,2,... ,e};

(8d) According to the weighted distance set {d _1i ,d _2i ,...,d _ji ,...,d _ni } obtained in step (8c), train the ωth corresponding to the minimum value d _ωi in the weighted distance set The category of the sample is the classification result of the test sample, where ω∈{1,2,...,n}.