CN104063714A

CN104063714A - Fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation

Info

Publication number: CN104063714A
Application number: CN201410346049.2A
Authority: CN
Inventors: 詹曙; 王俊
Original assignee: Individual
Current assignee: Individual
Priority date: 2014-07-20
Filing date: 2014-07-20
Publication date: 2014-09-24
Anticipated expiration: 2034-07-20
Also published as: CN104063714B

Abstract

The invention discloses a fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation. Training a sparse representation dictionary is computationally expensive and slow; to overcome this, the method adopts CUDA parallel computing and uses a GPU to compute the dictionary. Using the sparse representation method, the test sample is reconstructed by solving for its sparse coefficient vector, and classification is performed according to the residual between the test sample and the reconstructed sample. The method makes full use of the computing resources of existing computer hardware and is optimized according to the characteristics of the algorithm, which speeds up dictionary training, greatly shortens the execution time of the algorithm, and improves the efficiency of the method.

Description

A fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation

Technical field

The present invention relates to the technical field of computer vision and pattern recognition, and specifically to a fast face recognition algorithm based on CUDA parallel computing and sparse representation.

Background technology

Face recognition is currently one of the main biometric identification technologies. Because it is contactless, friendly, convenient, and unobtrusive, it is readily accepted by users and places no psychological burden on them. It has therefore been widely studied and applied, and is one of the important topics in computer vision and pattern recognition.

Among the many approaches to face recognition, the classification idea of sparse representation has achieved important results. Image classification based on sparse representation encodes or represents a high-dimensional image using a small number of low-dimensional images of the same class. It has two main stages: sparse representation and classification. First, the test image is represented using dictionary atoms under sparsity constraints; classification is then performed on the basis of the sparse representation coefficients and the dictionary. In 2009, Wright et al. proposed a classifier based on sparse representation (Sparse Representation Classification, SRC). It uses the original training face images as the dictionary, solves for the sparse coefficients of the test sample by l₁-norm minimization, reconstructs the test face from these coefficients, computes the residual for each class, and assigns the test sample to the class with the smallest residual, achieving good classification results.

In sparse representation, the construction of the dictionary is crucial. In recent years, researchers in China and abroad have proposed many overcomplete dictionary learning methods, whose aim is to obtain from the training samples a set of bases that can better represent or encode the test samples. The dictionary is mainly obtained by training on samples through machine learning, and the main problem at present is that running the training serially on a CPU consumes considerable time and computing resources.

With the development of parallel processing technology, parallel computing has become an important direction in image processing and computer science. The programmable graphics processing unit (programmable GPU) is a dedicated device widely used for processing on current computers. A GPU has floating-point throughput and memory bandwidth far higher than those of a CPU, and because of its high degree of parallelism it is well suited to large-scale data processing. In 2006, NVIDIA released the Compute Unified Device Architecture (CUDA), a general-purpose computing architecture comprising a new parallel programming model and instruction set architecture. The CUDA architecture cooperates well with the CPU in handling parallel tasks, particularly time-consuming floating-point computation. CUDA now also supports double-precision floating-point computation well, so many complex computing tasks can be solved more efficiently than on a CPU, significantly improving the efficiency of traditional algorithms.

Summary of the invention:

Current sparse dictionary learning algorithms have high complexity and run slowly, and the classification performance of traditional sparse representation is not very good. The object of the present invention is to propose a fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation. To solve the above technical problems, the present invention adopts the following technical solution:

A fast face recognition algorithm based on CUDA parallel computing and sparse representation, characterized in that the steps of the algorithm are as follows:

(1) Construct the training sample matrix from the face database and initialize the dictionary:

Suppose the face database contains k classes (persons) in total, each with n images, for a total of N = n*k images. For each person, c (1<c<n) images are selected at random to construct the training sample matrix Y, and the remaining images form the test sample set. An overcomplete DCT dictionary is used as the initial dictionary D;
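Step (1) can be illustrated with a minimal sketch. It assumes each image has already been flattened into a vector, and it builds a one-dimensional overcomplete DCT dictionary (cosine atoms with the mean removed and unit norm); the patent does not state the exact DCT construction, so the function names `split_per_class` and `overcomplete_dct` and the atom formula are illustrative assumptions.

```python
import math
import random

def split_per_class(images_by_class, c, seed=0):
    """Randomly pick c images per person for training; the rest become test images."""
    rng = random.Random(seed)
    train, test = [], []
    for label, images in images_by_class.items():
        order = list(range(len(images)))
        rng.shuffle(order)
        train += [(label, images[i]) for i in order[:c]]
        test += [(label, images[i]) for i in order[c:]]
    return train, test

def overcomplete_dct(dim, n_atoms):
    """Return n_atoms unit-norm cosine atoms of length dim (assumed n_atoms > dim >= 2)."""
    atoms = []
    for j in range(n_atoms):
        atom = [math.cos(math.pi * i * j / n_atoms) for i in range(dim)]
        if j > 0:
            mean = sum(atom) / dim
            atom = [v - mean for v in atom]  # remove the DC part of non-constant atoms
        norm = math.sqrt(sum(v * v for v in atom))
        atoms.append([v / norm for v in atom])
    return atoms

# Toy usage: k = 3 persons, n = 5 images each, c = 2 training images per person.
toy_db = {person: [[float(person), float(i)] for i in range(5)] for person in range(3)}
train_set, test_set = split_per_class(toy_db, c=2)
D0 = overcomplete_dct(dim=8, n_atoms=16)
```

Each atom is stored as one list, so `D0` holds the columns of the initial dictionary D. For two-dimensional image patches, a separable dictionary is commonly formed from the Kronecker product of two such one-dimensional dictionaries.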

(2) Use CUDA multi-thread programming to build a cooperative CPU and GPU working environment:

The training sample matrix Y and the initial dictionary D are transferred to GPU video memory. The CPU and GPU programs are written in CUDA: first, the kernel function is implemented, defining the operations to be executed on the GPU; then the multi-thread launch configuration of the kernel is defined, including the dimensions of the thread blocks and the dimensions of the threads within each block;
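The launch configuration described in step (2) comes down to simple index arithmetic. The following sketch emulates it serially in Python rather than running on a GPU: `launch_config`, `emulate_kernel`, the block size of 256, and the example `scale` kernel are all hypothetical names and values chosen for illustration, not taken from the patent.

```python
def launch_config(n_items, threads_per_block=256):
    """Return (blocks, threads) such that blocks * threads >= n_items."""
    blocks = (n_items + threads_per_block - 1) // threads_per_block  # ceiling division
    return blocks, threads_per_block

def emulate_kernel(kernel, n_items, threads_per_block=256):
    """Run `kernel` once per emulated thread; a GPU would run these in parallel."""
    blocks, threads = launch_config(n_items, threads_per_block)
    for block_idx in range(blocks):
        for thread_idx in range(threads):
            gid = block_idx * threads + thread_idx  # global thread index
            if gid < n_items:                       # guard the padded tail of the last block
                kernel(gid)

# Example "kernel": each thread scales one element of a flattened sample matrix.
Y_flat = [float(v) for v in range(1000)]
out = [0.0] * len(Y_flat)

def scale(gid):
    out[gid] = 0.5 * Y_flat[gid]

emulate_kernel(scale, len(Y_flat))
```

In an actual CUDA program the same global index is computed inside the kernel from the block index, block dimension, and thread index, and the bounds check serves the same purpose when the number of elements is not a multiple of the block size.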

(3) On the GPU platform, use the K-SVD dictionary algorithm to perform dictionary learning on the training sample matrix:

Let D, Y, and X denote the learned dictionary, the training samples, and the sparse representation matrix of the training samples, respectively. The objective of the K-SVD dictionary training algorithm can be expressed as:

\min_{D, X} \| Y - DX \|_{F}^{2} \quad \text{s.t.} \quad \forall i, \ \| x_{i} \|_{0} \leq T_{0} \qquad (1)

The above formula is solved iteratively. The algorithm proceeds as follows:

A. Sparse coding:

With the dictionary D held fixed, the orthogonal matching pursuit (OMP) algorithm is used to solve for the sparse representation coefficient matrix X of Y over the dictionary D;

x_{i} = \arg\min_{x_{i}} \| y_{i} - D x_{i} \|_{2}^{2} \quad \text{s.t.} \quad \| x_{i} \|_{0} \leq T_{0} \qquad (2)
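The sparse coding step can be sketched as a small OMP routine. This is a serial, pure-Python illustration under the assumption that the dictionary is stored as a list of atoms (columns of D) that are linearly independent on the selected support; the helper names `dot`, `solve`, and `omp` are illustrative, and the patent's GPU implementation is not shown here.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(A, b):
    """Solve the small square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def omp(atoms, y, T0, tol=1e-10):
    """Greedy solution of min ||y - D x||_2^2 subject to ||x||_0 <= T0."""
    support, coef = [], []
    residual = list(y)
    for _ in range(T0):
        if dot(residual, residual) <= tol:
            break
        corr = [abs(dot(a, residual)) for a in atoms]
        for j in support:
            corr[j] = -1.0                          # never pick the same atom twice
        support.append(max(range(len(atoms)), key=corr.__getitem__))
        # Least-squares fit of y on the selected atoms via the normal equations.
        G = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], y) for i in support]
        coef = solve(G, rhs)
        residual = [yt - sum(c * atoms[j][t] for c, j in zip(coef, support))
                    for t, yt in enumerate(y)]
    x = [0.0] * len(atoms)
    for c, j in zip(coef, support):
        x[j] = c
    return x

atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
x = omp(atoms, [2.0, 0.0, -3.0], T0=2)
```

Because every training sample is coded independently of the others, this step parallelizes naturally across samples, which is one plausible way to map it onto GPU threads.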

B. Dictionary update: the atoms of the dictionary D are updated one at a time, as follows:

1. For the atom d_k currently being updated, let I_k = {i | a_i(k) ≠ 0, 1 ≤ i ≤ N}, where a_i(k) is the k-th element of a_i (the sparse coefficient vector of the i-th sample); I_k is the set of indices of the samples that use atom d_k;

2. Compute the residual matrix

E^{k} = Y - \sum_{j \neq k} d_{j} a^{j}

where a^j denotes the j-th row of A (the sparse coefficient matrix), and E^k is the error over all samples after the k-th atom is removed;

3. Select the columns of E^k corresponding to the indices in I_k to form a new error matrix E_R^k, and perform an SVD of E_R^k:

E_{R}^{k} = U \Lambda V^{T};

4. Take the first column of U as the updated atom, and the product of Λ(1,1) and the first column of V as the new sparse coefficients.

C. If the stopping condition is satisfied, the iteration ends; otherwise, return to A;
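The dictionary update in step B can be sketched for a single atom as follows. To stay within the Python standard library, the full SVD is replaced here by power iteration, which only recovers the leading singular triplet; that substitution, the function name `update_atom`, and the data layout (lists of atoms, samples, and per-sample coefficient vectors) are assumptions made for illustration.

```python
import math

def update_atom(atoms, samples, codes, k, n_iter=50):
    """Update atom k and its coefficients in place (one K-SVD dictionary-update step)."""
    users = [i for i, x in enumerate(codes) if x[k] != 0.0]   # index set I_k
    if not users:
        return
    dim = len(atoms[k])
    # Columns of the restricted error matrix E_R^k: sample minus every atom except k.
    E = []
    for i in users:
        col = list(samples[i])
        for j, atom in enumerate(atoms):
            if j != k and codes[i][j] != 0.0:
                for t in range(dim):
                    col[t] -= codes[i][j] * atom[t]
        E.append(col)
    # Power iteration for the leading left singular vector u of E_R^k.
    u = list(atoms[k])
    for _ in range(n_iter):
        v = [sum(col[t] * u[t] for t in range(dim)) for col in E]             # E^T u
        u = [sum(v[c] * E[c][t] for c in range(len(E))) for t in range(dim)]  # E v
        norm = math.sqrt(sum(val * val for val in u))
        if norm == 0.0:
            return                                  # degenerate case: leave atom unchanged
        u = [val / norm for val in u]
    # E^T u equals Lambda(1,1) times the first column of V: the new coefficients.
    coeffs = [sum(col[t] * u[t] for t in range(dim)) for col in E]
    atoms[k] = u
    for i, cval in zip(users, coeffs):
        codes[i][k] = cval

atoms = [[1.0, 0.0]]
samples = [[1.2, 1.6], [-0.6, -0.8]]
codes = [[1.0], [1.0]]
update_atom(atoms, samples, codes, k=0)
```

In this toy case both samples lie along the direction (0.6, 0.8), so the updated atom aligns with that direction and the rank-one product of atom and coefficients reproduces the samples. Power iteration can stall if the starting atom is orthogonal to the leading singular vector, so a production implementation would more likely call a library SVD.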

(4) Transfer the sparse representation dictionary D from GPU video memory back to CPU memory;

(5) On the CPU, apply the dictionary D to the conventional sparse representation classification algorithm to perform classification:

Let z denote a test sample. The test image z is expressed as a linear combination of the training samples, i.e. z = Dα, where α = [α_1; ...; α_i; ...; α_k]. The classification procedure is as follows:

A. Express the test sample as a linear combination of the atoms of dictionary D, and obtain the sparse coefficients by l₁-norm minimization:

\hat{\alpha} = \arg\min_{\alpha} \left\{ \| z - D\alpha \|_{2}^{2} + \lambda \| \alpha \|_{1} \right\} \qquad (3)

In the formula, λ is a scalar;
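The patent does not name a solver for the l₁-regularized problem in formula (3). One simple option, shown here purely as an assumed example, is the iterative shrinkage-thresholding algorithm (ISTA); the names `soft` and `ista`, the iteration count, and the conservative step size based on the squared Frobenius norm of D are illustrative choices.

```python
import math

def soft(v, t):
    """Soft-thresholding operator: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def ista(atoms, z, lam, n_iter=500):
    """Approximately minimize ||z - D a||_2^2 + lam * ||a||_1 over a."""
    dim, n_atoms = len(z), len(atoms)
    # 2 * ||D||_F^2 upper-bounds the Lipschitz constant of the gradient,
    # so its reciprocal is a safe (if slow) step size.
    step = 1.0 / (2.0 * sum(v * v for atom in atoms for v in atom))
    alpha = [0.0] * n_atoms
    for _ in range(n_iter):
        recon = [sum(alpha[j] * atoms[j][t] for j in range(n_atoms)) for t in range(dim)]
        resid = [recon[t] - z[t] for t in range(dim)]
        grad = [2.0 * sum(atoms[j][t] * resid[t] for t in range(dim))
                for j in range(n_atoms)]
        alpha = [soft(alpha[j] - step * grad[j], lam * step) for j in range(n_atoms)]
    return alpha

# With an orthonormal dictionary the minimizer is soft(z_j, lam / 2) per coordinate.
alpha_hat = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.1], lam=1.0)
```

A larger λ drives more coefficients exactly to zero, trading reconstruction accuracy for sparsity.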

B. Compute the approximation residual of each class of samples with respect to the test sample:

e_{i}(z) = \| z - D \delta_{i}(\hat{\alpha}) \|_{2}, \quad i = 1, \ldots, k \qquad (4)

where δ_i(α̂) is the coefficient vector corresponding to the samples of class i;

C. Classify the test image according to the minimum-residual criterion:

\text{identity}(z) = \arg\min_{i} e_{i}(z) \qquad (5)
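Steps B and C can be sketched as follows, given a coefficient vector that has already been solved for. The sketch assumes that every dictionary atom carries a class label so that δ_i can keep only the coefficients of class i; the function name `classify` and the `atom_labels` argument are hypothetical and not part of the patent text.

```python
import math

def classify(atoms, atom_labels, alpha, z):
    """Return (identity, residuals) using the minimum class-wise reconstruction residual."""
    residuals = {}
    for label in set(atom_labels):
        recon = [0.0] * len(z)
        for atom, atom_label, coef in zip(atoms, atom_labels, alpha):
            if atom_label == label:                 # delta_i keeps class-i coefficients only
                for t in range(len(z)):
                    recon[t] += coef * atom[t]
        residuals[label] = math.sqrt(sum((z[t] - recon[t]) ** 2 for t in range(len(z))))
    identity = min(residuals, key=residuals.get)
    return identity, residuals

atoms = [[1.0, 0.0], [0.0, 1.0]]
labels = ["person_a", "person_b"]
identity, residuals = classify(atoms, labels, alpha=[0.9, 0.1], z=[1.0, 0.0])
```

In this toy case the coefficients of `person_a` reconstruct z with a residual of about 0.1, while those of `person_b` leave a residual of about 1.005, so the test sample is assigned to `person_a`.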

Compared with the prior art, the beneficial effects of the present invention are:

1. The present invention uses CUDA parallel computing to accelerate the dictionary learning process and optimizes it according to the characteristics of the algorithm, greatly reducing the execution time of the algorithm and increasing computing speed;

2. Processing with GPU parallel technology makes full use of the computing resources of existing computer hardware, significantly improves the efficiency of the face recognition method, and can effectively improve recognition accuracy.

Description of the drawings:

Fig. 1 is a diagram of the CUDA thread grid;

Fig. 2 is a flowchart of the fast face recognition algorithm based on CUDA parallel computing and sparse representation.

Embodiment:

Through the above steps, the method achieves effective, high-speed face recognition on the basis of the CUDA parallel mechanism.

The present invention exploits the parallel mechanism and multi-thread programming features of CUDA to optimize and accelerate the dictionary learning process on the GPU. This solves the problem of slow dictionary learning, reduces the complexity of the method, improves the running efficiency of the algorithm, and can meet the real-time requirements of practical applications.

Claims

1. A fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation, characterized in that the steps of the method are as follows:

The above formula is solved iteratively. The algorithm proceeds as follows:

A. Sparse coding:

C. If the stopping condition is satisfied, the iteration ends; otherwise, return to A;

(1) Express the test sample as a linear combination of the atoms of dictionary D, and obtain the sparse coefficients by l₁-norm minimization:

In the formula, λ is a scalar;

(2) Compute the approximation residual of each class of samples with respect to the test sample:

where δ_i(α̂) is the coefficient vector corresponding to the samples of class i;

(3) Classify the test image according to the minimum-residual criterion: