CN104063714A - Fast human face recognition algorithm used for video monitoring and based on CUDA parallel computing and sparse representing


Info

Publication number: CN104063714A
Authority: CN (China)
Prior art keywords: dictionary, matrix, algorithm, sparse, GPU
Legal status: Granted; Expired - Fee Related (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Application number: CN201410346049.2A
Other languages: Chinese (zh)
Other versions: CN104063714B (en)
Inventors: 詹曙, 王俊
Current assignee: Individual (the listed assignee may be inaccurate)
Original assignee: Individual
Priority date: 2014-07-20 (an assumption, not a legal conclusion)
Filing date: 2014-07-20
Publication date: 2014-09-24

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation. To overcome the high computational complexity and low speed of dictionary training in sparse representation, the method adopts CUDA parallel computing and uses a GPU to compute the dictionary. Using sparse representation, the test sample is reconstructed by solving for its sparse coefficient vector, and classification is performed according to the residual between the test sample and the reconstructed sample. The method makes full use of the computing resources of existing computer hardware and is optimized according to the characteristics of the algorithm; it increases the computing speed of dictionary training, greatly shortens the execution time of the algorithm, and improves the efficiency of the method.

Description

A fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation
Technical field
The present invention relates to the technical fields of computer vision and pattern recognition, and in particular to a fast face recognition algorithm based on CUDA parallel computing and sparse representation.
Background art
Face recognition is currently one of the main biometric identification technologies. Because it is contactless, friendly, convenient, and unobtrusive, users accept it readily and without psychological resistance. It has therefore been widely studied and applied, and is one of the important topics in computer vision and pattern recognition.
Among the many approaches to face recognition, classification based on sparse representation has achieved an important position. Image classification based on sparse representation encodes or represents a high-dimensional image using a small number of low-dimensional images of the same class. It has two main stages: sparse representation and classification. First, the test image is represented using dictionary atoms under a sparsity constraint; classification is then carried out on the basis of the sparse coefficients and the dictionary. In 2009, Wright et al. proposed a classifier based on sparse representation (Sparse Representation Classification, SRC). It uses the original training face images as the dictionary, solves for the sparse coefficients of the test sample by l1-norm minimization, reconstructs the test face from these coefficients, computes the residual for each class, and assigns the sample to the class with the smallest residual. This achieved good classification results.
In sparse representation, the construction of the dictionary is crucial. In recent years, researchers have proposed many overcomplete dictionary learning methods, whose aim is to obtain from the training samples a set of basis vectors that can better represent or encode the test samples. Dictionaries are mainly obtained by training on samples through machine learning. The main problem at present is that running this training serially on a CPU consumes a great deal of time and computing resources.
With the development of parallel processing technology, parallel computing has become an important direction in image processing and computer science. The programmable graphics processor (programmable GPU) is the dedicated graphics processing device commonly used in today's computers. A GPU has far higher floating-point throughput and memory bandwidth than a CPU, and its high degree of parallelism makes it well suited to large-scale data processing. In 2006, NVIDIA introduced the Compute Unified Device Architecture (CUDA), a general-purpose computing architecture with a new parallel programming model and instruction set architecture. CUDA cooperates well with the CPU on parallel tasks, particularly time-consuming floating-point computation. CUDA now also supports double-precision floating-point computation well. Compared with a CPU, it handles many complex computing tasks more efficiently and significantly improves the efficiency of traditional algorithms.
Summary of the invention:
Current sparse dictionary learning algorithms have high complexity and run slowly, and the classification performance of traditional sparse representation is also not very good. The object of the present invention is to propose a fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation. To solve the above technical problems, the present invention adopts the following technical solution:
A fast face recognition algorithm based on CUDA parallel computing and sparse representation, characterized in that the steps of the algorithm are as follows:
(1) Construct the training sample matrix from the face database and initialize the dictionary:
Suppose the face database contains k classes (persons) with n images per person, N = n × k images in total. For each person, c (1 < c < n) images are selected at random to construct the training sample matrix Y, and the remaining images form the test sample set. An overcomplete DCT dictionary is used as the initial dictionary D;
(2) Build a cooperative CPU and GPU working environment using CUDA multi-thread programming:
The training sample matrix Y and the initial dictionary D are transferred to GPU memory. The CPU and GPU programs are written in CUDA: first the kernel function is implemented, defining the operations to be executed on the GPU; then the thread configuration of the kernel is defined, including the dimensions of the thread blocks and the dimensions of the threads within each block;
(3) On the GPU platform, perform dictionary learning on the training sample matrix using the K-SVD dictionary algorithm:
Let D, Y, and X denote the learned dictionary, the training samples, and the sparse representation matrix of the training samples, respectively. The objective of the K-SVD dictionary training algorithm can be expressed as:
$$\min_{D,X} \left\{ \|Y - DX\|_F^2 \right\} \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T_0 \tag{1}$$
Solving the above problem is an iterative process. The algorithm proceeds as follows:
A. Sparse coding:
With the dictionary D held fixed, the OMP algorithm is used to solve for the sparse coefficient matrix X of Y over D, one column at a time:
$$x_i = \arg\min_{x_i} \|y_i - D x_i\|_2^2 \quad \text{s.t.} \quad \|x_i\|_0 \le T_0 \tag{2}$$
B. Dictionary update: the atoms of D are updated one at a time, as follows:
1. For the atom $d_k$ currently being updated, let $I_k = \{ i \mid a_i(k) \ne 0,\ 1 \le i \le N \}$, where $a_i$ is the sparse coefficient vector of the i-th sample (the i-th column of the coefficient matrix A, written X in equation (1)) and $a_i(k)$ is its k-th element; $I_k$ is the set of indices of the samples that use atom $d_k$;
2. Compute the residual matrix $E_k = Y - \sum_{j \ne k} d_j a^j$, where $a^j$ denotes the j-th row of A; $E_k$ is the error over all samples after the k-th atom is removed;
3. Select the columns of $E_k$ indexed by $I_k$ to form a new error matrix $E_k^R$, and perform a singular value decomposition of it: $E_k^R = U \Lambda V^T$;
4. Take the first column of U as the updated atom, and the product of Λ(1,1) and the first column of V as the new sparse coefficients.
C. If the stopping condition is met, the iteration ends; otherwise return to step A;
(4) Transfer the sparse representation dictionary D from GPU memory back to CPU memory;
(5) On the CPU, apply the dictionary D in the traditional sparse representation classification algorithm to perform classification:
Let z denote a test sample. The test image z is expressed as a linear combination of the training samples, i.e. z = Dα, where α = [α_1; …; α_i; …; α_k]. The classification procedure is as follows:
A. Express the test sample as a linear combination over the dictionary D, and obtain the sparse coefficients by l1-norm minimization:
$$\hat{\alpha} = \arg\min_{\alpha} \left\{ \|z - D\alpha\|_2^2 + \lambda \|\alpha\|_1 \right\} \tag{3}$$
where λ is a scalar;
B. Compute the approximation residual of each class of samples with respect to the test sample:
$$e_i(z) = \|z - D\,\delta_i(\hat{\alpha})\|, \quad i = 1, \ldots, k \tag{4}$$
where $\delta_i(\hat{\alpha})$ is the coefficient vector corresponding to the i-th class of samples;
C. Classify the test image according to the minimum-residual criterion:
$$\mathrm{identity}(z) = \arg\min_{i} \left( e_i(z) \right) \tag{5}$$
Compared with the prior art, the beneficial effects of the present invention are:
1. The present invention uses CUDA parallel computing to accelerate the dictionary learning process and optimizes it according to the characteristics of the algorithm, which greatly reduces the execution time of the algorithm and increases computing speed;
2. GPU parallel processing fully exploits the computing resources of existing computer hardware, significantly improves the efficiency of the face recognition method, and can effectively improve recognition accuracy.
Brief description of the drawings:
Fig. 1 is a diagram of the CUDA thread grid;
Fig. 2 is a flowchart of the fast face recognition algorithm based on CUDA parallel computing and sparse representation.
Detailed description:
(1) Construct the training sample matrix from the face database and initialize the dictionary:
Suppose the face database contains k classes (persons) with n images per person, N = n × k images in total. For each person, c (1 < c < n) images are selected at random to construct the training sample matrix Y, and the remaining images form the test sample set. An overcomplete DCT dictionary is used as the initial dictionary D;
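The patent does not say how the overcomplete DCT dictionary is built. The following is a minimal sketch of one common 1-D construction, offered as an assumption rather than the inventors' method: sampled cosines of increasing frequency, with the mean removed from all but the first atom and every atom scaled to unit norm. The function name `overcomplete_dct_dictionary` and the sizes 8 and 16 are illustrative only.

```python
import math


def overcomplete_dct_dictionary(signal_dim, num_atoms):
    """Build a signal_dim x num_atoms 1-D overcomplete DCT dictionary.

    The result is a list of rows; every column (atom) has unit l2 norm.
    Assumes signal_dim >= 2 so that no atom collapses to zero.
    """
    atoms = []
    for k in range(num_atoms):
        # Sample a cosine whose frequency grows with the atom index k.
        atom = [math.cos(i * k * math.pi / num_atoms) for i in range(signal_dim)]
        if k > 0:
            # Remove the mean so that only atom 0 carries the constant component.
            mean = sum(atom) / signal_dim
            atom = [value - mean for value in atom]
        norm = math.sqrt(sum(value * value for value in atom))
        atoms.append([value / norm for value in atom])
    # Transpose from a list of atoms to a list of rows.
    return [[atoms[k][i] for k in range(num_atoms)] for i in range(signal_dim)]


# Illustrative sizes: 8-dimensional signals, 16 atoms (twice overcomplete).
D = overcomplete_dct_dictionary(signal_dim=8, num_atoms=16)
```

For image patches, a 2-D dictionary is often formed as the Kronecker product of two such 1-D dictionaries; whether the patent does so is not stated.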
(2) Build a cooperative CPU and GPU working environment using CUDA multi-thread programming:
The training sample matrix Y and the initial dictionary D are transferred to GPU memory. The CPU and GPU programs are written in CUDA: first the kernel function is implemented, defining the operations to be executed on the GPU; then the thread configuration of the kernel is defined, including the dimensions of the thread blocks and the dimensions of the threads within each block;
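To make the thread configuration concrete, the sketch below computes grid and block dimensions for a matrix-shaped workload and emulates on the CPU how each CUDA thread would derive its row and column. It is written in Python purely for illustration; the 16 × 16 block size, the 100 × 70 matrix, and the function names are assumptions, since the patent does not give its launch parameters.

```python
def launch_config(num_rows, num_cols, block=(16, 16)):
    """Return (grid, block) so that grid * block covers a num_rows x num_cols matrix.

    block is (threads_x, threads_y); grid is (blocks_x, blocks_y).
    """
    threads_x, threads_y = block
    blocks_x = (num_cols + threads_x - 1) // threads_x  # ceiling division
    blocks_y = (num_rows + threads_y - 1) // threads_y
    return (blocks_x, blocks_y), block


def emulate_kernel(num_rows, num_cols, grid, block):
    """Visit the matrix the way a 2-D CUDA kernel would, counting visits per element."""
    hits = [[0] * num_cols for _ in range(num_rows)]
    for block_y in range(grid[1]):
        for block_x in range(grid[0]):
            for thread_y in range(block[1]):
                for thread_x in range(block[0]):
                    # Same arithmetic as blockIdx * blockDim + threadIdx in CUDA C.
                    row = block_y * block[1] + thread_y
                    col = block_x * block[0] + thread_x
                    # Bounds guard: edge blocks contain threads with no element.
                    if row < num_rows and col < num_cols:
                        hits[row][col] += 1
    return hits


grid, block = launch_config(num_rows=100, num_cols=70)
hits = emulate_kernel(100, 70, grid, block)
```

With these sizes the grid is 5 × 7 blocks, and the bounds guard ensures that every matrix element is handled by exactly one thread even though the grid overshoots the matrix edges.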
(3) On the GPU platform, perform dictionary learning on the training sample matrix using the K-SVD dictionary algorithm:
Let D, Y, and X denote the learned dictionary, the training samples, and the sparse representation matrix of the training samples, respectively. The objective of the K-SVD dictionary training algorithm can be expressed as:
$$\min_{D,X} \left\{ \|Y - DX\|_F^2 \right\} \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T_0 \tag{1}$$
Solving the above problem is an iterative process. The algorithm proceeds as follows:
A. Sparse coding:
With the dictionary D held fixed, the OMP algorithm is used to solve for the sparse coefficient matrix X of Y over D, one column at a time:
$$x_i = \arg\min_{x_i} \|y_i - D x_i\|_2^2 \quad \text{s.t.} \quad \|x_i\|_0 \le T_0 \tag{2}$$
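The sparse coding step in equation (2) can be sketched as a plain orthogonal matching pursuit for a single sample. This is a CPU reference in pure Python, not the patent's GPU code; the helper `solve_linear_system`, the tolerance `tol`, and the small example dictionary are assumptions made for illustration, and the atoms are assumed to have unit norm.

```python
def solve_linear_system(A, b):
    """Solve the small square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def omp(D, y, T0, tol=1e-10):
    """Greedy solution of equation (2): at most T0 nonzero coefficients for sample y.

    D is a list of rows (m x K) whose columns are unit-norm atoms.
    """
    m, K = len(D), len(D[0])
    atoms = [[D[i][j] for i in range(m)] for j in range(K)]
    residual, support, coeffs = list(y), [], []
    for _ in range(min(T0, K)):
        if sum(r * r for r in residual) <= tol:
            break
        # Select the unused atom most correlated with the current residual.
        corr = [abs(sum(a[i] * residual[i] for i in range(m))) for a in atoms]
        for j in support:
            corr[j] = -1.0
        support.append(max(range(K), key=lambda j: corr[j]))
        # Least-squares fit of y on the selected atoms (normal equations).
        gram = [[sum(atoms[p][i] * atoms[q][i] for i in range(m)) for q in support]
                for p in support]
        rhs = [sum(atoms[p][i] * y[i] for i in range(m)) for p in support]
        coeffs = solve_linear_system(gram, rhs)
        residual = [y[i] - sum(c * atoms[j][i] for c, j in zip(coeffs, support))
                    for i in range(m)]
    x = [0.0] * K
    for c, j in zip(coeffs, support):
        x[j] = c
    return x


s = 0.5 ** 0.5
example_D = [[1.0, 0.0, 0.0, s], [0.0, 1.0, 0.0, s], [0.0, 0.0, 1.0, 0.0]]
example_x = omp(example_D, [2.0, 0.0, -3.0], T0=2)
```

Because each column of Y is coded independently of the others, this step lends itself to assigning one sample per thread or thread block on the GPU; how the patent distributes the work is not specified.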
B. Dictionary update: the atoms of D are updated one at a time, as follows:
1. For the atom $d_k$ currently being updated, let $I_k = \{ i \mid a_i(k) \ne 0,\ 1 \le i \le N \}$, where $a_i$ is the sparse coefficient vector of the i-th sample (the i-th column of the coefficient matrix A, written X in equation (1)) and $a_i(k)$ is its k-th element; $I_k$ is the set of indices of the samples that use atom $d_k$;
2. Compute the residual matrix $E_k = Y - \sum_{j \ne k} d_j a^j$, where $a^j$ denotes the j-th row of A; $E_k$ is the error over all samples after the k-th atom is removed;
3. Select the columns of $E_k$ indexed by $I_k$ to form a new error matrix $E_k^R$, and perform a singular value decomposition of it: $E_k^R = U \Lambda V^T$;
4. Take the first column of U as the updated atom, and the product of Λ(1,1) and the first column of V as the new sparse coefficients.
C. If the stopping condition is met, the iteration ends; otherwise return to step A;
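Steps 1 to 4 of the dictionary update can be sketched for a single atom as follows. Only the leading singular triplet of $E_k^R$ is needed, so this sketch replaces the full SVD named in the text with power iteration; that substitution, the fixed starting vector, the function names, and the tiny example matrices are all assumptions made to keep the example dependency-free.

```python
import math


def leading_singular_triplet(E, iterations=100):
    """Approximate the top singular triplet (u, sigma, v) of E by power iteration."""
    rows, cols = len(E), len(E[0])
    v = [1.0 / math.sqrt(cols)] * cols  # fixed start; a random start is more robust
    u, sigma = [0.0] * rows, 0.0
    for _ in range(iterations):
        u = [sum(E[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(t * t for t in u))
        if norm_u == 0.0:
            return u, 0.0, v
        u = [t / norm_u for t in u]
        w = [sum(E[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(t * t for t in w))
        v = [t / sigma for t in w]
    return u, sigma, v


def update_atom(Y, D, X, k):
    """Update atom k of D and row k of X in place, following steps 1 to 4."""
    m, K, N = len(Y), len(D[0]), len(Y[0])
    # Step 1: indices of the samples whose sparse code uses atom k.
    I_k = [i for i in range(N) if X[k][i] != 0.0]
    if not I_k:
        return  # unused atom: left unchanged in this sketch
    # Steps 2 and 3: residual without atom k, restricted to the columns in I_k.
    E_R = [[Y[r][i] - sum(D[r][j] * X[j][i] for j in range(K) if j != k)
            for i in I_k] for r in range(m)]
    u, sigma, v = leading_singular_triplet(E_R)
    if sigma == 0.0:
        return
    # Step 4: the new atom is u; the new coefficients are sigma * v.
    for r in range(m):
        D[r][k] = u[r]
    for t, i in enumerate(I_k):
        X[k][i] = sigma * v[t]


# Example: samples 0 and 1 lie along (0.6, 0.8), but atom 0 starts as (1, 0).
Y = [[0.6, 1.2, 0.0], [0.8, 1.6, 3.0]]
D = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
update_atom(Y, D, X, k=0)
```

In this example the update rotates atom 0 to (0.6, 0.8) and keeps the coefficients (1, 2), so DX reproduces Y exactly. Restricting the update to the columns in $I_k$ is what preserves the sparsity pattern of X.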
(4) Transfer the sparse representation dictionary D from GPU memory back to CPU memory;
(5) On the CPU, apply the dictionary D in the traditional sparse representation classification algorithm to perform classification:
Let z denote a test sample. The test image z is expressed as a linear combination of the training samples, i.e. z = Dα, where α = [α_1; …; α_i; …; α_k]. The classification procedure is as follows:
A. Express the test sample as a linear combination over the dictionary D, and obtain the sparse coefficients by l1-norm minimization:
$$\hat{\alpha} = \arg\min_{\alpha} \left\{ \|z - D\alpha\|_2^2 + \lambda \|\alpha\|_1 \right\} \tag{3}$$
where λ is a scalar;
B. Compute the approximation residual of each class of samples with respect to the test sample:
$$e_i(z) = \|z - D\,\delta_i(\hat{\alpha})\|, \quad i = 1, \ldots, k \tag{4}$$
where $\delta_i(\hat{\alpha})$ is the coefficient vector corresponding to the i-th class of samples;
C. Classify the test image according to the minimum-residual criterion:
$$\mathrm{identity}(z) = \arg\min_{i} \left( e_i(z) \right) \tag{5}$$
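Equations (3) to (5) can be sketched as below. The patent does not name an l1 solver, so this example uses ISTA (iterative soft thresholding) as a stand-in. It also assumes that every dictionary atom carries a class label, passed in the hypothetical list `atom_labels`, which is what $\delta_i$ needs in order to pick out the coefficients of class i. The value of λ, the iteration count, and the toy dictionary are illustrative.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the l1 norm)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def ista(D, z, lam, iterations=500):
    """Approximately minimise ||z - D a||_2^2 + lam * ||a||_1, as in equation (3)."""
    m, K = len(D), len(D[0])
    # Upper bound on the gradient's Lipschitz constant: 2 * ||D||_F^2.
    lipschitz = 2.0 * sum(D[i][j] ** 2 for i in range(m) for j in range(K))
    alpha = [0.0] * K
    for _ in range(iterations):
        fit = [sum(D[i][j] * alpha[j] for j in range(K)) - z[i] for i in range(m)]
        grad = [2.0 * sum(D[i][j] * fit[i] for i in range(m)) for j in range(K)]
        alpha = [soft_threshold(alpha[j] - grad[j] / lipschitz, lam / lipschitz)
                 for j in range(K)]
    return alpha


def classify(D, atom_labels, z, lam=0.01):
    """Return (predicted label, per-class residuals), following equations (4) and (5)."""
    m, K = len(D), len(D[0])
    alpha = ista(D, z, lam)
    residuals = {}
    for label in sorted(set(atom_labels)):
        # delta_i keeps only the coefficients whose atoms belong to this class.
        recon = [sum(D[i][j] * alpha[j] for j in range(K) if atom_labels[j] == label)
                 for i in range(m)]
        residuals[label] = math.sqrt(sum((z[i] - recon[i]) ** 2 for i in range(m)))
    return min(residuals, key=residuals.get), residuals


# Toy dictionary: two unit-norm atoms per class; z lies close to the class-0 atoms.
D = [[1.0, 0.96, 0.0, 0.28], [0.0, 0.28, 1.0, 0.96]]
atom_labels = [0, 0, 1, 1]
label, residuals = classify(D, atom_labels, [0.98, 0.1])
```

Here the test vector is reconstructed almost entirely from the class-0 atoms, so class 0 has the smaller residual and is returned as the identity.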
Through the above steps, the method achieves effective, high-speed face recognition on the basis of the CUDA parallel mechanism.
The present invention exploits the parallel mechanism and multi-thread programming features of CUDA to optimize and accelerate the dictionary learning process on the GPU. This addresses the slow speed of dictionary learning, reduces the complexity of the method, improves the running efficiency of the algorithm, and can meet real-time requirements in practical applications.

Claims (1)

1. A fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation, characterized in that the steps of the method are as follows:
(1) Construct the training sample matrix from the face database and initialize the dictionary:
Suppose the face database contains k classes (persons) with n images per person, N = n × k images in total. For each person, c (1 < c < n) images are selected at random to construct the training sample matrix Y, and the remaining images form the test sample set. An overcomplete DCT dictionary is used as the initial dictionary D;
(2) Build a cooperative CPU and GPU working environment using CUDA multi-thread programming:
The training sample matrix Y and the initial dictionary D are transferred to GPU memory. The CPU and GPU programs are written in CUDA: first the kernel function is implemented, defining the operations to be executed on the GPU; then the thread configuration of the kernel is defined, including the dimensions of the thread blocks and the dimensions of the threads within each block;
(3) On the GPU platform, perform dictionary learning on the training sample matrix using the K-SVD dictionary algorithm:
Let D, Y, and X denote the learned dictionary, the training samples, and the sparse representation matrix of the training samples, respectively. The objective of the K-SVD dictionary training algorithm can be expressed as:
$$\min_{D,X} \left\{ \|Y - DX\|_F^2 \right\} \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T_0$$
Solving the above problem is an iterative process. The algorithm proceeds as follows:
A. Sparse coding:
With the dictionary D held fixed, the OMP algorithm is used to solve for the sparse coefficient matrix X of Y over D, one column at a time:
$$x_i = \arg\min_{x_i} \|y_i - D x_i\|_2^2 \quad \text{s.t.} \quad \|x_i\|_0 \le T_0$$
B. Dictionary update: the atoms of D are updated one at a time, as follows:
1. For the atom $d_k$ currently being updated, let $I_k = \{ i \mid a_i(k) \ne 0,\ 1 \le i \le N \}$, where $a_i(k)$ is the k-th element of $a_i$; $I_k$ is the set of indices of the samples that use atom $d_k$;
2. Compute the residual matrix $E_k = Y - \sum_{j \ne k} d_j a^j$, where $a^j$ denotes the j-th row of A; $E_k$ is the error over all samples after the k-th atom is removed;
3. Select the columns of $E_k$ indexed by $I_k$ to form a new error matrix $E_k^R$, and perform a singular value decomposition of it: $E_k^R = U \Lambda V^T$;
4. Take the first column of U as the updated atom, and the product of Λ(1,1) and the first column of V as the new sparse coefficients.
C. If the stopping condition is met, the iteration ends; otherwise return to step A;
(4) Transfer the sparse representation dictionary D from GPU memory back to CPU memory;
(5) On the CPU, apply the dictionary D in the traditional sparse representation classification algorithm to perform classification:
Let z denote a test sample. The test image z is expressed as a linear combination of the training samples, i.e. z = Dα, where α = [α_1; …; α_i; …; α_k]. The classification procedure is as follows:
A. Express the test sample as a linear combination over the dictionary D, and obtain the sparse coefficients by l1-norm minimization:
$$\hat{\alpha} = \arg\min_{\alpha} \left\{ \|z - D\alpha\|_2^2 + \lambda \|\alpha\|_1 \right\}$$
where λ is a scalar;
B. Compute the approximation residual of each class of samples with respect to the test sample:
$$e_i(z) = \|z - D\,\delta_i(\hat{\alpha})\|, \quad i = 1, \ldots, k$$
where $\delta_i(\hat{\alpha})$ is the coefficient vector corresponding to the i-th class of samples;
C. Classify the test image according to the minimum-residual criterion:
$$\mathrm{identity}(z) = \arg\min_{i} \left( e_i(z) \right)$$

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201410346049.2A | 2014-07-20 | 2014-07-20 | Fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation

Publications (2)

Publication Number | Publication Date
CN104063714A (en) | 2014-09-24
CN104063714B (en) | 2016-05-18

Family

ID: 51551416

Family Applications (1)

Application Number | Priority Date | Filing Date | Status
CN201410346049.2A | 2014-07-20 | 2014-07-20 | Expired - Fee Related (CN104063714B)

Country Status (1)

Country | Link
CN (1) | CN104063714B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160518

Termination date: 20170720