CN104063714A - Fast human face recognition algorithm used for video monitoring and based on CUDA parallel computing and sparse representing - Google Patents
- Publication number
- CN104063714A (application number CN201410346049.2A)
- Authority
- CN
- China
- Prior art keywords
- dictionary
- matrix
- algorithm
- sparse
- gpu
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation. To overcome the high computational complexity and low speed of dictionary training for sparse representation, the method adopts CUDA parallel computing and trains the dictionary on a GPU. It then applies sparse representation: the test sample is reconstructed from its sparse coefficient vector, and classification is performed according to the residual between the test sample and the reconstructed sample. The method makes full use of the computing resources of existing computer hardware and is optimized for the characteristics of the algorithm, which speeds up dictionary training, greatly shortens execution time, and improves the efficiency of the method.
Description
Technical field
The present invention relates to computer vision and pattern recognition, and specifically to a fast face recognition algorithm based on CUDA parallel computing and sparse representation.
Background technology
Face recognition is currently one of the main biometric identification technologies. Because it is contactless, user-friendly, convenient, and unobtrusive, it is readily accepted by users and places no psychological burden on them. It has therefore been widely studied and applied, and is one of the important topics in computer vision and pattern recognition.
Among the many approaches to face recognition, classification based on sparse representation has earned an important place. Sparse-representation-based image classification encodes or represents a high-dimensional image using a small number of low-dimensional images of the same class. It has two main stages: sparse representation and classification. First, the test image is represented in terms of dictionary atoms under a sparsity constraint; classification is then performed on the basis of the sparse representation coefficients and the dictionary. In 2009, Wright et al. proposed the Sparse Representation Classification (SRC) classifier, which uses the original training face images as the dictionary, solves for the sparse coefficients of the test sample by $l_1$-norm minimization, reconstructs the test face from these coefficients, computes the per-class residuals, and assigns the sample to the class with the smallest residual. It achieved good classification results.
In sparse representation, the construction of the dictionary is crucial. In recent years, researchers in China and abroad have proposed many methods for learning overcomplete dictionaries, with the aim of obtaining from the training samples a set of basis vectors that represents or encodes test samples better. Dictionaries are mainly obtained by training on samples with machine learning, and the main problem at present is that serial execution on a CPU is costly in time and computing resources.
As parallel processing technology has developed, it has become an important direction in image processing and computer science. The programmable graphics processing unit (programmable GPU) is a dedicated device widely used on current computers. A GPU has far higher floating-point throughput and memory bandwidth than a CPU, and its high degree of parallelism makes it well suited to large-scale data processing. In 2006, NVIDIA released the Compute Unified Device Architecture (CUDA), a general-purpose computing architecture with a new parallel programming model and instruction set architecture. CUDA cooperates well with the CPU in handling parallel tasks, particularly time-consuming floating-point computation. CUDA now supports double-precision floating-point arithmetic well, solves many complex computing tasks more efficiently than a CPU, and significantly improves the efficiency of traditional algorithms.
Summary of the invention:
Current sparse dictionary learning algorithms have high complexity and run slowly, and the classification performance of traditional sparse representation is also not very good. The object of the invention is to propose a fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation. To solve the above technical problem, the invention adopts the following technical solution:
A fast face recognition algorithm based on CUDA parallel computing and sparse representation, characterized in that the steps of the algorithm are as follows:
(1) Construct the training sample matrix from the face database and initialize the dictionary:
Suppose the face database contains k classes (persons), each with n pictures, for N=n*k pictures in total. For each person, c (1<c<n) pictures are selected at random to construct the training sample matrix Y, and the remaining pictures form the test sample set; an overcomplete DCT dictionary is used as the initial dictionary D;
(2) Use CUDA multi-thread programming to build a cooperative CPU and GPU working environment:
The training sample matrix Y and the initial dictionary D are transferred to GPU memory. The CPU and GPU programs are written in CUDA: first the kernel function is implemented, defining the operations to be carried out on the GPU; then the thread configuration of the kernel is defined, including the dimensions of the thread blocks and the dimensions of the threads within each thread block;
(3) On the GPU, use the K-SVD dictionary learning algorithm to learn a dictionary from the training sample matrix:
Let D, Y, and X denote the learned dictionary, the training samples, and the sparse representation matrix of the training samples, respectively. The objective of the K-SVD dictionary training algorithm can be expressed as:
$\min_{D,X} \|Y - DX\|_F^2 \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T_0$
where $x_i$ is the $i$-th column of X and $T_0$ is the sparsity limit.
Solving the above formula is an iterative process. The algorithm proceeds as follows:
A. Sparse coding:
With the dictionary D fixed, the OMP algorithm is used to solve for the sparse representation coefficient matrix X of Y over D;
B. Dictionary update: the atoms of D are updated one at a time, as follows:
1. For the atom $d_k$ currently being updated, let $I_k = \{ i \mid a_i(k) \neq 0,\ 1 \le i \le N \}$, where $a_i(k)$ is the $k$-th element of $a_i$; $I_k$ is the set of indices of the samples that use atom $d_k$;
2. Compute the residual matrix $E_k = Y - \sum_{j \neq k} d_j a^j$, where $a^j$ denotes the $j$-th row of the coefficient matrix A; $E_k$ is the error over all samples after the $k$-th atom is removed;
3. Select the columns of $E_k$ corresponding to the indices in $I_k$ to form a new error matrix $E_k^R$, and compute its SVD, $E_k^R = U \Lambda V^T$;
4. Take the first column of U as the updated atom, and the product of Λ(1,1) and the first column of V as the new sparse coefficients.
C. If the stopping condition is met, the iteration ends; otherwise return to A;
(4) Transfer the learned sparse representation dictionary D from GPU memory back to CPU memory;
(5) On the CPU, apply the dictionary D in the traditional sparse representation classification algorithm to perform classification:
Let z denote a test sample. The test image z is expressed as a linear combination of the training samples, i.e. $z = D\alpha$, where $\alpha = [\alpha_1; \ldots; \alpha_i; \ldots; \alpha_k]$. The classification procedure is as follows:
A. Express the test sample as a linear combination of the atoms of D, and obtain the sparse coefficients by $l_1$-norm minimization:
$\hat{\alpha} = \arg\min_{\alpha} \|z - D\alpha\|_2^2 + \lambda \|\alpha\|_1$
where λ is a scalar;
B. Compute the approximation residual of each class for the test sample:
$r_i(z) = \|z - D\,\delta_i(\hat{\alpha})\|_2$
where $\delta_i(\alpha)$ is the coefficient vector corresponding to the samples of class $i$;
C. Classify the test image according to the minimum-residual criterion:
$\text{identity}(z) = \arg\min_i r_i(z)$
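As an illustration of step (1) (dictionary initialization) and step (3)A (sparse coding), the following is a minimal CPU sketch in NumPy rather than the GPU implementation the method describes. The function names, the one-dimensional form of the overcomplete DCT dictionary, and the `sparsity` parameter are assumptions made for this example and do not come from the patent.

```python
import numpy as np


def overcomplete_dct_dictionary(signal_dim, num_atoms):
    """Return a signal_dim x num_atoms overcomplete DCT dictionary (assumed 1-D form)."""
    rows = np.arange(signal_dim)[:, None]
    cols = np.arange(num_atoms)[None, :]
    D = np.cos(np.pi * rows * cols / num_atoms)
    D[:, 1:] -= D[:, 1:].mean(axis=0)      # zero-mean every atom except the constant one
    return D / np.linalg.norm(D, axis=0)   # normalise each atom to unit length


def omp(D, y, sparsity):
    """Orthogonal matching pursuit: approximate y with at most `sparsity` atoms of D."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(sparsity):
        k = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with residual
        if k in support:
            break
        support.append(k)
        # Least-squares fit of y on all atoms selected so far.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x


def sparse_code(D, Y, sparsity):
    """Step A: code every column of Y over the fixed dictionary D."""
    return np.column_stack([omp(D, Y[:, i], sparsity) for i in range(Y.shape[1])])
```

Each column of Y is coded independently of the others, which is the property that makes this stage a natural candidate for one-thread-group-per-sample parallelization on a GPU.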
Compared with the prior art, the beneficial effects of the invention are:
1. The invention uses CUDA parallel computing to accelerate the dictionary learning process and optimizes it for the characteristics of the algorithm, which greatly reduces execution time and increases computing speed;
2. GPU parallel processing makes full use of the computing resources of existing computer hardware, significantly improves the efficiency of the face recognition method, and can effectively improve recognition accuracy.
Brief description of the drawings:
Fig. 1 is a diagram of the CUDA thread grid;
Fig. 2 is a flowchart of the fast face recognition algorithm based on CUDA parallel computing and sparse representation.
Embodiment:
(1) Construct the training sample matrix from the face database and initialize the dictionary:
Suppose the face database contains k classes (persons), each with n pictures, for N=n*k pictures in total. For each person, c (1<c<n) pictures are selected at random to construct the training sample matrix Y, and the remaining pictures form the test sample set; an overcomplete DCT dictionary is used as the initial dictionary D;
(2) Use CUDA multi-thread programming to build a cooperative CPU and GPU working environment:
The training sample matrix Y and the initial dictionary D are transferred to GPU memory. The CPU and GPU programs are written in CUDA: first the kernel function is implemented, defining the operations to be carried out on the GPU; then the thread configuration of the kernel is defined, including the dimensions of the thread blocks and the dimensions of the threads within each thread block;
(3) On the GPU, use the K-SVD dictionary learning algorithm to learn a dictionary from the training sample matrix:
Let D, Y, and X denote the learned dictionary, the training samples, and the sparse representation matrix of the training samples, respectively. The objective of the K-SVD dictionary training algorithm can be expressed as:
$\min_{D,X} \|Y - DX\|_F^2 \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T_0$
where $x_i$ is the $i$-th column of X and $T_0$ is the sparsity limit.
Solving the above formula is an iterative process. The algorithm proceeds as follows:
A. Sparse coding:
With the dictionary D fixed, the OMP algorithm is used to solve for the sparse representation coefficient matrix X of Y over D;
B. Dictionary update: the atoms of D are updated one at a time, as follows:
1. For the atom $d_k$ currently being updated, let $I_k = \{ i \mid a_i(k) \neq 0,\ 1 \le i \le N \}$, where $a_i(k)$ is the $k$-th element of $a_i$; $I_k$ is the set of indices of the samples that use atom $d_k$;
2. Compute the residual matrix $E_k = Y - \sum_{j \neq k} d_j a^j$, where $a^j$ denotes the $j$-th row of the coefficient matrix A; $E_k$ is the error over all samples after the $k$-th atom is removed;
3. Select the columns of $E_k$ corresponding to the indices in $I_k$ to form a new error matrix $E_k^R$, and compute its SVD, $E_k^R = U \Lambda V^T$;
4. Take the first column of U as the updated atom, and the product of Λ(1,1) and the first column of V as the new sparse coefficients.
C. If the stopping condition is met, the iteration ends; otherwise return to A;
(4) Transfer the learned sparse representation dictionary D from GPU memory back to CPU memory;
(5) On the CPU, apply the dictionary D in the traditional sparse representation classification algorithm to perform classification:
Let z denote a test sample. The test image z is expressed as a linear combination of the training samples, i.e. $z = D\alpha$, where $\alpha = [\alpha_1; \ldots; \alpha_i; \ldots; \alpha_k]$. The classification procedure is as follows:
A. Express the test sample as a linear combination of the atoms of D, and obtain the sparse coefficients by $l_1$-norm minimization:
$\hat{\alpha} = \arg\min_{\alpha} \|z - D\alpha\|_2^2 + \lambda \|\alpha\|_1$
where λ is a scalar;
B. Compute the approximation residual of each class for the test sample:
$r_i(z) = \|z - D\,\delta_i(\hat{\alpha})\|_2$
where $\delta_i(\alpha)$ is the coefficient vector corresponding to the samples of class $i$;
C. Classify the test image according to the minimum-residual criterion:
$\text{identity}(z) = \arg\min_i r_i(z)$
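The dictionary update of step (3)B can be sketched for a single atom as follows. This is a CPU illustration in NumPy, not the CUDA implementation; the function name `update_atom`, the in-place update convention, and the choice to leave unused atoms unchanged are assumptions made for this example.

```python
import numpy as np


def update_atom(Y, D, A, k):
    """Update atom k of D and row k of A in place (one K-SVD dictionary-update step)."""
    users = np.flatnonzero(A[k, :])   # I_k: samples whose code uses atom k
    if users.size == 0:
        return                        # assumption: an unused atom is left unchanged
    # Columns of E_k indexed by I_k. Adding atom k's own contribution back to the
    # full residual equals Y - sum_{j != k} d_j a^j restricted to those columns.
    E_kR = Y[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
    U, S, Vt = np.linalg.svd(E_kR, full_matrices=False)
    D[:, k] = U[:, 0]                 # first column of U becomes the new atom
    A[k, users] = S[0] * Vt[0, :]     # Lambda(1,1) times the first column of V


def update_dictionary(Y, D, A):
    """Step B: update every atom of D one at a time."""
    for k in range(D.shape[1]):
        update_atom(Y, D, A, k)
```

Because the rank-1 SVD truncation is the best rank-1 approximation of the restricted error matrix, each atom update cannot increase the reconstruction error, and restricting to the columns in $I_k$ keeps the sparsity pattern of the coefficient matrix from growing.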
Through the above steps, the method achieves effective, high-speed face recognition on the basis of the CUDA parallel mechanism.
The invention exploits CUDA's parallel mechanism and multi-thread programming model to accelerate dictionary learning on the GPU. This solves the problem of slow dictionary learning, reduces the complexity of the method, improves the running efficiency of the algorithm, and can meet the real-time requirements of practical applications.
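The classification of step (5) can be illustrated with the short NumPy sketch below. The patent does not name a solver for the $l_1$ problem, so iterative soft-thresholding (ISTA) is swapped in here as one simple option. The `labels` array, which assigns a class to each dictionary atom, is also an assumption of this example, as are the function names and default parameter values.

```python
import numpy as np


def ista_l1(D, z, lam, num_iters=500):
    """Minimise ||z - D a||_2^2 + lam * ||a||_1 by iterative soft-thresholding."""
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(num_iters):
        v = a + step * 2.0 * (D.T @ (z - D @ a))     # gradient step on the quadratic term
        a = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)   # soft threshold
    return a


def src_classify(D, labels, z, lam=0.01):
    """Return the class with the smallest residual, plus all class residuals."""
    labels = np.asarray(labels)
    alpha = ista_l1(D, z, lam)
    classes = np.unique(labels)
    # delta_i(alpha): keep only the coefficients that belong to class i.
    residuals = [np.linalg.norm(z - D @ np.where(labels == c, alpha, 0.0))
                 for c in classes]
    return classes[int(np.argmin(residuals))], residuals
```

For example, with `D = np.eye(4)` and `labels = [0, 0, 1, 1]`, a test vector whose energy lies in the first two coordinates is assigned to class 0, because zeroing the class-1 coefficients leaves its reconstruction almost intact.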
Claims (1)
1. A fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation, characterized in that the steps of the method are as follows:
(1) Construct the training sample matrix from the face database and initialize the dictionary:
Suppose the face database contains k classes (persons), each with n pictures, for N=n*k pictures in total. For each person, c (1<c<n) pictures are selected at random to construct the training sample matrix Y, and the remaining pictures form the test sample set; an overcomplete DCT dictionary is used as the initial dictionary D;
(2) Use CUDA multi-thread programming to build a cooperative CPU and GPU working environment:
The training sample matrix Y and the initial dictionary D are transferred to GPU memory. The CPU and GPU programs are written in CUDA: first the kernel function is implemented, defining the operations to be carried out on the GPU; then the thread configuration of the kernel is defined, including the dimensions of the thread blocks and the dimensions of the threads within each thread block;
(3) On the GPU, use the K-SVD dictionary learning algorithm to learn a dictionary from the training sample matrix:
Let D, Y, and X denote the learned dictionary, the training samples, and the sparse representation matrix of the training samples, respectively. The objective of the K-SVD dictionary training algorithm can be expressed as:
$\min_{D,X} \|Y - DX\|_F^2 \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T_0$
where $x_i$ is the $i$-th column of X and $T_0$ is the sparsity limit.
Solving the above formula is an iterative process. The algorithm proceeds as follows:
A. Sparse coding:
With the dictionary D fixed, the OMP algorithm is used to solve for the sparse representation coefficient matrix X of Y over D;
B. Dictionary update: the atoms of D are updated one at a time, as follows:
1. For the atom $d_k$ currently being updated, let $I_k = \{ i \mid a_i(k) \neq 0,\ 1 \le i \le N \}$, where $a_i(k)$ is the $k$-th element of $a_i$; $I_k$ is the set of indices of the samples that use atom $d_k$;
2. Compute the residual matrix $E_k = Y - \sum_{j \neq k} d_j a^j$, where $a^j$ denotes the $j$-th row of the coefficient matrix A; $E_k$ is the error over all samples after the $k$-th atom is removed;
3. Select the columns of $E_k$ corresponding to the indices in $I_k$ to form a new error matrix $E_k^R$, and compute its SVD, $E_k^R = U \Lambda V^T$;
4. Take the first column of U as the updated atom, and the product of Λ(1,1) and the first column of V as the new sparse coefficients.
C. If the stopping condition is met, the iteration ends; otherwise return to A;
(4) Transfer the learned sparse representation dictionary D from GPU memory back to CPU memory;
(5) On the CPU, apply the dictionary D in the traditional sparse representation classification algorithm to perform classification:
Let z denote a test sample. The test image z is expressed as a linear combination of the training samples, i.e. $z = D\alpha$, where $\alpha = [\alpha_1; \ldots; \alpha_i; \ldots; \alpha_k]$. The classification procedure is as follows:
(1) Express the test sample as a linear combination of the atoms of D, and obtain the sparse coefficients by $l_1$-norm minimization:
$\hat{\alpha} = \arg\min_{\alpha} \|z - D\alpha\|_2^2 + \lambda \|\alpha\|_1$
where λ is a scalar;
(2) Compute the approximation residual of each class for the test sample:
$r_i(z) = \|z - D\,\delta_i(\hat{\alpha})\|_2$
where $\delta_i(\alpha)$ is the coefficient vector corresponding to the samples of class $i$;
(3) Classify the test image according to the minimum-residual criterion:
$\text{identity}(z) = \arg\min_i r_i(z)$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410346049.2A CN104063714B (en) | 2014-07-20 | 2014-07-20 | A fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410346049.2A CN104063714B (en) | 2014-07-20 | 2014-07-20 | A fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104063714A (en) | 2014-09-24 |
CN104063714B (en) | 2016-05-18 |
Family
ID=51551416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410346049.2A Expired - Fee Related CN104063714B (en) | 2014-07-20 | 2014-07-20 | A fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104063714B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104318522A (en) * | 2014-10-08 | 2015-01-28 | 苏州新视线文化科技发展有限公司 | Graphics processing unit-based sparse representation fast calculation method |
CN106407995A (en) * | 2016-04-01 | 2017-02-15 | 中国地质大学(武汉) | Image data set sparse expression acceleration method and apparatus |
CN106485202A (en) * | 2016-09-18 | 2017-03-08 | 南京工程学院 | Unconfinement face identification system and method |
CN107886519A (en) * | 2017-10-17 | 2018-04-06 | 杭州电子科技大学 | Multichannel chromatogram three-dimensional image fast partition method based on CUDA |
CN108256345A (en) * | 2016-12-28 | 2018-07-06 | 中移(杭州)信息技术有限公司 | A kind of picture method for secret protection, apparatus and system |
CN108921088A (en) * | 2018-06-29 | 2018-11-30 | 佛山市顺德区中山大学研究院 | A kind of face identification method based on discriminate target equation |
CN109165554A (en) * | 2018-07-24 | 2019-01-08 | 高新兴科技集团股份有限公司 | A kind of face characteristic comparison method based on cuda technology |
CN109997115A (en) * | 2016-11-23 | 2019-07-09 | 超威半导体公司 | Low-power and low latency GPU coprocessor for persistently calculating |
CN110765965A (en) * | 2019-10-30 | 2020-02-07 | 兰州理工大学 | Quick dictionary learning algorithm for sparse representation of mechanical vibration signals |
US10769464B2 (en) | 2017-09-12 | 2020-09-08 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Facial recognition method and related product |
CN112001865A (en) * | 2020-09-02 | 2020-11-27 | 广东工业大学 | Face recognition method, device and equipment |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101976453A (en) * | 2010-09-26 | 2011-02-16 | 浙江大学 | GPU-based three-dimensional face expression synthesis method |
CN102521581B (en) * | 2011-12-22 | 2014-02-19 | 刘翔 | Parallel face recognition method with biological characteristics and local image characteristics |
CN102737234B (en) * | 2012-06-21 | 2015-08-12 | 北京工业大学 | Based on the face identification method of Gabor filtering and joint sparse model |
CN102915436B (en) * | 2012-10-25 | 2015-04-15 | 北京邮电大学 | Sparse representation face recognition method based on intra-class variation dictionary and training image |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104318522A (en) * | 2014-10-08 | 2015-01-28 | 苏州新视线文化科技发展有限公司 | Graphics processing unit-based sparse representation fast calculation method |
CN106407995A (en) * | 2016-04-01 | 2017-02-15 | 中国地质大学(武汉) | Image data set sparse expression acceleration method and apparatus |
CN106485202A (en) * | 2016-09-18 | 2017-03-08 | 南京工程学院 | Unconfinement face identification system and method |
CN109997115A (en) * | 2016-11-23 | 2019-07-09 | 超威半导体公司 | Low-power and low latency GPU coprocessor for persistently calculating |
CN108256345A (en) * | 2016-12-28 | 2018-07-06 | 中移(杭州)信息技术有限公司 | A kind of picture method for secret protection, apparatus and system |
US10769464B2 (en) | 2017-09-12 | 2020-09-08 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Facial recognition method and related product |
CN107886519A (en) * | 2017-10-17 | 2018-04-06 | 杭州电子科技大学 | Multichannel chromatogram three-dimensional image fast partition method based on CUDA |
CN108921088A (en) * | 2018-06-29 | 2018-11-30 | 佛山市顺德区中山大学研究院 | A kind of face identification method based on discriminate target equation |
CN108921088B (en) * | 2018-06-29 | 2022-03-04 | 佛山市顺德区中山大学研究院 | Face recognition method based on discriminant target equation |
CN109165554A (en) * | 2018-07-24 | 2019-01-08 | 高新兴科技集团股份有限公司 | A kind of face characteristic comparison method based on cuda technology |
CN110765965A (en) * | 2019-10-30 | 2020-02-07 | 兰州理工大学 | Quick dictionary learning algorithm for sparse representation of mechanical vibration signals |
CN110765965B (en) * | 2019-10-30 | 2023-09-15 | 兰州理工大学 | Quick dictionary learning algorithm for sparse representation of mechanical vibration signals |
CN112001865A (en) * | 2020-09-02 | 2020-11-27 | 广东工业大学 | Face recognition method, device and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN104063714B (en) | 2016-05-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20160518 Termination date: 20170720 |