CN109165554B - Human face feature comparison method based on cuda technology


Info

Publication number
CN109165554B
Authority
CN
China
Legal status
Active
Application number
CN201810816840.3A
Other languages
Chinese (zh)
Other versions
CN109165554A
Inventor
关喜记
江盛欣
劳定雄
洪曙光
黄仝宇
汪刚
宋一兵
侯玉清
刘双广
Current Assignee
Gosuncn Technology Group Co Ltd
Original Assignee
Gosuncn Technology Group Co Ltd
Priority date
2018-07-24
Filing date
2018-07-24
Publication date
2021-09-24

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/172 Classification, e.g. identification

Abstract

The invention belongs to the field of biometric recognition and relates in particular to a human face feature comparison method based on CUDA technology. First, building on the CUDA hardware architecture and memory access mechanism, the way threads access global memory is changed from one thread accessing an entire feature block to one thread accessing one byte within a feature block. This achieves coalesced (merged) access to global memory, reduces the number of global memory accesses made by each thread, and increases the speed of comparing a single face feature. Second, each face feature comparison request is cached, and all cached requests are merged according to a preset rule before the feature comparison is performed. Finally, the comparison result is split into individual results according to a preset rule and reported to the users, which improves the concurrency efficiency of face feature comparison. For the same face feature library capacity and the same number of concurrent requests, the scheme can effectively reduce server resources and save hardware cost.

Description

Human face feature comparison method based on cuda technology
Technical Field
The invention relates to the technical field of biometric recognition, and in particular to a human face feature comparison method based on CUDA technology.
Background
At present, biometric identification based on face features identifies a face mainly by comparing face features. After features are extracted from a face in the current image, they are compared one by one with the massive number of face features in a preset face feature library to obtain a similarity score for each comparison. The scores are then sorted in descending order, and the faces with the highest similarity that meet a threshold are kept and output as the result.
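The comparison pipeline described above (score every library feature, sort in descending order, keep the best hits that meet a threshold) can be sketched in Python as follows. The patent does not name the similarity measure, so cosine similarity is an assumption here, and `search`, `cosine_similarity` and the dict-based library are hypothetical names used only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query: Sequence[float], library: Dict[str, Sequence[float]],
           threshold: float, top_k: int) -> List[Tuple[str, float]]:
    # Compare the query with every feature in the library, one by one.
    scores = [(face_id, cosine_similarity(query, feat)) for face_id, feat in library.items()]
    # Keep only the comparisons whose similarity meets the threshold.
    hits = [hit for hit in scores if hit[1] >= threshold]
    # Sort by similarity in descending order and keep the best top_k results.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_k]

library = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
print(search([1.0, 0.0], library, threshold=0.5, top_k=2))
```

With a real library of a million features, this linear scan is exactly the cost that the prior-art schemes and the invention try to reduce.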
Safe City projects are currently being built intensively across the country and are generating massive amounts of preprocessed face feature data. An effective face feature comparison method can therefore help public security investigators quickly identify the real identity of a specific person, and provides effective support for public security video investigation, public security management, criminal investigation and case filing, and similar work.
In prior-art scheme 1, face feature values are compared using a CPU (central processing unit) as the hardware carrier. The comparison is performed with multithreading, as follows: 1. the preset face feature library is distributed evenly across different threads; 2. each thread compares the current feature one by one with its part of the library and outputs the comparison results with the highest similarity that meet a threshold; 3. the outputs of all threads are collected and sorted in descending order, and the comparison results with the highest similarity that meet the threshold are output. Because this scheme uses a CPU hardware architecture, the comparison speed is, first, limited by the CPU clock frequency, bus bandwidth and memory access speed. Second, the time needed for concurrent face feature comparison grows in proportion to the number of features compared, i.e. roughly linearly. Therefore, when the face feature library capacity and the maximum comparison time allowed by the user are both fixed, increasing the number of concurrent comparisons requires adding comparison servers for horizontal scaling, which increases hardware cost.
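A minimal Python sketch of prior-art scheme 1, assuming features are L2-normalized so that a plain dot product serves as the similarity score. `cpu_search`, `shard_top_k` and the round-robin split are hypothetical choices; because CPython threads share the global interpreter lock, the sketch shows the structure of the scheme rather than a real parallel speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Hit = Tuple[str, float]
Library = List[Tuple[str, Sequence[float]]]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    # Equals cosine similarity when both features are L2-normalized (assumed).
    return sum(x * y for x, y in zip(a, b))

def shard_top_k(query: Sequence[float], shard: Library, threshold: float, k: int) -> List[Hit]:
    # Step 2: one thread compares the query with its own shard and keeps its best k hits.
    hits = [(fid, dot(query, feat)) for fid, feat in shard]
    return heapq.nlargest(k, (h for h in hits if h[1] >= threshold), key=lambda h: h[1])

def cpu_search(query: Sequence[float], library: Library, threshold: float, k: int,
               num_threads: int = 4) -> List[Hit]:
    # Step 1: distribute the library evenly across the threads (round-robin split).
    shards = [library[i::num_threads] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(lambda s: shard_top_k(query, s, threshold, k), shards))
    # Step 3: merge the per-thread results, sort descending and keep the global top k.
    merged = [hit for part in partials for hit in part]
    return heapq.nlargest(k, merged, key=lambda h: h[1])

lib = [(f"id{i}", [float(i), 1.0]) for i in range(10)]
print(cpu_search([1.0, 0.0], lib, threshold=3.0, k=3))  # [('id9', 9.0), ('id8', 8.0), ('id7', 7.0)]
```

Each thread only needs to return its own top k, because the global top k must be contained in the union of the per-thread top-k lists.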
In prior-art scheme 2, face feature values are compared using a GPU as the hardware carrier. The comparison relies on the GPU's strong floating-point capability and high-performance parallel computing, as follows: 1. feature comparison against the preset face feature library is implemented by writing a device-side GPU (graphics processing unit) kernel function; 2. one thread compares one face feature, giving very large-scale parallel computation; 3. all comparison results are written to GPU global memory; 4. the results are copied from the GPU to the CPU and sorted in descending order, and the comparison results with the highest similarity that meet a threshold are output; 5. the feature comparison requests are executed serially. Scheme 2 has two drawbacks. First, because a single GPU thread compares an entire face feature, accesses to GPU memory are not coalesced, global memory bandwidth is not used efficiently, and global memory access is slow. Second, as in prior-art scheme 1, the feature comparison tasks are executed serially, so concurrent comparison efficiency is low.
Disclosure of Invention
The object of the invention is to provide a human face feature comparison method based on CUDA technology that solves two problems of the prior art: the low concurrency efficiency of face feature comparison, and the increased hardware cost caused by having to add comparison servers for horizontal scaling.
The invention is realized by the following technical scheme:
a face feature comparison method based on CUDA technology comprises the following steps: all face features of the target library are loaded into GPU memory in advance, with the memory kept contiguous and aligned, and GPU threads access the feature memory so that memory accesses are coalesced; then, building on the CUDA hardware architecture and concurrency technology, the users' feature comparison requests are combined so that multiple feature comparison requests are handled in one kernel function call; finally, the feature comparison results are separated according to a preset rule so that each result is matched to its request.
The coalesced memory access method specifically comprises the following steps:
a. obtain valid face features;
b. each thread computes only the feature data whose index satisfies the index condition. The index condition is starting index + stride × count, where the starting index is each thread's own initial value, the stride is the number of threads in the thread block (Block), and the count is an integer incremented from an initial value of 0 and bounded by the feature length divided by the number of threads in the thread block (Block);
c. compute the similarity and write it back to memory;
d. determine whether the comparison is complete; if so, finish; otherwise, return to step a.
In this calculation mode, each face feature is computed by one thread block.
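To make step b concrete, the following Python sketch simulates one thread block serially: `tid` stands in for CUDA's `threadIdx.x` and `block_dim` for `blockDim.x`. It assumes a dot-product similarity, a block size that is a power of two and divides the feature length, and a shared-memory style tree reduction for step c; none of these details is stated in the patent, and the function names are hypothetical.

```python
from typing import List, Sequence

def thread_indices(tid: int, block_dim: int, feature_len: int) -> List[int]:
    """Elements read by thread `tid`: starting index tid, stride block_dim."""
    assert feature_len % block_dim == 0, "sketch assumes block_dim divides feature_len"
    iterations = feature_len // block_dim
    return [tid + block_dim * k for k in range(iterations)]

def block_similarity(query: Sequence[float], feature: Sequence[float], block_dim: int) -> float:
    """Simulate one thread block comparing one library feature with the query."""
    assert block_dim > 0 and block_dim & (block_dim - 1) == 0, "tree reduction needs a power of two"
    # Step b: every simulated thread accumulates a partial dot product over its strided elements.
    partial = [0.0] * block_dim
    for tid in range(block_dim):
        for i in thread_indices(tid, block_dim, len(feature)):
            partial[tid] += query[i] * feature[i]
    # Step c: combine the partial sums with a tree reduction, as a block would in shared memory.
    stride = block_dim // 2
    while stride > 0:
        for tid in range(stride):
            partial[tid] += partial[tid + stride]
        stride //= 2
    return partial[0]

print(thread_indices(1, 4, 12))                                    # [1, 5, 9]
print(block_similarity([1.0] * 8, [float(i) for i in range(8)], 4))  # 28.0
```

At any fixed count k, threads 0, 1, 2, ... read consecutive elements `tid + block_dim * k`, which is the access pattern that lets the hardware coalesce the loads.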
Preferably, each feature comparison request is cached according to a first preset rule.
The feature comparison request caching method specifically comprises the following steps:
a1. receive a feature comparison request and cache it according to the first preset rule;
b1. wait for the feature comparison result to be returned;
c1. report the feature comparison result, and end.
The method further comprises obtaining a certain number of cached feature comparison requests and merging them according to a second preset rule.
The method further comprises performing feature comparison after the merging, and then separating the feature comparison results according to the unique task identifier SN.
Preferably, following the CUDA hardware architecture and memory access mechanism, GPU threads access the face feature data in memory contiguously, which achieves coalesced access to the global feature memory, reduces the number of memory accesses and increases the feature comparison speed.
Preferably, multiple feature comparison requests are merged into one request, which reduces the number of times the GPU comparison kernel function is called and increases the number of concurrent feature comparisons.
With the technical scheme of the invention, first, building on the CUDA hardware architecture and memory access mechanism, the way threads access global memory is changed from one thread accessing an entire feature block to one thread accessing one byte within a feature block. This achieves coalesced access to global memory, reduces the number of global memory accesses made by each thread, and increases the speed of comparing a single face feature. Second, each face feature comparison request is cached, all cached requests are merged according to a preset rule before feature comparison, and finally the comparison result is split into individual results according to the preset rule and reported to the users, which improves the concurrency efficiency of face feature comparison.
The technical scheme of the invention uses the CUDA hardware architecture, whose parallel processing and floating-point capabilities are clearly stronger than those of a CPU (central processing unit), which improves face feature comparison efficiency. By changing the calculation mode so that each face feature is computed by one thread block (Block) instead of one thread, coalesced access to GPU memory is achieved and the comparison speed is roughly doubled. In addition, a method for merging comparison requests is added, so that multiple kernel function calls are replaced by a single call, which increases the concurrency of face feature comparison. For the same face feature library capacity and the same number of concurrent requests, the scheme can effectively reduce server resources and save hardware cost.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings.
FIG. 1 is a flow chart of the coalesced access method for GPU global memory according to the invention;
FIG. 2 is a flow chart of face feature comparison request caching according to the invention;
FIG. 3 is a data structure diagram for merging face feature comparison requests according to the invention;
FIG. 4 is a flow chart of face feature comparison request merging according to the invention.
Detailed Description
The invention is described in further detail below with reference to specific embodiments and with reference to the attached drawings.
In this scheme, all face features of the face feature library must be loaded into GPU global memory in advance, with the memory blocks kept contiguous and aligned.
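A sketch of how a host program might prepare such a buffer before copying it to the GPU, assuming 32-bit float features and rows padded to a 128-byte boundary (both assumptions; the patent only requires contiguity and alignment). `pack_library` and `padded_row_len` are hypothetical names, and the zero padding does not change dot-product scores.

```python
import array
from typing import Sequence, Tuple

FLOAT_BYTES = 4  # features stored as 32-bit floats (assumption)

def padded_row_len(feature_len: int, align_bytes: int = 128) -> int:
    """Float slots per row so that every row starts on an align_bytes boundary."""
    per_align = align_bytes // FLOAT_BYTES
    return -(-feature_len // per_align) * per_align  # round up to a multiple of per_align

def pack_library(features: Sequence[Sequence[float]],
                 align_bytes: int = 128) -> Tuple[array.array, int]:
    """Pack all features into one contiguous buffer, one zero-padded row per feature."""
    feature_len = len(features[0])
    row_len = padded_row_len(feature_len, align_bytes)
    buf = array.array("f", [0.0]) * (row_len * len(features))
    for row, feat in enumerate(features):
        assert len(feat) == feature_len, "all features must have the same length"
        start = row * row_len
        buf[start:start + feature_len] = array.array("f", feat)
    return buf, row_len

buf, row_len = pack_library([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(row_len, len(buf))  # 32 64
```

The packed buffer would then be copied to GPU global memory once and reused for every comparison; the sketch assumes the buffer's base address is itself suitably aligned, which CUDA's allocation routines provide.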
Based on the hardware architecture and memory access mechanism of CUDA (Compute Unified Device Architecture), the method achieves coalesced access to GPU global memory by computing each feature with one thread block, which reduces the number of global memory accesses made by each thread and makes better use of global memory bandwidth. This increases the comparison speed of a single feature comparison request. The specific steps, shown in FIG. 1, are:
step a, obtain valid features; valid features are face features that can be effectively detected and recognized;
step b, each thread (Thread) computes only the feature data whose index satisfies the index condition (tid index), where the index condition is starting index + stride × count. The starting index is each thread's own initial value, and the stride is the number of threads in the thread block (Block); the count is an integer incremented from an initial value of 0 and bounded by the feature length divided by the number of threads in the thread block (Block); the feature length is the numerical length of the face feature.
step c, compute the similarity and write it back to memory;
step d, determine whether the comparison is complete; if so, end the process; otherwise, return to step a.
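Assuming the number of threads per block N divides the feature length L, the index condition in step b and the resulting similarity can be written as follows. The dot-product form of s(q, f) is an assumption; the patent only says that a similarity is computed. The second line shows that every element is read by exactly one thread, and for a fixed count k the N threads touch the consecutive indices kN, ..., kN + N - 1.

```latex
% N: threads per block, L: feature length (assumed to be a multiple of N)
% t: starting index of a thread (0 <= t < N); q, f: query and library features
I_t = \{\, t + kN \;:\; k = 0, 1, \dots, L/N - 1 \,\}, \qquad 0 \le t < N

i = t + kN \iff t = i \bmod N,\ \ k = \lfloor i/N \rfloor
\quad\Longrightarrow\quad \bigsqcup_{t=0}^{N-1} I_t = \{0, 1, \dots, L - 1\}

s(q, f) = \sum_{t=0}^{N-1} \sum_{k=0}^{L/N-1} q_{t+kN}\, f_{t+kN} = \sum_{i=0}^{L-1} q_i\, f_i
```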
Because global memory access is relatively slow, taking 400-600 clock cycles, optimizing global memory access is particularly important. In this scheme, besides keeping the memory addresses of the face feature library contiguous and aligned, the face feature calculation mode is changed so that one feature is computed by one thread block instead of one thread. This guarantees that the threads access global memory at consecutive, aligned addresses in one-to-one correspondence, so that their accesses can be coalesced, the number of memory accesses is reduced, and the speed of a single feature comparison request is increased. With this strategy, the comparison speed for a single face feature nearly doubles with a face feature library of 1,000,000 features.
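The following Python model illustrates why the block-per-feature layout coalesces while the thread-per-feature layout of prior-art scheme 2 does not. It is a deliberately simplified, hypothetical model: one 32-thread warp, 1-byte elements, 128-byte memory segments, 512-byte features in the example call, and no caches. Real GPUs cache recently used lines, so the model exaggerates the gap and does not predict the roughly 2× end-to-end gain reported in the text; it only counts how many distinct segments each warp-wide load touches.

```python
from typing import List

WARP_SIZE = 32       # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128  # simplified transaction size used by this model (assumption)
ELEM_BYTES = 1       # one byte per feature element, following the abstract (assumption)

def segments_touched(addresses: List[int]) -> int:
    """Distinct SEGMENT_BYTES-sized segments hit by one warp-wide load."""
    return len({addr // SEGMENT_BYTES for addr in addresses})

def transactions_thread_per_feature(feature_len: int) -> int:
    """Prior scheme 2: lane w owns feature w and reads its element j at step j."""
    total = 0
    for j in range(feature_len):
        addrs = [(w * feature_len + j) * ELEM_BYTES for w in range(WARP_SIZE)]
        total += segments_touched(addrs)
    return total

def transactions_block_per_feature(feature_len: int) -> int:
    """This scheme: the warp reads the same WARP_SIZE features one after another,
    lane w reading element w + k * WARP_SIZE of the current feature at step k."""
    total = 0
    for f in range(WARP_SIZE):
        base = f * feature_len * ELEM_BYTES
        for k in range(feature_len // WARP_SIZE):
            addrs = [base + (w + k * WARP_SIZE) * ELEM_BYTES for w in range(WARP_SIZE)]
            total += segments_touched(addrs)
    return total

print(transactions_thread_per_feature(512), transactions_block_per_feature(512))  # 16384 512
```

Both functions read the same 32 features; in the first layout every load scatters across 32 segments, while in the second each load falls inside a single segment.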
Using the CUDA concurrency technique, multiple feature values were compared within one kernel function; the measured execution times are shown in Table 1. The benefit becomes more pronounced as the number of compared feature values increases.
TABLE 1
Number of feature values compared    Time, separate calls (ms)    Time, merged call (ms)
1                                    31                           31
2                                    60                           46
10                                   929                          494
12                                   1213                         605
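Table 1 shows merged calls growing more slowly than separate calls. The patent does not explain why; one plausible reason, assumed in this Python sketch, is that a merged launch reads each library feature once and reuses it for every request in the batch. `compare_kernel`, `LaunchCounter`, `merged` and `unmerged` are hypothetical names, and the kernel is simulated on the CPU.

```python
from typing import List, Sequence

Matrix = List[List[float]]

class LaunchCounter:
    """Hypothetical helper that counts simulated kernel launches."""
    def __init__(self) -> None:
        self.launches = 0

def compare_kernel(queries: Sequence[Sequence[float]], library: Sequence[Sequence[float]],
                   counter: LaunchCounter) -> Matrix:
    """One simulated kernel launch: score every query against every library feature."""
    counter.launches += 1
    scores = [[0.0] * len(library) for _ in queries]
    for j, feat in enumerate(library):        # each library feature is read once ...
        for q, query in enumerate(queries):   # ... and reused for every query in the batch
            scores[q][j] = sum(a * b for a, b in zip(query, feat))
    return scores

def unmerged(queries, library, counter: LaunchCounter) -> Matrix:
    # One launch per request, as in serial execution.
    return [compare_kernel([q], library, counter)[0] for q in queries]

def merged(queries, library, counter: LaunchCounter) -> Matrix:
    # All requests handled by a single launch.
    return compare_kernel(queries, library, counter)

library = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[1.0, 2.0], [3.0, 0.0]]
c1, c2 = LaunchCounter(), LaunchCounter()
assert unmerged(queries, library, c1) == merged(queries, library, c2)
print(c1.launches, c2.launches)  # 2 1
```

The scores are identical either way; only the number of launches (and, on a GPU, the number of passes over the library in global memory) changes.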
Based on the test data in Table 1, individual face feature comparison requests are merged so that the face feature comparison kernel function, previously called many times, is called only once, which increases the concurrency of face feature comparison. The technical scheme is implemented through the following two sub-processes.
1. Caching process of feature comparison request
After a feature comparison request is received, it is automatically assigned a unique identifier (SN) and cached in order; the requesting thread then blocks, waiting for a task completion notification from the feature comparison thread. When the completion notification is received, the comparison result is reported to the user, completing the feature comparison request. The specific steps, shown in FIG. 2, are:
step a1, receive a feature comparison request and cache it according to the rule;
step b1, wait for the comparison result to be returned;
step c1, report the comparison result, and end the process.
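A minimal Python sketch of steps a1 to c1, assuming FIFO caching with an incrementing integer SN and a per-request `threading.Event` for the blocking wait. The patent does not define its "first preset rule", so these choices, and the names `RequestCache`, `PendingRequest` and `demo_worker`, are hypothetical.

```python
import itertools
import queue
import threading
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class PendingRequest:
    sn: int                       # unique identifier assigned on arrival
    feature: List[float]
    done: threading.Event = field(default_factory=threading.Event)
    result: Any = None

class RequestCache:
    def __init__(self) -> None:
        self._next_sn = itertools.count(1)
        self._lock = threading.Lock()
        self.pending: "queue.Queue[PendingRequest]" = queue.Queue()

    def submit(self, feature: List[float], timeout: Optional[float] = None) -> Any:
        # Step a1: assign an SN and cache the request; the lock keeps SN order equal to queue order.
        with self._lock:
            req = PendingRequest(sn=next(self._next_sn), feature=feature)
            self.pending.put(req)
        # Step b1: block until the comparison thread signals that the result is ready.
        if not req.done.wait(timeout):
            raise TimeoutError(f"comparison request {req.sn} timed out")
        # Step c1: report the result to the caller.
        return req.result

def demo_worker(cache: RequestCache) -> None:
    """Stand-in for the comparison thread: serves one request and signals completion."""
    req = cache.pending.get()
    req.result = (req.sn, sum(req.feature))
    req.done.set()

cache = RequestCache()
threading.Thread(target=demo_worker, args=(cache,)).start()
print(cache.submit([1.0, 2.0], timeout=5.0))  # (1, 3.0)
```

Setting `result` before calling `done.set()` ensures the waiting thread never observes a completed request without its result.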
2. Merged feature comparison request flow
A certain number of face feature comparison requests are taken from the cache queue, and the face feature comparison tasks are merged using the data structure in FIG. 3, which consists, in order, of the unique identifier SN, the feature data, the channel information and the timestamp; an upper limit is set on the number of merged tasks. The feature comparison kernel function is then called, each task request is matched with its comparison result according to the task's unique identifier SN, and finally the waiting threads are notified that their feature comparison requests are complete. The specific steps, shown in FIG. 4, are:
step a2, obtain a certain number of cached feature comparison tasks;
step b2, merge them according to a preset rule;
step c2, perform feature comparison;
step d2, separate the feature results according to the task SN;
step e2, notify the waiting threads that the comparison has finished.
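A matching Python sketch of steps a2 to e2. The `Task` fields follow the order given for FIG. 3; the batching rule (take tasks in arrival order up to `max_batch`, which plays the role of the upper limit on merged tasks) and the assumption that `compare` returns results in input order are illustrative choices, not details from the patent. `take_batch` and `run_merged` are hypothetical names.

```python
import queue
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class Task:
    """One cached comparison task, with the fields FIG. 3 lists, in the same order."""
    sn: int
    feature: List[float]
    channel: str
    timestamp: float

def take_batch(cache: "queue.Queue[Task]", max_batch: int) -> List[Task]:
    """Step a2: take up to max_batch cached tasks without blocking."""
    batch: List[Task] = []
    while len(batch) < max_batch:
        try:
            batch.append(cache.get_nowait())
        except queue.Empty:
            break
    return batch

def run_merged(batch: List[Task],
               compare: Callable[[List[List[float]]], List[Any]],
               notify: Callable[[int, Any], None]) -> Dict[int, Any]:
    # Step b2: merge the tasks (here: collect their features in order).
    features = [task.feature for task in batch]
    # Step c2: one comparison call for the whole batch; assumed to return results in input order.
    results = compare(features)
    # Step d2: split the batched results back out by task SN.
    by_sn = {task.sn: result for task, result in zip(batch, results)}
    # Step e2: tell each waiting thread that its comparison has finished.
    for sn, result in by_sn.items():
        notify(sn, result)
    return by_sn
```

In a full service, `notify` would store the result on the matching pending request and wake the thread blocked in step b1.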
Because the scheme uses the CUDA hardware architecture, its parallel processing and floating-point capabilities are clearly stronger than those of a CPU hardware architecture. For example, with a face feature library of 1,000,000 features, a search takes 113 ms on a CPU processor and 31 ms on CUDA-based hardware. The feature comparison efficiency of the scheme is therefore clearly higher than that of prior-art scheme 1.
Compared with prior-art scheme 2, the method changes the face feature calculation mode from one thread per feature to one thread block per feature, which achieves coalesced access to GPU memory and roughly doubles the comparison speed. In addition, a method for merging comparison requests is added, so that multiple kernel function calls are replaced by a single call, which increases the concurrency of face feature comparison. Therefore, for the same storage capacity and number of concurrent requests, the scheme can save server resources and hardware cost.
The embodiments above explain the objects, technical solutions and advantages of the invention in further detail. It should be understood that they are only examples of the invention and are not intended to limit its scope; any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the invention falls within its protection scope.

Claims (7)

1. A face feature comparison method based on CUDA technology, characterized by comprising the following steps: loading all face features of the target library into GPU (graphics processing unit) memory in advance, keeping the memory contiguous and aligned, and accessing the feature memory with GPU threads so that memory accesses are coalesced; building on the CUDA hardware architecture and concurrency technology, caching each feature comparison request according to a first preset rule, combining the users' feature comparison requests so that multiple feature comparison requests are handled in one kernel function, and finally separating the feature comparison results according to a preset rule so that each result is matched to its request;
the coalesced memory access method specifically comprises the following steps:
a. obtaining valid face features;
b. each thread computing only the feature data whose index satisfies the index condition;
c. computing the similarity and writing it back to memory;
d. determining whether the comparison is complete; if so, finishing; otherwise, returning to step a;
in the face feature calculation mode, one thread block computes one face feature value.
2. The face feature comparison method based on CUDA technology according to claim 1, wherein the index condition is starting index + stride × count; the starting index is each thread's own initial value, the stride is the number of threads in the thread block, the count is an integer incremented from an initial value of 0, and the maximum value of the count equals the feature length divided by the number of threads in the thread block.
3. The face feature comparison method based on CUDA technology according to claim 1, wherein the feature comparison request caching method specifically comprises the following steps:
a1. receiving a feature comparison request and caching it according to the first preset rule;
b1. waiting for the feature comparison result to be returned;
c1. reporting the feature comparison result, and ending.
4. The face feature comparison method based on CUDA technology according to claim 1, wherein, following the CUDA hardware architecture and memory access mechanism, GPU threads access the face feature data in memory contiguously, so as to achieve coalesced access to the global feature memory, reduce the number of memory accesses and increase the feature comparison speed.
5. The face feature comparison method based on CUDA technology according to claim 1, wherein multiple feature comparison requests are merged into one request, which reduces the number of times the GPU calls the kernel function and increases the number of concurrent feature comparisons.
6. The face feature comparison method based on CUDA technology according to claim 1, further comprising obtaining a certain number of cached feature comparison requests and merging them according to a second preset rule.
7. The face feature comparison method based on CUDA technology according to claim 6, further comprising performing feature comparison after the merging, and then separating the feature comparison results according to the task SN.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810816840.3A CN109165554B (en) 2018-07-24 2018-07-24 Human face feature comparison method based on cuda technology

Publications (2)

Publication Number Publication Date
CN109165554A 2019-01-08
CN109165554B 2021-09-24

Family

ID=64898241

Country Status (1)

Country Link
CN (1) CN109165554B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368020A (en) * 2020-02-10 2020-07-03 浙江大华技术股份有限公司 Feature vector comparison method and device and storage medium
CN113326714B (en) * 2020-02-28 2024-03-22 杭州海康威视数字技术股份有限公司 Target comparison method, target comparison device, electronic equipment and readable storage medium
CN114595070B (en) * 2022-05-10 2022-08-12 上海登临科技有限公司 Processor, multithreading combination method and electronic equipment

Citations (1)

Publication number Priority date Publication date Assignee Title
KR20170137273A (en) * 2016-06-02 2017-12-13 중앙대학교 산학협력단 Apparatus and Method for Pedestrian Detection using Deformable Part Model

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN102521581B (en) * 2011-12-22 2014-02-19 刘翔 Parallel face recognition method with biological characteristics and local image characteristics
CN104063714B (en) * 2014-07-20 2016-05-18 詹曙 Fast face recognition algorithm for video surveillance based on CUDA parallel computing and sparse representation
KR101656373B1 (en) * 2014-10-15 2016-09-12 서울시립대학교 산학협력단 Face identifying method, face identifying apparatus and computer program executing the method
CN106228628B (en) * 2016-07-15 2021-03-26 腾讯科技(深圳)有限公司 Check-in system, method and device based on face recognition

Non-Patent Citations (1)

Title
Li chao Sun, et al., "Acceleration algorithm for CUDA-based face detection", ICSPCC 2013, 2013-11-14, pp. 1-5 *

Also Published As

Publication number Publication date
CN109165554A (en) 2019-01-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant