CN109165554B - Human face feature comparison method based on cuda technology

Info

Publication number: CN109165554B
Application number: CN201810816840.3A
Authority: CN
Inventors: 关喜记; 江盛欣; 劳定雄; 洪曙光; 黄仝宇; 汪刚; 宋一兵; 侯玉清; 刘双广
Original assignee: Gosuncn Technology Group Co Ltd
Current assignee: Gosuncn Technology Group Co Ltd
Priority date: 2018-07-24
Filing date: 2018-07-24
Publication date: 2021-09-24
Anticipated expiration: 2038-07-24
Also published as: CN109165554A

Abstract

The invention belongs to the field of biometric recognition, and particularly relates to a face feature comparison method based on CUDA technology. First, based on the hardware architecture and memory access mechanism of CUDA, the way threads access global memory is changed from one thread accessing one feature block to one thread accessing one byte within a feature block, so that accesses to global memory are coalesced (merged), the number of global memory accesses by threads is reduced, and the comparison speed for a single face feature is improved. Second, each face feature comparison request is cached, and the cached requests are merged according to a preset rule before feature comparison is performed. Finally, the comparison result is split into independent results according to a preset rule and reported to the users, thereby improving the concurrency efficiency of face feature comparison. For the same face feature library capacity and the same number of concurrent requests, the scheme can effectively reduce server resources and save hardware cost.

Description

Human face feature comparison method based on cuda technology

Technical Field

The invention relates to the technical field of biometric recognition, in particular to a face feature comparison method based on CUDA technology.

Background

At present, biometric identification based on face features completes the identification of a face mainly through effective comparison of face features. After features are extracted from the face in the current image, they are compared one by one against the massive set of face features in a preset face feature library to obtain a similarity score for each. The similarity scores are then sorted in descending order, and the faces with the highest similarity that meet a threshold are kept as the output.
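The compare, sort, and threshold flow described above can be sketched in a few lines. This is a minimal illustration only: the source does not name a similarity metric, so cosine similarity is an assumption, and `compare_one_by_one`, `threshold`, and `top_n` are hypothetical names.

```python
import math

def cosine_similarity(a, b):
    """Similarity score; the metric is an assumption, the source does not name one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def compare_one_by_one(query, library, threshold, top_n):
    """Score the query against every library entry, sort descending, keep the best hits."""
    scored = [(face_id, cosine_similarity(query, feat)) for face_id, feat in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(face_id, score) for face_id, score in scored if score >= threshold][:top_n]

# Tiny hypothetical library of 3-dimensional features.
library = {
    "face_a": [1.0, 0.0, 0.0],
    "face_b": [0.9, 0.1, 0.0],
    "face_c": [0.0, 1.0, 0.0],
}
hits = compare_one_by_one([1.0, 0.0, 0.0], library, threshold=0.8, top_n=2)
```

Here `face_c` falls below the threshold and is dropped, while the remaining two entries are returned in descending order of score.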

At present, safe city construction is proceeding rapidly across the country and is generating massive amounts of preprocessed face feature data. An effective face feature comparison method can therefore help public security investigators quickly identify the true identity of a specific person, and provides effective support for public security video investigation, public security management, criminal investigation case filing, and similar work.

In prior art scheme 1, a CPU (Central Processing Unit) serves as the hardware carrier for comparing face feature values. The comparison is carried out with multithreading, in the following steps: 1. the preset face feature library is distributed evenly across different threads; 2. each thread compares the current feature one by one against its share of the feature library and outputs the comparison results with the highest similarity that meet a threshold; 3. the outputs of all threads are gathered and sorted in descending order, and the comparison results with the highest similarity that meet the threshold are output. Because this scheme runs on a CPU, the speed of feature comparison is, first, limited by the CPU clock frequency, bus bandwidth, and memory access speed. Second, the time taken by concurrent comparison is directly proportional to the number of features and grows roughly linearly. Therefore, when the face feature library capacity and the maximum comparison time allowed by the user are both fixed, increasing the number of concurrent face feature comparisons requires adding comparison servers for horizontal scaling, which raises hardware cost.
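The three steps of scheme 1 can be sketched as follows. This is a structural illustration under stated assumptions: a plain dot product stands in for the unspecified similarity metric, `cpu_search` and `search_partition` are hypothetical names, and Python threads are used only to show the partition-and-merge shape (they do not give true CPU parallelism for this kind of work).

```python
from concurrent.futures import ThreadPoolExecutor

def dot(a, b):
    # Stand-in similarity; the source does not specify the metric.
    return sum(x * y for x, y in zip(a, b))

def search_partition(query, partition, threshold):
    """Step 2: one thread scores its share of the library and keeps hits meeting the threshold."""
    hits = []
    for face_id, feature in partition:
        score = dot(query, feature)
        if score >= threshold:
            hits.append((face_id, score))
    return hits

def cpu_search(query, library, threshold, top_n, num_threads=4):
    items = list(library.items())
    # Step 1: distribute the library evenly across threads (round-robin slices).
    partitions = [items[i::num_threads] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partial = pool.map(lambda part: search_partition(query, part, threshold), partitions)
        # Step 3: gather the per-thread results.
        merged = [hit for hits in partial for hit in hits]
    merged.sort(key=lambda item: item[1], reverse=True)  # descending by score
    return merged[:top_n]

# Hypothetical 2-dimensional features; the score of face_i against the query is i / 10.
library = {f"face_{i}": [i / 10.0, 1.0 - i / 10.0] for i in range(10)}
result = cpu_search([1.0, 0.0], library, threshold=0.5, top_n=3)
```

Every thread still walks its whole partition per request, which is why the comparison time in this scheme scales with library size.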

In prior art scheme 2, a GPU serves as the hardware carrier for comparing face feature values. The comparison relies on the strong floating-point capability and high-performance parallel computing of the GPU, in the following steps: 1. a kernel function on the GPU (graphics processing unit) device side is written to perform feature comparison against the preset face feature library; 2. one thread completes the comparison of one face feature, in massively parallel computation; 3. all comparison results are written into GPU global video memory; 4. the comparison results are copied from the GPU to the CPU and sorted in descending order, and the comparison results with the highest similarity that meet a threshold are output; 5. feature comparison requests are executed serially, one after another. In scheme 2, first, because a single GPU thread completes the comparison of one face feature, accesses to GPU memory are not coalesced (merged), the bandwidth of global memory is not used efficiently, and global memory access is slowed down. Second, as in prior art scheme 1, the feature comparison tasks are executed serially, which results in low concurrency efficiency.

Disclosure of Invention

The invention aims to provide a face feature comparison method based on CUDA technology, in order to solve the problems in the prior art that the concurrency efficiency of face feature comparison is low and that hardware cost rises because comparison servers must be added for horizontal scaling.

The invention is realized by the following technical scheme:

A face feature comparison method based on CUDA technology comprises the following steps: loading all face features of the target library into GPU (graphics processing unit) video memory in advance, ensuring that the memory is contiguous and aligned, and accessing the feature memory through GPU threads so that memory accesses are coalesced (merged); and, on the basis of the CUDA hardware architecture and concurrency technology, merging the feature comparison requests of users so that a plurality of feature comparison requests are combined into one kernel function call, and finally separating the feature comparison results according to a preset rule so that each result is matched with its request.

The method for merging and accessing the memory specifically comprises the following steps:

a. obtaining effective human face features;

b. each thread computes only the feature data whose index satisfies the sequence number condition; the sequence number condition is: starting sequence number + step size × count, where the starting sequence number is the initial value of each thread, the step size is the number of threads in the thread block (Block), the count is an integer that starts at 0 and increments by one, and the maximum value of the count equals the feature length divided by the number of threads in the thread block (Block);

c. calculating the similarity and writing it back to memory;

d. judging whether the comparison is finished or not, and if so, finishing; otherwise, returning to the step a.

The face features are computed such that one thread block processes one face feature value.

Preferably, each feature comparison request is cached according to a first predetermined rule.

The caching method of the characteristic comparison request specifically comprises the following steps:

a1, receiving a characteristic comparison request, and caching according to the first preset rule;

b1, waiting for the feature comparison result to return;

c1, reporting the feature comparison result, and ending.

The method further comprises obtaining a certain number of cached feature comparison requests and then merging them according to a second preset rule.

The method further comprises performing feature comparison after the merging, and then separating the feature comparison results according to the unique task identifier SN.

Preferably, following the hardware architecture and memory access mechanism of CUDA, GPU threads access the memory data of the face features contiguously, so as to achieve coalesced (merged) access to the global feature memory, reduce the number of memory accesses, and improve the feature comparison speed.

Preferably, multiple feature comparison requests are combined into one request, which reduces the number of calls to the GPU comparison kernel function and increases the number of feature comparisons that can run concurrently.

By applying the technical scheme of the invention, first, based on the hardware architecture and memory access mechanism of CUDA, the way threads access global memory is changed from one thread accessing one feature block to one thread accessing one byte within a feature block, so that accesses to global memory are coalesced (merged), the number of global memory accesses by threads is reduced, and the comparison speed for a single face feature is improved. Second, each face feature comparison request is cached, the cached requests are merged according to a preset rule before feature comparison is performed, and finally the comparison result is split into independent results according to a preset rule and reported to the users, so that the concurrency efficiency of face feature comparison is improved.

The technical scheme of the invention adopts the CUDA hardware architecture, whose parallel processing and floating-point capabilities are markedly stronger than those of a CPU (central processing unit), which improves face feature comparison efficiency. By changing the way face features are computed, so that one face feature is processed by one thread block (Block) instead of one thread, coalesced (merged) access to GPU memory is achieved and the comparison speed is roughly doubled (improved by about 100%). In addition, a method for merging comparison requests is added, turning multiple kernel function calls into one call, which improves the concurrency of face feature comparison. For the same face feature library capacity and the same number of concurrent requests, the scheme can effectively reduce server resources and save hardware cost.

Drawings

The present invention is described in further detail below with reference to the accompanying drawings:

FIG. 1 is a flow chart of a merged access method for a GPU global memory of the present invention;

FIG. 2 is a flow chart of face feature comparison request caching according to the present invention;

FIG. 3 is a data structure diagram of the face feature comparison request merging according to the present invention;

FIG. 4 is a flow chart of the face feature comparison request merging according to the present invention.

Detailed Description

The invention is described in further detail below with reference to specific embodiments and with reference to the attached drawings.

According to the scheme, all face features of the face feature library must be loaded into GPU global memory in advance, with the memory blocks contiguous and aligned.

Based on the hardware architecture and memory access mechanism of CUDA (Compute Unified Device Architecture), the method computes one feature with one thread block, which solves the problem of coalesced (merged) access to GPU global memory, reduces the number of global memory accesses by threads, and makes better use of global memory bandwidth. The comparison speed of a single feature comparison request is thereby improved. The specific steps are shown in FIG. 1 and include:

Step a, obtaining valid features; valid features are face features that can be effectively detected and recognized;

Step b, each thread (Thread) computes only the feature data whose index satisfies the sequence number condition (tid sequence number), where the sequence number condition is: starting sequence number + step size × count. The starting sequence number is the initial value of each thread; the step size is the number of threads in the thread block (Block); the count is an integer that starts at 0 and increments by one, and its maximum value equals the feature length divided by the number of threads in the thread block (Block); the feature length is the numerical length of the face feature;

Step c, calculating the similarity and writing it back to memory;

Step d, judging whether the comparison is finished, and if so, ending the process; otherwise, returning to step a.
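The index rule in step b can be emulated on the CPU to check that the threads of one block cover every element of a feature exactly once. The sketch below is an illustration, not the patented kernel: it assumes the feature length is an exact multiple of the block size, treats the stated maximum of the count as an exclusive bound, and uses a dot product as a stand-in for the unspecified similarity. `thread_indices` and `block_similarity` are hypothetical names.

```python
def thread_indices(tid, block_size, feature_len):
    """Indices handled by one thread: start + step * count, for count = 0, 1, 2, ..."""
    counts = feature_len // block_size  # assumes feature_len is a multiple of block_size
    return [tid + block_size * count for count in range(counts)]

def block_similarity(query, feature, block_size):
    """Emulate one thread block scoring one feature (dot product as stand-in similarity)."""
    partial = [0.0] * block_size
    for tid in range(block_size):  # on a GPU these iterations would run in parallel
        for idx in thread_indices(tid, block_size, len(feature)):
            partial[tid] += query[idx] * feature[idx]
    # Summing the per-thread partials stands in for the in-block reduction
    # that would precede the write-back in step c.
    return sum(partial)

feature_len, block_size = 8, 4
query = [1.0] * feature_len
feature = [float(i) for i in range(feature_len)]
score = block_similarity(query, feature, block_size)
covered = sorted(i for t in range(block_size) for i in thread_indices(t, block_size, feature_len))
```

With a block of 4 threads and a feature of length 8, thread 1 handles indices 1 and 5, so at every value of the count the threads of the block touch neighbouring elements.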

Because global memory access is relatively slow, taking 400-600 clock cycles, optimizing global memory access speed is particularly important. In this scheme, besides keeping the memory addresses of the face feature library contiguous and aligned, the way face features are computed is changed so that one feature is computed by one thread block instead of one thread. This ensures that the threads access global memory at consecutive, aligned addresses, one per thread, so that their accesses can be coalesced (merged), the number of memory accesses is reduced, and a single feature comparison request completes faster. With this strategy, the comparison speed for a single face feature is nearly doubled (improved by nearly 100%) for a face feature library of 1,000,000 entries.
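A simplified model makes the effect of the access pattern visible. The sketch below counts how many aligned memory segments a group of 32 threads touches in one load under each layout. The 128-byte segment size, the 512-byte feature size, and the one-byte-per-thread load are illustrative assumptions; actual transaction sizes depend on the GPU generation.

```python
SEGMENT_BYTES = 128  # assumed transaction granularity; varies by GPU generation
WARP_SIZE = 32       # threads that issue a load together

def segments_touched(addresses):
    """Number of distinct aligned segments hit by one warp-wide load."""
    return len({addr // SEGMENT_BYTES for addr in addresses})

feature_bytes = 512  # hypothetical size of one face feature
step = 0             # which iteration of the per-thread loop is being examined

# One thread per feature: thread t reads byte `step` of feature t.
per_thread_layout = [t * feature_bytes + step for t in range(WARP_SIZE)]
# One block per feature: thread t reads byte (step * WARP_SIZE + t) of the same feature.
per_block_layout = [step * WARP_SIZE + t for t in range(WARP_SIZE)]

uncoalesced = segments_touched(per_thread_layout)
coalesced = segments_touched(per_block_layout)
```

Under these assumptions the one-thread-per-feature layout touches 32 separate segments per load, while the one-block-per-feature layout touches a single segment.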

Based on the concurrency technique of CUDA, we measured the execution time when the comparison of several feature values is performed within one kernel function; the results are shown in Table 1. The optimization effect becomes more pronounced as the number of feature values compared increases.

TABLE 1

Number of feature values compared	Unmerged call time (ms)	Merged call time (ms)
1	31	31
2	60	46
10	929	494
12	1213	605
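The trend in Table 1 is easier to read as a ratio of unmerged to merged time. The short calculation below uses only the figures from the table; the variable names are illustrative.

```python
table_1 = [
    # (feature values compared, unmerged call time in ms, merged call time in ms)
    (1, 31, 31),
    (2, 60, 46),
    (10, 929, 494),
    (12, 1213, 605),
]
# Ratio of unmerged to merged time for each row, rounded to two decimals.
speedups = {n: round(unmerged / merged, 2) for n, unmerged, merged in table_1}
```

The ratio rises from 1.0 for a single feature value to about 2.0 for twelve, consistent with the statement that the benefit grows with the number of feature values compared.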

Based on the test data in Table 1, individual face feature comparison requests are merged, so that the face feature comparison kernel function, previously called many times, is called once, which increases the concurrency of face feature comparison. The implementation of the technical scheme comprises the following 2 sub-processes.

1. Caching process of feature comparison request

After feature comparison requests are received, each request is automatically assigned a unique identifier (SN) and cached in order of arrival; the requesting thread then blocks, waiting for the task completion notification from the feature comparison thread. After the completion notification is received, the comparison result is reported to the user, which completes the feature comparison request. The specific steps are shown in FIG. 2 and include:

Step a1, receiving a feature comparison request and caching it according to the rule;

Step b1, waiting for the comparison result to return;

Step c1, reporting the comparison result, and ending the process.
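Steps a1 to c1 can be sketched with a queue and a per-request event. This is a minimal illustration under stated assumptions: `PendingRequest`, `submit_and_wait`, and `stub_comparison_worker` are hypothetical names, the SN is a simple counter, and the worker is a stub that returns the feature length instead of a real comparison result.

```python
import itertools
import queue
import threading

class PendingRequest:
    """One cached request; `done` is set when the comparison thread has filled `result`."""
    def __init__(self, sn, feature):
        self.sn = sn
        self.feature = feature
        self.result = None
        self.done = threading.Event()

request_queue = queue.Queue()      # cache of requests in arrival order
_sn_counter = itertools.count(1)   # hypothetical SN generator

def submit_and_wait(feature, timeout=5.0):
    request = PendingRequest(next(_sn_counter), feature)  # a1: assign a unique SN
    request_queue.put(request)                            # a1: cache the request
    if not request.done.wait(timeout):                    # b1: block until notified
        raise TimeoutError(f"request {request.sn} timed out")
    return request.sn, request.result                     # c1: report the result

def stub_comparison_worker():
    """Stand-in for the comparison thread; a real one would run the GPU comparison."""
    request = request_queue.get()
    request.result = len(request.feature)
    request.done.set()  # notify the waiting requester

worker = threading.Thread(target=stub_comparison_worker)
worker.start()
sn, result = submit_and_wait([0.1, 0.2, 0.3])
worker.join()
```

The timeout is a design choice added for the sketch so that a requester cannot block forever if the comparison thread fails; the source does not mention one.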

2. Merged feature comparison request flow

A certain number of face feature comparison requests are taken from the cache queue and merged according to the data structure in FIG. 3, which consists, in order, of the unique identifier SN, the feature data, the channel information, and the timestamp; an upper limit is set on the number of merged tasks. The feature comparison kernel function is then called, each task request is matched with its comparison result according to the unique identifier SN of the requested task, and finally the waiting threads are notified that their feature comparison requests are complete. The specific steps are shown in FIG. 4 and include:

Step a2, obtaining a certain number of cached feature comparison tasks;

Step b2, merging them according to a preset rule;

Step c2, performing feature comparison;

Step d2, separating the feature results according to the task SN;

Step e2, notifying the waiting threads that the comparison has ended.
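Steps a2 to d2 can be sketched as a pure function over a list of cached tasks. The field order of `CompareTask` follows the description of FIG. 3 (SN, feature data, channel information, timestamp); everything else is an assumption for illustration, including the names, the `MAX_BATCH` limit of 4, and the dot-product `compare_batch`, which stands in for the single merged kernel call. Step e2 (notifying waiting threads) is omitted.

```python
from dataclasses import dataclass

MAX_BATCH = 4  # hypothetical upper limit on the number of merged tasks

@dataclass
class CompareTask:
    sn: int           # unique identifier
    feature: list     # feature data
    channel: str      # channel information
    timestamp: float  # time the request was received

def compare_batch(features, library):
    """Stand-in for one merged kernel call: one row of scores per query feature."""
    return [[sum(q * f for q, f in zip(query, feat)) for feat in library]
            for query in features]

def process_cached_tasks(cached, library):
    batch = cached[:MAX_BATCH]                                   # a2: take a bounded number of tasks
    merged_features = [task.feature for task in batch]           # b2: merge into one input
    rows = compare_batch(merged_features, library)               # c2: a single comparison call
    results = {task.sn: row for task, row in zip(batch, rows)}   # d2: split results by SN
    return results, cached[MAX_BATCH:]                           # leftover tasks stay cached

library = [[1.0, 0.0], [0.0, 1.0]]
cached = [CompareTask(sn=i, feature=[float(i), 1.0], channel="cam-0", timestamp=0.0)
          for i in range(1, 6)]
results, remaining = process_cached_tasks(cached, library)
```

Keying the output by SN is what lets each result be routed back to its originating request even though all queries were scored in one call.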

Because the scheme adopts the CUDA hardware architecture, its parallel processing and floating-point capabilities are markedly stronger than those of a CPU-based architecture. For example, with a face feature library of 1,000,000 entries, a search takes 113 ms on a CPU and 31 ms on CUDA-based hardware. The feature comparison efficiency of the scheme is therefore higher than that of prior art scheme 1.

Compared with prior art scheme 2, the method changes the way face features are computed, from one thread processing one feature to one thread block processing one feature, which achieves coalesced (merged) access to GPU memory and roughly doubles the comparison speed (an improvement of about 100%). In addition, a method for merging comparison requests is added, turning multiple kernel function calls into one call, which improves the concurrency of face feature comparison. Under the same library capacity and the same number of concurrent requests, the technical scheme therefore saves server resources and hardware cost.

The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the invention are also within the protection scope of the invention.

Claims

1. A facial feature comparison method based on the cuda technology is characterized by comprising the following steps: loading all the face features of the target library into a GPU (graphics processing unit) video memory in advance, ensuring the continuity and alignment of the memory, and accessing the feature memory through a GPU thread to realize the merging access of the memory; on the premise of a hardware architecture and a concurrency technology based on cuda, caching each feature comparison request according to a first preset rule, combining a plurality of feature comparison requests into a kernel function by combining the feature comparison requests of users, and finally separating feature comparison results according to a preset rule to enable the feature comparison results to be matched with the requests;

a. obtaining effective human face features;

b. each thread only calculates the characteristic data meeting the serial number condition;

c. calculating the similarity and writing it back to the memory;

d. judging whether the comparison is finished or not, and if so, finishing; otherwise, returning to the step a;

the calculation mode of the face feature is that a thread block runs a face feature value.

2. The method for comparing facial features based on cuda technology according to claim 1, wherein the sequence number condition is: starting sequence number + step size × count; the starting sequence number is the initial value of each thread, the step size is the number of threads in the thread block, the count is an integer that starts at 0 and increments by one, and the maximum value of the count is equal to the feature length divided by the number of threads in the thread block.

3. The cuda technology-based face feature comparison method according to claim 1, wherein the feature comparison request caching method specifically comprises the following steps:

b1, waiting for the feature comparison result to return;

c1, reporting the feature comparison result, and ending.

4. The method for comparing the face features based on the cuda technology as claimed in claim 1, wherein a GPU thread is adopted to continuously access the memory data of the face features following a hardware architecture and a memory access mechanism of cuda, so as to realize the merged access of the global feature memory, reduce the number of memory access times and improve the feature comparison speed.

5. The method for comparing facial features based on the cuda technology as claimed in claim 1, wherein multiple feature comparison requests are combined into one request, so that the number of times of calling the kernel function by the GPU is reduced, and the concurrence number of feature comparison is increased.

6. The method for comparing facial features based on the cuda technique as claimed in claim 1, further comprising obtaining a certain number of cached feature comparison requests, and then merging the cached feature comparison requests according to a second predetermined rule.

7. The method for comparing the face features based on the cuda technology as claimed in claim 6, further comprising the steps of performing feature comparison after merging processing, and then performing feature comparison result separation according to a task SN.