CN105160350B

CN105160350B - A GPU-accelerated Haar detection method

Info

Publication number: CN105160350B
Application number: CN201510479238.1A
Authority: CN
Inventors: 曹泉; 余坚毅
Original assignee: SHENZHEN HAGONGDA TRAFFIC ELECTRONIC TECHNOLOGY Co Ltd
Current assignee: SHENZHEN HAGONGDA TRAFFIC ELECTRONIC TECHNOLOGY Co Ltd
Priority date: 2015-08-03
Filing date: 2015-08-03
Publication date: 2018-08-28
Anticipated expiration: 2035-08-03
Also published as: CN105160350A

Abstract

The invention discloses a GPU-accelerated Haar detection method, characterized in that it comprises the following steps: (1) system initialization, in which the CPU transfers the scanning windows, feature rectangle information, and classifier parameter data for all scaling factors to the GPU device and stores them there; (2) setting the thread count according to the number of columns of the image matrix, calling a kernel function twice, and performing two matrix transpositions to obtain the integral image and the squared integral image of the picture; (3) arranging the scanning windows and feature rectangles in GPU global memory and performing Haar detection; (4) merging the rectangles of the scanning windows that pass all strong classifiers to obtain the detection result. The invention improves the existing Haar algorithm in three respects: how the data is prepared, fast computation of the integral image, and a targeted, stage-by-stage implementation of Haar detection. As a result, the Haar algorithm runs much faster, which gives the method high practical value and good prospects for wider adoption.

Description

A GPU-accelerated Haar detection method

Technical field

The present invention relates to a Haar detection method, and specifically to a GPU-accelerated Haar detection method.

Background art

The Haar method detects objects of interest in an image using Haar features and a trained cascade classifier. Because of its excellent performance, the Haar detection algorithm is widely used. However, its data structures are complex, its computational load is heavy, and its implementations are slow, so it cannot meet the practical requirements of fast detection.

Summary of the invention

The purpose of the present invention is to provide a GPU-accelerated Haar detection method that solves the problem that existing Haar algorithms in the prior art run slowly and cannot meet practical demands.

To achieve this purpose, the present invention adopts the following technical solution:

A GPU-accelerated Haar detection method, comprising the following steps:

(1) System initialization: the CPU transfers the scanning windows, feature rectangle information, and classifier parameter data for all scaling factors to the GPU device and stores them there;

(2) The thread count is set according to the number of columns of the image matrix, a kernel function is called twice, and two matrix transpositions are performed, yielding the integral image and the squared integral image of the picture;

(3) The scanning windows and feature rectangles are arranged in GPU global memory, and Haar detection is performed;

(4) The rectangles of the scanning windows that pass all strong classifiers are merged to obtain the detection result.

Further, step (2) is carried out as follows:

First, the thread count is set according to the number of columns of the image matrix and the kernel function is called; proceeding vertically, the computation on the current row and the next row is carried out row after row, and the matrix is then transposed;

Then, the thread count is set according to the number of columns of the transposed matrix and the kernel function is called; the computation on the current row and the next row is again carried out row after row, and the matrix is transposed once more. The resulting matrix is the required integral image;

Finally, the corresponding squared integral image is computed in the same way.

Still further, step (3) is specifically:

a. The grid count and block count of the GPU are configured, and all scanning windows pass through the strong classifiers of the cascade classifier in turn. For at least the first three strong classifiers of the cascade, the computation uses a parallel mode in which one thread processes one scanning window: within each thread, the scanning window traverses all weak classifiers of the current strong classifier to obtain the integral-image feature values;

b. Each integral-image feature value is compared with the threshold of the corresponding weak classifier to obtain a score; the scores of the weak classifiers are accumulated, and the final sum is compared with the threshold of the corresponding strong classifier to determine whether the current scanning window passes the current strong classifier;

c. For at least the last three strong classifiers of the cascade, one block computes one scanning window, and each thread within the block is responsible for evaluating a different weak classifier for the current scanning window.
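Steps a and b can be illustrated with a minimal sequential sketch in Python. It is not the patented GPU kernel: the function names, the tuple layout of a weak classifier `(rects, threshold, score_below, score_above)`, and the direction of the threshold comparison are all assumptions made for illustration. Variance normalization of the window, which the squared integral image would typically support, is omitted.

```python
def integral_image(img):
    """Integral image with a zero border: ii[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Pixel sum of the rectangle (x, y, w, h) using four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def feature_value(ii, wx, wy, rects):
    """Weighted sum of rectangle sums; each rect is (dx, dy, w, h, weight)
    given relative to the window origin (wx, wy). Layout is hypothetical."""
    return sum(wt * rect_sum(ii, wx + dx, wy + dy, w, h)
               for dx, dy, w, h, wt in rects)


def passes_stage(ii, wx, wy, stage):
    """Step a: traverse all weak classifiers of one strong classifier.
    Step b: accumulate their scores and compare with the stage threshold."""
    weak_classifiers, stage_threshold = stage
    total = 0.0
    for rects, threshold, score_below, score_above in weak_classifiers:
        value = feature_value(ii, wx, wy, rects)
        total += score_below if value < threshold else score_above
    return total >= stage_threshold
```

For example, a two-rectangle edge feature `[(0, 0, 2, 4, -1), (2, 0, 2, 4, 1)]` evaluated on a 4x4 image whose left half is 0 and right half is 1 gives a feature value of 8.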

Preferably, in step (3), the thread count of the next stage is determined by the number of scanning windows that passed the strong classifier of the previous stage.

Preferably, when the scanning windows pass through the strong classifiers between steps b and c, the kernel function needs to be called only once.

Compared with the prior art, the present invention has the following advantages:

(1) The present invention prepares the fixed data at initialization and stores it directly on the GPU device, which effectively reduces the amount of computation, avoids frequent transfers between host and device, and speeds up the Haar algorithm;

(2) The present invention makes full use of the GPU's coalesced memory access, so that the integral image and the squared integral image can be computed quickly, shortening the running time of the Haar algorithm;

(3) In the present invention, the image blocks to be detected at all scales are processed in parallel, which raises computational density and makes full use of the GPU's many small cores. This achieves large-scale data-parallel processing, greatly reduces data processing time, and greatly speeds up the Haar algorithm;

(4) In view of how the results of the iterative detection behave, the present invention adopts different parallelization strategies at different stages, uses the parallel resources efficiently, and speeds up object detection;

(5) The present invention deliberately uses registers to hold data inside the kernel functions, which effectively increases data read speed, saves data processing time, and further speeds up object detection;

(6) Through these technical improvements, the present invention greatly increases the data processing efficiency of the Haar algorithm, so that its speed can meet the practical demands of current object detection, broadening its range of application and increasing its practical value.

Description of the drawings

Fig. 1 is the flow diagram of the present invention.

Detailed description of the embodiments

The invention is further described below with reference to the accompanying drawing and an embodiment. Embodiments of the present invention include, but are not limited to, the following embodiment.

Embodiment

As shown in Fig. 1, the GPU-accelerated Haar detection method disclosed by the present invention mainly addresses the slow speed of Haar algorithms in the prior art and introduces several technical improvements. In general, it comprises the following steps:

(1) System initialization: the CPU organizes the scanning windows, feature rectangle information, and classifier parameter data for all scaling factors into matrices, arranged element by element in the order in which they will reside in GPU global memory, and stores each of them on the GPU device;

For step (1), all fixed data is stored directly on the GPU device at system initialization, including the scanning window information and feature rectangle information for all scaling factors and the classifier parameter data to be used. Data exchange between the CPU and the GPU is a relatively time-consuming process, so in actual processing this approach keeps frequent CPU-GPU data exchange to a minimum and saves the time it would cost. When the data is stored on the GPU, all of it is arranged element by element in a matrix, so that later detection stages can read it directly from the GPU device. This layout also satisfies the coalescing requirement for global memory reads, which improves read speed. In addition, when a kernel function later accesses the data, the present invention first reads the currently needed data into registers; because register reads and writes are extremely fast, this effectively improves overall read speed.
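The idea of preparing all scanning windows for every scaling factor once, in one contiguous table, can be sketched as follows. This is a host-side illustration only: the function name `build_window_table`, the row layout `(x, y, size, scale_index)`, and the rule that the stride grows with the scale are assumptions, not details given in the patent.

```python
ROW_WIDTH = 4  # hypothetical row layout: x, y, size, scale_index


def build_window_table(img_w, img_h, base_size, scales, stride):
    """Enumerate the scanning windows for every scaling factor and pack
    them into one flat, row-major list that could be copied to the GPU
    in a single transfer at initialization."""
    table = []
    for scale_index, scale in enumerate(scales):
        size = int(round(base_size * scale))
        if size > img_w or size > img_h:
            continue  # a window larger than the image cannot be placed
        step = max(1, int(round(stride * scale)))  # assumed stride rule
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                table.extend((x, y, size, scale_index))
    return table


def window_at(table, i):
    """Read back row i of the flat table."""
    return tuple(table[i * ROW_WIDTH:(i + 1) * ROW_WIDTH])
```

Because the table depends only on the image size and the classifier, it can be built once and reused for every frame, which is the point of storing the fixed data on the device at initialization.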

For step (2), starting from the matrix obtained in step (1), the thread count is set according to the number of columns of the matrix and the kernel function is called. The kernel first proceeds vertically, carrying out the computation on the current row and the next row, row after row, and the matrix is then transposed. Next, the thread count is set according to the number of columns of the transposed matrix and the kernel function is called again, carrying out the same computation on the current row and the next row, row after row. Finally, the matrix is transposed once more, and the resulting matrix is the required integral image. The squared integral image is then computed in the same way as the integral image.
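The scan-transpose-scan-transpose procedure can be checked with a small sequential sketch. It assumes that the "computation on the current row and the next row" is a running sum down each column, which is what an integral image requires; the function names are illustrative, and the inner loop over columns stands in for the GPU threads (one per column).

```python
def column_scan(m):
    """Running sum down every column: row r is added into row r + 1.
    On the GPU, each column would be handled by its own thread."""
    out = [row[:] for row in m]
    for r in range(len(out) - 1):
        for c in range(len(out[0])):  # conceptually parallel over columns
            out[r + 1][c] += out[r][c]
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def integral_image(img):
    """Two kernel calls and two transpositions, as described in step (2)."""
    first = transpose(column_scan(img))     # vertical sums, then transpose
    return transpose(column_scan(first))    # horizontal sums, transpose back


def squared_integral_image(img):
    """Same procedure applied to the squared pixel values."""
    return integral_image([[v * v for v in row] for row in img])
```

For the 2x2 image `[[1, 2], [3, 4]]` this yields `[[1, 3], [4, 10]]`. Transposing between the two passes lets both passes walk down columns, so that neighbouring threads touch neighbouring memory addresses in each pass.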

The matrix transposition is accelerated on the GPU. The matrix is divided into multiple blocks of fixed size. From block to block, the data can be read row-wise and written column-wise, or read column-wise and written row-wise; the data inside each block can likewise be read row-wise and written column-wise, or read column-wise and written row-wise.
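A blocked transpose of this kind can be sketched sequentially as below. The tile size, the function name, and the use of a local buffer (standing in for what would usually be shared memory on a GPU) are assumptions for illustration.

```python
def tiled_transpose(m, tile=2):
    """Transpose a matrix tile by tile. Each tile is read row by row into
    a small buffer and written out column by column at the mirrored
    position; on a GPU each tile would be handled by one thread block."""
    rows, cols = len(m), len(m[0])
    out = [[None] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
            buf = [m[r][c0:c1] for r in range(r0, r1)]  # row-wise read
            for i, row in enumerate(buf):
                for j, value in enumerate(row):
                    out[c0 + j][r0 + i] = value         # column-wise write
    return out
```

Edge tiles are clipped with `min`, so the matrix dimensions need not be multiples of the tile size.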

For step (3), when the scanning windows pass through the cascade classifier, the total number of scanning windows at the first several strong classifiers is large, and the number of windows eliminated by each strong classifier is also large, while a single kernel call takes only a few microseconds. Therefore, to make full use of the device resources, detection is performed stage by stage, and the number of windows remaining after one stage determines the grid and block allocation for the next stage.
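The stage-by-stage scheme for the early strong classifiers can be sketched as a loop in which the survivors of one stage size the launch of the next. The helper names, the 256 threads per block, and the use of plain predicates in place of real strong classifiers are assumptions for illustration.

```python
import math


def launch_config(n_windows, threads_per_block=256):
    """Blocks needed so that each surviving window gets one thread."""
    return math.ceil(n_windows / threads_per_block), threads_per_block


def run_early_stages(windows, stages):
    """Run one simulated kernel launch per stage. `stages` is a list of
    predicates (window -> bool) standing in for strong classifiers."""
    launches = []
    for stage in stages:
        if not windows:
            break  # nothing left to test
        launches.append(launch_config(len(windows)))
        # Keep only the windows that passed; they size the next launch.
        windows = [w for w in windows if stage(w)]
    return windows, launches
```

With 1000 windows and two stages that each reject half of their input, the launches shrink from 4 blocks to 2 blocks of 256 threads.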

At the strong classifiers in the middle section of the cascade, the number of scanning windows is no longer very large and the rate of elimination drops. At this point, to reduce the time spent on frequent kernel calls, on transferring the count of surviving windows back out, and on other surrounding code, several strong classifier stages of the middle section are combined and handled by a single kernel call.
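Combining the middle stages into one call can be sketched as follows: each simulated thread carries its window through all fused stages and stops at the first rejection. The function names and the predicate stand-ins for strong classifiers are hypothetical; `staged_launches` is included only to contrast the number of launches.

```python
def fused_stages_kernel(windows, stages):
    """One simulated kernel call covering several strong classifiers."""
    survivors = []
    for w in windows:  # conceptually one thread per window
        # all() short-circuits, so a rejected window skips later stages.
        if all(stage(w) for stage in stages):
            survivors.append(w)
    return survivors


def staged_launches(windows, stages):
    """The unfused alternative: one launch and one compaction per stage."""
    launches = 0
    for stage in stages:
        windows = [w for w in windows if stage(w)]
        launches += 1
    return windows, launches
```

Both variants keep the same windows; the fused version trades some idle threads for fewer launches and fewer transfers of the survivor count.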

At the strong classifiers in the tail section of the cascade, the total number of scanning windows is small while the number of weak classifiers in each strong classifier has grown, so the one-thread-per-window mode is no longer suitable for the data processing. The present invention therefore switches to a block-based parallel mode: one block computes one scanning window, and the threads inside each block process in parallel the feature values of the weak classifiers belonging to the same strong classifier.
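The block-per-window mode can be sketched as below. The patent does not say how the per-thread scores are combined; the pairwise tree reduction used here is a common GPU pattern and is an assumption, as are the function names and the callable stand-ins for weak classifiers.

```python
def tree_reduce(values):
    """Pairwise reduction of per-thread scores, as a block might do in
    shared memory. The list is padded to a power of two with zeros."""
    vals = list(values)
    n = 1
    while n < len(vals):
        n *= 2
    vals += [0.0] * (n - len(vals))
    stride = n // 2
    while stride > 0:
        for t in range(stride):  # conceptually parallel threads
            vals[t] += vals[t + stride]
        stride //= 2
    return vals[0]


def block_evaluates_window(window, weak_classifiers, stage_threshold):
    """One block per window: each 'thread' scores one weak classifier,
    then the block sums the scores and tests the stage threshold."""
    scores = [weak(window) for weak in weak_classifiers]
    return tree_reduce(scores) >= stage_threshold
```

With few windows and many weak classifiers per stage, this keeps the threads of a block busy where the one-thread-per-window mode would leave most of the device idle.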

After the last strong classifier has been processed, the remaining scanning windows undergo rectangle merging, which yields the detected objects.
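The patent does not specify how the rectangles are merged. As one possible illustration, the sketch below groups rectangles whose intersection-over-union exceeds a threshold and averages each group; the overlap criterion, the 0.5 threshold, and the function names are assumptions, not the patented procedure.

```python
def iou(a, b):
    """Intersection over union of two rectangles given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def merge_rectangles(rects, min_iou=0.5):
    """Group overlapping rectangles with union-find, then average each group."""
    parent = list(range(len(rects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if iou(rects[i], rects[j]) >= min_iou:
                parent[find(j)] = find(i)

    groups = {}
    for i, r in enumerate(rects):
        groups.setdefault(find(i), []).append(r)
    return [tuple(sum(r[k] for r in g) / len(g) for k in range(4))
            for g in groups.values()]
```

Two heavily overlapping detections collapse into one averaged rectangle, while an isolated detection is returned unchanged.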

The present invention improves the existing Haar algorithm in three respects: how the data is prepared, fast computation of the integral image, and a targeted, stage-by-stage implementation of Haar detection. As a result, the Haar algorithm runs much faster, which gives the method high practical value and good prospects for wider adoption.

The above embodiment is merely a preferred embodiment of the present invention and does not limit its scope of protection. Any variation that applies the design principle of the present invention and is made on that basis without creative work shall fall within the scope of protection of the present invention.

Claims

1. A GPU-accelerated Haar detection method, characterized in that it comprises the following steps:

(1) System initialization: the CPU transfers the scanning windows, feature rectangle information, and classifier parameter data for all scaling factors to the GPU device and stores them there;

(2) The thread count is set according to the number of columns of the image matrix, a kernel function is called twice, and two matrix transpositions are performed, yielding the integral image and the squared integral image of the picture;

(3) The scanning windows and feature rectangles are arranged in GPU global memory, and Haar detection is performed, specifically as follows:

a. The grid count and block count of the GPU are configured, and all scanning windows pass through the strong classifiers of the cascade classifier in turn. For at least the first three strong classifiers of the cascade, the computation uses a parallel mode in which one thread processes one scanning window: within each thread, the scanning window traverses all weak classifiers of the current strong classifier to obtain the integral-image feature values;

b. Each integral-image feature value is compared with the threshold of the corresponding weak classifier to obtain a score; the scores of the weak classifiers are accumulated, and the final sum is compared with the threshold of the corresponding strong classifier to determine whether the current scanning window passes the current strong classifier;

c. For at least the last three strong classifiers of the cascade, one block computes one scanning window, and each thread within the block is responsible for evaluating a different weak classifier for the current scanning window.

2. The GPU-accelerated Haar detection method according to claim 1, characterized in that step (2) is carried out as follows:

First, the thread count is set according to the number of columns of the image matrix and the kernel function is called; proceeding vertically, the computation on the current row and the next row is carried out row after row, and the matrix is then transposed;

Then, the thread count is set according to the number of columns of the transposed matrix and the kernel function is called; the computation on the current row and the next row is again carried out row after row, and the matrix is transposed once more. The resulting matrix is the required integral image;

Finally, the corresponding squared integral image is computed in the same way.

3. The GPU-accelerated Haar detection method according to claim 2, characterized in that, in step (3), the thread count of the next stage is determined by the number of scanning windows that passed the strong classifier of the previous stage.

4. The GPU-accelerated Haar detection method according to claim 3, characterized in that, when the scanning windows pass through the strong classifiers between steps b and c, the kernel function needs to be called only once.