CN105160350B - GPU-accelerated Haar detection method - Google Patents

GPU-accelerated Haar detection method

Info

Publication number
CN105160350B
Authority
CN
China
Prior art keywords
haar
gpu
classifier
scanning window
integral image
Legal status
Active
Application number
CN201510479238.1A
Other languages
Chinese (zh)
Other versions
CN105160350A (en)
Inventor
曹泉 (Cao Quan)
余坚毅 (Yu Jianyi)
Current Assignee
SHENZHEN HAGONGDA TRAFFIC ELECTRONIC TECHNOLOGY Co Ltd
Original Assignee
SHENZHEN HAGONGDA TRAFFIC ELECTRONIC TECHNOLOGY Co Ltd
Priority date
2015-08-03
Filing date
2015-08-03
Publication date
2018-08-28
2015-08-03 Application filed by SHENZHEN HAGONGDA TRAFFIC ELECTRONIC TECHNOLOGY Co Ltd
2015-08-03 Priority to CN201510479238.1A
2015-12-16 Publication of CN105160350A
Application granted
2018-08-28 Publication of CN105160350B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention discloses a GPU-accelerated Haar detection method, characterized in that it comprises the following steps: (1) at system initialization, the CPU transfers the scanning windows, feature-rectangle information and classifier parameter data for all scale factors to the GPU device, where they are stored; (2) the thread count is set according to the number of columns of the image matrix and a kernel function is called twice, with two matrix transpositions, to obtain the integral image and the squared integral image of the picture; (3) the scanning windows and feature rectangles are arranged in GPU global memory and Haar detection is performed; (4) the scanning windows that pass all strong classifiers are merged into rectangles to obtain the detection result. By improving the existing Haar algorithm in three respects, namely data preparation, fast integral-image computation and stage-specific Haar detection, the invention greatly increases the speed of the Haar algorithm, so that its practical value is more fully realized; it has high practical value and potential for wide adoption.

Description

GPU-accelerated Haar detection method
Technical field
The present invention relates to a Haar detection method, and specifically to a GPU-accelerated Haar detection method.
Background art
The Haar method detects objects of interest in an image using Haar features and a trained cascade classifier. Because of its excellent performance, the Haar detection algorithm is widely used; however, its data structures are complex and its computational load is large, so it runs slowly and cannot meet the practical requirement of fast detection.
Summary of the invention
The purpose of the present invention is to provide a GPU-accelerated Haar detection method that solves the problem that existing Haar algorithms run too slowly to meet practical requirements.
To achieve this purpose, the present invention adopts the following technical solution:
A GPU-accelerated Haar detection method comprises the following steps:
(1) System initialization: the CPU transfers the scanning windows, feature-rectangle information and classifier parameter data for all scale factors to the GPU device, where they are stored;
(2) The thread count is set according to the number of columns of the image matrix and a kernel function is called twice, with two matrix transpositions, to obtain the integral image and the squared integral image of the picture;
(3) The scanning windows and feature rectangles are arranged in GPU global memory and Haar detection is performed;
(4) The scanning windows that pass all strong classifiers are merged into rectangles to obtain the detection result.
Further, step (2) is carried out as follows:
First, the thread count is set according to the number of columns of the image matrix and a kernel function is called, which works down the columns, adding each row into the next row in turn; the matrix is then transposed;
Then, the thread count is set according to the number of columns of the transposed matrix and the kernel function is called again, again adding each row into the next row in turn; the matrix is then transposed once more, and the resulting matrix is the required integral image;
Finally, the corresponding squared integral image is computed in the same way.
Still further, step (3) is specifically:
a. The GPU grid and block counts are configured, and all scanning windows pass through the strong classifiers of the cascade classifier in turn. For at least the first three strong classifiers of the cascade, the computation is parallelized with one thread per scanning window: within each thread, the scanning window traverses all weak classifiers of the current strong classifier to obtain the integral-image feature values;
b. Each integral-image feature value is compared with the threshold of the corresponding weak classifier to obtain a score; the scores of all weak classifiers are summed, and the final sum is compared with the threshold of the corresponding strong classifier to decide whether the current scanning window passes the current strong classifier;
c. For at least the last three strong classifiers of the cascade, one block computes one scanning window, and each thread within the block is responsible for evaluating the current scanning window against a different weak classifier.
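As a rough, CPU-side illustration of steps a and b, the Python sketch below mimics the work one GPU thread does for one scanning window: sum each weak classifier's score and compare the total with the strong-classifier threshold. The WeakClassifier/StrongClassifier layout, the per-classifier left/right scores (as in OpenCV-style decision stumps) and the omission of variance normalization are assumptions made for illustration; the patent does not specify them.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical weighted feature rectangle: (x, y, w, h, weight), with (x, y)
# relative to the top-left corner of the scanning window.
Rect = Tuple[int, int, int, int, float]


@dataclass
class WeakClassifier:
    rects: List[Rect]  # weighted rectangles that make up the Haar feature
    threshold: float   # weak-classifier threshold
    left: float        # score when the feature value is below the threshold
    right: float       # score otherwise


@dataclass
class StrongClassifier:
    weak: List[WeakClassifier]
    threshold: float   # strong-classifier (stage) threshold


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """(h+1) x (w+1) integral image with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Pixel sum over [x, x+w) x [y, y+h) from four integral-image lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def window_passes_stage(ii, win_x: int, win_y: int, stage: StrongClassifier) -> bool:
    """Work done by one thread for one scanning window (steps a and b)."""
    total = 0.0
    for wc in stage.weak:  # traverse every weak classifier of this stage
        value = sum(wt * rect_sum(ii, win_x + rx, win_y + ry, rw, rh)
                    for rx, ry, rw, rh, wt in wc.rects)
        total += wc.left if value < wc.threshold else wc.right
    return total >= stage.threshold  # accumulated score vs. stage threshold


img = [[9, 9, 1, 1] for _ in range(4)]  # bright left half, dark right half
edge = WeakClassifier(rects=[(0, 0, 2, 4, 1.0), (2, 0, 2, 4, -1.0)],
                      threshold=10.0, left=0.0, right=1.0)
stage = StrongClassifier(weak=[edge], threshold=0.5)
print(window_passes_stage(integral_image(img), 0, 0, stage))  # True (feature value 72 - 8 = 64)
```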
Preferably, in step (3), the thread count of the next stage is determined by the number of scanning windows that passed the previous strong classifier.
Preferably, the strong classifiers lying between those handled as in step b and those handled as in step c are evaluated with only a single kernel call.
Compared with the prior art, the present invention has the following advantages:
(1) Fixed data are prepared at initialization and stored directly on the GPU device, which effectively reduces computation, avoids frequent transfers between host and device, and increases the speed of the Haar algorithm;
(2) The invention makes full use of the GPU's coalesced memory access, so the integral image and squared integral image can be computed quickly, shortening the run time of the Haar algorithm;
(3) The image blocks to be detected at all scales are processed in parallel, which increases computational density and fully exploits the GPU's many small cores for large-scale parallel data processing, greatly reducing data processing time and greatly increasing the speed of the Haar algorithm;
(4) Tailored to how the results of the cascaded, stage-by-stage detection evolve, the invention adopts different parallelization strategies at different stages, making efficient use of parallel resources and increasing the speed of object detection;
(5) The invention deliberately uses registers to hold data inside the kernel functions, which effectively increases data read speed, saves data processing time and further increases the speed of object detection;
(6) Through these technical improvements, the invention greatly improves the data-processing efficiency of the Haar algorithm, so that its speed can meet the practical demands of current object detection, broadening its range of application and increasing its practical value.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawing and an embodiment; embodiments of the present invention include, but are not limited to, the following example.
Embodiment
As shown in Fig. 1, the GPU-accelerated Haar detection method disclosed by the invention mainly addresses the slow speed of prior-art Haar algorithms through several technical improvements. In general, it comprises the following steps:
(1) System initialization: the CPU organizes the scanning windows, feature-rectangle information and classifier parameter data for all scale factors into matrices, arranged one after another in GPU global memory, and stores each of them on the GPU device;
(2) The thread count is set according to the number of columns of the image matrix and a kernel function is called twice, with two matrix transpositions, to obtain the integral image and the squared integral image of the picture;
(3) The scanning windows and feature rectangles are arranged in GPU global memory and Haar detection is performed;
(4) The scanning windows that pass all strong classifiers are merged into rectangles to obtain the detection result.
In step (1), all fixed data are stored directly on the GPU device at system initialization, including the scanning-window information, the feature-rectangle information and the classifier parameter data needed for all scale factors. Because data exchange between the CPU and the GPU is relatively time-consuming, this approach minimizes frequent CPU-GPU exchanges during actual processing and saves transfer time. When the data are stored on the GPU, they are arranged one after another into matrices, so they can be read directly from the GPU device whenever later detection needs them; this layout also satisfies the coalescing requirement for global-memory reads and increases read speed. In addition, when the subsequent kernel functions access these data, the invention first reads the values currently needed into registers; because registers are extremely fast to read and write, this effectively increases overall read speed.
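The one-time upload in step (1) can be pictured with the Python sketch below, which enumerates the scanning windows for every scale factor into flat, contiguous integer columns (a structure-of-arrays layout); each column would then be copied to GPU global memory once at initialization. The function name build_window_table, the base_stride parameter and the rule that scales the stride with the scale factor are hypothetical; the patent only states that the data are arranged one after another in matrices.

```python
from array import array
from typing import Dict, Iterable


def build_window_table(img_w: int, img_h: int, base_w: int, base_h: int,
                       scales: Iterable[float], base_stride: int = 2) -> Dict[str, array]:
    """Enumerate scanning windows for every scale factor into flat int32 columns."""
    table = {key: array("i") for key in ("x", "y", "w", "h")}
    for s in scales:
        w, h = int(round(base_w * s)), int(round(base_h * s))
        if w > img_w or h > img_h:
            continue  # window larger than the image at this scale
        stride = max(1, int(round(base_stride * s)))  # hypothetical stride rule
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                table["x"].append(x)
                table["y"].append(y)
                table["w"].append(w)
                table["h"].append(h)
    return table


# Each column would be copied to GPU global memory once, at initialization.
windows = build_window_table(30, 24, 24, 24, scales=[1.0, 1.25])
print(list(windows["x"]), list(windows["y"]))  # [0, 2, 4, 6] [0, 0, 0, 0]
```

With this layout, thread i of a detection kernel would read x[i], y[i], w[i] and h[i], so consecutive threads touch consecutive addresses, which is the coalescing property the paragraph relies on.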
In step (2), based on the matrix obtained in step (1), the thread count is set according to the number of columns of the matrix and a kernel function is called, which first works down the columns, adding each row into the next row in turn; the matrix is then transposed. The thread count is then set according to the number of columns of the transposed matrix and the kernel function is called again, again adding each row into the next row in turn; finally, the matrix is transposed once more, and the resulting matrix is the required integral image. The squared integral image is computed in the same way as the integral image.
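The two-pass scheme of step (2) can be emulated on the CPU as below: column_scan plays the role of the kernel, with the outer loop over c standing for the one-thread-per-column launch, and a transpose follows each pass. This is a Python sketch of the data flow under those assumptions, not the patent's kernel, and the function names are hypothetical. The squared integral image is the same procedure applied to the squared pixel values.

```python
from typing import List

Matrix = List[List[int]]


def column_scan(m: Matrix) -> Matrix:
    """One GPU thread per column: walk down the column adding each row into the next."""
    out = [row[:] for row in m]
    for c in range(len(m[0])):      # thread index == column index
        for r in range(1, len(m)):  # current row accumulates into the next row
            out[r][c] += out[r - 1][c]
    return out


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def integral_image(img: Matrix) -> Matrix:
    """Inclusive integral image: ii[y][x] = sum of img[0..y][0..x]."""
    first = transpose(column_scan(img))   # kernel call 1, then transpose
    return transpose(column_scan(first))  # kernel call 2, then transpose back


def squared_integral_image(img: Matrix) -> Matrix:
    return integral_image([[v * v for v in row] for row in img])


print(integral_image([[1, 2], [3, 4]]))          # [[1, 3], [4, 10]]
print(squared_integral_image([[1, 2], [3, 4]]))  # [[1, 5], [10, 30]]
```

The reason to scan down columns and transpose, rather than scan along rows, is memory coalescing: with row-major storage, when thread c reads element (r, c) at the same time as thread c+1 reads (r, c+1), the accesses are adjacent. In the classic Viola-Jones detector the squared integral image is used to compute each window's variance for feature normalization; the patent computes it but does not state how it is used.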
The matrix transposition is also accelerated on the GPU. The matrix is divided into multiple tiles (blocks) of fixed size; the tiles can be read row by row and written column by column, or read column by column and written row by row, and the data inside each tile can likewise be read by rows and written by columns, or read by columns and written by rows.
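The tiled transposition can be sketched in Python as follows: each (by, bx) tile stands for one thread block that reads its tile row by row and writes it column by column at the mirrored position. The tile size of 32 is an assumption, and on a real GPU the tile would be staged in shared memory rather than in a Python list.

```python
from typing import List


def tiled_transpose(m: List[List[int]], tile: int = 32) -> List[List[int]]:
    """Transpose m tile by tile, the way one thread block would handle one tile."""
    rows, cols = len(m), len(m[0])
    out = [[0] * rows for _ in range(cols)]
    for by in range(0, rows, tile):      # each (by, bx) pair ~ one thread block
        for bx in range(0, cols, tile):
            # Stage the tile row by row (on a GPU: coalesced reads into shared memory).
            buf = [m[y][bx:min(bx + tile, cols)] for y in range(by, min(by + tile, rows))]
            # Write it back column by column into the mirrored tile position.
            for dy, row in enumerate(buf):
                for dx, v in enumerate(row):
                    out[bx + dx][by + dy] = v
    return out


m = [[r * 5 + c for c in range(5)] for r in range(3)]  # 3 x 5, not a multiple of the tile
print(tiled_transpose(m, tile=2) == [list(col) for col in zip(*m)])  # True
```

Edge tiles are clipped with min(), so matrix sizes that are not multiples of the tile size are handled.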
In step (3), as the scanning windows pass through the cascade classifier, the first few strong classifiers face a large total number of scanning windows, and each of these strong classifiers also rejects many windows, while a single kernel call takes only a few microseconds. Therefore, to make full use of device resources, detection proceeds stage by stage, and the number of windows remaining after the previous stage determines the grid and block allocation for the next stage.
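The stage-by-stage scheduling for the early strong classifiers can be sketched as below: each stage is one "launch" with one thread per surviving window, and the number of survivors sets the grid size of the next launch. The 128-thread block size, the function names and the toy predicate passes(window, stage) are assumptions; on a GPU the survivor list would typically be built with a prefix sum or an atomic counter, and its length copied back to the host to size the next launch.

```python
import math
from typing import Callable, List, Sequence, Tuple


def launch_config(n_windows: int, threads_per_block: int = 128) -> Tuple[int, int]:
    """Blocks needed so that every surviving window gets exactly one thread."""
    return math.ceil(n_windows / threads_per_block), threads_per_block


def run_early_stages(windows: Sequence, stages: Sequence,
                     passes: Callable[[object, object], bool]):
    """One launch per stage; the survivors of one stage size the next launch."""
    alive: List = list(windows)
    log: List[Tuple[int, int, int]] = []
    for stage in stages:
        if not alive:
            break  # every window rejected: nothing left to launch
        grid, block = launch_config(len(alive))
        flags = [passes(w, stage) for w in alive]          # one thread per window
        alive = [w for w, ok in zip(alive, flags) if ok]  # compact the survivors
        log.append((grid, block, len(alive)))
    return alive, log


# Toy stand-in for the per-window stage test: window w passes stage s if s divides w.
survivors, log = run_early_stages(range(1000), [2, 3, 5], lambda w, s: w % s == 0)
print(log)  # [(8, 128, 500), (4, 128, 167), (2, 128, 34)]
```

The shrinking grid sizes in log show why per-stage launches pay off early in the cascade: each launch covers only the windows that are still alive.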
For the strong classifiers in the middle section of the cascade, the number of scanning windows is no longer very large and the rate of rejection slows. At this point, to reduce the time spent on frequent kernel calls, on copying out the number of surviving windows, and on other code, the middle strong classifiers are combined and evaluated in a single kernel call.
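A minimal sketch of the fused middle stages, assuming the same one-thread-per-window model: a single launch evaluates several consecutive stages, and each thread stops as soon as its window is rejected, so no survivor count has to be copied out between these stages. The names fused_middle_stages, staged and pred are illustrative only.

```python
from typing import Callable, List, Sequence


def fused_middle_stages(alive: Sequence, stages: Sequence,
                        passes: Callable[[object, object], bool]) -> List:
    """Evaluate several consecutive stages in one launch, one thread per window."""
    def thread(w) -> bool:
        for stage in stages:
            if not passes(w, stage):
                return False  # rejected: this thread exits early
        return True           # window survived every fused stage

    flags = [thread(w) for w in alive]  # a single launch, sized once by len(alive)
    return [w for w, ok in zip(alive, flags) if ok]


def staged(alive: Sequence, stages: Sequence,
           passes: Callable[[object, object], bool]) -> List:
    """Reference: one launch, and one survivor-count read-back, per stage."""
    out = list(alive)
    for stage in stages:
        out = [w for w in out if passes(w, stage)]
    return out


def pred(w: int, s: int) -> bool:
    """Toy stage test: window w passes stage s if s does not divide w."""
    return w % s != 0


print(fused_middle_stages(range(30), [2, 3], pred))  # [1, 5, 7, 11, 13, 17, 19, 23, 25, 29]
```

Both functions keep the same windows. The fused version presumably trades some thread divergence (rejected threads sit idle while others continue) for fewer launches, a reasonable trade once both the window count and the rejection rate have dropped.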
For the strong classifiers at the tail of the cascade, the total number of scanning windows is small while each strong classifier contains more weak classifiers, so window-level parallelism is no longer suitable for the data processing. The invention therefore switches to block-based window parallelism: one block computes one scanning window, and the threads inside each block evaluate the weak-classifier feature values of the same strong classifier in parallel.
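The block-per-window mode for the tail stages might look like the sketch below, emulated sequentially in Python: the threads of one block split the weak classifiers of a stage with a strided loop, keep partial sums in a shared-memory array, and combine them with a tree reduction before the stage-threshold test. The block size of 8, the weak_score callback and the choice of a tree reduction are assumptions; the patent does not say how the per-thread scores are combined.

```python
from typing import Callable, Tuple


def block_eval_window(weak_score: Callable[[int], float], n_weak: int,
                      stage_threshold: float, block_dim: int = 8) -> Tuple[bool, float]:
    """One block evaluates one window; its threads split the stage's weak classifiers."""
    assert block_dim & (block_dim - 1) == 0, "tree reduction assumes a power of two"
    shared = [0.0] * block_dim                   # one partial sum per thread
    for tid in range(block_dim):                 # threads run concurrently on a GPU
        for k in range(tid, n_weak, block_dim):  # strided loop over weak classifiers
            shared[tid] += weak_score(k)
    # (barrier) then pairwise tree reduction in shared memory
    stride = block_dim // 2
    while stride > 0:
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2                             # (barrier) after each round
    total = shared[0]
    return total >= stage_threshold, total


print(block_eval_window(lambda k: 1.0, n_weak=20, stage_threshold=15.0))  # (True, 20.0)
```

Here weak_score(k) stands for the score of weak classifier k on the window owned by this block, computed as in steps a and b.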
After the last strong classifier has been processed, the remaining scanning windows are merged into rectangles, yielding the detected objects.
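The patent does not specify how the final rectangles are merged. One common approach, used here as an assumption, is clustering in the spirit of OpenCV's groupRectangles: rectangles whose edges differ by at most a fraction eps of their size are grouped, small groups are discarded, and each remaining group is averaged. The sketch below follows that idea with a union-find; eps=0.2, min_neighbors=2 and the function names are illustrative.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def similar(a: Rect, b: Rect, eps: float = 0.2) -> bool:
    """True if every edge of a lies within delta of the matching edge of b."""
    delta = eps * (min(a[2], b[2]) + min(a[3], b[3])) * 0.5
    return (abs(a[0] - b[0]) <= delta and abs(a[1] - b[1]) <= delta
            and abs(a[0] + a[2] - b[0] - b[2]) <= delta
            and abs(a[1] + a[3] - b[1] - b[3]) <= delta)


def merge_rectangles(rects: List[Rect], min_neighbors: int = 2,
                     eps: float = 0.2) -> List[Rect]:
    """Group similar detections with union-find, drop small groups, average the rest."""
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if similar(rects[i], rects[j], eps):
                parent[find(i)] = find(j)

    clusters: Dict[int, List[Rect]] = {}
    for i, r in enumerate(rects):
        clusters.setdefault(find(i), []).append(r)

    merged: List[Rect] = []
    for members in clusters.values():
        if len(members) < min_neighbors:
            continue  # too few overlapping windows: likely a false positive
        n = len(members)
        merged.append(tuple(round(sum(r[k] for r in members) / n) for k in range(4)))
    return merged


detections = [(10, 10, 24, 24), (12, 10, 24, 24), (11, 12, 24, 24), (100, 100, 24, 24)]
print(merge_rectangles(detections))  # [(11, 11, 24, 24)]
```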
By improving the existing Haar algorithm in three respects, namely data preparation, fast integral-image computation and stage-specific Haar detection, the present invention greatly increases the speed of the Haar algorithm, so that its practical value is more fully realized; it has high practical value and potential for wide adoption.
The above embodiment is merely a preferred embodiment of the present invention and is not intended to limit its scope of protection. Any variation made on the basis of the design principles of the present invention without inventive effort shall fall within the scope of protection of the present invention.

Claims (4)

1. A GPU-accelerated Haar detection method, characterized in that it comprises the following steps:
(1) System initialization: the CPU transfers the scanning windows, feature-rectangle information and classifier parameter data for all scale factors to the GPU device, where they are stored;
(2) The thread count is set according to the number of columns of the image matrix and a kernel function is called twice, with two matrix transpositions, to obtain the integral image and the squared integral image of the picture;
(3) The scanning windows and feature rectangles are arranged in GPU global memory and Haar detection is performed, specifically as follows:
a. The GPU grid and block counts are configured, and all scanning windows pass through the strong classifiers of the cascade classifier in turn. For at least the first three strong classifiers of the cascade, the computation is parallelized with one thread per scanning window: within each thread, the scanning window traverses all weak classifiers of the current strong classifier to obtain the integral-image feature values;
b. Each integral-image feature value is compared with the threshold of the corresponding weak classifier to obtain a score; the scores of all weak classifiers are summed, and the final sum is compared with the threshold of the corresponding strong classifier to decide whether the current scanning window passes the current strong classifier;
c. For at least the last three strong classifiers of the cascade, one block computes one scanning window, and each thread within the block is responsible for evaluating the current scanning window against a different weak classifier;
(4) The scanning windows that pass all strong classifiers are merged into rectangles to obtain the detection result.
2. The GPU-accelerated Haar detection method according to claim 1, characterized in that step (2) is carried out as follows:
First, the thread count is set according to the number of columns of the image matrix and a kernel function is called, which works down the columns, adding each row into the next row in turn; the matrix is then transposed;
Then, the thread count is set according to the number of columns of the transposed matrix and the kernel function is called again, again adding each row into the next row in turn; the matrix is then transposed once more, and the resulting matrix is the required integral image;
Finally, the corresponding squared integral image is computed in the same way.
3. The GPU-accelerated Haar detection method according to claim 2, characterized in that, in step (3), the thread count of the next stage is determined by the number of scanning windows that passed the previous strong classifier.
4. The GPU-accelerated Haar detection method according to claim 3, characterized in that, when the scanning windows pass through the strong classifiers between steps b and c, only one kernel call is needed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510479238.1A CN105160350B (en) 2015-08-03 2015-08-03 GPU-accelerated Haar detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510479238.1A CN105160350B (en) 2015-08-03 2015-08-03 GPU-accelerated Haar detection method

Publications (2)

Publication Number Publication Date
CN105160350A 2015-12-16
CN105160350B 2018-08-28

Family

ID=54801202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510479238.1A Active CN105160350B (en) 2015-08-03 2015-08-03 GPU-accelerated Haar detection method

Country Status (1)

Country Link
CN (1) CN105160350B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108052891A (en) * 2017-12-08 2018-05-18 触景无限科技(北京)有限公司 Facial contour parallel calculating method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104680558A (en) * 2015-03-14 2015-06-03 西安电子科技大学 (Xidian University) Struck target tracking method using GPU hardware for acceleration

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8452111B2 (en) * 2008-06-05 2013-05-28 Microsoft Corporation Real-time compression and decompression of wavelet-compressed images


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
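GPU AND CPU COOPERATIVE ACCELERATION FOR FACE DETECTION ON MODERN PROCESSORS; Eric Li et al.; 2012 IEEE International Conference on Multimedia and Expo; 2012-12-31; full text *
Research on GPU-based face detection and facial feature point localization (基于GPU的人脸检测和特征点定位研究); Zhang Yin (张印) et al.; Electronic Technology R&D (《电子技术研发》); 2014-09-25 (No. 9); full text *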

Also Published As

Publication number Publication date
CN105160350A (en) 2015-12-16

Similar Documents

Publication Publication Date Title
US11775836B2 (en) Hand pose estimation
US9971959B2 (en) Performing object detection operations via a graphics processing unit
CN101044508B (en) Cache efficient rasterization of graphics data
US11900253B2 (en) Tiling format for convolutional neural networks
CN109784372B (en) Target classification method based on convolutional neural network
US11568225B2 (en) Signal processing system and method
CN110020639B (en) Video feature extraction method and related equipment
US11836608B2 (en) Convolution acceleration with embedded vector decompression
US10769485B2 (en) Framebuffer-less system and method of convolutional neural network
CN110738160A (en) human face quality evaluation method combining with human face detection
Nguyen et al. Yolo based real-time human detection for smart video surveillance at the edge
CN111583094A (en) Image pulse coding method and system based on FPGA
CN111191569A (en) Face attribute recognition method and related device thereof
CN109615067B (en) A kind of data dispatching method and device of convolutional neural networks
CN105160350B (en) A kind of Haar detection methods accelerated based on GPU
CN103106412B (en) Flaky medium recognition methods and recognition device
CN103793873A (en) Obtaining method and device for image pixel mid value
US11481994B2 (en) Method and apparatus for extracting image data in parallel from multiple convolution windows, device, and computer-readable storage medium
CN105069450A (en) Quick multi-character recognition method
CN110210430A (en) A kind of Activity recognition method and device
Sun et al. Acceleration algorithm for CUDA-based face detection
US20240071066A1 (en) Object recognition method and apparatus, and device and medium
CN105160349A (en) Haar detection object algorithm based on GPU platform
CN110503193B (en) ROI-based pooling operation method and circuit
CN107862316A (en) Convolution algorithm method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant