CN105160350B - A GPU-accelerated Haar detection method - Google Patents
- Publication number
- CN105160350B (application CN201510479238.1A)
- Authority
- CN
- China
- Prior art keywords
- haar
- gpu
- classifier
- scanning window
- integral image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a GPU-accelerated Haar detection method comprising the following steps: (1) system initialization, in which the CPU transfers the scanning windows for all scale factors, the feature rectangle information, and the classifier parameter data to the GPU device and stores them there; (2) setting the thread count according to the number of columns of the image matrix and calling a kernel function twice, with two matrix transpositions, to obtain the integral image and the squared integral image of the picture; (3) arranging the scanning windows and feature rectangles in GPU global memory and performing Haar detection; (4) merging the rectangles of the scanning windows that pass all strong classifiers to obtain the detection result. The invention improves on existing Haar algorithms in three respects: keeping the data ready on the device, computing the integral images quickly, and implementing Haar detection in stages tailored to the cascade. As a result, the Haar algorithm runs substantially faster, which gives it high practical value and strong potential for wide adoption.
Description
Technical field
The present invention relates to Haar detection methods, and in particular to a GPU-accelerated Haar detection method.
Background
The Haar method detects objects of interest in an image using Haar features and a trained cascade classifier. The Haar detection algorithm is widely used because of its excellent performance. However, its data structures are complex and its computational load is heavy, so implementations run slowly and cannot meet the practical requirement of fast detection.
Summary of the invention
The purpose of the present invention is to provide a GPU-accelerated Haar detection method that solves the problem that existing Haar implementations are slow and cannot meet practical demand.
To achieve this purpose, the present invention adopts the following technical solution:
A GPU-accelerated Haar detection method, comprising the following steps:
(1) System initialization: the CPU transfers the scanning windows for all scale factors, the feature rectangle information, and the classifier parameter data to the GPU device, where they are stored;
(2) The thread count is set according to the number of columns of the image matrix and a kernel function is called twice, with two matrix transpositions, to obtain the integral image and the squared integral image of the picture;
(3) The scanning windows and feature rectangles are arranged in GPU global memory, and Haar detection is performed;
(4) The scanning windows that pass all strong classifiers are merged as rectangles to obtain the detection result.
Further, step (2) is carried out as follows:
First, the thread count is set according to the number of columns of the image matrix and the kernel function is called; working down the columns, the calculation on the current row and the next row is carried out row by row (a running sum), and the matrix is then transposed;
Then, the thread count is set according to the number of columns of the transposed matrix and the kernel function is called again; the calculation on the current row and the next row is carried out row by row once more, and the matrix is transposed again. The resulting matrix is the required integral image;
Finally, the corresponding squared integral image is computed in the same way.
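The two-pass procedure above can be illustrated with a small CPU reference in Python. This is only a sketch under assumptions: the function names are hypothetical, the loops stand in for GPU threads (one per column), and "the calculation on the current row and the next row" is read as adding each row into the row below it.

```python
def scan_down_columns(matrix):
    """Add each row into the row below it, so every column becomes a running sum.

    On a GPU, one thread would own one column (thread count = column count).
    """
    out = [row[:] for row in matrix]
    for r in range(1, len(out)):
        for c in range(len(out[r])):
            out[r][c] += out[r - 1][c]
    return out


def transpose(matrix):
    """Plain matrix transposition."""
    return [list(col) for col in zip(*matrix)]


def integral_image(img):
    """Two scan passes with a transposition after each one."""
    first = transpose(scan_down_columns(img))    # pass 1: sums down the columns
    return transpose(scan_down_columns(first))   # pass 2: sums along original rows


def squared_integral_image(img):
    """Same procedure applied to the squared pixel values."""
    return integral_image([[v * v for v in row] for row in img])


img = [[1, 2, 3],
       [4, 5, 6]]
ii = integral_image(img)
sq = squared_integral_image(img)
```

Because the second pass runs on the transposed matrix, both passes scan in the same direction, which is what lets the same kernel be reused for rows and columns.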
Still further, step (3) is specifically:
a. The GPU grid and block counts are configured, and all scanning windows pass through the strong classifiers of the cascade classifier in turn. For at least the first three strong classifiers of the cascade, the computation is parallelized with one thread per scanning window: within its thread, each scanning window traverses all weak classifiers of the current strong classifier to obtain the integral-image feature values;
b. Each integral-image feature value is compared with the threshold of the corresponding weak classifier to obtain a score; the scores of all weak classifiers are accumulated, and the total is compared with the threshold of the corresponding strong classifier to decide whether the current scanning window passes the current strong classifier;
c. Once a scanning window has passed at least three strong classifiers of the cascade, one block computes one scanning window, and each thread within the block computes a different weak classifier for the current scanning window.
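The scoring in steps a and b can be sketched as follows. This is a minimal Python illustration, not the patented kernel: the dictionary layout, the `pass_score`/`fail_score` fields, and the zero-padded integral image are assumptions made for the example.

```python
def padded_integral(img):
    """Integral image with an extra zero row and column, so lookups need no bounds checks."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Pixel sum of a rectangle from four integral-image lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def feature_value(ii, win_x, win_y, rects):
    """Weighted sum of the rectangles of one Haar feature, relative to the window origin."""
    return sum(wt * rect_sum(ii, win_x + x, win_y + y, w, h)
               for x, y, w, h, wt in rects)


def passes_stage(ii, win_x, win_y, stage):
    """Accumulate weak-classifier scores and compare with the strong-classifier threshold."""
    score = 0.0
    for weak in stage["weak"]:
        value = feature_value(ii, win_x, win_y, weak["rects"])
        score += weak["pass_score"] if value >= weak["threshold"] else weak["fail_score"]
    return score >= stage["threshold"]


# Hypothetical one-feature stage: right half minus left half of a 4x4 window.
stage = {"threshold": 0.5,
         "weak": [{"rects": [(0, 0, 2, 4, -1), (2, 0, 2, 4, 1)],
                   "threshold": 40, "pass_score": 1.0, "fail_score": 0.0}]}
edge = [[0, 0, 10, 10] for _ in range(4)]
flat = [[5, 5, 5, 5] for _ in range(4)]
```

In the one-thread-per-window mode, each thread would run `passes_stage` for its own window; in the one-block-per-window mode, the loop over `stage["weak"]` is what gets split across the threads of the block.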
Preferably, in step (3), the thread count of each stage is determined by the number of scanning windows that passed the preceding strong classifier.
Preferably, when scanning windows pass through the strong classifiers between steps b and c, only a single kernel call is needed.
Compared with the prior art, the present invention has the following advantages:
(1) The invention prepares the fixed data at initialization and stores it directly on the GPU device, which reduces computation, avoids frequent transfers between host and device, and speeds up the Haar algorithm;
(2) The invention takes full advantage of the GPU's coalesced memory access, so the integral image and the squared integral image are computed quickly, shortening the running time of the Haar algorithm;
(3) The candidate image blocks at all scales are processed in parallel, which raises computational density and makes full use of the GPU's many small cores. This enables large-scale data-parallel processing, substantially reduces data processing time, and greatly speeds up the Haar algorithm;
(4) The invention matches the parallelization strategy to the way results evolve over successive detection rounds, using a different strategy at each phase, so parallel resources are used efficiently and target detection runs faster;
(5) Kernel functions deliberately stage their data in registers, which speeds up data reads, saves processing time, and further speeds up target detection;
(6) Through these combined improvements, the invention greatly raises the data-processing efficiency of the Haar algorithm, so that its speed can meet the practical demands of current target detection, widening its range of application and increasing its practical value.
Brief description of the drawings
Fig. 1 is a flow diagram of the present invention.
Detailed description
The invention is further described below with reference to the accompanying drawing and an embodiment. Embodiments of the present invention include, but are not limited to, the following example.
Embodiment
As shown in Fig. 1, the GPU-accelerated Haar detection method disclosed by the invention makes several technical improvements that address the slow speed of prior-art Haar implementations. In general, it comprises the following steps:
(1) System initialization: the CPU organizes the scanning windows for all scale factors, the feature rectangle information, and the classifier parameter data into matrices, with entries laid out one after another in GPU global memory, and stores each on the GPU device;
(2) The thread count is set according to the number of columns of the image matrix and a kernel function is called twice, with two matrix transpositions, to obtain the integral image and the squared integral image of the picture;
(3) The scanning windows and feature rectangles are arranged in GPU global memory, and Haar detection is performed;
(4) The scanning windows that pass all strong classifiers are merged as rectangles to obtain the detection result.
For step (1), all fixed data is stored directly on the GPU device at system initialization, including the scanning window information for all scale factors, the feature rectangle information, and the classifier parameter data that will be needed. Data exchange between the CPU and the GPU is relatively time-consuming, so this approach minimizes frequent CPU-GPU data exchange during actual processing and saves transfer time. When the data is stored on the GPU, all entries are arranged one after another in a matrix, so later detection steps can read them directly from the device; this layout also satisfies the coalescing requirement for global memory reads, which improves read speed. In addition, when a kernel function later accesses the data, the invention first reads the currently needed data into registers; because register reads and writes are extremely fast, this effectively improves overall read speed.
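As an illustration of preparing the fixed data once, the following Python sketch enumerates the scanning windows for every scale and lays them out as flat, parallel arrays. The base window size, scale step, and stride rule are assumptions chosen for the example; the patent does not specify them.

```python
def build_window_table(img_w, img_h, base=24, scale_step=1.25, stride_frac=0.1):
    """Enumerate scanning windows for every scale into flat, parallel arrays.

    Entry i of xs, ys and sizes describes window i. Storing each field
    contiguously means neighbouring threads read neighbouring addresses,
    which is the access pattern that coalesced global-memory reads want.
    """
    xs, ys, sizes = [], [], []
    scale = 1.0
    while int(base * scale) <= min(img_w, img_h):
        size = int(base * scale)
        stride = max(1, int(size * stride_frac))   # step grows with the window
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                xs.append(x)
                ys.append(y)
                sizes.append(size)
        scale *= scale_step
    return xs, ys, sizes


# Small example: a 48x48 image, doubling the window size at each scale.
xs, ys, sizes = build_window_table(48, 48, base=24, scale_step=2.0, stride_frac=0.5)
```

In a GPU implementation these arrays would be copied to the device once at start-up and reused for every frame of the same size, which is the point of the initialization step.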
For step (2), based on the matrix obtained in step (1), the thread count is set according to the number of columns of the matrix and the kernel function is called; the calculation on the current row and the next row is first carried out row by row down the columns, and the matrix is then transposed. The thread count is set again according to the number of columns of the transposed matrix and the kernel function is called; the calculation on the current row and the next row is carried out row by row once more, and the matrix is finally transposed again. The resulting matrix is the required integral image. The squared integral image is then computed in the same way as the integral image.
The matrix transposition is accelerated on the GPU. The matrix is divided into multiple blocks of fixed size. Between blocks, data can be read by row and written by column, or read by column and written by row; the data inside each block can likewise be read by row and written by column, or read by column and written by row.
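The block-wise transposition can be sketched in Python as below. This is a CPU stand-in under assumptions: `tile` is a hypothetical block size, and the small `buf` list plays the role that an on-chip buffer would play on a GPU.

```python
def transpose_tiled(matrix, tile=2):
    """Transpose a matrix one fixed-size tile at a time."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[None] * rows for _ in range(cols)]
    for tr in range(0, rows, tile):            # tile origin, row direction
        for tc in range(0, cols, tile):        # tile origin, column direction
            # Read the tile row by row into a small buffer.
            buf = [[matrix[r][c] for c in range(tc, min(tc + tile, cols))]
                   for r in range(tr, min(tr + tile, rows))]
            # Write the buffer out column by column at the mirrored position.
            for i, buf_row in enumerate(buf):
                for j, value in enumerate(buf_row):
                    out[tc + j][tr + i] = value
    return out


m = [[1, 2, 3],
     [4, 5, 6]]
t = transpose_tiled(m, tile=2)
```

Working tile by tile keeps both the reads and the writes confined to short contiguous runs, instead of striding across the whole matrix on one side of the copy.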
For step (3), when the scanning windows pass through the cascade classifier, the first few strong classifiers see a large total number of scanning windows, and each strong classifier also eliminates a large number of windows, while a single kernel call takes only a few microseconds. Therefore, to allocate device resources fully, detection proceeds stage by stage, and the number of windows remaining after each stage determines the grid and block configuration of the next stage.
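The stage-by-stage scheme for the early strong classifiers can be sketched as follows. The Python below is an assumption-laden illustration: each stage is modelled as a predicate on a window id, and `threads_per_block=256` is an arbitrary choice, not a value from the patent.

```python
def launch_config(n_windows, threads_per_block=256):
    """Blocks needed so that every surviving window gets its own thread."""
    blocks = (n_windows + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block


def run_early_stages(window_ids, stage_tests, threads_per_block=256):
    """One simulated kernel call per stage; survivors size the next call."""
    alive = list(window_ids)
    launches = []
    for test in stage_tests:
        if not alive:
            break                                   # nothing left to classify
        launches.append(launch_config(len(alive), threads_per_block))
        alive = [w for w in alive if test(w)]       # compact the survivor list
    return alive, launches


# Toy stages: keep even ids, then keep multiples of three.
survivors, launches = run_early_stages(
    range(1000), [lambda w: w % 2 == 0, lambda w: w % 3 == 0])
```

Compacting the survivor list between stages is what keeps later launches small: threads are only created for windows that are still candidates.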
When the windows pass through the strong classifiers in the middle section of the cascade, the number of scanning windows is no longer very large and the rate of elimination drops. At this point, to reduce the time spent on frequent kernel calls, on transferring out the resulting window counts, and on other code, several strong classifiers of the middle section are combined and handled by a single kernel call.
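The effect of combining the middle stages can be shown with a short Python comparison. Both functions are hypothetical CPU stand-ins: stages are modelled as predicates, and `calls` merely counts how many kernel calls the stage-by-stage variant would need.

```python
def run_fused_stages(window_ids, stage_tests):
    """Simulate ONE kernel call that pushes every window through several stages."""
    survivors = []
    for w in window_ids:                 # stands in for one GPU thread per window
        passed = True
        for test in stage_tests:         # middle stages evaluated back to back
            if not test(w):
                passed = False           # rejected: skip the remaining stages
                break
        if passed:
            survivors.append(w)
    return survivors


def run_stage_by_stage(window_ids, stage_tests):
    """Baseline: one call per stage, with the survivor list rebuilt in between."""
    alive = list(window_ids)
    calls = 0
    for test in stage_tests:
        alive = [w for w in alive if test(w)]
        calls += 1
    return alive, calls


stages = [lambda w: w % 2 == 0, lambda w: w % 5 == 0, lambda w: w < 50]
fused = run_fused_stages(range(100), stages)
baseline, calls = run_stage_by_stage(range(100), stages)
```

Both variants keep the same windows; the fused one trades some idle threads (for windows rejected early) against fewer launches and fewer intermediate count transfers.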
When the windows pass through the strong classifiers at the tail of the cascade, the total number of scanning windows is small while the number of weak classifiers in each strong classifier has grown, so window-level parallelism is no longer suitable. The invention therefore switches to block-based parallelism: one block computes one scanning window, and the threads inside each block evaluate the weak-classifier feature values of the same strong classifier in parallel.
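The block-per-window mode can be sketched in Python as below. The pairwise reduction is an assumption about how the per-thread scores would be combined inside a block (the patent does not say), and `weak_score_fn` is a hypothetical callback returning the score of weak classifier `t`.

```python
def block_reduce_sum(values):
    """Pairwise tree reduction, in the style of a reduction within one thread block."""
    buf = list(values)
    n = 1
    while n < len(buf):
        n *= 2                          # round the length up to a power of two
    buf += [0.0] * (n - len(buf))       # padding adds nothing to the sum
    stride = n // 2
    while stride > 0:
        for t in range(stride):         # "threads" with t < stride are active
            buf[t] += buf[t + stride]
        stride //= 2
    return buf[0]


def window_passes_stage(weak_score_fn, n_weak, stage_threshold):
    """One block per window: thread t evaluates weak classifier t, then scores are reduced."""
    per_thread = [weak_score_fn(t) for t in range(n_weak)]
    return block_reduce_sum(per_thread) >= stage_threshold


total = block_reduce_sum([1, 2, 3, 4, 5])
```

With few windows and many weak classifiers per stage, this arrangement keeps the device busy where one thread per window would leave most cores idle.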
After the last strong classifier has been processed, the remaining scanning windows are merged as rectangles, which yields the detected objects.
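The patent does not say how the rectangles are merged. One common approach, shown here purely as an assumed example, groups windows by overlap and averages each group; the `min_iou` threshold and the greedy grouping rule are illustrative choices.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def merge_rectangles(rects, min_iou=0.5):
    """Greedy grouping: a rectangle joins the first group whose seed it overlaps enough."""
    groups = []
    for r in rects:
        for g in groups:
            if iou(r, g[0]) >= min_iou:
                g.append(r)
                break
        else:
            groups.append([r])          # no matching group: start a new one
    # Each group is replaced by the integer average of its members.
    return [tuple(sum(v) // len(g) for v in zip(*g)) for g in groups]


hits = [(10, 10, 20, 20), (12, 10, 20, 20), (100, 100, 20, 20)]
merged = merge_rectangles(hits)
```

Here the first two windows overlap heavily and collapse into one detection, while the third stays separate.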
The invention improves on existing Haar algorithms in three respects: keeping the data ready on the device, computing the integral images quickly, and implementing Haar detection in stages tailored to the cascade. As a result, the Haar algorithm runs substantially faster, which gives it high practical value and strong potential for wide adoption.
The above embodiment is merely a preferred embodiment of the present invention and does not limit its scope of protection. Any variation made on the basis of the design principle of the present invention without creative work shall fall within the scope of protection of the present invention.
Claims (4)
1. A GPU-accelerated Haar detection method, characterized in that it comprises the following steps:
(1) System initialization: the CPU transfers the scanning windows for all scale factors, the feature rectangle information, and the classifier parameter data to the GPU device, where they are stored;
(2) The thread count is set according to the number of columns of the image matrix and a kernel function is called twice, with two matrix transpositions, to obtain the integral image and the squared integral image of the picture;
(3) The scanning windows and feature rectangles are arranged in GPU global memory and Haar detection is performed, specifically as follows:
a. The GPU grid and block counts are configured, and all scanning windows pass through the strong classifiers of the cascade classifier in turn. For at least the first three strong classifiers of the cascade, the computation is parallelized with one thread per scanning window: within its thread, each scanning window traverses all weak classifiers of the current strong classifier to obtain the integral-image feature values;
b. Each integral-image feature value is compared with the threshold of the corresponding weak classifier to obtain a score; the scores of all weak classifiers are accumulated, and the total is compared with the threshold of the corresponding strong classifier to decide whether the current scanning window passes the current strong classifier;
c. Once a scanning window has passed at least three strong classifiers of the cascade, one block computes one scanning window, and each thread within the block computes a different weak classifier for the current scanning window;
(4) The scanning windows that pass all strong classifiers are merged as rectangles to obtain the detection result.
2. The GPU-accelerated Haar detection method according to claim 1, characterized in that step (2) is carried out as follows:
First, the thread count is set according to the number of columns of the image matrix and the kernel function is called; working down the columns, the calculation on the current row and the next row is carried out row by row, and the matrix is then transposed;
Then, the thread count is set according to the number of columns of the transposed matrix and the kernel function is called again; the calculation on the current row and the next row is carried out row by row once more, and the matrix is transposed again. The resulting matrix is the required integral image;
Finally, the corresponding squared integral image is computed in the same way.
3. The GPU-accelerated Haar detection method according to claim 2, characterized in that, in step (3), the thread count of each stage is determined by the number of scanning windows that passed the preceding strong classifier.
4. The GPU-accelerated Haar detection method according to claim 3, characterized in that, when scanning windows pass through the strong classifiers between steps b and c, only a single kernel call is needed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510479238.1A CN105160350B (en) | 2015-08-03 | 2015-08-03 | A GPU-accelerated Haar detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510479238.1A CN105160350B (en) | 2015-08-03 | 2015-08-03 | A GPU-accelerated Haar detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105160350A (en) | 2015-12-16 |
CN105160350B (en) | 2018-08-28 |
Family
ID=54801202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510479238.1A Active CN105160350B (en) | A GPU-accelerated Haar detection method | 2015-08-03 | 2015-08-03 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105160350B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108052891A (en) * | 2017-12-08 | 2018-05-18 | 触景无限科技(北京)有限公司 | Facial contour parallel calculating method and device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104680558A (en) * | 2015-03-14 | 2015-06-03 | 西安电子科技大学 | Struck target tracking method using GPU hardware for acceleration |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8452111B2 (en) * | 2008-06-05 | 2013-05-28 | Microsoft Corporation | Real-time compression and decompression of wavelet-compressed images |
Non-Patent Citations (2)
Title |
---|
GPU AND CPU COOPERATIVE ACCELARATION FOR FACE DETECTION ON MODERN PROCESSORS;Eric Li 等;《2012 IEEE International Conference on Multimedia and Expo》;20121231;全文 * |
Research on GPU-based face detection and feature point localization (基于GPU的人脸检测和特征点定位研究); Zhang Yin et al.; 《电子技术研发》; 2014-09-25 (No. 9); full text * |
Also Published As
Publication number | Publication date |
---|---|
CN105160350A (en) | 2015-12-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||