CN107563392A - The YOLO object detection methods accelerated using OpenCL - Google Patents
- Publication number: CN107563392A
- Application number: CN201710798823.7A
- Authority: CN (China)
- Prior art keywords: convolutional neural networks
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the listed status.)
Abstract
The invention discloses a YOLO object detection method accelerated by GPU hardware. The steps are: (1) initialize the convolutional neural network; (2) obtain training samples; (3) divide the training samples into grids; (4) train the convolutional neural network; (5) judge whether the loss value is below 0.01; if so, save the trained convolutional neural network model, otherwise obtain the next batch of training samples and continue training; (6) save the trained model to the computer hard disk; (7) extract the features of a test picture; (8) determine the bounding box of the target in the test picture; (9) end detection. The invention can run on a general-purpose computer: it extracts features of the targets in an image, marks each target's position with a bounding box, and labels the target's class at the upper-right corner of the box.
Description
Technical field
The invention belongs to the field of computer technology, and further relates to an OpenCL (Open Computing Language) accelerated YOLO (You Only Look Once) object detection method in the fields of computer vision and deep learning. The invention accelerates the YOLO object detection method based on deep convolutional neural networks, so that targets in pictures can be detected in real time on a general-purpose computer.
Background technology
High-speed, high-performance object detection is a core technology of computer vision. In recent years, deep convolutional neural networks have achieved remarkable results in this field: compared with traditional methods, image classification and object detection algorithms based on deep convolutional neural networks significantly improve recognition accuracy. For object detection in complex scenes, algorithms based on deep convolutional neural networks are more robust and adapt better to changes of scene and illumination intensity.
The patent application "A convolutional neural network parallel processing method based on a large-scale high-performance computing cluster" (filing date: November 21, 2014; application number: 2014106748603; publication number: CN104463324A), filed by Changsha Masha Electronic Science and Technology Co., Ltd., discloses a convolutional neural network parallel processing method based on a large-scale high-performance computing cluster. The method first constructs multiple copies of the network model to be trained, each copy with identical model parameters; the number of copies equals the number of nodes in the cluster, and one model copy is assigned to each node. One node is selected as the master node, responsible for broadcasting and collecting the model parameters. The training set is then divided into subsets, which are distributed to the child nodes (all nodes except the master) to jointly compute the parameter gradients; the gradients are accumulated, the accumulated value is used to update the master node's model parameters, and the updated parameters are broadcast back to each child node, until training ends. The shortcoming of this method is that a convolutional neural network parallelized on a large-scale high-performance computing cluster cannot be ported to a general-purpose computer, which greatly limits its application.
The patent application "A Haar target detection algorithm based on a GPU platform" (filing date: December 16, 2015; application number: 2015104762047; publication number: CN105160349A), filed by Shenzhen Hagongda Traffic Electronic Technology Co., Ltd., discloses a GPU-accelerated Haar-classifier object detection method. The method first reads the classifier model file, computes the scaled size information for all scales, and transfers the data to the GPU device; it then computes the integral image and the squared integral image to obtain the standard deviation maps for different scales, and finally performs target detection with Haar classifiers processed in parallel. The shortcoming of this method is that it only computes integral images and squared integral images over hand-crafted features; it generalizes poorly to more complex scenes, and its detection accuracy is low.
Summary of the invention
The object of the invention is to overcome the above shortcomings of the prior art by providing a YOLO object detection method accelerated with OpenCL, which can detect the targets in an image.
The steps of the present invention are as follows:
(1) Initialize the convolutional neural network:
Compute the initial weight values, bias values and batch normalization scale factor values of the convolutional layers, and initialize the convolutional neural network with these three sets of values;
(2) Obtain training samples:
(2a) Randomly select 64 pictures from a picture set containing 20 target classes with annotated bounding boxes;
(2b) Pre-process each selected picture;
(2c) Set both the height and width of each pre-processed picture to 448 pixels to form the training sample set;
(3) Divide the training samples into grids:
Divide each picture in the training sample set into 7*7 square grids, each grid of size 64*64;
(4) Train the convolutional neural network:
(4a) Input the training sample set into the convolutional neural network;
(4b) Using the computer graphics processor GPU, compute in parallel each output feature value of the convolutional layers, and assemble all feature values into the output feature value matrices of the convolutional layers;
(4c) Take the maximum of the output features in each 2*2 neighborhood of a convolutional layer's output feature value matrix as the output feature value matrix of the network's max-pooling layer;
(4e) Using the GPU, compute in parallel each output feature value of the softmax layer, and assemble all feature values into the output feature value matrix of the softmax layer;
(4f) Using the GPU, compute in parallel the loss value of the network's output layer;
(4g) Using the GPU and the stochastic gradient descent method, compute in parallel the updated weight values and bias values of the network;
(5) Judge whether the current loss value of the output layer is less than 0.01; if so, perform step (6); otherwise, perform step (2);
(6) Save the trained convolutional neural network model to the computer hard disk;
(7) Extract the output features of a test picture:
(7a) Randomly select 1 picture, as the test picture, from a picture set without annotated bounding boxes;
(7b) Input the test picture into the convolutional neural network to obtain its output features;
(8) Mark the bounding box of the target in the test picture:
According to the output features of the test picture, mark each target in the test picture with a bounding box and display the target's class;
(9) End target detection.
Compared with the prior art, the present invention has the following advantages:
First, the present invention uses the computer graphics processor GPU to extract the features of training and test samples in parallel, overcoming the complexity and poor portability of extracting image features in parallel on large-scale high-performance computing clusters in the prior art; the invention therefore greatly improves the portability of the code while preserving the feature extraction speed.
Second, by extracting the features of training and test samples in parallel on the GPU, the present invention overcomes the low detection accuracy of the prior art for targets in complex scenes, greatly strengthening the generalization ability of the detection algorithm across scenes and improving the precision of target detection.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention.
Embodiment
The invention is described in further detail below with reference to the accompanying drawings.
The present invention is implemented in the OpenCL language and can run on any NVIDIA GPU device that supports the OpenCL framework.
With reference to Fig. 1, the present invention can be realized by the following steps.
Step 1, initialize the convolutional neural network.
According to the following formulas, compute the initial weight values, bias values and batch normalization scale factor values of the convolutional layers, and initialize the convolutional neural network with these three sets of values:

$$w_{ng}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(w_{ng}^{r})^{2}}{2}\right)$$

$$b_{g}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(b_{g}^{r})^{2}}{2}\right)$$

$$scale_{j}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(scale_{j}^{r})^{2}}{2}\right)$$

where $w_{ng}^{r}$ denotes the n-th weight value of channel g of layer r of the convolutional neural network, $\sim$ denotes drawing from the probability distribution, $\sqrt{\cdot}$ denotes the square-root operation, $\pi$ denotes pi, $\exp(\cdot)$ denotes exponentiation with the natural constant e as base, $b_{g}^{r}$ denotes the bias value of channel g of layer r, and $scale_{j}^{r}$ denotes the batch normalization scale factor value of layer r.
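The three initializers above are all the density of the standard normal distribution, so initialization amounts to drawing each weight, bias and scale factor from N(0, 1). A minimal NumPy sketch (the function name and tensor shapes are illustrative, not from the patent):

```python
import numpy as np

def init_conv_layer(num_channels, kernel_size, rng):
    """Draw weights, biases and batch-norm scale factors from the
    standard normal density (1/sqrt(2*pi)) * exp(-x**2 / 2)."""
    weights = rng.standard_normal((num_channels, kernel_size, kernel_size))
    biases = rng.standard_normal(num_channels)
    scales = rng.standard_normal(num_channels)
    return weights, biases, scales

rng = np.random.default_rng(0)
w, b, s = init_conv_layer(num_channels=16, kernel_size=3, rng=rng)
```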
Step 2, obtain training samples.
From a picture set containing 20 target classes with annotated bounding boxes, randomly select 64 pictures.
Pre-process each selected picture according to the following 1st, 2nd and 3rd steps.
1st step, in the angle range [-15, 15], choose a value at random as the rotation angle of each selected picture, and rotate each picture by the chosen angle.
2nd step, in the pixel range [-20, 20], choose a value at random as the horizontal shift of each selected picture, and shift each picture horizontally by the chosen number of pixels.
3rd step, in the pixel range [-20, 20], choose a value at random as the vertical shift of each selected picture, and shift each picture vertically by the chosen number of pixels, obtaining the pre-processed pictures.
Set both the height and width of each pre-processed picture to 448 pixels to form the training sample set.
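The three pre-processing steps can be sketched with NumPy alone. This is a rough illustration only: nearest-neighbour rotation, wrap-around shifts via `np.roll`, and nearest-neighbour resizing stand in for whatever interpolation the authors used, which the patent does not specify.

```python
import numpy as np

def rotate_nn(img, deg):
    """Nearest-neighbour rotation about the image centre (zero fill)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    th = np.deg2rad(deg)
    ys, xs = np.indices((h, w))
    # inverse-map each output pixel back into the source image
    sy = cy + (ys - cy) * np.cos(th) - (xs - cx) * np.sin(th)
    sx = cx + (ys - cy) * np.sin(th) + (xs - cx) * np.cos(th)
    sy, sx = np.rint(sy).astype(int), np.rint(sx).astype(int)
    ok = (sy >= 0) & (sy < h) & (sx >= 0) & (sx < w)
    out = np.zeros_like(img)
    out[ok] = img[sy[ok], sx[ok]]
    return out

def augment(img, rng):
    deg = rng.uniform(-15, 15)      # 1st step: random rotation angle
    dx = rng.integers(-20, 21)      # 2nd step: horizontal shift (pixels)
    dy = rng.integers(-20, 21)      # 3rd step: vertical shift (pixels)
    img = rotate_nn(img, deg)
    img = np.roll(img, (dy, dx), axis=(0, 1))   # wrap-around shift
    # resize to 448x448 with nearest-neighbour sampling
    h, w = img.shape
    yi = np.arange(448) * h // 448
    xi = np.arange(448) * w // 448
    return img[np.ix_(yi, xi)]

rng = np.random.default_rng(0)
sample = augment(np.ones((300, 400)), rng)
```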
Step 3, divide the training samples into grids.
Divide each picture in the training sample set into 7*7 square grids, each grid of size 64*64.
Step 4, train the convolutional neural network.
Input the training sample set into the convolutional neural network.
Using the computer graphics processor GPU, compute in parallel each output feature value of the convolutional layers according to the following 1st and 2nd steps, and assemble all feature values into the output feature value matrices of the convolutional layers.
1st step, according to the following formula, calculate the output values of the convolution operation in the convolutional neural network:

$$C_{ij}^{r} = \sum_{n=0}^{S_{g}} x_{ij}^{r-1} * w_{ng}^{r}$$

where $C_{ij}^{r}$ denotes the i-th output value of the convolution operation on channel j of layer r, $\sum$ denotes summation, $S_{g}$ denotes the size of channel g of the convolution kernel, $x_{ij}^{r-1}$ denotes the i-th output feature value of channel j of layer r-1, and $*$ denotes multiplication;
2nd step, according to the following formula, calculate each output feature value of the convolutional layer:

$$A_{t} = \mathrm{activate}\left( scale_{j}^{r} \cdot \frac{C_{ij}^{r} - \frac{1}{m}\sum_{i=1}^{m} C_{ij}^{r}}{\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(C_{ij}^{r} - \frac{1}{m}\sum_{i=1}^{m} C_{ij}^{r}\right)^{2} + \delta}} + b_{g}^{r} \right)$$

where $A_{t}$ denotes the t-th output feature value of the convolutional layer, activate denotes the activation function operation, m denotes the channel size, and $\delta$ denotes a very small number close to 0.
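For one channel, the formula above is ordinary batch normalisation followed by an activation. A small sketch (leaky ReLU is assumed here as the activation, since the patent does not name one):

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    """Assumed activation; the patent only writes 'activate'."""
    return np.where(x > 0, x, slope * x)

def bn_activate(C, scale, bias, delta=1e-6):
    """Batch-normalise the m convolution outputs C of one channel,
    then apply scale, bias and the activation function."""
    mu = C.mean()
    var = ((C - mu) ** 2).mean()
    return leaky_relu(scale * (C - mu) / np.sqrt(var + delta) + bias)

A = bn_activate(np.array([1.0, 2.0, 3.0, 4.0]), scale=1.0, bias=0.0)
```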
Take the maximum of the output features in each 2*2 neighborhood of a convolutional layer's output feature value matrix as the output feature value matrix of the network's max-pooling layer.
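The 2*2 max-pooling step can be sketched as:

```python
import numpy as np

def maxpool2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) feature map."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 2, 5, 6],
               [3, 4, 7, 8],
               [9, 1, 2, 3],
               [4, 5, 6, 7]], dtype=float)
pooled = maxpool2x2(fm)  # -> [[4., 8.], [9., 7.]]
```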
According to the following formula, using the computer graphics processor GPU, compute in parallel each output feature value of the softmax layer; all the results together form the output feature value matrix of the softmax layer:

$$Y_{z} = \frac{\exp(x_{z})}{\sum_{k=1}^{e} \exp(x_{k})}$$

where $Y_{z}$ denotes the z-th output feature value of the softmax layer of the convolutional neural network, $x_{k}$ denotes the k-th input feature value of the softmax layer, and e denotes the total number of input feature values of the softmax layer.
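A numerically stable sketch of the softmax computation (subtracting the maximum does not change the result but avoids overflow):

```python
import numpy as np

def softmax(x):
    """Y_z = exp(x_z) / sum_k exp(x_k), computed stably."""
    e = np.exp(x - x.max())
    return e / e.sum()

y = softmax(np.array([1.0, 2.0, 3.0]))
```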
According to the following 1st to 4th steps, using the computer graphics processor GPU, compute in parallel the loss value of the network's output layer.
1st step, according to the following formula, calculate the position loss value of the output layer:

$$L1 = \lambda \sum_{\gamma=0}^{D} \sum_{\beta=0}^{F} 1_{\gamma\beta}^{obj} \left[ (u_{\gamma} - \hat{u}_{\gamma})^{2} + (v_{\gamma} - \hat{v}_{\gamma})^{2} \right] + \lambda \sum_{\gamma=0}^{D} \sum_{\beta=0}^{F} 1_{\gamma\beta}^{obj} \left[ \left(\sqrt{t_{\gamma}} - \sqrt{\hat{t}_{\gamma}}\right)^{2} + \left(\sqrt{h_{\gamma}} - \sqrt{\hat{h}_{\gamma}}\right)^{2} \right]$$

where L1 denotes the position loss value of the output layer, $\lambda$ denotes the penalty factor of the target position, D denotes the number of grids the image is divided into, F denotes the number of bounding boxes, $1_{\gamma\beta}^{obj}$ denotes the indicator function that a target exists in the $\beta$-th bounding box of the $\gamma$-th grid of the picture, $u_{\gamma}$ denotes the abscissa of the predicted position in the $\gamma$-th grid, $\hat{u}_{\gamma}$ denotes the abscissa of the target's actual position in the $\gamma$-th grid, $v_{\gamma}$ and $\hat{v}_{\gamma}$ denote the corresponding ordinates, $t_{\gamma}$ denotes the predicted width of the target in the $\gamma$-th grid, $\hat{t}_{\gamma}$ denotes its actual width, $h_{\gamma}$ denotes the predicted height of the target, and $\hat{h}_{\gamma}$ denotes its actual height;
2nd step, according to the following formula, calculate the object-presence probability loss value of the output layer:

$$L2 = \sum_{\gamma=1}^{D} \sum_{\beta=0}^{F} 1_{\gamma\beta}^{obj} (Q_{\gamma} - \hat{Q}_{\gamma})^{2} + \lambda_{1} \sum_{\gamma=0}^{D} \sum_{\beta=0}^{F} 1_{\gamma\beta}^{noobj} (Q_{\gamma} - \hat{Q}_{\gamma})^{2}$$

where L2 denotes the object-presence probability loss value of the output layer, $Q_{\gamma}$ denotes the predicted probability that a target exists in the $\gamma$-th grid of the picture, $\hat{Q}_{\gamma}$ denotes the actual probability that a target exists in the $\gamma$-th grid, $\lambda_{1}$ denotes the penalty factor of the no-object term, and $1_{\gamma\beta}^{noobj}$ denotes the indicator function that no target exists in the $\beta$-th bounding box of the $\gamma$-th grid;
3rd step, according to the following formula, calculate the classification probability loss value of the output layer:

$$L3 = \sum_{\gamma=0}^{D} 1_{\gamma}^{obj} \sum_{cla=1}^{classes} \left( p_{\gamma}(cla) - \hat{p}_{\gamma}(cla) \right)^{2}$$

where L3 denotes the classification probability loss value of the output layer, $1_{\gamma}^{obj}$ denotes the indicator function of whether a target exists in the $\gamma$-th grid of the picture, classes denotes the total number of classes, $p_{\gamma}(cla)$ denotes the predicted probability that the class of the target in the $\gamma$-th grid is cla, and $\hat{p}_{\gamma}(cla)$ denotes the true probability that the class of the target in the $\gamma$-th grid is cla.
4th step, according to the following formula, calculate the total loss value of the output layer:
L = L1 + L2 + L3
where L denotes the loss value of the convolutional neural network output layer.
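A toy NumPy version of the combined loss L = L1 + L2 + L3 for a single image. The dict layout and the λ = 5, λ₁ = 0.5 defaults are illustrative assumptions, and the patent's inner loop over bounding boxes β is collapsed to one box per grid cell here:

```python
import numpy as np

def yolo_loss(pred, truth, lam=5.0, lam1=0.5):
    """L1: position loss, L2: object-presence loss, L3: class loss."""
    obj = truth["obj"]              # 1 where a target exists, shape (D,)
    noobj = 1.0 - obj
    L1 = lam * np.sum(obj * ((pred["u"] - truth["u"]) ** 2
                             + (pred["v"] - truth["v"]) ** 2)) \
       + lam * np.sum(obj * ((np.sqrt(pred["t"]) - np.sqrt(truth["t"])) ** 2
                             + (np.sqrt(pred["h"]) - np.sqrt(truth["h"])) ** 2))
    L2 = np.sum(obj * (pred["Q"] - truth["Q"]) ** 2) \
       + lam1 * np.sum(noobj * (pred["Q"] - truth["Q"]) ** 2)
    L3 = np.sum(obj[:, None] * (pred["p"] - truth["p"]) ** 2)
    return L1 + L2 + L3

truth = {"obj": np.array([1.0, 0.0]),
         "u": np.array([0.5, 0.0]), "v": np.array([0.5, 0.0]),
         "t": np.array([0.2, 0.0]), "h": np.array([0.3, 0.0]),
         "Q": np.array([1.0, 0.0]),
         "p": np.array([[1.0, 0.0], [0.0, 0.0]])}
perfect = {k: v.copy() for k, v in truth.items()}
loss = yolo_loss(perfect, truth)   # identical prediction gives zero loss
```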
According to the following 1st and 2nd steps, using the computer graphics processor GPU and the stochastic gradient descent method, compute in parallel the updated weight values and bias values of the convolutional neural network.
1st step, according to the following formulas, calculate the gradient values of the weights and biases of each channel of each layer:

$$\nabla w_{ng}^{r} = \frac{\partial L}{\partial w_{ng}^{r}}, \qquad \nabla b_{g}^{r} = \frac{\partial L}{\partial b_{g}^{r}}$$

where $\nabla w_{ng}^{r}$ denotes the gradient value of the n-th weight of channel g of layer r, $\partial$ denotes the partial derivative operation, and $\nabla b_{g}^{r}$ denotes the gradient value of the bias of channel g of layer r.
2nd step, according to the following formulas, compute in parallel the updated weight values and bias values:

$$\hat{w}_{ng}^{r} = w_{ng}^{r} - \alpha \nabla w_{ng}^{r}, \qquad \hat{b}_{g}^{r} = b_{g}^{r} - \alpha \nabla b_{g}^{r}$$

where $\hat{w}_{ng}^{r}$ denotes the updated weight value of channel g of layer r, $\hat{b}_{g}^{r}$ denotes the updated bias value of channel g of layer r, and $\alpha$ denotes the learning rate, whose value range is (0, 1).
Step 5, judge whether the current loss value of the output layer is less than 0.01; if so, perform step 6; otherwise, perform step 2.
Step 6, save the trained convolutional neural network model to the computer hard disk.
Step 7, extract the output features of a test picture.
Randomly select 1 picture, as the test picture, from a picture set without annotated bounding boxes.
Input the test picture into the convolutional neural network to obtain its output features.
Step 8, mark the bounding box of the target in the test picture.
According to the output features of the test picture, mark each target in the test picture with a bounding box and display the target's class.
Step 9, end target detection.
The effect of the present invention is described in further detail below with reference to a simulation experiment.
1. Simulation experiment conditions:
The simulation experiment of the present invention ran on an NVIDIA heterogeneous development platform: the host CPU is a Xeon E5-1603, the graphics processor is an NVIDIA GTX 1080, the operating system is Ubuntu 14.04, and the software environment is Eclipse CDT.
2. Simulation experiment content and analysis of results:
In the simulation experiment, 1 picture was randomly selected from a picture set without annotated bounding boxes as the test picture. Target detection was performed on the test picture with the traditional YOLO algorithm and with the method of the invention, and the time each method took to detect the targets in the selected test picture was compared, as shown in Table 1.
Table 1. Detection time of the invention versus the traditional YOLO object detection method (unit: ms)

| Traditional YOLO object detection method | 2613 |
| Method of the invention | 53 |

As Table 1 shows, the OpenCL-accelerated YOLO object detection method proposed by the invention takes significantly less time than the traditional YOLO object detection method.
Claims (7)
1. A YOLO object detection method accelerated with OpenCL, characterized by comprising the following steps:
(1) initializing the convolutional neural network:
computing the initial weight values, bias values and batch normalization scale factor values of the convolutional layers, and initializing the convolutional neural network with these three sets of values;
(2) obtaining training samples:
(2a) randomly selecting 64 pictures from a picture set containing 20 target classes with annotated bounding boxes;
(2b) pre-processing each selected picture;
(2c) setting both the height and width of each pre-processed picture to 448 pixels to form the training sample set;
(3) dividing the training samples into grids:
dividing each picture in the training sample set into 7*7 square grids, each grid of size 64*64;
(4) training the convolutional neural network:
(4a) inputting the training sample set into the convolutional neural network;
(4b) using the computer graphics processor GPU, computing in parallel each output feature value of the convolutional layers, and assembling all feature values into the output feature value matrices of the convolutional layers;
(4c) taking the maximum of the output features in each 2*2 neighborhood of a convolutional layer's output feature value matrix as the output feature value matrix of the network's max-pooling layer;
(4e) using the GPU, computing in parallel each output feature value of the softmax layer, and assembling all feature values into the output feature value matrix of the softmax layer;
(4f) using the GPU, computing in parallel the loss value of the network's output layer;
(4g) using the GPU and the stochastic gradient descent method, computing in parallel the updated weight values and bias values of the network;
(5) judging whether the current loss value of the output layer is less than 0.01; if so, performing step (6); otherwise, performing step (2);
(6) saving the trained convolutional neural network model to the computer hard disk;
(7) extracting the output features of a test picture:
(7a) randomly selecting 1 picture, as the test picture, from a picture set without annotated bounding boxes;
(7b) inputting the test picture into the convolutional neural network to obtain its output features;
(8) marking the bounding box of the target in the test picture:
according to the output features of the test picture, marking each target in the test picture with a bounding box and displaying the target's class;
(9) ending target detection.
2. The OpenCL-accelerated YOLO object detection method according to claim 1, characterized in that the formulas in step (1) for computing the initial weight values, bias values and batch normalization scale factor values of the convolutional layers are as follows:
$$w_{ng}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(w_{ng}^{r})^{2}}{2}\right)$$

$$b_{g}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(b_{g}^{r})^{2}}{2}\right)$$

$$scale_{j}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(scale_{j}^{r})^{2}}{2}\right)$$
where $w_{ng}^{r}$ denotes the n-th weight value of channel g of layer r of the convolutional neural network, $\sim$ denotes drawing from the probability distribution, $\sqrt{\cdot}$ denotes the square-root operation, $\pi$ denotes pi, $\exp(\cdot)$ denotes exponentiation with the natural constant e as base, $b_{g}^{r}$ denotes the bias value of channel g of layer r, and $scale_{j}^{r}$ denotes the batch normalization scale factor value of layer r.
3. The OpenCL-accelerated YOLO object detection method according to claim 1, characterized in that the pre-processing of each selected picture in step (2b) comprises the following steps:
1st step, in the angle range [-15, 15], choosing a value at random as the rotation angle of each selected picture, and rotating each picture by the chosen angle;
2nd step, in the pixel range [-20, 20], choosing a value at random as the horizontal shift of each selected picture, and shifting each picture horizontally by the chosen number of pixels;
3rd step, in the pixel range [-20, 20], choosing a value at random as the vertical shift of each selected picture, and shifting each picture vertically by the chosen number of pixels, obtaining the pre-processed pictures.
4. The OpenCL-accelerated YOLO object detection method according to claim 1, characterized in that the parallel computation in step (4b) of each output feature value of the convolutional layers comprises the following steps:
1st step, according to the following formula, calculating the output values of the convolution operation in the convolutional neural network:
$$C_{ij}^{r} = \sum_{n=0}^{S_{g}} x_{ij}^{r-1} * w_{ng}^{r}$$
where $C_{ij}^{r}$ denotes the i-th output value of the convolution operation on channel j of layer r, $\sum$ denotes summation, $S_{g}$ denotes the size of channel g of the convolution kernel, $x_{ij}^{r-1}$ denotes the i-th output feature value of channel j of layer r-1, and $*$ denotes multiplication;
2nd step, according to the following formula, calculating each output feature value of the convolutional layer:
$$A_{t} = \mathrm{activate}\left( scale_{j}^{r} \cdot \frac{C_{ij}^{r} - \frac{1}{m}\sum_{i=1}^{m} C_{ij}^{r}}{\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(C_{ij}^{r} - \frac{1}{m}\sum_{i=1}^{m} C_{ij}^{r}\right)^{2} + \delta}} + b_{g}^{r} \right)$$
where $A_{t}$ denotes the t-th output feature value of the convolutional layer, activate denotes the activation function operation, m denotes the channel size, and $\delta$ denotes a very small number close to 0.
5. The OpenCL-accelerated YOLO object detection method according to claim 1, characterized in that the formula in step (4e) for computing in parallel each output feature value of the softmax layer is as follows:
$$Y_{z} = \frac{\exp(x_{z})}{\sum_{k=1}^{e} \exp(x_{k})}$$
where $Y_{z}$ denotes the z-th output feature value of the softmax layer of the convolutional neural network, $x_{k}$ denotes the k-th input feature value of the softmax layer, and e denotes the total number of input feature values of the softmax layer.
6. The OpenCL-accelerated YOLO object detection method according to claim 1, characterized in that the parallel computation in step (4f) of the loss value of the network's output layer comprises the following steps:
1st step, according to the following formula, calculating the position loss value of the output layer:
$$L1 = \lambda \sum_{\gamma=0}^{D} \sum_{\beta=0}^{F} 1_{\gamma\beta}^{obj} \left[ (u_{\gamma} - \hat{u}_{\gamma})^{2} + (v_{\gamma} - \hat{v}_{\gamma})^{2} \right] + \lambda \sum_{\gamma=0}^{D} \sum_{\beta=0}^{F} 1_{\gamma\beta}^{obj} \left[ \left(\sqrt{t_{\gamma}} - \sqrt{\hat{t}_{\gamma}}\right)^{2} + \left(\sqrt{h_{\gamma}} - \sqrt{\hat{h}_{\gamma}}\right)^{2} \right]$$
where $L1$ denotes the position penalty value at the convolutional neural network output layer; $\lambda$ denotes the penalty factor for the target position; $D$ denotes the number of grid cells the image is divided into; $F$ denotes the number of bounding boxes; $1_{\gamma\beta}^{obj}$ is the indicator function for the presence of a target in the $\beta$-th bounding box of the $\gamma$-th grid cell of the picture; $u_\gamma$ denotes the abscissa of the predicted position in the $\gamma$-th grid cell, and $\hat{u}_\gamma$ the abscissa of the actual target position in that cell; $v_\gamma$ denotes the ordinate of the predicted position, and $\hat{v}_\gamma$ the ordinate of the actual target position; $t_\gamma$ denotes the predicted width of the target in the $\gamma$-th grid cell, and $\hat{t}_\gamma$ its actual width; $h_\gamma$ denotes the predicted height of the target, and $\hat{h}_\gamma$ its actual height;
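The position penalty above can be sketched in NumPy, assuming the predicted and actual quantities are stored as `(D, F)` arrays indexed by grid cell and bounding box; the function and variable names (`position_loss`, `obj_mask`, `lam`) are illustrative, not taken from the patent:

```python
import numpy as np

def position_loss(u, u_hat, v, v_hat, t, t_hat, h, h_hat, obj_mask, lam):
    # Centre-point term: squared error on the predicted box centre.
    coord = (u - u_hat) ** 2 + (v - v_hat) ** 2
    # Size term: squared error on the square roots of width and height,
    # which damps the influence of large boxes.
    size = ((np.sqrt(t) - np.sqrt(t_hat)) ** 2
            + (np.sqrt(h) - np.sqrt(h_hat)) ** 2)
    # Only boxes that actually contain a target (obj_mask = 1) contribute;
    # both terms are weighted by the position penalty factor lam.
    return lam * (obj_mask * coord).sum() + lam * (obj_mask * size).sum()
```

Masking with the indicator array and summing with NumPy reductions keeps the computation vectorised over all grid cells and boxes at once.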
2nd step: according to the following formula, calculate the penalty value for the probability that a target is present at the convolutional neural network output layer:
$$L2=\sum_{\gamma=0}^{D}\sum_{\beta=0}^{F}1_{\gamma\beta}^{obj}\left(Q_\gamma-\hat{Q}_\gamma\right)^2+\lambda_1\sum_{\gamma=0}^{D}\sum_{\beta=0}^{F}1_{\gamma\beta}^{noobj}\left(Q_\gamma-\hat{Q}_\gamma\right)^2$$
where $L2$ denotes the penalty value for the probability that a target is present at the convolutional neural network output layer; $Q_\gamma$ denotes the predicted probability that a target is present in the $\gamma$-th grid cell of the picture, and $\hat{Q}_\gamma$ the actual probability; $\lambda_1$ denotes the penalty factor for the no-target term; $1_{\gamma\beta}^{noobj}$ denotes the indicator function for the absence of a target in the $\beta$-th bounding box of the $\gamma$-th grid cell of the picture;
3rd step: according to the following formula, calculate the penalty value for the class probabilities at the convolutional neural network output layer:
$$L3=\sum_{\gamma=0}^{D}1_{\gamma}^{obj}\sum_{cla=0}^{classes}\left(p_\gamma(cla)-\hat{p}_\gamma(cla)\right)^2$$
where $L3$ denotes the penalty value for the class probabilities at the convolutional neural network output layer; $1_{\gamma}^{obj}$ is the indicator function for whether a target is present in the $\gamma$-th grid cell of the picture; $classes$ denotes the total number of classes; $p_\gamma(cla)$ denotes the predicted probability that the target in the $\gamma$-th grid cell belongs to class $cla$, and $\hat{p}_\gamma(cla)$ the true probability;
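The class penalty can be sketched with `p` as a `(D, classes)` array and a per-cell indicator vector; names (`class_loss`, `obj_cell`) are illustrative, not from the patent:

```python
import numpy as np

def class_loss(p, p_hat, obj_cell):
    # Squared error summed over the class axis, giving one value per grid cell.
    per_cell = ((p - p_hat) ** 2).sum(axis=1)
    # Only grid cells that contain a target (obj_cell = 1) contribute.
    return (obj_cell * per_cell).sum()
```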
4th step: according to the following formula, calculate the penalty value of the convolutional neural network output layer:
L = L1 + L2 + L3
where $L$ denotes the total penalty value of the convolutional neural network output layer.
7. The OpenCL-accelerated YOLO object detection method according to claim 1, characterised in that the stochastic gradient descent method used in step (4g) to compute, in parallel, the updated weight and bias values of the convolutional neural network comprises the following steps:
1st step: according to the following formulas, calculate the gradients of the weights and of the biases of each channel of each layer of the convolutional neural network:
$$\Delta w_{ng}^{r}=\frac{\partial L}{\partial w_{ng}^{r}},\qquad \Delta b_{g}^{r}=\frac{\partial L}{\partial b_{g}^{r}}$$
where $\Delta w_{ng}^{r}$ denotes the gradient of the $n$-th weight of channel $g$ in layer $r$ of the convolutional neural network, $\partial$ denotes the partial-derivative operator, and $\Delta b_{g}^{r}$ denotes the gradient of the bias of channel $g$ in layer $r$;
2nd step: according to the following formulas, compute in parallel the updated weight and bias values of the convolutional neural network:
$$\bar{w}_{ng}^{r}=w_{ng}^{r}-\alpha\Delta w_{ng}^{r},\qquad \bar{b}_{g}^{r}=b_{g}^{r}-\alpha\Delta b_{g}^{r}$$
where $\bar{w}_{ng}^{r}$ denotes the updated value of the $n$-th weight of channel $g$ in layer $r$ of the convolutional neural network, $\bar{b}_{g}^{r}$ denotes the updated bias of channel $g$ in layer $r$, and $\alpha$ denotes the learning rate, whose value lies in the range (0, 1).
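The update formulas above are a plain stochastic-gradient-descent step, which can be sketched as follows; the names (`sgd_step`, `dw`, `db`) are illustrative, not from the patent:

```python
import numpy as np

def sgd_step(w, b, dw, db, alpha):
    # Each weight and bias moves against its gradient, scaled by the
    # learning rate alpha, whose value lies in (0, 1).
    return w - alpha * dw, b - alpha * db
```

Because the update of every weight is independent of all the others, the step is trivially data-parallel, which is what makes it a natural fit for an OpenCL kernel with one work-item per weight.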
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710798823.7A CN107563392A (en) | 2017-09-07 | 2017-09-07 | The YOLO object detection methods accelerated using OpenCL |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107563392A true CN107563392A (en) | 2018-01-09 |
Family
ID=60979539
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710798823.7A Pending CN107563392A (en) | 2017-09-07 | 2017-09-07 | The YOLO object detection methods accelerated using OpenCL |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107563392A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104463324A (en) * | 2014-11-21 | 2015-03-25 | 长沙马沙电子科技有限公司 | Convolution neural network parallel processing method based on large-scale high-performance cluster |
CN104680558A (en) * | 2015-03-14 | 2015-06-03 | 西安电子科技大学 | Struck target tracking method using GPU hardware for acceleration |
CN105160349A (en) * | 2015-08-06 | 2015-12-16 | 深圳市哈工大交通电子技术有限公司 | Haar detection object algorithm based on GPU platform |
CN106997475A (en) * | 2017-02-24 | 2017-08-01 | 中国科学院合肥物质科学研究院 | A kind of insect image-recognizing method based on parallel-convolution neutral net |
Non-Patent Citations (4)
Title |
---|
ANDRÉ R. BRODTKORB等: "Graphics processing unit (GPU) programming strategies and trends in GPU computing", 《J. PARALLEL DISTRIB. COMPUT.》 * |
JOSEPH REDMON等: ""You Only Look Once:Unified, Real-Time Object Detection"", 《网页在线公开:ARXIV:1506.02640V5》 * |
LOC NGUYEN HUYNH等: ""Demo: GPU-based image recognition and object detection on commodity mobile devices"", 《MOBISYS’16 COMPANION PROCEEDINGS OF THE 14TH ANNUAL INTERNATIONAL CONFERENCE ON MOBILE SYSTEMS,APPLICATIONS,AND SERVICES COMPANION》 * |
STEVE LAWRENCE等: "Face Recognition: A Convolutional Neural-Network Approach", 《IEEE TRANSACTIONS ON NEURAL NETWORKS》 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108289177A (en) * | 2018-02-13 | 2018-07-17 | 北京旷视科技有限公司 | Information interacting method, apparatus and system |
CN108289177B (en) * | 2018-02-13 | 2020-10-16 | 北京旷视科技有限公司 | Information interaction method, device and system |
CN108805064A (en) * | 2018-05-31 | 2018-11-13 | 中国农业大学 | A kind of fish detection and localization and recognition methods and system based on deep learning |
CN108830195A (en) * | 2018-05-31 | 2018-11-16 | 西安电子科技大学 | Image classification method based on on-site programmable gate array FPGA |
CN108982901A (en) * | 2018-06-14 | 2018-12-11 | 哈尔滨工业大学 | A kind of rotating speed measurement method of at the uniform velocity rotary body |
CN108982901B (en) * | 2018-06-14 | 2020-06-09 | 哈尔滨工业大学 | Method for measuring rotating speed of uniform-speed rotating body |
CN110826379A (en) * | 2018-08-13 | 2020-02-21 | 中国科学院长春光学精密机械与物理研究所 | Target detection method based on feature multiplexing and YOLOv3 |
CN110826379B (en) * | 2018-08-13 | 2022-03-22 | 中国科学院长春光学精密机械与物理研究所 | Target detection method based on feature multiplexing and YOLOv3 |
CN111078195A (en) * | 2018-10-18 | 2020-04-28 | 中国科学院长春光学精密机械与物理研究所 | Target capture parallel acceleration method based on OPENCL |
CN109447034A (en) * | 2018-11-14 | 2019-03-08 | 北京信息科技大学 | Traffic mark detection method in automatic Pilot based on YOLOv3 network |
CN109447034B (en) * | 2018-11-14 | 2021-04-06 | 北京信息科技大学 | Traffic sign detection method in automatic driving based on YOLOv3 network |
CN109684143A (en) * | 2018-12-26 | 2019-04-26 | 郑州云海信息技术有限公司 | A kind of method and device of the test GPU performance based on deep learning |
CN109977783A (en) * | 2019-02-28 | 2019-07-05 | 浙江新再灵科技股份有限公司 | Method based on the independent boarding detection of vertical ladder scene perambulator |
CN109858569A (en) * | 2019-03-07 | 2019-06-07 | 中国科学院自动化研究所 | Multi-tag object detecting method, system, device based on target detection network |
CN109978043A (en) * | 2019-03-19 | 2019-07-05 | 新华三技术有限公司 | A kind of object detection method and device |
CN110110844A (en) * | 2019-04-24 | 2019-08-09 | 西安电子科技大学 | Convolutional neural networks method for parallel processing based on OpenCL |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107563392A (en) | The YOLO object detection methods accelerated using OpenCL | |
CN108985238A (en) | The high-resolution remote sensing image impervious surface extracting method and system of combined depth study and semantic probability | |
Zheng et al. | Edge effects in fragmented landscapes: a generic model for delineating area of edge influences (D-AEI) | |
CN109886359A (en) | Small target detecting method and detection model based on convolutional neural networks | |
CN108062756A (en) | Image, semantic dividing method based on the full convolutional network of depth and condition random field | |
CN111178206B (en) | Building embedded part detection method and system based on improved YOLO | |
CN108334499A (en) | A kind of text label tagging equipment, method and computing device | |
CN108596101A (en) | A kind of remote sensing images multi-target detection method based on convolutional neural networks | |
CN110263833A (en) | Based on coding-decoding structure image, semantic dividing method | |
CN107392973A (en) | Pixel-level handwritten Chinese character automatic generation method, storage device, processing unit | |
CN111833237B (en) | Image registration method based on convolutional neural network and local homography transformation | |
CN104217438A (en) | Image significance detection method based on semi-supervision | |
CN108241854A (en) | A kind of deep video conspicuousness detection method based on movement and recall info | |
CN110516677A (en) | A kind of neural network recognization model, target identification method and system | |
CN107967474A (en) | A kind of sea-surface target conspicuousness detection method based on convolutional neural networks | |
CN103268607B (en) | A kind of common object detection method under weak supervision condition | |
CN107092883A (en) | Object identification method for tracing | |
CN108447057A (en) | SAR image change detection based on conspicuousness and depth convolutional network | |
CN106372597B (en) | CNN Vehicle Detection method based on adaptive contextual information | |
CN109255304A (en) | Method for tracking target based on distribution field feature | |
CN108776777A (en) | The recognition methods of spatial relationship between a kind of remote sensing image object based on Faster RCNN | |
CN108664986A (en) | Based on lpThe multi-task learning image classification method and system of norm regularization | |
CN107506792A (en) | A kind of semi-supervised notable method for checking object | |
CN107239532A (en) | Data digging method and device | |
CN109948527A (en) | Small sample terahertz image foreign matter detecting method based on integrated deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20180109 |
|
RJ01 | Rejection of invention patent application after publication |