CN107563392A - YOLO object detection method accelerated using OpenCL - Google Patents

YOLO object detection method accelerated using OpenCL

Info

Publication number
CN107563392A
CN107563392A (application CN201710798823.7A)
Authority
CN
China
Prior art keywords
convolutional neural networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710798823.7A
Other languages
Chinese (zh)
Inventor
田小林
张晰
逯甜甜
赵启明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201710798823.7A
Publication of CN107563392A
Legal status: Pending


Abstract

The invention discloses a YOLO object detection method accelerated by GPU hardware. The steps of the method are: (1) initialize the convolutional neural network; (2) obtain training samples; (3) divide the training samples into grids; (4) train the convolutional neural network; (5) judge whether the loss value is less than 0.01; if so, save the trained convolutional neural network model; if not, continue training with the next batch of training samples; (6) save the trained convolutional neural network model to the computer hard disk; (7) extract the features of the test picture; (8) determine the position rectangle of the target in the test picture; (9) end the target detection. The invention can be implemented on a general-purpose computer to extract features of the targets in an image, mark each target's position with a rectangular box, and indicate the target's class at the upper-right corner of the box.

Description

YOLO object detection method accelerated using OpenCL
Technical field
The invention belongs to the field of computer technology, and further relates to a You Only Look Once (YOLO) object detection method accelerated with the Open Computing Language (OpenCL), within the fields of computer vision and deep learning. The present invention accelerates the YOLO object detection method based on deep convolutional neural networks, so that targets in pictures can be detected in real time on a general-purpose computer.
Background art
High-speed, high-performance object detection is a core technology of the computer vision field. In recent years, methods based on deep convolutional neural networks have achieved remarkable results in computer vision: compared with traditional methods, image classification and object detection algorithms based on deep convolutional neural networks markedly improve classification and recognition accuracy. For object detection in complex scenes, algorithms based on deep convolutional neural networks are more robust and adapt well to changes of scene and illumination intensity.
The patent application "Convolutional neural network parallel processing method based on a large-scale high-performance computing cluster" filed by Changsha Masha Electronic Science and Technology Co., Ltd. (filing date: November 21, 2014; application number: 2014106748603; publication number: CN104463324A) discloses a convolutional neural network parallel processing method based on a large-scale high-performance computing cluster. The method first constructs multiple copies of the network model to be trained, where the model parameters of all copies are identical and the number of copies equals the number of nodes of the high-performance computing cluster; one model copy is distributed to each node. One node is selected as the master node, responsible for broadcasting and collecting the model parameters. The training set is then divided into subsets, and at each iteration a training subset is distributed to the child nodes other than the master node, which jointly compute the parameter gradients. The gradient values are accumulated, the accumulated value is used to update the master node's model parameters, and the updated model parameters are broadcast to every child node, until model training ends. However, this method has the shortcoming that a convolutional neural network parallelized on a large-scale high-performance computing cluster cannot be ported to a general-purpose computer, which greatly limits its application.
The patent application "Haar object detection algorithm based on a GPU platform" filed by Shenzhen Hagongda Traffic Electronic Technology Co., Ltd. (filing date: December 16, 2015; application number: 2015104762047; publication number: CN105160349A) discloses a GPU-accelerated object detection method based on Haar classifiers. The method first reads the classifier model file information, computes the scaled size information for all sizes, and transfers the data to the GPU device; it then computes the integral image and the squared integral image and obtains the standard-deviation maps corresponding to different scales; finally, it realizes object detection by processing the Haar classifiers in parallel. However, this method has the shortcoming that computing only the integral image and the squared integral image, a way of manually selecting features, does not generalize well to more complex scenes, and its detection precision is relatively low.
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art and to provide a YOLO object detection method accelerated using OpenCL, which can detect the targets in an image.
The steps of the present invention are as follows:
(1) Initialize the convolutional neural network:
Compute the initial weight values, bias values, and batch-normalization scale factor values of the convolutional layers of the convolutional neural network, and initialize the convolutional neural network with the three computed values;
(2) Obtain training samples:
(2a) Randomly select 64 pictures from a picture set containing 20 target classes with labeled rectangular boxes;
(2b) Preprocess each selected picture;
(2c) Set both the height and the width of each preprocessed picture to 448 pixels to form the training sample set;
(3) Divide the training samples into grids:
Divide each picture in the training sample set into a 7×7 grid of square cells, each cell being 64×64 pixels;
(4) Train the convolutional neural network:
(4a) Input the training sample set into the convolutional neural network;
(4b) Using the computer graphics processing unit (GPU), compute in parallel every output feature value of the convolutional layers of the convolutional neural network, and assemble all the feature values into the output feature value matrices of the convolutional layers;
(4c) Take the maximum output feature within each 2×2 neighborhood of a convolutional layer's output feature value matrix as the output feature value matrix of the max-pooling layer of the convolutional neural network;
(4e) Using the GPU, compute in parallel every output feature value of the softmax layer of the convolutional neural network, and assemble all the feature values into the output feature value matrix of the softmax layer;
(4f) Using the GPU, compute in parallel the loss value of the output layer of the convolutional neural network;
(4g) Using the GPU and the stochastic gradient descent method, compute in parallel the updated weight values and bias values of the convolutional neural network;
(5) Judge whether the loss value of the current output layer of the convolutional neural network is less than 0.01; if so, perform step (6); otherwise, perform step (2);
(6) Save the trained convolutional neural network model to the computer hard disk;
(7) Extract the output features of the test picture:
(7a) Randomly select 1 picture from a picture set without labeled rectangular boxes as the test picture;
(7b) Input the test picture into the convolutional neural network to obtain the output features of the test picture;
(8) Mark the position rectangle of the target in the test picture:
According to the output features of the test picture, mark the target in the test picture with a rectangular box and display the class of the target;
(9) End the target detection.
Compared with the prior art, the present invention has the following advantages:
First, the present invention uses the computer GPU to extract the features of the training samples and test samples in parallel, overcoming the prior-art problems that extracting image features in parallel on a large-scale high-performance computing cluster is complicated and poorly portable, so that the present invention guarantees the extraction speed of the sample features while greatly improving the portability of the code.
Second, the present invention uses the computer GPU to extract the features of the training samples and test samples in parallel, overcoming the prior art's relatively low detection precision for targets in complex scenes, so that the present invention greatly strengthens the generalization ability of the object detection algorithm across various scenes and improves the precision of object detection.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention.
Detailed description of the embodiments
The invention will be further described below in conjunction with the accompanying drawings.
The present invention is implemented in the OpenCL language and can run on any GPU device that supports the OpenCL framework, such as an NVIDIA GPU.
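As a rough illustration of this portability, a minimal OpenCL host-side setup in C might look as follows; it uses only standard OpenCL 1.x API calls, and the error handling is reduced to a sketch:

/* Hypothetical host-side setup: enumerate OpenCL platforms, pick the
 * first GPU device, and create a context and command queue on it.
 * Any vendor's OpenCL-capable GPU is acceptable, which is the point
 * of using OpenCL rather than a vendor-specific framework. */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;

    err = clGetPlatformIDs(1, &platform, NULL);            /* first platform */
    if (err != CL_SUCCESS) { fprintf(stderr, "no OpenCL platform\n"); return 1; }

    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    if (err != CL_SUCCESS) { fprintf(stderr, "no GPU device\n"); return 1; }

    char name[256];
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
    printf("running kernels on: %s\n", name);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    /* ... build the program and create the convolution, pooling,
     * softmax, and update kernels here ... */

    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}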
With reference to Fig. 1, the present invention can be realized by the following steps:
Step 1: initialize the convolutional neural network.
According to the following formulas (as stated in claim 2 below), compute the initial weight values, bias values, and batch-normalization scale factor values of the convolutional layers, and initialize the convolutional neural network with the three computed values:

$$w_{ng}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(w_{ng}^{r})^{2}}{2}\right),\qquad b_{g}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(b_{g}^{r})^{2}}{2}\right),\qquad \mathrm{scale}_{j}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(\mathrm{scale}_{j}^{r})^{2}}{2}\right)$$

where $w_{ng}^{r}$ denotes the n-th weight value of channel g of layer r of the convolutional neural network, $\sim$ denotes obeying the given probability distribution, $\sqrt{\ }$ denotes the square-root operation, $\pi$ denotes pi, $\exp(\cdot)$ denotes the exponential operation with the natural constant e as base, $b_{g}^{r}$ denotes the bias value of channel g of layer r of the convolutional neural network, and $\mathrm{scale}_{j}^{r}$ denotes the batch-normalization scale factor value of channel g of layer r of the convolutional neural network.
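The three distributions above are standard normal distributions N(0, 1). A minimal host-side sketch of this sampling step, assuming a Box-Muller transform over the C rand() generator (the helper randn and the function init_conv_layer are illustrative names, not from the patent):

/* Sketch of the step-1 initialization: draw every weight, bias, and
 * batch-normalization scale factor of one convolutional layer from N(0,1). */
#include <math.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static float randn(void) {                    /* one sample from N(0, 1) */
    float u1 = (rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    float u2 = (rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    return sqrtf(-2.0f * logf(u1)) * cosf(2.0f * (float)M_PI * u2);
}

void init_conv_layer(float *w, int n_weights, float *b, float *scale, int n_ch) {
    for (int n = 0; n < n_weights; n++) w[n]  = randn();  /* weights  w_ng^r   */
    for (int g = 0; g < n_ch; g++)      b[g]  = randn();  /* biases   b_g^r    */
    for (int g = 0; g < n_ch; g++)  scale[g]  = randn();  /* BN scale factors  */
}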
Step 2: obtain training samples.
Randomly select 64 pictures from a picture set containing 20 target classes with labeled rectangular boxes.
Preprocess each selected picture according to the following first, second, and third steps (a parameter-sampling sketch follows the third step).
First step: within the angle range [-15, 15], arbitrarily choose a value as the rotation angle of each selected picture, and rotate each picture by the chosen rotation angle.
Second step: within the pixel range [-20, 20], arbitrarily choose a value as the horizontal shift of each selected picture, and shift each picture horizontally by the chosen pixel value.
Third step: within the pixel range [-20, 20], arbitrarily choose a value as the vertical shift of each selected picture, and shift each picture vertically by the chosen pixel value, obtaining the preprocessed pictures. Set both the height and the width of each preprocessed picture to 448 pixels to form the training sample set.
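As an illustration of these three preprocessing steps, the following C sketch samples the augmentation parameters from the stated ranges; uniform sampling and the struct name AugParams are assumptions, and the actual image rotation, shifting, and resizing routines are omitted:

/* Draw one set of augmentation parameters per picture: a rotation angle
 * in [-15, 15] degrees and horizontal/vertical shifts in [-20, 20] pixels. */
#include <stdlib.h>

typedef struct { float angle_deg; int dx, dy; } AugParams;

AugParams sample_aug(void) {
    AugParams p;
    p.angle_deg = -15.0f + 30.0f * ((float)rand() / RAND_MAX);
    p.dx = -20 + rand() % 41;     /* horizontal shift in pixels */
    p.dy = -20 + rand() % 41;     /* vertical shift in pixels   */
    return p;
}
/* After rotation and shifting, each picture is resized to 448 x 448. */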
Step 3: divide the training samples into grids.
Divide each picture in the training sample set into a 7×7 grid of square cells, each cell being 64×64 pixels.
Step 4: train the convolutional neural network.
Input the training sample set into the convolutional neural network.
Using the computer graphics processing unit (GPU), compute in parallel every output feature value of the convolutional layers of the convolutional neural network according to the following first and second steps, and assemble all the feature values into the output feature value matrices of the convolutional layers.
First step: compute the output values of the convolution operation in the convolutional neural network according to the following formula (as stated in claim 4 below):

$$C_{ij}^{r} = \sum_{n=0}^{S_{g}} x_{ij}^{r-1} * w_{ng}^{r}$$

where $C_{ij}^{r}$ denotes the i-th output value of the convolution operation of channel j of layer r of the convolutional neural network, $\Sigma$ denotes the summation operation, $S_{g}$ denotes the size of channel g of the convolution kernel, $x_{ij}^{r-1}$ denotes the i-th output feature value of channel j of layer r-1 of the convolutional neural network, and $*$ denotes the product operation.
Second step: compute each output feature value of the convolutional layer in the convolutional neural network according to the following formula:

$$A_{t} = \mathrm{activate}\!\left(\mathrm{scale}_{j}^{r}\,\frac{C_{ij}^{r} - \frac{1}{m}\sum_{i=1}^{m} C_{ij}^{r}}{\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(C_{ij}^{r} - \frac{1}{m}\sum_{i=1}^{m} C_{ij}^{r}\right)^{2} + \delta}} + b_{g}^{r}\right)$$

where $A_{t}$ denotes the t-th output feature value of the convolutional layer in the convolutional neural network, activate denotes the activation function operation, m denotes the channel size, and $\delta$ denotes a very small number approximately equal to 0.
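As a sketch of how the first step maps onto the GPU, the following OpenCL C kernel computes one convolution output value per work-item; the buffer names (in, w, out), the HWC memory layout, and the zero padding at the borders are assumptions for illustration, and the batch normalization and activation of the second step would be applied in a subsequent pass:

/* One work-item per output position (row, col) of one output channel,
 * following the sum-of-products formula C_ij above. */
__kernel void conv2d(__global const float *in,   /* input  H x W x Cin            */
                     __global const float *w,    /* one output channel's K x K x Cin weights */
                     __global float *out,        /* output H x W (one channel)    */
                     int H, int W, int Cin, int K)
{
    int col = get_global_id(0);
    int row = get_global_id(1);
    if (row >= H || col >= W) return;

    float acc = 0.0f;
    for (int c = 0; c < Cin; c++)                 /* input channels */
        for (int i = 0; i < K; i++)               /* kernel rows    */
            for (int j = 0; j < K; j++) {         /* kernel cols    */
                int r = row + i - K / 2, s = col + j - K / 2;
                if (r < 0 || r >= H || s < 0 || s >= W) continue;   /* zero padding */
                acc += in[(r * W + s) * Cin + c] * w[(i * K + j) * Cin + c];
            }
    out[row * W + col] = acc;    /* C_ij, before batch norm and activation */
}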
Take the maximum output feature within each 2×2 neighborhood of a convolutional layer's output feature value matrix as the output feature value matrix of the max-pooling layer of the convolutional neural network.
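The max-pooling step parallelizes in the same one-work-item-per-output fashion; a sketch under the same layout assumptions:

/* Each work-item reduces one 2x2 neighborhood of the convolution output
 * to its maximum; input is H x W, output is (H/2) x (W/2). */
__kernel void maxpool2x2(__global const float *in, __global float *out,
                         int H, int W)
{
    int x = get_global_id(0);                     /* output column */
    int y = get_global_id(1);                     /* output row    */
    if (x >= W / 2 || y >= H / 2) return;

    int r = 2 * y, c = 2 * x;
    float m = in[r * W + c];
    m = fmax(m, in[r * W + c + 1]);
    m = fmax(m, in[(r + 1) * W + c]);
    m = fmax(m, in[(r + 1) * W + c + 1]);
    out[y * (W / 2) + x] = m;
}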
Using the computer GPU, compute in parallel each output feature value of the softmax layer of the convolutional neural network according to the following formula; all the results form the output feature value matrix of the softmax layer:

$$Y_{z} = \frac{\exp(x_{z})}{\sum_{k=1}^{e} \exp(x_{k})}$$

where $Y_{z}$ denotes the z-th output feature value of the softmax layer of the convolutional neural network, $x_{k}$ denotes the k-th input feature value of the softmax layer, and e denotes the total number of input feature values of the softmax layer.
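A corresponding OpenCL C kernel sketch for the softmax layer follows; the max-subtraction before exponentiation is a standard numerical-stability measure not stated in the patent, and it does not change the value of the formula above:

/* One work-item per output value Y_z; each work-item reads the whole
 * shared input vector x of length n (simple but O(n) per work-item). */
__kernel void softmax(__global const float *x, __global float *y, int n)
{
    int z = get_global_id(0);
    if (z >= n) return;

    float mx = x[0];
    for (int k = 1; k < n; k++) mx = fmax(mx, x[k]);

    float sum = 0.0f;
    for (int k = 0; k < n; k++) sum += exp(x[k] - mx);

    y[z] = exp(x[z] - mx) / sum;   /* Y_z = exp(x_z) / sum_k exp(x_k) */
}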
Using the computer GPU, compute in parallel the loss value of the output layer of the convolutional neural network according to the following first through fourth steps.
First step: compute the position loss value of the output layer of the convolutional neural network according to the following formula (as stated in claim 6 below):

$$L1 = \lambda \sum_{\gamma=0}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{obj}\left[(u_{\gamma}-\hat{u}_{\gamma})^{2}+(v_{\gamma}-\hat{v}_{\gamma})^{2}\right] + \lambda \sum_{\gamma=0}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{obj}\left[\left(\sqrt{t_{\gamma}}-\sqrt{\hat{t}_{\gamma}}\right)^{2}+\left(\sqrt{h_{\gamma}}-\sqrt{\hat{h}_{\gamma}}\right)^{2}\right]$$

where L1 denotes the position loss value of the output layer of the convolutional neural network, $\lambda$ denotes the penalty coefficient of the target position, D denotes the number of grid cells into which the image is divided, F denotes the number of bounding boxes, $\mathbb{1}_{\gamma\beta}^{obj}$ denotes the indicator function that a target exists in the β-th bounding box of the γ-th grid cell of the picture, $u_{\gamma}$ and $\hat{u}_{\gamma}$ denote the predicted and actual abscissa of the target position in the γ-th grid cell, $v_{\gamma}$ and $\hat{v}_{\gamma}$ denote the predicted and actual ordinate, $t_{\gamma}$ and $\hat{t}_{\gamma}$ denote the predicted and actual width of the target, and $h_{\gamma}$ and $\hat{h}_{\gamma}$ denote the predicted and actual height of the target in the γ-th grid cell.
Second step: compute the target-presence probability loss value of the output layer according to the following formula:

$$L2 = \sum_{\gamma=1}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{obj}\left(Q_{\gamma}-\hat{Q}_{\gamma}\right)^{2} + \lambda_{1}\sum_{\gamma=0}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{noobj}\left(Q_{\gamma}-\hat{Q}_{\gamma}\right)^{2}$$

where L2 denotes the target-presence probability loss value of the output layer of the convolutional neural network, $Q_{\gamma}$ denotes the predicted probability that a target exists in the γ-th grid cell of the picture, $\hat{Q}_{\gamma}$ denotes the actual probability that a target exists in the γ-th grid cell, $\lambda_{1}$ denotes the penalty coefficient of the no-target term, and $\mathbb{1}_{\gamma\beta}^{noobj}$ denotes the indicator function that no target exists in the β-th bounding box of the γ-th grid cell.
Third step: compute the class probability loss value of the output layer according to the following formula:

$$L3 = \sum_{\gamma=0}^{D} \mathbb{1}_{\gamma}^{obj} \sum_{cla=0}^{classes}\left(p_{\gamma}(cla)-\hat{p}_{\gamma}(cla)\right)^{2}$$

where L3 denotes the class probability loss value of the output layer of the convolutional neural network, $\mathbb{1}_{\gamma}^{obj}$ denotes the indicator function of whether a target exists in the γ-th grid cell of the picture, classes denotes the total number of classes, $p_{\gamma}(cla)$ denotes the predicted probability that the class of the target in the γ-th grid cell is cla, and $\hat{p}_{\gamma}(cla)$ denotes the true probability that the class of the target in the γ-th grid cell is cla.
Fourth step: compute the loss value of the output layer of the convolutional neural network according to the following formula:
L = L1 + L2 + L3
where L denotes the loss value of the output layer of the convolutional neural network.
Using the computer GPU and the stochastic gradient descent method, compute in parallel the updated weight values and bias values of the convolutional neural network according to the following first and second steps.
First step: compute the gradient values of the weights and biases of every channel of every layer of the convolutional neural network according to the following formulas (as stated in claim 7 below):

$$\Delta w_{ng}^{r} = \frac{\partial L}{\partial w_{ng}^{r}}, \qquad \Delta b_{g}^{r} = \frac{\partial L}{\partial b_{g}^{r}}$$

where $\Delta w_{ng}^{r}$ denotes the gradient value of the n-th weight of channel g of layer r of the convolutional neural network, $\partial$ denotes taking the partial derivative, and $\Delta b_{g}^{r}$ denotes the gradient value of the bias of channel g of layer r of the convolutional neural network.
Second step: compute in parallel the updated weight values and bias values of the convolutional neural network according to the following formulas:

$$\bar{w}_{ng}^{r} = w_{ng}^{r} - \alpha\,\Delta w_{ng}^{r}, \qquad \bar{b}_{g}^{r} = b_{g}^{r} - \alpha\,\Delta b_{g}^{r}$$

where $\bar{w}_{ng}^{r}$ denotes the updated weight value of channel g of layer r of the convolutional neural network, $\bar{b}_{g}^{r}$ denotes the updated bias value of channel g of layer r, and $\alpha$ denotes the learning rate, whose value range is (0, 1).
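Because the update rule is elementwise, it parallelizes directly, one work-item per parameter; a sketch as an OpenCL C kernel (the gradient buffer dw is assumed to have been filled by a separate backward-pass kernel):

/* Parallel SGD update: w = w - alpha * dw, elementwise over n parameters.
 * The same kernel applies unchanged to the bias vector and its gradients. */
__kernel void sgd_update(__global float *w, __global const float *dw,
                         float alpha, int n)
{
    int i = get_global_id(0);
    if (i < n)
        w[i] -= alpha * dw[i];
}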
Step 5: judge whether the loss value of the current output layer of the convolutional neural network is less than 0.01; if so, perform step 6; otherwise, perform step 2.
Step 6: save the trained convolutional neural network model to the computer hard disk.
Step 7: extract the output features of the test picture.
Randomly select 1 picture from a picture set without labeled rectangular boxes as the test picture.
Input the test picture into the convolutional neural network to obtain the output features of the test picture.
Step 8: mark the position rectangle of the target in the test picture.
According to the output features of the test picture, mark the target in the test picture with a rectangular box and display the class of the target.
Step 9: end the target detection.
The effect of the present invention is described in further detail below with reference to a simulation experiment.
1. Simulation experiment conditions:
The heterogeneous platform of the simulation experiment of the present invention is an NVIDIA heterogeneous development platform, in which the host-side CPU is a Xeon E5-1603, the graphics processor is an NVIDIA GTX 1080, the operating system is Ubuntu 14.04, and the software environment is Eclipse CDT.
2. Simulation experiment content and analysis of its results:
In the simulation experiment of the present invention, 1 picture is randomly selected from a picture set without labeled rectangular boxes as the test picture. Object detection on the test picture is performed with the traditional YOLO algorithm and with the method of the present invention respectively, and the time each method needs to perform object detection on the selected test picture is compared; the results are shown in Table 1.
Table 1. Time comparison between the present invention and the traditional YOLO object detection method (unit: ms)
Method | Time (ms)
Traditional YOLO object detection method | 2613
Method of the present invention | 53
As can be seen from Table 1, the YOLO object detection method accelerated using OpenCL proposed by the present invention significantly reduces the time consumption compared with the traditional YOLO object detection method.

Claims (7)

1. A YOLO object detection method accelerated using OpenCL, characterized by comprising the following steps:
(1) Initialize the convolutional neural network:
Compute the initial weight values, bias values, and batch-normalization scale factor values of the convolutional layers of the convolutional neural network, and initialize the convolutional neural network with the three computed values;
(2) Obtain training samples:
(2a) Randomly select 64 pictures from a picture set containing 20 target classes with labeled rectangular boxes;
(2b) Preprocess each selected picture;
(2c) Set both the height and the width of each preprocessed picture to 448 pixels to form the training sample set;
(3) Divide the training samples into grids:
Divide each picture in the training sample set into a 7×7 grid of square cells, each cell being 64×64 pixels;
(4) Train the convolutional neural network:
(4a) Input the training sample set into the convolutional neural network;
(4b) Using the computer graphics processing unit (GPU), compute in parallel every output feature value of the convolutional layers of the convolutional neural network, and assemble all the feature values into the output feature value matrices of the convolutional layers;
(4c) Take the maximum output feature within each 2×2 neighborhood of a convolutional layer's output feature value matrix as the output feature value matrix of the max-pooling layer of the convolutional neural network;
(4e) Using the GPU, compute in parallel every output feature value of the softmax layer of the convolutional neural network, and assemble all the feature values into the output feature value matrix of the softmax layer;
(4f) Using the GPU, compute in parallel the loss value of the output layer of the convolutional neural network;
(4g) Using the GPU and the stochastic gradient descent method, compute in parallel the updated weight values and bias values of the convolutional neural network;
(5) Judge whether the loss value of the current output layer of the convolutional neural network is less than 0.01; if so, perform step (6); otherwise, perform step (2);
(6) Save the trained convolutional neural network model to the computer hard disk;
(7) Extract the output features of the test picture:
(7a) Randomly select 1 picture from a picture set without labeled rectangular boxes as the test picture;
(7b) Input the test picture into the convolutional neural network to obtain the output features of the test picture;
(8) Mark the position rectangle of the target in the test picture:
According to the output features of the test picture, mark the target in the test picture with a rectangular box and display the class of the target;
(9) End the target detection.
2. The YOLO object detection method accelerated using OpenCL according to claim 1, characterized in that the formulas described in step (1) for respectively computing the initial weight values, bias values, and batch-normalization scale factor values of the convolutional layers of the convolutional neural network are as follows:

$$w_{ng}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(w_{ng}^{r})^{2}}{2}\right)$$

$$b_{g}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(b_{g}^{r})^{2}}{2}\right)$$

$$\mathrm{scale}_{j}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(\mathrm{scale}_{j}^{r})^{2}}{2}\right)$$

where $w_{ng}^{r}$ denotes the n-th weight value of channel g of layer r of the convolutional neural network, $\sim$ denotes obeying the given probability distribution, $\sqrt{\ }$ denotes the square-root operation, $\pi$ denotes pi, $\exp(\cdot)$ denotes the exponential operation with the natural constant e as base, $b_{g}^{r}$ denotes the bias value of channel g of layer r of the convolutional neural network, and $\mathrm{scale}_{j}^{r}$ denotes the batch-normalization scale factor value of channel g of layer r of the convolutional neural network.
3. The YOLO object detection method accelerated using OpenCL according to claim 1, characterized in that the preprocessing of each selected picture described in step (2b) comprises the following steps:
First step: within the angle range [-15, 15], arbitrarily choose a value as the rotation angle of each selected picture, and rotate each picture by the chosen rotation angle;
Second step: within the pixel range [-20, 20], arbitrarily choose a value as the horizontal shift of each selected picture, and shift each picture horizontally by the chosen pixel value;
Third step: within the pixel range [-20, 20], arbitrarily choose a value as the vertical shift of each selected picture, and shift each picture vertically by the chosen pixel value, obtaining the preprocessed pictures.
4. The YOLO object detection method accelerated using OpenCL according to claim 1, characterized in that the parallel computation described in step (4b) of each output feature value of the convolutional layers of the convolutional neural network comprises the following steps:
First step: compute the output values of the convolution operation in the convolutional neural network according to the following formula:

$$C_{ij}^{r} = \sum_{n=0}^{S_{g}} x_{ij}^{r-1} * w_{ng}^{r}$$

where $C_{ij}^{r}$ denotes the i-th output value of the convolution operation of channel j of layer r of the convolutional neural network, $\Sigma$ denotes the summation operation, $S_{g}$ denotes the size of channel g of the convolution kernel, $x_{ij}^{r-1}$ denotes the i-th output feature value of channel j of layer r-1 of the convolutional neural network, and $*$ denotes the product operation;
Second step: compute each output feature value of the convolutional layer in the convolutional neural network according to the following formula:

$$A_{t} = \mathrm{activate}\!\left(\mathrm{scale}_{j}^{r}\,\frac{C_{ij}^{r} - \frac{1}{m}\sum_{i=1}^{m} C_{ij}^{r}}{\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(C_{ij}^{r} - \frac{1}{m}\sum_{i=1}^{m} C_{ij}^{r}\right)^{2} + \delta}} + b_{g}^{r}\right)$$

where $A_{t}$ denotes the t-th output feature value of the convolutional layer in the convolutional neural network, activate denotes the activation function operation, m denotes the channel size, and $\delta$ denotes a very small number approximately equal to 0.
5. The YOLO object detection method accelerated using OpenCL according to claim 1, characterized in that the formula described in step (4e) for the parallel computation of each output feature value of the softmax layer of the convolutional neural network is as follows:

$$Y_{z} = \frac{\exp(x_{z})}{\sum_{k=1}^{e} \exp(x_{k})}$$

where $Y_{z}$ denotes the z-th output feature value of the softmax layer of the convolutional neural network, $x_{k}$ denotes the k-th input feature value of the softmax layer of the convolutional neural network, and e denotes the total number of input feature values of the softmax layer.
6. The YOLO object detection method accelerated using OpenCL according to claim 1, characterized in that the parallel computation described in step (4f) of the loss value of the output layer of the convolutional neural network comprises the following steps:
First step: compute the position loss value of the output layer of the convolutional neural network according to the following formula:

$$L1 = \lambda \sum_{\gamma=0}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{obj}\left[(u_{\gamma}-\hat{u}_{\gamma})^{2}+(v_{\gamma}-\hat{v}_{\gamma})^{2}\right] + \lambda \sum_{\gamma=0}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{obj}\left[\left(\sqrt{t_{\gamma}}-\sqrt{\hat{t}_{\gamma}}\right)^{2}+\left(\sqrt{h_{\gamma}}-\sqrt{\hat{h}_{\gamma}}\right)^{2}\right]$$

where L1 denotes the position loss value of the output layer of the convolutional neural network, $\lambda$ denotes the penalty coefficient of the target position, D denotes the number of grid cells into which the image is divided, F denotes the number of bounding boxes, $\mathbb{1}_{\gamma\beta}^{obj}$ denotes the indicator function that a target exists in the β-th bounding box of the γ-th grid cell of the picture, $u_{\gamma}$ and $\hat{u}_{\gamma}$ denote the predicted and actual abscissa of the target position in the γ-th grid cell, $v_{\gamma}$ and $\hat{v}_{\gamma}$ denote the predicted and actual ordinate, $t_{\gamma}$ and $\hat{t}_{\gamma}$ denote the predicted and actual width of the target, and $h_{\gamma}$ and $\hat{h}_{\gamma}$ denote the predicted and actual height of the target in the γ-th grid cell;
Second step: compute the target-presence probability loss value of the output layer according to the following formula:

$$L2 = \sum_{\gamma=1}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{obj}\left(Q_{\gamma}-\hat{Q}_{\gamma}\right)^{2} + \lambda_{1}\sum_{\gamma=0}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{noobj}\left(Q_{\gamma}-\hat{Q}_{\gamma}\right)^{2}$$

where L2 denotes the target-presence probability loss value of the output layer of the convolutional neural network, $Q_{\gamma}$ denotes the predicted probability that a target exists in the γ-th grid cell of the picture, $\hat{Q}_{\gamma}$ denotes the actual probability that a target exists in the γ-th grid cell, $\lambda_{1}$ denotes the penalty coefficient of the no-target term, and $\mathbb{1}_{\gamma\beta}^{noobj}$ denotes the indicator function that no target exists in the β-th bounding box of the γ-th grid cell;
Third step: compute the class probability loss value of the output layer according to the following formula:

$$L3 = \sum_{\gamma=0}^{D} \mathbb{1}_{\gamma}^{obj} \sum_{cla=0}^{classes}\left(p_{\gamma}(cla)-\hat{p}_{\gamma}(cla)\right)^{2}$$

where L3 denotes the class probability loss value of the output layer of the convolutional neural network, $\mathbb{1}_{\gamma}^{obj}$ denotes the indicator function of whether a target exists in the γ-th grid cell of the picture, classes denotes the total number of classes, $p_{\gamma}(cla)$ denotes the predicted probability that the class of the target in the γ-th grid cell is cla, and $\hat{p}_{\gamma}(cla)$ denotes the true probability that the class of the target in the γ-th grid cell is cla;
Fourth step: compute the loss value of the output layer of the convolutional neural network according to the following formula:

L = L1 + L2 + L3

where L denotes the loss value of the output layer of the convolutional neural network.
7. The YOLO object detection method accelerated using OpenCL according to claim 1, characterized in that the parallel computation described in step (4g), using the stochastic gradient descent method, of the updated weight values and bias values of the convolutional neural network comprises the following steps:
First step: compute the gradient values of the weights and biases of every channel of every layer of the convolutional neural network according to the following formulas:

$$\Delta w_{ng}^{r} = \frac{\partial L}{\partial w_{ng}^{r}}$$

$$\Delta b_{g}^{r} = \frac{\partial L}{\partial b_{g}^{r}}$$

where $\Delta w_{ng}^{r}$ denotes the gradient value of the n-th weight of channel g of layer r of the convolutional neural network, $\partial$ denotes taking the partial derivative, and $\Delta b_{g}^{r}$ denotes the gradient value of the bias of channel g of layer r of the convolutional neural network;
Second step: compute in parallel the updated weight values and bias values of the convolutional neural network according to the following formulas:

$$\bar{w}_{ng}^{r} = w_{ng}^{r} - \alpha\,\Delta w_{ng}^{r}$$

$$\bar{b}_{g}^{r} = b_{g}^{r} - \alpha\,\Delta b_{g}^{r}$$

where $\bar{w}_{ng}^{r}$ denotes the updated weight value of channel g of layer r of the convolutional neural network, $\bar{b}_{g}^{r}$ denotes the updated bias value of channel g of layer r, and $\alpha$ denotes the learning rate, whose value range is (0, 1).
CN201710798823.7A 2017-09-07 2017-09-07 YOLO object detection method accelerated using OpenCL Pending CN107563392A (en)

Priority Applications / Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201710798823.7A | 2017-09-07 | 2017-09-07 | YOLO object detection method accelerated using OpenCL

Publications (1)

Publication Number Publication Date
CN107563392A true CN107563392A (en) 2018-01-09

Family

ID=60979539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710798823.7A Pending CN107563392A (en) 2017-09-07 2017-09-07 The YOLO object detection methods accelerated using OpenCL

Country Status (1)

Country Link
CN (1) CN107563392A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
CN104680558A (en) * 2015-03-14 2015-06-03 西安电子科技大学 Struck target tracking method using GPU hardware for acceleration
CN105160349A (en) * 2015-08-06 2015-12-16 深圳市哈工大交通电子技术有限公司 Haar detection object algorithm based on GPU platform
CN106997475A (en) * 2017-02-24 2017-08-01 中国科学院合肥物质科学研究院 A kind of insect image-recognizing method based on parallel-convolution neutral net


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANDRÉ R. BRODTKORB et al., "Graphics processing unit (GPU) programming strategies and trends in GPU computing", J. Parallel Distrib. Comput. *
JOSEPH REDMON et al., "You Only Look Once: Unified, Real-Time Object Detection", arXiv:1506.02640v5 (published online) *
LOC NGUYEN HUYNH et al., "Demo: GPU-based image recognition and object detection on commodity mobile devices", MobiSys '16 Companion: Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services *
STEVE LAWRENCE et al., "Face Recognition: A Convolutional Neural-Network Approach", IEEE Transactions on Neural Networks *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108289177A (en) * 2018-02-13 2018-07-17 北京旷视科技有限公司 Information interacting method, apparatus and system
CN108289177B (en) * 2018-02-13 2020-10-16 北京旷视科技有限公司 Information interaction method, device and system
CN108805064A (en) * 2018-05-31 2018-11-13 中国农业大学 A kind of fish detection and localization and recognition methods and system based on deep learning
CN108830195A (en) * 2018-05-31 2018-11-16 西安电子科技大学 Image classification method based on on-site programmable gate array FPGA
CN108982901A (en) * 2018-06-14 2018-12-11 哈尔滨工业大学 A kind of rotating speed measurement method of at the uniform velocity rotary body
CN108982901B (en) * 2018-06-14 2020-06-09 哈尔滨工业大学 Method for measuring rotating speed of uniform-speed rotating body
CN110826379A (en) * 2018-08-13 2020-02-21 中国科学院长春光学精密机械与物理研究所 Target detection method based on feature multiplexing and YOLOv3
CN110826379B (en) * 2018-08-13 2022-03-22 中国科学院长春光学精密机械与物理研究所 Target detection method based on feature multiplexing and YOLOv3
CN111078195A (en) * 2018-10-18 2020-04-28 中国科学院长春光学精密机械与物理研究所 Target capture parallel acceleration method based on OPENCL
CN109447034A (en) * 2018-11-14 2019-03-08 北京信息科技大学 Traffic mark detection method in automatic Pilot based on YOLOv3 network
CN109447034B (en) * 2018-11-14 2021-04-06 北京信息科技大学 Traffic sign detection method in automatic driving based on YOLOv3 network
CN109684143A (en) * 2018-12-26 2019-04-26 郑州云海信息技术有限公司 A kind of method and device of the test GPU performance based on deep learning
CN109977783A (en) * 2019-02-28 2019-07-05 浙江新再灵科技股份有限公司 Method based on the independent boarding detection of vertical ladder scene perambulator
CN109858569A (en) * 2019-03-07 2019-06-07 中国科学院自动化研究所 Multi-tag object detecting method, system, device based on target detection network
CN109978043A (en) * 2019-03-19 2019-07-05 新华三技术有限公司 A kind of object detection method and device
CN110110844A (en) * 2019-04-24 2019-08-09 西安电子科技大学 Convolutional neural networks method for parallel processing based on OpenCL


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20180109)