CN107563392A - YOLO object detection method accelerated using OpenCL - Google Patents

YOLO object detection method accelerated using OpenCL

Info

Publication number
CN107563392A
CN107563392A (application CN201710798823.7A)
Authority
CN
China
Prior art keywords
convolutional neural networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710798823.7A
Other languages
Chinese (zh)
Inventor
田小林
张晰
逯甜甜
赵启明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201710798823.7A
Publication of CN107563392A
Legal status: Pending


Abstract

The invention discloses a YOLO object detection method accelerated by GPU hardware. The steps of the method are: (1) initialize the convolutional neural network; (2) obtain training samples; (3) divide the training samples into grids; (4) train the convolutional neural network; (5) judge whether the loss value is less than 0.01; if so, save the trained convolutional neural network model; if not, continue training with the next batch of training samples; (6) save the trained convolutional neural network model to the computer hard disk; (7) extract the features of the test picture; (8) determine the position rectangle of the target in the test picture; (9) end the target detection. The invention can be implemented on a general-purpose computer to extract features of the targets in an image, mark each target's position with a rectangular box, and indicate the target's class at the upper-right corner of the box.

Description

YOLO object detection method accelerated using OpenCL
Technical field
The invention belongs to the field of computer technology, and further relates to a You Only Look Once (YOLO) object detection method accelerated with the Open Computing Language (OpenCL), within the fields of computer vision and deep learning. The present invention accelerates the YOLO object detection method based on deep convolutional neural networks, so that targets in pictures can be detected in real time on a general-purpose computer.
Background art
High-speed, high-performance object detection is a core technology of the computer vision field. In recent years, methods based on deep convolutional neural networks have achieved remarkable results in computer vision: compared with traditional methods, image classification and object detection algorithms based on deep convolutional neural networks markedly improve classification and recognition accuracy. For object detection in complex scenes, algorithms based on deep convolutional neural networks are more robust and adapt well to changes of scene and illumination intensity.
The patent application "Convolutional neural network parallel processing method based on a large-scale high-performance computing cluster" filed by Changsha Masha Electronic Science and Technology Co., Ltd. (filing date: November 21, 2014; application number: 2014106748603; publication number: CN104463324A) discloses a convolutional neural network parallel processing method based on a large-scale high-performance computing cluster. The method first constructs multiple copies of the network model to be trained, where the model parameters of all copies are identical and the number of copies equals the number of nodes of the high-performance computing cluster; one model copy is distributed to each node. One node is selected as the master node, responsible for broadcasting and collecting the model parameters. The training set is then divided into subsets, and at each iteration a training subset is distributed to the child nodes other than the master node, which jointly compute the parameter gradients. The gradient values are accumulated, the accumulated value is used to update the master node's model parameters, and the updated model parameters are broadcast to every child node, until model training ends. However, this method has the shortcoming that a convolutional neural network parallelized on a large-scale high-performance computing cluster cannot be ported to a general-purpose computer, which greatly limits its application.
The patent application "Haar object detection algorithm based on a GPU platform" filed by Shenzhen Hagongda Traffic Electronic Technology Co., Ltd. (filing date: December 16, 2015; application number: 2015104762047; publication number: CN105160349A) discloses a GPU-accelerated object detection method based on Haar classifiers. The method first reads the classifier model file information, computes the scaled size information for all sizes, and transfers the data to the GPU device; it then computes the integral image and the squared integral image and obtains the standard-deviation maps corresponding to different scales; finally, it realizes object detection by processing the Haar classifiers in parallel. However, this method has the shortcoming that computing only the integral image and the squared integral image, a way of manually selecting features, does not generalize well to more complex scenes, and its detection precision is relatively low.
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art and to provide a YOLO object detection method accelerated using OpenCL, which can detect the targets in an image.
The steps of the present invention are as follows:
(1) Initialize the convolutional neural network:
Compute the initial weight values, bias values, and batch-normalization scale factor values of the convolutional layers of the convolutional neural network, and initialize the convolutional neural network with the three computed values;
(2) Obtain training samples:
(2a) Randomly select 64 pictures from a picture set containing 20 target classes with labeled rectangular boxes;
(2b) Preprocess each selected picture;
(2c) Set both the height and the width of each preprocessed picture to 448 pixels to form the training sample set;
(3) Divide the training samples into grids:
Divide each picture in the training sample set into a 7×7 grid of square cells, each cell being 64×64 pixels;
(4) Train the convolutional neural network:
(4a) Input the training sample set into the convolutional neural network;
(4b) Using the computer graphics processing unit (GPU), compute in parallel every output feature value of the convolutional layers of the convolutional neural network, and assemble all the feature values into the output feature value matrices of the convolutional layers;
(4c) Take the maximum output feature within each 2×2 neighborhood of a convolutional layer's output feature value matrix as the output feature value matrix of the max-pooling layer of the convolutional neural network;
(4e) Using the GPU, compute in parallel every output feature value of the softmax layer of the convolutional neural network, and assemble all the feature values into the output feature value matrix of the softmax layer;
(4f) Using the GPU, compute in parallel the loss value of the output layer of the convolutional neural network;
(4g) Using the GPU and the stochastic gradient descent method, compute in parallel the updated weight values and bias values of the convolutional neural network;
(5) Judge whether the loss value of the current output layer of the convolutional neural network is less than 0.01; if so, perform step (6); otherwise, perform step (2);
(6) Save the trained convolutional neural network model to the computer hard disk;
(7) Extract the output features of the test picture:
(7a) Randomly select 1 picture from a picture set without labeled rectangular boxes as the test picture;
(7b) Input the test picture into the convolutional neural network to obtain the output features of the test picture;
(8) Mark the position rectangle of the target in the test picture:
According to the output features of the test picture, mark the target in the test picture with a rectangular box and display the class of the target;
(9) End the target detection.
Compared with the prior art, the present invention has the following advantages:
First, the present invention uses the computer GPU to extract the features of the training samples and test samples in parallel, overcoming the prior-art problems that extracting image features in parallel on a large-scale high-performance computing cluster is complicated and poorly portable, so that the present invention guarantees the extraction speed of the sample features while greatly improving the portability of the code.
Second, the present invention uses the computer GPU to extract the features of the training samples and test samples in parallel, overcoming the prior art's relatively low detection precision for targets in complex scenes, so that the present invention greatly strengthens the generalization ability of the object detection algorithm across various scenes and improves the precision of object detection.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention.
Detailed description of the embodiments
The invention will be further described below in conjunction with the accompanying drawings.
The present invention is implemented in the OpenCL language and can run on any GPU device that supports the OpenCL framework, such as an NVIDIA GPU.
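As a rough illustration of this portability, a minimal OpenCL host-side setup in C might look as follows; it uses only standard OpenCL 1.x API calls, and the error handling is reduced to a sketch:

/* Hypothetical host-side setup: enumerate OpenCL platforms, pick the
 * first GPU device, and create a context and command queue on it.
 * Any vendor's OpenCL-capable GPU is acceptable, which is the point
 * of using OpenCL rather than a vendor-specific framework. */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;

    err = clGetPlatformIDs(1, &platform, NULL);            /* first platform */
    if (err != CL_SUCCESS) { fprintf(stderr, "no OpenCL platform\n"); return 1; }

    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    if (err != CL_SUCCESS) { fprintf(stderr, "no GPU device\n"); return 1; }

    char name[256];
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
    printf("running kernels on: %s\n", name);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    /* ... build the program and create the convolution, pooling,
     * softmax, and update kernels here ... */

    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}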
With reference to Fig. 1, the present invention can be realized by the following steps:
Step 1: initialize the convolutional neural network.
According to the following formulas (as stated in claim 2 below), compute the initial weight values, bias values, and batch-normalization scale factor values of the convolutional layers, and initialize the convolutional neural network with the three computed values:

$$w_{ng}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(w_{ng}^{r})^{2}}{2}\right),\qquad b_{g}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(b_{g}^{r})^{2}}{2}\right),\qquad \mathrm{scale}_{j}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(\mathrm{scale}_{j}^{r})^{2}}{2}\right)$$

where $w_{ng}^{r}$ denotes the n-th weight value of channel g of layer r of the convolutional neural network, $\sim$ denotes obeying the given probability distribution, $\sqrt{\ }$ denotes the square-root operation, $\pi$ denotes pi, $\exp(\cdot)$ denotes the exponential operation with the natural constant e as base, $b_{g}^{r}$ denotes the bias value of channel g of layer r of the convolutional neural network, and $\mathrm{scale}_{j}^{r}$ denotes the batch-normalization scale factor value of channel g of layer r of the convolutional neural network.
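The three distributions above are standard normal distributions N(0, 1). A minimal host-side sketch of this sampling step, assuming a Box-Muller transform over the C rand() generator (the helper randn and the function init_conv_layer are illustrative names, not from the patent):

/* Sketch of the step-1 initialization: draw every weight, bias, and
 * batch-normalization scale factor of one convolutional layer from N(0,1). */
#include <math.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static float randn(void) {                    /* one sample from N(0, 1) */
    float u1 = (rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    float u2 = (rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    return sqrtf(-2.0f * logf(u1)) * cosf(2.0f * (float)M_PI * u2);
}

void init_conv_layer(float *w, int n_weights, float *b, float *scale, int n_ch) {
    for (int n = 0; n < n_weights; n++) w[n]  = randn();  /* weights  w_ng^r   */
    for (int g = 0; g < n_ch; g++)      b[g]  = randn();  /* biases   b_g^r    */
    for (int g = 0; g < n_ch; g++)  scale[g]  = randn();  /* BN scale factors  */
}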
Step 2: obtain training samples.
Randomly select 64 pictures from a picture set containing 20 target classes with labeled rectangular boxes.
Preprocess each selected picture according to the following first, second, and third steps (a parameter-sampling sketch follows the third step).
First step: within the angle range [-15, 15], arbitrarily choose a value as the rotation angle of each selected picture, and rotate each picture by the chosen rotation angle.
Second step: within the pixel range [-20, 20], arbitrarily choose a value as the horizontal shift of each selected picture, and shift each picture horizontally by the chosen pixel value.
Third step: within the pixel range [-20, 20], arbitrarily choose a value as the vertical shift of each selected picture, and shift each picture vertically by the chosen pixel value, obtaining the preprocessed pictures. Set both the height and the width of each preprocessed picture to 448 pixels to form the training sample set.
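As an illustration of these three preprocessing steps, the following C sketch samples the augmentation parameters from the stated ranges; uniform sampling and the struct name AugParams are assumptions, and the actual image rotation, shifting, and resizing routines are omitted:

/* Draw one set of augmentation parameters per picture: a rotation angle
 * in [-15, 15] degrees and horizontal/vertical shifts in [-20, 20] pixels. */
#include <stdlib.h>

typedef struct { float angle_deg; int dx, dy; } AugParams;

AugParams sample_aug(void) {
    AugParams p;
    p.angle_deg = -15.0f + 30.0f * ((float)rand() / RAND_MAX);
    p.dx = -20 + rand() % 41;     /* horizontal shift in pixels */
    p.dy = -20 + rand() % 41;     /* vertical shift in pixels   */
    return p;
}
/* After rotation and shifting, each picture is resized to 448 x 448. */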
Step 3: divide the training samples into grids.
Divide each picture in the training sample set into a 7×7 grid of square cells, each cell being 64×64 pixels.
Step 4: train the convolutional neural network.
Input the training sample set into the convolutional neural network.
Using the computer graphics processing unit (GPU), compute in parallel every output feature value of the convolutional layers of the convolutional neural network according to the following first and second steps, and assemble all the feature values into the output feature value matrices of the convolutional layers.
First step: compute the output values of the convolution operation in the convolutional neural network according to the following formula (as stated in claim 4 below):

$$C_{ij}^{r} = \sum_{n=0}^{S_{g}} x_{ij}^{r-1} * w_{ng}^{r}$$

where $C_{ij}^{r}$ denotes the i-th output value of the convolution operation of channel j of layer r of the convolutional neural network, $\Sigma$ denotes the summation operation, $S_{g}$ denotes the size of channel g of the convolution kernel, $x_{ij}^{r-1}$ denotes the i-th output feature value of channel j of layer r-1 of the convolutional neural network, and $*$ denotes the product operation.
Second step: compute each output feature value of the convolutional layer in the convolutional neural network according to the following formula:

$$A_{t} = \mathrm{activate}\!\left(\mathrm{scale}_{j}^{r}\,\frac{C_{ij}^{r} - \frac{1}{m}\sum_{i=1}^{m} C_{ij}^{r}}{\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(C_{ij}^{r} - \frac{1}{m}\sum_{i=1}^{m} C_{ij}^{r}\right)^{2} + \delta}} + b_{g}^{r}\right)$$

where $A_{t}$ denotes the t-th output feature value of the convolutional layer in the convolutional neural network, activate denotes the activation function operation, m denotes the channel size, and $\delta$ denotes a very small number approximately equal to 0.
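As a sketch of how the first step maps onto the GPU, the following OpenCL C kernel computes one convolution output value per work-item; the buffer names (in, w, out), the HWC memory layout, and the zero padding at the borders are assumptions for illustration, and the batch normalization and activation of the second step would be applied in a subsequent pass:

/* One work-item per output position (row, col) of one output channel,
 * following the sum-of-products formula C_ij above. */
__kernel void conv2d(__global const float *in,   /* input  H x W x Cin            */
                     __global const float *w,    /* one output channel's K x K x Cin weights */
                     __global float *out,        /* output H x W (one channel)    */
                     int H, int W, int Cin, int K)
{
    int col = get_global_id(0);
    int row = get_global_id(1);
    if (row >= H || col >= W) return;

    float acc = 0.0f;
    for (int c = 0; c < Cin; c++)                 /* input channels */
        for (int i = 0; i < K; i++)               /* kernel rows    */
            for (int j = 0; j < K; j++) {         /* kernel cols    */
                int r = row + i - K / 2, s = col + j - K / 2;
                if (r < 0 || r >= H || s < 0 || s >= W) continue;   /* zero padding */
                acc += in[(r * W + s) * Cin + c] * w[(i * K + j) * Cin + c];
            }
    out[row * W + col] = acc;    /* C_ij, before batch norm and activation */
}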
Take the maximum output feature within each 2×2 neighborhood of a convolutional layer's output feature value matrix as the output feature value matrix of the max-pooling layer of the convolutional neural network.
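The max-pooling step parallelizes in the same one-work-item-per-output fashion; a sketch under the same layout assumptions:

/* Each work-item reduces one 2x2 neighborhood of the convolution output
 * to its maximum; input is H x W, output is (H/2) x (W/2). */
__kernel void maxpool2x2(__global const float *in, __global float *out,
                         int H, int W)
{
    int x = get_global_id(0);                     /* output column */
    int y = get_global_id(1);                     /* output row    */
    if (x >= W / 2 || y >= H / 2) return;

    int r = 2 * y, c = 2 * x;
    float m = in[r * W + c];
    m = fmax(m, in[r * W + c + 1]);
    m = fmax(m, in[(r + 1) * W + c]);
    m = fmax(m, in[(r + 1) * W + c + 1]);
    out[y * (W / 2) + x] = m;
}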
Using the computer GPU, compute in parallel each output feature value of the softmax layer of the convolutional neural network according to the following formula; all the results form the output feature value matrix of the softmax layer:

$$Y_{z} = \frac{\exp(x_{z})}{\sum_{k=1}^{e} \exp(x_{k})}$$

where $Y_{z}$ denotes the z-th output feature value of the softmax layer of the convolutional neural network, $x_{k}$ denotes the k-th input feature value of the softmax layer, and e denotes the total number of input feature values of the softmax layer.
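A corresponding OpenCL C kernel sketch for the softmax layer follows; the max-subtraction before exponentiation is a standard numerical-stability measure not stated in the patent, and it does not change the value of the formula above:

/* One work-item per output value Y_z; each work-item reads the whole
 * shared input vector x of length n (simple but O(n) per work-item). */
__kernel void softmax(__global const float *x, __global float *y, int n)
{
    int z = get_global_id(0);
    if (z >= n) return;

    float mx = x[0];
    for (int k = 1; k < n; k++) mx = fmax(mx, x[k]);

    float sum = 0.0f;
    for (int k = 0; k < n; k++) sum += exp(x[k] - mx);

    y[z] = exp(x[z] - mx) / sum;   /* Y_z = exp(x_z) / sum_k exp(x_k) */
}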
Using the computer GPU, compute in parallel the loss value of the output layer of the convolutional neural network according to the following first through fourth steps.
First step: compute the position loss value of the output layer of the convolutional neural network according to the following formula (as stated in claim 6 below):

$$L1 = \lambda \sum_{\gamma=0}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{obj}\left[(u_{\gamma}-\hat{u}_{\gamma})^{2}+(v_{\gamma}-\hat{v}_{\gamma})^{2}\right] + \lambda \sum_{\gamma=0}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{obj}\left[\left(\sqrt{t_{\gamma}}-\sqrt{\hat{t}_{\gamma}}\right)^{2}+\left(\sqrt{h_{\gamma}}-\sqrt{\hat{h}_{\gamma}}\right)^{2}\right]$$

where L1 denotes the position loss value of the output layer of the convolutional neural network, $\lambda$ denotes the penalty coefficient of the target position, D denotes the number of grid cells into which the image is divided, F denotes the number of bounding boxes, $\mathbb{1}_{\gamma\beta}^{obj}$ denotes the indicator function that a target exists in the β-th bounding box of the γ-th grid cell of the picture, $u_{\gamma}$ and $\hat{u}_{\gamma}$ denote the predicted and actual abscissa of the target position in the γ-th grid cell, $v_{\gamma}$ and $\hat{v}_{\gamma}$ denote the predicted and actual ordinate, $t_{\gamma}$ and $\hat{t}_{\gamma}$ denote the predicted and actual width of the target, and $h_{\gamma}$ and $\hat{h}_{\gamma}$ denote the predicted and actual height of the target in the γ-th grid cell.
Second step: compute the target-presence probability loss value of the output layer according to the following formula:

$$L2 = \sum_{\gamma=1}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{obj}\left(Q_{\gamma}-\hat{Q}_{\gamma}\right)^{2} + \lambda_{1}\sum_{\gamma=0}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{noobj}\left(Q_{\gamma}-\hat{Q}_{\gamma}\right)^{2}$$

where L2 denotes the target-presence probability loss value of the output layer of the convolutional neural network, $Q_{\gamma}$ denotes the predicted probability that a target exists in the γ-th grid cell of the picture, $\hat{Q}_{\gamma}$ denotes the actual probability that a target exists in the γ-th grid cell, $\lambda_{1}$ denotes the penalty coefficient of the no-target term, and $\mathbb{1}_{\gamma\beta}^{noobj}$ denotes the indicator function that no target exists in the β-th bounding box of the γ-th grid cell.
Third step: compute the class probability loss value of the output layer according to the following formula:

$$L3 = \sum_{\gamma=0}^{D} \mathbb{1}_{\gamma}^{obj} \sum_{cla=0}^{classes}\left(p_{\gamma}(cla)-\hat{p}_{\gamma}(cla)\right)^{2}$$

where L3 denotes the class probability loss value of the output layer of the convolutional neural network, $\mathbb{1}_{\gamma}^{obj}$ denotes the indicator function of whether a target exists in the γ-th grid cell of the picture, classes denotes the total number of classes, $p_{\gamma}(cla)$ denotes the predicted probability that the class of the target in the γ-th grid cell is cla, and $\hat{p}_{\gamma}(cla)$ denotes the true probability that the class of the target in the γ-th grid cell is cla.
Fourth step: compute the loss value of the output layer of the convolutional neural network according to the following formula:
L = L1 + L2 + L3
where L denotes the loss value of the output layer of the convolutional neural network.
Using the computer GPU and the stochastic gradient descent method, compute in parallel the updated weight values and bias values of the convolutional neural network according to the following first and second steps.
First step: compute the gradient values of the weights and biases of every channel of every layer of the convolutional neural network according to the following formulas (as stated in claim 7 below):

$$\Delta w_{ng}^{r} = \frac{\partial L}{\partial w_{ng}^{r}}, \qquad \Delta b_{g}^{r} = \frac{\partial L}{\partial b_{g}^{r}}$$

where $\Delta w_{ng}^{r}$ denotes the gradient value of the n-th weight of channel g of layer r of the convolutional neural network, $\partial$ denotes taking the partial derivative, and $\Delta b_{g}^{r}$ denotes the gradient value of the bias of channel g of layer r of the convolutional neural network.
Second step: compute in parallel the updated weight values and bias values of the convolutional neural network according to the following formulas:

$$\bar{w}_{ng}^{r} = w_{ng}^{r} - \alpha\,\Delta w_{ng}^{r}, \qquad \bar{b}_{g}^{r} = b_{g}^{r} - \alpha\,\Delta b_{g}^{r}$$

where $\bar{w}_{ng}^{r}$ denotes the updated weight value of channel g of layer r of the convolutional neural network, $\bar{b}_{g}^{r}$ denotes the updated bias value of channel g of layer r, and $\alpha$ denotes the learning rate, whose value range is (0, 1).
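Because the update rule is elementwise, it parallelizes directly, one work-item per parameter; a sketch as an OpenCL C kernel (the gradient buffer dw is assumed to have been filled by a separate backward-pass kernel):

/* Parallel SGD update: w = w - alpha * dw, elementwise over n parameters.
 * The same kernel applies unchanged to the bias vector and its gradients. */
__kernel void sgd_update(__global float *w, __global const float *dw,
                         float alpha, int n)
{
    int i = get_global_id(0);
    if (i < n)
        w[i] -= alpha * dw[i];
}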
Step 5: judge whether the loss value of the current output layer of the convolutional neural network is less than 0.01; if so, perform step 6; otherwise, perform step 2.
Step 6: save the trained convolutional neural network model to the computer hard disk.
Step 7: extract the output features of the test picture.
Randomly select 1 picture from a picture set without labeled rectangular boxes as the test picture.
Input the test picture into the convolutional neural network to obtain the output features of the test picture.
Step 8: mark the position rectangle of the target in the test picture.
According to the output features of the test picture, mark the target in the test picture with a rectangular box and display the class of the target.
Step 9: end the target detection.
The effect of the present invention is described in further detail below with reference to a simulation experiment.
1. Simulation experiment conditions:
The heterogeneous platform of the simulation experiment of the present invention is an NVIDIA heterogeneous development platform, in which the host-side CPU is a Xeon E5-1603, the graphics processor is an NVIDIA GTX 1080, the operating system is Ubuntu 14.04, and the software environment is Eclipse CDT.
2. Simulation experiment content and analysis of its results:
In the simulation experiment of the present invention, 1 picture is randomly selected from a picture set without labeled rectangular boxes as the test picture. Object detection on the test picture is performed with the traditional YOLO algorithm and with the method of the present invention respectively, and the time each method needs to perform object detection on the selected test picture is compared; the results are shown in Table 1.
Table 1. Time comparison between the present invention and the traditional YOLO object detection method (unit: ms)
Method | Time (ms)
Traditional YOLO object detection method | 2613
Method of the present invention | 53
As can be seen from Table 1, the YOLO object detection method accelerated using OpenCL proposed by the present invention significantly reduces the time consumption compared with the traditional YOLO object detection method.

Claims (7)

1. A YOLO object detection method accelerated using OpenCL, characterized by comprising the following steps:
(1) Initialize the convolutional neural network:
Compute the initial weight values, bias values, and batch-normalization scale factor values of the convolutional layers of the convolutional neural network, and initialize the convolutional neural network with the three computed values;
(2) Obtain training samples:
(2a) Randomly select 64 pictures from a picture set containing 20 target classes with labeled rectangular boxes;
(2b) Preprocess each selected picture;
(2c) Set both the height and the width of each preprocessed picture to 448 pixels to form the training sample set;
(3) Divide the training samples into grids:
Divide each picture in the training sample set into a 7×7 grid of square cells, each cell being 64×64 pixels;
(4) Train the convolutional neural network:
(4a) Input the training sample set into the convolutional neural network;
(4b) Using the computer graphics processing unit (GPU), compute in parallel every output feature value of the convolutional layers of the convolutional neural network, and assemble all the feature values into the output feature value matrices of the convolutional layers;
(4c) Take the maximum output feature within each 2×2 neighborhood of a convolutional layer's output feature value matrix as the output feature value matrix of the max-pooling layer of the convolutional neural network;
(4e) Using the GPU, compute in parallel every output feature value of the softmax layer of the convolutional neural network, and assemble all the feature values into the output feature value matrix of the softmax layer;
(4f) Using the GPU, compute in parallel the loss value of the output layer of the convolutional neural network;
(4g) Using the GPU and the stochastic gradient descent method, compute in parallel the updated weight values and bias values of the convolutional neural network;
(5) Judge whether the loss value of the current output layer of the convolutional neural network is less than 0.01; if so, perform step (6); otherwise, perform step (2);
(6) Save the trained convolutional neural network model to the computer hard disk;
(7) Extract the output features of the test picture:
(7a) Randomly select 1 picture from a picture set without labeled rectangular boxes as the test picture;
(7b) Input the test picture into the convolutional neural network to obtain the output features of the test picture;
(8) Mark the position rectangle of the target in the test picture:
According to the output features of the test picture, mark the target in the test picture with a rectangular box and display the class of the target;
(9) End the target detection.
2. The YOLO object detection method accelerated using OpenCL according to claim 1, characterized in that the formulas described in step (1) for respectively computing the initial weight values, bias values, and batch-normalization scale factor values of the convolutional layers of the convolutional neural network are as follows:

$$w_{ng}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(w_{ng}^{r})^{2}}{2}\right)$$

$$b_{g}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(b_{g}^{r})^{2}}{2}\right)$$

$$\mathrm{scale}_{j}^{r} \sim \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(\mathrm{scale}_{j}^{r})^{2}}{2}\right)$$

where $w_{ng}^{r}$ denotes the n-th weight value of channel g of layer r of the convolutional neural network, $\sim$ denotes obeying the given probability distribution, $\sqrt{\ }$ denotes the square-root operation, $\pi$ denotes pi, $\exp(\cdot)$ denotes the exponential operation with the natural constant e as base, $b_{g}^{r}$ denotes the bias value of channel g of layer r of the convolutional neural network, and $\mathrm{scale}_{j}^{r}$ denotes the batch-normalization scale factor value of channel g of layer r of the convolutional neural network.
3. The YOLO object detection method accelerated using OpenCL according to claim 1, characterized in that the preprocessing of each selected picture described in step (2b) comprises the following steps:
First step: within the angle range [-15, 15], arbitrarily choose a value as the rotation angle of each selected picture, and rotate each picture by the chosen rotation angle;
Second step: within the pixel range [-20, 20], arbitrarily choose a value as the horizontal shift of each selected picture, and shift each picture horizontally by the chosen pixel value;
Third step: within the pixel range [-20, 20], arbitrarily choose a value as the vertical shift of each selected picture, and shift each picture vertically by the chosen pixel value, obtaining the preprocessed pictures.
4. The YOLO object detection method accelerated using OpenCL according to claim 1, characterized in that the parallel computation described in step (4b) of each output feature value of the convolutional layers of the convolutional neural network comprises the following steps:
First step: compute the output values of the convolution operation in the convolutional neural network according to the following formula:

$$C_{ij}^{r} = \sum_{n=0}^{S_{g}} x_{ij}^{r-1} * w_{ng}^{r}$$

where $C_{ij}^{r}$ denotes the i-th output value of the convolution operation of channel j of layer r of the convolutional neural network, $\Sigma$ denotes the summation operation, $S_{g}$ denotes the size of channel g of the convolution kernel, $x_{ij}^{r-1}$ denotes the i-th output feature value of channel j of layer r-1 of the convolutional neural network, and $*$ denotes the product operation;
Second step: compute each output feature value of the convolutional layer in the convolutional neural network according to the following formula:

$$A_{t} = \mathrm{activate}\!\left(\mathrm{scale}_{j}^{r}\,\frac{C_{ij}^{r} - \frac{1}{m}\sum_{i=1}^{m} C_{ij}^{r}}{\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(C_{ij}^{r} - \frac{1}{m}\sum_{i=1}^{m} C_{ij}^{r}\right)^{2} + \delta}} + b_{g}^{r}\right)$$

where $A_{t}$ denotes the t-th output feature value of the convolutional layer in the convolutional neural network, activate denotes the activation function operation, m denotes the channel size, and $\delta$ denotes a very small number approximately equal to 0.
5. The YOLO object detection method accelerated using OpenCL according to claim 1, characterized in that the formula described in step (4e) for the parallel computation of each output feature value of the softmax layer of the convolutional neural network is as follows:

$$Y_{z} = \frac{\exp(x_{z})}{\sum_{k=1}^{e} \exp(x_{k})}$$

where $Y_{z}$ denotes the z-th output feature value of the softmax layer of the convolutional neural network, $x_{k}$ denotes the k-th input feature value of the softmax layer of the convolutional neural network, and e denotes the total number of input feature values of the softmax layer.
6. The YOLO object detection method accelerated using OpenCL according to claim 1, characterized in that the parallel computation described in step (4f) of the loss value of the output layer of the convolutional neural network comprises the following steps:
First step: compute the position loss value of the output layer of the convolutional neural network according to the following formula:

$$L1 = \lambda \sum_{\gamma=0}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{obj}\left[(u_{\gamma}-\hat{u}_{\gamma})^{2}+(v_{\gamma}-\hat{v}_{\gamma})^{2}\right] + \lambda \sum_{\gamma=0}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{obj}\left[\left(\sqrt{t_{\gamma}}-\sqrt{\hat{t}_{\gamma}}\right)^{2}+\left(\sqrt{h_{\gamma}}-\sqrt{\hat{h}_{\gamma}}\right)^{2}\right]$$

where L1 denotes the position loss value of the output layer of the convolutional neural network, $\lambda$ denotes the penalty coefficient of the target position, D denotes the number of grid cells into which the image is divided, F denotes the number of bounding boxes, $\mathbb{1}_{\gamma\beta}^{obj}$ denotes the indicator function that a target exists in the β-th bounding box of the γ-th grid cell of the picture, $u_{\gamma}$ and $\hat{u}_{\gamma}$ denote the predicted and actual abscissa of the target position in the γ-th grid cell, $v_{\gamma}$ and $\hat{v}_{\gamma}$ denote the predicted and actual ordinate, $t_{\gamma}$ and $\hat{t}_{\gamma}$ denote the predicted and actual width of the target, and $h_{\gamma}$ and $\hat{h}_{\gamma}$ denote the predicted and actual height of the target in the γ-th grid cell;
Second step: compute the target-presence probability loss value of the output layer according to the following formula:

$$L2 = \sum_{\gamma=1}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{obj}\left(Q_{\gamma}-\hat{Q}_{\gamma}\right)^{2} + \lambda_{1}\sum_{\gamma=0}^{D}\sum_{\beta=0}^{F} \mathbb{1}_{\gamma\beta}^{noobj}\left(Q_{\gamma}-\hat{Q}_{\gamma}\right)^{2}$$

where L2 denotes the target-presence probability loss value of the output layer of the convolutional neural network, $Q_{\gamma}$ denotes the predicted probability that a target exists in the γ-th grid cell of the picture, $\hat{Q}_{\gamma}$ denotes the actual probability that a target exists in the γ-th grid cell, $\lambda_{1}$ denotes the penalty coefficient of the no-target term, and $\mathbb{1}_{\gamma\beta}^{noobj}$ denotes the indicator function that no target exists in the β-th bounding box of the γ-th grid cell;
Third step: compute the class probability loss value of the output layer according to the following formula:

$$L3 = \sum_{\gamma=0}^{D} \mathbb{1}_{\gamma}^{obj} \sum_{cla=0}^{classes}\left(p_{\gamma}(cla)-\hat{p}_{\gamma}(cla)\right)^{2}$$

where L3 denotes the class probability loss value of the output layer of the convolutional neural network, $\mathbb{1}_{\gamma}^{obj}$ denotes the indicator function of whether a target exists in the γ-th grid cell of the picture, classes denotes the total number of classes, $p_{\gamma}(cla)$ denotes the predicted probability that the class of the target in the γ-th grid cell is cla, and $\hat{p}_{\gamma}(cla)$ denotes the true probability that the class of the target in the γ-th grid cell is cla;
Fourth step: compute the loss value of the output layer of the convolutional neural network according to the following formula:

L = L1 + L2 + L3

where L denotes the loss value of the output layer of the convolutional neural network.
7. The YOLO object detection method accelerated using OpenCL according to claim 1, characterized in that the parallel computation described in step (4g), using the stochastic gradient descent method, of the updated weight values and bias values of the convolutional neural network comprises the following steps:
First step: compute the gradient values of the weights and biases of every channel of every layer of the convolutional neural network according to the following formulas:

$$\Delta w_{ng}^{r} = \frac{\partial L}{\partial w_{ng}^{r}}$$

$$\Delta b_{g}^{r} = \frac{\partial L}{\partial b_{g}^{r}}$$

where $\Delta w_{ng}^{r}$ denotes the gradient value of the n-th weight of channel g of layer r of the convolutional neural network, $\partial$ denotes taking the partial derivative, and $\Delta b_{g}^{r}$ denotes the gradient value of the bias of channel g of layer r of the convolutional neural network;
Second step: compute in parallel the updated weight values and bias values of the convolutional neural network according to the following formulas:

$$\bar{w}_{ng}^{r} = w_{ng}^{r} - \alpha\,\Delta w_{ng}^{r}$$

$$\bar{b}_{g}^{r} = b_{g}^{r} - \alpha\,\Delta b_{g}^{r}$$

where $\bar{w}_{ng}^{r}$ denotes the updated weight value of channel g of layer r of the convolutional neural network, $\bar{b}_{g}^{r}$ denotes the updated bias value of channel g of layer r, and $\alpha$ denotes the learning rate, whose value range is (0, 1).
CN201710798823.7A 2017-09-07 2017-09-07 YOLO object detection method accelerated using OpenCL Pending CN107563392A (en)

Priority Applications / Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201710798823.7A | 2017-09-07 | 2017-09-07 | YOLO object detection method accelerated using OpenCL

Publications (1)

Publication Number Publication Date
CN107563392A true CN107563392A (en) 2018-01-09

Family

ID=60979539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710798823.7A Pending CN107563392A (en) 2017-09-07 2017-09-07 The YOLO object detection methods accelerated using OpenCL

Country Status (1)

Country Link
CN (1) CN107563392A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
CN104680558A (en) * 2015-03-14 2015-06-03 西安电子科技大学 Struck target tracking method using GPU hardware for acceleration
CN105160349A (en) * 2015-08-06 2015-12-16 深圳市哈工大交通电子技术有限公司 Haar detection object algorithm based on GPU platform
CN106997475A (en) * 2017-02-24 2017-08-01 中国科学院合肥物质科学研究院 A kind of insect image-recognizing method based on parallel-convolution neutral net


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANDRÉ R. BRODTKORB et al., "Graphics processing unit (GPU) programming strategies and trends in GPU computing", J. Parallel Distrib. Comput. *
JOSEPH REDMON et al., "You Only Look Once: Unified, Real-Time Object Detection", arXiv:1506.02640v5 (published online) *
LOC NGUYEN HUYNH et al., "Demo: GPU-based image recognition and object detection on commodity mobile devices", MobiSys '16 Companion: Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services *
STEVE LAWRENCE et al., "Face Recognition: A Convolutional Neural-Network Approach", IEEE Transactions on Neural Networks *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108289177A (en) * 2018-02-13 2018-07-17 北京旷视科技有限公司 Information interacting method, apparatus and system
CN108289177B (en) * 2018-02-13 2020-10-16 北京旷视科技有限公司 Information interaction method, device and system
CN108805064A (en) * 2018-05-31 2018-11-13 中国农业大学 A kind of fish detection and localization and recognition methods and system based on deep learning
CN108830195A (en) * 2018-05-31 2018-11-16 西安电子科技大学 Image classification method based on on-site programmable gate array FPGA
CN108982901A (en) * 2018-06-14 2018-12-11 哈尔滨工业大学 A kind of rotating speed measurement method of at the uniform velocity rotary body
CN108982901B (en) * 2018-06-14 2020-06-09 哈尔滨工业大学 Method for measuring rotating speed of uniform-speed rotating body
CN110826379A (en) * 2018-08-13 2020-02-21 中国科学院长春光学精密机械与物理研究所 Target detection method based on feature multiplexing and YOLOv3
CN110826379B (en) * 2018-08-13 2022-03-22 中国科学院长春光学精密机械与物理研究所 Target detection method based on feature multiplexing and YOLOv3
CN111078195A (en) * 2018-10-18 2020-04-28 中国科学院长春光学精密机械与物理研究所 Target capture parallel acceleration method based on OPENCL
CN109447034A (en) * 2018-11-14 2019-03-08 北京信息科技大学 Traffic mark detection method in automatic Pilot based on YOLOv3 network
CN109447034B (en) * 2018-11-14 2021-04-06 北京信息科技大学 Traffic sign detection method in automatic driving based on YOLOv3 network
CN109684143A (en) * 2018-12-26 2019-04-26 郑州云海信息技术有限公司 A kind of method and device of the test GPU performance based on deep learning
CN109977783A (en) * 2019-02-28 2019-07-05 浙江新再灵科技股份有限公司 Method based on the independent boarding detection of vertical ladder scene perambulator
CN109858569A (en) * 2019-03-07 2019-06-07 中国科学院自动化研究所 Multi-tag object detecting method, system, device based on target detection network
CN109978043A (en) * 2019-03-19 2019-07-05 新华三技术有限公司 A kind of object detection method and device
CN110110844A (en) * 2019-04-24 2019-08-09 西安电子科技大学 Convolutional neural networks method for parallel processing based on OpenCL


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20180109)