CN112184756A

CN112184756A - Single-target rapid detection method based on deep learning

Info

Publication number: CN112184756A
Application number: CN202011063213.0A
Authority: CN
Inventors: 韩勇强; 李利华; 刘泳庆; 杨旭; 卢彤春
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2021-01-05

Abstract

The invention discloses a single-target rapid detection method based on deep learning, which is designed aiming at the training process of a network, designs a loss function and an optimization method of the network, optimizes the detection process of a video sequence, traverses the whole image by using a sliding window method under the condition that the target position is not detected in an initial state or in the previous frame, sends the image of each window into a convolutional neural network for calculation until the target position information is detected, and then starts to detect the image of the next frame. When the target position information (x) in the previous frame image_k‑1，y_k‑1) When known, the center coordinate is (x)_k‑1，y_k‑1) The window is sent into a convolutional neural network for operation, and the target can be rapidly positioned by detecting the window. Due to the neural netThe network scale and the network input size are closely related, and the technical scheme provided by the invention can optimize the network by controlling the input size of the network, thereby reducing the network scale and improving the network computing speed while ensuring the performance.

Description

Single-target rapid detection method based on deep learning

Technical Field

The invention relates to the technical field of computer vision, in particular to a single-target rapid detection method based on deep learning.

Background

Object Detection (OD) is one of the core problems in the field of computer vision, and its basic task is to classify and locate objects of interest in an image or video, while detecting their position and size. Currently, target detection methods can be divided into two broad categories: traditional target detection algorithms and deep learning based target detection algorithms.

The traditional target detection algorithm is to perform region selection based on traversal of a sliding window, then perform feature extraction on an image block in the sliding window by using features such as Histogram of Oriented Gradient (HOG), scale-invariant feature transform (SIFT), and the like, and finally classify the extracted features by using classifiers such as Support Vector Machine (SVM), AdaBoost, and the like. The target detection method needs manual feature construction, the construction process is complex, the detection precision is improved very limitedly, and the adaptability to complex backgrounds is poor. Currently, more research is being directed to methods based on deep learning.

The target detection algorithm based on deep learning mainly uses a Convolutional Neural Network (CNN) to extract image features, and completes target detection through operations such as pooling and sampling. Generally, the performance of a network is positively correlated with the depth of the network. In order to extract more image information, the depth of the network is continuously deepened, and the parameters of the network are increasingly huge. Although the performance is greatly improved, due to the large scale of the network, the real-time performance of the target detection depends on the computing capacity seriously, which greatly limits the deployment of the algorithm on a mobile platform.

Disclosure of Invention

In order to solve the limitations and defects of the prior art, the invention provides a single-target rapid detection method based on deep learning, which comprises the following steps:

step S1, detecting the video sequence;

step S2, under the condition that the target position is not detected in the initial state or the previous frame of image, traversing the whole image by using a sliding window method, and sending the image of each window into a convolutional neural network for calculation until the target position information is detected;

step S3, because the moving process of the target is continuous, the difference between the target position in the previous frame image and the target position in the current frame image is within a certain range, when the target position information in the previous frame image is determined, a sliding window is formed according to the target position information in the previous frame image to detect the current frame image, the previous frame image is k-1 frame, the current frame image is k frame, the target position information in the previous frame image is (x-1 frame)_k-1，y_k-1)；

Step S4, the central coordinate is (x)_k-1，y_k-1) Sending the sliding window into a convolutional neural network for calculation, and judging whether target position information exists in the image in the sliding window;

step S5, if the judgment result is that the window has the target position information, executing step S6; if the judgment result is that the window does not have the target position information, executing step S2;

step S6, the next frame image is detected.

Optionally, step 1 includes the following steps before:

setting a loss function of the network, wherein the loss function comprises a prediction center coordinate loss function, a prediction boundary box size loss function, a prediction category loss function and a prediction confidence coefficient loss function;

the calculation formula of the prediction center coordinate loss function is as follows:

wherein λ is_coordWithin a representation gridNo coefficients of the object, (x, y) is the position of the prediction bounding box,

the real frame position;

the calculation formula of the predicted bounding box size loss function is as follows:

wherein (w, h) is the predicted frame size,

is the real frame size;

the calculation formula of the prediction category loss function is as follows:

wherein the content of the first and second substances,

to predict the probability of the outcome being of class c,

the prediction result is the true probability of the class c;

the calculation formula of the prediction confidence coefficient loss function is as follows:

wherein, c_iIs the score of the degree of confidence that the user is,

is the intersection of the predicted bounding box and the actual bounding box, if a cell has an object,

the value is 1, and if the cell has no object,

a value of 0;

setting the total loss function of the network to be J, wherein the calculation formula of the J is as follows:

J＝L₁+L₂+L₃+L₄ (5)

updating the gradient according to the error calculation weight, wherein the calculation formula is as follows:

wherein, g_tA gradient representing the t time step;

the weights are updated by a gradient-based optimization algorithm, which is calculated as follows:

the invention has the following beneficial effects:

the single-target rapid detection method based on deep learning provided by the invention detects a video sequence, traverses the whole image by using a sliding window method under the condition of an initial state or no target position detected in the previous frame, sends the image of each window into a convolutional neural network for calculation until target position information is detected, and then starts to detect the image of the next frame. Since the motion of the object is a continuous process, the difference between the positions of the two previous and next frames is within a preset range. When the target position information (x) in the previous frame image_k-1，y_k-1) When known, the center coordinate is (x)_k-1，y_k-1) The window is sent into a convolutional neural network for operation, and the target can be rapidly positioned by detecting the window.

Because the scale of the neural network is closely related to the input size of the network, the technical scheme provided by the invention can optimize the network by controlling the input size of the network, thereby reducing the scale of the network while ensuring the performance, improving the calculation speed of the network and further improving the real-time property of the target detection algorithm.

According to the technical scheme provided by the invention, by reducing the size of the input of the neural network, the high precision can be ensured, the network scale can be reduced, the reasoning speed of the neural network is effectively improved, and the real-time performance of the target detection algorithm is improved. In addition, in the target detection process, the method only sends the high-value window image to the network for operation, effectively avoids interference information of a low-value area, and is beneficial to improving the accuracy of the algorithm.

Drawings

Fig. 1 is a schematic general architecture diagram of a single-target fast detection method based on deep learning according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a current frame operation based on deep learning according to an embodiment of the present invention.

Fig. 3 is an algorithm flowchart of a single-target fast detection method based on deep learning according to an embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present invention, the following describes in detail a single-target fast detection method based on deep learning, provided by the present invention, with reference to the accompanying drawings.

Example one

Fig. 1 is a schematic general architecture diagram of a single-target fast detection method based on deep learning according to an embodiment of the present invention. As shown in fig. 1, in the present embodiment, a single-target fast detection method based on deep learning is provided, because an original image is large, most regions do not include target information in a detection process, and these regions may not be sent to a neural network for operation.

In this embodiment, a video sequence is detected, and in an initial state or in a case where a target position is not detected in a previous frame, the entire image is traversed by using a sliding window method, an image in each window is sent to a neural network for calculation until target position information is detected, and then, next frame image detection is started.

Fig. 2 is a schematic diagram of a current frame operation based on deep learning according to a first embodiment of the present invention, and fig. 3 is a flowchart of an algorithm of a single-target fast detection method based on deep learning according to a first embodiment of the present invention. As shown in fig. 2 and 3, since the motion of the object is a continuous process, the difference between the positions in the two previous and next frames is in a certain range. When the target position information (x) in the previous frame (k-1 frame) image_k-1,y_k-1) In known cases, during the detection of the current frame (k frames), only the center coordinate is required to be (x)_k-1,y_k-1) The window is sent into a neural network for operation, and the target can be rapidly positioned by detecting the window.

The single-target rapid detection method based on deep learning provided by the embodiment has the main significance of improving the speed of a single-target detection algorithm, and enabling a small-scale neural network to complete a target detection task and have higher precision by controlling the input size.

The embodiment selects the window of the original image, reduces the input size of the network, accelerates the network computing speed and improves the real-time performance of the algorithm. In the process of detecting the adjacent image frames, the next frame generates an image window according to the target position information by using the target position information of the previous frame, so that the total calculation amount is reduced. In addition, the technical scheme provided by the embodiment selects the high-value area through the window, effectively shields the interference of the low-value area in the image, and improves the target detection precision.

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.