CN112184756A - Single-target rapid detection method based on deep learning - Google Patents
- Publication number
- CN112184756A (application number CN202011063213.0A)
- Authority
- CN
- China
- Prior art keywords
- network
- target position
- image
- loss function
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/20 — Image analysis; analysis of motion
- G06F17/15 — Correlation function computation including computation of convolution operations
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06T2207/10016 — Image acquisition modality: video; image sequence
Abstract
The invention discloses a single-target rapid detection method based on deep learning. For the training process of the network, the method designs a loss function and an optimization method; for detection over a video sequence, it optimizes the process as follows: in the initial state, or when no target position was detected in the previous frame, the whole image is traversed by a sliding-window method and the image in each window is sent into a convolutional neural network for calculation until the target position information is detected, after which detection of the next frame begins. When the target position information (x_{k-1}, y_{k-1}) in the previous frame image is known, only the window whose center coordinate is (x_{k-1}, y_{k-1}) is sent into the convolutional neural network for operation, and the target can be rapidly located by detecting this single window. Because the scale of a neural network is closely related to its input size, the technical scheme provided by the invention can optimize the network by controlling the input size of the network, thereby reducing the network scale and improving the network computing speed while ensuring performance.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a single-target rapid detection method based on deep learning.
Background
Object detection (OD) is one of the core problems in the field of computer vision. Its basic task is to classify objects of interest in an image or video and to determine their position and size. Current target detection methods fall into two broad categories: traditional target detection algorithms and target detection algorithms based on deep learning.
A traditional target detection algorithm performs region selection by traversal with a sliding window, then extracts features from the image block in each window using descriptors such as the histogram of oriented gradients (HOG) or the scale-invariant feature transform (SIFT), and finally classifies the extracted features with a classifier such as a support vector machine (SVM) or AdaBoost. Such methods require manually constructed features, the construction process is complex, the achievable improvement in detection accuracy is limited, and their adaptability to complex backgrounds is poor. Current research therefore focuses more on methods based on deep learning.
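As an illustration of the hand-crafted-feature step, the following is a toy single-cell HOG histogram in NumPy. It is a simplified sketch only: a full HOG descriptor also involves a grid of cells and block normalization, which are omitted here, and the function name is illustrative.

```python
import numpy as np

def hog_cell_histogram(patch, n_bins=9):
    """Gradient-orientation histogram for a single HOG cell: unsigned
    orientations (0-180 degrees) binned and weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())  # accumulate magnitude per bin
    return hist
```

A vertical intensity edge produces purely horizontal gradients, so all of the magnitude lands in the 0-degree bin.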
A target detection algorithm based on deep learning mainly uses a convolutional neural network (CNN) to extract image features and completes target detection through operations such as pooling and sampling. In general, the performance of a network is positively correlated with its depth. To extract more image information, networks are made ever deeper and their parameters ever larger. Although performance improves greatly, the large scale of such networks makes real-time target detection depend heavily on available computing capacity, which greatly limits the deployment of these algorithms on mobile platforms.
Disclosure of Invention
In order to solve the limitations and defects of the prior art, the invention provides a single-target rapid detection method based on deep learning, which comprises the following steps:
step S1, detecting the video sequence;
step S2, under the condition that the target position is not detected in the initial state or the previous frame of image, traversing the whole image by using a sliding window method, and sending the image of each window into a convolutional neural network for calculation until the target position information is detected;
step S3, because the motion of the target is continuous, the difference between the target position in the previous frame image and the target position in the current frame image lies within a certain range; when the target position information in the previous frame image is determined, a sliding window is formed according to it to detect the current frame image, wherein the previous frame is frame k-1, the current frame is frame k, and the target position information in the previous frame image is (x_{k-1}, y_{k-1});
step S4, sending the sliding window whose center coordinate is (x_{k-1}, y_{k-1}) into the convolutional neural network for calculation, and judging whether target position information exists in the image in the sliding window;
step S5, if the judgment result is that the window has the target position information, executing step S6; if the judgment result is that the window does not have the target position information, executing step S2;
step S6, the next frame image is detected.
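The loop of steps S1–S6 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the `detect_window` callable stands in for the convolutional neural network, and the window size, stride, and helper names are illustrative assumptions.

```python
import numpy as np

def sliding_windows(height, width, win, stride):
    # all top-left corners of win x win windows covering the image (step S2)
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield x, y

def detect_sequence(frames, detect_window, win=64, stride=32):
    """Steps S1-S6: exhaustive sliding-window search when no previous
    position is known, otherwise a single window centred on the previous
    target position (x_{k-1}, y_{k-1}).  `detect_window` stands in for the
    CNN: it returns (x, y) inside the patch when the target is found,
    else None."""
    prev = None          # (x_{k-1}, y_{k-1}) carried over from frame k-1
    results = []
    for frame in frames:
        height, width = frame.shape[:2]
        found = None
        if prev is not None:
            # step S4: one window centred on the previous position,
            # clamped so the crop stays inside the frame
            cx, cy = prev
            x0 = int(np.clip(cx - win // 2, 0, width - win))
            y0 = int(np.clip(cy - win // 2, 0, height - win))
            hit = detect_window(frame[y0:y0 + win, x0:x0 + win])
            if hit is not None:
                found = (x0 + hit[0], y0 + hit[1])
        if found is None:
            # step S2: fall back to traversing the whole image
            for x0, y0 in sliding_windows(height, width, win, stride):
                hit = detect_window(frame[y0:y0 + win, x0:x0 + win])
                if hit is not None:
                    found = (x0 + hit[0], y0 + hit[1])
                    break
        prev = found      # steps S5/S6: carry the position to the next frame
        results.append(found)
    return results
```

Note how the second frame is resolved from a single window around the first frame's detection, which is exactly the fast path of step S4.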
Optionally, before step S1, the method further includes the following steps:
setting a loss function of the network, wherein the loss function comprises a prediction center coordinate loss function, a prediction boundary box size loss function, a prediction category loss function and a prediction confidence coefficient loss function;
the calculation formula of the prediction center coordinate loss function is as follows:

$$L_1=\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbf{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]\qquad(1)$$

wherein λ_coord is the weighting coefficient applied to the coordinate terms, (x, y) is the position of the predicted bounding box, and (x̂, ŷ) is the real frame position;
the calculation formula of the predicted bounding box size loss function is as follows:

$$L_2=\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbf{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right]\qquad(2)$$
the calculation formula of the prediction category loss function is as follows:

$$L_3=\sum_{i=0}^{S^2}\mathbf{1}_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2\qquad(3)$$

wherein p_i(c) is the predicted probability that the result belongs to class c, and p̂_i(c) is the true probability of class c;
the calculation formula of the prediction confidence coefficient loss function is as follows:

$$L_4=\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbf{1}_{ij}^{obj}\left(c_i-\hat{c}_i\right)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbf{1}_{ij}^{noobj}\left(c_i-\hat{c}_i\right)^2\qquad(4)$$

wherein c_i is the confidence score and ĉ_i is the intersection over union (IOU) of the predicted bounding box and the actual bounding box; the indicator 1_ij^obj takes the value 1 if the cell contains an object and 0 if it does not;
setting the total loss function of the network to be J, wherein the calculation formula of the J is as follows:
J=L1+L2+L3+L4 (5)
updating the gradient according to the error, wherein the calculation formula is as follows:

$$g_t=\nabla_{\theta}J(\theta_{t-1})\qquad(6)$$

wherein g_t represents the gradient at the t-th time step and θ denotes the network weights;
the weights are updated by a gradient-based optimization algorithm, which is calculated as follows:

$$\theta_t=\theta_{t-1}-\eta\,g_t\qquad(7)$$

wherein η is the learning rate.
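The four-part loss J = L1 + L2 + L3 + L4 of formula (5) can be sketched in NumPy as follows, assuming the standard YOLO-v1-style terms that the surrounding symbols (λ_coord, class probabilities, IOU-based confidence, object indicators) suggest. The dict layout, function name, and default weights are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def total_loss(pred, truth, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """Four-part loss J = L1 + L2 + L3 + L4.
    pred/truth are dicts with 'xy' (N, 2), 'wh' (N, 2), 'cls' (N, C)
    and 'conf' (N,); obj_mask marks cells that contain an object."""
    obj = obj_mask.astype(float)
    noobj = 1.0 - obj
    # L1: centre-coordinate error, weighted on object cells
    L1 = lambda_coord * np.sum(obj[:, None] * (pred['xy'] - truth['xy']) ** 2)
    # L2: box-size error on square roots, damping large-box dominance
    L2 = lambda_coord * np.sum(obj[:, None] * (np.sqrt(pred['wh']) - np.sqrt(truth['wh'])) ** 2)
    # L3: class-probability error on object cells
    L3 = np.sum(obj[:, None] * (pred['cls'] - truth['cls']) ** 2)
    # L4: confidence error, down-weighted where there is no object
    conf_err = (pred['conf'] - truth['conf']) ** 2
    L4 = np.sum(obj * conf_err) + lambda_noobj * np.sum(noobj * conf_err)
    return L1 + L2 + L3 + L4
```

A perfect prediction gives zero loss, and a 0.1 error in one centre coordinate contributes λ_coord × 0.01.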
the invention has the following beneficial effects:
the single-target rapid detection method based on deep learning provided by the invention detects a video sequence, traverses the whole image by using a sliding window method under the condition of an initial state or no target position detected in the previous frame, sends the image of each window into a convolutional neural network for calculation until target position information is detected, and then starts to detect the image of the next frame. Since the motion of the object is a continuous process, the difference between the positions of the two previous and next frames is within a preset range. When the target position information (x) in the previous frame imagek-1,yk-1) When known, the center coordinate is (x)k-1,yk-1) The window is sent into a convolutional neural network for operation, and the target can be rapidly positioned by detecting the window.
Because the scale of the neural network is closely related to the input size of the network, the technical scheme provided by the invention can optimize the network by controlling the input size of the network, thereby reducing the scale of the network while ensuring the performance, improving the calculation speed of the network and further improving the real-time property of the target detection algorithm.
According to the technical scheme provided by the invention, by reducing the size of the input of the neural network, the high precision can be ensured, the network scale can be reduced, the reasoning speed of the neural network is effectively improved, and the real-time performance of the target detection algorithm is improved. In addition, in the target detection process, the method only sends the high-value window image to the network for operation, effectively avoids interference information of a low-value area, and is beneficial to improving the accuracy of the algorithm.
Drawings
Fig. 1 is a schematic general architecture diagram of a single-target fast detection method based on deep learning according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a current frame operation based on deep learning according to an embodiment of the present invention.
Fig. 3 is an algorithm flowchart of a single-target fast detection method based on deep learning according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following describes in detail a single-target fast detection method based on deep learning, provided by the present invention, with reference to the accompanying drawings.
Example one
Fig. 1 is a schematic general architecture diagram of the single-target fast detection method based on deep learning according to an embodiment of the present invention. As shown in fig. 1, this embodiment provides a single-target fast detection method based on deep learning. Because the original image is large, most regions contain no target information during detection, and these regions need not be sent to the neural network for operation.
In this embodiment, a video sequence is detected, and in an initial state or in a case where a target position is not detected in a previous frame, the entire image is traversed by using a sliding window method, an image in each window is sent to a neural network for calculation until target position information is detected, and then, next frame image detection is started.
Fig. 2 is a schematic diagram of the current-frame operation based on deep learning according to the first embodiment of the present invention, and fig. 3 is a flowchart of the algorithm of the single-target fast detection method based on deep learning according to the first embodiment. As shown in figs. 2 and 3, since the motion of the target is a continuous process, the difference between its positions in two consecutive frames lies within a certain range. When the target position information (x_{k-1}, y_{k-1}) in the previous frame (frame k-1) is known, during detection of the current frame (frame k) only the window whose center coordinate is (x_{k-1}, y_{k-1}) needs to be sent into the neural network for operation, and the target can be rapidly located by detecting this single window.
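The computational saving of the single-window path can be made concrete with a small calculation. The frame size, window size, and stride below are illustrative assumptions, not values from the patent:

```python
def num_windows(width, height, win, stride):
    # windows generated by one full sliding-window traversal (step S2)
    nx = (width - win) // stride + 1
    ny = (height - win) // stride + 1
    return nx * ny

# Illustrative numbers: a 640x480 frame, 64-pixel window, 32-pixel stride
# -> 19 * 14 = 266 CNN forward passes per frame for a blind search,
# versus a single pass when (x_{k-1}, y_{k-1}) is known.
full_pass = num_windows(640, 480, 64, 32)
```

Under these assumptions the tracked path performs roughly two orders of magnitude fewer network evaluations per frame.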
The main significance of the single-target rapid detection method based on deep learning provided by this embodiment is to improve the speed of the single-target detection algorithm and, by controlling the input size, to enable a small-scale neural network to complete the target detection task with high accuracy.
The single-target rapid detection method based on deep learning provided by the invention detects a video sequence: in the initial state, or when no target position was detected in the previous frame, the whole image is traversed by a sliding-window method and the image in each window is sent into a convolutional neural network for calculation until target position information is detected, after which detection of the next frame begins. Since the motion of the target is a continuous process, the difference between its positions in two consecutive frames lies within a preset range. When the target position information (x_{k-1}, y_{k-1}) in the previous frame image is known, only the window whose center coordinate is (x_{k-1}, y_{k-1}) is sent into the convolutional neural network for operation, and the target can be rapidly located by detecting this single window.
The embodiment selects the window of the original image, reduces the input size of the network, accelerates the network computing speed and improves the real-time performance of the algorithm. In the process of detecting the adjacent image frames, the next frame generates an image window according to the target position information by using the target position information of the previous frame, so that the total calculation amount is reduced. In addition, the technical scheme provided by the embodiment selects the high-value area through the window, effectively shields the interference of the low-value area in the image, and improves the target detection precision.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.
Claims (2)
1. A single target rapid detection method based on deep learning is characterized by comprising the following steps:
step S1, detecting the video sequence;
step S2, under the condition that the target position is not detected in the initial state or the previous frame of image, traversing the whole image by using a sliding window method, and sending the image of each window into a convolutional neural network for calculation until the target position information is detected;
step S3, because the motion of the target is continuous, the difference between the target position in the previous frame image and the target position in the current frame image lies within a certain range; when the target position information in the previous frame image is determined, a sliding window is formed according to it to detect the current frame image, wherein the previous frame is frame k-1, the current frame is frame k, and the target position information in the previous frame image is (x_{k-1}, y_{k-1});
step S4, sending the sliding window whose center coordinate is (x_{k-1}, y_{k-1}) into the convolutional neural network for calculation, and judging whether target position information exists in the image in the sliding window;
step S5, if the judgment result is that the window has the target position information, executing step S6; if the judgment result is that the window does not have the target position information, executing step S2;
step S6, the next frame image is detected.
2. The single-target rapid detection method based on deep learning according to claim 1, wherein before step S1 the method further comprises:
setting a loss function of the network, wherein the loss function comprises a prediction center coordinate loss function, a prediction boundary box size loss function, a prediction category loss function and a prediction confidence coefficient loss function;
the calculation formula of the prediction center coordinate loss function is as follows:

$$L_1=\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbf{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]\qquad(1)$$

wherein λ_coord is the weighting coefficient applied to the coordinate terms, (x, y) is the position of the predicted bounding box, and (x̂, ŷ) is the real frame position;
the calculation formula of the predicted bounding box size loss function is as follows:

$$L_2=\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbf{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right]\qquad(2)$$
the calculation formula of the prediction category loss function is as follows:

$$L_3=\sum_{i=0}^{S^2}\mathbf{1}_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2\qquad(3)$$

wherein p_i(c) is the predicted probability that the result belongs to class c, and p̂_i(c) is the true probability of class c;
the calculation formula of the prediction confidence coefficient loss function is as follows:

$$L_4=\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbf{1}_{ij}^{obj}\left(c_i-\hat{c}_i\right)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbf{1}_{ij}^{noobj}\left(c_i-\hat{c}_i\right)^2\qquad(4)$$

wherein c_i is the confidence score and ĉ_i is the intersection over union (IOU) of the predicted bounding box and the actual bounding box; the indicator 1_ij^obj takes the value 1 if the cell contains an object and 0 if it does not;
setting the total loss function of the network to be J, wherein the calculation formula of the J is as follows:
J=L1+L2+L3+L4 (5)
updating the gradient according to the error, wherein the calculation formula is as follows:

$$g_t=\nabla_{\theta}J(\theta_{t-1})\qquad(6)$$

wherein g_t represents the gradient at the t-th time step and θ denotes the network weights;
the weights are updated by a gradient-based optimization algorithm, which is calculated as follows:

$$\theta_t=\theta_{t-1}-\eta\,g_t\qquad(7)$$

wherein η is the learning rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011063213.0A CN112184756A (en) | 2020-09-30 | 2020-09-30 | Single-target rapid detection method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011063213.0A CN112184756A (en) | 2020-09-30 | 2020-09-30 | Single-target rapid detection method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112184756A true CN112184756A (en) | 2021-01-05 |
Family
ID=73949222
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011063213.0A Pending CN112184756A (en) | 2020-09-30 | 2020-09-30 | Single-target rapid detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112184756A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106650668A (en) * | 2016-12-27 | 2017-05-10 | 上海葡萄纬度科技有限公司 | Method and system for detecting movable target object in real time |
CN109685152A (en) * | 2018-12-29 | 2019-04-26 | 北京化工大学 | A kind of image object detection method based on DC-SPP-YOLO |
CN110135267A (en) * | 2019-04-17 | 2019-08-16 | 电子科技大学 | A kind of subtle object detection method of large scene SAR image |
CN110502962A (en) * | 2018-05-18 | 2019-11-26 | 翔升(上海)电子技术有限公司 | Mesh object detection method, device, equipment and medium in video flowing |
CN110781836A (en) * | 2019-10-28 | 2020-02-11 | 深圳市赛为智能股份有限公司 | Human body recognition method and device, computer equipment and storage medium |
CN111311634A (en) * | 2020-01-23 | 2020-06-19 | 支付宝实验室(新加坡)有限公司 | Face image detection method, device and equipment |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210105 |