CN112184756A - Single-target rapid detection method based on deep learning - Google Patents

Publication number
CN112184756A
Authority
CN
China
Prior art keywords
network
target position
image
loss function
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011063213.0A
Other languages
Chinese (zh)
Inventor
韩勇强
李利华
刘泳庆
杨旭
卢彤春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-09-30
Filing date
2020-09-30
Publication date
2021-01-05
Application filed by Beijing Institute of Technology BIT
Priority to CN202011063213.0A
Publication of CN112184756A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a single-target rapid detection method based on deep learning. For the training process, it designs the loss function and the optimization method of the network. For the detection process on a video sequence, when the target position has not been detected, either in the initial state or in the previous frame, the whole image is traversed with a sliding window and the image in each window is sent into a convolutional neural network for calculation until target position information is detected, after which detection moves on to the next frame. When the target position information (x_{k-1}, y_{k-1}) in the previous frame image is known, the window whose center coordinate is (x_{k-1}, y_{k-1}) is sent into the convolutional neural network, and the target can be located rapidly by detecting that window alone. Because the scale of a neural network is closely related to its input size, the technical scheme provided by the invention optimizes the network by controlling the input size, thereby reducing the network scale and increasing the network computing speed while preserving performance.

Description

Single-target rapid detection method based on deep learning
Technical Field
The invention relates to the technical field of computer vision, in particular to a single-target rapid detection method based on deep learning.
Background
Object Detection (OD) is one of the core problems in the field of computer vision, and its basic task is to classify and locate objects of interest in an image or video, while detecting their position and size. Currently, target detection methods can be divided into two broad categories: traditional target detection algorithms and deep learning based target detection algorithms.
Traditional target detection algorithms select candidate regions by traversing the image with a sliding window, extract features from the image block in each window using descriptors such as the Histogram of Oriented Gradients (HOG) or the scale-invariant feature transform (SIFT), and finally classify the extracted features with classifiers such as a Support Vector Machine (SVM) or AdaBoost. These methods require hand-crafted features whose construction is complex, they offer only limited gains in detection precision, and they adapt poorly to complex backgrounds. Current research therefore focuses mainly on methods based on deep learning.
Target detection algorithms based on deep learning mainly use a Convolutional Neural Network (CNN) to extract image features and complete target detection through operations such as pooling and sampling. In general, the performance of a network is positively correlated with its depth. To extract more image information, networks have been made ever deeper and their parameter counts ever larger. Although performance has improved greatly, the large scale of these networks makes real-time target detection heavily dependent on computing power, which greatly limits the deployment of the algorithms on mobile platforms.
Disclosure of Invention
In order to solve the limitations and defects of the prior art, the invention provides a single-target rapid detection method based on deep learning, which comprises the following steps:
step S1, detecting the video sequence;
step S2, under the condition that the target position is not detected in the initial state or the previous frame of image, traversing the whole image by using a sliding window method, and sending the image of each window into a convolutional neural network for calculation until the target position information is detected;
step S3, because the motion of the target is continuous, the difference between the target position in the previous frame image and the target position in the current frame image is within a certain range; when the target position information in the previous frame image has been determined, forming a sliding window according to that information to detect the current frame image, wherein the previous frame image is frame k-1, the current frame image is frame k, and the target position information in the previous frame image is (x_{k-1}, y_{k-1});
step S4, sending the sliding window whose center coordinate is (x_{k-1}, y_{k-1}) into the convolutional neural network for calculation, and judging whether target position information exists in the image in the sliding window;
step S5, if the judgment result is that the window has the target position information, executing step S6; if the judgment result is that the window does not have the target position information, executing step S2;
step S6, the next frame image is detected.
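As an illustration only, steps S1 to S6 can be sketched as a small control loop. The names `track_single_target`, `detect_in_window` and `scan_centers` are hypothetical and do not come from the patent; `detect_in_window` stands in for the convolutional neural network applied to a single window, and `scan_centers` stands in for the sliding-window traversal of the whole image.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

Point = Tuple[int, int]   # (x, y) in pixels
Frame = Any               # placeholder image type (assumption)


def track_single_target(
    frames: Iterable[Frame],
    detect_in_window: Callable[[Frame, Point], Optional[Point]],
    scan_centers: Callable[[Frame], Iterable[Point]],
) -> Iterator[Optional[Point]]:
    """Yield the target position found in each frame, or None if none was found."""
    prev: Optional[Point] = None              # target position in frame k-1
    for frame in frames:                      # S1 / S6: go through the sequence
        found: Optional[Point] = None
        if prev is not None:                  # S3 / S4: test only the window at prev
            found = detect_in_window(frame, prev)
        if found is None:                     # S2, also the fallback branch of S5
            for center in scan_centers(frame):
                found = detect_in_window(frame, center)
                if found is not None:         # stop scanning at the first hit
                    break
        prev = found
        yield found
```

When the target stays close to its previous position, each frame costs a single network call; the full scan runs only on the first frame or after the target has been lost.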
Optionally, the following steps are performed before step S1:
setting a loss function of the network, wherein the loss function comprises a predicted center coordinate loss function, a predicted bounding box size loss function, a predicted category loss function and a prediction confidence loss function;
the predicted center coordinate loss function is calculated as follows:
Figure BDA0002713017120000021
wherein λ_coord denotes the coefficient for grid cells that contain no object, (x, y) is the position of the predicted bounding box, and
Figure BDA0002713017120000022
is the position of the ground-truth box;
the predicted bounding box size loss function is calculated as follows:
Figure BDA0002713017120000023
wherein (w, h) is the size of the predicted box, and
Figure BDA0002713017120000031
is the size of the ground-truth box;
the predicted category loss function is calculated as follows:
Figure BDA0002713017120000032
wherein
Figure BDA0002713017120000033
is the predicted probability that the result belongs to class c, and
Figure BDA0002713017120000034
is the true probability that the result belongs to class c;
the prediction confidence loss function is calculated as follows:
Figure BDA0002713017120000035
wherein c_i is the confidence score,
Figure BDA0002713017120000036
is the intersection of the predicted bounding box with the ground-truth bounding box; if a cell contains an object,
Figure BDA0002713017120000037
takes the value 1, and if the cell contains no object,
Figure BDA0002713017120000038
takes the value 0;
setting the total loss function of the network to be J, wherein J is calculated as follows:
J=L1+L2+L3+L4 (5)
calculating, from the error, the gradient used to update the weights, according to the following formula:
Figure BDA0002713017120000039
wherein g_t denotes the gradient at time step t;
updating the weights with a gradient-based optimization algorithm, according to the following formula:
Figure BDA00027130171200000310
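The formula images are not reproduced in this text, so the individual loss terms cannot be read off. Purely to illustrate how four terms combine into J in formula (5), the sketch below assumes squared-error terms in the style of the YOLO family of detectors. That choice, the `Prediction` type and the `coord_weight` parameter are assumptions made for this example, not the formulas of the patent.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Prediction:
    x: float                       # box center
    y: float
    w: float                       # box size
    h: float
    conf: float                    # confidence score
    class_probs: Sequence[float]   # one probability per class


def total_loss(pred: Prediction, truth: Prediction, has_object: bool,
               coord_weight: float = 5.0) -> float:
    """Return J = L1 + L2 + L3 + L4 for one grid cell (assumed squared-error terms)."""
    obj = 1.0 if has_object else 0.0   # 1 if the cell contains an object, else 0
    l1 = coord_weight * obj * ((pred.x - truth.x) ** 2 + (pred.y - truth.y) ** 2)
    l2 = coord_weight * obj * ((pred.w - truth.w) ** 2 + (pred.h - truth.h) ** 2)
    l3 = obj * sum((p - t) ** 2 for p, t in zip(pred.class_probs, truth.class_probs))
    l4 = (pred.conf - truth.conf) ** 2
    return l1 + l2 + l3 + l4
```

With this structure, a cell without an object contributes only through the confidence term, while a cell with an object is also penalized for position, size and class errors.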
The invention has the following beneficial effects:
The single-target rapid detection method based on deep learning provided by the invention detects a video sequence. When the target position has not been detected, either in the initial state or in the previous frame, the whole image is traversed with a sliding window and the image in each window is sent into a convolutional neural network for calculation until target position information is detected, after which detection of the next frame image begins. Since the motion of the target is a continuous process, the difference between the target positions in two consecutive frames is within a preset range. When the target position information (x_{k-1}, y_{k-1}) in the previous frame image is known, the window whose center coordinate is (x_{k-1}, y_{k-1}) is sent into the convolutional neural network for calculation, and the target can be located rapidly by detecting that window.
Because the scale of the neural network is closely related to the input size of the network, the technical scheme provided by the invention can optimize the network by controlling the input size of the network, thereby reducing the scale of the network while ensuring the performance, improving the calculation speed of the network and further improving the real-time property of the target detection algorithm.
According to the technical scheme provided by the invention, reducing the input size of the neural network maintains high precision while reducing the network scale, which effectively improves the inference speed of the neural network and the real-time performance of the target detection algorithm. In addition, during target detection the method sends only high-value window images to the network for calculation, which effectively avoids interfering information from low-value areas and helps improve the accuracy of the algorithm.
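The effect of input size on computation can be made concrete with a rough count of multiply-accumulate operations for one convolution layer. The patent gives no layer or image dimensions; the 1920×1080 frame, the 128×128 window, the 3×3 kernel and the channel counts below are hypothetical values chosen only for this worked example.

```python
def conv_macs(in_h: int, in_w: int, c_in: int, c_out: int,
              k: int = 3, stride: int = 1) -> int:
    """Multiply-accumulate count of one k x k convolution with 'same' padding."""
    out_h = -(-in_h // stride)   # ceiling division
    out_w = -(-in_w // stride)
    return out_h * out_w * k * k * c_in * c_out


full = conv_macs(1080, 1920, c_in=3, c_out=16)   # whole frame as network input
crop = conv_macs(128, 128, c_in=3, c_out=16)     # one window as network input
ratio = full / crop                              # cost of the frame relative to the window
```

For a convolution layer the cost grows with the number of output positions, so under these assumed sizes the cropped window needs roughly two orders of magnitude fewer operations than the full frame.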
Drawings
Fig. 1 is a schematic general architecture diagram of a single-target fast detection method based on deep learning according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a current frame operation based on deep learning according to an embodiment of the present invention.
Fig. 3 is an algorithm flowchart of a single-target fast detection method based on deep learning according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following describes in detail a single-target fast detection method based on deep learning, provided by the present invention, with reference to the accompanying drawings.
Example one
Fig. 1 is a schematic general architecture diagram of a single-target fast detection method based on deep learning according to an embodiment of the present invention. As shown in fig. 1, this embodiment provides a single-target fast detection method based on deep learning. Because the original image is large, most regions contain no target information during detection, and these regions need not be sent to the neural network for calculation.
In this embodiment, a video sequence is detected, and in an initial state or in a case where a target position is not detected in a previous frame, the entire image is traversed by using a sliding window method, an image in each window is sent to a neural network for calculation until target position information is detected, and then, next frame image detection is started.
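A minimal sketch of the sliding-window traversal follows. The patent does not specify a window size, a stride or how image borders are handled, so the square window, the `stride` parameter and the extra edge-aligned windows are assumptions of this example, as is the function name `sliding_window_origins`.

```python
from typing import Iterator, Tuple


def sliding_window_origins(img_w: int, img_h: int,
                           win: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (x, y) of each win x win window, row by row."""
    if win > img_w or win > img_h:
        raise ValueError("window must fit inside the image")
    xs = list(range(0, img_w - win + 1, stride))
    ys = list(range(0, img_h - win + 1, stride))
    # Add a last window flush with the right/bottom edge so no border strip is skipped.
    if xs[-1] != img_w - win:
        xs.append(img_w - win)
    if ys[-1] != img_h - win:
        ys.append(img_h - win)
    for y in ys:
        for x in xs:
            yield (x, y)
```

Each yielded origin identifies one window to crop and pass to the network; the traversal stops as soon as the network reports a target.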
Fig. 2 is a schematic diagram of a current frame operation based on deep learning according to a first embodiment of the present invention, and fig. 3 is a flowchart of an algorithm of a single-target fast detection method based on deep learning according to a first embodiment of the present invention. As shown in fig. 2 and 3, since the motion of the target is a continuous process, the difference between its positions in two consecutive frames is within a certain range. When the target position information (x_{k-1}, y_{k-1}) in the previous frame (frame k-1) image is known, detection of the current frame (frame k) only requires sending the window whose center coordinate is (x_{k-1}, y_{k-1}) into the neural network for calculation, and the target can be located rapidly by detecting that window.
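The window centered on the previous position can be computed as below. The patent does not say what happens when the previous position lies near an image border; shifting the window so that it stays inside the image is an assumption of this sketch, and `window_around` is a hypothetical helper name.

```python
from typing import Tuple


def window_around(prev_xy: Tuple[int, int], win: int,
                  img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a win x win window centered on prev_xy.

    The window is shifted where necessary so that it stays inside the image;
    win is assumed to be no larger than either image dimension.
    """
    x, y = prev_xy
    left = min(max(x - win // 2, 0), img_w - win)
    top = min(max(y - win // 2, 0), img_h - win)
    return left, top, left + win, top + win
```

For a previous position of (320, 240) in a 640×480 image with a 128-pixel window this gives (256, 176, 384, 304); near a corner the window slides inward instead of shrinking, so the network input size stays constant.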
The main significance of the single-target rapid detection method based on deep learning provided by this embodiment is that it improves the speed of single-target detection and, by controlling the input size, enables a small-scale neural network to complete the target detection task with high precision.
The single-target rapid detection method based on deep learning provided by the invention detects a video sequence. When the target position has not been detected, either in the initial state or in the previous frame, the whole image is traversed with a sliding window and the image in each window is sent into a convolutional neural network for calculation until target position information is detected, after which detection of the next frame image begins. Since the motion of the target is a continuous process, the difference between the target positions in two consecutive frames is within a preset range. When the target position information (x_{k-1}, y_{k-1}) in the previous frame image is known, the window whose center coordinate is (x_{k-1}, y_{k-1}) is sent into the convolutional neural network for calculation, and the target can be located rapidly by detecting that window.
This embodiment selects a window from the original image, which reduces the input size of the network, accelerates network computation and improves the real-time performance of the algorithm. When detecting adjacent image frames, the image window for the next frame is generated from the target position information of the previous frame, which reduces the total amount of calculation. In addition, the technical scheme provided by this embodiment selects the high-value area through the window, which effectively shields the interference of low-value areas in the image and improves target detection precision.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (2)

1. A single target rapid detection method based on deep learning is characterized by comprising the following steps:
step S1, detecting the video sequence;
step S2, under the condition that the target position is not detected in the initial state or the previous frame of image, traversing the whole image by using a sliding window method, and sending the image of each window into a convolutional neural network for calculation until the target position information is detected;
step S3, because the motion of the target is continuous, the difference between the target position in the previous frame image and the target position in the current frame image is within a certain range; when the target position information in the previous frame image has been determined, forming a sliding window according to that information to detect the current frame image, wherein the previous frame image is frame k-1, the current frame image is frame k, and the target position information in the previous frame image is (x_{k-1}, y_{k-1});
step S4, sending the sliding window whose center coordinate is (x_{k-1}, y_{k-1}) into the convolutional neural network for calculation, and judging whether target position information exists in the image in the sliding window;
step S5, if the judgment result is that the window has the target position information, executing step S6; if the judgment result is that the window does not have the target position information, executing step S2;
step S6, the next frame image is detected.
2. The single-target rapid detection method based on deep learning according to claim 1, wherein the following steps are included before step S1:
setting a loss function of the network, wherein the loss function comprises a predicted center coordinate loss function, a predicted bounding box size loss function, a predicted category loss function and a prediction confidence loss function;
the predicted center coordinate loss function is calculated as follows:
Figure FDA0002713017110000011
wherein λ_coord denotes the coefficient for grid cells that contain no object, (x, y) is the position of the predicted bounding box, and
Figure FDA0002713017110000012
is the position of the ground-truth box;
the predicted bounding box size loss function is calculated as follows:
Figure FDA0002713017110000021
wherein (w, h) is the size of the predicted box, and
Figure FDA0002713017110000022
is the size of the ground-truth box;
the predicted category loss function is calculated as follows:
Figure FDA0002713017110000023
wherein
Figure FDA0002713017110000024
is the predicted probability that the result belongs to class c, and
Figure FDA0002713017110000025
is the true probability that the result belongs to class c;
the prediction confidence loss function is calculated as follows:
Figure FDA0002713017110000026
wherein c_i is the confidence score,
Figure FDA0002713017110000027
is the intersection of the predicted bounding box with the ground-truth bounding box; if a cell contains an object,
Figure FDA0002713017110000028
takes the value 1, and if the cell contains no object,
Figure FDA0002713017110000029
takes the value 0;
setting the total loss function of the network to be J, wherein J is calculated as follows:
J=L1+L2+L3+L4 (5)
calculating, from the error, the gradient used to update the weights, according to the following formula:
Figure FDA00027130171100000210
wherein g_t denotes the gradient at time step t;
updating the weights with a gradient-based optimization algorithm, according to the following formula:
Figure FDA00027130171100000211

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011063213.0A CN112184756A (en) 2020-09-30 2020-09-30 Single-target rapid detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011063213.0A CN112184756A (en) 2020-09-30 2020-09-30 Single-target rapid detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN112184756A true CN112184756A (en) 2021-01-05

Family

ID=73949222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011063213.0A Pending CN112184756A (en) 2020-09-30 2020-09-30 Single-target rapid detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN112184756A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650668A (en) * 2016-12-27 2017-05-10 上海葡萄纬度科技有限公司 Method and system for detecting movable target object in real time
CN109685152A (en) * 2018-12-29 2019-04-26 北京化工大学 A kind of image object detection method based on DC-SPP-YOLO
CN110135267A (en) * 2019-04-17 2019-08-16 电子科技大学 A kind of subtle object detection method of large scene SAR image
CN110502962A (en) * 2018-05-18 2019-11-26 翔升(上海)电子技术有限公司 Mesh object detection method, device, equipment and medium in video flowing
CN110781836A (en) * 2019-10-28 2020-02-11 深圳市赛为智能股份有限公司 Human body recognition method and device, computer equipment and storage medium
CN111311634A (en) * 2020-01-23 2020-06-19 支付宝实验室(新加坡)有限公司 Face image detection method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210105