CN112487889A - Unmanned aerial vehicle ground detection method and system based on deep neural network - Google Patents


Info

Publication number
CN112487889A
CN112487889A (application CN202011285176.8A)
Authority
CN
China
Prior art keywords
target, detected, frame, neural network, image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011285176.8A
Other languages
Chinese (zh)
Inventor
管乃洋
苏龙飞
王之元
凡遵林
张天昊
王浩
沈天龙
黄强娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Defense Technology Innovation Institute PLA Academy of Military Science
Original Assignee
National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Defense Technology Innovation Institute PLA Academy of Military Science
Priority to CN202011285176.8A
Publication of CN112487889A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G06V20/13 - Satellite images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an unmanned aerial vehicle ground detection method and system based on a deep neural network, comprising the following steps: acquiring a first position of a target to be detected from images acquired frame by frame; taking the first position as the initial target position for target tracking and continuously determining the target position in each next frame of image from the candidate area corresponding to the target position in the current frame of image; and, when target tracking fails, re-acquiring the first position of the target to be detected from the frame-by-frame images and resuming tracking. The technical scheme provided by the invention can monitor the acquired video in real time, thereby improving target detection efficiency and accuracy; at the same time, the deep-neural-network target detection method in the scheme has a small computational load and high practicability.

Description

Unmanned aerial vehicle ground detection method and system based on deep neural network
Technical Field
The invention relates to the technical field of computer vision, in particular to an unmanned aerial vehicle ground detection method and system based on a deep neural network.
Background
Deep neural networks are currently developing rapidly and are ever more widely applied. Methods for detecting or searching for a target in a video or image with a deep neural network mainly comprise two-step methods, represented by Faster R-CNN, R-CNN and the like, and one-step methods, represented by YOLO, SSD and the like. Although Faster R-CNN is an excellent two-step algorithm, it reaches only about 5 FPS even with the strong computing power of a K40 GPU, which cannot meet real-time requirements. The one-step YOLO and SSD detectors can exceed 15 FPS and meet real-time requirements, but need the computing power of a Titan X or M40 GPU. Among target tracking algorithms, those with the best balance of performance and speed are represented by correlation filtering algorithms, which track stably and quickly, reaching up to 172 FPS under limited computing power.
An unmanned aerial vehicle is a reusable, pilotless aircraft controlled by radio remote control or autonomous program control; it has a simple structure, low cost, strong survivability and good maneuverability, and can complete a variety of tasks. However, a UAV's payload capacity is low, so it cannot carry computing equipment with strong performance, which makes target detection algorithms based on deep neural networks difficult to deploy; small on-board computers such as a Raspberry Pi or an ODROID are light but have limited computing capacity. Even if a faster one-step method such as Tiny YOLO or MobileNet-SSD is deployed on an ODROID on-board computer, the target detection speed does not exceed 3 FPS and cannot meet real-time requirements. The retired Predator UAV mainly acquires data through its on-board sensors and returns the data to the ground, where they are interpreted manually. The improved Global Hawk, carrying signal sensors and a radar for detecting ground moving targets, has a preliminary on-board target detection and monitoring capability (distinguishing moving from static objects and detecting moving targets), but the detection technology is not yet mature. The Rainbow (CH-series) UAV likewise acquires data through its sensors and returns them to the ground for manual interpretation, with further processing at the back end. An artificial intelligence algorithm has been tested on the ScanEagle: after only a few days of testing, the computer's recognition accuracy for objects such as personnel, vehicles and buildings reached 60%, rising to 80% after one week; however, this processing is still completed on the ground. Therefore, current technology still cannot track and detect targets in the data acquired by a UAV's on-board camera in real time and issue the next indication.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an unmanned aerial vehicle ground detection method and system based on a deep neural network. By combining a deep-neural-network target detection algorithm with a tracking algorithm, a specific target is detected and tracked in real time in the data acquired from the on-board camera during flight, enabling a tactical UAV to monitor and search for ground targets, directionally track moving targets, and detect and track aerial targets.
The purpose of the invention is realized by adopting the following technical scheme:
the invention provides an unmanned aerial vehicle ground detection method based on a deep neural network, which is improved in that the method comprises the following steps:
step 1) acquiring a first position of a target to be detected from images acquired frame by frame;
and step 2) taking the first position as an initial target position of target tracking, continuously determining the target position in the next frame of image according to a candidate area corresponding to the target position to be detected in the current frame of image, and returning to the step 1) when the target tracking fails.
Preferably, the condition for determining the target tracking failure includes any one of:
when the target tracking duration reaches a positive integral multiple of the target detection duration; or
when the target to be detected is not detected in the current frame image;
the target detection duration is a time interval from inputting the images acquired frame by frame into a pre-trained target detection deep neural network model to obtaining the first position of the target to be detected in the images;
the target tracking duration is the time interval for obtaining the position of the target to be detected in every two frames of images.
Further, the value range of the positive integer multiple is [1, 100].
Preferably, the continuously determining the target position in the next frame of image according to the candidate region corresponding to the target position to be detected in the current frame of image includes:
acquiring a candidate area corresponding to a position area of a target to be detected in a current frame image;
searching a region consistent with a candidate region corresponding to the position region of the target to be detected in the current frame image in the next frame image by using a kernel correlation filtering algorithm, and taking the region as the candidate region corresponding to the position region of the target to be detected in the next frame image;
and storing the position area of the target to be detected in each frame of image into a target position set.
Further, after storing the position area of the target to be detected in each frame of image into the target position set, the method further includes:
judging whether the number of the position areas of the target to be detected, which is stored in the target position set, exceeds a preset number threshold K or not;
if not, outputting the position area of the target to be detected which is currently stored;
and if so, updating the number of the position areas of the target to be detected in the target position set in a mode of abandoning the position area saved earliest, and outputting the updated position area of the target to be detected.
Further, the searching for a region consistent with a candidate region corresponding to a position region of the target to be detected in the current frame image by using the kernel correlation filtering algorithm includes:
expanding the candidate area corresponding to the position area of the target to be detected in the current frame image by a preset multiple, and using the expanded area as the area of the next frame image within which the region consistent with that candidate area is searched;
wherein, the value range of the preset multiple is [1.5,3 ].
Preferably, the step 1) includes: and inputting the video images acquired frame by frame to a pre-trained target detection deep neural network model, executing forward reasoning of the deep neural network, and acquiring the first position of the target to be detected in the images.
Preferably, the training process of the pre-trained target detection deep neural network model includes:
carrying out frame-by-frame labeling on various targets in the historical video data acquired frame-by-frame;
constructing training data by using the historical video data labeled frame by frame, and training a target detection deep neural network model by using the training data;
and acquiring a pre-trained target detection deep neural network model.
The invention provides an unmanned aerial vehicle ground detection system based on a deep neural network, which is improved in that the system comprises:
the detection module is used for acquiring the first position of the target to be detected from the image acquired frame by frame;
and the tracking module is used for taking the first position as an initial target position of target tracking, continuously determining the target position in the next frame of image according to the candidate area corresponding to the target position to be detected in the current frame of image, and returning to the detection module if the target tracking fails.
Preferably, the condition for determining the target tracking failure includes any one of:
when the target tracking duration reaches a positive integral multiple of the target detection duration; or
when the target to be detected is not detected in the current frame image;
the target detection duration is a time interval from inputting the images acquired frame by frame into a pre-trained target detection deep neural network model to obtaining the first position of the target to be detected in the images;
the target tracking duration is the time interval for obtaining the position of the target to be detected in every two frames of images.
Further, the value range of the positive integer multiple is [1, 100].
Preferably, the tracking module includes:
the acquisition unit is used for acquiring a candidate area corresponding to the position area of the target to be detected in the current frame image;
the searching unit is used for searching a region consistent with a candidate region corresponding to the position region of the target to be detected in the current frame image in the next frame image by utilizing a kernel correlation filtering algorithm, and taking the region as the candidate region corresponding to the position region of the target to be detected in the next frame image;
and the storage unit is used for storing the position area of the target to be detected in each frame of image into the target position set.
Further, the tracking module further includes:
the judging unit is used for judging whether the number of the position areas of the target to be detected, which is stored in the target position set, exceeds a preset number threshold K or not;
if not, outputting the position area of the target to be detected which is currently stored;
and if so, updating the number of the position areas of the target to be detected in the target position set in a mode of abandoning the position area saved earliest, and outputting the updated position area of the target to be detected.
Further, the search unit is specifically configured to:
expanding the candidate area corresponding to the position area of the target to be detected in the current frame image by a preset multiple, and using the expanded area as the area of the next frame image within which the region consistent with that candidate area is searched;
wherein, the value range of the preset multiple is [1.5,3 ].
Preferably, the detection module is specifically configured to: and inputting the video images acquired frame by frame to a pre-trained target detection deep neural network model, executing forward reasoning of the deep neural network, and acquiring the first position of the target to be detected in the images.
Further, the training process of the pre-trained target detection deep neural network model includes:
carrying out frame-by-frame labeling on various targets in the historical video data acquired frame-by-frame;
constructing training data by using the historical video data labeled frame by frame, and training a target detection deep neural network model by using the training data;
and acquiring a pre-trained target detection deep neural network model.
Compared with the closest prior art, the invention has the following beneficial effects:
in the technical scheme provided by the invention, the main implementation steps comprise the steps of acquiring the first position of a target to be detected from images acquired frame by frame; taking the first position as an initial target position of target tracking, and continuously determining a target position in the next frame of image according to a candidate area corresponding to a target position to be detected in the current frame of image; if the target tracking fails, acquiring the first position of the target to be detected from the image acquired frame by frame again and tracking; the technical scheme provided by the invention can monitor the acquired video in real time, thereby improving the target detection efficiency and accuracy; meanwhile, the target detection method adopting the deep neural network in the technical scheme provided by the invention has small calculated amount and higher practicability.
The technical scheme provided by the invention also provides a trained target detection deep neural network model: forward reasoning is performed on video data acquired frame by frame to obtain the position area of the target to be detected; the candidate area corresponding to that position area in the current video frame, and the area in the next video frame consistent with it, are obtained to determine the target's position area; and the next operation is decided according to the judgment condition for target tracking failure. This scheme keeps the high precision of the deep-neural-network target detection algorithm while overcoming its low speed, and monitors the acquired video in real time. When the tracking algorithm fails, the detection algorithm promptly re-acquires the correct target position, and the multi-scale tracking algorithm can track targets at multiple scales; when several targets appear in the video, the tracking algorithm prevents the detection algorithm's target frame from jumping between different targets. Meanwhile, the scheme has a small computational load, needs no massive computing power such as a GPU graphics card, can be deployed on the on-board computer of a small unmanned aerial vehicle, and has important application value.
Drawings
Fig. 1 is a flowchart of a method for detecting the ground of an unmanned aerial vehicle based on a deep neural network according to embodiment 1 of the present invention;
fig. 2 is a flowchart of a specific implementation of a method for detecting the ground of an unmanned aerial vehicle based on a deep neural network according to embodiment 2 of the present invention;
FIG. 3 is a training flowchart of a deep neural network-based target detection model provided in embodiment 2 of the present invention;
FIG. 4 is a flowchart of real-time target detection based on a deep neural network provided in embodiment 2 of the present invention;
FIG. 5 is a flowchart of target tracking based on deep neural network provided in embodiment 2 of the present invention;
fig. 6 is a structural diagram of a ground detection system of a drone based on a deep neural network according to embodiment 3 of the present invention.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
The invention provides an unmanned aerial vehicle ground detection method based on a deep neural network which, as shown in Fig. 1, comprises the following steps:
Step 1) acquiring a first position of a target to be detected from images acquired frame by frame;
and step 2) taking the first position as an initial target position of target tracking, continuously determining the target position in the next frame of image according to a candidate area corresponding to the target position to be detected in the current frame of image, and returning to the step 1) when the target tracking fails.
In an embodiment of the present invention, the determination condition of the target tracking failure in step 2) includes any one of:
when the target tracking duration reaches a positive integral multiple of the target detection duration; or
when the target to be detected is not detected in the current frame image;
the target detection duration is the time interval from inputting the images acquired frame by frame into a pre-trained target detection deep neural network model to obtaining the first position of the target to be detected in the images;
the target tracking duration is the time interval for obtaining the position of the target to be detected in every two frames of images.
Wherein, the value range of the positive integer multiple is [1, 100].
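As an illustrative sketch only (all names are hypothetical, using the t_1, t_2 and α notation of embodiment 2 below), the two judgment conditions can be checked as follows:

    # Sketch of the tracking-failure check (illustrative names only).
    # t_detect: duration of one detection pass (t_1)
    # t_track:  tracking time accumulated since the last detection (t_2)
    # alpha:    positive integer chosen from [1, 100]
    def tracking_failed(t_detect, t_track, alpha, target_found):
        if not target_found:                # condition 2: target lost in the current frame
            return True
        return t_track >= alpha * t_detect  # condition 1: t_2 has reached alpha * t_1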
In the embodiment of the present invention, the step 2) of continuously determining the target position in the next frame image according to the candidate region corresponding to the target position to be detected in the current frame image includes:
acquiring a candidate area corresponding to a position area of a target to be detected in a current frame image;
searching a region consistent with a candidate region corresponding to a position region of the target to be detected in the current frame image in the next frame image by using a Kernel Correlation Filter (KCF) algorithm, and taking the region as the candidate region corresponding to the position region of the target to be detected in the next frame image;
and storing the position area of the target to be detected in each frame of image into a target position set.
After the position area of the target to be detected in each frame of image is saved to the target position set in step 2), the method further comprises the following steps:
judging whether the number of the position areas of the target to be detected, which is stored in the target position set, exceeds a preset number threshold K or not;
if not, outputting the position area of the target to be detected which is currently stored;
and if so, updating the number of the position areas of the target to be detected in the target position set in a mode of abandoning the position area saved earliest, and outputting the updated position area of the target to be detected.
In the embodiment of the present invention, searching for a region consistent with a candidate region corresponding to a position region of a target to be detected in a current frame image by using a KCF algorithm in a next frame image includes:
expanding the candidate area corresponding to the position area of the target to be detected in the current frame image by a preset multiple, and using the expanded area as the area of the next frame image within which the region consistent with that candidate area is searched;
wherein, the value range of the preset multiple is [1.5,3 ].
In an embodiment of the present invention, the acquiring the first position of the target to be measured in step 1) includes: and inputting the video images acquired frame by frame to a pre-trained target detection deep neural network model, executing forward reasoning of the deep neural network, and acquiring the first position of the target to be detected in the images.
The training process of the pre-trained target detection deep neural network model comprises the following steps:
carrying out frame-by-frame labeling on various targets in the historical video data acquired frame-by-frame;
constructing training data by using the historical video data labeled frame by frame, and training a target detection deep neural network model by using the training data;
and acquiring a pre-trained target detection deep neural network model.
Example 2
The embodiment provides a specific implementation process of an unmanned aerial vehicle ground detection method based on a deep neural network; as shown in Fig. 2, the method includes:
step (1) training a target detection deep neural network model to obtain a model file and a weight file;
step (2) collecting video data frame by frame;
step (3) recording the current time t10
step (4) performing forward reasoning on the video data acquired frame by frame, based on the model file and the weight file of the trained target detection deep neural network, to obtain the position area of the target to be detected;
step (5) recording the current time t11And determining the detection time t of the detection target according to the following formula1:t1=t11-t10
step (6) numbering the video frame in which the position area of the target to be detected was obtained as 1, and sequentially numbering the frames of the frame-by-frame video data that follow this initial video frame;
step (7) recording the current time t_20;
Step (8) initializing i to 1;
step (9) initializing j to 1;
step (10) obtaining the candidate area corresponding to the position area of the target to be detected in the video frame numbered i, and finding in the video frame numbered i + 1 the area consistent with that candidate area, which is taken as the candidate area corresponding to the position area of the target to be detected in the video frame numbered i + 1;
step (11) acquiring a position area of a target to be detected in a candidate area corresponding to the position area of the target to be detected in the video frame with the number of i +1, setting j to j +1, and storing the position area of the target to be detected as a jth target position in a target position set;
step (12) judging whether j is larger than the preset target position number K in the target position set, if not, outputting the jth position of the target to be detected, and executing step (13); if yes, abandoning the earliest stored target position in the target position set, outputting the jth position of the target to be detected, and executing the step (13);
step (13) recording the current time t_21 and determining the target tracking duration t_2 according to the following formula: t_2 = t_21 - t_20;
step (14) if t_2 ≥ α·t_1, going to step (4); if t_2 < α·t_1, letting i = i + 1 and executing step (9); a consolidated sketch of this detect-track loop is given below.
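Taken together, steps (2)-(14) alternate one slow, accurate detection pass with a budgeted run of the fast tracker. The following is a sketch under stated assumptions: detector.detect and tracker.init/tracker.update are hypothetical wrappers around the trained detection model and the KCF tracker, and alpha and K are the parameters defined above.

    import time
    from collections import deque

    def detect_and_track(frames, detector, tracker, alpha=5, K=50):
        positions = deque(maxlen=K)            # target position set, steps (11)-(12)
        it = iter(frames)
        for frame in it:
            t10 = time.time()                  # step (3)
            box = detector.detect(frame)       # step (4): forward inference
            t1 = time.time() - t10             # step (5): t_1 = t_11 - t_10
            tracker.init(frame, box)
            t20 = time.time()                  # step (7)
            for frame in it:                   # steps (9)-(13): track frame by frame
                ok, box = tracker.update(frame)
                if not ok:                     # target lost: fall back to detection
                    break
                positions.append(box)          # earliest position dropped when full
                if time.time() - t20 >= alpha * t1:
                    break                      # step (14): t_2 >= alpha * t_1, re-detect
        return positions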
Preferably, step (1) comprises:
carrying out frame-by-frame labeling on various targets in the historical video data acquired frame-by-frame;
constructing training data by using the historical video data labeled frame by frame, and training a target detection deep neural network model by using the training data;
and obtaining a model file and a weight file of the trained target detection deep neural network.
Preferably, step (4) comprises:
and sequentially reading a label corresponding to the target to be detected, a trained model file and a weight file of the target detection deep neural network and video data acquired frame by using the forward reasoning frame, and acquiring the position of the target to be detected output by the forward reasoning frame.
In an embodiment of the present invention, the off-line training of the target detection deep neural network includes:
step A-1, for the specific target to be detected and tracked, labeling video data of the same type, and performing offline training of the deep neural network with the labeled data on a GPU server or another computer with strong performance;
step A-2, decomposing the same type of video data acquired by the unmanned aerial vehicle into images; the number of images should be as large as possible, usually not less than 10,000, to avoid overfitting and improve generalization; labeling the targets (automobiles, people, tanks, unmanned aerial vehicles and the like) in each image; specifically: framing each target with a rectangular frame and recording, in a fixed format, the pixel coordinates of the rectangle's top-left and bottom-right corners, or the top-left vertex coordinates together with the rectangle's length and width, and the corresponding target label;
step A-3, building a deep neural network training platform (TensorFlow, Darknet, Caffe and the like), setting parameters such as the training batch size and the learning rate, reading a deep neural network model such as MobileNet-SSD, and updating the parameters of the deep neural network model of the specific target detection algorithm on the labeled data;
and step A-4, after training for a specified number of iterations (more than 10,000 rounds), saving the deep neural network training model to obtain its model file and weight file.
Secondly, detecting a target:
b-1, loading video data and reading video frames;
step B-2, recording the current time t_10 of timer 1;
And B-3, loading a pre-training model based on a deep learning algorithm, and detecting a specific target on the read video frame by utilizing a deep learning forward reasoning mechanism: reading a target class label, a pre-training parameter model file, a weight file and a video frame to be detected, and carrying out forward reasoning on a new video frame to obtain target position information and confidence;
step B-4, recording the current time t_11 of timer 1, computing t_1 = t_11 - t_10, and simultaneously transmitting the detected target position to the target tracker.
And finally, tracking the target:
and C-1, initializing by the target tracker by taking the target position detected by the target detector as a tracking starting point.
step C-2, recording the current time t_20 of timer 2;
C-3, tracking the target by a tracking algorithm, and updating the target position on a new video frame: determining the position of a candidate region of a current frame, and extracting the characteristics of the candidate region; searching a region which is most matched with the candidate region characteristics in a subsequent video frame as a target tracking object; enclosing the object by a rectangular frame to be used as a tracking result and storing the target position to a target position set;
c-4, if the number of the historical positions stored in the target position set of the same target exceeds K, abandoning the target position stored firstly, and outputting and displaying the latest position of the tracking target on the video image;
step C-5, recording the current time t_21 of timer 2 and calculating the target tracking duration t_2 = t_21 - t_20; if t_2 ≥ α·t_1, going to step B-3; if t_2 < α·t_1, going to step C-3.
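As one concrete possibility (OpenCV ships a KCF tracker in its contrib modules; the exact factory function varies by version), steps C-1 to C-3 might be sketched as follows; all names are illustrative:

    import cv2

    def track_from_detection(first_frame, detected_box, frames):
        # detected_box: (x, y, w, h) handed over by the target detector (step C-1)
        tracker = cv2.TrackerKCF_create()    # cv2.TrackerKCF.create() on newer builds
        tracker.init(first_frame, tuple(detected_box))
        for frame in frames:                 # step C-3: update on each new video frame
            ok, box = tracker.update(frame)
            if not ok:
                break                        # tracking failed: return to the detector
            yield tuple(int(v) for v in box) # rectangle enclosing the tracked object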
Preferably, the obtaining of the candidate region corresponding to the position region of the target to be detected in the video frame with the number i includes:
and expanding the position area of the target to be detected in the video frame with the number i by a preset multiple.
Further, the value range of the preset multiple is [1.5, 3].
Preferably, the value range of α in step (14) is [1, 100].
Based on the technical solution provided by the present invention, the embodiment of the present invention further provides a training flowchart of a target detection model based on a deep neural network, as shown in fig. 3:
S1, off-line training of the target detection model:
S11, collecting videos or images of the specific area to be monitored; the collected image or video scenes should resemble the scene of the actual unmanned aerial vehicle monitoring area as closely as possible;
S12, marking the various targets (vehicles, personnel, trees and the like) in the collected videos or images frame by frame; the marking frame is preferably a rectangle, positioned either by its top-left and bottom-right vertices or by its top-left vertex plus the rectangle's width and height; saving the marked coordinates and category labels as xml or txt files in a fixed format, and establishing an index file so that image paths and file names correspond one-to-one with the xml or txt file paths and names (an illustrative annotation line is shown after this list);
S13, selecting a training platform for the deep neural network, which can be, but is not limited to, Caffe, TensorFlow, PyTorch or Darknet;
S14, selecting a target detection deep neural network, including but not limited to the MobileNet-SSD detection network; setting parameters such as the training batch size and the learning rate; reading the training images and the corresponding xml or txt files according to the index file; and training with the labeled data on the platform selected in S13;
and S15, performing N rounds of training on the acquired data in the training process of S14, where N is usually not less than 10,000, and saving the obtained model file for the subsequent real-time target detection process.
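For illustration only, since the fixed format itself is not prescribed above, a txt annotation file for one image, one target per line, could look like the following (file names and the column order are assumptions):

    # frame_000123.txt - assumed columns: label x_min y_min x_max y_max
    vehicle 412 208 486 251
    person 95 310 118 372
    # assumed index file entry: images/frame_000123.jpg annotations/frame_000123.txt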
Based on the technical scheme provided by the invention, the embodiment of the invention also provides a target real-time detection flow chart based on the deep neural network, as shown in fig. 4:
S2, online real-time target detection:
S21, reading the camera's video or image data frame by frame in real time on the unmanned aerial vehicle;
S22, recording the current time t_10 of timer 1;
S23, running a lightweight forward inference framework convenient to deploy on a mobile platform, including but not limited to the OpenCV DNN module, TensorRT, Tencent NCNN and Tengine;
S24, reading the model and weight file trained and saved in S15, detecting the selected targets on the frame-by-frame video or images, and acquiring and outputting the corresponding target position rectangle, confidence and category label (a sketch of S23-S24 follows this list);
S25, recording the current time t_11 of timer 1.
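A sketch of S23-S24 using one of the listed frameworks, the OpenCV DNN module, with a Caffe-format MobileNet-SSD; the file names, input size and confidence threshold are all assumptions:

    import cv2

    net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "mobilenet_ssd.caffemodel")

    def detect_targets(frame, conf_threshold=0.5):
        h, w = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
        net.setInput(blob)
        detections = net.forward()            # one forward inference pass (S24)
        results = []
        for i in range(detections.shape[2]):  # rows: [_, label, conf, x1, y1, x2, y2]
            conf = float(detections[0, 0, i, 2])
            if conf >= conf_threshold:
                label = int(detections[0, 0, i, 1])
                x1, y1, x2, y2 = detections[0, 0, i, 3:7] * (w, h, w, h)
                results.append((label, conf, (int(x1), int(y1), int(x2), int(y2))))
        return results                        # position rectangle, confidence, label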
Based on the technical solution provided by the present invention, an embodiment of the present invention further provides a target tracking flow chart based on a deep neural network, as shown in fig. 5:
S3, specific steps of target tracking:
S31, taking the target position rectangle saved in S25 as the initial value of the target tracking algorithm and initializing the tracker on the current video frame; the KCF target tracking algorithm is preferably selected, and the initial target position is saved;
S32, recording the current time t_20 of timer 2;
S33, according to the initial target position, determining in the current frame a template area larger than the target frame (generally 1.5-3 times the size of the target frame) by the KCF algorithm, and obtaining the different displacement templates of this area with a circulant matrix, shifting along the x-axis and the y-axis respectively with the following circulant (permutation) matrix:
P = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}
S34, extracting the features of the different displacement templates and multiplying them by a Hanning window to obtain the target template, then calculating the Gaussian kernel of the target template; the Hanning window is determined according to the formula:

w(n) = \frac{1}{2}\left(1 - \cos\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1,

wherein N is the window width;
S35, calculating the target position of the target template in the image through Fourier transform and computing a new target template from that position; calculating the Gaussian response map of the new target template, training the ridge regression model in the frequency domain, and updating the target template and the classifier parameter values (the frequency-domain formulas are summarized after this list);
S36, outputting and storing the target position;
S37, according to the target position, determining in the newly acquired frame a template area larger than the target frame (generally 1.5-3 times the size of the target frame) and obtaining the different displacement templates of this area with the circulant matrix;
S38, extracting the features of the different displacement templates and multiplying them by the Hanning window to obtain the target template;
S39, calculating the Gaussian kernel from the target template and computing the response map with the classifier parameter values to obtain the target position; calculating the Gaussian kernel of the new target template, training the ridge regression model in the frequency domain, and updating the target template and the classifier parameter values;
S40, outputting and saving the target position;
S41, checking the number of saved target positions: if it exceeds the preset number K of target positions, abandoning the earliest saved position and outputting the remaining positions; otherwise, directly outputting the saved positions;
S42, recording the current time t_21 of timer 2 and calculating the target tracking duration t_2 = t_21 - t_20; if t_2 ≥ α·t_1, going to step S23; if t_2 < α·t_1, going to step S37.
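The frequency-domain computations referenced in S35 and S39 follow the standard kernel correlation filter formulation and can be written compactly as (here α denotes the ridge regression solution, not the timing multiple of S42):

    \hat{\alpha} = \frac{\hat{y}}{\hat{k}^{xx} + \lambda}, \qquad f(z) = \mathcal{F}^{-1}\left( \hat{k}^{xz} \odot \hat{\alpha} \right)

where a hat denotes the discrete Fourier transform, y is the desired Gaussian response map, k^{xx} is the Gaussian kernel correlation of the target template x with itself, k^{xz} is its kernel correlation with the candidate patch z, λ is the ridge regression regularization weight, ⊙ is element-wise multiplication, and the new target position is taken at the maximum of the response map f(z).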
Example 3
The embodiment provides an unmanned aerial vehicle ground detection system based on a deep neural network, as shown in fig. 6, including:
the detection module is used for acquiring the first position of the target to be detected from the image acquired frame by frame;
and the tracking module is used for taking the first position as an initial target position of target tracking, continuously determining the target position in the next frame of image according to the candidate area corresponding to the target position to be detected in the current frame of image, and returning to the detection module when the target tracking fails.
Preferably, the condition for determining the target tracking failure includes any one of:
when the target tracking duration reaches a positive integral multiple of the target detection duration; or
when the target to be detected is not detected in the current frame image;
the target detection duration is the time interval from inputting the images acquired frame by frame into a pre-trained target detection deep neural network model to obtaining the first position of the target to be detected in the images;
the target tracking duration is the time interval for obtaining the position of the target to be detected in every two frames of images.
Further, the value range of the positive integer multiple is [1, 100].
Preferably, the tracking module comprises:
the acquisition unit is used for acquiring a candidate area corresponding to the position area of the target to be detected in the current frame image;
the searching unit is used for searching a region consistent with a candidate region corresponding to the position region of the target to be detected in the current frame image in the next frame image by utilizing a kernel correlation filtering algorithm, and taking the region as the candidate region corresponding to the position region of the target to be detected in the next frame image;
and the storage unit is used for storing the position area of the target to be detected in each frame of image into the target position set.
Further, the tracking module further comprises:
the judging unit is used for judging whether the number of the position areas of the target to be detected, which is stored in the target position set, exceeds a preset number threshold K or not;
if not, outputting the position area of the target to be detected which is currently stored;
and if so, updating the number of the position areas of the target to be detected in the target position set in a mode of abandoning the position area saved earliest, and outputting the updated position area of the target to be detected.
Further, the search unit is specifically configured to:
expanding the candidate area corresponding to the position area of the target to be detected in the current frame image by a preset multiple, and using the expanded area as the area of the next frame image within which the region consistent with that candidate area is searched;
wherein, the value range of the preset multiple is [1.5,3 ].
Preferably, the detection module is specifically configured to: and inputting the video images acquired frame by frame to a pre-trained target detection deep neural network model, executing forward reasoning of the deep neural network, and acquiring the first position of the target to be detected in the images.
Further, the training process of the pre-trained target detection deep neural network model comprises the following steps:
carrying out frame-by-frame labeling on various targets in the historical video data acquired frame-by-frame;
constructing training data by using the historical video data labeled frame by frame, and training a target detection deep neural network model by using the training data;
and acquiring a pre-trained target detection deep neural network model.
The unmanned aerial vehicle ground detection system provided by the embodiment of the invention, or electronic equipment loaded with the unmanned aerial vehicle ground detection method, can be deployed on an unmanned aerial vehicle to realize monitoring and tracking of the target.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (16)

1. An unmanned aerial vehicle ground detection method based on a deep neural network is characterized by comprising the following steps:
step 1) acquiring a first position of a target to be detected from images acquired frame by frame;
and step 2) taking the first position as an initial target position of target tracking, continuously determining the target position in the next frame of image according to a candidate area corresponding to the target position to be detected in the current frame of image, and returning to the step 1) when the target tracking fails.
2. The method according to claim 1, wherein the judgment condition of the target tracking failure includes any one of:
when the target tracking duration reaches a positive integral multiple of the target detection duration; or
when the target to be detected is not detected in the current frame image;
the target detection duration is a time interval from inputting the images acquired frame by frame into a pre-trained target detection deep neural network model to obtaining the first position of the target to be detected in the images;
the target tracking duration is the time interval for obtaining the position of the target to be detected in every two frames of images.
3. The method of claim 2, wherein the positive integer multiple ranges over [1, 100].
4. The method of claim 1, wherein said continuously determining the target position in the next frame of image according to the candidate region corresponding to the target position to be detected in the current frame of image comprises:
acquiring a candidate area corresponding to a position area of a target to be detected in a current frame image;
searching a region consistent with a candidate region corresponding to the position region of the target to be detected in the current frame image in the next frame image by using a kernel correlation filtering algorithm, and taking the region as the candidate region corresponding to the position region of the target to be detected in the next frame image;
and storing the position area of the target to be detected in each frame of image into a target position set.
5. The method of claim 4, wherein after saving the position area of the target to be detected in each frame of image to the target position set, the method further comprises:
judging whether the number of the position areas of the target to be detected, which is stored in the target position set, exceeds a preset number threshold K or not;
if not, outputting the position area of the target to be detected which is currently stored;
and if so, updating the number of the position areas of the target to be detected in the target position set in a mode of abandoning the position area saved earliest, and outputting the updated position area of the target to be detected.
6. The method as claimed in claim 4, wherein said searching for a region in the next frame image consistent with the candidate region corresponding to the position region of the target to be detected in the current frame image by using the kernel correlation filtering algorithm comprises:
expanding the candidate area corresponding to the position area of the target to be detected in the current frame image by a preset multiple, and using the expanded area as the area of the next frame image within which the region consistent with that candidate area is searched;
wherein, the value range of the preset multiple is [1.5,3 ].
7. The method of claim 1, wherein the step 1) comprises: and inputting the video images acquired frame by frame to a pre-trained target detection deep neural network model, executing forward reasoning of the deep neural network, and acquiring the first position of the target to be detected in the images.
8. The method of claim 1, wherein the training process of the pre-trained target detection deep neural network model comprises:
carrying out frame-by-frame labeling on various targets in the historical video data acquired frame-by-frame;
constructing training data by using the historical video data labeled frame by frame, and training a target detection deep neural network model by using the training data;
and acquiring a pre-trained target detection deep neural network model.
9. An unmanned aerial vehicle ground detection system based on a deep neural network, the system comprising:
the detection module is used for acquiring the first position of the target to be detected from the image acquired frame by frame;
and the tracking module is used for taking the first position as an initial target position of target tracking, continuously determining the target position in the next frame of image according to the candidate area corresponding to the target position to be detected in the current frame of image, and returning to the detection module when the target tracking fails.
10. The system according to claim 9, wherein the judgment condition of the target tracking failure includes any one of:
when the target tracking duration reaches a positive integral multiple of the target detection duration; or
when the target to be detected is not detected in the current frame image;
the target detection duration is a time interval from inputting the images acquired frame by frame into a pre-trained target detection deep neural network model to obtaining the first position of the target to be detected in the images;
the target tracking duration is the time interval for obtaining the position of the target to be detected in every two frames of images.
11. The system of claim 10, wherein the positive integer multiple ranges over [1, 100].
12. The system of claim 9, wherein the tracking module comprises:
the acquisition unit is used for acquiring a candidate area corresponding to the position area of the target to be detected in the current frame image;
the searching unit is used for searching a region consistent with a candidate region corresponding to the position region of the target to be detected in the current frame image in the next frame image by utilizing a kernel correlation filtering algorithm, and taking the region as the candidate region corresponding to the position region of the target to be detected in the next frame image;
and the storage unit is used for storing the position area of the target to be detected in each frame of image into the target position set.
13. The system of claim 12, wherein the tracking module further comprises:
the judging unit is used for judging whether the number of the position areas of the target to be detected, which is stored in the target position set, exceeds a preset number threshold K or not;
if not, outputting the position area of the target to be detected which is currently stored;
and if so, updating the number of the position areas of the target to be detected in the target position set in a mode of abandoning the position area saved earliest, and outputting the updated position area of the target to be detected.
14. The system of claim 12, wherein the lookup unit is specifically configured to:
expanding a candidate area corresponding to the position area of the target to be detected in the current frame image by a preset multiple to serve as an area where the candidate area corresponding to the position area of the target to be detected in the next frame image is consistent with the candidate area corresponding to the position area of the target to be detected in the current frame image;
wherein, the value range of the preset multiple is [1.5,3 ].
15. The system of claim 9, wherein the detection module is specifically configured to: and inputting the video images acquired frame by frame to a pre-trained target detection deep neural network model, executing forward reasoning of the deep neural network, and acquiring the first position of the target to be detected in the images.
16. The system of claim 9, wherein the training process of the pre-trained target detection deep neural network model comprises:
carrying out frame-by-frame labeling on various targets in the historical video data acquired frame-by-frame;
constructing training data by using the historical video data labeled frame by frame, and training a target detection deep neural network model by using the training data;
and acquiring a pre-trained target detection deep neural network model.
CN202011285176.8A 2020-11-17 2020-11-17 Unmanned aerial vehicle ground detection method and system based on deep neural network Pending CN112487889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011285176.8A CN112487889A (en) 2020-11-17 2020-11-17 Unmanned aerial vehicle ground detection method and system based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011285176.8A CN112487889A (en) 2020-11-17 2020-11-17 Unmanned aerial vehicle ground detection method and system based on deep neural network

Publications (1)

Publication Number Publication Date
CN112487889A 2021-03-12

Family

ID=74930890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011285176.8A Pending CN112487889A (en) 2020-11-17 2020-11-17 Unmanned aerial vehicle ground detection method and system based on deep neural network

Country Status (1)

Country Link
CN (1) CN112487889A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107767405A (en) * 2017-09-29 2018-03-06 华中科技大学 A kernel correlation filter target tracking method fused with convolutional neural networks
CN108346159A (en) * 2018-01-28 2018-07-31 北京工业大学 A visual target tracking method based on tracking-learning-detection

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113296540A (en) * 2021-05-20 2021-08-24 北京航空航天大学 Hybrid intelligent following and obstacle avoiding method suitable for indoor unmanned aerial vehicle
CN113296540B (en) * 2021-05-20 2022-07-12 北京航空航天大学 Hybrid intelligent following and obstacle avoiding method suitable for indoor unmanned aerial vehicle

Similar Documents

Publication Publication Date Title
US11205274B2 (en) High-performance visual object tracking for embedded vision systems
CN110443969B (en) Fire detection method and device, electronic equipment and storage medium
CN106874854B (en) Unmanned aerial vehicle tracking method based on embedded platform
Yang et al. Deep concrete inspection using unmanned aerial vehicle towards cssc database
CN110555901B (en) Method, device, equipment and storage medium for positioning and mapping dynamic and static scenes
EP2917874B1 (en) Cloud feature detection
CN109584213B (en) Multi-target number selection tracking method
CN110021033B (en) Target tracking method based on pyramid twin network
US9767570B2 (en) Systems and methods for computer vision background estimation using foreground-aware statistical models
US8446468B1 (en) Moving object detection using a mobile infrared camera
CN111932588A (en) Tracking method of airborne unmanned aerial vehicle multi-target tracking system based on deep learning
CN107871324B (en) Target tracking method and device based on double channels
CN111326023A (en) Unmanned aerial vehicle route early warning method, device, equipment and storage medium
CN111679695B (en) Unmanned aerial vehicle cruising and tracking system and method based on deep learning technology
CN112115975B (en) Deep learning network model rapid iterative training method and equipment suitable for monitoring device
Wu et al. Multivehicle object tracking in satellite video enhanced by slow features and motion features
CN114511792B (en) Unmanned aerial vehicle ground detection method and system based on frame counting
CN115861860B (en) Target tracking and positioning method and system for unmanned aerial vehicle
CN117036989A (en) Miniature unmanned aerial vehicle target recognition and tracking control method based on computer vision
CN112487892B (en) Unmanned aerial vehicle ground detection method and system based on confidence
CN112487889A (en) Unmanned aerial vehicle ground detection method and system based on deep neural network
EP2731050A1 (en) Cloud feature detection
Chandana et al. Autonomous drones based forest surveillance using Faster R-CNN
CN116486290B (en) Unmanned aerial vehicle monitoring and tracking method and device, electronic equipment and storage medium
CN116580056A (en) Ship detection and tracking method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination