CN114565867A - Unmanned aerial vehicle scene video target detection method based on convolutional neural network - Google Patents

Unmanned aerial vehicle scene video target detection method based on convolutional neural network

Info

Publication number
CN114565867A
Authority
CN
China
Prior art keywords
model
target
neural network
convolutional neural
minidet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210085038.8A
Other languages
Chinese (zh)
Inventor
卢湖川 (Lu Huchuan)
赵庆宇 (Zhao Qingyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202210085038.8A priority Critical patent/CN114565867A/en
Publication of CN114565867A publication Critical patent/CN114565867A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of video target detection within computer vision and provides an unmanned aerial vehicle (UAV) scene video target detection method based on a convolutional neural network. The method combines a global HeavyDet model with a local MiniDet model. The framework fully exploits the correlation of pedestrian information between consecutive video frames to narrow the region to be searched, reducing the computational load; to address the limited computing power of embedded platforms, TensorRT quantization acceleration is used to further speed up network inference, achieving a good balance between accuracy and speed. The invention deploys the algorithm framework on an NVIDIA Jetson TX2 embedded platform for detecting pedestrians in UAV scenes; the platform offers small size, low power consumption, and suitability for embedded applications.

Description

Unmanned aerial vehicle scene video target detection method based on convolutional neural network
Technical Field
The invention belongs to the field of video target detection within computer vision, and relates in particular to image classification, target detection, and neural network quantization acceleration technologies, specifically to an unmanned aerial vehicle scene video target detection method based on a convolutional neural network.
Background
With the rapid development of low-cost commercial unmanned aerial vehicles (UAVs), video surveillance in UAV scenes is attracting more and more attention. Several studies have addressed this problem from different aspects, but few attempts have been made on embedded platforms. This work is primarily directed at developing an effective and efficient drone-based pedestrian detection algorithm framework on the NVIDIA Jetson TX2 platform.
Existing detectors can be broadly divided into two-stage detectors (e.g., Faster R-CNN) and single-stage detectors (e.g., YOLO and SSD). A two-stage detector first generates region proposals and uses a sub-network to classify and refine them, while a single-stage detector directly produces the final result without generating region proposals. Generally, single-stage detectors are faster, while two-stage detectors are more accurate. Considering the computing power of embedded platforms, we choose the MobileNet-based SSD detector as our base model.
In UAV scenes, the image resolution is typically high, but the people in the field of view are relatively small, so balancing the speed and accuracy of the algorithm is difficult. If the detector is applied directly to the high-resolution image, the computational cost is huge for an embedded platform; but if the image is simply downscaled to a low resolution, some targets cannot be recognized because their appearance information becomes very limited. In our observations, most areas in the camera view are free of targets, so skipping the computation for these areas greatly speeds up detection while maintaining good performance. In particular, we use the temporal and spatial relationships between frames to determine where to detect.
In this work, two MobileNetV1-based SSD detectors, namely HeavyDet and MiniDet, are combined. HeavyDet is a powerful global detector that processes the entire image in a sliding-window fashion and finds targets over the whole field of view. To fully exploit the temporal-spatial information of the video sequence, it is assumed that targets move little within a short time, so the results of previous frames are used to determine local search areas in the current frame. These search areas are handled by the MiniDet model, which uses a very small input size, making it far more efficient than the HeavyDet model. Furthermore, the HeavyDet and MiniDet models interact dynamically to achieve a good balance between accuracy and speed.
Disclosure of Invention
The invention aims to provide a UAV scene video target detection framework that addresses the following problem in the prior art: given the limited computing power of the embedded platforms carried on UAVs, video target detection inference is slow and cannot meet the real-time requirements of practical application scenarios.
The technical scheme of the invention is as follows:
an unmanned aerial vehicle scene video target detection method based on a convolutional neural network comprises the following steps:
step 1, constructing a convolutional neural network model, wherein the convolutional neural network model comprises a global HeavyDet model and a local MiniDet model which are dynamically interactive;
the HeavyDet model is an SSD detector based on MobileNet; an original image is divided into a plurality of sub-regions, adjacent sub-regions partially overlap, and all the sub-regions of a picture are then input into the HeavyDet model as one batch; an improved NMS is next used to eliminate false positives caused by targets lying in overlapping regions;
the improved NMS adds a preprocessing operation to conventional NMS, as follows: before conventional NMS is executed, the position coordinates of the target bounding boxes from all sub-regions are mapped back to the original picture to obtain the bounding boxes in the original picture's coordinate system, and conventional NMS is then executed;
the MiniDet model is an SSD detector based on MobileNet, taking a search area as its input, and returning the position of the target in the search area;
step 2, acquiring a video sequence to be detected, wherein the video sequence contains a plurality of pedestrians in an unmanned aerial vehicle scene, and the input of the convolutional neural network model is any single frame image of the video sequence;
step 3, using the convolutional neural network model to detect pedestrian targets in the unmanned aerial vehicle scene video sequence to obtain detection results;
taking each frame of image in the video sequence obtained in step 2 as the input of the convolutional neural network model constructed in step 1, and then predicting the positions of all pedestrians in the input image by using the convolutional neural network model;
the HeavyDet model is responsible for finely detecting pedestrians over the whole picture and then expanding the detection results into search areas; the MiniDet model is responsible for refining the results within the search areas frame by frame; the HeavyDet model and the MiniDet model are executed alternately;
when predicting pedestrian positions, the HeavyDet model searches the whole image to preliminarily locate targets, and a region enlarged by a factor of 1.5 about the geometric center of each bounding box detected by the HeavyDet model is then used as a MiniDet search area; the MiniDet model is applied to the search area to obtain a bounding box for the target, which is in turn expanded by a factor of 1.5 as the search area for the next frame; this continues until the HeavyDet model detects over the whole image again, refining the MiniDet results and initializing new targets entering the scene;
when the MiniDet model fails to find any target in a search area, the target's bounding box from the last frame is retained and its score is reduced; the position of a search area is updated only when targets are detected in it or the HeavyDet model is launched; expanding a detection bounding box may cover adjacent targets and thereby cause repeated detections, and the following two methods are adopted to solve this problem: the first is to ignore targets that are only partially visible during the training phase; the second is to perform the improved NMS when collecting the MiniDet results;
step 4, outputting the detection result, and visually outputting the bounding boxes of all pedestrians based on the detection result;
step 5, obtaining a convolutional neural network model, comprising:
building a convolutional neural network model to be trained;
acquiring a training data set, a testing data set and a verification data set;
training the convolutional neural network model by using a training data set to obtain the weight of the trained convolutional neural network model;
the trained convolutional neural network model is quantitatively accelerated using tensorrt.
The acquiring of the training data set, the testing data set and the verification data set comprises:
acquiring a pedestrian detection data set of an unmanned plane scene;
manually labeling all data in the detection data set;
and splitting the labeled data set into a training data set, a testing data set and a verification data set.
The invention has the beneficial effects that:
(1) A good balance between accuracy and speed is achieved
The invention provides an effective and efficient pedestrian detection framework built from the HeavyDet and MiniDet models. The framework fully exploits the correlation of pedestrian information between consecutive video frames to narrow the region to be searched, reducing the computational load; to address the limited computing power of embedded platforms, TensorRT quantization acceleration is used to further speed up network inference, achieving a good balance between accuracy and speed. The algorithm framework achieves satisfactory performance on the NVIDIA Jetson TX2 platform, running at over 5 fps.
(2) Wider applicability and easy deployment to embedded platforms
The invention deploys the algorithm framework on an NVIDIA Jetson TX2 embedded platform for detecting pedestrians in UAV scenes; the platform offers small size, low power consumption, and suitability for embedded applications.
TensorRT is a tool for optimizing the deployment side of an algorithm and supports most mainstream deep learning applications well. The invention uses TensorRT to run fast inference on the model at FP16 precision, greatly reducing latency, ensuring optimal inference performance, and meeting the strict latency and throughput requirements of computationally constrained embedded platforms. In addition, the invention packages the program and its dependencies into a lightweight, portable Docker container, which is then deployed to the NVIDIA Jetson TX2. A sketch of building such an FP16 engine is given below.
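The patent does not give the build script itself; the following is a minimal sketch of constructing an FP16 engine with the TensorRT 8 Python API, assuming the trained detector has been exported to ONNX (the file names are illustrative assumptions):

```python
import tensorrt as trt

def build_fp16_engine(onnx_path="detector.onnx", plan_path="detector.plan"):
    """Build a TensorRT engine with FP16 optimization enabled.

    Sketch for the TensorRT 8 Python API; the ONNX/plan file names are
    illustrative assumptions, not part of the patent."""
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)    # request FP16 precision
    plan = builder.build_serialized_network(network, config)
    if plan is None:
        raise RuntimeError("engine build failed")
    with open(plan_path, "wb") as f:
        f.write(plan)                        # serialized engine for Jetson TX2
```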
Drawings
FIG. 1 is the overall framework of the detection algorithm. HeavyDet and MiniDet operate alternately to detect pedestrians in a video sequence.
FIG. 2 is an example of the sliding-window strategy of HeavyDet. The original image is segmented into overlapping sub-regions that are input to HeavyDet as one batch.
FIG. 3 is an example of expanding a detected target region. The target region detected in the previous frame is expanded in both the horizontal and vertical directions as a local search area for MiniDet.
FIG. 4 is an example of regressing the fine position of a target within the expanded search area. MiniDet processes a given search area to update the location of the target.
FIG. 5 shows example visualized detection results of the method herein.
Detailed Description
The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.
An unmanned aerial vehicle scene video target detection method based on a convolutional neural network comprises the following steps:
step 1, obtaining a convolutional neural network model, which comprises a strong global HeavyDet model and a fast local MiniDet model. The former carefully detects pedestrians over the whole image through a series of sliding windows; the latter performs a local search around the previous detection results. The HeavyDet and MiniDet models interact dynamically.
The HeavyDet model is a MobileNet-based SSD detector. To pursue high accuracy, the input image size should be as large as possible; but if the whole image (1920 x 1080) were taken as input, the memory of the NVIDIA Jetson TX2 would be insufficient. To solve this problem, the original image is divided into a number of sub-regions, which are input as one batch to the HeavyDet model. This approach not only greatly reduces the algorithm's run-time memory requirements, but also helps improve detection performance, because target bounding boxes are much easier to regress on small images, especially when the targets are relatively small. Naively segmenting the original image into multiple sub-regions would split targets lying on region edges, leaving target bounding boxes cut in half or producing false negatives. Thus, in the implementation of the present framework, adjacent sub-regions overlap, as sketched below. In addition, an improved version of non-maximum suppression (NMS) is used to eliminate false positives caused by targets lying in overlapping regions.
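A minimal sketch of this overlapping tiling follows; the tile size and overlap ratio are illustrative assumptions, as the patent does not fix them:

```python
import numpy as np

def _positions(full, tile, step):
    """Start offsets along one axis; the last tile is pinned to the edge."""
    xs = list(range(0, full - tile + 1, step))
    if xs[-1] != full - tile:
        xs.append(full - tile)
    return xs

def split_into_tiles(image, tile_w=640, tile_h=540, overlap=0.2):
    """Split a full frame (e.g. 1920x1080) into overlapping sub-regions
    and stack them as one batch. Tile size and overlap ratio are
    illustrative assumptions. Returns the batch plus each tile's
    top-left offset, needed later to map detections back to frame
    coordinates."""
    h, w = image.shape[:2]
    step_x = int(tile_w * (1.0 - overlap))
    step_y = int(tile_h * (1.0 - overlap))
    tiles, offsets = [], []
    for y in _positions(h, tile_h, step_y):
        for x in _positions(w, tile_w, step_x):
            tiles.append(image[y:y + tile_h, x:x + tile_w])
            offsets.append((x, y))
    return np.stack(tiles), offsets
```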
Conventional NMS is mainly used in target detection to keep high-confidence bounding boxes in a picture while suppressing low-confidence false detections. In general, the number of target bounding boxes output by the model is very large (the exact number is determined by the number of anchors), and many of the repeated boxes localize the same target; NMS removes these repeated boxes to obtain the true target bounding boxes. In this method, the original picture is first split into multiple overlapping sub-regions, and conventional NMS cannot eliminate duplicate detections across sub-regions, so conventional NMS is improved by adding a preprocessing operation: before conventional NMS is executed, the position coordinates of the target bounding boxes from all sub-regions are first mapped back to the original picture, yielding bounding boxes in the original picture's coordinate system, and conventional NMS is then executed (see the sketch below).
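A sketch of this improved NMS under the assumptions of the tiling sketch above (boxes in [x1, y1, x2, y2] form; the IoU threshold is an illustrative choice):

```python
import numpy as np

def nms(boxes, scores, iou_thr):
    """Plain greedy NMS over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(xx2 - xx1, 0) * np.maximum(yy2 - yy1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-9)
        order = order[1:][iou <= iou_thr]
    return keep

def improved_nms(tile_boxes, tile_scores, offsets, iou_thr=0.5):
    """Improved NMS described above: first map every tile-local box back
    into the original image coordinate system, then run conventional NMS
    over the merged set so duplicates from overlapping tiles are removed."""
    boxes, scores = [], []
    for (x_off, y_off), b, s in zip(offsets, tile_boxes, tile_scores):
        b = np.asarray(b, dtype=np.float32)
        if len(b) == 0:
            continue
        b[:, [0, 2]] += x_off            # shift x1, x2 to frame coordinates
        b[:, [1, 3]] += y_off            # shift y1, y2 to frame coordinates
        boxes.append(b)
        scores.append(np.asarray(s, dtype=np.float32))
    if not boxes:
        return np.zeros((0, 4)), np.zeros((0,))
    boxes, scores = np.concatenate(boxes), np.concatenate(scores)
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep]
```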
The MiniDet model is also a MobileNet-based SSD detector; it takes a small search area as input and returns the location of the target within that area. Each search area is obtained by expanding a previously detected target bounding box: the search area is 1.5 times the size of the detected bounding box, and their center positions coincide (a sketch of this expansion follows). Since people typically move slowly, the search areas tend to cover all previously detected targets. When processing images with MiniDet, the many regions without targets are ignored, so the computational load is reduced and the detector is faster.
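A minimal sketch of the 1.5x expansion about the box center, clipped to the frame:

```python
def expand_box(box, frame_w, frame_h, scale=1.5):
    """Expand a detected box [x1, y1, x2, y2] by `scale` about its
    geometric center to form the MiniDet search area, clipped to the
    frame. The 1.5x factor follows the description above."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (max(cx - half_w, 0), max(cy - half_h, 0),
            min(cx + half_w, frame_w), min(cy + half_h, frame_h))
```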
Step 2, acquiring a video sequence to be detected. The video sequence contains a plurality of pedestrians in an unmanned aerial vehicle scene; it may be a pre-recorded video or one acquired online in real time by an image recording device such as a camera, with no strict requirement imposed here. The input of the convolutional neural network is any single frame image of the video sequence.
Step 3, using the convolutional neural network model to detect pedestrian targets in the unmanned aerial vehicle scene video sequence to obtain detection results;
taking each frame of image in the video sequence as the input of the convolutional neural network, and then predicting the positions of all pedestrians in the input image by using the convolutional neural network;
the target detection algorithm of the method consists of HeavyDet and MiniDet, and is based on the SSD target detection algorithm. The HeavyDet model is responsible for finely detecting pedestrians in the whole picture and then expanding the detection result into a small search area, and the MiniDet model is responsible for correcting the pedestrian in the search area frame by frame. The HeavyDet model and the MiniDet model are alternately executed in this way, so that the balance between speed and accuracy is achieved.
Specifically, when predicting pedestrian positions, the HeavyDet model searches the entire image carefully to preliminarily locate targets, and the HeavyDet detection results are then expanded by a factor of 1.5 as MiniDet search areas. The MiniDet model is applied to each search area to obtain a bounding box for the target, which is in turn expanded by a factor of 1.5 as the search area for the next frame. After several frames have been processed, HeavyDet again examines the entire image to refine the MiniDet results and initialize new targets entering the scene. HeavyDet and MiniDet alternate dynamically in this manner, achieving a good balance between accuracy and speed.
It should be noted that the MiniDet model may fail during its lifetime. If MiniDet misses an object, the corresponding search area would disappear, so the object would not be identified again until the next HeavyDet pass detects it. To correct this, search areas are retained for a period of time, so that even if an object goes undetected for a while, its trajectory does not immediately stop. Specifically, when MiniDet fails to find any target in a search area, the target box from the last frame is kept and its score is reduced. The position of a search area is updated only when targets are detected in it or HeavyDet is launched. Notably, expanding a detection bounding box may cover nearby objects and thereby produce duplicate detections. The following two approaches address this problem: the first is to ignore targets that are only partially visible during the training phase, which makes the model more sensitive to whole people rather than cropped people; the second is to perform NMS when collecting the MiniDet results, a conventional way of handling duplicate detections.
Furthermore, when the camera moves rapidly, the positions of objects within the scene can change drastically, making it very difficult for MiniDet to detect targets continuously. In the algorithm framework herein, the number of search areas in which no target is detected is therefore monitored: if the proportion of search areas in which no target can be detected is high enough, HeavyDet is restarted to search the entire image again. It is not necessary to launch HeavyDet this often in all cases; the framework also examines the number of detected targets, and if that number is stable, the frequency of launching HeavyDet is reduced.
Step 4, outputting the detection result: based on the detection result, the bounding boxes of all pedestrians are visually output.
Step 5, the obtaining of the convolutional neural network model comprises the following steps:
building a convolutional neural network model to be trained;
acquiring a training data set, a testing data set and a verification data set;
and training the convolutional neural network model by using the training data set to obtain the weight of the trained convolutional neural network model.
The trained convolutional neural network model is then quantized and accelerated using TensorRT.
step 6, the acquiring of the training data set, the testing data set and the verification data set comprises:
acquiring a pedestrian detection data set of an unmanned plane scene;
manually labeling all data in the detection data set;
and splitting the labeled data set into a training data set, a testing data set and a verification data set.
The embodiment of the invention provides an unmanned aerial vehicle scene video target detection method based on a convolutional neural network. The HeavyDet model searches the entire image to preliminarily locate targets, and the HeavyDet results are then expanded into MiniDet search areas. The MiniDet model is applied to each search area to obtain a bounding box for the target, which is in turn expanded as the search area for the next frame. After a period of processing, HeavyDet again examines the entire image to refine the MiniDet results and initialize new targets entering the scene. FIG. 1 and Table 1 show the overall framework flow of the method.
TABLE 1. HeavyDet and MiniDet detection procedure (provided as images in the original publication)
The overall algorithm framework uses 5 frames as one period; the first frame of each period is detected with HeavyDet, and the following frames with MiniDet. When the scene changes substantially or targets move far between frames, the MiniDet search areas may fail to keep up with the targets, causing targets to be lost and missed. For this case a target-loss threshold is set: when the search areas in which no target is detected exceed 40% of all search areas, HeavyDet detection is started for the next 3 frames; once the target-loss percentage drops below the threshold, the detection flow returns to normal (a sketch of this schedule is given after this paragraph). The processing flow of the invention is illustrated by the following example.
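A sketch of this alternation schedule, reusing the expand_box helper above; heavy_det and mini_det stand in for the two SSD detectors, and the score decay factor is an illustrative assumption (the patent only states that a retained box's score is reduced):

```python
def run_detection(frames, heavy_det, mini_det, period=5,
                  loss_thr=0.4, heavy_burst=3, score_decay=0.8):
    """Sketch of the HeavyDet/MiniDet alternation schedule.

    heavy_det(frame) is assumed to return a list of (box, score) pairs
    over the whole frame (the tiled SSD plus improved NMS above);
    mini_det(frame, areas) is assumed to return one detection list per
    search area (empty list = miss)."""
    tracks = []            # [box, score] pairs carried from frame to frame
    force_heavy = 0        # remaining frames of forced HeavyDet detection
    for idx, frame in enumerate(frames):
        h, w = frame.shape[:2]
        if idx % period == 0 or force_heavy > 0:
            force_heavy = max(force_heavy - 1, 0)
            tracks = [[box, score] for box, score in heavy_det(frame)]
        else:
            areas = [expand_box(box, w, h) for box, _ in tracks]
            missed = 0
            for track, dets in zip(tracks, mini_det(frame, areas)):
                if dets:                     # area confirmed: update position
                    track[0], track[1] = dets[0]
                else:                        # miss: keep old box, decay score
                    track[1] *= score_decay
                    missed += 1
            # over 40% of search areas lost their target: restart HeavyDet
            if areas and missed / len(areas) > loss_thr:
                force_heavy = heavy_burst
        yield [(box, score) for box, score in tracks]
```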
First, as shown in FIG. 1 and Table 1, a sequence of video frames must be acquired; these may be pre-recorded videos or videos acquired online in real time by an image recording device such as a camera, with no strict requirement imposed here. The input picture in FIG. 1 is an example input.
Secondly, if the current stage (one stage every 5 frames) has ended or fast motion has occurred, the first frame of the stage is partitioned into sub-regions with overlapping areas by the method shown in FIG. 2, and all the resulting sub-images are input into the HeavyDet model as one batch, yielding the HeavyDet detection results.
Thirdly, the improved version of NMS is used to eliminate false positives in the HeavyDet detection results caused by targets lying in overlapping regions, yielding detection results with false-positive targets removed.
Fourthly, for the last four frames of the current stage, the method shown in FIG. 3 is used to expand the bounding boxes of all targets in the third step's detection results by a factor of 1.5, and the expanded boxes serve as the MiniDet search areas. The solid-line box in FIG. 3 is a target bounding box detected by the HeavyDet model; the dashed-line box is obtained by expanding the detected bounding box to 1.5 times its size, with the two boxes' center positions coinciding.
Fifthly, all search areas from the fourth step are input into the MiniDet model as one batch, and the positions of the targets within the search areas, as corrected by MiniDet, are output; a sketch of preparing this batch follows.
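A minimal sketch of preparing that batch by cropping and resizing each search area (the square input size is an illustrative assumption; MiniDet outputs must be mapped back to frame coordinates with the returned metadata):

```python
import cv2
import numpy as np

def minidet_batch(frame, search_areas, input_size=128):
    """Crop each search area from the frame and resize it to MiniDet's
    (assumed) square input size, stacking the crops into one batch.
    Each area's top-left corner and resize scale are returned so that
    area-local detections can be mapped back to frame coordinates."""
    crops, meta = [], []
    for (x1, y1, x2, y2) in search_areas:
        x1, y1, x2, y2 = map(int, (x1, y1, x2, y2))
        crop = frame[y1:y2, x1:x2]
        crops.append(cv2.resize(crop, (input_size, input_size)))
        meta.append((x1, y1,
                     (x2 - x1) / float(input_size),   # x scale back to frame
                     (y2 - y1) / float(input_size)))  # y scale back to frame
    return np.stack(crops), meta
```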
Sixthly, the process enters the next stage, and the above steps are repeated until the video frames are exhausted.
The final detection results are shown in FIG. 5, from which it can be seen that the video target detection framework of the invention maintains good detection performance even in complex scenes with many people and small targets.

Claims (2)

1. An unmanned aerial vehicle scene video target detection method based on a convolutional neural network is characterized by comprising the following steps:
step 1, constructing a convolutional neural network model, wherein the convolutional neural network model comprises a global HeavyDet model and a local MiniDet model which are dynamically interactive;
the HeavyDet model is an SSD detector based on MobileNet; an original image is divided into a plurality of sub-regions, adjacent sub-regions partially overlap, and all the sub-regions of a picture are then input into the HeavyDet model as one batch; an improved NMS is next used to eliminate false positives caused by targets lying in overlapping regions;
the improved NMS adds a preprocessing operation to conventional NMS, as follows: before conventional NMS is executed, the position coordinates of the target bounding boxes from all sub-regions are mapped back to the original picture to obtain the bounding boxes in the original picture's coordinate system, and conventional NMS is then executed;
the MiniDet model is an SSD detector based on MobileNet, taking a search area as its input, and returning the position of the target in the search area;
step 2, acquiring a video sequence to be detected, wherein the video sequence to be detected contains a plurality of pedestrians in an unmanned aerial vehicle scene, and the input of the convolutional neural network model is any single frame image of the video sequence;
step 3, using the convolutional neural network model to detect pedestrian targets in the unmanned aerial vehicle scene video sequence to obtain detection results;
taking each frame of image in the video sequence obtained in step 2 as the input of the convolutional neural network model constructed in step 1, and then predicting the positions of all pedestrians in the input image by using the convolutional neural network model;
the HeavyDet model is responsible for finely detecting pedestrians over the whole picture and then expanding the detected target bounding boxes into search areas; the MiniDet model is responsible for refining the results within the search areas frame by frame; the HeavyDet model and the MiniDet model are executed alternately;
when predicting pedestrian positions, the HeavyDet model searches the whole image to preliminarily locate targets, and a region enlarged by a factor of 1.5 about the geometric center of each target bounding box detected by the HeavyDet model is then used as a MiniDet search area; the MiniDet model is applied to the search area to obtain a bounding box for the target, which is in turn expanded by a factor of 1.5 as the search area for the next frame; this continues until the HeavyDet model detects over the whole image again, refining the MiniDet results and initializing new targets entering the scene;
when the MiniDet model fails to find any target in a search area, the target's bounding box from the last frame is retained and its score is reduced; the position of a search area is updated only when targets are detected in it or the HeavyDet model is launched; expanding a detection bounding box may cover adjacent targets and thereby cause repeated detections, and the following two methods are adopted to solve this problem: the first is to ignore targets that are only partially visible during the training phase; the second is to perform the improved NMS when collecting the MiniDet results;
step 4, outputting the detection result, and visually outputting the bounding boxes of all pedestrians based on the detection result;
step 5, obtaining a convolutional neural network model, comprising:
building a convolutional neural network model to be trained;
acquiring a training data set, a testing data set and a verification data set;
training the convolutional neural network model by using a training data set to obtain the weight of the trained convolutional neural network model;
the trained convolutional neural network model is quantitatively accelerated using tensorrt.
2. The method of claim 1, wherein the obtaining the training dataset, the testing dataset, and the verification dataset comprises:
acquiring a pedestrian detection data set of an unmanned plane scene;
manually labeling all data in the detection data set;
and splitting the labeled data set into a training data set, a testing data set and a verification data set.
CN202210085038.8A 2022-01-25 2022-01-25 Unmanned aerial vehicle scene video target detection method based on convolutional neural network Pending CN114565867A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210085038.8A CN114565867A (en) 2022-01-25 2022-01-25 Unmanned aerial vehicle scene video target detection method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210085038.8A CN114565867A (en) 2022-01-25 2022-01-25 Unmanned aerial vehicle scene video target detection method based on convolutional neural network

Publications (1)

Publication Number Publication Date
CN114565867A (en) 2022-05-31

Family

ID=81713283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210085038.8A Pending CN114565867A (en) 2022-01-25 2022-01-25 Unmanned aerial vehicle scene video target detection method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN114565867A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114937045A (en) * 2022-06-20 2022-08-23 四川大学华西医院 Hepatocellular carcinoma pathological image segmentation system
CN116311084A (en) * 2023-05-22 2023-06-23 青岛海信网络科技股份有限公司 Crowd gathering detection method and video monitoring equipment


Similar Documents

Publication Publication Date Title
US11302315B2 (en) Digital video fingerprinting using motion segmentation
CN111126152B (en) Multi-target pedestrian detection and tracking method based on video
US9454819B1 (en) System and method for static and moving object detection
US9443320B1 (en) Multi-object tracking with generic object proposals
US9349066B2 (en) Object tracking and processing
US7620266B2 (en) Robust and efficient foreground analysis for real-time video surveillance
US11527000B2 (en) System and method for re-identifying target object based on location information of CCTV and movement information of object
CN109035295B (en) Multi-target tracking method, device, computer equipment and storage medium
US20080187173A1 (en) Method and apparatus for tracking video image
CN114565867A (en) Unmanned aerial vehicle scene video target detection method based on convolutional neural network
Cao et al. Vehicle detection and tracking in airborne videos by multi-motion layer analysis
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
Santoro et al. Crowd analysis by using optical flow and density based clustering
Tang et al. Multiple-kernel adaptive segmentation and tracking (MAST) for robust object tracking
WO2010113417A1 (en) Moving object tracking device, moving object tracking method, and moving object tracking program
Wang et al. Fire detection in video surveillance using superpixel-based region proposal and ESE-ShuffleNet
Angelo A novel approach on object detection and tracking using adaptive background subtraction method
CN112347967B (en) Pedestrian detection method fusing motion information in complex scene
WO2020019353A1 (en) Tracking control method, apparatus, and computer-readable storage medium
Zhang et al. An optical flow based moving objects detection algorithm for the UAV
Le et al. Human detection and tracking for autonomous human-following quadcopter
Pan et al. Fourier domain pruning of mobilenet-v2 with application to video based wildfire detection
JP2002027480A (en) Dynamic image processing method and apparatus thereof
CN114821441A (en) Deep learning-based airport scene moving target identification method combined with ADS-B information
Che et al. Traffic light recognition for real scenes based on image processing and deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination