CN111275736A - Unmanned aerial vehicle video multi-target tracking method based on target scene consistency - Google Patents
- Publication number
- CN111275736A
- Authority
- CN
- China
- Prior art keywords
- network
- target
- layers
- consistency
- aerial vehicle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention relates to the technical field of computer vision auxiliary devices, and in particular to an unmanned aerial vehicle (UAV) video multi-target tracking method based on target-scene consistency. The method measures the similarity between targets using both the consistency between a target and its scene and the appearance similarity between targets, improving tracking performance on UAV video. The method comprises the following steps: (1) computing object-scene consistency with a ResNet-based twin network (Siamese network); (2) computing object-object appearance similarity with a ResNet-based twin network (Siamese network); (3) constructing a branch network consisting of one convolutional layer and two fully connected layers, whose input is the output features of the fifth convolutional layer of the two twin networks and whose output is an estimate of the offset of the detection result. The three networks are fused through multi-task learning so that they reinforce one another.
Description
Technical Field
The invention relates to the technical field of computer vision auxiliary devices, and in particular to an unmanned aerial vehicle video multi-target tracking method based on target scene consistency.
Background
Multi-target tracking (MOT) is a key step in many video analysis tasks, such as video event separation and behavior understanding. MOT aims to track the objects that appear in a video and to give the position of each object in every frame. Depending on how they use target detection results, existing MOT methods can be divided into two types: offline tracking and online tracking. Offline tracking considers the detection results over the entire video when associating detections, whereas online tracking considers only the detection results in the current frame together with the trajectory of each object obtained so far.
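To make the online setting concrete, the sketch below shows one association step: detections in the current frame are matched to existing trajectories using a precomputed track-by-detection similarity matrix. It is a minimal illustration under stated assumptions, not the patent's procedure: the greedy highest-score-first rule, the function name `greedy_associate`, and the `threshold` parameter are all hypothetical (Hungarian matching is a common alternative).

```python
def greedy_associate(similarity, threshold=0.5):
    """Greedily match tracks (rows) to current-frame detections (columns).

    similarity[i][j] is an assumed score that track i and detection j are the
    same object; higher means more likely. Pairs below `threshold` are never
    matched. Returns (matches, unmatched_tracks, unmatched_detections).
    """
    candidates = sorted(
        ((score, i, j)
         for i, row in enumerate(similarity)
         for j, score in enumerate(row)),
        reverse=True,  # best-scoring pairs first
    )
    used_tracks, used_dets, matches = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # every remaining candidate is below the threshold
        if i in used_tracks or j in used_dets:
            continue  # each track and each detection is matched at most once
        matches.append((i, j))
        used_tracks.add(i)
        used_dets.add(j)
    n_tracks = len(similarity)
    n_dets = len(similarity[0]) if similarity else 0
    unmatched_tracks = [i for i in range(n_tracks) if i not in used_tracks]
    unmatched_dets = [j for j in range(n_dets) if j not in used_dets]
    return sorted(matches), unmatched_tracks, unmatched_dets
```

In a typical online tracker, unmatched detections would start new trajectories and long-unmatched trajectories would be terminated; those policies are outside what this document describes.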
Existing methods generally combine multiple cues (e.g., appearance and motion) to comprehensively measure the similarity between objects in adjacent frames. In UAV video, however, targets are small, so their own information, especially appearance information, is only weakly discriminative, and existing multi-target tracking methods therefore perform poorly on UAV video.
Disclosure of Invention
To solve these technical problems, the invention provides an unmanned aerial vehicle video multi-target tracking method based on target scene consistency. The method jointly uses the consistency between a target and its scene and the appearance similarity between targets to measure the probability that two targets are the same object, improving multi-target tracking accuracy in UAV video.
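The disclosure does not state how the two cues are combined into a single same-object score. Purely as a hypothetical illustration, the sketch below fuses a target-scene consistency confidence and an appearance similarity with a convex weighted average; `fuse_scores`, `fused_matrix`, and the weight `alpha` are assumptions, not part of the patent.

```python
def fuse_scores(consistency, appearance, alpha=0.5):
    """Hypothetical fusion: convex combination of two confidences in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * consistency + (1.0 - alpha) * appearance


def fused_matrix(consistency_mat, appearance_mat, alpha=0.5):
    """Element-wise fusion of two track-by-detection score matrices of equal shape."""
    return [
        [fuse_scores(c, a, alpha) for c, a in zip(c_row, a_row)]
        for c_row, a_row in zip(consistency_mat, appearance_mat)
    ]
```

Weighting toward the consistency term would reflect the observation above that appearance alone is weakly discriminative for small UAV targets, but any such weight would have to be tuned.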
The invention discloses an unmanned aerial vehicle video multi-target tracking method based on target scene consistency, which comprises the following steps:
(1) Compute object-scene consistency with a ResNet-based twin network (Siamese network). The first five layers of the network are convolutional layers, followed by a fully connected layer and a softmax layer, which output the confidence that the object is consistent with the scene;
(2) Compute object-object appearance similarity with a ResNet-based twin network (Siamese network). The first five layers of the network are convolutional layers, followed by a fully connected layer and a softmax layer; the output is the appearance similarity of the objects, and the greater the similarity, the greater the probability that the two objects are the same object;
(3) Construct a branch network consisting of one convolutional layer and two fully connected layers. Its input is the output features of the fifth convolutional layer of the two twin networks, and its output is an estimate of the offset of the detection result;
the three networks are fused through multi-task learning so that they reinforce one another.
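To make the feature sizes in these steps concrete, the sketch below tabulates the output shape of each of the five convolutional stages of a standard ResNet-50 and the input shapes of the three task heads. It assumes a 224 × 224 input crop and global average pooling after the fifth stage, neither of which is stated in the disclosure; it also assumes the two features of each twin network are concatenated and that the four features are laid out on a 2 × 2 grid for the offset branch. Stage names follow the conv1 to conv5_x convention of the ResNet paper.

```python
# Output channels and cumulative stride of the five convolutional stages of a
# standard ResNet-50 (the stride-2 max-pool is counted in conv2_x).
RESNET50_STAGES = [
    ("conv1", 64, 2),
    ("conv2_x", 256, 4),
    ("conv3_x", 512, 8),
    ("conv4_x", 1024, 16),
    ("conv5_x", 2048, 32),
]


def stage_shapes(input_size=224):
    """Return {stage: (height, width, channels)} for a square input."""
    return {
        name: (input_size // stride, input_size // stride, channels)
        for name, channels, stride in RESNET50_STAGES
    }


def head_input_shapes(feature_dim=2048):
    """Assumed input shapes of the three heads, given pooled per-branch vectors."""
    return {
        "object_scene_consistency": (2 * feature_dim,),  # concat(f1, f2)
        "appearance_similarity": (2 * feature_dim,),     # concat(f3, f4)
        "detection_offset": (2, 2, feature_dim),         # f1..f4 on a 2x2 grid
    }
```

With a 224 × 224 input, the fourth stage yields a 14 × 14 × 1024 map and the fifth a 7 × 7 × 2048 map, which pooling reduces to the 2048-dimensional feature used by the heads.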
The unmanned aerial vehicle video multi-target tracking method based on target scene consistency according to the invention further comprises initializing the parameters of the first five convolutional layers of the twin networks with the parameters of the first five convolutional layers of a ResNet50 trained on ImageNet, while the parameters of the fully connected layers and of the remaining convolutional layers are initialized randomly.
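A minimal sketch of this initialization rule, assuming the torchvision naming convention for ResNet-50 parameters (`conv1`, `bn1`, `layer1` to `layer4`, `fc`), in which the five convolutional stages correspond to `conv1`/`bn1` plus `layer1` to `layer4`. The head parameter names used in the usage example (`consistency_fc`, `offset_conv`) are hypothetical, and the sketch only plans which parameters to copy rather than loading real weights.

```python
# Parameter-name prefixes of the five convolutional stages in torchvision's
# ResNet-50 naming (conv1 + bn1 form stage 1; layer1..layer4 are conv2_x..conv5_x).
BACKBONE_PREFIXES = ("conv1.", "bn1.", "layer1.", "layer2.", "layer3.", "layer4.")


def plan_initialization(model_param_names, pretrained_param_names):
    """Split model parameters into 'copy from ImageNet checkpoint' and 'random'.

    A parameter is copied only if it belongs to the five backbone stages AND the
    checkpoint contains it. Everything else (the new fully connected layers and
    the offset branch's convolutional layer) is left to random initialization.
    The checkpoint's ImageNet classifier ("fc.*") is never copied.
    """
    available = set(pretrained_param_names)
    copied, random_init = [], []
    for name in model_param_names:
        if name.startswith(BACKBONE_PREFIXES) and name in available:
            copied.append(name)
        else:
            random_init.append(name)
    return copied, random_init
```

Usage: with model parameters `["conv1.weight", "layer4.2.conv3.weight", "consistency_fc.weight", "offset_conv.weight"]` and a checkpoint holding `conv1.weight`, `layer4.2.conv3.weight`, and `fc.weight`, only the two backbone tensors are copied.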
The unmanned aerial vehicle video multi-target tracking method based on target scene consistency according to the invention further comprises using an annotated video sequence to obtain offset object positions as detection results by adding perturbations to the true positions of the objects, constructing a training data set from these detection results, and training the whole network.
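The perturbation step can be sketched as follows. The disclosure does not specify the noise model, so this sketch assumes center-format boxes `(cx, cy, w, h)`, Gaussian noise on the center with a standard deviation proportional to the box size, unchanged width and height (matching an offset of the form (Δx, Δy)), and an offset target defined as true position minus perturbed position. The names `jitter_box`, `build_samples`, and `sigma` are hypothetical.

```python
import random


def jitter_box(box, sigma=0.1, rng=None):
    """Return (detection, offset) for one ground-truth box (cx, cy, w, h).

    The center is shifted by Gaussian noise whose std is sigma * width (x) and
    sigma * height (y); the size is kept. The offset is true position minus
    perturbed position (an assumed sign convention).
    """
    rng = rng if rng is not None else random.Random()
    cx, cy, w, h = box
    det_cx = cx + rng.gauss(0.0, sigma * w)
    det_cy = cy + rng.gauss(0.0, sigma * h)
    return (det_cx, det_cy, w, h), (cx - det_cx, cy - det_cy)


def build_samples(gt_boxes, copies=3, sigma=0.1, seed=0):
    """Create `copies` perturbed detections per ground-truth box, reproducibly."""
    rng = random.Random(seed)
    return [jitter_box(box, sigma, rng) for box in gt_boxes for _ in range(copies)]
```

Seeding a dedicated `random.Random` keeps the synthetic training set reproducible without touching global random state.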
Compared with the prior art, the invention has the following beneficial effects: during multi-target tracking it uses not only the similarity between targets but also the consistency between targets and their scenes, so it can cope with the small size and weak appearance distinctiveness of targets in UAV video, achieving more accurate target association and higher tracking precision; in addition, a multi-task learning framework is designed in which several related tasks reinforce one another, improving target detection precision and thereby further improving tracking accuracy.
Drawings
FIG. 1 is a schematic structural view of the present invention;
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Example:
The backbone of each branch of the twin networks is based on the first five convolutional layers of ResNet50: the output features of the first four convolutional layers have dimension 14 × 14 × 1024, and a 2048-dimensional feature is obtained after the fifth convolutional layer. Let f1, f2, f3, f4 denote the features output by the first five convolutional layers of the four branch networks. The two 2048-dimensional features (f1, f2) from the first two networks are concatenated and fed into a fully connected layer to obtain the confidence of object-scene consistency [symbol missing]. The loss function here is defined as:
[equation missing]
where [symbol missing] is the ground-truth label of the object-scene consistency relationship, taking one value when the target is consistent with the scene and another when it is not.
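The symbols and the equation for this loss are not reproduced here. For a softmax classifier trained against a binary ground-truth label, the usual loss is cross-entropy, so the sketch below assumes that choice. It shows the consistency head on toy dimensions: concatenate the two branch features, apply one fully connected layer, take a two-way softmax, and score it against the label. The class order `[not consistent, consistent]` and all function names are assumptions.

```python
import math


def linear(x, weights, bias):
    """Fully connected layer: `weights` has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_head(f1, f2, weights, bias):
    """Concatenate the two branch features; return [p_not_consistent, p_consistent]."""
    return softmax(linear(f1 + f2, weights, bias))


def cross_entropy(probs, label):
    """Assumed loss: negative log-probability of the ground-truth class (0 or 1)."""
    return -math.log(probs[label])
```

In the described network, `f1` and `f2` would each be 2048-dimensional, so the fully connected layer would take a 4096-dimensional input.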
The features f3 and f4 obtained from the last two networks are fed into a fully connected layer and a softmax layer to obtain the appearance similarity measure between targets [symbol missing]. The loss function here is defined as:
[equation missing]
where [symbol missing] is the ground-truth label of the appearance relationship between the two targets, taking one value when the two objects are similar in appearance, i.e., are the same object, and another when the compared objects are not the same object.
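Training this head needs labeled pairs. A minimal sketch of how such pairs could be drawn from an annotated sequence, assuming each annotated object in frames t and t+1 carries a track identity: every cross-frame pair gets label 1 if the identities match and 0 otherwise. The function name `make_pairs` and the `(track_id, crop)` input format are hypothetical.

```python
from itertools import product


def make_pairs(frame_t, frame_t1):
    """Build labeled appearance pairs between two consecutive frames.

    frame_t, frame_t1: lists of (track_id, crop) taken from annotated data.
    Returns a list of (crop_t, crop_t1, label), where label is 1 if both crops
    show the same identity and 0 otherwise.
    """
    return [
        (crop_a, crop_b, int(id_a == id_b))
        for (id_a, crop_a), (id_b, crop_b) in product(frame_t, frame_t1)
    ]
```

Because negatives grow quadratically while positives grow linearly, some form of negative sampling or class weighting would likely be needed; the disclosure does not say which.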
The four 2048-dimensional features (f1, f2, f3, f4) from the four networks are arranged side by side to form a 2 × 2 × 2048 feature, which is fed into a convolutional layer and two fully connected layers to obtain the estimate of the detection-result offset [symbol missing]. The loss function here is defined as:
[equation missing]
Here Δp = (Δx, Δy) is the offset between the detection result and the true position of the object.
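The offset branch can be sketched on plain lists. This reads the garbled feature dimension as 2 × 2 × 2048, which is an interpretation, and assumes the layout `[[f1, f2], [f3, f4]]`. Because the loss equation is not reproduced, the sketch substitutes smooth L1, a common choice for box-offset regression; it is not confirmed as the patent's loss. All function names are hypothetical.

```python
def arrange_2x2(f1, f2, f3, f4):
    """Place four equal-length feature vectors on a 2 x 2 grid: [row][col][channel]."""
    if not (len(f1) == len(f2) == len(f3) == len(f4)):
        raise ValueError("all four features must have the same length")
    return [[f1, f2], [f3, f4]]


def grid_shape(grid):
    """(rows, cols, channels) of a grid built by arrange_2x2."""
    return (len(grid), len(grid[0]), len(grid[0][0]))


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 summed over offset components (dx, dy): quadratic below beta, linear above."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total
```

The quadratic region keeps gradients small for nearly correct offsets, while the linear region limits the influence of large perturbations.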
During training, the parameters of the first five convolutional layers are initialized with the parameters of a ResNet50 pre-trained on ImageNet, and all remaining parameters are initialized randomly.
Constructing the training data set: perturbations are added to the true positions of the targets to obtain offset results, which are used as the detection results of the targets to build the training data set. The optimal network parameters are obtained by minimizing the following loss function:
[equation missing]
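The combined objective is not reproduced in this text. A common form for multi-task learning is a weighted sum of the per-task losses, which the sketch below assumes (equal weights by default; the weights are hypothetical). It also illustrates why shared layers let the tasks influence one another: by linearity, the gradient of the summed loss with respect to a parameter shared by all three heads is the weighted sum of the three per-task gradients.

```python
def total_loss(l_consistency, l_appearance, l_offset, weights=(1.0, 1.0, 1.0)):
    """Assumed multi-task objective: weighted sum of the three task losses."""
    w_c, w_a, w_o = weights
    return w_c * l_consistency + w_a * l_appearance + w_o * l_offset


def shared_param_grad(task_grads, weights=(1.0, 1.0, 1.0)):
    """Gradient of the weighted-sum loss w.r.t. a parameter shared by all heads:
    the weighted sum of the per-task gradients (consistency, appearance, offset)."""
    return sum(w * g for w, g in zip(weights, task_grads))
```

For example, per-task gradients of 0.1, -0.3, and 0.2 on a shared parameter cancel to roughly zero, so that parameter is pulled toward values acceptable to all three tasks rather than to any single one.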
A target-scene consistency measurement network is designed on the basis of a twin network. It comprises two ResNet-based networks that share the parameters of their convolutional layers and are used to extract features and compute the scene consistency of the object in frame t and the object in frame t+1.
A twin network for computing target appearance similarity is constructed from two ResNet-based networks to compute the appearance similarity between the object in frame t and the object in frame t+1; these two networks likewise share their convolutional-layer parameters.
In addition, a detection-result offset estimation network is introduced to adjust the target detection results. This network consists of one convolutional layer and two fully connected layers, and its input is the feature matrix extracted by the two twin networks. By introducing this branch network, the three task networks (target-scene consistency estimation, target appearance similarity estimation, and target detection-result offset estimation) can share part of their parameters and reinforce one another.
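Two mechanisms described here can be illustrated with toy stand-ins. First, parameter sharing in a twin network means both branches call the same encoder object, so any update to its weights affects both branches at once; a single linear map stands in for the shared ResNet convolutional trunk. Second, the estimated offset is used to adjust a detection; the sketch assumes center-format boxes `(cx, cy, w, h)` and an offset defined as true position minus detected position, so adding it moves the box toward the true position. `SharedEncoder`, `TwinNetwork`, and `refine_box` are hypothetical names.

```python
class SharedEncoder:
    """Toy stand-in for the shared convolutional trunk: a single linear map."""

    def __init__(self, weights):
        self.weights = weights  # list of rows, one row per output feature

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


class TwinNetwork:
    """Both branches call the *same* encoder object, so they share parameters."""

    def __init__(self, encoder):
        self.encoder = encoder

    def forward(self, input_t, input_t1):
        # Branch for the frame-t object and branch for the frame-(t+1) object.
        return self.encoder(input_t), self.encoder(input_t1)


def refine_box(detection, offset):
    """Shift a detection (cx, cy, w, h) by an estimated offset (dx, dy).

    Assumes offset = true position - detected position; width and height
    are left unchanged because the offset has only (dx, dy) components.
    """
    cx, cy, w, h = detection
    dx, dy = offset
    return (cx + dx, cy + dy, w, h)
```

In a real implementation the same effect comes from reusing one module instance for both branches, so each backward pass accumulates gradients from both inputs into a single set of weights.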
In the unmanned aerial vehicle video multi-target tracking method based on target scene consistency, the installation, connection, or arrangement of all components follows common mechanical practice, and the specific structures, models, and coefficient indexes of all components are conventional technologies; as long as the beneficial effects can be achieved, they are not described further.
In the description of the unmanned aerial vehicle video multi-target tracking method based on target scene consistency, unless otherwise specified, orientation words such as 'up, down, left, right, front, back, inside, outside, vertical and horizontal' only indicate orientations in the conventional state of use or are common designations understood by those skilled in the art, and should not be regarded as limiting; the ordinal terms 'first', 'second', and 'third' do not denote specific numbers or orders and are used only to distinguish names; and the terms 'comprise', 'contain', or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and variations without departing from the technical principle of the invention, and these improvements and variations should also be regarded as falling within the protection scope of the invention.
Claims (3)
1. An unmanned aerial vehicle video multi-target tracking method based on target scene consistency is characterized by comprising the following steps:
(1) computing object-scene consistency with a ResNet-based twin network (Siamese network), wherein the first five layers of the network are convolutional layers, followed by a fully connected layer and a softmax layer, which output the confidence that the object is consistent with the scene;
(2) computing object-object appearance similarity with a ResNet-based twin network (Siamese network), wherein the first five layers of the network are convolutional layers, followed by a fully connected layer and a softmax layer; the output is the appearance similarity of the objects, and the greater the similarity, the greater the probability that the two objects are the same object;
(3) constructing a branch network consisting of one convolutional layer and two fully connected layers, whose input is the output features of the fifth convolutional layer of the two twin networks and whose output is an estimate of the offset of the detection result;
wherein the three networks are fused through multi-task learning so that they reinforce one another.
2. The unmanned aerial vehicle video multi-target tracking method based on target scene consistency of claim 1, further comprising initializing the parameters of the first five convolutional layers of the twin networks with the parameters of the first five convolutional layers of a ResNet50 trained on ImageNet, wherein the parameters of the fully connected layers and of the remaining convolutional layers are initialized randomly.
3. The unmanned aerial vehicle video multi-target tracking method based on target scene consistency of claim 2, further comprising: using an annotated video sequence, obtaining offset object positions as detection results by adding perturbations to the true positions of the objects; constructing a training data set from these detection results; and training the whole network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010015437.8A CN111275736A (en) | 2020-01-07 | 2020-01-07 | Unmanned aerial vehicle video multi-target tracking method based on target scene consistency |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010015437.8A CN111275736A (en) | 2020-01-07 | 2020-01-07 | Unmanned aerial vehicle video multi-target tracking method based on target scene consistency |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111275736A true CN111275736A (en) | 2020-06-12 |
Family
ID=70998816
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010015437.8A Pending CN111275736A (en) | 2020-01-07 | 2020-01-07 | Unmanned aerial vehicle video multi-target tracking method based on target scene consistency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111275736A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108846358A (en) * | 2018-06-13 | 2018-11-20 | 浙江工业大学 | Target tracking method with feature fusion based on Siamese networks |
CN108898620A (en) * | 2018-06-14 | 2018-11-27 | 厦门大学 | Target tracking method based on multiple Siamese neural networks and a region neural network |
CN109446889A (en) * | 2018-09-10 | 2019-03-08 | 北京飞搜科技有限公司 | Object tracking method and device based on a Siamese matching network |
CN110298404A (en) * | 2019-07-02 | 2019-10-01 | 西南交通大学 | Target tracking method based on triplet Siamese hash network learning |
CN110580713A (en) * | 2019-08-30 | 2019-12-17 | 武汉大学 | Satellite video target tracking method based on a fully convolutional Siamese network and trajectory prediction |
Non-Patent Citations (4)
Title |
---|
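HONGYANG YU ET AL.: "Online multiple object tracking via exchanging object context", NEUROCOMPUTING * |
JEANY SON ET AL.: "Multi-object Tracking with Quadruplet Convolutional Neural Networks", IEEE XPLORE * |
周士杰 ET AL.: "Target tracking algorithm based on dual Siamese networks and correlation filters", Proceedings of the 22nd Annual Conference on Computer Engineering and Technology and the 8th Microprocessor Technology Forum * |
徐怀宇 ET AL.: "A survey of UAV target tracking", Network New Media Technology * |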
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112561904A (en) * | 2020-12-24 | 2021-03-26 | 凌云光技术股份有限公司 | Method and system for reducing the false detection rate of AOI (automated optical inspection) defects on display screen appearance |
CN116186907A (en) * | 2023-05-04 | 2023-05-30 | 中国人民解放军海军工程大学 | Method, system and medium for analyzing navigable state based on state of marine subsystem |
CN116186907B (en) * | 2023-05-04 | 2023-09-15 | 中国人民解放军海军工程大学 | Method, system and medium for analyzing navigable state based on state of marine subsystem |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109508654B (en) | Face analysis method and system fusing multitask and multi-scale convolutional neural network | |
CN107945204B (en) | Pixel-level image matting method based on generation countermeasure network | |
CN113034548B (en) | Multi-target tracking method and system suitable for embedded terminal | |
CN109584213B (en) | Multi-target number selection tracking method | |
CN111639551A (en) | Online multi-target tracking method and system based on twin network and long-short term clues | |
CN111161315B (en) | Multi-target tracking method and system based on graph neural network | |
Cao et al. | Monocular depth estimation with augmented ordinal depth relationships | |
EP1530157A1 (en) | Image matching system using 3-dimensional object model, image matching method, and image matching program | |
CN114972418A (en) | Maneuvering multi-target tracking method based on combination of nuclear adaptive filtering and YOLOX detection | |
CN103679674A (en) | Method and system for splicing images of unmanned aircrafts in real time | |
CN111275736A (en) | Unmanned aerial vehicle video multi-target tracking method based on target scene consistency | |
CN109389156B (en) | Training method and device of image positioning model and image positioning method | |
CN111832484A (en) | Loop detection method based on convolution perception hash algorithm | |
WO2019167784A1 (en) | Position specifying device, position specifying method, and computer program | |
CN111353448A (en) | Pedestrian multi-target tracking method based on relevance clustering and space-time constraint | |
CN114417048A (en) | Unmanned aerial vehicle positioning method without positioning equipment based on image semantic guidance | |
CN112507859B (en) | Visual tracking method for mobile robot | |
CN111402429B (en) | Scale reduction and three-dimensional reconstruction method, system, storage medium and equipment | |
CN116883458A (en) | Transformer-based multi-target tracking system fusing motion characteristics with observation as center | |
Li et al. | Tvg-reid: Transformer-based vehicle-graph re-identification | |
CN115601841A (en) | Human body abnormal behavior detection method combining appearance texture and motion skeleton | |
CN116663384A (en) | Target track prediction method under battlefield task planning background | |
CN115147385A (en) | Intelligent detection and judgment method for repeated damage in aviation hole exploration video | |
CN114998611A (en) | Target contour detection method based on structure fusion |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200612 |