CN111275736A - Unmanned aerial vehicle video multi-target tracking method based on target scene consistency - Google Patents

Unmanned aerial vehicle video multi-target tracking method based on target scene consistency

Info

Publication number
CN111275736A
CN111275736A
Authority
CN
China
Prior art keywords
network
target
layers
consistency
aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010015437.8A
Other languages
Chinese (zh)
Inventor
李国荣
黄庆明
苏荔
于洪洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Chinese Academy of Sciences
Original Assignee
University of Chinese Academy of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Chinese Academy of Sciences
Priority to CN202010015437.8A
Publication of CN111275736A
Current legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]

Abstract

The invention relates to the technical field of computer vision auxiliary devices, in particular to an unmanned aerial vehicle video multi-target tracking method based on target-scene consistency, which measures the similarity between targets by exploiting the consistency between scene and target appearance, thereby improving tracking performance on unmanned aerial vehicle video. The method comprises the following steps: (1) calculating target-scene consistency using a ResNet-based twin network (Siamese network); (2) calculating target-target appearance similarity using a second ResNet-based twin network (Siamese network); (3) constructing a branch network comprising one convolutional layer and two fully-connected layers, whose input is the output features of the fifth convolutional layer of the two twin networks and whose output is an estimate of the offset of the detection result. The three networks are fused by a multi-task learning method so that they promote one another.

Description

Unmanned aerial vehicle video multi-target tracking method based on target scene consistency
Technical Field
The invention relates to the technical field of computer vision auxiliary devices, in particular to an unmanned aerial vehicle video multi-target tracking method based on target-scene consistency.
Background
Multi-target tracking (MOT) is a key step in many video analysis tasks, such as video event separation and behavior understanding. MOT aims to track the objects appearing in a video, giving the position of each object in every frame. Existing MOT methods can be divided into two categories according to how they use target detection results: offline tracking and online tracking. Offline tracking considers the detection results over the whole video when associating detections, whereas online tracking considers only the detections on the current frame and the motion trajectories obtained so far.
Existing methods generally use multiple cues (e.g., appearance, motion) to comprehensively measure the similarity between objects in adjacent frames. In unmanned aerial vehicle video, however, targets are small, so the information they carry, especially appearance information, is weakly discriminative, and existing multi-target tracking methods therefore perform poorly on such video.
Disclosure of Invention
To solve the above technical problems, the invention provides an unmanned aerial vehicle video multi-target tracking method based on target-scene consistency, which comprehensively measures the probability that two targets are the same object by exploiting both the consistency between a target and its scene and the appearance similarity between targets, thereby improving multi-target tracking accuracy in unmanned aerial vehicle video.
The invention discloses an unmanned aerial vehicle video multi-target tracking method based on target scene consistency, which comprises the following steps:
(1) calculating target-scene consistency using a ResNet-based twin network (Siamese network); the first five layers of the network are convolutional layers, followed by a fully-connected layer and a softmax, which output the confidence that a target is consistent with a scene;
(2) calculating target-target appearance similarity using a second ResNet-based twin network (Siamese network); its first five layers are likewise convolutional layers followed by a fully-connected layer and a softmax, whose output is the appearance similarity of two targets; the larger the similarity, the larger the probability that the two targets are the same object;
(3) constructing a branch network comprising one convolutional layer and two fully-connected layers, whose input is the output features of the fifth convolutional layer of the two twin networks and whose output is an estimate of the offset of the detection result;
the three networks are fused by a multi-task learning method so that they promote one another.
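As a hedged illustration, the following minimal PyTorch sketch assembles the three branches just described, assuming the layer sizes given in the embodiment below (a ResNet50 trunk producing 2048-dimensional features, and a 2 x 2 x 2048 arrangement of the four features for the offset branch); all class names, function names and head sizes are illustrative assumptions, not taken from the patent.

    import torch
    import torch.nn as nn
    import torchvision

    class Trunk(nn.Module):
        """First five convolutional stages of ResNet50; shared within one twin pair."""
        def __init__(self):
            super().__init__()
            r = torchvision.models.resnet50(weights="IMAGENET1K_V1")  # pretrained init
            self.features = nn.Sequential(
                r.conv1, r.bn1, r.relu, r.maxpool,
                r.layer1, r.layer2, r.layer3, r.layer4, r.avgpool)

        def forward(self, x):                   # x: (B, 3, H, W) image patch
            return self.features(x).flatten(1)  # (B, 2048) feature

    class MultiTaskTracker(nn.Module):
        def __init__(self):
            super().__init__()
            self.trunk_sc = Trunk()   # twin pair for target-scene consistency
            self.trunk_app = Trunk()  # twin pair for target appearance similarity
            self.head_sc = nn.Linear(2 * 2048, 2)   # softmax -> consistency confidence
            self.head_app = nn.Linear(2 * 2048, 2)  # softmax -> appearance similarity
            # offset branch: one convolutional layer and two fully-connected layers
            self.offset_conv = nn.Conv2d(2048, 256, kernel_size=2)
            self.offset_fc = nn.Sequential(
                nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))  # (dx, dy)

        def forward(self, scene_t, obj_next_a, obj_cur, obj_next_b):
            # f1, f2: scene patch at frame t and object patch at frame t+1
            f1, f2 = self.trunk_sc(scene_t), self.trunk_sc(obj_next_a)
            # f3, f4: object patches at frame t and frame t+1
            f3, f4 = self.trunk_app(obj_cur), self.trunk_app(obj_next_b)
            consistency = self.head_sc(torch.cat([f1, f2], dim=1)).softmax(dim=1)
            similarity = self.head_app(torch.cat([f3, f4], dim=1)).softmax(dim=1)
            # arrange the four 2048-D features as a 2 x 2 grid for the conv layer
            grid = torch.stack([f1, f2, f3, f4], dim=1)           # (B, 4, 2048)
            grid = grid.view(-1, 2, 2, 2048).permute(0, 3, 1, 2)  # (B, 2048, 2, 2)
            offset = self.offset_fc(self.offset_conv(grid).flatten(1))
            return consistency, similarity, offset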
The unmanned aerial vehicle video multi-target tracking method based on target-scene consistency of the invention further comprises the step of initializing the parameters of the first five convolutional layers of each twin network with the parameters of the first five convolutional layers of a ResNet50 trained on ImageNet, while the parameters of the fully-connected layers and of the remaining convolutional layers are initialized randomly.
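A minimal sketch of this initialization, assuming the illustrative MultiTaskTracker from the sketch above: the trunks there already load ImageNet-pretrained ResNet50 weights in their constructor, so only the task heads and the offset branch are reset here; the std=0.01 scale is an assumption.

    import torch.nn as nn

    def init_heads(model):
        # Randomly initialize the fully-connected heads and the offset branch;
        # the pretrained trunk parameters are left untouched.
        for module in [model.head_sc, model.head_app,
                       model.offset_conv, *model.offset_fc]:
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                nn.init.normal_(module.weight, std=0.01)
                nn.init.zeros_(module.bias)
        return model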
The unmanned aerial vehicle video multi-target tracking method based on target-scene consistency of the invention further comprises the step of using a labeled video sequence and adding perturbation to the real position of each object to obtain a biased object position as a detection result; a training data set is constructed from these detection results, and the whole network is trained on it.
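A minimal sketch of this data construction, assuming a perturbation proportional to box size (the 10% scale and the uniform noise are assumptions): each perturbed box plays the role of a detection, and the offset that undoes the perturbation becomes the regression target.

    import random

    def perturb_box(x, y, w, h, scale=0.1):
        """Return a simulated biased detection and the offset to regress."""
        dx = random.uniform(-scale, scale) * w
        dy = random.uniform(-scale, scale) * h
        detected = (x + dx, y + dy, w, h)  # biased position used as detection
        target_offset = (-dx, -dy)         # ground-truth offset (dx, dy)
        return detected, target_offset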
Compared with the prior art, the invention has the following beneficial effects: during multi-target tracking, both the similarity between targets and the consistency between scenes and targets are exploited, so the method can cope with the small target size and weak appearance discriminability in unmanned aerial vehicle video, achieving more accurate target association and higher tracking precision; moreover, a multi-task learning framework is designed in which several related tasks promote one another, improving target detection accuracy and thereby tracking accuracy.
Drawings
FIG. 1 is a schematic structural view of the present invention;
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Example:
The trunk of each twin branch network is based on the first five convolutional layers of ResNet50; the output of the first four convolutional stages has dimension 14 × 14 × 1024, and the fifth stage yields a 2048-dimensional feature. Let f1, f2, f3 and f4 denote the 2048-dimensional features output by the fifth convolutional layer of the four branch networks. For the first two networks, the two 2048-dimensional features (f1, f2) are stitched together and fed into the fully-connected layer and softmax to obtain the confidence \hat{c} of target-scene consistency. The loss function here is defined as

L_c = -[ c \log \hat{c} + (1 - c) \log(1 - \hat{c}) ],

where c is the true value of the target-scene consistency relationship: c = 1 indicates that the target is consistent with the scene, and c = 0 indicates that the target is not consistent with the scene.
For the latter two networks, the obtained features f3, f4 are input to the fully-connected layer and the softmax layer to obtain the appearance similarity measure \hat{s} between targets. The loss function here is defined as

L_s = -[ s \log \hat{s} + (1 - s) \log(1 - \hat{s}) ],

where s is the true value of the target-target appearance relationship: s = 1 indicates that the appearances of the two targets are similar, i.e. they are the same object, and s = 0 indicates that the compared targets are not the same object.
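A hedged sketch of the two classification losses above as binary cross-entropies over the softmax confidences; taking column 1 as the positive class is an assumption.

    import torch.nn.functional as F

    def classification_losses(consistency, similarity, c_true, s_true):
        # consistency, similarity: (B, 2) softmax outputs of the two twin heads
        c_hat = consistency[:, 1]  # confidence that target and scene are consistent
        s_hat = similarity[:, 1]   # confidence that the two targets are the same object
        loss_c = F.binary_cross_entropy(c_hat, c_true.float())
        loss_s = F.binary_cross_entropy(s_hat, s_true.float())
        return loss_c, loss_s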
The four 2048-dimensional features from the four networks (f1, f2, f3, f4) are arranged side by side into a 2 × 2 × 2048-dimensional feature, which is input to the convolutional layer and the two fully-connected layers to obtain the estimate \hat{\Delta p} = (\hat{\Delta x}, \hat{\Delta y}) of the offset of the detection result. The loss function here is defined as

L_{\Delta} = \| \Delta p - \hat{\Delta p} \|_2^2,

where \Delta p = (\Delta x, \Delta y) is the offset between the detection result and the true position of the object.
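A one-line sketch of the offset regression loss in the squared-error form given above:

    def offset_loss(offset_pred, offset_true):
        # offset_pred, offset_true: (B, 2) tensors holding (dx, dy)
        return ((offset_pred - offset_true) ** 2).sum(dim=1).mean()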
During training, the parameters of the first five convolutional layers are initialized with the parameters of a ResNet50 pre-trained on ImageNet, and all remaining parameters are initialized randomly;
Constructing the training data set: perturbation is added on the basis of the true position of each target to obtain a biased result, which is used as the detection result of the target, and the training data set is constructed from these results. The optimal network parameters are obtained by minimizing the following loss function:

L(\omega) = L_c + L_s + L_{\Delta} + \lambda \| \omega \|_2^2,

where \omega denotes all parameters of the network and \| \omega \|_2 represents the L2 norm of \omega.
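A minimal sketch of the joint objective, assuming uniform task weights and a regularization coefficient lam (both assumptions); in practice the L2 term is often realized instead through the optimizer's weight_decay.

    def total_loss(model, loss_c, loss_s, loss_offset, lam=1e-4):
        # L2 regularization over all network parameters (omega)
        l2 = sum((p ** 2).sum() for p in model.parameters())
        return loss_c + loss_s + loss_offset + lam * l2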
A target-scene consistency measurement network is designed based on a twin network; it comprises two ResNet-based networks that share convolutional-layer parameters and is used to extract features and compute the consistency between the scene of an object in frame t and an object in frame t+1.
A twin network for computing target appearance similarity is constructed from two further ResNet-based networks, computing the appearance similarity between an object in frame t and an object in frame t+1; these two networks likewise share convolutional-layer parameters.
Meanwhile, a further detection-result offset estimation network is introduced to adjust the target detection results. This network consists of one convolutional layer and two fully-connected layers, and its input is the feature matrix extracted by the two twin networks. By introducing this branch network, the three task networks (target-scene consistency estimation, target appearance similarity estimation, and detection-result offset estimation) share part of their parameters and promote one another.
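The patent text stops short of the association step; as a hedged usage sketch, the two confidences can be combined into an affinity between the tracks of frame t and the detections of frame t+1 and matched with the Hungarian algorithm. Equal cue weights and the 0.5 acceptance threshold are assumptions.

    from scipy.optimize import linear_sum_assignment

    def associate(consistency, similarity, threshold=0.5):
        # consistency, similarity: (num_tracks, num_detections) score matrices
        affinity = 0.5 * consistency + 0.5 * similarity
        rows, cols = linear_sum_assignment(-affinity)  # maximize total affinity
        return [(r, c) for r, c in zip(rows, cols)
                if affinity[r, c] >= threshold]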
In the unmanned aerial vehicle video multi-target tracking method based on target-scene consistency, the mounting, connection and arrangement of all components adopt conventional mechanical means, and the specific structures, models and coefficient indexes of the components are known technology; since the beneficial effects can be achieved thereby, they are not described further.
In the unmanned aerial vehicle video multi-target tracking method based on target-scene consistency of the invention, orientation words such as "up, down, left, right, front, back, inside, outside, vertical and horizontal" merely denote the orientation of a term in its conventional use state, or are common names understood by a person skilled in the art, and should not be regarded as limiting; numerical terms such as "first", "second" and "third" do not denote a specific quantity or order and are used only to distinguish names; and the terms "comprise", "contain" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or apparatus comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or apparatus.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (3)

1. An unmanned aerial vehicle video multi-target tracking method based on target scene consistency is characterized by comprising the following steps:
(1) calculating target-scene consistency using a ResNet-based twin network (Siamese network); the first five layers of the network are convolutional layers, followed by a fully-connected layer and a softmax, which output the confidence that a target is consistent with a scene;
(2) calculating target-target appearance similarity using a second ResNet-based twin network (Siamese network); its first five layers are likewise convolutional layers followed by a fully-connected layer and a softmax, whose output is the appearance similarity of two targets; the larger the similarity, the larger the probability that the two targets are the same object;
(3) constructing a branch network comprising one convolutional layer and two fully-connected layers, whose input is the output features of the fifth convolutional layer of the two twin networks and whose output is an estimate of the offset of the detection result;
the three networks being fused by a multi-task learning method so that they promote one another.
2. The unmanned aerial vehicle video multi-target tracking method based on target scene consistency of claim 1, further comprising initializing the parameters of the first five convolutional layers of the twin networks with the parameters of the first five convolutional layers of a ResNet50 trained on ImageNet, the parameters of the fully-connected layers and of the remaining convolutional layers being initialized randomly.
3. The unmanned aerial vehicle video multi-target tracking method based on target scene consistency of claim 2, further comprising using a labeled video sequence and adding perturbation to the real position of each object to obtain a biased object position as a detection result, constructing a training data set from these detection results, and training the whole network on it.
CN202010015437.8A 2020-01-07 2020-01-07 Unmanned aerial vehicle video multi-target tracking method based on target scene consistency Pending CN111275736A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010015437.8A CN111275736A (en) 2020-01-07 2020-01-07 Unmanned aerial vehicle video multi-target tracking method based on target scene consistency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010015437.8A CN111275736A (en) 2020-01-07 2020-01-07 Unmanned aerial vehicle video multi-target tracking method based on target scene consistency

Publications (1)

Publication Number Publication Date
CN111275736A (en) 2020-06-12

Family

ID=70998816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010015437.8A Pending CN111275736A (en) 2020-01-07 2020-01-07 Unmanned aerial vehicle video multi-target tracking method based on target scene consistency

Country Status (1)

Country Link
CN (1) CN111275736A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112561904A (en) * 2020-12-24 2021-03-26 凌云光技术股份有限公司 Method and system for reducing false detection rate of AOI (automated optical inspection) defects on display screen appearance
CN116186907A (en) * 2023-05-04 2023-05-30 中国人民解放军海军工程大学 Method, system and medium for analyzing navigable state based on state of marine subsystem

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846358A (en) * 2018-06-13 2018-11-20 浙江工业大学 A kind of method for tracking target carrying out Fusion Features based on twin network
CN108898620A (en) * 2018-06-14 2018-11-27 厦门大学 Method for tracking target based on multiple twin neural network and regional nerve network
CN109446889A (en) * 2018-09-10 2019-03-08 北京飞搜科技有限公司 Object tracking method and device based on twin matching network
CN110298404A (en) * 2019-07-02 2019-10-01 西南交通大学 A kind of method for tracking target based on triple twin Hash e-learnings
CN110580713A (en) * 2019-08-30 2019-12-17 武汉大学 Satellite video target tracking method based on full convolution twin network and track prediction

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846358A (en) * 2018-06-13 2018-11-20 浙江工业大学 A kind of method for tracking target carrying out Fusion Features based on twin network
CN108898620A (en) * 2018-06-14 2018-11-27 厦门大学 Method for tracking target based on multiple twin neural network and regional nerve network
CN109446889A (en) * 2018-09-10 2019-03-08 北京飞搜科技有限公司 Object tracking method and device based on twin matching network
CN110298404A (en) * 2019-07-02 2019-10-01 西南交通大学 A kind of method for tracking target based on triple twin Hash e-learnings
CN110580713A (en) * 2019-08-30 2019-12-17 武汉大学 Satellite video target tracking method based on full convolution twin network and track prediction

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Hongyang Yu et al.: "Online multiple object tracking via exchanging object context", Neurocomputing *
Jeany Son et al.: "Multi-object Tracking with Quadruplet Convolutional Neural Networks", IEEE Xplore *
周士杰 et al.: "Target tracking algorithm based on dual Siamese networks and correlation filters" (基于双重孪生网络与相关滤波器的目标跟踪算法), Proceedings of the 22nd Annual Conference on Computer Engineering and Technology and the 8th Microprocessor Technology Forum *
徐怀宇 et al.: "A survey of UAV target tracking" (无人机目标跟踪综述), Network New Media Technology (《网络新媒体技术》) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112561904A (en) * 2020-12-24 2021-03-26 凌云光技术股份有限公司 Method and system for reducing false detection rate of AOI (automated optical inspection) defects on display screen appearance
CN116186907A (en) * 2023-05-04 2023-05-30 中国人民解放军海军工程大学 Method, system and medium for analyzing navigable state based on state of marine subsystem
CN116186907B (en) * 2023-05-04 2023-09-15 中国人民解放军海军工程大学 Method, system and medium for analyzing navigable state based on state of marine subsystem

Similar Documents

Publication Publication Date Title
CN109508654B (en) Face analysis method and system fusing multitask and multi-scale convolutional neural network
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN113034548B (en) Multi-target tracking method and system suitable for embedded terminal
CN109584213B (en) Multi-target number selection tracking method
CN111639551A (en) Online multi-target tracking method and system based on twin network and long-short term clues
CN111161315B (en) Multi-target tracking method and system based on graph neural network
Cao et al. Monocular depth estimation with augmented ordinal depth relationships
EP1530157A1 (en) Image matching system using 3-dimensional object model, image matching method, and image matching program
CN114972418A (en) Maneuvering multi-target tracking method based on combination of nuclear adaptive filtering and YOLOX detection
CN103679674A (en) Method and system for splicing images of unmanned aircrafts in real time
CN111275736A (en) Unmanned aerial vehicle video multi-target tracking method based on target scene consistency
CN109389156B (en) Training method and device of image positioning model and image positioning method
CN111832484A (en) Loop detection method based on convolution perception hash algorithm
WO2019167784A1 (en) Position specifying device, position specifying method, and computer program
CN111353448A (en) Pedestrian multi-target tracking method based on relevance clustering and space-time constraint
CN114417048A (en) Unmanned aerial vehicle positioning method without positioning equipment based on image semantic guidance
CN112507859B (en) Visual tracking method for mobile robot
CN111402429B (en) Scale reduction and three-dimensional reconstruction method, system, storage medium and equipment
CN116883458A (en) Transformer-based multi-target tracking system fusing motion characteristics with observation as center
Li et al. Tvg-reid: Transformer-based vehicle-graph re-identification
CN115601841A (en) Human body abnormal behavior detection method combining appearance texture and motion skeleton
CN116663384A (en) Target track prediction method under battlefield task planning background
CN115147385A (en) Intelligent detection and judgment method for repeated damage in aviation hole exploration video
CN114998611A (en) Target contour detection method based on structure fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200612