CN111899283B - Video target tracking method

Video target tracking method

Info

Publication number: CN111899283B
Application number: CN202010753190.XA
Authority: CN (China)
Prior art keywords: convolution, correlation, target, layer, tracking
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN111899283A
Inventors: 孟宇, 邓在旭, 沈伾伾, 焦志宝, 许焱
Current Assignee: University of Science and Technology Beijing USTB (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: University of Science and Technology Beijing USTB
Priority date: 2020-07-30 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2020-07-30
Publication of CN111899283A: 2020-11-06
Grant and publication of CN111899283B: 2023-10-17
Application filed by University of Science and Technology Beijing USTB

Classifications

    • G06T 7/246: Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/047: Neural networks; probabilistic or stochastic networks
    • G06N 3/08: Neural networks; learning methods
    • G06T 2207/10016: Image acquisition modality; video; image sequence
    • G06T 2207/20081: Special algorithmic details; training; learning
    • G06T 2207/20084: Special algorithmic details; artificial neural networks [ANN]

Abstract

The application provides a video target tracking method, belonging to the field of computer vision. The method comprises the following steps: inputting a target image and a search image simultaneously into a hierarchical correlation Siamese network for feature extraction to obtain convolution features extracted by different convolution layers; performing correlation measurement on the convolution features of the target image and the search image extracted by the same convolution layer, and stacking the correlations of the layers hierarchically to generate a hierarchical correlation; taking the position in the search image with the highest tracking response in the hierarchical correlation as the center position of the tracking target in the search image; and determining the position of the tracking target in the search image according to the center position of the tracking target in the search image and independent scale factors. With the method and device, an arbitrary target can be tracked accurately.

Description

Video target tracking method
Technical Field
The application relates to the field of computer vision, in particular to a video target tracking method.
Background
In recent years, with the improvement of living standards and great changes in the automobile manufacturing industry, the number of automobiles has increased rapidly, while available road resources have become increasingly scarce. Human reaction and perception capabilities are limited, and misjudgment of information fed back from the environment has kept the traffic accident rate climbing in recent years. According to incomplete statistics, traffic accidents worldwide have caused more than 30 million deaths, exceeding the death toll of major wars. With the revolutionary changes that internet technology has brought to the automobile manufacturing industry, unmanned vehicles show rapid development potential in today's society; their main purpose is to free people from complex driving operations and to improve the safety of vehicles traveling on roads.
However, putting unmanned vehicles into practice still presents certain difficulties. The most critical problem is that an unmanned vehicle cannot, like a human brain, accurately judge complex road conditions and obstacle situations from prior experience. Video target tracking is a key link in unmanned driving: by tracking the target in front of the vehicle in real time, the dynamic state of that target can be grasped, providing a basis for the unmanned vehicle to make correct decisions in the current environment. This ensures that necessary basic operations such as maintaining vehicle distance, changing lanes and adjusting speed can be carried out during driving, greatly improving the performance of the unmanned vehicle, reducing unnecessary accidents and improving driving safety.
However, existing video target tracking methods suffer from problems such as low tracking accuracy.
Disclosure of Invention
The embodiment of the application provides a video target tracking method, which can improve the accuracy of target tracking. The technical scheme is as follows:
In one aspect, a video target tracking method is provided; the method is applied to an electronic device and comprises:
inputting a target image and a search image simultaneously into a hierarchical correlation Siamese network for feature extraction to obtain convolution features extracted by different convolution layers, performing correlation measurement on the convolution features of the target image and the search image extracted by the same convolution layer, and stacking the correlations of the layers to generate a hierarchical correlation, wherein the target image comprises the tracking target;
taking the position in the search image with the highest tracking response in the hierarchical correlation as the center position of the tracking target in the search image;
and determining the position of the tracking target in the search image according to the center position of the tracking target in the search image and independent scale factors.
Further, the step of inputting the target image and the search image simultaneously into the hierarchical correlation Siamese network for feature extraction to obtain the convolution features extracted by different convolution layers comprises:
inputting the target image and the search image simultaneously into the two feature-extraction branches of the hierarchical correlation Siamese network for convolution calculation to obtain the convolution features extracted by different convolution layers;
each feature-extraction branch of the hierarchical correlation Siamese network has the structure: (conv1 + ReLU + LRN + Max POOL) — (conv2 + ReLU + LRN + Max POOL) — (conv3 + ReLU) — (conv4 + ReLU) — (conv5 + ReLU);
where conv denotes a convolution layer, ReLU denotes the nonlinear activation function, LRN denotes the local response normalization layer, and Max POOL denotes the maximum pooling layer.
Further, the formula for performing the correlation measurement on the convolution features of the target image and the search image extracted by the same convolution layer is:

F(z, x)_i = φ_i(z) ⋆ φ_i(x) + β

where F(z, x)_i represents the correlation between the convolution features of the target image and the search image extracted by convolution layer i; z and x represent the target image and the search image, respectively; φ_i(·) represents the convolution features output by convolution layer i; ⋆ denotes cross-correlation; and β represents a bias term.
Further, the step of taking the position in the search image with the highest tracking response in the hierarchical correlation as the center position of the tracking target in the search image comprises:
inputting the maximum correlation of each level in the hierarchical correlation into a correlation attention module, whose structure is: fully connected layer 1 - fully connected layer 2 - fully connected layer 3 - fully connected layer 4 - softmax layer;
learning the correlation among the convolution features of different layers through the four fully connected layers, and assigning a corresponding weight to each convolution layer through the softmax layer;
and determining the tracking response of each convolution layer according to the correlation of that layer and the weight assigned to it, and taking the position in the search image with the highest tracking response as the center position of the tracking target in the search image.
Further, the highest tracking response is expressed as:

Y(z, x) = Σ_{i=1}^{5} α_i · (φ_i(z) ⋆ φ_i(x)) + β

where Y(z, x) represents the highest tracking response; z and x represent the target image and the search image, respectively; φ_i(·) represents the convolution features output by convolution layer i; α_i is the weight assigned to convolution layer i; and β represents a bias term.
Further, the independent scale factor in the width direction is expressed as:

s_w · (w + p) = A_w

and the independent scale factor in the height direction is expressed as:

s_h · (h + p) = A_h

where s_w and s_h represent the scale factors of the target in the width and height directions, respectively; w and h represent the width and height of the target, respectively; p represents the padding region; and A_w and A_h represent the sizes of the input target image in the width and height directions, respectively.
Further, the padding region p is expressed as:

p = (w + h)/2.
In one aspect, an electronic device is provided, which includes a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the video target tracking method described above.
In one aspect, a computer readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the video object tracking method described above is provided.
The technical scheme provided by the embodiments of the application has at least the following beneficial effects:
1) A hierarchical correlation Siamese network is proposed on the basis of the Siamese network to track the target; it can comprehensively utilize the feature information of multiple convolution layers, which enriches the selection of tracking-target positions and improves the tracking accuracy of the video target tracking method;
2) Through the correlation attention module, the video target tracking method can adapt itself when tracking different targets, assigning different weights to the correlation of each layer; this further strengthens the selection of tracking-target positions and improves tracking accuracy;
3) Independent scale factors are used in the width and height directions of the tracking target to adjust the output frame (that is, the size of the tracking target), which reduces deformation of the output frame and increases tracking accuracy;
4) The method is more robust to complex backgrounds and large scale changes of the tracking target.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a video target tracking method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a hierarchical correlation Siamese network according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a correlation attention module according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
As shown in Fig. 1, an embodiment of the present application provides a video target tracking method, which may be implemented by an electronic device; the electronic device may be a terminal or a server. The method comprises:
s101, inputting a target image and a search image into a hierarchy correlation twin network at the same time to perform feature extraction to obtain convolution features extracted by different convolution layers, performing correlation measurement on the convolution features of the target image and the search image extracted by the same convolution layer, and generating hierarchy correlation by hierarchical stitching of the correlation of each layer, wherein the target image comprises: tracking a target;
s102, taking the position of the search image with highest tracking response in the hierarchical correlation as the center position of the tracking target in the search image;
s103, determining the position of the tracking target in the search image according to the central position of the tracking target in the search image and the independent scale factors.
According to the video target tracking method of this embodiment, a hierarchical correlation Siamese network is proposed on the basis of the Siamese network to track the target. The hierarchical correlation Siamese network can comprehensively utilize the feature information of multiple convolution layers, which enriches the selection of tracking-target positions and improves the tracking accuracy of the method. Because independent scale factors are used, the deformation of the tracking target caused by scaling images can be reduced, further improving the accuracy of target tracking.
In a specific embodiment of the foregoing video target tracking method, further, each feature-extraction branch of the hierarchical correlation Siamese network has the structure: (conv1 + ReLU + LRN + Max POOL) — (conv2 + ReLU + LRN + Max POOL) — (conv3 + ReLU) — (conv4 + ReLU) — (conv5 + ReLU);
where conv denotes a convolution layer, ReLU denotes the nonlinear activation function, LRN denotes the local response normalization layer, and Max POOL denotes the maximum pooling layer.
In this embodiment, the feature-extraction structure in the hierarchical correlation Siamese network comprises five convolution layers (conv1 to conv5); to prevent the vanishing-gradient problem, a ReLU nonlinear activation function is added after each convolution layer.
In this embodiment, a local response normalization layer is connected after the ReLU nonlinear activation functions of conv1 and conv2 to accelerate the convergence of the hierarchical correlation Siamese network, and a maximum pooling layer is connected after each local response normalization layer to reduce the size of the feature map.
In this embodiment, since the target tracking task differs from the target detection task, the category of the tracking target does not need to be output; therefore, no fully connected layer is needed when extracting features.
In this embodiment, as shown in Fig. 2, a target image of size 127×127 and a search image of size 255×255 may, for example, be input simultaneously into the two feature-extraction branches of the hierarchical correlation Siamese network for convolution calculation to obtain the convolution features extracted by the different convolution layers. Correlation measurement is then performed on the convolution features of the target image and the search image extracted by the same convolution layer, and the correlations of the layers (five in total) are stacked hierarchically to generate a 5×17×17 hierarchical correlation, where 17×17 is the size of the correlation map obtained for each convolution layer and 5 is the number of convolution layers.
In this embodiment, hierarchical stacking can be understood as superposition along a new channel axis, analogous to RGB, where different R, G and B values are superimposed to display different colors.
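As an illustration of one feature-extraction branch, the following is a minimal sketch in TensorFlow (the framework named in the verification section below). The filter counts and kernel sizes are assumptions chosen for illustration; the patent fixes only the layer types and their order.

```python
import tensorflow as tf

def branch_features(image):
    """One branch of the hierarchical correlation Siamese network:
    five convolution stages, LRN and max pooling after conv1 and conv2
    only, and no fully connected layers."""
    feats = []
    x = image
    # conv1 + ReLU + LRN + Max POOL (filter count and kernel size assumed)
    x = tf.keras.layers.Conv2D(96, 11, strides=2, activation="relu")(x)
    x = tf.nn.local_response_normalization(x)
    x = tf.keras.layers.MaxPool2D(pool_size=3, strides=2)(x)
    feats.append(x)
    # conv2 + ReLU + LRN + Max POOL
    x = tf.keras.layers.Conv2D(256, 5, activation="relu")(x)
    x = tf.nn.local_response_normalization(x)
    x = tf.keras.layers.MaxPool2D(pool_size=3, strides=2)(x)
    feats.append(x)
    # conv3 to conv5: convolution + ReLU only
    for filters, kernel in [(384, 3), (384, 3), (256, 3)]:
        x = tf.keras.layers.Conv2D(filters, kernel, activation="relu")(x)
        feats.append(x)
    return feats  # convolution features of all five layers
```

Both branches share these weights: the target image (e.g. 127×127) and the search image (e.g. 255×255) pass through the same function, and the five per-layer correlation maps, each brought to the common 17×17 size, are stacked along a new axis to form the 5×17×17 hierarchical correlation.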
In a specific embodiment of the foregoing video target tracking method, further, the formula for performing the correlation measurement on the convolution features of the target image and the search image extracted by the same convolution layer is:

F(z, x)_i = φ_i(z) ⋆ φ_i(x) + β

where F(z, x)_i represents the correlation between the convolution features of the target image and the search image extracted by convolution layer i; z and x represent the target image and the search image, respectively; φ_i(·) represents the convolution features output by convolution layer i; ⋆ denotes cross-correlation; and β represents a bias term.
In this embodiment, the correlation metric measures the difference between the convolution features of the target image and the search image extracted by the same convolution layer in the two different branches.
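A minimal sketch of this per-layer correlation measurement, treating the template features as a convolution filter slid over the search features; this is an assumed but standard way to implement cross-correlation, and the function name is illustrative:

```python
import tensorflow as tf

def layer_correlation(phi_z, phi_x, beta=0.0):
    """F(z, x)_i: cross-correlate phi_i(z) over phi_i(x), plus a bias.

    phi_z: [1, hz, wz, c] convolution features of the target image (layer i).
    phi_x: [1, hx, wx, c] convolution features of the search image (layer i).
    Returns a [1, hx - hz + 1, wx - wz + 1, 1] correlation map.
    """
    # tf.nn.conv2d computes cross-correlation; reshape the template
    # features into a single filter of shape [hz, wz, c, 1].
    kernel = tf.transpose(phi_z, [1, 2, 3, 0])
    return tf.nn.conv2d(phi_x, kernel, strides=1, padding="VALID") + beta
```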
In a specific embodiment of the foregoing video target tracking method, further, the step of taking the position in the search image with the highest tracking response in the hierarchical correlation as the center position of the tracking target in the search image comprises:
inputting the maximum correlation of each level in the hierarchical correlation into a correlation attention module, whose structure, as shown in Fig. 3, is: fully connected layer 1 - fully connected layer 2 - fully connected layer 3 - fully connected layer 4 - softmax layer;
learning the correlation among the convolution features of different layers through the four fully connected layers, and assigning a corresponding weight to each convolution layer through the softmax layer;
and determining the tracking response of each convolution layer according to the correlation of that layer and the weight assigned to it, and taking the position in the search image with the highest tracking response as the center position of the tracking target in the search image.
In this way, through the correlation attention module the video target tracking method can adapt itself when tracking different targets, assigning different weights to the correlation of each layer; this further strengthens the selection of tracking-target positions and improves tracking accuracy.
In a specific embodiment of the foregoing video target tracking method, further, the highest tracking response is expressed as:

Y(z, x) = Σ_{i=1}^{5} α_i · (φ_i(z) ⋆ φ_i(x)) + β

where Y(z, x) represents the highest tracking response; z and x represent the target image and the search image, respectively; φ_i(·) represents the convolution features output by convolution layer i; α_i is the weight assigned to convolution layer i; and β represents a bias term.
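A sketch of the correlation attention module and the weighted response, under stated assumptions: the patent fixes four fully connected layers followed by a softmax, but not their widths, which are chosen here for illustration; the inputs are taken to be the per-layer maximum correlations with the channel dimension squeezed.

```python
import tensorflow as tf

def attention_weights(max_corrs):
    """Map per-layer maximum correlations to the layer weights alpha_i.

    max_corrs: [batch, 5] tensor holding max(F(z, x)_i) for each layer.
    """
    x = max_corrs
    for units in (32, 32, 16):        # fully connected layers 1-3 (assumed widths)
        x = tf.keras.layers.Dense(units, activation="relu")(x)
    x = tf.keras.layers.Dense(5)(x)   # fully connected layer 4: one score per conv layer
    return tf.nn.softmax(x, axis=-1)  # softmax layer -> weights alpha_i

def weighted_response(corr_maps, alpha, beta=0.0):
    """Y(z, x) = sum_i alpha_i * F(z, x)_i + beta.

    corr_maps: list of five [batch, 17, 17] correlation maps.
    alpha: [batch, 5] weights from attention_weights.
    """
    stacked = tf.stack(corr_maps, axis=-1)                         # [batch, 17, 17, 5]
    y = tf.reduce_sum(stacked * alpha[:, None, None, :], axis=-1) + beta
    return y  # the argmax of y gives the tracking-target center in the search image
```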
In a specific embodiment of the foregoing video target tracking method, further, the independent scale factor in the width direction is expressed as:

s_w · (w + p) = A_w

and the independent scale factor in the height direction is expressed as:

s_h · (h + p) = A_h

where s_w and s_h represent the scale factors of the tracking target in the width and height directions, respectively; w and h represent the width and height of the tracking target on the target image, respectively; p represents the padding region, p = (w + h)/2; and A_w and A_h represent the width and height of the input target image, respectively.
In this embodiment, the scale factors s_w and s_h of the tracking target in the width and height directions can be calculated from the width w and height h of the tracking target on the target image and the width A_w and height A_h of the input target image. The width and height of the tracking target on the search image are then converted according to s_w and s_h into the final size of the output frame. In this way, independent scale factors are used to adjust the output frame (that is, the size of the tracking target) in the width and height directions, which reduces deformation of the output frame and increases tracking accuracy.
In this embodiment, because independent scale factors are used, a transformation in one of the width and height directions does not affect the other dimension, and the tracking target is essentially not deformed; this reduces the deformation of the tracking target that scaling targets to a uniform size would otherwise cause.
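The scale-factor arithmetic can be shown directly; a short sketch follows (how the tracked size on the search image is mapped back through s_w and s_h follows the description above, and the function name is illustrative):

```python
def output_box_size(w, h, A_w, A_h, w_search, h_search):
    """Compute the output-frame size using independent width/height factors.

    w, h: width and height of the tracking target on the target image.
    A_w, A_h: width and height of the input target image.
    w_search, h_search: width and height measured for the target on the
    search image.
    """
    p = (w + h) / 2.0      # padding region
    s_w = A_w / (w + p)    # width scale factor, independent of the height
    s_h = A_h / (h + p)    # height scale factor, independent of the width
    # Undo each scaling independently, so a change in one dimension does
    # not deform the other.
    return w_search / s_w, h_search / s_h
```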
According to the video target tracking method provided by the embodiment of the application, a hierarchical correlation Siamese network is proposed on the basis of the Siamese network to track the target, and a correlation attention module and independent scale factors are designed for it. Compared with a video target tracking method based on a plain Siamese network, it achieves higher tracking accuracy and is more robust to complex backgrounds and large scale changes of the tracking target.
Next, the effectiveness of the video target tracking method provided by the embodiment of the application is verified, specifically as follows:
the framework was deep learned using Python assembly language and TensorFlow.
The ILSVRC2015-VID data set was used as the training data set: two frames of images, no more than 100 frames apart, were randomly selected from a video segment, cropped and scaled to a fixed size, and then input into the network.
To accelerate the convergence of the tracking model (comprising the hierarchical correlation Siamese network and the correlation attention module) during the training stage, momentum gradient descent was used as the optimization method, updating the parameters with an exponentially weighted average of the gradients instead of the raw gradient; the momentum was set to 0.9.
The batch size was set to 8 image pairs, the initial learning rate to 0.01 and the decay coefficient to 0.86; training ran for 60 rounds, each containing 53,200 image pairs.
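These training settings correspond to the following optimizer configuration, sketched in TensorFlow; applying the 0.86 coefficient as a staircase decay once per training round is an assumption:

```python
import tensorflow as tf

steps_per_round = 53200 // 8   # 53,200 image pairs per round, batches of 8
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01,
    decay_steps=steps_per_round,   # decay once per training round (assumed)
    decay_rate=0.86,
    staircase=True,
)
# Momentum gradient descent: parameters are updated with an exponentially
# weighted moving average of gradients, momentum 0.9.
optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.9)
```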
After training, the trained tracking model provided by the application was tested on the OTB50, OTB100, VOT2015 and VOT2016 data sets. The tests show that, compared with a video target tracking model based on a plain Siamese network (for example, a fully convolutional Siamese network), the tracking model of this embodiment improves tracking accuracy by 6.5% and improves the tracking performance of the algorithm without reducing speed.
Fig. 4 is a schematic structural diagram of an electronic device 600 according to an embodiment of the present application. The electronic device 600 may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPU) 601 and one or more memories 602, where at least one instruction is stored in the memory 602 and is loaded and executed by the processor 601 to implement the video target tracking method described above.
In an exemplary embodiment, a computer readable storage medium is also provided, such as a memory comprising instructions executable by a processor in a terminal to perform the video target tracking method described above. For example, the computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the application is not intended to limit the application to the precise form disclosed; any modifications, equivalents and alternatives falling within the spirit and scope of the application are intended to be included within the scope of protection of the application.

Claims (3)

1. A video target tracking method, comprising:
inputting a target image and a search image simultaneously into a hierarchical correlation Siamese network for feature extraction to obtain convolution features extracted by different convolution layers, performing correlation measurement on the convolution features of the target image and the search image extracted by the same convolution layer, and stacking the correlations of the layers to generate a hierarchical correlation, wherein the target image comprises a tracking target;
taking the position in the search image with the highest tracking response in the hierarchical correlation as the center position of the tracking target in the search image;
determining the position of the tracking target in the search image according to the center position of the tracking target in the search image and independent scale factors;
the step of taking the position in the search image with the highest tracking response in the hierarchical correlation as the center position of the tracking target in the search image comprising:
inputting the maximum correlation of each level in the hierarchical correlation into a correlation attention module, the structure of the correlation attention module being: fully connected layer 1 - fully connected layer 2 - fully connected layer 3 - fully connected layer 4 - softmax layer;
learning the correlation among the convolution features of different layers through the four fully connected layers, and assigning a corresponding weight to each convolution layer through the softmax layer;
and determining the tracking response of each convolution layer according to the correlation of that layer and the weight assigned to the corresponding convolution layer, and taking the position in the search image with the highest tracking response as the center position of the tracking target in the search image;
wherein the highest tracking response is expressed as:

Y(z, x) = Σ_{i=1}^{5} α_i · (φ_i(z) ⋆ φ_i(x)) + β

where Y(z, x) represents the highest tracking response; z and x represent the target image and the search image, respectively; φ_i(·) represents the convolution features output by convolution layer i; α_i is the weight assigned to convolution layer i; and β represents a bias term;
wherein the independent scale factor in the width direction is expressed as:

s_w · (w + p) = A_w

and the independent scale factor in the height direction is expressed as:

s_h · (h + p) = A_h

where s_w and s_h represent the scale factors of the tracking target in the width and height directions, respectively; w and h represent the width and height of the tracking target on the target image, respectively; p represents the padding region; and A_w and A_h represent the width and height of the input target image, respectively;
wherein the padding region p is expressed as:

p = (w + h)/2.
2. The video target tracking method according to claim 1, wherein the step of inputting the target image and the search image simultaneously into the hierarchical correlation Siamese network for feature extraction to obtain the convolution features extracted by different convolution layers comprises:
inputting the target image and the search image simultaneously into the two feature-extraction branches of the hierarchical correlation Siamese network for convolution calculation to obtain the convolution features extracted by different convolution layers;
wherein each feature-extraction branch of the hierarchical correlation Siamese network has the structure: (conv1 + ReLU + LRN + Max POOL) — (conv2 + ReLU + LRN + Max POOL) — (conv3 + ReLU) — (conv4 + ReLU) — (conv5 + ReLU);
where conv denotes a convolution layer, ReLU denotes the nonlinear activation function, LRN denotes the local response normalization layer, and Max POOL denotes the maximum pooling layer.
3. The video target tracking method according to claim 1, wherein the formula for performing the correlation measurement on the convolution features of the target image and the search image extracted by the same convolution layer is:

F(z, x)_i = φ_i(z) ⋆ φ_i(x) + β

where F(z, x)_i represents the correlation between the convolution features of the target image and the search image extracted by convolution layer i; z and x represent the target image and the search image, respectively; φ_i(·) represents the convolution features output by convolution layer i; and β represents a bias term.
CN202010753190.XA 2020-07-30 2020-07-30 Video target tracking method Active CN111899283B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010753190.XA 2020-07-30 2020-07-30 Video target tracking method (granted as CN111899283B)


Publications (2)

Publication Number Publication Date
CN111899283A (en) 2020-11-06
CN111899283B (en) 2023-10-17

Family

ID=73182806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010753190.XA Active CN111899283B (en) 2020-07-30 2020-07-30 Video target tracking method

Country Status (1)

Country Link
CN (1) CN111899283B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116360492B (en) * 2023-04-03 2024-01-30 北京科技大学 Object tracking method and system for flapping wing flying robot
CN117809025A (en) * 2024-03-01 2024-04-02 深圳魔视智能科技有限公司 Attention network-based target tracking method, device, equipment and storage medium


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10902243B2 (en) * 2016-10-25 2021-01-26 Deep North, Inc. Vision based target tracking that distinguishes facial feature targets
US20180129934A1 (en) * 2016-11-07 2018-05-10 Qualcomm Incorporated Enhanced siamese trackers
US11308350B2 (en) * 2016-11-07 2022-04-19 Qualcomm Incorporated Deep cross-correlation learning for object tracking
US10902615B2 (en) * 2017-11-13 2021-01-26 Qualcomm Incorporated Hybrid and self-aware long-term object tracking
US11055854B2 (en) * 2018-08-23 2021-07-06 Seoul National University R&Db Foundation Method and system for real-time target tracking based on deep learning
US11493908B2 (en) * 2018-11-13 2022-11-08 Rockwell Automation Technologies, Inc. Industrial safety monitoring configuration using a digital twin

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017156886A * 2016-02-29 2017-09-07 Kddi株式会社 Device for tracking an object taking the degree of similarity between images into consideration, and program and method thereof
CN110021033A * 2019-02-22 2019-07-16 广西师范大学 Target tracking method based on a pyramid Siamese network
CN109978921A * 2019-04-01 2019-07-05 南京信息工程大学 Real-time video target tracking algorithm based on a multilayer attention mechanism
CN110286683A * 2019-07-15 2019-09-27 北京科技大学 Autonomous path tracking control method for a tracked mobile robot
CN110490906A * 2019-08-20 2019-11-22 南京邮电大学 Real-time visual target tracking method based on a Siamese convolutional network and a long short-term memory network
CN110675429A * 2019-09-24 2020-01-10 湖南人文科技学院 Long- and short-range complementary target tracking method based on a Siamese network and correlation filters
CN111192292A * 2019-12-27 2020-05-22 深圳大学 Target tracking method based on an attention mechanism and a Siamese network, and related equipment
CN111161317A * 2019-12-30 2020-05-15 北京工业大学 Single-target tracking method based on multiple networks
CN111260688A * 2020-01-13 2020-06-09 深圳大学 Siamese dual-path target tracking method
CN111291679A * 2020-02-06 2020-06-16 厦门大学 Target-specific response attention target tracking method based on a Siamese network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Deep Siamese Network for Multiple Object Tracking; Bonan Cuan et al.; 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP); 1-6 *
Feature Deep Continuous Aggregation for 3D Vehicle Detection; Zhao, K. et al.; Applied Sciences-Basel; vol. 9, no. 24; 1-17 *
Hierarchical correlation siamese network for real-time object tracking; Yu Meng et al.; Applied Intelligence; vol. 51, no. 6; 3202-3211 *
Vehicle path tracking control based on variable prediction horizon and speed; 白国星 et al.; China Mechanical Engineering; vol. 31, no. 11; 1277-1284 *
Binocular stereo matching fusing multi-scale local features and deep features; 王旭初 et al.; Acta Optica Sinica; vol. 40, no. 2; 119-131 *

Also Published As

Publication number Publication date
CN111899283A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
CN107945204B Pixel-level image matting method based on a generative adversarial network
KR102635987B1 (en) Method, apparatus, device and storage medium for training an image semantic segmentation network
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN107169421B (en) Automobile driving scene target detection method based on deep convolutional neural network
CN111554105B (en) Intelligent traffic identification and statistics method for complex traffic intersection
CN111899283B (en) Video target tracking method
CN111353505B (en) Device based on network model capable of realizing semantic segmentation and depth of field estimation jointly
CN112307978A (en) Target detection method and device, electronic equipment and readable storage medium
CN110689043A (en) Vehicle fine granularity identification method and device based on multiple attention mechanism
CN111126459A (en) Method and device for identifying fine granularity of vehicle
CN112464912B (en) Robot end face detection method based on YOLO-RGGNet
CN115512251A (en) Unmanned aerial vehicle low-illumination target tracking method based on double-branch progressive feature enhancement
CN113269133A (en) Unmanned aerial vehicle visual angle video semantic segmentation method based on deep learning
CN111179272B (en) Rapid semantic segmentation method for road scene
CN114360239A (en) Traffic prediction method and system for multilayer space-time traffic knowledge map reconstruction
CN115661767A (en) Image front vehicle target identification method based on convolutional neural network
CN113177432A (en) Head pose estimation method, system, device and medium based on multi-scale lightweight network
CN114399638A (en) Semantic segmentation network training method, equipment and medium based on patch learning
CN113920479A (en) Target detection network construction method, target detection device and electronic equipment
CN112597996A (en) Task-driven natural scene-based traffic sign significance detection method
CN116776208A (en) Training method of seismic wave classification model, seismic wave selecting method, equipment and medium
CN115689946A (en) Image restoration method, electronic device and computer program product
CN115981302A (en) Vehicle following lane change behavior decision-making method and device and electronic equipment
CN112016599A (en) Neural network training method and device for image retrieval and electronic equipment
CN117456480B (en) Light vehicle re-identification method based on multi-source information fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant