CN110807794A - Single target tracking method based on multiple features - Google Patents

Single target tracking method based on multiple features

Info

Publication number
CN110807794A
CN110807794A (application CN201910939321.0A)
Authority
CN
China
Prior art keywords
image
frame
response
convolution
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910939321.0A
Other languages
Chinese (zh)
Inventor
宋建锋
苗启广
申猛
王宇杰
王崇晓
刘向增
权义宁
盛立杰
刘如意
戚玉涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201910939321.0A priority Critical patent/CN110807794A/en
Publication of CN110807794A publication Critical patent/CN110807794A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a single-target tracking method based on multiple features. A correlation-filter tracking method performs correlation operations separately on convolution features and difference-image features; the resulting response maps are fused, and the fusion result is used as the basis for dynamically correcting the target coordinates during tracking. The method mitigates the low tracking accuracy that conventional methods suffer when the target's surroundings change, the target deforms, or the target is occluded, and effectively improves target tracking accuracy.

Description

Single target tracking method based on multiple features
Technical Field
The invention belongs to the field of computer vision, relates to infrared target tracking and deep convolutional neural network algorithms, and can be applied to infrared target tracking scenarios.
Background
In recent years, discriminative methods have become the mainstream in target tracking; commonly used approaches include support-vector-machine tracking algorithms and correlation-filter tracking algorithms. Avidan et al. first introduced the support vector machine into target tracking, and Hare et al. proposed the structured-output tracking algorithm Struck (Structured Output Tracking with Kernels). Struck uses a kernelized structured-output support vector machine (SVM); by explicitly predicting in the output space it achieves tracking while avoiding an intermediate classification step.
In 2015, Danelljan et al proposed srdcf (spatial regulated discriminatory filters) algorithm, which has weak anti-interference capability, easy target loss and low tracking accuracy.
Disclosure of Invention
Aiming at the difference or the deficiency of the prior art, the invention provides a single-target tracking method based on multiple characteristics.
The invention provides a multi-feature-based single-target tracking method that tracks each frame image of a video to be tracked in temporal order, all frame images lying in the same coordinate system. The method comprises the following steps:
step one, marking the target coordinates of the first frame image and initializing a correlation filter with the first frame image to obtain an initial convolution-feature correlation filter model;
step two, extracting the multi-scale convolution features of the second frame image, using the first-frame target coordinates as the temporary target coordinates of the second frame during extraction;
performing a correlation operation on the multi-scale convolution features of the second frame with the initial convolution-feature correlation filter model to obtain the convolution-feature response map of the second frame, wherein the offset of the response-map peak coordinate from the response-map centre coordinate, added to the first-frame target coordinates, gives the target coordinates of the second frame;
obtaining the difference image of the second and first frame images and initializing a correlation filter with the difference image to obtain an initial difference-image convolution-feature correlation filter model, the second-frame target coordinates serving as the target coordinates of the current difference image during initialization;
updating the initial convolution-feature correlation filter model with the second frame image;
step three, cyclically executing steps (1) to (5) to track the target in the third and subsequent frames:
(1) extracting the multi-scale convolution features of the n-th frame (n ≥ 3), using the target coordinates of frame n-1 as the temporary target coordinates of the current frame during extraction; obtaining the difference image of the n-th and (n-1)-th frames and extracting its multi-scale convolution features;
(2) performing a correlation operation on the multi-scale convolution features of the n-th frame with the current convolution-feature correlation filter model to obtain a first response map; performing a correlation operation on the multi-scale convolution features of the current difference image with the current difference-image convolution-feature correlation filter model to obtain a second response map;
(3) fusing the first response map and the second response map using formula (I) to obtain the fused response map (or fused response map matrix) R:

R = (f1·PSR1·R1 + f2·PSR2·R2) / max(R1, R2)    (I)

in formula (I):
m = 1 or 2, where m = 1 corresponds to the convolution feature (the first response map) and m = 2 corresponds to the convolution feature of the difference image (the second response map);
fm is a weight, with f1 = 1.1 and f2 = 1;
Rm is the corresponding feature response map (or response map matrix);
max(R1, R2) is the maximum response value over the two response maps;
PSRm is the peak-to-side-lobe ratio of the corresponding feature response map:

PSRm = (max(Rm) − μm) / σm,

where max(Rm) is the maximum response value in the corresponding response map, μm is the mean of the response values of the side-lobe pixels in that response map, and σm is the standard deviation of the response values of the side-lobe pixels in that response map;
(4) the offset of the fused response map's peak coordinate from its centre coordinate, added to the target coordinates of frame n-1, gives the target coordinates of frame n;
(5) updating the current convolution-feature correlation filter model with the current frame image, and updating the current difference-image convolution-feature correlation filter model with the current difference image.
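Steps (3) and (4) above can be sketched as follows. This is an illustrative NumPy sketch, not the patented implementation: the side-lobe exclusion window, the exact form of the fusion (one reading of formula (I) from the listed symbols), and all function names are assumptions.

```python
import numpy as np

def psr(response, exclude=5):
    """Peak-to-side-lobe ratio: (peak - side-lobe mean) / side-lobe std.

    The side lobe is everything outside a small window around the peak.
    The half-size `exclude` of that window is an assumed parameter; the
    patent does not specify it.
    """
    peak = response.max()
    py, px = np.unravel_index(response.argmax(), response.shape)
    mask = np.ones(response.shape, dtype=bool)
    mask[max(0, py - exclude):py + exclude + 1,
         max(0, px - exclude):px + exclude + 1] = False
    side = response[mask]
    return (peak - side.mean()) / (side.std() + 1e-12)

def fuse_responses(r1, r2, f1=1.1, f2=1.0):
    """PSR-weighted fusion of the two response maps (one reading of formula (I))."""
    scale = max(r1.max(), r2.max())  # max(R1, R2) in the patent's notation
    return (f1 * psr(r1) * r1 + f2 * psr(r2) * r2) / scale

def update_coordinate(prev_xy, fused):
    """Step (4): previous target coordinate plus the peak's offset from the map centre."""
    py, px = np.unravel_index(fused.argmax(), fused.shape)
    cy, cx = fused.shape[0] // 2, fused.shape[1] // 2
    return (prev_xy[0] + px - cx, prev_xy[1] + py - cy)
```

With f1 = 1.1, the convolution-feature response is given slightly more influence than the difference-image response, matching the weights stated in the claim.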
Preferably, the convolution features of the invention are extracted with a VGG convolutional neural network, the conv3-3 convolution features being used.
Preferably, the difference image undergoes an erosion operation and then a dilation operation before multi-scale convolution feature extraction.
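Erosion followed by dilation is a morphological opening, which suppresses isolated noise pixels in the difference image while preserving larger moving regions. A minimal sketch with 3x3 min/max filters is shown below; a real system would likely use a library such as OpenCV, and the structuring-element size here is an assumption.

```python
import numpy as np

def erode3(img):
    """3x3 erosion: each pixel becomes the minimum of its neighbourhood."""
    p = np.pad(img, 1, mode='edge')
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(3) for j in range(3)]
    return np.min(stack, axis=0)

def dilate3(img):
    """3x3 dilation: each pixel becomes the maximum of its neighbourhood."""
    p = np.pad(img, 1, mode='edge')
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(3) for j in range(3)]
    return np.max(stack, axis=0)

def preprocess_difference(frame_n, frame_prev):
    """Absolute difference image followed by opening (erosion, then dilation)."""
    diff = np.abs(frame_n.astype(np.int32) - frame_prev.astype(np.int32))
    return dilate3(erode3(diff))
```

A single hot pixel in the difference image is removed by the opening, while a solid moving blob survives.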
Compared with the prior art, the invention has the following advantages:
(1) By dynamically fusing the correlation-filter response maps of different features, the invention exploits the strengths of each feature and improves infrared tracking accuracy; the dynamically fused tracker is clearly superior to trackers that use either the differential feature or the convolution feature alone.
(2) The invention selects a well-suited convolution feature for correlation-filter tracking and thereby achieves the best tracking accuracy.
Drawings
FIG. 1 is an image (a) and a target response image (b) of a first frame image labeled with target coordinates in an embodiment;
FIG. 2 is a second frame image in the embodiment;
FIG. 3 is a differential image of a second frame and a first frame in the embodiment;
FIG. 4 is a graph of convolution response characteristics for a third frame of the embodiment;
FIG. 5 is a graph of differential feature responses for a third frame and a second frame according to an embodiment;
FIG. 6 is a graph of responses after fusion of the embodiments;
FIG. 7 is a comparison of the accuracy (a) and success rate (b) of the algorithm of the invention with other trackers.
Detailed Description
The target response map acquisition (or generation) method, the multi-scale image, multi-scale convolution feature extraction, and the initialization, update, and correlation operations described in the invention are all methods known in the art; in particular, the scheme disclosed in document 1 (Danelljan, Martin, et al., "ECO: Efficient Convolution Operators for Tracking," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2017) may be adopted. It should further be explained that the multi-scale convolution features are the result of extracting convolution features from each of the multi-scale images. Initialization and update may use the same operation; following document 1, the specific procedure can be understood as: generate the target response map of an image from its target coordinates, extract a multi-scale image around the target coordinates, extract convolution features of the multi-scale image in turn with the corresponding neural network, and then perform a time-domain convolution operation on the correlation filter with the image's target response map (the expected value) and its multi-scale convolution features, obtaining the initial or the updated convolution-feature correlation filter model.
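The filter initialization, update, and correlation operations referenced above can be illustrated with a single-channel MOSSE-style correlation filter in the frequency domain. This is a deliberate simplification of the ECO scheme of document 1; the regularization λ, learning rate η, and all names are assumptions.

```python
import numpy as np

def gaussian_response(shape, center, sigma=2.0):
    """Desired target response map: a Gaussian peaked at the target centre."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2)
                  / (2 * sigma ** 2))

def train_filter(feature, desired, lam=1e-2):
    """Closed-form filter: H = (G . conj(F)) / (F . conj(F) + lambda)."""
    F = np.fft.fft2(feature)
    G = np.fft.fft2(desired)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def update_filter(H_old, feature, desired, eta=0.02, lam=1e-2):
    """Update: linear interpolation between the old model and a filter
    trained on the new frame's features."""
    return (1 - eta) * H_old + eta * train_filter(feature, desired, lam)

def correlate(H, feature):
    """Correlation operation: response map of the filter on new features."""
    return np.real(np.fft.ifft2(H * np.fft.fft2(feature)))
```

When the tracked object shifts in the next frame, the response peak shifts by the same amount, which is exactly the peak-offset coordinate update used in the method.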
The invention preferably extracts features from a VGG-16 network whose fully connected layers and softmax layer have been removed, using the last convolution layer of each of the first to fifth convolution groups, and takes the conv3-3 convolution features as the convolution features for tracking.
It should also be explained that the meaning of the peak-to-side-lobe ratio (PSR) PSRm in formula (I), of the side lobe, and of the related terms of the corresponding feature response maps is disclosed in document 2 (Bolme, David S., et al., "Visual Object Tracking Using Adaptive Correlation Filters," The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010, IEEE, 2010).
Example:
The example applies the method of the invention to targets in The Linköping Thermal InfraRed dataset (LTIR). Each frame image to be tracked in the data lies in the same coordinate system, achieved through coordinate normalization. The specific tracking method comprises the following steps:
step one, marking the target coordinates of the first frame image and initializing a correlation filter with the first frame image to obtain the initial convolution-feature correlation filter model; specifically, the target response map of the first frame is generated from its target coordinates, as shown in FIG. 1, the multi-scale convolution features around the first-frame target coordinates are extracted, and the correlation filter is initialized with the first-frame target response map and multi-scale convolution features to obtain the initial convolution-feature correlation filter model;
step two, taking the first-frame target coordinates as the temporary target coordinates of the second frame image (shown in FIG. 2) and extracting the multi-scale convolution features of the second frame;
performing a correlation operation on the multi-scale convolution features of the second frame with the initial convolution-feature correlation filter to obtain the second frame's convolution-feature response map, wherein the offset of the response-map peak coordinate from the response-map centre coordinate, added to the first-frame target coordinates, gives the target coordinates of the second frame;
obtaining the difference image of the second and first frames (as shown in FIG. 3), taking the second-frame target coordinates as the target coordinates of the current difference image, and initializing a correlation filter with the current difference image to obtain the initial difference-image convolution-feature correlation filter model;
updating the initial convolution-feature correlation filter model with the second frame image;
step three, cyclically executing steps (1) to (5) to track the third and subsequent frames:
(1) taking the target coordinates of the previous frame as the temporary target coordinates of the current frame and extracting the multi-scale convolution features of the n-th frame image (the current frame); obtaining the difference image of the n-th and (n-1)-th frames and extracting its multi-scale convolution features;
(2) performing a correlation operation on the multi-scale convolution features of the n-th frame with the current convolution-feature correlation filter to obtain a first response map (the third frame's first response map is shown in FIG. 4); performing a correlation operation on the multi-scale convolution features of the current difference image with the current differential-feature correlation filter to obtain a second response map (the third frame's second response map is shown in FIG. 5);
(3) fusing the first response map and the second response map using formula (I) to obtain the fused response map R; the fused response map of the second and third frames is shown in FIG. 6;
(4) adding the offset of the fused response map's peak coordinate from its centre coordinate to the previous frame's target coordinates to obtain the current frame's target coordinates;
(5) updating the current convolution-feature correlation filter model with the current frame image, and updating the current difference-image convolution-feature correlation filter model with the current difference image.
Preferably, in the embodiment, when extracting the relevant convolution features (both the convolution features of each frame's multi-scale images and the convolution features of each frame's difference image), the conv3-3 convolution features are extracted with a VGG convolutional neural network; the multi-scale ratios for each frame image are 1, 1.02, 1.04, 1.06, 0.98, 0.96, 0.94.
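The listed scale ratios mean that a window around the target is cropped at several sizes and each crop is resized to a common template size before feature extraction. A sketch under assumed details (nearest-neighbour resizing, edge clamping, a 32-pixel template) follows; none of these specifics are stated in the patent.

```python
import numpy as np

SCALES = [1.0, 1.02, 1.04, 1.06, 0.98, 0.96, 0.94]

def crop_resize(img, center, size, out_size):
    """Crop a size x size window around center (clamped to the image) and
    resize it to out_size x out_size by nearest-neighbour sampling."""
    half = size / 2.0
    coords = np.linspace(-half, half, out_size)
    ys = np.clip(np.round(center[0] + coords).astype(int), 0, img.shape[0] - 1)
    xs = np.clip(np.round(center[1] + coords).astype(int), 0, img.shape[1] - 1)
    return img[np.ix_(ys, xs)]

def multi_scale_patches(img, center, base_size, out_size=32):
    """One patch per scale factor, all resized to the same template size."""
    return [crop_resize(img, center, base_size * s, out_size) for s in SCALES]
```

Each of the seven patches would then be passed through the VGG network, and the scale whose response peak is strongest determines the target size as well as its position.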
On the LTIR infrared dataset the method is compared with four tracking algorithms: KCF, DSST, CSK, and SiamFC. The comparison results are shown in FIG. 7 and indicate that the method achieves better accuracy on the tracked image sequences.
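The accuracy curves of FIG. 7(a) presumably follow the standard precision-plot protocol for tracker benchmarks: the fraction of frames whose predicted target centre lies within a pixel threshold of the ground truth. A minimal sketch, assuming the conventional 20-pixel default threshold:

```python
import math

def precision_at(pred_centers, gt_centers, threshold=20.0):
    """Fraction of frames whose centre location error <= threshold pixels."""
    assert len(pred_centers) == len(gt_centers)
    hits = sum(
        1 for (px, py), (gx, gy) in zip(pred_centers, gt_centers)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return hits / len(pred_centers)
```

Sweeping the threshold from 0 to 50 pixels and plotting the resulting precision values produces the accuracy curve; the success-rate curve of FIG. 7(b) is analogous but uses bounding-box overlap instead of centre error.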

Claims (3)

1. A multi-feature-based single-target tracking method that tracks each frame image in a video to be tracked in temporal order, each frame image lying in the same coordinate system, the method comprising the following steps:
step one, marking the target coordinates of the first frame image and initializing a correlation filter with the first frame image to obtain an initial convolution-feature correlation filter model;
step two, extracting the multi-scale convolution features of the second frame image, using the first-frame target coordinates as the temporary target coordinates of the second frame during extraction;
performing a correlation operation on the multi-scale convolution features of the second frame with the initial convolution-feature correlation filter model to obtain the convolution-feature response map of the second frame, wherein the offset of the response-map peak coordinate from the response-map centre coordinate, added to the first-frame target coordinates, gives the target coordinates of the second frame;
obtaining the difference image of the second and first frame images and initializing a correlation filter with the difference image to obtain an initial difference-image convolution-feature correlation filter model, the second-frame target coordinates serving as the target coordinates of the current difference image during initialization;
updating the initial convolution-feature correlation filter model with the second frame image;
step three, cyclically executing steps (1) to (5) to track the target in the third and subsequent frames:
(1) extracting the multi-scale convolution features of the n-th frame (n ≥ 3), using the target coordinates of frame n-1 as the temporary target coordinates of the current frame during extraction; obtaining the difference image of the n-th and (n-1)-th frames and extracting its multi-scale convolution features;
(2) performing a correlation operation on the multi-scale convolution features of the n-th frame with the current convolution-feature correlation filter model to obtain a first response map; performing a correlation operation on the multi-scale convolution features of the current difference image with the current difference-image convolution-feature correlation filter model to obtain a second response map;
(3) fusing the first response map and the second response map using formula (I) to obtain the fused response map R:

R = (f1·PSR1·R1 + f2·PSR2·R2) / max(R1, R2)    (I)

in formula (I):
m = 1 or 2, where m = 1 corresponds to the convolution feature (the first response map) and m = 2 corresponds to the convolution feature of the difference image (the second response map);
fm is a weight, with f1 = 1.1 and f2 = 1;
Rm is the corresponding feature response map;
max(R1, R2) is the maximum response value over the two response maps;
PSRm is the peak-to-side-lobe ratio of the corresponding feature response map:

PSRm = (max(Rm) − μm) / σm,

where max(Rm) is the maximum response value in the corresponding response map, μm is the mean of the response values of the side-lobe pixels in that response map, and σm is the standard deviation of the response values of the side-lobe pixels in that response map;
(4) the offset of the fused response map's peak coordinate from its centre coordinate, added to the target coordinates of frame n-1, gives the target coordinates of frame n;
(5) updating the current convolution-feature correlation filter model with the current frame image, and updating the current difference-image convolution-feature correlation filter model with the current difference image.
2. The multi-feature-based single target tracking method according to claim 1, wherein the convolution feature is extracted by adopting a VGG convolution neural network, and the convolution feature is conv3-3 convolution feature.
3. The multi-feature-based single-target tracking method according to claim 1, wherein the difference image is subjected to erosion operation and dilation operation in sequence and then subjected to multi-scale convolution feature extraction.
CN201910939321.0A 2019-09-30 2019-09-30 Single target tracking method based on multiple features Pending CN110807794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910939321.0A CN110807794A (en) 2019-09-30 2019-09-30 Single target tracking method based on multiple features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910939321.0A CN110807794A (en) 2019-09-30 2019-09-30 Single target tracking method based on multiple features

Publications (1)

Publication Number Publication Date
CN110807794A true CN110807794A (en) 2020-02-18

Family

ID=69488022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910939321.0A Pending CN110807794A (en) 2019-09-30 2019-09-30 Single target tracking method based on multiple features

Country Status (1)

Country Link
CN (1) CN110807794A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7362885B2 (en) * 2004-04-20 2008-04-22 Delphi Technologies, Inc. Object tracking and eye state identification method
CN102081801A (en) * 2011-01-26 2011-06-01 上海交通大学 Multi-feature adaptive fused ship tracking and track detecting method
CN103455797A (en) * 2013-09-07 2013-12-18 西安电子科技大学 Detection and tracking method of moving small target in aerial shot video
CN109242883A (en) * 2018-08-14 2019-01-18 西安电子科技大学 Optical remote sensing video target tracking method based on depth S R-KCF filtering


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
宋建锋 (Song Jianfeng) et al.: "多特征融合的相关滤波红外单目标跟踪算法" [Correlation-filter infrared single-target tracking algorithm with multi-feature fusion], 《西安电子科技大学学报》 [Journal of Xidian University] *

Similar Documents

Publication Publication Date Title
Afifi et al. What else can fool deep learning? Addressing color constancy errors on deep neural network performance
CN107578423B (en) Multi-feature hierarchical fusion related filtering robust tracking method
CN112347861B (en) Human body posture estimation method based on motion feature constraint
CN109903331B (en) Convolutional neural network target detection method based on RGB-D camera
CN110782477A (en) Moving target rapid detection method based on sequence image and computer vision system
CN107609571B (en) Adaptive target tracking method based on LARK features
CN108257155B (en) Extended target stable tracking point extraction method based on local and global coupling
CN112634325A (en) Unmanned aerial vehicle video multi-target tracking method
CN110246154B (en) Visual target tracking method based on ICA-R multi-feature fusion and self-adaptive updating
CN104091145A (en) Human palm vein feature image acquisition method
CN114782298B (en) Infrared and visible light image fusion method with regional attention
CN113449658A (en) Night video sequence significance detection method based on spatial domain, frequency domain and time domain
CN116189019A (en) Unmanned aerial vehicle ground target tracking method based on improved twin neural network
CN114529584A (en) Single-target vehicle tracking method based on unmanned aerial vehicle aerial photography
CN116052025A (en) Unmanned aerial vehicle video image small target tracking method based on twin network
CN110751271B (en) Image traceability feature characterization method based on deep neural network
CN111429485A (en) Cross-modal filtering tracking method based on self-adaptive regularization and high-reliability updating
CN108876776B (en) Classification model generation method, fundus image classification method and device
CN112766102B (en) Unsupervised hyperspectral video target tracking method based on spatial spectrum feature fusion
CN110807794A (en) Single target tracking method based on multiple features
CN115984439A (en) Three-dimensional countertexture generation method and device for disguised target
CN113610888B (en) Twin network target tracking method based on Gaussian smoothing
CN113160050B (en) Small target identification method and system based on space-time neural network
CN114913337A (en) Camouflage target frame detection method based on ternary cascade perception
CN108010051A (en) Multisource video subject fusion tracking based on AdaBoost algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200218)