CN110807794A - Single target tracking method based on multiple features - Google Patents
- Publication number
- CN110807794A (application CN201910939321.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- frame
- response
- convolution
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
Abstract
The invention discloses a multi-feature-based single-target tracking method. A correlation filter tracking approach performs correlation operations separately on convolution features and on difference-image features; the resulting response maps are fused, and the fusion result serves as the basis for dynamically correcting the target coordinates during tracking. The method addresses the low tracking accuracy that conventional methods suffer when the target's surroundings change, the target deforms, or the target is occluded, and effectively improves the accuracy of target tracking.
Description
Technical Field
The invention belongs to the field of computer vision, relates to infrared target tracking and deep convolutional neural network algorithms, and can be applied to infrared target tracking scenarios.
Background
In recent years, discriminative methods have become the mainstream in target tracking; commonly used examples include support vector machine tracking algorithms and correlation filter tracking algorithms. Avidan et al. first introduced the support vector machine to target tracking, and Hare et al. proposed a structured-output target tracking algorithm (Structured Output Tracking with Kernels, Struck). Struck uses a kernelized structured-output support vector machine and achieves tracking by explicitly introducing an output space, which avoids an intermediate classification step.
In 2015, Danelljan et al. proposed the SRDCF (Spatially Regularized Discriminative Correlation Filters) algorithm; however, it has weak resistance to interference, loses the target easily, and its tracking accuracy is low.
Disclosure of Invention
Aiming at the shortcomings of the prior art, the invention provides a single-target tracking method based on multiple features.
The multi-feature-based single-target tracking method of the invention tracks each frame image of a video to be tracked in temporal order, with all frame images in the same coordinate system, and comprises the following steps:
Step one: marking the target coordinates of the first frame image, initializing a correlation filter with the first frame image, and obtaining an initial convolution-feature correlation filter model;
Step two: extracting the multi-scale convolution features of the second frame image, using the first-frame target coordinates as the temporary target coordinates of the second frame image during extraction;
performing a correlation operation on the multi-scale convolution features of the second frame image with the initial convolution-feature correlation filter model to obtain the convolution-feature response map of the second frame, wherein the difference between the peak coordinate of the response map and the center coordinate of the response map, added to the target coordinate of the first frame image, gives the target coordinate of the second frame;
obtaining the difference image of the second and first frame images, initializing a correlation filter with the difference image to obtain the initial difference-image convolution-feature correlation filter model, the target coordinate of the second frame serving as the target coordinate of the current difference image during initialization;
updating the initial convolution-feature correlation filter model with the second frame image;
Step three: performing steps (1) to (5) in a loop to sequentially track the target in the third and subsequent frames:
(1) extracting the multi-scale convolution features of the nth frame, where n ≥ 3, using the target coordinates of frame n-1 as the temporary target coordinates of the current frame during extraction; obtaining the difference image of the nth and (n-1)th frames and extracting its multi-scale convolution features;
(2) performing a correlation operation on the multi-scale convolution features of the nth frame with the current convolution-feature correlation filter model to obtain a first response map; performing a correlation operation on the multi-scale convolution features of the current difference image with the current difference-image convolution-feature correlation filter model to obtain a second response map;
(3) fusing the first response map and the second response map by formula (I) to obtain a fused response map (or fused response-map matrix) R;
In formula (I):
m = 1 or 2, where m = 1 denotes the convolution feature or the first response map, and m = 2 denotes the convolution feature of the difference image or the second response map;
f_m denotes a weight, with f_1 = 1.1 and f_2 = 1;
R_m is the corresponding feature response map (or corresponding feature response-map matrix);
max(R_1, R_2) denotes the larger of the maximum response values of the two response maps;
PSR_m denotes the peak-to-sidelobe ratio of the corresponding feature response map, PSR_m = (max(R_m) - μ_m) / σ_m, where max(R_m) is the maximum response value of the corresponding response map, μ_m is the mean of the response values of the sidelobe pixels in the corresponding response map, and σ_m is the standard deviation of the response values of the sidelobe pixels in the corresponding response map;
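The quantities defined for formula (I) can be made concrete in code. The sketch below computes the peak-to-sidelobe ratio in the sense of document 2 and fuses two response maps by keeping the one with the larger weighted PSR; since the image of formula (I) itself does not survive in this text, this selection rule, the sidelobe window size, and the names `psr` and `fuse` are assumptions for illustration only.

```python
import numpy as np

def psr(response, peak, sidelobe_margin=5):
    """Peak-to-sidelobe ratio (Bolme et al., 2010): the sidelobe is the
    response map with a small window around the peak excluded."""
    mask = np.ones(response.shape, dtype=bool)
    py, px = peak
    mask[max(0, py - sidelobe_margin):py + sidelobe_margin + 1,
         max(0, px - sidelobe_margin):px + sidelobe_margin + 1] = False
    sidelobe = response[mask]
    return (response[py, px] - sidelobe.mean()) / (sidelobe.std() + 1e-12)

def fuse(r1, r2, f1=1.1, f2=1.0):
    """Keep the response map with the larger weighted PSR (assumed rule);
    the weights f1 = 1.1 and f2 = 1 follow the patent."""
    scores = []
    for r, f in ((r1, f1), (r2, f2)):
        peak = np.unravel_index(np.argmax(r), r.shape)
        scores.append(f * psr(r, peak))
    return r1 if scores[0] >= scores[1] else r2
```

With the patent's weights f_1 = 1.1 and f_2 = 1, the convolution-feature response map is slightly favoured when the two PSRs are close.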
(4) the difference between the peak coordinate of the fused response map and the center coordinate of the fused response map, added to the target coordinate of frame n-1, gives the target coordinate of frame n;
(5) updating the current convolution-feature correlation filter model with the current frame image, and updating the current difference-image convolution-feature correlation filter model with the current difference image.
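The coordinate update used in step (4), where the target coordinate of frame n equals the target coordinate of frame n-1 plus the offset of the fused response peak from the map center, can be sketched as follows (function names are illustrative, not from the patent):

```python
import numpy as np

def peak_offset(response):
    """Displacement of the response-map peak from the map center."""
    peak = np.unravel_index(np.argmax(response), response.shape)
    center = (response.shape[0] // 2, response.shape[1] // 2)
    return peak[0] - center[0], peak[1] - center[1]

def track_step(prev_coord, fused_response):
    """Step (4): new target coordinate = previous target coordinate
    plus the peak's offset from the fused response-map center."""
    dy, dx = peak_offset(fused_response)
    return prev_coord[0] + dy, prev_coord[1] + dx
```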
Preferably, the convolution features of the invention are extracted with a VGG convolutional neural network, the convolution feature being the conv3-3 feature.
Preferably, the difference image undergoes an erosion operation followed by a dilation operation before multi-scale convolution feature extraction.
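The preferred preprocessing of the difference image, erosion followed by dilation (a morphological opening), can be sketched in plain NumPy; a practical implementation would typically call a library such as OpenCV, and the 3×3 structuring element used here is an assumed example:

```python
import numpy as np

def erode(img, k=3):
    """Grey-scale erosion with a k×k flat structuring element (minimum filter)."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out

def dilate(img, k=3):
    """Grey-scale dilation: the maximum filter, dual of erosion."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def clean_difference(frame_n, frame_prev, k=3):
    """Absolute frame difference followed by erosion then dilation,
    suppressing isolated noise pixels before feature extraction."""
    diff = np.abs(frame_n.astype(np.float64) - frame_prev.astype(np.float64))
    return dilate(erode(diff, k), k)
```

Opening removes isolated noise pixels in the difference image while approximately preserving larger moving regions.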
Compared with the prior art, the invention has the following advantages:
(1) By dynamically fusing the correlation-filter response maps of different features, the invention exploits the strengths of each feature and improves infrared tracking accuracy; the dynamically fused tracker is markedly better than trackers that use the difference feature or the convolution feature alone.
(2) The invention selects a well-suited convolution feature for correlation-filter tracking and obtains the best tracking accuracy.
Drawings
FIG. 1 shows the first frame image labeled with target coordinates (a) and its target response map (b) in the embodiment;
FIG. 2 is the second frame image in the embodiment;
FIG. 3 is the difference image of the second and first frames in the embodiment;
FIG. 4 is the convolution-feature response map for the third frame of the embodiment;
FIG. 5 is the difference-feature response map for the third and second frames of the embodiment;
FIG. 6 is the fused response map of the embodiment;
FIG. 7 compares the accuracy (a) and success rate (b) of the algorithm of the invention with other trackers.
Detailed Description
The target response map acquisition (or generation) method, the multi-scale images, multi-scale convolution feature extraction, and the initialization, update and correlation operations described in the present invention are all methods known in the art; specifically, the scheme disclosed in document 1 (Danelljan, Martin, et al. "ECO: Efficient Convolution Operators for Tracking." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2017) may be adopted. It should be further explained that multi-scale convolution features are the result of performing convolution feature extraction on each of the multi-scale images. Initialization and update may use the same operation; following document 1, the specific procedure can be understood as: generating a target response map of the image from the target coordinates of the image, extracting a multi-scale image at the target coordinates of the image, sequentially extracting the convolution features of the multi-scale images with the corresponding neural network, and then performing a time-domain convolution operation on the correlation filter with the target response map (the expected value) of the image and the multi-scale convolution features of the image to obtain the initial or updated convolution-feature correlation filter model.
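To make the initialize/update/respond cycle above concrete, the following is a minimal single-channel MOSSE-style correlation filter in the frequency domain (the formulation of document 2), used as a simplified stand-in for the ECO-style filter of document 1; the class name, regularization constant and learning rate are illustrative assumptions:

```python
import numpy as np

def gaussian_response(shape, center, sigma=2.0):
    """Desired target response: a 2-D Gaussian peaked at the target center."""
    ys, xs = np.ogrid[:shape[0], :shape[1]]
    return np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma ** 2))

class CorrelationFilter:
    """Single-channel MOSSE-style correlation filter (Bolme et al., 2010)."""
    def __init__(self, lam=1e-3, lr=0.125):
        self.lam, self.lr = lam, lr  # regularization and learning rate (assumed values)
        self.A = self.B = None       # running numerator / denominator in the Fourier domain

    def _terms(self, feature, response):
        F = np.fft.fft2(feature)
        G = np.fft.fft2(response)
        return G * np.conj(F), F * np.conj(F) + self.lam

    def init(self, feature, response):
        """Initialization: set the filter from one (feature, desired-response) pair."""
        self.A, self.B = self._terms(feature, response)

    def update(self, feature, response):
        """Update: running average of numerator and denominator."""
        A, B = self._terms(feature, response)
        self.A = (1 - self.lr) * self.A + self.lr * A
        self.B = (1 - self.lr) * self.B + self.lr * B

    def respond(self, feature):
        """Correlation operation: apply the filter to a new feature map."""
        H = self.A / self.B
        return np.real(np.fft.ifft2(H * np.fft.fft2(feature)))
```

A target that shifts between frames moves the peak of the returned response map by the same shift, which is exactly what the coordinate-update step exploits.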
The invention preferably extracts features using the last convolution layers of the first to fifth groups of the VGG-16 network with the fully connected layers and the softmax layer removed, taking the conv3-3 convolution feature as the convolution feature used for tracking.
It should also be explained that the meaning of the peak-to-sidelobe ratio (PSR) PSR_m in formula (I), of the sidelobe, and of the related terms of the corresponding response maps is disclosed in document 2 (Bolme, David S., et al. "Visual Object Tracking Using Adaptive Correlation Filters." The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010. IEEE, 2010).
Example:
The example applies the method of the invention to track a target on the Linköping Thermal InfraRed dataset (LTIR); each frame image to be tracked in the data lies in the same coordinate system, specifically through coordinate normalization. The specific tracking method comprises the following steps:
Step one: marking the target coordinates of the first frame image and initializing a correlation filter with the first frame image to obtain the initial convolution-feature correlation filter model. Specifically, a target response map of the first frame image is generated from the target coordinates of the first frame image, as shown in FIG. 1; the multi-scale convolution features at the target coordinates of the first frame image are extracted; and a correlation filter is initialized with the target response map and the multi-scale convolution features of the first frame image to obtain the initial convolution-feature correlation filter model.
Step two: taking the target coordinates of the first frame as the temporary target coordinates of the second frame image (shown in FIG. 2) and extracting the multi-scale convolution features of the second frame;
performing a correlation operation on the multi-scale convolution features of the second frame with the initial convolution-feature correlation filter to obtain the convolution-feature response map of the second frame; the difference between the peak coordinate of the response map and the center coordinate of the response map, added to the target coordinate of the first frame, gives the target coordinate of the second frame;
obtaining the difference image of the second and first frames (as shown in FIG. 3), taking the target coordinates of the second frame as the target coordinates of the current difference image, and initializing a correlation filter with the current difference image to obtain the initial difference-image convolution-feature correlation filter model;
updating the initial convolution-feature correlation filter model with the second frame image;
Step three: performing steps (1) to (5) in a loop to track the third and subsequent frames:
(1) taking the target coordinates of the previous frame as the temporary target coordinates of the current frame and extracting the multi-scale convolution features of the nth frame image (i.e., the current frame); obtaining the difference image of the nth and (n-1)th frames and extracting its multi-scale convolution features;
(2) performing a correlation operation on the multi-scale convolution features of the nth frame with the current convolution-feature correlation filter to obtain a first response map (the first response map of the third frame is shown in FIG. 4); performing a correlation operation on the multi-scale convolution features of the current difference image with the current difference-feature correlation filter to obtain a second response map (the second response map of the third frame is shown in FIG. 5);
(3) fusing the first and second response maps by formula (I) to obtain a fused response map R (the fused response map of the second and third frames is shown in FIG. 6);
(4) adding the difference between the peak coordinate of the fused response map and the center coordinate of the response map to the coordinate of the previous frame image to obtain the coordinate of the current frame;
(5) updating the current convolution-feature correlation filter model with the current frame image, and updating the current difference-image convolution-feature correlation filter model with the current difference image.
Preferably, in the embodiment, when extracting the relevant convolution features (including the convolution features of each frame image at each scale and the convolution features of each frame's difference image), the conv3-3 convolution features are extracted with a VGG convolutional neural network; the multi-scale ratios for each frame image are: 1, 1.02, 1.04, 1.06, 0.98, 0.96, 0.94.
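The multi-scale extraction with the ratios above can be sketched as follows; the base patch size, the nearest-neighbour resizing, and the function names are illustrative assumptions (a full implementation would extract conv3-3 features from each resized crop):

```python
import numpy as np

SCALES = [1.0, 1.02, 1.04, 1.06, 0.98, 0.96, 0.94]  # ratios from the embodiment

def resize_nn(img, out_h, out_w):
    """Nearest-neighbour resize (a stand-in for the interpolation an
    actual implementation would use)."""
    h, w = img.shape
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return img[rows][:, cols]

def multiscale_patches(img, center, base=32):
    """Crop a patch at each scale around `center`, resized back to the base
    size so every scale yields a feature map of identical shape."""
    cy, cx = center
    patches = []
    for s in SCALES:
        half = int(round(base * s / 2))
        y0, y1 = max(0, cy - half), min(img.shape[0], cy + half)
        x0, x1 = max(0, cx - half), min(img.shape[1], cx + half)
        patches.append(resize_nn(img[y0:y1, x0:x1], base, base))
    return np.stack(patches)
```

Resampling every scaled crop back to a common size is what lets a single correlation filter be applied across all seven scales.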
On the LTIR infrared dataset, the method is compared with four tracking algorithms (KCF, DSST, CSK and SiamFC); the comparison results, shown in FIG. 7, indicate that the method achieves better accuracy on the image sequences.
Claims (3)
1. A multi-feature-based single-target tracking method, which tracks each frame image of a video to be tracked in temporal order, each frame image lying in the same coordinate system, the method comprising the following steps:
Step one: marking the target coordinates of the first frame image, initializing a correlation filter with the first frame image, and obtaining an initial convolution-feature correlation filter model;
Step two: extracting the multi-scale convolution features of the second frame image, using the first-frame target coordinates as the temporary target coordinates of the second frame image during extraction;
performing a correlation operation on the multi-scale convolution features of the second frame image with the initial convolution-feature correlation filter model to obtain the convolution-feature response map of the second frame, wherein the difference between the peak coordinate of the response map and the center coordinate of the response map, added to the target coordinate of the first frame image, gives the target coordinate of the second frame;
obtaining the difference image of the second and first frame images, initializing a correlation filter with the difference image to obtain the initial difference-image convolution-feature correlation filter model, the target coordinate of the second frame serving as the target coordinate of the current difference image during initialization;
updating the initial convolution-feature correlation filter model with the second frame image;
Step three: performing steps (1) to (5) in a loop to sequentially track the target in the third and subsequent frames:
(1) extracting the multi-scale convolution features of the nth frame, where n ≥ 3, using the target coordinates of frame n-1 as the temporary target coordinates of the current frame during extraction; obtaining the difference image of the nth and (n-1)th frames and extracting its multi-scale convolution features;
(2) performing a correlation operation on the multi-scale convolution features of the nth frame with the current convolution-feature correlation filter model to obtain a first response map; performing a correlation operation on the multi-scale convolution features of the current difference image with the current difference-image convolution-feature correlation filter model to obtain a second response map;
(3) fusing the first response map and the second response map by formula (I) to obtain a fused response map R;
In formula (I):
m = 1 or 2, where m = 1 denotes the convolution feature or the first response map, and m = 2 denotes the convolution feature of the difference image or the second response map;
f_m denotes a weight, with f_1 = 1.1 and f_2 = 1;
R_m is the corresponding feature response map;
max(R_1, R_2) denotes the larger of the maximum response values of the two response maps;
PSR_m denotes the peak-to-sidelobe ratio of the corresponding feature response map, PSR_m = (max(R_m) - μ_m) / σ_m, where max(R_m) is the maximum response value of the corresponding response map, μ_m is the mean of the response values of the sidelobe pixels, and σ_m is the standard deviation of the response values of the sidelobe pixels;
(4) the difference between the peak coordinate of the fused response map and the center coordinate of the fused response map, added to the target coordinate of frame n-1, gives the target coordinate of frame n;
(5) updating the current convolution-feature correlation filter model with the current frame image, and updating the current difference-image convolution-feature correlation filter model with the current difference image.
2. The multi-feature-based single-target tracking method according to claim 1, wherein the convolution features are extracted with a VGG convolutional neural network, the convolution feature being the conv3-3 feature.
3. The multi-feature-based single-target tracking method according to claim 1, wherein the difference image is subjected to erosion and dilation operations in sequence before multi-scale convolution feature extraction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910939321.0A CN110807794A (en) | 2019-09-30 | 2019-09-30 | Single target tracking method based on multiple features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110807794A true CN110807794A (en) | 2020-02-18 |
Family
ID=69488022
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910939321.0A Pending CN110807794A (en) | 2019-09-30 | 2019-09-30 | Single target tracking method based on multiple features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110807794A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7362885B2 (en) * | 2004-04-20 | 2008-04-22 | Delphi Technologies, Inc. | Object tracking and eye state identification method |
CN102081801A (en) * | 2011-01-26 | 2011-06-01 | 上海交通大学 | Multi-feature adaptive fused ship tracking and track detecting method |
CN103455797A (en) * | 2013-09-07 | 2013-12-18 | 西安电子科技大学 | Detection and tracking method of moving small target in aerial shot video |
CN109242883A (en) * | 2018-08-14 | 2019-01-18 | 西安电子科技大学 | Optical remote sensing video target tracking method based on depth S R-KCF filtering |
Non-Patent Citations (1)
Title |
---|
Song Jianfeng et al.: "Multi-feature fusion correlation filtering algorithm for infrared single-target tracking" (多特征融合的相关滤波红外单目标跟踪算法), Journal of Xidian University (《西安电子科技大学学报》) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Afifi et al. | What else can fool deep learning? Addressing color constancy errors on deep neural network performance | |
CN107578423B (en) | Multi-feature hierarchical fusion related filtering robust tracking method | |
CN112347861B (en) | Human body posture estimation method based on motion feature constraint | |
CN109903331B (en) | Convolutional neural network target detection method based on RGB-D camera | |
CN110782477A (en) | Moving target rapid detection method based on sequence image and computer vision system | |
CN107609571B (en) | Adaptive target tracking method based on LARK features | |
CN108257155B (en) | Extended target stable tracking point extraction method based on local and global coupling | |
CN112634325A (en) | Unmanned aerial vehicle video multi-target tracking method | |
CN110246154B (en) | Visual target tracking method based on ICA-R multi-feature fusion and self-adaptive updating | |
CN104091145A (en) | Human palm vein feature image acquisition method | |
CN114782298B (en) | Infrared and visible light image fusion method with regional attention | |
CN113449658A (en) | Night video sequence significance detection method based on spatial domain, frequency domain and time domain | |
CN116189019A (en) | Unmanned aerial vehicle ground target tracking method based on improved twin neural network | |
CN114529584A (en) | Single-target vehicle tracking method based on unmanned aerial vehicle aerial photography | |
CN116052025A (en) | Unmanned aerial vehicle video image small target tracking method based on twin network | |
CN110751271B (en) | Image traceability feature characterization method based on deep neural network | |
CN111429485A (en) | Cross-modal filtering tracking method based on self-adaptive regularization and high-reliability updating | |
CN108876776B (en) | Classification model generation method, fundus image classification method and device | |
CN112766102B (en) | Unsupervised hyperspectral video target tracking method based on spatial spectrum feature fusion | |
CN110807794A (en) | Single target tracking method based on multiple features | |
CN115984439A (en) | Three-dimensional countertexture generation method and device for disguised target | |
CN113610888B (en) | Twin network target tracking method based on Gaussian smoothing | |
CN113160050B (en) | Small target identification method and system based on space-time neural network | |
CN114913337A (en) | Camouflage target frame detection method based on ternary cascade perception | |
CN108010051A (en) | Multisource video subject fusion tracking based on AdaBoost algorithms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200218 |