CN109685831A - Target tracking method and system based on residual hierarchical attention and correlation filter - Google Patents
- Publication number
- CN109685831A (application number CN201811592319.2A)
- Authority
- CN
- China
- Prior art keywords
- sample
- target
- network
- attention
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
Abstract
The present disclosure proposes a target tracking method and system based on residual hierarchical attention and correlation filters. The disclosure uses a convolutional neural network trained end to end, with the correlation filter as a layer of the network, to track a moving target in real time. Through residual hierarchical attention learning, more effective and more robust convolutional target features are obtained, which significantly improves the generalization ability of target tracking. In addition, the multi-context correlation filter layer jointly performs context awareness and regression-target adaptation, which significantly improves the discriminative ability of target tracking.
Description
Technical field
The present disclosure relates to a target tracking method and system based on residual hierarchical attention and correlation filters.
Background art
The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
Tracking a moving target is an important branch and research hotspot of computer vision, and is widely used in many fields such as motion event detection, video surveillance, and biological vision. However, because shape changes, illumination changes, occlusion, background interference, and similar problems often arise during tracking, target tracking remains a highly challenging task.
In recent years, target tracking methods based on correlation filters have attracted wide attention and developed rapidly. Such methods achieve high tracking accuracy while maintaining high processing speed. During tracking, contextual information contains many important foreground and background cues that help improve the accuracy of target localization. However, correlation-filter-based tracking methods usually cannot perform context awareness. Some of them do use contextual information during tracking, but the search region of each frame contains only a small amount of context, and the cosine window used to weaken boundary effects further reduces the contextual information contained in the search region.
In the last five years, deep networks and machine learning techniques have gradually been applied to target tracking and have greatly improved its performance. Compared with traditional target tracking methods, these methods achieve markedly higher tracking precision and success rate. However, many deep-learning-based tracking methods take a pre-trained network such as VGG or AlexNet and then stack other existing tracking methods on top of it. This makes it difficult to meet the real-time requirement of target tracking, does not amount to true end-to-end network training, and does not exploit the full advantage of deep networks.
Summary of the invention
To solve the above problems, the present disclosure proposes a target tracking method and system based on residual hierarchical attention and correlation filters. The disclosure uses a convolutional neural network trained end to end, with the correlation filter as a layer of the network, to track a moving target in real time. Through residual hierarchical attention learning, more effective and more robust convolutional target features are obtained, which significantly improves the generalization ability of target tracking. In addition, the multi-context correlation filter layer jointly performs context awareness and regression-target adaptation, which significantly improves the discriminative ability of target tracking.
According to some embodiments, the present disclosure adopts the following technical solution:
A target tracking method based on residual hierarchical attention and correlation filters, comprising the following steps:
(1) reading the current frame image, obtaining the position and scale of the target in the previous frame image, and thereby determining the test sample in the current frame;
(2) inputting the test sample into the trained convolutional neural network to obtain the convolutional features of the test sample, inputting these features into the multi-context correlation filter layer, obtaining the network response from the model parameters, and determining the position and scale of the target in the current frame;
(3) obtaining a training sample according to the position and scale of the target in the current frame, and inputting the training sample into the convolutional neural network and the residual hierarchical attention module to obtain training sample features containing attention information;
(4) extracting transformation samples according to the position of the target in the current frame, inputting them into the convolutional neural network, and obtaining an adaptive regression target based on the network responses of the transformation samples; then extracting context samples, obtaining the context sample features, and obtaining filter parameters containing multi-context information from the training sample features containing attention information and the adaptive regression target;
(5) updating the original model parameters using the obtained filter parameters.
As a further limitation, the method further comprises step (6): moving on to the next frame image and repeating steps (1)-(5) until all images have been processed.
As a further limitation, in step (1), determining the test sample of the target in the current frame comprises: in the current frame image, extracting an image patch that is centered on the target position of the previous frame and whose scale is N times the target scale of the previous frame, N being greater than 1, and resizing the image patch to a specified pixel size to serve as the test sample of the current frame.
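The patch geometry described above can be sketched as follows. This is a minimal illustration only: the function name `search_window`, the `(x, y)` and `(width, height)` conventions, and the example value N = 3 are assumptions, since the text only requires N to be greater than 1.

```python
def search_window(center, target_size, n=2.0):
    """Return (left, top, right, bottom) of a patch n times the target size,
    centred on the target position found in the previous frame."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    cx, cy = center
    w, h = target_size
    pw, ph = n * w, n * h  # patch width and height
    return (cx - pw / 2, cy - ph / 2, cx + pw / 2, cy + ph / 2)

# Hypothetical target of 40x20 pixels centred at (100, 80), with N = 3.
box = search_window(center=(100, 80), target_size=(40, 20), n=3.0)
```

The cropped patch would then be resized to the fixed input size expected by the network before feature extraction.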
As a further limitation, in step (2), the structure of the convolutional neural network comprises:
the structure of the first and second convolutional layers of the VGG-16 network, with all pooling layers removed;
the above convolutional layers duplicated into a symmetric Siamese network structure, so that the network has a training branch and a test branch of identical structure;
an hourglass structure with three pooling layers, added after the convolutional layers of the training branch and serving as the residual hierarchical attention module of the network;
a multi-context correlation filter layer as the last layer of the network, whose inputs are the output of the attention module and the output of the test branch.
As a further limitation, in step (2), the convolutional neural network is pre-trained end to end using a training data set.
As a further limitation, in step (2), the pre-training process comprises:
preprocessing the training data: extracting a pair of image frames every several frames, extracting image patches over a range larger than the target size, and resizing the image patches to a set pixel size to serve as samples for training the network;
training the network using stochastic gradient descent;
training the convolutional neural network without the residual hierarchical attention module for multiple iterations over the complete data set;
adding the residual hierarchical attention module to the deep convolutional network, fixing the already trained layers of the convolutional network, and training for multiple iterations over the complete data set.
As a further limitation, in step (2), the test sample is input into the test branch of the network and passes through two convolutional layers to obtain the features of the test sample.
As a further limitation, in step (2), determining the position and scale of the target in the current frame comprises: inputting the test sample features into the multi-context correlation filter layer and computing the network response from the model parameters; in the tracking stage, image patches of several different scales are extracted and processed as test samples to obtain their features and network responses, and the scale and position of the target in the current frame are, respectively, the scale of the target in the test sample that yields the maximum network response and the position corresponding to that maximum response.
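The selection rule above amounts to a joint maximum over scales and positions. The sketch below assumes the response maps have already been computed and are given as nested lists keyed by a scale factor; the function name and the example scale factors are hypothetical.

```python
def pick_scale_and_position(responses):
    """responses maps a scale factor to its 2-D response map (list of rows).
    Returns (scale, (row, col), value) of the overall maximum response."""
    best = None
    for scale, response_map in responses.items():
        for r, row in enumerate(response_map):
            for c, value in enumerate(row):
                if best is None or value > best[2]:
                    best = (scale, (r, c), value)
    return best

# Three hypothetical 2x3 response maps, one per candidate scale.
responses = {
    0.95: [[0.1, 0.4, 0.0], [0.0, 0.2, 0.1]],
    1.00: [[0.0, 0.3, 0.2], [0.1, 0.0, 0.9]],
    1.05: [[0.7, 0.0, 0.1], [0.0, 0.1, 0.0]],
}
best_scale, best_pos, best_val = pick_scale_and_position(responses)
```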
As a further limitation, in step (3), obtaining the training sample comprises: in the current frame image, extracting an image patch that is centered on the target position in the current frame and whose scale is N times the current target scale, N being greater than 1, and resizing the image patch to a specified pixel size to serve as the training sample of the target in the current frame image.
As a further limitation, in step (3), obtaining the training sample features containing attention information comprises:
inputting the training sample x_0 into the convolutional layers of the training branch to obtain the output P(x_0); then inputting P(x_0) into the residual hierarchical attention module to obtain the training sample features containing attention information:
$Q(x_0) = \sum_u M_u(x_0) * P(x_0) + P(x_0)$
where * denotes the channel-wise Hadamard product, M(x_0) denotes the attention map generated by the attention module, u indexes the up-sampling layers in the attention module, and Q(x_0) denotes the training sample features with attention information that are input into the multi-context filter layer.
As a further limitation, in step (4), the adaptive regression target is obtained as follows:
extracting several transformation samples centered on set points, the scale of the transformation samples being consistent with that of the training sample, and constructing a constraint matrix whose center and scale are consistent with those of the training sample and whose elements are initialized to 0;
inputting the transformation samples into the test branch of the network to obtain their features and network response maps, and taking the value at the center of each response map as the value of the constraint matrix at the corresponding position;
computing the values of the remaining elements from the known element values of the constraint matrix, based on a Gaussian distribution, finally obtaining a constraint matrix that reflects the distribution and motion of the target;
obtaining the adaptive regression target from the constraint matrix, the adaptive regression target and the constraint matrix obeying a noise model.
As a further limitation, in step (4), obtaining the filter parameters containing multi-context information comprises:
extracting context samples centered on set points, the scale of the context samples being consistent with that of the training sample, and inputting the context samples into the test branch of the network to obtain the features of the context samples;
in the training stage, obtaining the filter parameters from the training sample features containing attention, the features of the context samples, and the constraint matrix in the adaptive regression target.
In one or more embodiments, a target tracking system based on residual hierarchical attention and correlation filters comprises a server, the server comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the above target tracking method based on residual hierarchical attention and correlation filters.
In one or more embodiments, a computer-readable storage medium stores a computer program which, when executed by a processor, performs the above target tracking method based on residual hierarchical attention and correlation filters.
Compared with the prior art, the beneficial effects of the disclosure are:
(1) for target tracking, a convolutional neural network trained end to end is proposed that can meet the real-time requirement of target tracking, and the correlation filter is incorporated into the proposed network as its correlation filter layer, improving the discriminative ability of the network;
(2) residual hierarchical attention learning is proposed, which exploits residual information and the information from multiple up-sampling layers in the attention module, improving the generalization ability of the network;
(3) by constructing a new objective function for the correlation filter layer, a multi-context correlation filter layer is proposed that performs context awareness and regression-target adaptation and incorporates multi-context information into target localization and model learning, further improving the performance of the network;
(4) the disclosure tracks moving targets effectively and stably in many complex environments, such as large-area occlusion, target shape changes, fast target rotation, illumination changes, and background interference.
Description of the drawings
The accompanying drawings, which form a part of the application, are provided for further understanding of the application; the exemplary embodiments of the application and their description are used to explain the application and do not unduly limit it.
Fig. 1 is a schematic diagram of the target tracking method based on residual hierarchical attention and correlation filters;
Fig. 2 is a schematic diagram of the proposed residual hierarchical attention module;
Fig. 3 compares the contextual information used in traditional methods with that used in the present method;
Fig. 4 shows the process of extracting transformation samples and obtaining the adaptive regression target;
Fig. 5 shows the tracking precision and tracking success rate on the OTB50, OTB2013, and OTB2015 data sets;
Fig. 6 shows some results of tracking different types of targets on the OTB data sets.
Detailed description:
The present disclosure is further described below with reference to the accompanying drawings and embodiments.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the application belongs.
It should be noted that the terminology used herein is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. It should further be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
In the present disclosure, terms such as "upper", "lower", "left", "right", "front", "rear", "vertical", "horizontal", "side", and "bottom" indicate orientations or positional relationships based on those shown in the drawings. They are relational terms used only to facilitate describing the structural relationship of the parts or elements of the disclosure, do not refer specifically to any part or element of the disclosure, and should not be construed as limiting the disclosure.
In the present disclosure, terms such as "fixed", "connected", and "coupled" should be understood broadly: they may denote a fixed connection, an integral connection, or a detachable connection, and the connection may be direct or indirect through an intermediary. Researchers or technicians in the field can determine the specific meaning of these terms in the disclosure according to the circumstances, and the terms should not be construed as limiting the disclosure.
Embodiment one
One or more embodiments disclose a target tracking method based on residual hierarchical attention and correlation filters. As shown in Fig. 1, real-time target tracking is performed by the proposed end-to-end convolutional network, which contains a residual hierarchical attention mechanism and a multi-context correlation filter layer.
In a given frame, a test sample is input into the test branch of the proposed symmetric Siamese network structure to obtain the test sample features. The convolutional layers of the test branch adopt the structure of the first and second convolutional layers of the VGG-16 network, with all pooling layers removed.
The resulting convolutional feature z is input into the multi-context correlation filter layer to obtain the filter response, i.e. the network output response G(z), which is given by the following formula:
where ω is the model parameter, the "-" accent denotes the discrete Fourier transform, and the product operator in the formula denotes element-wise multiplication of matrices.
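A minimal sketch of evaluating such a response with FFTs is given here, assuming the common convention that the response is the inverse transform of the element-wise product of the Fourier-domain model and the Fourier-domain feature. The exact form used by the disclosure, including any conjugation, is not reproduced in this text, so the function `network_response`, its single-channel input, and the convention itself are assumptions.

```python
import numpy as np

def network_response(omega_hat, z):
    """Response map for a single-channel feature map z, given the model
    parameters omega_hat already expressed in the Fourier domain."""
    z_hat = np.fft.fft2(z)
    # Element-wise product in the Fourier domain, then back to the spatial domain.
    return np.real(np.fft.ifft2(omega_hat * z_hat))

# Sanity check: a filter whose spectrum is all ones leaves the input unchanged.
z = np.arange(16, dtype=float).reshape(4, 4)
identity_hat = np.ones((4, 4), dtype=complex)
g = network_response(identity_hat, z)
```

Working in the Fourier domain replaces a dense spatial correlation with element-wise products, which is what makes correlation filter layers fast enough for real-time use.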
For the test samples of different scales in the current frame, the above operation is repeated to obtain their features z_s and the corresponding network output responses G(z_s). The position of the maximum over all network responses is taken as the position of the target in the current frame, and the scale of the target in the test sample that yields the maximum response is taken as the best scale of the target, according to the following formula:
In the training stage, a training sample x_0 is extracted according to the current position and scale of the target and input into the convolutional layers of the training branch to obtain the output P(x_0); the convolutional layer structure of the training branch is identical to that of the test branch. P(x_0) is then input into the residual hierarchical attention module to obtain the training sample features containing attention information:
$Q(x_0) = \sum_u M_u(x_0) * P(x_0) + P(x_0)$
where * denotes the channel-wise Hadamard product, M(x_0) denotes the attention map generated by the attention module, u indexes the up-sampling layers in the attention module, and Q(x_0) denotes the training sample features with attention information that are input into the multi-context filter layer.
According to the current position and scale of the target, context samples and transformation samples are extracted and input into the test branch of the proposed symmetric Siamese network structure to obtain the context sample convolutional features and the transformation sample features, respectively; the adaptive regression target is then constructed from the features of the transformation samples.
In the multi-context correlation filter layer, the disclosure proposes a new objective function. Solving for its optimum jointly performs context awareness, regression-target adaptation, and learning of the filter parameters. The objective function is as follows:
where w denotes the filter parameters; y_0 denotes the constraint matrix used to construct the regression target y; X_0 denotes the samples collected from the image; X_i denotes the context samples; and the regularization parameters θ1, θ2, θ3 ∈ (0, 1] are constants used to prevent over-fitting. Here X_0 and X_i are circulant samples whose base samples are x_0 and x_i, respectively.
By solving for the optimum of the objective function of the proposed multi-context correlation filter layer, using the properties of circulant matrices and the matrix inversion formula, the closed-form solution of the filter parameters is obtained as:
where the formula uses the complex conjugate of x_0. In the training stage, the filter parameters w are obtained from the training sample feature x_0, the context features x_i, and the constraint matrix y_0 in the adaptive regression target. Here x_0 is the feature of the training sample input into the correlation filter layer, not the training sample itself, and x_i is the feature of a context sample input into the correlation filter layer, not the context sample itself.
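To illustrate what a closed-form solution of this kind looks like, the sketch below implements the standard context-aware correlation filter solution known from the literature, with a fixed Gaussian regression target. It is not the closed form of this disclosure, which additionally involves the constraint matrix y_0 and a third regularization weight. The names `train_filter`, `respond`, `lam_reg`, and `lam_ctx`, the single-channel features, and the random test data are all assumptions made for the example.

```python
import numpy as np

def gaussian_target(shape, sigma=2.0):
    """Gaussian regression target with its peak at index (0, 0), wrapped cyclically."""
    h, w = shape
    r = np.minimum(np.arange(h), h - np.arange(h))
    c = np.minimum(np.arange(w), w - np.arange(w))
    return np.exp(-(r[:, None] ** 2 + c[None, :] ** 2) / (2.0 * sigma ** 2))

def train_filter(x0, contexts, y, lam_reg=1e-3, lam_ctx=0.5):
    """Fourier-domain filter of the standard context-aware ridge regression."""
    x0_hat = np.fft.fft2(x0)
    y_hat = np.fft.fft2(y)
    # The spectral energy of the context patches acts as an extra,
    # data-dependent regulariser that suppresses responses to the background.
    ctx_energy = sum(np.abs(np.fft.fft2(c)) ** 2 for c in contexts)
    denom = np.abs(x0_hat) ** 2 + lam_reg + lam_ctx * ctx_energy
    return np.conj(x0_hat) * y_hat / denom

def respond(w_hat, z):
    """Spatial response of the learned filter to a feature map z."""
    return np.real(np.fft.ifft2(w_hat * np.fft.fft2(z)))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 16))
contexts = [rng.standard_normal((16, 16)) for _ in range(4)]
w_hat = train_filter(x0, contexts, gaussian_target((16, 16)))

# A cyclically shifted copy of the training patch should move the peak.
shifted = np.roll(x0, shift=(2, 3), axis=(0, 1))
g = respond(w_hat, shifted)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(g), g.shape))
```

Because circulant matrices are diagonalized by the discrete Fourier transform, the matrix inverse reduces to an element-wise division, which is the property the text relies on.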
Based on the new filter parameters, the original model parameters are updated, completing the training process. The update formula is as follows:
where ω is the original model parameter and λ ∈ [0, 1] is a constant denoting the learning rate.
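A minimal sketch of such an update follows, assuming the linear-interpolation form that is common in correlation filter trackers (the exact formula is not reproduced in this text). The function name `update_model` is hypothetical; the default λ = 0.012 is the value reported in the experimental configuration.

```python
def update_model(omega, w, lam=0.012):
    """Blend the previous model omega with the newly learned filter w.
    Works for plain numbers and for array types that support arithmetic."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * omega + lam * w

# With lam = 0.5 the result is the midpoint of the old and new parameters.
midpoint = update_model(1.0, 2.0, lam=0.5)
# With the default small learning rate the model changes only slightly.
slow = update_model(1.0, 0.0)
```

A small learning rate keeps the model stable across frames, at the cost of adapting more slowly to appearance changes.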
The method of the application is described in detail below.
Attention module: inspired by residual networks, which use residual skip connections to enhance network performance, the disclosure proposes residual hierarchical attention learning to obtain more general and more effective attention-aware convolutional features. Fig. 2 is a schematic diagram of the proposed residual hierarchical attention module.
In the traditional attention mechanism, the output Q(x_0) of the attention module can be expressed as:
$Q(x_0) = M(x_0) * P(x_0)$
where P(x_0) is the input of the attention module, i.e. the features of sample x_0 output by the convolutional layers of the training branch, M(x_0) is the attention map generated by the attention module, and Q(x_0) is the convolutional feature containing attention information output by the attention module.
However, in the traditional attention mechanism the convolutional features are multiplied by an attention map whose element values lie in the range 0 to 1, which reduces the values of the convolutional features and in many cases degrades the performance of the original convolutional network. To address this problem, and inspired by residual networks, an attention module based on residual information is proposed. In addition, the attention module proposed by the disclosure uses a bottom-up, top-down hourglass structure. Just as the convolutional features output by different convolutional layers contain different sample information, the outputs of different up-sampling layers in the attention module reflect different attention information; the disclosure therefore integrates them to obtain a more accurate attention map.
The residual hierarchical attention learning proposed by the disclosure can be expressed by the following formula, in which the hierarchical attention maps are fused with the features input from the convolutional layers to form the final convolutional features containing attention information:
$Q(x_0) = \sum_u M_u(x_0) * P(x_0) + P(x_0)$
where u indexes the up-sampling layers in the attention module. The attention maps output by different up-sampling layers have different resolutions; the lower-resolution maps are processed with nearest-neighbor interpolation so that their resolution matches that of the higher-resolution maps.
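The fusion formula can be illustrated with a short NumPy sketch. It assumes single-channel attention maps that are broadcast over all feature channels and a simple index-based nearest-neighbor resize; the function names and the toy values are hypothetical, and the hourglass network that would produce the attention maps is not modelled.

```python
import numpy as np

def upsample_nearest(m, out_hw):
    """Nearest-neighbour resize of a 2-D map to out_hw = (H, W)."""
    h, w = m.shape
    H, W = out_hw
    rows = np.arange(H) * h // H  # source row for every output row
    cols = np.arange(W) * w // W  # source column for every output column
    return m[rows][:, cols]

def residual_hierarchical_attention(p, attention_maps):
    """Compute Q = sum_u M_u * P + P for features p of shape (C, H, W)."""
    _, H, W = p.shape
    q = p.copy()  # the residual term P
    for m in attention_maps:
        m_full = upsample_nearest(m, (H, W))
        q = q + m_full[None, :, :] * p  # same map applied to every channel
    return q

# Toy features and three attention maps of increasing resolution (u = 3).
p = np.ones((2, 4, 4))
maps = [np.full((1, 1), 0.5), np.full((2, 2), 0.25), np.zeros((4, 4))]
q = residual_hierarchical_attention(p, maps)
```

Because of the residual term, the output never falls below the input features even when every attention value is close to zero, which is the motivation given above for departing from the purely multiplicative form.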
As shown in Fig. 2, the attention module of the disclosure uses the skip connections of residual networks. The traditional attention mechanism uses only the output of the last up-sampling layer as the attention map and discards the information output by the earlier up-sampling layers. In contrast, the disclosure uses and combines the attention information, with its different meanings and effects, output by multiple up-sampling layers. In Fig. 2, the output of the earlier, lower-resolution up-sampling layers contains more global information, which helps locate the target and prevents drift caused by occlusion or other factors; the output of the later, higher-resolution up-sampling layers contains more precise local information, which helps distinguish the target from similar objects and adapt to changes of the target.
Contextual information provides additional auxiliary information for target localization during tracking, which helps improve the accuracy of target tracking, especially in complex environments. However, because traditional correlation-filter-based methods use a cosine window to alleviate boundary effects, they retain only a small amount of contextual information during tracking. Fig. 3 compares the contextual information contained in traditional methods with the contextual information used by the present method.
As shown in Fig. 3, the method of the disclosure extracts context samples in the model training stage for updating the model. Context samples x_{1:k} are extracted centered on A + p_n, where p_n is the position of the target in the n-th frame and A = [-size(x_0,1), 0; 0, -size(x_0,2); size(x_0,1), 0; 0, size(x_0,2)]; the scale of the context samples is consistent with that of the training sample.
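The four offsets in A place one context sample on each side of the training sample, each shifted by the full size of the training sample along one axis. A small sketch, assuming positions are given as (row, column) pairs and that size(x_0,1) and size(x_0,2) are the height and width of the training sample:

```python
def context_centers(p_n, sample_size):
    """Centres of the four context samples around target position p_n."""
    h, w = sample_size  # size(x0, 1), size(x0, 2)
    offsets = [(-h, 0), (0, -w), (h, 0), (0, w)]  # the rows of A
    return [(p_n[0] + dr, p_n[1] + dc) for dr, dc in offsets]

# Hypothetical target at row 100, column 200 with a 32x48 training sample.
centers = context_centers((100, 200), (32, 48))
```

Since every context sample has the same size as the training sample, the four patches tile the immediate neighbourhood of the target without overlapping it.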
Unlike traditional correlation-filter-based target tracking methods, which use a static Gaussian-shaped regression target, the method of the disclosure uses a dynamic regression target y that adapts to the motion and the distribution of the target, where y obeys the noise model $y = y_0 + n$ and y_0 is the constraint matrix used to construct the regression target y. Fig. 4 shows the process by which the method of the disclosure extracts transformation samples and obtains the adaptive regression target.
As shown in Fig. 4, j transformation samples m_{1:j} are extracted centered on T + p_n, the scale of the transformation samples being consistent with that of the training sample, where p_n is the position of the target in the n-th frame, j = 7, and T = [t_1, t_2, ..., t_j] = [0,0; 0,1; 1,1; 0,-1; -1,0; -1,-1] * ρ. The samples m_{1:j} are input into the test branch of the network to obtain the network response maps G(m_{1:j}), and the value at the center of each response map is taken as the value of the constraint matrix y_0 at the corresponding position, i.e.:
$y_0(t_{1:j} + p_n) = G(m_{1:j})$
From the known element values of y_0, the values of the remaining elements are obtained based on a Gaussian distribution, finally yielding the adaptive regression target. As the top view of the regression target in Fig. 4 shows, the adaptive regression target used in the disclosure reflects the distribution of the target better than a Gaussian-shaped regression target.
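The construction of the constraint matrix can be sketched as follows. The text does not specify how the remaining elements are derived from the Gaussian distribution, so this example makes a simple assumption: it starts from a Gaussian-shaped surface scaled by the response of the unshifted sample and then writes the measured center responses at the known positions. The step `rho`, the width `sigma`, the (row, column) reading of the offsets, and the response values are all hypothetical.

```python
import numpy as np

# Offsets as listed in the text, read here as (row, col), before scaling by rho.
OFFSETS = [(0, 0), (0, 1), (1, 1), (0, -1), (-1, 0), (-1, -1)]

def constraint_matrix(shape, center, center_responses, rho=1, sigma=2.0):
    """Build y0: measured responses at the known positions, and an assumed
    Gaussian-shaped fill for every other element."""
    rows, cols = np.indices(shape)
    dist2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    # Assumed fill rule: Gaussian surface scaled by the unshifted response.
    y0 = center_responses[0] * np.exp(-dist2 / (2.0 * sigma ** 2))
    # Known elements: centre value of each transformation sample's response map.
    for (dr, dc), value in zip(OFFSETS, center_responses):
        y0[center[0] + dr * rho, center[1] + dc * rho] = value
    return y0

# Hypothetical centre responses for the six listed offsets.
measured = [1.0, 0.8, 0.6, 0.7, 0.5, 0.4]
y0 = constraint_matrix((9, 9), (4, 4), measured, rho=2)
```

Unequal responses in different directions make the resulting surface asymmetric, which is how a target of this kind can encode the direction of motion rather than a fixed isotropic peak.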
The proposed target tracking method based on residual hierarchical attention and correlation filters is evaluated on three data sets: OTB50, OTB2013, and OTB2015. The training process of the proposed end-to-end convolutional network is described first, followed by the experimental configuration and the evaluation method, and finally the experimental results obtained on the three data sets are analyzed.
The structure and training process of the end-to-end convolutional neural network are as follows:
the structure of the first and second convolutional layers of the VGG-16 network is used, with all pooling layers removed;
the above convolutional layers are duplicated into a symmetric Siamese network structure, so that the network has a training branch and a test branch of identical structure;
an hourglass structure with three pooling layers is added after the convolutional layers of the training branch and serves as the residual hierarchical attention module of the network;
the last layer of the network is the multi-context correlation filter layer, whose inputs are the output of the attention module and the output of the test branch.
The pre-training process of the convolutional neural network is as follows:
the training data are preprocessed: a pair of image frames is extracted every 10 frames, image patches are extracted over a range of 3 times the target size, and the patches are resized to 128*128 pixels;
the network is trained using stochastic gradient descent, with the momentum set to 0.9, the weight decay set to 0.005, and the learning rate set to 1e-2;
the loss function for network training is a regression loss, given by the following formula:
where G(z) is the network response to sample z and y is the Gaussian-distributed regression target;
the convolutional neural network without the residual hierarchical attention module is trained for 50 iterations over the complete data set;
the residual hierarchical attention module is added to the deep convolutional network, the already trained layers of the convolutional network are fixed, and training continues for 20 iterations over the complete data set.
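For reference, one update step of stochastic gradient descent with momentum and weight decay, using the hyperparameters listed above, can be written out as below. The sketch follows the update convention used by common deep learning frameworks (weight decay added to the gradient, then a momentum buffer); whether the original experiments used exactly this convention is an assumption, and the scalar parameter is only for illustration.

```python
def sgd_step(w, grad, buf, lr=1e-2, momentum=0.9, weight_decay=0.005):
    """One SGD update for a parameter w with gradient grad.
    buf is the momentum buffer carried over from the previous step."""
    g = grad + weight_decay * w  # L2 weight decay folded into the gradient
    buf = momentum * buf + g     # momentum accumulation
    return w - lr * buf, buf

# First step from w = 1.0 with gradient 0.5 and an empty momentum buffer.
w1, buf1 = sgd_step(1.0, 0.5, 0.0)
```

In the second pre-training stage described above, the same update would apply only to the attention module, since the previously trained convolutional layers are kept fixed.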
The test configuration of the disclosure is as follows: the experiments were run on a computer with a 2.59 GHz i5 processor, 8 GB of memory, and an NVIDIA GTX 1070 GPU. Running in a PyTorch environment, the method reaches 36 frames per second, which meets the real-time requirement. All parameters used by the method were kept fixed throughout the experiments. In the hierarchical attention mechanism, the number of up-sampling layers used is u = 3; the regularization parameters θ1, θ2, and θ3 are 1e-3, 1, and 0.5, respectively; the learning parameter λ is 0.012.
The results of the proposed method (Ours) on the OTB data sets are compared with those of 17 other high-performance target tracking methods: SRDCFdecon, MUSTer, LCT, SRDCF, Staple_CA, CFNet, SiamFC, HDT, Staple, DCF_CA, SAMF, MEEM, DSST, KCF, TGPR, DLT, and STC. The results of SRDCFdecon, LCT, SRDCF, CFNet, SiamFC, HDT, Staple, and DSST are those published by their authors; the results of Staple_CA, DCF_CA, SAMF, and KCF were obtained by running the methods published by their authors on the experimental equipment; the results of MUSTer, MEEM, TGPR, DLT, and STC come from the authors of LCT. Using the area under the curve (AUC) and the success rate at a threshold of 20 pixels as metrics, the disclosure ranks the 18 methods by tracking success rate and tracking precision, respectively. Fig. 5 shows the tracking success rate and precision of the top 12 methods. Table 1 summarizes the types of the 18 target tracking methods used in the performance comparison.
Table 1: Types of the target tracking methods used for comparison
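The two ranking metrics can be illustrated with a short sketch. It assumes the common OTB-style convention, which the text does not spell out: the success score is the mean success rate over overlap thresholds 0, 0.05, ..., 1, and the precision score is the fraction of frames whose centre location error is at most 20 pixels. Boxes are (x, y, w, h) tuples and all function names are hypothetical.

```python
import math

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def center_error(a, b):
    """Euclidean distance between the centres of two (x, y, w, h) boxes."""
    ax, ay = a[0] + a[2] / 2.0, a[1] + a[3] / 2.0
    bx, by = b[0] + b[2] / 2.0, b[1] + b[3] / 2.0
    return math.hypot(ax - bx, ay - by)

def success_auc(pred, gt, num_thresholds=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)

def precision_at(pred, gt, threshold=20.0):
    """Fraction of frames whose centre error is within `threshold` pixels."""
    errors = [center_error(p, g) for p, g in zip(pred, gt)]
    return sum(e <= threshold for e in errors) / len(errors)

if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
    pred = [(0, 0, 10, 10), (30, 0, 10, 10)]  # one perfect frame, one miss
    print(success_auc(pred, gt), precision_at(pred, gt))
```

Averaging over thresholds rewards trackers that keep high overlap consistently, whereas the 20-pixel precision only checks how close the predicted centre stays to the ground truth.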
The evaluation of the target tracking method based on residual layered attention and a correlation filter is as follows:
On the OTB50 dataset, the AUC of the tracker proposed in the disclosure is 0.591, which is 5.5% better than the second-ranked SRDCFdecon (0.560). The disclosed tracker outperforms the other two context-aware trackers, Staple_CA (0.542) and DCF_CA (0.493), by 9.0% and 19.8%, respectively, which benefits from the convolutional features used by the disclosed method and from the adaptivity of the regression target. Apart from the disclosed method, the best-performing tracker with a Siamese structure is CFNet (0.530), whose network uses two convolutional layers; however, its result is 11.5% lower than that of the disclosure. In terms of precision, the disclosed tracker ranks second (0.790), 1.8% lower than the first-ranked HDT (0.804). However, the disclosed tracker has an advantage in execution speed: according to the data provided by the authors of HDT, HDT runs at 10 fps, so the disclosed method executes about three times faster than the HDT tracker.
On the OTB2013 dataset, the AUC of the tracker proposed in the disclosure is 0.671, which ranks first and is 2.8% higher than that of SRDCFdecon (0.653). The disclosed tracker outperforms the other two context-aware trackers, Staple_CA (0.615) and DCF_CA (0.592), by 9.1% and 13.3%, respectively. Apart from the disclosed method, the best-performing tracker with a Siamese structure is CFNet (0.611), but its result is 9.8% lower than that of the disclosed method. In terms of precision, the disclosed tracker (0.889) is only 0.7% lower than the HDT tracker (0.883), while being clearly superior to the HDT tracker in execution speed.
On the OTB2015 dataset, the AUC of the tracker proposed in the disclosure is 0.623, which ranks second and is only 0.6% lower than that of the first-ranked SRDCFdecon (0.627). However, the disclosed tracker executes 30 times faster than SRDCFdecon, whose authors report a processing speed of 1 fps. The disclosed tracker outperforms the other two context-aware trackers, Staple_CA (0.598) and DCF_CA (0.552), by 4.2% and 12.9%, respectively. Apart from the disclosed method, the best-performing tracker with a Siamese structure is SiamFC (0.582), but its result is 7.0% lower than that of the disclosure. In terms of precision, the disclosed tracker scores 0.815 and ranks third.
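The percentages quoted for the three OTB benchmarks appear to be relative gaps measured against the lower of the two scores. This reading is an inference from the numbers rather than something the text states, and the helper name below is hypothetical. The sketch recomputes a few of the AUC gaps from the scores given above.

```python
def relative_gap(higher, lower):
    """Gap between two scores, as a percentage of the lower score (assumed convention)."""
    return 100.0 * (higher - lower) / lower

# (benchmark, our AUC, competitor, competitor AUC, percentage quoted in the text)
QUOTED = [
    ("OTB50", 0.591, "SRDCFdecon", 0.560, 5.5),
    ("OTB50", 0.591, "Staple_CA", 0.542, 9.0),
    ("OTB50", 0.591, "DCF_CA", 0.493, 19.8),
    ("OTB50", 0.591, "CFNet", 0.530, 11.5),
    ("OTB2013", 0.671, "SRDCFdecon", 0.653, 2.8),
    ("OTB2013", 0.671, "CFNet", 0.611, 9.8),
    ("OTB2015", 0.627, "SRDCFdecon (ahead of Ours)", 0.623, 0.6),
    ("OTB2015", 0.623, "SiamFC", 0.582, 7.0),
]

if __name__ == "__main__":
    for benchmark, high, name, low, quoted in QUOTED:
        gap = relative_gap(high, low)
        print(f"{benchmark:8s} vs {name:28s} computed {gap:5.2f}%  quoted {quoted}%")
```

Under this convention the recomputed gaps agree with the quoted figures to within about 0.1 percentage point; small differences are expected because the scores in the text are themselves rounded to three decimals.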
The method proposed in the disclosure can track targets stably in complex scenes. Fig. 6 shows sample results of the disclosed method tracking various kinds of targets on the OTB datasets. Under many complex conditions, such as large-area occlusion, target shape change, fast target rotation, illumination change, and background clutter, the disclosed method is able to track the target effectively and stably.
Embodiment two
One or more embodiments disclose a target tracking system based on residual layered attention and a correlation filter, comprising a server, the server comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the target tracking method based on residual layered attention and a correlation filter described in Embodiment One.
Embodiment three
One or more embodiments disclose a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, performs the target tracking method based on residual layered attention and a correlation filter described in Embodiment One.
The foregoing is merely preferred embodiments of the present application and is not intended to limit the application. For those skilled in the art, various modifications and changes may be made to the application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the application shall fall within the scope of protection of the application.
Although specific embodiments of the disclosure have been described above with reference to the drawings, they do not limit the scope of protection of the disclosure. Those skilled in the art should understand that various modifications or variations that can be made without creative effort on the basis of the technical solution of the disclosure remain within the scope of protection of the disclosure.
Claims (10)
1. A target tracking method based on residual layered attention and a correlation filter, characterized by comprising the following steps:
(1) reading the current frame image, obtaining the position and scale of the target in the previous frame image, and determining the test sample in the current frame;
(2) inputting the test sample into the trained convolutional neural network to obtain the convolutional features of the test sample, inputting the features into the multi-context correlation filter layer, obtaining the network response by means of the model parameters, and determining the position and scale of the target in the current frame;
(3) obtaining a training sample according to the position and scale of the target in the current frame, and inputting the training sample into the convolutional neural network and the residual layered attention module to obtain training sample features containing attention information;
(4) extracting transformed samples according to the position of the target in the current frame and inputting them into the convolutional neural network, and obtaining an adaptive regression target based on the network responses of the transformed samples; then extracting context samples, obtaining context sample features, and obtaining filter parameters containing multi-context information according to the training sample features containing attention information and the adaptive regression target;
(5) updating the original model parameters using the obtained filter parameters.
2. The target tracking method based on residual layered attention and a correlation filter according to claim 1, characterized by further comprising step (6): moving on to the next frame image and repeating steps (1)-(5) until all images have been processed.
3. The target tracking method based on residual layered attention and a correlation filter according to claim 1, characterized in that: in step (1), determining the test sample of the current frame comprises: in the current frame image, taking the target position in the previous frame as the center, extracting an image patch whose scale is N times the target scale in the previous frame, N being greater than 1, and resizing the image patch to a specified pixel size to serve as the test sample of the current frame.
4. The target tracking method based on residual layered attention and a correlation filter according to claim 1, characterized in that: in step (2), the structure of the convolutional neural network comprises:
the structure of the first and second convolutional layers of the VGG-16 network, with all pooling layers removed;
the above convolutional layers duplicated into a symmetric Siamese network structure, so that the network has two structurally identical branches, a training branch and a test branch;
an Hourglass structure with three pooling layers added after the convolutional layers of the training branch, serving as the residual layered attention module of the network;
a multi-context correlation filter layer as the last layer of the network, whose inputs are the output of the attention module and the output of the test branch.
5. The target tracking method based on residual layered attention and a correlation filter according to claim 1, characterized in that: in step (2), the convolutional neural network is pre-trained end to end using a training dataset;
the pre-training process comprises:
pre-processing the training data: extracting a pair of image frames every several frames, cropping image patches over a range larger than the target size, and resizing the image patches to a set pixel size to serve as samples for training the network;
training the network using stochastic gradient descent;
training the convolutional neural network without the residual layered attention module for multiple epochs over the complete dataset;
adding the residual layered attention module to the deep convolutional network, freezing the already-trained layers of the convolutional network, and training for multiple epochs over the complete dataset.
6. The target tracking method based on residual layered attention and a correlation filter according to claim 1, characterized in that: in step (2), the test sample is input into the test branch of the network and passes through two convolutional layers to obtain the test sample features;
or, in step (2), determining the position and scale of the target in the current frame comprises: inputting the test sample features into the multi-context correlation filter layer and computing the network response according to the model parameters; in the tracking stage, image patches of multiple different scales are extracted and processed as test samples to obtain their features and network responses, and the scale and position of the target in the current frame are, respectively, the scale of the test sample that yields the largest network response and the position corresponding to the maximum response value.
7. The target tracking method based on residual layered attention and a correlation filter according to claim 1, characterized in that: in step (3), obtaining the training sample comprises: in the current frame image, taking the target position in the current frame as the center, extracting an image patch whose scale is N times the current target scale, N being greater than 1, and resizing the image patch to a specified pixel size to serve as the training sample of the target in the current frame image;
or, in step (3), obtaining the training sample features containing attention information comprises:
inputting the training sample x_0 into the convolutional layers of the training branch of the network to obtain the output P(x_0); then inputting P(x_0) into the residual layered attention module to obtain the training sample features containing attention information:
$$Q(x_0) = \sum_{u} M_u(x_0) * P(x_0) + P(x_0)$$
where * denotes channel-wise Hadamard multiplication, M_u(x_0) denotes the attention map generated by the attention module, u denotes the number of up-sampling layers in the attention module, and Q(x_0) denotes the training sample features with attention information that are input to the multi-context filter layer.
8. The target tracking method based on residual layered attention and a correlation filter according to claim 1, characterized in that: in step (4), the adaptive regression target is obtained as follows:
extracting several transformed samples centered on set points, the scale of the transformed samples being consistent with that of the training sample, and constructing a constraint matrix whose center and scale are consistent with those of the training sample, all elements being initialized to 0;
inputting the transformed samples into the test branch of the network to obtain the features of the transformed samples and the network response maps, and taking the center value of each response map as the value at the corresponding position of the constraint matrix;
computing the values of the remaining elements from the known element values of the constraint matrix on the basis of a Gaussian distribution, finally obtaining a constraint matrix that reflects the target distribution and the target motion;
obtaining the adaptive regression target from the constraint matrix, the adaptive regression target and the constraint matrix obeying a noise model.
9. The target tracking method based on residual layered attention and a correlation filter according to claim 1, characterized in that: in step (4), obtaining the filter parameters containing multi-context information comprises:
extracting context samples centered on set points, the scale of the context samples being consistent with that of the training sample, and inputting the context samples into the test branch of the network to obtain the features of the context samples;
in the training stage, obtaining the filter parameters according to the training sample features containing attention, the features of the context samples, and the constraint matrix in the adaptive regression target.
10. A target tracking system based on residual layered attention and a correlation filter, characterized by comprising a server, the server comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the target tracking method based on residual layered attention and a correlation filter according to any one of claims 1-9.
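The attention-weighted feature defined in claim 7, Q(x0) = Σu Mu(x0) * P(x0) + P(x0), can be illustrated with a small sketch on nested Python lists. The following is an assumption-laden illustration rather than the claimed implementation: the function name is hypothetical, P is taken to be a [channels][height][width] feature array, and each attention map is taken to be a single [height][width] map that multiplies every channel element-wise.

```python
def residual_attention(features, attention_maps):
    """Compute Q = sum_u (M_u * P) + P.

    features:        P, nested lists of shape [C][H][W]
    attention_maps:  list of maps M_u, each of shape [H][W] (assumed layout)
    Each map is applied to every channel; the trailing "+ P" is the
    residual term that lets the original features pass through.
    """
    fused = []
    for channel in features:
        out_channel = []
        for r, row in enumerate(channel):
            out_channel.append([
                value * (1.0 + sum(m[r][c] for m in attention_maps))
                for c, value in enumerate(row)
            ])
        fused.append(out_channel)
    return fused

if __name__ == "__main__":
    P = [[[1.0, 2.0], [3.0, 4.0]]]            # one channel, 2 x 2
    M = [[[0.0, 1.0], [0.5, 0.0]],            # attention map M_1
         [[0.0, 0.0], [0.5, 1.0]]]            # attention map M_2
    print(residual_attention(P, M))
```

Because of the residual term, all-zero attention maps leave the features unchanged, so the attention branch can only re-weight the backbone features and never suppresses them entirely.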
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811592319.2A CN109685831B (en) | 2018-12-20 | 2018-12-20 | Target tracking method and system based on residual layered attention and correlation filter |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109685831A (en) | 2019-04-26 |
CN109685831B (en) | 2020-08-25 |
Family
ID=66189235
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811592319.2A Active CN109685831B (en) | 2018-12-20 | 2018-12-20 | Target tracking method and system based on residual layered attention and correlation filter |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109685831B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8401248B1 (en) * | 2008-12-30 | 2013-03-19 | Videomining Corporation | Method and system for measuring emotional and attentional response to dynamic digital media content |
CN103065326A (en) * | 2012-12-26 | 2013-04-24 | 西安理工大学 | Target detection method based on time-space multiscale motion attention analysis |
CN103514608A (en) * | 2013-06-24 | 2014-01-15 | 西安理工大学 | Movement target detection and extraction method based on movement attention fusion model |
CN104243916A (en) * | 2014-09-02 | 2014-12-24 | 江苏大学 | Moving object detecting and tracking method based on compressive sensing |
CN106530329A (en) * | 2016-11-14 | 2017-03-22 | 华北电力大学(保定) | Fractional differential-based multi-feature combined sparse representation tracking method |
CN106898015A (en) * | 2017-01-17 | 2017-06-27 | 华中科技大学 | A kind of multi thread visual tracking method based on the screening of self adaptation sub-block |
CN107016689A (en) * | 2017-02-04 | 2017-08-04 | 中国人民解放军理工大学 | A kind of correlation filtering of dimension self-adaption liquidates method for tracking target |
Non-Patent Citations (2)
Title |
---|
DORAN M M et al.: "The Role of Visual Attention in Multiple Object Tracking", 《ATTENTION PERCEPTION & PSYCHOPHYSICS》 * |
伍博: "基于显著性的视觉目标跟踪研究", 《中国博士学位论文全文数据库 信息科技辑》 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110070563A (en) * | 2019-04-30 | 2019-07-30 | 山东大学 | Correlation filter method for tracking target and system based on joint perception |
CN110210551A (en) * | 2019-05-28 | 2019-09-06 | 北京工业大学 | A kind of visual target tracking method based on adaptive main body sensitivity |
CN110210551B (en) * | 2019-05-28 | 2021-07-30 | 北京工业大学 | Visual target tracking method based on adaptive subject sensitivity |
CN110335290A (en) * | 2019-06-04 | 2019-10-15 | 大连理工大学 | Twin candidate region based on attention mechanism generates network target tracking method |
CN110443852B (en) * | 2019-08-07 | 2022-03-01 | 腾讯科技(深圳)有限公司 | Image positioning method and related device |
CN110443852A (en) * | 2019-08-07 | 2019-11-12 | 腾讯科技(深圳)有限公司 | A kind of method and relevant apparatus of framing |
CN110827320A (en) * | 2019-09-17 | 2020-02-21 | 北京邮电大学 | Target tracking method and device based on time sequence prediction |
CN110827320B (en) * | 2019-09-17 | 2022-05-20 | 北京邮电大学 | Target tracking method and device based on time sequence prediction |
CN111080541A (en) * | 2019-12-06 | 2020-04-28 | 广东启迪图卫科技股份有限公司 | Color image denoising method based on bit layering and attention fusion mechanism |
CN110992404A (en) * | 2019-12-23 | 2020-04-10 | 驭势科技(南京)有限公司 | Target tracking method, device and system and storage medium |
CN110992404B (en) * | 2019-12-23 | 2023-09-19 | 驭势科技(浙江)有限公司 | Target tracking method, device and system and storage medium |
CN111724410A (en) * | 2020-05-25 | 2020-09-29 | 天津大学 | Target tracking method based on residual attention |
CN112907607A (en) * | 2021-03-15 | 2021-06-04 | 德鲁动力科技(成都)有限公司 | Deep learning, target detection and semantic segmentation method based on differential attention |
CN113297959A (en) * | 2021-05-24 | 2021-08-24 | 南京邮电大学 | Target tracking method and system based on corner attention twin network |
CN113627240A (en) * | 2021-06-29 | 2021-11-09 | 南京邮电大学 | Unmanned aerial vehicle tree species identification method based on improved SSD learning model |
CN113627240B (en) * | 2021-06-29 | 2023-07-25 | 南京邮电大学 | Unmanned aerial vehicle tree species identification method based on improved SSD learning model |
CN113689464A (en) * | 2021-07-09 | 2021-11-23 | 西北工业大学 | Target tracking method based on twin network adaptive multilayer response fusion |
CN113947618A (en) * | 2021-10-20 | 2022-01-18 | 哈尔滨工业大学 | Adaptive regression tracking method based on modulator |
CN113947618B (en) * | 2021-10-20 | 2023-08-29 | 哈尔滨工业大学 | Self-adaptive regression tracking method based on modulator |
Also Published As
Publication number | Publication date |
---|---|
CN109685831B (en) | 2020-08-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||