CN109685831B - Target tracking method and system based on residual layered attention and correlation filter - Google Patents

Target tracking method and system based on residual layered attention and correlation filter

Info

Publication number
CN109685831B
CN109685831B (application CN201811592319.2A)
Authority
CN
China
Prior art keywords
target
sample
network
attention
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811592319.2A
Other languages
Chinese (zh)
Other versions
CN109685831A (en)
Inventor
马昕
黄文慧
宋锐
荣学文
田国会
李贻斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN201811592319.2A
Publication of CN109685831A
Application granted
Publication of CN109685831B
Legal status: Active

Classifications

    • G06T7/246 — Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06N3/044 — Neural networks; architecture, e.g. interconnection topology; recurrent networks, e.g. Hopfield networks
    • G06N3/045 — Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N3/08 — Neural networks; learning methods
    • G06T2207/10004 — Indexing scheme for image analysis or image enhancement; image acquisition modality; still image; photographic image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a target tracking method and system based on residual layered attention and a correlation filter. The disclosure uses an end-to-end trained convolutional neural network with a correlation filter embedded as a layer of the network, enabling real-time tracking of moving targets. Through residual layered attention learning, more effective and robust convolutional target features are obtained, which markedly improves the generalization capability of target tracking. In addition, the multi-context correlation filtering layer jointly achieves context awareness and adaptation of the regression target, which markedly improves the discrimination capability of target tracking.

Description

Target tracking method and system based on residual layered attention and correlation filter
Technical Field
The disclosure relates to a target tracking method and system based on residual layered attention and correlation filter.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Target tracking of a moving target is an important branch of computer vision and a research hotspot, and is widely applied in many fields, such as motion event detection, video monitoring, biological vision, and the like. However, target tracking remains a very challenging issue due to the problems of shape change, illumination change, occlusion, background interference, etc. that often occur during tracking.
In recent years, target tracking methods based on correlation filters have attracted wide attention and developed rapidly. These methods achieve high tracking precision at high processing speed. During tracking, the context contains many important foreground and background cues that help improve the accuracy of target localization. However, correlation filter based tracking methods are generally not context-aware; even those that do use context information retain very little of it, because the search region of each frame contains only a small amount of context and the cosine window used to reduce boundary effects further suppresses it.
In the last five years, deep networks and machine learning have gradually been applied to target tracking and have greatly improved its performance; compared with traditional tracking methods, both tracking accuracy and success rate are markedly higher. However, many deep learning based tracking methods simply stack existing tracking schemes on top of a pre-trained network such as VGG or AlexNet; they struggle to meet the real-time requirement of target tracking, are not trained end to end, and therefore do not fully exploit the advantages of deep networks.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides a target tracking method and system based on residual layered attention and a correlation filter. The present disclosure uses an end-to-end trained convolutional neural network with a correlation filter embedded as a layer of the network, enabling real-time tracking of moving targets. Through residual layered attention learning, more effective and robust convolutional target features are obtained, which markedly improves the generalization capability of target tracking. In addition, the multi-context correlation filtering layer jointly achieves context awareness and adaptation of the regression target, which markedly improves the discrimination capability of target tracking.
According to some embodiments, the following technical scheme is adopted in the disclosure:
a target tracking method based on residual layered attention and correlation filter comprises the following steps:
(1) reading the current frame image, obtaining the position and the scale of a target in the previous frame image, and further determining a test sample in the current frame;
(2) inputting a test sample into a trained convolutional neural network to obtain the convolutional characteristic of the test sample, inputting the characteristic into a multi-context correlation filter layer, obtaining a network response through a model parameter, and determining the position and the scale of a target in a current frame;
(3) acquiring a training sample according to the position and the scale of the target in the current frame, inputting the training sample into the convolutional neural network and the residual layered attention module, and obtaining training sample features containing attention information;
(4) extracting conversion samples according to the position of the target in the current frame, inputting them into the convolutional neural network, and obtaining an adaptive regression target based on the network responses of the conversion samples; then extracting context samples and obtaining their features, and obtaining filter parameters containing multi-context information according to the attention-aware training sample features and the adaptive regression target;
(5) updating the original model parameters with the obtained filter parameters.
By way of further limitation, the method further comprises the step (6) of updating to the next frame image, and continuously iterating the steps (1) - (5) until all image processing is completed.
By way of further limitation, in step (1), the determining of the test sample of the selected target in the current frame includes: and taking the target position of the previous frame in the current frame image as the center, extracting an image slice with the size N times that of the target position of the previous frame, wherein N is larger than 1, and adjusting the image slice to a specified pixel to be used as a test sample of the current frame.
By way of further limitation, in step (2), the structure of the convolutional neural network includes:
adopting the structure of a first layer of convolution layer and a second layer of convolution layer of a VGG-16 network, and removing all pooling layers;
copying the convolutional layers into a symmetric twin structure, so that the network has a training branch and a test branch of identical structure;
adding an Hourglass structure with three pooling layers after the convolutional layers of the network training branch, serving as the residual layered attention module of the network;
the last layer of the network is a multi-context correlation filter layer, the inputs of which are the output of the attention module and the output of the test branch.
As a further limitation, in the step (2), the end-to-end convolutional neural network is pre-trained by using a training data set.
As a further limitation, in step (2), the pre-training comprises:
preprocessing training data, extracting a pair of image frames every a plurality of frames, extracting an image slice in a range larger than a target size, and adjusting the size of the image slice to a set pixel to be used as a sample of a training network;
training the network by adopting a random gradient descent method;
performing multiple epochs of iterative training over the complete data set on the convolutional neural network without the residual layered attention module;
and adding the residual layered attention module into the deep convolutional network, fixing the already-trained layers of the convolutional network, and performing multiple further epochs of iterative training over the complete data set.
As a further limitation, in the step (2), the test sample is input into the test branch of the network, and passes through two convolutional layers, so as to obtain the characteristics of the test sample.
By way of further limitation, in step (2), the determining the position and the scale of the target in the current frame includes: inputting the characteristics of the test sample into a multi-context correlation filter layer, and calculating a network response value according to the model parameters; in the tracking stage, a plurality of image slices with different scales are extracted and processed into test samples to obtain the characteristics and the network response of the test samples, and the scale and the position of the target in the current frame are respectively the scale of the target in the test sample with the maximum network response value and the position corresponding to the maximum response value.
As a further limitation, in step (3), the specific process of obtaining the training sample includes: in the current frame image, taking the target position in the current frame as the center, extracting an image slice with the size N times of the current target size, wherein N is larger than 1, and adjusting the image slice to a specified pixel to be used as a training sample of the target in the current frame image.
By way of further limitation, in step (3), the specific process of obtaining the training sample features containing the attention information includes:
inputting the training sample x_0 into the convolutional layers of the network training branch to obtain the output P(x_0); then inputting P(x_0) into the residual layered attention module to obtain the training sample features containing attention information:
Q(x_0) = Σ_u M_u(x_0) ∗ P(x_0) + P(x_0)
where ∗ denotes channel-wise Hadamard multiplication, M_u(x_0) denotes the attention distribution map generated by the attention module, u indexes the upsampling layers in the attention module, and Q(x_0) denotes the attention-aware training sample features to be input into the multi-context filter layer.
As a further limitation, in the step (4), an adaptive regression target is obtained, which specifically includes the following steps:
taking a set point as a center, extracting a plurality of conversion samples, wherein the scale of each conversion sample is consistent with that of a training sample, and constructing a limiting matrix with the central position and the scale consistent with those of the training samples, wherein the initial value of an element is 0;
inputting the conversion samples into a network test branch to obtain the characteristics of the conversion samples and obtain network response graphs, and taking the value of the central position of each response graph as the value of the corresponding position of the limiting matrix;
calculating values of the remaining elements based on Gaussian distribution according to values of elements in a known limiting matrix to finally obtain a limiting matrix capable of reflecting target distribution and target motion;
from the constraint matrix, an adaptive regression objective is derived, which obeys the noise model with the constraint matrix.
As a further limitation, in step (4), the specific process of obtaining the filter parameter containing the multiple context information includes:
taking a set point as a center, extracting a context sample, wherein the scale of the context sample is consistent with that of a training sample, and inputting the context sample to a network test branch to obtain the characteristics of the context sample;
in the training phase, filter parameters are obtained according to the features of the training samples containing attention, the features of the context samples and the limiting matrix in the adaptive regression target.
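For readability, the following is a minimal sketch of how the per-frame loop formed by steps (1)-(6) can be organized. It fixes only the control flow; every building block (sample extraction, the twin network, the multi-context correlation filter layer) is supplied by the caller as a callable, since the concrete implementations are described in the embodiments below.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical driver for steps (1)-(6): only the control flow is fixed here; the
# concrete operations are supplied by the caller as callables.
def run_tracker(frames: Iterable,                  # video frames; the first frame is assumed annotated
                init_box: Tuple[float, float, float, float],
                locate: Callable,                  # steps (1)-(2): (frame, box, model) -> new box
                learn_features: Callable,          # step (3): (frame, box) -> attention-aware features
                build_target: Callable,            # step (4): (frame, box, model) -> adaptive regression target
                solve_filter: Callable,            # step (4): (features, target) -> filter parameters
                update_model: Callable):           # step (5): (old params, new params) -> updated params
    box, model = init_box, None
    for frame in frames:
        if model is not None:
            box = locate(frame, box, model)        # steps (1)-(2): localize the target in the current frame
        feats = learn_features(frame, box)         # step (3): training sample -> attention features
        target = build_target(frame, box, model)   # step (4): conversion/context samples -> regression target
        params = solve_filter(feats, target)       # step (4): closed-form multi-context filter
        model = params if model is None else update_model(model, params)   # step (5)
        yield box                                  # step (6): proceed to the next frame
```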
In one or more embodiments, a target tracking system based on residual layered attention and correlation filter includes a server comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the above-described target tracking method based on residual layered attention and correlation filter.
In one or more embodiments, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the above-described target tracking method based on residual layered attention and correlation filter.
Compared with the prior art, the beneficial effects of the present disclosure are:
(1) an end-to-end trained convolutional neural network is proposed for target tracking that can meet the real-time requirement of tracking; a correlation filter is integrated into the proposed network as its correlation filtering layer, improving the discrimination capability of the network;
(2) residual layered attention learning is proposed, which improves the generalization capability of the network by exploiting residual information and the information of multiple upsampling layers in the attention module;
(3) by constructing a new objective function on top of the correlation filtering layer, a multi-context correlation filtering layer is proposed that performs context sensing and regression-target adaptation and fuses multi-context information, which benefits target localization and model learning and further improves network performance;
(4) the method can effectively and stably track a moving target in many complex environments, such as large-area occlusion, target appearance change, rapid target rotation, illumination change, background interference and the like.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a schematic diagram of a target tracking method based on a residual layered attention and correlation filter;
FIG. 2 is a schematic diagram of a proposed residual layered attention module;
FIG. 3 is a diagram comparing the context information used in conventional methods with that used in the present method;
FIG. 4 is a diagram of the process of extracting conversion samples and obtaining the adaptive regression target;
FIG. 5 shows the tracking accuracy and tracking success rate on the OTB50, OTB2013 and OTB2015 data sets;
FIG. 6 shows partial results of tracking different types of targets on the OTB data sets.
Detailed Description
the present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
In the present disclosure, terms such as "upper", "lower", "left", "right", "front", "rear", "vertical", "horizontal", "side", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only relational terms determined for convenience in describing structural relationships of the parts or elements of the present disclosure, and do not refer to any parts or elements of the present disclosure, and are not to be construed as limiting the present disclosure.
In the present disclosure, terms such as "fixedly connected", "connected", and the like are to be understood in a broad sense, and mean either a fixed connection or an integrally connected or detachable connection; may be directly connected or indirectly connected through an intermediate. The specific meanings of the above terms in the present disclosure can be determined on a case-by-case basis by persons skilled in the relevant art or technicians, and are not to be construed as limitations of the present disclosure.
Example one
In one or more embodiments, a target tracking method based on residual layered attention and correlation filter is disclosed, as shown in FIG. 1, which performs real-time target tracking through the proposed end-to-end convolutional network with a residual layered attention mechanism and a multi-context correlation filtering layer.
For a given frame, the test sample is input into the test branch of the proposed symmetric twin network to obtain the features of the test sample; the convolutional layers of the test branch adopt the structure of the first and second convolutional layers of the VGG-16 network with all pooling layers removed.
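A minimal sketch of one branch of this twin structure is given below, assuming that "first and second convolutional layers of VGG-16" refers to the first two VGG-16 convolution blocks; the channel counts are therefore assumptions, and only the removal of pooling is taken directly from the text.

```python
import torch
import torch.nn as nn

# Sketch of one branch of the symmetric twin (Siamese) feature extractor:
# VGG-16-style first and second convolution blocks with every pooling layer removed.
# Depth and channel counts are assumptions; the text only names the VGG-16 conv
# structure and the removal of pooling.
class ConvBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            # no nn.MaxPool2d here: all pooling layers are removed
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)

# The training and test branches share this structure (a symmetric twin network):
test_branch = ConvBranch()
z = test_branch(torch.randn(1, 3, 128, 128))   # 128x128 test sample -> spatial size preserved
print(z.shape)                                 # torch.Size([1, 128, 128, 128])
```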
The obtained convolutional feature z is input into the multi-context correlation filtering layer to obtain the response of the filter, i.e. the output response G(z) of the network:
G(z) = F^{-1}( Σ_d ω̂_d* ∘ ẑ_d )
where ω denotes the model parameters, ˆ denotes the discrete Fourier transform, ∘ denotes element-wise multiplication of matrix elements, F^{-1} denotes the inverse discrete Fourier transform, and d indexes the feature channels.
The above operations are repeated on the test samples of different scales in the current frame to obtain the test sample features z_s and the corresponding network responses G(z_s). The position corresponding to the maximum value over all network responses is taken as the position of the target in the current frame, and the optimal scale of the target is the scale of the test sample whose response contains that maximum value:
s* = argmax_s max( G(z_s) )
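The following sketch illustrates this localization step under the assumption that the correlation filtering layer computes the standard Fourier-domain response shown above (conjugate of the filter spectrum times the sample spectrum, summed over channels); the scale set and tensor shapes are illustrative.

```python
import torch

# Sketch of the multi-scale localization step, assuming the response
#   G(z) = F^-1( sum_d conj(w_hat_d) * z_hat_d ).
# `w_hat` are the Fourier-domain model parameters; shapes are illustrative.
def cf_response(w_hat: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """w_hat, z: (C, H, W); returns the real-valued response map (H, W)."""
    z_hat = torch.fft.fft2(z)
    g_hat = (torch.conj(w_hat) * z_hat).sum(dim=0)          # element-wise product, summed over channels
    return torch.fft.ifft2(g_hat).real

def locate_target(w_hat, multi_scale_feats, scales):
    """Pick the scale whose response peak is largest, and the peak position within it."""
    best_val, best_scale, best_pos = -float("inf"), None, None
    for feat, s in zip(multi_scale_feats, scales):
        resp = cf_response(w_hat, feat)
        val = resp.max().item()
        if val > best_val:
            best_val, best_scale = val, s
            best_pos = divmod(int(resp.argmax()), resp.shape[-1])   # (row, col) of the peak
    return best_pos, best_scale

# toy usage with random features for three scaled test samples
C, H, W = 128, 64, 64
w_hat = torch.fft.fft2(torch.randn(C, H, W))
feats = [torch.randn(C, H, W) for _ in range(3)]
pos, scale = locate_target(w_hat, feats, scales=[0.985, 1.0, 1.015])
```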
entering a training stage, and extracting a training sample x according to the current position and scale of the target0Will train sample x0Inputting convolution layer in network training branch to obtain output P (x)0) The convolutional layer structure in the training branch is consistent with that in the testing branch; then, P (x)0) Inputting the data into a residual error layered attention module to obtain training sample characteristics containing attention information, wherein the formula is as follows:
Q(x0)=∑uMu(x0)*P(x0)+P(x0)
wherein, denotes the multiplication of Hadamard by channel, Mu(x0) Denotes the attention profile generated by the attention module, u denotes the number of upsampled layers in the attention module, Q (x)0) Representing the training sample features with attention information to be input to the multi-context filter layer.
Context samples and conversion samples are then extracted according to the current position and scale of the target and input into the test branch of the proposed symmetric twin network to obtain, respectively, the convolutional features of the context samples and of the conversion samples, and an adaptive regression target is constructed from the features of the conversion samples.
In the multi-context correlation filtering layer, the present disclosure proposes a new objective function; by solving for its optimal value, context sensing, adaptation of the regression target and learning of the filter parameters are performed jointly:
min_{w,y} ‖X_0 w − y‖² + θ_1‖w‖² + θ_2 Σ_{i=1}^{k} ‖X_i w‖² + θ_3‖y − y_0‖²
where w denotes the filter parameters; y_0 denotes the constraint matrix used to construct the regression target y; X_0 denotes the sample acquired in the image; X_i denotes a context sample; and the regularization parameters θ_1, θ_2, θ_3 ∈ (0,1] are constants that prevent overfitting. Here X_0 and X_i are circulant (cyclic-shift) sample matrices whose base samples are x_0 and x_i, respectively.
By solving the optimal value of the objective function of the proposed multi-context correlation filter layer, the closed form solution of the filter parameters can be calculated by using the properties of the circulant matrix and an inversion formula as follows:
ŵ = ( x̂_0* ∘ ŷ ) / ( x̂_0* ∘ x̂_0 + θ_1 + θ_2 Σ_{i=1}^{k} x̂_i* ∘ x̂_i )
where x̂_0* is the complex conjugate of x̂_0 and the division is element-wise. In the training phase, the filter parameters w are obtained from the obtained training sample features x_0, the context features x_i, and the constraint matrix y_0 of the adaptive regression target. Note that here x_0 denotes the features of the training sample that are input to the correlation filtering layer, not the training sample itself, and x_i denotes the features of the context sample, not the context sample itself.
Based on the new filter parameters, the original model parameters are updated to complete the training process, and the formula is as follows:
ω_n = (1 − λ)·ω_{n−1} + λ·w
where ω_{n−1} denotes the original model parameters and λ ∈ [0,1] is a constant learning rate.
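A sketch of the training-stage computation is given below. It assumes the standard context-aware correlation filter solution in the Fourier domain, with θ_1 regularizing the filter and θ_2 weighting the context samples, followed by the linear model update with learning rate λ; the exact closed form used by the disclosure may differ in detail.

```python
import torch

# Sketch of the Fourier-domain filter solution and the linear model update.
# theta1: regularizer on w; theta2: weight of the context samples. Treat the
# expression as an assumption consistent with the symbols defined in the text,
# not as the authoritative formula of the disclosure.
def solve_filter(x0, y0, contexts, theta1=1e-3, theta2=1.0):
    """x0: (C,H,W) attention features; y0: (H,W) regression target; contexts: list of (C,H,W)."""
    x0_hat = torch.fft.fft2(x0)
    y_hat = torch.fft.fft2(y0)
    num = torch.conj(x0_hat) * y_hat                        # x̂0* ∘ ŷ  (broadcast over channels)
    den = (torch.conj(x0_hat) * x0_hat).real + theta1
    for xi in contexts:                                     # context-aware term
        xi_hat = torch.fft.fft2(xi)
        den = den + theta2 * (torch.conj(xi_hat) * xi_hat).real
    return num / den                                        # ŵ, the filter in the Fourier domain

def update_model(omega_hat, w_hat, lam=0.012):
    """Linear interpolation between the old model parameters and the new filter."""
    return (1.0 - lam) * omega_hat + lam * w_hat

# toy usage
C, H, W = 128, 64, 64
w_hat = solve_filter(torch.randn(C, H, W), torch.randn(H, W),
                     [torch.randn(C, H, W) for _ in range(4)])
omega_hat = update_model(w_hat, w_hat)
```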
The method of the present application is described in detail below.
In the attention module, inspired by the way residual networks use residual skip connections to enhance network performance, the present disclosure proposes residual layered attention learning to obtain more generalizable and effective attention-aware convolutional features; FIG. 2 is a schematic diagram of the proposed residual layered attention module.
In a conventional attention mechanism, the output Q(x_0) of the attention module can be expressed as:
Q(x_0) = M(x_0) ∗ P(x_0)
where P(x_0) is the feature of the sample x_0 output by the convolutional layers of the training branch and input to the attention module, M(x_0) is the attention distribution map generated by the attention module, and Q(x_0) is the attention-aware convolutional feature output by the attention module.
However, in conventional attention mechanisms, multiplying the convolutional feature by an attention distribution map whose element values lie in the range 0 to 1 reduces the magnitude of the feature, which in many cases degrades the performance of the original convolutional network. Therefore, inspired by residual networks, an attention module based on residual information is proposed to solve this problem. In addition, the attention module proposed by the present disclosure adopts a bottom-up/top-down hourglass structure. Just as the convolutional features output by different convolutional layers contain different sample information, the outputs of different upsampling layers in the attention module reflect different attention information; the present disclosure therefore integrates them to obtain a more accurate attention distribution map.
The residual layered attention learning proposed by the present disclosure can be expressed by the following formula, in which the layered attention distribution maps are fused with the features output by the convolutional layers to form the final attention-aware convolutional features:
Q(x_0) = Σ_u M_u(x_0) ∗ P(x_0) + P(x_0)
where u indexes the upsampling layers in the attention module. The attention distribution maps output by different upsampling layers have different resolutions; the lower-resolution maps are processed with nearest-neighbour interpolation so that their resolution matches that of the higher-resolution maps.
As shown in FIG. 2, the attention module of the present disclosure uses skip connections as in residual networks. In the conventional attention mechanism, only the output of the last upsampling layer is used as the attention distribution map, and the information output by the earlier upsampling layers is discarded. Unlike conventional attention mechanisms, the present disclosure extracts and combines the attention information, with different meanings and contributions, output by multiple upsampling layers. In FIG. 2, the outputs of the earlier upsampling layers, with lower resolution, contain more global information, which helps locate the target and prevent drift caused by occlusion or other factors; the outputs of the later upsampling layers, with higher resolution, contain more accurate local information, which helps distinguish the target from similar objects and adapt to changes of the target.
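A compact sketch of such a residual layered attention module is shown below: a three-level hourglass whose every upsampling stage produces an attention map that is resized with nearest-neighbour interpolation and fused with the input features through a residual connection. The internal convolutions, channel widths and the sigmoid used to form the maps are assumptions; only the three pooling stages, the layered fusion Q = Σ_u M_u ∗ P + P and the nearest-neighbour resizing come from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a residual layered attention module: a bottom-up/top-down hourglass with
# three pooling stages; the output of EVERY upsampling stage (not only the last one)
# is turned into an attention map, resized to the feature resolution, and fused with
# the input features through a residual connection: Q(x0) = sum_u M_u(x0) * P(x0) + P(x0).
class ResidualLayeredAttention(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.down = nn.ModuleList([nn.Sequential(nn.MaxPool2d(2),
                                                 nn.Conv2d(channels, channels, 3, padding=1),
                                                 nn.ReLU(inplace=True)) for _ in range(3)])
        self.up = nn.ModuleList([nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                               nn.ReLU(inplace=True)) for _ in range(3)])
        self.to_map = nn.Conv2d(channels, channels, 1)   # 1x1 conv before the sigmoid (assumed)

    def forward(self, p):                                # p = P(x0): features from the conv layers
        h = p
        for d in self.down:                              # bottom-up: three pooling stages
            h = d(h)
        maps = []
        for u in self.up:                                # top-down: three upsampling stages
            h = F.interpolate(u(h), scale_factor=2, mode='nearest')
            m = torch.sigmoid(self.to_map(h))            # attention map M_u at this resolution
            maps.append(F.interpolate(m, size=p.shape[-2:], mode='nearest'))
        q = p.clone()
        for m in maps:                                   # layered fusion with the residual connection
            q = q + m * p
        return q

attn = ResidualLayeredAttention()
q = attn(torch.randn(1, 128, 64, 64))
print(q.shape)                                           # torch.Size([1, 128, 64, 64])
```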
Context information can provide additional auxiliary information for target localization during tracking, which helps improve the accuracy of target tracking, especially in complex environments. However, conventional correlation filter based methods retain only a small amount of context information during tracking because of the cosine window used to alleviate boundary effects; FIG. 3 compares the context information used in conventional methods with that used in the present method.
As shown in FIG. 3, in the model training phase the method of the present disclosure extracts context samples for the model update. Context samples x_{1:k} are extracted centred at A + p_n, where p_n is the position of the target in the n-th frame and A = [−size(x_0,1), 0; 0, −size(x_0,2); size(x_0,1), 0; 0, size(x_0,2)]; the scale of the context samples is consistent with that of the training sample.
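A small sketch of how the four context-sample centres A + p_n implied by this offset matrix can be computed is given below (coordinates are taken as (row, column); the cropping of the samples themselves is omitted).

```python
import numpy as np

# Sketch of the four context-sample centres A + p_n described above: one
# training-sample height above/below the target and one width to its left/right.
def context_centers(p_n, sample_size):
    """p_n: (row, col) target centre in frame n; sample_size: (h, w) of the training sample x0."""
    h, w = sample_size
    A = np.array([[-h, 0],      # above
                  [ 0, -w],     # left
                  [ h, 0],      # below
                  [ 0,  w]])    # right
    return A + np.asarray(p_n)  # centres of the context samples x_1..x_4

print(context_centers((200, 300), (96, 96)))
# [[104 300]
#  [200 204]
#  [296 300]
#  [200 396]]
```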
Unlike conventional correlation filter based target tracking methods, which use a static Gaussian-shaped regression target, the method of the present disclosure uses a dynamic regression target y that adapts to the motion of the target and to its distribution, where y obeys a noise model determined by the constraint matrix y_0 and y_0 is the constraint matrix used to construct the regression target y. FIG. 4 illustrates the process of extracting conversion samples and obtaining the adaptive regression target.
As shown in FIG. 4, j conversion samples m_{1:j} are extracted centred at T + p_n, where the scale of each conversion sample is consistent with that of the training sample, p_n is the position of the target in the n-th frame, j = 7, and T = [t_1, t_2, …, t_j] = [0,0; 0,1; 1,1; 0,−1; −1,0; −1,−1]·ρ, with ρ a scaling factor.
The samples m_{1:j} are input into the network test branch to obtain the network response maps G(m_{1:j}), and the value at the centre of each response map is taken as the value of the constraint matrix y_0 at the corresponding position, i.e.:
y_0(t_{1:j} + p_n) = G(m_{1:j})
According to the known element values of y_0, the values of the remaining elements are computed based on a Gaussian distribution, finally yielding the adaptive regression target. As shown in the top view of the regression target in FIG. 4, the adaptive regression target used in the present disclosure reflects the distribution of the target better than a Gaussian-shaped regression target.
The proposed target tracking method based on residual layered attention and correlation filters was evaluated on three datasets OTB50, OTB2013, OTB 2015. The training process of the end-to-end convolutional network provided by the disclosure is introduced firstly, then the specific configuration of the experiment and the adopted evaluation method are given, and finally the experimental results obtained on the three data sets are analyzed.
The structure and training process of the end-to-end convolutional neural network is as follows:
adopting the structure of a first layer of convolution layer and a second layer of convolution layer of a VGG-16 network, and removing all pooling layers;
copying the convolutional layers into a symmetric twin structure, so that the network has a training branch and a test branch of identical structure;
adding an Hourglass structure with three pooling layers after the convolutional layers of the network training branch, serving as the residual layered attention module of the network;
the last layer of the network is a multi-context correlation filter layer, the inputs of which are the output of the attention module and the output of the test branch.
The pre-training process for the convolutional neural network specifically comprises the following steps:
preprocessing training data, extracting a pair of image frames every 10 frames, extracting image slices in a range of 3 times of a target size, and adjusting the size of the image slices to 128 × 128 pixels;
training the network by stochastic gradient descent, with the momentum set to 0.9, the weight decay set to 0.005, and the learning rate set to 1e-2;
the loss function of the network training adopts a regression loss function, and the formula is as follows:
Figure GDA0002579442710000131
wherein G (z) is a network response function to the sample z, and y is a regression target obeying a Gaussian distribution;
performing 50 epochs of iterative training over the complete data set on the convolutional neural network without the residual layered attention module;
and adding the residual layered attention module into the deep convolutional network, fixing the already-trained layers of the convolutional network, and performing 20 further epochs of iterative training over the complete data set.
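A sketch of this two-stage schedule is given below; `net_without_attention` and `net_with_attention` stand for the end-to-end network before and after the residual layered attention module is inserted (they are assumed to share the stage-1 layers), and `loader` is assumed to yield (sample, Gaussian regression target) pairs.

```python
import torch
import torch.nn as nn

# Sketch of the two-stage pre-training schedule: stage 1 trains the network without
# the attention module for 50 epochs; stage 2 freezes those layers, inserts the
# residual layered attention module and trains for 20 more epochs. The networks and
# the data loader are assumed to be provided by the caller.
def pretrain(net_without_attention: nn.Module, net_with_attention: nn.Module, loader):
    criterion = nn.MSELoss()                     # regression (L2) loss against the Gaussian target

    def run(net, epochs):
        params = [p for p in net.parameters() if p.requires_grad]
        opt = torch.optim.SGD(params, lr=1e-2, momentum=0.9, weight_decay=5e-3)
        for _ in range(epochs):                  # one epoch = one pass over the complete data set
            for sample, target in loader:
                opt.zero_grad()
                loss = criterion(net(sample), target)
                loss.backward()
                opt.step()

    # Stage 1: 50 epochs on the convolutional network without the attention module
    run(net_without_attention, epochs=50)

    # Stage 2: freeze the layers trained in stage 1, then train the network that
    # includes the attention module for another 20 epochs
    for p in net_without_attention.parameters():
        p.requires_grad = False
    run(net_with_attention, epochs=20)
```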
The experimental configuration of the present disclosure is as follows: experiments were conducted on a 2.59 GHz computer equipped with 8 GB of memory, an i5 processor and an Nvidia GTX 1070 GPU; in a PyTorch environment the method runs at up to 36 frames per second, meeting the real-time requirement. All parameters of the method were kept fixed throughout the experiments. In the layered attention mechanism, the number of upsampling layers is u = 3; the regularization parameters θ_1, θ_2, θ_3 are 1e-3, 1 and 0.5, respectively; the learning parameter λ is 0.012.
The results of the proposed method (Ours) on the OTB data sets were compared with those of 17 other high-performance target tracking methods: SRDCFdecon, MUSTer, LCT, SRDCF, Staple_CA, CFNet, SiamFC, HDT, Staple, DCF_CA, SAMF, MEEM, DSST, KCF, TGPR, DLT and STC. The results of SRDCFdecon, LCT, SRDCF, CFNet, SiamFC, HDT, Staple and DSST are taken from the results published by their authors; the results of Staple_CA, DCF_CA, SAMF and KCF were obtained by running the code released by their authors on the experimental equipment; the results of MUSTer, MEEM, TGPR, DLT and STC come from the authors of LCT. The present disclosure uses the area under the curve (AUC) of the success plot and the precision at a threshold of 20 pixels to rank the tracking success rate and the tracking accuracy, respectively, of these 18 methods. FIG. 5 shows the tracking success rate and accuracy of the top-12 methods. Table 1 summarizes the types of the 18 target tracking methods used for the performance comparison.
TABLE 1 types of target tracking methods for comparison
(Table 1 is reproduced as an image in the original publication; it lists each of the 18 compared trackers and its type.)
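For reference, the two OTB metrics used here can be computed as in the following sketch (precision at a 20-pixel centre-error threshold, and the area under the success-rate curve over overlap thresholds); the per-frame values in the usage example are arbitrary toy numbers.

```python
import numpy as np

# Sketch of the two OTB metrics: precision at a 20-pixel centre-error threshold,
# and the area under the success-rate curve (AUC) over overlap thresholds.
def precision_at_20(pred_centers, gt_centers, threshold=20.0):
    err = np.linalg.norm(np.asarray(pred_centers) - np.asarray(gt_centers), axis=1)
    return float(np.mean(err <= threshold))

def success_auc(ious, thresholds=np.linspace(0, 1, 101)):
    ious = np.asarray(ious)
    success = [np.mean(ious > t) for t in thresholds]   # fraction of frames with IoU above each threshold
    return float(np.mean(success))                      # AUC of the success plot

# toy usage on arbitrary per-frame values
pred = [(10, 12), (40, 41), (100, 130)]
gt   = [(11, 12), (70, 41), (101, 131)]
print(precision_at_20(pred, gt))     # 2 of 3 frames within 20 px -> 0.666...
print(success_auc([0.8, 0.3, 0.55]))
```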
The evaluation of the proposed target tracking method based on residual layered attention and correlation filter is as follows:
on the OTB50 dataset, the AUC of the tracker proposed by the present disclosure was 0.591, which is 5.5% better than the second ranked SRDCFdecon (0.560). The tracker of the present disclosure outperforms the other two context-aware trackers, i.e., complete _ CA (0.542) and DCF _ CA (0.493), by 9.0% and 19.8%, respectively, which benefit from the convolution feature and the adaptation of the regression target used by the method of the present disclosure. In addition to the method of the present disclosure, the highest performing tracker with twin symmetry is CFNet (0.530), whose network structure uses two convolutional layers; however, the results were 11.5% lower than that of the present disclosure. In terms of accuracy, the tracker of the present disclosure ranks second (0.790), 1.8% lower than the HDT at the first bit (0.804). However, the tracker of the present disclosure is advantageous in that the processing speed of HDT is 10fps according to the data provided by the HDT author, and the method of the present disclosure performs three times faster than the HDT tracker.
On the OTB2013 data set, the AUC of the proposed tracker is 0.671, ranking first, 2.8% higher than SRDCFdecon (0.653). The proposed tracker outperforms the other two context-aware trackers, Staple_CA (0.615) and DCF_CA (0.592), by 9.1% and 13.3%, respectively. Apart from the method of the present disclosure, the best-performing tracker with a twin symmetric structure is CFNet (0.611), whose result is 9.8% lower than that of the present disclosure. In terms of accuracy, the proposed tracker (0.883) is only 0.7% lower than the HDT tracker (0.889), but the proposed tracker is markedly faster.
On the OTB2015 data set, the AUC of the proposed tracker is 0.623, ranking second, only 0.6% lower than the first-ranked SRDCFdecon (0.627). However, the proposed tracker runs about 30 times faster than SRDCFdecon, whose authors report a processing speed of 1 fps. The proposed tracker outperforms the other two context-aware trackers, Staple_CA (0.598) and DCF_CA (0.552), by 4.2% and 12.9%, respectively. Apart from the method of the present disclosure, the best-performing tracker with a twin symmetric structure is SiamFC (0.582), whose result is 7.0% lower than that of the present disclosure. In terms of accuracy, the proposed tracker achieves 0.815, ranking third.
The method provided by the present disclosure can stably track the target in complex scenes; FIG. 6 shows partial results of the method tracking various types of targets on the OTB data sets. The method tracks the target effectively and stably in many complex environments, such as large-area occlusion, target appearance change, rapid target rotation, illumination change and background interference.
Example two
In one or more embodiments, a target tracking system based on a residual layered attention and correlation filter includes a server including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the target tracking method based on the residual layered attention and correlation filter as described in the first embodiment.
EXAMPLE III
In one or more embodiments, a computer-readable storage medium is disclosed, on which a computer program is stored which, when executed by a processor, performs the target tracking method based on residual layered attention and correlation filter as described in Example One.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (9)

1. A target tracking method based on residual layered attention and correlation filter is characterized in that: the method comprises the following steps:
(1) reading the current frame image, obtaining the position and the scale of a target in the previous frame image, and further determining a test sample in the current frame;
(2) inputting a test sample into a trained convolutional neural network to obtain the convolutional characteristic of the test sample, inputting the characteristic into a multi-context correlation filter layer, obtaining a network response through a model parameter, and determining the position and the scale of a target in a current frame;
(3) acquiring a training sample according to the position and the scale of a target in a current frame, inputting the training sample into a convolutional neural network and a residual error layering attention module, and acquiring the characteristics of the training sample containing attention information;
(4) extracting a conversion sample according to the position of a target in a current frame, inputting the conversion sample into a convolutional neural network, obtaining a self-adaptive regression target based on the network response of the conversion sample, then extracting a context sample, obtaining the characteristics of the context sample, and obtaining filter parameters containing multiple context information according to the training sample characteristics containing attention information and the self-adaptive regression target;
(5) updating the original model parameters by using the obtained filter parameters;
in the step (4), the specific process of obtaining the filter parameter containing the multi-context information includes:
taking a set point as a center, extracting a context sample, wherein the scale of the context sample is consistent with that of a training sample, and inputting the context sample to a network test branch to obtain the characteristics of the context sample;
in the training phase, filter parameters are obtained according to the features of the training samples containing attention, the features of the context samples and the limiting matrix in the adaptive regression target.
2. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: and (6) updating to the next frame of image, and continuously iterating the steps (1) to (5) until all image processing is finished.
3. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: in the step (1), the determination process of the test sample of the selected target in the current frame includes: and taking the target position of the previous frame in the current frame image as the center, extracting an image slice with the size N times that of the target position of the previous frame, wherein N is larger than 1, and adjusting the image slice to a specified pixel to be used as a test sample of the current frame.
4. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: in the step (2), the structure of the convolutional neural network includes:
adopting the structure of a first layer of convolution layer and a second layer of convolution layer of a VGG-16 network, and removing all pooling layers;
copying the convolutional layer into a symmetrical twin network structure, so that the network has two training branches and testing branches with consistent structures;
adding a Hourglass structure with three pooling layers after the convolutional layer of the network training branch as a residual error layering attention module of the network;
the last layer of the network is a multi-context correlation filter layer, the inputs of which are the output of the attention module and the output of the test branch.
5. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: in the step (2), pre-training an end-to-end convolutional neural network by adopting a training data set;
the pre-training process comprises the following specific steps:
preprocessing training data, extracting a pair of image frames every a plurality of frames, extracting an image slice in a range larger than a target size, and adjusting the size of the image slice to a set pixel to be used as a sample of a training network;
training the network by adopting a random gradient descent method;
performing iterative training using a complete data set for a plurality of times on a convolutional neural network without a residual error hierarchical attention module;
and adding the residual error layering attention module into the deep convolutional network, fixing the trained layers in the convolutional network, and performing iterative training for using the complete data set for multiple times.
6. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: in the step (2), the test sample is input into a test branch of the network and passes through two layers of convolution layers to obtain the characteristics of the test sample;
or, in the step (2), determining the position and the scale of the target in the current frame includes: inputting the characteristics of the test sample into a multi-context correlation filter layer, and calculating a network response value according to the model parameters; in the tracking stage, a plurality of image slices with different scales are extracted and processed into test samples to obtain the characteristics and the network response of the test samples, and the scale and the position of the target in the current frame are respectively the scale of the target in the test sample with the maximum network response value and the position corresponding to the maximum response value.
7. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: in the step (3), the specific process of obtaining the training sample includes: in the current frame image, taking a target position in the current frame as a center, extracting an image slice with the size N times that of the current target, wherein N is larger than 1, and adjusting the image slice to a specified pixel to be used as a training sample of the target in the current frame image;
or, in the step (3), the specific process of obtaining the training sample features containing the attention information includes:
will train sample x0Inputting convolution layer in network training branch to obtain output P (x)0) (ii) a After that time, the user can use the device,p (x)0) Inputting the data into a residual error layering attention module to obtain training sample characteristics containing attention information:
Q(x0)=∑uMu(x0)*P(x0)+P(x0)
wherein, denotes the multiplication of Hadamard by channel, Mu(x0) Denotes the attention profile generated by the attention module, u denotes the number of upsampled layers in the attention module, Q (x)0) Representing the training sample features with attention information to be input to the multi-context filter layer.
8. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: in the step (4), a self-adaptive regression target is obtained, which specifically comprises the following steps:
taking a set point as a center, extracting a plurality of conversion samples, wherein the scale of each conversion sample is consistent with that of a training sample, and constructing a limiting matrix with the central position and the scale consistent with those of the training samples, wherein the initial value of an element is 0;
inputting the conversion samples into a network test branch to obtain the characteristics of the conversion samples and obtain network response graphs, and taking the value of the central position of each response graph as the value of the corresponding position of the restriction matrix;
calculating values of the remaining elements based on Gaussian distribution according to values of elements in a known limiting matrix to finally obtain a limiting matrix capable of reflecting target distribution and target motion;
from the constraint matrix, an adaptive regression objective is derived, which obeys the noise model with the constraint matrix.
9. A target tracking system based on residual layered attention and correlation filter, characterized in that: the system comprises a server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor, when executing the program, implementing the target tracking method based on residual layered attention and correlation filter according to any of claims 1-8.
CN201811592319.2A 2018-12-20 2018-12-20 Target tracking method and system based on residual layered attention and correlation filter Active CN109685831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811592319.2A CN109685831B (en) 2018-12-20 2018-12-20 Target tracking method and system based on residual layered attention and correlation filter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811592319.2A CN109685831B (en) 2018-12-20 2018-12-20 Target tracking method and system based on residual layered attention and correlation filter

Publications (2)

Publication Number Publication Date
CN109685831A CN109685831A (en) 2019-04-26
CN109685831B true CN109685831B (en) 2020-08-25

Family

ID=66189235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811592319.2A Active CN109685831B (en) 2018-12-20 2018-12-20 Target tracking method and system based on residual layered attention and correlation filter

Country Status (1)

Country Link
CN (1) CN109685831B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070563A (en) * 2019-04-30 2019-07-30 山东大学 Correlation filter method for tracking target and system based on joint perception
CN110210551B (en) * 2019-05-28 2021-07-30 北京工业大学 Visual target tracking method based on adaptive subject sensitivity
CN110335290B (en) * 2019-06-04 2021-02-26 大连理工大学 Twin candidate region generation network target tracking method based on attention mechanism
CN110443852B (en) * 2019-08-07 2022-03-01 腾讯科技(深圳)有限公司 Image positioning method and related device
CN110827320B (en) * 2019-09-17 2022-05-20 北京邮电大学 Target tracking method and device based on time sequence prediction
CN111080541B (en) * 2019-12-06 2020-10-30 广东启迪图卫科技股份有限公司 Color image denoising method based on bit layering and attention fusion mechanism
CN110992404B (en) * 2019-12-23 2023-09-19 驭势科技(浙江)有限公司 Target tracking method, device and system and storage medium
CN111724410A (en) * 2020-05-25 2020-09-29 天津大学 Target tracking method based on residual attention
CN112907607B (en) * 2021-03-15 2024-06-18 德鲁动力科技(成都)有限公司 Deep learning, target detection and semantic segmentation method based on differential attention
CN113297959A (en) * 2021-05-24 2021-08-24 南京邮电大学 Target tracking method and system based on corner attention twin network
CN113627240B (en) * 2021-06-29 2023-07-25 南京邮电大学 Unmanned aerial vehicle tree species identification method based on improved SSD learning model
CN113689464A (en) * 2021-07-09 2021-11-23 西北工业大学 Target tracking method based on twin network adaptive multilayer response fusion
CN113947618B (en) * 2021-10-20 2023-08-29 哈尔滨工业大学 Self-adaptive regression tracking method based on modulator

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8401248B1 (en) * 2008-12-30 2013-03-19 Videomining Corporation Method and system for measuring emotional and attentional response to dynamic digital media content
CN103065326A (en) * 2012-12-26 2013-04-24 西安理工大学 Target detection method based on time-space multiscale motion attention analysis
CN103514608A (en) * 2013-06-24 2014-01-15 西安理工大学 Movement target detection and extraction method based on movement attention fusion model
CN104243916A (en) * 2014-09-02 2014-12-24 江苏大学 Moving object detecting and tracking method based on compressive sensing
CN106530329A (en) * 2016-11-14 2017-03-22 华北电力大学(保定) Fractional differential-based multi-feature combined sparse representation tracking method
CN106898015A (en) * 2017-01-17 2017-06-27 华中科技大学 A kind of multi thread visual tracking method based on the screening of self adaptation sub-block
CN107016689A (en) * 2017-02-04 2017-08-04 中国人民解放军理工大学 A kind of correlation filtering of dimension self-adaption liquidates method for tracking target

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8401248B1 (en) * 2008-12-30 2013-03-19 Videomining Corporation Method and system for measuring emotional and attentional response to dynamic digital media content
CN103065326A (en) * 2012-12-26 2013-04-24 西安理工大学 Target detection method based on time-space multiscale motion attention analysis
CN103514608A (en) * 2013-06-24 2014-01-15 西安理工大学 Movement target detection and extraction method based on movement attention fusion model
CN104243916A (en) * 2014-09-02 2014-12-24 江苏大学 Moving object detecting and tracking method based on compressive sensing
CN106530329A (en) * 2016-11-14 2017-03-22 华北电力大学(保定) Fractional differential-based multi-feature combined sparse representation tracking method
CN106898015A (en) * 2017-01-17 2017-06-27 华中科技大学 A kind of multi thread visual tracking method based on the screening of self adaptation sub-block
CN107016689A (en) * 2017-02-04 2017-08-04 中国人民解放军理工大学 A kind of correlation filtering of dimension self-adaption liquidates method for tracking target

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
The Role of Visual Attention in Multiple Object Tracking; Doran M M et al.; Attention, Perception & Psychophysics; 20101231; Vol. 72, No. 1; pp. 33-52 *
Research on visual target tracking based on saliency; Wu Bo; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20180115 (No. 1); pp. I138-98 *

Also Published As

Publication number Publication date
CN109685831A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
CN109685831B (en) Target tracking method and system based on residual layered attention and correlation filter
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN108399611B (en) Multi-focus image fusion method based on gradient regularization
CN108830818B (en) Rapid multi-focus image fusion method
CN109543559B (en) Target tracking method and system based on twin network and action selection mechanism
CN109741366B (en) Related filtering target tracking method fusing multilayer convolution characteristics
Wang et al. A review of image super-resolution approaches based on deep learning and applications in remote sensing
CN113012172A (en) AS-UNet-based medical image segmentation method and system
CN104156918B (en) SAR image noise suppression method based on joint sparse representation and residual fusion
CN112712546A (en) Target tracking method based on twin neural network
Gong et al. Combining sparse representation and local rank constraint for single image super resolution
CN116681679A (en) Medical image small target segmentation method based on double-branch feature fusion attention
CN112232134A (en) Human body posture estimation method based on hourglass network and attention mechanism
CN112785624A (en) RGB-D characteristic target tracking method based on twin network
CN113589286B (en) Unscented Kalman filtering phase unwrapping method based on D-LinkNet
Li et al. Transformer helps identify kiwifruit diseases in complex natural environments
He et al. Remote sensing image super-resolution using deep–shallow cascaded convolutional neural networks
Sreelakshmi et al. Fast and denoise feature extraction based ADMF–CNN with GBML framework for MRI brain image
Abbasi-Sureshjani et al. Boosted exudate segmentation in retinal images using residual nets
Xu et al. AutoSegNet: An automated neural network for image segmentation
CN114049314A (en) Medical image segmentation method based on feature rearrangement and gated axial attention
CN108305268A (en) A kind of image partition method and device
Wang et al. Multi-feature fusion tracking algorithm based on generative compression network
CN106934398A (en) Image de-noising method based on super-pixel cluster and rarefaction representation
CN116740362B (en) Attention-based lightweight asymmetric scene semantic segmentation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant