CN109685831B - Target tracking method and system based on residual layered attention and correlation filter - Google Patents

Target tracking method and system based on residual layered attention and correlation filter

Info

Publication number
CN109685831B
CN109685831B (application CN201811592319.2A)
Authority
CN
China
Prior art keywords
target
sample
network
attention
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811592319.2A
Other languages
Chinese (zh)
Other versions
CN109685831A (en)
Inventor
马昕
黄文慧
宋锐
荣学文
田国会
李贻斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN201811592319.2A
Publication of CN109685831A
Application granted
Publication of CN109685831B
Legal status: Active

Classifications

    • G06T7/246 — Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06N3/044 — Neural networks; architecture, e.g. interconnection topology; recurrent networks, e.g. Hopfield networks
    • G06N3/045 — Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N3/08 — Neural networks; learning methods
    • G06T2207/10004 — Indexing scheme for image analysis or image enhancement; image acquisition modality; still image; photographic image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a target tracking method and system based on residual layered attention and a correlation filter. The disclosure uses an end-to-end trained convolutional neural network with a correlation filter embedded as a layer of the network, enabling real-time tracking of moving targets. Through residual layered attention learning, more effective and robust convolutional target features are obtained, which markedly improves the generalization capability of target tracking. In addition, the multi-context correlation filtering layer jointly achieves context awareness and adaptation of the regression target, which markedly improves the discrimination capability of target tracking.

Description

Target tracking method and system based on residual layered attention and correlation filter
Technical Field
The disclosure relates to a target tracking method and system based on residual layered attention and correlation filter.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Target tracking of a moving target is an important branch of computer vision and a research hotspot, and is widely applied in many fields, such as motion event detection, video monitoring, biological vision, and the like. However, target tracking remains a very challenging issue due to the problems of shape change, illumination change, occlusion, background interference, etc. that often occur during tracking.
In recent years, target tracking methods based on correlation filters have attracted wide attention and developed rapidly. These methods achieve high tracking precision at high processing speed. During tracking, the context contains many important foreground and background cues that help improve the accuracy of target localization. However, correlation filter based tracking methods are generally not context-aware; even those that do use context information retain very little of it, because the search region of each frame contains only a small amount of context and the cosine window used to reduce boundary effects further suppresses it.
In the last five years, deep networks and machine learning have gradually been applied to target tracking and have greatly improved its performance; compared with traditional tracking methods, both tracking accuracy and success rate are markedly higher. However, many deep learning based tracking methods simply stack existing tracking schemes on top of a pre-trained network such as VGG or AlexNet; they struggle to meet the real-time requirement of target tracking, are not trained end to end, and therefore do not fully exploit the advantages of deep networks.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides a target tracking method and system based on residual layered attention and a correlation filter. The present disclosure uses an end-to-end trained convolutional neural network with a correlation filter embedded as a layer of the network, enabling real-time tracking of moving targets. Through residual layered attention learning, more effective and robust convolutional target features are obtained, which markedly improves the generalization capability of target tracking. In addition, the multi-context correlation filtering layer jointly achieves context awareness and adaptation of the regression target, which markedly improves the discrimination capability of target tracking.
According to some embodiments, the following technical scheme is adopted in the disclosure:
a target tracking method based on residual layered attention and correlation filter comprises the following steps:
(1) reading the current frame image, obtaining the position and the scale of a target in the previous frame image, and further determining a test sample in the current frame;
(2) inputting a test sample into a trained convolutional neural network to obtain the convolutional characteristic of the test sample, inputting the characteristic into a multi-context correlation filter layer, obtaining a network response through a model parameter, and determining the position and the scale of a target in a current frame;
(3) acquiring a training sample according to the position and the scale of the target in the current frame, inputting the training sample into the convolutional neural network and the residual layered attention module, and obtaining training sample features containing attention information;
(4) extracting conversion samples according to the position of the target in the current frame, inputting them into the convolutional neural network, and obtaining an adaptive regression target based on the network responses of the conversion samples; then extracting context samples and obtaining their features, and obtaining filter parameters containing multi-context information according to the attention-aware training sample features and the adaptive regression target;
(5) updating the original model parameters with the obtained filter parameters.
By way of further limitation, the method further comprises the step (6) of updating to the next frame image, and continuously iterating the steps (1) - (5) until all image processing is completed.
By way of further limitation, in step (1), the determining of the test sample of the selected target in the current frame includes: and taking the target position of the previous frame in the current frame image as the center, extracting an image slice with the size N times that of the target position of the previous frame, wherein N is larger than 1, and adjusting the image slice to a specified pixel to be used as a test sample of the current frame.
By way of further limitation, in step (2), the structure of the convolutional neural network includes:
adopting the structure of a first layer of convolution layer and a second layer of convolution layer of a VGG-16 network, and removing all pooling layers;
copying the convolutional layers into a symmetric twin structure, so that the network has a training branch and a test branch of identical structure;
adding an Hourglass structure with three pooling layers after the convolutional layers of the network training branch, serving as the residual layered attention module of the network;
the last layer of the network is a multi-context correlation filter layer, the inputs of which are the output of the attention module and the output of the test branch.
As a further limitation, in the step (2), the end-to-end convolutional neural network is pre-trained by using a training data set.
As a further limitation, in step (2), the pre-training comprises:
preprocessing training data, extracting a pair of image frames every a plurality of frames, extracting an image slice in a range larger than a target size, and adjusting the size of the image slice to a set pixel to be used as a sample of a training network;
training the network by adopting a random gradient descent method;
performing multiple epochs of iterative training over the complete data set on the convolutional neural network without the residual layered attention module;
and adding the residual layered attention module into the deep convolutional network, fixing the already-trained layers of the convolutional network, and performing multiple further epochs of iterative training over the complete data set.
As a further limitation, in the step (2), the test sample is input into the test branch of the network, and passes through two convolutional layers, so as to obtain the characteristics of the test sample.
By way of further limitation, in step (2), the determining the position and the scale of the target in the current frame includes: inputting the characteristics of the test sample into a multi-context correlation filter layer, and calculating a network response value according to the model parameters; in the tracking stage, a plurality of image slices with different scales are extracted and processed into test samples to obtain the characteristics and the network response of the test samples, and the scale and the position of the target in the current frame are respectively the scale of the target in the test sample with the maximum network response value and the position corresponding to the maximum response value.
As a further limitation, in step (3), the specific process of obtaining the training sample includes: in the current frame image, taking the target position in the current frame as the center, extracting an image slice with the size N times of the current target size, wherein N is larger than 1, and adjusting the image slice to a specified pixel to be used as a training sample of the target in the current frame image.
By way of further limitation, in step (3), the specific process of obtaining the training sample features containing the attention information includes:
inputting the training sample x_0 into the convolutional layers of the network training branch to obtain the output P(x_0); then inputting P(x_0) into the residual layered attention module to obtain the training sample features containing attention information:
Q(x_0) = Σ_u M_u(x_0) ∗ P(x_0) + P(x_0)
where ∗ denotes channel-wise Hadamard multiplication, M_u(x_0) denotes the attention distribution map generated by the attention module, u indexes the upsampling layers in the attention module, and Q(x_0) denotes the attention-aware training sample features to be input into the multi-context filter layer.
As a further limitation, in the step (4), an adaptive regression target is obtained, which specifically includes the following steps:
taking a set point as a center, extracting a plurality of conversion samples, wherein the scale of each conversion sample is consistent with that of a training sample, and constructing a limiting matrix with the central position and the scale consistent with those of the training samples, wherein the initial value of an element is 0;
inputting the conversion samples into a network test branch to obtain the characteristics of the conversion samples and obtain network response graphs, and taking the value of the central position of each response graph as the value of the corresponding position of the limiting matrix;
calculating values of the remaining elements based on Gaussian distribution according to values of elements in a known limiting matrix to finally obtain a limiting matrix capable of reflecting target distribution and target motion;
from the constraint matrix, an adaptive regression objective is derived, which obeys the noise model with the constraint matrix.
As a further limitation, in step (4), the specific process of obtaining the filter parameter containing the multiple context information includes:
taking a set point as a center, extracting a context sample, wherein the scale of the context sample is consistent with that of a training sample, and inputting the context sample to a network test branch to obtain the characteristics of the context sample;
in the training phase, filter parameters are obtained according to the features of the training samples containing attention, the features of the context samples and the limiting matrix in the adaptive regression target.
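For readability, the following is a minimal sketch of how the per-frame loop formed by steps (1)-(6) can be organized. It fixes only the control flow; every building block (sample extraction, the twin network, the multi-context correlation filter layer) is supplied by the caller as a callable, since the concrete implementations are described in the embodiments below.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical driver for steps (1)-(6): only the control flow is fixed here; the
# concrete operations are supplied by the caller as callables.
def run_tracker(frames: Iterable,                  # video frames; the first frame is assumed annotated
                init_box: Tuple[float, float, float, float],
                locate: Callable,                  # steps (1)-(2): (frame, box, model) -> new box
                learn_features: Callable,          # step (3): (frame, box) -> attention-aware features
                build_target: Callable,            # step (4): (frame, box, model) -> adaptive regression target
                solve_filter: Callable,            # step (4): (features, target) -> filter parameters
                update_model: Callable):           # step (5): (old params, new params) -> updated params
    box, model = init_box, None
    for frame in frames:
        if model is not None:
            box = locate(frame, box, model)        # steps (1)-(2): localize the target in the current frame
        feats = learn_features(frame, box)         # step (3): training sample -> attention features
        target = build_target(frame, box, model)   # step (4): conversion/context samples -> regression target
        params = solve_filter(feats, target)       # step (4): closed-form multi-context filter
        model = params if model is None else update_model(model, params)   # step (5)
        yield box                                  # step (6): proceed to the next frame
```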
In one or more embodiments, a target tracking system based on residual layered attention and correlation filter includes a server comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the above-described target tracking method based on residual layered attention and correlation filter.
In one or more embodiments, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the above-described target tracking method based on residual layered attention and correlation filter.
Compared with the prior art, the beneficial effects of the present disclosure are:
(1) an end-to-end trained convolutional neural network is proposed for target tracking that can meet the real-time requirement of tracking; a correlation filter is integrated into the proposed network as its correlation filtering layer, improving the discrimination capability of the network;
(2) residual layered attention learning is proposed, which improves the generalization capability of the network by exploiting residual information and the information of multiple upsampling layers in the attention module;
(3) by constructing a new objective function on top of the correlation filtering layer, a multi-context correlation filtering layer is proposed that performs context sensing and regression-target adaptation and fuses multi-context information, which benefits target localization and model learning and further improves network performance;
(4) the method can effectively and stably track a moving target in many complex environments, such as large-area occlusion, target appearance change, rapid target rotation, illumination change, background interference and the like.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a schematic diagram of a target tracking method based on a residual layered attention and correlation filter;
FIG. 2 is a schematic diagram of a proposed residual layered attention module;
FIG. 3 is a diagram comparing the context information used in conventional methods with that used in the present method;
FIG. 4 is a diagram of the process of extracting conversion samples and obtaining the adaptive regression target;
FIG. 5 shows the tracking accuracy and tracking success rate on the OTB50, OTB2013 and OTB2015 data sets;
FIG. 6 shows partial results of tracking different types of targets on the OTB data sets.
Detailed Description
the present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
In the present disclosure, terms such as "upper", "lower", "left", "right", "front", "rear", "vertical", "horizontal", "side", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only relational terms determined for convenience in describing structural relationships of the parts or elements of the present disclosure, and do not refer to any parts or elements of the present disclosure, and are not to be construed as limiting the present disclosure.
In the present disclosure, terms such as "fixedly connected", "connected", and the like are to be understood in a broad sense, and mean either a fixed connection or an integrally connected or detachable connection; may be directly connected or indirectly connected through an intermediate. The specific meanings of the above terms in the present disclosure can be determined on a case-by-case basis by persons skilled in the relevant art or technicians, and are not to be construed as limitations of the present disclosure.
Example one
In one or more embodiments, a target tracking method based on residual layered attention and correlation filter is disclosed, as shown in FIG. 1, which performs real-time target tracking through the proposed end-to-end convolutional network with a residual layered attention mechanism and a multi-context correlation filtering layer.
For a given frame, the test sample is input into the test branch of the proposed symmetric twin network to obtain the features of the test sample; the convolutional layers of the test branch adopt the structure of the first and second convolutional layers of the VGG-16 network with all pooling layers removed.
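A minimal sketch of one branch of this twin structure is given below, assuming that "first and second convolutional layers of VGG-16" refers to the first two VGG-16 convolution blocks; the channel counts are therefore assumptions, and only the removal of pooling is taken directly from the text.

```python
import torch
import torch.nn as nn

# Sketch of one branch of the symmetric twin (Siamese) feature extractor:
# VGG-16-style first and second convolution blocks with every pooling layer removed.
# Depth and channel counts are assumptions; the text only names the VGG-16 conv
# structure and the removal of pooling.
class ConvBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            # no nn.MaxPool2d here: all pooling layers are removed
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)

# The training and test branches share this structure (a symmetric twin network):
test_branch = ConvBranch()
z = test_branch(torch.randn(1, 3, 128, 128))   # 128x128 test sample -> spatial size preserved
print(z.shape)                                 # torch.Size([1, 128, 128, 128])
```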
The obtained convolutional feature z is input into the multi-context correlation filtering layer to obtain the response of the filter, i.e. the output response G(z) of the network:
G(z) = F^{-1}( Σ_d ω̂_d* ∘ ẑ_d )
where ω denotes the model parameters, ˆ denotes the discrete Fourier transform, ∘ denotes element-wise multiplication of matrix elements, F^{-1} denotes the inverse discrete Fourier transform, and d indexes the feature channels.
The above operations are repeated on the test samples of different scales in the current frame to obtain the test sample features z_s and the corresponding network responses G(z_s). The position corresponding to the maximum value over all network responses is taken as the position of the target in the current frame, and the optimal scale of the target is the scale of the test sample whose response contains that maximum value:
s* = argmax_s max( G(z_s) )
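The following sketch illustrates this localization step under the assumption that the correlation filtering layer computes the standard Fourier-domain response shown above (conjugate of the filter spectrum times the sample spectrum, summed over channels); the scale set and tensor shapes are illustrative.

```python
import torch

# Sketch of the multi-scale localization step, assuming the response
#   G(z) = F^-1( sum_d conj(w_hat_d) * z_hat_d ).
# `w_hat` are the Fourier-domain model parameters; shapes are illustrative.
def cf_response(w_hat: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """w_hat, z: (C, H, W); returns the real-valued response map (H, W)."""
    z_hat = torch.fft.fft2(z)
    g_hat = (torch.conj(w_hat) * z_hat).sum(dim=0)          # element-wise product, summed over channels
    return torch.fft.ifft2(g_hat).real

def locate_target(w_hat, multi_scale_feats, scales):
    """Pick the scale whose response peak is largest, and the peak position within it."""
    best_val, best_scale, best_pos = -float("inf"), None, None
    for feat, s in zip(multi_scale_feats, scales):
        resp = cf_response(w_hat, feat)
        val = resp.max().item()
        if val > best_val:
            best_val, best_scale = val, s
            best_pos = divmod(int(resp.argmax()), resp.shape[-1])   # (row, col) of the peak
    return best_pos, best_scale

# toy usage with random features for three scaled test samples
C, H, W = 128, 64, 64
w_hat = torch.fft.fft2(torch.randn(C, H, W))
feats = [torch.randn(C, H, W) for _ in range(3)]
pos, scale = locate_target(w_hat, feats, scales=[0.985, 1.0, 1.015])
```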
entering a training stage, and extracting a training sample x according to the current position and scale of the target0Will train sample x0Inputting convolution layer in network training branch to obtain output P (x)0) The convolutional layer structure in the training branch is consistent with that in the testing branch; then, P (x)0) Inputting the data into a residual error layered attention module to obtain training sample characteristics containing attention information, wherein the formula is as follows:
Q(x0)=∑uMu(x0)*P(x0)+P(x0)
wherein, denotes the multiplication of Hadamard by channel, Mu(x0) Denotes the attention profile generated by the attention module, u denotes the number of upsampled layers in the attention module, Q (x)0) Representing the training sample features with attention information to be input to the multi-context filter layer.
Context samples and conversion samples are then extracted according to the current position and scale of the target and input into the test branch of the proposed symmetric twin network to obtain, respectively, the convolutional features of the context samples and of the conversion samples, and an adaptive regression target is constructed from the features of the conversion samples.
In the multi-context correlation filtering layer, the present disclosure proposes a new objective function; by solving for its optimal value, context sensing, adaptation of the regression target and learning of the filter parameters are performed jointly:
min_{w,y} ‖X_0 w − y‖² + θ_1‖w‖² + θ_2 Σ_{i=1}^{k} ‖X_i w‖² + θ_3‖y − y_0‖²
where w denotes the filter parameters; y_0 denotes the constraint matrix used to construct the regression target y; X_0 denotes the sample acquired in the image; X_i denotes a context sample; and the regularization parameters θ_1, θ_2, θ_3 ∈ (0,1] are constants that prevent overfitting. Here X_0 and X_i are circulant (cyclic-shift) sample matrices whose base samples are x_0 and x_i, respectively.
By solving the optimal value of the objective function of the proposed multi-context correlation filter layer, the closed form solution of the filter parameters can be calculated by using the properties of the circulant matrix and an inversion formula as follows:
ŵ = ( x̂_0* ∘ ŷ ) / ( x̂_0* ∘ x̂_0 + θ_1 + θ_2 Σ_{i=1}^{k} x̂_i* ∘ x̂_i )
where x̂_0* is the complex conjugate of x̂_0 and the division is element-wise. In the training phase, the filter parameters w are obtained from the obtained training sample features x_0, the context features x_i, and the constraint matrix y_0 of the adaptive regression target. Note that here x_0 denotes the features of the training sample that are input to the correlation filtering layer, not the training sample itself, and x_i denotes the features of the context sample, not the context sample itself.
Based on the new filter parameters, the original model parameters are updated to complete the training process, and the formula is as follows:
ω_n = (1 − λ)·ω_{n−1} + λ·w
where ω_{n−1} denotes the original model parameters and λ ∈ [0,1] is a constant learning rate.
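A sketch of the training-stage computation is given below. It assumes the standard context-aware correlation filter solution in the Fourier domain, with θ_1 regularizing the filter and θ_2 weighting the context samples, followed by the linear model update with learning rate λ; the exact closed form used by the disclosure may differ in detail.

```python
import torch

# Sketch of the Fourier-domain filter solution and the linear model update.
# theta1: regularizer on w; theta2: weight of the context samples. Treat the
# expression as an assumption consistent with the symbols defined in the text,
# not as the authoritative formula of the disclosure.
def solve_filter(x0, y0, contexts, theta1=1e-3, theta2=1.0):
    """x0: (C,H,W) attention features; y0: (H,W) regression target; contexts: list of (C,H,W)."""
    x0_hat = torch.fft.fft2(x0)
    y_hat = torch.fft.fft2(y0)
    num = torch.conj(x0_hat) * y_hat                        # x̂0* ∘ ŷ  (broadcast over channels)
    den = (torch.conj(x0_hat) * x0_hat).real + theta1
    for xi in contexts:                                     # context-aware term
        xi_hat = torch.fft.fft2(xi)
        den = den + theta2 * (torch.conj(xi_hat) * xi_hat).real
    return num / den                                        # ŵ, the filter in the Fourier domain

def update_model(omega_hat, w_hat, lam=0.012):
    """Linear interpolation between the old model parameters and the new filter."""
    return (1.0 - lam) * omega_hat + lam * w_hat

# toy usage
C, H, W = 128, 64, 64
w_hat = solve_filter(torch.randn(C, H, W), torch.randn(H, W),
                     [torch.randn(C, H, W) for _ in range(4)])
omega_hat = update_model(w_hat, w_hat)
```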
The method of the present application is described in detail below.
In the attention module, inspired by the way residual networks use residual skip connections to enhance network performance, the present disclosure proposes residual layered attention learning to obtain more generalizable and effective attention-aware convolutional features; FIG. 2 is a schematic diagram of the proposed residual layered attention module.
In a conventional attention mechanism, the output Q(x_0) of the attention module can be expressed as:
Q(x_0) = M(x_0) ∗ P(x_0)
where P(x_0) is the feature of the sample x_0 output by the convolutional layers of the training branch and input to the attention module, M(x_0) is the attention distribution map generated by the attention module, and Q(x_0) is the attention-aware convolutional feature output by the attention module.
However, in conventional attention mechanisms, multiplying the convolutional feature by an attention distribution map whose element values lie in the range 0 to 1 reduces the magnitude of the feature, which in many cases degrades the performance of the original convolutional network. Therefore, inspired by residual networks, an attention module based on residual information is proposed to solve this problem. In addition, the attention module proposed by the present disclosure adopts a bottom-up/top-down hourglass structure. Just as the convolutional features output by different convolutional layers contain different sample information, the outputs of different upsampling layers in the attention module reflect different attention information; the present disclosure therefore integrates them to obtain a more accurate attention distribution map.
The residual layered attention learning proposed by the present disclosure can be expressed by the following formula, in which the layered attention distribution maps are fused with the features output by the convolutional layers to form the final attention-aware convolutional features:
Q(x_0) = Σ_u M_u(x_0) ∗ P(x_0) + P(x_0)
where u indexes the upsampling layers in the attention module. The attention distribution maps output by different upsampling layers have different resolutions; the lower-resolution maps are processed with nearest-neighbour interpolation so that their resolution matches that of the higher-resolution maps.
As shown in FIG. 2, the attention module of the present disclosure uses skip connections as in residual networks. In the conventional attention mechanism, only the output of the last upsampling layer is used as the attention distribution map, and the information output by the earlier upsampling layers is discarded. Unlike conventional attention mechanisms, the present disclosure extracts and combines the attention information, with different meanings and contributions, output by multiple upsampling layers. In FIG. 2, the outputs of the earlier upsampling layers, with lower resolution, contain more global information, which helps locate the target and prevent drift caused by occlusion or other factors; the outputs of the later upsampling layers, with higher resolution, contain more accurate local information, which helps distinguish the target from similar objects and adapt to changes of the target.
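A compact sketch of such a residual layered attention module is shown below: a three-level hourglass whose every upsampling stage produces an attention map that is resized with nearest-neighbour interpolation and fused with the input features through a residual connection. The internal convolutions, channel widths and the sigmoid used to form the maps are assumptions; only the three pooling stages, the layered fusion Q = Σ_u M_u ∗ P + P and the nearest-neighbour resizing come from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a residual layered attention module: a bottom-up/top-down hourglass with
# three pooling stages; the output of EVERY upsampling stage (not only the last one)
# is turned into an attention map, resized to the feature resolution, and fused with
# the input features through a residual connection: Q(x0) = sum_u M_u(x0) * P(x0) + P(x0).
class ResidualLayeredAttention(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.down = nn.ModuleList([nn.Sequential(nn.MaxPool2d(2),
                                                 nn.Conv2d(channels, channels, 3, padding=1),
                                                 nn.ReLU(inplace=True)) for _ in range(3)])
        self.up = nn.ModuleList([nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                               nn.ReLU(inplace=True)) for _ in range(3)])
        self.to_map = nn.Conv2d(channels, channels, 1)   # 1x1 conv before the sigmoid (assumed)

    def forward(self, p):                                # p = P(x0): features from the conv layers
        h = p
        for d in self.down:                              # bottom-up: three pooling stages
            h = d(h)
        maps = []
        for u in self.up:                                # top-down: three upsampling stages
            h = F.interpolate(u(h), scale_factor=2, mode='nearest')
            m = torch.sigmoid(self.to_map(h))            # attention map M_u at this resolution
            maps.append(F.interpolate(m, size=p.shape[-2:], mode='nearest'))
        q = p.clone()
        for m in maps:                                   # layered fusion with the residual connection
            q = q + m * p
        return q

attn = ResidualLayeredAttention()
q = attn(torch.randn(1, 128, 64, 64))
print(q.shape)                                           # torch.Size([1, 128, 64, 64])
```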
Context information can provide additional auxiliary information for target localization during tracking, which helps improve the accuracy of target tracking, especially in complex environments. However, conventional correlation filter based methods retain only a small amount of context information during tracking because of the cosine window used to alleviate boundary effects; FIG. 3 compares the context information used in conventional methods with that used in the present method.
As shown in FIG. 3, in the model training phase the method of the present disclosure extracts context samples for the model update. Context samples x_{1:k} are extracted centred at A + p_n, where p_n is the position of the target in the n-th frame and A = [−size(x_0,1), 0; 0, −size(x_0,2); size(x_0,1), 0; 0, size(x_0,2)]; the scale of the context samples is consistent with that of the training sample.
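A small sketch of how the four context-sample centres A + p_n implied by this offset matrix can be computed is given below (coordinates are taken as (row, column); the cropping of the samples themselves is omitted).

```python
import numpy as np

# Sketch of the four context-sample centres A + p_n described above: one
# training-sample height above/below the target and one width to its left/right.
def context_centers(p_n, sample_size):
    """p_n: (row, col) target centre in frame n; sample_size: (h, w) of the training sample x0."""
    h, w = sample_size
    A = np.array([[-h, 0],      # above
                  [ 0, -w],     # left
                  [ h, 0],      # below
                  [ 0,  w]])    # right
    return A + np.asarray(p_n)  # centres of the context samples x_1..x_4

print(context_centers((200, 300), (96, 96)))
# [[104 300]
#  [200 204]
#  [296 300]
#  [200 396]]
```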
Unlike conventional correlation filter based target tracking methods, which use a static Gaussian-shaped regression target, the method of the present disclosure uses a dynamic regression target y that adapts to the motion of the target and to its distribution, where y obeys a noise model determined by the constraint matrix y_0 and y_0 is the constraint matrix used to construct the regression target y. FIG. 4 illustrates the process of extracting conversion samples and obtaining the adaptive regression target.
As shown in FIG. 4, j conversion samples m_{1:j} are extracted centred at T + p_n, where the scale of each conversion sample is consistent with that of the training sample, p_n is the position of the target in the n-th frame, j = 7, and T = [t_1, t_2, …, t_j] = [0,0; 0,1; 1,1; 0,−1; −1,0; −1,−1]·ρ, with ρ a scaling factor.
The samples m_{1:j} are input into the network test branch to obtain the network response maps G(m_{1:j}), and the value at the centre of each response map is taken as the value of the constraint matrix y_0 at the corresponding position, i.e.:
y_0(t_{1:j} + p_n) = G(m_{1:j})
According to the known element values of y_0, the values of the remaining elements are computed based on a Gaussian distribution, finally yielding the adaptive regression target. As shown in the top view of the regression target in FIG. 4, the adaptive regression target used in the present disclosure reflects the distribution of the target better than a Gaussian-shaped regression target.
The proposed target tracking method based on residual layered attention and correlation filters was evaluated on three datasets OTB50, OTB2013, OTB 2015. The training process of the end-to-end convolutional network provided by the disclosure is introduced firstly, then the specific configuration of the experiment and the adopted evaluation method are given, and finally the experimental results obtained on the three data sets are analyzed.
The structure and training process of the end-to-end convolutional neural network is as follows:
adopting the structure of a first layer of convolution layer and a second layer of convolution layer of a VGG-16 network, and removing all pooling layers;
copying the convolutional layers into a symmetric twin structure, so that the network has a training branch and a test branch of identical structure;
adding an Hourglass structure with three pooling layers after the convolutional layers of the network training branch, serving as the residual layered attention module of the network;
the last layer of the network is a multi-context correlation filter layer, the inputs of which are the output of the attention module and the output of the test branch.
The pre-training process for the convolutional neural network specifically comprises the following steps:
preprocessing training data, extracting a pair of image frames every 10 frames, extracting image slices in a range of 3 times of a target size, and adjusting the size of the image slices to 128 × 128 pixels;
training the network by stochastic gradient descent, with the momentum set to 0.9, the weight decay set to 0.005, and the learning rate set to 1e-2;
the loss function of the network training adopts a regression loss function, and the formula is as follows:
Figure GDA0002579442710000131
wherein G (z) is a network response function to the sample z, and y is a regression target obeying a Gaussian distribution;
performing 50 epochs of iterative training over the complete data set on the convolutional neural network without the residual layered attention module;
and adding the residual layered attention module into the deep convolutional network, fixing the already-trained layers of the convolutional network, and performing 20 further epochs of iterative training over the complete data set.
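A sketch of this two-stage schedule is given below; `net_without_attention` and `net_with_attention` stand for the end-to-end network before and after the residual layered attention module is inserted (they are assumed to share the stage-1 layers), and `loader` is assumed to yield (sample, Gaussian regression target) pairs.

```python
import torch
import torch.nn as nn

# Sketch of the two-stage pre-training schedule: stage 1 trains the network without
# the attention module for 50 epochs; stage 2 freezes those layers, inserts the
# residual layered attention module and trains for 20 more epochs. The networks and
# the data loader are assumed to be provided by the caller.
def pretrain(net_without_attention: nn.Module, net_with_attention: nn.Module, loader):
    criterion = nn.MSELoss()                     # regression (L2) loss against the Gaussian target

    def run(net, epochs):
        params = [p for p in net.parameters() if p.requires_grad]
        opt = torch.optim.SGD(params, lr=1e-2, momentum=0.9, weight_decay=5e-3)
        for _ in range(epochs):                  # one epoch = one pass over the complete data set
            for sample, target in loader:
                opt.zero_grad()
                loss = criterion(net(sample), target)
                loss.backward()
                opt.step()

    # Stage 1: 50 epochs on the convolutional network without the attention module
    run(net_without_attention, epochs=50)

    # Stage 2: freeze the layers trained in stage 1, then train the network that
    # includes the attention module for another 20 epochs
    for p in net_without_attention.parameters():
        p.requires_grad = False
    run(net_with_attention, epochs=20)
```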
The experimental configuration of the present disclosure is as follows: experiments were conducted on a 2.59 GHz computer equipped with 8 GB of memory, an i5 processor and an Nvidia GTX 1070 GPU; in a PyTorch environment the method runs at up to 36 frames per second, meeting the real-time requirement. All parameters of the method were kept fixed throughout the experiments. In the layered attention mechanism, the number of upsampling layers is u = 3; the regularization parameters θ_1, θ_2, θ_3 are 1e-3, 1 and 0.5, respectively; the learning parameter λ is 0.012.
The results of the proposed method (Ours) on the OTB data sets were compared with those of 17 other high-performance target tracking methods: SRDCFdecon, MUSTer, LCT, SRDCF, Staple_CA, CFNet, SiamFC, HDT, Staple, DCF_CA, SAMF, MEEM, DSST, KCF, TGPR, DLT and STC. The results of SRDCFdecon, LCT, SRDCF, CFNet, SiamFC, HDT, Staple and DSST are taken from the results published by their authors; the results of Staple_CA, DCF_CA, SAMF and KCF were obtained by running the code released by their authors on the experimental equipment; the results of MUSTer, MEEM, TGPR, DLT and STC come from the authors of LCT. The present disclosure uses the area under the curve (AUC) of the success plot and the precision at a threshold of 20 pixels to rank the tracking success rate and the tracking accuracy, respectively, of these 18 methods. FIG. 5 shows the tracking success rate and accuracy of the top-12 methods. Table 1 summarizes the types of the 18 target tracking methods used for the performance comparison.
TABLE 1 types of target tracking methods for comparison
(Table 1 is reproduced as an image in the original publication; it lists each of the 18 compared trackers and its type.)
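For reference, the two OTB metrics used here can be computed as in the following sketch (precision at a 20-pixel centre-error threshold, and the area under the success-rate curve over overlap thresholds); the per-frame values in the usage example are arbitrary toy numbers.

```python
import numpy as np

# Sketch of the two OTB metrics: precision at a 20-pixel centre-error threshold,
# and the area under the success-rate curve (AUC) over overlap thresholds.
def precision_at_20(pred_centers, gt_centers, threshold=20.0):
    err = np.linalg.norm(np.asarray(pred_centers) - np.asarray(gt_centers), axis=1)
    return float(np.mean(err <= threshold))

def success_auc(ious, thresholds=np.linspace(0, 1, 101)):
    ious = np.asarray(ious)
    success = [np.mean(ious > t) for t in thresholds]   # fraction of frames with IoU above each threshold
    return float(np.mean(success))                      # AUC of the success plot

# toy usage on arbitrary per-frame values
pred = [(10, 12), (40, 41), (100, 130)]
gt   = [(11, 12), (70, 41), (101, 131)]
print(precision_at_20(pred, gt))     # 2 of 3 frames within 20 px -> 0.666...
print(success_auc([0.8, 0.3, 0.55]))
```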
The evaluation of the proposed target tracking method based on residual layered attention and correlation filter is as follows:
on the OTB50 dataset, the AUC of the tracker proposed by the present disclosure was 0.591, which is 5.5% better than the second ranked SRDCFdecon (0.560). The tracker of the present disclosure outperforms the other two context-aware trackers, i.e., complete _ CA (0.542) and DCF _ CA (0.493), by 9.0% and 19.8%, respectively, which benefit from the convolution feature and the adaptation of the regression target used by the method of the present disclosure. In addition to the method of the present disclosure, the highest performing tracker with twin symmetry is CFNet (0.530), whose network structure uses two convolutional layers; however, the results were 11.5% lower than that of the present disclosure. In terms of accuracy, the tracker of the present disclosure ranks second (0.790), 1.8% lower than the HDT at the first bit (0.804). However, the tracker of the present disclosure is advantageous in that the processing speed of HDT is 10fps according to the data provided by the HDT author, and the method of the present disclosure performs three times faster than the HDT tracker.
On the OTB2013 data set, the AUC of the proposed tracker is 0.671, ranking first, 2.8% higher than SRDCFdecon (0.653). The proposed tracker outperforms the other two context-aware trackers, Staple_CA (0.615) and DCF_CA (0.592), by 9.1% and 13.3%, respectively. Apart from the method of the present disclosure, the best-performing tracker with a twin symmetric structure is CFNet (0.611), whose result is 9.8% lower than that of the present disclosure. In terms of accuracy, the proposed tracker (0.883) is only 0.7% lower than the HDT tracker (0.889), but the proposed tracker is markedly faster.
On the OTB2015 data set, the AUC of the proposed tracker is 0.623, ranking second, only 0.6% lower than the first-ranked SRDCFdecon (0.627). However, the proposed tracker runs about 30 times faster than SRDCFdecon, whose authors report a processing speed of 1 fps. The proposed tracker outperforms the other two context-aware trackers, Staple_CA (0.598) and DCF_CA (0.552), by 4.2% and 12.9%, respectively. Apart from the method of the present disclosure, the best-performing tracker with a twin symmetric structure is SiamFC (0.582), whose result is 7.0% lower than that of the present disclosure. In terms of accuracy, the proposed tracker achieves 0.815, ranking third.
The method provided by the present disclosure can stably track the target in complex scenes; FIG. 6 shows partial results of the method tracking various types of targets on the OTB data sets. The method tracks the target effectively and stably in many complex environments, such as large-area occlusion, target appearance change, rapid target rotation, illumination change and background interference.
Example two
In one or more embodiments, a target tracking system based on a residual layered attention and correlation filter includes a server including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the target tracking method based on the residual layered attention and correlation filter as described in the first embodiment.
EXAMPLE III
In one or more embodiments, a computer-readable storage medium is disclosed, on which a computer program is stored which, when executed by a processor, performs the target tracking method based on residual layered attention and correlation filter as described in Example One.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (9)

1. A target tracking method based on residual layered attention and correlation filter is characterized in that: the method comprises the following steps:
(1) reading the current frame image, obtaining the position and the scale of a target in the previous frame image, and further determining a test sample in the current frame;
(2) inputting a test sample into a trained convolutional neural network to obtain the convolutional characteristic of the test sample, inputting the characteristic into a multi-context correlation filter layer, obtaining a network response through a model parameter, and determining the position and the scale of a target in a current frame;
(3) acquiring a training sample according to the position and the scale of a target in a current frame, inputting the training sample into a convolutional neural network and a residual error layering attention module, and acquiring the characteristics of the training sample containing attention information;
(4) extracting a conversion sample according to the position of a target in a current frame, inputting the conversion sample into a convolutional neural network, obtaining a self-adaptive regression target based on the network response of the conversion sample, then extracting a context sample, obtaining the characteristics of the context sample, and obtaining filter parameters containing multiple context information according to the training sample characteristics containing attention information and the self-adaptive regression target;
(5) updating the original model parameters by using the obtained filter parameters;
in the step (4), the specific process of obtaining the filter parameter containing the multi-context information includes:
taking a set point as a center, extracting a context sample, wherein the scale of the context sample is consistent with that of a training sample, and inputting the context sample to a network test branch to obtain the characteristics of the context sample;
in the training phase, filter parameters are obtained according to the features of the training samples containing attention, the features of the context samples and the limiting matrix in the adaptive regression target.
2. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: and (6) updating to the next frame of image, and continuously iterating the steps (1) to (5) until all image processing is finished.
3. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: in the step (1), the determination process of the test sample of the selected target in the current frame includes: and taking the target position of the previous frame in the current frame image as the center, extracting an image slice with the size N times that of the target position of the previous frame, wherein N is larger than 1, and adjusting the image slice to a specified pixel to be used as a test sample of the current frame.
4. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: in the step (2), the structure of the convolutional neural network includes:
adopting the structure of a first layer of convolution layer and a second layer of convolution layer of a VGG-16 network, and removing all pooling layers;
copying the convolutional layer into a symmetrical twin network structure, so that the network has two training branches and testing branches with consistent structures;
adding a Hourglass structure with three pooling layers after the convolutional layer of the network training branch as a residual error layering attention module of the network;
the last layer of the network is a multi-context correlation filter layer, the inputs of which are the output of the attention module and the output of the test branch.
5. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: in the step (2), pre-training an end-to-end convolutional neural network by adopting a training data set;
the pre-training process comprises the following specific steps:
preprocessing training data, extracting a pair of image frames every a plurality of frames, extracting an image slice in a range larger than a target size, and adjusting the size of the image slice to a set pixel to be used as a sample of a training network;
training the network by adopting a random gradient descent method;
performing iterative training using a complete data set for a plurality of times on a convolutional neural network without a residual error hierarchical attention module;
and adding the residual error layering attention module into the deep convolutional network, fixing the trained layers in the convolutional network, and performing iterative training for using the complete data set for multiple times.
6. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: in the step (2), the test sample is input into a test branch of the network and passes through two layers of convolution layers to obtain the characteristics of the test sample;
or, in the step (2), determining the position and the scale of the target in the current frame includes: inputting the characteristics of the test sample into a multi-context correlation filter layer, and calculating a network response value according to the model parameters; in the tracking stage, a plurality of image slices with different scales are extracted and processed into test samples to obtain the characteristics and the network response of the test samples, and the scale and the position of the target in the current frame are respectively the scale of the target in the test sample with the maximum network response value and the position corresponding to the maximum response value.
7. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: in the step (3), the specific process of obtaining the training sample includes: in the current frame image, taking a target position in the current frame as a center, extracting an image slice with the size N times that of the current target, wherein N is larger than 1, and adjusting the image slice to a specified pixel to be used as a training sample of the target in the current frame image;
or, in the step (3), the specific process of obtaining the training sample features containing the attention information includes:
will train sample x0Inputting convolution layer in network training branch to obtain output P (x)0) (ii) a After that time, the user can use the device,p (x)0) Inputting the data into a residual error layering attention module to obtain training sample characteristics containing attention information:
Q(x0)=∑uMu(x0)*P(x0)+P(x0)
wherein, denotes the multiplication of Hadamard by channel, Mu(x0) Denotes the attention profile generated by the attention module, u denotes the number of upsampled layers in the attention module, Q (x)0) Representing the training sample features with attention information to be input to the multi-context filter layer.
8. The method of claim 1, wherein the target tracking method based on the residual layered attention and correlation filter comprises: in the step (4), a self-adaptive regression target is obtained, which specifically comprises the following steps:
taking a set point as a center, extracting a plurality of conversion samples, wherein the scale of each conversion sample is consistent with that of a training sample, and constructing a limiting matrix with the central position and the scale consistent with those of the training samples, wherein the initial value of an element is 0;
inputting the conversion samples into a network test branch to obtain the characteristics of the conversion samples and obtain network response graphs, and taking the value of the central position of each response graph as the value of the corresponding position of the restriction matrix;
calculating values of the remaining elements based on Gaussian distribution according to values of elements in a known limiting matrix to finally obtain a limiting matrix capable of reflecting target distribution and target motion;
from the constraint matrix, an adaptive regression objective is derived, which obeys the noise model with the constraint matrix.
9. A target tracking system based on residual layered attention and correlation filter, characterized in that: the system comprises a server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor, when executing the program, implementing the target tracking method based on residual layered attention and correlation filter according to any of claims 1-8.
CN201811592319.2A 2018-12-20 2018-12-20 Target tracking method and system based on residual layered attention and correlation filter Active CN109685831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811592319.2A CN109685831B (en) 2018-12-20 2018-12-20 Target tracking method and system based on residual layered attention and correlation filter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811592319.2A CN109685831B (en) 2018-12-20 2018-12-20 Target tracking method and system based on residual layered attention and correlation filter

Publications (2)

Publication Number Publication Date
CN109685831A CN109685831A (en) 2019-04-26
CN109685831B true CN109685831B (en) 2020-08-25

Family

ID=66189235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811592319.2A Active CN109685831B (en) 2018-12-20 2018-12-20 Target tracking method and system based on residual layered attention and correlation filter

Country Status (1)

Country Link
CN (1) CN109685831B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070563A (en) * 2019-04-30 2019-07-30 山东大学 Correlation filter method for tracking target and system based on joint perception
CN110210551B (en) * 2019-05-28 2021-07-30 北京工业大学 Visual target tracking method based on adaptive subject sensitivity
CN110335290B (en) * 2019-06-04 2021-02-26 大连理工大学 Twin candidate region generation network target tracking method based on attention mechanism
CN110443852B (en) * 2019-08-07 2022-03-01 腾讯科技(深圳)有限公司 Image positioning method and related device
CN110827320B (en) * 2019-09-17 2022-05-20 北京邮电大学 Target tracking method and device based on time sequence prediction
CN111080541B (en) * 2019-12-06 2020-10-30 广东启迪图卫科技股份有限公司 Color image denoising method based on bit layering and attention fusion mechanism
CN110992404B (en) * 2019-12-23 2023-09-19 驭势科技(浙江)有限公司 Target tracking method, device and system and storage medium
CN111724410A (en) * 2020-05-25 2020-09-29 天津大学 Target tracking method based on residual attention
CN112907607B (en) * 2021-03-15 2024-06-18 德鲁动力科技(成都)有限公司 Deep learning, target detection and semantic segmentation method based on differential attention
CN113297959A (en) * 2021-05-24 2021-08-24 南京邮电大学 Target tracking method and system based on corner attention twin network
CN113627240B (en) * 2021-06-29 2023-07-25 南京邮电大学 Unmanned aerial vehicle tree species identification method based on improved SSD learning model
CN113689464A (en) * 2021-07-09 2021-11-23 西北工业大学 Target tracking method based on twin network adaptive multilayer response fusion
CN113947618B (en) * 2021-10-20 2023-08-29 哈尔滨工业大学 Self-adaptive regression tracking method based on modulator

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8401248B1 (en) * 2008-12-30 2013-03-19 Videomining Corporation Method and system for measuring emotional and attentional response to dynamic digital media content
CN103065326A (en) * 2012-12-26 2013-04-24 西安理工大学 Target detection method based on time-space multiscale motion attention analysis
CN103514608A (en) * 2013-06-24 2014-01-15 西安理工大学 Movement target detection and extraction method based on movement attention fusion model
CN104243916A (en) * 2014-09-02 2014-12-24 江苏大学 Moving object detecting and tracking method based on compressive sensing
CN106530329A (en) * 2016-11-14 2017-03-22 华北电力大学(保定) Fractional differential-based multi-feature combined sparse representation tracking method
CN106898015A (en) * 2017-01-17 2017-06-27 华中科技大学 A kind of multi thread visual tracking method based on the screening of self adaptation sub-block
CN107016689A (en) * 2017-02-04 2017-08-04 中国人民解放军理工大学 A kind of correlation filtering of dimension self-adaption liquidates method for tracking target

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8401248B1 (en) * 2008-12-30 2013-03-19 Videomining Corporation Method and system for measuring emotional and attentional response to dynamic digital media content
CN103065326A (en) * 2012-12-26 2013-04-24 西安理工大学 Target detection method based on time-space multiscale motion attention analysis
CN103514608A (en) * 2013-06-24 2014-01-15 西安理工大学 Movement target detection and extraction method based on movement attention fusion model
CN104243916A (en) * 2014-09-02 2014-12-24 江苏大学 Moving object detecting and tracking method based on compressive sensing
CN106530329A (en) * 2016-11-14 2017-03-22 华北电力大学(保定) Fractional differential-based multi-feature combined sparse representation tracking method
CN106898015A (en) * 2017-01-17 2017-06-27 华中科技大学 A kind of multi thread visual tracking method based on the screening of self adaptation sub-block
CN107016689A (en) * 2017-02-04 2017-08-04 中国人民解放军理工大学 A kind of correlation filtering of dimension self-adaption liquidates method for tracking target

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
The Role of Visual Attention in Multiple Object Tracking; Doran M M et al.; Attention, Perception & Psychophysics; 20101231; Vol. 72, No. 1; pp. 33-52 *
Research on visual target tracking based on saliency; Wu Bo; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20180115 (No. 1); pp. I138-98 *

Also Published As

Publication number Publication date
CN109685831A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
CN109685831B (en) Target tracking method and system based on residual layered attention and correlation filter
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
CN108399611B (en) Multi-focus image fusion method based on gradient regularization
CN108830818B (en) Rapid multi-focus image fusion method
CN109543559B (en) Target tracking method and system based on twin network and action selection mechanism
CN109741366B (en) Related filtering target tracking method fusing multilayer convolution characteristics
Wang et al. A review of image super-resolution approaches based on deep learning and applications in remote sensing
CN113012172A (en) AS-UNet-based medical image segmentation method and system
CN104156918B (en) SAR image noise suppression method based on joint sparse representation and residual fusion
CN112712546A (en) Target tracking method based on twin neural network
Gong et al. Combining sparse representation and local rank constraint for single image super resolution
CN116681679A (en) Medical image small target segmentation method based on double-branch feature fusion attention
CN112232134A (en) Human body posture estimation method based on hourglass network and attention mechanism
CN112785624A (en) RGB-D characteristic target tracking method based on twin network
CN113589286B (en) Unscented Kalman filtering phase unwrapping method based on D-LinkNet
Li et al. Transformer helps identify kiwifruit diseases in complex natural environments
He et al. Remote sensing image super-resolution using deep–shallow cascaded convolutional neural networks
Sreelakshmi et al. Fast and denoise feature extraction based ADMF–CNN with GBML framework for MRI brain image
Abbasi-Sureshjani et al. Boosted exudate segmentation in retinal images using residual nets
Xu et al. AutoSegNet: An automated neural network for image segmentation
CN114049314A (en) Medical image segmentation method based on feature rearrangement and gated axial attention
CN108305268A (en) A kind of image partition method and device
Wang et al. Multi-feature fusion tracking algorithm based on generative compression network
CN106934398A (en) Image de-noising method based on super-pixel cluster and rarefaction representation
CN116740362B (en) Attention-based lightweight asymmetric scene semantic segmentation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant