CN109685831A

CN109685831A - Target tracking method and system based on residual hierarchical attention and correlation filter

Info

Publication number: CN109685831A
Application number: CN201811592319.2A
Authority: CN
Inventors: 马昕; 黄文慧; 宋锐; 荣学文; 田国会; 李贻斌
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2018-12-20
Filing date: 2018-12-20
Publication date: 2019-04-26
Anticipated expiration: 2038-12-20
Also published as: CN109685831B

Abstract

The present disclosure proposes a target tracking method and system based on residual hierarchical attention and a correlation filter. The disclosure uses an end-to-end trained convolutional neural network with the correlation filter as a layer of the network, which enables real-time tracking of a moving target. Residual hierarchical attention learning yields more effective and more robust convolutional target features and markedly improves the generalization ability of the tracker. In addition, the multi-context correlation filter layer jointly performs context awareness and regression target adaptation, which markedly improves the discriminative ability of the tracker.

Description

Target tracking method and system based on residual hierarchical attention and correlation filter

Technical field

The present disclosure relates to a target tracking method and system based on residual hierarchical attention and a correlation filter.

Background

The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

Tracking a moving target is an important branch and research hotspot of computer vision and is widely used in many fields, such as motion event detection, video surveillance, and biological vision. However, because shape changes, illumination changes, occlusion, and background interference often occur during tracking, target tracking remains a highly challenging problem.

In recent years, target tracking methods based on correlation filters have attracted wide attention and developed rapidly. These methods achieve high tracking accuracy at high processing speed. During tracking, contextual information carries many important foreground and background cues that help improve the accuracy of target localization. However, target tracking methods based on correlation filters usually cannot perform context awareness. Some of them do use contextual information during tracking, but the search region of each frame contains only a small amount of context, and the cosine window used to weaken the boundary effect further reduces the contextual information contained in the search region.

Over the past five years, deep networks and related machine learning techniques have gradually been applied to target tracking and have greatly improved its performance. Compared with traditional target tracking methods, these methods significantly improve tracking precision and tracking success rate. However, many target tracking methods based on deep learning take a pre-trained network such as VGG or AlexNet and then stack an existing tracking method on top of it. Such methods struggle to meet the real-time requirement of target tracking, and they neither train the network truly end to end nor fully exploit the advantages of deep networks.

Summary of the invention

To solve the above problems, the present disclosure proposes a target tracking method and system based on residual hierarchical attention and a correlation filter. The disclosure uses an end-to-end trained convolutional neural network with the correlation filter as a layer of the network, which enables real-time tracking of a moving target. Residual hierarchical attention learning yields more effective and more robust convolutional target features and markedly improves the generalization ability of the tracker. In addition, the multi-context correlation filter layer jointly performs context awareness and regression target adaptation, which markedly improves the discriminative ability of the tracker.

According to some embodiments, the disclosure adopts the following technical solution:

A target tracking method based on residual hierarchical attention and a correlation filter, comprising the following steps:

(1) reading the current frame image, obtaining the position and scale of the target in the previous frame image, and then determining the test sample in the current frame;

(2) inputting the test sample into the trained convolutional neural network to obtain the convolutional features of the test sample, inputting the features into the multi-context correlation filter layer, obtaining the network response from the model parameters, and determining the position and scale of the target in the current frame;

(3) obtaining a training sample according to the position and scale of the target in the current frame, and inputting the training sample into the convolutional neural network and the residual hierarchical attention module to obtain training sample features containing attention information;

(4) extracting transformation samples according to the position of the target in the current frame, inputting them into the convolutional neural network, and obtaining an adaptive regression target based on the network responses of the transformation samples; then extracting context samples, obtaining context sample features, and obtaining filter parameters containing multi-context information according to the training sample features containing attention information and the adaptive regression target;

(5) updating the original model parameters with the obtained filter parameters.

As a further limitation, the method also includes step (6): moving on to the next frame image and repeating steps (1)-(5) until all images have been processed.

As a further limitation, in step (1), determining the test sample of the target in the current frame includes: in the current frame image, centered on the target position in the previous frame, extracting an image patch whose scale is N times the target scale in the previous frame, where N is greater than 1, and resizing the image patch to a specified pixel size to serve as the test sample of the current frame.
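The geometry of this cropping step can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function name `search_window`, the (x, y) coordinate order, and the default values N = 3 and 128×128 (taken from the pre-training description and assumed here to carry over to tracking) are assumptions.

```python
def search_window(center, target_size, n=3.0, out_size=(128, 128)):
    """Return the crop box (x0, y0, x1, y1) and the resize factors.

    center: (cx, cy) of the target in the previous frame, in pixels.
    target_size: (w, h) of the target in the previous frame.
    n: scale multiple N (> 1); 3.0 and the 128x128 output size are
       assumed defaults, not values fixed by the method.
    """
    if n <= 1:
        raise ValueError("N must be greater than 1")
    cx, cy = center
    w, h = target_size
    crop_w, crop_h = n * w, n * h
    box = (cx - crop_w / 2, cy - crop_h / 2, cx + crop_w / 2, cy + crop_h / 2)
    # Factors that map the cropped patch onto the fixed network input size.
    factors = (out_size[0] / crop_w, out_size[1] / crop_h)
    return box, factors


box, factors = search_window(center=(100, 80), target_size=(20, 40))
```

The same geometry applies to the training sample of step (3), with the crop centered on the target position in the current frame instead.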

As a further limitation, in step (2), the structure of the convolutional neural network includes:

The structure of the first and second convolutional layers of the VGG-16 network is adopted, with all pooling layers removed;

These convolutional layers are duplicated into a symmetric Siamese network structure, so that the network has a training branch and a test branch with identical structure;

An hourglass structure with three pooling layers is added after the convolutional layers of the training branch to serve as the residual hierarchical attention module of the network;

The last layer of the network is the multi-context correlation filter layer, whose inputs are the output of the attention module and the output of the test branch.

As a further limitation, in step (2), the convolutional neural network is pre-trained end to end on a training dataset.

As a further limitation, in step (2), the pre-training process includes:

Preprocessing the training data: a pair of image frames is extracted every several frames, image patches are cropped over a range larger than the target size, and the patches are resized to a set pixel size to serve as samples for training the network;

Training the network with stochastic gradient descent;

Training the convolutional neural network without the residual hierarchical attention module for multiple iterations over the complete dataset;

Adding the residual hierarchical attention module to the deep convolutional network, freezing the already trained layers of the convolutional network, and training for multiple iterations over the complete dataset.

As a further limitation, in step (2), the test sample is input into the test branch of the network and passes through the two convolutional layers to obtain the features of the test sample.

As a further limitation, in step (2), determining the position and scale of the target in the current frame includes: inputting the test sample features into the multi-context correlation filter layer and computing the network response from the model parameters. In the tracking phase, image patches of several different scales are extracted and processed as test samples to obtain their features and network responses; the scale and position of the target in the current frame are, respectively, the target scale of the test sample that yields the maximum network response and the position of that maximum response.

As a further limitation, in step (3), obtaining the training sample includes: in the current frame image, centered on the target position in the current frame, extracting an image patch whose scale is N times the current target scale, where N is greater than 1, and resizing the image patch to a specified pixel size to serve as the training sample of the target in the current frame image.

As a further limitation, in step (3), obtaining the training sample features containing attention information includes:

The training sample x₀ is input into the convolutional layers of the training branch to obtain the output P(x₀); P(x₀) is then input into the residual hierarchical attention module to obtain the training sample features containing attention information:

Q(x₀)=∑_uM_u(x₀)*P(x₀)+P(x₀)

where * denotes the channel-wise Hadamard product, M_u(x₀) denotes an attention map generated by the attention module, u is the index (number) of the up-sampling layer in the attention module, and Q(x₀) denotes the training sample features with attention information that are input into the multi-context correlation filter layer.

As a further limitation, in step (4), the adaptive regression target is obtained as follows:

Several transformation samples are extracted centered on set points, the scale of the transformation samples being consistent with that of the training sample, and a restriction matrix is constructed whose center and scale are consistent with those of the training sample and whose elements are initialized to 0;

The transformation samples are input into the test branch of the network to obtain their features and network response maps, and the value at the center of each response map is taken as the value of the corresponding position in the restriction matrix;

From the known element values in the restriction matrix, the values of the remaining elements are calculated based on a Gaussian distribution, finally yielding a restriction matrix that reflects the target distribution and the target motion;

The adaptive regression target is obtained from the restriction matrix; the regression target and the restriction matrix obey a noise model.

As a further limitation, in step (4), obtaining the filter parameters containing multi-context information includes:

Context samples are extracted centered on set points, with the same scale as the training sample, and are input into the test branch of the network to obtain the context sample features;

In the training stage, the filter parameters are obtained from the training sample features containing attention, the context sample features, and the restriction matrix of the adaptive regression target.

In one or more embodiments, a target tracking system based on residual hierarchical attention and a correlation filter includes a server. The server includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the above target tracking method based on residual hierarchical attention and a correlation filter.

In one or more embodiments, a computer-readable storage medium stores a computer program which, when executed by a processor, performs the above target tracking method based on residual hierarchical attention and a correlation filter.

Compared with the prior art, the disclosure has the following beneficial effects:

(1) for target tracking, an end-to-end trained convolutional neural network is proposed that meets the real-time requirement of target tracking; the correlation filter is integrated into the proposed network as its correlation filter layer, which improves the discriminative ability of the network;

(2) residual hierarchical attention learning is proposed, which uses residual information and the information of multiple up-sampling layers in the attention module and improves the generalization ability of the network;

(3) by constructing a new objective function for the correlation filter layer, a multi-context correlation filter layer is proposed that performs context awareness and regression target adaptation and incorporates multi-context information that helps target localization into model learning, further improving the performance of the network;

(4) the disclosure tracks a moving target effectively and stably in many complex environments, such as large-area occlusion, target shape change, fast target rotation, illumination change, and background interference.

Brief description of the drawings

The accompanying drawings, which form a part of this application, provide a further understanding of the application. The illustrative embodiments of the application and their descriptions serve to explain the application and do not unduly limit it.

Fig. 1 is a schematic diagram of the target tracking method based on residual hierarchical attention and a correlation filter;

Fig. 2 is a schematic diagram of the proposed residual hierarchical attention module;

Fig. 3 compares the contextual information used in conventional methods with the contextual information used in this method;

Fig. 4 shows the process of extracting transformation samples and obtaining the adaptive regression target;

Fig. 5 shows the tracking precision and tracking success rate plots on the OTB50, OTB2013, and OTB2015 datasets;

Fig. 6 shows partial results of tracking different types of targets on the OTB datasets.

Detailed description of the embodiments

The disclosure is further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the technical field of the application.

It should be noted that the terminology used here only describes specific embodiments and is not intended to limit the illustrative embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular form is also intended to include the plural form. In addition, when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

In the disclosure, orientation or position terms such as "upper", "lower", "left", "right", "front", "rear", "vertical", "horizontal", "side", and "bottom" are based on the orientation or position shown in the drawings. They are relational terms chosen only to make it easier to describe the structural relationships of the components or elements of the disclosure; they do not refer specifically to any component or element and should not be understood as limiting the disclosure.

In the disclosure, terms such as "fixed", "connected", and "coupled" should be understood broadly: a connection may be fixed, integral, or detachable, and it may be direct or indirect through an intermediary. Researchers or technicians in the field can determine the specific meaning of these terms in the disclosure according to the circumstances, and the terms should not be understood as limiting the disclosure.

Embodiment one

One or more embodiments disclose a target tracking method based on residual hierarchical attention and a correlation filter. As shown in Fig. 1, real-time target tracking is performed by the proposed end-to-end convolutional network, which contains the residual hierarchical attention mechanism and the multi-context correlation filter layer.

In a given frame, a test sample is input into the test branch of the proposed symmetric Siamese network structure to obtain the test sample features. The convolutional layers of the test branch adopt the structure of the first and second convolutional layers of the VGG-16 network, with all pooling layers removed.

The obtained convolutional features z are input into the multi-context correlation filter layer to obtain the response of the filter, that is, the network output response G(z); the formula is as follows:

where ω is the model parameter, the accent "-" over a symbol denotes the discrete Fourier transform, and the product operator denotes element-wise multiplication of matrices.

The above operation is repeated for test samples of different scales in the current frame to obtain their features z_s and the corresponding network output responses G(z_s). The position of the maximum over all network responses is taken as the position of the target in the current frame, and the optimal scale of the target is obtained from the target scale of the test sample that yields the maximum response; the formula is as follows:
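The selection rule described above (take the global maximum over the response maps of all candidate scales) can be sketched with NumPy as below. The response maps are assumed to be already computed; the function name and the three scale factors are hypothetical values chosen for illustration, and the mapping of the peak back to image coordinates is omitted.

```python
import numpy as np


def select_scale_and_position(responses, scales):
    """Pick the scale and peak location with the largest response.

    responses: list of 2-D response maps G(z_s), one per candidate scale.
    scales: list of scale factors, in the same order (hypothetical values).
    """
    stacked = np.stack(responses)  # shape: (num_scales, H, W)
    s, row, col = np.unravel_index(np.argmax(stacked), stacked.shape)
    return scales[s], (int(row), int(col)), float(stacked[s, row, col])


rng = np.random.default_rng(0)
responses = [rng.random((5, 5)) * 0.5 for _ in range(3)]
responses[1][2, 3] = 0.9  # plant the global peak in the second scale
best_scale, peak, value = select_scale_and_position(responses, [0.97, 1.0, 1.03])
```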

In the training stage, the training sample x₀ is extracted according to the current position and scale of the target and input into the convolutional layers of the training branch to obtain the output P(x₀); the convolutional layer structure of the training branch is identical to that of the test branch. P(x₀) is then input into the residual hierarchical attention module to obtain the training sample features containing attention information; the formula is as follows:

Q(x₀)=∑_uM_u(x₀)*P(x₀)+P(x₀)

According to the current position and scale of the target, context samples and transformation samples are extracted and input into the test branch of the proposed symmetric Siamese network structure to obtain the convolutional features of the context samples and the features of the transformation samples, respectively. The adaptive regression target is then constructed from the features of the transformation samples.

For the multi-context correlation filter layer, the disclosure proposes a new objective function. By solving for the optimum of this objective function, context awareness, regression target adaptation, and learning of the filter parameters are performed jointly; the formula is as follows:

where w denotes the filter parameters; y₀ denotes the restriction matrix used to construct the regression target y; X₀ denotes the samples acquired in the image; X_i denotes the context samples; and the regularization parameters θ₁, θ₂, θ₃ ∈ (0,1] are constants used to prevent over-fitting. Here X₀ and X_i are circulant samples whose base samples are x₀ and x_i, respectively.

By solving for the optimum of the objective function of the proposed multi-context correlation filter layer, using the properties of circulant matrices and the matrix inversion formula, the closed-form solution of the filter parameters is obtained as:

where the conjugated term denotes the complex conjugate of x₀. In the training stage, the filter parameters w are computed from the obtained training sample features x₀, the context features x_i, and the restriction matrix y₀ of the adaptive regression target. Here x₀ is the feature of the training sample input into the correlation filter layer, not the training sample itself; likewise, x_i is the feature of the context sample input into the correlation filter layer, not the context sample itself.
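As a rough illustration of how a closed-form solution of this kind can be evaluated, the sketch below solves a simplified context-aware ridge regression in the Fourier domain for 1-D, single-channel signals and checks it against a dense linear solve. The simplified objective (only θ₁ and θ₂, with y₀ used directly as the target), the column-shift circulant convention, and all function names are assumptions of this sketch; it is not the objective or the solution of the disclosure, and the role of θ₃ is not modelled.

```python
import numpy as np


def circulant(x):
    """Matrix whose j-th column is x cyclically shifted by j positions."""
    return np.stack([np.roll(x, j) for j in range(len(x))], axis=1)


def solve_filter(x0, contexts, y0, theta1, theta2):
    """Fourier-domain minimiser of the assumed objective
    ||X0 w - y0||^2 + theta1 ||w||^2 + theta2 * sum_i ||Xi w||^2,
    where X0 and Xi are circulant matrices built from x0 and xi."""
    x0_hat = np.fft.fft(x0)
    denom = np.abs(x0_hat) ** 2 + theta1
    for xi in contexts:
        denom = denom + theta2 * np.abs(np.fft.fft(xi)) ** 2
    w_hat = np.conj(x0_hat) * np.fft.fft(y0) / denom
    return np.real(np.fft.ifft(w_hat))


rng = np.random.default_rng(0)
n = 16
x0 = rng.standard_normal(n)                            # target features
contexts = [rng.standard_normal(n) for _ in range(4)]  # context features
y0 = np.exp(-0.5 * (np.arange(n) - n // 2) ** 2 / 4.0) # regression target
theta1, theta2 = 1e-3, 1.0

w_fft = solve_filter(x0, contexts, y0, theta1, theta2)

# Reference: solve the same normal equations with dense circulant matrices.
X0 = circulant(x0)
A = X0.T @ X0 + theta1 * np.eye(n)
for xi in contexts:
    Xi = circulant(xi)
    A = A + theta2 * Xi.T @ Xi
w_dense = np.linalg.solve(A, X0.T @ y0)
```

The point of the Fourier form is cost: the element-wise division replaces the inversion of an n×n matrix. Where the conjugate appears depends on the shift convention chosen for the circulant matrix.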

Based on the new filter parameters, the original model parameters are updated, which completes the training process; the formula is as follows:

where ω is the original model parameter and λ ∈ [0,1] is a constant denoting the learning rate.
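A common form of such an update in correlation-filter trackers is a linear interpolation between the old model and the newly solved filter. The sketch below assumes that form, ω ← (1 − λ)ω + λw, which is an assumption of this illustration rather than the formula of the disclosure; the default λ = 0.012 is the learning parameter reported in the experiments.

```python
def update_model(omega, w, lam=0.012):
    """Blend the old model parameters with the new filter parameters.

    Assumed update rule: omega <- (1 - lam) * omega + lam * w.
    omega, w: flat sequences of equal length; lam: learning rate in [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [(1.0 - lam) * o + lam * n for o, n in zip(omega, w)]


updated = update_model([1.0, 0.0], [0.0, 1.0], lam=0.25)
```

With a small λ the model changes slowly, which makes it less sensitive to a single badly localized frame.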

The method of the application is described in detail below.

Attention module: inspired by residual networks, which use residual skip connections to enhance network performance, the disclosure proposes residual hierarchical attention learning to obtain convolutional features that generalize better and perceive attention more effectively. Fig. 2 is a schematic diagram of the proposed residual hierarchical attention module.

In a traditional attention mechanism, the output Q(x₀) of the attention module can be expressed as:

Q(x₀)=M(x₀)*P(x₀)

where P(x₀) is the input of the attention module, that is, the features of the sample x₀ output by the convolutional layers of the training branch; M(x₀) is the attention map generated by the attention module; and Q(x₀) is the convolutional feature containing attention information output by the attention module.

However, in a traditional attention mechanism the convolutional features are multiplied by an attention map whose element values lie between 0 and 1. This reduces the values of the convolutional features and in many cases degrades the performance of the original convolutional network. To address this problem, and inspired by residual networks, an attention module based on residual information is proposed. In addition, the attention module proposed by the disclosure uses a bottom-up, top-down hourglass structure. Just as the convolutional features output by different convolutional layers contain different sample information, the outputs of different up-sampling layers in the attention module reflect different attention information. The disclosure therefore combines them to obtain a more accurate attention map.

The residual hierarchical attention learning proposed by the disclosure can be expressed by the following formula, in which the hierarchical attention maps are fused with the features input from the convolutional layers to form the final convolutional features containing attention information:

Q(x₀)=∑_uM_u(x₀)*P(x₀)+P(x₀)

where u is the index (number) of the up-sampling layer in the attention module. The attention maps output by different up-sampling layers have different resolutions; the lower-resolution maps are processed with nearest-neighbor interpolation so that their resolution matches that of the higher-resolution maps.
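The fusion in the formula above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: each attention map is taken to be a single-channel 2-D array broadcast over all feature channels, the maps are assumed to be already computed by the hourglass module, and the function names are hypothetical.

```python
import numpy as np


def upsample_nearest(m, out_hw):
    """Nearest-neighbour resize of a 2-D map to out_hw = (H, W)."""
    h, w = m.shape
    rows = np.arange(out_hw[0]) * h // out_hw[0]
    cols = np.arange(out_hw[1]) * w // out_hw[1]
    return m[np.ix_(rows, cols)]


def residual_hierarchical_attention(p, maps):
    """Compute Q = sum_u M_u * P + P.

    p: convolutional features P(x0), shape (C, H, W).
    maps: list of 2-D attention maps M_u with values in [0, 1],
          possibly at lower resolution than the features.
    """
    p = np.asarray(p, dtype=float)
    q = p.copy()  # residual term: + P(x0)
    for m in maps:
        m_full = upsample_nearest(np.asarray(m, dtype=float), p.shape[1:])
        q += m_full[None, :, :] * p  # same map applied to every channel
    return q


p = np.ones((2, 4, 4))
maps = [np.full((1, 1), 0.5), np.full((2, 2), 0.25), np.zeros((4, 4))]
q = residual_hierarchical_attention(p, maps)
```

Because of the residual term, an all-zero attention map leaves the features unchanged instead of suppressing them, which is the motivation given in the text for the residual design.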

As shown in Fig. 2, the attention module of the disclosure uses the skip connections of residual networks. A traditional attention mechanism uses only the output of the last up-sampling layer as the attention map and discards the information output by the earlier up-sampling layers. Unlike the traditional attention mechanism, the disclosure uses the attention information output by multiple up-sampling layers, each with a different meaning and effect, and combines them. In Fig. 2, the outputs of the earlier, lower-resolution up-sampling layers contain more global information, which helps localize the target and prevents drift caused by occlusion or other factors; the outputs of the later, higher-resolution up-sampling layers contain more precise local information, which helps distinguish the target from similar objects and adapt to changes of the target.

Contextual information provides more auxiliary information for target localization during tracking, which helps improve tracking accuracy, especially in complex environments. However, because traditional methods based on correlation filters use a cosine window to alleviate the boundary effect, they retain only a small amount of contextual information during tracking. Fig. 3 compares the contextual information contained in conventional methods with the contextual information used in this method.

As shown in Fig. 3, the method of the disclosure extracts context samples in the model training stage and uses them to update the model. The context samples x_1:k are extracted centered on A+p_n, where p_n is the position of the target in the n-th frame and A=[-size(x₀,1),0; 0,-size(x₀,2); size(x₀,1),0; 0,size(x₀,2)]; the scale of the context samples is consistent with that of the training sample.
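The four context centers A + p_n can be computed as in the sketch below. It assumes MATLAB-style indexing, so that size(x₀,1) is the number of rows and size(x₀,2) the number of columns of the training sample, and that positions are given as (row, column); both are assumptions of this illustration, as is the function name.

```python
def context_centers(p_n, sample_size):
    """Return the centres of the four context samples around p_n.

    p_n: (row, col) position of the target in frame n.
    sample_size: (size(x0, 1), size(x0, 2)), i.e. rows and columns
                 of the training sample (assumed ordering).
    """
    rows, cols = sample_size
    # Rows of the offset matrix A, in the order given in the text.
    offsets = [(-rows, 0), (0, -cols), (rows, 0), (0, cols)]
    return [(p_n[0] + dr, p_n[1] + dc) for dr, dc in offsets]


centers = context_centers(p_n=(200, 300), sample_size=(128, 128))
```

Each context sample therefore sits exactly one sample size away from the target, so the four patches tile the region directly above, left of, below, and right of the training sample without overlapping it.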

Traditional target tracking methods based on correlation filters use a static, Gaussian-shaped regression target. In contrast, the method of the disclosure uses a dynamic regression target y to adapt to the motion and the distribution of the target, where y obeys the noise model y=y₀+n and y₀ is the restriction matrix used to construct the regression target y. Fig. 4 shows the process by which the method of the disclosure extracts transformation samples and obtains the adaptive regression target.

As shown in Fig. 4, j transformation samples m_1:j are extracted centered on T+p_n, the scale of the transformation samples being consistent with that of the training sample, where p_n is the position of the target in the n-th frame, j=7, and T=[t₁,t₂,...,t_j]=[0,0; 0,1; 1,1; 0,-1; -1,0; -1,-1]*ρ. Inputting m_1:j into the test branch of the network gives the network response maps G(m_1:j), and the value at the center of each response map is taken as the value of the corresponding position of the restriction matrix y₀, that is:

y₀(t_1:j+p_n)=G(m_1:j)

From the known element values of y₀, the values of the remaining elements are obtained based on a Gaussian distribution, finally yielding the adaptive regression target. In the top view of the regression target shown in Fig. 4, the adaptive regression target used by the disclosure reflects the distribution of the target better than the Gaussian-shaped regression target does.
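The first part of this construction, writing the response values of the transformation samples into an initially zero restriction matrix, can be sketched as below. The offsets are the six rows of T listed in the text, with ρ set to 1 purely for illustration; the response values and the function name are hypothetical. The Gaussian-based fill of the remaining elements is not sketched, since the text does not give its exact form.

```python
def place_responses(shape, center, offsets, values, rho=1):
    """Write response-map centre values into a zero restriction matrix.

    shape: (rows, cols) of the restriction matrix y0.
    center: (row, col) of the matrix corresponding to p_n.
    offsets: rows of T before scaling by rho.
    values: one response value G(m) per offset (hypothetical numbers).
    """
    y0 = [[0.0] * shape[1] for _ in range(shape[0])]
    for (dr, dc), v in zip(offsets, values):
        y0[center[0] + dr * rho][center[1] + dc * rho] = v
    return y0


T = [(0, 0), (0, 1), (1, 1), (0, -1), (-1, 0), (-1, -1)]
values = [0.9, 0.6, 0.4, 0.5, 0.7, 0.3]
y0 = place_responses((7, 7), (3, 3), T, values)
```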

The proposed target tracking method based on residual hierarchical attention and a correlation filter is evaluated on three datasets: OTB50, OTB2013, and OTB2015. The training process of the proposed end-to-end convolutional network is described first, followed by the experimental configuration and the evaluation method used; finally, the experimental results obtained on the three datasets are analyzed.

The structure and training process of the end-to-end convolutional neural network are as follows:

The pre-training process of the convolutional neural network is as follows:

The training data are preprocessed: a pair of image frames is extracted every 10 frames, image patches are cropped over a range of 3 times the target size, and the patches are resized to 128*128 pixels;

The network is trained with stochastic gradient descent, with the momentum set to 0.9, the weight decay set to 0.005, and the learning rate set to 1e-2;
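To make the three hyperparameters concrete, the sketch below performs one update of SGD with momentum and weight decay in plain Python. It follows one common formulation (weight decay added to the gradient as an L2 penalty, then a momentum buffer); whether the original training code used exactly this variant is an assumption, and the function name is hypothetical.

```python
def sgd_step(params, grads, velocity, lr=1e-2, momentum=0.9, weight_decay=0.005):
    """Apply one in-place SGD update with momentum and L2 weight decay.

    params, grads, velocity: lists of floats of equal length.
    """
    for i, (p, g) in enumerate(zip(params, grads)):
        g = g + weight_decay * p                  # weight decay as L2 penalty
        velocity[i] = momentum * velocity[i] + g  # momentum buffer
        params[i] = p - lr * velocity[i]          # parameter update
    return params, velocity


params, velocity = sgd_step([1.0], [0.5], [0.0])
```

In a PyTorch environment the same settings would normally be passed to the optimizer as `lr`, `momentum`, and `weight_decay`.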

A regression loss is used as the loss function for network training; the formula is as follows:

where G(z) is the network response to the sample z, and y is a regression target that follows a Gaussian distribution;
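The two ingredients named here, a Gaussian-shaped target y and a regression loss between G(z) and y, can be sketched as below. The mean squared error form, the absence of any regularization term, the value of σ, and the function names are all assumptions of this illustration; the exact loss of the disclosure is not reproduced.

```python
import numpy as np


def gaussian_target(shape, sigma):
    """2-D Gaussian-shaped regression target peaking at the map centre."""
    rows = np.arange(shape[0]) - shape[0] // 2
    cols = np.arange(shape[1]) - shape[1] // 2
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.exp(-(rr ** 2 + cc ** 2) / (2.0 * sigma ** 2))


def regression_loss(response, target):
    """Mean squared error between the network response and the target."""
    return float(np.mean((response - target) ** 2))


y = gaussian_target((5, 5), sigma=1.0)
loss_perfect = regression_loss(y, y)
loss_flat = regression_loss(np.zeros((5, 5)), y)
```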

The convolutional neural network without the residual hierarchical attention module is trained for 50 iterations over the complete dataset;

The residual hierarchical attention module is added to the deep convolutional network, the already trained layers of the convolutional network are frozen, and training is run for 20 iterations over the complete dataset.

The test configuration of the disclosure is as follows: the tests were run on a computer with a 2.59 GHz i5 processor, 8 GB of memory, and an NVIDIA GTX 1070 GPU. Running in the PyTorch environment, the method reaches 36 frames per second, which meets the real-time requirement. All parameters used by the method were kept fixed throughout the experiments. In the hierarchical attention mechanism, the number of up-sampling layers used is u=3; the regularization parameters θ₁, θ₂, θ₃ are 1e-3, 1, and 0.5, respectively; and the learning parameter λ is 0.012.

The results of the proposed method (Ours) on the OTB datasets are compared with the results of 17 other high-performance target tracking methods: SRDCFdecon, MUSTer, LCT, SRDCF, Staple_CA, CFNet, SiamFC, HDT, Staple, DCF_CA, SAMF, MEEM, DSST, KCF, TGPR, DLT, and STC. The results of SRDCFdecon, LCT, SRDCF, CFNet, SiamFC, HDT, Staple, and DSST are the results published by their authors; the results of Staple_CA, DCF_CA, SAMF, and KCF were obtained by running the code published by their authors on the test equipment; and the results of MUSTer, MEEM, TGPR, DLT, and STC come from the authors of LCT. The disclosure uses the area under the curve (AUC) and the score at a threshold of 20 pixels as the metrics to rank the tracking success rate and the tracking precision of the 18 methods, respectively. Fig. 5 shows the tracking success rate and precision plots of the 12 top-ranked methods. Table 1 summarizes the types of the 18 target tracking methods used in the performance comparison.
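For readers unfamiliar with the two metrics, the sketch below computes them in the style commonly used with the OTB benchmark: precision is the fraction of frames whose center location error is within 20 pixels, and the success AUC is the mean success rate over a grid of overlap thresholds. The threshold grid of 21 values from 0 to 1 and the function names are assumptions of this illustration.

```python
def precision_at(pred_centers, gt_centers, threshold=20.0):
    """Fraction of frames with centre location error <= threshold pixels."""
    errors = [((px - gx) ** 2 + (py - gy) ** 2) ** 0.5
              for (px, py), (gx, gy) in zip(pred_centers, gt_centers)]
    return sum(e <= threshold for e in errors) / len(errors)


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


prec = precision_at([(0, 0), (30, 0)], [(0, 0), (0, 0)])
auc = success_auc([(0, 0, 10, 10)], [(0, 0, 10, 10)])
```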

Type of the table 1 for the method for tracking target of comparison

Method for tracking target evaluation process based on residual error layering attention and correlation filter are as follows:

On OTB50 data set, the AUC for the tracker that the disclosure proposes is 0.591, better than what is be number two SRDCFdecon (0.560) 5.5%.The tracker of the disclosure can carry out the tracker of context-aware better than other two, i.e., Staple_CA (0.542) and DCF_CA (0.493) is better than 9.0% and 19.8% respectively, this has benefited from disclosed method institute The convolution feature that uses and regressive object it is adaptive.In addition to disclosed method, the peak performance with twin symmetrical structure Tracker be (0.530) CFNet, its network structure has used two convolutional layers；However, its result is lower than the disclosure 11.5%.In terms of accuracy, the tracker of the disclosure is number two (0.790), lower than being located at primary HDT (0.804) 1.8%.But the tracker of the disclosure execute speed on be it is advantageous, according to HDT author provide data, the place of HDT Reason speed is 10fps, fast three times of ratio HDT tracker of the execution speed of disclosed method.

On OTB2013 data set, the AUC for the tracker that the disclosure proposes is 0.671, is made number one, than SRDCFdecon (0.653) is high by 2.8%.The tracker of the disclosure can carry out the tracker of context-aware better than other two, That is Staple_CA (0.615) and DCF_CA (0.592) is better than 9.1% and 13.3% respectively.In addition to disclosed method, have The tracker of the peak performance of twin symmetrical structure is (0.611) CFNet, but its result lower than disclosed method 9.8%.? In terms of accuracy, the tracker (0.889) of the disclosure is than HDT tracker (0.883) only low 0.7%, but the tracker of the disclosure HDT tracker is substantially better than in terms of executing speed.

On the OTB2015 dataset, the AUC of the tracker proposed in the present disclosure is 0.623, ranking second and only 0.6% lower than the first-ranked SRDCFdecon (0.627). However, the tracker of the present disclosure runs 30 times faster than SRDCFdecon, whose authors report a processing speed of 1 fps. The tracker outperforms the two other context-aware trackers, Staple_CA (0.598) and DCF_CA (0.552), by 4.2% and 12.9% respectively. Apart from the method of the present disclosure, the best-performing tracker with a symmetric Siamese structure is SiamFC (0.582), whose result is 7.0% lower than that of the present disclosure. In terms of precision, the tracker of the present disclosure scores 0.815, ranking third.

The method proposed in the present disclosure can track a target stably in complex scenes. Fig. 6 shows some of the results obtained when the method tracks various kinds of targets on the OTB datasets. In many complex environments, such as large-area occlusion, target shape change, fast target rotation, illumination variation and background clutter, the method performs effective and stable target tracking.

Embodiment two

One or more embodiments disclose a target tracking system based on residual layered attention and correlation filters, comprising a server. The server includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the target tracking method based on residual layered attention and correlation filters described in Embodiment one.

Embodiment three

One or more embodiments disclose a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs the target tracking method based on residual layered attention and correlation filters described in Embodiment one.

The foregoing are merely preferred embodiments of the present application and are not intended to limit it; for those skilled in the art, various modifications and variations of the present application are possible. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within its scope of protection.

Although specific embodiments of the present disclosure have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications or variations that can be made without creative effort on the basis of the technical solutions of the present disclosure remain within its scope of protection.

Claims

1. A target tracking method based on residual layered attention and correlation filters, characterized by comprising the following steps:

(1) reading the current frame image, obtaining the position and scale of the target in the previous frame image, and thereby determining the test sample in the current frame;

(2) inputting the test sample into the trained convolutional neural network to obtain the convolutional features of the test sample, inputting the features into the multi-context correlation filter layer, obtaining the network response by means of the model parameters, and determining the position and scale of the target in the current frame;

(3) obtaining a training sample according to the position and scale of the target in the current frame, and inputting the training sample into the convolutional neural network and the residual layered attention module to obtain training sample features containing attention information;

(4) extracting transformed samples according to the position of the target in the current frame and inputting them into the convolutional neural network, and obtaining an adaptive regression target based on the network responses of the transformed samples; then extracting context samples and obtaining the context sample features, and obtaining filter parameters containing multi-context information according to the training sample features containing attention information and the adaptive regression target;

(5) updating the original model parameters with the obtained filter parameters.

2. The target tracking method based on residual layered attention and correlation filters according to claim 1, characterized by further comprising step (6): moving on to the next frame image and repeating steps (1) to (5) until all images have been processed.
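The per-frame iteration of steps (1) to (6) can be sketched as a simple loop. This is only a structural illustration: the three callables `locate`, `learn` and `update` are hypothetical placeholders for steps (1)-(2), (3)-(4) and (5), and their signatures are assumptions rather than anything the claims specify.

```python
def run_tracker(frames, init_state, init_model, locate, learn, update):
    """Per-frame loop over steps (1)-(6); all callables are hypothetical.

    locate(frame, state, model) -> new state (position and scale), steps (1)-(2)
    learn(frame, state)         -> filter parameters, steps (3)-(4)
    update(model, params)       -> updated model parameters, step (5)
    """
    state, model = init_state, init_model
    trajectory = []
    for frame in frames:  # step (6): advance to the next frame
        state = locate(frame, state, model)
        params = learn(frame, state)
        model = update(model, params)
        trajectory.append(state)
    return trajectory, model
```

Passing the stages in as callables keeps the loop independent of how detection, learning and model update are actually implemented.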

3. The target tracking method based on residual layered attention and correlation filters according to claim 1, characterized in that: in step (1), determining the test sample of the current frame comprises: in the current frame image, extracting an image patch centered on the target position in the previous frame and having a scale N times the target scale in the previous frame, N being greater than 1, and resizing the image patch to a specified pixel size to serve as the test sample of the current frame.
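The sample extraction described in this claim can be sketched as follows. The function name, the default N of 2, the 64 x 64 output size, edge replication for pixels outside the image, and nearest-neighbour resizing are all assumptions chosen to keep the sketch short; the claim only requires N greater than 1 and a specified pixel size.

```python
import numpy as np

def extract_sample(image, center, target_size, n=2.0, out_size=(64, 64)):
    """Crop an n-times-target-size patch around `center` and resize it.

    image: H x W (x C) array; center: (cy, cx); target_size: (h, w).
    Pixels outside the image are filled by edge replication (an assumption).
    Resizing uses nearest-neighbour sampling (also an assumption).
    """
    if n <= 1:
        raise ValueError("n must be greater than 1")
    cy, cx = center
    ph, pw = target_size[0] * n, target_size[1] * n
    # Source coordinates of each output pixel center inside the patch.
    ys = cy - ph / 2.0 + (np.arange(out_size[0]) + 0.5) * ph / out_size[0]
    xs = cx - pw / 2.0 + (np.arange(out_size[1]) + 0.5) * pw / out_size[1]
    # Clipping the indices replicates the border for out-of-image pixels.
    ys = np.clip(np.floor(ys).astype(int), 0, image.shape[0] - 1)
    xs = np.clip(np.floor(xs).astype(int), 0, image.shape[1] - 1)
    return image[np.ix_(ys, xs)]
```

The same routine would serve for the training sample of step (3), with the center and scale taken from the current frame instead of the previous one.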

4. The target tracking method based on residual layered attention and correlation filters according to claim 1, characterized in that: in step (2), the structure of the convolutional neural network comprises:

the above convolutional layers replicated into a symmetric Siamese network structure, so that the network has two structurally identical branches, a training branch and a test branch;

an Hourglass structure with three pooling layers added after the convolutional layers of the training branch, serving as the residual layered attention module of the network;

the multi-context correlation filter layer as the last layer of the network, the inputs of which are the output of the attention module and the output of the test branch.

5. The target tracking method based on residual layered attention and correlation filters according to claim 1, characterized in that: in step (2), the end-to-end convolutional neural network is pre-trained using a training dataset;

the pre-training process comprises:

preprocessing the training data: extracting a pair of image frames every several frames, extracting image patches over a range larger than the target size, and resizing the image patches to a set pixel size to serve as samples for training the network;

training the network by stochastic gradient descent;

training the convolutional neural network without the residual layered attention module for multiple iterations over the complete dataset;

adding the residual layered attention module to the deep convolutional network, fixing the layers of the convolutional network that have already been trained, and training for multiple iterations over the complete dataset.

6. The target tracking method based on residual layered attention and correlation filters according to claim 1, characterized in that: in step (2), the test sample is input into the test branch of the network and passed through two convolutional layers to obtain the test sample features;

or, in step (2), determining the position and scale of the target in the current frame comprises: inputting the test sample features into the multi-context correlation filter layer and computing the network response according to the model parameters; in the tracking stage, image patches of several different scales are extracted and processed as test samples to obtain their features and network responses, and the scale and position of the target in the current frame are, respectively, the scale of the test sample whose network response attains the maximum value and the position corresponding to that maximum response.
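The selection of scale and position from the multi-scale responses can be sketched as a single arg-max over a stack of response maps. The function name, the array layout and the example scale factors are assumptions made for illustration.

```python
import numpy as np

def locate_target(responses, scales):
    """Pick the scale and position with the largest network response.

    responses: array of shape (S, H, W), one response map per test sample.
    scales:    sequence of S scale factors (hypothetical values).
    Returns (best_scale, (row, col), peak_value).
    """
    responses = np.asarray(responses)
    flat_index = int(np.argmax(responses))
    s, row, col = np.unravel_index(flat_index, responses.shape)
    return scales[s], (int(row), int(col)), float(responses[s, row, col])

# Usage: three candidate scales, with the strongest peak in the second map.
maps = np.zeros((3, 5, 5))
maps[1, 2, 3] = 0.9
maps[2, 0, 0] = 0.5
print(locate_target(maps, [0.97, 1.0, 1.03]))  # (1.0, (2, 3), 0.9)
```

The returned row and column are coordinates in the response map; mapping them back to image coordinates depends on the patch size and network stride, which this sketch does not model.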

7. The target tracking method based on residual layered attention and correlation filters according to claim 1, characterized in that: in step (3), obtaining the training sample comprises: in the current frame image, extracting an image patch centered on the target position in the current frame and having a scale N times the current target scale, N being greater than 1, and resizing the image patch to a specified pixel size to serve as the training sample of the target in the current frame image;

or, in step (3), obtaining the training sample features containing attention information comprises:

inputting the training sample x₀ into the convolutional layers of the training branch of the network to obtain the output P(x₀); then inputting P(x₀) into the residual layered attention module to obtain the training sample features containing attention information:

Q(x₀) = ∑_u M_u(x₀) * P(x₀) + P(x₀)

where * denotes the channel-wise Hadamard product, M_u(x₀) denotes the attention map generated by the attention module, u indexes the up-sampling layers in the attention module, and Q(x₀) denotes the training sample features with attention information that are input into the multi-context correlation filter layer.
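The formula for Q(x₀) can be illustrated numerically as below. The tensor shapes are assumptions: P(x₀) is taken to be a (C, H, W) feature tensor and each attention map M_u(x₀) an (H, W) map that is broadcast across the channels; the function name is hypothetical.

```python
import numpy as np

def attended_features(p, attention_maps):
    """Compute Q = sum_u M_u * P + P with channel-wise Hadamard products.

    p:              feature tensor of shape (C, H, W), i.e. P(x0).
    attention_maps: iterable of maps M_u(x0), each of shape (H, W),
                    broadcast over the C channels (shape is an assumption).
    """
    p = np.asarray(p, dtype=float)
    q = p.copy()  # residual (identity) term, "+ P(x0)"
    for m_u in attention_maps:
        q += np.asarray(m_u, dtype=float)[None, :, :] * p
    return q
```

Because of the residual term, the features pass through unchanged when every attention map is zero, so the attention maps can only re-weight the convolutional features, not erase them.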

8. The target tracking method based on residual layered attention and correlation filters according to claim 1, characterized in that: in step (4), the adaptive regression target is obtained as follows:

extracting several transformed samples centered on set points, the scale of the transformed samples being the same as that of the training sample, and constructing a constraint matrix whose center and scale coincide with those of the training sample and whose elements are initialized to 0;

inputting the transformed samples into the test branch of the network to obtain the features of the transformed samples and the corresponding network response maps, and taking the value at the center of each response map as the value at the corresponding position of the constraint matrix;

computing the values of the remaining elements from the known element values of the constraint matrix on the basis of a Gaussian distribution, finally obtaining a constraint matrix that reflects the target distribution and the target motion.

9. The target tracking method based on residual layered attention and correlation filters according to claim 1, characterized in that: in step (4), obtaining the filter parameters containing multi-context information comprises:

extracting context samples centered on set points, the scale of the context samples being the same as that of the training sample, and inputting the context samples into the test branch of the network to obtain the features of the context samples;

in the training stage, obtaining the filter parameters according to the training sample features containing attention, the features of the context samples, and the constraint matrix in the adaptive regression target.
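How filter parameters can be obtained from a target sample, context samples and a regression target may be illustrated with the closed-form ridge regression commonly used for context-aware correlation filters. This is an assumption for illustration, not the formula of the present disclosure: the sketch is single-channel, uses a plain Gaussian target in place of the adaptive regression target, and the regularization weights `lam1` and `lam2` as well as all function names are hypothetical.

```python
import numpy as np

def gaussian_target(shape, sigma=2.0):
    """Gaussian-shaped regression target peaked at the patch center."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2.0 * sigma ** 2))

def train_filter(target_feat, context_feats, y, lam1=1e-4, lam2=0.1):
    """Closed-form single-channel context-aware filter in the Fourier domain.

    Assumed objective (circular convolution written as (*)):
    ||a0 (*) w - y||^2 + lam1 ||w||^2 + lam2 sum_i ||ai (*) w||^2
    """
    a0 = np.fft.fft2(target_feat)
    denom = np.conj(a0) * a0 + lam1
    for c in context_feats:
        ai = np.fft.fft2(c)
        denom = denom + lam2 * np.conj(ai) * ai  # penalize context responses
    return np.conj(a0) * np.fft.fft2(y) / denom

def respond(w_hat, test_feat):
    """Response map of a test patch under the learned filter."""
    return np.real(np.fft.ifft2(np.fft.fft2(test_feat) * w_hat))

# Usage: the response on the training patch peaks where the target y peaks.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
context = [rng.standard_normal((32, 32)) for _ in range(4)]
w_hat = train_filter(patch, context, gaussian_target((32, 32)))
response = respond(w_hat, patch)
print(np.unravel_index(np.argmax(response), response.shape))
```

The context term enlarges the denominator at frequencies where the context patches carry energy, which pushes the filter response toward zero on the surrounding background.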

10. A target tracking system based on residual layered attention and correlation filters, characterized by comprising a server, the server comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the target tracking method based on residual layered attention and correlation filters according to any one of claims 1 to 9.