CN107527355A - Visual tracking method and device based on a convolutional neural network regression model - Google Patents

Visual tracking method and device based on a convolutional neural network regression model

Info

Publication number
CN107527355A
CN107527355A
Authority
CN
China
Prior art keywords
target
tracked
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710595279.6A
Other languages
Chinese (zh)
Other versions
CN107527355B (en)
Inventor
徐常胜
张天柱
高君宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201710595279.6A priority Critical patent/CN107527355B/en
Publication of CN107527355A publication Critical patent/CN107527355A/en
Application granted granted Critical
Publication of CN107527355B publication Critical patent/CN107527355B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/223 - Analysis of motion using block-matching
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20021 - Dividing image into blocks, subimages or windows
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the field of computer vision, and proposes a visual tracking method and device based on a convolutional neural network regression model, aiming to solve the problem that the object tracking process is divided into two independent steps, part matching and target localization, so that the target position cannot be inferred directly from the parts. The method includes: S1, in the initial frame of visual tracking, sampling image blocks according to the given target to be tracked and dividing each block into multiple parts; S2, training a pre-built convolutional neural network regression model with stochastic gradient descent; S3, in each subsequent frame of visual tracking, constructing a search region based on the position where the target to be tracked appeared in the previous frame, and obtaining the position of the target to be tracked in the current frame with the trained convolutional neural network regression model. The present invention fully combines the parts with target localization and has good robustness.

Description

Visual tracking method and device based on a convolutional neural network regression model
Technical field
The invention belongs to the field of computer vision, and in particular relates to a visual tracking method and device based on a convolutional neural network regression model.
Background art
Visual tracking is a fundamental component of computer vision applications; fields such as intelligent video surveillance, augmented reality, robot learning and human-computer interaction all require robust tracking of targets of interest. Although significant progress has been made in recent years, visual tracking remains a difficult task, because it faces challenges such as partial occlusion, deformation, illumination variation, motion blur, fast motion, background clutter and scale change.
To address these problems, many part-based methods have been widely studied in recent years; these methods decompose the object into a group of parts and study them. In fact, when partial occlusion or deformation is present, some parts of the target remain visible and can thus provide reliable cues for tracking. Most of these methods treat tracking as first matching parts between consecutive frames and then realizing part-to-part tracking. Such a method generally includes two key steps: (1) part matching, whose basic idea is to match the parts in one video frame with the corresponding parts in subsequent frames; (2) target localization, which estimates the state of the target from the part matching results by considering the spatial constraints between different parts. The main drawback of this approach is that the tracking process is divided into two independent steps, so the target position cannot be inferred directly from the parts.
Unlike existing methods, another intuitive idea is to use a part of the target to directly regress the target position. In the current frame, the target object can be divided into several parts. If we know how to estimate the translation between a target part and the center of the target object, we obtain candidate positions of the target from all parts, from which the target position can be inferred. Therefore, the part-to-target tracking problem can be solved by regressing directly on the target parts. To achieve robust part-to-target regression, part context information and part reliability must be modeled, for the following reasons: (1) The context information of the parts is important, because it provides an additional constraint on the part-to-target regression by preserving the spatial layout structure of the object. If the part-to-target regression were carried out independently for each part, the result would be unreliable when more complex appearance changes occur in the video. (2) The reliability of the target parts is also important, because different parts of the target may undergo different appearance changes, illumination variations, motion blur or partial occlusion; if different parts were simply given equal weights, the regression result would be biased by each part. Therefore each part must be modeled for its reliability, in order to emphasize its degree of importance.
Summary of the invention
In order to solve the above problem in the prior art, namely that the object tracking process is divided into the two independent steps of part matching and target localization so that the target position cannot be inferred directly from the parts, one aspect of the present invention proposes a visual tracking method based on a convolutional neural network regression model, comprising the following steps:
Step S1, in the initial frame of visual tracking, sampling image blocks according to the given target to be tracked, implicitly dividing each sampled image block, and computing the offset information and overlap rate between each divided part and the target to be tracked;
Step S2, based on the image blocks and part information sampled in step S1, taking derivatives of the loss function of the pre-built convolutional neural network regression model with stochastic gradient descent, and iteratively updating the parameter values of the convolutional neural network until the loss function reaches a preset convergence condition, thereby obtaining the trained convolutional neural network regression model;
Step S3, in each subsequent frame of visual tracking, constructing a search region based on the position where the target to be tracked appeared in the previous frame, sampling image blocks from the search region, implicitly dividing the sampled image blocks, and obtaining the position of the target to be tracked in the current frame with the trained convolutional neural network regression model;
wherein the loss function of the convolutional neural network regression model consists of a regression loss $L_{reg}$, a discrimination loss $L_{dis}$, and a regularization term on the convolutional neural network weight parameters.
Preferably, the loss function L of the pre-built convolutional neural network regression model is:

$$L = L_{reg} + \lambda_1 L_{dis} + \lambda_2 \|\Theta\|_F^2$$

wherein $L_{reg}$ is the regression loss, $L_{dis}$ is the discrimination loss, $\Theta$ is the weight parameter of the convolutional neural network, and $\lambda_1$ and $\lambda_2$ are preset regulatory factors.
Preferably, the regression loss $L_{reg}$ is:

$$L_{reg} = \sum_k \mathbb{1}_k^{tar}\left[(\Delta y_k - \Delta\hat{y}_k)^2 + (\Delta x_k - \Delta\hat{x}_k)^2\right]$$

wherein $\Delta x_k$ and $\Delta y_k$ are respectively the horizontal and vertical offsets between the center of the k-th part of the input image block J and the center of the target to be tracked, as predicted by the convolutional neural network; $\Delta\hat{x}_k$ and $\Delta\hat{y}_k$ are the corresponding ground-truth offsets; and $\mathbb{1}_k^{tar}$ is an indicator selecting the parts used for regression.
Preferably, the discrimination loss $L_{dis}$ is:

$$L_{dis} = \sum_{k,k',\,k\neq k'}\left\{ l\,L_c(w_k, w_{k'}) + (1-l)\,L_s(w_k, w_{k'})\right\} + \left(\sum_k w_k - \hat{w}\right)^2$$

wherein $w_k$ denotes the discriminant value predicted by the convolutional neural network for the k-th part of the input image block J; $l \in \{0,1\}$ is a label representing the relation between two parts k and k': l=0 indicates that parts k and k' have similar discriminant values, and l=1 indicates that part k has a larger discriminant value than part k'; and $\hat{w}$ is the overlap rate between the image block and the target to be tracked.
Preferably, the constraints of the pre-built convolutional neural network regression model are:
When l=0, a distance constraint is used, whose distance constraint function $L_s$ is:

$$L_s(w_k, w_{k'}) = \frac{1}{2}(w_k - w_{k'})^2$$

When l=1, a contrastive constraint is used, whose contrastive constraint function $L_c$ is:

$$L_c(w_k, w_{k'}) = s(\tau - (w_k - w_{k'}))$$

wherein $s(x) = \max(0, x)$ is a non-saturating nonlinear function and $\tau$ is a preset threshold.
Preferably, the overlap rate $\hat{w}$ between an image block and the target to be tracked is:

$$\hat{w} = \frac{\mathrm{area}(BOX_{PATCH} \cap BOX_{GT})}{\mathrm{area}(BOX_{PATCH} \cup BOX_{GT})}$$

wherein $BOX_{PATCH}$ is the rectangular box corresponding to the image block and $BOX_{GT}$ is the rectangular box corresponding to the target to be tracked.
Preferably, in step S1, the sampling of image blocks according to the given target to be tracked is carried out as follows:
a Gaussian function is constructed with the given state J = (x, y, s) of the target to be tracked as its mean, and image blocks are sampled at different scales and positions based on the Gaussian function;
wherein x, y and s respectively denote the abscissa and ordinate of the center point of the target to be tracked, and its scale.
Preferably, in step S1, the offset information and overlap rate between each divided part and the target to be tracked are computed as follows:
The center offset between a part and the target to be tracked includes the abscissa offset $\Delta\hat{x}_k$ and the ordinate offset $\Delta\hat{y}_k$:

$$\Delta\hat{x}_k = x_k - x_{GT}, \qquad \Delta\hat{y}_k = y_k - y_{GT}$$

wherein $x_k$ and $y_k$ are respectively the abscissa and ordinate of the center of the k-th part, and $x_{GT}$ and $y_{GT}$ are respectively the abscissa and ordinate of the center of the target to be tracked;
The overlap rate $s_k$ between a part and the target to be tracked is:

$$s_k = \frac{\mathrm{area}(ROI_k \cap ROI_{GT})}{\mathrm{area}(ROI_k \cup ROI_{GT})}$$

wherein $ROI_k$ is the rectangular box corresponding to the k-th part of the image block and $ROI_{GT}$ is the rectangular box corresponding to the target to be tracked.
Preferably, in the convolutional neural network regression model, the center coordinates $[y_t^*, x_t^*]$ of the target to be tracked in frame t are computed as:

$$[y_t^*, x_t^*] = \frac{1}{Z_w}\sum_{k=1}^{K} w_{k,t}\left[(y_{k,t} + \Delta y_{k,t}),\ (x_{k,t} + \Delta x_{k,t})\right]$$

wherein $x_{k,t}$ and $y_{k,t}$ are respectively the abscissa and ordinate of the k-th part in the search region of frame t; $\Delta x_{k,t}$, $\Delta y_{k,t}$ and $w_{k,t}$ respectively denote the horizontal and vertical displacements of the k-th part relative to the center of the target to be tracked and the discriminant value of the part, as predicted by the convolutional neural network in the search region of the current frame t; $Z_w$ is the normalization factor of the weights $w_{k,t}$; and K is the total number of parts in the search region of frame t.
Another aspect of the present invention proposes a storage device in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to realize the above visual tracking method based on a convolutional neural network regression model.
A third aspect of the present invention proposes a processing device, comprising:
a processor, adapted to execute programs; and
a storage device, adapted to store a plurality of programs;
wherein the programs are adapted to be loaded and executed by the processor to realize:
the above visual tracking method based on a convolutional neural network regression model.
Beneficial effects of the present invention:
(1) The convolutional neural network regression model of the present invention, by considering the part information in the image blocks, can regress the target position directly from the part positions, thereby realizing robust target tracking.
(2) The present invention models the features of the whole image block and takes into account both the correlation between parts and the importance of each part, thereby improving the tracking performance.
Brief description of the drawings
Fig. 1 is a flow diagram of the visual tracking method based on a convolutional neural network regression model according to one embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principles of the present invention and are not intended to limit the scope of the present invention.
The purpose of the present invention is to perform robust visual tracking by regressing the target position with a convolutional neural network. The present invention jointly considers part context information and part reliability, and realizes part-to-target regression in an end-to-end framework.
The method of the present invention realizes, within an end-to-end framework, a robust convolutional neural network regression model for part-to-target regression. The proposed model can not only use part context information to preserve the overall spatial layout structure, but also learns part reliability to emphasize the importance of different parts for regressing the target. The method includes four parts: (1) sampling image blocks in the initial frame and building the sample label information; (2) establishing the optimization objective function; (3) optimizing the objective function with stochastic gradient descent until the model converges; (4) estimating the most likely target state in subsequent frames using the trained model. The optimization objective in (2) can be built in advance, so in practical application the method steps can be regarded as (1), (3) and (4).
As shown in Fig. 1, the visual tracking method based on a convolutional neural network regression model of one embodiment of the present invention includes the following steps:
Step S1, in the initial frame of visual tracking, sampling image blocks according to the given target to be tracked, implicitly dividing each sampled image block, and computing the offset information and overlap rate between each divided part and the target to be tracked. The offset information here is the offset between the center point of each divided part and the center point of the target to be tracked; the overlap rate here is the overlap rate between each divided part and the target to be tracked.
Step S2, based on the image blocks and part information sampled in step S1, taking derivatives of the loss function of the pre-built convolutional neural network regression model with stochastic gradient descent, and iteratively updating the parameter values of the convolutional neural network until the loss function reaches a preset convergence condition, thereby obtaining the trained convolutional neural network regression model.
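For illustration only, the sketch below shows what one iteration scheme for step S2 could look like, assuming a PyTorch implementation (the patent does not prescribe a framework); the network `PartRegressionNet`, its layer sizes and all hyperparameter values are illustrative assumptions, and the weight-decay term stands in for the $\|\Theta\|_F^2$ regularizer.

```python
import torch
import torch.nn as nn

# Hypothetical regression network (illustrative assumption): maps an image
# block to per-part centre offsets (dx, dy) and per-part discriminant
# values w, for K = 9 parts as in the 3x3 division used in the experiments.
class PartRegressionNet(nn.Module):
    def __init__(self, num_parts=9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.offsets = nn.Linear(64 * 16, num_parts * 2)  # (dx, dy) per part
        self.weights = nn.Linear(64 * 16, num_parts)      # discriminant value per part

    def forward(self, blocks):
        h = self.features(blocks)
        return self.offsets(h), self.weights(h)

def train(model, loader, loss_fn, lr=1e-3, weight_decay=1e-4, eps=1e-3):
    # weight_decay plays the role of the ||Theta||_F^2 regularization term.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9,
                          weight_decay=weight_decay)
    while True:
        total = 0.0
        for blocks, labels in loader:   # image blocks and part labels from step S1
            opt.zero_grad()
            pred_off, pred_w = model(blocks)
            loss = loss_fn(pred_off, pred_w, labels)
            loss.backward()             # derivative of the loss function
            opt.step()                  # iterative update of the network parameters
            total += loss.item()
        if total < eps:                 # preset convergence condition
            return model
```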
Step S3, in each subsequent frame of visual tracking, constructing a search region based on the position where the target to be tracked appeared in the previous frame, sampling image blocks from the search region, implicitly dividing the sampled image blocks, and obtaining the position of the target to be tracked in the current frame with the trained convolutional neural network regression model.
Here, the loss function of the convolutional neural network regression model consists of a regression loss $L_{reg}$, a discrimination loss $L_{dis}$, and a regularization term on the convolutional neural network weight parameters.
Implicit division means that an image block is implicitly divided into multiple parts, and the position information of these parts within the image block is recorded. After implicit division the image block keeps its original representation; only the part information inside it is recorded. In contrast, explicit division would actually disassemble the image block into multiple small part image blocks, and the original image block would no longer exist.
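A minimal sketch of such an implicit division, assuming the 3x3 grid of equal, non-overlapping parts used later in the experiments; only the part rectangles are recorded, and the image block itself is left intact:

```python
def implicit_division(block_box, rows=3, cols=3):
    """Record part rectangles (x0, y0, w, h) inside an image block
    without cutting the block itself apart."""
    x0, y0, w, h = block_box
    pw, ph = w / cols, h / rows
    return [(x0 + j * pw, y0 + i * ph, pw, ph)
            for i in range(rows) for j in range(cols)]

parts = implicit_division((100, 50, 90, 90))   # 9 part rectangles
```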
The search region constructed in step S3 of this embodiment may be a rectangle whose center is the center of the position where the target to be tracked appeared in the previous frame, and whose height and width are respectively twice the height and width of the rectangular box corresponding to the target to be tracked in the previous frame. Of course, the constructed search region may also have another shape, and the area it covers may have another size, as long as it covers, in the current frame, the region corresponding to the position where the target to be tracked appeared in the previous frame, with a certain margin reserved.
In step S3 of this embodiment, sampling image blocks from the search region and implicitly dividing the sampled image blocks may be carried out as follows: N1 image blocks are sampled within the large rectangular search region (N1 = 100 in one embodiment), and the sampled image blocks are implicitly divided.
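The following sketch illustrates the search-region construction and block sampling of step S3 under the assumptions above (a 2x region, N1 = 100); the uniform sampling distribution is an illustrative assumption, as the patent does not fix it here:

```python
import numpy as np

def search_region(prev_box, scale=2.0):
    """Rectangle centred on the previous target box, twice its size.
    Boxes are (centre x, centre y, width, height)."""
    x, y, w, h = prev_box
    return (x, y, scale * w, scale * h)

def sample_blocks(region, box_size, n=100, seed=0):
    """Sample n candidate image blocks inside the search region."""
    rng = np.random.default_rng(seed)
    cx, cy, rw, rh = region
    w, h = box_size
    xs = rng.uniform(cx - rw / 2, cx + rw / 2, size=n)
    ys = rng.uniform(cy - rh / 2, cy + rh / 2, size=n)
    return [(float(x), float(y), w, h) for x, y in zip(xs, ys)]

blocks = sample_blocks(search_region((320, 240, 80, 60)), (80, 60))
```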
In the convolutional neural network regression model of this embodiment, the center coordinates $[y_t^*, x_t^*]$ of the target to be tracked in frame t are computed as shown in formula (1):

$$[y_t^*, x_t^*] = \frac{1}{Z_w}\sum_{k=1}^{K} w_{k,t}\left[(y_{k,t} + \Delta y_{k,t}),\ (x_{k,t} + \Delta x_{k,t})\right] \qquad (1)$$

wherein $x_{k,t}$ and $y_{k,t}$ are respectively the abscissa and ordinate of the k-th part in the search region of frame t; $\Delta x_{k,t}$, $\Delta y_{k,t}$ and $w_{k,t}$ respectively denote the horizontal and vertical displacements of the k-th part relative to the center of the target to be tracked and the discriminant value of the part, as predicted by the convolutional neural network in the search region of the current frame t; $Z_w$ is the normalization factor of the weights $w_{k,t}$; and K is the total number of parts in the search region of frame t.
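A direct transcription of formula (1), assuming NumPy arrays for the per-part positions, predicted offsets and discriminant values:

```python
import numpy as np

def target_center(parts_xy, pred_offsets, pred_w):
    """Formula (1): discriminant-weighted average of the per-part votes
    (part position + predicted offset to the target centre).
    parts_xy, pred_offsets: (K, 2) arrays of (x, y); pred_w: (K,).
    Returns the estimated centre (x*, y*)."""
    votes = parts_xy + pred_offsets            # candidate centres, one per part
    z = pred_w.sum()                           # normalisation factor Z_w
    cx, cy = (pred_w[:, None] * votes).sum(axis=0) / z
    return cx, cy
```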
In the convolutional neural network regression model of this embodiment, the objective function is to minimize a loss function. For one input image block, the loss function L consists of the regression loss $L_{reg}$, the discrimination loss $L_{dis}$, and a regularization term on the convolutional neural network weight parameters.
The loss function L is shown in formula (2):

$$L = L_{reg} + \lambda_1 L_{dis} + \lambda_2 \|\Theta\|_F^2 \qquad (2)$$

wherein $L_{reg}$ is the regression loss, $L_{dis}$ is the discrimination loss, $\Theta$ is the weight parameter of the convolutional neural network, and $\lambda_1$ and $\lambda_2$ are preset regulatory factors.
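A minimal sketch of formula (2), assuming the network weights are available as a list of arrays; the values of $\lambda_1$ and $\lambda_2$ are illustrative assumptions:

```python
import numpy as np

def total_loss(l_reg, l_dis, theta, lam1=1.0, lam2=1e-4):
    """Formula (2): regression loss + weighted discrimination loss
    + Frobenius regularization of the network weights theta."""
    frob = sum(np.sum(w ** 2) for w in theta)  # squared Frobenius norm
    return l_reg + lam1 * l_dis + lam2 * frob
```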
The regression loss $L_{reg}$ is shown in formula (3):

$$L_{reg} = \sum_k \mathbb{1}_k^{tar}\left[(\Delta y_k - \Delta\hat{y}_k)^2 + (\Delta x_k - \Delta\hat{x}_k)^2\right] \qquad (3)$$

wherein $\Delta x_k$ and $\Delta y_k$ are respectively the horizontal and vertical offsets between the center of the k-th part of the input image block J and the center of the target to be tracked, as predicted by the convolutional neural network; $\Delta\hat{x}_k$ and $\Delta\hat{y}_k$ are the corresponding ground-truth offsets computed in step S1; and $\mathbb{1}_k^{tar}$ is an indicator selecting the parts used for regression.
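Formula (3) transcribed directly, assuming (K, 2) arrays of predicted and ground-truth offsets and a 0/1 indicator vector:

```python
import numpy as np

def regression_loss(pred_off, gt_off, indicator):
    """Formula (3): squared error between predicted and ground-truth
    per-part offsets, restricted by the indicator 1_k^tar.
    pred_off, gt_off: (K, 2); indicator: (K,) of 0/1."""
    return float(np.sum(indicator * np.sum((pred_off - gt_off) ** 2, axis=1)))
```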
The discrimination loss $L_{dis}$ is shown in formula (4):

$$L_{dis} = \sum_{k,k',\,k\neq k'}\left\{ l\,L_c(w_k, w_{k'}) + (1-l)\,L_s(w_k, w_{k'})\right\} + \left(\sum_k w_k - \hat{w}\right)^2 \qquad (4)$$

wherein $w_k$ denotes the discriminant value predicted by the convolutional neural network for the k-th part of the input image block J; $l \in \{0,1\}$ is a label representing the relation between two parts k and k': l=0 indicates that parts k and k' have similar discriminant values, and l=1 indicates that part k has a larger discriminant value than part k'; and $\hat{w}$ is the overlap rate between the image block and the target to be tracked.
In this embodiment, the constraints of the convolutional neural network regression model are:
When l=0, a distance constraint is used, whose distance constraint function $L_s$ is shown in formula (5):

$$L_s(w_k, w_{k'}) = \frac{1}{2}(w_k - w_{k'})^2 \qquad (5)$$

When l=1, a contrastive constraint is used, whose contrastive constraint function $L_c$ is shown in formula (6):

$$L_c(w_k, w_{k'}) = s(\tau - (w_k - w_{k'})) \qquad (6)$$

wherein $s(x) = \max(0, x)$ is a non-saturating nonlinear function and $\tau$ is a preset threshold.
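Formulas (4)-(6) combined in one sketch; the pairwise label matrix and the value of the threshold tau are illustrative assumptions:

```python
import numpy as np

def discrimination_loss(w, labels, w_hat, tau=0.1):
    """Formula (4) with constraints (5) and (6).
    w: (K,) discriminant values; labels[k, kp] in {0, 1}; w_hat: overlap
    rate of the whole block; tau is a preset threshold (value assumed)."""
    K, loss = len(w), 0.0
    for k in range(K):
        for kp in range(K):
            if k == kp:
                continue
            if labels[k, kp] == 1:             # contrastive constraint (6)
                loss += max(0.0, tau - (w[k] - w[kp]))
            else:                              # distance constraint (5)
                loss += 0.5 * (w[k] - w[kp]) ** 2
    return loss + (w.sum() - w_hat) ** 2       # overlap-consistency term
```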
In this embodiment, the overlap rate $\hat{w}$ between an image block and the target to be tracked is computed as shown in formula (7):

$$\hat{w} = \frac{\mathrm{area}(BOX_{PATCH} \cap BOX_{GT})}{\mathrm{area}(BOX_{PATCH} \cup BOX_{GT})} \qquad (7)$$

wherein $BOX_{PATCH}$ is the rectangular box corresponding to the image block and $BOX_{GT}$ is the rectangular box corresponding to the target to be tracked. Here the rectangular boxes corresponding to image blocks, parts and the target to be tracked are the minimum bounding rectangles of the corresponding content.
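Formula (7) is the standard intersection-over-union of two axis-aligned rectangles; a direct transcription, assuming (x0, y0, w, h) boxes:

```python
def overlap_rate(box_a, box_b):
    """Formula (7): intersection-over-union of two (x0, y0, w, h) boxes."""
    ax0, ay0, aw, ah = box_a
    bx0, by0, bw, bh = box_b
    ix = max(0.0, min(ax0 + aw, bx0 + bw) - max(ax0, bx0))
    iy = max(0.0, min(ay0 + ah, by0 + bh) - max(ay0, by0))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```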
In this embodiment, step S1 can be split into two sub-steps:
Step S11, sampling image blocks according to the given target to be tracked. Here the target to be tracked can be any object of interest, including people, vehicles, animals, commodities, etc.
A Gaussian function is constructed with the given state J = (x, y, s) of the target to be tracked as its mean, and image blocks are sampled at different scales and positions based on the Gaussian function; wherein x, y and s respectively denote the abscissa and ordinate of the center point of the target to be tracked, and its scale.
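A minimal sketch of step S11, assuming Gaussian perturbations of the state J = (x, y, s); the patent fixes only the mean of the Gaussian, so the standard deviations below are illustrative assumptions:

```python
import numpy as np

def sample_states(state, n=100, sigma=(8.0, 8.0, 0.05), seed=0):
    """Draw n candidate states (x, y, s) from a Gaussian centred on the
    given target state J = (x, y, s)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(state, dtype=float)
    return mean + rng.normal(0.0, sigma, size=(n, 3))

candidates = sample_states((320.0, 240.0, 1.0))
```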
Step S12, implicitly dividing each sampled image block, and computing the offset information and overlap rate between each divided part and the target to be tracked.
Here the implicit division can be of any type, for example dividing into several parts of equal or different sizes, with or without overlap between parts. In the experiments of the present invention, each image block is divided into 9 equal-area, non-overlapping parts.
The center offset between a part and the target to be tracked includes the abscissa offset $\Delta\hat{x}_k$ and the ordinate offset $\Delta\hat{y}_k$, computed respectively as shown in formulas (8) and (9):

$$\Delta\hat{x}_k = x_k - x_{GT} \qquad (8)$$

$$\Delta\hat{y}_k = y_k - y_{GT} \qquad (9)$$

wherein $x_k$ and $y_k$ are respectively the abscissa and ordinate of the center of the k-th part, and $x_{GT}$ and $y_{GT}$ are respectively the abscissa and ordinate of the center of the target to be tracked.
The overlap rate $s_k$ between a part and the target to be tracked is shown in formula (10):

$$s_k = \frac{\mathrm{area}(ROI_k \cap ROI_{GT})}{\mathrm{area}(ROI_k \cup ROI_{GT})} \qquad (10)$$

wherein $ROI_k$ is the rectangular box corresponding to the k-th part of the image block, $ROI_{GT}$ is the rectangular box corresponding to the target to be tracked, $\cap$ denotes the intersection between image blocks, $\cup$ denotes the union between image blocks, and area(·) gives the area of the given image block.
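Step S12 can then be sketched by combining the two helpers above (implicit_division and overlap_rate from the earlier sketches), producing the ground-truth labels (8)-(10) for each part:

```python
def part_labels(block_box, gt_box, gt_center):
    """Step S12: ground-truth offsets (8), (9) and overlap rates (10)
    for the implicitly divided parts of one sampled image block."""
    x_gt, y_gt = gt_center
    labels = []
    for (px, py, pw, ph) in implicit_division(block_box):
        cx, cy = px + pw / 2, py + ph / 2      # part centre (x_k, y_k)
        labels.append({"dx": cx - x_gt,        # formula (8)
                       "dy": cy - y_gt,        # formula (9)
                       "s":  overlap_rate((px, py, pw, ph), gt_box)})  # (10)
    return labels
```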
The steps of the methods described in connection with the embodiments disclosed herein can be implemented with hardware, with software modules executed by a processor, or with a combination of the two. Software modules may be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
A storage device of an embodiment of the present invention stores a plurality of programs, the programs being adapted to be loaded and executed by a processor to realize the above visual tracking method based on a convolutional neural network regression model.
A processing device of an embodiment of the present invention includes a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to realize the above visual tracking method based on a convolutional neural network regression model.
Those of ordinary skill in the art can clearly understand that, for convenience and brevity of description, the specific working processes and relevant explanations of the devices described above may refer to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
Those skilled in the art should recognize that the modules and method steps of the examples described in connection with the embodiments disclosed herein can be realized with electronic hardware, computer software, or a combination of the two. In order to clearly demonstrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described generally according to function in the above description. Whether these functions are performed with electronic hardware or with software depends on the specific application and the design constraints of the technical scheme. Those skilled in the art may use different methods to realize the described functions for each specific application, but such realization should not be considered to be beyond the scope of the present invention.
The term "comprising" or any other similar term is intended to be non-exclusive, so that a process or method including a series of elements includes not only those elements, but also other elements not expressly listed, or also includes elements inherent to such a process or method.
So far, the technical scheme of the present invention has been described in connection with the preferred embodiments shown in the drawings. However, those skilled in the art will easily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or replacements to the relevant technical features, and the technical schemes after these changes or replacements will fall within the protection scope of the present invention.

Claims (11)

1. A visual tracking method based on a convolutional neural network regression model, characterized by comprising the following steps:
Step S1, in the initial frame of visual tracking, sampling image blocks according to the given target to be tracked, implicitly dividing each sampled image block, and computing the offset information and overlap rate between each divided part and the target to be tracked;
Step S2, based on the image blocks and part information sampled in step S1, taking derivatives of the loss function of the pre-built convolutional neural network regression model with stochastic gradient descent, and iteratively updating the parameter values of the convolutional neural network until the loss function reaches a preset convergence condition, thereby obtaining the trained convolutional neural network regression model;
Step S3, in each subsequent frame of visual tracking, constructing a search region based on the position where the target to be tracked appeared in the previous frame, sampling image blocks from the search region, implicitly dividing the sampled image blocks, and obtaining the position of the target to be tracked in the current frame with the trained convolutional neural network regression model;
wherein the loss function of the convolutional neural network regression model consists of a regression loss $L_{reg}$, a discrimination loss $L_{dis}$, and a regularization term on the convolutional neural network weight parameters.
2. The visual tracking method according to claim 1, characterized in that the loss function L of the pre-built convolutional neural network regression model is:

$$L = L_{reg} + \lambda_1 L_{dis} + \lambda_2 \|\Theta\|_F^2$$

wherein $L_{reg}$ is the regression loss, $L_{dis}$ is the discrimination loss, $\Theta$ is the weight parameter of the convolutional neural network, and $\lambda_1$ and $\lambda_2$ are preset regulatory factors.
3. The visual tracking method according to claim 2, characterized in that the regression loss $L_{reg}$ is:

$$L_{reg} = \sum_k \mathbb{1}_k^{tar}\left[(\Delta y_k - \Delta\hat{y}_k)^2 + (\Delta x_k - \Delta\hat{x}_k)^2\right]$$

wherein $\Delta x_k$ and $\Delta y_k$ are respectively the horizontal and vertical offsets between the center of the k-th part of the input image block J and the center of the target to be tracked, as predicted by the convolutional neural network; $\Delta\hat{x}_k$ and $\Delta\hat{y}_k$ are the corresponding ground-truth offsets; and $\mathbb{1}_k^{tar}$ is an indicator selecting the parts used for regression.
4. The visual tracking method according to claim 3, characterized in that the discrimination loss $L_{dis}$ is:

$$L_{dis} = \sum_{k,k',\,k\neq k'}\left\{ l\,L_c(w_k, w_{k'}) + (1-l)\,L_s(w_k, w_{k'})\right\} + \left(\sum_k w_k - \hat{w}\right)^2$$

wherein $w_k$ denotes the discriminant value predicted by the convolutional neural network for the k-th part of the input image block J; $l \in \{0,1\}$ is a label representing the relation between two parts k and k': l=0 indicates that parts k and k' have similar discriminant values, and l=1 indicates that part k has a larger discriminant value than part k'; and $\hat{w}$ is the overlap rate between the image block and the target to be tracked.
5. The visual tracking method according to claim 4, characterized in that the constraints of the pre-built convolutional neural network regression model are:
When l=0, a distance constraint is used, whose distance constraint function $L_s$ is:

$$L_s(w_k, w_{k'}) = \frac{1}{2}(w_k - w_{k'})^2$$

When l=1, a contrastive constraint is used, whose contrastive constraint function $L_c$ is:

$$L_c(w_k, w_{k'}) = s(\tau - (w_k - w_{k'}))$$

wherein $s(x) = \max(0, x)$ is a non-saturating nonlinear function and $\tau$ is a preset threshold.
6. The visual tracking method according to claim 4, characterized in that the overlap rate $\hat{w}$ between an image block and the target to be tracked is:

$$\hat{w} = \frac{\mathrm{area}(BOX_{PATCH} \cap BOX_{GT})}{\mathrm{area}(BOX_{PATCH} \cup BOX_{GT})}$$

wherein $BOX_{PATCH}$ is the rectangular box corresponding to the image block and $BOX_{GT}$ is the rectangular box corresponding to the target to be tracked.
7. The visual tracking method according to any one of claims 1-6, characterized in that, in step S1, the sampling of image blocks according to the given target to be tracked is carried out as follows:
a Gaussian function is constructed with the given state J = (x, y, s) of the target to be tracked as its mean, and image blocks are sampled at different scales and positions based on the Gaussian function;
wherein x, y and s respectively denote the abscissa and ordinate of the center point of the target to be tracked, and its scale.
8. The visual tracking method according to claim 7, characterized in that, in step S1, the offset information and overlap rate between each divided part and the target to be tracked are computed as follows:
The center offset between a part and the target to be tracked includes the abscissa offset $\Delta\hat{x}_k$ and the ordinate offset $\Delta\hat{y}_k$:

$$\Delta\hat{x}_k = x_k - x_{GT}$$

$$\Delta\hat{y}_k = y_k - y_{GT}$$

wherein $x_k$ and $y_k$ are respectively the abscissa and ordinate of the center of the k-th part, and $x_{GT}$ and $y_{GT}$ are respectively the abscissa and ordinate of the center of the target to be tracked;
The overlap rate $s_k$ between a part and the target to be tracked is:

$$s_k = \frac{\mathrm{area}(ROI_k \cap ROI_{GT})}{\mathrm{area}(ROI_k \cup ROI_{GT})}$$

wherein $ROI_k$ is the rectangular box corresponding to the k-th part of the image block and $ROI_{GT}$ is the rectangular box corresponding to the target to be tracked.
9. The visual tracking method according to any one of claims 1-6, characterized in that, in the convolutional neural network regression model, the center coordinates $[y_t^*, x_t^*]$ of the target to be tracked in frame t are computed as:

$$[y_t^*, x_t^*] = \frac{1}{Z_w}\sum_{k=1}^{K} w_{k,t}\left[(y_{k,t} + \Delta y_{k,t}),\ (x_{k,t} + \Delta x_{k,t})\right]$$

wherein $x_{k,t}$ and $y_{k,t}$ are respectively the abscissa and ordinate of the k-th part in the search region of frame t; $\Delta x_{k,t}$, $\Delta y_{k,t}$ and $w_{k,t}$ respectively denote the horizontal and vertical displacements of the k-th part relative to the center of the target to be tracked and the discriminant value of the part, as predicted by the convolutional neural network in the search region of the current frame t; $Z_w$ is the normalization factor of the weights $w_{k,t}$; and K is the total number of parts in the search region of frame t.
10. A storage device in which a plurality of programs are stored, characterized in that the programs are adapted to be loaded and executed by a processor to realize the visual tracking method based on a convolutional neural network regression model according to any one of claims 1-9.
11. A processing device, comprising:
a processor, adapted to execute programs; and
a storage device, adapted to store a plurality of programs;
characterized in that the programs are adapted to be loaded and executed by the processor to realize:
the visual tracking method based on a convolutional neural network regression model according to any one of claims 1-9.
CN201710595279.6A 2017-07-20 2017-07-20 Visual tracking method and device based on convolutional neural network regression model Active CN107527355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710595279.6A CN107527355B (en) 2017-07-20 2017-07-20 Visual tracking method and device based on convolutional neural network regression model


Publications (2)

Publication Number Publication Date
CN107527355A true CN107527355A (en) 2017-12-29
CN107527355B CN107527355B (en) 2020-08-11

Family

ID=60749049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710595279.6A Active CN107527355B (en) 2017-07-20 2017-07-20 Visual tracking method and device based on convolutional neural network regression model

Country Status (1)

Country Link
CN (1) CN107527355B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573731A (en) * 2015-02-06 2015-04-29 厦门大学 Rapid target detection method based on convolutional neural network
CN105243398A (en) * 2015-09-08 2016-01-13 西安交通大学 Method of improving performance of convolutional neural network based on linear discriminant analysis criterion
CN106599805A (en) * 2016-12-01 2017-04-26 华中科技大学 Supervised data driving-based monocular video depth estimating method
CN106709936A (en) * 2016-12-14 2017-05-24 北京工业大学 Single target tracking method based on convolution neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HYEONSEOB NAM et al.: "Learning Multi-Domain Convolutional Neural Networks for Visual Tracking", The IEEE Conference on Computer Vision and Pattern Recognition *
何振军: "Research on Vehicle Detection Algorithms Based on Convolutional Neural Networks", Wanfang Dissertations *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416428A (en) * 2018-02-28 2018-08-17 中国计量大学 A kind of robot visual orientation method based on convolutional neural networks
CN108510523A (en) * 2018-03-16 2018-09-07 新智认知数据服务有限公司 It is a kind of to establish the model for obtaining object feature and object searching method and device
CN108805204A (en) * 2018-06-12 2018-11-13 东北大学 Electrical energy power quality disturbance analytical equipment based on deep neural network and its application method
CN109389543B (en) * 2018-09-11 2022-03-04 深圳大学 Bus operation data statistical method, system, computing device and storage medium
CN109389543A (en) * 2018-09-11 2019-02-26 深圳大学 Bus operation data statistical approach, calculates equipment and storage medium at system
CN109636846A (en) * 2018-12-06 2019-04-16 重庆邮电大学 Object localization method based on circulation attention convolutional neural networks
CN109636846B (en) * 2018-12-06 2022-10-11 重庆邮电大学 Target positioning method based on cyclic attention convolution neural network
CN109711332A (en) * 2018-12-26 2019-05-03 浙江捷尚视觉科技股份有限公司 A kind of face tracking method and application based on regression algorithm
CN109829936A (en) * 2019-01-29 2019-05-31 青岛海信网络科技股份有限公司 A kind of method and apparatus of target tracking
CN109829936B (en) * 2019-01-29 2021-12-24 青岛海信网络科技股份有限公司 Target tracking method and device
CN110060274A (en) * 2019-04-12 2019-07-26 北京影谱科技股份有限公司 The visual target tracking method and device of neural network based on the dense connection of depth
CN110807515A (en) * 2019-10-30 2020-02-18 北京百度网讯科技有限公司 Model generation method and device
CN110807515B (en) * 2019-10-30 2023-04-28 北京百度网讯科技有限公司 Model generation method and device
CN112634344A (en) * 2020-12-15 2021-04-09 西安理工大学 Method for detecting center position of cold-rolled strip coil shaft hole based on machine vision
CN112861652A (en) * 2021-01-20 2021-05-28 中国科学院自动化研究所 Method and system for tracking and segmenting video target based on convolutional neural network

Also Published As

Publication number Publication date
CN107527355B (en) 2020-08-11

Similar Documents

Publication Publication Date Title
CN107527355A Visual tracking method and device based on a convolutional neural network regression model
JP6709283B2 (en) Detection and analysis of moving vehicles using low resolution remote sensing images
CN108022012A (en) Vehicle location Forecasting Methodology based on deep learning
CN107862705A (en) A kind of unmanned plane small target detecting method based on motion feature and deep learning feature
CN105760849B (en) Target object behavioral data acquisition methods and device based on video
CN106845351A (en) It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term
CN104915970A (en) Multi-target tracking method based on track association
CN111626184B (en) Crowd density estimation method and system
CN107301369A (en) Road traffic congestion analysis method based on Aerial Images
CN108345875A (en) Wheeled region detection model training method, detection method and device
Li et al. Pedestrian detection based on deep learning model
CN104156982B (en) Motion target tracking method and device
CN107194366A (en) The Activity recognition method of son is described based on dense track covariance
CN115699102A (en) Tracking multiple objects in a video stream using occlusion aware single object tracking
CN107832716A (en) Method for detecting abnormality based on active-passive Gauss on-line study
CN111126515B (en) Model training method based on artificial intelligence and related device
CN105844667A (en) Structural target tracking method of compact color coding
Liu et al. A novel facial mask detection using fast-yolo algorithm
Firouznia et al. Chaotic particle filter for visual object tracking
Wang et al. Collaborative 3d object detection for autonomous vehicles via learnable communications
CN112819889B (en) Method and device for determining position information, storage medium and electronic device
Balasubramaniam et al. R-TOSS: A framework for real-time object detection using semi-structured pruning
CN114386691A (en) Occupant damage prediction method and device based on stress posture prediction
CN106204639A (en) Based on frequency domain regression model target tracking method, system and senior drive assist system
CN106407975A (en) Multi-dimensional layered object detection method based on space-spectrum constraint

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant