CN107527355A - Visual tracking method, device based on convolutional neural networks regression model - Google Patents
Visual tracking method, device based on convolutional neural networks regression model
- Publication number
- CN107527355A CN107527355A CN201710595279.6A CN201710595279A CN107527355A CN 107527355 A CN107527355 A CN 107527355A CN 201710595279 A CN201710595279 A CN 201710595279A CN 107527355 A CN107527355 A CN 107527355A
- Authority
- CN
- China
- Prior art keywords
- target
- tracked
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000013527 convolutional neural network Methods 0.000 title claims abstract description 56
- 238000000034 method Methods 0.000 title claims abstract description 52
- 230000000007 visual effect Effects 0.000 title claims abstract description 21
- 238000005070 sampling Methods 0.000 claims abstract description 27
- 238000011478 gradient descent method Methods 0.000 claims abstract description 3
- 238000005192 partition Methods 0.000 claims description 6
- 230000000052 comparative effect Effects 0.000 claims description 3
- 238000009795 derivation Methods 0.000 claims description 3
- 238000006073 displacement reaction Methods 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 238000012545 processing Methods 0.000 claims description 3
- 230000001105 regulatory effect Effects 0.000 claims description 3
- 230000001537 neural effect Effects 0.000 claims 2
- 230000007935 neutral effect Effects 0.000 claims 1
- 230000006870 function Effects 0.000 description 28
- 238000005286 illumination Methods 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 230000001373 regressive effect Effects 0.000 description 2
- 230000003190 augmentative effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 239000002537 cosmetic Substances 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 230000000873 masking effect Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000026676 system process Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/223—Analysis of motion using block-matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to the field of computer vision and proposes a visual tracking method and device based on a convolutional neural network regression model, aiming to solve the problem that the object tracking process is divided into two independent steps, part matching and target localization, so that the target position cannot be inferred directly from the parts. The method includes: S1, in the initial frame of visual tracking, sampling image patches according to the given target to be tracked and dividing each patch into multiple parts; S2, training a pre-built convolutional neural network regression model with stochastic gradient descent; S3, in each subsequent frame of visual tracking, constructing a search region based on the position where the target to be tracked appeared in the previous frame, and obtaining the position of the target in the current frame with the trained convolutional neural network regression model. The present invention fully combines part information with target localization and has good robustness.
Description
Technical field
The invention belongs to the field of computer vision, and in particular relates to a visual tracking method and device based on a convolutional neural network regression model.
Background technology
Visual tracking is one of the most basic components of computer vision applications; fields such as intelligent video surveillance, augmented reality, robot learning and human-computer interaction all require robust tracking of targets of interest. Although significant progress has been made in this field in recent years, visual tracking remains a difficult task because it faces challenges such as partial occlusion, deformation, illumination variation, motion blur, fast motion, background clutter and scale change.
To address these problems, many part-based methods have been widely studied in recent years; such methods decompose an object into a set of parts. Indeed, when partial occlusion or deformation occurs, some parts of the target remain visible and can provide reliable cues for tracking. Most of these methods first perform part matching between consecutive frames and then solve the part-to-part tracking problem. They generally include two key steps: (1) part matching: the basic idea is to use the parts in one video frame to match the corresponding parts in subsequent frames; (2) target localization: based on the part matching results, the state of the target is estimated by considering the spatial constraints between different parts. The main drawback of such methods is that the tracking process is divided into two independent steps, so the target position cannot be inferred directly from the parts.
Different from existing methods, another intuitive idea is to regress the target position directly from a part of the target. In the current frame, suppose the target object is divided into several parts. If we know how to estimate the translation between a target part and the center of the target object, we obtain all candidate positions of the target, from which the target position can be inferred. Therefore, the part-to-target tracking problem can be solved directly by regression on the target parts. To achieve robust part-to-target regression, part context information and part reliability must be modeled, for the following reasons: (1) the context information of a part is important because it provides an additional constraint on the part-to-target regression by preserving the spatial layout structure of the object; if a separate part-to-target regression is carried out for each part individually, the result is unreliable when complicated appearance variations occur in the video; (2) the reliability of a target part is also important because different parts of the target may undergo different appearance changes, illumination variations, motion blur or partial occlusion; if different parts are simply given equal weight, the regression result will be biased by individual parts. Therefore, the reliability of each part needs to be modeled to emphasize its level of importance.
The content of the invention
To solve the above problem in the prior art, namely that the object tracking process is divided into the two independent steps of part matching and target localization so that the target position cannot be inferred directly from the parts, one aspect of the present invention proposes a visual tracking method based on a convolutional neural network regression model, comprising the following steps:
Step S1: in the initial frame of visual tracking, sample image patches according to the given target to be tracked, implicitly divide each sampled image patch into parts, and compute the offset information and overlap ratio between each divided part and the target to be tracked;
Step S2: based on the image patches and part information sampled in step S1, use stochastic gradient descent to differentiate the loss function of the pre-built convolutional neural network regression model, and iteratively update the parameter values of the convolutional neural network until the loss function reaches a preset convergence condition, obtaining the trained convolutional neural network regression model;
Step S3: in each subsequent frame of visual tracking, construct a search region based on the position where the target to be tracked appeared in the previous frame, sample image patches from the search region, implicitly divide the sampled patches, and obtain the position of the target to be tracked in the current frame with the trained convolutional neural network regression model;
wherein the loss function of the convolutional neural network regression model consists of a regression loss L_reg, a discrimination loss L_dis and a regularization term on the convolutional neural network weight parameters.
Preferably, the loss function L of the pre-built convolutional neural network regression model is:
L = L_reg + λ1·L_dis + λ2·‖Θ‖²_F
where L_reg is the regression loss, L_dis is the discrimination loss, Θ are the weight parameters of the convolutional neural network, and λ1 and λ2 are preset regulatory factors.
Preferably, the regression loss L_reg is:
L_reg = Σ_k 1_k^tar [ (Δy_k − Δŷ_k)² + (Δx_k − Δx̂_k)² ]
where Δx_k and Δy_k are respectively the horizontal and vertical center offsets between the k-th part of the input image patch J and the center position of the target to be tracked, as predicted by the convolutional neural network.
Preferably, the discrimination loss L_dis is:
L_dis = Σ_{k,k′,k≠k′} { l·L_c(w_k, w_{k′}) + (1−l)·L_s(w_k, w_{k′}) } + ( Σ_k w_k − ŵ )²
where w_k denotes the discriminant value predicted by the convolutional neural network for the k-th part of the input image patch J; l ∈ {0,1} is a label representing the relation between two parts k and k′: l = 0 indicates that parts k and k′ have similar discriminant values, and l = 1 indicates that part k has a larger discriminant value than part k′; ŵ is the overlap ratio between the image patch and the target to be tracked.
Preferably, the constraints of the pre-built convolutional neural network regression model are:
when l = 0, a distance constraint is used, with distance constraint function L_s;
when l = 1, a contrastive constraint is used, with contrastive constraint function L_c:
L_c(w_k, w_{k′}) = s(τ − (w_k − w_{k′}));
where s(x) = max(0, x) is a non-saturating nonlinear function and τ is a preset threshold.
Preferably, the overlap ratio ŵ between the image patch and the target to be tracked is:
ŵ = area(BOX_PATCH ∩ BOX_GT) / area(BOX_PATCH ∪ BOX_GT)
where BOX_PATCH is the rectangle corresponding to the image patch and BOX_GT is the rectangle corresponding to the target to be tracked.
Preferably, in step S1, the sampling of image patches according to the given target to be tracked is performed as follows:
construct a Gaussian function whose mean is the given state J = (x, y, s) of the target to be tracked, and sample image patches at different scales and positions based on this Gaussian function;
where x, y and s respectively denote the horizontal coordinate, vertical coordinate and scale of the center point of the target to be tracked.
Preferably, in step S1, the offset information and overlap ratio between each divided part and the target to be tracked are computed as follows:
the center offsets between a part and the target to be tracked include the horizontal offset Δx̂_k and the vertical offset Δŷ_k, where x_k and y_k are respectively the horizontal and vertical coordinates of the center of the k-th part, and x_GT and y_GT are respectively the horizontal and vertical coordinates of the center of the target to be tracked;
the overlap ratio s_k between a part and the target to be tracked is:
s_k = area(ROI_k ∩ ROI_GT) / area(ROI_k ∪ ROI_GT)
where ROI_k is the rectangle corresponding to a part of the image patch and ROI_GT is the rectangle corresponding to the target to be tracked.
Preferably, the center coordinates (x̂_t, ŷ_t) of the target to be tracked in frame t are computed by the convolutional neural network regression model as:
x̂_t = (1/Z_w) Σ_{k=1}^{K} w_{k,t}·(x_{k,t} + Δx_{k,t}),  ŷ_t = (1/Z_w) Σ_{k=1}^{K} w_{k,t}·(y_{k,t} + Δy_{k,t})
where x_{k,t} and y_{k,t} are respectively the horizontal and vertical coordinates of the k-th part in the search region of frame t; Δx_{k,t}, Δy_{k,t} and w_{k,t} respectively denote the horizontal and vertical displacements of the k-th part relative to the center position of the target to be tracked and the discriminant value of that part, as predicted by the convolutional neural network for the current search region of frame t; Z_w is the normalization factor of the weights w_{k,t}; and K is the total number of parts in the search region of frame t.
Another aspect of the present invention proposes a storage device in which a plurality of programs are stored, the programs being suitable to be loaded and executed by a processor to carry out the above visual tracking method based on a convolutional neural network regression model.
A third aspect of the present invention proposes a processing device, including:
a processor, suitable for executing programs; and
a storage device, suitable for storing a plurality of programs;
the programs being suitable to be loaded by the processor and executed to realize the above visual tracking method based on a convolutional neural network regression model.
Beneficial effects of the present invention:
(1) By taking the part information within an image patch into account, the convolutional neural network regression model of the present invention can regress the target position directly from the part positions, thereby achieving robust target tracking.
(2) The present invention models the features of the whole image patch and considers both the correlation between parts and the importance of each part, thereby improving the tracking performance.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the visual tracking method based on a convolutional neural network regression model according to one embodiment of the present invention.
Embodiment
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will understand that these embodiments are only used to explain the technical principles of the present invention and are not intended to limit the scope of the invention.
The purpose of the present invention is to perform robust visual tracking by regressing the target position with a convolutional neural network. The present invention takes both part context information and part reliability into account and realizes part-to-target regression in an end-to-end framework.
The method of the present invention realizes a robust regression model for part-to-target regression through a convolutional neural network in an end-to-end framework. The proposed model can not only use part context information to preserve the overall spatial layout structure, but can also learn part reliability to emphasize the importance of different parts for regressing the target. The method includes four parts: (1) sampling image patches in the initial frame and building the sample label information; (2) establishing the optimization objective function; (3) optimizing the objective function with stochastic gradient descent until the model converges; (4) estimating the optimal state of the target in subsequent frames with the trained model. The objective function in (2) can be built in advance, so in practical application the method steps can be regarded as (1), (3), (4).
As shown in Fig. 1, the visual tracking method based on a convolutional neural network regression model of one embodiment of the present invention includes the following steps:
Step S1: in the initial frame of visual tracking, sample image patches according to the given target to be tracked, implicitly divide each sampled image patch into parts, and compute the offset information and overlap ratio between each divided part and the target to be tracked. The offset information here is the offset between the center point of each divided part and the center point of the target to be tracked; the overlap ratio here is the overlap ratio between each divided part and the target to be tracked.
Step S2: based on the image patches and part information sampled in step S1, use stochastic gradient descent to differentiate the loss function of the pre-built convolutional neural network regression model, and iteratively update the parameter values of the convolutional neural network until the loss function reaches a preset convergence condition, obtaining the trained convolutional neural network regression model.
Step S3: in each subsequent frame of visual tracking, construct a search region based on the position where the target to be tracked appeared in the previous frame, sample image patches from the search region, implicitly divide the sampled patches, and obtain the position of the target to be tracked in the current frame with the trained convolutional neural network regression model.
The loss function of the convolutional neural network regression model consists of a regression loss L_reg, a discrimination loss L_dis and a regularization term on the convolutional neural network weight parameters.
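The train-until-convergence loop of step S2 can be sketched with a generic gradient descent routine. The toy quadratic objective, the learning rate and the stopping tolerance below are illustrative assumptions standing in for the patent's convolutional neural network and its loss, not values from the patent:

```python
def sgd_train(grad_fn, loss_fn, theta, lr=0.1, tol=1e-6, max_iters=10000):
    """Skeleton of step S2: repeatedly step the parameters along the negative
    gradient of the loss until the change in the loss falls below a preset
    convergence tolerance."""
    prev = loss_fn(theta)
    for _ in range(max_iters):
        theta = theta - lr * grad_fn(theta)   # gradient descent update
        cur = loss_fn(theta)
        if abs(prev - cur) < tol:             # preset convergence condition
            break
        prev = cur
    return theta

# Toy objective: loss(theta) = (theta - 3)^2, with gradient 2 * (theta - 3).
theta = sgd_train(lambda t: 2.0 * (t - 3.0), lambda t: (t - 3.0) ** 2, theta=0.0)
```

In the patent's setting the scalar parameter would be the full set of network weights Θ and the gradient would come from differentiating the loss of formula (2).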
Implicit division means implicitly dividing an image patch into multiple parts and recording the position information of these parts within the patch. After implicit division, the image patch keeps its original representation; only the part information inside it is recorded. Explicit division, by contrast, really disassembles an image patch into multiple small part patches, so that the initial image patch no longer exists.
The search region constructed in step S3 of this embodiment can be a rectangle whose center is the center of the position where the target to be tracked appeared in the previous frame, and whose height and width are respectively twice the height and width of the rectangle corresponding to the target to be tracked in the previous frame. Of course, the constructed search region can also have other shapes and cover areas of other sizes, as long as in the current frame it covers the region corresponding to the position where the target to be tracked appeared in the previous frame and reserves a certain margin.
In step S3 of this embodiment, sampling image patches from the search region and implicitly dividing them can be done as follows: sample N1 image patches (N1 = 100 in one embodiment) within the rectangular search region, and implicitly divide each sampled image patch.
For the convolutional neural network regression model of this embodiment, the center coordinates (x̂_t, ŷ_t) of the target to be tracked in frame t are computed as shown in formula (1):
x̂_t = (1/Z_w) Σ_{k=1}^{K} w_{k,t}·(x_{k,t} + Δx_{k,t}),  ŷ_t = (1/Z_w) Σ_{k=1}^{K} w_{k,t}·(y_{k,t} + Δy_{k,t})  (1)
where x_{k,t} and y_{k,t} are respectively the horizontal and vertical coordinates of the k-th part in the search region of frame t; Δx_{k,t}, Δy_{k,t} and w_{k,t} respectively denote the horizontal and vertical displacements of the k-th part relative to the center position of the target to be tracked and the discriminant value of that part, as predicted by the convolutional neural network for the current search region of frame t; Z_w is the normalization factor of the weights w_{k,t}; and K is the total number of parts in the search region of frame t.
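The localization rule of formula (1) amounts to a weighted vote of the parts: each part shifts its own position by its predicted displacement, and the votes are averaged with the predicted discriminant values. A NumPy sketch with illustrative names (not from the patent):

```python
import numpy as np

def locate_target(part_centres, offsets, weights):
    """Weighted vote of all parts for the target centre, as in formula (1)."""
    part_centres = np.asarray(part_centres, dtype=float)  # (K, 2): (x_kt, y_kt)
    offsets = np.asarray(offsets, dtype=float)            # (K, 2): predicted shifts
    weights = np.asarray(weights, dtype=float)            # (K,): discriminant values
    votes = part_centres + offsets                        # candidate target centres
    z_w = weights.sum()                                   # normalization factor Z_w
    return (weights[:, None] * votes).sum(axis=0) / z_w

# Three parts all voting for the centre (10, 20) should recover it exactly.
centre = locate_target([(8, 18), (12, 22), (9, 21)],
                       [(2, 2), (-2, -2), (1, -1)],
                       [0.5, 0.3, 0.2])
```

Unreliable parts (small w_k,t) contribute little to the vote, which is exactly the role of the learned part reliability described above.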
For the convolutional neural network regression model of this embodiment, the objective function is the minimization of a loss function. For one input image patch, the loss function L consists of a regression loss L_reg, a discrimination loss L_dis and a regularization term on the convolutional neural network weight parameters, as shown in formula (2):
L = L_reg + λ1·L_dis + λ2·‖Θ‖²_F  (2)
where L_reg is the regression loss, L_dis is the discrimination loss, Θ are the weight parameters of the convolutional neural network, and λ1 and λ2 are preset regulatory factors.
The regression loss L_reg is shown in formula (3):
L_reg = Σ_k 1_k^tar [ (Δy_k − Δŷ_k)² + (Δx_k − Δx̂_k)² ]  (3)
where Δx_k and Δy_k are respectively the horizontal and vertical center offsets between the k-th part of the input image patch J and the center position of the target to be tracked, as predicted by the convolutional neural network.
The discrimination loss L_dis is shown in formula (4):
L_dis = Σ_{k,k′,k≠k′} { l·L_c(w_k, w_{k′}) + (1−l)·L_s(w_k, w_{k′}) } + ( Σ_k w_k − ŵ )²  (4)
where w_k denotes the discriminant value predicted by the convolutional neural network for the k-th part of the input image patch J; l ∈ {0,1} is a label representing the relation between two parts k and k′: l = 0 indicates that parts k and k′ have similar discriminant values, and l = 1 indicates that part k has a larger discriminant value than part k′; ŵ is the overlap ratio between the image patch and the target to be tracked.
In this embodiment, the constraints of the convolutional neural network regression model are:
when l = 0, a distance constraint is used, with distance constraint function L_s as shown in formula (5);
when l = 1, a contrastive constraint is used, with contrastive constraint function L_c as shown in formula (6):
L_c(w_k, w_{k′}) = s(τ − (w_k − w_{k′}))  (6)
where s(x) = max(0, x) is a non-saturating nonlinear function and τ is a preset threshold.
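The two pairwise constraints can be sketched as small scalar functions. Formula (6) is taken directly from the text; formula (5) did not survive extraction, so the squared difference used for the distance constraint below is an assumption made purely for illustration, as is the default threshold τ = 0.2:

```python
def s(x):
    """Non-saturating nonlinearity s(x) = max(0, x)."""
    return max(0.0, x)

def contrast_constraint(w_k, w_k2, tau=0.2):
    """Formula (6), used when l = 1: part k should outscore part k'
    by at least the preset threshold tau (tau value assumed here)."""
    return s(tau - (w_k - w_k2))

def distance_constraint(w_k, w_k2):
    """Used when l = 0: parts should have similar discriminant values.
    A squared difference is assumed here; formula (5) is not in the text."""
    return (w_k - w_k2) ** 2
```

With these definitions, a pair that already satisfies the margin incurs zero contrastive loss, and identical discriminant values incur zero distance loss.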
In this embodiment, the overlap ratio ŵ between an image patch and the target to be tracked is computed as shown in formula (7):
ŵ = area(BOX_PATCH ∩ BOX_GT) / area(BOX_PATCH ∪ BOX_GT)  (7)
where BOX_PATCH is the rectangle corresponding to the image patch and BOX_GT is the rectangle corresponding to the target to be tracked. Here the rectangles corresponding to image patches, parts and the target to be tracked are the minimum enclosing rectangles of the corresponding content.
In this embodiment, step S1 can be split into two steps:
Step S11: sample image patches according to the given target to be tracked. The target to be tracked here can be any object of interest, including people, vehicles, animals, commodities, etc. Construct a Gaussian function whose mean is the given state J = (x, y, s) of the target to be tracked, and sample image patches at different scales and positions based on this Gaussian function, where x, y and s respectively denote the horizontal coordinate, vertical coordinate and scale of the center point of the target to be tracked.
Step S12: implicitly divide each sampled image patch, and compute the offset information and overlap ratio between each divided part and the target to be tracked.
The implicit division here can be of any kind, for example into parts of the same or different sizes, with or without overlap between parts. In the experiments of the present invention, each image patch is divided into 9 equal-area, non-overlapping parts.
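The 9-part division used in the experiments can be recorded as a 3x3 grid of part rectangles without cropping the patch itself, which is the essence of implicit division. A sketch, with the (x, y, w, h) box convention assumed:

```python
def implicit_partition(x0, y0, w, h, rows=3, cols=3):
    """Implicit division: record the positions of a grid of equal-area,
    non-overlapping parts inside a patch; the patch is never cropped."""
    pw, ph = w / cols, h / rows                  # size of each part
    return [(x0 + c * pw, y0 + r * ph, pw, ph)   # row-major part boxes
            for r in range(rows) for c in range(cols)]

parts = implicit_partition(0.0, 0.0, 90.0, 90.0)
```

The patch keeps its original representation; only this list of part positions is stored alongside it, as the description above requires.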
The center offsets between a part and the target to be tracked include the horizontal offset Δx̂_k and the vertical offset Δŷ_k, computed as shown in formulas (8) and (9):
Δx̂_k = x_GT − x_k  (8)
Δŷ_k = y_GT − y_k  (9)
where x_k and y_k are respectively the horizontal and vertical coordinates of the center of the k-th part, and x_GT and y_GT are respectively the horizontal and vertical coordinates of the center of the target to be tracked.
The overlap ratio s_k between a part and the target to be tracked is shown in formula (10):
s_k = area(ROI_k ∩ ROI_GT) / area(ROI_k ∪ ROI_GT)  (10)
where ROI_k is the rectangle corresponding to a part of the image patch, ROI_GT is the rectangle corresponding to the target to be tracked, ∩ denotes the intersection between image patches, ∪ denotes the union between image patches, and area(·) denotes the area of the given image patch.
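The label construction of step S12 (center offsets and overlap ratios) reduces to simple rectangle arithmetic. In the sketch below the (x, y, w, h) convention with (x, y) the top-left corner is an assumption, and the offset sign follows the reading that a part's offset is added to its position to vote for the target center:

```python
def centre_offset(part_box, target_box):
    """Ground-truth offsets of formulas (8)-(9): target centre minus
    part centre, boxes given as (x, y, w, h)."""
    px = part_box[0] + part_box[2] / 2.0
    py = part_box[1] + part_box[3] / 2.0
    gx = target_box[0] + target_box[2] / 2.0
    gy = target_box[1] + target_box[3] / 2.0
    return gx - px, gy - py

def overlap_ratio(box_a, box_b):
    """Formula (10): intersection area over union area of two rectangles."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0.0 else 0.0
```

The same overlap_ratio routine serves both formula (7), between a sampled patch and the target, and formula (10), between a part and the target.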
The steps of the methods described in the embodiments herein can be implemented with hardware, with software modules executed by a processor, or with a combination of the two. The software modules can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
A storage device of an embodiment of the present invention stores a plurality of programs, the programs being suitable to be loaded and executed by a processor to realize the above visual tracking method based on a convolutional neural network regression model.
A processing device of an embodiment of the present invention includes a processor and a storage device; the processor is suitable for executing programs; the storage device is suitable for storing a plurality of programs; and the programs are suitable to be loaded by the processor and executed to realize the above visual tracking method based on a convolutional neural network regression model.
Those of ordinary skill in the art can clearly understand that, for convenience and brevity of description, the specific working process of the device described above and the related explanations can refer to the corresponding process in the foregoing method embodiments, which will not be repeated here.
Those skilled in the art should recognize that the modules and method steps of the examples described in connection with the embodiments disclosed herein can be implemented with electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described generally by function in the above description. Whether these functions are performed with electronic hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art can use different methods to realize the described functions for each specific application, but such realization should not be considered beyond the scope of the present invention.
The term "comprising" or any other similar term is intended to be non-exclusive, so that a process or method including a series of elements includes not only those elements, but also other elements not expressly listed, or elements inherent to such a process or method.
So far, the technical solution of the present invention has been described with reference to the preferred embodiments shown in the accompanying drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or replacements to the relevant technical features, and the technical solutions after these changes or replacements will fall within the protection scope of the present invention.
Claims (11)
1. A visual tracking method based on a convolutional neural network regression model, characterized by comprising the following steps:
step S1: in the initial frame of visual tracking, sampling image patches according to the given target to be tracked, implicitly dividing each sampled image patch into parts, and computing the offset information and overlap ratio between each divided part and the target to be tracked;
step S2: based on the image patches and part information sampled in step S1, using stochastic gradient descent to differentiate the loss function of the pre-built convolutional neural network regression model, and iteratively updating the parameter values of the convolutional neural network until the loss function reaches a preset convergence condition, obtaining the trained convolutional neural network regression model;
step S3: in each subsequent frame of visual tracking, constructing a search region based on the position where the target to be tracked appeared in the previous frame, sampling image patches from the search region, implicitly dividing the sampled patches, and obtaining the position of the target to be tracked in the current frame with the trained convolutional neural network regression model;
wherein the loss function of the convolutional neural network regression model consists of a regression loss L_reg, a discrimination loss L_dis and a regularization term on the convolutional neural network weight parameters.
2. The visual tracking method according to claim 1, characterized in that the loss function L of the pre-built convolutional neural network regression model is:
L = L_reg + λ1·L_dis + λ2·‖Θ‖²_F
where L_reg is the regression loss, L_dis is the discrimination loss, Θ are the weight parameters of the convolutional neural network, and λ1 and λ2 are preset regulatory factors.
3. The visual tracking method according to claim 2, characterized in that the regression loss L_reg is:
L_reg = Σ_k 1_k^tar [ (Δy_k − Δŷ_k)² + (Δx_k − Δx̂_k)² ]
Wherein, Δ xkWith Δ ykRespectively convolutional neural networks input picture block J k-th of part is predicted its with it is to be tracked
Target's center position is horizontal, the Center Offset of ordinate.
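A sketch of the regression loss in claim 3. The indicator 1_k^tar is assumed here to select the parts lying on the target; `pred` and `gt` hold per-part (Δy, Δx) offsets. Names are illustrative.

```python
import numpy as np

# Hypothetical sketch of L_reg from claim 3: squared error between predicted
# and ground-truth centre offsets, summed over parts selected by the
# indicator (assumed meaning of 1_k^tar).

def regression_loss(pred, gt, on_target):
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    sq = np.sum((pred - gt) ** 2, axis=1)  # (dy - dy_hat)^2 + (dx - dx_hat)^2 per part
    return float(np.sum(np.asarray(on_target, float) * sq))
```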
4. The visual tracking method according to claim 3, characterised in that the discrimination loss L_dis is:

$$L_{dis} = \sum_{k, k',\, k \neq k'} \left\{ l\, L_c\!\left(w_k, w_{k'}\right) + \left(1 - l\right) L_s\!\left(w_k, w_{k'}\right) \right\} + \left(\left(\sum_k w_k\right) - \hat{w}\right)^2$$

wherein w_k denotes the discriminant value predicted by the convolutional neural network for the k-th part of the input image block J; l ∈ {0, 1} is a label describing the correlation of two parts k and k': l = 0 indicates that parts k and k' have similar discriminant values, and l = 1 indicates that part k has a larger discriminant value than part k'; ŵ is the overlap rate between the image block and the target to be tracked.
5. The visual tracking method according to claim 4, characterised in that the constraints of the pre-built convolutional neural network regression model are:

when l = 0, a distance constraint is used, whose constraint function L_s is:

$$L_s\!\left(w_k, w_{k'}\right) = \frac{1}{2}\left(w_k - w_{k'}\right)^2;$$

when l = 1, a contrastive constraint is used, whose constraint function L_c is:

$$L_c\!\left(w_k, w_{k'}\right) = s\!\left(\tau - \left(w_k - w_{k'}\right)\right);$$

wherein s(x) = max(0, x) is a non-saturating nonlinear function and τ is a preset threshold.
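A sketch combining claims 4 and 5: the discrimination loss over labelled part pairs, using the distance constraint for l = 0 and the contrastive constraint for l = 1, plus the term tying the summed discriminant values to the overlap rate ŵ. The pairwise-label dictionary and the margin value are assumptions for illustration.

```python
import numpy as np

# Hypothetical sketch of L_dis from claims 4-5. `w` are per-part discriminant
# values; `labels[(k, kp)]` is the pairwise label l in {0, 1}; `w_hat` is the
# overlap rate of the image block with the target; `tau` is the preset margin.

def discrimination_loss(w, labels, w_hat, tau=0.2):
    w = np.asarray(w, float)
    loss = 0.0
    for (k, kp), l in labels.items():
        diff = w[k] - w[kp]
        if l == 1:
            loss += max(0.0, tau - diff)   # contrastive constraint L_c = s(tau - diff)
        else:
            loss += 0.5 * diff ** 2        # distance constraint L_s
    loss += (w.sum() - w_hat) ** 2         # discriminant values should sum to w_hat
    return float(loss)
```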
6. The visual tracking method according to claim 4, characterised in that the overlap rate ŵ between the image block and the target to be tracked is:

$$\hat{w} = \frac{area\!\left(BOX_{PATCH} \cap BOX_{GT}\right)}{area\!\left(BOX_{PATCH} \cup BOX_{GT}\right)}$$

wherein BOX_PATCH is the rectangle corresponding to the image block and BOX_GT is the rectangle corresponding to the target to be tracked.
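The overlap rate of claim 6 is the standard intersection-over-union of two rectangles. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
# Hypothetical sketch of the overlap rate (IoU) from claim 6. Boxes are
# assumed to be (x1, y1, x2, y2) with x1 < x2 and y1 < y2.

def overlap_rate(box_patch, box_gt):
    ax1, ay1, ax2, ay2 = box_patch
    bx1, by1, bx2, by2 = box_gt
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```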
7. The visual tracking method according to any one of claims 1-6, characterised in that, in step S1, the sampling of image blocks according to the given target to be tracked is performed as follows:
a Gaussian function is constructed whose mean is the given state J = (x, y, s) of the target to be tracked, and image blocks are sampled at different scales and positions based on this Gaussian function;
wherein x, y and s denote, respectively, the horizontal centre coordinate, the vertical centre coordinate and the scale of the target to be tracked.
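A sketch of the sampling step in claim 7: candidate states are drawn from a Gaussian whose mean is the given state J = (x, y, s). The standard deviations and sample count are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Hypothetical sketch of Gaussian sampling around the target state (claim 7).
# sigma_xy and sigma_s are assumed spreads for position and scale.

def sample_states(x, y, s, n=256, sigma_xy=10.0, sigma_s=0.05, seed=0):
    rng = np.random.default_rng(seed)
    xs = rng.normal(x, sigma_xy, n)        # candidate horizontal centres
    ys = rng.normal(y, sigma_xy, n)        # candidate vertical centres
    ss = rng.normal(s, sigma_s, n)         # candidate scales
    return np.stack([xs, ys, ss], axis=1)  # one candidate state per row
```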
8. The visual tracking method according to claim 7, characterised in that the offset information and overlap rate of each partitioned part with respect to the target to be tracked, described in step S1, are computed as follows:
the centre offset of a part from the target to be tracked comprises the abscissa offset Δx̂_k and the ordinate offset Δŷ_k:

$$\Delta\hat{x}_k = x_k - x_{GT}$$

$$\Delta\hat{y}_k = y_k - y_{GT}$$

wherein x_k and y_k are, respectively, the horizontal and vertical coordinates of the centre of the k-th part, and x_GT and y_GT are, respectively, the horizontal and vertical coordinates of the centre of the target to be tracked;
the overlap rate s_k between a part and the target to be tracked is:

$$s_k = \frac{area\!\left(ROI_k \cap ROI_{GT}\right)}{area\!\left(ROI_k \cup ROI_{GT}\right)}$$

wherein ROI_k is the rectangle corresponding to the image block of a part, and ROI_GT denotes the rectangle corresponding to the target to be tracked.
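The ground-truth offsets of claim 8 are plain coordinate differences; the part overlap s_k follows the same intersection-over-union form as claim 6. A minimal sketch of the offset computation, with illustrative names:

```python
# Hypothetical sketch of claim 8's centre offsets: each part's centre minus
# the target centre gives (dx_hat, dy_hat) for that part.

def part_offsets(part_centers, gt_center):
    xg, yg = gt_center
    return [(xk - xg, yk - yg) for xk, yk in part_centers]
```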
9. The visual tracking method according to any one of claims 1-6, characterised in that, for the convolutional neural network regression model, the centre position coordinates (x*_t, y*_t) of the target to be tracked in frame t are computed as:

$$\left[y_t^{*},\, x_t^{*}\right] = \frac{1}{Z_w} \sum_{k=1}^{K} w_{k,t}\left[\left(y_{k,t} + \Delta y_{k,t}\right),\ \left(x_{k,t} + \Delta x_{k,t}\right)\right]$$

wherein x_{k,t} and y_{k,t} are, respectively, the horizontal and vertical coordinates of the k-th part in the search region of frame t; Δx_{k,t}, Δy_{k,t} and w_{k,t} denote, respectively, the horizontal and vertical displacements of the k-th part in the current frame-t search region relative to the centre position of the target to be tracked, as predicted by the convolutional neural network, and the discriminant value of that part; Z_w is the normalization factor for the weights w_{k,t}; and K is the total number of parts in the search region of frame t.
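A sketch of the claim-9 estimate: a discriminant-weighted average of each part's position plus its predicted offset, normalised by Z_w (the sum of the weights). Array shapes and names are illustrative.

```python
import numpy as np

# Hypothetical sketch of the frame-t target-centre estimate from claim 9.
# parts_xy: (K, 2) part positions (y_k, x_k); offsets_xy: (K, 2) predicted
# offsets (dy_k, dx_k); weights: (K,) discriminant values w_k.

def estimate_center(parts_xy, offsets_xy, weights):
    parts = np.asarray(parts_xy, float)
    offs = np.asarray(offsets_xy, float)
    w = np.asarray(weights, float)
    zw = w.sum()                                        # normalisation factor Z_w
    return (w[:, None] * (parts + offs)).sum(axis=0) / zw
```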
10. A storage device storing a plurality of programs, characterised in that the programs are adapted to be loaded and executed by a processor to implement the visual tracking method based on a convolutional neural network regression model according to any one of claims 1-9.
11. A processing device, comprising:
a processor adapted to execute programs; and
a storage device adapted to store a plurality of programs;
characterised in that the programs are adapted to be loaded and executed by the processor to implement:
the visual tracking method based on a convolutional neural network regression model according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710595279.6A CN107527355B (en) | 2017-07-20 | 2017-07-20 | Visual tracking method and device based on convolutional neural network regression model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107527355A true CN107527355A (en) | 2017-12-29 |
CN107527355B CN107527355B (en) | 2020-08-11 |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416428A (en) * | 2018-02-28 | 2018-08-17 | 中国计量大学 | A kind of robot visual orientation method based on convolutional neural networks |
CN108510523A (en) * | 2018-03-16 | 2018-09-07 | 新智认知数据服务有限公司 | It is a kind of to establish the model for obtaining object feature and object searching method and device |
CN108805204A (en) * | 2018-06-12 | 2018-11-13 | 东北大学 | Electrical energy power quality disturbance analytical equipment based on deep neural network and its application method |
CN109389543A (en) * | 2018-09-11 | 2019-02-26 | 深圳大学 | Bus operation data statistical approach, calculates equipment and storage medium at system |
CN109636846A (en) * | 2018-12-06 | 2019-04-16 | 重庆邮电大学 | Object localization method based on circulation attention convolutional neural networks |
CN109711332A (en) * | 2018-12-26 | 2019-05-03 | 浙江捷尚视觉科技股份有限公司 | A kind of face tracking method and application based on regression algorithm |
CN109829936A (en) * | 2019-01-29 | 2019-05-31 | 青岛海信网络科技股份有限公司 | A kind of method and apparatus of target tracking |
CN110060274A (en) * | 2019-04-12 | 2019-07-26 | 北京影谱科技股份有限公司 | The visual target tracking method and device of neural network based on the dense connection of depth |
CN110807515A (en) * | 2019-10-30 | 2020-02-18 | 北京百度网讯科技有限公司 | Model generation method and device |
CN112634344A (en) * | 2020-12-15 | 2021-04-09 | 西安理工大学 | Method for detecting center position of cold-rolled strip coil shaft hole based on machine vision |
CN112861652A (en) * | 2021-01-20 | 2021-05-28 | 中国科学院自动化研究所 | Method and system for tracking and segmenting video target based on convolutional neural network |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104573731A (en) * | 2015-02-06 | 2015-04-29 | 厦门大学 | Rapid target detection method based on convolutional neural network |
CN105243398A (en) * | 2015-09-08 | 2016-01-13 | 西安交通大学 | Method of improving performance of convolutional neural network based on linear discriminant analysis criterion |
CN106599805A (en) * | 2016-12-01 | 2017-04-26 | 华中科技大学 | Supervised data driving-based monocular video depth estimating method |
CN106709936A (en) * | 2016-12-14 | 2017-05-24 | 北京工业大学 | Single target tracking method based on convolution neural network |
Non-Patent Citations (2)
Title |
---|
HYEONSEOB NAM et al.: "Learning Multi-Domain Convolutional Neural Networks for Visual Tracking", 《THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
HE Zhenjun: "Research on vehicle detection algorithms based on convolutional neural networks", 《Wanfang dissertations》 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||