CN107527355A - Visual tracking method and device based on a convolutional neural network regression model - Google Patents

Visual tracking method and device based on a convolutional neural network regression model

Info

Publication number
CN107527355A
CN107527355A
Authority
CN
China
Prior art keywords
target
tracked
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710595279.6A
Other languages
Chinese (zh)
Other versions
CN107527355B (en)
Inventor
徐常胜
张天柱
高君宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201710595279.6A priority Critical patent/CN107527355B/en
Publication of CN107527355A publication Critical patent/CN107527355A/en
Application granted granted Critical
Publication of CN107527355B publication Critical patent/CN107527355B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/223 - Analysis of motion using block-matching
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20021 - Dividing image into blocks, subimages or windows
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the field of computer vision, and proposes a visual tracking method and device based on a convolutional neural network regression model, aiming to solve the problem that the object tracking process is divided into two independent steps, part matching and target localization, so that the target position cannot be inferred directly from the parts. The method includes: S1, in the initial frame of visual tracking, sampling image blocks according to the given target to be tracked and dividing each block into multiple parts; S2, training a pre-built convolutional neural network regression model with stochastic gradient descent; S3, in each subsequent frame of visual tracking, constructing a search region based on the position where the target to be tracked appeared in the previous frame, and obtaining the position of the target to be tracked in the current frame with the trained convolutional neural network regression model. The present invention fully combines the parts with target localization and has good robustness.

Description

Visual tracking method and device based on a convolutional neural network regression model
Technical field
The invention belongs to the field of computer vision, and in particular relates to a visual tracking method and device based on a convolutional neural network regression model.
Background art
Visual tracking is a fundamental component of computer vision applications; fields such as intelligent video surveillance, augmented reality, robot learning and human-computer interaction all require robust tracking of targets of interest. Although significant progress has been made in recent years, visual tracking remains a difficult task, because it faces challenges such as partial occlusion, deformation, illumination variation, motion blur, fast motion, background clutter and scale change.
To address these problems, many part-based methods have been widely studied in recent years; these methods decompose the object into a group of parts and study them. In fact, when partial occlusion or deformation is present, some parts of the target remain visible and can thus provide reliable cues for tracking. Most of these methods treat tracking as first matching parts between consecutive frames and then realizing part-to-part tracking. Such a method generally includes two key steps: (1) part matching, whose basic idea is to match the parts in one video frame with the corresponding parts in subsequent frames; (2) target localization, which estimates the state of the target from the part matching results by considering the spatial constraints between different parts. The main drawback of this approach is that the tracking process is divided into two independent steps, so the target position cannot be inferred directly from the parts.
Unlike existing methods, another intuitive idea is to use a part of the target to directly regress the target position. In the current frame, the target object can be divided into several parts. If we know how to estimate the translation between a target part and the center of the target object, we obtain candidate positions of the target from all parts, from which the target position can be inferred. Therefore, the part-to-target tracking problem can be solved by regressing directly on the target parts. To achieve robust part-to-target regression, part context information and part reliability must be modeled, for the following reasons: (1) The context information of the parts is important, because it provides an additional constraint on the part-to-target regression by preserving the spatial layout structure of the object. If the part-to-target regression were carried out independently for each part, the result would be unreliable when more complex appearance changes occur in the video. (2) The reliability of the target parts is also important, because different parts of the target may undergo different appearance changes, illumination variations, motion blur or partial occlusion; if different parts were simply given equal weights, the regression result would be biased by each part. Therefore each part must be modeled for its reliability, in order to emphasize its degree of importance.
Summary of the invention
In order to solve the above problem in the prior art, namely that the object tracking process is divided into the two independent steps of part matching and target localization so that the target position cannot be inferred directly from the parts, one aspect of the present invention proposes a visual tracking method based on a convolutional neural network regression model, comprising the following steps:
Step S1, in the initial frame of visual tracking, sampling image blocks according to the given target to be tracked, implicitly dividing each sampled image block, and computing the offset information and overlap rate between each divided part and the target to be tracked;
Step S2, based on the image blocks and part information sampled in step S1, taking derivatives of the loss function of the pre-built convolutional neural network regression model with stochastic gradient descent, and iteratively updating the parameter values of the convolutional neural network until the loss function reaches a preset convergence condition, thereby obtaining the trained convolutional neural network regression model;
Step S3, in each subsequent frame of visual tracking, constructing a search region based on the position where the target to be tracked appeared in the previous frame, sampling image blocks from the search region, implicitly dividing the sampled image blocks, and obtaining the position of the target to be tracked in the current frame with the trained convolutional neural network regression model;
wherein the loss function of the convolutional neural network regression model consists of a regression loss $L_{reg}$, a discrimination loss $L_{dis}$, and a regularization term on the convolutional neural network weight parameters.
Preferably, the loss function L of the pre-built convolutional neural network regression model is:

$$L = L_{reg} + \lambda_1 L_{dis} + \lambda_2 \|\Theta\|_F^2$$

wherein $L_{reg}$ is the regression loss, $L_{dis}$ is the discrimination loss, $\Theta$ is the weight parameter of the convolutional neural network, and $\lambda_1$ and $\lambda_2$ are preset regulatory factors.
Preferably, the regression loss $L_{reg}$ is:

$$L_{reg} = \sum_k \mathbb{1}_k^{tar}\left[(\Delta y_k - \Delta\hat{y}_k)^2 + (\Delta x_k - \Delta\hat{x}_k)^2\right]$$

wherein $\Delta x_k$ and $\Delta y_k$ are respectively the horizontal and vertical offsets between the center of the k-th part of the input image block J and the center of the target to be tracked, as predicted by the convolutional neural network; $\Delta\hat{x}_k$ and $\Delta\hat{y}_k$ are the corresponding ground-truth offsets; and $\mathbb{1}_k^{tar}$ is an indicator selecting the parts used for regression.
Preferably, the discrimination loss $L_{dis}$ is:

$$L_{dis} = \sum_{k,k',\,k\neq k'}\left\{ l\,L_c(w_k, w_{k'}) + (1-l)\,L_s(w_k, w_{k'})\right\} + \left(\sum_k w_k - \hat{w}\right)^2$$

wherein $w_k$ denotes the discriminant value predicted by the convolutional neural network for the k-th part of the input image block J; $l \in \{0,1\}$ is a label representing the relation between two parts k and k': l=0 indicates that parts k and k' have similar discriminant values, and l=1 indicates that part k has a larger discriminant value than part k'; and $\hat{w}$ is the overlap rate between the image block and the target to be tracked.
Preferably, the constraints of the pre-built convolutional neural network regression model are:
When l=0, a distance constraint is used, whose distance constraint function $L_s$ is:

$$L_s(w_k, w_{k'}) = \frac{1}{2}(w_k - w_{k'})^2$$

When l=1, a contrastive constraint is used, whose contrastive constraint function $L_c$ is:

$$L_c(w_k, w_{k'}) = s(\tau - (w_k - w_{k'}))$$

wherein $s(x) = \max(0, x)$ is a non-saturating nonlinear function and $\tau$ is a preset threshold.
Preferably, the overlap rate $\hat{w}$ between an image block and the target to be tracked is:

$$\hat{w} = \frac{\mathrm{area}(BOX_{PATCH} \cap BOX_{GT})}{\mathrm{area}(BOX_{PATCH} \cup BOX_{GT})}$$

wherein $BOX_{PATCH}$ is the rectangular box corresponding to the image block and $BOX_{GT}$ is the rectangular box corresponding to the target to be tracked.
Preferably, in step S1, the sampling of image blocks according to the given target to be tracked is carried out as follows:
a Gaussian function is constructed with the given state J = (x, y, s) of the target to be tracked as its mean, and image blocks are sampled at different scales and positions based on the Gaussian function;
wherein x, y and s respectively denote the abscissa and ordinate of the center point of the target to be tracked, and its scale.
Preferably, in step S1, the offset information and overlap rate between each divided part and the target to be tracked are computed as follows:
The center offset between a part and the target to be tracked includes the abscissa offset $\Delta\hat{x}_k$ and the ordinate offset $\Delta\hat{y}_k$:

$$\Delta\hat{x}_k = x_k - x_{GT}, \qquad \Delta\hat{y}_k = y_k - y_{GT}$$

wherein $x_k$ and $y_k$ are respectively the abscissa and ordinate of the center of the k-th part, and $x_{GT}$ and $y_{GT}$ are respectively the abscissa and ordinate of the center of the target to be tracked;
The overlap rate $s_k$ between a part and the target to be tracked is:

$$s_k = \frac{\mathrm{area}(ROI_k \cap ROI_{GT})}{\mathrm{area}(ROI_k \cup ROI_{GT})}$$

wherein $ROI_k$ is the rectangular box corresponding to the k-th part of the image block and $ROI_{GT}$ is the rectangular box corresponding to the target to be tracked.
Preferably, in the convolutional neural network regression model, the center coordinates $[y_t^*, x_t^*]$ of the target to be tracked in frame t are computed as:

$$[y_t^*, x_t^*] = \frac{1}{Z_w}\sum_{k=1}^{K} w_{k,t}\left[(y_{k,t} + \Delta y_{k,t}),\ (x_{k,t} + \Delta x_{k,t})\right]$$

wherein $x_{k,t}$ and $y_{k,t}$ are respectively the abscissa and ordinate of the k-th part in the search region of frame t; $\Delta x_{k,t}$, $\Delta y_{k,t}$ and $w_{k,t}$ respectively denote the horizontal and vertical displacements of the k-th part relative to the center of the target to be tracked and the discriminant value of the part, as predicted by the convolutional neural network in the search region of the current frame t; $Z_w$ is the normalization factor of the weights $w_{k,t}$; and K is the total number of parts in the search region of frame t.
Another aspect of the present invention proposes a storage device in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to realize the above visual tracking method based on a convolutional neural network regression model.
A third aspect of the present invention proposes a processing device, comprising:
a processor, adapted to execute programs; and
a storage device, adapted to store a plurality of programs;
wherein the programs are adapted to be loaded and executed by the processor to realize:
the above visual tracking method based on a convolutional neural network regression model.
Beneficial effects of the present invention:
(1) The convolutional neural network regression model of the present invention, by considering the part information in the image blocks, can regress the target position directly from the part positions, thereby realizing robust target tracking.
(2) The present invention models the features of the whole image block and takes into account both the correlation between parts and the importance of each part, thereby improving the tracking performance.
Brief description of the drawings
Fig. 1 is a flow diagram of the visual tracking method based on a convolutional neural network regression model according to one embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principles of the present invention and are not intended to limit the scope of the present invention.
The purpose of the present invention is to perform robust visual tracking by regressing the target position with a convolutional neural network. The present invention jointly considers part context information and part reliability, and realizes part-to-target regression in an end-to-end framework.
The method of the present invention realizes, within an end-to-end framework, a robust convolutional neural network regression model for part-to-target regression. The proposed model can not only use part context information to preserve the overall spatial layout structure, but also learns part reliability to emphasize the importance of different parts for regressing the target. The method includes four parts: (1) sampling image blocks in the initial frame and building the sample label information; (2) establishing the optimization objective function; (3) optimizing the objective function with stochastic gradient descent until the model converges; (4) estimating the most likely target state in subsequent frames using the trained model. The optimization objective in (2) can be built in advance, so in practical application the method steps can be regarded as (1), (3) and (4).
As shown in Fig. 1, the visual tracking method based on a convolutional neural network regression model of one embodiment of the present invention includes the following steps:
Step S1, in the initial frame of visual tracking, sampling image blocks according to the given target to be tracked, implicitly dividing each sampled image block, and computing the offset information and overlap rate between each divided part and the target to be tracked. The offset information here is the offset between the center point of each divided part and the center point of the target to be tracked; the overlap rate here is the overlap rate between each divided part and the target to be tracked.
Step S2, based on the image blocks and part information sampled in step S1, taking derivatives of the loss function of the pre-built convolutional neural network regression model with stochastic gradient descent, and iteratively updating the parameter values of the convolutional neural network until the loss function reaches a preset convergence condition, thereby obtaining the trained convolutional neural network regression model.
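For illustration only, the sketch below shows what one iteration scheme for step S2 could look like, assuming a PyTorch implementation (the patent does not prescribe a framework); the network `PartRegressionNet`, its layer sizes and all hyperparameter values are illustrative assumptions, and the weight-decay term stands in for the $\|\Theta\|_F^2$ regularizer.

```python
import torch
import torch.nn as nn

# Hypothetical regression network (illustrative assumption): maps an image
# block to per-part centre offsets (dx, dy) and per-part discriminant
# values w, for K = 9 parts as in the 3x3 division used in the experiments.
class PartRegressionNet(nn.Module):
    def __init__(self, num_parts=9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.offsets = nn.Linear(64 * 16, num_parts * 2)  # (dx, dy) per part
        self.weights = nn.Linear(64 * 16, num_parts)      # discriminant value per part

    def forward(self, blocks):
        h = self.features(blocks)
        return self.offsets(h), self.weights(h)

def train(model, loader, loss_fn, lr=1e-3, weight_decay=1e-4, eps=1e-3):
    # weight_decay plays the role of the ||Theta||_F^2 regularization term.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9,
                          weight_decay=weight_decay)
    while True:
        total = 0.0
        for blocks, labels in loader:   # image blocks and part labels from step S1
            opt.zero_grad()
            pred_off, pred_w = model(blocks)
            loss = loss_fn(pred_off, pred_w, labels)
            loss.backward()             # derivative of the loss function
            opt.step()                  # iterative update of the network parameters
            total += loss.item()
        if total < eps:                 # preset convergence condition
            return model
```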
Step S3, in each subsequent frame of visual tracking, constructing a search region based on the position where the target to be tracked appeared in the previous frame, sampling image blocks from the search region, implicitly dividing the sampled image blocks, and obtaining the position of the target to be tracked in the current frame with the trained convolutional neural network regression model.
Here, the loss function of the convolutional neural network regression model consists of a regression loss $L_{reg}$, a discrimination loss $L_{dis}$, and a regularization term on the convolutional neural network weight parameters.
Implicit division means that an image block is implicitly divided into multiple parts, and the position information of these parts within the image block is recorded. After implicit division the image block keeps its original representation; only the part information inside it is recorded. In contrast, explicit division would actually disassemble the image block into multiple small part image blocks, and the original image block would no longer exist.
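A minimal sketch of such an implicit division, assuming the 3x3 grid of equal, non-overlapping parts used later in the experiments; only the part rectangles are recorded, and the image block itself is left intact:

```python
def implicit_division(block_box, rows=3, cols=3):
    """Record part rectangles (x0, y0, w, h) inside an image block
    without cutting the block itself apart."""
    x0, y0, w, h = block_box
    pw, ph = w / cols, h / rows
    return [(x0 + j * pw, y0 + i * ph, pw, ph)
            for i in range(rows) for j in range(cols)]

parts = implicit_division((100, 50, 90, 90))   # 9 part rectangles
```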
The search region constructed in step S3 of this embodiment may be a rectangle whose center is the center of the position where the target to be tracked appeared in the previous frame, and whose height and width are respectively twice the height and width of the rectangular box corresponding to the target to be tracked in the previous frame. Of course, the constructed search region may also have another shape, and the area it covers may have another size, as long as it covers, in the current frame, the region corresponding to the position where the target to be tracked appeared in the previous frame, with a certain margin reserved.
In step S3 of this embodiment, sampling image blocks from the search region and implicitly dividing the sampled image blocks may be carried out as follows: N1 image blocks are sampled within the large rectangular search region (N1 = 100 in one embodiment), and the sampled image blocks are implicitly divided.
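The following sketch illustrates the search-region construction and block sampling of step S3 under the assumptions above (a 2x region, N1 = 100); the uniform sampling distribution is an illustrative assumption, as the patent does not fix it here:

```python
import numpy as np

def search_region(prev_box, scale=2.0):
    """Rectangle centred on the previous target box, twice its size.
    Boxes are (centre x, centre y, width, height)."""
    x, y, w, h = prev_box
    return (x, y, scale * w, scale * h)

def sample_blocks(region, box_size, n=100, seed=0):
    """Sample n candidate image blocks inside the search region."""
    rng = np.random.default_rng(seed)
    cx, cy, rw, rh = region
    w, h = box_size
    xs = rng.uniform(cx - rw / 2, cx + rw / 2, size=n)
    ys = rng.uniform(cy - rh / 2, cy + rh / 2, size=n)
    return [(float(x), float(y), w, h) for x, y in zip(xs, ys)]

blocks = sample_blocks(search_region((320, 240, 80, 60)), (80, 60))
```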
In the convolutional neural network regression model of this embodiment, the center coordinates $[y_t^*, x_t^*]$ of the target to be tracked in frame t are computed as shown in formula (1):

$$[y_t^*, x_t^*] = \frac{1}{Z_w}\sum_{k=1}^{K} w_{k,t}\left[(y_{k,t} + \Delta y_{k,t}),\ (x_{k,t} + \Delta x_{k,t})\right] \qquad (1)$$

wherein $x_{k,t}$ and $y_{k,t}$ are respectively the abscissa and ordinate of the k-th part in the search region of frame t; $\Delta x_{k,t}$, $\Delta y_{k,t}$ and $w_{k,t}$ respectively denote the horizontal and vertical displacements of the k-th part relative to the center of the target to be tracked and the discriminant value of the part, as predicted by the convolutional neural network in the search region of the current frame t; $Z_w$ is the normalization factor of the weights $w_{k,t}$; and K is the total number of parts in the search region of frame t.
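A direct transcription of formula (1), assuming NumPy arrays for the per-part positions, predicted offsets and discriminant values:

```python
import numpy as np

def target_center(parts_xy, pred_offsets, pred_w):
    """Formula (1): discriminant-weighted average of the per-part votes
    (part position + predicted offset to the target centre).
    parts_xy, pred_offsets: (K, 2) arrays of (x, y); pred_w: (K,).
    Returns the estimated centre (x*, y*)."""
    votes = parts_xy + pred_offsets            # candidate centres, one per part
    z = pred_w.sum()                           # normalisation factor Z_w
    cx, cy = (pred_w[:, None] * votes).sum(axis=0) / z
    return cx, cy
```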
In the convolutional neural network regression model of this embodiment, the objective function is to minimize a loss function. For one input image block, the loss function L consists of the regression loss $L_{reg}$, the discrimination loss $L_{dis}$, and a regularization term on the convolutional neural network weight parameters.
The loss function L is shown in formula (2):

$$L = L_{reg} + \lambda_1 L_{dis} + \lambda_2 \|\Theta\|_F^2 \qquad (2)$$

wherein $L_{reg}$ is the regression loss, $L_{dis}$ is the discrimination loss, $\Theta$ is the weight parameter of the convolutional neural network, and $\lambda_1$ and $\lambda_2$ are preset regulatory factors.
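A minimal sketch of formula (2), assuming the network weights are available as a list of arrays; the values of $\lambda_1$ and $\lambda_2$ are illustrative assumptions:

```python
import numpy as np

def total_loss(l_reg, l_dis, theta, lam1=1.0, lam2=1e-4):
    """Formula (2): regression loss + weighted discrimination loss
    + Frobenius regularization of the network weights theta."""
    frob = sum(np.sum(w ** 2) for w in theta)  # squared Frobenius norm
    return l_reg + lam1 * l_dis + lam2 * frob
```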
The regression loss $L_{reg}$ is shown in formula (3):

$$L_{reg} = \sum_k \mathbb{1}_k^{tar}\left[(\Delta y_k - \Delta\hat{y}_k)^2 + (\Delta x_k - \Delta\hat{x}_k)^2\right] \qquad (3)$$

wherein $\Delta x_k$ and $\Delta y_k$ are respectively the horizontal and vertical offsets between the center of the k-th part of the input image block J and the center of the target to be tracked, as predicted by the convolutional neural network; $\Delta\hat{x}_k$ and $\Delta\hat{y}_k$ are the corresponding ground-truth offsets computed in step S1; and $\mathbb{1}_k^{tar}$ is an indicator selecting the parts used for regression.
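Formula (3) transcribed directly, assuming (K, 2) arrays of predicted and ground-truth offsets and a 0/1 indicator vector:

```python
import numpy as np

def regression_loss(pred_off, gt_off, indicator):
    """Formula (3): squared error between predicted and ground-truth
    per-part offsets, restricted by the indicator 1_k^tar.
    pred_off, gt_off: (K, 2); indicator: (K,) of 0/1."""
    return float(np.sum(indicator * np.sum((pred_off - gt_off) ** 2, axis=1)))
```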
The discrimination loss $L_{dis}$ is shown in formula (4):

$$L_{dis} = \sum_{k,k',\,k\neq k'}\left\{ l\,L_c(w_k, w_{k'}) + (1-l)\,L_s(w_k, w_{k'})\right\} + \left(\sum_k w_k - \hat{w}\right)^2 \qquad (4)$$

wherein $w_k$ denotes the discriminant value predicted by the convolutional neural network for the k-th part of the input image block J; $l \in \{0,1\}$ is a label representing the relation between two parts k and k': l=0 indicates that parts k and k' have similar discriminant values, and l=1 indicates that part k has a larger discriminant value than part k'; and $\hat{w}$ is the overlap rate between the image block and the target to be tracked.
In this embodiment, the constraints of the convolutional neural network regression model are:
When l=0, a distance constraint is used, whose distance constraint function $L_s$ is shown in formula (5):

$$L_s(w_k, w_{k'}) = \frac{1}{2}(w_k - w_{k'})^2 \qquad (5)$$

When l=1, a contrastive constraint is used, whose contrastive constraint function $L_c$ is shown in formula (6):

$$L_c(w_k, w_{k'}) = s(\tau - (w_k - w_{k'})) \qquad (6)$$

wherein $s(x) = \max(0, x)$ is a non-saturating nonlinear function and $\tau$ is a preset threshold.
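Formulas (4)-(6) combined in one sketch; the pairwise label matrix and the value of the threshold tau are illustrative assumptions:

```python
import numpy as np

def discrimination_loss(w, labels, w_hat, tau=0.1):
    """Formula (4) with constraints (5) and (6).
    w: (K,) discriminant values; labels[k, kp] in {0, 1}; w_hat: overlap
    rate of the whole block; tau is a preset threshold (value assumed)."""
    K, loss = len(w), 0.0
    for k in range(K):
        for kp in range(K):
            if k == kp:
                continue
            if labels[k, kp] == 1:             # contrastive constraint (6)
                loss += max(0.0, tau - (w[k] - w[kp]))
            else:                              # distance constraint (5)
                loss += 0.5 * (w[k] - w[kp]) ** 2
    return loss + (w.sum() - w_hat) ** 2       # overlap-consistency term
```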
In this embodiment, the overlap rate $\hat{w}$ between an image block and the target to be tracked is computed as shown in formula (7):

$$\hat{w} = \frac{\mathrm{area}(BOX_{PATCH} \cap BOX_{GT})}{\mathrm{area}(BOX_{PATCH} \cup BOX_{GT})} \qquad (7)$$

wherein $BOX_{PATCH}$ is the rectangular box corresponding to the image block and $BOX_{GT}$ is the rectangular box corresponding to the target to be tracked. Here the rectangular boxes corresponding to image blocks, parts and the target to be tracked are the minimum bounding rectangles of the corresponding content.
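Formula (7) is the standard intersection-over-union of two axis-aligned rectangles; a direct transcription, assuming (x0, y0, w, h) boxes:

```python
def overlap_rate(box_a, box_b):
    """Formula (7): intersection-over-union of two (x0, y0, w, h) boxes."""
    ax0, ay0, aw, ah = box_a
    bx0, by0, bw, bh = box_b
    ix = max(0.0, min(ax0 + aw, bx0 + bw) - max(ax0, bx0))
    iy = max(0.0, min(ay0 + ah, by0 + bh) - max(ay0, by0))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```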
In this embodiment, step S1 can be split into two sub-steps:
Step S11, sampling image blocks according to the given target to be tracked. Here the target to be tracked can be any object of interest, including people, vehicles, animals, commodities, etc.
A Gaussian function is constructed with the given state J = (x, y, s) of the target to be tracked as its mean, and image blocks are sampled at different scales and positions based on the Gaussian function; wherein x, y and s respectively denote the abscissa and ordinate of the center point of the target to be tracked, and its scale.
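A minimal sketch of step S11, assuming Gaussian perturbations of the state J = (x, y, s); the patent fixes only the mean of the Gaussian, so the standard deviations below are illustrative assumptions:

```python
import numpy as np

def sample_states(state, n=100, sigma=(8.0, 8.0, 0.05), seed=0):
    """Draw n candidate states (x, y, s) from a Gaussian centred on the
    given target state J = (x, y, s)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(state, dtype=float)
    return mean + rng.normal(0.0, sigma, size=(n, 3))

candidates = sample_states((320.0, 240.0, 1.0))
```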
Step S12, implicitly dividing each sampled image block, and computing the offset information and overlap rate between each divided part and the target to be tracked.
Here the implicit division can be of any type, for example dividing into several parts of equal or different sizes, with or without overlap between parts. In the experiments of the present invention, each image block is divided into 9 equal-area, non-overlapping parts.
The center offset between a part and the target to be tracked includes the abscissa offset $\Delta\hat{x}_k$ and the ordinate offset $\Delta\hat{y}_k$, computed respectively as shown in formulas (8) and (9):

$$\Delta\hat{x}_k = x_k - x_{GT} \qquad (8)$$

$$\Delta\hat{y}_k = y_k - y_{GT} \qquad (9)$$

wherein $x_k$ and $y_k$ are respectively the abscissa and ordinate of the center of the k-th part, and $x_{GT}$ and $y_{GT}$ are respectively the abscissa and ordinate of the center of the target to be tracked.
The overlap rate $s_k$ between a part and the target to be tracked is shown in formula (10):

$$s_k = \frac{\mathrm{area}(ROI_k \cap ROI_{GT})}{\mathrm{area}(ROI_k \cup ROI_{GT})} \qquad (10)$$

wherein $ROI_k$ is the rectangular box corresponding to the k-th part of the image block, $ROI_{GT}$ is the rectangular box corresponding to the target to be tracked, $\cap$ denotes the intersection between image blocks, $\cup$ denotes the union between image blocks, and area(·) gives the area of the given image block.
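Step S12 can then be sketched by combining the two helpers above (implicit_division and overlap_rate from the earlier sketches), producing the ground-truth labels (8)-(10) for each part:

```python
def part_labels(block_box, gt_box, gt_center):
    """Step S12: ground-truth offsets (8), (9) and overlap rates (10)
    for the implicitly divided parts of one sampled image block."""
    x_gt, y_gt = gt_center
    labels = []
    for (px, py, pw, ph) in implicit_division(block_box):
        cx, cy = px + pw / 2, py + ph / 2      # part centre (x_k, y_k)
        labels.append({"dx": cx - x_gt,        # formula (8)
                       "dy": cy - y_gt,        # formula (9)
                       "s":  overlap_rate((px, py, pw, ph), gt_box)})  # (10)
    return labels
```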
The steps of the methods described in connection with the embodiments disclosed herein can be implemented with hardware, with software modules executed by a processor, or with a combination of the two. Software modules may be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
A storage device of an embodiment of the present invention stores a plurality of programs, the programs being adapted to be loaded and executed by a processor to realize the above visual tracking method based on a convolutional neural network regression model.
A processing device of an embodiment of the present invention includes a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to realize the above visual tracking method based on a convolutional neural network regression model.
Those of ordinary skill in the art can clearly understand that, for convenience and brevity of description, the specific working processes and relevant explanations of the devices described above may refer to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
Those skilled in the art should recognize that the modules and method steps of the examples described in connection with the embodiments disclosed herein can be realized with electronic hardware, computer software, or a combination of the two. In order to clearly demonstrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described generally according to function in the above description. Whether these functions are performed with electronic hardware or with software depends on the specific application and the design constraints of the technical scheme. Those skilled in the art may use different methods to realize the described functions for each specific application, but such realization should not be considered to be beyond the scope of the present invention.
The term "comprising" or any other similar term is intended to be non-exclusive, so that a process or method including a series of elements includes not only those elements, but also other elements not expressly listed, or also includes elements inherent to such a process or method.
So far, the technical scheme of the present invention has been described in connection with the preferred embodiments shown in the drawings. However, those skilled in the art will easily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or replacements to the relevant technical features, and the technical schemes after these changes or replacements will fall within the protection scope of the present invention.

Claims (11)

1. A visual tracking method based on a convolutional neural network regression model, characterized by comprising the following steps:
Step S1, in the initial frame of visual tracking, sampling image blocks according to the given target to be tracked, implicitly dividing each sampled image block, and computing the offset information and overlap rate between each divided part and the target to be tracked;
Step S2, based on the image blocks and part information sampled in step S1, taking derivatives of the loss function of the pre-built convolutional neural network regression model with stochastic gradient descent, and iteratively updating the parameter values of the convolutional neural network until the loss function reaches a preset convergence condition, thereby obtaining the trained convolutional neural network regression model;
Step S3, in each subsequent frame of visual tracking, constructing a search region based on the position where the target to be tracked appeared in the previous frame, sampling image blocks from the search region, implicitly dividing the sampled image blocks, and obtaining the position of the target to be tracked in the current frame with the trained convolutional neural network regression model;
wherein the loss function of the convolutional neural network regression model consists of a regression loss $L_{reg}$, a discrimination loss $L_{dis}$, and a regularization term on the convolutional neural network weight parameters.
2. The visual tracking method according to claim 1, characterized in that the loss function L of the pre-built convolutional neural network regression model is:

$$L = L_{reg} + \lambda_1 L_{dis} + \lambda_2 \|\Theta\|_F^2$$

wherein $L_{reg}$ is the regression loss, $L_{dis}$ is the discrimination loss, $\Theta$ is the weight parameter of the convolutional neural network, and $\lambda_1$ and $\lambda_2$ are preset regulatory factors.
3. The visual tracking method according to claim 2, characterized in that the regression loss $L_{reg}$ is:

$$L_{reg} = \sum_k \mathbb{1}_k^{tar}\left[(\Delta y_k - \Delta\hat{y}_k)^2 + (\Delta x_k - \Delta\hat{x}_k)^2\right]$$

wherein $\Delta x_k$ and $\Delta y_k$ are respectively the horizontal and vertical offsets between the center of the k-th part of the input image block J and the center of the target to be tracked, as predicted by the convolutional neural network; $\Delta\hat{x}_k$ and $\Delta\hat{y}_k$ are the corresponding ground-truth offsets; and $\mathbb{1}_k^{tar}$ is an indicator selecting the parts used for regression.
4. The visual tracking method according to claim 3, characterized in that the discrimination loss $L_{dis}$ is:

$$L_{dis} = \sum_{k,k',\,k\neq k'}\left\{ l\,L_c(w_k, w_{k'}) + (1-l)\,L_s(w_k, w_{k'})\right\} + \left(\sum_k w_k - \hat{w}\right)^2$$

wherein $w_k$ denotes the discriminant value predicted by the convolutional neural network for the k-th part of the input image block J; $l \in \{0,1\}$ is a label representing the relation between two parts k and k': l=0 indicates that parts k and k' have similar discriminant values, and l=1 indicates that part k has a larger discriminant value than part k'; and $\hat{w}$ is the overlap rate between the image block and the target to be tracked.
5. The visual tracking method according to claim 4, characterized in that the constraints of the pre-built convolutional neural network regression model are:
When l=0, a distance constraint is used, whose distance constraint function $L_s$ is:

$$L_s(w_k, w_{k'}) = \frac{1}{2}(w_k - w_{k'})^2$$

When l=1, a contrastive constraint is used, whose contrastive constraint function $L_c$ is:

$$L_c(w_k, w_{k'}) = s(\tau - (w_k - w_{k'}))$$

wherein $s(x) = \max(0, x)$ is a non-saturating nonlinear function and $\tau$ is a preset threshold.
6. The visual tracking method according to claim 4, characterized in that the overlap rate $\hat{w}$ between an image block and the target to be tracked is:

$$\hat{w} = \frac{\mathrm{area}(BOX_{PATCH} \cap BOX_{GT})}{\mathrm{area}(BOX_{PATCH} \cup BOX_{GT})}$$

wherein $BOX_{PATCH}$ is the rectangular box corresponding to the image block and $BOX_{GT}$ is the rectangular box corresponding to the target to be tracked.
7. The visual tracking method according to any one of claims 1-6, characterized in that, in step S1, the sampling of image blocks according to the given target to be tracked is carried out as follows:
a Gaussian function is constructed with the given state J = (x, y, s) of the target to be tracked as its mean, and image blocks are sampled at different scales and positions based on the Gaussian function;
wherein x, y and s respectively denote the abscissa and ordinate of the center point of the target to be tracked, and its scale.
8. The visual tracking method according to claim 7, characterized in that, in step S1, the offset information and overlap rate between each divided part and the target to be tracked are computed as follows:
The center offset between a part and the target to be tracked includes the abscissa offset $\Delta\hat{x}_k$ and the ordinate offset $\Delta\hat{y}_k$:

$$\Delta\hat{x}_k = x_k - x_{GT}$$

$$\Delta\hat{y}_k = y_k - y_{GT}$$

wherein $x_k$ and $y_k$ are respectively the abscissa and ordinate of the center of the k-th part, and $x_{GT}$ and $y_{GT}$ are respectively the abscissa and ordinate of the center of the target to be tracked;
The overlap rate $s_k$ between a part and the target to be tracked is:

$$s_k = \frac{\mathrm{area}(ROI_k \cap ROI_{GT})}{\mathrm{area}(ROI_k \cup ROI_{GT})}$$

wherein $ROI_k$ is the rectangular box corresponding to the k-th part of the image block and $ROI_{GT}$ is the rectangular box corresponding to the target to be tracked.
9. The visual tracking method according to any one of claims 1-6, characterized in that, in the convolutional neural network regression model, the center coordinates $[y_t^*, x_t^*]$ of the target to be tracked in frame t are computed as:

$$[y_t^*, x_t^*] = \frac{1}{Z_w}\sum_{k=1}^{K} w_{k,t}\left[(y_{k,t} + \Delta y_{k,t}),\ (x_{k,t} + \Delta x_{k,t})\right]$$

wherein $x_{k,t}$ and $y_{k,t}$ are respectively the abscissa and ordinate of the k-th part in the search region of frame t; $\Delta x_{k,t}$, $\Delta y_{k,t}$ and $w_{k,t}$ respectively denote the horizontal and vertical displacements of the k-th part relative to the center of the target to be tracked and the discriminant value of the part, as predicted by the convolutional neural network in the search region of the current frame t; $Z_w$ is the normalization factor of the weights $w_{k,t}$; and K is the total number of parts in the search region of frame t.
10. A storage device in which a plurality of programs are stored, characterized in that the programs are adapted to be loaded and executed by a processor to realize the visual tracking method based on a convolutional neural network regression model according to any one of claims 1-9.
11. A processing device, comprising:
a processor, adapted to execute programs; and
a storage device, adapted to store a plurality of programs;
characterized in that the programs are adapted to be loaded and executed by the processor to realize:
the visual tracking method based on a convolutional neural network regression model according to any one of claims 1-9.
CN201710595279.6A 2017-07-20 2017-07-20 Visual tracking method and device based on convolutional neural network regression model Active CN107527355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710595279.6A CN107527355B (en) 2017-07-20 2017-07-20 Visual tracking method and device based on convolutional neural network regression model


Publications (2)

Publication Number Publication Date
CN107527355A true CN107527355A (en) 2017-12-29
CN107527355B CN107527355B (en) 2020-08-11

Family

ID=60749049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710595279.6A Active CN107527355B (en) 2017-07-20 2017-07-20 Visual tracking method and device based on convolutional neural network regression model

Country Status (1)

Country Link
CN (1) CN107527355B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573731A (en) * 2015-02-06 2015-04-29 厦门大学 Rapid target detection method based on convolutional neural network
CN105243398A (en) * 2015-09-08 2016-01-13 西安交通大学 Method of improving performance of convolutional neural network based on linear discriminant analysis criterion
CN106599805A (en) * 2016-12-01 2017-04-26 华中科技大学 Supervised data driving-based monocular video depth estimating method
CN106709936A (en) * 2016-12-14 2017-05-24 北京工业大学 Single target tracking method based on convolution neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HYEONSEOB NAM et al.: "Learning Multi-Domain Convolutional Neural Networks for Visual Tracking", The IEEE Conference on Computer Vision and Pattern Recognition *
何振军: "Research on Vehicle Detection Algorithms Based on Convolutional Neural Networks", Wanfang Dissertations *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416428A (en) * 2018-02-28 2018-08-17 中国计量大学 A kind of robot visual orientation method based on convolutional neural networks
CN108510523A (en) * 2018-03-16 2018-09-07 新智认知数据服务有限公司 It is a kind of to establish the model for obtaining object feature and object searching method and device
CN108805204A (en) * 2018-06-12 2018-11-13 东北大学 Electrical energy power quality disturbance analytical equipment based on deep neural network and its application method
CN109389543B (en) * 2018-09-11 2022-03-04 深圳大学 Bus operation data statistical method, system, computing device and storage medium
CN109389543A (en) * 2018-09-11 2019-02-26 深圳大学 Bus operation data statistical approach, calculates equipment and storage medium at system
CN109636846A (en) * 2018-12-06 2019-04-16 重庆邮电大学 Object localization method based on circulation attention convolutional neural networks
CN109636846B (en) * 2018-12-06 2022-10-11 重庆邮电大学 Target positioning method based on cyclic attention convolution neural network
CN109711332A (en) * 2018-12-26 2019-05-03 浙江捷尚视觉科技股份有限公司 A kind of face tracking method and application based on regression algorithm
CN109829936A (en) * 2019-01-29 2019-05-31 青岛海信网络科技股份有限公司 A kind of method and apparatus of target tracking
CN109829936B (en) * 2019-01-29 2021-12-24 青岛海信网络科技股份有限公司 Target tracking method and device
CN110060274A (en) * 2019-04-12 2019-07-26 北京影谱科技股份有限公司 The visual target tracking method and device of neural network based on the dense connection of depth
CN110807515A (en) * 2019-10-30 2020-02-18 北京百度网讯科技有限公司 Model generation method and device
CN110807515B (en) * 2019-10-30 2023-04-28 北京百度网讯科技有限公司 Model generation method and device
CN112634344A (en) * 2020-12-15 2021-04-09 西安理工大学 Method for detecting center position of cold-rolled strip coil shaft hole based on machine vision
CN112861652A (en) * 2021-01-20 2021-05-28 中国科学院自动化研究所 Method and system for tracking and segmenting video target based on convolutional neural network

Also Published As

Publication number Publication date
CN107527355B (en) 2020-08-11

Similar Documents

Publication Publication Date Title
CN107527355A Visual tracking method and device based on a convolutional neural network regression model
JP6709283B2 (en) Detection and analysis of moving vehicles using low resolution remote sensing images
CN108022012A (en) Vehicle location Forecasting Methodology based on deep learning
CN107862705A (en) A kind of unmanned plane small target detecting method based on motion feature and deep learning feature
CN105760849B (en) Target object behavioral data acquisition methods and device based on video
CN106845351A (en) It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term
CN104915970A (en) Multi-target tracking method based on track association
CN111626184B (en) Crowd density estimation method and system
CN107301369A (en) Road traffic congestion analysis method based on Aerial Images
CN108345875A (en) Wheeled region detection model training method, detection method and device
Li et al. Pedestrian detection based on deep learning model
CN104156982B (en) Motion target tracking method and device
CN107194366A (en) The Activity recognition method of son is described based on dense track covariance
CN115699102A (en) Tracking multiple objects in a video stream using occlusion aware single object tracking
CN107832716A (en) Method for detecting abnormality based on active-passive Gauss on-line study
CN111126515B (en) Model training method based on artificial intelligence and related device
CN105844667A (en) Structural target tracking method of compact color coding
Liu et al. A novel facial mask detection using fast-yolo algorithm
Firouznia et al. Chaotic particle filter for visual object tracking
Wang et al. Collaborative 3d object detection for autonomous vehicles via learnable communications
CN112819889B (en) Method and device for determining position information, storage medium and electronic device
Balasubramaniam et al. R-TOSS: A framework for real-time object detection using semi-structured pruning
CN114386691A (en) Occupant damage prediction method and device based on stress posture prediction
CN106204639A (en) Based on frequency domain regression model target tracking method, system and senior drive assist system
CN106407975A (en) Multi-dimensional layered object detection method based on space-spectrum constraint

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant