CN108549928B - Continuous movement-based visual tracking method and device under deep reinforcement learning guidance - Google Patents

Continuous movement-based visual tracking method and device under deep reinforcement learning guidance

Info

Publication number
CN108549928B
CN108549928B
Authority
CN
China
Prior art keywords
action
network
actions
stop
update
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810226092.3A
Other languages
Chinese (zh)
Other versions
CN108549928A (en)
Inventor
鲁继文
周杰
任亮亮
袁鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201810226092.3A priority Critical patent/CN108549928B/en
Publication of CN108549928A publication Critical patent/CN108549928A/en
Application granted granted Critical
Publication of CN108549928B publication Critical patent/CN108549928B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention discloses a continuous movement-based visual tracking method and device under the guidance of deep reinforcement learning, wherein the method comprises the following steps: pre-training a prediction network; generating a plurality of actions according to the prediction network and obtaining corresponding rewards; and obtaining a Q value for each of the plurality of actions while updating the prediction network and the action-generation network. The method can continuously and cumulatively adjust the target box of the object and dynamically adjust the appearance feature and model of the target object, thereby greatly improving robustness.

Description

Continuous movement-based visual tracking method and device under deep reinforcement learning guidance
Technical Field
The invention relates to the technical field of visual tracking, in particular to a visual tracking method and device based on continuous movement under the guidance of deep reinforcement learning.
Background
Visual object tracking is a fundamental problem in computer vision, and is widely applied in the fields of visual monitoring, robot control, human-computer interaction, advanced auxiliary driving systems and the like. A number of visual tracking methods have been proposed over the past decades, but the problems of visual tracking remain very challenging in unrestricted natural environments due to deformations, sudden movements, occlusion, and illumination changes.
The purpose of visual tracking is to determine the position of an object in a video given only the object information in the first frame. The best-performing visual tracking methods are currently of two main types: correlation-filtering-based methods and deep-learning-based methods. Correlation-filtering-based methods design a correlation filter whose response peaks at the target object in each frame; this approach does not require multiple samples of the object's appearance. Building on the basic MOSSE (Minimum Output Sum of Squared Error) framework, a number of methods such as CFTs and DSST (Discriminative Scale Space Tracker) have been proposed to exploit color attributes and address the scale problem. Deep-learning-based tracking methods adopt a deep convolutional neural network as a classifier and select the most probable position from a set of candidate boxes. Representative deep-learning-based methods such as MDNet, FCNT, and STCT all employ inefficient search techniques such as sliding windows and iterative sampling. In recent years, visual tracking methods that make decisions via reinforcement learning have been proposed; for example, ADNet uses a policy-gradient method to decide the size and displacement of the target object. However, most existing methods update the deep model online by sampling and are easily affected by large deformation and sudden motion, which reduces accuracy; this problem needs to be solved.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present invention is to provide a continuous-motion-based visual tracking method under deep reinforcement learning guidance, which can greatly improve robustness.
The invention also aims to provide a visual tracking device based on continuous movement under the guidance of deep reinforcement learning.
In order to achieve the above object, an embodiment of the present invention provides a continuous-motion-based visual tracking method under deep reinforcement learning guidance, including the following steps: pre-training a prediction network; generating a plurality of actions according to the prediction network and obtaining corresponding rewards; and obtaining a Q value for each of the plurality of actions while updating the prediction network and the action-generation network.
According to the continuous movement-based visual tracking method under the guidance of deep reinforcement learning of the embodiment of the present invention, a plurality of actions can be generated from the pre-trained prediction network and the corresponding rewards obtained, and the Q value of each of the plurality of actions can be acquired while the prediction network and the action-generation network are updated simultaneously. The visual tracking problem is thereby modeled as a continuous, cumulative movement problem, which is more robust to appearance changes of the tracked target caused by complex backgrounds and deformation, and which alleviates, to a certain extent, the target drift caused by large deformation and fast motion.
In addition, the visual tracking method based on continuous movement under the guidance of deep reinforcement learning according to the above embodiment of the present invention may further have the following additional technical features:
further, in one embodiment of the present invention, the objective function of the prediction network is:
Δx = (X_curr − X_prev) / W_prev,
Δy = (Y_curr − Y_prev) / H_prev,
Δw = log(W_curr / W_prev),
Δh = log(H_curr / H_prev),
where Δx and Δy are the scale-invariant translations of the target box between two frames, Δw and Δh are the width and height changes in logarithmic space, X_curr and X_prev are the x coordinates of the current-frame and previous-frame positions (Y_curr and Y_prev the corresponding y coordinates), W_prev and H_prev are the previous-frame width and height, and W_curr and H_curr are the current-frame width and height.
Further, in an embodiment of the present invention, generating a plurality of actions and obtaining corresponding rewards according to the prediction network further includes: generating a continue action, a stop-and-update action, a stop-and-ignore action, and a restart action through a deep neural network; and obtaining the reward corresponding to each action according to the tracking effect.
Further, in one embodiment of the present invention,
for the continue action, I_t^k is used as the input and f_{t-1}^* as the hidden-layer feature, and the position is adjusted by:
l_{t,k} = l_{t,k-1} + δ,
where l_{t,k} is the adjusted position, l_{t,k-1} is the position before the adjustment, t is the frame index, k is the number of iteration steps, and δ is the predicted offset;
for the stop-and-update action, the iteration is stopped and the feature of the target and the parameters of the prediction network are updated:
f_t^* = ρ·f_t + (1 − ρ)·f_{t-1}^*,
θ_t = θ_{t-1} + μ·∇_θ E[Q(s, a)],
where f_t^* is the updated target feature, ρ is a smoothing coefficient, f_t is the current feature, f_{t-1}^* is the feature at the previous moment, θ_t and θ_{t-1} are the prediction-network parameters after and before the update, μ is the learning rate, Q(s, a) is the Q function, s is the state, a is the action, δ is the offset, and E denotes the expectation;
for the stop-and-ignore action, processing of the next frame is started, using the target feature of the previous moment and the current parameters of the prediction network;
for the restart action, the initial box is resampled: candidate boxes are randomly sampled around the current object and the box with the highest Q value is selected as the new initial box:
l_{t,0} = argmax_l Q(s_l, a_update),
where the candidate boxes l are sampled around the initial position l_{t-1,0} and a_update denotes the stop-and-update action;
and the reward function for the continue action is:
r_{t,k} = 1 if ΔIoU > ε, 0 if |ΔIoU| ≤ ε, −1 if ΔIoU < −ε,
where r_{t,k} is the current reward value, ΔIoU is the change of the IoU, and ε is the threshold;
the reward function for the stop-and-update action and the stop-and-ignore action is:
r_{t,K_t} = g(IoU(l_t, g_t)),
where r_{t,K_t} is the reward value, K_t is the final number of iteration steps, g(·) is a function of the IoU, g_t is the true position, and l_t is the output position;
and the reward function for the restart action is likewise assigned according to the tracking effect.
Further, in one embodiment of the present invention, the Q value of the continue action is calculated by the following formula:
Q(s, a | t, k) = r_{t,k} + γ·r_{t,k+1} + γ²·r_{t,k+2} + …,
where γ is a balance (discount) coefficient.
The Q values of the stop-and-update action, the stop-and-ignore action, and the restart action are calculated by the following formula:
Q(s, a | t, k) = r_{t,K_t} + γ·r_{t+1,K_{t+1}} + ….
in order to achieve the above object, another embodiment of the present invention provides a continuous-motion-based visual tracking apparatus under deep reinforcement learning guidance, including: the pre-training module is used for pre-training the prediction network; the generating module is used for generating a plurality of actions according to the prediction network and obtaining corresponding rewards; an acquisition module to acquire a Q value for each of the plurality of actions while updating the network of predicted and generated actions.
The continuous movement-based visual tracking device under the guidance of deep reinforcement learning of the embodiment of the present invention can generate a plurality of actions from the pre-trained prediction network and obtain the corresponding rewards, and can acquire the Q value of each of the plurality of actions while updating the prediction network and the action-generation network simultaneously. The visual tracking problem is thereby modeled as a continuous, cumulative movement problem, which is more robust to appearance changes of the tracked target caused by complex backgrounds and deformation, and which alleviates, to a certain extent, the target drift caused by large deformation and fast motion.
In addition, the visual tracking device based on continuous movement under the guidance of deep reinforcement learning according to the above embodiment of the present invention may further have the following additional technical features:
further, in one embodiment of the present invention, the objective function of the prediction network is:
Δx = (X_curr − X_prev) / W_prev,
Δy = (Y_curr − Y_prev) / H_prev,
Δw = log(W_curr / W_prev),
Δh = log(H_curr / H_prev),
where Δx and Δy are the scale-invariant translations of the target box between two frames, Δw and Δh are the width and height changes in logarithmic space, X_curr and X_prev are the x coordinates of the current-frame and previous-frame positions (Y_curr and Y_prev the corresponding y coordinates), W_prev and H_prev are the previous-frame width and height, and W_curr and H_curr are the current-frame width and height.
Further, in an embodiment of the present invention, the generating module further includes: a generating unit for generating a continue action, a stop-and-update action, a stop-and-ignore action, and a restart action through the deep neural network; and an acquisition unit for obtaining the reward corresponding to each action according to the tracking effect.
Further, in one embodiment of the present invention,
for the continue action, I_t^k is used as the input and f_{t-1}^* as the hidden-layer feature, and the position is adjusted by:
l_{t,k} = l_{t,k-1} + δ,
where l_{t,k} is the adjusted position, l_{t,k-1} is the position before the adjustment, t is the frame index, k is the number of iteration steps, and δ is the predicted offset;
for the stop-and-update action, the iteration is stopped and the feature of the target and the parameters of the prediction network are updated:
f_t^* = ρ·f_t + (1 − ρ)·f_{t-1}^*,
θ_t = θ_{t-1} + μ·∇_θ E[Q(s, a)],
where f_t^* is the updated target feature, ρ is a smoothing coefficient, f_t is the current feature, f_{t-1}^* is the feature at the previous moment, θ_t and θ_{t-1} are the prediction-network parameters after and before the update, μ is the learning rate, Q(s, a) is the Q function, s is the state, a is the action, δ is the offset, and E denotes the expectation;
for the stop-and-ignore action, processing of the next frame is started, using the target feature of the previous moment and the current parameters of the prediction network;
for the restart action, the initial box is resampled: candidate boxes are randomly sampled around the current object and the box with the highest Q value is selected as the new initial box:
l_{t,0} = argmax_l Q(s_l, a_update),
where the candidate boxes l are sampled around the initial position l_{t-1,0} and a_update denotes the stop-and-update action;
and the reward function for the continue action is:
r_{t,k} = 1 if ΔIoU > ε, 0 if |ΔIoU| ≤ ε, −1 if ΔIoU < −ε,
where r_{t,k} is the current reward value, ΔIoU is the change of the IoU, and ε is the threshold;
the reward function for the stop-and-update action and the stop-and-ignore action is:
r_{t,K_t} = g(IoU(l_t, g_t)),
where r_{t,K_t} is the reward value, K_t is the final number of iteration steps, g(·) is a function of the IoU, g_t is the true position, and l_t is the output position;
and the reward function for the restart action is likewise assigned according to the tracking effect.
further, in one embodiment of the present invention,
the Q value of the continue action is calculated by the following formula:
Q(s, a | t, k) = r_{t,k} + γ·r_{t,k+1} + γ²·r_{t,k+2} + …,
where γ is a balance (discount) coefficient.
The Q values of the stop-and-update action, the stop-and-ignore action, and the restart action are calculated by the following formula:
Q(s, a | t, k) = r_{t,K_t} + γ·r_{t+1,K_{t+1}} + ….
additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a continuous motion based visual tracking method under deep reinforcement learning guidance according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a method for continuous motion based visual tracking guided by deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of optimizing the evaluation network and the action-generation network according to one embodiment of the invention;
fig. 4 is a schematic structural diagram of a continuous-motion-based visual tracking apparatus under deep reinforcement learning guidance according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a visual tracking method and apparatus based on continuous movement under deep reinforcement learning guidance according to an embodiment of the present invention with reference to the accompanying drawings, and first, a visual tracking method based on continuous movement under deep reinforcement learning guidance according to an embodiment of the present invention will be described with reference to the accompanying drawings.
FIG. 1 is a flowchart of a continuous movement-based visual tracking method under deep reinforcement learning guidance according to an embodiment of the present invention.
As shown in fig. 1, the continuous movement-based visual tracking method under the guidance of deep reinforcement learning includes the following steps:
in step S101, the prediction network is pre-trained.
In one embodiment of the present invention, the objective function of the prediction network is:
Δx = (X_curr − X_prev) / W_prev,
Δy = (Y_curr − Y_prev) / H_prev,
Δw = log(W_curr / W_prev),
Δh = log(H_curr / H_prev),
where Δx and Δy are the scale-invariant translations of the target box between two frames, Δw and Δh are the width and height changes in logarithmic space, X_curr and X_prev are the x coordinates of the current-frame and previous-frame positions (Y_curr and Y_prev the corresponding y coordinates), W_prev and H_prev are the previous-frame width and height, and W_curr and H_curr are the current-frame width and height.
It will be appreciated that, as shown in fig. 1 and 2, given the initial box of the t-th frame, depth features are first extracted at that location and the current feature is combined with the feature of the target. Four actions (continue, stop and update, stop and ignore, and restart) are then generated by the prediction network and the action generation network to adjust the position and shape of the target box. For the action "continue", the position of the target box is adjusted continuously; for the action "stop and update", the iteration is stopped and the feature of the target and the model parameters of the prediction network are updated; for the action "stop and ignore", the update step is skipped; for the action "restart", the target may have been lost and the initial box is resampled. Finally, a deep evaluation network is used to estimate the Q value of the current action and to update the parameters of the prediction network and the action generation network.
Specifically, the embodiment of the present invention may pre-train the prediction network, whose objective function is defined as follows:
Δx = (X_curr − X_prev) / W_prev,
Δy = (Y_curr − Y_prev) / H_prev,
Δw = log(W_curr / W_prev),
Δh = log(H_curr / H_prev),
where Δx and Δy represent the scale-invariant translations of the target box between two frames, and Δw and Δh represent the width and height changes in log space. As shown in fig. 1, the prediction network of the embodiment of the present invention uses three convolutional layers to extract features of the target and candidate regions; the features are then concatenated and fed into two fully-connected layers to estimate the change of position and scale. Through this objective function, the embodiment of the invention can therefore train an end-to-end deep neural network to directly predict the change of position and shape.
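As a concrete illustration, the following PyTorch sketch shows one possible form of such a prediction network and of the scale-invariant regression targets described above; the layer widths, kernel sizes, and helper names are assumptions made for illustration rather than values taken from this description.

```python
import math
import torch
import torch.nn as nn

class PredictionNetwork(nn.Module):
    """Three conv layers extract features of the target and the candidate
    region; the two feature vectors are concatenated and two fully-connected
    layers regress the shift (dx, dy, dw, dh)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.regressor = nn.Sequential(
            nn.Linear(128 * 2, 256), nn.ReLU(),
            nn.Linear(256, 4),  # (dx, dy, dw, dh)
        )

    def forward(self, target_patch, candidate_patch):
        f_target = self.features(target_patch)
        f_candidate = self.features(candidate_patch)
        return self.regressor(torch.cat([f_target, f_candidate], dim=1))

def shift_targets(prev_box, curr_box):
    """Scale-invariant regression targets for boxes given as (x, y, w, h)."""
    xp, yp, wp, hp = prev_box
    xc, yc, wc, hc = curr_box
    return torch.tensor([(xc - xp) / wp, (yc - yp) / hp,
                         math.log(wc / wp), math.log(hc / hp)])
```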
In step S102, a plurality of actions are generated according to the prediction network and corresponding rewards are obtained.
Further, in an embodiment of the present invention, generating a plurality of actions and obtaining corresponding rewards according to the prediction network further comprises: generating a continue action, a stop-and-update action, a stop-and-ignore action, and a restart action through a deep neural network; and obtaining the reward corresponding to each action according to the tracking effect.
Further, in one embodiment of the present invention,
for the continue action, I_t^k is used as the input and f_{t-1}^* as the hidden-layer feature, and the position is adjusted by:
l_{t,k} = l_{t,k-1} + δ,
where l_{t,k} is the adjusted position, l_{t,k-1} is the position before the adjustment, t is the frame index, k is the number of iteration steps, and δ is the predicted offset;
for the stop-and-update action, the iteration is stopped and the feature of the target and the parameters of the prediction network are updated:
f_t^* = ρ·f_t + (1 − ρ)·f_{t-1}^*,
θ_t = θ_{t-1} + μ·∇_θ E[Q(s, a)],
where f_t^* is the updated target feature, ρ is a smoothing coefficient, f_t is the current feature, f_{t-1}^* is the feature at the previous moment, θ_t and θ_{t-1} are the prediction-network parameters after and before the update, μ is the learning rate, Q(s, a) is the Q function, s is the state, a is the action, δ is the offset, and E denotes the expectation;
for the stop-and-ignore action, processing of the next frame is started, using the target feature of the previous moment and the current parameters of the prediction network;
for the restart action, the initial box is resampled: candidate boxes are randomly sampled around the current object and the box with the highest Q value is selected as the new initial box:
l_{t,0} = argmax_l Q(s_l, a_update),
where the candidate boxes l are sampled around the initial position l_{t-1,0} and a_update denotes the stop-and-update action;
and the reward function for the continue action is:
r_{t,k} = 1 if ΔIoU > ε, 0 if |ΔIoU| ≤ ε, −1 if ΔIoU < −ε,
where r_{t,k} is the current reward value, ΔIoU is the change of the IoU, and ε is the threshold;
the reward function for the stop-and-update action and the stop-and-ignore action is:
r_{t,K_t} = g(IoU(l_t, g_t)),
where r_{t,K_t} is the reward value, K_t is the final number of iteration steps, g(·) is a function of the IoU, g_t is the true position, and l_t is the output position;
and the reward function for the restart action is likewise assigned according to the tracking effect.
It will be appreciated that the embodiment of the invention may generate a series of actions and receive corresponding rewards based on the prediction network. A deep neural network is used to generate four actions: continue, stop and update, stop and ignore, and restart. For the action "continue", I_t^k is used as the input and f_{t-1}^* as the hidden-layer feature, and the position is adjusted by:
l_{t,k} = l_{t,k-1} + δ.
For the action "stop and update", the iteration is stopped and the feature of the target and the parameters of the prediction network are updated:
f_t^* = ρ·f_t + (1 − ρ)·f_{t-1}^*,
θ_t = θ_{t-1} + μ·∇_θ E[Q(s, a)].
For the action "stop and ignore", processing of the next frame is started, using the target feature of the previous moment and the current parameters of the prediction network. For the action "restart", the initial box is resampled: candidate boxes are randomly sampled around the current object, and the box with the highest Q value is selected as the new initial box:
l_{t,0} = argmax_l Q(s_l, a_update).
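The per-frame control flow implied by these four actions can be sketched as follows. The callables passed in stand for the prediction network, the action-generation network, the evaluation network, and the model-update step; their names, signatures, and the step limit are illustrative assumptions rather than details taken from this description.

```python
import math
from typing import Callable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)

def track_frame(
    choose_action: Callable[[Box], str],              # action-generation network
    predict_shift: Callable[[Box], Sequence[float]],  # prediction network: (dx, dy, dw, dh)
    q_value: Callable[[Box], float],                  # evaluation network (Q of a candidate box)
    update_model: Callable[[Box], None],              # feature/parameter update on stop-and-update
    sample_around: Callable[[Box], Sequence[Box]],    # candidate boxes for restart
    box: Box,
    max_steps: int = 10,
) -> Box:
    """One frame of the iterative-shift tracking loop."""
    for _ in range(max_steps):
        action = choose_action(box)
        if action == "continue":                 # l_{t,k} = l_{t,k-1} + delta
            dx, dy, dw, dh = predict_shift(box)
            x, y, w, h = box
            box = (x + dx * w, y + dy * h, w * math.exp(dw), h * math.exp(dh))
        elif action == "stop_update":            # stop iterating and update feature/parameters
            update_model(box)
            break
        elif action == "stop_ignore":            # stop iterating, skip the update
            break
        else:                                    # "restart": keep the highest-Q candidate
            box = max(sample_around(box), key=q_value)
    return box
```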
The embodiment of the invention also defines a corresponding reward for each action according to the tracking effect. For the action "continue", the reward function is defined as:
r_{t,k} = 1 if ΔIoU > ε, 0 if |ΔIoU| ≤ ε, −1 if ΔIoU < −ε,
where ΔIoU is given by the following formula:
ΔIoU = IoU(l_{t,k}, g_t) − IoU(l_{t,k-1}, g_t).
For the action "stop and update" and the action "stop and ignore", the reward function may be defined as:
r_{t,K_t} = g(IoU(l_t, g_t)),
where g(·) maps the final IoU between the output position l_t and the true position g_t to a reward. For the action "restart", the reward is likewise assigned according to the tracking effect.
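For reference, a small self-contained sketch of these IoU-based rewards is given below; the threshold ε and the mapping g(·) used for the stop actions are assumptions chosen for illustration, since only their roles are specified here.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h), (x, y) being the top-left corner."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def continue_reward(prev_box, new_box, gt_box, eps=0.005):
    """r_{t,k} from the change of IoU produced by one shift step (eps is an assumed threshold)."""
    delta_iou = iou(new_box, gt_box) - iou(prev_box, gt_box)
    if delta_iou > eps:
        return 1.0
    if delta_iou < -eps:
        return -1.0
    return 0.0

def stop_reward(out_box, gt_box):
    """r_{t,K_t} = g(IoU(l_t, g_t)); here g is a simple thresholded mapping (an assumption)."""
    v = iou(out_box, gt_box)
    return 1.0 if v > 0.7 else (-1.0 if v < 0.4 else 0.0)
```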
In step S103, the Q value of each of the plurality of actions is acquired, while the prediction network and the action-generation network are updated.
In one embodiment of the present invention, the Q value of the continue action is calculated by the following formula:
Q(s, a | t, k) = r_{t,k} + γ·r_{t,k+1} + γ²·r_{t,k+2} + …,
where γ is a balance (discount) coefficient.
The Q values of the stop-and-update action, the stop-and-ignore action, and the restart action are calculated by the following formula:
Q(s, a | t, k) = r_{t,K_t} + γ·r_{t+1,K_{t+1}} + ….
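The discounted accumulation that these formulas describe can be computed as in the short sketch below; the discount value is an illustrative assumption.

```python
def discounted_return(rewards, gamma=0.95):
    """Q ≈ r_0 + gamma * r_1 + gamma**2 * r_2 + ..."""
    q = 0.0
    for r in reversed(list(rewards)):
        q = r + gamma * q
    return q

# Example: three step rewards gathered within one frame.
print(discounted_return([1.0, 1.0, -1.0]))  # 1 + 0.95*1 + 0.95**2*(-1) = 1.0475
```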
It will be appreciated that, as shown in FIG. 3, the embodiment of the present invention can calculate the Q value of each action and optimize the evaluation network φ and the action-generation network θ; a deep evaluation network is used to predict the Q value of the current action and to update the model parameters of the prediction network. For the action "continue", the Q value is calculated as follows:
Q(s, a | t, k) = r_{t,k} + γ·r_{t,k+1} + γ²·r_{t,k+2} + …;
for the other three actions "stop and update", "stop and ignore", and "restart", the Q value is calculated as follows:
Q(s, a | t, k) = r_{t,K_t} + γ·r_{t+1,K_{t+1}} + ….
Therefore, the embodiment of the present invention can express the optimization problem of the evaluation network as the following formula:
min_φ E[(r + γ·Q_{φ⁻}(s′, a′) − Q_φ(s, a))²],
where φ⁻ denotes the target network, which has the same structure as φ but is updated slowly.
The embodiment of the invention updates the parameters of the evaluation network according to the following formula:
φ ← φ − μ·∇_φ E[(r + γ·Q_{φ⁻}(s′, a′) − Q_φ(s, a))²].
The embodiment of the invention can express the optimization problem of the action-generation network as the following formula:
max_θ E[Q_φ(s, a)],
where a = π_θ(s) is the action produced by the action-generation network for state s.
Finally, the parameter θ of the action-generation network is updated as follows:
θ ← θ + μ·∇_θ E[Q_φ(s, π_θ(s))].
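A minimal PyTorch sketch of an actor-critic update consistent with this description is given below: the evaluation network φ is regressed toward a target built from the slowly updated copy φ⁻, and the action-generation network θ is adjusted to increase the value predicted by the evaluation network. The network sizes, soft-update rate, learning rates, and the use of softmax action probabilities are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

state_dim, action_dim = 512, 4
actor = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, action_dim))
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 128), nn.ReLU(), nn.Linear(128, 1))
critic_target = copy.deepcopy(critic)            # phi_minus: same structure, updated slowly

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, tau = 0.95, 0.01

def update(s, a, r, s_next):
    # Critic: minimize (r + gamma * Q_target(s', a') - Q(s, a))^2.
    with torch.no_grad():
        a_next = torch.softmax(actor(s_next), dim=-1)
        y = r + gamma * critic_target(torch.cat([s_next, a_next], dim=-1))
    td_loss = ((y - critic(torch.cat([s, a], dim=-1))) ** 2).mean()
    critic_opt.zero_grad(); td_loss.backward(); critic_opt.step()

    # Actor: maximize E[Q(s, pi_theta(s))] via gradient ascent (negative loss).
    actor_loss = -critic(torch.cat([s, torch.softmax(actor(s), dim=-1)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update of the target network.
    for p, p_t in zip(critic.parameters(), critic_target.parameters()):
        p_t.data.mul_(1 - tau).add_(tau * p.data)
```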
According to the continuous movement-based visual tracking method under the guidance of deep reinforcement learning provided by the embodiment of the present invention, a plurality of actions can be generated from the pre-trained prediction network and the corresponding rewards obtained, and the Q value of each of the plurality of actions can be acquired while the prediction network and the action-generation network are updated simultaneously. The visual tracking problem is thereby modeled as a continuous, cumulative movement problem, which is more robust to appearance changes of the tracked target caused by complex backgrounds and deformation, and which alleviates, to a certain extent, the target drift caused by large deformation and fast motion.
Next, a visual tracking apparatus based on continuous movement under guidance of deep reinforcement learning according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 4 is a schematic structural diagram of a continuous movement-based visual tracking apparatus under the guidance of deep reinforcement learning according to an embodiment of the present invention.
As shown in fig. 4, the continuous-movement-based visual tracking apparatus 10 under the guidance of deep reinforcement learning includes: a pre-training module 100, a generation module 200 and an acquisition module 300.
The pre-training module 100 is used for pre-training the prediction network. The generating module 200 is used for generating a plurality of actions according to the prediction network and obtaining corresponding rewards. The acquisition module 300 is used for acquiring the Q value of each of the plurality of actions while updating the prediction network and the action-generation network. The apparatus 10 of the embodiment of the present invention can continuously and cumulatively adjust the target box of the object and dynamically adjust the appearance feature and model of the target object, thereby greatly improving robustness.
Further, in one embodiment of the present invention, the objective function of the prediction network is:
Δx = (X_curr − X_prev) / W_prev,
Δy = (Y_curr − Y_prev) / H_prev,
Δw = log(W_curr / W_prev),
Δh = log(H_curr / H_prev),
where Δx and Δy are the scale-invariant translations of the target box between two frames, Δw and Δh are the width and height changes in logarithmic space, X_curr and X_prev are the x coordinates of the current-frame and previous-frame positions (Y_curr and Y_prev the corresponding y coordinates), W_prev and H_prev are the previous-frame width and height, and W_curr and H_curr are the current-frame width and height.
Further, in an embodiment of the present invention, the generating module 200 further includes a generating unit and an acquisition unit. The generating unit is used for generating a continue action, a stop-and-update action, a stop-and-ignore action, and a restart action through the deep neural network. The acquisition unit is used for obtaining the reward corresponding to each action according to the tracking effect.
Further, in one embodiment of the present invention,
for the continue action, I_t^k is used as the input and f_{t-1}^* as the hidden-layer feature, and the position is adjusted by:
l_{t,k} = l_{t,k-1} + δ,
where l_{t,k} is the adjusted position, l_{t,k-1} is the position before the adjustment, t is the frame index, k is the number of iteration steps, and δ is the predicted offset;
for the stop-and-update action, the iteration is stopped and the feature of the target and the parameters of the prediction network are updated:
f_t^* = ρ·f_t + (1 − ρ)·f_{t-1}^*,
θ_t = θ_{t-1} + μ·∇_θ E[Q(s, a)],
where f_t^* is the updated target feature, ρ is a smoothing coefficient, f_t is the current feature, f_{t-1}^* is the feature at the previous moment, θ_t and θ_{t-1} are the prediction-network parameters after and before the update, μ is the learning rate, Q(s, a) is the Q function, s is the state, a is the action, δ is the offset, and E denotes the expectation;
for the stop-and-ignore action, processing of the next frame is started, using the target feature of the previous moment and the current parameters of the prediction network;
for the restart action, the initial box is resampled: candidate boxes are randomly sampled around the current object and the box with the highest Q value is selected as the new initial box:
l_{t,0} = argmax_l Q(s_l, a_update),
where the candidate boxes l are sampled around the initial position l_{t-1,0} and a_update denotes the stop-and-update action;
and the reward function for the continue action is:
r_{t,k} = 1 if ΔIoU > ε, 0 if |ΔIoU| ≤ ε, −1 if ΔIoU < −ε,
where r_{t,k} is the current reward value, ΔIoU is the change of the IoU, and ε is the threshold;
the reward function for the stop-and-update action and the stop-and-ignore action is:
r_{t,K_t} = g(IoU(l_t, g_t)),
where r_{t,K_t} is the reward value, K_t is the final number of iteration steps, g(·) is a function of the IoU, g_t is the true position, and l_t is the output position;
and the reward function for the restart action is likewise assigned according to the tracking effect.
further, in one embodiment of the present invention,
the Q value of the continue action is calculated by the following formula:
Q(s, a | t, k) = r_{t,k} + γ·r_{t,k+1} + γ²·r_{t,k+2} + …,
where γ is a balance (discount) coefficient.
The Q values of the stop-and-update action, the stop-and-ignore action, and the restart action are calculated by the following formula:
Q(s, a | t, k) = r_{t,K_t} + γ·r_{t+1,K_{t+1}} + ….
it should be noted that the foregoing explanation of the embodiment of the visual tracking method based on continuous movement under the guidance of the deep reinforcement learning is also applicable to the visual tracking apparatus based on continuous movement under the guidance of the deep reinforcement learning of the embodiment, and details are not repeated here.
According to the continuous movement-based visual tracking device under the guidance of deep reinforcement learning provided by the embodiment of the present invention, a plurality of actions can be generated from the pre-trained prediction network and the corresponding rewards obtained, and the Q value of each of the plurality of actions can be acquired while the prediction network and the action-generation network are updated simultaneously. The visual tracking problem is thereby modeled as a continuous, cumulative movement problem, which is more robust to appearance changes of the tracked target caused by complex backgrounds and deformation, and which alleviates, to a certain extent, the target drift caused by large deformation and fast motion.
In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," "circumferential," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and are therefore not to be considered limiting of the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "over," and "above" a second feature may be directly or diagonally above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature may be directly under or obliquely under the first feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (6)

1. A visual tracking method based on continuous movement under the guidance of deep reinforcement learning is characterized by comprising the following steps:
pre-training a prediction network;
generating a plurality of actions according to the prediction network and obtaining corresponding rewards; and
obtaining a Q value for each of a plurality of actions while updating a network that predicts and generates the actions;
wherein the generating a plurality of actions and obtaining corresponding rewards according to the prediction network further comprises: generating a continue action, a stop-and-update action, a stop-and-ignore action, and a restart action through a deep neural network; and obtaining the reward corresponding to each action according to the tracking effect;
wherein,
for the continue action, I_t^k is used as the input and f_{t-1}^* as the hidden-layer feature, and the position is adjusted by:
l_{t,k} = l_{t,k-1} + δ,
where l_{t,k} is the adjusted position, l_{t,k-1} is the position before the adjustment, t is the frame index, k is the number of iteration steps, and δ is the predicted offset;
for the stop-and-update action, the iteration is stopped and the feature of the target and the parameters of the prediction network are updated:
f_t^* = ρ·f_t + (1 − ρ)·f_{t-1}^*,
θ_t = θ_{t-1} + μ·∇_θ E[Q(s, a)],
where f_t^* is the updated target feature, ρ is a smoothing coefficient, f_t is the current feature, f_{t-1}^* is the feature at the previous moment, θ_t and θ_{t-1} are the prediction-network parameters after and before the update, μ is the learning rate, Q(s, a) is the Q function, s is the state, a is the action, δ is the offset, and E denotes the expectation;
for the stop-and-ignore action, processing of the next frame is started, using the target feature of the previous moment and the current parameters of the prediction network;
for the restart action, the initial box is resampled: candidate boxes are randomly sampled around the current object and the box with the highest Q value is selected as the new initial box:
l_{t,0} = argmax_l Q(s_l, a_update),
where the candidate boxes l are sampled around the initial position l_{t-1,0} and a_update denotes the stop-and-update action;
and the reward function for the continue action is:
r_{t,k} = 1 if ΔIoU > ε, 0 if |ΔIoU| ≤ ε, −1 if ΔIoU < −ε,
where r_{t,k} is the current reward value, ΔIoU is the change of the IoU, and ε is the threshold;
the reward function for the stop-and-update action and the stop-and-ignore action is:
r_{t,K_t} = g(IoU(l_t, g_t)),
where r_{t,K_t} is the reward value, K_t is the final number of iteration steps, g(·) is a function of the IoU, g_t is the true position, and l_t is the output position;
and the reward function for the restart action is likewise assigned according to the tracking effect.
2. The continuous-motion-based visual tracking method under deep reinforcement learning guidance according to claim 1, wherein the objective function of the prediction network is as follows:
Δx = (X_curr − X_prev) / W_prev,
Δy = (Y_curr − Y_prev) / H_prev,
Δw = log(W_curr / W_prev),
Δh = log(H_curr / H_prev),
where Δx and Δy are the scale-invariant translations of the target box between two frames, Δw and Δh are the width and height changes in logarithmic space, X_curr and X_prev are the x coordinates of the current-frame and previous-frame positions (Y_curr and Y_prev the corresponding y coordinates), W_prev and H_prev are the previous-frame width and height, and W_curr and H_curr are the current-frame width and height.
3. The continuous movement-based visual tracking method under deep reinforcement learning guidance according to claim 1,
the Q value of the continue action is calculated by the following formula:
Q(s, a | t, k) = r_{t,k} + γ·r_{t,k+1} + γ²·r_{t,k+2} + …,
where γ is a balance (discount) coefficient;
the Q values of the stop-and-update action, the stop-and-ignore action, and the restart action are calculated by the following formula:
Q(s, a | t, k) = r_{t,K_t} + γ·r_{t+1,K_{t+1}} + ….
4. A visual tracking device based on continuous movement under the guidance of deep reinforcement learning, characterized by comprising:
the pre-training module is used for pre-training the prediction network;
the generating module is used for generating a plurality of actions according to the prediction network and obtaining corresponding rewards; and
an acquisition module for acquiring a Q value for each of a plurality of actions while updating a network of predicted and generated actions;
wherein the generation module further comprises: a generating unit for generating a continue action, a stop-and-update action, a stop-and-ignore action, and a restart action through the deep neural network; and an acquisition unit for obtaining the reward corresponding to each action according to the tracking effect;
wherein,
for the continue action, I_t^k is used as the input and f_{t-1}^* as the hidden-layer feature, and the position is adjusted by:
l_{t,k} = l_{t,k-1} + δ,
where l_{t,k} is the adjusted position, l_{t,k-1} is the position before the adjustment, t is the frame index, k is the number of iteration steps, and δ is the predicted offset;
for the stop-and-update action, the iteration is stopped and the feature of the target and the parameters of the prediction network are updated:
f_t^* = ρ·f_t + (1 − ρ)·f_{t-1}^*,
θ_t = θ_{t-1} + μ·∇_θ E[Q(s, a)],
where f_t^* is the updated target feature, ρ is a smoothing coefficient, f_t is the current feature, f_{t-1}^* is the feature at the previous moment, θ_t and θ_{t-1} are the prediction-network parameters after and before the update, μ is the learning rate, Q(s, a) is the Q function, s is the state, a is the action, δ is the offset, and E denotes the expectation;
for the stop-and-ignore action, processing of the next frame is started, using the target feature of the previous moment and the current parameters of the prediction network;
for the restart action, the initial box is resampled: candidate boxes are randomly sampled around the current object and the box with the highest Q value is selected as the new initial box:
l_{t,0} = argmax_l Q(s_l, a_update),
where the candidate boxes l are sampled around the initial position l_{t-1,0} and a_update denotes the stop-and-update action;
and the reward function for the continue action is:
r_{t,k} = 1 if ΔIoU > ε, 0 if |ΔIoU| ≤ ε, −1 if ΔIoU < −ε,
where r_{t,k} is the current reward value, ΔIoU is the change of the IoU, and ε is the threshold;
the reward function for the stop-and-update action and the stop-and-ignore action is:
r_{t,K_t} = g(IoU(l_t, g_t)),
where r_{t,K_t} is the reward value, K_t is the final number of iteration steps, g(·) is a function of the IoU, g_t is the true position, and l_t is the output position;
and the reward function for the restart action is likewise assigned according to the tracking effect.
5. The device for continuous-motion-based visual tracking under deep reinforcement learning guidance according to claim 4, wherein the objective function of the prediction network is as follows:
Δx = (X_curr − X_prev) / W_prev,
Δy = (Y_curr − Y_prev) / H_prev,
Δw = log(W_curr / W_prev),
Δh = log(H_curr / H_prev),
where Δx and Δy are the scale-invariant translations of the target box between two frames, Δw and Δh are the width and height changes in logarithmic space, X_curr and X_prev are the x coordinates of the current-frame and previous-frame positions (Y_curr and Y_prev the corresponding y coordinates), W_prev and H_prev are the previous-frame width and height, and W_curr and H_curr are the current-frame width and height.
6. The device for continuous movement-based visual tracking under deep reinforcement learning guidance according to claim 4,
the Q value of the continue action is calculated by the following formula:
Q(s, a | t, k) = r_{t,k} + γ·r_{t,k+1} + γ²·r_{t,k+2} + …,
where γ is a balance (discount) coefficient;
the Q values of the stop-and-update action, the stop-and-ignore action, and the restart action are calculated by the following formula:
Q(s, a | t, k) = r_{t,K_t} + γ·r_{t+1,K_{t+1}} + ….
CN201810226092.3A 2018-03-19 2018-03-19 Continuous movement-based visual tracking method and device under deep reinforcement learning guidance Active CN108549928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810226092.3A CN108549928B (en) 2018-03-19 2018-03-19 Continuous movement-based visual tracking method and device under deep reinforcement learning guidance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810226092.3A CN108549928B (en) 2018-03-19 2018-03-19 Continuous movement-based visual tracking method and device under deep reinforcement learning guidance

Publications (2)

Publication Number Publication Date
CN108549928A CN108549928A (en) 2018-09-18
CN108549928B true CN108549928B (en) 2020-09-25

Family

ID=63516573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810226092.3A Active CN108549928B (en) 2018-03-19 2018-03-19 Continuous movement-based visual tracking method and device under deep reinforcement learning guidance

Country Status (1)

Country Link
CN (1) CN108549928B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084307B (en) * 2019-04-30 2021-06-18 东北大学 Mobile robot vision following method based on deep reinforcement learning
CN111048212B (en) * 2019-12-20 2023-04-18 华中科技大学 Network optimization method for tracking inclined-tip flexible needle path based on deep reinforcement learning
CN117409557B (en) * 2023-12-14 2024-02-20 成都格理特电子技术有限公司 Dynamic analysis-based high-temperature alarm method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106970615A (en) * 2017-03-21 2017-07-21 西北工业大学 A kind of real-time online paths planning method of deeply study
CN107066967A (en) * 2017-04-12 2017-08-18 清华大学 A kind of target-seeking method and device of active face using local observation information
CN107306207A (en) * 2017-05-31 2017-10-31 东南大学 Calculated and multiple target intensified learning service combining method with reference to Skyline
CN107450555A (en) * 2017-08-30 2017-12-08 唐开强 A kind of Hexapod Robot real-time gait planing method based on deeply study

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110279475A1 (en) * 2008-12-24 2011-11-17 Sony Computer Entertainment Inc. Image processing device and image processing method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106970615A (en) * 2017-03-21 2017-07-21 西北工业大学 A kind of real-time online paths planning method of deeply study
CN107066967A (en) * 2017-04-12 2017-08-18 清华大学 A kind of target-seeking method and device of active face using local observation information
CN107306207A (en) * 2017-05-31 2017-10-31 东南大学 Calculated and multiple target intensified learning service combining method with reference to Skyline
CN107450555A (en) * 2017-08-30 2017-12-08 唐开强 A kind of Hexapod Robot real-time gait planing method based on deeply study

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Action-decision networks for visual tracking with deep reinforcement learning; Sangdoo Yun et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017-07-26; Vol. 01; entire document *
Attention-Aware Deep Reinforcement Learning for Video Face Recognition; Yongming Rao et al.; The IEEE International Conference on Computer Vision (ICCV); 2017-10-29; entire document *
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation; Ross Girshick et al.; 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2014-06-28; Vol. 01; page 12, Appendix C *
Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning; Supancic III et al.; Computer Vision and Pattern Recognition (cs.CV); 2017-07-17; page 2, Section 1, Fig. 2 *
A Survey of Vision-Based Object Detection and Tracking; Yin Hongpeng et al.; Acta Automatica Sinica; 2016-10-20; Vol. 42, No. 10; entire document *
Robust Visual Tracking via Fast Deep Learning; Dai Bo et al.; Journal of Image and Graphics; 2016-12-11; Vol. 21, No. 12; entire document *

Also Published As

Publication number Publication date
CN108549928A (en) 2018-09-18

Similar Documents

Publication Publication Date Title
US10860926B2 (en) Meta-gradient updates for training return functions for reinforcement learning systems
CN108549928B (en) Continuous movement-based visual tracking method and device under deep reinforcement learning guidance
CN111141300A (en) Intelligent mobile platform map-free autonomous navigation method based on deep reinforcement learning
CN108446619B (en) Face key point detection method and device based on deep reinforcement learning
CN110473231B (en) Target tracking method of twin full convolution network with prejudging type learning updating strategy
CN109711401B (en) Text detection method in natural scene image based on Faster Rcnn
CN116776964A (en) Method, program product and storage medium for distributed reinforcement learning
CN111161412B (en) Three-dimensional laser mapping method and system
CN104794733A (en) Object tracking method and device
CN109447133B (en) SVR algorithm-based method for eliminating position information outliers
CN110706252B (en) Robot nuclear correlation filtering tracking algorithm under guidance of motion model
CN113168566A (en) Controlling a robot by using entropy constraints
KR20220137732A (en) Reinforcement Learning with Adaptive Return Calculation
WO2021152515A1 (en) Planning for agent control using learned hidden states
CN112507943B (en) Visual positioning navigation method, system and medium based on multitasking neural network
CN109299669B (en) Video face key point detection method and device based on double intelligent agents
CN107657627B (en) Space-time context target tracking method based on human brain memory mechanism
CN113468706A (en) Laser point cloud power transmission line lead fitting method for distribution network live working robot
CN112388628A (en) Apparatus and method for training a gaussian process regression model
CN114608585A (en) Method and device for synchronous positioning and mapping of mobile robot
CN113628246B (en) Twin network target tracking method based on 3D convolution template updating
CN112067007B (en) Map generation method, computer storage medium, and electronic device
CN117372536A (en) Laser radar and camera calibration method, system, equipment and storage medium
US20230090127A1 (en) Device and method for controlling an agent
CN114612518A (en) Twin network target tracking method based on historical track information and fine-grained matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant