CN108549928B

CN108549928B - Continuous movement-based visual tracking method and device under deep reinforcement learning guidance

Info

Publication number: CN108549928B
Application number: CN201810226092.3A
Authority: CN
Inventors: 鲁继文; 周杰; 任亮亮; 袁鑫
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2018-03-19
Filing date: 2018-03-19
Publication date: 2020-09-25
Anticipated expiration: 2038-03-19
Also published as: CN108549928A

Abstract

The invention discloses a visual tracking method and a visual tracking device based on continuous movement under the guidance of deep reinforcement learning, wherein the method comprises the following steps: pre-training a prediction network; generating a plurality of actions according to the prediction network and obtaining corresponding rewards; a Q value for each of a plurality of actions is obtained while updating a network that predicts and generates the actions. The method can continuously and cumulatively adjust the target frame of the object and dynamically adjust the appearance characteristics and the model of the target object, thereby greatly improving the robustness.

Description

Continuous movement-based visual tracking method and device under deep reinforcement learning guidance

Technical Field

The invention relates to the technical field of visual tracking, in particular to a visual tracking method and device based on continuous movement under the guidance of deep reinforcement learning.

Background

Visual object tracking is a fundamental problem in computer vision, and is widely applied in the fields of visual monitoring, robot control, human-computer interaction, advanced auxiliary driving systems and the like. A number of visual tracking methods have been proposed over the past decades, but the problems of visual tracking remain very challenging in unrestricted natural environments due to deformations, sudden movements, occlusion, and illumination changes.

The purpose of the visual tracking problem is to determine the position of objects in the video based on the object information of the first frame only. The visual tracking methods with the best effect at present are mainly divided into two types: a correlation filtering based method and a deep learning based method. Wherein, the correlation filter is designed based on the correlation filtering method to generate the peak value of the correlation filtering of the target object in each frame. This method does not require multiple samples of the appearance of the object. Based on the basic framework MOSSE (Minimum Output Sum of squared Error filter) method, a number of methods such as CFTs and DSST (discrete Scale Space Tracker, DSST algorithm) are proposed to apply color attributes and solve Scale problems. The method tracking method based on deep learning adopts a deep convolutional neural network as a classifier, and selects the most probable position from a plurality of candidate boxes. Representative deep learning-based methods MDNet, FCNT, and STCT all employ inefficient search techniques like sliding windows and iterative sampling. In recent years, some visual tracking methods for making decisions by reinforcement learning have been proposed. For example, ADNet adopts the policy gradient method to decide the size and displacement of the target object. However, most of the existing methods update the depth model on line through sampling, and are easily affected by large deformation and sudden motion, so that the accuracy is reduced, and the problem needs to be solved.

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, an object of the present invention is to provide a continuous-motion-based visual tracking method under deep reinforcement learning guidance, which can greatly improve robustness.

The invention also aims to provide a visual tracking device based on continuous movement under the guidance of deep reinforcement learning.

In order to achieve the above object, an embodiment of the present invention provides a continuous-motion-based visual tracking method under deep reinforcement learning guidance, including the following steps: pre-training a prediction network; generating a plurality of actions according to the prediction network and obtaining corresponding rewards; a Q value for each of a plurality of actions is obtained while updating a network that predicts and generates the actions.

According to the visual tracking method based on continuous movement under the guidance of the deep reinforcement learning, the multiple actions can be generated according to the pre-training prediction network, the corresponding rewards are obtained, the Q value of each action of the multiple actions is obtained, the network of the predicted and generated actions is updated simultaneously, the visual tracking problem is modeled into a continuous and cumulative movement problem, the robustness on the appearance change of the tracked target caused by complex background and deformation is higher, and the target drift caused by large deformation and quick movement is relieved to a certain extent.

In addition, the visual tracking method based on continuous movement under the guidance of deep reinforcement learning according to the above embodiment of the present invention may further have the following additional technical features:

further, in one embodiment of the present invention, the objective function of the prediction network is:

wherein, Δ X and Δ y are both scale-invariant transformations of the target frame between two frames, Δ w and Δ h are both width and height variations in logarithmic space, X_currCurrent frame position X coordinate, X_prevIs the x coordinate, W, of the previous frame position_prevIs the current frame width, H_prevFor the current frame height, W_currIs the next frame width, W_prevIs the next frame high.

Further, in an embodiment of the present invention, the generating a plurality of actions and obtaining corresponding rewards according to the prediction network further includes: generating continuous action, stopping and updating action, stopping and ignoring action and restarting action through a deep neural network; and obtaining the corresponding reward of each action according to the tracking effect.

Further, in one embodiment of the present invention, wherein,

for the continuation action, use I_t ^kAs an input, f_t-1 ^*As hidden layer characteristics:

l_t,k＝l_t,k-1+，

wherein l_t,kFor adjusted position, /)_t,k-1Is the initial position, t is time, k is the number of iteration steps, is the offset;

for the stop-and-update action, stopping iteration and updating the characteristics of the target and the parameters of the predicted network:

wherein f is_t ^*Is characterized by ρ being a smoothing coefficient, f_tIs characterized in that the method is characterized in that,

characteristic of the last moment, θ_tAs a network parameter, θ_t-1Is a network parameter, mu is a learning speed,

q (s, a,) is the Q function, s is the turntable, a is the motion, offset, as desired;

for the stop-and-ignore action, starting to update a next frame and using the target feature at the previous moment and the parameters of the prediction network;

for the restart action, resampling the initial box, wherein the box with the highest Q value is selected as a new initial box by randomly sampling around the current object:

wherein l_t-1,0The initial position is stop, update is update;

and, the reward function for the continuation action is:

wherein r is_t,kΔ IoU is the amount of change IoU for the current prize value, which is the threshold;

the reward functions for the stop-and-update action and the stop-and-ignore action are:

wherein r is_t,K_tFor the prize value, K_tFor the final number of iteration steps, g is a function of IoU,

as true position,/_tIs the output position;

the reward function for the resume action is:

further, in one embodiment of the present invention, the Q value of the continuation action is calculated by the following formula:

wherein γ is an equilibrium coefficient.

The Q value calculation formulas of the stop-and-update action, the stop-and-ignore action, and the restart action are as follows:

Q(s,a|_t,k)＝r_t,K_t+γr_t+1,k_t+1+…。

in order to achieve the above object, another embodiment of the present invention provides a continuous-motion-based visual tracking apparatus under deep reinforcement learning guidance, including: the pre-training module is used for pre-training the prediction network; the generating module is used for generating a plurality of actions according to the prediction network and obtaining corresponding rewards; an acquisition module to acquire a Q value for each of the plurality of actions while updating the network of predicted and generated actions.

The visual tracking device based on continuous movement under the guidance of the deep reinforcement learning of the embodiment of the invention can generate a plurality of actions and obtain corresponding rewards according to the pre-training prediction network, and obtain the Q value of each action of the plurality of actions, thereby updating the network of the predicted and generated actions simultaneously, modeling the visual tracking problem as a continuous and cumulative movement problem, having stronger robustness to the appearance change of the tracked target caused by complex background and deformation, and relieving the target drift caused by large deformation and quick movement to a certain extent.

In addition, the visual tracking device based on continuous movement under the guidance of deep reinforcement learning according to the above embodiment of the present invention may further have the following additional technical features:

Further, in an embodiment of the present invention, the generating module further includes: the generating unit is used for generating continuous action, stopping and updating action, stopping and ignoring action and restarting action through the deep neural network; and the acquisition unit is used for obtaining the reward corresponding to each action according to the tracking effect.

Further, in one embodiment of the present invention, wherein,

l_t,k＝l_t,k-1+，

wherein l_t-1,0The initial position is stop, update is update;

and, the reward function for the continuation action is:

as true position,/_tIs the output position;

the reward function for the resume action is:

further, in one embodiment of the present invention,

the Q value calculation formula of the continuous action is as follows:

wherein γ is an equilibrium coefficient.

Q(s,a|_t,k)＝r_t,K_t+γr_t+1,k_t+1+…。

additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flow chart of a continuous motion based visual tracking method under deep reinforcement learning guidance according to an embodiment of the present invention;

FIG. 2 is a flow diagram of a method for continuous motion based visual tracking guided by deep reinforcement learning according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of optimizing an evaluation network and generating an action network according to one embodiment of the invention;

fig. 4 is a schematic structural diagram of a continuous-motion-based visual tracking apparatus under deep reinforcement learning guidance according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.

The following describes a visual tracking method and apparatus based on continuous movement under deep reinforcement learning guidance according to an embodiment of the present invention with reference to the accompanying drawings, and first, a visual tracking method based on continuous movement under deep reinforcement learning guidance according to an embodiment of the present invention will be described with reference to the accompanying drawings.

FIG. 1 is a flowchart of a continuous movement-based visual tracking method under deep reinforcement learning guidance according to an embodiment of the present invention.

As shown in fig. 1, the continuous movement-based visual tracking method under the guidance of deep reinforcement learning includes the following steps:

in step S101, the prediction network is pre-trained.

In one embodiment of the present invention, the objective function of the prediction network is:

It will be appreciated that, given the initial box of the t-th frame, shown in connection with fig. 1 and 2, the depth features are first extracted for that location and the current feature is combined with the features of the target. Four actions (continue, stop and update, stop and ignore, and restart) are then generated using one prediction network and the action generation network to adjust the position and shape of the target box. For action "continue", continuously adjusting the position of the target frame; for the action "stop and update", stopping the iteration and updating the features of the target and the model parameters of the prediction network; for the action "stop and ignore", a step of skipping the update is performed; for the action "restart," the target may have been lost and the initial box needs to be resampled. Finally, the deep evaluation network is used to estimate the Q value of the current action and to update the parameters of the prediction network and the generation action network.

Specifically, the embodiment of the present invention may pre-train the prediction network, and the target function of the prediction network is defined as follows:

where Δ x and Δ y in the formula represent scale-invariant transformations of the target box between two frames, and Δ w and Δ h represent wide and high variations in log space. As shown in fig. 1, the prediction network of the embodiment of the present invention uses three convolutional layers to extract features of the target and candidate regions, and then the features are concatenated and input to two fully-connected layers to generate parameters for estimating the position and scale change. Therefore, through the objective function, the embodiment of the invention can train an end-to-end deep neural network to directly predict the change of the position and the shape.

In step S102, a plurality of actions are generated according to the prediction network and corresponding rewards are obtained.

Further, in an embodiment of the present invention, generating a plurality of actions and obtaining corresponding rewards according to the prediction network further comprises: generating continuous action, stopping and updating action, stopping and ignoring action and restarting action through a deep neural network; and obtaining the corresponding reward of each action according to the tracking effect.

Further, in one embodiment of the present invention, wherein,

l_t,k＝l_t,k-1+，

for stop and ignore actions, start updating the next frame and use the target feature at the last moment and the parameters of the prediction network;

for the restart action, the initial box is resampled, wherein the box with the highest Q value is selected as the new initial box by randomly sampling around the current object:

wherein l_t-1,0The initial position is stop, update is update;

and, the reward function for the continuation action is:

as true position,/_tIs the output position;

the reward function for the resume action is:

it will be appreciated that embodiments of the invention may generate a series of actions and receive corresponding rewards based on the predicted network. Embodiments of the invention may use a deep neural network to generate four actions: continue, stop and update, stop and ignore, and restart. For action "continue", embodiments of the invention may use I_t ^kAs an input, f_t-1 ^*As hidden layer characteristics:

l_t,k＝l_t,k-1+，

for the action "stop and update", embodiments of the invention may stop the iteration and update the features of the target and the parameters of the prediction network:

for the action "stop and ignore", the next frame is started to be updated and the target feature at the last moment and the parameters of the predicted network are used. For the action "restart," embodiments of the present invention resample the initial box, by randomly sampling around the current object, selecting the box with the highest Q value as the new initial box:

the embodiment of the invention also defines corresponding rewards for each action according to the tracking effect. For the action "continue", the reward function is defined as:

wherein, Delta_IoUThe following formula:

for the action "stop and update" and the action "stop and ignore", the reward function may be defined as:

for the action "resume," the reward function may be defined as:

in step S103, the Q value of each of the plurality of actions is acquired, while updating the network of predicted and generated actions.

In one embodiment of the present invention, the Q value calculation formula of the continuous action is:

wherein γ is an equilibrium coefficient.

Q(s,a|_t,k)＝r_t,K_t+γr_t+1,k_t+1+…。

it will be appreciated that embodiments of the present invention, as shown in FIG. 3, can calculate the Q value for each action and optimize the evaluation network φ^-And generating a network of actions theta, embodiments of the present invention may use a deep evaluation network to predict the Q value of a current action and update model parameters of the predicted network. For the action "continue", the Q value is calculated as follows:

for the other three actions "continue", "stop and update", "stop and ignore, and restart", the Q value is calculated as follows:

Q(s,a|_t,k)＝r_t,K_t+γr_t+1,k_t+1+…，

therefore, the embodiment of the present invention can express the optimization problem of the evaluation network as the following formula:

wherein phi is^-Representing the target network, which has the same structure as phi but is in a slowly updated state.

The embodiment of the invention updates and evaluates the parameters of the network according to the following formula:

the embodiment of the invention can express the optimization problem of generating the action network as the following formula:

wherein the content of the first and second substances,

finally, the parameter θ generating the action network is updated as follows:

according to the visual tracking method based on continuous movement under the guidance of the deep reinforcement learning, which is provided by the embodiment of the invention, a plurality of actions can be generated according to a pre-training prediction network, corresponding rewards are obtained, and the Q value of each action of the plurality of actions is obtained, so that the network of the predicted and generated actions is updated simultaneously, the visual tracking problem is modeled into a continuous and cumulative movement problem, the robustness on the appearance change of a tracked target caused by complex background and deformation is stronger, and the target drift caused by large deformation and quick movement is relieved to a certain extent.

Next, a visual tracking apparatus based on continuous movement under guidance of deep reinforcement learning according to an embodiment of the present invention will be described with reference to the drawings.

Fig. 4 is a schematic structural diagram of a continuous movement-based visual tracking apparatus under the guidance of deep reinforcement learning according to an embodiment of the present invention.

As shown in fig. 4, the continuous-movement-based visual tracking apparatus 10 under the guidance of deep reinforcement learning includes: a pre-training module 100, a generation module 200 and an acquisition module 300.

The pre-training module 100 is used for pre-training the prediction network. The generating module 200 is used for generating a plurality of actions according to the prediction network and obtaining corresponding rewards. The acquisition module 300 is used to acquire the Q value for each of a plurality of actions while updating the network of predicted and generated actions. The apparatus 10 of the embodiment of the present invention can continuously and cumulatively adjust the target frame of the object, and dynamically adjust the appearance and model of the target object, thereby greatly improving the robustness.

Further, in one embodiment of the present invention, the objective function of the predicted network is:

Further, in an embodiment of the present invention, the generating module 200 further includes: the device comprises a generating unit and an acquiring unit. The generating unit is used for generating continuous action, stopping and updating action, stopping and ignoring action and restarting action through the deep neural network. The acquisition unit is used for obtaining the reward corresponding to each action according to the tracking effect.

Further, in one embodiment of the present invention, wherein,

l_t,k＝l_t,k-1+，

wherein l_t-1,0The initial position is stop, update is update;

and, the reward function for the continuation action is:

as true position,/_tIs the output position;

the reward function for the resume action is:

further, in one embodiment of the present invention,

the Q value calculation formula of the continuous action is as follows:

wherein γ is an equilibrium coefficient.

Q(s,a|_t,k)＝r_t,K_t+γr_t+1,k_t+1+…。

it should be noted that the foregoing explanation of the embodiment of the visual tracking method based on continuous movement under the guidance of the deep reinforcement learning is also applicable to the visual tracking apparatus based on continuous movement under the guidance of the deep reinforcement learning of the embodiment, and details are not repeated here.

According to the visual tracking device based on continuous movement under the guidance of the deep reinforcement learning, which is provided by the embodiment of the invention, a plurality of actions can be generated according to a pre-training prediction network, corresponding rewards are obtained, and the Q value of each action of the plurality of actions is obtained, so that the network of the predicted and generated actions is updated simultaneously, the visual tracking problem is modeled into a continuous and cumulative movement problem, the robustness on the appearance change of a tracked target caused by complex background and deformation is stronger, and the target drift caused by large deformation and quick movement is relieved to a certain extent.

In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," "circumferential," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and are therefore not to be considered limiting of the invention.

Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.

In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "over," and "above" a second feature may be directly or diagonally above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature may be directly under or obliquely under the first feature, or may simply mean that the first feature is at a lesser elevation than the second feature.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A visual tracking method based on continuous movement under the guidance of deep reinforcement learning is characterized by comprising the following steps:

pre-training a prediction network;

generating a plurality of actions according to the prediction network and obtaining corresponding rewards; and

obtaining a Q value for each of a plurality of actions while updating a network that predicts and generates the actions;

wherein the generating a plurality of actions and deriving corresponding rewards in accordance with the prediction network further comprises: generating continuous action, stopping and updating action, stopping and ignoring action and restarting action through a deep neural network; obtaining the corresponding reward of each action according to the tracking effect;

wherein the content of the first and second substances,

l_t,k＝l_t,k-1+，

wherein l_t-1,0The initial position is stop, update is update;

and, the reward function for the continuation action is:

as true position,/_tIs the output position;

the reward function for the resume action is:

2. the continuous-motion-based visual tracking method under deep reinforcement learning guidance according to claim 1, wherein the objective function of the prediction network is as follows:

wherein, Δ x and Δ y are both scale-invariant transformations of the target frame between two frames, and Δ w and Δ h are both width and height variations in logarithmic space，X_currCurrent frame position X coordinate, X_prevIs the x coordinate, W, of the previous frame position_prevIs the current frame width, H_prevFor the current frame height, W_currIs the next frame width, W_prevIs the next frame high.

3. The continuous movement-based visual tracking method under deep reinforcement learning guidance according to claim 1,

the Q value calculation formula of the continuous action is as follows:

wherein γ is the equilibrium coefficient;

the Q value calculation formula for stopping and updating actions, stopping and ignoring, and restarting actions is:

Q(s,a|_t,k)＝r_t,K_t+γr_t+1,k_t+1+…。

4. a visual tracking device based on continuous movement under the guidance of deep reinforcement learning, which is characterized by comprising:

the pre-training module is used for pre-training the prediction network;

the generating module is used for generating a plurality of actions according to the prediction network and obtaining corresponding rewards; and

an acquisition module for acquiring a Q value for each of a plurality of actions while updating a network of predicted and generated actions;

wherein the generation module further comprises: the generating unit is used for generating continuous action, stopping and updating action, stopping and ignoring action and restarting action through the deep neural network; the acquisition unit is used for obtaining the reward corresponding to each action according to the tracking effect;

wherein the content of the first and second substances,

l_t,k＝l_t,k-1+，

wherein l_t-1,0The initial position is stop, update is update;

and, the reward function for the continuation action is:

as true position,/_tIs the output position;

the reward function for the resume action is:

5. the device for continuous-motion-based visual tracking under deep reinforcement learning guidance according to claim 4, wherein the objective function of the prediction network is as follows:

6. The device for continuous movement-based visual tracking under deep reinforcement learning guidance according to claim 4,

the Q value calculation formula of the continuous action is as follows:

wherein γ is the equilibrium coefficient;

Q(s,a|_t,k)＝r_t,K_t+γr_t+1,k_t+1+…。