CN108549928B - Continuous movement-based visual tracking method and device under deep reinforcement learning guidance - Google Patents
- Publication number
- CN108549928B (application CN201810226092.3A)
- Authority
- CN
- China
- Prior art keywords
- action
- network
- actions
- stop
- update
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/277—Analysis of motion involving stochastic approaches, e.g. using Kalman filters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a continuous-movement-based visual tracking method and device under the guidance of deep reinforcement learning, wherein the method comprises the following steps: pre-training a prediction network; generating a plurality of actions according to the prediction network and obtaining corresponding rewards; and obtaining a Q value for each of the plurality of actions while updating the networks that predict and generate the actions. The method can continuously and cumulatively adjust the target box of the object and dynamically adjust the appearance features and model of the target object, thereby greatly improving robustness.
Description
Technical Field
The invention relates to the technical field of visual tracking, in particular to a visual tracking method and device based on continuous movement under the guidance of deep reinforcement learning.
Background
Visual object tracking is a fundamental problem in computer vision, and is widely applied in the fields of visual monitoring, robot control, human-computer interaction, advanced auxiliary driving systems and the like. A number of visual tracking methods have been proposed over the past decades, but the problems of visual tracking remain very challenging in unrestricted natural environments due to deformations, sudden movements, occlusion, and illumination changes.
The purpose of the visual tracking problem is to determine the position of an object throughout a video based only on the object information in the first frame. The best-performing visual tracking methods at present fall mainly into two categories: correlation-filtering-based methods and deep-learning-based methods. Correlation-filtering-based methods design a correlation filter that produces a peak response at the target object in each frame, and they do not require multiple samples of the object's appearance. Building on the basic MOSSE (Minimum Output Sum of Squared Error) framework, a number of methods such as CFTs and DSST (Discriminative Scale Space Tracker) have been proposed to exploit color attributes and address the scale problem. Deep-learning-based tracking methods adopt a deep convolutional neural network as a classifier and select the most probable position from a number of candidate boxes. Representative deep-learning-based methods such as MDNet, FCNT, and STCT all employ inefficient search techniques like sliding windows and iterative sampling. In recent years, some visual tracking methods that make decisions by reinforcement learning have been proposed. For example, ADNet adopts the policy gradient method to decide the size and displacement of the target object. However, most existing methods update the deep model online through sampling and are easily affected by large deformation and sudden motion, which reduces accuracy; this problem needs to be solved.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present invention is to provide a continuous-motion-based visual tracking method under deep reinforcement learning guidance, which can greatly improve robustness.
The invention also aims to provide a visual tracking device based on continuous movement under the guidance of deep reinforcement learning.
In order to achieve the above object, an embodiment of the present invention provides a continuous-movement-based visual tracking method under deep reinforcement learning guidance, including the following steps: pre-training a prediction network; generating a plurality of actions according to the prediction network and obtaining corresponding rewards; and obtaining a Q value for each of the plurality of actions while updating the networks that predict and generate the actions.
According to the visual tracking method based on continuous movement under the guidance of deep reinforcement learning of the embodiment of the invention, a plurality of actions can be generated according to the pre-trained prediction network and the corresponding rewards obtained, and the Q value of each of the plurality of actions can be obtained while the networks that predict and generate actions are updated simultaneously. The visual tracking problem is thereby modeled as a continuous and cumulative movement problem, which is more robust to appearance changes of the tracked target caused by complex backgrounds and deformation, and alleviates to a certain extent the target drift caused by large deformation and fast movement.
In addition, the visual tracking method based on continuous movement under the guidance of deep reinforcement learning according to the above embodiment of the present invention may further have the following additional technical features:
Further, in one embodiment of the present invention, the objective function of the prediction network is:
Δx = (x_curr - x_prev) / W_prev, Δy = (y_curr - y_prev) / H_prev,
Δw = log(W_curr / W_prev), Δh = log(H_curr / H_prev),
wherein Δx and Δy are the scale-invariant translations of the target box between two frames, Δw and Δh are the width and height changes in logarithmic space, x_curr and x_prev are the x coordinates of the current-frame and previous-frame positions, W_prev and H_prev are the previous-frame width and height, and W_curr and H_curr are the current-frame width and height.
Further, in an embodiment of the present invention, the generating a plurality of actions and obtaining corresponding rewards according to the prediction network further includes: generating a continue action, a stop-and-update action, a stop-and-ignore action, and a restart action through a deep neural network; and obtaining the reward corresponding to each action according to the tracking effect.
Further, in one embodiment of the present invention, wherein,
for the continue action, I_t^k is used as the input and f*_{t-1} as the hidden-layer feature:
l_{t,k} = l_{t,k-1} + δ_{t,k},
wherein l_{t,k} is the adjusted position, l_{t,k-1} is the position before adjustment, t is the frame index, k is the iteration step, and δ_{t,k} is the predicted offset;
for the stop-and-update action, the iteration is stopped and the features of the target and the parameters of the prediction network are updated:
f*_t = ρ·f*_{t-1} + (1 - ρ)·f_t,
θ_t = θ_{t-1} + μ·∇_θ Q(s, a; θ),
wherein f*_t is the updated target feature, ρ is a smoothing coefficient, f_t is the feature extracted from the current frame, f*_{t-1} is the target feature at the previous moment, θ_t and θ_{t-1} are the network parameters after and before the update, μ is the learning rate, Q(s, a; θ) is the Q function, s is the state, and a is the action;
for the stop-and-ignore action, processing proceeds to the next frame, reusing the target feature of the previous moment and the parameters of the prediction network;
for the restart action, the initial box is resampled, wherein boxes are randomly sampled around the current object and the box with the highest Q value is selected as the new initial box, l_{t-1,0} denoting the initial position;
and, the reward function of the continue action is:
wherein r_{t,k} is the current reward value, ΔIoU is the change in IoU between consecutive steps, and τ is the threshold;
the reward functions of the stop-and-update action and the stop-and-ignore action are:
wherein r_{t,K_t} is the reward value, K_t is the final number of iteration steps, g is a function of IoU, l*_t is the ground-truth position, and l_t is the output position;
the reward function of the restart action is:
Further, in one embodiment of the present invention, the Q value of the continue action is calculated by the following formula:
wherein γ is the balance (discount) coefficient.
The Q value calculation formula of the stop-and-update action, the stop-and-ignore action, and the restart action is:
Q(s, a | t, k) = r_{t,K_t} + γ·r_{t+1,K_{t+1}} + ….
In order to achieve the above object, another embodiment of the present invention provides a continuous-movement-based visual tracking apparatus under deep reinforcement learning guidance, including: a pre-training module for pre-training the prediction network; a generating module for generating a plurality of actions according to the prediction network and obtaining corresponding rewards; and an acquisition module for acquiring a Q value for each of the plurality of actions while updating the networks that predict and generate the actions.
According to the visual tracking device based on continuous movement under the guidance of deep reinforcement learning of the embodiment of the invention, a plurality of actions can be generated according to the pre-trained prediction network and the corresponding rewards obtained, and the Q value of each of the plurality of actions can be obtained while the networks that predict and generate actions are updated simultaneously. The visual tracking problem is thereby modeled as a continuous and cumulative movement problem, which is more robust to appearance changes of the tracked target caused by complex backgrounds and deformation, and alleviates to a certain extent the target drift caused by large deformation and fast movement.
In addition, the visual tracking device based on continuous movement under the guidance of deep reinforcement learning according to the above embodiment of the present invention may further have the following additional technical features:
further, in one embodiment of the present invention, the objective function of the prediction network is:
wherein, Δ X and Δ y are both scale-invariant transformations of the target frame between two frames, Δ w and Δ h are both width and height variations in logarithmic space, XcurrCurrent frame position X coordinate, XprevIs the x coordinate, W, of the previous frame positionprevIs the current frame width, HprevFor the current frame height, WcurrIs the next frame width, WprevIs the next frame high.
Further, in an embodiment of the present invention, the generating module further includes: a generating unit for generating the continue action, the stop-and-update action, the stop-and-ignore action, and the restart action through the deep neural network; and an acquiring unit for obtaining the reward corresponding to each action according to the tracking effect.
Further, in one embodiment of the present invention, wherein,
for the continue action, I_t^k is used as the input and f*_{t-1} as the hidden-layer feature:
l_{t,k} = l_{t,k-1} + δ_{t,k},
wherein l_{t,k} is the adjusted position, l_{t,k-1} is the position before adjustment, t is the frame index, k is the iteration step, and δ_{t,k} is the predicted offset;
for the stop-and-update action, the iteration is stopped and the features of the target and the parameters of the prediction network are updated:
f*_t = ρ·f*_{t-1} + (1 - ρ)·f_t,
θ_t = θ_{t-1} + μ·∇_θ Q(s, a; θ),
wherein f*_t is the updated target feature, ρ is a smoothing coefficient, f_t is the feature extracted from the current frame, f*_{t-1} is the target feature at the previous moment, θ_t and θ_{t-1} are the network parameters after and before the update, μ is the learning rate, Q(s, a; θ) is the Q function, s is the state, and a is the action;
for the stop-and-ignore action, processing proceeds to the next frame, reusing the target feature of the previous moment and the parameters of the prediction network;
for the restart action, the initial box is resampled, wherein boxes are randomly sampled around the current object and the box with the highest Q value is selected as the new initial box, l_{t-1,0} denoting the initial position;
and, the reward function of the continue action is:
wherein r_{t,k} is the current reward value, ΔIoU is the change in IoU between consecutive steps, and τ is the threshold;
the reward functions of the stop-and-update action and the stop-and-ignore action are:
wherein r_{t,K_t} is the reward value, K_t is the final number of iteration steps, g is a function of IoU, l*_t is the ground-truth position, and l_t is the output position;
the reward function of the restart action is:
Further, in one embodiment of the present invention,
the Q value calculation formula of the continue action is:
wherein γ is the balance (discount) coefficient.
The Q value calculation formula of the stop-and-update action, the stop-and-ignore action, and the restart action is:
Q(s, a | t, k) = r_{t,K_t} + γ·r_{t+1,K_{t+1}} + ….
additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a continuous motion based visual tracking method under deep reinforcement learning guidance according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a method for continuous motion based visual tracking guided by deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of optimizing an evaluation network and generating an action network according to one embodiment of the invention;
fig. 4 is a schematic structural diagram of a continuous-motion-based visual tracking apparatus under deep reinforcement learning guidance according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a visual tracking method and apparatus based on continuous movement under deep reinforcement learning guidance according to an embodiment of the present invention with reference to the accompanying drawings, and first, a visual tracking method based on continuous movement under deep reinforcement learning guidance according to an embodiment of the present invention will be described with reference to the accompanying drawings.
FIG. 1 is a flowchart of a continuous movement-based visual tracking method under deep reinforcement learning guidance according to an embodiment of the present invention.
As shown in fig. 1, the continuous movement-based visual tracking method under the guidance of deep reinforcement learning includes the following steps:
in step S101, the prediction network is pre-trained.
In one embodiment of the present invention, the objective function of the prediction network is:
Δx = (x_curr - x_prev) / W_prev, Δy = (y_curr - y_prev) / H_prev,
Δw = log(W_curr / W_prev), Δh = log(H_curr / H_prev),
wherein Δx and Δy are the scale-invariant translations of the target box between two frames, Δw and Δh are the width and height changes in logarithmic space, x_curr and x_prev are the x coordinates of the current-frame and previous-frame positions, W_prev and H_prev are the previous-frame width and height, and W_curr and H_curr are the current-frame width and height.
It will be appreciated that, given the initial box of the t-th frame, shown in connection with fig. 1 and 2, the depth features are first extracted for that location and the current feature is combined with the features of the target. Four actions (continue, stop and update, stop and ignore, and restart) are then generated using one prediction network and the action generation network to adjust the position and shape of the target box. For action "continue", continuously adjusting the position of the target frame; for the action "stop and update", stopping the iteration and updating the features of the target and the model parameters of the prediction network; for the action "stop and ignore", a step of skipping the update is performed; for the action "restart," the target may have been lost and the initial box needs to be resampled. Finally, the deep evaluation network is used to estimate the Q value of the current action and to update the parameters of the prediction network and the generation action network.
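The per-frame control flow described above (iteratively adjust the box, then stop, ignore, or restart) can be sketched in a few lines of Python. `track_frame`, `choose_action`, and `predict_offset` are hypothetical names standing in for the action-generation and prediction networks; this is an illustrative sketch, not the patent's implementation.

```python
# Minimal sketch of the four-action tracking loop. choose_action stands in
# for the action-generation network; predict_offset for the prediction network.
def track_frame(box, choose_action, predict_offset, max_steps=10):
    """Iteratively adjust an (x, y, w, h) box until a stopping action fires."""
    for _ in range(max_steps):
        action = choose_action(box)
        if action == "continue":
            dx, dy = predict_offset(box)            # cumulative adjustment
            box = (box[0] + dx, box[1] + dy, box[2], box[3])
        else:
            # "stop and update", "stop and ignore", or "restart":
            # hand control back to the caller for the next frame.
            return box, action
    return box, "stop and ignore"
```

The returned action tells the caller whether to update the target template, skip the update, or resample the initial box.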
Specifically, the embodiment of the present invention may pre-train the prediction network, with the objective function of the prediction network defined as follows:
Δx = (x_curr - x_prev) / W_prev, Δy = (y_curr - y_prev) / H_prev,
Δw = log(W_curr / W_prev), Δh = log(H_curr / H_prev),
where Δx and Δy represent the scale-invariant translations of the target box between two frames, and Δw and Δh represent the width and height changes in log space. As shown in fig. 1, the prediction network of the embodiment of the present invention uses three convolutional layers to extract features of the target and candidate regions; these features are then concatenated and fed into two fully connected layers to estimate the position and scale changes. Through this objective function, the embodiment of the invention can train an end-to-end deep neural network to directly predict the changes in position and shape.
In step S102, a plurality of actions are generated according to the prediction network and corresponding rewards are obtained.
Further, in an embodiment of the present invention, generating a plurality of actions and obtaining corresponding rewards according to the prediction network further comprises: generating a continue action, a stop-and-update action, a stop-and-ignore action, and a restart action through a deep neural network; and obtaining the reward corresponding to each action according to the tracking effect.
Further, in one embodiment of the present invention, wherein,
for the continue action, I_t^k is used as the input and f*_{t-1} as the hidden-layer feature:
l_{t,k} = l_{t,k-1} + δ_{t,k},
wherein l_{t,k} is the adjusted position, l_{t,k-1} is the position before adjustment, t is the frame index, k is the iteration step, and δ_{t,k} is the predicted offset;
for the stop-and-update action, the iteration is stopped and the features of the target and the parameters of the prediction network are updated:
f*_t = ρ·f*_{t-1} + (1 - ρ)·f_t,
θ_t = θ_{t-1} + μ·∇_θ Q(s, a; θ),
wherein f*_t is the updated target feature, ρ is a smoothing coefficient, f_t is the feature extracted from the current frame, f*_{t-1} is the target feature at the previous moment, θ_t and θ_{t-1} are the network parameters after and before the update, μ is the learning rate, Q(s, a; θ) is the Q function, s is the state, and a is the action;
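The smoothed template update for "stop and update" amounts to an exponential moving average with smoothing coefficient ρ. A minimal sketch follows; the exact blending direction is our assumption, inferred from ρ being described as a smoothing coefficient:

```python
# Sketch of the target-template update: an exponential moving average that
# blends the stored feature with the newly extracted one (assumed form).
def update_template(f_prev, f_curr, rho=0.9):
    """Blend stored target feature f_prev with current-frame feature f_curr."""
    return [rho * p + (1.0 - rho) * c for p, c in zip(f_prev, f_curr)]
```

A large ρ keeps the template stable against occlusion and noise, while the (1 - ρ) term lets it slowly absorb genuine appearance changes.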
for the stop-and-ignore action, processing proceeds to the next frame, reusing the target feature of the previous moment and the parameters of the prediction network;
for the restart action, the initial box is resampled, wherein boxes are randomly sampled around the current object and the box with the highest Q value is selected as the new initial box, l_{t-1,0} denoting the initial position;
and, the reward function of the continue action is:
wherein r_{t,k} is the current reward value, ΔIoU is the change in IoU between consecutive steps, and τ is the threshold;
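A sketch of the overlap measure and a thresholded reward of the kind described above. The text only states that the reward depends on ΔIoU and a threshold τ, so the exact ±1 form here is an assumption:

```python
# IoU of axis-aligned boxes, plus a sketch of the "continue" reward:
# positive when the overlap improved by more than tau, negative otherwise
# (the +1/-1 shape is assumed, not taken from the patent).
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def continue_reward(iou_prev, iou_curr, tau=0.0):
    return 1.0 if iou_curr - iou_prev > tau else -1.0
```

Because the reward looks at the change in IoU rather than its absolute value, each adjustment step is graded on whether it moved the box closer to the ground truth.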
the reward functions of the stop-and-update action and the stop-and-ignore action are:
wherein r_{t,K_t} is the reward value, K_t is the final number of iteration steps, g is a function of IoU, l*_t is the ground-truth position, and l_t is the output position;
the reward function of the restart action is:
It will be appreciated that embodiments of the invention may generate a series of actions and receive corresponding rewards based on the prediction network. Embodiments of the invention may use a deep neural network to generate four actions: continue, stop and update, stop and ignore, and restart. For the action "continue", embodiments of the invention may use I_t^k as the input and f*_{t-1} as the hidden-layer feature:
l_{t,k} = l_{t,k-1} + δ_{t,k},
For the action "stop and update", embodiments of the invention may stop the iteration and update the features of the target and the parameters of the prediction network:
For the action "stop and ignore", processing proceeds to the next frame, reusing the target feature of the previous moment and the parameters of the prediction network. For the action "restart", embodiments of the present invention resample the initial box: by randomly sampling around the current object, the box with the highest Q value is selected as the new initial box:
The embodiment of the invention also defines a corresponding reward for each action according to the tracking effect. For the action "continue", the reward function is defined as:
wherein ΔIoU is given by the following formula:
For the actions "stop and update" and "stop and ignore", the reward function may be defined as:
For the action "restart", the reward function may be defined as:
In step S103, the Q value of each of the plurality of actions is acquired while the networks that predict and generate the actions are updated.
In one embodiment of the present invention, the Q value calculation formula of the continue action is:
wherein γ is the balance (discount) coefficient.
The Q value calculation formula of the stop-and-update action, the stop-and-ignore action, and the restart action is:
Q(s, a | t, k) = r_{t,K_t} + γ·r_{t+1,K_{t+1}} + ….
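The discounted sum above is simply the per-frame rewards r_{t,K_t}, r_{t+1,K_{t+1}}, … combined with the balance coefficient γ; written out in Python:

```python
# Discounted return matching Q(s, a | t, k) = r_0 + gamma*r_1 + gamma^2*r_2 + ...
def discounted_q(rewards, gamma=0.99):
    q = 0.0
    for i, r in enumerate(rewards):
        q += (gamma ** i) * r
    return q
```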
It will be appreciated that, as shown in fig. 3, embodiments of the present invention can calculate the Q value of each action and optimize the evaluation network φ⁻ and the action-generation network θ. Embodiments of the present invention may use a deep evaluation network to predict the Q value of the current action and update the model parameters of the prediction network. For the action "continue", the Q value is calculated as follows:
For the other three actions, "stop and update", "stop and ignore", and "restart", the Q value is calculated as follows:
Q(s, a | t, k) = r_{t,K_t} + γ·r_{t+1,K_{t+1}} + …,
Therefore, the embodiment of the present invention can express the optimization problem of the evaluation network as the following formula:
wherein φ⁻ denotes the target network, which has the same structure as φ but is updated slowly.
The embodiment of the invention updates the parameters of the evaluation network according to the following formula:
The embodiment of the invention can express the optimization problem of the action-generation network as the following formula:
Finally, the parameter θ of the action-generation network is updated as follows:
According to the visual tracking method based on continuous movement under the guidance of deep reinforcement learning provided by the embodiment of the invention, a plurality of actions can be generated according to the pre-trained prediction network and the corresponding rewards obtained, and the Q value of each of the plurality of actions can be obtained while the networks that predict and generate actions are updated simultaneously. The visual tracking problem is thereby modeled as a continuous and cumulative movement problem, which is more robust to appearance changes of the tracked target caused by complex backgrounds and deformation, and alleviates to a certain extent the target drift caused by large deformation and fast movement.
Next, a visual tracking apparatus based on continuous movement under guidance of deep reinforcement learning according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 4 is a schematic structural diagram of a continuous movement-based visual tracking apparatus under the guidance of deep reinforcement learning according to an embodiment of the present invention.
As shown in fig. 4, the continuous-movement-based visual tracking apparatus 10 under the guidance of deep reinforcement learning includes: a pre-training module 100, a generation module 200 and an acquisition module 300.
The pre-training module 100 is used for pre-training the prediction network. The generating module 200 is used for generating a plurality of actions according to the prediction network and obtaining corresponding rewards. The acquisition module 300 is used for acquiring the Q value of each of the plurality of actions while updating the networks that predict and generate the actions. The apparatus 10 of the embodiment of the present invention can continuously and cumulatively adjust the target box of the object and dynamically adjust the appearance features and model of the target object, thereby greatly improving robustness.
Further, in one embodiment of the present invention, the objective function of the prediction network is:
Δx = (x_curr - x_prev) / W_prev, Δy = (y_curr - y_prev) / H_prev,
Δw = log(W_curr / W_prev), Δh = log(H_curr / H_prev),
wherein Δx and Δy are the scale-invariant translations of the target box between two frames, Δw and Δh are the width and height changes in logarithmic space, x_curr and x_prev are the x coordinates of the current-frame and previous-frame positions, W_prev and H_prev are the previous-frame width and height, and W_curr and H_curr are the current-frame width and height.
Further, in an embodiment of the present invention, the generating module 200 further includes a generating unit and an acquiring unit. The generating unit is used for generating the continue action, the stop-and-update action, the stop-and-ignore action, and the restart action through the deep neural network. The acquiring unit is used for obtaining the reward corresponding to each action according to the tracking effect.
Further, in one embodiment of the present invention, wherein,
for the continuation action, use It kAs an input, ft-1 *As hidden layer characteristics:
lt,k=lt,k-1+,
wherein lt,kFor adjusted position, /)t,k-1Is the initial position, t is time, k is the number of iteration steps, is the offset;
for the stop-and-update action, stopping iteration and updating the characteristics of the target and the parameters of the predicted network:
wherein f ist *Is characterized by ρ being a smoothing coefficient, ftIs characterized in that the method is characterized in that,characteristic of the last moment, θtAs a network parameter, θt-1Is a network parameter, mu is a learning speed,q (s, a,) is the Q function, s is the turntable, a is the motion, offset, as desired;
for stop and ignore actions, start updating the next frame and use the target feature at the last moment and the parameters of the prediction network;
for the restart action, the initial box is resampled, wherein the box with the highest Q value is selected as the new initial box by randomly sampling around the current object:
wherein lt-1,0The initial position is stop, update is update;
and, the reward function for the continuation action is:
wherein r ist,kΔ IoU is the amount of change IoU for the current prize value, which is the threshold;
the reward functions for the stop-and-update action and the stop-and-ignore action are:
wherein r ist,KtFor the prize value, KtFor the final number of iteration steps, g is a function of IoU,as true position,/tIs the output position;
the reward function for the resume action is:
Further, in one embodiment of the present invention,
the Q value calculation formula of the continue action is:
wherein γ is the balance (discount) coefficient.
The Q value calculation formula of the stop-and-update action, the stop-and-ignore action, and the restart action is:
Q(s, a | t, k) = r_{t,K_t} + γ·r_{t+1,K_{t+1}} + ….
it should be noted that the foregoing explanation of the embodiment of the visual tracking method based on continuous movement under the guidance of the deep reinforcement learning is also applicable to the visual tracking apparatus based on continuous movement under the guidance of the deep reinforcement learning of the embodiment, and details are not repeated here.
According to the visual tracking device based on continuous movement under the guidance of deep reinforcement learning provided by the embodiment of the invention, a plurality of actions can be generated according to the pre-trained prediction network and the corresponding rewards obtained, and the Q value of each of the plurality of actions can be obtained while the networks that predict and generate actions are updated simultaneously. The visual tracking problem is thereby modeled as a continuous and cumulative movement problem, which is more robust to appearance changes of the tracked target caused by complex backgrounds and deformation, and alleviates to a certain extent the target drift caused by large deformation and fast movement.
In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," "circumferential," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and are therefore not to be considered limiting of the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or in indirect contact through an intermediary. Also, a first feature "on," "over," or "above" a second feature may be directly or obliquely above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature "under," "below," or "beneath" a second feature may be directly or obliquely below the second feature, or may simply mean that the first feature is at a lower level than the second feature.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of these terms are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (6)
1. A visual tracking method based on continuous movement under the guidance of deep reinforcement learning is characterized by comprising the following steps:
pre-training a prediction network;
generating a plurality of actions according to the prediction network and obtaining corresponding rewards; and
obtaining a Q value for each of a plurality of actions while updating a network that predicts and generates the actions;
wherein the generating a plurality of actions and deriving corresponding rewards in accordance with the prediction network further comprises: generating continuous action, stopping and updating action, stopping and ignoring action and restarting action through a deep neural network; obtaining the corresponding reward of each action according to the tracking effect;
wherein:
for the continuation action, I_t^k is used as the input and f*_{t-1} as the hidden-layer feature:
l_{t,k} = l_{t,k-1} + δ_{t,k},
wherein l_{t,k} is the adjusted position, l_{t,k-1} is the initial position, t is the time, k is the number of iteration steps, and δ_{t,k} is the offset;
for the stop-and-update action, the iteration is stopped and the target feature and the parameters of the prediction network are updated:
f*_t = ρ f_t + (1 − ρ) f*_{t−1},
θ_t = θ_{t−1} + μ ∇_θ Q(s, a; θ_{t−1}),
wherein f*_t is the updated target feature, ρ is the smoothing coefficient, f_t is the current feature, f*_{t−1} is the feature at the previous moment, θ_t and θ_{t−1} are the network parameters at the current and previous moments, μ is the learning rate, Q(s, a) is the Q function, s is the state, a is the action, and δ is the offset;
for the stop-and-ignore action, the tracker proceeds directly to the next frame, keeping the target feature of the previous moment and the parameters of the prediction network unchanged;
for the restart action, the initial box is resampled: boxes are randomly sampled around the current object, and the box with the highest Q value is selected as the new initial box,
wherein l_{t−1,0} is the initial position at the previous frame, stop denotes the stop action, and update denotes the update action;
and, the reward function for the continuation action is:
wherein r_{t,k} is the current reward value and ΔIoU is the amount of change in IoU, which is compared against a threshold;
the reward functions for the stop-and-update action and the stop-and-ignore action are:
wherein r_{t,K_t} is the reward value, K_t is the final number of iteration steps, g is a function of IoU, l*_t is the true position, and l_t is the output position;
the reward function for the resume action is:
2. The continuous-movement-based visual tracking method under deep reinforcement learning guidance according to claim 1, wherein the objective function of the prediction network is as follows:
wherein Δx and Δy are the scale-invariant translations of the target box between the two frames, Δx = (X_curr − X_prev)/W_prev and Δy = (Y_curr − Y_prev)/H_prev, and Δw and Δh are the width and height changes in logarithmic space, Δw = log(W_curr/W_prev) and Δh = log(H_curr/H_prev); X_curr is the x coordinate of the current-frame position, X_prev is the x coordinate of the previous-frame position, W_prev and H_prev are the previous-frame width and height, and W_curr and H_curr are the current-frame width and height.
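The scale-invariant translation and log-space size deltas described in this claim match the standard bounding-box regression parameterization; the following is a minimal sketch under that assumption (the function name `regression_targets` is hypothetical, not from the patent):

```python
import math

def regression_targets(prev_box, curr_box):
    """Scale-invariant box-regression targets between two frames.

    prev_box / curr_box: (x, y, w, h) tuples for the previous and
    current frame. Translations are normalized by the previous size;
    size changes are taken in log space.
    """
    xp, yp, wp, hp = prev_box
    xc, yc, wc, hc = curr_box
    dx = (xc - xp) / wp      # Δx: x shift normalized by previous width
    dy = (yc - yp) / hp      # Δy: y shift normalized by previous height
    dw = math.log(wc / wp)   # Δw: width change in log space
    dh = math.log(hc / hp)   # Δh: height change in log space
    return dx, dy, dw, dh
```

For instance, a box that moves 5 px right and doubles in width from a 10×10 box at the origin yields Δx = 0.5, Δy = 0, Δw = log 2, Δh = 0.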
3. The continuous movement-based visual tracking method under deep reinforcement learning guidance according to claim 1,
the Q value calculation formula of the continuation action is as follows:
Q(s, a | t, k) = r_{t,k} + γ r_{t,k+1} + …,
wherein γ is the balance coefficient;
the Q value calculation formula of the stop-and-update, stop-and-ignore, and restart actions is:
Q(s, a | t, k) = r_{t,K_t} + γ r_{t+1,K_{t+1}} + ….
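The Q values in this claim are discounted sums of per-step rewards with γ as the balance coefficient; a minimal sketch (the function name `discounted_q` is hypothetical, not from the patent):

```python
def discounted_q(rewards, gamma=0.9):
    """Q(s, a | t, k) as the discounted sum of per-step rewards:
    r_0 + gamma * r_1 + gamma**2 * r_2 + ...
    `rewards` is the finite sequence of rewards observed from step k on.
    """
    return sum(gamma ** i * r for i, r in enumerate(rewards))
```

For example, rewards [1, 1, 1] with γ = 0.5 give 1 + 0.5 + 0.25 = 1.75.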
4. a visual tracking device based on continuous movement under the guidance of deep reinforcement learning, which is characterized by comprising:
the pre-training module is used for pre-training the prediction network;
the generating module is used for generating a plurality of actions according to the prediction network and obtaining corresponding rewards; and
an acquisition module for acquiring a Q value for each of a plurality of actions while updating a network of predicted and generated actions;
wherein the generation module further comprises: the generating unit is used for generating continuous action, stopping and updating action, stopping and ignoring action and restarting action through the deep neural network; the acquisition unit is used for obtaining the reward corresponding to each action according to the tracking effect;
wherein:
for the continuation action, I_t^k is used as the input and f*_{t-1} as the hidden-layer feature:
l_{t,k} = l_{t,k-1} + δ_{t,k},
wherein l_{t,k} is the adjusted position, l_{t,k-1} is the initial position, t is the time, k is the number of iteration steps, and δ_{t,k} is the offset;
for the stop-and-update action, the iteration is stopped and the target feature and the parameters of the prediction network are updated:
f*_t = ρ f_t + (1 − ρ) f*_{t−1},
θ_t = θ_{t−1} + μ ∇_θ Q(s, a; θ_{t−1}),
wherein f*_t is the updated target feature, ρ is the smoothing coefficient, f_t is the current feature, f*_{t−1} is the feature at the previous moment, θ_t and θ_{t−1} are the network parameters at the current and previous moments, μ is the learning rate, Q(s, a) is the Q function, s is the state, a is the action, and δ is the offset;
for the stop-and-ignore action, the tracker proceeds directly to the next frame, keeping the target feature of the previous moment and the parameters of the prediction network unchanged;
for the restart action, the initial box is resampled: boxes are randomly sampled around the current object, and the box with the highest Q value is selected as the new initial box,
wherein l_{t−1,0} is the initial position at the previous frame, stop denotes the stop action, and update denotes the update action;
and, the reward function for the continuation action is:
wherein r_{t,k} is the current reward value and ΔIoU is the amount of change in IoU, which is compared against a threshold;
the reward functions for the stop-and-update action and the stop-and-ignore action are:
wherein r_{t,K_t} is the reward value, K_t is the final number of iteration steps, g is a function of IoU, l*_t is the true position, and l_t is the output position;
the reward function for the resume action is:
5. The continuous-movement-based visual tracking device under deep reinforcement learning guidance according to claim 4, wherein the objective function of the prediction network is as follows:
wherein Δx and Δy are the scale-invariant translations of the target box between the two frames, Δx = (X_curr − X_prev)/W_prev and Δy = (Y_curr − Y_prev)/H_prev, and Δw and Δh are the width and height changes in logarithmic space, Δw = log(W_curr/W_prev) and Δh = log(H_curr/H_prev); X_curr is the x coordinate of the current-frame position, X_prev is the x coordinate of the previous-frame position, W_prev and H_prev are the previous-frame width and height, and W_curr and H_curr are the current-frame width and height.
6. The device for continuous movement-based visual tracking under deep reinforcement learning guidance according to claim 4,
the Q value calculation formula of the continuation action is as follows:
Q(s, a | t, k) = r_{t,k} + γ r_{t,k+1} + …,
wherein γ is the balance coefficient;
the Q value calculation formula of the stop-and-update, stop-and-ignore, and restart actions is:
Q(s, a | t, k) = r_{t,K_t} + γ r_{t+1,K_{t+1}} + ….
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810226092.3A CN108549928B (en) | 2018-03-19 | 2018-03-19 | Continuous movement-based visual tracking method and device under deep reinforcement learning guidance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108549928A CN108549928A (en) | 2018-09-18 |
CN108549928B true CN108549928B (en) | 2020-09-25 |
Family
ID=63516573
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810226092.3A Active CN108549928B (en) | 2018-03-19 | 2018-03-19 | Continuous movement-based visual tracking method and device under deep reinforcement learning guidance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108549928B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084307B (en) * | 2019-04-30 | 2021-06-18 | 东北大学 | Mobile robot vision following method based on deep reinforcement learning |
CN111048212B (en) * | 2019-12-20 | 2023-04-18 | 华中科技大学 | Network optimization method for tracking inclined-tip flexible needle path based on deep reinforcement learning |
CN117409557B (en) * | 2023-12-14 | 2024-02-20 | 成都格理特电子技术有限公司 | Dynamic analysis-based high-temperature alarm method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106970615A (en) * | 2017-03-21 | 2017-07-21 | 西北工业大学 | A kind of real-time online paths planning method of deeply study |
CN107066967A (en) * | 2017-04-12 | 2017-08-18 | 清华大学 | A kind of target-seeking method and device of active face using local observation information |
CN107306207A (en) * | 2017-05-31 | 2017-10-31 | 东南大学 | Calculated and multiple target intensified learning service combining method with reference to Skyline |
CN107450555A (en) * | 2017-08-30 | 2017-12-08 | 唐开强 | A kind of Hexapod Robot real-time gait planing method based on deeply study |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110279475A1 (en) * | 2008-12-24 | 2011-11-17 | Sony Computer Entertainment Inc. | Image processing device and image processing method |
Non-Patent Citations (6)
Title |
---|
Sangdoo Yun et al., "Action-decision networks for visual tracking with deep reinforcement learning," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017-07-26, Vol. 01, entire document *
Yongming Rao et al., "Attention-Aware Deep Reinforcement Learning for Video Face Recognition," The IEEE International Conference on Computer Vision (ICCV), 2017-10-29, entire document *
Ross Girshick et al., "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014-06-28, Vol. 01, main text page 12, Appendix part C *
Supancic III et al., "Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning," Computer Vision and Pattern Recognition (cs.CV), 2017-07-17, main text page 2, Section 1, Fig. 2 *
Yin Hongpeng et al., "A Survey of Vision-Based Object Detection and Tracking," Acta Automatica Sinica, 2016-10-20, Vol. 42, No. 10, entire document *
Dai Bo et al., "Robust Visual Tracking via Fast Deep Learning," Journal of Image and Graphics, 2016-12-11, Vol. 21, No. 12, entire document *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10860926B2 (en) | Meta-gradient updates for training return functions for reinforcement learning systems | |
CN108549928B (en) | Continuous movement-based visual tracking method and device under deep reinforcement learning guidance | |
CN111141300A (en) | Intelligent mobile platform map-free autonomous navigation method based on deep reinforcement learning | |
CN108446619B (en) | Face key point detection method and device based on deep reinforcement learning | |
CN110473231B (en) | Target tracking method of twin full convolution network with prejudging type learning updating strategy | |
CN109711401B (en) | Text detection method in natural scene image based on Faster Rcnn | |
CN116776964A (en) | Method, program product and storage medium for distributed reinforcement learning | |
CN111161412B (en) | Three-dimensional laser mapping method and system | |
CN104794733A (en) | Object tracking method and device | |
CN109447133B (en) | SVR algorithm-based method for eliminating position information outliers | |
CN110706252B (en) | Robot nuclear correlation filtering tracking algorithm under guidance of motion model | |
CN113168566A (en) | Controlling a robot by using entropy constraints | |
KR20220137732A (en) | Reinforcement Learning with Adaptive Return Calculation | |
WO2021152515A1 (en) | Planning for agent control using learned hidden states | |
CN112507943B (en) | Visual positioning navigation method, system and medium based on multitasking neural network | |
CN109299669B (en) | Video face key point detection method and device based on double intelligent agents | |
CN107657627B (en) | Space-time context target tracking method based on human brain memory mechanism | |
CN113468706A (en) | Laser point cloud power transmission line lead fitting method for distribution network live working robot | |
CN112388628A (en) | Apparatus and method for training a gaussian process regression model | |
CN114608585A (en) | Method and device for synchronous positioning and mapping of mobile robot | |
CN113628246B (en) | Twin network target tracking method based on 3D convolution template updating | |
CN112067007B (en) | Map generation method, computer storage medium, and electronic device | |
CN117372536A (en) | Laser radar and camera calibration method, system, equipment and storage medium | |
US20230090127A1 (en) | Device and method for controlling an agent | |
CN114612518A (en) | Twin network target tracking method based on historical track information and fine-grained matching |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||