CN115439510A - Active target tracking method and system based on expert strategy guidance

Info

Publication number
CN115439510A
Authority
CN
China
Prior art keywords
expert
tracker
student
target object
scene
Prior art date
Legal status
Granted
Application number
CN202211388347.9A
Other languages
Chinese (zh)
Other versions
CN115439510B (en)
Inventor
宋然
栾迎新
张钰荻
张伟
李晓磊
张倩
Current Assignee
Shandong University
Original Assignee
Shandong University
Priority date
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202211388347.9A priority Critical patent/CN115439510B/en
Publication of CN115439510A publication Critical patent/CN115439510A/en
Application granted granted Critical
Publication of CN115439510B publication Critical patent/CN115439510B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image


Abstract

The invention discloses an active target tracking method and system based on expert strategy guidance, belonging to the technical field of active target tracking and comprising the following steps: acquiring a scene observation image, a scene map, and agent poses; obtaining, from the scene map and the agent poses, a local map of each agent and the motion trajectories of all agents in each local map as first training data; inputting the first training data into an expert tracker and an expert target object respectively, the expert target object and the expert tracker performing adversarial reinforcement learning, with the expert tracker outputting a suggested action; inputting the scene observation image into a student tracker and training the student tracker with the suggested action as the label of the scene observation image to obtain a trained student tracker; and recognizing acquired real-time scene images with the trained student tracker to obtain the agent's decision action. Accurate tracking of the target is thereby achieved.

Description

Active target tracking method and system based on expert strategy guidance
Technical Field
The invention relates to the technical field of active target tracking, in particular to an active target tracking method and system based on expert strategy guidance.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Active target tracking means that, in a dynamic three-dimensional scene, an agent equipped with a camera autonomously adjusts its actions so that the target object always stays at the center of its field of view with a relatively stable size and posture. The current state-of-the-art active target tracking methods are fully end-to-end optimization methods relying on deep reinforcement learning. The whole end-to-end optimization process is data-driven: the neural network needs enough good samples to optimize its parameters, and reinforcement learning needs to explore many states and actions. However, conventional active target tracking methods adopt a direct adversarial learning strategy, and the trained target object does not exploit obstacles, so it cannot pose sufficient challenges to the tracker, such as moving around an obstacle or disappearing from the tracker's field of view. As a result, a tracker that can handle complex scenes cannot be trained, and existing methods cannot guarantee accurate target tracking in complex environments.
Disclosure of Invention
To solve the above problems, the invention provides an active target tracking method and system based on expert strategy guidance, which can achieve active target tracking in complex scenes.
To achieve this purpose, the invention adopts the following technical scheme:
In a first aspect, an active target tracking method based on expert strategy guidance is disclosed, comprising:
acquiring a scene observation image, a scene map, and agent poses;
obtaining, from the scene map and the agent poses, a local map of each agent and the motion trajectories of all agents in each local map as first training data;
inputting the first training data into an expert tracker and an expert target object respectively, the expert target object and the expert tracker performing adversarial reinforcement learning, with the expert tracker outputting a suggested action;
inputting the scene observation image into a student tracker and training the student tracker with the suggested action as the label of the scene observation image to obtain a trained student tracker;
and recognizing acquired real-time scene images with the trained student tracker to obtain the agent's decision action.
In a second aspect, an active target tracking system based on expert strategy guidance is disclosed, comprising:
a training data acquisition module for acquiring a scene observation image, a scene map, and agent poses;
a first-stage training module for obtaining, from the scene map and the agent poses, a local map of each agent and the motion trajectories of all agents in each local map as first training data, inputting the first training data into an expert tracker and an expert target object respectively, the expert target object and the expert tracker performing adversarial reinforcement learning, with the expert tracker outputting a suggested action;
a student tracker training module for inputting the scene observation image into a student tracker and training the student tracker with the suggested action as the label of the scene observation image to obtain a trained student tracker;
and an instance tracking module for recognizing acquired real-time scene images with the trained student tracker to obtain the agent's decision action.
In a third aspect, an electronic device is provided, comprising a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are executed by the processor, the steps of the active target tracking method based on expert strategy guidance are completed.
In a fourth aspect, a computer-readable storage medium is provided for storing computer instructions which, when executed by a processor, complete the steps of the active target tracking method based on expert strategy guidance.
Compared with the prior art, the invention has the beneficial effects that:
1. In the method, an expert model is trained from agent-centered local maps of the scene and the agents' motion trajectories in those maps: the expert tracker outputs a suggested action and the expert target object outputs an escape strategy. The suggested action output by the expert tracker is then used as the label of the scene observation image, the scene observation image is input into a student tracker, and the student tracker is trained to obtain a trained student tracker, so that the strong scene understanding and decision-making abilities of the expert tracker are transferred into the student tracker. The student tracker thereby acquires an obstacle avoidance capability and improved performance, while the extra overhead of online map building during inference is avoided, which raises the computation rate and guarantees real-time target tracking.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a block diagram of the overall structure of the method disclosed in Embodiment 1;
FIG. 2 is the global map constructed for a training scenario in Embodiment 1;
FIG. 3 is a visualization of the maps and agent trajectories used when training the expert agents disclosed in Embodiment 1;
FIG. 4 is a schematic comparison of the reward mechanism disclosed in Embodiment 1, wherein (a) is the obstacle distribution and (b) is the reward mechanism used in expert tracker training;
FIG. 5 shows planned target object trajectories used when verifying the tracking effect of the tracker disclosed in Embodiment 1;
FIG. 6 is a simulation demonstration of the tracker disclosed in Embodiment 1, wherein (a) is the expert tracker demonstration and (b) is the student tracker demonstration.
Detailed Description
The invention is further described with reference to the following figures and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Embodiment 1
To improve the accuracy and real-time performance of active target tracking, this embodiment discloses an active target tracking method based on expert strategy guidance, as shown in FIG. 1, comprising:
S1: acquiring a scene observation image, a scene map, and agent poses.
The acquired scene observation image is an RGB or RGB-D observation of the scene from the tracker's viewpoint at each moment.
The position and scale of each obstacle in the scene are determined, and a global scene map is constructed from the obstacle scale and position information, denoted \(\mathcal{M}\), where grid cells occupied by obstacles are assigned non-zero values between 0 and 1 (shown as light cells in FIG. 2) and unoccupied cells are set to 0 (shown as dark cells in FIG. 2).
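As a concrete illustration of this occupancy encoding, the following is a minimal sketch in Python; the axis-aligned box obstacle format, the map size, the resolution, and the height normalization are illustrative assumptions, not values specified by this embodiment.

```python
import numpy as np

def build_global_map(obstacles, size=(400, 400), resolution=0.1, max_height=2.0):
    """Build a global occupancy grid M from obstacle positions and scales.

    obstacles: list of (x, y, width, depth, height) tuples in metres
    (an assumed obstacle format for illustration).
    """
    M = np.zeros(size, dtype=np.float32)              # unoccupied cells are 0
    for x, y, w, d, h in obstacles:
        r0, r1 = int(y / resolution), int((y + d) / resolution)
        c0, c1 = int(x / resolution), int((x + w) / resolution)
        # occupied cells take a non-zero value in (0, 1], here scaled by height
        M[r0:r1, c0:c1] = min(h / max_height, 1.0)
    return M
```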
S2: obtaining, from the scene map and the agent poses, a local map of each agent and the motion trajectories of all agents in each local map as first training data.
This embodiment represents the environmental structure information with a grid map centered on an agent. To obtain the structure of the environment around the agents, at each moment \(t\) the poses of the agents in the scene are obtained as \(p_t = (p_{t,1}, p_{t,2})\), where \(p_{t,1}\) is the pose of the tracker under the global map and \(p_{t,2}\) is the pose of the target object under the global map. The global map \(\mathcal{M}\) is rotated and translated according to these poses, i.e. the transformation from the global map coordinate system of the scene map to the agent-centered coordinate system is computed, yielding the agent-centered local maps \(m_t = (m_{t,1}, m_{t,2})\), where subscripts 1 and 2 denote the tracker and the target object respectively: \(m_{t,1}\) is the local map centered on the tracker and \(m_{t,2}\) is the local map centered on the target object. The process can be expressed as:

\[ m_{t,i} = \mathcal{W}\big(\mathcal{M},\, p_{t,i}\big) \qquad (1) \]

where \(m_{t,i}\) is the local map centered on agent \(i\), and \(\mathcal{W}(\mathcal{M}, p_{t,i})\) denotes transforming the global map \(\mathcal{M}\) into the coordinate system centered on the agent pose \(p_{t,i}\).
Coordinate transformation is commonly used to establish a one-to-one correspondence between two different coordinate systems. Suppose coordinate system \(A\) is rotated counterclockwise around its own \(Z\) axis by an angle \(\theta\) and then translated by \((x_0, y_0)\) so that it coincides with coordinate system \(B\). Then a point \((x_A, y_A)\) in coordinate system \(A\) and the corresponding point \((x_B, y_B)\) in coordinate system \(B\) satisfy the one-to-one correspondence:

\[ \begin{pmatrix} x_B \\ y_B \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_A - x_0 \\ y_A - y_0 \end{pmatrix} \qquad (2) \]

Equation (2) is used to transform the global map \(\mathcal{M}\) into the coordinate system centered on the agent pose \(p_{t,i}\).
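A minimal sketch of equation (2) in Python follows; the function name and the row-vector convention are illustrative assumptions.

```python
import numpy as np

def to_agent_frame(points_xy, agent_pose):
    """Transform global 2D points into the agent-centered frame, as in eq. (2).

    points_xy: (N, 2) array of coordinates in the global frame.
    agent_pose: (x0, y0, theta), the agent's position and heading in the global
    frame (the agent frame is the global frame rotated by theta and then
    translated by (x0, y0)).
    """
    x0, y0, theta = agent_pose
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s],
                  [-s, c]])                 # rotation by -theta, as in eq. (2)
    return (np.asarray(points_xy) - np.array([x0, y0])) @ R.T

# Example: a point 1 m ahead of an agent heading along the global +Y axis
# maps to (1, 0) in the agent-centered frame.
p = to_agent_frame([[0.0, 1.0]], (0.0, 0.0, np.pi / 2))
```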
To let every agent know its own motion pattern and those of the other agents, the motion trajectories of all agents are represented on each agent's local map. An agent's trajectory is constructed by collecting its poses over historical frames and transforming the historical poses into the coordinate system centered on the current agent using the coordinate transformation above. In addition, to encode timing information in the trajectories, the trajectories of all agents are represented as an arithmetic sequence over pose time.

At time \(t\), the motion trajectories \(T = (\tau_1, \tau_2)\) of all agents relative to the global map are collected for agent \(j\), together with its pose \(p_{t,j}\), where trajectory \(\tau_i\) is constructed from the \(N\) historical poses of agent \(i\). Each historical pose of every agent is transformed and assigned a time-dependent value. At time \(t-k\), the historical pose of agent \(i\) under the coordinate system of the agent at time \(t\), denoted \(\hat{p}_{t-k,i}\), can be expressed as:

\[ \hat{p}_{t-k,i} = \mathcal{W}\big(p_{t-k,i},\, p_{t,j}\big) \qquad (3) \]

\[ v_{t-k} = 1 - k/N \qquad (4) \]

where \(\mathcal{W}(p_{t-k,i}, p_{t,j})\) denotes transforming the absolute pose \(p_{t-k,i}\) of an agent into the coordinate system centered on \(p_{t,j}\), and \(v_{t-k}\) is the value representing the time distance. Thus, the motion trajectory of each agent in the local map of agent \(j\) can be expressed as:

\[ \tau_i = \big\{ (\hat{p}_{t-k,i},\, v_{t-k}) \big\}_{k=1}^{N} \qquad (5) \]
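The sketch below assembles such a trajectory in Python, reusing to_agent_frame from the sketch above; the linear ramp \(v = 1 - k/N\) is an assumed concrete form of the arithmetic sequence described in the text.

```python
import numpy as np

def build_trajectory(history_poses, center_pose, N):
    """Build the trajectory of one agent in the frame of the center agent.

    history_poses: the agent's last N global poses (x, y, theta), most recent
    first; center_pose: pose of the agent the local map is centered on.
    """
    traj = []
    for k, pose in enumerate(history_poses[:N], start=1):
        rel = to_agent_frame([pose[:2]], center_pose)[0]   # eq. (3)
        v = 1.0 - k / N                                    # eq. (4), assumed form
        traj.append((rel[0], rel[1], v))                   # one entry of eq. (5)
    return traj
```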
the visual results of the constructed local maps and the movement tracks of the agents in the maps are shown in fig. 3, wherein the black part in the map is a passable area, the white line is the track of the tracker and the target object, the rest white or gray parts are non-passable parts, and the lighter the color is, the higher the barrier height is.
S3: inputting the first training data into an expert tracker and an expert target object respectively, the expert target object and the expert tracker performing adversarial reinforcement learning, with the expert tracker outputting a suggested action and the expert target object outputting the escape strategy corresponding to the target.

The expert tracker comprises a convolutional neural network and a sequence model: the agent's local map and motion trajectories are encoded by the convolutional neural network to obtain encoded information, and the encoded information is processed by the sequence model to obtain the decision action.

Each expert agent needs a model with sufficient expressive power to map its input to a simple action. The expert tracker first encodes the environment structure information and the agent motion information with a convolutional neural network to obtain encoded information, where the environment structure information is the agent's local map and the agent motion information is the agents' motion trajectories in that map; a sequence model then models the dynamics between sequential observations, estimates the environment state, and outputs the corresponding action distribution. In addition, a value function of the current state is estimated simultaneously for iterative evaluation and improvement of the policy.

The structure of the expert tracker is therefore shown in Table 1, where C5x5-32S1P2 denotes a convolutional layer with 32 convolution kernels of size 5x5, each with stride 1 and padding 2; LSTM256 denotes that the sequence model is a long short-term memory (LSTM) network with input and output dimensionality 256; and FC6 denotes a fully connected layer with output dimension 6.
Each expert tracker takes as input its own local map \(m_{t,i}\) and the motion trajectories \((\tau_1, \tau_2)\) of all agents in the local map, and outputs the predicted action \(a_{t,i}\). The calculation of the predicted action can be expressed as equation (6):

\[ a_{t,i} = \pi_i\big(m_{t,i},\, \tau_1,\, \tau_2\big) \qquad (6) \]

TABLE 1 Model structure of the expert model
(Table 1 is provided as an image; per the description above it consists of C5x5-32S1P2 convolutional layers, an LSTM256 sequence model, and an FC6 output layer.)
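A sketch of such a network in Python (PyTorch) is given below; the input channel count, the pooling stage, and the use of a single convolutional layer are assumptions, since Table 1 itself is not recoverable, while the C5x5-32S1P2, LSTM256, and FC6 blocks follow the description above.

```python
import torch
import torch.nn as nn

class ExpertTracker(nn.Module):
    """Sketch of the expert model: conv encoder -> LSTM -> policy/value heads."""

    def __init__(self, in_channels=3, num_actions=6):
        super().__init__()
        self.encoder = nn.Sequential(
            # C5x5-32S1P2: 32 kernels of size 5x5, stride 1, padding 2
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 256),
        )
        self.lstm = nn.LSTM(input_size=256, hidden_size=256, batch_first=True)  # LSTM256
        self.policy = nn.Linear(256, num_actions)   # FC6: logits over 6 actions
        self.value = nn.Linear(256, 1)              # state value for RL updates

    def forward(self, maps, hidden=None):
        """maps: (batch, time, channels, H, W), local map plus trajectory layers."""
        b, t = maps.shape[:2]
        feat = self.encoder(maps.flatten(0, 1)).view(b, t, 256)
        out, hidden = self.lstm(feat, hidden)
        return self.policy(out), self.value(out), hidden
```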
The expert tracker and the expert target object adopt an occlusion-aware reward mechanism: when the expert tracker is not occluded, the reward value of the expert tracker is restricted to the range 0 to 1; when it is occluded, the reward value of the expert tracker is set to -1.

Whether occlusion occurs can be judged from the map and the relative orientations and positions of the agents: occlusion occurs when any point on the line segment connecting the agents is marked as occupied on the map. The reward of the expert tracker can be expressed as:

\[ r_1 = \begin{cases} \max\Big(0,\; 1 - \dfrac{|d - d^{*}|}{d_{\max}} - \dfrac{|\alpha - \alpha^{*}|}{\alpha_{\max}}\Big), & \text{no occlusion} \\ -1, & \text{occlusion} \end{cases} \qquad (7) \]

where \(r_1\) is the reward of the expert tracker, \(d\) and \(d^{*}\) are the actual and expected distances between the expert tracker and the target object, \(\alpha\) and \(\alpha^{*}\) are the actual and expected angles of the target object relative to the expert tracker, and \(d_{\max}\) and \(\alpha_{\max}\) are the maximum distance and angle the expert tracker can see; the subscript indicating time is omitted.

Besides its own observation, the expert target object also has access to the expert tracker's observation and to the reward value the tracker is predicted to obtain. The reward of the expert target object is the negative of the reward of the expert tracker, maintaining a zero-sum competitive relationship between the two agents. Thus, when occlusion occurs, the expert tracker is penalized for the unfavorable tracking state, while the expert target object is rewarded for being in a state that helps it escape the tracker's line of sight. FIG. 4 shows the relationship between the position of the expert target object and the reward obtained by the expert tracker when the expert tracker is fixed at (0, 0); in FIG. 4, (a) is the obstacle distribution and (b) is the reward mechanism used for training the expert tracker.

It can be seen that the reward mechanism proposed in this embodiment gives timely feedback to the expert tracker or the expert target object when occlusion occurs.
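The following Python sketch implements the occlusion test and equation (7); sampling points along the segment is an assumed discretization of the line-of-sight check, and the clipping to [0, 1] follows the stated reward range.

```python
import numpy as np

def occluded(M, p_tracker, p_target, resolution=0.1, samples=100):
    """True if any map cell on the segment between the two agents is occupied."""
    for s in np.linspace(0.0, 1.0, samples):
        x, y = (1 - s) * np.asarray(p_tracker) + s * np.asarray(p_target)
        if M[int(y / resolution), int(x / resolution)] > 0:
            return True
    return False

def tracker_reward(M, p1, p2, d_star, a_star, d_max, a_max):
    """Occlusion-aware reward of the expert tracker, as in eq. (7).

    p1, p2: (x, y, theta) poses of tracker and target in the global frame.
    """
    if occluded(M, p1[:2], p2[:2]):
        return -1.0
    d = np.linalg.norm(np.asarray(p2[:2]) - np.asarray(p1[:2]))
    a = np.arctan2(p2[1] - p1[1], p2[0] - p1[0]) - p1[2]   # bearing of target
    a = (a + np.pi) % (2 * np.pi) - np.pi                  # wrap to [-pi, pi]
    r = 1.0 - abs(d - d_star) / d_max - abs(a - a_star) / a_max
    return float(np.clip(r, 0.0, 1.0))
```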
The specific process of obtaining the suggested action output by the expert tracker is as follows:

the first training data are input into the expert tracker and the expert target object respectively; the expert tracker is pre-trained through adversarial learning against the expert target object, during which the expert tracker outputs decision actions, the expert target object outputs the escape strategy corresponding to the target, and an expert strategy pool is built from the strategies of the expert target object model;

a fine-tuning expert target object model is selected from the expert strategy pool;

the fine-tuning expert target object model and the pre-trained expert tracker perform adversarial learning so as to fine-tune the pre-trained expert tracker, and the fine-tuned expert tracker outputs the suggested action.

In specific implementation, the training process of the expert tracker is divided into adversarial expert strategy learning and fine-tuning of the expert tracker.

First, the first training data are input into the expert tracker and the expert target object model respectively, and the two are optimized through adversarial reinforcement learning to generate diversified strategies; this is the pre-training process of the expert tracking model. As optimization proceeds, the expert target object model generates different strategies to escape the tracking of the expert tracker, and the expert tracker learns various strategies to cope with the escape strategies of the expert target object. In this process, not only is a relatively strong expert tracker model learned, but the strategies of the expert target object model are also saved after 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.5, 6.5, 7.0, 8.0 and 9.5 million interactions to build the expert strategy pool.

Second, the tracker expert model is fine-tuned. As reinforcement learning progresses, the expert tracker gradually forgets past ways of dealing with earlier escape strategies, so further adjustment of the expert tracker is required. In this process, the pre-trained expert tracker performs adversarial training against the expert target object models in the expert strategy pool, trying to learn a stronger strategy that copes with all the strategy models in the expert target object strategy pool; evaluated 100 times in the training environment, the episode length of the expert tracker stabilizes above 495.
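A minimal sketch of the expert strategy pool used in this fine-tuning stage follows; the checkpoint file names and the uniform sampling are illustrative assumptions.

```python
import random
import torch

# snapshots of the expert target object policy saved during pre-training
CHECKPOINTS = [f"expert_target_{m}M.pt"
               for m in (2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.5, 6.5, 7.0, 8.0, 9.5)]

def sample_opponent(target_model):
    """Load a randomly sampled target-object strategy from the pool."""
    state = torch.load(random.choice(CHECKPOINTS))
    target_model.load_state_dict(state)
    return target_model
```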
S4: inputting the scene observation image into a student tracker and training the student tracker with the suggested action as the label of the scene observation image to obtain a trained student tracker.

This embodiment trains a simple, lightweight student tracker under the guidance of the expert strategy. In this stage, the input of the student tracker is set to the scene observation image from the tracker's viewpoint at each moment. The optimization of the student tracker is a supervised learning process: the student tracker is trained with double constraints on the feature space and the output space, so that the strong scene understanding and decision-making abilities of the expert tracker are transferred to the student tracker. When training the student tracker, it is guided by the student target object; to generate diversified target object strategies, the model parameters of the student target object are randomly sampled during training from the expert target object strategy pool built in the first stage.

TABLE 2 Model structure of the student tracker
(Table 2 is provided as an image; per the description below it consists of C5x5-32S1P2 convolutional layers, an LSTM256 sequence model, and an FC6 output layer.)

The model structure of the student tracker, shown in Table 2, comprises a convolutional neural network and a sequence model: the convolutional neural network encodes the input observation image to obtain encoded information, and the sequence model processes the encoded information to obtain the decision action. C5x5-32S1P2 denotes a convolutional layer with 32 convolution kernels of size 5x5, each with stride 1 and padding 2; LSTM256 denotes that the sequence model uses long short-term memory units with input and output dimensionality 256; and FC6 denotes a fully connected layer with output dimension 6.
The supervision signal used in training the student tracker has two parts, a feature space constraint and an action space constraint, because the student tracker is required to migrate both the scene understanding and the decision-making ability of the expert tracker. The loss function \(\mathcal{L}\) of the student tracker is therefore defined in two parts:

\[ \mathcal{L} = \mathcal{L}_{\mathrm{act}} + \lambda\, \mathcal{L}_{\mathrm{feat}} \qquad (8) \]

where \(\mathcal{L}_{\mathrm{feat}}\) and \(\mathcal{L}_{\mathrm{act}}\) are the loss functions in the feature space and the action space respectively, and \(\lambda\) is a hyperparameter set to 0.1.

The suggested action output by the expert tracker is used as a dense supervision signal for training the student tracker, and the KL divergence is used to force the output of the student tracker to approach the output of the expert tracker. At each time step, the expert tracker observes the current privileged information and gives a suggested action as the data label for training the student tracker model. In training, the KL divergence forces the output of the student tracker toward the output of the expert tracker, and this part of the loss can be expressed as:

\[ \mathcal{L}_{\mathrm{act}} = \mathrm{KL}\big( \pi^{E}_{t} \,\big\|\, \pi^{S}_{t} \big) \qquad (9) \]

where \(\pi^{S}_{t}\) is the output of the student tracker at time \(t\) and \(\pi^{E}_{t}\) is the output of the expert tracker at time \(t\).

For the student tracker to have a stronger scene understanding ability, it is forced to learn features similar to the expert tracker's. Therefore, this embodiment takes as the feature space constraint a loss measuring the similarity between the convolutional neural network outputs of the expert and student trackers, which can be expressed as:

\[ \mathcal{L}_{\mathrm{feat}} = \mathrm{MSE}\big( f^{S}_{t},\, f^{E}_{t} \big) \qquad (10) \]

where MSE denotes the mean squared error loss, and \(f^{S}_{t}\) and \(f^{E}_{t}\) are the outputs of the last convolutional layers of the student tracker and the expert tracker respectively.
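A Python (PyTorch) sketch of equations (8)-(10) follows; treating the FC6 outputs as logits and attaching the weight \(\lambda\) to the feature term are assumptions.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, expert_logits,
                      student_feat, expert_feat, lam=0.1):
    """Double-constrained student loss, as in eqs. (8)-(10)."""
    # eq. (9): KL divergence pushing the student's action distribution toward
    # the expert's; F.kl_div expects log-probabilities as its first argument
    l_act = F.kl_div(F.log_softmax(student_logits, dim=-1),
                     F.softmax(expert_logits, dim=-1),
                     reduction="batchmean")
    # eq. (10): MSE between the last convolutional features of the two models
    l_feat = F.mse_loss(student_feat, expert_feat.detach())
    return l_act + lam * l_feat                    # eq. (8)
```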
Furthermore, to help mine difficult samples for training the student tracker, during training the student target object model selects actions using a strategy randomly sampled from the target object expert strategy pool built in the first stage.
S5: recognizing acquired real-time scene images with the trained student tracker to obtain the agent's decision action.

The active target tracking model (EG-AOT) built in this embodiment is shown in FIG. 1 and comprises an expert model and a student model: the expert model comprises an expert tracker and an expert target object that learn against each other, and the student model comprises a student tracker and a student target object, the student target object guiding the student tracker.

This embodiment verifies the performance of the active target tracking model (EG-AOT) using a target object based on point-to-point navigation (Nav) and a target object based on trajectory planning (PathPlanning).
The trajectory-planning target object can access the scene map directly and plans its trajectory in two steps. First, at the beginning of each episode, the target object randomly selects from the map \(\mathcal{M}\) two points on either side of each obstacle; these \(2n\) points (for \(n\) obstacles) serve as path-level sub-goal points and are connected into a closed-loop path, and the A* algorithm then computes a final path that avoids the obstacles. Second, secondary sub-goal points, more numerous than the path-level sub-goal points, are screened out of the path again so that the target object can avoid obstacles; at each moment the expected travel speed and rotation angle of the target object are determined from its current heading and from the distance and angle to the secondary sub-goal position, and the actual travel speed is obtained by adding some noise to the expected speed. Because PathPlanning plans a path in advance using the environment map, the target object has the ability to avoid obstacles and more opportunities to challenge the tracker, for example when the target is occluded by an obstacle. A schematic diagram of the planned paths of some target objects is shown in FIG. 5.
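The two-stage procedure can be sketched as follows in Python; the A* routine is elided and passed in as a callable, and the helper names and the noise scale are illustrative assumptions.

```python
import numpy as np

def plan_episode_path(obstacle_centers, astar):
    """Stage 1: pick two sub-goals per obstacle, close the loop, refine with A*.

    obstacle_centers: list of (x, y) arrays; astar(p, q) -> list of waypoints.
    """
    subgoals = []
    for c in obstacle_centers:
        side = np.random.randn(2)
        side /= np.linalg.norm(side)
        subgoals += [np.asarray(c) + side, np.asarray(c) - side]  # both sides
    path = []
    for p, q in zip(subgoals, subgoals[1:] + subgoals[:1]):       # closed loop
        path += astar(p, q)                       # obstacle-avoiding segments
    return path

def step_command(pose, subgoal, v_expect=150.0, noise_std=10.0):
    """Stage 2: expected speed and rotation toward the next secondary sub-goal,
    with noise added to the expected speed to get the actual speed."""
    dx, dy = subgoal[0] - pose[0], subgoal[1] - pose[1]
    angle = np.arctan2(dy, dx) - pose[2]          # heading error to sub-goal
    v_actual = v_expect + np.random.randn() * noise_std
    return v_actual, angle
```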
In the first-stage training of the active target tracking model, the size of the local map is set to 80x80, where the side length of each grid cell corresponds to a distance of 10 cm in the simulation environment and the center of the map is the position of the agent. The model is trained on a computer, with 6 threads used for model optimization. The total number of interactions between the agents and the environment is 10 million each for the adversarial learning and the fine-tuning of the expert tracker. In the second stage, the observation data of the student model are resized to 80x80 before being input into the model, 4 threads are used for model optimization, and the number of updates is 20 million. The other hyperparameters used for training and evaluation are shown in Table 3, and the action spaces of the tracker and the target object are shown in Tables 4 and 5.

TABLE 3 Hyperparameters of EG-AOT in training and evaluation
(Table 3 is provided as an image.)
TABLE 4 Action space of the active tracker

Action | Speed (cm/s) | Angle (deg)
Move forward | 200 | 0
Move backward | -200 | 0
Forward-right | 150 | 45
Forward-left | 150 | -45
Turn right | 0 | 45
Turn left | 0 | -45
Stop | 0 | 0

TABLE 5 Action space of the learnable target object

Action | Speed (cm/s) | Angle (deg)
Move forward | 150 | 0
Move backward | -150 | 0
Forward-right | 100 | 45
Forward-left | 100 | -45
Turn right | 0 | 45
Turn left | 0 | -45
Stop | 0 | 0
The performance of the model is evaluated with the expected position difference, the episode length, the success rate, and the occlusion rate. The indices are described as follows:

Expected position difference: the accumulated value of the per-step expected position difference; the larger the value, the better.

Episode length: the visible area is defined as the sector in front of the tracker with a radius of 750 cm and a span of 90 degrees. The current episode stops as soon as the target stays outside this area for 5 seconds or the episode length reaches 500.

Success rate: when the episode length reaches 500, the episode is recorded as a successful tracking, and the success rate is the ratio of the number of successful trackings over all trials.
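The bookkeeping for these indices can be sketched as below in Python; the run_episode interface is an assumption.

```python
def evaluate(run_episode, trials=100, max_len=500):
    """Aggregate episode length, success rate, and occlusion rate.

    run_episode(max_len) is assumed to return (steps, occluded_steps).
    """
    lengths, successes, occlusion = [], [], []
    for _ in range(trials):
        steps, occluded_steps = run_episode(max_len)
        lengths.append(steps)
        successes.append(steps >= max_len)       # length 500 counts as success
        occlusion.append(occluded_steps / steps)
    n = float(trials)
    return sum(lengths) / n, sum(successes) / n, sum(occlusion) / n
```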
The student tracker disclosed in this embodiment is compared with benchmark methods, including the recent AD-VAT and AD-VAT+ algorithms. For fairness, the student tracker in this embodiment uses the same RGB image input and the same network model structure as the benchmark methods; RGB-D variants of the AD-VAT and AD-VAT+ algorithms are also constructed and compared with the student tracker using RGB-D images as input.

TABLE 6 Results of the comparative experiments against the benchmark methods with RGB input
(Table 6 is provided as an image.)

Comparing the experimental results of the models with RGB input (shown in Table 6, with the target object using the Nav strategy): compared with the benchmark methods, the student tracker proposed in this embodiment obtains longer episode lengths and better success rates in most scenes, and improves on the average result. This is because, although the same model structure and observation input are adopted, the student tracker disclosed in this embodiment migrates the scene understanding and decision-making abilities of the expert tracker and has a certain ability to handle obstacles, and can therefore achieve a performance improvement.

Comparing the experimental results of the models with RGB-D input (shown in Table 7, with the target object using the Nav strategy): overall, the conclusions for RGB-D input are similar to those for RGB input. Although the student tracker proposed in this embodiment is inferior to the benchmark methods on the expected position difference index, it obtains better results on average episode length and success rate. Moreover, the improvement of the student tracker over the corresponding benchmark method is larger for RGB input, because spatial cues are more lacking in RGB data, which makes scene understanding harder for the model to learn.

TABLE 7 Results of the comparative experiments against the benchmark methods with RGB-D input
(Table 7 is provided as an image.)

Note: the results are the mean and variance of 100 repeated tests, expressed as "mean ± variance"; the best results are shown in bold; the last column is the average result over all scenes.

Runtime comparison: the runtime of the proposed model is consistent with the benchmark methods, at 0.002260 s per frame for RGB input and 0.002943 s per frame for RGB-D input.
To verify the reasonableness and superiority of the expert strategy disclosed in this embodiment, two other expert strategies, Depth and MaskDepth, are additionally constructed and compared experimentally.

Depth: the tracker takes the ground-truth first-person depth image as model input, and the learnable target person takes as input its own ground-truth first-person depth image, the tracker's first-person depth image, and the action taken by the tracker.

MaskDepth: the tracker takes as input its first-person semantic segmentation map and ground-truth depth image concatenated along the channel dimension, and the learnable target person takes as input its own first-person semantic segmentation map and ground-truth depth image, the tracker's first-person semantic segmentation map and depth image, and the action taken by the tracker. The tracker model structure is shown in Table 8.

The experimental results are shown in Table 9. Comparing the performance of the same tracker strategy on the various evaluation indices, in particular the occlusion rate, the ability to use obstacles to make difficulties for the tracker ranks: Nav < PathPlanning < the expert target object proposed in this embodiment. In fact, Nav can hardly exploit obstacles at all; PathPlanning gains a certain obstacle-exploiting ability by manually selecting path sub-goal points close to obstacles using obstacle position information and planning an obstacle-avoiding path with the A* algorithm; and the adversarial reinforcement learning between the expert tracker and the expert target object proposed in this embodiment gives access to more complete obstacle position information and target motion information, so actions can be selected considering both the structure of the environment around the tracker and the tracker's motion, yielding a stronger ability than PathPlanning to create difficult tracking scenes with obstacles.
TABLE 8 Tracker model structure
(Table 8 is provided as an image.)

TABLE 9 Performance comparison of the expert strategies
(Table 9 is provided as an image.)

Note: the results are the mean and variance of 100 repeated tests, expressed as "mean ± variance"; the best results are shown in bold; the last column is the average result over all scenes.

Furthermore, the expert tracker proposed in this embodiment shows the best performance on all evaluation indices as the target strategy changes. More specifically, as the obstacle-exploiting ability of the target strategy increases, the tracking performance of both the Depth tracker and the MaskDepth tracker drops markedly: the success rate of the Depth tracker drops from 0.86 to 0.41, and that of the MaskDepth tracker from 0.77 to 0.33. In contrast, the expert tracker proposed in this embodiment achieves robust tracking throughout: over 100 tests the average episode length stays at 495 and the success rate at 0.9. The proposed expert tracker also always has a lower occlusion rate, i.e. it handles occlusion better than the other expert trackers.
To show the performance of the proposed method more intuitively, the expert tracker and the student tracker proposed in this embodiment are each run and demonstrated in a virtual environment. The results are shown in FIG. 6, where the characters are virtual: (a) is the expert tracker demonstration and (b) is the student tracker demonstration, both rendered from the tracker's first-person view. The number in the upper-left corner of each frame is the current frame number. The leftmost column is a schematic diagram of the relative positions of the tracker, the target object, and the obstacles, where the two darker circles mark the positions where the target starts and ends its movement, the two lighter circles mark the positions where the tracker starts and ends its movement, the dotted lines and arrows indicate the movement trajectory and direction respectively, and the middle rectangle or ellipse is an obstacle.

In the method disclosed by this embodiment, an expert model is trained from agent-centered local maps of the scene and the agents' motion trajectories in those maps, with the expert tracker outputting a suggested action and the expert target object outputting an escape strategy; the suggested action output by the expert tracker is then used as the label of the scene observation image, the scene observation image is input into a student tracker, and the student tracker is trained to obtain a trained student tracker, transferring the strong scene understanding and decision-making abilities of the expert tracker into the student tracker.
Embodiment 2
This embodiment provides an active target tracking system based on expert strategy guidance, comprising:
a training data acquisition module for acquiring a scene observation image, a scene map, and agent poses;
a first-stage training module for obtaining, from the scene map and the agent poses, a local map of each agent and the motion trajectories of all agents in each local map as first training data, inputting the first training data into an expert tracker and an expert target object respectively, the expert target object and the expert tracker performing adversarial reinforcement learning, with the expert tracker outputting a suggested action and the expert target object outputting the escape strategy corresponding to the target;
a student tracker training module for inputting the scene observation image into a student tracker and training the student tracker with the suggested action as the label of the scene observation image to obtain a trained student tracker;
and an instance tracking module for recognizing acquired real-time scene images with the trained student tracker to obtain the agent's decision action.
Embodiment 3
This embodiment discloses an electronic device comprising a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are executed by the processor, the steps of the active target tracking method based on expert strategy guidance disclosed in Embodiment 1 are completed.
Embodiment 4
This embodiment discloses a computer-readable storage medium for storing computer instructions which, when executed by a processor, complete the steps of the active target tracking method based on expert strategy guidance disclosed in Embodiment 1.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the invention and not to limit them. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (10)

1. An active target tracking method based on expert strategy guidance, characterized by comprising the following steps:
acquiring a scene observation image, a scene map, and agent poses;
obtaining, from the scene map and the agent poses, a local map of each agent and the motion trajectories of all agents in each local map as first training data;
inputting the first training data into an expert tracker and an expert target object respectively, the expert target object and the expert tracker performing adversarial reinforcement learning, with the expert tracker outputting a suggested action;
inputting the scene observation image into a student tracker and training the student tracker with the suggested action as the label of the scene observation image to obtain a trained student tracker;
and recognizing acquired real-time scene images with the trained student tracker to obtain the agent's decision action.
2. The active target tracking method based on expert strategy guidance according to claim 1, wherein the expert tracker and the expert target object adopt an occlusion-aware reward mechanism: when the expert tracker is not occluded, the reward value of the expert tracker is restricted to the range 0 to 1, and when the expert tracker is occluded, the reward value of the expert tracker is set to -1.
3. The active target tracking method based on expert strategy guidance according to claim 1, wherein, when the expert target object and the expert tracker perform adversarial reinforcement learning, the expert target object outputs the escape strategy corresponding to the target, and an expert strategy pool is built from the strategies of the expert target object model.
4. The active target tracking method based on expert strategy guidance according to claim 3, wherein the specific process of obtaining the suggested action output by the expert tracker is:
inputting the first training data into the expert tracker and the expert target object respectively, pre-training the expert tracker through adversarial learning against the expert target object, the expert tracker outputting decision actions and the expert target object outputting the escape strategy corresponding to the target during pre-training, and building an expert strategy pool from the strategies of the expert target object model;
selecting a fine-tuning expert target object model from the expert strategy pool;
performing adversarial learning between the fine-tuning expert target object model and the pre-trained expert tracker to fine-tune the pre-trained expert tracker, the fine-tuned expert tracker outputting the suggested action.
5. The active target tracking method based on expert strategy guidance according to claim 3, wherein, in training the student tracker, the student tracker is guided by a student target object, the student target object model being an expert target object model from the expert strategy pool.
6. The active target tracking method based on expert strategy guidance according to claim 1, wherein the expert tracker and the student tracker each comprise a convolutional neural network and a sequence model; the convolutional neural network in the expert tracker encodes the local map and the relative motion trajectories of the agents to obtain encoded information, and the encoded information is processed by the sequence model to obtain the suggested action; the convolutional neural network in the student tracker encodes the scene observation image to obtain encoded information, and the encoded information is processed by the sequence model to obtain the decision action.
7. The active target tracking method based on expert strategy guidance according to claim 6, wherein the loss function of the student tracker comprises a loss in the feature space and a loss in the action space; the loss in the action space is calculated by the KL divergence, and the loss in the feature space is obtained by calculating the similarity between the output of the convolutional neural network in the expert tracker and that in the student tracker.
8. An active target tracking system based on expert strategy guidance, characterized by comprising:
a training data acquisition module for acquiring a scene observation image, a scene map, and agent poses;
a first-stage training module for obtaining, from the scene map and the agent poses, a local map of each agent and the motion trajectories of all agents in each local map as first training data, inputting the first training data into an expert tracker and an expert target object respectively, the expert target object and the expert tracker performing adversarial reinforcement learning, with the expert tracker outputting a suggested action;
a student tracker training module for inputting the scene observation image into a student tracker and training the student tracker with the suggested action as the label of the scene observation image to obtain a trained student tracker;
and an instance tracking module for recognizing acquired real-time scene images with the trained student tracker to obtain the agent's decision action.
9. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein, when the computer instructions are executed by the processor, the steps of the active target tracking method based on expert strategy guidance according to any one of claims 1-7 are completed.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, complete the steps of the active target tracking method based on expert strategy guidance according to any one of claims 1-7.
CN202211388347.9A 2022-11-08 2022-11-08 Active target tracking method and system based on expert strategy guidance Active CN115439510B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211388347.9A CN115439510B (en) 2022-11-08 2022-11-08 Active target tracking method and system based on expert strategy guidance


Publications (2)

Publication Number Publication Date
CN115439510A 2022-12-06
CN115439510B 2023-02-28

Family

ID=84252026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211388347.9A Active CN115439510B (en) 2022-11-08 2022-11-08 Active target tracking method and system based on expert strategy guidance

Country Status (1)

Country Link
CN (1) CN115439510B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102901500A (en) * 2012-09-17 2013-01-30 西安电子科技大学 Aircraft optimal path determination method based on mixed probability A star and agent
US20200033868A1 (en) * 2018-07-27 2020-01-30 GM Global Technology Operations LLC Systems, methods and controllers for an autonomous vehicle that implement autonomous driver agents and driving policy learners for generating and improving policies based on collective driving experiences of the autonomous driver agents
CN112215364A (en) * 2020-09-17 2021-01-12 天津(滨海)人工智能军民融合创新中心 Enemy-friend depth certainty strategy method and system based on reinforcement learning
CN112325884A (en) * 2020-10-29 2021-02-05 广西科技大学 ROS robot local path planning method based on DWA
WO2021073528A1 (en) * 2019-10-18 2021-04-22 华中光电技术研究所(中国船舶重工集团有限公司第七一七研究所) Intelligent decision-making method and system for unmanned surface vehicle
CN112908042A (en) * 2015-03-31 2021-06-04 深圳市大疆创新科技有限公司 System and remote control for operating an unmanned aerial vehicle
WO2021208771A1 (en) * 2020-04-18 2021-10-21 华为技术有限公司 Reinforced learning method and device
CN114154913A (en) * 2021-12-14 2022-03-08 上海启泷通教育科技有限公司 Method for acquiring big exercise data of infants and finely analyzing health data
CN115100238A (en) * 2022-05-24 2022-09-23 北京理工大学 Knowledge distillation-based light single-target tracker training method
CN115164890A (en) * 2022-06-09 2022-10-11 复旦大学 Swarm unmanned aerial vehicle autonomous motion planning method based on simulation learning
WO2022222490A1 (en) * 2021-04-21 2022-10-27 中国科学院深圳先进技术研究院 Robot control method and robot


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHAO LI et al., "Comparative Research of Dynamic Target Detection Algorithms Based on Static Background", 2021 Photonics & Electromagnetics Research Symposium (PIERS) *
ZHANG Wenxu et al., "Cooperative coverage with ground-air heterogeneous multi-agents based on reinforcement learning", CAAI Transactions on Intelligent Systems *
CHEN Jianhang, "End-to-end active tracking system based on deep reinforcement learning", Wanfang Data *

Also Published As

Publication number Publication date
CN115439510B (en) 2023-02-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant