CN114970714B - Track prediction method and system considering uncertain behavior mode of moving target

Info

Publication number: CN114970714B
Application number: CN202210582034.0A
Authority: CN (China)
Other versions: CN114970714A (Chinese)
Inventors: 白成超, 颜鹏, 郭继峰, 郑红星
Current and original assignee: Harbin Institute of Technology
Legal status: Active

Classifications

    • G06F 18/214 Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045 Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06N 3/08 Computing arrangements based on biological models; neural networks; learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Feedback Control In General (AREA)

Abstract

A track prediction method and system considering uncertain behavior modes of a moving target, relating to the technical field of moving target trajectory prediction and intended to solve the problems of poor performance and low accuracy of existing methods when predicting the motion trajectory of a target whose behavior mode is uncertain. The key technical points of the invention include: first, historical motion trajectory data of the moving target are collected as a training data set; next, a moving target behavior decision model and a moving target behavior preference model are established, and their parameters are learned from the training data set by supervised learning; the parameters of the behavior decision model and the behavior preference model are then alternately refined from the training data set by inverse reinforcement learning; finally, the learned behavior decision model is used to simulate the behavior decision process of the moving target and thereby predict its motion trajectory. The method and system can significantly improve the trajectory prediction accuracy for the moving target.

Description

Track prediction method and system considering uncertain behavior mode of moving target
Technical Field
The invention relates to the technical field of moving target track prediction, in particular to a track prediction method and a track prediction system considering uncertain behavior modes of a moving target.
Background
Existing methods for predicting the trajectory of a moving target fall roughly into three categories: methods based on a motion model of the target, data-driven methods based on the target's historical trajectory data, and methods based on simulating the target's trajectory planning process. Methods of the first category usually predict the target trajectory by filtering estimation, e.g., Kalman filtering, unscented Kalman filtering, or extended Kalman filtering, according to an established target motion model and the observed target motion state. Such methods depend heavily on the target motion model: if the established model is inaccurate, or an accurate motion model cannot be obtained, the target trajectory cannot be predicted accurately. Methods of the second category usually construct a trajectory prediction model with a deep neural network, a Gaussian mixture model, a hidden Markov model, or the like; model parameters are learned from collected trajectory data, so that the target's motion behavior characteristics are mapped into the learned parameters, and the trajectory is then predicted by matching and generation. Although such data-driven methods can predict the trajectory in the absence of a target motion model, they assume that the target has a deterministic behavior mode consistent with the behavior reflected in the collected trajectory data set; since an uncertain behavior pattern is difficult to match directly, these methods have difficulty predicting the trajectory of a target with an uncertain behavior mode. Methods of the third category predict the trajectory by simulating the target's trajectory planning process, and generally assume that the target behaves optimally; in the real world, however, the behavior of most moving targets is not optimal, so the target's behavior planning process cannot be simulated with optimality criteria, and the trajectory cannot be predicted from it. An effective remedy is to learn the target's behavior mode from its historical trajectory data by inverse reinforcement learning and then generate the target's trajectory by simulating its behavior planning, thereby achieving trajectory prediction.
Although inverse reinforcement learning can imitate a non-optimal behavior mode of the target, for a moving target with an uncertain behavior mode moving in a complex environment, existing inverse-reinforcement-learning-based trajectory prediction methods cannot effectively imitate the uncertain behavior mode, making high prediction accuracy difficult to achieve.
Disclosure of Invention
In order to solve the problems of poor effect and low precision of the existing method for predicting the target motion trail with uncertain behavior modes, the invention provides a trail prediction method and a trail prediction system considering the uncertain behavior modes of a moving target, which can fully learn the uncertain behavior modes of the target from target motion trail data so as to accurately predict the motion trail of the target.
According to an aspect of the present invention, there is provided a trajectory prediction method considering an uncertain behavior pattern of a moving object, the method comprising the steps of:
step one, collecting historical motion trail data of a moving target in a complex environment, and taking the historical motion trail data as a training data set;
step two, establishing a convolutional-neural-network-based moving target behavior decision model and moving target behavior preference model according to the environment in which the moving target is located and the behavior characteristics of the moving target;
step three, learning model parameters of the moving target behavior decision model and the moving target behavior preference model from the training data set in a supervised learning mode;
step four, taking the model parameters obtained in the previous step as initialization parameters, and alternately learning the final model parameters of the moving target behavior decision model and the moving target behavior preference model from the training data set by inverse reinforcement learning, to obtain a trained moving target behavior decision model;
step five, simulating the behavior decision process of the moving target with the trained moving target behavior decision model, and predicting the motion trajectory of the moving target.
Further, in step two, the convolutional-neural-network-based moving target behavior decision model is denoted π_θ(a|s) and the convolutional-neural-network-based moving target behavior preference model is denoted r_ψ(s, a), where s denotes the environmental state observed by the moving target, a denotes a behavior action of the moving target, θ denotes the model parameters of the behavior decision model, and ψ denotes the model parameters of the behavior preference model. The inputs of both models are the surrounding environment information o(M_t) observed by the moving target at its current position and the relative position o(b_g) of the moving target with respect to its destination; the output of the behavior decision model is the probability of the moving target selecting each behavior action, and the output of the behavior preference model is the reward value for performing action a in environmental state s.
Further, the specific step of learning the model parameters by supervised learning in step three comprises:

counting, over all rasterized motion trajectories in the training data set, the frequency f(c_kl) of each grid cell of the environment grid map, using a statistic of the form:

f(c_{kl}) = \min(N_{kl}/N,\ 1)

where N_kl denotes the number of times the motion trajectories pass through grid cell c_kl of the environment grid map, and N denotes the total number of motion trajectories;

pre-training the moving target behavior preference model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior preference model:

\mathcal{L}_r(\psi) = \sum_{k=1}^{C_X} \sum_{l=1}^{C_Y} \left( r_\psi(s(c_{kl})) - f(c_{kl}) \right)^2

where C_X and C_Y denote the number of grid cells of the environment grid map in the X-axis and Y-axis directions, respectively; s(c_kl) denotes the input state observed when the moving target is located in grid cell c_kl of the environment grid map; and r_ψ(s(c_kl)) denotes the output value of the behavior preference model when the input state is s(c_kl);

pre-training the moving target behavior decision model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior decision model:

\mathcal{L}_\pi(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \pi_\theta(a_t^i \mid s_t^i)

where τ_i denotes the i-th moving target trajectory in the training data set D_demo; T_i denotes the total number of observation instants; a_t^i denotes the action executed by the target at instant t; and s_t^i denotes the target motion state observed at instant t.
Further, the specific steps of the fourth step include:
step 4.1: taking the model parameters obtained in the previous step as initialization parameters;
step 4.2: sampling M moving target trajectories D_traj using the behavior decision model π_θ(a|s), sampling M moving target trajectories D̂_demo from the training data set, and merging them to form the sampled trajectory set D_sample;
step 4.3: updating the model parameters ψ of the behavior preference model r_ψ(s, a) by gradient descent;
step 4.4: updating the model parameters θ of the behavior decision model π_θ(a|s) using a reinforcement learning algorithm and the sampled trajectories D_sample;
step 4.5: repeating steps 4.2 to 4.4 until the training period reaches the preset maximum training period.
Further, the gradient of the model parameters ψ of the behavior preference model r_ψ(s, a) in step 4.3 is calculated using a formula of the form:

\nabla_\psi \mathcal{L}_\psi = \sum_{\tau_j \in D_{sample}} \frac{\omega_j}{\sum_{j'} \omega_{j'}} \nabla_\psi R_\psi(\tau_j) - \frac{1}{M} \sum_{\tau_i \in \hat{D}_{demo}} \nabla_\psi R_\psi(\tau_i)

where R_ψ(τ_i) denotes the cumulative reward value of the behavior preference model on trajectory τ_i; R_ψ(τ_j) denotes the cumulative reward value on trajectory τ_j; and ω_j denotes the importance sampling factor corresponding to trajectory τ_j.
According to another aspect of the present invention, there is provided a trajectory prediction system considering an uncertain behavior manner of a moving object, the system comprising:
A data collection module configured to collect historical motion trajectory data of a moving object in a complex environment as a training data set;
The model pre-training module is configured to establish a convolutional-neural-network-based moving target behavior decision model and moving target behavior preference model according to the environment in which the moving target is located and the behavior characteristics of the moving target, and to learn the model parameters of both models from the training data set by supervised learning;
The behavior decision model training module is configured to alternately learn the moving target behavior decision model and the final model parameters of the moving target behavior preference model from the training data set by adopting an inverse reinforcement learning mode by taking the model parameters obtained by learning as initialization parameters so as to obtain a trained moving target behavior decision model;
and the track prediction module is configured to simulate a behavior decision process of the moving target by using the trained moving target behavior decision model and predict the motion track of the moving target.
Further, the convolutional-neural-network-based moving target behavior decision model in the model pre-training module is denoted π_θ(a|s) and the convolutional-neural-network-based moving target behavior preference model is denoted r_ψ(s, a), where a denotes a behavior action of the moving target, s denotes the environmental state observed by the moving target, θ denotes the model parameters of the behavior decision model, and ψ denotes the model parameters of the behavior preference model. The inputs of both models are the surrounding environment information o(M_t) observed by the moving target at its current position and the relative position o(b_g) of the moving target with respect to its destination; the output of the behavior decision model is the probability of the moving target selecting each behavior action, and the output of the behavior preference model is the reward value for performing action a in environmental state s.
Further, the specific step of learning the model parameters by supervised learning in the model pre-training module comprises:

counting, over all rasterized motion trajectories in the training data set, the frequency f(c_kl) of each grid cell of the environment grid map, using a statistic of the form:

f(c_{kl}) = \min(N_{kl}/N,\ 1)

where N_kl denotes the number of times the motion trajectories pass through grid cell c_kl of the environment grid map, and N denotes the total number of motion trajectories;

pre-training the moving target behavior preference model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior preference model:

\mathcal{L}_r(\psi) = \sum_{k=1}^{C_X} \sum_{l=1}^{C_Y} \left( r_\psi(s(c_{kl})) - f(c_{kl}) \right)^2

where C_X and C_Y denote the number of grid cells of the environment grid map in the X-axis and Y-axis directions, respectively; s(c_kl) denotes the input state observed when the moving target is located in grid cell c_kl of the environment grid map; and r_ψ(s(c_kl)) denotes the output value of the behavior preference model when the input state is s(c_kl);

pre-training the moving target behavior decision model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior decision model:

\mathcal{L}_\pi(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \pi_\theta(a_t^i \mid s_t^i)

where τ_i denotes the i-th moving target trajectory in the training data set D_demo; T_i denotes the total number of observation instants; a_t^i denotes the action executed by the target at instant t; and s_t^i denotes the target motion state observed at instant t.
Further, the specific steps of model training in the behavior decision model training module comprise:
step 4.1: taking the model parameters obtained by the supervised pre-training as initialization parameters;
step 4.2: sampling M moving target trajectories D_traj using the behavior decision model π_θ(a|s), sampling M moving target trajectories D̂_demo from the training data set, and merging them to form the sampled trajectory set D_sample;
step 4.3: updating the model parameters ψ of the behavior preference model r_ψ(s, a) by gradient descent, where the gradient of ψ is calculated using a formula of the form:

\nabla_\psi \mathcal{L}_\psi = \sum_{\tau_j \in D_{sample}} \frac{\omega_j}{\sum_{j'} \omega_{j'}} \nabla_\psi R_\psi(\tau_j) - \frac{1}{M} \sum_{\tau_i \in \hat{D}_{demo}} \nabla_\psi R_\psi(\tau_i)

where R_ψ(τ_i) denotes the cumulative reward value of the behavior preference model on trajectory τ_i; R_ψ(τ_j) denotes the cumulative reward value on trajectory τ_j; and ω_j denotes the importance sampling factor corresponding to trajectory τ_j;
step 4.4: updating the model parameters θ of the behavior decision model π_θ(a|s) using a reinforcement learning algorithm and the sampled trajectories D_sample;
step 4.5: repeating steps 4.2 to 4.4 until the training period reaches the preset maximum training period.
The beneficial technical effects of the invention are as follows:
The invention predicts the motion trajectory of a moving target in a complex environment by imitating the target's uncertain behavior mode, and can solve the trajectory prediction problem when the target motion model is unknown and the moving target exhibits an uncertain behavior mode. Compared with traditional methods, the invention has the following advantages:
1) By learning from the target's historical motion trajectory data under an inverse reinforcement learning framework, a behavior preference model and a behavior decision model of the moving target can be learned, and a motion trajectory conforming to the target's behavior mode can then be predicted by simulating the target's behavior decision process;
2) Through supervised learning on the target's historical motion trajectory data, the behavior characteristics of the moving target can be fully mined, optimizing the subsequent learning of the behavior preference model and behavior decision model, so that the uncertain behavior mode of the moving target can be learned and the trajectory prediction accuracy improved.
Drawings
The invention may be better understood by reference to the following description taken in conjunction with the accompanying drawings, which are included to provide a further illustration of the preferred embodiments of the invention and to explain the principles and advantages of the invention, together with the detailed description below.
Fig. 1 is a flowchart of a track prediction method considering uncertain behavior patterns of a moving object according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a moving object behavior decision model structure in an embodiment of the invention.
Fig. 3 is a schematic diagram of an experimental verification scenario in an embodiment of the present invention.
FIG. 4 is a schematic diagram of the supervised training process of the moving target behavior decision model and the moving target behavior preference model in an embodiment of the invention.
FIG. 5 is a schematic diagram of a training process of a mobile object behavior preference model under an inverse reinforcement learning architecture according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of motion trail prediction of a moving object in an embodiment of the present invention.
Fig. 7 is a schematic diagram of a track prediction system considering uncertain behavior of a moving object according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, exemplary embodiments or examples of the present invention will be described below with reference to the accompanying drawings. It is apparent that the described embodiments or examples are only implementations or examples of a part of the invention, not all. All other embodiments or examples, which may be made by one of ordinary skill in the art without undue burden, are intended to be within the scope of the present invention based on the embodiments or examples herein.
In the invention, historical motion trajectory data of the moving target are first collected as a training data set; a moving target behavior decision model and a moving target behavior preference model are then established; the parameters of both models are learned from the training data set by supervised learning; the parameters of the behavior decision model and the behavior preference model are then alternately refined from the training data set by inverse reinforcement learning; finally, the learned behavior decision model is used to simulate the behavior decision process of the moving target, thereby predicting its motion trajectory.
The embodiment of the invention provides a track prediction method considering uncertain behavior modes of a moving target; as shown in fig. 1, the method comprises the following steps:
step one: collecting historical motion trail data of a moving target in a complex environment, and taking the historical motion trail data as a training data set;
According to the embodiment of the invention, the collected historical motion trajectory data of the moving target in the complex environment are organized into a training data set D_demo = {τ_1, τ_2, …, τ_N}, where τ_i = {(s_t^i, a_t^i)}_{t=1}^{T_i} denotes the i-th moving target trajectory in D_demo, comprising the target motion states s_t^i observed at T_i instants and the actions a_t^i executed by the target, and N denotes the number of trajectories in D_demo.
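For concreteness, such a training set can be represented as in the following minimal Python sketch; the field names are illustrative rather than taken from the patent, and each observation instant is assumed to yield one state-action pair.

```python
# Minimal sketch of the training data set D_demo; field names are illustrative.
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Trajectory:
    states: List[Any]    # target motion states s_t^i observed at T_i instants
    actions: List[int]   # actions a_t^i executed by the target (index of one of 8 moves)

# D_demo = {tau_1, tau_2, ..., tau_N}
demo_dataset: List[Trajectory] = []
```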
Step two: establishing a moving target behavior decision model and a moving target behavior preference model according to the environmental characteristics of the moving target and the behavior characteristics of the moving target;
According to an embodiment of the invention, the moving target behavior decision model π_θ(a|s) is first built on a convolutional neural network, as shown in fig. 2, where a denotes a behavior action of the target, s = [o(M_t), o(b_g)] denotes the environmental state observed by the target, and θ denotes the parameters of the behavior decision model. Since the behavior decision process of the moving target is influenced by its surrounding environment and by its destination position, these two factors are taken as the model inputs: the environment information o(M_t) observed by the target around its current position, and the relative position o(b_g) of the moving target with respect to its destination. The behavior decision model consists of 5 neural network layers. The first two layers are two-dimensional convolutional layers responsible for encoding the surrounding environment information o(M_t), extracting its key features by convolution. The features encoded by the two convolutional layers are then concatenated with the relative position o(b_g) to form fused information containing the target's local environment information and global destination information; this fused information is processed and deeply encoded by 3 fully connected layers, finally yielding the probability of the target selecting each behavior action. In this embodiment the moving target has 8 behavior actions, so the output dimension is 8. A minimal sketch of such a network is given below.
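The following PyTorch sketch illustrates this structure. The local map size (1 x 15 x 15), channel counts, kernel sizes, and hidden widths are assumptions; the patent fixes only the overall layout (two convolutional layers encoding o(M_t), fusion with o(b_g), three fully connected layers, and an 8-way action output).

```python
import torch
import torch.nn as nn

class BehaviorDecisionModel(nn.Module):
    """Sketch of pi_theta(a|s); layer sizes are illustrative assumptions."""

    def __init__(self, map_size: int = 15, n_actions: int = 8):
        super().__init__()
        # Two 2-D convolutional layers encode the observed local map o(M_t).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        enc_dim = 32 * map_size * map_size
        # Three fully connected layers process the fused local/global information.
        self.head = nn.Sequential(
            nn.Linear(enc_dim + 2, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, local_map: torch.Tensor, goal_rel: torch.Tensor) -> torch.Tensor:
        # local_map: (B, 1, H, W) grid around the target; goal_rel: (B, 2) offset o(b_g).
        feat = torch.cat([self.encoder(local_map), goal_rel], dim=1)
        return torch.softmax(self.head(feat), dim=1)  # probability of each of the 8 actions
```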
Second, the moving target behavior preference model r_ψ(s, a) is built on a convolutional neural network, where ψ denotes the parameters of the behavior preference model. The behavior preference model established in this embodiment is essentially identical in structure to the behavior decision model shown in fig. 2, except that its output layer has a single output unit with a Tanh activation function, so as to limit the output value to the interval (-1, 1).
In the present embodiment, the behavior preference model r_ψ(s, a) is further simplified to r_ψ(s'), where s' denotes the state of the moving target after performing action a in state s; a corresponding sketch follows.
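Under the same assumptions as above, the preference model can be sketched by reusing the BehaviorDecisionModel trunk with a single Tanh output unit:

```python
import torch

# Sketch of the behavior preference model r_psi(s'): same trunk, one Tanh output.
class BehaviorPreferenceModel(BehaviorDecisionModel):
    def __init__(self, map_size: int = 15):
        super().__init__(map_size=map_size, n_actions=1)

    def forward(self, local_map: torch.Tensor, goal_rel: torch.Tensor) -> torch.Tensor:
        feat = torch.cat([self.encoder(local_map), goal_rel], dim=1)
        return torch.tanh(self.head(feat)).squeeze(-1)  # reward value in (-1, 1)
```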
Step three: learning a moving target behavior preference model and a moving target behavior decision model from the training data set in a supervised learning mode;
According to the embodiment of the invention, the distribution characteristics of the motion trajectories in the training data set D_demo are first counted. Specifically, after all motion trajectories in D_demo are rasterized, the frequency f(c_kl) of each grid cell c_kl of the environment map M_t is counted using a statistic of the form:

f(c_{kl}) = \min(N_{kl}/N,\ 1)

where N_kl denotes the number of times the motion trajectories in D_demo pass through grid cell c_kl of the grid map M_t, N denotes the total number of trajectories, and the function min() takes the minimum of its arguments; a small computational sketch follows.
Then, the moving target behavior preference model is pre-trained by optimizing a loss function. Specifically, it is pre-trained by minimizing a loss of the form:

\mathcal{L}_r(\psi) = \sum_{k=1}^{C_X} \sum_{l=1}^{C_Y} \left( r_\psi(s(c_{kl})) - f(c_{kl}) \right)^2

where C_X and C_Y denote the number of grid cells of the grid map M_t in the X-axis and Y-axis directions, respectively; s(c_kl) denotes the input state observed when the moving target is located in grid cell c_kl of M_t; and r_ψ(s(c_kl)) denotes the output value of the behavior preference model when the input state is s(c_kl).

Then, the moving target example trajectory probability distribution model (i.e., the behavior decision model π_θ(a|s), fitted to the example trajectories) is pre-trained by minimizing a loss of the form:

\mathcal{L}_\pi(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \pi_\theta(a_t^i \mid s_t^i)

Sketches of both pre-training losses are given below.
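Both pre-training objectives can be sketched as follows, reusing the network sketches above; the batching and the rasterized inputs (per-cell states, demonstration tensors) are assumptions.

```python
import torch
import torch.nn.functional as F

def reward_pretrain_loss(reward_net, cell_maps, cell_goals, freq):
    # Squared error between r_psi(s(c_kl)) and the visit frequency f(c_kl).
    pred = reward_net(cell_maps, cell_goals)          # one value per grid cell
    return F.mse_loss(pred, freq)

def policy_pretrain_loss(policy_net, maps, goals, actions):
    # Behavior cloning: negative log-likelihood of the demonstrated actions.
    probs = policy_net(maps, goals)                   # (B, 8) action probabilities
    logp = torch.log(probs.gather(1, actions.unsqueeze(1)).squeeze(1) + 1e-8)
    return -logp.mean()
```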
Step four: alternately learning parameters of the moving target behavior decision model and the moving target behavior preference model from the training data set by adopting an inverse reinforcement learning mode;
According to the embodiment of the invention, step four is realized by the following sub-steps:
step 4.1: taking the model parameters θ of the moving target behavior decision model learned in step three as the initialization parameters of the decision model π_θ(a|s), and the model parameters ψ of the moving target behavior preference model learned in step three as the initialization parameters of the preference model r_ψ(s, a);

step 4.2: sampling M moving target trajectories D_traj using the behavior decision model π_θ(a|s);

step 4.3: sampling M moving target trajectories D̂_demo from the training data set D_demo;

step 4.4: merging the M trajectories D_traj with the M training trajectories D̂_demo to form the sampled trajectory set D_sample, i.e., D_sample ← D_traj ∪ D̂_demo;

step 4.5: updating the parameters of the behavior preference model r_ψ(s, a) by gradient descent, where the gradient of ψ is calculated using a formula of the form (see the sketch after this step list):

\nabla_\psi \mathcal{L}_\psi = \sum_{\tau_j \in D_{sample}} \frac{\omega_j}{\sum_{j'} \omega_{j'}} \nabla_\psi R_\psi(\tau_j) - \frac{1}{M} \sum_{\tau_i \in \hat{D}_{demo}} \nabla_\psi R_\psi(\tau_i)

where R_ψ(τ) denotes the cumulative reward value of the behavior preference model r_ψ(s, a) on trajectory τ, and ω_j denotes the importance sampling factor corresponding to trajectory τ_j, e.g., ω_j ∝ exp(R_ψ(τ_j))/q(τ_j) with q(τ_j) the probability of τ_j under the sampling policy;

step 4.6: updating the parameters of the behavior decision model π_θ(a|s) using a reinforcement learning algorithm and the sampled trajectories D_sample;

step 4.7: judging whether the training period has reached the maximum training period E_max; if not, returning to step 4.2; if so, ending the training.
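The following sketch instantiates steps 4.2 to 4.7 under the gradient form given above, reusing the earlier network sketches. Here sample_policy_rollouts, sample_demo_batch, log_q (log-probabilities of the sampled trajectories under the sampling policy), and rl_update are hypothetical helpers, since the patent fixes only that "a reinforcement learning algorithm" performs step 4.6.

```python
import torch

def reward_loss(reward_net, demo_trajs, sample_trajs, log_q):
    # Cumulative reward R_psi(tau) = sum_t r_psi(s'_t) along a trajectory.
    def traj_reward(traj):
        maps, goals = traj
        return reward_net(maps, goals).sum()

    r_demo = torch.stack([traj_reward(t) for t in demo_trajs])
    r_samp = torch.stack([traj_reward(t) for t in sample_trajs])
    log_w = r_samp - log_q          # log importance weights log(omega_j)
    # Minimizing this loss reproduces the importance-weighted gradient above:
    # grad = sum_j softmax(log_w)_j * grad R(tau_j) - mean_i grad R(tau_i).
    return torch.logsumexp(log_w, dim=0) - r_demo.mean()

def train_irl(policy_net, reward_net, demo_set, m, e_max, lr=1e-4):
    opt_r = torch.optim.Adam(reward_net.parameters(), lr=lr)
    for epoch in range(e_max):                          # step 4.7: stop at E_max
        d_traj = sample_policy_rollouts(policy_net, m)  # step 4.2 (hypothetical helper)
        d_demo = sample_demo_batch(demo_set, m)         # step 4.3 (hypothetical helper)
        d_sample = d_traj + d_demo                      # step 4.4: D_sample
        opt_r.zero_grad()
        reward_loss(reward_net, d_demo, d_sample, log_q(d_sample)).backward()
        opt_r.step()                                    # step 4.5: update psi
        rl_update(policy_net, reward_net, d_sample)     # step 4.6: update theta (any RL algorithm)
```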
Step five: simulating the behavior decision process of the moving target using the behavior decision model trained in step four, and predicting the motion trajectory of the moving target; a rollout sketch follows.
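A rollout of the trained decision model can be sketched as follows; observe, env.step, and env.at_destination are assumed helpers for a gridworld such as that of fig. 3.

```python
import torch

def predict_trajectory(policy_net, env, state, max_steps: int = 500):
    path = [state.position]
    for _ in range(max_steps):
        local_map, goal_rel = observe(env, state)    # build o(M_t) and o(b_g)
        probs = policy_net(local_map, goal_rel)      # action probabilities
        action = torch.distributions.Categorical(probs).sample().item()
        state = env.step(state, action)              # execute one of the 8 moves
        path.append(state.position)
        if env.at_destination(state):
            break
    return path                                      # predicted motion trajectory
```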
The technical effects of the invention are further verified by experiment.
The correctness and effectiveness of the invention are verified by digital simulation. First, a virtual complex environment is constructed in Python, as shown in fig. 3, where the inaccessible area denotes the region the moving target cannot enter, the accessible area denotes the region it can enter, and the destination area denotes the destination position of the moving target. The moving target moves from a start position (i.e., (0 m, 0 m) in fig. 3) to its destination position; its motion behavior is not optimal but carries a certain uncertainty, and the target trajectory shown in fig. 3 is one example trajectory of the moving target from the start position to the destination area. In the complex environment of fig. 3, the experiment randomly generated 500 moving target trajectories as the training data set D_demo. The simulation software environment was Windows 10 + Python 3.7, and the hardware environment was an AMD Ryzen 5 3550H CPU + 16.0 GB RAM.
The experiments first verify whether the training processes of step three and step four converge.
Fig. 4 shows the loss curves during the training of the moving target behavior preference model and the moving target example trajectory probability distribution model in step three. As can be seen from fig. 4, the training of both models was run for a total of 1000 training periods. During the pre-training of the behavior preference model, once the training period exceeds 400 the loss value \mathcal{L}_r(ψ) essentially stops decreasing, indicating that the model training has essentially converged. During the training of the example trajectory probability distribution model, once the training period exceeds 800 the loss value \mathcal{L}_π(θ) essentially stops decreasing, indicating that the model training has essentially converged.
Fig. 5 shows the loss curve when training the moving target behavior preference model with the inverse reinforcement learning method in step four. As can be seen from fig. 5, the training process lasted 125 training periods in total, and as the training period increases the absolute value of the behavior preference model's loss gradually approaches 0, indicating that the training process gradually converges, i.e., the behavior preference model learned from the training data set gradually approaches the behavior mode of the real moving target.
The above results indicate that the training process of the third and fourth steps in the embodiment of the present invention is convergent, that is, stable model parameters can be learned from the training data set.
Further, one prediction run of the moving target motion trajectory verifies that the trajectory predicted by the method conforms to the target's behavior mode. Specifically, the motion behavior decision process of the target is simulated using the behavior decision model trained in step four; the simulated motion trajectory of the target is shown in fig. 6.
As can be seen from fig. 6, the target motion trajectory simulated by the behavior decision model (i.e., the predicted trajectory shown in fig. 6) is broadly similar to the target example trajectory. In the example trajectory the moving target enters the accessible area of the environment 4 times; the trajectory predicted by the method successfully reproduces the first 3 entries, while on the fourth the moving target is already very close to the destination, so the predicted trajectory heads directly to the destination position. Considering that the behavior of the moving target carries a certain uncertainty, its motion trajectory is difficult to predict exactly; the trajectory predicted by the proposed method is nevertheless very close to the target's behavior mode, indicating that the proposed method can predict a motion trajectory conforming to the behavior mode of the moving target.
These results show that, for a moving target with an uncertain behavior mode moving in a complex environment, the method can learn the target's behavior decision model and behavior preference model from collected target motion trajectory data, and can then predict a motion trajectory conforming to the target's behavior mode by simulating its behavior decision process. The disclosed method realizes motion trajectory prediction for moving targets with uncertain behavior modes in complex environments, and provides a new technical approach for moving target trajectory prediction.
Another embodiment of the invention provides a track prediction system considering uncertain behavior modes of a moving target; as shown in fig. 7, the system includes:
a data collection module 10 configured to collect historical motion trajectory data of a moving object in a complex environment as a training data set;
The model pre-training module 20 is configured to establish a moving target behavior decision model and a moving target behavior preference model based on a convolutional neural network according to the environment in which the moving target is located and behavior characteristics thereof; model parameters of a moving target behavior decision model and a moving target behavior preference model are learned from a training data set in a supervised learning mode;
A behavior decision model training module 30 configured to alternately learn the final model parameters of the moving target behavior decision model and the moving target behavior preference model from the training data set by using the model parameters obtained by learning as initialization parameters in an inverse reinforcement learning manner, so as to obtain a trained moving target behavior decision model;
The trajectory prediction module 40 is configured to simulate the behavior decision process of the moving object by using the trained moving object behavior decision model, and predict the motion trajectory of the moving object.
In this embodiment, optionally, the convolutional-neural-network-based moving target behavior decision model in the model pre-training module 20 is denoted π_θ(a|s) and the convolutional-neural-network-based moving target behavior preference model is denoted r_ψ(s, a), where a denotes a behavior action of the moving target, s denotes the environmental state observed by the moving target, θ denotes the model parameters of the behavior decision model, and ψ denotes the model parameters of the behavior preference model. The inputs of both models are the surrounding environment information o(M_t) observed by the moving target at its current position and the relative position o(b_g) of the moving target with respect to its destination; the output of the behavior decision model is the probability of the moving target selecting each behavior action, and the output of the behavior preference model is the reward value for performing action a in environmental state s.
In this embodiment, optionally, the specific steps of learning the model parameters by supervised learning in the model pre-training module 20 include:

counting, over all rasterized motion trajectories in the training data set, the frequency f(c_kl) of each grid cell of the environment grid map, using a statistic of the form:

f(c_{kl}) = \min(N_{kl}/N,\ 1)

where N_kl denotes the number of times the motion trajectories pass through grid cell c_kl of the environment grid map, and N denotes the total number of motion trajectories;

pre-training the moving target behavior preference model by minimizing a loss function of the form below, to obtain the pre-trained model parameters:

\mathcal{L}_r(\psi) = \sum_{k=1}^{C_X} \sum_{l=1}^{C_Y} \left( r_\psi(s(c_{kl})) - f(c_{kl}) \right)^2

where C_X and C_Y denote the number of grid cells of the environment grid map in the X-axis and Y-axis directions, respectively; s(c_kl) denotes the input state observed when the moving target is located in grid cell c_kl of the environment grid map; and r_ψ(s(c_kl)) denotes the output value of the behavior preference model when the input state is s(c_kl);

pre-training the moving target behavior decision model by minimizing a loss function of the form below, to obtain the pre-trained model parameters:

\mathcal{L}_\pi(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \pi_\theta(a_t^i \mid s_t^i)

where τ_i denotes the i-th moving target trajectory in the training data set D_demo; T_i denotes the total number of observation instants; a_t^i denotes the action executed by the target at instant t; and s_t^i denotes the target motion state observed at instant t.
In this embodiment, optionally, the specific steps of model training in the behavior decision model training module 30 include:
step 4.1: taking the model parameters obtained by the supervised pre-training as initialization parameters;
step 4.2: sampling M moving target trajectories D_traj using the behavior decision model π_θ(a|s), sampling M moving target trajectories D̂_demo from the training data set, and merging them to form the sampled trajectory set D_sample;
step 4.3: updating the model parameters ψ of the behavior preference model r_ψ(s, a) by gradient descent, where the gradient of ψ is calculated using a formula of the form:

\nabla_\psi \mathcal{L}_\psi = \sum_{\tau_j \in D_{sample}} \frac{\omega_j}{\sum_{j'} \omega_{j'}} \nabla_\psi R_\psi(\tau_j) - \frac{1}{M} \sum_{\tau_i \in \hat{D}_{demo}} \nabla_\psi R_\psi(\tau_i)

where R_ψ(τ_i) denotes the cumulative reward value of the behavior preference model on trajectory τ_i; R_ψ(τ_j) denotes the cumulative reward value on trajectory τ_j; and ω_j denotes the importance sampling factor corresponding to trajectory τ_j;
step 4.4: updating the model parameters θ of the behavior decision model π_θ(a|s) using a reinforcement learning algorithm and the sampled trajectories D_sample;
step 4.5: repeating steps 4.2 to 4.4 until the training period reaches the preset maximum training period.
The functions of the track prediction system of this embodiment correspond to those of the track prediction method described above; details are therefore not repeated here, and reference may be made to the method embodiments above.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.

Claims (5)

1. A track prediction method considering an uncertain behavior mode of a moving target, characterized by comprising the following steps:
step one, collecting historical motion trail data of a moving target in a complex environment, and taking the historical motion trail data as a training data set;
step two, establishing a convolutional-neural-network-based moving target behavior decision model and moving target behavior preference model according to the environment in which the moving target is located and the behavior characteristics of the moving target; wherein the convolutional-neural-network-based behavior decision model is denoted π_θ(a|s) and the convolutional-neural-network-based behavior preference model is denoted r_ψ(s, a), where s denotes the environmental state observed by the moving target, a denotes a behavior action of the moving target, θ denotes the model parameters of the behavior decision model, and ψ denotes the model parameters of the behavior preference model; the inputs of both models are the surrounding environment information o(M_t) observed by the moving target at its current position and the relative position o(b_g) of the moving target with respect to its destination; the output of the behavior decision model is the probability of the moving target selecting each behavior action; and the output of the behavior preference model is the reward value for performing action a in environmental state s;
step three, learning model parameters of the moving target behavior decision model and the moving target behavior preference model from the training data set in a supervised learning mode;
step four, taking the model parameters obtained in the previous step as initialization parameters, and alternately learning the final model parameters of the moving target behavior decision model and the moving target behavior preference model from the training data set by inverse reinforcement learning, to obtain a trained moving target behavior decision model; specifically comprising:
step 4.1: taking the model parameters obtained in the previous step as initialization parameters;
step 4.2: sampling M moving target trajectories D_traj using the behavior decision model π_θ(a|s), sampling M moving target trajectories D̂_demo from the training data set, and merging them to form the sampled trajectory set D_sample;
step 4.3: updating the model parameters ψ of the behavior preference model r_ψ(s, a) by gradient descent;
step 4.4: updating the model parameters θ of the behavior decision model π_θ(a|s) using a reinforcement learning algorithm and the sampled trajectories D_sample;
step 4.5: repeating steps 4.2 to 4.4 until the training period reaches the preset maximum training period;
and fifthly, simulating a behavior decision process of the moving target by using the trained moving target behavior decision model, and predicting the motion trail of the moving target.
2. The track prediction method considering an uncertain behavior mode of a moving target according to claim 1, wherein the specific step of learning the model parameters by supervised learning in step three comprises:

counting, over all rasterized motion trajectories in the training data set, the frequency f(c_kl) of each grid cell of the environment grid map, using a statistic of the form:

f(c_{kl}) = \min(N_{kl}/N,\ 1)

where N_kl denotes the number of times the motion trajectories pass through grid cell c_kl of the environment grid map, and N denotes the total number of motion trajectories;

pre-training the moving target behavior preference model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior preference model:

\mathcal{L}_r(\psi) = \sum_{k=1}^{C_X} \sum_{l=1}^{C_Y} \left( r_\psi(s(c_{kl})) - f(c_{kl}) \right)^2

where C_X and C_Y denote the number of grid cells of the environment grid map in the X-axis and Y-axis directions, respectively; s(c_kl) denotes the input state observed when the moving target is located in grid cell c_kl of the environment grid map; and r_ψ(s(c_kl)) denotes the output value of the behavior preference model when the input state is s(c_kl);

pre-training the moving target behavior decision model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior decision model:

\mathcal{L}_\pi(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \pi_\theta(a_t^i \mid s_t^i)

where τ_i denotes the i-th moving target trajectory in the training data set D_demo; T_i denotes the total number of observation instants; a_t^i denotes the action executed by the target at instant t; and s_t^i denotes the target motion state observed at instant t.
3. The track prediction method considering an uncertain behavior mode of a moving target according to claim 2, wherein the gradient of the model parameters ψ of the behavior preference model r_ψ(s, a) in step 4.3 is calculated using a formula of the form:

\nabla_\psi \mathcal{L}_\psi = \sum_{\tau_j \in D_{sample}} \frac{\omega_j}{\sum_{j'} \omega_{j'}} \nabla_\psi R_\psi(\tau_j) - \frac{1}{M} \sum_{\tau_i \in \hat{D}_{demo}} \nabla_\psi R_\psi(\tau_i)

where R_ψ(τ_i) denotes the cumulative reward value of the behavior preference model on trajectory τ_i; R_ψ(τ_j) denotes the cumulative reward value on trajectory τ_j; and ω_j denotes the importance sampling factor corresponding to trajectory τ_j.
4. A track prediction system considering an uncertain behavior mode of a moving target, characterized by comprising:
A data collection module configured to collect historical motion trajectory data of a moving object in a complex environment as a training data set;
The model pre-training module is configured to establish a convolutional-neural-network-based moving target behavior decision model and moving target behavior preference model according to the environment in which the moving target is located and the behavior characteristics of the moving target, and to learn the model parameters of both models from the training data set by supervised learning; wherein the convolutional-neural-network-based behavior decision model is denoted π_θ(a|s) and the convolutional-neural-network-based behavior preference model is denoted r_ψ(s, a), where a denotes a behavior action of the moving target, s denotes the environmental state observed by the moving target, θ denotes the model parameters of the behavior decision model, and ψ denotes the model parameters of the behavior preference model; the inputs of both models are the surrounding environment information o(M_t) observed by the moving target at its current position and the relative position o(b_g) of the moving target with respect to its destination; the output of the behavior decision model is the probability of the moving target selecting each behavior action; and the output of the behavior preference model is the reward value for performing action a in environmental state s;
The behavior decision model training module is configured to alternately learn the moving target behavior decision model and the final model parameters of the moving target behavior preference model from the training data set by adopting an inverse reinforcement learning mode by taking the model parameters obtained by learning as initialization parameters so as to obtain a trained moving target behavior decision model; the specific steps of model training include:
step 4.1: taking the model parameters obtained by the supervised learning as initialization parameters;
step 4.2: sampling M moving target trajectories D_traj using the behavior decision model π_θ(a|s), sampling M moving target trajectories D̂_demo from the training data set, and merging them to form the sampled trajectory set D_sample;
step 4.3: updating the model parameters ψ of the behavior preference model r_ψ(s, a) by gradient descent, where the gradient of ψ is calculated using a formula of the form:

\nabla_\psi \mathcal{L}_\psi = \sum_{\tau_j \in D_{sample}} \frac{\omega_j}{\sum_{j'} \omega_{j'}} \nabla_\psi R_\psi(\tau_j) - \frac{1}{M} \sum_{\tau_i \in \hat{D}_{demo}} \nabla_\psi R_\psi(\tau_i)

where R_ψ(τ_i) denotes the cumulative reward value of the behavior preference model on trajectory τ_i; R_ψ(τ_j) denotes the cumulative reward value on trajectory τ_j; and ω_j denotes the importance sampling factor corresponding to trajectory τ_j;
step 4.4: updating the model parameters θ of the behavior decision model π_θ(a|s) using a reinforcement learning algorithm and the sampled trajectories D_sample;
step 4.5: repeating steps 4.2 to 4.4 until the training period reaches the preset maximum training period;
and the track prediction module is configured to simulate a behavior decision process of the moving target by using the trained moving target behavior decision model and predict the motion track of the moving target.
5. The track prediction system considering an uncertain behavior mode of a moving target according to claim 4, wherein the specific step of learning the model parameters by supervised learning in the model pre-training module comprises:

counting, over all rasterized motion trajectories in the training data set, the frequency f(c_kl) of each grid cell of the environment grid map, using a statistic of the form:

f(c_{kl}) = \min(N_{kl}/N,\ 1)

where N_kl denotes the number of times the motion trajectories pass through grid cell c_kl of the environment grid map, and N denotes the total number of motion trajectories;

pre-training the moving target behavior preference model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior preference model:

\mathcal{L}_r(\psi) = \sum_{k=1}^{C_X} \sum_{l=1}^{C_Y} \left( r_\psi(s(c_{kl})) - f(c_{kl}) \right)^2

where C_X and C_Y denote the number of grid cells of the environment grid map in the X-axis and Y-axis directions, respectively; s(c_kl) denotes the input state observed when the moving target is located in grid cell c_kl of the environment grid map; and r_ψ(s(c_kl)) denotes the output value of the behavior preference model when the input state is s(c_kl);

pre-training the moving target behavior decision model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior decision model:

\mathcal{L}_\pi(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \pi_\theta(a_t^i \mid s_t^i)

where τ_i denotes the i-th moving target trajectory in the training data set D_demo; T_i denotes the total number of observation instants; a_t^i denotes the action executed by the target at instant t; and s_t^i denotes the target motion state observed at instant t.
CN202210582034.0A 2022-05-26 2022-05-26 Track prediction method and system considering uncertain behavior mode of moving target Active CN114970714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210582034.0A CN114970714B (en) 2022-05-26 2022-05-26 Track prediction method and system considering uncertain behavior mode of moving target

Publications (2)

Publication Number Publication Date
CN114970714A CN114970714A (en) 2022-08-30
CN114970714B true CN114970714B (en) 2024-05-03

Family

ID=82956619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210582034.0A Active CN114970714B (en) 2022-05-26 2022-05-26 Track prediction method and system considering uncertain behavior mode of moving target

Country Status (1)

Country Link
CN (1) CN114970714B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210181768A1 (en) * 2019-10-29 2021-06-17 Loon Llc Controllers for Lighter-Than-Air (LTA) Vehicles Using Deep Reinforcement Learning

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914985A (en) * 2014-04-25 2014-07-09 大连理工大学 Method for predicting future speed trajectory of hybrid power bus
CN112364119A (en) * 2020-12-01 2021-02-12 国家海洋信息中心 Ocean buoy track prediction method based on LSTM coding and decoding model
CN112717415A (en) * 2021-01-22 2021-04-30 上海交通大学 Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game
CN113204718A (en) * 2021-04-22 2021-08-03 武汉大学 Vehicle track destination prediction method considering space-time semantics and driving state
CN113221449A (en) * 2021-04-27 2021-08-06 中国科学院国家空间科学中心 Ship track real-time prediction method and system based on optimal strategy learning
CN113467515A (en) * 2021-07-22 2021-10-01 南京大学 Unmanned aerial vehicle flight control method based on virtual environment simulation reconstruction and reinforcement learning
CN113771034A (en) * 2021-09-17 2021-12-10 西北工业大学 Robot trajectory prediction method based on model confidence and Gaussian process
WO2023052010A1 (en) * 2021-09-29 2023-04-06 Nokia Technologies Oy Trajectory data collection in mobile telecommunication systems

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DP-BPR: Destination prediction based on Bayesian personalized ranking; JIANG Feng et al.; Springer; 2021-12-31; pp. 494-506 *
VLCC destination port prediction based on a hidden Markov model (基于隐马尔科夫模型的VLCC目的港预测); YANG Chun et al.; Journal of Shanghai Maritime University; 2020-12-31; Vol. 41, No. 4, pp. 42-49 *

Also Published As

Publication number Publication date
CN114970714A (en) 2022-08-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant