CN114970714B - Track prediction method and system considering uncertain behavior mode of moving target

Info

Publication number: CN114970714B
Application number: CN202210582034.0A
Authority: CN (China)
Other versions: CN114970714A (Chinese)
Inventors: 白成超, 颜鹏, 郭继峰, 郑红星
Current and original assignee: Harbin Institute of Technology
Legal status: Active

Classifications

    • G06F 18/214 Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045 Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06N 3/08 Computing arrangements based on biological models; neural networks; learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Feedback Control In General (AREA)

Abstract

A track prediction method and system considering uncertain behavior modes of a moving target, relating to the technical field of moving target trajectory prediction and intended to solve the problems of poor performance and low accuracy of existing methods when predicting the motion trajectory of a target whose behavior mode is uncertain. The key technical points of the invention include: first, historical motion trajectory data of the moving target are collected as a training data set; next, a moving target behavior decision model and a moving target behavior preference model are established, and their parameters are learned from the training data set by supervised learning; the parameters of the behavior decision model and the behavior preference model are then alternately refined from the training data set by inverse reinforcement learning; finally, the learned behavior decision model is used to simulate the behavior decision process of the moving target and thereby predict its motion trajectory. The method and system can significantly improve the trajectory prediction accuracy for the moving target.

Description

Track prediction method and system considering uncertain behavior mode of moving target
Technical Field
The invention relates to the technical field of moving target track prediction, in particular to a track prediction method and a track prediction system considering uncertain behavior modes of a moving target.
Background
Existing methods for predicting the trajectory of a moving target fall roughly into three categories: methods based on a motion model of the target, data-driven methods based on the target's historical trajectory data, and methods based on simulating the target's trajectory planning process. Methods of the first category usually predict the target trajectory by filtering estimation, e.g., Kalman filtering, unscented Kalman filtering, or extended Kalman filtering, according to an established target motion model and the observed target motion state. Such methods depend heavily on the target motion model: if the established model is inaccurate, or an accurate motion model cannot be obtained, the target trajectory cannot be predicted accurately. Methods of the second category usually construct a trajectory prediction model with a deep neural network, a Gaussian mixture model, a hidden Markov model, or the like; model parameters are learned from collected trajectory data, so that the target's motion behavior characteristics are mapped into the learned parameters, and the trajectory is then predicted by matching and generation. Although such data-driven methods can predict the trajectory in the absence of a target motion model, they assume that the target has a deterministic behavior mode consistent with the behavior reflected in the collected trajectory data set; since an uncertain behavior pattern is difficult to match directly, these methods have difficulty predicting the trajectory of a target with an uncertain behavior mode. Methods of the third category predict the trajectory by simulating the target's trajectory planning process, and generally assume that the target behaves optimally; in the real world, however, the behavior of most moving targets is not optimal, so the target's behavior planning process cannot be simulated with optimality criteria, and the trajectory cannot be predicted from it. An effective remedy is to learn the target's behavior mode from its historical trajectory data by inverse reinforcement learning and then generate the target's trajectory by simulating its behavior planning, thereby achieving trajectory prediction.
Although inverse reinforcement learning can imitate a non-optimal behavior mode of the target, for a moving target with an uncertain behavior mode moving in a complex environment, existing inverse-reinforcement-learning-based trajectory prediction methods cannot effectively imitate the uncertain behavior mode, making high prediction accuracy difficult to achieve.
Disclosure of Invention
In order to solve the problems of poor effect and low precision of the existing method for predicting the target motion trail with uncertain behavior modes, the invention provides a trail prediction method and a trail prediction system considering the uncertain behavior modes of a moving target, which can fully learn the uncertain behavior modes of the target from target motion trail data so as to accurately predict the motion trail of the target.
According to an aspect of the present invention, there is provided a trajectory prediction method considering an uncertain behavior pattern of a moving object, the method comprising the steps of:
step one, collecting historical motion trail data of a moving target in a complex environment, and taking the historical motion trail data as a training data set;
step two, establishing a convolutional-neural-network-based moving target behavior decision model and moving target behavior preference model according to the environment in which the moving target is located and the behavior characteristics of the moving target;
step three, learning model parameters of the moving target behavior decision model and the moving target behavior preference model from the training data set in a supervised learning mode;
step four, taking the model parameters obtained in the previous step as initialization parameters, and alternately learning the final model parameters of the moving target behavior decision model and the moving target behavior preference model from the training data set by inverse reinforcement learning, to obtain a trained moving target behavior decision model;
step five, simulating the behavior decision process of the moving target with the trained moving target behavior decision model, and predicting the motion trajectory of the moving target.
Further, in step two, the convolutional-neural-network-based moving target behavior decision model is denoted π_θ(a|s) and the convolutional-neural-network-based moving target behavior preference model is denoted r_ψ(s, a), where s denotes the environmental state observed by the moving target, a denotes a behavior action of the moving target, θ denotes the model parameters of the behavior decision model, and ψ denotes the model parameters of the behavior preference model. The inputs of both models are the surrounding environment information o(M_t) observed by the moving target at its current position and the relative position o(b_g) of the moving target with respect to its destination; the output of the behavior decision model is the probability of the moving target selecting each behavior action, and the output of the behavior preference model is the reward value for performing action a in environmental state s.
Further, the specific step of learning the model parameters by supervised learning in step three comprises:

counting, over all rasterized motion trajectories in the training data set, the frequency f(c_kl) of each grid cell of the environment grid map, using a statistic of the form:

f(c_{kl}) = \min(N_{kl}/N,\ 1)

where N_kl denotes the number of times the motion trajectories pass through grid cell c_kl of the environment grid map, and N denotes the total number of motion trajectories;

pre-training the moving target behavior preference model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior preference model:

\mathcal{L}_r(\psi) = \sum_{k=1}^{C_X} \sum_{l=1}^{C_Y} \left( r_\psi(s(c_{kl})) - f(c_{kl}) \right)^2

where C_X and C_Y denote the number of grid cells of the environment grid map in the X-axis and Y-axis directions, respectively; s(c_kl) denotes the input state observed when the moving target is located in grid cell c_kl of the environment grid map; and r_ψ(s(c_kl)) denotes the output value of the behavior preference model when the input state is s(c_kl);

pre-training the moving target behavior decision model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior decision model:

\mathcal{L}_\pi(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \pi_\theta(a_t^i \mid s_t^i)

where τ_i denotes the i-th moving target trajectory in the training data set D_demo; T_i denotes the total number of observation instants; a_t^i denotes the action executed by the target at instant t; and s_t^i denotes the target motion state observed at instant t.
Further, the specific steps of the fourth step include:
step 4.1: taking the model parameters obtained in the previous step as initialization parameters;
step 4.2: sampling M moving target trajectories D_traj using the behavior decision model π_θ(a|s), sampling M moving target trajectories D̂_demo from the training data set, and merging them to form the sampled trajectory set D_sample;
step 4.3: updating the model parameters ψ of the behavior preference model r_ψ(s, a) by gradient descent;
step 4.4: updating the model parameters θ of the behavior decision model π_θ(a|s) using a reinforcement learning algorithm and the sampled trajectories D_sample;
step 4.5: repeating steps 4.2 to 4.4 until the training period reaches the preset maximum training period.
Further, the gradient of the model parameters ψ of the behavior preference model r_ψ(s, a) in step 4.3 is calculated using a formula of the form:

\nabla_\psi \mathcal{L}_\psi = \sum_{\tau_j \in D_{sample}} \frac{\omega_j}{\sum_{j'} \omega_{j'}} \nabla_\psi R_\psi(\tau_j) - \frac{1}{M} \sum_{\tau_i \in \hat{D}_{demo}} \nabla_\psi R_\psi(\tau_i)

where R_ψ(τ_i) denotes the cumulative reward value of the behavior preference model on trajectory τ_i; R_ψ(τ_j) denotes the cumulative reward value on trajectory τ_j; and ω_j denotes the importance sampling factor corresponding to trajectory τ_j.
According to another aspect of the present invention, there is provided a trajectory prediction system considering an uncertain behavior manner of a moving object, the system comprising:
A data collection module configured to collect historical motion trajectory data of a moving object in a complex environment as a training data set;
The model pre-training module is configured to establish a convolutional-neural-network-based moving target behavior decision model and moving target behavior preference model according to the environment in which the moving target is located and the behavior characteristics of the moving target, and to learn the model parameters of both models from the training data set by supervised learning;
The behavior decision model training module is configured to alternately learn the moving target behavior decision model and the final model parameters of the moving target behavior preference model from the training data set by adopting an inverse reinforcement learning mode by taking the model parameters obtained by learning as initialization parameters so as to obtain a trained moving target behavior decision model;
and the track prediction module is configured to simulate a behavior decision process of the moving target by using the trained moving target behavior decision model and predict the motion track of the moving target.
Further, the convolutional-neural-network-based moving target behavior decision model in the model pre-training module is denoted π_θ(a|s) and the convolutional-neural-network-based moving target behavior preference model is denoted r_ψ(s, a), where a denotes a behavior action of the moving target, s denotes the environmental state observed by the moving target, θ denotes the model parameters of the behavior decision model, and ψ denotes the model parameters of the behavior preference model. The inputs of both models are the surrounding environment information o(M_t) observed by the moving target at its current position and the relative position o(b_g) of the moving target with respect to its destination; the output of the behavior decision model is the probability of the moving target selecting each behavior action, and the output of the behavior preference model is the reward value for performing action a in environmental state s.
Further, the specific step of learning the model parameters by supervised learning in the model pre-training module comprises:

counting, over all rasterized motion trajectories in the training data set, the frequency f(c_kl) of each grid cell of the environment grid map, using a statistic of the form:

f(c_{kl}) = \min(N_{kl}/N,\ 1)

where N_kl denotes the number of times the motion trajectories pass through grid cell c_kl of the environment grid map, and N denotes the total number of motion trajectories;

pre-training the moving target behavior preference model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior preference model:

\mathcal{L}_r(\psi) = \sum_{k=1}^{C_X} \sum_{l=1}^{C_Y} \left( r_\psi(s(c_{kl})) - f(c_{kl}) \right)^2

where C_X and C_Y denote the number of grid cells of the environment grid map in the X-axis and Y-axis directions, respectively; s(c_kl) denotes the input state observed when the moving target is located in grid cell c_kl of the environment grid map; and r_ψ(s(c_kl)) denotes the output value of the behavior preference model when the input state is s(c_kl);

pre-training the moving target behavior decision model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior decision model:

\mathcal{L}_\pi(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \pi_\theta(a_t^i \mid s_t^i)

where τ_i denotes the i-th moving target trajectory in the training data set D_demo; T_i denotes the total number of observation instants; a_t^i denotes the action executed by the target at instant t; and s_t^i denotes the target motion state observed at instant t.
Further, the specific steps of model training in the behavior decision model training module comprise:
step 4.1: taking the model parameters obtained by the supervised pre-training as initialization parameters;
step 4.2: sampling M moving target trajectories D_traj using the behavior decision model π_θ(a|s), sampling M moving target trajectories D̂_demo from the training data set, and merging them to form the sampled trajectory set D_sample;
step 4.3: updating the model parameters ψ of the behavior preference model r_ψ(s, a) by gradient descent, where the gradient of ψ is calculated using a formula of the form:

\nabla_\psi \mathcal{L}_\psi = \sum_{\tau_j \in D_{sample}} \frac{\omega_j}{\sum_{j'} \omega_{j'}} \nabla_\psi R_\psi(\tau_j) - \frac{1}{M} \sum_{\tau_i \in \hat{D}_{demo}} \nabla_\psi R_\psi(\tau_i)

where R_ψ(τ_i) denotes the cumulative reward value of the behavior preference model on trajectory τ_i; R_ψ(τ_j) denotes the cumulative reward value on trajectory τ_j; and ω_j denotes the importance sampling factor corresponding to trajectory τ_j;
step 4.4: updating the model parameters θ of the behavior decision model π_θ(a|s) using a reinforcement learning algorithm and the sampled trajectories D_sample;
step 4.5: repeating steps 4.2 to 4.4 until the training period reaches the preset maximum training period.
The beneficial technical effects of the invention are as follows:
The invention predicts the motion trajectory of a moving target in a complex environment by imitating the target's uncertain behavior mode, and can solve the trajectory prediction problem when the target motion model is unknown and the moving target exhibits an uncertain behavior mode. Compared with traditional methods, the invention has the following advantages:
1) By learning from the target's historical motion trajectory data under an inverse reinforcement learning framework, a behavior preference model and a behavior decision model of the moving target can be learned, and a motion trajectory conforming to the target's behavior mode can then be predicted by simulating the target's behavior decision process;
2) Through supervised learning on the target's historical motion trajectory data, the behavior characteristics of the moving target can be fully mined, optimizing the subsequent learning of the behavior preference model and behavior decision model, so that the uncertain behavior mode of the moving target can be learned and the trajectory prediction accuracy improved.
Drawings
The invention may be better understood by reference to the following description taken in conjunction with the accompanying drawings, which are included to provide a further illustration of the preferred embodiments of the invention and to explain the principles and advantages of the invention, together with the detailed description below.
Fig. 1 is a flowchart of a track prediction method considering uncertain behavior patterns of a moving object according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a moving object behavior decision model structure in an embodiment of the invention.
Fig. 3 is a schematic diagram of an experimental verification scenario in an embodiment of the present invention.
FIG. 4 is a schematic diagram of the supervised training process of the moving target behavior decision model and the moving target behavior preference model in an embodiment of the invention.
FIG. 5 is a schematic diagram of a training process of a mobile object behavior preference model under an inverse reinforcement learning architecture according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of motion trail prediction of a moving object in an embodiment of the present invention.
Fig. 7 is a schematic diagram of a track prediction system considering uncertain behavior of a moving object according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, exemplary embodiments or examples of the present invention will be described below with reference to the accompanying drawings. It is apparent that the described embodiments or examples are only implementations or examples of a part of the invention, not all. All other embodiments or examples, which may be made by one of ordinary skill in the art without undue burden, are intended to be within the scope of the present invention based on the embodiments or examples herein.
In the invention, historical motion trajectory data of the moving target are first collected as a training data set; a moving target behavior decision model and a moving target behavior preference model are then established; the parameters of both models are learned from the training data set by supervised learning; the parameters of the behavior decision model and the behavior preference model are then alternately refined from the training data set by inverse reinforcement learning; finally, the learned behavior decision model is used to simulate the behavior decision process of the moving target, thereby predicting its motion trajectory.
The embodiment of the invention provides a track prediction method considering uncertain behavior modes of a moving target; as shown in fig. 1, the method comprises the following steps:
step one: collecting historical motion trail data of a moving target in a complex environment, and taking the historical motion trail data as a training data set;
According to the embodiment of the invention, the collected historical motion trajectory data of the moving target in the complex environment are organized into a training data set D_demo = {τ_1, τ_2, …, τ_N}, where τ_i = {(s_t^i, a_t^i)}_{t=1}^{T_i} denotes the i-th moving target trajectory in D_demo, comprising the target motion states s_t^i observed at T_i instants and the actions a_t^i executed by the target, and N denotes the number of trajectories in D_demo.
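For concreteness, such a training set can be represented as in the following minimal Python sketch; the field names are illustrative rather than taken from the patent, and each observation instant is assumed to yield one state-action pair.

```python
# Minimal sketch of the training data set D_demo; field names are illustrative.
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Trajectory:
    states: List[Any]    # target motion states s_t^i observed at T_i instants
    actions: List[int]   # actions a_t^i executed by the target (index of one of 8 moves)

# D_demo = {tau_1, tau_2, ..., tau_N}
demo_dataset: List[Trajectory] = []
```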
Step two: establishing a moving target behavior decision model and a moving target behavior preference model according to the environmental characteristics of the moving target and the behavior characteristics of the moving target;
According to an embodiment of the invention, the moving target behavior decision model π_θ(a|s) is first built on a convolutional neural network, as shown in fig. 2, where a denotes a behavior action of the target, s = [o(M_t), o(b_g)] denotes the environmental state observed by the target, and θ denotes the parameters of the behavior decision model. Since the behavior decision process of the moving target is influenced by its surrounding environment and by its destination position, these two factors are taken as the model inputs: the environment information o(M_t) observed by the target around its current position, and the relative position o(b_g) of the moving target with respect to its destination. The behavior decision model consists of 5 neural network layers. The first two layers are two-dimensional convolutional layers responsible for encoding the surrounding environment information o(M_t), extracting its key features by convolution. The features encoded by the two convolutional layers are then concatenated with the relative position o(b_g) to form fused information containing the target's local environment information and global destination information; this fused information is processed and deeply encoded by 3 fully connected layers, finally yielding the probability of the target selecting each behavior action. In this embodiment the moving target has 8 behavior actions, so the output dimension is 8. A minimal sketch of such a network is given below.
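The following PyTorch sketch illustrates this structure. The local map size (1 x 15 x 15), channel counts, kernel sizes, and hidden widths are assumptions; the patent fixes only the overall layout (two convolutional layers encoding o(M_t), fusion with o(b_g), three fully connected layers, and an 8-way action output).

```python
import torch
import torch.nn as nn

class BehaviorDecisionModel(nn.Module):
    """Sketch of pi_theta(a|s); layer sizes are illustrative assumptions."""

    def __init__(self, map_size: int = 15, n_actions: int = 8):
        super().__init__()
        # Two 2-D convolutional layers encode the observed local map o(M_t).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        enc_dim = 32 * map_size * map_size
        # Three fully connected layers process the fused local/global information.
        self.head = nn.Sequential(
            nn.Linear(enc_dim + 2, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, local_map: torch.Tensor, goal_rel: torch.Tensor) -> torch.Tensor:
        # local_map: (B, 1, H, W) grid around the target; goal_rel: (B, 2) offset o(b_g).
        feat = torch.cat([self.encoder(local_map), goal_rel], dim=1)
        return torch.softmax(self.head(feat), dim=1)  # probability of each of the 8 actions
```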
Second, the moving target behavior preference model r_ψ(s, a) is built on a convolutional neural network, where ψ denotes the parameters of the behavior preference model. The behavior preference model established in this embodiment is essentially identical in structure to the behavior decision model shown in fig. 2, except that its output layer has a single output unit with a Tanh activation function, so as to limit the output value to the interval (-1, 1).
In the present embodiment, the behavior preference model r_ψ(s, a) is further simplified to r_ψ(s'), where s' denotes the state of the moving target after performing action a in state s; a corresponding sketch follows.
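Under the same assumptions as above, the preference model can be sketched by reusing the BehaviorDecisionModel trunk with a single Tanh output unit:

```python
import torch

# Sketch of the behavior preference model r_psi(s'): same trunk, one Tanh output.
class BehaviorPreferenceModel(BehaviorDecisionModel):
    def __init__(self, map_size: int = 15):
        super().__init__(map_size=map_size, n_actions=1)

    def forward(self, local_map: torch.Tensor, goal_rel: torch.Tensor) -> torch.Tensor:
        feat = torch.cat([self.encoder(local_map), goal_rel], dim=1)
        return torch.tanh(self.head(feat)).squeeze(-1)  # reward value in (-1, 1)
```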
Step three: learning a moving target behavior preference model and a moving target behavior decision model from the training data set in a supervised learning mode;
According to the embodiment of the invention, the distribution characteristics of the motion trajectories in the training data set D_demo are first counted. Specifically, after all motion trajectories in D_demo are rasterized, the frequency f(c_kl) of each grid cell c_kl of the environment map M_t is counted using a statistic of the form:

f(c_{kl}) = \min(N_{kl}/N,\ 1)

where N_kl denotes the number of times the motion trajectories in D_demo pass through grid cell c_kl of the grid map M_t, N denotes the total number of trajectories, and the function min() takes the minimum of its arguments; a small computational sketch follows.
Then, the moving target behavior preference model is pre-trained by optimizing a loss function. Specifically, it is pre-trained by minimizing a loss of the form:

\mathcal{L}_r(\psi) = \sum_{k=1}^{C_X} \sum_{l=1}^{C_Y} \left( r_\psi(s(c_{kl})) - f(c_{kl}) \right)^2

where C_X and C_Y denote the number of grid cells of the grid map M_t in the X-axis and Y-axis directions, respectively; s(c_kl) denotes the input state observed when the moving target is located in grid cell c_kl of M_t; and r_ψ(s(c_kl)) denotes the output value of the behavior preference model when the input state is s(c_kl).

Then, the moving target example trajectory probability distribution model (i.e., the behavior decision model π_θ(a|s), fitted to the example trajectories) is pre-trained by minimizing a loss of the form:

\mathcal{L}_\pi(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \pi_\theta(a_t^i \mid s_t^i)

Sketches of both pre-training losses are given below.
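Both pre-training objectives can be sketched as follows, reusing the network sketches above; the batching and the rasterized inputs (per-cell states, demonstration tensors) are assumptions.

```python
import torch
import torch.nn.functional as F

def reward_pretrain_loss(reward_net, cell_maps, cell_goals, freq):
    # Squared error between r_psi(s(c_kl)) and the visit frequency f(c_kl).
    pred = reward_net(cell_maps, cell_goals)          # one value per grid cell
    return F.mse_loss(pred, freq)

def policy_pretrain_loss(policy_net, maps, goals, actions):
    # Behavior cloning: negative log-likelihood of the demonstrated actions.
    probs = policy_net(maps, goals)                   # (B, 8) action probabilities
    logp = torch.log(probs.gather(1, actions.unsqueeze(1)).squeeze(1) + 1e-8)
    return -logp.mean()
```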
Step four: alternately learning parameters of the moving target behavior decision model and the moving target behavior preference model from the training data set by adopting an inverse reinforcement learning mode;
According to the embodiment of the invention, step four is realized by the following sub-steps:
step 4.1: taking the model parameters θ of the moving target behavior decision model learned in step three as the initialization parameters of the decision model π_θ(a|s), and the model parameters ψ of the moving target behavior preference model learned in step three as the initialization parameters of the preference model r_ψ(s, a);

step 4.2: sampling M moving target trajectories D_traj using the behavior decision model π_θ(a|s);

step 4.3: sampling M moving target trajectories D̂_demo from the training data set D_demo;

step 4.4: merging the M trajectories D_traj with the M training trajectories D̂_demo to form the sampled trajectory set D_sample, i.e., D_sample ← D_traj ∪ D̂_demo;

step 4.5: updating the parameters of the behavior preference model r_ψ(s, a) by gradient descent, where the gradient of ψ is calculated using a formula of the form (see the sketch after this step list):

\nabla_\psi \mathcal{L}_\psi = \sum_{\tau_j \in D_{sample}} \frac{\omega_j}{\sum_{j'} \omega_{j'}} \nabla_\psi R_\psi(\tau_j) - \frac{1}{M} \sum_{\tau_i \in \hat{D}_{demo}} \nabla_\psi R_\psi(\tau_i)

where R_ψ(τ) denotes the cumulative reward value of the behavior preference model r_ψ(s, a) on trajectory τ, and ω_j denotes the importance sampling factor corresponding to trajectory τ_j, e.g., ω_j ∝ exp(R_ψ(τ_j))/q(τ_j) with q(τ_j) the probability of τ_j under the sampling policy;

step 4.6: updating the parameters of the behavior decision model π_θ(a|s) using a reinforcement learning algorithm and the sampled trajectories D_sample;

step 4.7: judging whether the training period has reached the maximum training period E_max; if not, returning to step 4.2; if so, ending the training.
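The following sketch instantiates steps 4.2 to 4.7 under the gradient form given above, reusing the earlier network sketches. Here sample_policy_rollouts, sample_demo_batch, log_q (log-probabilities of the sampled trajectories under the sampling policy), and rl_update are hypothetical helpers, since the patent fixes only that "a reinforcement learning algorithm" performs step 4.6.

```python
import torch

def reward_loss(reward_net, demo_trajs, sample_trajs, log_q):
    # Cumulative reward R_psi(tau) = sum_t r_psi(s'_t) along a trajectory.
    def traj_reward(traj):
        maps, goals = traj
        return reward_net(maps, goals).sum()

    r_demo = torch.stack([traj_reward(t) for t in demo_trajs])
    r_samp = torch.stack([traj_reward(t) for t in sample_trajs])
    log_w = r_samp - log_q          # log importance weights log(omega_j)
    # Minimizing this loss reproduces the importance-weighted gradient above:
    # grad = sum_j softmax(log_w)_j * grad R(tau_j) - mean_i grad R(tau_i).
    return torch.logsumexp(log_w, dim=0) - r_demo.mean()

def train_irl(policy_net, reward_net, demo_set, m, e_max, lr=1e-4):
    opt_r = torch.optim.Adam(reward_net.parameters(), lr=lr)
    for epoch in range(e_max):                          # step 4.7: stop at E_max
        d_traj = sample_policy_rollouts(policy_net, m)  # step 4.2 (hypothetical helper)
        d_demo = sample_demo_batch(demo_set, m)         # step 4.3 (hypothetical helper)
        d_sample = d_traj + d_demo                      # step 4.4: D_sample
        opt_r.zero_grad()
        reward_loss(reward_net, d_demo, d_sample, log_q(d_sample)).backward()
        opt_r.step()                                    # step 4.5: update psi
        rl_update(policy_net, reward_net, d_sample)     # step 4.6: update theta (any RL algorithm)
```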
Step five: simulating the behavior decision process of the moving target using the behavior decision model trained in step four, and predicting the motion trajectory of the moving target; a rollout sketch follows.
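A rollout of the trained decision model can be sketched as follows; observe, env.step, and env.at_destination are assumed helpers for a gridworld such as that of fig. 3.

```python
import torch

def predict_trajectory(policy_net, env, state, max_steps: int = 500):
    path = [state.position]
    for _ in range(max_steps):
        local_map, goal_rel = observe(env, state)    # build o(M_t) and o(b_g)
        probs = policy_net(local_map, goal_rel)      # action probabilities
        action = torch.distributions.Categorical(probs).sample().item()
        state = env.step(state, action)              # execute one of the 8 moves
        path.append(state.position)
        if env.at_destination(state):
            break
    return path                                      # predicted motion trajectory
```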
The technical effects of the invention are further verified by experiment.
The correctness and effectiveness of the invention are verified by digital simulation. First, a virtual complex environment is constructed in Python, as shown in fig. 3, where the inaccessible area denotes the region the moving target cannot enter, the accessible area denotes the region it can enter, and the destination area denotes the destination position of the moving target. The moving target moves from a start position (i.e., (0 m, 0 m) in fig. 3) to its destination position; its motion behavior is not optimal but carries a certain uncertainty, and the target trajectory shown in fig. 3 is one example trajectory of the moving target from the start position to the destination area. In the complex environment of fig. 3, the experiment randomly generated 500 moving target trajectories as the training data set D_demo. The simulation software environment was Windows 10 + Python 3.7, and the hardware environment was an AMD Ryzen 5 3550H CPU + 16.0 GB RAM.
The experiments first verify whether the training processes of step three and step four converge.
Fig. 4 shows the loss curves during the training of the moving target behavior preference model and the moving target example trajectory probability distribution model in step three. As can be seen from fig. 4, the training of both models was run for a total of 1000 training periods. During the pre-training of the behavior preference model, once the training period exceeds 400 the loss value \mathcal{L}_r(ψ) essentially stops decreasing, indicating that the model training has essentially converged. During the training of the example trajectory probability distribution model, once the training period exceeds 800 the loss value \mathcal{L}_π(θ) essentially stops decreasing, indicating that the model training has essentially converged.
Fig. 5 shows the loss curve when training the moving target behavior preference model with the inverse reinforcement learning method in step four. As can be seen from fig. 5, the training process lasted 125 training periods in total, and as the training period increases the absolute value of the behavior preference model's loss gradually approaches 0, indicating that the training process gradually converges, i.e., the behavior preference model learned from the training data set gradually approaches the behavior mode of the real moving target.
The above results indicate that the training process of the third and fourth steps in the embodiment of the present invention is convergent, that is, stable model parameters can be learned from the training data set.
Further, one prediction run of the moving target motion trajectory verifies that the trajectory predicted by the method conforms to the target's behavior mode. Specifically, the motion behavior decision process of the target is simulated using the behavior decision model trained in step four; the simulated motion trajectory of the target is shown in fig. 6.
As can be seen from fig. 6, the target motion trajectory simulated by the behavior decision model (i.e., the predicted trajectory shown in fig. 6) is broadly similar to the target example trajectory. In the example trajectory the moving target enters the accessible area of the environment 4 times; the trajectory predicted by the method successfully reproduces the first 3 entries, while on the fourth the moving target is already very close to the destination, so the predicted trajectory heads directly to the destination position. Considering that the behavior of the moving target carries a certain uncertainty, its motion trajectory is difficult to predict exactly; the trajectory predicted by the proposed method is nevertheless very close to the target's behavior mode, indicating that the proposed method can predict a motion trajectory conforming to the behavior mode of the moving target.
These results show that, for a moving target with an uncertain behavior mode moving in a complex environment, the method can learn the target's behavior decision model and behavior preference model from collected target motion trajectory data, and can then predict a motion trajectory conforming to the target's behavior mode by simulating its behavior decision process. The disclosed method realizes motion trajectory prediction for moving targets with uncertain behavior modes in complex environments, and provides a new technical approach for moving target trajectory prediction.
Another embodiment of the invention provides a track prediction system considering uncertain behavior modes of a moving target; as shown in fig. 7, the system includes:
a data collection module 10 configured to collect historical motion trajectory data of a moving object in a complex environment as a training data set;
The model pre-training module 20 is configured to establish a moving target behavior decision model and a moving target behavior preference model based on a convolutional neural network according to the environment in which the moving target is located and behavior characteristics thereof; model parameters of a moving target behavior decision model and a moving target behavior preference model are learned from a training data set in a supervised learning mode;
A behavior decision model training module 30 configured to alternately learn the final model parameters of the moving target behavior decision model and the moving target behavior preference model from the training data set by using the model parameters obtained by learning as initialization parameters in an inverse reinforcement learning manner, so as to obtain a trained moving target behavior decision model;
The trajectory prediction module 40 is configured to simulate the behavior decision process of the moving object by using the trained moving object behavior decision model, and predict the motion trajectory of the moving object.
In this embodiment, optionally, the convolutional-neural-network-based moving target behavior decision model in the model pre-training module 20 is denoted π_θ(a|s) and the convolutional-neural-network-based moving target behavior preference model is denoted r_ψ(s, a), where a denotes a behavior action of the moving target, s denotes the environmental state observed by the moving target, θ denotes the model parameters of the behavior decision model, and ψ denotes the model parameters of the behavior preference model. The inputs of both models are the surrounding environment information o(M_t) observed by the moving target at its current position and the relative position o(b_g) of the moving target with respect to its destination; the output of the behavior decision model is the probability of the moving target selecting each behavior action, and the output of the behavior preference model is the reward value for performing action a in environmental state s.
In this embodiment, optionally, the specific steps of learning the model parameters by supervised learning in the model pre-training module 20 include:

counting, over all rasterized motion trajectories in the training data set, the frequency f(c_kl) of each grid cell of the environment grid map, using a statistic of the form:

f(c_{kl}) = \min(N_{kl}/N,\ 1)

where N_kl denotes the number of times the motion trajectories pass through grid cell c_kl of the environment grid map, and N denotes the total number of motion trajectories;

pre-training the moving target behavior preference model by minimizing a loss function of the form below, to obtain the pre-trained model parameters:

\mathcal{L}_r(\psi) = \sum_{k=1}^{C_X} \sum_{l=1}^{C_Y} \left( r_\psi(s(c_{kl})) - f(c_{kl}) \right)^2

where C_X and C_Y denote the number of grid cells of the environment grid map in the X-axis and Y-axis directions, respectively; s(c_kl) denotes the input state observed when the moving target is located in grid cell c_kl of the environment grid map; and r_ψ(s(c_kl)) denotes the output value of the behavior preference model when the input state is s(c_kl);

pre-training the moving target behavior decision model by minimizing a loss function of the form below, to obtain the pre-trained model parameters:

\mathcal{L}_\pi(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \pi_\theta(a_t^i \mid s_t^i)

where τ_i denotes the i-th moving target trajectory in the training data set D_demo; T_i denotes the total number of observation instants; a_t^i denotes the action executed by the target at instant t; and s_t^i denotes the target motion state observed at instant t.
In this embodiment, optionally, the specific steps of model training in the behavior decision model training module 30 include:
step 4.1: taking the model parameters obtained by the supervised pre-training as initialization parameters;
step 4.2: sampling M moving target trajectories D_traj using the behavior decision model π_θ(a|s), sampling M moving target trajectories D̂_demo from the training data set, and merging them to form the sampled trajectory set D_sample;
step 4.3: updating the model parameters ψ of the behavior preference model r_ψ(s, a) by gradient descent, where the gradient of ψ is calculated using a formula of the form:

\nabla_\psi \mathcal{L}_\psi = \sum_{\tau_j \in D_{sample}} \frac{\omega_j}{\sum_{j'} \omega_{j'}} \nabla_\psi R_\psi(\tau_j) - \frac{1}{M} \sum_{\tau_i \in \hat{D}_{demo}} \nabla_\psi R_\psi(\tau_i)

where R_ψ(τ_i) denotes the cumulative reward value of the behavior preference model on trajectory τ_i; R_ψ(τ_j) denotes the cumulative reward value on trajectory τ_j; and ω_j denotes the importance sampling factor corresponding to trajectory τ_j;
step 4.4: updating the model parameters θ of the behavior decision model π_θ(a|s) using a reinforcement learning algorithm and the sampled trajectories D_sample;
step 4.5: repeating steps 4.2 to 4.4 until the training period reaches the preset maximum training period.
The functions of the track prediction system of this embodiment correspond to those of the track prediction method described above; details are therefore not repeated here, and reference may be made to the method embodiments above.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.

Claims (5)

1. A track prediction method considering an uncertain behavior mode of a moving target, characterized by comprising the following steps:
step one, collecting historical motion trail data of a moving target in a complex environment, and taking the historical motion trail data as a training data set;
step two, establishing a convolutional-neural-network-based moving target behavior decision model and moving target behavior preference model according to the environment in which the moving target is located and the behavior characteristics of the moving target; wherein the convolutional-neural-network-based behavior decision model is denoted π_θ(a|s) and the convolutional-neural-network-based behavior preference model is denoted r_ψ(s, a), where s denotes the environmental state observed by the moving target, a denotes a behavior action of the moving target, θ denotes the model parameters of the behavior decision model, and ψ denotes the model parameters of the behavior preference model; the inputs of both models are the surrounding environment information o(M_t) observed by the moving target at its current position and the relative position o(b_g) of the moving target with respect to its destination; the output of the behavior decision model is the probability of the moving target selecting each behavior action; and the output of the behavior preference model is the reward value for performing action a in environmental state s;
step three, learning model parameters of the moving target behavior decision model and the moving target behavior preference model from the training data set in a supervised learning mode;
step four, taking the model parameters obtained in the previous step as initialization parameters, and alternately learning the final model parameters of the moving target behavior decision model and the moving target behavior preference model from the training data set by inverse reinforcement learning, to obtain a trained moving target behavior decision model; specifically comprising:
step 4.1: taking the model parameters obtained in the previous step as initialization parameters;
step 4.2: sampling M moving target trajectories D_traj using the behavior decision model π_θ(a|s), sampling M moving target trajectories D̂_demo from the training data set, and merging them to form the sampled trajectory set D_sample;
step 4.3: updating the model parameters ψ of the behavior preference model r_ψ(s, a) by gradient descent;
step 4.4: updating the model parameters θ of the behavior decision model π_θ(a|s) using a reinforcement learning algorithm and the sampled trajectories D_sample;
step 4.5: repeating steps 4.2 to 4.4 until the training period reaches the preset maximum training period;
and fifthly, simulating a behavior decision process of the moving target by using the trained moving target behavior decision model, and predicting the motion trail of the moving target.
2. The track prediction method considering an uncertain behavior mode of a moving target according to claim 1, wherein the specific step of learning the model parameters by supervised learning in step three comprises:

counting, over all rasterized motion trajectories in the training data set, the frequency f(c_kl) of each grid cell of the environment grid map, using a statistic of the form:

f(c_{kl}) = \min(N_{kl}/N,\ 1)

where N_kl denotes the number of times the motion trajectories pass through grid cell c_kl of the environment grid map, and N denotes the total number of motion trajectories;

pre-training the moving target behavior preference model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior preference model:

\mathcal{L}_r(\psi) = \sum_{k=1}^{C_X} \sum_{l=1}^{C_Y} \left( r_\psi(s(c_{kl})) - f(c_{kl}) \right)^2

where C_X and C_Y denote the number of grid cells of the environment grid map in the X-axis and Y-axis directions, respectively; s(c_kl) denotes the input state observed when the moving target is located in grid cell c_kl of the environment grid map; and r_ψ(s(c_kl)) denotes the output value of the behavior preference model when the input state is s(c_kl);

pre-training the moving target behavior decision model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior decision model:

\mathcal{L}_\pi(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \pi_\theta(a_t^i \mid s_t^i)

where τ_i denotes the i-th moving target trajectory in the training data set D_demo; T_i denotes the total number of observation instants; a_t^i denotes the action executed by the target at instant t; and s_t^i denotes the target motion state observed at instant t.
3. The track prediction method considering an uncertain behavior mode of a moving target according to claim 2, wherein the gradient of the model parameters ψ of the behavior preference model r_ψ(s, a) in step 4.3 is calculated using a formula of the form:

\nabla_\psi \mathcal{L}_\psi = \sum_{\tau_j \in D_{sample}} \frac{\omega_j}{\sum_{j'} \omega_{j'}} \nabla_\psi R_\psi(\tau_j) - \frac{1}{M} \sum_{\tau_i \in \hat{D}_{demo}} \nabla_\psi R_\psi(\tau_i)

where R_ψ(τ_i) denotes the cumulative reward value of the behavior preference model on trajectory τ_i; R_ψ(τ_j) denotes the cumulative reward value on trajectory τ_j; and ω_j denotes the importance sampling factor corresponding to trajectory τ_j.
4. A track prediction system considering an uncertain behavior mode of a moving target, characterized by comprising:
A data collection module configured to collect historical motion trajectory data of a moving object in a complex environment as a training data set;
The model pre-training module is configured to establish a convolutional-neural-network-based moving target behavior decision model and moving target behavior preference model according to the environment in which the moving target is located and the behavior characteristics of the moving target, and to learn the model parameters of both models from the training data set by supervised learning; wherein the convolutional-neural-network-based behavior decision model is denoted π_θ(a|s) and the convolutional-neural-network-based behavior preference model is denoted r_ψ(s, a), where a denotes a behavior action of the moving target, s denotes the environmental state observed by the moving target, θ denotes the model parameters of the behavior decision model, and ψ denotes the model parameters of the behavior preference model; the inputs of both models are the surrounding environment information o(M_t) observed by the moving target at its current position and the relative position o(b_g) of the moving target with respect to its destination; the output of the behavior decision model is the probability of the moving target selecting each behavior action; and the output of the behavior preference model is the reward value for performing action a in environmental state s;
The behavior decision model training module is configured to alternately learn the moving target behavior decision model and the final model parameters of the moving target behavior preference model from the training data set by adopting an inverse reinforcement learning mode by taking the model parameters obtained by learning as initialization parameters so as to obtain a trained moving target behavior decision model; the specific steps of model training include:
step 4.1: taking the model parameters obtained by the supervised learning as initialization parameters;
step 4.2: sampling M moving target trajectories D_traj using the behavior decision model π_θ(a|s), sampling M moving target trajectories D̂_demo from the training data set, and merging them to form the sampled trajectory set D_sample;
step 4.3: updating the model parameters ψ of the behavior preference model r_ψ(s, a) by gradient descent, where the gradient of ψ is calculated using a formula of the form:

\nabla_\psi \mathcal{L}_\psi = \sum_{\tau_j \in D_{sample}} \frac{\omega_j}{\sum_{j'} \omega_{j'}} \nabla_\psi R_\psi(\tau_j) - \frac{1}{M} \sum_{\tau_i \in \hat{D}_{demo}} \nabla_\psi R_\psi(\tau_i)

where R_ψ(τ_i) denotes the cumulative reward value of the behavior preference model on trajectory τ_i; R_ψ(τ_j) denotes the cumulative reward value on trajectory τ_j; and ω_j denotes the importance sampling factor corresponding to trajectory τ_j;
step 4.4: updating the model parameters θ of the behavior decision model π_θ(a|s) using a reinforcement learning algorithm and the sampled trajectories D_sample;
step 4.5: repeating steps 4.2 to 4.4 until the training period reaches the preset maximum training period;
and the track prediction module is configured to simulate a behavior decision process of the moving target by using the trained moving target behavior decision model and predict the motion track of the moving target.
5. The track prediction system considering an uncertain behavior mode of a moving target according to claim 4, wherein the specific step of learning the model parameters by supervised learning in the model pre-training module comprises:

counting, over all rasterized motion trajectories in the training data set, the frequency f(c_kl) of each grid cell of the environment grid map, using a statistic of the form:

f(c_{kl}) = \min(N_{kl}/N,\ 1)

where N_kl denotes the number of times the motion trajectories pass through grid cell c_kl of the environment grid map, and N denotes the total number of motion trajectories;

pre-training the moving target behavior preference model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior preference model:

\mathcal{L}_r(\psi) = \sum_{k=1}^{C_X} \sum_{l=1}^{C_Y} \left( r_\psi(s(c_{kl})) - f(c_{kl}) \right)^2

where C_X and C_Y denote the number of grid cells of the environment grid map in the X-axis and Y-axis directions, respectively; s(c_kl) denotes the input state observed when the moving target is located in grid cell c_kl of the environment grid map; and r_ψ(s(c_kl)) denotes the output value of the behavior preference model when the input state is s(c_kl);

pre-training the moving target behavior decision model by minimizing a loss function of the form below, to obtain the model parameters of the pre-trained behavior decision model:

\mathcal{L}_\pi(\theta) = -\sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \pi_\theta(a_t^i \mid s_t^i)

where τ_i denotes the i-th moving target trajectory in the training data set D_demo; T_i denotes the total number of observation instants; a_t^i denotes the action executed by the target at instant t; and s_t^i denotes the target motion state observed at instant t.
CN202210582034.0A 2022-05-26 2022-05-26 Track prediction method and system considering uncertain behavior mode of moving target Active CN114970714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210582034.0A CN114970714B (en) 2022-05-26 2022-05-26 Track prediction method and system considering uncertain behavior mode of moving target

Publications (2)

Publication Number Publication Date
CN114970714A CN114970714A (en) 2022-08-30
CN114970714B true CN114970714B (en) 2024-05-03

Family

ID=82956619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210582034.0A Active CN114970714B (en) 2022-05-26 2022-05-26 Track prediction method and system considering uncertain behavior mode of moving target

Country Status (1)

Country Link
CN (1) CN114970714B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210181768A1 (en) * 2019-10-29 2021-06-17 Loon Llc Controllers for Lighter-Than-Air (LTA) Vehicles Using Deep Reinforcement Learning

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914985A (en) * 2014-04-25 2014-07-09 大连理工大学 Method for predicting future speed trajectory of hybrid power bus
CN112364119A (en) * 2020-12-01 2021-02-12 国家海洋信息中心 Ocean buoy track prediction method based on LSTM coding and decoding model
CN112717415A (en) * 2021-01-22 2021-04-30 上海交通大学 Information bottleneck theory-based AI (Artificial intelligence) training method for reinforcement learning battle game
CN113204718A (en) * 2021-04-22 2021-08-03 武汉大学 Vehicle track destination prediction method considering space-time semantics and driving state
CN113221449A (en) * 2021-04-27 2021-08-06 中国科学院国家空间科学中心 Ship track real-time prediction method and system based on optimal strategy learning
CN113467515A (en) * 2021-07-22 2021-10-01 南京大学 Unmanned aerial vehicle flight control method based on virtual environment simulation reconstruction and reinforcement learning
CN113771034A (en) * 2021-09-17 2021-12-10 西北工业大学 Robot trajectory prediction method based on model confidence and Gaussian process
WO2023052010A1 (en) * 2021-09-29 2023-04-06 Nokia Technologies Oy Trajectory data collection in mobile telecommunication systems

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DP-BPR: Destination prediction based on Bayesian personalized ranking; JIANG Feng et al.; Springer; 2021-12-31; pp. 494-506 *
VLCC destination port prediction based on a hidden Markov model (基于隐马尔科夫模型的VLCC目的港预测); YANG Chun et al.; Journal of Shanghai Maritime University; 2020-12-31; Vol. 41, No. 4, pp. 42-49 *

Also Published As

Publication number Publication date
CN114970714A (en) 2022-08-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant