CN114445465A - Track prediction method based on fusion inverse reinforcement learning - Google Patents

Track prediction method based on fusion inverse reinforcement learning

Info

Publication number
CN114445465A
CN114445465A (application CN202210189127.7A)
Authority
CN
China
Prior art keywords
scene
path
track
reinforcement learning
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210189127.7A
Other languages
Chinese (zh)
Inventor
杨彪
王姝媛
杨长春
徐黎明
陈阳
吕继东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou University filed Critical Changzhou University
Priority to CN202210189127.7A priority Critical patent/CN114445465A/en
Publication of CN114445465A publication Critical patent/CN114445465A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/251: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/048: Activation functions
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30241: Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of pedestrian trajectory prediction and analysis, and in particular to a trajectory prediction method based on fusion inverse reinforcement learning, which comprises: S1, generating a path reward map and an end-point reward map based on an input observation trajectory and a scene graph; S2, obtaining a path by sampling the strategy learned with an inverse reinforcement learning algorithm; S3, encoding path positions with a fully convolutional network, encoding the scene path with a bidirectional gated recurrent unit, and fusing the scene path with the pedestrian observation trajectory. By introducing the lightweight ENet feature-extraction network, the method reduces the number of algorithm parameters and improves the algorithm's ability to generalize when understanding scenes; by using a scene-based attention mechanism module, scene information and pedestrian observation trajectories are fused more effectively, and the scene-oriented pedestrian trajectory prediction network S2Tirl achieves better results than mainstream algorithms on public datasets and real-world data.

Description

Track prediction method based on fusion inverse reinforcement learning
Technical Field
The invention relates to the technical field of pedestrian trajectory prediction and analysis, in particular to a trajectory prediction method based on fusion inverse reinforcement learning.
Background
With people's growing travel demands, the increasing need for intelligent transportation systems, and the continued development of computer vision, sensor technology, and control theory, pedestrian trajectory prediction has attracted increasing attention. Traditional methods rely only on observed trajectories; as the number of trajectories grows, many trajectories coexist in the same scene and the problem of path selection arises. Unlike traditional pedestrian trajectory prediction methods, the invention encodes the scene with an inverse reinforcement learning method, fuses it with observation trajectory information, and proposes a scene-oriented trajectory prediction network. Facing urban and other complex scenes, traditional methods encode only the observation trajectory and make insufficient use of scene information, so their predicted trajectories generalize poorly to new complex scenes. The scene-information fusion method proposed by the invention trains the ability to handle different elements within a scene, and this ability transfers to new scenes, so the inferred paths generalize better;
Using computer vision techniques, researchers can predict pedestrian trajectories from kinematic features of the observed trajectory such as position and velocity. Researchers have introduced scene features to improve the generalization of trajectory prediction, and the inverse reinforcement learning framework lays a foundation for trajectory prediction across a large number of complex scenes.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: combining inverse reinforcement learning, a semantic segmentation network, and gated recurrent units to encode the scene-path and pedestrian-trajectory information separately, and to output a final predicted trajectory after fused encoding.
The technical scheme adopted by the invention is as follows: a track prediction method based on fusion inverse reinforcement learning comprises the following steps:
s1, generating a path reward map and an end point reward map based on the input observation track and the scene graph;
further, the step S1 includes:
s11, introducing an observation track and extracting scene features;
the method comprises the steps that an asymmetric design of an ENet semantic segmentation network structure is utilized, five bottleneck modules including initialization, normal operation, down sampling, up sampling, expansion and asymmetry are adopted, and a scene graph is subjected to coding and decoding processing, so that a scene path is generated, and scene semantic understanding is achieved; meanwhile, the observation track is introduced to calculate the speed v of the intelligent agent and the distance r between the intelligent agent and the scene center to form the motion characteristic phi of the intelligent agentM(ii) a Fusing the image characteristics and the track characteristic information to generate scene characteristics;
s12, generating a reward map;
the multilayer perceptron neural network (MLP) is still perceptron in nature, also called artificial neural network, which may have a plurality of hidden layers in the middle except input and output layers, the simplest neural network only has one hidden layer, namely a three-layer structure, but the complexity is enhanced by the layer design, and there are three types of layers in the neural network, respectively: an input layer, a hidden layer, an output layer; the method comprises two layers of 2D convolution layers and a ReLU activation function, wherein the last layer is a Log-Sigmoid activation function, and scene characteristics and motion characteristics are input into an MLP (Multi-level layer processing) after being connectedpathAnd MLPgoalNetwork, finally generating path reward map and terminal reward map;
s2, taking the path reward map of S12 as an environment, introducing an inverse reinforcement learning algorithm to obtain an optimal strategy, and introducing Gumbel-softmax Trick to sample the strategy;
further, the S2 includes:
s21, calculating an optimal strategy by using inverse reinforcement learning;
after the path reward map and the end point reward map are obtained in step S1, a strategy pi is found and obtained in a grid environment of 21 × 21 size by using an inverse reinforcement learning algorithmθ(as), inverse reinforcement learning can be represented by the markov decision process MDPs, which is essentially a quadruple M ═ { S, a, T, r }, where S is the state space, a represents the action taken, T represents the state transition function T: sxa → S, r represents the reward function, and the state-action sequence τ { (S)1,a1),(s2,a2),...,(sN,aN) The probability of being proportional to the reward value index at maximum entropy, expressed in particular as
Figure BDA0003523911310000031
Where K is a normalization constant, N represents the length of the sequence, r (τ) represents the prize value of the sequence, r(s)i) Expressing a single state-action pair reward value, obtaining a reward function according to an expert example, and generating a strategy algorithm, wherein the aim is to maximize an expert example set T ═ (tau)12,...,τN) The log-likelihood value of (a) is,
Figure BDA0003523911310000032
solving the log-likelihood value L for using a gradient descent methodθIs simplified to
Figure BDA0003523911310000033
Wherein r isθ(τ) sequential reward value, Z, for exploration strategy GenerationθTo explore the strategically learned constants, FπIs the state access frequency, F, of the expert example τθIs strategy piθThe state access frequency of the generation path,
Figure BDA0003523911310000034
and
Figure BDA0003523911310000035
respectively represent the pair likelihood values LθAnd a reward function rθDerivation is carried out;
Figure BDA0003523911310000036
the probability of the action a is selected given a grid position s obtained by back propagation, wherein all grid positions s in the inverse reinforcement learning environment are spThe sum of the path points sgThe sum of the target points, action a is five actions of up, down, left, right and termination;
s22, using GST strategy to sample;
introduction of GST (Gumbel-s)Soft max check) generates a scene path according to the optimal strategy; different from random sampling in the traditional method, firstly sampling is carried out to make discrete probability distribution meaningful, but not only the value with the maximum probability is taken, secondly gradient is calculated, and strategy pi is achieved by introducing GSTθ *Perform a sampling generation action xπThe specific process is xπ=softmax(logπi+Gi) Wherein G isiRepresenting the standard Gumbel distribution of independent and same distribution, the Gumbel distribution is a kind of extreme value distribution, and the cumulative distribution function is
Figure BDA0003523911310000041
GiCan be obtained from a uniform distribution by inverting the Gumbel distribution;
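A minimal sketch of this sampling step (PyTorch and the five toy action probabilities are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def gst_sample(log_pi: torch.Tensor) -> torch.Tensor:
    """x_pi = softmax(log pi_i + G_i), where G_i = -log(-log(U_i)) is a
    standard Gumbel sample obtained by inverting the Gumbel CDF."""
    u = torch.rand_like(log_pi).clamp_min(1e-9)  # U_i ~ Uniform(0, 1), guarded against 0
    g = -torch.log(-torch.log(u))                # standard Gumbel via inverse CDF
    return F.softmax(log_pi + g, dim=-1)         # differentiable, unlike hard argmax

# Five actions: up, down, left, right, stop.
pi = torch.tensor([0.1, 0.2, 0.3, 0.3, 0.1])
x_pi = gst_sample(torch.log(pi))
```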
s3, path position coding is carried out by using a full convolution network, a scene path is coded by fusing a bidirectional gating circulation unit, the scene path and the pedestrian observation track are fused, and the track prediction effect is improved;
further, the step S3 includes:
s31, a full convolution network;
sending the position code into a fusion bidirectional gate control circulation unit (BiGRU) through a full convolution network for time sequence coding, receiving an input scene graph with any size, adopting an anti-convolution layer to up-sample a feature graph of the last convolution layer, and restoring the feature graph to the same size of an input image, thereby generating a prediction for each pixel, simultaneously reserving spatial information in an original input image, and finally performing pixel classification on the up-sampled feature graph with odd and even numbers; after a strategy sampling scene path and an observation track are input, the strategy sampling scene path and the observation track are respectively coded and then sent to a scene-based attention mechanism (SBA) module, a prediction track is output through a decoder, a scene path scene _ path is input, a position code is sent to a fusion bidirectional gating circulation unit by using a full convolution network FC1 to carry out time sequence coding to generate a scene path hidden state code h _ scene, for the observation track obs _ traj, the position code is sent to a GRUenc by using an FC2 to carry out time sequence coding to generate an observation track hidden state code h _ obs, and h _ scene can be obtained as BiGRU (FC)1(scenepath),wBi) And h _ obs ═ GRUenc(FC2(obstraj),wGRU) Wherein w isBiAnd wGRULearnable parameters of BiGRU () and GRUenc () are respectively represented, the full convolution networks FC1 and FC2 are responsible for position coding, and a scene path is coded by using a fusion bidirectional gating circulation unit, so that the priori performance of the scene path is better exerted, and the track prediction effect is improved;
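A minimal PyTorch sketch of these two encoders follows; the hidden size and the use of a per-step linear layer for the FC position encoding (equivalent to a 1 × 1 convolution over the sequence) are assumptions:

```python
import torch
import torch.nn as nn

class PathTrajEncoders(nn.Module):
    """Sketch of S31: FC1/FC2 embed (x, y) positions per time step;
    a BiGRU encodes the sampled scene path and a GRU encodes the
    observed trajectory:
    h_scene = BiGRU(FC1(scene_path), w_Bi),
    h_obs   = GRU_enc(FC2(obs_traj), w_GRU)."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.fc1 = nn.Linear(2, hidden)   # position encoding of the scene path
        self.fc2 = nn.Linear(2, hidden)   # position encoding of the observed track
        self.bigru = nn.GRU(hidden, hidden, bidirectional=True, batch_first=True)
        self.gru_enc = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, scene_path, obs_traj):
        h_scene, _ = self.bigru(self.fc1(scene_path))
        h_obs, _ = self.gru_enc(self.fc2(obs_traj))
        return h_scene, h_obs

enc = PathTrajEncoders()
h_scene, h_obs = enc(torch.randn(1, 20, 2), torch.randn(1, 8, 2))
```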
s32, performing time sequence coding on a threshold Recurrent Unit (GRU);
the threshold cycle unit is a variant of a Recurrent Neural Network (RNN), the calculated amount of the threshold cycle unit is smaller, each threshold cycle unit comprises an update gate for controlling information transmission, and the information of the current time and the previous time is processed, so that the current state is controlled, the reset gate is similar to the update gate, but the dependency of the reset gate on the previous time is controlled, therefore, the threshold cycle unit can effectively capture the long-short term relationship of the sequence, and is more suitable for solving the problem of dynamic identification;
s33, fusing a scene-based attention mechanism module;
after the scene path is obtained, introducing a better fusion track and scene information of the attention System (SBA) of the scene by using a multi-head attention system in a transform framework, on the premise of fully retaining the characteristics of the scene path, observing the correlation between the track and the scene path, inputting t-n moment observation track hidden state code h _ obs (t-n) and scene path hidden state code h _ scene, obtaining the code h _ obs (t-n +1) of the next moment,
Figure BDA0003523911310000051
get h _ obst=n+1Then, GRU is useddecEncoding timing information and decoding using a position decoder FC3
Figure BDA0003523911310000052
Output predicted position at that time:
Figure BDA0003523911310000053
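A minimal PyTorch sketch of this attention step; reducing the multi-head mechanism to a single head and the hidden size of 64 are simplifying assumptions:

```python
import torch
import torch.nn.functional as F

def sba_step(h_obs_t: torch.Tensor, h_scene: torch.Tensor) -> torch.Tensor:
    """h_obs_{t+1} = softmax(h_obs_t @ h_scene^T) @ h_scene:
    the observed-track state attends over all scene-path states."""
    attn = F.softmax(h_obs_t @ h_scene.transpose(-2, -1), dim=-1)
    return attn @ h_scene

h_scene = torch.randn(1, 20, 64)   # scene-path hidden states (20 path steps)
h_obs_t = torch.randn(1, 1, 64)    # current observed-track hidden state
h_obs_next = sba_step(h_obs_t, h_scene)
# GRU_dec followed by the position decoder FC3 would map h_obs_next to (x, y).
```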
the invention has the beneficial effects that:
1. by introducing the light-weight characteristic extraction ENet network, the algorithm parameter quantity is reduced, and the generalization capability of the algorithm understanding scene is improved;
2. GST sampling is used for strategies, so that the real probability distribution of the strategies can be correctly reflected, and the problem of insufficient robustness of random sampling strategies is solved;
3. by using the attention mechanism module of the scene, scene information and pedestrian observation tracks are better fused, and compared with a mainstream algorithm, the scene-oriented pedestrian track prediction network S2Tirl obtains better effects on a public data set and actual data.
Drawings
FIG. 1 is a flow chart of a pedestrian trajectory prediction system based on fusion inverse reinforcement learning according to the present invention;
FIG. 2 is a schematic diagram of the generation of a reward map using an ENet network architecture as proposed in the present invention;
FIG. 3 is a schematic diagram comparing random sampling with the GST strategy sampling proposed in the present invention;
FIG. 4 is a schematic diagram of a scene-based attention mechanism module fusing scene and pedestrian observation trajectory path generation as proposed in the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples, which are simplified schematic drawings and illustrate only the basic structure of the invention in a schematic manner, and therefore only show the structures relevant to the invention.
The method comprehensively considers scene-path and pedestrian-trajectory information, uses hidden scene information to increase prediction precision, and integrates the two to output a final predicted trajectory;
As shown in fig. 1, which is a flow chart of the pedestrian trajectory prediction system based on fusion inverse reinforcement learning, a trajectory prediction method based on fusion inverse reinforcement learning includes the following steps:
Through inverse reinforcement learning, the pedestrian's motion pattern is understood accurately from the pedestrian's motion trajectory, and scene features are introduced at the same time, so that the motion of pedestrians around the current vehicle is predicted more accurately and the generalization of the trajectory prediction algorithm is improved. The method samples with GST (Gumbel-softmax Trick) and can generate the scene path from the optimal strategy. Finally, the pedestrian trajectory prediction is fused with the scene path: after fusion by the scene-based attention mechanism module, the correlation between the pedestrian trajectory and the scene path is observed on the premise that the scene-path features are fully retained;
the method comprises the following specific operation steps:
FIG. 2 presents a schematic diagram of the generation of a reward map using an ENet network architecture:
s1, generating a path reward map and an end point reward map based on the input observation track and the scene graph;
s11, introducing an observation track and extracting scene features;
the motion characteristics phi of the intelligent agent are formed by introducing parameters such as the observation track calculation speed v of the intelligent agent, the distance r between the intelligent agent and the scene center and the likeMGenerating motion characteristics, after acquiring pedestrian motion characteristics, in order to improve accuracy, adding scene graph information at the same time, utilizing asymmetric design of a semantic segmentation network ENet network structure, comprising five bottleneck modules of initialization, convention, down sampling, upper adoption, expansion and asymmetry, enhancing the characteristic information of each target in an image by inputting a scene graph, improving the acquisition of effective information by a subsequent semantic segmentation network, and reducing the influence of image noise on the network, thereby improving the identification accuracy of the target, building a semantic segmentation network of a road scene, after accurately acquiring the global information and the local information of each target, distinguishing according to different semantic meanings expressed by each pixel in a scene image, classifying each pixel in the image into image blocks with the size of 168 x 168 pixels at the last moment position in an observation track, ENetfeatUsing the initialization module of ENet and the first three layers of bottleneck blocks, the image with 3 × 168 × 168 of dimension is converted into a feature map phi of 128 × 21 × 21BFusing the feature information of each stage to generate a scene feature phiB
S12, generating the reward map
The scene feature Φ_B generated in step S11 and the motion feature Φ_M are concatenated and input into the MLP_path and MLP_goal networks, which generate the path reward map and the end-point reward map respectively. A neural-network multilayer perceptron (MLP) is essentially a perceptron whose complexity is enhanced through the design of its layers; there are three types of layers in a neural network, namely the input layer, hidden layers, and the output layer. Each head comprises two 2D convolution layers and a ReLU activation function, and the last layer is a Log-Sigmoid activation function. The MLP_path network provides a reward for each path-generating action, and MLP_goal provides a reward for terminating path generation, used to determine the position at which the actions end. The invention pre-trains the ENet_feat network model parameters on the Aeroscapes dataset, which accelerates the convergence of scene reward-map training;
Fig. 3 shows a comparison between random sampling and Gumbel sampling:
s21, calculating an optimal strategy by using inverse reinforcement learning;
after the path reward map and the end point reward map are obtained in the step S1, a reverse reinforcement learning algorithm is used for collecting a batch of pedestrian tracks in a grid environment with the size of 21 x 21, the forms of return functions are deduced after the pedestrian tracks are obtained, then behavior strategies are optimized according to the return functions, and finally the strategy pi is obtained through explorationθ(as), in the method of reasoning the return function, the maximum entropy reverse reinforcement learning is a probabilistic thinking reasoning mode, a sampling method is adopted, a model-free reinforcement learning algorithm is used for learning an optimal strategy under the current reward setting, then the strategy is used for collecting { Tj } to carry out unbiased estimation, on the other hand, the learning of the model-free reinforcement learning algorithm is carried out, an inaccurate strategy is used for similar estimation gradient, importance sampling is used for overcoming the deviation problem, an approximation iterative algorithm is introduced to calculate the optimal strategy piθUsing the formula: piθ(a | s) ═ exp (Q (s, a) -v (s)), where the state value function v(s) represents the desired prize for state s, and the state-action pair (s, a) is the desired prize; the strategy represents the probability of selecting action a given a grid location s, where all nets in the inverse reinforcement learning environmentGrid position s is spThe sum of the path points sgThe sum of the target points, action a is five actions of up, down, left, right and termination, and the action value Q is the state of the five actions(n)(s, a) is assigned, and after operation, the next wheel state function V is assigned(n-1)(s), finally obtaining a given grid position s, adopting the probability of each action a, and obtaining a strategy pi after N cyclesθN is determined by the policy generation path length;
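The sketch below shows one way such an approximate iteration (soft value iteration) can be realized in NumPy; the wrap-around boundary handling via np.roll and the random toy reward are simplifying assumptions:

```python
import numpy as np

def soft_value_iteration(reward: np.ndarray, n_iters: int) -> np.ndarray:
    """Iterate Q(s,a) = r(s) + V(s'), V(s) = log sum_a exp(Q(s,a)) on the
    21 x 21 grid, then return pi_theta(a|s) = exp(Q(s,a) - V(s))."""
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # up, down, left, right, stop
    V = np.zeros_like(reward)
    for _ in range(n_iters):
        # Q for each action: reward at s plus value of the cell the action moves to
        # (np.roll wraps at the border, a simplification of true grid dynamics).
        Q = np.stack([reward + np.roll(V, (dy, dx), axis=(0, 1)) for dy, dx in actions])
        m = Q.max(axis=0)
        V = m + np.log(np.exp(Q - m).sum(axis=0))         # stable log-sum-exp backup
    return np.exp(Q - V)                                  # pi_theta, shape (5, 21, 21)

pi_theta = soft_value_iteration(np.random.rand(21, 21) - 1.0, n_iters=40)
```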
s22, using GST strategy to sample;
introducing a GST (Gumbel-softmax Trick) method to sample from discontinuous probability distribution, generating a scene path according to an optimal strategy to improve sampling efficiency, and firstly, carrying out strategy piθSampling to obtain process xπ=softmax(logπi+Gi) Wherein G isiRepresenting a standard Gumbel distribution of independent equal distributions, the Gumbel distribution being a kind of extreme value distribution whose cumulative distribution function is
Figure BDA0003523911310000081
GiIt can be obtained from a uniform distribution by inverting the Gumbel distribution: gi=-log(-log(Ui)),UiThe discrete probability distribution is meaningful rather than only taking the value with the maximum probability, then the gradient needs to be calculated, and GST can correctly reflect the real probability distribution of the strategy, so that the problem of insufficient robustness of random sampling of the optimal strategy is solved;
FIG. 4 is a schematic diagram of the fusion of scene and pedestrian observation trajectory path generation by the SBA module;
s31 full convolution network
After a strategy sampling Scene path and an observation track are input, the strategy sampling Scene path and the observation track are respectively sent to a Scene-Based Attention mechanism (SBA) module after being coded, a prediction track is output through a decoder, a Scene path Scene _ path is input, the position code is sent to a bidirectional BiGRU by utilizing a full convolution network FC1 to carry out time sequence coding to generate a Scene path hidden state h _ Scene, and the observation path hidden state h _ Scene is obtained by observingThe trajectory obs _ traj is encoded by FC2, and then sent to GRUenc for time-series encoding to generate the motion hidden state h _ obs, so as to obtain h _ scene ═ BiGRU (FC)1(scenepath),wBi) And h _ obs ═ GRUenc(FC2(obstraj),wGRU) Wherein w isBiAnd wGRULearnable parameters respectively representing BiGRU () and GRUenc (), the full convolution networks FC1 and FC2 are responsible for position coding, and the BiGRU is used for coding the scene path, so that the priori performance of the scene path can be better exerted, and the track prediction effect can be improved;
s32, threshold cycle unit time sequence coding
The threshold cycle Unit (GRU) is a variant of Recurrent Neural Network (RNN) and has smaller calculation amount, each threshold cycle Unit comprises an update gate for controlling information transmission to process the information at the current moment and the previous moment so as to control the current state, the reset gate is similar to the update gate, but the side of the reset gate controls the dependency degree of the previous moment, therefore, the threshold cycle Unit can effectively capture the long-short term relation of the sequence and is more suitable for solving the problem of dynamic identification, a formula is generated after the time sequence coding, and wBiAnd wGRULearnable parameters of BiGRU () and GRUenc () are respectively represented, the full convolution networks FC1 and FC2 are responsible for position coding, and a bidirectional threshold cycle unit is used for coding a scene path, so that the priori performance of the scene path is better exerted, and the track prediction effect is improved;
s33, fusing a scene-based attention mechanism module;
the method comprises the steps of utilizing a multi-head attention mechanism in a transform framework, introducing a scene-based attention mechanism module to better obtain a fusion track and scene information after obtaining a scene path, inputting a strategy sampling scene path and an observation track, respectively sending the scene path and the observation track into the scene fusion attention mechanism module after encoding, outputting a prediction track through a decoder, and on the premise of fully retaining scene path characteristics, inputting observation track codes h _ obs (t is n) and scene path codes h _ scene to obtain codes h _ obs (t is n +1) of the next moment, wherein h _ obs is softmax (h _ obs)t=n*h_scene) h _ scene, yielding h _ obst=n+1Then, GRU is useddecEncoding timing information and using a position decoder FC3Decoding is carried out
Figure BDA0003523911310000101
Output predicted position at that time:
Figure BDA0003523911310000102
in light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (4)

1. A track prediction method based on fusion inverse reinforcement learning is characterized by comprising the following steps:
s1, generating a path reward map and an end point reward map based on the input observation track and the scene graph;
s2, obtaining a path by sampling the strategy by using an inverse reinforcement learning algorithm;
s3, path position coding is carried out by using a full convolution network, a scene path is coded by fusing a bidirectional gating circulation unit, and the scene path and a pedestrian observation track are fused.
2. The trajectory prediction method based on the fusion inverse reinforcement learning of claim 1, wherein the generating of the path reward map and the end point reward map based on the input observation trajectory and the scene graph comprises the following steps:
s11, introducing an observation track and extracting scene features;
encoding and decoding the scene graph by utilizing an ENet semantic segmentation network to generate a scene path; meanwhile, the observation track is introduced to calculate the speed v of the intelligent agent and the distance r between the intelligent agent and the scene center to form the motion characteristic phi of the intelligent agentM(ii) a Image processing methodFusing the information of the features and the track features to generate scene features;
s12, generating a reward map;
connecting scene characteristics and motion characteristics and inputting the scene characteristics and the motion characteristics into an MLPpathAnd MLPgoalAnd the network generates a path reward map and an end point reward map respectively.
3. The method for predicting the track based on the fusion inverse reinforcement learning as claimed in claim 2, wherein the step of obtaining the path by sampling the strategy with the inverse reinforcement learning algorithm comprises the following steps:
s21, calculating an optimal strategy by using inverse reinforcement learning;
after a path reward map and an end point reward map are obtained, a strategy pi is obtained by utilizing an inverse reinforcement learning algorithmθ(as), the sampling probability of the sequence is in direct proportion to the reward of the sequence, which is expressed as
Figure FDA0003523911300000011
Where K is a normalization constant, N represents the length of the sequence, r (τ) represents the prize value for the sequence, r(s)i) Representing a single state-action pair reward value; deriving reward functions from expert examples, generating a policy algorithm with the goal of maximizing the set of expert examples, T ═ τ12,...,τN) The log-likelihood value of (a) is,
Figure FDA0003523911300000021
solving the log-likelihood value L for using a gradient descent methodθIs simplified to
Figure FDA0003523911300000022
Wherein r isθ(τ) sequential reward value, Z, for exploration strategy GenerationθTo explore the strategically learned constants, FπIs the state access frequency, F, of the expert example τθIs strategy piθThe state access frequency of the generation path,
Figure FDA0003523911300000023
and
Figure FDA0003523911300000024
respectively represent the pair likelihood values LθAnd a reward function rθDerivation is carried out;
Figure FDA0003523911300000025
the probability of the action a is selected given a grid position s obtained by back propagation, wherein all grid positions s in the inverse reinforcement learning environment are spThe sum of the path points sgThe sum of the target points, action a is five actions of up, down, left, right and termination;
s22, using GST strategy to sample;
generating a scene path according to the optimal strategy, and introducing GST to strategy piθ *Perform a sampling generation action xπThe specific process is xπ=soft max(logπi+Gi) Wherein, GiA standard Gumbel distribution representing the same distribution independently, the cumulative distribution function of the Gumbel distribution being
Figure FDA0003523911300000026
GiObtained from the uniform distribution by inverting the Gumbel distribution.
4. The method for predicting the trajectory based on fusion inverse reinforcement learning according to claim 1, wherein encoding path positions with a fully convolutional network, encoding the scene path with a bidirectional gated recurrent unit, and fusing the scene path with the pedestrian observation trajectory comprises the following steps:
s31, a full convolution network;
the position coding is sent to a fusion bidirectional gating circulating unit through a full convolution network for time sequence coding, after a strategy sampling scene path and an observation track are input, the position coding is sent to a scene-based attention mechanism module through coding, a prediction track is output through a decoder, a scene path scene _ path is input, and the position coding is carried out by utilizing the full convolution network FC1The codes are sent into a fusion bidirectional gating circulation unit to carry out time sequence coding to generate a scene path hidden state code h _ scene, for an observation track obs _ traj, position codes are sent into GRUenc to carry out time sequence coding after FC2 is used to generate an observation track hidden state code h _ obs, and h _ scene is obtained as biGRU (FC)1(scenepath),wBi) And h _ obs ═ GRUenc(FC2(obstraj),wGRU) Wherein w isBiAnd wGRULearnable parameters representing BiGRU () and GRUenc () respectively;
s32, a Gate Recycling Unit (GRU) time sequence coding;
each threshold cycle unit comprises an update gate for controlling information transmission, and processes information at the current time and the previous time so as to control the current state; resetting the long-short term relation of the sequence effectively captured by the threshold cycle unit at the moment before the gate control;
s33, fusing a scene-based attention mechanism module;
inputting t-n time observation track hidden state code h _ obs (t-n) and scene path hidden state code h _ scene, obtaining code h _ obs (t-n +1) of next time,
Figure FDA0003523911300000031
get h _ obst=n+1Then, GRU is useddecEncoding timing information and decoding using a position decoder FC3
Figure FDA0003523911300000032
Output predicted position at that time:
Figure FDA0003523911300000033
CN202210189127.7A 2022-02-28 2022-02-28 Track prediction method based on fusion inverse reinforcement learning Pending CN114445465A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210189127.7A | 2022-02-28 | 2022-02-28 | Track prediction method based on fusion inverse reinforcement learning (CN114445465A)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210189127.7A | 2022-02-28 | 2022-02-28 | Track prediction method based on fusion inverse reinforcement learning (CN114445465A)

Publications (1)

Publication Number Publication Date
CN114445465A true CN114445465A (en) 2022-05-06

Family

ID=81373523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210189127.7A Pending CN114445465A (en) 2022-02-28 2022-02-28 Track prediction method based on fusion inverse reinforcement learning

Country Status (1)

Country Link
CN (1) CN114445465A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117273225A (en) * 2023-09-26 2023-12-22 西安理工大学 Pedestrian path prediction method based on space-time characteristics
CN117273225B (en) * 2023-09-26 2024-05-03 西安理工大学 Pedestrian path prediction method based on space-time characteristics
CN117473961A (en) * 2023-12-27 2024-01-30 卓世科技(海南)有限公司 Market document generation method and system based on large language model
CN117473961B (en) * 2023-12-27 2024-04-05 卓世科技(海南)有限公司 Market document generation method and system based on large language model
CN117808846A (en) * 2024-02-01 2024-04-02 中国科学院空天信息创新研究院 Target motion trail prediction method and device based on lightweight remote sensing basic model
CN117875535A (en) * 2024-03-13 2024-04-12 中南大学 Method and system for planning picking and delivering paths based on historical information embedding
CN117875535B (en) * 2024-03-13 2024-06-04 中南大学 Method and system for planning picking and delivering paths based on historical information embedding

Similar Documents

Publication Publication Date Title
CN112418409B (en) Improved convolution long-short-term memory network space-time sequence prediction method by using attention mechanism
CN114445465A (en) Track prediction method based on fusion inverse reinforcement learning
CN110458844B (en) Semantic segmentation method for low-illumination scene
CN113688723B (en) Infrared image pedestrian target detection method based on improved YOLOv5
CN108596958B (en) Target tracking method based on difficult positive sample generation
CN110309732B (en) Behavior identification method based on skeleton video
Saxena et al. D-GAN: Deep generative adversarial nets for spatio-temporal prediction
CN112863180A (en) Traffic speed prediction method, device, electronic equipment and computer readable medium
CN113139446B (en) End-to-end automatic driving behavior decision method, system and terminal equipment
CN111652903A (en) Pedestrian target tracking method based on convolution correlation network in automatic driving scene
CN113378775B (en) Video shadow detection and elimination method based on deep learning
CN110599443A (en) Visual saliency detection method using bidirectional long-term and short-term memory network
CN116246338B (en) Behavior recognition method based on graph convolution and transducer composite neural network
CN115331460B (en) Large-scale traffic signal control method and device based on deep reinforcement learning
CN113313123A (en) Semantic inference based glance path prediction method
CN114116944A (en) Trajectory prediction method and device based on time attention convolution network
CN115457081A (en) Hierarchical fusion prediction method based on graph neural network
CN115376103A (en) Pedestrian trajectory prediction method based on space-time diagram attention network
CN115457657A (en) Method for identifying channel characteristic interaction time modeling behaviors based on BERT model
CN115113165A (en) Radar echo extrapolation method, device and system
Li et al. Video prediction for driving scenes with a memory differential motion network model
CN117456449B (en) Efficient cross-modal crowd counting method based on specific information
CN117237411A (en) Pedestrian multi-target tracking method based on deep learning
CN115861664A (en) Feature matching method and system based on local feature fusion and self-attention mechanism
Šarić et al. Dense semantic forecasting in video by joint regression of features and feature motion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination