CN114445465A - Track prediction method based on fusion inverse reinforcement learning - Google Patents
Info
- Publication number
- CN114445465A (application CN202210189127.7A)
- Authority
- CN
- China
- Prior art keywords
- scene
- path
- track
- reinforcement learning
- strategy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/251—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models (G—Physics; G06T—Image data processing; G06T7/20—Analysis of motion)
- G06N3/044—Recurrent networks, e.g. Hopfield networks (G06N—Computing arrangements based on specific computational models; G06N3/02—Neural networks)
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06T2207/30241—Trajectory (indexing scheme for image analysis; subject or context of image processing)
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to the technical field of pedestrian trajectory prediction and analysis, in particular to a trajectory prediction method based on fusion inverse reinforcement learning, comprising: S1, generating a path reward map and an end-point reward map based on an input observation trajectory and a scene graph; S2, obtaining a path by sampling a policy learned with an inverse reinforcement learning algorithm; S3, performing path position encoding with a fully convolutional network, encoding the scene path with a fused bidirectional gated recurrent unit, and fusing the scene path with the pedestrian observation trajectory. By introducing the lightweight feature-extraction ENet network, the method reduces the number of algorithm parameters and improves the algorithm's ability to generalize in understanding scenes; by using the scene-based attention module, scene information and pedestrian observation trajectories are fused more effectively, and compared with mainstream algorithms, the scene-oriented pedestrian trajectory prediction network S2Tirl achieves better results on public datasets and real-world data.
Description
Technical Field
The invention relates to the technical field of pedestrian trajectory prediction and analysis, in particular to a trajectory prediction method based on fusion inverse reinforcement learning.
Background
With people's growing travel demands, the increasing need for intelligent transportation systems, and the continuing development of computer vision, sensor technology, and control theory, pedestrian trajectory prediction faces new requirements. At present, traditional methods rely only on observed trajectories; as the number of trajectories grows, many trajectories coexist in the same scene and a path-selection problem arises. Unlike traditional pedestrian trajectory prediction methods, the invention uses an inverse reinforcement learning method to encode the scene, fuses observation trajectory information, and proposes a scene-oriented trajectory prediction network. In urban scenes and other complex scenes, traditional methods encode only the observation trajectories and make insufficient use of scene information, so the predicted trajectories generalize poorly to new complex scenes. The scene-information fusion method proposed by the invention trains the ability to process different elements in a scene; this ability transfers to new scenes, so the inferred paths generalize better;
Using computer vision techniques, researchers can predict pedestrian trajectories from kinematic features of the observed trajectory, such as position and velocity; researchers have introduced scene features to improve the generalization of trajectory prediction, and the inverse reinforcement learning framework lays a foundation for trajectory prediction in a large number of complex scenes.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: combining inverse reinforcement learning, a semantic segmentation network, and gated recurrent units to encode scene-path and pedestrian-trajectory information separately, and outputting the final predicted trajectory after fused encoding.
The technical scheme adopted by the invention is as follows: a track prediction method based on fusion inverse reinforcement learning comprises the following steps:
S1, generating a path reward map and an end-point reward map based on the input observation trajectory and the scene graph;
Further, the step S1 includes:
S11, introducing the observation trajectory and extracting scene features;
the method comprises the steps that an asymmetric design of an ENet semantic segmentation network structure is utilized, five bottleneck modules including initialization, normal operation, down sampling, up sampling, expansion and asymmetry are adopted, and a scene graph is subjected to coding and decoding processing, so that a scene path is generated, and scene semantic understanding is achieved; meanwhile, the observation track is introduced to calculate the speed v of the intelligent agent and the distance r between the intelligent agent and the scene center to form the motion characteristic phi of the intelligent agentM(ii) a Fusing the image characteristics and the track characteristic information to generate scene characteristics;
s12, generating a reward map;
the multilayer perceptron neural network (MLP) is still perceptron in nature, also called artificial neural network, which may have a plurality of hidden layers in the middle except input and output layers, the simplest neural network only has one hidden layer, namely a three-layer structure, but the complexity is enhanced by the layer design, and there are three types of layers in the neural network, respectively: an input layer, a hidden layer, an output layer; the method comprises two layers of 2D convolution layers and a ReLU activation function, wherein the last layer is a Log-Sigmoid activation function, and scene characteristics and motion characteristics are input into an MLP (Multi-level layer processing) after being connectedpathAnd MLPgoalNetwork, finally generating path reward map and terminal reward map;
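As an illustration of this step, the following is a minimal PyTorch sketch of such a reward head, assuming two 1×1 2D convolutions with a ReLU and a Log-Sigmoid output over the fused 21 × 21 feature grid; the class name and channel sizes are assumptions, not taken from the patent:

```python
import torch.nn as nn

class RewardHead(nn.Module):
    """Hypothetical MLP_path / MLP_goal head: two 2D convolutions with a
    ReLU, then a Log-Sigmoid, mapping fused scene + motion features to a
    21x21 reward map. Channel sizes (130 = 128 scene + 2 motion) are
    assumptions for illustration."""
    def __init__(self, in_channels=130, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=1),
            nn.LogSigmoid(),  # keeps rewards negative, as the log of a probability
        )

    def forward(self, fused):   # fused: (B, 130, 21, 21)
        return self.net(fused)  # reward map: (B, 1, 21, 21)

mlp_path, mlp_goal = RewardHead(), RewardHead()  # two separately trained heads
```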
S2, taking the path reward map of S12 as the environment, introducing an inverse reinforcement learning algorithm to obtain the optimal policy, and introducing the Gumbel-Softmax Trick to sample the policy;
Further, the step S2 includes:
S21, calculating the optimal policy with inverse reinforcement learning;
After the path reward map and the end-point reward map are obtained in step S1, a policy π_θ(a|s) is sought in a grid environment of size 21 × 21 using an inverse reinforcement learning algorithm. Inverse reinforcement learning can be represented by a Markov decision process (MDP), essentially a quadruple M = {S, A, T, r}, where S is the state space, A the actions that can be taken, T the state transition function T: S × A → S, and r the reward function. Under maximum entropy, the probability of a state-action sequence τ = {(s_1, a_1), (s_2, a_2), ..., (s_N, a_N)} is proportional to the exponential of its reward:

$$P(\tau) = \frac{1}{K}\exp\big(r(\tau)\big) = \frac{1}{K}\exp\Big(\sum_{i=1}^{N} r(s_i)\Big),$$

where K is a normalization constant, N is the length of the sequence, r(τ) is the reward value of the sequence, and r(s_i) is the reward of a single state-action pair. The reward function is obtained from the expert examples, and a policy algorithm is generated whose goal is to maximize the log-likelihood of the expert example set T = (τ_1, τ_2, ..., τ_N),

$$L_\theta = \sum_{\tau \in T} \log P(\tau \mid r_\theta) = \sum_{\tau \in T} r_\theta(\tau) - N \log Z_\theta.$$

For solution by gradient descent, the gradient of the log-likelihood L_θ simplifies to

$$\frac{\partial L_\theta}{\partial \theta} = \big(F_\pi - F_\theta\big)\frac{\partial r_\theta}{\partial \theta},$$

where r_θ(τ) is the sequence reward value generated by the exploration policy, Z_θ is the normalization constant learned by the exploration policy, F_π is the state visitation frequency of the expert examples τ, F_θ is the state visitation frequency of the paths generated by the policy π_θ, and ∂L_θ/∂θ and ∂r_θ/∂θ denote the derivatives of the likelihood L_θ and the reward function r_θ respectively. π_θ(a|s), obtained by back-propagation, is the probability of selecting action a given a grid position s, where the grid positions s of the inverse reinforcement learning environment comprise the set of path points s_p and the set of target points s_g, and action a is one of five actions: up, down, left, right, and termination;
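To make the gradient concrete, here is a minimal NumPy sketch of the MaxEnt IRL gradient F_π - F_θ as a difference of state visitation frequencies on the 21 × 21 grid; the function and variable names are illustrative assumptions:

```python
import numpy as np

def visitation_frequency(paths, shape=(21, 21)):
    """Average state-visitation counts of a set of grid paths."""
    F = np.zeros(shape)
    for path in paths:
        for row, col in path:
            F[row, col] += 1.0
    return F / max(len(paths), 1)

def maxent_irl_gradient(expert_paths, policy_paths):
    """Gradient of the expert log-likelihood w.r.t. the reward map:
    expert visitation frequency minus policy visitation frequency."""
    return visitation_frequency(expert_paths) - visitation_frequency(policy_paths)

# The reward network is then updated to ascend this gradient.
```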
S22, sampling the policy with GST;
GST (Gumbel-Softmax Trick) is introduced to generate the scene path according to the optimal policy. Unlike the random sampling of traditional methods, first a sample is drawn so that the discrete probability distribution remains meaningful rather than only the highest-probability value being taken; second, the gradient must be computed. By introducing GST, sampling from the policy π_θ* generates the action x_π as

$$x_\pi = \operatorname{softmax}\big(\log \pi_i + G_i\big),$$

where G_i denotes independent and identically distributed samples of the standard Gumbel distribution; the Gumbel distribution is a type of extreme-value distribution whose cumulative distribution function is

$$F(x) = \exp\big(-\exp(-x)\big),$$

and G_i can be obtained from a uniform distribution by inverting the Gumbel distribution;
S3, performing path position encoding with a fully convolutional network, encoding the scene path with a fused bidirectional gated recurrent unit, and fusing the scene path with the pedestrian observation trajectory to improve the trajectory prediction effect;
Further, the step S3 includes:
S31, fully convolutional network;
The position encoding is sent through a fully convolutional network into a fused bidirectional gated recurrent unit (BiGRU) for temporal encoding. The fully convolutional network accepts an input scene graph of arbitrary size and uses deconvolution layers to up-sample the feature map of the last convolution layer, restoring it to the size of the input image, thereby producing a prediction for each pixel while preserving the spatial information of the original input image; finally, pixel-by-pixel classification is performed on the up-sampled feature map. After the policy-sampled scene path and the observation trajectory are input, they are separately encoded and sent into a scene-based attention (SBA) module, and the predicted trajectory is output through a decoder. Given the input scene path scene_path, the fully convolutional network FC1 produces a position encoding that is sent into the fused bidirectional gated recurrent unit for temporal encoding, generating the scene-path hidden-state encoding h_scene; for the observation trajectory obs_traj, the position encoding produced by FC2 is sent into GRU_enc for temporal encoding, generating the observation-trajectory hidden-state encoding h_obs:

$$h\_scene = \mathrm{BiGRU}\big(\mathrm{FC}_1(scene\_path),\, w_{Bi}\big), \qquad h\_obs = \mathrm{GRU}_{enc}\big(\mathrm{FC}_2(obs\_traj),\, w_{GRU}\big),$$

where w_Bi and w_GRU represent the learnable parameters of BiGRU(·) and GRU_enc(·) respectively. The fully convolutional networks FC1 and FC2 are responsible for position encoding, and the fused bidirectional gated recurrent unit encodes the scene path, better exploiting the prior of the scene path and improving the trajectory prediction effect;
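A minimal sketch of the two encoder branches follows, assuming the position encoders FC1/FC2 are linear layers over (x, y) points and the hidden size is 64; these details are assumptions, not specified in the patent:

```python
import torch.nn as nn

class PathEncoders(nn.Module):
    """Sketch of the encoders: FC1 -> BiGRU for the scene path and
    FC2 -> GRU_enc for the observation trajectory."""
    def __init__(self, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(2, hidden)   # scene-path position encoding
        self.fc2 = nn.Linear(2, hidden)   # observation position encoding
        self.bigru = nn.GRU(hidden, hidden, bidirectional=True, batch_first=True)
        self.gru_enc = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, scene_path, obs_traj):           # each: (B, T, 2)
        h_scene, _ = self.bigru(self.fc1(scene_path))  # (B, T, 2*hidden) per-step states
        _, h_obs = self.gru_enc(self.fc2(obs_traj))    # (1, B, hidden) final state
        return h_scene, h_obs.squeeze(0)
```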
S32, temporal encoding with the gated recurrent unit (GRU);
The gated recurrent unit is a variant of the recurrent neural network (RNN) with a smaller computational cost. Each gated recurrent unit contains an update gate that controls information transmission, processing the information of the current and previous moments so as to control the current state; the reset gate is similar to the update gate, but it controls the degree of dependence on the previous moment. The gated recurrent unit can therefore effectively capture the long- and short-term relationships of a sequence and is well suited to dynamic recognition problems;
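For reference, the standard update-gate and reset-gate equations of a GRU (not written out in the patent text; biases omitted for brevity) are:

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) && \text{update gate}\\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) && \text{reset gate}\\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1})\big)\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```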
S33, fusing with the scene-based attention module;
After the scene path is obtained, a scene-based attention (SBA) module built on the multi-head attention mechanism of the Transformer architecture is introduced to better fuse the trajectory and the scene information. On the premise of fully retaining the scene-path features, the correlation between the observation trajectory and the scene path is observed: inputting the observation-trajectory hidden-state encoding h_obs(t=n) at time t=n and the scene-path hidden-state encoding h_scene, the encoding of the next moment h_obs(t=n+1) is obtained as

$$h\_obs^{t=n+1} = \operatorname{softmax}\big(h\_obs^{t=n} \cdot h\_scene\big) \cdot h\_scene.$$

After obtaining h_obs^{t=n+1}, GRU_dec encodes the temporal information and the position decoder FC3 decodes it to output the predicted position at that time.
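The following sketch shows our reading of the SBA fusion formula above, treating h_obs as the query and the scene-path hidden states as both keys and values; shapes are assumptions:

```python
import torch

def scene_based_attention(h_obs, h_scene):
    """SBA step: h_obs' = softmax(h_obs . h_scene) . h_scene.
    h_obs: (B, D) trajectory hidden state; h_scene: (B, T, D) scene-path
    hidden states over T path steps."""
    scores = torch.einsum("bd,btd->bt", h_obs, h_scene)   # dot-product attention scores
    weights = torch.softmax(scores, dim=-1)               # attention over scene-path steps
    return torch.einsum("bt,btd->bd", weights, h_scene)   # fused hidden state
```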
the invention has the beneficial effects that:
1. By introducing the lightweight ENet feature-extraction network, the number of algorithm parameters is reduced and the algorithm's ability to generalize in understanding scenes is improved;
2. GST sampling of the policy correctly reflects the true probability distribution of the policy and overcomes the insufficient robustness of random-sampling policies;
3. By using the scene-based attention module, scene information and pedestrian observation trajectories are fused more effectively; compared with mainstream algorithms, the scene-oriented pedestrian trajectory prediction network S2Tirl achieves better results on public datasets and real-world data.
Drawings
FIG. 1 is a flow chart of a pedestrian trajectory prediction system based on fusion inverse reinforcement learning according to the present invention;
FIG. 2 is a schematic diagram of the generation of a reward map using an ENet network architecture as proposed in the present invention;
FIG. 3 is a schematic diagram comparing sampling with the inverse reinforcement learning algorithm against the GST scene-policy sampling proposed in the present invention;
FIG. 4 is a schematic diagram of a scene-based attention mechanism module fusing scene and pedestrian observation trajectory path generation as proposed in the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples, which are simplified schematic drawings and illustrate only the basic structure of the invention in a schematic manner, and therefore only show the structures relevant to the invention.
The method comprehensively considers scene-path and pedestrian-trajectory information, uses hidden scene information to increase prediction precision, and fuses the scene path and pedestrian trajectory information to output the final predicted trajectory;
As shown in FIG. 1, which is a flow chart of the pedestrian trajectory prediction system based on fusion inverse reinforcement learning, a trajectory prediction method based on fusion inverse reinforcement learning includes the following steps:
Through inverse reinforcement learning, the pedestrian's motion pattern is accurately understood from the pedestrian's motion trajectory, and scene features are introduced at the same time, so the motion of pedestrians around the current vehicle is predicted more accurately and the generalization of the trajectory prediction algorithm is improved. The method generates the scene path by GST (Gumbel-Softmax Trick) sampling, which can generate the scene path according to the optimal policy. Finally, the pedestrian trajectory prediction and the scene path are fused: after fusion with the scene-based attention module, the correlation between the pedestrian trajectory and the scene path is observed on the premise that the scene-path features are fully retained;
the method comprises the following specific operation steps:
FIG. 2 presents a schematic diagram of the generation of a reward map using an ENet network architecture:
S1, generating a path reward map and an end-point reward map based on the input observation trajectory and the scene graph;
S11, introducing the observation trajectory and extracting scene features;
The agent's motion feature φ_M is formed from parameters such as the velocity v computed from the observation trajectory and the distance r between the agent and the scene center, generating the motion features. After the pedestrian motion features are acquired, scene-graph information is added at the same time to improve accuracy. Using the asymmetric design of the ENet semantic segmentation network structure, with bottleneck modules for initialization, regular convolution, down-sampling, up-sampling, and dilated and asymmetric convolution, inputting the scene graph enhances the feature information of each target in the image, improves the subsequent semantic segmentation network's acquisition of effective information, and reduces the influence of image noise on the network, thereby improving target recognition accuracy. A semantic segmentation network for road scenes is built; after the global and local information of each target is accurately acquired, pixels are distinguished by the different semantics that each pixel in the scene image expresses. An image block of 168 × 168 pixels is cropped around the position at the last moment of the observation trajectory; ENet_feat, using the initialization module of ENet and the first three bottleneck blocks, converts the image of dimension 3 × 168 × 168 into a feature map of 128 × 21 × 21, and the feature information of each stage is fused to generate the scene feature φ_B;
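A minimal sketch of this cropping and feature-extraction stage follows; the stand-in convolution stack only mimics ENet's overall stride of 8 (3 × 168 × 168 → 128 × 21 × 21), and border handling is an assumption:

```python
import torch.nn as nn

def crop_scene_block(scene_img, last_pos, size=168):
    """Crop a size x size block of the scene image around the position at
    the last moment of the observation trajectory (no padding at borders)."""
    r, c = int(last_pos[0]), int(last_pos[1])
    half = size // 2
    return scene_img[:, r - half:r + half, c - half:c + half]  # (3, 168, 168)

# Stand-in for ENet_feat (initial module + first three bottleneck stages):
# three stride-2 stages reduce 168x168 to 21x21 with 128 channels.
enet_feat = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),    # -> (16, 84, 84)
    nn.Conv2d(16, 64, 3, stride=2, padding=1), nn.ReLU(),   # -> (64, 42, 42)
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),  # -> (128, 21, 21)
)
```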
S12, generating the reward maps
The scene feature φ_B generated in step S11 and the motion feature φ_M are concatenated and input into the MLP_path and MLP_goal networks, which generate the path reward map and the end-point reward map respectively. A multilayer perceptron (MLP) neural network is essentially a perceptron whose complexity is increased through the design of its layers; there are three types of layers in the neural network: the input layer, the hidden layers, and the output layer. The head comprises two 2D convolution layers and a ReLU activation function, and the last layer is a Log-Sigmoid activation function. MLP_path provides a reward for each action that generates the path and can be used to judge where an action terminates, while MLP_goal provides a reward for the termination of the generated path, used to determine the termination position of an action. The invention pre-trains the ENet_feat network model parameters on the Aeroscapes dataset, which accelerates the convergence of scene reward map training;
FIG. 3 shows a comparison between random sampling and Gumbel sampling:
S21, calculating the optimal policy with inverse reinforcement learning;
After the path reward map and the end-point reward map are obtained in step S1, a batch of pedestrian trajectories is collected in a grid environment of size 21 × 21 using the inverse reinforcement learning algorithm; the form of the return function is inferred from these trajectories, the behavior policy is then optimized according to the return function, and finally the policy π_θ(a|s) is obtained through exploration. Among methods for inferring the return function, maximum entropy inverse reinforcement learning is a probabilistic inference approach: a sampling method is adopted, a model-free reinforcement learning algorithm learns the optimal policy under the current reward setting, and that policy is then used to collect {τ_j} for unbiased estimation; on the other hand, to avoid fully re-running model-free reinforcement learning, an inaccurate policy is used to approximately estimate the gradient, and importance sampling is used to overcome the resulting bias. An approximate iterative algorithm is introduced to compute the optimal policy π_θ using the formula

$$\pi_\theta(a \mid s) = \exp\big(Q(s,a) - V(s)\big),$$

where the state value function V(s) represents the expected reward of state s, and Q(s,a) the expected reward of the state-action pair (s,a). The policy represents the probability of selecting action a given a grid position s, where the grid positions s of the inverse reinforcement learning environment comprise the set of path points s_p and the set of target points s_g, and action a is one of five actions: up, down, left, right, and termination. The action values Q^(n)(s,a) of the five actions are assigned and, after the update, the next-round state function V^(n-1)(s) is computed; finally, the probability of taking each action a given a grid position s is obtained, and the policy π_θ results after N iterations, where N is determined by the length of the policy-generated path;
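A minimal sketch of this soft value iteration on the 21 × 21 reward map follows; the log-sum-exp backup is the standard maximum-entropy form, and the `step` transition helper and iteration count are illustrative assumptions:

```python
import numpy as np

def soft_value_iteration(reward, step, actions, n_iters):
    """Compute pi(a|s) = exp(Q(s,a) - V(s)) on a 21x21 grid.
    `actions` are the five moves (up, down, left, right, terminate) and
    `step(s, a)` returns the next grid state for state s and action a."""
    V = np.full(reward.shape, -1e9)
    Q = {}
    for _ in range(n_iters):
        for s in np.ndindex(reward.shape):
            Q[s] = np.array([reward[s] + V[step(s, a)] for a in actions])
        for s in Q:
            V[s] = np.logaddexp.reduce(Q[s])    # soft maximum over actions
    return {s: np.exp(Q[s] - V[s]) for s in Q}  # policy pi(a|s)
```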
S22, sampling the policy with GST;
A GST (Gumbel-Softmax Trick) method is introduced to sample from the discontinuous probability distribution and generate the scene path according to the optimal policy, improving sampling efficiency. First, the policy π_θ is sampled, giving

$$x_\pi = \operatorname{softmax}\big(\log \pi_i + G_i\big),$$

where G_i denotes independent and identically distributed samples of the standard Gumbel distribution, an extreme-value distribution whose cumulative distribution function is

$$F(x) = \exp\big(-\exp(-x)\big);$$

G_i can be obtained from a uniform distribution by inverting the Gumbel distribution: G_i = -log(-log(U_i)), with U_i ~ Uniform(0, 1). Sampling keeps the discrete probability distribution meaningful rather than only taking the value with maximum probability; the gradient then needs to be computed, and GST correctly reflects the true probability distribution of the policy, which overcomes the insufficient robustness of randomly sampling the optimal policy;
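A minimal sketch of this GST sampling over the five grid actions follows; the temperature parameter is an assumption (the patent's formula omits it):

```python
import numpy as np

def gumbel_softmax_sample(log_probs, temperature=1.0, rng=np.random):
    """Draw Gumbel noise by inverting the Gumbel CDF, G = -log(-log(U)),
    U ~ Uniform(0, 1), then relax the argmax with a softmax."""
    U = rng.uniform(1e-9, 1.0, size=log_probs.shape)
    G = -np.log(-np.log(U))        # standard Gumbel samples
    y = (log_probs + G) / temperature
    y = np.exp(y - y.max())        # numerically stable softmax
    return y / y.sum()

# Example over the five actions (up, down, left, right, terminate):
probs = gumbel_softmax_sample(np.log(np.array([0.4, 0.3, 0.15, 0.1, 0.05])))
action = int(np.argmax(probs))     # sampled action index
```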
FIG. 4 is a schematic diagram of the fusion of scene and pedestrian observation trajectory path generation by the SBA module;
S31, fully convolutional network
After the policy-sampled scene path and the observation trajectory are input, they are encoded and sent separately into the Scene-Based Attention (SBA) module, and the predicted trajectory is output through a decoder. Given the input scene path scene_path, the fully convolutional network FC1 produces a position encoding that is sent into the bidirectional BiGRU for temporal encoding to generate the scene-path hidden state h_scene; the observation trajectory obs_traj is encoded by FC2 and sent into GRU_enc for temporal encoding to generate the motion hidden state h_obs, giving

$$h\_scene = \mathrm{BiGRU}\big(\mathrm{FC}_1(scene\_path),\, w_{Bi}\big), \qquad h\_obs = \mathrm{GRU}_{enc}\big(\mathrm{FC}_2(obs\_traj),\, w_{GRU}\big),$$

where w_Bi and w_GRU represent the learnable parameters of BiGRU(·) and GRU_enc(·) respectively. The fully convolutional networks FC1 and FC2 are responsible for position encoding, and the BiGRU encodes the scene path, which better exploits the scene-path prior and improves the trajectory prediction effect;
S32, gated recurrent unit temporal encoding
The gated recurrent unit (GRU) is a variant of the recurrent neural network (RNN) with a smaller computational cost. Each gated recurrent unit contains an update gate that controls information transmission, processing the information of the current and previous moments so as to control the current state; the reset gate is similar to the update gate, but it controls the degree of dependence on the previous moment. The gated recurrent unit can therefore effectively capture the long- and short-term relationships of a sequence and is well suited to dynamic recognition problems. The formulas above result from the temporal encoding: w_Bi and w_GRU represent the learnable parameters of BiGRU(·) and GRU_enc(·), the fully convolutional networks FC1 and FC2 are responsible for position encoding, and the bidirectional gated recurrent unit encodes the scene path, better exploiting the scene-path prior and improving the trajectory prediction effect;
S33, fusing with the scene-based attention module;
Using the multi-head attention mechanism of the Transformer architecture, a scene-based attention module is introduced after the scene path is obtained to better fuse the trajectory and scene information. The policy-sampled scene path and the observation trajectory are input, encoded, and sent separately into the scene-fusion attention module, and the predicted trajectory is output through a decoder. On the premise of fully retaining the scene-path features, the observation-trajectory encoding h_obs(t=n) and the scene-path encoding h_scene are input to obtain the encoding of the next moment h_obs(t=n+1):

$$h\_obs^{t=n+1} = \operatorname{softmax}\big(h\_obs^{t=n} \cdot h\_scene\big) \cdot h\_scene.$$

After obtaining h_obs^{t=n+1}, GRU_dec encodes the temporal information and the position decoder FC3 decodes it to output the predicted position at that time.
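A minimal sketch of the decoding loop follows, reusing the `scene_based_attention` sketch from S33 of the disclosure above; the hidden size, the 12-step prediction horizon, and feeding the fused state back as the GRU input are all assumptions:

```python
import torch
import torch.nn as nn

hidden_dim, pred_len = 64, 12
gru_dec = nn.GRUCell(hidden_dim, hidden_dim)   # GRU_dec temporal encoding
fc3 = nn.Linear(hidden_dim, 2)                 # FC3 position decoder -> (x, y)

def decode(h_obs, h_scene):
    """Autoregressively fuse, advance, and decode pred_len positions."""
    h = h_obs                                      # (B, hidden_dim)
    positions = []
    for _ in range(pred_len):
        fused = scene_based_attention(h, h_scene)  # SBA fusion step
        h = gru_dec(fused, h)                      # advance the hidden state
        positions.append(fc3(h))                   # predicted position at this time
    return torch.stack(positions, dim=1)           # (B, pred_len, 2)
```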
in light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.
Claims (4)
1. A track prediction method based on fusion inverse reinforcement learning is characterized by comprising the following steps:
S1, generating a path reward map and an end-point reward map based on the input observation trajectory and the scene graph;
S2, obtaining a path by sampling the policy learned with an inverse reinforcement learning algorithm;
S3, performing path position encoding with a fully convolutional network, encoding the scene path with a fused bidirectional gated recurrent unit, and fusing the scene path with the pedestrian observation trajectory.
2. The trajectory prediction method based on fusion inverse reinforcement learning of claim 1, wherein generating the path reward map and the end-point reward map based on the input observation trajectory and the scene graph comprises the following steps:
S11, introducing the observation trajectory and extracting scene features;
encoding and decoding the scene graph with the ENet semantic segmentation network to generate a scene path; meanwhile, introducing the observation trajectory to compute the agent's velocity v and its distance r to the scene center, forming the agent's motion feature φ_M; and fusing the image features and the trajectory feature information to generate the scene features;
S12, generating the reward maps;
concatenating the scene features and the motion features and inputting them into the MLP_path and MLP_goal networks, which generate the path reward map and the end-point reward map respectively.
3. The trajectory prediction method based on fusion inverse reinforcement learning of claim 2, wherein obtaining the path by sampling the policy with the inverse reinforcement learning algorithm comprises the following steps:
S21, calculating the optimal policy with inverse reinforcement learning;
after the path reward map and the end-point reward map are obtained, a policy π_θ(a|s) is obtained with the inverse reinforcement learning algorithm; the sampling probability of a sequence is proportional to the reward of the sequence, expressed as

$$P(\tau) = \frac{1}{K}\exp\Big(\sum_{i=1}^{N} r(s_i)\Big),$$

where K is a normalization constant, N is the length of the sequence, r(τ) is the reward value of the sequence, and r(s_i) is the reward of a single state-action pair; the reward function is obtained from expert examples, and a policy algorithm is generated whose goal is to maximize the log-likelihood of the expert example set T = (τ_1, τ_2, ..., τ_N),

$$L_\theta = \sum_{\tau \in T} \log P(\tau \mid r_\theta);$$

for solution by gradient descent, the gradient of the log-likelihood L_θ simplifies to

$$\frac{\partial L_\theta}{\partial \theta} = \big(F_\pi - F_\theta\big)\frac{\partial r_\theta}{\partial \theta},$$

where r_θ(τ) is the sequence reward value generated by the exploration policy, Z_θ is the normalization constant learned by the exploration policy, F_π is the state visitation frequency of the expert examples τ, F_θ is the state visitation frequency of the paths generated by the policy π_θ, and ∂L_θ/∂θ and ∂r_θ/∂θ denote the derivatives of the likelihood L_θ and the reward function r_θ respectively; π_θ(a|s), obtained by back-propagation, is the probability of selecting action a given a grid position s, where the grid positions s of the inverse reinforcement learning environment comprise the set of path points s_p and the set of target points s_g, and action a is one of five actions: up, down, left, right, and termination;
S22, sampling the policy with GST;
generating the scene path according to the optimal policy by introducing GST: sampling from the policy π_θ* generates the action x_π as

$$x_\pi = \operatorname{softmax}\big(\log \pi_i + G_i\big),$$

where G_i denotes independent and identically distributed samples of the standard Gumbel distribution, whose cumulative distribution function is

$$F(x) = \exp\big(-\exp(-x)\big);$$

G_i is obtained from the uniform distribution by inverting the Gumbel distribution.
4. The trajectory prediction method based on fusion inverse reinforcement learning of claim 1, wherein performing path position encoding with the fully convolutional network, encoding the scene path with the fused bidirectional gated recurrent unit, and fusing the scene path with the pedestrian observation trajectory comprises the following steps:
S31, fully convolutional network;
the position encoding is sent through the fully convolutional network into the fused bidirectional gated recurrent unit for temporal encoding; after the policy-sampled scene path and the observation trajectory are input and encoded, they are sent into the scene-based attention module, and the predicted trajectory is output through a decoder; given the input scene path scene_path, the fully convolutional network FC1 produces a position encoding that is sent into the fused bidirectional gated recurrent unit for temporal encoding, generating the scene-path hidden-state encoding h_scene; for the observation trajectory obs_traj, the position encoding produced by FC2 is sent into GRU_enc for temporal encoding, generating the observation-trajectory hidden-state encoding h_obs, giving

$$h\_scene = \mathrm{BiGRU}\big(\mathrm{FC}_1(scene\_path),\, w_{Bi}\big), \qquad h\_obs = \mathrm{GRU}_{enc}\big(\mathrm{FC}_2(obs\_traj),\, w_{GRU}\big),$$

where w_Bi and w_GRU represent the learnable parameters of BiGRU(·) and GRU_enc(·) respectively;
S32, gated recurrent unit (GRU) temporal encoding;
each gated recurrent unit comprises an update gate that controls information transmission, processing the information of the current and previous moments so as to control the current state; the reset gate controls the dependence on the previous moment, so the gated recurrent unit effectively captures the long- and short-term relationships of the sequence;
S33, fusing with the scene-based attention module;
inputting the observation-trajectory hidden-state encoding h_obs(t=n) at time t=n and the scene-path hidden-state encoding h_scene, the encoding of the next moment h_obs(t=n+1) is obtained as

$$h\_obs^{t=n+1} = \operatorname{softmax}\big(h\_obs^{t=n} \cdot h\_scene\big) \cdot h\_scene;$$

after obtaining h_obs^{t=n+1}, GRU_dec encodes the temporal information and the position decoder FC3 decodes it to output the predicted position at that time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210189127.7A | 2022-02-28 | 2022-02-28 | Track prediction method based on fusion inverse reinforcement learning
Publications (1)
Publication Number | Publication Date |
---|---|
CN114445465A (en) | 2022-05-06
Family
ID=81373523
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210189127.7A (pending) | Track prediction method based on fusion inverse reinforcement learning | 2022-02-28 | 2022-02-28
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114445465A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117273225A (en) * | 2023-09-26 | 2023-12-22 | 西安理工大学 | Pedestrian path prediction method based on space-time characteristics |
CN117273225B (en) * | 2023-09-26 | 2024-05-03 | 西安理工大学 | Pedestrian path prediction method based on space-time characteristics |
CN117473961A (en) * | 2023-12-27 | 2024-01-30 | 卓世科技(海南)有限公司 | Market document generation method and system based on large language model |
CN117473961B (en) * | 2023-12-27 | 2024-04-05 | 卓世科技(海南)有限公司 | Market document generation method and system based on large language model |
CN117808846A (en) * | 2024-02-01 | 2024-04-02 | 中国科学院空天信息创新研究院 | Target motion trail prediction method and device based on lightweight remote sensing basic model |
CN117875535A (en) * | 2024-03-13 | 2024-04-12 | 中南大学 | Method and system for planning picking and delivering paths based on historical information embedding |
CN117875535B (en) * | 2024-03-13 | 2024-06-04 | 中南大学 | Method and system for planning picking and delivering paths based on historical information embedding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |