WO2021077446A1 - Prediction method for trajectory of vehicle, prediction system and vehicle - Google Patents
- Publication number: WO2021077446A1 (application PCT/CN2019/113683)
- Authority: WIPO (PCT)
Classifications
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
- G06F30/15—Vehicle, aircraft or watercraft design (Geometric CAD)
- G06N3/006—Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/08—Neural network learning methods
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- Fig. 2 shows an example where the measurements are performed by the predicted vehicle using a LIDAR with 12 beams.
- Fig. 3 shows an example where the measurements are performed by another surrounding vehicle, in the case where the predicted vehicle is not the one on which the LIDAR and other measurement units are embedded. In this case, the scene descriptor must be reconstructed, since the raw measurements are relative to the ego-vehicle while the scene descriptor is relative to the predicted vehicle.
- Since s_j embeds the predicted vehicle's velocity and the scene descriptor is relative to the predicted vehicle's position along the trajectory, s_j depends on the predicted vehicle trajectory. This dependency is reflected through the notation s_j[ζ], to emphasize that the measurements are performed along trajectory ζ.
- [s[ζ]]_j will also be used to refer to the j-th element of the full sequence, also called the situation trajectory.
- the initial situation is denoted as s 0 and follows the structure presented in the method S310 of the definition of initial situation.
- method S3 of the description of situations comprises a method S320 of the definition of potential future situations.
- this potential situation is then denoted where usually j is a future expected sample.
- Such an estimation can be performed using a linear model on the moving objects, such that the longitudinal position x_k((j+1)·Δ_t) of moving object k at time (j+1)·Δ_t is given by:
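As a minimal sketch of such a constant-velocity linear extrapolation (the function name and signature are illustrative, not from the patent):

```python
def extrapolate_linear(x0, v0, dt, n_samples):
    """Constant-velocity extrapolation of a moving object's longitudinal
    position for the next n_samples instants: x((j+1)*dt) = x(j*dt) + v*dt."""
    return [x0 + v0 * dt * (k + 1) for k in range(n_samples)]
```

The same update applies per axis when both longitudinal and lateral positions are extrapolated.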
- the step S2 includes S210 of situation-action trajectory definition and S220 of trajectory selection.
- A situation-action trajectory associated with trajectory ζ is defined as a sequence formed at sample j by concatenating situation s_j[ζ] and the predicted vehicle action [ζ_actions[ζ]]_j, i.e.
- the reward function is used for implementing the Inverse (or Reverse) Reinforcement Learning.
- At each sample k, the reward is high when action a_k is adapted to situation s_k for a particular purpose.
- Adapting a reward function to this purpose passes through its parameterization, which is emphasized by the notation R_θ, referring to parameter θ, or θ* when this parameter is the optimal one for the purpose.
- Here, the reward function is associated with the purpose of mimicking human behavior, such that R_θ(ζ) is high if the predicted vehicle trajectory ζ is human-like, i.e. close to the one a human-driven vehicle would follow in the same situations.
- Fig. 4 shows human driven trajectory, multiple generated ones and a unique selected one.
- two types of applications can be derived comprising: one associated with autonomous driving trajectory planning, for which human-like trajectory enhances acceptability from passengers and surrounding drivers, and another one associated with surrounding vehicles trajectory prediction, supposed driven by human drivers, which reduces environment uncertainty for autonomous driving trajectory planning and thus increases safety.
- The main difference between the two applications is that the measurements are performed from the predicted vehicle itself in the case of trajectory planning, whereas they are performed from the vehicle on which the algorithm is embedded in the case of surrounding-vehicle trajectory prediction.
- As a consequence, in the case of trajectory prediction the descriptor D_j requires computation and is not a simple collection of measurements, and some information may be missing due to occlusion of the predicted vehicle's near environment as seen from the vehicle on which the algorithm is embedded. This distinction is addressed in step S310.
- Parameters include t_hor, θ, Δ_T, Δ_v-, Δ_v+, Δ_y-, Δ_y+, Δ_D- and Δ_D+, and the algorithm includes the following steps:
- a step S200 of recording N_HT sequences of actions associated with human-driven trajectories and the associated situations;
- a step S201 of measuring the position, velocity and acceleration of all measurable moving objects, i.e. the relative distance to the predicted vehicle, velocity and acceleration of all surrounding vehicles;
- a step S202 of measuring the velocity v_0 and acceleration a_0 of the predicted vehicle;
- a step S203 of computing the scene descriptor relative to the predicted vehicle as presented in (17);
- a step S204 of concatenating the measures of step S202 and step S203 to form the initial situation s_0 as presented in (18);
- a step S206 including a step S2061 of computing, where i denotes the trajectory index; a step S2062 of using a linear motion model to extrapolate the displacement of surrounding vehicles for all next N samples; a step S2063 of generating, using the estimated situations obtained at step S202 and the potential actions obtained at S201, situation-action trajectories as presented in (19); and a step S2064 of computing the global reward cost as presented in (20); and
- a step S207 of selecting a trajectory according to
- The predicted or planned trajectory can thus be obtained through steps S200 to S207.
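The scoring-and-selection core of steps S2064 and S207 can be sketched as follows; `reward_fn` stands in for the calibrated reward R_θ, and all names are illustrative assumptions rather than the patent's own implementation:

```python
def select_trajectory(situation_action_trajs, reward_fn):
    """Sketch of S2064/S207: sum the per-sample rewards of each candidate
    situation-action trajectory and return the index of the best one."""
    totals = [sum(reward_fn(s, a) for s, a in traj)
              for traj in situation_action_trajs]
    return max(range(len(totals)), key=totals.__getitem__)
```

Each candidate is a list of (situation, action) pairs; the trajectory with the highest accumulated reward is the one selected as human-like.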
- an embodiment of the present invention also provides a calibration method comprising a calibration method of the reward function and a calibration method of other parameters.
- The purpose of the calibration of the reward function is to associate a real value with a given trajectory through the evaluation of each of its situation-action pairs, such that the value is high when the action is adapted to the situation for a particular purpose.
- The reward function is associated with the purpose of mimicking human behavior, such that R_θ(ζ) is high if the predicted vehicle trajectory ζ is human-like, i.e. close to the one a human-driven vehicle would follow in the same situations.
- N_HT sequences of actions associated with human-driven trajectories are recorded, together with the associated situations, such that N_HT situation-action trajectories can be defined.
- For each recorded human-driven trajectory, a set of N_T trajectories is generated, and each generated trajectory ζ_{j,k} is sampled such that the recorded and generated trajectories have the same sample time Δ_T.
- Such a reward function can be obtained using a Recurrent Neural Network (RNN) with at least 2 layers, such that the first layer is a state encoding and the second one is recurrent. All the tunable parameters contained in this network are embedded in a vector θ, and the optimal vector θ* can be obtained by a gradient ascent method.
- The first layer of the RNN is responsible for the state encoding of a situation as defined in (16). Its output is denoted O_1.
- The second layer takes as input the output O_1 of the first layer, as well as the action associated with situation s, in order to evaluate the situation-action pair. As this second layer is recurrent, its input at sample k also embeds its output at the previous sample k-1. This leads to the equations of the full RNN at sample k being given by:
- Fig. 5 shows the RNN configuration in the case where a predicted vehicle performs measurements using a laser beam according to an embodiment of the present invention.
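A minimal numpy sketch of such a two-layer reward network is given below; the weight shapes, the tanh nonlinearities and all names are illustrative assumptions, since the patent does not fix them:

```python
import numpy as np

def rnn_reward(situations, actions, We, Wr, Wh, w_out):
    """Sketch of R_theta: layer 1 encodes each situation s into O1,
    layer 2 is recurrent over (O1, action, previous hidden state);
    the summed per-sample outputs score the situation-action trajectory."""
    h = np.zeros(Wh.shape[0])
    total = 0.0
    for s, a in zip(situations, actions):
        o1 = np.tanh(We @ np.asarray(s, float))            # state-encoding layer
        x = np.concatenate([o1, np.atleast_1d(float(a))])  # pair O1 with the action
        h = np.tanh(Wr @ x + Wh @ h)                       # recurrent layer
        total += float(w_out @ h)                          # per-sample reward
    return total
```

In training, all of We, Wr, Wh and w_out would be flattened into the parameter vector θ and updated by gradient ascent.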
- π_{j,k} is the probability of a candidate trajectory ζ_{j,k} being selected by the RNN among all the trajectories belonging to the set generated from the initial situation of recorded human-driven trajectory j, i.e.
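Such a selection probability is naturally a softmax over the candidate rewards of one generated set; the following sketch assumes the usual maximum-entropy form, which the patent does not spell out here:

```python
import numpy as np

def candidate_probs(rewards):
    """Sketch of pi_{j,k}: softmax of the candidate-trajectory rewards
    within the set generated from one recorded initial situation."""
    z = np.asarray(rewards, float)
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

Higher-reward candidates receive proportionally higher selection probability, which is what makes the gradient-ascent calibration favor human-like trajectories.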
- Parameters include t_hor, Δ_T, Δ_v-, Δ_v+, Δ_y-, Δ_y+, Δ_D- and Δ_D+, among others listed in Table 1.
- the algorithm for offline reward function calibration includes the steps:
- Table 1 shows all the parameters that need to be valued. However, their calibration does not affect the functionality of the embodiment of the present invention, only its optimality. Some parameters are only used for training, and their values only affect the computation time of training. As training is an offline phase, associated with calibration, this computation time is not a real-time issue. All the parameters used in both the offline and online phases must have the same values in both phases.
- the present invention provides a prediction system for the trajectory of a vehicle capable of implementing the prediction method for the trajectory of a vehicle.
- the present invention provides a vehicle comprising the prediction system for the trajectory of a vehicle.
- the proposed method has at least the following advantages:
- The method generates and analyzes trajectories with respect to each driving situation, and feature correlations are considered (i.e. trajectories cannot be isolated from their driving situations).
Abstract
A prediction method for the trajectory of a vehicle comprises a step S1 of generating a set ζ containing admissible trajectories ζ_i (i = 1…N_T) for the predicted vehicle from an initial situation, and a step S2 of selecting from ζ a particular trajectory ζ* that mimics the one a human would follow in the same initial situation.
Description
FIELD OF TECHNOLOGY
The present invention relates to the field of autonomous driving and, in particular, to a prediction method for the trajectory of a vehicle based on inverse reinforcement learning, a prediction system for the trajectory of a vehicle based on inverse reinforcement learning, and a vehicle comprising the same.
At present, driving assistance systems and autonomous driving vehicles have made great progress, and advanced intelligent vehicles have great potential for improving the performance and safety of the transportation system and for relieving people of the driving task. An autonomous driving system not only needs to maintain safety, but also needs to follow the behavior of a human driver, so that the passengers of the autonomous driving vehicle feel comfortable and accept it. Moreover, other traffic participants on the road also predict and make decisions about potential risks based on experience. Therefore, it is necessary: 1) to develop a prediction method for the trajectories of surrounding vehicles, for anticipation in autonomous driving trajectory planning, and 2) to generate driving trajectories simulating human driving.
SUMMARY
(I) Technical problem to be solved
A first object of the present invention is to provide a prediction method, based on inverse reinforcement learning, for the trajectory planning of an autonomous vehicle, capable of predicting the trajectories of human-driven vehicles during driving.
A second object of the present invention is to provide a prediction system implementing this prediction method.
A third object of the present invention is to provide a vehicle comprising this prediction system.
(II) Technical solution
In order to solve the technical problems above, the present invention provides a prediction method for the trajectory of a vehicle based on inverse reinforcement learning, comprising: a step S1 of generating a set ζ containing admissible trajectories ζ_i (i = 1…N_T) for the predicted vehicle from an initial situation; and a step S2 of selecting from ζ a particular trajectory ζ* that mimics the one a human would follow in the same initial situation.
In an embodiment of present disclosure, the admissible trajectories are generated in a Frenet frame, wherein the Frenet frame is set at the initial position of the predicted vehicle, is fixed during the whole trajectory prediction, and is always relative to that initial position.
In an embodiment of present disclosure, the method further comprises a step S3 of the description of situations, and the step S3 comprises a method S310 of the definition of initial situation and a method S320 of the definition of potential future situations.
In an embodiment of present disclosure, the step S2 includes a step of situation-action trajectory definition and a step of trajectory selection.
In an embodiment of present disclosure, the prediction method further comprises a method for the description of situations, wherein the method for the description of situations includes the definition of a situation and the definition of initial situation and potential future situations.
In an embodiment of present disclosure, both the step S1 and the step S2 in the prediction are implemented by an embedded algorithm.
In an embodiment of present disclosure, the prediction method further comprises a calibration method including a reward function calibration.
In an embodiment of present disclosure, the reward function is obtained using a Recurrent Neural Network (RNN) with at least 2 layers in which a first layer is a state encoding and a second layer is a recurrent layer.
In another aspect, the present invention provides a prediction system for the trajectory of a vehicle for implementing the prediction method for the trajectory of the vehicle.
In still another aspect, the present invention provides a vehicle comprising the prediction system for the trajectory of the vehicle.
(III) Beneficial effects
The method proposed in this invention enables selection of a human-like trajectory and can therefore be used
· for human-like trajectory planning of a predicted autonomous driving vehicle; or
· for the prediction of the trajectories of surrounding human-driven vehicles.
Such a method increases the acceptability of autonomous driving by surrounding drivers and is essential for predicting the trajectories of surrounding human-driven vehicles.
These and other aspects of the present invention will be apparent from and elucidated with reference to the embodiments described hereinafter.
The above and other objects, features and other advantages of the present invention will be more clearly understood from the following description in conjunction with the drawings in which:
Fig. 1 is a schematic view showing a Frenet frame disposed in a predicted vehicle according to an embodiment of the present invention;
Fig. 2 is a schematic view showing a situation where the measurements are performed by the predicted vehicle using a LIDAR according to an embodiment of the present invention;
Fig. 3 is a schematic diagram showing a situation where the measurements are performed by a surrounding vehicle of a non-predicted vehicle using a LIDAR according to an embodiment of the present invention;
Fig. 4 is a schematic diagram showing respective generated trajectories and a human driving trajectory according to an embodiment of the present invention; and
Fig. 5 is a schematic view showing the RNN configuration in the case where a predicted vehicle performs measurements using a laser beam according to an embodiment of the present invention.
The specific embodiments of the present invention are further described in detail below with reference to the drawings and embodiments. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Further, in the description of the present invention, "multiple", "a plurality of", and "multiple groups" mean two or more unless otherwise stated.
An embodiment of the present invention provides a prediction method for the trajectory of a human-driven vehicle based on inverse reinforcement learning. A vehicle whose trajectory is predicted according to an embodiment of the present invention is hereinafter referred to as a "predicted vehicle".
The prediction method for the trajectory of a vehicle based on inverse reinforcement learning according to the present invention comprises the following two steps: a step S1 of generating a set ζ containing admissible trajectories ζ_i (i = 1…N_T) for the predicted vehicle from an initial situation; and a step S2 of selecting from ζ a particular trajectory ζ* that mimics the one a human would follow in the same initial situation.
Specifically, the set ζ of admissible trajectories ζ_i (i = 1…N_T) for the predicted vehicle is generated from an initial situation in the step S1, which further comprises a step S110 of trajectory definition and a step S120 of trajectory generation.
Specifically, the prediction method for the trajectory of a vehicle based on inverse reinforcement learning according to the present invention uses a Frenet frame F_0 on the surface of the flat road. As shown in Fig. 1, the lateral axis is the unit vector tangent to the middle lane line at the predicted vehicle's initial position, pointing in the direction of motion; the longitudinal axis is the unit vector normal to it, pointing toward the opposite-direction lane; and the origin is set at the initial position of the predicted vehicle. The steps S110 and S120 are implemented based on the Frenet frame F_0.
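Since F_0 is fixed at the predicted vehicle's initial position, mapping world measurements into it is a single rigid transform. A sketch, assuming the lane-tangent heading at the initial position is known (names and signature are illustrative):

```python
import numpy as np

def to_frenet_frame(p_world, origin, heading):
    """Express a 2-D world point in the fixed frame F0: translate to the
    predicted vehicle's initial position, then rotate so one axis lies
    along the lane tangent (heading) and the other is normal to it."""
    c, s = np.cos(heading), np.sin(heading)
    rot = np.array([[c, s], [-s, c]])  # world -> frame rotation
    return rot @ (np.asarray(p_world, float) - np.asarray(origin, float))
```

Because the frame is fixed for the whole prediction horizon, this transform is computed once per measurement and never re-anchored along the trajectory.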
Further, the step S110 comprises a method S111 of trajectory definition using coordinates, a method S112 of trajectory definition using boundary conditions, and a method S113 of trajectory definition using a sequence of actions.
With respect to the method S111 of trajectory definition using coordinates, the predicted vehicle position can be defined by the trajectories of its coordinates ζ_coord(t) = (x(t), y(t)), considering t = 0 at the initial time in the frame described above. For example, the method S111 can be used when prediction starts and an initial situation can still be measured. As an example, this trajectory can be derived from order-5 polynomials:
The generation of the trajectory of the predicted vehicle using the Frenet frame F_0 above can minimize the acceleration and make the velocity, acceleration and jerk continuous. In this specification, ζ_coord(t)[ζ] implies that ζ_coord(t) is the definition of trajectory ζ.
Further, considering the initial predicted vehicle position in the Frenet frame as null and the time T at the end of the trajectory as known, the predicted vehicle trajectory can be defined using the method S112 of trajectory definition using boundary conditions. For example, as shown in the following formula, a predicted vehicle trajectory defined using order-5 polynomials can equivalently be fully defined by a particular unique vector:
where the predicted vehicle's initial velocity v_0 and acceleration a_0 are measured, and X_end, Y_end and the final velocities and accelerations are respectively the position, velocity and acceleration of the predicted vehicle on each axis at time T. In this specification, ζ_bound[ζ] will be used to imply that ζ_bound(t) is the definition of trajectory ζ.
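The order-5 polynomial determined by these boundary conditions can be obtained per axis by solving a 6x6 linear system; a sketch (function and variable names are illustrative):

```python
import numpy as np

def quintic_coeffs(T, p0, v0, a0, pT, vT, aT):
    """Coefficients c[0..5] of p(t) = sum_k c[k] t^k matching position,
    velocity and acceleration at t = 0 and t = T (one Frenet axis)."""
    M = np.array([
        [1, 0, 0,    0,       0,        0],        # p(0)
        [0, 1, 0,    0,       0,        0],        # p'(0)
        [0, 0, 2,    0,       0,        0],        # p''(0)
        [1, T, T**2, T**3,    T**4,     T**5],     # p(T)
        [0, 1, 2*T,  3*T**2,  4*T**3,   5*T**4],   # p'(T)
        [0, 0, 2,    6*T,     12*T**2,  20*T**3],  # p''(T)
    ], dtype=float)
    return np.linalg.solve(M, np.array([p0, v0, a0, pT, vT, aT], float))
```

Calling it once for the x axis and once for the y axis yields the full ζ_bound definition of a trajectory.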
In addition, the predicted vehicle trajectory can be defined using the method S113 of trajectory definition using a sequence of actions. In a general case, a sequence of N_a values of acceleration on the two axes equivalently defines a trajectory when its initial structure is known, as well as the sampling time Δ_t. As an example, when N_a ≥ 5, the predicted vehicle trajectory can be defined by order-5 polynomials. Equivalently, a sequence of absolute values of the acceleration and of the angles between the acceleration vectors and an axis of the Frenet frame defines a trajectory, and so does a sequence of vehicle steering and pedal angles. In this specification, any variable a(t) such that knowing its N_a values evaluated at t = k·Δ_t (k = 0…N_a-1) enables the definition of a trajectory ζ is called an action.
Such a sequence of actions (as an example, accelerations, acceleration angles, or steering and pedal angles) is called a trajectory defined using a sequence of actions. The sampling time Δ_t will be stated or inferred from the context; ζ_actions[ζ] will be used to state that ζ_actions is the definition of trajectory ζ, whereas [ζ_actions]_k will be used to denote the action at time k·Δ_t, i.e. in this embodiment of the present invention, [ζ_actions]_k = a_k.
In another aspect, the step S1 further comprises a method S120 for trajectory generation. In the frame described above, a set ζ of N_T trajectories ζ_i (i = 1…N_T) can be generated from a tessellation of the parameters used in their definition ζ_bound,i[ζ_i]. In this embodiment, the final position (X_end, Y_end), the final velocity and acceleration, and the travel time t_end over the full trajectory are deduced from a tessellation of the final lateral deviation Δ_Y, the final velocity increase Δ_v and the manoeuver duration Δ_D.
Considering highway situations, the following constraints can be used:
Y_end,i = Δ_Y,i (4),
X_end,i = x(Δ_D) (9),
which can all be derived from (1), with the initial predicted vehicle velocity v_0 and acceleration a_0 being measured. Tessellation for trajectory generation can thus be performed over the ranges of Δ_v, Δ_Y and Δ_D. From the given predicted vehicle initial velocity v_0 and acceleration a_0, such a tessellation thus enables generating the N_T trajectories ζ_i as presented in (15).
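The tessellation step can be sketched as a simple grid enumeration over the three manoeuver parameters, each grid point yielding one set of boundary conditions for method S112. The grid values below are illustrative assumptions (e.g. ±3.5 m roughly corresponds to one highway lane width), not values from the specification:

```python
import itertools

def tessellate(v0, dv_range, dy_range, dD_range):
    """Enumerate candidate boundary conditions from a tessellation of the
    final velocity increase dv, the final lateral deviation dy and the
    manoeuver duration dD. Each grid point yields one (y_end, v_end, T)
    triple (final acceleration taken as zero for simplicity)."""
    return [(dy, v0 + dv, dD)
            for dv, dy, dD in itertools.product(dv_range, dy_range, dD_range)]

candidates = tessellate(20.0,
                        dv_range=[-2.0, 0.0, 2.0],   # m/s
                        dy_range=[-3.5, 0.0, 3.5],   # m (about one lane)
                        dD_range=[2.0, 3.0, 4.0])    # s
```

With 3 values per parameter this produces N_T = 27 candidate trajectories; finer grids trade computation time for coverage.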
On another aspect, an embodiment of the present invention also provides a method S3 of the description of situations. The method S3 of the description of situations comprises a method S310 of the definition of the initial situation and a method S320 of the definition of potential future situations.
Specifically, the method S310 of the definition of the initial situation is as follows: given a sampling time Δt as a parameter, as an example Δt = 0.1 s, a situation s_j is considered as an association of a descriptor D_j of the scene around the predicted vehicle and of the velocity v_j of the predicted vehicle, at time j·Δt, all gathered in a vector, i.e. such that s_j = (D_j, v_j).
The scene descriptor D_j embeds the position and velocity of the surrounding vehicles relative to the predicted vehicle, represented either by direct measurements or by a combination of them. As an example, when the predicted vehicle is performing trajectory planning and is equipped with a LIDAR, the scene descriptor D_j can be represented by the set of distances d_i,j, where d_i,j denotes, at time j·Δt, the distance from the predicted vehicle to the nearest material point at angle i (given in degrees) from the X axis, the distances and angles both being typically provided by measurement from a LIDAR with resolution res over 360°; in the formula, v_j is the predicted vehicle velocity at the same time j·Δt.
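The situation vector, including the reconstruction case mentioned below for Fig. 3 (when the measurements originate from the ego-vehicle rather than the predicted vehicle), can be sketched as follows. The sector-binning scheme, beam count and maximum range are illustrative assumptions:

```python
import math

def reconstruct_descriptor(points, n_beams=12, max_range=100.0):
    """Reconstruct a LIDAR-like scene descriptor around the predicted
    vehicle from the relative (x, y) positions of surrounding material
    points: for each of n_beams angular sectors, keep the distance to the
    nearest point falling in that sector (max_range if the sector is empty)."""
    sector = 360.0 / n_beams
    d = [max_range] * n_beams
    for (x, y) in points:
        ang = math.degrees(math.atan2(y, x)) % 360.0
        i = int(ang // sector) % n_beams
        d[i] = min(d[i], math.hypot(x, y))
    return d

def situation(points, v_ego, n_beams=12):
    """Situation s_j: scene descriptor concatenated with the predicted
    vehicle velocity, gathered in a single vector."""
    return reconstruct_descriptor(points, n_beams) + [v_ego]
```

With 12 beams this mirrors the configuration of Fig. 2; for the Fig. 3 case the `points` would first be translated into the predicted vehicle's frame.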
Fig. 2 shows an example where the measurements are performed by the predicted vehicle using a LIDAR with 12 beams. In addition, Fig. 3 shows an example where the measurements are performed by another surrounding vehicle, in the case where the predicted vehicle is not the one on which the LIDAR and other measurement units are embedded. In this case, the scene descriptor must be reconstructed, as the raw measurements are relative to the ego-vehicle rather than to the predicted vehicle.
In the actual situation, s_j depends on the predicted vehicle trajectory: it embeds its velocity, and the scene descriptor is relative to the predicted vehicle position along the trajectory. This dependency will be reflected through the notation s_j[ζ] to emphasize the fact that the measurements are performed along trajectory ζ. In addition, [s[ζ]]_j will also be used to refer to the j-th element of the full sequence s[ζ], also called the situation trajectory. Finally, the initial situation is denoted s_0 and follows the structure presented in the method S310 of the definition of the initial situation.
On the other aspect, the method S3 of the description of situations comprises a method S320 of the definition of potential future situations. In this case, the potential situation is then denoted as an estimate of s_j, where usually j is a future expected sample. Such an estimation can be performed using a linear model on the moving objects, such that the longitudinal position x_k((j+1)·Δt) of moving object k at time (j+1)·Δt is given by:
x_k((j+1)·Δt) = x_k(j·Δt) + v_k(j·Δt)·Δt. (10)
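The linear motion model (10) amounts to a one-line extrapolation; applied recursively it gives the positions at the next samples. A minimal sketch (function name is illustrative):

```python
def extrapolate(x_j, v_j, dt, n_steps):
    """Linear motion model (10) applied recursively: positions of a moving
    object at the next n_steps samples, x((j+m)*dt) = x(j*dt) + m*v*dt,
    assuming constant velocity over the horizon."""
    return [x_j + m * v_j * dt for m in range(1, n_steps + 1)]
```

For instance, an object at x = 0 m moving at 10 m/s with Δt = 0.1 s is predicted at 1, 2 and 3 m over the next three samples.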
In the description above, the step S1 of generating the set ζ including the allowable trajectories ζ_i (i = 1…N_T) of the predicted vehicle from its initial situation has been described; the step S2 of selecting a particular trajectory ζ* from the set ζ will now be described in detail.
The step S2 includes a step S210 of situation-action trajectory definition and a step S220 of trajectory selection.
For the situation-action trajectory definition, considering the sampling time Δt, a situation-action trajectory ξ associated with trajectory ζ, also denoted ξ[ζ], is defined as the sequence formed at sample j by concatenating the situation s_j[ζ] and the predicted vehicle action [ζ_actions[ζ]]_j, i.e.
[ξ[ζ]]_j = ([s[ζ]]_j, [ζ_actions[ζ]]_j). (19)
Further, in the step S220 of trajectory selection, the goal is to select an appropriate trajectory ζ* from the set ζ containing the N_T generated trajectories ζ_i (i = 1…N_T), such that it maximizes a global reward function represented by the following formula:
R_θ(ζ) = Σ_{k=0…N−1} γ^k · r_θ(s_k, a_k), (20)
where γ is a parameter usually called a discount factor, fixed as an example to γ = 1; N is a parameter that fixes the number of samples to consider for optimization, inferred from the parameter t_hor, called the horizon time, such that t_hor ≥ t_end (as an example, N = 30 if t_hor = 3 s and t_end ≥ 3 s, which gives 10 samples per second); r_θ is called a reward function, while a_k = [ζ_actions[ζ]]_k and s_k = s_k[ζ] are respectively the action and the situation along trajectory ζ at sample k.
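The discounted sum over situation-action pairs described above can be sketched directly (the function name and the placeholder reward are assumptions for illustration):

```python
def global_reward(reward_fn, situations, actions, gamma=1.0, n=None):
    """Global reward of a candidate trajectory: discounted sum over the
    first n samples of the per-sample reward r_theta(s_k, a_k)."""
    n = len(actions) if n is None else n
    return sum((gamma ** k) * reward_fn(situations[k], actions[k])
               for k in range(n))
```

With a constant unit reward and γ = 0.5 over 3 samples, for example, the global reward is 1 + 0.5 + 0.25 = 1.75.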
Please note that this trajectory selection step is performed at the initial time, i.e. k = 0, when future situations are not available. For potential future situations, this proposal uses a linear motion model to extrapolate the trajectories of the surrounding vehicles until time t_hor = N·Δt.
Specifically, the reward function is used for implementing Inverse (or Reverse) Reinforcement Learning. At each sample k, r_θ(s_k, a_k) is high when the action a_k is adapted to the situation s_k for a particular purpose. Adapting a reward function to this purpose passes through its parameterization, which is denoted by the parameter θ, or θ* when this parameter is the optimal one for the purpose. In this embodiment of the present invention, the reward function is associated with the purpose of mimicking human behavior, such that R_θ(ζ) is high if the predicted vehicle trajectory ζ is human-like, e.g. is near to the one a human-driven vehicle would follow in the same situations. Fig. 4 shows a human-driven trajectory, multiple generated ones and the unique selected one.
Hereinafter, an embedded algorithm for implementing steps S1 and S2 will be explained.
According to the embodiments of the present invention, two types of applications can be derived: one associated with autonomous driving trajectory planning, for which a human-like trajectory enhances acceptability by passengers and surrounding drivers; and another associated with surrounding vehicle trajectory prediction (the surrounding vehicles being supposed driven by humans), which reduces environment uncertainty for autonomous driving trajectory planning and thus increases safety. The main difference between the two applications lies in where the measurements are performed: from the predicted vehicle in the case of trajectory planning, and from the vehicle on which the algorithm is embedded in the case of surrounding vehicle trajectory prediction. This difference implies that, in the case of trajectory prediction, the descriptor D_j requires computation and is not a simple collection of measurements, and that some information may be missing due to occlusion of the near environment of the predicted vehicle from the vehicle on which the algorithm is embedded. This distinction is addressed in step S310.
In the algorithm for predicted vehicle trajectory planning and surrounding vehicle trajectory prediction, the parameters include t_hor, γ, Δt, Δ_v−, Δ_v+, Δ_y−, Δ_y+, Δ_D− and Δ_D+, and the algorithm includes the following steps:
a step S200: recording N_HT sequences of actions associated with human-driven trajectories and the associated situations;
a step S201: measuring the position, velocity and acceleration of all measurable moving objects relative to the predicted vehicle, i.e. the relative distance, velocity and acceleration of all surrounding vehicles;
a step S202: measuring the velocity v_0 and acceleration a_0 of the predicted vehicle;
a step S204: concatenating the measures of step S202 and step S203 to form the initial situation s_0 as presented in (18);
a step S205: generating N_T trajectories ζ_i (i = 1…N_T) as presented in (15), using the definition by boundary conditions, from the tessellation of Δ_v, Δ_y and Δ_D and the initial velocity and acceleration measured in step S201;
a step S206, including: a step S2061 of computing the action sequence ζ_actions,i[ζ_i], where i denotes the trajectory index; a step S2062 of using a linear motion model to extrapolate the displacement of the surrounding vehicles for all the next N samples; a step S2063 of generating, using the estimated situations obtained at step S202 and the potential actions obtained at step S201, the situation-action trajectories as presented in (19); and a step S2064 of computing the global reward as presented in (20).
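The selection that concludes the steps above can be sketched as scoring every candidate's situation-action pairs with (20) and keeping the best index (function and argument names are illustrative assumptions):

```python
def select_trajectory(situations_along, actions_along, reward_fn, gamma=1.0):
    """Score each candidate trajectory by the discounted sum of per-sample
    rewards over its situation-action pairs, and return the index of the
    candidate with the highest global reward."""
    best_i, best_r = 0, float("-inf")
    for i, (ss, aa) in enumerate(zip(situations_along, actions_along)):
        r = sum((gamma ** k) * reward_fn(s, a)
                for k, (s, a) in enumerate(zip(ss, aa)))
        if r > best_r:
            best_i, best_r = i, r
    return best_i
```

In the planning application the selected trajectory is executed; in the prediction application it is reported as the most human-like trajectory of the surrounding vehicle.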
In another aspect, an embodiment of the present invention also provides a calibration method comprising a calibration method of the reward function and a calibration method of the other parameters.
The purpose of the calibration of the reward function is to associate a real value with a given trajectory through the evaluation of each of its situation-action pairs, such that this value is high when the action is adapted to the situation for a particular purpose. In this embodiment, the reward function is associated with the purpose of mimicking human behavior, such that R_θ(ζ) is high if the predicted vehicle trajectory ζ is human-like, e.g. is near to the one a human-driven vehicle would follow in the same situations.
Specifically, N_HT sequences of actions associated with human-driven trajectories are recorded, together with the associated situations, such that N_HT situation-action trajectories can be defined. For each recorded human-driven trajectory, a set of N_T trajectories is generated, and each generated trajectory ζ_j,k is sampled such that the recorded trajectory and the generated trajectory have the same sampling time Δt.
Further, the idea is then to have a reward function r_θ, parameterized by a vector θ, for which there exists a particular vector θ* such that the recorded human-driven trajectory obtains the highest global reward among the candidates generated from its initial situation. Such a reward function can be obtained using a Recurrent Neural Network (RNN) with at least 2 layers, such that the first layer is a state-encoding layer and the second one is a recurrent layer. All the tunable parameters contained in this network are embedded in the vector θ, and the optimal vector θ* can be obtained by a gradient ascent method.
The first layer of the RNN is responsible for the state encoding of a situation as defined in (16); its output is denoted O_1. The second layer takes as input the output O_1 of the first layer, as well as the action associated with the situation s, in order to evaluate the situation-action pair. As this second layer is recurrent, its input at sample k also embeds its output at the previous sample k−1. This leads to the equations of the full RNN at sample k:
O_1,k = f_1(s_k; θ_1),
O_2,k = f_2(O_1,k, a_k, O_2,k−1; θ_2),
where, as denoted by the notation in the previous equations, f_1 and f_2 are parameterized by θ_1 and θ_2 respectively, such that the RNN parameter is θ = [θ_1, θ_2]. Fig. 5 shows the RNN configuration in the case where a predicted vehicle performs measurements using a laser beam according to an embodiment of the present invention.
In addition, training, i.e. computing the optimal parameter θ*, is finally performed through the usual gradient ascent maximization of the objective (24), where ω_j,k is the probability of a candidate trajectory ζ_j,k being selected by the RNN among all the trajectories belonging to the set generated from the initial situation of the recorded human-driven trajectory.
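One common way to define such selection probabilities ω_j,k, sketched below, is a softmax over the global rewards of the candidates, with the training objective being the log-probability of the human-driven trajectory. This is an assumption in the style of maximum-entropy inverse reinforcement learning, since the specification does not reproduce formula (24):

```python
import math

def candidate_probabilities(rewards):
    """Softmax over the global rewards of the candidate trajectories,
    sketching the probability omega_{j,k} that a candidate is selected.
    Subtracting the max reward keeps the exponentials numerically stable."""
    m = max(rewards)
    e = [math.exp(r - m) for r in rewards]
    z = sum(e)
    return [x / z for x in e]

def log_likelihood(human_reward, candidate_rewards):
    """Per-recording training objective: log-probability that the
    human-driven trajectory is selected among all candidates. Gradient
    ascent on theta increases this quantity."""
    probs = candidate_probabilities([human_reward] + candidate_rewards)
    return math.log(probs[0])
```

Summing this log-likelihood over the N_HT recordings and ascending its gradient with respect to θ yields θ*.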
Further, in an algorithm for offline reward function calibration, the parameters include t_hor, γ, θ_0, μ, Δt, Δ_v−, Δ_v+, Δ_y−, Δ_y+, Δ_D− and Δ_D+. The algorithm for offline reward function calibration includes the following steps:
S210: recording N_HT sequences of actions associated with human-driven trajectories and the associated situations;
S2113: generating N_T trajectories ζ_i (i = 1…N_T) as presented in (15), using the definition by boundary conditions, from the tessellation of Δ_v, Δ_y and Δ_D and the initial velocity and acceleration of the recorded trajectory;
S2116: looping to S2114 and S2115 for the optimization of (24); and
S2117: otherwise setting θ* := θ.
Therefore, the optimal parameter θ* for human-like trajectory planning or prediction can be obtained through the steps above.
In another aspect, Table 1 shows all the parameters that need to be valued. However, their calibration does not affect the functionality of the embodiment of the present invention, only its optimality. Some parameters are only used for training, and their values only affect the computation time for training. As training is an offline phase associated with calibration, this computation time is not a real-time issue. All the parameters used in both the offline and online phases must have the same values in both phases.
Table 1 Parameters used in the embodiment and usual values
In another aspect, the present invention provides a prediction system for the trajectory of a vehicle capable of implementing the prediction method for the trajectory of a vehicle.
In still another aspect, the present invention provides a vehicle comprising the prediction system for the trajectory of a vehicle.
The proposed method has at least the following advantages:
1. The similarity of trajectories is measured on both path and velocity profiles (not only on features), and the method learns a cost function that favors trajectories similar to the real one.
2. The method generates and analyzes trajectories with respect to each driving situation, and feature correlations are considered (i.e. trajectories cannot be isolated from their driving situations).
It will be understood that there are numerous modifications of the illustrated embodiments described above which will be readily apparent to one skilled in the art, such as variations and modifications of the structure of the guiding device and the assistant component. These modifications and variations fall within the scope of the claims which follow.
Claims (10)
- A prediction method for the trajectory of a vehicle, characterized in that it comprises: a step S1 of generating a set ζ containing admissible trajectories ζ i, (i=1…N T) , for a predicted vehicle from an initial situation; and a step S2 of selecting from the set ζ a particular trajectory ζ * that mimics the one a human would follow in the same initial situation.
- The prediction method according to claim 1, characterized in that,the admissible trajectories are generated in a Frenet frame, wherein the Frenet frame is set at the initial position of the predicted vehicle and the Frenet frame is fixed during the whole trajectory prediction and is always relative to the predicted vehicle initial position.
- The prediction method according to claim 1, characterized in that,the method further comprises a step S3 of the description of situations, and the step S3 comprises a method S310 of the definition of initial situation and a method S320 of the definition of potential future situations.
- The prediction method according to claim 1, characterized in that,the step S2 includes a step of situation-action trajectory definition and a step of trajectory selection.
- The prediction method according to claim 1, characterized in that, further comprising a method for the description of situations, wherein the method for the description of situations includes the definition of a situation and the definition of initial situation and potential future situations.
- The prediction method according to claim 5, characterized in that,both the step S1 and the step S2 are implemented by an embedded algorithm.
- The prediction method according to claim 6, characterized in that, further comprising:a calibration method including a reward function calibration.
- The prediction method according to claim 6, characterized in that,the reward function is obtained using a Recurrent Neural Network (RNN) with at least 2 layers in which a first layer is a state encoding and a second layer is a recurrent layer.
- A prediction system for the trajectory of a vehicle, used for implementing the prediction method for the trajectory of the vehicle according to any one of claims 1 to 8.
- A vehicle comprising the prediction system for the trajectory of the vehicle according to claim 9.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2019113445 | 2019-10-25 | ||
CNPCT/CN2019/113445 | 2019-10-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021077446A1 true WO2021077446A1 (en) | 2021-04-29 |
Family
ID=75620380
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/113683 WO2021077446A1 (en) | 2019-10-25 | 2019-10-28 | Prediction method for trajectory of vehicle, prediction system and vehicle |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021077446A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11987237B2 (en) | 2021-12-20 | 2024-05-21 | Waymo Llc | Systems and methods to determine a lane change strategy at a merge region |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090024357A1 (en) * | 2006-02-28 | 2009-01-22 | Toyota Jidosha Kabushiki Kaisha | Object Path Prediction Method, Apparatus, and Program, and Automatic Operation System |
CN104732066A (en) * | 2015-02-16 | 2015-06-24 | 同济大学 | Vehicle behavior spatial-temporal evolution modeling method under path constraint condition and application thereof |
CN109324620A (en) * | 2018-09-25 | 2019-02-12 | 北京主线科技有限公司 | The dynamic trajectory planing method for carrying out avoidance based on lane line parallel offset and overtaking other vehicles |
CN110189547A (en) * | 2019-05-30 | 2019-08-30 | 广州小鹏汽车科技有限公司 | A kind of obstacle detection method, device and vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19950134 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 19950134 Country of ref document: EP Kind code of ref document: A1 |