WO2021077446A1 - Prediction method for trajectory of vehicle, prediction system and vehicle - Google Patents
- Publication number: WO2021077446A1 (application PCT/CN2019/113683)
- Authority: WIPO (PCT)
Classifications
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
- G06F30/15—Vehicle, aircraft or watercraft design (Geometric CAD)
- G06N3/006—Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/08—Neural network learning methods
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- Fig. 2 shows an example where the measurements are performed by the predicted vehicle using a LIDAR with 12 beams.
- Fig. 3 shows an example where the measurements are performed by another surrounding vehicle, in the case where the predicted vehicle is not the one on which the LIDAR and other measurement units are embedded. In this case, the scene descriptor must be reconstructed, since the raw measurements are relative to the ego-vehicle while the scene descriptor is relative to the predicted vehicle.
- Since s_j embeds the predicted vehicle's velocity and the scene descriptor is relative to the predicted vehicle's position along the trajectory, s_j depends on the predicted vehicle trajectory. This dependency is reflected through the notation s_j[ζ], to emphasize that the measurements are performed along trajectory ζ.
- [s[ζ]]_j will also be used to refer to the j-th element of the full sequence, also called the situation trajectory.
- the initial situation is denoted as s 0 and follows the structure presented in the method S310 of the definition of initial situation.
- method S3 of the description of situations comprises a method S320 of the definition of potential future situations.
- this potential situation is then denoted where usually j is a future expected sample.
- Such an estimation can be performed using a linear model on the moving objects, such that the longitudinal position x_k((j+1)·Δ_t) of moving object k at time (j+1)·Δ_t is given by:
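As a minimal sketch of such a constant-velocity linear extrapolation (the function name and signature are illustrative, not from the patent):

```python
def extrapolate_linear(x0, v0, dt, n_samples):
    """Constant-velocity extrapolation of a moving object's longitudinal
    position for the next n_samples instants: x((j+1)*dt) = x(j*dt) + v*dt."""
    return [x0 + v0 * dt * (k + 1) for k in range(n_samples)]
```

The same update applies per axis when both longitudinal and lateral positions are extrapolated.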
- the step S2 includes S210 of situation-action trajectory definition and S220 of trajectory selection.
- A situation-action trajectory associated with trajectory ζ is defined as a sequence formed at sample j by concatenating situation s_j[ζ] and the predicted vehicle action [ζ_actions[ζ]]_j, i.e.
- the reward function is used for implementing the Inverse (or Reverse) Reinforcement Learning.
- At each sample k, the reward is high when action a_k is adapted to situation s_k for a particular purpose.
- Adapting a reward function to this purpose passes through its parameterization, which is emphasized by the notation R_θ, referring to parameter θ, or θ* when this parameter is the optimal one for the purpose.
- Here, the reward function is associated with the purpose of mimicking human behavior, such that R_θ(ζ) is high if the predicted vehicle trajectory ζ is human-like, i.e. close to the one a human-driven vehicle would follow in the same situations.
- Fig. 4 shows human driven trajectory, multiple generated ones and a unique selected one.
- two types of applications can be derived comprising: one associated with autonomous driving trajectory planning, for which human-like trajectory enhances acceptability from passengers and surrounding drivers, and another one associated with surrounding vehicles trajectory prediction, supposed driven by human drivers, which reduces environment uncertainty for autonomous driving trajectory planning and thus increases safety.
- The main difference between the two applications is that the measurements are performed from the predicted vehicle itself in the case of trajectory planning, whereas they are performed from the vehicle on which the algorithm is embedded in the case of surrounding-vehicle trajectory prediction.
- As a consequence, in the case of trajectory prediction the descriptor D_j requires computation and is not a simple collection of measurements, and some information may be missing due to occlusion of the predicted vehicle's near environment as seen from the vehicle on which the algorithm is embedded. This distinction is addressed in step S310.
- Parameters include t_hor, θ, Δ_T, Δ_v-, Δ_v+, Δ_y-, Δ_y+, Δ_D- and Δ_D+, and the algorithm includes the following steps:
- a step S200 of recording N_HT sequences of actions associated with human-driven trajectories and the associated situations;
- a step S201 of measuring the position, velocity and acceleration of all measurable moving objects, i.e. the relative distance to the predicted vehicle, velocity and acceleration of all surrounding vehicles;
- a step S202 of measuring the velocity v_0 and acceleration a_0 of the predicted vehicle;
- a step S203 of computing the scene descriptor relative to the predicted vehicle as presented in (17);
- a step S204 of concatenating the measures of step S202 and step S203 to form the initial situation s_0 as presented in (18);
- a step S206 including a step S2061 of computing, where i denotes the trajectory index; a step S2062 of using a linear motion model to extrapolate the displacement of surrounding vehicles for all next N samples; a step S2063 of generating, using the estimated situations obtained at step S202 and the potential actions obtained at S201, situation-action trajectories as presented in (19); and a step S2064 of computing the global reward cost as presented in (20); and
- a step S207 of selecting a trajectory according to
- The predicted or planned trajectory can thus be obtained through steps S200 to S207.
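The scoring-and-selection core of steps S2064 and S207 can be sketched as follows; `reward_fn` stands in for the calibrated reward R_θ, and all names are illustrative assumptions rather than the patent's own implementation:

```python
def select_trajectory(situation_action_trajs, reward_fn):
    """Sketch of S2064/S207: sum the per-sample rewards of each candidate
    situation-action trajectory and return the index of the best one."""
    totals = [sum(reward_fn(s, a) for s, a in traj)
              for traj in situation_action_trajs]
    return max(range(len(totals)), key=totals.__getitem__)
```

Each candidate is a list of (situation, action) pairs; the trajectory with the highest accumulated reward is the one selected as human-like.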
- an embodiment of the present invention also provides a calibration method comprising a calibration method of the reward function and a calibration method of other parameters.
- The purpose of the calibration of the reward function is to associate a real value with a given trajectory through the evaluation of each of its situation-action pairs, such that the value is high when the action is adapted to the situation for a particular purpose.
- The reward function is associated with the purpose of mimicking human behavior, such that R_θ(ζ) is high if the predicted vehicle trajectory ζ is human-like, i.e. close to the one a human-driven vehicle would follow in the same situations.
- N_HT sequences of actions associated with human-driven trajectories are recorded, together with the associated situations, such that N_HT situation-action trajectories can be defined.
- For each recorded human-driven trajectory, a set of N_T trajectories is generated, and each generated trajectory ζ_{j,k} is sampled such that the recorded and generated trajectories have the same sample time Δ_T.
- Such a reward function can be obtained using a Recurrent Neural Network (RNN) with at least 2 layers, such that the first layer is a state encoding and the second one is recurrent. All the tunable parameters contained in this network are embedded in a vector θ, and the optimal vector θ* can be obtained by a gradient ascent method.
- The first layer of the RNN is responsible for the state encoding of a situation as defined in (16). Its output is denoted O_1.
- The second layer takes as input the output O_1 of the first layer, as well as the action associated with situation s, in order to evaluate the situation-action pair. As this second layer is recurrent, its input at sample k also embeds its output at the previous sample k-1. This leads to the equations of the full RNN at sample k being given by:
- Fig. 5 shows the RNN configuration in the case where a predicted vehicle performs measurements using a laser beam according to an embodiment of the present invention.
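A minimal numpy sketch of such a two-layer reward network is given below; the weight shapes, the tanh nonlinearities and all names are illustrative assumptions, since the patent does not fix them:

```python
import numpy as np

def rnn_reward(situations, actions, We, Wr, Wh, w_out):
    """Sketch of R_theta: layer 1 encodes each situation s into O1,
    layer 2 is recurrent over (O1, action, previous hidden state);
    the summed per-sample outputs score the situation-action trajectory."""
    h = np.zeros(Wh.shape[0])
    total = 0.0
    for s, a in zip(situations, actions):
        o1 = np.tanh(We @ np.asarray(s, float))            # state-encoding layer
        x = np.concatenate([o1, np.atleast_1d(float(a))])  # pair O1 with the action
        h = np.tanh(Wr @ x + Wh @ h)                       # recurrent layer
        total += float(w_out @ h)                          # per-sample reward
    return total
```

In training, all of We, Wr, Wh and w_out would be flattened into the parameter vector θ and updated by gradient ascent.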
- π_{j,k} is the probability of a candidate trajectory ζ_{j,k} being selected by the RNN among all the trajectories belonging to the set generated from the initial situation of recorded human-driven trajectory j, i.e.
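Such a selection probability is naturally a softmax over the candidate rewards of one generated set; the following sketch assumes the usual maximum-entropy form, which the patent does not spell out here:

```python
import numpy as np

def candidate_probs(rewards):
    """Sketch of pi_{j,k}: softmax of the candidate-trajectory rewards
    within the set generated from one recorded initial situation."""
    z = np.asarray(rewards, float)
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

Higher-reward candidates receive proportionally higher selection probability, which is what makes the gradient-ascent calibration favor human-like trajectories.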
- Parameters include t_hor, Δ_T, Δ_v-, Δ_v+, Δ_y-, Δ_y+, Δ_D- and Δ_D+, among others listed in Table 1.
- the algorithm for offline reward function calibration includes the steps:
- Table 1 shows all the parameters that need to be valued. However, their calibration does not affect the functionality of the embodiment of the present invention, only its optimality. Some parameters are only used for training, and their values only affect the computation time of training. As training is an offline phase, associated with calibration, this computation time is not a real-time issue. All the parameters used in both the offline and online phases must have the same values in both phases.
- the present invention provides a prediction system for the trajectory of a vehicle capable of implementing the prediction method for the trajectory of a vehicle.
- the present invention provides a vehicle comprising the prediction system for the trajectory of a vehicle.
- the proposed method has at least the following advantages:
- The method generates and analyzes trajectories with respect to each driving situation, and feature correlations are considered (i.e. trajectories cannot be isolated from their driving situations).
Abstract
A prediction method for the trajectory of a vehicle comprises a step S1 of generating a set ζ containing admissible trajectories ζ_i (i = 1…N_T) for the predicted vehicle from an initial situation, and a step S2 of selecting from ζ a particular trajectory ζ* that mimics the one a human would follow in the same initial situation.
Description
FIELD OF TECHNOLOGY
The present invention relates to the field of autonomous driving and, in particular, to a prediction method for the trajectory of a vehicle based on inverse reinforcement learning, a prediction system for the trajectory of a vehicle based on inverse reinforcement learning, and a vehicle comprising the same.
At present, driving assistance systems and autonomous driving vehicles have made great progress, and advanced intelligent vehicles have great potential for improving the performance and safety of the transportation system and for relieving people of the driving task. An autonomous driving system not only needs to maintain safety, but also needs to follow the behavior of a human driver, so that the passengers of the autonomous driving vehicle feel comfortable and accept it. Moreover, other traffic participants on the road also predict and make decisions about potential risks based on experience. Therefore, it is necessary: 1) to develop a prediction method for the trajectories of surrounding vehicles, for anticipation in autonomous driving trajectory planning, and 2) to generate driving trajectories simulating human driving.
SUMMARY
(I) Technical problem to be solved
A first object of the present invention is to provide a prediction method, based on inverse reinforcement learning, for the trajectory planning of an autonomous vehicle, capable of predicting the trajectories of human-driven vehicles during driving.
A second object of the present invention is to provide a prediction system implementing this prediction method.
A third object of the present invention is to provide a vehicle comprising this prediction system.
(II) Technical solution
In order to solve the technical problems above, the present invention provides a prediction method for the trajectory of a vehicle based on inverse reinforcement learning, comprising: a step S1 of generating a set ζ containing admissible trajectories ζ_i (i = 1…N_T) for the predicted vehicle from an initial situation; and a step S2 of selecting from ζ a particular trajectory ζ* that mimics the one a human would follow in the same initial situation.
In an embodiment of present disclosure, the admissible trajectories are generated in a Frenet frame, wherein the Frenet frame is set at the initial position of the predicted vehicle, is fixed during the whole trajectory prediction, and is always relative to that initial position.
In an embodiment of present disclosure, the method further comprises a step S3 of the description of situations, and the step S3 comprises a method S310 of the definition of initial situation and a method S320 of the definition of potential future situations.
In an embodiment of present disclosure, the step S2 includes a step of situation-action trajectory definition and a step of trajectory selection.
In an embodiment of present disclosure, the prediction method further comprises a method for the description of situations, wherein the method for the description of situations includes the definition of a situation and the definition of initial situation and potential future situations.
In an embodiment of present disclosure, both the step S1 and the step S2 in the prediction are implemented by an embedded algorithm.
In an embodiment of present disclosure, the prediction method further comprises a calibration method including a reward function calibration.
In an embodiment of present disclosure, the reward function is obtained using a Recurrent Neural Network (RNN) with at least 2 layers in which a first layer is a state encoding and a second layer is a recurrent layer.
In another aspect, the present invention provides a prediction system for the trajectory of a vehicle for implementing the prediction method for the trajectory of the vehicle.
In still another aspect, the present invention provides a vehicle comprising the prediction system for the trajectory of the vehicle.
(III) Beneficial effects
The method proposed in this invention enables selection of a human-like trajectory and can therefore be used
· for human-like trajectory planning of a predicted autonomous driving vehicle; or
· for the prediction of the trajectories of surrounding human-driven vehicles.
Such a method increases the acceptability of autonomous driving by surrounding drivers and is essential for predicting the trajectories of surrounding human-driven vehicles.
These and other aspects of the present invention will be apparent from and elucidated with reference to the embodiments described hereinafter.
The above and other objects, features and other advantages of the present invention will be more clearly understood from the following description in conjunction with the drawings in which:
Fig. 1 is a schematic view showing a Frenet frame disposed in a predicted vehicle according to an embodiment of the present invention;
Fig. 2 is a schematic view showing a situation where the measurements are performed by the predicted vehicle using a LIDAR according to an embodiment of the present invention;
Fig. 3 is a schematic diagram showing a situation where the measurements are performed by a surrounding vehicle of a non-predicted vehicle using a LIDAR according to an embodiment of the present invention;
Fig. 4 is a schematic diagram showing respective generated trajectories and a human driving trajectory according to an embodiment of the present invention; and
Fig. 5 is a schematic view showing the RNN configuration in the case where a predicted vehicle performs measurements using a laser beam according to an embodiment of the present invention.
The specific embodiments of the present invention are further described in detail below with reference to the drawings and embodiments. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Further, in the description of the present invention, "multiple", "a plurality of", and "multiple groups" mean two or more unless otherwise stated.
An embodiment of the present invention provides a prediction method for the trajectory of a human-driven vehicle based on inverse reinforcement learning. A vehicle whose trajectory is predicted according to an embodiment of the present invention is hereinafter referred to as a "predicted vehicle".
The prediction method for the trajectory of a vehicle based on inverse reinforcement learning according to the present invention comprises the following two steps: a step S1 of generating a set ζ containing admissible trajectories ζ_i (i = 1…N_T) for the predicted vehicle from an initial situation; and a step S2 of selecting from ζ a particular trajectory ζ* that mimics the one a human would follow in the same initial situation.
Specifically, the set ζ of admissible trajectories ζ_i (i = 1…N_T) for the predicted vehicle is generated from an initial situation in the step S1, which further comprises a step S110 of trajectory definition and a step S120 of trajectory generation.
Specifically, the prediction method for the trajectory of a vehicle based on inverse reinforcement learning according to the present invention uses a Frenet frame F_0 on the surface of the flat road. As shown in Fig. 1, the lateral axis is the unit vector tangent to the middle lane line at the predicted vehicle's initial position, pointing in the direction of motion; the longitudinal axis is the unit vector normal to it, pointing toward the opposite-direction lane; and the origin is set at the initial position of the predicted vehicle. The steps S110 and S120 are implemented based on the Frenet frame F_0.
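Since F_0 is fixed at the predicted vehicle's initial position, mapping world measurements into it is a single rigid transform. A sketch, assuming the lane-tangent heading at the initial position is known (names and signature are illustrative):

```python
import numpy as np

def to_frenet_frame(p_world, origin, heading):
    """Express a 2-D world point in the fixed frame F0: translate to the
    predicted vehicle's initial position, then rotate so one axis lies
    along the lane tangent (heading) and the other is normal to it."""
    c, s = np.cos(heading), np.sin(heading)
    rot = np.array([[c, s], [-s, c]])  # world -> frame rotation
    return rot @ (np.asarray(p_world, float) - np.asarray(origin, float))
```

Because the frame is fixed for the whole prediction horizon, this transform is computed once per measurement and never re-anchored along the trajectory.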
Further, the step S110 comprises a method S111 of trajectory definition using coordinates, a method S112 of trajectory definition using boundary conditions, and a method S113 of trajectory definition using a sequence of actions.
With respect to the method S111 of trajectory definition using coordinates, the predicted vehicle position can be defined by the trajectories of its coordinates ζ_coord(t) = (x(t), y(t)), considering t = 0 at the initial time in the frame described above. For example, the method S111 can be used when prediction starts and an initial situation can still be measured. As an example, this trajectory can be derived from order-5 polynomials:
The generation of the trajectory of the predicted vehicle using the Frenet frame F_0 above can minimize the acceleration and make the velocity, acceleration and jerk continuous. In this specification, ζ_coord(t)[ζ] implies that ζ_coord(t) is the definition of trajectory ζ.
Further, considering the initial predicted vehicle position in the Frenet frame as null and the time T at the end of the trajectory as known, the predicted vehicle trajectory can be defined using the method S112 of trajectory definition using boundary conditions. For example, as shown in the following formula, a predicted vehicle trajectory defined using order-5 polynomials can equivalently be fully defined by a particular unique vector:
where the predicted vehicle's initial velocity v_0 and acceleration a_0 are measured, and X_end, Y_end and the final velocities and accelerations are respectively the position, velocity and acceleration of the predicted vehicle on each axis at time T. In this specification, ζ_bound[ζ] will be used to imply that ζ_bound(t) is the definition of trajectory ζ.
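The order-5 polynomial determined by these boundary conditions can be obtained per axis by solving a 6x6 linear system; a sketch (function and variable names are illustrative):

```python
import numpy as np

def quintic_coeffs(T, p0, v0, a0, pT, vT, aT):
    """Coefficients c[0..5] of p(t) = sum_k c[k] t^k matching position,
    velocity and acceleration at t = 0 and t = T (one Frenet axis)."""
    M = np.array([
        [1, 0, 0,    0,       0,        0],        # p(0)
        [0, 1, 0,    0,       0,        0],        # p'(0)
        [0, 0, 2,    0,       0,        0],        # p''(0)
        [1, T, T**2, T**3,    T**4,     T**5],     # p(T)
        [0, 1, 2*T,  3*T**2,  4*T**3,   5*T**4],   # p'(T)
        [0, 0, 2,    6*T,     12*T**2,  20*T**3],  # p''(T)
    ], dtype=float)
    return np.linalg.solve(M, np.array([p0, v0, a0, pT, vT, aT], float))
```

Calling it once for the x axis and once for the y axis yields the full ζ_bound definition of a trajectory.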
In addition, the predicted vehicle trajectory can be defined using the method S113 of trajectory definition using a sequence of actions. In a general case, a sequence of N_a values of acceleration on the two axes equivalently defines a trajectory when its initial structure is known, as well as the sampling time Δ_t. As an example, when N_a ≥ 5, the predicted vehicle trajectory can be defined by order-5 polynomials. Equivalently, a sequence of absolute values of the acceleration and of the angles between the acceleration vectors and an axis of the Frenet frame defines a trajectory, and so does a sequence of vehicle steering and pedal angles. In this specification, any variable a(t) such that knowing its N_a values evaluated at t = k·Δ_t (k = 0…N_a-1) enables the definition of a trajectory ζ is called an action.
Such a sequence of actions (as an example, accelerations, acceleration angles, or steering and pedal angles) is called a trajectory defined using a sequence of actions. The sampling time Δ_t will be stated or inferred from the context; ζ_actions[ζ] will be used to state that ζ_actions is the definition of trajectory ζ, whereas [ζ_actions]_k will be used to denote the action at time k·Δ_t, i.e. in this embodiment of the present invention, [ζ_actions]_k = a_k.
In another aspect, the step S1 further comprises a method S120 for trajectory generation. In the frame described above, a set ζ of N_T trajectories ζ_i (i = 1…N_T) can be generated from a tessellation of the parameters used in their definition ζ_bound,i[ζ_i]. In this embodiment, the final position (X_end, Y_end), the final velocity and acceleration, and the travel time t_end over the full trajectory are deduced from a tessellation of the final lateral deviation Δ_Y, the final velocity increase Δ_v and the manoeuver duration Δ_D.
Considering highway situations, the following constraints can be used:
Y_end,i = Δ_Y,i (4),
X_end,i = x(Δ_D) (9),
which can all be derived from (1), with the initial predicted vehicle velocity v_0 and acceleration a_0 being measured. Tessellation for trajectory generation can thus be performed over the ranges of Δ_v, Δ_Y and Δ_D. From the given predicted vehicle initial velocity v_0 and acceleration a_0, such a tessellation thus enables generating the N_T trajectories ζ_i as presented in (15).
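The tessellation step can be sketched as a simple grid enumeration over the three manoeuver parameters, each grid point yielding one set of boundary conditions for method S112. The grid values below are illustrative assumptions (e.g. ±3.5 m roughly corresponds to one highway lane width), not values from the specification:

```python
import itertools

def tessellate(v0, dv_range, dy_range, dD_range):
    """Enumerate candidate boundary conditions from a tessellation of the
    final velocity increase dv, the final lateral deviation dy and the
    manoeuver duration dD. Each grid point yields one (y_end, v_end, T)
    triple (final acceleration taken as zero for simplicity)."""
    return [(dy, v0 + dv, dD)
            for dv, dy, dD in itertools.product(dv_range, dy_range, dD_range)]

candidates = tessellate(20.0,
                        dv_range=[-2.0, 0.0, 2.0],   # m/s
                        dy_range=[-3.5, 0.0, 3.5],   # m (about one lane)
                        dD_range=[2.0, 3.0, 4.0])    # s
```

With 3 values per parameter this produces N_T = 27 candidate trajectories; finer grids trade computation time for coverage.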
On another aspect, an embodiment of the present invention also provides a method S3 of the description of situations. The method S3 of the description of situations comprises a method S310 of the definition of the initial situation and a method S320 of the definition of potential future situations.
Specifically, the method S310 of the definition of the initial situation is as follows: given a sampling time Δt as a parameter, as an example Δt = 0.1 s, a situation s_j is considered as an association of a descriptor D_j of the scene around the predicted vehicle and of the velocity v_j of the predicted vehicle, at time j·Δt, all gathered in a vector, i.e. such that s_j = (D_j, v_j).
The scene descriptor D_j embeds the position and velocity of the surrounding vehicles relative to the predicted vehicle, represented either by direct measurements or by a combination of them. As an example, when the predicted vehicle is performing trajectory planning and is equipped with a LIDAR, the scene descriptor D_j can be represented by the set of distances d_i,j, where d_i,j denotes, at time j·Δt, the distance from the predicted vehicle to the nearest material point at angle i (given in degrees) from the X axis, the distances and angles both being typically provided by measurement from a LIDAR with resolution res over 360°; in the formula, v_j is the predicted vehicle velocity at the same time j·Δt.
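The situation vector, including the reconstruction case mentioned below for Fig. 3 (when the measurements originate from the ego-vehicle rather than the predicted vehicle), can be sketched as follows. The sector-binning scheme, beam count and maximum range are illustrative assumptions:

```python
import math

def reconstruct_descriptor(points, n_beams=12, max_range=100.0):
    """Reconstruct a LIDAR-like scene descriptor around the predicted
    vehicle from the relative (x, y) positions of surrounding material
    points: for each of n_beams angular sectors, keep the distance to the
    nearest point falling in that sector (max_range if the sector is empty)."""
    sector = 360.0 / n_beams
    d = [max_range] * n_beams
    for (x, y) in points:
        ang = math.degrees(math.atan2(y, x)) % 360.0
        i = int(ang // sector) % n_beams
        d[i] = min(d[i], math.hypot(x, y))
    return d

def situation(points, v_ego, n_beams=12):
    """Situation s_j: scene descriptor concatenated with the predicted
    vehicle velocity, gathered in a single vector."""
    return reconstruct_descriptor(points, n_beams) + [v_ego]
```

With 12 beams this mirrors the configuration of Fig. 2; for the Fig. 3 case the `points` would first be translated into the predicted vehicle's frame.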
Fig. 2 shows an example where the measurements are performed by the predicted vehicle using a LIDAR with 12 beams. In addition, Fig. 3 shows an example where the measurements are performed by another surrounding vehicle, in the case where the predicted vehicle is not the one on which the LIDAR and other measurement units are embedded. In this case, the scene descriptor must be reconstructed, as the raw measurements are relative to the ego-vehicle rather than to the predicted vehicle.
In the actual situation, s_j depends on the predicted vehicle trajectory: it embeds its velocity, and the scene descriptor is relative to the predicted vehicle position along the trajectory. This dependency will be reflected through the notation s_j[ζ] to emphasize the fact that the measurements are performed along trajectory ζ. In addition, [s[ζ]]_j will also be used to refer to the j-th element of the full sequence s[ζ], also called the situation trajectory. Finally, the initial situation is denoted s_0 and follows the structure presented in the method S310 of the definition of the initial situation.
On the other aspect, the method S3 of the description of situations comprises a method S320 of the definition of potential future situations. In this case, the potential situation is then denoted as an estimate of s_j, where usually j is a future expected sample. Such an estimation can be performed using a linear model on the moving objects, such that the longitudinal position x_k((j+1)·Δt) of moving object k at time (j+1)·Δt is given by:
x_k((j+1)·Δt) = x_k(j·Δt) + v_k(j·Δt)·Δt. (10)
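The linear motion model (10) amounts to a one-line extrapolation; applied recursively it gives the positions at the next samples. A minimal sketch (function name is illustrative):

```python
def extrapolate(x_j, v_j, dt, n_steps):
    """Linear motion model (10) applied recursively: positions of a moving
    object at the next n_steps samples, x((j+m)*dt) = x(j*dt) + m*v*dt,
    assuming constant velocity over the horizon."""
    return [x_j + m * v_j * dt for m in range(1, n_steps + 1)]
```

For instance, an object at x = 0 m moving at 10 m/s with Δt = 0.1 s is predicted at 1, 2 and 3 m over the next three samples.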
In the description above, the step S1 of generating the set ζ including the allowable trajectories ζ_i (i = 1…N_T) of the predicted vehicle from its initial situation has been described; the step S2 of selecting a particular trajectory ζ* from the set ζ will now be described in detail.
The step S2 includes a step S210 of situation-action trajectory definition and a step S220 of trajectory selection.
For the situation-action trajectory definition, considering the sampling time Δt, a situation-action trajectory ξ associated with trajectory ζ, also denoted ξ[ζ], is defined as the sequence formed at sample j by concatenating the situation s_j[ζ] and the predicted vehicle action [ζ_actions[ζ]]_j, i.e.
[ξ[ζ]]_j = ([s[ζ]]_j, [ζ_actions[ζ]]_j). (19)
Further, in the step S220 of trajectory selection, the goal is to select an appropriate trajectory ζ* from the set ζ containing the N_T generated trajectories ζ_i (i = 1…N_T), such that it maximizes a global reward function represented by the following formula:
R_θ(ζ) = Σ_{k=0…N−1} γ^k · r_θ(s_k, a_k), (20)
where γ is a parameter usually called a discount factor, fixed as an example to γ = 1; N is a parameter that fixes the number of samples to consider for optimization, inferred from the parameter t_hor, called the horizon time, such that t_hor ≥ t_end (as an example, N = 30 if t_hor = 3 s and t_end ≥ 3 s, which gives 10 samples per second); r_θ is called a reward function, while a_k = [ζ_actions[ζ]]_k and s_k = s_k[ζ] are respectively the action and the situation along trajectory ζ at sample k.
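The discounted sum over situation-action pairs described above can be sketched directly (the function name and the placeholder reward are assumptions for illustration):

```python
def global_reward(reward_fn, situations, actions, gamma=1.0, n=None):
    """Global reward of a candidate trajectory: discounted sum over the
    first n samples of the per-sample reward r_theta(s_k, a_k)."""
    n = len(actions) if n is None else n
    return sum((gamma ** k) * reward_fn(situations[k], actions[k])
               for k in range(n))
```

With a constant unit reward and γ = 0.5 over 3 samples, for example, the global reward is 1 + 0.5 + 0.25 = 1.75.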
Please note that this trajectory selection step is performed at the initial time, i.e. k = 0, when future situations are not available. For potential future situations, this proposal uses a linear motion model to extrapolate the trajectories of the surrounding vehicles until time t_hor = N·Δt.
Specifically, the reward function is used for implementing Inverse (or Reverse) Reinforcement Learning. At each sample k, r_θ(s_k, a_k) is high when the action a_k is adapted to the situation s_k for a particular purpose. Adapting a reward function to this purpose passes through its parameterization, which is denoted by the parameter θ, or θ* when this parameter is the optimal one for the purpose. In this embodiment of the present invention, the reward function is associated with the purpose of mimicking human behavior, such that R_θ(ζ) is high if the predicted vehicle trajectory ζ is human-like, e.g. is near to the one a human-driven vehicle would follow in the same situations. Fig. 4 shows a human-driven trajectory, multiple generated ones and the unique selected one.
Hereinafter, an embedded algorithm for implementing steps S1 and S2 will be explained.
According to the embodiments of the present invention, two types of applications can be derived: one associated with autonomous driving trajectory planning, for which a human-like trajectory enhances acceptability by passengers and surrounding drivers; and another associated with surrounding vehicle trajectory prediction (the surrounding vehicles being supposed driven by humans), which reduces environment uncertainty for autonomous driving trajectory planning and thus increases safety. The main difference between the two applications lies in where the measurements are performed: from the predicted vehicle in the case of trajectory planning, and from the vehicle on which the algorithm is embedded in the case of surrounding vehicle trajectory prediction. This difference implies that, in the case of trajectory prediction, the descriptor D_j requires computation and is not a simple collection of measurements, and that some information may be missing due to occlusion of the near environment of the predicted vehicle from the vehicle on which the algorithm is embedded. This distinction is addressed in step S310.
In the algorithm for predicted vehicle trajectory planning and surrounding vehicle trajectory prediction, the parameters include t_hor, γ, Δt, Δ_v−, Δ_v+, Δ_y−, Δ_y+, Δ_D− and Δ_D+, and the algorithm includes the following steps:
a step S200: recording N_HT sequences of actions associated with human-driven trajectories and the associated situations;
a step S201: measuring the position, velocity and acceleration of all measurable moving objects relative to the predicted vehicle, i.e. the relative distance, velocity and acceleration of all surrounding vehicles;
a step S202: measuring the velocity v_0 and acceleration a_0 of the predicted vehicle;
a step S204: concatenating the measures of step S202 and step S203 to form the initial situation s_0 as presented in (18);
a step S205: generating N_T trajectories ζ_i (i = 1…N_T) as presented in (15), using the definition by boundary conditions, from the tessellation of Δ_v, Δ_y and Δ_D and the initial velocity and acceleration measured in step S201;
a step S206, including: a step S2061 of computing the action sequence ζ_actions,i[ζ_i], where i denotes the trajectory index; a step S2062 of using a linear motion model to extrapolate the displacement of the surrounding vehicles for all the next N samples; a step S2063 of generating, using the estimated situations obtained at step S202 and the potential actions obtained at step S201, the situation-action trajectories as presented in (19); and a step S2064 of computing the global reward as presented in (20).
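The selection that concludes the steps above can be sketched as scoring every candidate's situation-action pairs with (20) and keeping the best index (function and argument names are illustrative assumptions):

```python
def select_trajectory(situations_along, actions_along, reward_fn, gamma=1.0):
    """Score each candidate trajectory by the discounted sum of per-sample
    rewards over its situation-action pairs, and return the index of the
    candidate with the highest global reward."""
    best_i, best_r = 0, float("-inf")
    for i, (ss, aa) in enumerate(zip(situations_along, actions_along)):
        r = sum((gamma ** k) * reward_fn(s, a)
                for k, (s, a) in enumerate(zip(ss, aa)))
        if r > best_r:
            best_i, best_r = i, r
    return best_i
```

In the planning application the selected trajectory is executed; in the prediction application it is reported as the most human-like trajectory of the surrounding vehicle.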
In another aspect, an embodiment of the present invention also provides a calibration method comprising a calibration method of the reward function and a calibration method of the other parameters.
The purpose of the calibration of the reward function is to associate a real value with a given trajectory through the evaluation of each of its situation-action pairs, such that this value is high when the action is adapted to the situation for a particular purpose. In this embodiment, the reward function is associated with the purpose of mimicking human behavior, such that R_θ(ζ) is high if the predicted vehicle trajectory ζ is human-like, e.g. is near to the one a human-driven vehicle would follow in the same situations.
Specifically, N_HT sequences of actions associated with human-driven trajectories are recorded, together with the associated situations, such that N_HT situation-action trajectories can be defined. For each recorded human-driven trajectory, a set of N_T trajectories is generated, and each generated trajectory ζ_j,k is sampled such that the recorded trajectory and the generated trajectory have the same sampling time Δt.
Further, the idea is then to have a reward function r_θ, parameterized by a vector θ, for which there exists a particular vector θ* such that the recorded human-driven trajectory obtains the highest global reward among the candidates generated from its initial situation. Such a reward function can be obtained using a Recurrent Neural Network (RNN) with at least 2 layers, such that the first layer is a state-encoding layer and the second one is a recurrent layer. All the tunable parameters contained in this network are embedded in the vector θ, and the optimal vector θ* can be obtained by a gradient ascent method.
The first layer of the RNN is responsible for the state encoding of a situation as defined in (16); its output is denoted O_1. The second layer takes as input the output O_1 of the first layer, as well as the action associated with the situation s, in order to evaluate the situation-action pair. As this second layer is recurrent, its input at sample k also embeds its output at the previous sample k−1. This leads to the equations of the full RNN at sample k:
O_1,k = f_1(s_k; θ_1),
O_2,k = f_2(O_1,k, a_k, O_2,k−1; θ_2),
where, as denoted by the notation in the previous equations, f_1 and f_2 are parameterized by θ_1 and θ_2 respectively, such that the RNN parameter is θ = [θ_1, θ_2]. Fig. 5 shows the RNN configuration in the case where a predicted vehicle performs measurements using a laser beam according to an embodiment of the present invention.
In addition, training, i.e. computing the optimal parameter θ*, is finally performed through the usual gradient ascent maximization of the objective (24), where ω_j,k is the probability of a candidate trajectory ζ_j,k being selected by the RNN among all the trajectories belonging to the set generated from the initial situation of the recorded human-driven trajectory.
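One common way to define such selection probabilities ω_j,k, sketched below, is a softmax over the global rewards of the candidates, with the training objective being the log-probability of the human-driven trajectory. This is an assumption in the style of maximum-entropy inverse reinforcement learning, since the specification does not reproduce formula (24):

```python
import math

def candidate_probabilities(rewards):
    """Softmax over the global rewards of the candidate trajectories,
    sketching the probability omega_{j,k} that a candidate is selected.
    Subtracting the max reward keeps the exponentials numerically stable."""
    m = max(rewards)
    e = [math.exp(r - m) for r in rewards]
    z = sum(e)
    return [x / z for x in e]

def log_likelihood(human_reward, candidate_rewards):
    """Per-recording training objective: log-probability that the
    human-driven trajectory is selected among all candidates. Gradient
    ascent on theta increases this quantity."""
    probs = candidate_probabilities([human_reward] + candidate_rewards)
    return math.log(probs[0])
```

Summing this log-likelihood over the N_HT recordings and ascending its gradient with respect to θ yields θ*.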
Further, in an algorithm for offline reward function calibration, the parameters include t_hor, γ, θ_0, μ, Δt, Δ_v−, Δ_v+, Δ_y−, Δ_y+, Δ_D− and Δ_D+. The algorithm for offline reward function calibration includes the following steps:
S210: recording N_HT sequences of actions associated with human-driven trajectories and the associated situations;
S2113: generating N_T trajectories ζ_i (i = 1…N_T) as presented in (15), using the definition by boundary conditions, from the tessellation of Δ_v, Δ_y and Δ_D and the initial velocity and acceleration of the recorded trajectory;
S2116: looping to S2114 and S2115 for the optimization of (24); and
S2117: otherwise setting θ* := θ.
Therefore, the optimal parameter θ* for human-like trajectory planning or prediction can be obtained through the steps above.
In another aspect, Table 1 shows all the parameters that need to be valued. However, their calibration does not affect the functionality of the embodiment of the present invention, only its optimality. Some parameters are only used for training, and their values only affect the computation time for training. As training is an offline phase associated with calibration, this computation time is not a real-time issue. All the parameters used in both the offline and online phases must have the same values in both phases.
Table 1 Parameters used in the embodiment and usual values
In another aspect, the present invention provides a prediction system for the trajectory of a vehicle capable of implementing the prediction method for the trajectory of a vehicle.
In still another aspect, the present invention provides a vehicle comprising the prediction system for the trajectory of a vehicle.
The proposed method has at least the following advantages:
1. The similarity of trajectories is measured on both path and velocity profiles (not only on features), and the method learns a cost function that favors trajectories similar to the real one.
2. The method generates and analyzes trajectories with respect to each driving situation, and feature correlations are considered (i.e. trajectories cannot be isolated from their driving situations).
It will be understood that there are numerous modifications of the illustrated embodiments described above which will be readily apparent to one skilled in the art, such as variations and modifications of the structure of the guiding device and the assistant component. These modifications and variations fall within the scope of the claims which follow.
Claims (10)
- A prediction method for the trajectory of a vehicle, characterized in that it comprises: a step S1 of generating a set ζ containing admissible trajectories ζ i, (i=1…N T) , for a predicted vehicle from an initial situation; and a step S2 of selecting from the set ζ a particular trajectory ζ * that mimics the one a human would follow in the same initial situation.
- The prediction method according to claim 1, characterized in that,the admissible trajectories are generated in a Frenet frame, wherein the Frenet frame is set at the initial position of the predicted vehicle and the Frenet frame is fixed during the whole trajectory prediction and is always relative to the predicted vehicle initial position.
- The prediction method according to claim 1, characterized in that,the method further comprises a step S3 of the description of situations, and the step S3 comprises a method S310 of the definition of initial situation and a method S320 of the definition of potential future situations.
- The prediction method according to claim 1, characterized in that,the step S2 includes a step of situation-action trajectory definition and a step of trajectory selection.
- The prediction method according to claim 1, characterized in that, further comprising a method for the description of situations, wherein the method for the description of situations includes the definition of a situation and the definition of initial situation and potential future situations.
- The prediction method according to claim 5, characterized in that,both the step S1 and the step S2 are implemented by an embedded algorithm.
- The prediction method according to claim 6, characterized in that, further comprising:a calibration method including a reward function calibration.
- The prediction method according to claim 6, characterized in that,the reward function is obtained using a Recurrent Neural Network (RNN) with at least 2 layers in which a first layer is a state encoding and a second layer is a recurrent layer.
- A prediction system for the trajectory of a vehicle, used for implementing the prediction method for the trajectory of the vehicle according to any one of claims 1 to 8.
- A vehicle comprising the prediction system for the trajectory of the vehicle according to claim 9.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2019113445 | 2019-10-25 | ||
CNPCT/CN2019/113445 | 2019-10-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021077446A1 true WO2021077446A1 (en) | 2021-04-29 |
Family
ID=75620380
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/113683 WO2021077446A1 (en) | 2019-10-25 | 2019-10-28 | Prediction method for trajectory of vehicle, prediction system and vehicle |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021077446A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11987237B2 (en) | 2021-12-20 | 2024-05-21 | Waymo Llc | Systems and methods to determine a lane change strategy at a merge region |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090024357A1 (en) * | 2006-02-28 | 2009-01-22 | Toyota Jidosha Kabushiki Kaisha | Object Path Prediction Method, Apparatus, and Program, and Automatic Operation System |
CN104732066A (en) * | 2015-02-16 | 2015-06-24 | 同济大学 | Vehicle behavior spatial-temporal evolution modeling method under path constraint condition and application thereof |
CN109324620A (en) * | 2018-09-25 | 2019-02-12 | 北京主线科技有限公司 | The dynamic trajectory planing method for carrying out avoidance based on lane line parallel offset and overtaking other vehicles |
CN110189547A (en) * | 2019-05-30 | 2019-08-30 | 广州小鹏汽车科技有限公司 | A kind of obstacle detection method, device and vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19950134 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 19950134 Country of ref document: EP Kind code of ref document: A1 |