CN109709956B

CN109709956B - Multi-objective optimized following algorithm for controlling speed of automatic driving vehicle

Info

Publication number: CN109709956B
Application number: CN201811600366.7A
Authority: CN
Inventors: 王雪松; 朱美新; 孙平
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2018-12-26
Filing date: 2018-12-26
Publication date: 2021-06-08
Anticipated expiration: 2038-12-26
Also published as: CN109709956A

Abstract

The invention provides a multi-objective optimized following algorithm for controlling the speed of an autonomous vehicle. The algorithm proposes a car-following speed control model based on deep reinforcement learning; the model does not imitate human driving but directly optimizes driving safety, efficiency and comfort. A reward function reflecting driving safety, efficiency and comfort is constructed by combining time to collision, the empirical distribution of headway, and the rate of change of acceleration (jerk). The model is trained on real driving data from the Next Generation Simulation (NGSIM) project, and the car-following behavior it simulates is compared with the behavior observed in the NGSIM empirical data. Through trial and error in a simulation environment, the reinforcement learning agent learns to control vehicle speed safely, comfortably and efficiently by maximizing the cumulative reward. The results show that the proposed following speed control algorithm drives more safely, efficiently and comfortably than the human drivers in the real-world data.

Description

Multi-objective optimized following algorithm for controlling speed of automatic driving vehicle

Technical Field

The invention relates to the field of automatic driving following control, in particular to a following algorithm for multi-objective optimization of speed control of an automatic driving vehicle.

Background

Car-following control is an important component of intelligent decision-making for autonomous driving. It covers speed selection in free driving, distance keeping when following another vehicle, and braking in emergencies. Where autonomous and human-driven vehicles coexist, an autonomous vehicle that makes car-following decisions similar to those of a human driver (human-like, for short) improves passenger comfort and trust, and also lets other traffic participants better understand and predict its behavior, enabling safe interaction between autonomous and human-driven vehicles. However, traditional car-following models have many limitations when applied to autonomous following control: their flexibility and accuracy are limited, they are difficult to generalize to driving scenarios and drivers outside the calibration data, and they cannot reflect the driving style of the vehicle's actual driver or the actual driving scenario.

Deep reinforcement learning (DRL) is widely used in industrial manufacturing, simulation, robot control, optimization and scheduling, game playing and other fields. Its basic idea is that an agent learns the optimal policy for achieving a goal by maximizing the cumulative reward it obtains from the environment. Because DRL focuses on learning a problem-solving policy rather than fitting data, it generalizes better, which makes it a useful reference for car-following control of autonomous vehicles.

Disclosure of Invention

The purpose of the invention is to provide a multi-objective optimized following algorithm for controlling the speed of an autonomous vehicle. The algorithm proposes a car-following speed control model that directly optimizes driving safety, efficiency and comfort. A reward function reflecting driving safety, efficiency and comfort is constructed by combining time to collision (TTC), the empirical distribution of time headway, and jerk. The model is trained on real driving data from the Next Generation Simulation (NGSIM) project, and the car-following behavior it simulates is compared with the behavior observed in the NGSIM empirical data. Through trial and error in a simulation environment, the reinforcement learning agent learns to control vehicle speed safely, comfortably and efficiently by maximizing the cumulative reward. The results show that the proposed following speed control algorithm drives more safely, efficiently and comfortably than real-world human drivers.

The technical scheme adopted by the invention is as follows:

A multi-objective optimized following algorithm for controlling the speed of an autonomous vehicle comprises the following steps:

Step 1: Acquire data. Using data from the NGSIM project, car-following events are extracted based on the criteria that the lead and following vehicles stay in the same lane and that the event lasts longer than 15 seconds. The extracted events are divided into two parts, one used as training data and the other as test data.
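The extraction criteria in Step 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the 0.1-second sampling interval, the per-frame `same_lane` flags and the 70/30 split ratio are all assumptions introduced here for illustration; the text only specifies the same-lane condition and the 15-second minimum duration.

```python
from typing import List, Tuple

FRAME_DT = 0.1         # assumed sampling interval in seconds (not stated in the text)
MIN_DURATION_S = 15.0  # minimum event duration stated in the text


def extract_following_events(same_lane: List[bool],
                             dt: float = FRAME_DT,
                             min_duration: float = MIN_DURATION_S) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, for every maximal run
    of frames in which the vehicle pair shares a lane and which spans more
    frames than min_duration / dt."""
    min_frames = round(min_duration / dt)
    events = []
    start = None
    for i, flag in enumerate(same_lane):
        if flag and start is None:
            start = i                      # a candidate event begins
        elif not flag and start is not None:
            if i - start > min_frames:     # keep only sufficiently long runs
                events.append((start, i))
            start = None
    if start is not None and len(same_lane) - start > min_frames:
        events.append((start, len(same_lane)))
    return events


def split_events(events, train_fraction=0.7):
    """Split events, in order, into training and test subsets.
    The 0.7 default is a hypothetical ratio, not taken from the text."""
    cut = int(len(events) * train_fraction)
    return events[:cut], events[cut:]
```

With these assumptions, a 20-second run of same-lane frames is kept as an event, while a 10-second run is discarded.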

Step 2: Construct the reward function. Feature quantities are proposed that reflect the objectives of car-following control: safety, comfort and efficiency.

Step 2.1: Time to collision (TTC) is used to reflect safety. TTC represents the time remaining before two vehicles collide and is formulated as

$TTC(t) = -\frac{S_{n-1,n}(t)}{\Delta V_{n-1,n}(t)}$

where $S_{n-1,n}(t)$ is the inter-vehicle distance and $\Delta V_{n-1,n}(t)$ is the relative velocity. The safety threshold is set to 7 seconds according to the NGSIM empirical data, and the TTC feature is constructed as follows:

if TTC is less than 7 seconds, the TTC feature takes a negative value; as TTC approaches zero, the TTC feature approaches negative infinity, which imposes the most severe penalty on near-collision situations.
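One functional form that matches the behavior described for the TTC feature (zero at or above the threshold, negative below it, tending to negative infinity as TTC approaches zero) is a logarithm of the ratio of TTC to the threshold. The sketch below uses that form as an assumption; the text does not give the exact expression. It also assumes that relative velocity means lead-vehicle speed minus following-vehicle speed, so a closing gap has negative relative velocity. The function names are hypothetical.

```python
import math

TTC_THRESHOLD_S = 7.0  # safety threshold stated in the text


def time_to_collision(spacing_m: float, relative_speed_mps: float) -> float:
    """TTC = -spacing / relative speed.

    Assumption: relative speed is lead speed minus follower speed, so it is
    negative while the gap is closing. If the gap is not closing, no
    collision is projected and math.inf is returned."""
    if relative_speed_mps >= 0.0:
        return math.inf
    return -spacing_m / relative_speed_mps


def ttc_feature(ttc_s: float, threshold_s: float = TTC_THRESHOLD_S) -> float:
    """Assumed form: log(TTC / threshold) below the threshold, 0 otherwise."""
    if ttc_s >= threshold_s:
        return 0.0
    if ttc_s <= 0.0:
        return -math.inf   # collision: most severe penalty
    return math.log(ttc_s / threshold_s)
```

For example, a 20 m gap closing at 5 m/s gives a TTC of 4 seconds, which falls below the threshold and therefore yields a negative feature value.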

Step 2.2: Driving efficiency is measured by headway. According to the analysis, a lognormal distribution fits the distribution of headways in the collected training data, with probability density function

$f_{lognormal}(x \mid \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$, $x > 0$.

From the extracted data, the log-mean μ and the log standard deviation σ of the variable x are estimated to be 0.4226 and 0.4365, respectively. The headway feature is constructed as the probability density value of the estimated lognormal headway distribution: $F_{headway} = f_{lognormal}(headway \mid \mu = 0.4226, \sigma = 0.4365)$. Under this feature, a headway of about 1.3 seconds corresponds to a high feature value, while headways that are too long or too short correspond to low feature values. The feature therefore encourages headway keeping that supports high traffic flow while penalizing unsafe or excessively long headways.
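The headway feature can be checked numerically with a short Python sketch that evaluates the standard lognormal density at the parameter values given in the text. The function name and the choice to return 0 for non-positive headways are assumptions made for illustration.

```python
import math

MU = 0.4226     # log-mean estimated in the text
SIGMA = 0.4365  # log standard deviation estimated in the text


def headway_feature(headway_s: float, mu: float = MU, sigma: float = SIGMA) -> float:
    """Lognormal probability density evaluated at the given headway (seconds).
    Returns 0 outside the support x > 0."""
    if headway_s <= 0.0:
        return 0.0
    z = (math.log(headway_s) - mu) / sigma
    return math.exp(-0.5 * z * z) / (headway_s * sigma * math.sqrt(2.0 * math.pi))


# The density of a lognormal distribution peaks at exp(mu - sigma^2).
mode_s = math.exp(MU - SIGMA ** 2)
print(round(mode_s, 2))  # about 1.26 seconds
```

The peak of the density lies near 1.26 seconds, which agrees with the statement that a headway of about 1.3 seconds receives a high feature value.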

Step 2.3: Driving comfort is measured by jerk, the rate of change of acceleration, and the jerk feature $F_{jerk}$ is constructed from it.
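The exact expression of the jerk feature is not reproduced in this text. As a hedged illustration only, the sketch below computes jerk by finite differences of acceleration and penalizes it with a negative, normalized square. The squared form, the `scale` value of 3600 (which would correspond to an assumed acceleration range of ±3 m/s² sampled every 0.1 s) and the function names are all assumptions, not the patented definition.

```python
from typing import List


def jerk_series(accelerations: List[float], dt: float = 0.1) -> List[float]:
    """Finite-difference jerk (m/s^3) from consecutive acceleration samples.
    The 0.1 s default step is an assumption."""
    return [(a1 - a0) / dt for a0, a1 in zip(accelerations, accelerations[1:])]


def jerk_feature(jerk: float, scale: float = 3600.0) -> float:
    """Assumed form: negative squared jerk divided by a normalizing scale,
    so that smoother driving gives a value closer to zero."""
    return -(jerk ** 2) / scale
```

Under these assumptions the feature is zero for constant acceleration and becomes more negative as acceleration changes more abruptly.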

Step 2.4: Establish the combined reward function: $r = w_1 F_{TTC} + w_2 F_{headway} + w_3 F_{jerk}$, where $w_1$, $w_2$ and $w_3$ are the coefficients of the features, all set to 1.
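The weighted sum in Step 2.4 is simple enough to write directly. The sketch below takes already-computed feature values as inputs; the function and parameter names are hypothetical.

```python
from typing import Tuple


def combined_reward(f_ttc: float,
                    f_headway: float,
                    f_jerk: float,
                    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """r = w1 * F_TTC + w2 * F_headway + w3 * F_jerk.
    All weights default to 1, as stated in the text."""
    w1, w2, w3 = weights
    return w1 * f_ttc + w2 * f_headway + w3 * f_jerk
```

Keeping the weights as a parameter makes it easy to explore other trade-offs between safety, efficiency and comfort, although the text fixes all three at 1.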

Step 3: Train the model. In each training pass, the car-following events in the data are simulated in sequence. Training is repeated multiple times, and the model that obtains the highest average reward on the test data is selected as the final model.
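The model selection rule in Step 3 can be sketched independently of any particular learning algorithm. In the Python sketch below, `train_on_event` and `episode_reward` are hypothetical callables supplied by the caller (they stand in for the reinforcement learning update and the simulated rollout, neither of which is specified here), and `n_passes` is an assumed parameter for the number of repetitions.

```python
import copy
from typing import Any, Callable, Sequence, Tuple


def select_best_model(initial_model: Any,
                      train_events: Sequence[Any],
                      test_events: Sequence[Any],
                      train_on_event: Callable[[Any, Any], Any],
                      episode_reward: Callable[[Any, Any], float],
                      n_passes: int) -> Tuple[Any, float]:
    """Run n_passes training passes over train_events in order; after each
    pass, score the model by its average reward on test_events and keep a
    snapshot of the best-scoring model."""
    best_model, best_score = None, float("-inf")
    model = initial_model
    for _ in range(n_passes):
        for event in train_events:
            model = train_on_event(model, event)   # one simulated event
        score = sum(episode_reward(model, e) for e in test_events) / len(test_events)
        if score > best_score:
            # deepcopy guards against train_on_event mutating the model in place
            best_model, best_score = copy.deepcopy(model), score
    return best_model, best_score
```

Selecting on held-out test reward rather than keeping the last model means that a later pass which degrades performance does not replace an earlier, better model.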

Step 4: Evaluate the model. The car-following behavior observed in the NGSIM data and the behavior simulated by the deep deterministic policy gradient (DDPG) model are compared and evaluated using indicators such as TTC, headway and jerk.
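A comparison of this kind can be sketched by computing the same summary indicators for an observed trajectory and a simulated one. The sketch below is illustrative only: the choice of summary statistics (minimum TTC, mean headway, mean absolute jerk), the approximation of headway as spacing divided by follower speed, the 0.1 s time step and the function name are assumptions.

```python
import math
from typing import Dict, List


def summarize_following(spacing: List[float],
                        follower_speed: List[float],
                        lead_speed: List[float],
                        acceleration: List[float],
                        dt: float = 0.1) -> Dict[str, float]:
    """Summarize one car-following trajectory with three indicators."""
    # TTC is only defined while the follower is faster than the leader.
    ttcs = [s / (vf - vl)
            for s, vf, vl in zip(spacing, follower_speed, lead_speed) if vf > vl]
    # Headway approximated as spacing / follower speed (an assumption).
    headways = [s / vf for s, vf in zip(spacing, follower_speed) if vf > 0.0]
    jerks = [abs(a1 - a0) / dt for a0, a1 in zip(acceleration, acceleration[1:])]
    return {
        "min_ttc_s": min(ttcs) if ttcs else math.inf,
        "mean_headway_s": sum(headways) / len(headways) if headways else math.inf,
        "mean_abs_jerk": sum(jerks) / len(jerks) if jerks else 0.0,
    }
```

Calling the function once on the observed NGSIM trajectory and once on the simulated trajectory for the same event gives two dictionaries that can be compared indicator by indicator.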

The advantages of the invention are:

1. The developed car-following control logic is applicable to autonomous vehicle development.

2. The algorithm model does not mimic human driving, but directly optimizes driving safety, efficiency and comfort.

Drawings

FIG. 1 is a flow chart of the present invention.

FIG. 2 compares driving safety between the NGSIM data and the DDPG model.

FIG. 3 compares driving comfort between the NGSIM data and the DDPG model.

Detailed Description

The algorithm proposes a car-following speed control model based on deep reinforcement learning; the model does not imitate human driving but directly optimizes driving safety, efficiency and comfort. A reward function reflecting driving safety, efficiency and comfort is constructed by combining time to collision (TTC), the empirical distribution of time headway, and jerk. The model is trained on real driving data from the Next Generation Simulation (NGSIM) project, and the car-following behavior it simulates is compared with the behavior observed in the NGSIM empirical data. Through trial and error in a simulation environment, the reinforcement learning agent learns to control vehicle speed safely, comfortably and efficiently by maximizing the cumulative reward. The results show that the proposed following speed control algorithm drives more safely, efficiently and comfortably than real-world human drivers.

The invention is described in detail below with reference to the following figures and specific examples, the steps of which are as follows:

Step 1: Acquire data. Using data from the Next Generation Simulation (NGSIM) project, car-following events are extracted based on criteria such as the lead and following vehicles staying in the same lane and the event lasting longer than 15 seconds. The extracted events are divided into two parts, one used as training data and the other as test data.

Step 2.2: Driving efficiency is measured by headway, whose distribution in the training data is fitted by a lognormal distribution with probability density function $f_{lognormal}(x \mid \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$, $x > 0$. Based on the extracted data, the log-mean μ and the log standard deviation σ of the variable x are estimated to be 0.4226 and 0.4365, respectively. The headway feature is constructed as the probability density value of the estimated lognormal headway distribution: $F_{headway} = f_{lognormal}(headway \mid \mu = 0.4226, \sigma = 0.4365)$. Under this feature, a headway of about 1.3 seconds corresponds to a high feature value, while headways that are too long or too short correspond to low feature values. The feature therefore encourages headway keeping that supports high traffic flow while penalizing unsafe or excessively long headways.

Step 2.4: Establish the combined reward function. Following steps 2.1, 2.2 and 2.3 above, $r = w_1 F_{TTC} + w_2 F_{headway} + w_3 F_{jerk}$ is established, where $w_1$, $w_2$ and $w_3$ are the coefficients of the features, all set to 1.

Examples

By comparing the empirical NGSIM data with the car-following behavior simulated by the DDPG model, it can be tested whether the model follows the lead vehicle safely, efficiently and comfortably.

Data acquisition. Using data from the NGSIM project, car-following events are extracted based on criteria such as the lead and following vehicles staying in the same lane and the event lasting longer than 15 seconds.

In terms of driving safety, a following event is randomly selected from the NGSIM data set. FIG. 2 shows the observed velocities, spacings, and accelerations, along with the corresponding index values generated by the DDPG model. The driver in the NGSIM data drives at a very small inter-vehicle distance after 10 seconds, while the DDPG model always maintains a following gap of about 10 meters.

In terms of driving comfort, a following event is randomly selected from the NGSIM data set. FIG. 3 shows the observed velocity, spacing, acceleration and jerk values, along with the corresponding values generated by the DDPG model. The driver in the NGSIM data produces frequent acceleration changes and large jerk values during driving, while the DDPG model maintains nearly constant acceleration and produces low jerk values.

Based on the above, the proposed following speed control algorithm shows better safe, efficient and comfortable driving ability compared to the human driver in the NGSIM.

Claims

1. A multi-objective optimized following algorithm for controlling the speed of an automatically driven vehicle, characterized by comprising the following steps:

step 1: acquiring data;

using data in an NGSIM project, extracting a following event based on the criterion that a front vehicle and a rear vehicle stay on the same lane and the length of the vehicle following event is greater than 15 seconds, and taking one part as training data and the other part as test data based on the extracted following event;

step 2: constructing a reward function;

providing characteristic quantities reflecting related targets of automobile follow-up control, wherein the characteristic quantities specifically comprise safety, comfort and efficiency;

step 2.1: adopting TTC to reflect safety;

TTC is the time to collision, representing the time remaining before two vehicles collide, and is formulated as

$TTC(t) = -\frac{S_{n-1,n}(t)}{\Delta V_{n-1,n}(t)}$

wherein $S_{n-1,n}(t)$ is the inter-vehicle distance and $\Delta V_{n-1,n}(t)$ is the relative velocity; the safety threshold is determined to be 7 seconds according to NGSIM empirical data, and the TTC feature is constructed as follows:

if TTC is less than 7 seconds, the TTC feature is a negative value, and as TTC approaches zero the TTC feature approaches negative infinity, which represents the most severe penalty for near-collision situations;

step 2.2: measuring driving efficiency by headway;

headway is the time headway; according to the analysis, a lognormal distribution fits the distribution of the acquired training data, with probability density function

$f_{lognormal}(x \mid \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$, $x > 0$;

from the extracted data, the log-mean μ and the log standard deviation σ of the variable x are estimated to be 0.4226 and 0.4365, respectively; the headway feature is constructed as the probability density value of the estimated lognormal headway distribution: $F_{headway} = f_{lognormal}(headway \mid \mu = 0.4226, \sigma = 0.4365)$; under this feature, a headway of about 1.3 seconds corresponds to a high feature value, and a headway that is too long or too short corresponds to a low feature value, so that the feature encourages headway keeping that supports high traffic flow and penalizes unsafe or excessively long headways;

step 2.4: establishing a comprehensive reward function;

establishing $r = w_1 F_{TTC} + w_2 F_{headway} + w_3 F_{jerk}$ according to the above steps, where $w_1$, $w_2$ and $w_3$ are the coefficients of the features, all set to 1;

step 3: training the model;

in each training, sequentially simulating the following events in the data, repeating the training for many times, and selecting a model which obtains the maximum average reward on the test data as a final model;

step 4: evaluating the model;

and comparing and evaluating the NGSIM data and the following behaviors obtained by DDPG model simulation by using TTC, headway and jerk indexes.