CN114954455A - Electric vehicle following running control method based on multi-step reinforcement learning - Google Patents

Electric vehicle following running control method based on multi-step reinforcement learning

Info

Publication number
CN114954455A
Authority
CN
China
Prior art keywords
vehicle
time
reinforcement learning
control
battery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210770539.XA
Other languages
Chinese (zh)
Other versions
CN114954455B (en)
Inventor
翟春杰
王栎
裘健鋆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202210770539.XA priority Critical patent/CN114954455B/en
Publication of CN114954455A publication Critical patent/CN114954455A/en
Application granted granted Critical
Publication of CN114954455B publication Critical patent/CN114954455B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00: Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/14: Adaptive cruise control
    • B60W30/16: Control of distance between vehicles, e.g. keeping a distance to preceding vehicle
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60L: PROPULSION OF ELECTRICALLY-PROPELLED VEHICLES; SUPPLYING ELECTRIC POWER FOR AUXILIARY EQUIPMENT OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRODYNAMIC BRAKE SYSTEMS FOR VEHICLES IN GENERAL; MAGNETIC SUSPENSION OR LEVITATION FOR VEHICLES; MONITORING OPERATING VARIABLES OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRIC SAFETY DEVICES FOR ELECTRICALLY-PROPELLED VEHICLES
    • B60L15/00: Methods, circuits, or devices for controlling the traction-motor speed of electrically-propelled vehicles
    • B60L15/20: Methods, circuits, or devices for controlling the traction-motor speed of electrically-propelled vehicles for control of the vehicle or its driving motor to achieve a desired performance, e.g. speed, torque, programmed variation of speed
    • B60L15/2045: Methods, circuits, or devices for controlling the traction-motor speed of electrically-propelled vehicles for control of the vehicle or its driving motor to achieve a desired performance, e.g. speed, torque, programmed variation of speed, for optimising the use of energy
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2556/00: Input parameters relating to data
    • B60W2556/45: External transmission of data to or from the vehicle
    • B60W2556/65: Data transmitted between vehicles
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/60: Other road transportation technologies with climate change mitigation effect
    • Y02T10/72: Electric energy management in electromobility

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mechanical Engineering (AREA)
  • Transportation (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Power Engineering (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Automation & Control Theory (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)

Abstract

The invention discloses an electric vehicle following control method based on multi-step reinforcement learning. The method comprises the following steps: 1. determining the state variables through an information acquisition module and a controller design module, and determining the control variable from the control target; 2. discretizing the state variables (speed and equivalent inter-vehicle distance) to obtain a Q table; 3. inputting the state variables and the control variable into the Q table to obtain an expected value function; 4. solving for the state variables at the next moment through the longitudinal dynamics module and the electric vehicle energy storage module; 5. according to the acceleration change of the preceding vehicle at the next moment, selecting from the Q table the entry with the minimum expected cost over the next n steps to obtain the control variable; 6. judging whether the control variable solved from the Q table is the optimal or suboptimal control variable. The invention uses the communication link between the preceding vehicle and the host vehicle, so that the vehicle can acquire information such as the acceleration of the preceding vehicle and achieve the following effect, fulfilling the aim of reducing power consumption during car following.

Description

A multi-step reinforcement learning-based following control method for electric vehicles

Technical Field

The invention belongs to the technical field of intelligent driving, in particular to car following between a preceding and a following vehicle, and specifically relates to an electric vehicle following control method based on multi-step reinforcement learning.

Background Art

To slow the rising trend of global warming and reduce carbon dioxide emissions, and as battery capacity and economy continue to improve, pure electric vehicles have become one of the development directions for new energy vehicles. As a means of curbing global warming, the promotion and application of electric vehicles is receiving more and more attention.

With the development of the times, global non-renewable resources are increasingly scarce, and humanity must pay more attention to the rational use of existing non-renewable resources while developing. With the rapid growth of the modern economy, the automobile has become a near-universal and indispensable means of travel, and balancing automobile fuel consumption against the rational allocation of society's petroleum resources has become an important problem. Moreover, air pollution and global warming caused by petroleum combustion are becoming more prominent, and regulations on vehicle exhaust and fuel consumption standards are increasingly strict, making the development of new energy electric smart vehicles a necessity. New energy vehicles have therefore attracted the attention of the automobile industry and governments. With continued scientific and technological progress, new energy vehicles have made great strides, and electric vehicles have become one of the important means of travel, accounting for a significant share of the market. Therefore, in order to improve the energy efficiency of new energy electric vehicles and prolong battery life, this invention proposes an electric vehicle following control method based on multi-step reinforcement learning.

Summary of the Invention

The present invention is mainly motivated by the fact that, with the widespread use of electric vehicles in China and the gradual opening of the new energy vehicle market, the number of electric vehicles in China will continue to rise. How to make better use of the development of electric vehicles, thereby reducing the exhaust emissions of fuel vehicles, improving the ecological environment, and reducing battery power consumption, is a question worth exploring.

The purpose of the present invention is to provide an intelligent vehicle following control method based on multi-step reinforcement learning, which uses the communication link between the preceding vehicle and the host vehicle so that the vehicle can acquire information such as the acceleration of the preceding vehicle and achieve the following effect, reducing power consumption during car following.

The above objectives are achieved through the following technical scheme:

Step 1. Determine the state variable X(t) through the vehicle's own information acquisition module and controller design module, determine the control variable U(t) from the control target, and initialize the vehicle parameters (see Fig. 3).

Step 2. Divide the speed V in the state variable X(t) proportionally from -3.7 to 4.399, and the equivalent inter-vehicle distance δd proportionally from -2 to 2.2999, into 50,000 grid cells each, and divide the control-variable acceleration from -1 to 1 into 21 cells. This forms a Q table of size 50000 × 21.
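
To make step 2 concrete, the discretization might be sketched as follows in Python. This is a minimal illustration, not the patent's implementation: the patent states that V and δd are each divided into 50,000 cells while the Q table has 50,000 rows, so the joint 250 × 200 grid used in `state_index` is a hypothetical indexing scheme chosen to reconcile the two figures.

```python
import numpy as np

# Discretization bounds are taken from step 2; the 250 x 200 joint-grid
# split is an assumption, not stated in the patent.
N_STATES, N_ACTIONS = 50_000, 21
V_MIN, V_MAX = -3.7, 4.399      # speed range (state variable V)
D_MIN, D_MAX = -2.0, 2.2999     # equivalent inter-vehicle distance range
A_MIN, A_MAX = -1.0, 1.0        # acceleration (control variable) range

actions = np.linspace(A_MIN, A_MAX, N_ACTIONS)  # 21 candidate accelerations
Q = np.zeros((N_STATES, N_ACTIONS))             # the 50000 x 21 Q table

def to_cell(x, lo, hi, n):
    """Map a continuous value to one of n equal-width cells covering [lo, hi]."""
    idx = int((x - lo) / (hi - lo) * n)
    return min(max(idx, 0), n - 1)

def state_index(v, delta_d, n_v=250, n_d=200):
    """Hypothetical joint index of (V, delta_d) into the 50,000 Q-table rows."""
    return to_cell(v, V_MIN, V_MAX, n_v) * n_d + to_cell(delta_d, D_MIN, D_MAX, n_d)
```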

Step 3. Input the state variable X(t) and the control variable U(t) into the Q table to obtain the expected value function.

Step 4. Solve for the state variable X(t+1) at the next moment through the longitudinal dynamics module and the electric vehicle energy storage module.

Step 5. According to the acceleration change of the preceding vehicle at the next moment, select from the Q table the entry with the minimum expected cost over the next n steps to obtain the control variable U(t+1).

Step 6. Judge whether the Q table has reached the maximum number of iterations or whether the tolerance satisfies the adaptive iteration value. If so, take the solved control variable U(t+1) as the optimal or suboptimal control variable; otherwise return to step 4.

Step 7. After the iteration terminates, obtain the control input from the Q table, compute the optimal demanded power Pe, and apply it to the host vehicle.
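
The loop over steps 3-7 can then be sketched as below, using the discretization above. `step_dynamics` (the step-4 longitudinal-dynamics and energy-storage rollout, assumed to return the next state and a stage cost) and `preceding_accel` are hypothetical helpers; `tree_backup_update` is sketched under step 3-3 of the detailed description.

```python
def run_following_control(Q, x0, preceding_accel, n=5, max_iters=1000, tol=1e-3):
    """Steps 3-7 in sequence: act greedily on the Q table (step 3), roll the
    dynamics forward (step 4), apply the n-step update (step 5), and stop on
    the iteration/tolerance criterion (step 6)."""
    x, traj = x0, []
    for t in range(max_iters):
        s = state_index(*x)
        a = int(Q[s].argmin())                    # minimum expected cost action
        x_next, cost = step_dynamics(x, actions[a], preceding_accel(t))
        traj.append((s, a, cost, state_index(*x_next)))
        if len(traj) >= n:
            delta = tree_backup_update(Q, traj[-n:])  # step 5 (see 3-3 below)
            if delta < tol:                           # step 6: tolerance check
                break
        x = x_next
    return actions[a]  # step 7: control input, from which the demand power Pe follows
```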

The advantages and beneficial results of the method of the present invention are:

1. With the rapid development of China's intelligent vehicle industry and the widespread use of electric vehicles, the power-saving potential of electric vehicles can be treated as a latent resource to be developed: in response to market demands, the electric vehicle can actively improve its power consumption efficiency and reduce battery wear, maintaining the stability and economy of the whole electric vehicle system. Compared with traditional fuel vehicles, the extensive development of electric vehicles is one of the important goals of the new energy market.

2. Limiting the control variable of the electric vehicle to a certain range prevents large changes during acceleration or deceleration that would reduce driving safety, and also guarantees a degree of passenger comfort.

3. Keeping the state variables within a certain range maintains a proper inter-vehicle distance from the preceding vehicle throughout the drive, ensuring the ultimate goal of driving: safety.

4. Using the multi-step reinforcement learning algorithm, the minimum value function over a multi-step horizon can be obtained, minimizing the overall cost function, optimizing power consumption efficiency, and improving on the efficiency of previous single-step updates.

Brief Description of the Drawings

Fig. 1 is a diagram of the vehicle-following scenario provided by the present invention.

Fig. 2 is a diagram of the multi-step backtracking-tree (tree-backup) algorithm under the ACC strategy provided by the present invention.

Fig. 3 shows the simulation platform and parameter settings of the ecological ACC strategy based on multi-step reinforcement learning provided by the present invention.

Fig. 4 shows simulation results of the intelligent vehicle following control based on multi-step reinforcement learning under the WLTC driving cycle.

Detailed Description

The present invention is described in detail below with reference to specific embodiments.

The intelligent vehicle following control method based on the multi-step reinforcement learning algorithm proposed by the present invention is implemented according to the following steps.

Step 1. Determine the state variable X(t) through the vehicle's own information acquisition module and controller design module, determine the control variable U(t) from the control target, and initialize the vehicle parameters (see Fig. 3).

Step 2. Divide the speed V in the state variable X(t) proportionally from -3.7 to 4.399, and the equivalent inter-vehicle distance δd proportionally from -2 to 2.2999, into 50,000 grid cells each, and divide the control-variable acceleration from -1 to 1 into 21 cells. Thus a Q table of size 50000 × 21 is formed.

Step 3. Input the state variable X(t) and the control variable U(t) into the Q table to obtain the expected value function.

Step 4. Solve for the state variable X(t+1) at the next moment through the longitudinal dynamics module and the electric vehicle energy storage module.

Step 5. According to the acceleration change of the preceding vehicle at the next moment, select from the Q table the entry with the minimum expected cost over the next n steps to obtain the control variable U(t+1).

Step 6. Judge whether the Q table has reached the maximum number of iterations or whether the tolerance satisfies the adaptive iteration value. If so, take the solved control variable U(t+1) as the optimal or suboptimal control variable; otherwise return to step 4.

Step 7. After the iteration terminates, obtain the control input from the Q table, compute the optimal demanded power Pe, and apply it to the host vehicle.

Further, the initialization parameters in step 1 include the vehicle mass m, air density ρ, gravitational acceleration g, rolling resistance coefficient μ, the host vehicle's nominal aerodynamic drag coefficient Ch,d, the body length of the electric vehicle Lcar, the motor efficiency ηm, the fixed gear ratio Gr, the minimum acceleration amin, the maximum acceleration amax, and so on.

Further, step 1 is implemented as follows:

Model the longitudinal dynamics of the vehicle, together with the vehicle's basic information and physical quantities. The vehicle-following scenario is shown in Fig. 1.

1-1. Establish a second-order vehicle longitudinal dynamics model, as follows:

dS(t)/dt = V(t), dV(t)/dt = U(t) (1)

where S denotes the vehicle position and dS(t)/dt its derivative, V denotes the vehicle longitudinal speed and dV(t)/dt its derivative, and U is the control input.

1-2. Establish the vehicle longitudinal force-balance driving equation:

m·dV(t)/dt = Fhf(t) + Fhr(t) - Fa(t) - Fr(t) (2)

where Fhf(t) and Fhr(t) are the longitudinal tire forces of the front and rear wheels at time t, respectively, Fa(t) is the air resistance at time t, Fr(t) is the rolling resistance at time t, m is the vehicle mass, and V is the vehicle longitudinal speed.

The air resistance acting on the vehicle at time t can be expressed as:

Fa(t) = (1/2)·ρ·CD(dh)·Av·Vh²(t) (3)

where ρ is the air density, CD(dh) is the aerodynamic drag coefficient, which depends on the following distance between the two vehicles, Av is the frontal windward area of the host vehicle, and Vh²(t) is the square of the host vehicle's speed.

The rolling resistance at time t is expressed as follows:

Fr(t) = m·g·μ (4)

where g is the gravitational acceleration, μ is the rolling resistance coefficient, and m is the vehicle mass.

The aerodynamic drag coefficient can be expressed as:

[Equation (5) appears only as an image in the original: a regression form giving CD(dh) in terms of the nominal coefficient Ch,d, the parameters c1 and c2, and the following distance dh.]

where Ch,d is the nominal aerodynamic drag coefficient of the host vehicle, the parameters c1 and c2 are obtained by regression on experimental data, and dh is the following distance between the two vehicles.

The following distance dh between the two vehicles is calculated as:

dh = Sp - Sh - Lcar (6)

where Lcar is the body length of the electric vehicle, and Sp and Sh are the positions of the preceding vehicle and the host vehicle, respectively.
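
For illustration, the force terms of equations (2)-(6) translate into code as follows. All numeric parameter values are assumptions, and `drag_coefficient` is only a placeholder with the qualitative behavior described above, since the regression form of equation (5) survives only as an image.

```python
RHO, G_ACC = 1.206, 9.8          # air density, gravity (typical values; assumptions)
MU, M = 0.0088, 1500.0           # rolling-resistance coefficient, mass (assumptions)
CHD, A_V, L_CAR = 0.3, 2.5, 4.5  # nominal drag coeff., frontal area, body length

def following_distance(s_p, s_h):
    """Equation (6): gap between the preceding and host vehicles."""
    return s_p - s_h - L_CAR

def drag_coefficient(d_h, c1=10.0, c2=30.0):
    """Placeholder for equation (5): the drag coefficient drops below the
    nominal value when the gap d_h shrinks (drafting). The true regression
    form and the c1, c2 values are not recoverable from the original."""
    return CHD * (1.0 - c1 / (c2 + d_h))

def air_resistance(v, d_h):
    """Equation (3): aerodynamic resistance on the host vehicle."""
    return 0.5 * RHO * drag_coefficient(d_h) * A_V * v ** 2

def rolling_resistance():
    """Equation (4): F_r = m * g * mu."""
    return M * G_ACC * MU
```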

1-3. Establish the electric vehicle motor drive model.

In the present invention the controlled plant is an electric vehicle whose driving force is provided by a motor. To accurately describe the dynamic characteristics of the vehicle, and assuming the motor-efficiency constraint is not considered, the actual motor output torque Tm and the motor speed ωm can be expressed as:

Tm(t) = R·Tω(t)/Gr, ωm(t) = Gr·V(t)/R (7)

where R and Gr are the tire radius and the fixed gear ratio, respectively, and Tω(t) is the traction force at time t.

1-4. Establish the electric vehicle battery power model. Neglecting the influence of auxiliary devices on battery power, the desired output battery power can be equated with the desired motor input power, expressed as follows:

Pbat(t) = Tm(t)·ωm(t)/ηm(t) (8)

where Pbat is the battery power and ηm is the motor efficiency. The motor efficiency can be described as:

ηm(t) = fm(ωm(t), Tm(t)) (9)

where fm is the power-conversion efficiency function of the motor.
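
A sketch of how equations (7)-(9) chain a traction demand into a battery-power demand. The wheel radius, gear ratio, and the constant stand-in for the efficiency map fm are assumptions:

```python
R_WHEEL, G_RATIO = 0.3, 8.0      # tire radius, fixed gear ratio (assumptions)

def motor_state(traction_force, v):
    """Equation (7): motor torque and speed from wheel-level quantities."""
    t_m = traction_force * R_WHEEL / G_RATIO
    w_m = G_RATIO * v / R_WHEEL
    return t_m, w_m

def motor_efficiency(w_m, t_m):
    """Constant placeholder for the efficiency map fm of equation (9)."""
    return 0.9

def battery_power(traction_force, v):
    """Equation (8): demanded battery power equals demanded motor input power
    (for traction; regeneration would multiply by the efficiency instead)."""
    t_m, w_m = motor_state(traction_force, v)
    return t_m * w_m / motor_efficiency(w_m, t_m)
```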

1-5. Establish the charge/discharge resistance model of the electric vehicle, as follows:

[Equation (10) appears only as an image in the original: a piecewise model giving the battery resistance Rbat(t) as a function of the state of charge SoCbat(t), with separate charging-model and discharging-model coefficient sets.]

where SoCbat(t) is the state of charge of the battery pack at time t, Rbat(t) is the resistance of the battery at time t, the respective coefficient sets belong to the battery-pack charging and discharging models, and Ibat(t) is the battery-pack current at time t.

Step 2. Perform ecological adaptive cruise control for the electric vehicle based on multi-step reinforcement learning, and determine the optimization targets.

2-1. Optimization target based on safe vehicle driving. To ensure driving safety, the inter-vehicle distance must be constrained by:

dmin(t) ≤ dh(t) ≤ dmax(t) (11)

where dh(t) is the distance between the host vehicle and the preceding vehicle at time t, and dmin(t) and dmax(t) are the minimum and maximum permitted inter-vehicle distances, respectively, both derived from the Q table.

dmin(t) and dmax(t) can be expressed as:

[Equation (12) appears only as an image in the original: the expressions for dmin(t) and dmax(t).]

2-2. Optimization target based on driving comfort. To ensure driving comfort, the control input of the electric vehicle must be constrained by:

amin ≤ U(t) ≤ amax (13)

where amin and amax are the minimum and maximum permitted accelerations, respectively. In the present invention, amin = -1 m/s² and amax = 1 m/s².
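
Both hard constraints, (11) and (13), reduce to a simple admissibility check on each candidate action, e.g.:

```python
def admissible(u, d_h, d_min, d_max, a_min=-1.0, a_max=1.0):
    """Reject actions violating the spacing bound (11) or comfort bound (13)."""
    return a_min <= u <= a_max and d_min <= d_h <= d_max
```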

2-3. Optimization target based on prolonging the vehicle battery life. Reducing Ibat²(t) prolongs battery life. Therefore, to extend battery life as much as possible, the following quantity must be made as small as possible:

∫[t0, Tcyc] Ibat²(t) dt (14)

where Ibat²(t) is the square of the battery-pack current at time t, t0 is the start time of the driving cycle, and Tcyc is the end time of the driving cycle.

2-4. Optimization target based on vehicle energy economy.

To improve the energy economy of the electric vehicle, the following quantity must be made as small as possible:

∫[t0, Tcyc] Pbat(t) dt (15)
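
Over a sampled driving cycle, the two economy objectives (14) and (15) become discrete sums. A sketch assuming a fixed sampling period dt:

```python
def battery_aging_cost(i_bat, dt=1.0):
    """Discrete form of equation (14): integral of the squared battery current."""
    return sum(i * i for i in i_bat) * dt

def energy_cost(p_bat, dt=1.0):
    """Discrete form of equation (15): integral of the battery power."""
    return sum(p_bat) * dt
```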

Step 3. Determine that the multi-step reinforcement learning algorithm is the n-step tree-backup algorithm.

3-1. Determine the state variables and control variables for the Eco-ACC strategy based on multi-step reinforcement learning.

① State variable X(t): for the electric vehicle to follow the preceding vehicle within a reasonable inter-vehicle distance, equation (11) must be satisfied. The following performance can therefore be evaluated by the inter-vehicle distance deviation Δd and the speed deviation ΔV, defined as:

X(t) = [ΔV(t), Δd(t)]ᵀ (16)

where

[Equation (17) appears only as an image in the original: the definition of Δd(t) in terms of the band-stop function BSF().]

ΔV(t) = Vp(t) - V(t) (18)

where BSF() is a band-stop function; α, β, and cf are coefficients related to the band-stop function (see Table 1); and Vp(t) is the speed of the preceding vehicle at time t.

To express the vehicle state information more intuitively, the band-stop function is improved and the inter-vehicle distance is described as:

[Equation (19) appears only as an image in the original: the improved band-stop description of the inter-vehicle distance.]

where δd(t) is the equivalent inter-vehicle distance deviation at time t, and α, β, and cfz are coefficients related to the band-stop function (see Table 1).

② Control variable: the control variable of the present invention is the acceleration.

U(t) = a(t) (20)

where a(t) is the host vehicle acceleration at time t.

3-2. Determine the reward function and value function for the Eco-ACC strategy based on multi-step reinforcement learning.

③ Reward function: to achieve the control objectives, the reward function is given as follows:

r(X(t), U(t)) = α1·L1(t) + α2·L2(t) + α3·L3(t) (21)

where α1, α2, and α3 are the weight coefficients of the reward function (see Table 1). L1, L2, and L3 are expressed as:

[Equation (22) appears only as an image in the original: the definitions of L1(t), L2(t), and L3(t).]
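
Since the definitions of L1-L3 in equation (22) survive only as an image, a sketch of equation (21) has to take them as supplied callables; the weights a1-a3 stand in for the Table 1 values, which are likewise not recoverable:

```python
def reward(x, u, L1, L2, L3, a1=1.0, a2=1.0, a3=1.0):
    """Equation (21): weighted sum of the three stage terms. Whether each
    term enters as a reward or a cost depends on the (unrecoverable)
    definitions in equation (22)."""
    return a1 * L1(x, u) + a2 * L2(x, u) + a3 * L3(x, u)
```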

④ Value function: the value function of the ecological ACC strategy based on multi-step reinforcement learning is expressed as:

[Equation (23) appears only as an image in the original.]

where γ is the discount factor, and α, β are parameters of the band-stop function (see Table 1 for details).

In reinforcement learning, the ultimate goal of the agent is to maximize the cumulative reward. The reward function judges in the short term whether an action is good or bad. The value function can therefore be described as:

Q(X(t), U(t)) = E[ Σ(k≥0) γᵏ·r(X(t+k), U(t+k)) ] (24)

3-3. The multi-step learning algorithm adopts an off-policy method that does not require importance sampling, namely the tree-backup method; the n-step backtracking-tree diagram is shown in Fig. 2.
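
The n-step tree-backup update can be sketched as follows for this setting, where the target policy is greedy with respect to minimum expected cost, so no importance-sampling ratios are needed. The trajectory format, learning rate, and discount value are assumptions:

```python
def tree_backup_update(Q, traj, gamma=0.95, lr=0.1):
    """One n-step tree-backup update toward a greedy (minimum-cost) target
    policy. traj is a list of transitions (s, a, cost, s_next) of Q-table
    indices; returns the magnitude of the update for a tolerance check."""
    # Bootstrap: the last step backs up the greedy value of its successor.
    G = traj[-1][2] + gamma * Q[traj[-1][3]].min()
    # Walk backwards: follow the sampled branch only while the sampled
    # action coincides with the greedy one; otherwise the branch truncates
    # and the greedy minimum is backed up instead.
    for k in range(len(traj) - 2, -1, -1):
        s, a, cost, s_next = traj[k]
        a_next = traj[k + 1][1]
        if a_next == int(Q[s_next].argmin()):
            G = cost + gamma * G
        else:
            G = cost + gamma * Q[s_next].min()
    s0, a0 = traj[0][0], traj[0][1]
    delta = G - Q[s0, a0]
    Q[s0, a0] += lr * delta
    return abs(delta)
```

With a greedy target policy the backup follows the sampled branch only while the behavior policy happened to pick the greedy action, which is what lets the method learn off-policy without importance sampling.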

3-4. The simulation platform and parameter settings of the ecological ACC strategy based on multi-step reinforcement learning are shown in Fig. 3.

The band-stop function parameters and the reward-function weight coefficients are set as shown in Table 1.

[Table 1 appears only as an image in the original: band-stop function parameters and reward-function weight coefficients.]

Step 4. The intelligent vehicle following control method based on the multi-step reinforcement learning algorithm is verified under different driving cycles, as shown in Table 2.

Table 2. Verification of the intelligent vehicle following control method based on the multi-step reinforcement learning algorithm under different driving cycles

[Table 2 appears only as an image in the original.]

The simulation results are shown in Fig. 4. They show that the speed of the vehicle controlled by the invented Eco-ACC system based on the multi-step reinforcement learning algorithm closely matches that of the preceding vehicle, and the vehicle's acceleration is smoother than under a traditional ACC system, making passengers more comfortable; the actual distance between the vehicle controlled by the Eco-ACC system and the vehicle ahead always remains within the safe range, ensuring safety while driving; and the vehicle controlled by the Eco-ACC system is more energy-efficient than one controlled by a traditional ACC system.

Claims (4)

1. An electric vehicle following control method based on multi-step reinforcement learning, characterized by comprising the following steps:
step 1, determining a state variable X(t) through an information acquisition module and a controller design module of the vehicle, determining a control variable U(t) through a control target, and initializing relevant parameters of the vehicle;
step 2, dividing the speed V in the state variable X(t) proportionally from -3.7 to 4.399 into 50000 grid cells, dividing the equivalent inter-vehicle distance δd proportionally from -2 to 2.2999 into 50000 grid cells, and dividing the control-variable acceleration from -1 to 1 into 21 cells, thus forming a Q table of size 50000 × 21;
step 3, inputting the state variable X(t) and the control variable U(t) into the Q table to obtain an expected value function;
step 4, solving for the state variable X(t+1) at the next moment through the longitudinal dynamics module and the electric vehicle energy storage module;
step 5, selecting from the Q table the entry with the minimum expected cost value within the next n steps to obtain the control variable U(t+1), according to the acceleration change of the preceding vehicle at the next moment;
step 6, judging whether the Q table satisfies the maximum number of iterations or whether the tolerance satisfies the adaptive iteration value; if so, taking the solved control variable U(t+1) as the optimal or suboptimal control variable, otherwise returning to step 4;
step 7, after the iteration ends, obtaining the control input from the Q table, computing the optimal demanded power Pe, and applying it to the host vehicle.
2. The electric vehicle following control method based on multi-step reinforcement learning according to claim 1, characterized in that longitudinal dynamics modeling is performed on the vehicle, and the basic information and physical quantities of the vehicle are modeled;
1-1, establishing a second-order vehicle longitudinal dynamics model, as follows:
dS(t)/dt = V(t), dV(t)/dt = U(t) (1)
where S denotes the vehicle position and dS(t)/dt its derivative; V denotes the vehicle longitudinal speed and dV(t)/dt its derivative; U is the control input;
1-2, establishing the vehicle longitudinal force-balance driving equation as follows:
m·dV(t)/dt = Fhf(t) + Fhr(t) - Fa(t) - Fr(t) (2)
where Fhf(t) and Fhr(t) are the longitudinal tire forces of the front and rear wheels at time t, respectively, Fa(t) is the air resistance at time t, Fr(t) is the rolling resistance at time t, m is the vehicle mass, and V is the vehicle longitudinal speed;
the air resistance acting on the vehicle at time t can be expressed as:
Fa(t) = (1/2)·ρ·CD(dh)·Av·Vh²(t) (3)
where ρ is the air density, CD(dh) is the aerodynamic drag coefficient related to the distance between the two vehicles, Av is the frontal windward area of the host vehicle, and Vh²(t) is the square of the host vehicle speed;
the rolling resistance at time t is expressed as follows:
Fr(t) = m·g·μ (4)
where g is the gravitational acceleration, μ is the rolling resistance coefficient, and m is the vehicle mass;
the aerodynamic drag coefficient can be expressed as:
[Equation (5) appears only as an image in the original.]
where Ch,d represents the nominal aerodynamic drag coefficient of the host vehicle, the parameters c1 and c2 are obtained by regression on experimental data, and dh is the following distance between the two vehicles;
the following distance dh between the two vehicles is calculated as follows:
dh = Sp - Sh - Lcar (6)
where Lcar is the body length of the electric vehicle, and Sp and Sh are the positions of the preceding vehicle and the host vehicle, respectively;
1-3, establishing the motor drive model of the electric vehicle;
to accurately describe the dynamic characteristics of the vehicle, and assuming the constraint of motor efficiency is not considered, the actual motor output torque Tm and motor speed ωm can be expressed as:
Tm(t) = R·Tω(t)/Gr, ωm(t) = Gr·V(t)/R (7)
where R and Gr are the tire radius and the fixed gear ratio, respectively, and Tω(t) is the traction force at time t;
1-4, establishing the battery power model of the electric vehicle; neglecting the effect of auxiliary devices on the battery power, the desired output battery power may be equated with the desired motor input power, expressed as follows:
Pbat(t) = Tm(t)·ωm(t)/ηm(t) (8)
where Pbat is the battery power and ηm is the motor efficiency; the motor efficiency can be described as:
ηm(t) = fm(ωm(t), Tm(t)) (9)
where fm is the power-conversion efficiency function of the motor;
1-5, establishing the charge/discharge resistance model of the electric vehicle, as follows:
[Equation (10) appears only as an image in the original: a piecewise model giving the battery resistance as a function of the state of charge, with separate charging and discharging coefficient sets.]
where SoCbat(t) is the state of charge of the battery at time t, Rbat(t) is the resistance of the battery at time t, the respective coefficient sets belong to the battery-pack charging and discharging models, and Ibat(t) is the battery-pack current at time t.
3. The electric vehicle following control method based on multi-step reinforcement learning according to claim 2, characterized in that the electric vehicle based on multi-step reinforcement learning performs ecological adaptive cruise control to determine the optimization targets, specifically realized as follows:
2-1, optimization target based on safe vehicle driving; to ensure the safety of vehicle driving, the inter-vehicle distance must be constrained by:
dmin(t) ≤ dh(t) ≤ dmax(t) (11)
where dh(t) is the inter-vehicle distance between the host vehicle and the preceding vehicle at time t, and dmin(t) and dmax(t) are the minimum and maximum permitted inter-vehicle distances, respectively, both derived from the Q table;
dmin(t) and dmax(t) can be expressed as:
[Equation (12) appears only as an image in the original.]
2-2, optimization target based on vehicle driving comfort; to ensure driving comfort, the control input of the electric vehicle must be constrained by:
amin ≤ U(t) ≤ amax (13)
where amin and amax are the minimum and maximum permitted accelerations, respectively; in the present invention, amin = -1 m/s² and amax = 1 m/s²;
2-3, optimization target based on prolonging the service life of the vehicle battery; reducing Ibat²(t) prolongs the service life of the battery; therefore, to extend the battery life as far as possible, equation (14) must be made as small as possible:
∫[t0, Tcyc] Ibat²(t) dt (14)
where Ibat²(t) is the square of the battery-pack current at time t, t0 is the start time of the driving cycle, and Tcyc is the end time of the driving cycle;
2-4, optimization target based on vehicle energy economy; to improve the energy economy of the electric vehicle, the following quantity must be made as small as possible:
∫[t0, Tcyc] Pbat(t) dt (15)
4. The electric vehicle following control method based on multi-step reinforcement learning according to claim 3, characterized in that the algorithm based on multi-step reinforcement learning is determined to be the n-step tree-backup algorithm;
3-1, determining the state variables and control variables for the Eco-ACC strategy based on multi-step reinforcement learning;
① state variable X(t): for the electric vehicle to follow the preceding vehicle within a reasonable inter-vehicle distance range, equation (11) must be satisfied; the following performance is therefore evaluated with the inter-vehicle distance deviation Δd and the speed deviation ΔV, which can be defined as:
X(t) = [ΔV(t), Δd(t)]ᵀ (16)
where
[Equation (17) appears only as an image in the original: the definition of Δd(t) via the band-stop function BSF().]
ΔV(t) = Vp(t) - V(t) (18)
where BSF() is a band-stop function, α, β, and cf are coefficients related to the band-stop function, and Vp(t) is the speed of the preceding vehicle at time t;
to express the vehicle state information more intuitively, the band-stop function is improved, and the inter-vehicle distance is described as follows:
[Equation (19) appears only as an image in the original.]
where δd(t) is the equivalent inter-vehicle distance deviation at time t, and α, β, and cfz are coefficients related to the band-stop function;
② control variable: the control variable is the acceleration;
U(t) = a(t) (20)
where a(t) is the host vehicle acceleration at time t;
3-2, determining the reward function and value function for the Eco-ACC strategy based on multi-step reinforcement learning;
③ reward function: to achieve the control objectives, the reward function is given as follows:
r(X(t), U(t)) = α1·L1(t) + α2·L2(t) + α3·L3(t) (21)
where α1, α2, and α3 are the weight coefficients of the reward function; L1, L2, and L3 can be expressed as:
[Equation (22) appears only as an image in the original.]
④ value function: the value function of the ecological ACC strategy based on multi-step reinforcement learning can be expressed as follows:
[Equation (23) appears only as an image in the original.]
where γ is the discount factor, and α, β are parameters of the band-stop function;
in reinforcement learning, the ultimate goal of the agent is to maximize the cumulative reward; the reward function judges in the short term whether an action is good or bad; the value function can therefore be described as:
Q(X(t), U(t)) = E[ Σ(k≥0) γᵏ·r(X(t+k), U(t+k)) ] (24)
3-3, the multi-step learning algorithm adopts an off-policy method that does not require importance sampling, namely the n-step tree-backup method.
CN202210770539.XA 2022-06-30 2022-06-30 Electric vehicle following control method based on multi-step reinforcement learning Active CN114954455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210770539.XA CN114954455B (en) 2022-06-30 2022-06-30 Electric vehicle following control method based on multi-step reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210770539.XA CN114954455B (en) 2022-06-30 2022-06-30 Electric vehicle following control method based on multi-step reinforcement learning

Publications (2)

Publication Number Publication Date
CN114954455A true CN114954455A (en) 2022-08-30
CN114954455B CN114954455B (en) 2024-07-02

Family

ID=82966644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210770539.XA Active CN114954455B (en) 2022-06-30 2022-06-30 Electric vehicle following control method based on multi-step reinforcement learning

Country Status (1)

Country Link
CN (1) CN114954455B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109484407A (en) * 2018-11-14 2019-03-19 北京科技大学 An adaptive car-following method for electric-vehicle assisted driving
CN112046484A (en) * 2020-09-21 2020-12-08 吉林大学 Q learning-based vehicle lane-changing overtaking path planning method
CN112989553A (en) * 2020-12-28 2021-06-18 郑州大学 Construction and application of a CEBs (connected electric buses) speed planning model based on battery capacity loss control
WO2021197246A1 (en) * 2020-03-31 2021-10-07 长安大学 V2x-based motorcade cooperative braking method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109484407A (en) * 2018-11-14 2019-03-19 北京科技大学 An adaptive car-following method for electric-vehicle assisted driving
WO2021197246A1 (en) * 2020-03-31 2021-10-07 长安大学 V2x-based motorcade cooperative braking method and system
CN112046484A (en) * 2020-09-21 2020-12-08 吉林大学 Q learning-based vehicle lane-changing overtaking path planning method
CN112989553A (en) * 2020-12-28 2021-06-18 郑州大学 Construction and application of a CEBs (connected electric buses) speed planning model based on battery capacity loss control

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李文昌; 郭景华; 王进: "Adaptive fuzzy sliding-mode control of the longitudinal motion of intelligent electric vehicles under a hierarchical architecture", Journal of Xiamen University (Natural Science), no. 03, 28 May 2019 (2019-05-28) *

Also Published As

Publication number Publication date
CN114954455B (en) 2024-07-02

Similar Documents

Publication Publication Date Title
CN112896161B (en) An ecological adaptive cruise control system for electric vehicles based on reinforcement learning
CN109703375B (en) Coordinated recovery control method of regenerative braking energy for electric vehicles
CN105416077B (en) The EMS and management method of a kind of electric automobile
Biao et al. Regenerative braking control strategy of electric vehicles based on braking stability requirements
Zhuang et al. Integrated energy-oriented cruising control of electric vehicle on highway with varying slopes considering battery aging
CN113561793B (en) A dynamically constrained energy management strategy for smart fuel cell vehicles
CN109977449B (en) Hybrid dynamic modeling and optimizing control method for intelligent automobile longitudinal dynamics system
CN115027290A (en) Hybrid electric vehicle following energy management method based on multi-objective optimization
CN105667501B (en) The energy distributing method of motor vehicle driven by mixed power with track optimizing function
CN106055830A (en) PHEV (Plug-in Hybrid Electric Vehicle) control threshold parameter optimization method based on dynamic programming
Zhang et al. Powertrain design and energy management of a novel coaxial series-parallel plug-in hybrid electric vehicle
Tong et al. Speed planning for connected electric buses based on battery capacity loss
Ye et al. A fast Q-learning energy management strategy for battery/supercapacitor electric vehicles considering energy saving and battery aging
Zhang et al. Deep reinforcement learning based multi-objective energy management strategy for a plug-in hybrid electric bus considering driving style recognition
CN105620310A (en) Three-motor hybrid truck and power system parameter matching method
CN110356396B (en) Method for instantaneously optimizing speed of electric vehicle by considering road gradient
CN114954455B (en) A method for controlling electric vehicle following vehicle based on multi-step reinforcement learning
CN114103711B (en) Control method, system, device and storage medium for orderly charging of charging load
Zhang et al. Multi-objective optimization for pure electric vehicle during a car-following process
Li et al. Study on regenerative braking control strategy for extended range electric vehicles
Hong et al. Development of a mathematical model of a train in the energy point of view for the international conference on control, automation and systems 2007 (ICCAS 2007)
CN116424317A (en) Electric automobile economic self-adaptive cruise control method based on multi-step DDQN
CN111176140B (en) An integrated control method for electric vehicle motion-transmission-energy system
CN112659922B (en) Hybrid power rail vehicle and direct current bus voltage control method and system thereof
Bravo et al. The influences of energy storage and energy management strategies on fuel consumption of a fuel cell hybrid vehicle

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant