CN114954455A - Electric vehicle following running control method based on multi-step reinforcement learning - Google Patents

Electric vehicle following running control method based on multi-step reinforcement learning

Info

Publication number
CN114954455A
Authority
CN
China
Prior art keywords
vehicle
time
reinforcement learning
control
battery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210770539.XA
Other languages
Chinese (zh)
Other versions
CN114954455B (en)
Inventor
翟春杰
王栎
裘健鋆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202210770539.XA priority Critical patent/CN114954455B/en
Publication of CN114954455A publication Critical patent/CN114954455A/en
Application granted granted Critical
Publication of CN114954455B publication Critical patent/CN114954455B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00: Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/14: Adaptive cruise control
    • B60W30/16: Control of distance between vehicles, e.g. keeping a distance to preceding vehicle
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60L: PROPULSION OF ELECTRICALLY-PROPELLED VEHICLES; SUPPLYING ELECTRIC POWER FOR AUXILIARY EQUIPMENT OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRODYNAMIC BRAKE SYSTEMS FOR VEHICLES IN GENERAL; MAGNETIC SUSPENSION OR LEVITATION FOR VEHICLES; MONITORING OPERATING VARIABLES OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRIC SAFETY DEVICES FOR ELECTRICALLY-PROPELLED VEHICLES
    • B60L15/00: Methods, circuits, or devices for controlling the traction-motor speed of electrically-propelled vehicles
    • B60L15/20: Methods, circuits, or devices for controlling the traction-motor speed of electrically-propelled vehicles for control of the vehicle or its driving motor to achieve a desired performance, e.g. speed, torque, programmed variation of speed
    • B60L15/2045: Methods, circuits, or devices for controlling the traction-motor speed of electrically-propelled vehicles for control of the vehicle or its driving motor to achieve a desired performance, e.g. speed, torque, programmed variation of speed, for optimising the use of energy
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2556/00: Input parameters relating to data
    • B60W2556/45: External transmission of data to or from the vehicle
    • B60W2556/65: Data transmitted between vehicles
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/60: Other road transportation technologies with climate change mitigation effect
    • Y02T10/72: Electric energy management in electromobility

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mechanical Engineering (AREA)
  • Transportation (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Power Engineering (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Automation & Control Theory (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)

Abstract

The invention discloses an electric vehicle following control method based on multi-step reinforcement learning. The method comprises the following steps: 1. determining the state variables through an information acquisition module and a controller design module, and determining the control variable from the control target; 2. discretizing the state variables (speed and equivalent inter-vehicle distance) to obtain a Q table; 3. inputting the state variables and the control variable into the Q table to obtain an expected value function; 4. solving for the state variables at the next moment through the longitudinal dynamics module and the electric vehicle energy storage module; 5. according to the acceleration change of the preceding vehicle at the next moment, selecting from the Q table the entry with the minimum expected cost over the next n steps to obtain the control variable; 6. judging whether the control variable solved from the Q table is the optimal or suboptimal control variable. The invention uses the communication link between the preceding vehicle and the host vehicle, so that the vehicle can acquire information such as the acceleration of the preceding vehicle and achieve the following effect, fulfilling the aim of reducing power consumption during car following.

Description

A multi-step reinforcement learning-based following control method for electric vehicles

Technical Field

The invention belongs to the technical field of intelligent driving, in particular to car following between a preceding and a following vehicle, and specifically relates to an electric vehicle following control method based on multi-step reinforcement learning.

Background Art

To slow the rising trend of global warming and reduce carbon dioxide emissions, and as battery capacity and economy continue to improve, pure electric vehicles have become one of the development directions for new energy vehicles. As a means of curbing global warming, the promotion and application of electric vehicles is receiving more and more attention.

With the development of the times, global non-renewable resources are increasingly scarce, and humanity must pay more attention to the rational use of existing non-renewable resources while developing. With the rapid growth of the modern economy, the automobile has become a near-universal and indispensable means of travel, and balancing automobile fuel consumption against the rational allocation of society's petroleum resources has become an important problem. Moreover, air pollution and global warming caused by petroleum combustion are becoming more prominent, and regulations on vehicle exhaust and fuel consumption standards are increasingly strict, making the development of new energy electric smart vehicles a necessity. New energy vehicles have therefore attracted the attention of the automobile industry and governments. With continued scientific and technological progress, new energy vehicles have made great strides, and electric vehicles have become one of the important means of travel, accounting for a significant share of the market. Therefore, in order to improve the energy efficiency of new energy electric vehicles and prolong battery life, this invention proposes an electric vehicle following control method based on multi-step reinforcement learning.

Summary of the Invention

The present invention is mainly motivated by the fact that, with the widespread use of electric vehicles in China and the gradual opening of the new energy vehicle market, the number of electric vehicles in China will continue to rise. How to make better use of the development of electric vehicles, thereby reducing the exhaust emissions of fuel vehicles, improving the ecological environment, and reducing battery power consumption, is a question worth exploring.

The purpose of the present invention is to provide an intelligent vehicle following control method based on multi-step reinforcement learning, which uses the communication link between the preceding vehicle and the host vehicle so that the vehicle can acquire information such as the acceleration of the preceding vehicle and achieve the following effect, reducing power consumption during car following.

The above objectives are achieved through the following technical scheme:

Step 1. Determine the state variable X(t) through the vehicle's own information acquisition module and controller design module, determine the control variable U(t) from the control target, and initialize the vehicle parameters (see Fig. 3).

Step 2. Divide the speed V in the state variable X(t) proportionally from -3.7 to 4.399, and the equivalent inter-vehicle distance δd proportionally from -2 to 2.2999, into 50,000 grid cells each, and divide the control-variable acceleration from -1 to 1 into 21 cells. This forms a Q table of size 50000 × 21.
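
To make step 2 concrete, the discretization might be sketched as follows in Python. This is a minimal illustration, not the patent's implementation: the patent states that V and δd are each divided into 50,000 cells while the Q table has 50,000 rows, so the joint 250 × 200 grid used in `state_index` is a hypothetical indexing scheme chosen to reconcile the two figures.

```python
import numpy as np

# Discretization bounds are taken from step 2; the 250 x 200 joint-grid
# split is an assumption, not stated in the patent.
N_STATES, N_ACTIONS = 50_000, 21
V_MIN, V_MAX = -3.7, 4.399      # speed range (state variable V)
D_MIN, D_MAX = -2.0, 2.2999     # equivalent inter-vehicle distance range
A_MIN, A_MAX = -1.0, 1.0        # acceleration (control variable) range

actions = np.linspace(A_MIN, A_MAX, N_ACTIONS)  # 21 candidate accelerations
Q = np.zeros((N_STATES, N_ACTIONS))             # the 50000 x 21 Q table

def to_cell(x, lo, hi, n):
    """Map a continuous value to one of n equal-width cells covering [lo, hi]."""
    idx = int((x - lo) / (hi - lo) * n)
    return min(max(idx, 0), n - 1)

def state_index(v, delta_d, n_v=250, n_d=200):
    """Hypothetical joint index of (V, delta_d) into the 50,000 Q-table rows."""
    return to_cell(v, V_MIN, V_MAX, n_v) * n_d + to_cell(delta_d, D_MIN, D_MAX, n_d)
```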

Step 3. Input the state variable X(t) and the control variable U(t) into the Q table to obtain the expected value function.

Step 4. Solve for the state variable X(t+1) at the next moment through the longitudinal dynamics module and the electric vehicle energy storage module.

Step 5. According to the acceleration change of the preceding vehicle at the next moment, select from the Q table the entry with the minimum expected cost over the next n steps to obtain the control variable U(t+1).

Step 6. Judge whether the Q table has reached the maximum number of iterations or whether the tolerance satisfies the adaptive iteration value. If so, take the solved control variable U(t+1) as the optimal or suboptimal control variable; otherwise return to step 4.

Step 7. After the iteration terminates, obtain the control input from the Q table, compute the optimal demanded power Pe, and apply it to the host vehicle.
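
The loop over steps 3-7 can then be sketched as below, using the discretization above. `step_dynamics` (the step-4 longitudinal-dynamics and energy-storage rollout, assumed to return the next state and a stage cost) and `preceding_accel` are hypothetical helpers; `tree_backup_update` is sketched under step 3-3 of the detailed description.

```python
def run_following_control(Q, x0, preceding_accel, n=5, max_iters=1000, tol=1e-3):
    """Steps 3-7 in sequence: act greedily on the Q table (step 3), roll the
    dynamics forward (step 4), apply the n-step update (step 5), and stop on
    the iteration/tolerance criterion (step 6)."""
    x, traj = x0, []
    for t in range(max_iters):
        s = state_index(*x)
        a = int(Q[s].argmin())                    # minimum expected cost action
        x_next, cost = step_dynamics(x, actions[a], preceding_accel(t))
        traj.append((s, a, cost, state_index(*x_next)))
        if len(traj) >= n:
            delta = tree_backup_update(Q, traj[-n:])  # step 5 (see 3-3 below)
            if delta < tol:                           # step 6: tolerance check
                break
        x = x_next
    return actions[a]  # step 7: control input, from which the demand power Pe follows
```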

The advantages and beneficial results of the method of the present invention are:

1. With the rapid development of China's intelligent vehicle industry and the widespread use of electric vehicles, the power-saving potential of electric vehicles can be treated as a latent resource to be developed: in response to market demands, the electric vehicle can actively improve its power consumption efficiency and reduce battery wear, maintaining the stability and economy of the whole electric vehicle system. Compared with traditional fuel vehicles, the extensive development of electric vehicles is one of the important goals of the new energy market.

2. Limiting the control variable of the electric vehicle to a certain range prevents large changes during acceleration or deceleration that would reduce driving safety, and also guarantees a degree of passenger comfort.

3. Keeping the state variables within a certain range maintains a proper inter-vehicle distance from the preceding vehicle throughout the drive, ensuring the ultimate goal of driving: safety.

4. Using the multi-step reinforcement learning algorithm, the minimum value function over a multi-step horizon can be obtained, minimizing the overall cost function, optimizing power consumption efficiency, and improving on the efficiency of previous single-step updates.

Brief Description of the Drawings

Fig. 1 is a diagram of the vehicle-following scenario provided by the present invention.

Fig. 2 is a diagram of the multi-step backtracking-tree (tree-backup) algorithm under the ACC strategy provided by the present invention.

Fig. 3 shows the simulation platform and parameter settings of the ecological ACC strategy based on multi-step reinforcement learning provided by the present invention.

Fig. 4 shows simulation results of the intelligent vehicle following control based on multi-step reinforcement learning under the WLTC driving cycle.

Detailed Description

The present invention is described in detail below with reference to specific embodiments.

The intelligent vehicle following control method based on the multi-step reinforcement learning algorithm proposed by the present invention is implemented according to the following steps.

Step 1. Determine the state variable X(t) through the vehicle's own information acquisition module and controller design module, determine the control variable U(t) from the control target, and initialize the vehicle parameters (see Fig. 3).

Step 2. Divide the speed V in the state variable X(t) proportionally from -3.7 to 4.399, and the equivalent inter-vehicle distance δd proportionally from -2 to 2.2999, into 50,000 grid cells each, and divide the control-variable acceleration from -1 to 1 into 21 cells. Thus a Q table of size 50000 × 21 is formed.

Step 3. Input the state variable X(t) and the control variable U(t) into the Q table to obtain the expected value function.

Step 4. Solve for the state variable X(t+1) at the next moment through the longitudinal dynamics module and the electric vehicle energy storage module.

Step 5. According to the acceleration change of the preceding vehicle at the next moment, select from the Q table the entry with the minimum expected cost over the next n steps to obtain the control variable U(t+1).

Step 6. Judge whether the Q table has reached the maximum number of iterations or whether the tolerance satisfies the adaptive iteration value. If so, take the solved control variable U(t+1) as the optimal or suboptimal control variable; otherwise return to step 4.

Step 7. After the iteration terminates, obtain the control input from the Q table, compute the optimal demanded power Pe, and apply it to the host vehicle.

Further, the initialization parameters in step 1 include the vehicle mass m, air density ρ, gravitational acceleration g, rolling resistance coefficient μ, the host vehicle's nominal aerodynamic drag coefficient Ch,d, the body length of the electric vehicle Lcar, the motor efficiency ηm, the fixed gear ratio Gr, the minimum acceleration amin, the maximum acceleration amax, and so on.

Further, step 1 is implemented as follows:

Model the longitudinal dynamics of the vehicle, together with the vehicle's basic information and physical quantities. The vehicle-following scenario is shown in Fig. 1.

1-1. Establish a second-order vehicle longitudinal dynamics model, as follows:

dS(t)/dt = V(t), dV(t)/dt = U(t) (1)

where S denotes the vehicle position and dS(t)/dt its derivative, V denotes the vehicle longitudinal speed and dV(t)/dt its derivative, and U is the control input.

1-2. Establish the vehicle longitudinal force-balance driving equation:

m·dV(t)/dt = Fhf(t) + Fhr(t) - Fa(t) - Fr(t) (2)

where Fhf(t) and Fhr(t) are the longitudinal tire forces of the front and rear wheels at time t, respectively, Fa(t) is the air resistance at time t, Fr(t) is the rolling resistance at time t, m is the vehicle mass, and V is the vehicle longitudinal speed.

The air resistance acting on the vehicle at time t can be expressed as:

Fa(t) = (1/2)·ρ·CD(dh)·Av·Vh²(t) (3)

where ρ is the air density, CD(dh) is the aerodynamic drag coefficient, which depends on the following distance between the two vehicles, Av is the frontal windward area of the host vehicle, and Vh²(t) is the square of the host vehicle's speed.

The rolling resistance at time t is expressed as follows:

Fr(t) = m·g·μ (4)

where g is the gravitational acceleration, μ is the rolling resistance coefficient, and m is the vehicle mass.

The aerodynamic drag coefficient can be expressed as:

[Equation (5) appears only as an image in the original: a regression form giving CD(dh) in terms of the nominal coefficient Ch,d, the parameters c1 and c2, and the following distance dh.]

where Ch,d is the nominal aerodynamic drag coefficient of the host vehicle, the parameters c1 and c2 are obtained by regression on experimental data, and dh is the following distance between the two vehicles.

The following distance dh between the two vehicles is calculated as:

dh = Sp - Sh - Lcar (6)

where Lcar is the body length of the electric vehicle, and Sp and Sh are the positions of the preceding vehicle and the host vehicle, respectively.
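
For illustration, the force terms of equations (2)-(6) translate into code as follows. All numeric parameter values are assumptions, and `drag_coefficient` is only a placeholder with the qualitative behavior described above, since the regression form of equation (5) survives only as an image.

```python
RHO, G_ACC = 1.206, 9.8          # air density, gravity (typical values; assumptions)
MU, M = 0.0088, 1500.0           # rolling-resistance coefficient, mass (assumptions)
CHD, A_V, L_CAR = 0.3, 2.5, 4.5  # nominal drag coeff., frontal area, body length

def following_distance(s_p, s_h):
    """Equation (6): gap between the preceding and host vehicles."""
    return s_p - s_h - L_CAR

def drag_coefficient(d_h, c1=10.0, c2=30.0):
    """Placeholder for equation (5): the drag coefficient drops below the
    nominal value when the gap d_h shrinks (drafting). The true regression
    form and the c1, c2 values are not recoverable from the original."""
    return CHD * (1.0 - c1 / (c2 + d_h))

def air_resistance(v, d_h):
    """Equation (3): aerodynamic resistance on the host vehicle."""
    return 0.5 * RHO * drag_coefficient(d_h) * A_V * v ** 2

def rolling_resistance():
    """Equation (4): F_r = m * g * mu."""
    return M * G_ACC * MU
```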

1-3. Establish the electric vehicle motor drive model.

In the present invention the controlled plant is an electric vehicle whose driving force is provided by a motor. To accurately describe the dynamic characteristics of the vehicle, and assuming the motor-efficiency constraint is not considered, the actual motor output torque Tm and the motor speed ωm can be expressed as:

Tm(t) = R·Tω(t)/Gr, ωm(t) = Gr·V(t)/R (7)

where R and Gr are the tire radius and the fixed gear ratio, respectively, and Tω(t) is the traction force at time t.

1-4. Establish the electric vehicle battery power model. Neglecting the influence of auxiliary devices on battery power, the desired output battery power can be equated with the desired motor input power, expressed as follows:

Pbat(t) = Tm(t)·ωm(t)/ηm(t) (8)

where Pbat is the battery power and ηm is the motor efficiency. The motor efficiency can be described as:

ηm(t) = fm(ωm(t), Tm(t)) (9)

where fm is the power-conversion efficiency function of the motor.
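
A sketch of how equations (7)-(9) chain a traction demand into a battery-power demand. The wheel radius, gear ratio, and the constant stand-in for the efficiency map fm are assumptions:

```python
R_WHEEL, G_RATIO = 0.3, 8.0      # tire radius, fixed gear ratio (assumptions)

def motor_state(traction_force, v):
    """Equation (7): motor torque and speed from wheel-level quantities."""
    t_m = traction_force * R_WHEEL / G_RATIO
    w_m = G_RATIO * v / R_WHEEL
    return t_m, w_m

def motor_efficiency(w_m, t_m):
    """Constant placeholder for the efficiency map fm of equation (9)."""
    return 0.9

def battery_power(traction_force, v):
    """Equation (8): demanded battery power equals demanded motor input power
    (for traction; regeneration would multiply by the efficiency instead)."""
    t_m, w_m = motor_state(traction_force, v)
    return t_m * w_m / motor_efficiency(w_m, t_m)
```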

1-5. Establish the charge/discharge resistance model of the electric vehicle, as follows:

[Equation (10) appears only as an image in the original: a piecewise model giving the battery resistance Rbat(t) as a function of the state of charge SoCbat(t), with separate charging-model and discharging-model coefficient sets.]

where SoCbat(t) is the state of charge of the battery pack at time t, Rbat(t) is the resistance of the battery at time t, the respective coefficient sets belong to the battery-pack charging and discharging models, and Ibat(t) is the battery-pack current at time t.

Step 2. Perform ecological adaptive cruise control for the electric vehicle based on multi-step reinforcement learning, and determine the optimization targets.

2-1. Optimization target based on safe vehicle driving. To ensure driving safety, the inter-vehicle distance must be constrained by:

dmin(t) ≤ dh(t) ≤ dmax(t) (11)

where dh(t) is the distance between the host vehicle and the preceding vehicle at time t, and dmin(t) and dmax(t) are the minimum and maximum permitted inter-vehicle distances, respectively, both derived from the Q table.

dmin(t) and dmax(t) can be expressed as:

[Equation (12) appears only as an image in the original: the expressions for dmin(t) and dmax(t).]

2-2. Optimization target based on driving comfort. To ensure driving comfort, the control input of the electric vehicle must be constrained by:

amin ≤ U(t) ≤ amax (13)

where amin and amax are the minimum and maximum permitted accelerations, respectively. In the present invention, amin = -1 m/s² and amax = 1 m/s².
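
Both hard constraints, (11) and (13), reduce to a simple admissibility check on each candidate action, e.g.:

```python
def admissible(u, d_h, d_min, d_max, a_min=-1.0, a_max=1.0):
    """Reject actions violating the spacing bound (11) or comfort bound (13)."""
    return a_min <= u <= a_max and d_min <= d_h <= d_max
```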

2-3. Optimization target based on prolonging the vehicle battery life. Reducing Ibat²(t) prolongs battery life. Therefore, to extend battery life as much as possible, the following quantity must be made as small as possible:

∫[t0, Tcyc] Ibat²(t) dt (14)

where Ibat²(t) is the square of the battery-pack current at time t, t0 is the start time of the driving cycle, and Tcyc is the end time of the driving cycle.

2-4. Optimization target based on vehicle energy economy.

To improve the energy economy of the electric vehicle, the following quantity must be made as small as possible:

∫[t0, Tcyc] Pbat(t) dt (15)
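
Over a sampled driving cycle, the two economy objectives (14) and (15) become discrete sums. A sketch assuming a fixed sampling period dt:

```python
def battery_aging_cost(i_bat, dt=1.0):
    """Discrete form of equation (14): integral of the squared battery current."""
    return sum(i * i for i in i_bat) * dt

def energy_cost(p_bat, dt=1.0):
    """Discrete form of equation (15): integral of the battery power."""
    return sum(p_bat) * dt
```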

Step 3. Determine that the multi-step reinforcement learning algorithm is the n-step tree-backup algorithm.

3-1. Determine the state variables and control variables for the Eco-ACC strategy based on multi-step reinforcement learning.

① State variable X(t): for the electric vehicle to follow the preceding vehicle within a reasonable inter-vehicle distance, equation (11) must be satisfied. The following performance can therefore be evaluated by the inter-vehicle distance deviation Δd and the speed deviation ΔV, defined as:

X(t) = [ΔV(t), Δd(t)]ᵀ (16)

where

[Equation (17) appears only as an image in the original: the definition of Δd(t) in terms of the band-stop function BSF().]

ΔV(t) = Vp(t) - V(t) (18)

where BSF() is a band-stop function; α, β, and cf are coefficients related to the band-stop function (see Table 1); and Vp(t) is the speed of the preceding vehicle at time t.

To express the vehicle state information more intuitively, the band-stop function is improved and the inter-vehicle distance is described as:

[Equation (19) appears only as an image in the original: the improved band-stop description of the inter-vehicle distance.]

where δd(t) is the equivalent inter-vehicle distance deviation at time t, and α, β, and cfz are coefficients related to the band-stop function (see Table 1).

② Control variable: the control variable of the present invention is the acceleration.

U(t) = a(t) (20)

where a(t) is the host vehicle acceleration at time t.

3-2. Determine the reward function and value function for the Eco-ACC strategy based on multi-step reinforcement learning.

③ Reward function: to achieve the control objectives, the reward function is given as follows:

r(X(t), U(t)) = α1·L1(t) + α2·L2(t) + α3·L3(t) (21)

where α1, α2, and α3 are the weight coefficients of the reward function (see Table 1). L1, L2, and L3 are expressed as:

[Equation (22) appears only as an image in the original: the definitions of L1(t), L2(t), and L3(t).]
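
Since the definitions of L1-L3 in equation (22) survive only as an image, a sketch of equation (21) has to take them as supplied callables; the weights a1-a3 stand in for the Table 1 values, which are likewise not recoverable:

```python
def reward(x, u, L1, L2, L3, a1=1.0, a2=1.0, a3=1.0):
    """Equation (21): weighted sum of the three stage terms. Whether each
    term enters as a reward or a cost depends on the (unrecoverable)
    definitions in equation (22)."""
    return a1 * L1(x, u) + a2 * L2(x, u) + a3 * L3(x, u)
```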

④ Value function: the value function of the ecological ACC strategy based on multi-step reinforcement learning is expressed as:

[Equation (23) appears only as an image in the original.]

where γ is the discount factor, and α, β are parameters of the band-stop function (see Table 1 for details).

In reinforcement learning, the ultimate goal of the agent is to maximize the cumulative reward. The reward function judges in the short term whether an action is good or bad. The value function can therefore be described as:

Q(X(t), U(t)) = E[ Σ(k≥0) γᵏ·r(X(t+k), U(t+k)) ] (24)

3-3. The multi-step learning algorithm adopts an off-policy method that does not require importance sampling, namely the tree-backup method; the n-step backtracking-tree diagram is shown in Fig. 2.
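
The n-step tree-backup update can be sketched as follows for this setting, where the target policy is greedy with respect to minimum expected cost, so no importance-sampling ratios are needed. The trajectory format, learning rate, and discount value are assumptions:

```python
def tree_backup_update(Q, traj, gamma=0.95, lr=0.1):
    """One n-step tree-backup update toward a greedy (minimum-cost) target
    policy. traj is a list of transitions (s, a, cost, s_next) of Q-table
    indices; returns the magnitude of the update for a tolerance check."""
    # Bootstrap: the last step backs up the greedy value of its successor.
    G = traj[-1][2] + gamma * Q[traj[-1][3]].min()
    # Walk backwards: follow the sampled branch only while the sampled
    # action coincides with the greedy one; otherwise the branch truncates
    # and the greedy minimum is backed up instead.
    for k in range(len(traj) - 2, -1, -1):
        s, a, cost, s_next = traj[k]
        a_next = traj[k + 1][1]
        if a_next == int(Q[s_next].argmin()):
            G = cost + gamma * G
        else:
            G = cost + gamma * Q[s_next].min()
    s0, a0 = traj[0][0], traj[0][1]
    delta = G - Q[s0, a0]
    Q[s0, a0] += lr * delta
    return abs(delta)
```

With a greedy target policy the backup follows the sampled branch only while the behavior policy happened to pick the greedy action, which is what lets the method learn off-policy without importance sampling.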

3-4. The simulation platform and parameter settings of the ecological ACC strategy based on multi-step reinforcement learning are shown in Fig. 3.

The band-stop function parameters and the reward-function weight coefficients are set as shown in Table 1.

[Table 1 appears only as an image in the original: band-stop function parameters and reward-function weight coefficients.]

Step 4. The intelligent vehicle following control method based on the multi-step reinforcement learning algorithm is verified under different driving cycles, as shown in Table 2.

Table 2. Verification of the intelligent vehicle following control method based on the multi-step reinforcement learning algorithm under different driving cycles

[Table 2 appears only as an image in the original.]

The simulation results are shown in Fig. 4. They show that the speed of the vehicle controlled by the invented Eco-ACC system based on the multi-step reinforcement learning algorithm closely matches that of the preceding vehicle, and the vehicle's acceleration is smoother than under a traditional ACC system, making passengers more comfortable; the actual distance between the vehicle controlled by the Eco-ACC system and the vehicle ahead always remains within the safe range, ensuring safety while driving; and the vehicle controlled by the Eco-ACC system is more energy-efficient than one controlled by a traditional ACC system.

Claims (4)

1. An electric vehicle following control method based on multi-step reinforcement learning, characterized by comprising the following steps:
step 1, determining a state variable X(t) through an information acquisition module and a controller design module of the vehicle, determining a control variable U(t) through a control target, and initializing relevant parameters of the vehicle;
step 2, dividing the speed V in the state variable X(t) proportionally from -3.7 to 4.399 into 50000 grid cells, dividing the equivalent inter-vehicle distance δd proportionally from -2 to 2.2999 into 50000 grid cells, and dividing the control-variable acceleration from -1 to 1 into 21 cells, thus forming a Q table of size 50000 × 21;
step 3, inputting the state variable X(t) and the control variable U(t) into the Q table to obtain an expected value function;
step 4, solving for the state variable X(t+1) at the next moment through the longitudinal dynamics module and the electric vehicle energy storage module;
step 5, selecting from the Q table the entry with the minimum expected cost value within the next n steps to obtain the control variable U(t+1), according to the acceleration change of the preceding vehicle at the next moment;
step 6, judging whether the Q table satisfies the maximum number of iterations or whether the tolerance satisfies the adaptive iteration value; if so, taking the solved control variable U(t+1) as the optimal or suboptimal control variable, otherwise returning to step 4;
step 7, after the iteration ends, obtaining the control input from the Q table, computing the optimal demanded power Pe, and applying it to the host vehicle.
2. The electric vehicle following control method based on multi-step reinforcement learning according to claim 1, characterized in that longitudinal dynamics modeling is performed on the vehicle, and the basic information and physical quantities of the vehicle are modeled;
1-1, establishing a second-order vehicle longitudinal dynamics model, as follows:
dS(t)/dt = V(t), dV(t)/dt = U(t) (1)
where S denotes the vehicle position and dS(t)/dt its derivative; V denotes the vehicle longitudinal speed and dV(t)/dt its derivative; U is the control input;
1-2, establishing the vehicle longitudinal force-balance driving equation as follows:
m·dV(t)/dt = Fhf(t) + Fhr(t) - Fa(t) - Fr(t) (2)
where Fhf(t) and Fhr(t) are the longitudinal tire forces of the front and rear wheels at time t, respectively, Fa(t) is the air resistance at time t, Fr(t) is the rolling resistance at time t, m is the vehicle mass, and V is the vehicle longitudinal speed;
the air resistance acting on the vehicle at time t can be expressed as:
Fa(t) = (1/2)·ρ·CD(dh)·Av·Vh²(t) (3)
where ρ is the air density, CD(dh) is the aerodynamic drag coefficient related to the distance between the two vehicles, Av is the frontal windward area of the host vehicle, and Vh²(t) is the square of the host vehicle speed;
the rolling resistance at time t is expressed as follows:
Fr(t) = m·g·μ (4)
where g is the gravitational acceleration, μ is the rolling resistance coefficient, and m is the vehicle mass;
the aerodynamic drag coefficient can be expressed as:
[Equation (5) appears only as an image in the original.]
where Ch,d represents the nominal aerodynamic drag coefficient of the host vehicle, the parameters c1 and c2 are obtained by regression on experimental data, and dh is the following distance between the two vehicles;
the following distance dh between the two vehicles is calculated as follows:
dh = Sp - Sh - Lcar (6)
where Lcar is the body length of the electric vehicle, and Sp and Sh are the positions of the preceding vehicle and the host vehicle, respectively;
1-3, establishing the motor drive model of the electric vehicle;
to accurately describe the dynamic characteristics of the vehicle, and assuming the constraint of motor efficiency is not considered, the actual motor output torque Tm and motor speed ωm can be expressed as:
Tm(t) = R·Tω(t)/Gr, ωm(t) = Gr·V(t)/R (7)
where R and Gr are the tire radius and the fixed gear ratio, respectively, and Tω(t) is the traction force at time t;
1-4, establishing the battery power model of the electric vehicle; neglecting the effect of auxiliary devices on the battery power, the desired output battery power may be equated with the desired motor input power, expressed as follows:
Pbat(t) = Tm(t)·ωm(t)/ηm(t) (8)
where Pbat is the battery power and ηm is the motor efficiency; the motor efficiency can be described as:
ηm(t) = fm(ωm(t), Tm(t)) (9)
where fm is the power-conversion efficiency function of the motor;
1-5, establishing the charge/discharge resistance model of the electric vehicle, as follows:
[Equation (10) appears only as an image in the original: a piecewise model giving the battery resistance as a function of the state of charge, with separate charging and discharging coefficient sets.]
where SoCbat(t) is the state of charge of the battery at time t, Rbat(t) is the resistance of the battery at time t, the respective coefficient sets belong to the battery-pack charging and discharging models, and Ibat(t) is the battery-pack current at time t.
3. The electric vehicle following control method based on multi-step reinforcement learning according to claim 2, characterized in that the electric vehicle based on multi-step reinforcement learning performs ecological adaptive cruise control to determine the optimization targets, specifically realized as follows:
2-1, optimization target based on safe vehicle driving; to ensure the safety of vehicle driving, the inter-vehicle distance must be constrained by:
dmin(t) ≤ dh(t) ≤ dmax(t) (11)
where dh(t) is the inter-vehicle distance between the host vehicle and the preceding vehicle at time t, and dmin(t) and dmax(t) are the minimum and maximum permitted inter-vehicle distances, respectively, both derived from the Q table;
dmin(t) and dmax(t) can be expressed as:
[Equation (12) appears only as an image in the original.]
2-2, optimization target based on vehicle driving comfort; to ensure driving comfort, the control input of the electric vehicle must be constrained by:
amin ≤ U(t) ≤ amax (13)
where amin and amax are the minimum and maximum permitted accelerations, respectively; in the present invention, amin = -1 m/s² and amax = 1 m/s²;
2-3, optimization target based on prolonging the service life of the vehicle battery; reducing Ibat²(t) prolongs the service life of the battery; therefore, to extend the battery life as far as possible, equation (14) must be made as small as possible:
∫[t0, Tcyc] Ibat²(t) dt (14)
where Ibat²(t) is the square of the battery-pack current at time t, t0 is the start time of the driving cycle, and Tcyc is the end time of the driving cycle;
2-4, optimization target based on vehicle energy economy; to improve the energy economy of the electric vehicle, the following quantity must be made as small as possible:
∫[t0, Tcyc] Pbat(t) dt (15)
4. The electric vehicle following control method based on multi-step reinforcement learning according to claim 3, characterized in that the algorithm based on multi-step reinforcement learning is determined to be the n-step tree-backup algorithm;
3-1, determining the state variables and control variables for the Eco-ACC strategy based on multi-step reinforcement learning;
① state variable X(t): for the electric vehicle to follow the preceding vehicle within a reasonable inter-vehicle distance range, equation (11) must be satisfied; the following performance is therefore evaluated with the inter-vehicle distance deviation Δd and the speed deviation ΔV, which can be defined as:
X(t) = [ΔV(t), Δd(t)]ᵀ (16)
where
[Equation (17) appears only as an image in the original: the definition of Δd(t) via the band-stop function BSF().]
ΔV(t) = Vp(t) - V(t) (18)
where BSF() is a band-stop function, α, β, and cf are coefficients related to the band-stop function, and Vp(t) is the speed of the preceding vehicle at time t;
to express the vehicle state information more intuitively, the band-stop function is improved, and the inter-vehicle distance is described as follows:
[Equation (19) appears only as an image in the original.]
where δd(t) is the equivalent inter-vehicle distance deviation at time t, and α, β, and cfz are coefficients related to the band-stop function;
② control variable: the control variable is the acceleration;
U(t) = a(t) (20)
where a(t) is the host vehicle acceleration at time t;
3-2, determining the reward function and value function for the Eco-ACC strategy based on multi-step reinforcement learning;
③ reward function: to achieve the control objectives, the reward function is given as follows:
r(X(t), U(t)) = α1·L1(t) + α2·L2(t) + α3·L3(t) (21)
where α1, α2, and α3 are the weight coefficients of the reward function; L1, L2, and L3 can be expressed as:
[Equation (22) appears only as an image in the original.]
④ value function: the value function of the ecological ACC strategy based on multi-step reinforcement learning can be expressed as follows:
[Equation (23) appears only as an image in the original.]
where γ is the discount factor, and α, β are parameters of the band-stop function;
in reinforcement learning, the ultimate goal of the agent is to maximize the cumulative reward; the reward function judges in the short term whether an action is good or bad; the value function can therefore be described as:
Q(X(t), U(t)) = E[ Σ(k≥0) γᵏ·r(X(t+k), U(t+k)) ] (24)
3-3, the multi-step learning algorithm adopts an off-policy method that does not require importance sampling, namely the n-step tree-backup method.
CN202210770539.XA 2022-06-30 2022-06-30 Electric vehicle following control method based on multi-step reinforcement learning Active CN114954455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210770539.XA CN114954455B (en) 2022-06-30 2022-06-30 Electric vehicle following control method based on multi-step reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210770539.XA CN114954455B (en) 2022-06-30 2022-06-30 Electric vehicle following control method based on multi-step reinforcement learning

Publications (2)

Publication Number Publication Date
CN114954455A true CN114954455A (en) 2022-08-30
CN114954455B CN114954455B (en) 2024-07-02

Family

ID=82966644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210770539.XA Active CN114954455B (en) 2022-06-30 2022-06-30 Electric vehicle following control method based on multi-step reinforcement learning

Country Status (1)

Country Link
CN (1) CN114954455B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109484407A (en) * 2018-11-14 2019-03-19 北京科技大学 An adaptive car-following method for electric-vehicle assisted driving
CN112046484A (en) * 2020-09-21 2020-12-08 吉林大学 Q learning-based vehicle lane-changing overtaking path planning method
CN112989553A (en) * 2020-12-28 2021-06-18 郑州大学 Construction and application of a CEBs (connected electric buses) speed planning model based on battery capacity loss control
WO2021197246A1 (en) * 2020-03-31 2021-10-07 长安大学 V2x-based motorcade cooperative braking method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109484407A (en) * 2018-11-14 2019-03-19 北京科技大学 An adaptive car-following method for electric-vehicle assisted driving
WO2021197246A1 (en) * 2020-03-31 2021-10-07 长安大学 V2x-based motorcade cooperative braking method and system
CN112046484A (en) * 2020-09-21 2020-12-08 吉林大学 Q learning-based vehicle lane-changing overtaking path planning method
CN112989553A (en) * 2020-12-28 2021-06-18 郑州大学 Construction and application of a CEBs (connected electric buses) speed planning model based on battery capacity loss control

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李文昌; 郭景华; 王进: "Adaptive fuzzy sliding-mode control of the longitudinal motion of intelligent electric vehicles under a hierarchical architecture", Journal of Xiamen University (Natural Science), no. 03, 28 May 2019 (2019-05-28) *

Also Published As

Publication number Publication date
CN114954455B (en) 2024-07-02

Similar Documents

Publication Publication Date Title
CN112896161B (en) An ecological adaptive cruise control system for electric vehicles based on reinforcement learning
CN109703375B (en) Coordinated recovery control method of regenerative braking energy for electric vehicles
CN105416077B (en) The EMS and management method of a kind of electric automobile
Biao et al. Regenerative braking control strategy of electric vehicles based on braking stability requirements
Zhuang et al. Integrated energy-oriented cruising control of electric vehicle on highway with varying slopes considering battery aging
CN113561793B (en) A dynamically constrained energy management strategy for smart fuel cell vehicles
CN109977449B (en) Hybrid dynamic modeling and optimizing control method for intelligent automobile longitudinal dynamics system
CN115027290A (en) Hybrid electric vehicle following energy management method based on multi-objective optimization
CN105667501B (en) The energy distributing method of motor vehicle driven by mixed power with track optimizing function
CN106055830A (en) PHEV (Plug-in Hybrid Electric Vehicle) control threshold parameter optimization method based on dynamic programming
Zhang et al. Powertrain design and energy management of a novel coaxial series-parallel plug-in hybrid electric vehicle
Tong et al. Speed planning for connected electric buses based on battery capacity loss
Ye et al. A fast Q-learning energy management strategy for battery/supercapacitor electric vehicles considering energy saving and battery aging
Zhang et al. Deep reinforcement learning based multi-objective energy management strategy for a plug-in hybrid electric bus considering driving style recognition
CN105620310A (en) Three-motor hybrid truck and power system parameter matching method
CN110356396B (en) Method for instantaneously optimizing speed of electric vehicle by considering road gradient
CN114954455B (en) A method for controlling electric vehicle following vehicle based on multi-step reinforcement learning
CN114103711B (en) Control method, system, device and storage medium for orderly charging of charging load
Zhang et al. Multi-objective optimization for pure electric vehicle during a car-following process
Li et al. Study on regenerative braking control strategy for extended range electric vehicles
Hong et al. Development of a mathematical model of a train in the energy point of view for the international conference on control, automation and systems 2007 (ICCAS 2007)
CN116424317A (en) Electric automobile economic self-adaptive cruise control method based on multi-step DDQN
CN111176140B (en) An integrated control method for electric vehicle motion-transmission-energy system
CN112659922B (en) Hybrid power rail vehicle and direct current bus voltage control method and system thereof
Bravo et al. The influences of energy storage and energy management strategies on fuel consumption of a fuel cell hybrid vehicle

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant