CN112896161A

CN112896161A - Electric automobile ecological self-adaptation cruise control system based on reinforcement learning

Info

Publication number: CN112896161A
Application number: CN202110171999.6A
Authority: CN
Inventors: 翟春杰; 杨建�; 杨祥宇; 颜成钢; 孙垚棋
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2021-02-08
Filing date: 2021-02-08
Publication date: 2021-06-04
Anticipated expiration: 2041-02-08
Also published as: CN112896161B

Abstract

The invention discloses a reinforcement-learning-based ecological adaptive cruise control system for electric vehicles, comprising an information acquisition module, a longitudinal dynamics module, an electric vehicle energy storage module, a control target module and a controller design module. The information acquisition module acquires the position and speed of the front vehicle through a radar and on-board sensors; the longitudinal dynamics module calculates acceleration, lumped resistance, actual vehicle distance, wheel torque and expected power; the electric vehicle energy storage module calculates the required power and the resistance under driving and braking conditions; the control target module ensures vehicle safety by constraining the distance between the vehicles, and improves energy-saving driving and prolongs battery life by setting optimization targets; the controller design module determines the state variables and the specific terms of the cost function used in the control process. The system maintains the following performance of the electric vehicle, ensures driving safety, improves energy economy and prolongs the service life of the battery.

Description

Electric automobile ecological self-adaptation cruise control system based on reinforcement learning

Technical Field

The invention belongs to the field of intelligent driver assistance for automobiles, and particularly relates to an ecological adaptive cruise control system for electric vehicles based on adaptive dynamic programming.

Background

Currently, as a major industrial energy consumer, the automobile industry faces growing pressure to conserve energy and reduce emissions. Because electric vehicles emit no pollutants while driving, they are an important direction for the future development of the automobile industry. How to apply intelligent driving technology to electric vehicles to further exploit their energy-saving potential is a key research topic for universities and vehicle manufacturers. An Advanced Driver Assistance System (ADAS) represents the initial development stage of intelligent driving technology; it uses various on-board sensors to acquire relevant environmental data automatically, controls the vehicle automatically, and improves driving comfort and active safety.

As an advanced intelligent driving assistance system, Adaptive Cruise Control (ACC) developed from early cruise control and is mainly used to control the longitudinal movement of a vehicle. The ACC system uses various on-board sensors to detect the relative position and speed of the vehicle in front, and automatically adjusts the speed of the vehicle according to a control strategy so as to keep an expected safe distance, which helps to improve traffic flow, reduce traffic accidents and provide a comfortable driving experience.

Although the ACC system can maintain a certain safe vehicle distance and reduce energy consumption by reducing air resistance, its energy-saving effect is not significant, especially for passenger vehicles with a small frontal area. In particular, when an electric vehicle is controlled by an ACC system based on a constant vehicle distance or a constant time headway, it follows the front vehicle closely at a fixed distance; if the speed of the front vehicle fluctuates strongly, the electric vehicle accelerates and decelerates frequently, which greatly shortens the service life of its battery and also causes energy consumption loss and driving discomfort.

The Action-Dependent Heuristic Dynamic Programming (ADHDP) framework used here follows Section 4.3, "ADHDP algorithm based on BP network and its implementation" (p. 118), of the book "Intelligent Optimization Control Based on Adaptive Dynamic Programming" by Lin Xiaofeng, Song Shaojian and Song Chunning.

Disclosure of Invention

The invention aims to provide a Reinforcement Learning (RL) based Eco-Adaptive Cruise Control (Eco-ACC) system for an electric vehicle, which can ensure the following performance of the electric vehicle, realize driving safety, enhance energy economy and prolong the service life of the battery.

An electric automobile ecological self-adaptive cruise control system based on reinforcement learning comprises an information acquisition module, a longitudinal dynamics module, an electric automobile energy storage module, a control target module and a controller design module.

The information acquisition module acquires position and speed information of a front vehicle through a radar and a vehicle-mounted information sensor;

the longitudinal dynamics module is used for calculating acceleration, lumped resistance, actual vehicle distance, wheel torque and expected power;

the electric vehicle energy storage module is used for calculating required power and resistance under the driving and braking conditions of the electric vehicle;

the control target module ensures the safety of the vehicle by restricting the distance between the vehicles; the energy-saving driving is improved and the service life of a battery is prolonged by setting an optimization target;

the controller design module is used for determining the state variables and the specific involved contents of the cost function in the control process.

A reinforcement-learning-based ecological adaptive cruise control method for an electric vehicle adopts an Action-Dependent Heuristic Dynamic Programming (ADHDP) framework and comprises the following steps:

1) determining the state variable x(t) through the information acquisition module and the controller design module, determining the utility function U(t) through the control target module, and initializing the relevant parameters;

2) inputting the state variable x(t) into the execution network to obtain the control variable u(t);

3) inputting the state variable x(t) and the control variable u(t) into the evaluation network to obtain the expected cost J(t);

4) setting the errors of the execution network and the evaluation network;

5) solving the state variable x(t+1) at the next moment through the longitudinal dynamics module and the electric vehicle energy storage module;

6) updating the weights of the execution network, and inputting the state variable x(t+1) into the execution network to obtain the control variable u(t+1);

7) updating the weights of the evaluation network, and obtaining the expected cost value through the evaluation network;

8) judging whether the evaluation network and the execution network have reached the maximum number of iterations or whether their errors have reached the allowable tolerance. If so, the solved control variable u(t+1) is taken as the optimal or suboptimal control variable; otherwise, the method returns to step 2).

The invention has the following beneficial effects:

The speed of the vehicle controlled by the system is basically consistent with that of the front vehicle, and its acceleration is smoother than under a traditional ACC system, so that passengers feel more comfortable; the actual distance between the vehicle controlled by the system and the front vehicle is always kept within a safe range, so that safety during driving is ensured; and the vehicle controlled by the system of the invention consumes less energy than a vehicle controlled by a traditional ACC system. The system can ensure the following performance of the electric vehicle, realize driving safety, enhance energy economy and prolong the service life of the battery.

Drawings

FIG. 1 is a vehicle following scenario;

FIG. 2 is a block diagram of the ADHDP architecture;

FIG. 3 is an ADHDP evaluation network structure;

FIG. 4 is an ADHDP execution network structure;

FIG. 5 is a flow chart based on the ADHDP control algorithm;

FIG. 6 is a comparison of UDDS driving cycle simulation results;

FIG. 7 is a comparison of MANHATTAN driving cycle simulation results;

FIG. 8 is a comparison of WLTC2 driving cycle simulation results.

Detailed Description

The system and method of the present invention are further described with reference to the accompanying drawings and examples.

The vehicle following scenario under study is shown in FIG. 1, in which the controlled electric vehicle and the vehicle in front of it are referred to as the main vehicle and the front vehicle, respectively; the actual vehicle distance between the main vehicle and the front vehicle is denoted by L; and the speeds of the main vehicle and the front vehicle are denoted by V_h and V_p, respectively. The specific contents of each module are as follows:

a longitudinal dynamics module:

The longitudinal dynamics model of the main vehicle is expressed as follows:

In the formula: s_h(t), v_h(t) and T_w(t) are the position, speed and wheel torque of the main vehicle, respectively; M, R, η_t and δ are the main vehicle mass, the effective rolling radius of the tire, the transmission efficiency and the rotational inertia coefficient, respectively; F_b(t) and F_r(t) are the braking force and the lumped resistance, respectively.

The lumped resistance F_r(t), consisting of aerodynamic resistance, rolling resistance and gravity resistance, is expressed as follows:

In the formula: φ_h(L(t)), C_d, μ_h, A_v and θ(s_h(t)) are the normalized vehicle resistance coefficient, the air resistance coefficient, the rolling resistance coefficient, the frontal (windward) area and the road gradient, respectively; g and ρ are the gravitational acceleration and the air density, respectively. Further, the distance L(t) between the main vehicle and the front vehicle can be expressed as

L(t) = s_p(t) - s_h(t) - d_car (3)

In the formula: d_car denotes the body length of the main vehicle, and s_p(t) denotes the position of the front vehicle.
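The quantities defined above can be illustrated with a minimal Python sketch. Only equation (3) is taken directly from the text; the forms used for the lumped resistance and for the acceleration are assumptions (a conventional point-mass model built from the symbols defined above), and all numeric parameter values are hypothetical.

```python
import math

def lumped_resistance(v_h, L, theta, M=1500.0, C_d=0.3, A_v=2.2,
                      mu_h=0.012, rho=1.2, g=9.81, phi_h=lambda L: 1.0):
    """Aerodynamic + rolling + gravity resistance in newtons (assumed form).

    phi_h(L) is the normalized resistance coefficient as a function of the
    vehicle distance; the default of 1.0 (no drafting effect) is a placeholder.
    """
    aero = 0.5 * rho * C_d * A_v * phi_h(L) * v_h ** 2
    rolling = mu_h * M * g * math.cos(theta)
    grade = M * g * math.sin(theta)
    return aero + rolling + grade

def spacing(s_p, s_h, d_car):
    """Equation (3): L(t) = s_p(t) - s_h(t) - d_car."""
    return s_p - s_h - d_car

def step(s_h, v_h, T_w, F_b, F_r, dt, M=1500.0, R=0.3, eta_t=0.95, delta=1.05):
    """One explicit-Euler step of the assumed model
    delta * M * dv/dt = eta_t * T_w / R - F_b - F_r."""
    accel = (eta_t * T_w / R - F_b - F_r) / (delta * M)
    # Speed is clamped at zero so the sketch never drives backwards.
    return s_h + v_h * dt, max(v_h + accel * dt, 0.0)
```

For example, `spacing(50.0, 20.0, 4.5)` gives a gap of 25.5 m between a front vehicle at 50 m and a 4.5 m long main vehicle at 20 m.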

The wheel torque is output to or input from the motor through the gear, and the motor torque T_m and the motor speed ω_m are expressed as follows:

In the formula: g_r is the fixed gear ratio of the main vehicle. The wheel speed ω_w(t) is calculated as follows:

Then, the input power of the motor inverter is given as follows:

In the formula: η_m(t) (0 < η_m(t) < 1) represents the efficiency of the motor inverter.
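A minimal sketch of the driveline relations described above is given here for orientation. The forms are assumptions: an ideal fixed-ratio gear (ω_m = g_r·ω_w, T_m = T_w / g_r, with ω_w = v_h / R) and an inverter power that divides by η_m when driving and multiplies by η_m when regenerating. The function names and numeric values are hypothetical.

```python
def motor_operating_point(T_w, v_h, R=0.3, g_r=8.0):
    """Map wheel torque and vehicle speed to motor torque and speed
    through an ideal fixed gear ratio g_r (gear losses ignored here)."""
    omega_w = v_h / R          # wheel speed in rad/s
    omega_m = g_r * omega_w    # motor speed in rad/s
    T_m = T_w / g_r            # motor torque in N*m
    return T_m, omega_m

def inverter_power(T_m, omega_m, eta_m):
    """Electrical power at the motor inverter input in watts (assumed form).

    Positive mechanical power (driving) costs more electrical power;
    negative mechanical power (regenerative braking) recovers less.
    """
    if not 0.0 < eta_m < 1.0:
        raise ValueError("eta_m must lie strictly between 0 and 1")
    mech = T_m * omega_m
    return mech / eta_m if mech >= 0.0 else mech * eta_m
```

With these assumptions, a wheel torque of 240 N·m at 15 m/s corresponds to roughly 30 N·m at 400 rad/s on the motor side.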

Electric automobile energy storage system module:

the variable symbols are defined as follows:

·P_bat(t): the output power of the battery pack at the time t;

·P_e(t): the required power of the electric automobile at the moment t;

·V_bat(t): open circuit voltage of the battery pack at time t;

·I_bat(t): the current of the battery pack at the time t;

·SoC_bat(t): the State of Charge (SOC) of the battery pack at time t;

·R_bat,disch(SoC_bat(t)): the discharge resistance of the battery pack at the time t;

·R_bat,ch(SoC_bat(t)): the charging resistance of the battery pack at the time t;

The discharge resistance R_bat,disch(SoC_bat(t)) and the charging resistance R_bat,ch(SoC_bat(t)) of the battery pack are expressed as follows:

(1) a driving mode:

(2) regenerative braking mode:

the state of charge SoC of the battery is as follows:
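The battery quantities listed above can be related by a simple internal-resistance model, sketched below. This is an assumption rather than the formulation of the invention: the pack is treated as an open-circuit voltage V_bat in series with R_bat,disch (driving) or R_bat,ch (regenerative braking), so that P_bat = V_bat·I_bat - R·I_bat², and the state of charge is updated by coulomb counting with a hypothetical capacity in ampere-hours.

```python
import math

def battery_current(P_bat, V_oc, R_disch, R_ch):
    """Solve P_bat = V_oc * I - R * I**2 for the pack current I.

    P_bat >= 0 means discharging (driving mode); P_bat < 0 means
    charging (regenerative braking mode).
    """
    R = R_disch if P_bat >= 0.0 else R_ch
    disc = V_oc ** 2 - 4.0 * R * P_bat
    if disc < 0.0:
        raise ValueError("requested power exceeds what the pack can deliver")
    # The smaller root is the physically meaningful operating point.
    return (V_oc - math.sqrt(disc)) / (2.0 * R)

def soc_step(soc, I_bat, dt, capacity_Ah):
    """Coulomb counting: SoC decreases while current flows out of the pack."""
    return soc - I_bat * dt / (3600.0 * capacity_Ah)
```

For instance, with V_oc = 400 V and R = 0.1 Ω, a demand of 39 kW draws 100 A, and holding that current for 36 s on a 100 Ah pack lowers the SoC by 0.01.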

a control target module:

(1) vehicle safety:

to ensure vehicle safety, constraints on the vehicle spacing are given as follows:

d_min(v_h(t)) ≤ L(t) ≤ d_max(v_h(t)) (11)

where d_min(v_h(t)) and d_max(v_h(t)) are the minimum and maximum safe vehicle distances allowed, respectively, and they are calculated as follows:
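Constraint (11) can be checked as in the following sketch. The speed-dependent bounds used here are an assumption: a standstill gap plus a time headway multiplied by speed, which is a common choice in ACC design. The parameters `d0`, `h_min` and `h_max` are hypothetical and do not come from the text.

```python
def safe_gap_bounds(v_h, d0=5.0, h_min=1.0, h_max=2.5):
    """Return (d_min, d_max) in metres for main-vehicle speed v_h in m/s.

    Assumed form: d = d0 + headway * v_h, with a short headway for the
    lower bound and a long headway for the upper bound.
    """
    return d0 + h_min * v_h, d0 + h_max * v_h

def gap_is_safe(L, v_h, **bounds):
    """Constraint (11): d_min(v_h) <= L <= d_max(v_h)."""
    d_min, d_max = safe_gap_bounds(v_h, **bounds)
    return d_min <= L <= d_max
```

At 20 m/s these hypothetical parameters give an admissible gap of 25 m to 55 m.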

(2) energy-saving driving:

in order to ensure the energy consumption economy of the vehicle during driving, the following optimization goals are given:

(3) extending battery service life:

in order to reduce the battery capacity loss of the vehicle during running, the following optimization objectives are given:

a controller design module:

(1) bandstop function with compensation factor:

To obtain the inter-vehicle error Δd(t), the intermediate error δd(t) of the vehicle with respect to the safe range is first obtained, as follows:

From this, the inter-vehicle error Δd(t) is obtained as:

where α > 0 and β ≥ 1; d_min, d_max ∈ R⁺ are the lower and upper band-stop limits, respectively; and c_f is a compensation factor.

In an optimization problem that minimizes a multi-objective cost function, once the parameters α, β and c_f are set correctly and the band-stop function is included as part of the cost function, the actual vehicle distance L(t) is confined to the range [d_min, d_max].
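The effect of a band-stop term in the cost function can be illustrated with the sketch below. The functional form is an assumption, not the expression of the invention: a flat-bottomed even-power penalty α·|z|^(2β), where z maps [d_min, d_max] onto [-1, 1]. The compensation factor c_f is left out because its placement cannot be recovered from the text.

```python
def band_stop(L, d_min, d_max, alpha=1.0, beta=2):
    """Flat-bottomed penalty on the vehicle distance L (assumed form).

    The value is 0 at the centre of the band, alpha at either limit,
    and grows steeply outside [d_min, d_max]; a larger beta flattens
    the bottom and sharpens the walls.
    """
    if alpha <= 0 or beta < 1 or d_max <= d_min:
        raise ValueError("require alpha > 0, beta >= 1 and d_min < d_max")
    z = (2.0 * L - (d_max + d_min)) / (d_max - d_min)
    return alpha * abs(z) ** (2 * beta)
```

Because the penalty stays small inside the band and rises quickly beyond it, a minimizer that includes this term is steered toward keeping L(t) between the two limits while remaining free to trade off the other objectives inside the band.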

(2) Demand power optimization problem based on reinforcement learning:

first, the basic variables are defined:

·x(t): state variables of the electric automobile at the moment t;

·F_b(t): braking force of the electric automobile at the time t;

·ω_w(t): the wheel rotating speed of the electric automobile at the time t;

·ω_m(t): the motor rotating speed of the electric automobile at the moment t;

·T_m(t): the motor torque of the electric automobile at the time t;

·T_m，max(t): the maximum motor torque allowed by the electric automobile at the time t;

·u(t): control input of the electric automobile at the time t;

·η_m(t): the motor efficiency of the electric automobile at the moment t;

·P_e(t): the required power of the electric automobile at the moment t;

The continuous dynamic state equation of the main vehicle at time t is as follows:

In the formula: x(t) = [Δv_h(t), Δd(t)]^T is the state variable of the main vehicle dynamics system. With the two types of variables defined, the objective cost function J in the optimization problem is as follows:

In the formula: U is the utility function, γ is the discount factor with 0 < γ ≤ 1, and J is the cost function of the state x(t), which depends on the initial time t and the initial state x(t). The goal of reinforcement learning is to select a control sequence u(t) that minimizes the cost function defined by equation (18). In addition, the optimization objective of the cost function is as follows:

U(t) = λ₁L₁ + λ₂L₂ + λ₃L₃ (20)

In the formula: L₁ accounts for the driving safety of the vehicle and aims to keep the vehicle distance between the minimum distance d_min and the maximum distance d_max; α, β and c_f are the parameters of the spacing band-stop function. L₂ improves the energy economy of the vehicle during driving. L₃ prolongs the service life of the battery of the electric vehicle.
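The weighted utility of equation (20) and its discounted accumulation can be sketched as follows. The discounted sum J = Σ γ^k·U(t+k) is the standard form used in adaptive dynamic programming and is assumed here; the weights and the sample values in the example are hypothetical.

```python
def utility(L1, L2, L3, lambdas=(1.0, 1.0, 1.0)):
    """Equation (20): U = lambda1*L1 + lambda2*L2 + lambda3*L3."""
    return lambdas[0] * L1 + lambdas[1] * L2 + lambdas[2] * L3

def discounted_cost(utilities, gamma=0.95):
    """Discounted sum of a finite utility sequence, J = sum_k gamma**k * U[k],
    accumulated backwards so that each step is J = U + gamma * J_next."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must satisfy 0 < gamma <= 1")
    J = 0.0
    for U in reversed(utilities):
        J = U + gamma * J
    return J
```

For example, three unit utilities with γ = 0.5 accumulate to 1 + 0.5 + 0.25 = 1.75, which shows how the discount factor reduces the weight of later steps.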

Taking the expected motor torque as the control variable, the control variable optimized by the ADHDP algorithm is given as follows:

u^*(·|t₀) = arg min J(x(·|t₀)) (21)

A flow chart of the overall control algorithm is shown in FIG. 5.

The specific operation flow of the present invention is shown in FIG. 3. First, the state variable x(t) = [Δv, BSF] is determined by the controller design module, where BSF denotes the band-stop function. Learning is then performed through the ADHDP framework to obtain the optimal control variable. The learning process of the ADHDP framework comprises the following steps:

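A reinforcement-learning-based ecological adaptive cruise control method for an electric vehicle adopts the Action-Dependent Heuristic Dynamic Programming (ADHDP) framework shown in FIG. 2; the structures of the evaluation network and the execution network, both based on a BP neural network, are shown in FIG. 3 and FIG. 4. The method comprises the following steps:

1) determining the state variable x(t) through the information acquisition module and the controller design module, determining the utility function U(t) through the control target module, and initializing the relevant parameters;

2) inputting the state variable x(t) into the execution network to obtain the control variable u(t);

3) inputting the state variable x(t) and the control variable u(t) into the evaluation network to obtain the expected cost J(t);

4) setting the errors of the execution network and the evaluation network;

5) solving the state variable x(t+1) at the next moment through the longitudinal dynamics module and the electric vehicle energy storage module;

6) updating the weights of the execution network, and inputting the state variable x(t+1) into the execution network to obtain the control variable u(t+1);

7) updating the weights of the evaluation network, and obtaining the expected cost value through the evaluation network;

8) judging whether the evaluation network and the execution network have reached the maximum number of iterations or whether their errors have reached the allowable tolerance. If so, the solved control variable u(t+1) is taken as the optimal or suboptimal control variable; otherwise, the method returns to step 2).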

At the beginning of the learning process, the parameters of the evaluation network and the execution network are initialized randomly. At each time step after the simulation begins, the weights of the evaluation network are iterated until the maximum number of iterations N_c is reached or the error E_c falls to the allowable tolerance T_c; after the iteration terminates, an approximate value function is obtained from the evaluation network. The weights of the execution network are then iterated until the maximum number of iterations N_a is reached or the error E_a falls to the allowable tolerance T_a; after the iteration terminates, the control input is obtained from the execution network, and the optimal required power P_e is calculated and applied to the main vehicle. The simulation parameters are shown in Table 1.
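The online learning procedure described above can be sketched in Python with NumPy as follows. This is a toy illustration under stated assumptions, not the controller of the invention: the execution (actor) and evaluation (critic) networks each have one tanh hidden layer; the critic error is taken as J(t) - [U(t) + γ·J(t+1)] and the actor error as J(t) itself, which is one common ADHDP convention; `plant` and `utility` are hypothetical stand-ins for the vehicle, battery and control target modules; and all hyperparameters are placeholders for the values of Table 1.

```python
import numpy as np

GAMMA, LR, H = 0.95, 0.01, 6                  # discount, learning rate, hidden units
N_C, N_A, T_C, T_A = 10, 10, 1e-4, 1e-4      # iteration caps and tolerances

def init(rng):
    """Randomly initialise both networks, as in the text."""
    u = lambda *shape: rng.uniform(-0.5, 0.5, shape)
    return {"a1": u(H, 2), "a2": u(H), "c1": u(H, 3), "c2": u(H)}

def actor(w, x):
    h = np.tanh(w["a1"] @ x)
    return np.tanh(w["a2"] @ h), h            # control u in [-1, 1]

def critic(w, x, u):
    z = np.append(x, u)
    h = np.tanh(w["c1"] @ z)
    return w["c2"] @ h, h, z                  # expected cost J

def plant(x, u):
    """Hypothetical stable linear plant standing in for the vehicle model."""
    return 0.9 * x + 0.1 * np.array([u, -u])

def utility(x, u):
    """Hypothetical quadratic utility standing in for equation (20)."""
    return 0.1 * (x @ x) + 0.01 * u * u

def train_actor(w, x):
    """Iterate until N_A is reached or E_a = 0.5 * J**2 falls below T_A."""
    for _ in range(N_A):
        u, ha = actor(w, x)
        J, hc, _ = critic(w, x, u)
        if 0.5 * J ** 2 < T_A:
            break
        dJ_du = (w["c2"] * (1 - hc ** 2)) @ w["c1"][:, -1]  # through the critic
        g_out = J * dJ_du * (1 - u ** 2)
        g_hid = g_out * w["a2"] * (1 - ha ** 2)
        w["a2"] -= LR * g_out * ha
        w["a1"] -= LR * np.outer(g_hid, x)

def train_critic(w, x, u, target):
    """Iterate until N_C is reached or E_c = 0.5 * e_c**2 falls below T_C."""
    for _ in range(N_C):
        J, h, z = critic(w, x, u)
        e_c = J - target
        if 0.5 * e_c ** 2 < T_C:
            break
        g_hid = e_c * w["c2"] * (1 - h ** 2)
        w["c2"] -= LR * e_c * h
        w["c1"] -= LR * np.outer(g_hid, z)

def run(steps=30, seed=0):
    rng = np.random.default_rng(seed)
    w, x = init(rng), np.array([1.0, -0.5])
    u = 0.0
    for _ in range(steps):
        u, _ = actor(w, x)                    # step 2: control from actor
        x_next = plant(x, u)                  # step 5: next state
        train_actor(w, x)                     # step 6: update actor
        u_next, _ = actor(w, x_next)
        J_next, _, _ = critic(w, x_next, u_next)
        train_critic(w, x, u, utility(x, u) + GAMMA * J_next)  # step 7
        x = x_next
    return x, float(u)
```

Calling `run()` returns the final state and the last control input of the toy loop. In an actual Eco-ACC setting the placeholder `plant` would be replaced by the longitudinal dynamics and energy storage models, and the control output would be mapped to the expected motor torque.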

TABLE 1 Online learning parameters

The proposed Eco-ACC control system was evaluated and tested using urban, highway and suburban driving cycles. The front vehicle follows the speed profile of the driving cycle, and the following vehicle tracks it using the traditional ACC system and the proposed Eco-ACC system in turn. The test data for the typical UDDS, MANHATTAN and WLTC2 driving cycles are shown in FIG. 6, FIG. 7 and FIG. 8, respectively; to make the simulation results easier to read, only the first 400 seconds are plotted. The test data for more driving cycles are shown in Tables 2 and 3.

TABLE 2 Battery capacity loss (%)

TABLE 3 Energy consumption loss (W·h)

Simulation results show that: the speed of the vehicle controlled by the Eco-ACC system is basically consistent with that of the front vehicle, and its acceleration is smoother than under the traditional ACC system, so that passengers feel more comfortable; the actual distance between the vehicle controlled by the Eco-ACC system and the front vehicle is always kept within a safe range, so that safety during driving is ensured; and the vehicle controlled by the Eco-ACC system consumes less energy than the vehicle controlled by the traditional ACC system, see Table 3.

Claims

1. An electric automobile ecological self-adaptive cruise control system based on reinforcement learning is characterized by comprising an information acquisition module, a longitudinal dynamics module, an electric automobile energy storage module, a control target module and a controller design module;

2. The electric vehicle ecological adaptive cruise control system based on reinforcement learning according to claim 1, wherein the longitudinal dynamics module comprises the following specific contents:

the controlled electric automobile and the vehicle in front of it are referred to as the main vehicle and the front vehicle, respectively; the actual vehicle distance between the main vehicle and the front vehicle is denoted by L; the speeds of the main vehicle and the front vehicle are denoted by V_h and V_p, respectively;

a longitudinal dynamics module:

the longitudinal dynamics model of the main vehicle is expressed as follows:

in the formula: s_h(t), v_h(t) and T_w(t) are the position, speed and wheel torque of the main vehicle, respectively; M, R, η_t and δ are the main vehicle mass, the effective rolling radius of the tire, the transmission efficiency and the rotational inertia coefficient, respectively; F_b(t) and F_r(t) are the braking force and the lumped resistance, respectively;

in the formula: φ_h(L(t)), C_d, μ_h, A_v and θ(s_h(t)) are the normalized vehicle resistance coefficient, the air resistance coefficient, the rolling resistance coefficient, the frontal (windward) area and the road gradient, respectively; g and ρ are the gravitational acceleration and the air density, respectively; further, the distance L(t) between the main vehicle and the front vehicle can be expressed as

L(t) = s_p(t) - s_h(t) - d_car (3)

in the formula: d_car denotes the body length of the main vehicle, and s_p(t) denotes the position of the front vehicle;

in the formula: g_r is the fixed gear ratio of the main vehicle; the wheel speed ω_w(t) is calculated as follows:

then, the input power of the motor inverter is given as follows:

3. The electric vehicle ecological adaptive cruise control system based on reinforcement learning of claim 2, characterized in that the electric vehicle energy storage system module comprises the following specific contents:

electric automobile energy storage system module:

the variable symbols are defined as follows:

·P_bat(t): the output power of the battery pack at the time t;

·P_e(t): the required power of the electric automobile at the moment t;

·V_bat(t): open circuit voltage of the battery pack at time t;

·I_bat(t): the current of the battery pack at the time t;

·SoC_bat(t): the State of Charge (SOC) of the battery pack at time t;

·R_bat,disch(SoC_bat(t)): the discharge resistance of the battery pack at the time t;

·R_bat,ch(SoC_bat(t)): the charging resistance of the battery pack at the time t;

the discharge resistance R_bat,disch(SoC_bat(t)) and the charging resistance R_bat,ch(SoC_bat(t)) of the battery pack are expressed as follows:

(1) a driving mode:

(2) regenerative braking mode:

the state of charge SoC of the battery is as follows:

4. the electric vehicle ecological adaptive cruise control system based on reinforcement learning according to claim 3, wherein the specific contents of the control target module are as follows:

a control target module:

(1) vehicle safety:

d_min(v_h(t))≤L(t)≤d_max(v_h(t)) (11)

where d_min(v_h(t)) and d_max(v_h(t)) are the minimum and maximum safe vehicle distances allowed, respectively, and they are calculated as follows:

(2) energy-saving driving:

(3) extending battery service life:

5. the electric vehicle ecological adaptive cruise control system based on reinforcement learning according to claim 4, wherein the controller design module comprises the following specific contents:

a controller design module:

(1) bandstop function with compensation factor:

from this, the inter-vehicle error Δd(t) is obtained as:

where α > 0 and β ≥ 1; d_min, d_max ∈ R⁺ are the lower and upper band-stop limits, respectively; and c_f is a compensation factor;

once the parameters α, β and c_f are set correctly and the band-stop function is included as part of the cost function, the actual vehicle distance L(t) is confined to the range [d_min, d_max];

(2) demand power optimization problem based on reinforcement learning:

first, the basic variables are defined:

·x(t): state variables of the electric automobile at the moment t;

·F_b(t): braking force of the electric automobile at the time t;

·ω_w(t): the wheel rotating speed of the electric automobile at the time t;

·ω_m(t): the motor rotating speed of the electric automobile at the moment t;

·T_m(t): the motor torque of the electric automobile at the time t;

·u(t): control input of the electric automobile at the time t;

·η_m(t): the motor efficiency of the electric automobile at the moment t;

·P_e(t): the required power of the electric automobile at the moment t;

the continuous dynamic state equation of the main vehicle at time t is as follows:

in the formula: x(t) = [Δv_h(t), Δd(t)]^T is the state variable of the main vehicle dynamics system; with the two types of variables defined, the objective cost function J in the optimization problem is as follows:

in the formula: U is the utility function, γ is the discount factor with 0 < γ ≤ 1, and J is the cost function of the state x(t), which depends on the initial time t and the initial state x(t); the goal of reinforcement learning is to select a control sequence u(t) that minimizes the cost function defined by equation (18); in addition, the optimization objective of the cost function is as follows:

U(t) = λ₁L₁ + λ₂L₂ + λ₃L₃ (20)

in the formula: L₁ accounts for the driving safety of the vehicle and aims to keep the vehicle distance between the minimum distance d_min and the maximum distance d_max; α, β and c_f are the parameters of the spacing band-stop function; L₂ improves the energy economy of the vehicle during driving; L₃ prolongs the service life of the battery of the electric vehicle;

u^*(·|t₀) = arg min J(x(·|t₀)) (21).

6. The electric vehicle ecological adaptive cruise control system based on reinforcement learning according to claim 5, wherein an electric vehicle ecological adaptive cruise control method based on reinforcement learning adopts an execution dependence heuristic dynamic programming framework, and comprises the following steps:

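1) determining the state variable x(t) through the information acquisition module and the controller design module, determining the utility function U(t) through the control target module, and initializing the relevant parameters;

2) inputting the state variable x(t) into the execution network to obtain the control variable u(t);

3) inputting the state variable x(t) and the control variable u(t) into the evaluation network to obtain the expected cost J(t);

4) setting the errors of the execution network and the evaluation network;

5) solving the state variable x(t+1) at the next moment through the longitudinal dynamics module and the electric vehicle energy storage module;

6) updating the weights of the execution network, and inputting the state variable x(t+1) into the execution network to obtain the control variable u(t+1);

7) updating the weights of the evaluation network, and obtaining the expected cost value through the evaluation network;

8) judging whether the evaluation network and the execution network have reached the maximum number of iterations or whether their errors have reached the allowable tolerance; if so, the solved control variable u(t+1) is taken as the optimal or suboptimal control variable; otherwise, the method returns to step 2).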