CN114852043A - HEV energy management method and system based on layered return TD3 - Google Patents
- Publication number
- CN114852043A CN114852043A CN202210298825.0A CN202210298825A CN114852043A CN 114852043 A CN114852043 A CN 114852043A CN 202210298825 A CN202210298825 A CN 202210298825A CN 114852043 A CN114852043 A CN 114852043A
- Authority
- CN
- China
- Prior art keywords
- energy management
- return
- hev
- layered
- reward
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W20/11—Controlling the power contribution of each of the prime movers to meet required power demand using model predictive control [MPC] strategies, i.e. control methods based on models predicting performance
- B60W20/15—Control strategies specially adapted for achieving a particular effect
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B60W50/06—Improving the dynamic response of the control system, e.g. improving the speed of regulation or avoiding hunting or overshoot
- B60W2050/0062—Adapting control system settings
- B60W2050/0075—Automatic parameter input, automatic initialising or calibrating means
- B60W2050/0095—Automatic control mode change
- B60W2050/065—Improving the dynamic response of the control system by reducing the computational load on the digital processor of the control computer
- B60W2510/0208—Clutch engagement state, e.g. engaged or disengaged
- B60W2510/0657—Engine torque
- B60W2510/244—Charge state (energy storage means for electrical energy)
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/84—Data processing systems or methods, management, administration
Abstract
The invention belongs to the technical field of hybrid electric vehicle energy management, and discloses an HEV energy management method and system based on the layered reward TD3. In combination with a parallel hybrid vehicle model, the invention selects the state space signal and action space signal of the HEV energy management strategy and adopts a hierarchical reward structure comprising two reward functions and four adjusting layers in total. The control strategy can therefore be adjusted in a targeted manner according to the different driving states of the vehicle, which reduces unnecessary repeated exploration and improves the overall performance of the energy management strategy.
Description
Technical Field
The invention belongs to the technical field of hybrid electric vehicle energy management, and particularly relates to an HEV energy management method and system based on the layered reward TD3.
Background
Among all environment-friendly vehicles, the hybrid electric vehicle (HEV) offers a longer driving range than a pure electric vehicle, and has lower fuel consumption and is more environment-friendly than a conventional fuel-powered vehicle. However, the energy management of a hybrid vehicle is far more complex than that of conventional fuel-powered and pure electric vehicles. The energy management strategy (EMS) of the hybrid vehicle has therefore long been a research focus in the automotive field.
Existing hybrid vehicle energy management strategies fall into three main categories: rule-based, optimization-based, and learning-based strategies. Rule-based energy management strategies are easy to implement, but it is difficult to formulate reasonable rules for very complex operating conditions. Optimization-based energy management strategies comprise global optimization strategies and real-time optimization strategies. Typical global optimization algorithms are computationally expensive, are usually run offline, and mostly serve as benchmarks for evaluating the effectiveness of other online EMSs. Real-time optimization strategies based on Pontryagin's minimum principle optimize efficiently, but the co-state is difficult to obtain and the computational burden is relatively large. Real-time optimization strategies such as the equivalent consumption minimization strategy have good real-time characteristics, but the historical road information used to compute the equivalent fuel consumption often fails to represent future driving conditions, so the robustness of the algorithm is poor. The key to the success of real-time optimization strategies such as model predictive control lies in fast prediction and fast optimization, which require road conditions to be predicted in advance and therefore depend to a great extent on models with superior performance. Compared with traditional rule-based strategies, energy management strategies based on the reinforcement learning algorithm Q-learning can greatly improve the fuel economy of the vehicle, but suffer from the curse of dimensionality. Deep Deterministic Policy Gradient (DDPG) strategies can be trained in environments with continuous or discrete state spaces and continuous action spaces, but suffer from accumulated bias and suboptimal policies caused by overestimation of the value function.
Although the Twin Delayed Deep Deterministic Policy Gradient (TD3) strategy can compensate for the overestimation problem of DDPG, existing TD3-based energy management methods cannot adjust the control strategy in a targeted manner according to the different driving states of the vehicle, and the overall performance of the energy management strategy still needs to be improved.
Disclosure of Invention
The invention provides an HEV energy management method and system based on the layered reward TD3, solving the problem that prior-art TD3-based energy management schemes cannot adjust the control strategy in a targeted manner according to the different driving states of the vehicle.
The invention provides an HEV energy management method based on a layered reward TD3, which comprises the following steps:
establishing a parallel hybrid vehicle model;
selecting a state space signal and an action space signal of an HEV energy management strategy by combining the parallel hybrid vehicle model;
combining the parallel hybrid vehicle model to construct a hierarchical reward function; the hierarchical reward function comprises a first reward function and a second reward function, the first or the second reward function is activated according to an activation condition, and each of the two reward functions is further divided into two different adjusting layers according to the range of the battery state of charge;
constructing an HEV energy management learning network based on the layered reward TD3 from the state space signal, the action space signal and the hierarchical reward function;
training the HEV energy management learning network based on the layered reward TD3, and executing the energy management strategy through the trained network.
Preferably, the operating modes of the vehicle in the parallel hybrid vehicle model include an electric-only mode, a neutral mode and a parallel mode.
Preferably, the state space signal is S = (v, SOC, m_f, cs), and the action space signal is A = {T_eng | T_eng ∈ [-250, 841]}, where v represents the traveling speed of the vehicle, SOC represents the battery state of charge, m_f represents the fuel consumption rate of the engine, cs represents the state of the vehicle clutch (cs = 0: clutch open; cs = 1: clutch closed), and T_eng represents the engine output torque.
Preferably, the hierarchical reward function is activated as follows: the clutch state is judged first; if the clutch is open, the first reward function is activated, and if the clutch is closed, the instantaneous vehicle speed is judged; if the instantaneous vehicle speed is zero, the first reward function is activated, and if it is non-zero, the second reward function is activated;
the first reward function R_soc is expressed as:
the second reward function R_com is expressed as:
wherein the first adjusting layer L1 corresponds to SOC(t) > 0.8 or SOC(t) < 0.3, and the second adjusting layer L2 corresponds to 0.3 ≤ SOC(t) ≤ 0.8; m_f represents the actual value of the specific fuel consumption of the engine; SOC_ref represents the reference value of the battery state of charge; SOC(t) represents the actual value of the battery state of charge; pen represents a constant penalty factor; δ_1 and δ_2 are two weight factors that balance the influence of the fuel consumption rate and of the change in battery state of charge on the fuel consumption of the vehicle; ω_1, ω_2 and ω_3 are three constants that ensure that the values of the reward functions of all adjusting layers are of the same order of magnitude.
Preferably, parameters and observed values influencing energy management during simulated driving of the vehicle under standard operating conditions are obtained, and the HEV energy management learning network based on the layered reward TD3 is trained on the parameters and observed values obtained under the standard operating conditions.
Preferably, after the trained HEV energy management learning network based on the hierarchical reward TD3 is obtained, the method further includes: acquiring parameters and observation values influencing energy management in actual running of the vehicle, verifying the trained HEV energy management learning network based on the layered return TD3 based on the parameters and the observation values under actual running, and executing an energy management strategy through the trained and verified HEV energy management learning network based on the layered return TD 3.
In another aspect, the present invention provides a HEV energy management system based on a hierarchical reward TD3, including: a processor and a memory; the memory stores a control program that, when executed by the processor, is configured to implement the HEV energy management method based on the hierarchical reward TD3 described above.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
in the invention, a state space signal and an action space signal of the HEV energy management strategy are selected by combining a parallel hybrid vehicle model, a layered return structure is adopted, the layered return structure comprises two return functions, and four adjusting layers are counted, so that the control strategy can be adjusted in a pertinence manner according to different driving states of the vehicle, unnecessary repeated exploration behaviors are reduced, and the overall performance of the energy management strategy is improved. The invention uses the hierarchical rewarding double-delay depth certainty strategy gradient algorithm, not only solves the dimension disaster problem of the discrete action space depth reinforcement learning energy management strategy and the depth certainty strategy gradient overestimation problem, but also can improve the optimality of the energy management strategy because the hierarchical rewarding structure with four adjusting layers can pertinently adjust the control strategy according to the difference of the working condition and the vehicle running mode.
Drawings
FIG. 1 is a schematic diagram of a hybrid electric vehicle according to an HEV energy management method based on a layered reward TD3 according to embodiment 1 of the present invention;
fig. 2 is a schematic diagram of a layered reward structure in an HEV energy management method based on the layered reward TD3 according to embodiment 1 of the present invention;
fig. 3 is a diagram of the basic architecture of the deep reinforcement learning TD3 agent in the HEV energy management method based on the hierarchical reward TD3 according to embodiment 1 of the present invention;
FIG. 4 is a standard condition speed variation curve;
fig. 5 is an actual road speed change curve.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example 1:
step 1: and establishing a parallel hybrid vehicle model.
Specifically, a parallel hybrid vehicle model can be established through MATLAB/Simulink.
In the established parallel hybrid vehicle model, the engine and the motor are connected in parallel, and the engine can be coupled to or decoupled from the wheels through a clutch. The vehicle operates mainly in an electric-only mode, a neutral mode and a parallel mode. These three operating modes depend on the state of the clutch and the gear, as shown in FIG. 1.
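The mode logic just described can be captured in a few lines. The mapping below is a sketch inferred from the clutch-state and vehicle-speed switching described for the reward hierarchy (FIG. 2); the standstill-means-neutral rule and the omission of the gear signal are illustrative assumptions, not the patent's exact rule.

```python
def operating_mode(clutch_closed: bool, vehicle_speed: float) -> str:
    """Illustrative mode selection for the parallel HEV model.

    Assumption: clutch open -> electric-only mode (engine decoupled);
    clutch closed at standstill -> neutral mode; clutch closed while
    moving -> parallel mode. The patent additionally conditions on the
    selected gear, which is omitted here for brevity.
    """
    if not clutch_closed:
        return "electric-only"
    return "neutral" if vehicle_speed == 0.0 else "parallel"
```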
The traction required for the vehicle to travel must be provided by the vehicle powertrain and can be calculated from the vehicle dynamics equation shown in equation (1):
F_t = F_f + F_i + F_ω + F_j = mgf·cos α + mg·sin α + (1/2)·C_D·A·ρ·v² + δ·m·a (1)
wherein F_t is the driving force for vehicle running, F_f is the rolling resistance, F_i is the grade resistance, F_ω is the air resistance, F_j is the acceleration resistance, m is the vehicle mass, g is the gravitational acceleration, f is the rolling resistance coefficient, α is the road gradient, ρ is the air density, A is the frontal area of the vehicle, C_D is the air resistance coefficient, v is the vehicle running speed, δ is the rotating-mass conversion coefficient, and a is the vehicle running acceleration.
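Under the term definitions above, the standard longitudinal force balance can be evaluated directly. In the sketch below the default parameter values (mass, frontal area, drag coefficient, etc.) are illustrative placeholders in SI units, not values from the patent.

```python
import math

def traction_force(m, v, a, alpha=0.0, f=0.015, rho=1.225,
                   A=2.2, C_D=0.3, delta=1.05, g=9.81):
    """Required traction from the longitudinal dynamics of eq. (1):
    F_t = F_f + F_i + F_w + F_j, with
      F_f = m*g*f*cos(alpha)    rolling resistance
      F_i = m*g*sin(alpha)      grade resistance
      F_w = 0.5*rho*C_D*A*v**2  aerodynamic drag
      F_j = delta*m*a           acceleration (rotating-mass) resistance
    """
    F_f = m * g * f * math.cos(alpha)
    F_i = m * g * math.sin(alpha)
    F_w = 0.5 * rho * C_D * A * v ** 2
    F_j = delta * m * a
    return F_f + F_i + F_w + F_j
```

On level ground at constant speed only the rolling and aerodynamic terms remain, which is a quick sanity check on the implementation.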
Step 2: selecting the state space signal and action space signal of the HEV energy management strategy in combination with the parallel hybrid vehicle model.
Hybrid vehicle energy management strategies aim to reduce fuel consumption while maintaining the battery SOC within a reasonable interval. According to this control objective, the state space signal of the HEV energy management strategy is selected as S = (v, SOC, m_f, cs), where v represents the vehicle running speed, SOC represents the battery state of charge, m_f is the engine specific fuel consumption, and cs is a Boolean value indicating the state of the vehicle clutch (0: clutch open; 1: clutch closed). The action space signal is selected as the engine output torque T_eng, with A = {T_eng | T_eng ∈ [-250, 841]}, and the corresponding state information is acquired through sensors.
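As a sketch, the state tuple S = (v, SOC, m_f, cs) and the bounded torque action can be represented as follows; the class and function names are illustrative conveniences, not identifiers from the patent.

```python
from dataclasses import dataclass

@dataclass
class EMSState:
    """Illustrative container for the EMS state S = (v, SOC, m_f, cs)."""
    v: float    # vehicle running speed
    soc: float  # battery state of charge, 0..1
    m_f: float  # instantaneous engine fuel-consumption rate
    cs: int     # clutch state: 0 = open, 1 = closed

# Torque bounds of the action A = {T_eng | T_eng in [-250, 841]}.
T_ENG_MIN, T_ENG_MAX = -250.0, 841.0

def clip_action(t_eng: float) -> float:
    """Project a raw actor output onto the admissible torque range."""
    return max(T_ENG_MIN, min(T_ENG_MAX, t_eng))
```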
Step 3: designing the hierarchical reward structure and formulating the reward functions.
The hierarchical reward structure is very important for the TD3 energy management strategy. A well-designed reward structure not only makes full use of the environment feedback information, but also reduces unnecessary repeated exploration, so that the agent can interact with the environment more quickly and deeply, the learning process is accelerated, and the overall performance of the energy management strategy is improved.
The invention constructs a hierarchical reward function in combination with the parallel hybrid vehicle model; the hierarchical reward function comprises a first reward function and a second reward function, the first or the second reward function is activated according to an activation condition, and each of the two reward functions is further divided into two different adjusting layers according to the range of the battery state of charge.
Specifically, as described in step 1, the vehicle operates mainly in the electric-only mode, the neutral mode and the parallel mode. When the vehicle runs in the electric-only mode or the neutral mode, the engine is either decoupled from the wheels (electric-only mode) or connected to the wheels through the clutch while its rotating speed can change freely (neutral mode); the main energy consumption of the vehicle then comes from the battery. A reward function R_soc based on the battery SOC is therefore designed, reflecting that keeping the SOC within a reasonable interval is the most important objective in these modes. When the vehicle operates in the parallel mode, the engine and the motor together provide the power required for driving, so a composite reward function R_com is designed to coordinate the two objectives of reducing fuel consumption and maintaining the battery SOC within a reasonable range, thereby achieving minimum energy consumption. The two reward functions are each divided into two different adjusting layers according to the SOC range. The structure of the hierarchical reward function is shown in FIG. 2; the hierarchy activates R_soc or R_com according to the clutch state and the instantaneous vehicle speed V_spd.
Specifically, referring to FIG. 2, the hierarchical reward function is activated as follows: the clutch state is judged first; if the clutch is open, the first reward function is activated, and if the clutch is closed, the instantaneous vehicle speed is judged; if the instantaneous vehicle speed is zero, the first reward function is activated, and if it is non-zero, the second reward function is activated.
The first reward function R_soc is expressed as:
The second reward function R_com is expressed as:
wherein the first adjusting layer L1 corresponds to SOC(t) > 0.8 or SOC(t) < 0.3, and the second adjusting layer L2 corresponds to 0.3 ≤ SOC(t) ≤ 0.8; m_f represents the actual value of the specific fuel consumption of the engine; SOC_ref represents the reference value of the battery state of charge; SOC(t) represents the actual value of the battery state of charge; pen represents a constant penalty factor; δ_1 and δ_2 are two weight factors that balance the influence of the fuel consumption rate and of the change in battery state of charge on the fuel consumption of the vehicle, and the larger these two values, the more the energy management strategy emphasizes battery protection; ω_1, ω_2 and ω_3 are three constants that ensure that the values of the reward functions of all adjusting layers are of the same order of magnitude.
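The exact expressions of R_soc and R_com appear only as images in the source, so the functional forms below (absolute-deviation terms combined with the pen, δ and ω parameters) are illustrative assumptions; only the activation logic of FIG. 2 and the L1/L2 SOC thresholds follow the text.

```python
SOC_LOW, SOC_HIGH = 0.3, 0.8  # boundary between adjusting layers L1 and L2

def r_soc(soc, soc_ref=0.6, pen=10.0, omega1=1.0):
    """Illustrative SOC-keeping reward R_soc (exact form not in the text).
    Layer L1 (SOC outside [0.3, 0.8]) adds the constant penalty pen;
    layer L2 only penalises deviation from SOC_ref."""
    r = -omega1 * abs(soc - soc_ref)
    if soc < SOC_LOW or soc > SOC_HIGH:
        r -= pen  # first adjusting layer L1
    return r

def r_com(soc, m_f, soc_ref=0.6, pen=10.0,
          delta1=1.0, delta2=1.0, omega2=1.0, omega3=1.0):
    """Illustrative composite reward R_com trading off the fuel rate m_f
    and the SOC deviation via delta1/delta2, scaled by omega2/omega3."""
    if soc < SOC_LOW or soc > SOC_HIGH:  # layer L1
        return -omega2 * (delta1 * m_f + delta2 * abs(soc - soc_ref)) - pen
    return -omega3 * (delta1 * m_f + delta2 * abs(soc - soc_ref))  # layer L2

def hierarchical_reward(clutch_closed, v, soc, m_f):
    """Activation logic of FIG. 2: clutch open or standstill -> R_soc,
    clutch closed while moving -> R_com."""
    if not clutch_closed or v == 0.0:
        return r_soc(soc)
    return r_com(soc, m_f)
```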
That is, the invention designs a reasonable and efficient hierarchical reward structure: the HEV energy management strategy of the layered reward TD3 selects, according to the received state signal S, the action A that maximizes the reward function R, so that the vehicle is controlled to run stably and efficiently while saving energy.
Step 4: constructing an HEV energy management learning network based on the layered reward TD3 from the state space signal, the action space signal and the hierarchical reward function.
A Critic network and an Actor network are built using deep neural networks, together forming the basic Actor-Critic framework of the twin-delayed deep deterministic policy gradient strategy. The basic architecture of the deep reinforcement learning TD3 agent is shown in FIG. 3. On this basis, the HEV energy management learning network based on the layered reward TD3 is constructed, the parameters of the Actor-Critic network are initialized, and the state data are normalized. The implementation details of the HEV energy management strategy with the hierarchical reward TD3 are shown in Table 1.
TABLE 1 hierarchical reward TD3 algorithm execution steps
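The table body is not reproduced in the text, but the distinctive TD3 updates the network implements, clipped double-Q targets with target-policy smoothing, can be sketched as follows; the function names and default noise parameters are illustrative, and the torque bounds reuse the patent's action range.

```python
import random

def td3_target(r, gamma, q1_next, q2_next):
    """Clipped double-Q learning: the bootstrap target takes the minimum
    of the two target critics, which counters DDPG's overestimation."""
    return r + gamma * min(q1_next, q2_next)

def smoothed_target_action(mu_next, sigma=0.2, noise_clip=0.5,
                           lo=-250.0, hi=841.0):
    """Target-policy smoothing: add clipped Gaussian noise to the target
    actor's output, then clip to the admissible torque range."""
    eps = max(-noise_clip, min(noise_clip, random.gauss(0.0, sigma)))
    return max(lo, min(hi, mu_next + eps))
```

The third TD3 ingredient, delayed actor and target-network updates, amounts to updating the actor only every d critic updates and is omitted here.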
Step 5: obtaining parameters and observed values influencing energy management during simulated driving of the vehicle under standard operating conditions, and training the HEV energy management learning network based on the layered reward TD3 on these data.
Parameters and observed values influencing energy management during simulated driving under standard operating conditions are obtained, and the trained deep reinforcement learning agent is obtained by training the learning network toward the objective of the HEV energy management strategy with the layered reward TD3.
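The training step can be sketched as a generic off-policy interaction loop. Every callable below (environment step/reset, action selection, agent update) is a hypothetical stand-in for the Simulink vehicle model and the TD3 agent; none of these names come from the patent.

```python
def train_episode(env_step, env_reset, select_action, agent_update,
                  buffer, max_steps=1000):
    """Generic off-policy training loop for the hierarchical-reward TD3
    agent: roll out one episode, store transitions in the replay buffer,
    and update the agent after each step."""
    s = env_reset()
    total_reward = 0.0
    for _ in range(max_steps):
        a = select_action(s)              # actor output plus exploration noise
        s2, r, done = env_step(a)         # vehicle model advances one step
        buffer.append((s, a, r, s2, done))
        agent_update(buffer)              # TD3 critic/actor updates
        total_reward += r
        s = s2
        if done:
            break
    return total_reward
```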
The learning network may be trained using three typical standard conditions, but is not limited thereto. The speed parameters for the three conditions are shown in FIG. 4, and the characteristics for each condition are shown in Table 2.
TABLE 2 Standard working conditions
Step 6: acquiring parameters and observation values influencing energy management in actual running of the vehicle, verifying the trained HEV energy management learning network based on the layered return TD3 based on the parameters and the observation values under actual running, and executing an energy management strategy through the trained and verified HEV energy management learning network based on the layered return TD 3.
For example, driving data of a real vehicle in the Wuhan urban area are collected, the actual road conditions are constructed and imported into the driver model, the trained layered-reward TD3 energy management strategy is verified, and the optimization performance of the energy management strategy is tested; the actual road speed profile is shown in FIG. 5.
Example 2:
embodiment 2 provides an HEV energy management system based on a hierarchical reward TD3, including: a processor and a memory; the memory stores a control program that, when executed by the processor, is operable to implement the HEV energy management method according to embodiment 1 based on the hierarchical reward TD 3.
The HEV energy management method and the system based on the layered return TD3 provided by the embodiment of the invention at least have the following technical effects:
(1) The invention designs a hierarchical reward structure with two reward functions and four adjusting layers in total. It can adjust the control strategy according to the different driving states of the vehicle, reduces unnecessary repeated exploration, ensures that the reward function is adjusted comprehensively for the different driving modes, and avoids wasting on-board computing resources, so that the agent interacts with the environment more quickly and deeply, the learning of the deep reinforcement learning agent is accelerated, and the overall performance of the energy management strategy is improved.
(2) The invention collects not only the energy consumption indicators m_f and battery SOC, but also the vehicle dynamic indicators vehicle speed v and clutch state cs, as deep reinforcement learning state space signals. The two reward functions are switched according to the vehicle speed and the clutch state, so that an accurate and efficient reward function can be selected for each driving mode of the vehicle. The invention can not only ensure optimal fuel economy during driving, but also ensure that the battery works in a suitable SOC interval, preventing damage from overcharge or overdischarge and prolonging the battery service life.
(3) The method adopts a hierarchical-reward twin-delayed deep deterministic policy gradient energy management strategy, which not only overcomes the curse of dimensionality of discrete-action-space deep reinforcement learning energy management strategies, but also alleviates the overestimation and training instability of the deep deterministic policy gradient.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to examples, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and such modifications shall be covered by the claims of the present invention.
Claims (7)
1. A HEV energy management method based on a layered reward TD3, characterized by comprising the following steps:
establishing a parallel hybrid vehicle model;
selecting a state space signal and an action space signal of an HEV energy management strategy by combining the parallel hybrid vehicle model;
combining the parallel hybrid vehicle model to construct a layered return function; the layered return function comprises a first return function and a second return function, the first return function or the second return function is activated according to an activation condition, and the first return function and the second return function are respectively divided into two different adjusting layers according to the range of the state of charge of the battery;
constructing an HEV energy management learning network based on the layered return TD3 based on the state space signal, the action space signal and the layered return function;
training the HEV energy management learning network based on the layered return TD3, and executing an energy management strategy through the trained HEV energy management learning network based on the layered return TD3.
2. The HEV energy management method based on the layered return TD3 according to claim 1, wherein the operating modes of the vehicle in the parallel hybrid vehicle model include an electric-only mode, a neutral mode, and a parallel mode.
3. The HEV energy management method based on the layered return TD3 according to claim 1, wherein the state space signal is S = (v, SOC, m_f, cs) and the action space signal is A = {T_eng | T_eng ∈ [-250, 841]}; where v represents the traveling speed of the vehicle, SOC represents the battery state of charge, and m_f represents the fuel consumption rate of the engine; cs represents the state of the vehicle clutch, cs = 0 indicating that the clutch is disengaged and cs = 1 indicating that the clutch is engaged; T_eng represents the engine output torque.
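A minimal sketch of how the state and action space signals of claim 3 might be represented in code. The container and function names (`HEVState`, `clip_action`) are illustrative assumptions; only the signal definitions and the torque bounds come from the claim.

```python
from dataclasses import dataclass

@dataclass
class HEVState:
    v: float     # vehicle speed
    soc: float   # battery state of charge, 0..1
    m_f: float   # instantaneous engine fuel consumption rate
    cs: int      # clutch state: 0 = disengaged, 1 = engaged

    def as_vector(self):
        """Flatten the state S = (v, SOC, m_f, cs) for the actor/critic networks."""
        return [self.v, self.soc, self.m_f, float(self.cs)]

# Engine torque bounds (per claim 3): T_eng in [-250, 841]
T_ENG_MIN, T_ENG_MAX = -250.0, 841.0

def clip_action(t_eng):
    """Project a raw actor output onto the admissible torque interval."""
    return max(T_ENG_MIN, min(T_ENG_MAX, t_eng))
```

A continuous scalar action like this is what lets TD3 avoid the discretized torque grid that a DQN-style method would need.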
4. The HEV energy management method based on the layered return TD3 according to claim 1, wherein the layered return function is activated as follows: the clutch state is judged; if the clutch is disengaged, the first return function is activated; if the clutch is engaged, the instantaneous vehicle speed is judged: if the instantaneous vehicle speed is zero, the first return function is activated, and if it is not zero, the second return function is activated;
the first return function R_soc is expressed as:
the second return function R_com is expressed as:
wherein the first adjusting layer L1 corresponds to SOC(t) > 0.8 or SOC(t) < 0.3, and the second adjusting layer L2 corresponds to 0.3 ≤ SOC(t) ≤ 0.8; m_f represents the actual value of the engine's specific fuel consumption; SOC_ref represents the reference value of the battery state of charge; SOC(t) represents the actual value of the battery state of charge; pen represents a constant penalty factor; δ_1 and δ_2 are two weight factors used to balance the influence of the fuel consumption rate and of changes in the battery state of charge on the vehicle's fuel consumption; ω_1, ω_2, and ω_3 are three constants used to ensure that the values of the return functions of all adjusting layers are of the same order of magnitude.
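The switching and layering logic of claim 4 can be sketched in Python. The concrete piecewise expressions for R_soc and R_com appear only as formula images in the original filing and are not reproduced here, so this sketch covers only the activation rule and the adjusting-layer selection; the function names are illustrative.

```python
# Adjusting-layer SOC bounds from claim 4
SOC_LOW, SOC_HIGH = 0.3, 0.8

def select_return_function(cs, v):
    """Activation rule of claim 4: clutch disengaged -> R_soc;
    clutch engaged and vehicle stationary -> R_soc; otherwise -> R_com."""
    if cs == 0:
        return "R_soc"
    return "R_soc" if v == 0.0 else "R_com"

def adjusting_layer(soc):
    """Layer L1 handles SOC outside [0.3, 0.8] (over/under-charge region,
    where the pen penalty applies); L2 handles SOC inside the interval."""
    return "L1" if (soc > SOC_HIGH or soc < SOC_LOW) else "L2"
```

The two-level dispatch (mode first, then SOC layer) is what lets a single agent receive a reward tailored both to the current driving mode and to the battery's charge region.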
5. The HEV energy management method based on the layered return TD3 according to claim 1, wherein parameters and observations affecting energy management are obtained while the vehicle drives a simulated standard operating cycle, and the HEV energy management learning network based on the layered return TD3 is trained on these parameters and observations under the standard operating conditions.
6. The HEV energy management method based on the layered return TD3 according to claim 1, further comprising, after obtaining the trained HEV energy management learning network based on the layered return TD3: acquiring parameters and observations affecting energy management during actual driving of the vehicle, verifying the trained HEV energy management learning network based on the layered return TD3 against these parameters and observations, and executing an energy management strategy through the trained and verified HEV energy management learning network based on the layered return TD3.
7. An HEV energy management system based on a layered return TD3, comprising a processor and a memory, wherein the memory stores a control program which, when executed by the processor, implements the HEV energy management method based on the layered return TD3 according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210298825.0A CN114852043B (en) | 2022-03-23 | 2022-03-23 | HEV energy management method and system based on layered return TD3 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114852043A true CN114852043A (en) | 2022-08-05 |
CN114852043B CN114852043B (en) | 2024-06-18 |
Family
ID=82629986
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210298825.0A Active CN114852043B (en) | 2022-03-23 | 2022-03-23 | HEV energy management method and system based on layered return TD3 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114852043B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200065665A1 (en) * | 2018-08-24 | 2020-02-27 | Ford Global Technologies, Llc | Vehicle adaptive learning |
CN112440974A (en) * | 2020-11-27 | 2021-03-05 | 武汉理工大学 | HEV energy management method based on distributed depth certainty strategy gradient |
CN112590774A (en) * | 2020-12-22 | 2021-04-02 | 同济大学 | Intelligent electric automobile drifting and warehousing control method based on deep reinforcement learning |
US20210213933A1 (en) * | 2018-06-15 | 2021-07-15 | The Regents Of The University Of California | Systems, apparatus and methods to improve plug-in hybrid electric vehicle energy performance by using v2c connectivity |
CN113246958A (en) * | 2021-06-11 | 2021-08-13 | 武汉理工大学 | TD 3-based multi-target HEV energy management method and system |
CN113501008A (en) * | 2021-08-12 | 2021-10-15 | 东风悦享科技有限公司 | Automatic driving behavior decision method based on reinforcement learning algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN114852043B (en) | 2024-06-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |