CN114852043A - HEV energy management method and system based on layered return TD3 - Google Patents

HEV energy management method and system based on layered return TD3

Info

Publication number
CN114852043A
CN114852043A CN202210298825.0A
Authority
CN
China
Prior art keywords
energy management
return
hev
layered
reward
Prior art date
Legal status
Granted
Application number
CN202210298825.0A
Other languages
Chinese (zh)
Other versions
CN114852043B (en)
Inventor
颜伏伍
王金海
杜常清
彭辅明
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202210298825.0A priority Critical patent/CN114852043B/en
Publication of CN114852043A publication Critical patent/CN114852043A/en
Application granted granted Critical
Publication of CN114852043B publication Critical patent/CN114852043B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W20/00Control systems specially adapted for hybrid vehicles
    • B60W20/10Controlling the power contribution of each of the prime movers to meet required power demand
    • B60W20/11Controlling the power contribution of each of the prime movers to meet required power demand using model predictive control [MPC] strategies, i.e. control methods based on models predicting performance
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W20/00Control systems specially adapted for hybrid vehicles
    • B60W20/10Controlling the power contribution of each of the prime movers to meet required power demand
    • B60W20/15Control strategies specially adapted for achieving a particular effect
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/06Improving the dynamic response of the control system, e.g. improving the speed of regulation or avoiding hunting or overshoot
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0062Adapting control system settings
    • B60W2050/0075Automatic parameter input, automatic initialising or calibrating means
    • B60W2050/0095Automatic control mode change
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/06Improving the dynamic response of the control system, e.g. improving the speed of regulation or avoiding hunting or overshoot
    • B60W2050/065Improving the dynamic response of the control system, e.g. improving the speed of regulation or avoiding hunting or overshoot by reducing the computational load on the digital processor of the control computer
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2510/00Input parameters relating to a particular sub-units
    • B60W2510/02Clutches
    • B60W2510/0208Clutch engagement state, e.g. engaged or disengaged
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2510/00Input parameters relating to a particular sub-units
    • B60W2510/06Combustion engines, Gas turbines
    • B60W2510/0657Engine torque
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2510/00Input parameters relating to a particular sub-units
    • B60W2510/24Energy storage means
    • B60W2510/242Energy storage means for electrical energy
    • B60W2510/244Charge state
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/80Technologies aiming to reduce greenhouse gasses emissions common to all road transportation technologies
    • Y02T10/84Data processing systems or methods, management, administration

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Human Computer Interaction (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)
  • Hybrid Electric Vehicles (AREA)

Abstract

The invention belongs to the technical field of hybrid electric vehicle energy management, and discloses an HEV energy management method and system based on layered return TD3. The invention selects the state space signal and the action space signal of the HEV energy management strategy in combination with a parallel hybrid vehicle model, and adopts a layered reward structure comprising two reward functions and four adjustment layers in total. The control strategy can thus be adjusted in a targeted manner according to the different driving states of the vehicle, unnecessary repeated exploration is reduced, and the overall performance of the energy management strategy is improved.

Description

HEV energy management method and system based on layered return TD3
Technical Field
The invention belongs to the technical field of hybrid electric vehicle energy management, and particularly relates to an HEV energy management method and system based on layered return TD3.
Background
Among environment-friendly vehicles, the hybrid electric vehicle (HEV) offers a longer driving range than a pure electric vehicle, and lower fuel consumption and emissions than a conventional fuel-powered vehicle. However, the energy management system of a hybrid vehicle is far more complex than that of conventional fuel-powered and pure electric vehicles. The energy management strategy (EMS) of the hybrid vehicle has therefore long been a research focus in the automotive field.
Existing hybrid vehicle energy management strategies can be divided into three main categories: rule-based strategies, optimization-based strategies, and learning-based strategies. While rule-based energy management strategies are easy to implement, it is difficult to develop reasonable rules for very complex operating conditions. Optimization-based energy management strategies include global optimization strategies and real-time optimization strategies. Typical global optimization algorithms are computationally expensive, are usually run offline, and often serve as benchmarks for evaluating the effectiveness of other online EMSs. Real-time optimization strategies based on Pontryagin's minimum principle have good optimization efficiency, but the costate is difficult to obtain and the computational burden is relatively large. Real-time optimization strategies such as the equivalent fuel consumption minimization strategy have good real-time characteristics, but the historical road information used to calculate the equivalent fuel consumption often cannot represent future driving conditions, so the robustness of the algorithm is poor. The key to the success of real-time optimization strategies such as model predictive control is rapid prediction and rapid optimization, which requires predicting road conditions in advance and thus depends to a great extent on a high-performance model. Compared with traditional rule-based strategies, energy management strategies based on the reinforcement learning algorithm Q-learning can greatly improve the fuel economy of the vehicle, but suffer from the curse of dimensionality. Deep Deterministic Policy Gradient (DDPG) strategies can be trained in environments with continuous or discrete state spaces and continuous action spaces, but suffer from estimation bias and suboptimal policies due to overestimation of the value function.
Although the Twin Delayed Deep Deterministic Policy Gradient (TD3) strategy can compensate for the overestimation problem of the DDPG strategy, existing TD3-based energy management methods cannot adjust the control strategy in a targeted manner according to the different driving states of the vehicle, and the overall performance of the energy management strategy needs further improvement.
Disclosure of Invention
The invention provides an HEV energy management method and system based on layered return TD3, solving the problem that existing TD3-based energy management schemes cannot adjust the control strategy in a targeted manner according to the different driving states of the vehicle.
The invention provides an HEV energy management method based on a layered reward TD3, which comprises the following steps:
establishing a parallel hybrid vehicle model;
selecting a state space signal and an action space signal of an HEV energy management strategy by combining the parallel hybrid vehicle model;
combining the parallel hybrid vehicle model to construct a hierarchical return function; the layered return function comprises a first return function and a second return function, the first return function or the second return function is activated according to an activation condition, and the first return function and the second return function are respectively divided into two different adjusting layers according to the range of the state of charge of the battery;
constructing an HEV energy management learning network based on the layered return TD3 based on the state space signal, the action space signal and the layered return function;
the HEV energy management learning network based on the hierarchical return TD3 is trained, and an energy management strategy is executed through the trained HEV energy management learning network based on the hierarchical return TD3.
Preferably, the operating modes of the vehicle in the parallel hybrid vehicle model include an electric-only mode, a neutral mode and a parallel mode.
Preferably, the state space signal is S = (v, SOC, m_f, cs), and the action space signal is A = (T_eng | T_eng ∈ [-250, 841]); where v represents the traveling speed of the vehicle, SOC represents the battery state of charge, and m_f represents the fuel consumption rate of the engine; cs represents the state of the vehicle clutch, cs = 0 indicating that the clutch is disengaged and cs = 1 indicating that the clutch is engaged; T_eng represents the engine output torque.
Preferably, the layered reward function is activated as follows: the clutch state is judged first; if the clutch is disengaged, the first reward function is activated; if the clutch is engaged, the instantaneous vehicle speed is judged: if the instantaneous vehicle speed is zero, the first reward function is activated, and if it is not zero, the second reward function is activated;
the first report-back function R soc Expressed as:
Figure BDA0003560173200000021
the second return function R com Expressed as:
Figure BDA0003560173200000022
wherein the first adjustment layer L1 corresponds to SOC(t) > 0.8 or SOC(t) < 0.3, and the second adjustment layer L2 corresponds to 0.3 ≤ SOC(t) ≤ 0.8; m_f represents the actual specific fuel consumption of the engine; SOC_ref represents the reference value of the battery state of charge; SOC(t) represents the actual value of the battery state of charge; pen represents a constant penalty factor; δ1 and δ2 are two weight factors used to balance the influence of the fuel consumption rate and of the change in battery state of charge on the fuel consumption of the vehicle; ω1, ω2 and ω3 are three constants used to ensure that the reward values of all adjustment layers are of the same order of magnitude.
Preferably, parameters and observations influencing energy management during simulated driving of the vehicle under standard working conditions are obtained, and the HEV energy management learning network based on the layered return TD3 is trained on these parameters and observations under the standard working conditions.
Preferably, after the trained HEV energy management learning network based on the hierarchical reward TD3 is obtained, the method further includes: acquiring parameters and observations influencing energy management during actual driving of the vehicle, verifying the trained HEV energy management learning network based on the layered return TD3 on these parameters and observations from actual driving, and executing the energy management strategy through the trained and verified network.
In another aspect, the present invention provides a HEV energy management system based on a hierarchical reward TD3, including: a processor and a memory; the memory stores a control program that, when executed by the processor, is configured to implement the HEV energy management method based on the hierarchical reward TD3 described above.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
in the invention, the state space signal and the action space signal of the HEV energy management strategy are selected in combination with a parallel hybrid vehicle model, and a layered reward structure comprising two reward functions and four adjustment layers in total is adopted, so that the control strategy can be adjusted in a targeted manner according to the different driving states of the vehicle, unnecessary repeated exploration is reduced, and the overall performance of the energy management strategy is improved. By using the hierarchical-reward twin delayed deep deterministic policy gradient algorithm, the invention not only overcomes the curse of dimensionality of discrete-action-space deep reinforcement learning energy management strategies and the overestimation problem of the deep deterministic policy gradient, but also improves the optimality of the energy management strategy, because the layered reward structure with four adjustment layers can adjust the control strategy in a targeted manner according to the operating conditions and the vehicle driving mode.
Drawings
FIG. 1 is a schematic diagram of the hybrid electric vehicle in the HEV energy management method based on layered reward TD3 according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the layered reward structure in the HEV energy management method based on layered reward TD3 according to embodiment 1 of the present invention;
FIG. 3 is a diagram of the basic architecture of the deep reinforcement learning TD3 agent in the HEV energy management method based on layered reward TD3 according to embodiment 1 of the present invention;
FIG. 4 is the speed variation curve of the standard working conditions;
FIG. 5 is the speed variation curve of the actual road.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example 1:
Embodiment 1 provides an HEV energy management method based on a hierarchical reward TD3, including the following steps.
step 1: and establishing a parallel hybrid vehicle model.
Specifically, a parallel hybrid vehicle model can be established through MATLAB/Simulink.
In the established parallel hybrid vehicle model, an engine and a motor are connected in parallel, and the engine can be combined with or separated from wheels through a clutch. The vehicle operates mainly in an electric-only mode, a neutral mode and a parallel mode. The above three vehicle operating modes depend on the state of the clutch and the gear, and are shown in FIG. 1.
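As a rough illustration of the mode logic above, the following sketch maps the clutch and gear state to an operating mode. The exact mapping in FIG. 1 is not reproduced in the patent text, so this assignment (clutch open → electric-only; clutch closed without an engaged gear → neutral; otherwise parallel) is an illustrative assumption.

```python
from enum import Enum

class Mode(Enum):
    ELECTRIC_ONLY = "electric-only"  # clutch open: the motor alone drives the wheels
    NEUTRAL = "neutral"              # clutch closed, no gear engaged: engine speed free
    PARALLEL = "parallel"            # clutch closed, gear engaged: engine + motor drive

def operating_mode(clutch_closed: bool, gear_engaged: bool) -> Mode:
    """Select the vehicle operating mode from clutch and gear state.

    This mapping is an assumption for illustration; the patent gives the
    mode table only in FIG. 1, which is not reproduced in the text.
    """
    if not clutch_closed:
        return Mode.ELECTRIC_ONLY
    return Mode.PARALLEL if gear_engaged else Mode.NEUTRAL
```
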
The traction required for the vehicle to travel must be provided by the vehicle powertrain and can be calculated from the vehicle dynamics equation as shown in equation (1):
F_t = F_f + F_i + F_ω + F_j = m·g·f·cosα + m·g·sinα + (1/2)·ρ·C_D·A·v² + δ·m·a    (1)
wherein F_t is the driving force for vehicle running, F_f is the rolling resistance during running, F_i is the grade resistance during running, F_ω is the air resistance during running, F_j is the acceleration resistance, m is the vehicle mass, g is the gravitational acceleration, f is the rolling resistance coefficient, α is the road gradient, ρ is the air density, A is the frontal area of the vehicle, C_D is the air resistance coefficient, v is the vehicle running speed, δ is the rotating-mass conversion coefficient, and a is the vehicle running acceleration.
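Equation (1) can be sketched as a small function. The default parameter values below (rolling resistance coefficient, air density, frontal area, drag coefficient, rotating-mass coefficient) are illustrative and not taken from the patent.

```python
import math

def traction_force(m, v, a, alpha=0.0, f=0.015, rho=1.225, A=2.2,
                   C_D=0.3, delta=1.1, g=9.81):
    """Traction force F_t per equation (1): sum of rolling, grade,
    aerodynamic and acceleration resistances. Defaults are illustrative."""
    F_f = m * g * f * math.cos(alpha)   # rolling resistance
    F_i = m * g * math.sin(alpha)       # grade resistance
    F_w = 0.5 * rho * C_D * A * v ** 2  # aerodynamic drag
    F_j = delta * m * a                 # acceleration (inertial) resistance
    return F_f + F_i + F_w + F_j
```

For example, a 1500 kg vehicle at 20 m/s accelerating at 0.5 m/s² on a flat road needs roughly 1.2 kN of traction under these illustrative coefficients.
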
Step 2: and selecting a state space signal and an action space signal of the HEV energy management strategy by combining the parallel hybrid vehicle model.
The hybrid vehicle energy management strategy aims to reduce fuel consumption and maintain the battery SOC within a reasonable interval. According to this control target, the state space signal of the HEV energy management strategy is selected as S = (v, SOC, m_f, cs), where v represents the vehicle running speed, SOC represents the battery state of charge, m_f is the engine specific fuel consumption, and cs is a Boolean value indicating the state of the vehicle clutch (0 for clutch open, 1 for clutch closed). The action space signal is selected as the engine output torque T_eng, A = (T_eng | T_eng ∈ [-250, 841]); the corresponding state information is acquired through sensors.
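The state and action spaces described above can be sketched as follows. The torque bounds [-250, 841] come from the patent; the container types and the clipping helper are illustrative assumptions.

```python
from dataclasses import dataclass

T_ENG_MIN, T_ENG_MAX = -250.0, 841.0  # action bounds from the patent

@dataclass
class HEVState:
    """State space signal S = (v, SOC, m_f, cs)."""
    v: float    # vehicle running speed
    soc: float  # battery state of charge, 0..1
    m_f: float  # engine specific fuel consumption
    cs: int     # clutch state: 0 open, 1 closed

def clip_action(t_eng: float) -> float:
    """Project a raw actor output onto the admissible torque interval."""
    return max(T_ENG_MIN, min(T_ENG_MAX, t_eng))
```
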
Step 3: design a layered reward structure and formulate the reward functions.
The hierarchical reward structure is very important for the TD3 energy management policy. A well-designed return structure not only can fully utilize environment feedback information, but also can reduce unnecessary repeated exploration behaviors, so that an agent can interact with the environment more quickly and deeply, the learning process is accelerated, and the overall performance of an energy management strategy is improved.
The invention combines the parallel hybrid vehicle model to construct a layered return function; the layered return function comprises a first return function and a second return function, the first return function or the second return function is activated according to an activation condition, and the first return function and the second return function are respectively divided into two different adjusting layers according to the range of the state of charge of the battery.
Specifically, as described in step 1, the vehicle operates mainly in the electric-only mode, the neutral mode and the parallel mode. When the vehicle runs in the electric-only mode or the neutral mode, the engine is either disconnected (electric-only mode) or connected to the wheels through the clutch while its rotating speed can change freely (neutral mode); the main energy consumption of the vehicle is then drawn from the battery, and therefore a reward function R_soc based on the battery SOC is designed, meaning that keeping the SOC value within a reasonable interval is the most important objective. Correspondingly, when the vehicle operates in the parallel mode, the engine and the motor together provide the power required for driving, and a comprehensive reward function R_com is designed to coordinate the two goals of reducing fuel consumption and maintaining the battery SOC within a reasonable range, so as to achieve minimum energy consumption. The two reward functions are each divided into two different adjustment layers according to the SOC range. The structure of the layered reward function is shown in FIG. 2; the layered reward structure activates R_soc or R_com based on the clutch state and the instantaneous vehicle speed V_spd.
Specifically, referring to FIG. 2, the layered reward function is activated as follows: the clutch state is judged first; if the clutch is disengaged, the first reward function is activated; if the clutch is engaged, the instantaneous vehicle speed is judged: if the instantaneous vehicle speed is zero, the first reward function is activated, and if it is not zero, the second reward function is activated.
The first reward function R_soc is expressed piecewise over the two adjustment layers (the expression is given as an image in the original publication).
the second return function R com Expressed as:
Figure BDA0003560173200000061
wherein the first adjustment layer L1 corresponds to SOC(t) > 0.8 or SOC(t) < 0.3, and the second adjustment layer L2 corresponds to 0.3 ≤ SOC(t) ≤ 0.8; m_f represents the actual specific fuel consumption of the engine; SOC_ref represents the reference value of the battery state of charge; SOC(t) represents the actual value of the battery state of charge; pen represents a constant penalty factor; δ1 and δ2 are two weight factors used to balance the influence of the fuel consumption rate and of the change in battery state of charge on the fuel consumption of the vehicle, and the larger these values, the more attention the energy management strategy pays to protecting the battery; ω1, ω2 and ω3 are three constants used to ensure that the reward values of all adjustment layers are of the same order of magnitude.
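The layered reward logic above can be sketched as below. The patent gives the exact piecewise expressions for R_soc and R_com only as images, so the quadratic SOC-tracking form and all coefficient values here (pen, δ1, δ2, ω1–ω3, SOC_ref = 0.6) are assumptions consistent with the parameter descriptions, not the patented formulas.

```python
SOC_LOW, SOC_HIGH, SOC_REF = 0.3, 0.8, 0.6  # layer boundaries from the patent; SOC_REF assumed

def r_soc(soc, pen=10.0, delta1=50.0, omega1=1.0):
    """First reward function (assumed form): quadratic penalty on SOC
    deviation, plus the constant penalty `pen` on layer L1
    (SOC outside [0.3, 0.8])."""
    r = -omega1 * delta1 * (SOC_REF - soc) ** 2
    if soc > SOC_HIGH or soc < SOC_LOW:  # adjustment layer L1
        r -= pen
    return r                             # adjustment layer L2 otherwise

def r_com(m_f, soc, pen=10.0, delta2=50.0, omega2=1.0, omega3=1.0):
    """Second reward function (assumed form): fuel rate plus SOC tracking,
    scaled per layer and penalized on L1."""
    base = m_f + delta2 * (SOC_REF - soc) ** 2
    if soc > SOC_HIGH or soc < SOC_LOW:  # adjustment layer L1
        return -omega2 * base - pen
    return -omega3 * base                # adjustment layer L2

def layered_reward(cs, v_spd, m_f, soc):
    """Activation logic from FIG. 2: clutch open -> R_soc; clutch closed
    and vehicle stationary -> R_soc; otherwise -> R_com."""
    if cs == 0 or v_spd == 0:
        return r_soc(soc)
    return r_com(m_f, soc)
```
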
That is, the invention designs a reasonable and efficient layered reward structure; the HEV energy management strategy based on the layered reward TD3 can select, according to the received state signal S, an action A that maximizes the reward function R, thereby controlling the vehicle to run economically, stably and efficiently.
Step 4: construct an HEV energy management learning network based on the layered return TD3 from the state space signal, the action space signal and the layered reward function.
A Critic network and an Actor network are built using deep neural networks, together forming the basic Actor-Critic network framework of the twin delayed deep deterministic policy gradient strategy (the basic architecture of the deep reinforcement learning TD3 agent is shown in FIG. 3), thereby constructing the HEV energy management learning network based on the layered return TD3; the parameters of the Actor-Critic network are initialized and the state data are normalized. The implementation details of the layered reward TD3 HEV energy management strategy are shown in Table 1.
TABLE 1: layered reward TD3 algorithm execution steps (table given as an image in the original publication)
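The TD3 agent structure of FIG. 3 (one actor, twin critics, target copies, clipped double-Q targets, target-policy smoothing, and delayed actor/target updates) can be sketched with tiny linear stand-ins for the deep networks. The hyperparameters and linear approximators below are illustrative only and do not reproduce the patent's Table 1.

```python
import copy
import numpy as np

rng = np.random.default_rng(0)

def soft_update(target, source, tau):
    target.W += tau * (source.W - target.W)  # Polyak averaging

class Linear:
    """Tiny linear map standing in for a deep Actor/Critic network."""
    def __init__(self, n_in, n_out):
        self.W = rng.normal(0.0, 0.1, (n_out, n_in))
    def __call__(self, x):
        return self.W @ x

class TD3Sketch:
    def __init__(self, s_dim, a_dim, gamma=0.99, tau=0.005,
                 policy_delay=2, sigma=0.2, clip=0.5, lr=1e-3):
        self.actor = Linear(s_dim, a_dim)
        self.q1, self.q2 = Linear(s_dim + a_dim, 1), Linear(s_dim + a_dim, 1)
        self.actor_t = copy.deepcopy(self.actor)
        self.q1_t, self.q2_t = copy.deepcopy(self.q1), copy.deepcopy(self.q2)
        self.gamma, self.tau, self.lr = gamma, tau, lr
        self.policy_delay, self.sigma, self.clip = policy_delay, sigma, clip
        self.step = 0

    def update(self, s, a, r, s2):
        # 1. target action with clipped smoothing noise (action-bound clipping omitted)
        eps = np.clip(rng.normal(0.0, self.sigma, self.actor.W.shape[0]),
                      -self.clip, self.clip)
        a2 = self.actor_t(s2) + eps
        # 2. clipped double-Q target: take the smaller of the two target critics
        x2 = np.concatenate([s2, a2])
        y = r + self.gamma * min(self.q1_t(x2)[0], self.q2_t(x2)[0])
        # 3. regress both critics toward y (one SGD step on squared error)
        x = np.concatenate([s, a])
        for q in (self.q1, self.q2):
            err = q(x)[0] - y
            q.W -= self.lr * 2.0 * err * x[None, :]
        # 4. delayed actor and target-network updates
        self.step += 1
        if self.step % self.policy_delay == 0:
            dq_da = self.q1.W[0, len(s):]                 # dQ1/da for a linear critic
            self.actor.W += self.lr * np.outer(dq_da, s)  # ascend Q1(s, actor(s))
            for t, src in ((self.actor_t, self.actor),
                           (self.q1_t, self.q1), (self.q2_t, self.q2)):
                soft_update(t, src, self.tau)
        return y
```
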
Step 5: obtain parameters and observations influencing energy management during simulated driving of the vehicle under standard working conditions, and train the HEV energy management learning network based on the layered return TD3 on these parameters and observations.
Parameters and observations influencing energy management during simulated driving of the vehicle under standard working conditions are obtained, and the learning network is trained toward the HEV energy management strategy target of the layered return TD3, yielding a trained deep reinforcement learning agent.
The learning network may be trained using three typical standard conditions, but is not limited thereto. The speed parameters for the three conditions are shown in FIG. 4, and the characteristics for each condition are shown in Table 2.
TABLE 2: standard working conditions (table given as an image in the original publication)
Step 6: acquire parameters and observations influencing energy management during actual driving of the vehicle, verify the trained HEV energy management learning network based on the layered return TD3 on these parameters and observations from actual driving, and execute the energy management strategy through the trained and verified network.
For example, driving data of a real vehicle in the Wuhan urban area are collected, the actual road conditions are constructed and imported into a driver model, and the trained layered return TD3 energy management strategy is verified to test its optimization performance; the actual road speed profile is shown in FIG. 5.
Example 2:
Embodiment 2 provides an HEV energy management system based on a hierarchical reward TD3, including: a processor and a memory; the memory stores a control program which, when executed by the processor, implements the HEV energy management method based on the hierarchical reward TD3 according to embodiment 1.
The HEV energy management method and the system based on the layered return TD3 provided by the embodiment of the invention at least have the following technical effects:
(1) The invention designs a layered reward structure with two reward functions and four adjustment layers in total, which can adjust the control strategy according to the different driving states of the vehicle, reduces unnecessary repeated exploration, ensures comprehensive adjustment of the reward functions for the different driving modes, avoids wasting on-board computing resources, enables the agent to interact with the environment more quickly and deeply, accelerates the learning of the deep reinforcement learning agent, and improves the overall performance of the energy management strategy.
(2) The invention collects not only the energy consumption indices m_f and battery SOC, but also the vehicle dynamic indices vehicle speed v and clutch state cs as deep reinforcement learning state space signals. The two reward functions are switched according to the vehicle speed and the clutch state, so that an accurate and efficient reward function can be selected for each driving mode of the vehicle. The invention can not only ensure optimal fuel economy during driving, but also keep the battery working within a suitable SOC interval, preventing damage from overcharge or overdischarge and prolonging battery life.
(3) The method adopts a hierarchical-reward twin delayed deep deterministic policy gradient energy management strategy, which not only overcomes the curse of dimensionality of discrete-action-space deep reinforcement learning energy management strategies, but also addresses the overestimation and unstable-training problems of the deep deterministic policy gradient.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit them; although the present invention has been described in detail with reference to examples, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, all of which should be covered by the claims of the present invention.

Claims (7)

1. An HEV energy management method based on a layered reward TD3, characterized by comprising the following steps:
establishing a parallel hybrid vehicle model;
selecting a state space signal and an action space signal of an HEV energy management strategy by combining the parallel hybrid vehicle model;
combining the parallel hybrid vehicle model to construct a layered return function; the layered return function comprises a first return function and a second return function, the first return function or the second return function is activated according to an activation condition, and the first return function and the second return function are respectively divided into two different adjusting layers according to the range of the state of charge of the battery;
constructing an HEV energy management learning network based on the layered return TD3 based on the state space signal, the action space signal and the layered return function;
the HEV energy management learning network based on the hierarchical return TD3 is trained, and an energy management strategy is executed through the trained HEV energy management learning network based on the hierarchical return TD3.
2. The layered-return TD3-based HEV energy management method according to claim 1, wherein the operating modes of the vehicle in the parallel hybrid vehicle model include an electric-only mode, a neutral mode, and a parallel mode.
3. The layered-return TD3-based HEV energy management method according to claim 1, wherein the state space signal is S = (v, SOC, m_f, cs) and the action space signal is A = {T_eng | T_eng ∈ [-250, 841]}; where v represents the travelling speed of the vehicle, SOC represents the battery state of charge, m_f represents the fuel consumption rate of the engine, cs represents the state of the vehicle clutch (cs = 0 indicates the clutch is disengaged, cs = 1 indicates the clutch is engaged), and T_eng represents the engine output torque.
4. The HEV energy management method based on the layered return TD3 according to claim 1, wherein the layered return function is activated as follows: the clutch state is judged; if the clutch is disengaged, the first return function is activated; if the clutch is engaged, the instantaneous vehicle speed is judged: if the instantaneous vehicle speed is zero, the first return function is activated, and if the instantaneous vehicle speed is not zero, the second return function is activated;
the first report-back function R soc Expressed as:
Figure FDA0003560173190000011
the second return function R_com is expressed as:
Figure FDA0003560173190000021
wherein the first adjustment layer L1 corresponds to SOC(t) > 0.8 or SOC(t) < 0.3, and the second adjustment layer L2 corresponds to 0.3 ≤ SOC(t) ≤ 0.8; m_f represents the actual value of the engine's fuel consumption rate; SOC_ref represents the reference value of the battery state of charge; SOC(t) represents the actual value of the battery state of charge; pen represents a constant penalty factor; δ_1 and δ_2 are two weight factors used to balance the influence of the fuel consumption rate and of the change in battery state of charge on the vehicle's fuel consumption; and ω_1, ω_2 and ω_3 are three constants used to ensure that the return-function values of all adjustment layers are of the same order of magnitude.
5. The HEV energy management method based on the layered return TD3 according to claim 1, wherein parameters and observations that affect energy management are acquired while the vehicle is driven in simulation under standard operating conditions, and the HEV energy management learning network based on the layered return TD3 is trained on these parameters and observations.
6. The HEV energy management method based on the layered return TD3 according to claim 1, wherein, after the trained HEV energy management learning network based on the layered return TD3 is obtained, the method further comprises: acquiring parameters and observations that affect energy management during actual driving of the vehicle, verifying the trained network on these parameters and observations, and executing the energy management strategy through the trained and verified HEV energy management learning network based on the layered return TD3.
7. An HEV energy management system based on a layered return TD3, comprising a processor and a memory, wherein the memory stores a control program which, when executed by the processor, implements the HEV energy management method based on a layered return TD3 according to any one of claims 1-6.
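The activation logic of the layered return function in claim 4 can be sketched as follows. The dispatch order (clutch state first, then instantaneous vehicle speed) and the L1/L2 SOC ranges follow the claims; however, the penalty expressions inside each layer are hypothetical stand-ins, since the patent's exact formulas appear only in the drawings (figures FDA…), and all parameter defaults here are illustrative.

```python
def layered_reward(clutch_closed, speed, soc, m_f,
                   soc_ref=0.6, pen=10.0,
                   delta1=1.0, delta2=350.0,
                   omega1=1.0, omega2=1.0, omega3=1.0):
    """Select between the two return functions per claim 4, then pick
    the adjustment layer (L1/L2) from the SOC range.  The expressions
    inside each layer are illustrative placeholders, not the patent's
    actual formulas."""
    def r_soc():
        # First return function R_soc: regulates SOC only.
        if soc > 0.8 or soc < 0.3:               # layer L1: SOC out of range
            return -omega1 * pen
        return -omega2 * (soc - soc_ref) ** 2    # layer L2: track SOC_ref

    def r_com():
        # Second return function R_com: fuel rate + SOC deviation.
        if soc > 0.8 or soc < 0.3:               # layer L1
            return -omega1 * pen - delta1 * m_f
        return -omega3 * (delta1 * m_f + delta2 * (soc - soc_ref) ** 2)

    # Activation condition of claim 4: clutch disengaged, or vehicle
    # stopped with clutch engaged -> R_soc; otherwise -> R_com.
    if not clutch_closed or speed == 0.0:
        return r_soc()
    return r_com()
```

The ω constants keep the two layers' reward magnitudes comparable so the agent does not over-prioritize one layer during training.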
CN202210298825.0A 2022-03-23 2022-03-23 HEV energy management method and system based on layered return TD3 Active CN114852043B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210298825.0A CN114852043B (en) 2022-03-23 2022-03-23 HEV energy management method and system based on layered return TD3


Publications (2)

Publication Number Publication Date
CN114852043A true CN114852043A (en) 2022-08-05
CN114852043B CN114852043B (en) 2024-06-18

Family

ID=82629986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210298825.0A Active CN114852043B (en) 2022-03-23 2022-03-23 HEV energy management method and system based on layered return TD3

Country Status (1)

Country Link
CN (1) CN114852043B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200065665A1 (en) * 2018-08-24 2020-02-27 Ford Global Technologies, Llc Vehicle adaptive learning
CN112440974A (en) * 2020-11-27 2021-03-05 武汉理工大学 HEV energy management method based on distributed depth certainty strategy gradient
CN112590774A (en) * 2020-12-22 2021-04-02 同济大学 Intelligent electric automobile drifting and warehousing control method based on deep reinforcement learning
US20210213933A1 (en) * 2018-06-15 2021-07-15 The Regents Of The University Of California Systems, apparatus and methods to improve plug-in hybrid electric vehicle energy performance by using v2c connectivity
CN113246958A (en) * 2021-06-11 2021-08-13 武汉理工大学 TD 3-based multi-target HEV energy management method and system
CN113501008A (en) * 2021-08-12 2021-10-15 东风悦享科技有限公司 Automatic driving behavior decision method based on reinforcement learning algorithm


Also Published As

Publication number Publication date
CN114852043B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
CN110696815B (en) Prediction energy management method of network-connected hybrid electric vehicle
CN111731303B (en) HEV energy management method based on deep reinforcement learning A3C algorithm
WO2021103625A1 (en) Short-term vehicle speed condition real-time prediction method based on interaction between vehicle ahead and current vehicle
WO2021114742A1 (en) Comprehensive energy prediction and management method for hybrid electric vehicle
CN107688343B (en) Energy control method of hybrid power vehicle
CN111267831A (en) Hybrid vehicle intelligent time-domain-variable model prediction energy management method
CN112249002B (en) TD 3-based heuristic series-parallel hybrid power energy management method
CN110717218B (en) Electric drive vehicle distributed power drive system reconstruction control method and vehicle
CN112668799A (en) Intelligent energy management method and storage medium for PHEV (Power electric vehicle) based on big driving data
CN109733378A (en) Optimize the torque distribution method predicted on line under a kind of line
CN113479186B (en) Energy management strategy optimization method for hybrid electric vehicle
CN113635879B (en) Vehicle braking force distribution method
CN112590760B (en) Double-motor hybrid electric vehicle energy management system considering mode switching frequency
CN116070783B (en) Learning type energy management method of hybrid transmission system under commute section
CN112009456A (en) Energy management method for network-connected hybrid electric vehicle
CN112026744B (en) Series-parallel hybrid power system energy management method based on DQN variants
CN115534929A (en) Plug-in hybrid electric vehicle energy management method based on multi-information fusion
Huang et al. Real-time long horizon model predictive control of a plug-in hybrid vehicle power-split utilizing trip preview
CN115805840A (en) Energy consumption control method and system for range-extending type electric loader
CN113246958B (en) TD 3-based multi-target HEV energy management method and system
Wang et al. Hierarchical rewarding deep deterministic policy gradient strategy for energy management of hybrid electric vehicles
CN112440974B (en) HEV energy management method based on distributed depth certainty strategy gradient
CN116442799A (en) Control method and device for torque distribution of vehicle
CN114852043A (en) HEV energy management method and system based on layered return TD3
Fechert et al. Using deep reinforcement learning for hybrid electric vehicle energy management under consideration of dynamic emission models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant