CN114852043A - HEV energy management method and system based on layered return TD3 - Google Patents
- Publication number
- CN114852043A CN114852043A CN202210298825.0A CN202210298825A CN114852043A CN 114852043 A CN114852043 A CN 114852043A CN 202210298825 A CN202210298825 A CN 202210298825A CN 114852043 A CN114852043 A CN 114852043A
- Authority
- CN
- China
- Prior art keywords
- energy management
- return
- hev
- layered
- reward
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W20/11—Controlling the power contribution of each of the prime movers to meet required power demand using model predictive control [MPC] strategies, i.e. control methods based on models predicting performance
- B60W20/15—Control strategies specially adapted for achieving a particular effect
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B60W50/06—Improving the dynamic response of the control system, e.g. improving the speed of regulation or avoiding hunting or overshoot
- B60W2050/0062—Adapting control system settings
- B60W2050/0075—Automatic parameter input, automatic initialising or calibrating means
- B60W2050/0095—Automatic control mode change
- B60W2050/065—Improving the dynamic response of the control system by reducing the computational load on the digital processor of the control computer
- B60W2510/0208—Clutch engagement state, e.g. engaged or disengaged
- B60W2510/0657—Engine torque
- B60W2510/244—Charge state (energy storage means for electrical energy)
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/84—Data processing systems or methods, management, administration
Abstract
The invention belongs to the technical field of hybrid electric vehicle energy management, and discloses an HEV energy management method and system based on the layered reward TD3. In combination with a parallel hybrid vehicle model, the invention selects the state space signal and action space signal of the HEV energy management strategy and adopts a hierarchical reward structure comprising two reward functions and four adjusting layers in total. The control strategy can therefore be adjusted in a targeted manner according to the different driving states of the vehicle, which reduces unnecessary repeated exploration and improves the overall performance of the energy management strategy.
Description
Technical Field
The invention belongs to the technical field of hybrid electric vehicle energy management, and particularly relates to an HEV energy management method and system based on the layered reward TD3.
Background
Among all environment-friendly vehicles, the hybrid electric vehicle (HEV) offers a longer driving range than a pure electric vehicle, and has lower fuel consumption and is more environment-friendly than a conventional fuel-powered vehicle. However, the energy management of a hybrid vehicle is far more complex than that of conventional fuel-powered and pure electric vehicles. The energy management strategy (EMS) of the hybrid vehicle has therefore long been a research focus in the automotive field.
Existing hybrid vehicle energy management strategies fall into three main categories: rule-based, optimization-based, and learning-based strategies. Rule-based energy management strategies are easy to implement, but it is difficult to formulate reasonable rules for very complex operating conditions. Optimization-based energy management strategies comprise global optimization strategies and real-time optimization strategies. Typical global optimization algorithms are computationally expensive, are usually run offline, and mostly serve as benchmarks for evaluating the effectiveness of other online EMSs. Real-time optimization strategies based on Pontryagin's minimum principle optimize efficiently, but the co-state is difficult to obtain and the computational burden is relatively large. Real-time optimization strategies such as the equivalent consumption minimization strategy have good real-time characteristics, but the historical road information used to compute the equivalent fuel consumption often fails to represent future driving conditions, so the robustness of the algorithm is poor. The key to the success of real-time optimization strategies such as model predictive control lies in fast prediction and fast optimization, which require road conditions to be predicted in advance and therefore depend to a great extent on models with superior performance. Compared with traditional rule-based strategies, energy management strategies based on the reinforcement learning algorithm Q-learning can greatly improve the fuel economy of the vehicle, but suffer from the curse of dimensionality. Deep Deterministic Policy Gradient (DDPG) strategies can be trained in environments with continuous or discrete state spaces and continuous action spaces, but suffer from accumulated bias and suboptimal policies caused by overestimation of the value function.
Although the Twin Delayed Deep Deterministic Policy Gradient (TD3) strategy can compensate for the overestimation problem of DDPG, existing TD3-based energy management methods cannot adjust the control strategy in a targeted manner according to the different driving states of the vehicle, and the overall performance of the energy management strategy still needs to be improved.
Disclosure of Invention
The invention provides an HEV energy management method and system based on the layered reward TD3, solving the problem that prior-art TD3-based energy management schemes cannot adjust the control strategy in a targeted manner according to the different driving states of the vehicle.
The invention provides an HEV energy management method based on a layered reward TD3, which comprises the following steps:
establishing a parallel hybrid vehicle model;
selecting a state space signal and an action space signal of an HEV energy management strategy by combining the parallel hybrid vehicle model;
combining the parallel hybrid vehicle model to construct a hierarchical reward function; the hierarchical reward function comprises a first reward function and a second reward function, the first or the second reward function is activated according to an activation condition, and each of the two reward functions is further divided into two different adjusting layers according to the range of the battery state of charge;
constructing an HEV energy management learning network based on the layered reward TD3 from the state space signal, the action space signal and the hierarchical reward function;
training the HEV energy management learning network based on the layered reward TD3, and executing the energy management strategy through the trained network.
Preferably, the operating modes of the vehicle in the parallel hybrid vehicle model include an electric-only mode, a neutral mode and a parallel mode.
Preferably, the state space signal is S = (v, SOC, m_f, cs), and the action space signal is A = {T_eng | T_eng ∈ [-250, 841]}, where v represents the traveling speed of the vehicle, SOC represents the battery state of charge, m_f represents the fuel consumption rate of the engine, cs represents the state of the vehicle clutch (cs = 0: clutch open; cs = 1: clutch closed), and T_eng represents the engine output torque.
Preferably, the hierarchical reward function is activated as follows: the clutch state is judged first; if the clutch is open, the first reward function is activated, and if the clutch is closed, the instantaneous vehicle speed is judged; if the instantaneous vehicle speed is zero, the first reward function is activated, and if it is non-zero, the second reward function is activated;
the first reward function R_soc is expressed as:
the second reward function R_com is expressed as:
wherein the first adjusting layer L1 corresponds to SOC(t) > 0.8 or SOC(t) < 0.3, and the second adjusting layer L2 corresponds to 0.3 ≤ SOC(t) ≤ 0.8; m_f represents the actual value of the specific fuel consumption of the engine; SOC_ref represents the reference value of the battery state of charge; SOC(t) represents the actual value of the battery state of charge; pen represents a constant penalty factor; δ_1 and δ_2 are two weight factors that balance the influence of the fuel consumption rate and of the change in battery state of charge on the fuel consumption of the vehicle; ω_1, ω_2 and ω_3 are three constants that ensure that the values of the reward functions of all adjusting layers are of the same order of magnitude.
Preferably, parameters and observed values influencing energy management during simulated driving of the vehicle under standard operating conditions are obtained, and the HEV energy management learning network based on the layered reward TD3 is trained on the parameters and observed values obtained under the standard operating conditions.
Preferably, after the trained HEV energy management learning network based on the hierarchical reward TD3 is obtained, the method further includes: acquiring parameters and observation values influencing energy management in actual running of the vehicle, verifying the trained HEV energy management learning network based on the layered return TD3 based on the parameters and the observation values under actual running, and executing an energy management strategy through the trained and verified HEV energy management learning network based on the layered return TD 3.
In another aspect, the present invention provides a HEV energy management system based on a hierarchical reward TD3, including: a processor and a memory; the memory stores a control program that, when executed by the processor, is configured to implement the HEV energy management method based on the hierarchical reward TD3 described above.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
in the invention, a state space signal and an action space signal of the HEV energy management strategy are selected by combining a parallel hybrid vehicle model, a layered return structure is adopted, the layered return structure comprises two return functions, and four adjusting layers are counted, so that the control strategy can be adjusted in a pertinence manner according to different driving states of the vehicle, unnecessary repeated exploration behaviors are reduced, and the overall performance of the energy management strategy is improved. The invention uses the hierarchical rewarding double-delay depth certainty strategy gradient algorithm, not only solves the dimension disaster problem of the discrete action space depth reinforcement learning energy management strategy and the depth certainty strategy gradient overestimation problem, but also can improve the optimality of the energy management strategy because the hierarchical rewarding structure with four adjusting layers can pertinently adjust the control strategy according to the difference of the working condition and the vehicle running mode.
Drawings
FIG. 1 is a schematic diagram of a hybrid electric vehicle according to an HEV energy management method based on a layered reward TD3 according to embodiment 1 of the present invention;
fig. 2 is a schematic diagram of a layered reward structure in an HEV energy management method based on the layered reward TD3 according to embodiment 1 of the present invention;
fig. 3 is a diagram of the basic architecture of the deep reinforcement learning TD3 agent in the HEV energy management method based on the hierarchical reward TD3 according to embodiment 1 of the present invention;
FIG. 4 is a standard condition speed variation curve;
fig. 5 is an actual road speed change curve.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Example 1:
step 1: and establishing a parallel hybrid vehicle model.
Specifically, a parallel hybrid vehicle model can be established through MATLAB/Simulink.
In the established parallel hybrid vehicle model, the engine and the motor are connected in parallel, and the engine can be coupled to or decoupled from the wheels through a clutch. The vehicle operates mainly in an electric-only mode, a neutral mode and a parallel mode. These three operating modes depend on the state of the clutch and the gear, as shown in FIG. 1.
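The mode logic just described can be captured in a few lines. The mapping below is a sketch inferred from the clutch-state and vehicle-speed switching described for the reward hierarchy (FIG. 2); the standstill-means-neutral rule and the omission of the gear signal are illustrative assumptions, not the patent's exact rule.

```python
def operating_mode(clutch_closed: bool, vehicle_speed: float) -> str:
    """Illustrative mode selection for the parallel HEV model.

    Assumption: clutch open -> electric-only mode (engine decoupled);
    clutch closed at standstill -> neutral mode; clutch closed while
    moving -> parallel mode. The patent additionally conditions on the
    selected gear, which is omitted here for brevity.
    """
    if not clutch_closed:
        return "electric-only"
    return "neutral" if vehicle_speed == 0.0 else "parallel"
```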
The traction required for the vehicle to travel must be provided by the vehicle powertrain and can be calculated from the vehicle dynamics equation shown in equation (1):
F_t = F_f + F_i + F_ω + F_j = mgf·cos α + mg·sin α + (1/2)·C_D·A·ρ·v² + δ·m·a (1)
wherein F_t is the driving force for vehicle running, F_f is the rolling resistance, F_i is the grade resistance, F_ω is the air resistance, F_j is the acceleration resistance, m is the vehicle mass, g is the gravitational acceleration, f is the rolling resistance coefficient, α is the road gradient, ρ is the air density, A is the frontal area of the vehicle, C_D is the air resistance coefficient, v is the vehicle running speed, δ is the rotating-mass conversion coefficient, and a is the vehicle running acceleration.
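Under the term definitions above, the standard longitudinal force balance can be evaluated directly. In the sketch below the default parameter values (mass, frontal area, drag coefficient, etc.) are illustrative placeholders in SI units, not values from the patent.

```python
import math

def traction_force(m, v, a, alpha=0.0, f=0.015, rho=1.225,
                   A=2.2, C_D=0.3, delta=1.05, g=9.81):
    """Required traction from the longitudinal dynamics of eq. (1):
    F_t = F_f + F_i + F_w + F_j, with
      F_f = m*g*f*cos(alpha)    rolling resistance
      F_i = m*g*sin(alpha)      grade resistance
      F_w = 0.5*rho*C_D*A*v**2  aerodynamic drag
      F_j = delta*m*a           acceleration (rotating-mass) resistance
    """
    F_f = m * g * f * math.cos(alpha)
    F_i = m * g * math.sin(alpha)
    F_w = 0.5 * rho * C_D * A * v ** 2
    F_j = delta * m * a
    return F_f + F_i + F_w + F_j
```

On level ground at constant speed only the rolling and aerodynamic terms remain, which is a quick sanity check on the implementation.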
Step 2: selecting the state space signal and action space signal of the HEV energy management strategy in combination with the parallel hybrid vehicle model.
Hybrid vehicle energy management strategies aim to reduce fuel consumption while maintaining the battery SOC within a reasonable interval. According to this control objective, the state space signal of the HEV energy management strategy is selected as S = (v, SOC, m_f, cs), where v represents the vehicle running speed, SOC represents the battery state of charge, m_f is the engine specific fuel consumption, and cs is a Boolean value indicating the state of the vehicle clutch (0: clutch open; 1: clutch closed). The action space signal is selected as the engine output torque T_eng, with A = {T_eng | T_eng ∈ [-250, 841]}, and the corresponding state information is acquired through sensors.
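As a sketch, the state tuple S = (v, SOC, m_f, cs) and the bounded torque action can be represented as follows; the class and function names are illustrative conveniences, not identifiers from the patent.

```python
from dataclasses import dataclass

@dataclass
class EMSState:
    """Illustrative container for the EMS state S = (v, SOC, m_f, cs)."""
    v: float    # vehicle running speed
    soc: float  # battery state of charge, 0..1
    m_f: float  # instantaneous engine fuel-consumption rate
    cs: int     # clutch state: 0 = open, 1 = closed

# Torque bounds of the action A = {T_eng | T_eng in [-250, 841]}.
T_ENG_MIN, T_ENG_MAX = -250.0, 841.0

def clip_action(t_eng: float) -> float:
    """Project a raw actor output onto the admissible torque range."""
    return max(T_ENG_MIN, min(T_ENG_MAX, t_eng))
```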
Step 3: designing the hierarchical reward structure and formulating the reward functions.
The hierarchical reward structure is very important for the TD3 energy management strategy. A well-designed reward structure not only makes full use of the environment feedback information, but also reduces unnecessary repeated exploration, so that the agent can interact with the environment more quickly and deeply, the learning process is accelerated, and the overall performance of the energy management strategy is improved.
The invention constructs a hierarchical reward function in combination with the parallel hybrid vehicle model; the hierarchical reward function comprises a first reward function and a second reward function, the first or the second reward function is activated according to an activation condition, and each of the two reward functions is further divided into two different adjusting layers according to the range of the battery state of charge.
Specifically, as described in step 1, the vehicle operates mainly in the electric-only mode, the neutral mode and the parallel mode. When the vehicle runs in the electric-only mode or the neutral mode, the engine is either decoupled from the wheels (electric-only mode) or connected to the wheels through the clutch while its rotating speed can change freely (neutral mode); the main energy consumption of the vehicle then comes from the battery. A reward function R_soc based on the battery SOC is therefore designed, reflecting that keeping the SOC within a reasonable interval is the most important objective in these modes. When the vehicle operates in the parallel mode, the engine and the motor together provide the power required for driving, so a composite reward function R_com is designed to coordinate the two objectives of reducing fuel consumption and maintaining the battery SOC within a reasonable range, thereby achieving minimum energy consumption. The two reward functions are each divided into two different adjusting layers according to the SOC range. The structure of the hierarchical reward function is shown in FIG. 2; the hierarchy activates R_soc or R_com according to the clutch state and the instantaneous vehicle speed V_spd.
Specifically, referring to FIG. 2, the hierarchical reward function is activated as follows: the clutch state is judged first; if the clutch is open, the first reward function is activated, and if the clutch is closed, the instantaneous vehicle speed is judged; if the instantaneous vehicle speed is zero, the first reward function is activated, and if it is non-zero, the second reward function is activated.
The first reward function R_soc is expressed as:
The second reward function R_com is expressed as:
wherein the first adjusting layer L1 corresponds to SOC(t) > 0.8 or SOC(t) < 0.3, and the second adjusting layer L2 corresponds to 0.3 ≤ SOC(t) ≤ 0.8; m_f represents the actual value of the specific fuel consumption of the engine; SOC_ref represents the reference value of the battery state of charge; SOC(t) represents the actual value of the battery state of charge; pen represents a constant penalty factor; δ_1 and δ_2 are two weight factors that balance the influence of the fuel consumption rate and of the change in battery state of charge on the fuel consumption of the vehicle, and the larger these two values, the more the energy management strategy emphasizes battery protection; ω_1, ω_2 and ω_3 are three constants that ensure that the values of the reward functions of all adjusting layers are of the same order of magnitude.
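The exact expressions of R_soc and R_com appear only as images in the source, so the functional forms below (absolute-deviation terms combined with the pen, δ and ω parameters) are illustrative assumptions; only the activation logic of FIG. 2 and the L1/L2 SOC thresholds follow the text.

```python
SOC_LOW, SOC_HIGH = 0.3, 0.8  # boundary between adjusting layers L1 and L2

def r_soc(soc, soc_ref=0.6, pen=10.0, omega1=1.0):
    """Illustrative SOC-keeping reward R_soc (exact form not in the text).
    Layer L1 (SOC outside [0.3, 0.8]) adds the constant penalty pen;
    layer L2 only penalises deviation from SOC_ref."""
    r = -omega1 * abs(soc - soc_ref)
    if soc < SOC_LOW or soc > SOC_HIGH:
        r -= pen  # first adjusting layer L1
    return r

def r_com(soc, m_f, soc_ref=0.6, pen=10.0,
          delta1=1.0, delta2=1.0, omega2=1.0, omega3=1.0):
    """Illustrative composite reward R_com trading off the fuel rate m_f
    and the SOC deviation via delta1/delta2, scaled by omega2/omega3."""
    if soc < SOC_LOW or soc > SOC_HIGH:  # layer L1
        return -omega2 * (delta1 * m_f + delta2 * abs(soc - soc_ref)) - pen
    return -omega3 * (delta1 * m_f + delta2 * abs(soc - soc_ref))  # layer L2

def hierarchical_reward(clutch_closed, v, soc, m_f):
    """Activation logic of FIG. 2: clutch open or standstill -> R_soc,
    clutch closed while moving -> R_com."""
    if not clutch_closed or v == 0.0:
        return r_soc(soc)
    return r_com(soc, m_f)
```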
That is, the invention designs a reasonable and efficient hierarchical reward structure: the HEV energy management strategy of the layered reward TD3 selects, according to the received state signal S, the action A that maximizes the reward function R, so that the vehicle is controlled to run stably and efficiently while saving energy.
Step 4: constructing an HEV energy management learning network based on the layered reward TD3 from the state space signal, the action space signal and the hierarchical reward function.
A Critic network and an Actor network are built using deep neural networks, together forming the basic Actor-Critic framework of the twin-delayed deep deterministic policy gradient strategy. The basic architecture of the deep reinforcement learning TD3 agent is shown in FIG. 3. On this basis, the HEV energy management learning network based on the layered reward TD3 is constructed, the parameters of the Actor-Critic network are initialized, and the state data are normalized. The implementation details of the HEV energy management strategy with the hierarchical reward TD3 are shown in Table 1.
TABLE 1 hierarchical reward TD3 algorithm execution steps
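The table body is not reproduced in the text, but the distinctive TD3 updates the network implements, clipped double-Q targets with target-policy smoothing, can be sketched as follows; the function names and default noise parameters are illustrative, and the torque bounds reuse the patent's action range.

```python
import random

def td3_target(r, gamma, q1_next, q2_next):
    """Clipped double-Q learning: the bootstrap target takes the minimum
    of the two target critics, which counters DDPG's overestimation."""
    return r + gamma * min(q1_next, q2_next)

def smoothed_target_action(mu_next, sigma=0.2, noise_clip=0.5,
                           lo=-250.0, hi=841.0):
    """Target-policy smoothing: add clipped Gaussian noise to the target
    actor's output, then clip to the admissible torque range."""
    eps = max(-noise_clip, min(noise_clip, random.gauss(0.0, sigma)))
    return max(lo, min(hi, mu_next + eps))
```

The third TD3 ingredient, delayed actor and target-network updates, amounts to updating the actor only every d critic updates and is omitted here.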
Step 5: obtaining parameters and observed values influencing energy management during simulated driving of the vehicle under standard operating conditions, and training the HEV energy management learning network based on the layered reward TD3 on these data.
Parameters and observed values influencing energy management during simulated driving under standard operating conditions are obtained, and the trained deep reinforcement learning agent is obtained by training the learning network toward the objective of the HEV energy management strategy with the layered reward TD3.
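The training step can be sketched as a generic off-policy interaction loop. Every callable below (environment step/reset, action selection, agent update) is a hypothetical stand-in for the Simulink vehicle model and the TD3 agent; none of these names come from the patent.

```python
def train_episode(env_step, env_reset, select_action, agent_update,
                  buffer, max_steps=1000):
    """Generic off-policy training loop for the hierarchical-reward TD3
    agent: roll out one episode, store transitions in the replay buffer,
    and update the agent after each step."""
    s = env_reset()
    total_reward = 0.0
    for _ in range(max_steps):
        a = select_action(s)              # actor output plus exploration noise
        s2, r, done = env_step(a)         # vehicle model advances one step
        buffer.append((s, a, r, s2, done))
        agent_update(buffer)              # TD3 critic/actor updates
        total_reward += r
        s = s2
        if done:
            break
    return total_reward
```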
The learning network may be trained using three typical standard conditions, but is not limited thereto. The speed parameters for the three conditions are shown in FIG. 4, and the characteristics for each condition are shown in Table 2.
TABLE 2 Standard working conditions
Step 6: acquiring parameters and observation values influencing energy management in actual running of the vehicle, verifying the trained HEV energy management learning network based on the layered return TD3 based on the parameters and the observation values under actual running, and executing an energy management strategy through the trained and verified HEV energy management learning network based on the layered return TD 3.
For example, driving data of a real vehicle in the Wuhan urban area are collected, the actual road conditions are constructed and imported into the driver model, the trained layered-reward TD3 energy management strategy is verified, and the optimization performance of the energy management strategy is tested; the actual road speed profile is shown in FIG. 5.
Example 2:
embodiment 2 provides an HEV energy management system based on a hierarchical reward TD3, including: a processor and a memory; the memory stores a control program that, when executed by the processor, is operable to implement the HEV energy management method according to embodiment 1 based on the hierarchical reward TD 3.
The HEV energy management method and the system based on the layered return TD3 provided by the embodiment of the invention at least have the following technical effects:
(1) The invention designs a hierarchical reward structure with two reward functions and four adjusting layers in total. It can adjust the control strategy according to the different driving states of the vehicle, reduces unnecessary repeated exploration, ensures that the reward function is adjusted comprehensively for the different driving modes, and avoids wasting on-board computing resources, so that the agent interacts with the environment more quickly and deeply, the learning of the deep reinforcement learning agent is accelerated, and the overall performance of the energy management strategy is improved.
(2) The invention collects not only the energy consumption indicators m_f and battery SOC, but also the vehicle dynamic indicators vehicle speed v and clutch state cs, as deep reinforcement learning state space signals. The two reward functions are switched according to the vehicle speed and the clutch state, so that an accurate and efficient reward function can be selected for each driving mode of the vehicle. The invention can not only ensure optimal fuel economy during driving, but also ensure that the battery works in a suitable SOC interval, preventing damage from overcharge or overdischarge and prolonging the battery service life.
(3) The method adopts a hierarchical-reward twin-delayed deep deterministic policy gradient energy management strategy, which not only overcomes the curse of dimensionality of discrete-action-space deep reinforcement learning energy management strategies, but also alleviates the overestimation and training instability of the deep deterministic policy gradient.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to examples, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and such modifications shall be covered by the claims of the present invention.
Claims (7)
1. A HEV energy management method based on a layered reward TD3, characterized by comprising the following steps:
establishing a parallel hybrid vehicle model;
selecting a state space signal and an action space signal of an HEV energy management strategy by combining the parallel hybrid vehicle model;
combining the parallel hybrid vehicle model to construct a layered return function; the layered return function comprises a first return function and a second return function, the first return function or the second return function is activated according to an activation condition, and the first return function and the second return function are respectively divided into two different adjusting layers according to the range of the state of charge of the battery;
constructing an HEV energy management learning network based on the layered return TD3 based on the state space signal, the action space signal and the layered return function;
training the HEV energy management learning network based on the layered return TD3, and executing an energy management strategy through the trained HEV energy management learning network based on the layered return TD3.
2. The HEV energy management method based on the layered return TD3 according to claim 1, wherein the operating modes of the vehicle in the parallel hybrid vehicle model include an electric-only mode, a neutral mode, and a parallel mode.
3. The HEV energy management method based on the layered return TD3 according to claim 1, wherein the state space signal is S = (v, SOC, m_f, cs) and the action space signal is A = {T_eng | T_eng ∈ [-250, 841]}; where v represents the traveling speed of the vehicle, SOC represents the battery state of charge, and m_f represents the fuel consumption rate of the engine; cs represents the state of the vehicle clutch, cs = 0 indicating that the clutch is disengaged and cs = 1 indicating that the clutch is engaged; T_eng represents the engine output torque.
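A minimal sketch of how the state and action space signals of claim 3 might be represented in code. The container and function names (`HEVState`, `clip_action`) are illustrative assumptions; only the signal definitions and the torque bounds come from the claim.

```python
from dataclasses import dataclass

@dataclass
class HEVState:
    v: float     # vehicle speed
    soc: float   # battery state of charge, 0..1
    m_f: float   # instantaneous engine fuel consumption rate
    cs: int      # clutch state: 0 = disengaged, 1 = engaged

    def as_vector(self):
        """Flatten the state S = (v, SOC, m_f, cs) for the actor/critic networks."""
        return [self.v, self.soc, self.m_f, float(self.cs)]

# Engine torque bounds (per claim 3): T_eng in [-250, 841]
T_ENG_MIN, T_ENG_MAX = -250.0, 841.0

def clip_action(t_eng):
    """Project a raw actor output onto the admissible torque interval."""
    return max(T_ENG_MIN, min(T_ENG_MAX, t_eng))
```

A continuous scalar action like this is what lets TD3 avoid the discretized torque grid that a DQN-style method would need.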
4. The HEV energy management method based on the layered return TD3 according to claim 1, wherein the layered return function is activated as follows: the clutch state is judged; if the clutch is disengaged, the first return function is activated; if the clutch is engaged, the instantaneous vehicle speed is judged: if the instantaneous vehicle speed is zero, the first return function is activated, and if it is not zero, the second return function is activated;
the first return function R_soc is expressed as:
the second return function R_com is expressed as:
wherein the first adjusting layer L1 corresponds to SOC(t) > 0.8 or SOC(t) < 0.3, and the second adjusting layer L2 corresponds to 0.3 ≤ SOC(t) ≤ 0.8; m_f represents the actual value of the engine's specific fuel consumption; SOC_ref represents the reference value of the battery state of charge; SOC(t) represents the actual value of the battery state of charge; pen represents a constant penalty factor; δ_1 and δ_2 are two weight factors used to balance the influence of the fuel consumption rate and of changes in the battery state of charge on the vehicle's fuel consumption; ω_1, ω_2, and ω_3 are three constants used to ensure that the values of the return functions of all adjusting layers are of the same order of magnitude.
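The switching and layering logic of claim 4 can be sketched in Python. The concrete piecewise expressions for R_soc and R_com appear only as formula images in the original filing and are not reproduced here, so this sketch covers only the activation rule and the adjusting-layer selection; the function names are illustrative.

```python
# Adjusting-layer SOC bounds from claim 4
SOC_LOW, SOC_HIGH = 0.3, 0.8

def select_return_function(cs, v):
    """Activation rule of claim 4: clutch disengaged -> R_soc;
    clutch engaged and vehicle stationary -> R_soc; otherwise -> R_com."""
    if cs == 0:
        return "R_soc"
    return "R_soc" if v == 0.0 else "R_com"

def adjusting_layer(soc):
    """Layer L1 handles SOC outside [0.3, 0.8] (over/under-charge region,
    where the pen penalty applies); L2 handles SOC inside the interval."""
    return "L1" if (soc > SOC_HIGH or soc < SOC_LOW) else "L2"
```

The two-level dispatch (mode first, then SOC layer) is what lets a single agent receive a reward tailored both to the current driving mode and to the battery's charge region.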
5. The HEV energy management method based on the layered return TD3 according to claim 1, wherein parameters and observations affecting energy management are obtained while the vehicle drives a simulated standard operating cycle, and the HEV energy management learning network based on the layered return TD3 is trained on these parameters and observations under the standard operating conditions.
6. The HEV energy management method based on the layered return TD3 according to claim 1, further comprising, after obtaining the trained HEV energy management learning network based on the layered return TD3: acquiring parameters and observations affecting energy management during actual driving of the vehicle, verifying the trained HEV energy management learning network based on the layered return TD3 against these parameters and observations, and executing an energy management strategy through the trained and verified HEV energy management learning network based on the layered return TD3.
7. An HEV energy management system based on a layered return TD3, comprising a processor and a memory, wherein the memory stores a control program which, when executed by the processor, implements the HEV energy management method based on the layered return TD3 according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210298825.0A CN114852043B (en) | 2022-03-23 | 2022-03-23 | HEV energy management method and system based on layered return TD3 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114852043A true CN114852043A (en) | 2022-08-05 |
CN114852043B CN114852043B (en) | 2024-06-18 |
Family
ID=82629986
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210298825.0A Active CN114852043B (en) | 2022-03-23 | 2022-03-23 | HEV energy management method and system based on layered return TD3 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114852043B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200065665A1 (en) * | 2018-08-24 | 2020-02-27 | Ford Global Technologies, Llc | Vehicle adaptive learning |
CN112440974A (en) * | 2020-11-27 | 2021-03-05 | 武汉理工大学 | HEV energy management method based on distributed depth certainty strategy gradient |
CN112590774A (en) * | 2020-12-22 | 2021-04-02 | 同济大学 | Intelligent electric automobile drifting and warehousing control method based on deep reinforcement learning |
US20210213933A1 (en) * | 2018-06-15 | 2021-07-15 | The Regents Of The University Of California | Systems, apparatus and methods to improve plug-in hybrid electric vehicle energy performance by using v2c connectivity |
CN113246958A (en) * | 2021-06-11 | 2021-08-13 | 武汉理工大学 | TD 3-based multi-target HEV energy management method and system |
CN113501008A (en) * | 2021-08-12 | 2021-10-15 | 东风悦享科技有限公司 | Automatic driving behavior decision method based on reinforcement learning algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN114852043B (en) | 2024-06-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |