CN113246958A - TD3-based multi-target HEV energy management method and system - Google Patents


Info

Publication number
CN113246958A
CN113246958A
Authority
CN
China
Prior art keywords
battery
energy management
soc
engine
vehicle
Prior art date
Legal status
Granted
Application number
CN202110654498.3A
Other languages
Chinese (zh)
Other versions
CN113246958B (en)
Inventor
颜伏伍
王金海
杜常清
彭可挥
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202110654498.3A priority Critical patent/CN113246958B/en
Publication of CN113246958A publication Critical patent/CN113246958A/en
Application granted granted Critical
Publication of CN113246958B publication Critical patent/CN113246958B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W10/06 Conjoint control of vehicle sub-units including control of propulsion units, including control of combustion engines
    • B60W10/26 Conjoint control of vehicle sub-units including control of energy storage means for electrical energy, e.g. batteries or capacitors
    • B60W20/13 Controlling the power contribution of each of the prime movers to meet required power demand in order to stay within battery power input or output limits; in order to prevent overcharging or battery depletion
    • B60W20/15 Control strategies specially adapted for achieving a particular effect
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0043 Signal treatments, identification of variables or parameters, parameter estimation or state estimation
    • B60W2710/0666 Engine torque
    • B60W2710/244 Charge state
    • B60W2710/246 Temperature

Landscapes

  • Engineering & Computer Science (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)
  • Hybrid Electric Vehicles (AREA)

Abstract

A multi-target HEV energy management method and system based on the twin-delayed deep deterministic policy gradient (TD3) are disclosed. The invention innovatively applies the TD3 algorithm to overcome both the curse of dimensionality that afflicts deep reinforcement learning energy management strategies built on a discrete action space and the overestimation problem of the deep deterministic policy gradient. Fuel consumption, battery temperature, and battery life (SOH) are taken as optimization targets, improving the practical value of the energy management strategy.

Description

TD3-based multi-target HEV energy management method and system
Technical Field
The invention relates to a deep reinforcement learning algorithm for improving the fuel economy of new energy vehicles and prolonging battery service life, and in particular to a multi-target energy management method for a parallel hybrid electric vehicle (HEV) based on the twin-delayed deep deterministic policy gradient (TD3).
Background
The energy crisis and climate change have attracted extensive attention worldwide, and vehicle fuel consumption and exhaust emissions are key contributing factors that cannot be ignored. To alleviate the severe energy crisis and climate change, vehicle electrification is the necessary path for the future development of the automotive industry. Among new energy vehicles, hybrid electric vehicles consume less fuel than conventional fuel vehicles and offer a longer driving range than pure electric vehicles, making them the most effective solution at present. However, the energy management system of a hybrid electric vehicle is very complex: it must properly split power between the engine and the motor while comprehensively guaranteeing both drivability and economy. Because it spans the energy management of conventional, pure electric, and hybrid vehicles, it has become a focus of extensive research in the automotive field at home and abroad.
Energy management strategies can be broadly divided into three categories. (a) Rule-based strategies rely on a rule set formulated from expert experience and do not need to predict driving conditions. Although highly practical, rule-based energy management cannot achieve optimal control of the vehicle and is tuned to specific, narrow driving conditions. The binary (charge-depleting/charge-sustaining) control strategy is typical: the vehicle is first driven on battery energy and switches to engine drive when the battery SOC reaches a set minimum. (b) Optimization-based strategies, such as dynamic programming (DP), convex optimization, and genetic algorithms, compute optimal control from known or predicted driving conditions and can yield optimal or near-optimal results over a specific drive cycle; however, they require the entire driving profile in advance and consume too much computation for real-time control. To improve practicality, real-time online optimization strategies have been widely studied, such as model predictive control (MPC), Pontryagin's minimum principle (PMP), and the equivalent consumption minimization strategy (ECMS). However, because the system's equivalent fuel consumption is computed from partial historical information that does not necessarily represent future driving states, the robustness of these algorithms is poor. A better-performing strategy is needed to make up for these shortcomings. (c) Learning-based energy management strategies.
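The binary (charge-depleting/charge-sustaining) rule mentioned above can be sketched in a few lines. The SOC floor value and the power-split interface below are illustrative assumptions, not the patent's implementation:

```python
def binary_strategy(soc, demand_power, soc_min=0.3):
    """Charge-depleting/charge-sustaining rule: run on battery energy until
    the SOC reaches a set minimum, then switch to engine drive.
    soc_min and the power-split interface are illustrative assumptions."""
    if soc > soc_min:
        # charge-depleting phase: battery supplies the full demand
        return {"battery_power": demand_power, "engine_power": 0.0}
    # charge-sustaining phase: engine takes over
    return {"battery_power": 0.0, "engine_power": demand_power}
```

Such a rule is simple and robust, but, as the text notes, it cannot adapt the split to the driving conditions, which is what motivates the learning-based approach.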
Machine learning (data-driven optimization), and in particular the deep reinforcement learning (DRL) algorithms developed in recent years, provides a powerful research tool for system modeling, control-parameter optimization, and the extraction of road-condition and driving-behavior features. Among reinforcement learning algorithms, discrete-action-space methods such as Q-learning and the deep Q-network (DQN) are the most widely used, but they apply only to discrete, low-dimensional action spaces, whereas the HEV energy management control task has a high-dimensional, continuous action space. These algorithms require discretization of the action space, which inevitably loses important information and also raises the curse of dimensionality. Continuous-action-space reinforcement learning algorithms such as the deep deterministic policy gradient (DDPG) handle continuous action spaces well without discretization, but DDPG suffers from an overestimation problem: the estimated value function is often larger than the true value function, which degrades the stability of the energy management strategy and the robustness of the algorithm.
Furthermore, current energy management strategies focus mainly on improving vehicle fuel economy while ignoring the control strategy's impact on battery life. It is well known that the service life of a battery system is closely related to its operating conditions and temperature, and that an excessive internal battery temperature can cause thermal runaway. An energy management strategy must take these important factors into account; otherwise it has little practical value.
Disclosure of Invention
The invention provides a multi-target HEV energy management method and system based on the twin-delayed deep deterministic policy gradient. By representing the value function with two networks and using a delayed-update technique, the method and system effectively resolve the overestimation problem. Vehicle fuel consumption, battery SOC, battery temperature, and battery service life (SOH) are taken as optimization targets to construct a multi-objective energy management strategy, so that the vehicle operates in a truly optimal state and the practical value of the energy management strategy is improved.
At least one embodiment of the present invention provides an HEV energy management method, comprising:
establishing a dynamics model of the parallel hybrid electric vehicle, a battery thermal model, and a battery life model, and taking the engine fuel consumption rate m_f, the engine output torque T_eng, the battery temperature T_emp, the battery SOC, and the battery SOH computed from the three models as control targets;
constructing a twin-delayed deep deterministic policy gradient TD3 network;
taking the engine fuel consumption rate, the engine output torque, the battery temperature, the battery SOH, and the battery SOC of the control targets as the TD3 state-space signal S, taking the engine output torque as the TD3 action-space signal A, and designing a reward function r for TD3;
acquiring the parameters and observations that influence energy management during vehicle driving under standard operating conditions, wherein these parameters and observations, together with the reward function r, are used to train the TD3 network so that the TD3 network takes the action A that maximizes the reward function r according to the received state signal S, thereby obtaining a trained deep reinforcement learning agent;
and acquiring the parameters and observations that influence energy management during actual vehicle driving, including the engine fuel consumption rate, the engine output torque, the battery temperature, the battery SOH, and the battery SOC taken as control targets, and inputting them into the trained deep reinforcement learning agent for energy management.
At least one embodiment of the present invention provides an HEV energy management system, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform all or part of the steps of the method.
At least one embodiment of the invention provides a non-transitory computer readable storage medium having stored thereon a computer program that, when executed by a processor, performs all or part of the steps of a method as described herein.
The invention adopts a twin-delayed deep deterministic policy gradient energy management strategy to optimize the power split between the engine and the motor and the usage of the battery. It not only remedies the curse of dimensionality of discrete-action-space deep reinforcement learning energy management strategies, but also resolves the overestimation and training instability of the deep deterministic policy gradient.
The invention not only optimizes fuel consumption during driving and keeps the battery SOC within a reasonable range, but also considers the influence of the control strategy on battery temperature and battery service life. An innovatively designed reward function yields a multi-target energy management strategy covering fuel economy, battery SOC, battery temperature, and battery service life, enabling comprehensive multi-objective optimization of the vehicle.
The method collects actual road-condition data to verify the optimality of the deep reinforcement learning TD3 energy management strategy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings of the embodiments will be briefly described below.
Fig. 1 is a flowchart of a multi-target HEV energy management method based on TD3 according to an embodiment of the present invention.
Fig. 2 is a structural diagram of a parallel hybrid electric vehicle according to an embodiment of the present invention.
Fig. 3 is a basic architecture diagram of an intelligent agent TD3 for deep reinforcement learning according to an embodiment of the present invention.
Fig. 4 is a speed curve of a vehicle under a standard operating condition according to an embodiment of the present invention.
Fig. 5 is a speed curve of a vehicle actually traveling at a certain location according to an embodiment of the present invention.
Detailed Description
For HEV energy management, the invention innovatively applies the twin-delayed deep deterministic policy gradient (TD3) to overcome both the curse of dimensionality of deep reinforcement learning energy management strategies based on a discrete action space and the overestimation problem of the deep deterministic policy gradient. Fuel consumption, battery temperature, and battery life (SOH) are taken as optimization targets, improving the practical value of the energy management strategy. The TD3-based multi-target HEV energy management method is described in detail below with reference to Figs. 1-5.
Step 1: establish the parallel hybrid electric vehicle model. The vehicle dynamics model is built from the vehicle dynamics equation, the battery thermal model from the principles of battery heat generation and dissipation, and the battery life model from the principle of battery capacity fade. Combining the battery thermal model with the battery life model allows the dynamic characteristics of the battery system to be predicted. The engine fuel consumption rate m_f, the engine output torque T_eng, the battery temperature T_emp, the battery SOH, and the battery SOC given by the three models are taken as control targets;
step 2: respectively constructing a critical network and an Actor network by using a deep neural network, commonly constructing a basic network framework, namely the Actor-critical network, of a double-delay deep deterministic strategy gradient strategy TD3 to construct a multi-target HEV energy management strategy learning network, and initializing and normalizing state data of parameters of the Actor-critical network, wherein the network parameters are shown in a table 2. And taking the engine fuel consumption rate, the engine output torque, the battery temperature, the battery SOH and the battery SOC of the control target as a TD3 state space signal S, taking the engine output torque as a TD3 action space signal A, and establishing a reasonable return function r of TD 3.
Step 3: acquire the parameters and observations that influence energy management during vehicle driving under standard operating conditions; use them, together with the reward function r, to train the basic TD3 network so that the TD3 energy management strategy takes the action A that maximizes the reward function r according to the received state signal S, controlling the vehicle to drive in an energy-saving and efficient manner, thereby obtaining a trained deep reinforcement learning agent.
Step 4: acquire the parameters and observations that influence energy management during actual vehicle driving, including the engine fuel consumption rate, the engine output torque, the battery temperature, the battery SOH, and the battery SOC taken as control targets, and input them into the trained deep reinforcement learning agent for energy management.
FIG. 2 shows a schematic diagram of a parallel hybrid vehicle drive system. In step 1, the automobile dynamic model may be calculated by an automobile dynamic equation, which is shown in formula (1):
F_t = F_f + F_i + F_ω + F_j = m·g·f·cos α + m·g·sin α + (1/2)·C_D·A·ρ·v² + δ·m·a   (1)
wherein F_t is the driving force of the vehicle; F_f is the rolling resistance; F_i is the grade resistance; F_ω is the aerodynamic resistance; F_j is the acceleration resistance; m is the vehicle mass; g is the gravitational acceleration; f is the rolling resistance coefficient; α is the road grade; ρ is the air density; A is the frontal area of the vehicle; C_D is the aerodynamic drag coefficient; v is the vehicle speed; δ is the rotating-mass conversion factor; and a is the vehicle acceleration.
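A minimal sketch of the traction-force computation implied by formula (1). All vehicle parameter values (mass, drag coefficient, frontal area, and so on) are illustrative assumptions, not the patent's data:

```python
import math

def demanded_force(v, a, grade_rad, m=1500.0, g=9.81, f=0.015,
                   rho=1.2, Cd=0.3, A=2.2, delta=1.05):
    """Traction force from formula (1): rolling + grade + aerodynamic
    + acceleration resistance. Parameter defaults are illustrative."""
    F_f = m * g * f * math.cos(grade_rad)   # rolling resistance
    F_i = m * g * math.sin(grade_rad)       # grade resistance
    F_w = 0.5 * rho * Cd * A * v ** 2       # aerodynamic resistance
    F_j = delta * m * a                     # acceleration resistance
    return F_f + F_i + F_w + F_j
```

Dividing this force's power (F_t · v) between the engine and the motor is precisely the split that the energy management strategy controls.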
The thermal model of the battery is shown as formula (2):
m·c·dT_emp/dt = I·(OCV - V) - h·(T_emp - T_amb)   (2)
wherein T_emp is the battery temperature; T_amb is the ambient temperature; m is the battery mass; c is the specific heat capacity of the battery; I is the battery operating current; OCV is the battery open-circuit voltage; V is the battery operating voltage; and h is the natural convection heat-transfer constant.
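The lumped thermal balance described above (heat generated by the current working against the OCV-V gap, heat rejected by natural convection) can be stepped forward in time with explicit Euler integration. The battery mass, specific heat, convection constant, and time step below are illustrative assumptions:

```python
def battery_temp_step(T, I, ocv, V, T_amb=25.0, m=30.0, c=1000.0,
                      h=5.0, dt=1.0):
    """One explicit-Euler step of the lumped battery thermal model:
    m*c*dT/dt = I*(OCV - V) - h*(T - T_amb). Parameters are illustrative."""
    heat_gen = I * (ocv - V)        # heat generation from the OCV-V gap (W)
    heat_out = h * (T - T_amb)      # natural convection to ambient (W)
    return T + dt * (heat_gen - heat_out) / (m * c)
```

Iterating this step along a drive cycle yields the battery-temperature trajectory that the reward function later penalizes.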
The battery life model is shown in equation (3):
dSOH/dt = -|I| / (2·N(c_r, T_emp)·C_nom·3600)   (3)
wherein N(c_r, T_emp) is the equivalent number of cycles before the end of battery life, which is influenced by the battery discharge rate c_r (C-rate) and the battery temperature T_emp; C_nom is the nominal battery capacity; the capacity fade obeys equation (4):
C_n = B·exp(-E_a / (R·T_emp))·Ah^z   (4)
wherein C_n is the percentage of battery capacity loss, B is an exponential factor whose values are given in Table 1, R = 8.314 J/(mol·K) is the universal gas constant, z = 0.55 is the power-law coefficient, Ah is the battery ampere-hour throughput, and E_a is the activation energy; when the capacity of the battery has dropped by 20%, the battery reaches the end of its life. C_n, Ah, and E_a are related by equation (5):
E_a = 31700 - 370.3·c_r,  Ah = [20 / (B·exp(-E_a/(R·T_emp)))]^(1/z),  N(c_r, T_emp) = 3600·Ah / C_nom   (5)
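Assuming the standard semi-empirical Arrhenius/power-law capacity-fade form implied by the definitions above (capacity loss = B·exp(-E_a/(R·T))·Ah^z with z = 0.55, and end of life at 20% loss), the fade model can be sketched as follows. The B and E_a values used in the test are stand-ins, not the entries of the patent's Table 1:

```python
import math

R_GAS = 8.314   # universal gas constant, J/(mol*K)
Z = 0.55        # power-law coefficient stated in the text

def capacity_loss_pct(B, Ea, T_kelvin, ah_throughput):
    """Capacity loss in percent: C_n = B*exp(-Ea/(R*T))*Ah^z.
    B (exponential factor) and Ea (activation energy) are inputs here;
    the patent tabulates B against discharge rate in Table 1."""
    return B * math.exp(-Ea / (R_GAS * T_kelvin)) * ah_throughput ** Z

def throughput_to_eol(B, Ea, T_kelvin, eol_loss_pct=20.0):
    """Invert the fade law for the Ah throughput at which loss reaches 20%."""
    return (eol_loss_pct / (B * math.exp(-Ea / (R_GAS * T_kelvin)))) ** (1.0 / Z)
```

The throughput-to-end-of-life value is what converts instantaneous current into an SOH decrement in the life model.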
TABLE 1 relationship between index factor and discharge rate
[Table 1 is provided as an image in the original publication.]
In step 2, the TD3 state-space signal is S = (SOC, m_f, T_eng, T_emp, SOH), where SOC represents the state of charge of the battery; m_f is the engine fuel consumption rate; T_eng is the engine output torque; and T_emp is the battery temperature. The action-space signal is A = {T_eng | T_eng ∈ [-250, 841]}. The reward function is defined by equation (6):
r_i = b - J_i(s, a),  J_i(s, a) = m_f + C_b + ω_1·P_s + ω_2·P_t   (6)
wherein b is an offset used to adjust the range of the reward function; J_i is the loss function and i denotes the time step; s and a denote, respectively, the state of the i-th time step (the engine fuel consumption rate, engine output torque, battery temperature, battery SOH, and battery SOC of the control targets) and the action (the engine output torque); m_f denotes the engine fuel consumption rate; C_b denotes the battery degradation cost; P_s and P_t denote, respectively, the penalty for the deviation of the SOC from the reference value SOC_ref and the penalty for excessive temperature; and ω_1 and ω_2 are the weights of P_s and P_t. C_b is calculated from equation (7):
C_b,i = λ·ΔSOH   (7)
where λ is the ratio of the battery replacement cost to the cost of one kilogram of fuel (N. Kittner, F. Lill, and D. M. Kammen, "Energy storage deployment and innovation for the clean energy transition," Nature Energy, vol. 2, 2017, Art. no. 17125).
The deviation of the SOC from the reference value SOC_ref and the penalty coefficient for excessive temperature are determined by equations (8) and (9):
P_s = τ_1·(SOC - SOC_ref)²   (8)
P_t = τ_2·max(0, T_emp - T_ref)   (9)
wherein SOC_ref = 0.6 is the battery SOC reference value and T_ref is the penalty trigger threshold, which may be set at 40 °C. τ_1 and τ_2 are adjustment coefficients that bring the battery SOC deviation and over-temperature penalties to the same order of magnitude as the engine fuel consumption rate.
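A sketch of how the multi-objective reward built from the fuel rate, degradation cost, SOC deviation, and over-temperature penalties might be evaluated per time step. The exact functional forms of the penalties and every coefficient value below (b, λ, τ_1, τ_2, ω_1, ω_2) are assumptions for illustration, not the patent's calibrated values:

```python
def reward(fuel_rate, soh_drop, soc, temp, b=1.0,
           soc_ref=0.6, t_ref=40.0, lam=200.0,
           tau1=10.0, tau2=0.1, w1=1.0, w2=1.0):
    """Multi-objective reward: offset b minus a weighted loss combining
    fuel rate, battery degradation cost, SOC deviation, and over-temperature.
    All coefficient values are illustrative assumptions."""
    c_b = lam * soh_drop                  # battery degradation cost
    p_s = tau1 * (soc - soc_ref) ** 2     # SOC deviation penalty
    p_t = tau2 * max(0.0, temp - t_ref)   # penalty only above the threshold
    loss = fuel_rate + c_b + w1 * p_s + w2 * p_t
    return b - loss
```

The scaling coefficients matter in practice: as the text notes, the SOC and temperature terms must land in the same order of magnitude as the fuel term, or the agent will optimize one objective at the expense of the others.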
In step 2, the basic architecture of the dual-delay depth deterministic policy gradient algorithm is shown in fig. 3.
wherein J denotes the loss function; M is the number of samples in a minibatch for gradient descent; θ^Q and θ^μ are the parameters of the Critic network and the Actor network, respectively; r denotes the reward function; ε denotes the exploration noise; τ denotes the soft-update factor; y denotes the temporal-difference target (TD error); and L_k denotes the accumulated error.
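The soft-update factor τ mentioned above refers to Polyak averaging of the target-network parameters toward the online parameters; the τ value below is illustrative:

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta_target <- tau*theta_online + (1-tau)*theta_target.
    Shown on flat lists of scalars for clarity; tau is illustrative."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]
```

Keeping τ small makes the target networks trail the online networks slowly, which is one of the stabilizing ingredients of TD3.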
The detailed parameters of the deep reinforcement learning TD3 agent are shown in table 2:
TABLE 2 TD3 agent specific parameters
[Table 2 is provided as an image in the original publication.]
The TD3 energy management policy implementation details are shown in table 3:
TABLE 3 TD3 Algorithm execution steps
[Table 3 is provided as an image in the original publication.]
wherein θ^Q and θ^μ are the parameters of the Critic network and the Actor network, respectively. The deep reinforcement learning agent passes the observation signals (including the engine fuel consumption rate, engine output torque, battery temperature, battery SOH, and battery SOC) to the Actor network, which outputs a control action a = μ(s|θ^μ) + N through the deterministic policy function μ(s) plus a random noise N. By executing the action a, the controlled object reaches a new state s' and receives a new reward r; the tuple (s, a, r, s') is stored in the experience replay buffer, from which M samples are drawn at random. The next state s' is fed into the Actor network of the target network to obtain the next action a'. The Critic network learns the value function Q(s, a) for the action a taken in state s using the Bellman equation, while the target Critic network produces the target Q value Q'(s, a) = E[r(s, a) + γQ'(s', a')], wherein Q'(s, a) denotes the target Q value, s the current observation, a the action selected by the Actor network of the agent, E the expectation operator, r(s, a) the reward obtained for that state and action, γ the discount factor, and Q'(s', a') the target Q value of the next state; the controlled object obtains the new state s' by executing the action a, and a' is the action for the next time step selected within the agent. The TD error is calculated as follows:
y_j = r_j + γ·min_{k=1,2} Q'_k(s'_j, a'_j),  L_k = (1/M)·Σ_{j=1..M} (y_j - Q_k(s_j, a_j))²   (k = 1, 2)
wherein y denotes the target Q value (TD target); L_k is the accumulated (mean squared TD) error; and Q_k(s_j, a_j) is the Q value estimated by the current network. The Actor network in the current network maps states to actions through the action-value function and is updated by gradient backpropagation of the neural network together with the soft-update strategy.
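The clipped double-Q target (taking the minimum over the two target Critics) and the per-critic mean-squared TD error described above can be sketched with NumPy as follows; the discount-factor value is illustrative:

```python
import numpy as np

def td3_targets(r, q1_next, q2_next, gamma=0.99, done=None):
    """Clipped double-Q target: y = r + gamma * min(Q1', Q2').
    The min over the two target critics is what curbs overestimation."""
    q_min = np.minimum(q1_next, q2_next)
    if done is None:
        done = np.zeros_like(r)          # no terminal transitions by default
    return r + gamma * (1.0 - done) * q_min

def critic_loss(y, q):
    """Mean squared TD error over the minibatch, one loss per critic."""
    return float(np.mean((y - q) ** 2))
```

Each Critic is regressed toward the same shared target y, while the Actor is updated less frequently (the "delayed" part of TD3), using the gradient of one Critic's Q value with respect to the action.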
In step 3, the deep reinforcement learning agent learns while interacting with the environment (vehicle and road conditions) and selects the action that maximizes the reward. However, the actions selected by the agent in the early stage are far from optimal and can have undesirable consequences, so the agent is first trained under standard operating conditions to obtain stable hyper-parameters (learning rate, number of neurons, number of network layers, replay buffer size, minibatch size, and the like) and is then applied to actual road conditions. A suitable standard operating condition is selected and imported into a driver model, which preprocesses the road-condition information: it takes the speed, acceleration, and grade of the cycle as input and outputs the speed, acceleration, and total torque demand required for driving. During training, the hyper-parameters of the TD3 agent are tuned according to the vehicle and operating-condition information, so that the TD3 agent can quickly and accurately select the optimal control action. The deep reinforcement learning TD3 network can be trained using, but is not limited to, three typical standard cycles. The speed profiles of the three cycles are shown in Fig. 4, and the characteristics of each cycle are listed in Table 4:
table 4 Standard Condition characteristics
[Table 4 is provided as an image in the original publication.]
In step 4, actual vehicle operating data are collected and turned into actual road-condition data, which are imported into the driver model, and energy management is performed by the trained deep reinforcement learning agent. At the same time, the trained deep reinforcement learning TD3 energy management strategy can be verified and its optimality tested. The actual road speed profile is shown in Fig. 5.
In conclusion, the method provided by the invention not only achieves optimal fuel economy during driving, but also keeps the battery within a suitable temperature range, prolongs battery service life, and achieves optimal multi-objective overall performance of the hybrid vehicle.
In an exemplary embodiment, there is also provided a multi-target HEV energy management system based on the twin-delayed deep deterministic policy gradient, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform all or part of the steps of the method.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, on which a computer program is stored, which when executed by a processor implements all or part of the steps of the method. For example, the non-transitory computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Claims (4)

1. An HEV energy management method, comprising:
establishing a dynamics model, a battery thermal model and a battery service life model of the parallel hybrid electric vehicle, and calculating from the three models the engine fuel consumption rate m_f, the engine output torque T_eng, the battery temperature Temp, the battery SOC and the battery SOH as control targets;
constructing a twin-delayed deep deterministic policy gradient (TD3) network;
taking the control targets, namely the engine fuel consumption rate, the engine output torque, the battery temperature, the battery SOH and the battery SOC, as the TD3 state-space signal S, taking the engine output torque as the TD3 action-space signal A, and formulating the return function r of TD3;
acquiring parameters and observation values influencing energy management during driving under standard vehicle working conditions, wherein these parameters and observation values, together with the return function r, are used to train the TD3 network so that it takes the action maximizing the return function r for a received state signal S, thereby obtaining a trained deep reinforcement learning agent;
and acquiring parameters and observation values influencing energy management during actual vehicle operation, including the control targets, namely the engine fuel consumption rate, the engine output torque, the battery temperature, the battery SOH and the battery SOC, and inputting them into the trained deep reinforcement learning agent for energy management.
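The TD3 network named in claim 1 is distinguished by two mechanisms: clipped double-Q targets (bootstrapping from the smaller of two target critics) and target-policy smoothing (clipped noise on the target action). A minimal sketch of both, using the engine-torque bounds of claim 2; the noise scales and all numeric inputs are illustrative assumptions.

```python
import numpy as np

def td3_target(r, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the twin target critics."""
    return r + gamma * min(q1_next, q2_next)

def smoothed_action(actor_out, rng, noise_std=2.0, clip=5.0, lo=-250.0, hi=841.0):
    """Target-policy smoothing: clipped Gaussian noise on the target action,
    then clipping to the engine-torque range [-250, 841] of claim 2."""
    eps = float(np.clip(rng.normal(0.0, noise_std), -clip, clip))
    return float(np.clip(actor_out + eps, lo, hi))

rng = np.random.default_rng(0)
y = td3_target(r=3.9, q1_next=10.0, q2_next=12.0)  # critic regression target
a = smoothed_action(840.0, rng)                     # perturbed target torque, still in range
```

The actor itself is updated less frequently than the critics (delayed policy updates), which is the third TD3 ingredient and is a training-schedule choice rather than an equation.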
2. The HEV energy management method of claim 1, wherein the TD3 state-space signal is S = (SOC, m_f, T_eng, Temp, SOH), the action-space signal is A = {T_eng | T_eng ∈ [-250, 841]}, and the return function is defined by equation (1):
r_i = -(ṁ_f,i + C_b,i + ω_1·P_s,i + ω_2·P_t,i) + b    (1)
wherein b is an offset used to adjust the range of the return function and i denotes the time step; ṁ_f,i denotes the engine fuel consumption rate; C_b denotes the battery degradation cost; P_s and P_t denote the penalties for the deviation of the SOC from its reference value SOC_ref and for excessive temperature, respectively; ω_1 and ω_2 are the weights of P_s and P_t; C_b is calculated from equation (2):
Cb,i=λΔSOH (2)
where λ is the ratio of the battery replacement cost to the cost of one kilogram of fuel;
the penalty P_s for the deviation of the SOC from its reference value SOC_ref and the penalty P_t for excessive temperature are determined by equation (8) and equation (9):

P_s,i = τ_1·(SOC_i - SOC_ref)²    (8)

P_t,i = τ_2·max(Temp_i - T_ref, 0)    (9)
wherein SOC_ref = 0.6 is the battery SOC reference value, T_ref is the penalty trigger threshold, which can be set to 40 °C, and τ_1 and τ_2 are adjustment coefficients that bring the battery SOC deviation penalty and the over-temperature penalty to the same order of magnitude as the engine fuel consumption rate.
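The one-step return of claim 2 can be sketched as follows. The overall structure (fuel term, degradation cost λ·ΔSOH, weighted SOC and temperature penalties, offset b) follows the claim, but the exact penalty forms and every coefficient value here are assumptions chosen for illustration.

```python
def reward(m_f, delta_soh, soc, temp,
           b=5.0,                   # offset (assumed value)
           lam=30.0,                # battery-replacement / fuel cost ratio (assumed)
           omega1=1.0, omega2=1.0,  # penalty weights (assumed)
           tau1=10.0, tau2=0.1,     # scaling coefficients (assumed)
           soc_ref=0.6, t_ref=40.0):
    """One-step return in the shape of claim 2; penalty forms are assumed."""
    c_b = lam * delta_soh                # battery degradation cost, eq. (2)
    p_s = tau1 * (soc - soc_ref) ** 2    # SOC-deviation penalty (assumed quadratic)
    p_t = tau2 * max(temp - t_ref, 0.0)  # over-temperature penalty (assumed hinge)
    return b - (m_f + c_b + omega1 * p_s + omega2 * p_t)

# A step with some fuel use, slight SOH loss, SOC below reference, battery at 45 °C.
r = reward(m_f=0.5, delta_soh=1e-3, soc=0.55, temp=45.0)
```

As intended, the return is highest when fuel use and degradation are low and the battery sits at its SOC reference below the temperature threshold, so maximizing it trades the three objectives against each other through ω_1 and ω_2.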
3. An HEV energy management system, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the method of any one of claims 1-2.
4. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 2.
CN202110654498.3A 2021-06-11 2021-06-11 TD 3-based multi-target HEV energy management method and system Active CN113246958B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110654498.3A CN113246958B (en) 2021-06-11 2021-06-11 TD 3-based multi-target HEV energy management method and system


Publications (2)

Publication Number Publication Date
CN113246958A true CN113246958A (en) 2021-08-13
CN113246958B CN113246958B (en) 2022-06-14

Family

ID=77187634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110654498.3A Active CN113246958B (en) 2021-06-11 2021-06-11 TD 3-based multi-target HEV energy management method and system

Country Status (1)

Country Link
CN (1) CN113246958B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102014222513A1 (en) * 2014-11-04 2016-05-04 Continental Automotive Gmbh Method of operating a hybrid or electric vehicle
CN108216201A (en) * 2016-12-21 2018-06-29 株式会社电装 Controller of vehicle, control method for vehicle and the recording medium for storing vehicle control program
CN110254418A (en) * 2019-06-28 2019-09-20 福州大学 A kind of hybrid vehicle enhancing study energy management control method
CN110341690A (en) * 2019-07-22 2019-10-18 北京理工大学 A kind of PHEV energy management method based on deterministic policy Gradient learning
CN112249002A (en) * 2020-09-23 2021-01-22 南京航空航天大学 Heuristic series-parallel hybrid power energy management method based on TD3
CN112440974A (en) * 2020-11-27 2021-03-05 武汉理工大学 HEV energy management method based on distributed depth certainty strategy gradient


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114290959A (en) * 2021-12-30 2022-04-08 重庆长安新能源汽车科技有限公司 Power battery active service life control method and system and computer readable storage medium
CN114290959B (en) * 2021-12-30 2023-05-23 重庆长安新能源汽车科技有限公司 Active life control method and system for power battery and computer readable storage medium
CN114852043A (en) * 2022-03-23 2022-08-05 武汉理工大学 HEV energy management method and system based on layered return TD3

Also Published As

Publication number Publication date
CN113246958B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
Yuan et al. Intelligent energy management strategy based on hierarchical approximate global optimization for plug-in fuel cell hybrid electric vehicles
Liu et al. Optimal power management based on Q-learning and neuro-dynamic programming for plug-in hybrid electric vehicles
Liu et al. Online energy management for multimode plug-in hybrid electric vehicles
CN111731303B (en) HEV energy management method based on deep reinforcement learning A3C algorithm
Liu et al. Reinforcement learning optimized look-ahead energy management of a parallel hybrid electric vehicle
Huang et al. Model predictive control power management strategies for HEVs: A review
CN107688343B (en) Energy control method of hybrid power vehicle
Zhang et al. A deep reinforcement learning-based energy management framework with Lagrangian relaxation for plug-in hybrid electric vehicle
CN113246958B (en) TD 3-based multi-target HEV energy management method and system
CN112668799A (en) Intelligent energy management method and storage medium for PHEV (Power electric vehicle) based on big driving data
CN113479186B (en) Energy management strategy optimization method for hybrid electric vehicle
Ngo Gear shift strategies for automotive transmissions
Zhu et al. Energy management of hybrid electric vehicles via deep Q-networks
CN116070783B (en) Learning type energy management method of hybrid transmission system under commute section
Montazeri-Gh et al. Driving condition recognition for genetic-fuzzy HEV control
CN115107733A (en) Energy management method and system for hybrid electric vehicle
CN115130266A (en) Method and system for thermal management control of a vehicle
CN115805840A (en) Energy consumption control method and system for range-extending type electric loader
Yang et al. Real-time energy management for a hybrid electric vehicle based on heuristic search
Liu Reinforcement learning-enabled intelligent energy management for hybrid electric vehicles
Gan et al. Intelligent learning algorithm and intelligent transportation-based energy management strategies for hybrid electric vehicles: A review
He et al. Deep reinforcement learning based energy management strategies for electrified vehicles: Recent advances and perspectives
Guo et al. Clustered energy management strategy of plug-in hybrid electric logistics vehicle based on Gaussian mixture model and stochastic dynamic programming
Wei et al. Priority-driven multi-objective model predictive control for integrated motion control and energy management of hybrid electric vehicles
Fechert et al. Using deep reinforcement learning for hybrid electric vehicle energy management under consideration of dynamic emission models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant