CN116811836A - Plug-in hybrid electric vehicle energy management method based on double-delay Q learning - Google Patents

Plug-in hybrid electric vehicle energy management method based on double-delay Q learning Download PDF

Info

Publication number
CN116811836A
Authority
CN
China
Prior art keywords: power, model, vehicle, SOC, delay
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310552153.6A
Other languages
Chinese (zh)
Inventor
沈世全
高顺
郭玉帆
陈峥
申江卫
王青旺
胡后征
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202310552153.6A priority Critical patent/CN116811836A/en
Publication of CN116811836A publication Critical patent/CN116811836A/en
Pending legal-status Critical Current

Landscapes

  • Electric Propulsion And Braking For Vehicles (AREA)

Abstract

The invention provides a plug-in hybrid electric vehicle energy management method based on double-delay Q learning, which comprises the following steps: step 1: acquiring historical vehicle speed and acceleration data of a vehicle; step 2: establishing a mathematical model of the power transmission system, a longitudinal vehicle dynamics model, and a power balance equation between the power components; step 3: solving an offline double-delay Q learning controller; step 4: training a convolutional neural network model with the historical vehicle speed and acceleration data to obtain a CNN vehicle speed prediction model; step 5: solving the optimal power distribution between the engine and the power battery within a model predictive control framework. Compared with traditional reinforcement learning algorithms, the method uses two Q functions to select and evaluate actions, which avoids overestimation of action values; meanwhile, two free parameters and three internal variables are introduced to delay the updating of the Q functions, which improves the stability of the algorithm and yields a stronger fuel-saving capability.

Description

Plug-in hybrid electric vehicle energy management method based on double-delay Q learning
Technical Field
The invention belongs to the technical field of new energy automobile electric control, and particularly relates to a plug-in hybrid electric vehicle energy management method based on double-delay Q learning.
Background
With growing automobile demand, energy shortage and environmental pollution are aggravated, and new energy vehicles have become the inevitable trend of automobile development. However, pure electric vehicles are limited by their short driving range and imperfect infrastructure such as charging piles, while traditional hybrid vehicles are limited by their small battery capacity and inflexible working modes. As a transitional product of vehicle electrification, the plug-in hybrid electric vehicle (PHEV) combines the advantages of the pure electric vehicle and the traditional hybrid electric vehicle: it can improve the running economy of the vehicle and also overcome the short driving range of the pure electric vehicle. It is therefore considered a feasible way to alleviate energy shortage, environmental pollution and related problems. As a key technology of the PHEV, the energy management strategy has a direct impact on its fuel economy; designing such strategies has long been a challenging task for academia and industry, and is an urgent requirement for the electrification of automotive powertrains.
At present, some energy management strategies have been applied in practice, such as strategies based on charge-depleting/charge-sustaining (CD/CS) rules and on dynamic programming algorithms. However, these strategies have drawbacks: the former depends heavily on human experience and has a limited optimization effect, while the latter requires the driving-condition information to be known in advance and is computationally intensive, making it difficult to apply to real vehicles. In addition, taking advantage of the development of artificial intelligence, some intelligent algorithms such as reinforcement learning have also been applied to PHEV energy management optimization; they can achieve a better control effect and have a higher fuel-saving potential. However, conventional reinforcement learning usually ignores the overestimation problem of the policy, resulting in poor policy optimization in a stochastic environment. Therefore, further improvement of such algorithms over the prior art is needed.
Disclosure of Invention
The invention aims to provide a plug-in hybrid electric vehicle energy management method based on double-delay Q learning, which solves problems of the prior art such as heavy reliance on engineering experience and specific models, the need to know driving-condition information in advance, poor control effect caused by overestimation of action values, and poor adaptability in stochastic environments.
In order to achieve the above purpose, the present invention provides the following technical solutions: the plug-in hybrid electric vehicle energy management method based on double-delay Q learning is characterized by comprising the following steps of:
step 1: acquiring historical vehicle speed and acceleration data of a vehicle by using GPS equipment;
step 2: establishing a mathematical model of a power transmission system, a vehicle dynamic longitudinal mathematical model and a power balance equation between power components;
step 3: based on a double-delay Q learning algorithm, an objective function is established, and an offline double-delay Q learning controller is solved;
step 4: offline training a convolutional neural network model by utilizing historical vehicle speed and acceleration data to obtain a CNN vehicle speed prediction model;
step 5: and on a model predictive control framework, realizing an MPC rolling optimization solving process by using a double-delay Q learning controller to obtain the optimal power distribution of the engine and the power battery.
Further, the mathematical model of the power transmission system in the step 2 includes: a drive motor model, a generator model, a power battery model, and a planetary transmission model;
the driving motor model is as follows:
where w_M1 and T_M1 are respectively the rotation speed and torque of the driving motor, P_M1 is the output power of the driving motor, and η_M1 is the working efficiency of the driving motor;
the generator model is as follows:
where η_M2 and P_M2 are respectively the working efficiency and output power of the generator, and w_M2 and T_M2 are respectively the rotation speed and torque of the generator;
the power battery model is as follows:
where V_oc and R_int denote respectively the open-circuit voltage and internal resistance of the battery, P_b is the output power of the battery, I_b is the power battery current, SOC is the power battery state of charge, and Q_b is the battery capacity;
the planetary gear transmission model is as follows:
where w_r, w_s and w_x are respectively the angular velocities of the planet carrier, sun gear and ring gear, T_r, T_s and T_x are respectively the torques of the planet carrier, sun gear and ring gear, and μ denotes the gear ratio of the planetary gear set.
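The component equations themselves appear as images in the source and are not reproduced in this text. A plausible reconstruction from the variable definitions above, assuming the standard forms used in PHEV modeling (the sign exponent handles motoring versus generating, the battery uses the common Rint equivalent-circuit model, and w_r is read as the planet-carrier speed), is:

```latex
P_{M1} = w_{M1} T_{M1}\,\eta_{M1}^{-\operatorname{sgn}(T_{M1})}
\qquad
P_{M2} = w_{M2} T_{M2}\,\eta_{M2}^{\operatorname{sgn}(T_{M2})}

I_b = \frac{V_{oc} - \sqrt{V_{oc}^{2} - 4 R_{int} P_b}}{2 R_{int}},
\qquad
\dot{SOC} = -\frac{I_b}{Q_b}

w_s + \mu\, w_x = (1 + \mu)\, w_r,
\qquad
T_s = \frac{T_r}{1 + \mu},
\qquad
T_x = \frac{\mu\, T_r}{1 + \mu}
```

These forms agree with the variable lists above but are an assumption, not the patent's verbatim equations.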
Further, the vehicle dynamics longitudinal model in the step 2 is as follows:
where P_req, F_w, F_i, F_f and F_j denote respectively the required power, air resistance, grade resistance, rolling resistance and acceleration resistance, v is the vehicle speed, C_d is the air resistance coefficient, m is the vehicle mass, ρ is the rolling resistance coefficient, g is the gravitational acceleration, θ is the road grade, δ is the rotational-inertia conversion coefficient, A is the frontal (windward) area of the vehicle, and v̇ is the acceleration;
the power balance equation between the power components is:
where P_ps is the final-drive (main reducer) power, P_e is the engine power, η_ps is the efficiency of the final drive, η_g denotes the efficiency of the transmission system, and P_aux denotes the accessory power; z is a variable belonging to {−1, 1}.
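As above, the equations are missing from the extracted text. A reconstruction consistent with the definitions follows; ρ_air denotes the air density, an added symbol, since the document uses ρ for the rolling resistance coefficient, and the power balance form is an assumption for this power-split topology:

```latex
P_{req} = v \left( F_w + F_i + F_f + F_j \right),
\qquad
F_w = \tfrac{1}{2} \rho_{air} C_d A v^{2},\quad
F_i = m g \sin\theta,\quad
F_f = m g\,\rho \cos\theta,\quad
F_j = \delta\, m\, \dot{v}

\frac{P_{req}}{\eta_{ps}} + P_{aux} = P_e + P_b\, \eta_g^{\,z},
\qquad z \in \{-1, 1\}
```

Here z = 1 when the battery discharges and z = −1 when it charges, so the transmission efficiency is applied in the appropriate direction.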
Further, in step 3, solving the offline double-delay Q learning controller includes the following steps:
s31: establishing an objective function:
where E_s is the judgment parameter for engine start and stop, and the second quantity is the instantaneous fuel consumption of the engine;
S32: determining the state variable s(t) ∈ S = {v, P_req, SOC} and the control variable a(t) ∈ A = {P_b};
where S is the state set comprising all states, v is the speed, P_req is the power demand, SOC is the state of charge of the power battery, A is the control set comprising all actions, and P_b is the power of the power battery;
S33: the energy of the plug-in hybrid electric vehicle is mainly provided by its engine and power battery, and the power split between them is the key factor affecting the vehicle's energy consumption. The double-delay Q learning reward function is therefore a function of the engine fuel consumption and the battery SOC at each moment. Specifically, the reward function of the double-delay Q learning controller is set as:
where a_1, a_2, a_3, b_1 and b_2 are all constant coefficients, and ΔSOC is the deviation between the actual SOC and the target SOC;
s34: according to the Bellman principle, establishing an update equation of a Q value function of double-delay Q learning;
where s′ is the next state obtained after the environment executes the optimal action a*, n is the number of visits experienced, λ ∈ (0, 1) is used to provide a continuous "exploration reward", γ is the discount factor, b is a random constant, d is a constant, U(s, a) stores the running sum of the estimated values of each (s, a) over n visits, and L(s, a) stores the number of visits of (s, a); LEARN(s, a) is a Boolean flag indicating whether collecting the Q value of the state-action pair (s, a) is allowed, LEARN(s, a) ∈ {true, false};
q value function update condition:
Steps S31-S34 are repeated until the number of executions is greater than or equal to the maximum number of iterations M, finally yielding the energy management control strategy of the plug-in hybrid electric vehicle under the driving cycle;
the optimized control strategy is expressed as:
the off-line double-delay Q learning controller solution is finished.
Further, in the step 4, the CNN vehicle speed prediction model is specifically obtained by the following steps:
s41: establishing a speed training set and a speed testing set according to the historical vehicle speed and the acceleration of the vehicle obtained in the step 1;
S42: selecting a training algorithm for the convolutional neural network model and setting its hyperparameters;
s43: and performing offline training on the convolutional neural network model by using the established speed training set to obtain a convolutional neural network prediction model, and checking the convolutional neural network prediction model by using the speed testing set to obtain a CNN vehicle speed prediction model.
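The patent does not disclose the CNN layer layout, so the following PyTorch sketch is purely illustrative: the layer counts, channel widths and the two input channels (speed and acceleration) are assumptions; only the window lengths L_h = 20 and h_p = 5 come from the embodiment below:

```python
import torch
import torch.nn as nn

L_h, h_p = 20, 5   # history window and prediction horizon (s), from the embodiment

class SpeedCNN(nn.Module):
    """1-D CNN mapping the last L_h samples of (speed, acceleration)
    to the next h_p speed values."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=3, padding=1),   # 2 input channels
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * L_h, h_p),                     # h_p predicted speeds
        )

    def forward(self, x):          # x: (batch, 2, L_h)
        return self.net(x)

model = SpeedCNN()
v_hist = torch.randn(1, 2, L_h)    # dummy history of speed + acceleration
v_pred = model(v_hist)             # shape (1, h_p): speeds for the next h_p seconds
```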
Further, the step 5 specifically includes:
s51: according to the CNN vehicle speed prediction model, solving the prediction speed in the prediction time domain; wherein, the mapping relation between the input and the output of the CNN vehicle speed prediction model is described as follows:
where the input is the vehicle speed over the past L_h seconds, the output is the predicted speed over the prediction time domain of h_p seconds, and f(·) is the mapping between the input and output of the CNN vehicle speed prediction model;
s52: determining an SOC reference track and required power in a prediction time domain;
the reference SOC calculation formula in the prediction time domain is:
where D_r, D_p and D denote respectively the distance travelled by the vehicle up to the current moment, the distance travelled within the prediction time domain, and the total driving distance of the driving cycle; Δt, h_p, v and v_p denote respectively the sampling time, the prediction time domain length, the current vehicle speed and the predicted vehicle speed; SOC_0, SOC_end and SOC_ref are respectively the initial SOC, the terminal SOC and the reference SOC trajectory value of the battery;
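The formula itself is omitted from the extracted text. A linear-with-distance reference of the following form is consistent with the variables defined above (an assumption, not the patent's verbatim formula):

```latex
D_p = \sum_{k=1}^{h_p} v_p(k)\, \Delta t,
\qquad
SOC_{ref} = SOC_0 - \frac{D_r + D_p}{D} \left( SOC_0 - SOC_{end} \right)
```

so that the battery charge is depleted roughly in proportion to the distance already travelled plus the distance expected within the prediction time domain.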
the calculation formula of the required power in the prediction time domain is as follows:
where the first symbol denotes the power demand sequence in the prediction time domain, v_p is the vehicle speed sequence in the prediction time domain, and the last symbol is the power in the prediction time domain;
s53: according to the model predictive control principle, an optimization index function in each prediction time domain is established:
where J_t is the objective function, h_p is the prediction time domain length, E_s(t) is the judgment parameter of engine start-stop, the next quantity is the instantaneous fuel consumption of the engine, and τ_soc is the power battery SOC penalty function;
where κ is a weight factor and SOC_ref is the reference SOC;
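The index function is likewise missing from this text; a reconstruction consistent with the variables above, writing ṁ_f for the (elided) instantaneous fuel-rate symbol and assuming a quadratic SOC penalty, is:

```latex
J_t = \sum_{k=1}^{h_p} \left[ E_s(t+k)\, \dot{m}_f(t+k) + \tau_{soc}(t+k) \right],
\qquad
\tau_{soc} = \kappa \left( SOC - SOC_{ref} \right)^{2}
```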
S54: using the offline double-delay Q learning controller together with the predicted vehicle speed, the reference SOC and the required power in the prediction time domain, the rolling optimization of the model predictive control based on double-delay Q learning is realized and the optimal control sequence in the prediction time domain is obtained; after feedback correction, the first element of the control sequence is applied to the plug-in hybrid electric vehicle system, and feedback correction is performed under the action of the reference SOC trajectory and the actual vehicle speed, thereby realizing online optimization of the energy management of the plug-in hybrid electric vehicle.
Further, the specific implementation steps of step S54 are as follows (a sketch of the loop follows this list):
S541: based on the historical vehicle speed, predicting the vehicle speed over the future h_p seconds using the CNN vehicle speed prediction model;
S542: calculating the vehicle required power in the prediction time domain based on the vehicle longitudinal dynamics model and the predicted speed;
S543: calculating the SOC trajectory in each prediction time domain based on the SOC reference trajectory;
S544: in each prediction time domain h_p, solving the rolling optimization control problem of the model predictive control using the offline double-delay Q learning controller obtained in step 3, obtaining the control variable sequence of each stage in the prediction time domain;
S545: after feedback correction, inputting the first element of the battery output power in the prediction time domain into the plug-in hybrid electric vehicle system as the output action of the model predictive control;
steps S541-S545 are then repeated until the entire driving cycle is completed.
Compared with the prior art, the invention has the following beneficial effects:
(1) Compared with traditional reinforcement learning algorithms, the plug-in hybrid electric vehicle energy management method based on double-delay Q learning provided by the invention uses two Q functions to select and evaluate actions, which avoids overestimation of action values; meanwhile, two free parameters and three internal variables are introduced to delay the updating of the Q functions, achieving unbiased estimation and improving the stability of the algorithm.
(2) The reward function comprehensively considers the influence of the engine fuel consumption and the power battery electricity consumption, giving a fast convergence speed and a better control effect.
(3) The vehicle speed prediction model is built with a convolutional neural network, which reduces noise, improves computational efficiency, alleviates overfitting, and achieves a better prediction effect.
Drawings
In order to make the description of the technical scheme of the present invention more clear, the technical scheme of the present invention will be specifically described with reference to the accompanying drawings. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings by those skilled in the art without inventive effort.
FIG. 1 is a topology of a power train of a plug-in hybrid vehicle;
FIG. 2 is a flow chart of the method of the present invention;
FIG. 3 is a graph showing the open circuit voltage and internal resistance of a power battery as a function of SOC;
FIG. 4 is a graph of the change in the reward value of the double-delay Q learning energy management strategy reward function;
FIG. 5 is a schematic diagram of a convolutional neural network;
FIG. 6 is a diagram of a model predictive energy management design framework in accordance with the present invention;
FIG. 7 is a schematic diagram of real-time energy management control of PHEV model predictive control with double-delay Q learning;
FIG. 8a is a graph showing the variation of SOC of a PHEV power battery under different strategies;
FIG. 8b is a plot of operating point profiles for a PHEV engine under different strategies;
FIG. 8c is a graph comparing energy consumption of PHEV under different strategies;
Detailed Description
The invention will be further described with reference to the drawings and the specific examples. The embodiment is implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific operation processes are given, but the protection scope of the invention is not limited to the following embodiments.
Example 1:
In this embodiment, a plug-in hybrid electric vehicle model is built with the vehicle simulation software Autonomie, developed on the basis of Matlab/Simulink. As shown in FIG. 1, the model comprises an engine, a driving motor, a generator, a battery pack, a final drive, a DC/AC inverter, and a planetary gear mechanism consisting of a sun gear, a planet carrier and a ring gear.
The specific flow of the plug-in hybrid electric vehicle energy management method based on double-delay Q learning provided in this embodiment is shown in FIG. 2, and specifically includes the following steps:
step 1: acquiring historical vehicle speed and acceleration data of a vehicle by using GPS equipment;
step 2: a mathematical model of the powertrain system, a vehicle dynamics model, and a power balance equation between the power components are established.
Wherein the mathematical model of the driveline comprises: a drive motor model, a generator model, a power battery model, and a planetary transmission model;
the driving motor model is as follows:
where w_M1 and T_M1 are respectively the rotation speed and torque of the driving motor, P_M1 is the output power of the driving motor, and η_M1 is the working efficiency of the driving motor;
the generator model is as follows:
where η_M2 and P_M2 are respectively the working efficiency and output power of the generator, and w_M2 and T_M2 are respectively the rotation speed and torque of the generator;
the power battery model is as follows:
where V_oc and R_int denote respectively the open-circuit voltage and internal resistance of the battery, P_b is the output power of the battery, I_b is the power battery current, SOC is the power battery state of charge, and Q_b is the battery capacity; the open-circuit voltage and internal resistance are interpolated from the power battery SOC as shown in FIG. 3.
The planetary transmission model is:
where w_r, w_s and w_x are respectively the angular velocities of the planet carrier, sun gear and ring gear, T_r, T_s and T_x are respectively the torques of the planet carrier, sun gear and ring gear, and μ denotes the gear ratio of the planetary gear set, set to 2.6 in this embodiment.
The vehicle dynamic longitudinal model is as follows:
where P_req, F_w, F_i, F_f and F_j denote respectively the required power, air resistance, grade resistance, rolling resistance and acceleration resistance, v is the vehicle speed, C_d is the air resistance coefficient, m is the vehicle mass, ρ is the rolling resistance coefficient, g is the gravitational acceleration, θ is the road grade, δ is the rotational-inertia conversion coefficient, A is the frontal (windward) area of the vehicle, and v̇ is the acceleration;
the power balance equation between the power components is:
where P_ps is the final-drive (main reducer) power, P_e is the engine power, η_ps is the efficiency of the final drive, η_g denotes the efficiency of the transmission system, and P_aux denotes the accessory power, set to 200 kW in this embodiment; z is a variable belonging to {−1, 1}.
Step 3: based on a double-delay Q learning algorithm, an objective function is established, and an offline double-delay Q learning controller is solved; specifically, the method for solving the offline double-delay Q learning controller comprises the following steps:
s31: establishing an objective function:
where E_s is the judgment parameter for engine start and stop, and the second quantity is the instantaneous fuel consumption of the engine;
S32: determining the state variable s(t) ∈ S = {v, P_req, SOC} and the control variable a(t) ∈ A = {P_b};
where S is the state set comprising all states, v is the speed, P_req is the power demand, SOC is the state of charge of the power battery, A is the control set comprising all actions, and P_b is the power of the power battery;
S33: the energy of the plug-in hybrid electric vehicle is mainly provided by its engine and power battery, and the power split between them is the key factor affecting the vehicle's energy consumption. The double-delay Q learning reward function is therefore a function of the engine fuel consumption and the battery SOC at each moment. Specifically, the reward function of the double-delay Q learning controller is set as:
where a_1, a_2, a_3, b_1 and b_2 are all constant coefficients, set in this embodiment to 10, −10³, 2.5×10⁴, 5×10⁴ and 10⁴ respectively; ΔSOC is the deviation between the actual SOC and the target SOC;
s34: according to the Bellman principle, establishing an update equation of a Q value function of double-delay Q learning;
where s′ is the next state obtained after the environment executes the optimal action a*; n is the number of visits experienced, set to 3 in this embodiment; λ ∈ (0, 1) is used to provide a continuous "exploration reward"; γ is the discount factor, set to 0.9; b is a random constant; d is a constant, set to 0.5; U(s, a) stores the running sum of the estimated values of each (s, a) over n visits, and L(s, a) stores the number of visits of (s, a); LEARN(s, a) is a Boolean flag indicating whether collecting the Q value of the state-action pair (s, a) is allowed, LEARN(s, a) ∈ {true, false};
q value function update condition:
Steps S31-S34 are repeated until the number of executions is greater than or equal to the maximum number of iterations M (M = 1000), finally yielding the energy management control strategy of the plug-in hybrid electric vehicle under the driving cycle.
The optimized control strategy is expressed as:
the off-line double-delay Q learning controller solution is finished.
FIG. 4 shows how the reward value of the double-delay Q learning strategy evolves over 1000 iterations: the reward value gradually increases with the number of iterations and converges after about 580 episodes, showing that through exploration and learning during the iterations, the offline double-delay Q learning controller applies the learned state-action pairs to solve for the optimal action and obtains the final optimal control sequence.
Step 4: the convolutional neural network model is trained offline by utilizing the historical vehicle speed and acceleration data to obtain a CNN vehicle speed prediction model, which is specifically as follows:
s41: establishing a speed training set and a speed testing set according to the historical vehicle speed and the acceleration of the vehicle obtained in the step 1;
S42: selecting sgdm as the training algorithm of the convolutional neural network model, setting the initial learning rate to 0.35, the learning-rate drop factor to 0.975, the drop period to 2, and the momentum to 0.925;
s43: and performing offline training on the convolutional neural network model by using the established speed training set to obtain a convolutional neural network prediction model, and checking the convolutional neural network prediction model by using the speed testing set to obtain a CNN vehicle speed prediction model.
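For reference, a sketch of the S42 training configuration translated to PyTorch; the embodiment's sgdm settings come from MATLAB, so this mapping, the stand-in network and the epoch count are assumptions rather than the inventors' code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

L_h, h_p = 20, 5
model = torch.nn.Sequential(            # stand-in for the CNN of step 4
    torch.nn.Flatten(), torch.nn.Linear(2 * L_h, h_p))
opt = torch.optim.SGD(model.parameters(), lr=0.35, momentum=0.925)
# learning rate multiplied by 0.975 every 2 epochs, per the embodiment
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=2, gamma=0.975)
loss_fn = torch.nn.MSELoss()

xs = torch.randn(256, 2, L_h)           # dummy (speed, acceleration) windows
ys = torch.randn(256, h_p)              # dummy future-speed targets
loader = DataLoader(TensorDataset(xs, ys), batch_size=32)

for epoch in range(20):                 # epoch count assumed
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    sched.step()                        # apply the piecewise schedule
```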
Step 5: on the model predictive control framework, as shown in FIG. 6, the MPC rolling optimization solving process is implemented with the double-delay Q learning controller to obtain the optimal power distribution between the engine and the power battery.
The method comprises the following steps:
s51: according to the CNN vehicle speed prediction model, solving the prediction speed in the prediction time domain; wherein, the mapping relation between the input and the output of the CNN vehicle speed prediction model is described as follows:
where the input is the vehicle speed over the past L_h seconds and the output is the predicted speed over the prediction time domain of h_p seconds; in this embodiment L_h and h_p are set to 20 s and 5 s respectively; f(·) is the mapping between the input and output of the CNN vehicle speed prediction model;
s52: determining an SOC reference track and required power in a prediction time domain;
the reference SOC calculation formula in the prediction time domain is:
where D_r, D_p and D denote respectively the distance travelled by the vehicle up to the current moment, the distance travelled within the prediction time domain, and the total driving distance of the driving cycle; Δt, h_p, v and v_p denote respectively the sampling time, the prediction time domain length, the current vehicle speed and the predicted vehicle speed; SOC_0, SOC_end and SOC_ref are respectively the initial SOC, the terminal SOC and the reference SOC trajectory value of the battery;
the calculation formula of the required power in the prediction time domain is as follows:
where the first symbol denotes the power demand sequence in the prediction time domain, v_p is the vehicle speed sequence in the prediction time domain, and the last symbol is the power in the prediction time domain;
s53: according to the model predictive control principle, an optimization index function in each prediction time domain is established:
where J_t is the objective function, h_p is the prediction time domain length, E_s(t) is the judgment parameter of engine start-stop, the next quantity is the instantaneous fuel consumption of the engine, and τ_soc is the power battery SOC penalty function;
where κ is a weight factor, set to 10⁸ in this embodiment, and SOC_ref is the reference SOC;
S54: using the offline double-delay Q learning controller together with the predicted vehicle speed, the reference SOC and the required power in the prediction time domain, the rolling optimization of the model predictive control based on double-delay Q learning is realized and the optimal control sequence in the prediction time domain is obtained; after feedback correction, the first element of the control sequence is applied to the plug-in hybrid electric vehicle system, and feedback correction is performed under the action of the reference SOC trajectory and the actual vehicle speed, thereby realizing online optimization of the energy management of the plug-in hybrid electric vehicle.
The specific implementation process in step S54 is shown in fig. 7, and specifically includes:
S541: based on the historical vehicle speed, predicting the vehicle speed over the future h_p seconds using the CNN vehicle speed prediction model;
S542: calculating the vehicle required power in the prediction time domain based on the vehicle longitudinal dynamics model and the predicted speed;
S543: calculating an SOC trajectory in each prediction time domain based on the SOC reference trajectory
S544: in each prediction time domain h_p, solving the rolling optimization control problem of the model predictive control using the offline double-delay Q learning controller obtained in step 3, obtaining the control variable sequence of each stage in the prediction time domain;
S545: after feedback correction, inputting the first element of the battery output power in the prediction time domain into the plug-in hybrid electric vehicle system as the output action of the model predictive control;
Steps S541-S545 are then repeated until the execution time is greater than or equal to the duration T of the 6UDDS driving cycle, T = 8220 s.
To verify the effectiveness of the algorithm, this embodiment uses a composite driving cycle, 6UDDS, composed of six consecutive UDDS cycles as the verification condition, and sets up a comparison group using the currently common charge-depleting/charge-sustaining (CD/CS) strategy; the two strategies are compared under the 6UDDS cycle in terms of power battery SOC, engine operating-point distribution, and equivalent fuel consumption.
FIG. 8a compares the SOC curves of the power battery of the plug-in hybrid electric vehicle under the CD/CS strategy and under the double-delay Q learning control strategy based on model predictive control (2DQL-MPC); the gray dotted line represents the SOC curve under the CD/CS strategy, and the black solid line represents the SOC curve under the 2DQL-MPC strategy.
FIG. 8b compares the distribution of the engine operating points of the plug-in hybrid electric vehicle under the two strategies; the black line marked with asterisks represents the optimal operating curve of the engine, and the circles represent the engine operating points. FIG. 8c compares the equivalent fuel consumption under the two strategies: the black curve represents the fuel consumption of the proposed method under the verification cycle, and the gray dotted line represents that of the CD/CS strategy. The fuel consumption trajectory of the proposed method is smoother, with no drastic jumps in energy consumption, showing that the proposed method effectively distributes power between the power sources of the plug-in hybrid electric vehicle and significantly reduces energy consumption during driving, which verifies the effectiveness and superiority of the plug-in hybrid electric vehicle energy management method based on double-delay Q learning.
The above embodiments merely illustrate preferred embodiments of the present invention, and the present invention is not limited to them. Various modifications and improvements made by those skilled in the art to the technical solution of the present invention without departing from its design concept shall fall within the protection scope of the present invention; the claimed technical content of the present invention is fully described in the claims.

Claims (7)

1. The plug-in hybrid electric vehicle energy management method based on double-delay Q learning is characterized by comprising the following steps of:
step 1: acquiring historical vehicle speed and acceleration data of a vehicle by using GPS equipment;
step 2: establishing a mathematical model of a power transmission system, a vehicle dynamic longitudinal mathematical model and a power balance equation between power components;
step 3: based on a double-delay Q learning algorithm, an objective function is established, and an offline double-delay Q learning controller is solved;
step 4: offline training a convolutional neural network model by utilizing historical vehicle speed and acceleration data to obtain a CNN vehicle speed prediction model;
step 5: and on a model predictive control framework, realizing an MPC rolling optimization solving process by using a double-delay Q learning controller to obtain the optimal power distribution of the engine and the power battery.
2. The method for energy management of a plug-in hybrid vehicle based on double-delay Q learning of claim 1, wherein said mathematical model of the powertrain system in step 2 comprises: a drive motor model, a generator model, a power battery model, and a planetary transmission model;
the driving motor model is as follows:
where w_M1 and T_M1 are respectively the rotation speed and torque of the driving motor, P_M1 is the output power of the driving motor, and η_M1 is the working efficiency of the driving motor;
the generator model is as follows:
where η_M2 and P_M2 are respectively the working efficiency and output power of the generator, and w_M2 and T_M2 are respectively the rotation speed and torque of the generator;
the power battery model is as follows:
where V_oc and R_int denote respectively the open-circuit voltage and internal resistance of the battery, P_b is the output power of the battery, I_b is the power battery current, SOC is the power battery state of charge, and Q_b is the battery capacity;
the planetary gear transmission model is as follows:
where w_r, w_s and w_x are respectively the angular velocities of the planet carrier, sun gear and ring gear, T_r, T_s and T_x are respectively the torques of the planet carrier, sun gear and ring gear, and μ denotes the gear ratio of the planetary gear set.
3. The energy management method of a plug-in hybrid vehicle based on double-delay Q learning of claim 1, wherein the vehicle dynamics longitudinal model in step 2 is:
where P_req, F_w, F_i, F_f and F_j denote respectively the required power, air resistance, grade resistance, rolling resistance and acceleration resistance, v is the vehicle speed, C_d is the air resistance coefficient, m is the vehicle mass, ρ is the rolling resistance coefficient, g is the gravitational acceleration, θ is the road grade, δ is the rotational-inertia conversion coefficient, A is the frontal (windward) area of the vehicle, and v̇ is the acceleration;
the power balance equation between the power components is:
where P_ps is the final-drive (main reducer) power, P_e is the engine power, η_ps is the efficiency of the final drive, η_g denotes the efficiency of the transmission system, and P_aux denotes the accessory power; z is a variable belonging to {−1, 1}.
4. The method for managing energy of a plug-in hybrid vehicle based on double-delay Q learning according to claim 1, wherein in step 3, solving the offline double-delay Q learning controller comprises the steps of:
s31: establishing an objective function:
where E_s is the judgment parameter for engine start and stop, and the second quantity is the instantaneous fuel consumption of the engine;
S32: determining the state variable s(t) ∈ S = {v, P_req, SOC} and the control variable a(t) ∈ A = {P_b};
where S is the state set comprising all states, v is the speed, P_req is the power demand, SOC is the state of charge of the power battery, A is the control set comprising all actions, and P_b is the power of the power battery;
S33: the energy of the plug-in hybrid electric vehicle is mainly provided by its engine and power battery, and the power split between them is the key factor affecting the vehicle's energy consumption. The double-delay Q learning reward function is therefore a function of the engine fuel consumption and the battery SOC at each moment. Specifically, the reward function of the double-delay Q learning controller is set as:
where a_1, a_2, a_3, b_1 and b_2 are all constant coefficients, and ΔSOC is the deviation between the actual SOC and the target SOC;
s34: according to the Bellman principle, establishing an update equation of a Q value function of double-delay Q learning;
where s′ is the next state obtained after the environment executes the optimal action a*, n is the number of visits experienced, λ ∈ (0, 1) is used to provide a continuous "exploration reward", γ is the discount factor, b is a random constant, d is a constant, U(s, a) stores the running sum of the estimated values of each (s, a) over n visits, and L(s, a) stores the number of visits of (s, a); LEARN(s, a) is a Boolean flag indicating whether collecting the Q value of the state-action pair (s, a) is allowed, LEARN(s, a) ∈ {true, false};
q value function update condition:
Steps S31-S34 are repeated until the number of executions is greater than or equal to the maximum number of iterations M, finally yielding the energy management control strategy of the plug-in hybrid electric vehicle under the driving cycle;
the optimized control strategy is expressed as:
the off-line double-delay Q learning controller solution is finished.
5. The energy management method of a plug-in hybrid electric vehicle based on double-delay Q learning according to claim 1, wherein in step 4, the CNN vehicle speed prediction model is specifically obtained by:
s41: establishing a speed training set and a speed testing set according to the historical vehicle speed and the acceleration of the vehicle obtained in the step 1;
S42: selecting a training algorithm for the convolutional neural network model and setting its hyperparameters;
s43: and performing offline training on the convolutional neural network model by using the established speed training set to obtain a convolutional neural network prediction model, and checking the convolutional neural network prediction model by using the speed testing set to obtain a CNN vehicle speed prediction model.
6. The method for energy management of a plug-in hybrid vehicle based on double-delay Q learning of claim 1, wherein step 5 specifically comprises:
s51: according to the CNN vehicle speed prediction model, solving the prediction speed in the prediction time domain; wherein, the mapping relation between the input and the output of the CNN vehicle speed prediction model is described as follows:
where the input is the vehicle speed over the past L_h seconds, the output is the predicted speed over the prediction time domain of h_p seconds, and f(·) is the mapping between the input and output of the CNN vehicle speed prediction model;
s52: determining an SOC reference track and required power in a prediction time domain;
the reference SOC calculation formula in the prediction time domain is:
where D_r, D_p and D denote respectively the distance travelled by the vehicle up to the current moment, the distance travelled within the prediction time domain, and the total driving distance of the driving cycle; Δt, h_p, v and v_p denote respectively the sampling time, the prediction time domain length, the current vehicle speed and the predicted vehicle speed; SOC_0, SOC_end and SOC_ref are respectively the initial SOC, the terminal SOC and the reference SOC trajectory value of the battery;
the calculation formula of the required power in the prediction time domain is as follows:
where the first symbol denotes the power demand sequence in the prediction time domain, v_p is the vehicle speed sequence in the prediction time domain, and the last symbol is the power in the prediction time domain;
s53: according to the model predictive control principle, an optimization index function in each prediction time domain is established:
where J_t is the objective function, h_p is the prediction time domain length, E_s(t) is the judgment parameter of engine start-stop, the next quantity is the instantaneous fuel consumption of the engine, and τ_soc is the power battery SOC penalty function;
where κ is a weight factor and SOC_ref is the reference SOC;
S54: using the offline double-delay Q learning controller together with the predicted vehicle speed, the reference SOC and the required power in the prediction time domain, the rolling optimization of the model predictive control based on double-delay Q learning is realized and the optimal control sequence in the prediction time domain is obtained; after feedback correction, the first element of the control sequence is applied to the plug-in hybrid electric vehicle system, and feedback correction is performed under the action of the reference SOC trajectory and the actual vehicle speed, thereby realizing online optimization of the energy management of the plug-in hybrid electric vehicle.
7. The energy management method for a plug-in hybrid vehicle based on double-delay Q learning of claim 6, wherein the implementation steps in step S54 are as follows:
S541: based on the historical vehicle speed, predicting the vehicle speed over the future h_p seconds using the CNN vehicle speed prediction model;
S542: calculating the vehicle required power in the prediction time domain based on the vehicle longitudinal dynamics model and the predicted speed;
S543: calculating an SOC trajectory in each prediction time domain based on the SOC reference trajectory
S544: in each prediction time domain h_p, solving the rolling optimization control problem of the model predictive control using the offline double-delay Q learning controller obtained in step 3, obtaining the control variable sequence of each stage in the prediction time domain;
S545: after feedback correction, inputting the first element of the battery output power in the prediction time domain into the plug-in hybrid electric vehicle system as the output action of the model predictive control;
steps S541-S545 are then repeated until the entire driving cycle is completed.
CN202310552153.6A 2023-05-16 2023-05-16 Plug-in hybrid electric vehicle energy management method based on double-delay Q learning Pending CN116811836A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310552153.6A CN116811836A (en) 2023-05-16 2023-05-16 Plug-in hybrid electric vehicle energy management method based on double-delay Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310552153.6A CN116811836A (en) 2023-05-16 2023-05-16 Plug-in hybrid electric vehicle energy management method based on double-delay Q learning

Publications (1)

Publication Number Publication Date
CN116811836A true CN116811836A (en) 2023-09-29

Family

ID=88111776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310552153.6A Pending CN116811836A (en) 2023-05-16 2023-05-16 Plug-in hybrid electric vehicle energy management method based on double-delay Q learning

Country Status (1)

Country Link
CN (1) CN116811836A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117984983A (en) * 2024-04-03 2024-05-07 中汽研汽车检验中心(天津)有限公司 Hybrid vehicle energy real-time control method, vehicle controller and hybrid vehicle


Similar Documents

Publication Publication Date Title
CN111267831B (en) Intelligent time-domain-variable model prediction energy management method for hybrid electric vehicle
Chen et al. Optimal strategies of energy management integrated with transmission control for a hybrid electric vehicle using dynamic particle swarm optimization
CN112776673B (en) Intelligent network fuel cell automobile real-time energy optimization management system
CN109591659B (en) Intelligent learning pure electric vehicle energy management control method
WO2021159660A1 (en) Energy management method and system for hybrid vehicle
CN112249002B (en) TD 3-based heuristic series-parallel hybrid power energy management method
CN113554337B (en) Plug-in hybrid electric vehicle energy management strategy construction method integrating traffic information
CN112765723B (en) Curiosity-driven hybrid power system deep reinforcement learning energy management method
Xu et al. Learning time reduction using warm-start methods for a reinforcement learning-based supervisory control in hybrid electric vehicle applications
CN112757922B (en) Hybrid power energy management method and system for vehicle fuel cell
CN113135113B (en) Global SOC (System on chip) planning method and device
CN111923897A (en) Intelligent energy management method for plug-in hybrid electric vehicle
CN115230485B (en) Fuel cell bus energy management method based on short-term power smooth prediction
Shen et al. Two-level energy control strategy based on ADP and A-ECMS for series hybrid electric vehicles
CN113104023B (en) Distributed MPC network-connected hybrid electric vehicle energy management system and method
CN106055830A (en) PHEV (Plug-in Hybrid Electric Vehicle) control threshold parameter optimization method based on dynamic programming
CN116811836A (en) Plug-in hybrid electric vehicle energy management method based on double-delay Q learning
Li et al. A real-time energy management strategy combining rule-based control and ECMS with optimization equivalent factor for HEVs
CN113581163B (en) Multimode PHEV mode switching optimization and energy management method based on LSTM
CN114969982A (en) Fuel cell automobile deep reinforcement learning energy management method based on strategy migration
Zhang et al. Uncertainty-Aware Energy Management Strategy for Hybrid Electric Vehicle Using Hybrid Deep Learning Method
Wang et al. Hierarchical rewarding deep deterministic policy gradient strategy for energy management of hybrid electric vehicles
Dorri et al. Design of an optimal control strategy in a parallel hybrid vehicle in order to simultaneously reduce fuel consumption and emissions
CN111891109B (en) Hybrid electric vehicle energy optimal distribution control method based on non-cooperative game theory
CN114670803A (en) Parallel hybrid electric vehicle energy management method based on self-supervision learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination