CN116811836A - Plug-in hybrid electric vehicle energy management method based on double-delay Q learning - Google Patents

Plug-in hybrid electric vehicle energy management method based on double-delay Q learning Download PDF

Info

Publication number
CN116811836A
Authority
CN
China
Prior art keywords: power, model, vehicle, SOC, delay
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310552153.6A
Other languages
Chinese (zh)
Inventor
沈世全
高顺
郭玉帆
陈峥
申江卫
王青旺
胡后征
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202310552153.6A priority Critical patent/CN116811836A/en
Publication of CN116811836A publication Critical patent/CN116811836A/en
Pending legal-status Critical Current

Landscapes

  • Electric Propulsion And Braking For Vehicles (AREA)

Abstract

The invention provides a plug-in hybrid electric vehicle energy management method based on double-delay Q learning, which comprises the following steps: step 1: acquiring historical vehicle speed and acceleration data of a vehicle; step 2: establishing a mathematical model of the power transmission system, a longitudinal vehicle dynamics model, and a power balance equation between the power components; step 3: solving an offline double-delay Q learning controller; step 4: training a convolutional neural network model with the historical vehicle speed and acceleration data to obtain a CNN vehicle speed prediction model; step 5: solving the optimal power distribution between the engine and the power battery within a model predictive control framework. Compared with traditional reinforcement learning algorithms, the method uses two Q functions to select and evaluate actions, which avoids overestimation of action values; meanwhile, two free parameters and three internal variables are introduced to delay the updating of the Q functions, which improves the stability of the algorithm and yields a stronger fuel-saving capability.

Description

Plug-in hybrid electric vehicle energy management method based on double-delay Q learning
Technical Field
The invention belongs to the technical field of new energy automobile electric control, and particularly relates to a plug-in hybrid electric vehicle energy management method based on double-delay Q learning.
Background
With growing automobile demand, energy shortage and environmental pollution are aggravated, and new energy vehicles have become the inevitable trend of automobile development. However, pure electric vehicles are limited by their short driving range and imperfect infrastructure such as charging piles, while traditional hybrid vehicles are limited by their small battery capacity and inflexible working modes. As a transitional product of vehicle electrification, the plug-in hybrid electric vehicle (PHEV) combines the advantages of the pure electric vehicle and the traditional hybrid electric vehicle: it can improve the running economy of the vehicle and also overcome the short driving range of the pure electric vehicle. It is therefore considered a feasible way to alleviate energy shortage, environmental pollution and related problems. As a key technology of the PHEV, the energy management strategy has a direct impact on its fuel economy; designing such strategies has long been a challenging task for academia and industry, and is an urgent requirement for the electrification of automotive powertrains.
At present, some energy management strategies have been applied in practice, such as strategies based on charge-depleting/charge-sustaining (CD/CS) rules and on dynamic programming algorithms. However, these strategies have drawbacks: the former depends heavily on human experience and has a limited optimization effect, while the latter requires the driving-condition information to be known in advance and is computationally intensive, making it difficult to apply to real vehicles. In addition, taking advantage of the development of artificial intelligence, some intelligent algorithms such as reinforcement learning have also been applied to PHEV energy management optimization; they can achieve a better control effect and have a higher fuel-saving potential. However, conventional reinforcement learning usually ignores the overestimation problem of the policy, resulting in poor policy optimization in a stochastic environment. Therefore, further improvement of such algorithms over the prior art is needed.
Disclosure of Invention
The invention aims to provide a plug-in hybrid electric vehicle energy management method based on double-delay Q learning, which solves problems of the prior art such as heavy reliance on engineering experience and specific models, the need to know driving-condition information in advance, poor control effect caused by overestimation of action values, and poor adaptability in stochastic environments.
In order to achieve the above purpose, the present invention provides the following technical solutions: the plug-in hybrid electric vehicle energy management method based on double-delay Q learning is characterized by comprising the following steps of:
step 1: acquiring historical vehicle speed and acceleration data of a vehicle by using GPS equipment;
step 2: establishing a mathematical model of a power transmission system, a vehicle dynamic longitudinal mathematical model and a power balance equation between power components;
step 3: based on a double-delay Q learning algorithm, an objective function is established, and an offline double-delay Q learning controller is solved;
step 4: offline training a convolutional neural network model by utilizing historical vehicle speed and acceleration data to obtain a CNN vehicle speed prediction model;
step 5: and on a model predictive control framework, realizing an MPC rolling optimization solving process by using a double-delay Q learning controller to obtain the optimal power distribution of the engine and the power battery.
Further, the mathematical model of the power transmission system in the step 2 includes: a drive motor model, a generator model, a power battery model, and a planetary transmission model;
the driving motor model is as follows:
where w_M1 and T_M1 are respectively the rotation speed and torque of the driving motor, P_M1 is the output power of the driving motor, and η_M1 is the working efficiency of the driving motor;
the generator model is as follows:
where η_M2 and P_M2 are respectively the working efficiency and output power of the generator, and w_M2 and T_M2 are respectively the rotation speed and torque of the generator;
the power battery model is as follows:
where V_oc and R_int denote respectively the open-circuit voltage and internal resistance of the battery, P_b is the output power of the battery, I_b is the power battery current, SOC is the power battery state of charge, and Q_b is the battery capacity;
the planetary gear transmission model is as follows:
where w_r, w_s and w_x are respectively the angular velocities of the planet carrier, sun gear and ring gear, T_r, T_s and T_x are respectively the torques of the planet carrier, sun gear and ring gear, and μ denotes the gear ratio of the planetary gear set.
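The component equations themselves appear as images in the source and are not reproduced in this text. A plausible reconstruction from the variable definitions above, assuming the standard forms used in PHEV modeling (the sign exponent handles motoring versus generating, the battery uses the common Rint equivalent-circuit model, and w_r is read as the planet-carrier speed), is:

```latex
P_{M1} = w_{M1} T_{M1}\,\eta_{M1}^{-\operatorname{sgn}(T_{M1})}
\qquad
P_{M2} = w_{M2} T_{M2}\,\eta_{M2}^{\operatorname{sgn}(T_{M2})}

I_b = \frac{V_{oc} - \sqrt{V_{oc}^{2} - 4 R_{int} P_b}}{2 R_{int}},
\qquad
\dot{SOC} = -\frac{I_b}{Q_b}

w_s + \mu\, w_x = (1 + \mu)\, w_r,
\qquad
T_s = \frac{T_r}{1 + \mu},
\qquad
T_x = \frac{\mu\, T_r}{1 + \mu}
```

These forms agree with the variable lists above but are an assumption, not the patent's verbatim equations.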
Further, the vehicle dynamics longitudinal model in the step 2 is as follows:
where P_req, F_w, F_i, F_f and F_j denote respectively the required power, air resistance, grade resistance, rolling resistance and acceleration resistance, v is the vehicle speed, C_d is the air resistance coefficient, m is the vehicle mass, ρ is the rolling resistance coefficient, g is the gravitational acceleration, θ is the road grade, δ is the rotational-inertia conversion coefficient, A is the frontal (windward) area of the vehicle, and v̇ is the acceleration;
the power balance equation between the power components is:
where P_ps is the final-drive (main reducer) power, P_e is the engine power, η_ps is the efficiency of the final drive, η_g denotes the efficiency of the transmission system, and P_aux denotes the accessory power; z is a variable belonging to {−1, 1}.
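As above, the equations are missing from the extracted text. A reconstruction consistent with the definitions follows; ρ_air denotes the air density, an added symbol, since the document uses ρ for the rolling resistance coefficient, and the power balance form is an assumption for this power-split topology:

```latex
P_{req} = v \left( F_w + F_i + F_f + F_j \right),
\qquad
F_w = \tfrac{1}{2} \rho_{air} C_d A v^{2},\quad
F_i = m g \sin\theta,\quad
F_f = m g\,\rho \cos\theta,\quad
F_j = \delta\, m\, \dot{v}

\frac{P_{req}}{\eta_{ps}} + P_{aux} = P_e + P_b\, \eta_g^{\,z},
\qquad z \in \{-1, 1\}
```

Here z = 1 when the battery discharges and z = −1 when it charges, so the transmission efficiency is applied in the appropriate direction.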
Further, in step 3, solving the offline double-delay Q learning controller includes the following steps:
s31: establishing an objective function:
where E_s is the judgment parameter for engine start and stop, and the second quantity is the instantaneous fuel consumption of the engine;
S32: determining the state variable s(t) ∈ S = {v, P_req, SOC} and the control variable a(t) ∈ A = {P_b};
where S is the state set comprising all states, v is the speed, P_req is the power demand, SOC is the state of charge of the power battery, A is the control set comprising all actions, and P_b is the power of the power battery;
S33: the energy of the plug-in hybrid electric vehicle is mainly provided by its engine and power battery, and the power split between them is the key factor affecting the vehicle's energy consumption. The double-delay Q learning reward function is therefore a function of the engine fuel consumption and the battery SOC at each moment. Specifically, the reward function of the double-delay Q learning controller is set as:
where a_1, a_2, a_3, b_1 and b_2 are all constant coefficients, and ΔSOC is the deviation between the actual SOC and the target SOC;
s34: according to the Bellman principle, establishing an update equation of a Q value function of double-delay Q learning;
where s′ is the next state obtained after the environment executes the optimal action a*, n is the number of visits experienced, λ ∈ (0, 1) is used to provide a continuous "exploration reward", γ is the discount factor, b is a random constant, d is a constant, U(s, a) stores the running sum of the estimated values of each (s, a) over n visits, and L(s, a) stores the number of visits of (s, a); LEARN(s, a) is a Boolean flag indicating whether collecting the Q value of the state-action pair (s, a) is allowed, LEARN(s, a) ∈ {true, false};
q value function update condition:
Steps S31-S34 are repeated until the number of executions is greater than or equal to the maximum number of iterations M, finally yielding the energy management control strategy of the plug-in hybrid electric vehicle under the driving cycle;
the optimized control strategy is expressed as:
the off-line double-delay Q learning controller solution is finished.
Further, in the step 4, the CNN vehicle speed prediction model is specifically obtained by the following steps:
s41: establishing a speed training set and a speed testing set according to the historical vehicle speed and the acceleration of the vehicle obtained in the step 1;
S42: selecting a training algorithm for the convolutional neural network model and setting its hyperparameters;
s43: and performing offline training on the convolutional neural network model by using the established speed training set to obtain a convolutional neural network prediction model, and checking the convolutional neural network prediction model by using the speed testing set to obtain a CNN vehicle speed prediction model.
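The patent does not disclose the CNN layer layout, so the following PyTorch sketch is purely illustrative: the layer counts, channel widths and the two input channels (speed and acceleration) are assumptions; only the window lengths L_h = 20 and h_p = 5 come from the embodiment below:

```python
import torch
import torch.nn as nn

L_h, h_p = 20, 5   # history window and prediction horizon (s), from the embodiment

class SpeedCNN(nn.Module):
    """1-D CNN mapping the last L_h samples of (speed, acceleration)
    to the next h_p speed values."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=3, padding=1),   # 2 input channels
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * L_h, h_p),                     # h_p predicted speeds
        )

    def forward(self, x):          # x: (batch, 2, L_h)
        return self.net(x)

model = SpeedCNN()
v_hist = torch.randn(1, 2, L_h)    # dummy history of speed + acceleration
v_pred = model(v_hist)             # shape (1, h_p): speeds for the next h_p seconds
```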
Further, the step 5 specifically includes:
s51: according to the CNN vehicle speed prediction model, solving the prediction speed in the prediction time domain; wherein, the mapping relation between the input and the output of the CNN vehicle speed prediction model is described as follows:
where the input is the vehicle speed over the past L_h seconds, the output is the predicted speed over the prediction time domain of h_p seconds, and f(·) is the mapping between the input and output of the CNN vehicle speed prediction model;
s52: determining an SOC reference track and required power in a prediction time domain;
the reference SOC calculation formula in the prediction time domain is:
where D_r, D_p and D denote respectively the distance travelled by the vehicle up to the current moment, the distance travelled within the prediction time domain, and the total driving distance of the driving cycle; Δt, h_p, v and v_p denote respectively the sampling time, the prediction time domain length, the current vehicle speed and the predicted vehicle speed; SOC_0, SOC_end and SOC_ref are respectively the initial SOC, the terminal SOC and the reference SOC trajectory value of the battery;
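The formula itself is omitted from the extracted text. A linear-with-distance reference of the following form is consistent with the variables defined above (an assumption, not the patent's verbatim formula):

```latex
D_p = \sum_{k=1}^{h_p} v_p(k)\, \Delta t,
\qquad
SOC_{ref} = SOC_0 - \frac{D_r + D_p}{D} \left( SOC_0 - SOC_{end} \right)
```

so that the battery charge is depleted roughly in proportion to the distance already travelled plus the distance expected within the prediction time domain.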
the calculation formula of the required power in the prediction time domain is as follows:
where the first symbol denotes the power demand sequence in the prediction time domain, v_p is the vehicle speed sequence in the prediction time domain, and the last symbol is the power in the prediction time domain;
s53: according to the model predictive control principle, an optimization index function in each prediction time domain is established:
where J_t is the objective function, h_p is the prediction time domain length, E_s(t) is the judgment parameter of engine start-stop, the next quantity is the instantaneous fuel consumption of the engine, and τ_soc is the power battery SOC penalty function;
where κ is a weight factor and SOC_ref is the reference SOC;
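The index function is likewise missing from this text; a reconstruction consistent with the variables above, writing ṁ_f for the (elided) instantaneous fuel-rate symbol and assuming a quadratic SOC penalty, is:

```latex
J_t = \sum_{k=1}^{h_p} \left[ E_s(t+k)\, \dot{m}_f(t+k) + \tau_{soc}(t+k) \right],
\qquad
\tau_{soc} = \kappa \left( SOC - SOC_{ref} \right)^{2}
```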
S54: using the offline double-delay Q learning controller together with the predicted vehicle speed, the reference SOC and the required power in the prediction time domain, the rolling optimization of the model predictive control based on double-delay Q learning is realized and the optimal control sequence in the prediction time domain is obtained; after feedback correction, the first element of the control sequence is applied to the plug-in hybrid electric vehicle system, and feedback correction is performed under the action of the reference SOC trajectory and the actual vehicle speed, thereby realizing online optimization of the energy management of the plug-in hybrid electric vehicle.
Further, the specific implementation steps of step S54 are as follows (a sketch of the loop follows this list):
S541: based on the historical vehicle speed, predicting the vehicle speed over the future h_p seconds using the CNN vehicle speed prediction model;
S542: calculating the vehicle required power in the prediction time domain based on the vehicle longitudinal dynamics model and the predicted speed;
S543: calculating the SOC trajectory in each prediction time domain based on the SOC reference trajectory;
S544: in each prediction time domain h_p, solving the rolling optimization control problem of the model predictive control using the offline double-delay Q learning controller obtained in step 3, obtaining the control variable sequence of each stage in the prediction time domain;
S545: after feedback correction, inputting the first element of the battery output power in the prediction time domain into the plug-in hybrid electric vehicle system as the output action of the model predictive control;
steps S541-S545 are then repeated until the entire driving cycle is completed.
Compared with the prior art, the invention has the following beneficial effects:
(1) Compared with traditional reinforcement learning algorithms, the plug-in hybrid electric vehicle energy management method based on double-delay Q learning provided by the invention uses two Q functions to select and evaluate actions, which avoids overestimation of action values; meanwhile, two free parameters and three internal variables are introduced to delay the updating of the Q functions, achieving unbiased estimation and improving the stability of the algorithm.
(2) The reward function comprehensively considers the influence of the engine fuel consumption and the power battery electricity consumption, giving a fast convergence speed and a better control effect.
(3) The vehicle speed prediction model is built with a convolutional neural network, which reduces noise, improves computational efficiency, alleviates overfitting, and achieves a better prediction effect.
Drawings
In order to make the description of the technical scheme of the present invention more clear, the technical scheme of the present invention will be specifically described with reference to the accompanying drawings. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings by those skilled in the art without inventive effort.
FIG. 1 is a topology of a power train of a plug-in hybrid vehicle;
FIG. 2 is a flow chart of the method of the present invention;
FIG. 3 is a graph showing the open circuit voltage and internal resistance of a power battery as a function of SOC;
FIG. 4 is a graph of the change in the reward value of the double-delay Q learning energy management strategy reward function;
FIG. 5 is a schematic diagram of a convolutional neural network;
FIG. 6 is a diagram of a model predictive energy management design framework in accordance with the present invention;
FIG. 7 is a schematic diagram of real-time energy management control of PHEV model predictive control with double-delay Q learning;
FIG. 8a is a graph showing the variation of SOC of a PHEV power battery under different strategies;
FIG. 8b is a plot of operating point profiles for a PHEV engine under different strategies;
FIG. 8c is a graph comparing energy consumption of PHEV under different strategies;
Detailed Description
The invention will be further described with reference to the drawings and the specific examples. The embodiment is implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific operation processes are given, but the protection scope of the invention is not limited to the following embodiments.
Example 1:
In this embodiment, a plug-in hybrid electric vehicle model is built with the vehicle simulation software Autonomie, developed on the basis of Matlab/Simulink. As shown in FIG. 1, the model comprises an engine, a driving motor, a generator, a battery pack, a final drive, a DC/AC inverter, and a planetary gear mechanism consisting of a sun gear, a planet carrier and a ring gear.
The specific flow of the plug-in hybrid electric vehicle energy management method based on double-delay Q learning provided in this embodiment is shown in FIG. 2, and specifically includes the following steps:
step 1: acquiring historical vehicle speed and acceleration data of a vehicle by using GPS equipment;
step 2: a mathematical model of the powertrain system, a vehicle dynamics model, and a power balance equation between the power components are established.
Wherein the mathematical model of the driveline comprises: a drive motor model, a generator model, a power battery model, and a planetary transmission model;
the driving motor model is as follows:
where w_M1 and T_M1 are respectively the rotation speed and torque of the driving motor, P_M1 is the output power of the driving motor, and η_M1 is the working efficiency of the driving motor;
the generator model is as follows:
where η_M2 and P_M2 are respectively the working efficiency and output power of the generator, and w_M2 and T_M2 are respectively the rotation speed and torque of the generator;
the power battery model is as follows:
where V_oc and R_int denote respectively the open-circuit voltage and internal resistance of the battery, P_b is the output power of the battery, I_b is the power battery current, SOC is the power battery state of charge, and Q_b is the battery capacity; the open-circuit voltage and internal resistance are interpolated from the power battery SOC as shown in FIG. 3.
The planetary transmission model is:
where w_r, w_s and w_x are respectively the angular velocities of the planet carrier, sun gear and ring gear, T_r, T_s and T_x are respectively the torques of the planet carrier, sun gear and ring gear, and μ denotes the gear ratio of the planetary gear set, set to 2.6 in this embodiment.
The vehicle dynamic longitudinal model is as follows:
where P_req, F_w, F_i, F_f and F_j denote respectively the required power, air resistance, grade resistance, rolling resistance and acceleration resistance, v is the vehicle speed, C_d is the air resistance coefficient, m is the vehicle mass, ρ is the rolling resistance coefficient, g is the gravitational acceleration, θ is the road grade, δ is the rotational-inertia conversion coefficient, A is the frontal (windward) area of the vehicle, and v̇ is the acceleration;
the power balance equation between the power components is:
where P_ps is the final-drive (main reducer) power, P_e is the engine power, η_ps is the efficiency of the final drive, η_g denotes the efficiency of the transmission system, and P_aux denotes the accessory power, set to 200 kW in this embodiment; z is a variable belonging to {−1, 1}.
Step 3: based on a double-delay Q learning algorithm, an objective function is established, and an offline double-delay Q learning controller is solved; specifically, the method for solving the offline double-delay Q learning controller comprises the following steps:
s31: establishing an objective function:
where E_s is the judgment parameter for engine start and stop, and the second quantity is the instantaneous fuel consumption of the engine;
S32: determining the state variable s(t) ∈ S = {v, P_req, SOC} and the control variable a(t) ∈ A = {P_b};
where S is the state set comprising all states, v is the speed, P_req is the power demand, SOC is the state of charge of the power battery, A is the control set comprising all actions, and P_b is the power of the power battery;
S33: the energy of the plug-in hybrid electric vehicle is mainly provided by its engine and power battery, and the power split between them is the key factor affecting the vehicle's energy consumption. The double-delay Q learning reward function is therefore a function of the engine fuel consumption and the battery SOC at each moment. Specifically, the reward function of the double-delay Q learning controller is set as:
where a_1, a_2, a_3, b_1 and b_2 are all constant coefficients, set in this embodiment to 10, −10³, 2.5×10⁴, 5×10⁴ and 10⁴ respectively; ΔSOC is the deviation between the actual SOC and the target SOC;
s34: according to the Bellman principle, establishing an update equation of a Q value function of double-delay Q learning;
where s′ is the next state obtained after the environment executes the optimal action a*; n is the number of visits experienced, set to 3 in this embodiment; λ ∈ (0, 1) is used to provide a continuous "exploration reward"; γ is the discount factor, set to 0.9; b is a random constant; d is a constant, set to 0.5; U(s, a) stores the running sum of the estimated values of each (s, a) over n visits, and L(s, a) stores the number of visits of (s, a); LEARN(s, a) is a Boolean flag indicating whether collecting the Q value of the state-action pair (s, a) is allowed, LEARN(s, a) ∈ {true, false};
q value function update condition:
Steps S31-S34 are repeated until the number of executions is greater than or equal to the maximum number of iterations M (M = 1000), finally yielding the energy management control strategy of the plug-in hybrid electric vehicle under the driving cycle.
The optimized control strategy is expressed as:
the off-line double-delay Q learning controller solution is finished.
FIG. 4 shows how the reward value of the double-delay Q learning strategy evolves over 1000 iterations: the reward value gradually increases with the number of iterations and converges after about 580 episodes, showing that through exploration and learning during the iterations, the offline double-delay Q learning controller applies the learned state-action pairs to solve for the optimal action and obtains the final optimal control sequence.
Step 4: the convolutional neural network model is trained offline by utilizing the historical vehicle speed and acceleration data to obtain a CNN vehicle speed prediction model, which is specifically as follows:
s41: establishing a speed training set and a speed testing set according to the historical vehicle speed and the acceleration of the vehicle obtained in the step 1;
S42: selecting sgdm as the training algorithm of the convolutional neural network model, setting the initial learning rate to 0.35, the learning-rate drop factor to 0.975, the drop period to 2, and the momentum to 0.925;
s43: and performing offline training on the convolutional neural network model by using the established speed training set to obtain a convolutional neural network prediction model, and checking the convolutional neural network prediction model by using the speed testing set to obtain a CNN vehicle speed prediction model.
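For reference, a sketch of the S42 training configuration translated to PyTorch; the embodiment's sgdm settings come from MATLAB, so this mapping, the stand-in network and the epoch count are assumptions rather than the inventors' code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

L_h, h_p = 20, 5
model = torch.nn.Sequential(            # stand-in for the CNN of step 4
    torch.nn.Flatten(), torch.nn.Linear(2 * L_h, h_p))
opt = torch.optim.SGD(model.parameters(), lr=0.35, momentum=0.925)
# learning rate multiplied by 0.975 every 2 epochs, per the embodiment
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=2, gamma=0.975)
loss_fn = torch.nn.MSELoss()

xs = torch.randn(256, 2, L_h)           # dummy (speed, acceleration) windows
ys = torch.randn(256, h_p)              # dummy future-speed targets
loader = DataLoader(TensorDataset(xs, ys), batch_size=32)

for epoch in range(20):                 # epoch count assumed
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    sched.step()                        # apply the piecewise schedule
```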
Step 5: on the model predictive control framework, as shown in FIG. 6, the MPC rolling optimization solving process is implemented with the double-delay Q learning controller to obtain the optimal power distribution between the engine and the power battery.
The method comprises the following steps:
s51: according to the CNN vehicle speed prediction model, solving the prediction speed in the prediction time domain; wherein, the mapping relation between the input and the output of the CNN vehicle speed prediction model is described as follows:
where the input is the vehicle speed over the past L_h seconds and the output is the predicted speed over the prediction time domain of h_p seconds; in this embodiment L_h and h_p are set to 20 s and 5 s respectively; f(·) is the mapping between the input and output of the CNN vehicle speed prediction model;
s52: determining an SOC reference track and required power in a prediction time domain;
the reference SOC calculation formula in the prediction time domain is:
where D_r, D_p and D denote respectively the distance travelled by the vehicle up to the current moment, the distance travelled within the prediction time domain, and the total driving distance of the driving cycle; Δt, h_p, v and v_p denote respectively the sampling time, the prediction time domain length, the current vehicle speed and the predicted vehicle speed; SOC_0, SOC_end and SOC_ref are respectively the initial SOC, the terminal SOC and the reference SOC trajectory value of the battery;
the calculation formula of the required power in the prediction time domain is as follows:
where the first symbol denotes the power demand sequence in the prediction time domain, v_p is the vehicle speed sequence in the prediction time domain, and the last symbol is the power in the prediction time domain;
s53: according to the model predictive control principle, an optimization index function in each prediction time domain is established:
where J_t is the objective function, h_p is the prediction time domain length, E_s(t) is the judgment parameter of engine start-stop, the next quantity is the instantaneous fuel consumption of the engine, and τ_soc is the power battery SOC penalty function;
where κ is a weight factor, set to 10⁸ in this embodiment, and SOC_ref is the reference SOC;
S54: using the offline double-delay Q learning controller together with the predicted vehicle speed, the reference SOC and the required power in the prediction time domain, the rolling optimization of the model predictive control based on double-delay Q learning is realized and the optimal control sequence in the prediction time domain is obtained; after feedback correction, the first element of the control sequence is applied to the plug-in hybrid electric vehicle system, and feedback correction is performed under the action of the reference SOC trajectory and the actual vehicle speed, thereby realizing online optimization of the energy management of the plug-in hybrid electric vehicle.
The specific implementation process in step S54 is shown in fig. 7, and specifically includes:
S541: based on the historical vehicle speed, predicting the vehicle speed over the future h_p seconds using the CNN vehicle speed prediction model;
S542: calculating the vehicle required power in the prediction time domain based on the vehicle longitudinal dynamics model and the predicted speed;
S543: calculating an SOC trajectory in each prediction time domain based on the SOC reference trajectory
S544: in each prediction time domain h_p, solving the rolling optimization control problem of the model predictive control using the offline double-delay Q learning controller obtained in step 3, obtaining the control variable sequence of each stage in the prediction time domain;
S545: after feedback correction, inputting the first element of the battery output power in the prediction time domain into the plug-in hybrid electric vehicle system as the output action of the model predictive control;
Steps S541-S545 are then repeated until the execution time is greater than or equal to the duration T of the 6UDDS driving cycle, T = 8220 s.
To verify the effectiveness of the algorithm, this embodiment uses a composite driving cycle, 6UDDS, composed of six consecutive UDDS cycles as the verification condition, and sets up a comparison group using the currently common charge-depleting/charge-sustaining (CD/CS) strategy; the two strategies are compared under the 6UDDS cycle in terms of power battery SOC, engine operating-point distribution, and equivalent fuel consumption.
FIG. 8a compares the SOC curves of the power battery of the plug-in hybrid electric vehicle under the CD/CS strategy and under the double-delay Q learning control strategy based on model predictive control (2DQL-MPC); the gray dotted line represents the SOC curve under the CD/CS strategy, and the black solid line represents the SOC curve under the 2DQL-MPC strategy.
FIG. 8b compares the distribution of the engine operating points of the plug-in hybrid electric vehicle under the two strategies; the black line marked with asterisks represents the optimal operating curve of the engine, and the circles represent the engine operating points. FIG. 8c compares the equivalent fuel consumption under the two strategies: the black curve represents the fuel consumption of the proposed method under the verification cycle, and the gray dotted line represents that of the CD/CS strategy. The fuel consumption trajectory of the proposed method is smoother, with no drastic jumps in energy consumption, showing that the proposed method effectively distributes power between the power sources of the plug-in hybrid electric vehicle and significantly reduces energy consumption during driving, which verifies the effectiveness and superiority of the plug-in hybrid electric vehicle energy management method based on double-delay Q learning.
The above embodiments merely illustrate preferred embodiments of the present invention, and the present invention is not limited to them. Various modifications and improvements made by those skilled in the art to the technical solution of the present invention without departing from its design concept shall fall within the protection scope of the present invention; the claimed technical content of the present invention is fully described in the claims.

Claims (7)

1. The plug-in hybrid electric vehicle energy management method based on double-delay Q learning is characterized by comprising the following steps of:
step 1: acquiring historical vehicle speed and acceleration data of a vehicle by using GPS equipment;
step 2: establishing a mathematical model of a power transmission system, a vehicle dynamic longitudinal mathematical model and a power balance equation between power components;
step 3: based on a double-delay Q learning algorithm, an objective function is established, and an offline double-delay Q learning controller is solved;
step 4: offline training a convolutional neural network model by utilizing historical vehicle speed and acceleration data to obtain a CNN vehicle speed prediction model;
step 5: and on a model predictive control framework, realizing an MPC rolling optimization solving process by using a double-delay Q learning controller to obtain the optimal power distribution of the engine and the power battery.
2. The method for energy management of a plug-in hybrid vehicle based on double-delay Q learning of claim 1, wherein said mathematical model of the powertrain system in step 2 comprises: a drive motor model, a generator model, a power battery model, and a planetary transmission model;
the driving motor model is as follows:
where w_M1 and T_M1 are respectively the rotation speed and torque of the driving motor, P_M1 is the output power of the driving motor, and η_M1 is the working efficiency of the driving motor;
the generator model is as follows:
where η_M2 and P_M2 are respectively the working efficiency and output power of the generator, and w_M2 and T_M2 are respectively the rotation speed and torque of the generator;
the power battery model is as follows:
where V_oc and R_int denote respectively the open-circuit voltage and internal resistance of the battery, P_b is the output power of the battery, I_b is the power battery current, SOC is the power battery state of charge, and Q_b is the battery capacity;
the planetary gear transmission model is as follows:
where w_r, w_s and w_x are respectively the angular velocities of the planet carrier, sun gear and ring gear, T_r, T_s and T_x are respectively the torques of the planet carrier, sun gear and ring gear, and μ denotes the gear ratio of the planetary gear set.
3. The energy management method of a plug-in hybrid vehicle based on double-delay Q learning of claim 1, wherein the vehicle dynamics longitudinal model in step 2 is:
where P_req, F_w, F_i, F_f and F_j denote respectively the required power, air resistance, grade resistance, rolling resistance and acceleration resistance, v is the vehicle speed, C_d is the air resistance coefficient, m is the vehicle mass, ρ is the rolling resistance coefficient, g is the gravitational acceleration, θ is the road grade, δ is the rotational-inertia conversion coefficient, A is the frontal (windward) area of the vehicle, and v̇ is the acceleration;
the power balance equation between the power components is:
where P_ps is the final-drive (main reducer) power, P_e is the engine power, η_ps is the efficiency of the final drive, η_g denotes the efficiency of the transmission system, and P_aux denotes the accessory power; z is a variable belonging to {−1, 1}.
4. The method for managing energy of a plug-in hybrid vehicle based on double-delay Q learning according to claim 1, wherein in step 3, solving the offline double-delay Q learning controller comprises the steps of:
s31: establishing an objective function:
where E_s is the judgment parameter for engine start and stop, and the second quantity is the instantaneous fuel consumption of the engine;
S32: determining the state variable s(t) ∈ S = {v, P_req, SOC} and the control variable a(t) ∈ A = {P_b};
where S is the state set comprising all states, v is the speed, P_req is the power demand, SOC is the state of charge of the power battery, A is the control set comprising all actions, and P_b is the power of the power battery;
S33: the energy of the plug-in hybrid electric vehicle is mainly provided by its engine and power battery, and the power split between them is the key factor affecting the vehicle's energy consumption. The double-delay Q learning reward function is therefore a function of the engine fuel consumption and the battery SOC at each moment. Specifically, the reward function of the double-delay Q learning controller is set as:
where a_1, a_2, a_3, b_1 and b_2 are all constant coefficients, and ΔSOC is the deviation between the actual SOC and the target SOC;
s34: according to the Bellman principle, establishing an update equation of a Q value function of double-delay Q learning;
where s′ is the next state obtained after the environment executes the optimal action a*, n is the number of visits experienced, λ ∈ (0, 1) is used to provide a continuous "exploration reward", γ is the discount factor, b is a random constant, d is a constant, U(s, a) stores the running sum of the estimated values of each (s, a) over n visits, and L(s, a) stores the number of visits of (s, a); LEARN(s, a) is a Boolean flag indicating whether collecting the Q value of the state-action pair (s, a) is allowed, LEARN(s, a) ∈ {true, false};
q value function update condition:
Steps S31-S34 are repeated until the number of executions is greater than or equal to the maximum number of iterations M, finally yielding the energy management control strategy of the plug-in hybrid electric vehicle under the driving cycle;
the optimized control strategy is expressed as:
the off-line double-delay Q learning controller solution is finished.
5. The energy management method of a plug-in hybrid electric vehicle based on double-delay Q learning according to claim 1, wherein in step 4, the CNN vehicle speed prediction model is specifically obtained by:
s41: establishing a speed training set and a speed testing set according to the historical vehicle speed and the acceleration of the vehicle obtained in the step 1;
S42: selecting a training algorithm for the convolutional neural network model and setting its hyperparameters;
s43: and performing offline training on the convolutional neural network model by using the established speed training set to obtain a convolutional neural network prediction model, and checking the convolutional neural network prediction model by using the speed testing set to obtain a CNN vehicle speed prediction model.
6. The method for energy management of a plug-in hybrid vehicle based on double-delay Q learning of claim 1, wherein step 5 specifically comprises:
s51: according to the CNN vehicle speed prediction model, solving the prediction speed in the prediction time domain; wherein, the mapping relation between the input and the output of the CNN vehicle speed prediction model is described as follows:
where the input is the vehicle speed over the past L_h seconds, the output is the predicted speed over the prediction time domain of h_p seconds, and f(·) is the mapping between the input and output of the CNN vehicle speed prediction model;
s52: determining an SOC reference track and required power in a prediction time domain;
the reference SOC calculation formula in the prediction time domain is:
where D_r, D_p and D denote respectively the distance travelled by the vehicle up to the current moment, the distance travelled within the prediction time domain, and the total driving distance of the driving cycle; Δt, h_p, v and v_p denote respectively the sampling time, the prediction time domain length, the current vehicle speed and the predicted vehicle speed; SOC_0, SOC_end and SOC_ref are respectively the initial SOC, the terminal SOC and the reference SOC trajectory value of the battery;
the calculation formula of the required power in the prediction time domain is as follows:
where the first symbol denotes the power demand sequence in the prediction time domain, v_p is the vehicle speed sequence in the prediction time domain, and the last symbol is the power in the prediction time domain;
s53: according to the model predictive control principle, an optimization index function in each prediction time domain is established:
where J_t is the objective function, h_p is the prediction time domain length, E_s(t) is the judgment parameter of engine start-stop, the next quantity is the instantaneous fuel consumption of the engine, and τ_soc is the power battery SOC penalty function;
where κ is a weight factor and SOC_ref is the reference SOC;
S54: using the offline double-delay Q learning controller together with the predicted vehicle speed, the reference SOC and the required power in the prediction time domain, the rolling optimization of the model predictive control based on double-delay Q learning is realized and the optimal control sequence in the prediction time domain is obtained; after feedback correction, the first element of the control sequence is applied to the plug-in hybrid electric vehicle system, and feedback correction is performed under the action of the reference SOC trajectory and the actual vehicle speed, thereby realizing online optimization of the energy management of the plug-in hybrid electric vehicle.
7. The energy management method for a plug-in hybrid vehicle based on double-delay Q learning of claim 6, wherein the implementation steps in step S54 are as follows:
S541: based on the historical vehicle speed, predicting the vehicle speed over the future h_p seconds using the CNN vehicle speed prediction model;
S542: calculating the vehicle required power in the prediction time domain based on the vehicle longitudinal dynamics model and the predicted speed;
S543: calculating an SOC trajectory in each prediction time domain based on the SOC reference trajectory
S544: in each prediction time domain h_p, solving the rolling optimization control problem of the model predictive control using the offline double-delay Q learning controller obtained in step 3, obtaining the control variable sequence of each stage in the prediction time domain;
S545: after feedback correction, inputting the first element of the battery output power in the prediction time domain into the plug-in hybrid electric vehicle system as the output action of the model predictive control;
steps S541-S545 are then repeated until the entire driving cycle is completed.
CN202310552153.6A 2023-05-16 2023-05-16 Plug-in hybrid electric vehicle energy management method based on double-delay Q learning Pending CN116811836A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310552153.6A CN116811836A (en) 2023-05-16 2023-05-16 Plug-in hybrid electric vehicle energy management method based on double-delay Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310552153.6A CN116811836A (en) 2023-05-16 2023-05-16 Plug-in hybrid electric vehicle energy management method based on double-delay Q learning

Publications (1)

Publication Number Publication Date
CN116811836A true CN116811836A (en) 2023-09-29

Family

ID=88111776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310552153.6A Pending CN116811836A (en) 2023-05-16 2023-05-16 Plug-in hybrid electric vehicle energy management method based on double-delay Q learning

Country Status (1)

Country Link
CN (1) CN116811836A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117984983A (en) * 2024-04-03 2024-05-07 中汽研汽车检验中心(天津)有限公司 Hybrid vehicle energy real-time control method, vehicle controller and hybrid vehicle


Similar Documents

Publication Publication Date Title
CN111267831B (en) Intelligent time-domain-variable model prediction energy management method for hybrid electric vehicle
Chen et al. Optimal strategies of energy management integrated with transmission control for a hybrid electric vehicle using dynamic particle swarm optimization
CN112776673B (en) Intelligent network fuel cell automobile real-time energy optimization management system
CN109591659B (en) Intelligent learning pure electric vehicle energy management control method
WO2021159660A1 (en) Energy management method and system for hybrid vehicle
CN112249002B (en) TD 3-based heuristic series-parallel hybrid power energy management method
CN113554337B (en) Plug-in hybrid electric vehicle energy management strategy construction method integrating traffic information
CN112765723B (en) Curiosity-driven hybrid power system deep reinforcement learning energy management method
Xu et al. Learning time reduction using warm-start methods for a reinforcement learning-based supervisory control in hybrid electric vehicle applications
CN112757922B (en) Hybrid power energy management method and system for vehicle fuel cell
CN113135113B (en) Global SOC (System on chip) planning method and device
CN111923897A (en) Intelligent energy management method for plug-in hybrid electric vehicle
CN115230485B (en) Fuel cell bus energy management method based on short-term power smooth prediction
Shen et al. Two-level energy control strategy based on ADP and A-ECMS for series hybrid electric vehicles
CN113104023B (en) Distributed MPC network-connected hybrid electric vehicle energy management system and method
CN106055830A (en) PHEV (Plug-in Hybrid Electric Vehicle) control threshold parameter optimization method based on dynamic programming
CN116811836A (en) Plug-in hybrid electric vehicle energy management method based on double-delay Q learning
Li et al. A real-time energy management strategy combining rule-based control and ECMS with optimization equivalent factor for HEVs
CN113581163B (en) Multimode PHEV mode switching optimization and energy management method based on LSTM
CN114969982A (en) Fuel cell automobile deep reinforcement learning energy management method based on strategy migration
Zhang et al. Uncertainty-Aware Energy Management Strategy for Hybrid Electric Vehicle Using Hybrid Deep Learning Method
Wang et al. Hierarchical rewarding deep deterministic policy gradient strategy for energy management of hybrid electric vehicles
Dorri et al. Design of an optimal control strategy in a parallel hybrid vehicle in order to simultaneously reduce fuel consumption and emissions
CN111891109B (en) Hybrid electric vehicle energy optimal distribution control method based on non-cooperative game theory
CN114670803A (en) Parallel hybrid electric vehicle energy management method based on self-supervision learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination