CN117184095B - Hybrid electric vehicle system control method based on deep reinforcement learning - Google Patents

Hybrid electric vehicle system control method based on deep reinforcement learning

Info

Publication number
CN117184095B
Authority
CN
China
Prior art keywords
mode
power
critic
reinforcement learning
actor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311359313.1A
Other languages
Chinese (zh)
Other versions
CN117184095A (en)
Inventor
张亚辉
王子萌
王众
田阳
焦晓红
文桂林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University filed Critical Yanshan University
Priority to CN202311359313.1A priority Critical patent/CN117184095B/en
Publication of CN117184095A publication Critical patent/CN117184095A/en
Application granted granted Critical
Publication of CN117184095B publication Critical patent/CN117184095B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Electric Propulsion And Braking For Vehicles (AREA)
  • Hybrid Electric Vehicles (AREA)

Abstract

The invention provides a control method for a hybrid electric vehicle system based on deep reinforcement learning, which comprises the following steps: S1, acquiring multidimensional road condition information of a plug-in hybrid logistics light truck over its historical driving process; S2, establishing a whole-vehicle powertrain model; S3, pre-optimizing the two motors to reduce the optimization dimension; S4, performing dynamic programming calculation to generate a state transition data set; S5, determining the state variables, action variables and reward function required by the reinforcement learning algorithm; S6, pre-training the critic and actor networks with the state transition data set generated in step S4; S7, building an environment-agent model and iteratively training the energy management strategy with a deep reinforcement learning algorithm; and S8, applying the model. The model is trained with the DDPG algorithm to obtain a trained deep reinforcement learning agent, so that the logistics light truck adapts to random working conditions on a fixed route while fuel economy is ensured.

Description

Hybrid electric vehicle system control method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of new energy automobiles, in particular to a hybrid electric vehicle system control method based on deep reinforcement learning.
Background
In recent years, both industry and academia have been researching and developing new types of automotive driveline, as increasingly stringent requirements call for continued reduction of fuel consumption and pollutant emissions. Although electric vehicles (EV) can achieve zero emission, their popularity is limited by shortcomings such as driving range, charging infrastructure and battery cost. Plug-in hybrid electric vehicles (PHEV) combine the advantages of the conventional internal combustion engine and the electric motor, and are receiving increasing attention from automobile manufacturers.
Currently, energy management strategies are key to improving the fuel economy of hybrid vehicles. Existing energy management strategies (EMS) generally fall into three broad categories: rule-based, optimization-based and learning-based strategies. Rule-based strategies, although effective and easy to implement, have control performance tied to particular vehicle types and configurations and are therefore not suited to diverse driving needs. Optimization-based strategies include dynamic programming (DP), Pontryagin's minimum principle (PMP), etc., and in practical applications they rely largely on global driving-condition information. Strategies based on reinforcement learning (RL) can address the deficiencies of the above conventional control techniques. Compared with dynamic programming, a reinforcement-learning-based energy management strategy can run online, saves a great deal of computation time and cost, and obtains a near-globally-optimal result. These characteristics make it an efficient and robust development approach for hybrid system energy management strategies. However, existing methods suffer from inaccurate results and low computation speed and are not suitable for wide application, so an efficient and accurate energy management method is urgently needed.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a control method of a hybrid electric vehicle system based on deep reinforcement learning that can be applied in real time while keeping the optimization result close to optimal, processes efficiently and quickly with accurate results, exploits the fuel-saving potential of the configuration under various driving modes, and improves the overall economy of the vehicle. Meanwhile, experience generated by dynamic programming of historical working conditions is introduced into the optimization process, which accelerates the reinforcement learning algorithm and further improves the accuracy of the calculation result.
Specifically, the invention provides a control method of a hybrid electric vehicle system based on deep reinforcement learning, which specifically comprises the following steps:
S1, acquiring fixed-route multidimensional road condition information in the historical running process of a plug-in hybrid electric vehicle;
s2, establishing a whole vehicle power system model and a power battery model of the plug-in hybrid electric vehicle; the method specifically comprises the following substeps:
s21, a whole vehicle power system model is as follows:
P_dem = P_m12 + P_eng·η_AMT
where P_dem is the required power, P_m12 is the equivalent motor power, P_eng is the engine power, and η_AMT is the gearbox efficiency;
P_m12 = P_EM + P_ISG·η_AMT
where P_EM is the EM motor power and P_ISG is the ISG motor power;
S22, building a power battery model:
where I_bat is the power battery current, U_OC is the power battery open-circuit voltage, and R_bat is the power battery internal resistance;
The rate of change of the power battery state of charge is dSOC/dt = -I_bat/Q_bat, where Q_bat is the power battery capacity;
s3, pre-optimizing the two motors to reduce the optimized dimension, and specifically comprises the following substeps:
S31, taking total energy conversion efficiency of two motors as a target, and establishing a pre-optimized objective function as follows:
where P_EM is the EM motor power, η_EM is the EM motor efficiency, P_ISG is the ISG motor power, η_ISG is the ISG motor efficiency, and i_g is the gear;
s32, discretizing the vehicle speed and the equivalent motor power to form grid points to be traversed;
S33, under each mode, calculating the optimal power distribution of the two motors under each grid point formed in the step S32 according to the objective function established in the step S31;
S4, carrying out dynamic programming calculation on the multidimensional road condition information of the plug-in hybrid electric vehicle acquired in the step S1 by combining the pre-optimized result in the step S3 to generate a state transition data set;
S5, determining the state variables, action variables and reward function required by the reinforcement learning algorithm; the state variable s, the action variable a and the reward function r are respectively:
s = (P_dem, SOC, d, θ, v, v_1, v_2)
where P_dem is the power required by the whole vehicle, SOC is the battery state of charge, d is the driving distance, θ is the current road gradient, v is the current vehicle speed, v_1 is the vehicle speed one second earlier, and v_2 is the vehicle speed two seconds earlier;
a = (mode, P_eng)
where mode is the selected mode and P_eng is the engine power;
r = -(C_fuel·price_fuel + ΔSOC·Q_bat·price_electric)
where C_fuel is the one-step fuel consumption, price_fuel is the fuel price per unit of consumption, ΔSOC is the one-step state-of-charge change, Q_bat is the power battery capacity, and price_electric is the electricity price per kilowatt-hour;
S6, pre-training critic and actor networks by using the state transition data set generated in the step S4;
S7, building an environment-agent model, migrating the state transition data set generated in step S4 into an experience pool, taking the two neural networks trained in step S6 as the initial network models, and iteratively training the energy management strategy with a deep reinforcement learning algorithm until the algorithm converges to obtain the final environment-agent model;
And S8, adding the converged environment intelligent agent model into the HCU controller, and performing online application on a fixed line.
Preferably, the fixed-route multidimensional road condition information obtained in the step S1 includes a plurality of sets of historical vehicle speed curves and road gradient curves collected by the electric vehicle driving on the fixed route.
Preferably, the modes in step S33 include a series mode, a pure electric mode, a parallel first gear mode, and a parallel second gear mode.
Preferably, step S4 specifically comprises the following sub-steps:
s41, selecting a plurality of groups of working conditions as road condition multidimensional information for the history working conditions under the same road section acquired in the step S1;
S42, respectively calculating the minimum intervention vehicle speed in each mode, and respectively traversing the executable modes under the grid points formed by time and SOC according to the minimum intervention vehicle speed condition;
And S43, when the pure electric mode, the first-gear series-parallel mode and the second-gear series-parallel mode are traversed, the power of the two motors is distributed according to the pre-optimized result in the step S3.
Preferably, the step S6 specifically comprises the following substeps;
S61, collecting a state S, an optimal action a, a generated return r, a next state S' and a value function V of the state under each grid point of the data set generated in the step S4;
S62, constructing the critic and actor neural network structures, each comprising an input layer, a hidden layer and an output layer, and determining the number of neurons in each layer;
S63, constructing the output sample set of the critic network as follows:
Q(s,a) = r + γ·V(s')
where r is the return in the data set generated in step S4 and V(s') is the value function of the next state s' in the data set generated in step S4;
S64, taking (s, a) from step S61 and Q(s, a) from step S63 as the input and output samples of the critic network, respectively, with s and a as the input and output samples of the actor network, and pre-training the critic and actor networks by the gradient descent method.
Preferably, in step S62, the critic and actor neural network structures output the values of the four modes and the engine power, and the actual action is selected as follows:
a = (argmax(V_1, V_2, V_3, V_4), P_eng)
where V_1, V_2, V_3 and V_4 are the values of the series mode, the pure electric mode, the parallel first-gear mode and the parallel second-gear mode, respectively.
Preferably, the step S7 specifically includes the following substeps:
S71, combining a whole vehicle environment module and a DDPG algorithm module to construct an interactive algorithm;
S72, migrating the state transition data set generated in the step S4 into an experience pool of an interactive algorithm, taking the two neural network models pre-trained in the step S6 as initial critic and actor neural network models, and completing the establishment of an intelligent agent module;
S73, defining real-time state parameters of the whole vehicle and corresponding rewarding values as input parameters of a neural network in an intelligent agent module in each training, taking control variables output by the neural network as input parameters of a whole vehicle model in an environment module, generating new rewarding values after a vehicle executes a control command, and storing obtained experiences in an experience pool;
s74, the agent updates through the strategy gradient to realize the learning updating step of the neural network;
and S75, repeatedly iterating until the algorithm converges to obtain a final environment intelligent agent model.
Preferably, the specific policy gradient update formula in step S74 is as follows:
where r is the single-step reward, s and s' are the current state quantity and the state quantity at the next moment, a is the current action quantity, θ_Q and θ_μ are the critic and actor network parameters at the current moment, θ_Q' and θ_μ' are the corresponding target critic and actor network parameters, γ and μ are weight parameters, and α and τ are the learning rate and the target network update rate.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention adopts the deep reinforcement learning algorithm to manage the energy, realizes the instantaneity and optimality of the energy management strategy, and ensures the maximization of energy utilization.
(2) In the invention, for the condition where the two motors work simultaneously, a pre-optimization calculation is adopted so that the simultaneous optimization of three power sources becomes a layered optimization problem: first, the power distribution between the two motors in the powertrain is treated as a static optimization problem for power-split and gear pre-optimization; second, deep reinforcement learning is used for mode selection and for the power distribution between the engine and the equivalent motor, which reduces the optimization difficulty.
(3) According to the invention, the results of dynamic planning of a plurality of groups of working conditions acquired on a fixed route are used as expert knowledge to be transferred to the reinforcement learning agent, so that the convergence speed of the reinforcement learning algorithm is greatly increased, and the calculation efficiency can be greatly improved.
(4) According to the invention, the historical data and the topographic information are fully utilized aiming at the scene of the electric vehicle delivering goods on the fixed route, so that the method can effectively learn the driving characteristics of the electric vehicle under the fixed route, and is more applicable than a general energy management strategy under the fixed route delivering scene.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a flow chart of an energy management strategy algorithm in embodiment 1 of the present invention;
Fig. 3 is a schematic structural diagram of the P2P3 plug-in hybrid light truck and an illustration of its mode division in embodiment 1 of the present invention;
FIG. 4 is a pre-optimization flow chart in embodiment 1 of the present invention;
fig. 5 is a schematic diagram of critic and actor networks in embodiment 1 of the present invention;
FIG. 6 is a schematic illustration of the fixed route driven by the light truck, part of the road segment used in the 6th IFAC E-COSM 2021 challenge, in embodiment 2 of the present invention;
FIGS. 7 a-7 b are maps of two motor efficiencies of the light truck of example 2 of the present invention;
FIG. 8 is a graph showing the results after mode two pre-optimization in example 2 of the present invention;
FIG. 9 is a graph showing the results after mode three pre-optimization in example 2 of the present invention;
FIG. 10 is a schematic diagram of the results after mode four pre-optimization in example 2 of the present invention;
FIG. 11 is a schematic diagram of the history collected under the road section shown in FIG. 6 in embodiment 2 of the present invention;
FIG. 12 is a schematic diagram of the energy distribution results of the dynamic programming energy management strategy in embodiment 2 of the present invention under a certain working condition;
fig. 13 is a diagram showing convergence characteristics of the training process in embodiment 2 of the present invention.
Detailed Description
The invention will be described in further detail in the following detailed description of specific embodiments thereof with reference to the drawings. The following examples or figures are illustrative of the invention and are not intended to limit the scope of the invention.
The invention provides a control method of a hybrid electric vehicle system based on deep reinforcement learning, as shown in fig. 1, the method specifically comprises the following steps:
s1, acquiring fixed-route multidimensional road condition information in the history running process of the plug-in hybrid electric vehicle.
S2, establishing a whole vehicle power system model and a power battery model of the plug-in hybrid electric vehicle; the method specifically comprises the following substeps:
s21, a whole vehicle power system model is as follows:
P_dem = P_m12 + P_eng·η_AMT
where P_dem is the required power, P_m12 is the equivalent motor power, P_eng is the engine power, and η_AMT is the gearbox efficiency;
P_m12 = P_EM + P_ISG·η_AMT
where P_EM is the EM motor power and P_ISG is the ISG motor power.
S22, building a power battery model:
where I_bat is the power battery current, U_OC is the power battery open-circuit voltage, and R_bat is the power battery internal resistance;
The rate of change of the power battery state of charge is dSOC/dt = -I_bat/Q_bat, where Q_bat is the power battery capacity.
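For illustration, the power balance of step S21 and a battery model of the kind described in step S22 can be evaluated as in the following minimal Python sketch; the Rint equivalent-circuit equations, the assumption that the battery power equals the equivalent motor electrical power, and all parameter values are illustrative and not taken from the patent.

```python
import math

# Illustrative parameters (assumed, not from the patent)
ETA_AMT = 0.95           # gearbox efficiency
U_OC = 350.0             # battery open-circuit voltage [V]
R_BAT = 0.1              # battery internal resistance [ohm]
Q_BAT_AS = 90.0 * 3600   # battery capacity [A*s] (90 Ah)

def equivalent_motor_power(p_dem_w, p_eng_w):
    """P_dem = P_m12 + P_eng * eta_AMT, solved for the equivalent motor power."""
    return p_dem_w - p_eng_w * ETA_AMT

def soc_rate(p_bat_w):
    """Rint model: solve U_oc*I - R_bat*I^2 = P_bat for the battery current,
    then dSOC/dt = -I_bat / Q_bat (current positive when discharging)."""
    disc = U_OC ** 2 - 4.0 * R_BAT * p_bat_w
    i_bat = (U_OC - math.sqrt(max(disc, 0.0))) / (2.0 * R_BAT)
    return -i_bat / Q_BAT_AS

# Example: 60 kW demand with the engine contributing 30 kW through the gearbox
p_m12 = equivalent_motor_power(60e3, 30e3)
print(p_m12, soc_rate(p_m12))
```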
S3, pre-optimizing the two motors to reduce the optimized dimension, and specifically comprises the following substeps:
S31, taking total energy conversion efficiency of two motors as a target, and establishing a pre-optimized objective function as follows:
where P_EM is the EM motor power, η_EM is the EM motor efficiency, P_ISG is the ISG motor power, η_ISG is the ISG motor efficiency, and i_g is the gear.
S32, discretizing the vehicle speed and the equivalent motor power to form grid points needing to be traversed.
S33, under each mode, calculating the optimal power distribution of the two motors at each grid point formed in step S32 according to the objective function established in step S31; the modes include a series mode, a pure electric mode, a parallel first-gear mode and a parallel second-gear mode.
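As a concrete illustration of steps S31 to S33, the sketch below traverses a (vehicle speed, equivalent motor power) grid and, at each point, searches the EM/ISG power split that maximizes the combined motor conversion efficiency. The efficiency functions are smooth stand-ins for the measured EM and ISG maps, the combined-efficiency objective is one plausible form (the patent's exact objective is given only by reference), and the gear variable i_g is omitted for brevity.

```python
import numpy as np

ETA_AMT = 0.95  # assumed gearbox efficiency (P_m12 = P_EM + P_ISG * eta_AMT)

def em_eff(p_w):
    """Stand-in for the EM efficiency map; replace with map interpolation."""
    return 0.80 + 0.12 * np.exp(-((abs(p_w) - 30e3) / 40e3) ** 2)

def isg_eff(p_w):
    """Stand-in for the ISG efficiency map; replace with map interpolation."""
    return 0.78 + 0.14 * np.exp(-((abs(p_w) - 20e3) / 30e3) ** 2)

def pre_optimize(v_grid, p_m12_grid, n_split=41):
    """For every (speed, equivalent motor power) grid point, pick the EM power
    share that maximizes the combined conversion efficiency of the two motors."""
    best_p_em = np.zeros((len(v_grid), len(p_m12_grid)))
    for i, _v in enumerate(v_grid):
        for j, p_m12 in enumerate(p_m12_grid):
            best_eff = -1.0
            for frac in np.linspace(0.0, 1.0, n_split):
                p_em = frac * p_m12
                p_isg = (p_m12 - p_em) / ETA_AMT      # ISG share upstream of the AMT
                elec_in = abs(p_em) / em_eff(p_em) + abs(p_isg) / isg_eff(p_isg) + 1e-9
                eff = (abs(p_em) + abs(p_isg)) / elec_in   # output over electrical input
                if eff > best_eff:
                    best_eff, best_p_em[i, j] = eff, p_em
    return best_p_em   # looked up later by the DP of step S4 and the agent of step S7

split_table = pre_optimize(np.linspace(0.0, 30.0, 7), np.linspace(0.0, 120e3, 13))
```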
S4, carrying out dynamic programming calculation on the multidimensional road condition information of the plug-in hybrid electric vehicle acquired in the step S1 by combining the pre-optimized result in the step S3 to generate a state transition data set.
The step S4 specifically includes the following substeps:
S41, selecting a plurality of groups of working conditions as road condition multidimensional information for the history working conditions under the same road section acquired in the step S1.
S42, calculating the minimum intervention vehicle speed in each mode, and traversing the executable modes according to the minimum intervention vehicle speed condition under the grid points formed by time and SOC.
And S43, when the pure electric mode, the first-gear series-parallel mode and the second-gear series-parallel mode are traversed, the power of the two motors is distributed according to the pre-optimized result in the step S3.
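Steps S41 to S43 amount to a backward dynamic programming sweep over a time × SOC grid; a generic sketch is given below. The `step_cost` and `next_soc` callbacks are assumed wrappers around the pre-optimized powertrain model (fuel plus electricity cost, and SOC propagation), and the minimum-intervention-speed feasibility check is only indicated by a comment.

```python
import numpy as np

def dp_backward(n_steps, soc_grid, modes, p_eng_grid, step_cost, next_soc):
    """Backward value iteration over a time x SOC grid.
    step_cost(t, soc, mode, p_eng) and next_soc(t, soc, mode, p_eng) are
    assumed callbacks wrapping the pre-optimized powertrain model."""
    n_soc = len(soc_grid)
    V = np.zeros((n_steps + 1, n_soc))                 # terminal value = 0
    policy = np.empty((n_steps, n_soc), dtype=object)
    for t in range(n_steps - 1, -1, -1):
        for k, soc in enumerate(soc_grid):
            best_q, best_a = np.inf, None
            for mode in modes:                         # in practice, keep only modes whose
                for p_eng in p_eng_grid:               # minimum intervention speed is met
                    c = step_cost(t, soc, mode, p_eng)
                    s_next = next_soc(t, soc, mode, p_eng)
                    if not soc_grid[0] <= s_next <= soc_grid[-1]:
                        continue                       # SOC constraint violated
                    kn = int(np.argmin(np.abs(soc_grid - s_next)))   # nearest SOC node
                    q = c + V[t + 1, kn]
                    if q < best_q:
                        best_q, best_a = q, (mode, p_eng)
            V[t, k] = best_q
            policy[t, k] = best_a
    # tuples (s, a, r, s', V(s')) read off these arrays form the state transition data set
    return V, policy
```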
S5, determining the state variables, action variables and reward function required by the reinforcement learning algorithm; the state variable s, the action variable a and the reward function r are respectively:
s = (P_dem, SOC, d, θ, v, v_1, v_2)
where P_dem is the power required by the whole vehicle, SOC is the battery state of charge, d is the driving distance, θ is the current road gradient, v is the current vehicle speed, v_1 is the vehicle speed one second earlier, and v_2 is the vehicle speed two seconds earlier;
a = (mode, P_eng)
where mode is the selected mode and P_eng is the engine power;
r = -(C_fuel·price_fuel + ΔSOC·Q_bat·price_electric)
where C_fuel is the one-step fuel consumption, price_fuel is the fuel price per unit of consumption, ΔSOC is the one-step state-of-charge change, Q_bat is the power battery capacity, and price_electric is the electricity price per kilowatt-hour.
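In code, the reward of step S5 is a direct transcription of the expression above; the prices and the battery capacity below are illustrative assumptions.

```python
PRICE_FUEL = 7.6        # assumed fuel price per litre
PRICE_ELECTRIC = 1.0    # assumed electricity price per kWh
Q_BAT_KWH = 31.5        # assumed battery capacity expressed in kWh

def reward(c_fuel_litre, delta_soc):
    """r = -(C_fuel * price_fuel + dSOC * Q_bat * price_electric);
    a positive delta_soc means net battery energy consumed in the step."""
    return -(c_fuel_litre * PRICE_FUEL + delta_soc * Q_BAT_KWH * PRICE_ELECTRIC)

print(reward(0.01, 0.0005))   # e.g. 0.01 L of fuel and 0.05 % SOC drawn in one step
```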
S6, pre-training the critic and actor networks with the state transition data set generated in step S4; S61, collecting, at each grid point of the data set generated in step S4, the state s, the optimal action a in that state, the generated return r, the next state s' transitioned to, and the value function V of that state.
S62, constructing the critic and actor neural network structures, each comprising an input layer, a hidden layer and an output layer, and determining the number of neurons in each layer; the critic and actor neural network structures output the values of the four modes and the engine power, and the actual action is selected as follows:
a = (argmax(V_1, V_2, V_3, V_4), P_eng)
where V_1, V_2, V_3 and V_4 are the values of the series mode, the pure electric mode, the parallel first-gear mode and the parallel second-gear mode, respectively.
S63, constructing the output sample set of the critic network as follows:
Q(s,a) = r + γ·V(s')
where r is the return in the data set generated in step S4 and V(s') is the value function of the next state s' in the data set generated in step S4;
S64, taking (s, a) from step S61 and Q(s, a) from step S63 as the input and output samples of the critic network, respectively, with s and a as the input and output samples of the actor network, and pre-training the critic and actor networks by the gradient descent method.
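One way to carry out the supervised pre-training of steps S61 to S64 with TensorFlow/Keras is sketched below. The layer sizes, optimizer settings, discount factor and the encoding of the actor target (the DP-optimal mode expressed as the four mode values plus the engine power) are assumptions, not values taken from the patent.

```python
import numpy as np
import tensorflow as tf

def build_mlp(n_in, n_out, hidden=(64, 64)):
    """Input / hidden / output multilayer perceptron used for both networks."""
    layers = [tf.keras.Input(shape=(n_in,))]
    layers += [tf.keras.layers.Dense(h, activation="relu") for h in hidden]
    layers += [tf.keras.layers.Dense(n_out)]
    return tf.keras.Sequential(layers)

# 7 state variables; the actor outputs (V1..V4, P_eng); the critic maps (s, a) -> Q,
# with the action fed to the critic as (mode index, P_eng)
actor = build_mlp(7, 5)
critic = build_mlp(7 + 2, 1)

def pretrain(s, a, r, v_next, a_target, gamma=0.99, epochs=50):
    """Fit the critic to Q(s,a) = r + gamma * V(s') and the actor to the
    DP-optimal action targets, both by gradient descent on an MSE loss."""
    q_target = r + gamma * v_next
    critic.compile(optimizer="adam", loss="mse")
    actor.compile(optimizer="adam", loss="mse")
    critic.fit(np.concatenate([s, a], axis=1), q_target, epochs=epochs, verbose=0)
    actor.fit(s, a_target, epochs=epochs, verbose=0)
```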
S7, building an environment-agent model, migrating the state transition data set generated in step S4 into an experience pool, taking the two neural networks trained in step S6 as the initial network models, and iteratively training the energy management strategy with a deep reinforcement learning algorithm until the algorithm converges to obtain the final environment-agent model; step S7 specifically comprises the following substeps:
S71, combining a whole vehicle environment module and a DDPG algorithm module to construct an interactive algorithm;
S72, migrating the state transition data set generated in the step S4 into an experience pool of an interactive algorithm, taking the two neural network models pre-trained in the step S6 as initial critic and actor neural network models, and completing the establishment of an intelligent agent module;
S73, defining real-time state parameters of the whole vehicle and corresponding rewarding values as input parameters of a neural network in an intelligent agent module in each training, taking control variables output by the neural network as input parameters of a whole vehicle model in an environment module, generating new rewarding values after a vehicle executes a control command, and storing obtained experiences in an experience pool;
s74, the agent updates through the strategy gradient to realize the learning updating step of the neural network;
the specific strategy gradient update formula is as follows:
where r is the single-step reward, s and s' are the current state quantity and the state quantity at the next moment, a is the current action quantity, θ_Q and θ_μ are the critic and actor network parameters at the current moment, θ_Q' and θ_μ' are the corresponding target critic and actor network parameters, γ and μ are weight parameters, and α and τ are the learning rate and the target network update rate.
And S75, repeatedly iterating until the algorithm converges to obtain a final environment intelligent agent model.
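For reference, a generic DDPG update step of the kind described in step S74 is sketched below with TensorFlow. It implements the standard critic regression towards r + γ·Q'(s', μ'(s')), the deterministic policy gradient for the actor, and the soft update of the two target networks; it feeds the raw actor output to the critic as the action vector, and the discount factor, learning rates and update rate are assumed values, so this is not claimed to reproduce the patent's exact formula.

```python
import tensorflow as tf

GAMMA, TAU = 0.99, 0.005                      # assumed discount and target update rate
critic_opt = tf.keras.optimizers.Adam(1e-3)   # assumed learning rates
actor_opt = tf.keras.optimizers.Adam(1e-4)

def ddpg_update(critic, actor, target_critic, target_actor, s, a, r, s_next):
    """One DDPG step on a minibatch sampled from the experience pool."""
    # 1) critic update: regression towards the target Q value
    with tf.GradientTape() as tape:
        y = r + GAMMA * target_critic(tf.concat([s_next, target_actor(s_next)], axis=1))
        q = critic(tf.concat([s, a], axis=1))
        critic_loss = tf.reduce_mean(tf.square(y - q))
    grads = tape.gradient(critic_loss, critic.trainable_variables)
    critic_opt.apply_gradients(zip(grads, critic.trainable_variables))
    # 2) actor update: deterministic policy gradient (maximize Q(s, mu(s)))
    with tf.GradientTape() as tape:
        actor_loss = -tf.reduce_mean(critic(tf.concat([s, actor(s)], axis=1)))
    grads = tape.gradient(actor_loss, actor.trainable_variables)
    actor_opt.apply_gradients(zip(grads, actor.trainable_variables))
    # 3) soft target update: theta' <- tau * theta + (1 - tau) * theta'
    for t_var, var in zip(target_critic.variables + target_actor.variables,
                          critic.variables + actor.variables):
        t_var.assign(TAU * var + (1.0 - TAU) * t_var)
```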
And S8, adding the converged environment intelligent agent model into the HCU controller, and performing online application on a fixed line.
Example 1
In this embodiment, the method of the present invention is applied to a P2P3 series-parallel light truck; the procedure is shown in fig. 2 and specifically comprises the following steps:
s1, acquiring fixed-route multidimensional road condition information of a plug-in hybrid power logistics light truck in a historical driving process, wherein the method specifically comprises the following steps of:
the logistics light truck runs and collects a plurality of groups of historical vehicle speed curves and road gradient curves on a fixed route.
S2, a power system structure is shown in FIG 3, a plug-in hybrid power logistics light truck whole vehicle power system model is built based on the power system structure, and the method specifically comprises the following steps:
s21, a built whole vehicle dynamics model is as follows:
where P_dem is the required power, v_h is the vehicle speed, m is the vehicle mass, ρ is the air density, A is the frontal area of the vehicle, C_d is the air resistance coefficient, g is the gravitational acceleration, μ_r is the rolling resistance coefficient, and θ is the road gradient.
P_dem = P_m12 + P_eng·η_AMT
where P_m12 is the equivalent motor power, P_eng is the engine power, and η_AMT is the transmission efficiency.
P_m12 = P_EM + P_ISG·η_AMT
where P_EM is the EM motor power and P_ISG is the ISG motor power.
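The required power in step S21 follows from standard longitudinal vehicle dynamics; the sketch below uses the textbook road-load expression (rolling, grade, aerodynamic and inertial resistance) with illustrative parameter values, since the patent's own expression and the values of Table 1 are given only by reference.

```python
import math

def demand_power(v, acc, grade_rad, m=7000.0, rho=1.206, area=5.5,
                 c_d=0.6, mu_r=0.008, g=9.81):
    """Road-load demand power at the wheels [W]; all parameters are illustrative."""
    f_roll = m * g * mu_r * math.cos(grade_rad)      # rolling resistance
    f_grade = m * g * math.sin(grade_rad)            # grade resistance
    f_aero = 0.5 * rho * c_d * area * v ** 2         # aerodynamic drag
    f_inertia = m * acc                              # acceleration resistance
    return (f_roll + f_grade + f_aero + f_inertia) * v

# e.g. 15 m/s, accelerating at 0.3 m/s^2 on a 2 % grade
print(demand_power(15.0, 0.3, math.atan(0.02)))
```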
S22, a built power battery model is as follows:
where I_bat is the power battery current, U_OC is the power battery open-circuit voltage, and R_bat is the power battery internal resistance.
The rate of change of the power battery state of charge is dSOC/dt = -I_bat/Q_bat, where Q_bat is the power battery capacity.
S3, pre-optimizing logic is shown in FIG. 4, and pre-optimizing processing is carried out on the coupled modes of the two motors in an off-line mode, wherein the pre-optimizing logic specifically comprises the following steps:
S31, taking the total energy conversion efficiency of the two motors as a target, and establishing a pre-optimized objective function as follows:
where P_EM is the EM motor power, η_EM is the EM motor efficiency, P_ISG is the ISG motor power, η_ISG is the ISG motor efficiency, and i_g is the gear.
S32, discretizing the vehicle speed and the equivalent motor power to form grid points needing to be traversed.
And S33, in each mode, calculating the optimal power distribution of the two motors under each grid point formed in the step S32 according to the objective function established in the step S31.
S4, carrying out dynamic programming calculation on the history working conditions under the same road section collected in the step S1 by combining the pre-optimized result in the step S3, wherein the method specifically comprises the following steps:
S41, selecting a plurality of groups of working conditions for the history working conditions under the same road section acquired in the step S1.
S42, respectively calculating the minimum intervention vehicle speed under each mode according to the characteristic that the minimum rotation speed of the engine is required to be larger than the idle speed, and respectively traversing the executable modes under the grid points formed by time and SOC according to the minimum intervention vehicle speed condition.
And S43, when the pure electric mode (mode 2), the first-gear series-parallel mode (mode 3) and the second-gear series-parallel mode (mode 4) are traversed, the power of the two motors is distributed according to the pre-optimized result of step S3.
S5, determining the state variables, action variables and reward function required by the reinforcement learning algorithm, which are respectively:
s = (P_dem, SOC, d, θ, v, v_1, v_2)
where P_dem is the power required by the whole vehicle, SOC is the battery state of charge, d is the driving distance, θ is the current road gradient, v is the current vehicle speed, v_1 is the vehicle speed one second earlier, and v_2 is the vehicle speed two seconds earlier.
a = (mode, P_eng)
where mode is the selected mode and P_eng is the engine power.
r = -(C_fuel·price_fuel + ΔSOC·Q_bat·price_electric)
where C_fuel is the one-step fuel consumption, price_fuel is the fuel price per unit of consumption, ΔSOC is the one-step state-of-charge change, Q_bat is the power battery capacity, and price_electric is the electricity price per kilowatt-hour.
S6, pretraining critic and actor networks by using the state transition data set generated in the step S4, wherein the pretraining comprises the following steps;
S61, collecting the data set generated in the step S4, wherein the data set comprises: the state s at each grid point, the optimal action a at that state, the generated return r, the next state s' to transition to, the value function V of the state.
S62, as shown in FIG. 5, the critic and actor network structures are constructed, each comprising an input layer, a hidden layer and an output layer, and the number of neurons in each layer is determined.
Further, the actor network outputs the values of the four modes and the engine power, and the actual action is selected as follows:
a = (argmax(V_1, V_2, V_3, V_4), P_eng)
where V_1, V_2, V_3 and V_4 are the values of the series mode, the pure electric mode, the parallel first-gear mode and the parallel second-gear mode, respectively.
That is, the actor network outputs five values V_1, V_2, V_3, V_4 and P_eng; the first four are the values of the modes, the mode with the largest value is selected as the chosen mode, and together with P_eng this forms the action variable a = (mode, P_eng).
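In code form, the mapping from the five actor outputs to the action variable could look as follows (the mode labels are illustrative):

```python
import numpy as np

MODES = ("series", "pure_electric", "parallel_1st", "parallel_2nd")

def select_action(actor_output):
    """actor_output = (V1, V2, V3, V4, P_eng): pick the mode with the largest
    value and pair it with the engine power."""
    mode = MODES[int(np.argmax(actor_output[:4]))]
    return mode, float(actor_output[4])

print(select_action(np.array([0.2, 0.8, 0.5, 0.1, 25e3])))
```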
S63, constructing the output sample set of the critic network as follows:
Q(s,a) = r + γ·V(s')
where r is the return in the data set generated in step S4 and V(s') is the value function of the next state s' in the data set generated in step S4.
S64, using (s, a) from step S61 and Q(s, a) from step S63 as the input and output samples of the critic network, with s and a as the input and output samples of the actor network, and pre-training the critic and actor networks by the gradient descent method.
S7, building an environment-agent model, and continuously and iteratively training an energy management strategy by using a deep reinforcement learning algorithm, wherein the method comprises the following steps of:
s71, combining the whole vehicle environment module with the DDPG algorithm module to construct an interactive algorithm.
And S72, migrating the state transition data set generated in the step S4 into an experience pool, and taking the two neural networks pre-trained in the step S6 as initial critic and actor networks to complete the construction of the intelligent agent module.
S73, dividing the plurality of groups of historical vehicle speed curves acquired in the step S1 into a training set and a testing set.
And S74, in each training, defining the real-time state parameter of the whole vehicle and the corresponding rewarding value as input parameters of a neural network in the intelligent agent module, taking the control variable output by the neural network as the input parameters of the whole vehicle model in the environment module, generating a new rewarding value after the vehicle executes a control command, and storing the experience obtained in the step in an experience pool.
S75, the agent updates through the strategy gradient, so that the learning updating step of the neural network is realized, and a specific strategy gradient updating formula is as follows:
where r is the single-step reward, s and s' are the current state quantity and the state quantity at the next moment, a is the current action quantity, θ_Q and θ_μ are the critic and actor network parameters at the current moment, θ_Q' and θ_μ' are the corresponding target critic and actor network parameters, γ and μ are weight parameters, and α and τ are the learning rate and the target network update rate.
And S76, repeatedly iterating and verifying on the test set until the desired performance is learned, and saving the trained global neural network model after training is finished.
And S8, flashing the converged actor neural network model into the light truck HCU controller and applying it online on the fixed route.
In the hardware-in-the-loop experiment scenario, after the actor network built with TensorFlow has converged, the network structure is reproduced on the Simulink platform and the trained parameters are extracted into the Simulink-built network. The Simulink-built network is then encapsulated into an HCU module, the HCU module is added to a Speedgoat controller that outputs the control quantities in real time as the controller for the hardware-in-the-loop test, and the in-the-loop experiments are performed by interacting with the light truck model.
Example 2
In this embodiment, part of the road segment used in the 6th IFAC E-COSM 2021 challenge is taken as the fixed route driven by the light truck, so as to optimize the plug-in hybrid logistics light truck. The whole method comprises the following steps:
S1, acquiring the fixed-route multidimensional road condition information of the plug-in hybrid logistics light truck over its historical driving process; part of the road segment used in the 6th IFAC E-COSM 2021 challenge is the fixed route driven by the light truck, as shown in FIG. 6. The road section is 10.16 km long in total, contains 11 signalized intersections, and the spacing between traffic lights ranges from 247 m to 1288 m. The phase cycle time of each traffic light is 120 seconds, with the red and green durations each half of the cycle and the yellow light not considered.
S2, a plug-in hybrid power logistics light truck whole vehicle power system model is built based on whole vehicle parameters and a power system structure, and main parameters of the light truck in the embodiment are shown in a table 1.
TABLE 1 main parameters of P2-P3 series-parallel hybrid light truck
S3, pre-optimizing the coupled modes of the two motors, wherein two motor efficiency map diagrams of the light truck in the example are shown in fig. 7a and 7b, and pre-optimizing results in different modes are shown in fig. 8, 9 and 10 respectively.
And S4, collecting the history working conditions under the same road section shown in FIG. 6, and carrying out dynamic programming calculation by combining the pre-optimized result in the step S3. The history of the present embodiment collected under the road section shown in fig. 6 is shown in fig. 11. The energy distribution result of the dynamic programming energy management strategy under a certain working condition is shown in fig. 12.
S5, determining the state variables, action variables and reward function required by the reinforcement learning algorithm, which are respectively:
s = (P_dem, SOC, d, θ, v, v_1, v_2)
where P_dem is the power required by the whole vehicle, SOC is the battery state of charge, d is the driving distance, θ is the current road gradient, v is the current vehicle speed, v_1 is the vehicle speed one second earlier, and v_2 is the vehicle speed two seconds earlier.
a = (mode, P_eng)
where mode is the selected mode and P_eng is the engine power.
r = -(C_fuel·price_fuel + ΔSOC·Q_bat·price_electric)
where C_fuel is the one-step fuel consumption, price_fuel is the fuel price per unit of consumption, ΔSOC is the one-step state-of-charge change, Q_bat is the power battery capacity, and price_electric is the electricity price per kilowatt-hour.
S6, pre-training critic and actor networks by using a state transition data set generated by dynamic programming calculation.
And S7, building an environment-agent model, and continuously and iteratively training an energy management strategy by using a deep reinforcement learning algorithm. The convergence characteristics of the training process in this example are shown in fig. 13.
And S8, burning the converged actor neural network model into a light truck HCU controller, and performing online application on a fixed line. The HCU controller employed in this example is Speedgoat real-time controller.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or equally substituted without departing from the spirit and scope of the present invention, and it should be covered in the scope of the appended claims.

Claims (5)

1. A hybrid electric vehicle system control method based on deep reinforcement learning is characterized in that: the method specifically comprises the following steps:
S1, acquiring fixed-route multidimensional road condition information in the historical running process of a plug-in hybrid electric vehicle;
s2, establishing a whole vehicle power system model and a power battery model of the plug-in hybrid electric vehicle; the method specifically comprises the following substeps:
s21, a whole vehicle power system model is as follows:
P_dem = P_m12 + P_eng·η_AMT
where P_dem is the required power, P_m12 is the equivalent motor power, P_eng is the engine power, and η_AMT is the gearbox efficiency;
P_m12 = P_EM + P_ISG·η_AMT
where P_EM is the EM motor power and P_ISG is the ISG motor power;
S22, building a power battery model:
where I_bat is the power battery current, U_OC is the power battery open-circuit voltage, and R_bat is the power battery internal resistance;
The rate of change of the power battery state of charge is dSOC/dt = -I_bat/Q_bat, where Q_bat is the power battery capacity;
s3, pre-optimizing the two motors to reduce the optimized dimension, and specifically comprises the following substeps:
S31, taking total energy conversion efficiency of two motors as a target, and establishing a pre-optimized objective function as follows:
where P_EM is the EM motor power, η_EM is the EM motor efficiency, P_ISG is the ISG motor power, η_ISG is the ISG motor efficiency, and i_g is the gear;
s32, discretizing the vehicle speed and the equivalent motor power to form grid points to be traversed;
S33, under each mode, calculating the optimal power distribution of the two motors under each grid point formed in the step S32 according to the objective function established in the step S31, wherein the modes comprise a series mode, a pure electric mode, a parallel first gear mode and a parallel second gear mode;
S4, carrying out dynamic programming calculation on the multidimensional road condition information of the plug-in hybrid electric vehicle acquired in the step S1 by combining the pre-optimized result in the step S3 to generate a state transition data set;
the step S4 specifically includes the following substeps:
s41, selecting a plurality of groups of working conditions as multidimensional road condition information for the history working conditions under the same road section acquired in the step S1;
S42, respectively calculating the minimum intervention vehicle speed in each mode, and respectively traversing the executable modes under the grid points formed by time and SOC according to the minimum intervention vehicle speed condition;
s43, when the pure electric mode, the first-gear series-parallel mode and the second-gear series-parallel mode are traversed, the power of the two motors is distributed according to the pre-optimized result of the step S3;
S5, determining the state variables, action variables and reward function required by the reinforcement learning algorithm; the state variable s, the action variable a and the reward function r are respectively:
s = (P_dem, SOC, d, θ, v, v_1, v_2)
where P_dem is the power required by the whole vehicle, SOC is the battery state of charge, d is the driving distance, θ is the current road gradient, v is the current vehicle speed, v_1 is the vehicle speed one second earlier, and v_2 is the vehicle speed two seconds earlier;
a = (mode, P_eng)
where mode is the selected mode and P_eng is the engine power;
r = -(C_fuel·price_fuel + ΔSOC·Q_bat·price_electric)
where C_fuel is the one-step fuel consumption, price_fuel is the fuel price per unit of consumption, ΔSOC is the one-step state-of-charge change, Q_bat is the power battery capacity, and price_electric is the electricity price per kilowatt-hour;
S6, pre-training critic and actor networks by using the state transition data set generated in the step S4;
S7, building an environment-agent model, migrating the state transition data set generated in step S4 into an experience pool, taking the two neural networks trained in step S6 as the initial network models, and iteratively training the energy management strategy with a deep reinforcement learning algorithm until the algorithm converges to obtain the final environment-agent model;
The step S7 specifically includes the following substeps:
S71, combining a whole vehicle environment module and a DDPG algorithm module to construct an interactive algorithm;
S72, migrating the state transition data set generated in the step S4 into an experience pool of an interactive algorithm, taking the two neural network models pre-trained in the step S6 as initial critic and actor neural network models, and completing the establishment of an intelligent agent module;
S73, defining real-time state parameters of the whole vehicle and corresponding rewarding values as input parameters of a neural network in an intelligent agent module in each training, taking control variables output by the neural network as input parameters of a whole vehicle model in an environment module, generating new rewarding values after a vehicle executes a control command, and storing obtained experiences in an experience pool;
s74, the agent updates through the strategy gradient to realize the learning updating step of the neural network;
S75, repeatedly iterating until the algorithm converges to obtain a final environment intelligent agent model;
And S8, adding the converged environment intelligent agent model into the HCU controller, and performing online application on a fixed line.
2. The deep reinforcement learning-based hybrid electric vehicle system control method according to claim 1, characterized in that: the fixed-route multidimensional road condition information obtained in the step S1 comprises a plurality of groups of historical vehicle speed curves and road gradient curves, wherein the historical vehicle speed curves and the road gradient curves are collected when the electric vehicle runs on the fixed route.
3. The deep reinforcement learning-based hybrid electric vehicle system control method according to claim 1, characterized in that: the step S6 specifically comprises the following substeps;
S61, acquiring a state variable S, an optimal action variable a, a reward function r and a value function V of a next state variable S' transferred to, which are generated in the step S4, at each grid point of the data set;
S62, constructing the critic and actor neural network structures, each comprising an input layer, a hidden layer and an output layer, and determining the number of neurons in each layer;
S63, constructing critic an output sample set of the network, wherein the output sample set is as follows:
Q(s,a)=r+γ·V(s′)
wherein r is a reward function in the data set generated in the step S4, and V (S ') is a value function of a next state variable S' in the data set generated in the step S4;
S64, taking (S, a) in the step S61 as an input sample of the critic network, taking Q (S, a) in the step S63 as an output sample of the critic network, wherein S is an input sample of the actor network, and a is an output sample of the actor network, and pretraining the critic and actor networks by using a gradient descent method.
4. The deep reinforcement learning-based hybrid electric vehicle system control method according to claim 3, characterized in that: in step S62, the neural network structures critic and actor output the values of the four modes and the engine power, and the actual action variables are selected as follows:
a = (argmax(V_1, V_2, V_3, V_4), P_eng)
where V_1, V_2, V_3 and V_4 are the values of the series mode, the pure electric mode, the parallel first-gear mode and the parallel second-gear mode, respectively.
5. The deep reinforcement learning-based hybrid electric vehicle system control method according to claim 1, characterized in that: the specific policy gradient update formula in step S74 is as follows:
where r is the reward function, s and s' are the current state variable and the next state variable, a is the action variable, θ_Q and θ_μ are the critic and actor network parameters at the current time, θ_Q' and θ_μ' are the corresponding target critic and actor network parameters, γ and μ are weight parameters, and α and τ are the learning rate and the target network update rate.
CN202311359313.1A 2023-10-20 2023-10-20 Hybrid electric vehicle system control method based on deep reinforcement learning Active CN117184095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311359313.1A CN117184095B (en) 2023-10-20 2023-10-20 Hybrid electric vehicle system control method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311359313.1A CN117184095B (en) 2023-10-20 2023-10-20 Hybrid electric vehicle system control method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN117184095A CN117184095A (en) 2023-12-08
CN117184095B true CN117184095B (en) 2024-05-14

Family

ID=88988839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311359313.1A Active CN117184095B (en) 2023-10-20 2023-10-20 Hybrid electric vehicle system control method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN117184095B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110341690A (en) * 2019-07-22 2019-10-18 北京理工大学 A kind of PHEV energy management method based on deterministic policy Gradient learning
CN111731303A (en) * 2020-07-09 2020-10-02 重庆大学 HEV energy management method based on deep reinforcement learning A3C algorithm
CN112287463A (en) * 2020-11-03 2021-01-29 重庆大学 Fuel cell automobile energy management method based on deep reinforcement learning algorithm
CN115476841A (en) * 2022-10-10 2022-12-16 湖南大学重庆研究院 Plug-in hybrid electric vehicle energy management method based on improved multi-target DDPG
WO2023020083A1 (en) * 2021-08-20 2023-02-23 Ningbo Geely Automobile Research & Development Co., Ltd. A method for adaptative real-time optimization of a power or torque split in a vehicle
CN116424332A (en) * 2023-04-10 2023-07-14 重庆大学 Energy management strategy enhancement updating method for deep reinforcement learning type hybrid electric vehicle

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20210076223A (en) * 2019-12-13 2021-06-24 현대자동차주식회사 Hybrid vehicle and method of controlling the same

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110341690A (en) * 2019-07-22 2019-10-18 北京理工大学 A kind of PHEV energy management method based on deterministic policy Gradient learning
CN111731303A (en) * 2020-07-09 2020-10-02 重庆大学 HEV energy management method based on deep reinforcement learning A3C algorithm
CN112287463A (en) * 2020-11-03 2021-01-29 重庆大学 Fuel cell automobile energy management method based on deep reinforcement learning algorithm
WO2023020083A1 (en) * 2021-08-20 2023-02-23 Ningbo Geely Automobile Research & Development Co., Ltd. A method for adaptative real-time optimization of a power or torque split in a vehicle
CN115476841A (en) * 2022-10-10 2022-12-16 湖南大学重庆研究院 Plug-in hybrid electric vehicle energy management method based on improved multi-target DDPG
CN116424332A (en) * 2023-04-10 2023-07-14 重庆大学 Energy management strategy enhancement updating method for deep reinforcement learning type hybrid electric vehicle

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Energy management strategy optimization for hybrid electric vehicles based on parallel deep reinforcement learning; 李家曦; 孙友长; 庞玉涵; 伍朝兵; 杨小青; 胡博; Journal of Chongqing University of Technology (Natural Science); 2020-09-15 (No. 09); full text *
胡恒杰. Research on energy management strategy of plug-in hybrid electric vehicles based on reinforcement learning. China Master's Theses Full-text Database, Engineering Science and Technology II. 2021, (No. 5), full text. *
陈伯杨. Research on energy management strategy of power-split hybrid electric vehicles based on deep reinforcement learning. China Master's Theses Full-text Database, Engineering Science and Technology II. 2022, (No. 10), full text. *

Also Published As

Publication number Publication date
CN117184095A (en) 2023-12-08

Similar Documents

Publication Publication Date Title
Lian et al. Cross-type transfer for deep reinforcement learning based hybrid electric vehicle energy management
Wu et al. Continuous reinforcement learning of energy management with deep Q network for a power split hybrid electric bus
CN111731303B (en) HEV energy management method based on deep reinforcement learning A3C algorithm
CN110341690B (en) PHEV energy management method based on deterministic strategy gradient learning
CN106004865B (en) Mileage ADAPTIVE MIXED power vehicle energy management method based on operating mode's switch
CN110936949B (en) Energy control method, equipment, storage medium and device based on driving condition
Guo et al. Transfer deep reinforcement learning-enabled energy management strategy for hybrid tracked vehicle
CN111267831A (en) Hybrid vehicle intelligent time-domain-variable model prediction energy management method
Singh et al. Fuzzy logic and Elman neural network tuned energy management strategies for a power-split HEVs
Zhang et al. Route planning and power management for PHEVs with reinforcement learning
Li et al. Power management for a plug-in hybrid electric vehicle based on reinforcement learning with continuous state and action spaces
CN113554337B (en) Plug-in hybrid electric vehicle energy management strategy construction method integrating traffic information
CN106055830A (en) PHEV (Plug-in Hybrid Electric Vehicle) control threshold parameter optimization method based on dynamic programming
CN112765723A (en) Curiosity-driven hybrid power system deep reinforcement learning energy management method
CN115793445A (en) Hybrid electric vehicle control method based on multi-agent deep reinforcement learning
CN117131606A (en) Hybrid power tracked vehicle energy management method capable of transferring across motion dimension
Hu et al. Energy management optimization method of plug-in hybrid-electric bus based on incremental learning
CN115805840A (en) Energy consumption control method and system for range-extending type electric loader
Chen et al. Driving cycle recognition based adaptive equivalent consumption minimization strategy for hybrid electric vehicles
CN114969982A (en) Fuel cell automobile deep reinforcement learning energy management method based on strategy migration
Xia et al. A predictive energy management strategy for multi-mode plug-in hybrid electric vehicle based on long short-term memory neural network
Zhang et al. Uncertainty-Aware Energy Management Strategy for Hybrid Electric Vehicle Using Hybrid Deep Learning Method
He et al. Deep reinforcement learning based energy management strategies for electrified vehicles: Recent advances and perspectives
Zhang et al. A fuzzy neural network energy management strategy for parallel hybrid electric vehicle
CN117184095B (en) Hybrid electric vehicle system control method based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant