CN112026744B - Series-parallel hybrid power system energy management method based on DQN variants - Google Patents

Series-parallel hybrid power system energy management method based on DQN variants

Info

Publication number: CN112026744B
Application number: CN202010845021.9A
Authority: CN (China)
Prior art keywords: energy management, dqn, vehicle, series, automobile
Other languages: Chinese (zh)
Other versions: CN112026744A
Inventors: 周健豪, 薛四伍, 廖宇晖, 薛源
Current Assignee: Nanjing University of Aeronautics and Astronautics
Original Assignee: Nanjing University of Aeronautics and Astronautics
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202010845021.9A
Publication of application CN112026744A; application granted; publication of grant CN112026744B
Legal status: Active

Classifications

    • B60W 20/00: Control systems specially adapted for hybrid vehicles
    • B60W 50/00: Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W 2050/0019: Control system elements or transfer functions
    • B60W 2050/0031: Mathematical model of the vehicle
    • G06N 3/045: Combinations of networks
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Mechanical Engineering (AREA)
  • Transportation (AREA)
  • Human Computer Interaction (AREA)
  • Hybrid Electric Vehicles (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)

Abstract

The invention discloses a series-parallel hybrid power system energy management method based on a DQN variant, which belongs to the technical field of series-parallel hybrid electric vehicles and improves training convergence speed and vehicle fuel economy. The method comprises the following steps: establishing a series-parallel hybrid electric vehicle model and acquiring the environmental parameters that influence the energy management strategy, including road gradient and vehicle-mounted mass; solving with a dynamic programming (DP) algorithm to obtain the optimal energy management strategy and storing this experience in an optimal experience buffer (OEB); and, combining the hybrid experience replay (HER) technique, training the model with a Dueling DQN strategy to obtain a trained deep reinforcement learning agent for energy management of the series-parallel hybrid electric vehicle under different working conditions. The HER technique and the DQN-variant Dueling architecture constructed by the method effectively improve training convergence speed, vehicle fuel economy, and algorithm robustness.

Description

Series-parallel hybrid power system energy management method based on DQN variants
Technical Field
The invention belongs to the technical field of series-parallel hybrid electric vehicles, and particularly relates to a series-parallel hybrid power system energy management method based on DQN variants.
Background
At present, the energy crisis is deepening and automobile emission standards are becoming more stringent, challenging the continued use of pure fuel vehicles. Hybrid electric vehicles combine the long driving range of fuel vehicles with the low emissions of electric vehicles, mitigating the problems of fossil fuel combustion, so the energy management of hybrid power systems has always been a key research topic.
At present, most hybrid electric vehicle energy management uses rule-based strategies that set fixed energy management thresholds; the most common rule for plug-in hybrid vehicles is to first deplete the battery energy and then sustain the battery charge. Among optimization-based strategies, the representative benchmark is dynamic programming (DP): with the global working condition information known in advance, the relatively optimal energy management of the hybrid vehicle is obtained offline by using the known speed profile to distribute the energy demand optimally between the engine and the battery. In the prior art, engineers typically either hand-craft rules for rule-based energy management or use optimized model predictive control based on known or predicted speeds, thereby adjusting the equivalent fuel consumption of the hybrid vehicle.
However, the prior-art methods have drawbacks. Rule-based energy management is often not effective enough and requires considerable expert knowledge for each single working condition; optimization-based DP requires the global working condition to be known and its computation time is too long for real-time online application; model predictive control can run in real time with optimization, but its prediction horizon cannot be chosen too large, so its result still falls well short of the DP optimum. Moreover, many optimization methods are not comprehensive: they ignore the road gradient information and the changes in the vehicle-mounted mass.
Disclosure of Invention
The invention provides a series-parallel hybrid power system energy management method based on a DQN variant which, by combining the OEB and PER techniques into a hybrid experience replay, improves training convergence speed and vehicle fuel economy.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a method for the energy management of a series-parallel hybrid power system based on DQN variants comprises the following steps:
Step one: establishing a model of the series-parallel hybrid vehicle;
step two: acquiring parameters influencing energy management of the experimental vehicle under a fixed route working condition, then solving by using DP to obtain an optimal solution, and storing the optimal solution experience in an OEB;
step three: based on parameters and observed quantities influencing energy management, a Dueling DQN neural network model is trained by using HER combined with PER to obtain a trained deep reinforcement learning agent;
step four: and acquiring parameters and observed quantities influencing energy management in the actual running of the automobile, and performing energy management on the hybrid electric vehicle under different working conditions based on the parameters and the observed quantities influencing the energy management in the actual running and the trained deep reinforcement learning agent.
In the above steps, the objective function of energy management is:

J = Σ_{t=0}^{T} [ ṁ_fuel(t) + γ·(SOC(t) − SOC_ref(t))² ]

wherein γ is a positive weighting factor that balances the equivalent fuel consumption against battery power consumption; SOC_ref represents the SOC reference value; and ṁ_fuel is the fuel consumption at each sampling time;
the series-parallel vehicle model comprises a vehicle dynamics model, a planetary gear transmission, a motor, and a battery;
the automobile dynamics model is as follows:
T_out = R·(F_a + F_r + F_g + F_f)
F_a = m·a,  F_r = (1/2)·ρ·A·C_D·v²,  F_g = m·g·sin α,  F_f = μ_r·m·g·cos α

wherein T_out is the drive-shaft torque, R is the wheel radius, F_a is the inertial resistance of the vehicle, F_r is the air resistance, F_g is the grade (ramp) resistance, F_f is the rolling resistance, m is the vehicle mass, v is the vehicle speed, a is the vehicle acceleration, ρ is the air density, A is the frontal area of the vehicle, C_D is the air-drag coefficient, α is the road gradient, μ_r is the rolling-resistance coefficient, and g is the gravitational acceleration;
the planetary gear transmission model is as follows:
n_m + β·n_out = (1 + β)·n_e
T_m = T_e/(1 + β)

wherein n_m is the motor speed; β is the planetary gear parameter; n_out is the drive-shaft speed; n_e is the engine speed; T_m is the motor torque; T_e is the engine torque;
the cell model shown is:
Figure BDA0002642740590000031
wherein P isbatt(t) is the battery power, VocIs the battery voltage, Ib(t) is the battery current, rintIs the battery resistance, Pm(t) is the motor power, which is of the magnitude
Figure BDA0002642740590000032
Wherein etamIs the motor efficiency, SOC is the battery state of charge, QmaxIs the maximum capacity of the battery;
the parameters influencing the energy management comprise road conditions, namely road gradient, of the hybrid electric vehicle on different working conditions and vehicle-mounted mass change caused by passenger or cargo change;
the observed quantity influencing energy management comprises the speed of the automobile, the acceleration of the automobile, the engine rotating speed, the engine torque, the motor rotating speed, the motor torque, the battery state of charge, the fuel consumption at the current moment, the difference value between the SOC and the reference SOC, the automobile displacement, the measurable interference road gradient and the vehicle-mounted quality variation quantity;
the second step to the fourth step specifically comprise the following steps:
under the condition that working condition information is known in advance, an optimal energy management experience is obtained by utilizing a DP algorithm to solve and is stored in an OEB, then, under the condition of real-time working conditions, a dulling DQN in deep reinforcement learning is utilized to train, in each training, the SOC at the current moment, the difference value between the SOC and a reference SOC, the automobile speed, the automobile displacement, the automobile acceleration, the fuel consumption, the road gradient and the vehicle-mounted quality variation are used as observed value input data of a dulling DQN agent, and the reward value at the current moment is used as reward value input data of the dulling DQN agent, the experience obtained at the moment is stored in the PEB, then, PER is utilized to sample in the PEB and randomly sample from the OEB, the experience of the two is combined, wherein the proportion of the experience in the PEB can be continuously reduced along with the progress of time, so that a neural network of the dulling DQN of the HER is trained to obtain a convergent agent, obtaining equivalent fuel consumption of the experimental vehicle under different working conditions and parameters influencing energy management; inputting the observed value into the deep reinforcement learning agent, and outputting the observed value as the control quantity of the series-parallel hybrid electric vehicle, namely the torque demand and the rotating speed of the engine at the next moment of the current moment, wherein the current moment is the moment of the current observed quantity;
the parameters of the experimental vehicle influencing energy management under different working conditions are obtained as follows: obtaining a plurality of samples, each sample including parameters collected on the test vehicle at different times that may affect energy management;
after obtaining the equivalent fuel consumption of the experimental vehicle under different working conditions and the parameters influencing the energy management, acquiring the equivalent fuel consumption of the experimental vehicle under the working conditions of a fixed route and the parameters influencing the energy management at a preset sampling frequency, and performing smoothing processing and normalization processing on the acquired data.
Beneficial effects: the invention provides a series-parallel hybrid power system energy management method based on DQN variants that takes into account the environmental parameters influencing energy management, namely road gradient and vehicle-mounted mass change. It solves for the optimal experience in advance with DP and combines it with PER to form HER, so that the Dueling-architecture DQN agent can be trained better and the equivalent fuel consumption of the experimental vehicle and the environmental parameters influencing energy management are obtained under different working conditions; a deep reinforcement learning agent model is trained on the parameters and observations influencing energy management to obtain a trained agent; the environmental parameters influencing energy management are then acquired during actual driving, and the energy of the hybrid vehicle is managed based on these parameters and the trained agent. Energy optimization can thus be controlled effectively and applied online in real time, achieving more effective control of hybrid vehicle energy management and reducing energy consumption.
Drawings
FIG. 1 is a schematic illustration of the training and application process of a series-parallel hybrid power system energy management method based on DQN variants in an embodiment of the present invention;
FIG. 2 is a flow chart of a specific application of a series-parallel hybrid power system energy management method based on DQN variants in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the network structure of the Dueling DQN used in the series-parallel hybrid power system energy management method based on DQN variants in the embodiment of the invention;
FIG. 4 is a graph of reference SOC versus time for an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the following figures and specific examples:
a method for the energy management of a series-parallel hybrid power system based on DQN variants comprises the following steps:
Step one: establishing a model of the series-parallel hybrid vehicle;
step two: acquiring parameters influencing energy management of the experimental vehicle under a fixed route working condition, then solving by using DP to obtain an optimal solution, and storing the optimal solution experience in an OEB;
step three: based on parameters and observed quantities influencing energy management, a Dueling DQN neural network model is trained by using HER combined with PER to obtain a trained deep reinforcement learning agent;
step four: and acquiring parameters and observed quantities influencing energy management in the actual running of the automobile, and performing energy management on the hybrid electric vehicle under different working conditions based on the parameters and the observed quantities influencing the energy management in the actual running and the trained deep reinforcement learning agent.
In the above steps, the objective function of energy management is:
J = Σ_{t=0}^{T} [ ṁ_fuel(t) + γ·(SOC(t) − SOC_ref(t))² ]

wherein γ is a positive weighting factor that balances the equivalent fuel consumption against battery power consumption, and SOC_ref represents the SOC reference value. As shown in FIG. 4, the reference SOC in the objective function is obtained from historical trip information, chiefly the running time of the vehicle on the working condition; a simple SOC reference can thus be used, namely a reference SOC that decreases uniformly as a linear function of time, under which the battery works better;
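For illustration, a minimal sketch of this objective as a per-step cost and a corresponding reward is given below; the linear reference profile and the numbers in it (initial/final SOC of 0.8/0.3, γ = 1.0) are assumed placeholders, not values from the patent:

```python
def soc_reference(t, t_total, soc_init=0.8, soc_final=0.3):
    """Linear SOC reference: falls uniformly with time over the trip
    (soc_init/soc_final are illustrative placeholders)."""
    return soc_init + (soc_final - soc_init) * (t / t_total)

def step_cost(mdot_fuel, soc, soc_ref, gamma=1.0):
    """Per-sampling-time term of the objective: fuel consumption plus
    the weighted squared deviation of SOC from its reference."""
    return mdot_fuel + gamma * (soc - soc_ref) ** 2

def reward(mdot_fuel, soc, soc_ref, gamma=1.0):
    """The agent's reward can be taken as the negative step cost, so that
    maximizing return minimizes the objective summed over the cycle."""
    return -step_cost(mdot_fuel, soc, soc_ref, gamma)
```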
the series-parallel vehicle model comprises a vehicle dynamics model, a planetary gear transmission, a motor, and a battery;
the automobile dynamics model is as follows:
T_out = R·(F_a + F_r + F_g + F_f)
F_a = m·a,  F_r = (1/2)·ρ·A·C_D·v²,  F_g = m·g·sin α,  F_f = μ_r·m·g·cos α

wherein T_out is the drive-shaft torque, R is the wheel radius, F_a is the inertial resistance of the vehicle, F_r is the air resistance, F_g is the grade (ramp) resistance, F_f is the rolling resistance, m is the vehicle mass, v is the vehicle speed, a is the vehicle acceleration, ρ is the air density, A is the frontal area of the vehicle, C_D is the air-drag coefficient, α is the road gradient, μ_r is the rolling-resistance coefficient, and g is the gravitational acceleration;
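A minimal sketch of this longitudinal model follows; all numeric defaults (wheel radius, frontal area, drag and rolling coefficients) are assumed example values, not parameters disclosed in the patent:

```python
import math

RHO, G = 1.225, 9.81   # air density [kg/m^3] and gravity [m/s^2]

def drive_shaft_torque(m, v, a, alpha, R=0.3, A=2.3, c_d=0.3, mu_r=0.012):
    """Sum the four driving resistances and convert the total tractive
    force into drive-shaft torque via the wheel radius R."""
    f_a = m * a                           # inertial resistance
    f_r = 0.5 * RHO * A * c_d * v ** 2    # air resistance
    f_g = m * G * math.sin(alpha)         # grade resistance
    f_f = mu_r * m * G * math.cos(alpha)  # rolling resistance
    return R * (f_a + f_r + f_g + f_f)
```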
the planetary gear transmission model is as follows:
n_m + β·n_out = (1 + β)·n_e
T_m = T_e/(1 + β)

wherein n_m is the motor speed; β is the planetary gear parameter; n_out is the drive-shaft speed; n_e is the engine speed; T_m is the motor torque; T_e is the engine torque;
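The relations above can be exercised with the sketch below; it assumes the usual power-split arrangement (engine on the carrier, motor on the sun gear, drive shaft on the ring gear), which is an assumption about this patent's configuration rather than a statement from it:

```python
def motor_speed(n_e, n_out, beta):
    """Kinematic constraint n_m + beta*n_out = (1 + beta)*n_e, solved
    for the motor speed n_m (speeds in consistent units, e.g. rpm)."""
    return (1 + beta) * n_e - beta * n_out

def torque_split(t_e, beta):
    """Static torque relations of the gear set under the same assumption:
    sun (motor) reaction torque and engine torque reaching the ring."""
    t_m = t_e / (1 + beta)
    t_ring = beta * t_e / (1 + beta)
    return t_m, t_ring
```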
the cell model shown is:
Figure BDA0002642740590000054
wherein P isbatt(t) is the battery power, VocIs the battery voltage, Ib(t) is the battery current, rintIs the battery resistance, Pm(t) is the motor power, which is largeIs small as
Figure BDA0002642740590000061
Wherein etamIs the motor efficiency, SOC is the battery state of charge, QmaxIs the maximum capacity of the battery;
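A sketch of one SOC integration step under this battery model is shown below; the open-circuit voltage, internal resistance, motor efficiency, and capacity are assumed example values:

```python
import math

def battery_step(p_m, soc, dt, v_oc=350.0, r_int=0.1,
                 eta_m=0.9, q_max=6.5 * 3600):
    """Advance the SOC by one sampling interval dt (q_max in A*s).
    p_m > 0 means the motor draws power, i.e. the battery discharges."""
    # Battery-side power corrected by motor efficiency
    # (P_batt = P_m / eta when discharging, P_m * eta when charging)
    p_batt = p_m / eta_m if p_m >= 0 else p_m * eta_m
    # Battery current from P_batt = V_oc*I - I^2*r_int
    disc = max(v_oc ** 2 - 4 * r_int * p_batt, 0.0)
    i_b = (v_oc - math.sqrt(disc)) / (2 * r_int)
    # SOC dynamics: dSOC/dt = -I_b / Q_max
    return soc - i_b / q_max * dt
```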
the parameters influencing the energy management comprise road conditions, namely road gradient, of the hybrid electric vehicle on different working conditions and vehicle-mounted mass change caused by passenger or cargo change;
the observed quantity influencing energy management comprises the speed of the automobile, the acceleration of the automobile, the engine rotating speed, the engine torque, the motor rotating speed, the motor torque, the battery state of charge, the fuel consumption at the current moment, the difference value between the SOC and the reference SOC, the automobile displacement, the measurable interference road gradient and the vehicle-mounted quality variation quantity;
as shown in fig. 1, the off-line training and on-line real-time application process of the series-parallel hybrid power system energy management method based on the DQN variant includes the following steps:
under the condition that working condition information is known in advance, an optimal energy management experience is obtained by utilizing a DP algorithm to solve and is stored in an OEB, then, under the condition of real-time working conditions, a dulling DQN in deep reinforcement learning is utilized to train, in each training, the SOC at the current moment, the difference value between the SOC and a reference SOC, the automobile speed, the automobile displacement, the automobile acceleration, the fuel consumption, the road gradient and the vehicle-mounted quality variation are used as observed value input data of a dulling DQN agent, and the reward value at the current moment is used as reward value input data of the dulling DQN agent, the experience obtained at the moment is stored in the PEB, then, PER is utilized to sample in the PEB and randomly sample from the OEB, the experience of the two is combined, wherein the proportion of the experience in the PEB can be continuously reduced along with the progress of time, so that a neural network of the dulling DQN of the HER is trained to obtain a convergent agent, obtaining equivalent fuel consumption of the experimental vehicle under different working conditions and parameters influencing energy management; inputting the observed value into the deep reinforcement learning agent, and outputting the observed value as the control quantity of the series-parallel hybrid electric vehicle, namely the torque demand and the rotating speed of the engine at the next moment of the current moment, wherein the current moment is the moment of the current observed quantity;
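The mixed sampling described above can be sketched as follows; the prioritised `sample` method on the PEB and the linear decay schedule for the PEB share are assumptions for illustration, not interfaces defined by the patent:

```python
import random

def her_batch(peb, oeb, batch_size, peb_share):
    """Hybrid experience replay mini-batch: a PER-prioritised draw from
    the online buffer (PEB) mixed with a uniform draw from the buffer of
    DP-optimal experience (OEB)."""
    n_peb = int(batch_size * peb_share)
    batch = peb.sample(n_peb)                        # prioritised (PER)
    batch += random.sample(oeb, batch_size - n_peb)  # uniform from OEB
    random.shuffle(batch)
    return batch

def peb_share_at(step, total_steps, start=0.8, end=0.2):
    """Assumed schedule: the share drawn from the PEB shrinks linearly
    as training progresses, as the description above states."""
    frac = min(step / total_steps, 1.0)
    return start + (end - start) * frac
```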
the parameters of the experimental vehicle influencing energy management under different working conditions are obtained as follows: obtaining a plurality of samples, each sample including parameters collected on the test vehicle at different times that may affect energy management;
after obtaining the equivalent fuel consumption of the experimental vehicle under different working conditions and the parameters influencing the energy management, acquiring the equivalent fuel consumption of the experimental vehicle under the working conditions of a fixed route and the parameters influencing the energy management at a preset sampling frequency, and performing smoothing processing and normalization processing on the acquired data.
The method can be applied to scenarios of the series-parallel hybrid electric vehicle under different working conditions. For example, when the method is used for vehicle energy management, the vehicle can adjust its driving online according to the pre-trained agent, reducing the equivalent fuel consumption and achieving more precise control. As shown in FIG. 2, the specific application process of the above method is as follows:
step 201, obtaining environmental parameters of the experimental vehicle influencing energy management under different working conditions
A working condition represents the change of the experimental vehicle's running speed over time; for example, a vehicle travelling from its starting station to its terminal station in 1024 s, together with the associated speed profile, can be regarded as one working condition. In this embodiment, the NEDC is used as the experimental training condition, and data from at least three cycles of the experimental vehicle under this condition are collected to ensure the reliability of the training data.
The environmental parameters affecting energy management may be at least one of the road conditions of the hybrid vehicle under different working conditions, i.e. the road gradient, and the vehicle-mounted mass changes due to changes in passengers and cargo.
In implementation, at least one environmental parameter that may influence energy management is obtained for the experimental vehicle under each working condition, and the environmental parameters influencing energy management are selected from these. The equivalent fuel consumption of the experimental vehicle under each working condition is obtained by reading the fuel consumption and battery charge sensors installed on the vehicle.
Optionally, the method includes acquiring the equivalent fuel consumption of the experimental vehicle and the environmental parameters and the observed quantity influencing energy management under various working conditions at a preset sampling frequency, and performing smoothing processing and normalization processing on the acquired parameters.
The higher the sampling frequency, the smaller the interval between sampling points, the more data are obtained, and the stronger the correlation between the data, so the output of the finally trained agent model is more accurate. The technician can preset a sampling interval and sample the running experimental vehicle accordingly; for example, the sampling frequency may be set to 1 Hz.
Smoothing the collected data suppresses inaccurate samples.
Normalizing the acquired parameters makes the values of different parameters comparable, which improves the accuracy of the agent network model and greatly helps in setting the reward value.
Specifically, for the normalization in this embodiment, since multiple groups of parameter items have been obtained in the above process, the maximum and minimum parameter values in each group are determined and the normalization is computed by the following formula:

x* = (x − x_min)/(x_max − x_min)

wherein x* is the normalized value of a parameter in a group of parameter items, x_min is the minimum parameter value in that group, and x_max is the maximum parameter value in that group.
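A one-function sketch of this min-max normalization:

```python
def min_max_normalize(values):
    """Scale one group of parameter items into [0, 1] using the formula
    above: (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    return [(x - x_min) / (x_max - x_min) for x in values]

# e.g. min_max_normalize([20, 35, 50]) -> [0.0, 0.5, 1.0]
```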
Step 202: based on the environmental parameters and observations influencing energy management, training the agent model with deep reinforcement learning to obtain a trained, converged agent.
The deep reinforcement learning agent is a combination of neural network models that determines the action for the next moment from the observation at the current moment. Observation data are fed to the input layer of the agent model, and the output layer produces the Q-value estimates.
The neural network computes the Q function to obtain the Q value Q(s, a | θ^Q): the input is the state s and action a, and the output is the Q function value. The network is divided into an Online evaluation network and a Target evaluation network of identical structure; the Online network parameters θ^Q are randomly initialized and used to initialize the Target network parameters θ^Q′, and at the same time a buffer PEB is opened as the storage space for experience replay (the OEB already holds the DP-optimal experience).
After initialization, the iterative solution starts. Action exploration uses the ε-greedy algorithm: action a_t is executed in the current state, the corresponding reward and next state are obtained, and the transition tuple (s_t, a_t, r_t, s_{t+1}) is stored in the PEB. A mini-batch is then selected from the PEB with the PER technique and combined with a mini-batch sampled randomly from the OEB to perform HER; this serves as the training data of the Online evaluation network, which is updated accordingly.
The Online evaluation network Loss function is defined as L = [ r + γ·max_{a′} Q′(s′, a′ | θ^Q′) − Q(s, a | θ^Q) ]², where γ here denotes the discount factor and Q′ the Target evaluation network; the Online network is updated by minimizing this Loss. The updated Online network parameters θ^Q are then used to update the Target network parameters θ^Q′:
θ^Q′ ← τ·θ^Q + (1 − τ)·θ^Q′
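One Online-network update and the soft Target update can be sketched in PyTorch as follows; the hyperparameters (γ = 0.99, τ = 0.01) and the batch layout are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def dqn_update(online, target, optimizer, batch, gamma=0.99, tau=0.01):
    """One gradient step on the Online evaluation network followed by a
    soft update of the Target network.  `batch` holds tensors (s, a, r, s2);
    both networks map a state batch to a vector of Q values per action."""
    s, a, r, s2 = batch
    with torch.no_grad():
        # TD target: r + gamma * max_a' Q_target(s', a')
        y = r + gamma * target(s2).max(dim=1).values
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, y)          # L = [y - Q(s, a)]^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Soft update: theta' <- tau*theta + (1 - tau)*theta'
    for p, p_t in zip(online.parameters(), target.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```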
in the deep reinforcement learning agent dulling DQN neural network framework shown in fig. 3, the Q value of standard DQN represents the value of a state and an action, and in this case, for the case that the Q value is not related to different actions in some states, the evaluation of the value of the action is not accurate, and the robustness and the stability are lacked.
The Dueling DQL divides the abstract features extracted by the convolutional layer into two streams in the fully connected layer, respectively represents the state action value function represented only by the original DQN as a state value function v(s), makes the state estimation value independent of action and environmental noise, makes each state have a relative value with respect to other unselected actions, and the other stream represents a state action dominance function a (s, a) in a certain state. Finally, combining the two streams together through a special aggregation layer to generate an estimate of the state-action-value function has the advantage of generalizing learning across operations without any modification to the underlying RL algorithm, in dulling DQL, the Q-value function is constructed as follows:
Q(s,a;θ,α,β)=V(s;θ,α)+A(s,a;θ,β) (7)
where α and β are the parameters of the two streams in the fully connected layers, and θ the parameters of the convolutional layers. However, given Q, the values of V and A are not unique; in other words, different combinations of V and A can yield the same Q value, which makes the algorithm less stable. The mean of the advantage function is therefore used to improve the stability of the proposed algorithm:
Q(s, a; θ, α, β) = V(s; θ, α) + ( A(s, a; θ, β) − (1/|A|)·Σ_{a′} A(s, a′; θ, β) )    (8)
only more layers are needed as compared with standard DQN training, but when there are many behaviors of similar value, dulling DQN can better perform strategy evaluation and improve stability and robustness.
In practice, the number of neurons in the hidden layer of the agent model is set to 40. To evaluate the effect of deep reinforcement learning energy management accurately, its control effect can be assessed through the equivalent fuel consumption ratio R.
The equivalent fuel consumption ratio compares the actual control effect against the DP reference; the closer the R value is to 1, the better the result. The ratio R is calculated as follows:
R = S_DP / S_RL
wherein R is the ratio between the DP reference data and the actual data, S_RL is the equivalent fuel consumption obtained by deep reinforcement learning training, and S_DP is the equivalent fuel consumption reference obtained under DP.
It should be noted that the control performance of the trained agent model is evaluated by calculating the ratio and the root mean square error between the reference data and the actual data; for example, a ratio close to 1 together with a root mean square error close to 0 indicates that the agent model trained by deep reinforcement learning controls well.
Optionally, the training condition is the NEDC and the detection conditions are WLTP, FTP75, UDDS, and JN1015. To make the control of the deep reinforcement learning algorithm more accurate, the trained agent models can be tested separately under these different working conditions; the R value is calculated during training under each working condition and used as the index for comparing control performance, thereby verifying the effectiveness and robustness of the deep reinforcement learning. The results are shown in Table 1:
TABLE 1
[Table 1 appears as an image in the original publication; it reports the equivalent fuel consumption ratio R under the WLTP, FTP75, UDDS, and JN1015 working conditions.]
As can be seen from Table 1, the ratio R between the reference data and the actual data obtained during deep reinforcement learning training is close to 90%, which demonstrates the effectiveness of real-time application.
Step 203: acquiring the environmental parameters and observations influencing energy management during actual driving, and controlling the vehicle's energy management based on these parameters and the trained agent model.
In the above steps, since the environmental parameter items influencing energy management and the trained agent model have been obtained, the parameters and observations influencing energy can be input to the trained agent model in real time to control the driving and energy management of the vehicle.
Specifically, to control the vehicle's control action at the current moment, the environmental parameters and observations affecting energy management at the estimated moment must be obtained, e.g. at least one of the road gradient and the vehicle-mounted mass change of the hybrid vehicle under different working conditions, together with the vehicle speed, vehicle acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, current fuel consumption, difference between the SOC and the reference SOC, and vehicle displacement. These parameters are input to the trained agent model, which outputs the vehicle's control action at the estimated moment, namely the engine torque and speed at the next moment; the estimated moment is the sampling instant immediately after the current moment, i.e. the moment corresponding to the sampling point after the one corresponding to the current moment.
The above is only a preferred embodiment of the present invention, which further details the purpose, technical solution and advantages of the invention without limiting it; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

1. A method for the energy management of a series-parallel hybrid power system based on DQN variants is characterized by comprising the following steps:
establishing a model of the series-parallel hybrid vehicle; under the condition that the working condition information is known in advance, solving with the DP algorithm to obtain the optimal energy management experience and storing it in an OEB; then, under real-time working conditions, training a Dueling DQN in deep reinforcement learning, wherein in each training step the SOC at the current moment, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement, the vehicle acceleration, the fuel consumption, the road gradient and the vehicle-mounted mass variation are used as the observation input data of the Dueling DQN agent, the reward value at the current moment is used as its reward input data, and the experience obtained at each moment is stored in the PEB; then sampling from the PEB with PER and sampling randomly from the OEB and combining the two sets of experience, wherein the proportion of experience taken from the PEB is reduced continuously as training progresses, so that the Dueling DQN neural network is trained with HER to obtain a converged agent, giving the equivalent fuel consumption of the experimental vehicle under different working conditions and the parameters and observed quantities influencing energy management; and inputting the observed quantities into the deep reinforcement learning agent to perform energy management of the hybrid vehicle under different working conditions, the output being the control quantity of the series-parallel hybrid vehicle, namely the torque demand and speed of the engine at the moment following the current moment.
2. The method for series-parallel hybrid power system energy management based on DQN variants according to claim 1, characterized in that the model of the series-parallel hybrid vehicle comprises a vehicle dynamics model, a planetary gear transmission, a motor, and a battery.
3. The method of series-parallel hybrid power system energy management based on DQN variant according to claim 2, characterized by the automotive dynamics model as follows:
T_out = R·(F_a + F_r + F_g + F_f)
F_a = m·a,  F_r = (1/2)·ρ·A·C_D·v²,  F_g = m·g·sin α,  F_f = μ_r·m·g·cos α

wherein T_out is the drive-shaft torque, R is the wheel radius, F_a is the inertial resistance of the vehicle, F_r is the air resistance, F_g is the grade (ramp) resistance, F_f is the rolling resistance, m is the vehicle mass, v is the vehicle speed, a is the vehicle acceleration, ρ is the air density, A is the frontal area of the vehicle, C_D is the air-drag coefficient, α is the road gradient, μ_r is the rolling-resistance coefficient, and g is the gravitational acceleration.
4. The method of series-parallel hybrid powertrain energy management based on DQN variants according to claim 2, characterized in that the planetary gear transmission model is as follows:
n_m + β·n_out = (1 + β)·n_e
T_m = T_e/(1 + β)

wherein n_m is the motor speed; β is the planetary gear parameter; n_out is the drive-shaft speed; n_e is the engine speed; T_m is the motor torque; T_e is the engine torque.
5. The method for series-parallel hybrid power system energy management based on DQN variant according to claim 2, characterized by the battery model:
P_batt(t) = V_oc·I_b(t) − I_b(t)²·r_int
I_b(t) = (V_oc − √(V_oc² − 4·r_int·P_batt(t)))/(2·r_int)

wherein P_batt(t) is the battery power, V_oc is the open-circuit battery voltage, I_b(t) is the battery current, and r_int is the internal battery resistance; P_m(t) is the motor power, whose magnitude satisfies

P_m(t) = η_m^k·P_batt(t) (k = 1 when the battery discharges, k = −1 when it charges)
d(SOC(t))/dt = −I_b(t)/Q_max

wherein η_m is the motor efficiency, SOC is the battery state of charge, and Q_max is the maximum capacity of the battery.
6. The method for series-parallel hybrid power system energy management based on DQN variants according to claim 1, characterized in that the parameters influencing energy management include the road conditions of the hybrid vehicle under different working conditions, namely the road gradient, and the vehicle-mounted mass change caused by changes in passengers or cargo; the observed quantities affecting energy management include the vehicle speed, vehicle acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, fuel consumption at the current moment, difference between the SOC and the reference SOC, vehicle displacement, and the measurable disturbances, namely road gradient and vehicle-mounted mass variation.
7. The method for series-parallel hybrid power system energy management based on DQN variants according to claim 1, characterized in that the objective function of energy management is:

J = Σ_{t=0}^{T} [ ṁ_fuel(t) + γ·(SOC(t) − SOC_ref(t))² ]

wherein γ is a positive weighting factor that balances the equivalent fuel consumption against battery power consumption, SOC_ref represents the SOC reference value, and ṁ_fuel(t) is the fuel consumption at each sampling time.
8. The method for series-parallel hybrid power system energy management based on DQN variants according to claim 1, characterized in that the parameters of the experimental vehicle influencing energy management under different working conditions are obtained as follows: a plurality of samples are taken, each sample including parameters collected on the experimental vehicle at different times that may affect energy management.
9. The DQN variant-based energy management method for the series-parallel hybrid power system, according to claim 1, wherein after obtaining the equivalent fuel consumption and the parameters affecting the energy management of the experimental vehicle under different working conditions, the method collects the equivalent fuel consumption and the parameters affecting the energy management of the experimental vehicle under the working conditions of a fixed route at a preset sampling frequency, and performs smoothing and normalization processing on the collected data.
CN202010845021.9A 2020-08-20 2020-08-20 Series-parallel hybrid power system energy management method based on DQN variants Active CN112026744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010845021.9A CN112026744B (en) 2020-08-20 2020-08-20 Series-parallel hybrid power system energy management method based on DQN variants

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010845021.9A CN112026744B (en) 2020-08-20 2020-08-20 Series-parallel hybrid power system energy management method based on DQN variants

Publications (2)

Publication Number Publication Date
CN112026744A (en) 2020-12-04
CN112026744B (en) 2022-01-04

Family

ID=73581036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010845021.9A Active CN112026744B (en) 2020-08-20 2020-08-20 Series-parallel hybrid power system energy management method based on DQN variants

Country Status (1)

Country Link
CN (1) CN112026744B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112498334B (en) * 2020-12-15 2022-03-11 清华大学 Robust energy management method and system for intelligent network-connected hybrid electric vehicle
CN113492827A (en) * 2021-06-23 2021-10-12 东风柳州汽车有限公司 Energy management method and device for hybrid electric vehicle
CN113997926A (en) * 2021-11-30 2022-02-01 江苏浩峰汽车附件有限公司 Parallel hybrid electric vehicle energy management method based on layered reinforcement learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1252036B1 (en) * 2000-01-31 2006-03-15 Azure Dynamics Inc. Method and apparatus for adaptive hybrid vehicle control
CN102717797A (en) * 2012-06-14 2012-10-10 北京理工大学 Energy management method and system of hybrid vehicle
CN110682905A (en) * 2019-10-12 2020-01-14 重庆大学 Method for acquiring battery charge state reference variable quantity in time domain based on driving mileage
CN111267831A (en) * 2020-02-28 2020-06-12 南京航空航天大学 Hybrid vehicle intelligent time-domain-variable model prediction energy management method
CN111267830A (en) * 2020-02-10 2020-06-12 南京航空航天大学 Hybrid power bus energy management method, device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8543272B2 (en) * 2010-08-05 2013-09-24 Ford Global Technologies, Llc Distance oriented energy management strategy for a hybrid electric vehicle


Also Published As

Publication number Publication date
CN112026744A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
CN112026744B (en) Series-parallel hybrid power system energy management method based on DQN variants
CN110775065B (en) Hybrid electric vehicle battery life prediction method based on working condition recognition
WO2021114742A1 (en) Comprehensive energy prediction and management method for hybrid electric vehicle
Liessner et al. Deep reinforcement learning for advanced energy management of hybrid electric vehicles.
CN111267830B (en) Hybrid power bus energy management method, device and storage medium
CN108909702A (en) A kind of plug-in hybrid-power automobile energy management method and system
CN111267831A (en) Hybrid vehicle intelligent time-domain-variable model prediction energy management method
CN112249002B (en) TD 3-based heuristic series-parallel hybrid power energy management method
CN110682905B (en) Method for acquiring battery charge state reference variable quantity in time domain based on driving mileage
CN112668799A (en) Intelligent energy management method and storage medium for PHEV (Power electric vehicle) based on big driving data
CN113525396B (en) Hybrid electric vehicle layered prediction energy management method integrating deep reinforcement learning
CN108647836B (en) Driver energy-saving evaluation method and system
CN105527110B (en) The appraisal procedure and device of automobile fuel ecomomy
Zöldy et al. Modelling fuel consumption and refuelling of autonomous vehicles
CN112406875A (en) Vehicle energy consumption analysis method and device
Montazeri-Gh et al. Driving condition recognition for genetic-fuzzy HEV control
CN115107733A (en) Energy management method and system for hybrid electric vehicle
Vignesh et al. Intelligent energy management through neuro-fuzzy based adaptive ECMS approach for an optimal battery utilization in plugin parallel hybrid electric vehicle
CN110077389B (en) Energy management method for plug-in hybrid electric vehicle
CN115805840A (en) Energy consumption control method and system for range-extending type electric loader
Peng et al. Ecological driving framework of hybrid electric vehicle based on heterogeneous multi agent deep reinforcement learning
Puchalski et al. Driving style analysis and driver classification using OBD data of a hybrid electric vehicle
Chen et al. On the relationship between energy consumption and driving behavior of electric vehicles based on statistical features
Fechert et al. Using deep reinforcement learning for hybrid electric vehicle energy management under consideration of dynamic emission models
CN114969982A (en) Fuel cell automobile deep reinforcement learning energy management method based on strategy migration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant