CN112026744A - Series-parallel hybrid power system energy management method based on DQN variants - Google Patents
- Publication number
- CN112026744A (application number CN202010845021.9A)
- Authority
- CN
- China
- Prior art keywords
- energy management
- dqn
- series
- vehicle
- automobile
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W20/00—Control systems specially adapted for hybrid vehicles
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B60W2050/0001—Details of the control system
- B60W2050/0019—Control system elements or transfer functions
- B60W2050/0028—Mathematical models, e.g. for simulation
- B60W2050/0031—Mathematical model of the vehicle
Abstract
The invention discloses an energy management method for a series-parallel hybrid power system based on a DQN variant, belonging to the technical field of series-parallel hybrid electric vehicles, which improves training convergence speed and vehicle fuel economy. The method comprises: establishing a series-parallel hybrid electric vehicle model and acquiring the environmental parameters influencing the energy management strategy, including road gradient and on-board mass; solving with a dynamic programming (DP) algorithm to obtain the optimal energy management strategy and storing this experience in an optimal experience pool (OEB); and, combining the hybrid experience replay (HER) technique with a Dueling DQN architecture, training the model to obtain a trained deep reinforcement learning agent for energy management of the series-parallel hybrid electric vehicle under different driving cycles. The HER technique and the Dueling DQN variant constructed by the method effectively improve training convergence speed, vehicle fuel economy, and algorithm robustness.
Description
Technical Field
The invention belongs to the technical field of series-parallel hybrid electric vehicles, and particularly relates to a series-parallel hybrid power system energy management method based on DQN variants.
Background
At present, the energy crisis is becoming more serious and automobile emission standards are becoming stricter, challenging the use of pure fuel vehicles. Hybrid electric vehicles combine the long driving range of fuel vehicles with the zero emissions of electric vehicles to mitigate the problem of fossil fuel combustion, so the energy management of hybrid power systems has always been a key research topic.
At present, most energy management for hybrid electric vehicles is rule-based, setting certain energy management thresholds; the most common rule for plug-in hybrid vehicles is to first deplete battery energy and then maintain battery charge, performing energy control according to this rule. Among optimization-based strategies, the representative benchmark is dynamic programming (DP): with the global driving-cycle information known, a relatively optimal energy management for the hybrid vehicle is obtained offline, using the known speed profile to allocate the optimal energy demand between the engine and the battery. In the prior art, engineers typically develop rules for rule-based energy management, or use optimized model predictive control based on known or predicted speeds, to adjust the equivalent fuel consumption of a hybrid vehicle.
However, the prior-art methods have drawbacks. The effect of rule-based energy management is often not significant enough, and each individual driving cycle requires much expert knowledge. Optimization-based DP requires the global driving cycle to be known and cannot be applied online in real time because its computation takes too long. Existing model predictive control can run both optimally and in real time, but the prediction horizon cannot be chosen too large, and its results still differ considerably from the DP optimum. Moreover, many optimization methods are not comprehensive: they ignore road gradient information and changes in the vehicle's on-board mass.
Disclosure of Invention
The invention provides a series-parallel hybrid power system energy management method based on a DQN variant, which improves training convergence speed and vehicle fuel economy by combining hybrid experience replay built from the OEB and PER techniques.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A method for the energy management of a series-parallel hybrid power system based on DQN variants comprises the following steps:
Step one: establishing a model of the passive series-parallel vehicle;
Step two: acquiring the parameters that influence energy management of the experimental vehicle under a fixed-route driving cycle, then solving with DP to obtain the optimal solution, and storing the optimal-solution experience in the OEB;
Step three: training a Dueling DQN neural network model with HER combined with PER, based on the parameters and observations influencing energy management, to obtain a trained deep reinforcement learning agent;
Step four: acquiring the parameters and observations that influence energy management during actual driving, and performing energy management of the hybrid electric vehicle under different driving cycles based on these and the trained deep reinforcement learning agent.
In the above steps, the objective function of energy management is defined in terms of the following quantities: γ is a weighting factor expressing the balanced equivalence between fuel consumption and battery power consumption; SOC_ref denotes the SOC reference value; and ṁ_fuel is the fuel consumption at each sampling time.
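The objective-function image itself is not reproduced in this text. A hedged reconstruction consistent with the variables defined above — assuming the common quadratic SOC-tracking form used in DP/DQN energy management, not necessarily the patent's exact expression — is:

```latex
J \;=\; \sum_{t=0}^{T}\left[\dot{m}_{\mathrm{fuel}}(t) \;+\; \gamma\left(\mathrm{SOC}(t)-\mathrm{SOC}_{\mathrm{ref}}(t)\right)^{2}\right]
```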
the passive series-parallel automobile model comprises an automobile dynamic model, a planetary gear transmission, a motor and a battery;
the automobile dynamics model is as follows:
where T_out is the drive-shaft torque, R is the wheel radius, F_a is the vehicle's inertial resistance, F_r is the air resistance, F_g is the grade resistance, F_f is the rolling resistance, m is the vehicle mass, v is the vehicle speed, a is the vehicle acceleration, ρ is the air density, A is the vehicle frontal area, C_D is the air-resistance coefficient, α is the road gradient, and μ_r is the rolling-resistance coefficient;
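The dynamics equations appear only as images in the source. A standard longitudinal-dynamics form consistent with the symbols above (a reconstruction under the usual assumptions, not the patent's verbatim equations) is:

```latex
\begin{aligned}
T_{out} &= R\,(F_a + F_r + F_g + F_f),\\
F_a &= m\,a, \qquad F_r = \tfrac{1}{2}\,\rho\,A\,C_D\,v^{2},\\
F_g &= m\,g\,\sin\alpha, \qquad F_f = \mu_r\,m\,g\,\cos\alpha
\end{aligned}
```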
the planetary gear transmission model is as follows:
where n_m is the motor speed, β is the planetary gear parameter, n_out is the drive-shaft speed, n_e is the engine speed, T_m is the motor torque, and T_e is the engine torque;
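The planetary-gear equations are likewise image-only. For a power-split device with the engine on the carrier, the motor/generator on the sun gear, and the output on the ring gear — an assumption about the configuration, since the patent's figure is not reproduced — the usual speed and torque relations with ring-to-sun ratio β would be:

```latex
n_m + \beta\,n_{out} = (1+\beta)\,n_e, \qquad
T_m = \frac{T_e}{1+\beta}
```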
the cell model shown is:
where P_batt(t) is the battery power, V_oc is the battery open-circuit voltage, I_b(t) is the battery current, r_int is the battery internal resistance, P_m(t) is the motor power (whose magnitude is determined by the motor torque, speed, and efficiency η_m), SOC is the battery state of charge, and Q_max is the maximum battery capacity;
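The battery equations are also image-only in the source. A standard internal-resistance (Rint) model consistent with the variables above — a sketch, not the patent's exact formulas — is:

```latex
\begin{aligned}
P_{batt}(t) &= V_{oc}\,I_b(t) - I_b(t)^{2}\,r_{int},\\
I_b(t) &= \frac{V_{oc}-\sqrt{V_{oc}^{2}-4\,r_{int}\,P_{batt}(t)}}{2\,r_{int}},\\
\dot{\mathrm{SOC}}(t) &= -\,\frac{I_b(t)}{Q_{max}}
\end{aligned}
```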
the parameters influencing the energy management comprise road conditions, namely road gradient, of the hybrid electric vehicle on different working conditions and vehicle-mounted mass change caused by passenger or cargo change;
the observed quantity influencing energy management comprises the speed of the automobile, the acceleration of the automobile, the engine rotating speed, the engine torque, the motor rotating speed, the motor torque, the battery state of charge, the fuel consumption at the current moment, the difference value between the SOC and the reference SOC, the automobile displacement, the measurable interference road gradient and the vehicle-mounted quality variation quantity;
the second step to the fourth step specifically comprise the following steps:
Under the condition that the driving-cycle information is known in advance, the optimal energy management experience is obtained by solving with the DP algorithm and is stored in the OEB. Then, under real-time driving conditions, a Dueling DQN in deep reinforcement learning is trained. In each training step, the SOC at the current time, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement, the vehicle acceleration, the fuel consumption, the road gradient, and the on-board mass variation are used as the observation inputs of the Dueling DQN agent, and the reward at the current time is used as its reward input; the experience obtained at this time is stored in the PEB. PER is then used to sample from the PEB while samples are drawn randomly from the OEB, and the two sets of experience are combined, with the proportion of PEB experience decreasing continuously as time progresses. In this way the Dueling DQN neural network with HER is trained to obtain a converged agent, yielding the equivalent fuel consumption of the experimental vehicle under different driving cycles and the parameters influencing energy management. The observations are input to the deep reinforcement learning agent, whose output is the control quantity of the series-parallel hybrid electric vehicle, namely the engine torque demand and engine speed at the time step following the current time, where the current time is the time of the current observation;
the parameters of the experimental vehicle influencing energy management under different working conditions are obtained as follows: obtaining a plurality of samples, each sample including parameters collected on the test vehicle at different times that may affect energy management;
after obtaining the equivalent fuel consumption of the experimental vehicle under different working conditions and the parameters influencing the energy management, acquiring the equivalent fuel consumption of the experimental vehicle under the working conditions of a fixed route and the parameters influencing the energy management at a preset sampling frequency, and performing smoothing processing and normalization processing on the acquired data.
Beneficial effects: the invention provides a series-parallel hybrid power system energy management method based on a DQN variant. It considers the environmental parameters that influence energy management due to road gradient and on-board mass change; it solves for the optimal experience in advance using DP and combines this with PER to form HER, so that the Dueling-architecture DQN agent can be trained better, and the equivalent fuel consumption of the experimental vehicle under different driving cycles and the environmental parameters influencing energy management are obtained. A deep reinforcement learning agent model is trained based on the parameters and observations influencing energy management to obtain a trained agent. Environmental parameters influencing energy management are then acquired during actual driving, and energy management of the hybrid electric vehicle is performed based on these parameters and the trained agent. The method thereby controls energy optimization effectively, supports real-time online application, manages the energy of the hybrid electric vehicle more effectively, and reduces energy consumption.
Drawings
FIG. 1 is a schematic illustration of the training and application process of a series-parallel hybrid power system energy management method based on DQN variants in an embodiment of the present invention;
FIG. 2 is a flow chart of a specific application of a series-parallel hybrid power system energy management method based on DQN variants in an embodiment of the present invention;
FIG. 3 is a schematic network structure diagram of the Dueling DQN used in the series-parallel hybrid power system energy management method based on DQN variants in the embodiment of the invention;
FIG. 4 is a graph of reference SOC versus time for an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the following figures and specific examples:
A method for the energy management of a series-parallel hybrid power system based on DQN variants comprises the following steps:
Step one: establishing a model of the passive series-parallel vehicle;
Step two: acquiring the parameters that influence energy management of the experimental vehicle under a fixed-route driving cycle, then solving with DP to obtain the optimal solution, and storing the optimal-solution experience in the OEB;
Step three: training a Dueling DQN neural network model with HER combined with PER, based on the parameters and observations influencing energy management, to obtain a trained deep reinforcement learning agent;
Step four: acquiring the parameters and observations that influence energy management during actual driving, and performing energy management of the hybrid electric vehicle under different driving cycles based on these and the trained deep reinforcement learning agent.
In the above steps, the objective function of energy management is defined in terms of the following quantities: γ is a weighting factor expressing the balanced equivalence between fuel consumption and battery power consumption, and SOC_ref denotes the SOC reference value. As shown in fig. 4, the reference SOC in the objective function is obtained mainly from the vehicle's operating time under the historical-route driving cycle: a simple SOC reference is used in which the reference SOC decreases uniformly, as a linear function of time, so that the battery operates better under this driving cycle;
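The linearly decreasing reference SOC described above (and shown in Fig. 4) can be sketched as follows. The initial and final SOC values and the exact endpoints are illustrative assumptions; the patent only states that the reference decreases uniformly with time.

```python
def reference_soc(t, t_total, soc_init=0.9, soc_final=0.3):
    """Linearly decreasing SOC reference over the trip duration.

    soc_init and soc_final are illustrative assumptions; the patent
    only specifies a uniform (linear) decrease of the reference SOC
    with time (see Fig. 4).
    """
    frac = min(max(t / t_total, 0.0), 1.0)  # clamp progress to [0, 1]
    return soc_init + (soc_final - soc_init) * frac
```

The clamp keeps the reference well-defined if the vehicle runs past the nominal trip duration.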
the passive series-parallel automobile model comprises an automobile dynamic model, a planetary gear transmission, a motor and a battery;
the automobile dynamics model is as follows:
where T_out is the drive-shaft torque, R is the wheel radius, F_a is the vehicle's inertial resistance, F_r is the air resistance, F_g is the grade resistance, F_f is the rolling resistance, m is the vehicle mass, v is the vehicle speed, a is the vehicle acceleration, ρ is the air density, A is the vehicle frontal area, C_D is the air-resistance coefficient, α is the road gradient, and μ_r is the rolling-resistance coefficient;
the planetary gear transmission model is as follows:
where n_m is the motor speed, β is the planetary gear parameter, n_out is the drive-shaft speed, n_e is the engine speed, T_m is the motor torque, and T_e is the engine torque;
the cell model shown is:
where P_batt(t) is the battery power, V_oc is the battery open-circuit voltage, I_b(t) is the battery current, r_int is the battery internal resistance, P_m(t) is the motor power (whose magnitude is determined by the motor torque, speed, and efficiency η_m), SOC is the battery state of charge, and Q_max is the maximum battery capacity;
the parameters influencing the energy management comprise road conditions, namely road gradient, of the hybrid electric vehicle on different working conditions and vehicle-mounted mass change caused by passenger or cargo change;
the observed quantity influencing energy management comprises the speed of the automobile, the acceleration of the automobile, the engine rotating speed, the engine torque, the motor rotating speed, the motor torque, the battery state of charge, the fuel consumption at the current moment, the difference value between the SOC and the reference SOC, the automobile displacement, the measurable interference road gradient and the vehicle-mounted quality variation quantity;
as shown in fig. 1, the off-line training and on-line real-time application process of the series-parallel hybrid power system energy management method based on the DQN variant includes the following steps:
Under the condition that the driving-cycle information is known in advance, the optimal energy management experience is obtained by solving with the DP algorithm and is stored in the OEB. Then, under real-time driving conditions, a Dueling DQN in deep reinforcement learning is trained. In each training step, the SOC at the current time, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement, the vehicle acceleration, the fuel consumption, the road gradient, and the on-board mass variation are used as the observation inputs of the Dueling DQN agent, and the reward at the current time is used as its reward input; the experience obtained at this time is stored in the PEB. PER is then used to sample from the PEB while samples are drawn randomly from the OEB, and the two sets of experience are combined, with the proportion of PEB experience decreasing continuously as time progresses. In this way the Dueling DQN neural network with HER is trained to obtain a converged agent, yielding the equivalent fuel consumption of the experimental vehicle under different driving cycles and the parameters influencing energy management. The observations are input to the deep reinforcement learning agent, whose output is the control quantity of the series-parallel hybrid electric vehicle, namely the engine torque demand and engine speed at the time step following the current time, where the current time is the time of the current observation;
the parameters of the experimental vehicle influencing energy management under different working conditions are obtained as follows: obtaining a plurality of samples, each sample including parameters collected on the test vehicle at different times that may affect energy management;
after obtaining the equivalent fuel consumption of the experimental vehicle under different working conditions and the parameters influencing the energy management, acquiring the equivalent fuel consumption of the experimental vehicle under the working conditions of a fixed route and the parameters influencing the energy management at a preset sampling frequency, and performing smoothing processing and normalization processing on the acquired data.
The method can be applied to scenarios of the series-parallel hybrid electric vehicle under different driving cycles. For example, when the method is used for vehicle energy management, the vehicle can adjust its online driving behavior according to the pre-trained agent, reducing equivalent fuel consumption and achieving more accurate control. As shown in fig. 2, the specific application process of the above method is as follows:
A driving cycle represents the change of the experimental vehicle's speed over time; for example, a trip of 1024 s from the starting station to the terminal station, together with how the speed varies during it, constitutes one driving cycle. In this embodiment, the NEDC is used as the training cycle, and data from the experimental vehicle completing at least three repetitions of this cycle are collected to ensure the reliability of the training data.
The environmental parameters affecting energy management may be at least one of: the road conditions of the hybrid vehicle under different driving cycles, i.e. the road gradient, and the on-board mass changes due to changes in passengers and cargo.
In implementation, at least one environmental parameter that may influence energy management of the experimental vehicle is obtained under each driving cycle, and the environmental parameters influencing energy management are selected from these. The equivalent fuel consumption of the experimental vehicle under the driving cycle is obtained by reading the fuel consumption and battery charge sensors installed on the vehicle.
Optionally, the method includes acquiring the equivalent fuel consumption of the experimental vehicle and the environmental parameters and the observed quantity influencing energy management under various working conditions at a preset sampling frequency, and performing smoothing processing and normalization processing on the acquired parameters.
The higher the sampling frequency, the smaller the interval between sampling points, the more data are obtained, and the greater the correlation between the data, so the output of the finally trained agent model is more accurate. The technician can preset a sampling time interval and sample the running experimental vehicle at that interval; for example, the sampling frequency may be set to 1 Hz.
Smoothing the collected data suppresses inaccurate samples.
Normalizing the acquired parameters makes the values of different parameters comparable, improves the accuracy of the agent network model, and greatly aids in setting the reward value.
Specifically, for the normalization in this embodiment, since multiple sets of parameter items have been obtained in the above process, normalization is performed by determining the maximum and minimum parameter values in each set of parameter items and computing the formula below, where x* is the normalized data in each set of parameter items, x_min is the minimum parameter value in each set, and x_max is the maximum parameter value in each set.
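The normalization formula image is not reproduced in the text, but the definitions above match standard min-max normalization, x* = (x − x_min)/(x_max − x_min), which can be sketched as:

```python
def min_max_normalize(values):
    """Min-max normalization to [0, 1], as in the embodiment:
    x* = (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:  # degenerate set: all values identical
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]
```

Each set of parameter items (speed, torque, gradient, etc.) would be normalized independently before being fed to the agent.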
Step 202: based on the environmental parameters and observations influencing energy management, the agent model is trained with deep reinforcement learning to obtain a trained, converged agent.
The deep reinforcement learning agent is a combination of neural network models that can determine the action at the next time step from the observation at the current time. Observation data are input at the input layer of the agent model, and a Q-value estimate is output at the output layer.
The neural network computes the Q function to obtain the Q value Q(s, a | θ^Q): the input is the state s and the action a, and the output is the Q function Q(s, a | θ^Q). The network is divided into an Online evaluation network and a Target evaluation network of identical structure. The parameters θ^Q of the Online evaluation network are randomly initialized, and the network parameters θ^Q′ of the Target evaluation network are initialized from them; at the same time, a buffer (the PEB) is opened up as storage space for experience replay.
After initialization, the iterative solution begins. Action exploration uses the ε-greedy algorithm: the action a_t is executed in the current state, the corresponding reward and next state are obtained, and the transition is formed into the tuple (s_t, a_t, r_t, s_t+1) and stored in the PEB. A small batch of data is then selected from the PEB using PER and combined with a small batch randomly selected from the OEB to perform HER; this combined batch serves as the training data for the Online evaluation network, which is updated accordingly.
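The mixed-batch construction described above can be sketched as follows. For brevity, priorities are omitted and the PEB is sampled uniformly as a stand-in for PER's priority-based sampling; `peb_fraction` is the (externally scheduled) share of online experience, which per the text decreases as training progresses.

```python
import random

def sample_hybrid_batch(peb, oeb, batch_size, peb_fraction):
    """Hybrid experience replay (HER) mini-batch: part of the batch
    comes from the online buffer (PEB; sampled uniformly here as a
    simplification of PER) and the rest from the optimal DP experience
    buffer (OEB)."""
    n_peb = round(batch_size * peb_fraction)
    n_oeb = batch_size - n_peb
    batch = random.sample(peb, min(n_peb, len(peb)))
    batch += random.sample(oeb, min(n_oeb, len(oeb)))
    return batch
```

A caller would anneal `peb_fraction` over training episodes to shift the mix between the two experience sources.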
The Online evaluation network loss function is defined as L = [(r + γ max_a′ Q(s′, a′ | θ^Q′)) − Q(s, a | θ^Q)]², and the Online evaluation network is updated by minimizing this loss. The updated Online network parameters θ^Q are then used to update the Target evaluation network parameters θ^Q′.
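The loss and target-update step can be sketched as below. The Polyak soft update and the value γ = 0.99 are common choices shown for illustration; the patent's exact update rule for θ^Q′ is not reproduced in the text.

```python
def td_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step TD target for DQN: r + gamma * max_a' Q_target(s', a').
    next_q_values are the Target network's outputs for s'.
    gamma=0.99 is an illustrative value, not taken from the patent."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

def soft_update(theta_target, theta_online, tau=0.01):
    """Polyak soft update of Target parameters toward Online parameters
    (a common choice; the patent's exact rule is not reproduced)."""
    return [tau * o + (1 - tau) * t
            for t, o in zip(theta_target, theta_online)]
```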
in the deep reinforcement learning agent dulling DQN neural network framework shown in fig. 3, the Q value of standard DQN represents the value of a state and an action, and in this case, for the case that the Q value is not related to different actions in some states, the evaluation of the value of the action is not accurate, and the robustness and the stability are lacked.
The Dueling DQN splits the abstract features extracted by the convolutional layers into two streams in the fully connected layers. One stream represents the state-value function V(s), making the state's estimated value independent of action and environmental noise and giving each state a value relative to the unselected actions; the other stream represents the state-action advantage function A(s, a) in a given state. Finally, a special aggregation layer combines the two streams to produce the estimate of the state-action value function; the advantage is that learning generalizes across actions without any modification to the underlying RL algorithm. In the Dueling DQN, the Q-value function is constructed as follows:
Q(s, a; θ, α, β) = V(s; θ, α) + A(s, a; θ, β) (7)
where α and β are the parameters of the two streams in the fully connected layers, and θ is the parameter of the convolutional layers. However, for a given Q, the values of V and A are not unique; in other words, different combinations of V and A can yield the same Q value, which makes the algorithm less stable. The mean of the advantage function is therefore subtracted to improve the stability of the proposed algorithm:
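The mean-subtracted aggregation described above, Q(s, a) = V(s) + (A(s, a) − mean_a A(s, a)), can be sketched as:

```python
def dueling_q(state_value, advantages):
    """Dueling aggregation with the mean-subtracted advantage stream:
    Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]
```

Subtracting the mean pins down the decomposition, since shifting all advantages by a constant no longer changes the resulting Q values' offsets relative to V(s).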
only more layers are needed as compared with standard DQN training, but when there are many behaviors of similar value, dulling DQN can better perform strategy evaluation and improve stability and robustness.
In practice, the number of neurons in the hidden layer of the agent model is set to 40. To evaluate the effect of deep reinforcement learning energy management accurately, its control effect can be assessed through the equivalent fuel consumption ratio R.
The equivalent fuel consumption ratio reflects a comparison between the effect of the actual control and the DP reference, with good results as the R value approaches 0. The formula for calculating the value of the ratio R is as follows:
where R is expressed as the ratio between the DP reference data and the actual data, S_RL denotes the equivalent fuel consumption obtained by deep reinforcement learning training, and S_DP denotes the equivalent fuel-consumption reference data obtained under the DP benchmark.
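The formula for R appears only as an image. One form consistent with the statement that results are good as R approaches 0 — a relative-difference assumption, not necessarily the patent's exact formula — is:

```python
def equivalent_fuel_ratio(s_rl, s_dp):
    """Relative gap between the RL-trained equivalent fuel consumption
    S_RL and the DP reference S_DP; R -> 0 as the RL result approaches
    the DP optimum. (The exact formula image is not reproduced in the
    text; this relative-difference form is an assumption.)"""
    return (s_rl - s_dp) / s_dp
```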
It should be noted that the prediction performance of the trained nonlinear autoregressive dynamic neural network model is evaluated by calculating the ratio and the root mean square error between the reference data and the actual data; for example, a ratio close to 1 and a root mean square error close to 0 indicate that the agent model trained by deep reinforcement learning has good control performance.
Optionally, the training cycle is the NEDC, and the test cycles are WLTP, FTP75, UDDS, and JN1015. To make the control data of the deep reinforcement learning algorithm more accurate, the trained agent models can be tested under these different driving cycles; the R value is calculated during training under each cycle and used as the index to compare the control performance of each backpropagation training method, thereby checking the effectiveness and robustness of the deep reinforcement learning. The results are shown in table 1:
TABLE 1
From Table 1 it can be seen that, during the deep reinforcement learning training process, the ratio R between the reference data and the actual data is close to 90%, which demonstrates the effectiveness of real-time application.
Step 303: acquire the environmental parameters and observations that affect energy management during actual driving, and control the energy management of the automobile based on these parameters and the trained agent model.
In the above steps, since the environmental parameter items affecting energy management and the trained agent model have been obtained, the parameters and observations affecting energy can be input into the trained agent model in real time to control the driving and energy management of the automobile.
Specifically, to determine the control action of the vehicle at the current moment, the environmental parameters and observations affecting energy management at the estimated moment must be obtained: at least one of the road gradient and the on-board mass change of the hybrid vehicle under different working conditions, together with the vehicle speed, vehicle acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, current fuel consumption, difference between the SOC and the reference SOC, and vehicle displacement. These parameters are input into the trained agent model, which outputs the control action of the automobile at the estimated moment, namely the engine torque and speed at the next moment; the estimated moment is the sampling instant immediately following the current one, i.e. the moment corresponding to the next sampling point after the sampling point of the current moment.
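The real-time use of the trained agent described above can be sketched as a sampling loop. The `agent.act`, `read_observations`, and `apply_engine_command` names are placeholders, not the patent's interface, and the stub classes exist only to make the sketch self-contained:

```python
import time

def control_loop(agent, vehicle, steps, period_s=1.0):
    """At each sampling instant, feed the current observations to the trained
    agent and apply its output: the engine torque and speed for the next instant."""
    for _ in range(steps):
        obs = vehicle.read_observations()       # speed, acceleration, SOC, SOC error, grade, mass change, ...
        torque_cmd, speed_cmd = agent.act(obs)  # control action for the next sampling instant
        vehicle.apply_engine_command(torque_cmd, speed_cmd)
        time.sleep(period_s)

# Stubs standing in for the real trained agent and vehicle interface.
class _StubVehicle:
    def __init__(self):
        self.commands = []
    def read_observations(self):
        return [0.0] * 10
    def apply_engine_command(self, torque, speed):
        self.commands.append((torque, speed))

class _StubAgent:
    def act(self, obs):
        return 50.0, 2000.0  # illustrative torque [N·m] and speed [rpm]

veh = _StubVehicle()
control_loop(_StubAgent(), veh, steps=3, period_s=0.0)
```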
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in its protection scope.
Claims (10)
1. A method for the energy management of a series-parallel hybrid power system based on DQN variants is characterized by comprising the following steps:
the method comprises the following steps: establishing a model of a passive series-parallel automobile;
step two: acquiring parameters influencing energy management of the experimental vehicle under a fixed route working condition, then solving by using DP to obtain an optimal solution, and storing the optimal solution experience in an OEB;
step three: based on parameters and observed quantities influencing energy management, a Dueling DQN neural network model is trained by using HER combined with PER to obtain a trained deep reinforcement learning agent;
step four: and acquiring parameters and observed quantities influencing energy management in the actual running of the automobile, and performing energy management on the hybrid electric vehicle under different working conditions based on the parameters and the observed quantities influencing the energy management in the actual running and the trained deep reinforcement learning agent.
2. The method of series-parallel hybrid power system energy management based on DQN variant according to claim 1, characterized in that the model of the passive series-parallel vehicle comprises a vehicle dynamics model, a planetary gear transmission model, and motor and battery models.
3. The method of series-parallel hybrid power system energy management based on DQN variant according to claim 2, characterized by the automotive dynamics model as follows:
wherein T_out is the drive shaft torque, R is the vehicle wheel radius, F_a is the inertial resistance of the vehicle, F_r is the air resistance of the vehicle, F_g is the ramp resistance of the vehicle, F_f is the rolling resistance of the vehicle, m is the mass of the vehicle, v is the speed of the vehicle, a is the acceleration of the vehicle, ρ is the air density, A is the frontal area of the vehicle, C_D is the air resistance coefficient, α is the road gradient, and μ_r is the rolling resistance coefficient.
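The claim's equation appears as an image in the original and is not reproduced here; a numerical sketch using the standard longitudinal-resistance formulas implied by the listed symbols (treat the exact resistance expressions as an assumption) is:

```python
import math

def drive_shaft_torque(m, v, a, rho, A, C_D, alpha, mu_r, R_wheel, g=9.81):
    """Sketch of T_out = R * (F_a + F_r + F_g + F_f) using the standard
    resistance terms matching the symbols of claim 3 (assumed forms)."""
    F_a = m * a                         # inertial resistance
    F_r = 0.5 * rho * A * C_D * v ** 2  # air resistance
    F_g = m * g * math.sin(alpha)       # ramp (grade) resistance
    F_f = mu_r * m * g * math.cos(alpha)  # rolling resistance
    return R_wheel * (F_a + F_r + F_g + F_f)

# Illustrative values: 1500 kg vehicle at rest on flat ground.
T = drive_shaft_torque(m=1500, v=0.0, a=0.0, rho=1.2, A=2.2,
                       C_D=0.3, alpha=0.0, mu_r=0.01, R_wheel=0.3)
```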
4. The method of series-parallel hybrid powertrain energy management based on DQN variants according to claim 2, characterized in that the planetary gear transmission model is as follows:
wherein n_m is the motor speed; β is a planetary gear parameter; n_out is the drive shaft speed; n_e is the engine speed; T_m is the motor torque; and T_e is the engine torque.
5. The method for series-parallel hybrid power system energy management based on DQN variant according to claim 2, characterized by the battery model:
6. The method of series-parallel hybrid system energy management based on DQN variant according to claim 1, characterized in that the parameters influencing energy management include road conditions, i.e. the road grade under different working conditions of the hybrid vehicle, and the on-board mass change caused by passenger or cargo changes; the observed quantities affecting energy management include the vehicle speed, vehicle acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, fuel consumption at the present moment, difference between the SOC and the reference SOC, vehicle displacement, and the measurable disturbances of road grade and vehicle mass change.
7. The method for the energy management of the series-parallel hybrid system based on the DQN variants as recited in claim 1, wherein steps two to four comprise the following steps:
under the condition that the working-condition information is known in advance, the optimal energy management experience is obtained by solving with the DP algorithm and stored in the OEB; then, under real-time working conditions, training is carried out with the Dueling DQN in deep reinforcement learning: in each training step, the SOC at the current moment, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement, the vehicle acceleration, the fuel consumption, the road gradient, and the on-board mass variation are used as the observation inputs of the Dueling DQN agent, and the reward at the current moment is used as its reward input; the experience obtained at this moment is stored in the PEB; then PER is used to sample from the PEB while samples are drawn randomly from the OEB, and the two sets of experience are combined, the proportion of experience from the PEB being continuously reduced as training progresses; in this way the Dueling DQN neural network with HER is trained to obtain a converged agent, and the equivalent fuel consumption of the experimental vehicle under different working conditions and the parameters influencing energy management are obtained; the observations are input into the deep reinforcement learning agent, and the output is the control quantity of the series-parallel hybrid electric vehicle, namely the engine torque demand and speed at the moment following the current moment.
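The mixing of the DP-derived optimal experience buffer (OEB) with the prioritized buffer (PEB) described in claim 7 can be sketched as follows; the buffer layout, the linear decay schedule, and the priority-weighted approximation of PER are illustrative assumptions, not the patent's exact procedure:

```python
import random

random.seed(0)

def peb_share(step, start=0.9, end=0.1, decay_steps=1000):
    """Linearly shrink the PEB share of each batch as training progresses."""
    return start + (end - start) * min(step, decay_steps) / decay_steps

def sample_batch(peb, oeb, batch_size, share):
    """Mix PER-style priority-weighted samples from the PEB with uniform
    random samples from the DP optimal-experience buffer (OEB)."""
    n_peb = int(round(batch_size * share))
    # PER approximated by priority-weighted sampling with replacement.
    peb_part = (random.choices([t for _, t in peb],
                               weights=[p for p, _ in peb], k=n_peb)
                if n_peb else [])
    oeb_part = random.sample(oeb, batch_size - n_peb)
    return peb_part + oeb_part

peb = [(1.0, ("s", "a", "r", "s_next"))] * 32   # (priority, transition) pairs
oeb = [("s*", "a*", "r*", "s_next*")] * 32      # DP-optimal transitions
batch = sample_batch(peb, oeb, batch_size=10, share=peb_share(step=0))
```

Early in training most of the batch comes from the agent's own prioritized experience; as the share decays, the DP-optimal transitions dominate and pull the policy toward the offline optimum.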
8. The method of series-parallel hybrid power system energy management based on DQN variant according to claim 1 or 7, characterized in that the objective function of energy management is:
9. The method for the energy management of the series-parallel hybrid system based on the DQN variants according to claim 7, wherein the parameters of the experimental vehicle influencing energy management under different working conditions are obtained as follows: a plurality of samples are taken, each sample comprising parameters that may affect energy management, collected on the experimental vehicle at different times.
10. The DQN variant-based energy management method for the series-parallel hybrid power system, according to claim 7, wherein after obtaining the equivalent fuel consumption and the parameters affecting the energy management of the experimental vehicle under different working conditions, the method collects the equivalent fuel consumption and the parameters affecting the energy management of the experimental vehicle under the working conditions of a fixed route at a preset sampling frequency, and performs smoothing and normalization processing on the collected data.
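A minimal sketch of the smoothing and normalization step of claim 10; the moving-average window and min-max scaling are assumed choices, since the patent does not specify the exact methods:

```python
import numpy as np

def smooth(signal, window=5):
    """Moving-average smoothing of one sampled data channel."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(signal, dtype=float), kernel, mode="same")

def normalize(signal):
    """Min-max normalization of a data channel to [0, 1]."""
    s = np.asarray(signal, dtype=float)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)

scaled = normalize([0.0, 5.0, 10.0])
smoothed = smooth([1.0] * 10)
```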
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010845021.9A CN112026744B (en) | 2020-08-20 | 2020-08-20 | Series-parallel hybrid power system energy management method based on DQN variants |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112026744A true CN112026744A (en) | 2020-12-04 |
CN112026744B CN112026744B (en) | 2022-01-04 |
Family
ID=73581036
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010845021.9A Active CN112026744B (en) | 2020-08-20 | 2020-08-20 | Series-parallel hybrid power system energy management method based on DQN variants |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112026744B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1252036B1 (en) * | 2000-01-31 | 2006-03-15 | Azure Dynamics Inc. | Method and apparatus for adaptive hybrid vehicle control |
US20120035795A1 (en) * | 2010-08-05 | 2012-02-09 | Ford Global Technologies, Llc | Distance oriented energy management strategy for a hybrid electric vehicle |
CN102717797A (en) * | 2012-06-14 | 2012-10-10 | 北京理工大学 | Energy management method and system of hybrid vehicle |
CN110682905A (en) * | 2019-10-12 | 2020-01-14 | 重庆大学 | Method for acquiring battery charge state reference variable quantity in time domain based on driving mileage |
CN111267831A (en) * | 2020-02-28 | 2020-06-12 | 南京航空航天大学 | Hybrid vehicle intelligent time-domain-variable model prediction energy management method |
CN111267830A (en) * | 2020-02-10 | 2020-06-12 | 南京航空航天大学 | Hybrid power bus energy management method, device and storage medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112498334A (en) * | 2020-12-15 | 2021-03-16 | 清华大学 | Robust energy management method and system for intelligent network-connected hybrid electric vehicle |
CN112498334B (en) * | 2020-12-15 | 2022-03-11 | 清华大学 | Robust energy management method and system for intelligent network-connected hybrid electric vehicle |
CN113492827A (en) * | 2021-06-23 | 2021-10-12 | 东风柳州汽车有限公司 | Energy management method and device for hybrid electric vehicle |
CN113997926A (en) * | 2021-11-30 | 2022-02-01 | 江苏浩峰汽车附件有限公司 | Parallel hybrid electric vehicle energy management method based on layered reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||