CN111267830B - Hybrid power bus energy management method, device and storage medium - Google Patents
- Publication number: CN111267830B
- Application number: CN202010084077.7A
- Authority
- CN
- China
- Prior art keywords
- energy management
- bus
- parameters
- training
- reinforcement learning
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W20/00—Control systems specially adapted for hybrid vehicles
- B60W20/10—Controlling the power contribution of each of the prime movers to meet required power demand
- B60W20/11—Controlling the power contribution of each of the prime movers to meet required power demand using model predictive control [MPC] strategies, i.e. control methods based on models predicting performance
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2530/00—Input parameters relating to vehicle conditions or values, not covered by groups B60W2510/00 or B60W2520/00
Abstract
The invention discloses a hybrid bus energy management method, device and storage medium. The method comprises: acquiring the parameters influencing energy management of an experimental vehicle under a fixed bus-route working condition; training a model on those parameters and the observed quantities to obtain a trained deep reinforcement learning agent; and acquiring the parameters and observed quantities influencing energy management during actual bus operation, then performing energy management of the hybrid bus on the fixed route based on those real-time parameters and the trained deep reinforcement learning agent. With the technical scheme of the invention, hybrid bus energy management can be controlled more effectively and energy consumption reduced.
Description
Technical Field
The application belongs to the technical field of hybrid electric vehicles, and particularly relates to a method, device and storage medium for energy management of a hybrid electric bus.
Background
Most hybrid-vehicle energy management is rule-based: a certain energy-management threshold is set and the energy control follows that rule. The most common rule for a plug-in hybrid vehicle is to first consume the battery's energy and then maintain the battery's charge level.
Among optimization-based strategies, the representative benchmark is DP (Dynamic Programming). With the global working-condition information known, a relatively optimal energy management of the hybrid bus is obtained offline: according to the known speed profile, the optimal energy-demand split between the engine and the battery of the hybrid bus is computed.
In the prior art, engineers are used to developing rules for rule-based energy management, or to optimizing model predictive control based on known or predicted speeds, thereby adjusting the equivalent fuel consumption of the hybrid bus.
In the course of implementing the present application, the inventors found that the related art has at least the following problems:
the rule-based energy management is not effective enough and covers only a single working condition; DP-based optimization requires the global working condition to be known and its computation time is too long, so it cannot be applied online in real time; existing model predictive control can be optimized and run in real time, but its prediction horizon cannot be made large, so its result still differs considerably from the DP optimum.
Disclosure of Invention
In order to solve the technical problems in the related art, the embodiments of the application provide a method, device and storage medium for hybrid bus energy management. The technical scheme is as follows:
in a first aspect, the present application provides a method for hybrid bus energy management, comprising:
acquiring parameters influencing energy management of the experimental vehicle under the working condition of a fixed bus route;
based on the parameters and observed quantities influencing energy management, training a model with the deep deterministic policy gradient to obtain a trained deep reinforcement learning agent;
acquiring parameters and observed quantities influencing energy management in actual running of the bus, and performing energy management on the hybrid bus under the working condition of a fixed route based on the parameters and observed quantities influencing the energy management in the actual running and the trained deep reinforcement learning agent.
Preferably, the parameters influencing energy management comprise the road conditions and the time periods on the fixed route of the hybrid bus, wherein the road conditions comprise the ambient temperature, the weather, the road gradient, the traffic-light conditions at intersections, and the number of passengers boarding at each bus stop.
Preferably, the observed quantities comprise the speed, the acceleration, the engine speed, the engine torque, the motor speed, the motor torque, the battery state of charge (SOC), the fuel consumption at the current moment, the difference between the SOC and the reference SOC, and the bus displacement (or driving time).
Preferably, training the model with the deep deterministic policy gradient comprises the following five modes:
taking time as the abscissa and ignoring intersection conditions (buses are assumed to meet a green light at every intersection, so no speed constraint is applied), training with the Deep Deterministic Policy Gradient (DDPG) of deep reinforcement learning to obtain a converged agent, then simulating and collecting on the experimental vehicle the equivalent fuel consumption and the other parameters that may influence energy management;
taking time as the abscissa and considering intersection conditions (buses are assumed to meet a red light, so the vehicle speed decelerates to 0 at the intersection and then accelerates again), training with DDPG to obtain a converged agent, then simulating and collecting on the experimental vehicle the equivalent fuel consumption and the other parameters that may influence energy management;
taking displacement as the abscissa and ignoring intersection conditions (buses are assumed to meet a green light at every intersection, so no speed constraint is applied), training with DDPG to obtain a converged agent, then simulating and collecting on the experimental vehicle the equivalent fuel consumption and the other parameters that may influence energy management;
taking displacement as the abscissa and considering intersection conditions (buses are assumed to meet a red light, so the vehicle speed decelerates to 0 at the intersection and then accelerates again), training with DDPG to obtain a converged agent, then simulating and collecting on the experimental vehicle the equivalent fuel consumption and the other parameters that may influence energy management;
taking displacement as the abscissa, considering both intersection conditions and the intersection traffic-light change signals, and combining displacement with time and bus speed to constrain the vehicle speed, training with DDPG to obtain a converged agent, then simulating and collecting on the experimental vehicle the equivalent fuel consumption and the other parameters that may influence energy management;
obtaining the trained deep reinforcement learning agent specifically comprises: selecting several time points with obvious passenger-number changes in different time periods, training in the five modes above, comparing the equivalent fuel consumption and the other parameters that may influence energy management, collected on the experimental vehicle after training, with the dynamic programming result (comparing against the dynamic programming benchmark when intersection signals are considered), to obtain deep reinforcement learning agents for the corresponding number of representative time periods.
Preferably, after obtaining the equivalent fuel consumption of the experimental vehicle under the same route working condition in different time periods and the parameters affecting energy management, the method further comprises: collecting, at a preset sampling frequency, the equivalent fuel consumption of the experimental vehicle under the same route working condition in each time period together with the parameters that may influence energy management, and smoothing and normalizing the collected parameters.
Preferably, training with the deep reinforcement learning DDPG method based on the parameters affecting energy management and the equivalent fuel consumption specifically comprises:
in each training step, the SOC at the current moment, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement (or driving time), the vehicle acceleration, and the fuel consumption are used as the observation input of the DDPG agent, and the reward value at the current moment is used as its reward input; the model is trained on the basis of the deep deterministic policy gradient to obtain the trained deep reinforcement learning agent.
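As a concrete illustration of the observation input described above, the vector can be assembled as in the sketch below. All function and variable names are hypothetical (the patent fixes the quantities, not an API), and `progress` stands in for either displacement or driving time depending on the training mode:

```python
def build_observation(soc, soc_ref, speed, progress, accel, fuel_rate):
    # Observation quantities listed in the text; `progress` stands for bus
    # displacement (distance-based modes) or driving time (time-based modes).
    return [
        soc,            # battery state of charge at the current moment
        soc - soc_ref,  # difference between SOC and the reference SOC
        speed,          # vehicle speed
        progress,       # vehicle displacement / driving time
        accel,          # vehicle acceleration
        fuel_rate,      # fuel consumption at the current moment
    ]

obs = build_observation(soc=0.62, soc_ref=0.60, speed=8.3,
                        progress=1250.0, accel=0.4, fuel_rate=1.7)
```

The agent then receives this list, together with the scalar reward, at every training step.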
Preferably, acquiring the parameters affecting energy management during actual bus operation, and performing the bus energy management based on those parameters and the trained deep reinforcement learning agent, comprises:
acquiring parameters influencing the energy management in the actual running of the bus;
and inputting the observed values and the reward value into the deep reinforcement learning agent, which outputs the torque demands of the bus motor and engine for the next moment, wherein the current moment is the moment of the current observation.
In a second aspect, the present application provides an apparatus for hybrid bus energy management, the apparatus comprising an acquisition module, a training module and a real-time implementation module, wherein:
the acquisition module is configured to acquire the parameters influencing energy management of the bus under the fixed-route working condition;
the training module is configured to train the model by using a DDPG strategy in deep reinforcement learning based on the parameters influencing energy management to obtain a trained deep reinforcement learning agent;
a real-time implementation module configured to acquire the observed quantities affecting energy management during actual bus driving and, based on the parameters affecting energy management during actual driving and the trained deep reinforcement learning agents, select the deep reinforcement learning agent of the corresponding time period according to the bus departure time to perform the bus energy management;
the acquisition module, the training module and the real-time implementation module are sequentially connected.
Optionally, the acquisition module is configured to:
at least one parameter from: the road conditions on the fixed route of the hybrid bus, including temperature and weather; the number of passengers at different stops on the bus route; and the traffic-light conditions at intersections for the same bus at different time periods each day.
Optionally, the acquisition module is configured to:
acquiring K samples, wherein each sample comprises parameters which are acquired on the experimental vehicle at different moments and possibly influence energy management;
Optionally, the acquisition module is configured to:
the bus speed, the bus acceleration, the engine speed, the engine torque, the motor speed, the motor torque, the battery state of charge, the fuel consumption at the current moment, the difference between the SOC and the reference SOC, and the bus displacement (or driving time).
Optionally, the acquisition module is configured to:
and acquiring equivalent fuel consumption of the experimental vehicle under a fixed route working condition and parameters influencing energy management at a preset sampling frequency, and performing smoothing processing and normalization processing on the acquired data.
Optionally, the training module is configured to:
in each training step, the SOC at the current moment, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement (or driving time), the vehicle acceleration, and the fuel consumption are used as the observation input of the DDPG agent, and the reward value at the current moment is used as its reward input; the model is trained on the basis of the deep deterministic policy gradient to obtain the trained deep reinforcement learning agent.
Optionally, the training module is configured to:
acquiring the observed quantities of at least one moment before the estimated moment during actual driving of the hybrid bus, together with the parameters influencing energy control;
and inputting those observed quantities and parameters into the deep reinforcement learning agent, outputting a control behavior, and controlling the bus accordingly, wherein the estimated moment is the sampling moment immediately following the current moment.
In a third aspect, the present invention provides a storage medium storing a program; when the program runs, the device where the storage medium is located is controlled to execute the hybrid bus energy-management method of the above technical solution.
The technical scheme provided by the invention has the beneficial effects that at least:
the method provided by the invention obtains equivalent fuel consumption and parameters influencing energy management of the experimental vehicle under the working condition of a fixed route; training a deep reinforcement learning agent model based on parameters and observed quantities influencing the energy management to obtain a trained agent; the method comprises the steps of obtaining parameters influencing energy management in actual running of the bus, and carrying out energy management on the bus based on the parameters influencing energy management in actual running and a trained agent, so that energy optimization can be effectively controlled. The method provided by the embodiment of the application can be used for energy management of the hybrid bus, the influence of signals of traffic lights at a road intersection on the energy management is considered, in addition, representative 12 time intervals are selected for buses which are susceptible to time intervals to be trained respectively to obtain the agent model, DDPG agents in corresponding time intervals can be selected according to the time of the bus, more effective control on the energy management of the hybrid bus is achieved, and energy consumption is reduced.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a training process of deep reinforcement learning DDPG according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for hybrid bus energy management provided by an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating the network structure and gradient back-propagation of the Actor-Critic for deep reinforcement learning according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of an apparatus for controlling energy management of a hybrid bus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of deep reinforcement learning energy management of a hybrid bus provided in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
FIG. 1 is a schematic diagram of a training process of a deep reinforcement learning DDPG provided by an embodiment of the present application;
referring to fig. 1, in the implementation environment, a deep reinforcement learning agent after training is obtained by collecting parameters that may affect energy management when an experimental vehicle runs in different bus periods under a fixed route working condition and using a deep certainty strategy gradient training model. Specifically, parameters and observed quantities which may affect energy management and reward values are used as input data, the input data are respectively input into a controlled object (namely a hybrid bus) and a deep reinforcement learning agent, and the deep reinforcement learning agent outputs a control quantity action. And outputting the control signal to a controlled object, inputting the reward value into a deep reinforcement learning agent for training, and adjusting a Cryc parameter in the agent by utilizing reverse gradient descent so as to finish one-time training. Through the repeated and continuous learning training, a converged proxy after training is finally established.
And acquiring parameters and observed quantities influencing the energy management in the actual running of the bus, and performing energy management on the hybrid bus under the working condition of a fixed route based on the parameters and observed quantities influencing the energy management in the actual running and the trained deep reinforcement learning agent.
This embodiment can be applied to a hybrid bus operating under a fixed-route working condition. For example, using this embodiment for bus energy management, the bus can adjust its online driving according to the pre-trained agent, thereby reducing its equivalent fuel consumption and achieving more accurate control of the bus.
As a specific implementation, as shown in fig. 2, the present embodiment provides a method for controlling energy management of a hybrid bus, the method comprising:
A working condition represents the driving displacement of the experimental vehicle; for example, the 10 km from the first stop to the last stop of a bus line can be regarded as one working condition. In this embodiment, the fixed round-trip route of a certain bus is used as the experimental condition, and data from at least three full cycles of the experimental vehicle on this condition are collected to ensure the reliability of the training data.
The parameters affecting energy management may be at least one of: the road conditions on the fixed route of the hybrid bus, including temperature and weather; the number of passengers at different stops on the bus route; and the traffic-light conditions at intersections for the same bus at different time periods each day.
In implementation, at least one parameter that may influence energy management is obtained for the experimental vehicle under each working condition, and the parameters that actually influence energy management are selected from them. The equivalent fuel consumption of the experimental vehicle under the working condition is obtained by reading the fuel-consumption and battery-charge sensors installed on the bus.
Optionally, the equivalent fuel consumption of the experimental vehicle and the parameters and observed quantities affecting energy management under each working condition are collected at a preset sampling frequency, and the collected parameters are smoothed and normalized.
The higher the sampling frequency, the smaller the interval between sampling points, the more data are obtained, and the stronger the correlation between the data, so the output of the finally trained agent model is more accurate. A technician can preset a sampling interval and sample the running experimental vehicle accordingly; for example, the sampling frequency may be set to 1 Hz.
Smoothing the collected data suppresses inaccurate samples.
Normalizing the acquired parameters makes the values of different parameters comparable, improves the accuracy of the agent network model, and greatly helps in setting the reward value.
Specifically, for the normalization in this embodiment, since multiple sets of parameter items have been obtained in the above process, the maximum and minimum parameter values in each set are determined, and the normalized value of each parameter in the set is computed by the following formula:

x_norm = (x - x_min) / (x_max - x_min)

where x_min is the minimum parameter value and x_max the maximum parameter value in each set of parameter items.
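The smoothing and min-max normalization steps can be sketched as below. The trailing moving average stands in for whichever smoothing filter the implementation actually uses, which the text does not specify:

```python
def moving_average(xs, window=3):
    # Trailing moving average as a simple smoothing of the sampled signal.
    out = []
    for i in range(len(xs)):
        lo = max(0, i - window + 1)
        out.append(sum(xs[lo:i + 1]) / (i + 1 - lo))
    return out

def min_max_normalize(xs):
    # Rescale one parameter series to [0, 1] using the min-max formula above.
    x_min, x_max = min(xs), max(xs)
    if x_max == x_min:
        return [0.0] * len(xs)  # constant series: no spread to normalize
    return [(x - x_min) / (x_max - x_min) for x in xs]

speeds = [0.0, 5.0, 12.0, 9.0, 15.0]       # e.g. speed sampled at 1 Hz
normalized = min_max_normalize(moving_average(speeds))
```

Each parameter series (speed, SOC, fuel consumption, ...) would be smoothed and normalized independently before being fed to the agent.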
The deep reinforcement learning agent is a combination of neural network models that can derive the control action for the next moment from the observation data of the estimated moment. As shown in fig. 3, the agent is mainly divided into two parts, a Critic layer and an Actor layer: data is input at the Critic layer of the agent model and output at the Actor layer of the agent.
The evaluation (Critic) network computes the Q function Q(s, a | θ^Q): its input is the state s and the action a, and its output is the Q value. The action (Actor) network maps the state to an action, a = μ(s | θ^μ): its input is the state s and its output is the action a. The evaluation network is divided into an Online evaluation network and a Target evaluation network, and the action network is divided into an Online action network and a Target action network.

Each Target network has the same structure as its Online counterpart. The parameters θ^Q and θ^μ of the Online evaluation network and Online action network are randomly initialized, and the Target network parameters θ^Q′ and θ^μ′ are initialized from them. At the same time, a space R is allocated as storage for experience replay.

After initialization, the iterative solution begins. An exploratory action is selected by adding a Gaussian perturbation to the current policy:

a_t = μ(s_t | θ^μ) + N_t

where N_t is the Gaussian perturbation. Executing action a_t in the current state yields the corresponding reward and the next state, and the tuple (s_t, a_t, r_t, s_{t+1}) is stored in the Memory Replay space. A small batch of data is randomly selected from the Memory Replay space as training data for the Online action network and Online evaluation network, and the Online evaluation network is updated first.

The Online evaluation network loss function is defined as

L = (1/N) Σ_i ( y_i - Q(s_i, a_i | θ^Q) )²,  with  y_i = r_i + γ Q′(s_{i+1}, μ′(s_{i+1} | θ^μ′) | θ^Q′),

and the Online evaluation network is updated by minimizing this loss. After the Online evaluation network has been updated, the Online action network is updated.

The calculated policy gradient is

∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a | θ^Q) |_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s | θ^μ) |_{s=s_i}

and the Online action network is updated according to the gradient-descent principle. Finally, the updated parameters θ^Q and θ^μ of the Online evaluation and action networks are used to soft-update the Target network parameters θ^Q′ and θ^μ′:

θ^Q′ ← τ θ^Q + (1 - τ) θ^Q′,   θ^μ′ ← τ θ^μ + (1 - τ) θ^μ′.
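The numerical core of the update rules above can be sketched as follows, with the networks reduced to plain parameter lists. The constants GAMMA and TAU are typical DDPG choices, not values stated in the text:

```python
GAMMA = 0.99   # discount factor (typical DDPG value; not fixed by the text)
TAU = 0.001    # soft-update rate

def td_target(reward, next_q, done=False):
    # y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})); terminal steps keep r_i.
    return reward if done else reward + GAMMA * next_q

def critic_loss(ys, qs):
    # L = (1/N) * sum_i (y_i - Q(s_i, a_i))^2
    return sum((y - q) ** 2 for y, q in zip(ys, qs)) / len(ys)

def soft_update(target_params, online_params, tau=TAU):
    # theta' <- tau * theta + (1 - tau) * theta', element-wise.
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]

y = td_target(reward=1.0, next_q=2.0)
loss = critic_loss([1.0, 3.0], [1.0, 1.0])
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

In a full implementation the loss would be minimized by back-propagation through the Critic network, and the soft update applied to every weight tensor of both Target networks after each minibatch.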
As shown in fig. 3, a schematic structural diagram of the deep reinforcement learning agent model: in each training step, the parameters and observations affecting energy management at the moment before the estimated moment, together with the reward value obtained from them, are used as input data of the agent model; the input passes through the hidden layers of the agent and control-action data are produced at the output layer. Back-propagation training is performed according to the gradient values and the parameters of the agent model are adjusted, completing one training step. Through this repeated, continuous learning and training, a trained agent model is finally established.
In practice, the number of neurons in the hidden layer of the agent model is set to 40. To accurately evaluate the effect of deep-reinforcement-learning energy management, the control effect can be assessed through the equivalent fuel-consumption ratio R.
The equivalent fuel-consumption ratio reflects the comparison between the actual control effect and the DP benchmark; the closer R is to 0, the better the result. The ratio R is calculated as

R = (S_RL - S_DP) / S_DP

where R is the relative ratio between the actual data and the DP reference data, S_RL is the equivalent fuel consumption obtained by deep reinforcement learning training, and S_DP is the equivalent fuel-consumption reference obtained under the DP benchmark.
It should be noted that the performance of the trained model is evaluated by calculating the ratio and the root mean square between the reference data and the actual data; for example, a ratio close to 0 together with a root mean square close to 0 indicates that the agent model trained by deep reinforcement learning has good control performance.
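The evaluation metric can be computed as below. The exact formula is an assumption consistent with the surrounding description: the text states only that R compares the reinforcement-learning result with the DP reference and is good near 0, and the signed values in Table 2 imply a relative difference rather than a plain quotient. The numbers are purely illustrative:

```python
def fuel_ratio(s_rl, s_dp):
    # Relative gap between the RL result and the DP benchmark; R near 0 is
    # good, and R can be negative when the RL result beats the benchmark.
    return (s_rl - s_dp) / s_dp

# Illustrative numbers only: an RL run at 96.24 against a DP reference of 100.
r = fuel_ratio(s_rl=96.24, s_dp=100.0)
```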
Optionally, the training method may be the DDPG algorithm, the DQN algorithm, or Q-learning. To make the control of the deep reinforcement learning algorithm more accurate, several back-propagation training methods can each be used to train the agent model; the R value is calculated during each method's training, the control performance of the methods is compared using R as the index, and the method with the best training effect is thereby determined.
Taking time as the abscissa and ignoring intersection conditions (buses are assumed to meet a green light at every intersection, so no speed constraint is applied), DDPG of deep reinforcement learning is used to train a converged agent, simulation is performed, and the equivalent fuel consumption and the other parameters that may influence energy management are collected on the experimental vehicle;
taking time as the abscissa and considering intersection conditions (buses are assumed to meet a red light, so the speed decelerates to 0 at the intersection and then accelerates again), DDPG is used to train a converged agent, simulation is performed, and the equivalent fuel consumption and the other parameters that may influence energy management are collected on the experimental vehicle;
taking displacement as the abscissa and ignoring intersection conditions (buses are assumed to meet a green light at every intersection, so no speed constraint is applied), DDPG is used to train a converged agent, simulation is performed, and the equivalent fuel consumption and the other parameters that may influence energy management are collected on the experimental vehicle;
taking displacement as the abscissa and considering intersection conditions (buses are assumed to meet a red light, so the speed decelerates to 0 at the intersection and then accelerates again), DDPG is used to train a converged agent, simulation is performed, and the equivalent fuel consumption and the other parameters that may influence energy management are collected on the experimental vehicle;
Taking displacement as the abscissa and considering the intersection situation, the intersection traffic-light change signals are shown in table 1; the ellipsis in the last column of table 1 indicates that the pattern continues as in the preceding columns. Displacement is combined with time and bus speed to constrain the vehicle speed. There are two specific constraints. First, if the vehicle is at an intersection position within a red-light time window, its speed must be 0; this is enforced until the red window ends, so the vehicle cannot continue to travel. Second, the vehicle speed V(t) is related to the displacement X(t) and the time t as follows.
When the displacement is determined, the total driving time, and even the time between stops, is bounded above and below, i.e., the speed (the average speed over the displacement) is also constrained. The same time period is selected, DDPG of deep reinforcement learning is used to train a converged agent, simulation is performed, and the equivalent fuel consumption and the other parameters that may influence energy management are collected on the experimental vehicle.
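The first constraint (forced zero speed at a red light) can be sketched as below. The encoding of the signal timetable as (red_start, red_end) windows per intersection position is an illustrative stand-in for the Table 1 data, which the excerpt does not reproduce:

```python
def constrain_speed(v_cmd, t, position, red_windows_by_intersection):
    # Constraint 1 from the text: if the bus is at an intersection while the
    # light is red, force the speed to 0 until the red window ends.
    # `red_windows_by_intersection` maps an intersection position (m) to a
    # list of (red_start, red_end) times in seconds.
    for x_int, red_windows in red_windows_by_intersection.items():
        at_intersection = abs(position - x_int) < 1.0
        in_red = any(t0 <= t < t1 for t0, t1 in red_windows)
        if at_intersection and in_red:
            return 0.0
    return v_cmd

lights = {500.0: [(60.0, 90.0)]}   # red from t=60 s to t=90 s at x=500 m
```

The second constraint (bounds on driving time between stops) would be imposed on top of this, e.g. as a penalty in the reward when the average speed over the displacement leaves its allowed band.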
Obtaining the trained deep reinforcement learning agents specifically comprises: selecting several time points with obvious passenger-number changes in different time periods, training in the five modes above, and comparing the equivalent fuel consumption and the other parameters that may influence energy management, collected on the experimental vehicle after training, with the dynamic programming result. With time and displacement as the abscissa respectively, the four cases in which every intersection is met at green or at red, plus the fifth case that uses the actual traffic-light changes of each intersection, are compared under otherwise identical conditions in terms of equivalent fuel consumption, speed, and SOC curves; the increase or decrease rate of the equivalent fuel consumption is output, yielding deep reinforcement learning agents for the corresponding number of representative time periods. In this embodiment, 12 time points with obvious characteristics are selected in different time periods, with the passenger number changing at every stop; training is then performed in the five modes, the collected equivalent fuel consumption and other parameters are compared, and the results are compared against a Dynamic Programming (DP) benchmark that considers the intersection signals, yielding deep reinforcement learning agents for 12 representative time periods.
TABLE 1
For each of the five training modes above, one of the 12 groups of data was selected; the resulting R values are shown in Table 2.
TABLE 2
Training algorithm | R |
---|---|
St | -0.0376 |
Stc | 0.0762 |
Sd | -0.0423 |
Sdc | 0.0631 |
Sdct | 0.0384 |
As Table 2 shows, during the deep reinforcement learning training process the ratio R between the obtained reference data and the actual data stays close to 0; using displacement as the abscissa performs better, and accounting for the intersection traffic-light signal has a considerable influence on the actual control result.
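The patent does not give a formula for R; one plausible reading, consistent with "close to 0 is better", is the signed relative deviation of the agent's equivalent fuel consumption from the dynamic programming reference. The sketch below assumes that definition, with illustrative numbers rather than the patent's experimental data.

```python
def r_value(agent_fuel_l: float, dp_fuel_l: float) -> float:
    """Signed relative deviation of the agent's equivalent fuel consumption
    from the DP reference (assumed interpretation of Table 2's R)."""
    return (agent_fuel_l - dp_fuel_l) / dp_fuel_l

# Illustrative numbers only; a value near 0 means the trained agent
# nearly matches the DP benchmark.
r = r_value(agent_fuel_l=3.92, dp_fuel_l=4.09)
```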
Step 303: acquiring the parameters and observed quantities that influence energy management during actual bus operation, and controlling the bus's energy management based on those parameters and the trained agent model.
Having obtained, in the preceding steps, the parameter items that influence energy management and the trained agent model, the parameters and observed quantities can be fed into the trained agent model in real time to control the running and energy management of the bus.
Specifically, to determine the bus's control action at the current moment, the parameters and observed quantities influencing energy management at the estimated moment must be acquired. The parameters influencing energy management are at least one of: the road conditions in different time periods on the hybrid bus's fixed route, the number of passengers at the different stops along the route, and the traffic-light state at each intersection. The observed quantities are the bus speed, bus acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge (SOC), fuel consumption at the current moment, the difference between the SOC and the reference SOC, and the bus displacement (or time). These are input to the trained agent model, which outputs the control action for the estimated moment, where the estimated moment is the sampling instant immediately following the current one.
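A minimal sketch of this real-time step follows, assembling the observation vector listed above and querying the agent for the action at the next sampling instant. The `StubAgent` and its `predict` interface are hypothetical stand-ins; a real deployment would load the trained DDPG actor network for the current time period.

```python
import numpy as np

class StubAgent:
    """Stand-in for the trained DDPG actor (hypothetical interface)."""
    def predict(self, x: np.ndarray) -> np.ndarray:
        # Placeholder policy: returns [engine_torque_cmd, motor_torque_cmd].
        return np.array([100.0, 50.0])

def control_step(agent, obs: dict) -> np.ndarray:
    """Build the observation vector from the quantities named in the text
    and return the control action for the next sampling instant."""
    x = np.array([
        obs["speed"], obs["acceleration"],
        obs["engine_speed"], obs["engine_torque"],
        obs["motor_speed"], obs["motor_torque"],
        obs["soc"], obs["fuel_rate"],
        obs["soc"] - obs["soc_ref"],   # SOC tracking error
        obs["displacement"],
    ])
    return agent.predict(x)

# Illustrative observation values (not measurements from the patent):
obs = dict(speed=8.3, acceleration=0.2, engine_speed=1500.0,
           engine_torque=90.0, motor_speed=1200.0, motor_torque=40.0,
           soc=0.62, fuel_rate=0.4, soc_ref=0.60, displacement=3400.0)
action = control_step(StubAgent(), obs)
```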
The method provided by this embodiment of the invention can be used for energy management of the hybrid bus and reduces the equivalent fuel consumption.
As shown in fig. 4, another embodiment of the present application provides an apparatus for controlling energy management of a hybrid bus, the apparatus including:
the acquisition module 401 is configured to acquire the parameters and observed quantities influencing energy management of the experimental vehicle under various working conditions;
a training module 402, configured to train a deep reinforcement learning agent model based on the parameters and observed quantities influencing energy management and a set reward-value calculation, to obtain a trained convergent agent model;
and the real-time control module 403 is configured to acquire parameters and observed quantities affecting the energy management in actual running of the bus, and control the bus energy management based on the parameters and observed quantities affecting the energy management in actual running and the trained agent model.
Optionally, the acquisition module 401 is configured to:
acquire at least one of: the road conditions and time-period information on the hybrid bus's fixed route, the number of passengers at the different stops along the route, and the traffic-light state at each intersection; and acquire the observed quantities: bus speed, bus acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, fuel consumption at the current moment, the difference between the SOC and the reference SOC, and the bus displacement (or time).
Optionally, the acquisition module 401 is configured to:
acquire K samples, where each sample comprises the parameters that may affect energy management, collected on the experimental vehicle at the same moment;
and acquire the parameters influencing energy management of the experimental vehicle under the various working conditions at a preset sampling frequency, and perform smoothing and normalization on the acquired data.
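The patent states only that the sampled data are smoothed and normalized; the sketch below assumes a moving-average filter and min-max scaling as one plausible implementation of those two steps.

```python
import numpy as np

def smooth(x: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average smoothing (assumed filter; the patent does not name one)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

def normalize(x: np.ndarray) -> np.ndarray:
    """Min-max scaling to [0, 1] (assumed normalization scheme)."""
    lo, hi = x.min(), x.max()
    return np.zeros_like(x) if hi == lo else (x - lo) / (hi - lo)

# Illustrative noisy signal standing in for a sampled channel:
raw = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
clean = normalize(smooth(raw))
```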
Optionally, the training module 402 is configured to:
in each training iteration, take the parameters and observed values influencing energy management at the estimated moment as input data of the agent model, compute the reward value, and train the agent model by DDPG backpropagation to obtain the trained agent model.
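The reward formula itself is not published in the patent; a common hedged choice for this kind of energy management problem penalizes instantaneous fuel use plus the squared SOC-tracking error, which the illustrative reward below assumes (the weight `alpha` is a made-up tuning parameter, not a patent value).

```python
def reward(fuel_g_per_step: float, soc: float, soc_ref: float,
           alpha: float = 100.0) -> float:
    """Assumed reward: penalize fuel use and deviation from the reference SOC.
    Tracking the reference tightly with low fuel use yields rewards near 0."""
    return -(fuel_g_per_step + alpha * (soc - soc_ref) ** 2)

r_good = reward(fuel_g_per_step=0.5, soc=0.60, soc_ref=0.60)
r_bad = reward(fuel_g_per_step=0.5, soc=0.70, soc_ref=0.60)
assert r_good > r_bad  # drifting from the SOC reference lowers the reward
```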
Optionally, the real-time control module 403 is configured to:
acquire the parameters and observed quantities influencing energy management at the estimated moment during actual bus operation, input them into the agent model, and output the bus control quantity, where the controlled moment is the sampling instant immediately following the current one.
It should be noted that the device for controlling bus energy management provided in the above embodiment is described using the above division into functional modules only as an example; in practical applications, the functions may be distributed among different functional modules as needed, i.e. the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiment and the method embodiments for controlling bus energy management provided above belong to the same concept; their specific implementation processes are detailed in the method embodiments and are not repeated here.
As shown in fig. 5, this embodiment provides a schematic diagram of the deep reinforcement learning energy management structure of the hybrid bus. Because the historical travel information mainly comprises the bus's route mileage and running time, a simple SOC reference can be obtained: the reference SOC decreases linearly, at a constant rate, as the displacement (or time) increases. The bus's running time, known from the acquisition module, is used to select the trained deep reinforcement learning agent for the corresponding one of the 12 representative time periods; the parameters and observed quantities influencing energy management at the estimated moment during actual operation are acquired and input to the agent model, which outputs the engine and motor torques as the bus control quantities for the next moment; the hybrid bus is thus controlled with the parameters influencing energy management taken into account, and the process repeats until the vehicle completes its driving task.
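The linear SOC reference and the per-period agent selection described above can be sketched as follows. The initial/final SOC values and the 12-period schedule covering the service day are assumed for illustration; none of these numbers come from the patent.

```python
def soc_reference(s_m: float, route_m: float,
                  soc0: float = 0.8, soc_end: float = 0.3) -> float:
    """Reference SOC decreasing linearly with displacement along the route.
    soc0 and soc_end are assumed charge-depletion endpoints."""
    frac = min(max(s_m / route_m, 0.0), 1.0)
    return soc0 + (soc_end - soc0) * frac

def select_agent(agents: list, departure_hour: float,
                 period_hours: float = 1.5):
    """Pick the trained agent whose representative time period covers the
    departure time (assumed uniform 1.5 h periods, 12 agents per day)."""
    idx = min(int(departure_hour // period_hours), len(agents) - 1)
    return agents[idx]

agents = [f"agent_{i}" for i in range(12)]       # placeholders for trained agents
chosen = select_agent(agents, departure_hour=7.2)  # morning departure
soc_mid = soc_reference(6_000, 12_000)             # halfway along a 12 km route
```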
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (8)
1. A hybrid power bus energy management method, characterized by comprising:
Acquiring parameters influencing energy management of the experimental vehicle under the working condition of a fixed bus route;
training a model based on the parameters influencing energy management and the observed quantities to obtain a trained deep reinforcement learning agent;
acquiring parameters and observed quantities influencing energy management in actual running of the bus, and performing energy management on the hybrid bus under a fixed route working condition based on the parameters and observed quantities influencing the energy management in the actual running and the trained deep reinforcement learning agent;
wherein the model trained based on the parameters influencing energy management and the observed quantities uses a deep deterministic policy gradient (DDPG) algorithm, a DQN algorithm, or Q-learning, and training the model using the deep deterministic policy gradient comprises the following five modes:
taking time as the abscissa, without considering intersection conditions, assuming the bus meets a green light at every intersection, without speed constraints, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking time as the abscissa, considering intersection conditions, assuming the bus meets a red light at every intersection so that its speed decelerates to 0 and then accelerates again at each intersection, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking displacement as the abscissa, without considering intersection conditions, assuming the bus meets a green light at every intersection, without speed constraints, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking displacement as the abscissa, considering intersection conditions, assuming the bus meets a red light at every intersection so that its speed decelerates to 0 and then accelerates again, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking displacement as the abscissa, considering intersection conditions and the changing intersection traffic-light signals, combining displacement with time and bus speed to constrain the vehicle speed, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
the method for obtaining the trained deep reinforcement learning agent specifically comprises the following steps: selecting a plurality of time points with obvious passenger number change in different time periods, then training in the 5 modes, comparing equivalent fuel consumption and other parameters which may influence energy management collected on an experimental vehicle after training with a dynamic planning result, and comparing with a dynamic planning reference when intersection signals are considered to obtain the deep reinforcement learning agent with corresponding number of representative time periods.
2. The hybrid bus energy management method as claimed in claim 1, wherein the parameters affecting energy management comprise the road conditions and time periods on the fixed route of the hybrid bus, the road conditions comprising ambient temperature, weather conditions, road gradient, and traffic-light conditions at intersections, and further comprise the number of passengers at each stop of the bus.
3. The hybrid bus energy management method of claim 1, wherein the observed quantities comprise bus speed, acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, current time fuel consumption, difference between SOC and reference SOC, and bus displacement/time.
4. The energy management method for the hybrid electric bus according to claim 1, wherein after acquiring the equivalent fuel consumption of the experimental vehicle under the same route working conditions in different time periods and the parameters influencing energy management, the method further comprises: acquiring, at a preset sampling frequency, the equivalent fuel consumption of the experimental vehicle under the same route working condition in each time period and the parameters that may influence energy management, and performing smoothing and normalization on the acquired parameters.
5. The hybrid bus energy management method according to any one of claims 1 to 4, wherein the agent is obtained by training with a deep reinforcement learning DDPG method based on parameters affecting the energy management and the equivalent fuel consumption, specifically:
in each training iteration, the SOC at the current moment, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement or running time, the vehicle acceleration, and the fuel consumption are used as the observed-value inputs of the DDPG agent, the reward value at the current moment is used as the reward input of the DDPG agent, and the model is trained on the basis of the deep deterministic policy gradient to obtain the trained deep reinforcement learning agent.
6. The hybrid bus energy management method according to claim 5, wherein the acquiring parameters affecting the energy management in the actual driving of the bus, and performing the bus energy management based on the parameters affecting the energy management in the actual driving and the trained deep reinforcement learning agent comprises:
acquiring parameters influencing the energy management in the actual running of the bus;
and inputting the observed values and the reward value into the deep reinforcement learning agent, which outputs, at the current moment, the torque demands of the bus's motor and engine for the next moment, wherein the current moment is the moment of the current observation.
7. An apparatus for controlling energy management in a bus, said apparatus comprising,
an acquisition module, configured to acquire the parameters influencing energy management of a bus under a fixed-route working condition;
the training module is configured to train a model by using a DDPG strategy in deep reinforcement learning based on the parameters influencing energy management to obtain a trained deep reinforcement learning agent;
a real-time implementation module, configured to acquire the observed quantities influencing energy management during actual bus operation and, based on the parameters influencing energy management in actual running and the trained deep reinforcement learning agents, select the deep reinforcement learning agent for the time period corresponding to the bus's running time to perform bus energy management;
the acquisition module, the training module and the real-time implementation module are sequentially connected;
wherein a model is trained based on the parameters influencing energy management and the observed quantities, the training method used being a deep deterministic policy gradient (DDPG) algorithm, a DQN algorithm, or Q-learning, and training the model using the deep deterministic policy gradient comprises the following five modes:
taking time as the abscissa, without considering intersection conditions, assuming the bus meets a green light at every intersection, without speed constraints, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking time as the abscissa, considering intersection conditions, assuming the bus meets a red light at every intersection so that its speed decelerates to 0 and then accelerates again at each intersection, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking displacement as the abscissa, without considering intersection conditions, assuming the bus meets a green light at every intersection, without speed constraints, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking displacement as the abscissa, considering intersection conditions, assuming the bus meets a red light at every intersection so that its speed decelerates to 0 and then accelerates again, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking displacement as the abscissa, considering intersection conditions and the changing intersection traffic-light signals, combining displacement with time and bus speed to constrain the vehicle speed, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
the method for obtaining the trained deep reinforcement learning agent specifically comprises the following steps: selecting a plurality of time points with obvious passenger number change in different time periods, then training in the 5 modes, comparing equivalent fuel consumption and other parameters which may influence energy management collected on an experimental vehicle after training with a dynamic planning result, and comparing with a dynamic planning reference when intersection signals are considered to obtain the deep reinforcement learning agent with corresponding number of representative time periods.
8. A storage medium, characterized in that a program is stored in the storage medium, and when the program runs, the device where the storage medium is located is controlled to execute the hybrid bus energy management method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010084077.7A CN111267830B (en) | 2020-02-10 | 2020-02-10 | Hybrid power bus energy management method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010084077.7A CN111267830B (en) | 2020-02-10 | 2020-02-10 | Hybrid power bus energy management method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111267830A CN111267830A (en) | 2020-06-12 |
CN111267830B true CN111267830B (en) | 2021-07-09 |
Family
ID=70994986
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010084077.7A Active CN111267830B (en) | 2020-02-10 | 2020-02-10 | Hybrid power bus energy management method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111267830B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111959509B (en) * | 2020-08-19 | 2022-06-17 | 重庆交通大学 | Q learning regenerative braking control strategy based on state space domain battery energy balance |
CN112026744B (en) * | 2020-08-20 | 2022-01-04 | 南京航空航天大学 | Series-parallel hybrid power system energy management method based on DQN variants |
CN111965981B (en) * | 2020-09-07 | 2022-02-22 | 厦门大学 | Aeroengine reinforcement learning control method and system |
CN112249002B (en) * | 2020-09-23 | 2022-06-28 | 南京航空航天大学 | TD 3-based heuristic series-parallel hybrid power energy management method |
CN112287463B (en) * | 2020-11-03 | 2022-02-11 | 重庆大学 | Fuel cell automobile energy management method based on deep reinforcement learning algorithm |
CN112613229B (en) * | 2020-12-14 | 2023-05-23 | 中国科学院深圳先进技术研究院 | Energy management method, model training method and device for hybrid power equipment |
CN112837532B (en) * | 2020-12-31 | 2022-04-01 | 东南大学 | New energy bus cooperative dispatching and energy-saving driving system and control method thereof |
CN112989715B (en) * | 2021-05-20 | 2021-08-03 | 北京理工大学 | Multi-signal-lamp vehicle speed planning method for fuel cell vehicle |
CN113911103B (en) * | 2021-12-14 | 2022-03-15 | 北京理工大学 | Hybrid power tracked vehicle speed and energy collaborative optimization method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105270383A (en) * | 2014-05-30 | 2016-01-27 | 福特全球技术公司 | Vehicle speed profile prediction using neural networks |
CN109934332A (en) * | 2018-12-31 | 2019-06-25 | 中国科学院软件研究所 | The depth deterministic policy Gradient learning method in pond is tested based on reviewer and double ends |
CN110254418A (en) * | 2019-06-28 | 2019-09-20 | 福州大学 | A kind of hybrid vehicle enhancing study energy management control method |
CN110751346A (en) * | 2019-11-04 | 2020-02-04 | 重庆中涪科瑞工业技术研究院有限公司 | Distributed energy management method based on driving speed prediction and game theory |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130073104A1 (en) * | 2011-09-20 | 2013-03-21 | Maro Sciacchitano | Modular intelligent energy management, storage and distribution system |
CN102729987B (en) * | 2012-06-20 | 2014-11-19 | 浙江大学 | Hybrid bus energy management method |
CN107618501B (en) * | 2016-07-15 | 2020-10-09 | 联合汽车电子有限公司 | Energy management method for hybrid vehicle, terminal device and server |
KR101917375B1 (en) * | 2016-11-22 | 2018-11-12 | 한국에너지기술연구원 | Energy management system and method using machine learning |
CN107284441B (en) * | 2017-06-07 | 2019-07-05 | 同济大学 | The energy-optimised management method of the adaptive plug-in hybrid-power automobile of real-time working condition |
KR20190075294A (en) * | 2017-12-21 | 2019-07-01 | 중앙대학교 산학협력단 | Deep Learning Based Building Energy Management System and System Maintenance Method Using the Building Energy Management System |
CN108177648B (en) * | 2018-01-02 | 2019-09-17 | 北京理工大学 | A kind of energy management method of the plug-in hybrid vehicle based on intelligent predicting |
CN108427985B (en) * | 2018-01-02 | 2020-05-19 | 北京理工大学 | Plug-in hybrid vehicle energy management method based on deep reinforcement learning |
CA3030490A1 (en) * | 2018-01-22 | 2019-07-22 | Pason Power Inc. | Intelligent energy management system for distributed energy resources and energy storage systems using machine learning |
EP3807137A4 (en) * | 2018-06-15 | 2021-12-22 | The Regents of the University of California | Systems, apparatus and methods to improve plug-in hybrid electric vehicle energy performance by using v2c connectivity |
CN110194172A (en) * | 2019-06-28 | 2019-09-03 | 重庆大学 | Based on enhanced neural network plug-in hybrid passenger car energy management method |
CN110481536B (en) * | 2019-07-03 | 2020-12-11 | 中国科学院深圳先进技术研究院 | Control method and device applied to hybrid electric vehicle |
CN110341690B (en) * | 2019-07-22 | 2020-08-04 | 北京理工大学 | PHEV energy management method based on deterministic strategy gradient learning |
CN110458443B (en) * | 2019-08-07 | 2022-08-16 | 南京邮电大学 | Smart home energy management method and system based on deep reinforcement learning |
CN110610260B (en) * | 2019-08-21 | 2023-04-18 | 南京航空航天大学 | Driving energy consumption prediction system, method, storage medium and equipment |
- 2020-02-10: CN202010084077.7A granted as CN111267830B (status: Active)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105270383A (en) * | 2014-05-30 | 2016-01-27 | 福特全球技术公司 | Vehicle speed profile prediction using neural networks |
CN109934332A (en) * | 2018-12-31 | 2019-06-25 | 中国科学院软件研究所 | The depth deterministic policy Gradient learning method in pond is tested based on reviewer and double ends |
CN110254418A (en) * | 2019-06-28 | 2019-09-20 | 福州大学 | A kind of hybrid vehicle enhancing study energy management control method |
CN110751346A (en) * | 2019-11-04 | 2020-02-04 | 重庆中涪科瑞工业技术研究院有限公司 | Distributed energy management method based on driving speed prediction and game theory |
Non-Patent Citations (1)
Title |
---|
Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning; Yue Hu et al.; Applied Sciences; 2018-01-26; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN111267830A (en) | 2020-06-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111267830B (en) | Hybrid power bus energy management method, device and storage medium | |
CN110610260B (en) | Driving energy consumption prediction system, method, storage medium and equipment | |
Wegener et al. | Automated eco-driving in urban scenarios using deep reinforcement learning | |
CN111267831A (en) | Hybrid vehicle intelligent time-domain-variable model prediction energy management method | |
CN110991757B (en) | Comprehensive prediction energy management method for hybrid electric vehicle | |
CN107577234B (en) | Automobile fuel economy control method for driver in-loop | |
CN113010967B (en) | Intelligent automobile in-loop simulation test method based on mixed traffic flow model | |
JP2022532972A (en) | Unmanned vehicle lane change decision method and system based on hostile imitation learning | |
CN104200267A (en) | Vehicle driving economy evaluation system and vehicle driving economy evaluation method | |
Valera et al. | Driving cycle and road grade on-board predictions for the optimal energy management in EV-PHEVs | |
CN113525396B (en) | Hybrid electric vehicle layered prediction energy management method integrating deep reinforcement learning | |
CN113415288B (en) | Sectional type longitudinal vehicle speed planning method, device, equipment and storage medium | |
CN112339756B (en) | New energy automobile traffic light intersection energy recovery optimization speed planning algorithm based on reinforcement learning | |
CN112735126A (en) | Mixed traffic flow cooperative optimization control method based on model predictive control | |
CN112026744B (en) | Series-parallel hybrid power system energy management method based on DQN variants | |
CN116187161A (en) | Intelligent energy management method and system for hybrid electric bus in intelligent networking environment | |
Deshpande et al. | In-vehicle test results for advanced propulsion and vehicle system controls using connected and automated vehicle information | |
Pi et al. | Automotive platoon energy-saving: A review | |
CN115534929A (en) | Plug-in hybrid electric vehicle energy management method based on multi-information fusion | |
CN113479187B (en) | Layered different-step-length energy management method for plug-in hybrid electric vehicle | |
CN114074680B (en) | Vehicle channel change behavior decision method and system based on deep reinforcement learning | |
CN115973179A (en) | Model training method, vehicle control method, device, electronic equipment and vehicle | |
CN114148349B (en) | Vehicle personalized following control method based on generation of countermeasure imitation study | |
CN115454082A (en) | Vehicle obstacle avoidance method and system, computer readable storage medium and electronic device | |
Wu et al. | An optimal longitudinal control strategy of platoons using improved particle swarm optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||