CN111267830B - Hybrid power bus energy management method, device and storage medium - Google Patents

Hybrid power bus energy management method, device and storage medium

Info

Publication number
CN111267830B
Authority
CN
China
Prior art keywords
energy management
bus
parameters
training
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010084077.7A
Other languages
Chinese (zh)
Other versions
CN111267830A (en)
Inventor
周健豪
薛四伍
顾诚
薛源
刘军
廖宇晖
张仁鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202010084077.7A priority Critical patent/CN111267830B/en
Publication of CN111267830A publication Critical patent/CN111267830A/en
Application granted granted Critical
Publication of CN111267830B publication Critical patent/CN111267830B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 20/00 - Control systems specially adapted for hybrid vehicles
    • B60W 20/10 - Controlling the power contribution of each of the prime movers to meet required power demand
    • B60W 20/11 - Controlling the power contribution of each of the prime movers to meet required power demand using model predictive control [MPC] strategies, i.e. control methods based on models predicting performance
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 2530/00 - Input parameters relating to vehicle conditions or values, not covered by groups B60W2510/00 or B60W2520/00

Abstract

The invention discloses a method, a device and a storage medium for energy management of a hybrid bus, wherein the method comprises the following steps: acquiring parameters influencing energy management of the experimental vehicle under a fixed bus route working condition; training a model based on the parameters influencing energy management and the observed quantities to obtain a trained deep reinforcement learning agent; and acquiring parameters and observed quantities influencing energy management in the actual running of the bus, and performing energy management on the hybrid bus under the fixed route working condition based on the parameters influencing energy management in actual running and the trained deep reinforcement learning agent. By adopting the technical scheme of the invention, the energy management of the hybrid power bus can be controlled more effectively and the energy consumption is reduced.

Description

Hybrid power bus energy management method, device and storage medium
Technical Field
The application belongs to the technical field of hybrid electric vehicles, and particularly relates to a method, equipment and a storage medium for energy management of a hybrid electric bus.
Background
Most energy management for hybrid vehicles is rule-based: a certain energy management threshold is set, and the most common rule for a plug-in hybrid vehicle is to first deplete the battery and then sustain the battery charge, with energy control carried out according to this rule.
A representative benchmark of the optimization-based strategy is DP (Dynamic Programming). With the global driving-cycle information known, a relatively optimal energy management of the hybrid bus is obtained off-line, namely by optimally distributing the energy demand between the engine and the battery of the hybrid bus according to the known speed profile.
In the prior art, engineers typically develop rules for rule-based energy management, or optimized model predictive control based on known or predicted speeds, in order to adjust the equivalent fuel consumption of a hybrid bus.
In the course of implementing the present application, the inventors found that the related art has at least the following problems:
the effect of rule-based energy management is not sufficiently pronounced and covers only a single driving condition; optimization by DP requires the global driving cycle to be known and its computation time is too long for real-time online application; and although existing model predictive control can be optimized in real time, its prediction horizon cannot be made too large and its result still differs considerably from the DP optimum.
Disclosure of Invention
In order to solve the technical problems in the related art, the embodiment of the application provides a method and equipment for energy management of a hybrid bus. The technical scheme of the bus energy management method and the bus energy management equipment is as follows:
in a first aspect, the present application provides a method for hybrid bus energy management, comprising:
acquiring parameters influencing energy management of the experimental vehicle under the working condition of a fixed bus route;
training a model using a deep deterministic policy gradient based on the parameters and the observed quantities influencing energy management, to obtain a trained deep reinforcement learning agent;
acquiring parameters and observed quantities influencing energy management in actual running of the bus, and performing energy management on the hybrid bus under the working condition of a fixed route based on the parameters and observed quantities influencing the energy management in the actual running and the trained deep reinforcement learning agent.
Preferably, the parameters influencing energy management comprise road conditions and time periods on a fixed route of the hybrid bus, wherein the road conditions comprise ambient temperature, weather conditions, road gradient and traffic light conditions at intersections, and the number of passengers at each stop of the bus route.
Preferably, the observed quantity comprises the speed, the acceleration, the engine speed, the engine torque, the motor speed, the motor torque, the battery state of charge, the fuel consumption at the current moment, the difference value between the SOC and the reference SOC, and the bus displacement/time.
Preferably, training the model using the deep deterministic policy gradient comprises the following five modes,
taking time as an abscissa, not considering intersection conditions, assuming that buses at the intersection meet green lights, not performing speed constraint, training by using Deep Deterministic Policy Gradient (DDPG) in Deep reinforcement learning to obtain a convergent agent, performing simulation, and acquiring equivalent fuel consumption and other parameters which may influence energy management on the experimental vehicle;
taking time as an abscissa, considering intersection conditions, assuming that buses meet red lights at the intersection, and the speed of the vehicles has a process of decelerating to 0 and then accelerating at the intersection, training by using DDPG in deep reinforcement learning to obtain a convergent agent, and performing simulation, wherein equivalent fuel consumption acquired on the experimental vehicles and other parameters possibly influencing energy management are acquired;
taking the displacement as an abscissa, not considering intersection conditions, assuming that buses at the intersection meet green lights, not carrying out speed constraint, training by using DDPG in deep reinforcement learning to obtain a convergent agent, carrying out simulation, and collecting equivalent fuel consumption and other parameters which may influence energy management on the experimental vehicle;
taking the displacement as an abscissa, considering the intersection condition, assuming that buses meet red lights at the intersection, and the speed of the buses is in the process of decelerating to 0 and then accelerating at the intersection point, training by using DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting equivalent fuel consumption and other parameters which may influence energy management on the experimental vehicle;
taking displacement as an abscissa, considering intersection conditions and intersection traffic light change signals, combining displacement with time and bus speed, constraining the vehicle speed, training by using DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting equivalent fuel consumption and other parameters which may influence energy management on the experimental vehicle;
the method for obtaining the trained deep reinforcement learning agent specifically comprises the following steps: selecting a plurality of time points with obvious passenger number change in different time periods, then training in the 5 modes, comparing equivalent fuel consumption and other parameters which may influence energy management collected on an experimental vehicle after training with a dynamic planning result, and comparing with a dynamic planning reference when intersection signals are considered to obtain the deep reinforcement learning agent with corresponding number of representative time periods.
Preferably, after obtaining the equivalent fuel consumption of the experimental vehicle under the same route working condition at different time periods and the parameters affecting the energy management, the method further comprises: and acquiring parameters of the equivalent fuel consumption of the experimental vehicle under the same route working condition at each time period and possibly influencing the energy management at a preset sampling frequency, and performing smoothing processing and normalization processing on the acquired parameters.
Preferably, the training is performed by using a deep reinforcement learning DDPG method based on the parameters affecting the energy management and the equivalent fuel consumption, and the method specifically comprises the following steps:
in each training, the SOC at the current moment, the difference between the SOC and a reference SOC, the vehicle speed, the vehicle displacement/vehicle running time, the vehicle acceleration and the fuel consumption are used as the observed-value input data of the DDPG agent, and the reward value at the current moment is used as the reward-value input data of the DDPG agent; the model is trained based on the deep deterministic policy gradient, and the trained deep reinforcement learning agent is obtained.
Preferably, the acquiring parameters affecting the energy management in the actual running of the bus, and performing the bus energy management based on the parameters affecting the energy management in the actual running and the trained deep reinforcement learning agent includes:
acquiring parameters influencing the energy management in the actual running of the bus;
and inputting the observed values and the reward value into the deep reinforcement learning agent, which outputs the torque demand of the motor and the engine of the bus for the next moment, wherein the current moment is the moment of the current observed quantity.
In a second aspect, the present application provides an apparatus for hybrid bus energy management, the apparatus comprising,
the bus energy management system comprises an acquisition module, a processing module and a control module, wherein the acquisition module is configured to acquire parameters influencing energy management of a bus under a fixed route working condition;
the training module is configured to train the model by using a DDPG strategy in deep reinforcement learning based on the parameters influencing energy management to obtain a trained deep reinforcement learning agent;
a real-time implementation module configured to acquire observed quantities affecting energy management during actual driving of the bus and, based on the parameters affecting energy management during actual driving and the trained deep reinforcement learning agent, select the deep reinforcement learning agent of the corresponding time period according to the time of the bus to perform bus energy management;
the acquisition module, the training module and the real-time implementation module are sequentially connected.
Optionally, the acquisition module is configured to:
the road condition on the fixed route of the hybrid power bus comprises at least one parameter of temperature weather, the number of passengers at different stops on the bus route and the signal lamp condition of the same bus and a traffic channel intersection at different time periods every day.
Optionally, the acquisition module is configured to:
acquiring K samples, wherein each sample comprises parameters which are acquired on the experimental vehicle at different moments and possibly influence energy management;
optionally, the acquisition module is configured to:
the speed of the bus, the acceleration of the bus, the engine speed, the engine torque, the motor speed, the motor torque, the battery state of charge, the fuel consumption at the current moment, the difference between the SOC and the reference SOC, and the bus displacement (time).
Optionally, the acquisition module is configured to:
and acquiring equivalent fuel consumption of the experimental vehicle under a fixed route working condition and parameters influencing energy management at a preset sampling frequency, and performing smoothing processing and normalization processing on the acquired data.
Optionally, the training module is configured to:
in each training, the SOC at the current moment, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement (vehicle running time), the vehicle acceleration and the fuel consumption are used as the observed-value input data of the DDPG agent, and the reward value at the current moment is used as the reward-value input data of the DDPG agent; the model is trained based on the deep deterministic policy gradient, and the trained deep reinforcement learning agent is obtained.
Optionally, the training module is configured to:
acquiring observed quantity of at least one moment before the estimated moment in the actual running of the hybrid bus and parameters influencing the energy control;
and inputting the observed quantity of the at least one moment and parameters influencing the energy management into the deep reinforcement learning agent, outputting a control behavior, and controlling the bus, wherein the estimated moment is the moment of carrying out parameter sampling next to the current moment.
In a third aspect, the present invention provides a storage medium, which includes a program stored in the storage medium, and when the program runs, the device where the storage medium is located is controlled to execute the hybrid bus energy management method in the above technical solution.
The technical scheme provided by the invention has the beneficial effects that at least:
the method provided by the invention obtains equivalent fuel consumption and parameters influencing energy management of the experimental vehicle under the working condition of a fixed route; training a deep reinforcement learning agent model based on parameters and observed quantities influencing the energy management to obtain a trained agent; the method comprises the steps of obtaining parameters influencing energy management in actual running of the bus, and carrying out energy management on the bus based on the parameters influencing energy management in actual running and a trained agent, so that energy optimization can be effectively controlled. The method provided by the embodiment of the application can be used for energy management of the hybrid bus, the influence of signals of traffic lights at a road intersection on the energy management is considered, in addition, representative 12 time intervals are selected for buses which are susceptible to time intervals to be trained respectively to obtain the agent model, DDPG agents in corresponding time intervals can be selected according to the time of the bus, more effective control on the energy management of the hybrid bus is achieved, and energy consumption is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a training process of deep reinforcement learning DDPG according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for hybrid bus energy management provided by an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating the network structure and gradient back propagation of the Actor-Critic for deep reinforcement learning according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of an apparatus for controlling energy management of a hybrid bus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of deep reinforcement learning energy management of a hybrid bus provided in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
FIG. 1 is a schematic diagram of a training process of a deep reinforcement learning DDPG provided by an embodiment of the present application;
referring to fig. 1, in the implementation environment, a deep reinforcement learning agent after training is obtained by collecting parameters that may affect energy management when an experimental vehicle runs in different bus periods under a fixed route working condition and using a deep certainty strategy gradient training model. Specifically, parameters and observed quantities which may affect energy management and reward values are used as input data, the input data are respectively input into a controlled object (namely a hybrid bus) and a deep reinforcement learning agent, and the deep reinforcement learning agent outputs a control quantity action. And outputting the control signal to a controlled object, inputting the reward value into a deep reinforcement learning agent for training, and adjusting a Cryc parameter in the agent by utilizing reverse gradient descent so as to finish one-time training. Through the repeated and continuous learning training, a converged proxy after training is finally established.
And acquiring parameters and observed quantities influencing the energy management in the actual running of the bus, and performing energy management on the hybrid bus under the working condition of a fixed route based on the parameters and observed quantities influencing the energy management in the actual running and the trained deep reinforcement learning agent.
This embodiment can be applied to the scenario of a hybrid bus operating on a fixed route. For example, when this embodiment is used for the energy management of a bus, the bus can adjust its online driving behaviour according to the pre-trained agent, thereby reducing its equivalent fuel consumption and being controlled more accurately.
As a specific implementation, as shown in fig. 2, the present embodiment provides a method for controlling energy management of a hybrid bus, the method comprising:
step 201, obtaining parameters of the experimental vehicle influencing the brake pressure under a fixed working condition.
The working condition can represent the running displacement of the experimental vehicle, for example, the distance from a starting station to an end station of a bus is 10km, and the working condition can be regarded as one working condition. In this embodiment, a fixed round-trip route of a certain bus is used as an experimental condition, and data of the experimental vehicle circulating at least three times under the condition is collected to ensure the reliability of training data.
The parameters affecting energy management may be at least one of: the road conditions on the fixed route of the hybrid bus, including temperature and weather; the number of passengers at different stops on the bus route; and the traffic light conditions at intersections encountered by the same bus at different time periods each day.
In the implementation, at least one parameter which may influence energy management of the experimental vehicle under each working condition is obtained, and the parameter which influences the energy management is selected from the at least one parameter which may influence the energy management. The equivalent fuel consumption of the experimental vehicle under the working condition is obtained by reading the fuel consumption and the data of the battery electric quantity sensor installed on the bus.
Optionally, parameters and observed quantities of the experimental vehicle, which are equivalent to fuel consumption and affect energy management under various working conditions, are collected at a preset sampling frequency, and smoothing and normalization processing are performed on the collected parameters.
The larger the sampling frequency, the smaller the interval between sampling points, the more data are obtained, and the greater the correlation between the data, so that the result output by the finally trained agent model is more accurate. The technician can preset a sampling time interval and sample the running experimental vehicle according to this interval. For example, the technician may set the sampling frequency to 1 Hz.
By smoothing the collected data, the purpose of suppressing the collected inaccurate data can be achieved.
The acquired parameters are normalized, so that the values corresponding to different parameters have certain comparability, the accuracy of the proxy network model is improved, and great help is provided for setting the reward value.
Specifically, in the normalization process in this embodiment, because multiple sets of parameter items have been obtained in the above process, the data after the parameter normalization processing in each set of parameter items can be calculated by determining the maximum parameter value and the minimum parameter value in each set of parameter items and according to the determined maximum parameter value and the determined minimum parameter value, through the following formula:
$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$$
wherein $x_{min}$ is the minimum parameter value in each set of parameter items and $x_{max}$ is the maximum parameter value in each set of parameter items.
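For example, the min-max normalization above could be implemented as in the following sketch (the function name and the example speed trace are illustrative assumptions):

```python
import numpy as np

def min_max_normalize(samples):
    """Min-max normalization of one group of parameter items.

    samples: raw values of a single parameter (e.g. a collected speed trace).
    Returns the values scaled to the range [0, 1].
    """
    x = np.asarray(samples, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:            # constant signal: avoid division by zero
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

# example: normalizing a collected bus speed trace (m/s)
speed_trace = [0.0, 2.5, 7.8, 12.1, 9.4, 0.0]
print(min_max_normalize(speed_trace))
```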
Step 202, training an agent model based on parameters and observed values influencing energy management, and obtaining a trained convergence agent.
The deep reinforcement learning agent is a combination of neural network models and can determine the control action for the next moment according to the observation data at the estimated moment. As shown in fig. 3, the agent is mainly divided into two parts, a Critic network and an Actor network: data is input into the Critic network of the agent model, and the action is output by the Actor network of the agent.
The evaluation network performs the Q-function calculation to obtain the Q value $Q(s, a \mid \theta^{Q})$: its input is the state $s$ and the action $a$, and its output is the Q function $Q(s, a \mid \theta^{Q})$. The action network maps the state $s$ to an action, $a = \mu(s \mid \theta^{\mu})$: its input is the state $s$ and its output is the action $a$. The evaluation network is divided into an Online evaluation network and a Target evaluation network, and the action network is divided into an Online action network and a Target action network.
The Target evaluation network has the same structure as the Online evaluation network. The parameters $\theta^{Q}$ and $\theta^{\mu}$ of the Online evaluation network and the Online action network are randomly initialized, and these two networks are used to initialize the parameters $\theta^{Q'}$ and $\theta^{\mu'}$ of the Target evaluation network and the Target action network. At the same time, a space $R$ is opened up as the storage space for experience replay.
After initialization, the iterative solution starts. An action is selected for exploration by adding a Gaussian disturbance to the current policy, $a_t = \mu(s_t \mid \theta^{\mu}) + N_t$, where $N_t$ is a Gaussian perturbation. Action $a_t$ is performed in the current state, the corresponding reward and the next state are obtained, and the transition is formed into a tuple $(s_t, a_t, r_t, s_{t+1})$ and stored into the Replay Memory space. A small batch of data is then randomly selected from the Replay Memory space as training data for the Online action network and the Online evaluation network, and the Online evaluation network is updated.
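The exploration and experience-replay steps described above could be sketched as follows (the buffer capacity, batch size and noise scale are illustrative assumptions):

```python
import random
from collections import deque
import numpy as np

class ReplayMemory:
    """Experience-replay space R described above."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=64):
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s_next = map(np.array, zip(*batch))
        return s, a, r, s_next

def explore(actor_mu, s, sigma=0.1):
    """a_t = mu(s_t | theta_mu) + N_t, with N_t a Gaussian perturbation."""
    a = np.asarray(actor_mu(s))
    return a + np.random.normal(0.0, sigma, size=a.shape)
```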
Defining an Online evaluation network Loss function:
$$L = \frac{1}{N}\sum_{i}\left(y_i - Q(s_i, a_i \mid \theta^{Q})\right)^{2}, \qquad y_i = r_i + \gamma\, Q'\!\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right)$$
the Online evaluation network is updated by minimizing the Loss function. And after the update of the Online evaluation network is finished, updating the Online action network.
The calculated gradient is:
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_{i}\nabla_{a} Q(s, a \mid \theta^{Q})\Big|_{s=s_i,\,a=\mu(s_i)}\;\nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\Big|_{s=s_i}$$
and updating the Online action network according to the gradient descent principle. Finally, the updated Online evaluation network and the parameter theta of the Online action network are utilizedQAnd thetaμNetwork parameter theta of Target evaluation network and Target action networkQ'And thetaμ′Updating:
Figure BDA0002381404630000063
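One DDPG training step under the update rules above could be sketched as follows (illustrative PyTorch-style code; the critic is assumed to take the state and action as two inputs, and the gamma and tau values are assumptions):

```python
import torch
import torch.nn as nn

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.001):
    s, a, r, s_next = batch  # mini-batch tensors sampled from the replay memory

    # Online evaluation network update: minimize the Loss function L
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))
    critic_loss = nn.functional.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Online action network update: deterministic policy gradient
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # soft update of the Target evaluation and Target action networks
    for net, target in ((critic, target_critic), (actor, target_actor)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```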
as shown in fig. 3, fig. 3 is a schematic structural diagram of a deep reinforcement learning agent model, in each training, parameters and observations affecting braking energy management at a time before an estimated time and a reward value obtained therefrom are used as input data of the agent model, the input data are input into a hidden layer of the agent, and control action data are output at an output layer of the agent. And performing back propagation training according to the gradient value, and adjusting parameters in the proxy model, thereby completing one-time training. And finally establishing a trained agent model through the repeated and continuous learning training.
In the actual process, the number of the neurons in the hidden layer of the agent model is set to be 40, and in order to accurately evaluate the effect of deep reinforcement learning energy management, the control effect of the deep reinforcement learning can be evaluated through the equivalent fuel consumption ratio R.
The equivalent fuel consumption ratio reflects a comparison between the effect of the actual control and the DP reference, with good results as the R value approaches 0. The formula for calculating the value of the ratio R is as follows:
$$R = \frac{S_{RL} - S_{DP}}{S_{DP}}$$
wherein R denotes the ratio between the DP reference data and the actual data, $S_{RL}$ denotes the equivalent fuel consumption obtained by deep reinforcement learning training, and $S_{DP}$ denotes the equivalent fuel consumption reference data obtained under the DP benchmark.
It should be noted that the control performance of the trained model is evaluated by calculating the ratio and the root mean square between the reference data and the actual data; for example, when the obtained ratio is close to 0 and the root mean square is close to 0, this indicates that the agent model trained by deep reinforcement learning has good control performance.
Optionally, the training method may be the DDPG algorithm, a DQN algorithm, or Q-learning. In order to make the control data of the deep reinforcement learning algorithm more accurate, several back-propagation training methods can be used to train the agent model separately; the R value is calculated during training with each method, the control performance of each method is compared using the R value as an index, and the training method with the best training effect is then determined.
Taking time as the abscissa, not considering intersection conditions, assuming that the bus meets green lights at all intersections, and applying no speed constraint, DDPG in deep reinforcement learning is used for training to obtain a convergent agent, simulation is performed, and the equivalent fuel consumption and other parameters which may influence energy management are collected on the experimental vehicle;
taking time as an abscissa, considering intersection conditions, assuming that buses meet red lights at the intersection, enabling the speed to have a process of decelerating to 0 and then accelerating at the intersection point, training by using DDPG in deep reinforcement learning to obtain a convergent agent, and performing simulation, wherein equivalent fuel consumption acquired on the experimental vehicle and other parameters possibly influencing energy management are acquired;
taking the displacement as an abscissa, not considering intersection conditions, assuming that buses at the intersection meet green lights, not carrying out speed constraint, training by using DDPG in deep reinforcement learning to obtain a convergent agent, carrying out simulation, and collecting equivalent fuel consumption and other parameters which may influence energy management on the experimental vehicle;
taking the displacement as an abscissa, considering the intersection condition, assuming that buses meet red lights at the intersection, enabling the speed to have a process of decelerating to 0 and then accelerating at the intersection point, training by using DDPG in deep reinforcement learning to obtain a convergent agent, and performing simulation, wherein equivalent fuel consumption acquired on the experimental vehicle and other parameters possibly influencing energy management are acquired;
taking the displacement as the abscissa, considering the crossing situation, the crossing traffic light change signal is shown in table 1, and the last column of ellipses in table 1 indicates that the change situation is the same as the above columns. Combine displacement and time and bus speed, retrain car speed, specifically retrain has two: once the vehicle is at the intersection position, if the vehicle is just in the red light time domain, the speed of the vehicle must be 0, and forced constraint is carried out until the red light time domain is finished, so that the vehicle can not continue to travel. The second is that the automobile speed V (t) has the following relationship with the displacement X (t) and the time t.
$$V(t) = \frac{\mathrm{d}X(t)}{\mathrm{d}t}$$
When the displacement is determined, the total driving time and even the time between stops are restricted by upper and lower limits, which means that the speed (the average speed of the vehicle over that displacement) is also restricted. For the same time period, DDPG in deep reinforcement learning is used for training to obtain a convergent agent, simulation is performed, and the equivalent fuel consumption and other parameters that may influence energy management are collected on the experimental vehicle.
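A simple sketch of how these two speed constraints could be checked at each control step is given below (the intersection data format, the 5 m position tolerance and the helper name are assumptions):

```python
def constrain_speed(v_proposed, x, t, intersections, v_avg_max):
    """Apply the two speed constraints used when displacement is the abscissa.

    v_proposed    : speed suggested by the planner/agent (m/s)
    x, t          : current displacement (m) and time (s)
    intersections : list of (position_m, red_start_s, red_end_s) tuples
    v_avg_max     : upper speed bound implied by the constrained trip time
    """
    # constraint 1: forced stop while inside a red-light time window at an intersection
    for pos, red_start, red_end in intersections:
        at_intersection = abs(x - pos) < 5.0   # 5 m position tolerance (assumption)
        if at_intersection and red_start <= t < red_end:
            return 0.0

    # constraint 2: V(t) = dX(t)/dt, so bounding the total driving time between
    # stops bounds the admissible speed; applied here as a simple upper clamp
    return min(v_proposed, v_avg_max)
```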
The method for obtaining the trained deep reinforcement learning agent specifically comprises: selecting a plurality of time points with obvious changes in passenger numbers within different time periods, then training in the above five modes, and comparing the equivalent fuel consumption and other parameters that may influence energy management collected on the experimental vehicle after training with the dynamic programming result. The first four cases (time and displacement respectively as the abscissa, with all intersections at green lights or all at red lights) and the fifth case (displacement as the abscissa with the intersection traffic light change considered) are compared on equivalent fuel consumption, speed and SOC curves under otherwise identical conditions, the increase or decrease rate of the equivalent fuel consumption is output, and deep reinforcement learning agents for the corresponding number of representative time periods are obtained. In this embodiment, 12 time points with obvious characteristics, at which the number of passengers at each stop changes, are selected within different time periods; training is then carried out in the above five modes, the equivalent fuel consumption and other parameters that may influence energy management collected on the experimental vehicle are compared, and the comparison is made against the Dynamic Programming (DP) benchmark that considers intersection signals, so as to obtain deep reinforcement learning agents for 12 representative time periods;
TABLE 1
(Intersection traffic light change signal; the table is provided as an image in the original publication.)
Under the five conditions above, one of the 12 groups of data was selected, and the R values are shown in Table 2.
TABLE 2
Training algorithm R
St -0.0376
Stc 0.0762
Sd -0.0423
Sdc 0.0631
Sdct 0.0384
As can be seen from Table 2, during the deep reinforcement learning training process the ratio R between the obtained data and the reference data is close to 0; using displacement as the abscissa gives better performance, and taking the intersection traffic light signal condition into account has a large influence on the actual control result.
Step 203, acquiring parameters and observed quantities influencing energy management in the actual running of the bus, and controlling the energy management of the bus based on the parameters influencing energy management in actual running and the trained agent model.
In the steps, the parameter item influencing energy management and the trained agent model are obtained, so that the parameters influencing energy and the observed quantity can be input into the trained agent model in real time to control the running and the energy management of the bus.
Specifically, if the control action of the bus at the current moment is to be controlled, parameters and observed quantities influencing energy management at the estimated moment need to be acquired, and the parameters influencing energy management are at least one of the parameters of the road condition and different time periods on the fixed route of the hybrid bus, the number of passengers at different stops on the bus route and the signal lamp condition at the intersection of the traffic channel. The observed quantity is the speed of the bus, the acceleration of the bus, the engine speed, the engine torque, the motor speed, the motor torque, the battery state of charge, the fuel consumption at the current moment, the difference value between the SOC and the reference SOC, and the bus displacement (time). Inputting the parameters into the trained agent model, and outputting the control action of the bus at the estimated time, wherein the estimated time is the next time for parameter sampling at the current time, that is, the estimated time is the time corresponding to the next sampling point of the sampling point corresponding to the current time.
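For illustration, assembling the observed quantities and querying the trained agent at run time could look like the following sketch (the observation ordering, the agent's predict interface and the action interpretation are assumptions):

```python
import numpy as np

def build_observation(soc, soc_ref, speed, disp_or_time, accel, fuel_rate):
    """Observation vector fed to the trained DDPG agent (ordering assumed)."""
    return np.array([soc, soc - soc_ref, speed, disp_or_time, accel, fuel_rate],
                    dtype=np.float32)

def control_step(agent, obs):
    """Query the trained agent; the action is interpreted as the torque demands
    of the motor and the engine for the next moment."""
    motor_torque, engine_torque = agent.predict(obs)   # assumed inference interface
    return motor_torque, engine_torque
```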
The method provided by the embodiment of the invention can be used for energy management of the hybrid bus and reduce equivalent fuel consumption.
As shown in fig. 4, another embodiment of the present application provides an apparatus for controlling energy management of a hybrid bus, the apparatus including:
the acquisition module 401 is configured to acquire parameters and observed quantities of the experimental vehicle influencing the brake pressure under various working conditions;
a training module 402, configured to train a deep reinforcement learning agent model based on the parameters and observations affecting the brake pressure and a set reward value calculation, to obtain a trained convergent agent model;
and the real-time control module 403 is configured to acquire parameters and observed quantities affecting the energy management in actual running of the bus, and control the bus energy management based on the parameters and observed quantities affecting the energy management in actual running and the trained agent model.
Optionally, the acquisition module 401 is configured to:
at least one parameter of road condition and time signal on the fixed route of the hybrid power bus, the number of passengers at different stops on the bus route and the signal lamp condition of the intersection of the traffic channel. The speed of the bus, the acceleration of the bus, the engine speed, the engine torque, the motor speed, the motor torque, the battery state of charge, the fuel consumption at the current moment, the difference between the SOC and the reference SOC, and the bus displacement (time).
Optionally, the acquisition module 401 is configured to:
acquiring K samples, wherein each sample comprises parameters which are acquired on the experimental vehicle at the same time and possibly affect energy management;
and acquiring parameters influencing the energy management of the experimental vehicle under various working conditions at a preset sampling frequency, and performing smoothing processing and normalization processing on the acquired data.
Optionally, the training module 402 is configured to:
in each training, parameters and observed values influencing energy management at the estimated time are used as input data of the agent model, calculated reward values are obtained, the agent model is trained on the basis of DDPG back propagation, and the trained agent model is obtained.
Optionally, the real-time implementation module 403 is configured to:
acquiring parameters and observed quantities affecting the energy management at the estimated time in the actual running of the bus, inputting the parameters and the observed quantities affecting the energy management at the time into the agent model, and outputting the bus control quantity, wherein the control time is the next time for controlling at the current time.
It should be noted that when the device for controlling bus energy management provided in the above embodiment performs energy management, the division into the above functional modules is only taken as an example; in practical applications, the functions may be distributed to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiment and the method embodiments for controlling bus energy management provided above belong to the same concept, and the specific implementation process thereof is detailed in the method embodiments and is not described herein again.
As shown in fig. 5, this embodiment provides a schematic diagram of the deep reinforcement learning energy management structure of the hybrid bus. From the historical travel information, which mainly comprises the route mileage and the driving time of the bus, a simple SOC reference can be obtained, namely a reference SOC that decreases linearly at a constant rate with displacement (or time). The driving time of the bus known from the acquisition module is used to select the trained deep reinforcement learning agent of the corresponding one of the 12 representative time periods; the parameters and observed quantities affecting energy management at the estimated moment during the actual running of the bus are obtained, the observed quantities affecting energy management at that moment are input into the agent model, the corresponding engine and motor torques of the bus control quantity for the next moment are output, and the hybrid bus is controlled while taking the parameters affecting energy management into consideration; the above process is repeated until the vehicle completes its driving task.
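A minimal sketch of the online selection and SOC-reference steps described above, assuming 12 pre-trained agents keyed by time period and illustrative initial/final SOC values:

```python
def soc_reference(x, total_distance, soc_init=0.9, soc_final=0.3):
    """Reference SOC decreasing linearly with displacement (the initial and
    final SOC values are illustrative assumptions)."""
    frac = min(x / total_distance, 1.0)
    return soc_init - (soc_init - soc_final) * frac

def select_agent(agents_by_period, departure_hour):
    """Select the trained agent whose representative time period contains the
    current departure time (period boundaries are assumptions)."""
    for (start_h, end_h), agent in agents_by_period.items():
        if start_h <= departure_hour < end_h:
            return agent
    raise ValueError("no agent trained for this time period")
```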
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (8)

1. A hybrid power bus energy management method, characterized by comprising:
Acquiring parameters influencing energy management of the experimental vehicle under the working condition of a fixed bus route;
obtaining a trained deep reinforcement learning agent based on the parameters influencing energy management and the observed quantity training model;
acquiring parameters and observed quantities influencing energy management in actual running of the bus, and performing energy management on the hybrid bus under a fixed route working condition based on the parameters and observed quantities influencing the energy management in the actual running and the trained deep reinforcement learning agent;
wherein the training method used for training the model based on the parameters affecting energy management and the observed quantities is a deep deterministic policy gradient algorithm, a DQN algorithm or Q-learning, and training the model using the deep deterministic policy gradient comprises the following five modes,
the time is taken as an abscissa, the crossing condition is not considered, the bus at the crossing is assumed to be all green light, speed constraint is not carried out, DDPG in deep reinforcement learning is used for training, a convergent agent is obtained, simulation is carried out, and equivalent fuel consumption and other parameters which possibly influence energy management are collected on the experimental vehicle;
taking time as an abscissa, considering intersection conditions, assuming that buses meet red lights at the intersection, and the speed of the vehicles has a process of decelerating to 0 and then accelerating at the intersection, training by using DDPG in deep reinforcement learning to obtain a convergent agent, and performing simulation, wherein equivalent fuel consumption acquired on the experimental vehicles and other parameters possibly influencing energy management are acquired;
taking the displacement as an abscissa, not considering intersection conditions, assuming that buses at the intersection meet green lights, not carrying out speed constraint, training by using DDPG in deep reinforcement learning to obtain a convergent agent, carrying out simulation, and collecting equivalent fuel consumption and other parameters which may influence energy management on the experimental vehicle;
taking the displacement as an abscissa, considering the intersection condition, assuming that buses meet red lights at the intersection, and the speed of the buses is in the process of decelerating to 0 and then accelerating at the intersection point, training by using DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting equivalent fuel consumption and other parameters which may influence energy management on the experimental vehicle;
taking displacement as an abscissa, considering intersection conditions and intersection traffic light change signals, combining displacement with time and bus speed, constraining automobile speed, training by using DDPG in deep reinforcement learning to obtain a convergent proxy, performing simulation, and collecting equivalent fuel consumption and other parameters which may influence energy management on the experimental vehicle;
the method for obtaining the trained deep reinforcement learning agent specifically comprises the following steps: selecting a plurality of time points with obvious passenger number change in different time periods, then training in the 5 modes, comparing equivalent fuel consumption and other parameters which may influence energy management collected on an experimental vehicle after training with a dynamic planning result, and comparing with a dynamic planning reference when intersection signals are considered to obtain the deep reinforcement learning agent with corresponding number of representative time periods.
2. The hybrid bus energy management method as claimed in claim 1, wherein the parameters affecting energy management comprise road conditions and time periods on a fixed route of the hybrid bus, wherein the road conditions comprise ambient temperature, weather conditions, road gradient, traffic light conditions at intersections, and the number of passengers on each stop of the bus.
3. The hybrid bus energy management method of claim 1, wherein the observed quantities comprise bus speed, acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, current time fuel consumption, difference between SOC and reference SOC, and bus displacement/time.
4. The energy management method for the hybrid electric bus according to claim 1, wherein after acquiring the equivalent fuel consumption of the experimental vehicle under the same route working conditions in different periods and the parameters influencing the energy management, the method further comprises the following steps: and acquiring parameters of the equivalent fuel consumption of the experimental vehicle under the same route working condition at each time period and possibly influencing the energy management at a preset sampling frequency, and performing smoothing processing and normalization processing on the acquired parameters.
5. The hybrid bus energy management method according to any one of claims 1 to 4, wherein the agent is obtained by training with a deep reinforcement learning DDPG method based on parameters affecting the energy management and the equivalent fuel consumption, specifically:
in each training, the SOC at the current moment, the difference value between the SOC and the reference SOC, the automobile speed, the automobile displacement/automobile running time, the automobile acceleration and the fuel consumption are used as observed value input data of the DDPG proxy, and the reward value at the current moment is used as reward value input data of the DDPG proxy, the model is trained on the basis of the depth certainty strategy gradient, and the trained depth reinforcement learning proxy is obtained.
6. The hybrid bus energy management method according to claim 5, wherein the acquiring parameters affecting the energy management in the actual driving of the bus, and performing the bus energy management based on the parameters affecting the energy management in the actual driving and the trained deep reinforcement learning agent comprises:
acquiring parameters influencing the energy management in the actual running of the bus;
and inputting the observed values and the reward value into the deep reinforcement learning agent, which outputs the torque demand of the motor and the engine of the bus for the next moment, wherein the current moment is the moment of the current observed quantity.
7. An apparatus for controlling energy management in a bus, said apparatus comprising,
an acquisition module configured to acquire parameters influencing energy management of the bus under a fixed route working condition;
the training module is configured to train a model by using a DDPG strategy in deep reinforcement learning based on the parameters influencing energy management to obtain a trained deep reinforcement learning agent;
the real-time implementation module is configured to obtain observed quantity influencing energy management in actual running of the bus, and based on parameters influencing energy management in actual running and the trained deep reinforcement learning agent, the deep reinforcement learning agent in the corresponding time period is selected according to the time of the bus to conduct bus road energy management;
the acquisition module, the training module and the real-time implementation module are sequentially connected;
wherein a model is trained based on the parameters affecting energy management and the observed quantities, the training method used is a deep deterministic policy gradient algorithm, a DQN algorithm or Q-learning, and training the model using the deep deterministic policy gradient comprises the following five modes,
the time is taken as an abscissa, the crossing condition is not considered, the bus at the crossing is assumed to be all green light, speed constraint is not carried out, DDPG in deep reinforcement learning is used for training, a convergent agent is obtained, simulation is carried out, and equivalent fuel consumption and other parameters which possibly influence energy management are collected on an experimental vehicle;
taking time as an abscissa, considering intersection conditions, assuming that buses meet red lights at the intersection, and the speed of the vehicles has a process of decelerating to 0 and then accelerating at the intersection, training by using DDPG in deep reinforcement learning to obtain a convergent agent, and performing simulation, wherein equivalent fuel consumption acquired on the experimental vehicles and other parameters possibly influencing energy management are acquired;
taking the displacement as an abscissa, not considering intersection conditions, assuming that buses at the intersection meet green lights, not carrying out speed constraint, training by using DDPG in deep reinforcement learning to obtain a convergent agent, carrying out simulation, and collecting equivalent fuel consumption and other parameters which may influence energy management on the experimental vehicle;
taking the displacement as an abscissa, considering the intersection condition, assuming that buses meet red lights at the intersection, and the speed of the buses is in the process of decelerating to 0 and then accelerating at the intersection point, training by using DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting equivalent fuel consumption and other parameters which may influence energy management on the experimental vehicle;
taking displacement as an abscissa, considering intersection conditions and intersection traffic light change signals, combining displacement with time and bus speed, constraining automobile speed, training by using DDPG in deep reinforcement learning to obtain a convergent proxy, performing simulation, and collecting equivalent fuel consumption and other parameters which may influence energy management on the experimental vehicle;
the method for obtaining the trained deep reinforcement learning agent specifically comprises the following steps: selecting a plurality of time points with obvious passenger number change in different time periods, then training in the 5 modes, comparing equivalent fuel consumption and other parameters which may influence energy management collected on an experimental vehicle after training with a dynamic planning result, and comparing with a dynamic planning reference when intersection signals are considered to obtain the deep reinforcement learning agent with corresponding number of representative time periods.
8. A storage medium, characterized in that a program is stored in the storage medium, and when the program runs, the device where the storage medium is located is controlled to execute the hybrid bus energy management method according to any one of claims 1-6.
CN202010084077.7A 2020-02-10 2020-02-10 Hybrid power bus energy management method, device and storage medium Active CN111267830B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010084077.7A CN111267830B (en) 2020-02-10 2020-02-10 Hybrid power bus energy management method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010084077.7A CN111267830B (en) 2020-02-10 2020-02-10 Hybrid power bus energy management method, device and storage medium

Publications (2)

Publication Number Publication Date
CN111267830A CN111267830A (en) 2020-06-12
CN111267830B true CN111267830B (en) 2021-07-09

Family

ID=70994986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010084077.7A Active CN111267830B (en) 2020-02-10 2020-02-10 Hybrid power bus energy management method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111267830B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111959509B (en) * 2020-08-19 2022-06-17 重庆交通大学 Q learning regenerative braking control strategy based on state space domain battery energy balance
CN112026744B (en) * 2020-08-20 2022-01-04 南京航空航天大学 Series-parallel hybrid power system energy management method based on DQN variants
CN111965981B (en) * 2020-09-07 2022-02-22 厦门大学 Aeroengine reinforcement learning control method and system
CN112249002B (en) * 2020-09-23 2022-06-28 南京航空航天大学 TD 3-based heuristic series-parallel hybrid power energy management method
CN112287463B (en) * 2020-11-03 2022-02-11 重庆大学 Fuel cell automobile energy management method based on deep reinforcement learning algorithm
CN112613229B (en) * 2020-12-14 2023-05-23 中国科学院深圳先进技术研究院 Energy management method, model training method and device for hybrid power equipment
CN112837532B (en) * 2020-12-31 2022-04-01 东南大学 New energy bus cooperative dispatching and energy-saving driving system and control method thereof
CN112989715B (en) * 2021-05-20 2021-08-03 北京理工大学 Multi-signal-lamp vehicle speed planning method for fuel cell vehicle
CN113911103B (en) * 2021-12-14 2022-03-15 北京理工大学 Hybrid power tracked vehicle speed and energy collaborative optimization method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105270383A (en) * 2014-05-30 2016-01-27 福特全球技术公司 Vehicle speed profile prediction using neural networks
CN109934332A (en) * 2018-12-31 2019-06-25 中国科学院软件研究所 The depth deterministic policy Gradient learning method in pond is tested based on reviewer and double ends
CN110254418A (en) * 2019-06-28 2019-09-20 福州大学 A kind of hybrid vehicle enhancing study energy management control method
CN110751346A (en) * 2019-11-04 2020-02-04 重庆中涪科瑞工业技术研究院有限公司 Distributed energy management method based on driving speed prediction and game theory

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130073104A1 (en) * 2011-09-20 2013-03-21 Maro Sciacchitano Modular intelligent energy management, storage and distribution system
CN102729987B (en) * 2012-06-20 2014-11-19 浙江大学 Hybrid bus energy management method
CN107618501B (en) * 2016-07-15 2020-10-09 联合汽车电子有限公司 Energy management method for hybrid vehicle, terminal device and server
KR101917375B1 (en) * 2016-11-22 2018-11-12 한국에너지기술연구원 Energy management system and method using machine learning
CN107284441B (en) * 2017-06-07 2019-07-05 同济大学 The energy-optimised management method of the adaptive plug-in hybrid-power automobile of real-time working condition
KR20190075294A (en) * 2017-12-21 2019-07-01 중앙대학교 산학협력단 Deep Learning Based Building Energy Management System and System Maintenance Method Using the Building Energy Management System
CN108177648B (en) * 2018-01-02 2019-09-17 北京理工大学 A kind of energy management method of the plug-in hybrid vehicle based on intelligent predicting
CN108427985B (en) * 2018-01-02 2020-05-19 北京理工大学 Plug-in hybrid vehicle energy management method based on deep reinforcement learning
CA3030490A1 (en) * 2018-01-22 2019-07-22 Pason Power Inc. Intelligent energy management system for distributed energy resources and energy storage systems using machine learning
EP3807137A4 (en) * 2018-06-15 2021-12-22 The Regents of the University of California Systems, apparatus and methods to improve plug-in hybrid electric vehicle energy performance by using v2c connectivity
CN110194172A (en) * 2019-06-28 2019-09-03 重庆大学 Based on enhanced neural network plug-in hybrid passenger car energy management method
CN110481536B (en) * 2019-07-03 2020-12-11 中国科学院深圳先进技术研究院 Control method and device applied to hybrid electric vehicle
CN110341690B (en) * 2019-07-22 2020-08-04 北京理工大学 PHEV energy management method based on deterministic strategy gradient learning
CN110458443B (en) * 2019-08-07 2022-08-16 南京邮电大学 Smart home energy management method and system based on deep reinforcement learning
CN110610260B (en) * 2019-08-21 2023-04-18 南京航空航天大学 Driving energy consumption prediction system, method, storage medium and equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105270383A (en) * 2014-05-30 2016-01-27 福特全球技术公司 Vehicle speed profile prediction using neural networks
CN109934332A (en) * 2018-12-31 2019-06-25 中国科学院软件研究所 The depth deterministic policy Gradient learning method in pond is tested based on reviewer and double ends
CN110254418A (en) * 2019-06-28 2019-09-20 福州大学 A kind of hybrid vehicle enhancing study energy management control method
CN110751346A (en) * 2019-11-04 2020-02-04 重庆中涪科瑞工业技术研究院有限公司 Distributed energy management method based on driving speed prediction and game theory

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning; Yue Hu et al.; Applied Sciences; 2018-01-26; full text *

Also Published As

Publication number Publication date
CN111267830A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
CN111267830B (en) Hybrid power bus energy management method, device and storage medium
CN110610260B (en) Driving energy consumption prediction system, method, storage medium and equipment
Wegener et al. Automated eco-driving in urban scenarios using deep reinforcement learning
CN111267831A (en) Hybrid vehicle intelligent time-domain-variable model prediction energy management method
CN110991757B (en) Comprehensive prediction energy management method for hybrid electric vehicle
CN107577234B (en) Automobile fuel economy control method for driver in-loop
CN113010967B (en) Intelligent automobile in-loop simulation test method based on mixed traffic flow model
JP2022532972A (en) Unmanned vehicle lane change decision method and system based on hostile imitation learning
CN104200267A (en) Vehicle driving economy evaluation system and vehicle driving economy evaluation method
Valera et al. Driving cycle and road grade on-board predictions for the optimal energy management in EV-PHEVs
CN113525396B (en) Hybrid electric vehicle layered prediction energy management method integrating deep reinforcement learning
CN113415288B (en) Sectional type longitudinal vehicle speed planning method, device, equipment and storage medium
CN112339756B (en) New energy automobile traffic light intersection energy recovery optimization speed planning algorithm based on reinforcement learning
CN112735126A (en) Mixed traffic flow cooperative optimization control method based on model predictive control
CN112026744B (en) Series-parallel hybrid power system energy management method based on DQN variants
CN116187161A (en) Intelligent energy management method and system for hybrid electric bus in intelligent networking environment
Deshpande et al. In-vehicle test results for advanced propulsion and vehicle system controls using connected and automated vehicle information
Pi et al. Automotive platoon energy-saving: A review
CN115534929A (en) Plug-in hybrid electric vehicle energy management method based on multi-information fusion
CN113479187B (en) Layered different-step-length energy management method for plug-in hybrid electric vehicle
CN114074680B (en) Vehicle channel change behavior decision method and system based on deep reinforcement learning
CN115973179A (en) Model training method, vehicle control method, device, electronic equipment and vehicle
CN114148349B (en) Vehicle personalized following control method based on generation of countermeasure imitation study
CN115454082A (en) Vehicle obstacle avoidance method and system, computer readable storage medium and electronic device
Wu et al. An optimal longitudinal control strategy of platoons using improved particle swarm optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant