CN111267830B - Hybrid power bus energy management method, device and storage medium - Google Patents
- Publication number: CN111267830B
- Application number: CN202010084077.7A
- Authority
- CN
- China
- Prior art keywords
- energy management
- bus
- parameters
- training
- reinforcement learning
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W20/00—Control systems specially adapted for hybrid vehicles
- B60W20/10—Controlling the power contribution of each of the prime movers to meet required power demand
- B60W20/11—Controlling the power contribution of each of the prime movers to meet required power demand using model predictive control [MPC] strategies, i.e. control methods based on models predicting performance
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2530/00—Input parameters relating to vehicle conditions or values, not covered by groups B60W2510/00 or B60W2520/00
Abstract
The invention discloses a hybrid bus energy management method, device and storage medium. The method comprises: acquiring the parameters influencing energy management of an experimental vehicle under a fixed bus-route working condition; training a model on those parameters and the observed quantities to obtain a trained deep reinforcement learning agent; and acquiring the parameters and observed quantities influencing energy management during actual bus operation, then performing energy management of the hybrid bus on the fixed route based on those real-time parameters and the trained deep reinforcement learning agent. With the technical scheme of the invention, hybrid bus energy management can be controlled more effectively and energy consumption reduced.
Description
Technical Field
The application belongs to the technical field of hybrid electric vehicles, and particularly relates to a method, device and storage medium for energy management of a hybrid electric bus.
Background
Most hybrid-vehicle energy management is rule-based: a certain energy-management threshold is set and the energy control follows that rule. The most common rule for a plug-in hybrid vehicle is to first consume the battery's energy and then maintain the battery's charge level.
Among optimization-based strategies, the representative benchmark is DP (Dynamic Programming). With the global working-condition information known, a relatively optimal energy management of the hybrid bus is obtained offline: according to the known speed profile, the optimal energy-demand split between the engine and the battery of the hybrid bus is computed.
In the prior art, engineers are used to developing rules for rule-based energy management, or to optimizing model predictive control based on known or predicted speeds, thereby adjusting the equivalent fuel consumption of the hybrid bus.
In the course of implementing the present application, the inventors found that the related art has at least the following problems:
the rule-based energy management is not effective enough and covers only a single working condition; DP-based optimization requires the global working condition to be known and its computation time is too long, so it cannot be applied online in real time; existing model predictive control can be optimized and run in real time, but its prediction horizon cannot be made large, so its result still differs considerably from the DP optimum.
Disclosure of Invention
In order to solve the technical problems in the related art, the embodiments of the application provide a method, device and storage medium for hybrid bus energy management. The technical scheme is as follows:
in a first aspect, the present application provides a method for hybrid bus energy management, comprising:
acquiring parameters influencing energy management of the experimental vehicle under the working condition of a fixed bus route;
based on the parameters and observed quantities influencing energy management, training a model with the deep deterministic policy gradient to obtain a trained deep reinforcement learning agent;
acquiring parameters and observed quantities influencing energy management in actual running of the bus, and performing energy management on the hybrid bus under the working condition of a fixed route based on the parameters and observed quantities influencing the energy management in the actual running and the trained deep reinforcement learning agent.
Preferably, the parameters influencing energy management comprise the road conditions and the time periods on the fixed route of the hybrid bus, wherein the road conditions comprise the ambient temperature, the weather, the road gradient, the traffic-light conditions at intersections, and the number of passengers boarding at each bus stop.
Preferably, the observed quantities comprise the speed, the acceleration, the engine speed, the engine torque, the motor speed, the motor torque, the battery state of charge (SOC), the fuel consumption at the current moment, the difference between the SOC and the reference SOC, and the bus displacement (or driving time).
Preferably, training the model with the deep deterministic policy gradient comprises the following five modes:
taking time as the abscissa and ignoring intersection conditions (buses are assumed to meet a green light at every intersection, so no speed constraint is applied), training with the Deep Deterministic Policy Gradient (DDPG) of deep reinforcement learning to obtain a converged agent, then simulating and collecting on the experimental vehicle the equivalent fuel consumption and the other parameters that may influence energy management;
taking time as the abscissa and considering intersection conditions (buses are assumed to meet a red light, so the vehicle speed decelerates to 0 at the intersection and then accelerates again), training with DDPG to obtain a converged agent, then simulating and collecting on the experimental vehicle the equivalent fuel consumption and the other parameters that may influence energy management;
taking displacement as the abscissa and ignoring intersection conditions (buses are assumed to meet a green light at every intersection, so no speed constraint is applied), training with DDPG to obtain a converged agent, then simulating and collecting on the experimental vehicle the equivalent fuel consumption and the other parameters that may influence energy management;
taking displacement as the abscissa and considering intersection conditions (buses are assumed to meet a red light, so the vehicle speed decelerates to 0 at the intersection and then accelerates again), training with DDPG to obtain a converged agent, then simulating and collecting on the experimental vehicle the equivalent fuel consumption and the other parameters that may influence energy management;
taking displacement as the abscissa, considering both intersection conditions and the intersection traffic-light change signals, and combining displacement with time and bus speed to constrain the vehicle speed, training with DDPG to obtain a converged agent, then simulating and collecting on the experimental vehicle the equivalent fuel consumption and the other parameters that may influence energy management;
obtaining the trained deep reinforcement learning agent specifically comprises: selecting several time points with obvious passenger-number changes in different time periods, training in the five modes above, comparing the equivalent fuel consumption and the other parameters that may influence energy management, collected on the experimental vehicle after training, with the dynamic programming result (comparing against the dynamic programming benchmark when intersection signals are considered), to obtain deep reinforcement learning agents for the corresponding number of representative time periods.
Preferably, after obtaining the equivalent fuel consumption of the experimental vehicle under the same route working condition in different time periods and the parameters affecting energy management, the method further comprises: collecting, at a preset sampling frequency, the equivalent fuel consumption of the experimental vehicle under the same route working condition in each time period together with the parameters that may influence energy management, and smoothing and normalizing the collected parameters.
Preferably, training with the deep reinforcement learning DDPG method based on the parameters affecting energy management and the equivalent fuel consumption specifically comprises:
in each training step, the SOC at the current moment, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement (or driving time), the vehicle acceleration, and the fuel consumption are used as the observation input of the DDPG agent, and the reward value at the current moment is used as its reward input; the model is trained on the basis of the deep deterministic policy gradient to obtain the trained deep reinforcement learning agent.
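As a concrete illustration of the observation input described above, the vector can be assembled as in the sketch below. All function and variable names are hypothetical (the patent fixes the quantities, not an API), and `progress` stands in for either displacement or driving time depending on the training mode:

```python
def build_observation(soc, soc_ref, speed, progress, accel, fuel_rate):
    # Observation quantities listed in the text; `progress` stands for bus
    # displacement (distance-based modes) or driving time (time-based modes).
    return [
        soc,            # battery state of charge at the current moment
        soc - soc_ref,  # difference between SOC and the reference SOC
        speed,          # vehicle speed
        progress,       # vehicle displacement / driving time
        accel,          # vehicle acceleration
        fuel_rate,      # fuel consumption at the current moment
    ]

obs = build_observation(soc=0.62, soc_ref=0.60, speed=8.3,
                        progress=1250.0, accel=0.4, fuel_rate=1.7)
```

The agent then receives this list, together with the scalar reward, at every training step.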
Preferably, acquiring the parameters affecting energy management during actual bus operation, and performing the bus energy management based on those parameters and the trained deep reinforcement learning agent, comprises:
acquiring parameters influencing the energy management in the actual running of the bus;
and inputting the observed values and the reward value into the deep reinforcement learning agent, which outputs the torque demands of the bus motor and engine for the next moment, wherein the current moment is the moment of the current observation.
In a second aspect, the present application provides an apparatus for hybrid bus energy management, the apparatus comprising an acquisition module, a training module and a real-time implementation module, wherein:
the acquisition module is configured to acquire the parameters influencing energy management of the bus under the fixed-route working condition;
the training module is configured to train the model by using a DDPG strategy in deep reinforcement learning based on the parameters influencing energy management to obtain a trained deep reinforcement learning agent;
a real-time implementation module configured to acquire the observed quantities affecting energy management during actual bus driving and, based on the parameters affecting energy management during actual driving and the trained deep reinforcement learning agents, select the deep reinforcement learning agent of the corresponding time period according to the bus departure time to perform the bus energy management;
the acquisition module, the training module and the real-time implementation module are sequentially connected.
Optionally, the acquisition module is configured to:
at least one parameter from: the road conditions on the fixed route of the hybrid bus, including temperature and weather; the number of passengers at different stops on the bus route; and the traffic-light conditions at intersections for the same bus at different time periods each day.
Optionally, the acquisition module is configured to:
acquiring K samples, wherein each sample comprises parameters which are acquired on the experimental vehicle at different moments and possibly influence energy management;
Optionally, the acquisition module is configured to:
the bus speed, the bus acceleration, the engine speed, the engine torque, the motor speed, the motor torque, the battery state of charge, the fuel consumption at the current moment, the difference between the SOC and the reference SOC, and the bus displacement (or driving time).
Optionally, the acquisition module is configured to:
and acquiring equivalent fuel consumption of the experimental vehicle under a fixed route working condition and parameters influencing energy management at a preset sampling frequency, and performing smoothing processing and normalization processing on the acquired data.
Optionally, the training module is configured to:
in each training step, the SOC at the current moment, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement (or driving time), the vehicle acceleration, and the fuel consumption are used as the observation input of the DDPG agent, and the reward value at the current moment is used as its reward input; the model is trained on the basis of the deep deterministic policy gradient to obtain the trained deep reinforcement learning agent.
Optionally, the training module is configured to:
acquiring the observed quantities of at least one moment before the estimated moment during actual driving of the hybrid bus, together with the parameters influencing energy control;
and inputting those observed quantities and parameters into the deep reinforcement learning agent, outputting a control behavior, and controlling the bus accordingly, wherein the estimated moment is the sampling moment immediately following the current moment.
In a third aspect, the present invention provides a storage medium storing a program; when the program runs, the device where the storage medium is located is controlled to execute the hybrid bus energy-management method of the above technical solution.
The technical scheme provided by the invention has the beneficial effects that at least:
the method provided by the invention obtains equivalent fuel consumption and parameters influencing energy management of the experimental vehicle under the working condition of a fixed route; training a deep reinforcement learning agent model based on parameters and observed quantities influencing the energy management to obtain a trained agent; the method comprises the steps of obtaining parameters influencing energy management in actual running of the bus, and carrying out energy management on the bus based on the parameters influencing energy management in actual running and a trained agent, so that energy optimization can be effectively controlled. The method provided by the embodiment of the application can be used for energy management of the hybrid bus, the influence of signals of traffic lights at a road intersection on the energy management is considered, in addition, representative 12 time intervals are selected for buses which are susceptible to time intervals to be trained respectively to obtain the agent model, DDPG agents in corresponding time intervals can be selected according to the time of the bus, more effective control on the energy management of the hybrid bus is achieved, and energy consumption is reduced.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a training process of deep reinforcement learning DDPG according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for hybrid bus energy management provided by an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating the network structure and gradient back-propagation of the Actor-Critic for deep reinforcement learning according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of an apparatus for controlling energy management of a hybrid bus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of deep reinforcement learning energy management of a hybrid bus provided in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
FIG. 1 is a schematic diagram of a training process of a deep reinforcement learning DDPG provided by an embodiment of the present application;
referring to fig. 1, in the implementation environment, a deep reinforcement learning agent after training is obtained by collecting parameters that may affect energy management when an experimental vehicle runs in different bus periods under a fixed route working condition and using a deep certainty strategy gradient training model. Specifically, parameters and observed quantities which may affect energy management and reward values are used as input data, the input data are respectively input into a controlled object (namely a hybrid bus) and a deep reinforcement learning agent, and the deep reinforcement learning agent outputs a control quantity action. And outputting the control signal to a controlled object, inputting the reward value into a deep reinforcement learning agent for training, and adjusting a Cryc parameter in the agent by utilizing reverse gradient descent so as to finish one-time training. Through the repeated and continuous learning training, a converged proxy after training is finally established.
And acquiring parameters and observed quantities influencing the energy management in the actual running of the bus, and performing energy management on the hybrid bus under the working condition of a fixed route based on the parameters and observed quantities influencing the energy management in the actual running and the trained deep reinforcement learning agent.
This embodiment can be applied to a hybrid bus operating under a fixed-route working condition. For example, using this embodiment for bus energy management, the bus can adjust its online driving according to the pre-trained agent, thereby reducing its equivalent fuel consumption and achieving more accurate control of the bus.
As a specific implementation, as shown in fig. 2, the present embodiment provides a method for controlling energy management of a hybrid bus, the method comprising:
A working condition represents the driving displacement of the experimental vehicle; for example, the 10 km from the first stop to the last stop of a bus line can be regarded as one working condition. In this embodiment, the fixed round-trip route of a certain bus is used as the experimental condition, and data from at least three full cycles of the experimental vehicle on this condition are collected to ensure the reliability of the training data.
The parameters affecting energy management may be at least one of: the road conditions on the fixed route of the hybrid bus, including temperature and weather; the number of passengers at different stops on the bus route; and the traffic-light conditions at intersections for the same bus at different time periods each day.
In implementation, at least one parameter that may influence energy management is obtained for the experimental vehicle under each working condition, and the parameters that actually influence energy management are selected from them. The equivalent fuel consumption of the experimental vehicle under the working condition is obtained by reading the fuel-consumption and battery-charge sensors installed on the bus.
Optionally, the equivalent fuel consumption of the experimental vehicle and the parameters and observed quantities affecting energy management under each working condition are collected at a preset sampling frequency, and the collected parameters are smoothed and normalized.
The higher the sampling frequency, the smaller the interval between sampling points, the more data are obtained, and the stronger the correlation between the data, so the output of the finally trained agent model is more accurate. A technician can preset a sampling interval and sample the running experimental vehicle accordingly; for example, the sampling frequency may be set to 1 Hz.
Smoothing the collected data suppresses inaccurate samples.
Normalizing the acquired parameters makes the values of different parameters comparable, improves the accuracy of the agent network model, and greatly helps in setting the reward value.
Specifically, for the normalization in this embodiment, since multiple sets of parameter items have been obtained in the above process, the maximum and minimum parameter values in each set are determined, and the normalized value of each parameter in the set is computed by the following formula:

x_norm = (x - x_min) / (x_max - x_min)

where x_min is the minimum parameter value and x_max the maximum parameter value in each set of parameter items.
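The smoothing and min-max normalization steps can be sketched as below. The trailing moving average stands in for whichever smoothing filter the implementation actually uses, which the text does not specify:

```python
def moving_average(xs, window=3):
    # Trailing moving average as a simple smoothing of the sampled signal.
    out = []
    for i in range(len(xs)):
        lo = max(0, i - window + 1)
        out.append(sum(xs[lo:i + 1]) / (i + 1 - lo))
    return out

def min_max_normalize(xs):
    # Rescale one parameter series to [0, 1] using the min-max formula above.
    x_min, x_max = min(xs), max(xs)
    if x_max == x_min:
        return [0.0] * len(xs)  # constant series: no spread to normalize
    return [(x - x_min) / (x_max - x_min) for x in xs]

speeds = [0.0, 5.0, 12.0, 9.0, 15.0]       # e.g. speed sampled at 1 Hz
normalized = min_max_normalize(moving_average(speeds))
```

Each parameter series (speed, SOC, fuel consumption, ...) would be smoothed and normalized independently before being fed to the agent.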
The deep reinforcement learning agent is a combination of neural network models that can derive the control action for the next moment from the observation data of the estimated moment. As shown in fig. 3, the agent is mainly divided into two parts, a Critic layer and an Actor layer: data is input at the Critic layer of the agent model and output at the Actor layer of the agent.
The evaluation (Critic) network computes the Q function Q(s, a | θ^Q): its input is the state s and the action a, and its output is the Q value. The action (Actor) network maps the state to an action, a = μ(s | θ^μ): its input is the state s and its output is the action a. The evaluation network is divided into an Online evaluation network and a Target evaluation network, and the action network is divided into an Online action network and a Target action network.

Each Target network has the same structure as its Online counterpart. The parameters θ^Q and θ^μ of the Online evaluation network and Online action network are randomly initialized, and the Target network parameters θ^Q′ and θ^μ′ are initialized from them. At the same time, a space R is allocated as storage for experience replay.

After initialization, the iterative solution begins. An exploratory action is selected by adding a Gaussian perturbation to the current policy:

a_t = μ(s_t | θ^μ) + N_t

where N_t is the Gaussian perturbation. Executing action a_t in the current state yields the corresponding reward and the next state, and the tuple (s_t, a_t, r_t, s_{t+1}) is stored in the Memory Replay space. A small batch of data is randomly selected from the Memory Replay space as training data for the Online action network and Online evaluation network, and the Online evaluation network is updated first.

The Online evaluation network loss function is defined as

L = (1/N) Σ_i ( y_i - Q(s_i, a_i | θ^Q) )²,  with  y_i = r_i + γ Q′(s_{i+1}, μ′(s_{i+1} | θ^μ′) | θ^Q′),

and the Online evaluation network is updated by minimizing this loss. After the Online evaluation network has been updated, the Online action network is updated.

The calculated policy gradient is

∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a | θ^Q) |_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s | θ^μ) |_{s=s_i}

and the Online action network is updated according to the gradient-descent principle. Finally, the updated parameters θ^Q and θ^μ of the Online evaluation and action networks are used to soft-update the Target network parameters θ^Q′ and θ^μ′:

θ^Q′ ← τ θ^Q + (1 - τ) θ^Q′,   θ^μ′ ← τ θ^μ + (1 - τ) θ^μ′.
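The numerical core of the update rules above can be sketched as follows, with the networks reduced to plain parameter lists. The constants GAMMA and TAU are typical DDPG choices, not values stated in the text:

```python
GAMMA = 0.99   # discount factor (typical DDPG value; not fixed by the text)
TAU = 0.001    # soft-update rate

def td_target(reward, next_q, done=False):
    # y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})); terminal steps keep r_i.
    return reward if done else reward + GAMMA * next_q

def critic_loss(ys, qs):
    # L = (1/N) * sum_i (y_i - Q(s_i, a_i))^2
    return sum((y - q) ** 2 for y, q in zip(ys, qs)) / len(ys)

def soft_update(target_params, online_params, tau=TAU):
    # theta' <- tau * theta + (1 - tau) * theta', element-wise.
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]

y = td_target(reward=1.0, next_q=2.0)
loss = critic_loss([1.0, 3.0], [1.0, 1.0])
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

In a full implementation the loss would be minimized by back-propagation through the Critic network, and the soft update applied to every weight tensor of both Target networks after each minibatch.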
As shown in fig. 3, a schematic structural diagram of the deep reinforcement learning agent model: in each training step, the parameters and observations affecting energy management at the moment before the estimated moment, together with the reward value obtained from them, are used as input data of the agent model; the input passes through the hidden layers of the agent and control-action data are produced at the output layer. Back-propagation training is performed according to the gradient values and the parameters of the agent model are adjusted, completing one training step. Through this repeated, continuous learning and training, a trained agent model is finally established.
In practice, the number of neurons in the hidden layer of the agent model is set to 40. To accurately evaluate the effect of deep-reinforcement-learning energy management, the control effect can be assessed through the equivalent fuel-consumption ratio R.
The equivalent fuel-consumption ratio reflects the comparison between the actual control effect and the DP benchmark; the closer R is to 0, the better the result. The ratio R is calculated as

R = (S_RL - S_DP) / S_DP

where R is the relative ratio between the actual data and the DP reference data, S_RL is the equivalent fuel consumption obtained by deep reinforcement learning training, and S_DP is the equivalent fuel-consumption reference obtained under the DP benchmark.
It should be noted that the performance of the trained model is evaluated by calculating the ratio and the root mean square between the reference data and the actual data; for example, a ratio close to 0 together with a root mean square close to 0 indicates that the agent model trained by deep reinforcement learning has good control performance.
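The evaluation metric can be computed as below. The exact formula is an assumption consistent with the surrounding description: the text states only that R compares the reinforcement-learning result with the DP reference and is good near 0, and the signed values in Table 2 imply a relative difference rather than a plain quotient. The numbers are purely illustrative:

```python
def fuel_ratio(s_rl, s_dp):
    # Relative gap between the RL result and the DP benchmark; R near 0 is
    # good, and R can be negative when the RL result beats the benchmark.
    return (s_rl - s_dp) / s_dp

# Illustrative numbers only: an RL run at 96.24 against a DP reference of 100.
r = fuel_ratio(s_rl=96.24, s_dp=100.0)
```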
Optionally, the training method may be the DDPG algorithm, the DQN algorithm, or Q-learning. To make the control of the deep reinforcement learning algorithm more accurate, several back-propagation training methods can each be used to train the agent model; the R value is calculated during each method's training, the control performance of the methods is compared using R as the index, and the method with the best training effect is thereby determined.
Taking time as the abscissa and ignoring intersection conditions (buses are assumed to meet a green light at every intersection, so no speed constraint is applied), DDPG of deep reinforcement learning is used to train a converged agent, simulation is performed, and the equivalent fuel consumption and the other parameters that may influence energy management are collected on the experimental vehicle;
taking time as the abscissa and considering intersection conditions (buses are assumed to meet a red light, so the speed decelerates to 0 at the intersection and then accelerates again), DDPG is used to train a converged agent, simulation is performed, and the equivalent fuel consumption and the other parameters that may influence energy management are collected on the experimental vehicle;
taking displacement as the abscissa and ignoring intersection conditions (buses are assumed to meet a green light at every intersection, so no speed constraint is applied), DDPG is used to train a converged agent, simulation is performed, and the equivalent fuel consumption and the other parameters that may influence energy management are collected on the experimental vehicle;
taking displacement as the abscissa and considering intersection conditions (buses are assumed to meet a red light, so the speed decelerates to 0 at the intersection and then accelerates again), DDPG is used to train a converged agent, simulation is performed, and the equivalent fuel consumption and the other parameters that may influence energy management are collected on the experimental vehicle;
Taking displacement as the abscissa and considering the intersection situation, the intersection traffic-light change signals are shown in table 1; the ellipsis in the last column of table 1 indicates that the pattern continues as in the preceding columns. Displacement is combined with time and bus speed to constrain the vehicle speed. There are two specific constraints. First, if the vehicle is at an intersection position within a red-light time window, its speed must be 0; this is enforced until the red window ends, so the vehicle cannot continue to travel. Second, the vehicle speed V(t) is related to the displacement X(t) and the time t as follows.
When the displacement is determined, the total driving time, and even the time between stops, is bounded above and below, i.e., the speed (the average speed over the displacement) is also constrained. The same time period is selected, DDPG of deep reinforcement learning is used to train a converged agent, simulation is performed, and the equivalent fuel consumption and the other parameters that may influence energy management are collected on the experimental vehicle.
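The first constraint (forced zero speed at a red light) can be sketched as below. The encoding of the signal timetable as (red_start, red_end) windows per intersection position is an illustrative stand-in for the Table 1 data, which the excerpt does not reproduce:

```python
def constrain_speed(v_cmd, t, position, red_windows_by_intersection):
    # Constraint 1 from the text: if the bus is at an intersection while the
    # light is red, force the speed to 0 until the red window ends.
    # `red_windows_by_intersection` maps an intersection position (m) to a
    # list of (red_start, red_end) times in seconds.
    for x_int, red_windows in red_windows_by_intersection.items():
        at_intersection = abs(position - x_int) < 1.0
        in_red = any(t0 <= t < t1 for t0, t1 in red_windows)
        if at_intersection and in_red:
            return 0.0
    return v_cmd

lights = {500.0: [(60.0, 90.0)]}   # red from t=60 s to t=90 s at x=500 m
```

The second constraint (bounds on driving time between stops) would be imposed on top of this, e.g. as a penalty in the reward when the average speed over the displacement leaves its allowed band.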
Obtaining the trained deep reinforcement learning agents specifically comprises: selecting several time points with obvious passenger-number changes in different time periods, training in the five modes above, and comparing the equivalent fuel consumption and the other parameters that may influence energy management, collected on the experimental vehicle after training, with the dynamic programming result. With time and displacement as the abscissa respectively, the four cases in which every intersection is met at green or at red, plus the fifth case that uses the actual traffic-light changes of each intersection, are compared under otherwise identical conditions in terms of equivalent fuel consumption, speed, and SOC curves; the increase or decrease rate of the equivalent fuel consumption is output, yielding deep reinforcement learning agents for the corresponding number of representative time periods. In this embodiment, 12 time points with obvious characteristics are selected in different time periods, with the passenger number changing at every stop; training is then performed in the five modes, the collected equivalent fuel consumption and other parameters are compared, and the results are compared against a Dynamic Programming (DP) benchmark that considers the intersection signals, yielding deep reinforcement learning agents for 12 representative time periods.
TABLE 1
For each of the five training modes above, one of the 12 groups of data was selected; the resulting R values are shown in Table 2.
TABLE 2
Training algorithm | R |
---|---|
St | -0.0376 |
Stc | 0.0762 |
Sd | -0.0423 |
Sdc | 0.0631 |
Sdct | 0.0384 |
As Table 2 shows, during the deep reinforcement learning training process the ratio R between the obtained reference data and the actual data stays close to 0; using displacement as the abscissa performs better, and accounting for the intersection traffic-light signal has a considerable influence on the actual control result.
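The patent does not give a formula for R; one plausible reading, consistent with "close to 0 is better", is the signed relative deviation of the agent's equivalent fuel consumption from the dynamic programming reference. The sketch below assumes that definition, with illustrative numbers rather than the patent's experimental data.

```python
def r_value(agent_fuel_l: float, dp_fuel_l: float) -> float:
    """Signed relative deviation of the agent's equivalent fuel consumption
    from the DP reference (assumed interpretation of Table 2's R)."""
    return (agent_fuel_l - dp_fuel_l) / dp_fuel_l

# Illustrative numbers only; a value near 0 means the trained agent
# nearly matches the DP benchmark.
r = r_value(agent_fuel_l=3.92, dp_fuel_l=4.09)
```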
Step 303: acquiring the parameters and observed quantities that influence energy management during actual bus operation, and controlling the bus's energy management based on those parameters and the trained agent model.
Having obtained, in the preceding steps, the parameter items that influence energy management and the trained agent model, the parameters and observed quantities can be fed into the trained agent model in real time to control the running and energy management of the bus.
Specifically, to determine the bus's control action at the current moment, the parameters and observed quantities influencing energy management at the estimated moment must be acquired. The parameters influencing energy management are at least one of: the road conditions in different time periods on the hybrid bus's fixed route, the number of passengers at the different stops along the route, and the traffic-light state at each intersection. The observed quantities are the bus speed, bus acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge (SOC), fuel consumption at the current moment, the difference between the SOC and the reference SOC, and the bus displacement (or time). These are input to the trained agent model, which outputs the control action for the estimated moment, where the estimated moment is the sampling instant immediately following the current one.
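A minimal sketch of this real-time step follows, assembling the observation vector listed above and querying the agent for the action at the next sampling instant. The `StubAgent` and its `predict` interface are hypothetical stand-ins; a real deployment would load the trained DDPG actor network for the current time period.

```python
import numpy as np

class StubAgent:
    """Stand-in for the trained DDPG actor (hypothetical interface)."""
    def predict(self, x: np.ndarray) -> np.ndarray:
        # Placeholder policy: returns [engine_torque_cmd, motor_torque_cmd].
        return np.array([100.0, 50.0])

def control_step(agent, obs: dict) -> np.ndarray:
    """Build the observation vector from the quantities named in the text
    and return the control action for the next sampling instant."""
    x = np.array([
        obs["speed"], obs["acceleration"],
        obs["engine_speed"], obs["engine_torque"],
        obs["motor_speed"], obs["motor_torque"],
        obs["soc"], obs["fuel_rate"],
        obs["soc"] - obs["soc_ref"],   # SOC tracking error
        obs["displacement"],
    ])
    return agent.predict(x)

# Illustrative observation values (not measurements from the patent):
obs = dict(speed=8.3, acceleration=0.2, engine_speed=1500.0,
           engine_torque=90.0, motor_speed=1200.0, motor_torque=40.0,
           soc=0.62, fuel_rate=0.4, soc_ref=0.60, displacement=3400.0)
action = control_step(StubAgent(), obs)
```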
The method provided by this embodiment of the invention can be used for energy management of the hybrid bus and reduces the equivalent fuel consumption.
As shown in fig. 4, another embodiment of the present application provides an apparatus for controlling energy management of a hybrid bus, the apparatus including:
the acquisition module 401 is configured to acquire the parameters and observed quantities influencing energy management of the experimental vehicle under various working conditions;
a training module 402, configured to train a deep reinforcement learning agent model based on the parameters and observed quantities influencing energy management and a set reward-value calculation, to obtain a trained convergent agent model;
and the real-time control module 403 is configured to acquire parameters and observed quantities affecting the energy management in actual running of the bus, and control the bus energy management based on the parameters and observed quantities affecting the energy management in actual running and the trained agent model.
Optionally, the acquisition module 401 is configured to:
acquire at least one of: the road conditions and time-period information on the hybrid bus's fixed route, the number of passengers at the different stops along the route, and the traffic-light state at each intersection; and acquire the observed quantities: bus speed, bus acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, fuel consumption at the current moment, the difference between the SOC and the reference SOC, and the bus displacement (or time).
Optionally, the acquisition module 401 is configured to:
acquire K samples, where each sample comprises the parameters that may affect energy management, collected on the experimental vehicle at the same moment;
and acquire the parameters influencing energy management of the experimental vehicle under the various working conditions at a preset sampling frequency, and perform smoothing and normalization on the acquired data.
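The patent states only that the sampled data are smoothed and normalized; the sketch below assumes a moving-average filter and min-max scaling as one plausible implementation of those two steps.

```python
import numpy as np

def smooth(x: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average smoothing (assumed filter; the patent does not name one)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

def normalize(x: np.ndarray) -> np.ndarray:
    """Min-max scaling to [0, 1] (assumed normalization scheme)."""
    lo, hi = x.min(), x.max()
    return np.zeros_like(x) if hi == lo else (x - lo) / (hi - lo)

# Illustrative noisy signal standing in for a sampled channel:
raw = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
clean = normalize(smooth(raw))
```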
Optionally, the training module 402 is configured to:
in each training iteration, take the parameters and observed values influencing energy management at the estimated moment as input data of the agent model, compute the reward value, and train the agent model by DDPG backpropagation to obtain the trained agent model.
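The reward formula itself is not published in the patent; a common hedged choice for this kind of energy management problem penalizes instantaneous fuel use plus the squared SOC-tracking error, which the illustrative reward below assumes (the weight `alpha` is a made-up tuning parameter, not a patent value).

```python
def reward(fuel_g_per_step: float, soc: float, soc_ref: float,
           alpha: float = 100.0) -> float:
    """Assumed reward: penalize fuel use and deviation from the reference SOC.
    Tracking the reference tightly with low fuel use yields rewards near 0."""
    return -(fuel_g_per_step + alpha * (soc - soc_ref) ** 2)

r_good = reward(fuel_g_per_step=0.5, soc=0.60, soc_ref=0.60)
r_bad = reward(fuel_g_per_step=0.5, soc=0.70, soc_ref=0.60)
assert r_good > r_bad  # drifting from the SOC reference lowers the reward
```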
Optionally, the real-time control module 403 is configured to:
acquire the parameters and observed quantities influencing energy management at the estimated moment during actual bus operation, input them into the agent model, and output the bus control quantity, where the controlled moment is the sampling instant immediately following the current one.
It should be noted that the device for controlling bus energy management provided in the above embodiment is described using the above division into functional modules only as an example; in practical applications, the functions may be distributed among different functional modules as needed, i.e. the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiment and the method embodiments for controlling bus energy management provided above belong to the same concept; their specific implementation processes are detailed in the method embodiments and are not repeated here.
As shown in fig. 5, this embodiment provides a schematic diagram of the deep reinforcement learning energy management structure of the hybrid bus. Because the historical travel information mainly comprises the bus's route mileage and running time, a simple SOC reference can be obtained: the reference SOC decreases linearly, at a constant rate, as the displacement (or time) increases. The bus's running time, known from the acquisition module, is used to select the trained deep reinforcement learning agent for the corresponding one of the 12 representative time periods; the parameters and observed quantities influencing energy management at the estimated moment during actual operation are acquired and input to the agent model, which outputs the engine and motor torques as the bus control quantities for the next moment; the hybrid bus is thus controlled with the parameters influencing energy management taken into account, and the process repeats until the vehicle completes its driving task.
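The linear SOC reference and the per-period agent selection described above can be sketched as follows. The initial/final SOC values and the 12-period schedule covering the service day are assumed for illustration; none of these numbers come from the patent.

```python
def soc_reference(s_m: float, route_m: float,
                  soc0: float = 0.8, soc_end: float = 0.3) -> float:
    """Reference SOC decreasing linearly with displacement along the route.
    soc0 and soc_end are assumed charge-depletion endpoints."""
    frac = min(max(s_m / route_m, 0.0), 1.0)
    return soc0 + (soc_end - soc0) * frac

def select_agent(agents: list, departure_hour: float,
                 period_hours: float = 1.5):
    """Pick the trained agent whose representative time period covers the
    departure time (assumed uniform 1.5 h periods, 12 agents per day)."""
    idx = min(int(departure_hour // period_hours), len(agents) - 1)
    return agents[idx]

agents = [f"agent_{i}" for i in range(12)]       # placeholders for trained agents
chosen = select_agent(agents, departure_hour=7.2)  # morning departure
soc_mid = soc_reference(6_000, 12_000)             # halfway along a 12 km route
```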
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (8)
1. A hybrid power bus energy management method, characterized by comprising:
Acquiring parameters influencing energy management of the experimental vehicle under the working condition of a fixed bus route;
training a model based on the parameters influencing energy management and the observed quantities to obtain a trained deep reinforcement learning agent;
acquiring parameters and observed quantities influencing energy management in actual running of the bus, and performing energy management on the hybrid bus under a fixed route working condition based on the parameters and observed quantities influencing the energy management in the actual running and the trained deep reinforcement learning agent;
wherein the model trained based on the parameters influencing energy management and the observed quantities uses a deep deterministic policy gradient (DDPG) algorithm, a DQN algorithm, or Q-learning, and training the model using the deep deterministic policy gradient comprises the following five modes:
taking time as the abscissa, without considering intersection conditions, assuming the bus meets a green light at every intersection, without speed constraints, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking time as the abscissa, considering intersection conditions, assuming the bus meets a red light at every intersection so that its speed decelerates to 0 and then accelerates again at each intersection, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking displacement as the abscissa, without considering intersection conditions, assuming the bus meets a green light at every intersection, without speed constraints, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking displacement as the abscissa, considering intersection conditions, assuming the bus meets a red light at every intersection so that its speed decelerates to 0 and then accelerates again, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking displacement as the abscissa, considering intersection conditions and the changing intersection traffic-light signals, combining displacement with time and bus speed to constrain the vehicle speed, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
the method for obtaining the trained deep reinforcement learning agent specifically comprises the following steps: selecting a plurality of time points with obvious passenger number change in different time periods, then training in the 5 modes, comparing equivalent fuel consumption and other parameters which may influence energy management collected on an experimental vehicle after training with a dynamic planning result, and comparing with a dynamic planning reference when intersection signals are considered to obtain the deep reinforcement learning agent with corresponding number of representative time periods.
2. The hybrid bus energy management method as claimed in claim 1, wherein the parameters affecting energy management comprise the road conditions and time periods on the fixed route of the hybrid bus, the road conditions comprising ambient temperature, weather conditions, road gradient, and traffic-light conditions at intersections, and further comprise the number of passengers at each stop of the bus.
3. The hybrid bus energy management method of claim 1, wherein the observed quantities comprise bus speed, acceleration, engine speed, engine torque, motor speed, motor torque, battery state of charge, current time fuel consumption, difference between SOC and reference SOC, and bus displacement/time.
4. The energy management method for the hybrid electric bus according to claim 1, wherein after acquiring the equivalent fuel consumption of the experimental vehicle under the same route working conditions in different time periods and the parameters influencing energy management, the method further comprises: acquiring, at a preset sampling frequency, the equivalent fuel consumption of the experimental vehicle under the same route working condition in each time period and the parameters that may influence energy management, and performing smoothing and normalization on the acquired parameters.
5. The hybrid bus energy management method according to any one of claims 1 to 4, wherein the agent is obtained by training with a deep reinforcement learning DDPG method based on parameters affecting the energy management and the equivalent fuel consumption, specifically:
in each training iteration, the SOC at the current moment, the difference between the SOC and the reference SOC, the vehicle speed, the vehicle displacement or running time, the vehicle acceleration, and the fuel consumption are used as the observed-value inputs of the DDPG agent, the reward value at the current moment is used as the reward input of the DDPG agent, and the model is trained on the basis of the deep deterministic policy gradient to obtain the trained deep reinforcement learning agent.
6. The hybrid bus energy management method according to claim 5, wherein the acquiring parameters affecting the energy management in the actual driving of the bus, and performing the bus energy management based on the parameters affecting the energy management in the actual driving and the trained deep reinforcement learning agent comprises:
acquiring parameters influencing the energy management in the actual running of the bus;
and inputting the observed values and the reward value into the deep reinforcement learning agent, which outputs, at the current moment, the torque demands of the bus's motor and engine for the next moment, wherein the current moment is the moment of the current observation.
7. An apparatus for controlling energy management in a bus, said apparatus comprising,
an acquisition module, configured to acquire the parameters influencing energy management of a bus under a fixed-route working condition;
the training module is configured to train a model by using a DDPG strategy in deep reinforcement learning based on the parameters influencing energy management to obtain a trained deep reinforcement learning agent;
a real-time implementation module, configured to acquire the observed quantities influencing energy management during actual bus operation and, based on the parameters influencing energy management in actual running and the trained deep reinforcement learning agents, select the deep reinforcement learning agent for the time period corresponding to the bus's running time to perform bus energy management;
the acquisition module, the training module and the real-time implementation module are sequentially connected;
wherein a model is trained based on the parameters influencing energy management and the observed quantities, the training method used being a deep deterministic policy gradient (DDPG) algorithm, a DQN algorithm, or Q-learning, and training the model using the deep deterministic policy gradient comprises the following five modes:
taking time as the abscissa, without considering intersection conditions, assuming the bus meets a green light at every intersection, without speed constraints, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking time as the abscissa, considering intersection conditions, assuming the bus meets a red light at every intersection so that its speed decelerates to 0 and then accelerates again at each intersection, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking displacement as the abscissa, without considering intersection conditions, assuming the bus meets a green light at every intersection, without speed constraints, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking displacement as the abscissa, considering intersection conditions, assuming the bus meets a red light at every intersection so that its speed decelerates to 0 and then accelerates again, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
taking displacement as the abscissa, considering intersection conditions and the changing intersection traffic-light signals, combining displacement with time and bus speed to constrain the vehicle speed, training with DDPG in deep reinforcement learning to obtain a convergent agent, performing simulation, and collecting on the experimental vehicle the equivalent fuel consumption and other parameters that may influence energy management;
the method for obtaining the trained deep reinforcement learning agent specifically comprises the following steps: selecting a plurality of time points with obvious passenger number change in different time periods, then training in the 5 modes, comparing equivalent fuel consumption and other parameters which may influence energy management collected on an experimental vehicle after training with a dynamic planning result, and comparing with a dynamic planning reference when intersection signals are considered to obtain the deep reinforcement learning agent with corresponding number of representative time periods.
8. A storage medium, characterized in that a program is stored in the storage medium, and when the program runs, the device where the storage medium is located is controlled to execute the hybrid bus energy management method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010084077.7A CN111267830B (en) | 2020-02-10 | 2020-02-10 | Hybrid power bus energy management method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010084077.7A CN111267830B (en) | 2020-02-10 | 2020-02-10 | Hybrid power bus energy management method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111267830A CN111267830A (en) | 2020-06-12 |
CN111267830B true CN111267830B (en) | 2021-07-09 |
Family
ID=70994986
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010084077.7A Active CN111267830B (en) | 2020-02-10 | 2020-02-10 | Hybrid power bus energy management method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111267830B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111959509B (en) * | 2020-08-19 | 2022-06-17 | 重庆交通大学 | Q learning regenerative braking control strategy based on state space domain battery energy balance |
CN112026744B (en) * | 2020-08-20 | 2022-01-04 | 南京航空航天大学 | Series-parallel hybrid power system energy management method based on DQN variants |
CN111965981B (en) * | 2020-09-07 | 2022-02-22 | 厦门大学 | Aeroengine reinforcement learning control method and system |
CN112249002B (en) * | 2020-09-23 | 2022-06-28 | 南京航空航天大学 | TD 3-based heuristic series-parallel hybrid power energy management method |
CN112287463B (en) * | 2020-11-03 | 2022-02-11 | 重庆大学 | Fuel cell automobile energy management method based on deep reinforcement learning algorithm |
CN112613229B (en) * | 2020-12-14 | 2023-05-23 | 中国科学院深圳先进技术研究院 | Energy management method, model training method and device for hybrid power equipment |
CN112837532B (en) * | 2020-12-31 | 2022-04-01 | 东南大学 | New energy bus cooperative dispatching and energy-saving driving system and control method thereof |
CN112989715B (en) * | 2021-05-20 | 2021-08-03 | 北京理工大学 | Multi-signal-lamp vehicle speed planning method for fuel cell vehicle |
CN113911103B (en) * | 2021-12-14 | 2022-03-15 | 北京理工大学 | Hybrid power tracked vehicle speed and energy collaborative optimization method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105270383A (en) * | 2014-05-30 | 2016-01-27 | 福特全球技术公司 | Vehicle speed profile prediction using neural networks |
CN109934332A (en) * | 2018-12-31 | 2019-06-25 | 中国科学院软件研究所 | The depth deterministic policy Gradient learning method in pond is tested based on reviewer and double ends |
CN110254418A (en) * | 2019-06-28 | 2019-09-20 | 福州大学 | A kind of hybrid vehicle enhancing study energy management control method |
CN110751346A (en) * | 2019-11-04 | 2020-02-04 | 重庆中涪科瑞工业技术研究院有限公司 | Distributed energy management method based on driving speed prediction and game theory |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130073104A1 (en) * | 2011-09-20 | 2013-03-21 | Maro Sciacchitano | Modular intelligent energy management, storage and distribution system |
CN102729987B (en) * | 2012-06-20 | 2014-11-19 | 浙江大学 | Hybrid bus energy management method |
CN107618501B (en) * | 2016-07-15 | 2020-10-09 | 联合汽车电子有限公司 | Energy management method for hybrid vehicle, terminal device and server |
KR101917375B1 (en) * | 2016-11-22 | 2018-11-12 | 한국에너지기술연구원 | Energy management system and method using machine learning |
CN107284441B (en) * | 2017-06-07 | 2019-07-05 | 同济大学 | The energy-optimised management method of the adaptive plug-in hybrid-power automobile of real-time working condition |
KR20190075294A (en) * | 2017-12-21 | 2019-07-01 | 중앙대학교 산학협력단 | Deep Learning Based Building Energy Management System and System Maintenance Method Using the Building Energy Management System |
CN108177648B (en) * | 2018-01-02 | 2019-09-17 | 北京理工大学 | A kind of energy management method of the plug-in hybrid vehicle based on intelligent predicting |
CN108427985B (en) * | 2018-01-02 | 2020-05-19 | 北京理工大学 | Plug-in hybrid vehicle energy management method based on deep reinforcement learning |
CA3030490A1 (en) * | 2018-01-22 | 2019-07-22 | Pason Power Inc. | Intelligent energy management system for distributed energy resources and energy storage systems using machine learning |
EP3807137A4 (en) * | 2018-06-15 | 2021-12-22 | The Regents of the University of California | Systems, apparatus and methods to improve plug-in hybrid electric vehicle energy performance by using v2c connectivity |
CN110194172A (en) * | 2019-06-28 | 2019-09-03 | 重庆大学 | Based on enhanced neural network plug-in hybrid passenger car energy management method |
CN110481536B (en) * | 2019-07-03 | 2020-12-11 | 中国科学院深圳先进技术研究院 | Control method and device applied to hybrid electric vehicle |
CN110341690B (en) * | 2019-07-22 | 2020-08-04 | 北京理工大学 | PHEV energy management method based on deterministic strategy gradient learning |
CN110458443B (en) * | 2019-08-07 | 2022-08-16 | 南京邮电大学 | Smart home energy management method and system based on deep reinforcement learning |
CN110610260B (en) * | 2019-08-21 | 2023-04-18 | 南京航空航天大学 | Driving energy consumption prediction system, method, storage medium and equipment |
- 2020-02-10: CN202010084077.7A granted as CN111267830B (status: Active)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105270383A (en) * | 2014-05-30 | 2016-01-27 | 福特全球技术公司 | Vehicle speed profile prediction using neural networks |
CN109934332A (en) * | 2018-12-31 | 2019-06-25 | 中国科学院软件研究所 | The depth deterministic policy Gradient learning method in pond is tested based on reviewer and double ends |
CN110254418A (en) * | 2019-06-28 | 2019-09-20 | 福州大学 | A kind of hybrid vehicle enhancing study energy management control method |
CN110751346A (en) * | 2019-11-04 | 2020-02-04 | 重庆中涪科瑞工业技术研究院有限公司 | Distributed energy management method based on driving speed prediction and game theory |
Non-Patent Citations (1)
Title |
---|
Energy Management Strategy for a Hybrid Electric Vehicle Based on Deep Reinforcement Learning; Yue Hu et al.; Applied Sciences; 2018-01-26; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN111267830A (en) | 2020-06-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111267830B (en) | Hybrid power bus energy management method, device and storage medium | |
CN110610260B (en) | Driving energy consumption prediction system, method, storage medium and equipment | |
Wegener et al. | Automated eco-driving in urban scenarios using deep reinforcement learning | |
CN111267831A (en) | Hybrid vehicle intelligent time-domain-variable model prediction energy management method | |
CN110991757B (en) | Comprehensive prediction energy management method for hybrid electric vehicle | |
CN107577234B (en) | Automobile fuel economy control method for driver in-loop | |
CN113010967B (en) | Intelligent automobile in-loop simulation test method based on mixed traffic flow model | |
JP2022532972A (en) | Unmanned vehicle lane change decision method and system based on hostile imitation learning | |
CN104200267A (en) | Vehicle driving economy evaluation system and vehicle driving economy evaluation method | |
Valera et al. | Driving cycle and road grade on-board predictions for the optimal energy management in EV-PHEVs | |
CN113525396B (en) | Hybrid electric vehicle layered prediction energy management method integrating deep reinforcement learning | |
CN113415288B (en) | Sectional type longitudinal vehicle speed planning method, device, equipment and storage medium | |
CN112339756B (en) | New energy automobile traffic light intersection energy recovery optimization speed planning algorithm based on reinforcement learning | |
CN112735126A (en) | Mixed traffic flow cooperative optimization control method based on model predictive control | |
CN112026744B (en) | Series-parallel hybrid power system energy management method based on DQN variants | |
CN116187161A (en) | Intelligent energy management method and system for hybrid electric bus in intelligent networking environment | |
Deshpande et al. | In-vehicle test results for advanced propulsion and vehicle system controls using connected and automated vehicle information | |
Pi et al. | Automotive platoon energy-saving: A review | |
CN115534929A (en) | Plug-in hybrid electric vehicle energy management method based on multi-information fusion | |
CN113479187B (en) | Layered different-step-length energy management method for plug-in hybrid electric vehicle | |
CN114074680B (en) | Vehicle channel change behavior decision method and system based on deep reinforcement learning | |
CN115973179A (en) | Model training method, vehicle control method, device, electronic equipment and vehicle | |
CN114148349B (en) | Vehicle personalized following control method based on generation of countermeasure imitation study | |
CN115454082A (en) | Vehicle obstacle avoidance method and system, computer readable storage medium and electronic device | |
Wu et al. | An optimal longitudinal control strategy of platoons using improved particle swarm optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||