CN115563716A - New energy automobile energy management and adaptive cruise cooperative optimization method - Google Patents
- Publication number: CN115563716A
- Application number: CN202211253311.XA
- Authority: CN (China)
- Prior art keywords: network, energy management, vehicle, actor, target
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/10—Geometric CAD
- G06F30/15—Vehicle, aircraft or watercraft design
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Geometry (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Aviation & Aerospace Engineering (AREA)
- Automation & Control Theory (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Electric Propulsion And Braking For Vehicles (AREA)
Abstract
The invention discloses a cooperative optimization method for the energy management strategy and adaptive cruise control of a hybrid electric vehicle. Taking a hybrid electric vehicle as the research object, the method fuses a car-following model and a power-battery energy management strategy on the basis of the deep deterministic policy gradient (DDPG) algorithm, develops an ecological driving energy management strategy based on deep reinforcement learning, and improves fuel economy while achieving optimal following performance. The method mainly comprises: constructing a simulation environment and loading training data; constructing Actor and Critic training networks based on the DDPG algorithm; training the energy management strategy through the DDPG algorithm to obtain inheritable neural network parameters; and downloading the trained network parameters to the hybrid electric vehicle controller for real-time online application.
Description
Technical Field
The invention relates to a new energy automobile energy management and adaptive cruise cooperative optimization method which is mainly applied to ecological driving energy management strategy development based on deep reinforcement learning.
Background
Global warming caused by the large-scale emission of greenhouse gases, chiefly carbon dioxide (CO2), is intensifying, and controlling carbon emissions to slow global warming has become a broad consensus among the countries of the world. A significant proportion of the CO2 emitted into the air comes from the use of fossil fuels by vehicles.
The energy of a hybrid electric vehicle comes from two sources: heat released by fossil fuel and electric energy stored in a battery. Compared with a conventional fuel vehicle, a hybrid electric vehicle emits less carbon and achieves better fuel economy. The energy management strategy aims to improve fuel economy and maintain the battery state of charge during vehicle operation. Adaptive cruise control is used in car-following scenarios on urban roads and expressways and aims to improve the traffic efficiency and fuel economy of the following vehicle. At present, deep reinforcement learning is applied separately to the optimization of the energy management strategy and the control of the car-following model, but these remain two independent models of the same problem and cannot achieve global optimality.
To achieve globally optimal performance of the energy management strategy and the car-following model, it becomes feasible to integrate energy management and adaptive cruise control into one model and to develop an ecological driving energy management strategy based on deep reinforcement learning.
Disclosure of Invention
Aiming at the technical problems in the field, the invention provides a framework combining an energy management strategy based on deep reinforcement learning and an adaptive cruise control algorithm on the basis of a deep reinforcement learning algorithm, and the framework is named as an ecological driving energy management strategy based on deep reinforcement learning.
The invention adopts the following technical scheme:
compared with the prior art, the technical scheme adopted by the invention has the following technical effects:
(1) The hybrid power energy management and the adaptive cruise control of the new energy vehicle are cooperatively optimized under a single algorithm architecture, which reduces the development difficulty of each subsystem compared with the traditional layered architecture;
(2) The hybrid power energy management and the adaptive cruise system of the new energy vehicle are no longer connected by a simple upload-and-dispatch relation; instead, multi-parameter interaction is realized through the input states, reward functions, control actions and the like.
Drawings
FIG. 1 is an ecological driving energy management strategy algorithm framework based on deep reinforcement learning;
FIG. 2 is a graph of optimal fuel consumption for an engine;
FIG. 3 is a graph of battery characteristics;
FIG. 4 is a DDPG algorithm flow.
Detailed Description
The technical solutions of the present application are further elaborated below with reference to the drawings; the described embodiments are only a part of the embodiments of this patent. All other embodiments obtained by those skilled in the art without inventive effort are considered to fall within the protection scope of this patent.
The invention designs a new energy automobile energy management and adaptive cruise cooperative optimization method, which comprises the following specific steps as shown in figure 1:
step one, building a following model simulation environment, and preloading a battery characteristic curve and an optimal fuel economy curve as prior knowledge; and inputting vehicle running data under a mixed working condition, and using the vehicle running data as training data of a pilot vehicle in a following model.
Step two, creating an Actor network and a Critic network based on the DDPG algorithm and a neural network structure, creating a target network for each of the Actor network and the Critic network, and constructing the training network and the total reward function of the energy management strategy of the hybrid electric vehicle.
Step three, the agent interacts with the simulation environment, and the energy management strategy of the hybrid electric vehicle is trained offline through the DDPG algorithm based on the constructed Actor and Critic networks and reward functions, yielding inheritable neural network parameters;
and step four, downloading the inheritable network parameters obtained by the offline training into a vehicle control unit of the hybrid electric vehicle, and realizing real-time online application.
In the ecological driving energy management strategy based on deep reinforcement learning, in step one, the car-following model simulation environment is built with the SUMO software, and the speed and acceleration of vehicles in the simulation scene are obtained and controlled through the TraCI interaction interface. The prior knowledge comprises a battery characteristic curve and an optimal fuel economy curve: the battery characteristic curve establishes the functional relation of internal resistance and open-circuit voltage to the SoC value, and the optimal fuel economy curve establishes the functional relation of engine power to rotating speed and torque. The mixed working condition comprises an expressway condition and an urban road condition and covers most car-following scenarios, so that the training result can be applied to various roads.
The internal resistance and the open-circuit voltage of the battery are both functions of its SoC value. Three groups of test data are input: the relation between internal resistance and SoC in the charging state, the relation between internal resistance and SoC in the discharging state, and the relation between open-circuit voltage and SoC. Each relation is expressed explicitly through unary linear interpolation fitting, so that the SoC value of the battery at any moment and in any state can be solved from these functional relations.
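The unary linear interpolation described above can be sketched in Python with `numpy.interp`; all breakpoint values below are illustrative placeholders, not bench data from the patent:

```python
import numpy as np

# Hypothetical breakpoints (illustrative placeholders, not bench data).
soc_pts   = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
voc_pts   = np.array([3.0, 3.4, 3.6, 3.7, 3.9, 4.1])          # open-circuit voltage [V]
r_chg_pts = np.array([8.0, 6.0, 5.0, 4.8, 5.2, 6.5]) * 1e-3   # internal resistance, charging [ohm]
r_dis_pts = np.array([9.0, 6.5, 5.5, 5.0, 5.5, 7.0]) * 1e-3   # internal resistance, discharging [ohm]

def battery_curves(soc, discharging=True):
    """Unary linear interpolation of V_OC and R_0 against the SoC value."""
    voc = np.interp(soc, soc_pts, voc_pts)
    r0 = np.interp(soc, soc_pts, r_dis_pts if discharging else r_chg_pts)
    return voc, r0

voc, r0 = battery_curves(0.5)   # V_OC ~ 3.65 V, R_0 ~ 5.25 mohm on this toy data
```

With the curves fitted once offline, the controller can look up V_OC and R_0 for any SoC at run time.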
The operation data of the engine and the motor obtained from bench tests are input as prior knowledge, and an optimal fuel economy curve model is constructed to represent the functional relation among engine rotating speed, torque and equivalent fuel consumption rate. The relation is expressed explicitly through binary interpolation fitting, so that the engine output power at any time and in any state can be solved from it, the output power being equal to the product of rotating speed and torque.
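The binary (two-dimensional) interpolation of the fuel map can likewise be sketched with two nested one-dimensional interpolations; the map values below are hypothetical, and only the power relation (speed times torque) comes from the description:

```python
import numpy as np

# Hypothetical engine map (illustrative values, not from the patent).
speed_pts  = np.array([1000.0, 2000.0, 3000.0, 4000.0])   # rotating speed [rpm]
torque_pts = np.array([50.0, 100.0, 150.0, 200.0])        # torque [Nm]
# Equivalent fuel consumption rate grid, bsfc[i, j] at (speed_pts[i], torque_pts[j]) [g/kWh].
bsfc = np.array([[320.0, 290.0, 280.0, 285.0],
                 [300.0, 265.0, 255.0, 260.0],
                 [310.0, 270.0, 250.0, 255.0],
                 [330.0, 285.0, 265.0, 270.0]])

def fuel_rate(speed, torque):
    """Binary interpolation: interpolate along torque at the two bracketing
    speeds, then along speed between those two results."""
    i = int(np.clip(np.searchsorted(speed_pts, speed) - 1, 0, len(speed_pts) - 2))
    lo = np.interp(torque, torque_pts, bsfc[i])
    hi = np.interp(torque, torque_pts, bsfc[i + 1])
    return float(np.interp(speed, speed_pts[i:i + 2], [lo, hi]))

def engine_power_kw(speed_rpm, torque_nm):
    """Output power = rotating speed x torque (rpm and Nm converted to kW)."""
    return speed_rpm * 2.0 * np.pi / 60.0 * torque_nm / 1000.0

rate = fuel_rate(2500.0, 125.0)          # interpolated fuel rate on this toy grid
p_kw = engine_power_kw(2000.0, 100.0)    # ~20.94 kW
```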
In step two of the ecological driving energy management strategy based on deep reinforcement learning, the inertial navigation system and the global positioning system are used to obtain real-time speed and acceleration data of the hybrid vehicle, and the SoC value of the hybrid vehicle at any moment is obtained through the following internal-resistance-model equations:

I = [V_OC − sqrt(V_OC² − 4·R_0·P_b)] / (2·R_0)

SoC = (Q_0 − ∫ I dt) / Q

where SoC is the state of charge, V_OC is the open-circuit voltage, R_0 is the internal resistance, P_b is the output power in the charge-discharge stage, Q_0 is the initial capacity of the battery, Q is the nominal capacity of the battery, and I is the battery current at the present time.
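Assuming the standard internal-resistance (Rint) battery model consistent with the variables listed above, the battery current and a one-step SoC update can be sketched as follows; the pack-level numbers are placeholders:

```python
import math

def battery_current(v_oc, r0, p_b):
    """Rint model: solve V_OC*I - R0*I^2 = P_b for the current,
    I = (V_OC - sqrt(V_OC^2 - 4*R0*P_b)) / (2*R0)."""
    return (v_oc - math.sqrt(v_oc ** 2 - 4.0 * r0 * p_b)) / (2.0 * r0)

def soc_step(soc, q_nominal_ah, current_a, dt_s):
    """Coulomb counting: subtract the charge drawn during dt from the SoC."""
    return soc - current_a * dt_s / 3600.0 / q_nominal_ah

# Placeholder pack-level values (assumed, not from the patent).
i_a = battery_current(v_oc=350.0, r0=0.1, p_b=30_000.0)
soc = soc_step(soc=0.60, q_nominal_ah=6.5, current_a=i_a, dt_s=1.0)
```

The smaller root of the quadratic is taken so that the current stays physically reasonable (terminal voltage close to V_OC).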
Combining the inter-vehicle distance, speed, acceleration and engine power in the car-following model, the state vector and action vector are respectively defined as follows:

state = [v_h, a_h, L, v_p, a_p],  action = [a_h, e_h]

where v_h and a_h are the speed and acceleration of the target vehicle (rear vehicle), L is the inter-vehicle distance, i.e. the distance from the head of the target vehicle to the tail of the pilot vehicle, v_p and a_p are the speed and acceleration of the pilot vehicle, and e_h is the target vehicle engine power. a_h is the control action of the car-following model and e_h is the control action of the energy management strategy.
In order to ensure the safety of the target vehicle during the following process and simultaneously take the riding comfort into consideration, the reward function of the following model is defined as follows:
r_follow = r_follow1 + r_follow2

where L_min and L_max are the minimum and maximum values of the inter-vehicle distance and TTC is the time to collision; the reward term r_follow1 restricts the vehicle to run within the maximum and minimum following distances and describes safety during following. jerk is the rate of change of the target vehicle's acceleration at the sampling moment and describes comfort during following; the reward term r_follow2 improves the riding experience of the driver and passengers.
In order to reduce the fuel consumption of the engine and maintain the SoC value of the battery within an acceptable range, the instantaneous fuel consumption of the engine and the battery charging maintenance cost need to be considered, so that the reward function of the energy management strategy is defined as follows:
r_energy = −[fuel + 250·(SoC_ref − SoC)²]

where fuel is the fuel consumption of the target vehicle at the sampling moment and SoC_ref is the nominal SoC value of the battery.
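The energy-management reward above is fully specified, so it can be checked numerically; only the sample fuel and SoC values below are invented:

```python
def r_energy(fuel, soc, soc_ref=0.6):
    """Energy-management reward from the description:
    r_energy = -[fuel + 250 * (SoC_ref - SoC)^2]."""
    return -(fuel + 250.0 * (soc_ref - soc) ** 2)

# The SoC-deviation penalty grows quadratically (sample values are invented):
r_on_target = r_energy(fuel=0.8, soc=0.60)    # only the fuel term remains
r_off_target = r_energy(fuel=0.8, soc=0.50)   # fuel term plus 250 * 0.1^2
```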
In the ecological driving energy management strategy based on deep reinforcement learning provided by the invention, the adaptive cruise car-following model and the hybrid electric vehicle energy management strategy are innovatively fused through the DDPG algorithm; the total reward function comprises the reward of the car-following model and the reward of the energy management strategy, and is defined as follows:

reward = r_follow + r_energy
Next, the training networks are constructed. An Actor network is constructed, denoted μ(s|θ^μ), where θ^μ are its network parameters; its input is the current state s and its output is the deterministic action a. A Critic network is constructed, denoted Q(s,a|θ^Q), where θ^Q are its network parameters; its input is the current state s together with the deterministic action a output by the Actor network, and its output is the value function and gradient information.

Target networks μ′(s|θ^μ′) and Q′(s,a|θ^Q′) are established for the Actor network and the Critic network respectively; each target network has the same structure as its corresponding network, θ^μ′ being the parameters of the target network of the Actor network and θ^Q′ the parameters of the target network of the Critic network. The constructed Actor and Critic networks and their target networks are applied to train the energy management strategy of the hybrid electric vehicle.
In the ecological driving energy management strategy based on deep reinforcement learning, the agent in the DDPG framework interacts with the simulation environment: it acquires the current environment state, selects and executes an action according to the policy, enters a new environment state, obtains the reward fed back by the environment, and stores the state, action and reward information; training of the energy management strategy is realized through an experience replay pool in this loop. To make the model converge faster and achieve a better training effect, prioritized experience replay is adopted in the algorithm: each group of experience data is assigned the absolute value |δ_t| of its temporal-difference error, and samples with larger values are sampled with higher probability. The training steps are as follows:
Step 2, introducing action noise drawn from a Laplace distribution to explore potentially better policies.
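A small sketch of Laplace exploration noise added to the deterministic action; the noise scale and action bounds are assumed values:

```python
import numpy as np

rng = np.random.default_rng(42)

def explore(mu_action, scale=0.1, low=-1.0, high=1.0):
    """Add Laplace-distributed noise Z_t to the deterministic action and
    clip to the admissible range (scale and bounds are assumed values)."""
    z = rng.laplace(loc=0.0, scale=scale, size=np.shape(mu_action))
    return np.clip(mu_action + z, low, high)

noisy_a = explore(np.array([0.3, -0.2]))
```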
δ_t = y_t − Q(s_t, a_t | θ^Q)

where:

y_t = r_t + γ·Q′[s_{t+1}, μ′(s_{t+1} | θ^μ′) | θ^Q′]

where γ is the discount factor and y_t is the target Q value.
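The TD error δ_t and target value y_t above can be exercised numerically; the reward and Q values below are invented sample numbers:

```python
import numpy as np

def td_targets(r, q_next, gamma=0.99):
    """y_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1})), using target-network values."""
    return r + gamma * q_next

def td_errors(y, q):
    """delta_t = y_t - Q(s_t, a_t | theta_Q)."""
    return y - q

# Invented sample numbers, just to exercise the two equations.
y = td_targets(r=np.array([1.0, -0.5]), q_next=np.array([2.0, 1.0]))
delta = td_errors(y, q=np.array([2.5, 0.0]))
```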
The absolute values |δ_t| are sorted from largest to smallest and rank(t) denotes the resulting serial number, from which the priority p_t of the experience is defined:

p_t = 1 / rank(t)

The sampling probability of experience t is defined accordingly:

P(t) = p_t^α / Σ_k p_k^α

where n is the size of the experience replay pool and α, taking a value between 0 and 1, represents the degree to which priority is used; α = 0 corresponds to uniform sampling.
To increase the diversity of the experience pool and keep the network from falling into an overfitting state, a sampling importance weight is defined:

ω_t = (n·P(t))^(−β) / max_i (n·P(i))^(−β)

where p_min is the minimum value of p_t, whose sampling probability attains the normalizing maximum; β is the annealing exponent, with initial value β_0 between 0 and 1, annealed linearly to 1.
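The priority, sampling-probability and importance-weight formulas above can be sketched together; this assumes the standard rank-based prioritized experience replay variant:

```python
import numpy as np

def per_quantities(td_err, alpha=0.6, beta=0.4):
    """Rank-based prioritized replay: p_t = 1 / rank(|delta_t|),
    P(t) = p_t^alpha / sum_k p_k^alpha, and importance weights
    w_t = (n * P(t))^(-beta), normalized by their maximum."""
    n = len(td_err)
    order = np.argsort(-np.abs(td_err))      # indices in descending |delta|
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(1, n + 1)        # rank 1 = largest |delta|
    p = 1.0 / rank
    prob = p ** alpha / np.sum(p ** alpha)
    w = (n * prob) ** (-beta)
    return prob, w / w.max()

prob, w = per_quantities(np.array([0.5, -2.0, 0.1, 1.0]))
```

The alpha and beta values are conventional defaults, not taken from the patent.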
Step 6, sampling from the experience replay pool R according to the sampling probabilities of prioritized experience replay to obtain a mini-batch of samples S (the number of samples is denoted N) for training the Actor and Critic networks.
The gradient of the Actor network is calculated as:

∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a | θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ} μ(s | θ^μ)|_{s_i}

where ∇ is the gradient operator, J is the objective function of the DDPG algorithm, a represents the action, and s represents the state.
The target networks are updated softly:

θ^Q′ ← τ·θ^Q + (1−τ)·θ^Q′,  θ^μ′ ← τ·θ^μ + (1−τ)·θ^μ′

where τ is the update amplitude, with default value 0.001.
Step 10, repeating steps 2 to 9 until training is finished, and then saving and downloading the neural network parameters.
In a preferred embodiment of the present invention, the step one specifically includes the following steps:
Step 3, inputting the mixed-working-condition data as the driving information of the pilot vehicle; the mixed working condition consists of an expressway condition and an urban road condition and covers the car-following scenarios of most road conditions. In this data set, the average speed is 44 km/h, the maximum speed is 116 km/h, and the duration is 1858 s.
Step 4: the SoC value of the hybrid vehicle at any time is obtained through the following internal-resistance-model equations:

I = [V_OC − sqrt(V_OC² − 4·R_0·P_b)] / (2·R_0)

SoC = (Q_0 − ∫ I dt) / Q

where SoC is the state of charge, V_OC is the open-circuit voltage, R_0 is the internal resistance, P_b is the output power in the charging and discharging phases, Q_0 is the initial capacity of the battery, Q is the nominal capacity of the battery, and I is the battery current at the present time.
In a preferred embodiment of the present invention, the second step specifically includes the following steps:
where v_h and a_h are the speed and acceleration of the target vehicle (rear vehicle), L is the inter-vehicle distance, i.e. the distance from the head of the target vehicle to the tail of the pilot vehicle, v_p and a_p are the speed and acceleration of the pilot vehicle, and e_h is the target vehicle engine power. a_h is the control action of the car-following model and e_h is the control action of the energy management strategy.
The reward function is defined as follows:
reward = r_follow1 + r_follow2 + r_energy

r_energy = −[fuel + 250·(SoC_ref − SoC)²]

where TTC is the time to collision, jerk is the rate of change of the target vehicle's acceleration at the sampling moment, fuel is the fuel consumption of the target vehicle at the sampling moment, and SoC_ref is the nominal SoC value of the battery.
In a preferred embodiment of the present invention, the DDPG algorithm flow is shown in FIG. 4.
In a preferred embodiment of the present invention, the step three specifically includes the following steps:
Step 6, sampling from the experience replay pool R according to the sampling probabilities of prioritized experience replay to obtain a mini-batch of samples S, and training the Actor and Critic networks.
where ∇ is the gradient operator, J is the objective function of the algorithm, a denotes the action, and s denotes the state.
where τ is the update amplitude, with default value 0.001.
Step 10, repeating steps 2 to 9 until the maximum number of cycles M is reached, finishing training, and then saving and downloading the neural network parameters.
In a preferred embodiment of the present invention, the step four specifically is: and downloading the network parameters obtained by off-line training into a vehicle control unit of the hybrid electric vehicle to realize real-time on-line application.
The above description is only one embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any modification or substitution readily conceivable by a person skilled in the art within the technical scope disclosed herein falls within the protection scope of the present invention, which shall therefore be subject to the protection scope of the claims.
Claims (9)
1. A new energy automobile energy management and adaptive cruise cooperative optimization method is characterized by comprising the following steps:
step one, building a following model simulation environment, and preloading a battery characteristic curve and an optimal fuel economy curve as prior knowledge; inputting vehicle running data under a mixed working condition, and using the vehicle running data as training data of a pilot vehicle in a following model;
creating an Actor network and a Critic network based on the DDPG algorithm and a neural network structure, respectively creating a target network for the Actor network and the Critic network, constructing the training network of the energy management strategy of the hybrid electric vehicle, and constructing the total reward function of the energy management strategy of the hybrid electric vehicle;
step three, the agent interacts with the simulation environment, and based on the established Actor network, Critic network and reward function, the energy management strategy of the hybrid electric vehicle is trained offline through the DDPG algorithm to obtain inheritable neural network parameters;
and step four, downloading the inheritable network parameters obtained by the offline training into a vehicle control unit of the hybrid electric vehicle, and realizing real-time online application.
2. The new energy automobile energy management and adaptive cruise cooperative optimization method according to claim 1, characterized in that in step one, the SUMO software is used to build the car-following model simulation environment, and the speed and acceleration of vehicles in the simulation scene are obtained and controlled through the TraCI interaction interface.
3. The method for collaborative optimization of new energy vehicle energy management and adaptive cruise according to claim 1, wherein the prior knowledge includes: the battery characteristic curve is used for constructing a functional relation among the internal resistance, the open-circuit voltage and the SoC value, so that the SoC value of the battery at any time and in any state is solved; the optimal fuel economy curve is used for constructing a functional relation between the engine power and the rotating speed and the torque, so that the engine output power at any time and any state can be solved.
4. The method for collaborative optimization of new energy vehicle energy management and adaptive cruise according to claim 1, characterized in that the hybrid condition is composed of a highway condition and an urban road condition.
5. The method for energy management and adaptive cruise cooperative optimization of a new energy vehicle according to claim 3, wherein the SoC value of the hybrid vehicle at any time is obtained through the following internal-resistance-model equations:

I = [V_OC − sqrt(V_OC² − 4·R_0·P_b)] / (2·R_0)

SoC = (Q_0 − ∫ I dt) / Q

wherein SoC is the state of charge, V_OC is the open-circuit voltage, R_0 is the internal resistance, P_b is the output power in the charge-discharge stage, Q_0 is the initial capacity of the battery, Q is the nominal capacity of the battery, and I is the battery current at the present time.
6. The method for collaborative optimization of new energy automobile energy management and adaptive cruise control according to claim 1, characterized in that, combining the inter-vehicle distance, speed, acceleration and engine power in the car-following model, the state vector state and the action vector action are respectively defined as follows:

state = [v_h, a_h, L, v_p, a_p],  action = [a_h, e_h]

wherein v_h is the speed of the target vehicle; L is the inter-vehicle distance, namely the distance from the head of the target vehicle to the tail of the pilot vehicle; v_p and a_p are respectively the speed and acceleration of the pilot vehicle; a_h is the control action of the car-following model, i.e. the acceleration of the target vehicle, and e_h is the control action of the energy management strategy, i.e. the target vehicle engine power;
the reward function defining the following model is as follows:
r_follow = r_follow1 + r_follow2

wherein L_min and L_max are the minimum and maximum values of the inter-vehicle distance, TTC is the time to collision, and jerk is the rate of change of the target vehicle's acceleration at the sampling moment;
the reward function that defines the energy management policy is as follows:
r_energy = −[fuel + 250·(SoC_ref − SoC)²]

wherein fuel is the fuel consumption of the target vehicle at the sampling moment and SoC_ref is the nominal SoC value of the battery;
the total reward function defining the energy management strategy of the hybrid electric vehicle is as follows:
reward = r_follow + r_energy.
7. The method for collaborative optimization of new energy automobile energy management and adaptive cruise control according to claim 1, characterized by constructing an Actor network denoted μ(s|θ^μ), wherein θ^μ are network parameters, the input of the Actor network is the current state s and the output is the deterministic action a; constructing a Critic network denoted Q(s,a|θ^Q), wherein θ^Q are network parameters, the input of the Critic network is the current state s and the deterministic action a output by the Actor network, and the output is the value function and gradient information;

respectively establishing target networks μ′(s|θ^μ′) and Q′(s,a|θ^Q′) for the Actor network and the Critic network, the target networks μ′(s|θ^μ′) and Q′(s,a|θ^Q′) having the same structure as the corresponding networks μ(s|θ^μ) and Q(s,a|θ^Q); θ^μ′ are the parameters of the target network of the Actor network and θ^Q′ are the parameters of the target network of the Critic network; the constructed Actor and Critic networks and their target networks are applied to train the energy management strategy of the hybrid electric vehicle.
8. The method for collaborative optimization of new energy vehicle energy management and adaptive cruise according to claim 1, characterized by: and in the third step, the intelligent agent interacts with the simulation environment, acquires the current environment state information, selects and executes actions according to the strategy, enters a new environment state, acquires rewards fed back by the simulation environment, stores the state, the actions and the reward information at the same time, and realizes the training of the hybrid power energy management strategy through an experience playback pool.
9. The method for collaborative optimization of new energy vehicle energy management and adaptive cruise according to claim 1, characterized by: in the third step, the offline training of the energy management strategy of the hybrid electric vehicle adopts a prior experience playback technology, and the specific training steps are as follows:
step 1, initializing an Actor network, a Critic network and a target network thereof; defining a storage space R as an experience playback pool and initializing;
step 2, at time t, introducing action noise Z_t drawn from a Laplace distribution to explore potentially better policies;

step 3, according to the action policy, combining the state s_t at time t with the Laplace random noise to obtain the action vector at time t, a_t = {a_h, e_h}, that is, a_t = μ(s_t|θ^μ) + Z_t; executing the action vector a_t to obtain the total reward r_t at the current time and the state s_{t+1} at time t+1; judging whether the current episode has ended: if the Boolean flag is true, the current episode ends and execution returns to step 2; if the Boolean flag is false, continuing with step 4;
step 4, calculating the sampling probability P(t) and the importance weight ω_t from the absolute value |δ_t| of the temporal-difference error:

δ_t = y_t − Q(s_t, a_t | θ^Q)

wherein:

y_t = r_t + γ·Q′[s_{t+1}, μ′(s_{t+1} | θ^μ′) | θ^Q′]

where γ is the discount factor and y_t is the target Q value at time t;

the absolute values |δ_t| are sorted from largest to smallest and rank(t) denotes the resulting serial number, from which the priority p_t of the experience is defined:

p_t = 1 / rank(t)

the sampling probability of experience t is defined accordingly:

P(t) = p_t^α / Σ_k p_k^α

where n is the size of the experience replay pool and α represents the degree to which priority is used;

the sampling importance weight is defined as:

ω_t = (n·P(t))^(−β) / max_i (n·P(i))^(−β)

where p_min is the minimum value of p_t, whose sampling probability attains the normalizing maximum, and β is the annealing exponent;
step 5, the experience replay pool adopts a binary-tree data structure; the information generated in each interaction is stored into the leaf nodes in the form T_t = (s_t, a_t, r_t, s_{t+1}, bool), and T_t is kept as the training data set of the Actor and Critic networks;
step 6, sampling from the experience replay pool R according to the sampling probabilities through prioritized experience replay to obtain a mini-batch of samples S, the number of samples being denoted N, for training the Actor and Critic networks;
step 7, calculating the gradient of the Critic network from its loss function L(θ^Q):

L(θ^Q) = (1/N) Σ_i ω_i · (y_i − Q(s_i, a_i | θ^Q))²
step 8, updating the parameters θ^Q of the Critic network with the adaptive moment estimation algorithm Adam, and calculating the gradient of the Actor network:

∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a | θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ} μ(s | θ^μ)|_{s_i}

where ∇ is the gradient operator, J is the objective function of the DDPG algorithm, a represents the action, and s represents the state;
step 9, updating the parameters θ^μ of the Actor network with the adaptive moment estimation algorithm Adam, and updating the target-network parameters of the Actor and Critic networks by soft updating, i.e. at each time step updating the target networks of the Critic and the Actor by a set amplitude τ:

θ^Q′ ← τ·θ^Q + (1−τ)·θ^Q′,  θ^μ′ ← τ·θ^μ + (1−τ)·θ^μ′
and 10, repeating the steps 2 to 9 until the preset maximum iteration number is reached, finishing the training, and then storing and downloading the neural network parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211253311.XA CN115563716A (en) | 2022-10-13 | 2022-10-13 | New energy automobile energy management and adaptive cruise cooperative optimization method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211253311.XA CN115563716A (en) | 2022-10-13 | 2022-10-13 | New energy automobile energy management and adaptive cruise cooperative optimization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115563716A true CN115563716A (en) | 2023-01-03 |
Family
ID=84744500
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211253311.XA Pending CN115563716A (en) | 2022-10-13 | 2022-10-13 | New energy automobile energy management and adaptive cruise cooperative optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115563716A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12030657B1 (en) | 2023-10-27 | 2024-07-09 | Rtx Corporation | System and methods for power split algorithm design for aircraft hybrid electric propulsion based on combined actor-critic RL agent and control barrier function filter |
CN117807714A (en) * | 2024-01-05 | 2024-04-02 | 重庆大学 | Adaptive online lifting method for deep reinforcement learning type control strategy |
CN117708999A (en) * | 2024-02-06 | 2024-03-15 | 北京航空航天大学 | Scene-oriented hybrid electric vehicle energy management strategy evaluation method |
CN117708999B (en) * | 2024-02-06 | 2024-04-09 | 北京航空航天大学 | Scene-oriented hybrid electric vehicle energy management strategy evaluation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111731303B (en) | HEV energy management method based on deep reinforcement learning A3C algorithm | |
CN111267831B (en) | Intelligent time-domain-variable model prediction energy management method for hybrid electric vehicle | |
CN111845701B (en) | HEV energy management method based on deep reinforcement learning in car following environment | |
Liessner et al. | Deep reinforcement learning for advanced energy management of hybrid electric vehicles. | |
Xu et al. | Look-ahead prediction-based real-time optimal energy management for connected HEVs | |
CN110936949B (en) | Energy control method, equipment, storage medium and device based on driving condition | |
Li et al. | Power management for a plug-in hybrid electric vehicle based on reinforcement learning with continuous state and action spaces | |
CN115495997A (en) | New energy automobile ecological driving method based on heterogeneous multi-agent deep reinforcement learning | |
CN115793445B (en) | Hybrid electric vehicle control method based on multi-agent deep reinforcement learning | |
CN112498334B (en) | Robust energy management method and system for intelligent network-connected hybrid electric vehicle | |
CN114852105A (en) | Method and system for planning track change of automatic driving vehicle | |
CN115563716A (en) | New energy automobile energy management and adaptive cruise cooperative optimization method | |
CN115107733A (en) | Energy management method and system for hybrid electric vehicle | |
Zhang et al. | Driving behavior oriented torque demand regulation for electric vehicles with single pedal driving | |
CN115805840A (en) | Energy consumption control method and system for range-extending type electric loader | |
CN114969982A (en) | Fuel cell automobile deep reinforcement learning energy management method based on strategy migration | |
CN116861791A (en) | Energy saving and emission reduction energy management method based on enhanced TD3 algorithm | |
CN110641470A (en) | Pure electric vehicle driving auxiliary system optimization method integrating driver preference | |
Guo et al. | Modeling, learning and prediction of longitudinal behaviors of human-driven vehicles by incorporating internal human DecisionMaking process using inverse model predictive control | |
Liu et al. | Adaptive eco-driving of fuel cell vehicles based on multi-light trained deep reinforcement learning | |
Jayanthi et al. | Powell Metaheuristic Cat Swarm optimized Sugeno Fuzzy Controller based Deep Belief Network for energy management in Hybrid electric vehicles | |
Liu et al. | Integrated longitudinal speed decision-making and energy efficiency control for connected electrified vehicles | |
CN113859214B (en) | Method and device for controlling dynamic energy efficiency of engine of hybrid power system | |
CN117184095B (en) | Hybrid electric vehicle system control method based on deep reinforcement learning | |
Yuxing et al. | Research on Driving Control Strategy of Electric Racing Car Based on Pattern Recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||