CN115563716A - New energy automobile energy management and adaptive cruise cooperative optimization method - Google Patents

New energy automobile energy management and adaptive cruise cooperative optimization method

Info

Publication number
CN115563716A
Authority
CN
China
Prior art keywords
network
energy management
vehicle
actor
target
Prior art date
Legal status
Pending
Application number
CN202211253311.XA
Other languages
Chinese (zh)
Inventor
彭剑坤
范毅
陈伟琪
殷国栋
庄伟超
江如海
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202211253311.XA priority Critical patent/CN115563716A/en
Publication of CN115563716A publication Critical patent/CN115563716A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/10 Geometric CAD
    • G06F30/15 Vehicle, aircraft or watercraft design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)

Abstract

The invention discloses a cooperative optimization method for a hybrid electric vehicle energy management strategy and adaptive cruise control. Taking a hybrid electric vehicle as the research object, it fuses a car-following model and a power battery energy management strategy based on the deep deterministic policy gradient (DDPG) algorithm, develops an ecological driving energy management strategy based on deep reinforcement learning, and improves fuel economy while achieving optimal following performance. The method mainly comprises: constructing a simulation environment and loading training data; constructing Actor and Critic training networks based on the DDPG algorithm; training the energy management strategy through the DDPG algorithm to obtain inheritable neural network parameters; and downloading the trained network parameters to the vehicle control unit of the hybrid electric vehicle to realize real-time online application.

Description

New energy automobile energy management and adaptive cruise cooperative optimization method
Technical Field
The invention relates to a new energy automobile energy management and adaptive cruise cooperative optimization method which is mainly applied to ecological driving energy management strategy development based on deep reinforcement learning.
Background
Global warming caused by the large amount of greenhouse gases, mainly carbon dioxide (CO2), is intensifying, and controlling carbon emissions to slow global warming has become a broad consensus among countries worldwide. A significant proportion of the CO2 emitted into the air comes from the use of fossil fuels by vehicles.
The energy of a hybrid electric vehicle comes from two sources: heat energy generated from fossil fuel and electric energy stored in the battery. Compared with a traditional fuel vehicle, a hybrid electric vehicle emits less carbon and has better fuel economy. The energy management strategy aims to improve fuel economy and maintain the battery state of charge during vehicle operation. Adaptive cruise control is used in car-following scenarios on urban roads and expressways and aims to improve the driving efficiency and fuel economy of the following vehicle. At present, deep reinforcement learning is applied separately to the optimization of the energy management strategy and to the control of the car-following model, but these are two independent models of the same problem and cannot achieve global optimality.
To achieve globally optimal performance of the energy management strategy and the car-following model, it becomes feasible to integrate energy management and adaptive cruise control into one model and to develop an ecological driving energy management strategy based on deep reinforcement learning.
Disclosure of Invention
Aiming at the technical problems in the field, the invention provides a framework combining an energy management strategy based on deep reinforcement learning and an adaptive cruise control algorithm on the basis of a deep reinforcement learning algorithm, and the framework is named as an ecological driving energy management strategy based on deep reinforcement learning.
The invention adopts the following technical scheme:
compared with the prior art, the technical scheme adopted by the invention has the following technical effects:
(1) Hybrid energy management and adaptive cruise control of the new energy vehicle are co-optimized within a single algorithm architecture; compared with the traditional layered architecture, this reduces the development difficulty of each subsystem;
(2) The hybrid energy management and adaptive cruise systems of the new energy vehicle are no longer limited to a simple upload-and-dispatch relationship; multi-parameter interaction is realized through shared input states, reward functions, control actions and other aspects.
Drawings
FIG. 1 is an ecological driving energy management strategy algorithm framework based on deep reinforcement learning;
FIG. 2 is a graph of optimal fuel consumption for an engine;
FIG. 3 is a graph of battery characteristics;
FIG. 4 is the DDPG algorithm flow chart.
Detailed Description
The technical solutions of the present application are further elaborated below with reference to the drawings; the described embodiments are only a part of the embodiments related to this patent. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of this patent.
The invention designs a new energy automobile energy management and adaptive cruise cooperative optimization method, which comprises the following specific steps as shown in figure 1:
step one, building a following model simulation environment, and preloading a battery characteristic curve and an optimal fuel economy curve as prior knowledge; and inputting vehicle running data under a mixed working condition, and using the vehicle running data as training data of a pilot vehicle in a following model.
And step two, creating an Actor network and a Critic network based on the DDPG algorithm and the neural network structure, creating a target network for each of the Actor network and the Critic network respectively, and constructing a training network and a total reward function of the energy management strategy of the hybrid electric vehicle.
Step three, the agent interacts with the simulation environment, and offline training of the energy management strategy of the hybrid electric vehicle is carried out through the DDPG algorithm based on the constructed Actor and Critic networks and reward function, to obtain inheritable neural network parameters;
and step four, downloading the inheritable network parameters obtained by the offline training into a vehicle control unit of the hybrid electric vehicle, and realizing real-time online application.
According to the ecological driving energy management strategy based on deep reinforcement learning, in step one the car-following simulation environment is built with SUMO software, and the speed and acceleration of the vehicles in the simulation scene are obtained and controlled through the TraCI interface. The prior knowledge comprises a battery characteristic curve and an optimal fuel economy curve: the battery characteristic curve is used to construct the functional relation between the internal resistance, the open-circuit voltage and the SoC value, and the optimal fuel economy curve is used to construct the functional relation between the engine power and the speed and torque. The mixed driving cycle comprises a highway cycle and an urban road cycle and covers most car-following scenes, so that the training result can be applied to various roads.
The internal resistance and the open-circuit voltage of the battery are functions of its SoC value. Three groups of test data are input: the relation between the internal resistance and the SoC value in the charging state, the relation between the internal resistance and the SoC value in the discharging state, and the relation between the open-circuit voltage and the SoC value. Each relation is expressed explicitly by univariate linear interpolation fitting, so that the SoC value of the battery at any time and in any state can be solved from these functional relations.
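A minimal Python sketch of this univariate interpolation step is given below; the SoC grid and the resistance/voltage samples are illustrative placeholders rather than the bench-test data referred to above:

```python
import numpy as np

# Illustrative test samples (placeholders, not the patent's bench data):
# SoC grid with charging resistance, discharging resistance and open-circuit voltage.
soc_grid = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
r_chg    = np.array([0.012, 0.010, 0.009, 0.009, 0.010, 0.011])  # ohm
r_dis    = np.array([0.013, 0.011, 0.010, 0.010, 0.011, 0.012])  # ohm
v_oc     = np.array([3.20, 3.45, 3.60, 3.75, 3.95, 4.15])        # volt

def battery_curves(soc, charging):
    """Piecewise-linear fit of internal resistance and open-circuit voltage versus SoC."""
    r0  = np.interp(soc, soc_grid, r_chg if charging else r_dis)
    voc = np.interp(soc, soc_grid, v_oc)
    return r0, voc

r0, voc = battery_curves(0.55, charging=False)  # query the fitted curves at SoC = 0.55
```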
The engine and motor operating data obtained from bench tests are input as prior knowledge, and an optimal fuel economy curve model is constructed to represent the functional relation among the engine speed, torque and equivalent fuel consumption rate. Bivariate interpolation fitting expresses this relation explicitly, so that the engine output power, equal to the product of speed and torque, can be solved at any time and in any state.
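The bivariate fit can be sketched with SciPy's grid interpolator; the speed/torque grid and fuel-rate table below are placeholders standing in for the bench-test map, and the helper converts rpm and Nm into kW:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Placeholder engine map: speed grid (rpm) x torque grid (Nm) -> equivalent fuel rate (g/kWh).
speed_grid  = np.array([1000.0, 2000.0, 3000.0, 4000.0])
torque_grid = np.array([20.0, 60.0, 100.0, 140.0])
bsfc_table  = np.array([[420, 330, 300, 295],
                        [400, 310, 280, 275],
                        [410, 315, 285, 280],
                        [430, 330, 300, 298]], dtype=float)

bsfc = RegularGridInterpolator((speed_grid, torque_grid), bsfc_table,
                               bounds_error=False, fill_value=None)

def engine_power_and_fuel(speed_rpm, torque_nm):
    """Engine output power (kW) and equivalent fuel consumption rate at an operating point."""
    power_kw  = speed_rpm * 2.0 * np.pi / 60.0 * torque_nm / 1000.0  # P = omega * T
    fuel_rate = float(bsfc(np.array([speed_rpm, torque_nm])))
    return power_kw, fuel_rate
```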
According to the ecological driving energy management strategy based on deep reinforcement learning, in the second step, the inertial navigation system and the global positioning system are used for obtaining real-time speed and acceleration data of the hybrid vehicle, and the SoC value of the hybrid vehicle at any moment is obtained through the following equation:
SoC = (Q_0 - ∫ I dt) / Q,   I = (V_OC - √(V_OC^2 - 4 R_0 P_b)) / (2 R_0)

where SoC is the state of charge, V_OC is the open-circuit voltage, R_0 is the internal resistance, P_b is the battery output power in the charging and discharging phases, Q_0 is the initial capacity of the battery, Q is the nominal capacity of the battery, and I is the current of the battery at the present time.
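A short Python sketch of this SoC bookkeeping, assuming the standard internal-resistance battery model implied by the variables above (the exact expression in the patent is an equation image, and the numbers in the example are illustrative):

```python
def battery_current(p_b, voc, r0):
    """Battery current from the internal-resistance model: P_b = V_oc*I - R_0*I^2."""
    return (voc - (voc ** 2 - 4.0 * r0 * p_b) ** 0.5) / (2.0 * r0)

def soc_step(soc, p_b, dt, q_nominal_as, voc, r0):
    """One-step SoC update: SoC(t+dt) = SoC(t) - I*dt / Q, with Q in ampere-seconds."""
    return soc - battery_current(p_b, voc, r0) * dt / q_nominal_as

# Illustrative values: a 3 kW discharge for 1 s on a 25 Ah pack at SoC = 0.6.
soc_next = soc_step(soc=0.6, p_b=3000.0, dt=1.0, q_nominal_as=25.0 * 3600.0,
                    voc=350.0, r0=0.1)
```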
Combining the inter-vehicle distance, speed, acceleration and engine power in the car-following model, the state vector and the action vector are respectively defined as follows:
[equation image: definitions of the state vector (inter-vehicle distance, vehicle speeds and accelerations) and the action vector action = {a_h, e_h}]

where v_h and a_h are respectively the speed and acceleration of the target vehicle (the rear vehicle), L is the inter-vehicle distance, i.e. the distance from the front of the target vehicle to the rear of the pilot vehicle, v_p and a_p are respectively the speed and acceleration of the pilot vehicle, and e_h is the engine power of the target vehicle. a_h is the control action of the car-following model, and e_h is the control action of the energy management strategy.
In order to ensure the safety of the target vehicle during the following process and simultaneously take the riding comfort into consideration, the reward function of the following model is defined as follows:
r_follow = r_follow1 + r_follow2

[equation image: r_follow1, a penalty term based on the inter-vehicle distance limits L_min, L_max and the time to collision TTC]

[equation image: r_follow2, a penalty term based on the jerk of the target vehicle]

where L_min and L_max are the minimum and maximum values of the inter-vehicle distance and TTC is the time to collision. The reward term r_follow1 constrains the vehicle to run within the maximum and minimum following distances and describes safety during the following process; jerk is the rate of change of the target vehicle's acceleration at the sampling instant and describes ride comfort during following, and the reward term r_follow2 improves the ride experience of the driver and passengers.
In order to reduce engine fuel consumption and keep the battery SoC within an acceptable range, the instantaneous fuel consumption of the engine and the charge-sustaining cost of the battery must both be considered, so the reward function of the energy management strategy is defined as follows:
r_energy = -[fuel + 250(SoC_ref - SoC)^2]

where fuel is the fuel consumption of the target vehicle at the sampling instant and SoC_ref is the nominal SoC value of the battery.
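The energy-management term is given explicitly above; the Python sketch below combines it with a placeholder car-following term, since the exact r_follow1 and r_follow2 expressions are equation images in the patent (the thresholds and coefficients in follow_reward are assumptions):

```python
def energy_reward(fuel, soc, soc_ref=0.6):
    """r_energy = -[fuel + 250*(SoC_ref - SoC)^2], as defined in the text."""
    return -(fuel + 250.0 * (soc_ref - soc) ** 2)

def follow_reward(gap, ttc, jerk, l_min=5.0, l_max=100.0, ttc_min=2.5):
    """Placeholder shape for r_follow1 + r_follow2: penalize leaving the [l_min, l_max]
    gap band or a small time to collision, and penalize jerk for ride comfort."""
    r1 = -1.0 if (gap < l_min or gap > l_max or ttc < ttc_min) else 0.0
    r2 = -0.01 * jerk ** 2
    return r1 + r2

def total_reward(gap, ttc, jerk, fuel, soc):
    """reward = r_follow + r_energy."""
    return follow_reward(gap, ttc, jerk) + energy_reward(fuel, soc)
```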
In the ecological driving energy management strategy based on deep reinforcement learning provided by the invention, the adaptive cruise car-following model and the hybrid electric vehicle energy management strategy are innovatively fused together through the DDPG algorithm; the total reward function comprises the reward of the car-following model and the reward of the energy management strategy, and is defined as follows:
reward = r_follow + r_energy
Next, the training networks are constructed. An Actor network is constructed and denoted μ(s|θ^μ), where θ^μ are the network parameters; the input of the Actor network is the current state s and the output is the deterministic action a. A Critic network is constructed and denoted Q(s,a|θ^Q), where θ^Q are the network parameters; the inputs of the Critic network are the current state s and the deterministic action a output by the Actor network, and the outputs of the Critic network are the value function and gradient information.
Target networks μ′(s|θ^μ′) and Q′(s,a|θ^Q′) are established for the Actor network and the Critic network respectively; the network structure and parameter layout of each target network are the same as those of the corresponding network, θ^μ′ denotes the parameters of the target network of the Actor network, and θ^Q′ denotes the parameters of the target network of the Critic network. The constructed Actor and Critic networks and their target networks are applied to train the energy management strategy of the hybrid electric vehicle.
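A minimal PyTorch sketch of the Actor, Critic and target networks described here; the layer sizes, the state/action dimensions, and the tanh-bounded action output are illustrative assumptions:

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """mu(s | theta_mu): maps the state s to the deterministic action a = {a_h, e_h}."""
    def __init__(self, state_dim=5, action_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # actions normalized to [-1, 1]
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Q(s, a | theta_Q): maps a state-action pair to a scalar value."""
    def __init__(self, state_dim=5, action_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
actor_target  = copy.deepcopy(actor)   # same structure and parameters as the online networks
critic_target = copy.deepcopy(critic)
```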
In step three of the ecological driving energy management strategy based on deep reinforcement learning, the agent in the DDPG framework interacts with the simulation environment: it acquires the current environment state information, selects and executes an action according to the strategy, enters a new environment state, obtains the reward fed back by the environment, and simultaneously stores the state, action, reward and related information; training of the energy management strategy is realized through an experience replay pool in this loop. To make the model converge faster and achieve a better training effect, prioritized experience replay is adopted in the algorithm: each group of experience data is assigned the absolute value |δ_t| of its temporal-difference error, and samples with higher values have a higher probability of being sampled. The training steps are as follows:
step 1, initializing an Actor network, a Critic network and a target network thereof; a storage space R is defined as an experience replay pool and initialized.
Step 2, introducing action noise drawn from a Laplace random distribution to explore potentially better strategies.
Step 3, combining the state s_t at the current time t with Laplace random noise according to the action strategy to obtain the action vector a_t = {a_h, e_h}, i.e. a_t = μ(s_t|θ^μ) + Z_t. Action a_t is executed to obtain the reward r_t at the current time t and the state vector s_{t+1} at time t+1. Whether the current episode has ended is then checked: if the Boolean flag is true, the current episode ends and step 2 is executed; if it is false, step 4 is executed.
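Step 3's action selection can be sketched as follows, assuming the Actor network above and normalized actions; the Laplace noise scale is an illustrative choice:

```python
import numpy as np
import torch

def select_action(actor, state, noise_scale=0.1):
    """a_t = mu(s_t | theta_mu) + Z_t, with Z_t drawn from a Laplace distribution."""
    with torch.no_grad():
        a = actor(torch.as_tensor(state, dtype=torch.float32)).numpy()
    z = np.random.laplace(loc=0.0, scale=noise_scale, size=a.shape)
    return np.clip(a + z, -1.0, 1.0)  # a_t = {a_h, e_h}, kept inside the action bounds
```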
Step 4, according to the absolute value |δ_t| of the temporal-difference error, calculating the sampling probability P(t) and the importance weight ω_t:

δ_t = y_t - Q(s_t, a_t|θ^Q)

wherein:

y_t = r_t + γQ′[s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′]

where γ is the discount factor and y_t is the target Q value.
The absolute values |δ_t| of the temporal-difference errors are sorted in descending order and rank(t) denotes the resulting rank, from which the priority of experience t is defined:

p_t = 1 / rank(t)

The sampling probability of experience t is defined accordingly:

P(t) = p_t^α / Σ_k p_k^α
where n is the size of the experience replay pool (the sum over k runs over the n stored experiences) and α controls the degree to which priority is used; it takes a value between 0 and 1, and α = 0 corresponds to uniform sampling.
In order to increase the diversity of the experience pool and avoid the network from falling into an overfitting state, a sampling importance weight is defined:
[equation image: the sampling importance weight ω_t, expressed in terms of the replay pool size n, the sampling probability P(t), the minimum priority p_min, and the annealing exponent β]

where p_min represents the minimum value of p_t; β is the annealing exponent, whose initial value β_0 lies between 0 and 1 and which is annealed linearly to 1.
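A compact sketch of this rank-based prioritized replay, assuming priority p_t = 1/rank(t) and the usual importance-weight normalization; a plain list stands in for the binary-tree (sum-tree) structure mentioned in step 5:

```python
import numpy as np

class RankPrioritizedReplay:
    """Rank-based prioritized experience replay (list-backed sketch)."""
    def __init__(self, capacity, alpha=0.7, beta0=0.5):
        self.capacity, self.alpha, self.beta = capacity, alpha, beta0
        self.data, self.td_abs = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:       # drop the oldest experience
            self.data.pop(0)
            self.td_abs.pop(0)
        self.data.append(transition)
        self.td_abs.append(abs(td_error))

    def sample(self, batch_size):
        order = np.argsort(self.td_abs)[::-1]     # sort |delta_t| in descending order
        rank = np.empty(len(order))
        rank[order] = np.arange(1, len(order) + 1)
        p = (1.0 / rank) ** self.alpha            # priority p_t = 1/rank(t), raised to alpha
        prob = p / p.sum()                        # sampling probability P(t)
        idx = np.random.choice(len(self.data), batch_size, p=prob)
        w = (len(self.data) * prob[idx]) ** (-self.beta)
        w /= w.max()                              # normalized importance weights omega_t
        return [self.data[i] for i in idx], idx, w
```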
Step 5, the experience replay pool adopts a binary-tree data structure; the information generated in the interaction is stored into its leaf nodes in the form of the tuple T_t = (s_t, a_t, r_t, s_{t+1}, bool), and T_t is simultaneously kept as the training data set for the Actor and Critic networks.
Step 6, sampling from the experience replay pool R according to the sampling probability by prioritized experience replay to obtain a mini-batch of samples S (the number of samples is recorded as N) for training the Actor and Critic networks.
Step 7, calculating the gradient of the Critic network through the chain rule, and calculating the loss function L(θ^Q) of the Critic network:

L(θ^Q) = (1/N) Σ_i ω_i [y_i - Q(s_i, a_i|θ^Q)]^2
Step 8, updating the parameters θ^Q of the Critic network by using the adaptive moment estimation algorithm (Adam), and calculating the gradient of the Actor network:

∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_i}

where ∇ is the gradient operator, J is the objective function of the DDPG algorithm, a denotes the action, and s denotes the state.
Step 9, updating the parameters θ^μ of the Actor network by using the adaptive moment estimation algorithm (Adam), and updating the target network parameters of the Actor and Critic networks by a soft update, namely updating the Critic and Actor target networks by a small amount at each time step:

θ^Q′ ← τθ^Q + (1 - τ)θ^Q′
θ^μ′ ← τθ^μ + (1 - τ)θ^μ′

where τ is the update amplitude, with a default value of 0.001.
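Steps 7 to 9 can be sketched in PyTorch as below, reusing the Actor/Critic networks sketched earlier; weighting the critic loss by the importance weights is an assumption about the exact weighting, and the learning rates are illustrative:

```python
import torch

def ddpg_update(batch, weights, actor, critic, actor_target, critic_target,
                actor_opt, critic_opt, gamma=0.99, tau=0.001):
    """One training iteration covering steps 7-9: critic loss, actor gradient, soft update."""
    s, a, r, s_next, done = batch  # float tensors; r and done have shape [N, 1]
    w = torch.as_tensor(weights, dtype=torch.float32).unsqueeze(-1)

    # Step 7: critic loss with target y = r + gamma * Q'(s', mu'(s'))
    with torch.no_grad():
        y = r + gamma * (1.0 - done) * critic_target(s_next, actor_target(s_next))
    critic_loss = (w * (y - critic(s, a)) ** 2).mean()

    # Step 8: update theta_Q with Adam, then form the deterministic policy gradient
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    actor_loss = -critic(s, actor(s)).mean()

    # Step 9: update theta_mu with Adam and soft-update both target networks
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    with torch.no_grad():
        for net, tgt in ((critic, critic_target), (actor, actor_target)):
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.mul_(1.0 - tau).add_(tau * p)  # theta' <- tau*theta + (1-tau)*theta'

# Usage with the networks above:
# actor_opt  = torch.optim.Adam(actor.parameters(),  lr=1e-4)   # Adam, as in steps 8-9
# critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
```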
Step 10, repeating steps 2 to 9 until training is finished, and then saving and downloading the neural network parameters.
In a preferred embodiment of the present invention, the step one specifically includes the following steps:
step 1, pre-writing road network and vehicle files in the SUMO, and calling the files in the python program through a Traci interactive interface.
Step 2, inputting prior knowledge, and obtaining an explicit functional relationship by an interpolation fitting method, wherein the explicit functional relationship comprises four groups of functional relationships: (1) The functional relation between the engine speed, the torque and the equivalent fuel consumption rate; (2) a functional relation between the internal resistance and the SoC value in a charging state; (3) functional relation between internal resistance and SoC value in discharge state; and (4) the functional relation between the open-circuit voltage and the SoC value. The images are drawn as shown in fig. 2 and 3. The functional relation is used for solving the battery SoC value and the engine output power at any time and in any state.
Step 3, inputting the mixed driving-cycle data as the driving information of the pilot vehicle; the mixed cycle consists of a highway cycle and an urban road cycle and covers car-following scenes under most road conditions. The average speed of the data set is 44 km/h, the maximum speed is 116 km/h, and the duration is 1858 s.
Step 4, the SoC value of the hybrid vehicle at any time is obtained through the following equation:
SoC = (Q_0 - ∫ I dt) / Q,   I = (V_OC - √(V_OC^2 - 4 R_0 P_b)) / (2 R_0)

where SoC is the state of charge, V_OC is the open-circuit voltage, R_0 is the internal resistance, P_b is the battery output power in the charging and discharging phases, Q_0 is the initial capacity of the battery, Q is the nominal capacity of the battery, and I is the current of the battery at the present time.
In a preferred embodiment of the present invention, the second step specifically includes the following steps:
step 1, defining a state set and a behavior set as follows:
[equation image: definitions of the state set (inter-vehicle distance, vehicle speeds and accelerations) and the action set action = {a_h, e_h}]

where v_h and a_h are respectively the speed and acceleration of the target vehicle (the rear vehicle), L is the inter-vehicle distance, i.e. the distance from the front of the target vehicle to the rear of the pilot vehicle, v_p and a_p are respectively the speed and acceleration of the pilot vehicle, and e_h is the engine power of the target vehicle. a_h is the control action of the car-following model, and e_h is the control action of the energy management strategy.
The reward function is defined as follows:
reward = r_follow1 + r_follow2 + r_energy

[equation image: r_follow1, a penalty term based on the inter-vehicle distance limits and the time to collision TTC]

[equation image: r_follow2, a penalty term based on the jerk of the target vehicle]

r_energy = -[fuel + 250(SoC_ref - SoC)^2]

where TTC is the time to collision, jerk is the rate of change of the acceleration of the target vehicle at the sampling instant, fuel is the fuel consumption of the target vehicle at the sampling instant, and SoC_ref is the nominal SoC value of the battery.
Step 2, constructing an Actor network, denoted μ(s|θ^μ), where θ^μ are the network parameters; the input of the Actor network is the current state s and the output is the deterministic action a.

Step 3, constructing a Critic network, denoted Q(s,a|θ^Q), where θ^Q are the network parameters; the inputs of the Critic network are the current state s and the deterministic action a output by the Actor network, and the outputs are the value function and gradient information.
Step 4, respectively establishing target networks for the Actor network and the Critic network, where the network structure and parameter layout of each target network are the same as those of the corresponding network; θ^μ′ is recorded as the parameters of the target network of the Actor network, and θ^Q′ as the parameters of the target network of the Critic network.
In a preferred embodiment of the present invention, the DDPG algorithm flow is shown in FIG. 4.
In a preferred embodiment of the present invention, the step three specifically includes the following steps:
step 1, initializing Actor network mu (s | theta) μ ) And Critic network Q (s, a | θ) Q ) And its target network mu' (s | theta) μ ′)、Q′(s,a|θ Q ') to a host; defining a storage space R as an experience playback pool, and setting the capacity to be N; initializing hyper-parameters alpha and beta; and setting the maximum circulation times M of the intelligent agent.
Step 2, initializing the simulation environment and the training data of the pilot vehicle to obtain the initial state s_t.
Step 3, according to the initial state s_t, selecting an action from the action strategy plus Laplace random noise, namely a_t = μ(s_t|θ^μ) + Z_t. Action a_t is executed to obtain the reward r_t at the current time and the state vector s_{t+1} at the next time. Whether the current episode has ended is then judged: if the Boolean flag is true, the current episode ends and step 2 is executed; if it is false, step 4 is executed.
Step 4, calculating the absolute value |δ_t| of the temporal-difference error, then the sampling probability P(t) and the importance weight ω_t.
Step 5, storing the information generated in the interaction into the experience replay pool in the form of the tuple T_t = (s_t, a_t, r_t, s_{t+1}, bool), and simultaneously keeping T_t as the training data set for the Actor and Critic networks.
Step 6, sampling from the experience replay pool R according to the sampling probability by prioritized experience replay to obtain a mini-batch of samples S for training the Actor and Critic networks.
Step 7, calculating the gradient of the Critic network, and calculating the loss function L(θ^Q) of the Critic network:

L(θ^Q) = (1/N) Σ_i ω_i [y_i - Q(s_i, a_i|θ^Q)]^2
Step 8, updating the parameters θ^Q of the Critic network by using the adaptive moment estimation algorithm (Adam), and calculating the gradient of the Actor network:

∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_i}

where ∇ is the gradient operator, J is the objective function of the algorithm, a denotes the action, and s denotes the state.
Step 9, updating the parameters θ^μ of the Actor network by using the adaptive moment estimation algorithm (Adam), and updating the target network parameters of the Actor and Critic networks by a soft update, namely updating the Critic and Actor target networks by a small amount at each time step:

θ^Q′ ← τθ^Q + (1 - τ)θ^Q′
θ^μ′ ← τθ^μ + (1 - τ)θ^μ′

where τ is the update amplitude, with a default value of 0.001.
Step 10, repeating steps 2 to 9 until the maximum number of episodes M is reached; training is then finished, and the neural network parameters are saved and downloaded.
In a preferred embodiment of the present invention, the step four specifically is: and downloading the network parameters obtained by off-line training into a vehicle control unit of the hybrid electric vehicle to realize real-time on-line application.
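Saving the trained parameters for download to the vehicle control unit can be sketched as follows, reusing the Actor class from the earlier sketch; the file name and the example state are placeholders, and on the control unit only the Actor forward pass is needed for real-time inference:

```python
import torch

# After offline training: persist the Actor parameters.
torch.save(actor.state_dict(), "actor_trained.pt")

# On the vehicle control unit (or its rapid-prototyping host): load and run inference only.
deployed_actor = Actor()
deployed_actor.load_state_dict(torch.load("actor_trained.pt", map_location="cpu"))
deployed_actor.eval()

with torch.no_grad():
    # Illustrative normalized state at the current sampling instant.
    state = torch.tensor([0.5, 0.1, -0.2, 0.4, 0.0])
    a_h, e_h = deployed_actor(state).tolist()  # normalized acceleration and engine power commands
```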
The above description is only one embodiment of the present invention, and the scope of the invention is not limited thereto; any modification or substitution readily conceived by a person skilled in the art within the technical scope disclosed herein falls within the scope of the invention, which should therefore be defined by the protection scope of the claims.

Claims (9)

1. A new energy automobile energy management and adaptive cruise cooperative optimization method is characterized by comprising the following steps:
step one, building a following model simulation environment, and preloading a battery characteristic curve and an optimal fuel economy curve as prior knowledge; inputting vehicle running data under a mixed working condition, and using the vehicle running data as training data of a pilot vehicle in a following model;
creating an Actor network and a Critic network based on a DDPG algorithm and a neural network structure, respectively creating a target network for the Actor network and the Critic network, constructing a training network of the energy management strategy of the hybrid electric vehicle, and constructing a total reward function of the energy management strategy of the hybrid electric vehicle;

step three, the agent interacts with the simulation environment, and based on the established Actor network, Critic network and reward function, the energy management strategy of the hybrid electric vehicle is trained offline through a DDPG algorithm to obtain inheritable neural network parameters;
and step four, downloading the inheritable network parameters obtained by the offline training into a vehicle control unit of the hybrid electric vehicle, and realizing real-time online application.
2. The new energy automobile energy management and adaptive cruise cooperative optimization method according to claim 1, characterized in that in the first step, SUMO software is used for building a car following model simulation environment, and the speed and the acceleration of a vehicle in a simulation scene are obtained and controlled through a Traci interaction interface.
3. The method for collaborative optimization of new energy vehicle energy management and adaptive cruise according to claim 1, wherein the prior knowledge includes: the battery characteristic curve is used for constructing a functional relation among the internal resistance, the open-circuit voltage and the SoC value, so that the SoC value of the battery at any time and in any state is solved; the optimal fuel economy curve is used for constructing a functional relation between the engine power and the rotating speed and the torque, so that the engine output power at any time and any state can be solved.
4. The method for collaborative optimization of new energy vehicle energy management and adaptive cruise according to claim 1, characterized in that the hybrid condition is composed of a highway condition and an urban road condition.
5. The method for energy management and adaptive cruise cooperative optimization of a new energy vehicle according to claim 3, wherein the SoC value of the hybrid vehicle at any time is obtained through the following equation:
SoC = (Q_0 - ∫ I dt) / Q,   I = (V_OC - √(V_OC^2 - 4 R_0 P_b)) / (2 R_0)

wherein SoC is the state of charge, V_OC is the open-circuit voltage, R_0 is the internal resistance, P_b is the battery output power in the charge and discharge phases, Q_0 is the initial capacity of the battery, Q is the nominal capacity of the battery, and I is the current of the battery at the present time.
6. The method for collaborative optimization of new energy automobile energy management and adaptive cruise control according to claim 1, characterized in that, in combination with distance between two vehicles, speed, acceleration and engine power in a follow-up model, state vector state and action vector action are respectively defined as follows:
[equation image: definitions of the state vector state and the action vector action = {a_h, e_h}]

wherein v_h is the speed of the target vehicle; L is the inter-vehicle distance, i.e. the distance from the front of the target vehicle to the rear of the pilot vehicle; v_p and a_p are respectively the speed and acceleration of the pilot vehicle; a_h is the control action of the car-following model, i.e. the acceleration of the target vehicle, and e_h is the control action of the energy management strategy, i.e. the engine power of the target vehicle;
the reward function defining the following model is as follows:
r_follow = r_follow1 + r_follow2

[equation image: r_follow1, a penalty term based on the inter-vehicle distance limits L_min, L_max and the time to collision TTC]

[equation image: r_follow2, a penalty term based on the jerk of the target vehicle]

wherein L_min and L_max are the minimum and maximum values of the inter-vehicle distance, TTC is the time to collision, and jerk is the rate of change of the acceleration of the target vehicle at the sampling instant;

the reward function that defines the energy management policy is as follows:

r_energy = -[fuel + 250(SoC_ref - SoC)^2]

wherein fuel is the fuel consumption of the target vehicle at the sampling instant and SoC_ref is the nominal SoC value of the battery;

the total reward function defining the energy management strategy of the hybrid electric vehicle is as follows:

reward = r_follow + r_energy
7. The method for collaborative optimization of new energy automobile energy management and adaptive cruise control according to claim 1, characterized in that an Actor network is constructed and denoted μ(s|θ^μ), where θ^μ are the network parameters; the input of the Actor network is the current state s and the output is the deterministic action a; a Critic network is constructed and denoted Q(s,a|θ^Q), where θ^Q are the network parameters; the inputs of the Critic network are the current state s and the deterministic action a output by the Actor network, and the outputs are the value function and gradient information;

target networks μ′(s|θ^μ′) and Q′(s,a|θ^Q′) are established for the Actor network and the Critic network respectively; the target networks μ′(s|θ^μ′) and Q′(s,a|θ^Q′) have the same structure as the corresponding networks μ(s|θ^μ) and Q(s,a|θ^Q); θ^μ′ is recorded as the parameters of the target network of the Actor network and θ^Q′ as the parameters of the target network of the Critic network; the constructed target networks of the Actor and Critic networks are applied to train the hybrid electric vehicle energy management strategy.
8. The method for collaborative optimization of new energy vehicle energy management and adaptive cruise according to claim 1, characterized by: in the third step, the agent interacts with the simulation environment, acquires the current environment state information, selects and executes actions according to the strategy, enters a new environment state, acquires the reward fed back by the simulation environment, stores the state, action and reward information at the same time, and realizes the training of the hybrid power energy management strategy through an experience replay pool.
9. The method for collaborative optimization of new energy vehicle energy management and adaptive cruise according to claim 1, characterized by: in the third step, the offline training of the energy management strategy of the hybrid electric vehicle adopts a prioritized experience replay technique, and the specific training steps are as follows:
step 1, initializing an Actor network, a Critic network and their target networks; defining a storage space R as an experience replay pool and initializing it;
step 2, introducing action noise Z_t at time t drawn from a Laplace random distribution to explore potentially better strategies;
step 3, combining the state s_t at time t with Laplace random noise according to the action strategy to obtain the action vector a_t = {a_h, e_h} at time t, namely a_t = μ(s_t|θ^μ) + Z_t; executing the action vector a_t to obtain the total reward r_t at the current time and the state s_{t+1} at time t+1; judging whether the current episode has ended: if the Boolean flag is true, ending the current episode and returning to step 2; if the Boolean flag is false, continuing with step 4;
step 4, according to the absolute value |δ_t| of the temporal-difference error, calculating the sampling probability P(t) and the importance weight ω_t:

δ_t = y_t - Q(s_t, a_t|θ^Q)

wherein:

y_t = r_t + γQ′[s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′]

where γ is the discount factor and y_t is the target Q value at time t;
the absolute values |δ_t| of the temporal-difference errors are sorted in descending order, rank(t) is recorded as the resulting rank, and the priority of experience t is defined accordingly:

p_t = 1 / rank(t)

the sampling probability of experience t is defined accordingly:

P(t) = p_t^α / Σ_k p_k^α
where n is the size of the experience replay pool (the sum over k runs over the n stored experiences) and α represents the degree to which priority is used;
defining sampling importance weights:
[equation image: the sampling importance weight ω_t, expressed in terms of the sampling probability P(t), the minimum priority p_min, and the annealing exponent β]

wherein p_min represents the minimum value of p_t, and β is the annealing exponent;
step 5, the experience replay pool adopts a binary-tree data structure; the information generated in the interaction is stored into its leaf nodes in the form of the tuple T_t = (s_t, a_t, r_t, s_{t+1}, bool), and T_t is kept as the training data set of the Actor and Critic networks;
step 6, sampling from the experience replay pool R according to the sampling probability by prioritized experience replay to obtain a mini-batch of samples S, the number of which is recorded as N, for training the Actor and Critic networks;
step 7, calculating the gradient of the Critic network, and calculating the loss function L(θ^Q) of the Critic network:

L(θ^Q) = (1/N) Σ_i ω_i [y_i - Q(s_i, a_i|θ^Q)]^2
step 8, updating the parameters θ^Q of the Critic network by using the adaptive moment estimation algorithm Adam, and calculating the gradient of the Actor network:

∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_i}

where ∇ is the gradient operator, J is the objective function of the DDPG algorithm, a denotes the action, and s denotes the state;
step 9, updating the parameters θ^μ of the Actor network by using the adaptive moment estimation algorithm Adam, and updating the target network parameters of the Actor and Critic networks by a soft update, namely updating the Critic and Actor target networks by the set amplitude τ at each time step:

θ^Q′ ← τθ^Q + (1 - τ)θ^Q′
θ^μ′ ← τθ^μ + (1 - τ)θ^μ′
and step 10, repeating steps 2 to 9 until the preset maximum number of iterations is reached, finishing the training, and then saving and downloading the neural network parameters.
CN202211253311.XA 2022-10-13 2022-10-13 New energy automobile energy management and adaptive cruise cooperative optimization method Pending CN115563716A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211253311.XA CN115563716A (en) 2022-10-13 2022-10-13 New energy automobile energy management and adaptive cruise cooperative optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211253311.XA CN115563716A (en) 2022-10-13 2022-10-13 New energy automobile energy management and adaptive cruise cooperative optimization method

Publications (1)

Publication Number Publication Date
CN115563716A true CN115563716A (en) 2023-01-03

Family

ID=84744500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211253311.XA Pending CN115563716A (en) 2022-10-13 2022-10-13 New energy automobile energy management and adaptive cruise cooperative optimization method

Country Status (1)

Country Link
CN (1) CN115563716A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12030657B1 (en) 2023-10-27 2024-07-09 Rtx Corporation System and methods for power split algorithm design for aircraft hybrid electric propulsion based on combined actor-critic RL agent and control barrier function filter
CN117807714A (en) * 2024-01-05 2024-04-02 重庆大学 Adaptive online lifting method for deep reinforcement learning type control strategy
CN117708999A (en) * 2024-02-06 2024-03-15 北京航空航天大学 Scene-oriented hybrid electric vehicle energy management strategy evaluation method
CN117708999B (en) * 2024-02-06 2024-04-09 北京航空航天大学 Scene-oriented hybrid electric vehicle energy management strategy evaluation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination