CN113627993A - Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning - Google Patents

Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning Download PDF

Info

Publication number
CN113627993A
Authority
CN
China
Prior art keywords
action
network
discharging
time
electric vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110989593.9A
Other languages
Chinese (zh)
Inventor
Yao Hanlin (姚翰林)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University Qinhuangdao Branch
Original Assignee
Northeastern University Qinhuangdao Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University Qinhuangdao Branch filed Critical Northeastern University Qinhuangdao Branch
Priority to CN202110989593.9A priority Critical patent/CN113627993A/en
Publication of CN113627993A publication Critical patent/CN113627993A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/02 - Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 - Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 - Price or cost determination based on market factors
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 - Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 - Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Game Theory and Decision Science (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)
  • Charge And Discharge Circuits For Batteries Or The Like (AREA)

Abstract

The invention provides an intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning, and relates to the technical field of electric vehicle charging and discharging. The data-driven machine learning algorithm can solve complex optimization problems without any prior knowledge of the system: it dynamically learns from historical operating states through function iteration and derives an optimal charging and discharging plan from accumulated experience and reward analysis. From the user's perspective, an MDP with unknown transition probabilities is constructed to describe the electric vehicle charging and discharging scheduling problem, and the randomness of electricity prices and commuting behavior is considered so that the model describes a realistic scenario. The method determines the optimal decision for the real-time decision problem without any system model information. The electricity price is predicted iteratively with a single-step LSTM network, giving higher prediction accuracy than a traditional time-series prediction method (ARIMA).

Description

Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of electric vehicle charging and discharging, in particular to an intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning.
Background
With the improvement of residents' living standards and the increasing intelligence of electric vehicles, more factors, such as the cumulative charging and discharging cost and user satisfaction, can be considered when charging and discharging an electric vehicle. For many households with an electric vehicle, always charging at maximum power while parked gives high user satisfaction but a high charging cost; always charging when the electricity price is low reduces the cost, but user satisfaction may drop if the electric vehicle is not fully charged when it departs.
Unlike other controllable loads and energy-storage equipment, an electric vehicle must first satisfy the travel demands of its users, so any regulation must respect the users' travel plans and their willingness to charge or discharge. Under realistic operating conditions, with uncertain wind and photovoltaic output, randomly fluctuating user loads, and a flexible, changing distribution-network topology, traditional model-based optimization methods struggle to find a realistic optimal solution for the electric vehicle's charging and discharging energy.
Because the arrival and departure times of electric vehicles make their energy consumption and the electricity price dynamic and time-varying, it is challenging to manage electric vehicle (EV) charging and discharging efficiently so as to reduce cost. In recent years, many day-ahead scheduling methods have been proposed for this problem. Although they have achieved some success in day-ahead charge/discharge scheduling, they are not suitable for real-time scenarios. Real-time scheduling strategies that respond to dynamic charging demand and time-varying electricity prices have therefore attracted much attention recently. The scheduling problem can be expressed as a model-based control problem, but the actual modeling process commonly suffers from the following limitations: 1. the hourly fluctuating electricity price is flexible, and there is a considerable delay in transmitting the actual electricity price to the electric vehicle for control; 2. most load control methods require an accurate model of the electric vehicle, yet because internal structures differ, different types of electric vehicles need different models, many vehicle parameters must be known in advance, and modeling each type is complex and difficult. Therefore, a model-free intelligent charging and discharging decision method that is based on predicted electricity prices and applicable to different types of electric vehicles is of great significance for dealing with highly flexible electricity prices and the poor applicability of existing methods across vehicle types.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning. The data-driven machine learning algorithm can solve complex optimization problems without prior knowledge of the system: it dynamically learns from historical operating states through function iteration and derives an optimal charging and discharging plan from accumulated experience and reward analysis.
In order to solve the technical problems, the invention adopts the following technical scheme:
an intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning specifically comprises the following steps:
step 1: collecting the electricity price data of the past 24 hours;
step 2: using a single-step prediction LSTM network to perform iterative prediction on the electricity price of 24 hours in the future;
Step 2.1: the LSTM network is unrolled into a 23-layer neural network, and every layer uses the same weight parameters;
Step 2.2: let the input of the first layer be d_{t-22} = p_{t-22} - p_{t-23}, where p_{t-22} and p_{t-23} are the electricity prices at time steps t-22 and t-23, respectively; y_{t-22} denotes the output of the first layer; W and R are the weight matrices of the LSTM gate structure; and c_{t-22} denotes the cell state, which contains past electricity price information;
Step 2.3: y_{t-22} and c_{t-22}, which carry the past electricity price information, are passed on through the second layer up to the last layer, which predicts the electricity price one time step ahead;
step 2.4: repeating the steps 2.1-2.3 until the electricity price of the future 24 hours is predicted in an iterative mode;
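Purely as an illustration of how steps 2.1-2.4 could be realized, the sketch below rolls a single-step LSTM forecaster forward 24 times; the PriceLSTM class, its hidden size, and the price-differencing scheme are assumptions made for this example and are not prescribed by the method itself.

```python
import torch
import torch.nn as nn

class PriceLSTM(nn.Module):
    """Single-step LSTM price forecaster (hypothetical layer sizes)."""
    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 23, 1) -- the 23 hour-to-hour price differences of the past 24 hours
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])        # predicted next-step price difference

def forecast_next_24h(model: PriceLSTM, past_prices: list) -> list:
    """Roll the single-step forecaster forward 24 times (step 2.4)."""
    prices = list(past_prices)                 # at least the past 24 hourly prices
    predictions = []
    for _ in range(24):
        diffs = torch.tensor([prices[i] - prices[i - 1] for i in range(-23, 0)],
                             dtype=torch.float32).view(1, 23, 1)
        with torch.no_grad():
            next_diff = model(diffs).item()
        next_price = prices[-1] + next_diff
        predictions.append(next_price)
        prices.append(next_price)              # feed the prediction back in
    return predictions
```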
and step 3: a DQN reinforcement learning method is introduced; an agent for controlling the electric vehicle load is trained on a neural network, and by observing the predicted electricity price and the battery state of charge (SOC) in the current hour and the rewards received, the agent automatically learns the optimization process of the electric vehicle's charging and discharging decisions and obtains the optimal control decision;
Step 3.1: initialize the experience pool D, the estimated action-value network Q_θ with parameters θ, and the target action-value network Q_{θ⁻} with parameters θ⁻; sample the initial SOC and the arrival and departure times of the electric vehicle from truncated normal distributions;
Step 3.2: the electric vehicle load-control agent has 7 power options, and the action space is written as A = [6 kW, 4 kW, 2 kW, 0 kW, -2 kW, -4 kW, -6 kW]; with probability ε the electric vehicle greedily selects the action a_t = argmax_a Q(s_t, a; θ), and with probability (1 - ε) it selects an action a_t at random, where s_t is the environment state at time step t, a is an action selectable in state s_t, and θ denotes the parameters of the Q network;
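A minimal sketch of the action-selection rule in step 3.2, keeping the convention stated above that the greedy action is taken with probability ε and a random action with probability (1 - ε); the q_network callable and the tensor shapes are assumptions for illustration.

```python
import random
import torch

ACTIONS_KW = [6, 4, 2, 0, -2, -4, -6]   # action space A from step 3.2 (kW)

def select_action(q_network, state: torch.Tensor, epsilon: float) -> int:
    """Return an index into ACTIONS_KW: greedy with probability epsilon,
    random with probability (1 - epsilon), as stated in step 3.2."""
    if random.random() < epsilon:
        with torch.no_grad():
            q_values = q_network(state.unsqueeze(0))   # shape (1, 7)
        return int(q_values.argmax(dim=1).item())
    return random.randrange(len(ACTIONS_KW))
```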
Step 3.3: the state observation at time step t is s_t = (u_t, E_t, P_{t-23}, ..., P_t), where (P_{t-23}, ..., P_t) are the hourly electricity prices of the 24 hours up to time step t, E_t is the remaining energy in the electric vehicle's battery, and u_t indicates whether the EV is at home;
Step 3.4: the state transition is s_{t+1} = f(s_t, a_t); the transition of E_t is controlled by the action a_t at time step t and described by the deterministic battery model E_{t+1} = E_t + a_t; for u_t and P_t the transition is stochastic, because the arrival time, the departure time and the next hour's electricity price are unknown;
Step 3.5: the reward function is r_t = -(d * n * a_t * p)/10 - λ * ((1 - soc) * C)^2, where d is the fraction of time taken by a complete charge or discharge of the vehicle: during discharging d = soc/(rate/C) and during charging d = (1 - soc)/(rate/C), with rate the charging/discharging power; n is the time step length, taken here as one hour; p is the real-time electricity price; λ is the penalty coefficient of the penalty term; C is the battery capacity; the penalty term is applied if t + 1 is the last time step and the electric vehicle is not fully charged; soc is the ratio of the battery's remaining energy to the battery capacity;
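The reward of step 3.5 can be written as a small function; variable names mirror the symbols above, and the zero-power branch guarding against division by zero is an added assumption not spelled out in the text.

```python
def reward(action_kw: float, soc: float, price: float, capacity_kwh: float,
           lam: float, n: float = 1.0, last_step: bool = False) -> float:
    """Reward r_t from step 3.5: negative cost term plus an end-of-horizon penalty."""
    rate = abs(action_kw)                              # charging/discharging power
    if rate == 0:
        d = 0.0                                        # idle hour: assumed no charge/discharge time
    elif action_kw < 0:                                # discharging
        d = soc / (rate / capacity_kwh)
    else:                                              # charging
        d = (1.0 - soc) / (rate / capacity_kwh)
    r = -(d * n * action_kw * price) / 10.0
    if last_step and soc < 1.0:                        # EV leaves without a full battery
        r -= lam * ((1.0 - soc) * capacity_kwh) ** 2
    return r
```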
Step 3.6: store the tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool D;
Step 3.7: randomly sample a minibatch of quadruples (s_j, a_j, r_j, s_{j+1}) from the replay memory D, where #F is the number of tuples in the minibatch and j = 1, 2, ..., #F;
Step 3.8: using the target action-value network parameters θ⁻, calculate the target action value y_j, which is independent of the estimated-value network parameters θ, as y_j = r_j + γ * max_{a'} Q(s_{j+1}, a'; θ⁻), where θ⁻ are the parameters of the target network, γ is the discount coefficient with value range [0, 1], and Q is the action value;
Step 3.9: minimize the loss function L(θ) = (1/#F) * Σ_j (y_j - Q(s_j, a_j; θ))^2 and back-propagate by gradient descent to update the estimated-value network parameters θ;
Step 3.10: repeat steps 3.2-3.9, and every set number of steps copy the estimated action-value network parameters into the target action-value network parameters to update the target network;
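Steps 3.6-3.10 amount to a standard DQN update loop; the sketch below is one possible PyTorch rendering under the assumption of a generic Q network, a gradient-based optimizer, and a simple deque-based replay memory, not a verbatim reproduction of the patented implementation.

```python
import random
from collections import deque
import torch
import torch.nn.functional as F

replay = deque(maxlen=100_000)                # experience pool D (step 3.6)

def dqn_update(q_net, target_net, optimizer, batch_size=32, gamma=0.99):
    """One update of the estimated-value network theta (steps 3.7-3.9)."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)                    # step 3.7: sample minibatch
    s, a, r, s_next = zip(*batch)
    s, s_next = torch.stack(s), torch.stack(s_next)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    with torch.no_grad():                                        # step 3.8: target value y_j
        y = r + gamma * target_net(s_next).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, y)                                      # step 3.9: loss L(theta)
    optimizer.zero_grad()
    loss.backward()                                              # gradient descent on theta
    optimizer.step()

def sync_target(q_net, target_net):
    """Step 3.10: copy the estimated-network parameters into the target network."""
    target_net.load_state_dict(q_net.state_dict())
```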
Step 3.11: repeat steps 3.1-3.10 until a policy π is learned that maximizes the cumulative reward R, where R = Σ_t γ^t * r_t;
Step 3.12: optimal selection among known DQNActing as atUnder the competitive structure, the action value function decomposes to: q (s, a) ═ v(s) + a (s, a) at this time, optimum action a*Argmax (a (s, a)). Where V(s) is the state cost function and A (s, a) is the action dominance function.
The invention has the following beneficial effects:
the invention provides an intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning, which has the following beneficial effects:
1. From the user's perspective, the invention constructs an MDP with unknown transition probabilities to describe the electric vehicle charging and discharging scheduling problem; the randomness of electricity prices and commuting behavior is considered so that the model describes a realistic scenario;
2. The method determines the optimal decision for the real-time decision problem without requiring any system model information;
3. Electricity prices are predicted iteratively with a single-step LSTM network, giving higher prediction accuracy than a traditional time-series prediction method (ARIMA);
4. A dueling (competition) structure is added at the DQN output, splitting the Q value into the sum of a state-value function and the advantage function of the specific action in the given state; this effectively alleviates the overestimation of the DQN value function, enhances the generalization ability of the model, and mitigates the noise and instability caused in the traditional DQN algorithm by large differences in the absolute value of the Q function across the action and state dimensions.
Drawings
FIG. 1 is a general flow chart of the intelligent electric vehicle charging and discharging decision method according to the present invention;
FIG. 2 is a block diagram of an LSTM network according to an embodiment of the present invention;
FIG. 3 is a graph of the training performance of the DQN and the dueling deep Q network (dueling-DQN) in the embodiment of the present invention;
FIG. 4 is a graph of the cumulative charging and discharging costs of the DQN and the dueling-DQN in the embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
An intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning is shown in fig. 1 and specifically comprises the following steps:
step 1: collecting the electricity price data of the past 24 hours;
step 2: using a single-step prediction LSTM network to perform iterative prediction on the electricity price of 24 hours in the future, as shown in FIG. 2;
Step 2.1: the LSTM network is unrolled into a 23-layer neural network, and every layer uses the same weight parameters;
Step 2.2: let the input of the first layer be d_{t-22} = p_{t-22} - p_{t-23}, where p_{t-22} and p_{t-23} are the electricity prices at time steps t-22 and t-23, respectively; y_{t-22} denotes the output of the first layer; W and R are the weight matrices of the LSTM gate structure; and c_{t-22} denotes the cell state, which contains past electricity price information;
Step 2.3: y_{t-22} and c_{t-22}, which carry the past electricity price information, are passed on through the second layer up to the last layer, which predicts the electricity price one time step ahead;
step 2.4: repeating the steps 2.1-2.3 until the electricity price of the future 24 hours is predicted in an iterative mode;
and step 3: a DQN reinforcement learning method is introduced; an agent for controlling the electric vehicle load is trained on a neural network, and by observing the predicted electricity price and the battery state of charge (SOC) in the current hour and the rewards received, the agent automatically learns the optimization process of the electric vehicle's charging and discharging decisions and obtains the optimal control decision;
Step 3.1: initialize the experience pool D, the estimated action-value network Q_θ with parameters θ, and the target action-value network Q_{θ⁻} with parameters θ⁻; sample the initial SOC and the arrival and departure times of the electric vehicle from truncated normal distributions;
Step 3.2: the electric vehicle load-control agent has 7 power options, and the action space is written as A = [6 kW, 4 kW, 2 kW, 0 kW, -2 kW, -4 kW, -6 kW]; with probability ε the electric vehicle greedily selects the action a_t = argmax_a Q(s_t, a; θ), and with probability (1 - ε) it selects an action a_t at random, where s_t is the environment state at time step t, a is an action selectable in state s_t, and θ denotes the parameters of the Q network;
Step 3.3: the state observation at time step t is s_t = (u_t, E_t, P_{t-23}, ..., P_t), where (P_{t-23}, ..., P_t) are the hourly electricity prices of the 24 hours up to time step t, E_t is the remaining energy in the electric vehicle's battery, and u_t indicates whether the EV is at home;
Step 3.4: the state transition is s_{t+1} = f(s_t, a_t); the transition of E_t is controlled by the action a_t at time step t and described by the deterministic battery model E_{t+1} = E_t + a_t; for u_t and P_t the transition is stochastic, because the arrival time, the departure time and the next hour's electricity price are unknown;
Step 3.5: the reward function is r_t = -(d * n * a_t * p)/10 - λ * ((1 - soc) * C)^2, where d is the fraction of time taken by a complete charge or discharge of the vehicle: during discharging d = soc/(rate/C) and during charging d = (1 - soc)/(rate/C), with rate the charging/discharging power; n is the time step length, taken here as one hour; p is the real-time electricity price; λ is the penalty coefficient of the penalty term; C is the battery capacity; the penalty term is applied if t + 1 is the last time step and the electric vehicle is not fully charged; soc is the ratio of the battery's remaining energy to the battery capacity;
Step 3.6: store the tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool D;
Step 3.7: randomly sample a minibatch of quadruples (s_j, a_j, r_j, s_{j+1}) from the replay memory D, where #F is the number of tuples in the minibatch and j = 1, 2, ..., #F;
Step 3.8: using the target action-value network parameters θ⁻, calculate the target action value y_j, which is independent of the estimated-value network parameters θ, as y_j = r_j + γ * max_{a'} Q(s_{j+1}, a'; θ⁻), where θ⁻ are the parameters of the target network; γ is the discount coefficient, with value range [0, 1], which discounts rewards over time to achieve better performance: if γ equals 0 only the current reward is considered, and if γ equals 1 the environment is treated as deterministic, so the same action always receives the same reward; Q is the action value;
Step 3.9: minimize the loss function L(θ) = (1/#F) * Σ_j (y_j - Q(s_j, a_j; θ))^2 and back-propagate by gradient descent to update the estimated-value network parameters θ;
Step 3.10: repeat steps 3.2-3.9, and every set number of steps copy the estimated action-value network parameters into the target action-value network parameters to update the target network;
Step 3.11: repeat steps 3.1-3.10 until a policy π is learned that maximizes the cumulative reward R, where R = Σ_t γ^t * r_t;
Step 3.12: the best action among the known DQNs is selected as atUnder the competitive structure, the action value function decomposes to: q (s, a) ═ v(s) + a (s, a) at this time, optimum action a*Argmax (a (s, a)). Where V(s) is the state cost function and A (s, a) is the action dominance function.
The method is divided into two stages: in the first stage the LSTM network forecasts electricity prices, and in the second stage the DQN method trains the agent to obtain the optimal strategy.
In the present invention, the electricity price trend is captured by the LSTM network. Its input is the electricity prices of the past 24 time steps and its output is the electricity price of the next time step. The idea behind the LSTM network is to exploit sequential information such as real-time electricity prices: the network applies the same processing to every element of the sequence, its output depends on the previous computations, and the information computed so far can be stored in the LSTM cell. For this EV charging scheduling problem, the LSTM network is unrolled into a 23-layer neural network. Specifically, the input to the first layer is d_{t-22} = p_{t-22} - p_{t-23}, where p_{t-22} and p_{t-23} are the electricity prices at time steps t-22 and t-23, respectively; W and R denote the weight parameters shared between all layers; y_{t-22} is the output of the first layer and c_{t-22} its cell state. y_{t-22} and c_{t-22}, which carry the past electricity price information, are passed to the second layer, and the process repeats until the last layer. As the unrolled view shows, the output of each layer is passed on to the next unit. Every layer of the unrolled LSTM network uses the same weight parameters, which greatly simplifies training of the network parameters.
The output y_t of the LSTM network is concatenated with the scalar battery SOC. These concatenated features contain information about the predicted future electricity prices and the battery SOC. Information about future electricity prices is important for reducing the charging cost, while information about the battery SOC is important for ensuring that the EV is fully charged. The concatenated features are then fed into the dueling Q network to obtain the action advantage function A(s, a_i) for each action, and the optimal action a* = argmax_a A(s, a) is chosen, yielding the optimal charging and discharging plan.
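As a small sketch of the concatenation just described, the 24 predicted hourly prices from the LSTM stage can be joined with the scalar battery SOC to form the Q-network input; the ordering and dimensionality here are illustrative assumptions only.

```python
import torch

def build_q_input(predicted_prices: list, soc: float) -> torch.Tensor:
    """Concatenate the LSTM price forecast with the scalar battery SOC (illustrative)."""
    price_feat = torch.tensor(predicted_prices, dtype=torch.float32)  # 24 predicted prices
    soc_feat = torch.tensor([soc], dtype=torch.float32)               # battery SOC
    return torch.cat([price_feat, soc_feat])                          # shape (25,)
```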
In reinforcement learning it is necessary to estimate the value of each state, but for many states it is not necessary to estimate the value of every action. The dueling (competition) network structure therefore evaluates the state value and the advantage of each action in that state separately. The state-action value function Q_π(s, a) is the expected return of selecting action a in state s under policy π; the state value V_π(s) is the value of state s, i.e. the expected value over all the actions that policy π produces in that state; and the difference between the two is the advantage of selecting action a in state s, defined as
A_π(s, a) = Q_π(s, a) - V_π(s)
The dueling network thus has two data streams, one outputting the state value V(s; θ, β) and the other the action advantage A(s, a; θ, α), where θ denotes the parameters of the layers that process the input features (the weights of each layer of the neural network), and α and β are the parameters of the two streams, respectively. The output of a deep Q network with the dueling structure is
Q(s, a; θ, α, β) = V(s; θ, β) + A(s, a; θ, α)
In practice, when the dueling network structure is applied, the mean of the action advantages usually replaces their maximum in the computation of the Q value, which preserves performance while improving optimization stability. As can be seen from FIG. 3, the dueling-DQN shows smaller overall loss fluctuations and converges faster than the DQN.
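The mean-based aggregation mentioned above can be written in one line; assuming value and advantage tensors of shapes (batch, 1) and (batch, num_actions), this is the commonly used mean-subtracted dueling combination.

```python
import torch

def dueling_q(value: torch.Tensor, advantage: torch.Tensor) -> torch.Tensor:
    """Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)): the mean replaces the max for stability."""
    return value + advantage - advantage.mean(dim=1, keepdim=True)
```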
As can be seen from FIG. 4, over 100 randomly selected days the charging and discharging costs of the dueling-DQN method are generally lower than those of the DQN method.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.

Claims (3)

1. An intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning is characterized by comprising the following steps:
step 1: collecting the electricity price data of the past 24 hours;
step 2: using a single-step prediction LSTM network to perform iterative prediction on the electricity price of 24 hours in the future;
and step 3: a DQN reinforcement learning method is introduced; an agent for controlling the electric vehicle load is trained on a neural network, and by observing the predicted electricity price and the battery state of charge (SOC) in the current hour and the rewards received, the agent automatically learns the optimization process of the electric vehicle's charging and discharging decisions and obtains the optimal control decision.
2. The intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning of claim 1, wherein the step 2 specifically comprises the following steps:
Step 2.1: the LSTM network is unrolled into a 23-layer neural network, and every layer uses the same weight parameters;
Step 2.2: let the input of the first layer be d_{t-22} = p_{t-22} - p_{t-23}, where p_{t-22} and p_{t-23} are the electricity prices at time steps t-22 and t-23, respectively; y_{t-22} denotes the output of the first layer; W and R are the weight matrices of the LSTM gate structure; and c_{t-22} denotes the cell state, which contains past electricity price information;
Step 2.3: y_{t-22} and c_{t-22}, which carry the past electricity price information, are passed on through the second layer up to the last layer, which predicts the electricity price one time step ahead;
step 2.4: and repeating the steps 2.1-2.3 until the future 24-hour electricity price is predicted in an iterative mode.
3. The intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning of claim 1, wherein the step 3 specifically comprises the following steps:
Step 3.1: initialize the experience pool D, the estimated action-value network Q_θ with parameters θ, and the target action-value network Q_{θ⁻} with parameters θ⁻; sample the initial SOC and the arrival and departure times of the electric vehicle from truncated normal distributions;
Step 3.2: the electric vehicle load-control agent has 7 power options, and the action space is written as A = [6 kW, 4 kW, 2 kW, 0 kW, -2 kW, -4 kW, -6 kW]; with probability ε the electric vehicle greedily selects the action a_t = argmax_a Q(s_t, a; θ), and with probability (1 - ε) it selects an action a_t at random, where s_t is the environment state at time step t, a is an action selectable in state s_t, and θ denotes the parameters of the Q network;
Step 3.3: the state observation at time step t is s_t = (u_t, E_t, P_{t-23}, ..., P_t), where (P_{t-23}, ..., P_t) are the hourly electricity prices of the 24 hours up to time step t, E_t is the remaining energy in the electric vehicle's battery, and u_t indicates whether the EV is at home;
Step 3.4: the state transition is s_{t+1} = f(s_t, a_t); the transition of E_t is controlled by the action a_t at time step t and described by the deterministic battery model E_{t+1} = E_t + a_t; for u_t and P_t the transition is stochastic, because the arrival time, the departure time and the next hour's electricity price are unknown;
Step 3.5: the reward function is r_t = -(d * n * a_t * p)/10 - λ * ((1 - soc) * C)^2, where d is the fraction of time taken by a complete charge or discharge of the vehicle: during discharging d = soc/(rate/C) and during charging d = (1 - soc)/(rate/C), with rate the charging/discharging power; n is the time step length, taken here as one hour; p is the real-time electricity price; λ is the penalty coefficient of the penalty term; C is the battery capacity; the penalty term is applied if t + 1 is the last time step and the electric vehicle is not fully charged; soc is the ratio of the battery's remaining energy to the battery capacity;
Step 3.6: store the tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool D;
Step 3.7: randomly sample a minibatch of quadruples (s_j, a_j, r_j, s_{j+1}) from the replay memory D, where #F is the number of tuples in the minibatch and j = 1, 2, ..., #F;
Step 3.8: using the target action-value network parameters θ⁻, calculate the target action value y_j, which is independent of the estimated-value network parameters θ, as y_j = r_j + γ * max_{a'} Q(s_{j+1}, a'; θ⁻), where θ⁻ are the parameters of the target network, γ is the discount coefficient with value range [0, 1], and Q is the action value;
Step 3.9: minimize the loss function L(θ) = (1/#F) * Σ_j (y_j - Q(s_j, a_j; θ))^2 and back-propagate by gradient descent to update the estimated-value network parameters θ;
Step 3.10: repeat steps 3.2-3.9, and every set number of steps copy the estimated action-value network parameters into the target action-value network parameters to update the target network;
Step 3.11: repeat steps 3.1-3.10 until a policy π is learned that maximizes the cumulative reward R, where R = Σ_t γ^t * r_t;
Step 3.12: the best action among the known DQNs is selected as atUnder the competition structure, the action value function decomposes to: q (s, a) ═ v(s) + a (s, a) at this timeOptimum action a*Argmax (a (s, a)), where v(s) is the state cost function and a (s, a) is the action dominance function.
CN202110989593.9A 2021-08-26 2021-08-26 Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning Pending CN113627993A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110989593.9A CN113627993A (en) 2021-08-26 2021-08-26 Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110989593.9A CN113627993A (en) 2021-08-26 2021-08-26 Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN113627993A true CN113627993A (en) 2021-11-09

Family

ID=78387939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110989593.9A Pending CN113627993A (en) 2021-08-26 2021-08-26 Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113627993A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114139653A (en) * 2021-12-15 2022-03-04 中国人民解放军国防科技大学 Intelligent agent strategy obtaining method based on adversary action prediction and related device
CN114254765A (en) * 2022-03-01 2022-03-29 之江实验室 Active sequence decision method, device and medium for simulation deduction
CN114844083A (en) * 2022-05-27 2022-08-02 深圳先进技术研究院 Electric vehicle cluster charging and discharging management method for improving stability of energy storage system
CN114997935A (en) * 2022-07-19 2022-09-02 东南大学溧阳研究院 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization
CN115293100A (en) * 2022-09-30 2022-11-04 深圳市威特利电源有限公司 Accurate evaluation method for residual electric quantity of new energy battery
CN117863948A (en) * 2024-01-17 2024-04-12 广东工业大学 Distributed electric vehicle charging control method and device for auxiliary frequency modulation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276638A (en) * 2019-05-29 2019-09-24 南京邮电大学 A kind of Electricity price forecasting solution and system based on two-way shot and long term neural network
CN110535157A (en) * 2018-05-24 2019-12-03 三菱电机(中国)有限公司 The discharge control device and discharge control method of electric car
CN110588432A (en) * 2019-08-27 2019-12-20 深圳市航通北斗信息技术有限公司 Electric vehicle, battery management method thereof and computer-readable storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110535157A (en) * 2018-05-24 2019-12-03 三菱电机(中国)有限公司 The discharge control device and discharge control method of electric car
CN110276638A (en) * 2019-05-29 2019-09-24 南京邮电大学 A kind of Electricity price forecasting solution and system based on two-way shot and long term neural network
CN110588432A (en) * 2019-08-27 2019-12-20 深圳市航通北斗信息技术有限公司 Electric vehicle, battery management method thereof and computer-readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHIQIANG WAN et al.: "Model-Free Real-Time EV Charging Scheduling Based on Deep Reinforcement Learning", IEEE Transactions on Smart Grid, vol. 10, no. 5, pages 5246-5257, XP011741247, DOI: 10.1109/TSG.2018.2879572 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114139653A (en) * 2021-12-15 2022-03-04 中国人民解放军国防科技大学 Intelligent agent strategy obtaining method based on adversary action prediction and related device
CN114254765A (en) * 2022-03-01 2022-03-29 之江实验室 Active sequence decision method, device and medium for simulation deduction
CN114844083A (en) * 2022-05-27 2022-08-02 深圳先进技术研究院 Electric vehicle cluster charging and discharging management method for improving stability of energy storage system
CN114844083B (en) * 2022-05-27 2023-02-17 深圳先进技术研究院 Electric automobile cluster charging and discharging management method for improving stability of energy storage system
CN114997935A (en) * 2022-07-19 2022-09-02 东南大学溧阳研究院 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization
CN114997935B (en) * 2022-07-19 2023-04-07 东南大学溧阳研究院 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization
CN115293100A (en) * 2022-09-30 2022-11-04 深圳市威特利电源有限公司 Accurate evaluation method for residual electric quantity of new energy battery
CN117863948A (en) * 2024-01-17 2024-04-12 广东工业大学 Distributed electric vehicle charging control method and device for auxiliary frequency modulation

Similar Documents

Publication Publication Date Title
CN113627993A (en) Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning
CN109347149B (en) Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning
CN110341690B (en) PHEV energy management method based on deterministic strategy gradient learning
CN112117760A (en) Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning
CN111934335A (en) Cluster electric vehicle charging behavior optimization method based on deep reinforcement learning
CN110659796B (en) Data acquisition method in rechargeable group vehicle intelligence
CN113515884A (en) Distributed electric vehicle real-time optimization scheduling method, system, terminal and medium
CN112614009A (en) Power grid energy management method and system based on deep expected Q-learning
CN113572157B (en) User real-time autonomous energy management optimization method based on near-end policy optimization
CN114997631B (en) Electric vehicle charging scheduling method, device, equipment and medium
CN114997935B (en) Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization
CN112491094B (en) Hybrid-driven micro-grid energy management method, system and device
CN111833205B (en) Intelligent scheduling method for mobile charging pile group under big data scene
CN117057553A (en) Deep reinforcement learning-based household energy demand response optimization method and system
CN115587645A (en) Electric vehicle charging management method and system considering charging behavior randomness
CN113110052A (en) Hybrid energy management method based on neural network and reinforcement learning
CN114619907B (en) Coordinated charging method and coordinated charging system based on distributed deep reinforcement learning
CN116683513A (en) Method and system for optimizing energy supplement strategy of mobile micro-grid
CN116436019B (en) Multi-resource coordination optimization method, device and storage medium
CN117543581A (en) Virtual power plant optimal scheduling method considering electric automobile demand response and application thereof
CN114611811B (en) Low-carbon park optimal scheduling method and system based on EV load participation
CN113555888B (en) Micro-grid energy storage coordination control method
CN114742453A (en) Micro-grid energy management method based on Rainbow deep Q network
CN114154718A (en) Day-ahead optimization scheduling method of wind storage combined system based on energy storage technical characteristics
CN114030386A (en) Electric vehicle charging control method based on user charging selection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination