CN113627993A - Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning - Google Patents
Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning
- Publication number
- CN113627993A (application number CN202110989593.9A)
- Authority
- CN
- China
- Prior art keywords
- action
- network
- discharging
- time
- electric vehicle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 32
- 238000007599 discharging Methods 0.000 title claims abstract description 28
- 230000002787 reinforcement Effects 0.000 title claims abstract description 15
- 230000005611 electricity Effects 0.000 claims abstract description 54
- 230000006870 function Effects 0.000 claims abstract description 20
- 230000007704 transition Effects 0.000 claims abstract description 11
- 238000005457 optimization Methods 0.000 claims abstract description 7
- 230000009471 action Effects 0.000 claims description 61
- 238000013528 artificial neural network Methods 0.000 claims description 10
- 230000008569 process Effects 0.000 claims description 5
- 230000001186 cumulative effect Effects 0.000 claims description 4
- 238000011478 gradient descent method Methods 0.000 claims description 3
- 238000005070 sampling Methods 0.000 claims description 2
- 238000009825 accumulation Methods 0.000 abstract description 2
- 230000006399 behavior Effects 0.000 abstract description 2
- 238000010801 machine learning Methods 0.000 abstract description 2
- 230000008901 benefit Effects 0.000 description 5
- 230000002860 competitive effect Effects 0.000 description 4
- 210000004027 cell Anatomy 0.000 description 3
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 210000002569 neuron Anatomy 0.000 description 2
- 238000010586 diagram Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005265 energy consumption Methods 0.000 description 1
- 238000004146 energy storage Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0206—Price or cost determination based on market factors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- Economics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Entrepreneurship & Innovation (AREA)
- General Engineering & Computer Science (AREA)
- Marketing (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Primary Health Care (AREA)
- Human Resources & Organizations (AREA)
- Water Supply & Treatment (AREA)
- Public Health (AREA)
- Game Theory and Decision Science (AREA)
- Charge And Discharge Circuits For Batteries Or The Like (AREA)
- Electric Propulsion And Braking For Vehicles (AREA)
Abstract
The invention provides an intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning, and relates to the technical field of electric vehicle charging and discharging. The data-driven machine learning algorithm can solve a complex optimization problem without prior knowledge of the system: it dynamically learns from historical operating states through function iteration and derives an optimal charging and discharging plan from accumulated experience and return analysis. From the perspective of the user, a Markov decision process (MDP) with unknown transition probabilities is constructed to describe the electric vehicle charging and discharging scheduling problem, and the randomness of electricity prices and commuting behaviour is taken into account so that the model reflects the actual scenario. The method requires no system model information to determine the optimal decision for the real-time decision problem. The electricity price is predicted iteratively with a single-step LSTM network, which achieves higher prediction accuracy than a traditional time-series prediction method (ARIMA).
Description
Technical Field
The invention relates to the technical field of electric vehicle charging and discharging, in particular to an intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning.
Background
With the improvement of residents' living standards and the growing intelligence of electric vehicles, users consider more factors when charging and discharging, such as the accumulated charging and discharging cost and user satisfaction. For many households with electric vehicles, always charging at maximum power while parked yields high user satisfaction but a high charging cost; always charging only at low electricity prices reduces the cost, but user satisfaction drops if the vehicle is not fully charged when it departs.
Electric vehicles differ from other controllable loads and energy storage equipment in that the travel requirements of users must be met first: any regulation of the electric vehicle is premised on satisfying the user's travel needs and charging/discharging willingness. Under actual operating conditions, with uncertain wind and photovoltaic output, randomly fluctuating user loads and a flexible, changeable distribution network topology, traditional model-based optimization methods struggle to find a realistic optimal solution for the charge and discharge quantities of electric vehicles.
Because arrival and departure times make the energy demand of electric vehicles dynamic, and electricity prices are time-varying, efficiently managing electric vehicle (EV) charging and discharging to reduce cost is challenging. In recent years, many day-ahead scheduling methods have been proposed for this problem. Although they have achieved some success in day-ahead charge/discharge scheduling, they are not suitable for real-time scenarios. Real-time scheduling strategies that respond to dynamic charging demand and time-varying electricity prices have recently attracted much attention. The scheduling problem can be expressed as a model-based control problem, but the following limitations are common in practice: 1. the hourly fluctuating electricity price has a certain flexibility, and there is a considerable delay in transmitting the actual electricity price to the electric vehicle for control; 2. most load control methods require an accurate model of the electric vehicle, and because internal structures differ, each vehicle type needs its own model with many parameters known in advance, making the modelling complex and difficult. Therefore, a model-free intelligent electric vehicle charging and discharging decision method that is based on predicted electricity prices and applicable to different types of electric vehicles is of great significance for addressing the flexibility of electricity prices and the poor applicability of existing methods across vehicle types.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning. The data-driven machine learning algorithm can solve a complex optimization problem without prior knowledge of the system: it dynamically learns from historical operating states through function iteration and derives an optimal charging and discharging plan from accumulated experience and return analysis.
In order to solve the technical problems, the invention adopts the following technical scheme:
an intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning specifically comprises the following steps:
step 1: collecting the electricity price data of the past 24 hours;
step 2: using a single-step prediction LSTM network to perform iterative prediction on the electricity price of 24 hours in the future;
step 2.1: the LSTM network is expanded into a 23-layer neural network, and the same weight parameters are used for every layer;
step 2.2: let the input of the first layer be d_{t-22} = p_{t-22} - p_{t-23}, where p_{t-22} and p_{t-23} are the electricity prices at time steps t-22 and t-23 respectively, y_{t-22} is the output of the first layer, W and R are the weight matrices of the LSTM gate structure, and c_{t-22} is the cell state containing past electricity price information;
step 2.3: y_{t-22} and c_{t-22}, which carry the past electricity price information, are passed on through the second layer up to the last layer, whose output is the predicted electricity price one time step into the future;
step 2.4: repeating the steps 2.1-2.3 until the electricity price of the future 24 hours is predicted in an iterative mode;
step 3: a DQN reinforcement learning method is introduced; an agent that controls the electric vehicle load is trained with a neural network and, by observing the predicted electricity price and the current-hour battery state of charge (SOC) together with the reward obtained, automatically learns the optimization process of the electric vehicle charging and discharging decision to obtain the optimal control decision;
step 3.1: initialize an experience pool D, the estimated action-value network parameters θ of Q_θ and the target action-value network parameters θ^- of Q_{θ^-}; the initial SOC and the arrival and departure times of the electric vehicle are each drawn from a truncated normal distribution;
step 3.2: the electric vehicle load-control agent has 7 power selections, and the action space is recorded as A = [6 kW, 4 kW, 2 kW, 0 kW, -2 kW, -4 kW, -6 kW]; with probability ε the electric vehicle greedily selects the action argmax_a Q(s_t, a; θ), and with probability (1-ε) it randomly selects an action a_t, where s_t is the environment state at time step t, a is an action selectable in state s_t, and θ denotes the parameters of the Q network;
step 3.3: the observed state at time step t is s_t = (u_t, E_t, P_{t-23}, ..., P_t), where (P_{t-23}, ..., P_t) are the hourly electricity prices of the 24 hours up to time step t, E_t is the remaining energy in the battery of the electric vehicle, and u_t indicates whether the EV is at home;
step 3.4: the state transition is s_{t+1} = f(s_t, a_t); the transition of E_t is controlled by the action a_t at time step t and is represented by the deterministic battery model E_{t+1} = E_t + a_t; for u_t and P_t, because the arrival time, the departure time and the next-hour electricity price are unknown, the state transition is stochastic;
step 3.5: the reward function is r_t = -(d*n*a_t*p)/10 - λ*((1-soc)*C)^2, where d is the proportion of time occupied by a complete charge or discharge of the vehicle: during discharging d = soc/(rate/C) and during charging d = (1-soc)/(rate/C), with rate the charging/discharging power; n is the time step, taken here as one hour; p is the real-time electricity price; λ is the coefficient of the penalty term, which is subtracted only if t+1 is the last time step and the electric vehicle is not fully charged; C is the battery capacity; and soc is the ratio of the remaining battery energy of the electric vehicle to the battery capacity;
step 3.6: record (s_t, a_t, r_t, s_{t+1}) into the experience pool D;
step 3.7: randomly select a mini-batch of quadruples (s_j, a_j, r_j, s_{j+1}) from the replay memory D, where #F is the number of tuples in the mini-batch and j = 1, 2, ..., #F;
step 3.8: from the target action-value network parameters θ^-, calculate a target action value independent of the estimated-value network parameters θ as y_j = r_j + γ*max_{a'} Q(s_{j+1}, a'; θ^-), where θ^- are the parameters of the target network, γ is the discount coefficient with value range [0, 1], and Q is the action value;
step 3.9: minimize the loss function L(θ) = (1/#F)*Σ_j (y_j - Q(s_j, a_j; θ))^2 and back-propagate with the gradient descent method to update the estimated-value network parameters θ;
step 3.10: repeat steps 3.2-3.9, copying the estimated action-value network parameters to the target action-value network parameters every set number of steps to update the target network;
step 3.11: repeat steps 3.1-3.10 until a strategy π that maximizes the cumulative reward value R = Σ_t γ^t*r_t is learned;
step 3.12: in standard DQN the action with the largest Q value is selected as a_t; under the competitive (dueling) structure the action-value function is decomposed as Q(s, a) = V(s) + A(s, a), and the optimal action is a* = argmax_a A(s, a), where V(s) is the state value function and A(s, a) is the action advantage function.
The invention has the following beneficial effects:
the invention provides an intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning, which has the following beneficial effects:
1. from the perspective of the user, the invention constructs an MDP with unknown transition probabilities to describe the electric vehicle charging and discharging scheduling problem; the randomness of electricity prices and commuting behaviour is taken into account to describe the actual scenario;
2. the method does not need any system model information to determine the optimal decision of the real-time decision problem;
3. the electricity price is predicted iteratively with a single-step prediction LSTM network, which achieves higher prediction accuracy than a traditional time-series prediction method (ARIMA);
4. a competitive (dueling) structure is added at the DQN output, splitting the Q value into the sum of a state value function and an advantage function of the specific action in the given state; this effectively solves the overestimation problem of the DQN value function, strengthens the generalization ability of the model, and addresses the noise and instability caused in the traditional DQN algorithm by large differences in the absolute value of the Q function across the action and state dimensions.
Drawings
FIG. 1 is a general flow chart of the intelligent electric vehicle charging and discharging decision method according to the present invention;
FIG. 2 is a block diagram of an LSTM network according to an embodiment of the present invention;
FIG. 3 is a graph of the training effect of DQN and the dueling deep Q network (Dueling DQN) in the embodiment of the present invention;
FIG. 4 is a graph of the cumulative charging and discharging costs of DQN and Dueling DQN in the embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
An intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning is shown in fig. 1 and specifically comprises the following steps:
step 1: collecting the electricity price data of the past 24 hours;
step 2: using a single-step prediction LSTM network to perform iterative prediction on the electricity price of 24 hours in the future, as shown in FIG. 2;
step 2.1: the LSTM network is expanded into a 23-layer neural network, and the same weight parameters are used for every layer;
step 2.2: let the input of the first layer be d_{t-22} = p_{t-22} - p_{t-23}, where p_{t-22} and p_{t-23} are the electricity prices at time steps t-22 and t-23 respectively, y_{t-22} is the output of the first layer, W and R are the weight matrices of the LSTM gate structure, and c_{t-22} is the cell state containing past electricity price information;
step 2.3: y_{t-22} and c_{t-22}, which carry the past electricity price information, are passed on through the second layer up to the last layer, whose output is the predicted electricity price one time step into the future;
step 2.4: repeating the steps 2.1-2.3 until the electricity price of the future 24 hours is predicted in an iterative mode;
step 3: a DQN reinforcement learning method is introduced; an agent that controls the electric vehicle load is trained with a neural network and, by observing the predicted electricity price and the current-hour battery state of charge (SOC) together with the reward obtained, automatically learns the optimization process of the electric vehicle charging and discharging decision to obtain the optimal control decision;
step 3.1: initialize an experience pool D, the estimated action-value network parameters θ of Q_θ and the target action-value network parameters θ^- of Q_{θ^-}; the initial SOC and the arrival and departure times of the electric vehicle are each drawn from a truncated normal distribution;
step 3.2: the electric vehicle load-control agent has 7 power selections, and the action space is recorded as A = [6 kW, 4 kW, 2 kW, 0 kW, -2 kW, -4 kW, -6 kW]; with probability ε the electric vehicle greedily selects the action argmax_a Q(s_t, a; θ), and with probability (1-ε) it randomly selects an action a_t, where s_t is the environment state at time step t, a is an action selectable in state s_t, and θ denotes the parameters of the Q network;
step 3.3: the observed state at time step t is s_t = (u_t, E_t, P_{t-23}, ..., P_t), where (P_{t-23}, ..., P_t) are the hourly electricity prices of the 24 hours up to time step t, E_t is the remaining energy in the battery of the electric vehicle, and u_t indicates whether the EV is at home;
step 3.4: the state transition is s_{t+1} = f(s_t, a_t); the transition of E_t is controlled by the action a_t at time step t and is represented by the deterministic battery model E_{t+1} = E_t + a_t; for u_t and P_t, because the arrival time, the departure time and the next-hour electricity price are unknown, the state transition is stochastic;
step 3.5: the reward function is r_t = -(d*n*a_t*p)/10 - λ*((1-soc)*C)^2, where d is the proportion of time occupied by a complete charge or discharge of the vehicle: during discharging d = soc/(rate/C) and during charging d = (1-soc)/(rate/C), with rate the charging/discharging power; n is the time step, taken here as one hour; p is the real-time electricity price; λ is the coefficient of the penalty term, which is subtracted only if t+1 is the last time step and the electric vehicle is not fully charged; C is the battery capacity; and soc is the ratio of the remaining battery energy of the electric vehicle to the battery capacity (a minimal environment sketch illustrating steps 3.2-3.5 is given after this step list);
step 3.6: record (s_t, a_t, r_t, s_{t+1}) into the experience pool D;
step 3.7: randomly select a mini-batch of quadruples (s_j, a_j, r_j, s_{j+1}) from the replay memory D, where #F is the number of tuples in the mini-batch and j = 1, 2, ..., #F;
step 3.8: from the target action-value network parameters θ^-, calculate a target action value independent of the estimated-value network parameters θ as y_j = r_j + γ*max_{a'} Q(s_{j+1}, a'; θ^-), where θ^- are the parameters of the target network; γ is the discount coefficient with value range [0, 1], which discounts rewards over time to achieve better performance: γ = 0 means that only the current reward is considered, while γ = 1 suits a deterministic environment in which the same action always receives the same reward; Q is the action value;
step 3.9: minimize the loss function L(θ) = (1/#F)*Σ_j (y_j - Q(s_j, a_j; θ))^2 and back-propagate with the gradient descent method to update the estimated-value network parameters θ;
step 3.10: repeat steps 3.2-3.9, copying the estimated action-value network parameters to the target action-value network parameters every set number of steps to update the target network;
step 3.11: repeat steps 3.1-3.10 until a strategy π that maximizes the cumulative reward value R = Σ_t γ^t*r_t is learned (a full training-loop sketch covering steps 3.6-3.11 is given at the end of this section);
step 3.12: in standard DQN the action with the largest Q value is selected as a_t; under the competitive (dueling) structure the action-value function is decomposed as Q(s, a) = V(s) + A(s, a), and the optimal action is a* = argmax_a A(s, a), where V(s) is the state value function and A(s, a) is the action advantage function.
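As noted at step 3.5, the following is a minimal sketch (in Python with NumPy) of the charging environment implied by steps 3.2-3.5: the 7-level action space, the deterministic battery transition E_{t+1} = E_t + a_t, and the cost-plus-penalty reward. The battery capacity, penalty coefficient, placeholder at-home flag and the uniform initial SOC are illustrative assumptions rather than values fixed by the invention.

```python
import numpy as np

ACTIONS_KW = [6.0, 4.0, 2.0, 0.0, -2.0, -4.0, -6.0]   # step 3.2: action space A


class EVChargingEnv:
    """State s_t = (u_t, E_t, P_{t-23}, ..., P_t); transition E_{t+1} = E_t + a_t."""

    def __init__(self, prices, capacity_kwh=24.0, penalty=0.01, horizon=24):
        self.prices = np.asarray(prices, dtype=float)  # hourly prices, length >= horizon + 24
        self.C = capacity_kwh                          # battery capacity C (assumed value)
        self.lam = penalty                             # penalty coefficient lambda (assumed value)
        self.T = horizon                               # scheduling horizon in hours

    def reset(self):
        self.t = 0
        self.E = float(np.random.uniform(0.2, 0.5)) * self.C  # initial SOC (patent: truncated normal)
        self.at_home = 1.0                                     # u_t, kept constant here for simplicity
        return self._obs()

    def _obs(self):
        window = self.prices[self.t:self.t + 24]               # (P_{t-23}, ..., P_t)
        return np.concatenate(([self.at_home, self.E], window))

    def _reward(self, a_kw, price):
        soc = self.E / self.C
        # step 3.5: d is the fraction of time taken by a complete charge/discharge
        if a_kw > 0:
            d = (1.0 - soc) / (a_kw / self.C)
        elif a_kw < 0:
            d = soc / (abs(a_kw) / self.C)
        else:
            d = 0.0
        d = min(d, 1.0)                                        # cap at one hour (assumption)
        r = -(d * 1.0 * a_kw * price) / 10.0                   # n = 1 hour
        if self.t + 1 == self.T and soc < 1.0:                 # departure with an unfinished charge
            r -= self.lam * ((1.0 - soc) * self.C) ** 2
        return r

    def step(self, action_idx):
        a_kw = ACTIONS_KW[action_idx]
        reward = self._reward(a_kw, self.prices[self.t + 23])  # p is the current real-time price
        self.E = float(np.clip(self.E + a_kw, 0.0, self.C))    # step 3.4: E_{t+1} = E_t + a_t
        self.t += 1
        return self._obs(), reward, self.t >= self.T
```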
The method is divided into two stages: the first stage is the LSTM electricity price forecasting stage, and the second stage is the stage in which the DQN method trains the agent to obtain the optimal strategy.
In the present invention, the electricity price trend is captured by the LSTM network. Its input is the electricity prices of the past 24 time steps and its output is the electricity price of the next time step. The idea behind LSTM networks is to exploit sequential information such as real-time electricity prices: the network applies the same processing to each element of the sequence, its output depends on the previous computation, and the information computed so far can be stored in the LSTM cell. For this EV charging scheduling problem the LSTM network is expanded into a 23-layer neural network. Specifically, the input to the first layer is d_{t-22} = p_{t-22} - p_{t-23}, where p_{t-22} and p_{t-23} are the electricity prices at time steps t-22 and t-23 respectively, W and R are the parameters shared between all layers, y_{t-22} is the output of the first layer, and c_{t-22} is its cell state. y_{t-22} and c_{t-22}, which contain the past electricity price information, are passed to the second layer, and the process is repeated up to the last layer. As the expanded view shows, the output of each layer is passed to the next unit. Every layer of the expanded LSTM network uses the same weight parameters, which greatly simplifies parameter training.
The output y_t of the LSTM network is concatenated with the scalar battery SOC. These concatenated features contain information on the predicted future electricity prices and on the battery SOC: information on future electricity prices is important for reducing the charging cost, while information on the battery SOC is important for ensuring that the EV is fully charged. The concatenated features are then fed into the competitive (dueling) Q network to obtain the action advantage function A(s, a_i) for each action, and the optimal action a* = argmax_a A(s, a) is chosen to obtain the optimal charging and discharging plan.
In reinforcement learning it is necessary to estimate the value of each state, but for many states it is not necessary to estimate the value of every action. The competitive (dueling) network structure therefore represents the state value and the action advantage separately. The state-action value function Q^π(s, a) is the expected return of selecting action a in state s under policy π; the state value V^π(s) is the value of state s, i.e. the expectation over all actions produced by policy π in that state; and the difference between the two is the advantage of selecting action a in state s, defined as
A^π(s, a) = Q^π(s, a) - V^π(s)
Thus the competitive network has two data streams, one outputting the state value V(s; θ, β) and the other outputting the action advantage A(s, a; θ, α), where θ are the network parameters that perform feature processing on the input layer, i.e. the weights of each layer of the neural network, and α, β are the parameters of the two streams respectively. The output of the deep Q network with the competitive network structure is
Q(s, a; θ, α, β) = V(s; θ, β) + A(s, a; θ, α)
When the competitive network structure is applied in practice, the mean of the action advantages is usually used in place of their maximum when computing the Q value, which preserves performance and improves optimization stability. As can be seen from FIG. 3, the Dueling DQN has smaller overall loss fluctuation and converges faster than DQN.
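A minimal sketch of such a dueling head (assuming PyTorch; the 26-dimensional state, i.e. u_t, E_t and 24 prices, and the layer width are illustrative assumptions) shows the mean-subtracted aggregation described above:

```python
import torch
import torch.nn as nn


class DuelingQNet(nn.Module):
    """Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)) over the 7 power levels."""

    def __init__(self, state_dim: int = 26, n_actions: int = 7, hidden: int = 64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())  # shared theta
        self.value = nn.Linear(hidden, 1)               # V(s; theta, beta)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a; theta, alpha)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)                                # (batch, 1)
        a = self.advantage(h)                            # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)       # mean rather than max, for stability
```

Subtracting the mean advantage keeps V and A identifiable while avoiding the extra variance that a max operator would introduce.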
As can be seen from FIG. 4, over 100 randomly selected days the charging and discharging cost of the Dueling DQN method is generally lower than that of the DQN method.
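As noted at step 3.11, the environment and network sketched above can be tied together in a plain training loop following steps 3.6-3.11: ε-greedy action selection over the 7 power levels (greedy with probability ε, as in step 3.2), an experience pool D, a target network refreshed every set number of steps, and a mean-squared TD-error loss minimised by gradient descent. It reuses the EVChargingEnv and DuelingQNet classes from the earlier sketches; the hyper-parameters and the placeholder price series are illustrative assumptions.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

GAMMA, BATCH, TARGET_EVERY, EPSILON = 0.95, 32, 200, 0.9   # assumed hyper-parameters

env = EVChargingEnv(prices=np.random.rand(48) * 50.0)      # placeholder hourly price series
q_net, target_net = DuelingQNet(), DuelingQNet()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                              # experience pool D

step_count = 0
for episode in range(500):
    s, done = env.reset(), False
    while not done:
        if random.random() < EPSILON:                      # greedy with probability epsilon
            with torch.no_grad():
                a = int(q_net(torch.tensor(s, dtype=torch.float32).unsqueeze(0)).argmax())
        else:                                              # otherwise explore at random
            a = random.randrange(len(ACTIONS_KW))
        s_next, r, done = env.step(a)
        replay.append((s, a, r, s_next, done))             # step 3.6: store the transition
        s = s_next

        if len(replay) >= BATCH:                           # steps 3.7-3.9
            ss, aa, rr, ss2, dd = map(np.array, zip(*random.sample(replay, BATCH)))
            ss = torch.tensor(ss, dtype=torch.float32)
            ss2 = torch.tensor(ss2, dtype=torch.float32)
            rr = torch.tensor(rr, dtype=torch.float32)
            dd = torch.tensor(dd, dtype=torch.float32)
            aa = torch.tensor(aa, dtype=torch.int64)
            with torch.no_grad():                          # y_j = r_j + gamma * max_a' Q(s_{j+1}, a'; theta^-)
                y = rr + GAMMA * target_net(ss2).max(dim=1).values * (1.0 - dd)
            q = q_net(ss).gather(1, aa.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(q, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        step_count += 1
        if step_count % TARGET_EVERY == 0:                 # step 3.10: refresh the target network
            target_net.load_state_dict(q_net.state_dict())
```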
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.
Claims (3)
1. An intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning is characterized by comprising the following steps:
step 1: collecting the electricity price data of the past 24 hours;
step 2: using a single-step prediction LSTM network to perform iterative prediction on the electricity price of 24 hours in the future;
step 3: a DQN reinforcement learning method is introduced; an agent that controls the electric vehicle load is trained with a neural network and, by observing the predicted electricity price and the current-hour battery state of charge (SOC) together with the reward obtained, automatically learns the optimization process of the electric vehicle charging and discharging decision to obtain the optimal control decision.
2. The intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning of claim 1, wherein the step 2 specifically comprises the following steps:
step 2.1: the LSTM network is expanded into a 23-layer neural network, and the same weight parameters are used for every layer;
step 2.2: let the input of the first layer be d_{t-22} = p_{t-22} - p_{t-23}, where p_{t-22} and p_{t-23} are the electricity prices at time steps t-22 and t-23 respectively, y_{t-22} is the output of the first layer, W and R are the weight matrices of the LSTM gate structure, and c_{t-22} is the cell state containing past electricity price information;
step 2.3: y_{t-22} and c_{t-22}, which carry the past electricity price information, are passed on through the second layer up to the last layer, whose output is the predicted electricity price one time step into the future;
step 2.4: and repeating the steps 2.1-2.3 until the future 24-hour electricity price is predicted in an iterative mode.
3. The intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning of claim 1, wherein the step 3 specifically comprises the following steps:
step 3.1: initialize an experience pool D, the estimated action-value network parameters θ of Q_θ and the target action-value network parameters θ^- of Q_{θ^-}; the initial SOC and the arrival and departure times of the electric vehicle are each drawn from a truncated normal distribution;
step 3.2: the electric vehicle load-control agent has 7 power selections, and the action space is recorded as A = [6 kW, 4 kW, 2 kW, 0 kW, -2 kW, -4 kW, -6 kW]; with probability ε the electric vehicle greedily selects the action argmax_a Q(s_t, a; θ), and with probability (1-ε) it randomly selects an action a_t, where s_t is the environment state at time step t, a is an action selectable in state s_t, and θ denotes the parameters of the Q network;
step 3.3: the observed state at time step t is s_t = (u_t, E_t, P_{t-23}, ..., P_t), where (P_{t-23}, ..., P_t) are the hourly electricity prices of the 24 hours up to time step t, E_t is the remaining energy in the battery of the electric vehicle, and u_t indicates whether the EV is at home;
step 3.4: the state transition is s_{t+1} = f(s_t, a_t); the transition of E_t is controlled by the action a_t at time step t and is represented by the deterministic battery model E_{t+1} = E_t + a_t; for u_t and P_t, because the arrival time, the departure time and the next-hour electricity price are unknown, the state transition is stochastic;
step 3.5: the reward function is r_t = -(d*n*a_t*p)/10 - λ*((1-soc)*C)^2, where d is the proportion of time occupied by a complete charge or discharge of the vehicle: during discharging d = soc/(rate/C) and during charging d = (1-soc)/(rate/C), with rate the charging/discharging power; n is the time step, taken here as one hour; p is the real-time electricity price; λ is the coefficient of the penalty term, which is subtracted only if t+1 is the last time step and the electric vehicle is not fully charged; C is the battery capacity; and soc is the ratio of the remaining battery energy of the electric vehicle to the battery capacity;
step 3.6: record (s_t, a_t, r_t, s_{t+1}) into the experience pool D;
step 3.7: randomly select a mini-batch of quadruples (s_j, a_j, r_j, s_{j+1}) from the replay memory D, where #F is the number of tuples in the mini-batch and j = 1, 2, ..., #F;
step 3.8: from the target action-value network parameters θ^-, calculate a target action value independent of the estimated-value network parameters θ as y_j = r_j + γ*max_{a'} Q(s_{j+1}, a'; θ^-), where θ^- are the parameters of the target network, γ is the discount coefficient with value range [0, 1], and Q is the action value;
step 3.9: minimize the loss function L(θ) = (1/#F)*Σ_j (y_j - Q(s_j, a_j; θ))^2 and back-propagate with the gradient descent method to update the estimated-value network parameters θ;
step 3.10: repeat steps 3.2-3.9, copying the estimated action-value network parameters to the target action-value network parameters every set number of steps to update the target network;
step 3.11: repeat steps 3.1-3.10 until a strategy π that maximizes the cumulative reward value R = Σ_t γ^t*r_t is learned;
step 3.12: in standard DQN the action with the largest Q value is selected as a_t; under the competitive (dueling) structure the action-value function is decomposed as Q(s, a) = V(s) + A(s, a), and the optimal action is a* = argmax_a A(s, a), where V(s) is the state value function and A(s, a) is the action advantage function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110989593.9A CN113627993A (en) | 2021-08-26 | 2021-08-26 | Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110989593.9A CN113627993A (en) | 2021-08-26 | 2021-08-26 | Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113627993A true CN113627993A (en) | 2021-11-09 |
Family
ID=78387939
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110989593.9A Pending CN113627993A (en) | 2021-08-26 | 2021-08-26 | Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113627993A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110535157A (en) * | 2018-05-24 | 2019-12-03 | 三菱电机(中国)有限公司 | The discharge control device and discharge control method of electric car |
CN110276638A (en) * | 2019-05-29 | 2019-09-24 | 南京邮电大学 | A kind of Electricity price forecasting solution and system based on two-way shot and long term neural network |
CN110588432A (en) * | 2019-08-27 | 2019-12-20 | 深圳市航通北斗信息技术有限公司 | Electric vehicle, battery management method thereof and computer-readable storage medium |
Non-Patent Citations (1)
Title |
---|
ZHIQIANG WAN et al.: "Model-Free Real-Time EV Charging Scheduling Based on Deep Reinforcement Learning", IEEE Transactions on Smart Grid, vol. 10, no. 5, pages 5246-5257, XP011741247, DOI: 10.1109/TSG.2018.2879572 *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114139653A (en) * | 2021-12-15 | 2022-03-04 | 中国人民解放军国防科技大学 | Intelligent agent strategy obtaining method based on adversary action prediction and related device |
CN114254765A (en) * | 2022-03-01 | 2022-03-29 | 之江实验室 | Active sequence decision method, device and medium for simulation deduction |
CN114692310A (en) * | 2022-04-14 | 2022-07-01 | 北京理工大学 | Virtual-real integration-two-stage separation model parameter optimization method based on Dueling DQN |
CN114844083A (en) * | 2022-05-27 | 2022-08-02 | 深圳先进技术研究院 | Electric vehicle cluster charging and discharging management method for improving stability of energy storage system |
CN114844083B (en) * | 2022-05-27 | 2023-02-17 | 深圳先进技术研究院 | Electric automobile cluster charging and discharging management method for improving stability of energy storage system |
CN114997935A (en) * | 2022-07-19 | 2022-09-02 | 东南大学溧阳研究院 | Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization |
CN114997935B (en) * | 2022-07-19 | 2023-04-07 | 东南大学溧阳研究院 | Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization |
CN115293100A (en) * | 2022-09-30 | 2022-11-04 | 深圳市威特利电源有限公司 | Accurate evaluation method for residual electric quantity of new energy battery |
CN117863948A (en) * | 2024-01-17 | 2024-04-12 | 广东工业大学 | Distributed electric vehicle charging control method and device for auxiliary frequency modulation |
CN117863948B (en) * | 2024-01-17 | 2024-06-11 | 广东工业大学 | Distributed electric vehicle charging control method and device for auxiliary frequency modulation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113627993A (en) | Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning | |
CN109347149B (en) | Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning | |
CN110341690B (en) | PHEV energy management method based on deterministic strategy gradient learning | |
CN113511082B (en) | Hybrid electric vehicle energy management method based on rule and double-depth Q network | |
CN112117760A (en) | Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning | |
CN114997631B (en) | Electric vehicle charging scheduling method, device, equipment and medium | |
CN112614009A (en) | Power grid energy management method and system based on deep expected Q-learning | |
CN113515884A (en) | Distributed electric vehicle real-time optimization scheduling method, system, terminal and medium | |
CN114997935B (en) | Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization | |
CN116436019B (en) | Multi-resource coordination optimization method, device and storage medium | |
CN116451880B (en) | Distributed energy optimization scheduling method and device based on hybrid learning | |
CN113011101B (en) | Control method and system for energy storage to participate in frequency modulation auxiliary service optimization | |
CN117057553A (en) | Deep reinforcement learning-based household energy demand response optimization method and system | |
CN115587645A (en) | Electric vehicle charging management method and system considering charging behavior randomness | |
CN113110052A (en) | Hybrid energy management method based on neural network and reinforcement learning | |
CN114619907B (en) | Coordinated charging method and coordinated charging system based on distributed deep reinforcement learning | |
CN117318169A (en) | Active power distribution network scheduling method based on deep reinforcement learning and new energy consumption | |
CN118137582A (en) | Multi-target dynamic scheduling method and system based on regional power system source network charge storage | |
CN114611811B (en) | Low-carbon park optimal scheduling method and system based on EV load participation | |
CN116384845A (en) | Electric automobile demand response and charging scheduling method based on deep reinforcement learning | |
CN113555888B (en) | Micro-grid energy storage coordination control method | |
CN114154718A (en) | Day-ahead optimization scheduling method of wind storage combined system based on energy storage technical characteristics | |
CN112613229A (en) | Energy management method and model training method and device for hybrid power equipment | |
CN117863969B (en) | Electric automobile charge and discharge control method and system considering battery loss | |
CN118095783B (en) | Electric automobile charging planning method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |