CN114997935B - Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization - Google Patents
- Publication number
- CN114997935B (grant of application CN202210848364.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06Q30/0201: Market modelling; market analysis; collecting market data
- G06Q30/0206: Price or cost determination based on market factors
- G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06Q50/06: Energy or water supply
- Y02T90/167: Smart grids as interface for battery charging of electric vehicles [EV] or hybrid vehicles [HEV]
Abstract
The invention discloses an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization, relating to the field of electric vehicle charging and discharging scheduling. The method comprises the following steps: first, an electric vehicle charging and discharging model is constructed according to the actual physical mechanism, and the charging and discharging deployment optimization problem is modeled as a constrained sequential decision problem. Then, a long short-term memory neural network extracts the future trend of the time-varying electricity price to assist the subsequent deep reinforcement learning in strategy optimization. Finally, the extracted electricity price features and the internal state of the electric vehicle are fed into a policy function based on a deep neural network, the deployment agent learns charging and discharging actions, and the electric quantity constraint is expanded into the optimization target of interior point strategy optimization through a logarithmic barrier function. The proposed deployment optimization method minimizes the user's charging cost while meeting the electricity demand of the electric vehicle, and improves the adaptability of the strategy to uncertainty.
Description
Technical Field
The invention relates to the field of electric vehicle charging and discharging scheduling, in particular to a real-time electric vehicle charging and discharging strategy deployment optimization method based on interior point strategy optimization.
Background
As an environmentally friendly alternative to traditional fossil fuel automobiles, electric automobiles have been widely adopted over the past few years. However, the rapid development of EVs inevitably causes large-scale electric vehicle clusters to be integrated into the power grid, which poses great challenges to the economic and safe operation of the power grid. To address this issue, electric vehicles may be motivated by demand response to shift charging time to off-peak hours and optimize electric vehicle charging costs based on dynamic electricity prices, even by discharging to the grid to gain revenue.
The essence of the EV optimal scheduling problem is scheduling the charging and discharging states of the electric vehicle in a stochastic scenario with multiple uncertain factors. Deep reinforcement learning is well suited to finding the optimal strategy in a complex uncertain environment and is an effective method for solving sequential decision problems. However, when the user's trip electricity demand must be satisfied as a constraint, a conventional deep reinforcement learning method needs a correctly designed penalty term and penalty coefficient to ensure that the electric vehicle is fully charged when leaving. Selecting proper penalty coefficients requires a great deal of time and effort, is a tedious process, and once the designed penalty coefficients are improper the performance of the algorithm drops sharply.
Disclosure of Invention
The invention aims to provide an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization, which does not depend on accurate prediction of future information and performs self-optimizing strategy learning relying only on real-time perception of the environment state; the adopted long short-term memory neural network effectively extracts the time-series features of the future trend of the time-varying electricity price; in addition, the interior point strategy optimization algorithm takes the user's trip electricity demand as a constraint premise, minimizing the user's charging cost while improving the adaptability of the strategy to uncertainty.
In order to meet the requirements, the invention adopts the following technical scheme:
an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization comprises the following steps:
step1: modeling a slow charging process of the electric vehicle under the dynamic electricity price, describing the state space of the EV storage battery, considering the power demand constraint of a vehicle owner, and modeling a charge and discharge strategy deployment problem of the electric vehicle as a constraint Markov decision problem;
step2: extracting future trend of time-varying power price by using the long-short term memory neural network, assisting subsequent deep reinforcement learning to optimize strategies, and realizing effective deployment of EV charge-discharge strategies;
step3: and an interior point-based strategy optimization algorithm is adopted, the logarithmic barrier function is utilized to convert the electric quantity constraint condition, the strategy deployment is optimized in the deep neural network, and the user charging cost is minimized.
Step4: and interacting the external environment with the deployment intelligent agent according to the strategy obtained by training to obtain a real-time electric vehicle charging and discharging decision.
Further, in Step1, the model of electric vehicle slow charging under the dynamic electricity price is:

E_{t+1} = E_t + η_ch·a_t·Δt, when a_t ≥ 0 (charging)
E_{t+1} = E_t + a_t·Δt/η_dis, when a_t < 0 (discharging)
subject to -P_dis^max ≤ a_t ≤ P_ch^max and E_min ≤ E_t ≤ E_max, for t ∈ [t_0, t_1]

In the formula: t_0 and t_1 respectively denote the arrival time and departure time of the electric vehicle, both falling at the beginning of a time step; E_0 is the remaining electric quantity when charging and discharging begins, and E_t and E_{t+1} are the SOC values at times t and t+1; η_ch and η_dis are the EV battery energy conversion efficiencies during charging and discharging; P_ch^max and P_dis^max are the maximum charging and discharging powers, respectively; Δt is the duration of each charging action; a_t is the charging/discharging power of the electric vehicle: when a_t > 0 the electric vehicle charges, otherwise it discharges.
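Under illustrative parameter values (the efficiencies, power limits, capacity range and step length below are assumptions for illustration, not values taken from the patent), the battery dynamics above can be sketched as:

```python
# Sketch of the slow-charging battery model: E_{t+1} from E_t and action a_t.
ETA_CH, ETA_DIS = 0.95, 0.95      # charge/discharge conversion efficiency (assumed)
P_CH_MAX, P_DIS_MAX = 6.6, 6.6    # max charge/discharge power in kW (assumed)
E_MIN, E_MAX = 0.0, 24.0          # usable battery capacity range in kWh (assumed)
DT = 1.0                          # duration of one decision step in hours (assumed)

def soc_step(e_t: float, a_t: float) -> float:
    """Advance the battery state E_t -> E_{t+1} for charging power a_t (kW).

    a_t > 0 charges (conversion losses shrink the stored energy),
    a_t < 0 discharges (losses enlarge the energy drawn from the battery).
    """
    a_t = max(-P_DIS_MAX, min(P_CH_MAX, a_t))      # enforce power limits
    if a_t >= 0:
        e_next = e_t + ETA_CH * a_t * DT
    else:
        e_next = e_t + a_t * DT / ETA_DIS
    return max(E_MIN, min(E_MAX, e_next))          # enforce capacity limits

print(soc_step(10.0, 4.0))   # charging step: 10 + 0.95*4 = 13.8
print(soc_step(10.0, -4.0))  # discharging step: 10 - 4/0.95
```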
The electric vehicle charging and discharging deployment agent and the environment: the deployment agent learns from experience through interaction with the environment and optimizes the charging and discharging deployment strategy; the environment observed by the agent is divided into two parts, one being the real-time SOC value of the electric vehicle and the other the time-varying electricity price at the current time node.
State set S: the environment state at time t may be defined as:

s_t = (E_t, P_{t-23}, ..., P_t)

Two types of information are contained in the formula: E_t, the SOC value of the electric vehicle at time t, and (P_{t-23}, ..., P_t), the electricity prices over the past 24 hours.

Action set A: the action a_t at time t represents the charging/discharging power of the electric vehicle per unit time, and a_t × Δt is the amount of electric energy converted by the electric vehicle during the period Δt.
The core of the constrained Markov decision process is to give an optimal strategy that maximizes the reward function r_t on the premise of satisfying the constraint function c_t, so the optimization goal of electric vehicle charging deployment is to minimize the charging cost under the constraint of meeting the owner's electricity demand:

(1) Charging fee, i.e. reward value:

r_t = R(s_t, a_t, s_{t+1}) = -a_t × P_t

In the formula: during charging, the reward is the negative of the charging cost, i.e. minus the product of the electricity price at time t and the charging power per unit time; during discharging, the reward represents the revenue from selling electricity to the grid.

(2) Electric quantity constraint, i.e. constraint value:

c_t = |E_t - E_target| at the departure time, together with E_t - E_max ≤ 0 and E_min - E_t ≤ 0

wherein |E_t - E_target| is the deviation between the battery capacity E_t at the moment charging completes and the charging target E_target; E_t - E_max and E_min - E_t express that, in line with the actual physical mechanism, charging and discharging must stay within the EV battery capacity range.
The learning goal of the electric vehicle charging and discharging deployment agent is to maximize the T-period total expected discounted reward J(π) while satisfying the electric quantity constraint J_C(π), and the objective function is expressed as:

max_{π ∈ Π_C} J(π), where Π_C = {π : J_C(π) ≤ d}

In the formula: γ is the discount factor balancing the current constraint value against future constraint values; d denotes a small constraint violation tolerance.
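A minimal sketch of the discounted constraint return J_C used in the feasibility condition above; the discount factor, per-step constraint values and tolerance d are made-up illustration values:

```python
# Discounted sum over a trajectory: J = sum_t gamma^t * v_t.
def discounted_sum(values, gamma=0.99):
    total = 0.0
    for t, v in enumerate(values):
        total += (gamma ** t) * v
    return total

costs = [0.5, 0.3, 0.1, 0.0]      # illustrative per-step constraint values c_t
j_c = discounted_sum(costs)       # discounted constraint return J_C(pi)
d = 1.0                           # illustrative constraint tolerance
print(j_c)
print(j_c <= d)                   # the policy is feasible if J_C <= d
```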
Further, in Step2, the feature information of the future trend of the time-varying electricity price is extracted with the long short-term memory neural network, specifically as follows:

The calculation process of the long short-term memory neural network and prediction module is as follows: the LSTM network unrolls into a 23-layer neural network structure, in which the input of the first layer is X_{t-22} = P_{t-22} - P_{t-23}, where P_{t-22} and P_{t-23} represent the time-varying electricity prices at times t-22 and t-23, respectively; y_{t-22} represents the output of the first layer and c_{t-22} its cell state. y_{t-22} and c_{t-22}, which carry the past electricity price information, are passed to the next layer, and this process repeats until the last layer.
The LSTM protects and controls the cell state through a gate mechanism to achieve selective transmission of memory information, comprising a forget gate, an input gate and an output gate:

O_forget(t) = σ(W_yf·y_{t-1} + W_xf·x_t + b_f)
O_input(t) = σ(W_yi·y_{t-1} + W_xi·x_t + b_i)
O_out(t) = σ(W_yo·y_{t-1} + W_xo·x_t + b_o)

In the formula: O_forget(t), O_input(t) and O_out(t) respectively represent the output matrices of the forget gate, input gate and output gate at time t; W_yf, W_xf, W_yi, W_xi, W_yo and W_xo respectively represent the connection weight matrices of the forget gate, input gate and output gate with the output y_{t-1} at time t-1 and the input x_t at time t; b_f, b_i and b_o respectively represent the bias vectors of the gates on the corresponding branches; σ represents the activation function.
Therefore, the calculation formula for extracting the future trend of the time-series electricity price is:

O_z(t) = tanh(W_yz·y_{t-1} + W_xz·x_t + b_z)
c_t = O_forget(t) ⊙ c_{t-1} + O_input(t) ⊙ O_z(t)
y_t = O_out(t) ⊙ tanh(c_t)

In the formula: O_z(t) is the preprocessed information input to the cell state module at time t; W_yz and W_xz respectively represent the connection weight matrices of the output y_{t-1} at time t-1 and the input x_t at time t with O_z(t); b_z is a bias vector; ⊙ denotes the Hadamard product of matrices; tanh is the activation function.
Further, in Step3, the electric quantity constraint is expanded with a logarithmic barrier function, specifically as follows:

For each constraint to be satisfied, an indicator function I(·) is set, satisfying:

I(J_C(π_θ)) = 0 if J_C(π_θ) ≤ d; I(J_C(π_θ)) = -∞ otherwise

In the formula: when the constraint condition is satisfied under strategy π_θ, the problem converts into an unconstrained strategy optimization problem that considers only the reward; however, when any constraint is violated, the penalty is -∞, and the strategy must first be adjusted to satisfy the constraint. The logarithmic barrier function, a differentiable approximation of the indicator function, is:

φ(J_C) = log(d - J_C) / k

where k is a hyperparameter, and the larger the value of k, the better the fit to the indicator function. The target is thus expanded through the indicator function, and the original CMDP problem simplifies into an unconstrained optimization problem.
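A small numeric sketch of the indicator function and its log-barrier surrogate; the tolerance d, constraint value and k values are illustrative:

```python
import math

def indicator(j_c, d):
    """Exact indicator: 0 inside the feasible region, -inf outside."""
    return 0.0 if j_c <= d else float("-inf")

def log_barrier(j_c, d, k):
    """Differentiable surrogate log(d - j_c)/k, defined for j_c < d."""
    return math.log(d - j_c) / k

d = 1.0
for k in (2, 10, 100):
    # as k grows, the barrier approaches the indicator's value 0 inside the
    # feasible region, while still diverging to -inf as j_c -> d
    print(k, log_barrier(0.5, d, k))

print(indicator(0.5, d))   # feasible point
print(indicator(1.5, d))   # infeasible point
```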
The interior point strategy optimization inherits the framework of the proximal policy optimization algorithm and adopts an actor-critic architecture; during training, mini-batches of data are randomly sampled from the experience replay pool for strategy updates. The critic network updates its parameters θ_v with the temporal-difference error method, i.e. by minimizing the squared error δ_t² with

δ_t = r_t + γ·V_{θ_v}(s_{t+1}) - V_{θ_v}(s_t)

The proximal policy optimization applies a first-order approximation and uses Monte Carlo sampling to approximate the expectation, then obtains the objective function L^CLIP through a clipping function:

L^CLIP(θ) = E_t[min(ξ_t(θ)·Â_t, clip(ξ_t(θ), 1-ε, 1+ε)·Â_t)]

In the formula: ξ_t(θ) represents the ratio of the new policy to the old policy; Â_t represents the advantage function; the clip(·) function restricts ξ_t to the interval [1-ε, 1+ε] defined by the hyperparameter ε, which simplifies the calculation.
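A minimal sketch of the clipped surrogate objective, averaged over a sampled mini-batch as a Monte Carlo approximation of the expectation; the ratios and advantages are illustrative numbers, not outputs of a trained network:

```python
def clip(v, lo, hi):
    return max(lo, min(hi, v))

def l_clip(ratios, advantages, eps=0.2):
    """Mean of min(xi*A, clip(xi, 1-eps, 1+eps)*A) over the mini-batch."""
    terms = []
    for xi, adv in zip(ratios, advantages):
        terms.append(min(xi * adv, clip(xi, 1 - eps, 1 + eps) * adv))
    return sum(terms) / len(terms)

ratios = [0.9, 1.5, 1.0]    # xi_t = pi_new / pi_old per sample (made up)
advs = [1.0, 1.0, -2.0]     # advantage estimates per sample (made up)
# the second sample's ratio 1.5 is clipped to 1.2, limiting the update size
print(l_clip(ratios, advs))
```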
The interior point strategy optimization expands the constraint condition into the objective function through the logarithmic barrier function, which not only satisfies the long-horizon coupling constraint but also yields a trust-region correction method compatible with stochastic gradient descent; the final objective function under parameter θ is:

L(θ) = L^CLIP(θ) + log(d - J_C(π_θ)) / k
further, the Step4 specifically includes the following steps:
the decision of the intelligent agent after on-line deployment only depends on the actuator network which is trained, and the optimal parameter theta obtained by training the actuator network is loaded * And the neural network model is used for interacting with the intelligent agent according to the state information to obtain a real-time charging and discharging decision. And continuously repeating the interaction process until the electric automobile leaves the charging pile.
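The online deployment loop above (observe state, query the actor, apply the action, repeat until departure) can be sketched as follows; the policy is a random stand-in for the trained actor network, and all numeric parameters are assumptions:

```python
import random

def trained_actor(state):
    # placeholder for the trained actor network pi_theta*(s_t)
    return random.uniform(-6.6, 6.6)

def deploy(e_0, prices, departure_step):
    """Run the observe -> act -> step loop until the EV leaves the charger."""
    e_t, decisions = e_0, []
    for t in range(departure_step):
        state = (e_t, prices[max(0, t - 23):t + 1])   # SOC + last 24h prices
        a_t = trained_actor(state)
        decisions.append(a_t)
        # simplified SOC update with assumed 0.95 conversion efficiency
        e_t += 0.95 * a_t if a_t >= 0 else a_t / 0.95
    return decisions

random.seed(0)
schedule = deploy(e_0=8.0, prices=[0.1] * 48, departure_step=12)
print(len(schedule))   # one charging/discharging decision per time step
```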
Beneficial effects:

The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization does not depend on accurate prediction of future information and performs self-optimizing strategy learning relying only on real-time perception of the environment state; the adopted long short-term memory neural network effectively extracts the time-series features of the future trend of the time-varying electricity price; in addition, the interior point strategy optimization algorithm takes the user's trip electricity demand as a constraint premise, minimizing the user's charging cost while improving the adaptability of the strategy to uncertainty.
Drawings
FIG. 1 is a diagram of the long short-term memory neural network-based time-series feature extraction and policy network of the present invention.
FIG. 2 is a flow chart of training of the interior point strategy optimization algorithm of the present invention.
FIG. 3 is a diagram of a Markov decision process of the present invention.
FIG. 4 is a diagram illustrating reward values under the interior point policy optimization algorithm of the present invention.
FIG. 5 is a diagram illustrating constraint values under the interior point policy optimization algorithm of the present invention.
FIG. 6 is a real-time electric vehicle charging and discharging schedule according to the present invention.
Detailed description of the preferred embodiment
The following detailed description of the embodiments of the present invention is provided with reference to FIG. 1:
the invention provides an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization, which comprises the following steps:
step1: modeling the slow charging process of the electric automobile under the dynamic electricity price, describing the state space of the EV storage battery, considering the constraint of the electricity demand of an automobile owner, and modeling the charge and discharge strategy deployment problem of the electric automobile as a constraint Markov decision problem, specifically comprising:
in the formula: t is t 0 And t 1 Respectively representing the arrival time and the departure time of the electric automobile, wherein the arrival time and the departure time are both generated at the beginning of time; e 0 Is the remaining electric quantity at the beginning of charging and discharging of the electric vehicle, E t And E t+1 SOC values at t and t + 1; eta ch And η dis The EV battery energy conversion efficiency during charging and discharging;and &>Respectively charge and discharge maximum power; Δ t is the duration of each charging action; a is t For charging and discharging power of electric vehicle, when a t If the voltage is greater than 0, the electric automobile is charged, otherwise, the electric automobile is discharged.
The electric vehicle charging and discharging deployment agent and the environment: the deployment agent learns from experience through interaction with the environment and optimizes the charging and discharging deployment strategy; the environment observed by the agent can be divided into two parts, one being the real-time SOC value of the electric vehicle and the other the time-varying electricity price at the current time node.
State set S: the environment state at time t may be defined as:

s_t = (E_t, P_{t-23}, ..., P_t)

Two types of information are contained in the formula: E_t, the SOC value of the electric vehicle at time t, and (P_{t-23}, ..., P_t), the electricity prices over the past 24 hours.

Action set A: the action a_t at time t represents the charging/discharging power of the electric vehicle per unit time, and a_t × Δt is the amount of electric energy converted by the electric vehicle during the period Δt.
The core of the constrained Markov decision process is to give an optimal strategy that maximizes the reward function r_t on the premise of satisfying the constraint function c_t, so the optimization goal of electric vehicle charging deployment is to minimize the charging cost under the constraint of meeting the owner's electricity demand:

(1) Charging fee, i.e. reward value:

r_t = R(s_t, a_t, s_{t+1}) = -a_t × P_t

In the formula: during charging, the reward is the negative of the charging cost, i.e. minus the product of the electricity price at time t and the charging power per unit time; during discharging, the reward represents the revenue from selling electricity to the grid.

(2) Electric quantity constraint, i.e. constraint value:

c_t = |E_t - E_target| at the departure time, together with E_t - E_max ≤ 0 and E_min - E_t ≤ 0

wherein |E_t - E_target| is the deviation between the battery capacity E_t at the moment charging completes and the charging target E_target; E_t - E_max and E_min - E_t express that, in line with the actual physical mechanism, charging and discharging must stay within the EV battery capacity range.

The learning goal of the electric vehicle charging and discharging deployment agent is to maximize the T-period total expected discounted reward J(π) while satisfying the electric quantity constraint J_C(π), and the objective function is expressed as:

max_{π ∈ Π_C} J(π), where Π_C = {π : J_C(π) ≤ d}
step2: the method includes the steps that the future trend of the time-varying electricity price is extracted by using a long-term and short-term memory neural network, follow-up deep reinforcement learning is assisted to carry out strategy optimization, effective deployment of EV charge and discharge strategies is achieved, and the method specifically comprises the following steps:
the long-short term memory neural network and prediction module comprises the following calculation processes: the LSTM network expands into a 23-layer neural network structure. Wherein the input X of the first layer t-22 Is represented by X t-22 =P t-22 -P t-23 In which P is t-22 And P t-23 Representing the time-varying electricity prices at time t-22 and time t-21, respectively. y is t-22 Represents the output of the first layer, c t-22 Indicating its cellular state. Containing past electricity price information y t-22 And c t-22 Is passed to the next layer. This process is repeated until the last layer.
The LSTM protects and controls the cell state through a gate mechanism to achieve selective transmission of memory information, comprising a forget gate, an input gate and an output gate:

O_forget(t) = σ(W_yf·y_{t-1} + W_xf·x_t + b_f)
O_input(t) = σ(W_yi·y_{t-1} + W_xi·x_t + b_i)
O_out(t) = σ(W_yo·y_{t-1} + W_xo·x_t + b_o)

In the formula: O_forget(t), O_input(t) and O_out(t) respectively represent the output matrices of the forget gate, input gate and output gate; W_yf, W_xf, W_yi, W_xi, W_yo and W_xo respectively represent the connection weight matrices of the forget gate, input gate and output gate with the output y_{t-1} at time t-1 and the input x_t at time t; b_f, b_i and b_o respectively represent the bias vectors of the gates on the corresponding branches; σ represents the activation function.

Therefore, the calculation formula for extracting the future trend of the time-series electricity price is:

O_z(t) = tanh(W_yz·y_{t-1} + W_xz·x_t + b_z)
c_t = O_forget(t) ⊙ c_{t-1} + O_input(t) ⊙ O_z(t)
y_t = O_out(t) ⊙ tanh(c_t)

In the formula: O_z(t) is the preprocessed information input to the cell state module at time t; W_yz and W_xz respectively represent the connection weight matrices of the output y_{t-1} at time t-1 and the input x_t at time t with O_z(t); b_z is a bias vector; ⊙ denotes the Hadamard product of matrices; tanh is the activation function.
Step3: the method adopts an interior point-based strategy optimization algorithm, utilizes a logarithmic barrier function to convert an electric quantity constraint condition, optimizes strategy deployment in a deep neural network, and minimizes user charging cost, and specifically comprises the following steps:
for each problem that the constraint satisfies, an index function is setSatisfies the following conditions:
in the formula: when in strategy pi θ Lower constraint conditionWhen the condition is met, the problem is converted into an unconstrained strategy optimization problem only considering rewards to solve; however, when any constraint violates, the penalty is- ∞, requiring preferential tuning of the policy to satisfy the constraint. The logarithmic barrier function being an index function>A micro-approximation of:
where k is a hyperparameter, the larger the value of k, the larger the index functionThe better the fit. Thus passing through the index functionAnd (4) expanding the target and simplifying the original CMDP problem into an unconstrained optimization problem.
The interior point strategy optimization inherits the framework of the proximal policy optimization algorithm and adopts an actor-critic architecture; during training, mini-batches of data are randomly sampled from the experience replay pool for strategy updates. The critic network updates its parameters θ_v with the temporal-difference error method, i.e. by minimizing the squared error δ_t² with

δ_t = r_t + γ·V_{θ_v}(s_{t+1}) - V_{θ_v}(s_t)

The proximal policy optimization applies a first-order approximation and uses Monte Carlo sampling to approximate the expectation, then obtains the objective function L^CLIP through a clipping function:

L^CLIP(θ) = E_t[min(ξ_t(θ)·Â_t, clip(ξ_t(θ), 1-ε, 1+ε)·Â_t)]

In the formula: ξ_t(θ) represents the ratio of the new policy to the old policy; Â_t represents the advantage function; the clip(·) function restricts ξ_t to the interval [1-ε, 1+ε] defined by the hyperparameter ε, which simplifies the calculation.

The interior point strategy optimization expands the constraint condition into the objective function through the logarithmic barrier function, which not only satisfies the long-horizon coupling constraint but also yields a trust-region correction method compatible with stochastic gradient descent; the final objective function under parameter θ is:

L(θ) = L^CLIP(θ) + log(d - J_C(π_θ)) / k
step4: according to the strategy obtained by training, interacting the external environment with the deployment intelligent body to obtain a real-time electric vehicle charging and discharging decision, which specifically comprises the following steps:
the decision of the intelligent agent after on-line deployment only depends on the actuator network which is trained, and the optimal parameter theta obtained by the training of the actuator network is loaded * And the neural network model is used for interacting with the intelligent agent according to the state information to obtain a real-time charging and discharging decision. And continuously repeating the interaction process until the electric automobile leaves the charging pile.
Claims (4)
1. An electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization is characterized by comprising the following steps:
step1: modeling the slow-charging process of the electric vehicle under dynamic electricity prices, describing the EV battery state space, considering the vehicle owner's electricity-demand constraint, and modeling the electric vehicle charging and discharging strategy deployment problem as a constrained Markov decision problem;
step2: extracting the future trend of the time-varying electricity price with a long short-term memory neural network to assist the subsequent deep reinforcement learning in strategy optimization and realize effective deployment of EV charging and discharging strategies;
step3: adopting an interior-point-based policy optimization algorithm, converting the electric quantity constraint conditions with a logarithmic barrier function, and optimizing strategy deployment in a deep neural network so as to minimize the user's charging cost;
step4: according to the strategy obtained by training, interacting the deployed agent with the external environment to obtain real-time electric vehicle charging and discharging decisions;
in the Step1, the model of the electric vehicle slow-charging process under dynamic electricity prices is:

E_{t+1} = E_t + η_ch · a_t · Δt, a_t ≥ 0
E_{t+1} = E_t + a_t · Δt / η_dis, a_t < 0
−P_dis^max ≤ a_t ≤ P_ch^max, t_0 ≤ t ≤ t_1

in the formula: t_0 and t_1 respectively represent the arrival time and departure time of the electric vehicle, both taken at the beginning of a time period; E_0 is the remaining capacity when charging/discharging begins; E_t and E_{t+1} are the SOC values at times t and t+1; η_ch and η_dis are the EV battery energy conversion efficiencies during charging and discharging; P_ch^max and P_dis^max are respectively the maximum charging and discharging powers; Δt is the duration of each charging action; a_t is the charging/discharging power of the electric vehicle: when a_t > 0 the electric vehicle charges, otherwise it discharges;
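The state-transition rule described above can be sketched as a one-step SOC update (the efficiency and capacity values below are illustrative placeholders, not the patent's parameters):

```python
def soc_step(E_t, a_t, dt=1.0, eta_ch=0.95, eta_dis=0.95,
             E_min=0.0, E_max=24.0):
    """One charging/discharging step: a_t > 0 charges, a_t < 0 discharges.
    Charging stores eta_ch * a_t * dt in the battery; discharging delivers
    |a_t| * dt to the grid, drawing |a_t| * dt / eta_dis from the battery.
    The result is clipped to the battery capacity range [E_min, E_max]."""
    if a_t >= 0:
        E_next = E_t + eta_ch * a_t * dt
    else:
        E_next = E_t + a_t * dt / eta_dis
    return min(max(E_next, E_min), E_max)
```

Note the asymmetry: both conversion losses reduce the stored energy relative to the ideal exchange, which is why η_ch multiplies on charge but η_dis divides on discharge.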
Electric vehicle charging/discharging deployment agent and environment: the agent learns from experience through interaction with the environment and optimizes the charging/discharging deployment strategy; the environment observed by the agent consists of two parts: the real-time SOC value of the electric vehicle, and the time-varying electricity price at the current time node;
State set S: the environmental state at time t may be defined as:

S_t = (E_t, P_{t−23}, ..., P_t)

the formula contains the following information: E_t, the SOC value of the electric vehicle at time t; and (P_{t−23}, ..., P_t), the electricity prices over the past 24 hours;
Action set A: the action a_t at time t represents the charging/discharging power of the electric vehicle per unit time; a_t × Δt is the amount of electric energy converted by the electric vehicle within the period Δt;
the core of the constrained Markov decision process is to give an optimal policy that maximizes the reward function r_t on the premise of satisfying the constraint function c_t; the optimization goal of electric vehicle charging deployment is therefore to minimize the charging cost under the constraint of meeting the vehicle owner's electricity demand:
(1) Charging fee, i.e. the reward value:

r_t = R(s_t, a_t, s_{t+1}) = −a_t × P_t

in the formula: during charging, the reward is the product of the charging power per unit time and the electricity price at time t taken with a negative sign, i.e. the negative of the charging cost; during discharging, the reward represents the revenue from selling electricity to the grid;
(2) Electric quantity constraint, i.e. the constraint value:

wherein |E_t − E_target| is the deviation between the battery capacity E_t at the moment charging is completed and the charging target E_target;
E_t − E_max and E_min − E_t incorporate the actual physical mechanism, ensuring that charging and discharging stay within the EV battery capacity range;
The learning goal of the electric vehicle charging/discharging deployment agent is to maximize the total expected discounted reward J over T periods while satisfying the electric quantity constraint; the objective function is expressed as:

max_π J(π) = E_π[ Σ_{t=0}^{T} γ^t r_t ]  s.t.  J_C(π) = E_π[ Σ_{t=0}^{T} γ^t c_t ] ≤ d

in the formula: γ is the discount factor used to balance the current constraint value against future constraint values; d represents a small constraint bound.
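The reward and constraint returns above can be evaluated for a sampled trajectory with a short helper (a generic CMDP bookkeeping sketch; the tolerance d and discount γ values are placeholders):

```python
def discounted_return(values, gamma=0.99):
    """Total discounted sum: sum_t gamma^t * values[t]."""
    return sum((gamma ** t) * v for t, v in enumerate(values))

def is_feasible(costs, d=0.1, gamma=0.99):
    """Check the CMDP constraint J_C(pi) <= d on one sampled cost trajectory."""
    return discounted_return(costs, gamma) <= d
```

The same `discounted_return` applies to rewards (to estimate J) and to constraint values c_t (to estimate J_C), which is how the single-trajectory estimates fed into the barrier term are formed.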
2. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, wherein in the Step2 the feature information of the future trend of the time-varying electricity price is extracted using a long short-term memory neural network, specifically comprising the following steps:
the calculation process of the long short-term memory neural network and prediction module is as follows: the LSTM network is unrolled into a 23-layer neural network structure, where the input of the first layer X_{t−22} is given by X_{t−22} = P_{t−22} − P_{t−23}, with P_{t−22} and P_{t−23} representing the time-varying electricity prices at times t−22 and t−23, respectively; y_{t−22} represents the output of the first layer and c_{t−22} its cell state; y_{t−22} and c_{t−22}, which contain the past electricity price information, are passed to the next layer; this process is repeated until the last layer;
The LSTM protects and controls the cell state through a gate mechanism, realizing selective transmission of memory information via a forget gate, an input gate and an output gate:

O_forget(t) = σ(W_yf · y_{t−1} + W_xf · x_t + b_f)
O_input(t) = σ(W_yi · y_{t−1} + W_xi · x_t + b_i)
O_out(t) = σ(W_yo · y_{t−1} + W_xo · x_t + b_o)

in the formula: O_forget(t), O_input(t), O_out(t) respectively represent the output matrices of the forget gate, input gate and output gate at time t; W_yf, W_xf, W_yi, W_xi, W_yo, W_xo respectively represent the connection weight matrices between the forget gate, input gate and output gate and the output y_{t−1} at time t−1 and the input x_t at time t; b_f, b_i, b_o respectively represent the bias vectors of the gates on the corresponding branches; σ represents the activation function;
therefore, the calculation for extracting the future trend of the time-series electricity price is:

O_z(t) = tanh(W_yz · y_{t−1} + W_xz · x_t + b_z)
c_t = O_forget(t) ∘ c_{t−1} + O_input(t) ∘ O_z(t)
y_t = O_out(t) ∘ tanh(c_t)

in the formula: O_z(t) is the preprocessed information input to the cell-state module at time t; W_yz and W_xz respectively represent the connection weight matrices between the output y_{t−1} at time t−1, the input x_t at time t, and O_z(t); b_z is the bias vector; ∘ denotes the Hadamard product of matrices; tanh is the activation function.
3. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, wherein in the Step3 the electric quantity constraint is expanded using a logarithmic barrier function, specifically comprising the following steps:
for each constraint to be satisfied, an indicator function is defined that satisfies:

I(x) = 0, x ≤ 0;  I(x) = −∞, x > 0

in the formula: when, under policy π_θ, the constraint conditions are satisfied, the problem reduces to an unconstrained policy optimization problem that considers only the reward; however, when any constraint is violated, the penalty is −∞, so the policy must first be adjusted to satisfy the constraints; the logarithmic barrier function approximates the indicator function as:

φ(x) = log(−x)/k, x < 0

where k is a hyperparameter: the larger the value of k, the better φ(x) fits the indicator function; by augmenting the objective through the indicator function in this way, the original CMDP problem is simplified into an unconstrained optimization problem;
the interior-point policy optimization inherits the framework of the proximal policy optimization algorithm and adopts an actor-critic architecture; during training, mini-batches of data are randomly sampled from the experience replay pool for policy updates; the critic network updates its parameters θ_v using the temporal-difference error method, with the specific calculation:

θ_v ← arg min_{θ_v} ( r_t + γ V_{θ_v}(s_{t+1}) − V_{θ_v}(s_t) )²

the proximal policy optimization applies a first-order approximation and uses Monte Carlo estimation of the expectation; the objective function L^CLIP is then obtained through a clipping function:

L^CLIP(θ) = Ê_t[ min( ξ_t(θ) Â_t, clip(ξ_t(θ), 1−ε, 1+ε) Â_t ) ]

in the formula: ξ_t(θ) = π_θ(a_t|s_t) / π_{θ_old}(a_t|s_t) represents the ratio of the new and old policies; Â_t represents the advantage function; the clip(·) function restricts ξ_t to the interval [1−ε, 1+ε] determined by the hyperparameter ε, which simplifies the calculation process;
the interior-point policy optimization expands the constraint conditions into the objective function through the logarithmic barrier function, so that the long-horizon coupling constraints are satisfied while a trust-region correction compatible with stochastic gradient descent is realized; the objective function under the final parameters θ is:

L^IPO(θ) = L^CLIP(θ) + Σ_i log( d − Ĵ_{C_i}(π_θ) ) / k
4. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, wherein the Step4 specifically comprises the following steps:
after online deployment, the decisions of the agent depend only on the trained actor network: the optimal parameters θ* obtained from actor-network training are loaded into the neural network model, and the agent interacts with the environment according to the state information to obtain real-time charging and discharging decisions; this interaction process is repeated continuously until the electric vehicle leaves the charging pile.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210848364.XA CN114997935B (en) | 2022-07-19 | 2022-07-19 | Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114997935A CN114997935A (en) | 2022-09-02 |
CN114997935B true CN114997935B (en) | 2023-04-07 |
Family
ID=83021907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210848364.XA Active CN114997935B (en) | 2022-07-19 | 2022-07-19 | Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114997935B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115731072B (en) * | 2022-11-22 | 2024-01-30 | 东南大学 | Micro-grid space-time perception energy management method based on safety deep reinforcement learning |
CN115936195B (en) * | 2022-11-23 | 2024-07-12 | 合肥工业大学 | Intelligent cell energy optimization method, system, electronic equipment and storage medium |
CN117689188B (en) * | 2024-02-04 | 2024-04-26 | 江西驴充充物联网科技有限公司 | Big data-based user charging strategy optimization system and method |
CN117863969B (en) * | 2024-03-13 | 2024-05-17 | 国网北京市电力公司 | Electric automobile charge and discharge control method and system considering battery loss |
CN118082598B (en) * | 2024-04-25 | 2024-10-11 | 国网天津市电力公司电力科学研究院 | Electric vehicle charging method, apparatus, device, medium, and program product |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113572157A (en) * | 2021-07-27 | 2021-10-29 | 东南大学 | User real-time autonomous energy management optimization method based on near-end policy optimization |
CN113627993A (en) * | 2021-08-26 | 2021-11-09 | 东北大学秦皇岛分校 | Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111934335B (en) * | 2020-08-18 | 2022-11-18 | 华北电力大学 | Cluster electric vehicle charging behavior optimization method based on deep reinforcement learning |
CN114619907B (en) * | 2020-12-14 | 2023-10-20 | 中国科学技术大学 | Coordinated charging method and coordinated charging system based on distributed deep reinforcement learning |
CN113922404B (en) * | 2021-10-22 | 2023-08-29 | 山东大学 | Community electric automobile cluster charging coordination method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||