CN114997935A - Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization - Google Patents


Info

Publication number: CN114997935A (granted as CN114997935B)
Application number: CN202210848364.XA
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: charging, time, electric vehicle, constraint, discharging
Inventors: 臧汉洲, 叶宇剑, 汤奕, 钱俊良, 周吉
Applicant and current assignee: Liyang Research Institute of Southeast University
Legal status: Granted; Active

Classifications

    • G06Q30/0201: Market modelling; market analysis; collecting market data
    • G06Q30/0206: Price or cost determination based on market factors
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06Q50/06: Electricity, gas or water supply
    • Y02T90/167: Smart grids as interface for battery charging of electric vehicles [EV] or hybrid vehicles [HEV]

Abstract

The invention discloses an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization, relating to the field of electric vehicle charging and discharging scheduling. The method comprises the following steps: first, an electric vehicle charging and discharging model is constructed according to the actual physical mechanism, and the EV charging/discharging deployment optimization problem is modeled as a constrained sequential decision problem. Then, a long short-term memory neural network extracts the future trend of the time-varying electricity price to assist the subsequent deep reinforcement learning in strategy optimization. Finally, the extracted electricity-price information and the internal state features of the electric vehicle are input into a strategy function based on a deep neural network, the deployment agent learns charging and discharging actions, and the battery-energy constraint is expanded into the optimization objective of interior point strategy optimization through a logarithmic barrier function for strategy optimization. The proposed deployment optimization method minimizes the user's charging cost on the premise of meeting the electric vehicle's electricity demand, while improving the strategy's adaptability to uncertainty.

Description

Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization
Technical Field
The invention relates to the field of electric vehicle charging and discharging scheduling, in particular to a real-time electric vehicle charging and discharging strategy deployment optimization method based on interior point strategy optimization.
Background
As an environmentally friendly alternative to traditional fossil-fuel automobiles, electric vehicles (EVs) have been widely adopted over the past few years. However, the rapid growth of EVs inevitably integrates large-scale EV clusters into the power grid, posing great challenges to its economic and safe operation. To address this issue, demand response can incentivize EVs to shift charging to off-peak hours, optimize charging cost against dynamic electricity prices, and even earn revenue by discharging to the grid.
The essence of the EV optimal scheduling problem is scheduling the charging and discharging states of EVs under random scenarios with multiple uncertain factors. Deep reinforcement learning is suited to finding optimal strategies in complex uncertain environments and is an effective method for sequential decision problems. However, when meeting the user's trip electricity demand is imposed as a constraint, conventional deep reinforcement learning methods must carefully design a penalty term and select penalty coefficients to ensure the EV is sufficiently charged at departure. Selecting proper penalty coefficients takes substantial time and effort, is a tedious process, and an ill-chosen penalty coefficient can sharply degrade the algorithm's performance.
Disclosure of Invention
The invention aims to provide an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization. The method does not depend on accurate prediction of future information; it learns a self-improving strategy from real-time perception of the environment state alone. The adopted long short-term memory neural network effectively extracts the time-series features of the future trend of the time-varying electricity price. In addition, the interior point strategy optimization algorithm takes the user's trip electricity demand as a constraint premise, minimizing the user's charging cost while improving the strategy's adaptability to uncertainty.
In order to meet the requirements, the invention adopts the following technical scheme:
an electric vehicle charge and discharge strategy optimization method based on interior point strategy optimization comprises the following steps:
step 1: modeling a slow charging process of the electric vehicle under a dynamic electricity price, describing an EV storage battery state space, considering the power demand constraint of a vehicle owner, and modeling a charging and discharging strategy deployment problem of the electric vehicle as a constraint Markov decision problem;
step 2: extracting future trend of time-varying power price by using a long-short term memory neural network, assisting subsequent deep reinforcement learning to carry out strategy optimization, and realizing effective deployment of EV charge-discharge strategies;
step 3: and an interior point-based strategy optimization algorithm is adopted, the logarithmic barrier function is utilized to convert the electric quantity constraint condition, the strategy deployment is optimized in the deep neural network, and the user charging cost is minimized.
Step 4: and interacting the external environment with the deployment intelligent agent according to the strategy obtained by training to obtain a real-time electric vehicle charging and discharging decision.
Further, in Step 1, the model of EV slow charging at the dynamic electricity price is:

E_{t+1} = E_t + η_ch·a_t·Δt,  a_t ≥ 0
E_{t+1} = E_t + a_t·Δt/η_dis,  a_t < 0
−P_dis^max ≤ a_t ≤ P_ch^max

In the formula: t_0 and t_1 respectively represent the arrival time and departure time of the electric vehicle, both falling at the start of a time step; E_0 is the remaining energy when the EV begins charging/discharging; E_t and E_{t+1} are the SOC values at times t and t+1; η_ch and η_dis are the EV battery energy-conversion efficiencies during charging and discharging; P_ch^max and P_dis^max are the maximum charging and discharging powers, respectively; Δt is the duration of each charging action; a_t is the EV charging/discharging power: when a_t > 0 the EV is charging, otherwise it is discharging.
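As an illustration, the charge-update rule above can be sketched in Python. This is a minimal sketch, not the patent's implementation; the efficiency, capacity, and time-step values are placeholder assumptions:

```python
def soc_step(E_t, a_t, dt=1.0, eta_ch=0.95, eta_dis=0.95,
             E_min=0.0, E_max=24.0):
    """One step of the EV battery model: charging (a_t >= 0) multiplies
    the delivered energy by the conversion efficiency, discharging
    divides by it, and the result is clipped to the capacity range."""
    if a_t >= 0:
        E_next = E_t + eta_ch * a_t * dt      # charging
    else:
        E_next = E_t + a_t * dt / eta_dis     # discharging
    return min(max(E_next, E_min), E_max)
```

With the placeholder efficiency 0.95, for example, charging at 6 kW for one hour from 10 kWh yields 15.7 kWh, and the update never leaves the capacity range.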
The EV charging/discharging deployment agent and environment: the deployment agent learns from experience through interaction with the environment and optimizes the charging/discharging deployment strategy. The environment observed by the agent has two parts: the real-time SOC value of the electric vehicle and the time-varying electricity price at the current time node.
State set S: the environmental state at time t may be defined as:
s_t = (E_t, P_{t-23}, ..., P_t)
Two types of information are contained in the formula: E_t, the SOC value of the electric vehicle at time t, and (P_{t-23}, ..., P_t), the electricity prices over the past 24 hours.
Action set A: the action a_t at time t represents the EV charging/discharging power per unit time; a_t × Δt is the amount of electric energy the EV exchanges during the period Δt.
The core of the constrained Markov decision process is to give an optimal strategy that maximizes the reward function r_t on the premise of satisfying the constraint function c_t; the optimization goal of EV charging deployment is therefore the minimum charging cost subject to the owner's electricity-demand constraint:
(1) Charging fee, i.e., the reward value:
r_t = R(s_t, a_t, s_{t+1}) = −a_t × P_t
In the formula: during charging, the reward is the product of the electricity price at time t and the charging power per unit time, negated, i.e., the negative of the charging cost; during discharging, the reward represents the revenue from selling electricity to the grid.
(2) Battery-energy constraint, i.e., the constraint value:

c_t = |E_t − E_target|,  t = t_1
c_t = max(E_t − E_max, 0) + max(E_min − E_t, 0),  t < t_1

In the formula: |E_t − E_target| is the deviation between the battery energy E_t and the charging target E_target when charging completes; the terms E_t − E_max and E_min − E_t, reflecting the actual physical mechanism, keep charging and discharging within the EV battery capacity range.
The learning goal of the EV charging/discharging deployment agent is to maximize the total expected discounted reward over T periods while satisfying the energy constraint; the objective function is expressed as:

max_{π∈Π_C} J_R(π) = E_π[ Σ_{t=0}^{T} γ^t·r_t ]

Π_C = {π : J_C(π) ≤ d}

In the formula: γ is the discount factor balancing the current and future constraint values; d represents a very small constraint tolerance.
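The per-step reward and constraint signals of the constrained MDP can be sketched as follows. This is a hedged illustration: `E_target` and the capacity bounds are placeholder values, and the `departed` flag stands in for the departure-time test t = t_1:

```python
def reward(a_t, price_t):
    """Negative charging cost: charging (a_t > 0) pays the price,
    discharging (a_t < 0) earns revenue from selling to the grid."""
    return -a_t * price_t

def constraint(E_t, departed, E_target=20.0, E_min=0.0, E_max=24.0):
    """Per-step constraint signal c_t: deviation from the charging
    target at departure, capacity-bound violations during the stay."""
    if departed:
        return abs(E_t - E_target)
    return max(E_t - E_max, 0.0) + max(E_min - E_t, 0.0)
```

For instance, charging at 5 kW when the price is 0.2 yields a reward of -1.0, and departing 2 kWh short of the target yields a constraint value of 2.0.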
Further, in Step 2, the method for extracting feature information on the future trend of the time-varying electricity price with the long short-term memory (LSTM) neural network is as follows:
The LSTM network and prediction module compute as follows. The LSTM network is unrolled into a 23-step structure. The input of the first step, X_{t-22}, is given by X_{t-22} = P_{t-22} − P_{t-23}, where P_{t-22} and P_{t-23} are the time-varying electricity prices at times t−22 and t−23, respectively. y_{t-22} is the output of the first step and c_{t-22} its cell state. y_{t-22} and c_{t-22}, which carry past electricity-price information, are passed to the next step; this process repeats until the last step.
The LSTM protects and controls the cell state through a gate mechanism to transmit memory selectively, comprising a forget gate, an input gate, and an output gate:

O_forget(t) = σ(W_yf·y_{t-1} + W_xf·x_t + b_f)
O_input(t) = σ(W_yi·y_{t-1} + W_xi·x_t + b_i)
O_out(t) = σ(W_yo·y_{t-1} + W_xo·x_t + b_o)

In the formula: O_forget(t), O_input(t), O_out(t) respectively represent the output matrices of the forget, input, and output gates at time t; W_yf, W_xf, W_yi, W_xi, W_yo, W_xo respectively represent the connection weight matrices from the output y_{t-1} at time t−1 and the input x_t at time t to the forget, input, and output gates; b_f, b_i, b_o respectively represent the bias vectors of the gates on the corresponding branches; σ represents the activation function.
Therefore, the calculation for extracting the future trend of the time-series electricity price is:

O_z(t) = tanh(W_yz·y_{t-1} + W_xz·x_t + b_z)
c_t = O_forget(t) ∘ c_{t-1} + O_input(t) ∘ O_z(t)
y_t = O_out(t) ∘ tanh(c_t)

In the formula: O_z(t) is the preprocessed information input to the cell-state module at time t; W_yz, W_xz respectively represent the connection weight matrices from y_{t-1} and x_t to O_z(t); b_z is the bias vector; ∘ denotes the Hadamard product of matrices; tanh is the activation function.
Further, in Step 3, expanding the battery-energy constraint with a logarithmic barrier function comprises the following steps:
For each constraint to be satisfied, an indicator function I(x) is set, satisfying:

I(x) = 0, x ≤ 0;  I(x) = −∞, x > 0

In the formula: when the constraint under strategy π_θ is satisfied, the problem reduces to an unconstrained strategy optimization problem that considers only the reward; when any constraint is violated, the penalty is −∞, and the strategy must first be adjusted to satisfy the constraint. The logarithmic barrier function approximates the indicator function I(x) as:

φ(x) = ln(−x)/k

where k is a hyperparameter; the larger the value of k, the better φ(x) fits the indicator function I(x). By expanding the objective with the indicator function, the original CMDP problem is simplified into an unconstrained optimization problem.
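The logarithmic-barrier approximation of the indicator function can be sketched directly; `k` is the hyperparameter from the text, and the −∞ branch mirrors the exact indicator for violated constraints:

```python
import math

def indicator_barrier(x, k=20.0):
    """Logarithmic-barrier approximation phi(x) = ln(-x)/k of the
    indicator I(x) (0 for x <= 0, -inf otherwise). The penalty grows
    without bound as the constraint value x approaches 0 from below."""
    if x >= 0:
        return float('-inf')   # constraint violated: exact indicator
    return math.log(-x) / k
```

At x = −1 the barrier is exactly 0, and as x → 0⁻ the penalty diverges, steering the strategy away from constraint violations before they occur.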
The interior point strategy optimization inherits the framework of the proximal policy optimization algorithm, adopting an actor-critic architecture; during training, mini-batches of data are randomly sampled from the experience replay pool for strategy updates. The critic network updates its parameters θ_v with the temporal-difference error method, minimizing:

L(θ_v) = ( r_t + γ·V_{θ_v}(s_{t+1}) − V_{θ_v}(s_t) )²

In the formula: V_{θ_v}(s_t) represents the network state-value function at time t.
The proximal strategy optimization performs a first-order approximation and uses Monte Carlo sampling to approximate the expectation, then obtains the objective function L^CLIP through a clipping function:

L^CLIP = E_t[ min( ξ_t·A_t, clip(ξ_t, 1−ε, 1+ε)·A_t ) ]

In the formula: ξ_t = π_θ(a_t|s_t) / π_{θ_old}(a_t|s_t) represents the ratio of the new to the old policy; A_t represents the advantage function; the clip(·) function restricts ξ_t to the interval [1−ε, 1+ε] defined by the hyperparameter ε, simplifying the calculation.
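The clipped surrogate for a single sample can be sketched as follows (a minimal illustration of the clip(·) mechanism; mini-batching and the expectation over trajectories are omitted):

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO-style clipped objective for one sample: the new/old policy
    probability ratio is clipped to [1 - eps, 1 + eps], and the minimum
    of the clipped and unclipped terms bounds the policy update."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

For example, with eps = 0.2 a ratio of 1.5 and a positive advantage is capped at 1.2 × A_t, so a single mini-batch cannot push the policy too far from the old one.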
The interior point strategy optimization expands the constraint into the objective function through the logarithmic barrier function, which both satisfies the long-horizon coupling constraint and yields a trust-region correction method compatible with stochastic gradient descent. The final objective function under the parameter θ is:

max_θ L(θ) = L^CLIP(θ) + ln( d − J_C(π_θ) )/k
further, the Step4 includes the following specific steps:
the decision of the intelligent agent after on-line deployment only depends on the actuator network which is trained, and the optimal parameter theta obtained by the training of the actuator network is loaded * And the neural network model is used for interacting with the intelligent agent according to the state information to obtain a real-time charging and discharging decision. And continuously repeating the interaction process until the electric automobile leaves the charging pile.
Beneficial effects:
The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization does not depend on accurate prediction of future information; it learns a self-improving strategy from real-time perception of the environment state alone. The adopted long short-term memory neural network effectively extracts the time-series features of the future trend of the time-varying electricity price. In addition, the interior point strategy optimization algorithm takes the user's trip electricity demand as a constraint premise, minimizing the user's charging cost while improving the strategy's adaptability to uncertainty.
Drawings
FIG. 1 is a diagram of the timing feature extraction and strategy network based on the long-term and short-term memory neural network according to the present invention.
FIG. 2 is a flow chart of training of the interior point strategy optimization algorithm of the present invention.
FIG. 3 is a diagram of a Markov decision process of the present invention.
FIG. 4 is a diagram illustrating reward values under the interior point policy optimization algorithm of the present invention.
FIG. 5 is a diagram illustrating constraint values under the interior point policy optimization algorithm of the present invention.
FIG. 6 is a real-time electric vehicle charging and discharging schedule according to the present invention.
Detailed description of the preferred embodiment
The following is a detailed description of an embodiment of the invention with reference to the accompanying drawings:
the invention provides an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization, which comprises the following steps:
step 1: modeling the slow charging process of the electric automobile at the dynamic electricity price, describing the state space of the EV storage battery, considering the constraint of the electricity demand of an automobile owner, and modeling the charge and discharge strategy deployment problem of the electric automobile as a constraint Markov decision problem, specifically comprising:
Figure BDA0003753893190000061
in the formula: t is t 0 And t 1 Respectively representing the arrival time and the departure time of the electric automobile, wherein the arrival time and the departure time are both generated at the beginning of time; e 0 Is the remaining electric quantity at the beginning of charging and discharging of the electric vehicle, E t And E t+1 SOC values at t and t + 1; eta ch And η dis The EV battery energy conversion efficiency during charging and discharging;
Figure BDA0003753893190000062
and
Figure BDA0003753893190000063
respectively charge and discharge maximum power; Δ t is the duration of each charging action; a is t For charging and discharging power of electric vehicle, when a t If the voltage is greater than 0, the electric automobile is charged, otherwise, the electric automobile is discharged.
The EV charging/discharging deployment agent and environment: the deployment agent learns from experience through interaction with the environment and optimizes the charging/discharging deployment strategy. The environment observed by the agent can be divided into two parts: the real-time SOC value of the electric vehicle and the time-varying electricity price at the current time node.
State set S: the environmental state at time t may be defined as:
s_t = (E_t, P_{t-23}, ..., P_t)
Two types of information are contained in the formula: E_t, the SOC value of the electric vehicle at time t, and (P_{t-23}, ..., P_t), the electricity prices over the past 24 hours.
Action set A: the action a_t at time t represents the EV charging/discharging power per unit time; a_t × Δt is the amount of electric energy the EV exchanges during the period Δt.
The core of the constrained Markov decision process is to give an optimal strategy that maximizes the reward function r_t on the premise of satisfying the constraint function c_t; the optimization goal of EV charging deployment is therefore the minimum charging cost subject to the owner's electricity-demand constraint:
(1) Charging fee, i.e., the reward value:
r_t = R(s_t, a_t, s_{t+1}) = −a_t × P_t
In the formula: during charging, the reward is the product of the electricity price at time t and the charging power per unit time, negated, i.e., the negative of the charging cost; during discharging, the reward represents the revenue from selling electricity to the grid.
(2) Battery-energy constraint, i.e., the constraint value:

c_t = |E_t − E_target|,  t = t_1
c_t = max(E_t − E_max, 0) + max(E_min − E_t, 0),  t < t_1

In the formula: |E_t − E_target| is the deviation between the battery energy E_t and the charging target E_target when charging completes; the terms E_t − E_max and E_min − E_t, reflecting the actual physical mechanism, keep charging and discharging within the EV battery capacity range.
The learning goal of the EV charging/discharging deployment agent is to maximize the total expected discounted reward over T periods while satisfying the energy constraint; the objective function is expressed as:

max_{π∈Π_C} J_R(π) = E_π[ Σ_{t=0}^{T} γ^t·r_t ]

Π_C = {π : J_C(π) ≤ d}
step 2: the method includes the steps that the future trend of the time-varying electricity price is extracted by using a long-term and short-term memory neural network, follow-up deep reinforcement learning is assisted to carry out strategy optimization, effective deployment of EV charge and discharge strategies is achieved, and the method specifically comprises the following steps:
the long-short term memory neural network and prediction module comprises the following calculation processes: the LSTM network expands into a 23-layer neural network structure. Wherein the input X of the first layer t-22 Is represented by X t-22 =P t-22 -P t-23 In which P is t-22 And P t-23 Representing the time-varying electricity prices at time t-22 and time t-21, respectively. y is t-22 Represents the output of the first layer, c t-22 Indicating its cellular state. Containing past electricity price information y t-22 And c t-22 Is passed to the next layer. This process is repeated until the last layer.
The LSTM realizes the selective transmission of memory information by protecting and controlling the cell state through a gate mechanism, and comprises a forgetting gate, an input gate and an output gate:
Figure BDA0003753893190000073
in the formula: o is forget (t)、O input (t)、O out (t) respectively representing the output matrixes of the forgetting gate, the input gate and the output gate; w yf 、W xf 、W yi 、W xi 、W yo 、W xo Respectively showing a forgetting gate, an input gate, an output gate and an output y at the time of t-1 t-1 And input x at time t t The connection weight matrix of (2); b f 、bi、b o Respectively representing the offset vectors of the gates on the corresponding branches; σ represents an activation function;
therefore, the calculation formula for the time series electricity price future trend extraction is as follows:
Figure BDA0003753893190000081
in the formula: o is z (t) preprocessing information input to the cell status module at time t; w yz 、W xz Respectively representing the output y at time t-1 t-1 And input x at time t t And O z (t) a connection weight matrix; b z Is a bias vector; the Hadamard product representing the matrix; tanh is the activation function.
Step 3: the method adopts an interior point-based strategy optimization algorithm, utilizes a logarithmic barrier function to convert an electric quantity constraint condition, optimizes strategy deployment in a deep neural network, minimizes user charging cost, and specifically comprises the following steps:
for each problem that the constraint satisfies, an index function is set
Figure BDA0003753893190000082
Satisfies the following conditions:
Figure BDA0003753893190000083
in the formula: when in strategy pi θ Lower constraint condition
Figure BDA0003753893190000084
When the condition is met, the problem is converted into an unconstrained strategy optimization problem only considering rewards to solve;however, when any constraint violates, the penalty is- ∞, requiring preferential tuning of the policy to satisfy the constraint. The logarithmic barrier function being an index function
Figure BDA0003753893190000085
Can be approximated by:
Figure BDA0003753893190000086
where k is a hyperparameter, the larger the value of k, the larger the index function
Figure BDA0003753893190000087
The better the fitting. Thus passing through the index function
Figure BDA0003753893190000088
And expanding the target, and simplifying the original CMDP problem into an unconstrained optimization problem.
The interior point strategy optimization inherits the framework of the proximal policy optimization algorithm, adopting an actor-critic architecture; during training, mini-batches of data are randomly sampled from the experience replay pool for strategy updates. The critic network updates its parameters θ_v with the temporal-difference error method, minimizing:

L(θ_v) = ( r_t + γ·V_{θ_v}(s_{t+1}) − V_{θ_v}(s_t) )²

In the formula: V_{θ_v}(s_t) represents the network state-value function at time t.
The proximal strategy optimization performs a first-order approximation and uses Monte Carlo sampling to approximate the expectation, then obtains the objective function L^CLIP through a clipping function:

L^CLIP = E_t[ min( ξ_t·A_t, clip(ξ_t, 1−ε, 1+ε)·A_t ) ]

In the formula: ξ_t = π_θ(a_t|s_t) / π_{θ_old}(a_t|s_t) represents the ratio of the new to the old policy; A_t represents the advantage function; the clip(·) function restricts ξ_t to the interval [1−ε, 1+ε] defined by the hyperparameter ε, simplifying the calculation.
The interior point strategy optimization expands the constraint into the objective function through the logarithmic barrier function, which both satisfies the long-horizon coupling constraint and yields a trust-region correction method compatible with stochastic gradient descent. The final objective function under the parameter θ is:

max_θ L(θ) = L^CLIP(θ) + ln( d − J_C(π_θ) )/k
step 4: according to the strategy obtained by training, the external environment is interacted with the deployment intelligent body to obtain a real-time electric vehicle charging and discharging decision, and the method specifically comprises the following steps:
the decision of the intelligent agent after on-line deployment only depends on the actuator network which is trained, and the optimal parameter theta obtained by the training of the actuator network is loaded * And the neural network model is used for interacting with the intelligent agent according to the state information to obtain a real-time charging and discharging decision. And continuously repeating the interaction process until the electric automobile leaves the charging pile.

Claims (5)

1. An electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization is characterized by comprising the following steps:
step 1: modeling a slow charging process of the electric vehicle under a dynamic electricity price, describing an EV storage battery state space, considering vehicle owner electricity demand constraint, and modeling a charging and discharging strategy deployment problem of the electric vehicle as a constraint Markov decision problem;
step 2: extracting future trend of time-varying power price by using a long-short term memory neural network, assisting subsequent deep reinforcement learning to carry out strategy optimization, and realizing effective deployment of EV charge-discharge strategies;
step 3: an interior point-based strategy optimization algorithm is adopted, the constraint condition of the electricity quantity is converted by using a logarithm barrier function, strategy deployment is optimized in a deep neural network, and the charging cost of a user is minimized;
step 4: and interacting the external environment with the deployment intelligent agent according to the strategy obtained by training to obtain a real-time electric vehicle charging and discharging decision.
2. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, characterized in that the model of slow charging of the electric vehicle at the dynamic electricity price in step 1 is as follows:
E_{t+1} = E_t + η_ch · a_t · Δt, when a_t ≥ 0
E_{t+1} = E_t + a_t · Δt / η_dis, when a_t < 0
E_{t_0} = E_0, −P_max^dis ≤ a_t ≤ P_max^ch, t ∈ [t_0, t_1]
in the formula: t_0 and t_1 respectively represent the arrival time and departure time of the electric vehicle, both taken at the start of a time step; E_0 is the remaining electric quantity at the beginning of charging and discharging of the electric vehicle; E_t and E_{t+1} are the SOC values at times t and t+1; η_ch and η_dis are the EV battery energy conversion efficiencies during charging and discharging; P_max^ch and P_max^dis are respectively the maximum charging and discharging powers; Δt is the duration of each charging action; a_t is the charging and discharging power of the electric vehicle: when a_t > 0 the electric vehicle charges, otherwise it discharges;
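The SOC update above can be sketched in code. This is a minimal sketch assuming the standard piecewise battery model consistent with the symbols defined in the claim; the default efficiency and power values are illustrative, not from the patent.

```python
# Sketch of the slow-charging battery dynamics: charging is scaled by eta_ch,
# discharging by 1/eta_dis, and power a_t is limited to the rated range.
# Default parameter values are illustrative assumptions.
def soc_update(E_t, a_t, dt=1.0, eta_ch=0.95, eta_dis=0.95,
               p_ch_max=7.0, p_dis_max=7.0):
    """Return E_{t+1} after applying charge/discharge power a_t for dt hours."""
    a_t = max(-p_dis_max, min(a_t, p_ch_max))   # enforce power limits
    if a_t >= 0:                                # charging (a_t > 0)
        return E_t + eta_ch * a_t * dt
    return E_t + a_t * dt / eta_dis             # discharging (a_t < 0)
```

Note the asymmetry: conversion losses reduce the energy stored when charging but increase the energy drawn from the battery when discharging.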
the electric vehicle charge-discharge deployment agent and the environment: the agent learns from experience through interaction with the environment and optimizes the charge-discharge deployment strategy; the agent's observation of the environment consists of two parts: the real-time SOC value of the electric vehicle, and the time-varying electricity price at the current time node;
state set S: the environmental state at time t may be defined as:
s_t = (E_t, P_{t−23}, …, P_t)
in the formula: E_t is the SOC value of the electric vehicle at time t; (P_{t−23}, …, P_t) represents the electricity prices over the past 24 hours;
action set A: the action a_t at time t represents the charging and discharging power of the electric vehicle per unit time; a_t × Δt is the electric energy transferred by the electric vehicle over the period Δt;
the core of the constrained Markov decision process is to give an optimal strategy that maximizes the reward function r_t under the premise of satisfying the constraint function c_t, so the optimization goal of electric vehicle charging deployment is to minimize the charging cost under the constraint of satisfying the vehicle owner's electricity demand:
(1) charging cost, i.e. reward value:
r_t = R(s_t, a_t, s_{t+1}) = −a_t × P_t
in the formula: during charging, the reward is the negative of the charging cost, i.e. minus the product of the charging power per unit time and the electricity price at time t; during discharging, the reward represents revenue from selling electricity to the grid;
(2) electric quantity constraint, i.e. constraint value:
c_t = |E_t − E_target|, when t = t_1
c_t = max(E_t − E_max, 0) + max(E_min − E_t, 0), otherwise
wherein |E_t − E_target| is the deviation between the battery capacity E_t at the time charging is completed and the charging target E_target; the terms E_t − E_max and E_min − E_t, combined with the actual physical mechanism, guarantee that charging and discharging stay within the EV battery capacity range;
the learning goal of the electric vehicle charge-discharge deployment agent is to maximize the T-period total expected discounted reward subject to the electric quantity constraint J_C(π), the objective function being expressed as:
max_{π∈Π_C} J(π) = E_π[ Σ_{t=0}^{T} γ^t r_t ]
Π_C = { π : J_C(π) ≤ d }, J_C(π) = E_π[ Σ_{t=0}^{T} γ^t c_t ]
in the formula: γ is a discount factor balancing the current constraint value against future constraint values; d represents a small constraint tolerance.
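The CMDP quantities in claim 2 can be sketched numerically. This is an illustrative sketch: the reward follows the formula r_t = −a_t × P_t given above, and the discounted-return helper applies the discount factor γ to either reward or constraint sequences; the tolerance value `d` is an assumed placeholder.

```python
# Sketch of the CMDP quantities: per-step reward, discounted return, and the
# feasibility test J_C(pi) <= d. The value of d here is an assumed placeholder.
def reward(a_t, price_t):
    """r_t = -a_t * P_t: negative cost when charging, revenue when discharging."""
    return -a_t * price_t

def discounted_return(values, gamma=0.99):
    """Sum of gamma^t * v_t over a trajectory (rewards or constraint values)."""
    return sum((gamma ** t) * v for t, v in enumerate(values))

def is_feasible(constraint_values, d=1e-3, gamma=0.99):
    """A policy is feasible when its discounted constraint return is below d."""
    return discounted_return(constraint_values, gamma) <= d
```

The same `discounted_return` helper serves both J(π) (applied to rewards) and J_C(π) (applied to constraint values), mirroring the paired objective and constraint above.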
3. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, wherein in step 2 the method for extracting feature information on the future trend of the time-varying electricity price using the long short-term memory neural network is specifically as follows:
the calculation process of the long short-term memory neural network and prediction module is as follows: the LSTM network is unrolled into a 23-layer neural network structure, in which the input of the first layer is X_{t−22} = P_{t−22} − P_{t−23}, where P_{t−22} and P_{t−23} respectively represent the time-varying electricity prices at times t−22 and t−23; y_{t−22} represents the output of the first layer and c_{t−22} its cell state; y_{t−22} and c_{t−22}, which contain past electricity price information, are passed to the next layer; this process is repeated until the last layer;
the LSTM protects and controls the cell state through a gate mechanism to achieve selective transmission of memory information, comprising a forget gate, an input gate and an output gate:
O_forget(t) = σ(W_yf · y_{t−1} + W_xf · x_t + b_f)
O_input(t) = σ(W_yi · y_{t−1} + W_xi · x_t + b_i)
O_out(t) = σ(W_yo · y_{t−1} + W_xo · x_t + b_o)
in the formula: O_forget(t), O_input(t) and O_out(t) respectively represent the output matrices of the forget gate, input gate and output gate at time t; W_yf, W_xf, W_yi, W_xi, W_yo and W_xo respectively represent the connection weight matrices from the output y_{t−1} at time t−1 and the input x_t at time t to the forget, input and output gates; b_f, b_i and b_o respectively represent the bias vectors of the gates on the corresponding branches; σ represents the activation function;
therefore, the calculation formula for extracting the future trend of the time-series electricity price is:
O_z(t) = tanh(W_yz · y_{t−1} + W_xz · x_t + b_z)
c_t = O_forget(t) ∘ c_{t−1} + O_input(t) ∘ O_z(t)
y_t = O_out(t) ∘ tanh(c_t)
in the formula: O_z(t) is the preprocessed information input to the cell state module at time t; W_yz and W_xz respectively represent the connection weight matrices from the output y_{t−1} at time t−1 and the input x_t at time t to O_z(t); b_z is a bias vector; ∘ represents the Hadamard product of matrices; tanh is the activation function.
4. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, wherein in step 3 the electric quantity constraint is expanded using a logarithmic barrier function, specifically as follows:
for each constraint to be satisfied, an indicator function Î(x) is set that satisfies:
Î(x) = 0, when x ≤ 0; Î(x) = −∞, otherwise
in the formula: when the constraint J_C(π_θ) − d ≤ 0 is met under policy π_θ, the problem reduces to an unconstrained policy optimization considering only the reward; however, when any constraint is violated, the penalty is −∞, requiring the policy to be preferentially tuned to satisfy the constraint; the indicator function Î(x) can be approximated by the logarithmic barrier function:
Î(x) ≈ φ(x) = log(−x) / k
where k is a hyperparameter: the larger the value of k, the better φ(x) fits the indicator function Î(x); thus, by expanding the objective with the indicator function, the original CMDP problem is reduced to an unconstrained optimization problem;
the interior point strategy optimization inherits the framework of the proximal policy optimization algorithm, adopting an actor-critic architecture; during training, small batches of data are randomly sampled from an experience replay pool for policy updates; the critic network updates its parameters θ_v using the temporal-difference error method, with the specific calculation formula:
L(θ_v) = E[ (r_t + γ · V_{θ_v}(s_{t+1}) − V_{θ_v}(s_t))² ]
in the formula: V_{θ_v}(s_t) represents the critic network's state value function at time t;
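The temporal-difference error driving the critic update can be sketched as a single scalar computation, assuming the standard one-step TD target implied by the formula above.

```python
# Sketch of the one-step temporal-difference error used to update the critic
# parameters theta_v: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
def td_error(r_t, v_t, v_next, gamma=0.99):
    """TD error for one transition; the critic loss is its square."""
    return r_t + gamma * v_next - v_t
```

Squaring this error over a sampled minibatch gives the critic loss L(θ_v) minimized by gradient descent.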
the proximal policy optimization performs a first-order approximation and uses Monte Carlo sampling to approximate the expectation, obtaining the objective function L^CLIP through a clipping function:
L^CLIP(θ) = E_t[ min( ξ_t(θ) · Â_t, clip(ξ_t(θ), 1−ε, 1+ε) · Â_t ) ]
in the formula: ξ_t(θ) = π_θ(a_t|s_t) / π_{θ_old}(a_t|s_t) represents the ratio of the new and old policies; Â_t represents the advantage function; the clip(·) function restricts ξ_t to the interval [1−ε, 1+ε] determined by the hyperparameter ε, simplifying the calculation process;
the interior point strategy optimization expands the constraint into the objective function through the logarithmic barrier function, which not only satisfies the long-horizon coupled constraint but also yields a trust-region correction method compatible with stochastic gradient descent; the final objective function under parameters θ is:
L(θ) = L^CLIP(θ) + log(d − Ĵ_C(π_θ)) / k
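The combination of the clipped surrogate and the log barrier can be sketched as follows. This is an illustrative sketch under assumed values: the per-sample clip term follows the standard PPO form, and the barrier is applied to the slack J_C − d as described above; `d`, `k` and `eps` defaults are placeholders.

```python
# Sketch of the IPO objective: PPO's clipped surrogate averaged over samples,
# plus a log barrier on the constraint slack J_C - d. Defaults are assumed.
import math

def clip(x, lo, hi):
    return max(lo, min(x, hi))

def ppo_clip_term(ratio, advantage, eps=0.2):
    """min(xi * A, clip(xi, 1-eps, 1+eps) * A) for one sample."""
    return min(ratio * advantage, clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

def ipo_objective(ratios, advantages, J_C, d=1e-3, k=20.0, eps=0.2):
    """Barrier-augmented surrogate; -inf once the constraint is violated."""
    surrogate = sum(ppo_clip_term(r, a, eps)
                    for r, a in zip(ratios, advantages)) / len(ratios)
    slack = J_C - d                 # must stay negative for feasibility
    if slack >= 0:
        return float("-inf")
    return surrogate + math.log(-slack) / k
```

Because the barrier term only depends on differentiable quantities while the constraint holds, the whole objective can be maximized with ordinary stochastic gradient ascent, which is the compatibility property claimed above.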
5. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, wherein step 4 specifically comprises:
after online deployment, the agent's decision depends only on the trained actor network. The optimal parameters θ* obtained from actor-network training are loaded into the neural network model, and the agent interacts with the environment according to the state information to obtain a real-time charging and discharging decision. This interaction process is repeated until the electric vehicle leaves the charging pile.
CN202210848364.XA 2022-07-19 2022-07-19 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization Active CN114997935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210848364.XA CN114997935B (en) 2022-07-19 2022-07-19 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization

Publications (2)

Publication Number Publication Date
CN114997935A true CN114997935A (en) 2022-09-02
CN114997935B CN114997935B (en) 2023-04-07

Family

ID=83021907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210848364.XA Active CN114997935B (en) 2022-07-19 2022-07-19 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization

Country Status (1)

Country Link
CN (1) CN114997935B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115731072A (en) * 2022-11-22 2023-03-03 东南大学 Microgrid space-time perception energy management method based on safe deep reinforcement learning
CN117689188A (en) * 2024-02-04 2024-03-12 江西驴充充物联网科技有限公司 Big data-based user charging strategy optimization system and method
CN117863969A (en) * 2024-03-13 2024-04-12 国网北京市电力公司 Electric automobile charge and discharge control method and system considering battery loss
CN117863969B (en) * 2024-03-13 2024-05-17 国网北京市电力公司 Electric automobile charge and discharge control method and system considering battery loss

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111934335A (en) * 2020-08-18 2020-11-13 华北电力大学 Cluster electric vehicle charging behavior optimization method based on deep reinforcement learning
CN113572157A (en) * 2021-07-27 2021-10-29 东南大学 User real-time autonomous energy management optimization method based on near-end policy optimization
CN113627993A (en) * 2021-08-26 2021-11-09 东北大学秦皇岛分校 Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning
CN113922404A (en) * 2021-10-22 2022-01-11 山东大学 Community electric vehicle cluster charging coordination method and system
CN114619907A (en) * 2020-12-14 2022-06-14 中国科学技术大学 Coordinated charging method and coordinated charging system based on distributed deep reinforcement learning

Also Published As

Publication number Publication date
CN114997935B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN114997935B (en) Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization
CN109347149B (en) Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning
CN113511082B (en) Hybrid electric vehicle energy management method based on rule and double-depth Q network
CN112117760A (en) Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning
CN111934335A (en) Cluster electric vehicle charging behavior optimization method based on deep reinforcement learning
CN112862281A (en) Method, device, medium and electronic equipment for constructing scheduling model of comprehensive energy system
CN112131733B (en) Distributed power supply planning method considering influence of charging load of electric automobile
CN110138006B (en) Multi-microgrid coordinated optimization scheduling method considering new energy electric vehicle
CN113572157B (en) User real-time autonomous energy management optimization method based on near-end policy optimization
Huang et al. Ensemble learning for charging load forecasting of electric vehicle charging stations
CN113627993A (en) Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning
CN107919675B (en) Charging station load scheduling model comprehensively considering benefits of vehicle owners and operators
CN103997091A (en) Scale electric automobile intelligent charging control method
CN112238781B (en) Electric automobile ordered charging control method based on layered architecture
CN116683513A (en) Method and system for optimizing energy supplement strategy of mobile micro-grid
CN115308606A (en) Lithium ion battery health state estimation method based on proximity features
CN115115130A (en) Wind-solar energy storage hydrogen production system day-ahead scheduling method based on simulated annealing algorithm
CN114619907B (en) Coordinated charging method and coordinated charging system based on distributed deep reinforcement learning
CN113972645A (en) Power distribution network optimization method based on multi-agent depth determination strategy gradient algorithm
CN117559468A (en) V2G station rapid frequency modulation response method based on ultra-short term frequency deviation prediction
CN113555888B (en) Micro-grid energy storage coordination control method
CN114583696A (en) Power distribution network reactive power optimization method and system based on BP neural network and scene matching
CN115036952A (en) Real-time power control method for electric vehicle participating in load stabilization based on MPC
CN112003279B (en) Evaluation method for new energy consumption capability of hierarchical micro-grid
Lian et al. Real‐time energy management strategy for fuel cell plug‐in hybrid electric bus using short‐term power smoothing prediction and distance adaptive state‐of‐charge consumption

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant