CN114997935A - Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization - Google Patents


Info

Publication number: CN114997935A (granted as CN114997935B)
Application number: CN202210848364.XA
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: charging, time, electric vehicle, constraint, discharging
Inventors: 臧汉洲, 叶宇剑, 汤奕, 钱俊良, 周吉
Applicant and current assignee: Liyang Research Institute of Southeast University
Legal status: Granted; Active

Classifications

    • G06Q30/0201: Market modelling; market analysis; collecting market data
    • G06Q30/0206: Price or cost determination based on market factors
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06Q50/06: Electricity, gas or water supply
    • Y02T90/167: Smart grids as interface for battery charging of electric vehicles [EV] or hybrid vehicles [HEV]

Abstract

The invention discloses an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization, relating to the field of electric vehicle charging and discharging scheduling. The method comprises the following steps: first, an electric vehicle charging and discharging model is constructed according to the actual physical mechanism, and the EV charging/discharging deployment optimization problem is modeled as a constrained sequential decision problem. Then, a long short-term memory neural network extracts the future trend of the time-varying electricity price to assist the subsequent deep reinforcement learning in strategy optimization. Finally, the extracted electricity-price information and the internal state features of the electric vehicle are input into a strategy function based on a deep neural network, the deployment agent learns charging and discharging actions, and the battery-energy constraint is expanded into the optimization objective of interior point strategy optimization through a logarithmic barrier function for strategy optimization. The proposed deployment optimization method minimizes the user's charging cost on the premise of meeting the electric vehicle's electricity demand, while improving the strategy's adaptability to uncertainty.

Description

Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization
Technical Field
The invention relates to the field of electric vehicle charging and discharging scheduling, in particular to a real-time electric vehicle charging and discharging strategy deployment optimization method based on interior point strategy optimization.
Background
As an environmentally friendly alternative to traditional fossil-fuel automobiles, electric vehicles (EVs) have been widely adopted over the past few years. However, the rapid growth of EVs inevitably integrates large-scale EV clusters into the power grid, posing great challenges to its economic and safe operation. To address this issue, demand response can incentivize EVs to shift charging to off-peak hours, optimize charging cost against dynamic electricity prices, and even earn revenue by discharging to the grid.
The essence of the EV optimal scheduling problem is scheduling the charging and discharging states of EVs under random scenarios with multiple uncertain factors. Deep reinforcement learning is suited to finding optimal strategies in complex uncertain environments and is an effective method for sequential decision problems. However, when meeting the user's trip electricity demand is imposed as a constraint, conventional deep reinforcement learning methods must carefully design a penalty term and select penalty coefficients to ensure the EV is sufficiently charged at departure. Selecting proper penalty coefficients takes substantial time and effort, is a tedious process, and an ill-chosen penalty coefficient can sharply degrade the algorithm's performance.
Disclosure of Invention
The invention aims to provide an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization. The method does not depend on accurate prediction of future information; it learns a self-improving strategy from real-time perception of the environment state alone. The adopted long short-term memory neural network effectively extracts the time-series features of the future trend of the time-varying electricity price. In addition, the interior point strategy optimization algorithm takes the user's trip electricity demand as a constraint premise, minimizing the user's charging cost while improving the strategy's adaptability to uncertainty.
In order to meet the requirements, the invention adopts the following technical scheme:
an electric vehicle charge and discharge strategy optimization method based on interior point strategy optimization comprises the following steps:
step 1: modeling a slow charging process of the electric vehicle under a dynamic electricity price, describing an EV storage battery state space, considering the power demand constraint of a vehicle owner, and modeling a charging and discharging strategy deployment problem of the electric vehicle as a constraint Markov decision problem;
step 2: extracting future trend of time-varying power price by using a long-short term memory neural network, assisting subsequent deep reinforcement learning to carry out strategy optimization, and realizing effective deployment of EV charge-discharge strategies;
step 3: and an interior point-based strategy optimization algorithm is adopted, the logarithmic barrier function is utilized to convert the electric quantity constraint condition, the strategy deployment is optimized in the deep neural network, and the user charging cost is minimized.
Step 4: and interacting the external environment with the deployment intelligent agent according to the strategy obtained by training to obtain a real-time electric vehicle charging and discharging decision.
Further, in Step 1, the model of EV slow charging at the dynamic electricity price is:

E_{t+1} = E_t + η_ch·a_t·Δt,  a_t ≥ 0
E_{t+1} = E_t + a_t·Δt/η_dis,  a_t < 0
−P_dis^max ≤ a_t ≤ P_ch^max

In the formula: t_0 and t_1 respectively represent the arrival time and departure time of the electric vehicle, both falling at the start of a time step; E_0 is the remaining energy when the EV begins charging/discharging; E_t and E_{t+1} are the SOC values at times t and t+1; η_ch and η_dis are the EV battery energy-conversion efficiencies during charging and discharging; P_ch^max and P_dis^max are the maximum charging and discharging powers, respectively; Δt is the duration of each charging action; a_t is the EV charging/discharging power: when a_t > 0 the EV is charging, otherwise it is discharging.
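As an illustration, the charge-update rule above can be sketched in Python. This is a minimal sketch, not the patent's implementation; the efficiency, capacity, and time-step values are placeholder assumptions:

```python
def soc_step(E_t, a_t, dt=1.0, eta_ch=0.95, eta_dis=0.95,
             E_min=0.0, E_max=24.0):
    """One step of the EV battery model: charging (a_t >= 0) multiplies
    the delivered energy by the conversion efficiency, discharging
    divides by it, and the result is clipped to the capacity range."""
    if a_t >= 0:
        E_next = E_t + eta_ch * a_t * dt      # charging
    else:
        E_next = E_t + a_t * dt / eta_dis     # discharging
    return min(max(E_next, E_min), E_max)
```

With the placeholder efficiency 0.95, for example, charging at 6 kW for one hour from 10 kWh yields 15.7 kWh, and the update never leaves the capacity range.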
The EV charging/discharging deployment agent and environment: the deployment agent learns from experience through interaction with the environment and optimizes the charging/discharging deployment strategy. The environment observed by the agent has two parts: the real-time SOC value of the electric vehicle and the time-varying electricity price at the current time node.
State set S: the environmental state at time t may be defined as:
s_t = (E_t, P_{t-23}, ..., P_t)
Two types of information are contained in the formula: E_t, the SOC value of the electric vehicle at time t, and (P_{t-23}, ..., P_t), the electricity prices over the past 24 hours.
Action set A: the action a_t at time t represents the EV charging/discharging power per unit time; a_t × Δt is the amount of electric energy the EV exchanges during the period Δt.
The core of the constrained Markov decision process is to give an optimal strategy that maximizes the reward function r_t on the premise of satisfying the constraint function c_t; the optimization goal of EV charging deployment is therefore the minimum charging cost subject to the owner's electricity-demand constraint:
(1) Charging fee, i.e., the reward value:
r_t = R(s_t, a_t, s_{t+1}) = −a_t × P_t
In the formula: during charging, the reward is the product of the electricity price at time t and the charging power per unit time, negated, i.e., the negative of the charging cost; during discharging, the reward represents the revenue from selling electricity to the grid.
(2) Battery-energy constraint, i.e., the constraint value:

c_t = |E_t − E_target|,  t = t_1
c_t = max(E_t − E_max, 0) + max(E_min − E_t, 0),  t < t_1

In the formula: |E_t − E_target| is the deviation between the battery energy E_t and the charging target E_target when charging completes; the terms E_t − E_max and E_min − E_t, reflecting the actual physical mechanism, keep charging and discharging within the EV battery capacity range.
The learning goal of the EV charging/discharging deployment agent is to maximize the total expected discounted reward over T periods while satisfying the energy constraint; the objective function is expressed as:

max_{π∈Π_C} J_R(π) = E_π[ Σ_{t=0}^{T} γ^t·r_t ]

Π_C = {π : J_C(π) ≤ d}

In the formula: γ is the discount factor balancing the current and future constraint values; d represents a very small constraint tolerance.
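The per-step reward and constraint signals of the constrained MDP can be sketched as follows. This is a hedged illustration: `E_target` and the capacity bounds are placeholder values, and the `departed` flag stands in for the departure-time test t = t_1:

```python
def reward(a_t, price_t):
    """Negative charging cost: charging (a_t > 0) pays the price,
    discharging (a_t < 0) earns revenue from selling to the grid."""
    return -a_t * price_t

def constraint(E_t, departed, E_target=20.0, E_min=0.0, E_max=24.0):
    """Per-step constraint signal c_t: deviation from the charging
    target at departure, capacity-bound violations during the stay."""
    if departed:
        return abs(E_t - E_target)
    return max(E_t - E_max, 0.0) + max(E_min - E_t, 0.0)
```

For instance, charging at 5 kW when the price is 0.2 yields a reward of -1.0, and departing 2 kWh short of the target yields a constraint value of 2.0.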
Further, in Step 2, the method for extracting feature information on the future trend of the time-varying electricity price with the long short-term memory (LSTM) neural network is as follows:
The LSTM network and prediction module compute as follows. The LSTM network is unrolled into a 23-step structure. The input of the first step, X_{t-22}, is given by X_{t-22} = P_{t-22} − P_{t-23}, where P_{t-22} and P_{t-23} are the time-varying electricity prices at times t−22 and t−23, respectively. y_{t-22} is the output of the first step and c_{t-22} its cell state. y_{t-22} and c_{t-22}, which carry past electricity-price information, are passed to the next step; this process repeats until the last step.
The LSTM protects and controls the cell state through a gate mechanism to transmit memory selectively, comprising a forget gate, an input gate, and an output gate:

O_forget(t) = σ(W_yf·y_{t-1} + W_xf·x_t + b_f)
O_input(t) = σ(W_yi·y_{t-1} + W_xi·x_t + b_i)
O_out(t) = σ(W_yo·y_{t-1} + W_xo·x_t + b_o)

In the formula: O_forget(t), O_input(t), O_out(t) respectively represent the output matrices of the forget, input, and output gates at time t; W_yf, W_xf, W_yi, W_xi, W_yo, W_xo respectively represent the connection weight matrices from the output y_{t-1} at time t−1 and the input x_t at time t to the forget, input, and output gates; b_f, b_i, b_o respectively represent the bias vectors of the gates on the corresponding branches; σ represents the activation function.
Therefore, the calculation for extracting the future trend of the time-series electricity price is:

O_z(t) = tanh(W_yz·y_{t-1} + W_xz·x_t + b_z)
c_t = O_forget(t) ∘ c_{t-1} + O_input(t) ∘ O_z(t)
y_t = O_out(t) ∘ tanh(c_t)

In the formula: O_z(t) is the preprocessed information input to the cell-state module at time t; W_yz, W_xz respectively represent the connection weight matrices from y_{t-1} and x_t to O_z(t); b_z is the bias vector; ∘ denotes the Hadamard product of matrices; tanh is the activation function.
Further, in Step 3, expanding the battery-energy constraint with a logarithmic barrier function comprises the following steps:
For each constraint to be satisfied, an indicator function I(x) is set, satisfying:

I(x) = 0, x ≤ 0;  I(x) = −∞, x > 0

In the formula: when the constraint under strategy π_θ is satisfied, the problem reduces to an unconstrained strategy optimization problem that considers only the reward; when any constraint is violated, the penalty is −∞, and the strategy must first be adjusted to satisfy the constraint. The logarithmic barrier function approximates the indicator function I(x) as:

φ(x) = ln(−x)/k

where k is a hyperparameter; the larger the value of k, the better φ(x) fits the indicator function I(x). By expanding the objective with the indicator function, the original CMDP problem is simplified into an unconstrained optimization problem.
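The logarithmic-barrier approximation of the indicator function can be sketched directly; `k` is the hyperparameter from the text, and the −∞ branch mirrors the exact indicator for violated constraints:

```python
import math

def indicator_barrier(x, k=20.0):
    """Logarithmic-barrier approximation phi(x) = ln(-x)/k of the
    indicator I(x) (0 for x <= 0, -inf otherwise). The penalty grows
    without bound as the constraint value x approaches 0 from below."""
    if x >= 0:
        return float('-inf')   # constraint violated: exact indicator
    return math.log(-x) / k
```

At x = −1 the barrier is exactly 0, and as x → 0⁻ the penalty diverges, steering the strategy away from constraint violations before they occur.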
The interior point strategy optimization inherits the framework of the proximal policy optimization algorithm, adopting an actor-critic architecture; during training, mini-batches of data are randomly sampled from the experience replay pool for strategy updates. The critic network updates its parameters θ_v with the temporal-difference error method, minimizing:

L(θ_v) = ( r_t + γ·V_{θ_v}(s_{t+1}) − V_{θ_v}(s_t) )²

In the formula: V_{θ_v}(s_t) represents the network state-value function at time t.
The proximal strategy optimization performs a first-order approximation and uses Monte Carlo sampling to approximate the expectation, then obtains the objective function L^CLIP through a clipping function:

L^CLIP = E_t[ min( ξ_t·A_t, clip(ξ_t, 1−ε, 1+ε)·A_t ) ]

In the formula: ξ_t = π_θ(a_t|s_t) / π_{θ_old}(a_t|s_t) represents the ratio of the new to the old policy; A_t represents the advantage function; the clip(·) function restricts ξ_t to the interval [1−ε, 1+ε] defined by the hyperparameter ε, simplifying the calculation.
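The clipped surrogate for a single sample can be sketched as follows (a minimal illustration of the clip(·) mechanism; mini-batching and the expectation over trajectories are omitted):

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO-style clipped objective for one sample: the new/old policy
    probability ratio is clipped to [1 - eps, 1 + eps], and the minimum
    of the clipped and unclipped terms bounds the policy update."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

For example, with eps = 0.2 a ratio of 1.5 and a positive advantage is capped at 1.2 × A_t, so a single mini-batch cannot push the policy too far from the old one.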
The interior point strategy optimization expands the constraint into the objective function through the logarithmic barrier function, which both satisfies the long-horizon coupling constraint and yields a trust-region correction method compatible with stochastic gradient descent. The final objective function under the parameter θ is:

max_θ L(θ) = L^CLIP(θ) + ln( d − J_C(π_θ) )/k
further, the Step4 includes the following specific steps:
the decision of the intelligent agent after on-line deployment only depends on the actuator network which is trained, and the optimal parameter theta obtained by the training of the actuator network is loaded * And the neural network model is used for interacting with the intelligent agent according to the state information to obtain a real-time charging and discharging decision. And continuously repeating the interaction process until the electric automobile leaves the charging pile.
Beneficial effects:
The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization does not depend on accurate prediction of future information; it learns a self-improving strategy from real-time perception of the environment state alone. The adopted long short-term memory neural network effectively extracts the time-series features of the future trend of the time-varying electricity price. In addition, the interior point strategy optimization algorithm takes the user's trip electricity demand as a constraint premise, minimizing the user's charging cost while improving the strategy's adaptability to uncertainty.
Drawings
FIG. 1 is a diagram of the timing feature extraction and strategy network based on the long-term and short-term memory neural network according to the present invention.
FIG. 2 is a flow chart of training of the interior point strategy optimization algorithm of the present invention.
FIG. 3 is a diagram of a Markov decision process of the present invention.
FIG. 4 is a diagram illustrating reward values under the interior point policy optimization algorithm of the present invention.
FIG. 5 is a diagram illustrating constraint values under the interior point policy optimization algorithm of the present invention.
FIG. 6 is a real-time electric vehicle charging and discharging schedule according to the present invention.
Detailed description of the preferred embodiment
The following is a detailed description of an embodiment of the invention with reference to the accompanying drawings:
the invention provides an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization, which comprises the following steps:
step 1: modeling the slow charging process of the electric automobile at the dynamic electricity price, describing the state space of the EV storage battery, considering the constraint of the electricity demand of an automobile owner, and modeling the charge and discharge strategy deployment problem of the electric automobile as a constraint Markov decision problem, specifically comprising:
Figure BDA0003753893190000061
in the formula: t is t 0 And t 1 Respectively representing the arrival time and the departure time of the electric automobile, wherein the arrival time and the departure time are both generated at the beginning of time; e 0 Is the remaining electric quantity at the beginning of charging and discharging of the electric vehicle, E t And E t+1 SOC values at t and t + 1; eta ch And η dis The EV battery energy conversion efficiency during charging and discharging;
Figure BDA0003753893190000062
and
Figure BDA0003753893190000063
respectively charge and discharge maximum power; Δ t is the duration of each charging action; a is t For charging and discharging power of electric vehicle, when a t If the voltage is greater than 0, the electric automobile is charged, otherwise, the electric automobile is discharged.
The EV charging/discharging deployment agent and environment: the deployment agent learns from experience through interaction with the environment and optimizes the charging/discharging deployment strategy. The environment observed by the agent can be divided into two parts: the real-time SOC value of the electric vehicle and the time-varying electricity price at the current time node.
State set S: the environmental state at time t may be defined as:
s_t = (E_t, P_{t-23}, ..., P_t)
Two types of information are contained in the formula: E_t, the SOC value of the electric vehicle at time t, and (P_{t-23}, ..., P_t), the electricity prices over the past 24 hours.
Action set A: the action a_t at time t represents the EV charging/discharging power per unit time; a_t × Δt is the amount of electric energy the EV exchanges during the period Δt.
The core of the constrained Markov decision process is to give an optimal strategy that maximizes the reward function r_t on the premise of satisfying the constraint function c_t; the optimization goal of EV charging deployment is therefore the minimum charging cost subject to the owner's electricity-demand constraint:
(1) Charging fee, i.e., the reward value:
r_t = R(s_t, a_t, s_{t+1}) = −a_t × P_t
In the formula: during charging, the reward is the product of the electricity price at time t and the charging power per unit time, negated, i.e., the negative of the charging cost; during discharging, the reward represents the revenue from selling electricity to the grid.
(2) Battery-energy constraint, i.e., the constraint value:

c_t = |E_t − E_target|,  t = t_1
c_t = max(E_t − E_max, 0) + max(E_min − E_t, 0),  t < t_1

In the formula: |E_t − E_target| is the deviation between the battery energy E_t and the charging target E_target when charging completes; the terms E_t − E_max and E_min − E_t, reflecting the actual physical mechanism, keep charging and discharging within the EV battery capacity range.
The learning goal of the EV charging/discharging deployment agent is to maximize the total expected discounted reward over T periods while satisfying the energy constraint; the objective function is expressed as:

max_{π∈Π_C} J_R(π) = E_π[ Σ_{t=0}^{T} γ^t·r_t ]

Π_C = {π : J_C(π) ≤ d}
step 2: the method includes the steps that the future trend of the time-varying electricity price is extracted by using a long-term and short-term memory neural network, follow-up deep reinforcement learning is assisted to carry out strategy optimization, effective deployment of EV charge and discharge strategies is achieved, and the method specifically comprises the following steps:
the long-short term memory neural network and prediction module comprises the following calculation processes: the LSTM network expands into a 23-layer neural network structure. Wherein the input X of the first layer t-22 Is represented by X t-22 =P t-22 -P t-23 In which P is t-22 And P t-23 Representing the time-varying electricity prices at time t-22 and time t-21, respectively. y is t-22 Represents the output of the first layer, c t-22 Indicating its cellular state. Containing past electricity price information y t-22 And c t-22 Is passed to the next layer. This process is repeated until the last layer.
The LSTM realizes the selective transmission of memory information by protecting and controlling the cell state through a gate mechanism, and comprises a forgetting gate, an input gate and an output gate:
Figure BDA0003753893190000073
in the formula: o is forget (t)、O input (t)、O out (t) respectively representing the output matrixes of the forgetting gate, the input gate and the output gate; w yf 、W xf 、W yi 、W xi 、W yo 、W xo Respectively showing a forgetting gate, an input gate, an output gate and an output y at the time of t-1 t-1 And input x at time t t The connection weight matrix of (2); b f 、bi、b o Respectively representing the offset vectors of the gates on the corresponding branches; σ represents an activation function;
therefore, the calculation formula for the time series electricity price future trend extraction is as follows:
Figure BDA0003753893190000081
in the formula: o is z (t) preprocessing information input to the cell status module at time t; w yz 、W xz Respectively representing the output y at time t-1 t-1 And input x at time t t And O z (t) a connection weight matrix; b z Is a bias vector; the Hadamard product representing the matrix; tanh is the activation function.
Step 3: the method adopts an interior point-based strategy optimization algorithm, utilizes a logarithmic barrier function to convert an electric quantity constraint condition, optimizes strategy deployment in a deep neural network, minimizes user charging cost, and specifically comprises the following steps:
for each problem that the constraint satisfies, an index function is set
Figure BDA0003753893190000082
Satisfies the following conditions:
Figure BDA0003753893190000083
in the formula: when in strategy pi θ Lower constraint condition
Figure BDA0003753893190000084
When the condition is met, the problem is converted into an unconstrained strategy optimization problem only considering rewards to solve;however, when any constraint violates, the penalty is- ∞, requiring preferential tuning of the policy to satisfy the constraint. The logarithmic barrier function being an index function
Figure BDA0003753893190000085
Can be approximated by:
Figure BDA0003753893190000086
where k is a hyperparameter, the larger the value of k, the larger the index function
Figure BDA0003753893190000087
The better the fitting. Thus passing through the index function
Figure BDA0003753893190000088
And expanding the target, and simplifying the original CMDP problem into an unconstrained optimization problem.
The interior point strategy optimization inherits the framework of the proximal policy optimization algorithm, adopting an actor-critic architecture; during training, mini-batches of data are randomly sampled from the experience replay pool for strategy updates. The critic network updates its parameters θ_v with the temporal-difference error method, minimizing:

L(θ_v) = ( r_t + γ·V_{θ_v}(s_{t+1}) − V_{θ_v}(s_t) )²

In the formula: V_{θ_v}(s_t) represents the network state-value function at time t.
The proximal strategy optimization performs a first-order approximation and uses Monte Carlo sampling to approximate the expectation, then obtains the objective function L^CLIP through a clipping function:

L^CLIP = E_t[ min( ξ_t·A_t, clip(ξ_t, 1−ε, 1+ε)·A_t ) ]

In the formula: ξ_t = π_θ(a_t|s_t) / π_{θ_old}(a_t|s_t) represents the ratio of the new to the old policy; A_t represents the advantage function; the clip(·) function restricts ξ_t to the interval [1−ε, 1+ε] defined by the hyperparameter ε, simplifying the calculation.
The interior point strategy optimization expands the constraint into the objective function through the logarithmic barrier function, which both satisfies the long-horizon coupling constraint and yields a trust-region correction method compatible with stochastic gradient descent. The final objective function under the parameter θ is:

max_θ L(θ) = L^CLIP(θ) + ln( d − J_C(π_θ) )/k
step 4: according to the strategy obtained by training, the external environment is interacted with the deployment intelligent body to obtain a real-time electric vehicle charging and discharging decision, and the method specifically comprises the following steps:
the decision of the intelligent agent after on-line deployment only depends on the actuator network which is trained, and the optimal parameter theta obtained by the training of the actuator network is loaded * And the neural network model is used for interacting with the intelligent agent according to the state information to obtain a real-time charging and discharging decision. And continuously repeating the interaction process until the electric automobile leaves the charging pile.

Claims (5)

1. An electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization is characterized by comprising the following steps:
step 1: modeling a slow charging process of the electric vehicle under a dynamic electricity price, describing an EV storage battery state space, considering vehicle owner electricity demand constraint, and modeling a charging and discharging strategy deployment problem of the electric vehicle as a constraint Markov decision problem;
step 2: extracting future trend of time-varying power price by using a long-short term memory neural network, assisting subsequent deep reinforcement learning to carry out strategy optimization, and realizing effective deployment of EV charge-discharge strategies;
step 3: an interior point-based strategy optimization algorithm is adopted, the constraint condition of the electricity quantity is converted by using a logarithm barrier function, strategy deployment is optimized in a deep neural network, and the charging cost of a user is minimized;
step 4: and interacting the external environment with the deployment intelligent agent according to the strategy obtained by training to obtain a real-time electric vehicle charging and discharging decision.
2. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, characterized in that the model of slow charging of the electric vehicle at the dynamic electricity price in step 1 is as follows:
E_{t+1} = E_t + η_ch · a_t · Δt, when a_t ≥ 0
E_{t+1} = E_t + a_t · Δt / η_dis, when a_t < 0
E_{t_0} = E_0, −P_max^dis ≤ a_t ≤ P_max^ch, t ∈ [t_0, t_1]
in the formula: t_0 and t_1 respectively represent the arrival time and departure time of the electric vehicle, both taken at the start of a time step; E_0 is the remaining electric quantity at the beginning of charging and discharging of the electric vehicle; E_t and E_{t+1} are the SOC values at times t and t+1; η_ch and η_dis are the EV battery energy conversion efficiencies during charging and discharging; P_max^ch and P_max^dis are respectively the maximum charging and discharging powers; Δt is the duration of each charging action; a_t is the charging and discharging power of the electric vehicle: when a_t > 0 the electric vehicle charges, otherwise it discharges;
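The SOC update above can be sketched in code. This is a minimal sketch assuming the standard piecewise battery model consistent with the symbols defined in the claim; the default efficiency and power values are illustrative, not from the patent.

```python
# Sketch of the slow-charging battery dynamics: charging is scaled by eta_ch,
# discharging by 1/eta_dis, and power a_t is limited to the rated range.
# Default parameter values are illustrative assumptions.
def soc_update(E_t, a_t, dt=1.0, eta_ch=0.95, eta_dis=0.95,
               p_ch_max=7.0, p_dis_max=7.0):
    """Return E_{t+1} after applying charge/discharge power a_t for dt hours."""
    a_t = max(-p_dis_max, min(a_t, p_ch_max))   # enforce power limits
    if a_t >= 0:                                # charging (a_t > 0)
        return E_t + eta_ch * a_t * dt
    return E_t + a_t * dt / eta_dis             # discharging (a_t < 0)
```

Note the asymmetry: conversion losses reduce the energy stored when charging but increase the energy drawn from the battery when discharging.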
the electric vehicle charge-discharge deployment agent and the environment: the agent learns from experience through interaction with the environment and optimizes the charge-discharge deployment strategy; the agent's observation of the environment consists of two parts: the real-time SOC value of the electric vehicle, and the time-varying electricity price at the current time node;
state set S: the environmental state at time t may be defined as:
s_t = (E_t, P_{t−23}, …, P_t)
in the formula: E_t is the SOC value of the electric vehicle at time t; (P_{t−23}, …, P_t) represents the electricity prices over the past 24 hours;
action set A: the action a_t at time t represents the charging and discharging power of the electric vehicle per unit time; a_t × Δt is the electric energy transferred by the electric vehicle over the period Δt;
the core of the constrained Markov decision process is to give an optimal strategy that maximizes the reward function r_t under the premise of satisfying the constraint function c_t, so the optimization goal of electric vehicle charging deployment is to minimize the charging cost under the constraint of satisfying the vehicle owner's electricity demand:
(1) charging cost, i.e. reward value:
r_t = R(s_t, a_t, s_{t+1}) = −a_t × P_t
in the formula: during charging, the reward is the negative of the charging cost, i.e. minus the product of the charging power per unit time and the electricity price at time t; during discharging, the reward represents revenue from selling electricity to the grid;
(2) electric quantity constraint, i.e. constraint value:
c_t = |E_t − E_target|, when t = t_1
c_t = max(E_t − E_max, 0) + max(E_min − E_t, 0), otherwise
wherein |E_t − E_target| is the deviation between the battery capacity E_t at the time charging is completed and the charging target E_target; the terms E_t − E_max and E_min − E_t, combined with the actual physical mechanism, guarantee that charging and discharging stay within the EV battery capacity range;
the learning goal of the electric vehicle charge-discharge deployment agent is to maximize the T-period total expected discounted reward subject to the electric quantity constraint J_C(π), the objective function being expressed as:
max_{π∈Π_C} J(π) = E_π[ Σ_{t=0}^{T} γ^t r_t ]
Π_C = { π : J_C(π) ≤ d }, J_C(π) = E_π[ Σ_{t=0}^{T} γ^t c_t ]
in the formula: γ is a discount factor balancing the current constraint value against future constraint values; d represents a small constraint tolerance.
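The CMDP quantities in claim 2 can be sketched numerically. This is an illustrative sketch: the reward follows the formula r_t = −a_t × P_t given above, and the discounted-return helper applies the discount factor γ to either reward or constraint sequences; the tolerance value `d` is an assumed placeholder.

```python
# Sketch of the CMDP quantities: per-step reward, discounted return, and the
# feasibility test J_C(pi) <= d. The value of d here is an assumed placeholder.
def reward(a_t, price_t):
    """r_t = -a_t * P_t: negative cost when charging, revenue when discharging."""
    return -a_t * price_t

def discounted_return(values, gamma=0.99):
    """Sum of gamma^t * v_t over a trajectory (rewards or constraint values)."""
    return sum((gamma ** t) * v for t, v in enumerate(values))

def is_feasible(constraint_values, d=1e-3, gamma=0.99):
    """A policy is feasible when its discounted constraint return is below d."""
    return discounted_return(constraint_values, gamma) <= d
```

The same `discounted_return` helper serves both J(π) (applied to rewards) and J_C(π) (applied to constraint values), mirroring the paired objective and constraint above.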
3. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, wherein in step 2 the method for extracting feature information on the future trend of the time-varying electricity price using the long short-term memory neural network is specifically as follows:
the calculation process of the long short-term memory neural network and prediction module is as follows: the LSTM network is unrolled into a 23-layer neural network structure, in which the input of the first layer is X_{t−22} = P_{t−22} − P_{t−23}, where P_{t−22} and P_{t−23} respectively represent the time-varying electricity prices at times t−22 and t−23; y_{t−22} represents the output of the first layer and c_{t−22} its cell state; y_{t−22} and c_{t−22}, which contain past electricity price information, are passed to the next layer; this process is repeated until the last layer;
the LSTM protects and controls the cell state through a gate mechanism to achieve selective transmission of memory information, comprising a forget gate, an input gate and an output gate:
O_forget(t) = σ(W_yf · y_{t−1} + W_xf · x_t + b_f)
O_input(t) = σ(W_yi · y_{t−1} + W_xi · x_t + b_i)
O_out(t) = σ(W_yo · y_{t−1} + W_xo · x_t + b_o)
in the formula: O_forget(t), O_input(t) and O_out(t) respectively represent the output matrices of the forget gate, input gate and output gate at time t; W_yf, W_xf, W_yi, W_xi, W_yo and W_xo respectively represent the connection weight matrices from the output y_{t−1} at time t−1 and the input x_t at time t to the forget, input and output gates; b_f, b_i and b_o respectively represent the bias vectors of the gates on the corresponding branches; σ represents the activation function;
therefore, the calculation formula for extracting the future trend of the time-series electricity price is:
O_z(t) = tanh(W_yz · y_{t−1} + W_xz · x_t + b_z)
c_t = O_forget(t) ∘ c_{t−1} + O_input(t) ∘ O_z(t)
y_t = O_out(t) ∘ tanh(c_t)
in the formula: O_z(t) is the preprocessed information input to the cell state module at time t; W_yz and W_xz respectively represent the connection weight matrices from the output y_{t−1} at time t−1 and the input x_t at time t to O_z(t); b_z is a bias vector; ∘ represents the Hadamard product of matrices; tanh is the activation function.
4. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, wherein in step 3 the electric quantity constraint is expanded using a logarithmic barrier function, specifically as follows:
for each constraint to be satisfied, an indicator function Î(x) is set that satisfies:
Î(x) = 0, when x ≤ 0; Î(x) = −∞, otherwise
in the formula: when the constraint J_C(π_θ) − d ≤ 0 is met under policy π_θ, the problem reduces to an unconstrained policy optimization considering only the reward; however, when any constraint is violated, the penalty is −∞, requiring the policy to be preferentially tuned to satisfy the constraint; the indicator function Î(x) can be approximated by the logarithmic barrier function:
Î(x) ≈ φ(x) = log(−x) / k
where k is a hyperparameter: the larger the value of k, the better φ(x) fits the indicator function Î(x); thus, by expanding the objective with the indicator function, the original CMDP problem is reduced to an unconstrained optimization problem;
the interior point strategy optimization inherits the framework of the proximal policy optimization algorithm, adopting an actor-critic architecture; during training, small batches of data are randomly sampled from an experience replay pool for policy updates; the critic network updates its parameters θ_v using the temporal-difference error method, with the specific calculation formula:
L(θ_v) = E[ (r_t + γ · V_{θ_v}(s_{t+1}) − V_{θ_v}(s_t))² ]
in the formula: V_{θ_v}(s_t) represents the critic network's state value function at time t;
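The temporal-difference error driving the critic update can be sketched as a single scalar computation, assuming the standard one-step TD target implied by the formula above.

```python
# Sketch of the one-step temporal-difference error used to update the critic
# parameters theta_v: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
def td_error(r_t, v_t, v_next, gamma=0.99):
    """TD error for one transition; the critic loss is its square."""
    return r_t + gamma * v_next - v_t
```

Squaring this error over a sampled minibatch gives the critic loss L(θ_v) minimized by gradient descent.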
the proximal policy optimization performs a first-order approximation and uses Monte Carlo sampling to approximate the expectation, obtaining the objective function L^CLIP through a clipping function:
L^CLIP(θ) = E_t[ min( ξ_t(θ) · Â_t, clip(ξ_t(θ), 1−ε, 1+ε) · Â_t ) ]
in the formula: ξ_t(θ) = π_θ(a_t|s_t) / π_{θ_old}(a_t|s_t) represents the ratio of the new and old policies; Â_t represents the advantage function; the clip(·) function restricts ξ_t to the interval [1−ε, 1+ε] determined by the hyperparameter ε, simplifying the calculation process;
the interior point strategy optimization expands the constraint into the objective function through the logarithmic barrier function, which not only satisfies the long-horizon coupled constraint but also yields a trust-region correction method compatible with stochastic gradient descent; the final objective function under parameters θ is:
L(θ) = L^CLIP(θ) + log(d − Ĵ_C(π_θ)) / k
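The combination of the clipped surrogate and the log barrier can be sketched as follows. This is an illustrative sketch under assumed values: the per-sample clip term follows the standard PPO form, and the barrier is applied to the slack J_C − d as described above; `d`, `k` and `eps` defaults are placeholders.

```python
# Sketch of the IPO objective: PPO's clipped surrogate averaged over samples,
# plus a log barrier on the constraint slack J_C - d. Defaults are assumed.
import math

def clip(x, lo, hi):
    return max(lo, min(x, hi))

def ppo_clip_term(ratio, advantage, eps=0.2):
    """min(xi * A, clip(xi, 1-eps, 1+eps) * A) for one sample."""
    return min(ratio * advantage, clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

def ipo_objective(ratios, advantages, J_C, d=1e-3, k=20.0, eps=0.2):
    """Barrier-augmented surrogate; -inf once the constraint is violated."""
    surrogate = sum(ppo_clip_term(r, a, eps)
                    for r, a in zip(ratios, advantages)) / len(ratios)
    slack = J_C - d                 # must stay negative for feasibility
    if slack >= 0:
        return float("-inf")
    return surrogate + math.log(-slack) / k
```

Because the barrier term only depends on differentiable quantities while the constraint holds, the whole objective can be maximized with ordinary stochastic gradient ascent, which is the compatibility property claimed above.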
5. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, wherein step 4 specifically comprises:
after online deployment, the agent's decision depends only on the trained actor network. The optimal parameters θ* obtained from actor-network training are loaded into the neural network model, and the agent interacts with the environment according to the state information to obtain a real-time charging and discharging decision. This interaction process is repeated until the electric vehicle leaves the charging pile.
CN202210848364.XA 2022-07-19 2022-07-19 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization Active CN114997935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210848364.XA CN114997935B (en) 2022-07-19 2022-07-19 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization

Publications (2)

Publication Number Publication Date
CN114997935A true CN114997935A (en) 2022-09-02
CN114997935B CN114997935B (en) 2023-04-07

Family

ID=83021907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210848364.XA Active CN114997935B (en) 2022-07-19 2022-07-19 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization

Country Status (1)

Country Link
CN (1) CN114997935B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115731072A (en) * 2022-11-22 2023-03-03 东南大学 Microgrid space-time perception energy management method based on safe deep reinforcement learning
CN117689188A (en) * 2024-02-04 2024-03-12 江西驴充充物联网科技有限公司 Big data-based user charging strategy optimization system and method
CN117863969A (en) * 2024-03-13 2024-04-12 国网北京市电力公司 Electric automobile charge and discharge control method and system considering battery loss
CN117863969B (en) * 2024-03-13 2024-05-17 国网北京市电力公司 Electric automobile charge and discharge control method and system considering battery loss

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111934335A (en) * 2020-08-18 2020-11-13 华北电力大学 Cluster electric vehicle charging behavior optimization method based on deep reinforcement learning
CN113572157A (en) * 2021-07-27 2021-10-29 东南大学 User real-time autonomous energy management optimization method based on near-end policy optimization
CN113627993A (en) * 2021-08-26 2021-11-09 东北大学秦皇岛分校 Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning
CN113922404A (en) * 2021-10-22 2022-01-11 山东大学 Community electric vehicle cluster charging coordination method and system
CN114619907A (en) * 2020-12-14 2022-06-14 中国科学技术大学 Coordinated charging method and coordinated charging system based on distributed deep reinforcement learning

Also Published As

Publication number Publication date
CN114997935B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN114997935B (en) Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization
CN109347149B (en) Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning
CN113511082B (en) Hybrid electric vehicle energy management method based on rule and double-depth Q network
CN112117760A (en) Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning
CN111934335A (en) Cluster electric vehicle charging behavior optimization method based on deep reinforcement learning
CN112862281A (en) Method, device, medium and electronic equipment for constructing scheduling model of comprehensive energy system
CN112131733B (en) Distributed power supply planning method considering influence of charging load of electric automobile
CN110138006B (en) Multi-microgrid coordinated optimization scheduling method considering new energy electric vehicle
CN113572157B (en) User real-time autonomous energy management optimization method based on near-end policy optimization
Huang et al. Ensemble learning for charging load forecasting of electric vehicle charging stations
CN113627993A (en) Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning
CN107919675B (en) Charging station load scheduling model comprehensively considering benefits of vehicle owners and operators
CN103997091A (en) Scale electric automobile intelligent charging control method
CN112238781B (en) Electric automobile ordered charging control method based on layered architecture
CN116683513A (en) Method and system for optimizing energy supplement strategy of mobile micro-grid
CN115308606A (en) Lithium ion battery health state estimation method based on proximity features
CN115115130A (en) Wind-solar energy storage hydrogen production system day-ahead scheduling method based on simulated annealing algorithm
CN114619907B (en) Coordinated charging method and coordinated charging system based on distributed deep reinforcement learning
CN113972645A (en) Power distribution network optimization method based on multi-agent depth determination strategy gradient algorithm
CN117559468A (en) V2G station rapid frequency modulation response method based on ultra-short term frequency deviation prediction
CN113555888B (en) Micro-grid energy storage coordination control method
CN114583696A (en) Power distribution network reactive power optimization method and system based on BP neural network and scene matching
CN115036952A (en) Real-time power control method for electric vehicle participating in load stabilization based on MPC
CN112003279B (en) Evaluation method for new energy consumption capability of hierarchical micro-grid
Lian et al. Real‐time energy management strategy for fuel cell plug‐in hybrid electric bus using short‐term power smoothing prediction and distance adaptive state‐of‐charge consumption

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant