CN114997935B - Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization - Google Patents

Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization

Info

Publication number
CN114997935B
CN114997935B (application CN202210848364.XA)
Authority
CN
China
Prior art keywords
charging
time
electric vehicle
constraint
discharging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210848364.XA
Other languages
Chinese (zh)
Other versions
CN114997935A (en)
Inventor
臧汉洲
叶宇剑
汤奕
钱俊良
周吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liyang Research Institute of Southeast University
Original Assignee
Liyang Research Institute of Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liyang Research Institute of Southeast University filed Critical Liyang Research Institute of Southeast University
Priority to CN202210848364.XA priority Critical patent/CN114997935B/en
Publication of CN114997935A publication Critical patent/CN114997935A/en
Application granted granted Critical
Publication of CN114997935B publication Critical patent/CN114997935B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation
    • Y02T90/10 Technologies relating to charging of electric vehicles
    • Y02T90/16 Information or communication technologies improving the operation of electric vehicles
    • Y02T90/167 Systems integrating technologies related to power network operation and communication or information technologies for supporting the interoperability of electric or hybrid vehicles, i.e. smartgrids as interface for battery charging of electric vehicles [EV] or hybrid vehicles [HEV]

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Charge And Discharge Circuits For Batteries Or The Like (AREA)

Abstract

The invention discloses an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization, relating to the field of electric vehicle charging and discharging scheduling. The method comprises the following steps. First, an electric vehicle charging and discharging model is constructed according to the actual physical mechanism, and the electric vehicle charging and discharging scheduling optimization problem is modeled as a constrained sequential decision problem. Then, a long short-term memory neural network is used to extract the future trend of the time-varying electricity price, assisting the subsequent deep reinforcement learning in strategy optimization. Finally, the extracted electricity-price information and the internal state features of the electric vehicle are fed into a strategy function based on a deep neural network, so that the scheduling agent learns charging and discharging actions, and the electric quantity constraint is expanded into the optimization objective of interior point strategy optimization through a logarithmic barrier function for strategy optimization. The proposed scheduling optimization method minimizes the user's charging cost on the premise of meeting the electricity demand of the electric vehicle, while improving the strategy's adaptability to uncertainty.

Description

Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization
Technical Field
The invention relates to the field of electric vehicle charging and discharging scheduling, in particular to a real-time electric vehicle charging and discharging strategy deployment optimization method based on interior point strategy optimization.
Background
As an environmentally friendly alternative to traditional fossil-fuel vehicles, electric vehicles have been widely adopted in recent years. However, the rapid development of EVs inevitably brings large-scale electric vehicle clusters into the power grid, which poses great challenges to its economic and safe operation. To address this issue, electric vehicles can be incentivized through demand response to shift charging to off-peak hours and to optimize charging costs under dynamic electricity prices, or even to discharge to the grid for revenue.
The essence of the EV optimal scheduling problem is scheduling the charging and discharging states of the electric vehicle in a stochastic setting with multiple uncertain factors. Deep reinforcement learning is well suited to finding optimal strategies in complex, uncertain environments and is an effective method for solving sequential decision problems. However, when meeting the user's trip electricity demand is imposed as a constraint, conventional deep reinforcement learning methods must carefully design a penalty term and select penalty coefficients to ensure that the electric vehicle is sufficiently charged at departure. Selecting proper penalty coefficients requires a great deal of time and effort, is a tedious process, and an improperly designed penalty coefficient can sharply degrade the performance of the algorithm.
Disclosure of Invention
The invention aims to provide an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization that does not depend on accurate prediction of future information and performs self-optimizing strategy learning relying only on real-time perception of the environment state. The adopted long short-term memory neural network effectively extracts the time-series features of the future trend of the time-varying electricity price. In addition, the interior point strategy optimization algorithm takes meeting the user's trip electricity demand as a constraint premise, minimizing the user's charging cost while improving the strategy's adaptability to uncertainty.
In order to meet the requirements, the invention adopts the following technical scheme:
An electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization comprises the following steps:
Step1: modeling the slow charging process of the electric vehicle under dynamic electricity prices, describing the state space of the EV battery, considering the owner's electricity demand constraint, and modeling the electric vehicle charging and discharging strategy scheduling problem as a constrained Markov decision problem;
Step2: extracting the future trend of the time-varying electricity price with a long short-term memory neural network to assist the subsequent deep reinforcement learning in strategy optimization and achieve effective scheduling of the EV charging and discharging strategy;
Step3: adopting an interior-point-based strategy optimization algorithm, converting the electric quantity constraint with a logarithmic barrier function, optimizing the strategy in the deep neural network, and minimizing the user's charging cost;
Step4: interacting the scheduling agent with the external environment according to the trained strategy to obtain real-time electric vehicle charging and discharging decisions.
Further, in the Step1, a model of the electric vehicle slow charging at the dynamic electricity price is as follows:
E_{t+1} = E_t + η_ch·a_t·Δt, when a_t ≥ 0 (charging)
E_{t+1} = E_t + a_t·Δt/η_dis, when a_t < 0 (discharging)
−P_dis^max ≤ a_t ≤ P_ch^max, t ∈ [t_0, t_1], E_{t_0} = E_0
In the formula: t_0 and t_1 are the arrival and departure times of the electric vehicle, both taken at the start of a time slot; E_0 is the remaining energy when charging and discharging begin; E_t and E_{t+1} are the SOC values at times t and t+1; η_ch and η_dis are the EV battery energy-conversion efficiencies during charging and discharging; P_ch^max and P_dis^max are the maximum charging and discharging powers; Δt is the duration of each charging action; a_t is the charging/discharging power of the electric vehicle: the vehicle charges when a_t > 0 and discharges otherwise.
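To make the charging model above concrete, the following minimal Python sketch applies one scheduling step of the described battery dynamics; the numerical values (efficiencies, power limits, capacity bounds, slot length) are illustrative assumptions rather than parameters taken from the patent:

# Minimal sketch of one step of the EV slow-charging model described above.
# All numeric parameter values are illustrative assumptions.

ETA_CH, ETA_DIS = 0.95, 0.95      # charging / discharging energy-conversion efficiency
P_CH_MAX, P_DIS_MAX = 7.0, 7.0    # maximum charging / discharging power (kW)
E_MIN, E_MAX = 4.0, 40.0          # battery capacity bounds (kWh)
DELTA_T = 1.0                     # duration of each charging action (h)

def soc_step(E_t: float, a_t: float) -> float:
    """Return E_{t+1} given the current energy E_t and charging power a_t.

    a_t > 0 charges the battery (efficiency applied on the way in),
    a_t < 0 discharges it (efficiency applied on the way out).
    """
    a_t = max(-P_DIS_MAX, min(P_CH_MAX, a_t))        # respect power limits
    if a_t >= 0:
        E_next = E_t + ETA_CH * a_t * DELTA_T
    else:
        E_next = E_t + a_t * DELTA_T / ETA_DIS
    return max(E_MIN, min(E_MAX, E_next))            # respect capacity bounds

# Example: charging at 5 kW for one hour starting from 10 kWh gives 14.75 kWh.
print(soc_step(10.0, 5.0))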
Charging/discharging scheduling agent and environment: the electric vehicle charging/discharging scheduling agent learns from experience through interaction with the environment and optimizes the charging/discharging scheduling strategy. The environment observed by the agent consists of two parts: the real-time SOC of the electric vehicle and the time-varying electricity price at the current time node.
State set S: the environment state at time t is defined as:
s_t = (E_t, P_{t-23}, ..., P_t)
In the formula: E_t is the SOC of the electric vehicle at time t, and (P_{t-23}, ..., P_t) are the electricity prices over the past 24 hours.
Action set A: the action a_t at time t represents the charging/discharging power of the electric vehicle per unit time; a_t × Δt is the amount of electric energy exchanged by the electric vehicle during the period Δt.
The core of the constrained Markov decision process is to give an optimal strategy that maximizes the reward function r_t while satisfying the constraint function c_t; the optimization goal of electric vehicle charging scheduling is therefore to minimize the charging cost subject to the owner's electricity demand constraint:
(1) Charging cost, i.e. the reward value:
r_t = R(s_t, a_t, s_{t+1}) = −a_t × P_t
In the formula: during charging, the reward is the negative of the charging cost, i.e. minus the product of the electricity price at time t and the charging power per unit time; during discharging, the reward represents the revenue from selling electricity to the grid.
(2) Electric quantity constraint, i.e. the constraint value c_t = C(s_t, a_t, s_{t+1}): |E_t − E_target| is the deviation between the battery energy E_t at the moment charging is completed and the charging target E_target; the conditions E_t − E_max ≤ 0 and E_min − E_t ≤ 0 reflect the actual physical mechanism and keep charging and discharging within the EV battery capacity range.
The learning goal of the electric vehicle charging/discharging scheduling agent is to maximize the T-period total expected discounted reward subject to the electric quantity constraint; the objective function is expressed as:
max_{π∈Π_C} J(π) = E_π[ Σ_{t=0}^{T} γ^t·r_t ]
Π_C = {π : J_C(π) ≤ d},  J_C(π) = E_π[ Σ_{t=0}^{T} γ^t·c_t ]
In the formula: γ is the discount factor that balances current and future constraint values; d represents a very small constraint threshold.
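As an illustration of how the reward and constraint signals defined above can be accumulated over one charging session, the sketch below computes the discounted return J and the discounted constraint value J_C from an episode trajectory; the exact composition of c_t used here (target deviation at departure plus capacity-bound violations) and all numeric values are assumptions for illustration only:

# Sketch: discounted reward and constraint returns for one charging episode.
# The form of c_t (target deviation at departure plus capacity-bound
# violations) and all numeric values are illustrative assumptions.

GAMMA = 0.99
E_TARGET, E_MIN, E_MAX = 38.0, 4.0, 40.0

def reward(a_t: float, price_t: float) -> float:
    """r_t = -a_t * P_t: negative charging cost, or revenue when discharging."""
    return -a_t * price_t

def constraint(E_t: float, departing: bool) -> float:
    """Electric-quantity constraint signal c_t."""
    c = max(E_t - E_MAX, 0.0) + max(E_MIN - E_t, 0.0)   # capacity-bound violations
    if departing:                                       # deviation from target at t1
        c += abs(E_t - E_TARGET)
    return c

def episode_returns(trajectory):
    """trajectory: list of (E_t, a_t, price_t, departing) tuples for one session."""
    J, J_C = 0.0, 0.0
    for t, (E_t, a_t, price_t, departing) in enumerate(trajectory):
        J += GAMMA ** t * reward(a_t, price_t)
        J_C += GAMMA ** t * constraint(E_t, departing)
    return J, J_C   # the policy must keep J_C <= d while maximizing J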
Further, in Step2, the feature information of the future trend of the time-varying electricity price is extracted with the long short-term memory neural network, specifically as follows:
The calculation process of the long short-term memory neural network and prediction module is as follows. The LSTM network is unrolled into a 23-layer structure, where the input of the first layer is X_{t-22} = P_{t-22} − P_{t-23}, with P_{t-22} and P_{t-23} the time-varying electricity prices at times t−22 and t−23, respectively; y_{t-22} is the output of the first layer and c_{t-22} its cell state. The outputs y_{t-22} and c_{t-22}, which carry past electricity-price information, are passed to the next layer, and this process is repeated until the last layer.
The LSTM protects and controls the cell state through a gate mechanism to transmit memory information selectively, using a forget gate, an input gate and an output gate:
O_forget(t) = σ(W_yf·y_{t-1} + W_xf·x_t + b_f)
O_input(t) = σ(W_yi·y_{t-1} + W_xi·x_t + b_i)
O_out(t) = σ(W_yo·y_{t-1} + W_xo·x_t + b_o)
In the formula: O_forget(t), O_input(t) and O_out(t) are the output matrices of the forget gate, input gate and output gate at time t; W_yf, W_xf, W_yi, W_xi, W_yo and W_xo are the connection weight matrices between each gate and the output y_{t-1} at time t−1 and the input x_t at time t; b_f, b_i and b_o are the bias vectors of the gates on the corresponding branches; σ is the activation function.
Therefore, the calculation formulas for extracting the future trend of the time-series electricity price are:
O_z(t) = tanh(W_yz·y_{t-1} + W_xz·x_t + b_z)
c_t = O_forget(t) ⊙ c_{t-1} + O_input(t) ⊙ O_z(t)
y_t = O_out(t) ⊙ tanh(c_t)
In the formula: O_z(t) is the pre-processed information fed into the cell-state module at time t; W_yz and W_xz are the connection weight matrices between the output y_{t-1} at time t−1, the input x_t at time t, and O_z(t); b_z is the bias vector; ⊙ denotes the Hadamard (element-wise) product of matrices; tanh is the activation function.
Further, in Step3, the electric quantity constraint is expanded by using a logarithmic barrier function, which specifically includes the following steps:
For each constraint to be satisfied, an index function is set
, denoted I(x), satisfying:
I(x) = 0 for x ≤ 0, and I(x) = −∞ for x > 0
In the formula: when the constraint condition J_C(π_θ) ≤ d is satisfied under strategy π_θ, the problem is converted into an unconstrained strategy optimization problem that considers only the reward; when any constraint is violated, the penalty is −∞, so the strategy must first be adjusted to satisfy the constraint. The index (indicator) function I(x) can be differentiably approximated by the logarithmic barrier function:
φ(x) = log(−x) / k
where k is a hyper-parameter; the larger k is, the better φ(x) fits the index function I(x). By expanding the objective with this function, the original CMDP problem is simplified into an unconstrained optimization problem.
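The way the logarithmic barrier approximates the index (indicator) function can be checked directly with a few lines of Python; in this small sketch, larger values of the hyper-parameter k keep log(−x)/k closer to zero throughout the feasible region x < 0:

import math

def indicator(x: float) -> float:
    """Ideal index function: no penalty while the constraint x <= 0 holds."""
    return 0.0 if x <= 0 else float("-inf")

def log_barrier(x: float, k: float) -> float:
    """Differentiable approximation phi(x) = log(-x) / k, defined for x < 0."""
    return math.log(-x) / k

# The larger k is, the closer the barrier stays to 0 inside the feasible region.
for k in (2, 10, 50):
    print(k, [round(log_barrier(x, k), 3) for x in (-1.0, -0.5, -0.1, -0.01)])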
Interior point strategy optimization inherits the framework of the proximal policy optimization algorithm and adopts an actor-critic architecture; during training, mini-batches of data are randomly sampled from the experience replay pool for strategy updates. The critic network updates its parameters θ_v with the temporal-difference error method; the calculation formula is:
L(θ_v) = E[ (r_t + γ·V_{θ_v}(s_{t+1}) − V_{θ_v}(s_t))² ]
In the formula: V_{θ_v}(s_t) is the state-value function of the critic network at time t.
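A minimal PyTorch-style sketch of the critic update by temporal-difference error is given below; the network sizes, state dimension and optimizer settings are assumptions chosen only for illustration:

import torch
import torch.nn as nn

# Sketch: critic (state-value) network updated with the TD error.
# Layer sizes, state dimension and learning rate are illustrative assumptions.

STATE_DIM, GAMMA = 25, 0.99          # e.g. SOC plus 24 hourly prices

critic = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)

def critic_update(s, r, s_next, done):
    """One mini-batch update minimizing (r + gamma*V(s') - V(s))^2.

    s, s_next: float tensors of shape (batch, STATE_DIM);
    r, done:   float tensors of shape (batch,), done is 0/1.
    """
    v = critic(s).squeeze(-1)
    with torch.no_grad():
        v_next = critic(s_next).squeeze(-1)
        target = r + GAMMA * (1.0 - done) * v_next
    loss = ((target - v) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()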
the near-end strategy optimization carries out first-order approximation and adopts Monte Carlo to approximate expectation, and then a target function L is obtained through a cutting function CLIP
Figure BDA00037538931900000411
In the formula:
Figure BDA0003753893190000051
representing the ratio of old and new policies;
Figure BDA0003753893190000052
Representing a merit function; clip (·) function will ξ t Restricted to the interval [ 1-epsilon, 1+ epsilon ] with respect to the hyper-parameter epsilon]In addition, the calculation process is simplified.
Interior point strategy optimization expands the constraint into the objective function through the logarithmic barrier function, which both satisfies the long-horizon coupling constraint and yields a trust-region-style correction method compatible with stochastic gradient descent. The final objective function under parameter θ is:
max_θ L(θ) = L^CLIP(θ) + log(d − J_C(π_θ)) / k
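The sketch below shows one way the clipped surrogate and the logarithmic barrier on the discounted constraint return can be combined into a single actor loss in the spirit of the objective above; the estimate of J_C, the threshold d and the hyper-parameters ε and k are illustrative assumptions:

import torch

# Sketch of an interior-point-style actor loss: PPO clipped surrogate plus a
# logarithmic barrier on the discounted constraint return J_C.
# Hyper-parameter values (EPS, K, D) are illustrative assumptions.

EPS, K, D = 0.2, 20.0, 0.1   # clip range epsilon, barrier sharpness k, threshold d

def ipo_actor_loss(log_prob_new, log_prob_old, advantage, j_c_estimate):
    """log_prob_*: log pi(a_t|s_t) under the new/old policy (tensors);
    advantage: estimated A_hat_t; j_c_estimate: scalar tensor estimating J_C(pi)."""
    ratio = torch.exp(log_prob_new - log_prob_old)                  # xi_t
    clipped = torch.clamp(ratio, 1.0 - EPS, 1.0 + EPS)
    l_clip = torch.min(ratio * advantage, clipped * advantage).mean()

    slack = D - j_c_estimate                                        # must stay positive
    if slack.item() <= 0:                                           # constraint violated
        barrier = torch.tensor(float("-inf"))
    else:
        barrier = torch.log(slack) / K                              # log-barrier term

    return -(l_clip + barrier)   # negate because optimizers minimize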
further, the Step4 specifically includes the following steps:
the decision of the intelligent agent after on-line deployment only depends on the actuator network which is trained, and the optimal parameter theta obtained by training the actuator network is loaded * And the neural network model is used for interacting with the intelligent agent according to the state information to obtain a real-time charging and discharging decision. And continuously repeating the interaction process until the electric automobile leaves the charging pile.
Beneficial effects:
The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization does not depend on accurate prediction of future information and performs self-optimizing strategy learning relying only on real-time perception of the environment state. The adopted long short-term memory neural network effectively extracts the time-series features of the future trend of the time-varying electricity price. In addition, the interior point strategy optimization algorithm takes meeting the user's trip electricity demand as a constraint premise, minimizing the user's charging cost while improving the strategy's adaptability to uncertainty.
Drawings
FIG. 1 is a diagram of the long short-term memory neural network-based time-series feature extraction and strategy network of the present invention.
FIG. 2 is a flow chart of training of the interior point strategy optimization algorithm of the present invention.
FIG. 3 is a diagram of a Markov decision process of the present invention.
FIG. 4 is a diagram illustrating reward values under the interior point policy optimization algorithm of the present invention.
FIG. 5 is a diagram illustrating constraint values under the interior point policy optimization algorithm of the present invention.
FIG. 6 is a real-time electric vehicle charging and discharging schedule according to the present invention.
Detailed description of the preferred embodiment
The embodiments of the present invention are described in detail below with reference to the accompanying drawings:
the invention provides an electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization, which comprises the following steps:
step1: modeling the slow charging process of the electric automobile under the dynamic electricity price, describing the state space of the EV storage battery, considering the constraint of the electricity demand of an automobile owner, and modeling the charge and discharge strategy deployment problem of the electric automobile as a constraint Markov decision problem, specifically comprising:
E_{t+1} = E_t + η_ch·a_t·Δt, when a_t ≥ 0 (charging)
E_{t+1} = E_t + a_t·Δt/η_dis, when a_t < 0 (discharging)
−P_dis^max ≤ a_t ≤ P_ch^max, t ∈ [t_0, t_1], E_{t_0} = E_0
In the formula: t_0 and t_1 are the arrival and departure times of the electric vehicle, both taken at the start of a time slot; E_0 is the remaining energy when charging and discharging begin; E_t and E_{t+1} are the SOC values at times t and t+1; η_ch and η_dis are the EV battery energy-conversion efficiencies during charging and discharging; P_ch^max and P_dis^max are the maximum charging and discharging powers; Δt is the duration of each charging action; a_t is the charging/discharging power of the electric vehicle: the vehicle charges when a_t > 0 and discharges otherwise.
Charging/discharging scheduling agent and environment: the electric vehicle charging/discharging scheduling agent learns from experience through interaction with the environment and optimizes the charging/discharging scheduling strategy. The environment observed by the agent is divided into two parts: the real-time SOC of the electric vehicle and the time-varying electricity price at the current time node.
State set S: the environment state at time t is defined as:
s_t = (E_t, P_{t-23}, ..., P_t)
In the formula: E_t is the SOC of the electric vehicle at time t, and (P_{t-23}, ..., P_t) are the electricity prices over the past 24 hours.
Action set A: the action a_t at time t represents the charging/discharging power of the electric vehicle per unit time; a_t × Δt is the amount of electric energy exchanged by the electric vehicle during the period Δt.
The core of the constrained Markov decision process is to give an optimal strategy that maximizes the reward function r_t while satisfying the constraint function c_t; the optimization goal of electric vehicle charging scheduling is therefore to minimize the charging cost subject to the owner's electricity demand constraint:
(1) Charging cost, i.e. the reward value:
r_t = R(s_t, a_t, s_{t+1}) = −a_t × P_t
In the formula: during charging, the reward is the negative of the charging cost, i.e. minus the product of the electricity price at time t and the charging power per unit time; during discharging, the reward represents the revenue from selling electricity to the grid.
(2) Electric quantity constraint, i.e. the constraint value c_t = C(s_t, a_t, s_{t+1}): |E_t − E_target| is the deviation between the battery energy E_t at the moment charging is completed and the charging target E_target; the conditions E_t − E_max ≤ 0 and E_min − E_t ≤ 0 reflect the actual physical mechanism and keep charging and discharging within the EV battery capacity range.
The learning goal of the electric vehicle charging/discharging scheduling agent is to maximize the T-period total expected discounted reward subject to the electric quantity constraint; the objective function is expressed as:
max_{π∈Π_C} J(π) = E_π[ Σ_{t=0}^{T} γ^t·r_t ]
Π_C = {π : J_C(π) ≤ d},  J_C(π) = E_π[ Σ_{t=0}^{T} γ^t·c_t ]
Step2: extracting the future trend of the time-varying electricity price with the long short-term memory neural network to assist the subsequent deep reinforcement learning in strategy optimization and achieve effective scheduling of the EV charging and discharging strategy, specifically as follows:
The calculation process of the long short-term memory neural network and prediction module is as follows. The LSTM network is unrolled into a 23-layer structure, where the input of the first layer is X_{t-22} = P_{t-22} − P_{t-23}, with P_{t-22} and P_{t-23} the time-varying electricity prices at times t−22 and t−23, respectively; y_{t-22} is the output of the first layer and c_{t-22} its cell state. The outputs y_{t-22} and c_{t-22}, which carry past electricity-price information, are passed to the next layer, and this process is repeated until the last layer.
The LSTM protects and controls the cell state through a gate mechanism to transmit memory information selectively, using a forget gate, an input gate and an output gate:
O_forget(t) = σ(W_yf·y_{t-1} + W_xf·x_t + b_f)
O_input(t) = σ(W_yi·y_{t-1} + W_xi·x_t + b_i)
O_out(t) = σ(W_yo·y_{t-1} + W_xo·x_t + b_o)
In the formula: O_forget(t), O_input(t) and O_out(t) are the output matrices of the forget gate, input gate and output gate at time t; W_yf, W_xf, W_yi, W_xi, W_yo and W_xo are the connection weight matrices between each gate and the output y_{t-1} at time t−1 and the input x_t at time t; b_f, b_i and b_o are the bias vectors of the gates on the corresponding branches; σ is the activation function.
Therefore, the calculation formulas for extracting the future trend of the time-series electricity price are:
O_z(t) = tanh(W_yz·y_{t-1} + W_xz·x_t + b_z)
c_t = O_forget(t) ⊙ c_{t-1} + O_input(t) ⊙ O_z(t)
y_t = O_out(t) ⊙ tanh(c_t)
In the formula: O_z(t) is the pre-processed information fed into the cell-state module at time t; W_yz and W_xz are the connection weight matrices between the output y_{t-1} at time t−1, the input x_t at time t, and O_z(t); b_z is the bias vector; ⊙ denotes the Hadamard (element-wise) product of matrices; tanh is the activation function.
Step3: the method adopts an interior point-based strategy optimization algorithm, utilizes a logarithmic barrier function to convert an electric quantity constraint condition, optimizes strategy deployment in a deep neural network, and minimizes user charging cost, and specifically comprises the following steps:
for each problem that the constraint satisfies, an index function is set
Figure BDA0003753893190000082
Satisfies the following conditions:
Figure BDA0003753893190000083
in the formula: when in strategy pi θ Lower constraint condition
Figure BDA0003753893190000084
When the condition is met, the problem is converted into an unconstrained strategy optimization problem only considering rewards to solve; however, when any constraint violates, the penalty is- ∞, requiring preferential tuning of the policy to satisfy the constraint. The logarithmic barrier function being an index function>
Figure BDA0003753893190000085
A micro-approximation of:
Figure BDA0003753893190000086
where k is a hyperparameter, the larger the value of k, the larger the index function
Figure BDA0003753893190000087
The better the fit. Thus passing through the index function
Figure BDA0003753893190000088
And (4) expanding the target and simplifying the original CMDP problem into an unconstrained optimization problem.
Interior point strategy optimization inherits the framework of the proximal policy optimization algorithm and adopts an actor-critic architecture; during training, mini-batches of data are randomly sampled from the experience replay pool for strategy updates. The critic network updates its parameters θ_v with the temporal-difference error method; the calculation formula is:
L(θ_v) = E[ (r_t + γ·V_{θ_v}(s_{t+1}) − V_{θ_v}(s_t))² ]
In the formula: V_{θ_v}(s_t) is the state-value function of the critic network at time t.
Proximal policy optimization performs a first-order approximation and uses Monte Carlo sampling to approximate the expectation; the objective function L^CLIP is then obtained through the clipping function:
L^CLIP(θ) = E_t[ min(ξ_t·Â_t, clip(ξ_t, 1−ε, 1+ε)·Â_t) ]
In the formula: ξ_t = π_θ(a_t|s_t) / π_θold(a_t|s_t) is the ratio of the new strategy to the old strategy; Â_t is the advantage function; the clip(·) function restricts ξ_t to the interval [1−ε, 1+ε] defined by the hyper-parameter ε, which also simplifies the calculation.
Interior point strategy optimization expands the constraint into the objective function through the logarithmic barrier function, which both satisfies the long-horizon coupling constraint and yields a trust-region-style correction method compatible with stochastic gradient descent. The final objective function under parameter θ is:
max_θ L(θ) = L^CLIP(θ) + log(d − J_C(π_θ)) / k
Step4: interacting the scheduling agent with the external environment according to the trained strategy to obtain real-time electric vehicle charging and discharging decisions, specifically as follows:
After online deployment, the agent's decisions rely only on the trained actor network. The neural network model loaded with the optimal parameters θ* obtained from actor-network training interacts with the environment according to the state information to produce real-time charging and discharging decisions. This interaction process is repeated continuously until the electric vehicle leaves the charging pile.

Claims (4)

1. An electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization is characterized by comprising the following steps:
step1: modeling a slow charging process of the electric vehicle under a dynamic electricity price, describing an EV storage battery state space, considering vehicle owner electricity demand constraint, and modeling a charging and discharging strategy deployment problem of the electric vehicle as a constraint Markov decision problem;
step2: extracting future trend of time-varying power price by using a long-short term memory neural network, assisting subsequent deep reinforcement learning to carry out strategy optimization, and realizing effective deployment of EV charge-discharge strategies;
step3: an interior point-based strategy optimization algorithm is adopted, the constraint condition of the electricity quantity is converted by using a logarithm barrier function, strategy deployment is optimized in a deep neural network, and the charging cost of a user is minimized;
step4: according to the strategy obtained by training, interacting an external environment with a deployment intelligent body to obtain a real-time electric vehicle charging and discharging decision;
in the Step1, a model of the electric vehicle for slow charging at the dynamic electricity price is as follows:
E_{t+1} = E_t + η_ch·a_t·Δt, when a_t ≥ 0 (charging)
E_{t+1} = E_t + a_t·Δt/η_dis, when a_t < 0 (discharging)
−P_dis^max ≤ a_t ≤ P_ch^max, t ∈ [t_0, t_1], E_{t_0} = E_0
in the formula: t_0 and t_1 are the arrival and departure times of the electric vehicle, both taken at the start of a time slot; E_0 is the remaining energy when charging and discharging begin; E_t and E_{t+1} are the SOC values at times t and t+1; η_ch and η_dis are the EV battery energy-conversion efficiencies during charging and discharging; P_ch^max and P_dis^max are the maximum charging and discharging powers; Δt is the duration of each charging action; a_t is the charging/discharging power of the electric vehicle: the vehicle charges when a_t > 0, otherwise it discharges;
the electric vehicle charging/discharging scheduling agent and the environment: the agent learns from experience through interaction with the environment and optimizes the charging/discharging scheduling strategy; the environment observed by the agent is divided into two parts, one being the real-time SOC of the electric vehicle and the other the time-varying electricity price at the current time node;
state set S: the environment state at time t is defined as:
s_t = (E_t, P_{t-23}, ..., P_t)
in the formula: E_t is the SOC of the electric vehicle at time t, and (P_{t-23}, ..., P_t) are the electricity prices over the past 24 hours;
action set A: the action a_t at time t represents the charging/discharging power of the electric vehicle per unit time; a_t × Δt is the amount of electric energy exchanged by the electric vehicle during the period Δt;
the core of the constrained Markov decision process is to give an optimal strategy that maximizes the reward function r_t while satisfying the constraint function c_t, so the optimization goal of electric vehicle charging scheduling is to minimize the charging cost subject to the owner's electricity demand constraint:
(1) charging cost, i.e. the reward value:
r_t = R(s_t, a_t, s_{t+1}) = −a_t × P_t
in the formula: during charging, the reward is the negative of the charging cost, i.e. minus the product of the charging power per unit time and the electricity price at time t; during discharging, the reward represents the revenue from selling electricity to the grid;
(2) electric quantity constraint, i.e. the constraint value c_t = C(s_t, a_t, s_{t+1}): |E_t − E_target| is the deviation between the battery energy E_t at the moment charging is completed and the charging target E_target;
E_t − E_max ≤ 0 and E_min − E_t ≤ 0 reflect the actual physical mechanism and keep charging and discharging within the EV battery capacity range;
the learning goal of the electric vehicle charging/discharging scheduling agent is to maximize the T-period total expected discounted reward subject to the electric quantity constraint, and the objective function is expressed as:
max_{π∈Π_C} J(π) = E_π[ Σ_{t=0}^{T} γ^t·r_t ]
Π_C = {π : J_C(π) ≤ d},  J_C(π) = E_π[ Σ_{t=0}^{T} γ^t·c_t ]
in the formula: γ is the discount factor that balances current and future constraint values; d represents a very small constraint threshold.
2. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, wherein in Step2 the feature information of the future trend of the time-varying electricity price is extracted with a long short-term memory neural network, specifically as follows:
the calculation process of the long short-term memory neural network and prediction module is as follows: the LSTM network is unrolled into a 23-layer structure; the input of the first layer is X_{t-22} = P_{t-22} − P_{t-23}, where P_{t-22} and P_{t-23} are the time-varying electricity prices at times t−22 and t−23 respectively; y_{t-22} is the output of the first layer and c_{t-22} its cell state; the outputs y_{t-22} and c_{t-22}, which carry past electricity-price information, are passed to the next layer; this process is repeated until the last layer;
the LSTM protects and controls the cell state through a gate mechanism to transmit memory information selectively, using a forget gate, an input gate and an output gate:
O_forget(t) = σ(W_yf·y_{t-1} + W_xf·x_t + b_f)
O_input(t) = σ(W_yi·y_{t-1} + W_xi·x_t + b_i)
O_out(t) = σ(W_yo·y_{t-1} + W_xo·x_t + b_o)
in the formula: O_forget(t), O_input(t) and O_out(t) are the output matrices of the forget gate, input gate and output gate at time t; W_yf, W_xf, W_yi, W_xi, W_yo and W_xo are the connection weight matrices between each gate and the output y_{t-1} at time t−1 and the input x_t at time t; b_f, b_i and b_o are the bias vectors of the gates on the corresponding branches; σ is the activation function;
therefore, the calculation formulas for extracting the future trend of the time-series electricity price are:
O_z(t) = tanh(W_yz·y_{t-1} + W_xz·x_t + b_z)
c_t = O_forget(t) ⊙ c_{t-1} + O_input(t) ⊙ O_z(t)
y_t = O_out(t) ⊙ tanh(c_t)
in the formula: O_z(t) is the pre-processed information fed into the cell-state module at time t; W_yz and W_xz are the connection weight matrices between the output y_{t-1} at time t−1, the input x_t at time t, and O_z(t); b_z is the bias vector; ⊙ denotes the Hadamard (element-wise) product of matrices; tanh is the activation function.
3. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, wherein in Step3 the electric quantity constraint is expanded with a logarithmic barrier function, specifically as follows:
for each constraint to be satisfied, an index (indicator) function I(x) is set, satisfying:
I(x) = 0 for x ≤ 0, and I(x) = −∞ for x > 0
in the formula: when the constraint condition J_C(π_θ) ≤ d is satisfied under strategy π_θ, the problem is converted into an unconstrained strategy optimization problem that considers only the reward; when any constraint is violated, the penalty is −∞, so the strategy must first be adjusted to satisfy the constraint; the index function I(x) can be differentiably approximated by the logarithmic barrier function:
φ(x) = log(−x) / k
where k is a hyper-parameter; the larger k is, the better φ(x) fits the index function I(x); by expanding the objective with this function, the original CMDP problem is simplified into an unconstrained optimization problem;
interior point strategy optimization inherits the framework of the proximal policy optimization algorithm and adopts an actor-critic architecture; during training, mini-batches of data are randomly sampled from the experience replay pool for strategy updates; the critic network updates its parameters θ_v with the temporal-difference error method, and the calculation formula is:
L(θ_v) = E[ (r_t + γ·V_{θ_v}(s_{t+1}) − V_{θ_v}(s_t))² ]
in the formula: V_{θ_v}(s_t) is the state-value function of the critic network at time t;
proximal policy optimization performs a first-order approximation and uses Monte Carlo sampling to approximate the expectation, and the objective function L^CLIP is then obtained through the clipping function:
L^CLIP(θ) = E_t[ min(ξ_t·Â_t, clip(ξ_t, 1−ε, 1+ε)·Â_t) ]
in the formula: ξ_t = π_θ(a_t|s_t) / π_θold(a_t|s_t) is the ratio of the new strategy to the old strategy; Â_t is the advantage function; the clip(·) function restricts ξ_t to the interval [1−ε, 1+ε] defined by the hyper-parameter ε, which also simplifies the calculation;
interior point strategy optimization expands the constraint into the objective function through the logarithmic barrier function, which both satisfies the long-horizon coupling constraint and yields a trust-region-style correction method compatible with stochastic gradient descent; the final objective function under parameter θ is:
max_θ L(θ) = L^CLIP(θ) + log(d − J_C(π_θ)) / k.
4. The electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization according to claim 1, wherein Step4 specifically comprises the following steps:
the decision of the intelligent agent after on-line deployment only depends on the actuator network which is trained, and the optimal parameter theta obtained by the training of the actuator network is loaded * And the neural network model is interacted with the intelligent body according to the state information to obtain a real-time charging and discharging decision, and the interaction process is repeated continuously until the electric automobile leaves the charging pile.
CN202210848364.XA 2022-07-19 2022-07-19 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization Active CN114997935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210848364.XA CN114997935B (en) 2022-07-19 2022-07-19 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210848364.XA CN114997935B (en) 2022-07-19 2022-07-19 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization

Publications (2)

Publication Number Publication Date
CN114997935A CN114997935A (en) 2022-09-02
CN114997935B true CN114997935B (en) 2023-04-07

Family

ID=83021907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210848364.XA Active CN114997935B (en) 2022-07-19 2022-07-19 Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization

Country Status (1)

Country Link
CN (1) CN114997935B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115731072B (en) * 2022-11-22 2024-01-30 东南大学 Micro-grid space-time perception energy management method based on safety deep reinforcement learning
CN115936195B (en) * 2022-11-23 2024-07-12 合肥工业大学 Intelligent cell energy optimization method, system, electronic equipment and storage medium
CN117689188B (en) * 2024-02-04 2024-04-26 江西驴充充物联网科技有限公司 Big data-based user charging strategy optimization system and method
CN117863969B (en) * 2024-03-13 2024-05-17 国网北京市电力公司 Electric automobile charge and discharge control method and system considering battery loss
CN118082598B (en) * 2024-04-25 2024-10-11 国网天津市电力公司电力科学研究院 Electric vehicle charging method, apparatus, device, medium, and program product

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113572157A (en) * 2021-07-27 2021-10-29 东南大学 User real-time autonomous energy management optimization method based on near-end policy optimization
CN113627993A (en) * 2021-08-26 2021-11-09 东北大学秦皇岛分校 Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111934335B (en) * 2020-08-18 2022-11-18 华北电力大学 Cluster electric vehicle charging behavior optimization method based on deep reinforcement learning
CN114619907B (en) * 2020-12-14 2023-10-20 中国科学技术大学 Coordinated charging method and coordinated charging system based on distributed deep reinforcement learning
CN113922404B (en) * 2021-10-22 2023-08-29 山东大学 Community electric automobile cluster charging coordination method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113572157A (en) * 2021-07-27 2021-10-29 东南大学 User real-time autonomous energy management optimization method based on near-end policy optimization
CN113627993A (en) * 2021-08-26 2021-11-09 东北大学秦皇岛分校 Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN114997935A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
CN114997935B (en) Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization
CN109347149B (en) Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning
CN113511082B (en) Hybrid electric vehicle energy management method based on rule and double-depth Q network
Xu et al. A soft actor-critic-based energy management strategy for electric vehicles with hybrid energy storage systems
CN112117760A (en) Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning
Huang et al. Ensemble learning for charging load forecasting of electric vehicle charging stations
CN110138006B (en) Multi-microgrid coordinated optimization scheduling method considering new energy electric vehicle
CN112131733B (en) Distributed power supply planning method considering influence of charging load of electric automobile
CN113572157B (en) User real-time autonomous energy management optimization method based on near-end policy optimization
CN109050284B (en) Electric automobile charging and discharging electricity price optimization method considering V2G
CN113627993A (en) Intelligent electric vehicle charging and discharging decision method based on deep reinforcement learning
CN107919675B (en) Charging station load scheduling model comprehensively considering benefits of vehicle owners and operators
CN112238781B (en) Electric automobile ordered charging control method based on layered architecture
CN103997091A (en) Scale electric automobile intelligent charging control method
CN114583729A (en) Light-storage electric vehicle charging station scheduling method considering full-life-cycle carbon emission
CN116683513A (en) Method and system for optimizing energy supplement strategy of mobile micro-grid
CN117559468A (en) V2G station rapid frequency modulation response method based on ultra-short term frequency deviation prediction
CN117318169A (en) Active power distribution network scheduling method based on deep reinforcement learning and new energy consumption
CN115308606A (en) Lithium ion battery health state estimation method based on proximity features
Zhang et al. Uncertainty-Aware Energy Management Strategy for Hybrid Electric Vehicle Using Hybrid Deep Learning Method
CN114619907B (en) Coordinated charging method and coordinated charging system based on distributed deep reinforcement learning
CN113555888B (en) Micro-grid energy storage coordination control method
CN114583696A (en) Power distribution network reactive power optimization method and system based on BP neural network and scene matching
CN115036952A (en) Real-time power control method for electric vehicle participating in load stabilization based on MPC
CN114742453A (en) Micro-grid energy management method based on Rainbow deep Q network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant