CN117350424A - Economic dispatching and electric vehicle charging strategy combined optimization method in energy internet - Google Patents

Economic dispatching and electric vehicle charging strategy combined optimization method in energy internet Download PDF

Info

Publication number
CN117350424A
CN117350424A CN202311231834.9A CN202311231834A CN117350424A CN 117350424 A CN117350424 A CN 117350424A CN 202311231834 A CN202311231834 A CN 202311231834A CN 117350424 A CN117350424 A CN 117350424A
Authority
CN
China
Prior art keywords
charging
power
day
charging pile
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311231834.9A
Other languages
Chinese (zh)
Inventor
彭宇
胡本然
王宁
孙迪
王振邦
潘刚
刘永楠
关心
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Heilongjiang Electric Power Co Ltd
Heilongjiang University
Original Assignee
State Grid Heilongjiang Electric Power Co Ltd
Heilongjiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Heilongjiang Electric Power Co Ltd, Heilongjiang University filed Critical State Grid Heilongjiang Electric Power Co Ltd
Priority to CN202311231834.9A priority Critical patent/CN117350424A/en
Publication of CN117350424A publication Critical patent/CN117350424A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067Enterprise or organisation modelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Charge And Discharge Circuits For Batteries Or The Like (AREA)

Abstract

The invention relates to the technical field of operation optimization of power distribution networks, in particular to an intelligent optimization method combining economic dispatch and electric vehicle charging strategies in the energy Internet. In the prior art, in the joint optimization of economic dispatch and electric automobile charging strategies in the energy Internet, the problem of incomplete consideration exists. The invention provides a multi-target electric automobile intelligent charging model considering two-stage economic dispatch. The model indirectly reflects the charging satisfaction degree of the user through distance cost, time cost, user mileage anxiety and the like, and adopts a method based on near-end policy optimization to find the optimal charging policy of the electric automobile, so that lower charging cost is realized and electric energy in the charging pile is consumed as much as possible. Meanwhile, the power economy dispatching of the charging pile is considered, and an optimal power dispatching strategy is given out by establishing a two-stage multi-objective optimization model and using a deep reinforcement learning method based on near-end strategy optimization, so that lower power generation cost and lower carbon emission are realized.

Description

Economic dispatching and electric vehicle charging strategy combined optimization method in energy internet
Technical Field
The invention relates to the technical field of operation optimization of power distribution networks, in particular to an intelligent optimization method combining economic dispatch and electric vehicle charging strategies in the energy Internet.
Background
In recent years, with the development of industry, the problems of energy shortage and environmental pollution are increasingly serious. In the traffic field, the emission of automobile exhaust has become one of the main factors of increasingly serious environmental pollution problems. New energy electric vehicles are becoming more and more popular due to their zero emission characteristics. However, as the number of electric vehicles increases, the problem of road congestion is also accompanied, and the "charging difficulty" becomes the biggest trouble for many electric vehicle users. In addition, the traffic jam and the improper charging strategy can lead to surplus electric energy transmitted by the power generation side to the charging pile, further lead to energy loss and seriously affect the shortage of energy.
There are three major challenges in designing charging strategies in intelligent traffic agents. Firstly, due to the time-varying nature of road traffic networks and the volatility of energy prices, it is difficult to design an effective strategy to guide an electric vehicle to select an appropriate charging pile for charging. The strategy not only can promote the charging experience of the user, but also can consume the residual electric energy in the charging pile as much as possible. Secondly, designing an efficient power economic dispatching strategy, and reducing environmental pollution is also critical because new energy is consumed as much as possible for power generation. Finally, due to uncertainty of wind and light power generation, it is difficult to design an efficient and economical scheduling strategy to meet energy load requirements of the charging pile, and simultaneously, energy surplus of the charging pile is reduced as much as possible.
There are many studies on charging strategies of electric vehicles, for example, an "Intelligent Charging Scheduling Algorithm (ICSA)" has been proposed to reduce charging time and energy consumption of electric vehicles. However, the authors do not consider real-time road traffic conditions in the model, which may lead to the electric vehicle not being charged in time due to road congestion. Still other people put forward a plug-in electric automobile intelligent charging scheduling strategy for supplementary plug-in electric automobile seeks charging stake that charge time, travel time and charge cost are minimum. However, the authors do not take into account the remaining energy consumption and user mileage anxiety of the charging stake [3] Variations in (c) may result in less than optimal results for policy guidance. Still another person provides an electric automobile navigation frame that charges, connects electric power intelligent object and traffic intelligent object, and the method through layering recreation is effective with electric automobile navigation to charging stake. However, the authors of the document do not consider that the change of the energy price of the charging piles in different regions may have different effects on the policy guidance, and some electric car users may endure a longer distance to select charging piles with lower prices for the charging task. Therefore, it is necessary to design a comprehensive electric vehicle charging architecture to improve the user charging satisfaction in the intelligent traffic agent.
With the success of alpha go, deep Reinforcement Learning (DRL) has shown great potential in decision-making application problems, and recently, many researches have been carried out to solve the problem of electric vehicle charging scheduling in complex scenes by using the DRL, so that an effective charging planning model is proposed, and an optimal scheduling strategy is obtained by adopting a Deep Q Network (DQN) algorithm based on the DRL, so that real-time intelligent energy management of the electric vehicle is realized. However, DQN can only handle discrete action tasks, and it is obvious that DQN is not suitable for continuous action electric car charging scenarios. A multi-target electric vehicle charge coordination framework has also been proposed to achieve lower charge costs. In addition, authors have used long and short term memory networks to predict electricity prices to cope with uncertain price mechanisms. However, the effect of real-time changes in road traffic on the charging strategy of electric vehicles is not fully considered in this document. The uncertainty of user commute and the randomness of the electric automobile reaching the charging pile are solved artificially, the author describes the electric automobile charging problem as a constrained Markov decision process, and finally a constrained charging strategy is found. However, the authors do not consider the real-time electric quantity of the electric vehicle and the residual electric quantity of the charging pile, which easily causes that the electric vehicle cannot be charged in time due to insufficient electric quantity, and the energy of the charging pile is also caused to be excessive,
In summary, in the prior art, in the joint optimization of economic dispatch and electric vehicle charging strategy in the energy internet, there is a problem that consideration is incomplete.
Disclosure of Invention
The invention aims to solve the problem of incomplete consideration in the joint optimization of economic dispatch and electric vehicle charging strategies in the energy Internet in the prior art, designs an optimal charging strategy to consume renewable energy sources as much as possible to generate electricity, reduces environmental pollution and improves user charging satisfaction. Meanwhile, the power economy dispatching of the charging pile is considered, and an optimal power dispatching strategy is given out by establishing a two-stage multi-objective optimization model and using a deep reinforcement learning method based on near-end strategy optimization, so that lower power generation cost and lower carbon emission are realized.
The technical scheme of the application is as follows:
s1: construction of data set 1
S1.1: acquiring historical data information of a charging pile;
The charging pile historical data information comprises charging pile historical load data and charging pile historical energy storage data;
the charging pile historical load data comprise historical load data of a wind generating set on a charging pile, historical load data of a photovoltaic generating set on the charging pile and historical load data of a thermal generating set on the charging pile;
the charging pile historical energy storage data comprises historical energy storage data of a charging pile day-ahead dispatching stage and historical energy storage data of a charging pile day-in-short-term rolling dispatching stage;
s1.2: the GRU network based on Attention (Attention) mechanism processes historical load data of the wind generating set on the charging pile, historical load data of the photovoltaic generating set on the charging pile and historical load data of the thermal generating set on the charging pile respectively to obtain predicted generating data of the wind generating set, predicted generating data of the photovoltaic generating set and predicted generating power of the thermal generating set on the charging pile in a day-ahead dispatching stage;
the wind generating set forecast power generation data comprises: the predicted power generation power of the wind generating set to the charging pile in the day-ahead dispatching stage and the predicted power generation power of the wind generating set to the charging pile in the day-in-short-term rolling dispatching stage;
The photovoltaic generator set forecast power generation data comprises: the predicted power generation power of the photovoltaic generator set to the charging pile in the day-ahead dispatching stage and the predicted power generation power of the photovoltaic generator set to the charging pile in the day-in dispatching stage;
s1.3: taking predicted power generation of the charging pile by the wind generating set in the day-ahead dispatching stage, predicted power generation of the charging pile by the photovoltaic generating set in the day-ahead dispatching stage, historical energy storage data of the charging pile in the day-ahead dispatching stage and load power of the charging pile in the day-ahead dispatching stage as a data set 1;
the load power of the charging pile in the day-ahead dispatching stage consists of the predicted power generation power of the wind generating set to the charging pile in the day-ahead dispatching stage, the predicted power generation power of the photovoltaic generating set to the charging pile in the day-ahead dispatching stage and the predicted power generation power of the thermal generating set to the charging pile in the day-ahead dispatching stage;
s2: an electric vehicle charging schedule optimization model which considers two-stage economic schedule and is optimized based on a near-end strategy is constructed;
the electric vehicle charging schedule optimization model based on the near-end policy optimization and considering two-stage economic schedule comprises the following steps: the electric power economic dispatch model and the electric vehicle charging dispatch model;
the specific process for constructing the electric vehicle charging schedule optimization model based on near-end policy optimization and considering two-stage economic schedule is as follows:
S2.1: establishing an objective function and constraint conditions of the power economy dispatching, and establishing an optimized form of the power economy dispatching according to the objective function and constraint conditions of the power economy dispatching;
s2.2: establishing an objective function and constraint conditions of electric vehicle charging schedule, and establishing an optimized form of the electric vehicle charging schedule according to the objective function and constraint conditions of the electric vehicle charging schedule;
s3: converting an electric vehicle charging schedule optimization model which is optimized based on a near-end strategy and considers two-stage economic schedule into a corresponding Markov decision model; the markov decision model is generally composed of a triplet of data S, A, R, including: a state set S, an action set A and a return function R;
the specific process of converting the electric vehicle charging schedule optimization model based on the near-end strategy optimization and considering the two-stage economic schedule into the corresponding Markov decision model is as follows:
s3.1: establishing a Markov decision model of the power economy dispatching according to the power economy dispatching optimization form; the markov decision model of the power economy dispatch includes: a Markov decision model for day-ahead scheduling and a Markov decision model for intra-day short-term rolling scheduling;
S3.2: establishing a Markov decision model of the electric vehicle charging schedule according to an optimized form of the electric vehicle charging schedule;
s4: solving a Markov decision model of the power economy dispatching to obtain an optimal power economy dispatching strategy planning table;
the optimal power economy scheduling policy schedule includes: a day-ahead scheduling strategy consisting of a day-ahead scheduling action set of the generator set and a day-in short-term rolling scheduling strategy consisting of a day-in short-term rolling scheduling action set of the generator set;
the generator set comprises a thermal generator set, a wind generator set and a photovoltaic generator set;
s4.1: constructing an Actor-Critic network integrating an importance sampling technology and a dynamic step length mechanism under an Actor-Critic architecture,
actor network according to current state s t Generating a random strategy pi by the current strategy parameter theta θ And according to the current policy pi θ The output intelligent agent takes corresponding action a t The method comprises the steps of carrying out a first treatment on the surface of the the intelligent agent is making action a at time t t Thereafter, the Actor-Critic network returns an instant prize r to the agent t Then the agent generates a state vector s at the next moment t+1 Obtaining a training sample sequence<s t ,a t ,r t ,s t+1 ,π θ >The method comprises the steps of carrying out a first treatment on the surface of the The agent is an entity that takes action and makes decisions based on feedback; the current state s t Is the current location or condition of the agent;
when training the Actor network, a strategy gradient descent method is adopted, and the strategy parameter theta of the Actor network is updated in the direction of the maximum rewarding expected value J (theta) obtained by the intelligent agent; the strategy gradient descending method is to update strategy parameters according to the descending direction of the measurement gradient by calculating the strategy gradient;
the strategy gradient is calculated by introducing a dominance function, and the dominance function represents the excellent degree of the selected action in the current state; the dominance function is a state-action value functionAnd a state value function generated by Critic network +.>Obtaining the difference value of (2);
the Critic network is responsible for evaluating action a t Extracting the current state s from the extracted sample sequence t Calculating the current state s t Lower state value functionCritic network uses bonus function +.>Sum state value functionThe Mean Square Error (MSE) between as a loss function to update the network parameter μ;
the importance sampling technology is used for estimating the expected value of the reward of updating the new strategy for the old strategy, so as to determine the maximizing direction of the expected value J (theta) of the reward and update the strategy parameters;
the dynamic step length mechanism is used for limiting the updating step length through a shearing mechanism of the dominance function in the training process, so that the dynamic step length adjustment is realized, and the loss function of the Actor network is obtained;
S4.2: constructing an Actor1-Critic1 network which is integrated with an importance sampling technology and a dynamic step length mechanism under an Actor-Critic architecture, and solving a Markov decision model of day-ahead scheduling through the Actor1-Critic1 network to obtain a day-ahead scheduling strategy formed by a group of optimal day-ahead scheduling action sets;
the policy pi generated by the Actor1 network θ1 Obeying Beta (Beta) distribution;
the specific process of solving the Markov decision model scheduled in the day before by the Actor1-Critic1 network is as follows:
data set 1 constructs state set s of Actor1-Critic1 network t1 Then input into an Actor1-Critic1 network, and the output generator set takes a day-ahead scheduling actionRewards and rewards/>Obtaining training sample sequencesTraining sample sequence->π θ1 >Stored in experience buffer->
Actor1-Critic1 network slave experience bufferRandomly extracting a batch of sample sequences, training and updating the network parameters theta of the Actor1 1 And Critic1 network parameter μ 1 Obtaining a day-ahead scheduling strategy consisting of a group of optimal day-ahead scheduling action sets;
the day-ahead scheduling action set includes: the method comprises the steps that in a day-ahead dispatching stage, a photovoltaic generator set dispatches the Internet surfing power to a charging pile, a wind power generator set dispatches the Internet surfing power to the charging pile, a thermal generator set dispatches the actual power generation power to the charging pile, and the thermal generator set is in an operation state in the day-ahead dispatching stage;
S4.3: constructing a data set 2;
the method comprises the steps of taking predicted power generation power of a charging pile by a wind generating set in a short-term rolling scheduling stage in the day, predicted power generation power of the charging pile by a photovoltaic generating set in the short-term rolling scheduling stage in the day, historical energy storage data of the charging pile in the short-term rolling scheduling stage in the day, load power of the charging pile in the short-term rolling scheduling stage in the day and the running state of the thermal generating set in the short-term rolling scheduling stage in the day as a data set 2;
the load power of the charging pile in the daily short-term rolling scheduling stage consists of the predicted power generation power of the wind generating set to the charging pile in the daily front scheduling stage, the predicted power generation power of the photovoltaic generating set to the charging pile in the daily short-term rolling scheduling stage and the actual power generation power of the thermal generating set to the charging pile in the daily short-term rolling scheduling stage;
the operation state of the thermal generator set in the daily short-term rolling scheduling stage is the same as the operation state of the thermal generator set in the daily front scheduling stage;
s4.4: constructing an Actor2-Critic2 network integrating an importance sampling technology and a dynamic step length mechanism under an Actor-Critic architecture, and solving a Markov decision model of intra-day short-term rolling scheduling through the Actor2-Critic2 network to obtain an intra-day short-term rolling scheduling strategy consisting of a group of intra-day short-term rolling scheduling action sets;
The policy pi generated by the Actor2 network θ2 Obeying Beta (Beta) distribution;
the method for solving the Markov decision model of the intra-day short-term rolling schedule by the Actor2-Critic2 network comprises the following specific processes:
data set 2 construction of state set s of Actor2-Critic2 network t2 Then input the data to an Actor2-Critic2 network, and the output generator set takes a daily short-term rolling scheduling action a t2 Prize r t2 Obtaining a training sample sequence<s t2 ,a t2 ,r t2 ,s t2+1 ,π θ2 >Training a sample sequence<s t2 ,a t2 ,r t2 ,s t2+1 ,π θ2 >Stored in experience buffers
Actor2-Critic2 network slave experience bufferA batch of sample sequences are randomly extracted to train and update the Actor2 and Critic2 network parameters theta 2 Sum mu 2 Obtaining a daily short-term rolling scheduling strategy consisting of a group of daily short-term rolling scheduling action sets,
based on the S4.2 and the S4.4, an optimal power economy dispatching strategy planning table is finally obtained, and high-precision power distribution of the charging pile is realized;
the intra-day short-term rolling schedule set includes: in the daily short-term rolling scheduling stage, the photovoltaic generator set schedules the Internet surfing power to the charging pile, the wind power generator set schedules the Internet surfing power to the charging pile, and the thermal generator set schedules the actual power generation power to the charging pile;
s5: constructing a data set 3, and solving a Markov decision model of the charging schedule of the electric automobile to obtain an optimal charging schedule strategy of the electric automobile; the specific process is as follows:
S5.1: acquiring data information of electric vehicles, charging piles, energy storage systems and road traffic conditions as a data set 3
The electric vehicle data includes: residual electric quantity of the electric automobile;
the charging pile data includes: the electricity price of the charging pile, the number of queuing vehicles of the charging pile, the generated power received by the charging pile and the minimum electric quantity required by the electric automobile to travel to the charging pile;
the energy storage system data includes: residual electric quantity in the energy storage system;
the road traffic condition data includes: the distance between two routing nodes, the road congestion index and the average road passing speed;
s5.2: the specific process for obtaining the optimal charging scheduling strategy of the electric automobile by solving the Markov decision model of the charging scheduling of the electric automobile comprises the following steps:
constructing an Actor3-Critic3 network integrating an importance sampling technology and a dynamic step length mechanism under an Actor-Critic architecture, solving a Markov decision model of electric automobile charging schedule through the Actor3-Critic3 network,
policy pi generated by the Actor3 network θ3 Obeying a category (category) distribution;
the specific process of solving the Markov decision model of the electric automobile charging schedule by the Actor3-Critic3 network is as follows:
Data set 3 construction of state set s of Actor3-Critic3 network t3 Then input to an Actor3-Critic3 network, and output that the electric automobile can take charge scheduling actiona t3 Prize r t3 Obtaining a training sample sequence<s t3 ,a t3 ,r t3 ,s t3+1 ,π θ3 >Training a sample sequence<s t3 ,a t3 ,r t3 ,s t3+1 ,π θ3 >Stored in experience buffers
Actor3-Critic3 network slave experience bufferA batch of sample sequences are randomly extracted to train and update the Actor3 and Critic3 network parameters theta 3 Sum mu 3 Finally, an optimal electric vehicle charging scheduling strategy formed by a group of electric vehicle charging scheduling action sets is obtained;
the electric vehicle charging scheduling action consists of selecting which charging pile is charged by the electric vehicle and charging power obtained by the selected charging pile.
Compared with the prior art, the application has the following effects:
the electric vehicle charging scheduling model based on near-end policy optimization and considering two-stage economic scheduling is provided. The model aims at maximizing the charging satisfaction of the user and the energy consumption of the charging pile. Meanwhile, the power generation cost and the carbon emission of the power generation side are reduced to the greatest extent. And finally, an optimal strategy of electric vehicle charging scheduling is provided. In order to effectively solve the proposed model, a three-step optimization framework is first proposed. The first step is to give the generating schedule of the unit by day-ahead scheduling. The second step is to improve the accuracy of power dispatching through daily short-term rolling dispatching, and the third step is to select proper charging piles for the electric automobile to realize charging dispatching. In the three-step optimization process, a deep reinforcement learning algorithm based on near-end strategy optimization is used for obtaining an optimal scheduling result. Numerical results show that the proposed algorithm is superior to the most advanced algorithms such as SAC and the like in terms of model convergence speed and magnitude of the reward function.
Drawings
Fig. 1 is a schematic diagram of an electric vehicle charging schedule architecture based on two-stage economic schedule according to the present invention;
fig. 2 is a schematic routing topology diagram of the electric vehicle according to the present invention;
fig. 3 is a schematic diagram illustrating a routing topology pattern of an electric vehicle according to the present invention;
FIG. 4 is a flow chart of an electric vehicle charging scheduling algorithm taking into account two-stage economic scheduling based on near-end policy optimization;
FIG. 5 is a schematic diagram of the next daily load prediction of the charging pile according to the present invention;
FIG. 6 is a graphical representation of day-ahead scheduling average jackpots at different learning rates in accordance with the present invention;
FIG. 7 is a schematic diagram of the power, energy storage and load conditions of the day-ahead dispatcher of the present invention;
FIG. 8 is a schematic diagram of the photovoltaic power generation optimization result scheduled in the future according to the invention;
FIG. 9 is a schematic diagram of the day-ahead schedule wind power generation optimization results of the present invention;
FIG. 10 is a graphical representation of average jackpot for daily short-term rolling schedule at different learning rates in accordance with the present invention;
FIG. 11 is a schematic diagram showing comparison of the optimization results of the photovoltaic power generation of the intra-day short-term rolling schedule and the pre-day schedule according to the present invention;
FIG. 12 is a graph showing comparison of the optimization results of the wind power generation during the daily short-term rolling schedule and the daily front schedule;
FIG. 13 is a graph showing the result of optimizing the carbon dioxide emission amount of the daily short-term rolling schedule according to the present invention;
FIG. 14 is a graph showing average cumulative rewards of electric vehicle charging schedule at different learning rates in accordance with the present invention;
fig. 15 is a schematic diagram of the energy remaining amount optimization result of the charging pile under different learning rates;
fig. 16 is an analysis schematic diagram of the charging schedule result of the electric vehicle according to the present invention;
FIG. 17 is a graphical representation of a comparison of normalized average jackpot results for different deep reinforcement learning algorithms under the same environmental setting of the present invention.
Reference numerals
A power plant-generator set; energy forecast-energy prediction; intraday Optimization-daily roll mobilization strategy; day-ahead Optimization-Day-ahead scheduling policy; electric Vehicle-Electric Vehicle; charging pile; server-Server; wind Power-Wind generating set; photovoltaic Power-photovoltaic generator set; thermal Power-Thermal generator set; energy Storage System-energy storage system; grid-Grid; data Flow-dataset; energy Flow-Energy Flow; charging Dispatching Optimization-electric car charging scheduling strategy; intraday Dispatch-day-ahead schedule; day-ahead Dispatch-intra-Day rolling schedule; a Repaly Buffer-experience Buffer; experient-Experience; sampling-samples; charging stations-charging pile group; EVs Charging Dispatch-electric car charging schedule; calcylate-calculation; loss value-loss function; advantage function-dominance function; update-Update; action-Action; state-State; energy-Energy source; data-Data.
Detailed Description
The first embodiment is as follows: in order to promote the energy consumption in the energy storage system and improve the user charging satisfaction, a mixed integer multi-objective optimization function is established, and the electric automobile can be charged by selecting a proper charging pile while meeting the self charging requirement. Secondly, in order to reduce environmental pollution and power generation cost, firstly, GRU network based on attention mechanism is used for predicting the next day energy demand of the charging pile, then day-ahead scheduling is used for determining the power generation plan and start-stop condition of the unit, and due to uncertainty of wind and light power generation, day-ahead scheduling is introduced on the basis of day-ahead scheduling, and accuracy of power scheduling is improved through a state updating strategy. Due to the highly dynamic traffic environment and power requirements, we learn time-varying road conditions and power generation using a near-end policy-optimized deep reinforcement learning algorithm. Based on the above knowledge, we need to obtain an optimal charging strategy to consume renewable energy sources as much as possible to generate electricity, reduce environmental pollution and improve user charging satisfaction.
We will first generally introduce the agentAnd (5) a model. FIG. 1 is a general frame diagram of an agent, provided with Is an intelligent body set, wherein the intelligent body comprises a thermal power generating unit, a wind power generating unit and a photoelectric unit,is an electric automobile set, which is->Is a collection of charging piles.
The method for optimizing the economic dispatch and the electric vehicle charging strategy in the energy internet in a combined way comprises the following steps:
s1: the data set 1 is constructed by the following specific processes:
s1.1: acquiring historical data information of a charging pile;
the charging pile historical data information comprises charging pile historical load data and charging pile historical energy storage data;
the charging pile historical load data comprise historical load data of a wind generating set on a charging pile, historical load data of a photovoltaic generating set on the charging pile and historical load data of a thermal generating set on the charging pile;
the charging pile historical energy storage data comprises historical energy storage data of a charging pile day-ahead dispatching stage and historical energy storage data of a charging pile day-in-short-term rolling dispatching stage;
s1.2: the GRU network based on Attention (Attention) mechanism processes historical load data of the wind generating set on the charging pile, historical load data of the photovoltaic generating set on the charging pile and historical load data of the thermal generating set on the charging pile respectively to obtain predicted generating data of the wind generating set, predicted generating data of the photovoltaic generating set and predicted generating power of the thermal generating set on the charging pile in a day-ahead dispatching stage;
The wind generating set forecast power generation data comprises: the predicted power generation power of the wind generating set to the charging pile in the day-ahead dispatching stage and the predicted power generation power of the wind generating set to the charging pile in the day-in-short-term rolling dispatching stage;
the photovoltaic generator set forecast power generation data comprises: the predicted power generation power of the photovoltaic generator set to the charging pile in the day-ahead dispatching stage and the predicted power generation power of the photovoltaic generator set to the charging pile in the day-in dispatching stage;
s1.3: taking predicted power generation of the charging pile by the wind generating set in the day-ahead dispatching stage, predicted power generation of the charging pile by the photovoltaic generating set in the day-ahead dispatching stage, historical energy storage data of the charging pile in the day-ahead dispatching stage and load power of the charging pile in the day-ahead dispatching stage as a data set 1;
the load power of the charging pile in the day-ahead dispatching stage consists of the predicted power generation power of the wind generating set to the charging pile in the day-ahead dispatching stage, the predicted power generation power of the photovoltaic generating set to the charging pile in the day-ahead dispatching stage and the predicted power generation power of the thermal generating set to the charging pile in the day-ahead dispatching stage;
s2: an electric vehicle charging schedule optimization model which considers two-stage economic schedule and is optimized based on a near-end strategy is constructed;
The electric vehicle charging schedule optimization model based on the near-end policy optimization and considering two-stage economic schedule comprises the following steps: the electric power economic dispatch model and the electric vehicle charging dispatch model;
the specific process for constructing the electric vehicle charging schedule optimization model based on near-end policy optimization and considering two-stage economic schedule is as follows:
s2.1: establishing an objective function and constraint conditions of the power economy dispatching, and establishing an optimized form of the power economy dispatching according to the objective function and constraint conditions of the power economy dispatching;
s2.2: establishing an objective function and constraint conditions of electric vehicle charging schedule, and establishing an optimized form of the electric vehicle charging schedule according to the objective function and constraint conditions of the electric vehicle charging schedule;
s3: converting an electric vehicle charging schedule optimization model which is optimized based on a near-end strategy and considers two-stage economic schedule into a corresponding Markov decision model; the markov decision model is generally composed of a triplet of data S, A, R, including: a state set S, an action set A and a return function R;
the specific process of converting the electric vehicle charging schedule optimization model based on the near-end strategy optimization and considering the two-stage economic schedule into the corresponding Markov decision model is as follows:
S3.1: establishing a Markov decision model of the power economy dispatching according to the power economy dispatching optimization form; the markov decision model of the power economy dispatch includes: a Markov decision model for day-ahead scheduling and a Markov decision model for intra-day short-term rolling scheduling;
s3.2: establishing a Markov decision model of the electric vehicle charging schedule according to an optimized form of the electric vehicle charging schedule;
s4: solving a Markov decision model of the power economy dispatching to obtain an optimal power economy dispatching strategy planning table;
the optimal power economy scheduling policy schedule includes: a day-ahead scheduling strategy consisting of a day-ahead scheduling action set of the generator set and a day-in short-term rolling scheduling strategy consisting of a day-in short-term rolling scheduling action set of the generator set;
the generator set comprises a thermal generator set, a wind generator set and a photovoltaic generator set;
s4.1: constructing an Actor-Critic network integrating an importance sampling technology and a dynamic step length mechanism under an Actor-Critic architecture,
actor network according to current state s t Generating a random strategy pi by the current strategy parameter theta θ And according to the current policy pi θ The output intelligent agent takes corresponding action a t The method comprises the steps of carrying out a first treatment on the surface of the the intelligent agent is making action a at time t t Thereafter, the Actor-Critic network returns an instant prize r to the agent t Then the agent generates a state vector s at the next moment t+1 Obtaining a training sample sequence<s t ,a t ,r t ,s t+1 ,π θ >The method comprises the steps of carrying out a first treatment on the surface of the The agent is an entity that takes action and makes decisions based on feedback; the current state s t Is intelligentThe current location or condition of the body;
wherein the Actor network is composed of Z 1 The Critic network consists of Z 2 The electric vehicle charging scheduling optimization model based on the near-end strategy optimization and considering two-stage economic scheduling comprises the following time complexity:
wherein,and->Respectively represent the z < th 1 Personal Actor network layer and z-th 2 The number of neurons in the individual Critic network layers;
when training the Actor network, a strategy gradient descent method is adopted, and the strategy parameter theta of the Actor network is updated in the direction of the maximum rewarding expected value J (theta) obtained by the intelligent agent; the strategy gradient descending method is to update strategy parameters according to the descending direction of the measurement gradient by calculating the strategy gradient;
the strategy gradient is calculated by introducing a dominance function, and the dominance function represents the excellent degree of the selected action in the current state; the dominance function is a state-action value function And a state value function generated by Critic network +.>Obtaining the difference value of (2);
the Critic network is responsible for evaluating action a t Extracting the current state s from the extracted sample sequence t Calculating the current state s t Lower state value functionCritic network uses bonus function +.>Sum state value functionThe Mean Square Error (MSE) between as a loss function to update the network parameter μ;
the importance sampling technology is used for estimating the expected value of the reward of updating the new strategy for the old strategy, so as to determine the maximizing direction of the expected value J (theta) of the reward and update the strategy parameters;
the dynamic step length mechanism is used for limiting the updating step length through a shearing mechanism of the dominance function in the training process, so that the dynamic step length adjustment is realized, and the loss function of the Actor network is obtained; the problem that the training process of the model cannot be converged due to the fact that the updating step length is too sensitive is avoided;
s4.2: constructing an Actor1-Critic1 network (constructing as S4.1) which integrates an importance sampling technology and a dynamic step length mechanism under an Actor-Critic architecture, and solving a Markov decision model of day-ahead scheduling through the Actor1-Critic1 network to obtain a day-ahead scheduling strategy consisting of a group of optimal day-ahead scheduling action sets;
The policy pi generated by the Actor1 network θ1 Obeying Beta (Beta) distribution;
the specific process of solving the Markov decision model scheduled in the day before by the Actor1-Critic1 network is as follows:
data set 1 constructs state set s of Actor1-Critic1 network t1 Then input into an Actor1-Critic1 network, and the output generator set takes a day-ahead scheduling actionRewards->Obtaining training sample sequencesTraining sample sequence->Stored in experience buffer->
Actor1-Critic1 network slave experience bufferRandomly extracting a batch of sample sequences, training and updating the network parameters theta of the Actor1 1 And Critic1 network parameter μ 1 Obtaining a day-ahead scheduling strategy consisting of a group of optimal day-ahead scheduling action sets;
the day-ahead scheduling action set includes: the method comprises the steps that in a day-ahead dispatching stage, a photovoltaic generator set dispatches the Internet surfing power to a charging pile, a wind power generator set dispatches the Internet surfing power to the charging pile, a thermal generator set dispatches the actual power generation power to the charging pile, and the thermal generator set is in an operation state in the day-ahead dispatching stage;
s4.3: constructing a data set 2;
the method comprises the steps of taking predicted power generation power of a charging pile by a wind generating set in a short-term rolling scheduling stage in the day, predicted power generation power of the charging pile by a photovoltaic generating set in the short-term rolling scheduling stage in the day, historical energy storage data of the charging pile in the short-term rolling scheduling stage in the day, load power of the charging pile in the short-term rolling scheduling stage in the day and the running state of the thermal generating set in the short-term rolling scheduling stage in the day as a data set 2;
The load power of the charging pile in the daily short-term rolling scheduling stage consists of the predicted power generation power of the wind generating set to the charging pile in the daily front scheduling stage, the predicted power generation power of the photovoltaic generating set to the charging pile in the daily short-term rolling scheduling stage and the actual power generation power of the thermal generating set to the charging pile in the daily short-term rolling scheduling stage;
the operation state of the thermal generator set in the daily short-term rolling scheduling stage is the same as the operation state of the thermal generator set in the daily front scheduling stage;
s4.4: constructing an Actor2-Critic2 network integrating an importance sampling technology and a dynamic step length mechanism under an Actor-Critic architecture, and solving a Markov decision model of intra-day short-term rolling scheduling through the Actor2-Critic2 network to obtain an intra-day short-term rolling scheduling strategy consisting of a group of intra-day short-term rolling scheduling action sets;
the policy pi generated by the Actor2 network θ2 Obeying Beta (Beta) distribution;
the method for solving the Markov decision model of the intra-day short-term rolling schedule by the Actor2-Critic2 network comprises the following specific processes:
data set 2 construction of state set s of Actor2-Critic2 network t2 Then input the data to an Actor2-Critic2 network, and the output generator set takes a daily short-term rolling scheduling action a t2 Prize r t2 Obtaining a training sample sequence<s t2 ,a t2 ,r t2 ,s t2+1 ,π θ2 >Training a sample sequence<s t2 ,a t2 ,r t2 ,s t2+1 ,π θ2 >Stored in experience buffers
Actor2-Critic2 network slave experience bufferA batch of sample sequences are randomly extracted to train and update the Actor2 and Critic2 network parameters theta 2 Sum mu 2 Obtaining a daily short-term rolling scheduling strategy consisting of a group of daily short-term rolling scheduling action sets,
based on the S4.2 and the S4.4, an optimal power economy dispatching strategy planning table is finally obtained, and high-precision power distribution of the charging pile is realized;
the intra-day short-term rolling schedule set includes: in the daily short-term rolling scheduling stage, the photovoltaic generator set schedules the Internet surfing power to the charging pile, the wind power generator set schedules the Internet surfing power to the charging pile, and the thermal generator set schedules the actual power generation power to the charging pile;
s5: constructing a data set 3, and solving a Markov decision model of the charging schedule of the electric automobile to obtain an optimal charging schedule strategy of the electric automobile; the specific process is as follows:
s5.1: acquiring data information of electric vehicles, charging piles, energy storage systems and road traffic conditions as a data set 3
The electric vehicle data includes: residual electric quantity of the electric automobile;
the charging pile data includes: the electricity price of the charging pile, the number of queuing vehicles of the charging pile, the generated power received by the charging pile and the minimum electric quantity required by the electric automobile to travel to the charging pile;
The energy storage system data includes: residual electric quantity in the energy storage system;
the road traffic condition data includes: the distance between two routing nodes, the road congestion index and the average road passing speed;
s5.2: the specific process for obtaining the optimal charging scheduling strategy of the electric automobile by solving the Markov decision model of the charging scheduling of the electric automobile comprises the following steps:
constructing an Actor3-Critic3 network integrating an importance sampling technology and a dynamic step length mechanism under an Actor-Critic architecture, solving a Markov decision model of electric automobile charging schedule through the Actor3-Critic3 network,
policy pi generated by the Actor3 network θ3 Obeying a category (category) distribution;
the specific process of solving the Markov decision model of the electric automobile charging schedule by the Actor3-Critic3 network is as follows:
data set 3 construction of state set s of Actor3-Critic3 network t3 Then input to an Actor3-Critic3 network, and output that the electric automobile can take a charging scheduling action a t3 Prize r t3 Obtaining a training sample sequence<s t3 ,a t3 ,r t3 ,s t3+1 ,π θ3 >Training a sample sequence<s t3 ,a t3 ,r t3 ,s t3+1 ,π θ3 >Stored in experience buffers
Actor3-Critic3 network slave experience bufferA batch of sample sequences are randomly extracted to train and update the Actor3 and Critic3 network parameters theta 3 Sum mu 3 Finally, an optimal electric vehicle charging scheduling strategy formed by a group of electric vehicle charging scheduling action sets is obtained;
the electric vehicle charging scheduling action consists of selecting which charging pile is charged by the electric vehicle and charging power obtained by the selected charging pile.
According to the invention, the power generation side can firstly combine the historical load data of the charging pile in the interaction data of the intelligent body and the charging pile to predict the load demand of the next day, and then carry out daily scheduling to give a power generation plan and a start-stop state of the unit. And then, introducing daily short-term rolling scheduling on the basis of daily scheduling, and aiming at minimizing the short-term adjustment cost and carbon emission of the unit, so as to realize high-precision power scheduling. And finally, selecting a proper charging pile for charging according to the charging demand of the electric automobile, the road traffic condition and the energy utilization rate in the energy storage intelligent body. The method solves the problems that in the prior art, only discrete action tasks can be processed, the charging scene of the electric vehicle is not suitable for continuous action, the influence of real-time change of road traffic on the charging strategy of the electric vehicle is not fully considered, and the situation of real-time electric quantity of the electric vehicle and the residual electric quantity of a charging pile is not considered, so that the electric vehicle cannot be charged in time due to insufficient electric quantity or the energy of the charging pile is excessive.
The second embodiment is as follows: the method for optimizing the economic dispatch and the electric vehicle charging strategy in the energy internet in a combined way comprises the following steps: the S2.1: the objective function of establishing the power economy schedule includes: an objective function of day-ahead scheduling and an objective function of intra-day short-term rolling scheduling; constraints on power economy scheduling include: constraint conditions of day-ahead scheduling and constraint conditions of intra-day short-term rolling scheduling; establishing the power economy schedule includes the following optimized forms: an optimized form of day-ahead scheduling and an optimized form of intra-day short-term rolling scheduling;
the objective function of the day-ahead schedule includes: the wind and light discarding cost of the renewable energy unit, the power generation cost of the traditional energy unit and the carbon emission of the traditional energy unit are reduced;
the constraint conditions of the day-ahead scheduling comprise: power balance constraint, unit output size constraint, unit running state constraint, unit minimum running time and downtime constraint in the day-ahead stage;
the objective function of the intra-day short-term rolling schedule comprises the following steps: the total cost of the unit in the daily short-term rolling scheduling stage and the carbon emission of the thermal power unit are the lowest, and the total cost of the unit comprises the waste light and waste wind punishment cost and the adjustment cost of the thermal power generator unit;
The constraint conditions of the intra-day short-term rolling schedule comprise: power balance constraint in the daytime, output force size constraint of the unit and climbing constraint of the unit;
the S2.2: establishing an objective function of the electric vehicle charging schedule, wherein the objective function comprises maximizing satisfaction of users and energy consumption of charging piles;
the S2.2: the constraint conditions for establishing the electric vehicle charging schedule comprise: capacity constraint of the energy storage system, selection constraint of the charging pile, charging power constraint, remaining capacity constraint of the electric automobile, charging capacity constraint of the electric automobile and road network topology constraint. The other steps are the same as those of the first embodiment.
And a third specific embodiment: referring to fig. 4 to fig. 7, in this embodiment, the training, using a DRL method based on near-end policy optimization, of an electric vehicle charging schedule optimization model based on near-end policy optimization and considering two-stage economic schedule, to obtain a corresponding decision action includes:
the objective function of the day-ahead schedule is specifically expressed as:
wherein,
wherein P and O respectively represent the internet power and the running state of the machine set in the day-ahead stage;the cost is punished for the solar photovoltaic and wind turbine generator system light discarding at the day-ahead stage; />Punishment cost for wind curtailment of photovoltaic and wind turbine units in the day-ahead stage; / >The method is the power generation cost of the thermal power generating unit in the day-ahead stage, and comprises the material cost, the maintenance cost and the start-stop cost of the unit; />Is the carbon emission quantity k of the thermal power generating unit in the day-ahead stage p Penalty cost coefficients for the photovoltaic, wind turbines; />At t for photovoltaic 1 Predicting power generation at moment; />At t for wind turbine 1 Predicting power generation at moment; />At t for photovoltaic 1 The internet surfing power at the moment; />At t for wind turbine 1 The internet surfing power at the moment; />Indicating that the thermal power unit i is at t 1 The running state of the time period is a variable of 0-1, and the running state of the unit is represented when the value is 1; a, a i ,b i ,c i The cost coefficient of the power generation material of the thermal power unit i; />At t for thermal power unit i 1 The actual power generated at the moment; k (k) m The maintenance cost coefficient of the thermal power unit i; />The starting cost of the thermal power unit i is; />The shutdown cost of the thermal power unit i is; />Indicating that the thermal power unit i is at t 1 Starting-up behavior of period->Indicating that the thermal power unit i is at t 1 Period shutdown behavior, < >>And->All are 0-1 variables, and the value of the variable is represented at t when the value is 1 1 The time period is when the power-on or power-off action occurs; alpha ggggg The carbon emission coefficients of the thermal power generating unit i are respectively; />Is a thermal power unit set; g i Is the ithA thermal power generating unit; i E [1, NG];
The constraint condition of the day-ahead scheduling is specifically expressed as follows:
Power balance constraint in the early day phase:
in the method, in the process of the invention,at t 1 The load power of the jth charging pile at the moment; />At t for the jth energy storage system in the day-ahead stage 1 The storage power at the moment; />Collecting charging piles; q j For the j-th charging pile, j is E [1, M];
The output size constraint of the unit is as follows:
in the method, in the process of the invention,respectively the minimum power generation power and the maximum power generation power of the thermal power generating unit i in the day-ahead scheduling stage;
the unit running state constraint is as follows:
wherein,maximum power generation power is respectively scheduled for the photovoltaic and wind generating sets before the day,
the minimum running time and the shutdown time constraint of the unit are as follows:
/>
wherein T is i up The minimum running time of the unit; t (T) i down Minimum downtime for the unit;
the optimization form of the day-ahead schedule is as follows:
s.t.~.(3)~(12).
wherein K is 1 ,K 2 ,K 3 ,K 4 Respectively the weight coefficient of each component part, andnormalizing it to satisfyThe other steps are the same as those of the second embodiment.
The specific embodiment IV is as follows: the present embodiment is described with reference to fig. 4, where the objective function of the intra-day short-term rolling schedule is expressed as:
wherein,
wherein P is the Internet surfing power of the intra-day stage unit; t is t 0 The time is the initial optimization time; delta T is the optimized duration;the cost of light and wind abandoning in the daily stage is punished respectively; / >The adjustment cost of the thermal power generating unit in the daytime is realized;carbon emission of the thermal power generating unit; in (1) the->The predicted power generation power of the photovoltaic and wind turbine units in the daily short-term rolling scheduling stage is respectively calculated; />Respectively photovoltaic and wind turbine at t 2 The internet power of the time period; />The method is used for adjusting the cost coefficient of the thermal power generating unit in the daytime; />For the operation state of the thermal power generating unit i in the daily short-term rolling scheduling stage, +.>And->The two values are the same; />At t for thermal power unit i 2 Actual power generated during the time period;
the constraint conditions of the intra-day short-term rolling schedule are specifically expressed as follows:
power balance constraint during the intra-day phase:
in the method, in the process of the invention,at t 2 The load power of the jth charging pile at the moment; />The j-th energy storage system is in t for the day stage 2 The storage power at the moment;
upper and lower limit constraint of daily short-term rolling dispatching output of thermal power generating unit:
wherein,respectively, thermal power generating uniti minimum and maximum power generated in the short-term rolling scheduling stage in the day;
upper and lower limit constraint of solar short-term rolling dispatching output of photovoltaic unit:
upper and lower limit constraint of daily short-term rolling dispatching output of wind turbine
Wherein,maximum power generation of the photovoltaic unit and the wind turbine unit which are scheduled in a short-term rolling manner in the day are respectively;
And the daily short-term rolling schedule takes 15 minutes as an interval, and the climbing constraint of the unit:
wherein R is iu ,R id The upper limit value and the lower limit value of the climbing rate of the thermal power unit i are respectively
The optimization form of the daily short-term rolling schedule is as follows:
s.t.~.(16)~(22).
wherein K is 5 ,K 6 ,K 7 ,K 8 The weight coefficients of the components are respectively normalized to satisfyThe other steps are the same as in the third embodiment.
Fifth embodiment: the present embodiment is described with reference to fig. 1 to 2, and the objective function for establishing the electric vehicle charging schedule is expressed as:
wherein, P and z respectively represent the charging power of the electric automobile and which charging pile is selected to carry out the charging task;representing the user satisfaction degree of the nth electric vehicle; />Representing the energy storage capacity of the jth charging pile; />The method is an electric automobile set; y is n Representing an nth electric automobile; n is E [1, N];
The user satisfaction index comprises a distance energy consumption cost, a charging cost, a driving time, a queuing waiting time and a tolerant distance anxiety value, and the specific expression modes are as follows:
path energy consumption costs: after the nth electric automobile sends out the charging requirement, a certain energy cost is consumed when the electric automobile runs to the charging pileExpressed as:
wherein, l and m are road node numbers respectively; t (T) ed Is a scheduling period; zeta type n Representing the battery consumption of the nth electric automobile per kilometer;the electricity price of the jth charging pile in the charging scheduling stage of the electric automobile is represented, all charging piles are set to belong to the same third party operator, the electricity price at each moment is determined by the third party operator, and the electricity price of the charging piles in the same area at each moment is the same; d, d lm,n Representing the distance between the nodes l and m through which the nth electric automobile selects the jth charging pile; phi (phi) lm,n A binary 0-1 variable, when the value is 1, the nth electric automobile runs from the node l to the node m, and when the value is 0, the route is not selected; z j,n A binary 0-1 variable which indicates whether the jth charging pile is selected by the nth electric automobile;
charging cost: charging cost of electric automobileThe product form of the charging quantity and the electricity price is described as:
wherein,representing the charge quantity of the nth electric automobile in a charging stage; />Battery charging efficiency for the nth electric vehicle; />Indicating the charging work of the nth electric automobile in the charging stageA rate; />Representing the maximum battery capacity of the nth electric automobile; Δt (delta t) 3 Representing the charging time of the electric automobile;
Travel time: the time taken by the nth electric vehicle to travel from the departure place to the charging pile is expressed as
Wherein,indicating that the nth electric automobile is at t 3 The congestion index of the road through which the charging pile passes is reached at the moment,indicating that the road through which the nth electric automobile passes is at t 3 Average speed of time of day, here +.>And->All are from the traffic department to obtain real-time traffic conditions;
queuing wait time: after the nth electric automobile runs to the charging pile, if other electric automobiles are charging in the charging pile, a certain queuing waiting time is needed at the momentExpressed as:
wherein,indicating that the jth charging pile is at t 3 The number of queuing vehicles at the moment; />Indicating that the jth charging pile is at t 3 The charging completion efficiency at the moment; />Indicating the queuing waiting time required when the electric automobile n goes to the j-th charging pile for charging,
tolerance distance anxiety values: the nth electric vehicle may endure a great distance anxiety degree in some casesTo select a remote charging pile for a charging task, thereby realizing a lower charging cost, expressed as:
/>
wherein ρ is n Anxiety coefficient of the nth electric automobile;the remaining capacity of the nth electric automobile;
the user satisfaction of the nth electric vehicle is described as follows in conjunction with the above formula:
Wherein omega s ,The weight coefficients of the components are respectively used for the following purposesUser satisfaction->Normalized to the value of [0,1 ]]And->The larger the value of (c), the better the charging experience for the user,
wherein the method comprises the steps of
Indicated at t 3 The power of the jth charging pile is scheduled in a rolling way in a short time within a day;
the constraint conditions for establishing the electric vehicle charging schedule are specifically expressed as follows:
energy storage capacity constraint: the storage capacity of the single energy storage system is limited by upper and lower limits, and the residual electric quantity value generated by scheduling is calculatedIs that the upper and lower limits of the storage capacity cannot be exceeded:
in the method, in the process of the invention,respectively representing the upper limit and the lower limit of energy storage;
charging pile selection constraint: for the nth electric automobile, only one charging pile can be selected for charging:
charging power constraint: for the nth electric vehicle, the upper limit constraint of charging power needs to be complied with in the charging process:
remaining capacity constraint of electric automobile: the electric quantity of the nth electric automobile in the driving stage must support the electric automobile to drive to the charging pile:
wherein,indicating that the electric automobile is at t 3 The minimum electric quantity required by the j-th charging pile is selected at any time,
charge amount constraint of electric vehicle: the nth electric automobile is at t 3 The charge quantity at the moment can not exceed the dispatch quantity received by the selected charging pile and the residual quantity in the energy storage system nearby at the previous moment And (2) sum:
road network topology constraints: the electric automobile starts from the node omega and arrives at the charging pile setThe constraint ensures that the routes selected by the nth electric automobile can be connected in sequence:
wherein l, m, p are road nodes respectively, i.eφ lmpl And phi in formula (27) lm,n The two variables are binary 0-1 variables, when the value is 1, the route between the two nodes is selected by the electric automobile, otherwise, the route is not selected by the electric automobile;
the optimization form for establishing the electric vehicle charging schedule is as follows:
s.t.~.(32)~(37).
wherein K is 9 ,K 10 The weight coefficients of the components are respectively normalized to satisfy
The other steps are the same as those of the fourth embodiment.
Specific embodiment six: according to the energy internet economic scheduling and electric vehicle charging strategy combined optimization method in the embodiment, because of integer constraint sum, economic scheduling and electric vehicle charging scheduling problems are NP-hard, and when solving by using traditional methods such as Lagrange multiplier method, time cost is relatively high. Thus, we approximate the optimal solution of the model using a near-end policy optimization (DRL) based approach, and this approach does not require any uncertain a priori knowledge.
The reinforcement learning can integrate the characteristics of the historical data in an online manner to form a reinforcement learning model, so that when receiving a scheduling task, an optimal scheduling strategy can be rapidly given out in the online manner. In addition, when the historical data is updated, the existing model can be updated rapidly, and the training speed is improved. However, the conventional method does not have the above advantages, and cannot quickly integrate rules in data and give an optimal scheduling policy. Since deep reinforcement learning is based on a markov decision process, the problem is first converted into a form of markov decision process,
s3.1: establishing a Markov decision model of the power economy dispatching according to the power economy dispatching optimization form; the markov decision model of the power economy dispatch includes: a Markov decision model for day-ahead scheduling and a Markov decision model for intra-day short-term rolling scheduling; the specific process is as follows:
the Markov decision model of the day-ahead schedule is as follows:
state set: the scheduling period set in the day-ahead scheduling stage is 24, namely t 1 ∈[0,23]The method comprises the steps of carrying out a first treatment on the surface of the For each time slot t 1 Using 4 variables to describe the state of the day-ahead scheduling phaseExpressed as:
the four variables are respectively: predicted power generation of photovoltaic pair jth charging pile Predicted power generation of wind turbine unit on jth charging pile +.>Load power of jth charging pile in day-ahead stage +.>The jth energy storage system is at t 1 Storage power +.>And->Is obtained by GRU network model prediction based on the Attention mechanism;
action set: state s of the day-ahead scheduling stage according to equation (39) t1 The generator-side unit takes corresponding actionWherein the actions are->Consists of 4 variables, expressed as:
wherein,respectively photovoltaic, wind turbine set at t 1 The internet power of the j-th charging pile is scheduled at the moment,at t for thermal power unit i 1 The actual power generation power of the jth charging pile is scheduled at the moment, and 4 variables are all results output by an Actor1-Critic1 network;
bonus function: t is t 1 The time slot generator set is making actionThe generator set will then get an immediate prize +.>The reward is also a day-ahead schedule optimization form:
wherein K is 1 ,K 2 ,K 3 ,K 4 The weight coefficient is set in the formula, and the weight coefficient is set in the T 1 After that time, the generator set will receive the total jackpot:
wherein Y is da ∈[0,1]Is a discount rate;the desired function is represented by a function of the desired function,
the Markov decision model of the intra-day short-term rolling schedule is as follows:
state set: for time slot t 2 Using 4 variables to describe the state of the short-term rolling schedule phase within a day The method comprises the following steps of: predicted power generation of photovoltaic unit on jth charging pile>Predicted power generation of wind turbine unit on jth charging pile +.>Load power of jth charging pile in daytime>The jth energy storage system is at t 2 Storage power +.>Expressed as:
wherein,the method is obtained through GRU network model prediction based on an Attention mechanism;
action set: the intra-day short-term rolling schedule phase state according to (43)The generator set will take corresponding action +.>Expressed in terms of 3 variables:
wherein (1)>Respectively photovoltaic at t 2 Scheduling the internet power of the jth charging pile at moment and enabling the wind turbine to be at t 2 Scheduling the internet power to the jth charging pile at a time,/>At t for thermal power unit i 2 The actual power generated by the j-th charging pile is scheduled at the moment, 3 variables are all the results output by the Actor2-Critic2 network,
bonus function: the generator set is according to the stateMake action->After that, the generator set gets instant rewards +.>The prize value is also an optimized form of daily short-term rolling schedule:
wherein K is 5 ,K 6 ,K 7 ,K 8 The weight coefficient is set in the formula, and the weight coefficient is set in the T 2 After that time, the generator set will receive the total jackpot:
wherein Y is di ∈[0,1]For discounted rate
The S3.2: establishing a Markov decision model of the electric vehicle charging schedule according to an optimized form of the electric vehicle charging schedule; the Markov decision model of the electric automobile charging schedule is as follows:
State space: for each time slot t 3 Describing the state s of the electric automobile charging schedule stage from 4 parts t3 Including electric automobile, fill electric pile, energy storage system and road traffic conditions:
wherein,representing the residual electric quantity of the electric automobile; />Respectively representing the electricity price of the charging pile, the number of queuing vehicles of the charging pile, the generated power received by the charging pile and the minimum electric quantity required by the electric automobile to travel to the charging pile; />Representing the residual electric quantity in the energy storage system;respectively are provided withRepresenting the distance between two routing nodes, the road congestion index and the average road passing speed;
action space: state of electric vehicle charge schedule stage according to formula (47)The electric automobile takes an action +.>Action->Is composed of t 3 The electric automobile selects which charging pile is charged at the moment, and the charging power obtained by the selected charging pile comprises the following components:
bonus function: the electric automobile is making actionAfterwards, the electric car gets an instant prize +.>The reward is also an objective function of the electric vehicle charging schedule:
wherein K is 9 ,K 10 Is a weight coefficient set in
At T 3 After the moment, the electric automobile can obtain a total reward:
Wherein Y is ed ∈[0,1]Is the discount rate. The other steps are the same as those of the fifth embodiment.
Seventh embodiment: according to the method for jointly optimizing the economic dispatch and the electric vehicle charging strategy in the energy Internet, a dynamic step length mechanism is integrated under an Actor-Critic framework to perform training and learning of an electric vehicle charging dispatch optimization model based on near-end strategy optimization and considering two-stage economic dispatch, and because the action space of the proposed optimization problem is a continuous space, a strategy-based reinforcement learning algorithm is adopted, so that continuous actions can be generated. Traditional policy-based methods use policy gradients (policy-gradient) to update parameters and optimize objective functions. However, the volatility of action selection is large, and the convergence of a strategy algorithm is difficult to ensure, so that a deep reinforcement learning algorithm based on near-end strategy optimization is adopted to select actions based on a trust region, and thus action selection with large variation can be reduced.
The near-end strategy optimization algorithm is a reinforcement learning algorithm based on strategy gradients, and compared with other strategy gradient methods, the method for limiting the update step length reduces the problem of excessive sensitivity to the update step length. In addition, the near-end policy optimization algorithm also adopts the importance sampling technology to better utilize the sampled data, so as to improve the updating efficiency of the network,
The strategy gradient descending method is adopted when the Actor network is trained in the S4.1, and the strategy parameter theta of the Actor network is updated in the direction of the maximum rewarding expected value J (theta) obtained by the intelligent agent; the strategy gradient descending method is to update strategy parameters according to the descending direction of the measurement gradient by calculating the strategy gradient, wherein the strategy pi is calculated by the strategy gradient θ The expression of maximizing the prize expectation J (θ) achieved by the agent is specifically expressed as:
π θ =arg max θ J(θ), (51)
j (θ) represents the expected value of the prize obtained by the agent:
wherein τ represents the agent in pi θ To use a policy, a process trajectory generated by interacting with an environment, which may be represented by a sequence of "state-actions",representing a reward function, T being a time step; s is(s) t Representing the state of the intelligent agent at the time t; a, a t Representing the action of the agent at the time t; />Representing an expected function that measures the magnitude of rewards earned by the agent.
The iterative process for updating the strategy parameter theta of the Actor network in the direction of the maximum rewarding expected value J (theta) obtained by the agent is as follows:
in θ old Respectively the old policy parameters; eta is learning rate and the value range is [0,1 ]];I.e. the gradient of the effect exhibited by the current strategy with respect to the strategy parameters.
The other steps are the same as those of the sixth embodiment.
Eighth embodiment: in the method for jointly optimizing economic dispatch and electric vehicle charging strategies in the energy internet in the embodiment, the method comprises the following steps:
the policy gradient is calculated by introducing a merit function expressed as:
in the method, in the process of the invention,representing an dominance function at time t representing the selected action a t In the current state s t The following dominance degree; pi θ (a t |s t ) Is a probability distribution representing the probability of being in a given state s t Action a is taken t Is used to determine the policy probability of (1),
the dominance function is a state-action value functionAnd a state value function generated by a Critic networkObtaining the difference value of (2); dominance function at time t->The calculation formula of (2) is as follows:
wherein Y is E [0,1 ]]For discount rate, it is used to control that rewards fed back by the current agent are more important than rewards fed back in the future,as a function of state-action values; />Is a state value function generated by the Critic network;
the Critic network is responsible for evaluating action a t Extracting the current state s from the extracted sample sequence t Calculating a state value function at the current state stCritic network uses bonus function +.>Sum state value functionThe Mean Square Error (MSE) between as a loss function updates the network parameter μ, namely:
where μ is a parameter of the Critic network, As a desired function. The other steps are the same as in the seventh embodiment.
In the method for jointly optimizing economic dispatch and electric vehicle charging strategies in the energy Internet, the importance sampling technology is used for estimating the expected value of the old strategy, which is updated by the new strategy, so as to determine the maximum direction of the expected value J (theta), and update the strategy parameters; after the importance sampling technique is introduced under the Actor-Critic architecture, the equation (55) for calculating the strategy gradient is updated as follows:
wherein pi θ (s t ,a t ) Representing the policy of the current update,representing old policies, ++>Representing the dominance function calculated under the old strategy, then +.>Representing the expected value of rewards obtained after the intelligent agent introduces the importance sampling technology, expressed as:
the other steps are the same as those of embodiment eight.
Detailed description ten: in the method for jointly optimizing economic dispatching and charging strategies of electric vehicles in the energy internet in the embodiment, because the updating of step length in the formula is still very sensitive, namely if probability distribution of output actions of two networks is far away, the training process of a model is difficult to converge, and in order to make the probability distribution of strategies of the two networks not far away, a near-end strategy optimization algorithm adds a certain constraint condition in the formula, namely the distance between new strategies and old strategies is limited by a shearing mechanism of an advantage function, the method is expressed as follows:
In the method, in the process of the invention,the ratio of two policy networks is shown; clip (·) is the clipping function, when r t The value of (θ) is less than 1 ε, then for r t Cutting (θ), r t The value of (θ) becomes 1- ε; if r t If the value of (θ) is greater than 1+ε, then for r t Cutting (θ), r t The value of (θ) becomes 1+ε; if r t (θ) is in [ 1-. Epsilon., 1+epsilon ]]Between, r t And (theta) is kept unchanged. The method comprises the steps of carrying out a first treatment on the surface of the Epsilon is a clipping value set herein, and the size is usually set to 0.2, so that the problem caused by overlarge update step size can be avoided, meanwhile, the problem that the update step size is too sensitive in a strategy gradient algorithm is solved, and other steps are the same as those in the embodiment nine.
Eleventh embodiment: the present embodiment will be described with reference to one to ten specific embodiments, in which the power dispatching data includes photovoltaic generators, wind turbines, and load data. Table 1 lists specific settings of the electric vehicle charging schedule in which the electricity prices of the charging posts obey a mean value m p = (3.28×5.0), standard deviationIs a normal distribution of (2); the average speed of the road obeys a mean value m v = (55.13 ×90.0), standard deviation +.>Is a normal distribution of (2); the number of queues of charging piles and the road congestion factor are subject to a uniform distribution of different parameters.
Table 1 electric vehicle charge schedule experiment parameter settings
/>
In the day-ahead scheduling experiments, we used a DRL algorithm based on near-end policy optimization to solve the optimization problem. Since near-end policy optimization is an artificial intelligence algorithm based on the Actor-Critic architecture, we use two networks to train model parameters. The structure of the Actor1 network is (4, 32,12, sigmoid, adam), namely the structure of the Actor1 is a neural network model with 4 input features, two hidden layers and 12 output neurons, the Sigmoid function is used as an activation function, and the Adam algorithm is used for training. The structure of the Critic1 network is (4, 32,1, relu, adam). That is, the Critic1 network is a neural network model with 4 input features, two hidden layers, and 12 output neurons, using the Relu function as the activation function, and training using Adam's algorithm. Because the action of the Actor network is to give a strategy pi according to the state data of the intelligent agent, when the program runs, the strategy pi is assumed to be obeyed by a Beta distribution, an action interval generated by sampling the Beta distribution is (0, 1), the action interval accords with experimental data distribution characteristics and actual power output scenes, and finally the neural network obtains a group of optimal strategy set by continuously updating two parameters of the Beta distribution.
The specific codes are as follows:
in the daily short-term rolling schedule experiment, we also used a DRL algorithm optimized based on a near-end strategy, wherein the structure of the Actor2 network is (4, 32,12, sigmoid, adam), namely the Actor2 network is a neural network model with 4 input features, two hidden layers and 12 output neurons, the Sigmoid function is used as an activation function, and the Adam algorithm is used for training. The structure of the Critic2 network is (4, 32,1, relu, adam). That is, the Critic2 network is a neural network model with 4 input features, two hidden layers, and 12 output neurons, using the Relu function as the activation function, and training using Adam's algorithm. This strategy pi is subject to a Beta distribution, and the algorithm code is as follows by generating an action interval (0, 1) from Beta distribution samples:
in the electric automobile charging schedule experiment, a DRL algorithm based on near-end policy optimization is used. The structure of the Actor3 network is (25, 256,4, softmax, adam), the Actor3 network is a neural network model with 25 input features, two hidden layers and 4 output neurons, the Softmax function is used as an activation function, and Adam algorithm is used for training. The structure of the Critic 3 network is (25,256,256, 1, tanh, adam). The Critic 3 network is a neural network model with 25 input features, two hidden layers and 1 output neuron, using the Tanh function as the activation function, and trained using Adam's algorithm. The algorithm code is as follows:
Because the electric vehicle can only select one charging pile at a time in the electric vehicle charging simulation process, the strategy given by the Actor3 network is assumed to follow a global distribution, and the neural network continuously updates the distributed parameters through training, so that a group of optimal strategy action sets are obtained. In this embodiment, all simulation experiments are run on a 4-core i5 cpu, 12GB memory and GTX960M graphics card, and the versions of Python and Pytorch are 3.8 and 1.8, respectively.
Because of the uncertainty of new energy power generation and charging pile load demands, the next day energy demands of the charging pile, such as photovoltaic and wind power prediction power generation capacity and the load demand of the charging pile, need to be predicted. Fig. 5 illustrates the predicted next daily load demand of the charging pile under different models, and it can be seen from fig. 8 that, during the course of a day, the Attention mechanism (Attention) mechanism-based gating cycle (GRU) curve, that is, the att_gru curve, fits the real data better than the other two gating cycle (GRU) curves and the curve of the Convolutional Neural Network (CNN), wherein the actual data is particularly apparent in the time periods of 3:00-5:00, 11:00-13:00, and 21:00-22:00.
Table 2 shows the load prediction performance index of the charging pile under different models. Wherein Mean Absolute Error (MAE), mean Square Error (MSE) and Mean Absolute Percent Error (MAPE) are used to measure the quality of a model, and smaller values of these values indicate smaller deviations between the predicted value and the true value, and higher prediction accuracy of the model. R is R 2 To determine the coefficients, it represents the proportion of the variance interpreted by the model, with the larger the value the better the model performance.
TABLE 2 load prediction Performance index for different models
According to the prediction result of the next day energy source of the charging pile, firstly, the power generation plan of the next day charging pile is optimized by power day front scheduling and using a deep reinforcement learning algorithm based on near-end strategy optimization, and FIG. 6 shows that the average cumulative rewards result of day front scheduling of the algorithm under different learning rates. As can be seen from FIG. 6, the jackpot is converged at all three learning rates, but the final jackpot value is greatest when the learning rate is 3e-4, and the rate of convergence of the jackpot values is greater than in the other two cases.
Fig. 7 illustrates the output and energy storage conditions of the unit in the scheduling process before the power day, wherein the value of the energy storage system is positive or negative, the output of the energy storage system is represented when the value is positive, so that the power supply and demand balance is kept, and the surplus power of the unit is represented when the value is negative and stored in the energy storage system. As can be seen from fig. 7, the new energy power generation "renewable power" and the thermal power generation "thermal power" and the energy storage output "ESS" of the thermal power generating unit can ensure the energy load "requirement of the charging pile in one day.
Fig. 8 and 9 respectively describe the output conditions of the photovoltaic and wind power plants during the day-ahead scheduling process, wherein "Predicted power" represents Predicted power, and "Grid-connected power" represents plant internet power. As can be seen from fig. 8, the internet power of the photovoltaic unit is a certain difference from the predicted power in the course of 10:00-17:00 a day. In FIG. 9, the Internet surfing power of the wind turbine is a certain difference from the predicted power in the time periods of 0:00-1:00 and 19:00-23:00. Therefore, in the power day-ahead scheduling process, a certain error exists in the new energy power generation plan.
The daily short-term scheduling is based on the daily scheduling unit output plan, and the daily scheduling result is continuously corrected in a rolling mode. FIG. 10 illustrates average jackpot for daily short-term rolling schedule at different learning rates, where the Actor and Critic network learning rates are set to {3e-3,3e-5,3e-7}, respectively, and as can be seen from FIG. 9, the final jackpot value is maximum when the learning rate is 3e-5, and the convergence rate of the algorithm is better than the other two learning rates.
FIG. 11 illustrates the intra-day short-term rolling schedule phase and the pre-day timeComparing the optimization results of photovoltaic power generation in the scheduling stage, fig. 8 illustrates the comparison of the optimization results of wind power generation in the intra-day short-term rolling scheduling stage and the pre-day scheduling stage. In fig. 11 and 12, "Predictedpower" indicates predicted power, "Day-ahead power" indicates Day-ahead power, and "intra power" indicates intra-Day short-term roll power. As can be seen from fig. 11, the photovoltaic unit outputs during the day period more than during the day period for these periods of time 10:00-12:00, 14:00-17:00. As can be seen from FIG. 12, the wind turbine assembly outputs during the day period are higher than during the day period for the periods of 0:00-1:00, 19:00-22:00. Therefore, it is necessary to improve the accuracy of the scheduling plan by daily short-term rolling scheduling. FIG. 13 depicts a daily short-term rolling schedule CO 2 As a result of optimizing the emission, CO at three learning rates with increasing training round number 2 Emissions decrease gradually and tend to converge, with a learning rate of 3e-5 CO 2 The final discharge amount of (c) is minimized.
After the power dispatching is completed, each charging pile can distribute electric energy in different proportions, and the electric automobile selects a proper charging pile to carry out a charging task according to the self requirements. FIG. 14 shows the average jackpot conditions of the electric vehicle charge scheduling algorithm optimized based on the near-end strategy at different learning rates, here respectively set the Actor and Critic network learning rates as {2e-3,2e-5,2e-6}. As can be seen from fig. 14, the prize value increases with increasing training rounds (epoosdes) for the three learning rates, but the final jackpot prize value (Average Cumulative Reward) is greatest at learning rates of 2e-3 and the prize value convergence rate is greater than the prizes for the other two learning rates.
Fig. 15 shows the energy storage optimization results at different learning rates, and as can be seen from fig. 15, when the learning rates of the Actor and Critic networks are {2e-3,2e-5,2e-6} respectively, the energy storage values gradually decrease and tend to converge with increasing training wheel numbers, and this result is consistent with the optimization objective set herein in the electric vehicle charging schedule stage.
Fig. 16 shows an analysis of a charging schedule result of an electric vehicle, in which a bar graph shows a road congestion index, a larger value of the bar graph indicates that a road is congested, a square line shows user charging satisfaction, and a triangle line shows a charging pile energy price. As can be seen from fig. 16, the user's charge satisfaction is maximum at 4:00, and the energy price is about 2.1 at this time, but the road congestion index is 0.18 at this time, which is a non-congestion level, although the energy price is not the lowest. Therefore, the electric vehicle charging scheduling algorithm based on the near-end policy optimization is effective.
FIG. 17 depicts a comparison of normalized average jackpot results for different deep reinforcement learning algorithms under the same environmental setting. As can be seen from fig. 17, the SAC algorithm has a higher prize value than the algorithm presented herein for cycles 0-2000, but as the number of training cycles increases, the SAC algorithm generates a set of actions with less agent feedback prize value than the algorithm presented herein, i.e., the proposed algorithm takes better action than the SAC algorithm. Therefore, the algorithm proposed herein has certain advantages.

Claims (10)

1. The method for jointly optimizing the economic dispatch and the electric vehicle charging strategy in the energy Internet is characterized by comprising the following steps:
s1: constructing a data set 1; the specific process is as follows:
s1.1: acquiring historical data information of a charging pile;
the charging pile historical data information comprises charging pile historical load data and charging pile historical energy storage data;
the charging pile historical load data comprise historical load data of a wind generating set on a charging pile, historical load data of a photovoltaic generating set on the charging pile and historical load data of a thermal generating set on the charging pile;
the charging pile historical energy storage data comprises historical energy storage data of a charging pile day-ahead dispatching stage and historical energy storage data of a charging pile day-in-short-term rolling dispatching stage;
s1.2: the GRU network based on the attention mechanism respectively processes historical load data of the wind generating set on the charging pile, historical load data of the photovoltaic generating set on the charging pile and historical load data of the thermal generating set on the charging pile to obtain predicted generating data of the wind generating set, predicted generating data of the photovoltaic generating set and predicted generating power of the thermal generating set on the charging pile in a day-ahead dispatching stage;
the wind generating set forecast power generation data comprises: the predicted power generation power of the wind generating set to the charging pile in the day-ahead dispatching stage and the predicted power generation power of the wind generating set to the charging pile in the day-in-short-term rolling dispatching stage;
The photovoltaic generator set forecast power generation data comprises: the predicted power generation power of the photovoltaic generator set to the charging pile in the day-ahead dispatching stage and the predicted power generation power of the photovoltaic generator set to the charging pile in the day-in dispatching stage;
s1.3: taking predicted power generation of the charging pile by the wind generating set in the day-ahead dispatching stage, predicted power generation of the charging pile by the photovoltaic generating set in the day-ahead dispatching stage, historical energy storage data of the charging pile in the day-ahead dispatching stage and load power of the charging pile in the day-ahead dispatching stage as a data set 1;
the load power of the charging pile in the day-ahead dispatching stage consists of the predicted power generation power of the wind generating set to the charging pile in the day-ahead dispatching stage, the predicted power generation power of the photovoltaic generating set to the charging pile in the day-ahead dispatching stage and the predicted power generation power of the thermal generating set to the charging pile in the day-ahead dispatching stage;
s2: an electric vehicle charging schedule optimization model which considers two-stage economic schedule and is optimized based on a near-end strategy is constructed;
the electric vehicle charging schedule optimization model based on the near-end policy optimization and considering two-stage economic schedule comprises the following steps: the electric power economic dispatch model and the electric vehicle charging dispatch model;
the specific process for constructing the electric vehicle charging schedule optimization model based on near-end policy optimization and considering two-stage economic schedule is as follows:
S2.1: establishing an objective function and constraint conditions of the power economy dispatching, and establishing an optimized form of the power economy dispatching according to the objective function and constraint conditions of the power economy dispatching;
s2.2: establishing an objective function and constraint conditions of electric vehicle charging schedule, and establishing an optimized form of the electric vehicle charging schedule according to the objective function and constraint conditions of the electric vehicle charging schedule;
s3: converting an electric vehicle charging schedule optimization model which is optimized based on a near-end strategy and considers two-stage economic schedule into a corresponding Markov decision model; the markov decision model is generally composed of a triplet of data S, A, R, including: a state set S, an action set A and a return function R;
the specific process of converting the electric vehicle charging schedule optimization model based on the near-end strategy optimization and considering the two-stage economic schedule into the corresponding Markov decision model is as follows:
s3.1: establishing a Markov decision model of the power economy dispatching according to the power economy dispatching optimization form; the markov decision model of the power economy dispatch includes: a Markov decision model for day-ahead scheduling and a Markov decision model for intra-day short-term rolling scheduling;
S3.2: establishing a Markov decision model of the electric vehicle charging schedule according to an optimized form of the electric vehicle charging schedule;
s4: solving a Markov decision model of the power economy dispatching to obtain an optimal power economy dispatching strategy planning table;
the optimal power economy scheduling policy schedule includes: a day-ahead scheduling strategy consisting of a day-ahead scheduling action set of the generator set and a day-in short-term rolling scheduling strategy consisting of a day-in short-term rolling scheduling action set of the generator set;
the generator set comprises a thermal generator set, a wind generator set and a photovoltaic generator set;
s4.1: constructing an Actor-Critic network integrating an importance sampling technology and a dynamic step length mechanism under an Actor-Critic architecture,
actor network according to current state s t Generating a random strategy pi by the current strategy parameter theta θ And according to the current policy pi θ The output intelligent agent takes corresponding action a t The method comprises the steps of carrying out a first treatment on the surface of the the intelligent agent is making action a at time t t Thereafter, the Actor-Critic network returns an instant prize r to the agent t Then the agent generates a state vector s at the next moment t+1 Obtaining a training sample sequence<s t ,a t ,r t ,s t+1 ,π θ >;
When training the Actor network, a strategy gradient descent method is adopted, and the strategy parameter theta of the Actor network is updated in the direction of the maximum rewarding expected value J (theta) obtained by the intelligent agent;
The strategy gradient is calculated by introducing a dominance function through a state-action value functionAnd a state value function generated by Critic network +.>Obtaining the difference value of (2);
the Critic network is responsible for evaluating action a t Extracting the current state s from the extracted sample sequence t Calculating the current state s t Lower state value functionCritic network uses bonus function +.>And state value function->The Mean Square Error (MSE) between as a loss function to update the network parameter μ;
the importance sampling technology is used for estimating the expected value of the reward of updating the new strategy for the old strategy, so as to determine the maximizing direction of the expected value J (theta) of the reward and update the strategy parameters;
the dynamic step length mechanism is used for limiting the updating step length through a shearing mechanism of the dominance function in the training process, so that the dynamic step length adjustment is realized, and the loss function of the Actor network is obtained;
s4.2: constructing an Actor1-Critic1 network which is integrated with an importance sampling technology and a dynamic step length mechanism under an Actor-Critic architecture, and solving a Markov decision model of day-ahead scheduling through the Actor1-Critic1 network to obtain a day-ahead scheduling strategy formed by a group of optimal day-ahead scheduling action sets;
The policy pi generated by the Actor1 network θ1 Obeying beta distribution;
the specific process of solving the Markov decision model scheduled in the day before by the Actor1-Critic1 network is as follows:
data set 1 constructs state set s of Actor1-Critic1 network t1 Then input into an Actor1-Critic1 network, and the output generator set takes a day-ahead scheduling actionRewards->Obtaining training sample sequence->Training sample sequence->Stored in experience buffer->
Actor1-Critic1 network slave experience bufferRandomly extracting a batch of sample sequences, training and updating the network parameters theta of the Actor1 1 And Critic1 network parameter μ 1 Obtaining a day-ahead scheduling strategy consisting of a group of optimal day-ahead scheduling action sets;
the day-ahead scheduling action set includes: the method comprises the steps that in a day-ahead dispatching stage, a photovoltaic generator set dispatches the Internet surfing power to a charging pile, a wind power generator set dispatches the Internet surfing power to the charging pile, a thermal generator set dispatches the actual power generation power to the charging pile, and the thermal generator set is in an operation state in the day-ahead dispatching stage;
s4.3: constructing a data set 2;
the method comprises the steps of taking predicted power generation power of a charging pile by a wind generating set in a short-term rolling scheduling stage in the day, predicted power generation power of the charging pile by a photovoltaic generating set in the short-term rolling scheduling stage in the day, historical energy storage data of the charging pile in the short-term rolling scheduling stage in the day, load power of the charging pile in the short-term rolling scheduling stage in the day and operation state of the thermal generating set in the short-term rolling scheduling stage in the day as a data set 2;
The load power of the charging pile in the daily short-term rolling scheduling stage consists of the predicted power generation power of the wind generating set to the charging pile in the daily front scheduling stage, the predicted power generation power of the photovoltaic generating set to the charging pile in the daily short-term rolling scheduling stage and the actual power generation power of the thermal generating set to the charging pile in the daily short-term rolling scheduling stage;
the operation state of the thermal generator set in the daily short-term rolling scheduling stage is the same as the operation state of the thermal generator set in the daily front scheduling stage;
s4.4: constructing an Actor2-Critic2 network integrating an importance sampling technology and a dynamic step length mechanism under an Actor-Critic architecture, and solving a Markov decision model of intra-day short-term rolling scheduling through the Actor2-Critic2 network to obtain an intra-day short-term rolling scheduling strategy consisting of a group of intra-day short-term rolling scheduling action sets;
the policy pi generated by the Actor2 network θ2 Obeying beta distribution;
the method for solving the Markov decision model of the intra-day short-term rolling schedule by the Actor2-Critic2 network comprises the following specific processes:
data set 2 construction of state set s of Actor2-Critic2 network t2 Then input the data to an Actor2-Critic2 network, and the output generator set takes a daily short-term rolling scheduling action a t2 AndRewards r t2 Obtaining a training sample sequence<s t2 ,a t2 ,r t2 ,s t2+1 ,π θ2 >Training a sample sequence<s t2 ,a t2 ,r t2 ,s t2+1 ,π θ2 >Stored in experience buffers
Actor2-Critic2 network slave experience bufferA batch of sample sequences are randomly extracted to train and update the Actor2 and Critic2 network parameters theta 2 Sum mu 2 Obtaining a daily short-term rolling scheduling strategy consisting of a group of daily short-term rolling scheduling action sets,
finally obtaining an optimal power economy scheduling strategy schedule based on the S4.2 and the S4.4;
the intra-day short-term rolling schedule set includes: in the daily short-term rolling scheduling stage, the photovoltaic generator set schedules the Internet surfing power to the charging pile, the wind power generator set schedules the Internet surfing power to the charging pile, and the thermal generator set schedules the actual power generation power to the charging pile;
s5: constructing a data set 3, and solving a Markov decision model of the charging schedule of the electric automobile to obtain an optimal charging schedule strategy of the electric automobile; the specific process is as follows:
s5.1: acquiring data information of electric vehicles, charging piles, energy storage systems and road traffic conditions as a data set 3
The electric vehicle data includes: residual electric quantity of the electric automobile;
the charging pile data includes: the electricity price of the charging pile, the number of queuing vehicles of the charging pile, the generated power received by the charging pile and the minimum electric quantity required by the electric automobile to travel to the charging pile;
The energy storage system data includes: residual electric quantity in the energy storage system;
the road traffic condition data includes: the distance between two routing nodes, the road congestion index and the average road passing speed;
s5.2: the specific process for obtaining the optimal charging scheduling strategy of the electric automobile by solving the Markov decision model of the charging scheduling of the electric automobile comprises the following steps:
constructing an Actor3-Critic3 network integrating an importance sampling technology and a dynamic step length mechanism under an Actor-Critic architecture, solving a Markov decision model of electric automobile charging schedule through the Actor3-Critic3 network,
policy pi generated by the Actor3 network θ3 Obeying the category distribution;
the specific process of solving the Markov decision model of the electric automobile charging schedule by the Actor3-Critic3 network is as follows:
data set 3 construction of state set s of Actor3-Critic3 network t3 Then input to an Actor3-Critic3 network, and output that the electric automobile can take a charging scheduling action a t3 Prize r t3 Obtaining a training sample sequence<s t3 ,a t3 ,r t3 ,s t3+1 ,π θ3 >Training a sample sequence<s t3 ,a t3 ,r t3 ,s t3+1 ,π θ3 >Stored in experience buffers
Actor3-Critic3 network slave experience bufferA batch of sample sequences are randomly extracted to train and update the Actor3 and Critic3 network parameters theta 3 Sum mu 3 Finally, an optimal electric vehicle charging scheduling strategy formed by a group of electric vehicle charging scheduling action sets is obtained;
the electric vehicle charging scheduling action consists of selecting which charging pile is charged by the electric vehicle and charging power obtained by the selected charging pile.
2. The method for jointly optimizing economic dispatch and electric vehicle charging strategies in the energy internet according to claim 1, which is characterized in that:
the S2.1: the objective function of establishing the power economy schedule includes: an objective function of day-ahead scheduling and an objective function of intra-day short-term rolling scheduling; constraints on power economy scheduling include: constraint conditions of day-ahead scheduling and constraint conditions of intra-day short-term rolling scheduling; establishing the power economy schedule includes the following optimized forms: an optimized form of day-ahead scheduling and an optimized form of intra-day short-term rolling scheduling;
the objective function of the day-ahead schedule includes: the wind and light discarding cost of the renewable energy unit, the power generation cost of the traditional energy unit and the carbon emission of the traditional energy unit are reduced;
the constraint conditions of the day-ahead scheduling comprise: power balance constraint, unit output size constraint, unit running state constraint, unit minimum running time and downtime constraint in the day-ahead stage;
The objective function of the intra-day short-term rolling schedule comprises the following steps: the total cost of the unit in the daily short-term rolling scheduling stage and the carbon emission of the thermal power unit are the lowest, and the total cost of the unit comprises the waste light and waste wind punishment cost and the adjustment cost of the thermal power generator unit;
the constraint conditions of the intra-day short-term rolling schedule comprise: power balance constraint in the daytime, output force size constraint of the unit and climbing constraint of the unit;
the S2.2: establishing an objective function of the electric vehicle charging schedule, wherein the objective function comprises maximizing satisfaction of users and energy consumption of charging piles;
the S2.2: the constraint conditions for establishing the electric vehicle charging schedule comprise: capacity constraint of the energy storage system, selection constraint of the charging pile, charging power constraint, remaining capacity constraint of the electric automobile, charging capacity constraint of the electric automobile and road network topology constraint.
3. The method for optimizing economic dispatch and electric vehicle charging strategy combination in the energy Internet according to claim 2, wherein,
the objective function of the day-ahead schedule is specifically expressed as:
wherein,
wherein P and O respectively represent the internet power and the running state of the machine set in the day-ahead stage;the cost is punished for the solar photovoltaic and wind turbine generator system light discarding at the day-ahead stage; / >Punishment cost for wind curtailment of photovoltaic and wind turbine units in the day-ahead stage; />The method is the power generation cost of the thermal power generating unit in the day-ahead stage, and comprises the material cost, the maintenance cost and the start-stop cost of the unit; />Is the carbon emission quantity k of the thermal power generating unit in the day-ahead stage p Penalty cost coefficients for the photovoltaic, wind turbines; />At t for photovoltaic 1 Predicting power generation at moment; />At t for wind turbine 1 Predicting power generation at moment; />At t for photovoltaic 1 The internet surfing power at the moment; />At t for wind turbine 1 The internet surfing power at the moment; />Indicating that the thermal power unit i is at t 1 The running state of the time period is a variable of 0-1, and the running state of the unit is represented when the value is 1; a, a i ,b i ,c i The cost coefficient of the power generation material of the thermal power unit i; />At t for thermal power unit i 1 The actual power generated at the moment; k (k) m The maintenance cost coefficient of the thermal power unit i; />The starting cost of the thermal power unit i is; />The shutdown cost of the thermal power unit i is; />Indicating that the thermal power unit i is at t 1 Starting-up behavior of period->Indicating that the thermal power unit i is at t 1 The shutdown behavior of the time period,and->All are 0-1 variables, and the value of the variable is represented at t when the value is 1 1 The time period is when the power-on or power-off action occurs; alpha ggggg The carbon emission coefficients of the thermal power generating unit i are respectively; />Is a thermal power unit set; g i Is the ith thermal power generating unit; i E [1, NG ];
The constraint condition of the day-ahead scheduling is specifically expressed as follows:
power balance constraint in the early day phase:
in the method, in the process of the invention,at t 1 The load power of the jth charging pile at the moment; />At t for the jth energy storage system in the day-ahead stage 1 The storage power at the moment; />Collecting charging piles; q j For the j-th charging pile, j is E [1, M];
The output size constraint of the unit is as follows:
in the method, in the process of the invention,respectively the minimum power generation power and the maximum power generation power of the thermal power generating unit i in the day-ahead scheduling stage;
the unit running state constraint is as follows:
wherein,maximum power generation power is respectively scheduled for the photovoltaic and wind generating sets before the day,
the minimum running time and the shutdown time constraint of the unit are as follows:
wherein T is i up The minimum running time of the unit; t (T) i down Minimum downtime for the unit;
the optimization form of the day-ahead schedule is as follows:
s.t.~.(3)~(12).
wherein K is 1 ,K 2 ,K 3 ,K 4 The weight coefficients of the components are respectively normalized to satisfy
4. The method for optimizing economic dispatch and electric vehicle charging strategy combination in the energy Internet according to claim 3, wherein,
the objective function of the intra-day short-term rolling schedule is expressed as:
wherein,
wherein P is the Internet surfing power of the intra-day stage unit; t is t 0 The time is the initial optimization time; delta T is the optimized duration; The cost of light and wind abandoning in the daily stage is punished respectively; />The adjustment cost of the thermal power generating unit in the daytime is realized; />Carbon emission of the thermal power generating unit; in (1) the->The predicted power generation power of the photovoltaic and wind turbine units in the daily short-term rolling scheduling stage is respectively calculated; />Respectively photovoltaic and wind turbine at t 2 The internet power of the time period; />The method is used for adjusting the cost coefficient of the thermal power generating unit in the daytime; />For the operation state of the thermal power generating unit i in the daily short-term rolling scheduling stage, +.>And->The two values are the same;at t for thermal power unit i 2 Actual power generated during the time period;
the constraint conditions of the intra-day short-term rolling schedule are specifically expressed as follows:
power balance constraint during the intra-day phase:
in the method, in the process of the invention,at t 2 The load power of the jth charging pile at the moment; />The j-th energy storage system is in t for the day stage 2 The storage power at the moment;
upper and lower limit constraint of daily short-term rolling dispatching output of thermal power generating unit:
wherein,respectively the minimum power generation power and the maximum power generation power of the thermal power unit i in a daily short-term rolling scheduling stage;
upper and lower limit constraint of solar short-term rolling dispatching output of photovoltaic unit:
upper and lower limit constraint of daily short-term rolling dispatching output of wind turbine
Wherein,maximum power generation of the photovoltaic unit and the wind turbine unit which are scheduled in a short-term rolling manner in the day are respectively;
And the daily short-term rolling schedule takes 15 minutes as an interval, and the climbing constraint of the unit:
wherein R is iu ,R id The upper limit value and the lower limit value of the climbing rate of the thermal power unit i are respectively
The optimization form of the daily short-term rolling schedule is as follows:
s.t.~.(16)~(22).
wherein K is 5 ,K 6 ,K 7 ,K 8 The weight coefficients of the components are respectively normalized to satisfy
5. The method for jointly optimizing economic dispatch and electric vehicle charging strategies in the energy internet according to claim 4, wherein the objective function for establishing the electric vehicle charging dispatch is expressed as:
wherein, P and z respectively represent the charging power of the electric automobile and which charging pile is selected to carry out the charging task;representing the user satisfaction degree of the nth electric vehicle; />Representing the energy storage capacity of the jth charging pile; />The method is an electric automobile set; y is n Representing an nth electric automobile; n is E [1, N];
The user satisfaction index comprises a distance energy consumption cost, a charging cost, a driving time, a queuing waiting time and a tolerant distance anxiety value, and the specific expression modes are as follows:
path energy consumption costs: after the nth electric automobile sends out the charging requirement, a certain energy cost is consumed when the electric automobile runs to the charging pileExpressed as:
Wherein, l and m are road node numbers respectively; t (T) ed Is a scheduling period; zeta type n Representing the battery consumption of the nth electric automobile per kilometer;the electricity price of the jth charging pile in the charging scheduling stage of the electric automobile is represented, all charging piles are set to belong to the same third party operator, the electricity price at each moment is determined by the third party operator, and the electricity price of the charging piles in the same area at each moment is the same; d, d lm,n Representing the distance between the nodes l and m through which the nth electric automobile selects the jth charging pile; phi (phi) lm,n Is a binary 0-1 variable, and when the value is 1, the n is represented byThe electric vehicle runs from the node l to the node m, and the value of the electric vehicle is 0, so that the route is not selected; z j,n A binary 0-1 variable which indicates whether the jth charging pile is selected by the nth electric automobile;
charging cost: charging cost of electric automobileThe product form of the charging quantity and the electricity price is described as:
wherein,representing the charge quantity of the nth electric automobile in a charging stage; />Battery charging efficiency for the nth electric vehicle; />Representing the charging power of the nth electric automobile in a charging stage; />Representing the maximum battery capacity of the nth electric automobile; Δt (delta t) 3 Representing the charging time of the electric automobile;
travel time: the time taken by the nth electric vehicle to travel from the departure place to the charging pile is expressed as
Wherein,indicating that the nth electric automobile is at t 3 Congestion index of road through which the charging pile is driven at moment,/->Indicating that the road through which the nth electric automobile passes is at t 3 Average speed of time of day, here +.>And->All are from the traffic department to obtain real-time traffic conditions;
queuing wait time: after the nth electric automobile runs to the charging pile, if other electric automobiles are charging in the charging pile, a certain queuing waiting time is needed at the momentExpressed as:
wherein,indicating that the jth charging pile is at t 3 The number of queuing vehicles at the moment; />Indicating that the jth charging pile is at t 3 The charging completion efficiency at the moment; />Indicating the queuing waiting time required when the electric automobile n goes to the j-th charging pile for charging,
tolerance distance anxiety values: the nth electric vehicle may endure a great distance anxiety degree in some casesTo select a remote charging pile for a charging task, thereby realizing a lower charging cost, expressed as:
wherein ρ is n Anxiety coefficient of the nth electric automobile;the remaining capacity of the nth electric automobile;
The user satisfaction of the nth electric vehicle is described as follows in conjunction with the above formula:
wherein omega ss ,Weight coefficients of the respective components for setting user satisfaction +.>Normalized to the value of [0,1 ]]And->The larger the value of (c), the better the charging experience for the user,
wherein the method comprises the steps of
Indicated at t 3 The power of the jth charging pile is scheduled in a rolling way in a short time within a day;
the constraint conditions for establishing the electric vehicle charging schedule are specifically expressed as follows:
energy storage capacity constraint: the storage capacity of the single energy storage system is limited by upper and lower limits, and the residual electric quantity value generated by scheduling is calculatedIs that the upper and lower limits of the storage capacity cannot be exceeded:
in the method, in the process of the invention,respectively representing the upper limit and the lower limit of energy storage;
charging pile selection constraint: for the nth electric automobile, only one charging pile can be selected for charging:
charging power constraint: for the nth electric vehicle, the upper limit constraint of charging power needs to be complied with in the charging process:
remaining capacity constraint of electric automobile: the electric quantity of the nth electric automobile in the driving stage must support the electric automobile to drive to the charging pile:
wherein,indicating that the electric automobile is at t 3 The minimum electric quantity required by the j-th charging pile is selected at any time,
charge amount constraint of electric vehicle: the nth electric automobile is at t 3 The charge quantity at the moment can not exceed the dispatch quantity received by the selected charging pile and the residual quantity in the energy storage system nearby at the previous momentAnd (2) sum:
road network topology constraints: the electric automobile starts from the node omega and arrives at the charging pile setThe constraint ensures that the routes selected by the nth electric automobile can be connected in sequence:
wherein l, m, p are road nodes, i.e. l, m,φ lmpl and phi in formula (27) lm,n Are all one and twoWhen the value of the variable 0-1 is 1, the route between two nodes is selected by the electric automobile, otherwise, the route is not selected by the electric automobile;
the optimization form for establishing the electric vehicle charging schedule is as follows:
s.t.~.(32)~(37).
wherein K is 9 ,K 10 The weight coefficients of the components are respectively normalized to satisfy
6. The method for jointly optimizing economic dispatch and electric vehicle charging strategies in the energy internet according to claim 5, wherein the step S3.1: establishing a Markov decision model of the power economy dispatching according to the power economy dispatching optimization form; the markov decision model of the power economy dispatch includes: a Markov decision model for day-ahead scheduling and a Markov decision model for intra-day short-term rolling scheduling; the specific process is as follows:
The Markov decision model of the day-ahead schedule is as follows:
state set: the scheduling period set in the day-ahead scheduling stage is 24, namely t 1 ∈[0,23]The method comprises the steps of carrying out a first treatment on the surface of the For each time slot t 1 Using 4 variables to describe the state of the day-ahead scheduling phaseExpressed as:
the four variables are respectively: predicted power generation of photovoltaic pair jth charging pilePredicted power generation of wind turbine unit on jth charging pile +.>Load power of jth charging pile in day-ahead stage +.>The jth energy storage system is at t 1 Storage power +.>And->Is obtained by GRU network model prediction based on the Attention mechanism;
action set: state s of the day-ahead scheduling stage according to equation (39) t1 The generator-side unit takes corresponding actionWherein the actions are->Consists of 4 variables, expressed as:
wherein,respectively photovoltaic, wind turbine set at t 1 Scheduling the internet power to the jth charging pile at a time,/>At t for thermal power unit i 1 The actual power generation power of the jth charging pile is scheduled at the moment, and 4 variables are all results output by an Actor1-Critic1 network;
bonus function: t is t 1 The time slot generator set is making actionThe generator set will then get an immediate prize +.>The reward is also a day-ahead schedule optimization form:
Wherein K is 1 ,K 2 ,K 3 ,K 4 The weight coefficient is set in the formula, and the weight coefficient is set in the T 1 After that time, the generator set will receive the total jackpot:
wherein Y is da ∈[0,1]Is a discount rate;the desired function is represented by a function of the desired function,
the Markov decision model of the intra-day short-term rolling schedule is as follows:
state set: for time slot t 2 Using 4 variables to describe the state of the short-term rolling schedule phase within a dayThe method comprises the following steps of: predicted power generation of photovoltaic unit on jth charging pile>Predicted power generation of wind turbine unit to jth charging pileLoad power of jth charging pile in daytime>The jth energy storage system is at t 2 Storage power at timeExpressed as:
wherein,the method is obtained through GRU network model prediction based on an Attention mechanism;
action set: the intra-day short-term rolling schedule phase state according to (43)The generator set takes corresponding actionExpressed in terms of 3 variables:
wherein (1)>Respectively photovoltaic at t 2 Scheduling the internet power of the jth charging pile at moment and enabling the wind turbine to be at t 2 Scheduling the internet power to the jth charging pile at a time,/>At t for thermal power unit i 2 The actual power generated by the j-th charging pile is scheduled at the moment, 3 variables are all the results output by the Actor2-Critic2 network,
Bonus function: the generator set is according to the stateMake action->After that, the generator set gets instant rewards +.>The prize value is also an optimized form of daily short-term rolling schedule:
wherein K is 5 ,K 6 ,K 7 ,K 8 The weight coefficient is set in the formula, and the weight coefficient is set in the T 2 After that time, the generator set will receive the total jackpot:
wherein Y is di ∈[0,1]Is a discount rate;
the S3.2: establishing a Markov decision model of the electric vehicle charging schedule according to an optimized form of the electric vehicle charging schedule; the Markov decision model of the electric automobile charging schedule is as follows:
state space: for each time slot t 3 Describing the state s of the electric automobile charging schedule stage from 4 parts t3 Including electric automobile, fill electric pile, energy storage system and road traffic conditions:
wherein,representing the residual electric quantity of the electric automobile; />Respectively representing the electricity price of the charging pile, the number of queuing vehicles of the charging pile, the generated power received by the charging pile and the minimum electric quantity required by the electric automobile to travel to the charging pile; />Representing the residual electric quantity in the energy storage system;respectively representing the distance between two routing nodes, the road congestion index and the average road passing speed;
action space: state of electric vehicle charge schedule stage according to formula (47) The electric automobile takes an action +.>Action->Is composed of t 3 The electric automobile selects which charging pile is charged at the moment, and the charging power obtained by the selected charging pile comprises the following components:
bonus function: the electric automobile is making actionAfterwards, the electric car gets an instant prize +.>The reward is also an objective function of the electric vehicle charging schedule:
wherein K is 9 ,K 10 Is a weight coefficient set in
At T 3 After the moment, the electric automobile can obtain a total reward:
wherein Y is ed ∈[0,1]Is the discount rate.
7. The method for optimizing economic dispatch and electric vehicle charging strategy combination in energy Internet according to claim 6, wherein,
the strategy gradient descending method is adopted when the Actor network is trained in the S4.1, and the strategy parameter theta of the Actor network is updated in the direction of the maximum rewarding expected value J (theta) obtained by the intelligent agent; the strategy gradient descent method is that by calculating the strategy gradient, the strategy gradient is reduced according toMeasuring the direction of gradient descent to update the policy parameters, policy pi θ The expression of maximizing the prize expectation J (θ) achieved by the agent is specifically expressed as:
π θ =argmax θ J(θ), (51)
j (θ) represents the expected value of the prize obtained by the agent:
wherein τ represents the agent in pi θ To use a policy, a process trajectory generated by interacting with an environment, which may be represented by a sequence of "state-actions",representing a reward function, T being a time step; s is(s) t Representing the state of the intelligent agent at the time t; a, a t Representing the action of the agent at the time t; />Representing a desired function that measures the magnitude of the rewards earned by the agent,
the iterative process for updating the strategy parameter theta of the Actor network in the direction of the maximum rewarding expected value J (theta) obtained by the agent is as follows:
in θ old Respectively the old policy parameters; eta is learning rate and the value range is [0,1 ]];I.e. the gradient of the effect exhibited by the current strategy with respect to the strategy parameters. />
8. The method for optimizing the combination of economic dispatch and electric vehicle charging strategies in the energy internet according to claim 7, wherein the strategy gradient is calculated by introducing a merit function as follows:
in the method, in the process of the invention,representing an dominance function at time t representing the selected action a t In the current state s t The following dominance degree; pi θ (a t |s t ) Is a probability distribution representing the probability of being in a given state s t Action a is taken t Is used to determine the policy probability of (1),
the dominance function is a state-action value functionAnd a state value function generated by a Critic network Obtaining the difference value of (2); dominance function at time t->The calculation formula of (2) is as follows:
wherein Y is E [0,1 ]]For discount rate, it is used to control that rewards fed back by the current agent are more important than rewards fed back in the future,as a function of state-action values; />Is a state value function generated by the Critic network;
the Critic network is responsible for evaluating action a t Extracting the current state s from the extracted sample sequence t Calculating the current state s t Lower state value functionCritic network uses bonus function +.>And state value function->The Mean Square Error (MSE) between as a loss function updates the network parameter μ, namely:
where μ is a parameter of the Critic network,as a desired function.
9. The method for optimizing the combination of economic dispatch and electric vehicle charging strategies in the energy internet according to claim 8, wherein the importance sampling technique is used for estimating the expected value of the reward of the update of the new strategy to the old strategy, thereby determining the maximizing direction of the expected value J (theta) of the reward and updating the strategy parameters; after the importance sampling technique is introduced under the Actor-Critic architecture, the equation (55) for calculating the strategy gradient is updated as follows:
wherein pi θ (s t ,a t ) Representing the policy of the current update, Representing old policies, ++>Representing the dominance function calculated under the old strategy, with +.>Representing the expected value of rewards obtained after the intelligent agent introduces the importance sampling technology, expressed as:
10. the method for optimizing economic dispatch and electric vehicle charging strategy combination in energy Internet according to claim 9, wherein the dynamic step length mechanism is to limit update step length by a shearing mechanism of an advantage function in the training process, thereby realizing dynamic adjustment of step length and obtaining a loss function of an Actor networkExpressed as:
/>
in the method, in the process of the invention,the ratio of the new strategy to the old strategy is shown; clip (·) is the clipping function, when r t The value of (θ) is less than 1 ε, for r t Cutting (θ), r t The value of (θ) becomes 1- ε; if r t The value of (θ) is greater than 1+ε, for r t Cutting (θ), r t The value of (θ) becomes 1+ε; if r t (θ) is in [ 1-. Epsilon., 1+epsilon ]]Between, r t And (theta) is kept unchanged. />
CN202311231834.9A 2023-09-22 2023-09-22 Economic dispatching and electric vehicle charging strategy combined optimization method in energy internet Pending CN117350424A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311231834.9A CN117350424A (en) 2023-09-22 2023-09-22 Economic dispatching and electric vehicle charging strategy combined optimization method in energy internet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311231834.9A CN117350424A (en) 2023-09-22 2023-09-22 Economic dispatching and electric vehicle charging strategy combined optimization method in energy internet

Publications (1)

Publication Number Publication Date
CN117350424A true CN117350424A (en) 2024-01-05

Family

ID=89358542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311231834.9A Pending CN117350424A (en) 2023-09-22 2023-09-22 Economic dispatching and electric vehicle charging strategy combined optimization method in energy internet

Country Status (1)

Country Link
CN (1) CN117350424A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117634713A (en) * 2024-01-26 2024-03-01 南京邮电大学 Electric taxi charging cost optimization method and system based on charging pile lease
CN117689188A (en) * 2024-02-04 2024-03-12 江西驴充充物联网科技有限公司 Big data-based user charging strategy optimization system and method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117634713A (en) * 2024-01-26 2024-03-01 南京邮电大学 Electric taxi charging cost optimization method and system based on charging pile lease
CN117634713B (en) * 2024-01-26 2024-05-24 南京邮电大学 Electric taxi charging cost optimization method and system based on charging pile lease
CN117689188A (en) * 2024-02-04 2024-03-12 江西驴充充物联网科技有限公司 Big data-based user charging strategy optimization system and method
CN117689188B (en) * 2024-02-04 2024-04-26 江西驴充充物联网科技有限公司 Big data-based user charging strategy optimization system and method

Similar Documents

Publication Publication Date Title
Li et al. Constrained EV charging scheduling based on safe deep reinforcement learning
CN117350424A (en) Economic dispatching and electric vehicle charging strategy combined optimization method in energy internet
CN111762051B (en) Electric automobile participating receiving-end power grid low-valley peak regulation demand response regulation and control method based on aggregators
CN112776673B (en) Intelligent network fuel cell automobile real-time energy optimization management system
Li et al. Probabilistic charging power forecast of EVCS: Reinforcement learning assisted deep learning approach
CN113222463B (en) Data-driven neural network agent-assisted strip mine unmanned truck scheduling method
CN114997631B (en) Electric vehicle charging scheduling method, device, equipment and medium
Huang et al. A control strategy based on deep reinforcement learning under the combined wind-solar storage system
CN107590607A (en) A kind of micro-capacitance sensor Optimal Scheduling and method based on photovoltaic prediction
CN112217195A (en) Cloud energy storage charging and discharging strategy forming method based on GRU multi-step prediction technology
Guo et al. Energy management of intelligent solar parking lot with EV charging and FCEV refueling based on deep reinforcement learning
CN115805840A (en) Energy consumption control method and system for range-extending type electric loader
CN117578409A (en) Multi-energy complementary optimization scheduling method and system in power market environment
Pan et al. Grey wolf fuzzy optimal energy management for electric vehicles based on driving condition prediction
CN114169916A (en) Market member quotation strategy making method suitable for novel power system
Cui et al. Dynamic pricing for fast charging stations with deep reinforcement learning
Lin et al. Aggregator pricing and electric vehicles charging strategy based on a two-layer deep learning model
CN117543581A (en) Virtual power plant optimal scheduling method considering electric automobile demand response and application thereof
CN112750298B (en) Truck formation dynamic resource allocation method based on SMDP and DRL
CN117207803A (en) Electric automobile intelligent charging strategy selection method based on economic dispatch
CN111799820B (en) Double-layer intelligent hybrid zero-star cloud energy storage countermeasure regulation and control method for power system
CN116596252A (en) Multi-target charging scheduling method for electric automobile clusters
CN116683530A (en) Wind-light-containing hybrid type pumping and storing station cascade reservoir random optimization scheduling method
CN113098073B (en) Day-ahead scheduling optimization method considering source-load bilateral elastic space
Hong et al. 6G based intelligent charging management for autonomous electric vehicles

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication