CN116683513A - Method and system for optimizing energy supplement strategy of mobile micro-grid - Google Patents

Method and system for optimizing energy supplement strategy of mobile micro-grid

Info

Publication number
CN116683513A
Authority
CN
China
Prior art keywords
grid
power
energy
action
micro
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310750126.XA
Other languages
Chinese (zh)
Inventor
文书礼
汤俊彦
朱淼
徐莉婷
马建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202310750126.XA
Publication of CN116683513A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28 Arrangements for balancing of the load in a network by storage of energy
    • H02J3/32 Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06313 Resource planning in a project environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/007 Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources
    • H02J3/0075 Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources for providing alternative feeding paths between load and source according to economic or energy efficiency considerations, e.g. economic dispatch
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46 Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/466 Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00 Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/40 Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation wherein a plurality of decentralised, dispersed or local energy generation technologies are operated simultaneously

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Power Engineering (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • Molecular Biology (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)

Abstract

The invention provides a method and a system for optimizing the energy replenishment strategy of a mobile micro-grid. The method comprises the following steps. A step of characterizing energy replenishment of the mobile micro-grid: the mobile micro-grid is regarded as an agent in a Markov decision process, and the state space, action space and reward function for intelligent energy replenishment control of the mobile micro-grid are designed on the basis of the Markov decision process. A step of training the mobile micro-grid energy replenishment strategy: the mobile micro-grid agent is trained with the DQN algorithm to obtain an optimal value function, and the agent selects the action with the highest value for the current state, which yields the optimal strategy. A step of deploying the trained mobile micro-grid energy replenishment strategy: the intelligent energy replenishment strategy of the mobile micro-grid is implemented with the trained mobile micro-grid agent. The invention exploits the coupling between the mobile micro-grid and the power system that provides the energy replenishment service, so that the optimization takes the requirements of both into account.

Description

Method and system for optimizing energy supplement strategy of mobile micro-grid
Technical Field
The invention relates to the technical fields of electrical engineering and computer science, in particular to a method and a system for optimizing an energy supplementing strategy of a mobile micro-grid.
Background
The electrification of transportation is an important link in the energy transition. An electrified means of transport acts as a mobile micro-grid, and its energy replenishment process involves many complex factors that are difficult to manage manually, so automatic control by intelligent algorithms is needed.
At present, most research on optimizing mobile micro-grid energy replenishment strategies takes electric vehicles as the research object, covers only the replenishment process between the electric vehicle and the charging pile, and does not consider the internal complexity of the system operated by the energy replenishment service provider. For electrified ships, the port power system that provides the replenishment service has complex operating states and many component links. If only an idealized charging pile model is considered, the potential of cooperative optimization cannot be exploited, large power fluctuations are easily caused, and the stability of the port power system is negatively affected. Optimization methods for mobile micro-grid energy replenishment strategies therefore need better adaptability to complex power systems.
Existing related reference: Li Hang, Li Guojie, Wang Keyou. Electric vehicle real-time scheduling strategy based on deep reinforcement learning [J]. Electric Power System Automation, 2020, 44(22): 161-167. Technical comparison: this document proposes an energy replenishment strategy optimization method based on deep reinforcement learning for vehicle micro-grids, that is, electric vehicles, and takes both replenishment cost and grid operating stability into account. However, the method assumes that the battery parameters of the micro-grids are uniform constants and does not consider the complex factors inside the power system that provides the replenishment service, so it is not suitable for optimizing mobile micro-grids such as electrified ships.
Existing related reference: Wei Z, Li Y, Cai L. Electric vehicle charging scheme for a park-and-charge system considering battery degradation costs [J]. IEEE Transactions on Intelligent Vehicles, 2018, 3(3): 361-373. Technical comparison: this document designs an energy replenishment strategy optimization method for electric vehicles through mathematical programming. It builds a battery degradation model and focuses on extending the battery life of the electric vehicle. However, it does not consider the optimization needs of the power system that provides the replenishment service, which may seriously harm the operating stability of the power system in scenarios where a large number of mobile micro-grids replenish energy at the same time.
Existing related reference: Zhao Z, Lee C K M. Dynamic pricing for EV charging stations: A deep reinforcement learning approach [J]. IEEE Transactions on Transportation Electrification, 2021, 8(2): 2456-2468. Technical comparison: this document studies an energy replenishment strategy optimization method for electric vehicles based on deep reinforcement learning. It considers queuing during vehicle energy replenishment, proposes a quality-of-service concept that combines queuing time with other indicators, and uses an Actor-Critic algorithm to design an optimization method with quality of service as the objective. However, it does not consider the optimization needs of the power system that provides the replenishment service, and compared with the DQN algorithm the Actor-Critic algorithm has drawbacks such as slow training and difficult model convergence.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method and a system for optimizing an energy supplementing strategy of a mobile micro-grid.
The scheme of the method and system for optimizing the energy replenishment strategy of a mobile micro-grid provided by the invention is as follows:
in a first aspect, a method for optimizing the energy replenishment strategy of a mobile micro-grid is provided, the method comprising:
a step of characterizing energy replenishment of the mobile micro-grid: regarding the mobile micro-grid as an agent in a Markov decision process, designing the state space, action space and reward function for intelligent energy replenishment control of the mobile micro-grid on the basis of the Markov decision process, and taking the energy replenishment power of the mobile micro-grid as the action to be decided;
a step of training the mobile micro-grid energy replenishment strategy: training the mobile micro-grid agent with the DQN algorithm to obtain an optimal value function, the agent selecting the action with the highest value for the current state so as to obtain the optimal strategy;
a step of deploying the trained mobile micro-grid energy replenishment strategy: based on the trained mobile micro-grid agent, selecting the energy replenishment power according to the optimal value function obtained through training, thereby implementing the intelligent energy replenishment strategy of the mobile micro-grid.
Preferably, the state space design in the step of characterizing energy replenishment of the mobile micro-grid comprises: calculating in advance, for each moment, the lower limit of the energy storage battery SOC required to meet the energy replenishment demand of the micro-grid, and ensuring that the SOC does not fall below this lower limit during replenishment, so that the replenishment demand of the mobile micro-grid is met when it departs;
the SOC lower limit is calculated as follows:
wherein $e_{t,i}$ is the SOC of the energy storage battery of the i-th mobile micro-grid at time t; $t_{a,i}$ and $t_{l,i}$ are the times at which energy replenishment of the i-th mobile micro-grid starts and is scheduled to end; the actual charging efficiency and the actual discharging efficiency of the i-th mobile micro-grid are also used; $P_i^{dis,max}$ and $P_i^{ch,max}$ are the maximum discharging power and the maximum charging power that the energy storage battery of the i-th mobile micro-grid can withstand; $\Delta t$ is the interval between adjacent moments; $c_i$ is the energy storage battery capacity of the i-th mobile micro-grid; $e_i^{min}$ is the minimum safe SOC of the energy storage battery of the i-th mobile micro-grid; and $e_{target,i}$ is the target SOC that the i-th mobile micro-grid is expected to reach on departure;
the state space is defined as follows:
wherein $s_{t,i}$ is the set of state variables of the i-th mobile micro-grid at time t; $t$ is the current time; $t_{stay,i}$ is the remaining energy replenishment time of the i-th mobile micro-grid; $P_{t,i}^{sp}$ is the shore-power load that the i-th mobile micro-grid places on the power system at time t when connected; $p_{ch,t}$ is the electricity price at time t; $P_t^{RE}$ is the total renewable energy output of the power system at time t; $P_t^{L}$ is the normal load power of the power system at time t; and $n_{b,t}$ is the total number of mobile micro-grids being replenished at time t.
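The SOC lower-limit formula is not reproduced in the text above. As a rough sketch only, the code below assumes the bound is the smallest SOC from which the departure target can still be reached by charging at maximum power in every remaining step, floored at the minimum safe SOC. It also packs the listed state variables into a vector. The function and field names, units and numeric values are hypothetical.

```python
from dataclasses import dataclass, astuple


def soc_lower_bound(steps_left, e_target, e_min, p_ch_max, eta_ch, dt, capacity):
    """Smallest SOC from which e_target is still reachable (assumed form).

    Assumes each remaining step can raise the SOC by at most
    eta_ch * p_ch_max * dt / capacity; this is an illustrative rule,
    not the formula of the patent.
    """
    max_gain_per_step = eta_ch * p_ch_max * dt / capacity
    return max(e_min, e_target - steps_left * max_gain_per_step)


@dataclass
class State:
    """State variables listed in the text for the i-th mobile micro-grid."""
    t: int              # current time step
    t_stay: int         # remaining replenishment time, in steps
    p_shore: float      # shore-power load P^sp_{t,i}
    price: float        # electricity price p_{ch,t}
    p_renewable: float  # total renewable output P^RE_t
    p_load: float       # normal load power P^L_t
    n_charging: int     # number of micro-grids being replenished n_{b,t}

    def as_vector(self):
        # Flat list of floats, the usual input shape for a Q-network.
        return [float(x) for x in astuple(self)]


bound = soc_lower_bound(steps_left=2, e_target=0.9, e_min=0.2,
                        p_ch_max=100.0, eta_ch=0.9, dt=1.0, capacity=1000.0)
state = State(t=3, t_stay=5, p_shore=20.0, price=0.8,
              p_renewable=150.0, p_load=400.0, n_charging=2)
```

With two steps left the bound is close to the target, and with many steps left it relaxes to the safety floor, which is the behaviour the text asks of the lower limit.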
Preferably, the action space design in the step of characterizing energy replenishment of the mobile micro-grid comprises: setting the energy replenishment power of the mobile micro-grid as the action variable to be decided, and discretizing the charging and discharging power into several selectable levels, where a positive value means absorbing energy from the power system and a negative value means releasing energy to the power system; the action space is defined as follows:
wherein $A_t$ is the action to be executed by the agent at time t, $a_{t,i}$ is the action variable of the i-th mobile micro-grid at time t, $P_{t,i}^{ch}$ is the charging power of the i-th mobile micro-grid at time t, and $I$ is the index set of all mobile micro-grids connected to the power system.
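To illustrate the discretization described above, here is a minimal sketch that builds evenly spaced power levels for one mobile micro-grid. Even spacing, the number of levels and the function name `build_action_levels` are assumptions; the text only states that the power is split into several levels with positive values for charging and negative values for discharging.

```python
def build_action_levels(p_dis_max, p_ch_max, n_levels):
    """Evenly spaced power levels from -p_dis_max (discharge) to +p_ch_max (charge)."""
    if n_levels < 2:
        raise ValueError("need at least two levels")
    step = (p_ch_max + p_dis_max) / (n_levels - 1)
    return [-p_dis_max + k * step for k in range(n_levels)]


# The agent picks an index; the index maps to a charging power a_{t,i}.
levels = build_action_levels(p_dis_max=100.0, p_ch_max=100.0, n_levels=5)
```

The joint action $A_t$ would then hold one such level per connected micro-grid.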
Preferably, the reward function design in the step of characterizing energy replenishment of the mobile micro-grid comprises:
for the optimization objective of meeting the energy replenishment demand, the SOC lower limit is calculated for each moment; however, during training, an action chosen by the DQN algorithm may push the SOC below the lower limit or above the safety upper limit, so the action must be corrected; the correction formula is as follows:
wherein $e_i^{max}$ is the maximum SOC allowed for the energy storage battery of the i-th mobile micro-grid to ensure safe energy replenishment; $e_{t,i}$ is the SOC of the energy storage battery of the i-th mobile micro-grid at time t; $c_i$ is the energy storage battery capacity of the i-th mobile micro-grid; the actual charging efficiency of the i-th mobile micro-grid is also used; $e_{t+1,i}^{pred}$ is the SOC that the i-th vessel is expected to reach at time t+1 under the uncorrected action $a_{t,i}$; and the formula contains a coefficient related to the charging and discharging efficiency of the vessel, which takes one value when $e_{t,i} < e_{t+1,i}^{th,low}$ and another value otherwise.
To guide the agent towards actions that lie between the upper and lower limits, the difference between the action before correction and the action after correction is computed, and the negative of its absolute value is used as the action out-of-limit reward; this negative reward pushes the agent to choose actions that do not exceed the limits; the action out-of-limit reward function is as follows:
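The correction and out-of-limit reward formulas are not reproduced above. The sketch below shows one way the described behaviour could be realized, under assumed SOC dynamics in which charging is scaled by the charging efficiency and discharging is divided by the discharging efficiency. All names and numbers are hypothetical, and this is not presented as the patent's formula.

```python
def predict_soc(soc, power, eta_ch, eta_dis, dt, capacity):
    """Assumed SOC dynamics: charging scaled by eta_ch, discharging by 1/eta_dis."""
    if power >= 0:
        return soc + eta_ch * power * dt / capacity
    return soc + power * dt / (eta_dis * capacity)


def correct_action(soc, power, soc_low, soc_high, eta_ch, eta_dis, dt, capacity):
    """Return the power nearest to `power` that keeps the next SOC in [soc_low, soc_high]."""
    nxt = predict_soc(soc, power, eta_ch, eta_dis, dt, capacity)
    if nxt < soc_low:
        bound = soc_low
    elif nxt > soc_high:
        bound = soc_high
    else:
        return power  # already within limits, no correction needed
    # Battery-side power needed to land exactly on the violated bound.
    needed = (bound - soc) * capacity / dt
    # Convert to grid-side power: charging divides by eta_ch, discharging multiplies by eta_dis.
    return needed / eta_ch if needed >= 0 else needed * eta_dis


def exceed_reward(power, corrected):
    """Negative absolute difference between the raw and corrected action."""
    return -abs(power - corrected)


corrected = correct_action(soc=0.5, power=-200.0, soc_low=0.45, soc_high=0.9,
                           eta_ch=0.9, eta_dis=0.9, dt=1.0, capacity=1000.0)
penalty = exceed_reward(-200.0, corrected)
```

In this example the requested discharge of 200 would take the SOC below the lower limit, so the action is cut back to roughly 45 and the agent receives a penalty proportional to the size of the cut.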
for the optimization objective of reducing the energy replenishment cost, the negative of the replenishment cost is used directly as the reward function: the expense incurred when the mobile micro-grid charges corresponds to a negative reward, and the income earned when it sells electricity corresponds to a positive reward; the energy replenishment cost reward function is as follows:
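A minimal sketch of the cost reward described above, assuming a single price applies to both buying and selling and that cost is price times power times step length. The function name and units are illustrative.

```python
def cost_reward(powers_kw, price_per_kwh, dt_hours):
    """Negative of the net replenishment cost over all micro-grids in one step.

    Positive power (charging) costs money and lowers the reward;
    negative power (selling to the grid) earns money and raises it.
    """
    return -sum(price_per_kwh * p * dt_hours for p in powers_kw)


# One micro-grid charges at 100 kW while another sells 40 kW.
r_cost = cost_reward([100.0, -40.0], price_per_kwh=0.5, dt_hours=1.0)
```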
for the optimization objective of reducing grid power fluctuation, in order to measure the contribution of the mobile micro-grid as virtual energy storage, the average of the previous day's exchange power over all time periods, excluding the energy replenishment power of the mobile micro-grids, is calculated as the reference value; the specific calculation formula is as follows:
wherein the averaged quantity is the exchange power at time t of the previous day, excluding the charging and discharging power of the mobile micro-grids; $P_t^{L,before}$ is the load power inside the power system providing the energy replenishment service at time t of the previous day; the total micro-grid load at time t of the previous day, excluding the mobile micro-grid energy replenishment process, is also used; $P_t^{RE,before}$ is the total renewable generation inside the power system providing the energy replenishment service at time t of the previous day; $P_t^{ST,before}$ is the output of the energy storage station inside that power system at time t of the previous day; $P_{t,i}^{sp,before}$ is the shore-power load that the power system supplied to the i-th mobile micro-grid at time t of the previous day; $P_{t,j}^{ch,before}$ is the load at time t of the previous day caused by recharging the j-th energy storage battery that was swapped out of a mobile micro-grid; $I^{before}$ is the index set of all mobile micro-grids connected to the power system on the previous day; and $J^{before}$ is the index set of all energy storage batteries swapped out of mobile micro-grids on the previous day;
when measuring power fluctuation, the full-day average of the previous day's exchange power excluding the mobile micro-grids is taken as the reference, and its difference from the exchange power at time t of the current day is used as the fluctuation power; the fluctuation power without considering the output of the energy storage station is calculated as follows:
wherein $P_t^{MG,nonST}$ is the grid exchange power at time t of the current day without considering the output of the energy storage station;
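The two quantities just described can be sketched as follows. The sign convention (current exchange power minus the reference) is an assumption, since the formula itself is not reproduced, and the function names are illustrative.

```python
def reference_power(prev_day_exchange_kw):
    """Full-day mean of the previous day's exchange power (the reference value)."""
    return sum(prev_day_exchange_kw) / len(prev_day_exchange_kw)


def fluctuation_without_storage(p_exchange_now_kw, p_reference_kw):
    """Deviation of the current exchange power from the reference (assumed sign)."""
    return p_exchange_now_kw - p_reference_kw


p_ref = reference_power([100.0, 200.0, 300.0])
p_fluct = fluctuation_without_storage(260.0, p_ref)
```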
the energy storage station determines its output after the agent has made its decision, with the aim of further reducing the power fluctuation as far as possible on top of the output of the mobile micro-grids; the output power of the energy storage station is calculated as follows:
wherein $E_{t,k}^{ST}$ is the stored energy of the k-th battery in the energy storage station at time t, $\eta_{ch,k}$ and $\eta_{dis,k}$ are the charging efficiency and discharging efficiency of the k-th battery in the energy storage station, and $C_k^{ST}$ is the capacity of the k-th battery in the energy storage station;
after the output of the energy storage station is obtained, the fluctuation power that takes this output into account is calculated as follows:
$P_t^{f} = P_t^{f,nonST} - P_t^{ST}$
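The station output formula is not reproduced above. As a hedged sketch, the code below treats the station as a single aggregated battery that cancels as much of the fluctuation as its stored energy or remaining headroom allows, and then applies $P_t^{f} = P_t^{f,nonST} - P_t^{ST}$. Treating positive output as discharge, ignoring converter power ratings and the function names are all assumptions.

```python
def storage_output(p_fluct_kw, energy_kwh, capacity_kwh, eta_ch, eta_dis, dt_h):
    """Station output (positive = discharge) that offsets the fluctuation
    as far as the energy limits allow."""
    if p_fluct_kw >= 0:
        # Discharging: deliverable power is limited by the stored energy.
        max_dis = energy_kwh * eta_dis / dt_h
        return min(p_fluct_kw, max_dis)
    # Charging: absorbable power is limited by the remaining headroom.
    max_ch = (capacity_kwh - energy_kwh) / (eta_ch * dt_h)
    return -min(-p_fluct_kw, max_ch)


def residual_fluctuation(p_fluct_kw, p_storage_kw):
    """P^f_t = P^{f,nonST}_t - P^{ST}_t, as given in the text."""
    return p_fluct_kw - p_storage_kw


p_st = storage_output(60.0, energy_kwh=100.0, capacity_kwh=200.0,
                      eta_ch=0.9, eta_dis=0.9, dt_h=1.0)
p_f = residual_fluctuation(60.0, p_st)
```

When the fluctuation is within what the station can deliver, the residual is zero; otherwise the station saturates and part of the fluctuation remains.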
the power fluctuation reward function is as follows:
combining the action out-of-limit reward function, the energy replenishment cost reward function and the power fluctuation reward function, the total reward function is obtained as follows:
$r_t = \sigma_{exceed} r_t^{exceed} + \sigma_{cost} r_t^{cost} + \sigma_{fluc} r_t^{fluc}$
wherein $\sigma_{exceed}$, $\sigma_{cost}$ and $\sigma_{fluc}$ are the weight coefficients of the action out-of-limit reward function, the energy replenishment cost reward function and the power fluctuation reward function respectively; the larger a coefficient, the more importance is given to the optimization objective of the corresponding reward function.
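The weighted sum above is straightforward to express in code. The default weights and the sample reward values below are placeholders chosen for illustration only.

```python
def total_reward(r_exceed, r_cost, r_fluc, w_exceed=1.0, w_cost=1.0, w_fluc=1.0):
    """Weighted sum of the three reward terms; w_* play the role of the sigma weights."""
    return w_exceed * r_exceed + w_cost * r_cost + w_fluc * r_fluc


# Doubling the out-of-limit weight makes limit violations dominate the signal.
r_t = total_reward(-5.0, -30.0, -10.0, w_exceed=2.0, w_cost=1.0, w_fluc=0.5)
```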
Preferably, the step of training the mobile micro-grid energy replenishment strategy comprises:
using a neural network to estimate the action-value function, and training the mobile micro-grid agent with the DQN algorithm so as to approach the optimal action-value function and the optimal strategy step by step through iteration;
the DQN algorithm introduces a current network $Q(s, a, \omega)$ and a target network; the target network computes the action values of the next state, and the current network computes the value of each action in the current state; with the target network introduced, the loss function is as follows:
wherein $M$ is the number of sampled samples, $R_k$ is the reward of the k-th sample, $\gamma$ is the discount factor, the target-network term is the value of taking action $a$ in state $S_k'$, and $Q(S_k, A_k, \omega)$ is the value given by the current network for taking action $A_k$ in state $S_k$.
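The loss formula itself is not reproduced above. The sketch below computes the standard DQN loss, the mean squared temporal-difference error with the bootstrap target taken from the target network, which matches the symbols defined in the text. The callables `q_current` and `q_target` stand in for the two networks and are assumptions of this example; terminal states are not treated specially here.

```python
def dqn_loss(batch, q_current, q_target, actions, gamma):
    """Mean squared TD error over a batch of (s, a, r, s_next) samples.

    target_k = R_k + gamma * max_a q_target(S_k', a)
    loss     = (1 / M) * sum_k (target_k - q_current(S_k, A_k)) ** 2
    """
    total = 0.0
    for s, a, r, s_next in batch:
        target = r + gamma * max(q_target(s_next, b) for b in actions)
        total += (target - q_current(s, a)) ** 2
    return total / len(batch)


# Toy stand-ins for the two networks.
q_cur = lambda s, a: 0.5
q_tgt = lambda s, a: {0: 1.0, 1: 2.0}[a]
loss = dqn_loss([("s0", 0, 1.0, "s1")], q_cur, q_tgt, actions=[0, 1], gamma=0.9)
```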
Preferably, the basic flow of the DQN algorithm is as follows:
Step a: initialize the current network $Q(s, a, \omega)$ and the target network with randomly selected parameters;
Step b: initialize the experience replay pool;
Step c: start a new interaction sequence and execute steps d to n;
Step d: obtain the initial state $S_1$ of the current sequence;
Step e: for each time step t = 1 → T in the sequence, execute steps f to m;
Step f: obtain the action values in the current state $S_t$ from the current network, and select the action $A_t$ to execute with the ε-greedy strategy: an ε value is set, a random action is executed with probability ε, and the action with the highest value is selected with probability 1 − ε;
Step g: after executing action $A_t$, obtain the reward $R_t$ and transition to the next state $S_{t+1}$;
Step h: store the state transition sample $(S_t, A_t, R_t, S_{t+1})$ in the experience replay pool;
Step i: if the number of samples in the experience replay pool exceeds a set threshold, execute steps j to l;
Step j: randomly sample N samples from the experience replay pool;
Step k: calculate the loss function from the sampled samples;
Step l: minimize the loss function value by gradient descent, updating the parameters of the current network $Q(s, a, \omega)$ with step size $\beta$;
Step m: at fixed intervals of time steps, synchronize the parameters of the target network with those of the current network $Q(s, a, \omega)$;
Step n: return to step e until the interaction sequence terminates;
Step o: return to step c until all interaction sequences have been executed;
after the mobile micro-grid agent has been trained with the DQN algorithm, the optimal value function is obtained, and the agent selects the action with the highest value for the current state, which yields the optimal strategy.
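Steps a to o can be sketched end to end with the standard library only. To keep the example runnable without a deep learning framework, lookup tables stand in for the current and target neural networks, so this is a DQN-style loop rather than a full DQN. The toy environment (one battery with SOC levels 0 to 4, alternating prices, a departure target of 2) and every hyperparameter are invented for illustration.

```python
import random


def train_dqn_sketch(episodes=300, horizon=4, gamma=0.95, beta=0.1,
                     epsilon=0.2, batch_size=16, min_pool=32,
                     sync_every=20, seed=0):
    rng = random.Random(seed)
    actions = [-1, 0, 1]              # discharge, idle, charge (SOC levels per step)
    prices = [1.0, 3.0, 1.0, 3.0]     # hypothetical price at each time step
    q, q_target = {}, {}              # step a: tables stand in for both networks
    pool = []                         # step b: experience replay pool
    updates = 0

    def value(table, s, a):
        return table.get((s, a), 0.0)

    def env_step(state, a):
        t, soc = state
        soc_next = min(4, max(0, soc + a))
        reward = -prices[t] * (soc_next - soc)   # pay to charge, earn to discharge
        done = t + 1 == horizon
        if done and soc_next < 2:                # hypothetical departure target
            reward -= 10.0
        return reward, (t + 1, soc_next), done

    for _ in range(episodes):                    # steps c and o
        state = (0, 2)                           # step d: initial state
        for _ in range(horizon):                 # steps e and n
            if rng.random() < epsilon:           # step f: epsilon-greedy choice
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda b: value(q, state, b))
            reward, nxt, done = env_step(state, a)            # step g
            pool.append((state, a, reward, nxt, done))        # step h
            if len(pool) >= min_pool:                         # step i
                for s, b, r, s2, d in rng.sample(pool, batch_size):  # step j
                    best_next = 0.0 if d else max(
                        value(q_target, s2, c) for c in actions)
                    td_error = r + gamma * best_next - value(q, s, b)  # step k
                    q[(s, b)] = value(q, s, b) + beta * td_error       # step l
                updates += 1
                if updates % sync_every == 0:                 # step m: sync target
                    q_target = dict(q)
            state = nxt
    return q


q_table = train_dqn_sketch()
```

After training, acting greedily with respect to `q_table` corresponds to the deployment step in which the agent selects the action with the highest value for the current state.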
In a second aspect, a mobile microgrid energy replenishment strategy optimization system is provided, the system comprising:
a module for characterizing energy replenishment of the mobile micro-grid: regarding the mobile micro-grid as an agent in a Markov decision process, designing the state space, action space and reward function for intelligent energy replenishment control of the mobile micro-grid on the basis of the Markov decision process, and taking the energy replenishment power of the mobile micro-grid as the action to be decided;
a module for training the mobile micro-grid energy replenishment strategy: training the mobile micro-grid agent with the DQN algorithm to obtain an optimal value function, the agent selecting the action with the highest value for the current state so as to obtain the optimal strategy;
a module for deploying the trained mobile micro-grid energy replenishment strategy: based on the trained mobile micro-grid agent, selecting the energy replenishment power according to the optimal value function obtained through training, thereby implementing the intelligent energy replenishment strategy of the mobile micro-grid.
Preferably, the state space design in the module for characterizing energy replenishment of the mobile micro-grid comprises: calculating in advance, for each moment, the lower limit of the energy storage battery SOC required to meet the energy replenishment demand of the micro-grid, and ensuring that the SOC does not fall below this lower limit during replenishment, so that the replenishment demand of the mobile micro-grid is met when it departs;
the SOC lower limit is calculated as follows:
wherein $e_{t,i}$ is the SOC of the energy storage battery of the i-th mobile micro-grid at time t; $t_{a,i}$ and $t_{l,i}$ are the times at which energy replenishment of the i-th mobile micro-grid starts and is scheduled to end; the actual charging efficiency and the actual discharging efficiency of the i-th mobile micro-grid are also used; $P_i^{dis,max}$ and $P_i^{ch,max}$ are the maximum discharging power and the maximum charging power that the energy storage battery of the i-th mobile micro-grid can withstand; $\Delta t$ is the interval between adjacent moments; $c_i$ is the energy storage battery capacity of the i-th mobile micro-grid; $e_i^{min}$ is the minimum safe SOC of the energy storage battery of the i-th mobile micro-grid; and $e_{target,i}$ is the target SOC that the i-th mobile micro-grid is expected to reach on departure;
the state space is defined as follows:
wherein $s_{t,i}$ is the set of state variables of the i-th mobile micro-grid at time t; $t$ is the current time; $t_{stay,i}$ is the remaining energy replenishment time of the i-th mobile micro-grid; $P_{t,i}^{sp}$ is the shore-power load that the i-th mobile micro-grid places on the power system at time t when connected; $p_{ch,t}$ is the electricity price at time t; $P_t^{RE}$ is the total renewable energy output of the power system at time t; $P_t^{L}$ is the normal load power of the power system at time t; and $n_{b,t}$ is the total number of mobile micro-grids being replenished at time t;
the action space design in the module for characterizing energy replenishment of the mobile micro-grid comprises: setting the energy replenishment power of the mobile micro-grid as the action variable to be decided, and discretizing the charging and discharging power into several selectable levels, where a positive value means absorbing energy from the power system and a negative value means releasing energy to the power system; the action space is defined as follows:
wherein $A_t$ is the action to be executed by the agent at time t, $a_{t,i}$ is the action variable of the i-th mobile micro-grid at time t, $P_{t,i}^{ch}$ is the charging power of the i-th mobile micro-grid at time t, and $I$ is the index set of all mobile micro-grids connected to the power system;
the bonus function design in the energy replenishment module characterizing the mobile microgrid comprises:
for an optimization target meeting energy supplementing requirements, calculating the SOC lower limit at each moment; however, during the training process, the action result determined by the DQN algorithm may make the SOC lower than the lower limit or higher than the safety upper limit, and an action correction is required, where the correction formula is as follows:
wherein $e_i^{max}$ is the maximum SOC upper limit specified for the energy storage battery of the ith mobile micro-grid to ensure safe electric energy supplementing; $e_{t,i}$ is the SOC of the energy storage battery of the ith mobile micro-grid at time t; $c_i$ is the energy storage battery capacity of the ith mobile micro-grid; the formula also uses the actual charging efficiency of the ith mobile micro-grid; $e_{t+1,i}^{pred}$ is the expected SOC of the ith ship at time t+1 resulting from the uncorrected action $a_{t,i}$; and a coefficient related to the charge and discharge efficiency of the ship is applied, whose value depends on whether $e_{t,i} < e_{t+1,i}^{th,low}$ holds;
In order to guide the agent to select an action between the upper and lower limits, the difference between the action before correction and the action after correction is computed, and the negative of its absolute value is used as the action out-of-limit reward value; this negative reward pushes the agent toward decisions that do not exceed the limits. The action out-of-limit reward function formula is as follows:
For the optimization target of reducing the energy supplementing cost, the negative of the supplementing cost is used directly as the reward function: the expense incurred when the mobile micro-grid charges corresponds to a negative reward value, and the income earned when the mobile micro-grid discharges to sell electricity corresponds to a positive reward value. The electric energy supplementing cost reward function formula is as follows:
for the optimization target of reducing power fluctuation of the power grid, in order to measure the contribution of the mobile micro-grid as virtual energy storage, the average of the previous day's exchange power over all time periods, taken without the energy supplementing power of the mobile micro-grid, is required as a reference value; the specific calculation formula is as follows:
wherein the exchange power at time t of the previous day is taken without the charging and discharging power of the mobile micro-grid; $P_t^{L,before}$ is the load power inside the power system providing the energy supplementing service at time t of the previous day; the total micro-grid power load at time t of the previous day is taken without the mobile micro-grid energy supplementing process; $P_t^{RE,before}$ is the total power generated by renewable energy sources inside the power system providing the energy supplementing service at time t of the previous day; $P_t^{ST,before}$ is the output of the energy storage power station inside that power system at time t of the previous day; $P_{t,i}^{sp,before}$ is the shore power load power incurred by that power system at time t of the previous day in supplying the electric equipment carried by the ith mobile micro-grid; $P_{t,j}^{ch,before}$ is the load power incurred by that power system at time t of the previous day in recharging the jth energy storage battery swapped out of a mobile micro-grid in battery-swap mode; $I^{before}$ is the sequence number set of all mobile micro-grids connected to the power system on the previous day; and $J^{before}$ is the sequence number set of all energy storage batteries swapped out of mobile micro-grids in battery-swap mode on the previous day;
when measuring the power fluctuation, the full-day average of the previous day's exchange power, taken without the mobile micro-grid, is selected as the reference, and its difference from the exchange power at time t of the current day is used as the fluctuation power; the fluctuation power without the output of the energy storage power station is calculated as follows:
wherein $P_t^{MG,nonST}$ is the grid exchange power at time t of the current day without the output of the energy storage power station;
the energy storage power station determines its output after the agent has made its decision, with the aim of further reducing the power fluctuation as far as possible on top of the output of the mobile micro-grid; the output power of the energy storage power station is calculated as follows:
wherein $E_{t,k}^{ST}$ is the stored electric energy of the kth battery in the energy storage power station at time t; $\eta_{ch,k}$ and $\eta_{dis,k}$ are respectively the charging efficiency and the discharging efficiency of the kth battery in the energy storage power station; and $c_k^{ST}$ is the capacity of the kth battery in the energy storage power station;
after the output of the energy storage power station is obtained, the fluctuation power that takes the output of the energy storage power station into account is calculated as follows:
$P_t^{f} = P_t^{f,nonST} - P_t^{ST}$
the power fluctuation reward function formula is as follows:
comprehensively considering the action out-of-limit rewarding function, the electric energy supplementing cost rewarding function and the power fluctuation rewarding function, and obtaining a total rewarding function formula as follows:
$r_t = \sigma_{exceed} r_t^{exceed} + \sigma_{cost} r_t^{cost} + \sigma_{fluc} r_t^{fluc}$
wherein $\sigma_{exceed}$, $\sigma_{cost}$ and $\sigma_{fluc}$ are respectively the weight coefficients of the action out-of-limit reward function, the electric energy supplementing cost reward function and the power fluctuation reward function; the larger a coefficient is, the more importance is attached to the optimization target corresponding to that reward function;
the training mobile micro-grid energy supplementing strategy optimization module comprises:
using a neural network to estimate the action value function, and training the mobile micro-grid agent with the DQN algorithm so as to approach the optimal action value function and the optimal strategy gradually through iteration;
the DQN algorithm introduces a current network $Q(s, a, \omega)$ and a target network; the target network is responsible for calculating the action value of the next state, the current network is used for calculating the value of each action in the current state, and the loss function after the target network is introduced is as follows:
wherein M is the number of sampled samples; $R_k$ is the reward value of the kth sample; $\gamma$ is the discount factor; the target network supplies the value of taking action a in state $S_k'$; and $Q(S_k, A_k, \omega)$ is the value given by the current network for taking action $A_k$ in state $S_k$;
the basic flow of the DQN algorithm is as follows:
step a: initialize the current network $Q(s, a, \omega)$ and the target network with randomly selected parameters;
step b: initialize the experience replay pool;
step c: start a new interaction sequence and execute steps d to n;
step d: acquire the initial state $S_1$ of the current sequence;
step e: perform steps f to m for each time step t = 1 → T in the sequence;
step f: obtain from the current network the action values in the current state $S_t$ and select the action $A_t$ to execute with the ε-greedy strategy; the ε-greedy strategy sets a value ε, selects the action to execute at random with probability ε, and selects the action with the highest value with probability 1 − ε;
step g: after executing action $A_t$, obtain the reward $R_t$ and transition to the next state $S_{t+1}$;
step h: store the state transition sample $(S_t, A_t, R_t, S_{t+1})$ in the experience replay pool;
step i: if the number of samples in the experience replay pool exceeds a certain number, perform steps j to l;
step j: randomly sample N samples from the experience replay pool;
step k: calculate the loss function from the sampled samples;
step l: minimize the loss function value by gradient descent, updating the parameters of the current network $Q(s, a, \omega)$ with step size β;
step m: at regular time-step intervals, synchronize the parameters of the target network with those of the current network $Q(s, a, \omega)$;
step n: return to step e until the interaction sequence terminates;
step o: return to step c until all interaction sequences have been executed;
after the mobile micro-grid agent is trained through the DQN algorithm, an optimal value function is obtained, and the agent selects the action with the highest value according to the state, thereby obtaining the optimal strategy.
In a third aspect, a computer readable storage medium storing a computer program is provided, which when executed by a processor implements the steps of the mobile microgrid energy replenishment policy optimization method.
In a fourth aspect, an electronic device is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the mobile microgrid energy replenishment policy optimization method.
Compared with the prior art, the invention has the following beneficial effects:
1. The energy supplementing strategy optimization method provided by the invention can improve the adaptability to complex factors in the power system for providing energy supplementing service, and can realize the general optimization capability aiming at various situations;
2. the invention designs the coupling relation between the mobile micro-grid and the energy supplementary service provider and designs the cooperative optimization mechanism between the mobile micro-grid and the energy supplementary service provider, thereby giving consideration to the optimization requirements of the mobile micro-grid and the energy supplementary service provider and achieving better optimization effect;
3. the intelligent optimization system for deploying the energy supplement strategy by adopting the deep reinforcement learning method only needs to know partial state quantities of the mobile micro-grid and the energy supplement service provider, does not need to accurately model a specific structure, and is easy to deploy;
4. the scheme provided by the invention is not limited to optimizing the energy supplementing cost and the operating stability of the power system, and can flexibly accommodate new optimization targets by modifying the reward function.
Other advantages of the present invention will be set forth in the description of specific technical features and solutions, by which those skilled in the art should understand the advantages that the technical features and solutions bring.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a mobile microgrid energy replenishment strategy intelligent optimization process;
FIG. 2 is a basic flow of energy replenishment strategy optimization for a mobile micro-grid based on deep reinforcement learning;
FIG. 3 is a schematic diagram of a training process;
FIG. 4 is a comparison of energy replenishment costs;
FIG. 5 is a comparison of the power fluctuation of the power grid.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
The embodiment of the invention provides a mobile micro-grid energy supplementing strategy optimization method that uses deep reinforcement learning to deploy an intelligent energy supplementing strategy optimization system for the mobile micro-grid. The method focuses on the coupling relation between the mobile micro-grid and the power system providing the energy supplementing service, and characterizes the energy supplementing process of the mobile micro-grid as a Markov decision process. Three optimization targets are set: meeting the energy supplementing requirement, reducing the energy supplementing cost, and improving the operating stability of the power system; a corresponding state space, action space and reward function are designed. The optimal energy supplementing strategy is obtained by training a mobile micro-grid agent with the DQN algorithm, so as to realize intelligent energy supplementing scheduling of the mobile micro-grid. Referring to fig. 1, the present invention includes the following:
Step of characterizing the energy supplementing of the mobile micro-grid: the mobile micro-grid is regarded as an agent in a Markov decision process, a state space, an action space and a reward function of intelligent energy supplementing regulation of the mobile micro-grid are designed based on the Markov decision process, and the energy supplementing power of the mobile micro-grid is regarded as the action to be decided.
Step of training the mobile micro-grid energy supplementing strategy optimization: the mobile micro-grid agent is trained through the DQN algorithm to obtain an optimal value function, and the agent selects the action with the highest value according to the state to obtain the optimal strategy;
Step of deploying the trained mobile micro-grid energy supplementing strategy: based on the trained mobile micro-grid agent, the energy supplementing power is selected according to the optimal value function obtained through training, and an intelligent energy supplementing strategy of the mobile micro-grid is realized.
In particular, the state space design in the energy replenishment step characterizing the mobile microgrid comprises: the lower limit of the SOC of the energy storage battery at each moment required by meeting the energy supplementing requirement of the micro-grid is calculated in advance, and the SOC is ensured not to be lower than the lower limit in the energy supplementing process, so that the energy supplementing requirement of the mobile micro-grid is ensured to be met when the mobile micro-grid leaves;
The calculation mode of the SOC lower limit is as follows:
wherein $e_{t,i}$ is the SOC of the energy storage battery of the ith mobile micro-grid at time t; $t_{a,i}$ and $t_{l,i}$ are the start time and the preset end time of energy supplementing for the ith mobile micro-grid; the formula also uses the actual charging efficiency and the actual discharging efficiency of the ith mobile micro-grid; $P_i^{dis,max}$ and $P_i^{ch,max}$ are respectively the maximum discharge power and the maximum charge power that the energy storage battery of the ith mobile micro-grid can withstand; $\Delta t$ is the time interval between adjacent moments; $c_i$ is the energy storage battery capacity of the ith mobile micro-grid; $e_i^{min}$ is the minimum safe SOC of the energy storage battery of the ith mobile micro-grid; and $e_{target,i}$ is the target energy storage battery SOC that the ith mobile micro-grid is expected to reach on departure;
the state space is defined as follows:
wherein $s_{t,i}$ is the state variable set of the ith mobile micro-grid at time t; t is the current time; $t_{stay,i}$ is the remaining energy supplementing time of the ith mobile micro-grid; $P_{t,i}^{sp}$ is the shore power access load power generated at time t when the ith mobile micro-grid is connected to the power system; $p_{ch,t}$ is the electricity price at time t; $P_t^{RE}$ is the sum of the renewable energy output of the power system at time t; $P_t^{L}$ is the normal load power of the power system at time t; and $n_{b,t}$ is the total number of mobile micro-grids being supplemented with energy at time t.
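As an illustration of the state variables listed above, the following minimal Python sketch packs them into a flat feature vector of the kind a neural network would take as input. The class name, the field names and the kW units are assumptions made for this example only; the ordering of the components is likewise hypothetical.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class MicrogridState:
    """Hypothetical container for the state variables of one mobile micro-grid."""
    t: int              # current time step
    t_stay: int         # remaining energy supplementing time (in steps)
    p_shore: float      # shore power access load power P_sp (assumed kW)
    price: float        # electricity price p_ch at time t
    p_renewable: float  # sum of renewable energy output P_RE (assumed kW)
    p_load: float       # normal load power P_L (assumed kW)
    n_charging: int     # number of mobile micro-grids being supplemented

    def to_vector(self) -> list:
        """Flatten the state into a list of floats."""
        return [float(x) for x in astuple(self)]


state = MicrogridState(t=10, t_stay=14, p_shore=120.0, price=0.8,
                       p_renewable=350.0, p_load=900.0, n_charging=3)
vector = state.to_vector()
```

In practice the components would usually be normalized before being fed to the network; that step is omitted here.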
The motion space design in the energy replenishment step characterizing the mobile microgrid comprises: setting the energy supplementing power of the mobile micro-grid as an action variable to be decided, discretizing the charging and discharging power into a plurality of gears for selection, wherein positive values represent energy absorption from a power system, negative values represent energy release to the power system, and an action space is defined as follows:
wherein $A_t$ is the action to be performed by the agent at time t; $a_{t,i}$ is the action variable of the ith mobile micro-grid at time t; $P_{t,i}^{ch}$ is the charging power of the ith mobile micro-grid at time t; and I is the sequence number set of all mobile micro-grids connected to the power system.
The reward function design in the energy supplementing step characterizing the mobile micro-grid comprises:
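The discretization of the charging and discharging power into gears can be sketched as follows. Evenly spaced gears, the function name and the numeric values are assumptions for illustration; the text only states that the power is discretized into several gears, with positive values for charging and negative values for discharging.

```python
def build_action_levels(p_dis_max: float, p_ch_max: float, n_levels: int) -> list:
    """Return n_levels evenly spaced power gears from -p_dis_max to +p_ch_max.

    Positive values absorb energy from the power system (charging);
    negative values release energy to it (discharging).
    """
    if n_levels < 2:
        raise ValueError("at least two gears are required")
    step = (p_ch_max + p_dis_max) / (n_levels - 1)
    return [-p_dis_max + k * step for k in range(n_levels)]


# Hypothetical limits: 200 kW in both directions, five gears.
levels = build_action_levels(p_dis_max=200.0, p_ch_max=200.0, n_levels=5)
```

The agent then outputs an index into `levels` rather than a continuous power value, which is what makes a DQN-style discrete action selection applicable.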
for an optimization target meeting energy supplementing requirements, calculating the SOC lower limit at each moment; however, during the training process, the action result determined by the DQN algorithm may make the SOC lower than the lower limit or higher than the safety upper limit, and an action correction is required, where the correction formula is as follows:
wherein $e_i^{max}$ is the maximum SOC upper limit specified for the energy storage battery of the ith mobile micro-grid to ensure safe electric energy supplementing; $e_{t,i}$ is the SOC of the energy storage battery of the ith mobile micro-grid at time t; $c_i$ is the energy storage battery capacity of the ith mobile micro-grid; the formula also uses the actual charging efficiency of the ith mobile micro-grid; $e_{t+1,i}^{pred}$ is the expected SOC of the ith ship at time t+1 resulting from the uncorrected action $a_{t,i}$; and a coefficient related to the charge and discharge efficiency of the ship is applied, whose value depends on whether $e_{t,i} < e_{t+1,i}^{th,low}$ holds.
In order to guide the agent to select an action between the upper and lower limits, the difference between the action before correction and the action after correction is computed, and the negative of its absolute value is used as the action out-of-limit reward value; this negative reward pushes the agent toward decisions that do not exceed the limits. The action out-of-limit reward function formula is as follows:
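A minimal sketch of the correction and the resulting penalty is given here under stated assumptions: a linear battery model in which charging power is multiplied by the charging efficiency and discharging power is divided by the discharging efficiency, SOC expressed as a fraction of capacity, and clipping of the predicted SOC to the interval between the lower limit and the safety upper limit. The function names and numbers are hypothetical, and the correction formula of the invention may differ in detail.

```python
def predict_soc(e_t, power, dt, capacity, eta_ch, eta_dis):
    """SOC at t+1 under an assumed linear battery model (power > 0 charges)."""
    if power >= 0:
        return e_t + power * eta_ch * dt / capacity
    return e_t + power * dt / (eta_dis * capacity)


def correct_action(e_t, power, dt, capacity, eta_ch, eta_dis, e_low, e_max):
    """Clip the action so that the next SOC stays within [e_low, e_max]."""
    e_pred = predict_soc(e_t, power, dt, capacity, eta_ch, eta_dis)
    e_clipped = min(max(e_pred, e_low), e_max)
    if e_clipped == e_pred:
        return power  # action is already within limits
    # Battery-side power needed to land exactly on the violated limit.
    battery_power = (e_clipped - e_t) * capacity / dt
    if battery_power >= 0:
        return battery_power / eta_ch
    return battery_power * eta_dis


def exceed_reward(power, corrected_power):
    """Negative absolute difference between uncorrected and corrected action."""
    return -abs(power - corrected_power)


requested = 200.0  # hypothetical charging request in kW
corrected = correct_action(e_t=0.9, power=requested, dt=1.0, capacity=1000.0,
                           eta_ch=0.9, eta_dis=0.9, e_low=0.2, e_max=0.95)
penalty = exceed_reward(requested, corrected)
```

With these numbers the uncorrected action would push the SOC above the upper limit, so the action is reduced and the agent receives a penalty proportional to the size of the reduction; an action that stays within limits receives a penalty of zero.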
for the optimization target of reducing the energy supplementing cost, the negative of the supplementing cost is used directly as the reward function: the expense incurred when the mobile micro-grid charges corresponds to a negative reward value, and the income earned when the mobile micro-grid discharges to sell electricity corresponds to a positive reward value. The electric energy supplementing cost reward function formula is as follows:
for the optimization target of reducing power fluctuation of the power grid, in order to measure the contribution of the mobile micro-grid as virtual energy storage, the average of the previous day's exchange power over all time periods, taken without the energy supplementing power of the mobile micro-grid, is required as a reference value; the specific calculation formula is as follows:
wherein the exchange power at time t of the previous day is taken without the charging and discharging power of the mobile micro-grid; $P_t^{L,before}$ is the load power inside the power system providing the energy supplementing service at time t of the previous day; the total micro-grid power load at time t of the previous day is taken without the mobile micro-grid energy supplementing process; $P_t^{RE,before}$ is the total power generated by renewable energy sources inside the power system providing the energy supplementing service at time t of the previous day; $P_t^{ST,before}$ is the output of the energy storage power station inside that power system at time t of the previous day; $P_{t,i}^{sp,before}$ is the shore power load power incurred by that power system at time t of the previous day in supplying the electric equipment carried by the ith mobile micro-grid; $P_{t,j}^{ch,before}$ is the load power incurred by that power system at time t of the previous day in recharging the jth energy storage battery swapped out of a mobile micro-grid in battery-swap mode; $I^{before}$ is the sequence number set of all mobile micro-grids connected to the power system on the previous day; and $J^{before}$ is the sequence number set of all energy storage batteries swapped out of mobile micro-grids in battery-swap mode on the previous day;
when measuring the power fluctuation, the full-day average of the previous day's exchange power, taken without the mobile micro-grid, is selected as the reference, and its difference from the exchange power at time t of the current day is used as the fluctuation power; the fluctuation power without the output of the energy storage power station is calculated as follows:
wherein $P_t^{MG,nonST}$ is the grid exchange power at time t of the current day without the output of the energy storage power station;
the energy storage power station determines its output after the agent has made its decision, with the aim of further reducing the power fluctuation as far as possible on top of the output of the mobile micro-grid; the output power of the energy storage power station is calculated as follows:
wherein $E_{t,k}^{ST}$ is the stored electric energy of the kth battery in the energy storage power station at time t; $\eta_{ch,k}$ and $\eta_{dis,k}$ are respectively the charging efficiency and the discharging efficiency of the kth battery in the energy storage power station; and $c_k^{ST}$ is the capacity of the kth battery in the energy storage power station;
after the output of the energy storage power station is obtained, the fluctuation power that takes the output of the energy storage power station into account is calculated as follows:
$P_t^{f} = P_t^{f,nonST} - P_t^{ST}$
the power fluctuation reward function formula is as follows:
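The two-stage fluctuation calculation described above can be sketched as follows. The sketch assumes that the reference value is the plain arithmetic mean of the previous day's exchange power samples and that the fluctuation is the signed difference from that mean; function names and numbers are hypothetical.

```python
def fluctuation_without_storage(p_mg_non_st: float, p_mg_before: list) -> float:
    """Fluctuation power before the energy storage power station acts.

    p_mg_non_st: exchange power at time t of the current day, without storage output.
    p_mg_before: previous-day exchange power samples, without the mobile micro-grid.
    """
    baseline = sum(p_mg_before) / len(p_mg_before)  # full-day average reference
    return p_mg_non_st - baseline


def fluctuation_with_storage(p_f_non_st: float, p_st: float) -> float:
    """P_f = P_f_nonST - P_ST, where P_ST > 0 means the station releases energy."""
    return p_f_non_st - p_st


previous_day = [100.0, 200.0, 300.0]          # hypothetical samples in kW
p_f_non_st = fluctuation_without_storage(260.0, previous_day)
p_f = fluctuation_with_storage(p_f_non_st, p_st=40.0)
```

Here the station discharging 40 kW lowers the residual fluctuation, which matches the stated goal of reducing the fluctuation further on top of the mobile micro-grid's own contribution.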
comprehensively considering the action out-of-limit rewarding function, the electric energy supplementing cost rewarding function and the power fluctuation rewarding function, and obtaining a total rewarding function formula as follows:
$r_t = \sigma_{exceed} r_t^{exceed} + \sigma_{cost} r_t^{cost} + \sigma_{fluc} r_t^{fluc}$
wherein $\sigma_{exceed}$, $\sigma_{cost}$ and $\sigma_{fluc}$ are respectively the weight coefficients of the action out-of-limit reward function, the electric energy supplementing cost reward function and the power fluctuation reward function; the larger a coefficient is, the more importance is attached to the optimization target corresponding to that reward function.
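A small sketch of the weighted combination follows. The cost term is written as the negative of price times power times interval length, which follows the description that charging expense gives a negative reward and electricity sales give a positive one; the exact form of that term, the function names and all numbers are assumptions for illustration.

```python
def cost_reward(price: float, power: float, dt: float) -> float:
    """Negative of the energy supplementing cost (power > 0 means charging)."""
    return -price * power * dt


def total_reward(r_exceed: float, r_cost: float, r_fluc: float,
                 sigma_exceed: float = 1.0, sigma_cost: float = 1.0,
                 sigma_fluc: float = 1.0) -> float:
    """Weighted sum of the three reward components."""
    return sigma_exceed * r_exceed + sigma_cost * r_cost + sigma_fluc * r_fluc


r_cost = cost_reward(price=0.5, power=100.0, dt=1.0)   # charging costs money
r_sell = cost_reward(price=0.5, power=-100.0, dt=1.0)  # discharging earns money
r_t = total_reward(r_exceed=-10.0, r_cost=r_cost, r_fluc=-20.0,
                   sigma_exceed=2.0, sigma_cost=1.0, sigma_fluc=0.5)
```

Raising one weight relative to the others shifts the learned strategy toward the corresponding target, for example a large `sigma_exceed` makes limit violations dominate the reward.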
The training mobile micro-grid energy supplementing strategy optimizing step comprises the following steps:
using a neural network to estimate the action value function, and training the mobile micro-grid agent with the DQN algorithm so as to approach the optimal action value function and the optimal strategy gradually through iteration;
The DQN algorithm introduces a current network $Q(s, a, \omega)$ and a target network; the target network is responsible for calculating the action value of the next state, the current network is used for calculating the value of each action in the current state, and the loss function after the target network is introduced is as follows:
wherein M is the number of sampled samples; $R_k$ is the reward value of the kth sample; $\gamma$ is the discount factor; the target network supplies the value of taking action a in state $S_k'$; and $Q(S_k, A_k, \omega)$ is the value given by the current network for taking action $A_k$ in state $S_k$.
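For orientation, the mean-squared temporal-difference loss commonly used with DQN can be written with the symbols defined above as follows. The target-network notation $\hat{Q}(\cdot, \cdot, \omega^{-})$ and the $1/M$ normalization are assumptions of this sketch and may differ from the formula of the invention.

```latex
L(\omega) = \frac{1}{M} \sum_{k=1}^{M}
  \left[ R_k + \gamma \max_{a} \hat{Q}\left(S_k', a, \omega^{-}\right)
         - Q\left(S_k, A_k, \omega\right) \right]^{2}
```

Only $\omega$ is updated by gradient descent on this loss; $\omega^{-}$ is held fixed between synchronizations, which keeps the regression target stable.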
The basic flow of the DQN algorithm is as follows:
step a: initialize the current network $Q(s, a, \omega)$ and the target network with randomly selected parameters;
step b: initialize the experience replay pool;
step c: start a new interaction sequence and execute steps d to n;
step d: acquire the initial state $S_1$ of the current sequence;
step e: perform steps f to m for each time step t = 1 → T in the sequence;
step f: obtain from the current network the action values in the current state $S_t$ and select the action $A_t$ to execute with the ε-greedy strategy; the ε-greedy strategy sets a value ε, selects the action to execute at random with probability ε, and selects the action with the highest value with probability 1 − ε;
step g: after executing action $A_t$, obtain the reward $R_t$ and transition to the next state $S_{t+1}$;
step h: store the state transition sample $(S_t, A_t, R_t, S_{t+1})$ in the experience replay pool;
step i: if the number of samples in the experience replay pool exceeds a certain number, perform steps j to l;
step j: randomly sample N samples from the experience replay pool;
step k: calculate the loss function from the sampled samples;
step l: minimize the loss function value by gradient descent, updating the parameters of the current network $Q(s, a, \omega)$ with step size β;
step m: at regular time-step intervals, synchronize the parameters of the target network with those of the current network $Q(s, a, \omega)$;
step n: return to step e until the interaction sequence terminates;
step o: return to step c until all interaction sequences have been executed;
after the mobile micro-grid agent is trained through the DQN algorithm, an optimal value function is obtained, and the agent selects the action with the highest value according to the state, thereby obtaining the optimal strategy.
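The flow of steps a to o can be sketched in plain Python on a toy problem. Everything concrete here is an assumption made for illustration: the toy environment (five SOC levels, six time steps, a made-up price curve, a penalty for missing the target SOC at departure), the hyperparameters, and the use of a linear Q-function over one-hot state features (one weight per state-action pair) in place of a neural network. A `done` flag is also stored with each transition, which goes beyond the four-element sample named in step h.

```python
import random

ACTIONS = [-1, 0, 1]          # discharge, idle, charge (one SOC level per step)
N_SOC, T = 5, 6               # SOC levels 0..4, time steps per sequence
PRICE = [3, 1, 1, 2, 3, 3]    # hypothetical electricity price per step


def step(state, action_idx):
    """Toy environment: returns (reward, next_state, done)."""
    t, soc = state
    new_soc = min(max(soc + ACTIONS[action_idx], 0), N_SOC - 1)
    reward = -PRICE[t] * (new_soc - soc)       # pay to charge, earn to discharge
    done = t + 1 == T
    if done and new_soc < N_SOC - 1:
        reward -= 10 * (N_SOC - 1 - new_soc)   # target SOC missed at departure
    return reward, (t + 1, new_soc), done


def new_q(rng):
    """Randomly initialized weights, one per (state, action) pair."""
    return {(t, s): [rng.uniform(-0.01, 0.01) for _ in ACTIONS]
            for t in range(T + 1) for s in range(N_SOC)}


def greedy(q, state):
    return max(range(len(ACTIONS)), key=lambda i: q[state][i])


def train(episodes=2000, eps=0.1, gamma=0.95, beta=0.1, batch=16,
          min_pool=64, sync_every=50, seed=0):
    rng = random.Random(seed)
    q, q_target = new_q(rng), new_q(rng)                   # step a
    pool, steps = [], 0                                    # step b
    for _ in range(episodes):                              # steps c and o
        state, done = (0, 0), False                        # step d
        while not done:                                    # steps e and n
            if rng.random() < eps:                         # step f: epsilon-greedy
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(q, state)
            r, nxt, done = step(state, a)                  # step g
            pool.append((state, a, r, nxt, done))          # step h
            if len(pool) > min_pool:                       # step i
                for s, a_k, r_k, s2, d in rng.sample(pool, batch):   # step j
                    # step k: TD target computed with the target network
                    target = r_k if d else r_k + gamma * max(q_target[s2])
                    # step l: gradient step on the squared TD error
                    q[s][a_k] -= beta * (q[s][a_k] - target)
            steps += 1
            if steps % sync_every == 0:                    # step m
                q_target = {k: v[:] for k, v in q.items()}
            state = nxt
    return q


def greedy_rollout(q):
    """Total reward of one sequence when always taking the highest-valued action."""
    state, done, total = (0, 0), False, 0.0
    while not done:
        r, state, done = step(state, greedy(q, state))
        total += r
    return total


if __name__ == "__main__":
    q_values = train()
    print("greedy return:", greedy_rollout(q_values))
```

In this toy setting the best achievable return is -7 (charge during the four cheapest usable steps), so the greedy return of a trained agent can be compared against that bound; no particular result is claimed for the settings shown.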
Next, the present invention will be described in more detail.
In order to realize a general intelligent mobile micro-grid energy supplementing strategy, reduce the energy supplementing cost of the mobile micro-grid and reduce the negative influence of the energy supplementing process on the stability of a power system providing the supplementing service, the invention provides a mobile micro-grid energy supplementing strategy optimizing method based on a reinforcement learning idea, and utilizes the coupling relation between the mobile micro-grid and the power system providing the energy supplementing service to exert the potential of cooperative optimization of the mobile micro-grid and the power system providing the energy supplementing service, thereby achieving the optimizing effect of taking both requirements into consideration. The method comprises the following steps:
The energy replenishment process of the mobile microgrid is characterized based on a Markov decision process:
the key of intelligent dispatching of the energy supplement of the mobile micro-grid is to provide an optimal decision pertinently according to the current state and specific requirements of the mobile micro-grid and a connected power system, coordinate the requirement balance between the mobile micro-grid and the connected power system, and guarantee the operation stability of the power system while reducing the energy supplement cost to the maximum extent.
The energy supplementing process of the mobile micro-grid can be abstracted as a sequential decision problem over a finite time horizon. The current state of the mobile micro-grid and the connected power system depends only on the previous state and the energy supplementing power adopted, so the Markov property is satisfied and the process can be characterized as a Markov decision process. In the invention, the mobile micro-grid is regarded as an agent in a Markov decision process and the energy supplementing power is taken as the action to be decided; the micro-grid agent interacts, through its energy supplementing power action, with an environment formed by the connected power system, and obtains a corresponding reward value according to the result of the interaction.
In the Markov decision process, the agent makes action in the environment according to the current state, obtains corresponding rewards based on the executed action and the transferred next state, and continuously improves the strategy according to the feedback rewards of the environment, and the specific interaction process is shown in fig. 1.
The Markov decision process is uniquely determined by the five-tuple $(S, A, T, r, \gamma)$. S is the state space, containing all possible states in the environment; A is the action space, containing all actions the agent can execute; T is the state transition function, and $T(s' \mid s, a)$ represents the probability that the agent transitions to state s' after performing action a in state s; r is the reward function, which must be set manually and corresponds to the optimization targets; γ is the discount factor, with values in [0, 1): the larger the value, the more weight is given to long-term cumulative rewards, and the smaller the value, the more weight is given to the immediate reward.
As an optimization object for reinforcement learning, an agent's strategy can be modeled as a probability of taking a certain action in a certain state, expressed mathematically as:
wherein $S_t$ and $A_t$ are respectively the state of the agent at time t and the action it executes.
The action value function is defined as the mathematical expectation of the return obtained when the agent performs a given action in a given state, thereafter acts according to a given strategy, and finally reaches a termination state; it is expressed mathematically as:
wherein t is the current time; $G_t$ is the return at time t; $S_t$ is the state of the agent at time t; $A_t$ is the action executed by the agent at time t; and $R_{t+k}$ is the reward at time t+k.
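The return $G_t$ that appears in this definition can be computed from a sequence of rewards as in the short sketch below. It assumes the common convention that the reward at offset k is weighted by $\gamma^k$ with k starting at zero; the function name and the numbers are illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], accumulated from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


g_t = discounted_return([1.0, 2.0, 3.0], gamma=0.5)  # 1 + 0.5*2 + 0.25*3
```

With a discount factor close to 1 the later rewards contribute almost fully, whereas a small factor makes the first reward dominate.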
The action value function under the optimal strategy is called the optimal action value function and is denoted $Q^{*}(s, a)$, expressed mathematically as:
The key to solving for the optimal energy supplementing strategy of the mobile micro-grid with the deep reinforcement learning method is to define clearly the optimization targets of the intelligent energy supplementing optimization system and the known state quantities required to achieve them. It is therefore necessary to design the state space, the action space and the reward function of intelligent energy supplementing regulation of the mobile micro-grid based on a Markov decision process.
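For reference, the standard textbook forms of the strategy, the action value function and the optimal action value function are collected below in the notation of this section. They are given as an assumption about the intended definitions; the summation limit and the indexing of the rewards in $G_t$ in particular may differ from the formulas of the invention.

```latex
\begin{aligned}
\pi(a \mid s) &= P\left(A_t = a \mid S_t = s\right) \\
Q^{\pi}(s, a) &= \mathbb{E}_{\pi}\left[\, G_t \mid S_t = s,\ A_t = a \,\right],
  \qquad G_t = \sum_{k \ge 0} \gamma^{k} R_{t+k} \\
Q^{*}(s, a) &= \max_{\pi} Q^{\pi}(s, a)
\end{aligned}
```

Once $Q^{*}$ is known, an optimal strategy simply selects the action with the largest $Q^{*}(s, a)$ in each state.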
(1) Analysis of a power system:
The invention abstracts the power system providing the energy supplementing service into the four aspects of source, grid, load and storage, thereby constructing a general power system model, so that the proposed optimization method is applicable to various power systems.
In the aspect of power supply of an electric power system, the invention examines the self renewable energy power generation of the electric power system, for example, in a port electric power system for providing energy supplementary service for electrified ships, the invention often comprises two renewable energy power generation forms of solar energy and wind energy. In order to ensure the universality of various power systems, the optimization method provided by the invention only relates to the sum of the renewable energy source output of a target power system at a certain moment, and does not need to consider the specific component links and principles of renewable energy source power generation of each power system.
In the aspect of public power grids connected with the power system, the invention focuses on the exchange power between the public power grids and the power system. When the fluctuation of the exchange power is large, the power regulation pressure of the public power grid is increased, and the operation stability is further compromised. The calculation formula of the exchange power of the micro-grid is as follows:
$P_t^{MG} = P_t^{L} + P_t^{EB} - P_t^{RE} - P_t^{ST}$
wherein $P_t^{MG}$ is the exchange power between the power system and the public power grid at time t; $P_t^{L}$ is the normal load power of the power system at time t; $P_t^{EB}$ is the total load power of the mobile micro-grids connected to the power system at time t (a positive value is power absorbed from the power system, and a negative value is power released to the power system); $P_t^{RE}$ is the sum of the renewable energy output of the power system, such as wind energy and solar energy; and $P_t^{ST}$ is the output of the energy storage power station of the power system (a positive value means the station releases electric energy, and a negative value means it absorbs electric energy).
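The exchange power balance can be checked numerically with a short function that mirrors the formula term by term. The function name, argument names and the values in kW are illustrative.

```python
def exchange_power(p_load: float, p_mobile: float,
                   p_renewable: float, p_storage: float) -> float:
    """P_MG = P_L + P_EB - P_RE - P_ST.

    p_mobile > 0: mobile micro-grids absorb power from the power system.
    p_storage > 0: the energy storage power station releases energy.
    """
    return p_load + p_mobile - p_renewable - p_storage


# Mobile micro-grids charging at 150 kW while storage releases 100 kW.
p_mg_charging = exchange_power(900.0, 150.0, 350.0, 100.0)
# Mobile micro-grids discharging 150 kW back to the power system.
p_mg_discharging = exchange_power(900.0, -150.0, 350.0, 100.0)
```

The two cases show how a discharging mobile micro-grid lowers the power drawn from the public grid, which is the mechanism by which it can act as virtual energy storage.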
On the load side, in addition to providing the energy supplement service, the power system carries its own electrified loads. For example, a port power system must supply the electrified loads arising from logistics and transport, including cargo handling equipment such as rubber-tired gantry cranes and rail-mounted gantry cranes, and transport equipment such as stackers and automated guided vehicles that can be electrified, all of which draw power from the port. Such loads are the conventional loads of the power system and can be treated as a variable $P_t^{L}$ that is independent of the mobile micro-grid.
The mobile micro-grid adds three new types of electrified load to the power system: charging load, battery-swapping load, and shore power access load. The charging load is the electric energy consumed when the mobile micro-grid is connected to a charging pile of the power system; its power depends on the charging power of the mobile micro-grid and is related to factors such as the battery capacity, maximum charging power, target SOC, and berthing time of the mobile micro-grid. The battery-swapping load is the electric energy consumed when the power system recharges the batteries it has swapped out after supplying the mobile micro-grid in battery-swapping mode; its magnitude is related to the number of mobile micro-grids that use battery swapping within a given period. The shore power access load is the electric energy consumed by the onboard equipment of the mobile micro-grid that the power system supplies during energy supplement; for example, while an electrified ship is charging, its onboard electrical system is usually supplied by the port power system. The mobile micro-grid load is calculated as follows:
where $I$ is the set of indices of all mobile micro-grids connected to the power system; $m_i$ is the 0-1 decision variable for the energy supplement mode of the $i$-th mobile micro-grid (1 for charging mode, 0 for battery-swapping mode); $P_{t,i}^{ch}$ is the charging power of the $i$-th mobile micro-grid at time $t$; $P_{t,i}^{sp}$ is the shore power access load power of the $i$-th mobile micro-grid at time $t$; and $J$ is the set of indices of all batteries swapped out of mobile micro-grids in battery-swapping mode.
On the storage side, the invention abstracts every energy storage mode uniformly as an energy storage power station, and the specific differences between modes are represented as differences in the station's battery parameters, such as battery capacity and charging and discharging power. The output of the energy storage power station changes its stored energy, which is calculated as follows:
where $K$ is the set of indices of the batteries in the energy storage power station; $E_{t,k}^{ST}$ is the stored energy of the $k$-th battery in the station at time $t$; $\theta$ is the 0-1 decision variable of the station (1 when the station absorbs electric energy, 0 when it releases electric energy); $\eta_{ch,k}$ and $\eta_{dis,k}$ are the charging efficiency and the discharging efficiency of the $k$-th battery; $P_{t,k}^{ST,ch}$ and $P_{t,k}^{ST,dis}$ are the charging power and the discharging power of the $k$-th battery at time $t$; $\Delta t$ is the time interval (1 hour in the invention); and $c_k^{ST}$ is the capacity of the $k$-th battery in the station.
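As a rough illustration of how these symbols interact, the sketch below updates the stored energy of one battery over one interval. It assumes a common battery model (charging energy is scaled by the charging efficiency, discharged energy is divided by the discharging efficiency, and the result is kept within 0 and the capacity); this model, the function name, and the numbers are assumptions and may differ from the formula of the invention.

```python
def next_stored_energy(e_now, theta, p_ch, p_dis, eta_ch, eta_dis, capacity, dt=1.0):
    """Stored energy of one battery after one interval (assumed model).

    theta = 1: the station absorbs energy at power p_ch.
    theta = 0: the station releases energy at power p_dis.
    """
    if theta == 1:
        e_next = e_now + eta_ch * p_ch * dt      # only part of the input is stored
    else:
        e_next = e_now - p_dis * dt / eta_dis    # more is drawn than is delivered
    return min(max(e_next, 0.0), capacity)       # keep within physical limits


# Hypothetical battery: 200 kWh capacity, 100 kWh stored, 90% efficiencies.
print(next_stored_energy(100.0, 1, 50.0, 0.0, 0.9, 0.9, 200.0))
print(next_stored_energy(100.0, 0, 0.0, 45.0, 0.9, 0.9, 200.0))
```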
Mobile micro-grids can also serve as "virtual energy storage" for the power system. Specifically, a mobile micro-grid with spare time can actively sell electricity to the power system, earning revenue and reducing its energy supplement cost. Compared with renewable generation, supplying the power system from mobile micro-grids is more controllable, can compensate for the uncertainty of renewable sources to some extent, and shares the power regulation burden. In addition, electricity prices are higher in peak periods and lower in valley periods; by discharging in peak periods and charging in valley periods, the mobile micro-grid owner can profit from the price difference while also shaving peaks and filling valleys for the public grid, which benefits both the mobile micro-grid and the power system.
(2) State space design:
To ensure that the energy supplement demand of the mobile micro-grid is met, the invention calculates in advance the lower limit of the energy storage battery SOC required at each moment and keeps the SOC at or above this limit throughout the energy supplement process, so that the demand is met when the mobile micro-grid leaves. The SOC lower limit is obtained by assuming that the micro-grid first discharges at maximum discharge power until the battery SOC reaches the lower safety limit, waits for a certain time, and then charges at maximum charge power so that it reaches the required SOC exactly at departure. The specific calculation formula is as follows:
where $e_{t,i}$ is the SOC of the energy storage battery of the $i$-th mobile micro-grid at time $t$; $t_{a,i}$ and $t_{l,i}$ are the start time and the scheduled end time of energy supplement for the $i$-th mobile micro-grid; the formula also uses the actual charging efficiency and the actual discharging efficiency of the $i$-th mobile micro-grid; $P_i^{dis,max}$ and $P_i^{ch,max}$ are the maximum discharge power and the maximum charge power that the energy storage battery of the $i$-th mobile micro-grid can withstand; $\Delta t$ is the interval between adjacent moments (1 hour in the invention); $c_i$ is the energy storage battery capacity of the $i$-th mobile micro-grid; $e_i^{min}$ is the lower safety limit of the SOC of the energy storage battery of the $i$-th mobile micro-grid; and $e_{target,i}$ is the target energy storage battery SOC that the $i$-th mobile micro-grid is expected to reach at departure.
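The discharge, wait, and charge trajectory described above can be sketched as the maximum of three paths. This is a minimal sketch under stated assumptions, not the formula of the invention: SOC is a fraction of capacity, times are in hours, charging adds `eta_ch * P / c` per hour, discharging removes `P / (eta_dis * c)` per hour, and all names and numbers are hypothetical.

```python
def soc_lower_bound(t, t_arrive, t_leave, e_arrive, e_min, e_target,
                    p_dis_max, p_ch_max, eta_ch, eta_dis, capacity):
    """SOC lower limit at time t for one mobile micro-grid (assumed model)."""
    # Path 1: discharge at maximum power starting from the arrival SOC.
    discharge_path = e_arrive - (t - t_arrive) * p_dis_max / (eta_dis * capacity)
    # Path 2: latest possible max-power charging that still hits the target at departure.
    charge_path = e_target - (t_leave - t) * eta_ch * p_ch_max / capacity
    # Path 3: the safety floor applies in between (the waiting phase).
    return max(e_min, discharge_path, charge_path)


# Hypothetical ship: 1000 kWh battery, 200 kW limits, ideal efficiencies,
# arrives at hour 0 with SOC 0.6 and leaves at hour 8 needing SOC 0.9.
for hour in (0, 4, 7, 8):
    print(hour, soc_lower_bound(hour, 0, 8, 0.6, 0.2, 0.9, 200.0, 200.0, 1.0, 1.0, 1000.0))
```

In this example the limit falls from the arrival SOC to the safety floor, stays there, and then rises to the target over the last hours before departure.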
The selection of the state variables needs to be able to characterize the various states involved in the energy replenishment process of the mobile microgrid. In order to reflect the energy supplementing demand and the energy supplementing cost of the mobile micro-grid, state variables such as the SOC of the energy storage battery of the micro-grid, the remaining energy supplementing time of the micro-grid, the electricity price and the like at the current moment need to be selected, and macro battery parameters such as the capacity, the charging and discharging efficiency and the like of the energy storage battery of each micro-grid need to be included into the state variables due to large differences among the energy storage battery parameters used by various mobile micro-grids. Besides meeting the energy supplementing requirement of the micro-grid, the optimization target of reducing the power fluctuation of the power system for providing energy supplementing service is also needed to be achieved, the power system is often provided with conventional electrified loads such as logistics loads and the like, partial loads are offset through the renewable energy power generation and the output of the energy storage power station, and the residual loads are born by the public grid as exchange power. The output of the energy storage power station can be determined after the energy supplementing power is determined later, so that renewable energy generation power and conventional load power of the power system are selected as state variables to represent fluctuation of the exchange power. Furthermore, since there may be multiple energy-supplemented mobile micro-grids at the same time, the state variables also need to contain the total number of micro-grids that are being supplemented.
In summary, the state space is defined as follows:
where $s_{t,i}$ is the set of state variables of the $i$-th mobile micro-grid at time $t$; $t$ is the current time; $t_{stay,i}$ is the remaining energy supplement time of the $i$-th mobile micro-grid; $P_{t,i}^{sp}$ is the shore power access load power generated at time $t$ by the $i$-th mobile micro-grid connected to the power system; $p_{ch,t}$ is the electricity price at time $t$; $P_t^{RE}$ is the total renewable output of the power system at time $t$; $P_t^{L}$ is the conventional load power of the power system at time $t$; and $n_{b,t}$ is the total number of mobile micro-grids being supplied with energy at time $t$.
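One possible way to hold these state variables in code is sketched below. The field names, their order, and the inclusion of SOC and battery parameters (which the preceding discussion says belong to the state) are assumptions for illustration; the flattened vector is what a Q-network would typically take as input.

```python
from dataclasses import dataclass, astuple


@dataclass
class MicrogridState:
    """State of one mobile micro-grid at one time step (hypothetical layout)."""
    soc: float             # battery SOC, e_{t,i}
    capacity_kwh: float    # battery capacity, c_i
    eta_ch: float          # charging efficiency
    eta_dis: float         # discharging efficiency
    t: int                 # current time step
    t_stay: float          # remaining energy supplement time, t_{stay,i}
    p_shore_kw: float      # shore power access load, P_{t,i}^{sp}
    price: float           # electricity price, p_{ch,t}
    p_renewable_kw: float  # total renewable output, P_t^{RE}
    p_load_kw: float       # conventional load, P_t^{L}
    n_microgrids: int      # micro-grids being supplied, n_{b,t}

    def to_vector(self):
        """Flatten the state into a list of floats for a neural network."""
        return [float(v) for v in astuple(self)]


state = MicrogridState(0.6, 1000.0, 0.95, 0.95, 3, 5.0, 20.0, 0.8, 200.0, 500.0, 2)
print(state.to_vector())
```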
(3) Action space design:
The invention sets the energy supplement power of the mobile micro-grid as the action variable to be decided. Because the DQN algorithm can only handle discrete actions, the charging and discharging power is discretized into several selectable levels, where positive values represent absorbing energy from the power system and negative values represent releasing energy to the power system. The action space is defined as follows:
where $A_t$ is the action to be executed by the agent at time $t$; $a_{t,i}$ is the action variable of the $i$-th mobile micro-grid at time $t$; $P_{t,i}^{ch}$ is the charging power of the $i$-th mobile micro-grid at time $t$; and $I$ is the set of indices of all mobile micro-grids connected to the power system.
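A simple way to discretize the charging and discharging power into levels is sketched below. Even spacing, the function name, and the example limits are assumptions; the source states only that the power is discretized into several levels with the sign convention given above.

```python
def power_levels(p_dis_max, p_ch_max, n_levels):
    """Evenly spaced power levels from -p_dis_max (sell) to +p_ch_max (charge)."""
    if n_levels < 2:
        raise ValueError("need at least two levels")
    step = (p_ch_max + p_dis_max) / (n_levels - 1)
    return [-p_dis_max + k * step for k in range(n_levels)]


levels = power_levels(p_dis_max=100.0, p_ch_max=100.0, n_levels=5)
print(levels)       # the index chosen by the DQN selects one of these powers
print(levels[3])    # e.g. action index 3 means charging at 50 kW
```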
(4) Reward function design:
The strategy optimization method has three optimization objectives: meeting the energy supplement demand of the mobile micro-grid, reducing the energy supplement cost, and reducing the power fluctuation of the power system that provides the energy supplement service. A reward function is therefore designed for each of the three objectives.
For the optimization target meeting the energy supplementing demand, the SOC lower limit at each moment can be calculated by using the energy supplementing demand guarantee method provided by the invention. However, during the training process, the action result determined by the DQN algorithm may make the SOC lower than the lower limit or higher than the upper safety limit, and an action correction is required, where the correction formula is as follows:
where $e_i^{max}$ is the upper SOC limit specified for the energy storage battery of the $i$-th mobile micro-grid to ensure safe energy supplement; $e_{t,i}$ is the SOC of the energy storage battery of the $i$-th mobile micro-grid at time $t$; $c_i$ is the energy storage battery capacity of the $i$-th mobile micro-grid; the formula also uses the actual charging efficiency of the $i$-th mobile micro-grid; $e_{t+1,i}^{pred}$ is the SOC that the $i$-th ship is expected to reach at time $t+1$ under the uncorrected action $a_{t,i}$; and the formula includes a coefficient related to the charging and discharging efficiency of the ship, whose value depends on whether $e_{t,i} < e_{t+1,i}^{th,low}$ holds.
To guide the agent toward actions between the upper and lower limits, the difference between the action before correction and the action after correction is computed, and the negative of its absolute value is used as the action out-of-limit reward. This negative reward pushes the agent to choose actions that do not violate the limits. The action out-of-limit reward function is as follows:
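A minimal sketch of the correction and penalty just described, under stated assumptions rather than as the exact formulas of the invention: SOC is a fraction of capacity, charging adds `eta_ch * P * dt / c`, discharging removes `P * dt / (eta_dis * c)`, and an out-of-limit action is replaced by the power that lands exactly on the violated limit. All names are hypothetical.

```python
def correct_action(power, soc, soc_low_next, soc_max, capacity, eta_ch, eta_dis, dt=1.0):
    """Return a power that keeps the next SOC within [soc_low_next, soc_max]."""
    if power >= 0:
        pred = soc + eta_ch * power * dt / capacity       # charging
    else:
        pred = soc + power * dt / (eta_dis * capacity)    # discharging
    if pred > soc_max:
        target = soc_max
    elif pred < soc_low_next:
        target = soc_low_next
    else:
        return power                                      # already within limits
    delta_e = (target - soc) * capacity                   # required energy change
    if delta_e >= 0:
        return delta_e / (eta_ch * dt)                    # must charge to reach target
    return delta_e * eta_dis / dt                         # may discharge down to target


def exceed_reward(power, corrected_power):
    """Negative absolute difference between the raw and the corrected action."""
    return -abs(power - corrected_power)


raw = 500.0   # hypothetical request: charge at 500 kW
fixed = correct_action(raw, soc=0.5, soc_low_next=0.2, soc_max=0.9,
                       capacity=1000.0, eta_ch=1.0, eta_dis=1.0)
print(fixed, exceed_reward(raw, fixed))
```

When the current SOC is already below the next lower limit, the corrected action is a charging power, which is consistent with the efficiency coefficient switching on the condition $e_{t,i} < e_{t+1,i}^{th,low}$.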
For the objective of reducing the energy supplement cost, the negative of the supplement cost is used directly as the reward: the cost paid when the mobile micro-grid charges corresponds to a negative reward, and the revenue earned when the mobile micro-grid sells electricity corresponds to a positive reward. The energy supplement cost reward function is as follows:
For the objective of reducing grid power fluctuation, the mobile micro-grid is regarded as virtual energy storage of the power system, and peak shaving and valley filling for the power system that provides the energy supplement service are achieved by switching between two behaviors, charging and selling electricity. To measure the contribution of the mobile micro-grid as virtual energy storage, the average of the exchange power over all time periods is needed as a reference value. The invention bases this reference on past exchange power that excludes the energy supplement power of the mobile micro-grid. The specific calculation formula is as follows:
where the exchange power at time $t$ of the previous day is taken without considering the charging and discharging power of the mobile micro-grids; $P_t^{L,before}$ is the internal load power of the power system providing the energy supplement service at time $t$ of the previous day; the total mobile micro-grid load at time $t$ of the previous day excludes the energy supplement process of the mobile micro-grids; $P_t^{RE,before}$ is the total renewable generation power inside the power system at time $t$ of the previous day; $P_t^{ST,before}$ is the output of the energy storage power station inside the power system at time $t$ of the previous day; $P_{t,i}^{sp,before}$ is the shore power load power incurred at time $t$ of the previous day by the power system in supplying the onboard electrical system of the $i$-th mobile micro-grid; $P_{t,j}^{ch,before}$ is the load power incurred at time $t$ of the previous day by the power system in recharging the $j$-th energy storage battery swapped out of a mobile micro-grid in battery-swapping mode; $I^{before}$ is the set of indices of all mobile micro-grids connected to the power system on the previous day; and $J^{before}$ is the set of indices of all energy storage batteries swapped out of mobile micro-grids in battery-swapping mode on the previous day.
To measure accurately how much the mobile micro-grid reduces power fluctuation while connected to the power system, the invention takes the whole-day average of the previous day's exchange power, which excludes mobile micro-grid energy supplement power, as the reference and uses the difference between the exchange power at time $t$ of the current day and this average as the fluctuation power. The fluctuation power without considering the output of the energy storage power station is calculated as follows:
where $P_t^{MG,nonST}$ is the grid exchange power at time $t$ without considering the output of the energy storage power station.
The method assigns an additional task to the energy storage power station of the power system that provides the energy supplement service: after the agent makes its decision, the station determines its own output so as to reduce the power fluctuation as far as possible on top of the contribution of the mobile micro-grids. The output power of the energy storage power station is calculated as follows:
where $E_{t,k}^{ST}$ is the stored energy of the $k$-th battery in the energy storage power station at time $t$; $\eta_{ch,k}$ and $\eta_{dis,k}$ are the charging efficiency and the discharging efficiency of the $k$-th battery in the station; and $c_k^{ST}$ is the capacity of the $k$-th battery in the station.
After the output of the energy storage power station is obtained, the fluctuation power that accounts for this output is calculated as follows:
$$P_t^{f} = P_t^{f,nonST} - P_t^{ST}$$
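The two-stage calculation (fluctuation without storage, then the storage station absorbing as much of it as it can) might look like the sketch below. The clamping rule and the two power limits are assumptions: in practice the limits would follow from the stored energy, efficiencies, and capacity of the station, and here they are passed in as hypothetical precomputed values.

```python
def storage_output(p_f_nonst, p_dis_limit, p_ch_limit):
    """Storage output P_t^ST that cancels as much fluctuation as the limits allow.

    Positive output releases energy (offsets excess demand); negative absorbs.
    """
    return max(-p_ch_limit, min(p_dis_limit, p_f_nonst))


def fluctuation_power(p_mg_nonst, p_ref_avg, p_dis_limit, p_ch_limit):
    """Residual fluctuation P_t^f = P_t^{f,nonST} - P_t^{ST}."""
    p_f_nonst = p_mg_nonst - p_ref_avg          # deviation from the reference average
    p_st = storage_output(p_f_nonst, p_dis_limit, p_ch_limit)
    return p_f_nonst - p_st


# Hypothetical values in kW: reference average 500, station limited to 80 kW.
print(fluctuation_power(600.0, 500.0, 80.0, 80.0))   # storage covers 80 of 100
print(fluctuation_power(450.0, 500.0, 80.0, 80.0))   # storage absorbs all 50
```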
The power fluctuation reward function is as follows:
Combining the action out-of-limit reward function, the energy supplement cost reward function, and the power fluctuation reward function gives the total reward function:
$$r_t = \sigma_{exceed}\, r_t^{exceed} + \sigma_{cost}\, r_t^{cost} + \sigma_{fluc}\, r_t^{fluc}$$
where $\sigma_{exceed}$, $\sigma_{cost}$, and $\sigma_{fluc}$ are the weight coefficients of the action out-of-limit reward function, the energy supplement cost reward function, and the power fluctuation reward function, respectively; the larger a coefficient, the more weight is placed on the optimization objective of the corresponding reward function.
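A short sketch of how the cost reward and the weighted total could be computed. The cost reward here is an assumed form (negative of price times power times interval, so charging is penalized and selling is rewarded, as the text describes); the weights and values are hypothetical.

```python
def cost_reward(price, power_kw, dt_h=1.0):
    """Negative energy supplement cost: charging (power > 0) costs, selling earns."""
    return -price * power_kw * dt_h


def total_reward(r_exceed, r_cost, r_fluc, w_exceed=1.0, w_cost=1.0, w_fluc=1.0):
    """Weighted sum of the three reward terms."""
    return w_exceed * r_exceed + w_cost * r_cost + w_fluc * r_fluc


r_cost = cost_reward(price=0.5, power_kw=100.0)      # charging 100 kW for one hour
print(r_cost)
print(total_reward(-10.0, r_cost, -20.0, w_exceed=2.0, w_cost=1.0, w_fluc=0.5))
```

Raising `w_exceed` relative to the other weights makes limit violations dominate the signal, which is one way to prioritize meeting the energy supplement demand.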
Training a mobile micro-grid energy supplementing strategy intelligent optimization system based on a DQN algorithm:
The intelligent optimization system for the mobile micro-grid energy supplement strategy works by iteratively updating the action-value function according to feedback from the environment and finally deriving the optimal strategy from the optimal value function. In real scenarios, however, the optimal value function cannot be solved directly because of the complexity and uncertainty of the environment. The invention therefore uses a neural network to estimate the action-value function and trains the mobile micro-grid agent with the Deep Q-Network (DQN) algorithm, so as to approach the optimal action-value function and the optimal strategy step by step through iteration.
The DQN algorithm introduces a current network $Q(s, a, \omega)$ and a target network. The target network is a copy of the current network; the difference is that the current network updates its parameters after every training step, whereas the target network synchronizes its parameters with the current network only at fixed intervals. The target network computes the action values of the next state, and the current network computes the value of each action in the current state. Because the parameters of the target network do not change with every training step, the training process is more stable. The loss function after introducing the target network is as follows:
where $M$ is the number of sampled transitions; $R_k$ is the reward of the $k$-th sample; $\gamma$ is the discount factor; the target network term is the value, as estimated by the target network, of taking action $a$ in state $S_k'$; and $Q(S_k, A_k, \omega)$ is the value, as estimated by the current network, of taking action $A_k$ in state $S_k$.
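For orientation, the widely used form of the DQN loss is the mean squared temporal-difference error, sketched below in plain Python. Whether the invention uses exactly this normalization is an assumption; the two Q-functions are passed in as callables, terminal states are not treated specially, and the toy values are hypothetical.

```python
def dqn_loss(batch, q_current, q_target, actions, gamma):
    """Mean squared TD error over sampled transitions (s, a, r, s_next)."""
    total = 0.0
    for s, a, r, s_next in batch:
        # Target network evaluates the best action in the next state.
        td_target = r + gamma * max(q_target(s_next, b) for b in actions)
        # Current network evaluates the action that was actually taken.
        total += (td_target - q_current(s, a)) ** 2
    return total / len(batch)


# Toy stand-ins for the two networks (constant value estimates).
loss = dqn_loss(
    batch=[(0, 0, 1.0, 1)],
    q_current=lambda s, a: 1.0,
    q_target=lambda s, a: 2.0,
    actions=[0, 1],
    gamma=0.5,
)
print(loss)
```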
The basic flow of the DQN algorithm is as shown in steps a to o:
Step a: initialize the current network $Q(s, a, \omega)$ and the target network with randomly selected parameters;
Step b: initialize the experience replay pool;
Step c: start a new interaction sequence and execute steps d to n;
Step d: obtain the initial state $S_1$ of the current sequence;
Step e: execute steps f to m for each time step $t = 1 \to T$ in the sequence;
Step f: use the current network to obtain the value of each action in the current state $S_t$, and select the action $A_t$ to execute with the ε-greedy strategy: for a set value ε, a random action is chosen with probability ε and the highest-valued action is chosen with probability 1 − ε;
Step g: execute action $A_t$, obtain the reward $R_t$, and transition to the next state $S_{t+1}$;
Step h: store the state transition sample $(S_t, A_t, R_t, S_{t+1})$ in the experience replay pool;
Step i: if the number of samples in the experience replay pool exceeds a set threshold, execute steps j to l;
Step j: randomly sample N samples from the experience replay pool;
Step k: calculate the loss function from the sampled samples;
Step l: minimize the loss function value by gradient descent, updating the parameters of the current network $Q(s, a, \omega)$ with step size β;
Step m: at fixed intervals of time steps, synchronize the parameters of the target network with those of the current network $Q(s, a, \omega)$;
Step n: return to step e until the interaction sequence terminates;
Step o: return to step c until all interaction sequences have been executed;
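Two building blocks of the flow above, the experience replay pool (steps b, h, j) and ε-greedy action selection (step f), can be sketched with the standard library alone. Class and function names are hypothetical, and the network update itself (steps k to m) is omitted.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity pool; the oldest transitions are dropped when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)             # (S_t, A_t, R_t, S_{t+1})

    def sample(self, n, rng):
        return rng.sample(list(self.buffer), n)    # uniform, without replacement

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng):
    """Random action with probability epsilon, otherwise the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
pool = ReplayPool(capacity=3)
for step in range(5):
    action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng)
    pool.store((step, action, 0.0, step + 1))
print(len(pool), pool.sample(2, rng))
```

Sampling uniformly from the pool breaks the correlation between consecutive transitions, which is the usual motivation for experience replay.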
After the mobile micro-grid agent is trained with the DQN algorithm, the optimal value function is obtained, and the agent selects the highest-valued action according to its state, which yields the optimal strategy.
In summary, the deep-reinforcement-learning-based method for optimizing the energy supplement strategy of the mobile micro-grid first characterizes the energy supplement process of the mobile micro-grid as a Markov decision process that accounts for environmental uncertainty, and designs its state space, action space, and reward function. It then trains the intelligent optimization system for the energy supplement strategy with a DQN algorithm, which combines deep learning and reinforcement learning, learning from historical data and capturing the coupling between the mobile micro-grid and the power system that provides the energy supplement service. Finally, based on the trained intelligent optimization system, the intelligent energy supplement strategy of the mobile micro-grid is realized. The basic flow is shown in fig. 2.
Example:
Taking one day of data from a port that supplies energy to electrified ships as an example, the invention builds a DQN algorithm with the parameters in Table 1 and trains the ship agent over 60000 interaction sequences; the training process is shown in fig. 3. The trained real-time ship energy management decision system is then evaluated on a test scenario, and its results are compared with those of always charging at maximum power and of choosing the charging power with a random strategy; the test results are shown in figs. 4 and 5.
Table 1. Q-network and hyperparameter design

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Learning rate | 0.0001 | Neuron activation function | ReLU |
| Discount factor | 0.95 | Number of hidden layers | 2 |
| Experience replay pool capacity | 10000 | Neurons per hidden layer | 40, 20 |
| Mini-batch size | 128 | Target network update period | 100 |
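To make the architecture in Table 1 concrete, the sketch below builds a fully connected Q-network with two hidden layers of 40 and 20 neurons and ReLU activations, using only the standard library. The input size, the number of actions, and the weight initialization are assumptions; a real implementation would normally use a deep learning framework and train the weights as described earlier.

```python
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_q_network(n_state, n_actions, rng):
    """Layer sizes: state -> 40 -> 20 -> one Q-value per discrete action."""
    sizes = [n_state, 40, 20, n_actions]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(layers, x):
    """Forward pass with ReLU on hidden layers and a linear output layer."""
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


net = build_q_network(n_state=11, n_actions=5, rng=random.Random(0))
q_values = forward(net, [0.5] * 11)
print(len(q_values), q_values.index(max(q_values)))  # number of actions, greedy index
```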
It can be seen that, in online decision-making, the method achieves the best performance among the compared methods on both optimization objectives, energy supplement cost and grid power fluctuation, and is clearly superior to uncoordinated charging. Specifically, compared with the traditional uncoordinated charging method, it reduces the energy supplement cost by about 5% and the grid power fluctuation by nearly 25%, which demonstrates the optimization capability of the method.
The intelligent optimization system for the energy replenishment strategy of the mobile micro-grid based on the DQN algorithm after training can realize online decision in the energy replenishment process of the micro-grid, can adapt to the complexity in the power system for providing energy replenishment service, and can give consideration to two optimization targets of the energy replenishment cost of the micro-grid and the operation stability of the power system on the premise of guaranteeing the energy replenishment requirement of the micro-grid.
The embodiment of the invention provides a method and a system for optimizing an energy supplementing strategy of a mobile micro-grid, which are used for adapting to the complexity in a power system for providing energy supplementing service, so that the provided scheme can meet the requirements of both the mobile micro-grid and the power system for providing energy supplementing service.
The scheme has the following characteristics:
1. Rather than being limited to optimization methods that consider only an idealized charging pile model, the invention provides a mobile micro-grid energy supplement strategy optimization method that accounts for the internal complexity of the power system providing the energy supplement service, which helps further reduce the energy supplement cost through collaborative optimization and improves the economy of electrified vehicles.
2. Different from the existing energy supplementing strategy optimization method of the mobile micro-grid, the power system for providing energy supplementing service is subdivided into four parts of source, grid, load and storage, the coupling relation and interaction mechanism between the mobile micro-grid and the connected power system are carefully examined, and the general optimization method capable of considering both energy supplementing economy and power system operation stability is realized.
3. The application provides an algorithmic framework for deploying the intelligent energy supplement strategy optimization system based on deep reinforcement learning, using a neural network to estimate the action-value function over continuous state variables, so that the optimal strategy can be solved more effectively from the complex model.
Those skilled in the art will appreciate that, in addition to implementing the system and its devices, modules, and units provided by the application as pure computer-readable program code, the method steps can be logically programmed so that the system and its devices, modules, and units realize the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, the system and its devices, modules, and units provided by the application can be regarded as a hardware component, and the devices, modules, and units included in it for realizing various functions can be regarded as structures within the hardware component; the devices, modules, and units for realizing various functions can also be regarded as both software modules implementing the method and structures within the hardware component.
The foregoing describes specific embodiments of the present application. It is to be understood that the application is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the application. The embodiments of the application and the features of the embodiments may be combined with each other arbitrarily without conflict.

Claims (10)

1. A mobile micro-grid energy replenishment strategy optimization method, characterized by comprising the following steps:
a step of characterizing the energy replenishment of the mobile micro-grid: regarding the mobile micro-grid as an agent in a Markov decision process, designing, based on the Markov decision process, a state space, an action space, and a reward function for intelligent energy replenishment regulation of the mobile micro-grid, and regarding the energy replenishment power of the mobile micro-grid as the action to be decided;
a step of training the mobile micro-grid energy replenishment strategy optimization: training the mobile micro-grid agent with a DQN algorithm to obtain an optimal value function, the agent selecting the highest-valued action according to its state to obtain an optimal strategy;
a step of deploying the trained mobile micro-grid energy replenishment strategy: based on the trained mobile micro-grid agent, selecting the energy replenishment power according to the optimal value function obtained through training, thereby realizing an intelligent energy replenishment strategy for the mobile micro-grid.
2. The mobile microgrid energy replenishment strategy optimization method according to claim 1, wherein the state space design in the step of characterizing the energy replenishment of the mobile microgrid comprises: the lower limit of the SOC of the energy storage battery at each moment required by meeting the energy supplementing requirement of the micro-grid is calculated in advance, and the SOC is ensured not to be lower than the lower limit in the energy supplementing process, so that the energy supplementing requirement of the mobile micro-grid is ensured to be met when the mobile micro-grid leaves;
The calculation mode of the SOC lower limit is as follows:
wherein $e_{t,i}$ is the SOC of the energy storage battery of the $i$-th mobile micro-grid at time $t$; $t_{a,i}$ and $t_{l,i}$ are the start time and the scheduled end time of energy replenishment for the $i$-th mobile micro-grid; the formula also uses the actual charging efficiency and the actual discharging efficiency of the $i$-th mobile micro-grid; $P_i^{dis,max}$ and $P_i^{ch,max}$ are the maximum discharge power and the maximum charge power that the energy storage battery of the $i$-th mobile micro-grid can withstand; $\Delta t$ is the interval between adjacent moments; $c_i$ is the energy storage battery capacity of the $i$-th mobile micro-grid; $e_i^{min}$ is the lower safety limit of the SOC of the energy storage battery of the $i$-th mobile micro-grid; and $e_{target,i}$ is the target energy storage battery SOC that the $i$-th mobile micro-grid is expected to reach at departure;
the state space is defined as follows:
wherein $s_{t,i}$ is the set of state variables of the $i$-th mobile micro-grid at time $t$; $t$ is the current time; $t_{stay,i}$ is the remaining energy replenishment time of the $i$-th mobile micro-grid; $P_{t,i}^{sp}$ is the shore power access load power generated at time $t$ by the $i$-th mobile micro-grid connected to the power system; $p_{ch,t}$ is the electricity price at time $t$; $P_t^{RE}$ is the total renewable output of the power system at time $t$; $P_t^{L}$ is the conventional load power of the power system at time $t$; and $n_{b,t}$ is the total number of mobile micro-grids being supplied with energy at time $t$.
3. The mobile microgrid energy replenishment strategy optimization method according to claim 2, wherein the characterizing the action space design in the energy replenishment step of the mobile microgrid comprises: setting the energy supplementing power of the mobile micro-grid as an action variable to be decided, discretizing the charging and discharging power into a plurality of gears for selection, wherein positive values represent energy absorption from a power system, negative values represent energy release to the power system, and an action space is defined as follows:
wherein $A_t$ is the action to be executed by the agent at time $t$; $a_{t,i}$ is the action variable of the $i$-th mobile micro-grid at time $t$; $P_{t,i}^{ch}$ is the charging power of the $i$-th mobile micro-grid at time $t$; and $I$ is the set of indices of all mobile micro-grids connected to the power system.
4. The mobile micro-grid energy replenishment strategy optimization method according to claim 3, wherein the reward function design in the step of characterizing the energy replenishment of the mobile micro-grid comprises:
for the optimization objective of meeting the energy replenishment requirement, calculating the SOC lower limit at each moment; however, during training, an action chosen by the DQN algorithm may drive the SOC below this lower limit or above the safe upper limit, so the action must be corrected, the correction formula being as follows:
wherein $e_i^{max}$ is the upper SOC limit set for the energy storage battery of the i-th mobile micro-grid to keep energy replenishment safe; $e_{t,i}$ is the SOC of the energy storage battery of the i-th mobile micro-grid at time t; $c_i$ is the energy storage battery capacity of the i-th mobile micro-grid; the actual charging efficiency of the i-th mobile micro-grid also enters the correction; $e_{t+1,i}^{pred}$ is the SOC expected at time t+1 for the i-th vessel under the uncorrected action $a_{t,i}$; and a coefficient related to the charging and discharging efficiency of the vessel takes one value when $e_{t,i} < e_{t+1,i}^{th,low}$ and another value otherwise;
to guide the agent toward actions that lie between the upper and lower limits, the difference between the action before correction and the action after correction is computed, and the negative of its absolute value is used as the action out-of-limit reward, so that the negative reward drives the agent toward decisions that do not exceed the limits, the action out-of-limit reward function being as follows:
for the optimization objective of reducing the energy replenishment cost, the negative of the replenishment cost is used directly as the reward: the cost paid when the mobile micro-grid charges corresponds to a negative reward, and the revenue earned when the mobile micro-grid sells electricity corresponds to a positive reward, the energy replenishment cost reward function being as follows:
for the optimization objective of reducing grid power fluctuation, in order to measure the contribution of the mobile micro-grid as virtual energy storage, the average over all time periods of the previous-day exchange power, excluding the energy replenishment power of the mobile micro-grids, is needed as a reference value; the previous-day exchange power excluding the energy replenishment power of the mobile micro-grids is calculated as follows:
wherein the previous-day exchange power at time t excludes the charging and discharging power of the mobile micro-grids; $P_t^{L,before}$ is the load power inside the power system providing the energy replenishment service at time t of the previous day; the total micro-grid power load at time t of the previous day excludes the mobile micro-grid energy replenishment process; $P_t^{RE,before}$ is the total power generated by renewable energy sources inside the power system providing the energy replenishment service at time t of the previous day; $P_t^{ST,before}$ is the output power of the energy storage station inside the power system providing the energy replenishment service at time t of the previous day; $P_{t,i}^{sp,before}$ is the shore power load power incurred at time t of the previous day by the power system providing the energy replenishment service in supplying power to the i-th mobile micro-grid; $P_{t,j}^{ch,before}$ is the load power incurred at time t of the previous day by the power system providing the energy replenishment service in recharging the j-th energy storage battery swapped out of a mobile micro-grid by battery swapping; $I^{before}$ is the set of indices of all mobile micro-grids connected to the power system on the previous day; and $J^{before}$ is the set of indices of all energy storage batteries swapped out of mobile micro-grids by battery swapping on the previous day;
when measuring power fluctuation, the full-day average of the previous-day exchange power excluding the mobile micro-grids is taken as the reference, and its difference from the exchange power at time t of the current day measures the fluctuation power; the fluctuation power excluding the output of the energy storage station is calculated as follows:
wherein $P_t^{MG,nonST}$ is the grid exchange power at time t of the current day, excluding the output of the energy storage station;
the energy storage station determines its output after the agent has made its decision, with the aim of further reducing the power fluctuation as far as possible on top of the output of the mobile micro-grids, the output power of the energy storage station being calculated as follows:
wherein $E_{t,k}^{ST}$ is the stored energy of the k-th battery in the energy storage station at time t, $\eta_{ch,k}$ and $\eta_{dis,k}$ are respectively the charging efficiency and the discharging efficiency of the k-th battery in the energy storage station, and $c_k^{ST}$ is the capacity of the k-th battery in the energy storage station;
after the output of the energy storage station is obtained, the fluctuation power that accounts for the output of the energy storage station is calculated as follows:
$P_t^{f} = P_t^{f,nonST} - P_t^{ST}$
the power fluctuation reward function is as follows:
combining the action out-of-limit reward function, the energy replenishment cost reward function and the power fluctuation reward function, the total reward function is obtained as follows:
$r_t = \sigma_{exceed} r_t^{exceed} + \sigma_{cost} r_t^{cost} + \sigma_{fluc} r_t^{fluc}$
wherein $\sigma_{exceed}$, $\sigma_{cost}$ and $\sigma_{fluc}$ are respectively the weight coefficients of the action out-of-limit reward function, the energy replenishment cost reward function and the power fluctuation reward function; the larger a coefficient, the more importance is given to the optimization objective corresponding to that reward function.
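The reward design of claim 4 can be sketched as follows, under stated assumptions. Several of the claim's formulas are not reproduced in this text, so the battery model in `correct_action` (charging stores power times efficiency, discharging draws power divided by efficiency), the absolute-value form of the fluctuation reward, and all function names and numbers are illustrative guesses rather than the patent's formulas. Only the out-of-limit reward (negative absolute difference between the action before and after correction), the sign convention of the cost reward and the weighted sum follow the claim text directly.

```python
def correct_action(power, soc, soc_low_next, soc_max, capacity, eta_ch, eta_dis, dt):
    """Clip a proposed power so the next-step SOC stays in [soc_low_next, soc_max]."""
    def next_soc(p):
        # Assumed battery model: charging (p >= 0) stores p * eta_ch * dt,
        # discharging (p < 0) removes |p| / eta_dis * dt from the battery.
        energy = p * eta_ch * dt if p >= 0 else p / eta_dis * dt
        return soc + energy / capacity

    predicted = next_soc(power)
    if predicted > soc_max:
        target = soc_max
    elif predicted < soc_low_next:
        target = soc_low_next
    else:
        return power  # already within limits, no correction needed
    delta_e = (target - soc) * capacity  # stored-energy change that lands on the limit
    return delta_e / (eta_ch * dt) if delta_e >= 0 else delta_e * eta_dis / dt


def total_reward(power, corrected, price, dt, fluctuation, w_exceed, w_cost, w_fluc):
    """Weighted sum of the three reward terms."""
    r_exceed = -abs(power - corrected)   # negative absolute size of the correction
    r_cost = -price * corrected * dt     # charging costs money, selling earns money
    r_fluc = -abs(fluctuation)           # assumed form of the fluctuation reward
    return w_exceed * r_exceed + w_cost * r_cost + w_fluc * r_fluc


# Hypothetical numbers: the agent asks for 100 units of charging near the upper limit.
corrected = correct_action(power=100.0, soc=0.9, soc_low_next=0.3, soc_max=0.95,
                           capacity=1000.0, eta_ch=0.9, eta_dis=0.9, dt=1.0)
reward = total_reward(power=100.0, corrected=corrected, price=0.5, dt=1.0,
                      fluctuation=20.0, w_exceed=1.0, w_cost=1.0, w_fluc=1.0)
```

Raising `w_exceed` relative to the other weights makes the agent learn to stay inside the SOC band sooner, at the price of paying less attention to cost and fluctuation early in training.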
5. The mobile micro-grid energy replenishment strategy optimization method according to claim 1, wherein the step of training the mobile micro-grid energy replenishment strategy optimization comprises:
using a neural network to estimate the action-value function, and training the mobile micro-grid agent with the DQN algorithm so as to approach the optimal action-value function and the optimal policy step by step through iteration;
the DQN algorithm introduces a current network $Q(s, a, \omega)$ and a target network; the target network computes the action values of the next state, and the current network computes the value of each action in the current state; the loss function after introducing the target network is as follows:
wherein M is the number of sampled transitions, $R_k$ is the reward of the k-th sample, $\gamma$ is the discount factor, the target network supplies the value of taking action a in state $S_k'$, and $Q(S_k, A_k, \omega)$ is the value the current network assigns to taking action $A_k$ in state $S_k$.
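A minimal sketch of a loss built from the symbols defined above is given below. The claim's own formula is not reproduced in this text, so the standard DQN form used here (mean squared difference between $R_k + \gamma \max_a Q_{target}(S_k', a)$ and $Q(S_k, A_k, \omega)$, with a plain 1/M normalisation) is an assumption, as are the function name `dqn_loss` and the toy values. The two networks are passed in as plain callables, and terminal states are not treated specially.

```python
def dqn_loss(batch, q_current, q_target, n_actions, gamma):
    """Mean squared TD error over a batch of (S_k, A_k, R_k, S_k') transitions."""
    total = 0.0
    for s, a, r, s_next in batch:
        # The target network evaluates every action in the next state.
        best_next = max(q_target(s_next, b) for b in range(n_actions))
        td_target = r + gamma * best_next
        # The current network evaluates the action that was actually taken.
        total += (td_target - q_current(s, a)) ** 2
    return total / len(batch)


# Toy lookup tables standing in for the two networks (hypothetical values).
q_now = {((0,), 0): 1.0, ((0,), 1): 2.0}
q_old = {((1,), 0): 0.5, ((1,), 1): 3.0}
loss = dqn_loss(
    batch=[((0,), 1, 1.0, (1,))],
    q_current=lambda s, a: q_now[(s, a)],
    q_target=lambda s, a: q_old[(s, a)],
    n_actions=2,
    gamma=0.9,
)
```

Here the TD target is 1.0 + 0.9 × 3.0 = 3.7 and the current estimate is 2.0, so the single-sample loss is 1.7 squared.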
6. The mobile micro-grid energy replenishment strategy optimization method according to claim 5, wherein the basic flow of the DQN algorithm is as follows:
step a: initializing the current network $Q(s, a, \omega)$ and the target network with randomly selected parameters;
step b: initializing an experience replay pool;
step c: starting a new interaction sequence, and executing steps d to n;
step d: acquiring the initial state $S_1$ of the current sequence;
step e: performing steps f to m for each time step $t = 1 \to T$ in the sequence;
step f: obtaining from the current network the action values in the current state $S_t$, and selecting the action $A_t$ to execute with an $\epsilon$-greedy strategy, wherein a value of $\epsilon$ is set, a random action is selected with probability $\epsilon$, and the action with the highest value is selected with probability $1 - \epsilon$;
step g: after executing action $A_t$, obtaining the reward $R_t$ and transitioning to the next state $S_{t+1}$;
step h: storing the state transition sample $(S_t, A_t, R_t, S_{t+1})$ in the experience replay pool;
step i: if the number of samples in the experience replay pool exceeds a preset number, performing steps j to l;
step j: randomly sampling N samples from the experience replay pool;
step k: calculating the loss function from the sampled transitions;
step l: minimizing the loss function value by gradient descent, and updating the parameters of the current network $Q(s, a, \omega)$ with step size $\beta$;
step m: at fixed intervals of time steps, synchronizing the parameters of the target network with those of the current network $Q(s, a, \omega)$;
step n: returning to step e until the interaction sequence terminates;
step o: returning to step c until all interaction sequences have been executed;
after the mobile micro-grid agent has been trained with the DQN algorithm, the optimal value function is obtained, and the agent selects the action with the highest value according to the state, which yields the optimal policy.
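Steps a to o can be sketched end to end on a toy problem. Everything specific below is an assumption made for illustration: the environment (a battery with five SOC levels, four priced time steps and a penalty for leaving below a target SOC), the hyperparameters, and above all the use of a lookup table in place of a neural network, which keeps the example dependency-free but is not what the claim describes. The gradient step is also applied per sample rather than on a batch-averaged loss.

```python
import random

ACTIONS = [-1, 0, 1]             # discharge, idle, charge: one SOC level per step
PRICES = [1.0, 0.2, 0.2, 1.0]    # hypothetical electricity price per time step
N_SOC, TARGET_SOC, START = 5, 3, (1, 0)


def step(state, a_idx):
    """Toy environment (assumed): returns (next_state, reward, done)."""
    soc, t = state
    new_soc = min(max(soc + ACTIONS[a_idx], 0), N_SOC - 1)
    reward = -PRICES[t] * (new_soc - soc)      # charging costs, discharging earns
    done = t + 1 == len(PRICES)
    if done and new_soc < TARGET_SOC:
        reward -= 5.0                          # penalty for missing the departure SOC
    return (new_soc, t + 1), reward, done


def greedy(w, state):
    """Index of the highest-valued action in `state`."""
    return max(range(len(ACTIONS)), key=lambda i: w[state][i])


def train(episodes=2000, gamma=1.0, beta=0.1, eps=0.2,
          batch=16, min_pool=32, sync_every=20, seed=0):
    rng = random.Random(seed)
    # step a: randomly initialise the current parameters and copy them to the target
    w = {(s, t): [rng.uniform(-0.01, 0.01) for _ in ACTIONS]
         for s in range(N_SOC) for t in range(len(PRICES) + 1)}
    w_target = {k: v[:] for k, v in w.items()}
    pool, steps = [], 0                                    # step b
    for _ in range(episodes):                              # steps c and o
        state, done = START, False                         # step d
        while not done:                                    # steps e and n
            if rng.random() < eps:                         # step f: epsilon-greedy
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(w, state)
            nxt, r, done = step(state, a)                  # step g
            pool.append((state, a, r, nxt, done))          # step h
            if len(pool) > min_pool:                       # step i
                for s, a_k, r_k, s2, d in rng.sample(pool, batch):   # step j
                    # step k: TD target from the target parameters
                    target = r_k if d else r_k + gamma * max(w_target[s2])
                    # step l: gradient step on the squared TD error, step size beta
                    w[s][a_k] += beta * (target - w[s][a_k])
            steps += 1
            if steps % sync_every == 0:                    # step m
                w_target = {k: v[:] for k, v in w.items()}
            state = nxt
    return w


def greedy_rollout(w):
    """Follow the learned greedy policy once; return (final SOC level, total reward)."""
    state, total, done = START, 0.0, False
    while not done:
        state, r, done = step(state, greedy(w, state))
        total += r
    return state[0], total


weights = train()
final_soc, total = greedy_rollout(weights)
```

Replacing the table with a neural network changes only how `w[state]` is evaluated and how the update in step l is applied; the replay pool, the target copy and the periodic synchronisation stay the same.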
7. A mobile microgrid energy replenishment strategy optimization system, comprising:
a module for characterizing the energy replenishment of the mobile micro-grid: regarding the mobile micro-grid as an agent in a Markov decision process, designing the state space, action space and reward function of the intelligent energy replenishment regulation of the mobile micro-grid based on the Markov decision process, and taking the energy replenishment power of the mobile micro-grid as the action to be decided;
a module for training the mobile micro-grid energy replenishment strategy optimization: training the mobile micro-grid agent with the DQN algorithm to obtain the optimal value function, the agent selecting the action with the highest value according to the state to obtain the optimal policy;
a module for deploying the trained mobile micro-grid energy replenishment strategy: based on the trained mobile micro-grid agent, selecting the energy replenishment power according to the optimal value function obtained through training, thereby realizing the intelligent energy replenishment strategy of the mobile micro-grid.
8. The mobile micro-grid energy replenishment strategy optimization system according to claim 7, wherein the state space design in the module for characterizing the energy replenishment of the mobile micro-grid comprises: calculating in advance the lower limit of the SOC of the energy storage battery at each moment that is required to meet the energy replenishment requirement of the micro-grid, and ensuring that the SOC does not fall below this lower limit during energy replenishment, so that the energy replenishment requirement of the mobile micro-grid is met when it departs;
the SOC lower limit is calculated as follows:
wherein $e_{t,i}$ is the SOC of the energy storage battery of the i-th mobile micro-grid at time t; $t_{a,i}$ and $t_{l,i}$ are respectively the start time and the scheduled end time of energy replenishment for the i-th mobile micro-grid; the actual charging efficiency and the actual discharging efficiency of the i-th mobile micro-grid also enter the calculation; $P_i^{dis,max}$ and $P_i^{ch,max}$ are respectively the maximum discharging power and the maximum charging power that the energy storage battery of the i-th mobile micro-grid can withstand; $\Delta t$ is the interval between adjacent moments; $c_i$ is the energy storage battery capacity of the i-th mobile micro-grid; $e_i^{min}$ is the minimum safe SOC of the energy storage battery of the i-th mobile micro-grid; and $e_{target,i}$ is the target SOC that the energy storage battery of the i-th mobile micro-grid is expected to reach on departure;
the state space is defined as follows:
wherein $s_{t,i}$ is the set of state variables of the i-th mobile micro-grid at time t; t is the current time; $t_{stay,i}$ is the remaining energy replenishment time of the i-th mobile micro-grid; $P_{t,i}^{sp}$ is the shore power load power generated at time t when the i-th mobile micro-grid is connected to the power system; $p_{ch,t}$ is the electricity price at time t; $P_t^{RE}$ is the total renewable energy output of the power system at time t; $P_t^{L}$ is the normal load power of the power system at time t; and $n_{b,t}$ is the total number of mobile micro-grids being replenished at time t;
the action space design in the module for characterizing the energy replenishment of the mobile micro-grid comprises: taking the energy replenishment power of the mobile micro-grid as the action variable to be decided, and discretizing the charging and discharging power into a plurality of selectable levels, wherein positive values represent energy absorbed from the power system and negative values represent energy released to the power system, the action space being defined as follows:
wherein $A_t$ is the action to be performed by the agent at time t, $a_{t,i}$ is the action variable of the i-th mobile micro-grid at time t, $P_{t,i}^{ch}$ is the charging power of the i-th mobile micro-grid at time t, and $I$ is the set of indices of all mobile micro-grids connected to the power system;
the reward function design in the module for characterizing the energy replenishment of the mobile micro-grid comprises:
for the optimization objective of meeting the energy replenishment requirement, calculating the SOC lower limit at each moment; however, during training, an action chosen by the DQN algorithm may drive the SOC below this lower limit or above the safe upper limit, so the action must be corrected, the correction formula being as follows:
wherein $e_i^{max}$ is the upper SOC limit set for the energy storage battery of the i-th mobile micro-grid to keep energy replenishment safe; $e_{t,i}$ is the SOC of the energy storage battery of the i-th mobile micro-grid at time t; $c_i$ is the energy storage battery capacity of the i-th mobile micro-grid; the actual charging efficiency of the i-th mobile micro-grid also enters the correction; $e_{t+1,i}^{pred}$ is the SOC expected at time t+1 for the i-th vessel under the uncorrected action $a_{t,i}$; and a coefficient related to the charging and discharging efficiency of the vessel takes one value when $e_{t,i} < e_{t+1,i}^{th,low}$ and another value otherwise;
to guide the agent toward actions that lie between the upper and lower limits, the difference between the action before correction and the action after correction is computed, and the negative of its absolute value is used as the action out-of-limit reward, so that the negative reward drives the agent toward decisions that do not exceed the limits, the action out-of-limit reward function being as follows:
for the optimization objective of reducing the energy replenishment cost, the negative of the replenishment cost is used directly as the reward: the cost paid when the mobile micro-grid charges corresponds to a negative reward, and the revenue earned when the mobile micro-grid sells electricity corresponds to a positive reward, the energy replenishment cost reward function being as follows:
for the optimization objective of reducing grid power fluctuation, in order to measure the contribution of the mobile micro-grid as virtual energy storage, the average over all time periods of the previous-day exchange power, excluding the energy replenishment power of the mobile micro-grids, is needed as a reference value; the previous-day exchange power excluding the energy replenishment power of the mobile micro-grids is calculated as follows:
wherein the previous-day exchange power at time t excludes the charging and discharging power of the mobile micro-grids; $P_t^{L,before}$ is the load power inside the power system providing the energy replenishment service at time t of the previous day; the total micro-grid power load at time t of the previous day excludes the mobile micro-grid energy replenishment process; $P_t^{RE,before}$ is the total power generated by renewable energy sources inside the power system providing the energy replenishment service at time t of the previous day; $P_t^{ST,before}$ is the output power of the energy storage station inside the power system providing the energy replenishment service at time t of the previous day; $P_{t,i}^{sp,before}$ is the shore power load power incurred at time t of the previous day by the power system providing the energy replenishment service in supplying power to the i-th mobile micro-grid; $P_{t,j}^{ch,before}$ is the load power incurred at time t of the previous day by the power system providing the energy replenishment service in recharging the j-th energy storage battery swapped out of a mobile micro-grid by battery swapping; $I^{before}$ is the set of indices of all mobile micro-grids connected to the power system on the previous day; and $J^{before}$ is the set of indices of all energy storage batteries swapped out of mobile micro-grids by battery swapping on the previous day;
when measuring power fluctuation, the full-day average of the previous-day exchange power excluding the mobile micro-grids is taken as the reference, and its difference from the exchange power at time t of the current day measures the fluctuation power; the fluctuation power excluding the output of the energy storage station is calculated as follows:
wherein $P_t^{MG,nonST}$ is the grid exchange power at time t of the current day, excluding the output of the energy storage station;
the energy storage station determines its output after the agent has made its decision, with the aim of further reducing the power fluctuation as far as possible on top of the output of the mobile micro-grids, the output power of the energy storage station being calculated as follows:
wherein $E_{t,k}^{ST}$ is the stored energy of the k-th battery in the energy storage station at time t, $\eta_{ch,k}$ and $\eta_{dis,k}$ are respectively the charging efficiency and the discharging efficiency of the k-th battery in the energy storage station, and $c_k^{ST}$ is the capacity of the k-th battery in the energy storage station;
after the output of the energy storage station is obtained, the fluctuation power that accounts for the output of the energy storage station is calculated as follows:
$P_t^{f} = P_t^{f,nonST} - P_t^{ST}$
the power fluctuation reward function is as follows:
combining the action out-of-limit reward function, the energy replenishment cost reward function and the power fluctuation reward function, the total reward function is obtained as follows:
$r_t = \sigma_{exceed} r_t^{exceed} + \sigma_{cost} r_t^{cost} + \sigma_{fluc} r_t^{fluc}$
wherein $\sigma_{exceed}$, $\sigma_{cost}$ and $\sigma_{fluc}$ are respectively the weight coefficients of the action out-of-limit reward function, the energy replenishment cost reward function and the power fluctuation reward function; the larger a coefficient, the more importance is given to the optimization objective corresponding to that reward function;
the module for training the mobile micro-grid energy replenishment strategy optimization comprises:
using a neural network to estimate the action-value function, and training the mobile micro-grid agent with the DQN algorithm so as to approach the optimal action-value function and the optimal policy step by step through iteration;
the DQN algorithm introduces a current network $Q(s, a, \omega)$ and a target network; the target network computes the action values of the next state, and the current network computes the value of each action in the current state; the loss function after introducing the target network is as follows:
wherein M is the number of sampled transitions, $R_k$ is the reward of the k-th sample, $\gamma$ is the discount factor, the target network supplies the value of taking action a in state $S_k'$, and $Q(S_k, A_k, \omega)$ is the value the current network assigns to taking action $A_k$ in state $S_k$;
the basic flow of the DQN algorithm is as follows:
step a: initializing the current network $Q(s, a, \omega)$ and the target network with randomly selected parameters;
step b: initializing an experience replay pool;
step c: starting a new interaction sequence, and executing steps d to n;
step d: acquiring the initial state $S_1$ of the current sequence;
step e: performing steps f to m for each time step $t = 1 \to T$ in the sequence;
step f: obtaining from the current network the action values in the current state $S_t$, and selecting the action $A_t$ to execute with an $\epsilon$-greedy strategy, wherein a value of $\epsilon$ is set, a random action is selected with probability $\epsilon$, and the action with the highest value is selected with probability $1 - \epsilon$;
step g: after executing action $A_t$, obtaining the reward $R_t$ and transitioning to the next state $S_{t+1}$;
step h: storing the state transition sample $(S_t, A_t, R_t, S_{t+1})$ in the experience replay pool;
step i: if the number of samples in the experience replay pool exceeds a preset number, performing steps j to l;
step j: randomly sampling N samples from the experience replay pool;
step k: calculating the loss function from the sampled transitions;
step l: minimizing the loss function value by gradient descent, and updating the parameters of the current network $Q(s, a, \omega)$ with step size $\beta$;
step m: at fixed intervals of time steps, synchronizing the parameters of the target network with those of the current network $Q(s, a, \omega)$;
step n: returning to step e until the interaction sequence terminates;
step o: returning to step c until all interaction sequences have been executed;
after the mobile micro-grid agent has been trained with the DQN algorithm, the optimal value function is obtained, and the agent selects the action with the highest value according to the state, which yields the optimal policy.
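At deployment time, the behaviour described for the third module of claim 7 reduces to a greedy lookup over the discrete power levels. The sketch below assumes the trained value function is available as a callable `q_function(state, k)`; that interface, the function name `select_replenishment_power`, the level list and the stand-in scoring function `toy_q` are all hypothetical.

```python
def select_replenishment_power(state, levels, q_function):
    """Return the power level whose estimated action value is highest in `state`."""
    best_index = max(range(len(levels)), key=lambda k: q_function(state, k))
    return levels[best_index]


LEVELS = [-50.0, 0.0, 50.0]   # hypothetical discharge / idle / charge set-points


def toy_q(state, k):
    """Stand-in for a trained network: favours charging when the price is low."""
    price = state[0]
    return (0.3 - price) * LEVELS[k]


power_cheap = select_replenishment_power((0.1,), LEVELS, toy_q)
power_expensive = select_replenishment_power((0.9,), LEVELS, toy_q)
```

No exploration is used here: the $\epsilon$-greedy randomness of training is switched off once the value function is considered final.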
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the mobile microgrid energy replenishment strategy optimization method of any one of claims 1 to 6.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program when executed by the processor implements the steps of the mobile microgrid energy replenishment strategy optimization method of any one of claims 1 to 6.
CN202310750126.XA 2023-06-21 2023-06-21 Method and system for optimizing energy supplement strategy of mobile micro-grid Pending CN116683513A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310750126.XA CN116683513A (en) 2023-06-21 2023-06-21 Method and system for optimizing energy supplement strategy of mobile micro-grid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310750126.XA CN116683513A (en) 2023-06-21 2023-06-21 Method and system for optimizing energy supplement strategy of mobile micro-grid

Publications (1)

Publication Number Publication Date
CN116683513A (en) 2023-09-01

Family

ID=87782007

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310750126.XA Pending CN116683513A (en) 2023-06-21 2023-06-21 Method and system for optimizing energy supplement strategy of mobile micro-grid

Country Status (1)

Country Link
CN (1) CN116683513A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117648123A (en) * 2024-01-30 2024-03-05 中国人民解放军国防科技大学 Micro-service rapid integration method, system, equipment and storage medium
CN117648123B (en) * 2024-01-30 2024-06-11 中国人民解放军国防科技大学 Micro-service rapid integration method, system, equipment and storage medium
CN117808174A (en) * 2024-03-01 2024-04-02 山东大学 Micro-grid operation optimization method and system based on reinforcement learning under network attack
CN117808174B (en) * 2024-03-01 2024-05-28 山东大学 Micro-grid operation optimization method and system based on reinforcement learning under network attack

Similar Documents

Publication Publication Date Title
CN109492815B (en) Energy storage power station site selection and volume fixing optimization method for power grid under market mechanism
CN109347149B (en) Micro-grid energy storage scheduling method and device based on deep Q-value network reinforcement learning
CN110570007B (en) Multi-time-scale optimal scheduling method for electric automobile
CN104269849B (en) Energy management method based on building photovoltaic micro and system
CN110138006B (en) Multi-microgrid coordinated optimization scheduling method considering new energy electric vehicle
CN114091879A (en) Multi-park energy scheduling method and system based on deep reinforcement learning
CN112217195B (en) Cloud energy storage charging and discharging strategy forming method based on GRU multi-step prediction technology
CN112491094B (en) Hybrid-driven micro-grid energy management method, system and device
CN113326994A (en) Virtual power plant energy collaborative optimization method considering source load storage interaction
CN111626527A (en) Intelligent power grid deep learning scheduling method considering fast/slow charging/discharging form of schedulable electric vehicle
CN112366704A (en) Comprehensive energy system tie line power control method based on excitation demand response
CN114997935B (en) Electric vehicle charging and discharging strategy optimization method based on interior point strategy optimization
CN116683513A (en) Method and system for optimizing energy supplement strategy of mobile micro-grid
Wan et al. A data-driven approach for real-time residential EV charging management
CN111293682A (en) Multi-microgrid energy management method based on cooperative model predictive control
CN117057553A (en) Deep reinforcement learning-based household energy demand response optimization method and system
CN110796286A (en) Flexible planning method of power distribution system suitable for electric automobile large-scale application
Erick et al. Energy trading in grid-connected PV-battery electric vehicle charging station
CN115663793A (en) Electric automobile low-carbon charging and discharging scheduling method based on deep reinforcement learning
Yi et al. Optimal energy management strategy for smart home with electric vehicle
CN114723230A (en) Micro-grid double-layer scheduling method and system for new energy power generation and energy storage
CN117691586A (en) New energy base micro-grid optimized operation method and system based on behavior cloning
CN116914755B (en) Light-storage joint planning method and system considering battery cycle life
CN113988403A (en) Electric vehicle charging load prediction method and system
CN114421502A (en) Cooperative optimization method for micro-grid community

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination