CN111242443A - Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet - Google Patents

Info

Publication number
CN111242443A
CN111242443A (application CN202010010410.XA)
Authority
CN
China
Prior art keywords
network
operator
information
time slot
power generation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010010410.XA
Other languages
Chinese (zh)
Other versions
CN111242443B (en)
Inventor
孙迪
王宁
关心
林霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Heilongjiang Electric Power Co Ltd
Heilongjiang University
Original Assignee
State Grid Heilongjiang Electric Power Co Ltd
Heilongjiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Heilongjiang Electric Power Co Ltd, Heilongjiang University filed Critical State Grid Heilongjiang Electric Power Co Ltd
Priority to CN202010010410.XA priority Critical patent/CN111242443B/en
Publication of CN111242443A publication Critical patent/CN111242443A/en
Application granted granted Critical
Publication of CN111242443B publication Critical patent/CN111242443B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70Smart grids as climate change mitigation technology in the energy generation sector
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

A deep reinforcement learning-based economic dispatching method for a virtual power plant in the energy internet, belonging to the technical field of energy distribution of virtual power plants. The invention addresses the large communication load and delay, high computational complexity and poor reliability of data transmission in existing methods. The invention provides a distributed power generation economic dispatching structure with a three-layer architecture based on edge computing, in which the first and second layers are edge computing layers and the third layer is a cloud computing layer. The proposed three-layer edge computing architecture reduces the computational complexity of processing training tasks at the central node and further reduces the communication load between the VPP operator and the DG units, thereby also reducing the response time for industrial users, while preserving the privacy of industrial users and improving the reliability of data transmission. The invention can be applied to the energy distribution of virtual power plants.

Description

Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet
Technical Field
The invention belongs to the technical field of energy distribution of virtual power plants, and particularly relates to a virtual power plant economic dispatching method in an energy internet based on deep reinforcement learning.
Background
With the access of large-scale distributed generation in the energy internet, and owing to geographical constraints, the traditional micro-grid has certain limitations that hinder the effective utilization of multi-region, large-scale distributed generation, so power curtailment occurs very frequently. Because the construction scale of renewable energy stations does not match the demand of local loads, the accommodation capacity for renewable energy is limited, resulting in a certain amount of curtailment in areas where wind power stations and photovoltaic stations are concentrated. Compared with a micro-grid, a VPP can aggregate energy and load over a wider range, better match the construction scale of renewable energy with the scale of local load demand, and reduce curtailment.
Because economic dispatch scenarios are complex, for example the management of intelligent devices for distributed renewable energy and of industrial users, large amounts of different types of data must be transmitted in real time. Given the close relationship between industrial users and VPP operators, reasonable economic scheduling should take full account of user participation; industrial users can participate in economic dispatch by contracting with VPP operators. The VPP operator needs to receive data from the demand-side industrial users and from the DG units (distributed generation units). Since data transmission between the VPP operator and the devices requires a certain level of performance guarantees to achieve optimal economic scheduling, VPPs employ advanced control, sensing and communication techniques to sense and collect data and transmit it to the VPP's economic scheduling control center. To achieve optimal economic scheduling in such complex situations, the wireless links between most devices and the VPP operator must be considered, and large data transfers can easily exceed the transmission capacity limits. Thus, resource-limited devices in bulk cannot directly send their demand to the VPP operator, which poses a significant challenge to efficient economic scheduling.
Traditionally, VPP operators dispatch geographically dispersed distributed power supplies in a centralized fashion. The user information and the real-time status data of the DGs from multiple areas are sent to the cloud for storage and processing, which results in a large network communication load, high consumption of computing resources, and consequently higher network delay and computational complexity. In practice, long-distance data transmission from the various DGs and industrial users to a cloud computing center consumes a large amount of energy. Moreover, the transmitted data raises privacy concerns for industrial users in different regions: in the traditional cloud computing mode, locally sensitive data must be uploaded to the cloud computing center, which increases the risk of privacy disclosure. In addition, the generation and transmission of large amounts of data makes it difficult to guarantee the reliability of data transmission in a complex environment.
Disclosure of Invention
The invention aims to solve the problems of high computational complexity, large communication load and delay and poor reliability of data transmission in the conventional method, and provides an economic dispatching method of a virtual power plant in an energy internet based on deep reinforcement learning.
The technical scheme adopted by the invention for solving the technical problems is as follows: the method for economically scheduling the virtual power plant in the energy internet based on deep reinforcement learning comprises the following steps:
step one, for any area i, collecting power generation side information and user side information from area i using the industrial side server and the power supply side server of area i, where i = 1, 2, …, I, and I is the total number of areas;
training an actor-critic network with the information collected from each area, to obtain, for each area, an actor-critic network trained with that area's information;
step two, deploying the trained actor-critic networks at the edge nodes of the corresponding areas;
and step three, the industrial side server and the power supply side server of each area collect information from the power generation side and the user side in real time, input the collected information into the actor-critic network on the corresponding edge node, and obtain the decision information of each area in real time.
The invention has the following beneficial effects: the invention provides a deep reinforcement learning-based economic dispatching method for a virtual power plant in the energy internet. Since real-time economic dispatch scenarios are considered, demand response and energy delivery are performed in real time. On the second layer, the agent manages the distributed power supplies and industrial users of its local area for online scheduling; compared with putting the scheduling of all areas into a cloud center, this reduces communication delay and the response time for industrial users. Computation and storage are completed in the edge node, the application program runs on the edge server, and new energy supplies power to the server nearby, so energy consumption can be significantly reduced. In the proposed framework, the first and second layers are edge computing layers, while the third layer is a cloud computing layer. The proposed three-layer edge computing architecture reduces the computational complexity of processing training tasks at the central node and further reduces the communication load between the VPP operator and the DG units, thereby also reducing the response time for industrial users, while preserving the privacy of industrial users and improving the reliability of data transmission.
Drawings
FIG. 1 is a diagram of an economic dispatch architecture proposed by the present invention;
FIG. 2 is a block diagram of a distributed power generation economic dispatch architecture utilizing a three-tier architecture based on edge computing as proposed by the present invention;
FIG. 3 is a diagram of an information delivery model for DRL-based VPP economic scheduling of the present invention;
in the figure: s_i is the real-time state of area i, a_i is the action corresponding to state s_i, r_i is the return value, π is the policy, V is the state value function, θ is the parameter of the actor network in a thread, θ_v is the parameter of the critic network in the thread, θ' is the parameter of the global actor network, and θ'_v is the parameter of the global critic network;
FIG. 4 is a graph of power from photovoltaic power generation, wind power generation, and controlled load, uncontrolled load power for a random day;
in the figure: PV represents photovoltaic, WT represents wind turbine, Controllable load represents the controllable load, and Uncontrollable load represents the uncontrollable load;
FIG. 5 is a graph of the return value as a function of iteration number;
FIG. 6 is a graph comparing the generated power of wind power with the actual power;
FIG. 7 is a graph of generated power versus actual power for a photovoltaic cell;
FIG. 8 is a graph of power generated by a gas turbine versus actual power;
FIG. 9 is a graph of the optimization results for a controllable load;
FIG. 10 is a graph comparing the cost of the inventive process and the DPG process.
Detailed Description
The first embodiment is as follows: the method for economically scheduling the virtual power plant in the energy internet based on the deep reinforcement learning comprises the following steps:
step one, for any area i, collecting power generation side information and user side information from area i using the industrial side server and the power supply side server of area i, where i = 1, 2, …, I, and I is the total number of areas;
training the actor-critic network of the VPP operator cloud server with the information collected from each area, to obtain, for each area, an actor-critic network trained with that area's information;
step two, deploying the trained actor-critic networks at the edge nodes of the corresponding areas;
and step three, the industrial side server and the power supply side server of each area collect information from the power generation side and the user side in real time, input the collected information into the actor-critic network on the corresponding edge node, and obtain the decision information of each area in real time.
The second embodiment is as follows: this embodiment differs from the first embodiment in that, in step one, the actor-critic network of the VPP operator cloud server is trained with the information collected from each area using an asynchronous method with 8 threads running in parallel.
The third embodiment is as follows: this embodiment differs from the first embodiment in that the objective function of the actor-critic network is as follows:

C_i = min Σ_{k=0}^{K} [ C_i^{pdp}(k) + C_i^{pom}(k) + C_i^{wdp}(k) + C_i^{wom}(k) + C_i^{ddp}(k) + C_i^{dom}(k) + C_i^{de}(k) + C_i^{d}(k) + λ·x_i(k)·L_i^{cl}(k) ]

wherein: C_i is the total operating cost of area i; C_i^{pdp}(k) is the initial depreciation cost of the photovoltaic investment of area i at time slot k, with k = 0, 1, …, K (24 hours are considered in the VPP, so K equals 23); C_i^{pom}(k) is the photovoltaic operation and maintenance cost of area i at time slot k; C_i^{wdp}(k) is the initial depreciation cost of the wind turbines of area i at time slot k; C_i^{wom}(k) is the wind turbine operation and maintenance cost of area i at time slot k; C_i^{ddp}(k) is the initial depreciation cost of the micro gas turbine of area i at time slot k; C_i^{dom}(k) is the micro gas turbine operation and maintenance cost of area i at time slot k; C_i^{de}(k) is the micro gas turbine environmental cost of area i at time slot k; C_i^{d}(k) is the fuel cost consumed by the micro gas turbine of area i at time slot k; λ is the compensation factor; L_i^{cl}(k) is the controllable load of area i in time slot k; and x_i(k) is the selected interruptible-load percentage vector of area i in time slot k, with values in [0, 1].
The fourth embodiment is as follows: this embodiment differs from the first embodiment in that the specific training process of the actor network in the actor-critic network is as follows:
the actor network consists of a μ network and a σ network, each composed of 2 fully connected layers;
the activation function of the 1st fully connected layer of both the μ network and the σ network is tanh, with input dimension 5 and output dimension h;
the activation function of the 2nd fully connected layer of both the μ network and the σ network is softplus, with input dimension h and output dimension m;
the power generation side and user side information is input into the μ network and the σ network to obtain their outputs; normal random sampling is then performed on the μ and σ outputs to obtain the 4-dimensional action output by the actor network.
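The following is a minimal PyTorch sketch of an actor of this shape. The class name, the hidden width h (64 here) and the use of torch.distributions are illustrative assumptions rather than the patent's implementation.

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        # mu branch and sigma branch, each: Linear(5, h) + tanh, then Linear(h, 4) + softplus
        def __init__(self, state_dim=5, hidden_dim=64, action_dim=4):
            super().__init__()
            self.mu_net = nn.Sequential(
                nn.Linear(state_dim, hidden_dim), nn.Tanh(),
                nn.Linear(hidden_dim, action_dim), nn.Softplus())
            self.sigma_net = nn.Sequential(
                nn.Linear(state_dim, hidden_dim), nn.Tanh(),
                nn.Linear(hidden_dim, action_dim), nn.Softplus())

        def forward(self, state):
            mu = self.mu_net(state)
            sigma = self.sigma_net(state) + 1e-6   # keep the standard deviation strictly positive
            return mu, sigma

        def sample_action(self, state):
            mu, sigma = self.forward(state)
            dist = torch.distributions.Normal(mu, sigma)
            action = dist.sample()                 # 4-dimensional action
            return action, dist.log_prob(action).sum(-1), dist.entropy().sum(-1)

Sampling from the Normal distribution reproduces the "normal random sampling on the μ and σ outputs" step described above.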
The fifth embodiment is as follows: this embodiment differs from the fourth embodiment in that the specific training process of the critic network in the actor-critic network is as follows:
the critic network is composed of fully connected layers;
the power generation side and user side information and the 4-dimensional action output by the actor network are input into the fully connected layers of the critic network, the outputs of the fully connected layers are concatenated, and a linear transformation of the concatenation yields the one-dimensional return value output by the critic network.
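A companion sketch of the critic, continuing the previous sketch under the same assumptions (PyTorch, hidden width 64, illustrative names):

    class Critic(nn.Module):
        # state and action are each encoded by a fully connected layer with tanh,
        # the encodings are concatenated, and a linear layer outputs a scalar value
        def __init__(self, state_dim=5, action_dim=4, hidden_dim=64):
            super().__init__()
            self.state_enc = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.Tanh())
            self.action_enc = nn.Sequential(nn.Linear(action_dim, hidden_dim), nn.Tanh())
            self.head = nn.Linear(2 * hidden_dim, 1)

        def forward(self, state, action):
            z = torch.cat([self.state_enc(state), self.action_enc(action)], dim=-1)
            return self.head(z)                    # one-dimensional value estimate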
The sixth embodiment is as follows: this embodiment differs from the fifth embodiment in that the return function of the actor-critic network is a weighted, cost-related function with weight values K_1, K_2, K_3 and K_4 (the expression appears only as an image in the original); the return is negative because the cost of the virtual power plant is to be minimized.
The training of the actor network is guided by the return value output by the critic network.
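Purely as an illustration of a reward of this shape (the patent's exact grouping of the cost terms under K_1–K_4 is not reproduced, so the grouping and the weight values below are assumptions):

    def slot_reward(dep_cost, om_cost, env_fuel_cost, dr_cost,
                    K1=1.0, K2=1.0, K3=1.0, K4=1.0):
        # Negative weighted combination of cost groups: lower VPP cost -> higher reward.
        # The grouping of the cost terms and the weights are illustrative assumptions.
        return -(K1 * dep_cost + K2 * om_cost + K3 * env_fuel_cost + K4 * dr_cost)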
Edge computing is used to provide computing services for the large numbers of devices near the network edge of a VPP. First, edge computing can greatly reduce the data transferred from the devices to the VPP operator through pre-processing. Second, the edge computing architecture can shift the computational burden to the edge. Fig. 1 shows the economic dispatch architecture proposed by the method of the present invention, which consists of four main components: a power source side server (PSS), an industrial user side server, a proxy edge server and a VPP operator cloud server. The power source side server connects the power devices through different communication technologies (e.g., 5G, WiFi); it collects and processes power generation data from the distributed power equipment and transmits the data to the proxy edge server in real time. The PSS also receives scheduling information from the proxy edge server and provides power to the industrial users. The industrial user side server is likewise connected to its equipment through different communication technologies (e.g., 5G, WiFi); it collects and processes the power consumption information of industrial users and transmits the data to the proxy edge server in real time. The proxy edge server makes local economic dispatching decisions according to the analysis results of the industrial user side server and the power supply side server, and interacts with the servers on both sides. The VPP operator cloud server meets the computing requirements of the proxy edge servers and manages each proxy; it not only helps the proxy servers provide real-time analysis and computation, but also collects the scheduling information of the managed proxies.
FIG. 2 illustrates the distributed power generation economic dispatch architecture proposed by the present invention, which uses a three-tier structure based on edge computing. First, the VPP operator sets up agents to manage distributed generation and industrial users in different regions. On the demand side, the users' controllable load participates in demand response, which can reduce load demand during peak hours. In contrast to the VPP operator, each agent is an edge computing server. The industrial user side server and the power source side server collect data from each distributed generation unit and extract and aggregate the data in real time. The distributed generation units may be photovoltaic generation, wind power generation and micro gas turbines. The proxy server provides the optimal economic dispatching strategy for its region and finally sends the decision information to the VPP operator. The proposed architecture supports both offline training and real-time online scheduling. In the offline training phase, the industrial side server and the power supply server process and collect information from the power generation side and the user side of a specific area and transmit the collected information to the VPP operator cloud server. The VPP operator cloud server performs model training on the large-scale offline data and transmits the trained model to the proxy edge server of the corresponding area. During real-time economic dispatching, the data of industrial users and distributed power supplies are collected by the two servers and transmitted to the proxy edge server, which feeds them as input into the previously trained model to obtain a real-time dispatching strategy. The three-tier economic dispatching model matches the distributed nature of the power supplies and mitigates the problem of large-scale data transmission in VPP economic dispatch. It is more flexible, adapts to the expansion of dynamic networks, and is therefore a more scalable solution.
The goal of economic dispatch by the VPP operator is to minimize the compensation paid to industrial users and the operating costs of the DG units (including photovoltaic, wind turbine and micro gas turbine). On the basis of minimizing the cost of the VPP operator, the proposed optimal economic scheduling algorithm fully considers C^{pom}, C^{wom} and C^{dom}. In particular, we also consider the environmental cost C^{de} and the fuel cost C^{d} of the micro gas turbine. The initial depreciation costs of the DG units are also taken into consideration and are denoted C^{pdp}, C^{wdp} and C^{ddp}, respectively. Considering the needs of the industrial users, the compensation cost for industrial users participating in demand response is also included and denoted C^{dr}. We treat industrial users as schedulable resources participating in the economic scheduling of the VPP. The proposed algorithm reduces the economic loss of the VPP during peak power consumption by curtailing the controllable load, which may shift load from peaks to valleys as user flexibility increases; in this case, the industrial user is equivalent to a virtual power generation resource. Therefore, the compensation cost for the demand side, C^{dr}, is added to the objective function of the proposed model; this compensation is paid to users who choose to shed controllable load. The objective function consists of two parts: the first part is the operating cost of the DG units, and the second part is the compensation cost for the demand side and the controllable load during system operation.
C = Σ_{i=1}^{I} C_i = Σ_{i=1}^{I} ( C_i^{DG} + C_i^{dr} )        (1)

where C is the total operating cost of managing the DG units and industrial users in the VPP, C_i is the operating cost of managing the DG units and industrial users in management area i, C_i^{DG} is the operating cost of the DG units in area i, and C_i^{dr} is the compensation cost of area i for industrial users participating in demand response.
In the real-time scheme, an edge agent of the VPP is denoted by i. In the proposed optimal economic dispatch model, three types of DG are considered: photovoltaic, wind turbine and micro gas turbine. The operating cost of a DG unit includes its initial depreciation cost and its operation and maintenance cost; for the micro gas turbine, the environmental and fuel costs are additionally considered. Here k denotes the time slot index, and P_i^{p}(k), P_i^{w}(k) and P_i^{d}(k) denote the actual consumed power of the photovoltaic, the wind turbine and the micro gas turbine of area i in time slot k, respectively.
(1) Photovoltaic: the initial depreciation cost of the photovoltaic investment, C_i^{pdp}(k), is expressed by equation (2) (given only as an image in the original), where r is the annual interest rate, c_p^{in} is the installation cost per unit capacity of the photovoltaic cells, K_p is the photovoltaic capacity coefficient, and n_p is the service life of the photovoltaic plant.
The operation and maintenance cost of the photovoltaic plant is

C_i^{pom}(k) = K_{pom} · P_i^{p}(k)        (3)

where C_i^{pom}(k) is the photovoltaic maintenance and operation cost and K_{pom} is the photovoltaic maintenance and operation cost coefficient.
(2) Wind turbine: the initial investment cost of the wind turbine is converted into a cost per unit of output power and, as the depreciation cost of the wind turbine, is included in its operating cost. The initial depreciation cost C_i^{wdp}(k) is given by equation (4) (given only as an image in the original), where c_w^{in} is the unit installation cost of the wind turbine, K_w is the capacity coefficient of the wind turbine, r is the annual interest rate, and n_w is the service life of the wind turbine.
The operation and maintenance cost of the wind turbine during operation can be expressed as

C_i^{wom}(k) = K_{wom} · P_i^{w}(k)        (5)

where K_{wom} is the operation and maintenance cost coefficient of the wind turbine.
(3) Micro gas turbine: the initial depreciation cost of the micro gas turbine, C_i^{ddp}(k), is modeled by equation (6) (given only as an image in the original), where c_d^{in} is the installation cost per unit capacity of the micro gas turbine, K_d is the capacity coefficient of the micro gas turbine, and n_d is its service life.
The operation and maintenance cost of the micro gas turbine is

C_i^{dom}(k) = K_{dom} · P_i^{d}(k)        (7)

where K_{dom} is the operation and maintenance cost coefficient of the micro gas turbine.
The environmental cost of the micro gas turbine is

C_i^{de}(k) = Σ_{m=1}^{M} β_m · α_{dm} · P_i^{d}(k)        (8)

where m indexes the emitted pollutants, M is the total number of pollutants, β_m is the treatment cost per unit of emission of pollutant m, and α_{dm} is the amount of pollutant m emitted per unit of electricity generated by the micro gas turbine.
The relationship between the power generation efficiency and the output power of the micro gas turbine is given by equation (9) (given only as an image in the original), where η_d is the power generation efficiency of the micro gas turbine and P_i^{d}(k) is its output power.
The fuel consumption characteristic of the micro gas turbine can be expressed as

C_i^{d}(k) = (c_d / L) · P_i^{d}(k) / η_d        (10)

where C_i^{d}(k) is the fuel cost, c_d is the natural gas price, and L is the lower heating value of natural gas.
According to the above description, the operating cost of the DG units is

C_i^{DG}(k) = C_i^{pdp}(k) + C_i^{pom}(k) + C_i^{wdp}(k) + C_i^{wom}(k) + C_i^{ddp}(k) + C_i^{dom}(k) + C_i^{de}(k) + C_i^{d}(k)        (11)
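A small sketch of how the per-slot cost terms above could be evaluated is given below; it covers only the terms whose formulas are spelled out here (O&M, environmental and fuel costs), and all coefficient names and numeric values are assumptions.

    def dg_variable_costs(P_p, P_w, P_d, eta_d,
                          K_pom=0.009, K_wom=0.029, K_dom=0.031,
                          beta_alpha=(0.02,), c_gas=2.5, L_gas=9.7):
        """Per-slot O&M, environmental and fuel costs for one area (illustrative values).

        P_p, P_w, P_d : consumed PV, wind and micro gas turbine power in the slot
        eta_d         : micro gas turbine generation efficiency in the slot
        beta_alpha    : products beta_m * alpha_dm for each pollutant m (assumed)
        """
        om_cost = K_pom * P_p + K_wom * P_w + K_dom * P_d          # eq. (3), (5), (7)
        env_cost = sum(ba * P_d for ba in beta_alpha)              # eq. (8)
        fuel_cost = c_gas / L_gas * P_d / eta_d                    # eq. (10)
        return om_cost, env_cost, fuel_cost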
the demand response can effectively integrate the potential of the user side response, thereby enhancing the safety, stability and economy of the power grid operation. In this context, we consider the demand response of an industrial user during the model building process. In order to achieve the best economic dispatch strategy, each agent selects the controllable load size to be reduced. This is inconvenient for industrial users as the controllable load is reduced, for which purpose it needs to be compensated. The VPP operator should provide power compensation to the user who chooses to curtail the controllable load. Controlling a variable of controllable load to be Xi(k) And a compensation coefficient lambda. Xi(k) Is a variable derived from the power information of all industrial users in the area, defined as the percentage of the maximum interruptible controllable load in each time slot of the industrial area considering agent i, with a compensation cost at the load end of
Figure BDA0002356951350000084
This approach may reduce or reduce part of the power consumption, thereby avoiding peak loads for industrial users. The load of the industrial user is obtained and divided into controllable loads
Figure BDA0002356951350000085
And uncontrollable load
Figure BDA0002356951350000086
Since controllable loads can respond directly to economic scheduling of VPPs, participation in VPPs is a primary consideration hereinAnd the controllable load is reduced in the scheduling process. The compensation cost of the managed controllable load of agent i can be expressed as:
Figure BDA0002356951350000087
where λ is the compensation factor, xi(k) Expressed as a vector of percentage of selected interruptible load, the range of values is 0,1]. The objective function of the economic dispatch of each agent i can be expressed as:
Figure BDA0002356951350000088
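A sketch of how an agent's objective (13) could be accumulated over a day from the terms above, with hypothetical helper names and an assumed compensation factor:

    def agent_daily_cost(slots, lam=0.8):
        """Sum C_i^DG(k) + C_i^dr(k) over the K+1 slots of one day (eq. (13)).

        slots: iterable of dicts with keys 'dg_cost' (C_i^DG(k)),
               'x' (selected interruptible percentage x_i(k)) and
               'controllable_load' (L_i^cl(k)); the names and lam are assumptions.
        """
        total = 0.0
        for s in slots:
            dr_cost = lam * s['x'] * s['controllable_load']   # eq. (12)
            total += s['dg_cost'] + dr_cost
        return total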
for the entire VPP system, the power balance constraint is a fundamental problem and should be fully considered in the model building process. In each management area of agent i, the total power consumption of the individual DG units should be equal to the total power consumption of the industrial users. For the total power demand of an industrial user, the curtailment of the controllable load of the industrial user by the agent i, i.e. the
Figure BDA0002356951350000091
The actual power consumption of the DG in each agent management area is limited by the actual power generation in that area. The actual power of the DG is photovoltaic, wind energy and micro gas turbine
Figure BDA0002356951350000092
Respectively as follows:
Figure BDA0002356951350000093
Figure BDA0002356951350000094
Figure BDA0002356951350000095
the percentage of interruptible load in the industrial domain managed by agent i should not exceed the percentage of maximum interrupt controllable load per timeslot, i.e. the percentage of maximum interrupt controllable load per timeslot
0≤xi(k)≤Xi(k) (18)
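For illustration, a simple feasibility check of constraints (14)–(18) for one slot; the tolerance and argument names are assumptions:

    def feasible(P_p, P_w, P_d, x, P_p_max, P_w_max, P_d_max, X_max,
                 L_ul, L_cl, tol=1e-3):
        # Power balance (14) within a small tolerance, generation bounds (15)-(17),
        # and interruptible-load percentage bound (18).
        balance_ok = abs(P_p + P_w + P_d - (L_ul + (1.0 - x) * L_cl)) <= tol
        bounds_ok = (0.0 <= P_p <= P_p_max and
                     0.0 <= P_w <= P_w_max and
                     0.0 <= P_d <= P_d_max and
                     0.0 <= x <= X_max)
        return balance_ok and bounds_ok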
The VPP operator manages all the areas and aggregates the scheduling information of each area. Based on the above description, the objective of the optimal economic scheduling policy is defined as

C = min Σ_{i=1}^{I} C_i        (19)
in the invention, the optimal economic dispatching strategy provided by the invention minimizes the power generation cost of the distributed power supply and simultaneously meets the limitations of power balance and power generation capacity of the VPP.
To make the solution more practical, we incorporate various cost components into the objective function. The objective function established by the invention is a non-linear cost function, although the invention does not add the constraint of non-convexity, in a real scene, the power generation unit is usually influenced by the valve point effect, and the cost function is usually non-convex. To address these difficulties, previous work has often employed heuristic methods. The deep reinforcement learning method adopted by the user can adapt to the nonlinear non-convex condition, and the nonlinear and non-convex constraints are relaxed. In a practical economic scheduling scheme, the scheduling process should generally be completed in a short time. Due to the stochastic nature of photovoltaic and wind power generation and the flexibility of the load, the state transition from the previous time slot to the next constitutes a large state space and the state information needs to be updated quickly. The DRL, as an effective artificial intelligence algorithm, has achieved great success in many areas of problem resolution, such as the internet of things, where it can find different optimization strategies within a reasonable time frame. In the invention, the provided DRL-based algorithm relaxes the constraint of nonlinear characteristics, and improves the solving precision by fitting a value function through a deep learning algorithm. The economic scheduling problem in the invention is nonlinear, the transition probability is unknown, the state space is large and continuous, and the DRL can calculate the probability distribution of state transition without environment information. The off-line training model can be directly applied to on-line economic dispatching, and the optimal economic dispatching algorithm based on the DRL provided by the invention obviously improves the calculation efficiency.
The information delivery model for the DRL-based VPP economic scheduling is shown in Fig. 3. The algorithm adopts offline data training: the power supply side server and the user side server collect historical data and transmit the information to the VPP cloud server. The VPP cloud server uses DRL to train a network independently for the data transmitted from each area, thereby obtaining the economic scheduling strategies of the different areas. In the online economic dispatching stage, each proxy edge server obtains the corresponding network weights from the VPP cloud server. The power side server and the industrial user side server gather the real-time transmission information and power requirements and then transmit all the gathered information to the corresponding proxy edge server. The proxy edge server obtains the real-time optimal economic dispatching strategy from the trained weights and the real-time state information, and feeds the result back to the servers on both sides.
Offline training and online scheduling are carried out at different nodes. First, the model is fully trained on offline data in the cloud center. Then, the proposed DRL-based method is combined with edge computing, and the trained model is placed at the edge node so that it can be applied online in a real environment. If the online environment deviates slightly from the offline training environment, the offline-trained model can absorb these variations and dynamically adjust its actions to achieve optimal scheduling. During online scheduling, the distributed generation data and the demand data of industrial users are transmitted directly to the edge node rather than to the cloud center, which is better suited to real-time economic scheduling scenarios.
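A minimal sketch of the online stage at one edge node, assuming the Actor defined earlier; the helper names, the weight file and the use of the Gaussian mean as a deterministic dispatch action are assumptions:

    def load_edge_actor(weights_path, actor):
        # Pull the offline-trained weights from the VPP cloud server once.
        actor.load_state_dict(torch.load(weights_path, map_location="cpu"))
        actor.eval()
        return actor

    def online_dispatch_step(actor, state_5dim):
        # state: [PV power, wind power, micro gas turbine capacity,
        #         controllable load, uncontrollable load] for the current slot
        state = torch.tensor(state_5dim, dtype=torch.float32)
        with torch.no_grad():
            mu, _ = actor(state)     # mean of the Gaussian policy used here (assumption)
        return mu                    # 4-dim action: PV, wind, MT consumption and load-control coefficient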
For the VPP we consider 24 hours, indexed by k ∈ {0, 1, …, 23}. The goal of economic scheduling is to find an optimal scheduling solution that minimizes the operating cost of the VPP. For area i, the state set is S_i, with s_i ∈ S_i and

s_i(k) = [ P_i^{p,max}(k), P_i^{w,max}(k), P_i^{d,max}(k), L_i^{cl}(k), L_i^{ul}(k) ]

aggregated by the power supply side server and the industrial user side server; the components are the photovoltaic power, the wind power, the actual generation capacity of the micro gas turbine, the controllable load of the industrial users and the uncontrollable load demand in time slot k. The action set is A_i, with a_i ∈ A_i and

a_i(k) = [ P_i^{p}(k), P_i^{w}(k), P_i^{d}(k), x_i(k) ]

whose components are the actual consumed power of the photovoltaic, wind and micro gas turbine generation in time slot k and the control coefficient of the controllable load. A_i is a continuous action space satisfying the power balance constraint, and a_i is a selected action satisfying the action constraints.
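For concreteness, a toy per-area environment consistent with this state/action definition is sketched below; the data source and the cost function are hypothetical placeholders, not part of the patent.

    class VppAreaEnv:
        """Toy per-area environment: 5-dim state, 4-dim action, negative-cost reward."""
        def __init__(self, day_data, cost_fn):
            self.day_data = day_data      # list of 24 dicts with the 5 state components
            self.cost_fn = cost_fn        # maps (state, action) to the slot cost
            self.k = 0

        def reset(self):
            self.k = 0
            return self._state()

        def _state(self):
            d = self.day_data[self.k]
            return [d['pv_max'], d['wt_max'], d['mt_max'], d['l_cl'], d['l_ul']]

        def step(self, action):           # action = [P_p, P_w, P_d, x]
            s = self._state()
            reward = -self.cost_fn(s, action)   # minimizing cost = maximizing reward
            self.k += 1
            done = self.k >= 24                 # 24 one-hour slots per episode
            next_s = self._state() if not done else None
            return next_s, reward, done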
In each time slot, we introduce a policy π in order to find a mapping from states to actions; the policy specifies a conditional probability distribution over actions given the current state, π(a_i | s_i). The next state is denoted s'_i and the initial state s_i^0 (the original expression is reproduced only as an image).
In practice the state transition probabilities are unknown, and both the state space and the action space are continuous. Given s_i and a_i, a return value r_i(s_i, a_i) related to the objective function is obtained. The return value is the key component for evaluating the quality of an action and guiding the learning process. To set the return well, it was tuned through repeated experiments as a cost-related function; equation (20) gives this weighted, cost-related expression with weights K_1, K_2, K_3 and K_4 (reproduced only as an image in the original). The return value is negative because the cost of the virtual power plant is to be minimized. The total return over the K+1 hours is obtained as

R_i = Σ_{k=0}^{K} r_i( s_i(k), a_i(k) )        (21)

To maximize the return, the strategy is updated by gradient ascent in the proposed algorithm, i.e.

θ ← θ + α ∇_θ J(θ),  with J(θ) = E_π[ R_i ]        (22)

The state value function V^π(s_i) and the state-action value function Q^π(s_i, a_i) are then defined as follows, where γ is the discount factor describing how future returns are discounted:

V^π(s_i) = E_π[ Σ_{k=0}^{K} γ^k r_i(k) | s_i(0) = s_i ]        (23)

Q^π(s_i, a_i) = E_π[ Σ_{k=0}^{K} γ^k r_i(k) | s_i(0) = s_i, a_i(0) = a_i ]        (24)

The goal is to select the best strategy, i.e. to maximize the state-action value function:

π* = arg max_π Q^π(s_i, a_i)        (25)
in order to find the optimal economic dispatch strategy, it is usually considered to represent the function by using a data table. However, this approach limits the scale of the reinforcement learning algorithm. When the size of the problem is too large, the storage space for storing the table may be large, and it takes a long time to accurately calculate each value in the table. If learning experience is obtained from a small training data set, the generalization ability of the training pattern is insufficient. In order to solve the above problem, a state value function and a state action value function are parameterized using a deep neural network in consideration of a large-scale state action space. In the algorithm provided by the invention, the deep neural network is used for extracting the characteristics of large-scale input state data to train the economic dispatching model, so that the trained model is more generalized. Starting from the first layer of neurons, the mind is entered by a non-linear activation functionAnd continuously transmitting downwards through the next layer of the element until reaching an output layer. Since the nonlinear function is essential for the deep neural network, the deep neural network has sufficient capability to extract data features. ThetavFor approximating the function V(s) of state valuesi) And the state function Q(s)i,ai)。
Q(si,ai)≈Q(si,aiv) (26)
V(si)≈V(siv) (27)
The deep neural network is used as a function approximator, and the parameter theta of the deep neural network is a strategy parameter. Pi obeys Gaussian distribution and can be used to solve the continuous motion space problem, i.e.
Figure BDA0002356951350000121
The per-slot return of each area i is given by (20), from which the cumulative return R_i follows (equation (29) is reproduced only as an image in the original). In our scenario, to increase the probability of policies with higher return values, a policy-gradient update is performed; the gradient is computed as

∇_θ log π(a_i | s_i; θ) ( R_i − b(s_i) )        (30)

where R_i is the total return in area i and is estimated by Q(s_i, a_i), i.e. R_i ≈ Q(s_i, a_i); b(s_i) is a baseline used to reduce the estimation error, and V(s_i) is used as the baseline, i.e. b(s_i) ≈ V(s_i).

A^π(s_i, a_i; θ, θ_v) = Q^π(s_i, a_i; θ_v) − V^π(s_i; θ_v)        (31)
Equation (31) is the advantage function, which expresses the advantage of the action value function over the state value function. The advantage is positive if the action value function exceeds the state value function and negative otherwise. The parameters are updated in the direction that increases the policy probability when the advantage is positive and decreases it when the advantage is negative, so the algorithm converges faster when the advantage function is used. The policy gradient therefore becomes

∇_θ log π(a_i | s_i; θ) · A^π(s_i, a_i; θ, θ_v)        (32)

The policy gradient update and the corresponding updates of the parameters θ_v and θ are given by equations (33), (34) and (35) (reproduced only as images in the original): the policy parameters are updated along the advantage-weighted log-probability gradient, and the value parameters are updated to reduce the squared advantage. To make the trained policy more exploratory and prevent premature convergence to a suboptimal deterministic policy, entropy regularization is added to the policy gradient (equations (36) and (37), reproduced only as images in the original); in the standard advantage actor-critic formulation this takes the form

∇_θ log π(a_i | s_i; θ) · A^π(s_i, a_i; θ, θ_v) + β ∇_θ H( π(s_i; θ) )

where H(·) is the policy entropy and β its weight.
When training neural networks, the data should be independent and identically distributed. To break the correlation between data, an asynchronous method is adopted in which multiple threads run in parallel, each with its own copy of the environment. During training, the threads jointly maintain a global actor-critic network, and each thread keeps a local copy of the global network weights. Each local network accumulates gradient updates and passes the gradients to the global network for the parameter update; the local network then synchronizes its parameters with the global network. In this way the local network not only updates its own independent network by learning from the environment state, but also interacts with the global network. The globally shared parameter vectors are defined as θ' and θ'_v, and the global updates of θ' and θ'_v from the accumulated thread gradients are given by equations (38) and (39) (reproduced only as images in the original).
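A compressed sketch of one such asynchronous worker (an A3C-style update) is given below, assuming the Actor, Critic and VppAreaEnv sketches above; the optimizer over the global parameters, the learning rate and the loss weighting are assumptions.

    def worker_update(global_actor, global_critic, opt, env, gamma=0.9, beta=0.01):
        """One asynchronous worker episode: sync local copies, roll out one 24-slot
        day, then push the accumulated gradients to the global networks."""
        local_actor, local_critic = Actor(), Critic()
        local_actor.load_state_dict(global_actor.state_dict())
        local_critic.load_state_dict(global_critic.state_dict())

        state = torch.tensor(env.reset(), dtype=torch.float32)
        done, traj = False, []
        while not done:
            action, log_prob, entropy = local_actor.sample_action(state)
            next_state, reward, done = env.step(action.tolist())
            traj.append((state, action, log_prob, entropy, reward))
            if not done:
                state = torch.tensor(next_state, dtype=torch.float32)

        R, actor_loss, critic_loss = 0.0, 0.0, 0.0
        for s, a, log_prob, entropy, r in reversed(traj):
            R = r + gamma * R                       # discounted return target
            advantage = R - local_critic(s, a)      # advantage w.r.t. the critic estimate
            actor_loss += -log_prob * advantage.detach() - beta * entropy
            critic_loss += advantage.pow(2)

        opt.zero_grad()
        (actor_loss + critic_loss).backward()
        # Copy the worker's gradients onto the shared (global) parameters, then step.
        for lp, gp in zip(list(local_actor.parameters()) + list(local_critic.parameters()),
                          list(global_actor.parameters()) + list(global_critic.parameters())):
            gp.grad = lp.grad.clone()
        opt.step()

With 8 such workers running in parallel threads (as in the second embodiment), each maintains its own environment copy and periodically pushes gradients to, and pulls weights from, the global network.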
In this way, each area achieves optimal economic dispatch. In the numerical experiments of the offline training process, 8 threads are implemented, and the VPP operator communicates with each area and computes C. Based on this algorithm, an economic dispatch model for each area i is obtained. In the online scheduling phase, each proxy edge server (agent i) first obtains the corresponding network weights from the VPP cloud server. The DRL-based economic dispatch model is shown in Fig. 3.
Experimental part
To train the DRL-based economic dispatch model, we use an offline data set of load data from photovoltaic, wind and micro gas turbine generation and from industrial users. Fig. 4 shows the power of photovoltaic and wind generation and the power of the controllable and uncontrollable load on a randomly chosen day. The maximum power of the micro gas turbine is set to 200 kW. Since the industrial load consists mainly of various industrial processes, the power demand generally does not vary much and shows no particularly pronounced peak-valley differences. The periods of higher load demand are 9:00-10:00, 12:00-14:00 and 19:00-21:00, and the period of lower load demand is 1:00-5:00. It can be seen that photovoltaic and wind generation have large peak-valley differences; the peak period of photovoltaic generation is 10:00-16:00 and that of wind generation is 10:00-18:00. The photovoltaic and wind generation power over the day and the power consumption of the controllable and uncontrollable load are randomly generated.
The emission costs of pollution and the operating and maintenance costs of photovoltaic, wind power generation and micro gas turbines are listed in tables 1 and 2.
TABLE 1
(Table 1 is reproduced as an image in the original; its contents are not recovered here.)
TABLE 2
(Table 2 is reproduced as an image in the original; its contents are not recovered here.)
The structure of the neural networks in the DRL-based algorithm used in the invention is as follows. The state is expressed as a 5-dimensional vector and the resulting action has 4 dimensions; the action is obtained by random sampling from a normal distribution conditioned on the state, and a neural network computes the μ and σ parameters required by that normal distribution. The state is input into the μ network and the σ network, respectively, yielding 4-dimensional μ and σ parameters. The μ network consists of 2 MLP layers: the first layer has input dimension 5 and output dimension h and is activated with tanh; the second layer has input dimension h and output dimension m and is activated with softplus. The σ network likewise consists of 2 MLP layers: the first layer has input dimension 5 and is activated with tanh, and the output of the two-layer network has dimension 4 and is activated with softplus; to ensure that the σ network does not output 0, 1×10⁻⁶ is added to the output σ vector. The 4-dimensional action is then sampled randomly from the normal distribution. From the state and the action, the Q value is computed with the critic network. In the critic network, the state is encoded with one MLP of input dimension 5, activated with tanh, and the action is encoded with another MLP, also activated with tanh. The two encoded outputs are concatenated and passed through a linear transformation, and the final output dimension is 1. The actor-critic setup thus implements two neural networks. The discount coefficient is 0.90 and the entropy weight is 0.01. Typically, the actor is updated according to the return value produced by the critic, and the critic is updated faster than the actor. Convergence speeds up as the learning rate increases; however, a learning rate that is too high may lead to a local rather than a global optimum, so a moderate learning rate is used.
The numerical experiments were carried out on a computer with an 8-core CPU and 16 GB of memory. The number of threads is 8, i.e. each local actor and critic network corresponds to one sub-thread, for a total of 8 threads. The sub-threads learn the environment asynchronously and regularly push their learning results to the global network. There are many random choices at the beginning of learning, but after many iterations the economic dispatch model converges and selects the actions that optimize the objective. We train the optimal economic scheduling strategy on the offline data set. The main advantage of DRL is that, after being fully trained on such offline data, the model can be applied online in a real environment: the online environment changes only slightly, and the DRL model can learn these changes and dynamically adjust its actions to achieve optimal scheduling.
To verify the convergence of the algorithm, 100 days of data are sampled as training data; each episode runs over one of the 100 days, and after about 45,000 episodes the model produces the optimal actions. Each episode has 24 steps, where each step is one hour; the iterative process is shown in Fig. 5. The actions are obtained by random sampling from a normal distribution conditioned on the state. The algorithm fluctuates strongly during the first 30,000 episodes, mainly because of the randomness of policy selection, i.e. it is still exploring; due to the action-interval constraints and the equality constraints, however, the fluctuation interval stays roughly between -300 and -400. After about 32,000 episodes the training improves markedly, as the model learns how to select the optimal actions, and from about 35,000 episodes the model begins to converge. The training results show that the proposed model, once fully trained, can minimize the cost of the VPP operator. Although there are many random choices and many iterations at the beginning of learning, the deep reinforcement learning model converges and learns to choose actions close to the optimal target value.
In a virtual power plant, photovoltaic and wind generation are cheaper and more environmentally friendly than the micro gas turbine, so the trained strategy relies mainly on wind and photovoltaic generation. The load is therefore mainly supplied by wind and photovoltaic power, and the remainder is supplemented by the gas turbine or covered by curtailing controllable load through demand response. Figs. 6, 7 and 8 compare the generated power of wind, photovoltaic and the gas turbine with the actual consumed power; the dark grey curve is the generated power, the light grey curve the actual consumed power, the horizontal axis is time in hours and the vertical axis is power. As can be seen from Figs. 6 and 7, the difference between the actual generation and the final consumption of wind and photovoltaic power is approximately zero, and the actual output of photovoltaic and wind generation is small during 1:00-7:00 and 23:00-24:00; the load in these periods must be supplied by the micro gas turbine. As can be seen from Fig. 8, during 1:00-7:00 and 23:00-24:00 the micro gas turbine is the main power supply unit. As can be seen from Fig. 9, during 20:00-24:00 the weight on controllable-load shedding is high and almost all of it is shed, because the electricity demand of the industrial users is large and the gas turbine is expensive in this period. It can therefore be concluded that, using the proposed algorithm to minimize the cost of the virtual power plant, the early learning stage is relatively random under the preset return values, and during training the model gradually learns the correct strategy selection, minimizing the cost of the virtual power plant by stably controlling distributed generation and demand response.
To verify the effectiveness of the proposed method, we compare it with other reinforcement learning algorithms, in particular with the deterministic policy gradient (DPG) algorithm, which can also handle this continuous action space problem. The results are shown in Fig. 10: the light grey curve is DPG and the dark grey curve is the proposed DRL-based algorithm. Comparing the costs of DPG and the proposed DRL-based algorithm over 30 days, the figure shows that the cost of the proposed method is clearly lower from day 22 onward. Compared with the proposed method, DPG uses the return at the current moment as an unbiased estimate of the action-value function under the current policy, so the resulting strategy has higher variance, generalizes less well and is unstable in some cases. The proposed method uses a neural network to fit the action-value function and subtracts a baseline, resulting in smaller variance. To break the correlation between data, an asynchronous update mechanism creates multiple parallel environments; because the parallel environments do not interfere with each other, the sub-threads can simultaneously update the parameters of the main network.
TABLE 3
(Table 3 is reproduced as an image in the original; it compares the run times of the proposed method with DDPG and DPG.)
We set the number of episodes to 45,000 and compare the run times of the different methods against DDPG and DPG. As can be seen from Table 3, compared with the other deep reinforcement learning methods adapted to solving the economic dispatch of the virtual power plant, the time complexity of the proposed method is the lowest. Because each decision step takes only a few milliseconds, in a real-time virtual power plant economic dispatching scenario a decision can be made within milliseconds from the state input. A traditional heuristic method, by contrast, must re-run the optimization process for each state, which has higher time complexity.
The invention is suited to the stochastic nature of distributed renewable generation and provides a deep-reinforcement-learning-based optimal economic scheduling algorithm for VPPs. A framework based on edge computing is further used so that the optimal scheduling solution can be obtained with lower computational complexity. The performance of the proposed algorithm is evaluated with real-world meteorological and load data, and the experimental results show that the proposed DRL-based model can successfully learn the characteristics of distributed generation and industrial user demand in the economic scheduling problem of the virtual power plant, and learns to select actions that minimize the cost of the virtual power plant. Compared with DPG, the proposed method performs better; compared with DPG and DDPG, it has lower time complexity.
The above-described calculation examples of the present invention are merely to explain the calculation model and the calculation flow of the present invention in detail, and are not intended to limit the embodiments of the present invention. It will be apparent to those skilled in the art that other variations and modifications of the present invention can be made based on the above description, and it is not intended to be exhaustive or to limit the invention to the precise form disclosed, and all such modifications and variations are possible and contemplated as falling within the scope of the invention.

Claims (6)

1. A deep reinforcement learning-based economic dispatching method for a virtual power plant in the energy internet, characterized by comprising the following steps:
step one, for any area i, collecting power generation side information and user side information from area i using the industrial side server and the power supply side server of area i, where i = 1, 2, …, I, and I is the total number of areas;
training an actor-critic network with the information collected from each area, to obtain, for each area, an actor-critic network trained with that area's information;
step two, deploying the trained actor-critic networks at the edge nodes of the corresponding areas;
and step three, the industrial side server and the power supply side server of each area collect information from the power generation side and the user side in real time, input the collected information into the actor-critic network on the corresponding edge node, and obtain the decision information of each area in real time.
2. The deep reinforcement learning-based economic dispatching method for a virtual power plant in the energy internet according to claim 1, wherein in step one the actor-critic network of the VPP operator cloud server is trained with the information collected from each area using an asynchronous method with 8 threads running in parallel.
3. The deep reinforcement learning-based economic dispatching method for a virtual power plant in the energy internet according to claim 1, wherein the objective function of the actor-critic network is as follows:

C_i = min Σ_{k=0}^{K} [ C_i^{pdp}(k) + C_i^{pom}(k) + C_i^{wdp}(k) + C_i^{wom}(k) + C_i^{ddp}(k) + C_i^{dom}(k) + C_i^{de}(k) + C_i^{d}(k) + λ·x_i(k)·L_i^{cl}(k) ]

wherein: C_i is the total operating cost of area i; C_i^{pdp}(k) is the initial depreciation cost of the photovoltaic investment of area i at time slot k, with k = 0, 1, …, K; C_i^{pom}(k) is the photovoltaic operation and maintenance cost of area i at time slot k; C_i^{wdp}(k) is the initial depreciation cost of the wind turbines of area i at time slot k; C_i^{wom}(k) is the wind turbine operation and maintenance cost of area i at time slot k; C_i^{ddp}(k) is the initial depreciation cost of the micro gas turbine of area i at time slot k; C_i^{dom}(k) is the micro gas turbine operation and maintenance cost of area i at time slot k; C_i^{de}(k) is the micro gas turbine environmental cost of area i at time slot k; C_i^{d}(k) is the fuel cost consumed by the micro gas turbine of area i at time slot k; λ is the compensation factor; L_i^{cl}(k) is the controllable load of area i in time slot k; and x_i(k) is the selected interruptible-load percentage vector of area i in time slot k, with values in [0, 1].
4. The deep reinforcement learning-based economic dispatching method for the virtual power plant in the energy internet according to claim 1, wherein the specific training process of the actor network in the actor-critic network is as follows:
the actor network consists of a mu network and a sigma network, each composed of 2 fully connected layers;
the activation function of the 1st fully connected layer of both the mu network and the sigma network is tanh, with input dimension 5 and output dimension h;
the activation function of the 2nd fully connected layer of both the mu network and the sigma network is softplus, with input dimension h and output dimension m;
the power generation side and user side information is input into the mu network and the sigma network to obtain their outputs; normal random sampling is then performed on the outputs of the mu network and the sigma network to obtain the 4-dimensional action output by the actor network.
5. The deep reinforcement learning-based economic dispatching method for the virtual power plant in the energy internet according to claim 4, wherein the specific training process of the critic network in the actor-critic network is as follows:
the critic network is composed of fully connected layers;
the power generation side and user side information and the 4-dimensional action output by the actor network are input into the fully connected layers of the critic network, the outputs of the fully connected layers are concatenated, and a linear transformation is applied to the concatenation result to obtain the one-dimensional return value output by the critic network.
6. The deep reinforcement learning-based economic dispatching method for the virtual power plant in the energy internet according to claim 5, wherein the return function of the actor-critic network is given by the expression of Figure FDA0002356951340000021, wherein K1, K2, K3 and K4 are weight values.
CN202010010410.XA 2020-01-06 2020-01-06 Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet Active CN111242443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010010410.XA CN111242443B (en) 2020-01-06 2020-01-06 Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet

Publications (2)

Publication Number Publication Date
CN111242443A true CN111242443A (en) 2020-06-05
CN111242443B CN111242443B (en) 2023-04-18

Family

ID=70876028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010010410.XA Active CN111242443B (en) 2020-01-06 2020-01-06 Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet

Country Status (1)

Country Link
CN (1) CN111242443B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103824134A (en) * 2014-03-06 2014-05-28 河海大学 Two-stage optimized dispatching method for virtual power plant
US20170024643A1 (en) * 2015-07-24 2017-01-26 Google Inc. Continuous control with deep reinforcement learning
CN108604310A (en) * 2015-12-31 2018-09-28 威拓股份有限公司 Method, controller and the system of distribution system are controlled for using neural network framework
CN109976909A (en) * 2019-03-18 2019-07-05 中南大学 Low delay method for scheduling task in edge calculations network based on study
US20190318244A1 (en) * 2019-06-27 2019-10-17 Intel Corporation Methods and apparatus to provide machine programmed creative support to a user
CN110443447A (en) * 2019-07-01 2019-11-12 中国电力科学研究院有限公司 A kind of method and system learning adjustment electric power system tide based on deeply

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NAT. ENERGY: "Using peer-to-peer energy-trading platforms to incentivize prosumers to form federated power plants" *
陈春武 (Chen Chunwu): "Research on the economic operation model of virtual power plants in a smart grid environment" *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738627B (en) * 2020-08-07 2020-11-27 中国空气动力研究与发展中心低速空气动力研究所 Wind tunnel test scheduling method and system based on deep reinforcement learning
CN111738627A (en) * 2020-08-07 2020-10-02 中国空气动力研究与发展中心低速空气动力研究所 Wind tunnel test scheduling method and system based on deep reinforcement learning
CN112381359A (en) * 2020-10-27 2021-02-19 惠州蓄能发电有限公司 Multi-critic reinforcement learning power economy scheduling method based on data mining
CN113191680A (en) * 2021-05-21 2021-07-30 上海交通大学 Self-adaptive virtual power plant distributed architecture and economic dispatching method thereof
CN113315172A (en) * 2021-05-21 2021-08-27 华中科技大学 Distributed source load data scheduling system of electric heating comprehensive energy
CN113191680B (en) * 2021-05-21 2023-08-15 上海交通大学 Self-adaptive virtual power plant distributed architecture and economic dispatching method thereof
CN114301909A (en) * 2021-12-02 2022-04-08 阿里巴巴(中国)有限公司 Edge distributed management and control system, method, equipment and storage medium
CN114301909B (en) * 2021-12-02 2023-09-22 阿里巴巴(中国)有限公司 Edge distributed management and control system, method, equipment and storage medium
CN114244679A (en) * 2021-12-07 2022-03-25 国网福建省电力有限公司经济技术研究院 Layered control method for communication network of virtual power plant under cloud-edge-end architecture
CN113962390B (en) * 2021-12-21 2022-04-01 中国科学院自动化研究所 Method for constructing diversified search strategy model based on deep reinforcement learning network
CN113962390A (en) * 2021-12-21 2022-01-21 中国科学院自动化研究所 Method for constructing diversified search strategy model based on deep reinforcement learning network
CN114862177A (en) * 2022-04-29 2022-08-05 国网江苏省电力有限公司南通供电分公司 Energy interconnection energy storage and distribution method and system
CN115062869A (en) * 2022-08-04 2022-09-16 国网山东省电力公司东营供电公司 Comprehensive energy scheduling method and system considering carbon emission
CN116111599A (en) * 2022-09-08 2023-05-12 贵州电网有限责任公司 Intelligent power grid uncertainty perception management control method based on interval prediction

Also Published As

Publication number Publication date
CN111242443B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN111242443B (en) Deep reinforcement learning-based economic dispatching method for virtual power plant in energy internet
Lin et al. Deep reinforcement learning for economic dispatch of virtual power plant in internet of energy
CN112615379B (en) Power grid multi-section power control method based on distributed multi-agent reinforcement learning
Zeng et al. A potential game approach to distributed operational optimization for microgrid energy management with renewable energy and demand response
Li et al. Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning
JP7261507B2 (en) Electric heat pump - regulation method and system for optimizing cogeneration systems
Du et al. Distributed MPC for coordinated energy efficiency utilization in microgrid systems
Niknam et al. A new multi-objective reserve constrained combined heat and power dynamic economic emission dispatch
Xi et al. A wolf pack hunting strategy based virtual tribes control for automatic generation control of smart grid
CN108039737B (en) Source-grid-load coordinated operation simulation system
Du et al. Game-theoretic formulation of power dispatch with guaranteed convergence and prioritized best response
Xi et al. Automatic generation control based on multiple-step greedy attribute and multiple-level allocation strategy
CN111934360B (en) Virtual power plant-energy storage system energy collaborative optimization regulation and control method based on model predictive control
CN106026084B (en) A kind of AGC power dynamic allocation methods based on virtual power generation clan
Zhang et al. A cyber-physical-social system with parallel learning for distributed energy management of a microgrid
Xi et al. A deep reinforcement learning algorithm for the power order optimization allocation of AGC in interconnected power grids
CN114744687A (en) Energy regulation and control method and system of virtual power plant
CN114331059A (en) Electricity-hydrogen complementary park multi-building energy supply system and coordinated scheduling method thereof
CN115409431A (en) Distributed power resource scheduling method based on neural network
Bi et al. Real-time energy management of microgrid using reinforcement learning
CN115795992A (en) Park energy Internet online scheduling method based on virtual deduction of operation situation
Yin et al. Deep Stackelberg heuristic dynamic programming for frequency regulation of interconnected power systems considering flexible energy sources
Riemer-Sørensen et al. Deep reinforcement learning for long term hydropower production scheduling
CN111767621A (en) Multi-energy system optimization scheduling method based on knowledge migration Q learning algorithm
CN117117878A (en) Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant