CN117117878A - Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning - Google Patents

Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning

Info

Publication number
CN117117878A
Authority
CN
China
Prior art keywords
reinforcement learning
load
neural network
electricity
model
Prior art date
Legal status
Pending
Application number
CN202310777311.8A
Other languages
Chinese (zh)
Inventor
张佳雯
张�成
蔡文嘉
何行
董重重
肖燕婷
田猛
张芹
魏解
吴明珍
张蕾
吴悠
冉艳春
胡亚天
王兹玥
Current Assignee
Wuhan University WHU
Metering Center of State Grid Hubei Electric Power Co Ltd
Original Assignee
Wuhan University WHU
Metering Center of State Grid Hubei Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU and Metering Center of State Grid Hubei Electric Power Co Ltd
Priority claimed from application CN202310777311.8A
Publication of CN117117878A
Legal status: Pending


Classifications

    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F18/23: Pattern recognition; clustering techniques
    • G06N3/0463: Neural network architectures; neocognitrons
    • G06N3/084: Learning methods; backpropagation, e.g. using gradient descent
    • G06N3/092: Learning methods; reinforcement learning
    • G06Q10/06315: Needs-based resource requirements planning or analysis
    • G06Q50/06: ICT specially adapted for energy or water supply
    • H02J3/0075: Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources, providing alternative feeding paths according to economic or energy efficiency considerations, e.g. economic dispatch
    • H02J3/008: Circuit arrangements for AC mains or AC distribution networks involving trading of energy or energy transmission rights
    • H02J3/06: Controlling transfer of power between connected networks; controlling sharing of load between connected networks
    • H02J3/144: Demand-response operation of the power transmission or distribution network
    • H02J2203/10: Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • H02J2310/60: Limiting power consumption in the network or in one section of the network, e.g. load shedding or peak shaving
    • H02J2310/64: The condition being economic, e.g. tariff based load management


Abstract

The application relates to a power grid demand side response potential evaluation and load regulation method based on an artificial neural network and multi-agent reinforcement learning, which comprises the following steps: step S1, clustering the customer electricity consumption data by using an improved k-means clustering algorithm to generate electricity consumption behavior labels; step S2, establishing a partially observable Markov game model; step S3, building and training a multi-layer perceptron neural network model; and step S4, solving the constructed load regulation model by utilizing multi-agent reinforcement learning, and outputting an optimal time-of-use electricity price recommendation and a load regulation scheme. The application improves the traditional reinforcement learning modeling method and algorithm, and utilizes the centralized training with decentralized execution (CTDE) core framework of multi-agent reinforcement learning to assist the grid company in formulating a suitable time-of-use electricity price strategy.

Description

Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning
Technical Field
The application relates to the field of power grid information, and in particular to a power grid demand side response potential evaluation and load regulation method based on an artificial neural network and multi-agent reinforcement learning.
Background
The concept of power demand side management was first proposed in the United States in the 1970s and gradually spread to other western developed countries in the 1980s. It refers to electricity consumption management activities in which, on the premise of guaranteeing the service level of the power industry, a series of measures are taken to guide users to use electricity scientifically and reasonably, thereby improving the utilization efficiency of electric energy, protecting the environment, and reducing the cost of power services.
With the continuous development of smart grid technology, demand response (DR) plays an increasingly important role in the economic and stable operation of the power grid, and demand response strategies can improve the power supply reliability of the distribution network. In recent years, power demand across the country has kept growing rapidly, power shortages have occurred, and the operating pressure on the grid has continued to increase. In an intelligent electricity consumption environment it is therefore worthwhile to pay closer attention to the electricity consumption behavior characteristics of the user side, mine users' response willingness and potential, and formulate targeted price signals or implement incentive measures that encourage grid users to participate voluntarily in response activities; this can scientifically guide users toward reasonable electricity consumption and thereby relieve the contradiction between power supply and demand.
Disclosure of Invention
The embodiment of the application aims to provide a power grid demand side response potential evaluation and load regulation method based on an artificial neural network and multi-agent reinforcement learning. The method can automatically mine and extract the information hidden in customers' power load data and predict customers' willingness and potential to participate in demand side response. Utilizing the centralized training with decentralized execution (CTDE) core framework of multi-agent reinforcement learning, it assists the grid company in formulating a suitable time-of-use electricity price strategy, regulating loads, and guiding customers to arrange their electricity consumption reasonably. This improves the management efficiency of the grid demand side, strengthens the peak clipping and valley filling capability of the grid, further relieves the pressure on the power supply-demand relationship, and safeguards the secure operation and reasonable planning of the power system.
In order to achieve the above purpose, the present application provides the following technical solutions:
the embodiment of the application provides a power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning, which is characterized by comprising the following steps:
step S1, clustering the customer electricity consumption data by using an improved k-means clustering algorithm to generate electricity consumption behavior labels, and clustering the customers into three categories of no peak, single peak and multiple peaks;
step S2, modeling the clustered no-peak, single-peak and multi-peak customers as three independent agents in the multi-agent reinforcement learning model, and establishing a partially observable Markov game model;
step S3, building and training a multi-layer perceptron neural network model, taking the 96-point workday load data and electricity consumption behavior labels of the three customer categories as input, and mining the implicit mapping between the input data and the customers' willingness and potential to participate in demand response;
and step S4, solving the constructed load regulation model by utilizing multi-agent reinforcement learning, and outputting an optimal time-of-use electricity price recommendation and load regulation scheme.
Said step S1 comprises the sub-steps of:
step S11, determining an initial clustering center, sorting the total power loads of all users in a sample set, uniformly dividing the total power loads into K classes, and calculating the average value of the sample loads in each class as the initial clustering center of the class;
step S12, calculating distances from all samples to K clustering centers, dividing all samples into different categories according to the nearest distances, and recalculating and updating the clustering centers;
step S13, repeating step S12 until the cluster center is not changed.
Said step S2 comprises the sub-steps of:
step S21, modeling the customers of the three clustering results of no peak, single peak and multiple peaks obtained in step S1 each as an independent agent in reinforcement learning;
step S22, taking the customer electricity load, electricity consumption behavior characteristics, demand response potential, real-time electricity price, weather state and the like as the Markov state; taking the time-of-use electricity price, interruptible load, adjustable load and the like as the Markov action; feeding back the negative value of each user class's total electricity cost as the reward item to the corresponding agent; and training the deep reinforcement learning game model based on the artificial neural network.
Said step S3 comprises the sub-steps of:
step S31, building a multi-layer perceptron MLP neural network model, and setting network parameters such as the number of hidden layers, the number of neurons, the training function, the maximum number of iterations and the loss function;
and step S32, training the MLP neural network model, wherein the training data comprise input samples and true response labels, the input samples are the customers' 96-point workday load data and the electricity consumption behavior labels obtained in step S1, the true response labels comprise response willingness and response potential, and the users' true response labels are obtained from the grid company's earlier demand response work.
Said step S4 comprises the sub-steps of:
s41, constructing a demand side load regulation model of win-win of a power grid company and a power customer;
and step S42, utilizing a multi-agent reinforcement learning centralized training, and solving the Markov game model in step S22 and the load transfer model in step S41 by using a core framework for performing CTDE in a distributed manner, outputting an optimal time-of-use electricity price scheme, and assisting a power grid company to formulate a proper time-of-use electricity price to guide a user to participate in peak clipping and valley filling.
Compared with the prior art, the application has the beneficial effects that:
(1) The traditional k-means clustering algorithm is improved: the initial cluster center selection strategy is optimized according to the characteristics of grid customers' power load data, making the clustering result more accurate;
(2) The traditional reinforcement learning modeling method and algorithm are improved, and the centralized training with decentralized execution (CTDE) core framework of multi-agent reinforcement learning is utilized to assist the grid company in formulating a suitable time-of-use electricity price strategy;
(3) The prediction and evaluation of customer demand response potential is unified with the solution of the optimal time-of-use electricity price and load regulation scheme: the method can automatically predict customers' participation in demand response from their electricity load data while outputting the optimal time-of-use electricity price and load regulation scheme, which further relieves the pressure on the power supply-demand relationship and safeguards the secure operation and reasonable planning of the power grid.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an overall flow chart of an embodiment of the present application.
Fig. 2 is a multi-layer perceptron (MLP) neural network model of an embodiment of the present application.
FIG. 3 is a simplified diagram of a power grid demand side response potential evaluation and load regulation method solving process based on artificial neural network and multi-agent reinforcement learning according to an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Fig. 1 is an overall flow chart of the technical solution of the present application. The embodiment of the application relates to a power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning, which specifically comprises the following steps:
Step S1, clustering massive customer electricity consumption data by using an improved k-means clustering algorithm to generate electricity consumption behavior labels, which comprise three types: no peak, single peak and multiple peaks. This specifically comprises the following steps:
step S11, determining an initial clustering center. The traditional K-means clustering algorithm is randomly selected from all sample data when the initial clustering centers are selected, and the initial clustering centers randomly selected each time are different, so that the final clustering result is different, the obtained result is unstable, if the initial clustering centers are not proper, the clustering result may be in local optimum, but not global optimum, and the accuracy of the result is greatly influenced.
An improved initial cluster center selection method is therefore presented: first sort all users in the sample set by total power load, then evenly divide the users' power load data into K classes in that order, and finally compute the average of the sample loads in each class as that class's initial cluster center.
Step S12, using the squared Euclidean distance $d(x_i, c_j) = \lVert x_i - c_j \rVert^2$ as the metric, computing the distances from all samples $x_i$ to the $K$ cluster centers $c_j$, assigning each sample to the category of its nearest center, and then recomputing and updating each cluster center as the mean of the samples assigned to it;
step S13, repeating step S12 until the cluster center is not changed.
The specific algorithm is as follows.
Improved k-means clustering algorithm:
Input: sample set $D = \{x_1, x_2, x_3, \dots, x_m\}$.
Initialization: evenly divide the sample set $D$ into $K$ classes in order of total load, and compute the mean vector of each class as its initial cluster center.
Iteration: assign every sample to its nearest cluster center and update the centers, repeating until the centers no longer change.
Output: the cluster partition $C = \{C_1, C_2, C_3, \dots, C_k\}$.
Step S2, modeling the clustered no-peak, single-peak and multi-peak customers as three independent agents in the multi-agent reinforcement learning model, and establishing a partially observable Markov game model. This specifically comprises the following steps:
and S21, modeling a part of observable Markov game model based on an artificial neural network and a power grid demand side response potential evaluation and load regulation method of multi-agent reinforcement learning, and respectively modeling the clients of the three clustering results of no peak, single peak and multiple peaks obtained in the step S1 as an independent agent in reinforcement learning.
The partially observable Markov game model is represented by the five-tuple $(N, S, A, R, P)$: $N$ is the set containing the agents; $S = S_1 \times S_2 \times \dots \times S_N$ is the joint Markov state set of agents 1 to $N$; $A = A_1 \times A_2 \times \dots \times A_N$ is the joint Markov action set of agents 1 to $N$; $R = r_1 \times r_2 \times \dots \times r_N$ is the joint Markov reward set of agents 1 to $N$; and $P$ is the transition probability over $S$. This joint formulation is used because each agent interacts not only with the environment but also, in competition or collaboration, with the other agents.
Step S22, taking the customer electricity load, electricity consumption behavior characteristics, demand response potential, real-time electricity price, weather state and the like as the Markov state; taking the time-of-use electricity price, interruptible load, adjustable load and the like as the Markov action; feeding back the negative value of each user class's total electricity cost as the reward item to the corresponding agent; and training the deep reinforcement learning game model based on the artificial neural network.
The partially observable Markov game model of the power grid demand side response potential evaluation and load regulation method based on an artificial neural network and multi-agent reinforcement learning can be defined in detail by the elements $(t, N, S, A, R, P, \gamma)$ as follows:
(a) Discrete time $t$: in the finite-horizon model, the agents make decision actions with a decision time granularity of 1 h;
(b) Agent set $N$: the customers of the three clustering results of no peak, single peak and multiple peaks obtained in step S1 are each modeled as an independent agent in reinforcement learning; the electricity selling company is also regarded as a decision agent.
(c) State $S = S_1 \times S_2 \times \dots \times S_N$: in the state space set, the Markov states of the three types of clustered customer agents contain the real-time electricity price $\lambda_t$ and the weather state (converted into wind power generation efficiency and photovoltaic power generation efficiency); the Markov state of the electricity selling company agent includes the power consumption $P_t$ of the power users, the electric energy demand $E_t$, the load transfer willingness coefficient $a$, the upper and lower ramping limits $U_{t,n}$ and $D_{t,n}$ of power user $n$'s demand at time $t$, and the amount of load cut off by the user.
(d) Action $A = A_1 \times A_2 \times \dots \times A_N$: in the action space set, the Markov actions of the three types of clustered customer agents comprise the adjustable load and the interruptible load; the Markov action of the electricity selling company agent is the retail electricity price $\lambda_t$ it sets at time $t$, with the retail prices discretized into a finite number of actions;
(e) Reward $R = r_1 \times r_2 \times \dots \times r_N$: the feedback value given by the environment when action $a$ is taken in state $s$ and the next state $s'$ is reached, used to evaluate the value of the state-action pair $\langle s, a \rangle$ between times $t$ and $t+1$;
in the actual solving process, a common Q function measures the advantages and disadvantages of actions executed by an agent following a strategy and in a state of the agent, Q p (s,a)=E p [R t |s t =s,a t =a]Indicating the desired rewards that can be obtained in state s by following policy p to act a.
In the reward set, the reward functions and constraint conditions of the three types of power user agents are defined as follows:
1) Reward function
The power user reduces its electricity expenditure by means of load transfer but incurs a corresponding dissatisfaction, which is described in the form of a cost; in order to maximize the user's benefit, the reward function can be expressed as follows:
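The reward formula itself is rendered as an image in the published text and cannot be recovered verbatim; the following is one plausible reconstruction from the surrounding definitions, a sketch rather than the patent's exact expression, writing the dissatisfaction cost as a hypothetical $\varphi_{t,n}$, the forced cut-off load as a hypothetical $P^{\text{cut}}_{t,n}$, and $c_t$ for the retail price defined below:

$$ r_n = -\sum_{t \in T} \Big( c_t\, P_{t,n} + \frac{\varphi_{t,n}}{a_n} + \delta\, P^{\text{cut}}_{t,n} \Big) $$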
where $E_{t,n}$ and the dissatisfaction cost respectively represent the electric energy demand and the dissatisfaction of user $n$ at time $t$, with higher dissatisfaction indicating that the user is less willing to participate in load transfer; $a_n$, the load transfer willingness coefficient of user $n$ predicted in step S2, is a positive value, and the larger its value, the smaller the user's dissatisfaction and the greater the willingness to participate in load transfer; the forced cut-off load denotes the load of power user $n$ forcibly cut off at time $t$; and $\delta$ represents the dissatisfaction coefficient of the forced cut-off load.
2) Constraint conditions
In the power user model, the user load constraint, demand ramping constraint, transfer amount constraint, cut-off amount constraint and average satisfaction constraint need to be satisfied.
(a) User load capacity constraints
Here the transferred amount denotes the load that user $n$ transfers from time $t_i$ to time $t$; the user's load at time $t$ consists of its inherent load and its flexible load. $\Pi$ is the user load transfer decision matrix, in which the entries on the main diagonal are all 0 and the entries elsewhere are 0-1 variables representing the user's decisions: a value of 0 means the user does not participate in load transfer from the time corresponding to the row to the time corresponding to the column, and a value of 1 means it participates.
(b) Demand ramping constraint
$$ E_{t,n} - U_{t,n} \le P_{t,n} \le E_{t,n} - D_{t,n} $$
where $U_{t,n}$ and $D_{t,n}$ are respectively the upper and lower ramping limits of user $n$'s demand at time $t$.
(c) Transfer amount constraint
Here $t_i, t \in T$ and $t_i \neq t$: the amount of load that user $n$ transfers from time $t_i$ to time $t$ must not exceed user $n$'s total flexible load at time $t$.
(d) Cut-off amount constraint
This constraint states that the amount of power user $n$ is forced to cut off at time $t$ must not exceed the user's power consumption at the current time.
(e) Average satisfaction constraint
where $\psi_n$ denotes the time-averaged satisfaction of user $n$ over the total period $T$, $\Omega$ denotes the average satisfaction over the total number $N$ of users, and $e$ is an equalization index denoting the maximum deviation between each user's satisfaction and the average satisfaction. The purpose of this constraint is to keep all users' satisfaction close to the average, for fairness.
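The demand ramping and cut-off constraints can be illustrated with a small feasibility check; this is a minimal sketch with illustrative function names, not part of the patent:

```python
def ramp_feasible(E, U, D, P):
    """Demand ramping constraint: E - U <= P <= E - D for user n at time t."""
    return (E - U) <= P <= (E - D)

def cutoff_feasible(P_cut, P):
    """Cut-off amount constraint: forced cut-off cannot exceed current consumption."""
    return 0 <= P_cut <= P

# e.g. demand E = 10, ramp limits U = 3, D = 1, consumption P = 8:
# ramp_feasible(10, 3, 1, 8) -> True  (since 7 <= 8 <= 9)
```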
The reward function and constraint conditions of the electricity selling company agent are defined as follows:
1) Reward function
The electricity selling company obtains the maximum profit by setting a suitable retail price, and its reward function is as follows:
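The profit formula is an image in the published text; from the definitions that follow, the company's profit is its retail revenue minus its wholesale purchase cost, so a plausible reconstruction (a sketch, not the exact published expression) is:

$$ r_{\text{co}} = \sum_{t \in T} \sum_{n=1}^{N} (c_t - u_t)\, P_{t,n} $$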
where $P_{t,n}$ represents the power consumption of the $n$-th user at time $t$, and $c_t$ and $u_t$ are respectively the retail electricity price set by the electricity selling company at time $t$ and the wholesale price at which it purchases electricity from the power supply company.
2) Constraint conditions
Retail price constraints
$$ c_{t,\min} \le c_t \le c_{t,\max} $$
where $c_{t,\min}$ and $c_{t,\max}$ are respectively the lower and upper limits of the electricity price the electricity selling company may set at time $t$, and $c^{0}$ denotes the same-day uniform electricity price the company adopts under the 24-hour uniform pricing strategy.
The joint reward function between the three types of power user agents and the electricity selling company is defined as follows:
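The joint reward formula is likewise an image in the published text; given that $\alpha$ trades off the two sides' benefits, a natural reconstruction (a sketch under that assumption) is the convex combination

$$ r = \alpha\, r_{\text{co}} + (1 - \alpha) \sum_{n=1}^{N} r_n $$

with $r_{\text{co}}$ and $r_n$ the company and user rewards above.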
where $\alpha \in [0,1]$ denotes the weighting factor between the respective benefits of the electricity selling company and the power users: when $\alpha = 1$, the demand response model focuses on maximizing the electricity selling company's revenue; when $\alpha = 0$, it focuses on the power users' benefits.
(f) Action probability $P$: each agent selects its actions with a deep deterministic policy; the policy is deterministic and can be defined as $a = \mu_\theta(s)$. Here $Q$ and $\mu$ are the outputs of the Critic and Actor networks, respectively; following the deterministic policy gradient theorem, the deep deterministic policy uses the policy $\mu$ to find the action that maximizes the expected value of $Q(s,a)$:
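Both gradient expressions referenced here are rendered as images in the published text; the standard forms from the deterministic policy gradient literature, which the surrounding description matches, are given as an assumption:

$$ \nabla_\theta J(\theta) = \mathbb{E}_{s}\Big[ \nabla_\theta\, \mu_\theta(s)\; \nabla_a Q(s,a)\big|_{a = \mu_\theta(s)} \Big], \qquad \mu_\theta(s) \approx \arg\max_a Q(s,a). $$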
(g) Discount factor $\gamma$: this parameter lies in the range $[0,1]$ and represents the importance of long-term rewards; the larger the value of $\gamma$, the more the long-term rewards are valued, and conversely, the smaller the value, the more the immediate rewards are valued.
Step S3, building and training a multi-layer perceptron (MLP) neural network model, taking customers' 96-point workday load data and electricity consumption behavior labels as input, and mining the implicit mapping between the input data and customers' willingness and potential to participate in demand response; the corresponding network structure is shown in FIG. 2. The step comprises:
and S31, constructing a multi-layer perceptron (MLP) neural network model.
The MLP model is a machine learning algorithm that fits the relationship between input and output vectors by mimicking human neurons; its structure is shown in FIG. 2. When a neuron receives an input vector, it produces the corresponding output through an activation function. The neural network is a hierarchical structure composed of an input layer, hidden layers and an output layer, with each hidden layer containing several neurons; the goal of information processing is achieved by adjusting the connections between the neurons of the network. The output of the $j$-th neuron of the $i$-th layer in the network can be expressed as
$$ y_j^{(i)} = \sigma\Big( \sum_{k=1}^{n} w_{kj}\, y_k^{(i-1)} + b_j^{(i)} \Big) $$
where $\sigma$ represents the activation function, $n$ represents the number of neurons in layer $i-1$, $w_{kj}$ represents the weight coefficient between the $j$-th neuron of layer $i$ and the $k$-th neuron of layer $i-1$, $y_k^{(i-1)}$ is the output of the $k$-th neuron of layer $i-1$, and $b_j^{(i)}$ is the bias parameter of the $j$-th neuron of layer $i$.
After all the outputs of the output layer are obtained, the error between the model's predicted result and the true result can be computed with a suitable loss function, and the weight parameters are then updated by a gradient descent algorithm so that the model's predictions continually approach the true results.
In the embodiment of the application, the MLP has two hidden layers, the training function uses a stochastic gradient descent algorithm, and the loss function uses the MSE loss.
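For concreteness, a minimal PyTorch sketch of such a model follows; the patent fixes only the two hidden layers, the stochastic gradient descent training function and the MSE loss, so the layer widths, learning rate and the 97-dimensional input (96 load points plus one behavior label) are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Minimal MLP matching the described setup: two hidden layers, SGD, MSE loss.
# Input: 96-point workday load profile + 1 electricity behavior label = 97 features.
# Output: 2 values, response willingness and response potential.
model = nn.Sequential(
    nn.Linear(97, 64), nn.ReLU(),   # hidden layer 1 (width 64 is an assumption)
    nn.Linear(64, 32), nn.ReLU(),   # hidden layer 2 (width 32 is an assumption)
    nn.Linear(32, 2),
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(x, y):
    """One forward/backward pass: predict, compute the MSE error, update weights."""
    optimizer.zero_grad()
    y_hat = model(x)            # forward propagation
    loss = loss_fn(y_hat, y)    # error e between prediction and true labels
    loss.backward()             # back propagation of gradients
    optimizer.step()            # gradient-descent weight update
    return loss.item()
```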
Step S32, training the MLP neural network model. The training data comprise input samples and true response labels: the input samples are the customers' 96-point workday load data together with the electricity consumption behavior labels obtained in step S1, and the true response labels comprise response willingness and response potential and are obtained from the grid company's earlier demand response work. Response willingness = days of participation in response / total days over which demand response was implemented, and represents the probability that a user participates in demand response; response potential is the amount by which a user's electricity load is reduced after participating in demand response compared with the load before the response, and represents the magnitude of the response.
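A tiny illustration of how such labels could be assembled from historical demand response records; the function and argument names are hypothetical:

```python
def response_labels(days_participated, total_dr_days, load_before, load_after):
    """Response willingness = participation days / total DR days (a probability);
    response potential = load reduction after participating, vs. before."""
    willingness = days_participated / total_dr_days
    potential = load_before - load_after
    return willingness, potential

# e.g. a user who responded on 18 of 30 DR days and cut load from 5.0 to 3.8 kW:
# response_labels(18, 30, 5.0, 3.8) -> (0.6, 1.2)
```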
After sufficient training data are acquired, the MLP model is trained. The training process comprises forward propagation and back propagation. Forward propagation is the model's prediction process: the data $X$ enter at the input layer and, after passing through the network structure with its weights and thresholds, yield the predicted output $\hat{Y}$. Back propagation is the model's parameter update process: the error $e$ between the predicted output $\hat{Y}$ and the true output $Y$ is computed with the loss function, and the model weights and bias parameters are then updated so that $e$ keeps decreasing, i.e. the model's predictions approach the true results, using the gradient descent rule
$$ w \leftarrow w - \gamma \frac{\partial e}{\partial w}, \qquad b \leftarrow b - \gamma \frac{\partial e}{\partial b} $$
where $e$ denotes the error and $\gamma$ the learning rate. The forward and back propagation processes are repeated with the training data until $e \le E$ for a very small threshold $E$; at that point the model's predictions are close to the true results and training is complete.
Step S4, solving the constructed load transfer model by utilizing deep reinforcement learning and outputting an optimal time-of-use electricity price recommendation, which specifically comprises the following steps:
and S41, constructing a win-win demand side load transfer model of the grid company and the power customer.
In the electricity market, the electricity selling company sets the retail price of electric energy and sells it to power users. However, many factors must be considered when setting the electricity price: if the price is too high, power users may be lost; if it is too low, the electricity selling company bears the risk of losses. Time-of-use pricing divides the day into several periods and sets a price for each according to the operating state of the system; this pricing mode reduces the risk of electricity price fluctuation and is popular with users. Starting from the two sides of the electricity selling company and the power users, a user flexible load economic transfer model that is win-win for both is built.
Electricity selling company model and power user model

The objective functions and constraint conditions of the two models are identical to the reward functions and constraints already defined in step S22: the electricity selling company maximizes the profit from the difference between its retail price $c_t$ and the wholesale price $u_t$, subject to the retail price constraint $c_{t,\min} \le c_t \le c_{t,\max}$; each power user minimizes its electricity expenditure plus dissatisfaction cost, subject to the user load constraint, the demand ramping constraint $E_{t,n} - U_{t,n} \le P_{t,n} \le E_{t,n} - D_{t,n}$, the transfer amount constraint, the cut-off amount constraint and the average satisfaction constraint; and the overall objective combines the two sides' benefits through the weighting factor $\alpha \in [0,1]$, with $\alpha = 1$ emphasizing the company's revenue and $\alpha = 0$ emphasizing the power users' benefits.
Step S42, solving the Markov game model in step S22 and the load transfer model in step S41 by utilizing the centralized training with decentralized execution (CTDE) core framework of multi-agent reinforcement learning, outputting an optimal time-of-use electricity price scheme, and assisting the grid company in formulating a suitable time-of-use electricity price to guide users to participate in peak clipping and valley filling. This further relieves the pressure on the power supply-demand relationship and safeguards the secure operation and reasonable planning of the power system.
Multi-agent reinforcement learning is a machine learning algorithm guided by the Nash equilibrium among the agents. For each independent agent, when the agent selects an action $a$ to act on the environment, the state $s$ of the environment changes into the state $s'$ of the next moment, and a reward or punishment signal $r$ is generated and fed back to the agent at the same time; the agent then selects a new action according to the received signal and the current state of the environment, until the iteration ends. Multi-agent reinforcement learning uses the centralized training with decentralized execution (CTDE) paradigm:
according to the CTDE idea, global state information of all agents can be used during model training to achieve a better training effect, and each agent in the decision stage is executed independently, and actions are output only according to own strategies. The model takes 24 hours a day as a finite time length, and discretizes the finite time length into 24 moments; three types of electricity consumers and electricity companies are respectively regarded as a decision intelligent agent. For three types of electricity consumers, the self load is adjusted according to the real-time electricity price and weather state (active power output of the distributed generation units) of an electricity selling company. For an intelligent body of an electricity selling company, an initial time-sharing electricity price is firstly established for an electricity user, the user decides whether to participate in response according to the electricity price and feeds back the total profits of the electricity user and the electricity selling company to the electricity selling company, then the electricity selling company resets the time-sharing electricity price according to the current total profits, when the total profits of the electricity user and the electricity selling company reach Nash equilibrium, the iteration process is stopped, and the time-sharing electricity price at the moment is the optimal demand response strategy.
The solving process of the power grid demand side response potential evaluation and load regulation method based on an artificial neural network and multi-agent reinforcement learning is shown in FIG. 3. The solving process combines the CTDE idea with the multi-agent deep deterministic policy gradient (MADDPG) algorithm to learn the optimal strategy. Each agent in the agent set $N$ is trained with the Actor-Critic method. The Actor networks of the agents are independent: each Actor network $\mu_\phi$ takes the state $s_t$ as network input and outputs an action according to that agent's action space:
$$ a = \mu_\phi(s_t) $$
The policy is then improved according to the estimate of the state-action value function $Q$ fed back by the Critic network, i.e. the objective function $J(\phi) = \mathbb{E}\big[Q_\theta\big(s_t, \mu_\phi(s_t)\big)\big]$ is maximized by updating the network parameters $\phi$.
however, all agents share a centralized Critic network, critic network Q θ In the state s of each agent t And action a t For network input, output state-motion cost function Q estimate Q θ (s t ,a t ) To evaluate the output of the Actor network. With the cooperation of the Actor and the Critic network, each agent can better specify the strategy and make corresponding decisions so as to achieve the Nash equilibrium of the mixed strategy among the agents.
In the power grid demand side response potential evaluation and load regulation method based on an artificial neural network and multi-agent reinforcement learning, the target gradients of each agent's Actor network and of the shared Critic network are respectively as follows:
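The two gradient formulas are rendered as images in the published text; the standard MADDPG forms, which the surrounding description matches, are sketched here as an assumption. For agent $i$ with Actor parameters $\phi_i$ and shared Critic parameters $\theta$:

$$ \nabla_{\phi_i} J(\phi_i) = \mathbb{E}\Big[ \nabla_{\phi_i}\, \mu_{\phi_i}(s_{t,i})\; \nabla_{a_i} Q_\theta(s_t, a_t)\big|_{a_i = \mu_{\phi_i}(s_{t,i})} \Big], \qquad L(\theta) = \mathbb{E}\Big[ \big( r_t + \gamma\, Q_\theta(s_{t+1}, a_{t+1}) - Q_\theta(s_t, a_t) \big)^2 \Big], $$

where the Critic is trained by minimizing the temporal-difference loss $L(\theta)$ and each Actor ascends its gradient $\nabla_{\phi_i} J(\phi_i)$.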
the application utilizes the strong nonlinear fitting capability of the artificial neural network to construct a complex mapping relation between the Markov state set and the Markov action set, fully excavates and utilizes the power consumption data characteristics of the clients, and further predicts the willingness and potential of the clients to participate in demand response; the method comprises the steps of constructing a win-win electricity load transfer model of an electricity selling company and a power grid customer, utilizing a core framework of multi-agent reinforcement learning Centralized Training and Distributed Execution (CTDE) to assist the power grid company to formulate a proper time-of-use electricity price strategy, regulating and controlling loads, guiding the customer to reasonably arrange electricity, improving the capacity of peak clipping and valley filling of the power grid, further relieving the pressure of power supply and demand relationship, and guaranteeing the safe operation and reasonable planning of a power system.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (5)

1. The power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning is characterized by comprising the following steps of:
step S1, clustering the customer electricity consumption data by using an improved k-means clustering algorithm to generate electricity consumption behavior labels, and clustering the customers into three categories of no peak, single peak and multiple peaks;
step S2, modeling the clustered no-peak, single-peak and multi-peak customers as three independent agents in the multi-agent reinforcement learning model, and establishing a partially observable Markov game model;
step S3, building and training a multi-layer perceptron neural network model, taking the 96-point workday load data and electricity consumption behavior labels of the three customer categories as input, and mining the implicit mapping between the input data and the customers' willingness and potential to participate in demand response;
and step S4, solving the constructed load regulation model by utilizing multi-agent reinforcement learning, and outputting an optimal time-of-use electricity price recommendation and load regulation scheme.
2. The method for evaluating the response potential and regulating the load on the demand side of a power grid based on artificial neural network and multi-agent reinforcement learning according to claim 1, wherein the step S1 comprises the following substeps:
step S11, determining an initial clustering center, sorting the total power loads of all users in a sample set, uniformly dividing the total power loads into K classes, and calculating the average value of the sample loads in each class as the initial clustering center of the class;
step S12, calculating distances from all samples to K clustering centers, dividing all samples into different categories according to the nearest distances, and recalculating and updating the clustering centers;
step S13, repeating step S12 until the cluster center is not changed.
3. The method for evaluating the response potential and regulating the load on the power grid demand side based on the artificial neural network and the multi-agent reinforcement learning according to claim 2, wherein the step S2 comprises the following substeps:
step S21, modeling the customers of the three clustering results of no peak, single peak and multiple peaks obtained in step S1 each as an independent agent in reinforcement learning;
step S22, taking the customer electricity load, electricity consumption behavior characteristics, demand response potential, real-time electricity price, weather state and the like as the Markov state; taking the time-of-use electricity price, interruptible load, adjustable load and the like as the Markov action; feeding back the negative value of each user class's total electricity cost as the reward item to the corresponding agent; and training the deep reinforcement learning game model based on the artificial neural network.
4. The method for evaluating the response potential and regulating the load on the demand side of a power grid based on artificial neural network and multi-agent reinforcement learning according to claim 3, wherein the step S3 comprises the following substeps:
step S31, building a multi-layer perceptron MLP neural network model, and setting network parameters such as the number of hidden layers, the number of neurons, the training function, the maximum number of iterations and the loss function;
and step S32, training the MLP neural network model, wherein the training data comprise input samples and true response labels, the input samples are the customers' 96-point workday load data and the electricity consumption behavior labels obtained in step S1, the true response labels comprise response willingness and response potential, and the users' true response labels are obtained from the grid company's earlier demand response work.
5. The method for evaluating the response potential and regulating the load on the demand side of a power grid based on artificial neural network and multi-agent reinforcement learning according to claim 4, wherein the step S4 comprises the following substeps:
s41, constructing a demand side load regulation model of win-win of a power grid company and a power customer;
and step S42, utilizing a multi-agent reinforcement learning centralized training, and solving the Markov game model in step S22 and the load transfer model in step S41 by using a core framework for performing CTDE in a distributed manner, outputting an optimal time-of-use electricity price scheme, and assisting a power grid company to formulate a proper time-of-use electricity price to guide a user to participate in peak clipping and valley filling.
CN202310777311.8A 2023-06-29 2023-06-29 Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning Pending CN117117878A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310777311.8A CN117117878A (en) 2023-06-29 2023-06-29 Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310777311.8A CN117117878A (en) 2023-06-29 2023-06-29 Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning

Publications (1)

Publication Number Publication Date
CN117117878A true CN117117878A (en) 2023-11-24

Family

ID=88797308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310777311.8A Pending CN117117878A (en) 2023-06-29 2023-06-29 Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning

Country Status (1)

Country Link
CN (1) CN117117878A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117674142A (en) * 2024-02-01 2024-03-08 云南电网有限责任公司信息中心 Power scheduling method and scheduling device based on time-sharing electric quantity accounting
CN117674142B (en) * 2024-02-01 2024-04-19 云南电网有限责任公司信息中心 Power scheduling method and scheduling device based on time-sharing electric quantity accounting


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination