CN117117878A - Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning - Google Patents

Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning

Info

Publication number
CN117117878A
Authority
CN
China
Prior art keywords
reinforcement learning
load
neural network
electricity
model
Prior art date
Legal status
Pending
Application number
CN202310777311.8A
Other languages
Chinese (zh)
Inventor
张佳雯
张�成
蔡文嘉
何行
董重重
肖燕婷
田猛
张芹
魏解
吴明珍
张蕾
吴悠
冉艳春
胡亚天
王兹玥
Current Assignee
Wuhan University WHU
Metering Center of State Grid Hubei Electric Power Co Ltd
Original Assignee
Wuhan University WHU
Metering Center of State Grid Hubei Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU and Metering Center of State Grid Hubei Electric Power Co Ltd
Priority claimed from application CN202310777311.8A
Publication of CN117117878A
Legal status: Pending


Classifications

    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F18/23: Pattern recognition; clustering techniques
    • G06N3/0463: Neural network architectures; neocognitrons
    • G06N3/084: Learning methods; backpropagation, e.g. using gradient descent
    • G06N3/092: Learning methods; reinforcement learning
    • G06Q10/06315: Needs-based resource requirements planning or analysis
    • G06Q50/06: ICT specially adapted for energy or water supply
    • H02J3/0075: Arrangements for selectively connecting the load or loads to one or several among a plurality of power lines or power sources, providing alternative feeding paths according to economic or energy efficiency considerations, e.g. economic dispatch
    • H02J3/008: Circuit arrangements for AC mains or AC distribution networks involving trading of energy or energy transmission rights
    • H02J3/06: Controlling transfer of power between connected networks; controlling sharing of load between connected networks
    • H02J3/144: Demand-response operation of the power transmission or distribution network
    • H02J2203/10: Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • H02J2310/60: Limiting power consumption in the network or in one section of the network, e.g. load shedding or peak shaving
    • H02J2310/64: The condition being economic, e.g. tariff based load management


Abstract

The application relates to a power grid demand side response potential evaluation and load regulation method based on an artificial neural network and multi-agent reinforcement learning, which comprises the following steps: step S1, clustering the customer electricity consumption data by using an improved k-means clustering algorithm to generate electricity consumption behavior labels; step S2, establishing a partially observable Markov game model; step S3, building and training a multi-layer perceptron neural network model; and step S4, solving the constructed load regulation model by utilizing multi-agent reinforcement learning, and outputting an optimal time-of-use electricity price recommendation and a load regulation scheme. The application improves the traditional reinforcement learning modeling method and algorithm, and utilizes the centralized training with decentralized execution (CTDE) core framework of multi-agent reinforcement learning to assist the grid company in formulating a suitable time-of-use electricity price strategy.

Description

Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning
Technical Field
The application relates to the field of power grid information, and in particular to a power grid demand side response potential evaluation and load regulation method based on an artificial neural network and multi-agent reinforcement learning.
Background
The concept of power demand side management was first proposed in the United States in the 1970s and gradually spread to other western developed countries in the 1980s. It refers to electricity consumption management activities in which, on the premise of guaranteeing the service level of the power industry, a series of measures are taken to guide users to use electricity scientifically and reasonably, thereby improving the utilization efficiency of electric energy, protecting the environment, and reducing the cost of power services.
With the continuous development of smart grid technology, demand response (DR) plays an increasingly important role in the economic and stable operation of the power grid, and demand response strategies can improve the power supply reliability of the distribution network. In recent years, power demand across the country has kept growing rapidly, power shortages have occurred, and the operating pressure on the grid has continued to increase. In an intelligent electricity consumption environment it is therefore worthwhile to pay closer attention to the electricity consumption behavior characteristics of the user side, mine users' response willingness and potential, and formulate targeted price signals or implement incentive measures that encourage grid users to participate voluntarily in response activities; this can scientifically guide users toward reasonable electricity consumption and thereby relieve the contradiction between power supply and demand.
Disclosure of Invention
The embodiment of the application aims to provide a power grid demand side response potential evaluation and load regulation method based on an artificial neural network and multi-agent reinforcement learning. The method can automatically mine and extract the information hidden in customers' power load data and predict customers' willingness and potential to participate in demand side response. Utilizing the centralized training with decentralized execution (CTDE) core framework of multi-agent reinforcement learning, it assists the grid company in formulating a suitable time-of-use electricity price strategy, regulating loads, and guiding customers to arrange their electricity consumption reasonably. This improves the management efficiency of the grid demand side, strengthens the peak clipping and valley filling capability of the grid, further relieves the pressure on the power supply-demand relationship, and safeguards the secure operation and reasonable planning of the power system.
In order to achieve the above purpose, the present application provides the following technical solutions:
the embodiment of the application provides a power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning, which is characterized by comprising the following steps:
step S1, clustering the customer electricity consumption data by using an improved k-means clustering algorithm to generate electricity consumption behavior labels, and clustering the customers into three categories of no peak, single peak and multiple peaks;
step S2, modeling the clustered no-peak, single-peak and multi-peak customers as three independent agents in the multi-agent reinforcement learning model, and establishing a partially observable Markov game model;
step S3, building and training a multi-layer perceptron neural network model, taking the 96-point workday load data and electricity consumption behavior labels of the three customer categories as input, and mining the implicit mapping between the input data and the customers' willingness and potential to participate in demand response;
and step S4, solving the constructed load regulation model by utilizing multi-agent reinforcement learning, and outputting an optimal time-of-use electricity price recommendation and load regulation scheme.
Said step S1 comprises the sub-steps of:
step S11, determining an initial clustering center, sorting the total power loads of all users in a sample set, uniformly dividing the total power loads into K classes, and calculating the average value of the sample loads in each class as the initial clustering center of the class;
step S12, calculating distances from all samples to K clustering centers, dividing all samples into different categories according to the nearest distances, and recalculating and updating the clustering centers;
step S13, repeating step S12 until the cluster center is not changed.
Said step S2 comprises the sub-steps of:
step S21, modeling the customers of the three clustering results of no peak, single peak and multiple peaks obtained in step S1 each as an independent agent in reinforcement learning;
step S22, taking the customer electricity load, electricity consumption behavior characteristics, demand response potential, real-time electricity price, weather state and the like as the Markov state; taking the time-of-use electricity price, interruptible load, adjustable load and the like as the Markov action; feeding back the negative value of each user class's total electricity cost as the reward item to the corresponding agent; and training the deep reinforcement learning game model based on the artificial neural network.
Said step S3 comprises the sub-steps of:
step S31, building a multi-layer perceptron MLP neural network model, and setting network parameters such as the number of hidden layers, the number of neurons, the training function, the maximum number of iterations and the loss function;
and step S32, training the MLP neural network model, wherein the training data comprise input samples and true response labels, the input samples are the customers' 96-point workday load data and the electricity consumption behavior labels obtained in step S1, the true response labels comprise response willingness and response potential, and the users' true response labels are obtained from the grid company's earlier demand response work.
Said step S4 comprises the sub-steps of:
s41, constructing a demand side load regulation model of win-win of a power grid company and a power customer;
and step S42, utilizing a multi-agent reinforcement learning centralized training, and solving the Markov game model in step S22 and the load transfer model in step S41 by using a core framework for performing CTDE in a distributed manner, outputting an optimal time-of-use electricity price scheme, and assisting a power grid company to formulate a proper time-of-use electricity price to guide a user to participate in peak clipping and valley filling.
Compared with the prior art, the application has the beneficial effects that:
(1) The traditional k-means clustering algorithm is improved: the initial cluster center selection strategy is optimized according to the characteristics of grid customers' power load data, making the clustering result more accurate;
(2) The traditional reinforcement learning modeling method and algorithm are improved, and the centralized training with decentralized execution (CTDE) core framework of multi-agent reinforcement learning is utilized to assist the grid company in formulating a suitable time-of-use electricity price strategy;
(3) The prediction and evaluation of customer demand response potential is unified with the solution of the optimal time-of-use electricity price and load regulation scheme: the method can automatically predict customers' participation in demand response from their electricity load data while outputting the optimal time-of-use electricity price and load regulation scheme, which further relieves the pressure on the power supply-demand relationship and safeguards the secure operation and reasonable planning of the power grid.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an overall flow chart of an embodiment of the present application.
Fig. 2 is a multi-layer perceptron (MLP) neural network model of an embodiment of the present application.
FIG. 3 is a simplified diagram of a power grid demand side response potential evaluation and load regulation method solving process based on artificial neural network and multi-agent reinforcement learning according to an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Fig. 1 is an overall flow chart of the technical solution of the present application. The embodiment of the application relates to a power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning, which specifically comprises the following steps:
Step S1, clustering massive customer electricity consumption data by using an improved k-means clustering algorithm to generate electricity consumption behavior labels, which comprise three types: no peak, single peak and multiple peaks. This specifically comprises the following steps:
step S11, determining an initial clustering center. The traditional K-means clustering algorithm is randomly selected from all sample data when the initial clustering centers are selected, and the initial clustering centers randomly selected each time are different, so that the final clustering result is different, the obtained result is unstable, if the initial clustering centers are not proper, the clustering result may be in local optimum, but not global optimum, and the accuracy of the result is greatly influenced.
An improved initial cluster center selection method is therefore presented: first sort all users in the sample set by total power load, then evenly divide the users' power load data into K classes in that order, and finally compute the average of the sample loads in each class as that class's initial cluster center.
Step S12, using the squared Euclidean distance $d(x_i, c_j) = \lVert x_i - c_j \rVert^2$ as the metric, computing the distances from all samples $x_i$ to the $K$ cluster centers $c_j$, assigning each sample to the category of its nearest center, and then recomputing and updating each cluster center as the mean of the samples assigned to it;
step S13, repeating step S12 until the cluster center is not changed.
The specific algorithm is as follows.
Improved k-means clustering algorithm:
Input: sample set $D = \{x_1, x_2, x_3, \dots, x_m\}$.
Initialization: evenly divide the sample set $D$ into $K$ classes in order of total load, and compute the mean vector of each class as its initial cluster center.
Iteration: assign every sample to its nearest cluster center and update the centers, repeating until the centers no longer change.
Output: the cluster partition $C = \{C_1, C_2, C_3, \dots, C_k\}$.
Step S2, modeling the clustered no-peak, single-peak and multi-peak customers as three independent agents in the multi-agent reinforcement learning model, and establishing a partially observable Markov game model. This specifically comprises the following steps:
and S21, modeling a part of observable Markov game model based on an artificial neural network and a power grid demand side response potential evaluation and load regulation method of multi-agent reinforcement learning, and respectively modeling the clients of the three clustering results of no peak, single peak and multiple peaks obtained in the step S1 as an independent agent in reinforcement learning.
The partially observable Markov game model is represented by the five-tuple $(N, S, A, R, P)$: $N$ is the set containing the agents; $S = S_1 \times S_2 \times \dots \times S_N$ is the joint Markov state set of agents 1 to $N$; $A = A_1 \times A_2 \times \dots \times A_N$ is the joint Markov action set of agents 1 to $N$; $R = r_1 \times r_2 \times \dots \times r_N$ is the joint Markov reward set of agents 1 to $N$; and $P$ is the transition probability over $S$. This joint formulation is used because each agent interacts not only with the environment but also, in competition or collaboration, with the other agents.
Step S22, taking the customer electricity load, electricity consumption behavior characteristics, demand response potential, real-time electricity price, weather state and the like as the Markov state; taking the time-of-use electricity price, interruptible load, adjustable load and the like as the Markov action; feeding back the negative value of each user class's total electricity cost as the reward item to the corresponding agent; and training the deep reinforcement learning game model based on the artificial neural network.
The partially observable Markov game model of the power grid demand side response potential evaluation and load regulation method based on an artificial neural network and multi-agent reinforcement learning can be defined in detail by the elements $(t, N, S, A, R, P, \gamma)$ as follows:
(a) Discrete time $t$: in the finite-horizon model, the agents make decision actions with a decision time granularity of 1 h;
(b) Agent set $N$: the customers of the three clustering results of no peak, single peak and multiple peaks obtained in step S1 are each modeled as an independent agent in reinforcement learning; the electricity selling company is also regarded as a decision agent.
(c) State $S = S_1 \times S_2 \times \dots \times S_N$: in the state space set, the Markov states of the three types of clustered customer agents contain the real-time electricity price $\lambda_t$ and the weather state (converted into wind power generation efficiency and photovoltaic power generation efficiency); the Markov state of the electricity selling company agent includes the power consumption $P_t$ of the power users, the electric energy demand $E_t$, the load transfer willingness coefficient $a$, the upper and lower ramping limits $U_{t,n}$ and $D_{t,n}$ of power user $n$'s demand at time $t$, and the amount of load cut off by the user.
(d) Action $A = A_1 \times A_2 \times \dots \times A_N$: in the action space set, the Markov actions of the three types of clustered customer agents comprise the adjustable load and the interruptible load; the Markov action of the electricity selling company agent is the retail electricity price $\lambda_t$ it sets at time $t$, with the retail prices discretized into a finite number of actions;
(e) Reward $R = r_1 \times r_2 \times \dots \times r_N$: the feedback value given by the environment when action $a$ is taken in state $s$ and the next state $s'$ is reached, used to evaluate the value of the state-action pair $\langle s, a \rangle$ between times $t$ and $t+1$;
in the actual solving process, a common Q function measures the advantages and disadvantages of actions executed by an agent following a strategy and in a state of the agent, Q p (s,a)=E p [R t |s t =s,a t =a]Indicating the desired rewards that can be obtained in state s by following policy p to act a.
In the reward set, the reward functions and constraint conditions of the three types of power user agents are defined as follows:
1) Reward function
The power user reduces its electricity expenditure by means of load transfer but incurs a corresponding dissatisfaction, which is described in the form of a cost; in order to maximize the user's benefit, the reward function can be expressed as follows:
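The reward formula itself is rendered as an image in the published text and cannot be recovered verbatim; the following is one plausible reconstruction from the surrounding definitions, a sketch rather than the patent's exact expression, writing the dissatisfaction cost as a hypothetical $\varphi_{t,n}$, the forced cut-off load as a hypothetical $P^{\text{cut}}_{t,n}$, and $c_t$ for the retail price defined below:

$$ r_n = -\sum_{t \in T} \Big( c_t\, P_{t,n} + \frac{\varphi_{t,n}}{a_n} + \delta\, P^{\text{cut}}_{t,n} \Big) $$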
where $E_{t,n}$ and the dissatisfaction cost respectively represent the electric energy demand and the dissatisfaction of user $n$ at time $t$, with higher dissatisfaction indicating that the user is less willing to participate in load transfer; $a_n$, the load transfer willingness coefficient of user $n$ predicted in step S2, is a positive value, and the larger its value, the smaller the user's dissatisfaction and the greater the willingness to participate in load transfer; the forced cut-off load denotes the load of power user $n$ forcibly cut off at time $t$; and $\delta$ represents the dissatisfaction coefficient of the forced cut-off load.
2) Constraint conditions
In the power user model, the user load constraint, demand ramping constraint, transfer amount constraint, cut-off amount constraint and average satisfaction constraint need to be satisfied.
(a) User load capacity constraints
Here the transferred amount denotes the load that user $n$ transfers from time $t_i$ to time $t$; the user's load at time $t$ consists of its inherent load and its flexible load. $\Pi$ is the user load transfer decision matrix, in which the entries on the main diagonal are all 0 and the entries elsewhere are 0-1 variables representing the user's decisions: a value of 0 means the user does not participate in load transfer from the time corresponding to the row to the time corresponding to the column, and a value of 1 means it participates.
(b) Demand ramping constraint
$$ E_{t,n} - U_{t,n} \le P_{t,n} \le E_{t,n} - D_{t,n} $$
where $U_{t,n}$ and $D_{t,n}$ are respectively the upper and lower ramping limits of user $n$'s demand at time $t$.
(c) Transfer amount constraint
Here $t_i, t \in T$ and $t_i \neq t$: the amount of load that user $n$ transfers from time $t_i$ to time $t$ must not exceed user $n$'s total flexible load at time $t$.
(d) Cut-off amount constraint
This constraint states that the amount of power user $n$ is forced to cut off at time $t$ must not exceed the user's power consumption at the current time.
(e) Average satisfaction constraint
where $\psi_n$ denotes the time-averaged satisfaction of user $n$ over the total period $T$, $\Omega$ denotes the average satisfaction over the total number $N$ of users, and $e$ is an equalization index denoting the maximum deviation between each user's satisfaction and the average satisfaction. The purpose of this constraint is to keep all users' satisfaction close to the average, for fairness.
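The demand ramping and cut-off constraints can be illustrated with a small feasibility check; this is a minimal sketch with illustrative function names, not part of the patent:

```python
def ramp_feasible(E, U, D, P):
    """Demand ramping constraint: E - U <= P <= E - D for user n at time t."""
    return (E - U) <= P <= (E - D)

def cutoff_feasible(P_cut, P):
    """Cut-off amount constraint: forced cut-off cannot exceed current consumption."""
    return 0 <= P_cut <= P

# e.g. demand E = 10, ramp limits U = 3, D = 1, consumption P = 8:
# ramp_feasible(10, 3, 1, 8) -> True  (since 7 <= 8 <= 9)
```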
The reward function and constraint conditions of the electricity selling company agent are defined as follows:
1) Reward function
The electricity selling company obtains the maximum profit by setting a suitable retail price, and its reward function is as follows:
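The profit formula is an image in the published text; from the definitions that follow, the company's profit is its retail revenue minus its wholesale purchase cost, so a plausible reconstruction (a sketch, not the exact published expression) is:

$$ r_{\text{co}} = \sum_{t \in T} \sum_{n=1}^{N} (c_t - u_t)\, P_{t,n} $$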
where $P_{t,n}$ represents the power consumption of the $n$-th user at time $t$, and $c_t$ and $u_t$ are respectively the retail electricity price set by the electricity selling company at time $t$ and the wholesale price at which it purchases electricity from the power supply company.
2) Constraint conditions
Retail price constraints
$$ c_{t,\min} \le c_t \le c_{t,\max} $$
where $c_{t,\min}$ and $c_{t,\max}$ are respectively the lower and upper limits of the electricity price the electricity selling company may set at time $t$, and $c^{0}$ denotes the same-day uniform electricity price the company adopts under the 24-hour uniform pricing strategy.
The joint reward function between the three types of power user agents and the electricity selling company is defined as follows:
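The joint reward formula is likewise an image in the published text; given that $\alpha$ trades off the two sides' benefits, a natural reconstruction (a sketch under that assumption) is the convex combination

$$ r = \alpha\, r_{\text{co}} + (1 - \alpha) \sum_{n=1}^{N} r_n $$

with $r_{\text{co}}$ and $r_n$ the company and user rewards above.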
where $\alpha \in [0,1]$ denotes the weighting factor between the respective benefits of the electricity selling company and the power users: when $\alpha = 1$, the demand response model focuses on maximizing the electricity selling company's revenue; when $\alpha = 0$, it focuses on the power users' benefits.
(f) Action probability $P$: each agent selects its actions with a deep deterministic policy; the policy is deterministic and can be defined as $a = \mu_\theta(s)$. Here $Q$ and $\mu$ are the outputs of the Critic and Actor networks, respectively; following the deterministic policy gradient theorem, the deep deterministic policy uses the policy $\mu$ to find the action that maximizes the expected value of $Q(s,a)$:
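Both gradient expressions referenced here are rendered as images in the published text; the standard forms from the deterministic policy gradient literature, which the surrounding description matches, are given as an assumption:

$$ \nabla_\theta J(\theta) = \mathbb{E}_{s}\Big[ \nabla_\theta\, \mu_\theta(s)\; \nabla_a Q(s,a)\big|_{a = \mu_\theta(s)} \Big], \qquad \mu_\theta(s) \approx \arg\max_a Q(s,a). $$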
(g) Discount factor $\gamma$: this parameter lies in the range $[0,1]$ and represents the importance of long-term rewards; the larger the value of $\gamma$, the more the long-term rewards are valued, and conversely, the smaller the value, the more the immediate rewards are valued.
Step S3, building and training a multi-layer perceptron (MLP) neural network model, taking customers' 96-point workday load data and electricity consumption behavior labels as input, and mining the implicit mapping between the input data and customers' willingness and potential to participate in demand response; the corresponding network structure is shown in FIG. 2. The step comprises:
and S31, constructing a multi-layer perceptron (MLP) neural network model.
The MLP model is a machine learning algorithm that fits the relationship between input and output vectors by mimicking human neurons; its structure is shown in FIG. 2. When a neuron receives an input vector, it produces the corresponding output through an activation function. The neural network is a hierarchical structure composed of an input layer, hidden layers and an output layer, with each hidden layer containing several neurons; the goal of information processing is achieved by adjusting the connections between the neurons of the network. The output of the $j$-th neuron of the $i$-th layer in the network can be expressed as
$$ y_j^{(i)} = \sigma\Big( \sum_{k=1}^{n} w_{kj}\, y_k^{(i-1)} + b_j^{(i)} \Big) $$
where $\sigma$ represents the activation function, $n$ represents the number of neurons in layer $i-1$, $w_{kj}$ represents the weight coefficient between the $j$-th neuron of layer $i$ and the $k$-th neuron of layer $i-1$, $y_k^{(i-1)}$ is the output of the $k$-th neuron of layer $i-1$, and $b_j^{(i)}$ is the bias parameter of the $j$-th neuron of layer $i$.
After all the outputs of the output layer are obtained, the error between the model's predicted result and the true result can be computed with a suitable loss function, and the weight parameters are then updated by a gradient descent algorithm so that the model's predictions continually approach the true results.
In the embodiment of the application, the MLP has two hidden layers, the training function uses a stochastic gradient descent algorithm, and the loss function uses the MSE loss.
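For concreteness, a minimal PyTorch sketch of such a model follows; the patent fixes only the two hidden layers, the stochastic gradient descent training function and the MSE loss, so the layer widths, learning rate and the 97-dimensional input (96 load points plus one behavior label) are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Minimal MLP matching the described setup: two hidden layers, SGD, MSE loss.
# Input: 96-point workday load profile + 1 electricity behavior label = 97 features.
# Output: 2 values, response willingness and response potential.
model = nn.Sequential(
    nn.Linear(97, 64), nn.ReLU(),   # hidden layer 1 (width 64 is an assumption)
    nn.Linear(64, 32), nn.ReLU(),   # hidden layer 2 (width 32 is an assumption)
    nn.Linear(32, 2),
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(x, y):
    """One forward/backward pass: predict, compute the MSE error, update weights."""
    optimizer.zero_grad()
    y_hat = model(x)            # forward propagation
    loss = loss_fn(y_hat, y)    # error e between prediction and true labels
    loss.backward()             # back propagation of gradients
    optimizer.step()            # gradient-descent weight update
    return loss.item()
```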
Step S32, training the MLP neural network model. The training data comprise input samples and true response labels: the input samples are the customers' 96-point workday load data together with the electricity consumption behavior labels obtained in step S1, and the true response labels comprise response willingness and response potential and are obtained from the grid company's earlier demand response work. Response willingness = days of participation in response / total days over which demand response was implemented, and represents the probability that a user participates in demand response; response potential is the amount by which a user's electricity load is reduced after participating in demand response compared with the load before the response, and represents the magnitude of the response.
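A tiny illustration of how such labels could be assembled from historical demand response records; the function and argument names are hypothetical:

```python
def response_labels(days_participated, total_dr_days, load_before, load_after):
    """Response willingness = participation days / total DR days (a probability);
    response potential = load reduction after participating, vs. before."""
    willingness = days_participated / total_dr_days
    potential = load_before - load_after
    return willingness, potential

# e.g. a user who responded on 18 of 30 DR days and cut load from 5.0 to 3.8 kW:
# response_labels(18, 30, 5.0, 3.8) -> (0.6, 1.2)
```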
After sufficient training data are acquired, the MLP model is trained. The training process comprises forward propagation and back propagation. Forward propagation is the model's prediction process: the data $X$ enter at the input layer and, after passing through the network structure with its weights and thresholds, yield the predicted output $\hat{Y}$. Back propagation is the model's parameter update process: the error $e$ between the predicted output $\hat{Y}$ and the true output $Y$ is computed with the loss function, and the model weights and bias parameters are then updated so that $e$ keeps decreasing, i.e. the model's predictions approach the true results, using the gradient descent rule
$$ w \leftarrow w - \gamma \frac{\partial e}{\partial w}, \qquad b \leftarrow b - \gamma \frac{\partial e}{\partial b} $$
where $e$ denotes the error and $\gamma$ the learning rate. The forward and back propagation processes are repeated with the training data until $e \le E$ for a very small threshold $E$; at that point the model's predictions are close to the true results and training is complete.
Step S4, solving the constructed load transfer model by utilizing deep reinforcement learning and outputting an optimal time-of-use electricity price recommendation, which specifically comprises the following steps:
and S41, constructing a win-win demand side load transfer model of the grid company and the power customer.
In the electricity market, the electricity selling company sets the retail price of electric energy and sells it to power users. However, many factors must be considered when setting the electricity price: if the price is too high, power users may be lost; if it is too low, the electricity selling company bears the risk of losses. Time-of-use pricing divides the day into several periods and sets a price for each according to the operating state of the system; this pricing mode reduces the risk of electricity price fluctuation and is popular with users. Starting from the two sides of the electricity selling company and the power users, a user flexible load economic transfer model that is win-win for both is built.
Electricity selling company model and power user model

The objective functions and constraint conditions of the two models are identical to the reward functions and constraints already defined in step S22: the electricity selling company maximizes the profit from the difference between its retail price $c_t$ and the wholesale price $u_t$, subject to the retail price constraint $c_{t,\min} \le c_t \le c_{t,\max}$; each power user minimizes its electricity expenditure plus dissatisfaction cost, subject to the user load constraint, the demand ramping constraint $E_{t,n} - U_{t,n} \le P_{t,n} \le E_{t,n} - D_{t,n}$, the transfer amount constraint, the cut-off amount constraint and the average satisfaction constraint; and the overall objective combines the two sides' benefits through the weighting factor $\alpha \in [0,1]$, with $\alpha = 1$ emphasizing the company's revenue and $\alpha = 0$ emphasizing the power users' benefits.
Step S42, solving the Markov game model in step S22 and the load transfer model in step S41 by utilizing the centralized training with decentralized execution (CTDE) core framework of multi-agent reinforcement learning, outputting an optimal time-of-use electricity price scheme, and assisting the grid company in formulating a suitable time-of-use electricity price to guide users to participate in peak clipping and valley filling. This further relieves the pressure on the power supply-demand relationship and safeguards the secure operation and reasonable planning of the power system.
Multi-agent reinforcement learning is a machine learning algorithm guided by the Nash equilibrium among the agents. For each independent agent, when the agent selects an action $a$ to act on the environment, the state $s$ of the environment changes into the state $s'$ of the next moment, and a reward or punishment signal $r$ is generated and fed back to the agent at the same time; the agent then selects a new action according to the received signal and the current state of the environment, until the iteration ends. Multi-agent reinforcement learning uses the centralized training with decentralized execution (CTDE) paradigm:
according to the CTDE idea, global state information of all agents can be used during model training to achieve a better training effect, and each agent in the decision stage is executed independently, and actions are output only according to own strategies. The model takes 24 hours a day as a finite time length, and discretizes the finite time length into 24 moments; three types of electricity consumers and electricity companies are respectively regarded as a decision intelligent agent. For three types of electricity consumers, the self load is adjusted according to the real-time electricity price and weather state (active power output of the distributed generation units) of an electricity selling company. For an intelligent body of an electricity selling company, an initial time-sharing electricity price is firstly established for an electricity user, the user decides whether to participate in response according to the electricity price and feeds back the total profits of the electricity user and the electricity selling company to the electricity selling company, then the electricity selling company resets the time-sharing electricity price according to the current total profits, when the total profits of the electricity user and the electricity selling company reach Nash equilibrium, the iteration process is stopped, and the time-sharing electricity price at the moment is the optimal demand response strategy.
The solving process of the power grid demand side response potential evaluation and load regulation method based on an artificial neural network and multi-agent reinforcement learning is shown in FIG. 3. The solving process combines the CTDE idea with the multi-agent deep deterministic policy gradient (MADDPG) algorithm to learn the optimal strategy. Each agent in the agent set $N$ is trained with the Actor-Critic method. The Actor networks of the agents are independent: each Actor network $\mu_\phi$ takes the state $s_t$ as network input and outputs an action according to that agent's action space:
$$ a = \mu_\phi(s_t) $$
The policy is then improved according to the estimate of the state-action value function $Q$ fed back by the Critic network, i.e. the objective function $J(\phi) = \mathbb{E}\big[Q_\theta\big(s_t, \mu_\phi(s_t)\big)\big]$ is maximized by updating the network parameters $\phi$.
however, all agents share a centralized Critic network, critic network Q θ In the state s of each agent t And action a t For network input, output state-motion cost function Q estimate Q θ (s t ,a t ) To evaluate the output of the Actor network. With the cooperation of the Actor and the Critic network, each agent can better specify the strategy and make corresponding decisions so as to achieve the Nash equilibrium of the mixed strategy among the agents.
In the power grid demand side response potential evaluation and load regulation method based on an artificial neural network and multi-agent reinforcement learning, the target gradients of each agent's Actor network and of the shared Critic network are respectively as follows:
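The two gradient formulas are rendered as images in the published text; the standard MADDPG forms, which the surrounding description matches, are sketched here as an assumption. For agent $i$ with Actor parameters $\phi_i$ and shared Critic parameters $\theta$:

$$ \nabla_{\phi_i} J(\phi_i) = \mathbb{E}\Big[ \nabla_{\phi_i}\, \mu_{\phi_i}(s_{t,i})\; \nabla_{a_i} Q_\theta(s_t, a_t)\big|_{a_i = \mu_{\phi_i}(s_{t,i})} \Big], \qquad L(\theta) = \mathbb{E}\Big[ \big( r_t + \gamma\, Q_\theta(s_{t+1}, a_{t+1}) - Q_\theta(s_t, a_t) \big)^2 \Big], $$

where the Critic is trained by minimizing the temporal-difference loss $L(\theta)$ and each Actor ascends its gradient $\nabla_{\phi_i} J(\phi_i)$.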
the application utilizes the strong nonlinear fitting capability of the artificial neural network to construct a complex mapping relation between the Markov state set and the Markov action set, fully excavates and utilizes the power consumption data characteristics of the clients, and further predicts the willingness and potential of the clients to participate in demand response; the method comprises the steps of constructing a win-win electricity load transfer model of an electricity selling company and a power grid customer, utilizing a core framework of multi-agent reinforcement learning Centralized Training and Distributed Execution (CTDE) to assist the power grid company to formulate a proper time-of-use electricity price strategy, regulating and controlling loads, guiding the customer to reasonably arrange electricity, improving the capacity of peak clipping and valley filling of the power grid, further relieving the pressure of power supply and demand relationship, and guaranteeing the safe operation and reasonable planning of a power system.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (5)

1. The power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning is characterized by comprising the following steps of:
step S1, clustering the customer electricity consumption data by using an improved k-means clustering algorithm to generate electricity consumption behavior labels, and clustering the customers into three categories of no peak, single peak and multiple peaks;
step S2, modeling the clustered no-peak, single-peak and multi-peak customers as three independent agents in the multi-agent reinforcement learning model, and establishing a partially observable Markov game model;
step S3, building and training a multi-layer perceptron neural network model, taking the 96-point workday load data and electricity consumption behavior labels of the three customer categories as input, and mining the implicit mapping between the input data and the customers' willingness and potential to participate in demand response;
and step S4, solving the constructed load regulation model by utilizing multi-agent reinforcement learning, and outputting an optimal time-of-use electricity price recommendation and load regulation scheme.
2. The method for evaluating the response potential and regulating the load on the demand side of a power grid based on artificial neural network and multi-agent reinforcement learning according to claim 1, wherein the step S1 comprises the following substeps:
step S11, determining an initial clustering center, sorting the total power loads of all users in a sample set, uniformly dividing the total power loads into K classes, and calculating the average value of the sample loads in each class as the initial clustering center of the class;
step S12, calculating distances from all samples to K clustering centers, dividing all samples into different categories according to the nearest distances, and recalculating and updating the clustering centers;
step S13, repeating step S12 until the cluster center is not changed.
3. The method for evaluating the response potential and regulating the load on the power grid demand side based on the artificial neural network and the multi-agent reinforcement learning according to claim 2, wherein the step S2 comprises the following substeps:
step S21, modeling the customers of the three clustering results of no peak, single peak and multiple peaks obtained in step S1 each as an independent agent in reinforcement learning;
step S22, taking the customer electricity load, electricity consumption behavior characteristics, demand response potential, real-time electricity price, weather state and the like as the Markov state; taking the time-of-use electricity price, interruptible load, adjustable load and the like as the Markov action; feeding back the negative value of each user class's total electricity cost as the reward item to the corresponding agent; and training the deep reinforcement learning game model based on the artificial neural network.
4. The method for evaluating the response potential and regulating the load on the demand side of a power grid based on artificial neural network and multi-agent reinforcement learning according to claim 3, wherein the step S3 comprises the following substeps:
step S31, building a multi-layer perceptron MLP neural network model, and setting network parameters such as the number of hidden layers, the number of neurons, the training function, the maximum number of iterations and the loss function;
and step S32, training the MLP neural network model, wherein the training data comprise input samples and true response labels, the input samples are the customers' 96-point workday load data and the electricity consumption behavior labels obtained in step S1, the true response labels comprise response willingness and response potential, and the users' true response labels are obtained from the grid company's earlier demand response work.
5. The method for evaluating the response potential and regulating the load on the demand side of a power grid based on artificial neural network and multi-agent reinforcement learning according to claim 4, wherein the step S4 comprises the following substeps:
s41, constructing a demand side load regulation model of win-win of a power grid company and a power customer;
and step S42, utilizing a multi-agent reinforcement learning centralized training, and solving the Markov game model in step S22 and the load transfer model in step S41 by using a core framework for performing CTDE in a distributed manner, outputting an optimal time-of-use electricity price scheme, and assisting a power grid company to formulate a proper time-of-use electricity price to guide a user to participate in peak clipping and valley filling.
CN202310777311.8A 2023-06-29 2023-06-29 Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning Pending CN117117878A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310777311.8A CN117117878A (en) 2023-06-29 2023-06-29 Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310777311.8A CN117117878A (en) 2023-06-29 2023-06-29 Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning

Publications (1)

Publication Number Publication Date
CN117117878A true CN117117878A (en) 2023-11-24

Family

ID=88797308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310777311.8A Pending CN117117878A (en) 2023-06-29 2023-06-29 Power grid demand side response potential evaluation and load regulation method based on artificial neural network and multi-agent reinforcement learning

Country Status (1)

Country Link
CN (1) CN117117878A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117674142A (en) * 2024-02-01 2024-03-08 云南电网有限责任公司信息中心 Power scheduling method and scheduling device based on time-sharing electric quantity accounting
CN117674142B (en) * 2024-02-01 2024-04-19 云南电网有限责任公司信息中心 Power scheduling method and scheduling device based on time-sharing electric quantity accounting


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination