CN116934011A

CN116934011A - Confidence algorithm for scheduling balance electricity utilization of multiple users by multiple suppliers of smart grid

Info

Publication number: CN116934011A
Application number: CN202310831603.5A
Authority: CN
Inventors: 马媛媛; 许曦予; 王轩慧; 李猛; 郭祥天; 李若冰
Original assignee: Qingdao Agricultural University
Current assignee: Qingdao Agricultural University
Priority date: 2023-07-07
Filing date: 2023-07-07
Publication date: 2023-10-24

Abstract

The invention relates to the technical field of power grid confidence algorithms, in particular to a confidence algorithm by which multiple suppliers of a smart grid schedule and balance the electricity consumption of multiple users. A virtual model of the actual situation is constructed so that the distributed scheduling problem can be simulated with mathematical analysis software. Using ideas from game theory, the load selection problem is described as a game, and it is proved that a Nash equilibrium (NE) point can be reached in this game. The algorithm is improved as follows: beta distributions serve as the basis on which each user selects a supplier in every iteration; after the users have selected their loads, all user actions are rewarded or penalized according to a judging rule; the parameters of the beta distributions are updated according to the rewards and penalties so that an NE point of the game is reached; and a limit on the number of iterations is set. Load balancing is achieved when the actions of the users remain unchanged over a certain number of iterations, which finally realizes the scheduling of the total load of multiple users by multiple suppliers.

Description

Confidence algorithm for scheduling balance electricity utilization of multiple users by multiple suppliers of smart grid

Technical Field

The invention relates to the technical field of power grid confidence algorithms, in particular to a confidence algorithm for scheduling balance power consumption of multiple users by multiple suppliers of a smart power grid.

Background

The smart grid confidence algorithm is a distributed scheduling algorithm (DRS) based on machine learning. Scheduling schemes are mainly of two types: centralized scheduling and distributed scheduling. In centralized scheduling, users rely entirely on a central controller (CC): they send their demands to the CC, which produces the desired schedule. This approach has the following problems: the data gathered by the CC can be exploited by data mining algorithms in ways that are difficult to manage and control, which creates a serious risk of infringing user privacy; scheduling efficiency is low, approval takes a long time, and flexibility is poor; and the CC requires high bandwidth because of the large volume of communication and information exchange. In distributed scheduling, peer users negotiate a scheduling protocol with one another to determine an optimal scheduling scheme, which effectively reduces the participation of the CC, and hence the central role of the power supply controller, in the load scheduling process. For example, the method for coordinated multi-point transmission scheduling and power allocation disclosed in Chinese patent publication No. CN103281770A dynamically adjusts the coordination mode in real time according to the system parameters on each PRB, giving flexible scheduling and strong adaptability; its centralized transmission scheduling scheme has low algorithm delay and good real-time performance; and it uses a non-cooperative game to achieve interference coordination in CoMP, with reasonable power allocation and improved throughput for edge users. Distributed scheduling schemes based on learning automata (LA) typically include Bayesian learning automata (BLA), linear reward-inaction (LRI), and co-game learning automata (CLA).
The latter two schemes have the following problems. LRI makes direct decisions: each LA selects an action according to its action selection probabilities; if the selected action is rewarded, a user-defined learning parameter is used to reduce the probabilities of the other actions and increase the probability of the selected action; if it is penalized, the LA leaves the action probabilities unchanged. CLA is similar to LRI, except that it explicitly uses a continuous utility function in its update equation. In both schemes, the action probabilities and the value of the learning parameter affect the convergence speed and how close the final solution is to the optimal solution.

Both schemes require the learning parameter to be determined in advance, and the best trade-off between the speed and the accuracy of the automaton is obtained by tuning this parameter, which controls the learning speed. However, the environments of different systems can differ greatly and may even change over time, so it is difficult to find a common learning parameter with which the LA achieves high performance in different environments.
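The dependence on a learning parameter can be illustrated with the textbook form of the linear reward-inaction update. The Python sketch below assumes that textbook form, since the source does not reproduce the update equations; the function name `lri_update` and the argument `lam` (standing for the learning parameter λ) are illustrative names, not taken from the source.

```python
def lri_update(probs, chosen, rewarded, lam):
    """Return the action probabilities after one linear reward-inaction step."""
    if not rewarded:
        # Inaction: a penalty leaves the probabilities unchanged.
        return list(probs)
    # Reward: shrink every probability by (1 - lam), then give the freed
    # probability mass to the chosen action so the vector still sums to 1.
    updated = [(1.0 - lam) * p for p in probs]
    updated[chosen] = probs[chosen] + lam * (1.0 - probs[chosen])
    return updated


# A larger lam moves the probabilities faster but commits earlier.
for lam in (0.05, 0.5):
    p = [0.5, 0.5]
    for _ in range(10):
        p = lri_update(p, chosen=0, rewarded=True, lam=lam)
    print(lam, [round(x, 4) for x in p])
```

A small `lam` changes the probabilities slowly, while a large `lam` commits to an action after only a few rewards. This is the speed and accuracy trade-off that has to be tuned for each environment.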

Disclosure of Invention

The technical problem to be solved by the invention is to overcome the defects of the prior art by providing a confidence algorithm by which multiple suppliers of a smart grid schedule and balance the electricity consumption of multiple users. A virtual model of the actual situation is constructed so that the distributed scheduling problem can be simulated with mathematical analysis software. By incorporating ideas from game theory, the load selection problem is described as a game, and it is demonstrated that an NE point can be reached in this game. The algorithm is further improved: beta distributions are defined as the basis on which users select suppliers in each iteration; the users inform one another after load selection; all user actions are rewarded or penalized according to a judging rule; and the parameters of the beta distributions are updated according to the rewards and penalties. Finally, in order to reach an NE point of the game, a limit on the number of iterations is set, and when the actions of the users remain unchanged over a certain number of iterations, the scheduling of the total load of multiple users by multiple suppliers is realized.

The technical scheme of the invention is as follows:

A confidence algorithm by which multiple suppliers of a smart grid schedule and balance the electricity consumption of multiple users comprises the following steps:

Step one, multi-supplier budget and multi-user decision: assume that the power system has users i and suppliers j, wherein:

user i, $i \in \{1, 2, \ldots, N\}$, has a total load $L_i \in \{L_1, L_2, \ldots, L_N\}$;

supplier j, $j \in \{1, 2, \ldots, M\}$, uses its overall budget $C_j$ to provide power to all potential users i;

with time slot t as the period, the power budget $C_j$ of each supplier j is considered as a separate budget; when the power budget $C_j$ is insufficient to satisfy the loads $L_i$, the load $L_i$ that the suppliers j can provide is maximized by the objective function:

$$\max \sum_{j=1}^{M} \sum_{i=1}^{N} d_{i,j} L_i \quad (1)$$

subject to the following two constraints:

constraint one: the users i of supplier j do not overload supplier j:

$$\sum_{i=1}^{N} d_{i,j} L_i \le C_j, \quad \forall j \quad (2)$$

constraint two: a user i can be powered by only one supplier j:

$$\sum_{j=1}^{M} d_{i,j} \le 1, \quad \forall i \quad (3)$$

then, whether user i is powered by supplier j is given by the decision:

$$d_{i,j} \in \{1, 0\} \quad (4)$$

where:

$d_{i,j} = 1$ means that the load $L_i$ of user i is supplied with power by supplier j during the current time period;

$d_{i,j} = 0$ means that the load $L_i$ of user i is not supplied with power by supplier j during the current time period;

Step two, analysis of the improved confidence algorithm, comprising:

S21, user decision: user i maintains the parameters $\{a_{i,j,k}, b_{i,j,k}\}$ for supplier j, where k = 1 denotes load on and k = 0 denotes load off; whether user i selects supplier j with the load on is determined by which parameters yield the largest sampled value of the beta distribution:

$$f(x; a_{i,j,k}, b_{i,j,k}) = \frac{x^{a_{i,j,k}-1} (1-x)^{b_{i,j,k}-1}}{\int_0^1 u^{a_{i,j,k}-1} (1-u)^{b_{i,j,k}-1} \, du}, \quad x \in [0, 1]$$

S22, reward and penalty calculation: a user decision is evaluated against two results:

result one: the power budget constraint is satisfied, i.e. $\sum_{i=1}^{N} d_{i,j} L_i \le C_j$ for every supplier j;

result two: the current load sum is greater than or equal to the maximum valid load sum of the previous iterations, where a valid load sum is a combination of loads that meets the power budget constraint;

if both results are satisfied at the same time, the user obtains a reward; if either result is not satisfied, the user receives a penalty;

S23, BLA hyperparameter update: the hyperparameters are updated by the following rules:

when the current decision of user i is to select supplier j with the load on,

if this results in a reward, $a_{i,j,1} = a_{i,j,1} + 1$;

if this results in a penalty, $b_{i,j,1} = b_{i,j,1} + 1$;

when the current decision of user i is to select supplier j with the load off,

if this results in a reward, $a_{i,j,0} = a_{i,j,0} + 1$;

if this results in a penalty, $b_{i,j,0} = b_{i,j,0} + 1$;

S24, iteration stopping criterion: let s denote a particular iteration; the decision of user i at iteration s is denoted $d_{i,j}(s)$, and $L_T(s)$ denotes the current value of the total load at iteration s, which changes with the decisions $d_{i,j}(s)$; when the stopping criterion is met, the users that have converged to the load-on decision turn on their loads; the average value of the selected load is very close to the optimum, which indicates that the algorithm converges to the NE point with high accuracy.

Preferably, the multi-supplier budget and multi-user decision of step one is realized in a distributed manner using a game-theoretic method. The total number of users and suppliers is expected to be limited. The distributed load selection problem among the users is described as a game in which the users compete with one another for power access, and its purpose is to determine a suitable group of users whose total demand is as close as possible to, but within, the budgets of the different suppliers.

Preferably, in the user decision of step S21, at the beginning of each iteration, each user must decide on the action of the current iteration, the iterative action being decided by sampling a set of beta distributions maintained by the specific user; at the beginning of each iteration, a random number is extracted from each beta distribution, and then the sample with the largest value is found, which determines the action in the current iteration.

Preferably, in the reward and penalty calculation of step S22, once a user has decided on its action for the current iteration, the action is broadcast in the local area network to notify the other users; when all users share this information, the reward or penalty is calculated separately by each user; if the current action produces result one and result two of step S22 at the same time, the user is given a reward;

a reward is obtained only when the load sum is non-decreasing and the power budget is met, which guides the learning entities to converge to a point that provides a better load sum; since the utility function is defined from the global view, the same reward or penalty applies to all users.

Preferably, in the BLA hyperparameter update of step S23, each user updates the hyperparameters of its beta distributions in the current iteration according to the reward or penalty obtained; that is, each user updates its own hyperparameters for the action taken in the current iteration according to the feedback from the environment; this update rule allows each user to learn from the environment through interaction with the other users and to try to maximize its own benefit as defined by the utility function; after a certain number of rounds of interaction and learning, the users converge to an NE point.

Preferably, in the iteration stopping criterion of step S24, the inputs are the budgets provided by the different suppliers and the parameters of the stopping criterion; the output is the set of users whose loads are supplied with power by the selected suppliers.

Preferably, the iteration stopping criterion of step S24 comprises the following steps:

each user broadcasts its demand to the other users; if [formula missing in source] holds, the users can turn on their loads; set s = 0 and initialize $a_{i,j,k} = b_{i,j,k} = 1$;

for each supplier j and each k, draw a random number $x_{i,j,k}$ from the beta distribution $\beta(a_{i,j,k}, b_{i,j,k})$ of step S21, so that the set $\{x_{i,j,k}\}$ contains 2M random values;

select the action $d_{i,j}(s)$ associated with the largest element of $\{x_{i,j,k}\}$ and broadcast the user's action to all other users;

calculate the load:

$$L_T(s) = \sum_{j=1}^{M} \sum_{i=1}^{N} d_{i,j}(s) L_i$$

if $\sum_{i=1}^{N} d_{i,j}(s) L_i \le C_j$ for every supplier j and $L_T(s) \ge c$, where c is the largest valid load sum found so far, action $d_{i,j}(s)$ earns user i a reward and c is set to $L_T(s)$;

on a reward: if $d_{i,j}(s) = 0$ then $a_{i,j,0} = a_{i,j,0} + 1$, otherwise $a_{i,j,1} = a_{i,j,1} + 1$;

on a penalty: if $d_{i,j}(s) = 0$ then $b_{i,j,0} = b_{i,j,0} + 1$, otherwise $b_{i,j,1} = b_{i,j,1} + 1$;

if the stopping criterion is met, the users that have converged to the load-on decision turn on their loads.
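The steps above can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `bla_schedule`, the arguments `stable_rounds` (the number of unchanged iterations that counts as convergence) and `max_iters` (the iteration cap), the variable `best_valid` (the largest load sum so far that met every budget) and the example loads and budgets are all illustrative. The sketch also assumes that convergence is judged on the joint on/off-and-supplier decision, and it makes no claim about how many iterations a given instance needs.

```python
import random


def bla_schedule(loads, budgets, stable_rounds=30, max_iters=5000, seed=0):
    """Sketch of the iterative scheme; returns (decisions, iterations).

    decisions[i] is the supplier index chosen by user i, or None for load off.
    """
    rng = random.Random(seed)
    n, m = len(loads), len(budgets)
    # a[i][j][k], b[i][j][k]: beta hyperparameters of user i for supplier j,
    # with k = 0 (load off) or k = 1 (load on); all initialised to 1.
    a = [[[1, 1] for _ in range(m)] for _ in range(n)]
    b = [[[1, 1] for _ in range(m)] for _ in range(n)]
    best_valid = 0.0   # largest load sum so far that met every budget
    last, streak = None, 0
    for s in range(1, max_iters + 1):
        # Each user draws 2M samples and keeps the action with the largest one.
        actions = []
        for i in range(n):
            samples = {(j, k): rng.betavariate(a[i][j][k], b[i][j][k])
                       for j in range(m) for k in (0, 1)}
            actions.append(max(samples, key=samples.get))
        # Load per supplier and total load implied by the broadcast actions.
        per_supplier = [0.0] * m
        for i, (j, k) in enumerate(actions):
            if k == 1:
                per_supplier[j] += loads[i]
        total = sum(per_supplier)
        feasible = all(used <= cap for used, cap in zip(per_supplier, budgets))
        rewarded = feasible and total >= best_valid
        if rewarded:
            best_valid = total
        # The same reward or penalty updates the sampled action of every user.
        for i, (j, k) in enumerate(actions):
            if rewarded:
                a[i][j][k] += 1
            else:
                b[i][j][k] += 1
        # Stop once the joint decision has repeated for stable_rounds iterations.
        decisions = [j if k == 1 else None for j, k in actions]
        streak = streak + 1 if decisions == last else 1
        last = decisions
        if streak >= stable_rounds:
            break
    return last, s


decisions, iterations = bla_schedule(loads=[3, 4, 5, 6], budgets=[7, 6])
print(decisions, iterations)
```

No load is switched inside the loop: only decisions are exchanged, and the returned list is what would be acted on once the game stops.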

Preferably, in the iteration stopping criterion of step S24, after a number of iterations the users converge to specific actions, and the game then stops and returns the action set; here, the game is considered to have converged when all users take the same actions continuously for a certain number of iterations; to avoid any potential non-convergence, a maximum number of iterations $I_m$ is defined, and after $I_m$ iterations the game stops regardless of the convergence criterion above;

when the game stops and returns to the user's action set, the user with decision 1 gets power from the corresponding provider; in the iterative process, 1/0 action decision exchange is carried out between users, and the load is not turned on/off; only when the game is stopped will power be transferred accordingly.

Compared with the prior art, the invention has the following beneficial effects:

(1) Optimizing grid usage: the invention improves the parameters of distributed scheduling in the intelligent power grid, so that the power scheduling is more efficient, controllable and intelligent, the power supply pressure can be relieved when the user uses electricity in the peak period, the consumption peak period can be transferred to the time with lower demand, and the use of the power grid is optimized; and finally, the scheduling of multiple suppliers to the total load of multiple users is realized.

(2) Reduced network loss: once a power grid has been built, its line structure and parameters remain relatively unchanged, and for a given load there is necessarily an operating condition with minimum network loss. If the peak-valley load difference increases during grid operation, the grid utilization coefficient and the load rate inevitably decrease, and the power loss may increase. The invention distributes the power load in a balanced way and optimizes the load configuration, which is conducive to the economic operation of the power grid.

(3) High-efficiency interaction: the bidirectional interaction between the electricity utilization user and the power supply and electricity selling enterprises can be promoted. The coordination control process is a voluntary active participation process of both parties, which is beneficial to improving the utilization rate of energy sources and reducing equipment investment and loss.

(4) Stable electricity supply: in distributed power supply, since the power supplier already provides the power set value, the power supply can meet the requirements of power balance, voltage stability and high reliability, and the requirements of users on power quality are met.

(5) Harmonious load management without switch-off power rationing: users participate autonomously in the electricity distribution process, and their demands are met to the greatest extent. The load regulation potential of power plants can also be further exploited.

(6) Intelligent electricity use, energy conservation and emission reduction: intelligent control of electric power guides and encourages the broad base of users to optimize how they use electricity, which improves efficiency and reduces electricity consumption and demand. This relieves power shortages and lowers both supply costs and consumption costs, so that suppliers and consumers both benefit, serving the long-term goals of saving energy and protecting the environment.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.

Fig. 1 is a schematic view of scheduling balance electricity consumption of the smart grid of the present invention.

Fig. 2 is a flow chart of a procedure for improving the confidence algorithm of the present invention.

FIG. 3 is a simulated process diagram of the improved confidence algorithm of the present invention.

FIG. 4 is a command operation diagram of the improved confidence algorithm of the present invention.

FIG. 5 is a graph of user rewards and penalties in example 4.

Detailed Description

In order to make the technical solution of the present invention better understood by those skilled in the art, the technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

The invention proposes an improved confidence algorithm that combines a learning automaton (LA) with distributed game theory, uses the simulation software ns2 to explain the implementation idea and to design and analyze the model, and demonstrates that the BLA-based method converges to an NE point with very high probability without any pre-configuration, which makes it a more practical and advantageous choice in this application field.

Example 1

As shown in fig. 1, the embodiment provides a confidence algorithm for scheduling and balancing power consumption of multiple users by multiple suppliers of a smart grid, which includes the following steps:

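step one, multi-supplier budget and multi-user decision: user i, $i \in \{1, 2, \ldots, N\}$, has a total load $L_i \in \{L_1, L_2, \ldots, L_N\}$, and the load that the suppliers can provide is maximized subject to the following two constraints:

constraint one: the users i of supplier j do not overload supplier j:

$$\sum_{i=1}^{N} d_{i,j} L_i \le C_j, \quad \forall j \quad (2)$$

constraint two: a user i can be powered by only one supplier j:

$$\sum_{j=1}^{M} d_{i,j} \le 1, \quad \forall i \quad (3)$$

then, whether user i is powered by supplier j is given by the decision:

$$d_{i,j} \in \{1, 0\} \quad (4)$$

where $d_{i,j} = 1$ means that the load $L_i$ of user i is supplied with power by supplier j during the current time period;

step two, analysis of the improved confidence algorithm: a user decision is rewarded only if, together with a non-decreasing load sum, the following result holds:

result one: the power budget constraint is satisfied, i.e. $\sum_{i=1}^{N} d_{i,j} L_i \le C_j$ for every supplier j;

the BLA hyperparameters are then updated; when the current decision of user i is to select supplier j with the load on,

if this results in a reward, $a_{i,j,1} = a_{i,j,1} + 1$;

if this results in a penalty, $b_{i,j,1} = b_{i,j,1} + 1$;

when the current decision of user i is to select supplier j with the load off,

if this results in a reward, $a_{i,j,0} = a_{i,j,0} + 1$;

if this results in a penalty, $b_{i,j,0} = b_{i,j,0} + 1$;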

Working principle:

The invention provides a smart grid electricity consumption scheduling and balancing method based on an improved confidence algorithm. A virtual model of the actual situation is constructed so that the distributed scheduling problem can be simulated with mathematical analysis software. By incorporating ideas from game theory, the load selection problem is described as a game, and it is demonstrated that an NE point can be reached in this game. The algorithm is further improved: beta distributions are defined as the basis on which users select suppliers in each iteration; the users inform one another after load selection; each user action is rewarded or penalized according to a judging rule; and the parameters of the beta distributions are updated according to the rewards and penalties. Finally, in order to reach an NE point of the game, a limit on the number of iterations is set, and load balancing is achieved when the actions of the users remain unchanged over a certain number of iterations.

Example 2

The present invention will be further explained on the basis of example 1 with reference to the accompanying drawings:

The object of study of the invention is a smart grid subnet: the local customer network between the transformer and the customers. A typical scenario is a residential area in which several households act as users. The power sources are provided and installed by power suppliers, who obtain their power budgets from the upper layers of the power network according to the suppliers' schedules. The goal of a power source is to supply the individual users while keeping the overall power consumption below a given power budget. Where multiple suppliers are available, each supplier provides its own power budget.

Assume that there are N users in total, $i \in \{1, 2, \ldots, N\}$, and that $L_i \in \{L_1, L_2, \ldots, L_N\}$ are their respective movable loads. Assume that there are M suppliers in the system, $j \in \{1, 2, \ldots, M\}$. Each supplier j announces to all potential users its overall budget for movable loads, denoted $C_j$. The system operates periodically with time slot t, whose length is of the order of a few minutes. At the beginning of each period, the movable-load budgets of all suppliers are announced to all users, and the power budget $C_j$ of each individual supplier is considered as a separate budget. The budgets provided may not be sufficient to serve all loads $L_i$. The system therefore needs to determine which loads can be served within the budgets $C_j$: after receiving the budgets, the users compete for their required loads, and the loads of the winning users are served in the slot.

The goal of the distributed scheduling problem is to determine a suitable group of users whose total demand is as close as possible to, but within, the budgets of the different suppliers. Let $d_{i,j} \in \{1, 0\}$ indicate whether user i is served by supplier j. A decision of 1 means that the demand of the user will be served by the supplier in the current period, while a decision of 0 means that the corresponding load demand will not be served by that supplier in the current period. Thus, the users who obtain decision 1 and select supplier j share supplier j's budget. In the current time slot, a user not served by supplier j may be served by another supplier. If no supplier is able to serve a user in the current period, the user will be served in a future period. Formally, the problem is expressed as follows:

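$$\max \sum_{j=1}^{M} \sum_{i=1}^{N} d_{i,j} L_i \quad (1)$$

$$\sum_{i=1}^{N} d_{i,j} L_i \le C_j, \quad \forall j \quad (2)$$

$$\sum_{j=1}^{M} d_{i,j} \le 1, \quad \forall i \quad (3)$$

$$d_{i,j} \in \{1, 0\} \quad (4)$$

The objective function (1) maximizes the amount of load that the suppliers can provide. Constraint (2) ensures that the users of supplier j do not overload supplier j. Constraint (3) ensures that each user is served by at most one supplier.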
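For a very small instance, the problem described by equations (1) to (4) can be solved by exhaustive search, which gives a reference optimum against which a learning scheme can be checked. The sketch below is illustrative only: the function name `best_assignment` and the example loads and budgets are assumptions, and the search visits (M+1)^N assignments, so it is usable only for a handful of users.

```python
from itertools import product


def best_assignment(loads, budgets):
    """Exhaustively maximise the served load for a tiny instance."""
    m = len(budgets)
    best_total, best_choice = -1.0, None
    # choice[i] is a supplier index, or None when user i is not served, so
    # each user has at most one supplier by construction (constraint 3).
    for choice in product([None, *range(m)], repeat=len(loads)):
        used = [0.0] * m
        for load, j in zip(loads, choice):
            if j is not None:
                used[j] += load
        # Constraint (2): no supplier may exceed its budget.
        if all(u <= cap for u, cap in zip(used, budgets)):
            total = sum(used)  # objective (1)
            if total > best_total:
                best_total, best_choice = total, choice
    return best_total, best_choice


total, choice = best_assignment(loads=[3, 4, 5, 6], budgets=[7, 6])
print(total, choice)
```

In this example the two budgets can be filled exactly (3 + 4 on the first supplier, 6 on the second), and the user with load 5 waits for a later period.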

The main focus of the present invention is the scheduling of the total load of users in domestic micro-grids, and thus the total number of users and suppliers is expected to be limited. In this work, the invention is realized in a distributed manner by adopting a game theory method, and the distributed load selection problem among users is described as a game that users compete with each other for power access.

The improved algorithm comprises four main parts: user decision, reward and penalty calculation, BLA hyperparameter update, and the iteration stopping criterion.

1) User decision

At the beginning of each iteration, each user must determine the action of the current iteration, which is determined by sampling a set of beta distributions maintained by the particular user.

More specifically, each user i maintains four parameters for each supplier j, namely $\{a_{i,j,k}, b_{i,j,k}\}$, where i is the user, j is the supplier, and $k \in \{0, 1\}$ indicates that the load is off (k = 0) or on (k = 1). For example, user 1 maintains $a_{1,1,0}$, $a_{1,1,1}$, $b_{1,1,0}$ and $b_{1,1,1}$ for supplier 1. These parameters define the beta distributions used for decision making, as shown in equation (5.1), giving a total of 4M parameters per user, corresponding to 2M beta distributions.

At the beginning of each iteration, a random number is drawn from each beta distribution, and the sample with the largest value determines the action in the current iteration. For example, if the beta distribution with parameters $a_{1,1,1}$ and $b_{1,1,1}$ yields the largest sample, the action of user 1 is to turn on its load using supplier 1.
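The sampling step for a single user can be sketched as follows. The function name `choose_action` and the nested-list layout `a[j][k]`, `b[j][k]` are illustrative assumptions; `random.betavariate` is the Python standard library sampler for the beta distribution.

```python
import random


def choose_action(a, b, rng=random):
    """Return the (supplier j, on/off k) pair whose beta sample is largest.

    a[j][k] and b[j][k] hold one user's hyperparameters for supplier j,
    with k = 0 (load off) and k = 1 (load on).
    """
    samples = {
        (j, k): rng.betavariate(a[j][k], b[j][k])
        for j in range(len(a))
        for k in (0, 1)
    }
    return max(samples, key=samples.get)


# Two suppliers; the action "supplier 1, load on" has collected many rewards.
a = [[1, 1], [1, 40]]
b = [[1, 1], [1, 1]]
rng = random.Random(1)
picks = [choose_action(a, b, rng) for _ in range(1000)]
print(picks.count((1, 1)) / len(picks))
```

Because the choice is made by sampling rather than by taking the largest mean, actions with few observations are still tried occasionally, which is how exploration arises without a tuned learning parameter.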

2) Reward and penalty calculation

Once the users have decided on their actions for the current iteration, they broadcast their decisions in the local area network to notify the other players. When all players share this information, the reward or penalty can be calculated separately by each user. If the current action produces both of the following results, a reward is given to the user.

1. The power budget constraint is met, i.e. $\sum_{i=1}^{N} d_{i,j} L_i \le C_j$ for every supplier j.

2. The current load sum is greater than or equal to the maximum valid load sum of the previous iterations, where a valid load sum is a combination of loads that meets the power budget constraint.

If the two conditions are not both met, a penalty is given to the user. Thus a reward is obtained only when the load sum is non-decreasing and the power budget is met, which leads the learning entities to converge to a point that provides a better load sum. Since the utility function is defined from the global view, the same reward or penalty applies to all users.
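The reward rule can be written as a small pure function that every user evaluates on the same broadcast information. This is a sketch with assumed names: `is_rewarded`, `per_supplier_load` (the load currently requested from each supplier) and `best_valid_sum` (the largest budget-respecting load sum seen in earlier iterations) do not appear in the source.

```python
def is_rewarded(per_supplier_load, budgets, best_valid_sum):
    """Return (rewarded, updated_best_valid_sum) for one iteration."""
    # Condition 1: every supplier stays within its own budget.
    feasible = all(load <= cap for load, cap in zip(per_supplier_load, budgets))
    # Condition 2: the load sum is at least the best valid sum so far.
    total = sum(per_supplier_load)
    if feasible and total >= best_valid_sum:
        return True, total
    return False, best_valid_sum


print(is_rewarded([5, 4], [7, 6], 8))   # feasible and improving
print(is_rewarded([8, 0], [7, 6], 0))   # first supplier overloaded
print(is_rewarded([3, 0], [7, 6], 9))   # feasible but worse than before
```

Since the inputs are identical for all users, each of them computes the same outcome locally, which matches the statement that the same reward or penalty applies to all users.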

3) BLA hyper-parameter update

Based on the reward or penalty obtained, each user updates the hyperparameters of its beta distributions in the current iteration. The update rule is as follows. When the current action of user i is to select supplier j with the load on, $a_{i,j,1} = a_{i,j,1} + 1$ if the action results in a reward, and $b_{i,j,1} = b_{i,j,1} + 1$ if it results in a penalty. Similarly, when the current action of user i is to select supplier j with the load off, $a_{i,j,0} = a_{i,j,0} + 1$ if the action results in a reward, and $b_{i,j,0} = b_{i,j,0} + 1$ if it results in a penalty.

With these rules, each user updates its own hyperparameters for the action taken in the current iteration according to the feedback from the environment. This update rule, together with the sampling method explained in Table 1, constitutes a BLA. Each user is equipped with a BLA that learns from the environment through interaction with the other users and tries to maximize its own benefit as defined by the utility function. After a certain number of rounds of interaction and learning, the users converge to an NE point.
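Why these counters steer the sampling can be seen from a standard property of the beta distribution, which the source does not state explicitly: its mean is the first parameter divided by the sum of both. Assuming an action starts from $a_{i,j,k} = b_{i,j,k} = 1$ and has since collected r rewards and p penalties (r and p are illustrative symbols), the mean of its sampling distribution is

$$\mathbb{E}[x_{i,j,k}] = \frac{a_{i,j,k}}{a_{i,j,k} + b_{i,j,k}} = \frac{1 + r}{2 + r + p}$$

For example, r = 8 and p = 2 give 9/12 = 0.75, whereas an action with r = 2 and p = 8 has mean 3/12 = 0.25, so the first action produces the largest sample far more often. As the counts grow, the distribution also narrows, so the choice becomes increasingly stable.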

4) Stop criterion for iteration number

After a number of iterations, the users converge to specific actions, and the game then stops and returns the action set. Here, the game is considered to have converged when all users take the same actions continuously for a certain number of iterations, denoted S. To avoid any potential non-convergence, the invention defines a maximum number of iterations $I_m$: after $I_m$ iterations, the game stops regardless of the convergence criterion above.

When the game stops and returns to the user's action set, the user who decides ON can obtain power service from the corresponding provider. During the iteration process, the users exchange ON/OFF action decisions and they do not turn ON/OFF their devices. Only when the game is stopped will power be delivered accordingly.
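The stopping rule can be sketched as a small helper that is fed the joint action of all users once per iteration. The class name `StopCriterion` and its attribute names are illustrative assumptions; `stable_rounds` plays the role of S and `max_iters` the role of the maximum number of iterations.

```python
class StopCriterion:
    """Stop after S identical joint actions in a row, or after the iteration cap."""

    def __init__(self, stable_rounds, max_iters):
        self.stable_rounds = stable_rounds  # S
        self.max_iters = max_iters          # maximum number of iterations
        self.iterations = 0
        self.streak = 0
        self.last = None

    def update(self, joint_action):
        """Record one iteration; return True when the game should stop."""
        joint_action = tuple(joint_action)
        self.iterations += 1
        # Extend the streak if nothing changed, otherwise restart it at 1.
        self.streak = self.streak + 1 if joint_action == self.last else 1
        self.last = joint_action
        return (self.streak >= self.stable_rounds
                or self.iterations >= self.max_iters)


stop = StopCriterion(stable_rounds=3, max_iters=100)
history = [(1, 0), (1, 0), (0, 1), (0, 1), (0, 1)]
print([stop.update(x) for x in history])
```

Only when `update` returns True would the users with an ON decision actually draw power; until then the loop exchanges decisions and nothing is switched.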

Example 3

Based on example 2, the overall LA-based algorithm for smart grids (SGs) is explained below, as in Table 1.

The invention uses s to denote a particular iteration; the decision of user i at iteration s is denoted $d_{i,j}(s)$. Similarly, $L_T(s)$ denotes the current value of the total load at iteration s. The value of $L_T(s)$ changes with the decisions $d_{i,j}(s)$.

Table 1 algorithm table

Table 2 compares the three LA algorithms. For CLA and LRI, the results are given for different values of λ. In contrast, the results of BLA are presented in a single row in these tables, as BLA does not depend on any parameter.

Table 2 algorithm comparison table

For a BLA-based, parameter-free algorithm, the average of the chosen loads is very close to the best point in both tables, indicating that the convergence accuracy of the algorithm is close to the NE point. Furthermore, unlike other LA-based schemes, the BLA algorithm has no speed/accuracy conflicts.

When λ is 0.10 and 0.15, the number of iterations of LRI and CLA decreases, while that of BLA does not change. In this case the load capacity remains the same, but LRI and CLA need fewer iterations than BLA, so BLA is not the best-performing algorithm.

When λ is greater than 0.20, LRI and CLA fall short in load capacity: as λ increases, the load capacity decreases and the accuracy of reaching the NE point declines steadily, whereas the accuracy of BLA does not change. Although LRI and CLA need fewer iterations than BLA, the first consideration of the invention is that the result be optimal, so in this case BLA is the better algorithm.

In summary, in the first case and the third case, BLA is a better performing LA algorithm that exceeds LRI and CLA in terms of speed and accuracy, respectively. In the second case, a relatively small lambda value results in better performance of LRI and CLA than BLA, but the lambda value may vary greatly depending on the configuration of the system. Thus, whenever LRI or CLA is applied, an ideal value of λ must be determined for a particular system configuration. However, the BLA-based approach does not require any prior configuration to be set, which makes it a more practical choice in this application field.

Example 4

Based on example 3, the following explanation is made for the improved confidence algorithm simulation analysis.

Simulation program flow diagram the framework design is performed before programming, as shown in fig. 2, and the following parameter analysis is performed:

1) Prize and punish time:

the time it takes to make a reward or penalty between two users. The time it takes for a node to make each iteration is recorded by Matlab and denoted t. Statistics of punishment and punishment time are performed in order to calculate the time spent in network communication between users.

2) Iteration number:

all users reach the number of iterations performed after the nash equalization point. The speed of tending to the equilibrium point among different algorithms can be obtained by comparing and analyzing the number relation of the iteration times. The more iterations, the more time it takes to reach the equilibrium point, i.e. the slower the speed towards the NE point, the lower the performance of the algorithm corresponding to the number of iterations. Conversely, the higher.

3) Load capacity:

The sum of the capacities of the users that select the load-on action. The accuracy with which the algorithm converges to the NE point is analyzed by observing the load capacity. For the same number of iterations, the greater the load capacity, the higher the accuracy; conversely, the lower.

4) Data statistics

The time of a single reward or penalty (one period) was measured with Matlab; 78 sets of data were randomly drawn from all recorded reward and penalty times, giving an average of 6, as shown in table 3. The interval between two packets transmitted between users in the communication domain is therefore taken as 6. The simulation model assumes 3 power transformers supplying 20 customers. 24 experiments were performed, recording the actions taken by the 20 users (1 representing load on and 2 representing load off) together with the iteration number and the load capacity.

Table 3 Reward and penalty time table

As can be seen from table 4, user 2 turned its load off nine times, user 16 eleven times, and user 19 three times, indicating that the NE point reached is not the same single point in every run of the game.

Table 4 User action table

As shown in table 5, the average number of iterations F is 636542; that is, the users need F iterations to reach the load balancing point, which is equivalent to performing F transmissions at each node in Ns2.

Table 5 Iteration number table

As shown in table 6, the load capacity was substantially the same for each simulation, indicating that the accuracy of the experiment was relatively stable when load balancing was achieved. The Ns2 simulation software was then used to analyze the communication between users.

Table 6 Load capacity table

Two users are simulated with the bandwidth set to 1 Gbps and the packet size to 1250 bytes; 636542 transmissions (the iteration number) are carried out, and the interval between two packets is the single reward and penalty time, 6. According to the trace file, as shown in fig. 3, the total transmission time is 4.82 seconds, i.e. 4.82 seconds are required to reach the NE point. Analysis of the trace file with awk gives a number of iterations equal to 636542, as shown in fig. 4. The rewards and penalties obtained by one user in the first 300 iterations toward the NE point are analyzed, as shown in fig. 5. In the last part of fig. 5 the user's reward and penalty value no longer changes, from which convergence to the NE point can be determined.
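The convergence judgment used here, namely that a recorded value stops changing over the final stretch of iterations, can be sketched in Python as follows. This is only an illustration: the function name `has_converged` and the `window` parameter (how many consecutive unchanged iterations count as convergence) are assumptions, since the text does not state the window length that was used.

```python
def has_converged(history, window):
    """Return True if the last `window` recorded values are all identical.

    `history` is a list with one entry per iteration, for example a user's
    reward and penalty value or the tuple of all users' actions.
    `window` is a hypothetical parameter, not a value given in the text.
    """
    if window <= 0 or len(history) < window:
        return False
    recent = history[-window:]
    return all(value == recent[0] for value in recent)


# Usage: joint actions of two users over five iterations (1 = load on, 2 = load off).
joint_actions = [(1, 2), (2, 2), (1, 1), (1, 1), (1, 1)]
stable_for_three = has_converged(joint_actions, window=3)
stable_for_four = has_converged(joint_actions, window=4)
```

In a full simulation such a check would be combined with an upper bound on the number of iterations, so that the loop ends even if the values never settle.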

Claims

1. A confidence algorithm for scheduling and balancing electricity consumption of multiple users by multiple suppliers of a smart grid, which is characterized by comprising the following steps:

user i, $i \in N$, $N = \{1, 2, \ldots, N\}$; user i has a total load $L_i \in \{L_1, L_2, \ldots, L_N\}$;

supplier j, $j \in M$, $M = \{1, 2, \ldots, M\}$; supplier j uses its overall budget $C_j$ to provide power to all potential users i;

with time slot t as the period, the power budget of each supplier j is treated as a separate budget; when the power budget is insufficient to satisfy the load $L_i$, the load $L_i$ that supplier j can provide is maximized according to the following objective function:

the following two constraints are combined:

constraint one: the users i of supplier j do not overload supplier j:

constraint two: a user i can be powered by only one supplier j:

then, whether user i is powered by supplier j is determined by: $d_{i,j} \in \{1, 0\}$ (4)

wherein: $d_{i,j} = 1$ means that, during the current time period, the load $L_i$ of user i is supplied with power by supplier j;

$d_{i,j} = 0$ means that, during the current time period, the load $L_i$ of user i is not supplied with power by supplier j;

S21, user decision: user i maintains parameters $\{a_{i,j,k}, b_{i,j,k}\}$ for supplier j, and whether user i selects a supplier with its load turned on is decided by which parameters yield the maximum sampled value of the beta distribution in the following formula:

result one: the power budget constraint is satisfied, i.e.:

if a reward is given, $a_{i,j,1} = a_{i,j,1} + 1$;

if a penalty is given, $b_{i,j,1} = b_{i,j,1} + 1$;

if a reward is given, $a_{i,j,0} = a_{i,j,0} + 1$;

if a penalty is given, $b_{i,j,0} = b_{i,j,0} + 1$;

S24, iteration stopping criterion: let s denote a particular iteration; the decision of user i at iteration s is denoted $d_{i,j}(s)$, and $L_T(s)$ denotes the current value of the total load at iteration s, which changes with the decisions $d_{i,j}(s)$; when the stopping criterion is met, the users that have converged turn on their loads; the average value of the load is then very close to the optimum point, which indicates that the convergence accuracy of the algorithm is close to the NE point.

2. The smart grid multi-provider-to-multi-user scheduling balancing power consumption confidence algorithm according to claim 1, wherein the multi-supplier budget and multi-user decision of step one are implemented in a distributed manner using a game theory method; the total numbers of users and suppliers are assumed to be finite; the distributed load selection problem between users is described as a game in which users compete with each other for power access; the goal of the distributed load selection problem is to determine a suitable group of users whose total demand is as close as possible to, but within, the budgets of the different suppliers.

3. The smart grid multi-provider-to-multi-user scheduling balancing power consumption confidence algorithm of claim 1, wherein in the user decision of step S21, at the beginning of each iteration, each user must decide the current iteration action, which is decided by sampling a set of beta distributions maintained by a specific user; at the beginning of each iteration, a random number is extracted from each beta distribution, and then the sample with the largest value is found, which determines the action in the current iteration.

4. The smart grid multi-provider-to-multi-user scheduling balancing power consumption confidence algorithm according to claim 3, wherein in the step S22 reward calculation, once the user decides the action on the current iteration, the user will broadcast in the local area network to notify other users; when all users share this information, rewards or penalties are calculated separately by the different users; if the current action results in the following two results at the same time, giving a reward to the user;

5. The confidence algorithm for multi-provider scheduling balance power consumption of the smart grid according to claim 4, wherein in the BLA hyperparameter updating of step S23, the user updates the hyperparameters of the beta distribution in the present iteration according to the reward or penalty it has obtained; each user updates its own hyperparameters according to the feedback of the environment on the action taken in the current iteration; this updating rule allows each user to learn from the environment through interactions with other users and to attempt to maximize its own benefit as defined by the utility function; after a certain number of rounds of interaction and learning, the users will converge to an NE point.

6. The confidence algorithm for scheduling balance power consumption by multiple users of multiple suppliers of smart grid according to claim 5, wherein the inputs to the iteration stopping criterion of step S24 are the budgets provided by the different suppliers and the parameters of the stopping criterion; the output is the set of users whose loads are supplied with power by the respective selected suppliers.

7. The smart grid multi-provider-to-multi-user scheduling balancing power consumption confidence algorithm according to claim 6, wherein the iteration count stopping criteria in step S24 includes the following steps:

each user broadcasts its demand to the other users; assuming $\min(L_i) > C_j$, if the user can turn on the load, set $s = 0$ and initialize $a_{i,j,k} = b_{i,j,k} = 1$;

for each i and k, a random number $x_{i,j,k}$ obeying formula (2) is drawn from $\beta(a_{i,j,k}, b_{i,j,k})$; the set $\{x_{i,j,k}\}$ contains 2M random values;

calculating load:

8. The smart grid multi-provider-to-multi-user scheduling balancing power consumption confidence algorithm according to claim 7, wherein in the iteration stopping criterion of step S24, once the users have converged to specific actions after an iteration, the game stops and returns the action set; in this work, when all users take the same action continuously over a certain number of iterations, the game is considered to have converged; to avoid any potential non-convergence, a maximum number of iterations $I_m$ is defined, and after $I_m$ iterations the game stops regardless of the convergence criterion described above;