CN114400675A - Active power distribution network voltage control method based on weight mean value deep double-Q network

Active power distribution network voltage control method based on weight mean value deep double-Q network

Info

Publication number
CN114400675A
CN114400675A (application CN202210074238.3A)
Authority
CN
China
Prior art keywords
value
action
network
state
adjustable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210074238.3A
Other languages
Chinese (zh)
Other versions
CN114400675B (en)
Inventor
王杨洋 (Wang Yangyang)
茆美琴 (Mao Meiqin)
杜燕 (Du Yan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202210074238.3A priority Critical patent/CN114400675B/en
Publication of CN114400675A publication Critical patent/CN114400675A/en
Application granted granted Critical
Publication of CN114400675B publication Critical patent/CN114400675B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H02J 3/16: Circuit arrangements for AC mains or AC distribution networks for adjusting voltage in AC networks by changing a characteristic of the network load, by adjustment of reactive power
    • H02J 3/14: Circuit arrangements for AC mains or AC distribution networks for adjusting voltage in AC networks by changing a characteristic of the network load, by switching loads on to, or off from, network, e.g. progressively balanced loading
    • H02J 3/322: Arrangements for balancing of the load in a network by storage of energy using batteries with converting means, the battery being on-board an electric or hybrid vehicle, e.g. vehicle-to-grid arrangements [V2G], power aggregation, use of the battery for network load balancing, coordinated or cooperative battery charging
    • H02J 3/48: Controlling the sharing of the in-phase component between generators, converters, or transformers feeding a single network in parallel
    • H02J 3/50: Controlling the sharing of the out-of-phase component between generators, converters, or transformers feeding a single network in parallel
    • H02J 2203/10: Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H02J 2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • G06N 3/045: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
    • Y02B 70/3225: Demand response systems, e.g. load shedding, peak shaving
    • Y02E 40/30: Reactive power compensation
    • Y02T 90/167: Systems integrating technologies related to power network operation and communication or information technologies for supporting the interoperability of electric or hybrid vehicles, i.e. smart grids as interface for battery charging of electric vehicles [EV] or hybrid vehicles [HEV]
    • Y04S 20/222: Demand response systems, e.g. load shedding, peak shaving
    • Y04S 30/14: Details associated with the interoperability, e.g. vehicle recognition, authentication, identification or billing

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses an active power distribution network voltage control method based on a weight mean value deep double-Q network, a deep reinforcement learning method applied in the field of distribution network/microgrid voltage regulation. The voltage distribution of the distribution network/microgrid and the adjustable capacity/power of the electric vehicle clusters are used as the state set, the output power of the adjustable micro-sources is used as the action set, and deep reinforcement learning training is carried out with the corrected node voltage fluctuation as the instant reward, so that the agent learns the output actions most beneficial to voltage regulation under different voltage distributions and adjustable-resource environments. Designing the reward target value with a weighted mean combines the characteristics of the reward target value designs of the deep Q network and the deep double-Q network, thereby avoiding both the overestimation of the reward target value in the deep Q network and its underestimation in the deep double-Q network.

Description

Active power distribution network voltage control method based on weight mean value deep double-Q network
Technical Field
The invention relates to an active power distribution network voltage regulation method based on a weight mean value deep double-Q network (WDDQN), and in particular to an intelligent voltage regulation method for distribution network/microgrid voltage fluctuation and limit violations caused by renewable energy output fluctuation, voltage drops caused by the network topology, and other factors, applied in environments where electric vehicles are connected to the distribution network and the microgrid on a large scale.
Background
With the continuous development of renewable energy and the growth of electric vehicles, a variety of distributed energy resources, including renewable generation, distributed generation units (DG) and energy storage, are being connected to distribution networks/microgrids on a large scale. However, the coupling of the intermittency and randomness of renewables such as photovoltaics and wind turbines with load fluctuations creates source-load mismatch, and thus voltage fluctuation and limit-violation problems. In addition, the distribution network/microgrid is generally a low-voltage grid; in low-voltage lines the reactance is not far greater than the resistance, so the coupling between active power and voltage cannot be ignored. Therefore, adjusting both the active and reactive power output of the adjustable micro-sources in the distribution network/microgrid affects the voltage distribution, which makes voltage control in this scenario more difficult.
For the voltage regulation problem of the distribution network/microgrid, various methods exist, such as local control based on droop control, and hierarchical control through mathematical optimization or intelligent algorithms. However, these methods generally require modeling the multiple types of micro-sources before solving, and commonly suffer from heavy modeling workload, large computational burden, difficulty of online control, and a tendency to fall into local optima. In addition, for the participation of electric vehicles in voltage regulation, because electric vehicle charging and discharging is an active power exchange, a dedicated voltage regulation model and algorithm must be designed when electric vehicles are regulated jointly with reactive power sources, making the optimization even more difficult. For these reasons, voltage control methods based on reinforcement learning have attracted attention. For example, Q-learning has been used for voltage control, but its state and action sets are discrete and finite, making it difficult to handle large-scale network structures. Deep Q-learning (DQN), which combines Q-learning with deep learning, makes the state set continuous, but DQN often overestimates the reward value, which leads to unreasonable selection of voltage regulation actions.
Disclosure of Invention
To avoid the shortcomings of the prior art, the invention provides an active power distribution network voltage control method based on a weight mean value deep double-Q network. A weighted mean method is adopted in the design of the reward target value, combining the characteristics of the reward target value designs of the deep Q network and the deep double-Q network, so that the overestimation of the reward target value in the deep Q network and its underestimation in the deep double-Q network are both avoided and a reasonable evaluation of the reward value is achieved. For the coupling of voltage with both active and reactive power in the low-voltage distribution network/microgrid, a reasonable action set design is adopted to realize joint control of the output power of active and reactive micro-sources, and finally the optimal output of the adjustable micro-sources is realized under different voltage distribution states and different adjustable-resource conditions of the electric vehicle clusters.
The invention adopts the following technical scheme for solving the technical problems:
the active power distribution network voltage control method based on the weight mean value deep double-Q network is characterized by comprising the following steps of:
step 1, determining a state set S and an action set A according to the voltage distribution and the adjustable power supply conditions in a power grid, setting an instant reward r, and obtaining a reward function Q(s,a) of an action a in a state s, represented by formula (1):
Q(s,a) = E(r + γΣ_{s'∈S} p(s,s')Q(s',a'))    (1)
in formula (1):
s ∈ S; a ∈ A; E() is the expected value; γ is the learning rate;
s' represents the new state reached by taking action a in state s, and a' represents the new action taken in state s';
p(s,s') is the probability of state s transitioning to the new state s';
Q(s',a') is the reward function of the new action a' in the new state s';
step 2, designing a network structure and a network loss function L (theta) of the weight mean value deep double-Q neural network:
the network structure comprises an input layer, an output layer and a plurality of hidden layers;
the input layer takes the current state s_t of the state set S at the current time t as input; the current state s_t is a state vector, namely the state vector s_t;
the output layer takes as output the reward function estimates Q_t(s_t,A|θ) of all actions in the action set A in the current state s_t at the current time t, where θ is the defining parameter of the reward function estimate;
each of the plurality of hidden layers comprises a plurality of neurons; the activation function of the neurons is ReLU;
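For illustration, the following minimal sketch (not part of the patent; the use of PyTorch and the two 48-neuron hidden layers taken from the embodiment below are assumptions) shows a fully connected Q-network of this shape, mapping a state vector to one reward function estimate per discrete action:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Fully connected Q-network: state vector in, one Q estimate per discrete action out."""
    def __init__(self, state_dim: int, num_actions: int, hidden_sizes=(48, 48)):
        super().__init__()
        layers, in_dim = [], state_dim
        for h in hidden_sizes:                              # hidden layers with ReLU activation
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, num_actions))       # output layer: Q_t(s_t, A | theta)
        self.net = nn.Sequential(*layers)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

state_dim = 41      # N + 4L state elements (assumed example values: N = 33 nodes, L = 2 EV clusters)
num_actions = 512   # K**m action combinations (K = 8 levels, m = 3 micro-sources, from the embodiment)
online_net = QNetwork(state_dim, num_actions)
target_net = QNetwork(state_dim, num_actions)
target_net.load_state_dict(online_net.state_dict())        # the target network starts as a copy
```

The target network is simply a second instance of the same class whose parameters are overwritten from the online network every set number of steps.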
the network loss function L(θ) is characterized by equation (2):
L(θ) = E[(y_WDDQN - Q(s,a|θ))²]    (2)
in formula (2):
Q(s,a|θ) is the reward function estimate of action a in state s;
y_WDDQN is the reward target value, obtained with the weighted mean method by equation (3):
y_WDDQN = r + γ(βQ(s',a*|θ) + (1-β)Q(s',a*|θ⁻))    (3)
in formula (3):
β is a weight; a* is the action that maximizes the current reward function estimate in state s';
Q(s',a*|θ⁻) is the target reward function value of action a* in state s', with θ⁻ as the defining parameter of the target reward function value;
Q(s',a*|θ) is the reward function estimate of action a* in state s', i.e. the current reward function estimate, with θ as the defining parameter of the reward function value;
the current reward function estimate Q(s',a*|θ) is produced by the output layer of the neural network and is continuously updated within the neural network, forming the online network;
the target reward function value Q(s',a*|θ⁻) is produced by the target network; the target network has the same structure as the online network, and the target network parameters are copied from the online network every set number of steps;
the weight β is obtained by equation (4):
β = |Q(s',a*|θ⁻) - Q(s',a_L|θ⁻)| / (c + |Q(s',a*|θ⁻) - Q(s',a_L|θ⁻)|)    (4)
in formula (4):
a_L is the action that minimizes the reward function estimate in state s'; c is a hyperparameter for adjusting the weight value;
Q(s',a_L|θ⁻) is the target reward function value of action a_L in state s', with θ⁻ as the defining parameter of the target reward function value;
step 3, designing a dynamic epsilon-greedy strategy;
the dynamic ε-greedy strategy is: when selecting an action, a random action is selected with probability ε, and the action with the maximum current reward value is selected with probability (1-ε), where ε is characterized by equation (5):
[Equation (5), given as an image in the original, defines ε in terms of the adjustment coefficient δ, the step count step, the exploration initial value X_0, and the relative magnitudes of Q(s,a_r|θ) and Q(s,a*|θ).]
in formula (5):
δ is an adjustment coefficient whose value is a constant less than 1;
step is the number of steps; X_0 is the exploration initial value, whose value is a positive number; a_r is an action randomly selected in state s;
Q(s,a_r|θ) is the reward function estimate of action a_r in state s, with θ as the defining parameter of the reward function value;
Q(s,a*|θ) is the reward function estimate of the action a* with the maximum current reward value in state s, with θ as the defining parameter of the reward function value;
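Because equation (5) is only available as an image, the sketch below assumes one plausible functional form that matches the description (ε decays from X_0 with δ^step and grows when the randomly drawn action's estimate is close to the greedy one); the exact formula in the patent may differ:

```python
import random

def dynamic_epsilon(step, x0=1e6, delta=0.99, q_random=0.0, q_greedy=0.0):
    """Assumed form of equation (5): epsilon decays with the step count and grows
    when the randomly drawn action's Q estimate is close to the greedy one."""
    closeness = 1.0 / (1.0 + abs(q_greedy - q_random))   # ~1 when the two estimates are similar
    return min(1.0, x0 * (delta ** step) * closeness)

def select_action(q_values, step, x0=1e6, delta=0.99):
    """Dynamic epsilon-greedy selection over a list of Q estimates Q(s, . | theta)."""
    a_greedy = max(range(len(q_values)), key=lambda a: q_values[a])   # action with maximum reward value
    a_random = random.randrange(len(q_values))                        # candidate random action a_r
    eps = dynamic_epsilon(step, x0, delta, q_values[a_random], q_values[a_greedy])
    return a_random if random.random() < eps else a_greedy
```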
step 4, the voltage control of the distribution network/microgrid is implemented as follows:
4.1. Establish the weight mean value deep double-Q neural network according to step 2, initialize the parameter θ in the neural network, initialize the capacity of the memory set D and the number of samples drawn from it, read the prediction result of the adjustable power of the electric vehicles, and set the step number Step to 0;
4.2. Read the grid voltages and, combined with the prediction result of the adjustable power of the electric vehicles, obtain the current state s_t;
4.3. Input the current state s_t into the online network to obtain the reward function estimates Q_t(s_t,A|θ) of all actions;
4.4. Select the current action a_t from the action set A of step 1 according to the dynamic ε-greedy strategy obtained in step 3, and input the current action into the grid for power flow calculation to obtain the new state s_{t+1};
4.5. Calculate the instant reward r according to the new state s_{t+1};
4.6. Put {s_t, a_t, s_{t+1}, r} into the memory set D, then judge whether the memory set D is full;
if the memory set D is not full, return to step 4.2;
if the memory set D is full, go to step 4.7;
4.7. Sample from the memory set D into the online network and the target network, calculate Q(s,a|θ) and y_WDDQN respectively, calculate the loss function L(θ), and update the online network parameters by stochastic gradient descent;
4.8. Increase the value of Step by 1, and copy the online network parameters to the target network every fixed number of steps C;
4.9. Judge whether the value of Step has reached its maximum;
if the value of Step is not the maximum, return to step 4.7;
if the value of Step is the maximum, the current online network outputs the action with the maximum reward value, the reinforcement learning process is completed, and voltage control is realized.
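Steps 4.1-4.9 amount to a replay-based training loop; a condensed sketch is shown below. The environment object grid_env, its reset and power_flow methods, and the hyperparameter values are hypothetical placeholders, while QNetwork, wddqn_target and select_action refer to the sketches given earlier:

```python
import random
import torch
import torch.nn.functional as F

def train(grid_env, online_net, target_net, num_steps=30000, batch_size=96,
          memory_capacity=10000, gamma=0.9, c=1.0, copy_every=100, lr=1e-3):
    optimizer = torch.optim.Adam(online_net.parameters(), lr=lr)

    # Steps 4.2-4.6: interact with the grid until the memory set D is full.
    memory, state = [], grid_env.reset()
    while len(memory) < memory_capacity:
        q_values = online_net(torch.as_tensor(state, dtype=torch.float32))
        action = select_action(q_values.tolist(), step=len(memory))
        next_state, reward = grid_env.power_flow(action)       # power flow gives s_{t+1} and r
        memory.append((state, action, next_state, reward))
        state = next_state

    # Steps 4.7-4.9: sample from D, minimize L(theta), and periodically sync the target network.
    for step in range(num_steps):
        batch = random.sample(memory, batch_size)
        s, a, s2, r = (torch.as_tensor(x, dtype=torch.float32) for x in zip(*batch))
        q_sa = online_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)   # Q(s,a|theta)
        with torch.no_grad():
            y = wddqn_target(r, gamma, c, online_net(s2), target_net(s2))  # y_WDDQN
        loss = F.mse_loss(q_sa, y)                                         # L(theta), equation (2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % copy_every == 0:                                         # every C steps: theta -> theta-
            target_net.load_state_dict(online_net.state_dict())
    return online_net    # its greedy action is the voltage control output
```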
The active power distribution network voltage control method based on the weight mean value deep double-Q network is also characterized in that in step 1 the state set S, the action set A and the instant reward r are set as follows:
the state set S is the set of all state vectors; the state vector s_t characterizes the node voltage distribution in the grid and the adjustable condition of the electric vehicle clusters at the current time t, and is represented by formula (6):
s_t = {U_{1,t}, ..., U_{N,t}, ..., P_{El,t,min}, ..., P_{El,t,max}, ..., C_{El,t,min}, ..., C_{El,t,max}, ...}    (6)
in formula (6):
i denotes a node, i = 1, 2, ..., N, where N is the number of nodes in the voltage regulation area;
U_{i,t} denotes the voltage amplitude of the i-th node at the current time t;
U_{1,t} is the voltage amplitude of the 1st node at the current time t, and U_{N,t} is the voltage amplitude of the N-th node at the current time t;
P_{El,t,min} is the minimum adjustable power of the l-th electric vehicle cluster at the current time t;
P_{El,t,max} is the maximum adjustable power of the l-th electric vehicle cluster at the current time t;
C_{El,t,min} is the minimum adjustable capacity of the l-th electric vehicle cluster at the current time t;
C_{El,t,max} is the maximum adjustable capacity of the l-th electric vehicle cluster at the current time t;
L denotes the number of adjustable electric vehicle clusters, and the number of elements of the state vector s_t is N + 4L;
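As a small illustration, the state vector of formula (6) can be assembled from the measured node voltages and the electric vehicle cluster forecasts as follows (the function and argument names are illustrative, not from the patent):

```python
from typing import List, Sequence

def build_state(node_voltages: Sequence[float],
                ev_p_min: Sequence[float], ev_p_max: Sequence[float],
                ev_c_min: Sequence[float], ev_c_max: Sequence[float]) -> List[float]:
    """State vector s_t: N node voltages followed by P_min, P_max, C_min, C_max per EV cluster (N + 4L values)."""
    assert len(ev_p_min) == len(ev_p_max) == len(ev_c_min) == len(ev_c_max)
    state = list(node_voltages)
    for p_min, p_max, c_min, c_max in zip(ev_p_min, ev_p_max, ev_c_min, ev_c_max):
        state += [p_min, p_max, c_min, c_max]
    return state   # length N + 4L
```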
the action set A is the set of all action vectors; in the current state s_t, the action vector a_t characterizes the output actions of the adjustable micro-sources and is represented by formula (7):
a_t = {a_{1,t}^k, ..., a_{j,t}^k, ..., a_{m,t}^k}    (7)
in formula (7):
j denotes an adjustable micro-source, j = 1, 2, ..., m, where m is the total number of adjustable micro-sources;
k denotes an adjustable action, k = 0, 1, ..., K-1, where K is the total number of adjustable actions of each unit;
the number of elements of the action set A is K^m;
Let a_{j,t}^k denote the k-th action of the j-th adjustable micro-source at the current time t, where:
a_{j,t}^k = Q_{j,min} + k(Q_{j,max} - Q_{j,min})/(K-1) for a reactive micro-source, or a_{j,t}^k = P_{j,t,min} + k(P_{j,t,max} - P_{j,t,min})/(K-1) for an active micro-source    (8)
Q_{j,min} is the minimum adjustable reactive power of the j-th adjustable micro-source;
Q_{j,max} is the maximum adjustable reactive power of the j-th adjustable micro-source;
P_{j,t,min} is the minimum adjustable active power of the j-th adjustable micro-source at the current time t;
P_{j,t,max} is the maximum adjustable active power of the j-th adjustable micro-source at the current time t;
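Since the action set contains K^m combinations, one convenient encoding, sketched below, is a single index in [0, K^m) decoded into one discretization level per micro-source; the uniform discretization follows equation (8) as reconstructed above and is therefore an assumption:

```python
from typing import List

def decode_action(index: int, K: int, m: int) -> List[int]:
    """Map a flat action index in [0, K**m) to one level k_j in [0, K) per micro-source (base-K digits)."""
    levels = []
    for _ in range(m):
        levels.append(index % K)
        index //= K
    return levels

def level_to_output(k: int, K: int, lower: float, upper: float) -> float:
    """Uniformly discretized output for level k spanning [lower, upper] in K steps (assumed equation (8))."""
    return lower + k * (upper - lower) / (K - 1)

# Example: K = 8 levels and m = 3 micro-sources give 8**3 = 512 possible action vectors.
print(decode_action(317, K=8, m=3))   # -> [5, 7, 4]
```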
the instant reward r is characterized by equation (9):
[Equation (9), given as an image in the original, computes the instant reward r from the node voltage deviations, weighted by the reward coefficients λ_i.]
in formula (9):
U_i is the node voltage; λ_i is a reward coefficient used to correct the magnitude of the instant reward, and satisfies:
[Equation (10), given as an image in the original, assigns λ_i a larger value for nodes whose voltage is out of limit.]
The instant reward r is based on the relevant power system specifications on voltage deviation and preferentially schedules out-of-limit nodes.
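A hedged sketch of the instant reward of equations (9) and (10) follows; since both equations are only available as images, the absolute-deviation form, the ±5% limit and the weight value used below are assumptions consistent with the stated goal of penalizing voltage deviation and prioritizing out-of-limit nodes:

```python
from typing import Sequence

def instant_reward(node_voltages: Sequence[float],
                   limit: float = 0.05, out_of_limit_weight: float = 10.0) -> float:
    """Negative weighted sum of per-unit voltage deviations; out-of-limit nodes get a larger lambda_i."""
    r = 0.0
    for u in node_voltages:
        deviation = abs(u - 1.0)
        lam = out_of_limit_weight if deviation > limit else 1.0   # assumed form of equation (10)
        r -= lam * deviation                                      # assumed form of equation (9)
    return r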
Compared with the prior art, the invention has the beneficial effects that:
1. In the field of distribution network/microgrid voltage regulation, the active power distribution network/microgrid voltage control method based on the weight mean value deep double-Q network effectively avoids the problem that traditional deep Q-learning overestimates the reward target value while double-Q deep learning underestimates it, and achieves a more reasonable evaluation of the reward target value, so that the optimal output action of the adjustable micro-sources is determined under different voltage distribution states and different adjustable-resource conditions of the electric vehicle clusters;
2. Electric vehicles can only perform active power regulation, while the distribution network/microgrid voltage is coupled with both active and reactive power; for the situation where electric vehicles and other adjustable loads are connected to the distribution network/microgrid, the invention designs suitable action and state sets, so that various adjustable resources are effectively utilized for voltage regulation;
3. The invention sets the instant reward function with the regulatory limits on voltage deviation in mind, so that nodes with larger deviations are scheduled first when scheduling resources are limited, which greatly helps prevent voltage limit violations.
Drawings
FIG. 1 is a flow chart of an active power distribution network voltage control method based on a weight mean depth double Q network in the invention;
FIG. 2 illustrates a training process of a neural network according to the present invention;
FIG. 3 is an IEEE-33 distribution network topology used in the testing of the present invention;
FIG. 4a shows the renewable energy output of the distribution network used in the tests of the present invention;
FIG. 4b shows the adjustable power data of the electric vehicle clusters connected to the distribution network used in the tests of the present invention;
FIG. 5 shows the voltage distribution before regulation in the test case of the present invention;
FIG. 6 shows the training processes of different algorithms in the test case of the present invention;
FIG. 7 shows the voltage distribution after regulation in the test case of the present invention.
Detailed description of the invention
The invention belongs to deep reinforcement learning, a branch of machine learning that combines reinforcement learning with deep neural networks. The defining characteristic of deep reinforcement learning is that the agent learns through interaction: according to the rewards or penalties it receives while interacting with the environment, the agent continuously accumulates knowledge and becomes better adapted to the environment. The range of environment states is the state set, the range of agent actions is the action set, and the rewards obtained after actions can be divided into instant rewards and overall benefits, where the overall benefit is derived from the instant rewards. Therefore, one core of a deep reinforcement learning design is the design of the state set, the action set and the instant reward.
The active power distribution network voltage control method based on the weight mean value deep double-Q network in the embodiment comprises the following steps:
step 1, determining a state set S and an action set A according to voltage distribution and power supply adjustable conditions in a power grid, and setting an instant reward r.
Through the design of the state set, the action set and the instant reward, the instant reward value of action a in state s can be obtained. For the whole system, however, what matters more is the overall benefit that action a in state s brings over the entire process, and this overall benefit is difficult to obtain directly. The reward function Q(s,a) of action a in state s is therefore characterized by formula (1):
Q(s,a) = E(r + γΣ_{s'∈S} p(s,s')Q(s',a'))    (1)
In formula (1):
s ∈ S; a ∈ A; E() is the expected value; γ is the learning rate and is typically a constant less than 1;
s' represents the new state reached by taking action a in state s, and a' represents the new action taken in state s';
p(s,s') is the probability of state s transitioning to the new state s';
Q(s',a') is the reward function of the new action a' in the new state s'.
Step 2, designing a network structure and a network loss function L (theta) of the weight mean value deep double-Q neural network:
the network structure comprises an input layer, an output layer and a plurality of hidden layers;
the input layer takes the current state s_t of the state set S at the current time t as input; the current state s_t is a state vector, namely the state vector s_t;
the output layer takes as output the reward function estimates Q_t(s_t,A|θ) of all actions in the action set A in the current state s_t at the current time t, where θ is the defining parameter of the reward function estimate;
each of the plurality of hidden layers comprises a plurality of neurons; the activation function of the neurons is ReLU;
the network loss function L(θ) is characterized by equation (2):
L(θ) = E[(y_WDDQN - Q(s,a|θ))²]    (2)
in formula (2):
Q(s,a|θ) is the reward function estimate of action a in state s;
y_WDDQN is the reward target value, obtained with the weighted mean method by equation (3):
y_WDDQN = r + γ(βQ(s',a*|θ) + (1-β)Q(s',a*|θ⁻))    (3)
in formula (3):
β is a weight; a* is the action that maximizes the current reward function estimate in state s';
Q(s',a*|θ⁻) is the target reward function value of action a* in state s', with θ⁻ as the defining parameter of the target reward function value;
Q(s',a*|θ) is the reward function estimate of action a* in state s', i.e. the current reward function estimate, with θ as the defining parameter of the reward function value;
the current reward function estimate Q(s',a*|θ) is produced by the output layer of the neural network and is continuously updated within the neural network, forming the online network;
the target reward function value Q(s',a*|θ⁻) is produced by the target network; the target network has the same structure as the online network, and the target network parameters are copied from the online network every set number of steps;
the weight β is obtained by equation (4):
β = |Q(s',a*|θ⁻) - Q(s',a_L|θ⁻)| / (c + |Q(s',a*|θ⁻) - Q(s',a_L|θ⁻)|)    (4)
in formula (4):
a_L is the action that minimizes the reward function estimate in state s'; c is a hyperparameter for adjusting the weight value;
Q(s',a_L|θ⁻) is the target reward function value of action a_L in state s', with θ⁻ as the defining parameter of the target reward function value;
In Q-learning, each state-action pair corresponds to a reward function value, and as the numbers of states and actions grow, there are too many reward function values to compute. Deep reinforcement learning therefore approximates the true reward function with a reward function estimate produced by a deep neural network, namely the Q neural network. The network structure of the weight mean value deep double-Q neural network is similar to that of the Q neural network; the difference lies mainly in the design of the network loss function and the reward target value.
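The difference can be seen by placing the three reward target designs side by side (a sketch; the WDDQN form follows equations (3) and (4) as reconstructed above):

```python
import torch

def dqn_target(r, gamma, q_target_next):
    """DQN: max over the target network, which tends to overestimate the target."""
    return r + gamma * q_target_next.max(dim=1).values

def ddqn_target(r, gamma, q_online_next, q_target_next):
    """Double DQN: the online network picks a*, the target network evaluates it; tends to underestimate."""
    a_star = q_online_next.argmax(dim=1, keepdim=True)
    return r + gamma * q_target_next.gather(1, a_star).squeeze(1)

def wddqn_target_mix(r, gamma, c, q_online_next, q_target_next):
    """WDDQN: weighted mean of the online and target evaluations of a* (equations (3) and (4))."""
    a_star = q_online_next.argmax(dim=1, keepdim=True)
    a_low = q_online_next.argmin(dim=1, keepdim=True)
    q_on = q_online_next.gather(1, a_star).squeeze(1)
    q_tg = q_target_next.gather(1, a_star).squeeze(1)
    gap = (q_tg - q_target_next.gather(1, a_low).squeeze(1)).abs()
    beta = gap / (c + gap)
    return r + gamma * (beta * q_on + (1.0 - beta) * q_tg)
```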
Step 3, designing a dynamic epsilon-greedy strategy;
the dynamic ε-greedy strategy is: when selecting an action, a random action is selected with probability ε, and the action with the maximum current reward value is selected with probability (1-ε), where ε is characterized by equation (5):
[Equation (5), given as an image in the original, defines ε in terms of the adjustment coefficient δ, the step count step, the exploration initial value X_0, and the relative magnitudes of Q(s,a_r|θ) and Q(s,a*|θ).]
in formula (5):
δ is an adjustment coefficient whose value is a constant less than 1;
step is the number of steps; X_0 is the exploration initial value, whose value is a positive number; a_r is an action randomly selected in state s;
Q(s,a_r|θ) is the reward function estimate of action a_r in state s, with θ as the defining parameter of the reward function value;
Q(s,a*|θ) is the reward function estimate of the action a* with the maximum current reward value in state s, with θ as the defining parameter of the reward function value;
On the one hand, ε gradually decreases as the number of iterations increases, which achieves the goal of exploring first and converging later; on the other hand, ε is also related to the relative size of the reward values: if the reward value of the randomly chosen action differs little from the current maximum reward value, indicating that the action currently yielding the maximum reward may not be good enough, ε becomes larger and the agent tends to explore new actions, and vice versa.
Step 4, referring to fig. 1, the voltage control of the distribution network/microgrid is implemented as follows:
4.1. Establish the weight mean value deep double-Q neural network according to step 2, initialize the parameter θ in the neural network, initialize the capacity of the memory set D and the number of samples drawn from it, read the prediction result of the adjustable power of the electric vehicles, and set the step number Step to 0;
4.2. Read the grid voltages and, combined with the prediction result of the adjustable power of the electric vehicles, obtain the current state s_t;
4.3. Input the current state s_t into the online network to obtain the reward function estimates Q_t(s_t,A|θ) of all actions;
4.4. Select the current action a_t from the action set A of step 1 according to the dynamic ε-greedy strategy obtained in step 3, and input the current action into the grid for power flow calculation to obtain the new state s_{t+1};
4.5. Calculate the instant reward r according to the new state s_{t+1};
4.6. Put {s_t, a_t, s_{t+1}, r} into the memory set D, then judge whether the memory set D is full;
if the memory set D is not full, return to step 4.2;
if the memory set D is full, go to step 4.7;
4.7. Sample from the memory set D into the online network and the target network, calculate Q(s,a|θ) and y_WDDQN respectively, calculate the loss function L(θ), and update the online network parameters by stochastic gradient descent, the process being shown in FIG. 2;
4.8. Increase the value of Step by 1, and copy the online network parameters to the target network every fixed number of steps C;
4.9. Judge whether the value of Step has reached its maximum;
if the value of Step is not the maximum, return to step 4.7;
if the value of Step is the maximum, the current online network outputs the action with the maximum reward value, the reinforcement learning process is completed, and voltage control is realized.
In a specific implementation, in step 1 the state set S, the action set A and the instant reward r are set as follows:
the state set S is the set of all state vectors; the state vector s_t characterizes the node voltage distribution in the grid and the adjustable condition of the electric vehicle clusters at the current time t, and is represented by formula (6):
s_t = {U_{1,t}, ..., U_{N,t}, ..., P_{El,t,min}, ..., P_{El,t,max}, ..., C_{El,t,min}, ..., C_{El,t,max}, ...}    (6)
in formula (6):
i denotes a node, i = 1, 2, ..., N, where N is the number of nodes in the voltage regulation area;
U_{i,t} denotes the voltage amplitude of the i-th node at the current time t;
U_{1,t} is the voltage amplitude of the 1st node at the current time t, and U_{N,t} is the voltage amplitude of the N-th node at the current time t;
P_{El,t,min} is the minimum adjustable power of the l-th electric vehicle cluster at the current time t;
P_{El,t,max} is the maximum adjustable power of the l-th electric vehicle cluster at the current time t;
C_{El,t,min} is the minimum adjustable capacity of the l-th electric vehicle cluster at the current time t;
C_{El,t,max} is the maximum adjustable capacity of the l-th electric vehicle cluster at the current time t;
L denotes the number of adjustable electric vehicle clusters, and the number of elements of the state vector s_t is N + 4L;
the action set A is the set of all action vectors; in the current state s_t, the action vector a_t characterizes the output actions of the adjustable micro-sources and is represented by formula (7):
a_t = {a_{1,t}^k, ..., a_{j,t}^k, ..., a_{m,t}^k}    (7)
in formula (7):
j denotes an adjustable micro-source, j = 1, 2, ..., m, where m is the total number of adjustable micro-sources;
k denotes an adjustable action, k = 0, 1, ..., K-1, where K is the total number of adjustable actions of each unit;
the number of elements of the action set A is K^m;
Let a_{j,t}^k denote the k-th action of the j-th adjustable micro-source at the current time t, where:
a_{j,t}^k = Q_{j,min} + k(Q_{j,max} - Q_{j,min})/(K-1) for a reactive micro-source, or a_{j,t}^k = P_{j,t,min} + k(P_{j,t,max} - P_{j,t,min})/(K-1) for an active micro-source    (8)
Q_{j,min} is the minimum adjustable reactive power of the j-th adjustable micro-source;
Q_{j,max} is the maximum adjustable reactive power of the j-th adjustable micro-source;
P_{j,t,min} is the minimum adjustable active power of the j-th adjustable micro-source at the current time t;
P_{j,t,max} is the maximum adjustable active power of the j-th adjustable micro-source at the current time t;
the instant reward r is characterized by equation (9):
[Equation (9), given as an image in the original, computes the instant reward r from the node voltage deviations, weighted by the reward coefficients λ_i.]
in formula (9):
U_i is the node voltage; λ_i is a reward coefficient used to correct the magnitude of the instant reward, and satisfies:
[Equation (10), given as an image in the original, assigns λ_i a larger value for nodes whose voltage is out of limit.]
The instant reward r is based on the relevant power system specifications on voltage deviation and preferentially schedules out-of-limit nodes.
In the invention, the maximum power and the minimum power of the reactive power source in the scheduling are regarded as constant, and the adjustable power of the active power source is changed. The reactive power source comprises a capacitor, a static reactive compensator and a synchronous generator; the active power source comprises an electric automobile cluster, an energy storage unit, a temperature control load and a micro gas turbine.
According to the invention, based on the weight mean value deep double-Q network method, the grid node voltages and the adjustable condition of the electric vehicles are used as the state set, the adjustable micro-source reactive and active power outputs are used as the action set, and the corrected voltage fluctuation is used as the reward value for deep reinforcement learning training, so that the agent learns the output actions most beneficial to voltage regulation under different grid and adjustable-resource environments.
Referring to FIG. 3, a distribution network system using the modified IEEE 33-node system was tested. Nodes 8, 15 and 25 of the system are connected to a photovoltaic plant with a rated power of 2.5 MW, a wind turbine with a rated power of 3 MW, and a photovoltaic plant with a rated power of 2.5 MW, respectively. The daily output of the wind turbine and the photovoltaics is shown in FIG. 4a. The fluctuation of the photovoltaic and wind turbine output and the voltage drop of the distribution network cause system voltage fluctuation and limit violations; node 1 in FIG. 5 is the balance node. Voltage control of the distribution network is realized in this distribution network system according to the following steps:
step a, determining a state set S and an action set A, and setting an instant reward r.
The number of nodes N in the voltage regulation area is 33 and L is 2, and the state set contains 35 quantities; the maximum and minimum power of the reactive power sources in the scheduling are regarded as constant, while the adjustable power of the active power sources such as the electric vehicles varies; the adjustable micro-source conditions are shown in Table 1:
TABLE 1
[Table 1, provided as an image in the original, lists the adjustable micro-source conditions used in the test.]
In summary, with K = 8 and m = 3 in the action set, there are 8^3 = 512 action combinations in total.
Step b: establish the weight mean value deep double-Q neural network, with 2 hidden layers, 48 neurons per hidden layer, and ReLU as the activation function; the hyperparameter c for adjusting the weight value is taken as 1.
Step c: design the dynamic ε-greedy strategy, with the adjustment coefficient δ taken as 0.99, the iteration number denoted i, and the exploration initial value X_0 taken as 10^6.
Step d, implementing distribution network/microgrid voltage control according to the following processes, as shown in fig. 1:
Step d1: establish the weight mean value deep double-Q neural network according to step b, initialize the parameter θ in the neural network, initialize the capacity of the memory set D to 10000 and the number of samples per batch to 96, read the prediction result of the adjustable power of the electric vehicles, and set the step number Step to 0;
Step d2: read the grid voltages and, combined with the prediction result of the adjustable power of the electric vehicles, obtain the current state s_t;
Step d3: input the current state s_t into the online network to obtain the reward function estimates Q_t(s_t,A|θ) of all actions;
Step d4: select the current action a_t from the action set A of step a according to the dynamic ε-greedy strategy obtained in step c, and input the current action into the grid for power flow calculation to obtain the new state s_{t+1};
Step d5: calculate the instant reward r according to the new state s_{t+1};
Step d6: put {s_t, a_t, s_{t+1}, r} into the memory set D and judge whether the memory set D is full;
if the memory set D is not full, return to step d2; if the memory set D is full, go to step d7;
Step d7: sample from the memory set D into the online network and the target network, calculate Q(s,a|θ) and y_WDDQN respectively, calculate the loss function L(θ), and update the online network parameters by stochastic gradient descent, the process being shown in FIG. 2;
Step d8: increase the value of Step by 1 and, every fixed number of steps C, with C taken as 100, copy the online network parameters to the target network;
Step d9: judge whether the value of Step has reached its maximum;
if the value of Step is not the maximum, return to step d7;
if the value of Step is the maximum, the current online network outputs the action with the maximum reward value, the reinforcement learning process is completed, and voltage control is realized. In this embodiment, the maximum value of Step is 30000.
FIG. 6 compares the average reward of the method of the invention (WDDQN) and traditional deep Q-learning (DQN) during training: the performance of both methods gradually improves and eventually stabilizes as training progresses, but the stable value of WDDQN is larger than that of DQN, which becomes trapped in a local optimum. The experimental results show that the invention selects action values better than DQN.
The result of applying the trained agent to distribution network voltage control is shown in FIG. 7. Compared with the voltage distribution before regulation in FIG. 5, the distribution network voltage range changes from [0.926, 1.073] before regulation to [0.951, 1.046] after regulation, so the distribution network voltage stays all day within the range [0.95, 1.05] required by the national standard. Meanwhile, the voltage offset is expressed by equation (11):
[Equation (11), given as an image in the original, defines the voltage offset index computed from the node voltages.]
The voltage offset is reduced from 0.0412 before regulation to 0.0152 after regulation, showing that the method can effectively control the distribution network voltage.

Claims (2)

1. A voltage control method of an active power distribution network based on a weight mean value deep double-Q network is characterized by comprising the following steps:
step 1, determining a state set S and an action set A according to the voltage distribution and the adjustable power supply conditions in a power grid, setting an instant reward r, and obtaining a reward function Q(s,a) of an action a in a state s, represented by formula (1):
Q(s,a) = E(r + γΣ_{s'∈S} p(s,s')Q(s',a'))    (1)
in formula (1):
s ∈ S; a ∈ A; E() is the expected value; γ is the learning rate;
s' represents the new state reached by taking action a in state s, and a' represents the new action taken in state s';
p(s,s') is the probability of state s transitioning to the new state s';
Q(s',a') is the reward function of the new action a' in the new state s';
step 2, designing a network structure and a network loss function L (theta) of the weight mean value deep double-Q neural network:
the network structure comprises an input layer, an output layer and a plurality of hidden layers;
the input layer takes the current state s_t of the state set S at the current time t as input; the current state s_t is a state vector, namely the state vector s_t;
the output layer takes as output the reward function estimates Q_t(s_t,A|θ) of all actions in the action set A in the current state s_t at the current time t, where θ is the defining parameter of the reward function estimate;
each of the plurality of hidden layers comprises a plurality of neurons; the activation function of the neurons is ReLU;
the network loss function L(θ) is characterized by equation (2):
L(θ) = E[(y_WDDQN - Q(s,a|θ))²]    (2)
in formula (2):
Q(s,a|θ) is the reward function estimate of action a in state s;
y_WDDQN is the reward target value, obtained with the weighted mean method by equation (3):
y_WDDQN = r + γ(βQ(s',a*|θ) + (1-β)Q(s',a*|θ⁻))    (3)
in formula (3):
β is a weight; a* is the action that maximizes the current reward function estimate in state s';
Q(s',a*|θ⁻) is the target reward function value of action a* in state s', with θ⁻ as the defining parameter of the target reward function value;
Q(s',a*|θ) is the reward function estimate of action a* in state s', i.e. the current reward function estimate, with θ as the defining parameter of the reward function value;
the current reward function estimate Q(s',a*|θ) is produced by the output layer of the neural network and is continuously updated within the neural network, forming the online network;
the target reward function value Q(s',a*|θ⁻) is produced by the target network; the target network has the same structure as the online network, and the target network parameters are copied from the online network every set number of steps;
the weight β is obtained by equation (4):
β = |Q(s',a*|θ⁻) - Q(s',a_L|θ⁻)| / (c + |Q(s',a*|θ⁻) - Q(s',a_L|θ⁻)|)    (4)
in formula (4):
a_L is the action that minimizes the reward function estimate in state s'; c is a hyperparameter for adjusting the weight value;
Q(s',a_L|θ⁻) is the target reward function value of action a_L in state s', with θ⁻ as the defining parameter of the target reward function value;
step 3, designing a dynamic epsilon-greedy strategy;
the dynamic ε-greedy strategy is: when selecting an action, a random action is selected with probability ε, and the action with the maximum current reward value is selected with probability (1-ε), where ε is characterized by equation (5):
[Equation (5), given as an image in the original, defines ε in terms of the adjustment coefficient δ, the step count step, the exploration initial value X_0, and the relative magnitudes of Q(s,a_r|θ) and Q(s,a*|θ).]
in formula (5):
δ is an adjustment coefficient whose value is a constant less than 1;
step is the number of steps; X_0 is the exploration initial value, whose value is a positive number; a_r is an action randomly selected in state s;
Q(s,a_r|θ) is the reward function estimate of action a_r in state s, with θ as the defining parameter of the reward function value;
Q(s,a*|θ) is the reward function estimate of the action a* with the maximum current reward value in state s, with θ as the defining parameter of the reward function value;
step 4, the voltage control of the distribution network/microgrid is implemented as follows:
4.1. Establish the weight mean value deep double-Q neural network according to step 2, initialize the parameter θ in the neural network, initialize the capacity of the memory set D and the number of samples drawn from it, read the prediction result of the adjustable power of the electric vehicles, and set the step number Step to 0;
4.2. Read the grid voltages and, combined with the prediction result of the adjustable power of the electric vehicles, obtain the current state s_t;
4.3. Input the current state s_t into the online network to obtain the reward function estimates Q_t(s_t,A|θ) of all actions;
4.4. Select the current action a_t from the action set A of step 1 according to the dynamic ε-greedy strategy obtained in step 3, and input the current action into the grid for power flow calculation to obtain the new state s_{t+1};
4.5. Calculate the instant reward r according to the new state s_{t+1};
4.6. Put {s_t, a_t, s_{t+1}, r} into the memory set D, then judge whether the memory set D is full;
if the memory set D is not full, return to step 4.2;
if the memory set D is full, go to step 4.7;
4.7. Sample from the memory set D into the online network and the target network, calculate Q(s,a|θ) and y_WDDQN respectively, calculate the loss function L(θ), and update the online network parameters by stochastic gradient descent;
4.8. Increase the value of Step by 1, and copy the online network parameters to the target network every fixed number of steps C;
4.9. Judge whether the value of Step has reached its maximum;
if the value of Step is not the maximum, return to step 4.7;
if the value of Step is the maximum, the current online network outputs the action with the maximum reward value, the reinforcement learning process is completed, and voltage control is realized.
2. The active power distribution network voltage control method based on the weight mean value deep double-Q network as claimed in claim 1, characterized in that in step 1 the state set S, the action set A and the instant reward r are set as follows:
the state set S is the set of all state vectors; the state vector s_t characterizes the node voltage distribution in the grid and the adjustable condition of the electric vehicle clusters at the current time t, and is represented by formula (6):
s_t = {U_{1,t}, ..., U_{N,t}, ..., P_{El,t,min}, ..., P_{El,t,max}, ..., C_{El,t,min}, ..., C_{El,t,max}, ...}    (6)
in formula (6):
i denotes a node, i = 1, 2, ..., N, where N is the number of nodes in the voltage regulation area;
U_{i,t} denotes the voltage amplitude of the i-th node at the current time t;
U_{1,t} is the voltage amplitude of the 1st node at the current time t, and U_{N,t} is the voltage amplitude of the N-th node at the current time t;
P_{El,t,min} is the minimum adjustable power of the l-th electric vehicle cluster at the current time t;
P_{El,t,max} is the maximum adjustable power of the l-th electric vehicle cluster at the current time t;
C_{El,t,min} is the minimum adjustable capacity of the l-th electric vehicle cluster at the current time t;
C_{El,t,max} is the maximum adjustable capacity of the l-th electric vehicle cluster at the current time t;
L denotes the number of adjustable electric vehicle clusters, and the number of elements of the state vector s_t is N + 4L;
the action set A is the set of all action vectors; in the current state s_t, the action vector a_t characterizes the output actions of the adjustable micro-sources and is represented by formula (7):
a_t = {a_{1,t}^k, ..., a_{j,t}^k, ..., a_{m,t}^k}    (7)
in formula (7):
j denotes an adjustable micro-source, j = 1, 2, ..., m, where m is the total number of adjustable micro-sources;
k denotes an adjustable action, k = 0, 1, ..., K-1, where K is the total number of adjustable actions of each unit;
the number of elements of the action set A is K^m;
Let a_{j,t}^k denote the k-th action of the j-th adjustable micro-source at the current time t, where:
a_{j,t}^k = Q_{j,min} + k(Q_{j,max} - Q_{j,min})/(K-1) for a reactive micro-source, or a_{j,t}^k = P_{j,t,min} + k(P_{j,t,max} - P_{j,t,min})/(K-1) for an active micro-source    (8)
Q_{j,min} is the minimum adjustable reactive power of the j-th adjustable micro-source;
Q_{j,max} is the maximum adjustable reactive power of the j-th adjustable micro-source;
P_{j,t,min} is the minimum adjustable active power of the j-th adjustable micro-source at the current time t;
P_{j,t,max} is the maximum adjustable active power of the j-th adjustable micro-source at the current time t;
the instant reward r is characterized by equation (9):
[Equation (9), given as an image in the original, computes the instant reward r from the node voltage deviations, weighted by the reward coefficients λ_i.]
in formula (9):
U_i is the node voltage; λ_i is a reward coefficient used to correct the magnitude of the instant reward, and satisfies:
[Equation (10), given as an image in the original, assigns λ_i a larger value for nodes whose voltage is out of limit.]
The instant reward r is based on the relevant power system specifications on voltage deviation and preferentially schedules out-of-limit nodes.
CN202210074238.3A 2022-01-21 2022-01-21 Active power distribution network voltage control method based on weight mean value deep double-Q network Active CN114400675B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210074238.3A CN114400675B (en) 2022-01-21 2022-01-21 Active power distribution network voltage control method based on weight mean value deep double-Q network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210074238.3A CN114400675B (en) 2022-01-21 2022-01-21 Active power distribution network voltage control method based on weight mean value deep double-Q network

Publications (2)

Publication Number Publication Date
CN114400675A true CN114400675A (en) 2022-04-26
CN114400675B CN114400675B (en) 2023-04-07

Family

ID=81233698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210074238.3A Active CN114400675B (en) 2022-01-21 2022-01-21 Active power distribution network voltage control method based on weight mean value deep double-Q network

Country Status (1)

Country Link
CN (1) CN114400675B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116599061A (en) * 2023-07-18 2023-08-15 国网浙江省电力有限公司宁波供电公司 Power grid operation control method based on reinforcement learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111478326A (en) * 2020-05-12 2020-07-31 南方电网科学研究院有限责任公司 Comprehensive energy optimization method and device based on model-free reinforcement learning
US20200327411A1 (en) * 2019-04-14 2020-10-15 Di Shi Systems and Method on Deriving Real-time Coordinated Voltage Control Strategies Using Deep Reinforcement Learning
CN112117760A (en) * 2020-08-13 2020-12-22 国网浙江省电力有限公司台州供电公司 Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning
CN112465664A (en) * 2020-11-12 2021-03-09 贵州电网有限责任公司 AVC intelligent control method based on artificial neural network and deep reinforcement learning
CN113036772A (en) * 2021-05-11 2021-06-25 国网江苏省电力有限公司南京供电分公司 Power distribution network topology voltage adjusting method based on deep reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200327411A1 (en) * 2019-04-14 2020-10-15 Di Shi Systems and Method on Deriving Real-time Coordinated Voltage Control Strategies Using Deep Reinforcement Learning
CN111478326A (en) * 2020-05-12 2020-07-31 南方电网科学研究院有限责任公司 Comprehensive energy optimization method and device based on model-free reinforcement learning
CN112117760A (en) * 2020-08-13 2020-12-22 国网浙江省电力有限公司台州供电公司 Micro-grid energy scheduling method based on double-Q-value network deep reinforcement learning
CN112465664A (en) * 2020-11-12 2021-03-09 贵州电网有限责任公司 AVC intelligent control method based on artificial neural network and deep reinforcement learning
CN113036772A (en) * 2021-05-11 2021-06-25 国网江苏省电力有限公司南京供电分公司 Power distribution network topology voltage adjusting method based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘峻岐 等 (Liu Junqi et al.): "Dispatchable Capability Evaluation of Electric Vehicle Clusters Based on Spatio-Temporal Coupling Correlation Analysis", Electric Power Construction (《电力建设》) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116599061A (en) * 2023-07-18 2023-08-15 国网浙江省电力有限公司宁波供电公司 Power grid operation control method based on reinforcement learning
CN116599061B (en) * 2023-07-18 2023-10-24 国网浙江省电力有限公司宁波供电公司 Power grid operation control method based on reinforcement learning

Also Published As

Publication number Publication date
CN114400675B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN112186743B (en) Dynamic power system economic dispatching method based on deep reinforcement learning
Li et al. Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning
CN114362196B (en) Multi-time-scale active power distribution network voltage control method
CN105846461B (en) Control method and system for large-scale energy storage power station self-adaptive dynamic planning
CN108964050A (en) Micro-capacitance sensor dual-layer optimization dispatching method based on Demand Side Response
CN109256810B (en) Multi-objective optimization method considering uncertain cost of fan output
CN108565874B (en) Source-load cooperative frequency modulation method based on load frequency control model
CN109034587B (en) Active power distribution system optimal scheduling method for coordinating multiple controllable units
CN113872213B (en) Autonomous optimization control method and device for power distribution network voltage
CN113300380B (en) Load curve segmentation-based power distribution network reactive power optimization compensation method
CN110165714B (en) Micro-grid integrated scheduling and control method based on extreme dynamic programming algorithm and computer readable storage medium
CN112507614A (en) Comprehensive optimization method for power grid in distributed power supply high-permeability area
CN114784823A (en) Micro-grid frequency control method and system based on depth certainty strategy gradient
Yin et al. Hybrid multi-agent emotional deep Q network for generation control of multi-area integrated energy systems
CN118174355A (en) Micro-grid energy optimization scheduling method
CN108539797A (en) A kind of secondary frequency of isolated island micro-capacitance sensor and voltage control method considering economy
CN113675890A (en) TD 3-based new energy microgrid optimization method
CN114566971A (en) Real-time optimal power flow calculation method based on near-end strategy optimization algorithm
CN114400675B (en) Active power distribution network voltage control method based on weight mean value deep double-Q network
CN117039981A (en) Large-scale power grid optimal scheduling method, device and storage medium for new energy
Liu et al. An AGC dynamic optimization method based on proximal policy optimization
CN117674160A (en) Active power distribution network real-time voltage control method based on multi-agent deep reinforcement learning
CN116799856A (en) Energy control method, controller and system for multi-microgrid system
CN114336704A (en) Regional energy Internet multi-agent distributed control and efficiency evaluation method
CN110289643B (en) Rejection depth differential dynamic planning real-time power generation scheduling and control algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant