CN115438873A - Power dispatching method based on block chain and deep reinforcement learning - Google Patents
- Publication number
- CN115438873A (Application CN202211167510.9A)
- Authority
- CN
- China
- Prior art keywords
- power
- user
- block chain
- scheduling
- load
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0206—Price or cost determination based on market factors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/06—Electricity, gas or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/28—Arrangements for balancing of the load in a network by storage of energy
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
- H02J3/466—Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
Abstract
The invention discloses a power dispatching method based on blockchain and deep reinforcement learning, comprising the following steps: step one, user registration; step two, collecting data and encrypting it before uploading it to the chain; step three, setting the DRL state space and action space; step four, setting the DRL reward function R(t) and the user constraint and penalty mechanism; step five, obtaining a prediction result through DRL training based on an improved DQN, and reporting the power demand to the grid authority; step six, updating the reputation value; step seven, the grid authority encrypts the scheduling information with the applying user's public key and uploads it to the blockchain for storage, and the user reputation value is likewise stored on the blockchain; step eight, completing power dispatching based on the reputation value. The method performs fusion management of big data collected on the basis of blockchain technology, realizes aggregation and sharing of data from different sources, balances power supply and demand, and improves the energy management system.
Description
Technical Field
The invention relates to the technical field of deep reinforcement learning and blockchain, and in particular to a power dispatching method based on blockchain and deep reinforcement learning.
Background
The smart grid, also called "Grid 2.0", is the intelligent evolution of the power grid. Built on an integrated, high-speed, two-way communication network, it achieves reliable, safe, economical, efficient, and environmentally friendly operation through advanced sensing and measurement technology, advanced equipment technology, advanced control methods, and advanced decision-support systems. Its main characteristics include self-healing, incentives and user protection, attack resistance, provision of power quality that meets user requirements, accommodation of diverse generation forms, enabling of power markets, and optimized, efficient asset operation.
With the development of the smart grid concept, integrating green and renewable energy has become a new vision for the traditional power grid. As the grid transitions from a centralized to a distributed architecture, the blockchain's properties of decentralization, trustworthiness, and traceability fit smart grid construction well: they enable mutual trust between different user entities in a distributed environment, and thereby support operations such as energy trading.
In a narrow sense, a blockchain is a decentralized shared ledger that combines data blocks into a chained data structure in chronological order and is cryptographically guaranteed to be tamper-proof and unforgeable; it can securely store simple, sequential, verifiable data in a system. In a broad sense, blockchain technology is a new decentralized infrastructure and distributed computing paradigm that uses a cryptographic chained block structure to verify and store data, a distributed-node consensus algorithm to generate and update data, and automated script code (smart contracts) to program and manipulate data. The core advantage of blockchain technology is decentralization: by means of data encryption, timestamps, distributed consensus, and economic incentives, it enables peer-to-peer transactions, coordination, and cooperation based on decentralized trust in a distributed system whose nodes need not trust one another, providing a solution to the high cost, low efficiency, and insecure data storage commonly found in centralized institutions.
However, p2p energy trading also faces challenges. Most distributed generation equipment depends on wind power, solar energy, and the like, is easily affected by the weather, and thus its generated power is random and uncertain. The prerequisite for completing a power transaction is that users know their own power load and whether they have surplus power to trade; the conventional approach is to predict these with deep learning. However, existing deep learning networks suffer from weak feature-extraction capability, low accuracy, and easy loss of long-term dependency information. The idea of reinforcement learning is therefore introduced: the periodic characteristics of the load can be learned dynamically, and long-term dependencies can be captured more effectively.
The technical background mainly stems from the following two parts:
1. Blockchain:
A blockchain is a chain formed of blocks. Each block stores certain information, and the blocks are linked in the chronological order of their creation. The chain is replicated across many servers; these servers, referred to as nodes in the blockchain system, provide storage space and computational support for the entire system. To modify information in the blockchain, more than half of the nodes must authenticate the change and the information in all nodes must be modified; since the nodes are usually held by different parties, the information in a blockchain is extremely difficult to tamper with. Compared with a traditional network, the blockchain has two core characteristics: first, its data is hard to tamper with; second, it is decentralized. Thanks to these two characteristics, the information recorded on a blockchain is more authentic and reliable, and the problem of mutual distrust can be solved.
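The tamper-resistance described above can be illustrated with a minimal hash-chain sketch (an illustration, not the patent's implementation): each block stores the hash of its predecessor, so altering any earlier block invalidates every later link.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Hash the block's canonical JSON encoding.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, data: dict) -> None:
    # Each new block records the hash of the previous block.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "data": data})

def verify(chain: list) -> bool:
    # Every block must reference the hash of its predecessor.
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

chain = []
append_block(chain, {"Uid": "u1", "P": 3.2})
append_block(chain, {"Uid": "u1", "P": 2.9})
assert verify(chain)
chain[0]["data"]["P"] = 99.9   # tamper with an early block
assert not verify(chain)       # the chain no longer verifies
```

In a real blockchain the links are additionally protected by consensus across nodes, so a tamperer would need to redo the chain on a majority of nodes at once.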
2. Deep Reinforcement Learning (DRL): Deep Learning (DL) has strong perception capability but limited decision-making capability, while Reinforcement Learning (RL) has decision-making capability but is ill-suited to perception problems. Combining the two makes their strengths complementary and provides a solution for the perception-decision problem of complex systems. Depending on the optimization objective, model-free RL can be roughly divided into two categories: value-based RL and policy-based RL; representative algorithms include temporal-difference learning, Q-learning, and SARSA.
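As a concrete instance of value-based RL, the tabular Q-learning update mentioned above can be sketched as follows (a toy illustration of the update rule, not part of the patent's improved DQN):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, lr=0.1, gamma=0.9):
    # Q-learning update: Q(s,a) += lr * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    td_error = td_target - Q[(s, a)]
    Q[(s, a)] += lr * td_error
    return td_error

Q = defaultdict(float)   # unseen (state, action) pairs default to 0
actions = [0, 1]
# One illustrative transition: state 0, action 1, reward 1.0, next state 1.
err = q_update(Q, 0, 1, 1.0, 1, actions)
assert abs(Q[(0, 1)] - 0.1) < 1e-9   # 0 + 0.1 * (1.0 + 0.9*0 - 0)
```

A DQN replaces the table Q with a neural network and trains it from replayed transitions, which is the direction taken in step five below.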
Disclosure of Invention
The invention aims to solve the excessive load peaks and troughs and the local power imbalance caused by the instability of green-energy generation, and provides a power dispatching method based on blockchain and deep reinforcement learning. The method performs fusion management of big data collected on the basis of blockchain technology, realizes aggregation and sharing of data from different sources, balances power supply and demand, and improves the energy management system.
The technical scheme for realizing the purpose of the invention is as follows:
a power dispatching method based on blockchain and deep reinforcement learning comprises the following steps:
step one, the user registers an account and adds the equipment to be managed, obtaining a user id and device ids; the grid authority, acting as a trusted and secure certificate authority, distributes public and private keys to the user;
step two, the smart meter submits the power parameters of the registered equipment, encrypts them with the user's public key, and uploads them to the chain; the on-chain data format is:
<Uid,ID,Pc,P,Se,C,V>,
wherein Uid is the user ID, ID is the device ID, Pc is the user's public key, P is the device's generated power, Se is the device state, C is the device current, and V is the device voltage;
data analysis is then performed on the power data: the users' data from each stage of the power market is processed, and the on-chain data, after filtering, de-duplication, error-correction, and discretization preprocessing, is used as training samples for the subsequent network;
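The preprocessing chain of step two (filtering, de-duplication, error correction, discretization) could look like the following sketch; the sample layout, thresholds, and step size are illustrative assumptions, not taken from the patent:

```python
def preprocess(samples, p_min=0.0, p_max=50.0, step=5.0):
    """Filter, de-duplicate, and discretize raw power samples.

    samples: list of (timestamp, load_kw) tuples; hypothetical field layout.
    """
    seen, out = set(), []
    for ts, load in sorted(samples):
        if ts in seen:                     # de-duplication: keep one reading per timestamp
            continue
        seen.add(ts)
        if load < p_min or load > p_max:   # filtering: drop out-of-range readings
            continue
        # discretization: snap each reading to the nearest grid step
        out.append((ts, round(load / step) * step))
    return out

raw = [(1, 12.4), (1, 12.4), (2, -3.0), (3, 18.1)]
assert preprocess(raw) == [(1, 10.0), (3, 20.0)]
```

The discretized values then serve directly as training samples for the network of step five.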
step three, setting DRL state space and action space:
the state space S consists of the real-time load power P_load(t) at time t, the time-of-use electricity price TOU(t) at time t, and the state of charge SOC(t) of the energy storage device at time t;
where SOC(t) is defined as:
SOC(t) = SOC(t-1) + P_load(t)·Δt / E_b,
in which E_b is the maximum capacity of the energy storage device;
the state space S is defined as:
S = {P_load(t), TOU(t), SOC(t)};
the action space Act is defined as follows:
where the next-moment load level is obtained by discretizing the continuous load; the processing procedure is as follows:
…
in the formula: P_min denotes the minimum allowed load prediction, P_max denotes the maximum allowed load prediction, the remaining symbol denotes the mean value, and the load level is selected by the agent among levels a-Z;
B(t) > 0 indicates the user has surplus electric energy: the energy storage device is charged, and the charged quantity is B(t); B(t) < 0 indicates the user's load is insufficiently supplied: power dispatching must be applied for from the grid, and the electric energy to be dispatched is B(t);
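Step three's definitions can be sketched directly; the SOC update and the sign convention for B(t) follow the formulas above as written, while the field names are illustrative:

```python
def soc_update(soc_prev, p_load, dt, e_b):
    # SOC(t) = SOC(t-1) + P_load(t) * dt / E_b  (as defined in step three)
    return soc_prev + p_load * dt / e_b

def make_state(p_load, tou, soc):
    # S = {P_load(t), TOU(t), SOC(t)}
    return {"P_load": p_load, "TOU": tou, "SOC": soc}

def interpret_action(b_t):
    # B(t) > 0: surplus energy, charge the storage device with quantity B(t).
    # B(t) < 0: load cannot be met, apply to the grid for dispatch of |B(t)|.
    if b_t > 0:
        return ("charge", b_t)
    if b_t < 0:
        return ("apply_dispatch", -b_t)
    return ("idle", 0.0)

soc = soc_update(0.5, p_load=2.0, dt=0.25, e_b=10.0)
assert abs(soc - 0.55) < 1e-9
assert interpret_action(-1.5) == ("apply_dispatch", 1.5)
state = make_state(p_load=2.0, tou=0.8, soc=soc)
```

The discretization of the continuous load into levels a-Z is omitted here, since its exact formula appears only in the patent's figures.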
step four, setting a DRL reward function R (t) and a user constraint and penalty mechanism:
in the learning step, the deep reinforcement learning algorithm must determine the direction and magnitude of the controller's parameter updates from the reward value returned by the external environment; in the market environment, the goal of optimal control is to minimize purchasing users' long-term electricity-purchase cost and to reduce the operating cost of existing equipment:
P is the penalty cost, mainly reflecting the degree to which the user violates the operating constraints; to prevent malicious users from submitting extreme load-prediction values in pursuit of profit, the operating constraints are as follows:
(1) Load capacity limit constraints:
(2) Deviation electric-quantity constraint:
in the formula: p allow Representing the maximum deviation value allowed;
in the optimization process, if the user violates the constraint conditions, the user pays a penalty according to the degree of the violation and the reward is reduced; the specific penalty cost is calculated as follows:
P = P_1 + P_2 + P_3,
in the formula: ρ and δ are the corresponding penalty coefficients;
(ii) operating cost:
f_g(t) = TOU(t)·Δt·(P_load − B_store),
where B_store represents the energy stored in the energy storage device; if the user has no energy storage device, B_store is 0;
(iii) a reward function, where the user has N devices,
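Under the assumption that P_1 and P_2 penalize capacity-limit violations while P_3 penalizes deviation beyond P_allow (the exact terms appear only in the patent's figures), the penalty and operating-cost calculations of step four can be sketched as:

```python
def penalty(p_pred, p_min, p_max, p_actual, p_allow, rho=1.0, delta=1.0):
    """Penalty cost P = P1 + P2 + P3 (assumed decomposition; see the note above).

    rho and delta are the corresponding penalty coefficients.
    """
    p1 = rho * max(0.0, p_min - p_pred)            # below the minimum allowed prediction
    p2 = rho * max(0.0, p_pred - p_max)            # above the maximum allowed prediction
    p3 = delta * max(0.0, abs(p_pred - p_actual) - p_allow)  # excess deviation
    return p1 + p2 + p3

def operating_cost(tou, dt, p_load, b_store=0.0):
    # f_g(t) = TOU(t) * dt * (P_load - B_store); B_store is 0 without storage.
    return tou * dt * (p_load - b_store)

def reward(tou, dt, p_load, b_store, pen):
    # Reward decreases with operating cost and penalty (sign convention assumed).
    return -(operating_cost(tou, dt, p_load, b_store) + pen)

pen = penalty(p_pred=12.0, p_min=0.0, p_max=10.0, p_actual=9.0, p_allow=1.0)
assert abs(pen - 4.0) < 1e-9   # 2.0 over the cap + 2.0 excess deviation
```

With this shape, an extreme prediction made to chase profit is penalized twice: once for leaving the allowed band and once for the resulting deviation from actual usage.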
step five, DRL training based on the improved DQN:
(1) Experience accumulation: in the power-market environment, the evaluation network outputs the Q values of all actions in the action space Act for the user state S_t; action a_t is selected according to a greedy strategy, the market feeds back reward r_t, and the next state S_t+1 is obtained, yielding a complete Markov tuple (S_t, a_t, r_t, S_t+1); these tuples are placed into the experience pool as the sample set, and samples are collected repeatedly until their number reaches the configured pool size;
(2) Updating the Q-function parameters: samples are drawn from the experience pool by a prioritized-sampling algorithm according to the TD-error of each sample, so that samples with larger TD-error are selected with higher probability for training;
(3) Training the neural network: a loss function L of the neural network is constructed for training; after every N complete training rounds of the evaluation network, its parameters are copied in full to the target network;
(4) Optimizing parameters: if the benefit obtained by the controller has stopped increasing and remained stable for a long time, the evaluation-network parameters have converged; otherwise steps (1) to (4) are repeated;
(5) The DRL outputs a power-operation execution command in the following format:
<Uid,Pk,ID,Op,Qty>,
wherein Uid is the user ID, Pk is the user's public key, ID is the device ID, Op is the operation on the device (apply for dispatching, or store electric quantity), and Qty is the predicted electric quantity;
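The prioritized experience sampling of step five (2) can be sketched with a simple proportional scheme; this is an illustrative stand-in for the patent's priority-sampling algorithm, not its exact form:

```python
import random

class PrioritizedReplay:
    """Experience pool that samples transitions in proportion to |TD-error|."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        # Evict the oldest transition once the pool is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(abs(td_error) + 1e-6)  # avoid zero probability

    def sample(self, k):
        # Larger |TD-error| means a proportionally higher chance of selection.
        return random.choices(self.buffer, weights=self.priorities, k=k)

pool = PrioritizedReplay(capacity=100)
pool.add(("s0", "a0", 0.0, "s1"), td_error=0.01)
pool.add(("s1", "a1", 1.0, "s2"), td_error=5.00)
batch = pool.sample(3)
assert all(t in pool.buffer for t in batch)
```

In a full implementation the sampled batch would feed the loss L of step five (3), and the evaluation network's parameters would be copied to the target network every N rounds.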
step six, to prevent the resource waste caused by some malicious users applying, when the time-of-use electricity price is low, for power-scheduling requests far in excess of the power they actually need, a scheduling reputation value is introduced, stored on the blockchain, and managed automatically by smart contract; when a user applies for scheduled power, the user must upload past actual power data to the blockchain together with the required load demand predicted by the smart meter and the DRL controller, for reputation-value calculation; the reputation value is defined as follows:
if a user needs power scheduling, a transaction request is sent to the blockchain; the grid company obtains from the blockchain the transaction information and the scheduling reputation values TR of the users applying for power scheduling, and the grid authority performs power dispatching in descending order of all users' scheduling reputation values;
the scheduling reputation value updating mode is as follows:
where Q_value(t) denotes the predictive evaluation value at the t-th scheduling, the two compared quantities are the dispatched power requested by the controller at time t and the amount actually required by the user, and α is a prediction-deviation factor: because a prediction can never match the actual situation exactly, deviation protection within a certain range is granted; α is related to the most recent scheduling reputation value (the higher the reputation value, the larger the deviation factor), and N denotes the user's accumulated number of scheduling applications;
the reputation value is updated automatically by a smart contract; it is stored on the blockchain and supervised by all users participating in the power-dispatching platform;
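Since the exact reputation formula appears only in the patent's figures, the following is an assumed sketch of the update described above: each application receives an evaluation of 1.0 when its prediction falls within the deviation tolerance α, a reduced evaluation otherwise, averaged over the N accumulated applications:

```python
def update_reputation(tr_prev, requested, actual, alpha, n):
    """Assumed proportional-deviation reputation update (not the patent's formula).

    requested: power the DRL controller applied for; actual: power really used;
    alpha: deviation tolerance (larger for users with higher reputation);
    n: accumulated number of scheduling applications, n >= 1.
    """
    deviation = abs(requested - actual) / max(actual, 1e-9)
    # Full marks inside the tolerance; linearly reduced outside it.
    q = 1.0 if deviation <= alpha else max(0.0, 1.0 - (deviation - alpha))
    # Running average over all n applications, so one bad prediction fades slowly.
    return (tr_prev * (n - 1) + q) / n

tr = update_reputation(tr_prev=0.9, requested=10.0, actual=9.8, alpha=0.1, n=5)
assert tr > 0.9   # within tolerance: the evaluation q = 1.0 pulls the average up
```

Running the update inside a smart contract makes it deterministic and auditable by every platform participant, which is the property step six relies on.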
step seven, the user encrypts the scheduling information and the power information with the user's public key and uploads them to the blockchain for storage; the user reputation value is likewise stored on the chain;
so that reputation-value updates can be supervised by all users, the reputation value is stored in plaintext on the blockchain, while the power parameters and scheduling requests of user equipment are encrypted with the user's public key before storage; only the power authority and the user can decrypt the data with the private key, and both can trace past changes to power-dispatching requests and power-equipment parameters by accessing the blockchain;
step eight, after step five, the agent trained on the data predicts the user's power load for the next time period; if there is surplus electricity, the user need not file a power declaration, and the surplus power is stored for the next period; if the agent predicts that the user's load in the next period will be too large to be self-sufficient, it declares power to the grid authority according to the predicted power limit, and the dispatch is completed after the power authority's audit; for the many dispatch applications received, the dispatch center must schedule them in order, to guarantee the safe and stable operation of the grid and reliable external power supply, and the dispatch information is stored on the chain; the dispatch center sorts the applications in descending order of the user reputation values stored in the blockchain, and preferentially audits and dispatches power for users with high reputation values; to guarantee that reputation values are updated fairly and justly, they are managed by the smart contract of step six, and are updated in the next time period from the actual load data according to step six.
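The descending-order dispatch of step eight can be sketched as a simple sort over pending applications; the field names and the tie-breaking rule are assumptions:

```python
def dispatch_order(applications):
    """Order pending dispatch applications by on-chain reputation, highest first;
    ties are broken by earlier application time (an assumed convention).
    """
    return sorted(applications, key=lambda a: (-a["reputation"], a["time"]))

apps = [
    {"Uid": "u1", "reputation": 0.70, "time": 1, "Qty": 5.0},
    {"Uid": "u2", "reputation": 0.95, "time": 3, "Qty": 2.0},
    {"Uid": "u3", "reputation": 0.95, "time": 2, "Qty": 4.0},
]
order = [a["Uid"] for a in dispatch_order(apps)]
assert order == ["u3", "u2", "u1"]   # high reputation first, earlier time breaks the tie
```

Because the reputation values are read from the blockchain, every participant can independently reproduce the same ordering and verify that the dispatch center processed applications fairly.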
This technical scheme focuses on load prediction and on optimizing the user's electricity-purchase plan using blockchain and deep reinforcement learning in a distributed environment. Its technical advantages are as follows:
1. Blockchain is used to achieve data sharing in a distributed environment. As distributed generation equipment becomes more and more common, the previous centralized model makes data sharing in a distributed environment very inefficient. Generating the power-microgrid blockchain data involves data acquisition, fusion-sharing management, contract transactions, access control, and data consensus. Data from physical equipment, including structured, unstructured, and semi-structured data, is acquired by devices such as sensors and stored in a distributed fashion at different locations using blockchain technology. The collected big data is then fusion-managed on the basis of the blockchain, realizing aggregation and sharing of data from different sources.
2. Deep reinforcement learning is used to predict user load and optimize power dispatching. Traditional deep learning and reinforcement learning methods lack scalability, decide inefficiently in high-dimensional spaces, and have difficulty directly processing continuous input variables. Deep reinforcement learning, which combines deep learning's strong perception capability with reinforcement learning's strong decision capability, is therefore well suited to the complex power-market environment, where the state and action spaces are high-dimensional.
The method performs fusion management of big data collected on the basis of blockchain technology, realizes aggregation and sharing of data from different sources, balances power supply and demand, and improves the energy management system.
Drawings
FIG. 1 is a schematic flow chart of an embodiment;
FIG. 2 is a flow diagram of a DRL in an embodiment;
FIG. 3 is a structural diagram of the power blockchain in the embodiment;
fig. 4 is a schematic diagram of a discretization process of a continuous load in the embodiment.
Detailed Description
The invention will be further elucidated with reference to the drawings and examples, without however being limited thereto.
Example:
With the continuing opening of the electricity market, a large number of electricity-retail enterprises are gradually participating in electricity-market transactions. Current research builds retailer load-optimization decision models on methods such as stochastic programming, robust optimization, and artificial intelligence, in order to make full use of available uncertainty information and improve the effectiveness of decision results. An optimization method based on reinforcement learning is well suited to user-load prediction: a data-driven artificial-intelligence algorithm can fully mine the uncertainty patterns of factors such as electricity price and real-time load in the power spot market, account for multiple decision variables to the greatest extent, and improve decision quality. Meanwhile, to adapt to the complex power-market environment, the retailer's action strategy is optimized with a deep reinforcement learning algorithm, combining reinforcement learning with the idea of neural networks.
In this embodiment, the DRL agent is trained with historical data; through the agent, the user can know the load usage of the next time period in advance and either declare power dispatching ahead of time or store surplus power, thereby improving the energy management system.
Referring to fig. 1, a power dispatching method based on blockchain and deep reinforcement learning includes the following steps:
step one, the user registers an account and adds the equipment to be managed, obtaining a user id and device ids; the grid authority, acting as a trusted and secure certificate authority, distributes public and private keys to the user;
step two, the smart meter submits the power parameters of the registered equipment, encrypts them with the user's public key, and uploads them to the chain; the on-chain data format is:
<Uid,ID,Pc,P,Se,C,V>,
wherein Uid is the user ID, ID is the device ID, Pc is the user's public key, P is the device's generated power, Se is the device state, C is the device current, and V is the device voltage;
data analysis is then performed on the power data: the users' data from each stage of the power market is processed, and the on-chain data, after filtering, de-duplication, error-correction, and discretization preprocessing, is used as training samples for the subsequent network;
setting a DRL state space and an action space:
the state space S consists of the real-time load power P_load(t) at time t, the time-of-use electricity price TOU(t) at time t, and the state of charge SOC(t) of the energy storage device at time t;
where SOC(t) is defined as:
SOC(t) = SOC(t-1) + P_load(t)·Δt / E_b,
in which E_b is the maximum capacity of the energy storage device;
the state space S is defined as:
S = {P_load(t), TOU(t), SOC(t)};
the action space Act is defined as follows:
where the next-moment load level is obtained by discretizing the continuous load, as shown in fig. 4; the processing procedure is as follows:
…
in the formula: P_min denotes the minimum allowed load prediction, P_max denotes the maximum allowed load prediction, the remaining symbol denotes the mean value, and the load level is selected by the agent among levels a-Z;
B(t) > 0 indicates the user has surplus electric energy: the energy storage device is charged, and the charged quantity is B(t); B(t) < 0 indicates the user's load is insufficiently supplied: power dispatching must be applied for from the grid, and the electric energy to be dispatched is B(t);
step four, setting a DRL reward function R (t) and a user constraint and penalty mechanism:
in the learning step, the deep reinforcement learning algorithm must determine the direction and magnitude of the controller's parameter updates from the reward value returned by the external environment; in the market environment, the goal of optimal control is to minimize purchasing users' long-term electricity-purchase cost and to reduce the operating cost of existing equipment:
p is penalty cost, mainly reflecting the degree of violation of operation constraint by a user, and in order to prevent a malicious user from making an extreme load prediction value for pursuing benefits, the operation constraint is as follows:
(1) Load capacity limit constraints:
(2) Deviation electric quantity constraint:
in the formula: P_allow represents the maximum permitted deviation value;
In the optimization process, if the user violates a constraint condition, the user pays a penalty according to the degree of violation and the reward is reduced; the specific penalty cost is calculated as:
P = P_1 + P_2 + P_3,
in the formula: ρ and δ are the corresponding penalty coefficients;
(ii) Operating cost:
f_g(t) = TOU(t)·Δt·(P_load − B_store),
where B_store represents the energy stored in the energy storage device; if the user has no energy storage device, B_store is 0;
(iii) a reward function, where the user has N devices,
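The cost terms above can be sketched in code. The patent elides the exact expressions for P_1–P_3, so the penalty components below are assumptions (a coefficient times the amount by which each constraint is exceeded); the operating cost follows the stated formula f_g(t) = TOU(t)·Δt·(P_load − B_store):

```python
def penalty_cost(p_pred, p_min, p_max, p_actual, p_allow, rho=1.0, delta=1.0):
    """P = P1 + P2 + P3.  Component forms are assumed: each is a penalty
    coefficient times the out-of-limit amount."""
    p1 = rho * max(0.0, p_min - p_pred)                      # load limit, below
    p2 = rho * max(0.0, p_pred - p_max)                      # load limit, above
    p3 = delta * max(0.0, abs(p_pred - p_actual) - p_allow)  # deviation beyond P_allow
    return p1 + p2 + p3

def operating_cost(tou, dt, p_load, b_store=0.0):
    """f_g(t) = TOU(t)·Δt·(P_load − B_store); B_store is 0 without storage."""
    return tou * dt * (p_load - b_store)

# a prediction of 12 with limits [0, 10], actual need 9, tolerance 1:
# P = 0 + 2 + (|12 - 9| - 1) = 4
```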
step five, DRL training based on improved DQN, as shown in fig. 2:
(1) Experience accumulation: the evaluation network, according to the user state S_t in the power market environment, outputs the Q values of all actions in the action space Act; an action a_t is selected according to a greedy strategy, the market feeds back a reward r_t, and the next state S_{t+1} is obtained, yielding a complete Markov tuple (S_t, a_t, r_t, S_{t+1}); the tuple is placed into the experience pool as a sample, and this is repeated until the number of samples reaches the set experience-pool size;
(2) Updating the Q-function parameters: samples are drawn from the experience pool by a prioritized-sampling algorithm over the TD-errors of the stored samples, so that samples with larger TD-error are selected for training with higher probability;
(3) Training the neural network: a loss function L of the neural network is constructed for training; after every N complete training rounds of the evaluation network, its parameters are copied wholesale to the target network;
(4) Parameter optimization: if the benefit obtained by the controller stops increasing and remains stable over a long period, the evaluation network parameters have converged; otherwise, steps (1) to (4) are repeated;
(5) The DRL output power operation execution command format is as follows:
<Uid,Pk,ID,Op,Qty>,
where Uid is the user ID, Pk is the user public key, ID is the device ID, Op is the operation on the device (applying for scheduling, or storing electric quantity), and Qty is the predicted electric quantity;
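Steps (1)–(4) of the training procedure above can be sketched as a minimal prioritized experience replay plus a periodic target-network copy. This is a pure-Python illustration, not the patent's implementation (a real agent would back the Q function with a neural network):

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized replay: samples with larger
    TD-error are drawn with higher probability (steps (1)-(2))."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data, self.prios = [], []

    def push(self, transition, td_error):
        if len(self.data) >= self.capacity:   # evict the oldest sample
            self.data.pop(0)
            self.prios.pop(0)
        self.data.append(transition)
        self.prios.append(abs(td_error) + 1e-6)  # priority ∝ |TD-error|

    def sample(self, k):
        # weighted sampling: large TD-error => higher selection probability
        return random.choices(self.data, weights=self.prios, k=k)

def sync_target(eval_params, target_params, step, n_sync):
    """Steps (3)-(4): after every n_sync complete training rounds, copy
    the evaluation network's parameters wholesale to the target network."""
    if step % n_sync == 0:
        target_params = dict(eval_params)
    return target_params

pool = PrioritizedReplay(capacity=2)
pool.push(("s0", "a0", 1.0, "s1"), td_error=0.5)  # Markov tuple (S_t, a_t, r_t, S_t+1)
pool.push(("s1", "a1", 0.0, "s2"), td_error=2.0)
pool.push(("s2", "a2", 1.0, "s3"), td_error=0.1)  # evicts the oldest tuple
batch = pool.sample(4)
```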
Step six: to prevent the resource waste caused by malicious users applying, when the time-of-use electricity price is low, for power-scheduling requests that differ excessively from their actual needs, a scheduling reputation value is introduced, stored in the block chain, and managed automatically by a smart contract; when a user applies to schedule power, the user must upload past actual power data to the block chain, together with the load requirement predicted by the smart meter and the DRL controller, for reputation-value calculation; the reputation value is defined as follows:
If a user needs power scheduling, a transaction request is sent to the block chain; the grid company obtains from the block chain the transaction information and the scheduling reputation values TR of the users applying for power scheduling, and the grid department performs power scheduling on the users in descending order of reputation value;
the scheduling reputation value updating mode is as follows:
where Q_value(t) represents the prediction evaluation value at the i-th scheduling; the scheduled power requested by the controller at time t is compared against the amount the user actually needs; α is a prediction deviation factor: because a prediction can never coincide exactly with the actual situation, deviation within a certain range is tolerated, and α is tied to the latest scheduling reputation value (the higher the reputation value, the larger the deviation factor); N represents the user's accumulated number of scheduling applications;
The reputation value is updated automatically by a smart contract; it is stored on the block chain and supervised by all users participating in the power dispatching platform;
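Since the exact reputation-update formula is elided above, the sketch below assumes a simple rule in its spirit: a request within the α-tolerance of actual need raises the reputation value TR, while larger deviations lower it; the descending-order dispatch is also shown. All names are illustrative:

```python
def update_reputation(tr, requested, actual, alpha):
    """Illustrative update (not the patent's formula): reward requests
    within the α-tolerance band, penalize deviations beyond it."""
    deviation = abs(requested - actual) / max(actual, 1e-9)
    if deviation <= alpha:
        return tr + 1.0                  # accurate request: reputation rises
    return max(tr - deviation, 0.0)      # extreme request: reputation falls

def dispatch_order(applications):
    """Grid department serves applications in descending order of the
    on-chain reputation value TR."""
    return sorted(applications, key=lambda a: a["TR"], reverse=True)

apps = [{"Uid": 1, "TR": 3.0}, {"Uid": 2, "TR": 9.0}, {"Uid": 3, "TR": 6.0}]
# user 2 (TR = 9.0) is audited and dispatched first
```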
Step seven: the user encrypts the scheduling information and power information with the user's public key and uploads them to the block chain for storage; the user reputation value is likewise stored on-chain; the block structure in the power block chain is shown in fig. 3;
So that reputation-value updates can be supervised by all users, the reputation value is stored on the block chain in plaintext, while the power parameters and scheduling requests of user equipment are encrypted with the user's public key before storage; only the power department and the user can decrypt the data with the private key, and both can trace past changes of power-scheduling requests and power-equipment parameters by accessing the block chain;
Step eight: through step five, the trained agent predicts the user's power load for the next time period; if there is surplus electric quantity, the user need not file a power declaration and the surplus power is stored for the next period; if the agent predicts that the user's power load in the next period is too large to be self-sufficient, it files a power declaration with the grid department according to the predicted power limit, and the scheduling of power is completed after the grid department's audit; for the multiple power-scheduling applications received, the dispatching center must schedule them in order to ensure safe and stable operation of the grid and reliable external power supply, and the scheduling information is stored on-chain; the dispatching center sorts the applications in descending order of the user reputation values stored in the block chain and performs scheduling audit and power dispatch preferentially for users with high reputation values; to ensure that reputation updates are fair and impartial, the reputation-value updating and smart-contract management of step six are used, and reputation values are updated in the next time period according to the actual load data of step six.
Claims (1)
1. A power dispatching method based on a block chain and deep reinforcement learning is characterized by comprising the following steps:
step one, a user registers an account and adds equipment to be managed to obtain a user id and an equipment id, and a power grid department serves as a trusted and safe certificate authority to distribute public and private keys for the user;
step two, the smart meter submits the power parameters of the registered equipment and uploads them to the chain after encryption with the user's public key; the on-chain data format is as follows:
<Uid,ID,Pc,P,Se,C,V>,
where Uid is the user ID, ID is the device ID, Pc is the user public key, P is the device generating power, Se is the device state, C is the device current, and V is the device voltage;
performing data analysis on the power data, processing the user's data at each stage of the power market, and taking the on-chain data, after filtering, de-duplication, error-correction and discretization preprocessing operations, as training samples for the subsequent network;
step three, setting the DRL state space and action space:
the state space S comprises the load power P_load(t) at real time t, the time-of-use electricity price TOU(t) at time t, and the state of charge SOC(t) of the energy storage device at time t;
where SOC(t) is defined as:
SOC(t) = SOC(t-1) + P_load(t)·Δt/E_b,
where E_b is the maximum capacity of the energy storage device;
the state space S is defined as:
S = {P_load(t), TOU(t), SOC(t)};
the action space Act is defined as follows:
where the load prediction value for the next moment is used; the continuous load is discretized with the following processing procedure:
…
P min indicating the minimum permissible load prediction, P max Indicating that the maximum load forecast is allowed,the average score value is represented by the average score value,is selected by the agent among a-Z;
b (t) >0 represents that the user has redundant electric energy and charges the energy storage device, the charging electric quantity is B (t), B (t) < 0 represents that the user lacks load and needs to apply for power dispatching to the power grid, and the electric energy needing to be dispatched is B (t);
step four, setting a DRL reward function R (t) and a user constraint and penalty mechanism:
in the learning stage, the deep reinforcement learning algorithm determines the update direction and magnitude of the controller parameters from the reward value returned by the external environment; in the market environment, the objective of the optimizing control is to minimize the long-term electricity-purchase cost of purchasing users and to reduce the operating cost of existing equipment:
(i) Penalty cost: P is the penalty cost, mainly reflecting the degree to which a user violates the operation constraints; to prevent a malicious user from submitting extreme load prediction values in pursuit of benefit, the operation constraints are as follows:
(1) Load capacity limit constraints:
(2) Deviation electric quantity constraint:
in the formula: P_allow represents the maximum permitted deviation value;
in the optimization process, if the user violates a constraint condition, the user pays a penalty according to the degree of violation and the reward is reduced; the specific penalty cost is calculated as:
P = P_1 + P_2 + P_3,
in the formula: ρ and δ are the corresponding penalty coefficients;
(ii) Operating cost:
f_g(t) = TOU(t)·Δt·(P_load − B_store),
where B_store represents the energy stored in the energy storage device; if the user has no energy storage device, B_store is 0;
(iii) a reward function, where the user has N devices,
step five, DRL training based on the improved DQN:
(1) Experience accumulation: the evaluation network, according to the user state S_t in the power market environment, outputs the Q values of all actions in the action space Act; an action a_t is selected according to a greedy strategy, the market feeds back a reward r_t, and the next state S_{t+1} is obtained, yielding a complete Markov tuple (S_t, a_t, r_t, S_{t+1}); the tuple is placed into the experience pool as a sample, and this is repeated until the number of samples reaches the set experience-pool size;
(2) Updating the Q-function parameters: samples are drawn from the experience pool by a prioritized-sampling algorithm over the TD-errors of the stored samples, so that samples with larger TD-error are selected for training with higher probability;
(3) Training the neural network: a loss function L of the neural network is constructed for training; after every N complete training rounds of the evaluation network, its parameters are copied wholesale to the target network;
(4) Parameter optimization: if the benefit obtained by the controller stops increasing and remains stable over a long period, the evaluation network parameters have converged; otherwise, steps (1) to (4) are repeated;
(5) The DRL output power operation execution command format is as follows:
<Uid,Pk,ID,Op,Qty>,
where Uid is the user ID, Pk is the user public key, ID is the device ID, Op is the operation on the device (applying for scheduling, or storing electric quantity), and Qty is the predicted electric quantity;
step six, to prevent the resource waste caused by malicious users applying, when the time-of-use electricity price is low, for power-scheduling requests that differ excessively from their actual needs, a scheduling reputation value is introduced, stored in the block chain, and managed automatically by a smart contract; when a user applies to schedule power, the user must upload past actual power data to the block chain, together with the load requirement predicted by the smart meter and the DRL controller, for reputation-value calculation; the reputation value is defined as follows:
if a user needs power scheduling, a transaction request is sent to the block chain; the grid company obtains from the block chain the transaction information and the scheduling reputation values TR of the users applying for power scheduling, and the grid department performs power scheduling on the users in descending order of reputation value;
the scheduling reputation value updating mode is as follows:
where Q_value(t) represents the prediction evaluation value at the i-th scheduling; the scheduled power requested by the controller at time t enters the calculation; α is a prediction deviation factor tied to the latest scheduling reputation value (the higher the reputation value, the larger the deviation factor); N represents the user's accumulated number of scheduling applications;
the reputation value is updated automatically by a smart contract; it is stored on the block chain and supervised by all users participating in the power dispatching platform;
step seven, the user encrypts the scheduling information and power information with the user's public key and uploads them to the block chain for storage; the user reputation value is likewise stored on-chain;
so that reputation-value updates can be supervised by all users, the reputation value is stored on the block chain in plaintext, while the power parameters and scheduling requests of user equipment are encrypted with the user's public key before storage; only the power department and the user can decrypt the data with the private key, and both can trace past changes of power-scheduling requests and power-equipment parameters by accessing the block chain;
step eight, through step five, the trained agent predicts the user's power load for the next time period; if there is surplus electric quantity, the user need not file a power declaration and the surplus power is stored for the next period; if the agent predicts that the user's power load in the next period is too large to be self-sufficient, it files a power declaration with the grid department according to the predicted power limit, and the scheduling of power is completed after the grid department's audit; for the multiple power-scheduling applications received, the grid dispatching center must schedule them in order to ensure safe and stable operation of the grid and reliable external power supply, and the scheduling information is stored on-chain; the grid department sorts the applications in descending order of the user reputation values stored in the block chain and performs scheduling audit and power dispatch preferentially for users with high reputation values; to ensure that reputation updates are fair and impartial, the reputation-value updating and smart-contract management of step six are used, and reputation values are updated in the next time period according to the actual load data of step six.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211167510.9A CN115438873A (en) | 2022-09-23 | 2022-09-23 | Power dispatching method based on block chain and deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115438873A true CN115438873A (en) | 2022-12-06 |
Family
ID=84249735
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211167510.9A Withdrawn CN115438873A (en) | 2022-09-23 | 2022-09-23 | Power dispatching method based on block chain and deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115438873A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116307606A (en) * | 2023-03-24 | 2023-06-23 | 华北电力大学 | Shared energy storage flexible operation scheduling method based on block chain |
CN116307606B (en) * | 2023-03-24 | 2023-09-12 | 华北电力大学 | Shared energy storage flexible operation scheduling method based on block chain |
CN116233132A (en) * | 2023-05-08 | 2023-06-06 | 成都理工大学 | Energy block chain link point consensus method based on improved Raft consensus mechanism |
CN116703009A (en) * | 2023-08-08 | 2023-09-05 | 深圳航天科创泛在电气有限公司 | Operation reference information generation method of photovoltaic power generation energy storage system |
CN116703009B (en) * | 2023-08-08 | 2024-01-09 | 深圳航天科创泛在电气有限公司 | Operation reference information generation method of photovoltaic power generation energy storage system |
CN117478306A (en) * | 2023-12-28 | 2024-01-30 | 湖南天河国云科技有限公司 | Block chain-based energy control method, storage medium and terminal equipment |
CN117478306B (en) * | 2023-12-28 | 2024-03-22 | 湖南天河国云科技有限公司 | Block chain-based energy control method, storage medium and terminal equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20221206 |