CN115438873A - Power dispatching method based on block chain and deep reinforcement learning - Google Patents

Power dispatching method based on block chain and deep reinforcement learning

Info

Publication number
CN115438873A
CN115438873A (application CN202211167510.9A)
Authority
CN
China
Prior art keywords
power
user
block chain
scheduling
load
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202211167510.9A
Other languages
Chinese (zh)
Inventor
李先贤
周梁昊杰
刘鹏
李东城
陈柠天
霍浩
王博仪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Normal University
Priority to CN202211167510.9A (priority date 2022-09-23)
Publication of CN115438873A (2022-12-06)
Withdrawn legal status (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28 Arrangements for balancing of the load in a network by storage of energy
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46 Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/466 Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand

Abstract

The invention discloses a power dispatching method based on blockchain and deep reinforcement learning, which comprises the following steps: step one, user registration; step two, collecting data, encrypting it and uploading it to the blockchain; step three, setting the DRL state space and action space; step four, setting the DRL reward function R(t) and a user constraint and penalty mechanism; step five, obtaining a prediction result through DRL training based on an improved DQN and declaring power to the grid department; step six, updating the reputation value; step seven, the grid department encrypts the scheduling information with the applying user's public key and uploads it to the blockchain for storage, with the user reputation value likewise stored on the chain; step eight, completing power dispatching based on the reputation value. The method performs fusion management of big data collected on the basis of blockchain technology, realizes aggregation and sharing of data from different sources, balances power supply and demand, and improves the energy management system.

Description

Power dispatching method based on block chain and deep reinforcement learning
Technical Field
The invention relates to the technical field of deep reinforcement learning and blockchain, and in particular to a power dispatching method based on blockchain and deep reinforcement learning.
Background
The smart grid is the intelligent form of the power grid, also called "Grid 2.0". It is built on an integrated, high-speed, two-way communication network, and through the application of advanced sensing and measurement technology, advanced equipment technology, advanced control methods and advanced decision-support systems it achieves reliable, safe, economical, efficient and environmentally friendly grid operation and safe electricity use. Its main characteristics include self-healing, incentivizing and protecting users, resisting attacks, providing electric energy of a quality that meets user requirements, permitting the access of various forms of generation, starting up the power market, and optimized, efficient operation of assets.
With the development of the smart grid concept, integrating green and renewable energy has become a new vision for the traditional power grid. As the traditional grid transitions from centralized to distributed operation, the excellent properties of blockchain, such as decentralization, trustworthiness and traceability, fit the construction of the smart grid very well, enabling mutual trust between different user entities in a distributed environment and, on that basis, operations such as energy trading.
In the narrow sense, a blockchain is a decentralized shared ledger that combines data blocks into a specific chained data structure in chronological order and is cryptographically guaranteed to be tamper-proof and unforgeable; it can securely store simple, sequential, verifiable data in a system. In the broad sense, blockchain technology is a new decentralized infrastructure and distributed computing paradigm that uses a cryptographic chained block structure to verify and store data, a distributed-node consensus algorithm to generate and update data, and automated script code (smart contracts) to program and manipulate data. The core advantage of blockchain technology is decentralization: by means of data encryption, timestamps, distributed consensus and economic incentives, it enables point-to-point transactions, coordination and cooperation based on decentralized credit in distributed systems whose nodes need not trust one another, providing a solution to the high cost, low efficiency and insecure data storage that commonly afflict centralized institutions.
However, P2P energy trading also faces challenges: most distributed generation equipment relies on wind power, solar energy and the like, is easily affected by weather, and suffers from randomness and uncertainty in generated power. In short, the precondition for completing a power transaction is that users know their own power load and whether they have surplus power to trade, and the conventional approach is to predict this with deep learning. Existing deep learning networks, however, suffer from weak feature-extraction capability, low accuracy, and easy loss of long-term dependency information. The idea of reinforcement learning is therefore introduced, which can dynamically learn the periodic characteristics of the load and capture long-term dependencies more effectively.
The technical background mainly stems from the following two parts:
1. Blockchain (Block Chain):
A blockchain is a chain formed of blocks. Each block stores certain information, and the blocks are linked into a chain in the order in which they were generated. The servers, referred to as nodes in the blockchain system, provide storage space and computational support for the whole system. To modify information in the blockchain, more than half of the nodes must agree and the information in all nodes must be modified; since the nodes are usually held by different parties, information in the blockchain is extremely difficult to tamper with. Compared with traditional networks, the blockchain has two core characteristics: first, data is difficult to tamper with; second, it is decentralized. Based on these two characteristics, the information recorded on a blockchain is more authentic and reliable, which can solve the problem of mutual distrust.
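As an illustrative, non-limiting sketch of the hash-linking that makes this tampering so difficult, the following Python fragment (the field names and toy records are hypothetical, not taken from the patent) chains two blocks by embedding each block's hash in its successor:

```python
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    """Hash the block's contents deterministically (sorted keys)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def make_block(prev_hash: str, records: list) -> dict:
    """Create a block that embeds the previous block's hash, so that
    altering any earlier block invalidates every later one."""
    block = {"timestamp": time.time(), "prev_hash": prev_hash, "records": records}
    block["hash"] = block_hash(block)
    return block

# A two-block toy chain: rewriting the first block would change its hash
# and break the prev_hash link stored in the second.
genesis = make_block(prev_hash="0" * 64, records=["user registration"])
following = make_block(prev_hash=genesis["hash"], records=["meter reading"])
assert following["prev_hash"] == genesis["hash"]
```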
2. Deep Reinforcement Learning (DRL): Deep Learning (DL) has strong perception capability but limited decision-making capability, while Reinforcement Learning (RL) has decision-making capability but is weak at perception. Combining the two makes their advantages complementary and provides a solution to the perception-decision problem of complex systems. Depending on the optimization objective, model-free RL can be roughly divided into two categories: value-based RL and policy-based RL; representative algorithms include temporal-difference learning, Q-learning, SARSA and others.
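As a minimal illustration of the value-based family just named, the following tabular Q-learning sketch shows the temporal-difference update that DQN later approximates with a neural network; the states, actions and hyperparameter values here are placeholders rather than the patent's market model:

```python
import random
from collections import defaultdict

# Q(s, a) += lr * (r + gamma * max_a' Q(s', a') - Q(s, a))
Q = defaultdict(float)
ACTIONS = ["charge", "discharge", "hold"]   # placeholder action set
lr, gamma, epsilon = 0.1, 0.99, 0.1         # illustrative hyperparameters

def select_action(state) -> str:
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state) -> None:
    """One temporal-difference step toward the bootstrapped target."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    Q[(state, action)] += lr * (td_target - Q[(state, action)])
```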
Disclosure of Invention
The invention aims to solve the problems of excessive load peaks and troughs and local power imbalance caused by the instability of green-energy generation, and provides a power dispatching method based on blockchain and deep reinforcement learning. The method performs fusion management of big data collected on the basis of blockchain technology, realizes aggregation and sharing of data from different sources, balances power supply and demand, and improves the energy management system.
The technical scheme for realizing the purpose of the invention is as follows:
a power dispatching method based on a block chain and deep reinforcement learning comprises the following steps:
step one, a user registers an account and adds the equipment to be managed, obtaining a user id and equipment ids; the grid department, acting as a trusted and secure certificate authority, distributes public/private key pairs to users;
step two, the smart meter submits the power parameters of the registered equipment, encrypts them with the user's public key and uploads them to the chain in the following data format:
<Uid, ID, Pc, P, Se, C, V>,
where Uid is the user id, ID is the equipment id, Pc is the user public key, P is the equipment's generated power, Se is the equipment state, C is the equipment current and V is the equipment voltage;
data analysis is performed on the power data, processing the user's data for each stage of the power market; the on-chain data, after filtering, de-duplication, error-correction and discretization preprocessing, serves as training samples for the subsequent network;
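A minimal sketch of assembling such an uplink record follows; the field names mirror the tuple above, while the JSON serialization and the encrypt_with_public_key stub are assumptions (a real deployment would substitute an actual asymmetric encryption call such as RSA-OAEP or ECIES):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class MeterRecord:
    Uid: str   # user id
    ID: str    # equipment id
    Pc: str    # user public key
    P: float   # generated power
    Se: str    # equipment state
    C: float   # equipment current
    V: float   # equipment voltage

def encrypt_with_public_key(plaintext: bytes, public_key: str) -> bytes:
    # Stand-in only: tags the payload instead of encrypting it; a real
    # deployment would call an asymmetric encryption scheme here.
    return b"ENC[" + public_key.encode("utf-8") + b"]" + plaintext

def build_uplink_payload(rec: MeterRecord) -> bytes:
    """Serialize the <Uid, ID, Pc, P, Se, C, V> tuple and encrypt it
    under the user's public key before uploading to the chain."""
    plaintext = json.dumps(asdict(rec)).encode("utf-8")
    return encrypt_with_public_key(plaintext, rec.Pc)
```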
step three, setting the DRL state space and action space:
the state space S consists of the real-time load power P_load(t) at time t, the time-of-use electricity price TOU(t) at time t, and the state of charge SOC(t) of the energy storage device at time t;
where SOC(t) is defined as:
SOC(t) = SOC(t-1) + P_load(t)·Δt / E_b,
where E_b is the maximum capacity of the energy storage device;
the state space S is defined as:
S = {P_load(t), TOU(t), SOC(t)};
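A brief sketch of this state construction follows, using the SOC recurrence above; the clipping of SOC to [0, 1] and all numeric values are our assumptions, not stated in the patent:

```python
from dataclasses import dataclass

@dataclass
class GridState:
    """DRL state S = {P_load(t), TOU(t), SOC(t)} as defined above."""
    p_load: float  # real-time load power at t
    tou: float     # time-of-use electricity price at t
    soc: float     # state of charge of the storage device at t

def next_soc(soc_prev: float, p_load: float, dt: float, e_b: float) -> float:
    """SOC(t) = SOC(t-1) + P_load(t) * dt / E_b, clipped to [0, 1]
    (the clipping is an assumption)."""
    return min(1.0, max(0.0, soc_prev + p_load * dt / e_b))

state = GridState(p_load=3.2, tou=0.52, soc=next_soc(0.40, 3.2, 0.25, 10.0))
```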
the action space Act is defined by a formula presented as an image in the original publication; the action is the load grade predicted for the next time period, obtained by discretizing the continuous load through a procedure whose formulas are likewise presented as images in the original publication. In those formulas, P_min denotes the minimum permissible load prediction, P_max the maximum permissible load prediction, and a further symbol denotes the mean value; the load grade is selected by the agent from among A-Z;
B(t) > 0 indicates that the user has surplus electric energy and the energy storage device is charged with quantity B(t); B(t) < 0 indicates that the user has a load shortfall and must apply to the grid for power dispatching, the quantity to be dispatched being B(t);
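Since the exact discretization formulas are published only as images, the following sketch assumes uniform-width bins over [P_min, P_max] mapped onto the 26 grades A-Z; it illustrates the grading step rather than reproducing the patent's formula:

```python
import string

def load_grade(p_pred: float, p_min: float, p_max: float) -> str:
    """Map a continuous load prediction onto one of the grades A-Z,
    assuming 26 uniform-width bins over [P_min, P_max]."""
    p_clamped = min(max(p_pred, p_min), p_max)
    width = (p_max - p_min) / 26
    idx = min(int((p_clamped - p_min) / width), 25)
    return string.ascii_uppercase[idx]

def grade_midpoint(grade: str, p_min: float, p_max: float) -> float:
    """Inverse map: a grade letter back to a representative load value."""
    idx = string.ascii_uppercase.index(grade)
    width = (p_max - p_min) / 26
    return p_min + (idx + 0.5) * width
```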
step four, setting the DRL reward function R(t) and a user constraint and penalty mechanism:
in the learning stage, the deep reinforcement learning algorithm must determine the update direction and magnitude of the controller parameters according to the reward value returned by the external environment; in the market environment, the goal of the optimized control is to minimize the long-term electricity-purchase cost of electricity-buying users and to reduce the operating cost of existing equipment:
(I) P is the penalty cost, mainly reflecting the degree to which a user violates the operation constraints; to prevent malicious users from submitting extreme load prediction values in pursuit of profit, the operation constraints are:
(1) load capacity limit constraint (formula presented as an image in the original publication);
(2) deviation electric quantity constraint (formula presented as an image in the original publication);
where P_allow denotes the maximum permitted deviation;
in the optimization process, if a user violates a constraint condition, the user pays a penalty according to the degree of the violation and the reward is reduced; the penalty cost is calculated as:
P = P_1 + P_2 + P_3,
where the expressions for P_1, P_2 and P_3 are presented as images in the original publication and ρ and δ are the corresponding penalty coefficients;
(II) operating cost:
f_g(t) = TOU(t)·Δt·(P_load - B_store),
where B_store denotes the energy stored in the energy storage device; if the user has no energy storage device, B_store is 0;
(III) reward function: with the user owning N devices, the reward R(t) is given by a formula presented as an image in the original publication;
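A hedged sketch of this cost-and-penalty structure follows; because the P_1, P_2, P_3 and R(t) expressions are published as images, the linear penalty and the negated cost-plus-penalty reward below are assumptions consistent only with the stated cost-minimization objective:

```python
def operating_cost(tou: float, dt: float, p_load: float, b_store: float) -> float:
    """f_g(t) = TOU(t) * dt * (P_load - B_store); B_store is 0 for a
    user without an energy storage device."""
    return tou * dt * (p_load - b_store)

def penalty(violation: float, rho: float = 1.0) -> float:
    """Penalty grows with the degree of constraint violation; a linear
    form with coefficient rho is assumed."""
    return rho * max(0.0, violation)

def reward(costs: list[float], penalties: list[float]) -> float:
    """R(t) aggregates over the user's N devices; the negated sum of
    costs and penalties is our assumed form."""
    return -(sum(costs) + sum(penalties))
```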
step five, DRL training based on the improved DQN:
(1) experience accumulation: in the power-market environment, the evaluation network outputs the Q values of all actions in the action space Act according to the user state S_t; an action a_t is selected according to a greedy strategy, the market feeds back the reward r_t, and the next state S_{t+1} is obtained, yielding a complete Markov tuple (S_t, a_t, r_t, S_{t+1}); the tuple is placed in the experience pool as a sample, and this is repeated until the number of samples reaches the configured size of the experience pool;
(2) updating the parameters of the Q function: samples are drawn from the experience pool by a prioritized sampling algorithm based on the TD-error of each sample, so that samples with larger TD-error are selected with higher probability for training;
(3) training the neural network: a loss function L of the neural network is constructed for training; after every N complete training rounds of the evaluation network, its parameters are copied in full to the target network;
(4) optimizing parameters: if the benefit obtained by the controller is no longer increasing and remains stable over a long period, the evaluation network parameters have converged; otherwise steps (1) to (4) are repeated;
(5) the DRL outputs a power operation execution command in the following format:
<Uid, Pk, ID, Op, Qty>,
where Uid is the user id, Pk is the user public key, ID is the equipment id, Op is the operation on the equipment (applying for dispatching or storing electric quantity), and Qty is the predicted electric quantity;
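The following PyTorch sketch ties sub-steps (1) to (4) together under stated assumptions: a small MLP stands in for the unspecified network architecture, a TD-error-weighted draw over the whole pool replaces a production sum-tree, and every hyperparameter value is illustrative:

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 3, 26             # {P_load, TOU, SOC}; grades A-Z
GAMMA, EPS, SYNC_EVERY = 0.99, 0.1, 200  # assumed hyperparameters

def make_net() -> nn.Module:
    # Small MLP Q-network; the patent does not fix an architecture.
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

eval_net, target_net = make_net(), make_net()
target_net.load_state_dict(eval_net.state_dict())
optimizer = torch.optim.Adam(eval_net.parameters(), lr=1e-3)
pool: deque = deque(maxlen=10_000)       # experience pool of (s, a, r, s')

def act(state: torch.Tensor) -> int:
    """Greedy on the evaluation net's Q values, with epsilon exploration."""
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(eval_net(state).argmax())

def sample_prioritized(batch_size: int) -> list:
    """Draw transitions with probability proportional to TD-error, so
    large-error samples are replayed with higher probability."""
    with torch.no_grad():
        errors = [abs(float(r + GAMMA * target_net(s2).max()
                            - eval_net(s)[a])) + 1e-3
                  for s, a, r, s2 in pool]
    total = sum(errors)
    return random.choices(list(pool),
                          weights=[e / total for e in errors], k=batch_size)

def train_step(step: int, batch_size: int = 32) -> None:
    batch = sample_prioritized(batch_size)
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch])
    s2 = torch.stack([b[3] for b in batch])
    with torch.no_grad():                 # bootstrapped TD target
        target = r + GAMMA * target_net(s2).max(dim=1).values
    q = eval_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)   # the loss function L
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    if step % SYNC_EVERY == 0:            # copy evaluation net -> target net
        target_net.load_state_dict(eval_net.state_dict())
```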
step six, to prevent the resource waste caused by some malicious users applying, when the time-of-use electricity price is low, for power dispatching requests that differ excessively from the power they actually need, a scheduling reputation value is introduced, stored on the blockchain and managed automatically by a smart contract; when applying for power dispatching, a user must upload past actual power data to the blockchain, together with the required load demand predicted by the smart meter and the DRL controller, for reputation-value calculation; the reputation value is defined as follows:
if users need power dispatching, a transaction request is sent to the blockchain; the grid company obtains from the blockchain the transaction information and the scheduling reputation values TR of the users applying for dispatching, and the grid department performs power dispatching in descending order of all users' scheduling reputation values;
the scheduling reputation value is updated as follows (the update formulas are presented as images in the original publication):
in those formulas, Q_value denotes the prediction evaluation value at the i-th scheduling, one symbol denotes the dispatching power quota requested by the controller at time t and another the quantity the user actually requires; α is a prediction deviation factor: because a prediction cannot be guaranteed to match the actual situation exactly, a deviation tolerance within a certain range is granted; α is related to the latest scheduling reputation value, and the higher the reputation value, the larger the deviation factor; N denotes the user's accumulated number of scheduling applications;
the reputation value is updated automatically by a smart contract and stored on the blockchain, supervised by all users participating in the power dispatching platform;
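Because the update formulas themselves are published as images, the following sketch is an assumed recency-weighted form: requests within the α tolerance of actual demand score full marks and larger deviations are penalized; the coefficients and the running-average shape are illustrative only:

```python
def updated_reputation(tr_prev: float, requested: float, actual: float,
                       alpha: float, n: int) -> float:
    """Assumed update: score the latest application by its relative
    deviation from actual demand, then fold it into a running average
    over the user's n accumulated scheduling applications."""
    deviation = abs(requested - actual) / max(actual, 1e-9)
    score = 1.0 if deviation <= alpha else max(0.0, 1.0 - deviation)
    return (tr_prev * n + score) / (n + 1)

def deviation_tolerance(tr_latest: float, base: float = 0.05,
                        gain: float = 0.10) -> float:
    """alpha grows with the latest reputation value (higher reputation,
    larger permitted deviation); both coefficients are illustrative."""
    return base + gain * tr_latest
```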
step seven, the user encrypts the scheduling information and power information with the user's public key and uploads them to the blockchain for storage, and the user reputation value is likewise stored on the chain (the on-chain record layout is presented as an image in the original publication);
so that updates of the user reputation value can be supervised by all users, the reputation value is stored in plaintext on the blockchain, while the power parameters and scheduling requests of the user's equipment are stored encrypted under the user's public key; only the power department and the user can decrypt the data with the private key, and by accessing the blockchain they can trace past changes to power dispatching requests and power equipment parameters;
step eight, through step five, the agent trained on the data predicts the user's power load for the next time period; if there is surplus electric quantity, the user need not make a power declaration and the surplus power is stored for the next period; if the agent predicts that the user's power load in the next period will be too large to be self-sufficient, it declares power to the grid department according to the predicted power quota, and dispatching is completed after the power department's review; for the multiple dispatching applications received, the power dispatching center must dispatch in an orderly manner to ensure safe, stable grid operation and reliable power supply; the dispatching information is stored on the chain, the dispatching center sorts the applications in descending order of the user reputation values stored in the blockchain, and users with high reputation values are reviewed and dispatched first; to ensure that reputation values are updated fairly and impartially, they are managed by the reputation-update smart contract of step six and updated in the next time period from actual power load data according to step six.
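A short sketch of this reputation-ordered review follows; the capacity cut-off used to decide which applications are granted is our assumption about the review, not a rule stated in the patent:

```python
from dataclasses import dataclass

@dataclass
class DispatchApplication:
    uid: str           # applying user
    qty: float         # predicted electric quantity requested
    reputation: float  # scheduling reputation value TR read from the chain

def dispatch_order(apps: list[DispatchApplication],
                   capacity: float) -> list[DispatchApplication]:
    """Review applications in descending reputation order and grant them
    while dispatchable capacity remains."""
    granted, remaining = [], capacity
    for app in sorted(apps, key=lambda a: a.reputation, reverse=True):
        if app.qty <= remaining:
            granted.append(app)
            remaining -= app.qty
    return granted
```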
The technical scheme focuses on load prediction and on optimizing the user's electricity-purchase scheme using blockchain technology and deep reinforcement learning in a distributed environment. Its technical advantages are as follows:
1. Blockchain technology is used to solve data sharing in a distributed environment. As distributed generation equipment becomes increasingly common, the previously centralized approach makes data sharing in a distributed environment very inefficient. Generating the power-microgrid blockchain data involves data acquisition, fusion-sharing management, contract transactions, access control and data consensus. Data from physical equipment, including structured, unstructured and semi-structured data, is acquired by sensors and other acquisition devices and stored distributedly at different locations using blockchain technology. The collected big data is then fusion-managed on the basis of the blockchain, achieving aggregation and sharing of data from different sources.
2. Deep reinforcement learning is used to predict the user load and optimize power dispatching. Traditional deep learning and reinforcement learning methods lack scalability, make decisions inefficiently in high-dimensional spaces, and struggle to handle continuous input variables directly. Deep reinforcement learning, which combines deep learning's strong perception with reinforcement learning's strong decision-making, is therefore well suited to the complex power-market environment, where the state and action spaces are high-dimensional.
The method performs fusion management of big data collected on the basis of blockchain technology, realizes aggregation and sharing of data from different sources, balances power supply and demand, and improves the energy management system.
Drawings
FIG. 1 is a schematic flow chart of an embodiment;
FIG. 2 is a flow diagram of a DRL in an embodiment;
FIG. 3 is a diagram of the block structure in the power blockchain in the embodiment;
fig. 4 is a schematic diagram of a discretization process of a continuous load in the embodiment.
Detailed Description
The invention will be further elucidated with reference to the drawings and examples, without however being limited thereto.
Example:
With the continuous opening of the power market, a large number of electricity-selling enterprises are gradually participating in power-market transactions. Current research constructs load-optimization decision models for electricity sellers based on methods such as stochastic programming, robust optimization and artificial intelligence, in order to make full use of the available uncertainty information and improve the effectiveness of decision results. An optimization method based on reinforcement learning is well suited to user load prediction: a data-driven artificial-intelligence algorithm can fully mine the uncertainty patterns of factors such as electricity price and real-time load in the power spot market, take multiple decision variables into account to the greatest extent, and improve the decision effect. Meanwhile, to adapt to the complex power-market environment, a deep reinforcement learning algorithm, which builds on reinforcement learning with the idea of neural networks, is adopted to optimize the electricity seller's action strategy.
In the embodiment, the DRL agent is trained with historical data; through the agent, a user can know the load usage of the next time period in advance and make a power-dispatching declaration or store surplus power ahead of time, thereby improving the energy management system.
Referring to fig. 1, a power dispatching method based on blockchain and deep reinforcement learning includes the following steps:
step one, a user registers an account and adds the equipment to be managed, obtaining a user id and equipment ids; the grid department, acting as a trusted and secure certificate authority, distributes public/private key pairs to users;
step two, the smart meter submits the power parameters of the registered equipment, encrypts them with the user's public key and uploads them to the chain in the following data format:
<Uid, ID, Pc, P, Se, C, V>,
where Uid is the user id, ID is the equipment id, Pc is the user public key, P is the equipment's generated power, Se is the equipment state, C is the equipment current and V is the equipment voltage;
data analysis is performed on the power data, processing the user's data for each stage of the power market; the on-chain data, after filtering, de-duplication, error-correction and discretization preprocessing, serves as training samples for the subsequent network;
step three, setting the DRL state space and action space:
the state space S consists of the real-time load power P_load(t) at time t, the time-of-use electricity price TOU(t) at time t, and the state of charge SOC(t) of the energy storage device at time t;
where SOC(t) is defined as:
SOC(t) = SOC(t-1) + P_load(t)·Δt / E_b,
where E_b is the maximum capacity of the energy storage device;
the state space S is defined as:
S = {P_load(t), TOU(t), SOC(t)};
the action space Act is defined by a formula presented as an image in the original publication; the action is the load grade predicted for the next time period, obtained by discretizing the continuous load as shown in fig. 4, through a procedure whose formulas are likewise presented as images in the original publication. In those formulas, P_min denotes the minimum permissible load prediction, P_max the maximum permissible load prediction, and a further symbol denotes the mean value; the load grade is selected by the agent from among A-Z;
B(t) > 0 indicates that the user has surplus electric energy and the energy storage device is charged with quantity B(t); B(t) < 0 indicates that the user has a load shortfall and must apply to the grid for power dispatching, the quantity to be dispatched being B(t);
step four, setting the DRL reward function R(t) and a user constraint and penalty mechanism:
in the learning stage, the deep reinforcement learning algorithm must determine the update direction and magnitude of the controller parameters according to the reward value returned by the external environment; in the market environment, the goal of the optimized control is to minimize the long-term electricity-purchase cost of electricity-buying users and to reduce the operating cost of existing equipment:
(I) P is the penalty cost, mainly reflecting the degree to which a user violates the operation constraints; to prevent malicious users from submitting extreme load prediction values in pursuit of profit, the operation constraints are:
(1) load capacity limit constraint (formula presented as an image in the original publication);
(2) deviation electric quantity constraint (formula presented as an image in the original publication);
where P_allow denotes the maximum permitted deviation;
in the optimization process, if a user violates a constraint condition, the user pays a penalty according to the degree of the violation and the reward is reduced; the penalty cost is calculated as:
P = P_1 + P_2 + P_3,
where the expressions for P_1, P_2 and P_3 are presented as images in the original publication and ρ and δ are the corresponding penalty coefficients;
(II) operating cost:
f_g(t) = TOU(t)·Δt·(P_load - B_store),
where B_store denotes the energy stored in the energy storage device; if the user has no energy storage device, B_store is 0;
(III) reward function: with the user owning N devices, the reward R(t) is given by a formula presented as an image in the original publication;
step five, DRL training based on the improved DQN, as shown in fig. 2:
(1) experience accumulation: in the power-market environment, the evaluation network outputs the Q values of all actions in the action space Act according to the user state S_t; an action a_t is selected according to a greedy strategy, the market feeds back the reward r_t, and the next state S_{t+1} is obtained, yielding a complete Markov tuple (S_t, a_t, r_t, S_{t+1}); the tuple is placed in the experience pool as a sample, and this is repeated until the number of samples reaches the configured size of the experience pool;
(2) updating the parameters of the Q function: samples are drawn from the experience pool by a prioritized sampling algorithm based on the TD-error of each sample, so that samples with larger TD-error are selected with higher probability for training;
(3) training the neural network: a loss function L of the neural network is constructed for training; after every N complete training rounds of the evaluation network, its parameters are copied in full to the target network;
(4) optimizing parameters: if the benefit obtained by the controller is no longer increasing and remains stable over a long period, the evaluation network parameters have converged; otherwise steps (1) to (4) are repeated;
(5) the DRL outputs a power operation execution command in the following format:
<Uid, Pk, ID, Op, Qty>,
where Uid is the user id, Pk is the user public key, ID is the equipment id, Op is the operation on the equipment (applying for dispatching or storing electric quantity), and Qty is the predicted electric quantity;
step six, to prevent the resource waste caused by some malicious users applying, when the time-of-use electricity price is low, for power dispatching requests that differ excessively from the power they actually need, a scheduling reputation value is introduced, stored on the blockchain and managed automatically by a smart contract; when applying for power dispatching, a user must upload past actual power data to the blockchain, together with the required load demand predicted by the smart meter and the DRL controller, for reputation-value calculation; the reputation value is defined as follows:
if users need power dispatching, a transaction request is sent to the blockchain; the grid company obtains from the blockchain the transaction information and the scheduling reputation values TR of the users applying for dispatching, and the grid department performs power dispatching in descending order of all users' scheduling reputation values;
the scheduling reputation value is updated as follows (the update formulas are presented as images in the original publication):
in those formulas, Q_value denotes the prediction evaluation value at the i-th scheduling, one symbol denotes the dispatching power quota requested by the controller at time t and another the quantity the user actually requires; α is a prediction deviation factor: because a prediction cannot be guaranteed to match the actual situation exactly, a deviation tolerance within a certain range is granted; α is related to the latest scheduling reputation value, and the higher the reputation value, the larger the deviation factor; N denotes the user's accumulated number of scheduling applications;
the reputation value is updated automatically by a smart contract and stored on the blockchain, supervised by all users participating in the power dispatching platform;
step seven, the user encrypts the scheduling information and power information with the user's public key and uploads them to the blockchain for storage, and the user reputation value is likewise stored on the chain; the block structure in the power blockchain is shown in fig. 3 (the on-chain record layout is presented as an image in the original publication);
so that updates of the user reputation value can be supervised by all users, the reputation value is stored in plaintext on the blockchain, while the power parameters and scheduling requests of the user's equipment are stored encrypted under the user's public key; only the power department and the user can decrypt the data with the private key, and by accessing the blockchain they can trace past changes to power dispatching requests and power equipment parameters;
step eight, through step five, the agent trained on the data predicts the user's power load for the next time period; if there is surplus electric quantity, the user need not make a power declaration and the surplus power is stored for the next period; if the agent predicts that the user's power load in the next period will be too large to be self-sufficient, it declares power to the grid department according to the predicted power quota, and dispatching is completed after the power department's review; for the multiple dispatching applications received, the power dispatching center must dispatch in an orderly manner to ensure safe, stable grid operation and reliable power supply; the dispatching information is stored on the chain, the dispatching center sorts the applications in descending order of the user reputation values stored in the blockchain, and users with high reputation values are reviewed and dispatched first; to ensure that reputation values are updated fairly and impartially, they are managed by the reputation-update smart contract of step six and updated in the next time period from actual power load data according to step six.

Claims (1)

1. A power dispatching method based on blockchain and deep reinforcement learning, characterized by comprising the following steps:
step one, a user registers an account and adds the equipment to be managed, obtaining a user id and equipment ids; the grid department, acting as a trusted and secure certificate authority, distributes public/private key pairs to users;
step two, the smart meter submits the power parameters of the registered equipment, encrypts them with the user's public key and uploads them to the chain in the following data format:
<Uid, ID, Pc, P, Se, C, V>,
where Uid is the user id, ID is the equipment id, Pc is the user public key, P is the equipment's generated power, Se is the equipment state, C is the equipment current and V is the equipment voltage;
data analysis is performed on the power data, processing the user's data for each stage of the power market; the on-chain data, after preprocessing operations such as filtering, de-duplication, error correction and discretization, serves as training samples for the subsequent network;
step three, setting the DRL state space and action space:
the state space S consists of the real-time load power P_load(t) at time t, the time-of-use electricity price TOU(t) at time t, and the state of charge SOC(t) of the energy storage device at time t;
where SOC(t) is defined as:
SOC(t) = SOC(t-1) + P_load(t)·Δt / E_b,
where E_b is the maximum capacity of the energy storage device;
the state space S is defined as:
S = {P_load(t), TOU(t), SOC(t)};
the action space Act is defined by a formula presented as an image in the original publication; the action is the load prediction value for the next time period, obtained by discretizing the continuous load through a procedure whose formulas are likewise presented as images in the original publication. In those formulas, P_min denotes the minimum permissible load prediction, P_max the maximum permissible load prediction, and a further symbol denotes the mean value; the load grade is selected by the agent from among A-Z;
B(t) > 0 indicates that the user has surplus electric energy and the energy storage device is charged with quantity B(t); B(t) < 0 indicates that the user has a load shortfall and must apply to the grid for power dispatching, the quantity to be dispatched being B(t);
step four, setting the DRL reward function R(t) and a user constraint and penalty mechanism:
in the learning stage, the deep reinforcement learning algorithm must determine the update direction and magnitude of the controller parameters according to the reward value returned by the external environment; in the market environment, the goal of the optimized control is to minimize the long-term electricity-purchase cost of electricity-buying users and to reduce the operating cost of existing equipment:
(I) P is the penalty cost, mainly reflecting the degree to which a user violates the operation constraints; to prevent malicious users from submitting extreme load prediction values in pursuit of profit, the operation constraints are:
(1) load capacity limit constraint (formula presented as an image in the original publication);
(2) deviation electric quantity constraint (formula presented as an image in the original publication);
where P_allow denotes the maximum permitted deviation;
in the optimization process, if a user violates a constraint condition, the user pays a penalty according to the degree of the violation and the reward is reduced; the penalty cost is calculated as:
P = P_1 + P_2 + P_3,
where the expressions for P_1, P_2 and P_3 are presented as images in the original publication and ρ and δ are the corresponding penalty coefficients;
(II) operating cost:
f_g(t) = TOU(t)·Δt·(P_load - B_store),
where B_store denotes the energy stored in the energy storage device; if the user has no energy storage device, B_store is 0;
(III) reward function: with the user owning N devices, the reward R(t) is given by a formula presented as an image in the original publication;
step five, DRL training based on the improved DQN:
(1) experience accumulation: in the power-market environment, the evaluation network outputs the Q values of all actions in the action space Act according to the user state S_t; an action a_t is selected according to a greedy strategy, the market feeds back the reward r_t, and the next state S_{t+1} is obtained, yielding a complete Markov tuple (S_t, a_t, r_t, S_{t+1}); the tuple is placed in the experience pool as a sample, and this is repeated until the number of samples reaches the configured size of the experience pool;
(2) updating the parameters of the Q function: samples are drawn from the experience pool by a prioritized sampling algorithm based on the TD-error of each sample, so that samples with larger TD-error are selected with higher probability for training;
(3) training the neural network: a loss function L of the neural network is constructed for training; after every N complete training rounds of the evaluation network, its parameters are copied in full to the target network;
(4) optimizing parameters: if the benefit obtained by the controller is no longer increasing and remains stable over a long period, the evaluation network parameters have converged; otherwise steps (1) to (4) are repeated;
(5) the DRL outputs a power operation execution command in the following format:
<Uid, Pk, ID, Op, Qty>,
where Uid is the user id, Pk is the user public key, ID is the equipment id, Op is the operation on the equipment (applying for dispatching or storing electric quantity), and Qty is the predicted electric quantity;
step six, to prevent the resource waste caused by some malicious users applying, when the time-of-use electricity price is low, for power dispatching requests that differ excessively from the power they actually need, a scheduling reputation value is introduced, stored on the blockchain and managed automatically by a smart contract; when applying for power dispatching, a user must upload past actual power data to the blockchain, together with the required load demand predicted by the smart meter and the DRL controller, for reputation-value calculation; the reputation value is defined as follows:
if users need power dispatching, a transaction request is sent to the blockchain; the grid company obtains from the blockchain the transaction information and the scheduling reputation values TR of the users applying for dispatching, and the grid department performs power dispatching in descending order of all users' scheduling reputation values;
the scheduling reputation value is updated as follows (the update formulas are presented as images in the original publication):
in those formulas, Q_value denotes the prediction evaluation value at the i-th scheduling, one symbol denotes the dispatching power quota requested by the controller at time t and another the quantity the user actually requires; α is a prediction deviation factor related to the latest scheduling reputation value, and the higher the reputation value, the larger the deviation factor; N denotes the user's accumulated number of scheduling applications;
the reputation value is updated automatically by a smart contract and stored on the blockchain, supervised by all users participating in the power dispatching platform;
step seven, the user encrypts the scheduling information and power information with the user's public key and uploads them to the blockchain for storage, and the user reputation value is likewise stored on the chain (the on-chain record layout is presented as an image in the original publication);
so that updates of the user reputation value can be supervised by all users, the reputation value is stored in plaintext on the blockchain, while the power parameters and scheduling requests of the user's equipment are stored encrypted under the users' public keys; only the power department and the user can decrypt the data with the private key, and by accessing the blockchain they can trace past changes to power dispatching requests and power equipment parameters;
step eight, through step five, the agent trained on the data predicts the user's power load for the next time period; if there is surplus electric quantity, the user need not make a power declaration and the surplus power is stored for the next period; if the agent predicts that the user's power load in the next period will be too large to be self-sufficient, it declares power to the grid department according to the predicted power quota, and dispatching is completed after the power department's review; for the multiple dispatching applications received by the grid department, the grid dispatching center must dispatch in an orderly manner to ensure safe, stable grid operation and reliable power supply; the dispatching information is stored on the chain, the grid department sorts the applications in descending order of the user reputation values stored in the blockchain, and users with high reputation values are reviewed and dispatched first; to ensure that reputation values are updated fairly and impartially, they are managed by the reputation-update smart contract of step six and updated in the next time period from actual power load data according to step six.
CN202211167510.9A, priority date 2022-09-23, filing date 2022-09-23: Power dispatching method based on block chain and deep reinforcement learning; status Withdrawn; published as CN115438873A (en)

Priority Applications (1)

Application Number: CN202211167510.9A; Priority Date: 2022-09-23; Filing Date: 2022-09-23; Title: Power dispatching method based on block chain and deep reinforcement learning (en)

Applications Claiming Priority (1)

Application Number: CN202211167510.9A; Priority Date: 2022-09-23; Filing Date: 2022-09-23; Title: Power dispatching method based on block chain and deep reinforcement learning (en)

Publications (1)

Publication Number Publication Date
CN115438873A 2022-12-06

Family

ID=84249735

Family Applications (1)

Application Number: CN202211167510.9A; Title: Power dispatching method based on block chain and deep reinforcement learning (en); Priority Date: 2022-09-23; Filing Date: 2022-09-23; Status: Withdrawn

Country Status (1)

Country Link
CN (1) CN115438873A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116307606A (en) * 2023-03-24 2023-06-23 华北电力大学 Shared energy storage flexible operation scheduling method based on block chain
CN116307606B (en) * 2023-03-24 2023-09-12 华北电力大学 Shared energy storage flexible operation scheduling method based on block chain
CN116233132A (en) * 2023-05-08 2023-06-06 成都理工大学 Energy block chain link point consensus method based on improved Raft consensus mechanism
CN116703009A (en) * 2023-08-08 2023-09-05 深圳航天科创泛在电气有限公司 Operation reference information generation method of photovoltaic power generation energy storage system
CN116703009B (en) * 2023-08-08 2024-01-09 深圳航天科创泛在电气有限公司 Operation reference information generation method of photovoltaic power generation energy storage system
CN117478306A (en) * 2023-12-28 2024-01-30 湖南天河国云科技有限公司 Block chain-based energy control method, storage medium and terminal equipment
CN117478306B (en) * 2023-12-28 2024-03-22 湖南天河国云科技有限公司 Block chain-based energy control method, storage medium and terminal equipment

Similar Documents

Publication Publication Date Title
Kirli et al. Smart contracts in energy systems: A systematic review of fundamental approaches and implementations
Zhang et al. Multi-agent safe policy learning for power management of networked microgrids
Yang et al. Automated demand response framework in ELNs: Decentralized scheduling and smart contract
Di Silvestre et al. Ancillary services in the energy blockchain for microgrids
CN115438873A (en) Power dispatching method based on block chain and deep reinforcement learning
Helseth et al. Detailed long‐term hydro‐thermal scheduling for expansion planning in the Nordic power system
Soares et al. Multi-dimensional signaling method for population-based metaheuristics: Solving the large-scale scheduling problem in smart grids
Dalal et al. Chance-constrained outage scheduling using a machine learning proxy
US20220179378A1 (en) Blockchain-Based Transactive Energy Systems
Ziras et al. A mid-term DSO market for capacity limits: How to estimate opportunity costs of aggregators?
Liu et al. A blockchain-based trustworthy collaborative power trading scheme for 5G-enabled social internet of vehicles
Zahid et al. Balancing electricity demand and supply in smart grids using blockchain
Kumar et al. Blockchain based optimized energy trading for e-mobility using quantum reinforcement learning
Veerasamy et al. Blockchain-based decentralized frequency control of microgrids using federated learning fractional-order recurrent neural network
Hou et al. A study on decentralized autonomous organizations based intelligent transportation system enabled by blockchain and smart contract
Medved et al. The use of intelligent aggregator agents for advanced control of demand response
CN102801524A (en) Trust-theory-based trusted service system based on trusted authentication system
El-adaway et al. Preliminary attempt toward better understanding the impact of distributed energy generation: An agent-based computational economics approach
Ngwira et al. Towards context-aware smart contracts for Blockchain IoT systems
Kell et al. A systematic literature review on machine learning for electricity market agent-based models
Robert et al. Economic emission dispatch of hydro‐thermal‐wind using CMQLSPSN technique
CN114977160A (en) Micro-grid group optimization operation strategy generation method, system, equipment and storage medium
Chatzidimitriou et al. Enhancing agent intelligence through evolving reservoir networks for predictions in power stock markets
Shang et al. An Information Security Solution for Vehicle-to-grid Scheduling by Distributed Edge Computing and Federated Deep Learning
An Game-theoretic methods for cost allocation and security in Smart Grid

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20221206)