CN113888327A - Energy internet transaction method and system based on reinforcement learning block chain energizing - Google Patents


Info

Publication number
CN113888327A
Authority
CN
China
Prior art keywords
energy
retailer
representing
operator
seller
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111164320.7A
Other languages
Chinese (zh)
Inventor
曹一凡
仇超
任晓旭
王晓飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202111164320.7A
Publication of CN113888327A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04: Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Technology Law (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an energy internet transaction method and system based on reinforcement learning block chain energizing. The method comprises the following steps: constructing a three-stage game model among the operator, the retailer and the seller based on the energy transaction relationship among them on the blockchain transaction platform; solving the game equilibrium points of the three-stage game model by using a distributed hierarchical policy gradient algorithm, wherein the game equilibrium points comprise the optimal unit service price, the optimal unit energy price and the optimal energy demand; and carrying out energy trading among the operator, the retailer and the seller according to the game equilibrium points. The invention can help operators and retailers achieve higher utility while sellers also obtain better utility.

Description

Energy internet transaction method and system based on reinforcement learning block chain energizing
Technical Field
The invention belongs to the technical field of energy Internet, and particularly relates to an energy Internet transaction method and system based on reinforcement learning block chain energizing.
Background
With the development of distributed energy, the Energy Internet (EI) is rapidly becoming a focus of attention. However, because distributed energy sources are intermittent and uncertain, their proliferation combined with traditional control methods has hindered the development of the energy internet. Meanwhile, the advent of Software Defined Networking (SDN) has brought the reliability and flexibility needed to solve these problems. A distributed energy market is gradually emerging in the energy internet thanks to its reasonable prices and efficient transmission. This market enables traditional energy consumers to become energy retailers that can produce, store and sell distributed energy, which reduces transmission loss and lowers the peak load of the energy internet.
On the other hand, with the rise of the Internet of Things, edge computing is widely applied in various network computing architectures by virtue of its advantages in network delay, scalability and reliability. In order to continuously provide reliable computing, storage and communication services, the energy utilization and supply of devices such as edge servers and gateways are urgent issues to be explored.
In order to meet the energy requirements of emerging energy retailers and the various types of edge equipment in the energy internet, an energy trading market serving edge computing is constructed. While this model effectively addresses the needs of both parties, a number of problems remain: (1) credit crisis: the lack of trust between different trading entities makes it impossible to conduct energy trading reliably; (2) imperfect market modeling: the models of the roles in the existing energy trading market are not complete enough, and a trading process with mutually constraining interactions has not been formed; (3) unbalanced utility: the optimization mechanisms in current energy trading mainly maximize the utility of one party and do not consider the utility balance among multiple parties. In addition, to reach a balance of utility, most current research uses game theory to model the interaction between the parties to a transaction. The traditional method usually assumes a centralized organization that collects the users' information and helps them make decisions. This is optimization based on complete information, and it neglects the protection of the users' private parameters. Moreover, in real life the complete information of an individual, especially private parameters, cannot be readily acquired, so the traditional method runs into difficulties in collecting information and cannot be used.
Disclosure of Invention
Aiming at the problem that user privacy protection and utility balance cannot be achieved in edge-computing-based energy transactions in the prior art, the invention provides an energy internet transaction method and system based on reinforcement learning block chain energizing. In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
An energy internet transaction method based on reinforcement learning block chain energizing comprises the following steps:
S1, constructing a three-stage game model among the operator, the retailer and the seller based on the energy transaction relationship among the operator, the retailer and the seller on the blockchain transaction platform;
S2, solving the game equilibrium points of the three-stage game model by using a distributed hierarchical policy gradient algorithm, wherein the game equilibrium points comprise the optimal unit service price, the optimal unit energy price and the optimal energy demand;
S3, the operator, the retailer and the seller trade energy according to the game equilibrium points obtained in step S2.
The step S2 includes the following steps:
S2.1, setting the network parameters of the three-stage game model;
S2.2, initializing the weight parameters of the three-stage game model;
S2.3, respectively obtaining the state of the operator [Figure BDA0003290882330000021], the state of the retailer [Figure BDA0003290882330000022] and the state of the seller [Figure BDA0003290882330000023]. Each edge server in the blockchain trading platform uses a Markov decision process to, in order: take the operator utility U_o(η) as the reward function and select a suitable unit service price η that maximizes U_o(η); take the retailer utility U_r(p) as the reward function and select a suitable unit energy price p that maximizes U_r(p); and take each seller's utility [Figure BDA0003290882330000024] as the reward function and select a suitable energy demand q_j that maximizes the seller utility [Figure BDA0003290882330000025].
In step S2.3, the state of the operator [Figure BDA0003290882330000026] is expressed as:
[Figure BDA0003290882330000027]
where p_{t-1} represents the unit energy price at step t-1, and [Figure BDA0003290882330000028] represents the energy demand submitted by the seller to the local retailer through edge server j at step t-1;
the maximization of the operator utility U_o(η) is expressed as:
[Figure BDA0003290882330000031]
where U_m represents the additional reward that the operator obtains for the trusted blockchain service it provides in each energy transaction, φ represents the transmission loss rate, c_t represents the unit transmission cost, C_o represents a fixed operation and maintenance cost, η_min represents the lowest unit service price, η_max represents the highest unit service price, q_j represents the energy demand submitted by the seller to the local retailer through edge server j, and N represents the set of all edge servers in the blockchain trading platform.
The additional reward U_m is calculated as:
U_m = (R_f + r s) λ;
where R_f represents a fixed block reward, r represents the blockchain service fee paid by the seller to the operator in each energy transaction, s represents a block parameter, and λ represents a probability factor in the blockchain.
The state of the retailer [Figure BDA0003290882330000032] is expressed as:
[Figure BDA0003290882330000033]
where η_t represents the unit service price at step t, and [Figure BDA0003290882330000034] represents the energy demand submitted by the seller to the local retailer through edge server j at step t-1;
the maximization of the retailer utility U_r(p) is expressed as:
[Figure BDA0003290882330000035]
where C_g represents the production cost incurred by the retailer in producing energy, C_s represents the storage cost incurred by the retailer in storing energy, p_min represents the lowest unit energy price, p_max represents the highest unit energy price, q_j represents the energy demand submitted by the seller to the local retailer through edge server j, and N represents the set of all edge servers in the blockchain trading platform.
The production cost C_g is calculated as:
[Figure BDA0003290882330000036]
where a, b and k are weighting factors of the retailer's power generation cost, and φ represents the transmission loss rate.
The storage cost C_s is calculated as:
[Figure BDA0003290882330000041]
where c_s represents the unit cost of the retailer's stored energy, ξ_c represents the charging efficiency of the energy storage device, and ξ_d represents the discharging efficiency of the energy storage device.
The state of the seller [Figure BDA0003290882330000042] is expressed as:
[Figure BDA0003290882330000043]
the maximization of the seller utility [Figure BDA0003290882330000044] is expressed as:
[Figure BDA0003290882330000045]
where δ represents a conversion factor, w_j represents the usage scenario of edge server j in terms of energy utilization, q_min represents the minimum energy demand, q_max represents the maximum energy demand, q_j represents the energy demand submitted by the seller to the local retailer through edge server j, and r represents the blockchain service fee paid by the seller to the operator in each energy transaction.
An energy internet transaction system based on reinforcement learning block chain energizing comprises an energy application layer, an energy data layer and an edge control layer, wherein the energy application layer interacts with the edge control layer through a smart contract interface, and the energy data layer interacts with the edge control layer; the energy application layer comprises retailers and sellers, who interact through a blockchain transaction platform; the edge control layer comprises edge servers and distributed SDN controllers maintained by an operator, and each edge server serves as a node in the blockchain trading platform; the energy data layer comprises a switch and an energy router, wherein the switch is connected with the distributed SDN controller and is used for receiving a scheduling instruction sent by the distributed SDN controller and forwarding it to the corresponding energy router; the energy router is used for sensing the state of the energy line and reporting that state to the edge server.
The smart contract interface is established based on a smart contract system, which comprises a user registration module, an energy transaction module, an energy transmission module, an energy recording module and an information query module. After the three parties (operators, retailers and sellers) register their respective accounts through the user registration module, the sellers place orders through the energy transaction module according to their own needs, and the energy transmission module makes energy flow from the retailers to the sellers; the energy recording module is used for recording the electricity information of retailers and sellers, and the information query module allows each party to query its own account information.
The invention has the beneficial effects that:
Compared with popular deep reinforcement learning algorithms, the invention can help operators and retailers achieve higher utility while sellers also obtain better utility. Under the unified pricing mechanism, the convergence order of the different entities is consistent with the order of actions in the three stages of the Stackelberg game, so that the leader in the game is more likely to obtain better benefits than the followers.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the transaction system of the present invention.
FIG. 2 is a schematic diagram of the three-stage Stackelberg game model.
FIG. 3 is a block diagram of the hierarchical policy gradient algorithm.
FIG. 4 is a structural diagram of the smart contract system.
FIG. 5 shows the game convergence performance under HDPG.
FIG. 6 is a performance comparison of different algorithms.
FIG. 7 is a graph of utility under different parameters.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Example 1: an energy internet transaction method based on reinforcement learning block chain energizing comprises the following steps:
S1, constructing a three-stage game model among the operator, the retailer and the seller based on the energy transaction relationship among the operator, the retailer and the seller on the blockchain transaction platform;
As shown in FIG. 2 and FIG. 3, the three-stage game model includes three policy networks, which correspond to the operator, the retailer and the seller, respectively. The seller submits its energy demand to the operator through the blockchain transaction platform, and the operator, as an intermediary, facilitates the energy transaction and transmission between the retailer and the seller through the blockchain transaction platform. The blockchain trading platform is composed of a plurality of edge servers; each edge server serves as a node in the blockchain trading platform and takes on the functions of accounting, broadcasting, verification and consensus. The set of edge servers is denoted N = {1, 2, ..., j, ..., N'}.
S2, solving the game equilibrium points of the three-stage game model by using a distributed hierarchical policy gradient (HDPG) algorithm;
Considering a Stackelberg game under incomplete information, the game equilibrium points of the three-stage game model are solved by means of a Markov decision process, where one Markov decision process corresponds to the whole process of one energy transaction and transmission. The game equilibrium points comprise the optimal unit service price, the optimal unit energy price and the optimal energy demand, which maximize the operator utility U_o(η), the retailer utility U_r(p) and the seller utility [Figure BDA0003290882330000061], where η represents the unit service price, p represents the unit energy price, q_j represents the energy demand submitted by the seller to the local retailer through edge server j, and j ∈ N.
The step S2 includes the following steps:
S2.1, setting the network parameters of the three policy networks;
The network parameters include the learning rate α_r of the retailer policy network, the learning rate α_p of the seller policy network, the learning rate α_o of the operator policy network, and the discount factor γ.
S2.2, randomly initializing the weight parameters [Figure BDA0003290882330000062] of the three policy networks;
S2.3, respectively obtaining the state of the operator [Figure BDA0003290882330000063], the state of the retailer [Figure BDA0003290882330000064] and the state of the seller [Figure BDA0003290882330000065]. Each edge server uses a Markov decision process to, in order: take the operator utility U_o(η) as the reward function and select a suitable unit service price η that maximizes U_o(η); take the retailer utility U_r(p) as the reward function and select a suitable unit energy price p that maximizes U_r(p); and take each seller's utility [Figure BDA0003290882330000071] as the reward function and select a suitable energy demand [Figure BDA0003290882330000072] that maximizes the seller utility [Figure BDA0003290882330000073]. Here η_t ∈ A_o, where A_o represents the action space of the operator; p_t ∈ A_r, where A_r represents the action space of the retailer; and [Figure BDA0003290882330000074], where A_p represents the action space of the seller;
The state of the operator [Figure BDA0003290882330000075] is expressed as:
[Figure BDA0003290882330000076]
where p_{t-1} represents the unit energy price at step t-1, and [Figure BDA0003290882330000077] represents the energy demand submitted by the seller to the local retailer through edge server j at step t-1.
The maximization of the operator utility U_o(η) is expressed as:
[Figure BDA0003290882330000078]
where U_m represents the additional reward that the operator obtains for the trusted blockchain service it provides in each energy transaction, φ represents the transmission loss rate, c_t represents the unit transmission cost, C_o represents a fixed operation and maintenance cost, η_min represents the lowest unit service price, and η_max represents the highest unit service price.
The additional reward U_m is calculated as:
U_m = (R_f + r s) λ;
where R_f represents a fixed block reward, r represents the blockchain service fee paid by the seller to the operator in each energy transaction, s represents a block parameter, and λ represents a probability factor in the blockchain. The additional reward U_m further incentivizes operators to maintain the blockchain.
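As a quick check of the arithmetic in the additional-reward formula, the following minimal Python sketch evaluates U_m = (R_f + r s) λ. It reads "rs" as the product of the service fee r and the block parameter s; the function name, argument names and numeric values are illustrative assumptions and do not come from the patent.

```python
def additional_reward(fixed_block_reward, service_fee, block_param, prob_factor):
    """Evaluate U_m = (R_f + r * s) * lambda as stated in the text."""
    return (fixed_block_reward + service_fee * block_param) * prob_factor


# Hypothetical values, chosen only to make the arithmetic easy to follow.
u_m = additional_reward(
    fixed_block_reward=2.0,  # R_f
    service_fee=0.5,         # r
    block_param=4.0,         # s
    prob_factor=0.25,        # lambda
)
print(u_m)  # (2.0 + 0.5 * 4.0) * 0.25 = 1.0
```

With these values the fixed block reward and the fee-dependent term contribute equally, so raising either r or s increases the operator's incentive linearly.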
The state of the retailer [Figure BDA0003290882330000079] is expressed as:
[Figure BDA00032908823300000710]
The retailer utility U_r(p) is determined by the unit energy price p and the total energy demand; the maximization of the retailer utility U_r(p) is expressed as:
[Figure BDA00032908823300000711]
where C_g represents the production cost incurred by the retailer in producing energy, C_s represents the storage cost incurred by the retailer in storing energy, p_min represents the lowest unit energy price, and p_max represents the highest unit energy price.
The production cost C_g is calculated as:
[Figure BDA0003290882330000081]
where a, b and k are weighting factors of the power generation cost.
The storage cost C_s is calculated as:
[Figure BDA0003290882330000082]
where c_s represents the unit cost of the retailer's stored energy, ξ_c represents the charging efficiency of the energy storage device, ξ_d represents the discharging efficiency of the energy storage device, and [Figure BDA0003290882330000083] represents the energy actually produced and stored by the retailer, taking into account the energy loss during transmission.
The state of the seller [Figure BDA0003290882330000084] is expressed as:
[Figure BDA0003290882330000085]
The maximization of the seller utility [Figure BDA0003290882330000086] is expressed as:
[Figure BDA0003290882330000087]
where δ represents a conversion factor, w_j represents the usage scenario of edge server j in terms of energy utilization, q_min represents the minimum energy demand, q_max represents the maximum energy demand, p q_j represents the payment made by the seller to the retailer for energy, and [Figure BDA0003290882330000088] represents the benefit obtained by edge server j in actual production from the energy it purchased.
The specific flow of the Markov decision process is prior art and is not described in detail here. In addition, the time complexities of the outer loop (the total number of iterations of the Markov decision process), the middle loop (the number of iteration steps of each policy network) and the inner loop (the number of sellers in the seller policy network) are O(E), O(T) and O(N), respectively. Each policy network comprises two fully connected layers, and the time complexity of the fully connected layers is expressed as [Figure BDA0003290882330000089], where K_l is the number of neural units in fully connected layer l and L is the number of layers of a policy network. Because each policy network contains two fully connected layers to generate policies, the overall time complexity of the algorithm is O(ETN(T_f)).
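The three nested loops behind the O(E), O(T) and O(N) terms can be sketched as follows. This is only a structural illustration under stated assumptions: uniform random draws stand in for the three policy networks, and the action bounds, the function name `run_hierarchy` and all numeric values are hypothetical rather than taken from the patent.

```python
import random

# Hypothetical bounds for the three action spaces (not from the source).
ETA_MIN, ETA_MAX = 0.1, 1.0   # unit service price, chosen by the operator
P_MIN, P_MAX = 1.0, 5.0       # unit energy price, chosen by the retailer
Q_MIN, Q_MAX = 0.0, 10.0      # energy demand, chosen per edge server


def run_hierarchy(episodes, steps, num_servers, seed=0):
    """Walk the E x T x N loop nest and count seller decisions."""
    rng = random.Random(seed)
    seller_decisions = 0
    last_round = None
    for _ in range(episodes):                    # outer loop: O(E)
        for _ in range(steps):                   # middle loop: O(T)
            # Leader moves first, then the retailer, then the sellers.
            eta = rng.uniform(ETA_MIN, ETA_MAX)  # placeholder operator policy
            p = rng.uniform(P_MIN, P_MAX)        # placeholder retailer policy
            demands = []
            for _ in range(num_servers):         # inner loop: O(N)
                demands.append(rng.uniform(Q_MIN, Q_MAX))
                seller_decisions += 1
            last_round = (eta, p, demands)
    return seller_decisions, last_round


count, (eta, p, demands) = run_hierarchy(episodes=3, steps=4, num_servers=10)
print(count)  # 3 * 4 * 10 = 120 seller decisions
```

The count grows as E·T·N, which matches the loop part of the stated complexity; the per-decision cost of the fully connected layers would multiply this figure.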
The operator policy network updates its weight parameter θ_o according to:
[Figure BDA0003290882330000091]
where [Figure BDA0003290882330000092] denotes the policy of the operator at step t, [Figure BDA0003290882330000093] denotes the reward of the operator at step t, [Figure BDA0003290882330000094] denotes the state of the operator at step t with [Figure BDA0003290882330000095], S_o denotes the state space of the operator, and α_o denotes the learning rate of the operator policy network.
The retailer policy network updates its weight parameter θ_r according to:
[Figure BDA0003290882330000096]
where [Figure BDA0003290882330000097] denotes the policy of the retailer at step t, [Figure BDA0003290882330000098] denotes the reward of the retailer at step t, [Figure BDA0003290882330000099] denotes the state of the retailer at step t with [Figure BDA00032908823300000910], S_r denotes the state space of the retailer, and α_r denotes the learning rate of the retailer policy network.
The seller policy network updates its weight parameter [Figure BDA00032908823300000911] according to:
[Figure BDA00032908823300000912]
where [Figure BDA00032908823300000913] denotes the policy of the seller at step t, [Figure BDA00032908823300000914] denotes the reward of the seller at step t, [Figure BDA00032908823300000915] denotes the state of the seller at step t with [Figure BDA00032908823300000916], S_p denotes the state space of the seller, and [Figure BDA00032908823300000917] denotes the learning rate of the seller policy network.
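The exact update formulas are only available as figures, so the following Python sketch shows a generic REINFORCE-style policy-gradient step of the form θ ← θ + α · r · ∇_θ log π(a | s), which may differ from the patent's formula. It assumes a linear-Gaussian policy with a fixed standard deviation, a single-step reward (the discount factor γ is not used), and clipping of the sampled action into the allowed price range; all names and numbers are hypothetical.

```python
import random


def policy_mean(theta, state):
    """Mean of the Gaussian policy: a linear function of the state."""
    return sum(w * x for w, x in zip(theta, state))


def sample_action(theta, state, sigma, low, high, rng):
    """Sample a raw action, and also return it clipped to [low, high]."""
    raw = rng.gauss(policy_mean(theta, state), sigma)
    return raw, min(max(raw, low), high)


def reinforce_step(theta, state, raw_action, reward, alpha, sigma):
    """One step of theta <- theta + alpha * reward * grad log pi(a | s)."""
    # For a Gaussian with mean theta . state, the gradient of log pi with
    # respect to theta is ((a - mean) / sigma^2) * state.
    score = (raw_action - policy_mean(theta, state)) / sigma ** 2
    return [w + alpha * reward * score * x for w, x in zip(theta, state)]


rng = random.Random(0)
theta_o = [0.0, 0.0]   # operator weights (hypothetical two-feature state)
state_o = [1.0, 2.0]   # e.g. previous energy price and a demand summary
raw, eta = sample_action(theta_o, state_o, sigma=0.5, low=0.1, high=1.0, rng=rng)
theta_o = reinforce_step(theta_o, state_o, raw, reward=1.0, alpha=0.01, sigma=0.5)
```

The same two functions could be reused for the retailer and the sellers with their own weights, states, bounds and learning rates (α_r, α_p), which is one simple way to keep the three policy networks independent of one another's private parameters.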
In this embodiment, the sellers are distributed energy consumers who cannot produce energy themselves or whose own production cannot cover their consumption; they can purchase energy from public energy companies or BSDEI retailers according to their energy demand and the unit energy price.
The retailers are energy users who operate distributed power generation and energy storage equipment and whose total power generation exceeds their total energy consumption. Retailers obtain revenue by providing energy to various distributed applications. On the other hand, they bear the costs of distributed power generation and energy storage, and they pay the operator for transport routing services.
The operator is the intermediary between the retailer and the seller and assists in completing the energy transaction process. In order to provide more convenient service and lower delay, the operator deploys hardware devices such as edge servers and distributed SDN controllers at the edge control layer, so as to implement edge-edge coordination between the devices. In return, it charges the retailer for transport routing services and the seller for trusted blockchain services.
Example 2: an energy internet transaction system based on reinforcement learning block chain energizing is disclosed, as shown in fig. 1, the system comprises an energy application layer, an energy data layer and an edge control layer, which together form an energy transaction service system of a distributed energy market, the three layers are mutually independent and associated, and energy routing and scheduling control in a block chain energizing energy internet (BEI) are decoupled; the energy application layer is interacted with the edge control layer through an intelligent contract interface, and the energy data layer is interacted with the edge control layer through a standard interface OpenFlow; the energy application layer comprises retailers and production and marketing persons, the retailers and the production and marketing persons directly carry out information interaction through a blockchain trading platform, the retailers and the production and marketing persons can be stimulated to participate in the distributed energy market more actively through the direct trading process, and the blockchain trading platform provides a reliable and stable third-party service platform for energy trading in the energy application layer; the edge control layer comprises an edge server maintained by an operator and a distributed SDN controller for scheduling and controlling energy routing of an energy data layer; each edge server is used as a node in a blockchain transaction platform and takes charge of functions of accounting, broadcasting, verification and consensus, and the set of edge servers is represented as N ═ 1, 2.., j., N'; the intelligent contract provides reliable automatic process control for energy trading in the blockchain trading platform; the energy data layer comprises a switch and an energy router, the switch is used for receiving a scheduling instruction sent by a distributed SDN controller of the edge control layer and forwarding the scheduling 
instruction to the corresponding energy router, and the energy router is used for sensing the state of an energy line and reflecting the real-time state of the energy line to an edge server, so that the distributed SDN controller is facilitated to modify the scheduling instruction; the state of the energy line comprises the electric energy value, the voltage, the current and the like on the energy line. Furthermore, the energy router may also receive a command from the distributed SDN controller to change a state of the energy router.
As shown in fig. 4, the intelligent contract interface is established based on an intelligent contract system, the intelligent contract system includes a user registration module, an energy transaction module, an energy transmission module, an energy recording module and an information query module, after operators, sellers and three-party participants of retailers register their respective accounts through the user registration module, the sellers place orders through the energy transaction module according to their own needs, and the energy transaction realizes energy flow from the retailers to the sellers through the energy transmission module; the energy recording module is used for recording respective electric quantity information of retailers and producers, and the information inquiry module is used for allowing each party to inquire the account information of each party.
Specifically, in the user registration module, the smart contract deployer is the initial administrator of the trading system and initializes the common parameters. Each participant registers an account through the user registration module with its user name, account address and account type. The account information, including available energy, energy currency, total energy generated and total energy used, is then initialized. To address the cold start problem, a participant may obtain energy currency through the energy transaction module. The prosumer (producer and seller) then confirms the exact amount of energy required and places an order, and the transaction is completed through the energy transaction module, with the whole energy transaction carried out by the method described in embodiment 1. The energy transaction module verifies the authority of the account and checks that the energy balance is sufficient. It then calls the energy transmission module, which not only clears the transaction between the retailer and the operator but also activates an energy scheduling switch, ensuring that energy flows from the retailer to the prosumer. The energy recording module updates the total generated energy and the total used electricity according to the smart meter data at the locations of retailers and prosumers, and the cost is then calculated from the electricity quantity. The information query module provides six types of interfaces for each party to query its own account information.
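The module flow described above can be sketched as a plain in-memory Python class. This is a hedged illustration, not on-chain contract code: the class name, method names, field names and the way payment is split between retailer and operator are all assumptions, and the six query interfaces are collapsed into a single `query` method because the text does not enumerate them.

```python
class EnergyContract:
    """Toy stand-in for the five contract modules (illustrative only)."""

    ACCOUNT_TYPES = {"operator", "retailer", "prosumer"}

    def __init__(self, admin_address):
        self.admin = admin_address  # the deployer is the initial administrator
        self.accounts = {}

    # User registration module
    def register(self, name, address, account_type):
        if account_type not in self.ACCOUNT_TYPES:
            raise ValueError(f"unknown account type: {account_type}")
        if address in self.accounts:
            raise ValueError(f"address already registered: {address}")
        self.accounts[address] = {
            "name": name, "type": account_type,
            "energy": 0.0, "currency": 0.0, "generated": 0.0, "used": 0.0,
        }

    # Energy transaction module
    def buy_currency(self, address, amount):
        # Cold start: a participant obtains energy currency before trading.
        self.accounts[address]["currency"] += amount

    def place_order(self, prosumer, retailer, operator, quantity,
                    energy_price, service_price):
        buyer, seller, op = (self.accounts[a] for a in (prosumer, retailer, operator))
        roles = (buyer["type"], seller["type"], op["type"])
        if roles != ("prosumer", "retailer", "operator"):
            raise PermissionError("account types do not match their roles")
        if buyer["currency"] < quantity * energy_price:
            raise ValueError("insufficient energy currency")
        if seller["energy"] < quantity:
            raise ValueError("insufficient energy")
        self._transmit(buyer, seller, op, quantity, energy_price, service_price)

    # Energy transmission module
    def _transmit(self, buyer, seller, op, quantity, energy_price, service_price):
        # Assumed settlement: the prosumer pays the retailer, and the
        # retailer passes the unit service price on to the operator.
        buyer["currency"] -= quantity * energy_price
        seller["currency"] += quantity * (energy_price - service_price)
        op["currency"] += quantity * service_price
        seller["energy"] -= quantity
        buyer["energy"] += quantity

    # Energy recording module
    def record_meter(self, address, generated=0.0, used=0.0):
        account = self.accounts[address]
        account["generated"] += generated
        account["used"] += used
        account["energy"] += generated - used

    # Information query module
    def query(self, address):
        return dict(self.accounts[address])


contract = EnergyContract("0xadmin")
contract.register("grid-op", "0xop", "operator")
contract.register("retail-1", "0xret", "retailer")
contract.register("edge-1", "0xpro", "prosumer")
contract.record_meter("0xret", generated=10.0)
contract.buy_currency("0xpro", 20.0)
contract.place_order("0xpro", "0xret", "0xop", quantity=4.0,
                     energy_price=2.0, service_price=0.5)
```

The order is rejected before any balance changes if the roles are wrong or either balance is insufficient, which mirrors the verification step that precedes the call to the transmission module.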
The performance of the invention is illustrated below with a setup of 1 operator, 1 retailer and 10 edge servers, looking at convergence under a uniform pricing mechanism. Because the optimal demands of the edge servers are very similar under the uniform pricing mechanism, only one edge server is selected for display in fig. 5. As fig. 5 shows, the operator and the retailer obtain quite good utility. Under the high-level policy of the operator and the middle-level policy of the retailer, the prosumers (producers and sellers), i.e., the consumers, also converge quickly to a relatively good solution. The convergence order of the different entities matches the order of actions in the three stages of the Stackelberg game, so the leader is more likely to obtain a better benefit than the followers.
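The leader-follower ordering of the three stages can be illustrated by backward induction on a toy game. The sketch below is not the invention's algorithm, which uses a hierarchical policy gradient method; it swaps in a plain grid search, and the utility forms (a logarithmic prosumer benefit, a linear retailer margin and a linear operator revenue) and all numeric values are hypothetical assumptions chosen only to make the example solvable.

```python
from dataclasses import dataclass


@dataclass
class Market:
    weights: list            # w_j per edge server (hypothetical values)
    delta: float = 1.0       # conversion factor (assumed value)
    q_min: float = 0.0
    q_max: float = 10.0
    unit_cost: float = 0.2   # assumed retailer cost per unit of energy
    fixed_cost: float = 1.0  # assumed fixed operator cost


def grid(lo, hi, n=41):
    """Evenly spaced candidate prices between lo and hi."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def best_demand(m, w_j, p):
    """Stage 3: maximizer of delta*w_j*ln(1+q) - p*q, clipped to [q_min, q_max]."""
    return min(max(m.delta * w_j / p - 1.0, m.q_min), m.q_max)


def total_demand(m, p):
    return sum(best_demand(m, w, p) for w in m.weights)


def retailer_utility(m, p, eta):
    return (p - eta - m.unit_cost) * total_demand(m, p)


def best_price(m, eta, prices):
    """Stage 2: the retailer anticipates the prosumers' best responses."""
    return max(prices, key=lambda p: retailer_utility(m, p, eta))


def operator_utility(m, eta, prices):
    return eta * total_demand(m, best_price(m, eta, prices)) - m.fixed_cost


def solve(m, etas, prices):
    """Stage 1: the operator anticipates both lower stages."""
    eta = max(etas, key=lambda e: operator_utility(m, e, prices))
    p = best_price(m, eta, prices)
    return eta, p, [best_demand(m, w, p) for w in m.weights]


market = Market(weights=[2.0, 3.0, 4.0])
eta, p, demands = solve(market, grid(0.1, 1.0), grid(0.5, 3.0))
```

Because each leader optimizes against the anticipated reaction of the stages below it, the operator's choice is made first and constrains everything after it, which is the ordering the convergence curves are said to follow.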
To demonstrate the performance of the present invention, fig. 6 compares it with several popular deep reinforcement learning algorithms from the perspective of economic analysis. To reduce random error, the data in the figure are the averages of 10 experimental runs. As is apparent from fig. 6a, HDPG obtains a higher total reward than the three deep reinforcement learning algorithms compared: PPO (proximal policy optimization), SAC (soft actor-critic) and DQN (deep Q-learning). Each algorithm has its own characteristics; SAC, for example, helps the retailer obtain the highest utility but underperforms on the operator's strategy. In contrast, HDPG helps both the operator and the retailer achieve higher utility, and the edge devices also achieve better utility with HDPG. The hierarchical design of the multi-agent actions and learning process helps each agent learn its own strategy in response to its competitors' strategies, which is a likely reason for the better performance of HDPG.
As shown in fig. 7, the parameter sensitivity of the utility was analyzed as a function of the number of prosumers (producers and sellers). Since the utility of an edge server is not very sensitive to the number of participants, a boxplot is used to describe in detail how the energy usage of an edge server affects its utility. As fig. 7a shows, it is advisable to have edge servers with high-value production participate in the energy market. As can be seen from fig. 7b and 7c, the transmission loss rate largely determines the utility of the operator and the retailer, so adopting more advanced techniques to reduce the transmission loss rate in energy transmission and distribution networks is of great significance. As the number of edge servers increases, the utility of the operator and the retailer rises steadily, while the utility of each edge server shows a slight downward trend. Note that an increase in the number of edge servers may affect the policy of each edge server, causing some fluctuation in the utility of all entities. Fig. 7d shows the trend of the retailer's utility for different energy storage efficiencies. Low energy storage efficiency may result in negative utility, whereas the higher the efficiency of the energy storage device, the better the retailer's utility. However, the cost and difficulty of reducing the transmission loss rate and improving the energy storage efficiency remain obstacles in the energy trading market.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. An energy internet transaction method based on reinforcement learning block chain energizing is characterized by comprising the following steps:
s1, constructing a three-stage game model among the operator, the retailer and the seller based on the energy transaction relationship among the operator, the retailer and the seller on the blockchain transaction platform;
s2, solving game equilibrium points in the three-stage game model by using a distributed hierarchical strategy gradient algorithm, wherein the game equilibrium points comprise the optimal unit service price, the optimal unit energy price and the optimal energy requirement;
and S3, the operator, the retailer and the seller trade the energy according to the game equilibrium points obtained in the step S2.
2. The reinforcement learning blockchain-based energy internet transaction method of claim 1, wherein the step S2 comprises the steps of:
s2.1, setting network parameters of the three-stage game model;
s2.2, initializing the weight parameters of the three-stage game model;
s2.3, respectively obtaining the state of the operator (Figure FDA0003290882320000011), the state of the retailer (Figure FDA0003290882320000012) and the state of the seller (Figure FDA0003290882320000013); each edge server in the blockchain trading platform, using a Markov decision process, in turn takes the operator utility U_o(η) as the reward function and selects a suitable unit service price η to maximize the operator utility U_o(η), takes the retailer utility U_r(p) as the reward function and selects a suitable unit energy price p to maximize the retailer utility U_r(p), and takes each seller's utility (Figure FDA0003290882320000014) as the reward function and selects a suitable energy demand q_j to maximize the seller's utility (Figure FDA0003290882320000015).
3. The reinforcement learning blockchain-based energy internet transaction method of claim 2, wherein in step S2.3, the expression of the state of the operator (Figure FDA0003290882320000016) is:
Figure FDA0003290882320000017
where p_{t-1} represents the unit energy price at step t-1, and (Figure FDA0003290882320000018) represents the energy demand submitted by the seller to the local retailer through edge server j at step t-1;
the expression for maximizing the operator utility U_o(η) is:
Figure FDA0003290882320000019
where U_m represents the additional reward that the operator obtains through the trusted blockchain service provided for each energy transaction, φ represents the transmission loss rate, c_t represents the unit transmission cost, C_o represents the fixed operation and maintenance cost, η_min represents the lowest unit service price, η_max represents the highest unit service price, q_j represents the energy demand submitted by the seller to the local retailer through edge server j, and N represents the set of all edge servers in the blockchain trading platform.
4. The reinforcement learning blockchain-based energy internet transaction method of claim 3, wherein the additional reward U_m is calculated as:
U_m = (R_f + r·s)·λ;
where R_f represents a fixed block reward, r represents the blockchain service fee paid to the operator by the seller for each energy transaction, s represents a block parameter, and λ represents a probability factor in the blockchain.
5. The reinforcement learning blockchain-based energy internet transaction method of claim 2, wherein the expression of the state of the retailer (Figure FDA0003290882320000021) is:
Figure FDA0003290882320000022
where η_t represents the unit service price at step t, and (Figure FDA0003290882320000023) represents the energy demand submitted by the seller to the local retailer through edge server j at step t-1;
the expression for maximizing the retailer utility U_r(p) is:
Figure FDA0003290882320000024
where C_g represents the production cost incurred by the retailer in producing energy, C_s represents the storage cost incurred by the retailer in storing energy, p_min represents the lowest unit energy price, p_max represents the highest unit energy price, q_j represents the energy demand submitted by the seller to the local retailer through edge server j, and N represents the set of all edge servers in the blockchain trading platform.
6. The reinforcement learning blockchain-based energy internet transaction method of claim 5, wherein the production cost C_g is calculated as:
Figure FDA0003290882320000025
where a, b and k are weighting factors of the power generation cost when the retailer produces energy, and φ represents the transmission loss rate.
7. The reinforcement learning blockchain-based energy internet transaction method of claim 5, wherein the storage cost C_s is calculated as:
Figure FDA0003290882320000031
where c_s represents the unit cost of energy storage for the retailer, ξ_c represents the charging efficiency of the energy storage device, and ξ_d represents the discharge efficiency of the energy storage device.
8. The reinforcement learning blockchain-based energy internet transaction method of claim 2, wherein the expression of the state of the seller (Figure FDA0003290882320000032) is:
Figure FDA0003290882320000033
the expression for maximizing the seller's utility (Figure FDA0003290882320000034) is:
Figure FDA0003290882320000035
where δ represents a conversion factor, w_j represents the usage of edge server j in terms of energy utilization, q_min represents the minimum energy demand, q_max represents the maximum energy demand, q_j represents the energy demand submitted by the seller to the local retailer through edge server j, and r represents the blockchain service fee paid to the operator by the seller for each energy transaction.
9. An energy internet transaction system based on reinforcement learning block chain energizing, characterized by comprising an energy application layer, an energy data layer and an edge control layer, wherein the energy application layer interacts with the edge control layer through a smart contract (intelligent contract) interface, and the energy data layer interacts with the edge control layer; the energy application layer comprises a retailer and a seller, and the retailer and the seller interact through a blockchain transaction platform; the edge control layer comprises edge servers maintained by an operator and distributed SDN controllers, and each edge server is used as a node in the blockchain trading platform; the energy data layer comprises a switch and an energy router, wherein the switch is connected with the distributed SDN controller and is used for receiving a scheduling instruction sent by the distributed SDN controller and forwarding the scheduling instruction to the corresponding energy router; the energy router is used for sensing the state of the energy line and reporting the state of the energy line to the edge server.
10. The energy internet transaction system based on reinforcement learning blockchain energizing according to claim 9, wherein the smart contract (intelligent contract) interface is established based on a smart contract system, the smart contract system comprises a user registration module, an energy transaction module, an energy transmission module, an energy recording module and an information query module; after the three kinds of participants, namely an operator, a seller and a retailer, register their respective accounts through the user registration module, the seller places an order through the energy transaction module according to its own needs, and the energy transmission module carries out the energy flow from the retailer to the seller for the transaction; the energy recording module is used for recording the respective electricity quantity information of retailers and sellers, and the information query module is used for allowing each party to query its own account information.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111164320.7A CN113888327A (en) 2021-09-30 2021-09-30 Energy internet transaction method and system based on reinforcement learning block chain energizing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111164320.7A CN113888327A (en) 2021-09-30 2021-09-30 Energy internet transaction method and system based on reinforcement learning block chain energizing

Publications (1)

Publication Number Publication Date
CN113888327A true CN113888327A (en) 2022-01-04

Family

ID=79004931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111164320.7A Pending CN113888327A (en) 2021-09-30 2021-09-30 Energy internet transaction method and system based on reinforcement learning block chain energizing

Country Status (1)

Country Link
CN (1) CN113888327A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination