CN113132232B - Energy route optimization method

Info

Publication number: CN113132232B (grant); CN113132232A (application publication)
Application number: CN202110261579.7A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: energy, network, router, source, reinforcement learning
Legal status: Active
Inventors: 郭盛 (Guo Sheng), 曹军威 (Cao Junwei)
Original and current assignee: Tsinghua University
Application filed by Tsinghua University; priority to CN202110261579.7A; application granted and published as CN113132232B.

Classifications

    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/14: Routing performance; Theoretical aspects
    • G06N 3/02: Neural networks
    • G06N 3/045: Combinations of networks
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N 3/08: Learning methods
    • Y04S 10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The embodiment of the application discloses an energy route optimization method, which comprises the following steps: acquiring a network structure formed by the interconnection of N energy routers in an energy internet, together with a state information historical data set and device parameters of each energy router; constructing and training a deep reinforcement learning model composed of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure; and inputting the real-time state information and device parameters of each energy router in the energy internet, together with a newly added load in the energy internet, into the deep reinforcement learning model to obtain the optimal power supply energy router and energy routing path. The energy transfer process between the multiple energy routers in the energy internet is learned by combining a graph convolutional neural network with deep reinforcement learning, so that the energy transmission route can be optimized quickly and accurately from the real-time data of the energy routers, reducing energy transmission losses and improving power supply efficiency and reliability.

Description

Energy route optimization method
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an energy route optimization method.
Background
The energy internet has become the development direction of today's electric energy systems thanks to its capacity to absorb distributed energy resources such as photovoltaic and wind power and to share energy information; in the energy internet, energy suppliers and energy consumers can exchange energy on an equal and free basis. As the core device for energy and information exchange in the energy internet, an energy router needs to provide reasonable power supply node selection and optimal energy routing according to the energy consumption demand, so as to meet users' demands and reduce losses in energy transmission.
Currently, the optimization of energy router power supply nodes and energy routes mainly adopts two approaches: routing tables and dynamic optimization algorithms.
A routing table requires each energy router to store the information of all energy routers and lines, so that when a newly added load arrives at an energy router, the next node of the energy transfer is determined by querying the routing table. However, in an actual energy internet, the state information of each energy router changes dynamically with renewable generation and load fluctuations, so the routing table must be updated frequently, which increases the transmission load of the information flow in the energy internet; meanwhile, a routing table only reflects line information and cannot account for factors such as losses in energy transmission. A dynamic optimization algorithm can take transmission losses into account when optimizing the power supply nodes and energy routes of the energy routers, but once more factors such as power quality, supply price and voltage stability are added, its solution becomes very complicated, and neither the real-time performance nor the accuracy of the result can be guaranteed.
Disclosure of Invention
Because the existing method has the above problems, the embodiments of the present application provide an energy route optimization method.
Specifically, the embodiment of the present application provides the following technical solutions:
the embodiment of the application provides an energy route optimization method, which comprises the following steps:
acquiring a network structure formed by interconnection of N energy routers in an energy internet, and a state information historical data set and equipment parameters of each energy router;
constructing a reinforcement learning environment of the energy Internet according to the operation principle of the energy Internet and the energy router, and constructing a deep reinforcement learning model consisting of a graph convolution neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure; the reinforcement learning Q network, the actor network and the critic network all adopt fully-connected neural network structures;
according to the state information historical data set and the device parameters of each energy router in the energy internet, and by integrating the available power supply capacity, power supply quality, reliability and price of each energy router, as well as the number of routers a newly added load must traverse to reach the energy router and the length of the transmission line, constructing an evaluation Q function of the power supply energy router, and constructing a reward R function of each energy transfer for the energy transfer between every two energy routers; the evaluation Q function is used for calculating the reward value of the power supply energy router; the reward R function is used for calculating the reward value obtained by executing each energy transfer action;
initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; simulating the situation of a newly added load by adding a load at a random energy router, taking the state information historical data set and device parameters of each energy router in the energy internet as input samples, and taking the evaluation Q function, the energy routing path and the reward R function of each energy router as output samples, so as to train the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network;
for the requirement of new loads in the energy Internet, inputting the real-time state information and equipment parameters of each energy router in the energy Internet and the new loads into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path; the optimal power supply energy router is the energy router with the maximum Q function value, and the optimal energy routing path is output by the actor network.
Optionally, the state information history data set of the energy router includes: load power in the microgrid connected to the energy router, power of the power generation equipment and microgrid voltage.
Optionally, the device parameters of the energy router include: energy transfer efficiency, maximum power generation capacity, and maximum load capacity.
Optionally, the structure of the graph convolutional neural network includes: the graph adjacency matrix, the number of graph convolution layers, and the number of convolution kernels of each graph convolution layer.
Optionally, the structures of the reinforcement learning Q network, the actor network and the critic network include the number of layers of the network, the type of each layer, and the input/output sizes.
Optionally, the evaluation Q function is:
Q(n_load, n_source) = -α_1·l_line - α_2·c_router + α_3·r_source - α_4·p_source·ΔP + α_5·q_source;
wherein n_load denotes the energy router where the newly added load is located, n_source denotes the energy router supplying power to the energy router node where the newly added load is located, ΔP denotes the newly added load, l_line denotes the total length of the connecting lines between n_load and n_source, c_router denotes the number of routers between n_load and n_source, r_source denotes the power supply reliability of n_source, p_source denotes the power supply price per unit energy of n_source, q_source denotes the power supply quality of n_source, and α_1, α_2, α_3, α_4 and α_5 are the coefficients of the five evaluation terms, used to adjust their relative weights;
for the energy transfer between every two energy routers, given the energy transfer starting node n_start, the energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:
R(n_start, n_end) = -β_1·L_line - β_2·L_router - β_3·ΔU;
wherein L_line denotes the energy loss on the lines during the energy transfer, L_router denotes the energy loss in the energy routers during the transfer, ΔU is the absolute value of the voltage deviation caused by the transfer, and β_1, β_2 and β_3 are the coefficients of the three reward terms, used to adjust their relative weights.
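A minimal Python sketch of these two scoring functions follows. The equal default weights, the sign of each term, and the pairing of the unit price p_source with the new load ΔP are editorial assumptions made to resolve ambiguity in the published formula; only the list of terms and the role of the coefficients are fixed by the text above.

```python
def evaluate_supplier(l_line, c_router, r_source, p_source, q_source, delta_p,
                      alpha=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Evaluation Q(n_load, n_source): larger means a more suitable supplier."""
    a1, a2, a3, a4, a5 = alpha
    return (-a1 * l_line               # total line length between load and source
            - a2 * c_router            # number of routers on the path
            + a3 * r_source            # power supply reliability of the source
            - a4 * p_source * delta_p  # supply cost: unit price times new load (assumed pairing)
            + a5 * q_source)           # power supply quality of the source

def transfer_reward(loss_line, loss_router, delta_u, beta=(1.0, 1.0, 1.0)):
    """Reward R(n_start, n_end) for one energy transfer: all terms are penalties."""
    b1, b2, b3 = beta
    return -b1 * loss_line - b2 * loss_router - b3 * abs(delta_u)
```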
Optionally, the initialization of the graph convolutional neural network includes:
constructing the adjacency matrix of the graph according to the network structure of the energy internet, and determining the number of graph convolution layers and the number of convolution kernels of each graph convolution layer; the adjacency matrix A is constructed as:
A_ij = 1 if energy routers n_i and n_j are directly connected by a transmission line, and A_ij = 0 otherwise;
the initialization of the reinforcement learning Q network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is N-1, corresponding to the set of all energy routers other than the energy router concerned;
the initialization of the actor network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the index of the energy transfer target router;
the initialization of the critic network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
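As a concrete illustration of this initialization, here is a short Python sketch. The binary, symmetric adjacency rule reconstructs a formula that survives only as an image placeholder in the source, and the hidden width is an illustrative assumption; the three heads use the input/output sizes stated above.

```python
import numpy as np
import torch.nn as nn

def build_adjacency(n_routers, lines):
    """A[i, j] = 1 when routers i and j are joined by a transmission line."""
    A = np.zeros((n_routers, n_routers))
    for i, j in lines:
        A[i, j] = A[j, i] = 1.0          # undirected edge for each line
    return A

def make_heads(feat_dim, n_routers, hidden=64):
    # fully connected heads with the output sizes stated above
    q_net  = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                           nn.Linear(hidden, n_routers - 1))  # one Q_j per candidate supplier
    actor  = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                           nn.Linear(hidden, 1))              # index of the next-hop router
    critic = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                           nn.Linear(hidden, 1))              # evaluation of the transfer
    return q_net, actor, critic
```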
Optionally, the inputting of the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model to obtain the optimal power supply energy router and energy routing path includes:
inputting the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model, where the graph convolutional neural network performs feature extraction on the input to obtain the features F_j, j=1,...,N, of each node;
the reinforcement learning Q network takes the F_j of all nodes as input, obtains the evaluation value Q_j of each node as a supply node, and selects the node with the largest evaluation value as the energy supply node n_source;
the actor network takes F_j and n_source as input and outputs the route a of the next energy transfer;
the critic network takes F_j and n_source as input and gives an evaluation V(s) of the current energy internet state.
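A hedged end-to-end sketch of this inference flow follows. The tensor shapes, the flattening of the node features, and the appending of n_source as a scalar input are assumptions; likewise, the actor's single "target router" output is modeled here as an argmax over per-router logits so that the example is runnable.

```python
import torch
import torch.nn as nn

N, D = 8, 16                                   # routers, per-node feature size
gcn_features = torch.randn(N, D)               # stand-in for the GCN output F_j
flat = gcn_features.flatten()                  # concatenated node features

q_net  = nn.Linear(N * D, N)                   # one evaluation per router (the patent uses N-1)
actor  = nn.Linear(N * D + 1, N)               # logits over next-hop routers (assumed form)
critic = nn.Linear(N * D + 1, 1)               # scalar state value V(s)

q = q_net(flat)
n_source = int(torch.argmax(q))                # supply node with the largest Q_j
actor_in = torch.cat([flat, torch.tensor([float(n_source)])])
a = int(torch.argmax(actor(actor_in)))         # next router on the energy path
v = critic(actor_in).item()                    # evaluation of the current state
print(f"supply node {n_source}, next hop {a}, V(s) = {v:.3f}")
```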
Optionally, the forward propagation and backward propagation processes of the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network include:
for each state sample initial value s in the historical data s(t_i), i=1,...,M, of the M time nodes of the initial state data set, increasing a load ΔP at a random energy router node n_load and performing the following steps:
respectively sending the comprehensive information of the N energy routers into the N nodes of the graph convolutional neural network for feature extraction, obtaining the features F_j, j=1,...,N, of the N energy routers;
feeding F_j, j=1,...,N, into the Q network to obtain the Q value Q_j, j=1,...,N, of each energy router node, calculating the true Q value Q'_j, j=1,...,N, from the evaluation Q function of each power supply node, and, according to the mean square error loss function ε(θ_Q) = Σ_j (Q_j - Q'_j)², updating the Q network parameters θ_Q and the graph neural network parameters θ_G by gradient descent with learning rate η:
θ_Q ← θ_Q - η·∂ε(θ_Q)/∂θ_Q;
θ_G ← θ_G - η·∂ε(θ_Q)/∂θ_G;
selecting the node with the maximum Q value as the power supply node n_source, and then repeating the following steps to train the actor network and the critic network:
taking F_j, j=1,...,N, and the selected supply node n_source as input to the actor network π(·|s; θ_a) to determine the action a, and calculating the reward value R(s) of the action from the reward R function;
performing action a, transferring the energy to the next energy router node n_source', obtaining the energy internet state s' at the next moment, and sending s' into the graph convolutional neural network to obtain F_j', j=1,...,N;
substituting F_j, j=1,...,N, and F_j', j=1,...,N, into the critic network to calculate the estimates V(s) and V(s') of the value function, and calculating the TD error δ:
δ = R(s) + γV(s') - V(s);
updating the actor network parameters θ_a and the graph neural network parameters θ_G according to the TD error δ:
θ_a ← θ_a + η·δ·∇_θa log π(a|s; θ_a);
θ_G ← θ_G + η·δ·∇_θG log π(a|s; θ_a);
according to the mean square error loss function ε(θ_c) = (R(s) + γV(s') - V(s))², updating the critic network parameters θ_c and the graph neural network parameters θ_G:
θ_c ← θ_c - η·∂ε(θ_c)/∂θ_c;
θ_G ← θ_G - η·∂ε(θ_c)/∂θ_G;
taking n_source' as the new supply node n_source and F_j' as the new F_j, and starting the next cycle, until n_source is n_load;
when the preset maximum number of training rounds is reached, the training ends, and the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network are returned.
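The per-episode procedure above can be condensed into the following PyTorch sketch. The helper names env_step, true_q_values and reward (standing in for the reinforcement learning environment of step 102 and the Q and R functions of step 103), the single shared optimizer, and the discount factor are assumptions for illustration; the patent describes separate gradient steps for θ_Q, θ_a, θ_c and θ_G, and the actor is modeled here as logits over next-hop routers.

```python
import torch

GAMMA = 0.95  # illustrative discount factor

def train_episode(s, n_load, delta_p, gcn, q_net, actor, critic, opt,
                  env_step, true_q_values, reward):
    # 1) fit the Q head (and, through it, the GCN) to the evaluation Q function
    q_pred = q_net(gcn(s).flatten())
    q_true = true_q_values(s, n_load, delta_p)       # Q'_j from the evaluation Q function
    loss_q = ((q_pred - q_true) ** 2).sum()          # eps(theta_Q) = sum_j (Q_j - Q'_j)^2
    opt.zero_grad(); loss_q.backward(); opt.step()

    # 2) actor-critic rollout from the chosen supply node toward the load node
    n_source = int(torch.argmax(q_pred.detach()))
    while n_source != n_load:
        inp = torch.cat([gcn(s).flatten(), torch.tensor([float(n_source)])])
        dist = torch.distributions.Categorical(logits=actor(inp))
        a = dist.sample()                            # next-hop router for the energy
        r = reward(s, n_source, int(a))              # R(s) from the reward R function
        s_next = env_step(s, n_source, int(a), delta_p)

        inp_next = torch.cat([gcn(s_next).flatten(), torch.tensor([float(a)])])
        delta = r + GAMMA * critic(inp_next).detach() - critic(inp)  # TD error
        # critic: minimize delta^2; actor: policy gradient weighted by delta
        loss = (delta.pow(2) - delta.detach() * dist.log_prob(a)).sum()
        opt.zero_grad(); loss.backward(); opt.step()

        s, n_source = s_next, int(a)                 # advance until n_source == n_load
```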
Optionally, the state sample initial value s includes: the state information and device parameters of the N energy routers n_k, k=1,...,N.
According to the technical scheme, the real-time state information and the equipment parameters of each energy router in the energy internet and the newly added load in the energy internet are input into the deep reinforcement learning model consisting of the graph convolution neural network, the reinforcement learning Q network, the actor network and the critic network, so that the optimal power supply energy router and the optimal energy routing path are obtained. Therefore, the embodiment of the application realizes the rapid and accurate optimization of the energy transmission line by utilizing the real-time data of each energy router in the energy internet, thereby reducing the loss of energy transmission and ensuring the efficiency and reliability of power supply. In addition, in the optimization process, the available power supply capacity, the power supply quality, the reliability and the price of each energy router, the number of the routes required to be passed by newly adding a load to the energy router, the length of the transmission line and other factors are comprehensively considered, and the accuracy of the optimization result is further improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of an energy route optimization method provided in an embodiment of the present application;
fig. 2 is a second flowchart of an energy route optimization method according to an embodiment of the present application;
FIG. 3 is a network architecture diagram of a deep reinforcement learning model according to an embodiment of the present disclosure;
FIG. 4 is a block diagram of a graph convolution neural network provided by an embodiment of the present application;
FIG. 5 is a structural diagram of a graph convolution layer of the graph convolutional neural network provided in an embodiment of the present application;
FIG. 6 is a flow chart of deep reinforcement learning model training provided by an embodiment of the present application;
fig. 7 is a schematic structural diagram of an energy routing optimization apparatus provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Fig. 1 shows a flowchart of an energy route optimization method provided in an embodiment of the present application, fig. 2 is a flowchart of another energy route optimization method provided in the embodiment of the present application, fig. 3 is a network structure diagram of a deep reinforcement learning model provided in the embodiment of the present application, fig. 4 is a structure diagram of a graph convolutional neural network provided in the embodiment of the present application, fig. 5 is a structure diagram of a graph convolution layer of the graph convolutional neural network provided in the embodiment of the present application, and fig. 6 is a training flowchart of the deep reinforcement learning model provided in the embodiment of the present application. The energy route optimization method provided by the embodiment of the present application is explained in detail below with reference to fig. 1 to 6. As shown in fig. 1, the method specifically includes:
step 101: acquiring a network structure formed by interconnection of N energy routers in an energy internet, and a state information historical data set and equipment parameters of each energy router;
in this step, it should be noted that, first, the network structure in which all N energy routers in the energy internet are interconnected, together with the state information historical data set and device parameters of each energy router, is acquired. Specifically, for an energy internet system interconnected by N energy routers, the network structure of the energy internet can be obtained from the connection relationship of the electric energy transmission lines among the energy routers. Further, taking the energy routers as the nodes of the graph and the connection lines as the edges, the topological structure of the energy internet is obtained. Then, for each energy router node n_i, its device parameters are acquired, including the energy transfer efficiency eff_i, the maximum power generation capacity C_gen,i and the maximum load capacity C_load,i, and its state information is acquired, including the load power in the microgrid connected to the energy router, the power of the power generation equipment and the microgrid voltage. Finally, the device parameters and state information of the N energy routers are combined into the comprehensive information of the energy internet system.
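A sketch of the per-router record assembled in this step might look as follows; the field names are illustrative stand-ins for the quantities listed above.

```python
from dataclasses import dataclass

@dataclass
class RouterInfo:
    eff: float          # energy transfer efficiency eff_i
    c_gen_max: float    # maximum power generation capacity C_gen,i
    c_load_max: float   # maximum load capacity C_load,i
    p_load: float       # load power in the attached microgrid
    p_gen: float        # current power of the generation equipment
    u_grid: float       # microgrid voltage

# the comprehensive information of the system is the list of N such records,
# together with the topology (routers as graph nodes, lines as edges)
```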
Step 102: constructing a reinforcement learning environment of the energy Internet according to the operation principle of the energy Internet and the energy router, and constructing a deep reinforcement learning model consisting of a graph convolution neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure; the reinforcement learning Q network, the actor network and the critic network all adopt fully-connected neural network structures;
in this step, it should be noted that the energy internet reinforcement learning environment is built according to the mechanism by which the energy routers in the energy internet transfer energy, establishing the equations of the energy transfer process between the energy routers; the main considerations are the energy loss caused by the conversion efficiency of the energy routers, the energy loss of the lines due to impedance, and the change of the network voltage. The energy internet reinforcement learning environment can simulate the energy transfer process in the energy internet, that is, it performs an energy transfer according to the current states of all energy routers in the energy internet and produces the updated states of all energy routers after the transfer.
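As a rough illustration of what such an environment must simulate, the sketch below models one energy-transfer hop. The line model (current at a nominal voltage, an I²R loss and an I·R voltage drop) is an editorial assumption; the patent only states that router conversion losses, line losses due to impedance, and voltage changes are considered.

```python
def transfer_step(p_in, router_eff, line_resistance, u_nominal=10e3):
    """Push p_in watts one hop; returns received power, losses, voltage drop."""
    loss_router = p_in * (1.0 - router_eff)        # conversion loss in the router
    p_line = p_in - loss_router
    i = p_line / u_nominal                         # line current at the nominal voltage
    loss_line = i ** 2 * line_resistance           # resistive line loss L_line
    delta_u = i * line_resistance                  # voltage deviation |dU| along the line
    return p_line - loss_line, loss_line, loss_router, delta_u
```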
In this step, as shown in the structure diagram of the graph convolutional neural network of fig. 4, the graph convolutional neural network is formed by connecting a plurality of graph convolution layers, each graph convolution layer taking the output of the previous layer as its input, until the final feature extraction result of the energy internet is obtained. Each graph convolution layer, as shown in fig. 5, takes the features of each graph node as input and performs a graph convolution operation on them in combination with the adjacency matrix representing the graph structure. The graph convolution operation fuses and extracts the features of nodes that are connected to one another; compared with an ordinary convolutional neural network or a fully connected neural network, the extracted features reflect the structural information of the graph.
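A minimal sketch of one such graph convolution layer is given below, using the common normalized propagation rule A_hat = D^(-1/2)(A + I)D^(-1/2) as an assumption; the patent fixes only that node features are fused with their neighbours' through the adjacency matrix.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph convolution layer: mix each node's features with its neighbours'."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_hat):               # x: (N, in_dim), a_hat: (N, N)
        return torch.relu(self.lin(a_hat @ x))

def normalize_adjacency(a):
    """A_hat = D^(-1/2) (A + I) D^(-1/2); assumes every router has at least one line."""
    a = a + torch.eye(a.shape[0])
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt
```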
In this step, it should be noted that a deep reinforcement learning model composed of a graph convolution neural network, a reinforcement learning Q network, an actor network, and a critic network is built according to a network structure of the energy internet. Wherein, the reinforcement learning Q network, the actor network and the critic network all adopt a fully connected neural network structure, and in terms of network output, as shown in fig. 3: the output of the reinforcement learning Q network is the Q value of the corresponding node of each energy router, which represents the reward value of the energy router as the energy supply node, and the larger the value is, the more suitable the energy router is as the energy supply node; the output of the actor network is the next energy router node of energy transfer; the output of the critic network is an evaluation of the energy delivery process.
Step 103: according to the state information historical data set and the device parameters of each energy router in the energy internet, and by integrating the available power supply capacity, power supply quality, reliability and price of each energy router, as well as the number of routers a newly added load must traverse to reach the energy router and the length of the transmission line, constructing an evaluation Q function of the power supply energy router, and constructing a reward R function of each energy transfer for the energy transfer between every two energy routers; the evaluation Q function is used for calculating the reward value of the power supply energy router; the reward R function is used for calculating the reward value obtained by performing each energy transfer action;
in this step, it should be noted that, according to the state information historical data set and the device parameters of each energy router in the energy internet, and by integrating the available power supply capacity, power supply quality, reliability and price of each energy router, as well as the number of routers a newly added load must traverse to reach the energy router and the length of the transmission line, an evaluation Q function of the power supply energy router is constructed, the Q function being:
Q(n_load, n_source) = -α_1·l_line - α_2·c_router + α_3·r_source - α_4·p_source·ΔP + α_5·q_source;
wherein n_load denotes the energy router where the newly added load is located, n_source denotes the energy router supplying power to the energy router node where the newly added load is located, ΔP denotes the newly added load, l_line denotes the total length of the connecting lines between n_load and n_source, c_router denotes the number of routers between n_load and n_source, r_source denotes the power supply reliability of n_source, p_source denotes the power supply price per unit energy of n_source, q_source denotes the power supply quality of n_source, and α_1, α_2, α_3, α_4 and α_5 are the coefficients of the five evaluation terms, used to adjust their relative weights.
In this step, for the energy transfer between every two energy routers, given the energy transfer starting node n_start, the energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:
R(n_start, n_end) = -β_1·L_line - β_2·L_router - β_3·ΔU;
wherein L_line denotes the energy loss on the lines during the energy transfer, L_router denotes the energy loss in the energy routers during the transfer, ΔU is the absolute value of the voltage deviation caused by the transfer, and β_1, β_2 and β_3 are the coefficients of the three reward terms, used to adjust their relative weights. The reward R function is used to calculate the reward value obtained for each energy transfer action, providing the direction in which the system learns to advance.
Step 104: initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; simulating the situation of newly added loads by adding loads at random energy routers, and training the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network, using the state information historical data sets and device parameters of all energy routers in the energy internet as input samples and the evaluation Q function, energy routing path and reward R function of each energy router as output samples;
in this step, it should be noted that the initialization of the graph convolutional neural network includes:
constructing the adjacency matrix of the graph according to the network structure of the energy internet, and determining the number of graph convolution layers and the number of convolution kernels of each graph convolution layer; the adjacency matrix A is constructed as:
A_ij = 1 if energy routers n_i and n_j are directly connected by a transmission line, and A_ij = 0 otherwise;
the initialization of the reinforcement learning Q network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is N-1, corresponding to the set of all energy routers other than the energy router concerned;
the initialization of the actor network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the index of the energy transfer target router;
the initialization of the critic network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
In this step, the situation of a newly added load is simulated by adding a load at a random energy router, and the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network is trained using the state information historical data set and device parameters of each energy router in the energy internet as input samples and the evaluation Q function, energy routing path and reward R function of each energy router as output samples. In this way, the embodiment learns the energy transfer process among the multiple energy routers in the energy internet by combining the graph convolutional neural network with deep reinforcement learning, and selects and optimizes the power supply nodes and energy routing paths of the energy routers. By further integrating the online monitored state data of the energy routers, the energy routing in the energy internet is selected and optimized quickly and accurately, ensuring the power supply efficiency and reliability of the energy internet.
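A small sketch of how such training situations could be generated; the sampling ranges and helper names are assumptions.

```python
import random

def sample_training_case(n_routers, history):
    """One simulated situation: a random historical state plus a random new load."""
    s = random.choice(history)               # one historical snapshot s(t_i)
    n_load = random.randrange(n_routers)     # router that receives the new load
    delta_p = random.uniform(1e3, 1e5)       # new load in watts (assumed range)
    return s, n_load, delta_p
```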
Step 105: for the requirement of new loads in the energy Internet, inputting the real-time state information and equipment parameters of each energy router in the energy Internet and the new loads into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path; the optimal power supply energy router is the energy router with the maximum Q function value, and the optimal energy routing path is output by the actor network.
In this step, as shown in fig. 6, the deep reinforcement learning model takes the real-time state information and device parameters of each energy router of the energy internet and the load newly added at a certain node as input. First, the graph convolutional neural network G(θ_G) performs feature extraction on the input to obtain the features F_j, j=1,...,N, of each node. Then the reinforcement learning Q network Q(θ_Q) takes the F_j of all nodes as input to obtain the evaluation value Q_j of each node as a power supply node, and the node with the largest evaluation value is selected as the power supply node n_source. The actor network π(·|s; θ_a) takes F_j and n_source as input, and its action output is the route a of the next energy transfer. The critic network V(s; θ_c) takes F_j and n_source as input and gives the evaluation V(s) of the current energy internet state.
According to the technical scheme, the real-time state information and the equipment parameters of each energy router in the energy internet and the newly added load in the energy internet are input into the deep reinforcement learning model consisting of the graph convolution neural network, the reinforcement learning Q network, the actor network and the critic network, so that the optimal power supply energy router and the optimal energy routing path are obtained. Therefore, the embodiment of the application realizes the rapid and accurate optimization of the energy transmission line by utilizing the real-time data of each energy router in the energy internet, thereby reducing the loss of energy transmission and ensuring the efficiency and reliability of power supply. In addition, in the optimization process, the available power supply capacity, the power supply quality, the reliability and the price of each energy router, the number of routes required to be passed by newly adding loads to the energy routers, the length of transmission lines and other factors are comprehensively considered, and the accuracy of the optimization result is further improved.
Based on the content of the foregoing embodiment, in this embodiment, the state information history data set of the energy router includes: load power in the microgrid connected to the energy router, power of the power generation equipment and microgrid voltage.
In the present embodiment, the historical data set of the state information of an energy router in the energy internet includes the load power in the microgrid connected to the energy router, the power of the power generation equipment and the microgrid voltage. Preferably, the state data of the energy routers can be monitored online to ensure the accuracy of each energy routing optimization; by contrast, the existing routing-table method must update the routing table frequently whenever renewable generation or load fluctuates, which increases the transmission load of the information flow in the energy internet.
Based on the content of the foregoing embodiment, in this embodiment, the device parameters of the energy router include: energy transfer efficiency, maximum power generation capacity, and maximum load capacity.
In this embodiment, it should be noted that the device parameters of the energy router include: energy transfer efficiency, maximum power generation capacity, and maximum load capacity. Obtaining the device parameter information of each energy router in the energy internet at each energy routing optimization provides a multi-dimensional reference for the selection of the power supply energy router.
Based on the content of the foregoing embodiment, in the present embodiment, the structure of the graph convolutional neural network includes: the graph adjacency matrix, the number of graph convolution layers, and the number of convolution kernels of each graph convolution layer.
In this embodiment, it should be noted that graph convolutional neural networks are widely applied in many fields and work well on graph-structured data, for example in wireless network node selection and urban road traffic prediction. An energy internet formed by connecting a plurality of energy routers is a typical graph structure, and a graph convolutional neural network can perform efficient feature extraction on its operating information. As shown in fig. 4, the structure of the graph convolutional neural network includes the adjacency matrix of the graph, the number of graph convolution layers, and the number of convolution kernels of each graph convolution layer.
Based on the contents of the above embodiments, in this embodiment, the structures of the reinforcement learning Q network, the actor network and the critic network include the number of layers of the network, the type of each layer, and the input/output sizes.
Based on the content of the foregoing embodiment, in the present embodiment, the evaluation Q function is:
Q(n_load, n_source) = -α_1·l_line - α_2·c_router + α_3·r_source - α_4·p_source·ΔP + α_5·q_source;
wherein n_load denotes the energy router where the newly added load is located, n_source denotes the energy router supplying power to the energy router node where the newly added load is located, ΔP denotes the newly added load, l_line denotes the total length of the connecting lines between n_load and n_source, c_router denotes the number of routers between n_load and n_source, r_source denotes the power supply reliability of n_source, p_source denotes the power supply price per unit energy of n_source, q_source denotes the power supply quality of n_source, and α_1, α_2, α_3, α_4 and α_5 are the coefficients of the five evaluation terms, used to adjust their relative weights;
for the energy transfer between every two energy routers, given the energy transfer starting node n_start, the energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:
R(n_start, n_end) = -β_1·L_line - β_2·L_router - β_3·ΔU;
wherein L_line denotes the energy loss on the lines during the energy transfer, L_router denotes the energy loss in the energy routers during the transfer, ΔU is the absolute value of the voltage deviation caused by the transfer, and β_1, β_2 and β_3 are the coefficients of the three reward terms, used to adjust their relative weights.
In this embodiment, it should be noted that, according to the historical state information data set and the device parameters of each energy router in the energy internet, and by integrating the available power supply capacity, the power supply quality, the reliability, and the price of each energy router, and the number of routes that a load needs to be added to the energy router and the length of a transmission line, an evaluation Q function of the power supply energy router is constructed, where the Q function is:
Q(nload,nsource)=-α1lline2crouter3rsource4psource4qsourceΔP
wherein n isloadEnergy router, n, indicating where the newly added load is locatedsourceEnergy router supplying power to the energy router node where the newly added load is located, Δ P representing the newly added load, llineRepresents nloadAnd nsourceTotal length of connecting line between crouterRepresents nloadAnd nsourceNumber of routers in between, rsourceRepresents nsourceReliability of power supply of psourceRepresents nsourcePrice of power supply per unit energy, qsourceRepresents nsourceQuality of supply of alpha1、α2、α3、α4And alpha5Each 5-part coefficient was evaluated for adjusting specific gravity.
In the present embodiment, for energy transfer between every two energy routers, an energy transfer starting node n is givenstartEnergy receiving node nendAnd the load to be transferred, Δ P, the corresponding reward R function is:
R(nstart,nend)=-β1Lline2Lrouter3ΔU;
wherein L islineRepresenting the energy loss on the line during the energy transfer, LrouterRepresents the energy loss of the energy router in the energy transfer process, and Delta U is the absolute value of the voltage deviation brought by the transfer process, beta1、β2And beta3Coefficients of 3 bonus points, respectively, are used to adjust the specific gravity. The reward R function is used to calculate the value of the reward obtained for each energy transfer action, giving the system a direction of travel.
Based on the content of the foregoing embodiment, in this embodiment, the initialization of the graph convolutional neural network includes:
constructing the adjacency matrix of the graph according to the network structure of the energy internet, and determining the number of graph convolution layers and the number of convolution kernels of each graph convolution layer; the adjacency matrix A is constructed as:
A_ij = 1 if energy routers n_i and n_j are directly connected by a transmission line, and A_ij = 0 otherwise;
the initialization of the reinforcement learning Q network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is N-1, corresponding to the set of all energy routers other than the energy router concerned;
the initialization of the actor network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the index of the energy transfer target router;
the initialization of the critic network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
Based on the content of the foregoing embodiment, in this embodiment, the inputting of the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model to obtain the optimal power supply energy router and energy routing path includes:
inputting the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model, where the graph convolutional neural network performs feature extraction on the input to obtain the features F_j, j=1,...,N, of each node;
the reinforcement learning Q network takes the F_j of all nodes as input, obtains the evaluation value Q_j of each node as a supply node, and selects the node with the largest evaluation value as the energy supply node n_source;
the actor network takes F_j and n_source as input and outputs the route a of the next energy transfer;
the critic network takes F_j and n_source as input and gives an evaluation V(s) of the current energy internet state.
Based on the content of the foregoing embodiment, in this embodiment, the forward propagation and backward propagation processes of the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network include:
for each state sample initial value s in the historical data s(t_i), i=1,...,M, of the M time nodes of the initial state data set, increasing a load ΔP at a random energy router node n_load and performing the following steps:
respectively sending the comprehensive information of the N energy routers into the N nodes of the graph convolutional neural network for feature extraction, obtaining the features F_j, j=1,...,N, of the N energy routers;
feeding F_j, j=1,...,N, into the Q network to obtain the Q value Q_j, j=1,...,N, of each energy router node, calculating the true Q value Q'_j, j=1,...,N, from the evaluation Q function of each power supply node, and, according to the mean square error loss function ε(θ_Q) = Σ_j (Q_j - Q'_j)², updating the Q network parameters θ_Q and the graph neural network parameters θ_G by gradient descent with learning rate η:
θ_Q ← θ_Q - η·∂ε(θ_Q)/∂θ_Q;
θ_G ← θ_G - η·∂ε(θ_Q)/∂θ_G;
selecting the node with the maximum Q value as the power supply node n_source, and then repeating the following steps to train the actor network and the critic network:
taking F_j, j=1,...,N, and the selected supply node n_source as input to the actor network π(·|s; θ_a) to determine the action a, and calculating the reward value R(s) of the action from the reward R function;
performing action a, transferring the energy to the next energy router node n_source', obtaining the energy internet state s' at the next moment, and sending s' into the graph convolutional neural network to obtain F_j', j=1,...,N;
substituting F_j, j=1,...,N, and F_j', j=1,...,N, into the critic network to calculate the estimates V(s) and V(s') of the value function, and calculating the TD error δ:
δ = R(s) + γV(s') - V(s);
updating the actor network parameters θ_a and the graph neural network parameters θ_G according to the TD error δ:
θ_a ← θ_a + η·δ·∇_θa log π(a|s; θ_a);
θ_G ← θ_G + η·δ·∇_θG log π(a|s; θ_a);
according to the mean square error loss function ε(θ_c) = (R(s) + γV(s') - V(s))², updating the critic network parameters θ_c and the graph neural network parameters θ_G:
θ_c ← θ_c - η·∂ε(θ_c)/∂θ_c;
θ_G ← θ_G - η·∂ε(θ_c)/∂θ_G;
taking n_source' as the new supply node n_source and F_j' as the new F_j, and starting the next cycle, until n_source is n_load;
when the preset maximum number of training rounds is reached, the training ends, and the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network are returned.
In this embodiment, it should be noted that, in the forward and backward propagation of the deep reinforcement learning model, a maximum number of training rounds is given in advance, and the above steps are performed round by round until the preset maximum number of training rounds is reached, after which the trained parameters θ_G, θ_Q, θ_a and θ_c are returned.
For a load newly added at a certain energy router node, the energy internet information and the load information are input into the graph convolutional neural network, and the optimal power supply node is obtained through the Q network; the energy internet features and the power supply node information are input into the actor network to obtain the next node of the energy transfer, until the energy is transferred to the node where the newly added load is located; the successive outputs of the actor network are then assembled into the optimal energy route, which is sent through the information network to the energy routers along the path to establish the actual energy transmission route.
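The rollout described here can be sketched as follows, where next_hop(state, node) stands in for one actor forward pass and is an assumed helper.

```python
def build_route(n_source, n_load, state, next_hop, max_hops=64):
    """Assemble the optimal energy route from successive actor outputs."""
    route = [n_source]
    node = n_source
    while node != n_load and len(route) <= max_hops:
        node = next_hop(state, node)    # actor: next router for the energy
        route.append(node)
    return route                        # sent to the routers over the information network
```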
Based on the same inventive concept, another embodiment of the present application provides an energy route optimization device. As shown in fig. 7, the energy route optimization device provided in the embodiment of the present application includes:
the system comprises a first processing module 1, a second processing module and a control module, wherein the first processing module is used for acquiring a network structure formed by interconnection of N energy routers in the energy Internet, and a state information historical data set and equipment parameters of each energy router;
the second processing module 2 is used for constructing a reinforcement learning environment of the energy internet according to the operation principle of the energy internet and the energy router, and constructing a deep reinforcement learning model consisting of a graph convolution neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure; the reinforcement learning Q network, the actor network and the critic network all adopt fully-connected neural network structures;
the third processing module 3 is configured to construct an evaluation Q function of the power supply energy router according to the state information historical data set and the device parameters of each energy router in the energy internet, and synthesize the available power supply capacity, the power supply quality, the reliability, and the price of each energy router, the number of routes that a load needs to be added to the energy router, and the length of a transmission line, and construct a reward R function for each energy transfer for energy transfer between every two energy routers; the evaluation Q function is used for calculating the reward value of the power supply energy router; the reward R function is used for calculating a reward value obtained by executing each energy transfer action;
a fourth processing module 4, configured to initialize the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; to simulate the situation of newly added loads by adding loads at random energy routers; and to train the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network, using the state information historical data sets and device parameters of all energy routers in the energy internet as input samples and the evaluation Q function, energy routing path and reward R function of each energy router as output samples;
the fifth processing module 5 is configured to input the real-time state information and the device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path; the optimal power supply energy router is the energy router with the maximum Q function value, and the optimal energy routing path is output by the actor network.
In this embodiment, it should be noted that, first, the network structure in which all N energy routers in the energy internet are interconnected, together with the state information historical data set and device parameters of each energy router, is acquired. Specifically, for an energy internet system interconnected by N energy routers, the network structure of the energy internet can be obtained from the connection relationship of the electric energy transmission lines among the energy routers. Further, taking the energy routers as the nodes of the graph and the connection lines as the edges, the topological structure of the energy internet is obtained. Then, for each energy router node n_i, its device parameters are acquired, including the energy transfer efficiency eff_i, the maximum power generation capacity C_gen,i and the maximum load capacity C_load,i, and its state information is acquired, including the load power in the microgrid connected to the energy router, the power of the power generation equipment and the microgrid voltage. Finally, the device parameters and state information of the N energy routers are combined into the comprehensive information of the energy internet system.
In this embodiment, the energy internet reinforcement learning environment is built according to the mechanism of energy transmission between the energy routers in the energy internet: equations of the energy transfer process among the energy routers are established, mainly considering the energy loss caused by the conversion efficiency of the energy routers, the energy loss of the lines due to impedance, and the change of the network voltage. The energy internet reinforcement learning environment can simulate the energy transfer process in the energy internet, i.e., perform an energy transfer according to the current states of all energy routers and obtain the updated states of all energy routers after the transfer.
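A toy sketch of one energy-transfer step in such an environment is given below; the embodiment names the three loss effects but not their equations, so the concrete loss model here (router conversion loss, a quadratic impedance-style line loss and a linearized voltage change) is an assumption purely for illustration:

```python
# Minimal sketch of one energy-transfer step in the RL environment.
# The loss model below is an illustrative assumption, not the patent's.
def transfer_step(p_node, u_node, src, dst, delta_p, eff, r_line, k_u=1e-3):
    p_after_router = delta_p * eff[src]                   # router conversion loss
    line_loss = r_line[(src, dst)] * p_after_router ** 2  # impedance loss (toy model)
    p_received = p_after_router - line_loss

    p_node = dict(p_node)                          # updated energy internet state
    u_node = dict(u_node)
    p_node[src] = p_node[src] - delta_p            # energy leaves the source
    p_node[dst] = p_node[dst] + p_received         # attenuated energy arrives
    u_node[dst] = u_node[dst] - k_u * delta_p      # linearized voltage deviation
    return p_node, u_node, delta_p - p_received    # new state and total loss
```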
In this embodiment, as shown in fig. 4, the structure diagram of the graph convolutional neural network provided in the embodiment of the present application, the graph convolutional neural network is formed by stacking a plurality of graph convolution layers: each graph convolution layer takes the output of the previous layer as its input, until the final feature extraction result of the energy internet is obtained. Each graph convolution layer, as shown in fig. 5, takes the features of each graph node as input and performs the graph convolution operation on them in combination with the adjacency matrix that characterizes the graph structure. The graph convolution operation fuses and extracts the features of nodes with connection relationships; compared with an ordinary convolutional or fully connected neural network, the extracted features can reflect the structural information of the graph.
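For concreteness, one graph convolution layer in the widely used normalized form H′ = σ(ÂHW) can be sketched as follows; this is a standard formulation consistent with the description above, not necessarily the embodiment's exact layer:

```python
import torch
import torch.nn as nn

# One graph convolution layer, H' = sigma(A_hat H W): node features are
# fused with those of connected neighbours via the normalized adjacency
# matrix. A standard sketch, not necessarily the embodiment's layer.
class GraphConvLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):
        a_hat = adj + torch.eye(adj.size(0))        # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt    # symmetric normalization
        return torch.relu(a_norm @ self.linear(h))  # fuse + extract features
```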
In this embodiment, a deep reinforcement learning model composed of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network is built according to the network structure of the energy internet. The reinforcement learning Q network, the actor network and the critic network all adopt fully connected neural network structures. In terms of network output, as shown in fig. 3: the output of the reinforcement learning Q network is the Q value of the node corresponding to each energy router, which represents the reward value of that energy router acting as the energy supply node; the larger the value, the more suitable the energy router is as the energy supply node. The output of the actor network is the next energy router node of the energy transfer; the output of the critic network is an evaluation of the energy transfer process.
In this embodiment, according to the state information historical data set and the device parameters of each energy router in the energy internet, and integrating the available power supply capacity, power supply quality, reliability and price of each energy router as well as the number of routers a newly added load must traverse and the length of the transmission line, an evaluation Q function of the power supply energy router is constructed. The Q function is:
Q(n_load, n_source) = −α1·l_line − α2·c_router + α3·r_source − α4·p_source·ΔP + α5·q_source;

where n_load denotes the energy router where the newly added load is located; n_source denotes the energy router supplying power to the energy router node where the newly added load is located; ΔP denotes the newly added load; l_line denotes the total length of the connecting line between n_load and n_source; c_router denotes the number of routers between n_load and n_source; r_source denotes the power supply reliability of n_source; p_source denotes the power supply price per unit energy of n_source; q_source denotes the power supply quality of n_source; and α1, α2, α3, α4 and α5 are the coefficients of the 5 evaluation terms respectively, used to adjust their relative weights.
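Transcribed directly, the evaluation function reads as below; the coefficient values are placeholders, and the sign convention (penalizing line length, router count and price, rewarding reliability and quality) follows the reconstruction above:

```python
# Direct transcription of the evaluation Q function; coefficient values
# are placeholders to be tuned, and the signs follow the reconstruction
# above (penalize distance, hop count, price; reward reliability, quality).
ALPHA = (1.0, 1.0, 1.0, 1.0, 1.0)  # alpha_1 ... alpha_5

def evaluate_source(l_line, c_router, r_source, p_source, q_source, delta_p,
                    alpha=ALPHA):
    a1, a2, a3, a4, a5 = alpha
    return (-a1 * l_line - a2 * c_router + a3 * r_source
            - a4 * p_source * delta_p + a5 * q_source)
```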
In this embodiment, for the energy transfer between every two energy routers, given an energy transfer starting node n_start, an energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:

R(n_start, n_end) = −β1·L_line − β2·L_router − β3·ΔU;

where L_line denotes the energy loss on the line during the energy transfer; L_router denotes the energy loss of the energy router in the energy transfer process; ΔU is the absolute value of the voltage deviation brought by the transfer process; and β1, β2 and β3 are the coefficients of the 3 reward terms respectively, used to adjust their relative weights. The reward R function is used to calculate the reward value obtained for each energy transfer action, giving the learning process its direction.
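The corresponding one-line transcription, again with placeholder coefficients, is:

```python
# Transcription of the reward R function; every term is a penalty, so the
# agent is steered toward transfers with low line loss, low router loss
# and small voltage deviation. Coefficients are placeholders.
BETA = (1.0, 1.0, 1.0)  # beta_1, beta_2, beta_3

def transfer_reward(l_line_loss, l_router_loss, delta_u, beta=BETA):
    b1, b2, b3 = beta
    return -b1 * l_line_loss - b2 * l_router_loss - b3 * abs(delta_u)
```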
In this embodiment, the initialization of the graph convolutional neural network includes:
constructing the adjacency matrix of the graph according to the network structure of the energy internet, and determining the number of graph convolution layers and the number of convolution kernels of each graph convolution layer; the adjacency matrix A is constructed as:

A_ij = 1, if energy routers n_i and n_j are connected by a transmission line; A_ij = 0, otherwise.
initialization of the reinforcement learning Q network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is N−1, corresponding to the set of all energy routers excluding the energy router itself;
initialization of the actor network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the index of the target router of the next energy transfer;
initialization of the critic network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
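A hedged sketch of this initialization, with the stated input/output sizes but assumed layer counts and hidden widths (the graph convolution layer itself is sketched above), might look like:

```python
import torch.nn as nn

# Sketch of the three fully connected networks with the stated sizes
# (feat_dim = GCN output size, N = number of routers); layer counts and
# hidden width are assumptions.
def build_networks(feat_dim, n_routers, hidden=64):
    q_net = nn.Sequential(                        # output N-1: one Q value per
        nn.Linear(feat_dim, hidden), nn.ReLU(),   # candidate router other than
        nn.Linear(hidden, n_routers - 1))         # the router itself
    actor = nn.Sequential(                        # output 1: score used to pick
        nn.Linear(feat_dim, hidden), nn.ReLU(),   # the next transfer target
        nn.Linear(hidden, 1))
    critic = nn.Sequential(                       # output 1: evaluation of the
        nn.Linear(feat_dim, hidden), nn.ReLU(),   # energy transfer process
        nn.Linear(hidden, 1))
    return q_net, actor, critic
```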
In this embodiment, the situation of newly added loads is simulated by adding loads to random energy routers; the state information historical data sets and device parameters of each energy router in the energy internet are used as input samples, and the evaluation Q function, energy routing path and reward R function of each energy router are used as output samples, to train the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network. The embodiment of the invention thus learns the energy transfer process among the multiple energy routers in the energy internet by combining the graph convolutional neural network with deep reinforcement learning, selects and optimizes the energy supply nodes and energy routing paths, and can guarantee the power supply efficiency and reliability of the energy internet.
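A compressed, hypothetical sketch of one training episode in this style is given below. The helper names (env_step, eval_q) and the treatment of the actor's per-node score as a next-hop distribution are assumptions for illustration; the TD error follows δ = R + γV(s′) − V(s):

```python
import torch

# Hypothetical sketch of one training episode. gcn, q_net, actor and critic
# are the modules sketched earlier; env_step and eval_q stand in for the RL
# environment and the evaluation Q function.
def train_episode(gcn, q_net, actor, critic, env_step, eval_q,
                  feats, adj, optim, gamma=0.99):
    f = gcn(feats, adj)                        # per-node features F_j
    q_pred = q_net(f.mean(dim=0))              # Q values of candidate sources
    q_loss = ((q_pred - eval_q()) ** 2).sum()  # supervise with evaluation Q fn

    src = int(q_pred.argmax())                 # supply node n_source
    logits = actor(f).squeeze(-1)              # score each node as next hop
    dist = torch.distributions.Categorical(logits=logits)
    a = dist.sample()                          # next energy-transfer router
    feats2, adj2, r = env_step(src, int(a))    # execute transfer, reward R(s)
    f2 = gcn(feats2, adj2)
    delta = r + gamma * critic(f2.mean(dim=0)) - critic(f.mean(dim=0))  # TD error
    loss = q_loss - delta.detach() * dist.log_prob(a) + delta.pow(2)
    optim.zero_grad(); loss.backward(); optim.step()
    return float(loss)
```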
By integrating the online monitoring state data of the energy routers, the energy routing in the energy internet is selected and optimized quickly and accurately, so that the power supply efficiency and reliability of the energy internet can be guaranteed.
In this embodiment, as shown in fig. 6, the deep reinforcement learning model takes the state information and device parameters of each energy router of the energy internet and the newly added load at a certain node as inputs. First, the graph convolutional neural network G(θ_G) performs feature extraction on the input to obtain the feature F_j, j = 1,…,N, of each node. Then the reinforcement learning Q network Q(θ_Q) takes the F_j of all nodes as input and obtains the evaluation value Q_j of each node as a power supply node; the node with the largest evaluation value is selected as the power supply node n_source. The actor network π(·|s; θ_a) takes F_j and n_source as input and outputs the action, i.e. the route a of the next energy transfer. The critic network V(s; θ_c) takes F_j and n_source as input and gives an evaluation V(s) of the current energy internet state.
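At inference time this forward pass can be sketched as follows (reusing the module names assumed above; in a full implementation the node features would be recomputed from the updated state after each transfer):

```python
import torch

# Hedged sketch of the inference-time forward pass mirroring fig. 6,
# reusing the module names assumed in the earlier sketches.
@torch.no_grad()
def route(gcn, q_net, actor, feats, adj, load_node):
    f = gcn(feats, adj)                        # features F_j of each router
    src = int(q_net(f.mean(dim=0)).argmax())   # supply node n_source (max Q_j)
    path, node = [src], src
    while node != load_node:                   # follow the actor's next hops
        node = int(actor(f).squeeze(-1).argmax())
        path.append(node)
        if len(path) > f.size(0):              # guard against cycles in sketch
            break
    return src, path
```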
According to the technical scheme, the real-time state information and device parameters of each energy router in the energy internet, together with the newly added load, are input into the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network to obtain the optimal power supply energy router and the optimal energy routing path. The embodiment of the application thus uses the real-time data of each energy router in the energy internet to optimize the energy transmission line quickly and accurately, reducing the loss of energy transmission and guaranteeing the efficiency and reliability of power supply. In addition, the optimization comprehensively considers the available power supply capacity, power supply quality, reliability and price of each energy router, the number of routers a newly added load must traverse, the length of the transmission line and other factors, further improving the accuracy of the optimization result.
The energy route optimization device described in this embodiment may be used to implement the above method embodiments, and the principle and technical effect are similar, which are not described herein again.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which, with reference to the schematic structural diagram of the electronic device shown in fig. 8, specifically includes the following contents: a processor 801, memory 802, communication interface 803, and communication bus 804;
the processor 801, the memory 802 and the communication interface 803 complete mutual communication through the communication bus 804; the communication interface 803 is used for realizing information transmission between devices;
the processor 801 is configured to call a computer program in the memory 802; when executing the computer program, the processor implements all the steps of the energy route optimization method, for example: acquiring a network structure formed by interconnecting N energy routers in an energy internet, and a state information historical data set and device parameters of each energy router; constructing a reinforcement learning environment of the energy internet according to the operation principle of the energy internet and the energy routers, and constructing a deep reinforcement learning model composed of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure, wherein the reinforcement learning Q network, the actor network and the critic network all adopt fully connected neural network structures; constructing an evaluation Q function of the power supply energy router according to the state information historical data set and the device parameters of each energy router in the energy internet, integrating the available power supply capacity, power supply quality, reliability and price of each energy router as well as the number of routers a newly added load must traverse and the length of the transmission line, and constructing a reward R function of each energy transfer for the energy transfer between every two energy routers, wherein the evaluation Q function is used for calculating the reward value of the power supply energy router and the reward R function is used for calculating the reward value obtained by executing each energy transfer action; initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; simulating the situation of newly added loads by adding loads to random energy routers, and training the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network, using the state information historical data sets and device parameters of all energy routers in the energy internet as input samples and the evaluation Q functions, energy routing paths and reward R functions of all energy routers as output samples; and inputting the state information historical data sets and device parameters of all energy routers in the energy internet and the newly added load in the energy internet into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path, wherein the optimal power supply energy router is the energy router with the maximum Q function value and the optimal energy routing path is output by the actor network.
Based on the same inventive concept, yet another embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements all the steps of the above energy route optimization method, for example: acquiring a network structure formed by interconnecting N energy routers in an energy internet, and a state information historical data set and device parameters of each energy router; constructing a reinforcement learning environment of the energy internet according to the operation principle of the energy internet and the energy routers, and constructing a deep reinforcement learning model composed of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure, wherein the reinforcement learning Q network, the actor network and the critic network all adopt fully connected neural network structures; constructing an evaluation Q function of the power supply energy router according to the state information historical data set and the device parameters of each energy router in the energy internet, integrating the available power supply capacity, power supply quality, reliability and price of each energy router as well as the number of routers a newly added load must traverse and the length of the transmission line, and constructing a reward R function of each energy transfer for the energy transfer between every two energy routers, wherein the evaluation Q function is used for calculating the reward value of the power supply energy router and the reward R function is used for calculating the reward value obtained by executing each energy transfer action; initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; simulating the situation of newly added loads by adding loads to random energy routers, and training the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network, using the state information historical data sets and device parameters of all energy routers in the energy internet as input samples and the evaluation Q functions, energy routing paths and reward R functions of all energy routers as output samples; and inputting the state information historical data sets and device parameters of all energy routers in the energy internet and the newly added load in the energy internet into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path, wherein the optimal power supply energy router is the energy router with the maximum Q function value and the optimal energy routing path is output by the actor network.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be essentially or partially implemented in the form of software products, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the energy route optimization method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for energy routing optimization, comprising:
acquiring a network topology formed by interconnection of N energy routers in an energy internet, and a state information historical data set and equipment parameters of each energy router;
constructing a reinforcement learning environment of the energy internet according to the mechanism of energy transfer of the energy internet and the energy routers, and constructing a deep reinforcement learning model composed of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network according to the network topology; the reinforcement learning Q network, the actor network and the critic network all adopt a fully connected neural network structure;
constructing an evaluation Q function of the power supply energy router according to the state information historical data set and the device parameters of each energy router in the energy internet, the available power supply capacity, power supply quality, reliability and price of each energy router, the number of routers a newly added load needs to traverse to reach the energy router, and the length of the transmission line, and constructing a reward R function of each energy transfer for the energy transfer between every two energy routers; the evaluation Q function is used for calculating the reward value of the power supply energy router; the reward R function is used for calculating the reward value obtained by executing each energy transfer action;
initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; simulating newly added load conditions by adding loads to random energy routers, and training the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network, using the state information historical data sets and device parameters of each energy router in the energy internet as input samples and the evaluation Q function, energy routing path and reward R function of each energy router as output samples;
inputting the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model to obtain, for the energy consumption requirement of the newly added load in the energy internet, an optimal power supply energy router and an optimal energy routing path; the optimal power supply energy router is the energy router with the maximum Q function value, and the optimal energy routing path is output by the actor network;
wherein the evaluation Q function is:
Q(n_load, n_source) = −α1·l_line − α2·c_router + α3·r_source − α4·p_source·ΔP + α5·q_source;

wherein n_load denotes the energy router where the newly added load is located; n_source denotes the energy router supplying power to the energy router node where the newly added load is located; ΔP denotes the newly added load; l_line denotes the total length of the connecting line between n_load and n_source; c_router denotes the number of routers between n_load and n_source; r_source denotes the power supply reliability of n_source; p_source denotes the power supply price per unit energy of n_source; q_source denotes the power supply quality of n_source; and α1, α2, α3, α4 and α5 are the coefficients of the 5 evaluation terms respectively, used to adjust their relative weights;

for the energy transfer between every two energy routers, given an energy transfer starting node n_start, an energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:

R(n_start, n_end) = −β1·L_line − β2·L_router − β3·ΔU;

wherein L_line denotes the energy loss on the line during the energy transfer; L_router denotes the energy loss of the energy router in the energy transfer process; ΔU is the absolute value of the voltage deviation brought by the transfer process; and β1, β2 and β3 are the coefficients of the 3 reward terms respectively, used to adjust their relative weights;
the method for obtaining the optimal power supply energy router and energy routing path by inputting the real-time state information and the equipment parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model comprises the following steps:
inputting the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model, and performing feature extraction on the input by the graph convolutional neural network to obtain the feature F_j, j = 1,…,N, of each node;

taking the F_j of all nodes as input of the reinforcement learning Q network to obtain an evaluation value Q_j of each node as a supply node, and selecting the node with the largest evaluation value as the energy supply node n_source;

taking F_j and n_source as input of the actor network, whose output is the route a of the next energy transfer;

taking F_j and n_source as input of the critic network, which gives an evaluation V(s) of the current energy internet state.
2. The energy routing optimization method of claim 1, wherein the historical data set of state information for the energy router comprises: load power in the microgrid connected to the energy router, power of the power generation equipment and microgrid voltage.
3. The energy routing optimization method of claim 1, wherein the device parameters of the energy router comprise: energy transfer efficiency, maximum power generation capacity, and maximum load capacity.
4. The energy routing optimization method of claim 1, wherein the structure of the graph convolutional neural network comprises: the graph adjacency matrix, the number of graph convolution layers, and the number of convolution kernels of each graph convolution layer.
5. The energy routing optimization method according to claim 1, wherein the structure of the reinforcement learning Q network, the actor network and the critic network comprises the number of layers of the network, the type of each layer and the input-output size.
6. The energy routing optimization method of claim 1, wherein initializing the graph convolutional neural network comprises:
constructing the adjacency matrix of the graph according to the network topology of the energy internet, and determining the number of graph convolution layers and the number of convolution kernels of each graph convolution layer; the adjacency matrix A is constructed as:

A_ij = 1, if energy routers n_i and n_j are connected by a transmission line; A_ij = 0, otherwise;
initialization of the reinforcement learning Q network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is N−1, corresponding to the set of all energy routers excluding the energy router itself;
initialization of the actor network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the index of the target router of the next energy transfer;
initialization of the critic network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
7. The energy route optimization method according to claim 1, wherein the forward propagation and backward propagation processes of the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network comprise:
according to the historical data s(t_i), i = 1,…,M, of M time nodes in the initial state data set, for each state sample initial value s, increasing the load ΔP at a random energy router node n_load and performing the following steps:
respectively sending the comprehensive information of the N energy routers into the N nodes of the graph convolutional neural network for feature extraction to obtain the features F_j, j = 1,…,N, of the N energy routers;
sending F_j, j = 1,…,N into the Q network to obtain the Q value Q_j, j = 1,…,N, of each energy router node, calculating the true Q value Q′_j, j = 1,…,N, from the evaluation Q function of each power supply node, and updating the Q network parameters θ_Q and the graph neural network parameters θ_G according to the mean square error loss function ε(θ_Q) = Σ_j (Q_j − Q′_j)²:
θ_Q ← θ_Q − η·∂ε(θ_Q)/∂θ_Q;

θ_G ← θ_G − η·∂ε(θ_Q)/∂θ_G, where η is the learning rate;
selecting the node with the maximum Q value as the power supply node n_source, and then repeating the following steps to train the actor network and the critic network:
f is to bejN and a selected supply node NsourceUsing the operator network pi (· | s; theta) as inputa) Determining action a, and calculating the R value R(s) of the action by a reward R function;
performing the action a, transferring energy to the next energy router node n′_source, obtaining the energy internet state s′ at the next moment, and sending s′ into the graph convolutional neural network to obtain F′_j, j = 1,…,N;
substituting F_j, j = 1,…,N and F′_j, j = 1,…,N into the critic network to calculate the estimates V(s) and V(s′) of the value function, and calculating the TD error δ:

δ = R(s) + γV(s′) − V(s);
updating the actor network parameters θ_a and the graph neural network parameters θ_G according to the TD error δ:
θ_a ← θ_a + η·δ·∇_θa ln π(a|s; θ_a);

θ_G ← θ_G + η·δ·∇_θG ln π(a|s; θ_a);
updating the critic network parameters θ_c and the graph neural network parameters θ_G according to the mean square error loss function ε(θ_c) = (R(s) + γV(s′) − V(s))²:
θ_c ← θ_c − η·∂ε(θ_c)/∂θ_c;

θ_G ← θ_G − η·∂ε(θ_c)/∂θ_G;
taking n′_source as the new supply node n_source and F′_j as the new F_j, and starting the next cycle, until n′_source is n_load;
ending the training when the preset maximum number of training rounds is reached, and returning the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network.
8. The energy route optimization method according to claim 7, wherein the state sample initial value s comprises: the comprehensive information of the N energy routers n_k, k = 1,…,N.
CN202110261579.7A 2021-03-10 2021-03-10 Energy route optimization method Active CN113132232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110261579.7A CN113132232B (en) 2021-03-10 2021-03-10 Energy route optimization method

Publications (2)

Publication Number Publication Date
CN113132232A CN113132232A (en) 2021-07-16
CN113132232B true CN113132232B (en) 2022-05-20

Family

ID=76773010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110261579.7A Active CN113132232B (en) 2021-03-10 2021-03-10 Energy route optimization method

Country Status (1)

Country Link
CN (1) CN113132232B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113572697B (en) * 2021-07-20 2023-09-22 电子科技大学 Load balancing method based on graph convolution neural network and deep reinforcement learning
CN113780482A (en) * 2021-11-12 2021-12-10 中国科学院理化技术研究所 Intelligent detection method for abnormity of energy router
CN114172840B (en) * 2022-01-17 2022-09-30 河海大学 Multi-microgrid system energy routing method based on graph theory and deep reinforcement learning
CN115022231B (en) * 2022-06-30 2023-11-03 武汉烽火技术服务有限公司 Optimal path planning method and system based on deep reinforcement learning
CN117294640B (en) * 2023-10-13 2024-05-24 北京亿美芯科技有限公司 Vehicle-mounted opportunity routing node selection method and system based on PPO algorithm

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3057192A1 (en) * 2015-02-12 2016-08-17 Northeastern University An energy internet and a hierarchical control system and a control method thereof
CN107911299A (en) * 2017-10-24 2018-04-13 浙江工商大学 A kind of route planning method based on depth Q study
CN108038545A (en) * 2017-12-06 2018-05-15 湖北工业大学 Fast learning algorithm based on Actor-Critic neutral net continuous controls
CN111967179A (en) * 2020-07-02 2020-11-20 江苏能来能源互联网研究院有限公司 Dynamic optimization matching method for energy units of energy Internet

Also Published As

Publication number Publication date
CN113132232A (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN113132232B (en) Energy route optimization method
Kumar et al. A hybrid multi-agent based particle swarm optimization algorithm for economic power dispatch
CN110570034B (en) Bus load prediction method based on multi-XGboost model fusion
CN106909728B (en) FPGA interconnection resource configuration generation method based on reinforcement learning
CN112685657B (en) Conversation social recommendation method based on multi-mode cross fusion graph network
CN114358520A (en) Method, system, device and medium for economic dispatching decision of power system
CN116207739A (en) Optimal scheduling method and device for power distribution network, computer equipment and storage medium
Fu et al. The distributed economic dispatch of smart grid based on deep reinforcement learning
CN117392483B (en) Album classification model training acceleration method, system and medium based on reinforcement learning
CN116451880B (en) Distributed energy optimization scheduling method and device based on hybrid learning
CN111600734B (en) Network fault processing model construction method, fault processing method and system
CN112559904A (en) Conversational social recommendation method based on door mechanism and multi-modal graph network
CN112101651B (en) Electric energy network coordination control method, system and information data processing terminal
CN116974249A (en) Flexible job shop scheduling method and flexible job shop scheduling device
CN115360768A (en) Power scheduling method and device based on muzero and deep reinforcement learning and storage medium
CN111160557B (en) Knowledge representation learning method based on double-agent reinforcement learning path search
CN115001978A (en) Cloud tenant virtual network intelligent mapping method based on reinforcement learning model
CN114662204A (en) Elastic bar system structure system data processing method and device based on graph neural network
CN112036936A (en) Deep Q network-based generator bidding behavior simulation method and system
CN111027709A (en) Information recommendation method and device, server and storage medium
CN115566692B (en) Method and device for determining reactive power optimization decision, computer equipment and storage medium
US11973662B1 (en) Intelligent mapping method for cloud tenant virtual network based on reinforcement learning model
CN118101493B (en) Simulation optimizing method, device, equipment and medium for intelligent computation center network architecture
CN117061605B (en) Intelligent lithium battery active information pushing method and device based on end cloud cooperation
CN109933858B (en) Core division parallel simulation method for power distribution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant