CN113132232B - Energy route optimization method
- Publication number: CN113132232B
- Application number: CN202110261579.7A
- Authority: CN (China)
- Legal status: Active
Classifications
- H04L45/14 — Routing performance; Theoretical aspects
- G06N3/045 — Neural networks; Combinations of networks
- G06N3/063 — Physical realisation of neural networks using electronic means
- G06N3/08 — Neural networks; Learning methods
- Y04S10/50 — Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The embodiments of the application disclose an energy route optimization method comprising the following steps: acquiring the network structure formed by the interconnection of N energy routers in an energy internet, together with a state information historical data set and the equipment parameters of each energy router; constructing and training, according to that network structure, a deep reinforcement learning model consisting of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network; and inputting the real-time state information and equipment parameters of each energy router and the newly added load in the energy internet into the deep reinforcement learning model to obtain the optimal power supply energy router and energy routing path. By combining a graph convolutional neural network with deep reinforcement learning, the method learns the energy transfer process among the multiple energy routers in the energy internet, so that the energy transmission line can be optimized quickly and accurately from the real-time data of the energy routers, reducing transmission loss and improving power supply efficiency and reliability.
Description
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an energy route optimization method.
Background
Thanks to its capacity to absorb distributed energy resources such as photovoltaic and wind power and to share energy information, the energy internet has become the development direction of the modern electric power system; within it, energy suppliers and energy consumers can exchange energy freely and on equal terms. As the core device for energy and information exchange in the energy internet, an energy router must provide reasonable supply-node selection and optimal energy routing according to energy consumption demand, so as to meet users' needs while reducing losses in energy transmission.
Currently, the selection of power supply nodes and the optimization of energy routes for energy routers mainly adopt two approaches: routing tables and dynamic optimization algorithms.
With the routing-table approach, each energy router must store information about all energy routers and lines, so that when a newly added load arrives at an energy router, the next node of the energy transfer is determined by querying the routing table. In a real energy internet, however, new-energy generation and load fluctuation cause the state information of each energy router to change dynamically, so the routing table must be updated frequently, increasing the transmission burden of the information flow in the energy internet; moreover, a routing table reflects only line information and cannot take factors such as transmission loss into account. A dynamic optimization algorithm can consider transmission loss when optimizing the supply nodes and energy routes of the energy routers, but once additional factors such as power quality, supply price and voltage stability are included, solving the dynamic optimization problem becomes very complicated, and neither the real-time performance nor the accuracy of the result can be guaranteed.
Disclosure of Invention
In view of the above problems with existing methods, the embodiments of the present application provide an energy route optimization method.
Specifically, the embodiment of the present application provides the following technical solutions:
the embodiment of the application provides an energy route optimization method, which comprises the following steps:
acquiring a network structure formed by interconnection of N energy routers in an energy internet, and a state information historical data set and equipment parameters of each energy router;
constructing a reinforcement learning environment of the energy Internet according to the operation principle of the energy Internet and the energy router, and constructing a deep reinforcement learning model consisting of a graph convolution neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure; the reinforcement learning Q network, the actor network and the critic network all adopt fully-connected neural network structures;
according to the state information historical data set and the equipment parameters of each energy router in the energy internet, integrating the available power supply capacity, power supply quality, reliability and price of each energy router, as well as the number of routers and the length of transmission line that the newly added load must traverse to reach the energy router, constructing an evaluation Q function of the power supply energy router, and, for the energy transfer between every two energy routers, constructing a reward R function of each energy transfer; the evaluation Q function is used for calculating the reward value of the power supply energy router; the reward R function is used for calculating the reward value obtained by executing each energy transfer action;
initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, reinforcement learning Q network, actor network and critic network; simulating the situation of newly added load by adding load at a random energy router, using the state information historical data set and equipment parameters of each energy router in the energy internet as input samples and the evaluation Q function, energy routing path and reward R function of each energy router as output samples, so as to train the deep reinforcement learning model consisting of the graph convolutional neural network, reinforcement learning Q network, actor network and critic network;
for the requirement of new loads in the energy Internet, inputting the real-time state information and equipment parameters of each energy router in the energy Internet and the new loads into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path; the optimal power supply energy router is the energy router with the maximum Q function value, and the optimal energy routing path is output by the actor network.
Optionally, the state information history data set of the energy router includes: load power in the microgrid connected to the energy router, power of the power generation equipment and microgrid voltage.
Optionally, the device parameters of the energy router include: energy transfer efficiency, maximum power generation capacity, and maximum load capacity.
Optionally, the structure of the graph convolutional neural network includes: the number of graph adjacency matrices, the number of graph convolution layers, and the number of convolution kernels per layer of graph convolution.
Optionally, the structure of the reinforcement learning Q network, the actor network and the critic network includes the number of layers of the network, the type of each layer, and the input/output sizes.
Optionally, the evaluation Q function is:
Q(n_load, n_source) = -α_1·l_line - α_2·c_router + α_3·r_source + α_4·p_source + α_5·q_source·ΔP;
where n_load denotes the energy router where the newly added load is located, n_source denotes the energy router supplying power to the energy router node where the newly added load is located, ΔP denotes the newly added load, l_line denotes the total length of the connecting line between n_load and n_source, c_router denotes the number of routers between n_load and n_source, r_source denotes the power supply reliability of n_source, p_source denotes the power supply price per unit energy of n_source, q_source denotes the power supply quality of n_source, and α_1, α_2, α_3, α_4 and α_5 are the coefficients of the 5 evaluation terms, used to adjust their relative weights;
for the energy transfer between every two energy routers, given an energy transfer starting node n_start, an energy receiving node n_end and a load to be transferred ΔP, the corresponding reward R function is:
R(n_start, n_end) = -β_1·L_line - β_2·L_router - β_3·ΔU;
where L_line denotes the energy loss on the line during the energy transfer, L_router denotes the energy loss of the energy routers in the transfer process, ΔU is the absolute value of the voltage deviation caused by the transfer, and β_1, β_2 and β_3 are the coefficients of the 3 reward terms, used to adjust their relative weights.
Optionally, the initialization of the graph convolutional neural network includes: constructing the adjacency matrix A of the graph according to the network structure of the energy internet, and determining the number of graph convolution layers and the number of convolution kernels of each layer;
the initialization of the reinforcement learning Q network includes: determining the number of layers of the network and the type of each layer, where the input size is the same as the output size of the graph convolutional neural network and the output size is N-1, corresponding to the set of all energy routers excluding the router where the load is located;
the initialization of the actor network includes: determining the number of layers of the network and the type of each layer, where the input size is the same as the output size of the graph convolutional neural network and the output size is 1, namely the index of the energy transfer target router;
the initialization of the critic network includes: determining the number of layers of the network and the type of each layer, where the input size is the same as the output size of the graph convolutional neural network and the output size is 1, namely the evaluation value of the energy transfer process.
Optionally, the inputting the real-time state information and the device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path includes:
inputting the real-time state information and equipment parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model, where the graph convolutional neural network performs feature extraction on the input to obtain the feature F_j, j = 1,…,N, of each node;
the reinforcement learning Q network takes the F_j of all nodes as input, obtains an evaluation value Q_j with each node as the supply node, and selects the node with the largest evaluation value as the energy supply node n_source;
the actor network takes F_j and n_source as input and outputs the route a of the next energy transfer;
the critic network takes F_j and n_source as input and gives an evaluation V(s) of the current energy internet state.
Optionally, the forward propagation and backward propagation processes of the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network include:
for the historical data s(t_i), i = 1,…,M, of the M time nodes in the initial state data set, taking each state sample initial value s, adding a load ΔP at a random energy router node n_load and performing the following steps:
feeding the comprehensive information of the N energy routers into the N nodes of the graph convolutional neural network for feature extraction, obtaining the features F_j, j = 1,…,N, of the N energy routers;
feeding F_j, j = 1,…,N, into the Q network to obtain the Q value Q_j of each energy router node, computing the true Q value Q′_j from the evaluation Q function of each power supply node, and updating the Q network parameters θ_Q and the graph neural network parameters θ_G according to the mean square error loss function ε(θ_Q) = Σ_j (Q_j − Q′_j)²;
selecting the node with the largest Q value as the power supply node n_source, then repeating the following steps to train the actor network and the critic network:
feeding F_j, j = 1,…,N, and the selected supply node n_source into the actor network π(·|s; θ_a) to determine an action a, and computing the reward value R(s) of the action with the reward R function;
performing action a, transferring the energy to the next energy router node n′_source, obtaining the energy internet state s′ at the next moment, and feeding s′ into the graph convolutional neural network to obtain F′_j, j = 1,…,N;
substituting F_j, j = 1,…,N, and F′_j, j = 1,…,N, into the critic network to compute the estimates V(s) and V(s′) of the value function, and computing the TD error δ:
δ = R(s) + γV(s′) − V(s);
updating the actor network parameters θ_a and the graph neural network parameters θ_G according to the TD error δ;
updating the critic network parameters θ_c and the graph neural network parameters θ_G according to the mean square error loss function ε(θ_c) = (R(s) + γV(s′) − V(s))²;
taking n′_source as the new supply node n_source and F′_j as the new F_j, and starting the next cycle, until n_source is n_load;
when the preset maximum number of training rounds is reached, the training ends, and the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, reinforcement learning Q network, actor network and critic network are returned.
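The loss computations in the training steps above can be sketched in plain Python (a minimal sketch: the environment and the actual parameter updates are omitted, and the TD error is written in its standard actor-critic form):

```python
def td_error(reward, v_s, v_s_next, gamma=0.99):
    """TD error for the critic: delta = R(s) + gamma * V(s') - V(s)."""
    return reward + gamma * v_s_next - v_s

def critic_loss(reward, v_s, v_s_next, gamma=0.99):
    """Mean-square TD loss epsilon(theta_c), used to update the critic and theta_G."""
    return td_error(reward, v_s, v_s_next, gamma) ** 2

def q_loss(q_pred, q_true):
    """Mean-square error epsilon(theta_Q) between the predicted Q_j and the
    true Q'_j computed from the evaluation Q function."""
    return sum((p - t) ** 2 for p, t in zip(q_pred, q_true))

delta = td_error(reward=-3.6, v_s=1.0, v_s_next=1.2, gamma=0.9)
```

In practice these scalars would be produced by the critic and Q networks; gradients of the two losses then flow back into θ_c, θ_Q and the shared graph network parameters θ_G.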
Optionally, the state sample initial value s includes the state information and device parameters of the N energy routers n_k, k = 1,…,N.
According to the technical scheme, the real-time state information and the equipment parameters of each energy router in the energy internet and the newly added load in the energy internet are input into the deep reinforcement learning model consisting of the graph convolution neural network, the reinforcement learning Q network, the actor network and the critic network, so that the optimal power supply energy router and the optimal energy routing path are obtained. Therefore, the embodiment of the application realizes the rapid and accurate optimization of the energy transmission line by utilizing the real-time data of each energy router in the energy internet, thereby reducing the loss of energy transmission and ensuring the efficiency and reliability of power supply. In addition, in the optimization process, the available power supply capacity, the power supply quality, the reliability and the price of each energy router, the number of the routes required to be passed by newly adding a load to the energy router, the length of the transmission line and other factors are comprehensively considered, and the accuracy of the optimization result is further improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of an energy route optimization method provided in an embodiment of the present application;
fig. 2 is a second flowchart of an energy route optimization method according to an embodiment of the present application;
FIG. 3 is a network architecture diagram of a deep reinforcement learning model according to an embodiment of the present disclosure;
FIG. 4 is a block diagram of a graph convolution neural network provided by an embodiment of the present application;
FIG. 5 is a block diagram of a atlas layer of an atlas neural network provided in an embodiment of the present application;
FIG. 6 is a flow chart of deep reinforcement learning model training provided by an embodiment of the present application;
fig. 7 is a schematic structural diagram of an energy routing optimization apparatus provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Fig. 1 shows a flowchart of an energy route optimization method provided in an embodiment of the present application, fig. 2 is a flowchart of another energy route optimization method provided in the embodiment, fig. 3 is a network structure diagram of the deep reinforcement learning model, fig. 4 is a structure diagram of the graph convolutional neural network, fig. 5 is a structure diagram of a graph convolution layer of the graph convolutional neural network, and fig. 6 is a training flowchart of the deep reinforcement learning model. The energy route optimization method provided by the embodiment of the present application is explained and described in detail below with reference to fig. 1 to 6. As shown in fig. 1, the method specifically includes:
step 101: acquiring a network structure formed by interconnection of N energy routers in an energy internet, and a state information historical data set and equipment parameters of each energy router;
in this step, it should be noted that, first, the network structure in which all N energy routers in the energy internet are interconnected, together with the state information historical data set and the device parameters of each energy router, is acquired. Specifically, for an energy internet system interconnected by N energy routers, the network structure can be obtained from the connection relationships of the electric energy transmission lines among the routers. Taking each energy router as a node of the graph and each connecting line as an edge yields the topological structure of the energy internet. Then, for each energy router node n_i, its device parameters are acquired, including the energy transfer efficiency eff_i, the maximum power generation capacity C_gen,i and the maximum load capacity C_load,i, as well as its state information, including the load power in the microgrid connected to the router, the power of the power generation equipment and the microgrid voltage. Finally, the device parameters and state information of the N energy routers are combined into the comprehensive information of the energy internet system.
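As an illustrative sketch of this step (the class and function names here are assumptions for illustration, not the patent's), the acquired topology and per-router data might be organized as follows:

```python
from dataclasses import dataclass

@dataclass
class EnergyRouter:
    # Device parameters (the patent calls these eff_i, C_gen,i and C_load,i)
    eff: float            # energy transfer efficiency
    c_gen: float          # maximum power generation capacity (kW)
    c_load: float         # maximum load capacity (kW)
    # State information from the connected microgrid
    load_power: float = 0.0
    gen_power: float = 0.0
    voltage: float = 1.0  # per-unit microgrid voltage

def build_topology(n, lines):
    """Adjacency list for n routers, given the (i, j) power-line connections."""
    adj = {i: [] for i in range(n)}
    for i, j in lines:
        adj[i].append(j)
        adj[j].append(i)
    return adj

# A toy 4-router ring: routers are graph nodes, lines are edges.
routers = [EnergyRouter(eff=0.97, c_gen=500.0, c_load=400.0) for _ in range(4)]
adj = build_topology(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

The list of `EnergyRouter` records (parameters plus state) corresponds to the "comprehensive information" of the energy internet system that is later fed to the graph network.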
Step 102: constructing a reinforcement learning environment of the energy Internet according to the operation principle of the energy Internet and the energy router, and constructing a deep reinforcement learning model consisting of a graph convolution neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure; the reinforcement learning Q network, the actor network and the critic network all adopt fully-connected neural network structures;
in this step, it should be noted that the energy internet reinforcement learning environment is built according to a mechanism that energy routers in the energy internet perform energy transfer, an equation of a transfer process of energy between the energy routers is established, and energy loss generated by conversion efficiency of the energy routers, energy loss generated by lines due to impedance, and change of network voltage are mainly considered. The energy internet reinforcement learning environment can simulate an energy transfer process in the energy internet, namely, energy transfer is carried out according to the states of all energy routers in the existing energy internet, and the updated states of all energy routers in the energy internet after transfer are obtained.
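One hop of the simulated transfer can be sketched as follows. This is a simplified, assumed loss model: the patent only states that router conversion-efficiency loss, line impedance loss and voltage change are considered, and the single-phase P = U·I approximation here is the author's own illustration.

```python
def transfer_step(p_in_kw, router_eff, line_r_ohm, voltage_kv):
    """Simulate one hop of energy transfer through a router and a line.

    Returns (power delivered, router conversion loss, line I^2*R loss), all in kW.
    """
    p_after_router = p_in_kw * router_eff        # conversion loss inside the router
    router_loss = p_in_kw - p_after_router
    current_a = p_after_router / voltage_kv      # P = U * I (unity power factor assumed)
    line_loss = (current_a ** 2) * line_r_ohm / 1000.0  # I^2 * R, W -> kW
    return p_after_router - line_loss, router_loss, line_loss

delivered, router_loss, line_loss = transfer_step(100.0, 0.97, 0.5, 10.0)
```

Chaining such steps along a path, and updating each router's state afterwards, gives the "energy transfer according to the states of all energy routers" that the reinforcement learning environment simulates.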
In this step, as shown in fig. 4, the structure diagram of the graph convolutional neural network provided in the embodiment of the present application, the graph convolutional neural network is formed by stacking a plurality of graph convolution layers, each layer taking the output of the previous layer as its input, until the final feature extraction result of the energy internet is obtained. Each graph convolution layer, as shown in fig. 5, takes the feature of each graph node as input and, combined with the adjacency matrix representing the graph structure, performs a graph convolution operation on the node features. The graph convolution operation fuses and extracts the features of nodes that are connected to each other; compared with an ordinary convolutional or fully connected neural network, the extracted features can therefore reflect the structural information of the graph.
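A single graph convolution layer of this kind can be sketched with NumPy. This assumes the common symmetrically normalized propagation rule H′ = ReLU(Â·H·W) with Â = D^(-1/2)(A+I)D^(-1/2); the patent does not spell out its exact rule, so this is an illustrative choice:

```python
import numpy as np

def normalize_adjacency(a):
    """A_hat = D^-1/2 (A + I) D^-1/2: add self-loops, then symmetric normalization."""
    a_tilde = a + np.eye(a.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_tilde.sum(axis=1)))
    return d_inv_sqrt @ a_tilde @ d_inv_sqrt

def gcn_layer(a_hat, h, w):
    """One graph convolution layer: each node's features are fused with its
    neighbours' via A_hat, projected by W, then passed through ReLU."""
    return np.maximum(a_hat @ h @ w, 0.0)

a = np.array([[0.0, 1.0], [1.0, 0.0]])   # 2 connected routers
h = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2 nodes x 2 input features
w = np.eye(2)                            # identity weights, for illustration only
out = gcn_layer(normalize_adjacency(a), h, w)
```

Stacking several such layers, as in fig. 4, lets each router's feature absorb information from progressively larger neighbourhoods of the energy internet graph.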
In this step, it should be noted that a deep reinforcement learning model composed of a graph convolution neural network, a reinforcement learning Q network, an actor network, and a critic network is built according to a network structure of the energy internet. Wherein, the reinforcement learning Q network, the actor network and the critic network all adopt a fully connected neural network structure, and in terms of network output, as shown in fig. 3: the output of the reinforcement learning Q network is the Q value of the corresponding node of each energy router, which represents the reward value of the energy router as the energy supply node, and the larger the value is, the more suitable the energy router is as the energy supply node; the output of the actor network is the next energy router node of energy transfer; the output of the critic network is an evaluation of the energy delivery process.
Step 103: according to the state information historical data set and the equipment parameters of each energy router in the energy Internet, the available power supply capacity, the power supply quality, the reliability and the price of each energy router are integrated, the number of routes and the length of a transmission line which are required to pass by a newly added load to the energy router are increased, an evaluation Q function of the power supply energy router is constructed, and a reward R function of each energy transfer is constructed for the energy transfer between every two energy routers; the evaluation Q function is used for calculating the reward value of the power supply energy router; the reward R function is used to calculate a reward value obtained for performing each energy transfer action;
in this step, it should be noted that, according to the state information historical data set and the device parameters of each energy router in the energy internet, integrating the available power supply capacity, power supply quality, reliability and price of each energy router, as well as the number of routers and the length of transmission line that the newly added load must traverse to reach the router, the evaluation Q function of the power supply energy router is constructed as:
Q(n_load, n_source) = -α_1·l_line - α_2·c_router + α_3·r_source + α_4·p_source + α_5·q_source·ΔP
where n_load denotes the energy router where the newly added load is located, n_source denotes the energy router supplying power to the energy router node where the newly added load is located, ΔP denotes the newly added load, l_line denotes the total length of the connecting line between n_load and n_source, c_router denotes the number of routers between n_load and n_source, r_source denotes the power supply reliability of n_source, p_source denotes the power supply price per unit energy of n_source, q_source denotes the power supply quality of n_source, and α_1, α_2, α_3, α_4 and α_5 are the coefficients of the 5 evaluation terms, used to adjust their relative weights.
In this step, for the energy transfer between every two energy routers, given an energy transfer starting node n_start, an energy receiving node n_end and a load to be transferred ΔP, the corresponding reward R function is:
R(n_start, n_end) = -β_1·L_line - β_2·L_router - β_3·ΔU;
where L_line denotes the energy loss on the line during the energy transfer, L_router denotes the energy loss of the energy routers in the transfer process, ΔU is the absolute value of the voltage deviation caused by the transfer, and β_1, β_2 and β_3 are the coefficients of the 3 reward terms, used to adjust their relative weights. The reward R function is used to calculate the reward value obtained for each energy transfer action, giving the system a direction in which to proceed.
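The two scoring functions can be written directly from the formulas above (a sketch; the weight tuples `alpha` and `beta` stand for the patent's tunable coefficients α_1…α_5 and β_1…β_3, defaulted here to 1 purely for illustration):

```python
def evaluate_q(l_line, c_router, r_source, p_source, q_source, delta_p,
               alpha=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Evaluation Q function: scores a candidate supply router for a new load.
    Following the formula as printed, only the quality term is scaled by ΔP."""
    a1, a2, a3, a4, a5 = alpha
    return (-a1 * l_line - a2 * c_router
            + a3 * r_source + a4 * p_source + a5 * q_source * delta_p)

def reward_r(l_line_loss, l_router_loss, delta_u, beta=(1.0, 1.0, 1.0)):
    """Reward R function: penalizes line loss, router loss and voltage deviation
    for one energy-transfer action."""
    b1, b2, b3 = beta
    return -b1 * l_line_loss - b2 * l_router_loss - b3 * abs(delta_u)
```

During training, `evaluate_q` supplies the "true" Q′_j targets for the Q network, while `reward_r` scores each transfer action for the actor-critic update.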
Step 104: initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, reinforcement learning Q network, actor network and critic network; simulating the situation of newly added load by adding load at a random energy router, and training the deep reinforcement learning model consisting of the graph convolutional neural network, reinforcement learning Q network, actor network and critic network, using the state information historical data set and equipment parameters of all energy routers in the energy internet as input samples and the evaluation Q function, energy routing path and reward R function of each energy router as output samples;
in this step, it should be noted that the initialization of the graph convolutional neural network includes: constructing the adjacency matrix A of the graph according to the network structure of the energy internet, and determining the number of graph convolution layers and the number of convolution kernels of each layer;
the initialization of the reinforcement learning Q network includes: determining the number of layers of the network and the type of each layer, where the input size is the same as the output size of the graph convolutional neural network and the output size is N-1, corresponding to the set of all energy routers excluding the router where the load is located;
the initialization of the actor network includes: determining the number of layers of the network and the type of each layer, where the input size is the same as the output size of the graph convolutional neural network and the output size is 1, namely the index of the energy transfer target router;
the initialization of the critic network includes: determining the number of layers of the network and the type of each layer, where the input size is the same as the output size of the graph convolutional neural network and the output size is 1, namely the evaluation value of the energy transfer process.
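Under these rules, the shapes of the three fully connected networks might be set up as follows (a sketch with illustrative sizes; the layer counts, hidden widths and the way n_source is appended to the input are assumptions, only the output sizes follow the text above):

```python
import numpy as np

def init_mlp(sizes, rng):
    """Initialize a fully connected network as a list of (W, b) layer pairs."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
n_routers, feat_dim = 6, 16   # illustrative: N routers, GCN output feature size

# Q network: GCN features in, one Q value per candidate router (output size N-1).
q_net = init_mlp([feat_dim, 32, n_routers - 1], rng)
# Actor: features plus selected supply node in, next-hop router index out (size 1).
actor = init_mlp([feat_dim + 1, 32, 1], rng)
# Critic: same input, scalar evaluation of the transfer process out (size 1).
critic = init_mlp([feat_dim + 1, 32, 1], rng)
```

The initialized parameter lists correspond to θ_Q, θ_a and θ_c; θ_G would be the weight matrices of the graph convolution layers, shaped by the adjacency matrix A.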
In this step, the situation of newly added load is simulated by adding load at a random energy router, and the deep reinforcement learning model consisting of the graph convolutional neural network, reinforcement learning Q network, actor network and critic network is trained using the state information historical data set and device parameters of each energy router in the energy internet as input samples and the evaluation Q function, energy routing path and reward R function of each energy router as output samples. By combining the graph convolutional neural network with deep reinforcement learning to learn the energy transfer process among the multiple energy routers, and by integrating the online monitoring state data of the energy routers, the supply nodes and energy routing paths can be selected and optimized quickly and accurately, ensuring the power supply efficiency and reliability of the energy internet.
Step 105: for the requirement of new loads in the energy Internet, inputting the real-time state information and equipment parameters of each energy router in the energy Internet and the new loads into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path; the optimal power supply energy router is the energy router with the maximum Q function value, and the optimal energy routing path is output by the actor network.
In this step, as shown in fig. 6, the deep reinforcement learning model takes the real-time state information of each energy router of the energy internet, the device parameters, and the newly added load of a certain node as inputs. First, the graph convolution neural network G(θ_G) performs feature extraction on the input to obtain the features F_j, j = 1, ..., N, of each node. Then the reinforcement learning Q network Q(θ_Q) takes the F_j of all nodes as input to obtain an evaluation value Q_j with each node as a power supply node, and the node with the largest evaluation value is selected as the power supply node n_source. The actor network π(·|s; θ_a) takes F_j and n_source as input, and its action output is the route a for the next energy transfer. The critic network V(s; θ_c) takes F_j and n_source as input and gives an evaluation V(s) of the current energy internet state.
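A minimal numerical sketch of this inference pipeline (the toy 5-router ring topology, random untrained weights, single graph convolution layer, and restriction of the actor's choice to neighbours of n_source are all assumptions for illustration, not the patent's exact model):

```python
import numpy as np

rng = np.random.default_rng(1)
N, F_IN, F_OUT = 5, 4, 8                    # assumed: 5 routers, feature sizes

A = np.array([[0, 1, 0, 0, 1],              # assumed 5-router ring adjacency
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)
X = rng.standard_normal((N, F_IN))          # per-router state + device parameters

# G(theta_G): one graph convolution, F_j = ReLU(D^-1 (A+I) X W)
A_hat = A + np.eye(N)
D_inv = np.diag(1.0 / A_hat.sum(axis=1))
W_g = rng.standard_normal((F_IN, F_OUT))
F = np.maximum(D_inv @ A_hat @ X @ W_g, 0.0)      # node features F_j

# Q(theta_Q): evaluation value per candidate supply node; take the argmax
W_q = rng.standard_normal((F_OUT, 1))
Q = (F @ W_q).ravel()
n_source = int(np.argmax(Q))                # power supply node

# actor pi(.|s; theta_a): next-hop choice, masked to neighbours of n_source
W_a = rng.standard_normal((F_OUT, 1))
logits = (F @ W_a).ravel()
mask = A[n_source] > 0
probs = np.where(mask, np.exp(logits - logits.max()), 0.0)
probs /= probs.sum()
a = int(np.argmax(probs))                   # route of the next energy transfer

# critic V(s; theta_c): scalar evaluation of the current energy internet state
W_c = rng.standard_normal((F_OUT, 1))
V = (F.mean(axis=0) @ W_c).item()
```

With trained parameters, Q would rank the candidate supply routers and the actor would emit the hop-by-hop route; here the weights are random, so only the data flow is meaningful.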
According to the technical scheme, the real-time state information and device parameters of each energy router in the energy internet and the newly added load in the energy internet are input into the deep reinforcement learning model consisting of the graph convolution neural network, the reinforcement learning Q network, the actor network and the critic network, so as to obtain the optimal power supply energy router and the optimal energy routing path. Therefore, the embodiment of the application realizes rapid and accurate optimization of the energy transmission line by utilizing the real-time data of each energy router in the energy internet, thereby reducing the loss of energy transmission and ensuring the efficiency and reliability of power supply. In addition, in the optimization process, the available power supply capacity, the power supply quality, the reliability and the price of each energy router, the number of routers that the newly added load must traverse to reach each energy router, the length of the transmission lines, and other factors are comprehensively considered, further improving the accuracy of the optimization result.
Based on the content of the foregoing embodiment, in this embodiment, the state information history data set of the energy router includes: load power in the microgrid connected to the energy router, power of the power generation equipment and microgrid voltage.
In the present embodiment, the historical data set of the state information of the energy router in the energy internet includes the load power in the microgrid connected to the energy router, the power of the power generation equipment, and the microgrid voltage. Preferably, the state data of the energy routers can be monitored online to ensure the accuracy of each energy routing optimization. By contrast, existing routing-table optimization methods need to update the routing table frequently whenever new energy sources generate electricity or the load fluctuates, which increases the transmission burden of information flow in the energy internet.
Based on the content of the foregoing embodiment, in this embodiment, the device parameters of the energy router include: energy transfer efficiency, maximum power generation capacity, and maximum load capacity.
In this embodiment, it should be noted that the device parameters of the energy router include: energy transfer efficiency, maximum power generation capacity, and maximum load capacity. This has the advantage that the device parameter information of each energy router in the energy internet is obtained at each energy routing optimization, providing a multi-dimensional reference for the selection of the power supply energy router.
Based on the content of the foregoing embodiment, in the present embodiment, the structure of the graph convolution neural network includes: the adjacency matrix of the graph, the number of graph convolution layers, and the number of convolution kernels in each graph convolution layer.
In this embodiment, it should be noted that the graph convolution neural network is widely applied in many fields and performs well on graph-structured data, such as wireless network node selection and urban road traffic prediction. The energy internet formed by connecting a plurality of energy routers is a typical graph structure, and a graph convolution neural network can perform efficient feature extraction on its operation information. As shown in fig. 4, the structure of the graph convolution neural network includes the adjacency matrix of the graph, the number of graph convolution layers, and the number of convolution kernels in each graph convolution layer.
Based on the contents of the above embodiments, in the present embodiment, the structures of the reinforcement learning Q network, the actor network, and the critic network include the number of layers of the network, the type of each layer, and the input/output sizes.
In this embodiment, the structures of the reinforcement learning Q network, the actor network and the critic network include the number of layers of the network, the type of each layer and the input/output sizes.
Based on the content of the foregoing embodiment, in the present embodiment, the evaluation Q function is:
Q(n_load, n_source) = -α1·l_line - α2·c_router + α3·r_source + α4·p_source + α5·q_source·ΔP;
wherein n_load denotes the energy router where the newly added load is located, n_source denotes the energy router supplying power to the energy router node where the newly added load is located, ΔP denotes the newly added load, l_line denotes the total length of the connecting line between n_load and n_source, c_router denotes the number of routers between n_load and n_source, r_source denotes the power supply reliability of n_source, p_source denotes the power supply price per unit energy of n_source, q_source denotes the power supply quality of n_source, and α1, α2, α3, α4 and α5 are the coefficients of the five evaluation terms, used to adjust their relative weights;
for the energy transfer between every two energy routers, given an energy transfer starting node n_start, an energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:
R(n_start, n_end) = -β1·L_line - β2·L_router - β3·ΔU;
wherein L_line represents the energy loss on the line during the energy transfer, L_router represents the energy loss of the energy routers in the energy transfer process, ΔU is the absolute value of the voltage deviation caused by the transfer process, and β1, β2 and β3 are the coefficients of the three reward terms, used to adjust their relative weights.
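Both scoring functions translate directly into code. The unit default weights for α and β below are placeholders (the patent leaves them as tunable coefficients), and the function names are assumed for illustration:

```python
def evaluation_q(l_line, c_router, r_source, p_source, q_source, delta_p,
                 alpha=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Evaluation Q function of a candidate power supply router:
    penalize line length and hop count, reward reliability, price and quality."""
    a1, a2, a3, a4, a5 = alpha
    return (-a1 * l_line - a2 * c_router + a3 * r_source
            + a4 * p_source + a5 * q_source * delta_p)

def reward_r(loss_line, loss_router, delta_u, beta=(1.0, 1.0, 1.0)):
    """Reward R function for one energy-transfer hop: penalize line loss,
    router conversion loss, and the absolute voltage deviation."""
    b1, b2, b3 = beta
    return -b1 * loss_line - b2 * loss_router - b3 * abs(delta_u)
```

For example, a candidate with l_line = 2, c_router = 1, r_source = 0.9, p_source = 0.5, q_source = 1.0 and ΔP = 2.0 scores 0.4 under unit weights; the α and β vectors let the operator trade these terms off against each other.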
In this embodiment, it should be noted that an evaluation Q function of the power supply energy router is constructed according to the historical state information data set and the device parameters of each energy router in the energy internet, integrating the available power supply capacity, the power supply quality, the reliability and the price of each energy router, and the number of routers and the length of transmission lines between the newly added load and each energy router; the Q function is:
Q(n_load, n_source) = -α1·l_line - α2·c_router + α3·r_source + α4·p_source + α5·q_source·ΔP
wherein n_load denotes the energy router where the newly added load is located, n_source denotes the energy router supplying power to the energy router node where the newly added load is located, ΔP denotes the newly added load, l_line denotes the total length of the connecting line between n_load and n_source, c_router denotes the number of routers between n_load and n_source, r_source denotes the power supply reliability of n_source, p_source denotes the power supply price per unit energy of n_source, q_source denotes the power supply quality of n_source, and α1, α2, α3, α4 and α5 are the coefficients of the five evaluation terms, used to adjust their relative weights.
In the present embodiment, for the energy transfer between every two energy routers, given an energy transfer starting node n_start, an energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:
R(n_start, n_end) = -β1·L_line - β2·L_router - β3·ΔU;
wherein L_line represents the energy loss on the line during the energy transfer, L_router represents the energy loss of the energy routers in the energy transfer process, ΔU is the absolute value of the voltage deviation caused by the transfer process, and β1, β2 and β3 are the coefficients of the three reward terms, used to adjust their relative weights. The reward R function is used to calculate the reward value obtained for each energy transfer action, giving the system a direction in which to advance.
Based on the content of the foregoing embodiment, in this embodiment, the initializing graph convolutional neural network includes:
constructing an adjacency matrix of the graph according to the network structure of the energy Internet, and determining the number of graph convolution layers and the number of convolution kernels of each layer of graph convolution; the construction method of the adjacency matrix A comprises the following steps:
initialization of a reinforcement learning Q network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolution neural network, and the output size is N-1, corresponding to the set of all energy routers other than the energy router itself;
initialization of an actor network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolution neural network, the output size is 1, namely the number of the energy transfer target router;
initialization of the critic network includes: and determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolution neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
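The explicit construction steps of the adjacency matrix A are not reproduced in this excerpt. A common convention for an undirected energy internet graph — an assumption here, not necessarily the patent's exact recipe — is to set A_ij = 1 when routers i and j share a transmission line, and to add self-loops and normalize before graph convolution:

```python
import numpy as np

def build_adjacency(n_routers, lines):
    """lines: iterable of (i, j) transmission-line connections between routers."""
    A = np.zeros((n_routers, n_routers))
    for i, j in lines:
        A[i, j] = A[j, i] = 1.0          # undirected energy internet graph
    return A

def normalize(A):
    """Symmetrically normalized adjacency with self-loops:
    D^-1/2 (A + I) D^-1/2, as commonly used in graph convolution layers."""
    A_hat = A + np.eye(len(A))
    d = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d[:, None] * d[None, :]

# four routers connected in a ring: 0-1, 1-2, 2-3, 3-0
A = build_adjacency(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
A_norm = normalize(A)
```

The normalized form keeps feature magnitudes stable as layers are stacked, which matters once the number of graph convolution layers grows.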
In this embodiment, it should be noted that the initialization of the graph convolutional neural network includes:
constructing an adjacency matrix of the graph according to the network structure of the energy Internet, and determining the number of graph convolution layers and the number of convolution kernels of each layer of graph convolution; the construction method of the adjacency matrix A comprises the following steps:
initialization of a reinforcement learning Q network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolution neural network, and the output size is N-1, corresponding to the set of all energy routers other than the energy router itself;
initialization of an actor network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolution neural network, the output size is 1, namely the number of the energy transfer target router;
initialization of the critic network, including: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolution neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
Based on the content of the foregoing embodiment, in this embodiment, the inputting the real-time state information and the device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model to obtain an optimal power supply energy router and energy routing path includes:
inputting the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model, where the graph convolution neural network performs feature extraction on the input to obtain the features F_j, j = 1, ..., N, of each node;
the reinforcement learning Q network takes the F_j of all nodes as input to obtain an evaluation value Q_j with each node as a power supply node, and selects the node with the largest evaluation value as the energy supply node n_source;
the actor network takes F_j and n_source as input, and its output is the route a of the next energy transfer;
the critic network takes F_j and n_source as input and gives an evaluation V(s) of the current energy internet state.
In this embodiment, it should be noted that the deep reinforcement learning model takes the real-time state information of each energy router of the energy internet, the device parameters, and the newly added load of a certain node as input. First, the graph convolution neural network G(θ_G) performs feature extraction on the input to obtain the features F_j, j = 1, ..., N, of each node. Then the reinforcement learning Q network Q(θ_Q) takes the F_j of all nodes as input to obtain an evaluation value Q_j with each node as a power supply node, and the node with the largest evaluation value is selected as the power supply node n_source. The actor network π(·|s; θ_a) takes F_j and n_source as input, and its action output is the route a for the next energy transfer. The critic network V(s; θ_c) takes F_j and n_source as input and gives an evaluation V(s) of the current energy internet state.
Based on the content of the foregoing embodiment, in this embodiment, the forward propagation and backward propagation processes of the deep reinforcement learning model composed of the graph convolution neural network, the reinforcement learning Q network, the actor network, and the critic network include:
for the historical data s(t_i), i = 1, ..., M, of the M time nodes in the initial state data set, taking each state sample initial value s, increasing the load ΔP at a random energy router node n_load, and performing the following steps:
respectively sending the comprehensive information of the N energy routers into the N nodes of the graph convolution neural network for feature extraction to obtain the features F_j, j = 1, ..., N, of the N energy routers;
feeding F_j, j = 1, ..., N, into the Q network to obtain the Q value Q_j of each energy router node, calculating the true Q value Q'_j, j = 1, ..., N, from the evaluation Q function of each power supply node, and updating the Q network parameters θ_Q and the graph neural network parameters θ_G according to the mean square error loss function ε(θ_Q) = Σ_j (Q_j - Q'_j)²;
selecting the node with the maximum Q value as the power supply node n_source, then repeating the following steps to train the actor network and the critic network:
feeding F_j, j = 1, ..., N, and the selected power supply node n_source into the actor network π(·|s; θ_a) to determine the action a, and calculating the reward R(s) of the action via the reward R function;
performing the action a, transferring energy to the next energy router node n_source′, obtaining the energy internet state s′ at the next moment, and feeding s′ into the graph convolution neural network to obtain F_j′, j = 1, ..., N;
substituting F_j, j = 1, ..., N, and F_j′, j = 1, ..., N, into the critic network to calculate the estimates V(s) and V(s′) of the value function, and calculating the TD error δ:
δ = R(s) + γV(s′) - V(s);
updating the actor network parameters θ_a and the graph neural network parameters θ_G according to the TD error δ;
updating the critic network parameters θ_c and the graph neural network parameters θ_G according to the mean square error loss function ε(θ_c) = (R(s) + γV(s′) - V(s))²;
taking n_source′ as the new power supply node n_source and F_j′ as the new F_j, and starting the next cycle, until n_source′ is n_load;
when the preset maximum number of training rounds is reached, the training ends, and the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network are returned.
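The two loss functions and the TD error used in these updates can be written directly. The discount factor γ and the function names are assumptions for illustration (the excerpt does not fix γ's value):

```python
import numpy as np

def q_loss(q_pred, q_true):
    """Mean square error eps(theta_Q) = sum_j (Q_j - Q'_j)^2 over all nodes."""
    return float(np.sum((np.asarray(q_pred) - np.asarray(q_true)) ** 2))

def td_error(r, v_s, v_s_next, gamma=0.99):
    """One-step TD error delta = R(s) + gamma * V(s') - V(s)."""
    return r + gamma * v_s_next - v_s

def critic_loss(r, v_s, v_s_next, gamma=0.99):
    """eps(theta_c): squared TD error, driving updates of theta_c and theta_G."""
    return td_error(r, v_s, v_s_next, gamma) ** 2
```

In an actual training loop these scalars would be backpropagated through the critic and graph convolution networks; here they only document the update targets of each step above.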
In this embodiment, it should be noted that the forward and backward propagation processes of the deep reinforcement learning model are as follows:
given a maximum number of training rounds, the following steps are then performed round by round:
for the historical data s(t_i), i = 1, ..., M, of the M time nodes in the initial state data set, taking each state sample initial value s, increasing the load ΔP at a random energy router node n_load, and performing the following steps:
respectively sending the comprehensive information of the N energy routers into the N nodes of the graph convolutional neural network for feature extraction to obtain the features F_j, j = 1, ..., N, of the N energy routers;
feeding F_j, j = 1, ..., N, into the Q network to obtain the Q value Q_j of each energy router node, calculating the true Q value Q'_j, j = 1, ..., N, from the evaluation Q function of each power supply node, and updating the Q network parameters θ_Q and the graph neural network parameters θ_G according to the mean square error loss function ε(θ_Q) = Σ_j (Q_j - Q'_j)²;
selecting the node with the maximum Q value as the power supply node n_source, then repeating the following steps to train the actor network and the critic network:
feeding F_j, j = 1, ..., N, and the selected power supply node n_source into the actor network π(·|s; θ_a) to determine the action a, and calculating the reward R(s) of the action via the reward R function;
performing the action a, transferring energy to the next energy router node n_source′, obtaining the energy internet state s′ at the next moment, and feeding s′ into the graph convolution neural network to obtain F_j′, j = 1, ..., N;
substituting F_j, j = 1, ..., N, and F_j′, j = 1, ..., N, into the critic network to calculate the estimates V(s) and V(s′) of the value function, and calculating the TD error δ:
δ = R(s) + γV(s′) - V(s);
updating the actor network parameters θ_a and the graph neural network parameters θ_G according to the TD error δ;
updating the critic network parameters θ_c and the graph neural network parameters θ_G according to the mean square error loss function ε(θ_c) = (R(s) + γV(s′) - V(s))²;
taking n_source′ as the new power supply node n_source and F_j′ as the new F_j, and starting the next cycle, until n_source′ is n_load;
when the preset maximum number of training rounds is reached, the training ends, and the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network are returned.
For a newly added load at a certain energy router node, the energy internet information and the load information are input into the graph convolution neural network, and the optimal power supply node is obtained through the Q network; the energy internet features and the power supply node information are input into the actor network to obtain the next node of the energy transfer, until the energy is transferred to the node where the newly added load is located; the output process of the actor network is further arranged into an optimal energy route, which is sent through the information network to the energy routers along the path to establish the actual energy transmission route.
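The rollout described here — querying the actor hop by hop until the load node is reached — can be sketched as follows, where `actor_step` is a hypothetical stand-in for the trained actor network and `max_hops` guards against non-terminating routes:

```python
def roll_out_route(n_source, n_load, actor_step, max_hops=20):
    """Assemble the optimal energy route by repeatedly asking the actor
    for the next energy-transfer node until the load node is reached."""
    route = [n_source]
    node = n_source
    for _ in range(max_hops):
        if node == n_load:
            break
        node = actor_step(node)      # next node chosen by the actor network
        route.append(node)
    return route

# toy actor on a 4-router ring: always step to the next router, (node + 1) % 4
route = roll_out_route(0, 2, lambda n: (n + 1) % 4)
print(route)   # [0, 1, 2]
```

The resulting node sequence is exactly the route that would be sent through the information network to establish the actual energy transmission path.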
Based on the same inventive concept, another embodiment of the present invention provides an energy route optimization device, as shown in fig. 7, the energy route optimization device provided in the embodiment of the present application includes:
the first processing module 1 is configured to acquire the network structure formed by the interconnection of the N energy routers in the energy internet, and the state information historical data set and device parameters of each energy router;
the second processing module 2 is used for constructing a reinforcement learning environment of the energy internet according to the operation principle of the energy internet and the energy router, and constructing a deep reinforcement learning model consisting of a graph convolution neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure; the reinforcement learning Q network, the actor network and the critic network all adopt fully-connected neural network structures;
the third processing module 3 is configured to construct an evaluation Q function of the power supply energy router according to the state information historical data set and the device parameters of each energy router in the energy internet, integrating the available power supply capacity, the power supply quality, the reliability and the price of each energy router, and the number of routers and the length of transmission lines between the newly added load and each energy router, and to construct a reward R function of each energy transfer for the energy transfer between every two energy routers; the evaluation Q function is used for calculating the reward value of the power supply energy router; the reward R function is used for calculating the reward value obtained by performing each energy transfer action;
a fourth processing module 4, configured to initialize the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolution neural network, the reinforcement learning Q network, the actor network and the critic network; and to simulate the situation of newly added loads by adding loads to random energy routers, training the deep reinforcement learning model consisting of the graph convolution neural network, the reinforcement learning Q network, the actor network and the critic network by using the state information historical data sets and device parameters of all energy routers in the energy internet as input samples and using the evaluation Q functions, energy routing paths and reward R functions of all the energy routers as output samples;
the fifth processing module 5 is configured to input the real-time state information and the device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path; the optimal power supply energy router is the energy router with the maximum Q function value, and the optimal energy routing path is output by the actor network.
In this embodiment, it should be noted that, first, the network structure in which all N energy routers in the energy internet are interconnected, and the state information history data set and device parameters of each energy router, are obtained. Specifically, for an energy internet system interconnected by N energy routers, the network structure of the energy internet can be obtained according to the connection relationship of the electric energy transmission lines among the energy routers. Further, taking the energy routers as the nodes of the graph and the connecting lines as its edges, the topological structure of the energy internet can be obtained. Then, for each energy router node n_i, its device parameters are acquired, including the energy transfer efficiency eff_i, the maximum power generation capacity C_gen_i and the maximum load capacity C_load_i, and its state information is acquired, including the load power in the microgrid connected to the energy router, the power of the power generation equipment and the microgrid voltage. Finally, the device parameters and state information of the N energy routers are combined into the comprehensive information of the energy internet system.
In the embodiment, the energy internet reinforcement learning environment is built according to the mechanism of energy transmission of the energy routers in the energy internet, an equation of the energy transmission process among the energy routers is established, and energy loss generated by the conversion efficiency of the energy routers, energy loss of lines due to impedance and change of network voltage are mainly considered. The energy internet reinforcement learning environment can simulate an energy transfer process in the energy internet, namely, energy transfer is carried out according to the states of all energy routers in the existing energy internet, and the updated states of all energy routers in the energy internet after transfer are obtained.
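The transfer equations themselves are not reproduced in this excerpt; the sketch below only illustrates the three effects named above — converter loss at the energy router, impedance (I²R) loss on the line, and the resulting voltage change — using simplified placeholder relations, not the patent's actual environment model:

```python
def transfer_step(delta_p, eff, line_resistance, voltage):
    """One simulated energy transfer between two routers.
    delta_p: power sent [kW]; eff: router conversion efficiency;
    line_resistance: line impedance [ohm]; voltage: line voltage [V].
    Returns (power received, line loss, voltage deviation) under a
    simplified DC-style model (an illustrative assumption)."""
    p_out = delta_p * eff                        # energy router conversion loss
    current = p_out / voltage
    line_loss = current ** 2 * line_resistance   # impedance loss on the line
    delta_u = current * line_resistance          # voltage drop along the line
    return p_out - line_loss, line_loss, delta_u

received, loss, du = transfer_step(100.0, 0.95, 0.1, 400.0)
```

An environment built from steps like this can update the states of all energy routers after each transfer, which is what the reinforcement learning environment described above is required to do.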
In this embodiment, as shown in fig. 4, in the structure diagram of the graph convolution neural network provided in the embodiment of the present application, the graph convolution neural network is formed by connecting a plurality of graph convolution layers, and each graph convolution layer takes the output of the previous graph convolution layer as its input, until the final feature extraction result of the energy internet is obtained. Each graph convolution layer in the graph convolution neural network is as shown in fig. 5: the graph convolution layer takes the features of each graph node as input and combines them with the adjacency matrix that characterizes the graph structure to perform the graph convolution operation on the graph node features. The graph convolution operation fuses and extracts the features of nodes that have a connection relationship, so that, compared with an ordinary convolutional neural network or a fully connected neural network, the extracted features can reflect the structural information of the graph.
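A sketch of this layer stacking, assuming a row-normalized adjacency with self-loops and ReLU activations (common choices, not stated in the excerpt), with each layer feeding the next:

```python
import numpy as np

def gcn_layer(A_hat, H, W):
    """One graph convolution: fuse the features of connected nodes via the
    normalized adjacency, project with W, then apply ReLU."""
    return np.maximum(A_hat @ H @ W, 0.0)

rng = np.random.default_rng(2)
N = 4
A = np.array([[0, 1, 1, 0],        # assumed 4-router topology
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
A_hat = (A + np.eye(N)) / (A + np.eye(N)).sum(axis=1, keepdims=True)

H = rng.standard_normal((N, 3))    # initial node features (assumed size)
for W in (rng.standard_normal((3, 8)), rng.standard_normal((8, 8))):
    H = gcn_layer(A_hat, H, W)     # output of each layer is the next layer's input
```

After two layers, each node's feature vector mixes information from its two-hop neighbourhood, which is why the extracted features reflect the structural information of the graph.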
In this embodiment, a deep reinforcement learning model composed of a graph convolution neural network, a reinforcement learning Q network, an actor network, and a critic network is built according to a network structure of an energy internet. Wherein, the reinforcement learning Q network, the actor network and the critic network all adopt a fully connected neural network structure, and in terms of network output, as shown in fig. 3: the output of the reinforcement learning Q network is the Q value of the corresponding node of each energy router, which represents the reward value of the energy router as the energy supply node, and the larger the value is, the more suitable the energy router is as the energy supply node; the output of the actor network is the next energy router node of energy transfer; the output of the critic network is an evaluation of the energy delivery process.
In this embodiment, it should be noted that an evaluation Q function of the power supply energy router is constructed according to the historical state information data set and the device parameters of each energy router in the energy internet, integrating the available power supply capacity, the power supply quality, the reliability and the price of each energy router, and the number of routers and the length of transmission lines between the newly added load and each energy router; the Q function is:
Q(n_load, n_source) = -α1·l_line - α2·c_router + α3·r_source + α4·p_source + α5·q_source·ΔP
wherein n_load denotes the energy router where the newly added load is located, n_source denotes the energy router supplying power to the energy router node where the newly added load is located, ΔP denotes the newly added load, l_line denotes the total length of the connecting line between n_load and n_source, c_router denotes the number of routers between n_load and n_source, r_source denotes the power supply reliability of n_source, p_source denotes the power supply price per unit energy of n_source, q_source denotes the power supply quality of n_source, and α1, α2, α3, α4 and α5 are the coefficients of the five evaluation terms, used to adjust their relative weights.
In the present embodiment, for the energy transfer between every two energy routers, given an energy transfer starting node n_start, an energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:
R(n_start, n_end) = -β1·L_line - β2·L_router - β3·ΔU;
wherein L_line represents the energy loss on the line during the energy transfer, L_router represents the energy loss of the energy routers in the energy transfer process, ΔU is the absolute value of the voltage deviation caused by the transfer process, and β1, β2 and β3 are the coefficients of the three reward terms, used to adjust their relative weights. The reward R function is used to calculate the reward value obtained for each energy transfer action, giving the system a direction in which to advance.
In this embodiment, it should be noted that the initializing of the graph convolution neural network includes:
constructing an adjacency matrix of the graph according to the network structure of the energy Internet, and determining the number of graph convolution layers and the number of convolution kernels of each layer of graph convolution; the construction method of the adjacency matrix A comprises the following steps:
initialization of a reinforcement learning Q network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolution neural network, and the output size is N-1, corresponding to the set of all energy routers other than the energy router itself;
initialization of an actor network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolution neural network, the output size is 1, namely the number of the energy transfer target router;
initialization of the critic network, including: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolution neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
In the embodiment, the situation of newly added loads is simulated by adding loads to random energy routers, and the deep reinforcement learning model consisting of the graph convolution neural network, the reinforcement learning Q network, the actor network and the critic network is trained using the real-time state information and device parameters of each energy router in the energy internet as input samples, and using the evaluation Q function, the energy routing path and the reward R function of each energy router as output samples. Therefore, the embodiment of the invention learns the energy transfer process among the multiple energy routers in the energy internet by combining the graph convolution neural network with deep reinforcement learning, selects and optimizes the energy supply nodes and the energy routing paths of the energy routers, and can ensure the power supply efficiency and reliability of the energy internet.
Moreover, by integrating the online monitoring state data of the energy routers, the energy routing in the energy internet can be selected and optimized quickly and accurately, and the power supply efficiency and reliability of the energy internet can be guaranteed.
In this embodiment, as shown in fig. 6, the deep reinforcement learning model takes the state information of each energy router of the energy internet, the device parameters, and the newly added load of a certain node as inputs. First, the graph convolution neural network G(θ_G) performs feature extraction on the input to obtain the features F_j, j = 1, ..., N, of each node. Then the reinforcement learning Q network Q(θ_Q) takes the F_j of all nodes as input to obtain an evaluation value Q_j with each node as a power supply node, and the node with the largest evaluation value is selected as the power supply node n_source. The actor network π(·|s; θ_a) takes F_j and n_source as input, and its action output is the route a for the next energy transfer. The critic network V(s; θ_c) takes F_j and n_source as input and gives an evaluation V(s) of the current energy internet state.
According to this technical scheme, the real-time state information and device parameters of each energy router in the energy internet, together with the newly added load, are input into the deep reinforcement learning model consisting of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network, which outputs the optimal power supply energy router and the optimal energy routing path. The embodiment of the application therefore uses real-time data of each energy router to optimize the energy transmission line quickly and accurately, reducing energy transmission losses and ensuring the efficiency and reliability of the power supply. Moreover, the optimization comprehensively considers the available supply capacity, supply quality, reliability and price of each energy router, as well as the number of routers the newly added load must pass through and the length of the transmission lines, further improving the accuracy of the result.
The energy route optimization device described in this embodiment may be used to implement the above method embodiments, and the principle and technical effect are similar, which are not described herein again.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which, with reference to the schematic structural diagram of the electronic device shown in fig. 8, specifically includes the following contents: a processor 801, memory 802, communication interface 803, and communication bus 804;
the processor 801, the memory 802 and the communication interface 803 complete mutual communication through the communication bus 804; the communication interface 803 is used for realizing information transmission between devices;
the processor 801 is configured to call a computer program in the memory 802, and when the processor executes the computer program, it implements all the steps of the energy route optimization method, for example: acquiring the network structure formed by the interconnection of N energy routers in the energy internet, together with the state information historical data set and device parameters of each energy router; constructing a reinforcement learning environment of the energy internet according to the operation principle of the energy internet and the energy routers, and constructing, according to the network structure, a deep reinforcement learning model consisting of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network, where the reinforcement learning Q network, the actor network and the critic network all adopt fully-connected neural network structures; according to the state information historical data sets and device parameters of the energy routers in the energy internet, and considering the available power supply capacity, power supply quality, reliability and price of each energy router as well as the number of routers the newly added load must pass through and the length of the transmission line, constructing an evaluation Q function of the power supply energy router and, for the energy transfer between every two energy routers, a reward R function of each energy transfer, where the evaluation Q function is used to calculate the reward value of the power supply energy router and the reward R function is used to calculate the reward value obtained by executing each energy transfer action; initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, reinforcement learning Q network, actor network and critic network; simulating newly added loads by adding loads to random energy routers, and training the deep reinforcement learning model using the state information historical data sets and device parameters of all energy routers in the energy internet as input samples and the evaluation Q functions, energy routing paths and reward R functions of the energy routers as output samples; and inputting the state information historical data sets and device parameters of all energy routers, together with the newly added load in the energy internet, into the deep reinforcement learning model to obtain the optimal power supply energy router and the optimal energy routing path, where the optimal power supply energy router is the energy router with the largest Q-function value and the optimal energy routing path is output by the actor network.
Based on the same inventive concept, yet another embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements all the steps of the above energy route optimization method, for example: acquiring the network structure formed by the interconnection of N energy routers in the energy internet, together with the state information historical data set and device parameters of each energy router; constructing a reinforcement learning environment of the energy internet according to the operation principle of the energy internet and the energy routers, and constructing, according to the network structure, a deep reinforcement learning model consisting of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network, where the reinforcement learning Q network, the actor network and the critic network all adopt fully-connected neural network structures; according to the state information historical data sets and device parameters of the energy routers in the energy internet, and considering the available power supply capacity, power supply quality, reliability and price of each energy router as well as the number of routers the newly added load must pass through and the length of the transmission line, constructing an evaluation Q function of the power supply energy router and, for the energy transfer between every two energy routers, a reward R function of each energy transfer, where the evaluation Q function is used to calculate the reward value of the power supply energy router and the reward R function is used to calculate the reward value obtained by executing each energy transfer action; initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, reinforcement learning Q network, actor network and critic network; simulating newly added loads by adding loads to random energy routers, and training the deep reinforcement learning model using the state information historical data sets and device parameters of all energy routers in the energy internet as input samples and the evaluation Q functions, energy routing paths and reward R functions of the energy routers as output samples; and inputting the state information historical data sets and device parameters of all energy routers, together with the newly added load in the energy internet, into the deep reinforcement learning model to obtain the optimal power supply energy router and the optimal energy routing path, where the optimal power supply energy router is the energy router with the largest Q-function value and the optimal energy routing path is output by the actor network.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be essentially or partially implemented in the form of software products, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the energy route optimization method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (8)
1. A method for energy routing optimization, comprising:
acquiring a network topology formed by interconnection of N energy routers in an energy internet, and a state information historical data set and equipment parameters of each energy router;
constructing a reinforcement learning environment of the energy internet according to the mechanism of energy transfer of the energy internet and the energy router, and constructing a deep reinforcement learning model consisting of a graph convolution neural network, a reinforcement learning Q network, an actor network and a critic network according to the network topology; the reinforcement learning Q network, the actor network and the critic network all adopt a fully-connected neural network structure;
constructing, according to the state information historical data set and the device parameters of each energy router in the energy internet, the available power supply capacity, power supply quality, reliability and price of each energy router, and the number of routers the newly added load must pass through to reach the energy router and the length of the transmission line, an evaluation Q function of the power supply energy router, and constructing, for the energy transfer between every two energy routers, a reward R function of each energy transfer; the evaluation Q function is used for calculating the reward value of the power supply energy router; the reward R function is used for calculating the reward value obtained by executing each energy transfer action;
initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; simulating new load conditions by adding loads to random energy routers, and training the deep reinforcement learning model consisting of the graph convolutional neural network, reinforcement learning Q network, actor network and critic network, using the historical data sets of state information and the device parameters of each energy router in the energy internet as input samples and the evaluation Q function, energy routing path and reward R function of each energy router as output samples;
inputting the real-time state information and device parameters of each energy router in the energy internet, together with the newly added load, into the deep reinforcement learning model to obtain, for the energy consumption requirement of the newly added load, the optimal power supply energy router and the optimal energy routing path; the optimal power supply energy router is the energy router with the largest Q-function value, and the optimal energy routing path is output by the actor network;
wherein the evaluation Q function is:
Q(n_load, n_source) = −α₁·l_line − α₂·c_router + α₃·r_source + α₄·p_source + α₅·q_source;

wherein n_load denotes the energy router where the newly added load is located, n_source denotes the energy router supplying power to the energy router node where the newly added load is located, ΔP denotes the newly added load, l_line denotes the total length of the connecting line between n_load and n_source, c_router denotes the number of routers between n_load and n_source, r_source denotes the power supply reliability of n_source, p_source denotes the power supply price per unit energy of n_source, q_source denotes the power supply quality of n_source, and α₁, α₂, α₃, α₄ and α₅ are the coefficients of the five evaluation terms, used for adjusting their relative weights;
for the energy transfer between every two energy routers, given an energy transfer starting node n_start, an energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:

R(n_start, n_end) = −β₁·L_line − β₂·L_router − β₃·ΔU;

wherein L_line denotes the energy loss on the line during the energy transfer, L_router denotes the energy loss in the energy routers during the energy transfer, ΔU is the absolute value of the voltage deviation caused by the transfer, and β₁, β₂ and β₃ are the coefficients of the three reward terms, used for adjusting their relative weights;
the method for obtaining the optimal power supply energy router and energy routing path by inputting the real-time state information and the equipment parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model comprises the following steps:
inputting the real-time state information and equipment parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model, the graph convolutional neural network performing feature extraction on the input to obtain the feature F_j, j = 1, …, N, of each node;
the reinforcement learning Q network taking the F_j of all nodes as input, obtaining an evaluation value Q_j with each node as a supply node, and selecting the node with the largest evaluation value as the energy supply node n_source;
the actor network taking F_j and n_source as input, and outputting the route a of the next energy transfer;
the critic network taking F_j and n_source as input, and giving an evaluation V(s) of the current energy internet state.
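As a concrete reading of the two formulas in claim 1, the sketch below codes the evaluation Q function and the reward R function directly. The signs and terms follow the claim as printed (including the positive coefficient on the supply price p_source); the default coefficient values for α and β are illustrative placeholders, not values from the patent.

```python
def supply_q(l_line, c_router, r_source, p_source, q_source,
             alpha=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Evaluation Q of using n_source to supply the load at n_load."""
    a1, a2, a3, a4, a5 = alpha
    return (-a1 * l_line     # total line length between n_load and n_source
            - a2 * c_router  # number of routers on the path
            + a3 * r_source  # power supply reliability
            + a4 * p_source  # power supply price per unit energy
            + a5 * q_source) # power supply quality

def transfer_reward(loss_line, loss_router, delta_u, beta=(1.0, 1.0, 1.0)):
    """Reward R for one energy-transfer action from n_start to n_end."""
    b1, b2, b3 = beta
    # All three terms are penalties: line loss, router loss, |voltage dev.|
    return -b1 * loss_line - b2 * loss_router - b3 * abs(delta_u)
```

With unit coefficients, a shorter line and fewer intermediate routers raise Q, and every joule lost or volt deviated lowers R, which is the trade-off the claim describes.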
2. The energy routing optimization method of claim 1, wherein the historical data set of state information for the energy router comprises: load power in the microgrid connected to the energy router, power of the power generation equipment and microgrid voltage.
3. The energy routing optimization method of claim 1, wherein the device parameters of the energy router comprise: energy transfer efficiency, maximum power generation capacity, and maximum load capacity.
4. The energy routing optimization method of claim 1, wherein the structure of the graph convolutional neural network comprises: the number of graph adjacency matrices, the number of graph convolution layers, and the number of convolution kernels for each layer of graph convolution.
5. The energy routing optimization method according to claim 1, wherein the structure of the reinforcement learning Q network, the actor network and the critic network comprises the number of layers of the network, the type of each layer and the input-output size.
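Claims 4 to 6 specify the networks only through structural tables: number of layers, layer types, and input-output sizes. One hedged way to make that concrete is a plain list of (type, in, out) tuples per network, using the sizes claim 6 prescribes (Q-network output N − 1, actor and critic output 1); the hidden width of 32, the router count N, and the feature width D_F are assumptions for illustration.

```python
# Hypothetical structure tables in the spirit of claims 4-6: each entry is
# (layer type, input size, output size) of a fully connected network.
N, D_F = 8, 16  # assumed router count and graph-convolution feature width

q_net_layers  = [("linear", D_F, 32), ("relu", 32, 32), ("linear", 32, N - 1)]
actor_layers  = [("linear", D_F + 1, 32), ("relu", 32, 32), ("linear", 32, 1)]
critic_layers = [("linear", D_F + 1, 32), ("relu", 32, 32), ("linear", 32, 1)]

def check_sizes(layers):
    # Consecutive layers must agree: the output size of one feeds the next.
    return all(prev[2] == nxt[1] for prev, nxt in zip(layers, layers[1:]))
```

Such a table is enough to instantiate the fully connected networks in any framework, since only layer count, type, and sizes are claimed.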
6. The energy routing optimization method of claim 1, wherein initializing the graph convolutional neural network comprises:
constructing an adjacency matrix of the graph according to the network topology of the energy Internet, and determining the number of graph convolution layers and the number of convolution kernels of each layer of graph convolution; the construction method of the adjacency matrix A comprises the following steps:
initialization of the reinforcement learning Q network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network and the output size is N−1, corresponding to the set of all energy routers other than the energy router where the load is located;

initialization of the actor network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network and the output size is 1, namely the index of the energy-transfer target router;

initialization of the critic network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network and the output size is 1, namely the evaluation value of the energy transfer process.
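The adjacency matrix A of claim 6 can be built directly from the edge list of the energy-internet topology. This is a minimal sketch; the 4-router line topology is a made-up example, not one from the patent.

```python
import numpy as np

def build_adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix A from the energy-internet topology."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0  # energy links are bidirectional
    return A

# Hypothetical 4-router line topology: 0 - 1 - 2 - 3.
A = build_adjacency(4, [(0, 1), (1, 2), (2, 3)])
```

The same matrix then determines the node connectivity used by every graph-convolution layer.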
7. The energy route optimization method according to claim 1, wherein the forward propagation and backward propagation processes of the deep reinforcement learning model composed of the graph convolution neural network, the reinforcement learning Q network, the actor network and the critic network comprise:
for each state sample initial value s in the historical data s(t_i), i = 1, …, M, of M time nodes in the initial state data set, increasing the load ΔP at a random energy router node n_load and performing the following steps:
respectively sending the comprehensive information of the N energy routers into the N nodes of the graph convolutional neural network for feature extraction to obtain the features F_j, j = 1, …, N, of the N energy routers;
sending F_j, j = 1, …, N, into the Q network to obtain the Q value Q_j, j = 1, …, N, of each energy router node, calculating the true Q value Q′_j, j = 1, …, N, from the evaluation Q function of each power supply node, and updating the Q network parameters θ_Q and the graph neural network parameters θ_G according to the mean square error loss function ε(θ_Q) = Σ(Q_j − Q′_j)²;
selecting the node with the maximum Q value as the power supply node n_source, and then repeating the following steps to train the actor network and the critic network:
taking F_j, j = 1, …, N, and the selected supply node n_source as input to the actor network π(·|s; θ_a) to determine the action a, and calculating the reward R(s) of the action from the reward R function;
performing action a, transferring energy to the next energy router node n′_source, obtaining the energy internet state s′ at the next moment, and sending s′ into the graph convolutional neural network to obtain F′_j, j = 1, …, N;
substituting F_j, j = 1, …, N, and F′_j, j = 1, …, N, into the critic network to calculate the value-function estimates V(s) and V(s′), and calculating the TD error δ:

δ = R(s) + γV(s′) − V(s);
updating the actor network parameters θ_a and the graph neural network parameters θ_G according to the TD error δ;
updating the critic network parameters θ_c and the graph neural network parameters θ_G according to the mean square error loss function ε(θ_c) = (R(s) + γV(s′) − V(s))²;
taking n′_source as the new supply node n_source and F′_j as the new F_j, and starting the next cycle until n_source is n_load;
when the preset maximum number of training rounds is reached, the training is finished, and the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, reinforcement learning Q network, actor network and critic network are returned.
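The TD error and critic update of claim 7 can be sketched with a linear critic. The conventional one-step TD error δ = R(s) + γV(s′) − V(s) is used; the discount γ, learning rate, and the choice of a linear value function are illustrative assumptions, not part of the claim.

```python
import numpy as np

GAMMA = 0.99  # assumed discount factor

def td_error(r, v_s, v_next, gamma=GAMMA):
    # One-step temporal-difference error delta = r + gamma*V(s') - V(s).
    return r + gamma * v_next - v_s

def critic_step(w_c, x_s, x_next, r, lr=0.01, gamma=GAMMA):
    """Semi-gradient update of a linear critic V(s) = x_s . w_c."""
    delta = td_error(r, float(x_s @ w_c), float(x_next @ w_c), gamma)
    # Minimizing delta^2 w.r.t. V(s) gives the update w_c += lr*delta*x_s.
    return w_c + lr * delta * x_s, delta

# One illustrative step with a zero-initialized critic and reward -2.
w = np.zeros(3)
w, delta = critic_step(w,
                       np.array([1.0, 0.0, 0.0]),   # features of s
                       np.array([0.0, 1.0, 0.0]),   # features of s'
                       r=-2.0)
```

The actor update of the claim would use the same δ as its advantage signal; only the critic side is sketched here.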
8. The energy routing optimization method of claim 7, wherein the state sample initial value s comprises: the comprehensive information of the N energy routers n_k, k = 1, …, N.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110261579.7A CN113132232B (en) | 2021-03-10 | 2021-03-10 | Energy route optimization method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113132232A CN113132232A (en) | 2021-07-16 |
CN113132232B true CN113132232B (en) | 2022-05-20 |
Family
ID=76773010
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110261579.7A Active CN113132232B (en) | 2021-03-10 | 2021-03-10 | Energy route optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113132232B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113572697B (en) * | 2021-07-20 | 2023-09-22 | 电子科技大学 | Load balancing method based on graph convolution neural network and deep reinforcement learning |
CN113780482A (en) * | 2021-11-12 | 2021-12-10 | 中国科学院理化技术研究所 | Intelligent detection method for abnormity of energy router |
CN114172840B (en) * | 2022-01-17 | 2022-09-30 | 河海大学 | Multi-microgrid system energy routing method based on graph theory and deep reinforcement learning |
CN115022231B (en) * | 2022-06-30 | 2023-11-03 | 武汉烽火技术服务有限公司 | Optimal path planning method and system based on deep reinforcement learning |
CN117294640B (en) * | 2023-10-13 | 2024-05-24 | 北京亿美芯科技有限公司 | Vehicle-mounted opportunity routing node selection method and system based on PPO algorithm |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3057192A1 (en) * | 2015-02-12 | 2016-08-17 | Northeastern University | An energy internet and a hierarchical control system and a control method thereof |
CN107911299A (en) * | 2017-10-24 | 2018-04-13 | 浙江工商大学 | A kind of route planning method based on depth Q study |
CN108038545A (en) * | 2017-12-06 | 2018-05-15 | 湖北工业大学 | Fast learning algorithm based on Actor-Critic neutral net continuous controls |
CN111967179A (en) * | 2020-07-02 | 2020-11-20 | 江苏能来能源互联网研究院有限公司 | Dynamic optimization matching method for energy units of energy Internet |
Also Published As
Publication number | Publication date |
---|---|
CN113132232A (en) | 2021-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113132232B (en) | Energy route optimization method | |
Kumar et al. | A hybrid multi-agent based particle swarm optimization algorithm for economic power dispatch | |
CN110570034B (en) | Bus load prediction method based on multi-XGboost model fusion | |
CN106909728B (en) | FPGA interconnection resource configuration generation method based on reinforcement learning | |
CN112685657B (en) | Conversation social recommendation method based on multi-mode cross fusion graph network | |
CN114358520A (en) | Method, system, device and medium for economic dispatching decision of power system | |
CN116207739A (en) | Optimal scheduling method and device for power distribution network, computer equipment and storage medium | |
Fu et al. | The distributed economic dispatch of smart grid based on deep reinforcement learning | |
CN117392483B (en) | Album classification model training acceleration method, system and medium based on reinforcement learning | |
CN116451880B (en) | Distributed energy optimization scheduling method and device based on hybrid learning | |
CN111600734B (en) | Network fault processing model construction method, fault processing method and system | |
CN112559904A (en) | Conversational social recommendation method based on door mechanism and multi-modal graph network | |
CN112101651B (en) | Electric energy network coordination control method, system and information data processing terminal | |
CN116974249A (en) | Flexible job shop scheduling method and flexible job shop scheduling device | |
CN115360768A (en) | Power scheduling method and device based on muzero and deep reinforcement learning and storage medium | |
CN111160557B (en) | Knowledge representation learning method based on double-agent reinforcement learning path search | |
CN115001978A (en) | Cloud tenant virtual network intelligent mapping method based on reinforcement learning model | |
CN114662204A (en) | Elastic bar system structure system data processing method and device based on graph neural network | |
CN112036936A (en) | Deep Q network-based generator bidding behavior simulation method and system | |
CN111027709A (en) | Information recommendation method and device, server and storage medium | |
CN115566692B (en) | Method and device for determining reactive power optimization decision, computer equipment and storage medium | |
US11973662B1 (en) | Intelligent mapping method for cloud tenant virtual network based on reinforcement learning model | |
CN118101493B (en) | Simulation optimizing method, device, equipment and medium for intelligent computation center network architecture | |
CN117061605B (en) | Intelligent lithium battery active information pushing method and device based on end cloud cooperation | |
CN109933858B (en) | Core division parallel simulation method for power distribution network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||