CN113132232B - Energy route optimization method

Info

Publication number: CN113132232B (grant); CN113132232A (application publication)
Application number: CN202110261579.7A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: energy, network, router, source, reinforcement learning
Legal status: Active
Inventors: 郭盛 (Guo Sheng), 曹军威 (Cao Junwei)
Original and current assignee: Tsinghua University
Application filed by Tsinghua University; priority to CN202110261579.7A; application granted and published as CN113132232B.

Classifications

    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/14: Routing performance; Theoretical aspects
    • G06N 3/02: Neural networks
    • G06N 3/045: Combinations of networks
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N 3/08: Learning methods
    • Y04S 10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The embodiment of the application discloses an energy route optimization method, which comprises the following steps: acquiring a network structure formed by the interconnection of N energy routers in an energy internet, together with a state information historical data set and device parameters of each energy router; constructing and training a deep reinforcement learning model composed of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure; and inputting the real-time state information and device parameters of each energy router in the energy internet, together with a newly added load in the energy internet, into the deep reinforcement learning model to obtain the optimal power supply energy router and energy routing path. The energy transfer process between the multiple energy routers in the energy internet is learned by combining a graph convolutional neural network with deep reinforcement learning, so that the energy transmission route can be optimized quickly and accurately from the real-time data of the energy routers, reducing energy transmission losses and improving power supply efficiency and reliability.

Description

Energy route optimization method
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an energy route optimization method.
Background
The energy internet has become the development direction of today's electric energy systems thanks to its capacity to absorb distributed energy resources such as photovoltaic and wind power and to share energy information; in the energy internet, energy suppliers and energy consumers can exchange energy on an equal and free basis. As the core device for energy and information exchange in the energy internet, an energy router needs to provide reasonable power supply node selection and optimal energy routing according to the energy consumption demand, so as to meet users' demands and reduce losses in energy transmission.
Currently, the optimization of energy router power supply nodes and energy routes mainly adopts two approaches: routing tables and dynamic optimization algorithms.
A routing table requires each energy router to store the information of all energy routers and lines, so that when a newly added load arrives at an energy router, the next node of the energy transfer is determined by querying the routing table. However, in an actual energy internet, the state information of each energy router changes dynamically with renewable generation and load fluctuations, so the routing table must be updated frequently, which increases the transmission load of the information flow in the energy internet; meanwhile, a routing table only reflects line information and cannot account for factors such as losses in energy transmission. A dynamic optimization algorithm can take transmission losses into account when optimizing the power supply nodes and energy routes of the energy routers, but once more factors such as power quality, supply price and voltage stability are added, its solution becomes very complicated, and neither the real-time performance nor the accuracy of the result can be guaranteed.
Disclosure of Invention
Because the existing method has the above problems, the embodiments of the present application provide an energy route optimization method.
Specifically, the embodiment of the present application provides the following technical solutions:
the embodiment of the application provides an energy route optimization method, which comprises the following steps:
acquiring a network structure formed by interconnection of N energy routers in an energy internet, and a state information historical data set and equipment parameters of each energy router;
constructing a reinforcement learning environment of the energy Internet according to the operation principle of the energy Internet and the energy router, and constructing a deep reinforcement learning model consisting of a graph convolution neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure; the reinforcement learning Q network, the actor network and the critic network all adopt fully-connected neural network structures;
according to the state information historical data set and the device parameters of each energy router in the energy internet, and by integrating the available power supply capacity, power supply quality, reliability and price of each energy router, as well as the number of routers a newly added load must traverse to reach the energy router and the length of the transmission line, constructing an evaluation Q function of the power supply energy router, and constructing a reward R function of each energy transfer for the energy transfer between every two energy routers; the evaluation Q function is used for calculating the reward value of the power supply energy router; the reward R function is used for calculating the reward value obtained by executing each energy transfer action;
initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; simulating the situation of a newly added load by adding a load at a random energy router, taking the state information historical data set and device parameters of each energy router in the energy internet as input samples, and taking the evaluation Q function, the energy routing path and the reward R function of each energy router as output samples, so as to train the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network;
for the requirement of new loads in the energy Internet, inputting the real-time state information and equipment parameters of each energy router in the energy Internet and the new loads into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path; the optimal power supply energy router is the energy router with the maximum Q function value, and the optimal energy routing path is output by the actor network.
Optionally, the state information history data set of the energy router includes: load power in the microgrid connected to the energy router, power of the power generation equipment and microgrid voltage.
Optionally, the device parameters of the energy router include: energy transfer efficiency, maximum power generation capacity, and maximum load capacity.
Optionally, the structure of the graph convolutional neural network includes: the graph adjacency matrix, the number of graph convolution layers, and the number of convolution kernels of each graph convolution layer.
Optionally, the structures of the reinforcement learning Q network, the actor network and the critic network include the number of layers of the network, the type of each layer, and the input/output sizes.
Optionally, the evaluation Q function is:
Q(n_load, n_source) = -α_1·l_line - α_2·c_router + α_3·r_source - α_4·p_source·ΔP + α_5·q_source;
wherein n_load denotes the energy router where the newly added load is located, n_source denotes the energy router supplying power to the energy router node where the newly added load is located, ΔP denotes the newly added load, l_line denotes the total length of the connecting lines between n_load and n_source, c_router denotes the number of routers between n_load and n_source, r_source denotes the power supply reliability of n_source, p_source denotes the power supply price per unit energy of n_source, q_source denotes the power supply quality of n_source, and α_1, α_2, α_3, α_4 and α_5 are the coefficients of the five evaluation terms, used to adjust their relative weights;
for the energy transfer between every two energy routers, given the energy transfer starting node n_start, the energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:
R(n_start, n_end) = -β_1·L_line - β_2·L_router - β_3·ΔU;
wherein L_line denotes the energy loss on the lines during the energy transfer, L_router denotes the energy loss in the energy routers during the transfer, ΔU is the absolute value of the voltage deviation caused by the transfer, and β_1, β_2 and β_3 are the coefficients of the three reward terms, used to adjust their relative weights.
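A minimal Python sketch of these two scoring functions follows. The equal default weights, the sign of each term, and the pairing of the unit price p_source with the new load ΔP are editorial assumptions made to resolve ambiguity in the published formula; only the list of terms and the role of the coefficients are fixed by the text above.

```python
def evaluate_supplier(l_line, c_router, r_source, p_source, q_source, delta_p,
                      alpha=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Evaluation Q(n_load, n_source): larger means a more suitable supplier."""
    a1, a2, a3, a4, a5 = alpha
    return (-a1 * l_line               # total line length between load and source
            - a2 * c_router            # number of routers on the path
            + a3 * r_source            # power supply reliability of the source
            - a4 * p_source * delta_p  # supply cost: unit price times new load (assumed pairing)
            + a5 * q_source)           # power supply quality of the source

def transfer_reward(loss_line, loss_router, delta_u, beta=(1.0, 1.0, 1.0)):
    """Reward R(n_start, n_end) for one energy transfer: all terms are penalties."""
    b1, b2, b3 = beta
    return -b1 * loss_line - b2 * loss_router - b3 * abs(delta_u)
```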
Optionally, the initialization of the graph convolutional neural network includes:
constructing the adjacency matrix of the graph according to the network structure of the energy internet, and determining the number of graph convolution layers and the number of convolution kernels of each graph convolution layer; the adjacency matrix A is constructed as:
A_ij = 1 if energy routers n_i and n_j are directly connected by a transmission line, and A_ij = 0 otherwise;
the initialization of the reinforcement learning Q network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is N-1, corresponding to the set of all energy routers other than the energy router concerned;
the initialization of the actor network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the index of the energy transfer target router;
the initialization of the critic network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
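As a concrete illustration of this initialization, here is a short Python sketch. The binary, symmetric adjacency rule reconstructs a formula that survives only as an image placeholder in the source, and the hidden width is an illustrative assumption; the three heads use the input/output sizes stated above.

```python
import numpy as np
import torch.nn as nn

def build_adjacency(n_routers, lines):
    """A[i, j] = 1 when routers i and j are joined by a transmission line."""
    A = np.zeros((n_routers, n_routers))
    for i, j in lines:
        A[i, j] = A[j, i] = 1.0          # undirected edge for each line
    return A

def make_heads(feat_dim, n_routers, hidden=64):
    # fully connected heads with the output sizes stated above
    q_net  = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                           nn.Linear(hidden, n_routers - 1))  # one Q_j per candidate supplier
    actor  = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                           nn.Linear(hidden, 1))              # index of the next-hop router
    critic = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                           nn.Linear(hidden, 1))              # evaluation of the transfer
    return q_net, actor, critic
```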
Optionally, the inputting of the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model to obtain the optimal power supply energy router and energy routing path includes:
inputting the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model, where the graph convolutional neural network performs feature extraction on the input to obtain the features F_j, j=1,...,N, of each node;
the reinforcement learning Q network takes the F_j of all nodes as input, obtains the evaluation value Q_j of each node as a supply node, and selects the node with the largest evaluation value as the energy supply node n_source;
the actor network takes F_j and n_source as input and outputs the route a of the next energy transfer;
the critic network takes F_j and n_source as input and gives an evaluation V(s) of the current energy internet state.
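A hedged end-to-end sketch of this inference flow follows. The tensor shapes, the flattening of the node features, and the appending of n_source as a scalar input are assumptions; likewise, the actor's single "target router" output is modeled here as an argmax over per-router logits so that the example is runnable.

```python
import torch
import torch.nn as nn

N, D = 8, 16                                   # routers, per-node feature size
gcn_features = torch.randn(N, D)               # stand-in for the GCN output F_j
flat = gcn_features.flatten()                  # concatenated node features

q_net  = nn.Linear(N * D, N)                   # one evaluation per router (the patent uses N-1)
actor  = nn.Linear(N * D + 1, N)               # logits over next-hop routers (assumed form)
critic = nn.Linear(N * D + 1, 1)               # scalar state value V(s)

q = q_net(flat)
n_source = int(torch.argmax(q))                # supply node with the largest Q_j
actor_in = torch.cat([flat, torch.tensor([float(n_source)])])
a = int(torch.argmax(actor(actor_in)))         # next router on the energy path
v = critic(actor_in).item()                    # evaluation of the current state
print(f"supply node {n_source}, next hop {a}, V(s) = {v:.3f}")
```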
Optionally, the forward propagation and backward propagation processes of the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network include:
for each state sample initial value s in the historical data s(t_i), i=1,...,M, of the M time nodes of the initial state data set, increasing a load ΔP at a random energy router node n_load and performing the following steps:
respectively sending the comprehensive information of the N energy routers into the N nodes of the graph convolutional neural network for feature extraction, obtaining the features F_j, j=1,...,N, of the N energy routers;
feeding F_j, j=1,...,N, into the Q network to obtain the Q value Q_j, j=1,...,N, of each energy router node, calculating the true Q value Q'_j, j=1,...,N, from the evaluation Q function of each power supply node, and, according to the mean square error loss function ε(θ_Q) = Σ_j (Q_j - Q'_j)², updating the Q network parameters θ_Q and the graph neural network parameters θ_G by gradient descent with learning rate η:
θ_Q ← θ_Q - η·∂ε(θ_Q)/∂θ_Q;
θ_G ← θ_G - η·∂ε(θ_Q)/∂θ_G;
selecting the node with the maximum Q value as the power supply node n_source, and then repeating the following steps to train the actor network and the critic network:
taking F_j, j=1,...,N, and the selected supply node n_source as input to the actor network π(·|s; θ_a) to determine the action a, and calculating the reward value R(s) of the action from the reward R function;
performing action a, transferring the energy to the next energy router node n_source', obtaining the energy internet state s' at the next moment, and sending s' into the graph convolutional neural network to obtain F_j', j=1,...,N;
substituting F_j, j=1,...,N, and F_j', j=1,...,N, into the critic network to calculate the estimates V(s) and V(s') of the value function, and calculating the TD error δ:
δ = R(s) + γV(s') - V(s);
updating the actor network parameters θ_a and the graph neural network parameters θ_G according to the TD error δ:
θ_a ← θ_a + η·δ·∇_θa log π(a|s; θ_a);
θ_G ← θ_G + η·δ·∇_θG log π(a|s; θ_a);
according to the mean square error loss function ε(θ_c) = (R(s) + γV(s') - V(s))², updating the critic network parameters θ_c and the graph neural network parameters θ_G:
θ_c ← θ_c - η·∂ε(θ_c)/∂θ_c;
θ_G ← θ_G - η·∂ε(θ_c)/∂θ_G;
taking n_source' as the new supply node n_source and F_j' as the new F_j, and starting the next cycle, until n_source is n_load;
when the preset maximum number of training rounds is reached, the training ends, and the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network are returned.
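The per-episode procedure above can be condensed into the following PyTorch sketch. The helper names env_step, true_q_values and reward (standing in for the reinforcement learning environment of step 102 and the Q and R functions of step 103), the single shared optimizer, and the discount factor are assumptions for illustration; the patent describes separate gradient steps for θ_Q, θ_a, θ_c and θ_G, and the actor is modeled here as logits over next-hop routers.

```python
import torch

GAMMA = 0.95  # illustrative discount factor

def train_episode(s, n_load, delta_p, gcn, q_net, actor, critic, opt,
                  env_step, true_q_values, reward):
    # 1) fit the Q head (and, through it, the GCN) to the evaluation Q function
    q_pred = q_net(gcn(s).flatten())
    q_true = true_q_values(s, n_load, delta_p)       # Q'_j from the evaluation Q function
    loss_q = ((q_pred - q_true) ** 2).sum()          # eps(theta_Q) = sum_j (Q_j - Q'_j)^2
    opt.zero_grad(); loss_q.backward(); opt.step()

    # 2) actor-critic rollout from the chosen supply node toward the load node
    n_source = int(torch.argmax(q_pred.detach()))
    while n_source != n_load:
        inp = torch.cat([gcn(s).flatten(), torch.tensor([float(n_source)])])
        dist = torch.distributions.Categorical(logits=actor(inp))
        a = dist.sample()                            # next-hop router for the energy
        r = reward(s, n_source, int(a))              # R(s) from the reward R function
        s_next = env_step(s, n_source, int(a), delta_p)

        inp_next = torch.cat([gcn(s_next).flatten(), torch.tensor([float(a)])])
        delta = r + GAMMA * critic(inp_next).detach() - critic(inp)  # TD error
        # critic: minimize delta^2; actor: policy gradient weighted by delta
        loss = (delta.pow(2) - delta.detach() * dist.log_prob(a)).sum()
        opt.zero_grad(); loss.backward(); opt.step()

        s, n_source = s_next, int(a)                 # advance until n_source == n_load
```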
Optionally, the state sample initial value s includes: the state information and device parameters of the N energy routers n_k, k=1,...,N.
According to the technical scheme, the real-time state information and the equipment parameters of each energy router in the energy internet and the newly added load in the energy internet are input into the deep reinforcement learning model consisting of the graph convolution neural network, the reinforcement learning Q network, the actor network and the critic network, so that the optimal power supply energy router and the optimal energy routing path are obtained. Therefore, the embodiment of the application realizes the rapid and accurate optimization of the energy transmission line by utilizing the real-time data of each energy router in the energy internet, thereby reducing the loss of energy transmission and ensuring the efficiency and reliability of power supply. In addition, in the optimization process, the available power supply capacity, the power supply quality, the reliability and the price of each energy router, the number of the routes required to be passed by newly adding a load to the energy router, the length of the transmission line and other factors are comprehensively considered, and the accuracy of the optimization result is further improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of an energy route optimization method provided in an embodiment of the present application;
fig. 2 is a second flowchart of an energy route optimization method according to an embodiment of the present application;
FIG. 3 is a network architecture diagram of a deep reinforcement learning model according to an embodiment of the present disclosure;
FIG. 4 is a block diagram of a graph convolution neural network provided by an embodiment of the present application;
FIG. 5 is a structural diagram of a graph convolution layer of the graph convolutional neural network provided in an embodiment of the present application;
FIG. 6 is a flow chart of deep reinforcement learning model training provided by an embodiment of the present application;
fig. 7 is a schematic structural diagram of an energy routing optimization apparatus provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Fig. 1 shows a flowchart of an energy route optimization method provided in an embodiment of the present application, fig. 2 is a flowchart of another energy route optimization method provided in the embodiment of the present application, fig. 3 is a network structure diagram of a deep reinforcement learning model provided in the embodiment of the present application, fig. 4 is a structure diagram of a graph convolutional neural network provided in the embodiment of the present application, fig. 5 is a structure diagram of a graph convolution layer of the graph convolutional neural network provided in the embodiment of the present application, and fig. 6 is a training flowchart of the deep reinforcement learning model provided in the embodiment of the present application. The energy route optimization method provided by the embodiment of the present application is explained in detail below with reference to fig. 1 to 6. As shown in fig. 1, the method specifically includes:
step 101: acquiring a network structure formed by interconnection of N energy routers in an energy internet, and a state information historical data set and equipment parameters of each energy router;
in this step, it should be noted that, first, the network structure in which all N energy routers in the energy internet are interconnected, together with the state information historical data set and device parameters of each energy router, is acquired. Specifically, for an energy internet system interconnected by N energy routers, the network structure of the energy internet can be obtained from the connection relationship of the electric energy transmission lines among the energy routers. Further, taking the energy routers as the nodes of the graph and the connection lines as the edges, the topological structure of the energy internet is obtained. Then, for each energy router node n_i, its device parameters are acquired, including the energy transfer efficiency eff_i, the maximum power generation capacity C_gen,i and the maximum load capacity C_load,i, and its state information is acquired, including the load power in the microgrid connected to the energy router, the power of the power generation equipment and the microgrid voltage. Finally, the device parameters and state information of the N energy routers are combined into the comprehensive information of the energy internet system.
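A sketch of the per-router record assembled in this step might look as follows; the field names are illustrative stand-ins for the quantities listed above.

```python
from dataclasses import dataclass

@dataclass
class RouterInfo:
    eff: float          # energy transfer efficiency eff_i
    c_gen_max: float    # maximum power generation capacity C_gen,i
    c_load_max: float   # maximum load capacity C_load,i
    p_load: float       # load power in the attached microgrid
    p_gen: float        # current power of the generation equipment
    u_grid: float       # microgrid voltage

# the comprehensive information of the system is the list of N such records,
# together with the topology (routers as graph nodes, lines as edges)
```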
Step 102: constructing a reinforcement learning environment of the energy Internet according to the operation principle of the energy Internet and the energy router, and constructing a deep reinforcement learning model consisting of a graph convolution neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure; the reinforcement learning Q network, the actor network and the critic network all adopt fully-connected neural network structures;
in this step, it should be noted that the energy internet reinforcement learning environment is built according to the mechanism by which the energy routers in the energy internet transfer energy, establishing the equations of the energy transfer process between the energy routers; the main considerations are the energy loss caused by the conversion efficiency of the energy routers, the energy loss of the lines due to impedance, and the change of the network voltage. The energy internet reinforcement learning environment can simulate the energy transfer process in the energy internet, that is, it performs an energy transfer according to the current states of all energy routers in the energy internet and produces the updated states of all energy routers after the transfer.
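As a rough illustration of what such an environment must simulate, the sketch below models one energy-transfer hop. The line model (current at a nominal voltage, an I²R loss and an I·R voltage drop) is an editorial assumption; the patent only states that router conversion losses, line losses due to impedance, and voltage changes are considered.

```python
def transfer_step(p_in, router_eff, line_resistance, u_nominal=10e3):
    """Push p_in watts one hop; returns received power, losses, voltage drop."""
    loss_router = p_in * (1.0 - router_eff)        # conversion loss in the router
    p_line = p_in - loss_router
    i = p_line / u_nominal                         # line current at the nominal voltage
    loss_line = i ** 2 * line_resistance           # resistive line loss L_line
    delta_u = i * line_resistance                  # voltage deviation |dU| along the line
    return p_line - loss_line, loss_line, loss_router, delta_u
```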
In this step, as shown in the structure diagram of the graph convolutional neural network of fig. 4, the graph convolutional neural network is formed by connecting a plurality of graph convolution layers, each graph convolution layer taking the output of the previous layer as its input, until the final feature extraction result of the energy internet is obtained. Each graph convolution layer, as shown in fig. 5, takes the features of each graph node as input and performs a graph convolution operation on them in combination with the adjacency matrix representing the graph structure. The graph convolution operation fuses and extracts the features of nodes that are connected to one another; compared with an ordinary convolutional neural network or a fully connected neural network, the extracted features reflect the structural information of the graph.
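A minimal sketch of one such graph convolution layer is given below, using the common normalized propagation rule A_hat = D^(-1/2)(A + I)D^(-1/2) as an assumption; the patent fixes only that node features are fused with their neighbours' through the adjacency matrix.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph convolution layer: mix each node's features with its neighbours'."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_hat):               # x: (N, in_dim), a_hat: (N, N)
        return torch.relu(self.lin(a_hat @ x))

def normalize_adjacency(a):
    """A_hat = D^(-1/2) (A + I) D^(-1/2); assumes every router has at least one line."""
    a = a + torch.eye(a.shape[0])
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt
```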
In this step, it should be noted that a deep reinforcement learning model composed of a graph convolution neural network, a reinforcement learning Q network, an actor network, and a critic network is built according to a network structure of the energy internet. Wherein, the reinforcement learning Q network, the actor network and the critic network all adopt a fully connected neural network structure, and in terms of network output, as shown in fig. 3: the output of the reinforcement learning Q network is the Q value of the corresponding node of each energy router, which represents the reward value of the energy router as the energy supply node, and the larger the value is, the more suitable the energy router is as the energy supply node; the output of the actor network is the next energy router node of energy transfer; the output of the critic network is an evaluation of the energy delivery process.
Step 103: according to the state information historical data set and the device parameters of each energy router in the energy internet, and by integrating the available power supply capacity, power supply quality, reliability and price of each energy router, as well as the number of routers a newly added load must traverse to reach the energy router and the length of the transmission line, constructing an evaluation Q function of the power supply energy router, and constructing a reward R function of each energy transfer for the energy transfer between every two energy routers; the evaluation Q function is used for calculating the reward value of the power supply energy router; the reward R function is used for calculating the reward value obtained by performing each energy transfer action;
in this step, it should be noted that, according to the state information historical data set and the device parameters of each energy router in the energy internet, and by integrating the available power supply capacity, power supply quality, reliability and price of each energy router, as well as the number of routers a newly added load must traverse to reach the energy router and the length of the transmission line, an evaluation Q function of the power supply energy router is constructed, the Q function being:
Q(n_load, n_source) = -α_1·l_line - α_2·c_router + α_3·r_source - α_4·p_source·ΔP + α_5·q_source;
wherein n_load denotes the energy router where the newly added load is located, n_source denotes the energy router supplying power to the energy router node where the newly added load is located, ΔP denotes the newly added load, l_line denotes the total length of the connecting lines between n_load and n_source, c_router denotes the number of routers between n_load and n_source, r_source denotes the power supply reliability of n_source, p_source denotes the power supply price per unit energy of n_source, q_source denotes the power supply quality of n_source, and α_1, α_2, α_3, α_4 and α_5 are the coefficients of the five evaluation terms, used to adjust their relative weights.
In this step, for the energy transfer between every two energy routers, given the energy transfer starting node n_start, the energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:
R(n_start, n_end) = -β_1·L_line - β_2·L_router - β_3·ΔU;
wherein L_line denotes the energy loss on the lines during the energy transfer, L_router denotes the energy loss in the energy routers during the transfer, ΔU is the absolute value of the voltage deviation caused by the transfer, and β_1, β_2 and β_3 are the coefficients of the three reward terms, used to adjust their relative weights. The reward R function is used to calculate the reward value obtained for each energy transfer action, providing the direction in which the system learns to advance.
Step 104: initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; simulating the situation of newly added loads by adding loads at random energy routers, and training the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network, using the state information historical data sets and device parameters of all energy routers in the energy internet as input samples and the evaluation Q function, energy routing path and reward R function of each energy router as output samples;
in this step, it should be noted that the initialization of the graph convolutional neural network includes:
constructing the adjacency matrix of the graph according to the network structure of the energy internet, and determining the number of graph convolution layers and the number of convolution kernels of each graph convolution layer; the adjacency matrix A is constructed as:
A_ij = 1 if energy routers n_i and n_j are directly connected by a transmission line, and A_ij = 0 otherwise;
the initialization of the reinforcement learning Q network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is N-1, corresponding to the set of all energy routers other than the energy router concerned;
the initialization of the actor network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the index of the energy transfer target router;
the initialization of the critic network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
In this step, the situation of a newly added load is simulated by adding a load at a random energy router, and the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network is trained using the state information historical data set and device parameters of each energy router in the energy internet as input samples and the evaluation Q function, energy routing path and reward R function of each energy router as output samples. In this way, the embodiment learns the energy transfer process among the multiple energy routers in the energy internet by combining the graph convolutional neural network with deep reinforcement learning, and selects and optimizes the power supply nodes and energy routing paths of the energy routers. By further integrating the online monitored state data of the energy routers, the energy routing in the energy internet is selected and optimized quickly and accurately, ensuring the power supply efficiency and reliability of the energy internet.
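A small sketch of how such training situations could be generated; the sampling ranges and helper names are assumptions.

```python
import random

def sample_training_case(n_routers, history):
    """One simulated situation: a random historical state plus a random new load."""
    s = random.choice(history)               # one historical snapshot s(t_i)
    n_load = random.randrange(n_routers)     # router that receives the new load
    delta_p = random.uniform(1e3, 1e5)       # new load in watts (assumed range)
    return s, n_load, delta_p
```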
Step 105: for the requirement of new loads in the energy Internet, inputting the real-time state information and equipment parameters of each energy router in the energy Internet and the new loads into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path; the optimal power supply energy router is the energy router with the maximum Q function value, and the optimal energy routing path is output by the actor network.
In this step, as shown in fig. 6, the deep reinforcement learning model takes the real-time state information and device parameters of each energy router of the energy internet and the load newly added at a certain node as input. First, the graph convolutional neural network G(θ_G) performs feature extraction on the input to obtain the features F_j, j=1,...,N, of each node. Then the reinforcement learning Q network Q(θ_Q) takes the F_j of all nodes as input to obtain the evaluation value Q_j of each node as a power supply node, and the node with the largest evaluation value is selected as the power supply node n_source. The actor network π(·|s; θ_a) takes F_j and n_source as input, and its action output is the route a of the next energy transfer. The critic network V(s; θ_c) takes F_j and n_source as input and gives the evaluation V(s) of the current energy internet state.
According to the technical scheme, the real-time state information and the equipment parameters of each energy router in the energy internet and the newly added load in the energy internet are input into the deep reinforcement learning model consisting of the graph convolution neural network, the reinforcement learning Q network, the actor network and the critic network, so that the optimal power supply energy router and the optimal energy routing path are obtained. Therefore, the embodiment of the application realizes the rapid and accurate optimization of the energy transmission line by utilizing the real-time data of each energy router in the energy internet, thereby reducing the loss of energy transmission and ensuring the efficiency and reliability of power supply. In addition, in the optimization process, the available power supply capacity, the power supply quality, the reliability and the price of each energy router, the number of routes required to be passed by newly adding loads to the energy routers, the length of transmission lines and other factors are comprehensively considered, and the accuracy of the optimization result is further improved.
Based on the content of the foregoing embodiment, in this embodiment, the state information history data set of the energy router includes: load power in the microgrid connected to the energy router, power of the power generation equipment and microgrid voltage.
In the present embodiment, the historical data set of the state information of an energy router in the energy internet includes the load power in the microgrid connected to the energy router, the power of the power generation equipment and the microgrid voltage. Preferably, the state data of the energy routers can be monitored online to ensure the accuracy of each energy routing optimization; by contrast, the existing routing-table method must update the routing table frequently whenever renewable generation or load fluctuates, which increases the transmission load of the information flow in the energy internet.
Based on the content of the foregoing embodiment, in this embodiment, the device parameters of the energy router include: energy transfer efficiency, maximum power generation capacity, and maximum load capacity.
In this embodiment, it should be noted that the device parameters of the energy router include: energy transfer efficiency, maximum power generation capacity, and maximum load capacity. Obtaining the device parameter information of each energy router in the energy internet at each energy routing optimization provides a multi-dimensional reference for the selection of the power supply energy router.
Based on the content of the foregoing embodiment, in the present embodiment, the structure of the graph convolutional neural network includes: the graph adjacency matrix, the number of graph convolution layers, and the number of convolution kernels of each graph convolution layer.
In this embodiment, it should be noted that graph convolutional neural networks are widely applied in many fields and work well on graph-structured data, for example in wireless network node selection and urban road traffic prediction. An energy internet formed by connecting a plurality of energy routers is a typical graph structure, and a graph convolutional neural network can perform efficient feature extraction on its operating information. As shown in fig. 4, the structure of the graph convolutional neural network includes the adjacency matrix of the graph, the number of graph convolution layers, and the number of convolution kernels of each graph convolution layer.
Based on the contents of the above embodiments, in this embodiment, the structures of the reinforcement learning Q network, the actor network and the critic network include the number of layers of the network, the type of each layer, and the input/output sizes.
Based on the content of the foregoing embodiment, in the present embodiment, the evaluation Q function is:
Q(n_load, n_source) = -α_1·l_line - α_2·c_router + α_3·r_source - α_4·p_source·ΔP + α_5·q_source;
wherein n_load denotes the energy router where the newly added load is located, n_source denotes the energy router supplying power to the energy router node where the newly added load is located, ΔP denotes the newly added load, l_line denotes the total length of the connecting lines between n_load and n_source, c_router denotes the number of routers between n_load and n_source, r_source denotes the power supply reliability of n_source, p_source denotes the power supply price per unit energy of n_source, q_source denotes the power supply quality of n_source, and α_1, α_2, α_3, α_4 and α_5 are the coefficients of the five evaluation terms, used to adjust their relative weights;
for the energy transfer between every two energy routers, given the energy transfer starting node n_start, the energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:
R(n_start, n_end) = -β_1·L_line - β_2·L_router - β_3·ΔU;
wherein L_line denotes the energy loss on the lines during the energy transfer, L_router denotes the energy loss in the energy routers during the transfer, ΔU is the absolute value of the voltage deviation caused by the transfer, and β_1, β_2 and β_3 are the coefficients of the three reward terms, used to adjust their relative weights.
In this embodiment, it should be noted that, according to the historical state information data set and the device parameters of each energy router in the energy internet, and by integrating the available power supply capacity, the power supply quality, the reliability, and the price of each energy router, and the number of routes that a load needs to be added to the energy router and the length of a transmission line, an evaluation Q function of the power supply energy router is constructed, where the Q function is:
Q(nload,nsource)=-α1lline2crouter3rsource4psource4qsourceΔP
wherein n isloadEnergy router, n, indicating where the newly added load is locatedsourceEnergy router supplying power to the energy router node where the newly added load is located, Δ P representing the newly added load, llineRepresents nloadAnd nsourceTotal length of connecting line between crouterRepresents nloadAnd nsourceNumber of routers in between, rsourceRepresents nsourceReliability of power supply of psourceRepresents nsourcePrice of power supply per unit energy, qsourceRepresents nsourceQuality of supply of alpha1、α2、α3、α4And alpha5Each 5-part coefficient was evaluated for adjusting specific gravity.
In the present embodiment, for energy transfer between every two energy routers, an energy transfer starting node n is givenstartEnergy receiving node nendAnd the load to be transferred, Δ P, the corresponding reward R function is:
R(nstart,nend)=-β1Lline2Lrouter3ΔU;
wherein L islineRepresenting the energy loss on the line during the energy transfer, LrouterRepresents the energy loss of the energy router in the energy transfer process, and Delta U is the absolute value of the voltage deviation brought by the transfer process, beta1、β2And beta3Coefficients of 3 bonus points, respectively, are used to adjust the specific gravity. The reward R function is used to calculate the value of the reward obtained for each energy transfer action, giving the system a direction of travel.
Based on the content of the foregoing embodiment, in this embodiment, the initialization of the graph convolutional neural network includes:
constructing the adjacency matrix of the graph according to the network structure of the energy internet, and determining the number of graph convolution layers and the number of convolution kernels of each graph convolution layer; the adjacency matrix A is constructed as:
A_ij = 1 if energy routers n_i and n_j are directly connected by a transmission line, and A_ij = 0 otherwise;
the initialization of the reinforcement learning Q network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is N-1, corresponding to the set of all energy routers other than the energy router concerned;
the initialization of the actor network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the index of the energy transfer target router;
the initialization of the critic network includes: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
Based on the content of the foregoing embodiment, in this embodiment, the inputting of the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model to obtain the optimal power supply energy router and energy routing path includes:
inputting the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model, where the graph convolutional neural network performs feature extraction on the input to obtain the features F_j, j=1,...,N, of each node;
the reinforcement learning Q network takes the F_j of all nodes as input, obtains the evaluation value Q_j of each node as a supply node, and selects the node with the largest evaluation value as the energy supply node n_source;
the actor network takes F_j and n_source as input and outputs the route a of the next energy transfer;
the critic network takes F_j and n_source as input and gives an evaluation V(s) of the current energy internet state.
Based on the content of the foregoing embodiment, in this embodiment, the forward propagation and backward propagation processes of the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network include:
for each state sample initial value s in the historical data s(t_i), i=1,...,M, of the M time nodes of the initial state data set, increasing a load ΔP at a random energy router node n_load and performing the following steps:
respectively sending the comprehensive information of the N energy routers into the N nodes of the graph convolutional neural network for feature extraction, obtaining the features F_j, j=1,...,N, of the N energy routers;
feeding F_j, j=1,...,N, into the Q network to obtain the Q value Q_j, j=1,...,N, of each energy router node, calculating the true Q value Q'_j, j=1,...,N, from the evaluation Q function of each power supply node, and, according to the mean square error loss function ε(θ_Q) = Σ_j (Q_j - Q'_j)², updating the Q network parameters θ_Q and the graph neural network parameters θ_G by gradient descent with learning rate η:
θ_Q ← θ_Q - η·∂ε(θ_Q)/∂θ_Q;
θ_G ← θ_G - η·∂ε(θ_Q)/∂θ_G;
selecting the node with the maximum Q value as the power supply node n_source, and then repeating the following steps to train the actor network and the critic network:
taking F_j, j=1,...,N, and the selected supply node n_source as input to the actor network π(·|s; θ_a) to determine the action a, and calculating the reward value R(s) of the action from the reward R function;
performing action a, transferring the energy to the next energy router node n_source', obtaining the energy internet state s' at the next moment, and sending s' into the graph convolutional neural network to obtain F_j', j=1,...,N;
substituting F_j, j=1,...,N, and F_j', j=1,...,N, into the critic network to calculate the estimates V(s) and V(s') of the value function, and calculating the TD error δ:
δ = R(s) + γV(s') - V(s);
updating the actor network parameters θ_a and the graph neural network parameters θ_G according to the TD error δ:
θ_a ← θ_a + η·δ·∇_θa log π(a|s; θ_a);
θ_G ← θ_G + η·δ·∇_θG log π(a|s; θ_a);
according to the mean square error loss function ε(θ_c) = (R(s) + γV(s') - V(s))², updating the critic network parameters θ_c and the graph neural network parameters θ_G:
θ_c ← θ_c - η·∂ε(θ_c)/∂θ_c;
θ_G ← θ_G - η·∂ε(θ_c)/∂θ_G;
taking n_source' as the new supply node n_source and F_j' as the new F_j, and starting the next cycle, until n_source is n_load;
when the preset maximum number of training rounds is reached, the training ends, and the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network are returned.
In this embodiment, it should be noted that, in the forward and backward propagation of the deep reinforcement learning model, a maximum number of training rounds is given in advance, and the above steps are performed round by round until the preset maximum number of training rounds is reached, after which the trained parameters θ_G, θ_Q, θ_a and θ_c are returned.
For a load newly added at a certain energy router node, the energy internet information and the load information are input into the graph convolutional neural network, and the optimal power supply node is obtained through the Q network; the energy internet features and the power supply node information are input into the actor network to obtain the next node of the energy transfer, until the energy is transferred to the node where the newly added load is located; the successive outputs of the actor network are then assembled into the optimal energy route, which is sent through the information network to the energy routers along the path to establish the actual energy transmission route.
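The rollout described here can be sketched as follows, where next_hop(state, node) stands in for one actor forward pass and is an assumed helper.

```python
def build_route(n_source, n_load, state, next_hop, max_hops=64):
    """Assemble the optimal energy route from successive actor outputs."""
    route = [n_source]
    node = n_source
    while node != n_load and len(route) <= max_hops:
        node = next_hop(state, node)    # actor: next router for the energy
        route.append(node)
    return route                        # sent to the routers over the information network
```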
Based on the same inventive concept, another embodiment of the present application provides an energy route optimization device. As shown in fig. 7, the energy route optimization device provided in the embodiment of the present application includes:
the system comprises a first processing module 1, a second processing module and a control module, wherein the first processing module is used for acquiring a network structure formed by interconnection of N energy routers in the energy Internet, and a state information historical data set and equipment parameters of each energy router;
the second processing module 2 is used for constructing a reinforcement learning environment of the energy internet according to the operation principle of the energy internet and the energy router, and constructing a deep reinforcement learning model consisting of a graph convolution neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure; the reinforcement learning Q network, the actor network and the critic network all adopt fully-connected neural network structures;
the third processing module 3 is configured to construct an evaluation Q function of the power supply energy router according to the state information historical data set and the device parameters of each energy router in the energy internet, and synthesize the available power supply capacity, the power supply quality, the reliability, and the price of each energy router, the number of routes that a load needs to be added to the energy router, and the length of a transmission line, and construct a reward R function for each energy transfer for energy transfer between every two energy routers; the evaluation Q function is used for calculating the reward value of the power supply energy router; the reward R function is used for calculating a reward value obtained by executing each energy transfer action;
a fourth processing module 4, configured to initialize the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; to simulate the situation of newly added loads by adding loads at random energy routers; and to train the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network, using the state information historical data sets and device parameters of all energy routers in the energy internet as input samples and the evaluation Q function, energy routing path and reward R function of each energy router as output samples;
the fifth processing module 5 is configured to input the real-time state information and the device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path; the optimal power supply energy router is the energy router with the maximum Q function value, and the optimal energy routing path is output by the actor network.
In this embodiment, it should be noted that, first, the network structure in which all N energy routers in the energy internet are interconnected, together with the state information historical data set and device parameters of each energy router, is acquired. Specifically, for an energy internet system interconnected by N energy routers, the network structure of the energy internet can be obtained from the connection relationship of the electric energy transmission lines among the energy routers. Further, taking the energy routers as the nodes of the graph and the connection lines as the edges, the topological structure of the energy internet is obtained. Then, for each energy router node n_i, its device parameters are acquired, including the energy transfer efficiency eff_i, the maximum power generation capacity C_gen,i and the maximum load capacity C_load,i, and its state information is acquired, including the load power in the microgrid connected to the energy router, the power of the power generation equipment and the microgrid voltage. Finally, the device parameters and state information of the N energy routers are combined into the comprehensive information of the energy internet system.
In this embodiment, the energy internet reinforcement learning environment is built according to the mechanism of energy transmission between the energy routers in the energy internet: equations of the energy transfer process among the energy routers are established, mainly considering the energy loss caused by the conversion efficiency of the energy routers, the energy loss of the lines due to impedance, and the change of the network voltage. The energy internet reinforcement learning environment can simulate the energy transfer process in the energy internet, i.e., perform an energy transfer according to the current states of all energy routers and obtain the updated states of all energy routers after the transfer.
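A toy sketch of one energy-transfer step in such an environment is given below; the embodiment names the three loss effects but not their equations, so the concrete loss model here (router conversion loss, a quadratic impedance-style line loss and a linearized voltage change) is an assumption purely for illustration:

```python
# Minimal sketch of one energy-transfer step in the RL environment.
# The loss model below is an illustrative assumption, not the patent's.
def transfer_step(p_node, u_node, src, dst, delta_p, eff, r_line, k_u=1e-3):
    p_after_router = delta_p * eff[src]                   # router conversion loss
    line_loss = r_line[(src, dst)] * p_after_router ** 2  # impedance loss (toy model)
    p_received = p_after_router - line_loss

    p_node = dict(p_node)                          # updated energy internet state
    u_node = dict(u_node)
    p_node[src] = p_node[src] - delta_p            # energy leaves the source
    p_node[dst] = p_node[dst] + p_received         # attenuated energy arrives
    u_node[dst] = u_node[dst] - k_u * delta_p      # linearized voltage deviation
    return p_node, u_node, delta_p - p_received    # new state and total loss
```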
In this embodiment, as shown in fig. 4, the structure diagram of the graph convolutional neural network provided in the embodiment of the present application, the graph convolutional neural network is formed by stacking a plurality of graph convolution layers: each graph convolution layer takes the output of the previous layer as its input, until the final feature extraction result of the energy internet is obtained. Each graph convolution layer, as shown in fig. 5, takes the features of each graph node as input and performs the graph convolution operation on them in combination with the adjacency matrix that characterizes the graph structure. The graph convolution operation fuses and extracts the features of nodes with connection relationships; compared with an ordinary convolutional or fully connected neural network, the extracted features can reflect the structural information of the graph.
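For concreteness, one graph convolution layer in the widely used normalized form H′ = σ(ÂHW) can be sketched as follows; this is a standard formulation consistent with the description above, not necessarily the embodiment's exact layer:

```python
import torch
import torch.nn as nn

# One graph convolution layer, H' = sigma(A_hat H W): node features are
# fused with those of connected neighbours via the normalized adjacency
# matrix. A standard sketch, not necessarily the embodiment's layer.
class GraphConvLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):
        a_hat = adj + torch.eye(adj.size(0))        # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt    # symmetric normalization
        return torch.relu(a_norm @ self.linear(h))  # fuse + extract features
```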
In this embodiment, a deep reinforcement learning model composed of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network is built according to the network structure of the energy internet. The reinforcement learning Q network, the actor network and the critic network all adopt fully connected neural network structures. In terms of network output, as shown in fig. 3: the output of the reinforcement learning Q network is the Q value of the node corresponding to each energy router, which represents the reward value of that energy router acting as the energy supply node; the larger the value, the more suitable the energy router is as the energy supply node. The output of the actor network is the next energy router node of the energy transfer; the output of the critic network is an evaluation of the energy transfer process.
In this embodiment, according to the state information historical data set and the device parameters of each energy router in the energy internet, and integrating the available power supply capacity, power supply quality, reliability and price of each energy router as well as the number of routers a newly added load must traverse and the length of the transmission line, an evaluation Q function of the power supply energy router is constructed. The Q function is:
Q(n_load, n_source) = −α1·l_line − α2·c_router + α3·r_source − α4·p_source·ΔP + α5·q_source;

where n_load denotes the energy router where the newly added load is located; n_source denotes the energy router supplying power to the energy router node where the newly added load is located; ΔP denotes the newly added load; l_line denotes the total length of the connecting line between n_load and n_source; c_router denotes the number of routers between n_load and n_source; r_source denotes the power supply reliability of n_source; p_source denotes the power supply price per unit energy of n_source; q_source denotes the power supply quality of n_source; and α1, α2, α3, α4 and α5 are the coefficients of the 5 evaluation terms respectively, used to adjust their relative weights.
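Transcribed directly, the evaluation function reads as below; the coefficient values are placeholders, and the sign convention (penalizing line length, router count and price, rewarding reliability and quality) follows the reconstruction above:

```python
# Direct transcription of the evaluation Q function; coefficient values
# are placeholders to be tuned, and the signs follow the reconstruction
# above (penalize distance, hop count, price; reward reliability, quality).
ALPHA = (1.0, 1.0, 1.0, 1.0, 1.0)  # alpha_1 ... alpha_5

def evaluate_source(l_line, c_router, r_source, p_source, q_source, delta_p,
                    alpha=ALPHA):
    a1, a2, a3, a4, a5 = alpha
    return (-a1 * l_line - a2 * c_router + a3 * r_source
            - a4 * p_source * delta_p + a5 * q_source)
```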
In this embodiment, for the energy transfer between every two energy routers, given an energy transfer starting node n_start, an energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:

R(n_start, n_end) = −β1·L_line − β2·L_router − β3·ΔU;

where L_line denotes the energy loss on the line during the energy transfer; L_router denotes the energy loss of the energy router in the energy transfer process; ΔU is the absolute value of the voltage deviation brought by the transfer process; and β1, β2 and β3 are the coefficients of the 3 reward terms respectively, used to adjust their relative weights. The reward R function is used to calculate the reward value obtained for each energy transfer action, giving the learning process its direction.
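The corresponding one-line transcription, again with placeholder coefficients, is:

```python
# Transcription of the reward R function; every term is a penalty, so the
# agent is steered toward transfers with low line loss, low router loss
# and small voltage deviation. Coefficients are placeholders.
BETA = (1.0, 1.0, 1.0)  # beta_1, beta_2, beta_3

def transfer_reward(l_line_loss, l_router_loss, delta_u, beta=BETA):
    b1, b2, b3 = beta
    return -b1 * l_line_loss - b2 * l_router_loss - b3 * abs(delta_u)
```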
In this embodiment, the initialization of the graph convolutional neural network includes:
constructing the adjacency matrix of the graph according to the network structure of the energy internet, and determining the number of graph convolution layers and the number of convolution kernels of each graph convolution layer; the adjacency matrix A is constructed as:

A_ij = 1, if energy routers n_i and n_j are connected by a transmission line; A_ij = 0, otherwise.
initialization of the reinforcement learning Q network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is N−1, corresponding to the set of all energy routers excluding the energy router itself;
initialization of the actor network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the index of the target router of the next energy transfer;
initialization of the critic network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
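A hedged sketch of this initialization, with the stated input/output sizes but assumed layer counts and hidden widths (the graph convolution layer itself is sketched above), might look like:

```python
import torch.nn as nn

# Sketch of the three fully connected networks with the stated sizes
# (feat_dim = GCN output size, N = number of routers); layer counts and
# hidden width are assumptions.
def build_networks(feat_dim, n_routers, hidden=64):
    q_net = nn.Sequential(                        # output N-1: one Q value per
        nn.Linear(feat_dim, hidden), nn.ReLU(),   # candidate router other than
        nn.Linear(hidden, n_routers - 1))         # the router itself
    actor = nn.Sequential(                        # output 1: score used to pick
        nn.Linear(feat_dim, hidden), nn.ReLU(),   # the next transfer target
        nn.Linear(hidden, 1))
    critic = nn.Sequential(                       # output 1: evaluation of the
        nn.Linear(feat_dim, hidden), nn.ReLU(),   # energy transfer process
        nn.Linear(hidden, 1))
    return q_net, actor, critic
```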
In this embodiment, the situation of newly added loads is simulated by adding loads to random energy routers; the state information historical data sets and device parameters of each energy router in the energy internet are used as input samples, and the evaluation Q function, energy routing path and reward R function of each energy router are used as output samples, to train the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network. The embodiment of the invention thus learns the energy transfer process among the multiple energy routers in the energy internet by combining the graph convolutional neural network with deep reinforcement learning, selects and optimizes the energy supply nodes and energy routing paths, and can guarantee the power supply efficiency and reliability of the energy internet.
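A compressed, hypothetical sketch of one training episode in this style is given below. The helper names (env_step, eval_q) and the treatment of the actor's per-node score as a next-hop distribution are assumptions for illustration; the TD error follows δ = R + γV(s′) − V(s):

```python
import torch

# Hypothetical sketch of one training episode. gcn, q_net, actor and critic
# are the modules sketched earlier; env_step and eval_q stand in for the RL
# environment and the evaluation Q function.
def train_episode(gcn, q_net, actor, critic, env_step, eval_q,
                  feats, adj, optim, gamma=0.99):
    f = gcn(feats, adj)                        # per-node features F_j
    q_pred = q_net(f.mean(dim=0))              # Q values of candidate sources
    q_loss = ((q_pred - eval_q()) ** 2).sum()  # supervise with evaluation Q fn

    src = int(q_pred.argmax())                 # supply node n_source
    logits = actor(f).squeeze(-1)              # score each node as next hop
    dist = torch.distributions.Categorical(logits=logits)
    a = dist.sample()                          # next energy-transfer router
    feats2, adj2, r = env_step(src, int(a))    # execute transfer, reward R(s)
    f2 = gcn(feats2, adj2)
    delta = r + gamma * critic(f2.mean(dim=0)) - critic(f.mean(dim=0))  # TD error
    loss = q_loss - delta.detach() * dist.log_prob(a) + delta.pow(2)
    optim.zero_grad(); loss.backward(); optim.step()
    return float(loss)
```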
By integrating the online monitoring state data of the energy routers, the energy routing in the energy internet is selected and optimized quickly and accurately, so that the power supply efficiency and reliability of the energy internet can be guaranteed.
In this embodiment, as shown in fig. 6, the deep reinforcement learning model takes the state information and device parameters of each energy router of the energy internet and the newly added load at a certain node as inputs. First, the graph convolutional neural network G(θ_G) performs feature extraction on the input to obtain the feature F_j, j = 1,…,N, of each node. Then the reinforcement learning Q network Q(θ_Q) takes the F_j of all nodes as input and obtains the evaluation value Q_j of each node as a power supply node; the node with the largest evaluation value is selected as the power supply node n_source. The actor network π(·|s; θ_a) takes F_j and n_source as input and outputs the action, i.e. the route a of the next energy transfer. The critic network V(s; θ_c) takes F_j and n_source as input and gives an evaluation V(s) of the current energy internet state.
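At inference time this forward pass can be sketched as follows (reusing the module names assumed above; in a full implementation the node features would be recomputed from the updated state after each transfer):

```python
import torch

# Hedged sketch of the inference-time forward pass mirroring fig. 6,
# reusing the module names assumed in the earlier sketches.
@torch.no_grad()
def route(gcn, q_net, actor, feats, adj, load_node):
    f = gcn(feats, adj)                        # features F_j of each router
    src = int(q_net(f.mean(dim=0)).argmax())   # supply node n_source (max Q_j)
    path, node = [src], src
    while node != load_node:                   # follow the actor's next hops
        node = int(actor(f).squeeze(-1).argmax())
        path.append(node)
        if len(path) > f.size(0):              # guard against cycles in sketch
            break
    return src, path
```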
According to the technical scheme, the real-time state information and device parameters of each energy router in the energy internet, together with the newly added load, are input into the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network to obtain the optimal power supply energy router and the optimal energy routing path. The embodiment of the application thus uses the real-time data of each energy router in the energy internet to optimize the energy transmission line quickly and accurately, reducing the loss of energy transmission and guaranteeing the efficiency and reliability of power supply. In addition, the optimization comprehensively considers the available power supply capacity, power supply quality, reliability and price of each energy router, the number of routers a newly added load must traverse, the length of the transmission line and other factors, further improving the accuracy of the optimization result.
The energy route optimization device described in this embodiment may be used to implement the above method embodiments, and the principle and technical effect are similar, which are not described herein again.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which, with reference to the schematic structural diagram of the electronic device shown in fig. 8, specifically includes the following contents: a processor 801, memory 802, communication interface 803, and communication bus 804;
the processor 801, the memory 802 and the communication interface 803 complete mutual communication through the communication bus 804; the communication interface 803 is used for realizing information transmission between devices;
the processor 801 is configured to call a computer program in the memory 802; when executing the computer program, the processor implements all the steps of the energy route optimization method, for example: acquiring a network structure formed by interconnecting N energy routers in an energy internet, and a state information historical data set and device parameters of each energy router; constructing a reinforcement learning environment of the energy internet according to the operation principle of the energy internet and the energy routers, and constructing a deep reinforcement learning model composed of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure, wherein the reinforcement learning Q network, the actor network and the critic network all adopt fully connected neural network structures; constructing an evaluation Q function of the power supply energy router according to the state information historical data set and the device parameters of each energy router in the energy internet, integrating the available power supply capacity, power supply quality, reliability and price of each energy router as well as the number of routers a newly added load must traverse and the length of the transmission line, and constructing a reward R function of each energy transfer for the energy transfer between every two energy routers, wherein the evaluation Q function is used for calculating the reward value of the power supply energy router and the reward R function is used for calculating the reward value obtained by executing each energy transfer action; initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; simulating the situation of newly added loads by adding loads to random energy routers, and training the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network, using the state information historical data sets and device parameters of all energy routers in the energy internet as input samples and the evaluation Q functions, energy routing paths and reward R functions of all energy routers as output samples; and inputting the state information historical data sets and device parameters of all energy routers in the energy internet and the newly added load in the energy internet into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path, wherein the optimal power supply energy router is the energy router with the maximum Q function value and the optimal energy routing path is output by the actor network.
Based on the same inventive concept, yet another embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements all the steps of the above energy route optimization method, for example: acquiring a network structure formed by interconnecting N energy routers in an energy internet, and a state information historical data set and device parameters of each energy router; constructing a reinforcement learning environment of the energy internet according to the operation principle of the energy internet and the energy routers, and constructing a deep reinforcement learning model composed of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network according to the network structure, wherein the reinforcement learning Q network, the actor network and the critic network all adopt fully connected neural network structures; constructing an evaluation Q function of the power supply energy router according to the state information historical data set and the device parameters of each energy router in the energy internet, integrating the available power supply capacity, power supply quality, reliability and price of each energy router as well as the number of routers a newly added load must traverse and the length of the transmission line, and constructing a reward R function of each energy transfer for the energy transfer between every two energy routers, wherein the evaluation Q function is used for calculating the reward value of the power supply energy router and the reward R function is used for calculating the reward value obtained by executing each energy transfer action; initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; simulating the situation of newly added loads by adding loads to random energy routers, and training the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network, using the state information historical data sets and device parameters of all energy routers in the energy internet as input samples and the evaluation Q functions, energy routing paths and reward R functions of all energy routers as output samples; and inputting the state information historical data sets and device parameters of all energy routers in the energy internet and the newly added load in the energy internet into the deep reinforcement learning model to obtain an optimal power supply energy router and an optimal energy routing path, wherein the optimal power supply energy router is the energy router with the maximum Q function value and the optimal energy routing path is output by the actor network.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be essentially or partially implemented in the form of software products, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the energy route optimization method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for energy routing optimization, comprising:
acquiring a network topology formed by interconnection of N energy routers in an energy internet, and a state information historical data set and equipment parameters of each energy router;
constructing a reinforcement learning environment of the energy internet according to the mechanism of energy transfer of the energy internet and the energy routers, and constructing a deep reinforcement learning model composed of a graph convolutional neural network, a reinforcement learning Q network, an actor network and a critic network according to the network topology; the reinforcement learning Q network, the actor network and the critic network all adopt a fully connected neural network structure;
constructing an evaluation Q function of the power supply energy router according to the state information historical data set and the device parameters of each energy router in the energy internet, the available power supply capacity, power supply quality, reliability and price of each energy router, the number of routers a newly added load needs to traverse to reach the energy router, and the length of the transmission line, and constructing a reward R function of each energy transfer for the energy transfer between every two energy routers; the evaluation Q function is used for calculating the reward value of the power supply energy router; the reward R function is used for calculating the reward value obtained by executing each energy transfer action;
initializing the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network; simulating newly added load conditions by adding loads to random energy routers, and training the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network, using the state information historical data sets and device parameters of each energy router in the energy internet as input samples and the evaluation Q function, energy routing path and reward R function of each energy router as output samples;
inputting the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model to obtain, for the energy consumption requirement of the newly added load in the energy internet, an optimal power supply energy router and an optimal energy routing path; the optimal power supply energy router is the energy router with the maximum Q function value, and the optimal energy routing path is output by the actor network;
wherein the evaluation Q function is:
Q(n_load, n_source) = −α1·l_line − α2·c_router + α3·r_source − α4·p_source·ΔP + α5·q_source;

wherein n_load denotes the energy router where the newly added load is located; n_source denotes the energy router supplying power to the energy router node where the newly added load is located; ΔP denotes the newly added load; l_line denotes the total length of the connecting line between n_load and n_source; c_router denotes the number of routers between n_load and n_source; r_source denotes the power supply reliability of n_source; p_source denotes the power supply price per unit energy of n_source; q_source denotes the power supply quality of n_source; and α1, α2, α3, α4 and α5 are the coefficients of the 5 evaluation terms respectively, used to adjust their relative weights;

for the energy transfer between every two energy routers, given an energy transfer starting node n_start, an energy receiving node n_end and the load ΔP to be transferred, the corresponding reward R function is:

R(n_start, n_end) = −β1·L_line − β2·L_router − β3·ΔU;

wherein L_line denotes the energy loss on the line during the energy transfer; L_router denotes the energy loss of the energy router in the energy transfer process; ΔU is the absolute value of the voltage deviation brought by the transfer process; and β1, β2 and β3 are the coefficients of the 3 reward terms respectively, used to adjust their relative weights;
the method for obtaining the optimal power supply energy router and energy routing path by inputting the real-time state information and the equipment parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model comprises the following steps:
inputting the real-time state information and device parameters of each energy router in the energy internet and the newly added load into the deep reinforcement learning model, and performing feature extraction on the input by the graph convolutional neural network to obtain the feature F_j, j = 1,…,N, of each node;

taking the F_j of all nodes as input of the reinforcement learning Q network to obtain an evaluation value Q_j of each node as a supply node, and selecting the node with the largest evaluation value as the energy supply node n_source;

taking F_j and n_source as input of the actor network, whose output is the route a of the next energy transfer;

taking F_j and n_source as input of the critic network, which gives an evaluation V(s) of the current energy internet state.
2. The energy routing optimization method of claim 1, wherein the historical data set of state information for the energy router comprises: load power in the microgrid connected to the energy router, power of the power generation equipment and microgrid voltage.
3. The energy routing optimization method of claim 1, wherein the device parameters of the energy router comprise: energy transfer efficiency, maximum power generation capacity, and maximum load capacity.
4. The energy routing optimization method of claim 1, wherein the structure of the graph convolutional neural network comprises: the graph adjacency matrix, the number of graph convolution layers, and the number of convolution kernels of each graph convolution layer.
5. The energy routing optimization method according to claim 1, wherein the structure of the reinforcement learning Q network, the actor network and the critic network comprises the number of layers of the network, the type of each layer and the input-output size.
6. The energy routing optimization method of claim 1, wherein initializing the graph convolutional neural network comprises:
constructing the adjacency matrix of the graph according to the network topology of the energy internet, and determining the number of graph convolution layers and the number of convolution kernels of each graph convolution layer; the adjacency matrix A is constructed as:

A_ij = 1, if energy routers n_i and n_j are connected by a transmission line; A_ij = 0, otherwise;
initialization of the reinforcement learning Q network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is N−1, corresponding to the set of all energy routers excluding the energy router itself;
initialization of the actor network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the index of the target router of the next energy transfer;
initialization of the critic network, comprising: determining the number of layers of the network and the type of each layer, wherein the input size is the same as the output size of the graph convolutional neural network, and the output size is 1, namely the evaluation value of the energy transfer process.
7. The energy route optimization method according to claim 1, wherein the forward propagation and backward propagation processes of the deep reinforcement learning model composed of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network comprise:
according to the historical data s(t_i), i = 1,…,M, of M time nodes in the initial state data set, for each state sample initial value s, increasing the load ΔP at a random energy router node n_load and performing the following steps:
respectively sending the comprehensive information of the N energy routers into the N nodes of the graph convolutional neural network for feature extraction to obtain the features F_j, j = 1,…,N, of the N energy routers;
sending F_j, j = 1,…,N into the Q network to obtain the Q value Q_j, j = 1,…,N, of each energy router node, calculating the true Q value Q′_j, j = 1,…,N, from the evaluation Q function of each power supply node, and updating the Q network parameters θ_Q and the graph neural network parameters θ_G according to the mean square error loss function ε(θ_Q) = Σ_j (Q_j − Q′_j)²:
θ_Q ← θ_Q − η·∂ε(θ_Q)/∂θ_Q;

θ_G ← θ_G − η·∂ε(θ_Q)/∂θ_G, where η is the learning rate;
selecting the node with the maximum Q value as the power supply node n_source, and then repeating the following steps to train the actor network and the critic network:
f is to bejN and a selected supply node NsourceUsing the operator network pi (· | s; theta) as inputa) Determining action a, and calculating the R value R(s) of the action by a reward R function;
performing the action a, transferring energy to the next energy router node n′_source, obtaining the energy internet state s′ at the next moment, and sending s′ into the graph convolutional neural network to obtain F′_j, j = 1,…,N;
substituting F_j, j = 1,…,N and F′_j, j = 1,…,N into the critic network to calculate the estimates V(s) and V(s′) of the value function, and calculating the TD error δ:

δ = R(s) + γV(s′) − V(s);
updating the actor network parameters θ_a and the graph neural network parameters θ_G according to the TD error δ:
θ_a ← θ_a + η·δ·∇_θa ln π(a|s; θ_a);

θ_G ← θ_G + η·δ·∇_θG ln π(a|s; θ_a);
updating the critic network parameters θ_c and the graph neural network parameters θ_G according to the mean square error loss function ε(θ_c) = (R(s) + γV(s′) − V(s))²:
θ_c ← θ_c − η·∂ε(θ_c)/∂θ_c;

θ_G ← θ_G − η·∂ε(θ_c)/∂θ_G;
taking n′_source as the new supply node n_source and F′_j as the new F_j, and starting the next cycle, until n′_source is n_load;
ending the training when the preset maximum number of training rounds is reached, and returning the parameters θ_G, θ_Q, θ_a and θ_c of the graph convolutional neural network, the reinforcement learning Q network, the actor network and the critic network.
8. The energy route optimization method according to claim 7, wherein the state sample initial value s comprises: the comprehensive information of the N energy routers n_k, k = 1,…,N.
CN202110261579.7A 2021-03-10 2021-03-10 Energy route optimization method Active CN113132232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110261579.7A CN113132232B (en) 2021-03-10 2021-03-10 Energy route optimization method

Publications (2)

Publication Number Publication Date
CN113132232A CN113132232A (en) 2021-07-16
CN113132232B true CN113132232B (en) 2022-05-20

Family

ID=76773010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110261579.7A Active CN113132232B (en) 2021-03-10 2021-03-10 Energy route optimization method

Country Status (1)

Country Link
CN (1) CN113132232B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113572697B (en) * 2021-07-20 2023-09-22 电子科技大学 Load balancing method based on graph convolution neural network and deep reinforcement learning
CN113780482A (en) * 2021-11-12 2021-12-10 中国科学院理化技术研究所 Intelligent detection method for abnormity of energy router
CN114172840B (en) * 2022-01-17 2022-09-30 河海大学 Multi-microgrid system energy routing method based on graph theory and deep reinforcement learning
CN115022231B (en) * 2022-06-30 2023-11-03 武汉烽火技术服务有限公司 Optimal path planning method and system based on deep reinforcement learning
CN117294640B (en) * 2023-10-13 2024-05-24 北京亿美芯科技有限公司 Vehicle-mounted opportunity routing node selection method and system based on PPO algorithm

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3057192A1 (en) * 2015-02-12 2016-08-17 Northeastern University An energy internet and a hierarchical control system and a control method thereof
CN107911299A (en) * 2017-10-24 2018-04-13 浙江工商大学 A kind of route planning method based on depth Q study
CN108038545A (en) * 2017-12-06 2018-05-15 湖北工业大学 Fast learning algorithm based on Actor-Critic neutral net continuous controls
CN111967179A (en) * 2020-07-02 2020-11-20 江苏能来能源互联网研究院有限公司 Dynamic optimization matching method for energy units of energy Internet

Also Published As

Publication number Publication date
CN113132232A (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN113132232B (en) Energy route optimization method
Kumar et al. A hybrid multi-agent based particle swarm optimization algorithm for economic power dispatch
CN110570034B (en) Bus load prediction method based on multi-XGboost model fusion
CN106909728B (en) FPGA interconnection resource configuration generation method based on reinforcement learning
CN112685657B (en) Conversation social recommendation method based on multi-mode cross fusion graph network
CN114358520A (en) Method, system, device and medium for economic dispatching decision of power system
CN116207739A (en) Optimal scheduling method and device for power distribution network, computer equipment and storage medium
Fu et al. The distributed economic dispatch of smart grid based on deep reinforcement learning
CN117392483B (en) Album classification model training acceleration method, system and medium based on reinforcement learning
CN116451880B (en) Distributed energy optimization scheduling method and device based on hybrid learning
CN111600734B (en) Network fault processing model construction method, fault processing method and system
CN112559904A (en) Conversational social recommendation method based on door mechanism and multi-modal graph network
CN112101651B (en) Electric energy network coordination control method, system and information data processing terminal
CN116974249A (en) Flexible job shop scheduling method and flexible job shop scheduling device
CN115360768A (en) Power scheduling method and device based on muzero and deep reinforcement learning and storage medium
CN111160557B (en) Knowledge representation learning method based on double-agent reinforcement learning path search
CN115001978A (en) Cloud tenant virtual network intelligent mapping method based on reinforcement learning model
CN114662204A (en) Elastic bar system structure system data processing method and device based on graph neural network
CN112036936A (en) Deep Q network-based generator bidding behavior simulation method and system
CN111027709A (en) Information recommendation method and device, server and storage medium
CN115566692B (en) Method and device for determining reactive power optimization decision, computer equipment and storage medium
US11973662B1 (en) Intelligent mapping method for cloud tenant virtual network based on reinforcement learning model
CN118101493B (en) Simulation optimizing method, device, equipment and medium for intelligent computation center network architecture
CN117061605B (en) Intelligent lithium battery active information pushing method and device based on end cloud cooperation
CN109933858B (en) Core division parallel simulation method for power distribution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant