CN113489654A - Routing method, routing device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113489654A
Authority
CN
China
Prior art keywords
network
nodes
target
processed
link
Prior art date
Legal status
Granted
Application number
CN202110764128.5A
Other languages
Chinese (zh)
Other versions
CN113489654B (en)
Inventor
谢可
郭文静
杨成
张楠
Current Assignee
State Grid Information and Telecommunication Co Ltd
Original Assignee
State Grid Information and Telecommunication Co Ltd
Priority date
Filing date
Publication date
Application filed by State Grid Information and Telecommunication Co Ltd filed Critical State Grid Information and Telecommunication Co Ltd
Priority to CN202110764128.5A priority Critical patent/CN113489654B/en
Publication of CN113489654A publication Critical patent/CN113489654A/en
Application granted granted Critical
Publication of CN113489654B publication Critical patent/CN113489654B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00: Traffic control in data switching networks
    • H04L47/10: Flow control; Congestion control
    • H04L47/12: Avoiding congestion; Recovering from congestion
    • H04L47/125: Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12: Discovery or management of network topologies
    • H04L41/14: Network analysis or design
    • H04L41/147: Network analysis or design for predicting network behaviour
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852: Delays
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/02: Topology update or discovery


Abstract

The application provides a routing method, a routing device, an electronic device and a storage medium. In the method, after a graph neural network model is obtained through training, the target network topology structure, the target routing scheme and the target network traffic are input into the graph neural network model to obtain the network delay between nodes in each link in the target routing scheme output by the model, which ensures that the network delay can be predicted accurately despite changes of the network topology and fluctuations of the network traffic. On this basis, the network delay between nodes in each link in the target routing scheme is input into a deep reinforcement learning network model to obtain the action value of each node as each next-hop node output by that model, so that a routing scheme is obtained in real time according to the network delay and the stability of the routing performance is improved.

Description

Routing method, routing device, electronic equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a routing method, an apparatus, an electronic device, and a storage medium.
Background
The power Internet of Things is the foundation and carrier of the digitization of the energy Internet. As its construction advances and the demand for fine-grained energy perception deepens, the platform layer is required to support hundreds of millions of terminal accesses and tens of millions of concurrent connections, which traditional network architectures can no longer satisfy. To meet these requirements, the power Internet of Things can apply Software Defined Network (SDN) technology, which formulates network policies based on macro-level control of network state information and thereby provides more refined service policies for the network. On the basis of the SDN architecture, the power Internet of Things can use machine learning algorithms to implement intelligent routing schemes.
However, the stability of the routing performance of intelligent routing schemes implemented with such machine learning algorithms still needs to be improved.
Disclosure of Invention
The application provides the following technical scheme:
a routing method, comprising:
acquiring a target network topological structure;
generating a target routing scheme and target network traffic based on the target network topology, wherein the target network traffic comprises traffic load between nodes in each link in the target network topology;
inputting the target network topology, the target routing scheme and the target network traffic into a graph neural network model to obtain network delay between nodes in each link in the target routing scheme output by the graph neural network model, wherein the graph neural network model is obtained by utilizing the network topology, the routing scheme and the network traffic and training the network delay between the nodes in each link in the routing scheme;
inputting network delay between nodes in each link in the target routing scheme into a deep reinforcement learning network model to obtain an action value of each node output by the deep reinforcement learning network model as each next hop node, wherein the deep reinforcement learning network model is obtained by utilizing a four-tuple training obtained based on a Markov decision process, and the four-tuple comprises: network delay between nodes in a link, an output action characterizing a selected link, a reward for performing the output action, and network delay between nodes in a link after performing the output action;
and selecting a target node which is respectively used as each next hop node from the plurality of nodes based on the action value of each node output by the deep reinforcement learning network model as each next hop node, and determining a route to be used based on the plurality of target nodes.
Optionally, the inputting the target network topology, the target routing scheme, and the target network traffic into a graph neural network model to obtain a network delay between nodes in each link in the target routing scheme output by the graph neural network model includes:
respectively converting the target network topology structure, the target routing scheme and the target network traffic into a network topology structure to be processed, a routing scheme to be processed and network traffic to be processed which accord with a pre-established uniform resource description model;
normalizing, by using the normalization formula
x' = (x - x_min) / (x_max - x_min),
the traffic load between nodes in each link in the network topology structure to be processed included in the network traffic to be processed to obtain a standard traffic load, wherein x represents the traffic load, x_min represents the minimum value among the traffic loads between nodes in the links of the target network topology, and x_max represents the maximum value among the traffic loads between nodes in the links of the target network topology;
and inputting standard network flow consisting of the network topology to be processed, the routing scheme to be processed and the standard flow load into a graph neural network model to obtain network delay between nodes in each link in the routing scheme to be processed output by the graph neural network model.
Optionally, when the graph neural network adopts a message passing neural network framework, the step of inputting standard network traffic composed of the to-be-processed network topology, the to-be-processed routing scheme, and the standard traffic load into a graph neural network model to obtain a network delay between nodes in each link in the to-be-processed routing scheme output by the graph neural network model includes:
inputting standard network flow composed of the network topology structure to be processed, the routing scheme to be processed and the standard flow load into a message transmission stage model, and obtaining the state of each node in the network topology structure to be processed, which is output by the message transmission stage model through executing a conversion process;
the conversion process comprises: aggregating the current state of each node in the network topology structure to be processed, the traffic load of the adjacent nodes thereof and the states of the adjacent nodes to obtain an aggregation result, and updating the current state of the nodes by using the aggregation result and the historical state of the nodes to obtain the states of the nodes;
and inputting the state of each node in the network topology structure to be processed into a message reading stage model to obtain the network delay between the nodes in each link in the routing scheme to be processed output by the message reading stage model.
Optionally, the training process of the deep reinforcement learning network model includes:
modeling a routing problem as a Markov decision process and obtaining a plurality of quadruples, the quadruples comprising: the network delay between nodes in a link, an output action characterizing a selected link, a reward harvested for performing the output action, and the network delay between nodes in the link after performing the output action;
Writing a plurality of the quadruples into a memory pool;
randomly selecting data from the memory pool as training data, and updating the parameters of a deep Q network used to fit an action state cost function by using the training data, wherein the input of the action state cost function is the network delay between nodes in a link, and its output is the action value of the node as each next-hop node.
Optionally, the selecting, from the plurality of nodes, a target node to be used as each next-hop node based on the action value of each node output by the deep reinforcement learning network model as each next-hop node includes:
and aiming at each next hop node, determining the maximum action value from the action values of the next hop nodes, which are output by the deep reinforcement learning network model, of each node, and taking the node corresponding to the maximum action value as a target node.
A routing device, comprising:
the acquisition module is used for acquiring a target network topological structure;
a generating module, configured to generate a target routing scheme and target network traffic based on the target network topology, where the target network traffic includes a traffic load between nodes in each link in the target network topology;
a first processing module, configured to input the target network topology, the target routing scheme, and the target network traffic into a graph neural network model, and obtain a network delay between nodes in each link in the target routing scheme output by the graph neural network model, where the graph neural network model is obtained by using a network topology, a routing scheme, and network traffic and training a network delay between nodes in each link in the routing scheme;
a second processing module, configured to input network delay between nodes in each link in the target routing scheme into a deep reinforcement learning network model, to obtain an action value of each node output by the deep reinforcement learning network model as each next-hop node, where the deep reinforcement learning network model is obtained by using a quadruple training obtained based on a markov decision process, and the quadruple includes: network delay between nodes in a link, an output action characterizing a selected link, a reward for performing the output action, and network delay between nodes in a link after performing the output action;
and the third processing module is used for selecting a target node which is respectively used as each next hop node from the plurality of nodes based on the action value of each node which is output by the deep reinforcement learning network model and is used as each next hop node, and determining a route to be used based on the plurality of target nodes.
Optionally, the first processing module is specifically configured to:
respectively converting the target network topology structure, the target routing scheme and the target network traffic into a network topology structure to be processed, a routing scheme to be processed and network traffic to be processed which accord with a pre-established uniform resource description model;
normalizing, by using the normalization formula
x' = (x - x_min) / (x_max - x_min),
the traffic load between nodes in each link in the network topology structure to be processed included in the network traffic to be processed to obtain a standard traffic load, wherein x represents the traffic load, x_min represents the minimum value among the traffic loads between nodes in the links of the target network topology, and x_max represents the maximum value among the traffic loads between nodes in the links of the target network topology;
and inputting standard network flow consisting of the network topology to be processed, the routing scheme to be processed and the standard flow load into a graph neural network model to obtain network delay between nodes in each link in the routing scheme to be processed output by the graph neural network model.
Optionally, the first processing module is specifically configured to:
under the condition that the graph neural network adopts a message transmission neural network framework, inputting standard network flow consisting of the network topology structure to be processed, the routing scheme to be processed and the standard flow load into a message transmission stage model, and acquiring the state of each node in the network topology structure to be processed, which is output by the message transmission stage model through executing a conversion process;
the conversion process comprises: aggregating the current state of each node in the network topology structure to be processed, the traffic load of the adjacent nodes thereof and the states of the adjacent nodes to obtain an aggregation result, and updating the current state of the nodes by using the aggregation result and the historical state of the nodes to obtain the states of the nodes;
and inputting the state of each node in the network topology structure to be processed into a message reading stage model to obtain the network delay between the nodes in each link in the routing scheme to be processed output by the message reading stage model.
An electronic device, comprising:
a memory and a processor;
a memory for storing at least one set of instructions;
a processor for calling and executing the instruction set in the memory, and executing the instruction set to perform the steps of the routing method according to any one of the above items.
A storage medium storing a computer program, the computer program being executable by a processor to implement the steps of the routing method according to any one of the above.
Compared with the prior art, the beneficial effect of this application is:
In the application, after a graph neural network model is obtained through training, the target network topology structure, the target routing scheme and the target network traffic are input into the graph neural network model to obtain the network delay between nodes in each link in the target routing scheme output by the model, which ensures that the network delay can be predicted accurately despite changes of the network topology and fluctuations of the network traffic. On this basis, the network delay between nodes in each link in the target routing scheme is input into a deep reinforcement learning network model to obtain the action value of each node as each next-hop node output by that model, so that the routing scheme is obtained in real time according to the network delay and the stability of the routing performance is improved.
Furthermore, a routing scheme for improving the stability of the routing performance is adopted, so that the blocking probability can be effectively reduced, and the throughput of the network is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
Fig. 1 is a schematic flowchart of a routing method provided in embodiment 1 of the present application;
fig. 2 is a schematic flowchart of a routing method provided in embodiment 2 of the present application;
fig. 3 is a schematic flowchart of a routing method provided in embodiment 3 of the present application;
fig. 4 is a schematic logical structure diagram of a routing device according to embodiment 4 of the present application;
fig. 5 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to solve the above problem, the present application provides a routing method, and the routing method provided in the present application is described next.
Referring to fig. 1, which is a schematic flow chart of a routing method provided in embodiment 1 of the present application, the routing method provided in the present application may be applied to an electronic device, and the product type of the electronic device is not limited in the present application, as shown in fig. 1, the method may include, but is not limited to, the following steps:
and S101, acquiring a target network topology structure.
In this embodiment, the target network topology may be obtained by the SDN controller.
Step S102, based on the target network topological structure, generating a target routing scheme and target network flow, wherein the target network flow comprises flow load between nodes in each link in the target network topological structure.
Generating a target routing scheme and target network traffic based on the target network topology structure can be understood as randomly generating, through experimental simulation, a target routing scheme and target network traffic on the basis of the target network topology structure.
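As an illustrative sketch only (the simulation environment, the stand-in topology and the helper names below are assumptions, not details from this application), a random traffic matrix and a simple candidate routing scheme could be produced from a given topology with the networkx and numpy Python libraries as follows:

import numpy as np
import networkx as nx

def make_traffic(graph: nx.Graph, max_load: float = 100.0) -> np.ndarray:
    """Random traffic load f_ij for every node pair (zero on the diagonal)."""
    n = graph.number_of_nodes()
    traffic = np.random.uniform(0.0, max_load, size=(n, n))
    np.fill_diagonal(traffic, 0.0)
    return traffic

def make_routing(graph: nx.Graph) -> dict:
    """One candidate routing scheme: here simply a shortest path per node pair."""
    return {(src, dst): nx.shortest_path(graph, src, dst)
            for src in graph.nodes for dst in graph.nodes if src != dst}

topology = nx.cycle_graph(8)            # stand-in for the target network topology
target_traffic = make_traffic(topology)
target_routing = make_routing(topology)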
Step S103, inputting the target network topology structure, the target routing scheme and the target network flow into a graph neural network model to obtain the network delay between nodes in each link in the target routing scheme output by the graph neural network model.
The graph neural network model is trained by using network topology structures, routing schemes and network traffic as inputs and the network delay between nodes in each link in the routing scheme as the output. Specifically, the training process of the graph neural network model may include:
s1031, acquiring a network topology structure through an SDN controller, and generating a routing scheme and network traffic based on the network topology structure, wherein the network traffic comprises traffic load among nodes in each link in the network topology structure;
s1032, acquiring the network delay among the nodes in each link in the routing scheme through experiments.
S1033, taking the network topology structure, the routing scheme and the network flow as input data, taking the network delay between nodes in each link in the routing scheme as output data, and training a graph neural network model.
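A minimal sketch of the supervised training in steps S1031 to S1033 is given below; the model's input format, the sample layout and the hyperparameters are assumptions, and PyTorch is used only for illustration, not as the implementation of this application:

import torch
import torch.nn as nn

def train_delay_model(model: nn.Module, samples, epochs: int = 50, lr: float = 1e-3):
    """samples: iterable of (topology, routing_scheme, traffic, measured_delay),
    where measured_delay holds the per-link delays obtained from experiments."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for topology, routing, traffic, measured_delay in samples:
            predicted_delay = model(topology, routing, traffic)  # per-link delays
            loss = loss_fn(predicted_delay, measured_delay)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model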
In this embodiment, when the graph neural network adopts a message passing neural network framework, inputting the target network topology structure, the target routing scheme and the target network traffic into the graph neural network model to obtain the network delay between nodes in each link in the target routing scheme output by the graph neural network model may include:
s1034, inputting the target network topological structure, the target routing scheme and the target network flow into a message transmission stage model, and obtaining the state of each node in the target network topological structure output by the message transmission stage model through executing a conversion process;
the conversion process comprises: and aggregating the current state of each node in the target network topology structure, the traffic load of the adjacent nodes of the node and the states of the adjacent nodes to obtain an aggregation result, and updating the current state of the node by using the aggregation result and the historical state of the node to obtain the state of the node.
The current state of a node can be understood as an aggregation of the traffic load of the node and the network topology relation of its adjacent nodes. The state of an adjacent node can be understood as an aggregation of the traffic load of that adjacent node and the network topology relation of its own adjacent nodes. The historical state of a node can be understood as an aggregation of the historical traffic load of the node and the network topology relation of its adjacent nodes.
In this embodiment, aggregating the current state of each node in the target network topology structure, the traffic load of its adjacent nodes and the states of its adjacent nodes may be implemented by the following relational expression:
m_v^(t+1) = Σ_{w ∈ N(v)} M_t(h_v^t, h_w^t, e_vw)
wherein M_t denotes the conversion (message) function, h_v^t is the current state of node v, h_w^t is the state of the adjacent node w, e_vw is the network delay between the node and the adjacent node, and m_v^(t+1) is the aggregation result.
Updating the current state of the node by using the aggregation result and the historical state of the node to obtain the state of the node may be implemented by the following formula:
h_v^(t+1) = U_t(h_v^t, m_v^(t+1))
wherein U_t is the update (output) function, h_v^t is the historical state of the node, m_v^(t+1) is the aggregation result, and h_v^(t+1) is the updated node state.
S1035, inputting the state of each node in the target network topology into the message reading stage model, to obtain the network delay between nodes in each link in the target routing scheme output by the message reading stage model.
Inputting the state of each node and the states of its adjacent nodes in the target network topology structure into the message reading stage model to obtain the network delay between nodes in each link in the target routing scheme output by the message reading stage model may be implemented by the following formula, in which the message reading stage aggregates the node features and outputs a predicted value of the link delay:
d̂_ij = R(h_i, h_j)
wherein R is the readout function, which aggregates the states of the nodes in the target network topology structure to obtain an aggregated state and predicts the network delay between nodes in a link based on the aggregated state, h_i and h_j are the states of the nodes at the two ends of the link, and d̂_ij is the predicted network delay between the nodes in the link.
In this embodiment, the message passing stage model may be implemented with Gated Recurrent Units (GRU). The GRU layer may adopt a dynamic RNN structure, which can process path information of different lengths and outputs the state of each node from the acquired network topology, network traffic and routing scheme. The message reading stage model may adopt a fully connected neural network, which outputs the predicted network delay between nodes in a link from the input state values of the node and its adjacent nodes.
Because the GRU layer adopts a dynamic RNN structure, it can process path information of different lengths and adapt to dynamic changes of the network topology, which further improves the stability of the routing performance.
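For illustration only, the following is a minimal sketch of such a message passing stage (neighbor aggregation followed by a GRU state update) and a fully connected readout stage, written with PyTorch; the tensor layout, state dimension, number of message passing rounds and the summation-based aggregation are assumptions rather than details taken from this application.

import torch
import torch.nn as nn

class MessagePassingDelayModel(nn.Module):
    """Sketch: several rounds of neighbor aggregation (M_t) and GRU state
    update (U_t), then a fully connected readout (R) that predicts the delay
    of each link from the states of its two endpoint nodes."""
    def __init__(self, state_dim: int = 32, rounds: int = 4):
        super().__init__()
        self.rounds = rounds
        self.message = nn.Linear(2 * state_dim + 1, state_dim)    # M_t
        self.update = nn.GRUCell(state_dim, state_dim)            # U_t
        self.readout = nn.Sequential(                             # R
            nn.Linear(2 * state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, states, links, link_features):
        # states: [num_nodes, state_dim]; links: [num_links, 2] endpoint indices
        # (list both directions for undirected links); link_features:
        # [num_links, 1], e.g. the normalized traffic load of each link.
        for _ in range(self.rounds):
            src, dst = links[:, 0], links[:, 1]
            msgs = self.message(torch.cat(
                [states[src], states[dst], link_features], dim=-1))
            agg = torch.zeros_like(states)
            agg.index_add_(0, dst, msgs)        # sum messages per receiving node
            states = self.update(agg, states)   # combine aggregation and old state
        src, dst = links[:, 0], links[:, 1]
        return self.readout(
            torch.cat([states[src], states[dst]], dim=-1)).squeeze(-1)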
Step S104, inputting the network delay between nodes in each link in the target routing scheme into a deep reinforcement learning network model, and obtaining the action value of each node output by the deep reinforcement learning network model as each next hop node.
The deep reinforcement learning network model is obtained by utilizing a four-tuple training obtained based on a Markov decision process, wherein the four-tuple comprises: network delay between nodes in a link, an output action characterizing a selected link, a reward for performing the output action, and network delay between nodes in a link after performing the output action. In particular, the deep reinforcement learning network model may be, but is not limited to: deep Q Network (DQN) model. In the case that the deep reinforcement learning network model is a deep Q network model, the training process of the deep reinforcement learning network model may include:
s1401, modeling a routing problem as a Markov decision process, and obtaining a plurality of quadruples, wherein the quadruples comprise: network delay between nodes in a link, an output action characterizing a selected link, a reward for performing the output action, and network delay between nodes in a link after performing the output action.
In this embodiment, the deep reinforcement learning model may model the routing problem as a Markov decision process, which is a Markov process that includes rewards and decisions and is represented by a quadruple <s, a, r, s'>, where s represents the state information of the environment (i.e., the network delay between nodes in the link), a represents the output action of the deep reinforcement learning algorithm (i.e., the output action characterizing the selected link), r represents the reward harvested for performing the output action a, and s' represents the state information of the environment after performing the output action a (i.e., the network delay between nodes in the link after performing the output action a).
And S1402, writing the quadruples into a memory pool.
S1403, randomly selecting data from the memory pool as training data, and updating the parameters of a deep Q network used to fit an action state cost function by using the training data, wherein the input of the action state cost function is the network delay between nodes in a link, and its output is the action value of the node as each next-hop node.
In this embodiment, the fitting action state cost function may be represented as Q (s, a; θ), where θ is a parameter of the deep Q network model.
Updating parameters of a deep Q network used to fit an action state cost function using the training data may include:
s14031, updating the parameters of the depth Q network by adopting a gradient descent method, wherein the calculation formula is as follows:
Figure BDA0003150260230000101
wherein α is the learning rate;
s14032, substituting the calculated parameters of the depth Q network into the defined loss function
J(θ)=[rt+1+γmaxαQ(st+1,a;θ)-Q(st,a;θ)]2
S14033, determining whether the loss function converges:
if not, returning to execute step S14031; if the convergence is reached, the training is ended.
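For illustration, the following sketch shows how the quadruples <s, a, r, s'> could be stored in a memory pool and randomly sampled to update the deep Q network with the squared TD-error loss J(θ) defined above. PyTorch is used only for convenience; the network size, batch size and discount factor γ are assumptions, and a single Q network is used for both the prediction and the target, matching the loss as written above.

import random
from collections import deque
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Stand-in deep Q network: maps the state (per-link network delays) to the
    action value of each candidate next-hop node."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, num_actions))

    def forward(self, state):
        return self.net(state)

memory = deque(maxlen=10000)        # memory pool of quadruples <s, a, r, s'>
# memory.append((s, a, r, s_next))  # S1402: write each collected quadruple

def dqn_update(q_net, optimizer, memory, batch_size=32, gamma=0.9):
    # S1403 / S14031: randomly draw training data and take one gradient step.
    batch = random.sample(list(memory), batch_size)
    states, actions, rewards, next_states = map(torch.stack, zip(*batch))
    q_sa = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        td_target = rewards + gamma * q_net(next_states).max(dim=1).values
    loss = ((td_target - q_sa) ** 2).mean()   # J(theta) from S14032
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()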
Step S105, based on the action value of each node outputted by the deep reinforcement learning network model as each next-hop node, selecting a target node from the plurality of nodes as each next-hop node, and based on the plurality of target nodes, determining a route to be used.
In the application, after a graph neural network model is obtained through training, the target network topology structure, the target routing scheme and the target network traffic are input into the graph neural network model to obtain the network delay between nodes in each link in the target routing scheme output by the model, which ensures that the network delay can be predicted accurately despite changes of the network topology and fluctuations of the network traffic. On this basis, the network delay between nodes in each link in the target routing scheme is input into a deep reinforcement learning network model to obtain the action value of each node as each next-hop node output by that model, so that the routing scheme is obtained in real time according to the network delay and the stability of the routing performance is improved.
Furthermore, a routing scheme for improving the stability of the routing performance is adopted, so that the blocking probability can be effectively reduced, and the throughput of the network is improved.
As another alternative embodiment of the present application, referring to fig. 2, a schematic flow chart of a routing method provided in embodiment 2 of the present application is provided, and this embodiment mainly describes a refinement scheme of the routing method described in embodiment 1 above, as shown in fig. 2, the method may include, but is not limited to, the following steps:
step S201, obtaining a target network topology structure.
Step S202, based on the target network topology structure, generating a target routing scheme and target network traffic, wherein the target network traffic includes traffic load between nodes in each link in the target network topology structure.
The detailed processes of steps S201 to S202 can be referred to the related descriptions of steps S101 to S102 in embodiment 1, and are not described herein again.
Step S203, the target network topology structure, the target routing scheme and the target network traffic are respectively converted into a network topology structure to be processed, a routing scheme to be processed and a network traffic to be processed which accord with a pre-established uniform resource description model.
The pre-established uniform resource description model may be:
network topology N ═ NijIn which n isijDenotes a link between the ith node and the jth node, when the link is a path, nijWhen i is j, n is 1ij0; routing scheme C ═ { C ═ CijRepresents the set of physical links under the current scheme policy; network flow F ═ FijTherein offijRepresenting the traffic load of the link between nodes i, j; link delay D ═ DijIn which d isijRepresenting the network delay between nodes i, j.
Step S204, normalizing, by using the normalization formula
x' = (x - x_min) / (x_max - x_min),
the traffic load between nodes in each link in the network topology structure to be processed included in the network traffic to be processed, to obtain a standard traffic load.
Here x represents the traffic load, x_min represents the minimum value among the traffic loads between nodes in the links of the target network topology, and x_max represents the maximum value among the traffic loads between nodes in the links of the target network topology.
Normalizing the traffic load between nodes in each link in the network topology structure to be processed included in the network traffic to be processed with the normalization formula x' = (x - x_min) / (x_max - x_min) eliminates the dimensional influence between the traffic loads, which increases the processing speed and improves the routing efficiency.
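A one-function sketch of this min-max normalization is shown below; the epsilon guard against a zero denominator is an added assumption and not part of the formula above:

import numpy as np

def normalize_traffic(traffic: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Apply x' = (x - x_min) / (x_max - x_min) to the whole traffic-load matrix."""
    x_min, x_max = traffic.min(), traffic.max()
    return (traffic - x_min) / (x_max - x_min + eps)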
Step S205, inputting standard network flow composed of the network topology to be processed, the routing scheme to be processed and the standard flow load into a graph neural network model, and obtaining network delay between nodes in each link in the routing scheme to be processed output by the graph neural network model.
In a case that the graph neural network adopts a message passing neural network framework, the inputting standard network traffic composed of the to-be-processed network topology, the to-be-processed routing scheme, and the standard traffic load to a graph neural network model to obtain a network delay between nodes in each link in the to-be-processed routing scheme output by the graph neural network model may include:
s2051, inputting standard network traffic composed of the network topology to be processed, the routing scheme to be processed and the standard traffic load into a message transmission stage model, and obtaining the state of each node in the network topology to be processed, which is output by the message transmission stage model through executing a conversion process;
the conversion process comprises: and aggregating the current state of each node in the network topology structure to be processed, the traffic load of the adjacent nodes thereof and the states of the adjacent nodes to obtain an aggregation result, and updating the current state of the nodes by using the aggregation result and the historical state of the nodes to obtain the states of the nodes.
S2052, inputting the state of each node in the network topology structure to be processed into a message reading stage model, and obtaining the network delay between the nodes in each link in the routing scheme to be processed output by the message reading stage model.
The detailed processes of steps S2051-S2052 can be referred to the related descriptions of S1034-S1035 in embodiment 1, and are not described herein again.
Steps S203 to S205 are a specific implementation of step S103 in example 1.
Step S206, inputting the network delay between the nodes in each link in the routing scheme to be processed into a deep reinforcement learning network model, and obtaining the action value of each node output by the deep reinforcement learning network model as each next hop node.
The deep reinforcement learning network model is obtained by utilizing a four-tuple training obtained based on a Markov decision process, wherein the four-tuple comprises: network delay between nodes in a link, an output action characterizing a selected link, a reward for performing the output action, and network delay between nodes in a link after performing the output action.
Step S206 is a specific implementation manner of step S104 in embodiment 1. The detailed process of inputting the network delay between nodes in each link in the to-be-processed routing scheme into the deep reinforcement learning network model to obtain the action value of each node output by the deep reinforcement learning network model as each next-hop node may be referred to in the related description of step S104 in embodiment 1, and is not described herein again.
Step S207, based on the action value of each node outputted by the deep reinforcement learning network model as each next-hop node, selecting a target node from the plurality of nodes as each next-hop node, and based on the plurality of target nodes, determining a route to be used.
In this embodiment, the target network topology structure, the target routing scheme and the target network traffic are respectively converted into the network topology structure to be processed, the routing scheme to be processed and the network traffic to be processed conforming to the pre-established uniform resource description model, and the traffic load between nodes in each link in the network topology structure to be processed included in the network traffic to be processed is normalized with the normalization formula x' = (x - x_min) / (x_max - x_min) to obtain a standard traffic load, which increases the processing speed and improves the routing efficiency.
And on the basis of obtaining a graph neural network model through training, inputting the network topology structure to be processed, the routing scheme to be processed and the standard network flow into the graph neural network model, obtaining the network delay among nodes in each link in the routing scheme to be processed output by the graph neural network model, and ensuring that the accurate prediction of the network delay can be carried out according to the change of the network topology and the fluctuation of the network flow.
As another alternative embodiment of the present application, referring to fig. 3, a schematic flow chart of a routing method provided in embodiment 3 of the present application is provided, and this embodiment mainly describes a refinement scheme of the routing method described in embodiment 1 above, as shown in fig. 3, the method may include, but is not limited to, the following steps:
s301, acquiring a target network topological structure;
step S302, based on the target network topological structure, generating a target routing scheme and target network traffic, wherein the target network traffic comprises traffic load between nodes in each link in the target network topological structure;
step S303, inputting the target network topology structure, the target routing scheme and the target network traffic into a graph neural network model to obtain the network delay between nodes in each link in the target routing scheme output by the graph neural network model.
The graph neural network model is obtained by utilizing a network topological structure, a routing scheme, network flow and network delay training between the network topological structure, the routing scheme and nodes in each link in the routing scheme.
Step S304, inputting the network delay between nodes in each link in the target routing scheme into a deep reinforcement learning network model, and obtaining the action value of each node output by the deep reinforcement learning network model as each next hop node.
The deep reinforcement learning network model is obtained by utilizing a four-tuple training obtained based on a Markov decision process, wherein the four-tuple comprises: network delay between nodes in a link, an output action characterizing a selected link, a reward for performing the output action, and network delay between nodes in a link after performing the output action.
The detailed processes of steps S301 to S304 can refer to the related descriptions of steps S101 to S104 in embodiment 1, and are not described herein again.
Step S305, for each next hop node, determining a maximum action value from the action values of the next hop node of each node output from the deep reinforcement learning network model, taking the node corresponding to the maximum action value as a target node, and determining a route to be used based on a plurality of target nodes.
Step S305 is a specific implementation manner of step S105 in example 1.
In this embodiment, for each next hop node, the maximum action value is determined from the action values of the next hop nodes of each node output by the deep reinforcement learning network model, and the node corresponding to the maximum action value is used as the target node, so that the accuracy of the target node can be ensured, the reliability of the route to be used is further ensured, and the stability of the routing performance is improved.
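For illustration, selecting the target node for each next-hop decision reduces to an argmax over the action values; the tensor layout below (one row of action values per hop decision) is an assumption:

import torch

def select_next_hops(q_values: torch.Tensor) -> list:
    """q_values: [num_hops, num_nodes] action values from the deep Q network.
    Returns, for each hop, the index of the node with the maximum action value."""
    return torch.argmax(q_values, dim=1).tolist()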
Next, a description will be given of a routing device provided in the present application, and the routing device described below and the routing method described above may be referred to in correspondence.
Referring to fig. 4, the routing apparatus includes: an acquisition module 100, a generation module 200, a first processing module 300, a second processing module 400 and a third processing module 500.
An obtaining module 100, configured to obtain a target network topology.
A generating module 200, configured to generate a target routing scheme and target network traffic based on the target network topology, where the target network traffic includes a traffic load between nodes in each link in the target network topology.
A first processing module 300, configured to input the target network topology, the target routing scheme, and the target network traffic into a graph neural network model, and obtain a network delay between nodes in each link in the target routing scheme output by the graph neural network model, where the graph neural network model is obtained by using a network topology, a routing scheme, and network traffic and training a network delay between nodes in each link in the routing scheme.
A second processing module 400, configured to input the network delay between nodes in each link in the target routing scheme into a deep reinforcement learning network model, to obtain an action value of each node output by the deep reinforcement learning network model as each next-hop node, where the deep reinforcement learning network model is obtained by using a quadruple training obtained based on a markov decision process, and the quadruple includes: network delay between nodes in a link, an output action characterizing a selected link, a reward for performing the output action, and network delay between nodes in a link after performing the output action;
a third processing module 500, configured to select, based on an action value of each node output by the deep reinforcement learning network model as each next-hop node, a target node from the multiple nodes, which is respectively used as each next-hop node, and determine a route to be used based on the multiple target nodes.
The first processing module 300 may be specifically configured to:
respectively converting the target network topology structure, the target routing scheme and the target network traffic into a network topology structure to be processed, a routing scheme to be processed and network traffic to be processed which accord with a pre-established uniform resource description model;
normalizing, by using the normalization formula
x' = (x - x_min) / (x_max - x_min),
the traffic load between nodes in each link in the network topology structure to be processed included in the network traffic to be processed to obtain a standard traffic load, wherein x represents the traffic load, x_min represents the minimum value among the traffic loads between nodes in the links of the target network topology, and x_max represents the maximum value among the traffic loads between nodes in the links of the target network topology;
and inputting standard network flow consisting of the network topology to be processed, the routing scheme to be processed and the standard flow load into a graph neural network model to obtain network delay between nodes in each link in the routing scheme to be processed output by the graph neural network model.
The first processing module 300 may be specifically configured to:
under the condition that the graph neural network adopts a message transmission neural network framework, inputting standard network flow consisting of the network topology structure to be processed, the routing scheme to be processed and the standard flow load into a message transmission stage model, and acquiring the state of each node in the network topology structure to be processed, which is output by the message transmission stage model through executing a conversion process;
the conversion process comprises: aggregating the current state of each node in the network topology structure to be processed, the traffic load of the adjacent nodes thereof and the states of the adjacent nodes to obtain an aggregation result, and updating the current state of the nodes by using the aggregation result and the historical state of the nodes to obtain the states of the nodes;
and inputting the state of each node in the network topology structure to be processed into a message reading stage model to obtain the network delay between the nodes in each link in the routing scheme to be processed output by the message reading stage model.
The routing device may further include: the deep reinforcement learning network model training module is used for:
modeling a routing problem as a Markov decision process and obtaining a plurality of quadruples, the quadruples comprising: the network delay between nodes in a link, an output action characterizing a selected link, a reward harvested for performing the output action, and the network delay between nodes in the link after performing the output action;
Writing a plurality of the quadruples into a memory pool;
randomly selecting data from the memory pool as training data, and updating the parameters of a deep Q network used to fit an action state cost function by using the training data, wherein the input of the action state cost function is the network delay between nodes in a link, and its output is the action value of the node as each next-hop node.
In this embodiment, the third processing module 500 may be specifically configured to:
and aiming at each next hop node, determining the maximum action value from the action values of the next hop nodes, which are output by the deep reinforcement learning network model, of each node, and taking the node corresponding to the maximum action value as a target node.
Corresponding to the embodiment of the routing method provided by the application, the application also provides an embodiment of electronic equipment applying the routing method.
As shown in fig. 5, which is a schematic structural diagram of an embodiment 1 of an electronic device provided in the present application, the electronic device may include the following structures:
a memory 10 and a processor 20.
A memory 10 for storing at least one set of instructions;
the processor 20 is configured to call and execute the instruction set in the memory 10, and execute the steps of the routing method according to the embodiments described above by executing the instruction set.
Corresponding to the embodiment of the routing method provided by the present application, the present application also provides an embodiment of a storage medium.
In this embodiment, a storage medium stores a computer program, and the computer program is executed by a processor to implement the steps of the routing method according to any one of the foregoing embodiments.
It should be noted that each embodiment is mainly described as a difference from the other embodiments, and the same and similar parts between the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The foregoing detailed description is directed to a routing method, apparatus, electronic device, and storage medium provided by the present application, and specific examples are applied in the present application to explain the principles and implementations of the present application, and the descriptions of the foregoing examples are only used to help understand the method and core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A routing method, comprising:
acquiring a target network topological structure;
generating a target routing scheme and target network traffic based on the target network topology, wherein the target network traffic comprises traffic load between nodes in each link in the target network topology;
inputting the target network topology, the target routing scheme and the target network traffic into a graph neural network model to obtain network delay between nodes in each link in the target routing scheme output by the graph neural network model, wherein the graph neural network model is obtained by utilizing the network topology, the routing scheme and the network traffic and training the network delay between the nodes in each link in the routing scheme;
inputting network delay between nodes in each link in the target routing scheme into a deep reinforcement learning network model to obtain an action value of each node output by the deep reinforcement learning network model as each next hop node, wherein the deep reinforcement learning network model is obtained by utilizing a four-tuple training obtained based on a Markov decision process, and the four-tuple comprises: network delay between nodes in a link, an output action characterizing a selected link, a reward for performing the output action, and network delay between nodes in a link after performing the output action;
and selecting a target node which is respectively used as each next hop node from the plurality of nodes based on the action value of each node output by the deep reinforcement learning network model as each next hop node, and determining a route to be used based on the plurality of target nodes.
2. The method of claim 1, wherein inputting the target network topology, the target routing scheme, and the target network traffic to a graph neural network model to obtain a network delay between nodes in each link in the target routing scheme output by the graph neural network model comprises:
respectively converting the target network topology structure, the target routing scheme and the target network traffic into a network topology structure to be processed, a routing scheme to be processed and network traffic to be processed which accord with a pre-established uniform resource description model;
normalizing, by using the normalization formula
x' = (x - x_min) / (x_max - x_min),
the traffic load between nodes in each link in the network topology structure to be processed included in the network traffic to be processed to obtain a standard traffic load, wherein x represents the traffic load, x_min represents the minimum value among the traffic loads between nodes in the links of the target network topology, and x_max represents the maximum value among the traffic loads between nodes in the links of the target network topology;
and inputting standard network flow consisting of the network topology to be processed, the routing scheme to be processed and the standard flow load into a graph neural network model to obtain network delay between nodes in each link in the routing scheme to be processed output by the graph neural network model.
3. The method of claim 2, wherein, in a case that the graph neural network adopts a message passing neural network framework, the inputting standard network traffic composed of the topology of the network to be processed, the routing scheme to be processed and the standard traffic load into a graph neural network model to obtain the network delay between nodes in each link in the routing scheme to be processed output by the graph neural network model comprises:
inputting standard network flow composed of the network topology structure to be processed, the routing scheme to be processed and the standard flow load into a message transmission stage model, and obtaining the state of each node in the network topology structure to be processed, which is output by the message transmission stage model through executing a conversion process;
the conversion process comprises: aggregating the current state of each node in the network topology structure to be processed, the traffic load of the adjacent nodes thereof and the states of the adjacent nodes to obtain an aggregation result, and updating the current state of the nodes by using the aggregation result and the historical state of the nodes to obtain the states of the nodes;
and inputting the state of each node in the network topology structure to be processed into a message reading stage model to obtain the network delay between the nodes in each link in the routing scheme to be processed output by the message reading stage model.
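The message passing and readout steps of claim 3 could look roughly like the toy sketch below, written with PyTorch; the tensor shapes, the linear message function, the GRU cell used for the update, and the linear readout are all assumptions for illustration, not the patent's actual model.

import torch
import torch.nn as nn

class TinyMPNN(nn.Module):
    """Toy message-passing network: per-link delay from node states and traffic loads."""
    def __init__(self, dim=16):
        super().__init__()
        self.message = nn.Linear(2 * dim + 1, dim)   # (destination state, neighbour state, load) -> message
        self.update = nn.GRUCell(dim, dim)           # combine the aggregation result with the node's history
        self.readout = nn.Linear(2 * dim, 1)         # (source state, destination state) -> link delay

    def forward(self, states, edges, loads, steps=3):
        # states: [N, dim] node states; edges: list of (src, dst) pairs; loads: [E] traffic load per link
        for _ in range(steps):
            agg = torch.zeros_like(states)
            for (src, dst), load in zip(edges, loads):
                msg = self.message(torch.cat([states[dst], states[src], load.view(1)]))
                agg[dst] = agg[dst] + msg            # aggregation over adjacent nodes and their loads
            states = self.update(agg, states)        # update each node state with the aggregation result
        # readout stage: predict the delay of every link from the states of its two endpoints
        return torch.stack([self.readout(torch.cat([states[s], states[d]]))
                            for s, d in edges]).squeeze(-1)

# e.g. TinyMPNN()(torch.zeros(4, 16), [(0, 1), (1, 2), (2, 3)], torch.tensor([0.3, 0.7, 0.5]))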
4. The method of claim 1, wherein the training process of the deep reinforcement learning network model comprises:
modeling a routing problem as a Markov decision process and obtaining a plurality of quadruples, each quadruple comprising: the network delay between nodes in a link, an output action characterizing a selected link, a reward obtained by performing the output action, and the network delay between nodes in a link after the output action is performed;
writing a plurality of the quadruples into a memory pool;
and randomly selecting data from the memory pool as training data, and updating parameters of a deep Q network for fitting an action-state value function by using the training data, wherein the input of the action-state value function is the network delay between nodes in a link, and the output of the action-state value function is the action value of each node as the next-hop node.
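A hedged sketch of the training step in claim 4, again in PyTorch: quadruples (such as the Transition objects sketched above, with tensor-valued fields) sit in a memory pool, a random minibatch is drawn, and the deep Q network that fits the action-state value function is updated. The network shape, discount factor and loss are assumptions for illustration.

import random
import torch
import torch.nn as nn

class DeepQNetwork(nn.Module):
    """Maps per-link delays to an action value for each candidate next-hop node."""
    def __init__(self, n_links, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_links, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, delays):
        return self.net(delays)

def train_step(q_net, memory_pool, optimizer, batch_size=32, gamma=0.9):
    batch = random.sample(memory_pool, batch_size)        # random selection from the memory pool
    states = torch.stack([t.state for t in batch])
    actions = torch.tensor([t.action for t in batch])
    rewards = torch.tensor([t.reward for t in batch])
    next_states = torch.stack([t.next_state for t in batch])

    # Q(s, a) for the actions that were actually taken
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                                  # bootstrapped target from the next state
        targets = rewards + gamma * q_net(next_states).max(dim=1).values
    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()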
5. The method according to claim 1, wherein the selecting a target node from the plurality of nodes as each next-hop node based on the action value of each node as each next-hop node output by the deep reinforcement learning network model comprises:
and for each next-hop node, determining the maximum action value from the action values, output by the deep reinforcement learning network model, of each node as that next-hop node, and taking the node corresponding to the maximum action value as the target node.
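Claim 5 amounts to an argmax over the action values; a minimal sketch follows (the dict-based interface is an assumption for illustration):

def select_next_hops(action_values_per_hop):
    """For each next hop, pick the node whose action value is largest."""
    return [max(values, key=values.get) for values in action_values_per_hop]

# e.g. select_next_hops([{"A": 0.2, "B": 0.9}, {"C": 0.4, "D": 0.1}]) returns ["B", "C"]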
6. A routing device, comprising:
the acquisition module is used for acquiring a target network topological structure;
a generating module, configured to generate a target routing scheme and target network traffic based on the target network topology, where the target network traffic includes a traffic load between nodes in each link in the target network topology;
a first processing module, configured to input the target network topology, the target routing scheme, and the target network traffic into a graph neural network model, and obtain the network delay between nodes in each link in the target routing scheme output by the graph neural network model, where the graph neural network model is obtained by training with a network topology, a routing scheme, network traffic, and the network delay between nodes in each link in the routing scheme;
a second processing module, configured to input the network delay between nodes in each link in the target routing scheme into a deep reinforcement learning network model to obtain an action value, output by the deep reinforcement learning network model, of each node as each next-hop node, where the deep reinforcement learning network model is obtained by training with quadruples derived from a Markov decision process, and each quadruple comprises: the network delay between nodes in a link, an output action characterizing a selected link, a reward obtained by performing the output action, and the network delay between nodes in a link after the output action is performed;
and a third processing module, configured to select, from the plurality of nodes, a target node to serve as each next-hop node based on the action value, output by the deep reinforcement learning network model, of each node as each next-hop node, and to determine a route to be used based on the plurality of target nodes.
7. The apparatus of claim 6, wherein the first processing module is specifically configured to:
respectively converting the target network topology structure, the target routing scheme and the target network traffic into a network topology structure to be processed, a routing scheme to be processed and network traffic to be processed which accord with a pre-established uniform resource description model;
using a normalization formula x' = (x - x_min) / (x_max - x_min) to normalize the traffic load between nodes in each link in the network topology structure to be processed included in the network traffic to be processed, so as to obtain a standard traffic load, wherein x represents the traffic load, x_min represents a minimum value among the traffic loads between nodes in the plurality of links in the target network topology, and x_max represents a maximum value among the traffic loads between nodes in the plurality of links in the target network topology;
and inputting standard network traffic consisting of the network topology structure to be processed, the routing scheme to be processed and the standard traffic load into the graph neural network model to obtain the network delay between nodes in each link in the routing scheme to be processed output by the graph neural network model.
8. The apparatus of claim 7, wherein the first processing module is specifically configured to:
in the case that the graph neural network adopts a message passing neural network framework, inputting the standard network traffic consisting of the network topology structure to be processed, the routing scheme to be processed and the standard traffic load into a message transmission stage model, and acquiring the state of each node in the network topology structure to be processed, which is output by the message transmission stage model through executing a conversion process;
the conversion process comprises: aggregating the current state of each node in the network topology structure to be processed, the traffic load of the adjacent nodes thereof and the states of the adjacent nodes to obtain an aggregation result, and updating the current state of the nodes by using the aggregation result and the historical state of the nodes to obtain the states of the nodes;
and inputting the state of each node in the network topology structure to be processed into a message reading stage model to obtain the network delay between the nodes in each link in the routing scheme to be processed output by the message reading stage model.
9. An electronic device, comprising:
a memory and a processor;
a memory for storing at least one set of instructions;
a processor for invoking and executing the set of instructions in the memory, so as to perform the steps of the routing method according to any one of claims 1 to 5.
10. A storage medium storing a computer program which, when executed by a processor, implements the steps of the routing method according to any one of claims 1 to 5.
CN202110764128.5A 2021-07-06 2021-07-06 Routing method, device, electronic equipment and storage medium Active CN113489654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110764128.5A CN113489654B (en) 2021-07-06 2021-07-06 Routing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113489654A (en) 2021-10-08
CN113489654B CN113489654B (en) 2024-01-05

Family

ID=77941379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110764128.5A Active CN113489654B (en) 2021-07-06 2021-07-06 Routing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113489654B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900419A (en) * 2018-08-17 2018-11-27 北京邮电大学 Route decision method and device based on deeply study under SDN framework
WO2020168158A1 (en) * 2019-02-15 2020-08-20 Rigetti & Co, Inc. Automated synthesizing of quantum programs
CN110986979A (en) * 2019-11-27 2020-04-10 浙江工商大学 SDN multi-path routing planning method based on reinforcement learning
CN111010294A (en) * 2019-11-28 2020-04-14 国网甘肃省电力公司电力科学研究院 Electric power communication network routing method based on deep reinforcement learning
CN111245673A (en) * 2019-12-30 2020-06-05 浙江工商大学 SDN time delay sensing method based on graph neural network
CN111245718A (en) * 2019-12-30 2020-06-05 浙江工商大学 Routing optimization method based on SDN context awareness
CN111756653A (en) * 2020-06-04 2020-10-09 北京理工大学 Multi-coflow scheduling method based on deep reinforcement learning of graph neural network
CN112116129A (en) * 2020-08-24 2020-12-22 中山大学 Dynamic path optimization problem solving method based on deep reinforcement learning
CN112491712A (en) * 2020-11-30 2021-03-12 复旦大学 Data packet routing algorithm based on multi-agent deep reinforcement learning
CN112866015A (en) * 2021-01-07 2021-05-28 华东师范大学 Intelligent energy-saving control method based on data center network flow prediction and learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
刘辰屹; 徐明伟; 耿男; 张翔: "A Survey of Intelligent Routing Algorithms Based on Machine Learning" (基于机器学习的智能路由算法综述), Journal of Computer Research and Development (计算机研究与发展), no. 04
朱凡; 朱磊; 姚昌华; 王磊; 张海波: "Routing Selection for Mobile Ad Hoc Networks Based on Deep Reinforcement Learning" (基于深度强化学习的移动Ad Hoc网络路由选择), Communications Technology (通信技术), no. 08
朱小琴; 袁晖; 王维洲; 魏峰; 张驯; 赵金雄: "Routing Strategy for Power Communication Networks Based on Deep Reinforcement Learning" (基于深度强化学习的电力通信网路由策略), Scientific and Technological Innovation (科学技术创新), no. 36
肖扬; 吴家威; 李鉴学; 刘军: "A Dynamic Routing Algorithm Based on Deep Reinforcement Learning" (一种基于深度强化学习的动态路由算法), Information and Communications Technology and Policy (信息通信技术与政策), no. 09

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114697229A (en) * 2022-03-11 2022-07-01 华中科技大学 Construction method and application of distributed routing planning model
CN114697229B (en) * 2022-03-11 2023-04-07 华中科技大学 Construction method and application of distributed routing planning model
CN114567563A (en) * 2022-03-31 2022-05-31 北京邮电大学 Network topology model training method, network topology reconstruction method and device
CN114567563B (en) * 2022-03-31 2024-04-12 北京邮电大学 Training method of network topology model, and reconstruction method and device of network topology
CN116055378A (en) * 2023-01-10 2023-05-02 中国联合网络通信集团有限公司 Training method and device for traffic scheduling strategy generation model
CN116055378B (en) * 2023-01-10 2024-05-28 中国联合网络通信集团有限公司 Training method and device for traffic scheduling strategy generation model
CN116192723A (en) * 2023-02-28 2023-05-30 绿盟科技集团股份有限公司 Network routing link generation method and device and electronic equipment
CN116192723B (en) * 2023-02-28 2024-08-30 绿盟科技集团股份有限公司 Network routing link generation method and device and electronic equipment
CN117176592A (en) * 2023-09-04 2023-12-05 广州佳新智能科技有限公司 Multi-bank link artificial intelligent gateway processing method, system, medium and equipment
CN117176592B (en) * 2023-09-04 2024-03-01 广州佳新智能科技有限公司 Multi-bank link artificial intelligent gateway processing method, system, medium and equipment

Also Published As

Publication number Publication date
CN113489654B (en) 2024-01-05

Similar Documents

Publication Publication Date Title
CN113489654A (en) Routing method, routing device, electronic equipment and storage medium
CN109978142B (en) Neural network model compression method and device
CN108111335B (en) A kind of method and system of scheduling and link virtual network function
Hardiansyah et al. Solving economic load dispatch problem using particle swarm optimization technique
CN111416797A (en) Intrusion detection method for optimizing regularization extreme learning machine by improving longicorn herd algorithm
CN111314138B (en) Detection method of directed network, computer readable storage medium and related equipment
CN113572697A (en) Load balancing method based on graph convolution neural network and deep reinforcement learning
CN115686846B (en) Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation
CN115940294A (en) Method, system, equipment and storage medium for adjusting real-time scheduling strategy of multi-stage power grid
CN110971683B (en) Service combination method based on reinforcement learning
Miao et al. Adaptive DNN partition in edge computing environments
CN117271101A (en) Operator fusion method and device, electronic equipment and storage medium
CN115222046A (en) Neural network structure searching method and device, electronic equipment and storage medium
Sham et al. Admission control and resource provisioning in fog-integrated cloud using modified fuzzy inference system
CN113228059A (en) Cross-network-oriented representation learning algorithm
CN115718869A (en) Model training method, system, cluster and medium
CN109978145B (en) Processing method and device
CN116826743A (en) Power load prediction method based on federal graph neural network
CN116109058A (en) Substation inspection management method and device based on deep reinforcement learning
CN115759183A (en) Related method and related device for multi-structure text graph neural network
Masoumzadeh et al. Deep blue: A fuzzy q-learning enhanced active queue management scheme
Sharma et al. An adaptive sigmoidal activation function cascading neural networks
JP2860057B2 (en) Neural network system
JP2001175636A (en) Method and device for optimizing number of multi- layered neural network units
CN113986812B (en) CHNN-based network-on-light-sheet mapping method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant