CN111526055A - Route planning method and device and electronic equipment - Google Patents

Route planning method and device and electronic equipment Download PDF

Info

Publication number
CN111526055A
CN111526055A (application CN202010330122.2A)
Authority
CN
China
Prior art keywords
router
target
target router
action
destination address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010330122.2A
Other languages
Chinese (zh)
Inventor
姚海鹏 (Yao Haipeng)
袁鑫 (Yuan Xin)
买天乐 (Mai Tianle)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202010330122.2A
Publication of CN111526055A
Legal status: Pending

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/02: Topology update or discovery
    • H04L 45/08: Learning-based routing, e.g. using neural networks or artificial intelligence
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08: Configuration management of networks or network elements
    • H04L 41/0803: Configuration setting
    • H04L 41/0823: Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L 41/0836: Configuration setting to enhance reliability, e.g. reduce downtime
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/14: Network analysis or design
    • H04L 41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/74: Address processing for routing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00: Traffic control in data switching networks
    • H04L 47/10: Flow control; Congestion control
    • H04L 47/12: Avoiding congestion; Recovering from congestion

Abstract

The invention provides a route planning method, a route planning device, and electronic equipment. The method comprises the following steps: acquiring a destination address route of a target router and the adjacent routers of the target router; inputting the destination address route of the target router and the adjacent routers of the target router into a pre-trained deep learning model, and obtaining, based on the pre-trained deep learning model, a metric value corresponding to each executable action of the target router, wherein the executable actions comprise a next-hop router of the target router and/or the respective paths from the target router to the destination address route; and determining a target execution action of the target router based on the metric values. The invention improves the reliability of route planning.

Description

Route planning method and device and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a route planning method and apparatus, and an electronic device.
Background
With the rapid development of the Internet, the number of users has grown quickly and new network applications keep emerging, so network traffic has increased sharply. The resulting network congestion has become a bottleneck restricting network development and applications; information congestion is a main factor affecting network service quality, so effectively solving the congestion problem is of great significance for improving network performance. SDN (Software-Defined Networking), as a novel network architecture, separates forwarding from control, and its centralized control brings great convenience to network management. How to find a suitable forwarding path for a data packet, and how to fully and efficiently utilize each data link in the SDN, are hot topics of current research. However, the existing route planning technology calculates the metric value based on a reinforcement learning algorithm, which is difficult to apply to high-dimensional and continuous state spaces. Therefore, the existing route planning method suffers from low reliability.
Disclosure of Invention
The embodiment of the invention aims to provide a route planning method, a route planning device and electronic equipment, which can improve the reliability of route planning.
In a first aspect, an embodiment of the present invention provides a route planning method, including: acquiring a destination address route of a target router and the adjacent routers of the target router; inputting the destination address route of the target router and the adjacent routers of the target router into a pre-trained deep learning model, and obtaining, based on the pre-trained deep learning model, a metric value corresponding to each executable action of the target router, wherein the executable action comprises a next hop router of the target router and/or respective paths of the target router to the destination address route; and determining a target execution action for the target router based on the metric value.
In an alternative embodiment, the deep learning model comprises a Seq2Seq model; the executable action comprises a next hop router of the target router; the step of obtaining a metric value corresponding to each executable action of the target router based on the deep learning model obtained by the pre-training comprises the following steps: taking each adjacent router of the target router as a next hop router of the target router; and determining a metric value generated from the target router to each next-hop router based on the Seq2Seq model obtained by pre-training.
In an alternative embodiment, the executable action includes respective paths of the destination router to the destination address route; the step of obtaining a metric value corresponding to each executable action of the target router based on the deep learning model obtained by the pre-training comprises the following steps: and determining the metric values generated by all paths from the target router to the destination address route based on the Seq2Seq model obtained by pre-training.
In an alternative embodiment, the training process of the Seq2Seq model includes: inputting a target training sample into a Seq2Seq model, and carrying out iterative training on the Seq2Seq model based on the target training sample until the training is finished to obtain the trained Seq2Seq model; the target training samples comprise samples marked with metric values generated from the target router to each next-hop router and/or samples marked with metric values generated from each path from the target router to the destination address route, and the metric values marked by the target training samples are environment rewards to actions obtained in advance based on a reinforcement learning algorithm.
In an optional embodiment, the step of determining the target execution action of the target router based on the metric value includes: determining a corresponding target execution action when the metric value is maximum based on a preset first greedy strategy equation, wherein the preset first greedy strategy equation is as follows:
$$a_t = \arg\max_{a_t} Q(s_t, a_t)$$
wherein $Q(s_t, a_t)$ is the metric value, $a_t$ is the target execution action, and $s_t$ is the current network state of the target router.
In an optional embodiment, the step of determining the target execution action of the target router based on the metric value includes: determining a corresponding target execution action when the metric value is maximum based on a preset second greedy strategy equation, wherein the preset second greedy strategy equation is as follows:
$$\pi(a_t \mid s_t) = \frac{\exp\big(Q(s_t, a_t)/\tau_n\big)}{\sum_{a} \exp\big(Q(s_t, a)/\tau_n\big)}$$
wherein $\tau_n$ is a temperature parameter.
In an alternative embodiment, the calculation of the temperature parameter is as follows:
[Equation image not reproduced in the source text: the formula computing the temperature parameter $\tau_n$ from $num_n$, $T$, $\tau_0$ and $\tau_T$.]
wherein $num_n$ is the number of dynamic flows in the period $(\tau_n, \tau_{n-1}]$, $T$ is the time to achieve convergence, and $\tau_0$ and $\tau_T$ are the initial and final values, respectively.
In a second aspect, an embodiment of the present invention provides a route planning apparatus, including: a state acquisition module, configured to acquire a destination address route of a target router and the adjacent routers of the target router; a metric value determining module, configured to input the destination address route of the target router and the adjacent routers of the target router into a pre-trained deep learning model, and obtain, based on the pre-trained deep learning model, a metric value corresponding to each executable action of the target router, wherein the executable action comprises a next hop router of the target router and/or respective paths of the target router to the destination address route; and an action determining module, configured to determine a target execution action of the target router based on the metric value.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory stores a computer program operable on the processor, and the processor executes the computer program to implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer-readable medium, wherein the computer-readable medium stores computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of the first aspect.
The embodiments of the invention provide a route planning method, a route planning device, and electronic equipment. In the method, the destination address route of a target router and the adjacent routers of the target router are acquired first; the destination address route of the target router and the adjacent routers of the target router are then input into a pre-trained deep learning model, and the metric values corresponding to each executable action of the target router (including a next-hop router of the target router and/or each path from the target router to the destination address route) are obtained based on the pre-trained deep learning model; finally, the target execution action of the target router is determined based on the metric values. By inputting the destination address route of the target router to be planned and the adjacent routers of the target router into the deep learning model, the method obtains the metric values corresponding to all executable actions of the target router, can be applied to the novel SDN network architecture, overcomes the difficulty that a traditional reinforcement learning algorithm is hard to apply to high-dimensional and continuous state spaces, and improves the reliability of route planning.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a route planning method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of Seq2Seq model identification according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a route planning apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In view of the problem that the existing route planning method is low in reliability, embodiments of the present invention provide a route planning method, a device, and an electronic device, which can be applied to improve the reliability of route planning, and the embodiments of the present invention are described in detail below.
An embodiment of the present invention provides a route planning method, which may be executed by an electronic device such as a mobile terminal or a computer, and mainly includes the following steps S102 to S106, referring to a flow chart of the route planning method shown in fig. 1:
step S102, the destination address route of the target router and the adjacent router of the target router are obtained.
The target router may be any router in the SDN network architecture that needs to plan a packet forwarding path. The destination address route is the packet forwarding destination of the target router, that is, the address to which the target router forwards the packet. The adjacent routers include all neighboring routers of the target router.
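As an illustrative aside (not part of the patent text), the state gathered in step S102 can be sketched as a small data structure; the field names below are assumptions chosen for readability, not terms from the patent:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RoutingState:
    """State gathered in step S102 (field names are illustrative assumptions)."""
    source_router: int                # router the packet arrived from
    target_router: int                # router currently planning the forwarding
    destination: int                  # destination address route
    neighbors: List[int] = field(default_factory=list)  # all adjacent routers
```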
Step S104, inputting the destination address route of the target router and the adjacent routers of the target router into the pre-trained deep learning model, and obtaining, based on the pre-trained deep learning model, the metric values corresponding to each executable action of the target router.
The executable actions include a next-hop router of the target router and/or the respective paths from the target router to the destination address route. The metric value may also be referred to as a reward, i.e., the reward fed back by the environment after the state transition. In another embodiment, the current state of the target router may also be input into the pre-trained deep learning model, where the current state of the target router includes the source router, the target router, and the destination address route; the deep learning model then obtains, from the source router, the target router, and the destination address route, the reward corresponding to each packet forwarding path from the target router to the destination address route. The reward corresponding to each packet forwarding path from the target router to the destination address route may also be obtained in advance with a reinforcement learning algorithm, from the source router, the target router, and the destination address route.
Step S106, determining the target execution action of the target router based on the metric value.
Since each executable action of the target router has a corresponding metric value, the executable action with the maximum metric value can be taken as the target execution action of the target router. In this way, after the target router forwards to the next-hop router, or forwards along the path to the destination address route, according to the target execution action, it obtains the maximum metric value, i.e., the best reward fed back by the environment.
According to the route planning method provided by this embodiment, the destination address route of the target router to be planned and the adjacent routers of the target router are input into the deep learning model, so that the metric values corresponding to each executable action of the target router can be obtained. The method can be applied to the novel SDN network architecture, overcomes the difficulty that a traditional reinforcement learning algorithm is hard to apply to high-dimensional and continuous state spaces, and improves the reliability of route planning.
In order to accurately obtain the metric values corresponding to the executable actions of the target router, this embodiment provides an implementation manner of obtaining the metric values corresponding to the executable actions of the target router based on a deep learning model obtained by training in advance, and the following steps (1) to (2) may be specifically referred to for execution:
step (1): and taking each adjacent router of the target router as the next hop router of the target router.
The deep learning model includes a Seq2Seq model. The Seq2Seq model is an encoder (Encode)-decoder (Decode) architecture that generates state-action values as output from the input current state. When the executable actions include a next-hop router of the target router, each adjacent router of the target router may be treated as a candidate next-hop router in order to determine the next-hop router of the target router, except that the source router cannot be treated as the next-hop router, in order to prevent the target router from jumping back (i.e., returning the packet to the source router).
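A minimal sketch of this candidate-selection rule, reusing the hypothetical `RoutingState` structure introduced above; the only logic is the stated exclusion of the source router:

```python
from typing import List

def candidate_next_hops(state: RoutingState) -> List[int]:
    """Each adjacent router is a candidate next hop, except the source
    router, which is excluded to prevent the packet from jumping back."""
    return [r for r in state.neighbors if r != state.source_router]
```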
Step (2): and determining the weighing values generated from the target router to each next-hop router based on a Seq2Seq model obtained by pre-training.
After the current state of the target router (the source router, the target router, and the destination address route) is input into the Seq2Seq model (see the Seq2Seq model diagram in fig. 2), the vector of each router is obtained from the routers' mapping matrix $M_{n \times m}$; the mapping matrix $M_{n \times m}$ itself is given as an equation image that is not reproduced in the source text. For example, the vectors h1, h2, and h3 of the source router, the target router, and the destination address route are obtained from $M_{n \times m}$; the current state vector c of the target router is generated from h1, h2, and h3; and from the current state vector c together with each adjacent router of the target router (via an equation image not reproduced in the source text), $y_1$ can be obtained. $y_1$ serves as the metric value (reward) $Q(s_t, a_t)$ generated from the target router to each next-hop router. The Seq2Seq model is obtained by training on samples labeled with the metric values generated from the target router to each next-hop router.
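Because the equation images for the mapping matrix and the output $y_1$ are not reproduced in the text, the sketch below only mirrors the described data flow (embedding lookup, state vector c, one score per candidate next hop) with assumed combination and scoring functions; it is not the patent's actual network:

```python
import numpy as np

n_routers, emb_dim = 16, 8
rng = np.random.default_rng(0)
M = rng.normal(size=(n_routers, emb_dim))       # mapping matrix M_{n x m}
W_c = rng.normal(size=(3 * emb_dim, emb_dim))   # combines h1, h2, h3 into c (assumed form)
w_q = rng.normal(size=(2 * emb_dim,))           # scores (c, neighbor) pairs (assumed form)

def q_values(state: RoutingState) -> dict:
    """Metric value Q(s_t, a) for every candidate next hop a."""
    h1, h2, h3 = M[state.source_router], M[state.target_router], M[state.destination]
    c = np.tanh(np.concatenate([h1, h2, h3]) @ W_c)    # current state vector c
    return {a: float(np.concatenate([c, M[a]]) @ w_q)  # y_1, i.e. Q(s_t, a)
            for a in candidate_next_hops(state)}
```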
In another embodiment, the executable actions include the respective paths from the target router to the destination address route, and the metric value generated by each path from the target router to the destination address route can likewise be determined based on the pre-trained Seq2Seq model. In this embodiment, from the current state vector c of the target router and each adjacent router of the target router (via an equation image not reproduced in the source text), the Seq2Seq model obtains the metric value (reward) $Q(s_t, a_t)$ generated by each packet forwarding path from the target router to the destination address route. Here the Seq2Seq model is obtained by training on samples labeled with the metric values generated by each path from the target router to the destination address route.
In a specific embodiment, the training process of the Seq2Seq model includes:
inputting target training samples into the Seq2Seq model, and performing iterative training on the Seq2Seq model based on the target training samples until the training is finished, to obtain the trained Seq2Seq model. The target training samples comprise samples labeled with the metric values generated from the target router to each next-hop router and/or samples labeled with the metric values generated by each path from the target router to the destination address route; the metric values labeling the target training samples are the environment's rewards for the actions (the executable actions of the target router), obtained in advance based on a reinforcement learning algorithm. The criterion for ending the training of the Seq2Seq model may be that the number of training iterations reaches a preset number.
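A hedged PyTorch sketch of this supervised training step follows: predicted metric values are regressed onto the RL-derived reward labels. The small feed-forward head stands in for the full Seq2Seq encoder-decoder, whose exact architecture the text does not spell out; all names here are illustrative assumptions. A typical optimizer would be `torch.optim.Adam(model.parameters(), lr=1e-3)`.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Stand-in for the Seq2Seq Q estimator (architecture assumed)."""
    def __init__(self, n_routers: int, emb_dim: int = 8):
        super().__init__()
        self.emb = nn.Embedding(n_routers, emb_dim)   # plays the role of M_{n x m}
        self.head = nn.Sequential(nn.Linear(4 * emb_dim, 32),
                                  nn.ReLU(), nn.Linear(32, 1))

    def forward(self, src, tgt, dst, nxt):
        h = torch.cat([self.emb(src), self.emb(tgt),
                       self.emb(dst), self.emb(nxt)], dim=-1)
        return self.head(h).squeeze(-1)               # predicted Q(s_t, a_t)

def train_step(model, optimizer, batch):
    """One iteration: batch = (src, tgt, dst, nxt, reward) tensors, where
    reward is the environment feedback labeled via reinforcement learning."""
    src, tgt, dst, nxt, reward = batch
    loss = nn.functional.mse_loss(model(src, tgt, dst, nxt), reward)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```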
In order to obtain the target execution action of the target router, this embodiment provides a specific implementation manner for determining the target execution action of the target router based on the metric value, and the following manner one and manner two may be specifically referred to for execution:
the first method is as follows: and determining a corresponding target execution action when the weighing value is maximum based on a preset first greedy strategy equation, wherein the preset first greedy strategy equation is as follows:
$$a_t = \arg\max_{a_t} Q(s_t, a_t)$$
wherein $Q(s_t, a_t)$ is the metric value, $a_t$ is the target execution action, and $s_t$ is the current network state of the target router (i.e., the current state of the target router), which includes the source router, the target router, and the destination address route. $a_t$ may be selected as the next hop that the target router will pass through on the way to the destination address route, and $Q(s_t, a_t)$ is the reward fed back by the environment in the current state of the target router. For the current network state of the target router, the preset first greedy strategy equation is used to obtain the target execution action $a_t$ corresponding to the maximum metric value $Q(s_t, a_t)$. The maximum reward fed back by the environment may be obtained when the target router forwards the packet according to the target execution action.
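Given the per-action metric values (e.g. the dictionary returned by the hypothetical `q_values` sketch above), manner one reduces to a one-line argmax:

```python
def greedy_action(q: dict) -> int:
    """Target execution action: the candidate with the maximum metric value."""
    return max(q, key=q.get)
```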
Manner two: determining the corresponding target execution action when the metric value is maximum, based on a preset second greedy strategy equation, wherein the preset second greedy strategy equation is as follows:
$$\pi(a_t \mid s_t) = \frac{\exp\big(Q(s_t, a_t)/\tau_n\big)}{\sum_{a} \exp\big(Q(s_t, a)/\tau_n\big)}$$
wherein $\tau_n$ is a temperature parameter, calculated as follows:
[Equation image not reproduced in the source text: the formula computing the temperature parameter $\tau_n$ from $num_n$, $T$, $\tau_0$ and $\tau_T$.]
wherein $num_n$ is the number of dynamic flows in the period $(\tau_n, \tau_{n-1}]$, $T$ is the time to achieve convergence, and $\tau_0$ and $\tau_T$ are the initial and final values, respectively. In order to let the routing simultaneously learn the dynamics of the current network and retain the use of already-learned knowledge, the greedy strategy is improved by adding the temperature parameter $\tau_n$. The temperature parameter $\tau_n$ is a time-varying parameter used to balance the exploration of unknown actions against the exploitation of the existing policy: when $\tau_n$ is large, the probabilities of taking each action are almost the same; as $\tau_n$ keeps decreasing toward small values, the selection of the target execution action gradually approaches a greedy strategy. The temperature parameter thus allows more possible next-hop routes to be explored when the network is highly dynamic and, once many network conditions have been learned and the network tends to be static, lets a better next-hop route be chosen according to prior experience. The temperature parameter $\tau_n$ fluctuates with the arrival and departure of flows in the network: the more the flows fluctuate, the more new alternative routes need to be explored; the smaller the fluctuation, the better the benefit brought by taking the best known target execution action (routing policy).
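A sketch of manner two, under the assumption that the second greedy strategy equation is the standard Boltzmann (softmax) policy; the temperature schedule itself is given only as an unreproduced equation image, so $\tau_n$ is passed in as a precomputed value:

```python
import numpy as np

def boltzmann_action(q: dict, tau: float, rng=np.random.default_rng()) -> int:
    """Sample an action from the softmax distribution over metric values.
    Large tau: near-uniform exploration; small tau: near-greedy exploitation."""
    actions = list(q)
    scores = np.array([q[a] for a in actions]) / tau
    scores -= scores.max()                      # shift for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return int(rng.choice(actions, p=probs))
```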
In the route planning method provided by this embodiment, on the one hand, the Q value in the conventional Q-learning algorithm is estimated with the deep learning Seq2Seq model, which solves the problem that the conventional Q-learning algorithm is difficult to apply to high-dimensional and continuous state spaces; compared with the Q-learning algorithm, computing the Q values of different states with the deep learning Seq2Seq model also avoids the inconvenience of storing the Q values in a Q table, as Q-learning does. On the other hand, when the target execution action is determined, the temperature parameter is introduced and the fluctuation of network flows is taken into account, so the resulting route planning strategy can bring better network benefits.
Corresponding to the above route planning method, this embodiment provides a route planning device, referring to the schematic structural diagram of the route planning device shown in fig. 3, the device includes:
and a state obtaining module 31, configured to obtain a destination address route of the target router and a router adjacent to the target router.
A metric value determining module 32, configured to input the destination address route of the target router and the neighboring routers of the target router into a pre-trained deep learning model, and obtain, based on the pre-trained deep learning model, a metric value corresponding to each executable action of the target router; wherein the executable action includes a next hop router of the target router and/or respective paths of the target router to the destination address route.
And an action determining module 33, configured to determine a target execution action of the target router based on the metric value.
According to the route planning device provided by this embodiment, the destination address route of the target router to be planned and the adjacent routers of the target router are input into the deep learning model, so that the metric values corresponding to each executable action of the target router can be obtained. The device can be applied to the novel SDN (Software-Defined Network) architecture, overcomes the difficulty that a traditional reinforcement learning algorithm is hard to apply to high-dimensional and continuous state spaces, and improves the reliability of route planning.
In one embodiment, the deep learning model includes a Seq2Seq model, and the executable action includes a next-hop router of the target router; the metric value determining module 32 is further configured to use each neighboring router of the target router as a next-hop router of the target router, and to determine the metric values generated from the target router to each next-hop router based on the pre-trained Seq2Seq model.
In one embodiment, the executable actions include respective paths from the target router to the destination address route; the metric value determining module 32 is further configured to determine, based on a pre-trained Seq2Seq model, metric values generated by paths from the target router to the destination address route.
In one embodiment, the training process of the Seq2Seq model includes: inputting a target training sample into the Seq2Seq model, and performing iterative training on the Seq2Seq model based on the target training sample until the training is finished to obtain the trained Seq2Seq model; the target training samples comprise samples marked with metric values generated from the target router to each next-hop router and/or samples marked with metric values generated from each path from the target router to the destination address route, and the metric values marked by the target training samples are environment rewards to actions obtained in advance based on a reinforcement learning algorithm.
In an embodiment, the action determining module 33 is further configured to determine, based on a preset first greedy policy equation, a corresponding target execution action when the metric value is maximum, where the preset first greedy policy equation is:
$$a_t = \arg\max_{a_t} Q(s_t, a_t)$$
wherein $Q(s_t, a_t)$ is the metric value, $a_t$ is the target execution action, and $s_t$ is the current network state of the target router.
In an embodiment, the action determining module 33 is further configured to determine, based on a preset second greedy policy equation, a corresponding target execution action when the metric value is maximum, where the preset second greedy policy equation is:
$$\pi(a_t \mid s_t) = \frac{\exp\big(Q(s_t, a_t)/\tau_n\big)}{\sum_{a} \exp\big(Q(s_t, a)/\tau_n\big)}$$
wherein $\tau_n$ is a temperature parameter.
In one embodiment, the calculation formula of the temperature parameter is:
[Equation image not reproduced in the source text: the formula computing the temperature parameter $\tau_n$ from $num_n$, $T$, $\tau_0$ and $\tau_T$.]
wherein $num_n$ is the number of dynamic flows in the period $(\tau_n, \tau_{n-1}]$, $T$ is the time to achieve convergence, and $\tau_0$ and $\tau_T$ are the initial and final values, respectively.
On the one hand, the route planning apparatus provided in this embodiment estimates the Q value in the conventional Q-learning algorithm with the deep learning Seq2Seq model, solving the problem that the conventional Q-learning algorithm is difficult to apply to high-dimensional and continuous state spaces; compared with the Q-learning algorithm, computing the Q values of different states with the deep learning Seq2Seq model avoids the inconvenience of storing the Q values in a Q table. On the other hand, when the target execution action is determined, the temperature parameter is introduced and the fluctuation of network flows is taken into account, so the resulting route planning strategy can bring better network benefits.
The device provided by the embodiment has the same implementation principle and technical effect as the foregoing embodiment, and for the sake of brief description, reference may be made to the corresponding contents in the foregoing method embodiment for the portion of the embodiment of the device that is not mentioned.
An embodiment of the present invention provides an electronic device, as shown in a schematic structural diagram of the electronic device shown in fig. 4, where the electronic device includes a processor 41 and a memory 42, where a computer program operable on the processor is stored in the memory, and when the processor executes the computer program, the steps of the method provided in the foregoing embodiment are implemented.
Referring to fig. 4, the electronic device further includes: a bus 44 and a communication interface 43, and the processor 41, the communication interface 43 and the memory 42 are connected by the bus 44. The processor 41 is arranged to execute executable modules, such as computer programs, stored in the memory 42.
The Memory 42 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 43 (which may be wired or wireless), using the Internet, a wide area network, a local area network, a metropolitan area network, etc.
The bus 44 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
The memory 42 is configured to store a program, and the processor 41 executes the program after receiving an execution instruction, and the method executed by the apparatus defined by the flow process disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 41, or implemented by the processor 41.
The processor 41 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware or by instructions in the form of software in the processor 41. The processor 41 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like. It may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 42, and the processor 41 reads the information in the memory 42 and completes the steps of the above method in combination with its hardware.
Embodiments of the present invention provide a computer-readable medium, wherein the computer-readable medium stores computer-executable instructions, which, when invoked and executed by a processor, cause the processor to implement the method of the above-mentioned embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A route planning method, comprising:
acquiring a destination address route of a target router and an adjacent router of the target router;
inputting the destination address route of the target router and the adjacent routers of the target router into a pre-trained deep learning model, and obtaining, based on the pre-trained deep learning model, a metric value corresponding to each executable action of the target router; wherein the executable action comprises a next hop router of the target router and/or respective paths of the target router to the destination address route;
determining a target execution action for the target router based on the metric value.
2. The method of claim 1, wherein the deep learning model comprises a Seq2Seq model; the executable action comprises a next hop router of the target router;
the step of obtaining a metric value corresponding to each executable action of the target router based on the deep learning model obtained by the pre-training comprises the following steps:
taking each adjacent router of the target router as a next hop router of the target router;
and determining a metric value generated from the target router to each next-hop router based on the Seq2Seq model obtained by pre-training.
3. The method of claim 2, wherein the executable actions include respective paths of the target router to the destination address route;
the step of obtaining the metric values corresponding to each executable action of the target router based on the deep learning model obtained by pre-training comprises the following steps:
and determining the metric values generated by all paths from the target router to the destination address route based on the Seq2Seq model obtained by pre-training.
4. The method of claim 3, wherein the training process of the Seq2Seq model comprises:
inputting a target training sample into a Seq2Seq model, and carrying out iterative training on the Seq2Seq model based on the target training sample until the training is finished to obtain the trained Seq2Seq model; the target training samples comprise samples marked with metric values generated from the target router to each next-hop router and/or samples marked with metric values generated from each path from the target router to the destination address route, and the metric values marked by the target training samples are environment rewards to actions obtained in advance based on a reinforcement learning algorithm.
5. The method of claim 1, wherein the step of determining the target execution action of the target router based on the metric value comprises:
determining a corresponding target execution action when the metric value is maximum based on a preset first greedy strategy equation, wherein the preset first greedy strategy equation is as follows:
$$a_t = \arg\max_{a_t} Q(s_t, a_t)$$
wherein $Q(s_t, a_t)$ is the metric value, $a_t$ is the target execution action, and $s_t$ is the current network state of the target router.
6. The method of claim 5, wherein the step of determining the target execution action of the target router based on the metric value comprises:
determining a corresponding target execution action when the metric value is maximum based on a preset second greedy strategy equation, wherein the preset second greedy strategy equation is as follows:
$$\pi(a_t \mid s_t) = \frac{\exp\big(Q(s_t, a_t)/\tau_n\big)}{\sum_{a} \exp\big(Q(s_t, a)/\tau_n\big)}$$
wherein $\tau_n$ is a temperature parameter.
7. The method of claim 6, wherein the temperature parameter is calculated by:
[Equation image not reproduced in the source text: the formula computing the temperature parameter $\tau_n$ from $num_n$, $T$, $\tau_0$ and $\tau_T$.]
wherein $num_n$ is the number of dynamic flows in the period $(\tau_n, \tau_{n-1}]$, $T$ is the time to achieve convergence, and $\tau_0$ and $\tau_T$ are the initial and final values, respectively.
8. A route planning apparatus, comprising:
the state acquisition module is used for acquiring a destination address route of a target router and an adjacent router of the target router;
the metric value determining module is used for inputting a destination address route of the target router and an adjacent router of the target router into a pre-trained deep learning model, and obtaining a metric value corresponding to each executable action of the target router based on the pre-trained deep learning model; wherein the executable action comprises a next hop router of the target router and/or respective paths of the target router to the destination address route;
and the action determining module is used for determining a target execution action of the target router based on the metric value.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and wherein the processor implements the method of any of claims 1-7 when executing the computer program.
10. A computer-readable medium having stored thereon computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of any one of claims 1-7.
CN202010330122.2A 2020-04-23 2020-04-23 Route planning method and device and electronic equipment Pending CN111526055A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010330122.2A CN111526055A (en) 2020-04-23 2020-04-23 Route planning method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010330122.2A CN111526055A (en) 2020-04-23 2020-04-23 Route planning method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111526055A (en) 2020-08-11

Family

ID=71904914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010330122.2A Pending CN111526055A (en) 2020-04-23 2020-04-23 Route planning method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111526055A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117033005A (en) * 2023-10-07 2023-11-10 之江实验室 Deadlock-free routing method and device, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900419A (en) * 2018-08-17 2018-11-27 北京邮电大学 Route decision method and device based on deeply study under SDN framework
CN110890985A (en) * 2019-11-27 2020-03-17 北京邮电大学 Virtual network mapping method and model training method and device thereof

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900419A (en) * 2018-08-17 2018-11-27 北京邮电大学 Route decision method and device based on deeply study under SDN framework
CN110890985A (en) * 2019-11-27 2020-03-17 北京邮电大学 Virtual network mapping method and model training method and device thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JUSTIN A. BOYAN: "Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach", Proceedings of the 6th International Conference on Neural Information Processing Systems *
买天乐 (Mai Tianle): "Research on Routing Scheduling Mechanisms Based on Deep Reinforcement Learning" (基于深度强化学习的路由调度机制研究), China Masters' Theses Full-text Database, Information Science and Technology *
刘辰屹 (Liu Chenyi) et al.: "A Survey of Machine-Learning-Based Intelligent Routing Algorithms" (基于机器学习的智能路由算法综述), Journal of Computer Research and Development (计算机研究与发展) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117033005A (en) * 2023-10-07 2023-11-10 之江实验室 Deadlock-free routing method and device, storage medium and electronic equipment
CN117033005B (en) * 2023-10-07 2024-01-26 之江实验室 Deadlock-free routing method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN108900419B (en) Routing decision method and device based on deep reinforcement learning under SDN framework
Talebi et al. Stochastic online shortest path routing: The value of feedback
EP2453612B1 (en) Bus control device
Wei et al. TRUST: A TCP throughput prediction method in mobile networks
WO2021052379A1 (en) Data stream type identification method and related devices
Lücking et al. Which is the worst-case Nash equilibrium?
CN108028805A (en) A kind of system and method for control flow equalization in band in software defined network
US20180324082A1 (en) Weight setting using inverse optimization
US20220366280A1 (en) Generating confidence scores for machine learning model predictions
CN113612692B (en) Centralized optical on-chip network self-adaptive route planning method based on DQN algorithm
CN113572697A (en) Load balancing method based on graph convolution neural network and deep reinforcement learning
CN111526055A (en) Route planning method and device and electronic equipment
Sivakumar et al. Prediction of traffic load in wireless network using time series model
Danielis et al. Dynamic flow migration for delay constrained traffic in software-defined networks
Suzuki et al. Multi-agent deep reinforcement learning for cooperative computing offloading and route optimization in multi cloud-edge networks
CN111340192A (en) Network path allocation model training method, path allocation method and device
CN114422453B (en) Method, device and storage medium for online planning of time-sensitive stream
CN108093083B (en) Cloud manufacturing task scheduling method and device and terminal
Zhang et al. Network performance reliability evaluation based on network reduction
JP2003037649A (en) Method for estimating contents distribution end time, recording medium and program
CN107710701A (en) Constraint separation path computing
CN111917657B (en) Method and device for determining flow transmission strategy
RU2757781C1 (en) Method for stable data routing in a virtual communication network
CN113839794B (en) Data processing method, device, equipment and storage medium
CN111083051B (en) Path planning method and device based on multiple intelligent agents and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200811